DatasetCustomMetric

Defines a custom dataset-level aggregation.

Fields
displayName string

Optional. A display name for this custom summary metric, used as the prefix for its keys in the output summaryMetrics map. If omitted, a default name such as "dataset_custom_metric_1" or "dataset_custom_metric_2" is generated from the metric's position in the repeated field.

aggregationFunction string

Required. The Python code string containing the aggregation function. Expected function signature:

  def aggregate(instances: list[dict[str, Any]]) -> dict[str, float]:

The instances argument is a list of dictionaries, where each dictionary represents a single evaluation result item. The structure of each dictionary corresponds to the fields in the EvaluationResult message.

This includes:
- "request": Contains the original input data and model inputs (from EvaluationResult.EvaluationRequest).
- "candidateResults": Contains the results of any instance-level metrics (from EvaluationResult.CandidateResults).

Example of a single item in the instances list:
{
  "request": {
    "prompt": {"text": "What is the capital of France?"},
    "goldenResponse": {"text": "Paris"},
    "candidateResponses": [{"candidate": "model-v1", "text": "Paris"}]
  },
  "candidateResults": [
    {"metric": "exactMatch", "score": 1.0},
    {"metric": "bleu", "score": 0.9}
  ]
}
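As a minimal sketch of a function matching the expected signature, the following averages each instance-level metric across the dataset. It assumes every entry in "candidateResults" carries the "metric" and "score" keys shown in the example item above; the "mean_" output key prefix is a hypothetical naming choice, not something the service requires.

```python
from collections import defaultdict
from typing import Any


def aggregate(instances: list[dict[str, Any]]) -> dict[str, float]:
    """Return the mean score of each instance-level metric over all instances."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)

    for instance in instances:
        # Assumption: "candidateResults" is a list of {"metric": ..., "score": ...}
        # dictionaries, as in the example item. Missing keys are skipped.
        for result in instance.get("candidateResults", []):
            metric = result.get("metric")
            score = result.get("score")
            if metric is None or score is None:
                continue
            totals[metric] += float(score)
            counts[metric] += 1

    # Hypothetical output naming: one "mean_<metric>" key per metric seen.
    return {f"mean_{metric}": totals[metric] / counts[metric] for metric in totals}
```

Called with an empty list, this sketch returns an empty dictionary rather than raising, which avoids a division by zero when no instances were evaluated.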

JSON representation
{
  "displayName": string,
  "aggregationFunction": string
}
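Because aggregationFunction is a string, the Python source has to be embedded in the JSON payload. The sketch below builds such a payload and runs the same source locally as a sanity check before sending it. The metric name "exact_match_rate" and the sample data are hypothetical, and the local exec call only approximates how the function might be invoked; the service's actual execution environment is not described here.

```python
import json
import textwrap

# Hypothetical aggregation: share of instances with an "exactMatch" score of 1.0.
AGGREGATION_SOURCE = textwrap.dedent('''
    def aggregate(instances):
        hits = 0
        for instance in instances:
            for result in instance.get("candidateResults", []):
                if result.get("metric") == "exactMatch" and result.get("score") == 1.0:
                    hits += 1
        return {"exact_match_rate": hits / len(instances) if instances else 0.0}
''').strip()

# Only the two documented fields are set.
dataset_custom_metric = {
    "displayName": "exact_match_rate",
    "aggregationFunction": AGGREGATION_SOURCE,
}

# Local sanity check only: compile the source string and call aggregate()
# on a small hand-written sample shaped like the documented instances list.
namespace: dict = {}
exec(dataset_custom_metric["aggregationFunction"], namespace)
sample_instances = [
    {"candidateResults": [{"metric": "exactMatch", "score": 1.0}]},
    {"candidateResults": [{"metric": "exactMatch", "score": 0.0}]},
]
local_result = namespace["aggregate"](sample_instances)

# json.dumps escapes the newlines and quotes in the embedded source.
print(json.dumps(dataset_custom_metric, indent=2))
```

Keeping the source in a single string constant makes it easy to test the function locally with the same text that is sent in the request.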