Evaluation batch upload

This page describes the format required to upload golden evaluations in a CSV file. For details about golden evaluations, see the golden evaluations documentation.

Download the template

Navigate to the Evaluate tab and click + Add test case -> Golden.
In the menu that appears, click Download template.
After you have used the template to create a CSV file containing your golden evaluations, you can upload it by clicking Upload file in the same menu.

General structure

A single CSV file can contain multiple evaluations. Each evaluation can span multiple rows.
The first row of an evaluation is the evaluation row and defines its overall properties (name and metadata).
Each subsequent row is a conversation row and defines a single conversation turn in the evaluation (for example, an end-user says something, the agent is expected to reply, or a tool call is expected).
To start a new evaluation, enter a name in the display_name column. Each non-empty display_name value marks the start of a new evaluation.
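The grouping rule above can be sketched in a few lines of Python. This is an illustrative reader, not the product's own parser: the function name group_evaluations and the returned dictionary shape are assumptions made for the example.

```python
import csv
import io


def group_evaluations(csv_text: str) -> list[dict]:
    """Group CSV rows into evaluations using the display_name column."""
    evaluations = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["display_name"]:
            # A non-empty display_name marks an evaluation row.
            evaluations.append({"name": row["display_name"], "turns": []})
        elif evaluations:
            # Any other row is a conversation row of the latest evaluation.
            evaluations[-1]["turns"].append(row)
        # Rows that appear before the first evaluation row are skipped here.
    return evaluations


sample = (
    "display_name,turn_index,action_type,text_content\n"
    "First evaluation,,,\n"
    ",1,INPUT_TEXT,Hello\n"
    "Second evaluation,,,\n"
    ",1,INPUT_TEXT,Goodbye\n"
)
grouped = group_evaluations(sample)
```

In this sample, grouped holds two evaluations with one conversation row each.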

Header row

Your CSV file must have a header row as the first line. This header defines the data variable in each column. All variables other than the required variables are optional, unless required by an action_type value. Optional variable columns can be in any order after the required variables.

Required variables: display_name, turn_index, action_type.
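Before uploading, it can help to check the header locally. The following is a minimal sketch that only checks that the three required columns are present and that no column name is repeated. The function name check_header and the error strings are hypothetical, and the upload itself may apply further checks.

```python
import csv
import io

REQUIRED_COLUMNS = ["display_name", "turn_index", "action_type"]


def check_header(csv_text: str) -> list[str]:
    """Return a list of problems found in the header row (empty if none)."""
    header = next(csv.reader(io.StringIO(csv_text)), None)
    if header is None:
        return ["file is empty"]
    problems = []
    for name in REQUIRED_COLUMNS:
        if name not in header:
            problems.append(f"missing required column: {name}")
    # Each variable is defined once, one per column.
    for name in sorted({n for n in header if header.count(n) > 1}):
        problems.append(f"duplicate column: {name}")
    return problems
```

For example, check_header("display_name,action_type\n") reports that turn_index is missing.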

Define a conversation evaluation

Each new evaluation starts at an evaluation row. Each conversation row below the evaluation row corresponds to one conversation turn, until the next evaluation row.

Evaluation row

The first line after the header row must be an evaluation row. Each evaluation row defines a new evaluation.

Required: Enter a unique, human-readable name for the evaluation in the display_name field.
Optional: Enter values for any metadata variables in this row.

Conversation row

Each row corresponds to data from one conversation turn.

Required: Enter values in the turn_index and action_type fields. display_name must be left blank.
Optional: Enter values for any header columns other than metadata variables or display_name.
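As a sketch of how the two row types fit together, the snippet below writes one evaluation with Python's csv module. The evaluation name, agent name, and texts are made-up sample data. It assumes that the evaluation row leaves turn_index and action_type blank, and that a user input and the expected reply belong to the same turn. The downloaded template remains the authoritative reference for the layout.

```python
import csv
import io

COLUMNS = ["display_name", "turn_index", "action_type", "description",
           "response_agent", "text_content"]

rows = [
    # Evaluation row: the name plus metadata only.
    {"display_name": "Greeting test", "description": "Agent greets the user"},
    # Conversation rows: display_name stays blank.
    {"turn_index": 1, "action_type": "INPUT_TEXT", "text_content": "Hello"},
    {"turn_index": 1, "action_type": "EXPECTATION_TEXT",
     "response_agent": "Root agent", "text_content": "Hi, how can I help?"},
]


def build_csv(columns: list[str], data_rows: list[dict]) -> str:
    """Write a header row and the data rows, leaving unset columns empty."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(data_rows)
    return buffer.getvalue()


csv_text = build_csv(COLUMNS, rows)
```

The csv module quotes the last text value automatically because it contains a comma.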

Variables

The following tables describe the available data variables. All variables other than the required variables are optional, unless required by an action_type value. All variables must be defined in the header row, one per column. Optional variable columns can be in any order after the required columns.

Required header variables

Column name	Description
`display_name`	The human-readable name of your evaluation. This is only filled in for the first row of a new evaluation. Each new `display_name` value defines a new evaluation.
`turn_index`	A number (1, 2, 3...) indicating the sequential order of the conversation turn. All rows in one turn share the same index value. Values must start at 1 for each evaluation. Each subsequent row must have the same or a greater value than the row preceding it.
`action_type`	Specifies what this row's data represents. Some values require other variables to be filled in, as indicated, for the conversation turn to be read correctly. The value must be one of the following:
`INPUT_TEXT`: An end-user text input.
- (Required) `text_content`.
`INPUT_IMAGE`: An end-user image input.
- (Required) `image_mime_type`, `image_content`.
`INPUT_TOOL_RESPONSE`: A tool response input.
- (Required) `tool_name`.
- (Optional) `tool_response_json`.
`INPUT_UPDATED_VARIABLES`: Update variables from an input.
- (Required) `updated_variables_json`.
`EXPECTATION_TEXT`: Expected output from an agent text response.
- (Required) `response_agent`, `text_content`.
- (Optional) `expectation_note`.
`EXPECTATION_TOOL_CALL`: Expected tool call.
- (Required) `tool_name`.
- (Optional) `tool_call_args_json`, `expectation_note`.
`EXPECTATION_TOOL_RESPONSE`: Expected tool response.
- (Required) `tool_name`.
- (Optional) `expectation_note`.
`EXPECTATION_AGENT_TRANSFER`: Expected agent transfer.
- (Required) `agent_transfer_target`.
- (Optional) `expectation_note`.
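The rules in this table lend themselves to a local pre-upload check. The sketch below encodes the required variables per action_type and the turn_index ordering rules as described above. The function name validate_rows, the message wording, and the row numbering (data rows counted from 1, header excluded) are assumptions for illustration, not part of the upload format.

```python
REQUIRED_BY_ACTION = {
    "INPUT_TEXT": ["text_content"],
    "INPUT_IMAGE": ["image_mime_type", "image_content"],
    "INPUT_TOOL_RESPONSE": ["tool_name"],
    "INPUT_UPDATED_VARIABLES": ["updated_variables_json"],
    "EXPECTATION_TEXT": ["response_agent", "text_content"],
    "EXPECTATION_TOOL_CALL": ["tool_name"],
    "EXPECTATION_TOOL_RESPONSE": ["tool_name"],
    "EXPECTATION_AGENT_TRANSFER": ["agent_transfer_target"],
}


def validate_rows(rows: list[dict]) -> list[str]:
    """Check conversation rows against the action_type and turn_index rules."""
    problems = []
    in_evaluation = False
    previous_index = 0  # 0 means no turn has been seen in this evaluation yet
    for number, row in enumerate(rows, start=1):
        if row.get("display_name"):
            # Evaluation row: a new evaluation starts, turn numbering resets.
            in_evaluation = True
            previous_index = 0
            continue
        if not in_evaluation:
            problems.append(f"row {number}: conversation row before any evaluation row")
            continue
        action = row.get("action_type") or ""
        required = REQUIRED_BY_ACTION.get(action)
        if required is None:
            problems.append(f"row {number}: unknown action_type {action!r}")
        else:
            for column in required:
                if not row.get(column):
                    problems.append(f"row {number}: {action} requires {column}")
        try:
            index = int(str(row.get("turn_index")).strip())
        except ValueError:
            problems.append(f"row {number}: turn_index is not a whole number")
            continue
        if previous_index == 0:
            if index != 1:
                problems.append(f"row {number}: turn_index must start at 1")
        elif index < previous_index:
            problems.append(f"row {number}: turn_index is lower than the previous row")
        previous_index = index
    return problems
```

The rows argument can come straight from csv.DictReader, since the check only reads values by column name.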

Metadata variables

Column name	Description
`evaluation_id`	A unique ID for the evaluation. Each `evaluation_id` value must be unique to your Customer Experience Agent Studio agent. If no value is entered manually in this column, a unique ID will be generated automatically.
`description`	Free-text notes or a description of the evaluation's purpose.
`tags`	Semicolon-separated tags for organizing evaluations (for example, "tag1;tag2").
`evaluation_groups`	Semicolon-separated names of any evaluation groups that the evaluation belongs to (for example, "group name 1;group name 2"). Any `evaluation_groups` values entered in this column but not defined in the header will be ignored.
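The tags and evaluation_groups cells hold several values in one field, separated by semicolons. A small sketch of building and reading such a cell follows. The helper names are hypothetical. Rejecting values that contain a semicolon and trimming surrounding whitespace are choices made for this example, since the page does not describe an escape mechanism or how whitespace is handled.

```python
def join_values(values: list[str]) -> str:
    """Join tag or group names into one semicolon-separated cell value."""
    for value in values:
        if ";" in value:
            # No escaping is described, so refuse names containing ';'.
            raise ValueError(f"value contains a semicolon: {value!r}")
    return ";".join(values)


def split_values(cell: str) -> list[str]:
    """Split a cell back into names, dropping empty entries."""
    return [part.strip() for part in cell.split(";") if part.strip()]


groups_cell = join_values(["group name 1", "group name 2"])
```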

Conversation turn variables

Column name	Description
`response_agent`	Name of the agent that provided the response. Expected only for `EXPECTATION_TEXT`.
`text_content`	The text for `INPUT_TEXT` or `EXPECTATION_TEXT`.
`image_mime_type`	The IANA standard MIME type of the source image. Supported values: `image/png`, `image/jpeg`, `image/webp`, `image/heic`, `image/heif`.
`image_content`	Byte string of the image for `INPUT_IMAGE`.
`tool_name`	The `display_name` of the tool being called or responding. Expected for `INPUT_TOOL_RESPONSE`, `EXPECTATION_TOOL_CALL`, or `EXPECTATION_TOOL_RESPONSE`.
`tool_call_args_json`	The JSON arguments for an `EXPECTATION_TOOL_CALL`.
`tool_response_json`	The JSON content of an `INPUT_TOOL_RESPONSE`.
`updated_variables_json`	The JSON content for `INPUT_UPDATED_VARIABLES`.
`agent_transfer_target`	Display name of the target agent for an `EXPECTATION_AGENT_TRANSFER`.
`expectation_note`	Note or description of the expectation.
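The JSON columns contain commas and double quotes, which must be quoted correctly to survive in a CSV cell. The sketch below lets Python's json and csv modules handle the quoting instead of assembling the line by hand. The tool name get_weather and its arguments are invented sample data, and the five-column layout is only an example.

```python
import csv
import io
import json

# Example columns: display_name, turn_index, action_type, tool_name,
# tool_call_args_json.
args = {"city": "Paris", "units": "metric"}
row = ["", 2, "EXPECTATION_TOOL_CALL", "get_weather", json.dumps(args)]

buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(row)
line = buffer.getvalue()

# Reading the line back recovers the original JSON text unchanged.
parsed = next(csv.reader(io.StringIO(line)))
round_trip = json.loads(parsed[4])
```

The writer wraps the JSON cell in double quotes and doubles each inner quote, which is the standard CSV escaping that spreadsheet tools also produce.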