MCP Tools Reference: ces.googleapis.com

Tool: `list_evaluations`

Lists evaluations.

The following sample demonstrates how to use curl to invoke the `list_evaluations` MCP tool.

Curl Request

                  
curl --location 'https://ces.[REGION].rep.googleapis.com/mcp' \
--header 'content-type: application/json' \
--header 'accept: application/json, text/event-stream' \
--data '{
  "method": "tools/call",
  "params": {
    "name": "list_evaluations",
    "arguments": {
      // provide these details according to the tool's MCP specification
    }
  },
  "jsonrpc": "2.0",
  "id": 1
}'
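
For readers scripting the call, the same request body can be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper names `build_tools_call` and `endpoint_for`, the region value, and the `parent` value are illustrative assumptions, and authentication is left out because this page does not describe it.

```python
import json

# Headers copied from the curl sample above.
HEADERS = {
    "content-type": "application/json",
    "accept": "application/json, text/event-stream",
}


def endpoint_for(region):
    """Fill the [REGION] placeholder of the endpoint shown in the curl sample."""
    return f"https://ces.{region}.rep.googleapis.com/mcp"


def build_tools_call(name, arguments, request_id=1):
    """Return a JSON-RPC 2.0 body for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical values, for illustration only.
body = build_tools_call(
    "list_evaluations",
    {"parent": "projects/my-project/locations/us/apps/my-app"},
)
print(endpoint_for("us"))
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with `--data`, or sent with any HTTP client using `HEADERS`.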

Input Schema

Request message for EvaluationService.ListEvaluations.

ListEvaluationsRequest

JSON representation
{
  "parent": string,
  "pageSize": integer,
  "pageToken": string,
  "filter": string,
  "evaluationFilter": string,
  "evaluationRunFilter": string,
  "orderBy": string,
  "lastTenResults": boolean
}

Fields
`parent`	`string` Required. The resource name of the app to list evaluations from.
`pageSize`	`integer` Optional. Requested page size. Server may return fewer items than requested. If unspecified, server will pick an appropriate default.
`pageToken`	`string` Optional. The `next_page_token` value returned from a previous `EvaluationService.ListEvaluations` call.
`filter (deprecated)`	`string` Optional. Deprecated: use `evaluation_filter` and `evaluation_run_filter` instead.
`evaluationFilter`	`string` Optional. Filter to be applied on the evaluation when listing the evaluations. See https://google.aip.dev/160 for more details. Supported fields: evaluation_datasets
`evaluationRunFilter`	`string` Optional. Filter string for fields on the associated EvaluationRun resources. See https://google.aip.dev/160 for more details. Supported fields: create_time, initiated_by, app_version_display_name
`orderBy`	`string` Optional. Field to sort by. Only "name", "create_time", and "update_time" are supported. Time fields are ordered in descending order, and the name field is ordered in ascending order. If not included, "update_time" is the default. See https://google.aip.dev/132#ordering for more details.
`lastTenResults`	`boolean` Optional. Whether to include the last 10 evaluation results for each evaluation in the response.
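
As an illustration of how these fields combine, the sketch below builds an `arguments` object that omits unset optional fields and rejects unsupported `orderBy` values before any request is made. The function name, the example resource name, and the filter value are hypothetical; the filter string only illustrates the general AIP-160 `field = "value"` form on one of the supported fields.

```python
# Sort fields listed as supported in the orderBy description.
ALLOWED_ORDER_BY = {"name", "create_time", "update_time"}


def list_evaluations_arguments(parent, page_size=None, page_token=None,
                               evaluation_filter=None,
                               evaluation_run_filter=None,
                               order_by=None, last_ten_results=None):
    """Build the arguments object for list_evaluations (illustrative helper)."""
    if not parent:
        raise ValueError("parent is required")
    if order_by is not None and order_by not in ALLOWED_ORDER_BY:
        raise ValueError(f"unsupported orderBy value: {order_by!r}")
    optional = {
        "pageSize": page_size,
        "pageToken": page_token,
        "evaluationFilter": evaluation_filter,
        "evaluationRunFilter": evaluation_run_filter,
        "orderBy": order_by,
        "lastTenResults": last_ten_results,
    }
    arguments = {"parent": parent}
    # Leave out anything the caller did not set so server defaults apply.
    arguments.update({k: v for k, v in optional.items() if v is not None})
    return arguments


example_arguments = list_evaluations_arguments(
    "projects/my-project/locations/us/apps/my-app",  # hypothetical app
    page_size=20,
    evaluation_run_filter='initiated_by = "alice@example.com"',  # hypothetical
    order_by="create_time",
)
print(example_arguments)
```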

Output Schema

Response message for EvaluationService.ListEvaluations.

ListEvaluationsResponse

JSON representation
{ "evaluations": [ { object (`Evaluation`) } ], "nextPageToken": string }

Fields
`evaluations[]`	`object (Evaluation)` The list of evaluations.
`nextPageToken`	`string` A token that can be sent as `ListEvaluationsRequest.page_token` to retrieve the next page. Absence of this field indicates there are no subsequent pages.
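
Paging through results follows from the two fields above: keep sending the returned `nextPageToken` as `pageToken` until the response no longer contains one. A minimal sketch, assuming a caller-supplied `call_tool` function (hypothetical) that takes the `arguments` object and returns a parsed `ListEvaluationsResponse` as a dict:

```python
def iter_evaluations(call_tool, parent, page_size=None):
    """Yield every evaluation under parent, following nextPageToken."""
    arguments = {"parent": parent}
    if page_size is not None:
        arguments["pageSize"] = page_size
    while True:
        response = call_tool(dict(arguments))
        yield from response.get("evaluations", [])
        token = response.get("nextPageToken")
        if not token:
            return  # an absent token means there are no further pages
        arguments["pageToken"] = token


# Stand-in for a real transport, used only to exercise the loop.
def fake_call_tool(arguments):
    pages = {
        None: {"evaluations": [{"name": "e1"}, {"name": "e2"}],
               "nextPageToken": "t1"},
        "t1": {"evaluations": [{"name": "e3"}]},
    }
    return pages[arguments.get("pageToken")]


names = [e["name"] for e in iter_evaluations(fake_call_tool, "apps/demo")]
print(names)
```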

Evaluation

JSON representation

{
  "name": string,
  "displayName": string,
  "description": string,
  "tags": [
    string
  ],
  "evaluationDatasets": [
    string
  ],
  "createTime": string,
  "createdBy": string,
  "updateTime": string,
  "lastUpdatedBy": string,
  "evaluationRuns": [
    string
  ],
  "etag": string,
  "aggregatedMetrics": {
    object (AggregatedMetrics)
  },
  "lastCompletedResult": {
    object (EvaluationResult)
  },
  "invalid": boolean,
  "lastTenResults": [
    {
      object (EvaluationResult)
    }
  ],

  // Union field inputs can be only one of the following:
  "golden": {
    object (Golden)
  },
  "scenario": {
    object (Scenario)
  }
  // End of list of possible types for union field inputs.
}

Fields
`name`	`string` Identifier. The unique identifier of this evaluation. Format: `projects/{project}/locations/{location}/apps/{app}/evaluations/{evaluation}`
`displayName`	`string` Required. User-defined display name of the evaluation. Unique within an App.
`description`	`string` Optional. User-defined description of the evaluation.
`tags[]`	`string` Optional. User defined tags to categorize the evaluation.
`evaluationDatasets[]`	`string` Output only. List of evaluation datasets the evaluation belongs to. Format: `projects/{project}/locations/{location}/apps/{app}/evaluationDatasets/{evaluationDataset}`
`createTime`	`string (Timestamp format)` Output only. Timestamp when the evaluation was created. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`createdBy`	`string` Output only. The user who created the evaluation.
`updateTime`	`string (Timestamp format)` Output only. Timestamp when the evaluation was last updated. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`lastUpdatedBy`	`string` Output only. The user who last updated the evaluation.
`evaluationRuns[]`	`string` Output only. The EvaluationRuns that this Evaluation is associated with.
`etag`	`string` Output only. Etag used to ensure the object hasn't changed during a read-modify-write operation. If the etag is empty, the update will overwrite any concurrent changes.
`aggregatedMetrics`	`object (AggregatedMetrics)` Output only. The aggregated metrics for this evaluation across all runs.
`lastCompletedResult`	`object (EvaluationResult)` Output only. The latest evaluation result for this evaluation.
`invalid`	`boolean` Output only. Whether the evaluation is invalid. This can happen if an evaluation is referencing a tool, toolset, or agent that has since been deleted.
`lastTenResults[]`	`object (EvaluationResult)` Output only. The last 10 evaluation results for this evaluation. This is only populated if include_last_ten_results is set to true in the ListEvaluationsRequest or GetEvaluationRequest.
Union field `inputs`. The inputs for the evaluation. `inputs` can be only one of the following:
`golden`	`object (Golden)` Optional. The golden steps to be evaluated.
`scenario`	`object (Scenario)` Optional. The config for a scenario.
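
Because `inputs` is a union, a returned `Evaluation` carries either `golden` or `scenario`, never both. Client code can branch on that; the helper below is an illustrative sketch (the function name is hypothetical) that operates on an `Evaluation` parsed into a dict.

```python
def evaluation_input_kind(evaluation):
    """Return "golden", "scenario", or None for an Evaluation dict."""
    present = [key for key in ("golden", "scenario") if key in evaluation]
    if len(present) > 1:
        raise ValueError("union field 'inputs' has more than one member set")
    return present[0] if present else None


sample = {"displayName": "refund flow", "scenario": {"task": "Request a refund"}}
print(evaluation_input_kind(sample))
```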

Golden

JSON representation
{ "turns": [ { object (`GoldenTurn`) } ], "evaluationExpectations": [ string ] }

Fields
`turns[]`	`object (GoldenTurn)` Required. The golden turns required to replay a golden conversation.
`evaluationExpectations[]`	`string` Optional. The evaluation expectations to evaluate the replayed conversation against. Format: `projects/{project}/locations/{location}/apps/{app}/evaluationExpectations/{evaluationExpectation}`

GoldenTurn

JSON representation
{ "steps": [ { object (`Step`) } ], "rootSpan": { object (`Span`) } }

Fields
`steps[]`	`object (Step)` Required. The steps required to replay a golden conversation.
`rootSpan`	`object (Span)` Optional. The root span of the golden turn for processing and maintaining audio information.

Step

JSON representation

{

  // Union field step can be only one of the following:
  "userInput": {
    object (SessionInput)
  },
  "agentTransfer": {
    object (AgentTransfer)
  },
  "expectation": {
    object (GoldenExpectation)
  }
  // End of list of possible types for union field step.
}

Fields
Union field `step`. The step to perform. `step` can be only one of the following:
`userInput`	`object (SessionInput)` Optional. User input for the conversation.
`agentTransfer`	`object (AgentTransfer)` Optional. Transfer the conversation to a different agent.
`expectation`	`object (GoldenExpectation)` Optional. Executes an expectation on the current turn.

SessionInput

JSON representation

{
  "willContinue": boolean,

  // Union field input_type can be only one of the following:
  "text": string,
  "dtmf": string,
  "audio": string,
  "toolResponses": {
    object (ToolResponses)
  },
  "image": {
    object (Image)
  },
  "blob": {
    object (Blob)
  },
  "variables": {
    object
  },
  "event": {
    object (Event)
  }
  // End of list of possible types for union field input_type.
}

Fields
`willContinue`	`boolean` Optional. A flag to indicate if the current message is a fragment of a larger input in the bidi streaming session. When set to `true`, the agent defers processing until it receives a subsequent message where `will_continue` is `false`, or until the system detects an endpoint in the audio input. NOTE: This field does not apply to audio and DTMF inputs, as they are always processed automatically based on the endpointing signal.
Union field `input_type`. The type of the input. `input_type` can be only one of the following:
`text`	`string` Optional. Text data from the end user.
`dtmf`	`string` Optional. DTMF digits from the end user.
`audio`	`string (bytes format)` Optional. Audio data from the end user. A base64-encoded string.
`toolResponses`	`object (ToolResponses)` Optional. Execution results for the tool calls from the client.
`image`	`object (Image)` Optional. Image data from the end user.
`blob`	`object (Blob)` Optional. Blob data from the end user.
`variables`	`object (Struct format)` Optional. Contextual variables for the session, keyed by name. Only variables declared in the app will be used by the CES agent. Unrecognized variables will still be sent to the [Dialogflow agent][Agent.RemoteDialogflowAgent] as additional session parameters.
`event`	`object (Event)` Optional. Event input.

ToolResponses

JSON representation
{ "toolResponses": [ { object (`ToolResponse`) } ] }

Fields
`toolResponses[]`	`object (ToolResponse)` Optional. The list of tool execution results.

ToolResponse

JSON representation

{
  "id": string,
  "displayName": string,
  "response": {
    object
  },

  // Union field tool_identifier can be only one of the following:
  "tool": string,
  "toolsetTool": {
    object (ToolsetTool)
  }
  // End of list of possible types for union field tool_identifier.
}

Fields
`id`	`string` Optional. The matching ID of the `tool call` the response is for.
`displayName`	`string` Output only. Display name of the tool.
`response`	`object (Struct format)` Required. The tool execution result in JSON object format. Use the "output" key to specify the tool response and the "error" key to specify error details (if any). If neither key is specified, the whole "response" is treated as the tool execution result.
Union field `tool_identifier`. The identifier of the tool that got executed. It could be either a persisted tool or a tool from a toolset. `tool_identifier` can be only one of the following:
`tool`	`string` Optional. The name of the tool to execute. Format: `projects/{project}/locations/{location}/apps/{app}/tools/{tool}`
`toolsetTool`	`object (ToolsetTool)` Optional. The toolset tool that got executed.
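
The `response` convention can be read as a small decision rule. The sketch below is one interpretation, assuming "not specified" means that neither key is present; the function name is hypothetical.

```python
def split_tool_response(response):
    """Return (output, error) from a ToolResponse.response object."""
    if "output" in response or "error" in response:
        return response.get("output"), response.get("error")
    # Neither key present: the whole object is the execution result.
    return response, None


print(split_tool_response({"output": {"balance": 42}}))
print(split_tool_response({"error": "timeout"}))
print(split_tool_response({"balance": 42}))
```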

ToolsetTool

JSON representation
{ "toolset": string, "toolId": string }

Fields
`toolset`	`string` Required. The resource name of the Toolset from which this tool is derived. Format: `projects/{project}/locations/{location}/apps/{app}/toolsets/{toolset}`
`toolId`	`string` Optional. The tool ID to filter the tools to retrieve the schema for.

Struct

JSON representation
{ "fields": { string: value, ... } }

Fields
`fields`	`map (key: string, value: value (Value format))` Unordered map of dynamically typed values. An object containing a list of `"key": value` pairs. Example: `{ "name": "wrench", "mass": "1.3kg", "count": "3" }`.

FieldsEntry

JSON representation
{ "key": string, "value": value }

Fields
`key`	`string`
`value`	`value (Value format)`

Value

JSON representation

{

  // Union field kind can be only one of the following:
  "nullValue": null,
  "numberValue": number,
  "stringValue": string,
  "boolValue": boolean,
  "structValue": {
    object
  },
  "listValue": array
  // End of list of possible types for union field kind.
}

Fields
Union field `kind`. The kind of value. `kind` can be only one of the following:
`nullValue`	`null` Represents a null value.
`numberValue`	`number` Represents a double value.
`stringValue`	`string` Represents a string value.
`boolValue`	`boolean` Represents a boolean value.
`structValue`	`object (Struct format)` Represents a structured value.
`listValue`	`array (ListValue format)` Represents a repeated `Value`.
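
To make the `kind` union concrete, the sketch below maps native Python values onto the member names listed above. It follows the expanded schema as printed on this page, nesting `fields` under `structValue` and `values` under `listValue`, which is an assumption; the standard protobuf JSON mapping normally lets a `Struct` be written as a plain JSON object, as the `{ "name": "wrench", ... }` example suggests. The function name is hypothetical.

```python
def to_value(obj):
    """Wrap a Python value in the Value union shape shown above."""
    if obj is None:
        return {"nullValue": None}
    # bool must be tested before int/float because bool is a subclass of int.
    if isinstance(obj, bool):
        return {"boolValue": obj}
    if isinstance(obj, (int, float)):
        return {"numberValue": float(obj)}
    if isinstance(obj, str):
        return {"stringValue": obj}
    if isinstance(obj, dict):
        fields = {str(key): to_value(val) for key, val in obj.items()}
        return {"structValue": {"fields": fields}}
    if isinstance(obj, (list, tuple)):
        return {"listValue": {"values": [to_value(item) for item in obj]}}
    raise TypeError(f"unsupported type: {type(obj).__name__}")


print(to_value({"name": "wrench", "count": 3, "tags": ["a", None]}))
```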

ListValue

JSON representation
{ "values": [ value ] }

Fields
`values[]`	`value (Value format)` Repeated field of dynamically typed values.

Image

JSON representation
{ "mimeType": string, "data": string }

Fields
`mimeType`	`string` Required. The IANA standard MIME type of the source data. Supported image types include: image/png, image/jpeg, image/webp.
`data`	`string (bytes format)` Required. Raw bytes of the image. A base64-encoded string.
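
Since `data` is a base64-encoded string, raw image bytes have to be encoded before they are placed in an `Image` object. A minimal sketch, with a hypothetical helper name and a few placeholder bytes standing in for a real image:

```python
import base64

# MIME types listed as supported in the mimeType description.
SUPPORTED_IMAGE_TYPES = {"image/png", "image/jpeg", "image/webp"}


def image_input(raw, mime_type):
    """Build an Image object from raw bytes."""
    if mime_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type: {mime_type}")
    return {
        "mimeType": mime_type,
        "data": base64.b64encode(raw).decode("ascii"),
    }


png_signature = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a full image
session_input = {"image": image_input(png_signature, "image/png")}
print(session_input)
```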

Blob

JSON representation
{ "mimeType": string, "data": string }

Fields
`mimeType`	`string` Required. The IANA standard MIME type of the source data.
`data`	`string (bytes format)` Required. Raw bytes of the blob. A base64-encoded string.

Event

JSON representation
{ "event": string }

Fields
`event`	`string` Required. The name of the event.

AgentTransfer

JSON representation
{ "targetAgent": string, "displayName": string }

Fields
`targetAgent`	`string` Required. The agent to which the conversation is being transferred. The agent will handle the conversation from this point forward. Format: `projects/{project}/locations/{location}/apps/{app}/agents/{agent}`
`displayName`	`string` Output only. Display name of the agent.

GoldenExpectation

JSON representation

{
  "note": string,

  // Union field condition can be only one of the following:
  "toolCall": {
    object (ToolCall)
  },
  "toolResponse": {
    object (ToolResponse)
  },
  "agentResponse": {
    object (Message)
  },
  "agentTransfer": {
    object (AgentTransfer)
  },
  "updatedVariables": {
    object
  },
  "mockToolResponse": {
    object (ToolResponse)
  }
  // End of list of possible types for union field condition.
}

Fields
`note`	`string` Optional. A note for this requirement, useful in reporting when specific checks fail. E.g., "Check_Payment_Tool_Called".
Union field `condition`. The actual check to perform. `condition` can be only one of the following:
`toolCall`	`object (ToolCall)` Optional. Check that a specific tool was called with the parameters.
`toolResponse`	`object (ToolResponse)` Optional. Check that a specific tool had the expected response.
`agentResponse`	`object (Message)` Optional. Check that the agent responded with the correct response. The role "agent" is implied.
`agentTransfer`	`object (AgentTransfer)` Optional. Check that the agent transferred the conversation to a different agent.
`updatedVariables`	`object (Struct format)` Optional. Check that the agent updated the session variables to the expected values. Also used to capture agent variable updates for golden evals.
`mockToolResponse`	`object (ToolResponse)` Optional. The tool response to mock, with the parameters of interest specified. Any parameters not specified will be hallucinated by the LLM.
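
Putting `Golden`, `GoldenTurn`, `Step`, and `GoldenExpectation` together, a single-turn golden can be pictured as a user input step followed by an expectation step. The sketch below shows the nesting only; the helper name and the conversation text are made up, and each `Step` sets exactly one member of its union.

```python
def text_turn(user_text, expected_agent_text, note=None):
    """Build a GoldenTurn: one user text input, then one agentResponse check."""
    expectation = {
        # The role "agent" is implied, so only the chunks are given.
        "agentResponse": {"chunks": [{"text": expected_agent_text}]},
    }
    if note:
        expectation["note"] = note
    return {
        "steps": [
            {"userInput": {"text": user_text}},
            {"expectation": expectation},
        ]
    }


golden = {
    "turns": [
        text_turn("What is my balance?", "Your balance is $42.",
                  note="Check_Balance_Response"),
    ]
}
print(golden)
```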

ToolCall

JSON representation

{
  "id": string,
  "displayName": string,
  "args": {
    object
  },

  // Union field tool_identifier can be only one of the following:
  "tool": string,
  "toolsetTool": {
    object (ToolsetTool)
  }
  // End of list of possible types for union field tool_identifier.
}

Fields
`id`	`string` Optional. The unique identifier of the tool call. If populated, the client should return the execution result with the matching ID in `ToolResponse`.
`displayName`	`string` Output only. Display name of the tool.
`args`	`object (Struct format)` Optional. The input parameters and values for the tool in JSON object format.
Union field `tool_identifier`. The identifier of the tool to execute. It could be either a persisted tool or a tool from a toolset. `tool_identifier` can be only one of the following:
`tool`	`string` Optional. The name of the tool to execute. Format: `projects/{project}/locations/{location}/apps/{app}/tools/{tool}`
`toolsetTool`	`object (ToolsetTool)` Optional. The toolset tool to execute.

Message

JSON representation
{ "role": string, "chunks": [ { object (`Chunk`) } ], "eventTime": string }

Fields
`role`	`string` Optional. The role within the conversation, e.g., user, agent.
`chunks[]`	`object (Chunk)` Optional. Content of the message as a series of chunks.
`eventTime`	`string (Timestamp format)` Optional. Timestamp when the message was sent or received. Should not be used if the message is part of an `example`. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.

Chunk

JSON representation

{

  // Union field data can be only one of the following:
  "text": string,
  "transcript": string,
  "blob": {
    object (Blob)
  },
  "payload": {
    object
  },
  "image": {
    object (Image)
  },
  "toolCall": {
    object (ToolCall)
  },
  "toolResponse": {
    object (ToolResponse)
  },
  "agentTransfer": {
    object (AgentTransfer)
  },
  "updatedVariables": {
    object
  },
  "defaultVariables": {
    object
  }
  // End of list of possible types for union field data.
}

Fields
Union field `data`. Chunk data. `data` can be only one of the following:
`text`	`string` Optional. Text data.
`transcript`	`string` Optional. Transcript associated with the audio.
`blob`	`object (Blob)` Optional. Blob data.
`payload`	`object (Struct format)` Optional. Custom payload data.
`image`	`object (Image)` Optional. Image data.
`toolCall`	`object (ToolCall)` Optional. Tool execution request.
`toolResponse`	`object (ToolResponse)` Optional. Tool execution response.
`agentTransfer`	`object (AgentTransfer)` Optional. Agent transfer event.
`updatedVariables`	`object (Struct format)` A struct representing variables that were updated in the conversation, keyed by variable name.
`defaultVariables`	`object (Struct format)` A struct representing default variables at the start of the conversation, keyed by variable name.

Timestamp

JSON representation
{ "seconds": string, "nanos": integer }

Fields
`seconds`	`string (int64 format)` Represents seconds of UTC time since Unix epoch 1970-01-01T00:00:00Z. Must be between -62135596800 and 253402300799 inclusive (which corresponds to 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z).
`nanos`	`integer` Non-negative fractions of a second at nanosecond resolution. This field is the nanosecond portion of the duration, not an alternative to seconds. Negative second values with fractions must still have non-negative nanos values that count forward in time. Must be between 0 and 999,999,999 inclusive.
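
The relationship between this `seconds`/`nanos` pair and the RFC 3339 strings used by the `Timestamp format` fields elsewhere on this page can be sketched as follows. The function name is hypothetical; the range checks come from the field descriptions above, and the output is Z-normalized with 0, 3, 6, or 9 fractional digits.

```python
from datetime import datetime, timedelta

UNIX_EPOCH = datetime(1970, 1, 1)


def timestamp_to_rfc3339(seconds, nanos=0):
    """Format a Timestamp (seconds, nanos) as a Z-normalized RFC 3339 string."""
    seconds = int(seconds)  # int64 values arrive as JSON strings
    if not -62135596800 <= seconds <= 253402300799:
        raise ValueError("seconds out of range")
    if not 0 <= nanos <= 999_999_999:
        raise ValueError("nanos out of range")
    base = (UNIX_EPOCH + timedelta(seconds=seconds)).isoformat(timespec="seconds")
    if nanos == 0:
        fraction = ""
    elif nanos % 1_000_000 == 0:
        fraction = f".{nanos // 1_000_000:03d}"
    elif nanos % 1_000 == 0:
        fraction = f".{nanos // 1_000:06d}"
    else:
        fraction = f".{nanos:09d}"
    return f"{base}{fraction}Z"


print(timestamp_to_rfc3339("1412262083"))
print(timestamp_to_rfc3339("1412262083", 45123456))
```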

Span

JSON representation
{ "name": string, "startTime": string, "endTime": string, "duration": string, "attributes": { object }, "childSpans": [ { object (`Span`) } ] }

Fields
`name`	`string` Output only. The name of the span.
`startTime`	`string (Timestamp format)` Output only. The start time of the span. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`endTime`	`string (Timestamp format)` Output only. The end time of the span. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`duration`	`string (Duration format)` Output only. The duration of the span. A duration in seconds with up to nine fractional digits, ending with '`s`'. Example: `"3.5s"`.
`attributes`	`object (Struct format)` Output only. Key-value attributes associated with the span.
`childSpans[]`	`object (Span)` Output only. The child spans that are nested under this span.
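
Because `childSpans[]` nests `Span` objects recursively, a span tree is most easily inspected with a depth-first walk. A minimal sketch, assuming the span has already been parsed into a dict; the function name and the sample span names are made up.

```python
def walk_spans(span, depth=0):
    """Yield (depth, span) pairs in depth-first order."""
    yield depth, span
    for child in span.get("childSpans", []):
        yield from walk_spans(child, depth + 1)


root = {
    "name": "turn",
    "duration": "3.5s",
    "childSpans": [
        {"name": "tool_call", "duration": "1.2s"},
        {"name": "response", "duration": "2.1s",
         "childSpans": [{"name": "tts", "duration": "0.4s"}]},
    ],
}
for depth, span in walk_spans(root):
    print("  " * depth + f"{span['name']} ({span['duration']})")
```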

Duration

JSON representation
{ "seconds": string, "nanos": integer }

Fields
`seconds`	`string (int64 format)` Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years
`nanos`	`integer` Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 seconds field and a positive or negative nanos field. For durations of one second or more, a non-zero value for the nanos field must be of the same sign as the seconds field. Must be from -999,999,999 to +999,999,999 inclusive.
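
The sign rules above, and the way a `seconds`/`nanos` pair corresponds to the `"3.5s"` string form used by `Duration format` fields, can be illustrated with a small converter. This is a sketch with a hypothetical function name, not part of the API.

```python
MAX_DURATION_SECONDS = 315_576_000_000  # 60 * 60 * 24 * 365.25 * 10000


def duration_to_string(seconds, nanos=0):
    """Format a Duration (seconds, nanos) as a string such as "3.5s"."""
    seconds = int(seconds)  # int64 values arrive as JSON strings
    if abs(seconds) > MAX_DURATION_SECONDS:
        raise ValueError("seconds out of range")
    if abs(nanos) > 999_999_999:
        raise ValueError("nanos out of range")
    if seconds and nanos and (seconds < 0) != (nanos < 0):
        raise ValueError("nanos must have the same sign as seconds")
    sign = "-" if seconds < 0 or nanos < 0 else ""
    # Nine-digit fraction with trailing zeros dropped; empty when nanos is 0.
    fraction = f"{abs(nanos):09d}".rstrip("0")
    body = str(abs(seconds)) + (f".{fraction}" if fraction else "")
    return f"{sign}{body}s"


print(duration_to_string("3", 500_000_000))
print(duration_to_string(0, -500_000_000))
```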

Scenario

JSON representation

{
  "task": string,
  "userFacts": [
    {
      object (UserFact)
    }
  ],
  "maxTurns": integer,
  "rubrics": [
    string
  ],
  "scenarioExpectations": [
    {
      object (ScenarioExpectation)
    }
  ],
  "variableOverrides": {
    object
  },
  "taskCompletionBehavior": enum (TaskCompletionBehavior),
  "userGoalBehavior": enum (UserGoalBehavior),
  "evaluationExpectations": [
    string
  ]
}

Fields
`task`	`string` Required. The task to be targeted by the scenario.
`userFacts[]`	`object (UserFact)` Optional. The user facts to be used by the scenario.
`maxTurns`	`integer` Optional. The maximum number of turns to simulate. If not specified, the simulation will continue until the task is complete.
`rubrics[]`	`string` Required. The rubrics to score the scenario against.
`scenarioExpectations[]`	`object (ScenarioExpectation)` Required. The ScenarioExpectations to evaluate the conversation produced by the user simulation.
`variableOverrides`	`object (Struct format)` Optional. Variables / Session Parameters as context for the session, keyed by variable name. Members of this struct override any default values set by the system. Note that these are different from user facts, which are facts known to the user. Variables are parameters known to the agent, e.g. the MDN (phone number) passed by the telephony system.
`taskCompletionBehavior (deprecated)`	`enum (TaskCompletionBehavior)` Optional. Deprecated: use `user_goal_behavior` instead.
`userGoalBehavior`	`enum (UserGoalBehavior)` Optional. The expected behavior of the user goal.
`evaluationExpectations[]`	`string` Optional. The evaluation expectations to evaluate the conversation produced by the simulation against. Format: `projects/{project}/locations/{location}/apps/{app}/evaluationExpectations/{evaluationExpectation}`
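
When inspecting or assembling a `Scenario`, it can help to check the fields marked Required above and to spot the deprecated one. The sketch below is a client-side convenience only; the function name is hypothetical, and treating an empty list as missing is an assumption rather than documented server behavior.

```python
REQUIRED_SCENARIO_FIELDS = ("task", "rubrics", "scenarioExpectations")


def check_scenario(scenario):
    """Return (missing_required_fields, uses_deprecated_field)."""
    missing = [name for name in REQUIRED_SCENARIO_FIELDS
               if not scenario.get(name)]
    uses_deprecated = "taskCompletionBehavior" in scenario
    return missing, uses_deprecated


draft = {
    "task": "Change the delivery address",  # made-up task
    "rubrics": ["Agent confirms the new address"],
    "maxTurns": 6,
}
print(check_scenario(draft))
```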

UserFact

JSON representation
{ "name": string, "value": string }

Fields
`name`	`string` Required. The name of the user fact.
`value`	`string` Required. The value of the user fact.

ScenarioExpectation

JSON representation

{

  // Union field expectation can be only one of the following:
  "toolExpectation": {
    object (ToolExpectation)
  },
  "agentResponse": {
    object (Message)
  }
  // End of list of possible types for union field expectation.
}

Fields
Union field `expectation`. The expectation to evaluate the conversation produced by the simulation. `expectation` can be only one of the following:
`toolExpectation`	`object (ToolExpectation)` Optional. The tool call and response pair to be evaluated.
`agentResponse`	`object (Message)` Optional. The agent response to be evaluated.

ToolExpectation

JSON representation
{ "expectedToolCall": { object (`ToolCall`) }, "mockToolResponse": { object (`ToolResponse`) } }

Fields
`expectedToolCall`	`object (ToolCall)` Required. The expected tool call, with the parameters of interest specified. Any parameters not specified will be hallucinated by the LLM.
`mockToolResponse`	`object (ToolResponse)` Required. The tool response to mock, with the parameters of interest specified. Any parameters not specified will be hallucinated by the LLM.

AggregatedMetrics

JSON representation
{ "metricsByAppVersion": [ { object (`MetricsByAppVersion`) } ] }

Fields
`metricsByAppVersion[]`	`object (MetricsByAppVersion)` Output only. Aggregated metrics, grouped by app version ID.

MetricsByAppVersion

JSON representation

{
  "appVersionId": string,
  "toolMetrics": [
    {
      object (ToolMetrics)
    }
  ],
  "semanticSimilarityMetrics": [
    {
      object (SemanticSimilarityMetrics)
    }
  ],
  "hallucinationMetrics": [
    {
      object (HallucinationMetrics)
    }
  ],
  "toolCallLatencyMetrics": [
    {
      object (ToolCallLatencyMetrics)
    }
  ],
  "turnLatencyMetrics": [
    {
      object (TurnLatencyMetrics)
    }
  ],
  "passCount": integer,
  "failCount": integer,
  "metricsByTurn": [
    {
      object (MetricsByTurn)
    }
  ]
}

Fields
`appVersionId`	`string` Output only. The app version ID.
`toolMetrics[]`	`object (ToolMetrics)` Output only. Metrics for each tool within this app version.
`semanticSimilarityMetrics[]`	`object (SemanticSimilarityMetrics)` Output only. Metrics for semantic similarity within this app version.
`hallucinationMetrics[]`	`object (HallucinationMetrics)` Output only. Metrics for hallucination within this app version.
`toolCallLatencyMetrics[]`	`object (ToolCallLatencyMetrics)` Output only. Metrics for tool call latency within this app version.
`turnLatencyMetrics[]`	`object (TurnLatencyMetrics)` Output only. Metrics for turn latency within this app version.
`passCount`	`integer` Output only. The number of times the evaluation passed.
`failCount`	`integer` Output only. The number of times the evaluation failed.
`metricsByTurn[]`	`object (MetricsByTurn)` Output only. Metrics aggregated per turn within this app version.
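
As a minimal sketch of how `passCount` and `failCount` might be consumed, the snippet below derives a pass rate from one `MetricsByAppVersion` object that has already been parsed from JSON into a dictionary. The helper name `pass_rate` and the sample values are hypothetical, not part of the API.

```python
from typing import Optional


def pass_rate(metrics: dict) -> Optional[float]:
    """Return passCount / (passCount + failCount), or None if nothing ran."""
    # Missing counters are treated as zero (an assumption of this sketch).
    passed = int(metrics.get("passCount", 0))
    failed = int(metrics.get("failCount", 0))
    total = passed + failed
    return passed / total if total else None


# Hypothetical sample shaped like MetricsByAppVersion.
sample = {"appVersionId": "example-version", "passCount": 3, "failCount": 1}
print(pass_rate(sample))
```

Returning `None` when both counters are zero keeps "no runs" distinct from "every run failed".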

ToolMetrics

JSON representation
{ "tool": string, "passCount": integer, "failCount": integer }

Fields

tool

string

Output only. The name of the tool.

passCount

integer

Output only. The number of times the tool passed.

failCount

integer

Output only. The number of times the tool failed.

SemanticSimilarityMetrics

JSON representation
{ "score": number }

Fields

score

number

Output only. The average semantic similarity score (0-4).

HallucinationMetrics

JSON representation
{ "score": number }

Fields

score

number

Output only. The average hallucination score (0 to 1).

ToolCallLatencyMetrics

JSON representation
{ "tool": string, "averageLatency": string }

Fields

tool

string

Output only. The name of the tool.

averageLatency

string (Duration format)

Output only. The average latency of the tool calls.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".
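
A client usually has to turn a Duration string into a number before comparing latencies. The following is a minimal sketch, assuming the value always matches the format described above (decimal seconds followed by `s`). The helper name `parse_duration` is hypothetical.

```python
from decimal import Decimal


def parse_duration(value: str) -> Decimal:
    """Convert a Duration string such as "3.5s" into seconds."""
    if not value.endswith("s"):
        raise ValueError(f"not a Duration string: {value!r}")
    # Decimal keeps all nine fractional digits without float rounding.
    return Decimal(value[:-1])


print(parse_duration("3.5s"))
```

`Decimal` is used rather than `float` so that nanosecond-level values are preserved exactly.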

TurnLatencyMetrics

JSON representation
{ "averageLatency": string }

Fields

averageLatency

string (Duration format)

Output only. The average latency of the turns.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

MetricsByTurn

JSON representation

{
  "turnIndex": integer,
  "toolMetrics": [
    {
      object (ToolMetrics)
    }
  ],
  "semanticSimilarityMetrics": [
    {
      object (SemanticSimilarityMetrics)
    }
  ],
  "hallucinationMetrics": [
    {
      object (HallucinationMetrics)
    }
  ],
  "toolCallLatencyMetrics": [
    {
      object (ToolCallLatencyMetrics)
    }
  ],
  "turnLatencyMetrics": [
    {
      object (TurnLatencyMetrics)
    }
  ]
}

Fields
`turnIndex`	`integer` Output only. The turn index (0-based).
`toolMetrics[]`	`object (ToolMetrics)` Output only. Metrics for each tool within this turn.
`semanticSimilarityMetrics[]`	`object (SemanticSimilarityMetrics)` Output only. Metrics for semantic similarity within this turn.
`hallucinationMetrics[]`	`object (HallucinationMetrics)` Output only. Metrics for hallucination within this turn.
`toolCallLatencyMetrics[]`	`object (ToolCallLatencyMetrics)` Output only. Metrics for tool call latency within this turn.
`turnLatencyMetrics[]`	`object (TurnLatencyMetrics)` Output only. Metrics for turn latency within this turn.

EvaluationResult

JSON representation

{
  "name": string,
  "displayName": string,
  "createTime": string,
  "evaluationStatus": enum (Outcome),
  "evaluationRun": string,
  "persona": {
    object (EvaluationPersona)
  },
  "errorInfo": {
    object (EvaluationErrorInfo)
  },
  "error": {
    object (Status)
  },
  "initiatedBy": string,
  "appVersion": string,
  "appVersionDisplayName": string,
  "changelog": string,
  "changelogCreateTime": string,
  "executionState": enum (ExecutionState),
  "evaluationMetricsThresholds": {
    object (EvaluationMetricsThresholds)
  },
  "config": {
    object (EvaluationConfig)
  },
  "goldenRunMethod": enum (GoldenRunMethod),

  // Union field result can be only one of the following:
  "goldenResult": {
    object (GoldenResult)
  },
  "scenarioResult": {
    object (ScenarioResult)
  }
  // End of list of possible types for union field result.
}

Fields
`name`	`string` Identifier. The unique identifier of the evaluation result. Format: `projects/{project}/locations/{location}/apps/{app}/evaluations/{evaluation}/results/{result}`
`displayName`	`string` Required. Display name of the Evaluation Result. Unique within an Evaluation. By default, it has the following format: " result - ".
`createTime`	`string (Timestamp format)` Output only. Timestamp when the evaluation result was created. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`evaluationStatus`	`enum (Outcome)` Output only. The outcome of the evaluation. Only populated if execution_state is COMPLETE.
`evaluationRun`	`string` Output only. The evaluation run that produced this result. Format: `projects/{project}/locations/{location}/apps/{app}/evaluationRuns/{evaluationRun}`
`persona`	`object (EvaluationPersona)` Output only. The persona used to generate the conversation for the evaluation result.
`errorInfo`	`object (EvaluationErrorInfo)` Output only. Error information for the evaluation result.
`error (deprecated)`	`object (Status)` This item is deprecated! Output only. Deprecated: Use `error_info` instead. Errors encountered during execution.
`initiatedBy`	`string` Output only. The user who initiated the evaluation run that resulted in this result.
`appVersion`	`string` Output only. The app version used to generate the conversation that resulted in this result. Format: `projects/{project}/locations/{location}/apps/{app}/versions/{version}`
`appVersionDisplayName`	`string` Output only. The display name of the `app_version` that the evaluation ran against.
`changelog`	`string` Output only. The changelog of the app version that the evaluation ran against. This is populated if user runs evaluation on latest/draft.
`changelogCreateTime`	`string (Timestamp format)` Output only. The create time of the changelog of the app version that the evaluation ran against. This is populated if user runs evaluation on latest/draft. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`executionState`	`enum (ExecutionState)` Output only. The state of the evaluation result execution.
`evaluationMetricsThresholds`	`object (EvaluationMetricsThresholds)` Output only. The evaluation thresholds for the result.
`config`	`object (EvaluationConfig)` Output only. The configuration used in the evaluation run that resulted in this result.
`goldenRunMethod`	`enum (GoldenRunMethod)` Output only. The method used to run the golden evaluation.
Union field `result`. The result of the evaluation. Only populated when the execution_state is COMPLETED. `result` can be only one of the following:
`goldenResult`	`object (GoldenResult)` Output only. The outcome of a golden evaluation.
`scenarioResult`	`object (ScenarioResult)` Output only. The outcome of a scenario evaluation.
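
Because `result` is a union field, at most one of `goldenResult` and `scenarioResult` is present in a given `EvaluationResult`. The sketch below shows one way a client might branch on that, assuming the result has been parsed into a dictionary. The helper name `result_kind` is hypothetical.

```python
from typing import Optional


def result_kind(evaluation_result: dict) -> Optional[str]:
    """Return the name of the populated `result` union member, if any."""
    present = [
        key
        for key in ("goldenResult", "scenarioResult")
        if key in evaluation_result
    ]
    if len(present) > 1:
        raise ValueError("union field `result` has more than one member set")
    # The union is unset until the evaluation has finished executing.
    return present[0] if present else None


print(result_kind({"name": "example", "goldenResult": {}}))
```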

GoldenResult

JSON representation

{
  "turnReplayResults": [
    {
      object (TurnReplayResult)
    }
  ],
  "evaluationExpectationResults": [
    {
      object (EvaluationExpectationResult)
    }
  ]
}

Fields

turnReplayResults[]

object (TurnReplayResult)

Output only. The result of running each turn of the golden conversation.

evaluationExpectationResults[]

object (EvaluationExpectationResult)

Output only. The results of the evaluation expectations.

TurnReplayResult

JSON representation

{
  "conversation": string,
  "expectationOutcome": [
    {
      object (GoldenExpectationOutcome)
    }
  ],
  "hallucinationResult": {
    object (HallucinationResult)
  },
  "toolInvocationScore": number,
  "turnLatency": string,
  "toolCallLatencies": [
    {
      object (ToolCallLatency)
    }
  ],
  "semanticSimilarityResult": {
    object (SemanticSimilarityResult)
  },
  "overallToolInvocationResult": {
    object (OverallToolInvocationResult)
  },
  "errorInfo": {
    object (EvaluationErrorInfo)
  },
  "spanLatencies": [
    {
      object (SpanLatency)
    }
  ],

  // Union field _tool_ordered_invocation_score can be only one of the following:
  "toolOrderedInvocationScore": number
  // End of list of possible types for union field
  // _tool_ordered_invocation_score.
}

Fields
`conversation`	`string` Output only. The conversation that was generated for this turn.
`expectationOutcome[]`	`object (GoldenExpectationOutcome)` Output only. The outcome of each expectation.
`hallucinationResult`	`object (HallucinationResult)` Output only. The result of the hallucination check.
`toolInvocationScore (deprecated)`	`number` This item is deprecated! Output only. Deprecated. Use OverallToolInvocationResult instead.
`turnLatency`	`string (Duration format)` Output only. Duration of the turn. A duration in seconds with up to nine fractional digits, ending with '`s`'. Example: `"3.5s"`.
`toolCallLatencies[]`	`object (ToolCallLatency)` Output only. The latency of each tool call in the turn.
`semanticSimilarityResult`	`object (SemanticSimilarityResult)` Output only. The result of the semantic similarity check.
`overallToolInvocationResult`	`object (OverallToolInvocationResult)` Output only. The result of the overall tool invocation check.
`errorInfo`	`object (EvaluationErrorInfo)` Output only. Information about the error that occurred during this turn.
`spanLatencies[]`	`object (SpanLatency)` Output only. The latency of spans in the turn.
Union field `_tool_ordered_invocation_score`. `_tool_ordered_invocation_score` can be only one of the following:
`toolOrderedInvocationScore`	`number` Output only. The overall tool ordered invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked in the expected order.

GoldenExpectationOutcome

JSON representation

{
  "expectation": {
    object (GoldenExpectation)
  },
  "outcome": enum (Outcome),
  "semanticSimilarityResult": {
    object (SemanticSimilarityResult)
  },
  "toolInvocationResult": {
    object (ToolInvocationResult)
  },

  // Union field result can be only one of the following:
  "observedToolCall": {
    object (ToolCall)
  },
  "observedToolResponse": {
    object (ToolResponse)
  },
  "observedAgentResponse": {
    object (Message)
  },
  "observedAgentTransfer": {
    object (AgentTransfer)
  }
  // End of list of possible types for union field result.
}

Fields
`expectation`	`object (GoldenExpectation)` Output only. The expectation that was evaluated.
`outcome`	`enum (Outcome)` Output only. The outcome of the expectation.
`semanticSimilarityResult (deprecated)`	`object (SemanticSimilarityResult)` This item is deprecated! Output only. The result of the semantic similarity check.
`toolInvocationResult`	`object (ToolInvocationResult)` Output only. The result of the tool invocation check.
Union field `result`. The result of the expectation. `result` can be only one of the following:
`observedToolCall`	`object (ToolCall)` Output only. The result of the tool call expectation.
`observedToolResponse`	`object (ToolResponse)` Output only. The result of the tool response expectation.
`observedAgentResponse`	`object (Message)` Output only. The result of the agent response expectation.
`observedAgentTransfer`	`object (AgentTransfer)` Output only. The result of the agent transfer expectation.

SemanticSimilarityResult

JSON representation

{
  "label": string,
  "explanation": string,
  "outcome": enum (Outcome),

  // Union field _score can be only one of the following:
  "score": integer
  // End of list of possible types for union field _score.
}

Fields
`label`	`string` Output only. The label associated with each score. Score 4: Fully Consistent Score 3: Mostly Consistent Score 2: Partially Consistent (Minor Omissions) Score 1: Largely Inconsistent (Major Omissions) Score 0: Completely Inconsistent / Contradictory
`explanation`	`string` Output only. The explanation for the semantic similarity score.
`outcome`	`enum (Outcome)` Output only. The outcome of the semantic similarity check. This is determined by comparing the score to the semantic_similarity_success_threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.
Union field `_score`. `_score` can be only one of the following:
`score`	`integer` Output only. The semantic similarity score. Can be 0, 1, 2, 3, or 4.
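
The pass/fail rule described for `outcome` can be sketched as follows. This only mirrors the documented comparison (score at or above the threshold passes); the function name is hypothetical, and the default of 3 is taken from the `semanticSimilaritySuccessThreshold` description elsewhere in this reference.

```python
def semantic_similarity_outcome(score: int, threshold: int = 3) -> str:
    """Return "PASS" when score >= threshold, otherwise "FAIL"."""
    if not 0 <= score <= 4:
        raise ValueError("semantic similarity score must be 0, 1, 2, 3, or 4")
    return "PASS" if score >= threshold else "FAIL"


print(semantic_similarity_outcome(3))
print(semantic_similarity_outcome(2))
```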

ToolInvocationResult

JSON representation

{
  "outcome": enum (Outcome),
  "explanation": string,

  // Union field _parameter_correctness_score can be only one of the following:
  "parameterCorrectnessScore": number
  // End of list of possible types for union field _parameter_correctness_score.
}

Fields
`outcome`	`enum (Outcome)` Output only. The outcome of the tool invocation check. This is determined by comparing the parameter_correctness_score to the threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.
`explanation`	`string` Output only. A free text explanation for the tool invocation result.
Union field `_parameter_correctness_score`. `_parameter_correctness_score` can be only one of the following:
`parameterCorrectnessScore`	`number` Output only. The tool invocation parameter correctness score. This indicates the percent of parameters from the expected tool call that were also present in the actual tool call.
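
To make the idea of a parameter correctness score concrete, here is a rough sketch that computes the fraction of expected parameter names that also appear in the actual call. It is an illustration only: the service's real matching rules (for example, whether values must also match) are not specified here, and the function name, argument shapes, and the empty-expectation behavior are all assumptions.

```python
def parameter_correctness(expected_args: dict, actual_args: dict) -> float:
    """Fraction of expected parameter names present in the actual call."""
    if not expected_args:
        # Assumption: nothing expected means nothing can be missing.
        return 1.0
    present = sum(1 for name in expected_args if name in actual_args)
    return present / len(expected_args)


# Hypothetical tool-call arguments.
expected = {"city": "Paris", "units": "metric"}
actual = {"city": "Paris"}
score = parameter_correctness(expected, actual)
print(score, "PASS" if score >= 1.0 else "FAIL")
```

The comparison against `1.0` uses the documented default for the parameter correctness threshold.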

HallucinationResult

JSON representation

{
  "label": string,
  "explanation": string,

  // Union field _score can be only one of the following:
  "score": integer
  // End of list of possible types for union field _score.
}

Fields
`label`	`string` Output only. The label associated with each score. Score 1: Justified Score 0: Not Justified Score -1: No Claim To Assess
`explanation`	`string` Output only. The explanation for the hallucination score.
Union field `_score`. `_score` can be only one of the following:
`score`	`integer` Output only. The hallucination score. Can be -1, 0, 1.

ToolCallLatency

JSON representation
{ "tool": string, "displayName": string, "startTime": string, "endTime": string, "executionLatency": string }

Fields
`tool`	`string` Output only. The name of the tool that got executed. Format: `projects/{project}/locations/{location}/apps/{app}/tools/{tool}`.
`displayName`	`string` Output only. The display name of the tool.
`startTime`	`string (Timestamp format)` Output only. The start time of the tool call execution. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`endTime`	`string (Timestamp format)` Output only. The end time of the tool call execution. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`executionLatency`	`string (Duration format)` Output only. The latency of the tool call execution. A duration in seconds with up to nine fractional digits, ending with '`s`'. Example: `"3.5s"`.
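
The Timestamp strings above can carry up to nine fractional digits and either `Z` or a numeric offset, which not every date parser accepts. The sketch below is one way to parse them with only the standard library and to derive an elapsed time from `startTime` and `endTime`. The helper name `parse_timestamp` is hypothetical, and digits beyond microseconds are truncated because `datetime` cannot store them.

```python
import re
from datetime import datetime, timedelta, timezone

_RFC3339 = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?(Z|[+-]\d{2}:\d{2})$"
)


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 timestamp into a timezone-aware datetime."""
    match = _RFC3339.match(value)
    if match is None:
        raise ValueError(f"not an RFC 3339 timestamp: {value!r}")
    base, fraction, offset = match.groups()
    # Pad to nanoseconds, then keep the first six digits (microseconds).
    micros = int((fraction or "").ljust(9, "0")[:6])
    moment = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(microsecond=micros)
    if offset == "Z":
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6])))
    return moment.replace(tzinfo=tz)


start = parse_timestamp("2014-10-02T15:01:23Z")
end = parse_timestamp("2014-10-02T15:01:23.045123456Z")
print(end - start)
```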

OverallToolInvocationResult

JSON representation

{
  "outcome": enum (Outcome),

  // Union field _tool_invocation_score can be only one of the following:
  "toolInvocationScore": number
  // End of list of possible types for union field _tool_invocation_score.
}

Fields

outcome

enum (Outcome)

Output only. The outcome of the tool invocation check. This is determined by comparing the tool_invocation_score to the overall_tool_invocation_correctness_threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.

Union field _tool_invocation_score.

_tool_invocation_score can be only one of the following:

toolInvocationScore

number

The overall tool invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked.

EvaluationErrorInfo

JSON representation

{
  "errorType": enum (ErrorType),
  "errorMessage": string,
  "sessionId": string
}

Fields

errorType

enum (ErrorType)

Output only. The type of error.

errorMessage

string

Output only. The error message.

sessionId

string

Output only. The session ID for the conversation that caused the error.

SpanLatency

JSON representation

{
  "type": enum (Type),
  "displayName": string,
  "startTime": string,
  "endTime": string,
  "executionLatency": string,

  // Union field identifier can be only one of the following:
  "resource": string,
  "toolset": {
    object (ToolsetTool)
  },
  "model": string,
  "callback": string
  // End of list of possible types for union field identifier.
}

Fields
`type`	`enum (Type)` Output only. The type of span.
`displayName`	`string` Output only. The display name of the span. Applicable to tool and guardrail spans.
`startTime`	`string (Timestamp format)` Output only. The start time of span. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`endTime`	`string (Timestamp format)` Output only. The end time of span. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`executionLatency`	`string (Duration format)` Output only. The latency of span. A duration in seconds with up to nine fractional digits, ending with '`s`'. Example: `"3.5s"`.
Union field `identifier`. The identifier of the specific item based on its type. `identifier` can be only one of the following:
`resource`	`string` Output only. The resource name of the guardrail or tool spans.
`toolset`	`object (ToolsetTool)` Output only. The toolset tool identifier.
`model`	`string` Output only. The name of the LLM span.
`callback`	`string` Output only. The name of the user callback span.

EvaluationExpectationResult

JSON representation

{
  "evaluationExpectation": string,
  "prompt": string,
  "outcome": enum (Outcome),
  "explanation": string
}

Fields
`evaluationExpectation`	`string` Output only. The evaluation expectation. Format: `projects/{project}/locations/{location}/apps/{app}/evaluationExpectations/{evaluation_expectation}`
`prompt`	`string` Output only. The prompt that was used for the evaluation.
`outcome`	`enum (Outcome)` Output only. The outcome of the evaluation expectation.
`explanation`	`string` Output only. The explanation for the result.

ScenarioResult

JSON representation

{
  "conversation": string,
  "task": string,
  "userFacts": [
    {
      object (UserFact)
    }
  ],
  "expectationOutcomes": [
    {
      object (ScenarioExpectationOutcome)
    }
  ],
  "rubricOutcomes": [
    {
      object (ScenarioRubricOutcome)
    }
  ],
  "hallucinationResult": [
    {
      object (HallucinationResult)
    }
  ],
  "taskCompletionResult": {
    object (TaskCompletionResult)
  },
  "toolCallLatencies": [
    {
      object (ToolCallLatency)
    }
  ],
  "userGoalSatisfactionResult": {
    object (UserGoalSatisfactionResult)
  },
  "spanLatencies": [
    {
      object (SpanLatency)
    }
  ],
  "evaluationExpectationResults": [
    {
      object (EvaluationExpectationResult)
    }
  ],

  // Union field _all_expectations_satisfied can be only one of the following:
  "allExpectationsSatisfied": boolean
  // End of list of possible types for union field _all_expectations_satisfied.

  // Union field _task_completed can be only one of the following:
  "taskCompleted": boolean
  // End of list of possible types for union field _task_completed.
}

Fields

conversation

string

Output only. The conversation that was generated in the scenario.

task

string

Output only. The task that was used when running the scenario for this result.

userFacts[]

object (UserFact)

Output only. The user facts that were used by the scenario for this result.

expectationOutcomes[]

object (ScenarioExpectationOutcome)

Output only. The outcome of each expectation.

rubricOutcomes[]

object (ScenarioRubricOutcome)

Output only. The outcome of the rubric.

hallucinationResult[]

object (HallucinationResult)

Output only. The result of the hallucination check. There will be one hallucination result for each turn in the conversation.

taskCompletionResult (deprecated)

object (TaskCompletionResult)

Output only. The result of the task completion check.

toolCallLatencies[]

object (ToolCallLatency)

Output only. The latency of each tool call execution in the conversation.

userGoalSatisfactionResult

object (UserGoalSatisfactionResult)

Output only. The result of the user goal satisfaction check.

spanLatencies[]

object (SpanLatency)

Output only. The latency of spans in the conversation.

evaluationExpectationResults[]

object (EvaluationExpectationResult)

Output only. The results of the evaluation expectations.

Union field _all_expectations_satisfied.

_all_expectations_satisfied can be only one of the following:

allExpectationsSatisfied

boolean

Output only. Whether all expectations were satisfied for this turn.

Union field _task_completed.

_task_completed can be only one of the following:

taskCompleted

boolean

Output only. Whether the task was completed for this turn. This is a composite of all expectations satisfied, no hallucinations, and user goal satisfaction.
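
The composite described for `taskCompleted` can be approximated client-side as in the sketch below. This is an interpretation, not the service's implementation: it assumes a hallucination score of 0 ("Not Justified") counts as a hallucination while -1 ("No Claim To Assess") does not, and that a user goal satisfaction score of 1 means satisfied. The function name is hypothetical.

```python
def task_completed(scenario_result: dict) -> bool:
    """Recompute a taskCompleted-style flag from a parsed ScenarioResult."""
    expectations_ok = scenario_result.get("allExpectationsSatisfied") is True
    # Assumption: only score 0 ("Not Justified") marks a hallucination.
    hallucination_free = all(
        item.get("score") != 0
        for item in scenario_result.get("hallucinationResult", [])
    )
    # Assumption: score 1 ("User Task Satisfied") is the only passing value.
    goal = scenario_result.get("userGoalSatisfactionResult", {})
    goal_ok = goal.get("score") == 1
    return expectations_ok and hallucination_free and goal_ok


# Hypothetical sample shaped like ScenarioResult.
sample = {
    "allExpectationsSatisfied": True,
    "hallucinationResult": [{"score": 1}, {"score": -1}],
    "userGoalSatisfactionResult": {"score": 1},
}
print(task_completed(sample))
```

When the response already includes `taskCompleted`, prefer that value; a recomputation like this is mainly useful for explaining why a run did not complete.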

ScenarioExpectationOutcome

JSON representation

{
  "expectation": {
    object (ScenarioExpectation)
  },
  "outcome": enum (Outcome),

  // Union field result can be only one of the following:
  "observedToolCall": {
    object (ObservedToolCall)
  },
  "observedAgentResponse": {
    object (Message)
  }
  // End of list of possible types for union field result.
}

Fields
`expectation`	`object (ScenarioExpectation)` Output only. The expectation that was evaluated.
`outcome`	`enum (Outcome)` Output only. The outcome of the ScenarioExpectation.
Union field `result`. The result of the expectation. `result` can be only one of the following:
`observedToolCall`	`object (ObservedToolCall)` Output only. The observed tool call.
`observedAgentResponse`	`object (Message)` Output only. The observed agent response.

ObservedToolCall

JSON representation

{
  "toolCall": {
    object (ToolCall)
  },
  "toolResponse": {
    object (ToolResponse)
  }
}

Fields

toolCall

object (ToolCall)

Output only. The observed tool call.

toolResponse

object (ToolResponse)

Output only. The observed tool response.

ScenarioRubricOutcome

JSON representation

{
  "rubric": string,
  "scoreExplanation": string,

  // Union field _score can be only one of the following:
  "score": number
  // End of list of possible types for union field _score.
}

Fields
`rubric`	`string` Output only. The rubric that was used to evaluate the conversation.
`scoreExplanation`	`string` Output only. The rater's response to the rubric.
Union field `_score`. `_score` can be only one of the following:
`score`	`number` Output only. The score of the conversation against the rubric.

TaskCompletionResult

JSON representation

{
  "label": string,
  "explanation": string,

  // Union field _score can be only one of the following:
  "score": integer
  // End of list of possible types for union field _score.
}

Fields
`label`	`string` Output only. The label associated with each score. Score 1: Task Completed Score 0: Task Not Completed Score -1: User Goal Undefined
`explanation`	`string` Output only. The explanation for the task completion score.
Union field `_score`. `_score` can be only one of the following:
`score`	`integer` Output only. The task completion score. Can be -1, 0, 1.

UserGoalSatisfactionResult

JSON representation

{
  "label": string,
  "explanation": string,

  // Union field _score can be only one of the following:
  "score": integer
  // End of list of possible types for union field _score.
}

Fields
`label`	`string` Output only. The label associated with each score. Score 1: User Task Satisfied Score 0: User Task Not Satisfied Score -1: User Task Unspecified
`explanation`	`string` Output only. The explanation for the user task satisfaction score.
Union field `_score`. `_score` can be only one of the following:
`score`	`integer` Output only. The user task satisfaction score. Can be -1, 0, 1.

EvaluationPersona

JSON representation

{
  "name": string,
  "description": string,
  "displayName": string,
  "personality": string,
  "speechConfig": {
    object (SpeechConfig)
  }
}

Fields
`name`	`string` Required. The unique identifier of the persona. Format: `projects/{project}/locations/{location}/apps/{app}/evaluationPersonas/{evaluationPersona}`
`description`	`string` Optional. The description of the persona.
`displayName`	`string` Required. The display name of the persona. Unique within an app.
`personality`	`string` Required. An instruction for the agent on how to behave in the evaluation.
`speechConfig`	`object (SpeechConfig)` Optional. Configuration for how the persona sounds (TTS settings).

SpeechConfig

JSON representation

{
  "speakingRate": number,
  "environment": enum (BackgroundEnvironment),
  "voiceId": string
}

Fields

speakingRate

number

Optional. The speaking rate. 1.0 is normal. Lower is slower (e.g., 0.8), higher is faster (e.g., 1.5). Useful for testing how the agent handles fast talkers.

environment

enum (BackgroundEnvironment)

Optional. The simulated audio environment.

voiceId

string

Optional. The specific voice identifier/accent to use. Example: "en-US-Wavenet-D" or "en-GB-Standard-A"

Status

JSON representation
{ "code": integer, "message": string, "details": [ { "@type": string, field1: ..., ... } ] }

Fields

code

integer

The status code, which should be an enum value of google.rpc.Code.

message

string

A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.

details[]

object

A list of messages that carry the error details. There is a common set of message types for APIs to use.

An object containing fields of an arbitrary type. An additional field "@type" contains a URI identifying the type. Example: { "id": 1234, "@type": "types.example.com/standard/id" }.

Any

JSON representation
{ "typeUrl": string, "value": string }

Fields

typeUrl

string

Identifies the type of the serialized Protobuf message with a URI reference consisting of a prefix ending in a slash and the fully-qualified type name.

Example: type.googleapis.com/google.protobuf.StringValue

This string must contain at least one / character, and the content after the last / must be the fully-qualified name of the type in canonical form, without a leading dot. Do not write a scheme on these URI references so that clients do not attempt to contact them.

The prefix is arbitrary and Protobuf implementations are expected to simply strip off everything up to and including the last / to identify the type. type.googleapis.com/ is a common default prefix that some legacy implementations require. This prefix does not indicate the origin of the type, and URIs containing it are not expected to respond to any requests.

All type URL strings must be legal URI references with the additional restriction (for the text format) that the content of the reference must consist only of alphanumeric characters, percent-encoded escapes, and characters in the following set (not including the outer backticks): `/-.~_!$&()*+,;=`. Despite our allowing percent encodings, implementations should not unescape them to prevent confusion with existing parsers. For example, `type.googleapis.com%2FFoo` should be rejected.

In the original design of Any, the possibility of launching a type resolution service at these type URLs was considered but Protobuf never implemented one and considers contacting these URLs to be problematic and a potential security issue. Do not attempt to contact type URLs.

value

string (bytes format)

Holds a Protobuf serialization of the type described by type_url.

A base64-encoded string.
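
The two `Any` fields can be handled as in the sketch below: the type name is whatever follows the last `/` of `typeUrl`, and `value` is base64-decoded into raw bytes. The helper names are hypothetical, and decoding the resulting bytes into a message would additionally require the matching Protobuf definition, which is outside this sketch.

```python
import base64


def type_name(type_url: str) -> str:
    """Strip everything up to and including the last "/" of a type URL."""
    if "/" not in type_url:
        raise ValueError("a type URL must contain at least one '/'")
    # The URL is never contacted; it is only parsed as a string.
    return type_url.rsplit("/", 1)[1]


def decode_value(value: str) -> bytes:
    """Return the serialized Protobuf bytes carried in the `value` field."""
    return base64.b64decode(value)


print(type_name("type.googleapis.com/google.protobuf.StringValue"))
```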

EvaluationMetricsThresholds

JSON representation

{
  "goldenEvaluationMetricsThresholds": {
    object (GoldenEvaluationMetricsThresholds)
  },
  "hallucinationMetricBehavior": enum (HallucinationMetricBehavior),
  "goldenHallucinationMetricBehavior": enum (HallucinationMetricBehavior),
  "scenarioHallucinationMetricBehavior": enum (HallucinationMetricBehavior)
}

Fields
`goldenEvaluationMetricsThresholds`	`object (GoldenEvaluationMetricsThresholds)` Optional. The golden evaluation metrics thresholds.
`hallucinationMetricBehavior (deprecated)`	`enum (HallucinationMetricBehavior)` This item is deprecated! Optional. Deprecated: Use `golden_hallucination_metric_behavior` instead. The hallucination metric behavior is currently used for golden evaluations.
`goldenHallucinationMetricBehavior`	`enum (HallucinationMetricBehavior)` Optional. The hallucination metric behavior for golden evaluations.
`scenarioHallucinationMetricBehavior`	`enum (HallucinationMetricBehavior)` Optional. The hallucination metric behavior for scenario evaluations.

GoldenEvaluationMetricsThresholds

JSON representation

{
  "turnLevelMetricsThresholds": {
    object (TurnLevelMetricsThresholds)
  },
  "expectationLevelMetricsThresholds": {
    object (ExpectationLevelMetricsThresholds)
  },
  "toolMatchingSettings": {
    object (ToolMatchingSettings)
  }
}

Fields

turnLevelMetricsThresholds

object (TurnLevelMetricsThresholds)

Optional. The turn level metrics thresholds.

expectationLevelMetricsThresholds

object (ExpectationLevelMetricsThresholds)

Optional. The expectation level metrics thresholds.

toolMatchingSettings

object (ToolMatchingSettings)

Optional. The tool matching settings. An extra tool call is a tool call that is present in the execution but does not match any tool call in the golden expectation.

TurnLevelMetricsThresholds

JSON representation

{
  "semanticSimilarityChannel": enum (SemanticSimilarityChannel),

  // Union field _semantic_similarity_success_threshold can be only one of the
  // following:
  "semanticSimilaritySuccessThreshold": integer
  // End of list of possible types for union field
  // _semantic_similarity_success_threshold.

  // Union field _overall_tool_invocation_correctness_threshold can be only one
  // of the following:
  "overallToolInvocationCorrectnessThreshold": number
  // End of list of possible types for union field
  // _overall_tool_invocation_correctness_threshold.
}

Fields

semanticSimilarityChannel

enum (SemanticSimilarityChannel)

Optional. The semantic similarity channel to use for evaluation.

Union field _semantic_similarity_success_threshold.

_semantic_similarity_success_threshold can be only one of the following:

semanticSimilaritySuccessThreshold

integer

Optional. The success threshold for semantic similarity. Must be an integer between 0 and 4. Default is >= 3.

Union field _overall_tool_invocation_correctness_threshold.

_overall_tool_invocation_correctness_threshold can be only one of the following:

overallToolInvocationCorrectnessThreshold

number

Optional. The success threshold for overall tool invocation correctness. Must be a float between 0 and 1. Default is 1.0.

ExpectationLevelMetricsThresholds

JSON representation

{

  // Union field _tool_invocation_parameter_correctness_threshold can be only one
  // of the following:
  "toolInvocationParameterCorrectnessThreshold": number
  // End of list of possible types for union field
  // _tool_invocation_parameter_correctness_threshold.
}

Fields

Union field _tool_invocation_parameter_correctness_threshold.

_tool_invocation_parameter_correctness_threshold can be only one of the following:

toolInvocationParameterCorrectnessThreshold

number

Optional. The success threshold for individual tool invocation parameter correctness. Must be a float between 0 and 1. Default is 1.0.

ToolMatchingSettings

JSON representation

{
  "extraToolCallBehavior": enum (ExtraToolCallBehavior)
}

Fields

extraToolCallBehavior

enum (ExtraToolCallBehavior)

Optional. Behavior for extra tool calls. Defaults to FAIL.

EvaluationConfig

JSON representation

{
  "inputAudioConfig": {
    object (InputAudioConfig)
  },
  "outputAudioConfig": {
    object (OutputAudioConfig)
  },
  "evaluationChannel": enum (EvaluationChannel),
  "toolCallBehaviour": enum (EvaluationToolCallBehaviour)
}

Fields
`inputAudioConfig (deprecated)`	`object (InputAudioConfig)` This item is deprecated! Optional. Configuration for processing the input audio.
`outputAudioConfig (deprecated)`	`object (OutputAudioConfig)` This item is deprecated! Optional. Configuration for generating the output audio.
`evaluationChannel`	`enum (EvaluationChannel)` Optional. The channel to evaluate.
`toolCallBehaviour`	`enum (EvaluationToolCallBehaviour)` Optional. Specifies whether the evaluation should use real tool calls or fake tools.

InputAudioConfig

JSON representation

{
  "audioEncoding": enum (AudioEncoding),
  "sampleRateHertz": integer,
  "noiseSuppressionLevel": string
}

Fields

audioEncoding

enum (AudioEncoding)

Required. The encoding of the input audio data.

sampleRateHertz

integer

Required. The sample rate (in Hertz) of the input audio data.

noiseSuppressionLevel

string

Optional. The level of noise suppression to apply to the input audio. Available values are "low", "moderate", "high", and "very_high".
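
Because `noiseSuppressionLevel` is a plain string rather than an enum, a typo in its value is not caught by the schema. A minimal Python sketch of a client-side check follows. The helper name `input_audio_config` is hypothetical, the positive-sample-rate check is an assumption, and the encoding passed in the usage example is only a placeholder, since the `AudioEncoding` values are not listed in this section.

```python
from typing import Optional

# The four values listed in the field description above.
ALLOWED_NOISE_SUPPRESSION_LEVELS = ("low", "moderate", "high", "very_high")


def input_audio_config(audio_encoding: str,
                       sample_rate_hertz: int,
                       noise_suppression_level: Optional[str] = None) -> dict:
    """Build an InputAudioConfig-shaped dict, validating the documented fields."""
    if sample_rate_hertz <= 0:
        # Assumption: a sample rate only makes sense as a positive integer.
        raise ValueError("sampleRateHertz must be a positive integer")

    config = {
        "audioEncoding": audio_encoding,
        "sampleRateHertz": sample_rate_hertz,
    }
    if noise_suppression_level is not None:
        if noise_suppression_level not in ALLOWED_NOISE_SUPPRESSION_LEVELS:
            raise ValueError(
                "noiseSuppressionLevel must be one of "
                + ", ".join(ALLOWED_NOISE_SUPPRESSION_LEVELS))
        config["noiseSuppressionLevel"] = noise_suppression_level
    return config
```

For example, `input_audio_config("PLACEHOLDER_ENCODING", 16000, "high")` returns a dict with all three keys, while passing `"medium"` raises `ValueError`.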

OutputAudioConfig

JSON representation

{
  "audioEncoding": enum (AudioEncoding),
  "sampleRateHertz": integer
}

Fields

audioEncoding

enum (AudioEncoding)

Required. The encoding of the output audio data.

sampleRateHertz

integer

Required. The sample rate (in Hertz) of the output audio data.

Tool Annotations

Destructive Hint: ❌ | Idempotent Hint: ✅ | Read Only Hint: ✅ | Open World Hint: ❌
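
These hints correspond to the `destructiveHint`, `idempotentHint`, `readOnlyHint`, and `openWorldHint` annotation fields defined by the MCP specification. The Python sketch below shows one way a client might use them to decide whether a call can be repeated or needs user confirmation. The two policy functions are hypothetical, and the hints are advisory rather than guarantees, so a real client may want stricter rules.

```python
# Annotation values for list_evaluations, as listed above.
LIST_EVALUATIONS_ANNOTATIONS = {
    "destructiveHint": False,
    "idempotentHint": True,
    "readOnlyHint": True,
    "openWorldHint": False,
}


def safe_to_retry(annotations: dict) -> bool:
    """Treat a call as repeatable when it is read-only or idempotent."""
    return bool(annotations.get("readOnlyHint", False)
                or annotations.get("idempotentHint", False))


def needs_confirmation(annotations: dict) -> bool:
    """Ask the user first when a tool may modify state destructively.

    Missing hints fall back to the MCP defaults: readOnlyHint is false and
    destructiveHint is true.
    """
    read_only = annotations.get("readOnlyHint", False)
    destructive = annotations.get("destructiveHint", True)
    return (not read_only) and destructive


print(safe_to_retry(LIST_EVALUATIONS_ANNOTATIONS))       # True
print(needs_confirmation(LIST_EVALUATIONS_ANNOTATIONS))  # False
```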
