MCP Tools Reference: ces.googleapis.com

Tool: `get_evaluation_result`

Gets details of the specified evaluation result.

The following sample demonstrates how to use curl to invoke the `get_evaluation_result` MCP tool.

Curl Request

curl --location 'https://ces.[REGION].rep.googleapis.com/mcp' \
--header 'content-type: application/json' \
--header 'accept: application/json, text/event-stream' \
--data '{
  "method": "tools/call",
  "params": {
    "name": "get_evaluation_result",
    "arguments": {
      "name": "projects/{project}/locations/{location}/apps/{app}/evaluations/{evaluation}/results/{result}"
    }
  },
  "jsonrpc": "2.0",
  "id": 1
}'
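
The same request can be assembled programmatically. The following is a minimal Python sketch, using only the standard library, that builds the JSON-RPC `tools/call` body and headers from the curl sample without sending them. The helper name `build_tools_call` and the region and resource name values are hypothetical placeholders, and any authentication your environment requires is not shown in the sample above and is not covered here.

```python
import json
import urllib.request


def build_tools_call(region, tool, arguments, request_id=1):
    """Build (but do not send) the JSON-RPC `tools/call` request from the curl sample."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return urllib.request.Request(
        url=f"https://ces.{region}.rep.googleapis.com/mcp",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "accept": "application/json, text/event-stream",
        },
        method="POST",
    )


# Placeholder values (assumptions): substitute a real region and resource name.
request = build_tools_call(
    region="REGION",
    tool="get_evaluation_result",
    arguments={
        "name": "projects/my-project/locations/my-location/apps/my-app"
        "/evaluations/my-evaluation/results/my-result"
    },
)
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out because it needs a real endpoint and credentials.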

Input Schema

Request message for EvaluationService.GetEvaluationResult.

GetEvaluationResultRequest

JSON representation
{ "name": string }

Fields
`name`	`string` Required. The resource name of the evaluation result to retrieve.

Output Schema

An evaluation result represents the output of running an Evaluation.

EvaluationResult

JSON representation

{
  "name": string,
  "displayName": string,
  "createTime": string,
  "evaluationStatus": enum (Outcome),
  "evaluationRun": string,
  "persona": {
    object (EvaluationPersona)
  },
  "errorInfo": {
    object (EvaluationErrorInfo)
  },
  "error": {
    object (Status)
  },
  "initiatedBy": string,
  "appVersion": string,
  "appVersionDisplayName": string,
  "changelog": string,
  "changelogCreateTime": string,
  "executionState": enum (ExecutionState),
  "evaluationMetricsThresholds": {
    object (EvaluationMetricsThresholds)
  },
  "config": {
    object (EvaluationConfig)
  },
  "goldenRunMethod": enum (GoldenRunMethod),

  // Union field result can be only one of the following:
  "goldenResult": {
    object (GoldenResult)
  },
  "scenarioResult": {
    object (ScenarioResult)
  }
  // End of list of possible types for union field result.
}

Fields
`name`	`string` Identifier. The unique identifier of the evaluation result. Format: `projects/{project}/locations/{location}/apps/{app}/evaluations/{evaluation}/results/{result}`
`displayName`	`string` Required. Display name of the Evaluation Result. Unique within an Evaluation. By default, it has the following format: " result - ".
`createTime`	`string (Timestamp format)` Output only. Timestamp when the evaluation result was created. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`evaluationStatus`	`enum (Outcome)` Output only. The outcome of the evaluation. Only populated if execution_state is COMPLETE.
`evaluationRun`	`string` Output only. The evaluation run that produced this result. Format: `projects/{project}/locations/{location}/apps/{app}/evaluationRuns/{evaluationRun}`
`persona`	`object (EvaluationPersona)` Output only. The persona used to generate the conversation for the evaluation result.
`errorInfo`	`object (EvaluationErrorInfo)` Output only. Error information for the evaluation result.
`error (deprecated)`	`object (Status)` Output only. Deprecated: Use `error_info` instead. Errors encountered during execution.
`initiatedBy`	`string` Output only. The user who initiated the evaluation run that resulted in this result.
`appVersion`	`string` Output only. The app version used to generate the conversation that resulted in this result. Format: `projects/{project}/locations/{location}/apps/{app}/versions/{version}`
`appVersionDisplayName`	`string` Output only. The display name of the `app_version` that the evaluation ran against.
`changelog`	`string` Output only. The changelog of the app version that the evaluation ran against. This is populated if user runs evaluation on latest/draft.
`changelogCreateTime`	`string (Timestamp format)` Output only. The create time of the changelog of the app version that the evaluation ran against. This is populated if user runs evaluation on latest/draft. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`executionState`	`enum (ExecutionState)` Output only. The state of the evaluation result execution.
`evaluationMetricsThresholds`	`object (EvaluationMetricsThresholds)` Output only. The evaluation thresholds for the result.
`config`	`object (EvaluationConfig)` Output only. The configuration used in the evaluation run that resulted in this result.
`goldenRunMethod`	`enum (GoldenRunMethod)` Output only. The method used to run the golden evaluation.
Union field `result`. The result of the evaluation. Only populated when the execution_state is COMPLETED. `result` can be only one of the following:
`goldenResult`	`object (GoldenResult)` Output only. The outcome of a golden evaluation.
`scenarioResult`	`object (ScenarioResult)` Output only. The outcome of a scenario evaluation.
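
As an illustration of the `name` format and the `result` union described above, here is a small Python sketch. It splits a result name into its components and reports which union member is set in a decoded response. The helper names `parse_result_name` and `pick_result` are hypothetical, and the sketch assumes the response has already been decoded into a `dict`.

```python
import re

# Pattern derived from the documented `name` format.
_RESULT_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/apps/(?P<app>[^/]+)/evaluations/(?P<evaluation>[^/]+)"
    r"/results/(?P<result>[^/]+)$"
)


def parse_result_name(name):
    """Split an evaluation result resource name into its path components."""
    match = _RESULT_NAME.match(name)
    if match is None:
        raise ValueError(f"not an evaluation result name: {name!r}")
    return match.groupdict()


def pick_result(evaluation_result):
    """Return (kind, payload) for the member of the `result` union that is set."""
    for kind in ("goldenResult", "scenarioResult"):
        if kind in evaluation_result:
            return kind, evaluation_result[kind]
    # Neither member is present, e.g. while execution has not finished.
    return None, None
```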

GoldenResult

JSON representation
{ "turnReplayResults": [ { object (`TurnReplayResult`) } ], "evaluationExpectationResults": [ { object (`EvaluationExpectationResult`) } ] }

Fields
`turnReplayResults[]`	`object (TurnReplayResult)` Output only. The result of running each turn of the golden conversation.
`evaluationExpectationResults[]`	`object (EvaluationExpectationResult)` Output only. The results of the evaluation expectations.

TurnReplayResult

JSON representation

{
  "conversation": string,
  "expectationOutcome": [
    {
      object (GoldenExpectationOutcome)
    }
  ],
  "hallucinationResult": {
    object (HallucinationResult)
  },
  "toolInvocationScore": number,
  "turnLatency": string,
  "toolCallLatencies": [
    {
      object (ToolCallLatency)
    }
  ],
  "semanticSimilarityResult": {
    object (SemanticSimilarityResult)
  },
  "overallToolInvocationResult": {
    object (OverallToolInvocationResult)
  },
  "errorInfo": {
    object (EvaluationErrorInfo)
  },
  "spanLatencies": [
    {
      object (SpanLatency)
    }
  ],

  // Union field _tool_ordered_invocation_score can be only one of the following:
  "toolOrderedInvocationScore": number
  // End of list of possible types for union field
  // _tool_ordered_invocation_score.
}

Fields
`conversation`	`string` Output only. The conversation that was generated for this turn.
`expectationOutcome[]`	`object (GoldenExpectationOutcome)` Output only. The outcome of each expectation.
`hallucinationResult`	`object (HallucinationResult)` Output only. The result of the hallucination check.
`toolInvocationScore (deprecated)`	`number` Output only. Deprecated. Use `OverallToolInvocationResult` instead.
`turnLatency`	`string (Duration format)` Output only. Duration of the turn. A duration in seconds with up to nine fractional digits, ending with '`s`'. Example: `"3.5s"`.
`toolCallLatencies[]`	`object (ToolCallLatency)` Output only. The latency of each tool call in the turn.
`semanticSimilarityResult`	`object (SemanticSimilarityResult)` Output only. The result of the semantic similarity check.
`overallToolInvocationResult`	`object (OverallToolInvocationResult)` Output only. The result of the overall tool invocation check.
`errorInfo`	`object (EvaluationErrorInfo)` Output only. Information about the error that occurred during this turn.
`spanLatencies[]`	`object (SpanLatency)` Output only. The latency of spans in the turn.
Union field `_tool_ordered_invocation_score`. `_tool_ordered_invocation_score` can be only one of the following:
`toolOrderedInvocationScore`	`number` Output only. The overall tool ordered invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked in the expected order.

GoldenExpectationOutcome

JSON representation

{
  "expectation": {
    object (GoldenExpectation)
  },
  "outcome": enum (Outcome),
  "semanticSimilarityResult": {
    object (SemanticSimilarityResult)
  },
  "toolInvocationResult": {
    object (ToolInvocationResult)
  },

  // Union field result can be only one of the following:
  "observedToolCall": {
    object (ToolCall)
  },
  "observedToolResponse": {
    object (ToolResponse)
  },
  "observedAgentResponse": {
    object (Message)
  },
  "observedAgentTransfer": {
    object (AgentTransfer)
  }
  // End of list of possible types for union field result.
}

Fields
`expectation`	`object (GoldenExpectation)` Output only. The expectation that was evaluated.
`outcome`	`enum (Outcome)` Output only. The outcome of the expectation.
`semanticSimilarityResult (deprecated)`	`object (SemanticSimilarityResult)` This item is deprecated! Output only. The result of the semantic similarity check.
`toolInvocationResult`	`object (ToolInvocationResult)` Output only. The result of the tool invocation check.
Union field `result`. The result of the expectation. `result` can be only one of the following:
`observedToolCall`	`object (ToolCall)` Output only. The result of the tool call expectation.
`observedToolResponse`	`object (ToolResponse)` Output only. The result of the tool response expectation.
`observedAgentResponse`	`object (Message)` Output only. The result of the agent response expectation.
`observedAgentTransfer`	`object (AgentTransfer)` Output only. The result of the agent transfer expectation.

ToolCall

JSON representation

{
  "id": string,
  "displayName": string,
  "args": {
    object
  },

  // Union field tool_identifier can be only one of the following:
  "tool": string,
  "toolsetTool": {
    object (ToolsetTool)
  }
  // End of list of possible types for union field tool_identifier.
}

Fields
`id`	`string` Optional. The unique identifier of the tool call. If populated, the client should return the execution result with the matching ID in `ToolResponse`.
`displayName`	`string` Output only. Display name of the tool.
`args`	`object (Struct format)` Optional. The input parameters and values for the tool in JSON object format.
Union field `tool_identifier`. The identifier of the tool to execute. It could be either a persisted tool or a tool from a toolset. `tool_identifier` can be only one of the following:
`tool`	`string` Optional. The name of the tool to execute. Format: `projects/{project}/locations/{location}/apps/{app}/tools/{tool}`
`toolsetTool`	`object (ToolsetTool)` Optional. The toolset tool to execute.

ToolsetTool

JSON representation
{ "toolset": string, "toolId": string }

Fields
`toolset`	`string` Required. The resource name of the Toolset from which this tool is derived. Format: `projects/{project}/locations/{location}/apps/{app}/toolsets/{toolset}`
`toolId`	`string` Optional. The tool ID to filter the tools to retrieve the schema for.

Struct

JSON representation
{ "fields": { string: value, ... } }

Fields
`fields`	`map (key: string, value: value (Value format))` Unordered map of dynamically typed values. An object containing a list of `"key": value` pairs. Example: `{ "name": "wrench", "mass": "1.3kg", "count": "3" }`.

FieldsEntry

JSON representation
{ "key": string, "value": value }

Fields
`key`	`string`
`value`	`value (Value format)`

Value

JSON representation

{

  // Union field kind can be only one of the following:
  "nullValue": null,
  "numberValue": number,
  "stringValue": string,
  "boolValue": boolean,
  "structValue": {
    object
  },
  "listValue": array
  // End of list of possible types for union field kind.
}

Fields
Union field `kind`. The kind of value. `kind` can be only one of the following:
`nullValue`	`null` Represents a null value.
`numberValue`	`number` Represents a double value.
`stringValue`	`string` Represents a string value.
`boolValue`	`boolean` Represents a boolean value.
`structValue`	`object (Struct format)` Represents a structured value.
`listValue`	`array (ListValue format)` Represents a repeated `Value`.
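
To make the `kind` union concrete, the following Python sketch converts a `Value` written in the explicit form above into a native Python value. It assumes nested structs carry a `fields` map and nested lists carry a `values` array, as in the `Struct` and `ListValue` representations in this reference; the function name `decode_value` is hypothetical. Protobuf JSON encoders commonly emit `Struct` and `Value` as plain JSON, in which case no decoding like this is needed.

```python
def decode_value(value):
    """Convert a `Value` in explicit union form into a native Python value."""
    if len(value) != 1:
        raise ValueError("exactly one member of the `kind` union must be set")
    kind, payload = next(iter(value.items()))
    if kind == "nullValue":
        return None
    if kind in ("numberValue", "stringValue", "boolValue"):
        return payload
    if kind == "structValue":
        # Assumption: nested structs use the `fields` map from `Struct`.
        return {k: decode_value(v) for k, v in payload.get("fields", {}).items()}
    if kind == "listValue":
        # Assumption: nested lists use the `values` array from `ListValue`.
        return [decode_value(v) for v in payload.get("values", [])]
    raise ValueError(f"unknown kind: {kind}")
```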

ListValue

JSON representation
{ "values": [ value ] }

Fields
`values[]`	`value (Value format)` Repeated field of dynamically typed values.

ToolResponse

JSON representation

{
  "id": string,
  "displayName": string,
  "response": {
    object
  },

  // Union field tool_identifier can be only one of the following:
  "tool": string,
  "toolsetTool": {
    object (ToolsetTool)
  }
  // End of list of possible types for union field tool_identifier.
}

Fields
`id`	`string` Optional. The matching ID of the `tool call` the response is for.
`displayName`	`string` Output only. Display name of the tool.
`response`	`object (Struct format)` Required. The tool execution result in JSON object format. Use "output" key to specify tool response and "error" key to specify error details (if any). If "output" and "error" keys are not specified, then whole "response" is treated as tool execution result.
Union field `tool_identifier`. The identifier of the tool that got executed. It could be either a persisted tool or a tool from a toolset. `tool_identifier` can be only one of the following:
`tool`	`string` Optional. The name of the tool to execute. Format: `projects/{project}/locations/{location}/apps/{app}/tools/{tool}`
`toolsetTool`	`object (ToolsetTool)` Optional. The toolset tool that got executed.
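
The `response` key convention described above can be sketched in Python as follows. The function name `split_tool_response` is hypothetical, and the sketch assumes `response` has already been decoded into a `dict`.

```python
def split_tool_response(response):
    """Return (output, error) following the documented `response` key convention."""
    if "output" in response or "error" in response:
        return response.get("output"), response.get("error")
    # Neither key is present: the whole object is the tool execution result.
    return response, None
```

For example, `split_tool_response({"total": 3})` treats the whole object as the result, while `split_tool_response({"error": "timeout"})` yields no output and the error details.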

Message

JSON representation
{ "role": string, "chunks": [ { object (`Chunk`) } ], "eventTime": string }

Fields
`role`	`string` Optional. The role within the conversation, e.g., user, agent.
`chunks[]`	`object (Chunk)` Optional. Content of the message as a series of chunks.
`eventTime`	`string (Timestamp format)` Optional. Timestamp when the message was sent or received. Should not be used if the message is part of an `example`. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.

Chunk

JSON representation

{

  // Union field data can be only one of the following:
  "text": string,
  "transcript": string,
  "blob": {
    object (Blob)
  },
  "payload": {
    object
  },
  "image": {
    object (Image)
  },
  "toolCall": {
    object (ToolCall)
  },
  "toolResponse": {
    object (ToolResponse)
  },
  "agentTransfer": {
    object (AgentTransfer)
  },
  "updatedVariables": {
    object
  },
  "defaultVariables": {
    object
  }
  // End of list of possible types for union field data.
}

Fields
Union field `data`. Chunk data. `data` can be only one of the following:
`text`	`string` Optional. Text data.
`transcript`	`string` Optional. Transcript associated with the audio.
`blob`	`object (Blob)` Optional. Blob data.
`payload`	`object (Struct format)` Optional. Custom payload data.
`image`	`object (Image)` Optional. Image data.
`toolCall`	`object (ToolCall)` Optional. Tool execution request.
`toolResponse`	`object (ToolResponse)` Optional. Tool execution response.
`agentTransfer`	`object (AgentTransfer)` Optional. Agent transfer event.
`updatedVariables`	`object (Struct format)` A struct representing the variables that were updated in the conversation, keyed by variable name.
`defaultVariables`	`object (Struct format)` A struct representing the default variables at the start of the conversation, keyed by variable name.
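
Because each `Chunk` sets only one member of the `data` union, client code typically branches on whichever key is present. The Python sketch below shows one way to do that and to collect the text of a `Message`. The helper names `chunk_kind` and `message_text` are hypothetical, and the sketch assumes the message has been decoded into a `dict`.

```python
# Members of the `data` union, as listed in the Chunk fields.
CHUNK_KINDS = (
    "text", "transcript", "blob", "payload", "image",
    "toolCall", "toolResponse", "agentTransfer",
    "updatedVariables", "defaultVariables",
)


def chunk_kind(chunk):
    """Return the name of the `data` union member set on a chunk, or None."""
    present = [kind for kind in CHUNK_KINDS if kind in chunk]
    if len(present) > 1:
        raise ValueError(f"more than one `data` member set: {present}")
    return present[0] if present else None


def message_text(message):
    """Concatenate the text chunks of a Message, skipping all other chunk kinds."""
    return "".join(
        chunk["text"]
        for chunk in message.get("chunks", [])
        if chunk_kind(chunk) == "text"
    )
```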

Blob

JSON representation
{ "mimeType": string, "data": string }

Fields
`mimeType`	`string` Required. The IANA standard MIME type of the source data.
`data`	`string (bytes format)` Required. Raw bytes of the blob. A base64-encoded string.
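
Since `data` arrives as a base64-encoded string, it has to be decoded before use. A minimal Python sketch, assuming standard (not URL-safe) base64 with padding and a hypothetical helper name `blob_bytes`:

```python
import base64


def blob_bytes(blob):
    """Return (mime_type, raw_bytes) for a Blob decoded from JSON."""
    # validate=True rejects characters outside the standard base64 alphabet.
    return blob["mimeType"], base64.b64decode(blob["data"], validate=True)
```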

Image

JSON representation
{ "mimeType": string, "data": string }

Fields
`mimeType`	`string` Required. The IANA standard MIME type of the source data. Supported image types include: `image/png`, `image/jpeg`, `image/webp`.
`data`	`string (bytes format)` Required. Raw bytes of the image. A base64-encoded string.

AgentTransfer

JSON representation
{ "targetAgent": string, "displayName": string }

Fields
`targetAgent`	`string` Required. The agent to which the conversation is being transferred. The agent will handle the conversation from this point forward. Format: `projects/{project}/locations/{location}/apps/{app}/agents/{agent}`
`displayName`	`string` Output only. Display name of the agent.

Timestamp

JSON representation
{ "seconds": string, "nanos": integer }

Fields
`seconds`	`string (int64 format)` Represents seconds of UTC time since Unix epoch 1970-01-01T00:00:00Z. Must be between -62135596800 and 253402300799 inclusive (which corresponds to 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z).
`nanos`	`integer` Non-negative fractions of a second at nanosecond resolution. This field is the nanosecond portion of the duration, not an alternative to seconds. Negative second values with fractions must still have non-negative nanos values that count forward in time. Must be between 0 and 999,999,999 inclusive.
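
Timestamp fields elsewhere in this reference are RFC 3339 strings, while the representation above uses `seconds` and `nanos`. The following Python sketch relates the two forms. The function name `parse_timestamp` is hypothetical, and the pattern covers only the shapes shown in the examples in this reference (uppercase `T`, `Z` or a numeric offset, up to nine fractional digits).

```python
import re
from datetime import datetime, timedelta, timezone

_RFC3339 = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?(Z|[+-]\d{2}:\d{2})$"
)
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_timestamp(text):
    """Convert an RFC 3339 string into (seconds, nanos) relative to the Unix epoch."""
    match = _RFC3339.match(text)
    if match is None:
        raise ValueError(f"unsupported timestamp: {text!r}")
    base, fraction, offset = match.groups()
    moment = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    if offset != "Z":
        # A +05:30 offset means local time is ahead of UTC, so subtract it.
        sign = 1 if offset[0] == "+" else -1
        moment -= sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6]))
    delta = moment - _EPOCH
    seconds = delta.days * 86400 + delta.seconds
    # Pad the fraction to nine digits so it can be read as nanoseconds.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return seconds, nanos
```

Keeping the fraction as an integer avoids the precision loss a `float` or a microsecond-resolution `datetime` would introduce for nine-digit fractions.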

GoldenExpectation

JSON representation

{
  "note": string,

  // Union field condition can be only one of the following:
  "toolCall": {
    object (ToolCall)
  },
  "toolResponse": {
    object (ToolResponse)
  },
  "agentResponse": {
    object (Message)
  },
  "agentTransfer": {
    object (AgentTransfer)
  },
  "updatedVariables": {
    object
  },
  "mockToolResponse": {
    object (ToolResponse)
  }
  // End of list of possible types for union field condition.
}

Fields
`note`	`string` Optional. A note for this requirement, useful in reporting when specific checks fail. E.g., "Check_Payment_Tool_Called".
Union field `condition`. The actual check to perform. `condition` can be only one of the following:
`toolCall`	`object (ToolCall)` Optional. Check that a specific tool was called with the parameters.
`toolResponse`	`object (ToolResponse)` Optional. Check that a specific tool had the expected response.
`agentResponse`	`object (Message)` Optional. Check that the agent responded with the correct response. The role "agent" is implied.
`agentTransfer`	`object (AgentTransfer)` Optional. Check that the agent transferred the conversation to a different agent.
`updatedVariables`	`object (Struct format)` Optional. Check that the agent updated the session variables to the expected values. Used to also capture agent variable updates for golden evals.
`mockToolResponse`	`object (ToolResponse)` Optional. The tool response to mock, with the parameters of interest specified. Any parameters not specified will be hallucinated by the LLM.

SemanticSimilarityResult

JSON representation

{
  "label": string,
  "explanation": string,
  "outcome": enum (Outcome),

  // Union field _score can be only one of the following:
  "score": integer
  // End of list of possible types for union field _score.
}

Fields
`label`	`string` Output only. The label associated with each score. Score 4: Fully Consistent; Score 3: Mostly Consistent; Score 2: Partially Consistent (Minor Omissions); Score 1: Largely Inconsistent (Major Omissions); Score 0: Completely Inconsistent / Contradictory.
`explanation`	`string` Output only. The explanation for the semantic similarity score.
`outcome`	`enum (Outcome)` Output only. The outcome of the semantic similarity check. This is determined by comparing the score to the semantic_similarity_success_threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.
Union field `_score`. `_score` can be only one of the following:
`score`	`integer` Output only. The semantic similarity score. Can be 0, 1, 2, 3, or 4.
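
The relationship between `score`, `label`, and `outcome` described above can be sketched in Python as follows. The names `SEMANTIC_SIMILARITY_LABELS` and `semantic_similarity_outcome` are hypothetical, and the threshold is passed in as a plain integer because its configured value is not given in this section.

```python
# Labels as listed for the `label` field.
SEMANTIC_SIMILARITY_LABELS = {
    4: "Fully Consistent",
    3: "Mostly Consistent",
    2: "Partially Consistent (Minor Omissions)",
    1: "Largely Inconsistent (Major Omissions)",
    0: "Completely Inconsistent / Contradictory",
}


def semantic_similarity_outcome(score, threshold):
    """Return "PASS" when the score is equal to or above the threshold, else "FAIL"."""
    if score not in SEMANTIC_SIMILARITY_LABELS:
        raise ValueError(f"score must be 0, 1, 2, 3, or 4, got {score!r}")
    return "PASS" if score >= threshold else "FAIL"
```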

ToolInvocationResult

JSON representation

{
  "outcome": enum (Outcome),
  "explanation": string,

  // Union field _parameter_correctness_score can be only one of the following:
  "parameterCorrectnessScore": number
  // End of list of possible types for union field _parameter_correctness_score.
}

Fields
`outcome`	`enum (Outcome)` Output only. The outcome of the tool invocation check. This is determined by comparing the parameter_correctness_score to the threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.
`explanation`	`string` Output only. A free text explanation for the tool invocation result.
Union field `_parameter_correctness_score`. `_parameter_correctness_score` can be only one of the following:
`parameterCorrectnessScore`	`number` Output only. The tool invocation parameter correctness score. This indicates the percent of parameters from the expected tool call that were also present in the actual tool call.
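
As a rough illustration of what a parameter correctness score measures, the Python sketch below computes the share of expected parameters that also appear in the actual call. This is not the service's implementation. The reference does not say whether a parameter counts as present by key alone or by key and value, nor which scale the score uses, so the sketch assumes key presence and a fraction between 0 and 1, and it treats an expected call with no parameters as a perfect score. The function name `parameter_correctness_score` is hypothetical.

```python
def parameter_correctness_score(expected_args, actual_args):
    """Fraction of expected parameter names that also appear in the actual call."""
    if not expected_args:
        # Assumption: nothing was expected, so nothing can be missing.
        return 1.0
    present = sum(1 for key in expected_args if key in actual_args)
    return present / len(expected_args)
```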

HallucinationResult

JSON representation
{ "label": string, "explanation": string, // Union field `_score` can be only one of the following: "score": integer // End of list of possible types for union field `_score`. }

Fields
`label`	`string` Output only. The label associated with each score. Score 1: Justified; Score 0: Not Justified; Score -1: No Claim To Assess.
`explanation`	`string` Output only. The explanation for the hallucination score.
Union field `_score`. `_score` can be only one of the following:
`score`	`integer` Output only. The hallucination score. Can be -1, 0, 1.

Duration

JSON representation
{ "seconds": string, "nanos": integer }

Fields
`seconds`	`string (int64 format)` Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years
`nanos`	`integer` Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 `seconds` field and a positive or negative `nanos` field. For durations of one second or more, a non-zero value for the `nanos` field must be of the same sign as the `seconds` field. Must be from -999,999,999 to +999,999,999 inclusive.
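
Duration fields elsewhere in this reference, such as `turnLatency` and `executionLatency`, use the string form (for example `"3.5s"`). The Python sketch below converts that string form into the `seconds` and `nanos` pair described above, applying the same sign to both parts. The function name `parse_duration` is hypothetical.

```python
import re

_DURATION = re.compile(r"^(-?)(\d+)(?:\.(\d{1,9}))?s$")


def parse_duration(text):
    """Convert a duration string such as "3.5s" into (seconds, nanos)."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    sign = -1 if match.group(1) else 1
    seconds = sign * int(match.group(2))
    # Pad the fraction to nine digits so it can be read as nanoseconds.
    nanos = sign * int((match.group(3) or "0").ljust(9, "0"))
    return seconds, nanos
```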

ToolCallLatency

JSON representation
{ "tool": string, "displayName": string, "startTime": string, "endTime": string, "executionLatency": string }

Fields
`tool`	`string` Output only. The name of the tool that got executed. Format: `projects/{project}/locations/{location}/apps/{app}/tools/{tool}`.
`displayName`	`string` Output only. The display name of the tool.
`startTime`	`string (Timestamp format)` Output only. The start time of the tool call execution. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`endTime`	`string (Timestamp format)` Output only. The end time of the tool call execution. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`executionLatency`	`string (Duration format)` Output only. The latency of the tool call execution. A duration in seconds with up to nine fractional digits, ending with '`s`'. Example: `"3.5s"`.

OverallToolInvocationResult

JSON representation

{
  "outcome": enum (Outcome),

  // Union field _tool_invocation_score can be only one of the following:
  "toolInvocationScore": number
  // End of list of possible types for union field _tool_invocation_score.
}

Fields
`outcome`	`enum (Outcome)` Output only. The outcome of the tool invocation check. This is determined by comparing the tool_invocation_score to the overall_tool_invocation_correctness_threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.
Union field `_tool_invocation_score`. `_tool_invocation_score` can be only one of the following:
`toolInvocationScore`	`number` The overall tool invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked.

EvaluationErrorInfo

JSON representation
{
  "errorType": enum (ErrorType),
  "errorMessage": string,
  "sessionId": string
}

Fields

errorType

enum (ErrorType)

Output only. The type of error.

errorMessage

string

Output only. The error message.

sessionId

string

Output only. The session ID for the conversation that caused the error.

SpanLatency

JSON representation

{
  "type": enum (Type),
  "displayName": string,
  "startTime": string,
  "endTime": string,
  "executionLatency": string,

  // Union field identifier can be only one of the following:
  "resource": string,
  "toolset": {
    object (ToolsetTool)
  },
  "model": string,
  "callback": string
  // End of list of possible types for union field identifier.
}

Fields
`type`	`enum (Type)` Output only. The type of span.
`displayName`	`string` Output only. The display name of the span. Applicable to tool and guardrail spans.
`startTime`	`string (Timestamp format)` Output only. The start time of the span. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`endTime`	`string (Timestamp format)` Output only. The end time of the span. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`executionLatency`	`string (Duration format)` Output only. The latency of the span. A duration in seconds with up to nine fractional digits, ending with '`s`'. Example: `"3.5s"`.
Union field `identifier`. The identifier of the specific item based on its type. `identifier` can be only one of the following:
`resource`	`string` Output only. The resource name of the guardrail or tool span.
`toolset`	`object (ToolsetTool)` Output only. The toolset tool identifier.
`model`	`string` Output only. The name of the LLM span.
`callback`	`string` Output only. The name of the user callback span.
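
Because `identifier` is a union field, at most one of `resource`, `toolset`, `model`, or `callback` is expected in a given span. The sketch below shows one way a client might find which member is set; the function name and the sample values are hypothetical, and treating more than one member as an error is an assumption about how strict a client wants to be.

```python
from typing import Any, Dict, Optional, Tuple

# Members of the SpanLatency `identifier` union, as listed in the field table.
IDENTIFIER_FIELDS = ("resource", "toolset", "model", "callback")


def span_identifier(span: Dict[str, Any]) -> Optional[Tuple[str, Any]]:
    """Return (field_name, value) for the union member that is set, or None."""
    present = [(key, span[key]) for key in IDENTIFIER_FIELDS if key in span]
    if len(present) > 1:
        names = ", ".join(key for key, _ in present)
        raise ValueError(f"multiple identifier members set: {names}")
    return present[0] if present else None


# Hypothetical span payload used only for illustration.
example_span = {"displayName": "example", "model": "example-model", "executionLatency": "0.25s"}
kind, value = span_identifier(example_span)
```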

EvaluationExpectationResult

JSON representation
{
  "evaluationExpectation": string,
  "prompt": string,
  "outcome": enum (Outcome),
  "explanation": string
}

Fields
`evaluationExpectation`	`string` Output only. The evaluation expectation. Format: `projects/{project}/locations/{location}/apps/{app}/evaluationExpectations/{evaluation_expectation}`
`prompt`	`string` Output only. The prompt that was used for the evaluation.
`outcome`	`enum (Outcome)` Output only. The outcome of the evaluation expectation.
`explanation`	`string` Output only. The explanation for the result.

ScenarioResult

JSON representation

{
  "conversation": string,
  "task": string,
  "userFacts": [
    {
      object (UserFact)
    }
  ],
  "expectationOutcomes": [
    {
      object (ScenarioExpectationOutcome)
    }
  ],
  "rubricOutcomes": [
    {
      object (ScenarioRubricOutcome)
    }
  ],
  "hallucinationResult": [
    {
      object (HallucinationResult)
    }
  ],
  "taskCompletionResult": {
    object (TaskCompletionResult)
  },
  "toolCallLatencies": [
    {
      object (ToolCallLatency)
    }
  ],
  "userGoalSatisfactionResult": {
    object (UserGoalSatisfactionResult)
  },
  "spanLatencies": [
    {
      object (SpanLatency)
    }
  ],
  "evaluationExpectationResults": [
    {
      object (EvaluationExpectationResult)
    }
  ],

  // Union field _all_expectations_satisfied can be only one of the following:
  "allExpectationsSatisfied": boolean
  // End of list of possible types for union field _all_expectations_satisfied.

  // Union field _task_completed can be only one of the following:
  "taskCompleted": boolean
  // End of list of possible types for union field _task_completed.
}

Fields

conversation

string

Output only. The conversation that was generated in the scenario.

task

string

Output only. The task that was used when running the scenario for this result.

userFacts[]

object (UserFact)

Output only. The user facts that were used by the scenario for this result.

expectationOutcomes[]

object (ScenarioExpectationOutcome)

Output only. The outcome of each expectation.

rubricOutcomes[]

object (ScenarioRubricOutcome)

Output only. The outcome of the rubric.

hallucinationResult[]

object (HallucinationResult)

Output only. The result of the hallucination check. There will be one hallucination result for each turn in the conversation.

taskCompletionResult (deprecated)

object (TaskCompletionResult)

Output only. The result of the task completion check.

toolCallLatencies[]

object (ToolCallLatency)

Output only. The latency of each tool call execution in the conversation.

userGoalSatisfactionResult

object (UserGoalSatisfactionResult)

Output only. The result of the user goal satisfaction check.

spanLatencies[]

object (SpanLatency)

Output only. The latency of spans in the conversation.

evaluationExpectationResults[]

object (EvaluationExpectationResult)

Output only. The results of the evaluation expectations.

Union field _all_expectations_satisfied.

_all_expectations_satisfied can be only one of the following:

allExpectationsSatisfied

boolean

Output only. Whether all expectations were satisfied for this turn.

Union field _task_completed.

_task_completed can be only one of the following:

taskCompleted

boolean

Output only. Whether the task was completed for this turn. This is a composite of all expectations satisfied, no hallucinations, and user goal satisfaction.
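
A client reading a `ScenarioResult` will often want a compact summary of the fields above. The following is a minimal sketch, assuming the response has already been decoded into a Python dictionary and that the `Outcome` enum is serialized as the strings `"PASS"` and `"FAIL"`; the function names are hypothetical.

```python
from typing import Any, Dict, List


def failed_expectations(scenario_result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the expectationOutcomes entries whose outcome is FAIL."""
    outcomes = scenario_result.get("expectationOutcomes", [])
    return [entry for entry in outcomes if entry.get("outcome") == "FAIL"]


def summarize_scenario(scenario_result: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the top-level flags and the number of failed expectations."""
    return {
        # Both flags are optional, so they may be absent (None here).
        "allExpectationsSatisfied": scenario_result.get("allExpectationsSatisfied"),
        "taskCompleted": scenario_result.get("taskCompleted"),
        "failedExpectationCount": len(failed_expectations(scenario_result)),
    }
```

The optional boolean fields are read with `get` so that an unset value is reported as `None` rather than being mistaken for `False`.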

UserFact

JSON representation
{ "name": string, "value": string }

Fields

name

string

Required. The name of the user fact.

value

string

Required. The value of the user fact.

ScenarioExpectationOutcome

JSON representation

{
  "expectation": {
    object (ScenarioExpectation)
  },
  "outcome": enum (Outcome),

  // Union field result can be only one of the following:
  "observedToolCall": {
    object (ObservedToolCall)
  },
  "observedAgentResponse": {
    object (Message)
  }
  // End of list of possible types for union field result.
}

Fields
`expectation`	`object (ScenarioExpectation)` Output only. The expectation that was evaluated.
`outcome`	`enum (Outcome)` Output only. The outcome of the ScenarioExpectation.
Union field `result`. The result of the expectation. `result` can be only one of the following:
`observedToolCall`	`object (ObservedToolCall)` Output only. The observed tool call.
`observedAgentResponse`	`object (Message)` Output only. The observed agent response.

ObservedToolCall

JSON representation
{
  "toolCall": {
    object (ToolCall)
  },
  "toolResponse": {
    object (ToolResponse)
  }
}

Fields

toolCall

object (ToolCall)

Output only. The observed tool call.

toolResponse

object (ToolResponse)

Output only. The observed tool response.

ScenarioExpectation

JSON representation

{

  // Union field expectation can be only one of the following:
  "toolExpectation": {
    object (ToolExpectation)
  },
  "agentResponse": {
    object (Message)
  }
  // End of list of possible types for union field expectation.
}

Fields

Union field expectation. The expectation to evaluate the conversation produced by the simulation. expectation can be only one of the following:

toolExpectation

object (ToolExpectation)

Optional. The tool call and response pair to be evaluated.

agentResponse

object (Message)

Optional. The agent response to be evaluated.

ToolExpectation

JSON representation
{
  "expectedToolCall": {
    object (ToolCall)
  },
  "mockToolResponse": {
    object (ToolResponse)
  }
}

Fields

expectedToolCall

object (ToolCall)

Required. The expected tool call, with the parameters of interest specified. Any parameters not specified will be hallucinated by the LLM.

mockToolResponse

object (ToolResponse)

Required. The tool response to mock, with the parameters of interest specified. Any parameters not specified will be hallucinated by the LLM.

ScenarioRubricOutcome

JSON representation
{
  "rubric": string,
  "scoreExplanation": string,

  // Union field _score can be only one of the following:
  "score": number
  // End of list of possible types for union field _score.
}

Fields
`rubric`	`string` Output only. The rubric that was used to evaluate the conversation.
`scoreExplanation`	`string` Output only. The rater's response to the rubric.
Union field `_score`. `_score` can be only one of the following:
`score`	`number` Output only. The score of the conversation against the rubric.

TaskCompletionResult

JSON representation
{
  "label": string,
  "explanation": string,

  // Union field _score can be only one of the following:
  "score": integer
  // End of list of possible types for union field _score.
}

Fields
`label`	`string` Output only. The label associated with each score. Score 1: Task Completed. Score 0: Task Not Completed. Score -1: User Goal Undefined.
`explanation`	`string` Output only. The explanation for the task completion score.
Union field `_score`. `_score` can be only one of the following:
`score`	`integer` Output only. The task completion score. Can be -1, 0, or 1.

UserGoalSatisfactionResult

JSON representation
{
  "label": string,
  "explanation": string,

  // Union field _score can be only one of the following:
  "score": integer
  // End of list of possible types for union field _score.
}

Fields
`label`	`string` Output only. The label associated with each score. Score 1: User Task Satisfied. Score 0: User Task Not Satisfied. Score -1: User Task Unspecified.
`explanation`	`string` Output only. The explanation for the user task satisfaction score.
Union field `_score`. `_score` can be only one of the following:
`score`	`integer` Output only. The user task satisfaction score. Can be -1, 0, or 1.
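
The score-to-label mapping documented for `UserGoalSatisfactionResult` can be mirrored on the client when only the numeric score is at hand. This is a small sketch under the assumption that the result has been decoded into a dictionary; the function name and the fallback strings are hypothetical.

```python
from typing import Any, Dict

# Labels as documented for the user goal satisfaction score.
USER_GOAL_LABELS = {
    1: "User Task Satisfied",
    0: "User Task Not Satisfied",
    -1: "User Task Unspecified",
}


def describe_user_goal(result: Dict[str, Any]) -> str:
    """Return the documented label for the result's score."""
    score = result.get("score")
    if score is None:
        # The score is an optional union member and may be absent.
        return "score not set"
    # Prefer the label returned by the service; fall back to the documented mapping.
    return result.get("label") or USER_GOAL_LABELS.get(score, f"unknown score {score}")
```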

EvaluationPersona

JSON representation
{
  "name": string,
  "description": string,
  "displayName": string,
  "personality": string,
  "speechConfig": {
    object (SpeechConfig)
  }
}

Fields
`name`	`string` Required. The unique identifier of the persona. Format: `projects/{project}/locations/{location}/apps/{app}/evaluationPersonas/{evaluationPersona}`
`description`	`string` Optional. The description of the persona.
`displayName`	`string` Required. The display name of the persona. Unique within an app.
`personality`	`string` Required. An instruction for the agent on how to behave in the evaluation.
`speechConfig`	`object (SpeechConfig)` Optional. Configuration for how the persona sounds (TTS settings).

SpeechConfig

JSON representation
{
  "speakingRate": number,
  "environment": enum (BackgroundEnvironment),
  "voiceId": string
}

Fields

speakingRate

number

Optional. The speaking rate. 1.0 is normal. Lower is slower (e.g., 0.8), higher is faster (e.g., 1.5). Useful for testing how the agent handles fast talkers.

environment

enum (BackgroundEnvironment)

Optional. The simulated audio environment.

voiceId

string

Optional. The specific voice identifier/accent to use. Example: "en-US-Wavenet-D" or "en-GB-Standard-A"

Status

JSON representation
{
  "code": integer,
  "message": string,
  "details": [
    {
      "@type": string,
      field1: ...,
      ...
    }
  ]
}

Fields

code

integer

The status code, which should be an enum value of google.rpc.Code.

message

string

A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.

details[]

object

A list of messages that carry the error details. There is a common set of message types for APIs to use.

An object containing fields of an arbitrary type. An additional field "@type" contains a URI identifying the type. Example: { "id": 1234, "@type": "types.example.com/standard/id" }.

Any

JSON representation
{ "typeUrl": string, "value": string }

Fields

typeUrl

string

Identifies the type of the serialized Protobuf message with a URI reference consisting of a prefix ending in a slash and the fully-qualified type name.

Example: type.googleapis.com/google.protobuf.StringValue

This string must contain at least one / character, and the content after the last / must be the fully-qualified name of the type in canonical form, without a leading dot. Do not write a scheme on these URI references so that clients do not attempt to contact them.

The prefix is arbitrary and Protobuf implementations are expected to simply strip off everything up to and including the last / to identify the type. type.googleapis.com/ is a common default prefix that some legacy implementations require. This prefix does not indicate the origin of the type, and URIs containing it are not expected to respond to any requests.

All type URL strings must be legal URI references with the additional restriction (for the text format) that the content of the reference must consist only of alphanumeric characters, percent-encoded escapes, and characters in the following set (not including the outer backticks): `/-.~_!$&()*+,;=`. Despite our allowing percent encodings, implementations should not unescape them to prevent confusion with existing parsers. For example, `type.googleapis.com%2FFoo` should be rejected.

In the original design of Any, the possibility of launching a type resolution service at these type URLs was considered but Protobuf never implemented one and considers contacting these URLs to be problematic and a potential security issue. Do not attempt to contact type URLs.

value

string (bytes format)

Holds a Protobuf serialization of the type described by type_url.

A base64-encoded string.
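
The type URL rule above (strip everything up to and including the last `/`) and the base64 encoding of `value` can be sketched as follows. The helper names are hypothetical, and the raw bytes returned by `decode_any_value` would still need to be parsed with the Protobuf message type named in the URL.

```python
import base64


def type_name_from_url(type_url: str) -> str:
    """Return the fully-qualified type name that follows the last "/"."""
    if "/" not in type_url:
        # Covers cases such as "type.googleapis.com%2FFoo", which must be rejected
        # because percent encodings are not unescaped.
        raise ValueError(f"type URL must contain at least one '/': {type_url!r}")
    return type_url.rsplit("/", 1)[1]


def decode_any_value(value: str) -> bytes:
    """Decode the base64 string in `value` into the serialized Protobuf bytes."""
    return base64.b64decode(value)
```

Neither helper contacts the type URL, in line with the guidance that these URLs should not be fetched.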

EvaluationMetricsThresholds

JSON representation

{
  "goldenEvaluationMetricsThresholds": {
    object (GoldenEvaluationMetricsThresholds)
  },
  "hallucinationMetricBehavior": enum (HallucinationMetricBehavior),
  "goldenHallucinationMetricBehavior": enum (HallucinationMetricBehavior),
  "scenarioHallucinationMetricBehavior": enum (HallucinationMetricBehavior)
}

Fields
`goldenEvaluationMetricsThresholds`	`object (GoldenEvaluationMetricsThresholds)` Optional. The golden evaluation metrics thresholds.
`hallucinationMetricBehavior (deprecated)`	`enum (HallucinationMetricBehavior)` This item is deprecated! Optional. Deprecated: Use `golden_hallucination_metric_behavior` instead. The hallucination metric behavior is currently used for golden evaluations.
`goldenHallucinationMetricBehavior`	`enum (HallucinationMetricBehavior)` Optional. The hallucination metric behavior for golden evaluations.
`scenarioHallucinationMetricBehavior`	`enum (HallucinationMetricBehavior)` Optional. The hallucination metric behavior for scenario evaluations.

GoldenEvaluationMetricsThresholds

JSON representation

{
  "turnLevelMetricsThresholds": {
    object (TurnLevelMetricsThresholds)
  },
  "expectationLevelMetricsThresholds": {
    object (ExpectationLevelMetricsThresholds)
  },
  "toolMatchingSettings": {
    object (ToolMatchingSettings)
  }
}

Fields

turnLevelMetricsThresholds

object (TurnLevelMetricsThresholds)

Optional. The turn level metrics thresholds.

expectationLevelMetricsThresholds

object (ExpectationLevelMetricsThresholds)

Optional. The expectation level metrics thresholds.

toolMatchingSettings

object (ToolMatchingSettings)

Optional. The tool matching settings. An extra tool call is a tool call that is present in the execution but does not match any tool call in the golden expectation.

TurnLevelMetricsThresholds

JSON representation

{
  "semanticSimilarityChannel": enum (SemanticSimilarityChannel),

  // Union field _semantic_similarity_success_threshold can be only one of the
  // following:
  "semanticSimilaritySuccessThreshold": integer
  // End of list of possible types for union field
  // _semantic_similarity_success_threshold.

  // Union field _overall_tool_invocation_correctness_threshold can be only one
  // of the following:
  "overallToolInvocationCorrectnessThreshold": number
  // End of list of possible types for union field
  // _overall_tool_invocation_correctness_threshold.
}

Fields

semanticSimilarityChannel

enum (SemanticSimilarityChannel)

Optional. The semantic similarity channel to use for evaluation.

Union field _semantic_similarity_success_threshold.

_semantic_similarity_success_threshold can be only one of the following:

semanticSimilaritySuccessThreshold

integer

Optional. The success threshold for semantic similarity. Must be an integer between 0 and 4. Defaults to 3, meaning a score of 3 or higher counts as a success.

Union field _overall_tool_invocation_correctness_threshold.

_overall_tool_invocation_correctness_threshold can be only one of the following:

overallToolInvocationCorrectnessThreshold

number

Optional. The success threshold for overall tool invocation correctness. Must be a float between 0 and 1. Default is 1.0.
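
The ranges and defaults documented for the turn-level thresholds can be checked before a configuration is used. The sketch below is illustrative only: the function name is hypothetical, and it reads the semantic similarity default as 3 and treats both ranges as inclusive, which are assumptions based on the field descriptions above.

```python
from typing import Any, Dict, Tuple


def resolve_turn_level_thresholds(thresholds: Dict[str, Any]) -> Tuple[int, float]:
    """Return (semantic_similarity, tool_invocation) thresholds with defaults applied."""
    semantic = thresholds.get("semanticSimilaritySuccessThreshold", 3)
    tool = thresholds.get("overallToolInvocationCorrectnessThreshold", 1.0)

    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(semantic, bool) or not isinstance(semantic, int) or not 0 <= semantic <= 4:
        raise ValueError("semanticSimilaritySuccessThreshold must be an integer between 0 and 4")
    if isinstance(tool, bool) or not isinstance(tool, (int, float)) or not 0.0 <= tool <= 1.0:
        raise ValueError("overallToolInvocationCorrectnessThreshold must be between 0 and 1")
    return semantic, float(tool)
```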

ExpectationLevelMetricsThresholds

JSON representation

{

  // Union field _tool_invocation_parameter_correctness_threshold can be only one
  // of the following:
  "toolInvocationParameterCorrectnessThreshold": number
  // End of list of possible types for union field
  // _tool_invocation_parameter_correctness_threshold.
}

Fields

Union field _tool_invocation_parameter_correctness_threshold.

_tool_invocation_parameter_correctness_threshold can be only one of the following:

toolInvocationParameterCorrectnessThreshold

number

Optional. The success threshold for individual tool invocation parameter correctness. Must be a float between 0 and 1. Default is 1.0.

ToolMatchingSettings

JSON representation
{ "extraToolCallBehavior": enum (`ExtraToolCallBehavior`) }

Fields

extraToolCallBehavior

enum (ExtraToolCallBehavior)

Optional. Behavior for extra tool calls. Defaults to FAIL.

EvaluationConfig

JSON representation

{
  "inputAudioConfig": {
    object (InputAudioConfig)
  },
  "outputAudioConfig": {
    object (OutputAudioConfig)
  },
  "evaluationChannel": enum (EvaluationChannel),
  "toolCallBehaviour": enum (EvaluationToolCallBehaviour)
}

Fields
`inputAudioConfig (deprecated)`	`object (InputAudioConfig)` This item is deprecated! Optional. Configuration for processing the input audio.
`outputAudioConfig (deprecated)`	`object (OutputAudioConfig)` This item is deprecated! Optional. Configuration for generating the output audio.
`evaluationChannel`	`enum (EvaluationChannel)` Optional. The channel to evaluate.
`toolCallBehaviour`	`enum (EvaluationToolCallBehaviour)` Optional. Specifies whether the evaluation should use real tool calls or fake tools.

InputAudioConfig

JSON representation
{
  "audioEncoding": enum (AudioEncoding),
  "sampleRateHertz": integer,
  "noiseSuppressionLevel": string
}

Fields

audioEncoding

enum (AudioEncoding)

Required. The encoding of the input audio data.

sampleRateHertz

integer

Required. The sample rate (in Hertz) of the input audio data.

noiseSuppressionLevel

string

Optional. Whether to enable noise suppression on the input audio. Available values are "low", "moderate", "high", "very_high".

OutputAudioConfig

JSON representation
{ "audioEncoding": enum (`AudioEncoding`), "sampleRateHertz": integer }

Fields

audioEncoding

enum (AudioEncoding)

Required. The encoding of the output audio data.

sampleRateHertz

integer

Required. The sample rate (in Hertz) of the output audio data.

Tool Annotations

Destructive Hint: ❌ | Idempotent Hint: ✅ | Read Only Hint: ✅ | Open World Hint: ❌
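
Since the annotations mark the tool as read-only and idempotent, sending the same request more than once should not change server state. The sketch below builds the JSON-RPC body used in the curl sample, with the single required `name` argument from the input schema filled in. The function name is hypothetical, `EVALUATION_RESULT_NAME` is a placeholder for a real resource name, and the block only constructs the payload; it does not send it.

```python
import json
from typing import Any, Dict


def build_get_evaluation_result_request(name: str, request_id: int = 1) -> Dict[str, Any]:
    """Build the tools/call body for get_evaluation_result."""
    if not name:
        raise ValueError("name is required")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_evaluation_result",
            "arguments": {"name": name},
        },
    }


# Placeholder resource name; substitute the evaluation result to retrieve.
body = json.dumps(build_get_evaluation_result_request("EVALUATION_RESULT_NAME"))
```

The resulting string can be passed as the `--data` payload of the curl request.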
