MCP Tools Reference: ces.googleapis.com

Tool: get_evaluation_result

Gets details of the specified evaluation result.

The following sample demonstrates how to use curl to invoke the get_evaluation_result MCP tool.

Curl Request
                  
curl --location 'https://ces.[REGION].rep.googleapis.com/mcp' \
--header 'content-type: application/json' \
--header 'accept: application/json, text/event-stream' \
--data '{
  "method": "tools/call",
  "params": {
    "name": "get_evaluation_result",
    "arguments": {
      "name": "projects/{project}/locations/{location}/apps/{app}/evaluations/{evaluation}/results/{result}"
    }
  },
  "jsonrpc": "2.0",
  "id": 1
}'
                

Input Schema

Request message for EvaluationService.GetEvaluationResult.

GetEvaluationResultRequest

JSON representation
{
  "name": string
}
Fields
name

string

Required. The resource name of the evaluation result to retrieve.
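
As a rough illustration, the request body for this tool can be assembled programmatically. This is a sketch, not an official client: the helper name build_get_evaluation_result_request and the sample resource IDs are hypothetical, and the resource name is assumed to follow the format documented for the name field of EvaluationResult.

```python
import json


def build_get_evaluation_result_request(name, request_id=1):
    """Build the JSON-RPC body for a tools/call to get_evaluation_result."""
    if not name:
        raise ValueError("name is required")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_evaluation_result",
            # The only documented argument is the resource name.
            "arguments": {"name": name},
        },
    }


# Hypothetical resource IDs, used only for illustration.
example_name = (
    "projects/my-project/locations/us/apps/my-app/"
    "evaluations/my-eval/results/my-result"
)
body = build_get_evaluation_result_request(example_name)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl as the --data payload shown in the sample request.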

Output Schema

An evaluation result represents the output of running an Evaluation.

EvaluationResult

JSON representation
{
  "name": string,
  "displayName": string,
  "createTime": string,
  "evaluationStatus": enum (Outcome),
  "evaluationRun": string,
  "persona": {
    object (EvaluationPersona)
  },
  "errorInfo": {
    object (EvaluationErrorInfo)
  },
  "error": {
    object (Status)
  },
  "initiatedBy": string,
  "appVersion": string,
  "appVersionDisplayName": string,
  "changelog": string,
  "changelogCreateTime": string,
  "executionState": enum (ExecutionState),
  "evaluationMetricsThresholds": {
    object (EvaluationMetricsThresholds)
  },
  "config": {
    object (EvaluationConfig)
  },
  "goldenRunMethod": enum (GoldenRunMethod),

  // Union field result can be only one of the following:
  "goldenResult": {
    object (GoldenResult)
  },
  "scenarioResult": {
    object (ScenarioResult)
  }
  // End of list of possible types for union field result.
}
Fields
name

string

Identifier. The unique identifier of the evaluation result. Format: projects/{project}/locations/{location}/apps/{app}/evaluations/{evaluation}/results/{result}

displayName

string

Required. Display name of the Evaluation Result. Unique within an Evaluation. By default, it has the following format: " result - ".

createTime

string (Timestamp format)

Output only. Timestamp when the evaluation result was created.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

evaluationStatus

enum (Outcome)

Output only. The outcome of the evaluation. Only populated if execution_state is COMPLETED.

evaluationRun

string

Output only. The evaluation run that produced this result. Format: projects/{project}/locations/{location}/apps/{app}/evaluationRuns/{evaluationRun}

persona

object (EvaluationPersona)

Output only. The persona used to generate the conversation for the evaluation result.

errorInfo

object (EvaluationErrorInfo)

Output only. Error information for the evaluation result.

error
(deprecated)

object (Status)

Output only. Deprecated: Use error_info instead. Errors encountered during execution.

initiatedBy

string

Output only. The user who initiated the evaluation run that resulted in this result.

appVersion

string

Output only. The app version used to generate the conversation that resulted in this result. Format: projects/{project}/locations/{location}/apps/{app}/versions/{version}

appVersionDisplayName

string

Output only. The display name of the app_version that the evaluation ran against.

changelog

string

Output only. The changelog of the app version that the evaluation ran against. This is populated if the user runs the evaluation on latest/draft.

changelogCreateTime

string (Timestamp format)

Output only. The create time of the changelog of the app version that the evaluation ran against. This is populated if the user runs the evaluation on latest/draft.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

executionState

enum (ExecutionState)

Output only. The state of the evaluation result execution.

evaluationMetricsThresholds

object (EvaluationMetricsThresholds)

Output only. The evaluation thresholds for the result.

config

object (EvaluationConfig)

Output only. The configuration used in the evaluation run that resulted in this result.

goldenRunMethod

enum (GoldenRunMethod)

Output only. The method used to run the golden evaluation.

Union field result. The result of the evaluation. Only populated when the execution_state is COMPLETED. result can be only one of the following:
goldenResult

object (GoldenResult)

Output only. The outcome of a golden evaluation.

scenarioResult

object (ScenarioResult)

Output only. The outcome of a scenario evaluation.
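
Because result is a union field, a consumer typically has to check which member is present before reading it. The following is a minimal sketch, assuming the response has already been parsed into a Python dict; the helper name select_result and the sample payload are hypothetical.

```python
def select_result(evaluation_result):
    """Return (field_name, value) for the populated `result` member, or None."""
    members = ("goldenResult", "scenarioResult")
    present = [m for m in members if m in evaluation_result]
    if len(present) > 1:
        # A union field can carry at most one member.
        raise ValueError("union field 'result' has more than one member set")
    if not present:
        # Nothing populated yet, e.g. the evaluation has not completed.
        return None
    return present[0], evaluation_result[present[0]]


# Hypothetical, heavily trimmed response used only for illustration.
sample = {"executionState": "COMPLETED", "scenarioResult": {"task": "book a flight"}}
print(select_result(sample))
```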

GoldenResult

JSON representation
{
  "turnReplayResults": [
    {
      object (TurnReplayResult)
    }
  ],
  "evaluationExpectationResults": [
    {
      object (EvaluationExpectationResult)
    }
  ]
}
Fields
turnReplayResults[]

object (TurnReplayResult)

Output only. The result of running each turn of the golden conversation.

evaluationExpectationResults[]

object (EvaluationExpectationResult)

Output only. The results of the evaluation expectations.

TurnReplayResult

JSON representation
{
  "conversation": string,
  "expectationOutcome": [
    {
      object (GoldenExpectationOutcome)
    }
  ],
  "hallucinationResult": {
    object (HallucinationResult)
  },
  "toolInvocationScore": number,
  "turnLatency": string,
  "toolCallLatencies": [
    {
      object (ToolCallLatency)
    }
  ],
  "semanticSimilarityResult": {
    object (SemanticSimilarityResult)
  },
  "overallToolInvocationResult": {
    object (OverallToolInvocationResult)
  },
  "errorInfo": {
    object (EvaluationErrorInfo)
  },
  "spanLatencies": [
    {
      object (SpanLatency)
    }
  ],

  // Union field _tool_ordered_invocation_score can be only one of the following:
  "toolOrderedInvocationScore": number
  // End of list of possible types for union field
  // _tool_ordered_invocation_score.
}
Fields
conversation

string

Output only. The conversation that was generated for this turn.

expectationOutcome[]

object (GoldenExpectationOutcome)

Output only. The outcome of each expectation.

hallucinationResult

object (HallucinationResult)

Output only. The result of the hallucination check.

toolInvocationScore
(deprecated)

number

Output only. Deprecated. Use OverallToolInvocationResult instead.

turnLatency

string (Duration format)

Output only. Duration of the turn.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

toolCallLatencies[]

object (ToolCallLatency)

Output only. The latency of each tool call in the turn.

semanticSimilarityResult

object (SemanticSimilarityResult)

Output only. The result of the semantic similarity check.

overallToolInvocationResult

object (OverallToolInvocationResult)

Output only. The result of the overall tool invocation check.

errorInfo

object (EvaluationErrorInfo)

Output only. Information about the error that occurred during this turn.

spanLatencies[]

object (SpanLatency)

Output only. The latency of spans in the turn.

Union field _tool_ordered_invocation_score.

_tool_ordered_invocation_score can be only one of the following:

toolOrderedInvocationScore

number

Output only. The overall tool ordered invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked in the expected order.

GoldenExpectationOutcome

JSON representation
{
  "expectation": {
    object (GoldenExpectation)
  },
  "outcome": enum (Outcome),
  "semanticSimilarityResult": {
    object (SemanticSimilarityResult)
  },
  "toolInvocationResult": {
    object (ToolInvocationResult)
  },

  // Union field result can be only one of the following:
  "observedToolCall": {
    object (ToolCall)
  },
  "observedToolResponse": {
    object (ToolResponse)
  },
  "observedAgentResponse": {
    object (Message)
  },
  "observedAgentTransfer": {
    object (AgentTransfer)
  }
  // End of list of possible types for union field result.
}
Fields
expectation

object (GoldenExpectation)

Output only. The expectation that was evaluated.

outcome

enum (Outcome)

Output only. The outcome of the expectation.

semanticSimilarityResult
(deprecated)

object (SemanticSimilarityResult)

Output only. The result of the semantic similarity check.

toolInvocationResult

object (ToolInvocationResult)

Output only. The result of the tool invocation check.

Union field result. The result of the expectation. result can be only one of the following:
observedToolCall

object (ToolCall)

Output only. The result of the tool call expectation.

observedToolResponse

object (ToolResponse)

Output only. The result of the tool response expectation.

observedAgentResponse

object (Message)

Output only. The result of the agent response expectation.

observedAgentTransfer

object (AgentTransfer)

Output only. The result of the agent transfer expectation.

ToolCall

JSON representation
{
  "id": string,
  "displayName": string,
  "args": {
    object
  },

  // Union field tool_identifier can be only one of the following:
  "tool": string,
  "toolsetTool": {
    object (ToolsetTool)
  }
  // End of list of possible types for union field tool_identifier.
}
Fields
id

string

Optional. The unique identifier of the tool call. If populated, the client should return the execution result with the matching ID in ToolResponse.

displayName

string

Output only. Display name of the tool.

args

object (Struct format)

Optional. The input parameters and values for the tool in JSON object format.

Union field tool_identifier. The identifier of the tool to execute. It could be either a persisted tool or a tool from a toolset. tool_identifier can be only one of the following:
tool

string

Optional. The name of the tool to execute. Format: projects/{project}/locations/{location}/apps/{app}/tools/{tool}

toolsetTool

object (ToolsetTool)

Optional. The toolset tool to execute.
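
The id field is what ties a ToolCall to its ToolResponse. A minimal sketch of that pairing, assuming both lists have already been parsed into Python dicts; the helper name and the sample values are hypothetical.

```python
def pair_calls_with_responses(tool_calls, tool_responses):
    """Pair each tool call with the response carrying the same id, if any."""
    responses_by_id = {r["id"]: r for r in tool_responses if r.get("id")}
    # Calls without an id, or without a matching response, pair with None.
    return [(call, responses_by_id.get(call.get("id"))) for call in tool_calls]


# Hypothetical values, used only for illustration.
calls = [
    {"id": "call-1", "tool": "projects/p/locations/l/apps/a/tools/t", "args": {"city": "Paris"}},
    {"id": "call-2", "tool": "projects/p/locations/l/apps/a/tools/t", "args": {"city": "Rome"}},
]
responses = [{"id": "call-1", "response": {"output": {"ok": True}}}]
pairs = pair_calls_with_responses(calls, responses)
```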

ToolsetTool

JSON representation
{
  "toolset": string,
  "toolId": string
}
Fields
toolset

string

Required. The resource name of the Toolset from which this tool is derived. Format: projects/{project}/locations/{location}/apps/{app}/toolsets/{toolset}

toolId

string

Optional. The tool ID to filter the tools to retrieve the schema for.

Struct

JSON representation
{
  "fields": {
    string: value,
    ...
  }
}
Fields
fields

map (key: string, value: value (Value format))

Unordered map of dynamically typed values.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

FieldsEntry

JSON representation
{
  "key": string,
  "value": value
}
Fields
key

string

value

value (Value format)

Value

JSON representation
{

  // Union field kind can be only one of the following:
  "nullValue": null,
  "numberValue": number,
  "stringValue": string,
  "boolValue": boolean,
  "structValue": {
    object
  },
  "listValue": array
  // End of list of possible types for union field kind.
}
Fields
Union field kind. The kind of value. kind can be only one of the following:
nullValue

null

Represents a null value.

numberValue

number

Represents a double value.

stringValue

string

Represents a string value.

boolValue

boolean

Represents a boolean value.

structValue

object (Struct format)

Represents a structured value.

listValue

array (ListValue format)

Represents a repeated Value.

ListValue

JSON representation
{
  "values": [
    value
  ]
}
Fields
values[]

value (Value format)

Repeated field of dynamically typed values.
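
The Struct, Value, and ListValue shapes above nest recursively. The sketch below converts a value written in that explicit union form into native Python objects. It assumes the verbose form is what you receive; payloads serialized as plain JSON objects (like the "wrench" example above) need no such decoding. The function names are hypothetical.

```python
def decode_value(value):
    """Convert a Value in the explicit union form into a native Python object."""
    if "nullValue" in value:
        return None
    for key in ("numberValue", "stringValue", "boolValue"):
        if key in value:
            return value[key]
    if "structValue" in value:
        return decode_struct(value["structValue"])
    if "listValue" in value:
        return [decode_value(v) for v in value["listValue"].get("values", [])]
    raise ValueError("Value has no recognized kind")


def decode_struct(struct):
    """Convert a Struct ({"fields": {...}}) into a plain dict."""
    return {key: decode_value(v) for key, v in struct.get("fields", {}).items()}


struct = {
    "fields": {
        "name": {"stringValue": "wrench"},
        "count": {"numberValue": 3},
        "tags": {"listValue": {"values": [{"boolValue": True}, {"nullValue": None}]}},
    }
}
decoded = decode_struct(struct)
```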

ToolResponse

JSON representation
{
  "id": string,
  "displayName": string,
  "response": {
    object
  },

  // Union field tool_identifier can be only one of the following:
  "tool": string,
  "toolsetTool": {
    object (ToolsetTool)
  }
  // End of list of possible types for union field tool_identifier.
}
Fields
id

string

Optional. The matching ID of the tool call the response is for.

displayName

string

Output only. Display name of the tool.

response

object (Struct format)

Required. The tool execution result in JSON object format. Use "output" key to specify tool response and "error" key to specify error details (if any). If "output" and "error" keys are not specified, then whole "response" is treated as tool execution result.

Union field tool_identifier. The identifier of the tool that got executed. It could be either a persisted tool or a tool from a toolset. tool_identifier can be only one of the following:
tool

string

Optional. The name of the tool to execute. Format: projects/{project}/locations/{location}/apps/{app}/tools/{tool}

toolsetTool

object (ToolsetTool)

Optional. The toolset tool that got executed.
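
The "output" and "error" convention described for the response field can be sketched as follows. This assumes response has been parsed into a Python dict; the helper name split_tool_response is hypothetical.

```python
def split_tool_response(response):
    """Return (output, error) from the `response` object of a ToolResponse."""
    if "output" in response or "error" in response:
        return response.get("output"), response.get("error")
    # Neither key present: the whole object is the tool execution result.
    return response, None
```

For example, split_tool_response({"temp": 21}) treats the whole object as the output, while split_tool_response({"error": "timeout"}) returns no output and the error details.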

Message

JSON representation
{
  "role": string,
  "chunks": [
    {
      object (Chunk)
    }
  ],
  "eventTime": string
}
Fields
role

string

Optional. The role within the conversation, e.g., user, agent.

chunks[]

object (Chunk)

Optional. Content of the message as a series of chunks.

eventTime

string (Timestamp format)

Optional. Timestamp when the message was sent or received. Should not be used if the message is part of an example.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".
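
Timestamp strings in this format carry up to nine fractional digits, which is more than Python's datetime can hold. The sketch below parses them by hand and truncates to microseconds; the function name parse_timestamp is hypothetical, and the truncation is a limitation of this sketch rather than of the API.

```python
import re
from datetime import datetime, timedelta, timezone

_RFC3339 = re.compile(
    r"^(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?(Z|[+-]\d{2}:\d{2})$"
)


def parse_timestamp(text):
    """Parse an RFC 3339 timestamp string into a UTC datetime."""
    match = _RFC3339.match(text)
    if match is None:
        raise ValueError(f"not an RFC 3339 timestamp: {text!r}")
    date_part, time_part, fraction, offset = match.groups()
    base = datetime.strptime(f"{date_part}T{time_part}", "%Y-%m-%dT%H:%M:%S")
    # Pad to nanoseconds, then keep the first six digits (microseconds).
    micros = int((fraction or "").ljust(9, "0")[:6])
    if offset == "Z":
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6])))
    return base.replace(microsecond=micros, tzinfo=tz).astimezone(timezone.utc)
```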

Chunk

JSON representation
{

  // Union field data can be only one of the following:
  "text": string,
  "transcript": string,
  "blob": {
    object (Blob)
  },
  "payload": {
    object
  },
  "image": {
    object (Image)
  },
  "toolCall": {
    object (ToolCall)
  },
  "toolResponse": {
    object (ToolResponse)
  },
  "agentTransfer": {
    object (AgentTransfer)
  },
  "updatedVariables": {
    object
  },
  "defaultVariables": {
    object
  }
  // End of list of possible types for union field data.
}
Fields
Union field data. Chunk data. data can be only one of the following:
text

string

Optional. Text data.

transcript

string

Optional. Transcript associated with the audio.

blob

object (Blob)

Optional. Blob data.

payload

object (Struct format)

Optional. Custom payload data.

image

object (Image)

Optional. Image data.

toolCall

object (ToolCall)

Optional. Tool execution request.

toolResponse

object (ToolResponse)

Optional. Tool execution response.

agentTransfer

object (AgentTransfer)

Optional. Agent transfer event.

updatedVariables

object (Struct format)

A struct representing the variables that were updated in the conversation, keyed by variable name.

defaultVariables

object (Struct format)

A struct representing the default variables at the start of the conversation, keyed by variable name.

Blob

JSON representation
{
  "mimeType": string,
  "data": string
}
Fields
mimeType

string

Required. The IANA standard MIME type of the source data.

data

string (bytes format)

Required. Raw bytes of the blob.

A base64-encoded string.

Image

JSON representation
{
  "mimeType": string,
  "data": string
}
Fields
mimeType

string

Required. The IANA standard MIME type of the source data. Supported image types include image/png, image/jpeg, and image/webp.

data

string (bytes format)

Required. Raw bytes of the image.

A base64-encoded string.
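
Reading an Image therefore means checking the MIME type and base64-decoding the data field. A minimal sketch, assuming the object has been parsed into a Python dict; the helper name decode_image and the sample bytes are hypothetical.

```python
import base64

SUPPORTED_IMAGE_TYPES = {"image/png", "image/jpeg", "image/webp"}


def decode_image(image):
    """Return (mime_type, raw_bytes) for an Image object."""
    mime_type = image["mimeType"]
    if mime_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type: {mime_type}")
    # validate=True rejects characters outside the base64 alphabet.
    return mime_type, base64.b64decode(image["data"], validate=True)


# Four illustrative bytes, not a complete image file.
sample_data = base64.b64encode(b"\x89PNG").decode("ascii")
mime, raw = decode_image({"mimeType": "image/png", "data": sample_data})
```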

AgentTransfer

JSON representation
{
  "targetAgent": string,
  "displayName": string
}
Fields
targetAgent

string

Required. The agent to which the conversation is being transferred. The agent will handle the conversation from this point forward. Format: projects/{project}/locations/{location}/apps/{app}/agents/{agent}

displayName

string

Output only. Display name of the agent.

Timestamp

JSON representation
{
  "seconds": string,
  "nanos": integer
}
Fields
seconds

string (int64 format)

Represents seconds of UTC time since Unix epoch 1970-01-01T00:00:00Z. Must be between -62135596800 and 253402300799 inclusive (which corresponds to 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z).

nanos

integer

Non-negative fractions of a second at nanosecond resolution. This field is the nanosecond portion of the duration, not an alternative to seconds. Negative second values with fractions must still have non-negative nanos values that count forward in time. Must be between 0 and 999,999,999 inclusive.
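
When a timestamp arrives in this seconds/nanos form rather than as an RFC 3339 string, it can be converted as sketched below. The function name is hypothetical, and nanoseconds are truncated to microseconds because that is the finest resolution Python's datetime supports.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_MIN_SECONDS = -62135596800   # 0001-01-01T00:00:00Z
_MAX_SECONDS = 253402300799   # 9999-12-31T23:59:59Z


def timestamp_to_datetime(timestamp):
    """Convert {"seconds": "<int64 as string>", "nanos": int} to a UTC datetime."""
    seconds = int(timestamp.get("seconds", "0"))
    nanos = int(timestamp.get("nanos", 0))
    if not _MIN_SECONDS <= seconds <= _MAX_SECONDS:
        raise ValueError("seconds out of range")
    if not 0 <= nanos <= 999_999_999:
        raise ValueError("nanos out of range")
    # nanos always counts forward in time, even when seconds is negative.
    return _EPOCH + timedelta(seconds=seconds, microseconds=nanos // 1000)
```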

GoldenExpectation

JSON representation
{
  "note": string,

  // Union field condition can be only one of the following:
  "toolCall": {
    object (ToolCall)
  },
  "toolResponse": {
    object (ToolResponse)
  },
  "agentResponse": {
    object (Message)
  },
  "agentTransfer": {
    object (AgentTransfer)
  },
  "updatedVariables": {
    object
  },
  "mockToolResponse": {
    object (ToolResponse)
  }
  // End of list of possible types for union field condition.
}
Fields
note

string

Optional. A note for this requirement, useful in reporting when specific checks fail. E.g., "Check_Payment_Tool_Called".

Union field condition. The actual check to perform. condition can be only one of the following:
toolCall

object (ToolCall)

Optional. Check that a specific tool was called with the parameters.

toolResponse

object (ToolResponse)

Optional. Check that a specific tool had the expected response.

agentResponse

object (Message)

Optional. Check that the agent responded with the correct response. The role "agent" is implied.

agentTransfer

object (AgentTransfer)

Optional. Check that the agent transferred the conversation to a different agent.

updatedVariables

object (Struct format)

Optional. Check that the agent updated the session variables to the expected values. Used to also capture agent variable updates for golden evals.

mockToolResponse

object (ToolResponse)

Optional. The tool response to mock, with the parameters of interest specified. Any parameters not specified will be hallucinated by the LLM.

SemanticSimilarityResult

JSON representation
{
  "label": string,
  "explanation": string,
  "outcome": enum (Outcome),

  // Union field _score can be only one of the following:
  "score": integer
  // End of list of possible types for union field _score.
}
Fields
label

string

Output only. The label associated with each score. Score 4: Fully Consistent; Score 3: Mostly Consistent; Score 2: Partially Consistent (Minor Omissions); Score 1: Largely Inconsistent (Major Omissions); Score 0: Completely Inconsistent / Contradictory.

explanation

string

Output only. The explanation for the semantic similarity score.

outcome

enum (Outcome)

Output only. The outcome of the semantic similarity check. This is determined by comparing the score to the semantic_similarity_success_threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.

Union field _score.

_score can be only one of the following:

score

integer

Output only. The semantic similarity score. Can be 0, 1, 2, 3, or 4.
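
The PASS/FAIL rule described for outcome is a plain threshold comparison. The sketch below restates it; the function name is hypothetical, and the threshold argument stands in for semantic_similarity_success_threshold, whose actual value comes from the evaluation's configuration.

```python
def similarity_outcome(score, threshold):
    """Return "PASS" when score is equal to or above threshold, else "FAIL"."""
    if score not in (0, 1, 2, 3, 4):
        raise ValueError("semantic similarity score must be 0, 1, 2, 3, or 4")
    return "PASS" if score >= threshold else "FAIL"
```

With a threshold of 3, a score of 3 passes and a score of 2 fails.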

ToolInvocationResult

JSON representation
{
  "outcome": enum (Outcome),
  "explanation": string,

  // Union field _parameter_correctness_score can be only one of the following:
  "parameterCorrectnessScore": number
  // End of list of possible types for union field _parameter_correctness_score.
}
Fields
outcome

enum (Outcome)

Output only. The outcome of the tool invocation check. This is determined by comparing the parameter_correctness_score to the threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.

explanation

string

Output only. A free text explanation for the tool invocation result.

Union field _parameter_correctness_score.

_parameter_correctness_score can be only one of the following:

parameterCorrectnessScore

number

Output only. The tool invocation parameter correctness score. This indicates the percent of parameters from the expected tool call that were also present in the actual tool call.

HallucinationResult

JSON representation
{
  "label": string,
  "explanation": string,

  // Union field _score can be only one of the following:
  "score": integer
  // End of list of possible types for union field _score.
}
Fields
label

string

Output only. The label associated with each score. Score 1: Justified; Score 0: Not Justified; Score -1: No Claim To Assess.

explanation

string

Output only. The explanation for the hallucination score.

Union field _score.

_score can be only one of the following:

score

integer

Output only. The hallucination score. Can be -1, 0, 1.

Duration

JSON representation
{
  "seconds": string,
  "nanos": integer
}
Fields
seconds

string (int64 format)

Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years

nanos

integer

Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 seconds field and a positive or negative nanos field. For durations of one second or more, a non-zero value for the nanos field must be of the same sign as the seconds field. Must be from -999,999,999 to +999,999,999 inclusive.
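
The sign rule for seconds and nanos is easy to get wrong, so here is a sketch that validates it while converting to a Python timedelta. The function name is hypothetical, and nanoseconds are truncated to microseconds because timedelta has no finer resolution.

```python
from datetime import timedelta

_MAX_SECONDS = 315_576_000_000
_MAX_NANOS = 999_999_999


def duration_to_timedelta(duration):
    """Convert {"seconds": "<int64 as string>", "nanos": int} to a timedelta."""
    seconds = int(duration.get("seconds", "0"))
    nanos = int(duration.get("nanos", 0))
    if abs(seconds) > _MAX_SECONDS:
        raise ValueError("seconds out of range")
    if abs(nanos) > _MAX_NANOS:
        raise ValueError("nanos out of range")
    # When both parts are non-zero they must share the same sign.
    if seconds and nanos and (seconds > 0) != (nanos > 0):
        raise ValueError("seconds and nanos must have the same sign")
    # int() truncates toward zero, which also holds for negative nanos.
    return timedelta(seconds=seconds, microseconds=int(nanos / 1000))
```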

ToolCallLatency

JSON representation
{
  "tool": string,
  "displayName": string,
  "startTime": string,
  "endTime": string,
  "executionLatency": string
}
Fields
tool

string

Output only. The name of the tool that got executed. Format: projects/{project}/locations/{location}/apps/{app}/tools/{tool}.

displayName

string

Output only. The display name of the tool.

startTime

string (Timestamp format)

Output only. The start time of the tool call execution.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

endTime

string (Timestamp format)

Output only. The end time of the tool call execution.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

executionLatency

string (Duration format)

Output only. The latency of the tool call execution.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".
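
One common use of these entries is adding up the time spent in tool calls. The sketch below parses the "3.5s"-style strings with Decimal to avoid floating-point drift and sums them; both function names are hypothetical, and the input is assumed to be the parsed toolCallLatencies list.

```python
from decimal import Decimal


def parse_duration(text):
    """Parse a duration string such as "3.5s" into seconds as a Decimal."""
    if not text.endswith("s"):
        raise ValueError(f"duration must end with 's': {text!r}")
    return Decimal(text[:-1])


def total_tool_latency(tool_call_latencies):
    """Sum executionLatency over a list of ToolCallLatency objects."""
    return sum(
        (parse_duration(item["executionLatency"]) for item in tool_call_latencies),
        Decimal("0"),
    )
```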

OverallToolInvocationResult

JSON representation
{
  "outcome": enum (Outcome),

  // Union field _tool_invocation_score can be only one of the following:
  "toolInvocationScore": number
  // End of list of possible types for union field _tool_invocation_score.
}
Fields
outcome

enum (Outcome)

Output only. The outcome of the tool invocation check. This is determined by comparing the tool_invocation_score to the overall_tool_invocation_correctness_threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.

Union field _tool_invocation_score.

_tool_invocation_score can be only one of the following:

toolInvocationScore

number

The overall tool invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked.

EvaluationErrorInfo

JSON representation
{
  "errorType": enum (ErrorType),
  "errorMessage": string,
  "sessionId": string
}
Fields
errorType

enum (ErrorType)

Output only. The type of error.

errorMessage

string

Output only. The error message.

sessionId

string

Output only. The session ID for the conversation that caused the error.

SpanLatency

JSON representation
{
  "type": enum (Type),
  "displayName": string,
  "startTime": string,
  "endTime": string,
  "executionLatency": string,

  // Union field identifier can be only one of the following:
  "resource": string,
  "toolset": {
    object (ToolsetTool)
  },
  "model": string,
  "callback": string
  // End of list of possible types for union field identifier.
}
Fields
type

enum (Type)

Output only. The type of span.

displayName

string

Output only. The display name of the span. Applicable to tool and guardrail spans.

startTime

string (Timestamp format)

Output only. The start time of span.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

endTime

string (Timestamp format)

Output only. The end time of span.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

executionLatency

string (Duration format)

Output only. The latency of span.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

Union field identifier. The identifier of the specific item based on its type. identifier can be only one of the following:
resource

string

Output only. The resource name of the guardrail or tool spans.

toolset

object (ToolsetTool)

Output only. The toolset tool identifier.

model

string

Output only. The name of the LLM span.

callback

string

Output only. The name of the user callback span.

EvaluationExpectationResult

JSON representation
{
  "evaluationExpectation": string,
  "prompt": string,
  "outcome": enum (Outcome),
  "explanation": string
}
Fields
evaluationExpectation

string

Output only. The evaluation expectation. Format: projects/{project}/locations/{location}/apps/{app}/evaluationExpectations/{evaluation_expectation}

prompt

string

Output only. The prompt that was used for the evaluation.

outcome

enum (Outcome)

Output only. The outcome of the evaluation expectation.

explanation

string

Output only. The explanation for the result.

ScenarioResult

JSON representation
{
  "conversation": string,
  "task": string,
  "userFacts": [
    {
      object (UserFact)
    }
  ],
  "expectationOutcomes": [
    {
      object (ScenarioExpectationOutcome)
    }
  ],
  "rubricOutcomes": [
    {
      object (ScenarioRubricOutcome)
    }
  ],
  "hallucinationResult": [
    {
      object (HallucinationResult)
    }
  ],
  "taskCompletionResult": {
    object (TaskCompletionResult)
  },
  "toolCallLatencies": [
    {
      object (ToolCallLatency)
    }
  ],
  "userGoalSatisfactionResult": {
    object (UserGoalSatisfactionResult)
  },
  "spanLatencies": [
    {
      object (SpanLatency)
    }
  ],
  "evaluationExpectationResults": [
    {
      object (EvaluationExpectationResult)
    }
  ],

  // Union field _all_expectations_satisfied can be only one of the following:
  "allExpectationsSatisfied": boolean
  // End of list of possible types for union field _all_expectations_satisfied.

  // Union field _task_completed can be only one of the following:
  "taskCompleted": boolean
  // End of list of possible types for union field _task_completed.
}
Fields
conversation

string

Output only. The conversation that was generated in the scenario.

task

string

Output only. The task that was used when running the scenario for this result.

userFacts[]

object (UserFact)

Output only. The user facts that were used by the scenario for this result.

expectationOutcomes[]

object (ScenarioExpectationOutcome)

Output only. The outcome of each expectation.

rubricOutcomes[]

object (ScenarioRubricOutcome)

Output only. The outcome of the rubric.

hallucinationResult[]

object (HallucinationResult)

Output only. The result of the hallucination check. There will be one hallucination result for each turn in the conversation.

taskCompletionResult
(deprecated)

object (TaskCompletionResult)

Output only. The result of the task completion check.

toolCallLatencies[]

object (ToolCallLatency)

Output only. The latency of each tool call execution in the conversation.

userGoalSatisfactionResult

object (UserGoalSatisfactionResult)

Output only. The result of the user goal satisfaction check.

spanLatencies[]

object (SpanLatency)

Output only. The latency of spans in the conversation.

evaluationExpectationResults[]

object (EvaluationExpectationResult)

Output only. The results of the evaluation expectations.

Union field _all_expectations_satisfied.

_all_expectations_satisfied can be only one of the following:

allExpectationsSatisfied

boolean

Output only. Whether all expectations were satisfied for this turn.

Union field _task_completed.

_task_completed can be only one of the following:

taskCompleted

boolean

Output only. Whether the task was completed for this turn. This is a composite of all expectations satisfied, no hallucinations, and user goal satisfaction.
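
To make the composite behind taskCompleted concrete, the sketch below recombines the three ingredients from a parsed ScenarioResult. It is an illustration only: the service's exact rule is not documented here. The sketch assumes that a hallucination score of 0 ("Not Justified") counts as a hallucination and that a user goal satisfaction score of 1 ("User Task Satisfied") counts as satisfied. The function name is hypothetical.

```python
def recompute_task_completed(scenario_result):
    """Approximate taskCompleted from its three documented ingredients."""
    expectations_ok = scenario_result.get("allExpectationsSatisfied") is True
    # Assumption: only score 0 ("Not Justified") counts as a hallucination.
    no_hallucinations = all(
        item.get("score") != 0
        for item in scenario_result.get("hallucinationResult", [])
    )
    # Assumption: only score 1 ("User Task Satisfied") counts as satisfied.
    goal = scenario_result.get("userGoalSatisfactionResult", {})
    goal_ok = goal.get("score") == 1
    return expectations_ok and no_hallucinations and goal_ok
```

Prefer the taskCompleted value returned by the service; a helper like this is mainly useful for explaining why a given result came out the way it did.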

UserFact

JSON representation
{
  "name": string,
  "value": string
}
Fields
name

string

Required. The name of the user fact.

value

string

Required. The value of the user fact.

ScenarioExpectationOutcome

JSON representation
{
  "expectation": {
    object (ScenarioExpectation)
  },
  "outcome": enum (Outcome),

  // Union field result can be only one of the following:
  "observedToolCall": {
    object (ObservedToolCall)
  },
  "observedAgentResponse": {
    object (Message)
  }
  // End of list of possible types for union field result.
}
Fields
expectation

object (ScenarioExpectation)

Output only. The expectation that was evaluated.

outcome

enum (Outcome)

Output only. The outcome of the ScenarioExpectation.

Union field result. The result of the expectation. result can be only one of the following:
observedToolCall

object (ObservedToolCall)

Output only. The observed tool call.

observedAgentResponse

object (Message)

Output only. The observed agent response.

ObservedToolCall

JSON representation
{
  "toolCall": {
    object (ToolCall)
  },
  "toolResponse": {
    object (ToolResponse)
  }
}
Fields
toolCall

object (ToolCall)

Output only. The observed tool call.

toolResponse

object (ToolResponse)

Output only. The observed tool response.

ScenarioExpectation

JSON representation
{

  // Union field expectation can be only one of the following:
  "toolExpectation": {
    object (ToolExpectation)
  },
  "agentResponse": {
    object (Message)
  }
  // End of list of possible types for union field expectation.
}
Fields
Union field expectation. The expectation to evaluate the conversation produced by the simulation. expectation can be only one of the following:
toolExpectation

object (ToolExpectation)

Optional. The tool call and response pair to be evaluated.

agentResponse

object (Message)

Optional. The agent response to be evaluated.

ToolExpectation

JSON representation
{
  "expectedToolCall": {
    object (ToolCall)
  },
  "mockToolResponse": {
    object (ToolResponse)
  }
}
Fields
expectedToolCall

object (ToolCall)

Required. The expected tool call, with the parameters of interest specified. Any parameters not specified will be hallucinated by the LLM.

mockToolResponse

object (ToolResponse)

Required. The tool response to mock, with the parameters of interest specified. Any parameters not specified will be hallucinated by the LLM.

ScenarioRubricOutcome

JSON representation
{
  "rubric": string,
  "scoreExplanation": string,

  // Union field _score can be only one of the following:
  "score": number
  // End of list of possible types for union field _score.
}
Fields
rubric

string

Output only. The rubric that was used to evaluate the conversation.

scoreExplanation

string

Output only. The rater's response to the rubric.

Union field _score.

_score can be only one of the following:

score

number

Output only. The score of the conversation against the rubric.

TaskCompletionResult

JSON representation
{
  "label": string,
  "explanation": string,

  // Union field _score can be only one of the following:
  "score": integer
  // End of list of possible types for union field _score.
}
Fields
label

string

Output only. The label associated with each score. Score 1: Task Completed. Score 0: Task Not Completed. Score -1: User Goal Undefined.

explanation

string

Output only. The explanation for the task completion score.

Union field _score.

_score can be only one of the following:

score

integer

Output only. The task completion score. Can be -1, 0, or 1.

UserGoalSatisfactionResult

JSON representation
{
  "label": string,
  "explanation": string,

  // Union field _score can be only one of the following:
  "score": integer
  // End of list of possible types for union field _score.
}
Fields
label

string

Output only. The label associated with each score. Score 1: User Task Satisfied. Score 0: User Task Not Satisfied. Score -1: User Task Unspecified.

explanation

string

Output only. The explanation for the user task satisfaction score.

Union field _score.

_score can be only one of the following:

score

integer

Output only. The user task satisfaction score. Can be -1, 0, or 1.
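
TaskCompletionResult and UserGoalSatisfactionResult share the same three-valued score. The sketch below shows one way to map that score to its documented meaning. It assumes the result has been parsed from JSON into a Python dict; the helper score_meaning and the two lookup tables are hypothetical names, and it is an assumption that the service returns exactly these strings in label.

```python
from typing import Any, Dict, Optional

# Score meanings copied from the `label` field descriptions. Whether the
# service returns exactly these strings in `label` is an assumption.
TASK_COMPLETION = {
    1: "Task Completed",
    0: "Task Not Completed",
    -1: "User Goal Undefined",
}
USER_GOAL_SATISFACTION = {
    1: "User Task Satisfied",
    0: "User Task Not Satisfied",
    -1: "User Task Unspecified",
}


def score_meaning(result: Dict[str, Any], meanings: Dict[int, str]) -> Optional[str]:
    """Map the optional `score` field to its documented meaning."""
    score = result.get("score")
    if score is None:  # `score` sits in the `_score` union and may be unset
        return None
    if score not in meanings:
        raise ValueError(f"unexpected score {score!r}; expected -1, 0, or 1")
    return meanings[score]
```

For example, score_meaning({"score": 1}, TASK_COMPLETION) returns "Task Completed", and a result without a score yields None.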

EvaluationPersona

JSON representation
{
  "name": string,
  "description": string,
  "displayName": string,
  "personality": string,
  "speechConfig": {
    object (SpeechConfig)
  }
}
Fields
name

string

Required. The unique identifier of the persona. Format: projects/{project}/locations/{location}/apps/{app}/evaluationPersonas/{evaluationPersona}

description

string

Optional. The description of the persona.

displayName

string

Required. The display name of the persona. Unique within an app.

personality

string

Required. An instruction for the agent on how to behave in the evaluation.

speechConfig

object (SpeechConfig)

Optional. Configuration for how the persona sounds (TTS settings).

SpeechConfig

JSON representation
{
  "speakingRate": number,
  "environment": enum (BackgroundEnvironment),
  "voiceId": string
}
Fields
speakingRate

number

Optional. The speaking rate. 1.0 is normal. Lower is slower (e.g., 0.8), higher is faster (e.g., 1.5). Useful for testing how the agent handles fast talkers.

environment

enum (BackgroundEnvironment)

Optional. The simulated audio environment.

voiceId

string

Optional. The specific voice identifier/accent to use. Example: "en-US-Wavenet-D" or "en-GB-Standard-A"

Status

JSON representation
{
  "code": integer,
  "message": string,
  "details": [
    {
      "@type": string,
      field1: ...,
      ...
    }
  ]
}
Fields
code

integer

The status code, which should be an enum value of google.rpc.Code.

message

string

A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.

details[]

object

A list of messages that carry the error details. There is a common set of message types for APIs to use.

An object containing fields of an arbitrary type. An additional field "@type" contains a URI identifying the type. Example: { "id": 1234, "@type": "types.example.com/standard/id" }.
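
As an illustration of how a client might read this structure, the sketch below turns a Status dict into a one-line summary. The function summarize_status is a hypothetical helper, and the code table lists only a subset of google.rpc.Code values; anything else is shown by its number.

```python
from typing import Any, Dict

# A subset of google.rpc.Code values; unknown codes fall back to the number.
RPC_CODE_NAMES = {
    0: "OK",
    3: "INVALID_ARGUMENT",
    5: "NOT_FOUND",
    7: "PERMISSION_DENIED",
    13: "INTERNAL",
    14: "UNAVAILABLE",
}


def summarize_status(status: Dict[str, Any]) -> str:
    """Build a one-line summary from a Status dict parsed from JSON."""
    code = status.get("code", 0)
    name = RPC_CODE_NAMES.get(code, str(code))
    message = status.get("message", "")
    # Each detail carries an "@type" URI naming its message type.
    detail_types = [d.get("@type", "<untyped>") for d in status.get("details", [])]
    summary = f"{name}: {message}"
    if detail_types:
        summary += " (details: " + ", ".join(detail_types) + ")"
    return summary
```

The summary is meant for logs; the message field is developer-facing and should not be shown to end users as is.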

Any

JSON representation
{
  "typeUrl": string,
  "value": string
}
Fields
typeUrl

string

Identifies the type of the serialized Protobuf message with a URI reference consisting of a prefix ending in a slash and the fully-qualified type name.

Example: type.googleapis.com/google.protobuf.StringValue

This string must contain at least one / character, and the content after the last / must be the fully-qualified name of the type in canonical form, without a leading dot. Do not write a scheme on these URI references so that clients do not attempt to contact them.

The prefix is arbitrary and Protobuf implementations are expected to simply strip off everything up to and including the last / to identify the type. type.googleapis.com/ is a common default prefix that some legacy implementations require. This prefix does not indicate the origin of the type, and URIs containing it are not expected to respond to any requests.

All type URL strings must be legal URI references with the additional restriction (for the text format) that the content of the reference must consist only of alphanumeric characters, percent-encoded escapes, and characters in the following set (not including the outer backticks): `/-.~_!$&()*+,;=`. Despite our allowing percent encodings, implementations should not unescape them to prevent confusion with existing parsers. For example, type.googleapis.com%2FFoo should be rejected.

In the original design of Any, the possibility of launching a type resolution service at these type URLs was considered but Protobuf never implemented one and considers contacting these URLs to be problematic and a potential security issue. Do not attempt to contact type URLs.

value

string (bytes format)

Holds a Protobuf serialization of the type described by typeUrl.

A base64-encoded string.
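
The rules above for reading an Any can be sketched as follows: the type name is whatever follows the last / in typeUrl, percent-encoded slashes are not unescaped, and value is decoded from base64. The function names are hypothetical, and decoding the resulting bytes into a message would need the matching Protobuf definition, which is not shown here.

```python
import base64


def type_name_from_url(type_url: str) -> str:
    """Return the fully-qualified type name after the last '/'.

    No percent-decoding is done, so 'type.googleapis.com%2FFoo' is rejected
    because it contains no literal '/'.
    """
    if "/" not in type_url:
        raise ValueError(f"type URL must contain '/': {type_url!r}")
    name = type_url.rsplit("/", 1)[1]
    if not name or name.startswith("."):
        raise ValueError(f"invalid type name in type URL: {type_url!r}")
    return name


def decode_any_value(value: str) -> bytes:
    """Decode the base64 `value` field into the serialized message bytes."""
    return base64.b64decode(value)
```

For example, type_name_from_url("type.googleapis.com/google.protobuf.StringValue") returns "google.protobuf.StringValue". The URL is never fetched, in line with the guidance not to contact type URLs.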

EvaluationMetricsThresholds

JSON representation
{
  "goldenEvaluationMetricsThresholds": {
    object (GoldenEvaluationMetricsThresholds)
  },
  "hallucinationMetricBehavior": enum (HallucinationMetricBehavior),
  "goldenHallucinationMetricBehavior": enum (HallucinationMetricBehavior),
  "scenarioHallucinationMetricBehavior": enum (HallucinationMetricBehavior)
}
Fields
goldenEvaluationMetricsThresholds

object (GoldenEvaluationMetricsThresholds)

Optional. The golden evaluation metrics thresholds.

hallucinationMetricBehavior
(deprecated)

enum (HallucinationMetricBehavior)

Optional. Deprecated: Use golden_hallucination_metric_behavior instead. The hallucination metric behavior is currently used for golden evaluations.

goldenHallucinationMetricBehavior

enum (HallucinationMetricBehavior)

Optional. The hallucination metric behavior for golden evaluations.

scenarioHallucinationMetricBehavior

enum (HallucinationMetricBehavior)

Optional. The hallucination metric behavior for scenario evaluations.

GoldenEvaluationMetricsThresholds

JSON representation
{
  "turnLevelMetricsThresholds": {
    object (TurnLevelMetricsThresholds)
  },
  "expectationLevelMetricsThresholds": {
    object (ExpectationLevelMetricsThresholds)
  },
  "toolMatchingSettings": {
    object (ToolMatchingSettings)
  }
}
Fields
turnLevelMetricsThresholds

object (TurnLevelMetricsThresholds)

Optional. The turn level metrics thresholds.

expectationLevelMetricsThresholds

object (ExpectationLevelMetricsThresholds)

Optional. The expectation level metrics thresholds.

toolMatchingSettings

object (ToolMatchingSettings)

Optional. The tool matching settings. An extra tool call is a tool call that is present in the execution but does not match any tool call in the golden expectation.

TurnLevelMetricsThresholds

JSON representation
{
  "semanticSimilarityChannel": enum (SemanticSimilarityChannel),

  // Union field _semantic_similarity_success_threshold can be only one of the
  // following:
  "semanticSimilaritySuccessThreshold": integer
  // End of list of possible types for union field
  // _semantic_similarity_success_threshold.

  // Union field _overall_tool_invocation_correctness_threshold can be only one
  // of the following:
  "overallToolInvocationCorrectnessThreshold": number
  // End of list of possible types for union field
  // _overall_tool_invocation_correctness_threshold.
}
Fields
semanticSimilarityChannel

enum (SemanticSimilarityChannel)

Optional. The semantic similarity channel to use for evaluation.

Union field _semantic_similarity_success_threshold.

_semantic_similarity_success_threshold can be only one of the following:

semanticSimilaritySuccessThreshold

integer

Optional. The success threshold for semantic similarity. Must be an integer between 0 and 4. Defaults to 3, so a score of 3 or higher counts as success.

Union field _overall_tool_invocation_correctness_threshold.

_overall_tool_invocation_correctness_threshold can be only one of the following:

overallToolInvocationCorrectnessThreshold

number

Optional. The success threshold for overall tool invocation correctness. Must be a float between 0 and 1. Default is 1.0.
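
Because both thresholds are optional, a client that displays or compares them may want to fill in the documented defaults first. The sketch below does that and checks the documented ranges. It assumes that an unset field means the default applies and that the default semantic similarity threshold is 3; resolve_turn_thresholds is a hypothetical helper name.

```python
from typing import Any, Dict, Tuple

# Defaults as documented in the field descriptions (assumed to apply when unset).
DEFAULT_SEMANTIC_SIMILARITY = 3
DEFAULT_TOOL_CORRECTNESS = 1.0


def resolve_turn_thresholds(thresholds: Dict[str, Any]) -> Tuple[int, float]:
    """Return (semantic similarity, tool correctness) thresholds with defaults."""
    similarity = thresholds.get(
        "semanticSimilaritySuccessThreshold", DEFAULT_SEMANTIC_SIMILARITY
    )
    correctness = thresholds.get(
        "overallToolInvocationCorrectnessThreshold", DEFAULT_TOOL_CORRECTNESS
    )
    if not isinstance(similarity, int) or not 0 <= similarity <= 4:
        raise ValueError(f"semantic similarity threshold out of range: {similarity!r}")
    if not 0.0 <= correctness <= 1.0:
        raise ValueError(f"tool correctness threshold out of range: {correctness!r}")
    return similarity, float(correctness)
```

With an empty TurnLevelMetricsThresholds object, the function returns (3, 1.0).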

ExpectationLevelMetricsThresholds

JSON representation
{

  // Union field _tool_invocation_parameter_correctness_threshold can be only one
  // of the following:
  "toolInvocationParameterCorrectnessThreshold": number
  // End of list of possible types for union field
  // _tool_invocation_parameter_correctness_threshold.
}
Fields

Union field _tool_invocation_parameter_correctness_threshold.

_tool_invocation_parameter_correctness_threshold can be only one of the following:

toolInvocationParameterCorrectnessThreshold

number

Optional. The success threshold for individual tool invocation parameter correctness. Must be a float between 0 and 1. Default is 1.0.

ToolMatchingSettings

JSON representation
{
  "extraToolCallBehavior": enum (ExtraToolCallBehavior)
}
Fields
extraToolCallBehavior

enum (ExtraToolCallBehavior)

Optional. Behavior for extra tool calls. Defaults to FAIL.

EvaluationConfig

JSON representation
{
  "inputAudioConfig": {
    object (InputAudioConfig)
  },
  "outputAudioConfig": {
    object (OutputAudioConfig)
  },
  "evaluationChannel": enum (EvaluationChannel),
  "toolCallBehaviour": enum (EvaluationToolCallBehaviour)
}
Fields
inputAudioConfig
(deprecated)

object (InputAudioConfig)

Optional. Configuration for processing the input audio.

outputAudioConfig
(deprecated)

object (OutputAudioConfig)

Optional. Configuration for generating the output audio.

evaluationChannel

enum (EvaluationChannel)

Optional. The channel to evaluate.

toolCallBehaviour

enum (EvaluationToolCallBehaviour)

Optional. Specifies whether the evaluation should use real tool calls or fake tools.

InputAudioConfig

JSON representation
{
  "audioEncoding": enum (AudioEncoding),
  "sampleRateHertz": integer,
  "noiseSuppressionLevel": string
}
Fields
audioEncoding

enum (AudioEncoding)

Required. The encoding of the input audio data.

sampleRateHertz

integer

Required. The sample rate (in Hertz) of the input audio data.

noiseSuppressionLevel

string

Optional. The noise suppression level to apply to the input audio. Available values are "low", "moderate", "high", and "very_high".
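
The field constraints above lend themselves to a simple client-side check. The following sketch reports missing required fields and unknown noise suppression levels for an InputAudioConfig dict; input_audio_config_problems is a hypothetical helper, and it does not validate audioEncoding values because the AudioEncoding enum is not listed in this section.

```python
from typing import Any, Dict, List

# Values listed in the noiseSuppressionLevel field description.
NOISE_SUPPRESSION_LEVELS = {"low", "moderate", "high", "very_high"}


def input_audio_config_problems(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    for field in ("audioEncoding", "sampleRateHertz"):
        if field not in config:
            problems.append(f"missing required field {field}")
    level = config.get("noiseSuppressionLevel")
    if level is not None and level not in NOISE_SUPPRESSION_LEVELS:
        problems.append(f"unknown noiseSuppressionLevel {level!r}")
    return problems
```

An empty list means the dict satisfies the constraints checked here, not that the service would accept it.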

OutputAudioConfig

JSON representation
{
  "audioEncoding": enum (AudioEncoding),
  "sampleRateHertz": integer
}
Fields
audioEncoding

enum (AudioEncoding)

Required. The encoding of the output audio data.

sampleRateHertz

integer

Required. The sample rate (in Hertz) of the output audio data.

Tool Annotations

Destructive Hint: ❌ | Idempotent Hint: ✅ | Read Only Hint: ✅ | Open World Hint: ❌