Tool: list_evaluations
Lists evaluations.
The following sample demonstrates how to use curl to invoke the list_evaluations MCP tool.
| Curl Request |
|---|
curl --location 'https://ces.[REGION].rep.googleapis.com/mcp' \ --header 'content-type: application/json' \ --header 'accept: application/json, text/event-stream' \ --data '{ "method": "tools/call", "params": { "name": "list_evaluations", "arguments": { // provide these details according to the tool's MCP specification } }, "jsonrpc": "2.0", "id": 1 }' |
Input Schema
Request message for EvaluationService.ListEvaluations.
ListEvaluationsRequest
| JSON representation |
|---|
{ "parent": string, "pageSize": integer, "pageToken": string, "filter": string, "evaluationFilter": string, "evaluationRunFilter": string, "orderBy": string, "lastTenResults": boolean } |
| Fields | |
|---|---|
parent |
Required. The resource name of the app to list evaluations from. |
pageSize |
Optional. Requested page size. Server may return fewer items than requested. If unspecified, server will pick an appropriate default. |
pageToken |
Optional. The |
filter |
Optional. Deprecated: Use evaluation_filter and evaluation_run_filter instead. |
evaluationFilter |
Optional. Filter to be applied on the evaluation when listing the evaluations. See https://google.aip.dev/160 for more details. Supported fields: evaluation_datasets |
evaluationRunFilter |
Optional. Filter string for fields on the associated EvaluationRun resources. See https://google.aip.dev/160 for more details. Supported fields: create_time, initiated_by, app_version_display_name |
orderBy |
Optional. Field to sort by. Only "name", "create_time", and "update_time" are supported. Time fields are ordered in descending order, and the name field is ordered in ascending order. If not included, "update_time" will be the default. See https://google.aip.dev/132#ordering for more details. |
lastTenResults |
Optional. Whether to include the last 10 evaluation results for each evaluation in the response. |
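As a rough illustration of how the fields above map onto the `arguments` object of the `tools/call` request, the following Python sketch assembles the JSON-RPC body shown in the curl sample. The helper name `build_list_evaluations_request` and its keyword arguments are hypothetical, and the deprecated `filter` field is deliberately left out; only the JSON keys come from the schema above.

```python
from typing import Any, Dict, Optional


def build_list_evaluations_request(
    parent: str,
    page_size: Optional[int] = None,
    page_token: Optional[str] = None,
    evaluation_filter: Optional[str] = None,
    evaluation_run_filter: Optional[str] = None,
    order_by: Optional[str] = None,
    last_ten_results: Optional[bool] = None,
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build a JSON-RPC body for the list_evaluations tool (hypothetical helper)."""
    if not parent:
        raise ValueError("parent is required")

    # Map the helper's keyword arguments onto the documented JSON field names.
    candidates = {
        "parent": parent,
        "pageSize": page_size,
        "pageToken": page_token,
        "evaluationFilter": evaluation_filter,
        "evaluationRunFilter": evaluation_run_filter,
        "orderBy": order_by,
        "lastTenResults": last_ten_results,
    }
    # Optional fields that were not provided are omitted entirely.
    arguments = {key: value for key, value in candidates.items() if value is not None}

    return {
        "method": "tools/call",
        "params": {"name": "list_evaluations", "arguments": arguments},
        "jsonrpc": "2.0",
        "id": request_id,
    }
```

The returned dictionary can be serialized with `json.dumps` and sent as the `--data` payload of the curl request above.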
Output Schema
Response message for EvaluationService.ListEvaluations.
ListEvaluationsResponse
| JSON representation |
|---|
{
"evaluations": [
{
object ( |
| Fields | |
|---|---|
evaluations[] |
The list of evaluations. |
nextPageToken |
A token that can be sent as |
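A paging loop over `evaluations[]` and `nextPageToken` might look like the sketch below. It assumes, as is common for paginated Google APIs, that the token is passed back in the request's `pageToken` field and that a missing or empty token means there are no further pages. `call_tool` is a hypothetical callable that sends the tool arguments and returns the decoded ListEvaluationsResponse as a dictionary.

```python
from typing import Any, Callable, Dict, Iterator


def iter_evaluations(
    call_tool: Callable[[Dict[str, Any]], Dict[str, Any]],
    parent: str,
    page_size: int = 50,
) -> Iterator[Dict[str, Any]]:
    """Yield every evaluation, following nextPageToken until it runs out."""
    arguments: Dict[str, Any] = {"parent": parent, "pageSize": page_size}
    while True:
        response = call_tool(arguments)
        # The server may return fewer items than pageSize, or none at all.
        yield from response.get("evaluations", [])
        token = response.get("nextPageToken")
        if not token:
            # Assumption: no token means the last page has been reached.
            return
        arguments = {**arguments, "pageToken": token}
```

Keeping the transport behind `call_tool` makes the loop easy to exercise with a stub before pointing it at a real endpoint.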
Evaluation
| JSON representation |
|---|
{ "name": string, "displayName": string, "description": string, "tags": [ string ], "evaluationDatasets": [ string ], "createTime": string, "createdBy": string, "updateTime": string, "lastUpdatedBy": string, "evaluationRuns": [ string ], "etag": string, "aggregatedMetrics": { object ( |
| Fields | |
|---|---|
name |
Identifier. The unique identifier of this evaluation. Format: |
displayName |
Required. User-defined display name of the evaluation. Unique within an App. |
description |
Optional. User-defined description of the evaluation. |
tags[] |
Optional. User defined tags to categorize the evaluation. |
evaluationDatasets[] |
Output only. List of evaluation datasets the evaluation belongs to. Format: |
createTime |
Output only. Timestamp when the evaluation was created. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
createdBy |
Output only. The user who created the evaluation. |
updateTime |
Output only. Timestamp when the evaluation was last updated. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
lastUpdatedBy |
Output only. The user who last updated the evaluation. |
evaluationRuns[] |
Output only. The EvaluationRuns that this Evaluation is associated with. |
etag |
Output only. Etag used to ensure the object hasn't changed during a read-modify-write operation. If the etag is empty, the update will overwrite any concurrent changes. |
aggregatedMetrics |
Output only. The aggregated metrics for this evaluation across all runs. |
lastCompletedResult |
Output only. The latest evaluation result for this evaluation. |
invalid |
Output only. Whether the evaluation is invalid. This can happen if an evaluation is referencing a tool, toolset, or agent that has since been deleted. |
lastTenResults[] |
Output only. The last 10 evaluation results for this evaluation. This is only populated if include_last_ten_results is set to true in the ListEvaluationsRequest or GetEvaluationRequest. |
Union field inputs. The inputs for the evaluation. inputs can be only one of the following: |
|
golden |
Optional. The golden steps to be evaluated. |
scenario |
Optional. The config for a scenario. |
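Because `golden` and `scenario` belong to the same union, a client can branch on whichever key is present. The sketch below is one way to do that on a decoded Evaluation dictionary; the function name `evaluation_input_kind` is hypothetical, and treating "neither key set" as `None` is an assumption rather than documented behavior.

```python
from typing import Any, Dict, Optional


def evaluation_input_kind(evaluation: Dict[str, Any]) -> Optional[str]:
    """Return "golden", "scenario", or None for a decoded Evaluation object."""
    present = [key for key in ("golden", "scenario") if key in evaluation]
    if len(present) > 1:
        # A union field can hold only one member at a time.
        raise ValueError("Evaluation sets both golden and scenario")
    return present[0] if present else None
```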
Golden
| JSON representation |
|---|
{
"turns": [
{
object ( |
| Fields | |
|---|---|
turns[] |
Required. The golden turns required to replay a golden conversation. |
evaluationExpectations[] |
Optional. The evaluation expectations to evaluate the replayed conversation against. Format: |
GoldenTurn
| JSON representation |
|---|
{ "steps": [ { object ( |
| Fields | |
|---|---|
steps[] |
Required. The steps required to replay a golden conversation. |
rootSpan |
Optional. The root span of the golden turn for processing and maintaining audio information. |
Step
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field step. The step to perform. step can be only one of the following: |
|
userInput |
Optional. User input for the conversation. |
agentTransfer |
Optional. Transfer the conversation to a different agent. |
expectation |
Optional. Executes an expectation on the current turn. |
SessionInput
| JSON representation |
|---|
{ "willContinue": boolean, // Union field |
| Fields | |
|---|---|
willContinue |
Optional. A flag to indicate if the current message is a fragment of a larger input in the bidi streaming session. When set to NOTE: This field does not apply to audio and DTMF inputs, as they are always processed automatically based on the endpointing signal. |
Union field input_type. The type of the input. input_type can be only one of the following: |
|
text |
Optional. Text data from the end user. |
dtmf |
Optional. DTMF digits from the end user. |
audio |
Optional. Audio data from the end user. A base64-encoded string. |
toolResponses |
Optional. Execution results for the tool calls from the client. |
image |
Optional. Image data from the end user. |
blob |
Optional. Blob data from the end user. |
variables |
Optional. Contextual variables for the session, keyed by name. Only variables declared in the app will be used by the CES agent. Unrecognized variables will still be sent to the Dialogflow agent (Agent.RemoteDialogflowAgent) as additional session parameters. |
event |
Optional. Event input. |
ToolResponses
| JSON representation |
|---|
{
"toolResponses": [
{
object ( |
| Fields | |
|---|---|
toolResponses[] |
Optional. The list of tool execution results. |
ToolResponse
| JSON representation |
|---|
{ "id": string, "displayName": string, "response": { object }, // Union field |
| Fields | |
|---|---|
id |
Optional. The matching ID of the |
displayName |
Output only. Display name of the tool. |
response |
Required. The tool execution result in JSON object format. Use "output" key to specify tool response and "error" key to specify error details (if any). If "output" and "error" keys are not specified, then whole "response" is treated as tool execution result. |
Union field tool_identifier. The identifier of the tool that got executed. It could be either a persisted tool or a tool from a toolset. tool_identifier can be only one of the following: |
|
tool |
Optional. The name of the tool to execute. Format: |
toolsetTool |
Optional. The toolset tool that got executed. |
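The `"output"` / `"error"` convention described for the `response` field can be read back with a small helper. This is a minimal sketch; the name `split_tool_response` is hypothetical, and the function only mirrors the rule stated in the field description.

```python
from typing import Any, Dict, Optional, Tuple


def split_tool_response(response: Dict[str, Any]) -> Tuple[Any, Optional[Any]]:
    """Return (output, error) from the `response` object of a ToolResponse."""
    if "output" in response or "error" in response:
        # At least one of the reserved keys is used, so read them directly.
        return response.get("output"), response.get("error")
    # Neither key is present: the whole object is the tool execution result.
    return response, None
```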
ToolsetTool
| JSON representation |
|---|
{ "toolset": string, "toolId": string } |
| Fields | |
|---|---|
toolset |
Required. The resource name of the Toolset from which this tool is derived. Format: |
toolId |
Optional. The tool ID to filter the tools to retrieve the schema for. |
Struct
| JSON representation |
|---|
{ "fields": { string: value, ... } } |
| Fields | |
|---|---|
fields |
Unordered map of dynamically typed values. An object containing a list of |
FieldsEntry
| JSON representation |
|---|
{ "key": string, "value": value } |
| Fields | |
|---|---|
key |
|
value |
|
Value
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field kind. The kind of value. kind can be only one of the following: |
|
nullValue |
Represents a null value. |
numberValue |
Represents a double value. |
stringValue |
Represents a string value. |
boolValue |
Represents a boolean value. |
structValue |
Represents a structured value. |
listValue |
Represents a repeated |
ListValue
| JSON representation |
|---|
{ "values": [ value ] } |
| Fields | |
|---|---|
values[] |
Repeated field of dynamically typed values. |
Image
| JSON representation |
|---|
{ "mimeType": string, "data": string } |
| Fields | |
|---|---|
mimeType |
Required. The IANA standard MIME type of the source data. Supported image types include: * image/png * image/jpeg * image/webp |
data |
Required. Raw bytes of the image. A base64-encoded string. |
Blob
| JSON representation |
|---|
{ "mimeType": string, "data": string } |
| Fields | |
|---|---|
mimeType |
Required. The IANA standard MIME type of the source data. |
data |
Required. Raw bytes of the blob. A base64-encoded string. |
Event
| JSON representation |
|---|
{ "event": string } |
| Fields | |
|---|---|
event |
Required. The name of the event. |
AgentTransfer
| JSON representation |
|---|
{ "targetAgent": string, "displayName": string } |
| Fields | |
|---|---|
targetAgent |
Required. The agent to which the conversation is being transferred. The agent will handle the conversation from this point forward. Format: |
displayName |
Output only. Display name of the agent. |
GoldenExpectation
| JSON representation |
|---|
{ "note": string, // Union field |
| Fields | |
|---|---|
note |
Optional. A note for this requirement, useful in reporting when specific checks fail. E.g., "Check_Payment_Tool_Called". |
Union field condition. The actual check to perform. condition can be only one of the following: |
|
toolCall |
Optional. Check that a specific tool was called with the parameters. |
toolResponse |
Optional. Check that a specific tool had the expected response. |
agentResponse |
Optional. Check that the agent responded with the correct response. The role "agent" is implied. |
agentTransfer |
Optional. Check that the agent transferred the conversation to a different agent. |
updatedVariables |
Optional. Check that the agent updated the session variables to the expected values. Also used to capture agent variable updates for golden evals. |
mockToolResponse |
Optional. The tool response to mock, with the parameters of interest specified. Any parameters not specified will be hallucinated by the LLM. |
ToolCall
| JSON representation |
|---|
{ "id": string, "displayName": string, "args": { object }, // Union field |
| Fields | |
|---|---|
id |
Optional. The unique identifier of the tool call. If populated, the client should return the execution result with the matching ID in |
displayName |
Output only. Display name of the tool. |
args |
Optional. The input parameters and values for the tool in JSON object format. |
Union field tool_identifier. The identifier of the tool to execute. It could be either a persisted tool or a tool from a toolset. tool_identifier can be only one of the following: |
|
tool |
Optional. The name of the tool to execute. Format: |
toolsetTool |
Optional. The toolset tool to execute. |
Message
| JSON representation |
|---|
{
"role": string,
"chunks": [
{
object ( |
| Fields | |
|---|---|
role |
Optional. The role within the conversation, e.g., user, agent. |
chunks[] |
Optional. Content of the message as a series of chunks. |
eventTime |
Optional. Timestamp when the message was sent or received. Should not be used if the message is part of an Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
Chunk
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field data. Chunk data. data can be only one of the following: |
|
text |
Optional. Text data. |
transcript |
Optional. Transcript associated with the audio. |
blob |
Optional. Blob data. |
payload |
Optional. Custom payload data. |
image |
Optional. Image data. |
toolCall |
Optional. Tool execution request. |
toolResponse |
Optional. Tool execution response. |
agentTransfer |
Optional. Agent transfer event. |
updatedVariables |
A struct representing the variables that were updated in the conversation, keyed by variable names. |
defaultVariables |
A struct representing the default variables at the start of the conversation, keyed by variable names. |
Timestamp
| JSON representation |
|---|
{ "seconds": string, "nanos": integer } |
| Fields | |
|---|---|
seconds |
Represents seconds of UTC time since Unix epoch 1970-01-01T00:00:00Z. Must be between -62135596800 and 253402300799 inclusive (which corresponds to 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z). |
nanos |
Non-negative fractions of a second at nanosecond resolution. This field is the nanosecond portion of the duration, not an alternative to seconds. Negative second values with fractions must still have non-negative nanos values that count forward in time. Must be between 0 and 999,999,999 inclusive. |
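To make the `seconds` / `nanos` split concrete, the sketch below converts the pair into a Python `datetime` while enforcing the documented bounds. The function name `timestamp_to_datetime` is hypothetical. Note that `datetime` only stores microseconds, so the last three nanosecond digits are dropped.

```python
from datetime import datetime, timedelta, timezone

MIN_SECONDS = -62135596800   # 0001-01-01T00:00:00Z
MAX_SECONDS = 253402300799   # 9999-12-31T23:59:59Z
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_to_datetime(seconds: str, nanos: int = 0) -> datetime:
    """Convert a Timestamp's seconds (a string in JSON) and nanos to UTC."""
    secs = int(seconds)
    if not MIN_SECONDS <= secs <= MAX_SECONDS:
        raise ValueError(f"seconds out of range: {secs}")
    if not 0 <= nanos <= 999_999_999:
        raise ValueError(f"nanos out of range: {nanos}")
    # nanos always counts forward in time, even when seconds is negative.
    return EPOCH + timedelta(seconds=secs, microseconds=nanos // 1000)
```

For example, `seconds="-1"` with `nanos=500000000` lands half a second before the epoch's final second ends, at 1969-12-31T23:59:59.5Z, because the nanosecond part is added on top of the negative seconds value.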
Span
| JSON representation |
|---|
{
"name": string,
"startTime": string,
"endTime": string,
"duration": string,
"attributes": {
object
},
"childSpans": [
{
object ( |
| Fields | |
|---|---|
name |
Output only. The name of the span. |
startTime |
Output only. The start time of the span. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
endTime |
Output only. The end time of the span. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
duration |
Output only. The duration of the span. A duration in seconds with up to nine fractional digits, ending with ' |
attributes |
Output only. Key-value attributes associated with the span. |
childSpans[] |
Output only. The child spans that are nested under this span. |
Duration
| JSON representation |
|---|
{ "seconds": string, "nanos": integer } |
| Fields | |
|---|---|
seconds |
Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years |
nanos |
Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 |
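Several fields in this reference carry durations as strings rather than as a `seconds` / `nanos` object. The sketch below parses such a string under the assumption that it follows the usual protobuf JSON convention of a decimal number of seconds with a trailing `s` (for example `"3.5s"`); the function name `parse_duration` is hypothetical. `Decimal` is used so that all nine fractional digits survive.

```python
from decimal import Decimal

# 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years
MAX_DURATION_SECONDS = 315_576_000_000


def parse_duration(text: str) -> Decimal:
    """Parse a duration string such as "3.5s" into seconds (assumed format)."""
    if not text.endswith("s"):
        raise ValueError(f"expected a trailing 's': {text!r}")
    # Decimal raises decimal.InvalidOperation if the number part is malformed.
    value = Decimal(text[:-1])
    if abs(value) > MAX_DURATION_SECONDS:
        raise ValueError(f"duration out of range: {text!r}")
    return value
```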
Scenario
| JSON representation |
|---|
{ "task": string, "userFacts": [ { object ( |
| Fields | |
|---|---|
task |
Required. The task to be targeted by the scenario. |
userFacts[] |
Optional. The user facts to be used by the scenario. |
maxTurns |
Optional. The maximum number of turns to simulate. If not specified, the simulation will continue until the task is complete. |
rubrics[] |
Required. The rubrics to score the scenario against. |
scenarioExpectations[] |
Required. The ScenarioExpectations to evaluate the conversation produced by the user simulation. |
variableOverrides |
Optional. Variables / Session Parameters as context for the session, keyed by variable names. Members of this struct will override any default values set by the system. Note that these are different from user facts, which are facts known to the user. Variables are parameters known to the agent, for example the MDN (phone number) passed by the telephony system. |
taskCompletionBehavior |
Optional. Deprecated. Use user_goal_behavior instead. |
userGoalBehavior |
Optional. The expected behavior of the user goal. |
evaluationExpectations[] |
Optional. The evaluation expectations to evaluate the conversation produced by the simulation against. Format: |
UserFact
| JSON representation |
|---|
{ "name": string, "value": string } |
| Fields | |
|---|---|
name |
Required. The name of the user fact. |
value |
Required. The value of the user fact. |
ScenarioExpectation
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field expectation. The expectation to evaluate the conversation produced by the simulation. expectation can be only one of the following: |
|
toolExpectation |
Optional. The tool call and response pair to be evaluated. |
agentResponse |
Optional. The agent response to be evaluated. |
ToolExpectation
| JSON representation |
|---|
{ "expectedToolCall": { object ( |
| Fields | |
|---|---|
expectedToolCall |
Required. The expected tool call, with the parameters of interest specified. Any parameters not specified will be hallucinated by the LLM. |
mockToolResponse |
Required. The tool response to mock, with the parameters of interest specified. Any parameters not specified will be hallucinated by the LLM. |
AggregatedMetrics
| JSON representation |
|---|
{
"metricsByAppVersion": [
{
object ( |
| Fields | |
|---|---|
metricsByAppVersion[] |
Output only. Aggregated metrics, grouped by app version ID. |
MetricsByAppVersion
| JSON representation |
|---|
{ "appVersionId": string, "toolMetrics": [ { object ( |
| Fields | |
|---|---|
appVersionId |
Output only. The app version ID. |
toolMetrics[] |
Output only. Metrics for each tool within this app version. |
semanticSimilarityMetrics[] |
Output only. Metrics for semantic similarity within this app version. |
hallucinationMetrics[] |
Output only. Metrics for hallucination within this app version. |
toolCallLatencyMetrics[] |
Output only. Metrics for tool call latency within this app version. |
turnLatencyMetrics[] |
Output only. Metrics for turn latency within this app version. |
passCount |
Output only. The number of times the evaluation passed. |
failCount |
Output only. The number of times the evaluation failed. |
metricsByTurn[] |
Output only. Metrics aggregated per turn within this app version. |
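The `passCount` and `failCount` fields lend themselves to a derived pass rate. The response does not include such a value; the sketch below is a client-side convenience, and the name `pass_rate` is hypothetical. It works on any decoded object that carries both counters, such as MetricsByAppVersion.

```python
from typing import Any, Dict, Optional


def pass_rate(metrics: Dict[str, Any]) -> Optional[float]:
    """Return passCount / (passCount + failCount), or None with no runs."""
    passed = int(metrics.get("passCount", 0))
    failed = int(metrics.get("failCount", 0))
    total = passed + failed
    # Avoid dividing by zero when nothing has been evaluated yet.
    return passed / total if total else None
```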
ToolMetrics
| JSON representation |
|---|
{ "tool": string, "passCount": integer, "failCount": integer } |
| Fields | |
|---|---|
tool |
Output only. The name of the tool. |
passCount |
Output only. The number of times the tool passed. |
failCount |
Output only. The number of times the tool failed. |
SemanticSimilarityMetrics
| JSON representation |
|---|
{ "score": number } |
| Fields | |
|---|---|
score |
Output only. The average semantic similarity score (0-4). |
HallucinationMetrics
| JSON representation |
|---|
{ "score": number } |
| Fields | |
|---|---|
score |
Output only. The average hallucination score (0 to 1). |
ToolCallLatencyMetrics
| JSON representation |
|---|
{ "tool": string, "averageLatency": string } |
| Fields | |
|---|---|
tool |
Output only. The name of the tool. |
averageLatency |
Output only. The average latency of the tool calls. A duration in seconds with up to nine fractional digits, ending with ' |
TurnLatencyMetrics
| JSON representation |
|---|
{ "averageLatency": string } |
| Fields | |
|---|---|
averageLatency |
Output only. The average latency of the turns. A duration in seconds with up to nine fractional digits, ending with ' |
MetricsByTurn
| JSON representation |
|---|
{ "turnIndex": integer, "toolMetrics": [ { object ( |
| Fields | |
|---|---|
turnIndex |
Output only. The turn index (0-based). |
toolMetrics[] |
Output only. Metrics for each tool within this turn. |
semanticSimilarityMetrics[] |
Output only. Metrics for semantic similarity within this turn. |
hallucinationMetrics[] |
Output only. Metrics for hallucination within this turn. |
toolCallLatencyMetrics[] |
Output only. Metrics for tool call latency within this turn. |
turnLatencyMetrics[] |
Output only. Metrics for turn latency within this turn. |
EvaluationResult
| JSON representation |
|---|
{ "name": string, "displayName": string, "createTime": string, "evaluationStatus": enum ( |
| Fields | |
|---|---|
name |
Identifier. The unique identifier of the evaluation result. Format: |
displayName |
Required. Display name of the Evaluation Result. Unique within an Evaluation. By default, it has the following format: " |
createTime |
Output only. Timestamp when the evaluation result was created. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
evaluationStatus |
Output only. The outcome of the evaluation. Only populated if execution_state is COMPLETE. |
evaluationRun |
Output only. The evaluation run that produced this result. Format: |
persona |
Output only. The persona used to generate the conversation for the evaluation result. |
errorInfo |
Output only. Error information for the evaluation result. |
error |
Output only. Deprecated: Use |
initiatedBy |
Output only. The user who initiated the evaluation run that resulted in this result. |
appVersion |
Output only. The app version used to generate the conversation that resulted in this result. Format: |
appVersionDisplayName |
Output only. The display name of the |
changelog |
Output only. The changelog of the app version that the evaluation ran against. This is populated if the user runs the evaluation on latest/draft. |
changelogCreateTime |
Output only. The create time of the changelog of the app version that the evaluation ran against. This is populated if the user runs the evaluation on latest/draft. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
executionState |
Output only. The state of the evaluation result execution. |
evaluationMetricsThresholds |
Output only. The evaluation thresholds for the result. |
config |
Output only. The configuration used in the evaluation run that resulted in this result. |
goldenRunMethod |
Output only. The method used to run the golden evaluation. |
Union field result. The result of the evaluation. Only populated when the execution_state is COMPLETED. result can be only one of the following: |
|
goldenResult |
Output only. The outcome of a golden evaluation. |
scenarioResult |
Output only. The outcome of a scenario evaluation. |
GoldenResult
| JSON representation |
|---|
{ "turnReplayResults": [ { object ( |
| Fields | |
|---|---|
turnReplayResults[] |
Output only. The result of running each turn of the golden conversation. |
evaluationExpectationResults[] |
Output only. The results of the evaluation expectations. |
TurnReplayResult
| JSON representation |
|---|
{ "conversation": string, "expectationOutcome": [ { object ( |
| Fields | |
|---|---|
conversation |
Output only. The conversation that was generated for this turn. |
expectationOutcome[] |
Output only. The outcome of each expectation. |
hallucinationResult |
Output only. The result of the hallucination check. |
toolInvocationScore |
Output only. Deprecated. Use OverallToolInvocationResult instead. |
turnLatency |
Output only. Duration of the turn. A duration in seconds with up to nine fractional digits, ending with ' |
toolCallLatencies[] |
Output only. The latency of each tool call in the turn. |
semanticSimilarityResult |
Output only. The result of the semantic similarity check. |
overallToolInvocationResult |
Output only. The result of the overall tool invocation check. |
errorInfo |
Output only. Information about the error that occurred during this turn. |
spanLatencies[] |
Output only. The latency of spans in the turn. |
Union field
|
|
toolOrderedInvocationScore |
Output only. The overall tool ordered invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked in the expected order. |
GoldenExpectationOutcome
| JSON representation |
|---|
{ "expectation": { object ( |
| Fields | |
|---|---|
expectation |
Output only. The expectation that was evaluated. |
outcome |
Output only. The outcome of the expectation. |
semanticSimilarityResult |
Output only. The result of the semantic similarity check. |
toolInvocationResult |
Output only. The result of the tool invocation check. |
Union field result. The result of the expectation. result can be only one of the following: |
|
observedToolCall |
Output only. The result of the tool call expectation. |
observedToolResponse |
Output only. The result of the tool response expectation. |
observedAgentResponse |
Output only. The result of the agent response expectation. |
observedAgentTransfer |
Output only. The result of the agent transfer expectation. |
SemanticSimilarityResult
| JSON representation |
|---|
{ "label": string, "explanation": string, "outcome": enum ( |
| Fields | |
|---|---|
label |
Output only. The label associated with each score. Score 4: Fully Consistent Score 3: Mostly Consistent Score 2: Partially Consistent (Minor Omissions) Score 1: Largely Inconsistent (Major Omissions) Score 0: Completely Inconsistent / Contradictory |
explanation |
Output only. The explanation for the semantic similarity score. |
outcome |
Output only. The outcome of the semantic similarity check. This is determined by comparing the score to the semantic_similarity_success_threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL. |
Union field
|
|
score |
Output only. The semantic similarity score. Can be 0, 1, 2, 3, or 4. |
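The relationship between `score`, `label`, and `outcome` can be sketched as follows. The label text and the PASS/FAIL rule come from the field descriptions above; the function name, the dictionary name, and the idea of passing the threshold in as a plain number are assumptions made for illustration.

```python
SEMANTIC_SIMILARITY_LABELS = {
    4: "Fully Consistent",
    3: "Mostly Consistent",
    2: "Partially Consistent (Minor Omissions)",
    1: "Largely Inconsistent (Major Omissions)",
    0: "Completely Inconsistent / Contradictory",
}


def semantic_similarity_outcome(score: int, threshold: float) -> str:
    """Return "PASS" if score meets the success threshold, else "FAIL"."""
    if score not in SEMANTIC_SIMILARITY_LABELS:
        raise ValueError(f"score must be 0, 1, 2, 3, or 4, got {score!r}")
    # A score equal to the threshold still passes.
    return "PASS" if score >= threshold else "FAIL"
```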
ToolInvocationResult
| JSON representation |
|---|
{ "outcome": enum ( |
| Fields | |
|---|---|
outcome |
Output only. The outcome of the tool invocation check. This is determined by comparing the parameter_correctness_score to the threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL. |
explanation |
Output only. A free text explanation for the tool invocation result. |
Union field
|
|
parameterCorrectnessScore |
Output only. The tool invocation parameter correctness score. This indicates the percent of parameters from the expected tool call that were also present in the actual tool call. |
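One plausible reading of the parameter correctness score is sketched below: the share of expected parameter names that also appear in the actual tool call. This is an illustration only, not the service's implementation. It assumes that "present" means the parameter name appears regardless of its value, that the result is expressed as a fraction between 0 and 1, and that an expected call with no parameters scores 1.0. The function name is hypothetical.

```python
from typing import Any, Dict


def parameter_correctness(expected_args: Dict[str, Any], actual_args: Dict[str, Any]) -> float:
    """Fraction of expected parameter names found in the actual call (assumed rule)."""
    if not expected_args:
        # Assumption: nothing was expected, so nothing can be missing.
        return 1.0
    matched = sum(1 for name in expected_args if name in actual_args)
    return matched / len(expected_args)
```

Extra parameters in the actual call do not lower the score in this sketch, since the description only counts parameters from the expected call.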
HallucinationResult
| JSON representation |
|---|
{ "label": string, "explanation": string, // Union field |
| Fields | |
|---|---|
label |
Output only. The label associated with each score. Score 1: Justified Score 0: Not Justified Score -1: No Claim To Assess |
explanation |
Output only. The explanation for the hallucination score. |
Union field
|
|
score |
Output only. The hallucination score. Can be -1, 0, 1. |
ToolCallLatency
| JSON representation |
|---|
{ "tool": string, "displayName": string, "startTime": string, "endTime": string, "executionLatency": string } |
| Fields | |
|---|---|
tool |
Output only. The name of the tool that got executed. Format: |
displayName |
Output only. The display name of the tool. |
startTime |
Output only. The start time of the tool call execution. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
endTime |
Output only. The end time of the tool call execution. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
executionLatency |
Output only. The latency of the tool call execution. A duration in seconds with up to nine fractional digits, ending with ' |
OverallToolInvocationResult
| JSON representation |
|---|
{ "outcome": enum ( |
| Fields | |
|---|---|
outcome |
Output only. The outcome of the tool invocation check. This is determined by comparing the tool_invocation_score to the overall_tool_invocation_correctness_threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL. |
Union field
|
|
toolInvocationScore |
The overall tool invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked. |
EvaluationErrorInfo
| JSON representation |
|---|
{
"errorType": enum ( |
| Fields | |
|---|---|
errorType |
Output only. The type of error. |
errorMessage |
Output only. The error message. |
sessionId |
Output only. The session ID for the conversation that caused the error. |
SpanLatency
| JSON representation |
|---|
{ "type": enum ( |
| Fields | |
|---|---|
type |
Output only. The type of span. |
displayName |
Output only. The display name of the span. Applicable to tool and guardrail spans. |
startTime |
Output only. The start time of the span. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
endTime |
Output only. The end time of the span. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
executionLatency |
Output only. The latency of the span. A duration in seconds with up to nine fractional digits, ending with ' |
Union field identifier. The identifier of the specific item based on its type. identifier can be only one of the following: |
|
resource |
Output only. The resource name of the guardrail or tool spans. |
toolset |
Output only. The toolset tool identifier. |
model |
Output only. The name of the LLM span. |
callback |
Output only. The name of the user callback span. |
EvaluationExpectationResult
| JSON representation |
|---|
{
"evaluationExpectation": string,
"prompt": string,
"outcome": enum ( |
| Fields | |
|---|---|
evaluationExpectation |
Output only. The evaluation expectation. Format: |
prompt |
Output only. The prompt that was used for the evaluation. |
outcome |
Output only. The outcome of the evaluation expectation. |
explanation |
Output only. The explanation for the result. |
ScenarioResult
| JSON representation |
|---|
{ "conversation": string, "task": string, "userFacts": [ { object ( |
| Fields | |
|---|---|
conversation |
Output only. The conversation that was generated in the scenario. |
task |
Output only. The task that was used when running the scenario for this result. |
userFacts[] |
Output only. The user facts that were used by the scenario for this result. |
expectationOutcomes[] |
Output only. The outcome of each expectation. |
rubricOutcomes[] |
Output only. The outcome of the rubric. |
hallucinationResult[] |
Output only. The result of the hallucination check. There will be one hallucination result for each turn in the conversation. |
taskCompletionResult |
Output only. The result of the task completion check. |
toolCallLatencies[] |
Output only. The latency of each tool call execution in the conversation. |
userGoalSatisfactionResult |
Output only. The result of the user goal satisfaction check. |
spanLatencies[] |
Output only. The latency of spans in the conversation. |
evaluationExpectationResults[] |
Output only. The results of the evaluation expectations. |
Union field
|
|
allExpectationsSatisfied |
Output only. Whether all expectations were satisfied for this turn. |
Union field
|
|
taskCompleted |
Output only. Whether the task was completed for this turn. This is a composite of all expectations satisfied, no hallucinations, and user goal satisfaction. |
ScenarioExpectationOutcome
| JSON representation |
|---|
{ "expectation": { object ( |
| Fields | |
|---|---|
expectation |
Output only. The expectation that was evaluated. |
outcome |
Output only. The outcome of the ScenarioExpectation. |
Union field result. The result of the expectation. result can be only one of the following: |
|
observedToolCall |
Output only. The observed tool call. |
observedAgentResponse |
Output only. The observed agent response. |
ObservedToolCall
| JSON representation |
|---|
{ "toolCall": { object ( |
| Fields | |
|---|---|
toolCall |
Output only. The observed tool call. |
toolResponse |
Output only. The observed tool response. |
ScenarioRubricOutcome
| JSON representation |
|---|
{ "rubric": string, "scoreExplanation": string, // Union field |
| Fields | |
|---|---|
rubric |
Output only. The rubric that was used to evaluate the conversation. |
scoreExplanation |
Output only. The rater's response to the rubric. |
Union field
|
|
score |
Output only. The score of the conversation against the rubric. |
TaskCompletionResult
| JSON representation |
|---|
{ "label": string, "explanation": string, // Union field |
| Fields | |
|---|---|
label |
Output only. The label associated with each score. Score 1: Task Completed. Score 0: Task Not Completed. Score -1: User Goal Undefined. |
explanation |
Output only. The explanation for the task completion score. |
Union field
|
|
score |
Output only. The task completion score. Can be -1, 0, or 1. |
UserGoalSatisfactionResult
| JSON representation |
|---|
{ "label": string, "explanation": string, // Union field |
| Fields | |
|---|---|
label |
Output only. The label associated with each score. Score 1: User Task Satisfied. Score 0: User Task Not Satisfied. Score -1: User Task Unspecified. |
explanation |
Output only. The explanation for the user task satisfaction score. |
Union field
|
|
score |
Output only. The user task satisfaction score. Can be -1, 0, or 1. |
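The score-to-label mappings documented for TaskCompletionResult and UserGoalSatisfactionResult can be mirrored on the client side, for example to cross-check the returned `label` or to render a score when only `score` is present. The sketch below is a minimal illustration: the label strings are taken from the field descriptions above, while the helper name `label_for_score` and the "Score not reported" fallback are assumptions.

```python
from typing import Mapping, Optional

# Label strings as documented for TaskCompletionResult.
TASK_COMPLETION_LABELS = {
    1: "Task Completed",
    0: "Task Not Completed",
    -1: "User Goal Undefined",
}

# Label strings as documented for UserGoalSatisfactionResult.
USER_GOAL_SATISFACTION_LABELS = {
    1: "User Task Satisfied",
    0: "User Task Not Satisfied",
    -1: "User Task Unspecified",
}


def label_for_score(score: Optional[int], labels: Mapping[int, str]) -> str:
    """Return the documented label for a score of -1, 0, or 1."""
    if score is None:
        # The score is a union field and may be missing from the payload.
        return "Score not reported"
    if score not in labels:
        raise ValueError(f"unexpected score {score!r}; expected one of {sorted(labels)}")
    return labels[score]


print(label_for_score(1, TASK_COMPLETION_LABELS))
print(label_for_score(-1, USER_GOAL_SATISFACTION_LABELS))
```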
EvaluationPersona
| JSON representation |
|---|
{
"name": string,
"description": string,
"displayName": string,
"personality": string,
"speechConfig": {
object ( |
| Fields | |
|---|---|
name |
Required. The unique identifier of the persona. Format: |
description |
Optional. The description of the persona. |
displayName |
Required. The display name of the persona. Unique within an app. |
personality |
Required. An instruction for the agent on how to behave in the evaluation. |
speechConfig |
Optional. Configuration for how the persona sounds (TTS settings). |
SpeechConfig
| JSON representation |
|---|
{
"speakingRate": number,
"environment": enum ( |
| Fields | |
|---|---|
speakingRate |
Optional. The speaking rate. 1.0 is normal. Lower is slower (e.g., 0.8), higher is faster (e.g., 1.5). Useful for testing how the agent handles fast talkers. |
environment |
Optional. The simulated audio environment. |
voiceId |
Optional. The specific voice identifier/accent to use. Example: "en-US-Wavenet-D" or "en-GB-Standard-A" |
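To make the relationship between EvaluationPersona and SpeechConfig concrete, the sketch below assembles a persona object as a Python dict using the JSON field names listed above. The helper `build_persona`, the positive-rate check, and the placeholder resource name are assumptions for illustration. The `environment` field is omitted because its enum values are not listed here.

```python
import json
from typing import Any, Optional


def build_persona(
    name: str,
    display_name: str,
    personality: str,
    speaking_rate: float = 1.0,
    voice_id: Optional[str] = None,
    description: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an EvaluationPersona-shaped dict with an embedded SpeechConfig."""
    # Assumption: a speaking rate must be positive. The documented reference
    # points are 1.0 (normal), 0.8 (slower), and 1.5 (faster).
    if speaking_rate <= 0:
        raise ValueError("speaking_rate must be positive")
    speech_config: dict[str, Any] = {"speakingRate": speaking_rate}
    if voice_id is not None:
        speech_config["voiceId"] = voice_id
    persona: dict[str, Any] = {
        "name": name,
        "displayName": display_name,
        "personality": personality,
        "speechConfig": speech_config,
    }
    if description is not None:
        persona["description"] = description
    return persona


fast_talker = build_persona(
    name="<persona-resource-name>",  # placeholder; the name format is not shown here
    display_name="Fast talker",
    personality="Speak quickly and interrupt often.",
    speaking_rate=1.5,
    voice_id="en-US-Wavenet-D",
)
print(json.dumps(fast_talker, indent=2))
```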
Status
| JSON representation |
|---|
{ "code": integer, "message": string, "details": [ { "@type": string, field1: ..., ... } ] } |
| Fields | |
|---|---|
code |
The status code, which should be an enum value of |
message |
A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the |
details[] |
A list of messages that carry the error details. There is a common set of message types for APIs to use. An object containing fields of an arbitrary type. An additional field |
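When logging a Status, it is often enough to print the code, the message, and the `@type` of each detail entry. The following sketch shows one way to do that for a Status already parsed into a dict. The helper name `format_status` and the sample values are hypothetical, and treating a missing `code` as 0 is an assumption about how default values are serialized.

```python
from typing import Any


def format_status(status: dict[str, Any]) -> str:
    """Render a Status-shaped dict as a single log-friendly line."""
    # Assumption: an omitted code is treated as 0.
    code = status.get("code", 0)
    message = status.get("message", "")
    # Each detail entry identifies its message type through "@type".
    detail_types = [d.get("@type", "<no @type>") for d in status.get("details", [])]
    line = f"code={code} message={message!r}"
    if detail_types:
        line += " details=" + ", ".join(detail_types)
    return line


# Hypothetical error payload with invented values.
sample_status = {
    "code": 5,
    "message": "Evaluation not found.",
    "details": [
        {"@type": "type.googleapis.com/google.rpc.ErrorInfo", "reason": "EXAMPLE_REASON"}
    ],
}
print(format_status(sample_status))
```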
Any
| JSON representation |
|---|
{ "typeUrl": string, "value": string } |
| Fields | |
|---|---|
typeUrl |
Identifies the type of the serialized Protobuf message with a URI reference consisting of a prefix ending in a slash and the fully-qualified type name. Example: type.googleapis.com/google.protobuf.StringValue This string must contain at least one The prefix is arbitrary and Protobuf implementations are expected to simply strip off everything up to and including the last All type URL strings must be legal URI references with the additional restriction (for the text format) that the content of the reference must consist only of alphanumeric characters, percent-encoded escapes, and characters in the following set (not including the outer backticks): In the original design of |
value |
Holds a Protobuf serialization of the type described by type_url. A base64-encoded string. |
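The two Any fields can be unpacked with the standard library alone: the type name is whatever follows the last slash of `typeUrl`, and `value` is base64 text that decodes to the serialized message bytes. The sketch below illustrates this. The helper name `split_any` and the sample payload are hypothetical, and parsing the decoded bytes into a message object would need a Protobuf library, which is out of scope here.

```python
import base64
from typing import Any


def split_any(any_message: dict[str, Any]) -> tuple[str, bytes]:
    """Return (fully qualified type name, serialized message bytes)."""
    type_url = any_message["typeUrl"]
    if "/" not in type_url:
        raise ValueError(f"typeUrl has no '/' separator: {type_url!r}")
    # Strip everything up to and including the last slash.
    type_name = type_url.rsplit("/", 1)[1]
    # The JSON form carries the serialized message as base64 text.
    payload = base64.b64decode(any_message.get("value", ""))
    return type_name, payload


# Hypothetical sample: the bytes stand in for a serialized message.
sample_any = {
    "typeUrl": "type.googleapis.com/google.protobuf.StringValue",
    "value": base64.b64encode(b"\n\x02hi").decode("ascii"),
}
type_name, payload = split_any(sample_any)
print(type_name, payload)
```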
EvaluationMetricsThresholds
| JSON representation |
|---|
{ "goldenEvaluationMetricsThresholds": { object ( |
| Fields | |
|---|---|
goldenEvaluationMetricsThresholds |
Optional. The golden evaluation metrics thresholds. |
hallucinationMetricBehavior |
Optional. Deprecated: Use |
goldenHallucinationMetricBehavior |
Optional. The hallucination metric behavior for golden evaluations. |
scenarioHallucinationMetricBehavior |
Optional. The hallucination metric behavior for scenario evaluations. |
GoldenEvaluationMetricsThresholds
| JSON representation |
|---|
{ "turnLevelMetricsThresholds": { object ( |
| Fields | |
|---|---|
turnLevelMetricsThresholds |
Optional. The turn level metrics thresholds. |
expectationLevelMetricsThresholds |
Optional. The expectation level metrics thresholds. |
toolMatchingSettings |
Optional. The tool matching settings. An extra tool call is a tool call that is present in the execution but does not match any tool call in the golden expectation. |
TurnLevelMetricsThresholds
| JSON representation |
|---|
{ "semanticSimilarityChannel": enum ( |
| Fields | |
|---|---|
semanticSimilarityChannel |
Optional. The semantic similarity channel to use for evaluation. |
Union field
|
|
semanticSimilaritySuccessThreshold |
Optional. The success threshold for semantic similarity. Must be an integer between 0 and 4. Default is >= 3. |
Union field
|
|
overallToolInvocationCorrectnessThreshold |
Optional. The success threshold for overall tool invocation correctness. Must be a float between 0 and 1. Default is 1.0. |
ExpectationLevelMetricsThresholds
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
toolInvocationParameterCorrectnessThreshold |
Optional. The success threshold for individual tool invocation parameter correctness. Must be a float between 0 and 1. Default is 1.0. |
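The documented ranges for the turn-level and expectation-level thresholds can be checked on the client before a configuration is sent. The sketch below validates a GoldenEvaluationMetricsThresholds-shaped dict against those ranges. The function names are hypothetical, and the server remains the authority on what it accepts.

```python
from typing import Any


def _in_range(value: Any, low: float, high: float, integer_only: bool = False) -> bool:
    """True if value is a number (not a bool) within [low, high]."""
    if isinstance(value, bool):
        return False
    allowed = int if integer_only else (int, float)
    return isinstance(value, allowed) and low <= value <= high


def validate_thresholds(thresholds: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the ranges look valid."""
    problems: list[str] = []
    turn = thresholds.get("turnLevelMetricsThresholds", {})
    expectation = thresholds.get("expectationLevelMetricsThresholds", {})

    # Semantic similarity: integer between 0 and 4.
    similarity = turn.get("semanticSimilaritySuccessThreshold")
    if similarity is not None and not _in_range(similarity, 0, 4, integer_only=True):
        problems.append("semanticSimilaritySuccessThreshold must be an integer between 0 and 4")

    # Tool invocation thresholds: floats between 0 and 1.
    float_checks = (
        (turn, "overallToolInvocationCorrectnessThreshold"),
        (expectation, "toolInvocationParameterCorrectnessThreshold"),
    )
    for owner, key in float_checks:
        value = owner.get(key)
        if value is not None and not _in_range(value, 0.0, 1.0):
            problems.append(f"{key} must be a float between 0 and 1")
    return problems


good = {"turnLevelMetricsThresholds": {"semanticSimilaritySuccessThreshold": 3}}
bad = {
    "turnLevelMetricsThresholds": {"semanticSimilaritySuccessThreshold": 5},
    "expectationLevelMetricsThresholds": {"toolInvocationParameterCorrectnessThreshold": 1.5},
}
print(validate_thresholds(good))
print(validate_thresholds(bad))
```

Unset thresholds are skipped rather than flagged, since every field here is optional and the server applies its own defaults.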
ToolMatchingSettings
| JSON representation |
|---|
{
"extraToolCallBehavior": enum ( |
| Fields | |
|---|---|
extraToolCallBehavior |
Optional. Behavior for extra tool calls. Defaults to FAIL. |
EvaluationConfig
| JSON representation |
|---|
{ "inputAudioConfig": { object ( |
| Fields | |
|---|---|
inputAudioConfig |
Optional. Configuration for processing the input audio. |
outputAudioConfig |
Optional. Configuration for generating the output audio. |
evaluationChannel |
Optional. The channel to evaluate. |
toolCallBehaviour |
Optional. Specifies whether the evaluation should use real tool calls or fake tools. |
InputAudioConfig
| JSON representation |
|---|
{
"audioEncoding": enum ( |
| Fields | |
|---|---|
audioEncoding |
Required. The encoding of the input audio data. |
sampleRateHertz |
Required. The sample rate (in Hertz) of the input audio data. |
noiseSuppressionLevel |
Optional. The level of noise suppression to apply to the input audio. Available values are "low", "moderate", "high", and "very_high". |
OutputAudioConfig
| JSON representation |
|---|
{
"audioEncoding": enum ( |
| Fields | |
|---|---|
audioEncoding |
Required. The encoding of the output audio data. |
sampleRateHertz |
Required. The sample rate (in Hertz) of the output audio data. |
Tool Annotations
Destructive Hint: ❌ | Idempotent Hint: ✅ | Read Only Hint: ✅ | Open World Hint: ❌