MCP Tools Reference: ces.googleapis.com

Tool: list_evaluations

Lists evaluations.

The following sample demonstrates how to use curl to invoke the list_evaluations MCP tool.

Curl Request
                  
curl --location 'https://ces.[REGION].rep.googleapis.com/mcp' \
--header 'content-type: application/json' \
--header 'accept: application/json, text/event-stream' \
--data '{
  "method": "tools/call",
  "params": {
    "name": "list_evaluations",
    "arguments": {
      // provide these details according to the tool's MCP specification
    }
  },
  "jsonrpc": "2.0",
  "id": 1
}'
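
The same request body can also be assembled programmatically. The following Python sketch builds the JSON-RPC payload shown in the curl sample. The helper name build_tools_call and the example parent value are illustrative assumptions (the parent is modeled on the evaluation name format documented below), and authentication is not covered here.

```python
import json


def build_tools_call(tool_name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 body for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical app resource name; substitute a real one.
payload = build_tools_call(
    "list_evaluations",
    {"parent": "projects/my-project/locations/us/apps/my-app"},
)

# Serialized string that could be passed as the --data value.
body = json.dumps(payload)
```

Unlike the curl template, the serialized body contains no comments, so it is valid JSON as written.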
                

Input Schema

Request message for EvaluationService.ListEvaluations.

ListEvaluationsRequest

JSON representation
{
  "parent": string,
  "pageSize": integer,
  "pageToken": string,
  "filter": string,
  "evaluationFilter": string,
  "evaluationRunFilter": string,
  "orderBy": string,
  "lastTenResults": boolean
}
Fields
parent

string

Required. The resource name of the app to list evaluations from.

pageSize

integer

Optional. Requested page size. Server may return fewer items than requested. If unspecified, server will pick an appropriate default.

pageToken

string

Optional. The next_page_token value returned from a previous EvaluationService.ListEvaluations call.

filter
(deprecated)

string

Optional. Deprecated: Use evaluation_filter and evaluation_run_filter instead.

evaluationFilter

string

Optional. Filter to be applied on the evaluation when listing the evaluations. See https://google.aip.dev/160 for more details. Supported fields: evaluation_datasets

evaluationRunFilter

string

Optional. Filter string for fields on the associated EvaluationRun resources. See https://google.aip.dev/160 for more details. Supported fields: create_time, initiated_by, app_version_display_name

orderBy

string

Optional. Field to sort by. Only "name", "create_time", and "update_time" are supported. Time fields are ordered in descending order, and the name field is ordered in ascending order. If not included, "update_time" will be the default. See https://google.aip.dev/132#ordering for more details.

lastTenResults

boolean

Optional. Whether to include the last 10 evaluation results for each evaluation in the response.
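
As a rough illustration of how the fields above fit together, the sketch below assembles the arguments object and leaves out any optional field that was not set. The function name build_list_arguments is hypothetical, and the orderBy check accepts only the three bare field names listed above; whether the server accepts anything beyond those is not stated here.

```python
SUPPORTED_ORDER_BY = {"name", "create_time", "update_time"}


def build_list_arguments(parent, page_size=None, page_token=None,
                         evaluation_filter=None, evaluation_run_filter=None,
                         order_by=None, last_ten_results=None):
    """Assemble ListEvaluationsRequest arguments, omitting unset optional fields."""
    if not parent:
        raise ValueError("parent is required")
    # Assumption: only the bare field names from the reference are allowed.
    if order_by is not None and order_by not in SUPPORTED_ORDER_BY:
        raise ValueError(f"unsupported orderBy: {order_by!r}")
    optional = {
        "pageSize": page_size,
        "pageToken": page_token,
        "evaluationFilter": evaluation_filter,
        "evaluationRunFilter": evaluation_run_filter,
        "orderBy": order_by,
        "lastTenResults": last_ten_results,
    }
    arguments = {"parent": parent}
    arguments.update({k: v for k, v in optional.items() if v is not None})
    return arguments
```

The deprecated filter field is deliberately not exposed, since the reference recommends evaluationFilter and evaluationRunFilter instead.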

Output Schema

Response message for EvaluationService.ListEvaluations.

ListEvaluationsResponse

JSON representation
{
  "evaluations": [
    {
      object (Evaluation)
    }
  ],
  "nextPageToken": string
}
Fields
evaluations[]

object (Evaluation)

The list of evaluations.

nextPageToken

string

A token that can be sent as ListEvaluationsRequest.page_token to retrieve the next page. Absence of this field indicates there are no subsequent pages.
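
The paging contract described above (send nextPageToken back as pageToken until it is absent) can be sketched as a generator. Here call_tool is a hypothetical callable that takes the arguments object and returns the parsed ListEvaluationsResponse as a dict; how it reaches the server is outside the scope of this sketch.

```python
def iter_evaluations(call_tool, parent, page_size=50):
    """Yield evaluations from every page of a list_evaluations call.

    call_tool is a hypothetical callable: arguments dict in,
    parsed ListEvaluationsResponse dict out.
    """
    token = None
    while True:
        arguments = {"parent": parent, "pageSize": page_size}
        if token:
            arguments["pageToken"] = token
        response = call_tool(arguments)
        yield from response.get("evaluations", [])
        token = response.get("nextPageToken")
        if not token:
            # An absent token means there are no subsequent pages.
            break
```

Because the server may return fewer items than requested, the loop relies only on the token, never on the page length, to decide when to stop.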

Evaluation

JSON representation
{
  "name": string,
  "displayName": string,
  "description": string,
  "tags": [
    string
  ],
  "evaluationDatasets": [
    string
  ],
  "createTime": string,
  "createdBy": string,
  "updateTime": string,
  "lastUpdatedBy": string,
  "evaluationRuns": [
    string
  ],
  "etag": string,
  "aggregatedMetrics": {
    object (AggregatedMetrics)
  },
  "lastCompletedResult": {
    object (EvaluationResult)
  },
  "invalid": boolean,
  "lastTenResults": [
    {
      object (EvaluationResult)
    }
  ],

  // Union field inputs can be only one of the following:
  "golden": {
    object (Golden)
  },
  "scenario": {
    object (Scenario)
  }
  // End of list of possible types for union field inputs.
}
Fields
name

string

Identifier. The unique identifier of this evaluation. Format: projects/{project}/locations/{location}/apps/{app}/evaluations/{evaluation}

displayName

string

Required. User-defined display name of the evaluation. Unique within an App.

description

string

Optional. User-defined description of the evaluation.

tags[]

string

Optional. User-defined tags to categorize the evaluation.

evaluationDatasets[]

string

Output only. List of evaluation datasets the evaluation belongs to. Format: projects/{project}/locations/{location}/apps/{app}/evaluationDatasets/{evaluationDataset}

createTime

string (Timestamp format)

Output only. Timestamp when the evaluation was created.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

createdBy

string

Output only. The user who created the evaluation.

updateTime

string (Timestamp format)

Output only. Timestamp when the evaluation was last updated.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

lastUpdatedBy

string

Output only. The user who last updated the evaluation.

evaluationRuns[]

string

Output only. The EvaluationRuns that this Evaluation is associated with.

etag

string

Output only. Etag used to ensure the object hasn't changed during a read-modify-write operation. If the etag is empty, the update will overwrite any concurrent changes.

aggregatedMetrics

object (AggregatedMetrics)

Output only. The aggregated metrics for this evaluation across all runs.

lastCompletedResult

object (EvaluationResult)

Output only. The latest evaluation result for this evaluation.

invalid

boolean

Output only. Whether the evaluation is invalid. This can happen if an evaluation is referencing a tool, toolset, or agent that has since been deleted.

lastTenResults[]

object (EvaluationResult)

Output only. The last 10 evaluation results for this evaluation. This is only populated if include_last_ten_results is set to true in the ListEvaluationsRequest or GetEvaluationRequest.

Union field inputs. The inputs for the evaluation. inputs can be only one of the following:
golden

object (Golden)

Optional. The golden steps to be evaluated.

scenario

object (Scenario)

Optional. The config for a scenario.
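
When consuming an Evaluation object, two details from the fields above are easy to handle with small helpers: the resource name format and the inputs union. The sketch below is illustrative only; the helper names are hypothetical, and the regular expression assumes that path segments never contain a slash.

```python
import re

EVALUATION_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/apps/(?P<app>[^/]+)/evaluations/(?P<evaluation>[^/]+)$"
)


def parse_evaluation_name(name):
    """Split an evaluation resource name into its path components."""
    match = EVALUATION_NAME.match(name)
    if match is None:
        raise ValueError(f"not an evaluation resource name: {name!r}")
    return match.groupdict()


def input_kind(evaluation):
    """Return which member of the inputs union is set, or None if neither is."""
    present = [key for key in ("golden", "scenario") if key in evaluation]
    if len(present) > 1:
        raise ValueError("golden and scenario are mutually exclusive")
    return present[0] if present else None
```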

Golden

JSON representation
{
  "turns": [
    {
      object (GoldenTurn)
    }
  ],
  "evaluationExpectations": [
    string
  ]
}
Fields
turns[]

object (GoldenTurn)

Required. The golden turns required to replay a golden conversation.

evaluationExpectations[]

string

Optional. The evaluation expectations to evaluate the replayed conversation against. Format: projects/{project}/locations/{location}/apps/{app}/evaluationExpectations/{evaluationExpectation}

GoldenTurn

JSON representation
{
  "steps": [
    {
      object (Step)
    }
  ],
  "rootSpan": {
    object (Span)
  }
}
Fields
steps[]

object (Step)

Required. The steps required to replay a golden conversation.

rootSpan

object (Span)

Optional. The root span of the golden turn for processing and maintaining audio information.

Step

JSON representation
{

  // Union field step can be only one of the following:
  "userInput": {
    object (SessionInput)
  },
  "agentTransfer": {
    object (AgentTransfer)
  },
  "expectation": {
    object (GoldenExpectation)
  }
  // End of list of possible types for union field step.
}
Fields
Union field step. The step to perform. step can be only one of the following:
userInput

object (SessionInput)

Optional. User input for the conversation.

agentTransfer

object (AgentTransfer)

Optional. Transfer the conversation to a different agent.

expectation

object (GoldenExpectation)

Optional. Executes an expectation on the current turn.

SessionInput

JSON representation
{
  "willContinue": boolean,

  // Union field input_type can be only one of the following:
  "text": string,
  "dtmf": string,
  "audio": string,
  "toolResponses": {
    object (ToolResponses)
  },
  "image": {
    object (Image)
  },
  "blob": {
    object (Blob)
  },
  "variables": {
    object
  },
  "event": {
    object (Event)
  }
  // End of list of possible types for union field input_type.
}
Fields
willContinue

boolean

Optional. A flag to indicate if the current message is a fragment of a larger input in the bidi streaming session.

When set to true, the agent defers processing until it receives a subsequent message where will_continue is false, or until the system detects an endpoint in the audio input.

NOTE: This field does not apply to audio and DTMF inputs, as they are always processed automatically based on the endpointing signal.

Union field input_type. The type of the input. input_type can be only one of the following:
text

string

Optional. Text data from the end user.

dtmf

string

Optional. DTMF digits from the end user.

audio

string (bytes format)

Optional. Audio data from the end user.

A base64-encoded string.

toolResponses

object (ToolResponses)

Optional. Execution results for the tool calls from the client.

image

object (Image)

Optional. Image data from the end user.

blob

object (Blob)

Optional. Blob data from the end user.

variables

object (Struct format)

Optional. Contextual variables for the session, keyed by name. Only variables declared in the app will be used by the CES agent.

Unrecognized variables will still be sent to the [Dialogflow agent][Agent.RemoteDialogflowAgent] as additional session parameters.

event

object (Event)

Optional. Event input.

ToolResponses

JSON representation
{
  "toolResponses": [
    {
      object (ToolResponse)
    }
  ]
}
Fields
toolResponses[]

object (ToolResponse)

Optional. The list of tool execution results.

ToolResponse

JSON representation
{
  "id": string,
  "displayName": string,
  "response": {
    object
  },

  // Union field tool_identifier can be only one of the following:
  "tool": string,
  "toolsetTool": {
    object (ToolsetTool)
  }
  // End of list of possible types for union field tool_identifier.
}
Fields
id

string

Optional. The matching ID of the tool call the response is for.

displayName

string

Output only. Display name of the tool.

response

object (Struct format)

Required. The tool execution result in JSON object format. Use "output" key to specify tool response and "error" key to specify error details (if any). If "output" and "error" keys are not specified, then whole "response" is treated as tool execution result.

Union field tool_identifier. The identifier of the tool that got executed. It could be either a persisted tool or a tool from a toolset. tool_identifier can be only one of the following:
tool

string

Optional. The name of the tool to execute. Format: projects/{project}/locations/{location}/apps/{app}/tools/{tool}

toolsetTool

object (ToolsetTool)

Optional. The toolset tool that got executed.
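
The "output" and "error" key convention described for the response field can be read back with a small helper. This is a sketch of one possible interpretation: it treats the response as keyed whenever either key is present, which is an assumption, since the reference only spells out the case where both keys are missing.

```python
def interpret_tool_response(tool_response):
    """Return (output, error) for a ToolResponse dict.

    Assumption: if either "output" or "error" is present, the keyed
    convention applies; otherwise the whole response is the result.
    """
    response = tool_response["response"]  # required field
    if "output" in response or "error" in response:
        return response.get("output"), response.get("error")
    # Neither key present: the whole response is the execution result.
    return response, None
```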

ToolsetTool

JSON representation
{
  "toolset": string,
  "toolId": string
}
Fields
toolset

string

Required. The resource name of the Toolset from which this tool is derived. Format: projects/{project}/locations/{location}/apps/{app}/toolsets/{toolset}

toolId

string

Optional. The tool ID to filter the tools to retrieve the schema for.

Struct

JSON representation
{
  "fields": {
    string: value,
    ...
  }
}
Fields
fields

map (key: string, value: value (Value format))

Unordered map of dynamically typed values.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

FieldsEntry

JSON representation
{
  "key": string,
  "value": value
}
Fields
key

string

value

value (Value format)

Value

JSON representation
{

  // Union field kind can be only one of the following:
  "nullValue": null,
  "numberValue": number,
  "stringValue": string,
  "boolValue": boolean,
  "structValue": {
    object
  },
  "listValue": array
  // End of list of possible types for union field kind.
}
Fields
Union field kind. The kind of value. kind can be only one of the following:
nullValue

null

Represents a null value.

numberValue

number

Represents a double value.

stringValue

string

Represents a string value.

boolValue

boolean

Represents a boolean value.

structValue

object (Struct format)

Represents a structured value.

listValue

array (ListValue format)

Represents a repeated Value.

ListValue

JSON representation
{
  "values": [
    value
  ]
}
Fields
values[]

value (Value format)

Repeated field of dynamically typed values.
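
To make the relationship between Struct, Value, and ListValue concrete, the sketch below converts a native Python object into the explicit union form documented above. It is meant only to illustrate the structure; the Struct example earlier in this section is written as an ordinary JSON object, so this expanded form may not be what a request needs. The function name to_value is hypothetical.

```python
def to_value(obj):
    """Convert a native Python object into the explicit Value union form."""
    if obj is None:
        return {"nullValue": None}
    # Check bool before numbers: bool is a subclass of int in Python.
    if isinstance(obj, bool):
        return {"boolValue": obj}
    if isinstance(obj, (int, float)):
        # numberValue represents a double.
        return {"numberValue": float(obj)}
    if isinstance(obj, str):
        return {"stringValue": obj}
    if isinstance(obj, dict):
        fields = {str(key): to_value(value) for key, value in obj.items()}
        return {"structValue": {"fields": fields}}
    if isinstance(obj, (list, tuple)):
        return {"listValue": {"values": [to_value(item) for item in obj]}}
    raise TypeError(f"unsupported type: {type(obj).__name__}")
```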

Image

JSON representation
{
  "mimeType": string,
  "data": string
}
Fields
mimeType

string

Required. The IANA standard MIME type of the source data. Supported image types include: image/png, image/jpeg, image/webp.

data

string (bytes format)

Required. Raw bytes of the image.

A base64-encoded string.
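
Because data is carried as a base64-encoded string, raw image bytes need to be encoded before they are placed in an Image object. A minimal sketch, with a hypothetical helper name and a client-side MIME type check based on the types listed above:

```python
import base64

SUPPORTED_IMAGE_TYPES = {"image/png", "image/jpeg", "image/webp"}


def make_image(raw_bytes, mime_type):
    """Build an Image dict with base64-encoded data."""
    if mime_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type: {mime_type!r}")
    return {
        "mimeType": mime_type,
        # b64encode returns bytes; JSON needs a text string.
        "data": base64.b64encode(raw_bytes).decode("ascii"),
    }
```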

Blob

JSON representation
{
  "mimeType": string,
  "data": string
}
Fields
mimeType

string

Required. The IANA standard MIME type of the source data.

data

string (bytes format)

Required. Raw bytes of the blob.

A base64-encoded string.

Event

JSON representation
{
  "event": string
}
Fields
event

string

Required. The name of the event.

AgentTransfer

JSON representation
{
  "targetAgent": string,
  "displayName": string
}
Fields
targetAgent

string

Required. The agent to which the conversation is being transferred. The agent will handle the conversation from this point forward. Format: projects/{project}/locations/{location}/apps/{app}/agents/{agent}

displayName

string

Output only. Display name of the agent.

GoldenExpectation

JSON representation
{
  "note": string,

  // Union field condition can be only one of the following:
  "toolCall": {
    object (ToolCall)
  },
  "toolResponse": {
    object (ToolResponse)
  },
  "agentResponse": {
    object (Message)
  },
  "agentTransfer": {
    object (AgentTransfer)
  },
  "updatedVariables": {
    object
  },
  "mockToolResponse": {
    object (ToolResponse)
  }
  // End of list of possible types for union field condition.
}
Fields
note

string

Optional. A note for this requirement, useful in reporting when specific checks fail. E.g., "Check_Payment_Tool_Called".

Union field condition. The actual check to perform. condition can be only one of the following:
toolCall

object (ToolCall)

Optional. Check that a specific tool was called with the parameters.

toolResponse

object (ToolResponse)

Optional. Check that a specific tool had the expected response.

agentResponse

object (Message)

Optional. Check that the agent responded with the correct response. The role "agent" is implied.

agentTransfer

object (AgentTransfer)

Optional. Check that the agent transferred the conversation to a different agent.

updatedVariables

object (Struct format)

Optional. Check that the agent updated the session variables to the expected values. Also used to capture agent variable updates for golden evals.

mockToolResponse

object (ToolResponse)

Optional. The tool response to mock, with the parameters of interest specified. Any parameters not specified will be hallucinated by the LLM.

ToolCall

JSON representation
{
  "id": string,
  "displayName": string,
  "args": {
    object
  },

  // Union field tool_identifier can be only one of the following:
  "tool": string,
  "toolsetTool": {
    object (ToolsetTool)
  }
  // End of list of possible types for union field tool_identifier.
}
Fields
id

string

Optional. The unique identifier of the tool call. If populated, the client should return the execution result with the matching ID in ToolResponse.

displayName

string

Output only. Display name of the tool.

args

object (Struct format)

Optional. The input parameters and values for the tool in JSON object format.

Union field tool_identifier. The identifier of the tool to execute. It could be either a persisted tool or a tool from a toolset. tool_identifier can be only one of the following:
tool

string

Optional. The name of the tool to execute. Format: projects/{project}/locations/{location}/apps/{app}/tools/{tool}

toolsetTool

object (ToolsetTool)

Optional. The toolset tool to execute.

Message

JSON representation
{
  "role": string,
  "chunks": [
    {
      object (Chunk)
    }
  ],
  "eventTime": string
}
Fields
role

string

Optional. The role within the conversation, e.g., user, agent.

chunks[]

object (Chunk)

Optional. Content of the message as a series of chunks.

eventTime

string (Timestamp format)

Optional. Timestamp when the message was sent or received. Should not be used if the message is part of an example.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

Chunk

JSON representation
{

  // Union field data can be only one of the following:
  "text": string,
  "transcript": string,
  "blob": {
    object (Blob)
  },
  "payload": {
    object
  },
  "image": {
    object (Image)
  },
  "toolCall": {
    object (ToolCall)
  },
  "toolResponse": {
    object (ToolResponse)
  },
  "agentTransfer": {
    object (AgentTransfer)
  },
  "updatedVariables": {
    object
  },
  "defaultVariables": {
    object
  }
  // End of list of possible types for union field data.
}
Fields
Union field data. Chunk data. data can be only one of the following:
text

string

Optional. Text data.

transcript

string

Optional. Transcript associated with the audio.

blob

object (Blob)

Optional. Blob data.

payload

object (Struct format)

Optional. Custom payload data.

image

object (Image)

Optional. Image data.

toolCall

object (ToolCall)

Optional. Tool execution request.

toolResponse

object (ToolResponse)

Optional. Tool execution response.

agentTransfer

object (AgentTransfer)

Optional. Agent transfer event.

updatedVariables

object (Struct format)

A struct representing the variables that were updated in the conversation, keyed by variable name.

defaultVariables

object (Struct format)

A struct representing the default variables at the start of the conversation, keyed by variable name.

Timestamp

JSON representation
{
  "seconds": string,
  "nanos": integer
}
Fields
seconds

string (int64 format)

Represents seconds of UTC time since Unix epoch 1970-01-01T00:00:00Z. Must be between -62135596800 and 253402300799 inclusive (which corresponds to 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z).

nanos

integer

Non-negative fractions of a second at nanosecond resolution. This field is the nanosecond portion of the duration, not an alternative to seconds. Negative second values with fractions must still have non-negative nanos values that count forward in time. Must be between 0 and 999,999,999 inclusive.
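
Timestamp fields such as createTime are described elsewhere in this reference as RFC 3339 strings with up to nine fractional digits. The sketch below parses such a string into a UTC datetime plus the full nanosecond fraction, since Python's datetime only keeps microseconds. The function name is hypothetical and leap seconds are not handled.

```python
import re
from datetime import datetime, timedelta, timezone

_RFC3339 = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,9}))?"
    r"(Z|[+-]\d{2}:\d{2})$"
)


def parse_timestamp(text):
    """Parse an RFC 3339 timestamp into (UTC datetime, nanos).

    The datetime is truncated to microseconds; nanos keeps the full fraction.
    """
    match = _RFC3339.match(text)
    if match is None:
        raise ValueError(f"not an RFC 3339 timestamp: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    fraction, offset = match.group(7), match.group(8)
    # Right-pad the fraction to nine digits to get nanoseconds.
    nanos = int((fraction or "").ljust(9, "0"))
    if offset == "Z":
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6])))
    moment = datetime(year, month, day, hour, minute, second, nanos // 1000, tzinfo=tz)
    return moment.astimezone(timezone.utc), nanos
```

A hand-written pattern is used here because older Python versions of datetime.fromisoformat reject the "Z" suffix and nine-digit fractions.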

Span

JSON representation
{
  "name": string,
  "startTime": string,
  "endTime": string,
  "duration": string,
  "attributes": {
    object
  },
  "childSpans": [
    {
      object (Span)
    }
  ]
}
Fields
name

string

Output only. The name of the span.

startTime

string (Timestamp format)

Output only. The start time of the span.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

endTime

string (Timestamp format)

Output only. The end time of the span.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

duration

string (Duration format)

Output only. The duration of the span.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

attributes

object (Struct format)

Output only. Key-value attributes associated with the span.

childSpans[]

object (Span)

Output only. The child spans that are nested under this span.

Duration

JSON representation
{
  "seconds": string,
  "nanos": integer
}
Fields
seconds

string (int64 format)

Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years

nanos

integer

Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 seconds field and a positive or negative nanos field. For durations of one second or more, a non-zero value for the nanos field must be of the same sign as the seconds field. Must be from -999,999,999 to +999,999,999 inclusive.
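
Duration fields such as duration and averageLatency are described in this reference as strings like "3.5s". The following sketch converts that string form into the seconds and nanos pair described above, keeping both parts the same sign. The function name is hypothetical, and Decimal is used to avoid floating-point rounding of the fraction.

```python
from decimal import Decimal

NANOS_PER_SECOND = 1_000_000_000


def parse_duration(text):
    """Parse a duration string such as "3.5s" into (seconds, nanos)."""
    if not text.endswith("s"):
        raise ValueError(f"duration must end with 's': {text!r}")
    total_nanos = int(Decimal(text[:-1]) * NANOS_PER_SECOND)
    sign = -1 if total_nanos < 0 else 1
    seconds, nanos = divmod(abs(total_nanos), NANOS_PER_SECOND)
    # Apply the sign to both parts so they never disagree.
    return sign * seconds, sign * nanos
```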

Scenario

JSON representation
{
  "task": string,
  "userFacts": [
    {
      object (UserFact)
    }
  ],
  "maxTurns": integer,
  "rubrics": [
    string
  ],
  "scenarioExpectations": [
    {
      object (ScenarioExpectation)
    }
  ],
  "variableOverrides": {
    object
  },
  "taskCompletionBehavior": enum (TaskCompletionBehavior),
  "userGoalBehavior": enum (UserGoalBehavior),
  "evaluationExpectations": [
    string
  ]
}
Fields
task

string

Required. The task to be targeted by the scenario.

userFacts[]

object (UserFact)

Optional. The user facts to be used by the scenario.

maxTurns

integer

Optional. The maximum number of turns to simulate. If not specified, the simulation will continue until the task is complete.

rubrics[]

string

Required. The rubrics to score the scenario against.

scenarioExpectations[]

object (ScenarioExpectation)

Required. The ScenarioExpectations to evaluate the conversation produced by the user simulation.

variableOverrides

object (Struct format)

Optional. Variables / Session Parameters as context for the session, keyed by variable names. Members of this struct will override any default values set by the system.

Note that these are different from user facts, which are facts known to the user. Variables are parameters known to the agent, e.g. the MDN (phone number) passed by the telephony system.

taskCompletionBehavior
(deprecated)

enum (TaskCompletionBehavior)

Optional. Deprecated. Use user_goal_behavior instead.

userGoalBehavior

enum (UserGoalBehavior)

Optional. The expected behavior of the user goal.

evaluationExpectations[]

string

Optional. The evaluation expectations to evaluate the conversation produced by the simulation against. Format: projects/{project}/locations/{location}/apps/{app}/evaluationExpectations/{evaluationExpectation}

UserFact

JSON representation
{
  "name": string,
  "value": string
}
Fields
name

string

Required. The name of the user fact.

value

string

Required. The value of the user fact.

ScenarioExpectation

JSON representation
{

  // Union field expectation can be only one of the following:
  "toolExpectation": {
    object (ToolExpectation)
  },
  "agentResponse": {
    object (Message)
  }
  // End of list of possible types for union field expectation.
}
Fields
Union field expectation. The expectation to evaluate the conversation produced by the simulation. expectation can be only one of the following:
toolExpectation

object (ToolExpectation)

Optional. The tool call and response pair to be evaluated.

agentResponse

object (Message)

Optional. The agent response to be evaluated.

ToolExpectation

JSON representation
{
  "expectedToolCall": {
    object (ToolCall)
  },
  "mockToolResponse": {
    object (ToolResponse)
  }
}
Fields
expectedToolCall

object (ToolCall)

Required. The expected tool call, with the parameters of interest specified. Any parameters not specified will be hallucinated by the LLM.

mockToolResponse

object (ToolResponse)

Required. The tool response to mock, with the parameters of interest specified. Any parameters not specified will be hallucinated by the LLM.

AggregatedMetrics

JSON representation
{
  "metricsByAppVersion": [
    {
      object (MetricsByAppVersion)
    }
  ]
}
Fields
metricsByAppVersion[]

object (MetricsByAppVersion)

Output only. Aggregated metrics, grouped by app version ID.

MetricsByAppVersion

JSON representation
{
  "appVersionId": string,
  "toolMetrics": [
    {
      object (ToolMetrics)
    }
  ],
  "semanticSimilarityMetrics": [
    {
      object (SemanticSimilarityMetrics)
    }
  ],
  "hallucinationMetrics": [
    {
      object (HallucinationMetrics)
    }
  ],
  "toolCallLatencyMetrics": [
    {
      object (ToolCallLatencyMetrics)
    }
  ],
  "turnLatencyMetrics": [
    {
      object (TurnLatencyMetrics)
    }
  ],
  "passCount": integer,
  "failCount": integer,
  "metricsByTurn": [
    {
      object (MetricsByTurn)
    }
  ]
}
Fields
appVersionId

string

Output only. The app version ID.

toolMetrics[]

object (ToolMetrics)

Output only. Metrics for each tool within this app version.

semanticSimilarityMetrics[]

object (SemanticSimilarityMetrics)

Output only. Metrics for semantic similarity within this app version.

hallucinationMetrics[]

object (HallucinationMetrics)

Output only. Metrics for hallucination within this app version.

toolCallLatencyMetrics[]

object (ToolCallLatencyMetrics)

Output only. Metrics for tool call latency within this app version.

turnLatencyMetrics[]

object (TurnLatencyMetrics)

Output only. Metrics for turn latency within this app version.

passCount

integer

Output only. The number of times the evaluation passed.

failCount

integer

Output only. The number of times the evaluation failed.

metricsByTurn[]

object (MetricsByTurn)

Output only. Metrics aggregated per turn within this app version.
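
One common use of passCount and failCount is to derive a pass rate per app version. The pass rate is not a field in this schema; the sketch below computes it client-side from an AggregatedMetrics dict, and the function name is hypothetical.

```python
def pass_rates(aggregated_metrics):
    """Map each appVersionId to passCount / (passCount + failCount).

    Versions with no recorded runs map to None instead of dividing by zero.
    """
    rates = {}
    for entry in aggregated_metrics.get("metricsByAppVersion", []):
        passed = entry.get("passCount", 0)
        failed = entry.get("failCount", 0)
        total = passed + failed
        rates[entry.get("appVersionId", "")] = passed / total if total else None
    return rates
```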

ToolMetrics

JSON representation
{
  "tool": string,
  "passCount": integer,
  "failCount": integer
}
Fields
tool

string

Output only. The name of the tool.

passCount

integer

Output only. The number of times the tool passed.

failCount

integer

Output only. The number of times the tool failed.

SemanticSimilarityMetrics

JSON representation
{
  "score": number
}
Fields
score

number

Output only. The average semantic similarity score (0-4).

HallucinationMetrics

JSON representation
{
  "score": number
}
Fields
score

number

Output only. The average hallucination score (0 to 1).

ToolCallLatencyMetrics

JSON representation
{
  "tool": string,
  "averageLatency": string
}
Fields
tool

string

Output only. The name of the tool.

averageLatency

string (Duration format)

Output only. The average latency of the tool calls.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

TurnLatencyMetrics

JSON representation
{
  "averageLatency": string
}
Fields
averageLatency

string (Duration format)

Output only. The average latency of the turns.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

MetricsByTurn

JSON representation
{
  "turnIndex": integer,
  "toolMetrics": [
    {
      object (ToolMetrics)
    }
  ],
  "semanticSimilarityMetrics": [
    {
      object (SemanticSimilarityMetrics)
    }
  ],
  "hallucinationMetrics": [
    {
      object (HallucinationMetrics)
    }
  ],
  "toolCallLatencyMetrics": [
    {
      object (ToolCallLatencyMetrics)
    }
  ],
  "turnLatencyMetrics": [
    {
      object (TurnLatencyMetrics)
    }
  ]
}
Fields
turnIndex

integer

Output only. The turn index (0-based).

toolMetrics[]

object (ToolMetrics)

Output only. Metrics for each tool within this turn.

semanticSimilarityMetrics[]

object (SemanticSimilarityMetrics)

Output only. Metrics for semantic similarity within this turn.

hallucinationMetrics[]

object (HallucinationMetrics)

Output only. Metrics for hallucination within this turn.

toolCallLatencyMetrics[]

object (ToolCallLatencyMetrics)

Output only. Metrics for tool call latency within this turn.

turnLatencyMetrics[]

object (TurnLatencyMetrics)

Output only. Metrics for turn latency within this turn.

EvaluationResult

JSON representation
{
  "name": string,
  "displayName": string,
  "createTime": string,
  "evaluationStatus": enum (Outcome),
  "evaluationRun": string,
  "persona": {
    object (EvaluationPersona)
  },
  "errorInfo": {
    object (EvaluationErrorInfo)
  },
  "error": {
    object (Status)
  },
  "initiatedBy": string,
  "appVersion": string,
  "appVersionDisplayName": string,
  "changelog": string,
  "changelogCreateTime": string,
  "executionState": enum (ExecutionState),
  "evaluationMetricsThresholds": {
    object (EvaluationMetricsThresholds)
  },
  "config": {
    object (EvaluationConfig)
  },
  "goldenRunMethod": enum (GoldenRunMethod),

  // Union field result can be only one of the following:
  "goldenResult": {
    object (GoldenResult)
  },
  "scenarioResult": {
    object (ScenarioResult)
  }
  // End of list of possible types for union field result.
}
Fields
name

string

Identifier. The unique identifier of the evaluation result. Format: projects/{project}/locations/{location}/apps/{app}/evaluations/{evaluation}/results/{result}

displayName

string

Required. Display name of the Evaluation Result. Unique within an Evaluation. By default, it has the following format: " result - ".

createTime

string (Timestamp format)

Output only. Timestamp when the evaluation result was created.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

evaluationStatus

enum (Outcome)

Output only. The outcome of the evaluation. Only populated if execution_state is COMPLETE.

evaluationRun

string

Output only. The evaluation run that produced this result. Format: projects/{project}/locations/{location}/apps/{app}/evaluationRuns/{evaluationRun}

persona

object (EvaluationPersona)

Output only. The persona used to generate the conversation for the evaluation result.

errorInfo

object (EvaluationErrorInfo)

Output only. Error information for the evaluation result.

error
(deprecated)

object (Status)

Output only. Deprecated: Use error_info instead. Errors encountered during execution.

initiatedBy

string

Output only. The user who initiated the evaluation run that resulted in this result.

appVersion

string

Output only. The app version used to generate the conversation that resulted in this result. Format: projects/{project}/locations/{location}/apps/{app}/versions/{version}

appVersionDisplayName

string

Output only. The display name of the app_version that the evaluation ran against.

changelog

string

Output only. The changelog of the app version that the evaluation ran against. This is populated if the user runs the evaluation on latest/draft.

changelogCreateTime

string (Timestamp format)

Output only. The create time of the changelog of the app version that the evaluation ran against. This is populated if the user runs the evaluation on latest/draft.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

executionState

enum (ExecutionState)

Output only. The state of the evaluation result execution.

evaluationMetricsThresholds

object (EvaluationMetricsThresholds)

Output only. The evaluation thresholds for the result.

config

object (EvaluationConfig)

Output only. The configuration used in the evaluation run that resulted in this result.

goldenRunMethod

enum (GoldenRunMethod)

Output only. The method used to run the golden evaluation.

Union field result. The result of the evaluation. Only populated when the execution_state is COMPLETED. result can be only one of the following:
goldenResult

object (GoldenResult)

Output only. The outcome of a golden evaluation.

scenarioResult

object (ScenarioResult)

Output only. The outcome of a scenario evaluation.

GoldenResult

JSON representation
{
  "turnReplayResults": [
    {
      object (TurnReplayResult)
    }
  ],
  "evaluationExpectationResults": [
    {
      object (EvaluationExpectationResult)
    }
  ]
}
Fields
turnReplayResults[]

object (TurnReplayResult)

Output only. The result of running each turn of the golden conversation.

evaluationExpectationResults[]

object (EvaluationExpectationResult)

Output only. The results of the evaluation expectations.

TurnReplayResult

JSON representation
{
  "conversation": string,
  "expectationOutcome": [
    {
      object (GoldenExpectationOutcome)
    }
  ],
  "hallucinationResult": {
    object (HallucinationResult)
  },
  "toolInvocationScore": number,
  "turnLatency": string,
  "toolCallLatencies": [
    {
      object (ToolCallLatency)
    }
  ],
  "semanticSimilarityResult": {
    object (SemanticSimilarityResult)
  },
  "overallToolInvocationResult": {
    object (OverallToolInvocationResult)
  },
  "errorInfo": {
    object (EvaluationErrorInfo)
  },
  "spanLatencies": [
    {
      object (SpanLatency)
    }
  ],

  // Union field _tool_ordered_invocation_score can be only one of the following:
  "toolOrderedInvocationScore": number
  // End of list of possible types for union field
  // _tool_ordered_invocation_score.
}
Fields
conversation

string

Output only. The conversation that was generated for this turn.

expectationOutcome[]

object (GoldenExpectationOutcome)

Output only. The outcome of each expectation.

hallucinationResult

object (HallucinationResult)

Output only. The result of the hallucination check.

toolInvocationScore
(deprecated)

number

Output only. Deprecated. Use OverallToolInvocationResult instead.

turnLatency

string (Duration format)

Output only. Duration of the turn.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".
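
A minimal sketch of converting a Duration string such as "3.5s" into a number of seconds, assuming a Python client. The helper name parse_duration_seconds is hypothetical. Decimal is used so that all nine fractional digits survive without floating-point rounding.

```python
import re
from decimal import Decimal

# Optional sign, whole seconds, optional 1-9 fractional digits, trailing "s".
_DURATION = re.compile(r"(-?\d+(?:\.\d{1,9})?)s")


def parse_duration_seconds(value: str) -> Decimal:
    """Parse a Duration string (for example "3.5s") into seconds."""
    match = _DURATION.fullmatch(value)
    if match is None:
        raise ValueError(f"not a Duration string: {value!r}")
    return Decimal(match.group(1))
```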

toolCallLatencies[]

object (ToolCallLatency)

Output only. The latency of each tool call in the turn.

semanticSimilarityResult

object (SemanticSimilarityResult)

Output only. The result of the semantic similarity check.

overallToolInvocationResult

object (OverallToolInvocationResult)

Output only. The result of the overall tool invocation check.

errorInfo

object (EvaluationErrorInfo)

Output only. Information about the error that occurred during this turn.

spanLatencies[]

object (SpanLatency)

Output only. The latency of spans in the turn.

Union field _tool_ordered_invocation_score.

_tool_ordered_invocation_score can be only one of the following:

toolOrderedInvocationScore

number

Output only. The overall tool ordered invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked in the expected order.

GoldenExpectationOutcome

JSON representation
{
  "expectation": {
    object (GoldenExpectation)
  },
  "outcome": enum (Outcome),
  "semanticSimilarityResult": {
    object (SemanticSimilarityResult)
  },
  "toolInvocationResult": {
    object (ToolInvocationResult)
  },

  // Union field result can be only one of the following:
  "observedToolCall": {
    object (ToolCall)
  },
  "observedToolResponse": {
    object (ToolResponse)
  },
  "observedAgentResponse": {
    object (Message)
  },
  "observedAgentTransfer": {
    object (AgentTransfer)
  }
  // End of list of possible types for union field result.
}
Fields
expectation

object (GoldenExpectation)

Output only. The expectation that was evaluated.

outcome

enum (Outcome)

Output only. The outcome of the expectation.

semanticSimilarityResult
(deprecated)

object (SemanticSimilarityResult)

Output only. The result of the semantic similarity check.

toolInvocationResult

object (ToolInvocationResult)

Output only. The result of the tool invocation check.

Union field result. The result of the expectation. result can be only one of the following:
observedToolCall

object (ToolCall)

Output only. The result of the tool call expectation.

observedToolResponse

object (ToolResponse)

Output only. The result of the tool response expectation.

observedAgentResponse

object (Message)

Output only. The result of the agent response expectation.

observedAgentTransfer

object (AgentTransfer)

Output only. The result of the agent transfer expectation.

SemanticSimilarityResult

JSON representation
{
  "label": string,
  "explanation": string,
  "outcome": enum (Outcome),

  // Union field _score can be only one of the following:
  "score": integer
  // End of list of possible types for union field _score.
}
Fields
label

string

Output only. The label associated with each score. Score 4: Fully Consistent; Score 3: Mostly Consistent; Score 2: Partially Consistent (Minor Omissions); Score 1: Largely Inconsistent (Major Omissions); Score 0: Completely Inconsistent / Contradictory.

explanation

string

Output only. The explanation for the semantic similarity score.

outcome

enum (Outcome)

Output only. The outcome of the semantic similarity check. This is determined by comparing the score to the semantic_similarity_success_threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.

Union field _score.

_score can be only one of the following:

score

integer

Output only. The semantic similarity score. Can be 0, 1, 2, 3, or 4.
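
The PASS/FAIL rule described for outcome can be sketched as a single comparison. This is an illustration of the documented rule, not the service's implementation. The function name is hypothetical, and the default threshold of 3 is taken from the semanticSimilaritySuccessThreshold description in this reference.

```python
def semantic_similarity_outcome(score: int, threshold: int = 3) -> str:
    """Return "PASS" when score meets or exceeds threshold, else "FAIL"."""
    # Both the score and the threshold are documented as integers from 0 to 4.
    if score not in range(5):
        raise ValueError(f"score must be 0-4, got {score!r}")
    if threshold not in range(5):
        raise ValueError(f"threshold must be 0-4, got {threshold!r}")
    return "PASS" if score >= threshold else "FAIL"
```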

ToolInvocationResult

JSON representation
{
  "outcome": enum (Outcome),
  "explanation": string,

  // Union field _parameter_correctness_score can be only one of the following:
  "parameterCorrectnessScore": number
  // End of list of possible types for union field _parameter_correctness_score.
}
Fields
outcome

enum (Outcome)

Output only. The outcome of the tool invocation check. This is determined by comparing the parameter_correctness_score to the threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.

explanation

string

Output only. A free text explanation for the tool invocation result.

Union field _parameter_correctness_score.

_parameter_correctness_score can be only one of the following:

parameterCorrectnessScore

number

Output only. The tool invocation parameter correctness score. This indicates the percent of parameters from the expected tool call that were also present in the actual tool call.
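
As a rough illustration of this score, the sketch below counts how many parameter names from the expected tool call also appear in the actual call. This reference does not say whether the service also compares parameter values, so matching on names alone is an assumption, as is the function name and the choice to return 1.0 when nothing is expected. The result is a fraction between 0 and 1, in line with the threshold range documented for toolInvocationParameterCorrectnessThreshold.

```python
def parameter_correctness_score(expected_args: dict, actual_args: dict) -> float:
    """Fraction of expected parameter names that are present in the actual call."""
    if not expected_args:
        # Assumption: with nothing expected, nothing can be missing.
        return 1.0
    # Assumption: a parameter counts as present when its name matches;
    # values are not compared in this sketch.
    matched = sum(1 for name in expected_args if name in actual_args)
    return matched / len(expected_args)
```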

HallucinationResult

JSON representation
{
  "label": string,
  "explanation": string,

  // Union field _score can be only one of the following:
  "score": integer
  // End of list of possible types for union field _score.
}
Fields
label

string

Output only. The label associated with each score. Score 1: Justified; Score 0: Not Justified; Score -1: No Claim To Assess.

explanation

string

Output only. The explanation for the hallucination score.

Union field _score.

_score can be only one of the following:

score

integer

Output only. The hallucination score. Can be -1, 0, or 1.

ToolCallLatency

JSON representation
{
  "tool": string,
  "displayName": string,
  "startTime": string,
  "endTime": string,
  "executionLatency": string
}
Fields
tool

string

Output only. The name of the tool that got executed. Format: projects/{project}/locations/{location}/apps/{app}/tools/{tool}.

displayName

string

Output only. The display name of the tool.

startTime

string (Timestamp format)

Output only. The start time of the tool call execution.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

endTime

string (Timestamp format)

Output only. The end time of the tool call execution.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

executionLatency

string (Duration format)

Output only. The latency of the tool call execution.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".
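
A short sketch of summarizing a list of ToolCallLatency objects, assuming they have been decoded into Python dicts. It totals executionLatency per tool displayName. The function name and the sample tool names in the usage note are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def total_latency_by_tool(tool_call_latencies: list) -> dict:
    """Sum executionLatency (in seconds) for each tool display name."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for entry in tool_call_latencies:
        # Duration strings end with "s", for example "3.5s".
        seconds = Decimal(entry["executionLatency"].rstrip("s"))
        totals[entry["displayName"]] += seconds
    return dict(totals)
```

For instance, two calls to a tool named "lookup_order" with latencies "0.25s" and "0.5s" would be reported as a single total of 0.75 seconds.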

OverallToolInvocationResult

JSON representation
{
  "outcome": enum (Outcome),

  // Union field _tool_invocation_score can be only one of the following:
  "toolInvocationScore": number
  // End of list of possible types for union field _tool_invocation_score.
}
Fields
outcome

enum (Outcome)

Output only. The outcome of the tool invocation check. This is determined by comparing the tool_invocation_score to the overall_tool_invocation_correctness_threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.

Union field _tool_invocation_score.

_tool_invocation_score can be only one of the following:

toolInvocationScore

number

The overall tool invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked.
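
The sketch below illustrates one way such a score and its outcome could be derived: the fraction of expected tool calls that were actually invoked, compared against a threshold. How the service matches calls is not specified in this reference, so matching by tool name with multiplicity, the function name, and the handling of an empty expectation list are all assumptions. The default threshold of 1.0 follows the overallToolInvocationCorrectnessThreshold description.

```python
from collections import Counter


def overall_tool_invocation(expected_tools, invoked_tools, threshold=1.0):
    """Return (score, outcome) for the expected tools that were invoked."""
    remaining = Counter(invoked_tools)
    hits = 0
    for tool in expected_tools:
        # Assumption: each invoked call can satisfy at most one expected call.
        if remaining[tool] > 0:
            remaining[tool] -= 1
            hits += 1
    # Assumption: an empty expectation list scores 1.0.
    score = hits / len(expected_tools) if expected_tools else 1.0
    outcome = "PASS" if score >= threshold else "FAIL"
    return score, outcome
```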

EvaluationErrorInfo

JSON representation
{
  "errorType": enum (ErrorType),
  "errorMessage": string,
  "sessionId": string
}
Fields
errorType

enum (ErrorType)

Output only. The type of error.

errorMessage

string

Output only. The error message.

sessionId

string

Output only. The session ID for the conversation that caused the error.

SpanLatency

JSON representation
{
  "type": enum (Type),
  "displayName": string,
  "startTime": string,
  "endTime": string,
  "executionLatency": string,

  // Union field identifier can be only one of the following:
  "resource": string,
  "toolset": {
    object (ToolsetTool)
  },
  "model": string,
  "callback": string
  // End of list of possible types for union field identifier.
}
Fields
type

enum (Type)

Output only. The type of span.

displayName

string

Output only. The display name of the span. Applicable to tool and guardrail spans.

startTime

string (Timestamp format)

Output only. The start time of the span.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

endTime

string (Timestamp format)

Output only. The end time of the span.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

executionLatency

string (Duration format)

Output only. The latency of the span.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

Union field identifier. The identifier of the specific item based on its type. identifier can be only one of the following:
resource

string

Output only. The resource name of the guardrail or tool span.

toolset

object (ToolsetTool)

Output only. The toolset tool identifier.

model

string

Output only. The name of the LLM span.

callback

string

Output only. The name of the user callback span.

EvaluationExpectationResult

JSON representation
{
  "evaluationExpectation": string,
  "prompt": string,
  "outcome": enum (Outcome),
  "explanation": string
}
Fields
evaluationExpectation

string

Output only. The evaluation expectation. Format: projects/{project}/locations/{location}/apps/{app}/evaluationExpectations/{evaluation_expectation}

prompt

string

Output only. The prompt that was used for the evaluation.

outcome

enum (Outcome)

Output only. The outcome of the evaluation expectation.

explanation

string

Output only. The explanation for the result.

ScenarioResult

JSON representation
{
  "conversation": string,
  "task": string,
  "userFacts": [
    {
      object (UserFact)
    }
  ],
  "expectationOutcomes": [
    {
      object (ScenarioExpectationOutcome)
    }
  ],
  "rubricOutcomes": [
    {
      object (ScenarioRubricOutcome)
    }
  ],
  "hallucinationResult": [
    {
      object (HallucinationResult)
    }
  ],
  "taskCompletionResult": {
    object (TaskCompletionResult)
  },
  "toolCallLatencies": [
    {
      object (ToolCallLatency)
    }
  ],
  "userGoalSatisfactionResult": {
    object (UserGoalSatisfactionResult)
  },
  "spanLatencies": [
    {
      object (SpanLatency)
    }
  ],
  "evaluationExpectationResults": [
    {
      object (EvaluationExpectationResult)
    }
  ],

  // Union field _all_expectations_satisfied can be only one of the following:
  "allExpectationsSatisfied": boolean
  // End of list of possible types for union field _all_expectations_satisfied.

  // Union field _task_completed can be only one of the following:
  "taskCompleted": boolean
  // End of list of possible types for union field _task_completed.
}
Fields
conversation

string

Output only. The conversation that was generated in the scenario.

task

string

Output only. The task that was used when running the scenario for this result.

userFacts[]

object (UserFact)

Output only. The user facts that were used by the scenario for this result.

expectationOutcomes[]

object (ScenarioExpectationOutcome)

Output only. The outcome of each expectation.

rubricOutcomes[]

object (ScenarioRubricOutcome)

Output only. The outcome of the rubric.

hallucinationResult[]

object (HallucinationResult)

Output only. The result of the hallucination check. There will be one hallucination result for each turn in the conversation.

taskCompletionResult
(deprecated)

object (TaskCompletionResult)

Output only. The result of the task completion check.

toolCallLatencies[]

object (ToolCallLatency)

Output only. The latency of each tool call execution in the conversation.

userGoalSatisfactionResult

object (UserGoalSatisfactionResult)

Output only. The result of the user goal satisfaction check.

spanLatencies[]

object (SpanLatency)

Output only. The latency of spans in the conversation.

evaluationExpectationResults[]

object (EvaluationExpectationResult)

Output only. The results of the evaluation expectations.

Union field _all_expectations_satisfied.

_all_expectations_satisfied can be only one of the following:

allExpectationsSatisfied

boolean

Output only. Whether all expectations were satisfied for this turn.

Union field _task_completed.

_task_completed can be only one of the following:

taskCompleted

boolean

Output only. Whether the task was completed for this turn. This is a composite of all expectations satisfied, no hallucinations, and user goal satisfaction.
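
The composite described for taskCompleted can be illustrated as follows. This is a sketch of the stated rule, not the service's logic. It assumes that a hallucination score of 0 (Not Justified) counts as a hallucination while -1 (No Claim To Assess) does not, and that the user goal is satisfied only when its score is 1. The function name is hypothetical.

```python
def derive_task_completed(scenario_result: dict) -> bool:
    """Combine the three documented signals into one boolean."""
    all_satisfied = scenario_result.get("allExpectationsSatisfied", False)
    # Assumption: only score 0 ("Not Justified") marks a hallucinated turn.
    no_hallucinations = all(
        item.get("score") != 0
        for item in scenario_result.get("hallucinationResult", [])
    )
    # Assumption: only score 1 means the user goal was satisfied.
    goal = scenario_result.get("userGoalSatisfactionResult", {})
    goal_satisfied = goal.get("score") == 1
    return bool(all_satisfied and no_hallucinations and goal_satisfied)
```

Clients should prefer the taskCompleted value returned by the service; a derivation like this is mainly useful for explaining why a result was marked incomplete.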

ScenarioExpectationOutcome

JSON representation
{
  "expectation": {
    object (ScenarioExpectation)
  },
  "outcome": enum (Outcome),

  // Union field result can be only one of the following:
  "observedToolCall": {
    object (ObservedToolCall)
  },
  "observedAgentResponse": {
    object (Message)
  }
  // End of list of possible types for union field result.
}
Fields
expectation

object (ScenarioExpectation)

Output only. The expectation that was evaluated.

outcome

enum (Outcome)

Output only. The outcome of the ScenarioExpectation.

Union field result. The result of the expectation. result can be only one of the following:
observedToolCall

object (ObservedToolCall)

Output only. The observed tool call.

observedAgentResponse

object (Message)

Output only. The observed agent response.

ObservedToolCall

JSON representation
{
  "toolCall": {
    object (ToolCall)
  },
  "toolResponse": {
    object (ToolResponse)
  }
}
Fields
toolCall

object (ToolCall)

Output only. The observed tool call.

toolResponse

object (ToolResponse)

Output only. The observed tool response.

ScenarioRubricOutcome

JSON representation
{
  "rubric": string,
  "scoreExplanation": string,

  // Union field _score can be only one of the following:
  "score": number
  // End of list of possible types for union field _score.
}
Fields
rubric

string

Output only. The rubric that was used to evaluate the conversation.

scoreExplanation

string

Output only. The rater's response to the rubric.

Union field _score.

_score can be only one of the following:

score

number

Output only. The score of the conversation against the rubric.

TaskCompletionResult

JSON representation
{
  "label": string,
  "explanation": string,

  // Union field _score can be only one of the following:
  "score": integer
  // End of list of possible types for union field _score.
}
Fields
label

string

Output only. The label associated with each score. Score 1: Task Completed; Score 0: Task Not Completed; Score -1: User Goal Undefined.

explanation

string

Output only. The explanation for the task completion score.

Union field _score.

_score can be only one of the following:

score

integer

Output only. The task completion score. Can be -1, 0, or 1.

UserGoalSatisfactionResult

JSON representation
{
  "label": string,
  "explanation": string,

  // Union field _score can be only one of the following:
  "score": integer
  // End of list of possible types for union field _score.
}
Fields
label

string

Output only. The label associated with each score. Score 1: User Task Satisfied; Score 0: User Task Not Satisfied; Score -1: User Task Unspecified.

explanation

string

Output only. The explanation for the user task satisfaction score.

Union field _score.

_score can be only one of the following:

score

integer

Output only. The user task satisfaction score. Can be -1, 0, or 1.

EvaluationPersona

JSON representation
{
  "name": string,
  "description": string,
  "displayName": string,
  "personality": string,
  "speechConfig": {
    object (SpeechConfig)
  }
}
Fields
name

string

Required. The unique identifier of the persona. Format: projects/{project}/locations/{location}/apps/{app}/evaluationPersonas/{evaluationPersona}

description

string

Optional. The description of the persona.

displayName

string

Required. The display name of the persona. Unique within an app.

personality

string

Required. An instruction for the agent on how to behave in the evaluation.

speechConfig

object (SpeechConfig)

Optional. Configuration for how the persona sounds (TTS settings).

SpeechConfig

JSON representation
{
  "speakingRate": number,
  "environment": enum (BackgroundEnvironment),
  "voiceId": string
}
Fields
speakingRate

number

Optional. The speaking rate. 1.0 is normal. Lower is slower (e.g., 0.8), higher is faster (e.g., 1.5). Useful for testing how the agent handles fast talkers.

environment

enum (BackgroundEnvironment)

Optional. The simulated audio environment.

voiceId

string

Optional. The specific voice identifier/accent to use. Example: "en-US-Wavenet-D" or "en-GB-Standard-A"

Status

JSON representation
{
  "code": integer,
  "message": string,
  "details": [
    {
      "@type": string,
      field1: ...,
      ...
    }
  ]
}
Fields
code

integer

The status code, which should be an enum value of google.rpc.Code.

message

string

A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.

details[]

object

A list of messages that carry the error details. There is a common set of message types for APIs to use.

An object containing fields of an arbitrary type. An additional field "@type" contains a URI identifying the type. Example: { "id": 1234, "@type": "types.example.com/standard/id" }.

Any

JSON representation
{
  "typeUrl": string,
  "value": string
}
Fields
typeUrl

string

Identifies the type of the serialized Protobuf message with a URI reference consisting of a prefix ending in a slash and the fully-qualified type name.

Example: type.googleapis.com/google.protobuf.StringValue

This string must contain at least one / character, and the content after the last / must be the fully-qualified name of the type in canonical form, without a leading dot. Do not write a scheme on these URI references so that clients do not attempt to contact them.

The prefix is arbitrary and Protobuf implementations are expected to simply strip off everything up to and including the last / to identify the type. type.googleapis.com/ is a common default prefix that some legacy implementations require. This prefix does not indicate the origin of the type, and URIs containing it are not expected to respond to any requests.

All type URL strings must be legal URI references with the additional restriction (for the text format) that the content of the reference must consist only of alphanumeric characters, percent-encoded escapes, and characters in the following set (not including the outer backticks): `/-.~_!$&()*+,;=`. Although percent encodings are allowed, implementations should not unescape them, to prevent confusion with existing parsers. For example, type.googleapis.com%2FFoo should be rejected.

In the original design of Any, the possibility of launching a type resolution service at these type URLs was considered but Protobuf never implemented one and considers contacting these URLs to be problematic and a potential security issue. Do not attempt to contact type URLs.

value

string (bytes format)

Holds a Protobuf serialization of the type described by type_url.

A base64-encoded string.
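
A minimal sketch of reading an Any object that has been decoded into a Python dict. It recovers the type name by stripping everything up to and including the last "/", as described above, and decodes the base64 value into raw bytes. Both helper names are hypothetical, and the sketch never contacts the type URL.

```python
import base64


def any_type_name(type_url: str) -> str:
    """Return the fully-qualified type name that follows the last '/'."""
    if "/" not in type_url:
        raise ValueError(f"type URL must contain at least one '/': {type_url!r}")
    return type_url.rsplit("/", 1)[1]


def any_value_bytes(any_message: dict) -> bytes:
    """Decode the base64 'value' field into the serialized Protobuf bytes."""
    return base64.b64decode(any_message["value"])
```

The bytes returned by any_value_bytes still need to be parsed with the Protobuf message class that corresponds to the type name.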

EvaluationMetricsThresholds

JSON representation
{
  "goldenEvaluationMetricsThresholds": {
    object (GoldenEvaluationMetricsThresholds)
  },
  "hallucinationMetricBehavior": enum (HallucinationMetricBehavior),
  "goldenHallucinationMetricBehavior": enum (HallucinationMetricBehavior),
  "scenarioHallucinationMetricBehavior": enum (HallucinationMetricBehavior)
}
Fields
goldenEvaluationMetricsThresholds

object (GoldenEvaluationMetricsThresholds)

Optional. The golden evaluation metrics thresholds.

hallucinationMetricBehavior
(deprecated)

enum (HallucinationMetricBehavior)

Optional. Deprecated: Use golden_hallucination_metric_behavior instead. The hallucination metric behavior is currently used for golden evaluations.

goldenHallucinationMetricBehavior

enum (HallucinationMetricBehavior)

Optional. The hallucination metric behavior for golden evaluations.

scenarioHallucinationMetricBehavior

enum (HallucinationMetricBehavior)

Optional. The hallucination metric behavior for scenario evaluations.

GoldenEvaluationMetricsThresholds

JSON representation
{
  "turnLevelMetricsThresholds": {
    object (TurnLevelMetricsThresholds)
  },
  "expectationLevelMetricsThresholds": {
    object (ExpectationLevelMetricsThresholds)
  },
  "toolMatchingSettings": {
    object (ToolMatchingSettings)
  }
}
Fields
turnLevelMetricsThresholds

object (TurnLevelMetricsThresholds)

Optional. The turn level metrics thresholds.

expectationLevelMetricsThresholds

object (ExpectationLevelMetricsThresholds)

Optional. The expectation level metrics thresholds.

toolMatchingSettings

object (ToolMatchingSettings)

Optional. The tool matching settings. An extra tool call is a tool call that is present in the execution but does not match any tool call in the golden expectation.

TurnLevelMetricsThresholds

JSON representation
{
  "semanticSimilarityChannel": enum (SemanticSimilarityChannel),

  // Union field _semantic_similarity_success_threshold can be only one of the
  // following:
  "semanticSimilaritySuccessThreshold": integer
  // End of list of possible types for union field
  // _semantic_similarity_success_threshold.

  // Union field _overall_tool_invocation_correctness_threshold can be only one
  // of the following:
  "overallToolInvocationCorrectnessThreshold": number
  // End of list of possible types for union field
  // _overall_tool_invocation_correctness_threshold.
}
Fields
semanticSimilarityChannel

enum (SemanticSimilarityChannel)

Optional. The semantic similarity channel to use for evaluation.

Union field _semantic_similarity_success_threshold.

_semantic_similarity_success_threshold can be only one of the following:

semanticSimilaritySuccessThreshold

integer

Optional. The success threshold for semantic similarity. Must be an integer between 0 and 4. Default is 3, so scores of 3 or higher pass.

Union field _overall_tool_invocation_correctness_threshold.

_overall_tool_invocation_correctness_threshold can be only one of the following:

overallToolInvocationCorrectnessThreshold

number

Optional. The success threshold for overall tool invocation correctness. Must be a float between 0 and 1. Default is 1.0.
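
The documented ranges for the two turn-level thresholds can be checked on the client before sending a request. The sketch below assumes the thresholds are held in a Python dict keyed by the JSON field names. The function name is hypothetical, and the server remains the authority on validation.

```python
def validate_turn_level_thresholds(thresholds: dict) -> list:
    """Return a list of problems; an empty list means the values are in range."""
    problems = []

    sim = thresholds.get("semanticSimilaritySuccessThreshold")
    if sim is not None:
        # Documented as an integer between 0 and 4 (bool is excluded explicitly).
        is_int = isinstance(sim, int) and not isinstance(sim, bool)
        if not (is_int and 0 <= sim <= 4):
            problems.append("semanticSimilaritySuccessThreshold must be an integer from 0 to 4")

    tool = thresholds.get("overallToolInvocationCorrectnessThreshold")
    if tool is not None:
        # Documented as a float between 0 and 1.
        is_number = isinstance(tool, (int, float)) and not isinstance(tool, bool)
        if not (is_number and 0 <= tool <= 1):
            problems.append("overallToolInvocationCorrectnessThreshold must be between 0 and 1")

    return problems
```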

ExpectationLevelMetricsThresholds

JSON representation
{

  // Union field _tool_invocation_parameter_correctness_threshold can be only one
  // of the following:
  "toolInvocationParameterCorrectnessThreshold": number
  // End of list of possible types for union field
  // _tool_invocation_parameter_correctness_threshold.
}
Fields

Union field _tool_invocation_parameter_correctness_threshold.

_tool_invocation_parameter_correctness_threshold can be only one of the following:

toolInvocationParameterCorrectnessThreshold

number

Optional. The success threshold for individual tool invocation parameter correctness. Must be a float between 0 and 1. Default is 1.0.

ToolMatchingSettings

JSON representation
{
  "extraToolCallBehavior": enum (ExtraToolCallBehavior)
}
Fields
extraToolCallBehavior

enum (ExtraToolCallBehavior)

Optional. Behavior for extra tool calls. Defaults to FAIL.

EvaluationConfig

JSON representation
{
  "inputAudioConfig": {
    object (InputAudioConfig)
  },
  "outputAudioConfig": {
    object (OutputAudioConfig)
  },
  "evaluationChannel": enum (EvaluationChannel),
  "toolCallBehaviour": enum (EvaluationToolCallBehaviour)
}
Fields
inputAudioConfig
(deprecated)

object (InputAudioConfig)

Optional. Configuration for processing the input audio.

outputAudioConfig
(deprecated)

object (OutputAudioConfig)

Optional. Configuration for generating the output audio.

evaluationChannel

enum (EvaluationChannel)

Optional. The channel to evaluate.

toolCallBehaviour

enum (EvaluationToolCallBehaviour)

Optional. Specifies whether the evaluation should use real tool calls or fake tools.

InputAudioConfig

JSON representation
{
  "audioEncoding": enum (AudioEncoding),
  "sampleRateHertz": integer,
  "noiseSuppressionLevel": string
}
Fields
audioEncoding

enum (AudioEncoding)

Required. The encoding of the input audio data.

sampleRateHertz

integer

Required. The sample rate (in Hertz) of the input audio data.

noiseSuppressionLevel

string

Optional. Whether to enable noise suppression on the input audio. Available values are "low", "moderate", "high", "very_high".

OutputAudioConfig

JSON representation
{
  "audioEncoding": enum (AudioEncoding),
  "sampleRateHertz": integer
}
Fields
audioEncoding

enum (AudioEncoding)

Required. The encoding of the output audio data.

sampleRateHertz

integer

Required. The sample rate (in Hertz) of the output audio data.

Tool Annotations

Destructive Hint: ❌ | Idempotent Hint: ✅ | Read Only Hint: ✅ | Open World Hint: ❌