Tool: list_evaluation_runs
Lists evaluation runs.
The following sample demonstrates how to use curl to invoke the list_evaluation_runs MCP tool.
| Curl Request |
|---|
curl --location 'https://ces.[REGION].rep.googleapis.com/mcp' \
--header 'content-type: application/json' \
--header 'accept: application/json, text/event-stream' \
--data '{
  "method": "tools/call",
  "params": {
    "name": "list_evaluation_runs",
    "arguments": {
      // provide these details according to the MCP specification of the tool
    }
  },
  "jsonrpc": "2.0",
  "id": 1
}' |
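The same request body can be assembled programmatically. The following is a minimal Python sketch that mirrors the curl sample; the helper name `build_tools_call` and the placeholder `PARENT_RESOURCE_NAME` are illustrative assumptions, not part of the API. It only builds the JSON-RPC payload and headers and does not send anything.

```python
import json


def build_tools_call(arguments, request_id=1):
    """Return a JSON-RPC 2.0 body that calls the list_evaluation_runs tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "list_evaluation_runs", "arguments": arguments},
    }


# Headers copied from the curl sample.
HEADERS = {
    "content-type": "application/json",
    "accept": "application/json, text/event-stream",
}

# PARENT_RESOURCE_NAME is a placeholder for the app resource name.
body = build_tools_call({"parent": "PARENT_RESOURCE_NAME"})
print(json.dumps(body, indent=2))
```

The serialized `body` can then be posted to the endpoint shown in the curl sample with any HTTP client.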
Input Schema
Request message for EvaluationService.ListEvaluationRuns.
ListEvaluationRunsRequest
| JSON representation |
|---|
{ "parent": string, "pageSize": integer, "pageToken": string, "filter": string, "orderBy": string } |
| Fields | |
|---|---|
parent |
Required. The resource name of the app to list evaluation runs from. |
pageSize |
Optional. Requested page size. Server may return fewer items than requested. If unspecified, server will pick an appropriate default. |
pageToken |
Optional. The |
filter |
Optional. Filter to be applied when listing the evaluation runs. See https://google.aip.dev/160 for more details. |
orderBy |
Optional. Field to sort by. Only "name", "create_time", and "update_time" are supported. Time fields are ordered in descending order, and the name field is ordered in ascending order. If not included, "update_time" will be the default. See https://google.aip.dev/132#ordering for more details. |
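As an illustration of the input schema, the sketch below builds the `arguments` object, omits unset optional fields, and rejects `orderBy` values other than the three listed above. The function name `build_arguments` and its parameter names are hypothetical, and the exact-match check on `orderBy` is an assumption; the server remains the authority on what it accepts.

```python
SUPPORTED_ORDER_FIELDS = {"name", "create_time", "update_time"}


def build_arguments(parent, page_size=None, page_token=None,
                    filter_expr=None, order_by=None):
    """Build a ListEvaluationRunsRequest as a plain dict."""
    if not parent:
        raise ValueError("parent is required")
    if order_by is not None and order_by not in SUPPORTED_ORDER_FIELDS:
        raise ValueError(f"unsupported orderBy field: {order_by!r}")

    arguments = {"parent": parent}
    optional = {
        "pageSize": page_size,
        "pageToken": page_token,
        "filter": filter_expr,
        "orderBy": order_by,
    }
    # Leave out anything the caller did not set so server defaults apply.
    arguments.update({k: v for k, v in optional.items() if v is not None})
    return arguments
```

For example, `build_arguments("PARENT_RESOURCE_NAME", page_size=20, order_by="create_time")` yields a dict with only `parent`, `pageSize`, and `orderBy`.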
Output Schema
Response message for EvaluationService.ListEvaluationRuns.
ListEvaluationRunsResponse
| JSON representation |
|---|
{
"evaluationRuns": [
{
object ( |
| Fields | |
|---|---|
evaluationRuns[] |
The list of evaluation runs. |
nextPageToken |
A token that can be sent as |
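Paging through results might look like the sketch below. It assumes the usual Google API pagination convention, in which `nextPageToken` from one response is sent back as `pageToken` in the next request and an absent token means there are no more pages; the field description above is truncated, so treat that as an assumption. `call_tool` is a hypothetical caller-supplied function that takes the `arguments` dict and returns the decoded ListEvaluationRunsResponse.

```python
def iter_evaluation_runs(call_tool, parent, page_size=None):
    """Yield every evaluation run, following nextPageToken until it is absent."""
    page_token = None
    while True:
        arguments = {"parent": parent}
        if page_size is not None:
            arguments["pageSize"] = page_size
        if page_token:
            arguments["pageToken"] = page_token

        response = call_tool(arguments)
        yield from response.get("evaluationRuns", [])

        page_token = response.get("nextPageToken")
        if not page_token:
            break


def fake_call_tool(arguments):
    """Stand-in for a real transport: serves two canned pages."""
    pages = {
        None: {"evaluationRuns": [{"name": "run-1"}, {"name": "run-2"}],
               "nextPageToken": "t1"},
        "t1": {"evaluationRuns": [{"name": "run-3"}]},
    }
    return pages[arguments.get("pageToken")]


names = [run["name"]
         for run in iter_evaluation_runs(fake_call_tool, "PARENT_RESOURCE_NAME")]
print(names)
```

Using a generator keeps memory use flat and lets the caller stop early without fetching the remaining pages.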
EvaluationRun
| JSON representation |
|---|
{ "name": string, "displayName": string, "evaluationResults": [ string ], "createTime": string, "initiatedBy": string, "appVersion": string, "appVersionDisplayName": string, "changelog": string, "changelogCreateTime": string, "evaluations": [ string ], "evaluationDataset": string, "evaluationType": enum ( |
| Fields | |
|---|---|
name |
Identifier. The unique identifier of the evaluation run. Format: |
displayName |
Optional. User-defined display name of the evaluation run. default: " |
evaluationResults[] |
Output only. The evaluation results that are part of this run. Format: |
createTime |
Output only. Timestamp when the evaluation run was created. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
initiatedBy |
Output only. The user who initiated the evaluation run. |
appVersion |
Output only. The app version to evaluate. Format: |
appVersionDisplayName |
Output only. The display name of the |
changelog |
Output only. The changelog of the app version that the evaluation ran against. This is populated if the user runs the evaluation on latest/draft. |
changelogCreateTime |
Output only. The create time of the changelog of the app version that the evaluation ran against. This is populated if the user runs the evaluation on latest/draft. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
evaluations[] |
Output only. The evaluations that are part of this run. The list may contain evaluations of either type. This field is mutually exclusive with |
evaluationDataset |
Output only. The evaluation dataset that this run is associated with. This field is mutually exclusive with |
evaluationType |
Output only. The type of the evaluations in this run. |
state |
Output only. The state of the evaluation run. |
progress |
Output only. The progress of the evaluation run. |
config |
Output only. The configuration used in the run. |
error |
Output only. Deprecated: Use error_info instead. Errors encountered during execution. |
errorInfo |
Output only. Error information for the evaluation run. |
evaluationRunSummaries |
Output only. Map of evaluation name to EvaluationRunSummary. An object containing a list of |
latencyReport |
Output only. Latency report for the evaluation run. |
runCount |
Output only. The number of times the evaluations inside the run were run. |
personaRunConfigs[] |
Output only. The configuration to use for the run per persona. |
optimizationConfig |
Optional. Configuration for running the optimization step after the evaluation run. If not set, the optimization step will not be run. |
scheduledEvaluationRun |
Output only. The scheduled evaluation run resource name that created this evaluation run. This field is only set if the evaluation run was created by a scheduled evaluation run. Format: |
goldenRunMethod |
Output only. The method used to run the evaluation. |
Timestamp
| JSON representation |
|---|
{ "seconds": string, "nanos": integer } |
| Fields | |
|---|---|
seconds |
Represents seconds of UTC time since Unix epoch 1970-01-01T00:00:00Z. Must be between -62135596800 and 253402300799 inclusive (which corresponds to 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z). |
nanos |
Non-negative fractions of a second at nanosecond resolution. This field is the nanosecond portion of the duration, not an alternative to seconds. Negative second values with fractions must still have non-negative nanos values that count forward in time. Must be between 0 and 999,999,999 inclusive. |
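To make the seconds/nanos rules concrete, here is a small sketch that converts the object form of a Timestamp into a Python `datetime`. The function name `timestamp_to_datetime` is illustrative. Note that `seconds` arrives as a string, and that Python datetimes only carry microsecond resolution, so the last three nanosecond digits are dropped.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_to_datetime(ts):
    """Convert {"seconds": "<int64 as string>", "nanos": int} to an aware datetime."""
    seconds = int(ts.get("seconds", 0))
    nanos = int(ts.get("nanos", 0))
    if not -62135596800 <= seconds <= 253402300799:
        raise ValueError("seconds out of range")
    if not 0 <= nanos <= 999_999_999:
        raise ValueError("nanos out of range")
    # nanos always counts forward in time, even when seconds is negative.
    return EPOCH + timedelta(seconds=seconds, microseconds=nanos // 1000)


# One second before the epoch plus a quarter second forward.
print(timestamp_to_datetime({"seconds": "-1", "nanos": 250000000}).isoformat())
```

The negative example shows the "count forward" rule: `-1` second plus `250000000` nanos lands 0.75 seconds before the epoch, not 1.25 seconds before it.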
Progress
| JSON representation |
|---|
{ "totalCount": integer, "failedCount": integer, "errorCount": integer, "completedCount": integer, "passedCount": integer } |
| Fields | |
|---|---|
totalCount |
Output only. Total number of evaluation results in this run. |
failedCount |
Output only. Number of completed evaluation results with an outcome of FAIL. (EvaluationResult.execution_state is COMPLETED and EvaluationResult.evaluation_status is FAIL). |
errorCount |
Output only. Number of evaluation results that failed to execute. (EvaluationResult.execution_state is ERROR). |
completedCount |
Output only. Number of evaluation results that finished successfully. (EvaluationResult.execution_state is COMPLETED). |
passedCount |
Output only. Number of completed evaluation results with an outcome of PASS. (EvaluationResult.execution_state is COMPLETED and EvaluationResult.evaluation_status is PASS). |
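A client might derive a pass rate and a remaining count from these counters. The sketch below is one way to do that; the function name `summarize_progress` is hypothetical, and the `pending` formula assumes that every result not yet counted as completed or errored is still outstanding, which the reference does not state explicitly.

```python
def summarize_progress(progress):
    """Derive pending count and pass rate from a Progress object."""
    # Zero-valued counters may be omitted from the JSON, so default to 0.
    total = progress.get("totalCount", 0)
    completed = progress.get("completedCount", 0)
    errored = progress.get("errorCount", 0)
    passed = progress.get("passedCount", 0)

    # Assumption: results that are neither COMPLETED nor ERROR are still pending.
    pending = total - completed - errored
    # Pass rate is taken over completed results only; undefined until one completes.
    pass_rate = passed / completed if completed else None
    return {"pending": pending, "passRate": pass_rate}


example = {"totalCount": 10, "failedCount": 2, "errorCount": 1,
           "completedCount": 6, "passedCount": 4}
print(summarize_progress(example))
```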
EvaluationConfig
| JSON representation |
|---|
{ "inputAudioConfig": { object ( |
| Fields | |
|---|---|
inputAudioConfig |
Optional. Configuration for processing the input audio. |
outputAudioConfig |
Optional. Configuration for generating the output audio. |
evaluationChannel |
Optional. The channel to evaluate. |
toolCallBehaviour |
Optional. Specifies whether the evaluation should use real tool calls or fake tools. |
InputAudioConfig
| JSON representation |
|---|
{
"audioEncoding": enum ( |
| Fields | |
|---|---|
audioEncoding |
Required. The encoding of the input audio data. |
sampleRateHertz |
Required. The sample rate (in Hertz) of the input audio data. |
noiseSuppressionLevel |
Optional. Whether to enable noise suppression on the input audio. Available values are "low", "moderate", "high", "very_high". |
OutputAudioConfig
| JSON representation |
|---|
{
"audioEncoding": enum ( |
| Fields | |
|---|---|
audioEncoding |
Required. The encoding of the output audio data. |
sampleRateHertz |
Required. The sample rate (in Hertz) of the output audio data. |
Status
| JSON representation |
|---|
{ "code": integer, "message": string, "details": [ { "@type": string, field1: ..., ... } ] } |
| Fields | |
|---|---|
code |
The status code, which should be an enum value of |
message |
A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the |
details[] |
A list of messages that carry the error details. There is a common set of message types for APIs to use. An object containing fields of an arbitrary type. An additional field |
Any
| JSON representation |
|---|
{ "typeUrl": string, "value": string } |
| Fields | |
|---|---|
typeUrl |
Identifies the type of the serialized Protobuf message with a URI reference consisting of a prefix ending in a slash and the fully-qualified type name. Example: type.googleapis.com/google.protobuf.StringValue. This string must contain at least one "/" character. The prefix is arbitrary, and Protobuf implementations are expected to simply strip off everything up to and including the last "/". All type URL strings must be legal URI references with the additional restriction (for the text format) that the content of the reference must consist only of alphanumeric characters, percent-encoded escapes, and characters in the following set (not including the outer backticks): In the original design of |
value |
Holds a Protobuf serialization of the type described by type_url. A base64-encoded string. |
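The two fields can be unpacked with the standard library alone. This sketch extracts the fully-qualified type name by dropping the prefix up to the last slash, and base64-decodes `value` into raw bytes; the helper names are illustrative, and interpreting the decoded bytes still requires the matching Protobuf message definition, which is outside this example.

```python
import base64


def type_name_from_url(type_url):
    """Return the fully-qualified type name that follows the last '/'."""
    if "/" not in type_url:
        raise ValueError("type URL must contain at least one '/'")
    return type_url.rsplit("/", 1)[1]


def decode_any_value(any_message):
    """Return the raw serialized bytes held in the base64-encoded value field."""
    return base64.b64decode(any_message.get("value", ""))


sample = {
    "typeUrl": "type.googleapis.com/google.protobuf.StringValue",
    # Base64 of the wire bytes for a StringValue holding "hi".
    "value": base64.b64encode(b"\n\x02hi").decode("ascii"),
}
print(type_name_from_url(sample["typeUrl"]))
print(decode_any_value(sample))
```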
EvaluationErrorInfo
| JSON representation |
|---|
{
"errorType": enum ( |
| Fields | |
|---|---|
errorType |
Output only. The type of error. |
errorMessage |
Output only. The error message. |
sessionId |
Output only. The session ID for the conversation that caused the error. |
EvaluationRunSummariesEntry
| JSON representation |
|---|
{
"key": string,
"value": {
object ( |
| Fields | |
|---|---|
key |
|
value |
|
EvaluationRunSummary
| JSON representation |
|---|
{ "passedCount": integer, "failedCount": integer, "errorCount": integer } |
| Fields | |
|---|---|
passedCount |
Output only. Number of passed results for the associated Evaluation in this run. |
failedCount |
Output only. Number of failed results for the associated Evaluation in this run. |
errorCount |
Output only. Number of error results for the associated Evaluation in this run. |
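Because `evaluationRunSummaries` maps each evaluation name to one of these summaries, a run can be scanned for evaluations that need attention. The following sketch assumes the map arrives as a JSON object keyed by evaluation name; the function name `failing_evaluations` and the sample names are hypothetical.

```python
def failing_evaluations(evaluation_run):
    """Return (name, failedCount, errorCount) for evaluations with any failure or error."""
    summaries = evaluation_run.get("evaluationRunSummaries", {})
    flagged = []
    for name, summary in summaries.items():
        failed = summary.get("failedCount", 0)
        errors = summary.get("errorCount", 0)
        if failed or errors:
            flagged.append((name, failed, errors))
    # Worst offenders first; ties broken by name for stable output.
    return sorted(flagged, key=lambda item: (-(item[1] + item[2]), item[0]))


run = {
    "evaluationRunSummaries": {
        "eval-a": {"passedCount": 5},
        "eval-b": {"passedCount": 3, "failedCount": 2},
        "eval-c": {"failedCount": 1, "errorCount": 4},
    }
}
print(failing_evaluations(run))
```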
LatencyReport
| JSON representation |
|---|
{ "toolLatencies": [ { object ( |
| Fields | |
|---|---|
toolLatencies[] |
Output only. Unordered list. Latency metrics for each tool. |
callbackLatencies[] |
Output only. Unordered list. Latency metrics for each callback. |
guardrailLatencies[] |
Output only. Unordered list. Latency metrics for each guardrail. |
llmCallLatencies[] |
Output only. Unordered list. Latency metrics for each LLM call. |
sessionCount |
Output only. The total number of sessions considered in the latency report. |
ToolLatency
| JSON representation |
|---|
{ "toolDisplayName": string, "latencyMetrics": { object ( |
| Fields | |
|---|---|
toolDisplayName |
Output only. The display name of the tool. |
latencyMetrics |
Output only. The latency metrics for the tool. |
Union field tool_identifier. The identifier of the tool. tool_identifier can be only one of the following: |
|
tool |
Output only. Format: |
toolsetTool |
Output only. The toolset tool identifier. |
ToolsetTool
| JSON representation |
|---|
{ "toolset": string, "toolId": string } |
| Fields | |
|---|---|
toolset |
Required. The resource name of the Toolset from which this tool is derived. Format: |
toolId |
Optional. The tool ID to filter the tools to retrieve the schema for. |
LatencyMetrics
| JSON representation |
|---|
{ "p50Latency": string, "p90Latency": string, "p99Latency": string, "callCount": integer } |
| Fields | |
|---|---|
p50Latency |
Output only. The 50th percentile latency. A duration in seconds with up to nine fractional digits, ending with ' |
p90Latency |
Output only. The 90th percentile latency. A duration in seconds with up to nine fractional digits, ending with ' |
p99Latency |
Output only. The 99th percentile latency. A duration in seconds with up to nine fractional digits, ending with ' |
callCount |
Output only. The number of times the resource was called. |
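The percentile fields are duration strings rather than numbers, so they need parsing before they can be compared. The sketch below assumes the standard Protobuf JSON form for durations, a decimal number of seconds followed by the suffix "s" (for example "3.5s"); the descriptions above are truncated at that point, so this is an assumption. The helper names `parse_duration` and `slowest_tool` are illustrative.

```python
from decimal import Decimal


def parse_duration(text):
    """Parse a duration string such as "3.5s" into Decimal seconds."""
    if not text.endswith("s"):
        raise ValueError(f"expected a trailing 's': {text!r}")
    # Decimal keeps all nine fractional digits without float rounding.
    return Decimal(text[:-1])


def slowest_tool(tool_latencies):
    """Return the ToolLatency entry with the highest p99 latency, or None."""
    return max(
        tool_latencies,
        key=lambda entry: parse_duration(entry["latencyMetrics"]["p99Latency"]),
        default=None,
    )


latencies = [
    {"toolDisplayName": "search", "latencyMetrics": {"p99Latency": "1.250s"}},
    {"toolDisplayName": "lookup", "latencyMetrics": {"p99Latency": "0.900s"}},
]
print(slowest_tool(latencies)["toolDisplayName"])
```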
Duration
| JSON representation |
|---|
{ "seconds": string, "nanos": integer } |
| Fields | |
|---|---|
seconds |
Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years |
nanos |
Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 |
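The bound quoted for `seconds` can be checked by carrying out the multiplication, and the object form can be converted to a Python `timedelta`. This is a sketch under stated assumptions: the function name `duration_to_timedelta` is hypothetical, only the range of `seconds` is validated, and sub-microsecond precision is lost because `timedelta` stores microseconds.

```python
from datetime import timedelta

# 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years
MAX_DURATION_SECONDS = int(60 * 60 * 24 * 365.25 * 10000)


def duration_to_timedelta(duration):
    """Convert {"seconds": "<int64 as string>", "nanos": int} to a timedelta."""
    seconds = int(duration.get("seconds", 0))
    nanos = int(duration.get("nanos", 0))
    if abs(seconds) > MAX_DURATION_SECONDS:
        raise ValueError("seconds out of range")
    # timedelta rounds the fractional microseconds.
    return timedelta(seconds=seconds, microseconds=nanos / 1000)


print(MAX_DURATION_SECONDS)
print(duration_to_timedelta({"seconds": "3", "nanos": 500000000}))
```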
CallbackLatency
| JSON representation |
|---|
{
"stage": string,
"latencyMetrics": {
object ( |
| Fields | |
|---|---|
stage |
Output only. The stage of the callback. |
latencyMetrics |
Output only. The latency metrics for the callback. |
GuardrailLatency
| JSON representation |
|---|
{
"guardrail": string,
"guardrailDisplayName": string,
"latencyMetrics": {
object ( |
| Fields | |
|---|---|
guardrail |
Output only. The name of the guardrail. Format: |
guardrailDisplayName |
Output only. The display name of the guardrail. |
latencyMetrics |
Output only. The latency metrics for the guardrail. |
LlmCallLatency
| JSON representation |
|---|
{
"model": string,
"latencyMetrics": {
object ( |
| Fields | |
|---|---|
model |
Output only. The name of the model. |
latencyMetrics |
Output only. The latency metrics for the LLM call. |
PersonaRunConfig
| JSON representation |
|---|
{ "persona": string, "taskCount": integer } |
| Fields | |
|---|---|
persona |
Optional. The persona to use for the evaluation. Format: |
taskCount |
Optional. The number of tasks to run for the persona. |
OptimizationConfig
| JSON representation |
|---|
{
"generateLossReport": boolean,
"assistantSession": string,
"reportSummary": string,
"shouldSuggestFix": boolean,
"status": enum ( |
| Fields | |
|---|---|
generateLossReport |
Optional. Whether to generate a loss report. |
assistantSession |
Output only. The assistant session to use for the optimization based on this evaluation run. Format: |
reportSummary |
Output only. The summary of the loss report. |
shouldSuggestFix |
Output only. Whether to suggest a fix for the losses. |
status |
Output only. The status of the optimization run. |
errorMessage |
Output only. The error message if the optimization run failed. |
lossReport |
Output only. The generated loss report. |
Struct
| JSON representation |
|---|
{ "fields": { string: value, ... } } |
| Fields | |
|---|---|
fields |
Unordered map of dynamically typed values. An object containing a list of |
FieldsEntry
| JSON representation |
|---|
{ "key": string, "value": value } |
| Fields | |
|---|---|
key |
|
value |
|
Value
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field kind. The kind of value. kind can be only one of the following: |
|
nullValue |
Represents a null value. |
numberValue |
Represents a double value. |
stringValue |
Represents a string value. |
boolValue |
Represents a boolean value. |
structValue |
Represents a structured value. |
listValue |
Represents a repeated |
ListValue
| JSON representation |
|---|
{ "values": [ value ] } |
| Fields | |
|---|---|
values[] |
Repeated field of dynamically typed values. |
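The Struct, Value, and ListValue types nest recursively, which a short converter can illustrate. The sketch below assumes the payload uses the explicit union-field form listed in the tables above (for example `{"stringValue": "x"}`); some Protobuf JSON encoders emit these types as plain JSON instead, in which case no conversion is needed. The function names are illustrative.

```python
def value_to_python(value):
    """Convert one Value object into the matching native Python value."""
    if "nullValue" in value:
        return None
    if "numberValue" in value:
        return float(value["numberValue"])
    if "stringValue" in value:
        return value["stringValue"]
    if "boolValue" in value:
        return value["boolValue"]
    if "structValue" in value:
        return struct_to_python(value["structValue"])
    if "listValue" in value:
        return [value_to_python(item)
                for item in value["listValue"].get("values", [])]
    raise ValueError("Value has none of the known kind fields set")


def struct_to_python(struct):
    """Convert a Struct ({"fields": {...}}) into a plain dict."""
    return {key: value_to_python(val)
            for key, val in struct.get("fields", {}).items()}


report = {"fields": {
    "title": {"stringValue": "summary"},
    "score": {"numberValue": 0.75},
    "tags": {"listValue": {"values": [{"stringValue": "a"}, {"boolValue": True}]}},
    "extra": {"structValue": {"fields": {"note": {"nullValue": None}}}},
}}
print(struct_to_python(report))
```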
Tool Annotations
Destructive Hint: ❌ | Idempotent Hint: ✅ | Read Only Hint: ✅ | Open World Hint: ❌