Tool: evaluate_instances
Evaluates instances based on a given metric. Use this to perform online evaluation of model responses using metrics like fluency, coherence, safety, and more.
The following sample demonstrates how to use curl to invoke the evaluate_instances MCP tool.
| Curl Request |
|---|
curl --location 'https://aiplatform.googleapis.com/mcp/generate' \ --header 'content-type: application/json' \ --header 'accept: application/json, text/event-stream' \ --data '{ "method": "tools/call", "params": { "name": "evaluate_instances", "arguments": { // provide these details according to the tool's MCP specification } }, "jsonrpc": "2.0", "id": 1 }' |
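The same JSON-RPC envelope can be assembled in Python before it is sent. This is a minimal sketch that only builds and serializes the body; `build_tool_call` is a hypothetical helper, the `location` value is a placeholder, and authentication and the HTTP call itself are left out because the sample above does not cover them.

```python
import json

# Endpoint copied from the curl sample above.
MCP_URL = "https://aiplatform.googleapis.com/mcp/generate"


def build_tool_call(arguments: dict, request_id: int = 1) -> dict:
    """Wrap tool arguments in the JSON-RPC envelope used by the curl sample."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "evaluate_instances", "arguments": arguments},
    }


# Placeholder arguments; fill these in according to the input schema.
payload = build_tool_call({"location": "LOCATION_RESOURCE_NAME"})
body = json.dumps(payload)
print(body)
```

The serialized `body` corresponds to the `--data` argument of the curl command.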
Input Schema
Request message for EvaluationService.EvaluateInstances.
EvaluateInstancesRequest
| JSON representation |
|---|
{ "location": string, "autoraterConfig": { object ( |
| Fields | |
|---|---|
location |
Required. The resource name of the Location to evaluate the instances. Format: |
autoraterConfig |
Optional. Autorater config used for evaluation. |
Union field metric_inputs. Instances and specs for evaluation. metric_inputs can be only one of the following: |
|
exactMatchInput |
Auto metric instances. Instances and metric spec for exact match metric. |
bleuInput |
Instances and metric spec for bleu metric. |
rougeInput |
Instances and metric spec for rouge metric. |
fluencyInput |
LLM-based metric instance. General text generation metrics, applicable to other categories. Input for fluency metric. |
coherenceInput |
Input for coherence metric. |
safetyInput |
Input for safety metric. |
groundednessInput |
Input for groundedness metric. |
fulfillmentInput |
Input for fulfillment metric. |
summarizationQualityInput |
Input for summarization quality metric. |
pairwiseSummarizationQualityInput |
Input for pairwise summarization quality metric. |
summarizationHelpfulnessInput |
Input for summarization helpfulness metric. |
summarizationVerbosityInput |
Input for summarization verbosity metric. |
questionAnsweringQualityInput |
Input for question answering quality metric. |
pairwiseQuestionAnsweringQualityInput |
Input for pairwise question answering quality metric. |
questionAnsweringRelevanceInput |
Input for question answering relevance metric. |
questionAnsweringHelpfulnessInput |
Input for question answering helpfulness metric. |
questionAnsweringCorrectnessInput |
Input for question answering correctness metric. |
pointwiseMetricInput |
Input for pointwise metric. |
pairwiseMetricInput |
Input for pairwise metric. |
toolCallValidInput |
Tool call metric instances. Input for tool call valid metric. |
toolNameMatchInput |
Input for tool name match metric. |
toolParameterKeyMatchInput |
Input for tool parameter key match metric. |
toolParameterKvMatchInput |
Input for tool parameter key value match metric. |
cometInput |
Translation metrics. Input for Comet metric. |
metricxInput |
Input for Metricx metric. |
trajectoryExactMatchInput |
Input for trajectory exact match metric. |
trajectoryInOrderMatchInput |
Input for trajectory in order match metric. |
trajectoryAnyOrderMatchInput |
Input for trajectory match any order metric. |
trajectoryPrecisionInput |
Input for trajectory precision metric. |
trajectoryRecallInput |
Input for trajectory recall metric. |
trajectorySingleToolUseInput |
Input for trajectory single tool use metric. |
rubricBasedInstructionFollowingInput |
Rubric Based Instruction Following metric. |
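Because metric_inputs is a union, a request carries exactly one of the `*Input` fields listed above. The sketch below enforces that on the client side; `build_request` is a hypothetical helper, not part of the tool.

```python
from typing import Optional


def build_request(location: str, metric_input: dict,
                  autorater_config: Optional[dict] = None) -> dict:
    """Build EvaluateInstancesRequest arguments with exactly one metric input."""
    if len(metric_input) != 1:
        raise ValueError("metric_inputs is a union: set exactly one *Input field")
    request = {"location": location, **metric_input}
    if autorater_config is not None:
        # autoraterConfig is optional, so it is only added when provided.
        request["autoraterConfig"] = autorater_config
    return request
```

Passing two metric inputs at once, for example both `fluencyInput` and `safetyInput`, raises a `ValueError` before any request is made.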
ExactMatchInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for exact match metric. |
instances[] |
Required. Repeated exact match instances. |
ExactMatchInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Required. Ground truth used to compare against the prediction. |
BleuInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for bleu score metric. |
instances[] |
Required. Repeated bleu instances. |
BleuSpec
| JSON representation |
|---|
{ "useEffectiveOrder": boolean } |
| Fields | |
|---|---|
useEffectiveOrder |
Optional. Whether to use effective order (use_effective_order) when computing the bleu score. |
BleuInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Required. Ground truth used to compare against the prediction. |
RougeInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for rouge score metric. |
instances[] |
Required. Repeated rouge instances. |
RougeSpec
| JSON representation |
|---|
{ "rougeType": string, "useStemmer": boolean, "splitSummaries": boolean } |
| Fields | |
|---|---|
rougeType |
Optional. Supported rouge types are rougen[1-9], rougeL, and rougeLsum. |
useStemmer |
Optional. Whether to use stemmer to compute rouge score. |
splitSummaries |
Optional. Whether to split summaries while using rougeLsum. |
RougeInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Required. Ground truth used to compare against the prediction. |
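A rougeInput value combines one RougeSpec with a list of prediction/reference pairs. The following sketch assembles it from the fields documented above; `rouge_input` is a hypothetical helper, and the pattern assumes that "rougen[1-9]" stands for rouge1 through rouge9.

```python
import re

# Assumption: "rougen[1-9]" in the field description means rouge1 ... rouge9.
ROUGE_TYPE = re.compile(r"rouge[1-9]|rougeL|rougeLsum")


def rouge_input(pairs, rouge_type="rougeL", use_stemmer=False,
                split_summaries=False) -> dict:
    """Build a RougeInput from (prediction, reference) string pairs."""
    if not ROUGE_TYPE.fullmatch(rouge_type):
        raise ValueError(f"unsupported rouge type: {rouge_type}")
    return {
        "metricSpec": {
            "rougeType": rouge_type,
            "useStemmer": use_stemmer,
            "splitSummaries": split_summaries,
        },
        # instances[] is repeated: one entry per pair.
        "instances": [{"prediction": p, "reference": r} for p, r in pairs],
    }
```

The result would be passed as the value of the `rougeInput` field of the request.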
FluencyInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for fluency score metric. |
instance |
Required. Fluency instance. |
FluencySpec
| JSON representation |
|---|
{ "version": integer } |
| Fields | |
|---|---|
version |
Optional. Which version to use for evaluation. |
FluencyInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
CoherenceInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for coherence score metric. |
instance |
Required. Coherence instance. |
CoherenceSpec
| JSON representation |
|---|
{ "version": integer } |
| Fields | |
|---|---|
version |
Optional. Which version to use for evaluation. |
CoherenceInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
SafetyInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for safety metric. |
instance |
Required. Safety instance. |
SafetySpec
| JSON representation |
|---|
{ "version": integer } |
| Fields | |
|---|---|
version |
Optional. Which version to use for evaluation. |
SafetyInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
GroundednessInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for groundedness metric. |
instance |
Required. Groundedness instance. |
GroundednessSpec
| JSON representation |
|---|
{ "version": integer } |
| Fields | |
|---|---|
version |
Optional. Which version to use for evaluation. |
GroundednessInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
context |
Required. Background information provided in context used to compare against the prediction. |
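Unlike the exact match, bleu, and rouge inputs, which take a repeated `instances[]` field, the LLM-based inputs such as groundedness take a single `instance`. A minimal sketch of that shape, using a hypothetical `groundedness_input` helper:

```python
from typing import Optional


def groundedness_input(prediction: str, context: str,
                       version: Optional[int] = None) -> dict:
    """Build a GroundednessInput with one instance and an optional spec version."""
    # version is optional, so the spec stays empty unless one is given.
    spec = {} if version is None else {"version": version}
    return {
        "metricSpec": spec,
        "instance": {"prediction": prediction, "context": context},
    }
```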
FulfillmentInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for fulfillment score metric. |
instance |
Required. Fulfillment instance. |
FulfillmentSpec
| JSON representation |
|---|
{ "version": integer } |
| Fields | |
|---|---|
version |
Optional. Which version to use for evaluation. |
FulfillmentInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
instruction |
Required. Inference instruction prompt to compare prediction with. |
SummarizationQualityInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for summarization quality score metric. |
instance |
Required. Summarization quality instance. |
SummarizationQualitySpec
| JSON representation |
|---|
{ "useReference": boolean, "version": integer } |
| Fields | |
|---|---|
useReference |
Optional. Whether to use instance.reference to compute summarization quality. |
version |
Optional. Which version to use for evaluation. |
SummarizationQualityInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Optional. Ground truth used to compare against the prediction. |
Union field
|
|
context |
Required. Text to be summarized. |
Union field
|
|
instruction |
Required. Summarization prompt for LLM. |
PairwiseSummarizationQualityInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for pairwise summarization quality score metric. |
instance |
Required. Pairwise summarization quality instance. |
PairwiseSummarizationQualitySpec
| JSON representation |
|---|
{ "useReference": boolean, "version": integer } |
| Fields | |
|---|---|
useReference |
Optional. Whether to use instance.reference to compute pairwise summarization quality. |
version |
Optional. Which version to use for evaluation. |
PairwiseSummarizationQualityInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the candidate model. |
Union field
|
|
baselinePrediction |
Required. Output of the baseline model. |
Union field
|
|
reference |
Optional. Ground truth used to compare against the prediction. |
Union field
|
|
context |
Required. Text to be summarized. |
Union field
|
|
instruction |
Required. Summarization prompt for LLM. |
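A pairwise instance carries two model outputs, `prediction` for the candidate and `baselinePrediction` for the baseline, alongside the shared context and instruction. The sketch below builds one; the helper name is hypothetical, and setting `useReference` whenever a reference is supplied is a design choice of this example rather than documented behavior.

```python
from typing import Optional


def pairwise_summarization_input(prediction: str, baseline_prediction: str,
                                 context: str, instruction: str,
                                 reference: Optional[str] = None) -> dict:
    """Build a PairwiseSummarizationQualityInput for one candidate/baseline pair."""
    instance = {
        "prediction": prediction,                   # candidate model output
        "baselinePrediction": baseline_prediction,  # baseline model output
        "context": context,                         # text to be summarized
        "instruction": instruction,                 # summarization prompt
    }
    if reference is not None:
        instance["reference"] = reference
    # Example choice: ask the metric to use the reference only when one exists.
    return {"metricSpec": {"useReference": reference is not None},
            "instance": instance}
```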
SummarizationHelpfulnessInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for summarization helpfulness score metric. |
instance |
Required. Summarization helpfulness instance. |
SummarizationHelpfulnessSpec
| JSON representation |
|---|
{ "useReference": boolean, "version": integer } |
| Fields | |
|---|---|
useReference |
Optional. Whether to use instance.reference to compute summarization helpfulness. |
version |
Optional. Which version to use for evaluation. |
SummarizationHelpfulnessInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Optional. Ground truth used to compare against the prediction. |
Union field
|
|
context |
Required. Text to be summarized. |
Union field
|
|
instruction |
Optional. Summarization prompt for LLM. |
SummarizationVerbosityInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for summarization verbosity score metric. |
instance |
Required. Summarization verbosity instance. |
SummarizationVerbositySpec
| JSON representation |
|---|
{ "useReference": boolean, "version": integer } |
| Fields | |
|---|---|
useReference |
Optional. Whether to use instance.reference to compute summarization verbosity. |
version |
Optional. Which version to use for evaluation. |
SummarizationVerbosityInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Optional. Ground truth used to compare against the prediction. |
Union field
|
|
context |
Required. Text to be summarized. |
Union field
|
|
instruction |
Optional. Summarization prompt for LLM. |
QuestionAnsweringQualityInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for question answering quality score metric. |
instance |
Required. Question answering quality instance. |
QuestionAnsweringQualitySpec
| JSON representation |
|---|
{ "useReference": boolean, "version": integer } |
| Fields | |
|---|---|
useReference |
Optional. Whether to use instance.reference to compute question answering quality. |
version |
Optional. Which version to use for evaluation. |
QuestionAnsweringQualityInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Optional. Ground truth used to compare against the prediction. |
Union field
|
|
context |
Required. Text to answer the question. |
Union field
|
|
instruction |
Required. Question Answering prompt for LLM. |
PairwiseQuestionAnsweringQualityInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for pairwise question answering quality score metric. |
instance |
Required. Pairwise question answering quality instance. |
PairwiseQuestionAnsweringQualitySpec
| JSON representation |
|---|
{ "useReference": boolean, "version": integer } |
| Fields | |
|---|---|
useReference |
Optional. Whether to use instance.reference to compute question answering quality. |
version |
Optional. Which version to use for evaluation. |
PairwiseQuestionAnsweringQualityInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the candidate model. |
Union field
|
|
baselinePrediction |
Required. Output of the baseline model. |
Union field
|
|
reference |
Optional. Ground truth used to compare against the prediction. |
Union field
|
|
context |
Required. Text to answer the question. |
Union field
|
|
instruction |
Required. Question Answering prompt for LLM. |
QuestionAnsweringRelevanceInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for question answering relevance score metric. |
instance |
Required. Question answering relevance instance. |
QuestionAnsweringRelevanceSpec
| JSON representation |
|---|
{ "useReference": boolean, "version": integer } |
| Fields | |
|---|---|
useReference |
Optional. Whether to use instance.reference to compute question answering relevance. |
version |
Optional. Which version to use for evaluation. |
QuestionAnsweringRelevanceInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Optional. Ground truth used to compare against the prediction. |
Union field
|
|
context |
Optional. Text provided as context to answer the question. |
Union field
|
|
instruction |
Required. The question asked and other instruction in the inference prompt. |
QuestionAnsweringHelpfulnessInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for question answering helpfulness score metric. |
instance |
Required. Question answering helpfulness instance. |
QuestionAnsweringHelpfulnessSpec
| JSON representation |
|---|
{ "useReference": boolean, "version": integer } |
| Fields | |
|---|---|
useReference |
Optional. Whether to use instance.reference to compute question answering helpfulness. |
version |
Optional. Which version to use for evaluation. |
QuestionAnsweringHelpfulnessInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Optional. Ground truth used to compare against the prediction. |
Union field
|
|
context |
Optional. Text provided as context to answer the question. |
Union field
|
|
instruction |
Required. The question asked and other instruction in the inference prompt. |
QuestionAnsweringCorrectnessInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for question answering correctness score metric. |
instance |
Required. Question answering correctness instance. |
QuestionAnsweringCorrectnessSpec
| JSON representation |
|---|
{ "useReference": boolean, "version": integer } |
| Fields | |
|---|---|
useReference |
Optional. Whether to use instance.reference to compute question answering correctness. |
version |
Optional. Which version to use for evaluation. |
QuestionAnsweringCorrectnessInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Optional. Ground truth used to compare against the prediction. |
Union field
|
|
context |
Optional. Text provided as context to answer the question. |
Union field
|
|
instruction |
Required. The question asked and other instruction in the inference prompt. |
PointwiseMetricInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for pointwise metric. |
instance |
Required. Pointwise metric instance. |
PointwiseMetricSpec
| JSON representation |
|---|
{ "customOutputFormatConfig": { object ( |
| Fields | |
|---|---|
customOutputFormatConfig |
Optional. CustomOutputFormatConfig allows customization of metric output. By default, metrics return a score and explanation. When this config is set, the default output is replaced with either: - The raw output string. - A parsed output based on a user-defined schema. If a custom format is chosen, the |
Union field
|
|
metricPromptTemplate |
Required. Metric prompt template for pointwise metric. |
Union field
|
|
systemInstruction |
Optional. System instructions for pointwise metric. |
CustomOutputFormatConfig
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field custom_output_format_config. Custom output format configuration. custom_output_format_config can be only one of the following: |
|
returnRawOutput |
Optional. Whether to return raw output. |
PointwiseMetricInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field instance. Instance for pointwise metric. instance can be only one of the following: |
|
jsonInstance |
Instance specified as a json string. String key-value pairs are expected in the json_instance to render PointwiseMetricSpec.instance_prompt_template. |
contentMapInstance |
Key-value contents for multimodal input, including text, image, video, audio, PDF, etc. The key is a placeholder in the metric prompt template, and the value is the multimodal content. |
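Note that jsonInstance is a string that contains JSON, not a nested object, so the key-value pairs have to be serialized first. A minimal sketch follows; `pointwise_input` is a hypothetical helper, and the `{response}` placeholder syntax in the usage line is an assumption, since the template format is not described here.

```python
import json
from typing import Optional


def pointwise_input(template: str, values: dict,
                    system_instruction: Optional[str] = None) -> dict:
    """Build a PointwiseMetricInput whose instance is a JSON-encoded string."""
    if not all(isinstance(k, str) and isinstance(v, str) for k, v in values.items()):
        raise TypeError("jsonInstance expects string key-value pairs")
    spec = {"metricPromptTemplate": template}
    if system_instruction is not None:
        spec["systemInstruction"] = system_instruction
    # json.dumps turns the mapping into the string form that jsonInstance holds.
    return {"metricSpec": spec, "instance": {"jsonInstance": json.dumps(values)}}


example = pointwise_input("Rate this answer: {response}", {"response": "42"})
```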
ContentMap
| JSON representation |
|---|
{
"values": {
string: {
object ( |
| Fields | |
|---|---|
values |
Optional. Map of placeholder to contents. An object containing a list of |
ValuesEntry
| JSON representation |
|---|
{
"key": string,
"value": {
object ( |
| Fields | |
|---|---|
key |
|
value |
|
Contents
| JSON representation |
|---|
{
"contents": [
{
object ( |
| Fields | |
|---|---|
contents[] |
Optional. Repeated contents. |
Content
| JSON representation |
|---|
{
"role": string,
"parts": [
{
object ( |
| Fields | |
|---|---|
role |
Optional. The producer of the content. Must be either 'user' or 'model'. If not set, the service will default to 'user'. |
parts[] |
Required. A list of A |
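Read together, ContentMap, Contents, and Content nest as placeholder → `contents[]` → `parts[]`. The sketch below builds a text-only content map under that reading; both helper names are hypothetical, and the assumption that each map value is a Contents object comes from the "Map of placeholder to contents" description.

```python
from typing import Optional


def make_content(text: str, role: Optional[str] = None) -> dict:
    """Build a Content with a single text part."""
    if role not in (None, "user", "model"):
        raise ValueError("role must be 'user' or 'model'")
    content = {"parts": [{"text": text}]}
    if role is not None:
        # When role is omitted, the service defaults it to 'user'.
        content["role"] = role
    return content


def make_content_map(texts: dict) -> dict:
    """Map each placeholder name to a Contents object holding one Content."""
    return {"values": {key: {"contents": [make_content(text)]}
                       for key, text in texts.items()}}
```

`make_content_map({"response": "Hello"})` could then serve as the value of `contentMapInstance`.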
Part
| JSON representation |
|---|
{ "thought": boolean, "thoughtSignature": string, "mediaResolution": { object ( |
| Fields | |
|---|---|
thought |
Optional. Indicates whether the |
thoughtSignature |
Optional. An opaque signature for the thought so it can be reused in subsequent requests. A base64-encoded string. |
mediaResolution |
Per-part media resolution. Media resolution for the input media. |
Union field
|
|
text |
Optional. The text content of the part. When sent from the VSCode Gemini Code Assist extension, references to @mentioned items will be converted to markdown boldface text. For example |
inlineData |
Optional. The inline data content of the part. This can be used to include images, audio, or video in a request. |
fileData |
Optional. The URI-based data of the part. This can be used to include files from Google Cloud Storage. |
functionCall |
Optional. A predicted function call returned from the model. This contains the name of the function to call and the arguments to pass to the function. |
functionResponse |
Optional. The result of a function call. This is used to provide the model with the result of a function call that it predicted. |
executableCode |
Optional. Code generated by the model that is intended to be executed. |
codeExecutionResult |
Optional. The result of executing the |
Union field
|
|
videoMetadata |
Optional. Video metadata. The metadata should only be specified while the video data is presented in inline_data or file_data. |
Blob
| JSON representation |
|---|
{ "mimeType": string, "data": string, "displayName": string } |
| Fields | |
|---|---|
mimeType |
Required. The IANA standard MIME type of the source data. |
data |
Required. The raw bytes of the data. A base64-encoded string. |
displayName |
Optional. The display name of the blob. Used to provide a label or filename to distinguish blobs. This field is only returned in |
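Since `data` is a base64-encoded string, raw bytes need to be encoded before they are placed in a Blob. A small sketch with a hypothetical `make_blob` helper:

```python
import base64


def make_blob(raw: bytes, mime_type: str) -> dict:
    """Build a Blob for inlineData from raw bytes and an IANA MIME type."""
    return {
        "mimeType": mime_type,
        # b64encode returns bytes; decode to get a JSON-friendly string.
        "data": base64.b64encode(raw).decode("ascii"),
    }


blob = make_blob(b"hi", "text/plain")
```

Decoding `blob["data"]` with `base64.b64decode` returns the original bytes.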
FileData
| JSON representation |
|---|
{ "mimeType": string, "fileUri": string, "displayName": string } |
| Fields | |
|---|---|
mimeType |
Required. The IANA standard MIME type of the source data. |
fileUri |
Required. The URI of the file in Google Cloud Storage. |
displayName |
Optional. The display name of the file. Used to provide a label or filename to distinguish files. This field is only returned in |
FunctionCall
| JSON representation |
|---|
{
"id": string,
"name": string,
"args": {
object
},
"partialArgs": [
{
object ( |
| Fields | |
|---|---|
id |
Optional. The unique id of the function call. If populated, the client to execute the |
name |
Optional. The name of the function to call. Matches |
args |
Optional. The function parameters and values in JSON object format. See |
partialArgs[] |
Optional. The partial argument value of the function call. If provided, represents the arguments/fields that are streamed incrementally. |
willContinue |
Optional. Whether this is not the last part of the FunctionCall. If true, another partial message for the current FunctionCall is expected to follow. |
Struct
| JSON representation |
|---|
{ "fields": { string: value, ... } } |
| Fields | |
|---|---|
fields |
Unordered map of dynamically typed values. An object containing a list of |
FieldsEntry
| JSON representation |
|---|
{ "key": string, "value": value } |
| Fields | |
|---|---|
key |
|
value |
|
Value
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field kind. The kind of value. kind can be only one of the following: |
|
nullValue |
Represents a JSON |
numberValue |
Represents a JSON number. Must not be |
stringValue |
Represents a JSON string. |
boolValue |
Represents a JSON boolean ( |
structValue |
Represents a JSON object. |
listValue |
Represents a JSON array. |
ListValue
| JSON representation |
|---|
{ "values": [ value ] } |
| Fields | |
|---|---|
values[] |
Repeated field of dynamically typed values. |
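The Value, Struct, and ListValue shapes above can be produced from ordinary Python data with a recursive conversion. This is a sketch, not the service's own encoder: `to_value` is a hypothetical helper, encoding nullValue as JSON null is an assumption, and rejecting NaN and infinities follows the protobuf definition of Value because the numberValue description above is cut off.

```python
import math


def to_value(obj) -> dict:
    """Convert a Python object into the Value union representation."""
    if obj is None:
        return {"nullValue": None}  # assumption: serialized as JSON null
    if isinstance(obj, bool):       # check bool first: bool is a subclass of int
        return {"boolValue": obj}
    if isinstance(obj, (int, float)):
        if isinstance(obj, float) and not math.isfinite(obj):
            raise ValueError("numberValue must be a finite number")
        return {"numberValue": obj}
    if isinstance(obj, str):
        return {"stringValue": obj}
    if isinstance(obj, dict):
        fields = {str(k): to_value(v) for k, v in obj.items()}
        return {"structValue": {"fields": fields}}
    if isinstance(obj, (list, tuple)):
        return {"listValue": {"values": [to_value(v) for v in obj]}}
    raise TypeError(f"unsupported type: {type(obj).__name__}")
```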
PartialArg
| JSON representation |
|---|
{ "jsonPath": string, "willContinue": boolean, // Union field |
| Fields | |
|---|---|
jsonPath |
Required. A JSON Path (RFC 9535, https://datatracker.ietf.org/doc/html/rfc9535) to the argument being streamed, e.g. "$.foo.bar[0].data". |
willContinue |
Optional. Whether this is not the last part of the same json_path. If true, another PartialArg message for the current json_path is expected to follow. |
Union field delta. The delta of field value being streamed. delta can be only one of the following: |
|
nullValue |
Optional. Represents a null value. |
numberValue |
Optional. Represents a double value. |
stringValue |
Optional. Represents a string value. |
boolValue |
Optional. Represents a boolean value. |
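One way a client might consume streamed PartialArg messages is to buffer deltas per jsonPath until `willContinue` is false. The sketch below handles string deltas only and assumes that successive `stringValue` chunks for the same path are meant to be concatenated in arrival order, which this reference does not state explicitly; `collect_string_args` is a hypothetical helper.

```python
def collect_string_args(partial_args):
    """Group streamed string deltas by jsonPath.

    Returns (finished, pending): finished maps each completed path to its
    full string, pending holds buffers still waiting for more chunks.
    """
    pending = {}
    finished = {}
    for arg in partial_args:
        path = arg["jsonPath"]
        pending[path] = pending.get(path, "") + arg.get("stringValue", "")
        # A missing or false willContinue marks the last chunk for this path.
        if not arg.get("willContinue", False):
            finished[path] = pending.pop(path)
    return finished, pending
```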
FunctionResponse
| JSON representation |
|---|
{
"id": string,
"name": string,
"response": {
object
},
"parts": [
{
object ( |
| Fields | |
|---|---|
id |
Optional. The id of the function call this response is for. Populated by the client to match the corresponding function call |
name |
Required. The name of the function to call. Matches |
response |
Required. The function response in JSON object format. Use "output" key to specify function output and "error" key to specify error details (if any). If "output" and "error" keys are not specified, then whole "response" is treated as function output. |
parts[] |
Optional. Ordered |
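The "output" and "error" convention described for `response` can be expressed as a small function. This sketch mirrors the stated rule; `split_function_response` is a hypothetical name.

```python
def split_function_response(response: dict):
    """Return (output, error) from a FunctionResponse.response object."""
    if "output" in response or "error" in response:
        # Either key may be absent; the missing one is reported as None.
        return response.get("output"), response.get("error")
    # Neither key present: the whole response is treated as the output.
    return response, None
```

For example, `{"temperature": 21}` is returned unchanged as the output, while `{"error": "timeout"}` yields no output and the error text.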
FunctionResponsePart
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field data. The data of the function response part. data can be only one of the following: |
|
inlineData |
Inline media bytes. |
fileData |
URI based data. |
FunctionResponseBlob
| JSON representation |
|---|
{ "mimeType": string, "data": string, "displayName": string } |
| Fields | |
|---|---|
mimeType |
Required. The IANA standard MIME type of the source data. |
data |
Required. Raw bytes. A base64-encoded string. |
displayName |
Optional. Display name of the blob. Used to provide a label or filename to distinguish blobs. This field is only returned in PromptMessage for prompt management. It is currently used in the Gemini GenerateContent calls only when server side tools (code_execution, google_search, and url_context) are enabled. |
FunctionResponseFileData
| JSON representation |
|---|
{ "mimeType": string, "fileUri": string, "displayName": string } |
| Fields | |
|---|---|
mimeType |
Required. The IANA standard MIME type of the source data. |
fileUri |
Required. URI. |
displayName |
Optional. Display name of the file data. Used to provide a label or filename to distinguish files. This field is only returned in PromptMessage for prompt management. It is currently used in the Gemini GenerateContent calls only when server side tools (code_execution, google_search, and url_context) are enabled. |
ExecutableCode
| JSON representation |
|---|
{
"language": enum ( |
| Fields | |
|---|---|
language |
Required. Programming language of the |
code |
Required. The code to be executed. |
CodeExecutionResult
| JSON representation |
|---|
{
"outcome": enum ( |
| Fields | |
|---|---|
outcome |
Required. Outcome of the code execution. |
output |
Optional. Contains stdout when code execution is successful, stderr or other description otherwise. |
VideoMetadata
| JSON representation |
|---|
{ "startOffset": string, "endOffset": string, "fps": number } |
| Fields | |
|---|---|
startOffset |
Optional. The start offset of the video. A duration in seconds with up to nine fractional digits, ending with ' |
endOffset |
Optional. The end offset of the video. A duration in seconds with up to nine fractional digits, ending with ' |
fps |
Optional. The frame rate of the video sent to the model. If not specified, the default value is 1.0. The valid range is (0.0, 24.0]. |
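The fps range (0.0, 24.0] excludes zero and includes 24. The sketch below validates that range and renders offsets as duration strings; both helper names are hypothetical, and the trailing "s" follows the protobuf JSON mapping for durations, since the offset descriptions above are cut off.

```python
from typing import Optional


def format_offset(seconds: float) -> str:
    """Render seconds as a duration string with at most nine fractional digits."""
    text = f"{seconds:.9f}".rstrip("0").rstrip(".")
    return text + "s"


def video_metadata(start: float, end: float, fps: Optional[float] = None) -> dict:
    """Build a VideoMetadata object, checking fps against (0.0, 24.0]."""
    if fps is not None and not (0.0 < fps <= 24.0):
        raise ValueError("fps must be greater than 0.0 and at most 24.0")
    meta = {"startOffset": format_offset(start), "endOffset": format_offset(end)}
    if fps is not None:
        # When fps is omitted, the documented default of 1.0 applies.
        meta["fps"] = fps
    return meta
```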
Duration
| JSON representation |
|---|
{ "seconds": string, "nanos": integer } |
| Fields | |
|---|---|
seconds |
Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years |
nanos |
Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 |
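The bound on `seconds` can be checked by carrying out the multiplication from the note above, and a signed nanosecond count can be split into the two fields. This is a sketch: `to_duration` is a hypothetical helper, and giving `nanos` the same sign as `seconds` follows the protobuf Duration definition rather than anything stated in full here.

```python
NANOS_PER_SECOND = 1_000_000_000
# 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years
MAX_SECONDS = int(60 * 60 * 24 * 365.25 * 10000)


def to_duration(total_nanos: int) -> dict:
    """Split a signed nanosecond count into Duration seconds and nanos."""
    sign = -1 if total_nanos < 0 else 1
    seconds, nanos = divmod(abs(total_nanos), NANOS_PER_SECOND)
    if seconds > MAX_SECONDS:
        raise ValueError("duration outside the allowed range")
    # seconds is a string in the JSON representation; nanos is an integer.
    return {"seconds": str(sign * seconds), "nanos": sign * nanos}
```

The computed `MAX_SECONDS` equals 315,576,000,000, matching the stated bound.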
MediaResolution
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
level |
The tokenization quality used for given media. |
PairwiseMetricInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for pairwise metric. |
instance |
Required. Pairwise metric instance. |
PairwiseMetricSpec
| JSON representation |
|---|
{ "candidateResponseFieldName": string, "baselineResponseFieldName": string, "customOutputFormatConfig": { object ( |
| Fields | |
|---|---|
candidateResponseFieldName |
Optional. The field name of the candidate response. |
baselineResponseFieldName |
Optional. The field name of the baseline response. |
customOutputFormatConfig |
Optional. CustomOutputFormatConfig allows customization of metric output. When this config is set, the default output is replaced with the raw output string. If a custom format is chosen, the |
Union field
|
|
metricPromptTemplate |
Required. Metric prompt template for pairwise metric. |
Union field
|
|
systemInstruction |
Optional. System instructions for pairwise metric. |
PairwiseMetricInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field instance. Instance for pairwise metric. instance can be only one of the following: |
|
jsonInstance |
Instance specified as a json string. String key-value pairs are expected in the json_instance to render PairwiseMetricSpec.instance_prompt_template. |
contentMapInstance |
Key-value contents for multimodal input, including text, image, video, audio, PDF, etc. The key is a placeholder in the metric prompt template, and the value is the multimodal content. |
ToolCallValidInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for tool call valid metric. |
instances[] |
Required. Repeated tool call valid instances. |
ToolCallValidInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Required. Ground truth used to compare against the prediction. |
ToolNameMatchInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for tool name match metric. |
instances[] |
Required. Repeated tool name match instances. |
ToolNameMatchInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Required. Ground truth used to compare against the prediction. |
ToolParameterKeyMatchInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for tool parameter key match metric. |
instances[] |
Required. Repeated tool parameter key match instances. |
ToolParameterKeyMatchInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Required. Ground truth used to compare against the prediction. |
ToolParameterKVMatchInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for tool parameter key value match metric. |
instances[] |
Required. Repeated tool parameter key value match instances. |
ToolParameterKVMatchSpec
| JSON representation |
|---|
{ "useStrictStringMatch": boolean } |
| Fields | |
|---|---|
useStrictStringMatch |
Optional. Whether to use STRICT string match on parameter values. |
ToolParameterKVMatchInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Required. Ground truth used to compare against the prediction. |
CometInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for comet metric. |
instance |
Required. Comet instance. |
CometSpec
| JSON representation |
|---|
{ "sourceLanguage": string, "targetLanguage": string, // Union field |
| Fields | |
|---|---|
sourceLanguage |
Optional. Source language in BCP-47 format. |
targetLanguage |
Optional. Target language in BCP-47 format. Covers both prediction and reference. |
Union field
|
|
version |
Required. Which version to use for evaluation. |
CometInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Optional. Ground truth used to compare against the prediction. |
Union field
|
|
source |
Optional. Source text in original language. |
MetricxInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for Metricx metric. |
instance |
Required. Metricx instance. |
MetricxSpec
| JSON representation |
|---|
{ "sourceLanguage": string, "targetLanguage": string, // Union field |
| Fields | |
|---|---|
sourceLanguage |
Optional. Source language in BCP-47 format. |
targetLanguage |
Optional. Target language in BCP-47 format. Covers both prediction and reference. |
Union field
|
|
version |
Required. Which version to use for evaluation. |
MetricxInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
prediction |
Required. Output of the evaluated model. |
Union field
|
|
reference |
Optional. Ground truth used to compare against the prediction. |
Union field
|
|
source |
Optional. Source text in original language. |
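CometInput and MetricxInput share the same shape: a spec with optional language codes and a required version, plus a single instance. A minimal sketch of a shared builder follows. The outer keys `cometInput` and `metricxInput` are assumed by analogy with the `cometResult` and `metricxResult` response fields, and the version string in the example is an assumed enum value that should be checked against the service's version enum.

```python
def build_translation_metric_input(
    metric_key,
    prediction,
    version,
    source=None,
    reference=None,
    source_language=None,
    target_language=None,
):
    """Build a cometInput or metricxInput argument, omitting unset optionals."""
    if metric_key not in ("cometInput", "metricxInput"):
        raise ValueError(f"unsupported metric key: {metric_key}")

    spec = {"version": version}  # version is the only required spec field
    if source_language is not None:
        spec["sourceLanguage"] = source_language  # BCP-47 code
    if target_language is not None:
        spec["targetLanguage"] = target_language  # BCP-47 code

    instance = {"prediction": prediction}  # prediction is required
    if reference is not None:
        instance["reference"] = reference
    if source is not None:
        instance["source"] = source

    return {metric_key: {"metricSpec": spec, "instance": instance}}


# Assumption: this version string is illustrative and not taken from this page.
ASSUMED_VERSION = "COMET_22_SRC_REF"

comet_payload = build_translation_metric_input(
    "cometInput",
    prediction="The cat sits on the mat.",
    version=ASSUMED_VERSION,
    source="Le chat est assis sur le tapis.",
    source_language="fr",
    target_language="en",
)
print(comet_payload)
```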
TrajectoryExactMatchInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for TrajectoryExactMatch metric. |
instances[] |
Required. Repeated TrajectoryExactMatch instance. |
TrajectoryExactMatchInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
predictedTrajectory |
Required. Spec for predicted tool call trajectory. |
Union field
|
|
referenceTrajectory |
Required. Spec for reference tool call trajectory. |
Trajectory
| JSON representation |
|---|
{
"toolCalls": [
{
object ( |
| Fields | |
|---|---|
toolCalls[] |
Required. Tool calls in the trajectory. |
ToolCall
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
toolName |
Required. Spec for tool name. |
Union field
|
|
toolInput |
Optional. Spec for tool input. |
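The Trajectory and ToolCall messages above can be assembled with two small helpers. This is a sketch only: it assumes `toolInput` is carried as a JSON-encoded string, which this page does not confirm, and the tool names and arguments are hypothetical.

```python
import json


def tool_call(tool_name, tool_input=None):
    """Build one ToolCall; toolName is required, toolInput is optional."""
    call = {"toolName": tool_name}
    if tool_input is not None:
        # Assumption: toolInput is sent as a JSON-encoded string.
        call["toolInput"] = json.dumps(tool_input, sort_keys=True)
    return call


def trajectory(*calls):
    """Build a Trajectory from an ordered sequence of ToolCall dicts."""
    return {"toolCalls": list(calls)}


# Hypothetical tools and arguments, for illustration only.
predicted = trajectory(
    tool_call("search_flights", {"origin": "SFO", "destination": "JFK"}),
    tool_call("book_flight", {"flight_id": "hypothetical-123"}),
)
print(json.dumps(predicted, indent=2))
```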
TrajectoryInOrderMatchInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for TrajectoryInOrderMatch metric. |
instances[] |
Required. Repeated TrajectoryInOrderMatch instance. |
TrajectoryInOrderMatchInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
predictedTrajectory |
Required. Spec for predicted tool call trajectory. |
Union field
|
|
referenceTrajectory |
Required. Spec for reference tool call trajectory. |
TrajectoryAnyOrderMatchInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for TrajectoryAnyOrderMatch metric. |
instances[] |
Required. Repeated TrajectoryAnyOrderMatch instance. |
TrajectoryAnyOrderMatchInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
predictedTrajectory |
Required. Spec for predicted tool call trajectory. |
Union field
|
|
referenceTrajectory |
Required. Spec for reference tool call trajectory. |
TrajectoryPrecisionInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for TrajectoryPrecision metric. |
instances[] |
Required. Repeated TrajectoryPrecision instance. |
TrajectoryPrecisionInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
predictedTrajectory |
Required. Spec for predicted tool call trajectory. |
Union field
|
|
referenceTrajectory |
Required. Spec for reference tool call trajectory. |
TrajectoryRecallInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for TrajectoryRecall metric. |
instances[] |
Required. Repeated TrajectoryRecall instance. |
TrajectoryRecallInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
predictedTrajectory |
Required. Spec for predicted tool call trajectory. |
Union field
|
|
referenceTrajectory |
Required. Spec for reference tool call trajectory. |
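This page does not define how the trajectory metrics are scored. The sketch below is a local illustration of one plausible reading of their names (exact sequence equality, reference as an ordered subsequence, reference contained in any order, and precision/recall over tool calls). It is an assumption for building intuition, not the service's implementation.

```python
def as_pairs(trajectory):
    """Reduce a Trajectory dict to a list of (toolName, toolInput) pairs."""
    return [(c["toolName"], c.get("toolInput")) for c in trajectory["toolCalls"]]


def exact_match(predicted, reference):
    """1.0 if both trajectories contain the same calls in the same order."""
    return 1.0 if as_pairs(predicted) == as_pairs(reference) else 0.0


def in_order_match(predicted, reference):
    """1.0 if the reference calls appear in the prediction in order (extras allowed)."""
    remaining = iter(as_pairs(predicted))
    # `call in remaining` advances the iterator, so order is enforced.
    return 1.0 if all(call in remaining for call in as_pairs(reference)) else 0.0


def any_order_match(predicted, reference):
    """1.0 if every reference call appears in the prediction, in any order."""
    remaining = as_pairs(predicted)
    for call in as_pairs(reference):
        if call not in remaining:
            return 0.0
        remaining.remove(call)
    return 1.0


def precision(predicted, reference):
    """Fraction of predicted calls that also occur in the reference."""
    pred, ref = as_pairs(predicted), as_pairs(reference)
    return sum(1 for c in pred if c in ref) / len(pred) if pred else 0.0


def recall(predicted, reference):
    """Fraction of reference calls that also occur in the prediction."""
    pred, ref = as_pairs(predicted), as_pairs(reference)
    return sum(1 for c in ref if c in pred) / len(ref) if ref else 0.0


# Hypothetical trajectories: the prediction adds one extra call.
ref_traj = {"toolCalls": [{"toolName": "search"}, {"toolName": "fetch"}]}
pred_traj = {"toolCalls": [{"toolName": "search"}, {"toolName": "log"}, {"toolName": "fetch"}]}
for fn in (exact_match, in_order_match, any_order_match, precision, recall):
    print(fn.__name__, fn(pred_traj, ref_traj))
```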
TrajectorySingleToolUseInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for TrajectorySingleToolUse metric. |
instances[] |
Required. Repeated TrajectorySingleToolUse instance. |
TrajectorySingleToolUseSpec
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
toolName |
Required. Spec for tool name to be checked for in the predicted trajectory. |
TrajectorySingleToolUseInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
predictedTrajectory |
Required. Spec for predicted tool call trajectory. |
RubricBasedInstructionFollowingInput
| JSON representation |
|---|
{ "metricSpec": { object ( |
| Fields | |
|---|---|
metricSpec |
Required. Spec for RubricBasedInstructionFollowing metric. |
instance |
Required. Instance for RubricBasedInstructionFollowing metric. |
RubricBasedInstructionFollowingInstance
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field instance. Instance for RubricBasedInstructionFollowing metric. instance can be only one of the following: |
|
jsonInstance |
Required. Instance specified as a JSON string. String key-value pairs are expected in the json_instance to render RubricBasedInstructionFollowing prompt templates. |
AutoraterConfig
| JSON representation |
|---|
{ "autoraterModel": string, "generationConfig": { object ( |
| Fields | |
|---|---|
autoraterModel |
Optional. The fully qualified name of the publisher model or tuned autorater endpoint to use. Publisher model format: Tuned model endpoint format: |
generationConfig |
Optional. Configuration options for model generation and outputs. |
Union field
|
|
samplingCount |
Optional. Number of samples for each instance in the dataset. If not specified, the default is 4. Minimum value is 1, maximum value is 32. |
Union field
|
|
flipEnabled |
Optional. Default is true. Whether to flip the candidate and baseline responses. This is only applicable to the pairwise metric. If enabled, also provide PairwiseMetricSpec.candidate_response_field_name and PairwiseMetricSpec.baseline_response_field_name. When rendering PairwiseMetricSpec.metric_prompt_template, the candidate and baseline fields will be flipped for half of the samples to reduce bias. |
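The documented bounds on `samplingCount` (minimum 1, maximum 32, default 4 when omitted) can be checked on the client before a request is sent. The helper below is a minimal sketch; the function name is hypothetical, and unset fields are left out so the service applies its own defaults.

```python
def build_autorater_config(autorater_model=None, sampling_count=None, flip_enabled=None):
    """Build an AutoraterConfig dict, validating samplingCount locally."""
    if sampling_count is not None and not 1 <= sampling_count <= 32:
        raise ValueError("samplingCount must be between 1 and 32")

    config = {}
    if autorater_model is not None:
        config["autoraterModel"] = autorater_model
    if sampling_count is not None:
        config["samplingCount"] = sampling_count
    if flip_enabled is not None:
        # Only meaningful for the pairwise metric.
        config["flipEnabled"] = flip_enabled
    return config


print(build_autorater_config(sampling_count=8, flip_enabled=False))
```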
GenerationConfig
| JSON representation |
|---|
{ "stopSequences": [ string ], "responseMimeType": string, "responseModalities": [ enum ( |
| Fields | |
|---|---|
stopSequences[] |
Optional. A list of character sequences that will stop the model from generating further tokens. If a stop sequence is generated, the output will end at that point. This is useful for controlling the length and structure of the output. For example, you can use ["\n", "###"] to stop generation at a new line or a specific marker. |
responseMimeType |
Optional. The IANA standard MIME type of the response. The model will generate output that conforms to this MIME type. Supported values include 'text/plain' (default) and 'application/json'. The model needs to be prompted to output the appropriate response type, otherwise the behavior is undefined. |
responseModalities[] |
Optional. The modalities of the response. The model will generate a response that includes all the specified modalities. For example, if this is set to |
thinkingConfig |
Optional. Configuration for thinking features. An error will be returned if this field is set for models that don't support thinking. |
modelConfig |
Optional. Config for model selection. |
Union field
|
|
temperature |
Optional. Controls the randomness of the output. A higher temperature results in more creative and diverse responses, while a lower temperature makes the output more predictable and focused. The valid range is (0.0, 2.0]. |
Union field
|
|
topP |
Optional. Specifies the nucleus sampling threshold. The model considers only the smallest set of tokens whose cumulative probability is at least |
Union field
|
|
topK |
Optional. Specifies the top-k sampling threshold. The model considers only the top k most probable tokens for the next token. This can be useful for generating more coherent and less random text. For example, a |
Union field
|
|
candidateCount |
Optional. The number of candidate responses to generate. A higher |
Union field
|
|
maxOutputTokens |
Optional. The maximum number of tokens to generate in the response. A token is approximately four characters. The default value varies by model. This parameter can be used to control the length of the generated text and prevent overly long responses. |
Union field
|
|
responseLogprobs |
Optional. If set to true, the log probabilities of the output tokens are returned. Log probabilities are the logarithm of the probability of a token appearing in the output. A higher log probability means the token is more likely to be generated. This can be useful for analyzing the model's confidence in its own output and for debugging. |
Union field
|
|
logprobs |
Optional. The number of top log probabilities to return for each token. This can be used to see which other tokens were considered likely candidates for a given position. A higher value will return more options, but it will also increase the size of the response. |
Union field
|
|
presencePenalty |
Optional. Penalizes tokens that have already appeared in the generated text. A positive value encourages the model to generate more diverse and less repetitive text. Valid values are in the range [-2.0, 2.0]. |
Union field
|
|
frequencyPenalty |
Optional. Penalizes tokens based on their frequency in the generated text. A positive value helps to reduce the repetition of words and phrases. Valid values are in the range [-2.0, 2.0]. |
Union field
|
|
seed |
Optional. A seed for the random number generator. By setting a seed, you can make the model's output mostly deterministic. For a given prompt and parameters (like temperature, top_p, etc.), the model will usually produce the same response every time. However, fully deterministic behavior is not guaranteed. This is different from parameters like |
Union field
|
|
responseSchema |
Optional. Lets you specify a schema for the model's response, ensuring that the output conforms to a particular structure. This is useful for generating structured data such as JSON. The schema is a subset of the OpenAPI 3.0 schema object. When this field is set, you must also set the |
Union field
|
|
responseJsonSchema |
Optional. When this field is set, |
Union field
|
|
routingConfig |
Optional. Routing configuration. |
Union field
|
|
audioTimestamp |
Optional. If enabled, audio timestamps will be included in the request to the model. This can be useful for synchronizing audio with other modalities in the response. |
Union field
|
|
mediaResolution |
Optional. The token resolution at which input media content is sampled. This is used to control the trade-off between the quality of the response and the number of tokens used to represent the media. A higher resolution allows the model to perceive more detail, which can lead to a more nuanced response, but it will also use more tokens. This does not affect the image dimensions sent to the model. |
Union field
|
|
speechConfig |
Optional. The speech generation config. |
Union field
|
|
enableAffectiveDialog |
Optional. If enabled, the model will detect emotions and adapt its responses accordingly. For example, if the model detects that the user is frustrated, it may provide a more empathetic response. |
Union field
|
|
imageConfig |
Optional. Config for image generation features. |
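Several GenerationConfig fields above state numeric ranges: `temperature` in (0.0, 2.0], and `presencePenalty` and `frequencyPenalty` in [-2.0, 2.0]. A small client-side check, sketched below with a hypothetical helper name, can catch out-of-range values early. It covers only the ranges stated on this page.

```python
def validate_generation_config(config):
    """Return a list of range violations for the documented numeric fields."""
    errors = []

    temperature = config.get("temperature")
    # Documented range is (0.0, 2.0]: zero excluded, 2.0 included.
    if temperature is not None and not 0.0 < temperature <= 2.0:
        errors.append("temperature must be in (0.0, 2.0]")

    for name in ("presencePenalty", "frequencyPenalty"):
        value = config.get(name)
        if value is not None and not -2.0 <= value <= 2.0:
            errors.append(f"{name} must be in [-2.0, 2.0]")

    return errors


print(validate_generation_config({"temperature": 0.7, "presencePenalty": 0.5}))
print(validate_generation_config({"temperature": 0.0, "frequencyPenalty": 3.0}))
```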
Schema
| JSON representation |
|---|
{ "type": enum ( |
| Fields | |
|---|---|
type |
Optional. Data type of the schema field. |
format |
Optional. The format of the data. For |
title |
Optional. Title for the schema. |
description |
Optional. Describes the data. The model uses this field to understand the purpose of the schema and how to use it. It is a best practice to provide a clear and descriptive explanation for the schema and its properties here, rather than in the prompt. |
nullable |
Optional. Indicates if the value of this field can be null. |
default |
Optional. Default value to use if the field is not specified. |
items |
Optional. If type is |
minItems |
Optional. If type is |
maxItems |
Optional. If type is |
enum[] |
Optional. Possible values of the field. This field can be used to restrict a value to a fixed set of values. To mark a field as an enum, set |
properties |
Optional. If type is An object containing a list of |
propertyOrdering[] |
Optional. Order of properties displayed or used where order matters. This is not a standard field in OpenAPI specification, but can be used to control the order of properties. |
required[] |
Optional. If type is |
minProperties |
Optional. If type is |
maxProperties |
Optional. If type is |
minimum |
Optional. If type is |
maximum |
Optional. If type is |
minLength |
Optional. If type is |
maxLength |
Optional. If type is |
pattern |
Optional. If type is |
example |
Optional. Example of an instance of this schema. |
anyOf[] |
Optional. The instance must be valid against any (one or more) of the subschemas listed in |
additionalProperties |
Optional. If |
ref |
Optional. Allows referencing another schema definition to use in place of this schema. The value must be a valid reference to a schema in For example, the following schema defines a reference to a schema node named "Pet": type: object properties: pet: ref: #/defs/Pet defs: Pet: type: object properties: name: type: string The value of the "pet" property is a reference to the schema node named "Pet". See details in https://json-schema.org/understanding-json-schema/structuring |
defs |
Optional. An object containing a list of |
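The inline "Pet" example in the `ref` description is easier to read when written out as a nested structure. The Python dict below mirrors that example as given (including its lowercase type names), and the `resolve_ref` helper is a hypothetical local lookup that shows what the reference points to. It is not part of the API.

```python
# Mirrors the "Pet" example from the ref field description.
schema = {
    "type": "object",
    "properties": {
        "pet": {"ref": "#/defs/Pet"},
    },
    "defs": {
        "Pet": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
        },
    },
}


def resolve_ref(root, ref):
    """Look up a '#/defs/<Name>' reference in the root schema's defs."""
    prefix = "#/defs/"
    if not ref.startswith(prefix):
        raise ValueError(f"unsupported reference: {ref}")
    return root["defs"][ref[len(prefix):]]


pet_schema = resolve_ref(schema, schema["properties"]["pet"]["ref"])
print(pet_schema)
```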
PropertiesEntry
| JSON representation |
|---|
{
"key": string,
"value": {
object ( |
| Fields | |
|---|---|
key |
|
value |
|
DefsEntry
| JSON representation |
|---|
{
"key": string,
"value": {
object ( |
| Fields | |
|---|---|
key |
|
value |
|
RoutingConfig
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field routing_config. The routing mode for the request. routing_config can be only one of the following: |
|
autoMode |
In this mode, the model is selected automatically based on the content of the request. |
manualMode |
In this mode, the model is specified manually. |
AutoRoutingMode
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
modelRoutingPreference |
The model routing preference. |
ManualRoutingMode
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
modelName |
The name of the model to use. Only public LLM models are accepted. |
SpeechConfig
| JSON representation |
|---|
{ "voiceConfig": { object ( |
| Fields | |
|---|---|
voiceConfig |
The configuration for the voice to use. |
languageCode |
Optional. The language code (ISO 639-1) for the speech synthesis. |
multiSpeakerVoiceConfig |
The configuration for a multi-speaker text-to-speech request. This field is mutually exclusive with |
VoiceConfig
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field voice_config. The configuration for the speaker to use. voice_config can be only one of the following: |
|
prebuiltVoiceConfig |
The configuration for a prebuilt voice. |
replicatedVoiceConfig |
Optional. The configuration for a replicated voice. This enables users to replicate a voice from an audio sample. |
PrebuiltVoiceConfig
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
voiceName |
The name of the prebuilt voice to use. |
ReplicatedVoiceConfig
| JSON representation |
|---|
{ "mimeType": string, "voiceSampleAudio": string } |
| Fields | |
|---|---|
mimeType |
Optional. The mimetype of the voice sample. The only currently supported value is |
voiceSampleAudio |
Optional. The sample of the custom voice. A base64-encoded string. |
MultiSpeakerVoiceConfig
| JSON representation |
|---|
{
"speakerVoiceConfigs": [
{
object ( |
| Fields | |
|---|---|
speakerVoiceConfigs[] |
Required. A list of configurations for the voices of the speakers. Exactly two speaker voice configurations must be provided. |
SpeakerVoiceConfig
| JSON representation |
|---|
{
"speaker": string,
"voiceConfig": {
object ( |
| Fields | |
|---|---|
speaker |
Required. The name of the speaker. This should be the same as the speaker name used in the prompt. |
voiceConfig |
Required. The configuration for the voice of this speaker. |
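MultiSpeakerVoiceConfig requires exactly two speaker voice configurations, which is easy to enforce when building the object. The following is a sketch under assumptions: the helper name, the speaker names, and the voice names are all hypothetical placeholders.

```python
def build_multi_speaker_voice_config(speaker_voices):
    """Build a MultiSpeakerVoiceConfig from two (speaker, voiceName) pairs."""
    if len(speaker_voices) != 2:
        raise ValueError("exactly two speaker voice configurations are required")
    return {
        "speakerVoiceConfigs": [
            {
                # Should match the speaker name used in the prompt.
                "speaker": speaker,
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice_name}},
            }
            for speaker, voice_name in speaker_voices
        ]
    }


# Placeholder speaker and voice names, for illustration only.
multi = build_multi_speaker_voice_config(
    [("Host", "hypothetical-voice-a"), ("Guest", "hypothetical-voice-b")]
)
print(multi)
```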
ThinkingConfig
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
includeThoughts |
Optional. If true, the model will include its thoughts in the response. "Thoughts" are the intermediate steps the model takes to arrive at the final response. They can provide insights into the model's reasoning process and help with debugging. If this is true, thoughts are returned only when available. |
Union field
|
|
thinkingBudget |
Optional. The token budget for the model's thinking process. The model will make a best effort to stay within this budget. This can be used to control the trade-off between response quality and latency. |
Union field
|
|
thinkingLevel |
Optional. The number of thought tokens that the model should generate. |
ModelConfig
| JSON representation |
|---|
{
"featureSelectionPreference": enum ( |
| Fields | |
|---|---|
featureSelectionPreference |
Required. Feature selection preference. |
ImageConfig
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
imageOutputOptions |
Optional. The image output format for generated images. |
Union field
|
|
aspectRatio |
Optional. The desired aspect ratio for the generated images. The following aspect ratios are supported: "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9". |
Union field
|
|
personGeneration |
Optional. Controls whether the model can generate people. |
Union field
|
|
imageSize |
Optional. Specifies the size of generated images. Supported values are |
ImageOutputOptions
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
mimeType |
Optional. The image format that the output should be saved as. |
Union field
|
|
compressionQuality |
Optional. The compression quality of the output image. |
Output Schema
Response message for EvaluationService.EvaluateInstances.
EvaluateInstancesResponse
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field evaluation_results. Evaluation results will be served in the same order as presented in EvaluationRequest.instances. evaluation_results can be only one of the following: |
|
exactMatchResults |
Auto metric evaluation results. Results for exact match metric. |
bleuResults |
Results for bleu metric. |
rougeResults |
Results for rouge metric. |
fluencyResult |
LLM-based metric evaluation result. General text generation metrics, applicable to other categories. Result for fluency metric. |
coherenceResult |
Result for coherence metric. |
safetyResult |
Result for safety metric. |
groundednessResult |
Result for groundedness metric. |
fulfillmentResult |
Result for fulfillment metric. |
summarizationQualityResult |
Summarization only metrics. Result for summarization quality metric. |
pairwiseSummarizationQualityResult |
Result for pairwise summarization quality metric. |
summarizationHelpfulnessResult |
Result for summarization helpfulness metric. |
summarizationVerbosityResult |
Result for summarization verbosity metric. |
questionAnsweringQualityResult |
Question answering only metrics. Result for question answering quality metric. |
pairwiseQuestionAnsweringQualityResult |
Result for pairwise question answering quality metric. |
questionAnsweringRelevanceResult |
Result for question answering relevance metric. |
questionAnsweringHelpfulnessResult |
Result for question answering helpfulness metric. |
questionAnsweringCorrectnessResult |
Result for question answering correctness metric. |
pointwiseMetricResult |
Generic metrics. Result for pointwise metric. |
pairwiseMetricResult |
Result for pairwise metric. |
toolCallValidResults |
Tool call metrics. Results for tool call valid metric. |
toolNameMatchResults |
Results for tool name match metric. |
toolParameterKeyMatchResults |
Results for tool parameter key match metric. |
toolParameterKvMatchResults |
Results for tool parameter key value match metric. |
cometResult |
Translation metrics. Result for Comet metric. |
metricxResult |
Result for Metricx metric. |
trajectoryExactMatchResults |
Result for trajectory exact match metric. |
trajectoryInOrderMatchResults |
Result for trajectory in order match metric. |
trajectoryAnyOrderMatchResults |
Result for trajectory any order match metric. |
trajectoryPrecisionResults |
Result for trajectory precision metric. |
trajectoryRecallResults |
Results for trajectory recall metric. |
trajectorySingleToolUseResults |
Results for trajectory single tool use metric. |
rubricBasedInstructionFollowingResult |
Result for rubric based instruction following metric. |
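Because `evaluation_results` is a union, a decoded response carries exactly one of the result fields listed above. A minimal sketch of picking it out follows. The helper name is hypothetical, the sample response is fabricated for illustration, and the code assumes the response has already been parsed into a Python dict.

```python
def extract_result(response):
    """Return (field_name, value) for the single result field in a response."""
    present = [
        key for key in response if key.endswith(("Result", "Results"))
    ]
    if len(present) != 1:
        raise ValueError(f"expected exactly one result field, found {present}")
    return present[0], response[present[0]]


# Illustrative response shape, not real service output.
sample_response = {
    "fluencyResult": {"score": 4.0, "explanation": "Reads naturally.", "confidence": 0.9}
}
name, value = extract_result(sample_response)
print(name, value["score"])
```

Fields ending in `Results` (for example `exactMatchResults`) hold a list of per-instance metric values, while fields ending in `Result` hold a single result object.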
ExactMatchResults
| JSON representation |
|---|
{
"exactMatchMetricValues": [
{
object ( |
| Fields | |
|---|---|
exactMatchMetricValues[] |
Output only. Exact match metric values. |
ExactMatchMetricValue
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
score |
Output only. Exact match score. |
BleuResults
| JSON representation |
|---|
{
"bleuMetricValues": [
{
object ( |
| Fields | |
|---|---|
bleuMetricValues[] |
Output only. Bleu metric values. |
BleuMetricValue
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
score |
Output only. Bleu score. |
RougeResults
| JSON representation |
|---|
{
"rougeMetricValues": [
{
object ( |
| Fields | |
|---|---|
rougeMetricValues[] |
Output only. Rouge metric values. |
RougeMetricValue
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
score |
Output only. Rouge score. |
FluencyResult
| JSON representation |
|---|
{ "explanation": string, // Union field |
| Fields | |
|---|---|
explanation |
Output only. Explanation for fluency score. |
Union field
|
|
score |
Output only. Fluency score. |
Union field
|
|
confidence |
Output only. Confidence for fluency score. |
CoherenceResult
| JSON representation |
|---|
{ "explanation": string, // Union field |
| Fields | |
|---|---|
explanation |
Output only. Explanation for coherence score. |
Union field
|
|
score |
Output only. Coherence score. |
Union field
|
|
confidence |
Output only. Confidence for coherence score. |
SafetyResult
| JSON representation |
|---|
{ "explanation": string, // Union field |
| Fields | |
|---|---|
explanation |
Output only. Explanation for safety score. |
Union field
|
|
score |
Output only. Safety score. |
Union field
|
|
confidence |
Output only. Confidence for safety score. |
GroundednessResult
| JSON representation |
|---|
{ "explanation": string, // Union field |
| Fields | |
|---|---|
explanation |
Output only. Explanation for groundedness score. |
Union field
|
|
score |
Output only. Groundedness score. |
Union field
|
|
confidence |
Output only. Confidence for groundedness score. |
FulfillmentResult
| JSON representation |
|---|
{ "explanation": string, // Union field |
| Fields | |
|---|---|
explanation |
Output only. Explanation for fulfillment score. |
Union field
|
|
score |
Output only. Fulfillment score. |
Union field
|
|
confidence |
Output only. Confidence for fulfillment score. |
SummarizationQualityResult
| JSON representation |
|---|
{ "explanation": string, // Union field |
| Fields | |
|---|---|
explanation |
Output only. Explanation for summarization quality score. |
Union field
|
|
score |
Output only. Summarization Quality score. |
Union field
|
|
confidence |
Output only. Confidence for summarization quality score. |
PairwiseSummarizationQualityResult
| JSON representation |
|---|
{ "pairwiseChoice": enum ( |
| Fields | |
|---|---|
pairwiseChoice |
Output only. Pairwise summarization prediction choice. |
explanation |
Output only. Explanation for summarization quality score. |
Union field
|
|
confidence |
Output only. Confidence for summarization quality score. |
SummarizationHelpfulnessResult
| JSON representation |
|---|
{ "explanation": string, // Union field |
| Fields | |
|---|---|
explanation |
Output only. Explanation for summarization helpfulness score. |
Union field
|
|
score |
Output only. Summarization Helpfulness score. |
Union field
|
|
confidence |
Output only. Confidence for summarization helpfulness score. |
SummarizationVerbosityResult
| JSON representation |
|---|
{ "explanation": string, // Union field |
| Fields | |
|---|---|
explanation |
Output only. Explanation for summarization verbosity score. |
Union field
|
|
score |
Output only. Summarization Verbosity score. |
Union field
|
|
confidence |
Output only. Confidence for summarization verbosity score. |
QuestionAnsweringQualityResult
| JSON representation |
|---|
{ "explanation": string, // Union field |
| Fields | |
|---|---|
explanation |
Output only. Explanation for question answering quality score. |
Union field
|
|
score |
Output only. Question Answering Quality score. |
Union field
|
|
confidence |
Output only. Confidence for question answering quality score. |
PairwiseQuestionAnsweringQualityResult
| JSON representation |
|---|
{ "pairwiseChoice": enum ( |
| Fields | |
|---|---|
pairwiseChoice |
Output only. Pairwise question answering prediction choice. |
explanation |
Output only. Explanation for question answering quality score. |
Union field
|
|
confidence |
Output only. Confidence for question answering quality score. |
QuestionAnsweringRelevanceResult
| JSON representation |
|---|
{ "explanation": string, // Union field |
| Fields | |
|---|---|
explanation |
Output only. Explanation for question answering relevance score. |
Union field
|
|
score |
Output only. Question Answering Relevance score. |
Union field
|
|
confidence |
Output only. Confidence for question answering relevance score. |
QuestionAnsweringHelpfulnessResult
| JSON representation |
|---|
{ "explanation": string, // Union field |
| Fields | |
|---|---|
explanation |
Output only. Explanation for question answering helpfulness score. |
Union field
|
|
score |
Output only. Question Answering Helpfulness score. |
Union field
|
|
confidence |
Output only. Confidence for question answering helpfulness score. |
QuestionAnsweringCorrectnessResult
| JSON representation |
|---|
{ "explanation": string, // Union field |
| Fields | |
|---|---|
explanation |
Output only. Explanation for question answering correctness score. |
Union field
|
|
score |
Output only. Question Answering Correctness score. |
Union field
|
|
confidence |
Output only. Confidence for question answering correctness score. |
PointwiseMetricResult
| JSON representation |
|---|
{ "explanation": string, "customOutput": { object ( |
| Fields | |
|---|---|
explanation |
Output only. Explanation for pointwise metric score. |
customOutput |
Output only. Spec for custom output. |
Union field
|
|
score |
Output only. Pointwise metric score. |
CustomOutput
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field custom_output. Custom output. custom_output can be only one of the following: |
|
rawOutputs |
Output only. List of raw output strings. |
RawOutput
| JSON representation |
|---|
{ "rawOutput": [ string ] } |
| Fields | |
|---|---|
rawOutput[] |
Output only. Raw output string. |
PairwiseMetricResult
| JSON representation |
|---|
{ "pairwiseChoice": enum ( |
| Fields | |
|---|---|
pairwiseChoice |
Output only. Pairwise metric choice. |
explanation |
Output only. Explanation for pairwise metric score. |
customOutput |
Output only. Spec for custom output. |
ToolCallValidResults
| JSON representation |
|---|
{
"toolCallValidMetricValues": [
{
object ( |
| Fields | |
|---|---|
toolCallValidMetricValues[] |
Output only. Tool call valid metric values. |
ToolCallValidMetricValue
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
score |
Output only. Tool call valid score. |
ToolNameMatchResults
| JSON representation |
|---|
{
"toolNameMatchMetricValues": [
{
object ( |
| Fields | |
|---|---|
toolNameMatchMetricValues[] |
Output only. Tool name match metric values. |
ToolNameMatchMetricValue
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
score |
Output only. Tool name match score. |
ToolParameterKeyMatchResults
| JSON representation |
|---|
{
"toolParameterKeyMatchMetricValues": [
{
object ( |
| Fields | |
|---|---|
toolParameterKeyMatchMetricValues[] |
Output only. Tool parameter key match metric values. |
ToolParameterKeyMatchMetricValue
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
score |
Output only. Tool parameter key match score. |
ToolParameterKVMatchResults
| JSON representation |
|---|
{
"toolParameterKvMatchMetricValues": [
{
object ( |
| Fields | |
|---|---|
toolParameterKvMatchMetricValues[] |
Output only. Tool parameter key value match metric values. |
ToolParameterKVMatchMetricValue
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
score |
Output only. Tool parameter key value match score. |
CometResult
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
score |
Output only. Comet score. Range depends on version. |
MetricxResult
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
score |
Output only. MetricX score. Range depends on version. |
TrajectoryExactMatchResults
| JSON representation |
|---|
{
"trajectoryExactMatchMetricValues": [
{
object ( |
| Fields | |
|---|---|
trajectoryExactMatchMetricValues[] |
Output only. TrajectoryExactMatch metric values. |
TrajectoryExactMatchMetricValue
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
score |
Output only. TrajectoryExactMatch score. |
TrajectoryInOrderMatchResults
| JSON representation |
|---|
{
"trajectoryInOrderMatchMetricValues": [
{
object ( |
| Fields | |
|---|---|
trajectoryInOrderMatchMetricValues[] |
Output only. TrajectoryInOrderMatch metric values. |
TrajectoryInOrderMatchMetricValue
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
score |
Output only. TrajectoryInOrderMatch score. |
TrajectoryAnyOrderMatchResults
| JSON representation |
|---|
{
"trajectoryAnyOrderMatchMetricValues": [
{
object ( |
| Fields | |
|---|---|
trajectoryAnyOrderMatchMetricValues[] |
Output only. TrajectoryAnyOrderMatch metric values. |
TrajectoryAnyOrderMatchMetricValue
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
score |
Output only. TrajectoryAnyOrderMatch score. |
TrajectoryPrecisionResults
| JSON representation |
|---|
{
"trajectoryPrecisionMetricValues": [
{
object ( |
| Fields | |
|---|---|
trajectoryPrecisionMetricValues[] |
Output only. TrajectoryPrecision metric values. |
TrajectoryPrecisionMetricValue
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
score |
Output only. TrajectoryPrecision score. |
TrajectoryRecallResults
| JSON representation |
|---|
{
"trajectoryRecallMetricValues": [
{
object ( |
| Fields | |
|---|---|
trajectoryRecallMetricValues[] |
Output only. TrajectoryRecall metric values. |
TrajectoryRecallMetricValue
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
score |
Output only. TrajectoryRecall score. |
TrajectorySingleToolUseResults
| JSON representation |
|---|
{
"trajectorySingleToolUseMetricValues": [
{
object ( |
| Fields | |
|---|---|
trajectorySingleToolUseMetricValues[] |
Output only. TrajectorySingleToolUse metric values. |
TrajectorySingleToolUseMetricValue
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field
|
|
score |
Output only. TrajectorySingleToolUse score. |
RubricBasedInstructionFollowingResult
| JSON representation |
|---|
{ "rubricCritiqueResults": [ { object ( |
| Fields | |
|---|---|
rubricCritiqueResults[] |
Output only. List of per rubric critique results. |
Union field
|
|
score |
Output only. Overall score for the instruction following. |
RubricCritiqueResult
| JSON representation |
|---|
{ "rubric": string, "verdict": boolean } |
| Fields | |
|---|---|
rubric |
Output only. Rubric to be evaluated. |
verdict |
Output only. Verdict for the rubric - true if the rubric is met, false otherwise. |
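The per-rubric verdicts can be summarized on the client, for example to list which rubrics were not met. The sketch below uses a hypothetical helper and fabricated sample data. The fraction it computes is a local convenience and is not claimed to equal the overall `score` returned by the service.

```python
def summarize_rubric_critiques(result):
    """Split rubricCritiqueResults into met and unmet rubrics."""
    critiques = result.get("rubricCritiqueResults", [])
    met = [c["rubric"] for c in critiques if c["verdict"]]
    unmet = [c["rubric"] for c in critiques if not c["verdict"]]
    fraction_met = len(met) / len(critiques) if critiques else None
    return {"met": met, "unmet": unmet, "fraction_met": fraction_met}


# Illustrative result, not real service output.
sample_result = {
    "rubricCritiqueResults": [
        {"rubric": "Response is in English.", "verdict": True},
        {"rubric": "Response is under 100 words.", "verdict": False},
    ]
}
summary = summarize_rubric_critiques(sample_result)
print(summary)
```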
Tool Annotations
Destructive Hint: ❌ | Idempotent Hint: ❌ | Read Only Hint: ❌ | Open World Hint: ❌