Full name: projects.locations.evaluateInstances
Evaluates instances based on a given metric.
Endpoint
POST https://{service-endpoint}/v1/{location}:evaluateInstances
Where {service-endpoint} is one of the supported service endpoints.
Path parameters
location (string)
Required. The resource name of the Location to evaluate the instances. Format: projects/{project}/locations/{location}
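As an illustration of how the endpoint and the location path parameter combine, the following minimal sketch assembles the request URL. The helper name build_evaluate_instances_url and the endpoint host used in the example are hypothetical; substitute one of the supported service endpoints.

```python
def build_evaluate_instances_url(service_endpoint: str, project: str, location: str) -> str:
    """Return the POST URL for projects.locations.evaluateInstances."""
    # The path parameter has the form projects/{project}/locations/{location}.
    location_name = f"projects/{project}/locations/{location}"
    return f"https://{service_endpoint}/v1/{location_name}:evaluateInstances"


# "example-endpoint.invalid" is a placeholder host, not a real service endpoint.
url = build_evaluate_instances_url("example-endpoint.invalid", "my-project", "us-central1")
print(url)
```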
Request body
The request body contains data with the following structure:
metric_inputs (union type)
metric_inputs can be only one of the following:
Auto metric instances.
Instances and metric spec for exact match metric.
Instances and metric spec for bleu metric.
Instances and metric spec for rouge metric.
LLM-based metric instance. General text generation metrics, applicable to other categories.
Input for fluency metric.
Input for coherence metric.
Input for safety metric.
Input for groundedness metric.
Input for fulfillment metric.
Input for summarization quality metric.
Input for pairwise summarization quality metric.
Input for summarization helpfulness metric.
Input for summarization verbosity metric.
Input for question answering quality metric.
Input for pairwise question answering quality metric.
Input for question answering relevance metric.
Input for question answering helpfulness metric.
Input for question answering correctness metric.
Input for pointwise metric.
Input for pairwise metric.
Tool call metric instances.
Input for tool call valid metric.
Input for tool name match metric.
Input for tool parameter key match metric.
Input for tool parameter key value match metric.
Translation metrics.
Input for Comet metric.
Input for MetricX metric.
Input for trajectory exact match metric.
Input for trajectory in order match metric.
Input for trajectory match any order metric.
Input for trajectory precision metric.
Input for trajectory recall metric.
Input for trajectory single tool use metric.
Response body
Response message for EvaluationService.EvaluateInstances.
If successful, the response body contains data with the following structure:
evaluation_results (union type)
evaluation_results can be only one of the following:
Auto metric evaluation results.
Results for exact match metric.
Results for bleu metric.
Results for rouge metric.
LLM-based metric evaluation result. General text generation metrics, applicable to other categories.
Result for fluency metric.
Result for coherence metric.
Result for safety metric.
Result for groundedness metric.
Result for fulfillment metric.
Summarization only metrics.
Result for summarization quality metric.
Result for pairwise summarization quality metric.
Result for summarization helpfulness metric.
Result for summarization verbosity metric.
Question answering only metrics.
Result for question answering quality metric.
Result for pairwise question answering quality metric.
Result for question answering relevance metric.
Result for question answering helpfulness metric.
Result for question answering correctness metric.
Generic metrics.
Result for pointwise metric.
Result for pairwise metric.
Tool call metrics.
Results for tool call valid metric.
Results for tool name match metric.
Results for tool parameter key match metric.
Results for tool parameter key value match metric.
Translation metrics.
Result for Comet metric.
Result for MetricX metric.
Result for trajectory exact match metric.
Result for trajectory in order match metric.
Result for trajectory any order match metric.
Result for trajectory precision metric.
Results for trajectory recall metric.
Results for trajectory single tool use metric.
| JSON representation |
|---|
{ // evaluation_results "exactMatchResults": { object ( |
ExactMatchInput
Input for exact match metric.
Required. Spec for exact match metric.
Required. Repeated exact match instances.
| JSON representation |
|---|
{ "metricSpec": { object ( |
ExactMatchSpec
This type has no fields.
Spec for exact match metric - returns 1 if prediction and reference exactly match, otherwise 0.
ExactMatchInstance
Spec for exact match instance.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Required. Ground truth used to compare against the prediction.
| JSON representation |
|---|
| { "prediction": string, "reference": string } |
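The exact match rule described for ExactMatchSpec can be sketched locally as follows. This is only an illustration of the stated behavior (1 on an exact match, otherwise 0); plain string equality is an assumption, and the service may apply normalization that is not documented here. The function name exact_match_score is hypothetical.

```python
def exact_match_score(prediction: str, reference: str) -> int:
    """Return 1 if prediction and reference are identical strings, else 0."""
    # Assumption: no whitespace or case normalization is applied.
    return 1 if prediction == reference else 0


# An instance shaped like the ExactMatchInstance JSON representation.
instance = {"prediction": "Paris", "reference": "Paris"}
print(exact_match_score(instance["prediction"], instance["reference"]))
```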
BleuInput
Input for bleu metric.
Required. Spec for bleu score metric.
Required. Repeated bleu instances.
| JSON representation |
|---|
{ "metricSpec": { object ( |
BleuSpec
Spec for bleu score metric. Calculates the precision of n-grams in the prediction as compared to the reference and returns a score between 0 and 1.
useEffectiveOrder (boolean)
Optional. Whether to use effective order when computing the bleu score.
| JSON representation |
|---|
| { "useEffectiveOrder": boolean } |
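To make "precision of n-grams in the prediction" concrete, here is a simplified sketch of clipped n-gram precision for a single n. It is not the full bleu computation (there is no brevity penalty and no combination across n-gram orders), and whitespace tokenization is an assumption. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens: list, n: int) -> list:
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_ngram_precision(prediction: str, reference: str, n: int = 1) -> float:
    """Fraction of prediction n-grams that also occur in the reference."""
    pred_counts = Counter(ngrams(prediction.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(pred_counts.values())
    if total == 0:
        return 0.0
    # Each prediction n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in pred_counts.items())
    return matched / total
```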
BleuInstance
Spec for bleu instance.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Required. Ground truth used to compare against the prediction.
| JSON representation |
|---|
| { "prediction": string, "reference": string } |
RougeInput
Input for rouge metric.
Required. Spec for rouge score metric.
Required. Repeated rouge instances.
| JSON representation |
|---|
{ "metricSpec": { object ( |
RougeSpec
Spec for rouge score metric. Calculates the recall of n-grams in the prediction as compared to the reference and returns a score between 0 and 1.
rougeType (string)
Optional. Supported rouge types are rougen[1-9], rougeL, and rougeLsum.
useStemmer (boolean)
Optional. Whether to use stemmer to compute rouge score.
splitSummaries (boolean)
Optional. Whether to split summaries while using rougeLsum.
| JSON representation |
|---|
| { "rougeType": string, "useStemmer": boolean, "splitSummaries": boolean } |
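The "recall of n-grams" idea can be illustrated with a simplified n-gram recall sketch. It covers only the n-gram variants in spirit (not rougeL or rougeLsum), applies no stemming, and assumes whitespace tokenization. The function name rouge_n_recall is hypothetical.

```python
from collections import Counter


def rouge_n_recall(prediction: str, reference: str, n: int = 1) -> float:
    """Fraction of reference n-grams that also occur in the prediction."""
    def ngram_counts(text: str) -> Counter:
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    pred_counts = ngram_counts(prediction)
    ref_counts = ngram_counts(reference)
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is credited at most as often as it occurs in the prediction.
    overlap = sum(min(count, pred_counts[gram]) for gram, count in ref_counts.items())
    return overlap / total
```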
RougeInstance
Spec for rouge instance.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Required. Ground truth used to compare against the prediction.
| JSON representation |
|---|
| { "prediction": string, "reference": string } |
FluencyInput
Input for fluency metric.
Required. Spec for fluency score metric.
Required. Fluency instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
FluencySpec
Spec for fluency score metric.
version (integer)
Optional. Which version to use for evaluation.
| JSON representation |
|---|
| { "version": integer } |
FluencyInstance
Spec for fluency instance.
prediction (string)
Required. Output of the evaluated model.
| JSON representation |
|---|
| { "prediction": string } |
CoherenceInput
Input for coherence metric.
Required. Spec for coherence score metric.
Required. Coherence instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
CoherenceSpec
Spec for coherence score metric.
version (integer)
Optional. Which version to use for evaluation.
| JSON representation |
|---|
| { "version": integer } |
CoherenceInstance
Spec for coherence instance.
prediction (string)
Required. Output of the evaluated model.
| JSON representation |
|---|
| { "prediction": string } |
SafetyInput
Input for safety metric.
Required. Spec for safety metric.
Required. Safety instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
SafetySpec
Spec for safety metric.
version (integer)
Optional. Which version to use for evaluation.
| JSON representation |
|---|
| { "version": integer } |
SafetyInstance
Spec for safety instance.
prediction (string)
Required. Output of the evaluated model.
| JSON representation |
|---|
| { "prediction": string } |
GroundednessInput
Input for groundedness metric.
Required. Spec for groundedness metric.
Required. Groundedness instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
GroundednessSpec
Spec for groundedness metric.
version (integer)
Optional. Which version to use for evaluation.
| JSON representation |
|---|
| { "version": integer } |
GroundednessInstance
Spec for groundedness instance.
prediction (string)
Required. Output of the evaluated model.
context (string)
Required. Background information provided in context used to compare against the prediction.
| JSON representation |
|---|
| { "prediction": string, "context": string } |
FulfillmentInput
Input for fulfillment metric.
Required. Spec for fulfillment score metric.
Required. Fulfillment instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
FulfillmentSpec
Spec for fulfillment metric.
version (integer)
Optional. Which version to use for evaluation.
| JSON representation |
|---|
| { "version": integer } |
FulfillmentInstance
Spec for fulfillment instance.
prediction (string)
Required. Output of the evaluated model.
instruction (string)
Required. Inference instruction prompt to compare prediction with.
| JSON representation |
|---|
| { "prediction": string, "instruction": string } |
SummarizationQualityInput
Input for summarization quality metric.
Required. Spec for summarization quality score metric.
Required. Summarization quality instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
SummarizationQualitySpec
Spec for summarization quality score metric.
useReference (boolean)
Optional. Whether to use instance.reference to compute summarization quality.
version (integer)
Optional. Which version to use for evaluation.
| JSON representation |
|---|
| { "useReference": boolean, "version": integer } |
SummarizationQualityInstance
Spec for summarization quality instance.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Optional. Ground truth used to compare against the prediction.
context (string)
Required. Text to be summarized.
instruction (string)
Required. Summarization prompt for LLM.
| JSON representation |
|---|
| { "prediction": string, "reference": string, "context": string, "instruction": string } |
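The required and optional fields of SummarizationQualityInstance can be modeled client-side as in the sketch below. The dataclass and its to_json_dict method are hypothetical helpers, not part of any official client library; they only mirror the field list above and leave out the optional reference when it is not set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SummarizationQualityInstance:
    prediction: str                  # Required. Output of the evaluated model.
    context: str                     # Required. Text to be summarized.
    instruction: str                 # Required. Summarization prompt for LLM.
    reference: Optional[str] = None  # Optional. Ground truth.

    def to_json_dict(self) -> dict:
        """Build a dict matching the JSON representation, omitting unset optional fields."""
        body = {
            "prediction": self.prediction,
            "context": self.context,
            "instruction": self.instruction,
        }
        if self.reference is not None:
            body["reference"] = self.reference
        return body
```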
PairwiseSummarizationQualityInput
Input for pairwise summarization quality metric.
Required. Spec for pairwise summarization quality score metric.
Required. Pairwise summarization quality instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
PairwiseSummarizationQualitySpec
Spec for pairwise summarization quality score metric.
useReference (boolean)
Optional. Whether to use instance.reference to compute pairwise summarization quality.
version (integer)
Optional. Which version to use for evaluation.
| JSON representation |
|---|
| { "useReference": boolean, "version": integer } |
PairwiseSummarizationQualityInstance
Spec for pairwise summarization quality instance.
prediction (string)
Required. Output of the candidate model.
baselinePrediction (string)
Required. Output of the baseline model.
reference (string)
Optional. Ground truth used to compare against the prediction.
context (string)
Required. Text to be summarized.
instruction (string)
Required. Summarization prompt for LLM.
| JSON representation |
|---|
| { "prediction": string, "baselinePrediction": string, "reference": string, "context": string, "instruction": string } |
SummarizationHelpfulnessInput
Input for summarization helpfulness metric.
Required. Spec for summarization helpfulness score metric.
Required. Summarization helpfulness instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
SummarizationHelpfulnessSpec
Spec for summarization helpfulness score metric.
useReference (boolean)
Optional. Whether to use instance.reference to compute summarization helpfulness.
version (integer)
Optional. Which version to use for evaluation.
| JSON representation |
|---|
| { "useReference": boolean, "version": integer } |
SummarizationHelpfulnessInstance
Spec for summarization helpfulness instance.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Optional. Ground truth used to compare against the prediction.
context (string)
Required. Text to be summarized.
instruction (string)
Optional. Summarization prompt for LLM.
| JSON representation |
|---|
| { "prediction": string, "reference": string, "context": string, "instruction": string } |
SummarizationVerbosityInput
Input for summarization verbosity metric.
Required. Spec for summarization verbosity score metric.
Required. Summarization verbosity instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
SummarizationVerbositySpec
Spec for summarization verbosity score metric.
useReference (boolean)
Optional. Whether to use instance.reference to compute summarization verbosity.
version (integer)
Optional. Which version to use for evaluation.
| JSON representation |
|---|
| { "useReference": boolean, "version": integer } |
SummarizationVerbosityInstance
Spec for summarization verbosity instance.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Optional. Ground truth used to compare against the prediction.
context (string)
Required. Text to be summarized.
instruction (string)
Optional. Summarization prompt for LLM.
| JSON representation |
|---|
| { "prediction": string, "reference": string, "context": string, "instruction": string } |
QuestionAnsweringQualityInput
Input for question answering quality metric.
Required. Spec for question answering quality score metric.
Required. Question answering quality instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
QuestionAnsweringQualitySpec
Spec for question answering quality score metric.
useReference (boolean)
Optional. Whether to use instance.reference to compute question answering quality.
version (integer)
Optional. Which version to use for evaluation.
| JSON representation |
|---|
| { "useReference": boolean, "version": integer } |
QuestionAnsweringQualityInstance
Spec for question answering quality instance.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Optional. Ground truth used to compare against the prediction.
context (string)
Required. Text to answer the question.
instruction (string)
Required. Question answering prompt for LLM.
| JSON representation |
|---|
| { "prediction": string, "reference": string, "context": string, "instruction": string } |
PairwiseQuestionAnsweringQualityInput
Input for pairwise question answering quality metric.
Required. Spec for pairwise question answering quality score metric.
Required. Pairwise question answering quality instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
PairwiseQuestionAnsweringQualitySpec
Spec for pairwise question answering quality score metric.
useReference (boolean)
Optional. Whether to use instance.reference to compute pairwise question answering quality.
version (integer)
Optional. Which version to use for evaluation.
| JSON representation |
|---|
| { "useReference": boolean, "version": integer } |
PairwiseQuestionAnsweringQualityInstance
Spec for pairwise question answering quality instance.
prediction (string)
Required. Output of the candidate model.
baselinePrediction (string)
Required. Output of the baseline model.
reference (string)
Optional. Ground truth used to compare against the prediction.
context (string)
Required. Text to answer the question.
instruction (string)
Required. Question answering prompt for LLM.
| JSON representation |
|---|
| { "prediction": string, "baselinePrediction": string, "reference": string, "context": string, "instruction": string } |
QuestionAnsweringRelevanceInput
Input for question answering relevance metric.
Required. Spec for question answering relevance score metric.
Required. Question answering relevance instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
QuestionAnsweringRelevanceSpec
Spec for question answering relevance metric.
useReference (boolean)
Optional. Whether to use instance.reference to compute question answering relevance.
version (integer)
Optional. Which version to use for evaluation.
| JSON representation |
|---|
| { "useReference": boolean, "version": integer } |
QuestionAnsweringRelevanceInstance
Spec for question answering relevance instance.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Optional. Ground truth used to compare against the prediction.
context (string)
Optional. Text provided as context to answer the question.
instruction (string)
Required. The question asked and other instructions in the inference prompt.
| JSON representation |
|---|
| { "prediction": string, "reference": string, "context": string, "instruction": string } |
QuestionAnsweringHelpfulnessInput
Input for question answering helpfulness metric.
Required. Spec for question answering helpfulness score metric.
Required. Question answering helpfulness instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
QuestionAnsweringHelpfulnessSpec
Spec for question answering helpfulness metric.
useReference (boolean)
Optional. Whether to use instance.reference to compute question answering helpfulness.
version (integer)
Optional. Which version to use for evaluation.
| JSON representation |
|---|
| { "useReference": boolean, "version": integer } |
QuestionAnsweringHelpfulnessInstance
Spec for question answering helpfulness instance.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Optional. Ground truth used to compare against the prediction.
context (string)
Optional. Text provided as context to answer the question.
instruction (string)
Required. The question asked and other instructions in the inference prompt.
| JSON representation |
|---|
| { "prediction": string, "reference": string, "context": string, "instruction": string } |
QuestionAnsweringCorrectnessInput
Input for question answering correctness metric.
Required. Spec for question answering correctness score metric.
Required. Question answering correctness instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
QuestionAnsweringCorrectnessSpec
Spec for question answering correctness metric.
useReference (boolean)
Optional. Whether to use instance.reference to compute question answering correctness.
version (integer)
Optional. Which version to use for evaluation.
| JSON representation |
|---|
| { "useReference": boolean, "version": integer } |
QuestionAnsweringCorrectnessInstance
Spec for question answering correctness instance.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Optional. Ground truth used to compare against the prediction.
context (string)
Optional. Text provided as context to answer the question.
instruction (string)
Required. The question asked and other instructions in the inference prompt.
| JSON representation |
|---|
| { "prediction": string, "reference": string, "context": string, "instruction": string } |
PointwiseMetricInput
Input for pointwise metric.
Required. Spec for pointwise metric.
Required. Pointwise metric instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
PointwiseMetricSpec
Spec for pointwise metric.
metricPromptTemplate (string)
Required. Metric prompt template for pointwise metric.
| JSON representation |
|---|
| { "metricPromptTemplate": string } |
PointwiseMetricInstance
Pointwise metric instance. Usually one instance corresponds to one row in an evaluation dataset.
instance (union type)
instance can be only one of the following:
jsonInstance (string)
Instance specified as a json string. String key-value pairs are expected in the jsonInstance to render PointwiseMetricSpec.instance_prompt_template.
| JSON representation |
|---|
| { // instance "jsonInstance": string // Union type } |
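Because jsonInstance is a JSON string rather than a nested object, the key-value pairs have to be serialized before they are placed in the instance. The sketch below shows one way to do that; the helper name make_pointwise_instance and the example keys ("prompt", "response") are hypothetical and depend on the placeholders used in your own prompt template.

```python
import json


def make_pointwise_instance(fields: dict) -> dict:
    """Wrap string key-value pairs into a PointwiseMetricInstance-shaped dict."""
    for key, value in fields.items():
        # The reference states that string key-value pairs are expected.
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("jsonInstance expects string key-value pairs")
    # jsonInstance holds the serialized JSON text, not a nested object.
    return {"jsonInstance": json.dumps(fields)}


# Hypothetical keys; they must match the placeholders in the metric prompt template.
instance = make_pointwise_instance({"prompt": "Summarize the article.", "response": "A short summary."})
print(instance)
```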
PairwiseMetricInput
Input for pairwise metric.
Required. Spec for pairwise metric.
Required. Pairwise metric instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
PairwiseMetricSpec
Spec for pairwise metric.
metricPromptTemplate (string)
Required. Metric prompt template for pairwise metric.
| JSON representation |
|---|
| { "metricPromptTemplate": string } |
PairwiseMetricInstance
Pairwise metric instance. Usually one instance corresponds to one row in an evaluation dataset.
instance (union type)
instance can be only one of the following:
jsonInstance (string)
Instance specified as a json string. String key-value pairs are expected in the jsonInstance to render PairwiseMetricSpec.instance_prompt_template.
| JSON representation |
|---|
| { // instance "jsonInstance": string // Union type } |
ToolCallValidInput
Input for tool call valid metric.
Required. Spec for tool call valid metric.
Required. Repeated tool call valid instances.
| JSON representation |
|---|
{ "metricSpec": { object ( |
ToolCallValidSpec
This type has no fields.
Spec for tool call valid metric.
ToolCallValidInstance
Spec for tool call valid instance.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Required. Ground truth used to compare against the prediction.
| JSON representation |
|---|
| { "prediction": string, "reference": string } |
ToolNameMatchInput
Input for tool name match metric.
Required. Spec for tool name match metric.
Required. Repeated tool name match instances.
| JSON representation |
|---|
{ "metricSpec": { object ( |
ToolNameMatchSpec
This type has no fields.
Spec for tool name match metric.
ToolNameMatchInstance
Spec for tool name match instance.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Required. Ground truth used to compare against the prediction.
| JSON representation |
|---|
| { "prediction": string, "reference": string } |
ToolParameterKeyMatchInput
Input for tool parameter key match metric.
Required. Spec for tool parameter key match metric.
Required. Repeated tool parameter key match instances.
| JSON representation |
|---|
{ "metricSpec": { object ( |
ToolParameterKeyMatchSpec
This type has no fields.
Spec for tool parameter key match metric.
ToolParameterKeyMatchInstance
Spec for tool parameter key match instance.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Required. Ground truth used to compare against the prediction.
| JSON representation |
|---|
| { "prediction": string, "reference": string } |
ToolParameterKVMatchInput
Input for tool parameter key value match metric.
Required. Spec for tool parameter key value match metric.
Required. Repeated tool parameter key value match instances.
| JSON representation |
|---|
{ "metricSpec": { object ( |
ToolParameterKVMatchSpec
Spec for tool parameter key value match metric.
useStrictStringMatch (boolean)
Optional. Whether to use strict string match on parameter values.
| JSON representation |
|---|
| { "useStrictStringMatch": boolean } |
ToolParameterKVMatchInstance
Spec for tool parameter key value match instance.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Required. Ground truth used to compare against the prediction.
| JSON representation |
|---|
| { "prediction": string, "reference": string } |
CometInput
Input for Comet metric.
Required. Spec for Comet metric.
Required. Comet instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
CometSpec
Spec for Comet metric.
sourceLanguage (string)
Optional. Source language in BCP-47 format.
targetLanguage (string)
Optional. Target language in BCP-47 format. Covers both prediction and reference.
Required. Which version to use for evaluation.
| JSON representation |
|---|
{
"sourceLanguage": string,
"targetLanguage": string,
"version": enum ( |
CometVersion
Comet version options.
| Enums | |
|---|---|
| COMET_VERSION_UNSPECIFIED | Comet version unspecified. |
| COMET_22_SRC_REF | Comet 22 for translation + source + reference (source-reference-combined). |
CometInstance
Spec for Comet instance - The fields used for evaluation are dependent on the comet version.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Optional. Ground truth used to compare against the prediction.
source (string)
Optional. Source text in original language.
| JSON representation |
|---|
| { "prediction": string, "reference": string, "source": string } |
MetricxInput
Input for MetricX metric.
Required. Spec for MetricX metric.
Required. MetricX instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
MetricxSpec
Spec for MetricX metric.
sourceLanguage (string)
Optional. Source language in BCP-47 format.
targetLanguage (string)
Optional. Target language in BCP-47 format. Covers both prediction and reference.
Required. Which version to use for evaluation.
| JSON representation |
|---|
{
"sourceLanguage": string,
"targetLanguage": string,
"version": enum ( |
MetricxVersion
MetricX version options.
| Enums | |
|---|---|
| METRICX_VERSION_UNSPECIFIED | MetricX version unspecified. |
| METRICX_24_REF | MetricX 2024 (2.6) for translation + reference (reference-based). |
| METRICX_24_SRC | MetricX 2024 (2.6) for translation + source (QE). |
| METRICX_24_SRC_REF | MetricX 2024 (2.6) for translation + source + reference (source-reference-combined). |
MetricxInstance
Spec for MetricX instance - The fields used for evaluation are dependent on the MetricX version.
prediction (string)
Required. Output of the evaluated model.
reference (string)
Optional. Ground truth used to compare against the prediction.
source (string)
Optional. Source text in original language.
| JSON representation |
|---|
| { "prediction": string, "reference": string, "source": string } |
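Since the fields used for evaluation depend on the MetricX version, a client may want to check an instance before sending it. The mapping below is an interpretation of the MetricxVersion enum descriptions (reference-based, source-based, and combined), not a documented validation rule; the names FIELDS_BY_VERSION and missing_fields are hypothetical.

```python
# Assumed mapping, read off the MetricxVersion enum descriptions.
FIELDS_BY_VERSION = {
    "METRICX_24_REF": ("prediction", "reference"),
    "METRICX_24_SRC": ("prediction", "source"),
    "METRICX_24_SRC_REF": ("prediction", "source", "reference"),
}


def missing_fields(version: str, instance: dict) -> list:
    """Return the field names the given version appears to need but the instance lacks."""
    return [name for name in FIELDS_BY_VERSION[version] if not instance.get(name)]


instance = {"prediction": "Bonjour le monde", "reference": "Bonjour tout le monde"}
print(missing_fields("METRICX_24_SRC", instance))
```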
TrajectoryExactMatchInput
Instances and metric spec for TrajectoryExactMatch metric.
Required. Spec for TrajectoryExactMatch metric.
Required. Repeated TrajectoryExactMatch instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
TrajectoryExactMatchSpec
This type has no fields.
Spec for TrajectoryExactMatch metric - returns 1 if tool calls in the reference trajectory exactly match the predicted trajectory, else 0.
TrajectoryExactMatchInstance
Spec for TrajectoryExactMatch instance.
Required. Spec for predicted tool call trajectory.
Required. Spec for reference tool call trajectory.
| JSON representation |
|---|
{ "predictedTrajectory": { object ( |
Trajectory
ToolCall
Spec for tool call.
toolName (string)
Required. Spec for tool name.
toolInput (string)
Optional. Spec for tool input.
| JSON representation |
|---|
| { "toolName": string, "toolInput": string } |
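The TrajectoryExactMatch rule (1 if the reference tool calls exactly match the predicted ones, else 0) can be sketched locally as below. A trajectory is modeled here as a plain list of ToolCall-shaped dicts, which is an assumption because the fields of Trajectory are not shown in this section; comparing toolInput as well as toolName is also an assumption. The function name is hypothetical.

```python
def trajectory_exact_match(predicted: list, reference: list) -> int:
    """Return 1 if both trajectories contain the same tool calls in the same order."""
    # List equality compares length, order, and each ToolCall-shaped dict.
    return 1 if predicted == reference else 0


search = {"toolName": "search", "toolInput": "weather in Paris"}
book = {"toolName": "book_flight", "toolInput": "CDG"}
print(trajectory_exact_match([search, book], [search, book]))
```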
TrajectoryInOrderMatchInput
Instances and metric spec for TrajectoryInOrderMatch metric.
Required. Spec for TrajectoryInOrderMatch metric.
Required. Repeated TrajectoryInOrderMatch instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
TrajectoryInOrderMatchSpec
This type has no fields.
Spec for TrajectoryInOrderMatch metric - returns 1 if tool calls in the reference trajectory appear in the predicted trajectory in the same order, else 0.
TrajectoryInOrderMatchInstance
Spec for TrajectoryInOrderMatch instance.
Required. Spec for predicted tool call trajectory.
Required. Spec for reference tool call trajectory.
| JSON representation |
|---|
{ "predictedTrajectory": { object ( |
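The in-order rule amounts to checking that the reference tool calls form a subsequence of the predicted ones: extra predicted calls are allowed, but the relative order must be preserved. A minimal sketch, with trajectories modeled as plain lists of ToolCall-shaped dicts (an assumption) and a hypothetical function name:

```python
def trajectory_in_order_match(predicted: list, reference: list) -> int:
    """Return 1 if the reference calls appear in the predicted list in the same order."""
    # A single iterator is shared across reference calls, so each search
    # resumes after the position where the previous reference call was found.
    remaining = iter(predicted)
    for ref_call in reference:
        if not any(pred_call == ref_call for pred_call in remaining):
            return 0
    return 1


a = {"toolName": "search", "toolInput": "hotels"}
b = {"toolName": "filter", "toolInput": "price<100"}
c = {"toolName": "book", "toolInput": "hotel-42"}
print(trajectory_in_order_match([a, b, c], [a, c]))
```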
TrajectoryAnyOrderMatchInput
Instances and metric spec for TrajectoryAnyOrderMatch metric.
Required. Spec for TrajectoryAnyOrderMatch metric.
Required. Repeated TrajectoryAnyOrderMatch instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
TrajectoryAnyOrderMatchSpec
This type has no fields.
Spec for TrajectoryAnyOrderMatch metric - returns 1 if all tool calls in the reference trajectory appear in the predicted trajectory in any order, else 0.
TrajectoryAnyOrderMatchInstance
Spec for TrajectoryAnyOrderMatch instance.
Required. Spec for predicted tool call trajectory.
Required. Spec for reference tool call trajectory.
| JSON representation |
|---|
{ "predictedTrajectory": { object ( |
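The any-order rule drops the ordering requirement and only asks that every reference tool call is present in the predicted trajectory. The sketch below models trajectories as plain lists of ToolCall-shaped dicts (an assumption) and does not count duplicates, which is also an assumption; the function name is hypothetical.

```python
def trajectory_any_order_match(predicted: list, reference: list) -> int:
    """Return 1 if every reference call appears somewhere in the predicted list."""
    # Membership only: order is ignored and repeated reference calls are not counted.
    return 1 if all(ref_call in predicted for ref_call in reference) else 0


a = {"toolName": "search", "toolInput": "hotels"}
b = {"toolName": "book", "toolInput": "hotel-42"}
extra = {"toolName": "log", "toolInput": "done"}
print(trajectory_any_order_match([b, extra, a], [a, b]))
```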
TrajectoryPrecisionInput
Instances and metric spec for TrajectoryPrecision metric.
Required. Spec for TrajectoryPrecision metric.
Required. Repeated TrajectoryPrecision instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
TrajectoryPrecisionSpec
This type has no fields.
Spec for TrajectoryPrecision metric - returns a float score based on average precision of individual tool calls.
TrajectoryPrecisionInstance
Spec for TrajectoryPrecision instance.
Required. Spec for predicted tool call trajectory.
Required. Spec for reference tool call trajectory.
| JSON representation |
|---|
{ "predictedTrajectory": { object ( |
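One plausible reading of "average precision of individual tool calls" is the fraction of predicted tool calls that also appear in the reference trajectory. The sketch below implements that reading only; the exact formula used by the service is not given in this section, and trajectories are modeled as plain lists of ToolCall-shaped dicts. The function name is hypothetical.

```python
def trajectory_precision(predicted: list, reference: list) -> float:
    """Fraction of predicted tool calls that also appear in the reference."""
    if not predicted:
        return 0.0
    hits = sum(1 for call in predicted if call in reference)
    return hits / len(predicted)


a = {"toolName": "search", "toolInput": "hotels"}
b = {"toolName": "book", "toolInput": "hotel-42"}
x = {"toolName": "translate", "toolInput": "hello"}
y = {"toolName": "log", "toolInput": "done"}
print(trajectory_precision([a, b, x, y], [a, b]))
```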
TrajectoryRecallInput
Instances and metric spec for TrajectoryRecall metric.
Required. Spec for TrajectoryRecall metric.
Required. Repeated TrajectoryRecall instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
TrajectoryRecallSpec
This type has no fields.
Spec for TrajectoryRecall metric - returns a float score based on average recall of individual tool calls.
TrajectoryRecallInstance
Spec for TrajectoryRecall instance.
Required. Spec for predicted tool call trajectory.
Required. Spec for reference tool call trajectory.
| JSON representation |
|---|
{ "predictedTrajectory": { object ( |
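By analogy, "average recall of individual tool calls" can be read as the fraction of reference tool calls that appear in the predicted trajectory. This is a sketch of that interpretation, not the documented formula; trajectories are modeled as plain lists of ToolCall-shaped dicts and the function name is hypothetical.

```python
def trajectory_recall(predicted: list, reference: list) -> float:
    """Fraction of reference tool calls that also appear in the predicted list."""
    if not reference:
        return 0.0
    hits = sum(1 for call in reference if call in predicted)
    return hits / len(reference)


a = {"toolName": "search", "toolInput": "hotels"}
b = {"toolName": "book", "toolInput": "hotel-42"}
print(trajectory_recall([a], [a, b]))
```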
TrajectorySingleToolUseInput
Instances and metric spec for TrajectorySingleToolUse metric.
Required. Spec for TrajectorySingleToolUse metric.
Required. Repeated TrajectorySingleToolUse instance.
| JSON representation |
|---|
{ "metricSpec": { object ( |
TrajectorySingleToolUseSpec
Spec for TrajectorySingleToolUse metric - returns 1 if tool is present in the predicted trajectory, else 0.
toolName (string)
Required. Spec for tool name to be checked for in the predicted trajectory.
| JSON representation |
|---|
| { "toolName": string } |
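The single tool use check reduces to asking whether any predicted tool call carries the configured toolName. A minimal sketch, with the predicted trajectory modeled as a plain list of ToolCall-shaped dicts (an assumption) and a hypothetical function name:

```python
def trajectory_single_tool_use(predicted: list, tool_name: str) -> int:
    """Return 1 if a tool call with the given name is in the predicted list, else 0."""
    # Only toolName is compared; toolInput is ignored for this check.
    return 1 if any(call.get("toolName") == tool_name for call in predicted) else 0


predicted = [
    {"toolName": "search", "toolInput": "hotels"},
    {"toolName": "book", "toolInput": "hotel-42"},
]
print(trajectory_single_tool_use(predicted, "book"))
```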
TrajectorySingleToolUseInstance
Spec for TrajectorySingleToolUse instance.
Required. Spec for predicted tool call trajectory.
| JSON representation |
|---|
{
"predictedTrajectory": {
object ( |
ExactMatchResults
Results for exact match metric.
Output only. Exact match metric values.
| JSON representation |
|---|
{
"exactMatchMetricValues": [
{
object ( |
ExactMatchMetricValue
Exact match metric value for an instance.
score (number)
Output only. Exact match score.
| JSON representation |
|---|
| { "score": number } |
BleuResults
Results for bleu metric.
Output only. Bleu metric values.
| JSON representation |
|---|
{
"bleuMetricValues": [
{
object ( |
BleuMetricValue
Bleu metric value for an instance.
score (number)
Output only. Bleu score.
| JSON representation |
|---|
| { "score": number } |
RougeResults
Results for rouge metric.
Output only. Rouge metric values.
| JSON representation |
|---|
{
"rougeMetricValues": [
{
object ( |
RougeMetricValue
Rouge metric value for an instance.
score (number)
Output only. Rouge score.
| JSON representation |
|---|
| { "score": number } |
FluencyResult
Spec for fluency result.
explanation (string)
Output only. Explanation for fluency score.
score (number)
Output only. Fluency score.
confidence (number)
Output only. Confidence for fluency score.
| JSON representation |
|---|
| { "explanation": string, "score": number, "confidence": number } |
CoherenceResult
Spec for coherence result.
explanation (string)
Output only. Explanation for coherence score.
score (number)
Output only. Coherence score.
confidence (number)
Output only. Confidence for coherence score.
| JSON representation |
|---|
| { "explanation": string, "score": number, "confidence": number } |
SafetyResult
Spec for safety result.
explanation (string)
Output only. Explanation for safety score.
score (number)
Output only. Safety score.
confidence (number)
Output only. Confidence for safety score.
| JSON representation |
|---|
| { "explanation": string, "score": number, "confidence": number } |
GroundednessResult
Spec for groundedness result.
explanation (string)
Output only. Explanation for groundedness score.
score (number)
Output only. Groundedness score.
confidence (number)
Output only. Confidence for groundedness score.
| JSON representation |
|---|
| { "explanation": string, "score": number, "confidence": number } |
FulfillmentResult
Spec for fulfillment result.
explanation (string)
Output only. Explanation for fulfillment score.
score (number)
Output only. Fulfillment score.
confidence (number)
Output only. Confidence for fulfillment score.
| JSON representation |
|---|
| { "explanation": string, "score": number, "confidence": number } |
SummarizationQualityResult
Spec for summarization quality result.
explanation (string)
Output only. Explanation for summarization quality score.
score (number)
Output only. Summarization quality score.
confidence (number)
Output only. Confidence for summarization quality score.
| JSON representation |
|---|
| { "explanation": string, "score": number, "confidence": number } |
PairwiseSummarizationQualityResult
Spec for pairwise summarization quality result.
Output only. Pairwise summarization prediction choice.
explanation (string)
Output only. Explanation for summarization quality score.
confidence (number)
Output only. Confidence for summarization quality score.
| JSON representation |
|---|
{
"pairwiseChoice": enum ( |
PairwiseChoice
Pairwise prediction autorater preference.
| Enums | |
|---|---|
| PAIRWISE_CHOICE_UNSPECIFIED | Unspecified prediction choice. |
| BASELINE | Baseline prediction wins. |
| CANDIDATE | Candidate prediction wins. |
| TIE | Winner cannot be determined. |
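When many pairwise results are collected, the enum values above can be tallied into a win rate. The sketch below assumes each `pairwiseChoice` arrives as its enum name in string form. The `candidate_win_rate` helper, its choice to ignore `TIE` and unspecified values, and the sample data are illustrative assumptions, not part of the API.

```python
from collections import Counter
from enum import Enum


class PairwiseChoice(str, Enum):
    """Client-side mirror of the PairwiseChoice enum values listed above."""
    PAIRWISE_CHOICE_UNSPECIFIED = "PAIRWISE_CHOICE_UNSPECIFIED"
    BASELINE = "BASELINE"
    CANDIDATE = "CANDIDATE"
    TIE = "TIE"


def candidate_win_rate(choices: list[str]) -> float:
    """Share of decided comparisons (BASELINE or CANDIDATE) won by the candidate.

    Ties and unspecified choices are excluded from the denominator; this is a
    design choice of the sketch, not something the API prescribes.
    """
    counts = Counter(PairwiseChoice(c) for c in choices)
    decided = counts[PairwiseChoice.BASELINE] + counts[PairwiseChoice.CANDIDATE]
    if decided == 0:
        return 0.0
    return counts[PairwiseChoice.CANDIDATE] / decided


# Illustrative data; not real service output.
observed = ["CANDIDATE", "BASELINE", "CANDIDATE", "TIE"]
rate = candidate_win_rate(observed)
```

Constructing `PairwiseChoice(c)` raises `ValueError` for an unknown string, which surfaces unexpected values early instead of silently miscounting them.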
SummarizationHelpfulnessResult
Spec for summarization helpfulness result.
explanation string
Output only. Explanation for summarization helpfulness score.
score number
Output only. Summarization helpfulness score.
confidence number
Output only. Confidence for summarization helpfulness score.
| JSON representation |
|---|
| { "explanation": string, "score": number, "confidence": number } |
SummarizationVerbosityResult
Spec for summarization verbosity result.
explanation string
Output only. Explanation for summarization verbosity score.
score number
Output only. Summarization verbosity score.
confidence number
Output only. Confidence for summarization verbosity score.
| JSON representation |
|---|
| { "explanation": string, "score": number, "confidence": number } |
QuestionAnsweringQualityResult
Spec for question answering quality result.
explanation string
Output only. Explanation for question answering quality score.
score number
Output only. Question answering quality score.
confidence number
Output only. Confidence for question answering quality score.
| JSON representation |
|---|
| { "explanation": string, "score": number, "confidence": number } |
PairwiseQuestionAnsweringQualityResult
Spec for pairwise question answering quality result.
pairwiseChoice enum (PairwiseChoice)
Output only. Pairwise question answering prediction choice.
explanation string
Output only. Explanation for question answering quality score.
confidence number
Output only. Confidence for question answering quality score.
| JSON representation |
|---|
| { "pairwiseChoice": enum (PairwiseChoice), "explanation": string, "confidence": number } |
QuestionAnsweringRelevanceResult
Spec for question answering relevance result.
explanation string
Output only. Explanation for question answering relevance score.
score number
Output only. Question answering relevance score.
confidence number
Output only. Confidence for question answering relevance score.
| JSON representation |
|---|
| { "explanation": string, "score": number, "confidence": number } |
QuestionAnsweringHelpfulnessResult
Spec for question answering helpfulness result.
explanation string
Output only. Explanation for question answering helpfulness score.
score number
Output only. Question answering helpfulness score.
confidence number
Output only. Confidence for question answering helpfulness score.
| JSON representation |
|---|
| { "explanation": string, "score": number, "confidence": number } |
QuestionAnsweringCorrectnessResult
Spec for question answering correctness result.
explanation string
Output only. Explanation for question answering correctness score.
score number
Output only. Question answering correctness score.
confidence number
Output only. Confidence for question answering correctness score.
| JSON representation |
|---|
| { "explanation": string, "score": number, "confidence": number } |
PointwiseMetricResult
Spec for pointwise metric result.
explanation string
Output only. Explanation for pointwise metric score.
score number
Output only. Pointwise metric score.
| JSON representation |
|---|
| { "explanation": string, "score": number } |
PairwiseMetricResult
Spec for pairwise metric result.
pairwiseChoice enum (PairwiseChoice)
Output only. Pairwise metric choice.
explanation string
Output only. Explanation for pairwise metric score.
| JSON representation |
|---|
| { "pairwiseChoice": enum (PairwiseChoice), "explanation": string } |
ToolCallValidResults
Results for tool call valid metric.
toolCallValidMetricValues[] object (ToolCallValidMetricValue)
Output only. Tool call valid metric values.
| JSON representation |
|---|
| { "toolCallValidMetricValues": [ { object (ToolCallValidMetricValue) } ] } |
ToolCallValidMetricValue
Tool call valid metric value for an instance.
score number
Output only. Tool call valid score.
| JSON representation |
|---|
| { "score": number } |
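Because `ToolCallValidResults` holds one `ToolCallValidMetricValue` per instance, a single summary figure has to be computed by the caller. The sketch below averages the per-instance scores from a decoded result object. The helper name `mean_tool_call_valid_score`, the treatment of a missing `score` as `0.0`, and the sample values are assumptions of this example rather than documented behavior.

```python
from statistics import mean
from typing import Optional


def mean_tool_call_valid_score(results: dict) -> Optional[float]:
    """Average the per-instance scores in a ToolCallValidResults object.

    Returns None when the object carries no metric values.
    """
    values = results.get("toolCallValidMetricValues", [])
    # Assumption: a value object without a "score" key is read as 0.0.
    scores = [float(value.get("score", 0.0)) for value in values]
    return mean(scores) if scores else None


# Illustrative data; not real service output.
sample_results = {
    "toolCallValidMetricValues": [{"score": 1}, {"score": 0}, {"score": 1}]
}
average = mean_tool_call_valid_score(sample_results)
```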
ToolNameMatchResults
Results for tool name match metric.
toolNameMatchMetricValues[] object (ToolNameMatchMetricValue)
Output only. Tool name match metric values.
| JSON representation |
|---|
| { "toolNameMatchMetricValues": [ { object (ToolNameMatchMetricValue) } ] } |
ToolNameMatchMetricValue
Tool name match metric value for an instance.
score number
Output only. Tool name match score.
| JSON representation |
|---|
| { "score": number } |
ToolParameterKeyMatchResults
Results for tool parameter key match metric.
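toolParameterKeyMatchMetricValues[] object (ToolParameterKeyMatchMetricValue)
Output only. Tool parameter key match metric values.
| JSON representation |
|---|
| { "toolParameterKeyMatchMetricValues": [ { object (ToolParameterKeyMatchMetricValue) } ] } |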
ToolParameterKeyMatchMetricValue
Tool parameter key match metric value for an instance.
score number
Output only. Tool parameter key match score.
| JSON representation |
|---|
| { "score": number } |
ToolParameterKVMatchResults
Results for tool parameter key value match metric.
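toolParameterKvMatchMetricValues[] object (ToolParameterKVMatchMetricValue)
Output only. Tool parameter key value match metric values.
| JSON representation |
|---|
| { "toolParameterKvMatchMetricValues": [ { object (ToolParameterKVMatchMetricValue) } ] } |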
ToolParameterKVMatchMetricValue
Tool parameter key value match metric value for an instance.
score number
Output only. Tool parameter key value match score.
| JSON representation |
|---|
| { "score": number } |
CometResult
Spec for Comet result. Calculates the Comet score for the given instance using the version specified in the spec.
score number
Output only. Comet score. Range depends on version.
| JSON representation |
|---|
| { "score": number } |
MetricxResult
Spec for MetricX result. Calculates the MetricX score for the given instance using the version specified in the spec.
score number
Output only. MetricX score. Range depends on version.
| JSON representation |
|---|
| { "score": number } |
TrajectoryExactMatchResults
Results for TrajectoryExactMatch metric.
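trajectoryExactMatchMetricValues[] object (TrajectoryExactMatchMetricValue)
Output only. TrajectoryExactMatch metric values.
| JSON representation |
|---|
| { "trajectoryExactMatchMetricValues": [ { object (TrajectoryExactMatchMetricValue) } ] } |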
TrajectoryExactMatchMetricValue
TrajectoryExactMatch metric value for an instance.
score number
Output only. TrajectoryExactMatch score.
| JSON representation |
|---|
| { "score": number } |
TrajectoryInOrderMatchResults
Results for TrajectoryInOrderMatch metric.
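trajectoryInOrderMatchMetricValues[] object (TrajectoryInOrderMatchMetricValue)
Output only. TrajectoryInOrderMatch metric values.
| JSON representation |
|---|
| { "trajectoryInOrderMatchMetricValues": [ { object (TrajectoryInOrderMatchMetricValue) } ] } |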
TrajectoryInOrderMatchMetricValue
TrajectoryInOrderMatch metric value for an instance.
score number
Output only. TrajectoryInOrderMatch score.
| JSON representation |
|---|
| { "score": number } |
TrajectoryAnyOrderMatchResults
Results for TrajectoryAnyOrderMatch metric.
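trajectoryAnyOrderMatchMetricValues[] object (TrajectoryAnyOrderMatchMetricValue)
Output only. TrajectoryAnyOrderMatch metric values.
| JSON representation |
|---|
| { "trajectoryAnyOrderMatchMetricValues": [ { object (TrajectoryAnyOrderMatchMetricValue) } ] } |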
TrajectoryAnyOrderMatchMetricValue
TrajectoryAnyOrderMatch metric value for an instance.
score number
Output only. TrajectoryAnyOrderMatch score.
| JSON representation |
|---|
| { "score": number } |
TrajectoryPrecisionResults
Results for TrajectoryPrecision metric.
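trajectoryPrecisionMetricValues[] object (TrajectoryPrecisionMetricValue)
Output only. TrajectoryPrecision metric values.
| JSON representation |
|---|
| { "trajectoryPrecisionMetricValues": [ { object (TrajectoryPrecisionMetricValue) } ] } |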
TrajectoryPrecisionMetricValue
TrajectoryPrecision metric value for an instance.
score number
Output only. TrajectoryPrecision score.
| JSON representation |
|---|
| { "score": number } |
TrajectoryRecallResults
Results for TrajectoryRecall metric.
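trajectoryRecallMetricValues[] object (TrajectoryRecallMetricValue)
Output only. TrajectoryRecall metric values.
| JSON representation |
|---|
| { "trajectoryRecallMetricValues": [ { object (TrajectoryRecallMetricValue) } ] } |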
TrajectoryRecallMetricValue
TrajectoryRecall metric value for an instance.
score number
Output only. TrajectoryRecall score.
| JSON representation |
|---|
| { "score": number } |
TrajectorySingleToolUseResults
Results for TrajectorySingleToolUse metric.
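trajectorySingleToolUseMetricValues[] object (TrajectorySingleToolUseMetricValue)
Output only. TrajectorySingleToolUse metric values.
| JSON representation |
|---|
| { "trajectorySingleToolUseMetricValues": [ { object (TrajectorySingleToolUseMetricValue) } ] } |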
TrajectorySingleToolUseMetricValue
TrajectorySingleToolUse metric value for an instance.
score number
Output only. TrajectorySingleToolUse score.
| JSON representation |
|---|
| { "score": number } |
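The tool call and trajectory `...Results` objects all follow one pattern: a single repeated field whose name ends in `MetricValues`, with each element carrying a `score`. A generic reader can rely on that naming pattern instead of hard-coding each field. The sketch below assumes result objects already decoded into Python dictionaries; `extract_scores` and the sample data are hypothetical and not part of the API.

```python
def extract_scores(results: dict) -> dict[str, list[float]]:
    """Map every key ending in "MetricValues" to its per-instance scores."""
    scores_by_field: dict[str, list[float]] = {}
    for key, values in results.items():
        if key.endswith("MetricValues") and isinstance(values, list):
            scores_by_field[key] = [float(value["score"]) for value in values]
    return scores_by_field


# Illustrative result objects, e.g. gathered from separate requests;
# the values are made up, not real service output.
collected = [
    {"trajectoryPrecisionMetricValues": [{"score": 0.5}, {"score": 1.0}]},
    {"trajectoryRecallMetricValues": [{"score": 0.25}, {"score": 0.75}]},
]

summary: dict[str, list[float]] = {}
for result_object in collected:
    summary.update(extract_scores(result_object))
```

Matching on the field-name suffix keeps the helper short, at the cost of depending on a naming convention that is observed in this reference rather than guaranteed by it.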