REST Resource: projects.locations.apps.evaluations.results

Resource: EvaluationResult

An evaluation result represents the output of running an Evaluation.

JSON representation

{
  "name": string,
  "displayName": string,
  "createTime": string,
  "evaluationStatus": enum (EvaluationResult.Outcome),
  "evaluationRun": string,
  "persona": {
    object (EvaluationPersona)
  },
  "errorInfo": {
    object (EvaluationErrorInfo)
  },
  "error": {
    object (Status)
  },
  "initiatedBy": string,
  "appVersion": string,
  "appVersionDisplayName": string,
  "changelog": string,
  "executionState": enum (EvaluationResult.ExecutionState),
  "evaluationMetricsThresholds": {
    object (EvaluationMetricsThresholds)
  },
  "config": {
    object (EvaluationConfig)
  },
  "goldenRunMethod": enum (GoldenRunMethod),

  // Union field result can be only one of the following:
  "goldenResult": {
    object (EvaluationResult.GoldenResult)
  },
  "scenarioResult": {
    object (EvaluationResult.ScenarioResult)
  }
  // End of list of possible types for union field result.
}

Fields
`name`	`string` Identifier. The unique identifier of the evaluation result. Format: `projects/{project}/locations/{location}/apps/{app}/evaluations/{evaluation}/results/{result}`
`displayName`	`string` Required. Display name of the Evaluation Result. Unique within an Evaluation. By default, it has the following format: " result - ".
`createTime`	`string (Timestamp format)` Output only. Timestamp when the evaluation result was created. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`evaluationStatus`	`enum (EvaluationResult.Outcome)` Output only. The outcome of the evaluation. Only populated if `executionState` is `COMPLETED`.
`evaluationRun`	`string` Output only. The evaluation run that produced this result. Format: `projects/{project}/locations/{location}/apps/{app}/evaluationRuns/{evaluationRun}`
`persona`	`object (EvaluationPersona)` Output only. The persona used to generate the conversation for the evaluation result.
`errorInfo`	`object (EvaluationErrorInfo)` Output only. Error information for the evaluation result.
`error (deprecated)`	`object (Status)` This item is deprecated! Output only. Deprecated: Use `errorInfo` instead. Errors encountered during execution.
`initiatedBy`	`string` Output only. The user who initiated the evaluation run that resulted in this result.
`appVersion`	`string` Output only. The app version used to generate the conversation that resulted in this result. Format: `projects/{project}/locations/{location}/apps/{app}/versions/{version}`
`appVersionDisplayName`	`string` Output only. The display name of the `appVersion` that the evaluation ran against.
`changelog`	`string` Output only. The changelog of the app version that the evaluation ran against. Populated if the user runs the evaluation on latest/draft.
`executionState`	`enum (EvaluationResult.ExecutionState)` Output only. The state of the evaluation result execution.
`evaluationMetricsThresholds`	`object (EvaluationMetricsThresholds)` Output only. The evaluation thresholds for the result.
`config`	`object (EvaluationConfig)` Output only. The configuration used in the evaluation run that resulted in this result.
`goldenRunMethod`	`enum (GoldenRunMethod)` Output only. The method used to run the golden evaluation.
Union field `result`. The result of the evaluation. Only populated when `executionState` is `COMPLETED`. `result` can be only one of the following:
`goldenResult`	`object (EvaluationResult.GoldenResult)` Output only. The outcome of a golden evaluation.
`scenarioResult`	`object (EvaluationResult.ScenarioResult)` Output only. The outcome of a scenario evaluation.
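
As a rough illustration of how a client might read these fields, the sketch below inspects an already-decoded `EvaluationResult` JSON object. The helper name `summarize_result` and the sample payload are hypothetical; only the field names and enum values are taken from this reference.

```python
from typing import Any, Dict, Optional, Tuple

# The two members of the `result` union field, as listed in this reference.
RESULT_UNION_FIELDS = ("goldenResult", "scenarioResult")


def summarize_result(result: Dict[str, Any]) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (executionState, evaluationStatus, populated union member).

    Hypothetical helper: it only reads keys from a decoded JSON dict.
    """
    state = result.get("executionState", "EXECUTION_STATE_UNSPECIFIED")
    if state != "COMPLETED":
        # The outcome and the union field are only populated after completion.
        return state, None, None
    kind = next((f for f in RESULT_UNION_FIELDS if f in result), None)
    return state, result.get("evaluationStatus"), kind


# Made-up sample payload, for illustration only.
sample = {
    "executionState": "COMPLETED",
    "evaluationStatus": "PASS",
    "goldenResult": {"turnReplayResults": []},
}
print(summarize_result(sample))
```

Checking `executionState` first avoids treating a missing `evaluationStatus` on a running or errored result as a failure.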

EvaluationResult.GoldenResult

The result of a golden evaluation.

JSON representation
{
  "turnReplayResults": [
    {
      object (EvaluationResult.GoldenResult.TurnReplayResult)
    }
  ]
}

Fields
`turnReplayResults[]`	`object (EvaluationResult.GoldenResult.TurnReplayResult)` Output only. The result of running each turn of the golden conversation.

EvaluationResult.GoldenResult.TurnReplayResult

The result of running a single turn of the golden conversation.

JSON representation

{
  "conversation": string,
  "expectationOutcome": [
    {
      object (EvaluationResult.GoldenExpectationOutcome)
    }
  ],
  "hallucinationResult": {
    object (EvaluationResult.HallucinationResult)
  },
  "toolInvocationScore": number,
  "turnLatency": string,
  "toolCallLatencies": [
    {
      object (EvaluationResult.ToolCallLatency)
    }
  ],
  "semanticSimilarityResult": {
    object (EvaluationResult.SemanticSimilarityResult)
  },
  "overallToolInvocationResult": {
    object (EvaluationResult.OverallToolInvocationResult)
  },
  "errorInfo": {
    object (EvaluationErrorInfo)
  },
  "toolOrderedInvocationScore": number
}

Fields
`conversation`	`string` Output only. The conversation that was generated for this turn.
`expectationOutcome[]`	`object (EvaluationResult.GoldenExpectationOutcome)` Output only. The outcome of each expectation.
`hallucinationResult`	`object (EvaluationResult.HallucinationResult)` Output only. The result of the hallucination check.
`toolInvocationScore (deprecated)`	`number` This item is deprecated! Output only. Deprecated: Use `overallToolInvocationResult` instead.
`turnLatency`	`string (Duration format)` Output only. Duration of the turn. A duration in seconds with up to nine fractional digits, ending with '`s`'. Example: `"3.5s"`.
`toolCallLatencies[]`	`object (EvaluationResult.ToolCallLatency)` Output only. The latency of each tool call in the turn.
`semanticSimilarityResult`	`object (EvaluationResult.SemanticSimilarityResult)` Output only. The result of the semantic similarity check.
`overallToolInvocationResult`	`object (EvaluationResult.OverallToolInvocationResult)` Output only. The result of the overall tool invocation check.
`errorInfo`	`object (EvaluationErrorInfo)` Output only. Information about the error that occurred during this turn.
`toolOrderedInvocationScore`	`number` Output only. The overall tool ordered invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked in the expected order.
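
The Duration format used by `turnLatency` is a decimal number of seconds followed by `s`. A minimal sketch of converting such strings to numbers follows; the function name `parse_duration_seconds` is hypothetical, and `Decimal` is chosen here only to keep all nine fractional digits exact.

```python
from decimal import Decimal, InvalidOperation


def parse_duration_seconds(value: str) -> Decimal:
    """Convert a Duration string such as "3.5s" into seconds."""
    if not value.endswith("s"):
        raise ValueError(f"not a Duration string: {value!r}")
    try:
        return Decimal(value[:-1])
    except InvalidOperation as exc:
        raise ValueError(f"not a Duration string: {value!r}") from exc


# Example: total latency across several turns (made-up values).
turn_latencies = ["3.5s", "0.250s", "1s"]
total = sum((parse_duration_seconds(v) for v in turn_latencies), Decimal(0))
print(total)
```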

EvaluationResult.GoldenExpectationOutcome

Specifies the expectation and the result of that expectation.

JSON representation

{
  "expectation": {
    object (Evaluation.GoldenExpectation)
  },
  "outcome": enum (EvaluationResult.Outcome),
  "semanticSimilarityResult": {
    object (EvaluationResult.SemanticSimilarityResult)
  },
  "toolInvocationResult": {
    object (EvaluationResult.GoldenExpectationOutcome.ToolInvocationResult)
  },

  // Union field result can be only one of the following:
  "observedToolCall": {
    object (ToolCall)
  },
  "observedToolResponse": {
    object (ToolResponse)
  },
  "observedAgentResponse": {
    object (Message)
  },
  "observedAgentTransfer": {
    object (AgentTransfer)
  }
  // End of list of possible types for union field result.
}

Fields
`expectation`	`object (Evaluation.GoldenExpectation)` Output only. The expectation that was evaluated.
`outcome`	`enum (EvaluationResult.Outcome)` Output only. The outcome of the expectation.
`semanticSimilarityResult (deprecated)`	`object (EvaluationResult.SemanticSimilarityResult)` This item is deprecated! Output only. The result of the semantic similarity check.
`toolInvocationResult`	`object (EvaluationResult.GoldenExpectationOutcome.ToolInvocationResult)` Output only. The result of the tool invocation check.
Union field `result`. The result of the expectation. `result` can be only one of the following:
`observedToolCall`	`object (ToolCall)` Output only. The result of the tool call expectation.
`observedToolResponse`	`object (ToolResponse)` Output only. The result of the tool response expectation.
`observedAgentResponse`	`object (Message)` Output only. The result of the agent response expectation.
`observedAgentTransfer`	`object (AgentTransfer)` Output only. The result of the agent transfer expectation.

EvaluationResult.Outcome

The outcome of the evaluation or expectation.

Enums
`OUTCOME_UNSPECIFIED`	Evaluation outcome is not specified.
`PASS`	Evaluation/Expectation passed. In the case of an evaluation, this means that all expectations were met.
`FAIL`	Evaluation/Expectation failed. In the case of an evaluation, this means that at least one expectation was not met.
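
The PASS/FAIL rule for an evaluation can be mirrored on the client side, for example to double-check a result. The sketch below is illustrative only: the service computes `evaluationStatus` itself, the name `evaluation_outcome` is hypothetical, and treating an empty list as PASS and `OUTCOME_UNSPECIFIED` as not met are assumptions.

```python
from typing import Iterable


def evaluation_outcome(expectation_outcomes: Iterable[str]) -> str:
    """Return "PASS" only if every expectation outcome is "PASS".

    Assumptions: an empty list yields "PASS", and any value other than
    "PASS" (including "OUTCOME_UNSPECIFIED") counts as not met.
    """
    return "PASS" if all(o == "PASS" for o in expectation_outcomes) else "FAIL"


print(evaluation_outcome(["PASS", "PASS"]))
print(evaluation_outcome(["PASS", "FAIL"]))
```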

EvaluationResult.SemanticSimilarityResult

The result of the semantic similarity check.

JSON representation
{
  "label": string,
  "explanation": string,
  "outcome": enum (EvaluationResult.Outcome),
  "score": integer
}

Fields
`label`	`string` Output only. The label associated with each score. Score 4: Fully Consistent; Score 3: Mostly Consistent; Score 2: Partially Consistent (Minor Omissions); Score 1: Largely Inconsistent (Major Omissions); Score 0: Completely Inconsistent / Contradictory.
`explanation`	`string` Output only. The explanation for the semantic similarity score.
`outcome`	`enum (EvaluationResult.Outcome)` Output only. The outcome of the semantic similarity check. This is determined by comparing the score to the semanticSimilaritySuccessThreshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.
`score`	`integer` Output only. The semantic similarity score. Can be 0, 1, 2, 3, or 4.
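
The threshold comparison described for `outcome` can be sketched as follows. The function name and the way the threshold is passed in are hypothetical; the label table and the "equal to or above" rule come from the field descriptions above.

```python
# Labels for each semantic similarity score, as listed for the `label` field.
SEMANTIC_SIMILARITY_LABELS = {
    4: "Fully Consistent",
    3: "Mostly Consistent",
    2: "Partially Consistent (Minor Omissions)",
    1: "Largely Inconsistent (Major Omissions)",
    0: "Completely Inconsistent / Contradictory",
}


def semantic_similarity_outcome(score: int, threshold: int) -> str:
    """Return "PASS" if score is equal to or above threshold, else "FAIL".

    `threshold` stands in for semanticSimilaritySuccessThreshold.
    """
    if score not in SEMANTIC_SIMILARITY_LABELS:
        raise ValueError(f"score must be 0, 1, 2, 3, or 4; got {score}")
    return "PASS" if score >= threshold else "FAIL"


print(semantic_similarity_outcome(3, threshold=3), SEMANTIC_SIMILARITY_LABELS[3])
```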

EvaluationResult.GoldenExpectationOutcome.ToolInvocationResult

The result of the tool invocation check.

JSON representation
{
  "outcome": enum (EvaluationResult.Outcome),
  "parameterCorrectnessScore": number
}

Fields
`outcome`	`enum (EvaluationResult.Outcome)` Output only. The outcome of the tool invocation check. This is determined by comparing the parameterCorrectnessScore to the threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.
`parameterCorrectnessScore`	`number` Output only. The tool invocation parameter correctness score. This indicates the percent of parameters from the expected tool call that were also present in the actual tool call.
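
To make the idea behind `parameterCorrectnessScore` concrete, here is one possible reading of it as a sketch. This is not the service's actual scoring code: the function name is hypothetical, and it assumes the score is a fraction between 0.0 and 1.0, that parameters are matched by name only, and that an empty expected call scores 1.0.

```python
from typing import Any, Mapping


def parameter_presence_fraction(
    expected: Mapping[str, Any], actual: Mapping[str, Any]
) -> float:
    """Fraction of expected parameter names that also appear in the actual call."""
    if not expected:
        # Assumption: with nothing expected, nothing can be missing.
        return 1.0
    present = sum(1 for name in expected if name in actual)
    return present / len(expected)


# Made-up tool call arguments, for illustration only.
expected_args = {"city": "Paris", "units": "metric"}
actual_args = {"city": "Paris"}
print(parameter_presence_fraction(expected_args, actual_args))  # 0.5
```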

EvaluationResult.HallucinationResult

The result of the hallucination check for a single turn.

JSON representation
{
  "label": string,
  "explanation": string,
  "score": integer
}

Fields
`label`	`string` Output only. The label associated with each score. Score 1: Justified; Score 0: Not Justified; Score -1: No Claim To Assess.
`explanation`	`string` Output only. The explanation for the hallucination score.
`score`	`integer` Output only. The hallucination score. Can be -1, 0, 1.

EvaluationResult.ToolCallLatency

The latency of a tool call execution.

JSON representation
{
  "tool": string,
  "displayName": string,
  "startTime": string,
  "endTime": string,
  "executionLatency": string
}

Fields
`tool`	`string` Output only. The name of the tool that got executed. Format: `projects/{project}/locations/{location}/apps/{app}/tools/{tool}`.
`displayName`	`string` Output only. The display name of the tool.
`startTime`	`string (Timestamp format)` Output only. The start time of the tool call execution. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`endTime`	`string (Timestamp format)` Output only. The end time of the tool call execution. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`.
`executionLatency`	`string (Duration format)` Output only. The latency of the tool call execution. A duration in seconds with up to nine fractional digits, ending with '`s`'. Example: `"3.5s"`.
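
The Timestamp strings in `startTime` and `endTime` can carry up to nine fractional digits, which is more than Python's `datetime` stores. The sketch below shows one way to parse them and compute the elapsed time; the helper names are hypothetical, and truncating to microseconds is a simplification of this sketch, not something the API does.

```python
import re
from datetime import datetime, timedelta

# RFC 3339 timestamp: date, time, optional fraction, then "Z" or a numeric offset.
_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?(Z|[+-]\d{2}:\d{2})$"
)


def parse_timestamp(value: str) -> datetime:
    """Parse a Timestamp string into a timezone-aware datetime."""
    match = _TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"not an RFC 3339 timestamp: {value!r}")
    base, fraction, offset = match.groups()
    # datetime keeps microseconds only, so digits beyond the sixth are dropped.
    micros = (fraction or "")[:6].ljust(6, "0")
    offset = "+00:00" if offset == "Z" else offset
    return datetime.fromisoformat(f"{base}.{micros}{offset}")


def elapsed(start_time: str, end_time: str) -> timedelta:
    """Time between startTime and endTime of a tool call."""
    return parse_timestamp(end_time) - parse_timestamp(start_time)


print(elapsed("2014-10-02T15:01:23Z", "2014-10-02T15:01:26.500Z"))
```

When sub-microsecond precision matters, the `executionLatency` Duration string is the safer field to read.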

EvaluationResult.OverallToolInvocationResult

The result of the overall tool invocation check.

JSON representation
{
  "outcome": enum (EvaluationResult.Outcome),
  "toolInvocationScore": number
}

Fields
`outcome`	`enum (EvaluationResult.Outcome)` Output only. The outcome of the tool invocation check. This is determined by comparing the toolInvocationScore to the overallToolInvocationCorrectnessThreshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.
`toolInvocationScore`	`number` The overall tool invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked.

EvaluationResult.ScenarioResult

The outcome of a scenario evaluation.

JSON representation

{
  "conversation": string,
  "expectationOutcomes": [
    {
      object (EvaluationResult.ScenarioExpectationOutcome)
    }
  ],
  "rubricOutcomes": [
    {
      object (EvaluationResult.ScenarioRubricOutcome)
    }
  ],
  "hallucinationResult": [
    {
      object (EvaluationResult.HallucinationResult)
    }
  ],
  "taskCompletionResult": {
    object (EvaluationResult.TaskCompletionResult)
  },
  "toolCallLatencies": [
    {
      object (EvaluationResult.ToolCallLatency)
    }
  ],
  "userGoalSatisfactionResult": {
    object (EvaluationResult.UserGoalSatisfactionResult)
  },
  "allExpectationsSatisfied": boolean,
  "taskCompleted": boolean
}

Fields
`conversation`	`string` Output only. The conversation that was generated in the scenario.
`expectationOutcomes[]`	`object (EvaluationResult.ScenarioExpectationOutcome)` Output only. The outcome of each expectation.
`rubricOutcomes[]`	`object (EvaluationResult.ScenarioRubricOutcome)` Output only. The outcome of the rubric.
`hallucinationResult[]`	`object (EvaluationResult.HallucinationResult)` Output only. The result of the hallucination check. There will be one hallucination result for each turn in the conversation.
`taskCompletionResult (deprecated)`	`object (EvaluationResult.TaskCompletionResult)` This item is deprecated! Output only. The result of the task completion check.
`toolCallLatencies[]`	`object (EvaluationResult.ToolCallLatency)` Output only. The latency of each tool call execution in the conversation.
`userGoalSatisfactionResult`	`object (EvaluationResult.UserGoalSatisfactionResult)` Output only. The result of the user goal satisfaction check.
`allExpectationsSatisfied`	`boolean` Output only. Whether all expectations were satisfied for this turn.
`taskCompleted`	`boolean` Output only. Whether the task was completed for this turn. This is a composite of all expectations satisfied, no hallucinations, and user goal satisfaction.
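
The composite described for `taskCompleted` can be approximated on the client side, for instance as a consistency check against the value the service returns. The sketch below is only one interpretation: the function name is hypothetical, and it assumes that a hallucination score of 0 (Not Justified) is the only value that counts as a hallucination and that only a user goal satisfaction score of 1 counts as satisfied.

```python
from typing import Any, Dict


def derive_task_completed(scenario_result: Dict[str, Any]) -> bool:
    """Recompute the taskCompleted composite from a decoded ScenarioResult dict."""
    all_expectations = bool(scenario_result.get("allExpectationsSatisfied", False))
    # Assumption: scores 1 (Justified) and -1 (No Claim To Assess) are acceptable.
    no_hallucinations = all(
        h.get("score") != 0 for h in scenario_result.get("hallucinationResult", [])
    )
    # Assumption: only score 1 means the user goal was satisfied.
    goal = scenario_result.get("userGoalSatisfactionResult", {}).get("score") == 1
    return all_expectations and no_hallucinations and goal


# Made-up sample payload, for illustration only.
sample_scenario = {
    "allExpectationsSatisfied": True,
    "hallucinationResult": [{"score": 1}, {"score": -1}],
    "userGoalSatisfactionResult": {"score": 1},
}
print(derive_task_completed(sample_scenario))
```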

EvaluationResult.ScenarioExpectationOutcome

The outcome of a scenario expectation.

JSON representation

{
  "expectation": {
    object (Evaluation.ScenarioExpectation)
  },
  "outcome": enum (EvaluationResult.Outcome),

  // Union field result can be only one of the following:
  "observedToolCall": {
    object (EvaluationResult.ScenarioExpectationOutcome.ObservedToolCall)
  },
  "observedAgentResponse": {
    object (Message)
  }
  // End of list of possible types for union field result.
}

Fields
`expectation`	`object (Evaluation.ScenarioExpectation)` Output only. The expectation that was evaluated.
`outcome`	`enum (EvaluationResult.Outcome)` Output only. The outcome of the ScenarioExpectation.
Union field `result`. The result of the expectation. `result` can be only one of the following:
`observedToolCall`	`object (EvaluationResult.ScenarioExpectationOutcome.ObservedToolCall)` Output only. The observed tool call.
`observedAgentResponse`	`object (Message)` Output only. The observed agent response.

EvaluationResult.ScenarioExpectationOutcome.ObservedToolCall

The observed tool call and response.

JSON representation
{
  "toolCall": {
    object (ToolCall)
  },
  "toolResponse": {
    object (ToolResponse)
  }
}

Fields
`toolCall`	`object (ToolCall)` Output only. The observed tool call.
`toolResponse`	`object (ToolResponse)` Output only. The observed tool response.

EvaluationResult.ScenarioRubricOutcome

The outcome of the evaluation against the rubric.

JSON representation
{
  "rubric": string,
  "scoreExplanation": string,
  "score": number
}

Fields
`rubric`	`string` Output only. The rubric that was used to evaluate the conversation.
`scoreExplanation`	`string` Output only. The rater's response to the rubric.
`score`	`number` Output only. The score of the conversation against the rubric.

EvaluationResult.TaskCompletionResult

The result of the task completion check for the conversation.

JSON representation
{
  "label": string,
  "explanation": string,
  "score": integer
}

Fields
`label`	`string` Output only. The label associated with each score. Score 1: Task Completed; Score 0: Task Not Completed; Score -1: User Goal Undefined.
`explanation`	`string` Output only. The explanation for the task completion score.
`score`	`integer` Output only. The task completion score. Can be -1, 0, or 1.

EvaluationResult.UserGoalSatisfactionResult

The result of a user goal satisfaction check for a conversation.

JSON representation
{
  "label": string,
  "explanation": string,
  "score": integer
}

Fields
`label`	`string` Output only. The label associated with each score. Score 1: User Task Satisfied; Score 0: User Task Not Satisfied; Score -1: User Task Unspecified.
`explanation`	`string` Output only. The explanation for the user task satisfaction score.
`score`	`integer` Output only. The user task satisfaction score. Can be -1, 0, 1.

EvaluationResult.ExecutionState

The state of the evaluation result execution.

Enums
`EXECUTION_STATE_UNSPECIFIED`	Evaluation result execution state is not specified.
`RUNNING`	Evaluation result execution is running.
`COMPLETED`	Evaluation result execution has completed.
`ERROR`	Evaluation result execution failed due to an internal error.

Methods
`delete`	Deletes an evaluation result.
`get`	Gets details of the specified evaluation result.
`list`	Lists all evaluation results for a given evaluation.
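
These methods address results through the resource name format given for the `name` field. A small sketch of splitting such a name into its parts, and of deriving the path of the evaluation that contains it, is shown below. The helper names are hypothetical, and the assumption that `list` is scoped to that evaluation path is inferred from the method description rather than stated in this reference.

```python
import re
from typing import Dict

# Mirrors the documented format of EvaluationResult.name.
_RESULT_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/apps/(?P<app>[^/]+)/evaluations/(?P<evaluation>[^/]+)"
    r"/results/(?P<result>[^/]+)$"
)


def parse_result_name(name: str) -> Dict[str, str]:
    """Split an evaluation result name into its path components."""
    match = _RESULT_NAME.match(name)
    if match is None:
        raise ValueError(f"not an evaluation result name: {name!r}")
    return match.groupdict()


def evaluation_path(name: str) -> str:
    """Return the path of the evaluation that contains the given result."""
    p = parse_result_name(name)
    return (
        f"projects/{p['project']}/locations/{p['location']}"
        f"/apps/{p['app']}/evaluations/{p['evaluation']}"
    )


# Placeholder identifiers, for illustration only.
example_name = "projects/p1/locations/us/apps/a1/evaluations/e1/results/r1"
print(parse_result_name(example_name)["result"])
print(evaluation_path(example_name))
```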
