REST Resource: projects.locations.apps.evaluations.results

Resource: EvaluationResult

An evaluation result represents the output of running an Evaluation.

JSON representation
{
  "name": string,
  "displayName": string,
  "createTime": string,
  "evaluationStatus": enum (EvaluationResult.Outcome),
  "evaluationRun": string,
  "persona": {
    object (EvaluationPersona)
  },
  "errorInfo": {
    object (EvaluationErrorInfo)
  },
  "error": {
    object (Status)
  },
  "initiatedBy": string,
  "appVersion": string,
  "appVersionDisplayName": string,
  "changelog": string,
  "executionState": enum (EvaluationResult.ExecutionState),
  "evaluationMetricsThresholds": {
    object (EvaluationMetricsThresholds)
  },
  "config": {
    object (EvaluationConfig)
  },
  "goldenRunMethod": enum (GoldenRunMethod),

  // Union field result can be only one of the following:
  "goldenResult": {
    object (EvaluationResult.GoldenResult)
  },
  "scenarioResult": {
    object (EvaluationResult.ScenarioResult)
  }
  // End of list of possible types for union field result.
}
Fields
name

string

Identifier. The unique identifier of the evaluation result. Format: projects/{project}/locations/{location}/apps/{app}/evaluations/{evaluation}/results/{result}
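The name format above can be split into its components on the client side. The following is a minimal sketch, not part of the API: the helper name parse_result_name and the sample IDs are hypothetical, and the pattern assumes that no path segment contains a slash.

```python
import re

# Pattern mirrors the documented format of EvaluationResult.name.
# Assumption: individual IDs never contain "/".
_RESULT_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/apps/(?P<app>[^/]+)"
    r"/evaluations/(?P<evaluation>[^/]+)"
    r"/results/(?P<result>[^/]+)$"
)


def parse_result_name(name: str) -> dict:
    """Split an evaluation result resource name into its path components."""
    match = _RESULT_NAME.match(name)
    if match is None:
        raise ValueError(f"not an evaluation result name: {name!r}")
    return match.groupdict()


# Hypothetical IDs, for illustration only.
parts = parse_result_name(
    "projects/my-project/locations/us/apps/my-app/evaluations/eval-1/results/res-1"
)
print(parts["evaluation"], parts["result"])
```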

displayName

string

Required. Display name of the Evaluation Result. Unique within an Evaluation. By default, it has the following format: " result - ".

createTime

string (Timestamp format)

Output only. Timestamp when the evaluation result was created.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".
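Timestamps in this format may carry up to nine fractional digits, which is more precision than Python's datetime can hold. The sketch below is one way to parse them; the helper name parse_timestamp is hypothetical, and truncating nanoseconds to microseconds is a choice made here, not something the API prescribes.

```python
import re
from datetime import datetime, timedelta, timezone

# Date, time, optional fraction (1 to 9 digits), then "Z" or a numeric offset.
_RFC3339 = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})[Tt](\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d{1,9}))?"
    r"([Zz]|[+-]\d{2}:\d{2})$"
)


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 timestamp into a timezone-aware datetime."""
    m = _RFC3339.match(value)
    if m is None:
        raise ValueError(f"not an RFC 3339 timestamp: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    # Pad the fraction to nanoseconds, then keep only the microsecond part.
    fraction = (m.group(7) or "").ljust(9, "0")
    microsecond = int(fraction[:6])
    offset = m.group(8)
    if offset in ("Z", "z"):
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        tz = timezone(
            sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6]))
        )
    return datetime(year, month, day, hour, minute, second, microsecond, tzinfo=tz)


for example in (
    "2014-10-02T15:01:23Z",
    "2014-10-02T15:01:23.045123456Z",
    "2014-10-02T15:01:23+05:30",
):
    print(parse_timestamp(example).isoformat())
```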

evaluationStatus

enum (EvaluationResult.Outcome)

Output only. The outcome of the evaluation. Only populated if executionState is COMPLETED.

evaluationRun

string

Output only. The evaluation run that produced this result. Format: projects/{project}/locations/{location}/apps/{app}/evaluationRuns/{evaluationRun}

persona

object (EvaluationPersona)

Output only. The persona used to generate the conversation for the evaluation result.

errorInfo

object (EvaluationErrorInfo)

Output only. Error information for the evaluation result.

error
(deprecated)

object (Status)

Output only. Deprecated: Use errorInfo instead. Errors encountered during execution.

initiatedBy

string

Output only. The user who initiated the evaluation run that resulted in this result.

appVersion

string

Output only. The app version used to generate the conversation that resulted in this result. Format: projects/{project}/locations/{location}/apps/{app}/versions/{version}

appVersionDisplayName

string

Output only. The display name of the appVersion that the evaluation ran against.

changelog

string

Output only. The changelog of the app version that the evaluation ran against. This is populated if the user runs the evaluation on latest/draft.

executionState

enum (EvaluationResult.ExecutionState)

Output only. The state of the evaluation result execution.

evaluationMetricsThresholds

object (EvaluationMetricsThresholds)

Output only. The evaluation thresholds for the result.

config

object (EvaluationConfig)

Output only. The configuration used in the evaluation run that resulted in this result.

goldenRunMethod

enum (GoldenRunMethod)

Output only. The method used to run the golden evaluation.

Union field result. The result of the evaluation. Only populated when executionState is COMPLETED. result can be only one of the following:
goldenResult

object (EvaluationResult.GoldenResult)

Output only. The outcome of a golden evaluation.

scenarioResult

object (EvaluationResult.ScenarioResult)

Output only. The outcome of a scenario evaluation.
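Because result is a union field, a client typically needs to check which variant is present. The following sketch assumes the resource has already been decoded from JSON into a dict; the helper name extract_result is hypothetical.

```python
from typing import Any, Optional, Tuple


def extract_result(evaluation_result: dict) -> Optional[Tuple[str, Any]]:
    """Return (variant name, payload) for the union field, or None if unset."""
    # The union is only populated once execution has completed.
    if evaluation_result.get("executionState") != "COMPLETED":
        return None
    # At most one of these keys is expected to be present.
    for key in ("goldenResult", "scenarioResult"):
        if key in evaluation_result:
            return key, evaluation_result[key]
    return None


# Hypothetical decoded response, for illustration only.
sample = {
    "executionState": "COMPLETED",
    "goldenResult": {"turnReplayResults": []},
}
print(extract_result(sample))
```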

EvaluationResult.GoldenResult

The result of a golden evaluation.

JSON representation
{
  "turnReplayResults": [
    {
      object (EvaluationResult.GoldenResult.TurnReplayResult)
    }
  ]
}
Fields
turnReplayResults[]

object (EvaluationResult.GoldenResult.TurnReplayResult)

Output only. The result of running each turn of the golden conversation.

EvaluationResult.GoldenResult.TurnReplayResult

The result of running a single turn of the golden conversation.

JSON representation
{
  "conversation": string,
  "expectationOutcome": [
    {
      object (EvaluationResult.GoldenExpectationOutcome)
    }
  ],
  "hallucinationResult": {
    object (EvaluationResult.HallucinationResult)
  },
  "toolInvocationScore": number,
  "turnLatency": string,
  "toolCallLatencies": [
    {
      object (EvaluationResult.ToolCallLatency)
    }
  ],
  "semanticSimilarityResult": {
    object (EvaluationResult.SemanticSimilarityResult)
  },
  "overallToolInvocationResult": {
    object (EvaluationResult.OverallToolInvocationResult)
  },
  "errorInfo": {
    object (EvaluationErrorInfo)
  },
  "toolOrderedInvocationScore": number
}
Fields
conversation

string

Output only. The conversation that was generated for this turn.

expectationOutcome[]

object (EvaluationResult.GoldenExpectationOutcome)

Output only. The outcome of each expectation.

hallucinationResult

object (EvaluationResult.HallucinationResult)

Output only. The result of the hallucination check.

toolInvocationScore
(deprecated)

number

Output only. Deprecated: Use overallToolInvocationResult instead.

turnLatency

string (Duration format)

Output only. Duration of the turn.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".
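A Duration string of this shape can be converted to a numeric value without losing precision by using Decimal. This is a minimal sketch; the helper name parse_duration_seconds is hypothetical.

```python
import re
from decimal import Decimal

# Seconds with an optional fraction of up to nine digits, ending with "s".
_DURATION = re.compile(r"^(-?\d+(?:\.\d{1,9})?)s$")


def parse_duration_seconds(value: str) -> Decimal:
    """Convert a Duration string such as "3.5s" to a number of seconds."""
    m = _DURATION.match(value)
    if m is None:
        raise ValueError(f"not a Duration string: {value!r}")
    return Decimal(m.group(1))


print(parse_duration_seconds("3.5s"))
```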

toolCallLatencies[]

object (EvaluationResult.ToolCallLatency)

Output only. The latency of each tool call in the turn.

semanticSimilarityResult

object (EvaluationResult.SemanticSimilarityResult)

Output only. The result of the semantic similarity check.

overallToolInvocationResult

object (EvaluationResult.OverallToolInvocationResult)

Output only. The result of the overall tool invocation check.

errorInfo

object (EvaluationErrorInfo)

Output only. Information about the error that occurred during this turn.

toolOrderedInvocationScore

number

Output only. The overall tool ordered invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked in the expected order.

EvaluationResult.GoldenExpectationOutcome

Specifies the expectation and the result of that expectation.

JSON representation
{
  "expectation": {
    object (Evaluation.GoldenExpectation)
  },
  "outcome": enum (EvaluationResult.Outcome),
  "semanticSimilarityResult": {
    object (EvaluationResult.SemanticSimilarityResult)
  },
  "toolInvocationResult": {
    object (EvaluationResult.GoldenExpectationOutcome.ToolInvocationResult)
  },

  // Union field result can be only one of the following:
  "observedToolCall": {
    object (ToolCall)
  },
  "observedToolResponse": {
    object (ToolResponse)
  },
  "observedAgentResponse": {
    object (Message)
  },
  "observedAgentTransfer": {
    object (AgentTransfer)
  }
  // End of list of possible types for union field result.
}
Fields
expectation

object (Evaluation.GoldenExpectation)

Output only. The expectation that was evaluated.

outcome

enum (EvaluationResult.Outcome)

Output only. The outcome of the expectation.

semanticSimilarityResult
(deprecated)

object (EvaluationResult.SemanticSimilarityResult)

Output only. The result of the semantic similarity check.

toolInvocationResult

object (EvaluationResult.GoldenExpectationOutcome.ToolInvocationResult)

Output only. The result of the tool invocation check.

Union field result. The result of the expectation. result can be only one of the following:
observedToolCall

object (ToolCall)

Output only. The result of the tool call expectation.

observedToolResponse

object (ToolResponse)

Output only. The result of the tool response expectation.

observedAgentResponse

object (Message)

Output only. The result of the agent response expectation.

observedAgentTransfer

object (AgentTransfer)

Output only. The result of the agent transfer expectation.

EvaluationResult.Outcome

The outcome of the evaluation or expectation.

Enums
OUTCOME_UNSPECIFIED

Evaluation outcome is not specified.

PASS

Evaluation/Expectation passed. In the case of an evaluation, this means that all expectations were met.

FAIL

Evaluation/Expectation failed. In the case of an evaluation, this means that at least one expectation was not met.
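The rule for an evaluation (PASS only when every expectation is met) can be sketched as follows. The helper name evaluation_outcome is hypothetical, and two behaviors are assumptions made here rather than documented ones: an empty list yields OUTCOME_UNSPECIFIED, and any expectation outcome other than PASS counts as not met.

```python
from typing import Iterable


def evaluation_outcome(expectation_outcomes: Iterable[str]) -> str:
    """Roll per-expectation outcomes up into one evaluation outcome."""
    outcomes = list(expectation_outcomes)
    if not outcomes:
        # Assumption: with nothing to check, report no outcome.
        return "OUTCOME_UNSPECIFIED"
    # PASS requires every expectation to pass; anything else is a FAIL.
    return "PASS" if all(o == "PASS" for o in outcomes) else "FAIL"


print(evaluation_outcome(["PASS", "PASS"]))
print(evaluation_outcome(["PASS", "FAIL"]))
```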

EvaluationResult.SemanticSimilarityResult

The result of the semantic similarity check.

JSON representation
{
  "label": string,
  "explanation": string,
  "outcome": enum (EvaluationResult.Outcome),
  "score": integer
}
Fields
label

string

Output only. The label associated with each score:

Score 4: Fully Consistent
Score 3: Mostly Consistent
Score 2: Partially Consistent (Minor Omissions)
Score 1: Largely Inconsistent (Major Omissions)
Score 0: Completely Inconsistent / Contradictory

explanation

string

Output only. The explanation for the semantic similarity score.

outcome

enum (EvaluationResult.Outcome)

Output only. The outcome of the semantic similarity check. This is determined by comparing the score to the semanticSimilaritySuccessThreshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.

score

integer

Output only. The semantic similarity score. Can be 0, 1, 2, 3, or 4.
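The threshold comparison described for outcome can be illustrated with a short sketch. The helper name semantic_similarity_outcome is hypothetical, and treating the threshold as an integer on the same 0 to 4 scale is an assumption.

```python
# Labels as listed for the label field.
SEMANTIC_SIMILARITY_LABELS = {
    4: "Fully Consistent",
    3: "Mostly Consistent",
    2: "Partially Consistent (Minor Omissions)",
    1: "Largely Inconsistent (Major Omissions)",
    0: "Completely Inconsistent / Contradictory",
}


def semantic_similarity_outcome(score: int, threshold: int) -> str:
    """Return PASS when the score is equal to or above the threshold."""
    if score not in SEMANTIC_SIMILARITY_LABELS:
        raise ValueError(f"score must be 0, 1, 2, 3, or 4, got {score!r}")
    return "PASS" if score >= threshold else "FAIL"


# Hypothetical threshold of 3, for illustration only.
for score in (4, 3, 2):
    label = SEMANTIC_SIMILARITY_LABELS[score]
    print(score, label, semantic_similarity_outcome(score, threshold=3))
```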

EvaluationResult.GoldenExpectationOutcome.ToolInvocationResult

The result of the tool invocation check.

JSON representation
{
  "outcome": enum (EvaluationResult.Outcome),
  "parameterCorrectnessScore": number
}
Fields
outcome

enum (EvaluationResult.Outcome)

Output only. The outcome of the tool invocation check. This is determined by comparing the parameterCorrectnessScore to the threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.

parameterCorrectnessScore

number

Output only. The tool invocation parameter correctness score. This indicates the percent of parameters from the expected tool call that were also present in the actual tool call.
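To make the definition concrete, here is one possible reading of the parameter correctness score. This is an illustration only, not the service's actual computation: the helper name is hypothetical, "present" is taken to mean that the parameter name appears in the actual call (values are not compared), the result is expressed as a fraction between 0 and 1, and an expected call with no parameters is scored 1.0.

```python
def parameter_correctness_score(expected_args: dict, actual_args: dict) -> float:
    """Fraction of expected parameter names that appear in the actual call."""
    if not expected_args:
        # Assumption: nothing was expected, so nothing can be missing.
        return 1.0
    matched = sum(1 for name in expected_args if name in actual_args)
    return matched / len(expected_args)


# Hypothetical tool arguments, for illustration only.
expected = {"city": "Paris", "units": "metric", "days": 3, "lang": "en"}
actual = {"city": "Paris", "units": "imperial", "days": 3}
print(parameter_correctness_score(expected, actual))
```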

EvaluationResult.HallucinationResult

The result of the hallucination check for a single turn.

JSON representation
{
  "label": string,
  "explanation": string,
  "score": integer
}
Fields
label

string

Output only. The label associated with each score:

Score 1: Justified
Score 0: Not Justified
Score -1: No Claim To Assess

explanation

string

Output only. The explanation for the hallucination score.

score

integer

Output only. The hallucination score. Can be -1, 0, 1.
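When several hallucination results are available (for example, one per turn), it can be useful to tally them by label. A minimal sketch, assuming the results have been decoded from JSON into dicts; the helper name tally_hallucination_scores is hypothetical.

```python
from collections import Counter

# Labels as listed for the label field.
HALLUCINATION_LABELS = {
    1: "Justified",
    0: "Not Justified",
    -1: "No Claim To Assess",
}


def tally_hallucination_scores(results: list) -> dict:
    """Count hallucination results per label; raises KeyError on unknown scores."""
    counts = Counter(HALLUCINATION_LABELS[r["score"]] for r in results)
    # Include labels with a zero count so the output shape is stable.
    return {label: counts.get(label, 0) for label in HALLUCINATION_LABELS.values()}


# Hypothetical per-turn results, for illustration only.
turns = [{"score": 1}, {"score": 1}, {"score": -1}, {"score": 0}]
print(tally_hallucination_scores(turns))
```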

EvaluationResult.ToolCallLatency

The latency of a tool call execution.

JSON representation
{
  "tool": string,
  "displayName": string,
  "startTime": string,
  "endTime": string,
  "executionLatency": string
}
Fields
tool

string

Output only. The name of the tool that got executed. Format: projects/{project}/locations/{location}/apps/{app}/tools/{tool}.

displayName

string

Output only. The display name of the tool.

startTime

string (Timestamp format)

Output only. The start time of the tool call execution.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

endTime

string (Timestamp format)

Output only. The end time of the tool call execution.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

executionLatency

string (Duration format)

Output only. The latency of the tool call execution.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".
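A list of ToolCallLatency objects can be aggregated per tool on the client side. The sketch below assumes the objects have been decoded from JSON into dicts and that every entry carries tool and executionLatency; the helper name total_latency_by_tool and the sample values are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def total_latency_by_tool(tool_call_latencies: list) -> dict:
    """Sum executionLatency (in seconds) per tool resource name."""
    totals = defaultdict(Decimal)  # Decimal() is zero
    for entry in tool_call_latencies:
        latency = entry["executionLatency"]
        if not latency.endswith("s"):
            raise ValueError(f"not a Duration string: {latency!r}")
        # Strip the trailing "s" and add the remaining number of seconds.
        totals[entry["tool"]] += Decimal(latency[:-1])
    return dict(totals)


# Hypothetical latencies, for illustration only.
latencies = [
    {"tool": "projects/p/locations/l/apps/a/tools/search", "executionLatency": "1.5s"},
    {"tool": "projects/p/locations/l/apps/a/tools/search", "executionLatency": "0.25s"},
    {"tool": "projects/p/locations/l/apps/a/tools/lookup", "executionLatency": "2s"},
]
for tool, seconds in total_latency_by_tool(latencies).items():
    print(tool, seconds)
```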

EvaluationResult.OverallToolInvocationResult

The result of the overall tool invocation check.

JSON representation
{
  "outcome": enum (EvaluationResult.Outcome),
  "toolInvocationScore": number
}
Fields
outcome

enum (EvaluationResult.Outcome)

Output only. The outcome of the tool invocation check. This is determined by comparing the toolInvocationScore to the overallToolInvocationCorrectnessThreshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL.

toolInvocationScore

number

The overall tool invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked.
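One possible reading of this score is sketched below. It is an illustration, not the service's actual computation: the helper name is hypothetical, tools are matched by name only, repeated expected tools must each be matched by a separate invocation, order is ignored, the result is a fraction between 0 and 1, and a turn that expects no tools is scored 1.0.

```python
from collections import Counter


def tool_invocation_score(expected_tools: list, invoked_tools: list) -> float:
    """Fraction of expected tool calls that were actually invoked."""
    if not expected_tools:
        # Assumption: nothing was expected, so nothing can be missing.
        return 1.0
    remaining = Counter(invoked_tools)
    matched = 0
    for tool in expected_tools:
        # Consume one matching invocation per expected call.
        if remaining[tool] > 0:
            remaining[tool] -= 1
            matched += 1
    return matched / len(expected_tools)


# Hypothetical tool names, for illustration only.
print(tool_invocation_score(["search", "lookup", "book", "pay"], ["search", "book", "pay"]))
```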

EvaluationResult.ScenarioResult

The outcome of a scenario evaluation.

JSON representation
{
  "conversation": string,
  "expectationOutcomes": [
    {
      object (EvaluationResult.ScenarioExpectationOutcome)
    }
  ],
  "rubricOutcomes": [
    {
      object (EvaluationResult.ScenarioRubricOutcome)
    }
  ],
  "hallucinationResult": [
    {
      object (EvaluationResult.HallucinationResult)
    }
  ],
  "taskCompletionResult": {
    object (EvaluationResult.TaskCompletionResult)
  },
  "toolCallLatencies": [
    {
      object (EvaluationResult.ToolCallLatency)
    }
  ],
  "userGoalSatisfactionResult": {
    object (EvaluationResult.UserGoalSatisfactionResult)
  },
  "allExpectationsSatisfied": boolean,
  "taskCompleted": boolean
}
Fields
conversation

string

Output only. The conversation that was generated in the scenario.

expectationOutcomes[]

object (EvaluationResult.ScenarioExpectationOutcome)

Output only. The outcome of each expectation.

rubricOutcomes[]

object (EvaluationResult.ScenarioRubricOutcome)

Output only. The outcome of the rubric.

hallucinationResult[]

object (EvaluationResult.HallucinationResult)

Output only. The result of the hallucination check. There will be one hallucination result for each turn in the conversation.

taskCompletionResult
(deprecated)

object (EvaluationResult.TaskCompletionResult)

Output only. The result of the task completion check.

toolCallLatencies[]

object (EvaluationResult.ToolCallLatency)

Output only. The latency of each tool call execution in the conversation.

userGoalSatisfactionResult

object (EvaluationResult.UserGoalSatisfactionResult)

Output only. The result of the user goal satisfaction check.

allExpectationsSatisfied

boolean

Output only. Whether all expectations were satisfied for this turn.

taskCompleted

boolean

Output only. Whether the task was completed for this turn. This is a composite of all expectations satisfied, no hallucinations, and user goal satisfaction.
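The composite described for taskCompleted can be sketched as a conjunction of the three checks. The exact rules are not stated here, so the following are assumptions: a hallucination score of 0 (Not Justified) counts as a hallucination, a score of -1 (No Claim To Assess) does not, and the user goal counts as satisfied only when its score is 1. The helper name task_completed is hypothetical.

```python
from typing import Iterable


def task_completed(
    all_expectations_satisfied: bool,
    hallucination_scores: Iterable[int],
    user_goal_satisfaction_score: int,
) -> bool:
    """Combine the three checks into a single task-completed flag."""
    # Assumption: only score 0 (Not Justified) signals a hallucination.
    no_hallucinations = all(score != 0 for score in hallucination_scores)
    # Assumption: only score 1 means the user goal was satisfied.
    goal_satisfied = user_goal_satisfaction_score == 1
    return all_expectations_satisfied and no_hallucinations and goal_satisfied


print(task_completed(True, [1, -1, 1], 1))
print(task_completed(True, [1, 0], 1))
```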

EvaluationResult.ScenarioExpectationOutcome

The outcome of a scenario expectation.

JSON representation
{
  "expectation": {
    object (Evaluation.ScenarioExpectation)
  },
  "outcome": enum (EvaluationResult.Outcome),

  // Union field result can be only one of the following:
  "observedToolCall": {
    object (EvaluationResult.ScenarioExpectationOutcome.ObservedToolCall)
  },
  "observedAgentResponse": {
    object (Message)
  }
  // End of list of possible types for union field result.
}
Fields
expectation

object (Evaluation.ScenarioExpectation)

Output only. The expectation that was evaluated.

outcome

enum (EvaluationResult.Outcome)

Output only. The outcome of the ScenarioExpectation.

Union field result. The result of the expectation. result can be only one of the following:
observedToolCall

object (EvaluationResult.ScenarioExpectationOutcome.ObservedToolCall)

Output only. The observed tool call.

observedAgentResponse

object (Message)

Output only. The observed agent response.

EvaluationResult.ScenarioExpectationOutcome.ObservedToolCall

The observed tool call and response.

JSON representation
{
  "toolCall": {
    object (ToolCall)
  },
  "toolResponse": {
    object (ToolResponse)
  }
}
Fields
toolCall

object (ToolCall)

Output only. The observed tool call.

toolResponse

object (ToolResponse)

Output only. The observed tool response.

EvaluationResult.ScenarioRubricOutcome

The outcome of the evaluation against the rubric.

JSON representation
{
  "rubric": string,
  "scoreExplanation": string,
  "score": number
}
Fields
rubric

string

Output only. The rubric that was used to evaluate the conversation.

scoreExplanation

string

Output only. The rater's response to the rubric.

score

number

Output only. The score of the conversation against the rubric.

EvaluationResult.TaskCompletionResult

The result of the task completion check for the conversation.

JSON representation
{
  "label": string,
  "explanation": string,
  "score": integer
}
Fields
label

string

Output only. The label associated with each score:

Score 1: Task Completed
Score 0: Task Not Completed
Score -1: User Goal Undefined

explanation

string

Output only. The explanation for the task completion score.

score

integer

Output only. The task completion score. Can be -1, 0, 1.

EvaluationResult.UserGoalSatisfactionResult

The result of a user goal satisfaction check for a conversation.

JSON representation
{
  "label": string,
  "explanation": string,
  "score": integer
}
Fields
label

string

Output only. The label associated with each score:

Score 1: User Task Satisfied
Score 0: User Task Not Satisfied
Score -1: User Task Unspecified

explanation

string

Output only. The explanation for the user task satisfaction score.

score

integer

Output only. The user task satisfaction score. Can be -1, 0, 1.

EvaluationResult.ExecutionState

The state of the evaluation result execution.

Enums
EXECUTION_STATE_UNSPECIFIED

Evaluation result execution state is not specified.

RUNNING

Evaluation result execution is running.

COMPLETED

Evaluation result execution has completed.

ERROR

Evaluation result execution failed due to an internal error.

Methods

delete

Deletes an evaluation result.

get

Gets details of the specified evaluation result.

list

Lists all evaluation results for a given evaluation.