- Resource: EvaluationResult
- JSON representation
- EvaluationResult.GoldenResult
- EvaluationResult.GoldenResult.TurnReplayResult
- EvaluationResult.GoldenExpectationOutcome
- EvaluationResult.Outcome
- EvaluationResult.SemanticSimilarityResult
- EvaluationResult.GoldenExpectationOutcome.ToolInvocationResult
- EvaluationResult.HallucinationResult
- EvaluationResult.ToolCallLatency
- EvaluationResult.OverallToolInvocationResult
- EvaluationResult.ScenarioResult
- EvaluationResult.ScenarioExpectationOutcome
- EvaluationResult.ScenarioExpectationOutcome.ObservedToolCall
- EvaluationResult.ScenarioRubricOutcome
- EvaluationResult.TaskCompletionResult
- EvaluationResult.UserGoalSatisfactionResult
- EvaluationResult.ExecutionState
- Methods
Resource: EvaluationResult
An evaluation result represents the output of running an Evaluation.
| JSON representation |
|---|
{ "name": string, "displayName": string, "createTime": string, "evaluationStatus": enum ( |
| Fields | |
|---|---|
name |
Identifier. The unique identifier of the evaluation result. Format: |
displayName |
Required. Display name of the Evaluation Result. Unique within an Evaluation. By default, it has the following format: " |
createTime |
Output only. Timestamp when the evaluation result was created. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
evaluationStatus |
Output only. The outcome of the evaluation. Only populated if executionState is COMPLETED. |
evaluationRun |
Output only. The evaluation run that produced this result. Format: |
persona |
Output only. The persona used to generate the conversation for the evaluation result. |
errorInfo |
Output only. Error information for the evaluation result. |
error |
Output only. Deprecated: Use |
initiatedBy |
Output only. The user who initiated the evaluation run that resulted in this result. |
appVersion |
Output only. The app version used to generate the conversation that resulted in this result. Format: |
appVersionDisplayName |
Output only. The display name of the |
changelog |
Output only. The changelog of the app version that the evaluation ran against. This is populated if the user runs the evaluation on latest/draft. |
executionState |
Output only. The state of the evaluation result execution. |
evaluationMetricsThresholds |
Output only. The evaluation thresholds for the result. |
config |
Output only. The configuration used in the evaluation run that resulted in this result. |
goldenRunMethod |
Output only. The method used to run the golden evaluation. |
Union field result. The result of the evaluation. Only populated when executionState is COMPLETED. result can be only one of the following: |
|
goldenResult |
Output only. The outcome of a golden evaluation. |
scenarioResult |
Output only. The outcome of a scenario evaluation. |
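The union rule above can be sketched as a small helper that reads an EvaluationResult already decoded from JSON into a Python dict. The function name `extract_result` is hypothetical and not part of the API. The sketch assumes that an unset union member is simply absent from the JSON object.

```python
from typing import Any, Optional, Tuple

# Members of the `result` union as listed in the Fields table.
RESULT_FIELDS = ("goldenResult", "scenarioResult")


def extract_result(evaluation_result: dict) -> Optional[Tuple[str, Any]]:
    """Return (field_name, value) for the populated union member, or None.

    Hypothetical helper: assumes the resource was decoded from JSON and that
    an unset union member is absent from the dict.
    """
    # The union is only populated once execution has completed.
    if evaluation_result.get("executionState") != "COMPLETED":
        return None
    present = [f for f in RESULT_FIELDS if f in evaluation_result]
    if len(present) > 1:
        raise ValueError("result is a union; found more than one member: %s" % present)
    if not present:
        return None
    field = present[0]
    return field, evaluation_result[field]
```

A caller can branch on the returned field name to decide whether to read `turnReplayResults` (golden) or `expectationOutcomes` (scenario).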
EvaluationResult.GoldenResult
The result of a golden evaluation.
| JSON representation |
|---|
{
"turnReplayResults": [
{
object ( |
| Fields | |
|---|---|
turnReplayResults[] |
Output only. The result of running each turn of the golden conversation. |
EvaluationResult.GoldenResult.TurnReplayResult
The result of running a single turn of the golden conversation.
| JSON representation |
|---|
{ "conversation": string, "expectationOutcome": [ { object ( |
| Fields | |
|---|---|
conversation |
Output only. The conversation that was generated for this turn. |
expectationOutcome[] |
Output only. The outcome of each expectation. |
hallucinationResult |
Output only. The result of the hallucination check. |
toolInvocationScore |
Output only. Deprecated. Use OverallToolInvocationResult instead. |
turnLatency |
Output only. Duration of the turn. A duration in seconds with up to nine fractional digits, ending with ' |
toolCallLatencies[] |
Output only. The latency of each tool call in the turn. |
semanticSimilarityResult |
Output only. The result of the semantic similarity check. |
overallToolInvocationResult |
Output only. The result of the overall tool invocation check. |
errorInfo |
Output only. Information about the error that occurred during this turn. |
toolOrderedInvocationScore |
Output only. The overall tool ordered invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked in the expected order. |
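The service's exact formula for toolOrderedInvocationScore is not given here. One plausible reading, shown purely as an illustration, is the length of the longest common subsequence between the expected and actual tool sequences divided by the number of expected tools. The function name, the fraction scale (0.0 to 1.0 rather than 0 to 100), and the treatment of an empty expected list are all assumptions.

```python
from typing import List


def ordered_invocation_score(expected: List[str], invoked: List[str]) -> float:
    """Fraction of expected tools that were invoked in the expected order.

    Illustrative only: uses longest common subsequence (LCS), which is an
    assumption about how "in the expected order" might be measured.
    """
    if not expected:
        return 1.0  # assumption: nothing expected means nothing was missed
    n, m = len(expected), len(invoked)
    # dp[i][j] = LCS length of expected[:i] and invoked[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if expected[i] == invoked[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[n][m] / n
```

Under this reading, swapping two of three expected tools keeps two of them in order, so the score drops to two thirds while an unordered score would stay at 1.0.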
EvaluationResult.GoldenExpectationOutcome
Specifies the expectation and the result of that expectation.
| JSON representation |
|---|
{ "expectation": { object ( |
| Fields | |
|---|---|
expectation |
Output only. The expectation that was evaluated. |
outcome |
Output only. The outcome of the expectation. |
semanticSimilarityResult |
Output only. The result of the semantic similarity check. |
toolInvocationResult |
Output only. The result of the tool invocation check. |
Union field result. The result of the expectation. result can be only one of the following: |
|
observedToolCall |
Output only. The result of the tool call expectation. |
observedToolResponse |
Output only. The result of the tool response expectation. |
observedAgentResponse |
Output only. The result of the agent response expectation. |
observedAgentTransfer |
Output only. The result of the agent transfer expectation. |
EvaluationResult.Outcome
The outcome of the evaluation or expectation.
| Enums | |
|---|---|
OUTCOME_UNSPECIFIED |
Evaluation outcome is not specified. |
PASS |
Evaluation/Expectation passed. In the case of an evaluation, this means that all expectations were met. |
FAIL |
Evaluation/Expectation failed. In the case of an evaluation, this means that at least one expectation was not met. |
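The PASS and FAIL rules for an evaluation can be written as a short aggregation over expectation outcomes. This is a sketch of the rule as described, not the service's implementation. The function name is hypothetical, and returning OUTCOME_UNSPECIFIED for an empty list is an assumption.

```python
from typing import List


def aggregate_outcome(expectation_outcomes: List[str]) -> str:
    """Combine per-expectation outcomes into an evaluation-level outcome."""
    if not expectation_outcomes:
        # Assumption: with no expectations there is nothing to pass or fail.
        return "OUTCOME_UNSPECIFIED"
    # PASS only if every expectation was met; any other value yields FAIL.
    if all(outcome == "PASS" for outcome in expectation_outcomes):
        return "PASS"
    return "FAIL"
```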
EvaluationResult.SemanticSimilarityResult
The result of the semantic similarity check.
| JSON representation |
|---|
{
"label": string,
"explanation": string,
"outcome": enum ( |
| Fields | |
|---|---|
label |
Output only. The label associated with each score. Score 4: Fully Consistent; Score 3: Mostly Consistent; Score 2: Partially Consistent (Minor Omissions); Score 1: Largely Inconsistent (Major Omissions); Score 0: Completely Inconsistent / Contradictory. |
explanation |
Output only. The explanation for the semantic similarity score. |
outcome |
Output only. The outcome of the semantic similarity check. This is determined by comparing the score to the semanticSimilaritySuccessThreshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL. |
score |
Output only. The semantic similarity score. Can be 0, 1, 2, 3, or 4. |
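The relationship between score, label, and outcome can be sketched as follows. The mapping name and function name are hypothetical, and the label strings are copied from the field description above; the exact strings the service returns in `label` may differ.

```python
# Score-to-label mapping taken from the `label` field description.
SEMANTIC_SIMILARITY_LABELS = {
    4: "Fully Consistent",
    3: "Mostly Consistent",
    2: "Partially Consistent (Minor Omissions)",
    1: "Largely Inconsistent (Major Omissions)",
    0: "Completely Inconsistent / Contradictory",
}


def semantic_similarity_outcome(score: int, threshold: int) -> str:
    """Return PASS if score is at or above the threshold, else FAIL.

    `threshold` stands in for semanticSimilaritySuccessThreshold.
    """
    if score not in SEMANTIC_SIMILARITY_LABELS:
        raise ValueError("score must be one of 0, 1, 2, 3, 4; got %r" % (score,))
    return "PASS" if score >= threshold else "FAIL"
```

With a threshold of 3, scores 3 and 4 pass and scores 0 through 2 fail.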
EvaluationResult.GoldenExpectationOutcome.ToolInvocationResult
The result of the tool invocation check.
| JSON representation |
|---|
{
"outcome": enum ( |
| Fields | |
|---|---|
outcome |
Output only. The outcome of the tool invocation check. This is determined by comparing the parameterCorrectnessScore to the threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL. |
parameterCorrectnessScore |
Output only. The tool invocation parameter correctness score. This indicates the percent of parameters from the expected tool call that were also present in the actual tool call. |
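As an illustration of the score described above, the sketch below counts how many expected parameter names also appear in the actual tool call. Several details are assumptions: matching is by parameter name only (values are not compared), the score is expressed as a fraction from 0.0 to 1.0, and an expected call with no parameters scores 1.0. Both function names are hypothetical.

```python
def parameter_correctness_score(expected_args: dict, actual_args: dict) -> float:
    """Fraction of expected parameters that are present in the actual call."""
    if not expected_args:
        return 1.0  # assumption: no expected parameters means nothing is missing
    matched = sum(1 for name in expected_args if name in actual_args)
    return matched / len(expected_args)


def tool_invocation_outcome(score: float, threshold: float) -> str:
    """PASS when the score is equal to or above the threshold, else FAIL."""
    return "PASS" if score >= threshold else "FAIL"
```

For example, an expected call with parameters `city`, `date`, and `units` compared against an actual call that supplies only `city` and `date` scores two thirds.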
EvaluationResult.HallucinationResult
The result of the hallucination check for a single turn.
| JSON representation |
|---|
{ "label": string, "explanation": string, "score": integer } |
| Fields | |
|---|---|
label |
Output only. The label associated with each score. Score 1: Justified; Score 0: Not Justified; Score -1: No Claim To Assess. |
explanation |
Output only. The explanation for the hallucination score. |
score |
Output only. The hallucination score. Can be -1, 0, or 1. |
EvaluationResult.ToolCallLatency
The latency of a tool call execution.
| JSON representation |
|---|
{ "tool": string, "displayName": string, "startTime": string, "endTime": string, "executionLatency": string } |
| Fields | |
|---|---|
tool |
Output only. The name of the tool that was executed. Format: |
displayName |
Output only. The display name of the tool. |
startTime |
Output only. The start time of the tool call execution. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
endTime |
Output only. The end time of the tool call execution. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
executionLatency |
Output only. The latency of the tool call execution. A duration in seconds with up to nine fractional digits, ending with ' |
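Because startTime and endTime may carry up to nine fractional digits, and `datetime.fromisoformat` on older Python versions rejects both the "Z" suffix and nanosecond precision, a client may want its own parser. The sketch below is one way to do that with the standard library. The function names are hypothetical, the timestamps in the usage note are made up, and nanoseconds are truncated to microseconds because `datetime` cannot represent finer precision.

```python
import re
from datetime import datetime, timedelta

_RFC3339 = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?(Z|[+-]\d{2}:\d{2})"
)


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp into a timezone-aware datetime."""
    match = _RFC3339.fullmatch(value)
    if match is None:
        raise ValueError("not an RFC 3339 timestamp: %r" % (value,))
    base, fraction, offset = match.groups()
    # Pad the fraction to nanoseconds, then truncate to microseconds.
    micros = int((fraction or "0").ljust(9, "0")[:6])
    tz = "+00:00" if offset == "Z" else offset
    return datetime.fromisoformat(base + tz).replace(microsecond=micros)


def tool_call_elapsed(tool_call_latency: dict) -> timedelta:
    """Compute endTime - startTime for a ToolCallLatency decoded from JSON."""
    start = parse_rfc3339(tool_call_latency["startTime"])
    end = parse_rfc3339(tool_call_latency["endTime"])
    return end - start
```

For instance, a startTime of `2024-01-01T00:00:00Z` and an endTime of `2024-01-01T00:00:01.500Z` give an elapsed time of 1.5 seconds, which can be compared against the reported executionLatency.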
EvaluationResult.OverallToolInvocationResult
The result of the overall tool invocation check.
| JSON representation |
|---|
{
"outcome": enum ( |
| Fields | |
|---|---|
outcome |
Output only. The outcome of the tool invocation check. This is determined by comparing the toolInvocationScore to the overallToolInvocationCorrectnessThreshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL. |
toolInvocationScore |
The overall tool invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked. |
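A minimal sketch of the score described above, ignoring order: count how many expected tool invocations have a matching actual invocation and divide by the number expected. The function name is hypothetical. Treating repeated tools as a multiset, expressing the score as a fraction from 0.0 to 1.0, and scoring an empty expected list as 1.0 are assumptions rather than documented behavior.

```python
from collections import Counter
from typing import List


def tool_invocation_score(expected_tools: List[str], invoked_tools: List[str]) -> float:
    """Fraction of expected tool invocations that were actually made."""
    if not expected_tools:
        return 1.0  # assumption: nothing expected means nothing was missed
    expected = Counter(expected_tools)
    invoked = Counter(invoked_tools)
    # Each expected invocation can be matched by at most one actual invocation.
    matched = sum(min(count, invoked[tool]) for tool, count in expected.items())
    return matched / len(expected_tools)
```

The resulting value is what would be compared against overallToolInvocationCorrectnessThreshold to decide PASS or FAIL.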
EvaluationResult.ScenarioResult
The outcome of a scenario evaluation.
| JSON representation |
|---|
{ "conversation": string, "expectationOutcomes": [ { object ( |
| Fields | |
|---|---|
conversation |
Output only. The conversation that was generated in the scenario. |
expectationOutcomes[] |
Output only. The outcome of each expectation. |
rubricOutcomes[] |
Output only. The outcome of the rubric. |
hallucinationResult[] |
Output only. The result of the hallucination check. There will be one hallucination result for each turn in the conversation. |
taskCompletionResult |
Output only. The result of the task completion check. |
toolCallLatencies[] |
Output only. The latency of each tool call execution in the conversation. |
userGoalSatisfactionResult |
Output only. The result of the user goal satisfaction check. |
allExpectationsSatisfied |
Output only. Whether all expectations were satisfied for this turn. |
taskCompleted |
Output only. Whether the task was completed for this turn. This is a composite of all expectations satisfied, no hallucinations, and user goal satisfaction. |
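The composite described for taskCompleted can be sketched from the other ScenarioResult fields. This is an interpretation, not the service's logic: it assumes that a hallucination score of 0 (Not Justified) counts as a hallucination while -1 (No Claim To Assess) does not, and that the user goal is satisfied only when the score is 1. The function name is hypothetical.

```python
def derive_task_completed(scenario_result: dict) -> bool:
    """Recompute the taskCompleted composite from a ScenarioResult dict."""
    all_expectations = bool(scenario_result.get("allExpectationsSatisfied", False))
    # Assumption: only score 0 (Not Justified) counts as a hallucination.
    no_hallucinations = all(
        turn.get("score") != 0
        for turn in scenario_result.get("hallucinationResult", [])
    )
    # Assumption: the goal is satisfied only when the score is exactly 1.
    goal = scenario_result.get("userGoalSatisfactionResult", {})
    goal_satisfied = goal.get("score") == 1
    return all_expectations and no_hallucinations and goal_satisfied
```

Comparing this derived value with the reported taskCompleted field can help when debugging why a scenario was marked incomplete.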
EvaluationResult.ScenarioExpectationOutcome
The outcome of a scenario expectation.
| JSON representation |
|---|
{ "expectation": { object ( |
| Fields | |
|---|---|
expectation |
Output only. The expectation that was evaluated. |
outcome |
Output only. The outcome of the ScenarioExpectation. |
Union field result. The result of the expectation. result can be only one of the following: |
|
observedToolCall |
Output only. The observed tool call. |
observedAgentResponse |
Output only. The observed agent response. |
EvaluationResult.ScenarioExpectationOutcome.ObservedToolCall
The observed tool call and response.
| JSON representation |
|---|
{ "toolCall": { object ( |
| Fields | |
|---|---|
toolCall |
Output only. The observed tool call. |
toolResponse |
Output only. The observed tool response. |
EvaluationResult.ScenarioRubricOutcome
The outcome of the evaluation against the rubric.
| JSON representation |
|---|
{ "rubric": string, "scoreExplanation": string, "score": number } |
| Fields | |
|---|---|
rubric |
Output only. The rubric that was used to evaluate the conversation. |
scoreExplanation |
Output only. The rater's response to the rubric. |
score |
Output only. The score of the conversation against the rubric. |
EvaluationResult.TaskCompletionResult
The result of the task completion check for the conversation.
| JSON representation |
|---|
{ "label": string, "explanation": string, "score": integer } |
| Fields | |
|---|---|
label |
Output only. The label associated with each score. Score 1: Task Completed; Score 0: Task Not Completed; Score -1: User Goal Undefined. |
explanation |
Output only. The explanation for the task completion score. |
score |
Output only. The task completion score. Can be -1, 0, or 1. |
EvaluationResult.UserGoalSatisfactionResult
The result of a user goal satisfaction check for a conversation.
| JSON representation |
|---|
{ "label": string, "explanation": string, "score": integer } |
| Fields | |
|---|---|
label |
Output only. The label associated with each score. Score 1: User Task Satisfied; Score 0: User Task Not Satisfied; Score -1: User Task Unspecified. |
explanation |
Output only. The explanation for the user task satisfaction score. |
score |
Output only. The user task satisfaction score. Can be -1, 0, or 1. |
EvaluationResult.ExecutionState
The state of the evaluation result execution.
| Enums | |
|---|---|
EXECUTION_STATE_UNSPECIFIED |
Evaluation result execution state is not specified. |
RUNNING |
Evaluation result execution is running. |
COMPLETED |
Evaluation result execution has completed. |
ERROR |
Evaluation result execution failed due to an internal error. |
Methods |
|
|---|---|
|
Deletes an evaluation result. |
|
Gets details of the specified evaluation result. |
|
Lists all evaluation results for a given evaluation. |