- Resource: Evaluation
- Evaluation.Golden
- Evaluation.GoldenTurn
- Evaluation.Step
- SessionInput
- ToolResponses
- Blob
- Event
- Evaluation.Scenario
- Evaluation.Scenario.UserFact
- Evaluation.Scenario.TaskCompletionBehavior
- Evaluation.Scenario.UserGoalBehavior
- Methods
Resource: Evaluation
An evaluation represents all of the information needed to simulate and evaluate an agent.
| JSON representation |
|---|
{ "name": string, "displayName": string, "description": string, "tags": [ string ], "evaluationDatasets": [ string ], "createTime": string, "createdBy": string, "updateTime": string, "lastUpdatedBy": string, "evaluationRuns": [ string ], "etag": string, "aggregatedMetrics": { object ( |
| Fields | |
|---|---|
name |
Identifier. The unique identifier of this evaluation. Format: |
displayName |
Required. User-defined display name of the evaluation. Unique within an App. |
description |
Optional. User-defined description of the evaluation. |
tags[] |
Optional. User-defined tags to categorize the evaluation. |
evaluationDatasets[] |
Output only. List of evaluation datasets the evaluation belongs to. Format: |
createTime |
Output only. Timestamp when the evaluation was created. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
createdBy |
Output only. The user who created the evaluation. |
updateTime |
Output only. Timestamp when the evaluation was last updated. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
lastUpdatedBy |
Output only. The user who last updated the evaluation. |
evaluationRuns[] |
Output only. The EvaluationRuns that this Evaluation is associated with. |
etag |
Output only. Etag used to ensure the object hasn't changed during a read-modify-write operation. If the etag is empty, the update will overwrite any concurrent changes. |
aggregatedMetrics |
Output only. The aggregated metrics for this evaluation across all runs. |
lastCompletedResult |
Output only. The latest evaluation result for this evaluation. |
invalid |
Output only. Whether the evaluation is invalid. This can happen if an evaluation is referencing a tool, toolset, or agent that has since been deleted. |
lastTenResults[] |
Output only. The last 10 evaluation results for this evaluation. This is only populated if include_last_ten_results is set to true in the ListEvaluationsRequest or GetEvaluationRequest. |
Union field inputs. The inputs for the evaluation. inputs can be only one of the following: |
|
golden |
Optional. The golden steps to be evaluated. |
scenario |
Optional. The config for a scenario. |
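The `inputs` union and the output-only markers in the field table suggest a small client-side check before sending an evaluation back in an update. The following is a minimal sketch under stated assumptions, not part of the API: `prepare_update_body` and `OUTPUT_ONLY_FIELDS` are hypothetical names, and dropping output-only fields is an assumption about keeping request bodies tidy rather than a documented requirement. The `etag` is deliberately kept so the server can detect concurrent changes during a read-modify-write.

```python
# Hypothetical helper; field names come from the Evaluation field table.
OUTPUT_ONLY_FIELDS = {
    "evaluationDatasets", "createTime", "createdBy", "updateTime",
    "lastUpdatedBy", "evaluationRuns", "aggregatedMetrics",
    "lastCompletedResult", "invalid", "lastTenResults",
}
# Members of the `inputs` union: at most one may be set.
INPUT_FIELDS = ("golden", "scenario")


def prepare_update_body(evaluation: dict) -> dict:
    """Return a copy of a fetched Evaluation suitable for sending back."""
    present = [field for field in INPUT_FIELDS if field in evaluation]
    if len(present) > 1:
        raise ValueError(f"'inputs' is a union field; got {present}")
    # etag is not in OUTPUT_ONLY_FIELDS, so it survives the filter.
    return {k: v for k, v in evaluation.items() if k not in OUTPUT_ONLY_FIELDS}
```

Omitting the `etag` from the returned body would, per the field description, let the update overwrite any concurrent changes.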
Evaluation.Golden
The steps required to replay a golden conversation.
| JSON representation |
|---|
{
"turns": [
{
object ( |
| Fields | |
|---|---|
turns[] |
Required. The golden turns required to replay a golden conversation. |
Evaluation.GoldenTurn
A golden turn defines a single turn in a golden conversation.
| JSON representation |
|---|
{ "steps": [ { object ( |
| Fields | |
|---|---|
steps[] |
Required. The steps required to replay a golden conversation. |
rootSpan |
Optional. The root span of the golden turn for processing and maintaining audio information. |
Evaluation.Step
A step defines a single action performed during the evaluation.
| JSON representation |
|---|
{ // Union field |
| Fields | |
|---|---|
Union field step. The step to perform. step can be only one of the following: |
|
userInput |
Optional. User input for the conversation. |
agentTransfer |
Optional. Transfer the conversation to a different agent. |
expectation |
Optional. Executes an expectation on the current turn. |
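To illustrate how `Golden`, `GoldenTurn`, and `Step` nest, here is a minimal sketch that builds a golden from plain user utterances. It assumes that `userInput` carries a `SessionInput` object (the type is not visible in the flattened table) and uses only its `text` variant; `golden_from_utterances` is a hypothetical helper name, not part of the API.

```python
import json


def golden_from_utterances(turns: list[list[str]]) -> dict:
    """Build a Golden: one GoldenTurn per inner list, one userInput Step per string."""
    return {
        "turns": [
            # Each step sets exactly one member of the `step` union.
            {"steps": [{"userInput": {"text": text}} for text in utterances]}
            for utterances in turns
        ]
    }


golden = golden_from_utterances([["Hi"], ["I want to change my plan"]])
print(json.dumps(golden, indent=2))
```

The resulting dictionary can be placed under the `golden` key of an Evaluation. Steps of the `agentTransfer` and `expectation` kinds are left out because their payload shapes are not described here.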
SessionInput
Input for the session.
| JSON representation |
|---|
{ "willContinue": boolean, // Union field |
| Fields | |
|---|---|
willContinue |
Optional. A flag to indicate if the current message is a fragment of a larger input in the bidi streaming session. Note: This flag has no effect on audio and DTMF inputs, which are always processed in real time. |
Union field input_type. The type of the input. input_type can be only one of the following: |
|
text |
Optional. Text data from the end user. |
dtmf |
Optional. DTMF digits from the end user. |
audio |
Optional. Audio data from the end user. A base64-encoded string. |
toolResponses |
Optional. Execution results for the tool calls from the client. |
image |
Optional. Image data from the end user. |
blob |
Optional. Blob data from the end user. |
variables |
Optional. Contextual variables for the session, keyed by name. Variables must be declared in the app first, otherwise they will be ignored. |
event |
Optional. Event input. |
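As an illustration of `willContinue`, the sketch below splits a long text input into fragments and marks every fragment except the last as continuing. It assumes that a final message with `willContinue` set to `false` closes the input, which the field description implies but does not spell out; `text_fragments` and the default chunk size are hypothetical.

```python
def text_fragments(text: str, chunk_size: int = 20) -> list[dict]:
    """Split text into SessionInput-shaped dicts for a streaming session."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # An empty string still yields one (final) fragment.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)] or [""]
    last = len(chunks) - 1
    # Assumption: only the final fragment carries willContinue = False.
    return [
        {"text": chunk, "willContinue": index < last}
        for index, chunk in enumerate(chunks)
    ]
```

Each returned dictionary sets exactly one member of the `input_type` union (`text`), alongside the `willContinue` flag.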
ToolResponses
Execution results for the requested tool calls from the client.
| JSON representation |
|---|
{
"toolResponses": [
{
object ( |
| Fields | |
|---|---|
toolResponses[] |
Optional. The list of tool execution results. |
Blob
Represents a blob input or output in the conversation.
| JSON representation |
|---|
{ "mimeType": string, "data": string } |
| Fields | |
|---|---|
mimeType |
Required. The IANA standard MIME type of the source data. |
data |
Required. Raw bytes of the blob. A base64-encoded string. |
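Because `data` is a base64-encoded string in the JSON representation, raw bytes need to be encoded before they are placed in the request body. A minimal sketch using Python's standard library follows; `make_blob` is a hypothetical helper name.

```python
import base64


def make_blob(raw: bytes, mime_type: str) -> dict:
    """Return a Blob-shaped dict with base64-encoded data."""
    return {
        "mimeType": mime_type,
        # b64encode returns bytes; JSON needs a str.
        "data": base64.b64encode(raw).decode("ascii"),
    }


blob = make_blob(b"hello", "text/plain")
print(blob)  # {'mimeType': 'text/plain', 'data': 'aGVsbG8='}
```

Decoding with `base64.b64decode(blob["data"])` recovers the original bytes.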
Event
Event input.
| JSON representation |
|---|
{ "event": string } |
| Fields | |
|---|---|
event |
Required. The name of the event. |
Evaluation.Scenario
The config for a scenario.
| JSON representation |
|---|
{ "task": string, "userFacts": [ { object ( |
| Fields | |
|---|---|
task |
Required. The task to be targeted by the scenario. |
userFacts[] |
Optional. The user facts to be used by the scenario. |
maxTurns |
Optional. The maximum number of turns to simulate. If not specified, the simulation will continue until the task is complete. |
rubrics[] |
Required. The rubrics to score the scenario against. |
scenarioExpectations[] |
Required. The ScenarioExpectations to evaluate the conversation produced by the user simulation. |
variableOverrides |
Optional. Variables (session parameters) that provide context for the session, keyed by variable name. Members of this struct override any default values set by the system. Note that these differ from user facts, which are facts known to the user; variables are parameters known to the agent, e.g. the MDN (phone number) passed by the telephony system. |
taskCompletionBehavior |
Optional. The expected behavior of the user task. |
userGoalBehavior |
Optional. The expected behavior of the user goal. |
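The following sketch shows one way to assemble a Scenario body while keeping user facts (known to the simulated user) separate from variable overrides (known to the agent). `build_scenario` is a hypothetical helper. The element shapes of `rubrics` and `scenarioExpectations` are not described in this section, so the sketch passes them through untouched rather than guessing at their structure.

```python
from typing import Any, Optional


def build_scenario(
    task: str,
    rubrics: list,
    scenario_expectations: list,
    user_facts: Optional[dict[str, str]] = None,
    variable_overrides: Optional[dict[str, Any]] = None,
    max_turns: Optional[int] = None,
) -> dict:
    """Assemble a Scenario-shaped dict; optional fields are omitted when unset."""
    scenario: dict[str, Any] = {
        "task": task,
        "rubrics": list(rubrics),  # opaque pass-through
        "scenarioExpectations": list(scenario_expectations),  # opaque pass-through
    }
    if user_facts:
        # UserFact is a name/value pair.
        scenario["userFacts"] = [
            {"name": name, "value": value} for name, value in user_facts.items()
        ]
    if variable_overrides:
        scenario["variableOverrides"] = dict(variable_overrides)
    if max_turns is not None:
        scenario["maxTurns"] = max_turns
    return scenario
```

Leaving `max_turns` unset omits `maxTurns`, in which case the simulation continues until the task is complete, as described in the field table.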
Evaluation.Scenario.UserFact
Facts about the user as a key-value pair.
| JSON representation |
|---|
{ "name": string, "value": string } |
| Fields | |
|---|---|
name |
Required. The name of the user fact. |
value |
Required. The value of the user fact. |
Evaluation.Scenario.TaskCompletionBehavior
The expected behavior of the user task. This is used to determine whether the scenario is successful.
| Enums | |
|---|---|
TASK_COMPLETION_BEHAVIOR_UNSPECIFIED |
Behavior unspecified. Will default to TASK_SATISFIED. |
TASK_SATISFIED |
The user task should be completed successfully. |
TASK_REJECTED |
The user task should be rejected. |
Evaluation.Scenario.UserGoalBehavior
The expected behavior of the user goal. This is used to determine whether the scenario is successful.
| Enums | |
|---|---|
USER_GOAL_BEHAVIOR_UNSPECIFIED |
Behavior unspecified. Will default to USER_GOAL_SATISFIED. |
USER_GOAL_SATISFIED |
The user goal should be completed successfully. |
USER_GOAL_REJECTED |
The user goal should be rejected. |
USER_GOAL_IGNORED |
Ignore the user goal status. |
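Both enums state that the unspecified value falls back to the "satisfied" value. A client that wants to display or compare the effective behavior could mirror that rule as in the sketch below. The enum classes restate the values from the two tables; `effective_behavior` is a hypothetical helper, and the actual defaulting happens on the server.

```python
from enum import Enum


class TaskCompletionBehavior(Enum):
    TASK_COMPLETION_BEHAVIOR_UNSPECIFIED = "TASK_COMPLETION_BEHAVIOR_UNSPECIFIED"
    TASK_SATISFIED = "TASK_SATISFIED"
    TASK_REJECTED = "TASK_REJECTED"


class UserGoalBehavior(Enum):
    USER_GOAL_BEHAVIOR_UNSPECIFIED = "USER_GOAL_BEHAVIOR_UNSPECIFIED"
    USER_GOAL_SATISFIED = "USER_GOAL_SATISFIED"
    USER_GOAL_REJECTED = "USER_GOAL_REJECTED"
    USER_GOAL_IGNORED = "USER_GOAL_IGNORED"


# Documented defaults for the unspecified values.
_DEFAULTS = {
    TaskCompletionBehavior.TASK_COMPLETION_BEHAVIOR_UNSPECIFIED:
        TaskCompletionBehavior.TASK_SATISFIED,
    UserGoalBehavior.USER_GOAL_BEHAVIOR_UNSPECIFIED:
        UserGoalBehavior.USER_GOAL_SATISFIED,
}


def effective_behavior(behavior: Enum) -> Enum:
    """Map an unspecified value to its documented default; pass others through."""
    return _DEFAULTS.get(behavior, behavior)
```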
Methods
| Methods | |
|---|---|
|
Creates an evaluation. |
|
Deletes an evaluation. |
|
Gets details of the specified evaluation. |
|
Lists all evaluations in the given app. |
|
Updates an evaluation. |