REST Resource: projects.locations.apps.evaluations

Resource: Evaluation

An evaluation represents all of the information needed to simulate and evaluate an agent.

JSON representation
{
  "name": string,
  "displayName": string,
  "description": string,
  "tags": [
    string
  ],
  "evaluationDatasets": [
    string
  ],
  "createTime": string,
  "createdBy": string,
  "updateTime": string,
  "lastUpdatedBy": string,
  "evaluationRuns": [
    string
  ],
  "etag": string,
  "aggregatedMetrics": {
    object (AggregatedMetrics)
  },
  "lastCompletedResult": {
    object (EvaluationResult)
  },
  "invalid": boolean,
  "lastTenResults": [
    {
      object (EvaluationResult)
    }
  ],

  // Union field inputs can be only one of the following:
  "golden": {
    object (Evaluation.Golden)
  },
  "scenario": {
    object (Evaluation.Scenario)
  }
  // End of list of possible types for union field inputs.
}
Fields
name

string

Identifier. The unique identifier of this evaluation. Format: projects/{project}/locations/{location}/apps/{app}/evaluations/{evaluation}
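The name format above can be built and parsed with a small helper. This is a minimal sketch: the function names are hypothetical, and the assumption that no path segment contains a "/" is ours, not the API's.

```python
import re

# Pattern derived from the documented format string; segment rules are an assumption.
_EVALUATION_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/apps/(?P<app>[^/]+)/evaluations/(?P<evaluation>[^/]+)$"
)


def evaluation_name(project: str, location: str, app: str, evaluation: str) -> str:
    """Build a resource name in the documented format."""
    return f"projects/{project}/locations/{location}/apps/{app}/evaluations/{evaluation}"


def parse_evaluation_name(name: str) -> dict:
    """Split a resource name into its four IDs, or raise ValueError."""
    match = _EVALUATION_NAME.match(name)
    if match is None:
        raise ValueError(f"not an evaluation resource name: {name!r}")
    return match.groupdict()
```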

displayName

string

Required. User-defined display name of the evaluation. Unique within an App.

description

string

Optional. User-defined description of the evaluation.

tags[]

string

Optional. User-defined tags to categorize the evaluation.

evaluationDatasets[]

string

Output only. List of evaluation datasets the evaluation belongs to. Format: projects/{project}/locations/{location}/apps/{app}/evaluationDatasets/{evaluationDataset}

createTime

string (Timestamp format)

Output only. Timestamp when the evaluation was created.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".
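A client reading these timestamps may want them as timezone-aware values. The sketch below parses the three example forms using only the standard library. The helper name parse_timestamp is hypothetical, and it truncates nanosecond precision to microseconds because Python's datetime cannot hold more.

```python
import re
from datetime import datetime, timedelta, timezone

# Seconds with 0 to 9 fractional digits, followed by "Z" or a numeric offset.
_RFC3339 = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})[Tt](\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d{1,9}))?(Z|z|[+-]\d{2}:\d{2})$"
)


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 timestamp and return it normalized to UTC."""
    match = _RFC3339.match(value)
    if match is None:
        raise ValueError(f"not an RFC 3339 timestamp: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    # Pad to nanoseconds, then keep the first six digits (microseconds).
    fraction = (match.group(7) or "").ljust(9, "0")
    microsecond = int(fraction[:6])
    offset = match.group(8)
    if offset in ("Z", "z"):
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6])))
    local = datetime(year, month, day, hour, minute, second, microsecond, tzinfo=tz)
    return local.astimezone(timezone.utc)
```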

createdBy

string

Output only. The user who created the evaluation.

updateTime

string (Timestamp format)

Output only. Timestamp when the evaluation was last updated.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

lastUpdatedBy

string

Output only. The user who last updated the evaluation.

evaluationRuns[]

string

Output only. The EvaluationRuns that this Evaluation is associated with.

etag

string

Output only. Etag used to ensure the object hasn't changed during a read-modify-write operation. If the etag is empty, the update will overwrite any concurrent changes.
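The read-modify-write pattern mentioned above might look like the following on the client side: fetch the evaluation, drop the server-populated fields, apply the changes, and send the fetched etag back unchanged. This is a sketch under assumptions. Whether the patch method accepts the etag in the request body is not stated in this reference, and build_update_body and OUTPUT_ONLY_FIELDS are hypothetical names.

```python
import copy

# Fields marked "Output only" in this resource's field list, except etag,
# which is kept so the server can detect concurrent changes (assumption).
OUTPUT_ONLY_FIELDS = frozenset({
    "evaluationDatasets", "createTime", "createdBy", "updateTime",
    "lastUpdatedBy", "evaluationRuns", "aggregatedMetrics",
    "lastCompletedResult", "invalid", "lastTenResults",
})


def build_update_body(fetched: dict, changes: dict) -> dict:
    """Return an update body based on a fetched evaluation plus local changes."""
    body = copy.deepcopy(fetched)
    for field in OUTPUT_ONLY_FIELDS:
        body.pop(field, None)
    body.update(changes)
    return body
```

Sending the body without an etag would, per the field description, let the update overwrite any concurrent changes.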

aggregatedMetrics

object (AggregatedMetrics)

Output only. The aggregated metrics for this evaluation across all runs.

lastCompletedResult

object (EvaluationResult)

Output only. The latest evaluation result for this evaluation.

invalid

boolean

Output only. Whether the evaluation is invalid. This can happen if an evaluation is referencing a tool, toolset, or agent that has since been deleted.

lastTenResults[]

object (EvaluationResult)

Output only. The last 10 evaluation results for this evaluation. This is only populated if include_last_ten_results is set to true in the ListEvaluationsRequest or GetEvaluationRequest.

Union field inputs. The inputs for the evaluation. inputs can be only one of the following:
golden

object (Evaluation.Golden)

Optional. The golden steps to be evaluated.

scenario

object (Evaluation.Scenario)

Optional. The config for a scenario.
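Because golden and scenario belong to the same union field, a request body should not set both. A minimal client-side check, assuming the evaluation is held as a plain dict (the helper name which_input is hypothetical):

```python
from typing import Optional

# Members of the "inputs" union field, as listed above.
INPUT_FIELDS = ("golden", "scenario")


def which_input(evaluation: dict) -> Optional[str]:
    """Return the union member that is set, or None if neither is.

    Raises ValueError when both members are present.
    """
    present = [field for field in INPUT_FIELDS if field in evaluation]
    if len(present) > 1:
        raise ValueError(f"only one of {INPUT_FIELDS} may be set, got {present}")
    return present[0] if present else None
```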

Evaluation.Golden

The steps required to replay a golden conversation.

JSON representation
{
  "turns": [
    {
      object (Evaluation.GoldenTurn)
    }
  ]
}
Fields
turns[]

object (Evaluation.GoldenTurn)

Required. The golden turns required to replay a golden conversation.

Evaluation.GoldenTurn

A golden turn defines a single turn in a golden conversation.

JSON representation
{
  "steps": [
    {
      object (Evaluation.Step)
    }
  ],
  "rootSpan": {
    object (Span)
  }
}
Fields
steps[]

object (Evaluation.Step)

Required. The steps required to replay a golden conversation.

rootSpan

object (Span)

Optional. The root span of the golden turn for processing and maintaining audio information.

Evaluation.Step

A step defines a single action to perform during the evaluation.

JSON representation
{

  // Union field step can be only one of the following:
  "userInput": {
    object (SessionInput)
  },
  "agentTransfer": {
    object (AgentTransfer)
  },
  "expectation": {
    object (Evaluation.GoldenExpectation)
  }
  // End of list of possible types for union field step.
}
Fields
Union field step. The step to perform. step can be only one of the following:
userInput

object (SessionInput)

Optional. User input for the conversation.

agentTransfer

object (AgentTransfer)

Optional. Transfer the conversation to a different agent.

expectation

object (Evaluation.GoldenExpectation)

Optional. Executes an expectation on the current turn.
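The Golden, GoldenTurn, and Step messages nest as turns, then steps, then one union member per step. The sketch below assembles that structure from text-only user inputs. The helper names are hypothetical, and expectation and agentTransfer steps are passed through untouched because their schemas are not described on this page.

```python
from typing import Dict, List

# Members of the "step" union field, as listed above.
STEP_FIELDS = frozenset({"userInput", "agentTransfer", "expectation"})


def user_text_step(text: str) -> Dict:
    """Build a step whose userInput is a SessionInput carrying text."""
    return {"userInput": {"text": text}}


def golden_from_turns(turns: List[List[Dict]]) -> Dict:
    """Wrap a list of turns (each a list of steps) in an Evaluation.Golden."""
    for steps in turns:
        for step in steps:
            if len(STEP_FIELDS & set(step)) > 1:
                raise ValueError(f"a step may set only one union member: {step}")
    return {"turns": [{"steps": list(steps)} for steps in turns]}
```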

SessionInput

Input for the session.

JSON representation
{
  "willContinue": boolean,

  // Union field input_type can be only one of the following:
  "text": string,
  "dtmf": string,
  "audio": string,
  "toolResponses": {
    object (ToolResponses)
  },
  "image": {
    object (Image)
  },
  "blob": {
    object (Blob)
  },
  "variables": {
    object
  },
  "event": {
    object (Event)
  }
  // End of list of possible types for union field input_type.
}
Fields
willContinue

boolean

Optional. A flag to indicate if the current message is a fragment of a larger input in the bidi streaming session. When true, the agent will defer processing until a subsequent message with willContinue set to false is received.

Note: This flag has no effect on audio and DTMF inputs, which are always processed in real-time.

Union field input_type. The type of the input. input_type can be only one of the following:
text

string

Optional. Text data from the end user.

dtmf

string

Optional. DTMF digits from the end user.

audio

string (bytes format)

Optional. Audio data from the end user.

A base64-encoded string.

toolResponses

object (ToolResponses)

Optional. Execution results for the tool calls from the client.

image

object (Image)

Optional. Image data from the end user.

blob

object (Blob)

Optional. Blob data from the end user.

variables

object (Struct format)

Optional. Contextual variables for the session, keyed by name. Variables must be declared in the app first, otherwise they will be ignored.

event

object (Event)

Optional. Event input.
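To illustrate willContinue, the sketch below splits one long text input into several SessionInput messages, marking every fragment except the last as continuing. The helper name text_fragments and the fixed-size chunking are illustrative assumptions; a real client may fragment input differently.

```python
from typing import Dict, List


def text_fragments(text: str, size: int) -> List[Dict]:
    """Split text into SessionInput dicts of at most `size` characters each.

    All fragments but the last set willContinue to True, so processing is
    deferred until the final fragment arrives.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    chunks = [text[i:i + size] for i in range(0, len(text), size)] or [""]
    last = len(chunks) - 1
    return [
        {"text": chunk, "willContinue": index != last}
        for index, chunk in enumerate(chunks)
    ]
```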

ToolResponses

Execution results for the requested tool calls from the client.

JSON representation
{
  "toolResponses": [
    {
      object (ToolResponse)
    }
  ]
}
Fields
toolResponses[]

object (ToolResponse)

Optional. The list of tool execution results.

Blob

Represents a blob input or output in the conversation.

JSON representation
{
  "mimeType": string,
  "data": string
}
Fields
mimeType

string

Required. The IANA standard MIME type of the source data.

data

string (bytes format)

Required. Raw bytes of the blob.

A base64-encoded string.
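Since data is a bytes field, its JSON value is the base64 encoding of the raw bytes. A minimal sketch of building a Blob from in-memory bytes (the helper name make_blob is hypothetical):

```python
import base64


def make_blob(mime_type: str, raw: bytes) -> dict:
    """Return a Blob dict with the raw bytes base64-encoded for JSON."""
    return {
        "mimeType": mime_type,
        "data": base64.b64encode(raw).decode("ascii"),
    }
```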

Event

Event input.

JSON representation
{
  "event": string
}
Fields
event

string

Required. The name of the event.

Evaluation.Scenario

The config for a scenario.

JSON representation
{
  "task": string,
  "userFacts": [
    {
      object (Evaluation.Scenario.UserFact)
    }
  ],
  "maxTurns": integer,
  "rubrics": [
    string
  ],
  "scenarioExpectations": [
    {
      object (Evaluation.ScenarioExpectation)
    }
  ],
  "variableOverrides": {
    object
  },
  "taskCompletionBehavior": enum (Evaluation.Scenario.TaskCompletionBehavior),
  "userGoalBehavior": enum (Evaluation.Scenario.UserGoalBehavior)
}
Fields
task

string

Required. The task to be targeted by the scenario.

userFacts[]

object (Evaluation.Scenario.UserFact)

Optional. The user facts to be used by the scenario.

maxTurns

integer

Optional. The maximum number of turns to simulate. If not specified, the simulation will continue until the task is complete.

rubrics[]

string

Required. The rubrics to score the scenario against.

scenarioExpectations[]

object (Evaluation.ScenarioExpectation)

Required. The ScenarioExpectations to evaluate the conversation produced by the user simulation.

variableOverrides

object (Struct format)

Optional. Variables / Session Parameters as context for the session, keyed by variable names. Members of this struct will override any default values set by the system.

Note: These are different from user facts, which are facts known to the user. Variables are parameters known to the agent, for example the MDN (phone number) passed by the telephony system.

taskCompletionBehavior

enum (Evaluation.Scenario.TaskCompletionBehavior)

Optional. The expected behavior of the user task.

userGoalBehavior

enum (Evaluation.Scenario.UserGoalBehavior)

Optional. The expected behavior of the user goal.
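As an illustration of how the scenario fields fit together, the sketch below assembles an Evaluation.Scenario dict and omits optional fields that were not supplied. The helper name make_scenario and its argument layout are hypothetical. Expectations are passed through as-is because the Evaluation.ScenarioExpectation schema is not described on this page.

```python
from typing import Any, Dict, List, Optional


def make_scenario(
    task: str,
    rubrics: List[str],
    scenario_expectations: List[Dict[str, Any]],
    user_facts: Optional[Dict[str, str]] = None,
    max_turns: Optional[int] = None,
    variable_overrides: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a scenario body from the required fields plus any optional ones."""
    if not task:
        raise ValueError("task is required")
    scenario: Dict[str, Any] = {
        "task": task,
        "rubrics": list(rubrics),
        "scenarioExpectations": list(scenario_expectations),
    }
    if user_facts:
        # Each fact becomes an Evaluation.Scenario.UserFact name/value pair.
        scenario["userFacts"] = [
            {"name": name, "value": value} for name, value in user_facts.items()
        ]
    if max_turns is not None:
        scenario["maxTurns"] = max_turns
    if variable_overrides:
        scenario["variableOverrides"] = dict(variable_overrides)
    return scenario
```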

Evaluation.Scenario.UserFact

Facts about the user as a key value pair.

JSON representation
{
  "name": string,
  "value": string
}
Fields
name

string

Required. The name of the user fact.

value

string

Required. The value of the user fact.

Evaluation.Scenario.TaskCompletionBehavior

The expected behavior of the user task. This is used to determine whether the scenario is successful.

Enums
TASK_COMPLETION_BEHAVIOR_UNSPECIFIED Behavior unspecified. Will default to TASK_SATISFIED.
TASK_SATISFIED The user task should be completed successfully.
TASK_REJECTED The user task should be rejected.

Evaluation.Scenario.UserGoalBehavior

The expected behavior of the user goal. This is used to determine whether the scenario is successful.

Enums
USER_GOAL_BEHAVIOR_UNSPECIFIED Behavior unspecified. Will default to USER_GOAL_SATISFIED.
USER_GOAL_SATISFIED The user goal should be completed successfully.
USER_GOAL_REJECTED The user goal should be rejected.
USER_GOAL_IGNORED Ignore the user goal status.
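The documented defaults for the two UNSPECIFIED values can be mirrored on the client. This is a sketch only: the enum members come from the tables above, while the Python classes and the effective_* helpers are hypothetical and not part of any published client library.

```python
from enum import Enum


class TaskCompletionBehavior(str, Enum):
    TASK_COMPLETION_BEHAVIOR_UNSPECIFIED = "TASK_COMPLETION_BEHAVIOR_UNSPECIFIED"
    TASK_SATISFIED = "TASK_SATISFIED"
    TASK_REJECTED = "TASK_REJECTED"


class UserGoalBehavior(str, Enum):
    USER_GOAL_BEHAVIOR_UNSPECIFIED = "USER_GOAL_BEHAVIOR_UNSPECIFIED"
    USER_GOAL_SATISFIED = "USER_GOAL_SATISFIED"
    USER_GOAL_REJECTED = "USER_GOAL_REJECTED"
    USER_GOAL_IGNORED = "USER_GOAL_IGNORED"


def effective_task_behavior(
    value: str = "TASK_COMPLETION_BEHAVIOR_UNSPECIFIED",
) -> TaskCompletionBehavior:
    """Resolve the documented default: unspecified means TASK_SATISFIED."""
    behavior = TaskCompletionBehavior(value)
    if behavior is TaskCompletionBehavior.TASK_COMPLETION_BEHAVIOR_UNSPECIFIED:
        return TaskCompletionBehavior.TASK_SATISFIED
    return behavior


def effective_user_goal_behavior(
    value: str = "USER_GOAL_BEHAVIOR_UNSPECIFIED",
) -> UserGoalBehavior:
    """Resolve the documented default: unspecified means USER_GOAL_SATISFIED."""
    behavior = UserGoalBehavior(value)
    if behavior is UserGoalBehavior.USER_GOAL_BEHAVIOR_UNSPECIFIED:
        return UserGoalBehavior.USER_GOAL_SATISFIED
    return behavior
```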

Methods

create

Creates an evaluation.

delete

Deletes an evaluation.

get

Gets details of the specified evaluation.

list

Lists all evaluations in the given app.

patch

Updates an evaluation.