MCP Tools Reference: ces.googleapis.com

Tool: create_evaluation_dataset

Creates a new evaluation dataset.

The following sample demonstrates how to use curl to invoke the create_evaluation_dataset MCP tool.

Curl Request
                  
curl --location 'https://ces.[REGION].rep.googleapis.com/mcp' \
--header 'content-type: application/json' \
--header 'accept: application/json, text/event-stream' \
--data '{
  "method": "tools/call",
  "params": {
    "name": "create_evaluation_dataset",
    "arguments": {
      // provide these details according to the tool's MCP specification
    }
  },
  "jsonrpc": "2.0",
  "id": 1
}'
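
The placeholder comment in the sample above is not valid JSON, so a real call needs actual arguments in its place. A minimal Python sketch of the same request, assuming a hypothetical region (us), project, app, and display name. Authentication is left out because this page does not describe it, and the request is built but not sent:

```python
import json
import urllib.request

# Hypothetical values for illustration only; substitute your own.
REGION = "us"
PARENT = "projects/my-project/locations/us/apps/my-app"


def build_request(region: str, parent: str, display_name: str) -> urllib.request.Request:
    """Build (but do not send) the tools/call request for create_evaluation_dataset."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "create_evaluation_dataset",
            "arguments": {
                "parent": parent,
                "evaluationDataset": {"displayName": display_name},
            },
        },
    }
    return urllib.request.Request(
        url=f"https://ces.{region}.rep.googleapis.com/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_request(REGION, PARENT, "Regression set")
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the result to urllib.request.urlopen would send it, once whatever credentials your environment requires have been added as headers.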
                

Input Schema

Request message for EvaluationService.CreateEvaluationDataset.

CreateEvaluationDatasetRequest

JSON representation
{
  "parent": string,
  "evaluationDatasetId": string,
  "evaluationDataset": {
    object (EvaluationDataset)
  }
}
Fields
parent

string

Required. The app to create the evaluation dataset in. Format: projects/{project}/locations/{location}/apps/{app}

evaluationDatasetId

string

Optional. The ID to use for the evaluation dataset, which will become the final component of the evaluation dataset's resource name. If not provided, a unique ID will be automatically assigned to the evaluation dataset.

evaluationDataset

object (EvaluationDataset)

Required. The evaluation dataset to create.
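
The three request fields above can be checked client-side before the call is made. The sketch below is one way to do that. The helper name build_arguments is hypothetical, and the rule that each path segment may contain any character except "/" is an assumption, since this page gives only the overall format:

```python
import re

# Derived from the documented format of `parent`. The characters allowed
# inside each segment are an assumption (anything except "/").
PARENT_RE = re.compile(r"projects/[^/]+/locations/[^/]+/apps/[^/]+")


def build_arguments(parent, display_name, dataset_id=None):
    """Return the `arguments` object for create_evaluation_dataset."""
    if not PARENT_RE.fullmatch(parent):
        raise ValueError(f"parent does not match the documented format: {parent!r}")
    if not display_name:
        raise ValueError("evaluationDataset.displayName is required")
    arguments = {
        "parent": parent,
        "evaluationDataset": {"displayName": display_name},
    }
    # evaluationDatasetId is optional; omit it to let the service assign an ID.
    if dataset_id is not None:
        arguments["evaluationDatasetId"] = dataset_id
    return arguments


print(build_arguments("projects/p/locations/us/apps/a", "Smoke tests", "smoke"))
```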

EvaluationDataset

JSON representation
{
  "name": string,
  "displayName": string,
  "evaluations": [
    string
  ],
  "createTime": string,
  "updateTime": string,
  "etag": string,
  "createdBy": string,
  "lastUpdatedBy": string,
  "aggregatedMetrics": {
    object (AggregatedMetrics)
  }
}
Fields
name

string

Identifier. The unique identifier of this evaluation dataset. Format: projects/{project}/locations/{location}/apps/{app}/evaluationDatasets/{evaluationDataset}

displayName

string

Required. User-defined display name of the evaluation dataset. Unique within an App.

evaluations[]

string

Optional. Evaluations that are included in this dataset.

createTime

string (Timestamp format)

Output only. Timestamp when the evaluation dataset was created.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

updateTime

string (Timestamp format)

Output only. Timestamp when the evaluation dataset was last updated.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

etag

string

Output only. Etag used to ensure the object hasn't changed during a read-modify-write operation. If the etag is empty, the update will overwrite any concurrent changes.

createdBy

string

Output only. The user who created the evaluation dataset.

lastUpdatedBy

string

Output only. The user who last updated the evaluation dataset.

aggregatedMetrics

object (AggregatedMetrics)

Output only. The aggregated metrics for this evaluation dataset across all runs.
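
Because most EvaluationDataset fields are marked Output only, a dataset read from one app needs trimming before it is reused as the evaluationDataset argument of a create call. A minimal sketch with a hypothetical helper, to_create_body. Dropping name as well is an assumption, based on the resource name being derived from parent and evaluationDatasetId. Whether the service rejects or ignores output-only fields in a request is not stated on this page:

```python
# Fields this page marks as "Output only" on EvaluationDataset.
OUTPUT_ONLY_FIELDS = frozenset(
    {"createTime", "updateTime", "etag", "createdBy", "lastUpdatedBy", "aggregatedMetrics"}
)


def to_create_body(dataset: dict) -> dict:
    """Keep only the fields a caller sets when creating a dataset."""
    body = {
        key: value
        for key, value in dataset.items()
        # Assumption: `name` is also dropped, since the new resource name
        # comes from `parent` plus `evaluationDatasetId`.
        if key not in OUTPUT_ONLY_FIELDS and key != "name"
    }
    if not body.get("displayName"):
        raise ValueError("displayName is required")
    return body


# Hypothetical dataset, shaped like the JSON representation above.
existing = {
    "name": "projects/p/locations/us/apps/a/evaluationDatasets/d1",
    "displayName": "Smoke tests",
    "evaluations": ["projects/p/locations/us/apps/a/evaluations/e1"],
    "createTime": "2014-10-02T15:01:23Z",
    "etag": "abc",
}
print(to_create_body(existing))
```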

Timestamp

JSON representation
{
  "seconds": string,
  "nanos": integer
}
Fields
seconds

string (int64 format)

Represents seconds of UTC time since Unix epoch 1970-01-01T00:00:00Z. Must be between -62135596800 and 253402300799 inclusive (which corresponds to 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z).

nanos

integer

Non-negative fractions of a second at nanosecond resolution. This field is the nanosecond portion of the timestamp, not an alternative to seconds. Negative second values with fractions must still have non-negative nanos values that count forward in time. Must be between 0 and 999,999,999 inclusive.
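
This page shows timestamps both as RFC 3339 strings and as a seconds/nanos pair. The sketch below converts the string form into the pair using only the Python standard library. The function name is hypothetical, and the parsing is done by hand so that all nine fractional digits are preserved:

```python
import re
from datetime import datetime, timedelta, timezone

_RFC3339 = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})[Tt](\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d{1,9}))?([Zz]|[+-]\d{2}:\d{2})"
)
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def rfc3339_to_seconds_nanos(text: str) -> tuple:
    """Convert an RFC 3339 timestamp into (seconds since epoch, nanos)."""
    match = _RFC3339.fullmatch(text)
    if match is None:
        raise ValueError(f"not an RFC 3339 timestamp: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    fraction, offset = match.group(7), match.group(8)
    moment = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    if offset not in ("Z", "z"):
        # A "+05:30" offset means local time is ahead of UTC, so subtract it.
        sign = 1 if offset[0] == "+" else -1
        moment -= sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6]))
    seconds = (moment - _EPOCH) // timedelta(seconds=1)
    # Pad the fraction to nine digits so ".5" becomes 500000000 nanos.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return seconds, nanos


for example in (
    "2014-10-02T15:01:23Z",
    "2014-10-02T15:01:23.045123456Z",
    "2014-10-02T15:01:23+05:30",
):
    print(example, rfc3339_to_seconds_nanos(example))
```

Integer division of the two timedelta values keeps seconds exact and rounds toward negative infinity, which keeps nanos non-negative for instants before the epoch, as the field description requires.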

AggregatedMetrics

JSON representation
{
  "metricsByAppVersion": [
    {
      object (MetricsByAppVersion)
    }
  ]
}
Fields
metricsByAppVersion[]

object (MetricsByAppVersion)

Output only. Aggregated metrics, grouped by app version ID.

MetricsByAppVersion

JSON representation
{
  "appVersionId": string,
  "toolMetrics": [
    {
      object (ToolMetrics)
    }
  ],
  "semanticSimilarityMetrics": [
    {
      object (SemanticSimilarityMetrics)
    }
  ],
  "hallucinationMetrics": [
    {
      object (HallucinationMetrics)
    }
  ],
  "toolCallLatencyMetrics": [
    {
      object (ToolCallLatencyMetrics)
    }
  ],
  "turnLatencyMetrics": [
    {
      object (TurnLatencyMetrics)
    }
  ],
  "passCount": integer,
  "failCount": integer,
  "metricsByTurn": [
    {
      object (MetricsByTurn)
    }
  ]
}
Fields
appVersionId

string

Output only. The app version ID.

toolMetrics[]

object (ToolMetrics)

Output only. Metrics for each tool within this app version.

semanticSimilarityMetrics[]

object (SemanticSimilarityMetrics)

Output only. Metrics for semantic similarity within this app version.

hallucinationMetrics[]

object (HallucinationMetrics)

Output only. Metrics for hallucination within this app version.

toolCallLatencyMetrics[]

object (ToolCallLatencyMetrics)

Output only. Metrics for tool call latency within this app version.

turnLatencyMetrics[]

object (TurnLatencyMetrics)

Output only. Metrics for turn latency within this app version.

passCount

integer

Output only. The number of times the evaluation passed.

failCount

integer

Output only. The number of times the evaluation failed.

metricsByTurn[]

object (MetricsByTurn)

Output only. Metrics aggregated per turn within this app version.

ToolMetrics

JSON representation
{
  "tool": string,
  "passCount": integer,
  "failCount": integer
}
Fields
tool

string

Output only. The name of the tool.

passCount

integer

Output only. The number of times the tool passed.

failCount

integer

Output only. The number of times the tool failed.
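
The two counters can be combined into a pass rate on the client. This ratio is a derived quantity that this page does not define; the sketch assumes it is passCount / (passCount + failCount) and treats a missing counter as zero:

```python
from typing import Optional


def pass_rate(metrics: dict) -> Optional[float]:
    """Return passCount / (passCount + failCount), or None when there were no runs."""
    passed = metrics.get("passCount", 0)
    failed = metrics.get("failCount", 0)
    total = passed + failed
    return passed / total if total else None


# Hypothetical ToolMetrics values.
print(pass_rate({"tool": "lookup_order", "passCount": 3, "failCount": 1}))
print(pass_rate({"tool": "unused_tool"}))
```

The same helper works for the passCount and failCount fields on MetricsByAppVersion, which have the same shape.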

SemanticSimilarityMetrics

JSON representation
{
  "score": number
}
Fields
score

number

Output only. The average semantic similarity score (0 to 4).

HallucinationMetrics

JSON representation
{
  "score": number
}
Fields
score

number

Output only. The average hallucination score (0 to 1).

ToolCallLatencyMetrics

JSON representation
{
  "tool": string,
  "averageLatency": string
}
Fields
tool

string

Output only. The name of the tool.

averageLatency

string (Duration format)

Output only. The average latency of the tool calls.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

Duration

JSON representation
{
  "seconds": string,
  "nanos": integer
}
Fields
seconds

string (int64 format)

Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years

nanos

integer

Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 seconds field and a positive or negative nanos field. For durations of one second or more, a non-zero value for the nanos field must be of the same sign as the seconds field. Must be from -999,999,999 to +999,999,999 inclusive.
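
The string form ("3.5s") and the seconds/nanos form describe the same value. A minimal sketch of the conversion with a hypothetical helper, parse_duration, applying the sign and range rules given above:

```python
import re

_DURATION = re.compile(r"(-?)(\d+)(?:\.(\d{1,9}))?s")
_MAX_SECONDS = 315_576_000_000  # bound stated for the seconds field


def parse_duration(text: str) -> tuple:
    """Convert a duration string such as "3.5s" into (seconds, nanos)."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"not a duration string: {text!r}")
    sign = -1 if match.group(1) else 1
    # Pad the fraction to nine digits so ".5" becomes 500000000 nanos.
    magnitude_nanos = int((match.group(3) or "").ljust(9, "0"))
    seconds = sign * int(match.group(2))
    nanos = sign * magnitude_nanos  # nanos carries the same sign as seconds
    if abs(seconds) > _MAX_SECONDS:
        raise ValueError(f"duration out of range: {text!r}")
    return seconds, nanos


def duration_to_float(text: str) -> float:
    """Convenience conversion to fractional seconds (loses nanosecond exactness)."""
    seconds, nanos = parse_duration(text)
    return seconds + nanos / 1_000_000_000


print(parse_duration("3.5s"))
print(parse_duration("-0.25s"))
print(duration_to_float("3.5s"))
```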

TurnLatencyMetrics

JSON representation
{
  "averageLatency": string
}
Fields
averageLatency

string (Duration format)

Output only. The average latency of the turns.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

MetricsByTurn

JSON representation
{
  "turnIndex": integer,
  "toolMetrics": [
    {
      object (ToolMetrics)
    }
  ],
  "semanticSimilarityMetrics": [
    {
      object (SemanticSimilarityMetrics)
    }
  ],
  "hallucinationMetrics": [
    {
      object (HallucinationMetrics)
    }
  ],
  "toolCallLatencyMetrics": [
    {
      object (ToolCallLatencyMetrics)
    }
  ],
  "turnLatencyMetrics": [
    {
      object (TurnLatencyMetrics)
    }
  ]
}
Fields
turnIndex

integer

Output only. The turn index (0-based).

toolMetrics[]

object (ToolMetrics)

Output only. Metrics for each tool within this turn.

semanticSimilarityMetrics[]

object (SemanticSimilarityMetrics)

Output only. Metrics for semantic similarity within this turn.

hallucinationMetrics[]

object (HallucinationMetrics)

Output only. Metrics for hallucination within this turn.

toolCallLatencyMetrics[]

object (ToolCallLatencyMetrics)

Output only. Metrics for tool call latency within this turn.

turnLatencyMetrics[]

object (TurnLatencyMetrics)

Output only. Metrics for turn latency within this turn.
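
AggregatedMetrics nests three levels deep: app version, then turn, then metric list. The sketch below flattens the per-turn scores into rows that are easier to sort or tabulate. The helper name and the sample values are hypothetical, and missing keys are treated as empty:

```python
def flatten_turn_scores(aggregated: dict) -> list:
    """Return (appVersionId, turnIndex, metric, score) rows from AggregatedMetrics."""
    rows = []
    for version in aggregated.get("metricsByAppVersion", []):
        version_id = version.get("appVersionId", "")
        for turn in version.get("metricsByTurn", []):
            turn_index = turn.get("turnIndex", 0)
            for entry in turn.get("semanticSimilarityMetrics", []):
                rows.append((version_id, turn_index, "semanticSimilarity", entry.get("score")))
            for entry in turn.get("hallucinationMetrics", []):
                rows.append((version_id, turn_index, "hallucination", entry.get("score")))
    return rows


# Hypothetical sample, shaped like the JSON representations above.
sample = {
    "metricsByAppVersion": [
        {
            "appVersionId": "v1",
            "metricsByTurn": [
                {
                    "turnIndex": 0,
                    "semanticSimilarityMetrics": [{"score": 3.5}],
                    "hallucinationMetrics": [{"score": 0.1}],
                },
                {"turnIndex": 1, "hallucinationMetrics": [{"score": 0.4}]},
            ],
        }
    ]
}
for row in flatten_turn_scores(sample):
    print(row)
```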

Output Schema

An evaluation dataset represents a set of evaluations that are grouped together based on shared tags.

EvaluationDataset

JSON representation
{
  "name": string,
  "displayName": string,
  "evaluations": [
    string
  ],
  "createTime": string,
  "updateTime": string,
  "etag": string,
  "createdBy": string,
  "lastUpdatedBy": string,
  "aggregatedMetrics": {
    object (AggregatedMetrics)
  }
}
Fields
name

string

Identifier. The unique identifier of this evaluation dataset. Format: projects/{project}/locations/{location}/apps/{app}/evaluationDatasets/{evaluationDataset}

displayName

string

Required. User-defined display name of the evaluation dataset. Unique within an App.

evaluations[]

string

Optional. Evaluations that are included in this dataset.

createTime

string (Timestamp format)

Output only. Timestamp when the evaluation dataset was created.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

updateTime

string (Timestamp format)

Output only. Timestamp when the evaluation dataset was last updated.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

etag

string

Output only. Etag used to ensure the object hasn't changed during a read-modify-write operation. If the etag is empty, the update will overwrite any concurrent changes.

createdBy

string

Output only. The user who created the evaluation dataset.

lastUpdatedBy

string

Output only. The user who last updated the evaluation dataset.

aggregatedMetrics

object (AggregatedMetrics)

Output only. The aggregated metrics for this evaluation dataset across all runs.

Timestamp

JSON representation
{
  "seconds": string,
  "nanos": integer
}
Fields
seconds

string (int64 format)

Represents seconds of UTC time since Unix epoch 1970-01-01T00:00:00Z. Must be between -62135596800 and 253402300799 inclusive (which corresponds to 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z).

nanos

integer

Non-negative fractions of a second at nanosecond resolution. This field is the nanosecond portion of the timestamp, not an alternative to seconds. Negative second values with fractions must still have non-negative nanos values that count forward in time. Must be between 0 and 999,999,999 inclusive.

AggregatedMetrics

JSON representation
{
  "metricsByAppVersion": [
    {
      object (MetricsByAppVersion)
    }
  ]
}
Fields
metricsByAppVersion[]

object (MetricsByAppVersion)

Output only. Aggregated metrics, grouped by app version ID.

MetricsByAppVersion

JSON representation
{
  "appVersionId": string,
  "toolMetrics": [
    {
      object (ToolMetrics)
    }
  ],
  "semanticSimilarityMetrics": [
    {
      object (SemanticSimilarityMetrics)
    }
  ],
  "hallucinationMetrics": [
    {
      object (HallucinationMetrics)
    }
  ],
  "toolCallLatencyMetrics": [
    {
      object (ToolCallLatencyMetrics)
    }
  ],
  "turnLatencyMetrics": [
    {
      object (TurnLatencyMetrics)
    }
  ],
  "passCount": integer,
  "failCount": integer,
  "metricsByTurn": [
    {
      object (MetricsByTurn)
    }
  ]
}
Fields
appVersionId

string

Output only. The app version ID.

toolMetrics[]

object (ToolMetrics)

Output only. Metrics for each tool within this app version.

semanticSimilarityMetrics[]

object (SemanticSimilarityMetrics)

Output only. Metrics for semantic similarity within this app version.

hallucinationMetrics[]

object (HallucinationMetrics)

Output only. Metrics for hallucination within this app version.

toolCallLatencyMetrics[]

object (ToolCallLatencyMetrics)

Output only. Metrics for tool call latency within this app version.

turnLatencyMetrics[]

object (TurnLatencyMetrics)

Output only. Metrics for turn latency within this app version.

passCount

integer

Output only. The number of times the evaluation passed.

failCount

integer

Output only. The number of times the evaluation failed.

metricsByTurn[]

object (MetricsByTurn)

Output only. Metrics aggregated per turn within this app version.

ToolMetrics

JSON representation
{
  "tool": string,
  "passCount": integer,
  "failCount": integer
}
Fields
tool

string

Output only. The name of the tool.

passCount

integer

Output only. The number of times the tool passed.

failCount

integer

Output only. The number of times the tool failed.

SemanticSimilarityMetrics

JSON representation
{
  "score": number
}
Fields
score

number

Output only. The average semantic similarity score (0 to 4).

HallucinationMetrics

JSON representation
{
  "score": number
}
Fields
score

number

Output only. The average hallucination score (0 to 1).

ToolCallLatencyMetrics

JSON representation
{
  "tool": string,
  "averageLatency": string
}
Fields
tool

string

Output only. The name of the tool.

averageLatency

string (Duration format)

Output only. The average latency of the tool calls.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

Duration

JSON representation
{
  "seconds": string,
  "nanos": integer
}
Fields
seconds

string (int64 format)

Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years

nanos

integer

Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 seconds field and a positive or negative nanos field. For durations of one second or more, a non-zero value for the nanos field must be of the same sign as the seconds field. Must be from -999,999,999 to +999,999,999 inclusive.

TurnLatencyMetrics

JSON representation
{
  "averageLatency": string
}
Fields
averageLatency

string (Duration format)

Output only. The average latency of the turns.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

MetricsByTurn

JSON representation
{
  "turnIndex": integer,
  "toolMetrics": [
    {
      object (ToolMetrics)
    }
  ],
  "semanticSimilarityMetrics": [
    {
      object (SemanticSimilarityMetrics)
    }
  ],
  "hallucinationMetrics": [
    {
      object (HallucinationMetrics)
    }
  ],
  "toolCallLatencyMetrics": [
    {
      object (ToolCallLatencyMetrics)
    }
  ],
  "turnLatencyMetrics": [
    {
      object (TurnLatencyMetrics)
    }
  ]
}
Fields
turnIndex

integer

Output only. The turn index (0-based).

toolMetrics[]

object (ToolMetrics)

Output only. Metrics for each tool within this turn.

semanticSimilarityMetrics[]

object (SemanticSimilarityMetrics)

Output only. Metrics for semantic similarity within this turn.

hallucinationMetrics[]

object (HallucinationMetrics)

Output only. Metrics for hallucination within this turn.

toolCallLatencyMetrics[]

object (ToolCallLatencyMetrics)

Output only. Metrics for tool call latency within this turn.

turnLatencyMetrics[]

object (TurnLatencyMetrics)

Output only. Metrics for turn latency within this turn.
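
This page does not show the JSON-RPC envelope that carries the EvaluationDataset back to the caller. The sketch below assumes the general MCP tools/call result shape (an optional structuredContent object, or a content list whose text block holds the JSON) and an already-decoded JSON body. Because the request accepts text/event-stream, a streamed reply would need to be unwrapped first. The helper name is hypothetical:

```python
import json


def extract_dataset(response: dict) -> dict:
    """Pull the EvaluationDataset out of a decoded tools/call response."""
    if "error" in response:
        raise RuntimeError(f"JSON-RPC error: {response['error']}")
    result = response.get("result", {})
    if result.get("isError"):
        raise RuntimeError(f"tool reported an error: {result.get('content')}")
    # Assumption: structured output, when present, is the dataset itself.
    if isinstance(result.get("structuredContent"), dict):
        return result["structuredContent"]
    # Assumption: otherwise the first text block holds the dataset as JSON.
    for block in result.get("content", []):
        if block.get("type") == "text":
            return json.loads(block["text"])
    raise ValueError("no dataset found in tools/call result")


# Hypothetical response for illustration.
reply = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [
            {"type": "text", "text": json.dumps({"displayName": "Smoke tests", "etag": "abc"})}
        ]
    },
}
print(extract_dataset(reply)["displayName"])
```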

Tool Annotations

Destructive Hint: ❌ | Idempotent Hint: ❌ | Read Only Hint: ❌ | Open World Hint: ❌
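
A client can use these hints to decide how cautiously to call the tool. In the sketch below the key names follow the MCP tool annotation fields, while the retry rule itself is an assumption of this example and not behavior documented on this page:

```python
# The four hints from the line above, all false for this tool.
ANNOTATIONS = {
    "destructiveHint": False,
    "idempotentHint": False,
    "readOnlyHint": False,
    "openWorldHint": False,
}


def safe_to_retry_blindly(annotations: dict) -> bool:
    """Assumed policy: only read-only or idempotent tools are retried automatically."""
    return bool(annotations.get("readOnlyHint") or annotations.get("idempotentHint"))


print(safe_to_retry_blindly(ANNOTATIONS))
```

Under this policy a timed-out create_evaluation_dataset call is not repeated automatically, since the tool is not marked idempotent and a second call might create a second dataset.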