Tool: create_evaluation_dataset
Creates a new evaluation dataset.
The following sample demonstrates how to use curl to invoke the create_evaluation_dataset MCP tool.
```shell
curl --location 'https://ces.[REGION].rep.googleapis.com/mcp' \
  --header 'content-type: application/json' \
  --header 'accept: application/json, text/event-stream' \
  --data '{
    "method": "tools/call",
    "params": {
      "name": "create_evaluation_dataset",
      "arguments": {
        // provide these details according to the MCP specification for this tool
      }
    },
    "jsonrpc": "2.0",
    "id": 1
  }'
```
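The same call can be made programmatically. Below is a minimal Python sketch using only the standard library; the region value and the argument payload are placeholders, and any authentication headers your deployment requires are omitted.

```python
import json
import urllib.request


def build_payload(arguments: dict) -> dict:
    """JSON-RPC 2.0 envelope for a tools/call invocation of this tool."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": "create_evaluation_dataset", "arguments": arguments},
    }


def call_tool(region: str, arguments: dict) -> bytes:
    """POST the envelope to the MCP endpoint. `region` is a placeholder."""
    req = urllib.request.Request(
        f"https://ces.{region}.rep.googleapis.com/mcp",
        data=json.dumps(build_payload(arguments)).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "accept": "application/json, text/event-stream",
        },
    )
    with urllib.request.urlopen(req) as resp:  # network call, not exercised here
        return resp.read()
```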
Input Schema
Request message for EvaluationService.CreateEvaluationDataset.
CreateEvaluationDatasetRequest
JSON representation
```
{
  "parent": string,
  "evaluationDatasetId": string,
  "evaluationDataset": {
    object (EvaluationDataset)
  }
}
```
| Fields | |
|---|---|
| `parent` | Required. The app to create the evaluation dataset for. Format: |
| `evaluationDatasetId` | Optional. The ID to use for the evaluation dataset, which will become the final component of the evaluation dataset's resource name. If not provided, a unique ID is assigned automatically. |
| `evaluationDataset` | Required. The evaluation dataset to create. |
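Putting the input schema together, a hypothetical `arguments` object might look like the following. The exact `parent` resource format is truncated in the table above, so the value here is illustrative only.

```python
# Hypothetical arguments for create_evaluation_dataset. The exact `parent`
# resource format is not shown in the docs, so this value is illustrative only.
arguments = {
    "parent": "apps/my-app",                     # illustrative; use the real resource format
    "evaluationDatasetId": "regression-suite",   # optional; auto-assigned if omitted
    "evaluationDataset": {
        "displayName": "Regression suite",       # required; unique within an App
        "evaluations": [],                       # optional list of evaluation resource names
    },
}
```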
EvaluationDataset
JSON representation
```
{
  "name": string,
  "displayName": string,
  "evaluations": [
    string
  ],
  "createTime": string,
  "updateTime": string,
  "etag": string,
  "createdBy": string,
  "lastUpdatedBy": string,
  "aggregatedMetrics": {
    object (AggregatedMetrics)
  }
}
```
| Fields | |
|---|---|
| `name` | Identifier. The unique identifier of this evaluation dataset. Format: |
| `displayName` | Required. User-defined display name of the evaluation dataset. Unique within an App. |
| `evaluations[]` | Optional. Evaluations that are included in this dataset. |
| `createTime` | Output only. Timestamp when the evaluation dataset was created. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`. |
| `updateTime` | Output only. Timestamp when the evaluation dataset was last updated. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: `"2014-10-02T15:01:23Z"`, `"2014-10-02T15:01:23.045123456Z"` or `"2014-10-02T15:01:23+05:30"`. |
| `etag` | Output only. Etag used to ensure the object hasn't changed during a read-modify-write operation. If the etag is empty, the update will overwrite any concurrent changes. |
| `createdBy` | Output only. The user who created the evaluation dataset. |
| `lastUpdatedBy` | Output only. The user who last updated the evaluation dataset. |
| `aggregatedMetrics` | Output only. The aggregated metrics for this evaluation dataset across all runs. |
Timestamp
JSON representation
```
{
  "seconds": string,
  "nanos": integer
}
```
| Fields | |
|---|---|
| `seconds` | Represents seconds of UTC time since Unix epoch 1970-01-01T00:00:00Z. Must be between -62135596800 and 253402300799 inclusive (which corresponds to 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z). |
| `nanos` | Non-negative fractions of a second at nanosecond resolution. This field is the nanosecond portion of the duration, not an alternative to seconds. Negative second values with fractions must still have non-negative nanos values that count forward in time. Must be between 0 and 999,999,999 inclusive. |
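The `seconds`/`nanos` pair above maps to the RFC 3339 strings used by fields such as `createTime` and `updateTime`. A minimal sketch of the Z-normalized rendering rule (0, 3, 6, or 9 fractional digits), assuming a non-negative timestamp:

```python
from datetime import datetime, timezone


def timestamp_to_rfc3339(seconds: int, nanos: int = 0) -> str:
    """Render a Timestamp as a Z-normalized RFC 3339 string with
    0, 3, 6, or 9 fractional digits, as described above."""
    base = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    if nanos == 0:
        frac = ""
    elif nanos % 1_000_000 == 0:      # milliseconds suffice
        frac = ".%03d" % (nanos // 1_000_000)
    elif nanos % 1_000 == 0:          # microseconds suffice
        frac = ".%06d" % (nanos // 1_000)
    else:                             # full nanosecond precision
        frac = ".%09d" % nanos
    return base + frac + "Z"
```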
AggregatedMetrics
JSON representation
```
{
  "metricsByAppVersion": [
    {
      object (MetricsByAppVersion)
    }
  ]
}
```

| Fields | |
|---|---|
| `metricsByAppVersion[]` | Output only. Aggregated metrics, grouped by app version ID. |
MetricsByAppVersion
JSON representation
```
{
  "appVersionId": string,
  "toolMetrics": [
    {
      object (ToolMetrics)
    }
  ],
  "semanticSimilarityMetrics": [
    {
      object (SemanticSimilarityMetrics)
    }
  ],
  "hallucinationMetrics": [
    {
      object (HallucinationMetrics)
    }
  ],
  "toolCallLatencyMetrics": [
    {
      object (ToolCallLatencyMetrics)
    }
  ],
  "turnLatencyMetrics": [
    {
      object (TurnLatencyMetrics)
    }
  ],
  "passCount": integer,
  "failCount": integer,
  "metricsByTurn": [
    {
      object (MetricsByTurn)
    }
  ]
}
```

| Fields | |
|---|---|
| `appVersionId` | Output only. The app version ID. |
| `toolMetrics[]` | Output only. Metrics for each tool within this app version. |
| `semanticSimilarityMetrics[]` | Output only. Metrics for semantic similarity within this app version. |
| `hallucinationMetrics[]` | Output only. Metrics for hallucination within this app version. |
| `toolCallLatencyMetrics[]` | Output only. Metrics for tool call latency within this app version. |
| `turnLatencyMetrics[]` | Output only. Metrics for turn latency within this app version. |
| `passCount` | Output only. The number of times the evaluation passed. |
| `failCount` | Output only. The number of times the evaluation failed. |
| `metricsByTurn[]` | Output only. Metrics aggregated per turn within this app version. |
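Given an `AggregatedMetrics` object in the JSON shape above, the per-version pass rate can be derived from `passCount` and `failCount`. A sketch (field names follow the schema above; `None` marks versions with no recorded runs):

```python
def summarize_by_version(aggregated: dict) -> dict:
    """Map each app version ID to its overall pass rate (None when no runs)."""
    out = {}
    for entry in aggregated.get("metricsByAppVersion", []):
        passes = entry.get("passCount", 0)
        total = passes + entry.get("failCount", 0)
        out[entry["appVersionId"]] = passes / total if total else None
    return out
```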
ToolMetrics
JSON representation
```
{
  "tool": string,
  "passCount": integer,
  "failCount": integer
}
```

| Fields | |
|---|---|
| `tool` | Output only. The name of the tool. |
| `passCount` | Output only. The number of times the tool passed. |
| `failCount` | Output only. The number of times the tool failed. |
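A list of `ToolMetrics` entries in this shape can be reduced to per-tool pass rates; a small sketch under the same field-name assumptions:

```python
def tool_pass_rates(tool_metrics: list) -> dict:
    """Per-tool pass rate from ToolMetrics entries (None when no calls)."""
    rates = {}
    for m in tool_metrics:
        passes = m.get("passCount", 0)
        total = passes + m.get("failCount", 0)
        rates[m["tool"]] = passes / total if total else None
    return rates
```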
SemanticSimilarityMetrics
JSON representation
```
{
  "score": number
}
```

| Fields | |
|---|---|
| `score` | Output only. The average semantic similarity score (0-4). |
HallucinationMetrics
JSON representation
```
{
  "score": number
}
```

| Fields | |
|---|---|
| `score` | Output only. The average hallucination score (0 to 1). |
ToolCallLatencyMetrics
JSON representation
```
{
  "tool": string,
  "averageLatency": string
}
```

| Fields | |
|---|---|
| `tool` | Output only. The name of the tool. |
| `averageLatency` | Output only. The average latency of the tool calls. A duration in seconds with up to nine fractional digits, ending with 's'. Example: `"3.5s"`. |
Duration
JSON representation
```
{
  "seconds": string,
  "nanos": integer
}
```

| Fields | |
|---|---|
| `seconds` | Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years |
| `nanos` | Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 `seconds` field and a positive or negative `nanos` field. For durations of one second or more, a non-zero value for the `nanos` field must be of the same sign as the `seconds` field. Must be from -999,999,999 to +999,999,999 inclusive. |
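The JSON form of a Duration is the 's'-suffixed decimal string described above (e.g. `"3.5s"`). A sketch of splitting that string back into the `seconds`/`nanos` pair:

```python
def parse_duration(s: str) -> tuple:
    """Split a JSON Duration like '3.5s' into a (seconds, nanos) pair.
    Assumes the 's'-suffixed decimal form described above."""
    if not s.endswith("s"):
        raise ValueError("Duration must end with 's'")
    value = s[:-1]
    sign = -1 if value.startswith("-") else 1
    digits = value.lstrip("+-")
    whole, _, frac = digits.partition(".")
    seconds = sign * int(whole or "0")
    # Right-pad the fractional part to nine digits (nanosecond resolution).
    nanos = sign * int((frac + "000000000")[:9]) if frac else 0
    return seconds, nanos
```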
TurnLatencyMetrics
JSON representation
```
{
  "averageLatency": string
}
```

| Fields | |
|---|---|
| `averageLatency` | Output only. The average latency of the turns. A duration in seconds with up to nine fractional digits, ending with 's'. Example: `"3.5s"`. |
MetricsByTurn
JSON representation
```
{
  "turnIndex": integer,
  "toolMetrics": [
    {
      object (ToolMetrics)
    }
  ],
  "semanticSimilarityMetrics": [
    {
      object (SemanticSimilarityMetrics)
    }
  ],
  "hallucinationMetrics": [
    {
      object (HallucinationMetrics)
    }
  ],
  "toolCallLatencyMetrics": [
    {
      object (ToolCallLatencyMetrics)
    }
  ],
  "turnLatencyMetrics": [
    {
      object (TurnLatencyMetrics)
    }
  ]
}
```

| Fields | |
|---|---|
| `turnIndex` | Output only. The turn index (0-based). |
| `toolMetrics[]` | Output only. Metrics for each tool within this turn. |
| `semanticSimilarityMetrics[]` | Output only. Metrics for semantic similarity within this turn. |
| `hallucinationMetrics[]` | Output only. Metrics for hallucination within this turn. |
| `toolCallLatencyMetrics[]` | Output only. Metrics for tool call latency within this turn. |
| `turnLatencyMetrics[]` | Output only. Metrics for turn latency within this turn. |
Output Schema
An evaluation dataset represents a set of evaluations that are grouped together based on shared tags.
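Because the request advertises `accept: application/json, text/event-stream`, the response may arrive either as a plain JSON body or as an event stream. A sketch of pulling the JSON-RPC payloads out of an SSE body, assuming each event carries a single `data:` line of JSON:

```python
import json


def parse_sse_json(body: str) -> list:
    """Extract JSON payloads from a text/event-stream response body.
    Assumes each event carries exactly one `data:` line of JSON."""
    events = []
    for line in body.splitlines():
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events
```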
EvaluationDataset
The response body is the `EvaluationDataset` message, identical to the `EvaluationDataset` defined under Input Schema above, together with its nested `Timestamp`, `AggregatedMetrics`, `MetricsByAppVersion`, `ToolMetrics`, `SemanticSimilarityMetrics`, `HallucinationMetrics`, `ToolCallLatencyMetrics`, `Duration`, `TurnLatencyMetrics`, and `MetricsByTurn` messages.
Tool Annotations
Destructive Hint: ❌ | Idempotent Hint: ❌ | Read Only Hint: ❌ | Open World Hint: ❌