MCP Tools Reference: us-central1-aiplatform.googleapis.com

Tool: create_endpoint

Creates an Endpoint.

The following sample demonstrates how to use curl to invoke the create_endpoint MCP tool.

Curl Request
                  
curl --location 'https://us-central1-aiplatform.googleapis.com/mcp' \
--header 'content-type: application/json' \
--header 'accept: application/json, text/event-stream' \
--data '{
  "method": "tools/call",
  "params": {
    "name": "create_endpoint",
    "arguments": {
      // provide these details according to the tool's MCP specification
    }
  },
  "jsonrpc": "2.0",
  "id": 1
}'
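The same JSON-RPC envelope can be built programmatically. A minimal Python sketch using only the standard library; the `parent` and `endpoint` values are hypothetical placeholders, and the real `arguments` must follow the CreateEndpointRequest input schema:

```python
import json

def build_tools_call(tool_name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build the JSON-RPC 2.0 envelope for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Hypothetical arguments; the real fields must follow the
# CreateEndpointRequest schema.
body = build_tools_call("create_endpoint", {
    "parent": "projects/my-project/locations/us-central1",
    "endpoint": {"displayName": "my-endpoint"},
})
payload = json.dumps(body)
```

The resulting `payload` is what the `--data` flag carries in the curl sample above.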
                

Input Schema

Request message for EndpointService.CreateEndpoint.

CreateEndpointRequest

JSON representation
{
  "parent": string,
  "endpoint": {
    object (Endpoint)
  },
  "endpointId": string
}
Fields
parent

string

Required. The resource name of the Location to create the Endpoint in. Format: projects/{project}/locations/{location}

endpoint

object (Endpoint)

Required. The Endpoint to create.

endpointId

string

Immutable. The ID to use for the Endpoint, which will become the final component of the Endpoint's resource name. If not provided, Vertex AI will generate a value for this ID.

If the first character is a letter, this value may be up to 63 characters, and valid characters are [a-z0-9-]. The last character must be a letter or number.

If the first character is a number, this value may be up to 9 characters, and valid characters are [0-9] with no leading zeros.

When using HTTP/JSON, this field is populated based on a query string argument, such as ?endpoint_id=12345. This is the fallback for fields that are not included in either the URI or the body.
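The two character-class rules above can be checked locally before sending a request. A minimal sketch (not an official validator; the patterns simply encode the rules as described):

```python
import re

# Letter-initial IDs: up to 63 chars of [a-z0-9-], ending in a letter or digit.
_LETTER_FORM = re.compile(r"^[a-z]([a-z0-9-]{0,61}[a-z0-9])?$")
# Digit-initial IDs: up to 9 digits, no leading zero.
_NUMBER_FORM = re.compile(r"^[1-9][0-9]{0,8}$")

def is_valid_endpoint_id(endpoint_id: str) -> bool:
    return bool(_LETTER_FORM.match(endpoint_id) or _NUMBER_FORM.match(endpoint_id))
```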

Endpoint

JSON representation
{
  "name": string,
  "displayName": string,
  "description": string,
  "deployedModels": [
    {
      object (DeployedModel)
    }
  ],
  "trafficSplit": {
    string: integer,
    ...
  },
  "etag": string,
  "labels": {
    string: string,
    ...
  },
  "createTime": string,
  "updateTime": string,
  "encryptionSpec": {
    object (EncryptionSpec)
  },
  "network": string,
  "enablePrivateServiceConnect": boolean,
  "privateServiceConnectConfig": {
    object (PrivateServiceConnectConfig)
  },
  "modelDeploymentMonitoringJob": string,
  "predictRequestResponseLoggingConfig": {
    object (PredictRequestResponseLoggingConfig)
  },
  "dedicatedEndpointEnabled": boolean,
  "dedicatedEndpointDns": string,
  "clientConnectionConfig": {
    object (ClientConnectionConfig)
  },
  "satisfiesPzs": boolean,
  "satisfiesPzi": boolean,
  "genAiAdvancedFeaturesConfig": {
    object (GenAiAdvancedFeaturesConfig)
  }
}
Fields
name

string

Identifier. The resource name of the Endpoint.

displayName

string

Required. The display name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.

description

string

The description of the Endpoint.

deployedModels[]

object (DeployedModel)

Output only. The models deployed in this Endpoint. To add or remove DeployedModels use EndpointService.DeployModel and EndpointService.UndeployModel respectively.

trafficSplit

map (key: string, value: integer)

A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel.

If a DeployedModel's ID is not listed in this map, then it receives no traffic.

The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not meant to accept any traffic at the moment.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.
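The sum-to-100-or-empty constraint can be checked locally. A minimal sketch:

```python
def is_valid_traffic_split(traffic_split: dict) -> bool:
    """Check the trafficSplit constraint: an empty map (no traffic),
    or percentages that add up to exactly 100."""
    if not traffic_split:
        return True
    values = traffic_split.values()
    return all(0 <= v <= 100 for v in values) and sum(values) == 100
```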

etag

string

Used to perform consistent read-modify-write updates. If not set, a blind "overwrite" update happens.

labels

map (key: string, value: string)

The labels with user-defined metadata to organize your Endpoints.

Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed.

See https://goo.gl/xmQnxf for more information and examples of labels.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

createTime

string (Timestamp format)

Output only. Timestamp when this Endpoint was created.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

updateTime

string (Timestamp format)

Output only. Timestamp when this Endpoint was last updated.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".
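The Z-normalization described above (an offset timestamp rendered in UTC with a trailing "Z") can be reproduced with the standard library. A sketch; it emits 0 or 6 fractional digits, whereas the API may use 0, 3, 6, or 9:

```python
from datetime import datetime, timezone

def to_rfc3339_z(dt: datetime) -> str:
    """Render an aware datetime the way the API emits timestamps:
    Z-normalized UTC, with fractional digits only when needed."""
    dt = dt.astimezone(timezone.utc)
    if dt.microsecond:
        return dt.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

# The offset form "2014-10-02T15:01:23+05:30" normalizes to UTC:
normalized = to_rfc3339_z(datetime.fromisoformat("2014-10-02T15:01:23+05:30"))
```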

encryptionSpec

object (EncryptionSpec)

Customer-managed encryption key spec for an Endpoint. If set, this Endpoint and all sub-resources of this Endpoint will be secured by this key.

network

string

Optional. The full name of the Google Compute Engine network to which the Endpoint should be peered.

Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network.

Only one of the fields, network or enable_private_service_connect, can be set.

Format: projects/{project}/global/networks/{network}, where {project} is a project number, as in 12345, and {network} is the network name.

enablePrivateServiceConnect
(deprecated)

boolean

Deprecated: If true, expose the Endpoint via private service connect.

Only one of the fields, network or enable_private_service_connect, can be set.

privateServiceConnectConfig

object (PrivateServiceConnectConfig)

Optional. Configuration for private service connect.

network and private_service_connect_config are mutually exclusive.

modelDeploymentMonitoringJob

string

Output only. Resource name of the Model Monitoring job associated with this Endpoint if monitoring is enabled by JobService.CreateModelDeploymentMonitoringJob. Format: projects/{project}/locations/{location}/modelDeploymentMonitoringJobs/{model_deployment_monitoring_job}

predictRequestResponseLoggingConfig

object (PredictRequestResponseLoggingConfig)

Configures the request-response logging for online prediction.

dedicatedEndpointEnabled

boolean

If true, the endpoint will be exposed through a dedicated DNS [Endpoint.dedicated_endpoint_dns]. Your requests to the dedicated DNS will be isolated from other users' traffic and will have better performance and reliability. Note: once you enable the dedicated endpoint, you will no longer be able to send requests to the shared DNS {region}-aiplatform.googleapis.com. This limitation will be removed soon.

dedicatedEndpointDns

string

Output only. DNS of the dedicated endpoint. Will only be populated if dedicated_endpoint_enabled is true. Depending on the features enabled, uid might be a random number or a string. For example, if fast_tryout is enabled, uid will be fasttryout. Format: https://{endpoint_id}.{region}-{uid}.prediction.vertexai.goog.
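The documented DNS format can be assembled from its components. A sketch; the uid component is chosen by the service depending on enabled features, so any value passed here is illustrative:

```python
def dedicated_endpoint_dns(endpoint_id: str, region: str, uid: str) -> str:
    """Assemble the documented dedicated-endpoint DNS format. The uid
    depends on enabled features (e.g. "fasttryout" when fast_tryout is on)."""
    return f"https://{endpoint_id}.{region}-{uid}.prediction.vertexai.goog"
```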

clientConnectionConfig

object (ClientConnectionConfig)

Configurations that are applied to the endpoint for online prediction.

satisfiesPzs

boolean

Output only. Reserved for future use.

satisfiesPzi

boolean

Output only. Reserved for future use.

genAiAdvancedFeaturesConfig

object (GenAiAdvancedFeaturesConfig)

Optional. Configuration for GenAiAdvancedFeatures. If the endpoint is serving GenAI models, advanced features like native RAG integration can be configured. Currently, only Model Garden models are supported.

DeployedModel

JSON representation
{
  "id": string,
  "model": string,
  "gdcConnectedModel": string,
  "modelVersionId": string,
  "displayName": string,
  "createTime": string,
  "explanationSpec": {
    object (ExplanationSpec)
  },
  "disableExplanations": boolean,
  "serviceAccount": string,
  "enableContainerLogging": boolean,
  "disableContainerLogging": boolean,
  "enableAccessLogging": boolean,
  "privateEndpoints": {
    object (PrivateEndpoints)
  },
  "fasterDeploymentConfig": {
    object (FasterDeploymentConfig)
  },
  "rolloutOptions": {
    object (RolloutOptions)
  },
  "status": {
    object (Status)
  },
  "systemLabels": {
    string: string,
    ...
  },
  "checkpointId": string,
  "speculativeDecodingSpec": {
    object (SpeculativeDecodingSpec)
  },

  // Union field prediction_resources can be only one of the following:
  "dedicatedResources": {
    object (DedicatedResources)
  },
  "automaticResources": {
    object (AutomaticResources)
  },
  "sharedResources": string,
  "fullFineTunedResources": {
    object (FullFineTunedResources)
  }
  // End of list of possible types for union field prediction_resources.
}
Fields
id

string

Immutable. The ID of the DeployedModel. If not provided upon deployment, Vertex AI will generate a value for this ID.

This value should be 1-10 characters, and valid characters are /[0-9]/.
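The ID rule above (1-10 characters, digits only) is simple to check locally. A sketch:

```python
import re

# Sketch of the DeployedModel ID rule: 1-10 characters, digits only.
_DEPLOYED_MODEL_ID = re.compile(r"^[0-9]{1,10}$")

def is_valid_deployed_model_id(model_id: str) -> bool:
    return bool(_DEPLOYED_MODEL_ID.match(model_id))
```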

model

string

The resource name of the Model that this is the deployment of. Note that the Model may be in a different location than the DeployedModel's Endpoint.

The resource name may contain a version ID or version alias to specify the version. Example: projects/{project}/locations/{location}/models/{model}@2 or projects/{project}/locations/{location}/models/{model}@golden. If no version is specified, the default version will be deployed.

gdcConnectedModel

string

GDC pretrained / Gemini model name. The model name is a plain model name, e.g. gemini-1.5-flash-002.

modelVersionId

string

Output only. The version ID of the model that is deployed.

displayName

string

The display name of the DeployedModel. If not provided upon creation, the Model's display_name is used.

createTime

string (Timestamp format)

Output only. Timestamp when the DeployedModel was created.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

explanationSpec

object (ExplanationSpec)

Explanation configuration for this DeployedModel.

When deploying a Model using EndpointService.DeployModel, this value overrides the value of Model.explanation_spec. All fields of explanation_spec are optional in the request. If a field of explanation_spec is not populated, the value of the same field of Model.explanation_spec is inherited. If the corresponding Model.explanation_spec is not populated, all fields of the explanation_spec will be used for the explanation configuration.

disableExplanations

boolean

If true, deploy the model without the explanation feature, regardless of the existence of Model.explanation_spec or explanation_spec.

serviceAccount

string

The service account that the DeployedModel's container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn't have access to the resource project.

Users deploying the Model must have the iam.serviceAccounts.actAs permission on this service account.

enableContainerLogging

boolean

If true, the container of the DeployedModel instances will send stderr and stdout streams to Cloud Logging.

Only supported for custom-trained Models and AutoML Tabular Models.

disableContainerLogging

boolean

For custom-trained Models and AutoML Tabular Models, the container of the DeployedModel instances will send stderr and stdout streams to Cloud Logging by default. Note that these logs incur costs, which are subject to Cloud Logging pricing.

Users can disable container logging by setting this flag to true.

enableAccessLogging

boolean

If true, online prediction access logs are sent to Cloud Logging. These logs are like standard server access logs, containing information like timestamp and latency for each prediction request.

Note that logs may incur a cost, especially if your project receives prediction requests at a high queries per second rate (QPS). Estimate your costs before enabling this option.

privateEndpoints

object (PrivateEndpoints)

Output only. Provides paths for users to send predict/explain/health requests directly to the deployed model services running on Cloud via private services access. This field is populated if network is configured.

fasterDeploymentConfig

object (FasterDeploymentConfig)

Configuration for faster model deployment.

rolloutOptions

object (RolloutOptions)

Options for configuring rolling deployments.

status

object (Status)

Output only. Runtime status of the deployed model.

systemLabels

map (key: string, value: string)

System labels to apply to Model Garden deployments. System labels are managed by Google for internal use only.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

checkpointId

string

The checkpoint id of the model.

speculativeDecodingSpec

object (SpeculativeDecodingSpec)

Optional. Spec for configuring speculative decoding.

Union field prediction_resources. The prediction (for example, the machine) resources that the DeployedModel uses. The user is billed for the resources (at least their minimal amount) even if the DeployedModel receives no traffic. Not all Models support all resource types. See Model.supported_deployment_resources_types. Required except for Large Model Deploy use cases. prediction_resources can be only one of the following:
dedicatedResources

object (DedicatedResources)

A description of resources that are dedicated to the DeployedModel, and that need a higher degree of manual configuration.

automaticResources

object (AutomaticResources)

A description of resources that are, to a large degree, decided by Vertex AI and require only a modest additional configuration.

sharedResources

string

The resource name of the shared DeploymentResourcePool to deploy on. Format: projects/{project}/locations/{location}/deploymentResourcePools/{deployment_resource_pool}

fullFineTunedResources

object (FullFineTunedResources)

Optional. Resources for a full fine tuned model.

DedicatedResources

JSON representation
{
  "machineSpec": {
    object (MachineSpec)
  },
  "minReplicaCount": integer,
  "maxReplicaCount": integer,
  "requiredReplicaCount": integer,
  "initialReplicaCount": integer,
  "autoscalingMetricSpecs": [
    {
      object (AutoscalingMetricSpec)
    }
  ],
  "spot": boolean,
  "flexStart": {
    object (FlexStart)
  },
  "scaleToZeroSpec": {
    object (ScaleToZeroSpec)
  }
}
Fields
machineSpec

object (MachineSpec)

Required. Immutable. The specification of a single machine being used.

minReplicaCount

integer

Required. Immutable. The minimum number of machine replicas that will always be deployed. This value must be greater than or equal to 1.

If traffic increases, it may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed.

maxReplicaCount

integer

Immutable. The maximum number of replicas that may be deployed when traffic increases. If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale to that many replicas is guaranteed (barring service outages). If traffic increases beyond what the replicas at maximum can handle, a portion of the traffic will be dropped. If this value is not provided, min_replica_count is used as the default.

The value of this field impacts the charge against Vertex CPU and GPU quotas. Specifically, you will be charged for (max_replica_count * number of cores in the selected machine type) and (max_replica_count * number of GPUs per replica in the selected machine type).
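The quota arithmetic above can be sketched directly; note that the charge follows max_replica_count, not the number of replicas actually serving. A minimal illustration (the function name and shape are ours, not an API):

```python
def quota_charge(max_replica_count: int, cores_per_machine: int,
                 gpus_per_replica: int = 0) -> dict:
    """Sketch of the quota charge: max_replica_count times the cores
    and GPUs of the selected machine type."""
    return {
        "cpu": max_replica_count * cores_per_machine,
        "gpu": max_replica_count * gpus_per_replica,
    }
```

For example, allowing 4 replicas of an 8-core machine with 1 GPU each charges 32 cores and 4 GPUs of quota, even if fewer replicas are running.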

requiredReplicaCount

integer

Optional. Number of required available replicas for the deployment to succeed. This field is only needed when partial deployment/mutation is desired. If set, the deploy/mutate operation will succeed once available_replica_count reaches required_replica_count, and the rest of the replicas will be retried. If not set, the default required_replica_count will be min_replica_count.

initialReplicaCount

integer

Immutable. Number of initial replicas being deployed on when scaling the workload up from zero or when creating the workload in case min_replica_count = 0. When min_replica_count > 0 (meaning that the scale-to-zero feature is not enabled), initial_replica_count should not be set. When min_replica_count = 0 (meaning that the scale-to-zero feature is enabled), initial_replica_count should be larger than zero, but no greater than max_replica_count.

autoscalingMetricSpecs[]

object (AutoscalingMetricSpec)

Immutable. The metric specifications that override a resource utilization metric's (CPU utilization, accelerator's duty cycle, and so on) target value (60 by default if not set). At most one entry is allowed per metric.

If machine_spec.accelerator_count is above 0, autoscaling is based on both the CPU utilization and accelerator duty cycle metrics: the deployment scales up when either metric exceeds its target value and scales down when both metrics are under their target values. The default target value is 60 for both metrics.

If machine_spec.accelerator_count is 0, autoscaling is based on the CPU utilization metric only, with a default target value of 60 if not explicitly set.

For example, in the case of Online Prediction, if you want to override target CPU utilization to 80, you should set autoscaling_metric_specs.metric_name to aiplatform.googleapis.com/prediction/online/cpu/utilization and autoscaling_metric_specs.target to 80.
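The override in the example above corresponds to the following spec entry (a JSON fragment expressed as a Python literal):

```python
# Sketch of the CPU-target override described above: one entry per
# metric, with `target` given as a percentage.
autoscaling_metric_specs = [
    {
        "metricName": "aiplatform.googleapis.com/prediction/online/cpu/utilization",
        "target": 80,
    }
]
```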

spot

boolean

Optional. If true, schedule the deployment workload on spot VMs.

flexStart

object (FlexStart)

Optional. Immutable. If set, use DWS resource to schedule the deployment workload. reference: (https://cloud.google.com/blog/products/compute/introducing-dynamic-workload-scheduler)

scaleToZeroSpec

object (ScaleToZeroSpec)

Optional. Specification for scale-to-zero feature.

MachineSpec

JSON representation
{
  "machineType": string,
  "acceleratorType": enum (AcceleratorType),
  "acceleratorCount": integer,
  "gpuPartitionSize": string,
  "tpuTopology": string,
  "multihostGpuNodeCount": integer,
  "reservationAffinity": {
    object (ReservationAffinity)
  },
  "minGpuDriverVersion": string
}
Fields
machineType

string

Immutable. The type of the machine.

See the list of machine types supported for prediction

See the list of machine types supported for custom training.

For DeployedModel this field is optional, and the default value is n1-standard-2. For BatchPredictionJob or as part of WorkerPoolSpec this field is required.

acceleratorType

enum (AcceleratorType)

Immutable. The type of accelerator(s) that may be attached to the machine as per accelerator_count.

acceleratorCount

integer

The number of accelerators to attach to the machine.

For accelerator-optimized machine types (https://cloud.google.com/compute/docs/accelerator-optimized-machines), one may set accelerator_count from 1 to N for a machine with N GPUs. If accelerator_count is less than or equal to N / 2, Vertex will co-schedule replicas of the model onto the same VM to save cost.

For example, if the machine type is a3-highgpu-8g, which has 8 H100 GPUs, one can set accelerator_count to any value from 1 to 8. If accelerator_count is 1, 2, 3, or 4, Vertex will co-schedule 8, 4, 2, or 2 replicas of the model, respectively, onto the same VM to save cost.

When co-scheduling, the CPU, memory, and storage on the VM are distributed among the replicas on that VM. For example, a co-scheduled replica requesting 2 GPUs out of an 8-GPU VM can expect to receive 25% of the CPU, memory, and storage of the VM.

Note that the feature is not compatible with multihost_gpu_node_count. When multihost_gpu_node_count is set, the co-scheduling will not be enabled.
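The co-scheduling arithmetic in the paragraphs above reduces to integer division and a simple fraction. A sketch:

```python
def coscheduled_replicas(gpus_per_vm: int, accelerator_count: int) -> int:
    """Number of accelerator_count-sized replicas that fit on one VM."""
    return gpus_per_vm // accelerator_count

def resource_fraction(gpus_per_vm: int, accelerator_count: int) -> float:
    """Fraction of the VM's CPU, memory, and storage one replica receives."""
    return accelerator_count / gpus_per_vm
```

This reproduces the a3-highgpu-8g example: accelerator_count 1, 2, 3, or 4 gives 8, 4, 2, or 2 replicas per VM, and a 2-GPU replica receives 25% of an 8-GPU VM's resources.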

gpuPartitionSize

string

Optional. Immutable. The Nvidia GPU partition size.

When specified, the requested accelerators will be partitioned into smaller GPU partitions. For example, if the request is for 8 units of NVIDIA A100 GPUs, and gpu_partition_size="1g.10gb", the service will create 8 * 7 = 56 partitioned MIG instances.

The partition size must be a value supported by the requested accelerator. Refer to Nvidia GPU Partitioning for the available partition sizes.

If set, the accelerator_count should be set to 1.
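The MIG arithmetic in the example above (8 A100s partitioned as 1g.10gb, yielding 8 * 7 = 56 instances) can be sketched as follows; the partitions-per-GPU figure depends on the accelerator and partition size, and 7 is the A100 figure for "1g.10gb" used in the example:

```python
# Partitions per GPU for the documented example only; other accelerators
# and partition sizes have different figures.
PARTITIONS_PER_GPU = {"1g.10gb": 7}

def mig_instance_count(requested_gpus: int, gpu_partition_size: str) -> int:
    """Total partitioned MIG instances created for a request."""
    return requested_gpus * PARTITIONS_PER_GPU[gpu_partition_size]
```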

tpuTopology

string

Immutable. The topology of the TPUs. Corresponds to the TPU topologies available from GKE. (Example: tpu_topology: "2x2x1").

multihostGpuNodeCount

integer

Optional. Immutable. The number of nodes per replica for multihost GPU deployments.

reservationAffinity

object (ReservationAffinity)

Optional. Immutable. Configuration controlling how this resource pool consumes reservation.

minGpuDriverVersion

string

Optional. Immutable. The minimum GPU driver version that this machine requires. For example, "535.104.06". If not specified, the default GPU driver version will be used by the underlying infrastructure.

ReservationAffinity

JSON representation
{
  "reservationAffinityType": enum (Type),
  "key": string,
  "values": [
    string
  ]
}
Fields
reservationAffinityType

enum (Type)

Required. Specifies the reservation affinity type.

key

string

Optional. Corresponds to the label key of a reservation resource. To target a SPECIFIC_RESERVATION by name, use compute.googleapis.com/reservation-name as the key and specify the name of your reservation as its value.

values[]

string

Optional. Corresponds to the label values of a reservation resource. This must be the full resource name of the reservation or reservation block.

AutoscalingMetricSpec

JSON representation
{
  "metricName": string,
  "target": integer,
  "monitoredResourceLabels": {
    string: string,
    ...
  }
}
Fields
metricName

string

Required. The resource metric name. Supported metrics:

  • For Online Prediction:
    • aiplatform.googleapis.com/prediction/online/accelerator/duty_cycle
    • aiplatform.googleapis.com/prediction/online/cpu/utilization
    • aiplatform.googleapis.com/prediction/online/request_count
    • pubsub.googleapis.com/subscription/num_undelivered_messages
    • prometheus.googleapis.com/vertex_dcgm_fi_dev_gpu_util
    • prometheus.googleapis.com/vertex_vllm_gpu_cache_usage_perc
    • prometheus.googleapis.com/vertex_vllm_num_requests_waiting
target

integer

The target resource utilization in percentage (1% - 100%) for the given metric; once the real usage deviates from the target by a certain percentage, the machine replicas change. The default value is 60 (representing 60%) if not provided.

monitoredResourceLabels

map (key: string, value: string)

Optional. The Cloud Monitoring monitored resource labels as key value pairs used for metrics filtering. See Cloud Monitoring Labels https://cloud.google.com/monitoring/api/v3/metric-model#generic-label-info

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

MonitoredResourceLabelsEntry

JSON representation
{
  "key": string,
  "value": string
}
Fields
key

string

value

string

FlexStart

JSON representation
{
  "maxRuntimeDuration": string
}
Fields
maxRuntimeDuration

string (Duration format)

The maximum runtime duration of the deployment; the deployment will be terminated after this duration. max_runtime_duration can be set up to 7 days.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

Duration

JSON representation
{
  "seconds": string,
  "nanos": integer
}
Fields
seconds

string (int64 format)

Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years

nanos

integer

Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 seconds field and a positive or negative nanos field. For durations of one second or more, a non-zero value for the nanos field must be of the same sign as the seconds field. Must be from -999,999,999 to +999,999,999 inclusive.

ScaleToZeroSpec

JSON representation
{
  "minScaleupPeriod": string,
  "idleScaledownPeriod": string
}
Fields
minScaleupPeriod

string (Duration format)

Optional. Minimum duration that a deployment will be scaled up before traffic is evaluated for potential scale-down. [MinValue=300] (5 minutes) [MaxValue=28800] (8 hours)

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

idleScaledownPeriod

string (Duration format)

Optional. Duration of no traffic before scaling to zero. [MinValue=300] (5 minutes) [MaxValue=28800] (8 hours)

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".
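Both ScaleToZeroSpec fields share the same documented bounds, which can be checked locally. A sketch:

```python
def within_scale_to_zero_bounds(period_seconds: float) -> bool:
    # Documented bounds: 300 s (5 minutes) to 28800 s (8 hours), inclusive.
    return 300 <= period_seconds <= 28800
```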

AutomaticResources

JSON representation
{
  "minReplicaCount": integer,
  "maxReplicaCount": integer
}
Fields
minReplicaCount

integer

Immutable. The minimum number of replicas that will always be deployed. If traffic increases, the model may dynamically be deployed onto more replicas, up to max_replica_count, and as traffic decreases, some of these extra replicas may be freed. If the requested value is too large, the deployment will error.

maxReplicaCount

integer

Immutable. The maximum number of replicas that may be deployed when traffic increases. If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale to that many replicas is guaranteed (barring service outages). If traffic increases beyond what the replicas at maximum can handle, a portion of the traffic will be dropped. If this value is not provided, no upper bound for scaling under heavy traffic is assumed, though Vertex AI may be unable to scale beyond a certain replica number.

FullFineTunedResources

JSON representation
{
  "deploymentType": enum (DeploymentType),
  "modelInferenceUnitCount": integer
}
Fields
deploymentType

enum (DeploymentType)

Required. The kind of deployment.

modelInferenceUnitCount

integer

Optional. The number of model inference units to use for this deployment. This can only be specified for DEPLOYMENT_TYPE_PROD. The following list gives the number of model inference units for different model types:

  • Gemini 2.5 Flash: Foundation FMIU: 25; Expansion FMIU: 4
  • Gemini 2.5 Pro: Foundation FMIU: 32; Expansion FMIU: 16
  • Veo 3.0 (undistilled): Foundation FMIU: 63; Expansion FMIU: 7
  • Veo 3.0 (distilled): Foundation FMIU: 30; Expansion FMIU: 10

Timestamp

JSON representation
{
  "seconds": string,
  "nanos": integer
}
Fields
seconds

string (int64 format)

Represents seconds of UTC time since Unix epoch 1970-01-01T00:00:00Z. Must be between -62135596800 and 253402300799 inclusive (which corresponds to 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z).

nanos

integer

Non-negative fractions of a second at nanosecond resolution. This field is the nanosecond portion of the duration, not an alternative to seconds. Negative second values with fractions must still have non-negative nanos values that count forward in time. Must be between 0 and 999,999,999 inclusive.
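The non-negative-nanos rule above (fractional parts count forward in time, even before the epoch) can be reproduced with timedelta normalization. A sketch:

```python
from datetime import datetime, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_timestamp_object(dt: datetime) -> dict:
    """Convert an aware datetime into the {seconds, nanos} form.
    timedelta normalization (seconds and microseconds always
    non-negative) keeps nanos counting forward even pre-epoch."""
    delta = dt - _EPOCH
    seconds = delta.days * 86400 + delta.seconds
    nanos = delta.microseconds * 1000
    return {"seconds": str(seconds), "nanos": nanos}
```

For example, half a second before the epoch comes out as seconds -1, nanos 500000000, as the rule requires.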

ExplanationSpec

JSON representation
{
  "parameters": {
    object (ExplanationParameters)
  },
  "metadata": {
    object (ExplanationMetadata)
  }
}
Fields
parameters

object (ExplanationParameters)

Required. Parameters that configure explaining of the Model's predictions.

metadata

object (ExplanationMetadata)

Optional. Metadata describing the Model's input and output for explanation.

ExplanationParameters

JSON representation
{
  "topK": integer,
  "outputIndices": array,

  // Union field method can be only one of the following:
  "sampledShapleyAttribution": {
    object (SampledShapleyAttribution)
  },
  "integratedGradientsAttribution": {
    object (IntegratedGradientsAttribution)
  },
  "xraiAttribution": {
    object (XraiAttribution)
  },
  "examples": {
    object (Examples)
  }
  // End of list of possible types for union field method.
}
Fields
topK

integer

If populated, returns attributions for the top K indices of outputs (defaults to 1). Only applies to Models that predict more than one output (e.g., multi-class Models). When set to -1, returns explanations for all outputs.

outputIndices

array (ListValue format)

If populated, only returns attributions that have output_index contained in output_indices. It must be an ndarray of integers, with the same shape of the output it's explaining.

If not populated, returns attributions for top_k indices of outputs. If neither top_k nor output_indices is populated, returns the argmax index of the outputs.

Only applicable to Models that predict multiple outputs (e.g., multi-class Models that predict multiple classes).

Union field method. method can be only one of the following:

sampledShapleyAttribution

object (SampledShapleyAttribution)

An attribution method that approximates Shapley values for features that contribute to the label being predicted. A sampling strategy is used to approximate the value rather than considering all subsets of features. Refer to this paper for model details: https://arxiv.org/abs/1306.4265.

integratedGradientsAttribution

object (IntegratedGradientsAttribution)

An attribution method that computes Aumann-Shapley values taking advantage of the model's fully differentiable structure. Refer to this paper for more details: https://arxiv.org/abs/1703.01365

xraiAttribution

object (XraiAttribution)

An attribution method that redistributes Integrated Gradients attribution to segmented regions, taking advantage of the model's fully differentiable structure. Refer to this paper for more details: https://arxiv.org/abs/1906.02825

XRAI currently performs better on natural images, like a picture of a house or an animal. If the images are taken in artificial environments, like a lab or manufacturing line, or from diagnostic equipment, like x-rays or quality-control cameras, use Integrated Gradients instead.

examples

object (Examples)

Example-based explanations that returns the nearest neighbors from the provided dataset.

SampledShapleyAttribution

JSON representation
{
  "pathCount": integer
}
Fields
pathCount

integer

Required. The number of feature permutations to consider when approximating the Shapley values.

The valid range is [1, 50], inclusive.

IntegratedGradientsAttribution

JSON representation
{
  "stepCount": integer,
  "smoothGradConfig": {
    object (SmoothGradConfig)
  },
  "blurBaselineConfig": {
    object (BlurBaselineConfig)
  }
}
Fields
stepCount

integer

Required. The number of steps for approximating the path integral. A good value to start is 50 and gradually increase until the sum to diff property is within the desired error range.

The valid range is [1, 100], inclusive.

smoothGradConfig

object (SmoothGradConfig)

Config for SmoothGrad approximation of gradients.

When enabled, the gradients are approximated by averaging the gradients from noisy samples in the vicinity of the inputs. Adding noise can help improve the computed gradients. Refer to this paper for more details: https://arxiv.org/pdf/1706.03825.pdf

blurBaselineConfig

object (BlurBaselineConfig)

Config for IG with blur baseline.

When enabled, a linear path from the maximally blurred image to the input image is created. Using a blurred baseline instead of zero (black image) is motivated by the BlurIG approach explained here: https://arxiv.org/abs/2004.03383

SmoothGradConfig

JSON representation
{
  "noisySampleCount": integer,

  // Union field GradientNoiseSigma can be only one of the following:
  "noiseSigma": number,
  "featureNoiseSigma": {
    object (FeatureNoiseSigma)
  }
  // End of list of possible types for union field GradientNoiseSigma.
}
Fields
noisySampleCount

integer

The number of gradient samples to use for approximation. The higher this number, the more accurate the gradient is, but the runtime complexity increases by this factor as well. The valid range is [1, 50]. Defaults to 3.

Union field GradientNoiseSigma. Represents the standard deviation of the gaussian kernel that will be used to add noise to the interpolated inputs prior to computing gradients. GradientNoiseSigma can be only one of the following:
noiseSigma

number

This is a single float value used to add noise to all the features. Use this field when all features are normalized to the same distribution: scaled to the range [0, 1] or [-1, 1], or z-scored (normalized to zero mean and unit variance).

For best results, the recommended value is about 10% to 20% of the standard deviation of the input feature. Refer to section 3.2 of the SmoothGrad paper: https://arxiv.org/pdf/1706.03825.pdf. Defaults to 0.1.

If the distribution differs per feature, set feature_noise_sigma for each feature instead.

featureNoiseSigma

object (FeatureNoiseSigma)

This is similar to noise_sigma, but provides additional flexibility. A separate noise sigma can be provided for each feature, which is useful if their distributions are different. No noise is added to features that are not set. If this field is unset, noise_sigma will be used for all features.

FeatureNoiseSigma

JSON representation
{
  "noiseSigma": [
    {
      object (NoiseSigmaForFeature)
    }
  ]
}
Fields
noiseSigma[]

object (NoiseSigmaForFeature)

Noise sigma per feature. No noise is added to features that are not set.

NoiseSigmaForFeature

JSON representation
{
  "name": string,
  "sigma": number
}
Fields
name

string

The name of the input feature for which noise sigma is provided. The features are defined in explanation metadata inputs.

sigma

number

This represents the standard deviation of the Gaussian kernel that will be used to add noise to the feature prior to computing gradients. Similar to noise_sigma but represents the noise added to the current feature. Defaults to 0.1.
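For instance, a FeatureNoiseSigma carrying per-feature sigmas could be sketched as follows (the feature names are hypothetical and must match names defined in ExplanationMetadata.inputs):

```python
# Hypothetical per-feature noise sigmas; features not listed get no noise.
feature_noise_sigma = {
    "noiseSigma": [
        {"name": "age", "sigma": 0.8},
        {"name": "income", "sigma": 1500.0},  # wider distribution, larger sigma
    ]
}
```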

BlurBaselineConfig

JSON representation
{
  "maxBlurSigma": number
}
Fields
maxBlurSigma

number

The standard deviation of the blur kernel for the blurred baseline. The same blurring parameter is used for both the height and the width dimension. If not set, the method defaults to the zero (i.e. black for images) baseline.

XraiAttribution

JSON representation
{
  "stepCount": integer,
  "smoothGradConfig": {
    object (SmoothGradConfig)
  },
  "blurBaselineConfig": {
    object (BlurBaselineConfig)
  }
}
Fields
stepCount

integer

Required. The number of steps for approximating the path integral. A good starting value is 50; gradually increase it until the sum-to-diff property is met within the desired error range.

The valid range is [1, 100], inclusive.

smoothGradConfig

object (SmoothGradConfig)

Config for SmoothGrad approximation of gradients.

When enabled, the gradients are approximated by averaging the gradients from noisy samples in the vicinity of the inputs. Adding noise can help improve the computed gradients. Refer to this paper for more details: https://arxiv.org/pdf/1706.03825.pdf

blurBaselineConfig

object (BlurBaselineConfig)

Config for XRAI with blur baseline.

When enabled, a linear path from the maximally blurred image to the input image is created. Using a blurred baseline instead of zero (black image) is motivated by the BlurIG approach explained here: https://arxiv.org/abs/2004.03383

Examples

JSON representation
{
  "gcsSource": {
    object (GcsSource)
  },
  "neighborCount": integer,

  // Union field source can be only one of the following:
  "exampleGcsSource": {
    object (ExampleGcsSource)
  }
  // End of list of possible types for union field source.

  // Union field config can be only one of the following:
  "nearestNeighborSearchConfig": value,
  "presets": {
    object (Presets)
  }
  // End of list of possible types for union field config.
}
Fields
gcsSource

object (GcsSource)

The Cloud Storage locations that contain the instances to be indexed for approximate nearest neighbor search.

neighborCount

integer

The number of neighbors to return when querying for examples.

Union field source.

source can be only one of the following:

exampleGcsSource

object (ExampleGcsSource)

The Cloud Storage input instances.

Union field config.

config can be only one of the following:

nearestNeighborSearchConfig

value (Value format)

The full configuration for the generated index; the semantics are the same as metadata and should match NearestNeighborSearchConfig.

presets

object (Presets)

Simplified preset configuration, which automatically sets configuration values based on the desired query speed-precision trade-off and modality.
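Combining the fields above, an Examples config using the presets branch of the config union might be sketched as follows (the Cloud Storage path, modality, and query values are illustrative):

```python
# Illustrative Examples config; exampleGcsSource is the source union branch,
# presets is the config union branch. The bucket path is hypothetical.
examples = {
    "exampleGcsSource": {
        "dataFormat": "JSONL",  # currently the only supported format
        "gcsSource": {"uris": ["gs://my-bucket/instances/*.jsonl"]},
    },
    "neighborCount": 10,
    "presets": {"modality": "IMAGE", "query": "PRECISE"},
}
```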

ExampleGcsSource

JSON representation
{
  "dataFormat": enum (DataFormat),
  "gcsSource": {
    object (GcsSource)
  }
}
Fields
dataFormat

enum (DataFormat)

The format in which instances are given. If not specified, JSONL format is assumed. Currently only JSONL format is supported.

gcsSource

object (GcsSource)

The Cloud Storage location for the input instances.

GcsSource

JSON representation
{
  "uris": [
    string
  ]
}
Fields
uris[]

string

Required. Google Cloud Storage URI(s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/wildcards.

Value

JSON representation
{

  // Union field kind can be only one of the following:
  "nullValue": null,
  "numberValue": number,
  "stringValue": string,
  "boolValue": boolean,
  "structValue": {
    object
  },
  "listValue": array
  // End of list of possible types for union field kind.
}
Fields
Union field kind. The kind of value. kind can be only one of the following:
nullValue

null

Represents a null value.

numberValue

number

Represents a double value.

stringValue

string

Represents a string value.

boolValue

boolean

Represents a boolean value.

structValue

object (Struct format)

Represents a structured value.

listValue

array (ListValue format)

Represents a repeated Value.
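Because Value, Struct, and ListValue are protobuf well-known types, their JSON representation is plain JSON: a Value serializes directly as the underlying JSON value, a Struct as a JSON object, and a ListValue as a JSON array. A quick sketch:

```python
import json

# Each of these Python values serializes as one of the Value "kind" branches:
# nullValue, numberValue, stringValue, boolValue, structValue, listValue.
samples = [None, 3.14, "hello", True, {"fields": "as plain JSON"}, [1, 2, 3]]
round_tripped = json.loads(json.dumps(samples))
assert round_tripped == samples  # lossless round trip through JSON
```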

Struct

JSON representation
{
  "fields": {
    string: value,
    ...
  }
}
Fields
fields

map (key: string, value: value (Value format))

Unordered map of dynamically typed values.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

FieldsEntry

JSON representation
{
  "key": string,
  "value": value
}
Fields
key

string

value

value (Value format)

ListValue

JSON representation
{
  "values": [
    value
  ]
}
Fields
values[]

value (Value format)

Repeated field of dynamically typed values.

Presets

JSON representation
{
  "modality": enum (Modality),

  // Union field _query can be only one of the following:
  "query": enum (Query)
  // End of list of possible types for union field _query.
}
Fields
modality

enum (Modality)

The modality of the uploaded model, which automatically configures the distance measurement and feature normalization for the underlying example index and queries. If your model does not precisely fit one of these types, it is okay to choose the closest type.

Union field _query.

_query can be only one of the following:

query

enum (Query)

Preset option controlling parameters for speed-precision trade-off when querying for examples. If omitted, defaults to PRECISE.

ExplanationMetadata

JSON representation
{
  "inputs": {
    string: {
      object (InputMetadata)
    },
    ...
  },
  "outputs": {
    string: {
      object (OutputMetadata)
    },
    ...
  },
  "featureAttributionsSchemaUri": string,
  "latentSpaceSource": string
}
Fields
inputs

map (key: string, value: object (InputMetadata))

Required. Map from feature names to feature input metadata. Keys are the name of the features. Values are the specification of the feature.

An empty InputMetadata is valid. It describes a text feature which has the name specified as the key in ExplanationMetadata.inputs. The baseline of the empty feature is chosen by Vertex AI.

For Vertex AI-provided Tensorflow images, the key can be any friendly name of the feature. Once specified, featureAttributions are keyed by this key (if not grouped with another feature).

For custom images, the key must match with the key in instance.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

outputs

map (key: string, value: object (OutputMetadata))

Required. Map from output names to output metadata.

For Vertex AI-provided Tensorflow images, keys can be any user defined string that consists of any UTF-8 characters.

For custom images, keys are the name of the output field in the prediction to be explained.

Currently only one key is allowed.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

featureAttributionsSchemaUri

string

Points to a YAML file stored on Google Cloud Storage describing the format of the feature attributions. The schema is defined as an OpenAPI 3.0.2 Schema Object. AutoML tabular Models always have this field populated by Vertex AI. Note: the URI given on output may differ, including the URI scheme, from the one given on input. The output URI will point to a location where the user only has read access.

latentSpaceSource

string

Name of the source to generate embeddings for example based explanations.

InputsEntry

JSON representation
{
  "key": string,
  "value": {
    object (InputMetadata)
  }
}
Fields
key

string

value

object (InputMetadata)

InputMetadata

JSON representation
{
  "inputBaselines": [
    value
  ],
  "inputTensorName": string,
  "encoding": enum (Encoding),
  "modality": string,
  "featureValueDomain": {
    object (FeatureValueDomain)
  },
  "indicesTensorName": string,
  "denseShapeTensorName": string,
  "indexFeatureMapping": [
    string
  ],
  "encodedTensorName": string,
  "encodedBaselines": [
    value
  ],
  "visualization": {
    object (Visualization)
  },
  "groupName": string
}
Fields
inputBaselines[]

value (Value format)

Baseline inputs for this feature.

If no baseline is specified, Vertex AI chooses the baseline for this feature. If multiple baselines are specified, Vertex AI returns the average attributions across them in Attribution.feature_attributions.

For Vertex AI-provided Tensorflow images (both 1.x and 2.x), the shape of each baseline must match the shape of the input tensor. If a scalar is provided, we broadcast to the same shape as the input tensor.

For custom images, the element of the baselines must be in the same format as the feature's input in the instance[]. The schema of any single instance may be specified via Endpoint's DeployedModels' Model's PredictSchemata's instance_schema_uri.

inputTensorName

string

Name of the input tensor for this feature. Required and is only applicable to Vertex AI-provided images for Tensorflow.

encoding

enum (Encoding)

Defines how the feature is encoded into the input tensor. Defaults to IDENTITY.

modality

string

Modality of the feature. Valid values are: numeric, image. Defaults to numeric.

featureValueDomain

object (FeatureValueDomain)

The domain details of the input feature value. Like min/max, original mean or standard deviation if normalized.

indicesTensorName

string

Specifies the index of the values of the input tensor. Required when the input tensor is a sparse representation. Refer to Tensorflow documentation for more details: https://www.tensorflow.org/api_docs/python/tf/sparse/SparseTensor.

denseShapeTensorName

string

Specifies the shape of the values of the input if the input is a sparse representation. Refer to Tensorflow documentation for more details: https://www.tensorflow.org/api_docs/python/tf/sparse/SparseTensor.

indexFeatureMapping[]

string

A list of feature names for each index in the input tensor. Required when the input InputMetadata.encoding is BAG_OF_FEATURES, BAG_OF_FEATURES_SPARSE, INDICATOR.

encodedTensorName

string

Encoded tensor is a transformation of the input tensor. Must be provided if choosing Integrated Gradients attribution or XRAI attribution and the input tensor is not differentiable.

An encoded tensor is generated if the input tensor is encoded by a lookup table.

encodedBaselines[]

value (Value format)

A list of baselines for the encoded tensor.

The shape of each baseline should match the shape of the encoded tensor. If a scalar is provided, Vertex AI broadcasts to the same shape as the encoded tensor.

visualization

object (Visualization)

Visualization configurations for image explanation.

groupName

string

Name of the group that the input belongs to. Features with the same group name will be treated as one feature when computing attributions. Features grouped together can have different shapes in value. If provided, a single attribution is generated in Attribution.feature_attributions, keyed by the group name.
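As a sketch, an InputMetadata entry for an image feature on a Vertex AI-provided TensorFlow image might look like this (the tensor name and baseline value are hypothetical):

```python
# Illustrative InputMetadata for an image input. A scalar inputBaseline is
# broadcast to the input tensor's shape.
input_metadata = {
    "inputTensorName": "image_tensor",   # hypothetical tensor name
    "modality": "image",                 # valid values: numeric, image
    "inputBaselines": [0.0],             # black-image baseline, broadcast
    "visualization": {"type": "OUTLINES", "polarity": "POSITIVE"},
}
```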

FeatureValueDomain

JSON representation
{
  "minValue": number,
  "maxValue": number,
  "originalMean": number,
  "originalStddev": number
}
Fields
minValue

number

The minimum permissible value for this feature.

maxValue

number

The maximum permissible value for this feature.

originalMean

number

If this input feature has been normalized to a mean value of 0, the original_mean specifies the mean value of the domain prior to normalization.

originalStddev

number

If this input feature has been normalized to a standard deviation of 1.0, the original_stddev specifies the standard deviation of the domain prior to normalization.
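The originalMean and originalStddev fields let a consumer map a z-scored feature value back to its raw domain via raw = normalized * originalStddev + originalMean. A small worked sketch (the numbers are hypothetical):

```python
# Recover the raw value of a z-scored feature from its domain metadata.
domain = {"originalMean": 40.0, "originalStddev": 12.0}
normalized = 1.5
raw = normalized * domain["originalStddev"] + domain["originalMean"]
assert raw == 58.0  # 1.5 * 12.0 + 40.0
```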

Visualization

JSON representation
{
  "type": enum (Type),
  "polarity": enum (Polarity),
  "colorMap": enum (ColorMap),
  "clipPercentUpperbound": number,
  "clipPercentLowerbound": number,
  "overlayType": enum (OverlayType)
}
Fields
type

enum (Type)

Type of the image visualization. Only applicable to Integrated Gradients attribution. OUTLINES shows regions of attribution, while PIXELS shows per-pixel attribution. Defaults to OUTLINES.

polarity

enum (Polarity)

Whether to only highlight pixels with positive contributions, negative or both. Defaults to POSITIVE.

colorMap

enum (ColorMap)

The color scheme used for the highlighted areas.

Defaults to PINK_GREEN for Integrated Gradients attribution, which shows positive attributions in green and negative in pink.

Defaults to VIRIDIS for XRAI attribution, which highlights the most influential regions in yellow and the least influential in blue.

clipPercentUpperbound

number

Excludes attributions above the specified percentile from the highlighted areas. Using the clip_percent_upperbound and clip_percent_lowerbound together can be useful for filtering out noise and making it easier to see areas of strong attribution. Defaults to 99.9.

clipPercentLowerbound

number

Excludes attributions below the specified percentile from the highlighted areas. Defaults to 62.

overlayType

enum (OverlayType)

How the original image is displayed in the visualization. Adjusting the overlay can help increase visual clarity if the original image makes it difficult to view the visualization. Defaults to NONE.

OutputsEntry

JSON representation
{
  "key": string,
  "value": {
    object (OutputMetadata)
  }
}
Fields
key

string

value

object (OutputMetadata)

OutputMetadata

JSON representation
{
  "outputTensorName": string,

  // Union field display_name_mapping can be only one of the following:
  "indexDisplayNameMapping": value,
  "displayNameMappingKey": string
  // End of list of possible types for union field display_name_mapping.
}
Fields
outputTensorName

string

Name of the output tensor. Required and is only applicable to Vertex AI-provided images for Tensorflow.

Union field display_name_mapping. Defines how to map Attribution.output_index to Attribution.output_display_name.

If neither of the fields are specified, Attribution.output_display_name will not be populated. display_name_mapping can be only one of the following:

indexDisplayNameMapping

value (Value format)

Static mapping between the index and display name.

Use this if the outputs are a deterministic n-dimensional array, e.g. a list of scores of all the classes in a pre-defined order for a multi-classification Model. It's not feasible if the outputs are non-deterministic, e.g. if the Model produces top-k classes or sorts the outputs by their values.

The shape of the value must be an n-dimensional array of strings. The number of dimensions must match that of the outputs to be explained. The Attribution.output_display_name is populated by locating in the mapping with Attribution.output_index.

displayNameMappingKey

string

Specify a field name in the prediction to look for the display name.

Use this if the prediction contains the display names for the outputs.

The display names in the prediction must have the same shape of the outputs, so that it can be located by Attribution.output_index for a specific output.
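For a classifier with a fixed output order, the indexDisplayNameMapping branch can be sketched like this (the tensor and class names are hypothetical):

```python
# Illustrative OutputMetadata for a 3-class model with deterministic output
# order; Attribution.output_index i maps to indexDisplayNameMapping[i].
output_metadata = {
    "outputTensorName": "scores",
    "indexDisplayNameMapping": ["cat", "dog", "bird"],
}
assert output_metadata["indexDisplayNameMapping"][1] == "dog"
```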

PrivateEndpoints

JSON representation
{
  "predictHttpUri": string,
  "explainHttpUri": string,
  "healthHttpUri": string,
  "serviceAttachment": string
}
Fields
predictHttpUri

string

Output only. Http(s) path to send prediction requests.

explainHttpUri

string

Output only. Http(s) path to send explain requests.

healthHttpUri

string

Output only. Http(s) path to send health check requests.

serviceAttachment

string

Output only. The name of the service attachment resource. Populated if private service connect is enabled.

FasterDeploymentConfig

JSON representation
{
  "fastTryoutEnabled": boolean
}
Fields
fastTryoutEnabled

boolean

If true, enable fast tryout feature for this deployed model.

RolloutOptions

JSON representation
{
  "previousDeployedModel": string,
  "revisionNumber": integer,

  // Union field max_unavailable can be only one of the following:
  "maxUnavailableReplicas": integer,
  "maxUnavailablePercentage": integer
  // End of list of possible types for union field max_unavailable.

  // Union field max_surge can be only one of the following:
  "maxSurgeReplicas": integer,
  "maxSurgePercentage": integer
  // End of list of possible types for union field max_surge.
}
Fields
previousDeployedModel

string

ID of the DeployedModel that this deployment should replace.

revisionNumber

integer

Output only. The revision number determines the relative priority of DeployedModels in the same rollout. The DeployedModel with the largest revision number specifies the intended state of the deployment.

Union field max_unavailable. Configures how many replicas are allowed to be unavailable during a rolling deployment. max_unavailable can be only one of the following:
maxUnavailableReplicas

integer

Absolute count of replicas allowed to be unavailable.

maxUnavailablePercentage

integer

Percentage of replicas allowed to be unavailable. For autoscaling deployments, this refers to the target replica count.

Union field max_surge. Configures how many additional replicas can be provisioned during a rolling deployment. max_surge can be only one of the following:
maxSurgeReplicas

integer

Absolute count of allowed additional replicas.

maxSurgePercentage

integer

Percentage of allowed additional replicas. For autoscaling deployments, this refers to the target replica count.
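A RolloutOptions sketch that replaces an existing DeployedModel while allowing at most one surge replica and zero unavailable replicas (the DeployedModel ID is hypothetical):

```python
# Illustrative RolloutOptions; pick one field from each union
# (max_unavailable and max_surge), either an absolute count or a percentage.
rollout_options = {
    "previousDeployedModel": "1234567890",  # hypothetical DeployedModel ID
    "maxUnavailableReplicas": 0,
    "maxSurgeReplicas": 1,
}
```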

Status

JSON representation
{
  "message": string,
  "lastUpdateTime": string,
  "availableReplicaCount": integer
}
Fields
message

string

Output only. The latest deployed model's status message (if any).

lastUpdateTime

string (Timestamp format)

Output only. The time at which the status was last updated.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

availableReplicaCount

integer

Output only. The number of available replicas of the deployed model.

SystemLabelsEntry

JSON representation
{
  "key": string,
  "value": string
}
Fields
key

string

value

string

SpeculativeDecodingSpec

JSON representation
{
  "speculativeTokenCount": integer,

  // Union field speculation can be only one of the following:
  "draftModelSpeculation": {
    object (DraftModelSpeculation)
  },
  "ngramSpeculation": {
    object (NgramSpeculation)
  }
  // End of list of possible types for union field speculation.
}
Fields
speculativeTokenCount

integer

The number of speculative tokens to generate at each step.

Union field speculation. The type of speculation method to use. speculation can be only one of the following:
draftModelSpeculation

object (DraftModelSpeculation)

Draft model speculation.

ngramSpeculation

object (NgramSpeculation)

N-Gram speculation.

DraftModelSpeculation

JSON representation
{
  "draftModel": string
}
Fields
draftModel

string

Required. The resource name of the draft model.

NgramSpeculation

JSON representation
{
  "ngramSize": integer
}
Fields
ngramSize

integer

The number of last N input tokens used as ngram to search/match against the previous prompt sequence. This is equal to the N in N-Gram. The default value is 3 if not specified.
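A SpeculativeDecodingSpec using the N-gram branch of the speculation union could be sketched as follows (the values are illustrative):

```python
# Illustrative SpeculativeDecodingSpec; set exactly one speculation branch,
# either draftModelSpeculation or ngramSpeculation.
speculative_decoding = {
    "speculativeTokenCount": 5,
    "ngramSpeculation": {"ngramSize": 3},  # 3 is also the documented default
}
assert "draftModelSpeculation" not in speculative_decoding
```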

TrafficSplitEntry

JSON representation
{
  "key": string,
  "value": integer
}
Fields
key

string

value

integer

LabelsEntry

JSON representation
{
  "key": string,
  "value": string
}
Fields
key

string

value

string

EncryptionSpec

JSON representation
{
  "kmsKeyName": string
}
Fields
kmsKeyName

string

Required. Resource name of the Cloud KMS key used to protect the resource.

The Cloud KMS key must be in the same region as the resource. It must have the format projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}.

PrivateServiceConnectConfig

JSON representation
{
  "enablePrivateServiceConnect": boolean,
  "projectAllowlist": [
    string
  ],
  "pscAutomationConfigs": [
    {
      object (PSCAutomationConfig)
    }
  ],
  "enableSecurePrivateServiceConnect": boolean,
  "serviceAttachment": string
}
Fields
enablePrivateServiceConnect

boolean

Required. If true, expose the IndexEndpoint via private service connect.

projectAllowlist[]

string

A list of Projects from which the forwarding rule will target the service attachment.

pscAutomationConfigs[]

object (PSCAutomationConfig)

Optional. List of projects and networks where the PSC endpoints will be created. This field is used by Online Inference (Prediction) only.

enableSecurePrivateServiceConnect

boolean

Optional. If set to true, enable secure private service connect with IAM authorization. Otherwise, private service connect will be done without authorization. Note that latency will be slightly increased if authorization is enabled.

serviceAttachment

string

Output only. The name of the generated service attachment resource. This is only populated if the endpoint is deployed with PrivateServiceConnect.

PSCAutomationConfig

JSON representation
{
  "projectId": string,
  "network": string,
  "ipAddress": string,
  "forwardingRule": string,
  "state": enum (PSCAutomationState),
  "errorMessage": string
}
Fields
projectId

string

Required. Project ID used to create the forwarding rule.

network

string

Required. The full name of the Google Compute Engine network. Format: projects/{project}/global/networks/{network}.

ipAddress

string

Output only. IP address rule created by the PSC service automation.

forwardingRule

string

Output only. Forwarding rule created by the PSC service automation.

state

enum (PSCAutomationState)

Output only. The state of the PSC service automation.

errorMessage

string

Output only. Error message if the PSC service automation failed.

PredictRequestResponseLoggingConfig

JSON representation
{
  "enabled": boolean,
  "samplingRate": number,
  "errorSamplingRate": number,
  "bigqueryDestination": {
    object (BigQueryDestination)
  },
  "requestResponseLoggingSchemaVersion": string,
  "enableOtelLogging": boolean
}
Fields
enabled

boolean

If logging is enabled or not.

samplingRate

number

Percentage of requests to be logged, expressed as a fraction in the range (0, 1].

errorSamplingRate

number

Optional. Percentage of failed requests to be logged, expressed as a fraction in range [0,1]. Only non-transient errors will be logged (currently 500/Internal errors).

bigqueryDestination

object (BigQueryDestination)

BigQuery table for logging. If only a project is given, a new dataset will be created with the name logging_&lt;endpoint-display-name&gt;_&lt;endpoint-id&gt;, where &lt;endpoint-display-name&gt; is made BigQuery-dataset-name compatible (e.g. most special characters become underscores). If no table name is given, a new table will be created with the name request_response_logging.

requestResponseLoggingSchemaVersion

string

Output only. The schema version used in creating the BigQuery table for the request response logging. The versions are "v1" and "v2". The current default version is "v1".

enableOtelLogging

boolean

This field is used for large models. If true, in addition to the original large model logs, logs will be converted in OTel schema format, and saved in otel_log column. Default value is false.

BigQueryDestination

JSON representation
{
  "outputUri": string
}
Fields
outputUri

string

Required. BigQuery URI to a project or table, up to 2000 characters long.

When only the project is specified, the Dataset and Table are created. When the full table reference is specified, the Dataset must exist and the table must not exist.

Accepted forms:

  • BigQuery path. For example: bq://projectId or bq://projectId.bqDatasetId or bq://projectId.bqDatasetId.bqTableId.
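Tying the two messages together, a PredictRequestResponseLoggingConfig that samples 10% of requests into a full table reference might be sketched as follows (the project, dataset, and table names are hypothetical):

```python
# Illustrative logging config; with a full table reference, the table must
# not already exist.
logging_config = {
    "enabled": True,
    "samplingRate": 0.1,  # fraction in (0, 1]
    "bigqueryDestination": {"outputUri": "bq://my-project.my_dataset.my_table"},
}
assert 0 < logging_config["samplingRate"] <= 1
```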

ClientConnectionConfig

JSON representation
{
  "inferenceTimeout": string
}
Fields
inferenceTimeout

string (Duration format)

Customizable online prediction request timeout.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".
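For instance, a numeric timeout in seconds can be encoded into the Duration string format like this:

```python
# Duration format: seconds with up to nine fractional digits, 's' suffix.
timeout_seconds = 90.5
client_connection_config = {"inferenceTimeout": f"{timeout_seconds}s"}
```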

GenAiAdvancedFeaturesConfig

JSON representation
{
  "ragConfig": {
    object (RagConfig)
  }
}
Fields
ragConfig

object (RagConfig)

Configuration for Retrieval Augmented Generation feature.

RagConfig

JSON representation
{
  "enableRag": boolean
}
Fields
enableRag

boolean

If true, enable Retrieval Augmented Generation in ChatCompletion requests. Once enabled, the endpoint is identified as a GenAI endpoint and the Arthedain router is used.

Output Schema

This resource represents a long-running operation that is the result of a network API call.

Operation

JSON representation
{
  "name": string,
  "metadata": {
    "@type": string,
    field1: ...,
    ...
  },
  "done": boolean,

  // Union field result can be only one of the following:
  "error": {
    object (Status)
  },
  "response": {
    "@type": string,
    field1: ...,
    ...
  }
  // End of list of possible types for union field result.
}
Fields
name

string

The server-assigned name, which is only unique within the same service that originally returns it. If you use the default HTTP mapping, the name should be a resource name ending with operations/{unique_id}.

metadata

object

Service-specific metadata associated with the operation. It typically contains progress information and common metadata such as create time. Some services might not provide such metadata. Any method that returns a long-running operation should document the metadata type, if any.

An object containing fields of an arbitrary type. An additional field "@type" contains a URI identifying the type. Example: { "id": 1234, "@type": "types.example.com/standard/id" }.

done

boolean

If the value is false, it means the operation is still in progress. If true, the operation is completed, and either error or response is available.

Union field result. The operation result, which can be either an error or a valid response. If done == false, neither error nor response is set. If done == true, exactly one of error or response can be set. Some services might not provide the result. result can be only one of the following:
error

object (Status)

The error result of the operation in case of failure or cancellation.

response

object

The normal, successful response of the operation. If the original method returns no data on success, such as Delete, the response is google.protobuf.Empty. If the original method is standard Get/Create/Update, the response should be the resource. For other methods, the response should have the type XxxResponse, where Xxx is the original method name. For example, if the original method name is TakeSnapshot(), the inferred response type is TakeSnapshotResponse.

An object containing fields of an arbitrary type. An additional field "@type" contains a URI identifying the type. Example: { "id": 1234, "@type": "types.example.com/standard/id" }.
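A minimal sketch of handling the Operation returned by create_endpoint: once done is true, exactly one of error / response is set. The operation dict below is a hypothetical completed result, not actual API output:

```python
# Hypothetical completed Operation; when "done" is true, exactly one of
# "error" / "response" is present.
operation = {
    "name": "projects/p/locations/us-central1/operations/123",
    "done": True,
    "response": {
        "@type": "type.googleapis.com/google.cloud.aiplatform.v1.Endpoint",
        "name": "projects/p/locations/us-central1/endpoints/456",
    },
}

if not operation.get("done"):
    endpoint = None  # still running; poll the operation again later
elif "error" in operation:
    raise RuntimeError(operation["error"].get("message", "operation failed"))
else:
    endpoint = operation["response"]  # the created Endpoint resource
```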

Any

JSON representation
{
  "typeUrl": string,
  "value": string
}
Fields
typeUrl

string

Identifies the type of the serialized Protobuf message with a URI reference consisting of a prefix ending in a slash and the fully-qualified type name.

Example: type.googleapis.com/google.protobuf.StringValue

This string must contain at least one / character, and the content after the last / must be the fully-qualified name of the type in canonical form, without a leading dot. Do not write a scheme on these URI references so that clients do not attempt to contact them.

The prefix is arbitrary and Protobuf implementations are expected to simply strip off everything up to and including the last / to identify the type. type.googleapis.com/ is a common default prefix that some legacy implementations require. This prefix does not indicate the origin of the type, and URIs containing it are not expected to respond to any requests.

All type URL strings must be legal URI references with the additional restriction (for the text format) that the content of the reference must consist only of alphanumeric characters, percent-encoded escapes, and characters in the following set (not including the outer backticks): /-.~_!$&()*+,;=. Despite our allowing percent encodings, implementations should not unescape them to prevent confusion with existing parsers. For example, type.googleapis.com%2FFoo should be rejected.

In the original design of Any, the possibility of launching a type resolution service at these type URLs was considered but Protobuf never implemented one and considers contacting these URLs to be problematic and a potential security issue. Do not attempt to contact type URLs.

value

string (bytes format)

Holds a Protobuf serialization of the type described by type_url.

A base64-encoded string.

Status

JSON representation
{
  "code": integer,
  "message": string,
  "details": [
    {
      "@type": string,
      field1: ...,
      ...
    }
  ]
}
Fields
code

integer

The status code, which should be an enum value of google.rpc.Code.

message

string

A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.

details[]

object

A list of messages that carry the error details. There is a common set of message types for APIs to use.

An object containing fields of an arbitrary type. An additional field "@type" contains a URI identifying the type. Example: { "id": 1234, "@type": "types.example.com/standard/id" }.

Tool Annotations

Destructive Hint: ❌ | Idempotent Hint: ❌ | Read Only Hint: ❌ | Open World Hint: ❌