Tool: list_endpoints

Lists Endpoints in a Location.

The following sample demonstrates how to use curl to invoke the list_endpoints MCP tool.

```shell
curl --location 'https://us-central1-aiplatform.googleapis.com/mcp' \
  --header 'content-type: application/json' \
  --header 'accept: application/json, text/event-stream' \
  --data '{
    "method": "tools/call",
    "params": {
      "name": "list_endpoints",
      "arguments": {
        // provide these details according to the tool's MCP specification
      }
    },
    "jsonrpc": "2.0",
    "id": 1
  }'
```
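The same JSON-RPC 2.0 payload can be built programmatically. This is a minimal sketch; the project and location values are placeholders, and a real call additionally needs an OAuth2 bearer token (for example from `gcloud auth print-access-token`), which is not shown here.

```python
import json

def build_list_endpoints_call(arguments: dict) -> dict:
    """Build the JSON-RPC 2.0 body for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": "list_endpoints", "arguments": arguments},
    }

# "my-project" / "us-central1" are illustrative placeholders.
payload = build_list_endpoints_call(
    {"parent": "projects/my-project/locations/us-central1"}
)
body = json.dumps(payload)
```

POST `body` to the MCP URL shown in the curl sample, with the same `content-type` and `accept` headers plus an `Authorization: Bearer` header.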
Input Schema

Request message for EndpointService.ListEndpoints.

ListEndpointsRequest

JSON representation:

```json
{
  "parent": string,
  "filter": string,
  "pageSize": integer,
  "pageToken": string,
  "readMask": string,
  "gdcZone": string
}
```

| Fields | |
|---|---|
| `parent` | Required. The resource name of the Location from which to list the Endpoints. |
| `filter` | Optional. An expression for filtering the results of the request. For field names, both snake_case and camelCase are supported. |
| `pageSize` | Optional. The standard list page size. |
| `pageToken` | Optional. The standard list page token, typically obtained from the previous list response's `nextPageToken`. |
| `readMask` | Optional. Mask specifying which fields to read, given as a comma-separated list of fully qualified field names. |
| `gdcZone` | Optional. Configures the Google Distributed Cloud (GDC) environment for online prediction. Only set this field when the Endpoint is to be deployed in a GDC environment. |
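The fields above can be combined into the tool's `arguments` object. A minimal sketch; the project, location, and filter values are illustrative placeholders, not values from this reference.

```python
# Illustrative list_endpoints arguments. The parent format assumes the
# standard Vertex AI "projects/{project}/locations/{location}" pattern.
arguments = {
    "parent": "projects/my-project/locations/us-central1",
    "filter": 'displayName="my-endpoint"',   # hypothetical filter expression
    "pageSize": 50,
    "readMask": "name,displayName,createTime",
}
```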
FieldMask

JSON representation:

```json
{ "paths": [ string ] }
```

| Fields | |
|---|---|
| `paths[]` | The set of field mask paths. |
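The relationship between the comma-separated `readMask` string and the `paths[]` array can be sketched as a trivial conversion (a helper of my own naming, not part of the API):

```python
def read_mask_to_field_mask(read_mask: str) -> dict:
    """Split a comma-separated readMask into FieldMask paths."""
    return {"paths": [p.strip() for p in read_mask.split(",") if p.strip()]}
```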
Output Schema

Response message for EndpointService.ListEndpoints.

ListEndpointsResponse

JSON representation:

```json
{
  "endpoints": [
    { object (Endpoint) }
  ],
  "nextPageToken": string
}
```

| Fields | |
|---|---|
| `endpoints[]` | List of Endpoints in the requested page. |
| `nextPageToken` | A token to retrieve the next page of results. Pass it as `pageToken` in the subsequent list request to obtain that page. |
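The `nextPageToken` contract above implies the usual pagination loop. A self-contained sketch, with the transport faked by an in-memory `PAGES` table (`call_list_endpoints` stands in for however you invoke the tool):

```python
# Two fake pages: an empty token means "first page" on request and
# "no more pages" on response.
PAGES = {
    "": {"endpoints": [{"name": "e1"}, {"name": "e2"}], "nextPageToken": "t1"},
    "t1": {"endpoints": [{"name": "e3"}], "nextPageToken": ""},
}

def call_list_endpoints(parent: str, page_token: str = "") -> dict:
    return PAGES[page_token]  # stand-in for the real MCP call

def list_all_endpoints(parent: str) -> list:
    endpoints, token = [], ""
    while True:
        resp = call_list_endpoints(parent, token)
        endpoints.extend(resp.get("endpoints", []))
        token = resp.get("nextPageToken", "")
        if not token:  # empty token: last page reached
            return endpoints

all_endpoints = list_all_endpoints("projects/my-project/locations/us-central1")
```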
Endpoint

JSON representation:

```json
{
  "name": string,
  "displayName": string,
  "description": string,
  "deployedModels": [
    { object (DeployedModel) }
  ],
  "trafficSplit": { string: integer, ... },
  "etag": string,
  "labels": { string: string, ... },
  "createTime": string,
  "updateTime": string,
  "encryptionSpec": { object (EncryptionSpec) },
  "network": string,
  "enablePrivateServiceConnect": boolean,
  "privateServiceConnectConfig": { object (PrivateServiceConnectConfig) },
  "modelDeploymentMonitoringJob": string,
  "predictRequestResponseLoggingConfig": { object (PredictRequestResponseLoggingConfig) },
  "dedicatedEndpointEnabled": boolean,
  "dedicatedEndpointDns": string,
  "clientConnectionConfig": { object (ClientConnectionConfig) },
  "satisfiesPzs": boolean,
  "satisfiesPzi": boolean,
  "genAiAdvancedFeaturesConfig": { object (GenAiAdvancedFeaturesConfig) }
}
```

| Fields | |
|---|---|
| `name` | Identifier. The resource name of the Endpoint. |
| `displayName` | Required. The display name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters. |
| `description` | The description of the Endpoint. |
| `deployedModels[]` | Output only. The models deployed in this Endpoint. To add or remove DeployedModels, use `EndpointService.DeployModel` and `EndpointService.UndeployModel` respectively. |
| `trafficSplit` | A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel. If a DeployedModel's ID is not listed in this map, it receives no traffic. The traffic percentage values must add up to 100, or the map must be empty if the Endpoint does not currently accept any traffic. |
| `etag` | Used to perform consistent read-modify-write updates. If not set, a blind "overwrite" update happens. |
| `labels` | The labels with user-defined metadata to organize your Endpoints. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels. |
| `createTime` | Output only. Timestamp when this Endpoint was created. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30". |
| `updateTime` | Output only. Timestamp when this Endpoint was last updated. Uses the same RFC 3339 format as `createTime`. |
| `encryptionSpec` | Customer-managed encryption key spec for an Endpoint. If set, this Endpoint and all sub-resources of this Endpoint will be secured by this key. |
| `network` | Optional. The full name of the Google Compute Engine network to which the Endpoint should be peered. Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network. Only one of `network` and `enablePrivateServiceConnect` can be set. |
| `enablePrivateServiceConnect` | Deprecated: If true, expose the Endpoint via private service connect. Only one of `enablePrivateServiceConnect` and `network` can be set. |
| `privateServiceConnectConfig` | Optional. Configuration for private service connect. |
| `modelDeploymentMonitoringJob` | Output only. Resource name of the Model Monitoring job associated with this Endpoint, if monitoring is enabled. |
| `predictRequestResponseLoggingConfig` | Configures the request-response logging for online prediction. |
| `dedicatedEndpointEnabled` | If true, the endpoint will be exposed through a dedicated DNS (`dedicatedEndpointDns`). Requests to the dedicated DNS are isolated from other users' traffic and have better performance and reliability. Note: once you enable the dedicated endpoint, you can no longer send requests to the shared DNS {region}-aiplatform.googleapis.com. This limitation will be removed soon. |
| `dedicatedEndpointDns` | Output only. DNS of the dedicated endpoint. Only populated if `dedicatedEndpointEnabled` is true. Depending on the features enabled, the uid might be a random number or a string. For example, if fast_tryout is enabled, the uid will be fasttryout. |
| `clientConnectionConfig` | Configurations that are applied to the endpoint for online prediction. |
| `satisfiesPzs` | Output only. Reserved for future use. |
| `satisfiesPzi` | Output only. Reserved for future use. |
| `genAiAdvancedFeaturesConfig` | Optional. Configuration for GenAiAdvancedFeatures. If the endpoint serves GenAI models, advanced features such as native RAG integration can be configured. Currently, only Model Garden models are supported. |
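The `trafficSplit` invariant described above (percentages sum to 100, or the map is empty) can be checked with a small validator sketch:

```python
def validate_traffic_split(traffic_split: dict) -> bool:
    """True if the map is empty (endpoint takes no traffic) or sums to 100."""
    if not traffic_split:
        return True
    return sum(traffic_split.values()) == 100
```

For example, `{"deployed-model-a": 80, "deployed-model-b": 20}` is valid, while `{"deployed-model-a": 90}` is not.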
DeployedModel

JSON representation:

```json
{
  "id": string,
  "model": string,
  "gdcConnectedModel": string,
  "modelVersionId": string,
  "displayName": string,
  "createTime": string,
  "explanationSpec": { object (ExplanationSpec) },
  "disableExplanations": boolean,
  "serviceAccount": string,
  "enableContainerLogging": boolean,
  "disableContainerLogging": boolean,
  "enableAccessLogging": boolean,
  "privateEndpoints": { object (PrivateEndpoints) },
  "fasterDeploymentConfig": { object (FasterDeploymentConfig) },
  "rolloutOptions": { object (RolloutOptions) },
  "status": { object (Status) },
  "systemLabels": { string: string, ... },
  "checkpointId": string,
  "speculativeDecodingSpec": { object (SpeculativeDecodingSpec) },

  // Union field prediction_resources can be only one of the following:
  "dedicatedResources": { object (DedicatedResources) },
  "automaticResources": { object (AutomaticResources) },
  "sharedResources": string,
  "fullFineTunedResources": { object (FullFineTunedResources) }
  // End of list of possible types for union field prediction_resources.
}
```

| Fields | |
|---|---|
| `id` | Immutable. The ID of the DeployedModel. If not provided upon deployment, Vertex AI will generate a value for this ID. This value should be 1-10 characters, and valid characters are `/[0-9]/`. |
| `model` | The resource name of the Model that this is the deployment of. Note that the Model may be in a different location than the DeployedModel's Endpoint. The resource name may contain a version ID or version alias to specify the version. |
| `gdcConnectedModel` | GDC pretrained / Gemini model name. The model name is a plain model name, e.g. gemini-1.5-flash-002. |
| `modelVersionId` | Output only. The version ID of the model that is deployed. |
| `displayName` | The display name of the DeployedModel. If not provided upon creation, the Model's display_name is used. |
| `createTime` | Output only. Timestamp when the DeployedModel was created. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. |
| `explanationSpec` | Explanation configuration for this DeployedModel. |
| `disableExplanations` | If true, deploy the model without the explainable feature, regardless of the existence of an explanation spec. |
| `serviceAccount` | The service account that the DeployedModel's container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn't have access to the resource project. Users deploying the Model must have the `iam.serviceAccounts.actAs` permission on this service account. |
| `enableContainerLogging` | If true, the containers of the DeployedModel instances send their `stderr` and `stdout` streams to Cloud Logging. Only supported for custom-trained Models and AutoML Tabular Models. |
| `disableContainerLogging` | For custom-trained Models and AutoML Tabular Models, the containers of the DeployedModel instances send their `stderr` and `stdout` streams to Cloud Logging by default. You can disable container logging by setting this flag to true. |
| `enableAccessLogging` | If true, online prediction access logs are sent to Cloud Logging. These logs are like standard server access logs, containing information like timestamp and latency for each prediction request. Note that logs may incur a cost, especially if your project receives prediction requests at a high queries-per-second (QPS) rate. Estimate your costs before enabling this option. |
| `privateEndpoints` | Output only. Provides paths for users to send predict/explain/health requests directly to the deployed model services running on Cloud via private services access. This field is populated if the Endpoint's `network` is configured. |
| `fasterDeploymentConfig` | Configuration for faster model deployment. |
| `rolloutOptions` | Options for configuring rolling deployments. |
| `status` | Output only. Runtime status of the deployed model. |
| `systemLabels` | System labels to apply to Model Garden deployments. System labels are managed by Google for internal use only. |
| `checkpointId` | The checkpoint ID of the model. |
| `speculativeDecodingSpec` | Optional. Spec for configuring speculative decoding. |
| Union field `prediction_resources`. The prediction (for example, the machine) resources that the DeployedModel uses. The user is billed for the resources (at least their minimal amount) even if the DeployedModel receives no traffic. Not all Models support all resource types. See Model.supported_deployment_resources_types. Required except for Large Model Deploy use cases. `prediction_resources` can be only one of the following: | |
| `dedicatedResources` | A description of resources that are dedicated to the DeployedModel and that need a higher degree of manual configuration. |
| `automaticResources` | A description of resources that are, to a large degree, decided by Vertex AI and require only a modest additional configuration. |
| `sharedResources` | The resource name of the shared DeploymentResourcePool to deploy on. |
| `fullFineTunedResources` | Optional. Resources for a full fine-tuned model. |
DedicatedResources

JSON representation:

```json
{
  "machineSpec": { object (MachineSpec) },
  "minReplicaCount": integer,
  "maxReplicaCount": integer,
  "requiredReplicaCount": integer,
  "initialReplicaCount": integer,
  "autoscalingMetricSpecs": [
    { object (AutoscalingMetricSpec) }
  ],
  "spot": boolean,
  "flexStart": { object (FlexStart) },
  "scaleToZeroSpec": { object (ScaleToZeroSpec) }
}
```

| Fields | |
|---|---|
| `machineSpec` | Required. Immutable. The specification of a single machine being used. |
| `minReplicaCount` | Required. Immutable. The minimum number of machine replicas that will always be deployed. This value must be greater than or equal to 1. If traffic increases, the model may dynamically be deployed onto more replicas, and as traffic decreases, some of these extra replicas may be freed. |
| `maxReplicaCount` | Immutable. The maximum number of replicas that may be deployed when traffic increases. If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale to that many replicas is guaranteed (barring service outages). If traffic increases beyond what the replicas at maximum can handle, a portion of the traffic will be dropped. If this value is not provided, `minReplicaCount` is used as the default. The value of this field impacts the charge against Vertex CPU and GPU quotas: specifically, you will be charged for (max_replica_count * number of cores in the selected machine type) and (max_replica_count * number of GPUs per replica in the selected machine type). |
| `requiredReplicaCount` | Optional. Number of available replicas required for the deployment to succeed. This field is only needed when partial deployment/mutation is desired. If set, the deploy/mutate operation will succeed once available_replica_count reaches required_replica_count, and the rest of the replicas will be retried. If not set, required_replica_count defaults to min_replica_count. |
| `initialReplicaCount` | Immutable. Number of initial replicas deployed when scaling the workload up from zero or when first creating the workload. |
| `autoscalingMetricSpecs[]` | Immutable. The metric specifications that override a resource utilization metric's target value (CPU utilization, accelerator duty cycle, and so on; the target defaults to 60 if not set). At most one entry is allowed per metric. For example, in the case of Online Prediction, to override the target CPU utilization to 80, set the CPU utilization metric spec's target to 80. |
| `spot` | Optional. If true, schedule the deployment workload on spot VMs. |
| `flexStart` | Optional. Immutable. If set, use DWS resources to schedule the deployment workload. Reference: https://cloud.google.com/blog/products/compute/introducing-dynamic-workload-scheduler |
| `scaleToZeroSpec` | Optional. Specification for the scale-to-zero feature. |
MachineSpec

JSON representation:

```json
{
  "machineType": string,
  "acceleratorType": enum (AcceleratorType),
  "acceleratorCount": integer,
  "gpuPartitionSize": string,
  "tpuTopology": string,
  "multihostGpuNodeCount": integer,
  "reservationAffinity": { object (ReservationAffinity) },
  "minGpuDriverVersion": string
}
```

| Fields | |
|---|---|
| `machineType` | Immutable. The type of the machine. See the list of machine types supported for prediction and the list of machine types supported for custom training. |
| `acceleratorType` | Immutable. The type of accelerator(s) that may be attached to the machine, as per `acceleratorCount`. |
| `acceleratorCount` | The number of accelerators to attach to the machine. For accelerator-optimized machine types (https://cloud.google.com/compute/docs/accelerator-optimized-machines), you may set accelerator_count from 1 to N for a machine with N GPUs. If accelerator_count is less than or equal to N / 2, Vertex will co-schedule replicas of the model onto the same VM to save cost. For example, if the machine type is a3-highgpu-8g, which has 8 H100 GPUs, you can set accelerator_count from 1 to 8. If accelerator_count is 1, 2, 3, or 4, Vertex will co-schedule 8, 4, 2, or 2 replicas of the model onto the same VM, respectively. When co-scheduling, the CPU, memory, and storage of the VM are distributed among the replicas on it; for example, a co-scheduled replica requesting 2 GPUs of an 8-GPU VM will receive 25% of the VM's CPU, memory, and storage. |
| `gpuPartitionSize` | Optional. Immutable. The Nvidia GPU partition size. When specified, the requested accelerators will be partitioned into smaller GPU partitions. For example, if the request is for 8 units of NVIDIA A100 GPUs and gpu_partition_size="1g.10gb", the service will create 8 * 7 = 56 partitioned MIG instances. The partition size must be a value supported by the requested accelerator. Refer to Nvidia GPU Partitioning for the available partition sizes. If set, accelerator_count should be set to 1. |
| `tpuTopology` | Immutable. The topology of the TPUs. Corresponds to the TPU topologies available from GKE. (Example: tpu_topology: "2x2x1".) |
| `multihostGpuNodeCount` | Optional. Immutable. The number of nodes per replica for multihost GPU deployments. |
| `reservationAffinity` | Optional. Immutable. Configuration controlling how this resource pool consumes reservations. |
| `minGpuDriverVersion` | Optional. Immutable. The minimum GPU driver version that this machine requires. For example, "535.104.06". If not specified, the default GPU driver version will be used by the underlying infrastructure. |
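The co-scheduling arithmetic described for `acceleratorCount` can be sketched as follows. `replicas_per_vm` is a name of my own; the rule it encodes is the one stated above: co-scheduling applies only when the requested count is at most half the VM's GPUs, and the packing is the floor of N divided by the count.

```python
def replicas_per_vm(gpus_per_vm: int, accelerator_count: int) -> int:
    """How many replicas Vertex can pack onto one VM per the rule above."""
    if accelerator_count * 2 > gpus_per_vm:
        return 1  # more than N/2 GPUs requested: no co-scheduling
    return gpus_per_vm // accelerator_count

# a3-highgpu-8g has 8 GPUs; counts 1..4 give the 8, 4, 2, 2 packing
# quoted in the acceleratorCount description.
packing = [replicas_per_vm(8, c) for c in (1, 2, 3, 4)]
```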
ReservationAffinity

JSON representation:

```json
{
  "reservationAffinityType": enum (Type),
  "key": string,
  "values": [ string ]
}
```

| Fields | |
|---|---|
| `reservationAffinityType` | Required. Specifies the reservation affinity type. |
| `key` | Optional. Corresponds to the label key of a reservation resource. To target a SPECIFIC_RESERVATION by name, use `compute.googleapis.com/reservation-name` as the key and specify the name of your reservation as its value. |
| `values[]` | Optional. Corresponds to the label values of a reservation resource. This must be the full resource name of the reservation or reservation block. |
AutoscalingMetricSpec

JSON representation:

```json
{
  "metricName": string,
  "target": integer,
  "monitoredResourceLabels": { string: string, ... }
}
```

| Fields | |
|---|---|
| `metricName` | Required. The resource metric name. |
| `target` | The target resource utilization in percentage (1% - 100%) for the given metric; once the real usage deviates from the target by a certain percentage, the machine replicas change. The default value is 60 (representing 60%) if not provided. |
| `monitoredResourceLabels` | Optional. The Cloud Monitoring monitored resource labels, as key-value pairs used for metrics filtering. See Cloud Monitoring Labels: https://cloud.google.com/monitoring/api/v3/metric-model#generic-label-info. An object containing a list of `"key": value` pairs. |
MonitoredResourceLabelsEntry

JSON representation:

```json
{ "key": string, "value": string }
```

| Fields | |
|---|---|
| `key` | |
| `value` | |
FlexStart

JSON representation:

```json
{ "maxRuntimeDuration": string }
```

| Fields | |
|---|---|
| `maxRuntimeDuration` | The maximum duration of the deployment; the deployment will be terminated after this duration. max_runtime_duration can be set up to 7 days. A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s". |
Duration

JSON representation:

```json
{ "seconds": string, "nanos": integer }
```

| Fields | |
|---|---|
| `seconds` | Signed seconds of the span of time. Must be from -315,576,000,000 to +315,576,000,000 inclusive. Note: these bounds are computed from: 60 sec/min * 60 min/hr * 24 hr/day * 365.25 days/year * 10000 years. |
| `nanos` | Signed fractions of a second at nanosecond resolution of the span of time. Durations less than one second are represented with a 0 `seconds` field and a positive or negative `nanos` field. For durations of one second or more, a non-zero `nanos` field must be of the same sign as the `seconds` field. Must be from -999,999,999 to +999,999,999 inclusive. |
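In JSON, a Duration is rendered as a decimal seconds string with an 's' suffix (the "3.5s" form used by fields like `maxRuntimeDuration`). A sketch of converting the `{seconds, nanos}` pair to that string, for non-negative durations:

```python
def duration_to_json(seconds: int, nanos: int = 0) -> str:
    """Encode a non-negative proto-style duration as the JSON 's' string."""
    total = seconds + nanos / 1_000_000_000
    # render with nine fractional digits, then trim trailing zeros
    s = f"{total:.9f}".rstrip("0").rstrip(".")
    return s + "s"
```

For example, 3 seconds and 500,000,000 nanos encode as "3.5s", and the 7-day FlexStart maximum encodes as "604800s".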
ScaleToZeroSpec

JSON representation:

```json
{ "minScaleupPeriod": string, "idleScaledownPeriod": string }
```

| Fields | |
|---|---|
| `minScaleupPeriod` | Optional. Minimum duration that a deployment will be scaled up before traffic is evaluated for potential scale-down. [MinValue=300] (5 minutes) [MaxValue=28800] (8 hours). A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s". |
| `idleScaledownPeriod` | Optional. Duration of no traffic before scaling to zero. [MinValue=300] (5 minutes) [MaxValue=28800] (8 hours). A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s". |
AutomaticResources

JSON representation:

```json
{ "minReplicaCount": integer, "maxReplicaCount": integer }
```

| Fields | |
|---|---|
| `minReplicaCount` | Immutable. The minimum number of replicas that will always be deployed. If traffic increases, the model may dynamically be deployed onto more replicas, up to `maxReplicaCount`, and as traffic decreases, some of these extra replicas may be freed. |
| `maxReplicaCount` | Immutable. The maximum number of replicas that may be deployed when traffic increases. If the requested value is too large, the deployment will error, but if deployment succeeds then the ability to scale to that many replicas is guaranteed (barring service outages). If traffic increases beyond what the replicas at maximum can handle, a portion of the traffic will be dropped. If this value is not provided, no upper bound for scaling under heavy traffic is assumed, though Vertex AI may be unable to scale beyond a certain replica number. |
FullFineTunedResources

JSON representation:

```json
{
  "deploymentType": enum (DeploymentType),
  "modelInferenceUnitCount": integer
}
```

| Fields | |
|---|---|
| `deploymentType` | Required. The kind of deployment. |
| `modelInferenceUnitCount` | Optional. The number of model inference units to use for this deployment. This can only be specified for DEPLOYMENT_TYPE_PROD. Model inference units per model type: Gemini 2.5 Flash (Foundation FMIU: 25, Expansion FMIU: 4); Gemini 2.5 Pro (Foundation FMIU: 32, Expansion FMIU: 16); Veo 3.0 undistilled (Foundation FMIU: 63, Expansion FMIU: 7); Veo 3.0 distilled (Foundation FMIU: 30, Expansion FMIU: 10). |
Timestamp

JSON representation:

```json
{ "seconds": string, "nanos": integer }
```

| Fields | |
|---|---|
| `seconds` | Represents seconds of UTC time since the Unix epoch 1970-01-01T00:00:00Z. Must be between -62135596800 and 253402300799 inclusive (which corresponds to 0001-01-01T00:00:00Z to 9999-12-31T23:59:59Z). |
| `nanos` | Non-negative fractions of a second at nanosecond resolution. This field is the nanosecond portion of the duration, not an alternative to seconds. Negative second values with fractions must still have non-negative nanos values that count forward in time. Must be between 0 and 999,999,999 inclusive. |
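The Z-normalized RFC 3339 rendering described for `createTime` and `updateTime` (0, 3, 6, or 9 fractional digits) can be sketched from the `{seconds, nanos}` pair:

```python
from datetime import datetime, timezone

def timestamp_to_json(seconds: int, nanos: int = 0) -> str:
    """Render a proto-style timestamp as Z-normalized RFC 3339."""
    base = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%S"
    )
    if nanos == 0:
        frac = ""                               # 0 fractional digits
    elif nanos % 1_000_000 == 0:
        frac = f".{nanos // 1_000_000:03d}"     # milliseconds: 3 digits
    elif nanos % 1_000 == 0:
        frac = f".{nanos // 1_000:06d}"         # microseconds: 6 digits
    else:
        frac = f".{nanos:09d}"                  # nanoseconds: 9 digits
    return base + frac + "Z"
```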
ExplanationSpec

JSON representation:

```json
{
  "parameters": { object (ExplanationParameters) },
  "metadata": { object (ExplanationMetadata) }
}
```

| Fields | |
|---|---|
| `parameters` | Required. Parameters that configure explaining of the Model's predictions. |
| `metadata` | Optional. Metadata describing the Model's input and output for explanation. |
ExplanationParameters

JSON representation:

```json
{
  "topK": integer,
  "outputIndices": array,

  // Union field method can be only one of the following:
  "sampledShapleyAttribution": { object (SampledShapleyAttribution) },
  "integratedGradientsAttribution": { object (IntegratedGradientsAttribution) },
  "xraiAttribution": { object (XraiAttribution) },
  "examples": { object (Examples) }
  // End of list of possible types for union field method.
}
```

| Fields | |
|---|---|
| `topK` | If populated, returns attributions for the top K indices of outputs (defaults to 1). Only applies to Models that predict more than one output (e.g., multi-class Models). When set to -1, returns explanations for all outputs. |
| `outputIndices` | If populated, only returns attributions for the specified output indices. If not populated, returns attributions for the top K indices of outputs. Only applicable to Models that predict multiple outputs (e.g., multi-class Models that predict multiple classes). |
| Union field `method`. The attribution method. `method` can be only one of the following: | |
| `sampledShapleyAttribution` | An attribution method that approximates Shapley values for features that contribute to the label being predicted. A sampling strategy is used to approximate the value rather than considering all subsets of features. Refer to this paper for model details: https://arxiv.org/abs/1306.4265. |
| `integratedGradientsAttribution` | An attribution method that computes Aumann-Shapley values, taking advantage of the model's fully differentiable structure. Refer to this paper for more details: https://arxiv.org/abs/1703.01365 |
| `xraiAttribution` | An attribution method that redistributes Integrated Gradients attribution to segmented regions, taking advantage of the model's fully differentiable structure. Refer to this paper for more details: https://arxiv.org/abs/1906.02825. XRAI currently performs better on natural images, like a picture of a house or an animal. If the images are taken in artificial environments, like a lab or manufacturing line, or from diagnostic equipment, like x-rays or quality-control cameras, use Integrated Gradients instead. |
| `examples` | Example-based explanations that return the nearest neighbors from the provided dataset. |
SampledShapleyAttribution

JSON representation:

```json
{ "pathCount": integer }
```

| Fields | |
|---|---|
| `pathCount` | Required. The number of feature permutations to consider when approximating the Shapley values. The valid range is [1, 50], inclusive. |
IntegratedGradientsAttribution

JSON representation:

```json
{
  "stepCount": integer,
  "smoothGradConfig": { object (SmoothGradConfig) },
  "blurBaselineConfig": { object (BlurBaselineConfig) }
}
```

| Fields | |
|---|---|
| `stepCount` | Required. The number of steps for approximating the path integral. A good value to start with is 50; gradually increase it until the sum-to-diff property is within the desired error range. The valid range is [1, 100], inclusive. |
| `smoothGradConfig` | Config for SmoothGrad approximation of gradients. When enabled, the gradients are approximated by averaging the gradients from noisy samples in the vicinity of the inputs. Adding noise can help improve the computed gradients. Refer to this paper for more details: https://arxiv.org/pdf/1706.03825.pdf |
| `blurBaselineConfig` | Config for IG with blur baseline. When enabled, a linear path from the maximally blurred image to the input image is created. Using a blurred baseline instead of zero (black image) is motivated by the BlurIG approach explained here: https://arxiv.org/abs/2004.03383 |
SmoothGradConfig

JSON representation:

```json
{
  "noisySampleCount": integer,

  // Union field GradientNoiseSigma can be only one of the following:
  "noiseSigma": number,
  "featureNoiseSigma": { object (FeatureNoiseSigma) }
  // End of list of possible types for union field GradientNoiseSigma.
}
```

| Fields | |
|---|---|
| `noisySampleCount` | The number of gradient samples to use for approximation. The higher this number, the more accurate the gradient is, but the runtime complexity increases by this factor as well. The valid range is [1, 50]. Defaults to 3. |
| Union field `GradientNoiseSigma`. Represents the standard deviation of the Gaussian kernel that will be used to add noise to the interpolated inputs prior to computing gradients. `GradientNoiseSigma` can be only one of the following: | |
| `noiseSigma` | A single float value that will be used to add noise to all the features. Use this field when all features are normalized to have the same distribution: scaled to range [0, 1], [-1, 1], or z-scored, where features are normalized to have 0 mean and 1 variance. Learn more about normalization. For best results, the recommended value is about 10% - 20% of the standard deviation of the input feature. Refer to section 3.2 of the SmoothGrad paper: https://arxiv.org/pdf/1706.03825.pdf. Defaults to 0.1. If the distribution is different per feature, set `featureNoiseSigma` instead. |
| `featureNoiseSigma` | Similar to `noiseSigma`, but provides additional flexibility: a separate noise sigma can be provided for each feature. |
FeatureNoiseSigma

JSON representation:

```json
{
  "noiseSigma": [
    { object (NoiseSigmaForFeature) }
  ]
}
```

| Fields | |
|---|---|
| `noiseSigma[]` | Noise sigma per feature. No noise is added to features that are not set. |
NoiseSigmaForFeature

JSON representation:

```json
{ "name": string, "sigma": number }
```

| Fields | |
|---|---|
| `name` | The name of the input feature for which noise sigma is provided. The features are defined in the `ExplanationMetadata` inputs. |
| `sigma` | The standard deviation of the Gaussian kernel that will be used to add noise to the feature prior to computing gradients. Similar to `noiseSigma`, but represents the noise added to the current feature. Defaults to 0.1. |
BlurBaselineConfig

JSON representation:

```json
{ "maxBlurSigma": number }
```

| Fields | |
|---|---|
| `maxBlurSigma` | The standard deviation of the blur kernel for the blurred baseline. The same blurring parameter is used for both the height and the width dimension. If not set, the method defaults to the zero (i.e. black for images) baseline. |
XraiAttribution

JSON representation:

```json
{
  "stepCount": integer,
  "smoothGradConfig": { object (SmoothGradConfig) },
  "blurBaselineConfig": { object (BlurBaselineConfig) }
}
```

| Fields | |
|---|---|
| `stepCount` | Required. The number of steps for approximating the path integral. A good value to start with is 50; gradually increase it until the sum-to-diff property is met within the desired error range. The valid range is [1, 100], inclusive. |
| `smoothGradConfig` | Config for SmoothGrad approximation of gradients. When enabled, the gradients are approximated by averaging the gradients from noisy samples in the vicinity of the inputs. Adding noise can help improve the computed gradients. Refer to this paper for more details: https://arxiv.org/pdf/1706.03825.pdf |
| `blurBaselineConfig` | Config for XRAI with blur baseline. When enabled, a linear path from the maximally blurred image to the input image is created. Using a blurred baseline instead of zero (black image) is motivated by the BlurIG approach explained here: https://arxiv.org/abs/2004.03383 |
Examples

JSON representation:

```json
{
  "gcsSource": { object (GcsSource) },
  "neighborCount": integer,

  // Union field source can be only one of the following:
  "exampleGcsSource": { object (ExampleGcsSource) },
  // End of list of possible types for union field source.

  // Union field config can be only one of the following:
  "nearestNeighborSearchConfig": value,
  "presets": { object (Presets) }
  // End of list of possible types for union field config.
}
```

| Fields | |
|---|---|
| `gcsSource` | The Cloud Storage locations that contain the instances to be indexed for approximate nearest neighbor search. |
| `neighborCount` | The number of neighbors to return when querying for examples. |
| Union field `source` can be only one of the following: | |
| `exampleGcsSource` | The Cloud Storage input instances. |
| Union field `config` can be only one of the following: | |
| `nearestNeighborSearchConfig` | The full configuration for the generated index. |
| `presets` | Simplified preset configuration, which automatically sets configuration values based on the desired query speed-precision trade-off and modality. |
ExampleGcsSource

JSON representation:

```json
{
  "dataFormat": enum (DataFormat),
  "gcsSource": { object (GcsSource) }
}
```

| Fields | |
|---|---|
| `dataFormat` | The format in which instances are given. If not specified, JSONL is assumed. Currently only the JSONL format is supported. |
| `gcsSource` | The Cloud Storage location for the input instances. |
GcsSource

JSON representation:

```json
{ "uris": [ string ] }
```

| Fields | |
|---|---|
| `uris[]` | Required. Google Cloud Storage URI(s) to the input file(s). May contain wildcards. For more information on wildcards, see https://cloud.google.com/storage/docs/wildcards. |
Value

JSON representation:

```json
{
  // Union field kind can be only one of the following:
  "nullValue": null,
  "numberValue": number,
  "stringValue": string,
  "boolValue": boolean,
  "structValue": { object (Struct) },
  "listValue": [ value ]
  // End of list of possible types for union field kind.
}
```

| Fields | |
|---|---|
| Union field `kind`. The kind of value. `kind` can be only one of the following: | |
| `nullValue` | Represents a null value. |
| `numberValue` | Represents a double value. |
| `stringValue` | Represents a string value. |
| `boolValue` | Represents a boolean value. |
| `structValue` | Represents a structured value. |
| `listValue` | Represents a repeated `Value`. |
Struct

JSON representation:

```json
{ "fields": { string: value, ... } }
```

| Fields | |
|---|---|
| `fields` | Unordered map of dynamically typed values. An object containing a list of `"key": value` pairs. |

FieldsEntry

JSON representation:

```json
{ "key": string, "value": value }
```

| Fields | |
|---|---|
| `key` | |
| `value` | |
ListValue

JSON representation:

```json
{ "values": [ value ] }
```

| Fields | |
|---|---|
| `values[]` | Repeated field of dynamically typed values. |
Presets

JSON representation:

```json
{
  "modality": enum (Modality),
  "query": enum (Query)
}
```

| Fields | |
|---|---|
| `modality` | The modality of the uploaded model, which automatically configures the distance measurement and feature normalization for the underlying example index and queries. If your model does not precisely fit one of these types, it is okay to choose the closest type. |
| `query` | Preset option controlling parameters for the speed-precision trade-off when querying for examples. If omitted, defaults to `PRECISE`. |
ExplanationMetadata

JSON representation:

```json
{
  "inputs": {
    string: { object (InputMetadata) },
    ...
  },
  "outputs": {
    string: { object (OutputMetadata) },
    ...
  },
  "featureAttributionsSchemaUri": string,
  "latentSpaceSource": string
}
```

| Fields | |
|---|---|
| `inputs` | Required. Map from feature names to feature input metadata. Keys are the names of the features; values are the specifications of the features. An empty InputMetadata is valid: it describes a text feature that has the name specified as the key in `inputs`. For Vertex AI-provided Tensorflow images, the key can be any friendly name of the feature; once specified, the feature attributions are keyed by this name. For custom images, the key must match the key in the instance. An object containing a list of `"key": value` pairs. |
| `outputs` | Required. Map from output names to output metadata. For Vertex AI-provided Tensorflow images, keys can be any user-defined string that consists of any UTF-8 characters. For custom images, keys are the name of the output field in the prediction to be explained. Currently only one key is allowed. An object containing a list of `"key": value` pairs. |
| `featureAttributionsSchemaUri` | Points to a YAML file stored on Google Cloud Storage describing the format of the feature attributions. |
| `latentSpaceSource` | Name of the source to generate embeddings for example-based explanations. |
InputsEntry

JSON representation:

```json
{
  "key": string,
  "value": { object (InputMetadata) }
}
```

| Fields | |
|---|---|
| `key` | |
| `value` | |
InputMetadata
| JSON representation |
|---|
{ "inputBaselines": [ value ], "inputTensorName": string, "encoding": enum ( |
| Fields | |
|---|---|
inputBaselines[] |
Baseline inputs for this feature. If no baseline is specified, Vertex AI chooses the baseline for this feature. If multiple baselines are specified, Vertex AI returns the average attributions across them in For Vertex AI-provided Tensorflow images (both 1.x and 2.x), the shape of each baseline must match the shape of the input tensor. If a scalar is provided, we broadcast to the same shape as the input tensor. For custom images, the element of the baselines must be in the same format as the feature's input in the |
inputTensorName |
Name of the input tensor for this feature. Required and is only applicable to Vertex AI-provided images for Tensorflow. |
encoding |
Defines how the feature is encoded into the input tensor. Defaults to IDENTITY. |
modality |
Modality of the feature. Valid values are: numeric, image. Defaults to numeric. |
featureValueDomain |
The domain details of the input feature value. Like min/max, original mean or standard deviation if normalized. |
indicesTensorName |
Specifies the index of the values of the input tensor. Required when the input tensor is a sparse representation. Refer to Tensorflow documentation for more details: https://www.tensorflow.org/api_docs/python/tf/sparse/SparseTensor. |
denseShapeTensorName |
Specifies the shape of the values of the input if the input is a sparse representation. Refer to Tensorflow documentation for more details: https://www.tensorflow.org/api_docs/python/tf/sparse/SparseTensor. |
indexFeatureMapping[] |
A list of feature names for each index in the input tensor. Required when the input |
encodedTensorName |
Encoded tensor is a transformation of the input tensor. Must be provided if choosing An encoded tensor is generated if the input tensor is encoded by a lookup table. |
encodedBaselines[] |
A list of baselines for the encoded tensor. The shape of each baseline should match the shape of the encoded tensor. If a scalar is provided, Vertex AI broadcasts to the same shape as the encoded tensor. |
visualization |
Visualization configurations for image explanation. |
groupName |
Name of the group that the input belongs to. Features with the same group name are treated as one feature when computing attributions; grouped features can have different shapes in value. If provided, a single attribution is generated in |
FeatureValueDomain
| JSON representation |
|---|
{ "minValue": number, "maxValue": number, "originalMean": number, "originalStddev": number } |
| Fields | |
|---|---|
minValue |
The minimum permissible value for this feature. |
maxValue |
The maximum permissible value for this feature. |
originalMean |
If this input feature has been normalized to a mean value of 0, originalMean specifies the mean value of the domain prior to normalization. |
originalStddev |
If this input feature has been normalized to a standard deviation of 1.0, originalStddev specifies the standard deviation of the domain prior to normalization. |
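As a sketch of how these fields relate, the snippet below recovers original-scale values from a normalized feature, assuming the feature was normalized as (x - originalMean) / originalStddev; the field names mirror the JSON representation above, and the values are illustrative.

```python
# Hedged sketch: map a normalized feature value back to its original domain,
# assuming standard z-score normalization was applied.
domain = {"minValue": 0.0, "maxValue": 100.0, "originalMean": 54.2, "originalStddev": 10.5}

def denormalize(z: float, domain: dict) -> float:
    """Invert z = (x - originalMean) / originalStddev."""
    return z * domain["originalStddev"] + domain["originalMean"]

print(denormalize(0.0, domain))  # the mean of the original domain: 54.2
```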
Visualization
| JSON representation |
|---|
{ "type": enum ( |
| Fields | |
|---|---|
type |
Type of the image visualization. Only applicable to Integrated Gradients attribution. |
polarity |
Whether to highlight only pixels with positive contributions, only those with negative contributions, or both. Defaults to POSITIVE. |
colorMap |
The color scheme used for the highlighted areas. Defaults to PINK_GREEN for Integrated Gradients attribution and to VIRIDIS for XRAI attribution. |
clipPercentUpperbound |
Excludes attributions above the specified percentile from the highlighted areas. Using clipPercentUpperbound and clipPercentLowerbound together can be useful for filtering out noise and making it easier to see areas of strong attribution. Defaults to 99.9. |
clipPercentLowerbound |
Excludes attributions below the specified percentile from the highlighted areas. Defaults to 62. |
overlayType |
How the original image is displayed in the visualization. Adjusting the overlay can help increase visual clarity if the original image makes it difficult to view the visualization. Defaults to NONE. |
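To make the fields above concrete, here is a hedged sketch of a Visualization config; the enum values shown are illustrative choices, and the clip bounds are the documented defaults.

```python
import json

# Illustrative Visualization payload for an image explanation.
# Field names mirror the JSON representation above; values are examples,
# not recommendations.
visualization = {
    "type": "OUTLINES",             # assumed enum value, for illustration
    "polarity": "POSITIVE",         # default per the field table
    "colorMap": "PINK_GREEN",
    "clipPercentUpperbound": 99.9,  # default: drop the top 0.1% of attributions
    "clipPercentLowerbound": 62,    # default: drop the bottom 62%
    "overlayType": "NONE",          # default
}
print(json.dumps(visualization, indent=2))
```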
OutputsEntry
| JSON representation |
|---|
{
"key": string,
"value": {
object ( |
| Fields | |
|---|---|
key |
|
value |
|
OutputMetadata
| JSON representation |
|---|
{ "outputTensorName": string, // Union field |
| Fields | |
|---|---|
outputTensorName |
Name of the output tensor. Required; applicable only to Vertex AI-provided images for TensorFlow. |
Union field display_name_mapping. Defines how output indices map to display names. If neither field is specified, no display name mapping is computed. display_name_mapping can be only one of the following: |
|
indexDisplayNameMapping |
Static mapping between the index and display name. Use this if the outputs are a deterministic n-dimensional array, e.g. a list of scores for all the classes in a predefined order for a multi-classification Model. It is not feasible if the outputs are non-deterministic, e.g. if the Model produces the top-k classes or sorts the outputs by their values. The shape of the value must be an n-dimensional array of strings. The number of dimensions must match that of the outputs to be explained. The |
displayNameMappingKey |
Specify a field name in the prediction to look up the display name. Use this if the prediction contains the display names for the outputs. The display names in the prediction must have the same shape as the outputs, so that they can be located by |
PrivateEndpoints
| JSON representation |
|---|
{ "predictHttpUri": string, "explainHttpUri": string, "healthHttpUri": string, "serviceAttachment": string } |
| Fields | |
|---|---|
predictHttpUri |
Output only. HTTP(S) path to send prediction requests to. |
explainHttpUri |
Output only. HTTP(S) path to send explain requests to. |
healthHttpUri |
Output only. HTTP(S) path to send health check requests to. |
serviceAttachment |
Output only. The name of the service attachment resource. Populated only if Private Service Connect is enabled. |
FasterDeploymentConfig
| JSON representation |
|---|
{ "fastTryoutEnabled": boolean } |
| Fields | |
|---|---|
fastTryoutEnabled |
If true, enables the fast tryout feature for this deployed model. |
RolloutOptions
| JSON representation |
|---|
{ "previousDeployedModel": string, "revisionNumber": integer, // Union field |
| Fields | |
|---|---|
previousDeployedModel |
ID of the DeployedModel that this deployment should replace. |
revisionNumber |
Output only. The revision number determines the relative priority of DeployedModels in the same rollout. The DeployedModel with the largest revision number specifies the intended state of the deployment. |
Union field max_unavailable. Configures how many replicas are allowed to be unavailable during a rolling deployment. max_unavailable can be only one of the following: |
|
maxUnavailableReplicas |
Absolute count of replicas allowed to be unavailable. |
maxUnavailablePercentage |
Percentage of replicas allowed to be unavailable. For autoscaling deployments, this refers to the target replica count. |
Union field max_surge. Configures how many additional replicas can be provisioned during a rolling deployment. max_surge can be only one of the following: |
|
maxSurgeReplicas |
Absolute count of allowed additional replicas. |
maxSurgePercentage |
Percentage of allowed additional replicas. For autoscaling deployments, this refers to the target replica count. |
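The percentage variants above translate into replica counts against the target replica count. The sketch below illustrates one plausible conversion; the rounding behavior is an assumption for illustration, since the exact rounding used by the service is not specified here.

```python
import math

# Hedged sketch: convert maxUnavailablePercentage / maxSurgePercentage into
# replica counts for a rolling deployment. Rounding down for unavailability
# and up for surge is an assumption, chosen to keep the rollout conservative.
def rollout_bounds(target_replicas: int, max_unavailable_pct: int, max_surge_pct: int):
    max_unavailable = math.floor(target_replicas * max_unavailable_pct / 100)
    max_surge = math.ceil(target_replicas * max_surge_pct / 100)
    return max_unavailable, max_surge

print(rollout_bounds(10, 25, 25))  # -> (2, 3)
```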
Status
| JSON representation |
|---|
{ "message": string, "lastUpdateTime": string, "availableReplicaCount": integer } |
| Fields | |
|---|---|
message |
Output only. The latest deployed model's status message (if any). |
lastUpdateTime |
Output only. The time at which the status was last updated. Uses RFC 3339, where generated output is always Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30". |
availableReplicaCount |
Output only. The number of available replicas of the deployed model. |
SystemLabelsEntry
| JSON representation |
|---|
{ "key": string, "value": string } |
| Fields | |
|---|---|
key |
|
value |
|
SpeculativeDecodingSpec
| JSON representation |
|---|
{ "speculativeTokenCount": integer, // Union field |
| Fields | |
|---|---|
speculativeTokenCount |
The number of speculative tokens to generate at each step. |
Union field speculation. The type of speculation method to use. speculation can be only one of the following: |
|
draftModelSpeculation |
Draft model speculation. |
ngramSpeculation |
N-Gram speculation. |
DraftModelSpeculation
| JSON representation |
|---|
{ "draftModel": string } |
| Fields | |
|---|---|
draftModel |
Required. The resource name of the draft model. |
NgramSpeculation
| JSON representation |
|---|
{ "ngramSize": integer } |
| Fields | |
|---|---|
ngramSize |
The number of trailing input tokens used as the n-gram to search for and match against the earlier prompt sequence; this is the N in N-Gram. Defaults to 3 if not specified. |
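The idea behind n-gram speculation can be sketched as follows: propose the tokens that followed the most recent occurrence of the current trailing n-gram earlier in the sequence. This is a toy illustration of the concept only; the service's actual matching strategy is not specified above.

```python
# Toy sketch of N-Gram speculation. Field names (ngramSize,
# speculativeTokenCount) correspond to the parameters below.
def ngram_speculate(tokens, ngram_size=3, speculative_token_count=2):
    if len(tokens) < ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Scan earlier positions for the same n-gram, most recent match first.
    for i in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[i:i + ngram_size] == tail:
            # Speculate the tokens that followed the matched n-gram.
            return tokens[i + ngram_size:i + ngram_size + speculative_token_count]
    return []

print(ngram_speculate(["a", "b", "c", "x", "y", "a", "b", "c"]))  # -> ['x', 'y']
```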
TrafficSplitEntry
| JSON representation |
|---|
{ "key": string, "value": integer } |
| Fields | |
|---|---|
key |
|
value |
|
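A trafficSplit maps DeployedModel IDs to integer percentages of traffic. The sketch below checks the common invariant (assumed here) that the percentages sum to 100; the model IDs are hypothetical placeholders.

```python
# Hedged sketch: validate a trafficSplit mapping of DeployedModel IDs
# (hypothetical IDs below) to integer traffic percentages.
traffic_split = {"deployed-model-a": 80, "deployed-model-b": 20}

def is_valid_split(split: dict) -> bool:
    # Each value must be a percentage, and the split must cover all traffic.
    return all(0 <= v <= 100 for v in split.values()) and sum(split.values()) == 100

print(is_valid_split(traffic_split))  # -> True
```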
LabelsEntry
| JSON representation |
|---|
{ "key": string, "value": string } |
| Fields | |
|---|---|
key |
|
value |
|
EncryptionSpec
| JSON representation |
|---|
{ "kmsKeyName": string } |
| Fields | |
|---|---|
kmsKeyName |
Required. Resource name of the Cloud KMS key used to protect the resource. The Cloud KMS key must be in the same region as the resource. It must have the format projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}. |
PrivateServiceConnectConfig
| JSON representation |
|---|
{
"enablePrivateServiceConnect": boolean,
"projectAllowlist": [
string
],
"pscAutomationConfigs": [
{
object ( |
| Fields | |
|---|---|
enablePrivateServiceConnect |
Required. If true, expose the IndexEndpoint via private service connect. |
projectAllowlist[] |
A list of Projects from which the forwarding rule will target the service attachment. |
pscAutomationConfigs[] |
Optional. List of projects and networks where the PSC endpoints will be created. This field is used by Online Inference (Prediction) only. |
enableSecurePrivateServiceConnect |
Optional. If set to true, enables secure Private Service Connect with IAM authorization. Otherwise, Private Service Connect is done without authorization. Note that latency increases slightly when authorization is enabled. |
serviceAttachment |
Output only. The name of the generated service attachment resource. This is only populated if the endpoint is deployed with PrivateServiceConnect. |
PSCAutomationConfig
| JSON representation |
|---|
{
"projectId": string,
"network": string,
"ipAddress": string,
"forwardingRule": string,
"state": enum ( |
| Fields | |
|---|---|
projectId |
Required. Project id used to create forwarding rule. |
network |
Required. The full name of the Google Compute Engine network. Format: |
ipAddress |
Output only. IP address rule created by the PSC service automation. |
forwardingRule |
Output only. Forwarding rule created by the PSC service automation. |
state |
Output only. The state of the PSC service automation. |
errorMessage |
Output only. Error message if the PSC service automation failed. |
PredictRequestResponseLoggingConfig
| JSON representation |
|---|
{
"enabled": boolean,
"samplingRate": number,
"errorSamplingRate": number,
"bigqueryDestination": {
object ( |
| Fields | |
|---|---|
enabled |
If logging is enabled or not. |
samplingRate |
Percentage of requests to be logged, expressed as a fraction in the range (0, 1]. |
errorSamplingRate |
Optional. Percentage of failed requests to be logged, expressed as a fraction in the range [0, 1]. Only non-transient errors will be logged (currently |
bigqueryDestination |
BigQuery table for logging. If only a project is given, a new dataset is created with the name |
requestResponseLoggingSchemaVersion |
Output only. The schema version used in creating the BigQuery table for the request response logging. The versions are "v1" and "v2". The current default version is "v1". |
enableOtelLogging |
This field is used for large models. If true, in addition to the original large model logs, logs are converted to OTel schema format and saved in the otel_log column. Defaults to false. |
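Putting the fields above together, here is a hedged example of a PredictRequestResponseLoggingConfig payload; the project, dataset, and sampling values are illustrative placeholders.

```python
import json

# Illustrative request/response logging config. The BigQuery URI and sampling
# rates are placeholders, not recommendations.
logging_config = {
    "enabled": True,
    "samplingRate": 0.1,       # log 10% of successful requests
    "errorSamplingRate": 1.0,  # log every eligible failed request
    "bigqueryDestination": {"outputUri": "bq://my-project.my_dataset.request_logs"},
}
print(json.dumps(logging_config, indent=2))
```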
BigQueryDestination
| JSON representation |
|---|
{ "outputUri": string } |
| Fields | |
|---|---|
outputUri |
Required. BigQuery URI to a project or table, up to 2000 characters long. When only the project is specified, the Dataset and Table are created. When the full table reference is specified, the Dataset must exist and the table must not exist. Accepted forms: bq://projectId, bq://projectId.bqDatasetId, or bq://projectId.bqDatasetId.bqTableId.
|
ClientConnectionConfig
| JSON representation |
|---|
{ "inferenceTimeout": string } |
| Fields | |
|---|---|
inferenceTimeout |
Customizable online prediction request timeout. A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s". |
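A small sketch of producing that Duration string from a numeric timeout, assuming the format described above (seconds with up to nine fractional digits, trailing "s"):

```python
# Hedged sketch: format a timeout in seconds as the Duration string
# expected by inferenceTimeout.
def to_duration(seconds: float) -> str:
    # Render with nine fractional digits, then trim trailing zeros and
    # a dangling decimal point.
    s = f"{seconds:.9f}".rstrip("0").rstrip(".")
    return s + "s"

print(to_duration(3.5))  # -> "3.5s"
print(to_duration(60))   # -> "60s"
```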
GenAiAdvancedFeaturesConfig
| JSON representation |
|---|
{
"ragConfig": {
object ( |
| Fields | |
|---|---|
ragConfig |
Configuration for Retrieval Augmented Generation feature. |
RagConfig
| JSON representation |
|---|
{ "enableRag": boolean } |
| Fields | |
|---|---|
enableRag |
If true, enables Retrieval Augmented Generation in ChatCompletion requests. Once enabled, the endpoint is identified as a GenAI endpoint and the Arthedain router is used. |
Tool Annotations
Destructive Hint: ❌ | Idempotent Hint: ❌ | Read Only Hint: ✅ | Open World Hint: ❌