This document describes how to use AI AutoMetrics to monitor your AI workloads on Vertex AI.
AI AutoMetrics lets you monitor the performance and health of your models with minimal configuration. This feature is designed to give you immediate insight into your custom containers and models running on Vertex AI Inference.
Before you begin
- Make sure you have a Vertex AI endpoint with a deployed model whose container uses a supported framework.
- Make sure Cloud Monitoring is enabled in your project. For more information, see Enable the Monitoring API.
Use AI AutoMetrics
To view AI AutoMetrics in Metrics Explorer, do the following:
1. Go to the Metrics Explorer page in the Google Cloud console.
2. Under Select a metric, select Prometheus Target.
3. Under Active metric categories, select Vertex.
4. Under Active metrics, select the desired metric.
5. Click Apply.
You can also query these metrics by using Grafana or the Prometheus API and UI.
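For example, the following Python sketch queries a metric through the Managed Service for Prometheus PromQL HTTP API. The project ID and metric name are placeholders; the metric name is only an assumed example that follows the vllm: prefix convention described later in this document, so substitute a metric that your container actually exposes.

```python
# Minimal sketch: query a vLLM metric through the Managed Service for
# Prometheus PromQL HTTP API. PROJECT_ID and QUERY are placeholders;
# substitute values from your own project and deployment.
import google.auth
import google.auth.transport.requests
import requests

PROJECT_ID = "your-project-id"         # placeholder project ID
QUERY = "vllm:num_requests_running"    # placeholder metric name

# Obtain an OAuth 2.0 access token with Application Default Credentials.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/monitoring.read"]
)
credentials.refresh(google.auth.transport.requests.Request())

# Managed Service for Prometheus exposes a Prometheus-compatible query API.
url = (
    f"https://monitoring.googleapis.com/v1/projects/{PROJECT_ID}"
    "/location/global/prometheus/api/v1/query"
)
response = requests.get(
    url,
    params={"query": QUERY},
    headers={"Authorization": f"Bearer {credentials.token}"},
    timeout=30,
)
response.raise_for_status()

# The response uses the standard Prometheus query API format.
for result in response.json()["data"]["result"]:
    print(result["metric"], result["value"])
```

The same PromQL query string works unchanged in Grafana or the Prometheus UI.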
Supported frameworks
AI AutoMetrics supports the following frameworks:
| Framework | Qualified endpoint | Qualified metrics |
|---|---|---|
| vLLM | Prometheus-compatible /metrics endpoint | Metrics with the vllm: prefix |
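To check whether a container qualifies before you deploy it, you can inspect its metrics endpoint directly. The following Python sketch assumes a vLLM server is running locally on port 8000 (an assumption about a typical test setup) and lists the vllm:-prefixed metric families it exposes; adjust the URL to match your container.

```python
# Minimal sketch: verify that a locally running vLLM container exposes a
# Prometheus-compatible /metrics endpoint with vllm:-prefixed metrics.
# The URL assumes a local test server on port 8000; adjust as needed.
import requests
from prometheus_client.parser import text_string_to_metric_families

METRICS_URL = "http://localhost:8000/metrics"  # assumed local test endpoint

response = requests.get(METRICS_URL, timeout=10)
response.raise_for_status()

# Parse the Prometheus text exposition format and keep vllm:-prefixed families.
qualified = [
    family.name
    for family in text_string_to_metric_families(response.text)
    if family.name.startswith("vllm:")
]

print(f"Found {len(qualified)} vllm: metric families:")
for name in qualified:
    print(f"  {name}")
```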
How it works
Vertex AI automatically scrapes the /metrics endpoint of your
container at a predefined interval. All qualified metrics are then exported to
Google Cloud Managed Service for Prometheus,
where you can analyze and visualize them.
Metric naming and labels
The metrics collected by AI AutoMetrics are ingested into Cloud Monitoring
under the vertex_* naming convention.
For easier filtering and grouping, AI AutoMetrics automatically attaches the following additional Vertex AI labels to each metric:
- deployed_model_id: the ID of the deployed model that serves inference requests.
- model_display_name: the display name of the deployed model.
- replica_id: the unique ID of the deployed model replica (pod name).
- endpoint_id: the ID of the model endpoint.
- endpoint_display_name: the display name of the model endpoint.
- product: the name of the feature under Vertex AI. This is always Online Inference.
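For example, the following sketch builds PromQL queries that use these labels to focus on a single deployed model and to compare its replicas. The metric name and ID values are placeholders chosen to follow the conventions above, not values from a real deployment.

```python
# Minimal sketch: PromQL queries that use the Vertex AI labels attached by
# AI AutoMetrics. The metric name and ID values are placeholders; substitute
# values from your own endpoint and deployed model.
DEPLOYED_MODEL_ID = "1234567890"   # placeholder deployed model ID
ENDPOINT_ID = "9876543210"         # placeholder endpoint ID

# Filter a vLLM metric to one deployed model on one endpoint.
single_model = (
    'vllm:num_requests_running{'
    'deployed_model_id="' + DEPLOYED_MODEL_ID + '", '
    'endpoint_id="' + ENDPOINT_ID + '"}'
)

# Compare load across replicas of the same deployed model.
per_replica = (
    'sum by (replica_id) (vllm:num_requests_running{'
    'deployed_model_id="' + DEPLOYED_MODEL_ID + '"})'
)

print(single_model)
print(per_replica)
```

You can run queries like these in Grafana, the Prometheus UI, or the PromQL HTTP API shown earlier in this document.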
What's next
- Learn more about the Metrics Explorer.