Expose custom metrics for autoscaling

This document describes how to send one or more metrics from a Pod or workload to your autoscaler.

These metrics come from the service or application that you are running. For one example of exposed metrics, see the metrics exposed by the vLLM Engine.

The workload autoscaler can then use this data to scale workloads more efficiently. For example, you can use this feature to monitor queue depth or active requests, and then allow the autoscaler to increase or decrease the number of Pods. From the vLLM example, a metric that could be useful to track utilization is vllm:gpu_cache_usage_perc.

Requirements

Requirements for the Pods are the following:

  • GKE 1.35.1-gke.1396000 or later with clusters in the Rapid channel.
  • Using horizontal Pod autoscaling with the performance profile.

Requirements for the metrics are the following:

  • Metrics must be accessible on an HTTP endpoint. By default, the endpoint path is /metrics.
  • Metrics must be formatted according to the Prometheus standard.
  • Only gauge metrics are supported.
  • A maximum of 20 unique metrics can be exposed per cluster.

Expose metrics for autoscaling

  1. Choose a metric to expose. You can choose any metric that your workload exposes, and that also meets the requirements listed in the previous section.

  2. Add the following custom resource, replacing details that are specific to your metric and Pod:

    apiVersion: autoscaling.gke.io/v1beta1
    kind: AutoscalingMetric
    metadata:
      name: NAME
      namespace: NAMESPACE
    spec:
      metrics:
      - pod:
          selector:
            matchLabels:
              APP_LABEL_NAME: APP_LABEL_VALUE
          containers:
          - endpoint:
              port: METRIC_PORT
              path: METRIC_PATH
            metrics:
            - gauge:
                name: METRIC
              prometheusMetricName: METRIC_PROMETHEUS_NAME
    

    Replace the following to match your workload:

    • NAME: the name of the AutoscalingMetric object.
    • NAMESPACE: the namespace that the Pods are in.
    • APP_LABEL_NAME and APP_LABEL_VALUE: the label name and value matching the Pods that emit the metric.
    • METRIC_PORT: the port number.
    • METRIC_PATH: the path to the metric. Verify the path used by your service or application; this path is often /metrics.
    • METRIC: the name of the metric that you are exposing. The name must match the regular expression ^[a-z]([-a-z0-9]*[a-z0-9])? and be no more than 63 characters long. This means that the name must start with a lowercase letter, the remaining characters must be lowercase letters, digits, or hyphens, and the name cannot end with a hyphen.
    • Optional: METRIC_PROMETHEUS_NAME: the Prometheus metric name as exposed by the Pod. You can use this field to rename the metric, for example because the metric name exposed by the Pod does not comply with the name restrictions set by the autoscaler.

      For details about name restrictions, see limitations for horizontal Pod autoscaling.
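The naming rule above can be sketched as a small check. The helper function here is illustrative, not part of any API; it also shows why a raw vLLM metric name such as vllm:gpu_cache_usage_perc needs the prometheusMetricName mapping, because colons and underscores are not allowed:

```python
import re

# The rule described above: the name must match
# ^[a-z]([-a-z0-9]*[a-z0-9])? and be at most 63 characters long.
NAME_RE = re.compile(r"^[a-z]([-a-z0-9]*[a-z0-9])?$")

def is_valid_metric_name(name: str) -> bool:
    """Illustrative check of the metric-name restrictions."""
    return len(name) <= 63 and NAME_RE.match(name) is not None

print(is_valid_metric_name("gpu-cache-usage"))            # compliant
print(is_valid_metric_name("vllm:gpu_cache_usage_perc"))  # not compliant
```

A non-compliant Prometheus name goes in the prometheusMetricName field, and the compliant name you choose goes in the name field under gauge.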

  3. Apply the manifest using the following command:

    kubectl apply -f FILE_NAME_AUTOSCALING_METRIC.yaml
    

    Replace FILE_NAME_AUTOSCALING_METRIC with the name of the YAML file.

    After you add the custom resource, the metric is pushed to the autoscaling API, where it is read every few seconds and sent to the workload autoscaler.

  4. Now that you have exposed the metrics to the autoscaler, you can configure the workload autoscaler to use these metrics. To do so, add the following custom resource:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: NAME_HPA
      namespace: NAMESPACE
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: DEPLOYMENT
      minReplicas: MIN_REPLICAS
      maxReplicas: MAX_REPLICAS
      metrics:
        - type: Pods
          pods:
            metric:
              name: autoscaling.gke.io|NAME|METRIC
            target:
              type: AverageValue
              averageValue: AVERAGE_VALUE
    

    Replace the following to match your workload:

    • NAME_HPA: the name of the HorizontalPodAutoscaler object.
    • NAMESPACE: the namespace that the Pods are in.
    • DEPLOYMENT: the name of the Deployment that you are targeting.
    • MIN_REPLICAS: the minimum number of replicas the Deployment can scale to.
    • MAX_REPLICAS: the maximum number of replicas the Deployment can scale to.
    • NAME: the name of the AutoscalingMetric object.
    • METRIC: the name of the metric that you are exposing.
    • AVERAGE_VALUE: the target average value for the metric. The autoscaler adjusts the number of replicas to maintain this average metric value across all Pods.
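    As a concrete illustration of how the pieces fit together, suppose the AutoscalingMetric object is named vllm-metrics and exposes a gauge named gpu-cache-usage (both names are hypothetical). The metrics stanza of the HorizontalPodAutoscaler would then be:

    ```yaml
    metrics:
      - type: Pods
        pods:
          metric:
            # Format: autoscaling.gke.io|<AutoscalingMetric name>|<metric name>
            name: autoscaling.gke.io|vllm-metrics|gpu-cache-usage
          target:
            type: AverageValue
            # Hypothetical target: keep average cache usage across Pods near 0.8
            averageValue: "0.8"
    ```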
  5. Apply the manifest using the following command:

    kubectl apply -f FILE_NAME_HPA.yaml
    

    Replace FILE_NAME_HPA with the name of the YAML file.

Troubleshoot metrics that are exposed for autoscaling

You can review the status of the AutoscalingMetric custom resource to look for configuration errors. To do so:

  1. Inspect the AutoscalingMetric custom resource by running the following command:

    kubectl describe autoscalingmetric NAME -n NAMESPACE
    
  2. Look at the Status field for information about configured metrics, such as warnings about configuration errors, and the exact name of the metric as it should appear in the HorizontalPodAutoscaler object.

What's next