Vertical Pod autoscaling automates setting CPU and memory resource requests and limits for containers within Kubernetes Pods. Vertical Pod autoscaling analyzes historical and current resource usage to provide recommendations, which it can either display or automatically apply by updating Pods. This feature improves stability and cost efficiency by right-sizing resource allocations.
Before you begin
Before you configure vertical Pod autoscaling, ensure you meet the following prerequisites:
- You have a running bare metal cluster.
- You have `kubectl` access to the cluster.
- Metrics Server is available in the cluster. Bare metal clusters include Metrics Server by default.
Enable vertical Pod autoscaling
Enable vertical Pod autoscaling on your bare metal cluster by setting a preview annotation and configuring the cluster specification:
1. Add or update the preview annotation on the Cluster custom resource.

   Edit the Cluster custom resource directly or modify the cluster configuration file and use `bmctl update`.

   metadata:
     annotations:
       preview.baremetal.cluster.gke.io/vertical-pod-autoscaler: enable

2. Modify the `spec` of the Cluster custom resource to include the `verticalPodAutoscaling` field and specify the `enableUpdater` and `enableMemorySaver` modes:

   apiVersion: baremetal.cluster.gke.io/v1
   kind: Cluster
   metadata:
     name: cluster1
     namespace: cluster-cluster1
     annotations:
       preview.baremetal.cluster.gke.io/vertical-pod-autoscaler: enable
   spec:
     # ... other cluster spec fields
     verticalPodAutoscaling:
       enableUpdater: true       # Set to true for automated updates
       enableMemorySaver: true   # Set to true to reduce recommender memory usage

3. If you modified the cluster configuration file, apply the changes using the following command:

   bmctl update cluster -c CLUSTER_NAME --kubeconfig KUBECONFIG

   Replace the following:

   * `CLUSTER_NAME`: the name of your cluster.
   * `KUBECONFIG`: the path of your cluster kubeconfig file.
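To confirm that the annotation and the `verticalPodAutoscaling` settings are in place, you can inspect the Cluster custom resource. The following command is a minimal sketch; it assumes the example names `cluster1` and `cluster-cluster1` from the preceding manifest and a kubeconfig that can read the Cluster resource (for example, the admin cluster kubeconfig):

kubectl get clusters.baremetal.cluster.gke.io cluster1 \
    -n cluster-cluster1 \
    --kubeconfig KUBECONFIG \
    -o jsonpath='{.spec.verticalPodAutoscaling}'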
Create a VerticalPodAutoscaler custom resource
After enabling vertical Pod autoscaling on your cluster, define a
VerticalPodAutoscaler custom resource to target specific workloads:
1. Define a `VerticalPodAutoscaler` resource in the same namespace as the target workload.

   This custom resource specifies which Pods it targets using `targetRef` and any resource policies.

   apiVersion: "autoscaling.k8s.io/v1"
   kind: VerticalPodAutoscaler
   metadata:
     name: hamster-vpa
   spec:
     targetRef:
       apiVersion: "apps/v1"
       kind: Deployment
       name: hamster
     resourcePolicy:
       containerPolicies:
       - containerName: '*'
         minAllowed:
           cpu: 100m
           memory: 50Mi
         maxAllowed:
           cpu: 1
           memory: 500Mi
         controlledResources: ["cpu", "memory"]

2. Apply the `VerticalPodAutoscaler` manifest using the following command:

   kubectl apply -f VPA_MANIFEST \
       --kubeconfig KUBECONFIG

   Replace the following:

   * `VPA_MANIFEST`: the path of the `VerticalPodAutoscaler` manifest file.
   * `KUBECONFIG`: the path of the cluster kubeconfig file.
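To verify that the resource was created and is being processed, you can list the `VerticalPodAutoscaler` objects in the workload namespace. This is a sketch based on the `hamster-vpa` example; replace `NAMESPACE` with the namespace of the target workload:

kubectl get verticalpodautoscalers -n NAMESPACE --kubeconfig KUBECONFIG

Depending on the installed CRD version, the output may include columns such as the update mode and the current CPU and memory targets.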
Understand vertical Pod autoscaling modes
Vertical Pod autoscaling operates in different modes that control how it applies resource recommendations.
Recommendation mode
In recommendation mode, vertical Pod autoscaling installs the recommender
component. This component analyzes resource usage and publishes recommended
values for CPU and memory requests and limits in the status section of the
VerticalPodAutoscaler custom resources you create.
To view the recommended resource requests and limits, use the following command:
kubectl describe vpa VPA_NAME \
--kubeconfig KUBECONFIG \
-n CLUSTER_NAMESPACE
Replace the following:
* `VPA_NAME`: the name of the `VerticalPodAutoscaler`
that's targeting the workloads for which you are considering resource
adjustments.
* `KUBECONFIG`: the path of the cluster kubeconfig
file.
* `CLUSTER_NAMESPACE`: the namespace that contains the `VerticalPodAutoscaler`
  resource (the same namespace as the target workload).
The output should contain a `Status` section that's similar to the following
sample:
Status:
Conditions:
Last Transition Time: 2025-08-04T23:53:32Z
Status: True
Type: RecommendationProvided
Recommendation:
Container Recommendations:
Container Name: hamster
Lower Bound:
Cpu: 100m
Memory: 262144k
Target:
Cpu: 587m
Memory: 262144k
Uncapped Target:
Cpu: 587m
Memory: 262144k
Upper Bound:
Cpu: 1
Memory: 500Mi
Pods aren't automatically updated in this mode. Use these recommendations to
manually update your Pod configurations. This is the default behavior if
`enableUpdater` isn't set or is set to `false`.
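If you want just the recommended target values instead of the full `describe` output, you can extract them with a JSONPath expression. A minimal sketch using the `hamster-vpa` example, where `NAMESPACE` is the namespace in which you created the `VerticalPodAutoscaler`:

kubectl get verticalpodautoscaler hamster-vpa -n NAMESPACE \
    --kubeconfig KUBECONFIG \
    -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'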
Automated update mode
When you set `enableUpdater` to `true`, bare metal lifecycle controllers deploy
the vertical Pod autoscaling updater and admission controller components in
addition to the recommender. The updater monitors for Pods whose current
resource requests deviate significantly from the recommendations.
The update policy in the `VerticalPodAutoscaler` resource specifies how the
updater applies the recommendations. By default, the update mode is `Auto`,
which dictates that the updater assigns updated resource settings on Pod
creation. The following `VerticalPodAutoscaler` sample shows how to set the
update mode to `Initial`:
apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
name: hamster-vpa
spec:
targetRef:
apiVersion: "apps/v1"
kind: Deployment
name: hamster
resourcePolicy:
updatePolicy:
updateMode: "Initial"
...
The updater supports the following five modes:
- `Auto`: The updater evicts the Pod. The admission controller intercepts the creation request for the new Pod and modifies it to use the recommended CPU and memory values provided by the recommender. Updating resources requires recreating the Pod, which can cause disruptions. Use Pod Disruption Budgets, which the updater honors, to manage the eviction process. This mode is equivalent to `Recreate`.
- `Recreate`: The updater evicts Pods and assigns recommended resource requests and limits when the Pod is recreated.
- `InPlaceOrRecreate` (alpha): The updater attempts best-effort in-place updates, but may fall back to recreating the Pod if in-place updates aren't possible. For more information, see the in-place pod resize documentation.
- `Initial`: The updater only assigns resource requests on Pod creation and never changes them later.
- `Off`: The updater doesn't automatically change the resource requirements of the Pods. The recommendations are calculated and can be inspected in the `VerticalPodAutoscaler` object.
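Because the updater honors Pod Disruption Budgets in the eviction-based modes, you can bound how many replicas it evicts at a time. The following manifest is a minimal sketch; the namespace and the `app: hamster` label are assumptions, so match them to your Deployment's Pod template:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: hamster-pdb
  namespace: default        # Assumed namespace; use the workload's namespace
spec:
  maxUnavailable: 1          # Let the VPA updater evict at most one Pod at a time
  selector:
    matchLabels:
      app: hamster           # Assumed label; match your Deployment's Pod template labels

Apply the budget in the same namespace as the Deployment before you enable the `Auto` or `Recreate` update mode.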
For more information about the `VerticalPodAutoscaler` custom resource, use
`kubectl` to retrieve the `verticalpodautoscalers.autoscaling.k8s.io`
custom resource definition that's installed on clusters at version 1.33.0 or
later.
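For example, the following commands are a sketch of how to retrieve the custom resource definition and browse the documented fields of the `VerticalPodAutoscaler` spec, assuming your kubectl version supports explaining custom resources:

kubectl get crd verticalpodautoscalers.autoscaling.k8s.io \
    --kubeconfig KUBECONFIG -o yaml

kubectl explain verticalpodautoscaler.spec --kubeconfig KUBECONFIG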
The following sample shows how resource recommendations might appear in the
Status section for the hamster container. The sample also shows an example
of a Pod eviction event, which occurs when the updater evicts a Pod prior to
automatically assigning the recommended resource configuration to the recreated
Pod:
Spec:
Resource Policy:
Container Policies:
Container Name: *
Controlled Resources:
cpu
memory
Max Allowed:
Cpu: 1
Memory: 500Mi
Min Allowed:
Cpu: 100m
Memory: 50Mi
Target Ref:
API Version: apps/v1
Kind: Deployment
Name: hamster
Update Policy:
Update Mode: Auto
Status:
Conditions:
Last Transition Time: 2025-08-04T23:53:32Z
Status: True
Type: RecommendationProvided
Recommendation:
Container Recommendations:
Container Name: hamster
Lower Bound:
Cpu: 100m
Memory: 262144k
Target:
Cpu: 587m
Memory: 262144k
Uncapped Target:
Cpu: 587m
Memory: 262144k
Upper Bound:
Cpu: 1
Memory: 500Mi
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EvictedPod 49s vpa-updater VPA Updater evicted Pod hamster-7cb59fb657-lkrk4 to apply resource recommendation.
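To review eviction activity across a namespace rather than on a single `VerticalPodAutoscaler` object, you can filter Kubernetes events by the `EvictedPod` reason shown in the preceding sample. A sketch, where `NAMESPACE` is the workload namespace:

kubectl get events -n NAMESPACE \
    --kubeconfig KUBECONFIG \
    --field-selector reason=EvictedPod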
Memory saver mode
Memory saver mode reduces the memory footprint of the vertical Pod autoscaling
recommender component. When you set `enableMemorySaver` to `true`, the
recommender only tracks and computes aggregations for Pods that have a matching
`VerticalPodAutoscaler` custom resource.
The trade-off is that when you create a new VerticalPodAutoscaler custom
resource for an existing workload, the recommender takes some time (up to 24
hours) to gather sufficient history to provide accurate recommendations. This
mode is false by default for most cluster types, but defaults to true for
edge clusters.
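To observe the effect of memory saver mode, you can compare the recommender Pod's memory usage before and after enabling it with `kubectl top`, which relies on the Metrics Server noted in Before you begin. This is a sketch; the `kube-system` namespace and the `recommender` name pattern are assumptions, so adjust them to where the vertical Pod autoscaling components run in your cluster:

kubectl top pod -n kube-system --kubeconfig KUBECONFIG | grep -i recommender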
Use Prometheus as a persistent history provider
By default, the recommender component maintains the resource consumption
history of the workloads running on the cluster in memory and periodically
saves its state to a `VerticalPodAutoscalerCheckpoint` custom resource in etcd
to provide resilience against restarts.
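You can list these checkpoint objects to see the saved state. A sketch; if you later configure Prometheus as the history provider, these objects are typically no longer needed:

kubectl get verticalpodautoscalercheckpoints.autoscaling.k8s.io \
    --all-namespaces --kubeconfig KUBECONFIG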
Starting with Google Distributed Cloud version 1.34, you can use your own instance of Prometheus as a persistent history provider for resource consumption data, namely CPU and memory usage metrics. When this integration is enabled, the recommender can query the Prometheus server upon startup or restart to retrieve long-term historical resource usage data for all managed Pods. Retrieving this data allows the recommender to immediately build its internal state with a rich dataset, leading to more informed and accurate recommendations from the start.
Using Prometheus as a persistent history provider offers the following advantages:
- Optimizing resource utilization: The recommender generates well-informed and accurate recommendations as soon as it starts, so you can optimize cluster resource utilization.
- Preventing out-of-memory (OOM) errors: Prometheus eliminates the need to store the recommender's internal state in `VerticalPodAutoscalerCheckpoint` custom resources (CRs), which makes etcd memory utilization more efficient. When the recommender component restarts, it loses its in-memory historical data; with Prometheus as a history provider, the recommender fetches historical metrics from Prometheus on restart instead of relying on the `VerticalPodAutoscalerCheckpoint` CR.
You can enable and disable the use of Prometheus as a persistent history provider at any time.
Prerequisites for using Prometheus with vertical Pod autoscaling
To use your own Prometheus instance as a history provider for vertical Pod autoscaling, you must configure it to scrape the necessary metrics, which involves the following steps:
1. If needed, deploy the Prometheus Operator in the cluster where you want to use Prometheus as a persistent history provider for vertical Pod autoscaling. For more information, see How to deploy and configure the Prometheus Operator in Kubernetes.

2. Configure permissions for scraping metrics.

   To allow Prometheus to scrape metrics from cAdvisor using a configuration file, you must grant additional permissions to the service account that the Prometheus server uses. Create or update a `ClusterRole` containing these rules and ensure it's bound to the correct Prometheus service account using a `ClusterRoleBinding`:

   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRole
   metadata:
     name: prometheus-role
     labels:
       app: prometheus-server
   rules:
   - apiGroups: [""]
     resources:
     - nodes
     verbs:
     - get
     - list
     - watch
   - apiGroups: [""]
     resources:
     - nodes/proxy
     - nodes/metrics
     verbs:
     - get
   ---
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRoleBinding
   metadata:
     name: prometheus-binding
     labels:
       app: prometheus-server
   subjects:
   - kind: ServiceAccount
     name: prometheus-server  # Service account being used by Prometheus
     namespace: prometheus    # Service account's namespace
   roleRef:
     apiGroup: rbac.authorization.k8s.io
     kind: ClusterRole
     name: prometheus-role    # Name of the ClusterRole created above

3. Update the Prometheus configuration file to scrape the following metrics from cAdvisor:

   - `container_cpu_usage_seconds_total`
   - `container_memory_working_set_bytes`

   The following lines define the scrape details for the cAdvisor metrics:

   - job_name: 'kubernetes-cadvisor'
     scheme: https
     tls_config:
       ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
     bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
     kubernetes_sd_configs:
     - role: node
     relabel_configs:
     - action: labelmap
       regex: __meta_kubernetes_node_label_(.+)
     - target_label: __address__
       replacement: kubernetes.default.svc:443
     - source_labels: [__meta_kubernetes_node_name]
       regex: (.+)
       target_label: __metrics_path__
       replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
     metric_relabel_configs:
     # Keep only the metrics VPA uses to save disk space
     - source_labels: [__name__]
       regex: (container_cpu_usage_seconds_total|container_memory_working_set_bytes)
       action: keep

4. Update the Prometheus configuration file to scrape the following metric from the `kube-state-metrics` Service:

   - `kube_pod_labels`

5. Deploy a `kube-state-metrics` Service on your cluster.

   You can use the following Helm commands to install the new Service:

   helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
   helm repo update

   Create a `ksm-values.yaml` file with the following content:

   fullnameOverride: vpa-kube-state-metrics
   metricAllowlist:
     - kube_pod_labels
   metricLabelsAllowlist:
     - "pods=[*]"

   Install a Helm chart based on the values file from the preceding step:

   helm install vpa-ksm prometheus-community/kube-state-metrics \
       -f ksm-values.yaml --namespace kube-system

6. Add the following lines to the Prometheus configuration file to scrape the `kube_pod_labels` metric from the installed `kube-state-metrics` Service:

   - job_name: 'kube-state-metrics'
     static_configs:
     - targets: ['vpa-kube-state-metrics.kube-system.svc.cluster.local:8080']
     metric_relabel_configs:
     - source_labels: [ __name__ ]
       regex: 'kube_pod_labels'
       action: keep
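After Prometheus reloads its configuration, you can confirm that the required series are being collected by querying the Prometheus HTTP API. The following sketch assumes the in-cluster Service URL used in the examples later on this page and must run from a Pod or host that can reach that Service; adjust the URL to match your deployment:

# Check that the cAdvisor and kube-state-metrics series exist
curl -s 'http://prometheus.prometheus.svc.cluster.local:9090/api/v1/query?query=container_cpu_usage_seconds_total' | head -c 300
curl -s 'http://prometheus.prometheus.svc.cluster.local:9090/api/v1/query?query=kube_pod_labels' | head -c 300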
Enable and use Prometheus
Vertical Pod autoscaling supports both basic authentication and bearer
token-based authentication for connecting to Prometheus. When using
authentication, you need to create a Secret containing the necessary
credentials in the cluster namespace. The controller forwards this Secret to the
target cluster and mounts it as a volume or environment variable in the
recommender Pod. You can also use Prometheus without authentication.
To enable and use your own Prometheus instance with vertical Pod autoscaling,
you need to configure the verticalPodAutoscaling section in your cluster
specification with details for connecting to your Prometheus instance.
Here's an example of the configuration in the cluster spec for use with a bearer token:
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
name: cluster1
namespace: cluster-cluster1
annotations:
preview.baremetal.cluster.gke.io/vertical-pod-autoscaler: enable
spec:
# ... other existing cluster configurations ...
verticalPodAutoscaling:
# ... other vertical Pod autoscaling configurations ...
    # Add this section to configure vertical Pod autoscaling to use Prometheus, with bearer token authentication, as the history provider
prometheus:
url: "http://prometheus.prometheus.monitoring.svc.cluster.local:9090"
auth:
bearerTokenAuth:
name: prom-bearer-creds
key: bearertoken
To enable Prometheus for use with vertical Pod autoscaling:
1. Ensure that your Prometheus instance is set up to scrape the required metrics, as outlined in Prerequisites for using Prometheus with vertical Pod autoscaling.

2. Update the Cluster custom resource `spec` so that the `verticalPodAutoscaling.prometheus` field specifies the connection settings for your Prometheus server.

   Add the `url` to the `prometheus` section and set it to the fully qualified domain name (FQDN) for connecting to Prometheus from within the cluster:

   spec:
     # ... other existing cluster configurations ...
     verticalPodAutoscaling:
       # ... other vpa configurations ...
       # Add this new section to configure the vpa to use prometheus as history provider
       prometheus:
         # Required: The URL of the Prometheus server
         url: "http://prometheus.prometheus.svc.cluster.local:9090"

3. Specify the connection details.

   Vertical Pod autoscaling supports the following three connection methods:

   - No authentication
   - Basic (username, password) authentication
   - Bearer token authentication
No authentication
If your Prometheus instance doesn't require authentication, you're done. The
`prometheus` section must include only a `url` field.
Basic authentication
Use the following steps to specify basic authentication for Prometheus:
1. Create a Secret that contains a username and password in the `stringData` section and the `baremetal.cluster.gke.io/mark-source: "true"` annotation.

   The following example shows a Secret that supports basic authentication:

   apiVersion: v1
   kind: Secret
   metadata:
     name: prom-basic-creds
     namespace: <cluster-namespace>
     annotations:
       baremetal.cluster.gke.io/mark-source: "true"
   type: Opaque
   stringData:
     username: admin
     password: pwd

   The annotation is required to make sure that the source Secret and the Secret in the target cluster are always in sync. The Secret in the target cluster updates when the source Secret is updated.

2. Update the `prometheus.auth.basicAuth` section of the cluster spec to reference the username and password from the `data` field in the Secret.

   The following example shows a `basicAuth` section that references the username and password in the Secret from the preceding step:

   # ... other vpa configurations ...
   prometheus:
     url: "http://prometheus.prometheus.svc.cluster.local:9090"
     auth:
       basicAuth:
         usernameRef:
           name: prom-basic-creds
           key: username
         passwordRef:
           name: prom-basic-creds
           key: password

   The username and password must be in the same Secret. The keys must be valid keys from the `data` field of the Secret.
The Prometheus instance should start working as a history provider for the vertical Pod autoscaler when the Cluster custom resource is updated.
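As an alternative to applying a Secret manifest, you can create and annotate the source Secret imperatively. This sketch assumes the same names and credentials as the preceding example; `<cluster-namespace>` is the cluster namespace placeholder used in that manifest:

kubectl create secret generic prom-basic-creds \
    --from-literal=username=admin \
    --from-literal=password=pwd \
    -n <cluster-namespace> --kubeconfig KUBECONFIG

kubectl annotate secret prom-basic-creds \
    baremetal.cluster.gke.io/mark-source="true" \
    -n <cluster-namespace> --kubeconfig KUBECONFIG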
Bearer token authentication
Use the following steps to specify bearer token authentication for Prometheus:
1. Create a Secret that contains a bearer token in the `stringData` section and the `baremetal.cluster.gke.io/mark-source: "true"` annotation.

   The following example shows a Secret that supports bearer token authentication:

   apiVersion: v1
   kind: Secret
   metadata:
     name: prom-bearer-creds
     namespace: <cluster-namespace>
     annotations:
       baremetal.cluster.gke.io/mark-source: "true"
   type: Opaque
   stringData:
     bearertoken: "SAMPLE_TOKEN"

   The annotation is required to make sure that the source Secret and the Secret in the target cluster are always in sync. The Secret in the target cluster updates when the source Secret is updated.

2. Update the `prometheus.auth.bearerTokenAuth` section of the cluster spec to reference the bearer token from the `data` field in the Secret.

   The following example shows a `bearerTokenAuth` section that references the bearer token in the Secret from the preceding step:

   # ... other vertical Pod autoscaling configurations ...
   prometheus:
     url: "http://prometheus.prometheus.svc.cluster.local:9090"
     auth:
       bearerTokenAuth:
         name: prom-bearer-creds
         key: bearertoken

   The key must be a valid key from the `data` field of the Secret.
The Prometheus instance should start working as a history provider for vertical Pod autoscaling when the Cluster custom resource is updated.
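If your Prometheus endpoint validates Kubernetes service account tokens (for example, because it sits behind an authenticating proxy), one way to obtain a token value for the `bearertoken` key is `kubectl create token`. This is a sketch only; the `prometheus-server` service account and `prometheus` namespace are assumptions carried over from the RBAC example earlier on this page, and your Prometheus setup might use a different token source:

kubectl create token prometheus-server -n prometheus --kubeconfig KUBECONFIG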
Disable the use of Prometheus
To disable the use of Prometheus with vertical Pod autoscaling, remove the
prometheus section from the verticalPodAutoscaling section of the Cluster
custom resource.
Disable vertical Pod autoscaling
Disable vertical Pod autoscaling by removing its custom resources and configuration from your cluster:

1. Delete any `VerticalPodAutoscaler` custom resources you have created, as shown in the cleanup sketch after these steps.

2. Modify the Cluster custom resource and remove the entire `verticalPodAutoscaling` section from the `spec`.

   You can edit the Cluster custom resource directly or modify the cluster configuration file and use `bmctl update`.

3. Remove the `preview.baremetal.cluster.gke.io/vertical-pod-autoscaler` annotation from the Cluster custom resource.
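For the first step, the following sketch lists and then removes all `VerticalPodAutoscaler` objects across namespaces; review the list before deleting if other teams share the cluster:

kubectl get verticalpodautoscalers --all-namespaces --kubeconfig KUBECONFIG
kubectl delete verticalpodautoscalers --all --all-namespaces --kubeconfig KUBECONFIG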
Limitations
Consider the following limitations when using vertical Pod autoscaling:
- Vertical Pod autoscaling isn't ready for use with JVM-based workloads due to limited visibility into actual memory usage of the workload.
- The updater requires a minimum of two Pod replicas for Deployments to replace Pods with revised resource values.
- The updater doesn't quickly update Pods that are crash-looping due to Out-Of-Memory (OOM) errors.
- The `InPlaceOrRecreate` update policy for Pods is an alpha feature within vertical Pod autoscaling. It attempts best-effort in-place updates, but may fall back to recreating the Pod if in-place updates aren't possible.
What's next
- Explore Pod Disruption Budgets.