This page explains how to dynamically scale GKE Agent Sandbox environments using the Horizontal Pod Autoscaler (HPA) and standby capacity buffers on a GKE Standard cluster.
By default, Agent Sandbox warm pools keep a static number of pre-provisioned replicas ready to minimize Pod startup latency. With variable traffic, however, keeping a high static replica count to absorb peaks can incur high compute costs.
You can balance capacity readiness and cost savings by using dynamic scaling. This
approach adjusts the size of the SandboxWarmPool based on demand and uses
standby capacity buffers (suspended VMs) to proactively provision infrastructure
for fast scaling without the full cost of over-provisioning active nodes.
Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
- A GKE Standard cluster running version 1.36.0-gke.2208000 or later. To create a cluster with the required settings, follow the steps in the next section.
Create a cluster
To create a GKE Standard cluster with the required configurations for standby capacity buffers and Agent Sandbox, run the following command:
gcloud container clusters create CLUSTER_NAME \
--location=CONTROL_PLANE_LOCATION \
--cluster-version=VERSION \
--enable-autoscaling \
--enable-autoprovisioning \
--max-cpu=MAX_CPU \
--max-memory=MAX_MEMORY \
--enable-agent-sandbox \
--enable-image-streaming \
--workload-pool=PROJECT_ID.svc.id.goog \
--monitoring=SYSTEM
Replace the following:
- CLUSTER_NAME: the name of your new cluster.
- VERSION: the GKE version, which must be 1.36.0-gke.2208000 or later.
- CONTROL_PLANE_LOCATION: the Compute Engine location for your new cluster. Choose a region for regional clusters (for example, us-central1), or a zone for zonal clusters (for example, us-central1-a).
- MAX_CPU: the maximum total number of CPU cores in the cluster, used as the node auto-provisioning limit, for example 4000.
- MAX_MEMORY: the maximum total memory in the cluster, in GB, used as the node auto-provisioning limit, for example 12000.
- PROJECT_ID: your Google Cloud project ID.
Configure Agent Sandbox components
You must define a SandboxTemplate and a SandboxWarmPool to manage your
sandboxed workloads.
Save the following manifest as sandboxtemplate.yaml:

apiVersion: extensions.agents.x-k8s.io/v1alpha1
kind: SandboxTemplate
metadata:
  name: agent-template
  namespace: NAMESPACE
spec:
  podTemplate:
    metadata:
      labels:
        app: agent-sandbox-workload
    spec:
      restartPolicy: Never
      containers:
      - name: python-agent
        image: python:3.11-slim
        command: ["/bin/sh", "-c"]
        args: ["echo 'Hello from the Sandbox!' && sleep 3600"]
        resources:
          requests:
            cpu: "1000m"
            memory: "100Mi"

Replace NAMESPACE with your namespace, for example agent-sandbox-demo. If the namespace doesn't exist yet, create it by running kubectl create namespace NAMESPACE.

Apply the manifest:
kubectl apply -f sandboxtemplate.yaml

Save the following manifest as sandboxwarmpool.yaml. This establishes an initial static pool of replicas.

apiVersion: extensions.agents.x-k8s.io/v1alpha1
kind: SandboxWarmPool
metadata:
  name: agent-warmpool
  namespace: NAMESPACE
spec:
  replicas: 10
  sandboxTemplateRef:
    name: agent-template

Apply the manifest:
kubectl apply -f sandboxwarmpool.yaml
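A static pool reserves its capacity continuously, whatever the traffic. The following Python sketch estimates the aggregate resource requests of the warm pool from the SandboxTemplate values (1000m CPU and 100Mi memory per Pod) and the 10 replicas. The quantity parser is a simplified, hypothetical helper that understands only plain numbers and the m, Ki, Mi, and Gi suffixes, not the full Kubernetes quantity grammar.

```python
from decimal import Decimal

# Hypothetical, simplified parser for a subset of Kubernetes resource quantities.
# Supports only plain numbers and the m, Ki, Mi, and Gi suffixes.
_SUFFIXES = {
    "m": Decimal("0.001"),
    "Ki": Decimal(1024),
    "Mi": Decimal(1024) ** 2,
    "Gi": Decimal(1024) ** 3,
}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity such as "1000m" or "100Mi" to a number in base units."""
    # Check the two-character suffixes before the one-character "m" suffix.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def warm_pool_requests(replicas: int, cpu: str, memory: str) -> tuple[Decimal, Decimal]:
    """Return (total CPU cores, total memory in MiB) requested by the warm pool."""
    total_cpu = parse_quantity(cpu) * replicas
    total_mem_mib = parse_quantity(memory) * replicas / _SUFFIXES["Mi"]
    return total_cpu, total_mem_mib


cpu_cores, mem_mib = warm_pool_requests(10, "1000m", "100Mi")
print(f"CPU: {cpu_cores} cores, memory: {mem_mib} MiB")
```

For the pool in this guide, that is 10 CPU cores and 1000 MiB of memory requested at all times, which is the cost that dynamic scaling aims to reduce during quiet periods.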
Configure metrics collection
The Agent Sandbox controller exposes a counter metric for the number of sandboxes
claimed: agent_sandbox_claim_creation_total. You can configure a PodMonitoring
resource to collect this metric and send it to Google Cloud Managed Service for Prometheus.
Save the following manifest as podmonitoring.yaml:

apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: agent-sandbox-controller-monitoring
  namespace: agent-sandbox-system # Namespace where the controller is running
spec:
  selector:
    matchLabels:
      app: agent-sandbox-controller
  endpoints:
  - port: 8080 # Port where metrics are exposed
    path: /metrics
    interval: 15s

Apply the manifest:
kubectl apply -f podmonitoring.yaml
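Because agent_sandbox_claim_creation_total is a counter, its raw value only grows; what matters for scaling is how fast it grows. The following Python sketch computes a per-second rate from two counter samples and treats a decrease as a counter reset (for example, after a controller restart), similar in spirit to the Prometheus rate() function. It assumes, without confirmation from this guide, that the HPA ultimately compares a per-second rate of this counter against its target; the sample values are hypothetical.

```python
def counter_rate(prev_value: float, prev_ts: float, cur_value: float, cur_ts: float) -> float:
    """Per-second increase of a monotonic counter between two samples.

    If the counter went down, assume it was reset to zero between the samples
    and count the whole current value as new increments.
    """
    elapsed = cur_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    increase = cur_value - prev_value
    if increase < 0:  # Counter reset, for example because the controller restarted.
        increase = cur_value
    return increase / elapsed


# Two hypothetical scrapes 15 seconds apart (the PodMonitoring interval).
print(counter_rate(120, 0.0, 126, 15.0))  # 6 claims in 15 s -> 0.4 per second
```

If the per-second assumption holds, the HPA target value of 0.2 used later in this guide corresponds to about 12 sandbox claims per minute.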
Enable custom metrics adapter
To allow the HPA to read metrics from Google Cloud Managed Service for Prometheus, you must deploy the
custom-metrics-stackdriver-adapter.
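Deploy the adapter and grant it the permissions it needs. The following commands give your user account the cluster-admin role so that you can create the adapter's RBAC resources, deploy the adapter, and grant the adapter's Kubernetes service account the Monitoring Viewer role (roles/monitoring.viewer):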
kubectl create clusterrolebinding cluster-admin-binding \
--clusterrole=cluster-admin --user="$(gcloud config get-value account)"
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml
gcloud projects add-iam-policy-binding PROJECT_ID \
--role=roles/monitoring.viewer \
--member=principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter
Replace PROJECT_NUMBER with your Google Cloud project number.
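The --member value in the last command is a Workload Identity Federation for GKE principal that identifies the adapter's Kubernetes service account (custom-metrics-stackdriver-adapter in the custom-metrics namespace). The following Python sketch assembles that identifier from its parts to show which values are project-specific. The function name is hypothetical, and the project number and ID in the example are placeholders.

```python
def workload_identity_principal(project_number: str, project_id: str,
                                namespace: str, service_account: str) -> str:
    """Build the IAM principal for a Kubernetes service account, following the
    format used in the gcloud projects add-iam-policy-binding command."""
    pool = f"{project_id}.svc.id.goog"  # Workload pool set by --workload-pool.
    return (
        "principal://iam.googleapis.com/"
        f"projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool}/"
        f"subject/ns/{namespace}/sa/{service_account}"
    )


member = workload_identity_principal(
    "123456789012", "example-project", "custom-metrics", "custom-metrics-stackdriver-adapter"
)
print(member)
```

Note that the identifier needs both the project number (in the projects/ segment) and the project ID (in the pool name); swapping them is a common cause of a binding that silently grants nothing.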
Configure RBAC permissions for SandboxWarmPool
The capacity buffer controller needs permission to read the scale subresource of
the SandboxWarmPool custom resource.
Save the following manifest as capacity-buffer-rbac.yaml:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: sandbox-warmpool-scale-reader
rules:
- apiGroups: ["extensions.agents.x-k8s.io"]
  resources: ["sandboxwarmpools/scale"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ca-sandbox-warmpool-scale-reader
subjects:
- kind: User
  name: "system:cluster-autoscaler"
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: sandbox-warmpool-scale-reader

Apply the manifest:
kubectl apply -f capacity-buffer-rbac.yaml
Configure capacity buffer
Configure a CapacityBuffer to maintain an infrastructure buffer proportional to the
size of the SandboxWarmPool. For more information, see Configure capacity buffers.
Save the following manifest as capacitybuffer.yaml. This example maintains a buffer equivalent to 200% of the SandboxWarmPool's replicas using standby capacity (suspended VMs).

apiVersion: autoscaling.x-k8s.io/v1beta1
kind: CapacityBuffer
metadata:
  name: agent-warmpool-buffer
  namespace: NAMESPACE
spec:
  percentage: 200
  scalableRef:
    apiGroup: extensions.agents.x-k8s.io
    kind: SandboxWarmPool
    name: agent-warmpool
  provisioningStrategy: "buffer.gke.io/standby-capacity"

Apply the manifest:
kubectl apply -f capacitybuffer.yaml
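To reason about how much standby capacity the buffer represents, the following Python sketch converts the percentage into a number of buffer Pods and the CPU they would cover, using the template's request of 1000m (1 core) per Pod. Rounding fractional results up is an assumption made for this sketch; this guide doesn't state how the capacity buffer controller rounds.

```python
def buffer_pods(warm_pool_replicas: int, percentage: int) -> int:
    """Number of buffer Pods for a percentage-based CapacityBuffer.

    Assumption: fractional results are rounded up. Integer ceiling division
    avoids floating-point rounding surprises.
    """
    return -(-warm_pool_replicas * percentage // 100)


def buffer_cpu_cores(pods: int, cpu_per_pod_cores: float) -> float:
    """CPU cores that the buffer keeps available as standby capacity."""
    return pods * cpu_per_pod_cores


# Buffer size as the HPA moves the warm pool between minReplicas and maxReplicas.
for replicas in (10, 20, 100):
    pods = buffer_pods(replicas, 200)
    print(f"warm pool {replicas:>3} -> buffer {pods:>3} pods, "
          f"{buffer_cpu_cores(pods, 1.0):.0f} CPU cores")
```

Because the buffer follows the warm pool's replica count, it grows and shrinks together with the HPA's decisions instead of staying fixed.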
Configure Horizontal Pod Autoscaler
Connect the SandboxWarmPool to the HPA to dynamically scale replicas based on
the custom metric.
Save the following manifest as hpa.yaml:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: agent-warmpool-hpa
  namespace: NAMESPACE
spec:
  scaleTargetRef:
    apiVersion: extensions.agents.x-k8s.io/v1alpha1
    kind: SandboxWarmPool
    name: agent-warmpool
  minReplicas: 10
  maxReplicas: 100
  metrics:
  - type: External
    external:
      metric:
        name: "prometheus.googleapis.com|agent_sandbox_claim_creation_total|counter"
        selector:
          matchLabels:
            metric.labels.warmpool_name: "agent-warmpool"
      target:
        type: Value
        value: 0.2

Apply the manifest:
kubectl apply -f hpa.yaml
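The HPA's core rule for a metric with a Value target is desiredReplicas = ceil(currentReplicas × currentValue / targetValue). The change is skipped when the ratio is within the default 10% tolerance, and the result is clamped to minReplicas and maxReplicas. The following Python sketch applies that rule to this guide's settings. It is a simplification: the real controller also applies stabilization windows, scaling policies, and Pod readiness handling, none of which are modeled here.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA calculation for an External metric with a Value target."""
    ratio = current_value / target_value
    if abs(1.0 - ratio) <= tolerance:
        # Close enough to the target: keep the current replica count.
        proposed = current_replicas
    else:
        proposed = math.ceil(current_replicas * ratio)
    # The result always stays within the configured bounds.
    return max(min_replicas, min(max_replicas, proposed))


# Settings from hpa.yaml: target value 0.2, minReplicas 10, maxReplicas 100.
for rate in (0.05, 0.21, 0.4, 5.0):
    print(rate, "->", desired_replicas(10, rate, 0.2, 10, 100))
```

For example, a metric value of 0.4 (twice the target) with 10 current replicas yields 20, the same kind of change as the SuccessfulRescale event in the next section, while a value of 0.21 is within tolerance and leaves the pool at 10.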
Monitor scaling events
You can monitor HPA and CapacityBuffer events to verify that dynamic scaling works as expected.
Monitor HPA events
To watch HPA events, run the following command:
kubectl get events -n NAMESPACE --watch \
--field-selector involvedObject.kind=HorizontalPodAutoscaler
The sample output when scaling occurs looks similar to the following:
SuccessfulRescale New size: 20; reason: external metric prometheus.googleapis.com|agent_sandbox_claim_creation_total|counter above target
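For scripted checks, you can request the events as JSON instead of watching them, for example with kubectl get events -n NAMESPACE --field-selector involvedObject.kind=HorizontalPodAutoscaler -o json. The following Python sketch extracts the new replica counts from SuccessfulRescale events in that output. It assumes the message format shown in the sample output (New size: N; reason: ...), and the event list in the example is a hand-written stand-in, not captured output.

```python
import json
import re

# Matches the "New size: N" prefix of SuccessfulRescale messages.
_NEW_SIZE = re.compile(r"New size: (\d+)")


def rescale_sizes(events_json: str) -> list[tuple[str, int]]:
    """Return (HPA name, new size) for each SuccessfulRescale event in a
    kubectl get events -o json document, in the order they appear."""
    sizes = []
    for event in json.loads(events_json).get("items", []):
        if event.get("reason") != "SuccessfulRescale":
            continue
        match = _NEW_SIZE.search(event.get("message", ""))
        if match:
            name = event.get("involvedObject", {}).get("name", "")
            sizes.append((name, int(match.group(1))))
    return sizes


# Hand-written stand-in for kubectl output, trimmed to the fields the sketch reads.
sample = json.dumps({
    "kind": "List",
    "items": [
        {
            "reason": "SuccessfulRescale",
            "message": "New size: 20; reason: external metric "
                       "prometheus.googleapis.com|agent_sandbox_claim_creation_total|counter above target",
            "involvedObject": {"kind": "HorizontalPodAutoscaler", "name": "agent-warmpool-hpa"},
        },
    ],
})
print(rescale_sizes(sample))
```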
Monitor CapacityBuffer events
To watch CapacityBuffer events, run the following command:
kubectl get events -n NAMESPACE --watch \
--field-selector involvedObject.kind=CapacityBuffer
When the buffer triggers a scale-up or resumes suspended VMs, the output looks similar to the following:
TriggeredScaleUp capacity buffer 20 fake pods triggered scale-up
What's next
- Learn more about Agent Sandbox.
- Learn more about Capacity buffers.