Scale Agent Sandboxes dynamically using HPA and Capacity Buffers


This page explains how to dynamically scale GKE Agent Sandbox environments using the Horizontal Pod Autoscaler (HPA) and standby capacity buffers on a GKE Standard cluster.

By default, Agent Sandbox Warm Pools keep a static number of pre-provisioned replicas ready to minimize Pod startup latency. However, when traffic is variable, maintaining a high number of static replicas to cover peaks can incur high compute costs.

You can balance capacity readiness and cost savings by using dynamic scaling. This approach adjusts the size of the SandboxWarmPool based on demand and uses standby capacity buffers (suspended VMs) to proactively provision infrastructure for fast scaling without the full cost of over-provisioning active nodes.

Before you begin

Before you start, make sure that you have performed the following tasks:

Enable the Google Kubernetes Engine API.


If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you primarily use zonal clusters, set the compute/zone property instead. By setting a default location, you can avoid errors in the gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location. You might need to specify the location in certain commands if the location of your cluster differs from the default that you set.

A GKE Standard cluster running version 1.36.0-gke.2208000 or later.

Note: Standby buffers are available in experimental GKE version 1.35.2-gke.1842002.
Enable the Agent Sandbox add-on on your cluster.

Create a cluster

To create a GKE Standard cluster with the required configurations for standby capacity buffers and Agent Sandbox, run the following command:

```
gcloud container clusters create CLUSTER_NAME \
    --location=CONTROL_PLANE_LOCATION \
    --cluster-version=VERSION \
    --enable-autoscaling \
    --enable-autoprovisioning \
    --max-cpu=MAX_CPU \
    --max-memory=MAX_MEMORY \
    --enable-agent-sandbox \
    --enable-image-streaming \
    --workload-pool=PROJECT_ID.svc.id.goog \
    --monitoring=SYSTEM
```

Replace the following:

CLUSTER_NAME: the name of your new cluster.
VERSION: the GKE version, which must be 1.36.0-gke.2208000 or later.
CONTROL_PLANE_LOCATION: the Compute Engine location for your new cluster. Choose a region for regional clusters (for example, us-central1), or a zone for zonal clusters (for example, us-central1-a).
MAX_CPU: the maximum number of CPU cores in the cluster for node auto-provisioning, for example 4000.
MAX_MEMORY: the maximum amount of memory in the cluster for node auto-provisioning, in gigabytes, for example 12000.
PROJECT_ID: your Google Cloud project ID.
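If you already have a cluster, you can check that its control plane meets the 1.36.0-gke.2208000 minimum before you continue. The following POSIX shell function is a minimal sketch rather than part of any GKE tooling: the name version_at_least is hypothetical, and it assumes version strings shaped like MAJOR.MINOR.PATCH-gke.BUILD, which it compares field by field.

```shell
# Hypothetical helper: prints "yes" if GKE version $1 is at least version $2, else "no".
# Assumes versions shaped like 1.36.0-gke.2208000 and compares fields left to right.
version_at_least() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, /[.-]/)
    nb = split(b, y, /[.-]/)
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if (x[i] == y[i]) continue
      # The first differing field decides, compared numerically.
      if (x[i] + 0 > y[i] + 0) { print "yes" } else { print "no" }
      exit
    }
    print "yes"
  }'
}
```

For example, running version_at_least "$(gcloud container clusters describe CLUSTER_NAME --location=CONTROL_PLANE_LOCATION --format="value(currentMasterVersion)")" 1.36.0-gke.2208000 prints yes when the control plane is recent enough.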

Configure Agent Sandbox components

You must define a SandboxTemplate and a SandboxWarmPool to manage your sandboxed workloads.

Save the following manifest as sandboxtemplate.yaml:

```
apiVersion: extensions.agents.x-k8s.io/v1alpha1
kind: SandboxTemplate
metadata:
  name: agent-template
  namespace: NAMESPACE
spec:
  podTemplate:
    metadata:
      labels:
        app: agent-sandbox-workload
    spec:
      restartPolicy: Never
      containers:
        - name: python-agent
          image: python:3.11-slim
          command: ["/bin/sh", "-c"]
          args: ["echo 'Hello from the Sandbox!' && sleep 3600"]
          resources:
            requests:
              cpu: "1000m"
              memory: "100Mi"
```

Replace NAMESPACE with your namespace, for example agent-sandbox-demo. If the namespace doesn't exist yet, create it first by running kubectl create namespace NAMESPACE.

Apply the manifest:
```
kubectl apply -f sandboxtemplate.yaml
```

Save the following manifest as sandboxwarmpool.yaml. This manifest creates an initial static pool of 10 replicas.

```
apiVersion: extensions.agents.x-k8s.io/v1alpha1
kind: SandboxWarmPool
metadata:
  name: agent-warmpool
  namespace: NAMESPACE
spec:
  replicas: 10
  sandboxTemplateRef:
    name: agent-template
```

Apply the manifest:
```
kubectl apply -f sandboxwarmpool.yaml
```
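Because each sandbox Pod in the template requests 1000m of CPU and 100Mi of memory, you can estimate how much node capacity the warm pool needs. The following sketch multiplies the per-Pod requests by a replica count; the function name pool_footprint is hypothetical, and the totals cover Pod requests only, not node system overhead.

```shell
# Hypothetical helper: total resource requests for a pool of identical Pods.
# $1 = replicas, $2 = CPU request per Pod in millicores, $3 = memory request per Pod in MiB
pool_footprint() {
  cpu_m=$(( $1 * $2 ))
  mem_mi=$(( $1 * $3 ))
  printf 'cpu=%sm memory=%sMi\n' "$cpu_m" "$mem_mi"
}
```

For example, pool_footprint 10 1000 100 prints cpu=10000m memory=1000Mi for the initial pool. At the HPA maxReplicas value of 100 used later on this page, the same requests total 100 vCPU, which you can compare against MAX_CPU.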

Configure metrics collection

The Agent Sandbox controller exposes a counter metric for the number of sandboxes claimed: agent_sandbox_claim_creation_total. You can configure a PodMonitoring resource to collect this metric and send it to Google Cloud Managed Service for Prometheus.

Save the following manifest as podmonitoring.yaml:

```
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: agent-sandbox-controller-monitoring
  namespace: agent-sandbox-system # Namespace where the controller is running
spec:
  selector:
    matchLabels:
      app: agent-sandbox-controller
  endpoints:
  - port: 8080 # Port where metrics are exposed
    path: /metrics
    interval: 15s
```

Apply the manifest:
```
kubectl apply -f podmonitoring.yaml
```
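Because agent_sandbox_claim_creation_total is a counter, its raw value only grows; what reflects current demand is how fast it increases. The following sketch computes the per-second rate between two samples of a counter and treats a drop in value as a counter reset (for example, after the controller restarts). The function name counter_rate is hypothetical, and this only illustrates counter semantics; it is not the exact computation that Managed Service for Prometheus or the metrics adapter performs.

```shell
# Hypothetical helper: per-second rate of a counter between two samples.
# $1 = previous value, $2 = current value, $3 = seconds between the samples
counter_rate() {
  awk -v prev="$1" -v curr="$2" -v secs="$3" 'BEGIN {
    delta = curr - prev
    # A smaller current value means the counter was reset and restarted from zero.
    if (delta < 0) delta = curr
    print delta / secs
  }'
}
```

For example, counter_rate 100 112 60 prints 0.2: twelve new claims in one minute.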

Enable custom metrics adapter

To allow the HPA to read metrics from Google Cloud Managed Service for Prometheus, you must deploy the custom-metrics-stackdriver-adapter.

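Grant your account cluster-admin permissions, deploy the adapter, and grant the adapter's Kubernetes service account the Monitoring Viewer role (roles/monitoring.viewer). Run the following commands: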

```
kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole=cluster-admin --user="$(gcloud config get-value account)"

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

gcloud projects add-iam-policy-binding PROJECT_ID \
  --role=roles/monitoring.viewer \
  --member=principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter
```

Replace PROJECT_ID with your Google Cloud project ID and PROJECT_NUMBER with your Google Cloud project number.
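The Workload Identity principal passed to --member is long and easy to mistype. The following sketch assembles it from a project ID and project number; the function name adapter_principal is hypothetical, and the namespace (custom-metrics) and service account (custom-metrics-stackdriver-adapter) are the ones used in the gcloud command above.

```shell
# Hypothetical helper: builds the Workload Identity principal for the metrics adapter.
# $1 = project ID, $2 = project number
adapter_principal() {
  printf 'principal://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter\n' "$2" "$1"
}
```

You can look up the project number by running gcloud projects describe PROJECT_ID --format="value(projectNumber)", then pass the output of adapter_principal PROJECT_ID PROJECT_NUMBER to the --member flag.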

Configure RBAC permissions for SandboxWarmPool

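The capacity buffer controller, which runs in the cluster autoscaler as the user system:cluster-autoscaler, needs permission to read the scale subresource of the SandboxWarmPool custom resource so that it can size the buffer from the pool's replica count.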

Save the following manifest as capacity-buffer-rbac.yaml:

```
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: sandbox-warmpool-scale-reader
rules:
- apiGroups: ["extensions.agents.x-k8s.io"]
  resources: ["sandboxwarmpools/scale"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ca-sandbox-warmpool-scale-reader
subjects:
# User subjects are not namespaced, so no namespace field is set.
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: "system:cluster-autoscaler"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: sandbox-warmpool-scale-reader
```

Apply the manifest:

```
kubectl apply -f capacity-buffer-rbac.yaml
```
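To confirm that the binding took effect, you can ask the API server whether the cluster autoscaler identity can read the scale subresource. The wrapper below is a hypothetical convenience; it runs kubectl auth can-i with impersonation (--as), which requires that your own account can impersonate users, as the cluster-admin binding created earlier allows.

```shell
# Hypothetical wrapper: checks whether system:cluster-autoscaler can get the
# scale subresource of SandboxWarmPools in namespace $1. kubectl prints "yes" or "no".
ca_can_read_warmpool_scale() {
  kubectl auth can-i get sandboxwarmpools.extensions.agents.x-k8s.io \
    --subresource=scale \
    --as=system:cluster-autoscaler \
    --namespace="$1"
}
```

For example, ca_can_read_warmpool_scale NAMESPACE should print yes after you apply capacity-buffer-rbac.yaml.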

Configure capacity buffer

Configure a CapacityBuffer to maintain an infrastructure buffer proportional to the size of the SandboxWarmPool. For more information, see Configure capacity buffers.

Save the following manifest as capacitybuffer.yaml. This example maintains a buffer equivalent to 200% of the SandboxWarmPool's replicas using standby capacity (suspended VMs).

```
apiVersion: autoscaling.x-k8s.io/v1beta1
kind: CapacityBuffer
metadata:
  name: agent-warmpool-buffer
  namespace: NAMESPACE
spec:
  percentage: 200
  scalableRef:
    apiGroup: extensions.agents.x-k8s.io
    kind: SandboxWarmPool
    name: agent-warmpool
  provisioningStrategy: "buffer.gke.io/standby-capacity"
```

Apply the manifest:
```
kubectl apply -f capacitybuffer.yaml
```
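The percentage field sizes the buffer relative to the SandboxWarmPool's current replica count. The following sketch shows that arithmetic; the function name buffer_pods is hypothetical, and rounding fractional results up is an assumption of this sketch, not documented CapacityBuffer behavior.

```shell
# Hypothetical helper: buffer size for a percentage-based CapacityBuffer.
# $1 = SandboxWarmPool replicas, $2 = percentage from the CapacityBuffer spec.
# Rounding fractional results up is an assumption of this sketch.
buffer_pods() {
  echo $(( ($1 * $2 + 99) / 100 ))
}
```

With the 10 replicas from sandboxwarmpool.yaml, buffer_pods 10 200 prints 20, which matches the 20 buffer Pods in the sample CapacityBuffer event on this page. Because the buffer tracks the warm pool, it grows to 40 if the pool scales to 20 replicas.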

Configure Horizontal Pod Autoscaler

Configure an HPA that targets the SandboxWarmPool so that its replicas scale dynamically based on the agent_sandbox_claim_creation_total external metric.

Save the following manifest as hpa.yaml:

```
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: agent-warmpool-hpa
  namespace: NAMESPACE
spec:
  scaleTargetRef:
    apiVersion: extensions.agents.x-k8s.io/v1alpha1
    kind: SandboxWarmPool
    name: agent-warmpool
  minReplicas: 10
  maxReplicas: 100
  metrics:
  - type: External
    external:
      metric:
        name: "prometheus.googleapis.com|agent_sandbox_claim_creation_total|counter"
        selector:
          matchLabels:
            metric.labels.warmpool_name: "agent-warmpool"
      target:
        type: Value
        value: 0.2
```

Apply the manifest:
```
kubectl apply -f hpa.yaml
```
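With a target of type Value, the HPA compares the metric's current value to the target value and scales the SandboxWarmPool proportionally. The following sketch applies the scaling rule from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentValue / targetValue), with the default 10% tolerance and clamping to minReplicas and maxReplicas. The function name hpa_desired_replicas is hypothetical, and the sketch ignores stabilization windows, scaling policies, and Pod readiness.

```shell
# Hypothetical helper approximating the HPA scaling rule for a single metric.
# $1 = current replicas, $2 = current metric value, $3 = target value,
# $4 = minReplicas, $5 = maxReplicas
hpa_desired_replicas() {
  awk -v cur="$1" -v val="$2" -v tgt="$3" -v lo="$4" -v hi="$5" 'BEGIN {
    ratio = val / tgt
    # Default HPA tolerance: no change while the ratio is within 10% of 1.0.
    if (ratio >= 0.9 && ratio <= 1.1) { print cur; exit }
    d = cur * ratio
    # Round up (ceil) for positive values.
    desired = (d == int(d)) ? d : int(d) + 1
    if (desired < lo) desired = lo
    if (desired > hi) desired = hi
    print desired
  }'
}
```

For example, hpa_desired_replicas 10 0.4 0.2 10 100 prints 20: a metric at twice the 0.2 target doubles the pool, which is consistent with New size: 20 in the sample HPA event.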

Monitor scaling events

You can monitor HPA and CapacityBuffer events to verify that dynamic scaling works.

Monitor HPA events

To watch HPA events, run the following command:

```
kubectl get events -n NAMESPACE --watch \
    --field-selector involvedObject.kind=HorizontalPodAutoscaler
```

The sample output when scaling occurs looks similar to the following:

```
SuccessfulRescale New size: 20; reason: external metric prometheus.googleapis.com|agent_sandbox_claim_creation_total|counter above target
```

Monitor CapacityBuffer events

To watch CapacityBuffer events, run the following command:

```
kubectl get events -n NAMESPACE --watch \
    --field-selector involvedObject.kind=CapacityBuffer
```

The sample output when the capacity buffer triggers a scale-up, such as resuming suspended VMs, looks similar to the following:

```
TriggeredScaleUp capacity buffer 20 fake pods triggered scale-up
```
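If you want to follow only the new sizes rather than the full event text, you can filter the watch output. The following sketch reads event lines on standard input and prints hpa N or buffer N for messages shaped like the sample output in this section. The function name scaling_sizes is hypothetical, and it depends on those exact message formats, which might change between versions.

```shell
# Hypothetical filter: extracts scaling sizes from kubectl event lines on stdin.
# HPA rescale messages become "hpa N"; CapacityBuffer scale-ups become "buffer N".
scaling_sizes() {
  sed -n \
    -e 's/.*SuccessfulRescale.*New size: \([0-9][0-9]*\);.*/hpa \1/p' \
    -e 's/.*TriggeredScaleUp.*capacity buffer \([0-9][0-9]*\) fake pods.*/buffer \1/p'
}
```

For example, kubectl get events -n NAMESPACE --watch | scaling_sizes prints one line per HPA rescale or buffer scale-up event in the namespace.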

What's next

Learn more about Agent Sandbox.
Learn more about Capacity buffers.