Scale Agent Sandboxes dynamically using HPA and Capacity Buffers

This page explains how to dynamically scale GKE Agent Sandbox environments using the Horizontal Pod Autoscaler (HPA) and standby capacity buffers on a GKE Standard cluster.

By default, Agent Sandbox Warm Pools keep a static number of pre-provisioned replicas ready to minimize Pod startup latency. However, when traffic is variable, maintaining a high number of static replicas can incur high compute costs.

You can balance capacity readiness and cost savings by using dynamic scaling. This approach adjusts the size of the SandboxWarmPool based on demand and uses standby capacity buffers (suspended VMs) to proactively provision infrastructure for fast scaling without the full cost of over-provisioning active nodes.

Before you begin

Before you start, make sure that you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.

Create a cluster

To create a GKE Standard cluster with the required configurations for standby capacity buffers and Agent Sandbox, run the following command:

gcloud container clusters create CLUSTER_NAME \
    --location=CONTROL_PLANE_LOCATION \
    --cluster-version=VERSION \
    --enable-autoscaling \
    --enable-autoprovisioning \
    --max-cpu=MAX_CPU \
    --max-memory=MAX_MEMORY \
    --enable-agent-sandbox \
    --enable-image-streaming \
    --workload-pool=PROJECT_ID.svc.id.goog \
    --monitoring=SYSTEM

Replace the following:

  • CLUSTER_NAME: the name of your new cluster.
  • VERSION: the GKE version, which must be 1.36.0-gke.2208000 or later.
  • CONTROL_PLANE_LOCATION: the Compute Engine location for your new cluster. Choose a region for regional clusters (for example, us-central1), or a zone for zonal clusters (for example, us-central1-a).
  • MAX_CPU: the maximum number of CPU cores that node auto-provisioning can provision in the cluster, for example 4000.
  • MAX_MEMORY: the maximum amount of memory, in GB, that node auto-provisioning can provision in the cluster, for example 12000.
  • PROJECT_ID: your Google Cloud project ID.
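
Because the cluster must run GKE version 1.36.0-gke.2208000 or later, it can help to check a candidate version string before you create the cluster. The following is a minimal sketch and not part of the official procedure: version_at_least is a hypothetical helper, the default VERSION value is only illustrative, and the comparison assumes that GNU sort with the -V (version sort) option is available.

```shell
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Assumes GNU coreutils `sort -V` (version sort) is installed.
version_at_least() {
  lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
  [ "$lowest" = "$2" ]
}

MIN_VERSION="1.36.0-gke.2208000"           # minimum stated on this page
VERSION="${VERSION:-1.36.0-gke.2208000}"   # illustrative default

if version_at_least "$VERSION" "$MIN_VERSION"; then
  echo "OK: $VERSION meets the minimum $MIN_VERSION"
else
  echo "Too old: $VERSION is earlier than $MIN_VERSION"
fi
```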

Configure Agent Sandbox components

You must define a SandboxTemplate and a SandboxWarmPool to manage your sandboxed workloads.

  1. Save the following manifest as sandboxtemplate.yaml:

    apiVersion: extensions.agents.x-k8s.io/v1alpha1
    kind: SandboxTemplate
    metadata:
      name: agent-template
      namespace: NAMESPACE
    spec:
      podTemplate:
        metadata:
          labels:
            app: agent-sandbox-workload
        spec:
          restartPolicy: Never
          containers:
            - name: python-agent
              image: python:3.11-slim
              command: ["/bin/sh", "-c"]
              args: ["echo 'Hello from the Sandbox!' && sleep 3600"]
              resources:
                requests:
                  cpu: "1000m"
                  memory: "100Mi"
    

    Replace NAMESPACE with your namespace, for example agent-sandbox-demo.

  2. Apply the manifest:

    kubectl apply -f sandboxtemplate.yaml
    
  3. Save the following manifest as sandboxwarmpool.yaml. This establishes an initial static pool of replicas.

    apiVersion: extensions.agents.x-k8s.io/v1alpha1
    kind: SandboxWarmPool
    metadata:
      name: agent-warmpool
      namespace: NAMESPACE
    spec:
      replicas: 10
      sandboxTemplateRef:
        name: agent-template
    
  4. Apply the manifest:

    kubectl apply -f sandboxwarmpool.yaml
    

Configure metrics collection

The Agent Sandbox controller exposes a counter metric for the number of sandboxes claimed: agent_sandbox_claim_creation_total. You can configure a PodMonitoring resource to collect this metric and send it to Google Cloud Managed Service for Prometheus.

  1. Save the following manifest as podmonitoring.yaml:

    apiVersion: monitoring.googleapis.com/v1
    kind: PodMonitoring
    metadata:
      name: agent-sandbox-controller-monitoring
      namespace: agent-sandbox-system # Namespace where the controller is running
    spec:
      selector:
        matchLabels:
          app: agent-sandbox-controller
      endpoints:
      - port: 8080 # Port where metrics are exposed
        path: /metrics
        interval: 15s
    
  2. Apply the manifest:

    kubectl apply -f podmonitoring.yaml
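
To sanity-check the values that the HPA later consumes, you can sum the counter from the controller's metrics output. The following sketch is only an illustration: sum_claims is a hypothetical helper, it assumes the standard Prometheus text exposition format, and it assumes that the metric carries a warmpool_name label (inferred from the HPA selector used later on this page). The sample lines are made up for the demonstration and are not real controller output.

```shell
# Hypothetical helper: sum agent_sandbox_claim_creation_total samples for
# one warm pool, reading Prometheus text exposition format on stdin.
# Assumption: the metric has a warmpool_name label.
sum_claims() {
  awk -v pool="$1" '
    /^#/ { next }   # skip HELP and TYPE comment lines
    index($0, "agent_sandbox_claim_creation_total") == 1 &&
    index($0, "warmpool_name=\"" pool "\"") > 0 { total += $NF }
    END { print total + 0 }
  '
}

# Made-up sample input, not real controller output.
printf '%s\n' \
  '# TYPE agent_sandbox_claim_creation_total counter' \
  'agent_sandbox_claim_creation_total{warmpool_name="agent-warmpool"} 42' \
  'agent_sandbox_claim_creation_total{warmpool_name="other-pool"} 7' |
  sum_claims agent-warmpool
```

Against a live cluster, you might instead pipe in the output of the controller's metrics endpoint (port 8080, path /metrics, as configured in the PodMonitoring manifest).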
    

Enable custom metrics adapter

To allow the HPA to read metrics from Google Cloud Managed Service for Prometheus, you must deploy the custom-metrics-stackdriver-adapter.

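Run the following commands to grant your user account the cluster-admin role, deploy the adapter, and grant the adapter's service account the Monitoring Viewer role (roles/monitoring.viewer):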

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole=cluster-admin --user="$(gcloud config get-value account)"

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

gcloud projects add-iam-policy-binding PROJECT_ID \
  --role=roles/monitoring.viewer \
  --member=principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter

Replace PROJECT_NUMBER with your Google Cloud project number.
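
If you don't know the project number, one way to look it up and assemble the --member value is sketched below. adapter_principal is a hypothetical helper that only formats the string used in the preceding command. The gcloud lookup runs only when gcloud is installed and a PROJECT_ID environment variable is set, which is an assumption of this sketch.

```shell
# Hypothetical helper: build the Workload Identity principal for the
# adapter's service account from a project ID and a project number.
adapter_principal() {
  project_id="$1"
  project_number="$2"
  printf 'principal://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter\n' \
    "$project_number" "$project_id"
}

# Look up the project number only when gcloud is installed and PROJECT_ID
# is set; otherwise this block only defines the helper.
if command -v gcloud >/dev/null 2>&1 && [ -n "${PROJECT_ID:-}" ]; then
  PROJECT_NUMBER="$(gcloud projects describe "$PROJECT_ID" \
    --format='value(projectNumber)')"
  adapter_principal "$PROJECT_ID" "$PROJECT_NUMBER"
fi
```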

Configure RBAC permissions for SandboxWarmPool

The capacity buffer controller needs permission to read the scale subresource of the SandboxWarmPool custom resource.

  1. Save the following manifest as capacity-buffer-rbac.yaml:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: sandbox-warmpool-scale-reader
    rules:
    - apiGroups: ["extensions.agents.x-k8s.io"]
      resources: ["sandboxwarmpools/scale"]
      verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: ca-sandbox-warmpool-scale-reader
    subjects:
    - kind: User
      name: "system:cluster-autoscaler"
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: sandbox-warmpool-scale-reader
    
  2. Apply the manifest:

    kubectl apply -f capacity-buffer-rbac.yaml
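
To confirm that the binding took effect, you can ask the API server whether the autoscaler identity can read the scale subresource. This is a sketch under stated assumptions: check_scale_access and the KUBECTL override are hypothetical conveniences, and impersonating another user with --as requires that your own account is allowed to impersonate (the cluster-admin role typically is).

```shell
# KUBECTL can be overridden (for example with `echo`) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: ask the API server whether the cluster autoscaler
# identity may read the scale subresource of SandboxWarmPool objects
# in the namespace given as $1.
check_scale_access() {
  namespace="$1"
  "$KUBECTL" auth can-i get sandboxwarmpools.extensions.agents.x-k8s.io \
    --subresource=scale \
    --as=system:cluster-autoscaler \
    --namespace="$namespace"
}
```

Calling check_scale_access with your namespace runs the check. Setting KUBECTL=echo first prints the command instead of running it.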
    

Configure capacity buffer

Configure a CapacityBuffer to maintain an infrastructure buffer proportional to the size of the SandboxWarmPool. For more information, see Configure capacity buffers.

  1. Save the following manifest as capacitybuffer.yaml. This example maintains a buffer equivalent to 200% of the SandboxWarmPool's replicas using standby capacity (suspended VMs).

    apiVersion: autoscaling.x-k8s.io/v1beta1
    kind: CapacityBuffer
    metadata:
      name: agent-warmpool-buffer
      namespace: NAMESPACE
    spec:
      percentage: 200
      scalableRef:
        apiGroup: extensions.agents.x-k8s.io
        kind: SandboxWarmPool
        name: agent-warmpool
      provisioningStrategy: "buffer.gke.io/standby-capacity"
    
  2. Apply the manifest:

    kubectl apply -f capacitybuffer.yaml
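
The arithmetic behind a percentage-based buffer can be sketched as follows. buffer_size is a hypothetical helper. Rounding fractional results up is an assumption of this sketch, not documented behavior, so check the capacity buffer documentation before you rely on it for sizes that don't divide evenly.

```shell
# Sketch: buffer size as a percentage of the warm pool's replica count.
# Assumption: fractional results round up (not confirmed by this page).
buffer_size() {
  replicas="$1"
  percentage="$2"
  echo $(( (replicas * percentage + 99) / 100 ))
}

# The values used in the manifests on this page: 10 replicas at 200%.
buffer_size 10 200
```

With 10 replicas and a percentage of 200, the sketch yields a buffer of 20 units. As the HPA changes the replica count, the buffer follows proportionally.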
    

Configure Horizontal Pod Autoscaler

Connect the SandboxWarmPool to the HPA to dynamically scale replicas based on the custom metric.

  1. Save the following manifest as hpa.yaml:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: agent-warmpool-hpa
      namespace: NAMESPACE
    spec:
      scaleTargetRef:
        apiVersion: extensions.agents.x-k8s.io/v1alpha1
        kind: SandboxWarmPool
        name: agent-warmpool
      minReplicas: 10
      maxReplicas: 100
      metrics:
      - type: External
        external:
          metric:
            name: "prometheus.googleapis.com|agent_sandbox_claim_creation_total|counter"
            selector:
              matchLabels:
                metric.labels.warmpool_name: "agent-warmpool"
          target:
            type: Value
            value: 0.2
    
  2. Apply the manifest:

    kubectl apply -f hpa.yaml
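
To reason about how this HPA reacts, the core calculation from the Kubernetes HPA algorithm can be sketched as desired = ceil(current × metric ÷ target), clamped to the minReplicas and maxReplicas bounds. desired_replicas is a hypothetical helper. It takes the metric and target in milli-units (0.2 becomes 200) so that integer shell arithmetic is sufficient, and it deliberately leaves out the HPA's tolerance band and stabilization behavior. The observed value of 0.4 is made up, and reading it as a per-second claim rate is an assumption.

```shell
# Sketch of the core HPA calculation:
#   desired = ceil(current * metric / target), clamped to [min, max].
# Metric and target are in milli-units (0.2 -> 200). Tolerance and
# stabilization windows are not modeled.
desired_replicas() {
  current="$1"; metric_milli="$2"; target_milli="$3"; min="$4"; max="$5"
  desired=$(( (current * metric_milli + target_milli - 1) / target_milli ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

# 10 replicas, hypothetical observed value 0.4, target 0.2, bounds 10..100.
desired_replicas 10 400 200 10 100
```

With these inputs the sketch doubles the pool from 10 to 20 replicas, because the observed value is twice the target.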
    

Monitor scaling events

You can monitor HPA and CapacityBuffer events to verify dynamic scaling.

Monitor HPA events

To watch HPA events, run the following command:

kubectl get events -n NAMESPACE --watch \
    --field-selector involvedObject.kind=HorizontalPodAutoscaler

When scaling occurs, the output is similar to the following:

SuccessfulRescale New size: 20; reason: external metric prometheus.googleapis.com|agent_sandbox_claim_creation_total|counter above target
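
If you want to track the pool size over time, you can extract the new replica count from each rescale message. This is a small sketch: new_size_from_event is a hypothetical helper, and it assumes that the message keeps the "New size: N;" wording shown in the sample output.

```shell
# Hypothetical helper: extract the replica count from SuccessfulRescale
# event messages read on stdin. Lines without "New size: N;" are ignored.
new_size_from_event() {
  sed -n 's/.*New size: \([0-9][0-9]*\);.*/\1/p'
}

# Demonstration with a shortened copy of the sample message on this page.
echo 'SuccessfulRescale New size: 20; reason: external metric above target' |
  new_size_from_event
```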

Monitor CapacityBuffer events

To watch CapacityBuffer events, run the following command:

kubectl get events -n NAMESPACE --watch \
    --field-selector involvedObject.kind=CapacityBuffer

When a suspended VM resumes or a scale-up is triggered, the output is similar to the following:

TriggeredScaleUp capacity buffer 20 fake pods triggered scale-up

What's next