Capacity buffers improve the responsiveness and reliability of critical workloads by proactively managing spare cluster capacity using a Kubernetes CapacityBuffer custom resource definition. Using a capacity buffer lets you explicitly define a specific amount of unused node capacity within your cluster. This reserved capacity helps to ensure that GKE provisions nodes ahead of time.
When a high-priority workload needs to scale up quickly, the new workload can use the empty capacity immediately without waiting for node provisioning. This minimizes latency and avoids resource contention during sudden spikes in demand.
This page provides three methods for configuring capacity buffers: a fixed replicas buffer, a percentage-based buffer, and a resource limits buffer.
Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the `gcloud components update` command. Earlier gcloud CLI versions might not support running the commands in this document.
- Create, or have access to, a GKE cluster on version 1.35.2-gke.1842000 or later.
- (Optional, but recommended) Enable node auto-provisioning on your cluster.
To use standby buffers, you must create a cluster with the following GKE version:

```shell
gcloud container clusters create CLUSTER_NAME \
    --region=COMPUTE_REGION \
    --cluster-version=1.35.2-gke.1842002 \
    --release-channel=None
```

Replace the following:

- `CLUSTER_NAME`: the name of your new cluster.
- `COMPUTE_REGION`: the Compute Engine region for your new cluster, such as `us-central1`.
Create prerequisite Kubernetes objects
To configure a CapacityBuffer, you need a namespace that holds all of the
required objects (the CapacityBuffer itself, and additional resources like a
PodTemplate or workload). The PodTemplate and CapacityBuffer must be in
the same namespace. You can create a namespace or use an existing namespace,
including the default namespace.
Depending on which type of CapacityBuffer you're configuring, you also require one of the following:
- PodTemplate: defines the resource requirements for a single unit of buffer capacity. The configuration specified in the CapacityBuffer object references the Pod template.
- Workload: an existing workload that you reference in the CapacityBuffer object. This guide uses a Deployment object as an example workload, but capacity buffers support any of the following resource types:
  - Deployment
  - ReplicaSet
  - StatefulSet
  - ReplicationController
  - Job
  - CustomResourceDefinitions (CRDs) that implement the `scale` subresource
This section provides examples of these objects. If you already have a workload that you want to configure with a capacity buffer, proceed to Apply a capacity buffer.
To create an example Kubernetes workload, complete the following steps:
1. Save the following manifest as `namespace.yaml`:

   ```yaml
   apiVersion: v1
   kind: Namespace
   metadata:
     name: capacity-buffer-example
     labels:
       name: capacity-buffer-example
   ```

   This manifest creates a namespace called `capacity-buffer-example`.

2. Save the following manifest as `buffer-pod-template.yaml`:

   ```yaml
   apiVersion: v1
   kind: PodTemplate
   metadata:
     name: buffer-unit-template
     namespace: capacity-buffer-example # must be the same namespace as the CapacityBuffer
   template:
     spec:
       terminationGracePeriodSeconds: 0
       containers:
       - name: buffer-container
         image: registry.k8s.io/pause:3.9
         resources:
           requests:
             cpu: "1"
             memory: "1Gi"
           limits:
             cpu: "1"
             memory: "1Gi"
   ```

   This manifest creates a PodTemplate that defines the resource requirements for a single unit of buffer capacity (1 CPU and 1 Gi of memory). This configuration specifies the size of the capacity units that GKE provisions for the buffer. For example, with this PodTemplate, when the cluster scales up, GKE doesn't count nodes with less than 1 CPU and 1 Gi of available resources toward the buffer.

3. Save the following manifest as `sample-workload-deployment.yaml`:

   ```yaml
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: critical-workload-ref
     namespace: capacity-buffer-example # must be the same namespace as the CapacityBuffer
   spec:
     replicas: 10
     selector:
       matchLabels:
         app: critical-workload
     template:
       metadata:
         labels:
           app: critical-workload
       spec:
         containers:
         - name: busybox
           image: busybox
           command: ["sleep", "3600"]
           resources:
             requests:
               cpu: 100m
   ```

   This manifest creates a sample Deployment with 10 replicas, which is the reference object for the percentage-based buffer example in the next section.

4. Apply the manifests to your cluster:

   ```shell
   kubectl apply -f namespace.yaml -f buffer-pod-template.yaml -f sample-workload-deployment.yaml
   ```

5. Verify that GKE created the objects:

   ```shell
   kubectl get podtemplate -n capacity-buffer-example
   kubectl get deployment critical-workload-ref -n capacity-buffer-example
   ```

   The output is similar to the following:

   ```
   NAME                   AGE
   buffer-unit-template   1m

   NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
   critical-workload-ref   10/10   10           10          1m
   ```
Apply a capacity buffer
This section provides examples of the different types of capacity buffers that you can apply to your workloads.
Configure a fixed replicas buffer
A CapacityBuffer with fixed replicas requests an exact number of buffer units, each sized according to a PodTemplate.
To create a buffer with fixed replicas, complete the following steps:
1. Save the following manifest as `cb-fixed-replicas.yaml`:

   ```yaml
   apiVersion: autoscaling.x-k8s.io/v1beta1
   kind: CapacityBuffer
   metadata:
     name: fixed-replica-buffer
     namespace: NAMESPACE
   spec:
     podTemplateRef:
       name: POD_TEMPLATE
     replicas: 3
     provisioningStrategy: "STRATEGY"
   ```

   Replace the following:

   - `NAMESPACE`: the name of your namespace, for example `capacity-buffer-example`.
   - `POD_TEMPLATE`: the PodTemplate that defines your resource requirements, for example `buffer-unit-template`.
   - `STRATEGY`: the provisioning strategy, either `"buffer.x-k8s.io/active-capacity"` (default) or `"buffer.gke.io/standby-capacity"`.

   This manifest creates a CapacityBuffer resource that references a PodTemplate to request a specific number of buffer units.

2. Apply the manifest:

   ```shell
   kubectl apply -f cb-fixed-replicas.yaml
   ```

3. Confirm that GKE applied the capacity buffer:

   ```shell
   kubectl get capacitybuffer fixed-replica-buffer -n NAMESPACE
   ```

   The `replicas` field in the status should show `3`, which reflects the number of replicas that you defined in the manifest. The `STATUS` field should show `ReadyForProvisioning`.
Configure a percentage-based buffer
Configuring a percentage-based buffer dynamically sizes the buffer based on a
percentage of an existing scalable workload. Percentage-based buffers are
supported for any object that defines a scale subresource, such as Deployments,
StatefulSets, ReplicaSets, or Jobs. You can't define a percentage-based buffer
for Pod templates because they don't have a replicas field.
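To make the sizing concrete, the calculation can be sketched in Python. This is an illustrative sketch, not the controller's actual code: the function name is made up, and rounding up fractional results is an assumption (the exact rounding behavior isn't specified here):

```python
import math

def percentage_buffer_replicas(workload_replicas: int, percentage: int) -> int:
    """Sketch of percentage-based buffer sizing.

    Assumption: fractional results round up (ceiling); the real
    CapacityBuffer controller's rounding behavior may differ.
    """
    return math.ceil(workload_replicas * percentage / 100)

# 20% of a 10-replica Deployment -> 2 buffer units
print(percentage_buffer_replicas(10, 20))  # 2
# After scaling the Deployment to 20 replicas -> 4 buffer units
print(percentage_buffer_replicas(20, 20))  # 4
```

Because the buffer size is derived from the referenced workload's replica count, the buffer grows and shrinks automatically as you scale that workload.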
To create a percentage-based buffer, complete the following steps:
1. Save the following manifest as `cb-percentage-based.yaml`:

   ```yaml
   apiVersion: autoscaling.x-k8s.io/v1beta1
   kind: CapacityBuffer
   metadata:
     name: percentage-buffer
     namespace: NAMESPACE
   spec:
     scalableRef:
       apiGroup: apps
       kind: Deployment
       name: SCALABLE_RESOURCE_NAME
     percentage: 20
     provisioningStrategy: "STRATEGY"
   ```

   Replace the following:

   - `NAMESPACE`: the name of your namespace.
   - `SCALABLE_RESOURCE_NAME`: the name of your scalable resource, for example `critical-workload-ref`.
   - `STRATEGY`: the provisioning strategy, either `"buffer.x-k8s.io/active-capacity"` (default) or `"buffer.gke.io/standby-capacity"`.

   This manifest creates a CapacityBuffer resource that requests a buffer size equivalent to 20% of the referenced resource's replicas. If you're using the Deployment example from the previous section, the replica value is set to `10`.

2. Apply the manifest:

   ```shell
   kubectl apply -f cb-percentage-based.yaml
   ```

3. Confirm that GKE applied the capacity buffer:

   ```shell
   kubectl get capacitybuffer percentage-buffer -n NAMESPACE
   ```

4. Check the CapacityBuffer status. The `replicas` field should show a value from the percentage calculation. If you're using the Deployment example from the previous section, you should see `2` buffer units, which is 20% of the 10 replicas defined in the Deployment.

5. Test the dynamic scaling by manually scaling the Deployment up to 20 replicas:

   ```shell
   kubectl scale deployment critical-workload-ref -n NAMESPACE --replicas=20
   ```

   The CapacityBuffer controller reacts and automatically scales the buffer to 4 replicas.
Configure a resource limits buffer
You can use the `limits` field to define the maximum amount of resources that the buffer should consume, calculated based on the PodTemplate size.
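The number of buffer units that fit within the limits can be sketched in Python. The function name is made up for illustration, and the bounding-by-the-scarcest-resource logic is an assumption based on how the limits are described:

```python
def buffer_units_within_limits(limit_cpu: float, limit_mem_gi: float,
                               unit_cpu: float, unit_mem_gi: float) -> int:
    """Sketch: the buffer holds as many PodTemplate-sized units as fit
    within the declared limits, bounded by the scarcest resource.

    Assumption for illustration; not the controller's actual code.
    """
    return min(int(limit_cpu // unit_cpu), int(limit_mem_gi // unit_mem_gi))

# 5 CPU / 5 Gi limits with 1 CPU / 1 Gi units -> 5 buffer units
print(buffer_units_within_limits(5, 5, 1, 1))  # 5
```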
To create a resource limits buffer, complete the following steps:
1. Save the following manifest as `cb-resource-limits.yaml`:

   ```yaml
   apiVersion: autoscaling.x-k8s.io/v1beta1
   kind: CapacityBuffer
   metadata:
     name: resource-limit-buffer
     namespace: NAMESPACE
   spec:
     podTemplateRef:
       name: POD_TEMPLATE
     limits:
       cpu: "5"
       memory: "5Gi"
     provisioningStrategy: "STRATEGY"
   ```

   Replace the following:

   - `NAMESPACE`: the name of your namespace, for example `capacity-buffer-example`.
   - `POD_TEMPLATE`: the PodTemplate that defines your resource requirements, for example `buffer-unit-template`.
   - `STRATEGY`: the provisioning strategy, either `"buffer.x-k8s.io/active-capacity"` (default) or `"buffer.gke.io/standby-capacity"`.

   This manifest creates a CapacityBuffer resource with a total limit of 5 CPUs and 5 GiB of memory. If you're using the PodTemplate example from the previous step, each unit is defined as 1 CPU and 1 Gi of memory, which results in 5 buffer units.

2. Apply the manifest:

   ```shell
   kubectl apply -f cb-resource-limits.yaml
   ```

3. Confirm that GKE applied the capacity buffer:

   ```shell
   kubectl get capacitybuffer resource-limit-buffer -n NAMESPACE
   ```

4. Check the CapacityBuffer status. The `replicas` field should show a value derived from the limits that you defined. If you're using the PodTemplate example from the previous section, you should see `5` buffer units, because this is the maximum number of units that fit within the defined limits.
Customize standby buffer behavior
You can use annotations to customize how standby buffers start and refresh.
Add these annotations to the metadata.annotations field of your
CapacityBuffer resource:
- `buffer.gke.io/standby-capacity-init-time`: the amount of time a node remains active after creation before it's suspended. The format is a duration string (for example, `5m` or `1h`). The default is `5m`.
- `buffer.gke.io/standby-capacity-refresh-frequency`: how often suspended nodes are refreshed. The default is `1d`.
The following example shows a manifest with these optional annotations to customize the behavior of standby buffers:

```yaml
apiVersion: autoscaling.x-k8s.io/v1beta1
kind: CapacityBuffer
metadata:
  name: customized-standby-buffer
  namespace: my-namespace
  annotations:
    buffer.gke.io/standby-capacity-init-time: "15m"
    buffer.gke.io/standby-capacity-refresh-frequency: "12h"
spec:
  podTemplateRef:
    name: buffer-unit-template
  replicas: 3
  provisioningStrategy: "buffer.gke.io/standby-capacity"
```
Preload images on standby buffers
To speed up workload startup times when a standby node resumes, you can preload container images by using a DaemonSet. The DaemonSet runs during the start-up period before the node is suspended.
To preload images by using the DaemonSet, complete the following steps:
Save the following manifest as
image-puller-daemonset.yaml:apiVersion: apps/v1 kind: DaemonSet metadata: name: image-prefetch-daemonset namespace: NAMESPACE spec: selector: matchLabels: name: image-prefetch template: metadata: labels: name: image-prefetch spec: tolerations: - key: "buffer.gke.io/standby-node-suspended" operator: "Exists" initContainers: - name: image-puller image: IMAGE_NAME command: ["sh", "-c", "true"] containers: - name: pause image: registry.k8s.io/pause:3.9Replace the following:
NAMESPACE: the namespace for the DaemonSet, for examplecapacity-buffer-example.IMAGE_NAME: the name of the image to preload, for exampleyour-app-image:latest.
Apply the DaemonSet manifest to your cluster:
kubectl apply -f image-puller-daemonset.yamlVerify that the DaemonSet is created:
kubectl get daemonset image-prefetch-daemonset -n NAMESPACEVerify that your capacity buffer is created and ready for provisioning:
kubectl get capacitybuffer CAPACITY_BUFFER_NAME -n NAMESPACECheck the status. The
STATUSfield should showReadyForProvisioning.
Remove capacity buffers
If you no longer need a capacity buffer for your workloads, delete the CapacityBuffer object. This removes the placeholder Pods and allows the cluster autoscaler to scale down the nodes.
```shell
kubectl delete capacitybuffer CAPACITY_BUFFER_NAME -n NAMESPACE
```
Replace `CAPACITY_BUFFER_NAME` with the name of the CapacityBuffer that you want to delete.
Troubleshooting
The following section contains information on resolving common issues with capacity buffers.
Capacity buffer not ready due to billing model
If you create a CapacityBuffer for a workload that uses the Pod-based billing model (pay-per-Pod), the capacity buffer won't be ready for provisioning.
To identify this issue, check the CapacityBuffer status:
```shell
kubectl describe capacitybuffer BUFFER_NAME -n NAMESPACE
```

Look for a condition of type `ReadyForProvisioning` with a status of `False`.
To resolve this issue, ensure that your CapacityBuffer references a workload or PodTemplate that is compatible with node-based billing.
Permission errors for custom scalable resources
If you configure a CapacityBuffer to work with custom scalable objects (using
the scalableRef field), the cluster autoscaler might fail to scale the buffer
if it lacks the necessary permissions.
To resolve this issue, manually grant the required permissions by
creating a ClusterRole and ClusterRoleBinding, such as in the following
example:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-scale-getter
rules:
- apiGroups: ["api.example.com"]
  resources: ["customreplicatedresources/scale"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ca-custom-scale-getter
subjects:
- kind: User
  name: "system:cluster-autoscaler"
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-scale-getter
```
For more information about configuring RBAC, see the Kubernetes RBAC documentation.
What's next
- Learn more about capacity buffers.
- Refer to the CapacityBuffer CRD documentation.