To improve workload stability, Google Kubernetes Engine (GKE) Autopilot mode manages the values of Pod resource requests, such as CPU, memory, or ephemeral storage. This page includes the following information, which you can use to plan efficient, stable, and cost-effective workloads:
- Default values that Autopilot applies to Pods that don't specify values.
- Minimum and maximum values that Autopilot enforces for resource requests.
- How the default, minimum, and maximum values vary based on the hardware that your Pods request.
This page is for Operators and Developers who provision and configure cloud resources, and deploy workloads. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.
You should already be familiar with Kubernetes resource management.
Overview of resource requests in Autopilot
Autopilot uses the resource requests that you specify in your workload configuration to configure the nodes that run your workloads. Autopilot enforces minimum and maximum resource requests based on the compute class or the hardware configuration that your workloads use. If you don't specify requests for some containers, Autopilot assigns default values to let those containers run correctly.
When you deploy a workload in an Autopilot cluster, GKE validates the workload configuration against the allowed minimum and maximum values for the selected compute class or hardware configuration (such as GPUs). If your requests are less than the minimum, Autopilot automatically modifies your workload configuration to bring your requests within the allowed range. If your requests are greater than the maximum, Autopilot rejects your workload and displays an error message.
The following list summarizes the categories of resource requests:
- Default resource requests: Autopilot adds these if you don't specify your own requests for workloads.
- Minimum and maximum resource requests: Autopilot validates your specified requests to ensure that they're within these limits. If your requests are less than the minimum, Autopilot modifies your workload requests. If your requests are greater than the maximum, Autopilot rejects your workload.
- Workload separation and extended duration requests: Autopilot has different default values and different minimum values for workloads that you separate from each other, or for Pods that get extended protection from GKE-initiated eviction.
- Resource requests for DaemonSets: Autopilot has different default, minimum, and maximum values for containers in DaemonSets.
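As a minimal sketch of where these requests live, the following Pod manifest sets CPU, memory, and ephemeral storage requests on a single container. The Pod name, container name, image, and values are illustrative assumptions, not recommendations.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod              # hypothetical name
spec:
  containers:
  - name: app                    # hypothetical container name
    image: nginx                 # placeholder image
    resources:
      requests:
        cpu: "500m"              # 0.5 vCPU
        memory: "2Gi"
        ephemeral-storage: "1Gi"
```

Autopilot validates the values under `resources.requests` against the minimums and maximums for the hardware that the Pod selects.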
How to request resources
In Autopilot, you request resources in your Pod specification. The supported minimum and maximum resources that you can request change based on the hardware configuration of the node on which the Pods run. To learn how to request specific hardware configurations, refer to the following pages:
Default resource requests
If you don't specify resource requests for some containers in a Pod, Autopilot applies default values. These defaults are suitable for many smaller workloads.
Additionally, Autopilot applies the following default resource requests regardless of the selected compute class or hardware configuration:
- Containers in DaemonSets:
  - CPU: 50 mCPU
  - Memory: 100 MiB
  - Ephemeral storage: 100 MiB
- All other containers:
  - Ephemeral storage: 1 GiB
For more information about Autopilot cluster limits, see Quotas and limits.
Default requests for compute classes
Autopilot applies the following default values to resources that are not defined in the Pod specification for Pods that run on compute classes. If you only set one of the requests and leave the other blank, GKE uses the CPU:memory ratio defined in the Minimum and maximum requests section to set the missing request to a value that complies with the ratio.
| Compute class | Resource | Default request |
|---|---|---|
| General-purpose (default) | CPU | 0.5 vCPU |
| | Memory | 2 GiB |
| Accelerator | No default requests enforced. | |
| Balanced | CPU | 0.5 vCPU |
| | Memory | 2 GiB |
| Performance | No default requests enforced. | |
| Scale-Out | CPU | 0.5 vCPU |
| | Memory | 2 GiB |
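To illustrate the defaults, the following sketch shows a Pod that selects a compute class but sets no requests. It assumes the `cloud.google.com/compute-class` node selector that is described in Choose compute classes for Autopilot Pods. The Pod name, container name, and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scale-out-defaults       # hypothetical name
spec:
  nodeSelector:
    cloud.google.com/compute-class: "Scale-Out"   # assumed selector label
  containers:
  - name: app                    # hypothetical container name
    image: nginx                 # placeholder image
    # No resources.requests block is set. Based on the defaults table,
    # Autopilot would be expected to apply 0.5 vCPU and 2 GiB of memory
    # for the Scale-Out class.
```

You can inspect the Pod after creation, for example with `kubectl get pod scale-out-defaults -o yaml`, to see the requests that were actually applied.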
Minimum and maximum resource requests
The total resources requested by your deployment configuration should be within the supported minimum and maximum values that Autopilot allows. The following conditions apply:
- Ephemeral storage requests:
  - Ephemeral storage uses the VM boot disk unless your nodes have local SSDs attached.
  - Compute hardware that includes local SSDs, like A100 (80GB) GPUs, H100 (80GB) GPUs, or the Z3 machine series, supports a maximum request that's equal to the size of the local SSD minus any system overhead. For information about this system overhead, see Ephemeral storage backed by local SSDs.
  - In GKE version 1.29.3-gke.1038000 and later, Performance class Pods and hardware accelerator Pods support a maximum ephemeral storage request of 56 TiB unless the hardware includes local SSDs.
  - In all other Autopilot Pods regardless of the GKE version, the total ephemeral storage request across all of the containers in the Pod must be between 10 MiB and 10 GiB unless otherwise specified.
  - For larger volumes, use generic ephemeral volumes, which provide equivalent functionality and performance to ephemeral storage but with significantly more flexibility because they can be used with any GKE storage option. For example, the maximum size for a generic ephemeral volume using `pd-balanced` is 64 TiB.
- For DaemonSet Pods, the minimum resource requests are as follows:
  - Clusters that support bursting: 1 mCPU per Pod, 2 MiB of memory per Pod, and 10 MiB of ephemeral storage per container in the Pod.
  - Clusters that don't support bursting: 10 mCPU per Pod, 10 MiB of memory per Pod, and 10 MiB of ephemeral storage per container in the Pod.

  To check whether your cluster supports bursting, see Bursting availability in GKE.
- If your cluster supports bursting, Autopilot doesn't enforce 0.25 vCPU increments for your Pod CPU requests. If your cluster doesn't support bursting, Autopilot rounds up your CPU requests to the nearest 0.25 vCPU. To check whether your cluster supports bursting, see Bursting availability in GKE. 
- The CPU:memory ratio must be within the allowed range for the selected compute class or hardware configuration. If your CPU:memory ratio is outside the allowed range, Autopilot automatically increases the smaller resource. For example, if you request 1 vCPU and 16 GiB of memory (1:16 ratio) for Pods running on the `Scale-Out` class, Autopilot increases the CPU request to 4 vCPUs, which changes the ratio to 1:4.
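The ratio adjustment in the last condition can be sketched as a manifest. The following fragment assumes the `cloud.google.com/compute-class` node selector from Choose compute classes for Autopilot Pods. The names and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ratio-example            # hypothetical name
spec:
  nodeSelector:
    cloud.google.com/compute-class: "Scale-Out"   # assumed selector label
  containers:
  - name: app                    # hypothetical container name
    image: nginx                 # placeholder image
    resources:
      requests:
        cpu: "1"                 # 1 vCPU : 16 GiB is a 1:16 ratio
        memory: "16Gi"
        # Per the example in the text, Autopilot would raise cpu to "4"
        # so that the ratio becomes 1:4. Memory stays at 16Gi.
```

Requesting 4 vCPU and 16 GiB directly avoids the automatic modification and makes the cost of the Pod explicit in the manifest.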
Minimums and maximums for compute classes
The following table describes the minimum, maximum, and allowed CPU-to-memory ratio for each compute class that Autopilot supports:
| Compute class | CPU:memory ratio (vCPU:GiB) | Resource | Minimum | Maximum |
|---|---|---|---|---|
| General-purpose (default) | Between 1:1 and 1:6.5 | CPU | The value depends on whether your cluster supports bursting. To check whether your cluster supports bursting, see Bursting availability in GKE. | 30 vCPU |
| | | Memory | The value depends on whether your cluster supports bursting. To check whether your cluster supports bursting, see Bursting availability in GKE. | 110 GiB |
| Accelerator | See Minimums and maximums for accelerators | | | |
| Balanced | Between 1:1 and 1:8 | CPU | 0.25 vCPU | 222 vCPU (the maximum can differ if you select a minimum CPU platform) |
| | | Memory | 0.5 GiB | 851 GiB (the maximum can differ if you select a minimum CPU platform) |
| Performance | N/A | CPU | No minimum requests enforced | |
| | | Memory | No minimum requests enforced | |
| | | Ephemeral storage | No minimum requests enforced | |
| Scale-Out | Exactly 1:4 | CPU | 0.25 vCPU | |
| | | Memory | 1 GiB | |
To learn how to request compute classes in your Autopilot Pods, refer to Choose compute classes for Autopilot Pods.
Minimums and maximums for accelerators
GKE doesn't enforce minimum CPU, memory, or ephemeral storage requests for Pods that use accelerators. The following table describes the maximum requests for each of these resources based on the number and type of accelerator that you use.
Unless specified, the maximum ephemeral storage supported is 56 TiB.
| Accelerator type | Resource | Maximum |
|---|---|---|
| NVIDIA B200 (`nvidia-B200`) | CPU | |
| | Memory | |
| | Ephemeral storage | |
| NVIDIA H200 (141GB) (`nvidia-h200-141gb`) | CPU | |
| | Memory | |
| | Ephemeral storage | |
| NVIDIA H100 Mega (80GB) (`nvidia-h100-mega-80gb`) | CPU | |
| | Memory | |
| | Ephemeral storage | |
| NVIDIA H100 (80GB) (`nvidia-h100-80gb`) | CPU | |
| | Memory | |
| | Ephemeral storage | |
| NVIDIA A100 (40GB) (`nvidia-tesla-a100`) | CPU | The sum of CPU requests of all DaemonSets that run on an A100 GPU node must not exceed 2 vCPU. |
| | Memory | The sum of memory requests of all DaemonSets that run on an A100 GPU node must not exceed 14 GiB. |
| NVIDIA A100 (80GB) (`nvidia-a100-80gb`) | CPU | The sum of CPU requests of all DaemonSets that run on an A100 (80GB) GPU node must not exceed 2 vCPU. |
| | Memory | The sum of memory requests of all DaemonSets that run on an A100 (80GB) GPU node must not exceed 14 GiB. |
| | Ephemeral storage | |
| NVIDIA L4 (`nvidia-l4`) | CPU | The sum of CPU requests of all DaemonSets that run on an L4 GPU node must not exceed 2 vCPU. |
| | Memory | The sum of memory requests of all DaemonSets that run on an L4 GPU node must not exceed 14 GiB. |
| NVIDIA Tesla T4 (`nvidia-tesla-t4`) | CPU | |
| | Memory | |
| TPU v5e (`tpu-v5-lite-podslice`) | CPU | |
| | Memory | |
| | Ephemeral storage | 56 TiB |
| TPU v5p (`tpu-v5p-slice`) | CPU | 280 vCPU |
| | Memory | 448 GiB |
| | Ephemeral storage | 56 TiB |
| TPU v4 (`tpu-v4-podslice`) | CPU | 240 vCPU |
| | Memory | 407 GiB |
| | Ephemeral storage | 56 TiB |
To learn how to request GPUs in your Autopilot Pods, refer to Deploy GPU workloads in Autopilot.
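As a rough sketch of an accelerator Pod, the following manifest requests one NVIDIA L4 GPU together with explicit CPU and memory requests. It assumes the `cloud.google.com/gke-accelerator` node selector and the `nvidia.com/gpu` resource name that are described in Deploy GPU workloads in Autopilot. The names, image placeholder, and request values are hypothetical, and the values are not the documented maximums.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example              # hypothetical name
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-l4   # assumed selector label
  containers:
  - name: gpu-app                # hypothetical container name
    image: IMAGE_NAME            # placeholder: replace with your image
    resources:
      requests:
        cpu: "2"                 # illustrative value, no minimum is enforced
        memory: "8Gi"            # illustrative value, no minimum is enforced
      limits:
        cpu: "2"
        memory: "8Gi"
        nvidia.com/gpu: 1        # number of GPUs for this container
```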
Resource requests for workload separation and extended duration
Autopilot lets you manipulate Kubernetes scheduling and eviction behavior using methods such as the following:
- Use taints and tolerations and node selectors to ensure that certain Pods only get placed on specific nodes. For details, see Configure workload separation in GKE.
- Use Pod anti-affinity to prevent Pods from co-locating on the same node. The default and minimum resource requests for workloads that use these methods to control scheduling behavior are higher than for workloads that don't.
- Use an annotation to protect Pods from eviction caused by node auto-upgrades and scale-down events for up to seven days. For details, see Extend the run time of Autopilot Pods.
If your specified requests are less than the minimums, the behavior of Autopilot changes based on the method that you used, as follows:
- Taints, tolerations, selectors, and extended duration Pods: Autopilot modifies your Pods to increase the requests when scheduling the Pods.
- Pod anti-affinity: Autopilot rejects the Pod and displays an error message.
The following table describes the default requests and the minimum resource requests that you can specify. If a configuration or compute class isn't in this table, Autopilot doesn't enforce special minimum or default values.
| Compute class | Resource | Default | Minimum |
|---|---|---|---|
| General-purpose | CPU | 0.5 vCPU | 0.5 vCPU |
| | Memory | 2 GiB | 0.5 GiB |
| Balanced | CPU | 2 vCPU | 1 vCPU |
| | Memory | 8 GiB | 4 GiB |
| Scale-Out | CPU | 0.5 vCPU | 0.5 vCPU |
| | Memory | 2 GiB | 2 GiB |
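For illustration, the following sketch shows a general-purpose Pod that uses a toleration and a node selector for workload separation, with requests that meet the minimums in the preceding table. The `group: batch` key and value, the names, and the image are hypothetical. See Configure workload separation in GKE for the supported configuration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: separated-pod            # hypothetical name
spec:
  tolerations:
  - key: group                   # hypothetical key
    operator: Equal
    value: "batch"               # hypothetical value
    effect: NoSchedule
  nodeSelector:
    group: "batch"               # matches the toleration above
  containers:
  - name: worker                 # hypothetical container name
    image: nginx                 # placeholder image
    resources:
      requests:
        cpu: "500m"              # at the 0.5 vCPU minimum
        memory: "2Gi"            # above the 0.5 GiB minimum
```

If the requests were lower than the minimums, Autopilot would be expected to increase them for this method, as described earlier in this section.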
Init containers
Init containers run sequentially, and all init containers must finish running before application containers can start. In Autopilot clusters, if you don't specify CPU or memory requests for init containers or explicitly set requests to 0, Autopilot modifies your Pods during creation to add resource requests to each init container. The requests assigned to each init container are equal to the sum of requests for all application containers in the Pod. This is the default behavior.
This behavior differs from Standard clusters, where init containers use any unallocated resources available on the node on which the Pod is scheduled.
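The following sketch illustrates the default behavior with a Pod whose init container has no requests. All names, images, commands, and values are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-example             # hypothetical name
spec:
  initContainers:
  - name: setup                  # hypothetical init container
    image: busybox               # placeholder image
    command: ["sh", "-c", "echo preparing"]
    # No requests are set. Per the text, Autopilot would assign the sum
    # of the application container requests: cpu 750m, memory 1536Mi.
  containers:
  - name: web                    # hypothetical application container
    image: nginx                 # placeholder image
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
  - name: sidecar                # hypothetical application container
    image: busybox               # placeholder image
    command: ["sh", "-c", "sleep 3600"]
    resources:
      requests:
        cpu: "250m"
        memory: "512Mi"
```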
Automatic resource allocation for init containers
The automatic resource allocation for init containers occurs at Pod creation. We suggest that you don't manually specify resource requests for the init containers in Autopilot clusters, so that each container gets the full resources available to the Pod by default.
If you change the resource requests of non-init containers in the Pod after creation, Autopilot doesn't automatically adjust the resource requests of the init containers. As a result, you might notice charges that aren't consistent with the actual resource usage of the Pod. Your bill is based on the effective resource request of the Pod, which is the larger of the following:
- The largest resource request of any single init container in the Pod.
- The sum of requests for all application containers in the Pod.
For more information, see Automatic resource management in Autopilot.
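As a worked example of the effective request, consider the following hypothetical Pod in which the application container requests were reduced after creation but the init container kept its earlier values. The names, images, and numbers are assumptions chosen to make the arithmetic easy to follow.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: effective-request-example   # hypothetical name
spec:
  initContainers:
  - name: migrate                # hypothetical init container
    image: busybox               # placeholder image
    command: ["sh", "-c", "echo migrating"]
    resources:
      requests:
        cpu: "2"                 # largest init container request: 2 vCPU
        memory: "4Gi"            # largest init container request: 4 GiB
  containers:
  - name: api                    # hypothetical application container
    image: nginx                 # placeholder image
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
  - name: worker                 # hypothetical application container
    image: nginx                 # placeholder image
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
# Sum of application containers: 1 vCPU, 2 GiB.
# Effective CPU request:    max(2 vCPU, 1 vCPU) = 2 vCPU
# Effective memory request: max(4 GiB, 2 GiB)   = 4 GiB
```

In this sketch the init container determines the effective request, which is why the charges can look higher than the steady-state usage of the application containers.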
Manual resource allocation for init containers
If you need to change the existing resource requests for your application containers to manage costs and resources, we recommend that you do one of the following to adjust your init container requests:
- Manually update the resource requests for the init container to match the new total requests on the Pod. Consider the following when manually specifying the resource requests:
  - Requests lower than the Pod's total resources can constrain the init container.
  - Requests higher than the Pod's total resources can increase costs.
- Remove the resource requests to let Autopilot re-compute them. Autopilot will default to re-allocating the resources to each init container based on the current total resources requested by all the application containers in the Pod.
Setting resource limits in Autopilot
Kubernetes lets you set both requests and limits for resources in your Pod specification. The behavior of your Pods changes depending on whether your limits are different from your requests, as described in the following table:
| Values set | Autopilot behavior |
|---|---|
| `requests` equal to `limits` | Pods use the `Guaranteed` QoS class. |
| `requests` set, `limits` not set | The behavior depends on whether your cluster supports bursting. To check whether your cluster supports bursting, see Bursting availability in GKE. |
| `requests` not set, `limits` set | Autopilot sets `requests` to the value of `limits`, which is the default Kubernetes behavior. Before: `resources: {limits: {cpu: "400m"}}`. After: `resources: {requests: {cpu: "400m"}, limits: {cpu: "400m"}}`. |
| `requests` less than `limits` | The behavior depends on whether your cluster supports bursting. To check whether your cluster supports bursting, see Bursting availability in GKE. |
| `requests` greater than `limits` | Autopilot sets `requests` to the value of `limits`. Before: `resources: {requests: {cpu: "450m"}, limits: {cpu: "400m"}}`. After: `resources: {requests: {cpu: "400m"}, limits: {cpu: "400m"}}`. |
| `requests` not set, `limits` not set | Autopilot sets `requests` to the default values for the compute class or hardware configuration. The behavior for `limits` depends on whether your cluster supports bursting. To check whether your cluster supports bursting, see Bursting availability in GKE. |
In most situations, set adequate resource requests and equal limits for your workloads.
For workloads that temporarily need more resources than their steady-state, like during boot up or during higher traffic periods, set your limits higher than your requests to let the Pods burst. For details, see Configure Pod bursting in GKE.
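A minimal sketch of a burstable configuration follows. The names, image, and values are hypothetical, and the limits only allow bursting in clusters that support it.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: burst-example            # hypothetical name
spec:
  containers:
  - name: app                    # hypothetical container name
    image: nginx                 # placeholder image
    resources:
      requests:
        cpu: "500m"              # steady-state need
        memory: "1Gi"
      limits:
        cpu: "1"                 # headroom for boot or traffic spikes
        memory: "2Gi"
```

Setting `limits` equal to `requests` instead gives the Pod the `Guaranteed` QoS class, which is the recommendation for most workloads.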
Automatic resource management in Autopilot
If your specified resource requests for your workloads are outside of the allowed ranges, or if you don't request resources for some containers, Autopilot modifies your workload configuration to comply with the allowed limits. Autopilot calculates resource ratios and the resource scale up requirements after applying default values to containers with no request specified.
- Missing requests: If you don't request resources in some containers, Autopilot applies the default requests for the compute class or hardware configuration.
- CPU:memory ratio: Autopilot scales up the smaller resource to bring the ratio within the allowed range.
- Ephemeral storage: Autopilot modifies your ephemeral storage requests to meet the minimum amount required by each container. The cumulative value of storage requests across all containers cannot be more than the maximum allowed value. In versions earlier than 1.28.6-gke.1317000, Autopilot scales down the requested ephemeral storage if the value exceeds the maximum. In version 1.28.6-gke.1317000 and later, Autopilot rejects your workload.
- Requests below minimums: If you request fewer resources than the allowed minimum for the selected hardware configuration, Autopilot automatically modifies the Pod to request at least the minimum resource value.
By default, when Autopilot automatically scales a resource up to meet
a minimum or default resource value, GKE allocates the extra
capacity to the first container in the Pod manifest. In GKE
version 1.27.2-gke.2200 and later, you can tell GKE to allocate
the extra resources to a specific container by adding the following to the
annotations field in your Pod manifest:
autopilot.gke.io/primary-container: "CONTAINER_NAME"
Replace CONTAINER_NAME with the name of the
container.
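For context, the following sketch shows where the annotation sits in a complete Pod manifest. The Pod name, container names, image, and request values are hypothetical. Here the second container, `app`, is named as the primary container, so any extra capacity would go to it rather than to the first container in the manifest.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: primary-container-example   # hypothetical name
  annotations:
    autopilot.gke.io/primary-container: "app"
spec:
  containers:
  - name: log-forwarder          # first in the manifest, not the primary
    image: nginx                 # placeholder image
    resources:
      requests:
        cpu: "10m"
        memory: "128Mi"
  - name: app                    # receives any extra allocated resources
    image: nginx                 # placeholder image
    resources:
      requests:
        cpu: "10m"
        memory: "128Mi"
```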
Resource modification examples
The following example scenarios show how Autopilot modifies your workload configuration to meet the requirements of your running Pods and containers.
Single container with < 0.05 vCPU
| Container number | Original request | Modified request | 
|---|---|---|
| 1 | CPU: 30 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB | CPU: 50 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB | 
Multiple containers with total CPU < 0.05 vCPU
| Container number | Original requests | Modified requests | 
|---|---|---|
| 1 | CPU: 10 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB | CPU: 30 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB | 
| 2 | CPU: 10 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB | CPU: 10 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB | 
| 3 | CPU: 10 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB | CPU: 10 mCPU Memory: 0.5 GiB Ephemeral storage: 10 MiB |
| Total Pod resources | | CPU: 50 mCPU Memory: 1.5 GiB Ephemeral storage: 30 MiB |
Single container with memory too low for requested CPU
In this example, the memory request is too low for the amount of CPU requested. The minimum allowed CPU:memory ratio is 1:1 (1 vCPU:1 GiB), so Autopilot increases the memory request until the ratio is within the allowed range.
| Container number | Original request | Modified request | 
|---|---|---|
| 1 | CPU: 4 vCPU Memory: 1 GiB Ephemeral storage: 10 MiB | CPU: 4 vCPU Memory: 4 GiB Ephemeral storage: 10 MiB | 
| Total Pod resources | | CPU: 4 vCPU Memory: 4 GiB Ephemeral storage: 10 MiB |