Create an A4X GKE cluster

This document describes how to deploy a GKE cluster that uses A4X accelerator-optimized machine types by using Cluster Toolkit.

The A4X machine series runs on an exascale platform that is based on NVIDIA's rack-scale architecture and uses NVIDIA GB200 Grace Blackwell Superchips. The series is optimized for compute- and memory-intensive, network-bound ML training and HPC workloads.

To learn more about the A4X machine series, see the A4X Max and A4X machine series section in the Compute Engine documentation.

Before you begin

Before you start, make sure that you have performed the following tasks:

Enable the Google Kubernetes Engine API.

If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you primarily use zonal clusters, set the compute/zone property instead. Setting a default location helps you avoid gcloud CLI errors like the following: "One of [--zone, --region] must be supplied: Please specify location." If the location of your cluster differs from the default that you set, you might still need to specify the location in certain commands.
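The preparation steps above can be sketched as a short shell script. This is a minimal sketch, not an official procedure: PROJECT_ID and REGION are hypothetical placeholders (the region shown is not a statement about where A4X capacity is available), and the run helper is an illustrative wrapper that prints each command for review unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only. PROJECT_ID and REGION are hypothetical placeholders;
# replace them with your own project and the region of your capacity.
PROJECT_ID="my-project"
REGION="us-central1"

# Illustrative helper: print each command instead of running it,
# unless DRY_RUN=0 is set in the environment.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

setup_gcloud() {
  # Get the latest gcloud CLI version.
  run gcloud components update
  # Enable the Google Kubernetes Engine API on the project.
  run gcloud services enable container.googleapis.com --project="$PROJECT_ID"
  # Set a default region so that commands do not need --region each time.
  run gcloud config set compute/region "$REGION"
}

setup_gcloud
```

By default the script only prints the three commands. After you review them, rerun it with DRY_RUN=0 to execute them. If you primarily use zonal clusters, set compute/zone instead of compute/region.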

Required roles

To get the permissions that you need to deploy the cluster, ask your administrator to grant you the following IAM roles on the project:

Kubernetes Engine Admin (roles/container.admin)
Compute Admin (roles/compute.admin)
Storage Admin (roles/storage.admin)
Project IAM Admin (roles/resourcemanager.projectIamAdmin)
Service Account Admin (roles/iam.serviceAccountAdmin)
Service Account User (roles/iam.serviceAccountUser)
Service Usage Consumer (roles/serviceusage.serviceUsageConsumer)

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.
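If you are the administrator who grants these roles, the following sketch shows one possible way to generate the grant commands. It assumes a hypothetical project ID and a hypothetical user principal, and it only prints one gcloud projects add-iam-policy-binding command per role so that you can review the commands before you run them.

```shell
#!/bin/sh
# Sketch only. PROJECT_ID and MEMBER are hypothetical placeholders.
PROJECT_ID="my-project"
MEMBER="user:alice@example.com"

# The IAM roles listed in this section, one per line.
ROLES="roles/container.admin
roles/compute.admin
roles/storage.admin
roles/resourcemanager.projectIamAdmin
roles/iam.serviceAccountAdmin
roles/iam.serviceAccountUser
roles/serviceusage.serviceUsageConsumer"

# Print one binding command per role; nothing is executed here.
print_role_bindings() {
  for role in $ROLES; do
    echo "gcloud projects add-iam-policy-binding $PROJECT_ID --member=$MEMBER --role=$role"
  done
}

print_role_bindings
```

Printing the commands instead of running them keeps the sketch safe to try. To apply the bindings, review the output and run each printed command with an account that is allowed to modify the project's IAM policy.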

Choose a consumption option and obtain capacity

To obtain capacity, complete the following steps:

Choose a consumption option: Make your choice based on how you want to get and use GPU resources. To learn more, see Choose a consumption option.

For GKE, consider the following additional information:
- A4X compute instances cannot be provisioned by using flex-start.
- You must use the reservation-bound provisioning model to create clusters with A4X. Other provisioning models are not supported.

Obtain capacity: The process to obtain capacity differs for each consumption option. To learn about the process for your chosen consumption option, see Capacity overview.

Note: When you request A4X capacity, you obtain it in all capacity mode. This mode is the only supported reservation operational mode for A4X machine types. For more information about all capacity mode, see Reservation operational mode.

Requirements

The following requirements apply to an AI-optimized GKE cluster that uses A4X instances:

Verify that you use GKE version 1.33.4-gke.1036000 or later (for 1.33), or version 1.32.8-gke.1108000 or later (for 1.32). These versions help ensure that A4X uses the following:
- R580, which is the minimum GPU driver version for the GB200 GPUs in A4X virtual machine (VM) instances. This driver is turned on by default.
- Coherent Driver-based Memory Management (CDMM), which is turned on by default. NVIDIA recommends that Kubernetes clusters turn on this mode to resolve memory over-reporting. CDMM lets you manage GPU memory through the driver instead of the operating system (OS). This approach helps you avoid OS onlining of GPU memory, and exposes the GPU memory as a Non-Uniform Memory Access (NUMA) node to the OS. Multi-instance GPUs are not supported when CDMM is turned on. For more information about CDMM, see Hardware and Software Support.
- GPUDirect RDMA, which is recommended to let A4X node pools use the networking capabilities of A4X.

To use GPUDirect RDMA, the GKE nodes must use a Container-Optimized OS node image. Ubuntu and Windows node images are not supported.
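The version requirement above can be checked with a small shell helper. This is a minimal sketch under stated assumptions: the function names version_ge and a4x_version_ok are hypothetical, the comparison assumes version strings in the X.Y.Z-gke.N format, and because this document states minimums only for 1.32 and 1.33, the helper reports any other minor version as not covered instead of guessing.

```shell
#!/bin/sh
# Sketch only. version_ge and a4x_version_ok are hypothetical helper names.

# Succeeds if version $1 is at or above version $2.
# Both must look like X.Y.Z-gke.N; the four numbers are compared in order.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, /[^0-9]+/)
    split(b, y, /[^0-9]+/)
    for (i = 1; i <= 4; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# Returns 0 if the version meets the A4X minimum for its minor version,
# 1 if it is too old, and 2 if the minor version is not covered here.
a4x_version_ok() {
  case "$1" in
    1.33.*) version_ge "$1" "1.33.4-gke.1036000" ;;
    1.32.*) version_ge "$1" "1.32.8-gke.1108000" ;;
    *) return 2 ;;
  esac
}

# The second value is a made-up sample string used only for illustration.
for v in 1.33.4-gke.1036000 1.32.7-gke.1000000; do
  if a4x_version_ok "$v"; then
    echo "$v: meets the A4X minimum"
  else
    echo "$v: below the A4X minimum or not covered"
  fi
done
```

To check a real cluster, you could pass in the control plane version, which you can read, for example, from the currentMasterVersion field in the output of gcloud container clusters describe.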