This document describes how to create Compute Engine instances in bulk that use A4X Max accelerator-optimized machine types. To learn about compute instance and cluster creation options, see the Deployment options overview.
A4X Max instance type
A Compute Engine instance, or compute instance, is a computing resource hosted on Google's infrastructure that can be either a virtual machine (VM) or a bare metal instance. A4X Max instances are available as bare metal instances, which differ from VM instances by providing direct, non-virtualized access to the underlying physical hardware. To learn more about the A4X Max machine type, see A4X Max series in the Compute Engine documentation.
Before you begin
Before creating A4X Max instances in bulk, if you haven't already done so, complete the following steps:
- Choose a consumption option: your choice of consumption option determines how you get and use GPU resources. To learn more, see Choose a consumption option.
- Obtain capacity: the process to obtain capacity differs for each consumption option. To learn about the process for your chosen consumption option, see Capacity overview.
Limitations
When you create A4X Max instances in bulk, the following limitations apply:
- You don't receive sustained use discounts or flexible committed use discounts for instances that use this machine type.
- You can only create instances in certain regions and zones.
- You can't use Persistent Disk (regional or zonal). You can only use Google Cloud Hyperdisk.
- This machine type is only available on the NVIDIA Grace platform.
- Machine type changes aren't supported for A4X Max. To switch to or from this machine type, you must create a new instance.
- You can't run Windows operating systems on this machine type. For a list of supported Linux operating systems, review the supported operating systems for GPU instances.
- You can't attach Hyperdisk ML disks created before February 4, 2026 to A4X Max machine types.
Required roles
To get the permissions that
you need to create compute instances in bulk,
ask your administrator to grant you the
Compute Instance Admin (v1) (roles/compute.instanceAdmin.v1)
IAM role
on the project.
For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to create compute instances in bulk. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to create compute instances in bulk:
- compute.instances.create on the project
- To use a custom image to create the VM: compute.images.useReadOnly on the image
- To use a snapshot to create the VM: compute.snapshots.useReadOnly on the snapshot
- To use an instance template to create the VM: compute.instanceTemplates.useReadOnly on the instance template
- To specify a subnet for your VM: compute.subnetworks.use on the project or on the chosen subnet
- To specify a static IP address for the VM: compute.addresses.use on the project
- To assign an external IP address to the VM when using a VPC network: compute.subnetworks.useExternalIp on the project or on the chosen subnet
- To assign a legacy network to the VM: compute.networks.use on the project
- To assign an external IP address to the VM when using a legacy network: compute.networks.useExternalIp on the project
- To set VM instance metadata for the VM: compute.instances.setMetadata on the project
- To set tags for the VM: compute.instances.setTags on the VM
- To set labels for the VM: compute.instances.setLabels on the VM
- To set a service account for the VM to use: compute.instances.setServiceAccount on the VM
- To create a new disk for the VM: compute.disks.create on the project
- To attach an existing disk in read-only or read-write mode: compute.disks.use on the disk
- To attach an existing disk in read-only mode: compute.disks.useReadOnly on the disk
You might also be able to get these permissions with custom roles or other predefined roles.
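If you manage IAM from the command line, you can grant this role with the Google Cloud CLI. The following is a sketch; PROJECT_ID and USER_EMAIL are placeholders for your project ID and the user who needs the role:

```shell
# Grant the Compute Instance Admin (v1) role on the project.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:USER_EMAIL" \
    --role="roles/compute.instanceAdmin.v1"
```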
A4X Max fundamentals
An A4X Max cluster is organized into a hierarchy of blocks and sub-blocks to facilitate large-scale, non-blocking network performance. Understanding this topology is key when reserving capacity and deploying workloads.
- A4X Max instance: a single instance of an A4X Max machine type that has 4 GPUs attached.
- Sub-block: the fundamental unit of A4X Max capacity. A sub-block consists of 18 A4X Max instances (72 GPUs); these instances form an NVLink domain and are connected by using a multi-node NVLink system. You create an A4X Max sub-block by applying a compact placement policy that specifies a 1x72 topology.
- Block: an A4X Max block is composed of 25 sub-blocks (NVLink domains), totaling up to 450 A4X Max instances (1,800 GPUs). The sub-blocks are rail-aligned for efficient scaling. Because each sub-block requires its own compact placement policy, you can create up to 25 compact placement policies for a single A4X Max block.
The following table shows the supported topology options for A4X Max instances:
| Topology (gpuTopology) | Number of GPUs | Number of instances |
|---|---|---|
| 1x72 | 72 | 18 |
Overview
Creating instances in bulk with the A4X Max machine type includes the following steps:

1. Create VPC networks.
2. Create a compact placement policy.
3. Create the A4X Max instances in bulk.
Create VPC networks
To set up the network for A4X Max machine types, create two VPC networks for the following network interfaces:
- One regular VPC network with two subnets for the IDPF network interfaces (NICs). These NICs are used for host-to-host communication.
- One VPC network with the RoCE network profile for the CX-8 NICs, which is required when you create multiple A4X Max sub-blocks. The RoCE VPC network uses a single, automatically provided subnet named default-subnet-1-RDMA_NAME_PREFIX-net, and all eight CX-8 NICs use this subnet. These NICs use RDMA over Converged Ethernet (RoCE), providing the high-bandwidth, low-latency communication that's essential for scaling out to multiple A4X Max sub-blocks. For a single A4X Max sub-block, you can skip this VPC network because, within a single sub-block, direct GPU-to-GPU communication is handled by the multi-node NVLink system.
For more information about NIC arrangement, see Review network bandwidth and NIC arrangement.
Create the networks either manually by following the instruction guides or automatically by using the provided script.
Instruction guides
To create the networks, you can use the following instructions:
- To create the regular VPC networks for the gVNICs, see Create and manage Virtual Private Cloud networks.
- To create the RoCE VPC network, see Create a Virtual Private Cloud network for RDMA NICs.
For these VPC networks, we recommend setting the
maximum transmission unit (MTU) to a larger value.
For A4X Max machine types, the recommended MTU is 8896 bytes.
To review the recommended MTU settings for other GPU machine types, see
MTU settings for GPU machine types.
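After you create a network, you can confirm that the MTU is set as recommended. A quick check, assuming the IDPF network name used elsewhere in this guide:

```shell
# Print the MTU of the IDPF VPC network; the output should be 8896.
gcloud compute networks describe IDPF_NETWORK_PREFIX-net --format="value(mtu)"
```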
Script
To create the networks, follow these steps.
For these VPC networks, we recommend setting the
maximum transmission unit (MTU) to a larger value.
For A4X Max machine types, the recommended MTU is 8896 bytes.
To review the recommended MTU settings for other GPU machine types, see
MTU settings for GPU machine types.
Use the following script to create the regular VPC network and subnets for the IDPF NICs:

#!/bin/bash

# Create the regular VPC network for the IDPF NICs
gcloud compute networks create IDPF_NETWORK_PREFIX-net \
    --subnet-mode=custom \
    --mtu=8896 \
    --enable-ula-internal-ipv6

# Create the two subnets for the IDPF NICs
for N in $(seq 0 1); do
  gcloud compute networks subnets create IDPF_NETWORK_PREFIX-sub-$N \
      --network=IDPF_NETWORK_PREFIX-net \
      --region=REGION \
      --stack-type=IPV6_ONLY \
      --ipv6-access-type=INTERNAL
done

# Create a firewall rule that allows internal traffic
gcloud compute firewall-rules create IDPF_NETWORK_PREFIX-internal \
    --network=IDPF_NETWORK_PREFIX-net \
    --action=ALLOW \
    --rules=tcp:0-65535,udp:0-65535,58 \
    --source-ranges=IP_RANGE

If you require multiple A4X Max sub-blocks, use the following script to create the RoCE VPC network for the eight CX-8 NICs on each A4X Max instance:
#!/bin/bash

# List the network profiles and confirm that one exists in the machine type's zone
gcloud compute network-profiles list --filter "location.name=ZONE"

# Create the network for the RDMA NICs
gcloud compute networks create RDMA_NAME_PREFIX-net \
    --network-profile=ZONE-vpc-roce-metal \
    --subnet-mode=custom \
    --mtu=8896

# For RoCE VPC networks for bare metal instances, a single subnet named
# default-subnet-1-RDMA_NAME_PREFIX-net is automatically provided.
# For more details, see https://cloud.google.com/vpc/docs/rdma-network-profiles.

Replace the following:
- IDPF_NETWORK_PREFIX: the custom name prefix to use for the regular VPC network and subnets for the IDPF NICs.
- RDMA_NAME_PREFIX: the custom name prefix to use for the RoCE VPC network and subnet for the CX-8 NICs.
- ZONE: a zone in which the machine type that you want to use is available, such as us-central1-a. For information about regions, see GPU availability by regions and zones.
- REGION: the region where you want to create the subnets. This region must correspond to the specified zone. For example, if your zone is us-central1-a, then your region is us-central1.
- IP_RANGE: the IP range to use for the SSH firewall rules.
- Optional: To verify that the VPC network resources are created successfully, check the network settings in the Google Cloud console:
- In the Google Cloud console, go to the VPC networks page.
- Search the list for the networks that you created in the previous step.
- To view the subnets, firewall rules, and other network settings, click the name of the network.
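As an alternative to the console, you can verify the resources from the command line. A sketch, assuming the name prefixes used in the scripts:

```shell
# List the VPC networks, subnets, and firewall rules created by the scripts.
gcloud compute networks list --filter="name~'IDPF_NETWORK_PREFIX|RDMA_NAME_PREFIX'"
gcloud compute networks subnets list --filter="network~'IDPF_NETWORK_PREFIX-net'"
gcloud compute firewall-rules list --filter="network~'IDPF_NETWORK_PREFIX-net'"
```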
Create a compact placement policy
To create a compact placement policy, select one of the following options.

gcloud
To create a compact placement policy, use the
gcloud beta compute resource-policies create group-placement command:
gcloud beta compute resource-policies create group-placement POLICY_NAME \
--collocation=collocated \
--gpu-topology=1x72 \
--region=REGION
Replace the following:
- POLICY_NAME: the name of the compact placement policy.
- REGION: the region where you want to create the compact placement policy. Specify a region in which the machine type that you want to use is available. For information about regions, see GPU availability by regions and zones.
REST
To create a compact placement policy, make a POST request to the
beta
resourcePolicies.insert method.
POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/regions/REGION/resourcePolicies
{
"name": "POLICY_NAME",
"groupPlacementPolicy": {
"collocation": "COLLOCATED",
"gpuTopology": "1x72"
}
}
Replace the following:
- PROJECT_ID: your project ID.
- POLICY_NAME: the name of the compact placement policy.
- REGION: the region where you want to create the compact placement policy. Specify a region in which the machine type that you want to use is available. For information about regions, see GPU availability by regions and zones.
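Regardless of which option you use, you can verify that the policy was created with the expected GPU topology. A sketch using the same placeholders:

```shell
# Describe the placement policy; the output should include the 1x72 GPU topology.
gcloud beta compute resource-policies describe POLICY_NAME --region=REGION
```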
Create A4X Max instances in bulk
To obtain a GPU topology of 1x72, create
18 A4X Max instances. When you create the instances, apply the compact placement policy that specifies the gpuTopology
field. Applying the policy ensures that Compute Engine creates all 18 A4X Max
instances in one sub-block to use an NVLink domain.
If a sub-block lacks capacity for all 18 A4X Max instances, then the bulk creation fails and doesn't create any instances.
If your workload can operate with fewer than 18 A4X Max instances, then you can set the minCount field to the minimum number of instances that your workload requires. If you want to use any available capacity, then set the minCount field to 1.
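With the gcloud CLI, the minCount field corresponds to the --min-count flag of the bulk create command. The following minimal sketch omits the networking, reservation, and scheduling flags shown in the full command in this section; MIN_COUNT is a placeholder:

```shell
# Request 18 instances, but let the request succeed if at least MIN_COUNT
# instances can be created; set --min-count=1 to use any available capacity.
gcloud compute instances bulk create \
    --name-pattern=instance-# \
    --count=18 \
    --min-count=MIN_COUNT \
    --machine-type=a4x-maxgpu-4g-metal \
    --region=REGION
```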
To create A4X Max instances in bulk, select one of the following options.
The following commands also set the access scope for your instances. To simplify permissions management, Google recommends that you set the access scope on an instance to cloud-platform and then use IAM roles to define what services the instance can
access. For more information, see
Scopes best practice.
gcloud
To create A4X Max instances in bulk, use the
gcloud compute instances bulk create command.
gcloud compute instances bulk create \
--name-pattern=NAME_PATTERN \
--count=COUNT \
--machine-type=a4x-maxgpu-4g-metal \
--image-family=IMAGE_FAMILY \
--image-project=IMAGE_PROJECT \
--region=REGION \
--boot-disk-type=hyperdisk-balanced \
--boot-disk-size=DISK_SIZE \
--scopes=cloud-platform \
--network-interface=nic-type=IDPF,network=IDPF_NETWORK_PREFIX-net,stack-type=IPV6_ONLY,subnet=IDPF_NETWORK_PREFIX-sub-0 \
--network-interface=nic-type=IDPF,network=IDPF_NETWORK_PREFIX-net,stack-type=IPV6_ONLY,subnet=IDPF_NETWORK_PREFIX-sub-1,no-address \
--network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=mrdma \
--network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=mrdma \
--network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=mrdma \
--network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=mrdma \
--network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=mrdma \
--network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=mrdma \
--network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=mrdma \
--network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=mrdma \
--reservation-affinity=specific \
--reservation=RESERVATION \
--provisioning-model=RESERVATION_BOUND \
--instance-termination-action=TERMINATION_ACTION \
--maintenance-policy=TERMINATE \
--restart-on-failure \
--resource-policies=POLICY_NAME
Replace the following:
- NAME_PATTERN: the name pattern to use for the A4X Max instances. For example, using instance-# for the name pattern generates A4X Max instances with names such as instance-1 and instance-2, up to the number of A4X Max instances specified by --count.
- COUNT: the number of A4X Max instances to create.
- IMAGE_FAMILY: the image family of the OS image that you want to use. For a list of supported operating systems, see Supported operating systems.
- IMAGE_PROJECT: the project ID of the OS image.
- REGION: a region in which the machine type that you want to use is available. You must use the same region as the compact placement policy. For information about regions, see GPU availability by regions and zones.
- DISK_SIZE: the size of the boot disk in GB.
- IDPF_NETWORK_PREFIX: the name prefix that you specified when creating the VPC network and subnets that use IDPF NICs.
- RDMA_NAME_PREFIX: the name prefix that you specified when creating the VPC network and subnet that use RDMA NICs.
- RESERVATION: the reservation name, a block, or a sub-block within a reservation. To get the reservation name or the available blocks, see View reserved capacity. Based on your requirements for instance placement, choose one of the following:
  - To create A4X Max instances on any single block: projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME
  - To create A4X Max instances on a specific block: projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME
  - To create A4X Max instances in a specific sub-block: projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME/reservationSubBlocks/RESERVATION_SUBBLOCK_NAME
- TERMINATION_ACTION: whether Compute Engine stops (STOP) or deletes (DELETE) the A4X Max instance at the end of the reservation period.
- POLICY_NAME: the name of the compact placement policy.
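After the request completes, you can confirm that the instances were created and are running. A sketch, assuming the instance-# name pattern from the example:

```shell
# List the created instances with their zones and statuses.
gcloud compute instances list \
    --filter="name~'^instance-'" \
    --format="table(name, zone, status)"
```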
REST
To create A4X Max instances in bulk, make a POST request to the
instances.bulkInsert method.
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/bulkInsert
{
"namePattern":"NAME_PATTERN",
"count":"COUNT",
"instanceProperties":{
"machineType":"a4x-maxgpu-4g-metal",
"disks":[
{
"boot":true,
"initializeParams":{
"diskSizeGb":"DISK_SIZE",
"diskType":"hyperdisk-balanced",
"sourceImage":"projects/IMAGE_PROJECT/global/images/family/IMAGE_FAMILY"
},
"mode":"READ_WRITE",
"type":"PERSISTENT"
}
],
"serviceAccounts": [
{
"email": "default",
"scopes": [
"https://www.googleapis.com/auth/cloud-platform"
]
}
],
"networkInterfaces": [
{
"accessConfigs": [
{
"name": "external-nat",
"type": "ONE_TO_ONE_NAT"
}
],
"network": "projects/NETWORK_PROJECT_ID/global/networks/IDPF_NETWORK_PREFIX-net",
"nicType": "IDPF",
"stackType": "IPV6_ONLY",
"subnetwork": "projects/NETWORK_PROJECT_ID/region/REGION/subnetworks/IDPF_NETWORK_PREFIX-sub-0"
},
{
"network": "projects/NETWORK_PROJECT_ID/global/networks/IDPF_NETWORK_PREFIX-net",
"nicType": "IDPF",
"stackType": "IPV6_ONLY",
"subnetwork": "projects/NETWORK_PROJECT_ID/region/REGION/subnetworks/IDPF_NETWORK_PREFIX-sub-1"
},
{
"subnetwork": "projects/NETWORK_PROJECT_ID/region/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
"nicType": "MRDMA",
"stackType": "IPV6_ONLY"
},
{
"subnetwork": "projects/NETWORK_PROJECT_ID/region/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
"nicType": "MRDMA",
"stackType": "IPV6_ONLY"
},
{
"subnetwork": "projects/NETWORK_PROJECT_ID/region/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
"nicType": "MRDMA",
"stackType": "IPV6_ONLY"
},
{
"subnetwork": "projects/NETWORK_PROJECT_ID/region/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
"nicType": "MRDMA",
"stackType": "IPV6_ONLY"
},
{
"subnetwork": "projects/NETWORK_PROJECT_ID/region/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
"nicType": "MRDMA",
"stackType": "IPV6_ONLY"
},
{
"subnetwork": "projects/NETWORK_PROJECT_ID/region/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
"nicType": "MRDMA",
"stackType": "IPV6_ONLY"
},
{
"subnetwork": "projects/NETWORK_PROJECT_ID/region/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
"nicType": "MRDMA",
"stackType": "IPV6_ONLY"
},
{
"subnetwork": "projects/NETWORK_PROJECT_ID/region/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
"nicType": "MRDMA",
"stackType": "IPV6_ONLY"
}
],
"reservationAffinity":{
"consumeReservationType":"SPECIFIC_RESERVATION",
"key":"compute.googleapis.com/reservation-name",
"values":[
"RESERVATION"
]
},
"scheduling":{
"provisioningModel":"RESERVATION_BOUND",
"instanceTerminationAction":"DELETE",
"onHostMaintenance": "TERMINATE",
"automaticRestart":true
},
"resourcePolicies": [
"projects/PROJECT_ID/regions/REGION/resourcePolicies/POLICY_NAME"
]
}
}
Replace the following:
- PROJECT_ID: the project ID of the project where you want to create the A4X Max instances.
- ZONE: a zone in which the machine type that you want to use is available. You must use a zone in the same region as the compact placement policy. For information about regions, see GPU availability by regions and zones.
- NAME_PATTERN: the name pattern to use for the A4X Max instances. For example, using instance-# for the name pattern generates A4X Max instances with names such as instance-1 and instance-2, up to the number of A4X Max instances specified by count.
- COUNT: the number of A4X Max instances to create.
- DISK_SIZE: the size of the boot disk in GB.
- IMAGE_PROJECT: the project ID of the OS image.
- IMAGE_FAMILY: the image family of the OS image that you want to use. For a list of supported operating systems, see Supported operating systems.
- NETWORK_PROJECT_ID: the project ID of the network.
- IDPF_NETWORK_PREFIX: the name prefix that you specified when creating the VPC network and subnets that use IDPF NICs.
- REGION: the region of the subnetwork.
- RDMA_NAME_PREFIX: the name prefix that you specified when creating the VPC network and subnet that use RDMA NICs.
- RESERVATION: the reservation name, a block, or a sub-block within a reservation. To get the reservation name or the available blocks, see View reserved capacity. Based on your requirements for instance placement, choose one of the following:
  - To create A4X Max instances on any single block: projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME
  - To create A4X Max instances on a specific block: projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME
  - To create A4X Max instances in a specific sub-block: projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME/reservationSubBlocks/RESERVATION_SUBBLOCK_NAME
- TERMINATION_ACTION: whether Compute Engine stops (STOP) or deletes (DELETE) the A4X Max instance at the end of the reservation period.
- PROJECT_ID: the project ID of the compact placement policy.
- REGION: the region of the compact placement policy.
- POLICY_NAME: the name of the compact placement policy.
For more information about the configuration options when creating compute instances in bulk, see Create VMs in bulk in the Compute Engine documentation.