Create a MIG with a multi-host Cloud TPU slice
This document describes how to create a managed instance group (MIG) with a multi-host TPU slice.
Prerequisites
Complete the following prerequisites:
- Create a project for your TPUs as described in Set up a project for TPUs.
- Determine your TPU requirements as described in Plan your resources.
Create a MIG with multi-host TPU slices
To create a MIG with a multi-host TPU slice, complete the following steps in order:
- Create an instance template.
- Create a workload policy.
- Create the MIG.
Create an instance template
The command to create an instance template depends on the consumption option you use: on-demand, Spot, reservation-bound, or flex-start. For more information about consumption options, see Plan your TPU resources.
Create an instance template for an on-demand TPU VM
The following command creates an instance template using the on-demand consumption option:
gcloud compute instance-templates create INSTANCE_TEMPLATE_NAME \
--machine-type=MACHINE_TYPE \
--maintenance-policy=TERMINATE \
--image-family=IMAGE_FAMILY \
--image-project=IMAGE_PROJECT
Replace the following placeholders:
- INSTANCE_TEMPLATE_NAME: The name of your instance template.
- MACHINE_TYPE: The machine type for the TPU VM, for example, ct6e-standard-8t.
- IMAGE_FAMILY: The OS image family for the TPU VM. If you want to install a specific OS version, use the --image flag. For more information about OS images, see OS images.
- IMAGE_PROJECT: The project that contains the OS image. For TPU images, this is ubuntu-os-accelerator-images.
Create an instance template for a TPU Spot VM
The following command creates an instance template using the Spot consumption option:
gcloud compute instance-templates create INSTANCE_TEMPLATE_NAME \
--machine-type=MACHINE_TYPE \
--maintenance-policy=TERMINATE \
--instance-termination-action=STOP \
--provisioning-model=SPOT \
--image-family=IMAGE_FAMILY \
--image-project=IMAGE_PROJECT
Replace the following placeholders:
- INSTANCE_TEMPLATE_NAME: The name of your instance template.
- MACHINE_TYPE: The machine type for the TPU VM, for example, ct6e-standard-8t.
- IMAGE_FAMILY: The OS image family for the TPU VM. If you want to install a specific OS version, use the --image flag. For more information about OS images, see OS images.
- IMAGE_PROJECT: The project that contains the OS image. For TPU images, this is ubuntu-os-accelerator-images.
Create an instance template for a TPU reservation-bound VM
The following command creates an instance template using the reservation-bound consumption option:
gcloud compute instance-templates create INSTANCE_TEMPLATE_NAME \
--machine-type=MACHINE_TYPE \
--maintenance-policy=TERMINATE \
--instance-termination-action=DELETE \
--reservation-affinity=specific \
--provisioning-model=reservation-bound \
--reservation=RESERVATION_NAME \
--image-family=IMAGE_FAMILY \
--image-project=IMAGE_PROJECT
Replace the following placeholders:
- INSTANCE_TEMPLATE_NAME: The name of your instance template.
- MACHINE_TYPE: The machine type for the TPU VM, for example, ct6e-standard-8t.
- RESERVATION_NAME: The name of the specific reservation to consume.
- IMAGE_FAMILY: The OS image family for the TPU VM. If you want to install a specific OS version, use the --image flag. For more information about OS images, see OS images.
- IMAGE_PROJECT: The project that contains the OS image. For TPU images, this is ubuntu-os-accelerator-images.
Create an instance template for a TPU Flex-start VM
The following command creates an instance template using the flex-start consumption option:
gcloud compute instance-templates create INSTANCE_TEMPLATE_NAME \
--machine-type=MACHINE_TYPE \
--maintenance-policy=TERMINATE \
--instance-termination-action=DELETE \
--provisioning-model=FLEX_START \
--max-run-duration=DURATION \
--image-family=IMAGE_FAMILY \
--image-project=IMAGE_PROJECT
Replace the following placeholders:
- INSTANCE_TEMPLATE_NAME: The name of your instance template.
- MACHINE_TYPE: The machine type for the TPU VM, for example, ct6e-standard-8t.
- DURATION: The maximum duration for which the TPU VM can run.
- IMAGE_FAMILY: The OS image family for the TPU VM. If you want to install a specific OS version, use the --image flag. For more information about OS images, see OS images.
- IMAGE_PROJECT: The project that contains the OS image. For TPU images, this is ubuntu-os-accelerator-images.
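The four commands in this section share the same base flags and differ only in the consumption-specific ones. As a minimal sketch, the following shell function prints just those differing flags so that one script can cover every option. The function name consumption_flags and its argument order are illustrative assumptions and not part of gcloud; the flag values are copied from the commands in this section.

```shell
#!/bin/sh
# consumption_flags is a hypothetical helper, not a gcloud feature.
# Usage: consumption_flags OPTION [VALUE]
#   VALUE is the reservation name (reservation-bound) or the
#   maximum run duration (flex-start).
consumption_flags() {
  case "$1" in
    on-demand)
      # On-demand needs no flags beyond the shared ones.
      ;;
    spot)
      printf '%s\n' \
        '--instance-termination-action=STOP' \
        '--provisioning-model=SPOT'
      ;;
    reservation-bound)
      printf '%s\n' \
        '--instance-termination-action=DELETE' \
        '--reservation-affinity=specific' \
        '--provisioning-model=reservation-bound' \
        "--reservation=$2"
      ;;
    flex-start)
      printf '%s\n' \
        '--instance-termination-action=DELETE' \
        '--provisioning-model=FLEX_START' \
        "--max-run-duration=$2"
      ;;
    *)
      echo "unknown consumption option: $1" >&2
      return 1
      ;;
  esac
}

# Print the extra flags that a Spot template needs, one per line.
consumption_flags spot
```

The output can be spliced into the shared command, for example by placing $(consumption_flags spot) between --maintenance-policy=TERMINATE and --image-family=IMAGE_FAMILY. The unquoted substitution relies on word splitting, so this sketch assumes that the reservation name and duration contain no whitespace.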
Create a workload policy
You must create a workload policy with the accelerator-topology parameter (for
example, 4x4, 8x8, or 4x4x4). The accelerator topology configures the MIG
to treat the instances as a single, interconnected slice.
The following command creates a workload policy:
gcloud compute resource-policies create workload-policy WORKLOAD_POLICY_NAME \
--type=high-throughput \
--accelerator-topology=TOPOLOGY \
--region=REGION
Replace the following placeholders:
- WORKLOAD_POLICY_NAME: The name of your workload policy.
- TOPOLOGY: The topology of the TPU VMs, for example, 4x4x8. For more information about topology for each version of TPU, see System architecture.
- REGION: The region for your workload policy.
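A topology string is the chip arrangement of the slice, so the product of its dimensions is the total chip count. The following sketch checks that a topology string is well formed and derives the chip count and the number of VMs from it. The function names topology_chips and slice_vm_count are hypothetical, and the chips-per-VM value (4 in the example) is an assumption that you must confirm for your machine type. The check covers only the string format, not whether a topology is valid for a given TPU version.

```shell
#!/bin/sh
# Hypothetical helpers for working with TPU topology strings.

# Print the chip count implied by a topology such as 4x4 or 4x4x8.
topology_chips() {
  case "$1" in
    *[!0-9x]*|x*|*x|*xx*|0*|*x0*)
      # Reject stray characters, empty dimensions, and leading zeros.
      echo "invalid topology: $1" >&2
      return 1
      ;;
    *x*)
      # At least two dimensions: accepted.
      ;;
    *)
      echo "invalid topology: $1" >&2
      return 1
      ;;
  esac
  (
    # Split on "x" inside a subshell so the caller's IFS is untouched.
    IFS=x
    chips=1
    for dim in $1; do
      chips=$((chips * dim))
    done
    echo "$chips"
  )
}

# Print the VM count for a topology, given the chips attached to each VM.
# The chips-per-VM argument is an assumption supplied by the caller.
slice_vm_count() {
  total=$(topology_chips "$1") || return 1
  case "$2" in
    ''|*[!0-9]*|0*)
      echo "invalid chips per VM: $2" >&2
      return 1
      ;;
  esac
  if [ $((total % $2)) -ne 0 ]; then
    echo "$total chips do not divide evenly into VMs with $2 chips" >&2
    return 1
  fi
  echo $((total / $2))
}

topology_chips 4x4x8     # prints 128
slice_vm_count 4x4 4     # prints 4
```

The VM count computed this way is the value to pass as MIG_SIZE when you create the MIG.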
Create a MIG
Create a zonal or a regional MIG by using the gcloud compute instance-groups managed create command as follows:
To create a zonal MIG containing a multi-host TPU slice, use the following command:
gcloud compute instance-groups managed create MIG_NAME \
--size=MIG_SIZE \
--target-size-policy-mode=bulk \
--template=INSTANCE_TEMPLATE_URL \
--zone=ZONE \
--default-action-on-vm-failure=do-nothing \
--workload-policy=WORKLOAD_POLICY_URL
To create a regional MIG containing a multi-host TPU slice, use the following command:
gcloud compute instance-groups managed create MIG_NAME \
--size=MIG_SIZE \
--target-size-policy-mode=bulk \
--template=INSTANCE_TEMPLATE_URL \
--region=REGION \
--default-action-on-vm-failure=do-nothing \
--workload-policy=WORKLOAD_POLICY_URL \
--target-distribution-shape=any-single-zone \
--instance-redistribution-type=none
Replace the following placeholders:
- MIG_NAME: The name of your MIG.
- MIG_SIZE: The number of VMs in the MIG.
- INSTANCE_TEMPLATE_URL: The URL of the instance template that you want to use to create instances in the MIG. The URL can contain either the ID or name of the instance template. Specify one of the following values:
  - For a regional instance template: projects/PROJECT_ID/regions/REGION/instanceTemplates/INSTANCE_TEMPLATE_ID
  - For a global instance template: INSTANCE_TEMPLATE_ID
- ZONE: The zone for your MIG.
- REGION: The region for your MIG.
- WORKLOAD_POLICY_URL: The URL of the workload policy that you want to use to create instances in the MIG. For example: projects/PROJECT_ID/regions/WORKLOAD_POLICY_REGION/resourcePolicies/WORKLOAD_POLICY_NAME.
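The two URL placeholders follow fixed patterns, so they can be assembled from their parts rather than typed by hand. The following sketch builds both URLs and prints a filled-in zonal command for review without running it. The helper names and every value (my-project, us-central1, us-central1-a, tpu-template, tpu-policy, tpu-mig, and a size of 4) are hypothetical examples, not recommendations.

```shell
#!/bin/sh
# Hypothetical helpers that follow the URL patterns in the list above.

# Arguments: PROJECT_ID REGION INSTANCE_TEMPLATE_ID
regional_template_url() {
  printf 'projects/%s/regions/%s/instanceTemplates/%s\n' "$1" "$2" "$3"
}

# Arguments: PROJECT_ID WORKLOAD_POLICY_REGION WORKLOAD_POLICY_NAME
workload_policy_url() {
  printf 'projects/%s/regions/%s/resourcePolicies/%s\n' "$1" "$2" "$3"
}

# Example values only; replace them with your own.
PROJECT_ID="my-project"
REGION="us-central1"
ZONE="us-central1-a"

TEMPLATE_URL=$(regional_template_url "$PROJECT_ID" "$REGION" "tpu-template")
POLICY_URL=$(workload_policy_url "$PROJECT_ID" "$REGION" "tpu-policy")

# Print the zonal command so it can be reviewed before it is run.
cat <<EOF
gcloud compute instance-groups managed create tpu-mig \\
--size=4 \\
--target-size-policy-mode=bulk \\
--template=$TEMPLATE_URL \\
--zone=$ZONE \\
--default-action-on-vm-failure=do-nothing \\
--workload-policy=$POLICY_URL
EOF
```

Printing the command instead of running it keeps the sketch free of side effects; once the output looks right, it can be pasted into a terminal.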
Create VMs with custom names in a MIG
You can create VMs in a MIG by specifying custom names for each VM. This is useful for debugging and ensuring instances are created in a specific order.
MIGs that contain a multi-host TPU slice use the bulk mode of target size policy. When creating VMs with custom names in such a MIG, the following applies:
- You must first verify that the MIG doesn't have VMs in it. If the MIG has VMs, you must either resize the MIG to target size 0 or create another MIG with target size 0.
- You can only use the REST API to create VMs with custom names.
Create VMs with custom names by using one of the following REST API methods:
- For a zonal MIG, use the instanceGroupManagers.createInstances method:
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instanceGroupManagers/MIG_NAME/createInstances
{
  "instances": [
    { "name": "INSTANCE_NAME_1" },
    { "name": "INSTANCE_NAME_2" },
    ...
  ]
}
- For a regional MIG, use the regionInstanceGroupManagers.createInstances method:
POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/regions/REGION/instanceGroupManagers/MIG_NAME/createInstances
{
  "instances": [
    { "name": "INSTANCE_NAME_1" },
    { "name": "INSTANCE_NAME_2" },
    ...
  ]
}
Replace the following placeholders:
- PROJECT_ID: The ID of the project where the MIG exists.
- ZONE: The zone of the MIG.
- REGION: The region of the MIG.
- INSTANCE_NAME_1, INSTANCE_NAME_2, ...: The names of the VMs to add to the specified MIG.
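Writing the request body by hand gets tedious for slices with many hosts. As a rough sketch, the following function generates the JSON body from a list of VM names. The function name build_create_instances_body and the names vm-0 and vm-1 are hypothetical. The function does no JSON escaping, on the assumption that VM names contain only lowercase letters, digits, and hyphens.

```shell
#!/bin/sh
# Hypothetical helper: print a createInstances request body
# for the VM names passed as arguments.
build_create_instances_body() {
  printf '{ "instances": ['
  sep=' '
  for name in "$@"; do
    printf '%s{ "name": "%s" }' "$sep" "$name"
    sep=', '
  done
  printf ' ] }\n'
}

build_create_instances_body vm-0 vm-1
# prints { "instances": [ { "name": "vm-0" }, { "name": "vm-1" } ] }

# One possible way to send the body for a zonal MIG, shown as a comment
# so that this sketch makes no API calls:
#   curl -X POST \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     -H "Content-Type: application/json" \
#     -d "$(build_create_instances_body vm-0 vm-1)" \
#     "https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instanceGroupManagers/MIG_NAME/createInstances"
```

Passing the names in the order you want them created keeps the request aligned with the ordering use case described earlier in this section.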
What's next
- Learn about TPU VMs and MIGs.
- Learn how to Create a MIG with a single-host Cloud TPU slice.
- Learn how to manage TPU VMs.
- Learn about TPUs in GKE.
- Learn how to run an ML workload on TPUs, for example, Serve Qwen2-72B-Instruct with vLLM on TPUs.