Create TPU Flex-start VMs with Compute Engine
TPU Flex-start VMs, powered by Dynamic Workload Scheduler, offer a flexible, cost-effective way to access TPU resources for AI workloads for up to 7 days without long-term reservations. When you request TPU Flex-start VMs, your request remains in a queue until capacity is available. Once provisioned, the TPU VMs run for your specified duration.
TPU Flex-start VMs are a good fit for quick experimentation, small-scale testing, dynamic provisioning of TPUs for inference workloads, model fine-tuning, and workload runs that take less than 7 days. For more information about other TPU consumption options, see Cloud TPU consumption options.
You can delete your TPU resources at any time to stop billing. For more information about TPU pricing, see Cloud TPU pricing.
Limitations
TPU Flex-start VMs have the following limitations:
- You can request TPU Flex-start VMs for a duration of up to 7 days.
- You can request the following Cloud TPU versions and zones:
MIGs with TPUs have the following limitations:
Lifecycle operations: You can't stop, start, resume, or suspend TPU instances. To change configurations that require a restart or to stop incurring charges, you must delete the instances.
Regional MIG zone distribution: You must set the target distribution shape to ANY_SINGLE_ZONE.
Configuration updates in a MIG:
- You can't update a MIG that forms a multi-host TPU slice because of its defined accelerator topology.
- You can update a MIG that forms single-host TPU slices by using the automatic or selective methods. However, updates for single-host TPU slices don't support the restart (RESTART) action. If a restart is necessary and the most disruptive action allowed is replace (REPLACE), then the updater replaces the instance; otherwise, the update attempt fails with an error.
For a MIG that forms a multi-host TPU slice, the following limitations also apply:
Target size policy: You must set the target size policy mode to BULK. After you set this mode, you can't change it.
Target size: In bulk mode, you can set the target size to either 0 or the number of instances that are required to form the accelerator topology.
Workload policy: You must specify a workload policy in which the accelerator topology is defined. After you set the workload policy, you can't change or remove the policy from the MIG.
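As a rough illustration of the target size rule, the following sketch derives the instance count for a multi-host slice from its topology. It assumes that the topology dimensions multiply to the total number of TPU chips and that every VM hosts the same number of chips. The helper name slice_vm_count and the value of 4 chips per VM are illustrative assumptions, so check the chip count for your machine type.

```shell
# Sketch only: slice_vm_count is a hypothetical helper, not part of gcloud.
# Usage: slice_vm_count TOPOLOGY CHIPS_PER_VM
# Prints the number of VMs needed to form the topology.
# The body runs in a subshell so the IFS change stays local.
slice_vm_count() (
  topology=$1
  chips_per_vm=$2

  # Accept only positive integers separated by "x" (for example, 4x4 or 4x4x8).
  case $topology in
    ''|*[!0-9x]*|x*|*x|*xx*|0*|*x0*)
      echo "invalid topology: $topology" >&2; exit 1 ;;
  esac
  case $chips_per_vm in
    ''|*[!0-9]*|0*)
      echo "invalid chips per VM: $chips_per_vm" >&2; exit 1 ;;
  esac

  # Multiply the dimensions to get the total chip count.
  total=1
  IFS=x
  for dim in $topology; do
    total=$(( total * dim ))
  done

  if [ $(( total % chips_per_vm )) -ne 0 ]; then
    echo "topology $topology does not divide evenly into VMs of $chips_per_vm chips" >&2
    exit 1
  fi
  echo $(( total / chips_per_vm ))
)

slice_vm_count 4x4x8 4   # assumed 4 chips per VM; prints 32
```

Under these assumptions, the only valid target sizes for that MIG would be 0 and the printed value.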
Unsupported features: MIGs with TPUs don't support the following features:
- Instance flexibility
- Resize requests to obtain resources all at once
- Stateful configuration
- For a MIG that forms a multi-host TPU slice, the following are also not supported:
Before you begin
Before requesting TPU Flex-start VMs, you must:
- Install the Google Cloud CLI
- Create a Google Cloud project
- Enable the Compute Engine API (compute.googleapis.com)
- Ensure you have the required permissions:
  - roles/compute.instanceAdmin.v1
  - roles/iam.serviceAccountUser
For more information, see Set up a Google Cloud project for TPUs.
Ensure you have sufficient preemptible quota to use TPU Flex-start VMs. If your workload requires more cores than your current allocation, you can request a quota increase. For details, see Cloud TPU quotas.
Create TPU Flex-start VMs with MIGs
To use TPU Flex-start VMs, you create a managed instance group (MIG) with a specific instance template configuration.
For general instructions on creating Flex-start VMs, see Create Flex-start VMs.
Create TPU Flex-start VMs with a multi-host slice
Create an instance template
Create an instance template specifying the FLEX_START provisioning model and
your chosen run duration.
gcloud compute instance-templates create TEMPLATE_NAME \
--machine-type=MACHINE_TYPE \
--image-family=IMAGE_FAMILY \
--image-project=IMAGE_PROJECT \
--provisioning-model=FLEX_START \
--instance-termination-action=DELETE \
--max-run-duration=DURATION \
--region=REGION \
--maintenance-policy=TERMINATE
Replace the following placeholders:
- TEMPLATE_NAME: The name of your instance template.
- MACHINE_TYPE: The machine type for the TPU VM (for example, ct6e-standard-8t).
- IMAGE_FAMILY: The OS image family for the TPU VM (for example, ubuntu-accelerator-2204-amd64-with-tpu-v6e).
- IMAGE_PROJECT: The OS image project for the TPU VM (for example, ubuntu-os-accelerator-images).
- DURATION: The maximum run duration (for example, 7d for 7 days).
- REGION: The region in which to create the instance template.
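Because requests are limited to 7 days, it can help to check a duration before you put it in the template. The following sketch is one way to do that in POSIX shell. It assumes durations are written as integer-plus-unit pairs such as 7d or 1d12h. The 7d form is the one this page uses; treating other forms as valid input for --max-run-duration is an assumption. Both helper names are hypothetical.

```shell
# Sketch only: both helpers are hypothetical and not part of gcloud.
# duration_to_seconds converts strings such as 7d, 36h, or 1d12h to seconds.
duration_to_seconds() (
  rest=$1
  total=0
  [ -n "$rest" ] || { echo "empty duration" >&2; exit 1; }

  while [ -n "$rest" ]; do
    num=${rest%%[!0-9]*}          # leading digits
    rest=${rest#"$num"}
    unit=${rest%"${rest#?}"}      # first character after the digits
    rest=${rest#?}
    case $num in
      ''|0?*) echo "unrecognized duration: $1" >&2; exit 1 ;;
    esac
    case $unit in
      d) mult=86400 ;;
      h) mult=3600 ;;
      m) mult=60 ;;
      s) mult=1 ;;
      *) echo "unrecognized duration: $1" >&2; exit 1 ;;
    esac
    total=$(( total + num * mult ))
  done
  echo "$total"
)

# Succeeds when the duration is positive and no longer than 7 days.
within_flex_start_limit() (
  seconds=$(duration_to_seconds "$1") || exit 1
  [ "$seconds" -gt 0 ] && [ "$seconds" -le $(( 7 * 86400 )) ]
)

if within_flex_start_limit 7d; then
  echo "7d is within the limit"
fi
```

A value such as 8d parses correctly but fails the limit check, so you can catch it before you create the template.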
Create a workload policy
The following command creates a workload policy that defines the accelerator topology. A workload policy is required for multi-host slices and optional for single-host slices.
gcloud compute resource-policies create workload WORKLOAD_POLICY_NAME \
--type=high-throughput \
--accelerator-topology=TOPOLOGY
Replace the following placeholders:
- WORKLOAD_POLICY_NAME: The name of your workload policy.
- TOPOLOGY: The topology of the TPU VMs (for example, 4x4x8).
Create the MIG
Create the MIG using the template.
gcloud compute instance-groups managed create MIG_NAME \
--zone=ZONE \
--template=TEMPLATE_NAME \
--size=SIZE \
--workload-policy=projects/PROJECT_ID/regions/WORKLOAD_POLICY_REGION/resourcePolicies/WORKLOAD_POLICY_NAME \
--target-size-policy-mode=bulk
Replace the following placeholders:
- MIG_NAME: The name of your MIG.
- ZONE: The zone of your MIG.
- TEMPLATE_NAME: The name of your instance template.
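- SIZE: The number of instances to create. In bulk mode, this must be either 0 or the number of instances required to form the accelerator topology.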
- PROJECT_ID: The ID of your Google Cloud project.
- WORKLOAD_POLICY_REGION: The region where the workload policy is defined.
- WORKLOAD_POLICY_NAME: The name of your workload policy.
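The --workload-policy value is a full resource path, which is easy to mistype. The following sketch assembles it from its parts. The helper names and sample values are hypothetical. Deriving the region from the zone assumes that the policy was created in the region that contains the MIG's zone.

```shell
# Sketch only: both helpers are hypothetical and not part of gcloud.
# zone_to_region strips the trailing zone suffix, so us-east5-a becomes us-east5.
zone_to_region() {
  printf '%s\n' "${1%-*}"
}

# Usage: workload_policy_path PROJECT_ID REGION POLICY_NAME
workload_policy_path() {
  printf 'projects/%s/regions/%s/resourcePolicies/%s\n' "$1" "$2" "$3"
}

# Placeholder values; replace them with your own.
project_id="my-project"
zone="us-east5-a"
policy_name="my-workload-policy"

workload_policy_path "$project_id" "$(zone_to_region "$zone")" "$policy_name"
```

With the sample values, this prints projects/my-project/regions/us-east5/resourcePolicies/my-workload-policy, which you can pass to --workload-policy.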
Create TPU Flex-start VMs with single-host slices
Create an instance template
Create an instance template specifying the FLEX_START provisioning model and
your chosen run duration.
gcloud compute instance-templates create TEMPLATE_NAME \
--machine-type=MACHINE_TYPE \
--image-family=IMAGE_FAMILY \
--image-project=IMAGE_PROJECT \
--provisioning-model=FLEX_START \
--instance-termination-action=DELETE \
--max-run-duration=DURATION \
--region=REGION \
--maintenance-policy=TERMINATE
Replace the following placeholders:
- TEMPLATE_NAME: The name of your instance template.
- MACHINE_TYPE: The machine type for the TPU VM (for example, ct6e-standard-8t).
- IMAGE_FAMILY: The OS image family for the TPU VM (for example, ubuntu-accelerator-2204-amd64-with-tpu-v6e).
- IMAGE_PROJECT: The OS image project for the TPU VM (for example, ubuntu-os-accelerator-images).
- DURATION: The maximum run duration (for example, 7d for 7 days).
- REGION: The region in which to create the instance template.
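To see how the placeholders fit together, the following sketch fills in the template command with the example values from the list above and prints it for review instead of running it. The template name and region are hypothetical values chosen for illustration, and render_template_command is a hypothetical helper.

```shell
# Sketch only: prints the command so you can review it before running it.
# The template name and region are hypothetical; the other values are the
# examples given in the placeholder list.
TEMPLATE_NAME="my-tpu-flex-template"
MACHINE_TYPE="ct6e-standard-8t"
IMAGE_FAMILY="ubuntu-accelerator-2204-amd64-with-tpu-v6e"
IMAGE_PROJECT="ubuntu-os-accelerator-images"
DURATION="7d"
REGION="us-east5"

render_template_command() {
  # In an unquoted here-document, "\\" produces one literal backslash.
  cat <<EOF
gcloud compute instance-templates create ${TEMPLATE_NAME} \\
  --machine-type=${MACHINE_TYPE} \\
  --image-family=${IMAGE_FAMILY} \\
  --image-project=${IMAGE_PROJECT} \\
  --provisioning-model=FLEX_START \\
  --instance-termination-action=DELETE \\
  --max-run-duration=${DURATION} \\
  --region=${REGION} \\
  --maintenance-policy=TERMINATE
EOF
}

render_template_command
```

Printing the command first is a deliberate choice: it lets you confirm the provisioning model and run duration before any resource is created.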
Create a workload policy
The following command creates a workload policy. This is optional for single-host slices.
gcloud compute resource-policies create workload WORKLOAD_POLICY_NAME \
--type=high-throughput
Replace the following placeholders:
- WORKLOAD_POLICY_NAME: A name for your workload policy.
Create the MIG
Create the MIG using the template.
gcloud compute instance-groups managed create MIG_NAME \
--zone=ZONE \
--template=TEMPLATE_NAME \
--size=SIZE \
--workload-policy=projects/PROJECT_ID/regions/WORKLOAD_POLICY_REGION/resourcePolicies/WORKLOAD_POLICY_NAME
Replace the following placeholders:
- MIG_NAME: The name of your MIG.
- ZONE: The zone of your MIG.
- TEMPLATE_NAME: The name of your instance template.
- SIZE: The number of instances to create.
- PROJECT_ID: The ID of your Google Cloud project.
- WORKLOAD_POLICY_REGION: The region where the workload policy is defined.
- WORKLOAD_POLICY_NAME: The name of your workload policy.
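Because TPU instances can't be stopped, deleting the MIG is how you end charges before the run duration expires. The following sketch wraps the delete command in a dry-run helper so that nothing is removed until you opt in. The run and delete_mig helpers, the DRY_RUN variable, and the sample names are hypothetical.

```shell
# Sketch only: run and delete_mig are hypothetical helpers.
# With DRY_RUN=1 (the default here), commands are printed rather than executed.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: delete_mig MIG_NAME ZONE
# Deleting a MIG also deletes the instances that it manages.
delete_mig() {
  run gcloud compute instance-groups managed delete "$1" --zone="$2" --quiet
}

delete_mig my-mig us-east5-a
```

Set DRY_RUN=0 to run the command for real. The --quiet flag skips the confirmation prompt, so review the printed command first.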