Request TPU Spot VMs

Spot VMs offer unused capacity at significantly discounted rates. Spot VMs can be preempted at any time but, unlike preemptible TPUs, they have no maximum runtime duration. To restart a Spot VM instance or MIG, you must delete and then recreate it.
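
The delete-and-recreate requirement can be sketched as a small shell helper. This is a minimal sketch, not an official recovery procedure: the function name `recreate_spot_vm` and the `GCLOUD` override variable are hypothetical, and the flags mirror the single-VM create command shown later on this page.

```shell
# Sketch: recreate a single TPU Spot VM after it was preempted.
# GCLOUD is a hypothetical override; set GCLOUD=echo to preview the commands.
GCLOUD="${GCLOUD:-gcloud}"

recreate_spot_vm() {
  local name="$1" zone="$2" machine_type="$3" image_family="$4" image_project="$5"

  # Delete the old instance. The "|| true" ignores any failure here,
  # for example when preemption has already deleted the instance.
  "$GCLOUD" compute instances delete "$name" --zone="$zone" --quiet || true

  # Create a new instance with the same Spot settings.
  "$GCLOUD" compute instances create "$name" \
    --zone="$zone" \
    --provisioning-model=SPOT \
    --instance-termination-action=DELETE \
    --machine-type="$machine_type" \
    --image-family="$image_family" \
    --image-project="$image_project" \
    --maintenance-policy=TERMINATE
}
```

For example, `recreate_spot_vm TPU_NAME ZONE MACHINE_TYPE IMAGE_FAMILY IMAGE_PROJECT` runs the two commands in order. Because Spot capacity might not be available right away, the create step can fail and might need to be retried.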

For more information about Spot VMs, see the Compute Engine documentation about Spot VMs.

Create TPU Spot VMs

You can create TPU Spot VMs as individual instances or as part of a managed instance group (MIG).

To create TPU Spot VMs with a MIG, specify the Spot provisioning model in your instance template.

  1. Create an instance template:

        gcloud compute instance-templates create TEMPLATE_NAME \
            --provisioning-model=SPOT \
            --instance-termination-action=DELETE \
            --machine-type=MACHINE_TYPE \
            --image-family=IMAGE_FAMILY \
            --image-project=IMAGE_PROJECT \
            --zone=ZONE \
            --maintenance-policy=TERMINATE
    

    Replace the following placeholders:

    • TEMPLATE_NAME: The name of the instance template.
    • MACHINE_TYPE: The machine type of the VM.
    • IMAGE_FAMILY: The OS image family for the TPU VM.
    • IMAGE_PROJECT: The OS image project for the TPU VM.
    • ZONE: The zone where the instance template is created.
  2. Create a workload policy. This step is optional for single-host slices:

        gcloud compute resource-policies create workload-policy WORKLOAD_POLICY_NAME \
            --type=high-throughput \
            --accelerator-topology=TOPOLOGY
    

    Replace the following placeholders:

    • WORKLOAD_POLICY_NAME: The name of your workload policy.
    • TOPOLOGY: The topology of the TPU VMs, for example, 4x4x8.
  3. Create the MIG:

        gcloud compute instance-groups managed create MIG_NAME \
            --zone=ZONE \
            --template=TEMPLATE_NAME \
            --size=SIZE \
            --workload-policy=projects/PROJECT_ID/regions/WORKLOAD_POLICY_REGION/resourcePolicies/WORKLOAD_POLICY_NAME
    

    Replace the following placeholders:

    • MIG_NAME: The name of the MIG.
    • ZONE: The zone where the MIG is created.
    • TEMPLATE_NAME: The name of the instance template.
    • SIZE: The number of instances in the MIG.
    • PROJECT_ID: The ID of your Google Cloud project.
    • WORKLOAD_POLICY_REGION: The region where the workload policy is defined.
    • WORKLOAD_POLICY_NAME: The name of your workload policy.
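
The value passed to `--workload-policy` is a full resource path, which is easy to mistype. The following sketch builds it from its parts. The helper name `workload_policy_path` is hypothetical, and it assumes that the workload policy is defined in the region that contains the MIG's zone; if your policy is in a different region, pass that region's path instead.

```shell
# Sketch: build the --workload-policy resource path.
# Assumption: the policy is in the same region as the MIG's zone.
workload_policy_path() {
  local project_id="$1" zone="$2" policy_name="$3"

  # Strip the trailing zone suffix, for example us-central1-a -> us-central1.
  local region="${zone%-*}"

  printf 'projects/%s/regions/%s/resourcePolicies/%s\n' \
    "$project_id" "$region" "$policy_name"
}
```

You could then pass `--workload-policy="$(workload_policy_path PROJECT_ID ZONE WORKLOAD_POLICY_NAME)"` when you create the MIG.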

Create a single TPU Spot VM

To create a single TPU Spot VM, specify the Spot provisioning model when you create the instance:

gcloud compute instances create TPU_NAME \
    --zone=ZONE \
    --provisioning-model=SPOT \
    --instance-termination-action=DELETE \
    --machine-type=MACHINE_TYPE \
    --image-family=IMAGE_FAMILY \
    --image-project=IMAGE_PROJECT \
    --maintenance-policy=TERMINATE

Replace the following placeholders:

  • TPU_NAME: The name of the TPU.
  • ZONE: The zone where the TPU is created.
  • MACHINE_TYPE: The machine type of the VM.
  • IMAGE_FAMILY: The OS image family for the TPU VM.
  • IMAGE_PROJECT: The OS image project for the TPU VM.
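
After the instance is created, you might want to confirm that it uses the Spot provisioning model. The following is a minimal sketch: the function name `is_spot_vm` and the `GCLOUD` override variable are hypothetical, and it assumes that the instance's `scheduling.provisioningModel` field reports `SPOT` for Spot VMs.

```shell
# Sketch: check whether an instance uses the Spot provisioning model.
# GCLOUD is a hypothetical override, useful for testing without the real CLI.
GCLOUD="${GCLOUD:-gcloud}"

is_spot_vm() {
  local name="$1" zone="$2" model

  # Read only the provisioning model from the instance description.
  model="$("$GCLOUD" compute instances describe "$name" --zone="$zone" \
    --format='value(scheduling.provisioningModel)')"

  # Succeed (exit status 0) only when the model is SPOT.
  [ "$model" = "SPOT" ]
}
```

For example, `is_spot_vm TPU_NAME ZONE && echo "Spot VM"` prints a message only when the check succeeds.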

For more information about Spot VMs in Compute Engine, see Spot VMs.

Pricing and quota

Pricing for TPU Spot VMs is significantly lower than for on-demand and reserved TPUs. For more information about pricing, see Cloud TPU pricing.

You need preemptible quota to use TPU Spot VMs. For more information, see Quotas.

What's next