Cloud TPU resources in Compute Engine

You can create and manage Tensor Processing Units (TPUs) by using Compute Engine resources. This page provides a conceptual overview of using TPUs with Compute Engine. It maps TPU concepts to Compute Engine resources and outlines the high-level workflows for creating TPU resources.

Primary TPU concepts

To manage TPU resources within Compute Engine, it's helpful to understand these primary TPU concepts:

  • TPU VM: A virtual machine that connects directly to the TPU hardware. A single TPU VM is the same as a single-host slice.
  • TPU slice: A logical group of interconnected TPU chips, accessed through one or more TPU VMs. Slices have one of the following scopes:
    • Single-host slice: A slice consisting of one host machine. The term single-host slice is another way to refer to a single TPU VM.
    • Multi-host slice: A slice that consists of multiple TPU VMs interconnected using a high-speed inter-chip interconnect (ICI).
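The relationship between host count and slice scope can be sketched as a small Python data model. The `TpuSlice` and `SliceScope` names and their fields are hypothetical illustrations, not part of any Google Cloud API. They only encode the definitions above: one host machine means a single-host slice (equivalently, one TPU VM), and more than one host means a multi-host slice.

```python
from dataclasses import dataclass
from enum import Enum


class SliceScope(Enum):
    """Scope of a TPU slice, as defined in the concepts list (illustrative)."""
    SINGLE_HOST = "single-host"
    MULTI_HOST = "multi-host"


@dataclass(frozen=True)
class TpuSlice:
    """Hypothetical model of a TPU slice; not a Google Cloud API type."""
    accelerator_version: str  # for example "v6e"; an illustrative label only
    num_hosts: int  # number of TPU VMs (host machines) in the slice

    def __post_init__(self) -> None:
        # A slice is accessed through at least one TPU VM.
        if self.num_hosts < 1:
            raise ValueError("a slice needs at least one host")

    @property
    def scope(self) -> SliceScope:
        # One host machine: a single-host slice, which is the same as one TPU VM.
        if self.num_hosts == 1:
            return SliceScope.SINGLE_HOST
        # Several TPU VMs interconnected over ICI: a multi-host slice.
        return SliceScope.MULTI_HOST
```

For example, `TpuSlice("v5p", 4).scope` evaluates to `SliceScope.MULTI_HOST`, while a one-host slice reports `SliceScope.SINGLE_HOST`.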

TPU and Compute Engine concept map

The following table describes how TPU concepts map to Compute Engine resources:

  • TPU VM
    • Compute Engine resource: VM instance
    • Resource details: A Compute Engine VM that provides direct access to TPU hardware.
    • Use case: Individual VM tasks, SSH command execution, or debugging
  • TPU single-host slice
    • Compute Engine resource: VM instance, or managed instance group (MIG) with a single VM
    • Resource details: A configuration consisting of one physical host machine.
    • Use case: Inference with autoscaling
  • TPU multi-host slice
    • Compute Engine resource: MIG with the accelerator topology specified in a workload policy
    • Resource details: A group of TPU VMs interconnected using ICI, managed as a single logical unit.
    • Use case: Large-scale, distributed training that requires atomic provisioning
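The concept map can also be expressed as a lookup table in code, which is handy when a provisioning script needs to decide which kind of Compute Engine resource to create. This is a minimal sketch: `CONCEPT_MAP` and `compute_engine_resource_for` are hypothetical names, the strings are copied from the table, and the rule that autoscaled single-host workloads use a single-VM MIG is an assumption drawn from the "Inference with autoscaling" use case.

```python
from typing import NamedTuple


class ComputeEngineMapping(NamedTuple):
    """One row of the concept map: where a Cloud TPU concept lives in Compute Engine."""
    resource: str
    use_case: str


# Encodes the concept-map table; keys and strings are illustrative labels,
# not identifiers defined by any Google Cloud API.
CONCEPT_MAP: dict[str, ComputeEngineMapping] = {
    "TPU VM": ComputeEngineMapping(
        resource="VM instance",
        use_case="Individual VM tasks, SSH command execution, or debugging",
    ),
    "TPU single-host slice": ComputeEngineMapping(
        resource="VM instance or MIG with a single VM",
        use_case="Inference with autoscaling",
    ),
    "TPU multi-host slice": ComputeEngineMapping(
        resource="MIG with accelerator topology specified in workload policy",
        use_case="Large-scale, distributed training requiring atomic provisioning",
    ),
}


def compute_engine_resource_for(num_hosts: int, autoscaled: bool = False) -> str:
    """Hypothetical helper: pick the Compute Engine resource for a slice size."""
    if num_hosts < 1:
        raise ValueError("a slice needs at least one host")
    if num_hosts > 1:
        # Multi-host slices are managed as one logical unit through a MIG.
        return CONCEPT_MAP["TPU multi-host slice"].resource
    if autoscaled:
        # Assumption: autoscaled single-host inference uses a MIG with one VM.
        return "MIG with a single VM"
    # A plain single-host slice is just one VM instance.
    return "VM instance"
```

Keeping the table as data rather than scattering `if` branches means the script stays in sync with the concept map when you update a single entry.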

Migrate from the Cloud TPU API

The Cloud TPU API is no longer under active development. The same applies to the Google Cloud CLI commands and the Cloud Client Libraries for the Cloud TPU API. The Cloud TPU API will receive only bug fixes and security updates. New hardware generations, starting with TPU7x (Ironwood), are supported only through Compute Engine or Google Kubernetes Engine (GKE). To get the latest features and support for the latest TPU versions, migrate by replacing your legacy Cloud TPU API calls with their equivalents in Compute Engine or GKE.

Depending on your orchestration and workload requirements, choose one of the following paths:

  • Compute Engine: Recommended for users who require direct VM-level control or custom OS images. To get started with provisioning TPUs in Compute Engine, see Quickstart: Create a TPU VM.
  • GKE: Recommended for containerized workloads, automated scaling, and large-scale orchestration. For more information about using TPUs with GKE, see About TPUs in GKE.
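The guidance above can be sketched as a simple decision function. The `WorkloadNeeds` fields and the `recommended_path` function are hypothetical; they restate only the two recommendations on this page. When a workload shows signals for both paths, or for neither, the sketch returns `"review"` instead of guessing, because the page gives no rule for those cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadNeeds:
    """Hypothetical summary of workload requirements; field names are illustrative."""
    needs_vm_level_control: bool = False
    needs_custom_os_image: bool = False
    containerized: bool = False
    needs_automated_scaling: bool = False
    large_scale_orchestration: bool = False


def recommended_path(needs: WorkloadNeeds) -> str:
    """Return "Compute Engine", "GKE", or "review" based on the migration guidance."""
    # Signals that point to Compute Engine: direct VM control or custom OS images.
    compute_engine_signals = needs.needs_vm_level_control or needs.needs_custom_os_image
    # Signals that point to GKE: containers, automated scaling, large-scale orchestration.
    gke_signals = (
        needs.containerized
        or needs.needs_automated_scaling
        or needs.large_scale_orchestration
    )
    if compute_engine_signals and not gke_signals:
        return "Compute Engine"
    if gke_signals and not compute_engine_signals:
        return "GKE"
    # Mixed or missing signals: leave the choice to a person.
    return "review"
```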

Existing TPU resources

TPU resources created using the Cloud TPU API (Node or QueuedResource REST objects) are incompatible with Compute Engine and GKE. To start using Compute Engine or GKE:

  • Rewrite any scripts that use the Cloud TPU API to use the Compute Engine or GKE APIs.
  • Delete your existing resources by using the Cloud TPU API, and then recreate them by using the Compute Engine or GKE APIs.

Limitations

TPUs in Compute Engine have the following limitations:

  • TPU versions: Compute Engine supports v5p, v6e, and TPU7x.
  • Capacity mode: The All Capacity mode for TPUs isn't available with Compute Engine.
  • Multislice: Creating groups of interconnected multi-host TPU slices isn't available with Compute Engine. To use Multislice, you must use Google Kubernetes Engine (GKE). For more information, see Deploy TPU Multislices in GKE.
  • Collections: Collection scheduling isn't available with Compute Engine. To use collection scheduling, you must use GKE. For more information, see Collection scheduling in the GKE documentation.
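Before you plan a deployment, it can help to check a request against these limitations programmatically. The following is a minimal sketch: `TpuRequest`, `compute_engine_blockers`, and the version strings are hypothetical labels, and the supported-version set is a snapshot of the list above that you would need to keep current.

```python
from dataclasses import dataclass

# Versions listed under Limitations; treat this as a snapshot that may change.
COMPUTE_ENGINE_TPU_VERSIONS = frozenset({"v5p", "v6e", "TPU7x"})


@dataclass(frozen=True)
class TpuRequest:
    """Hypothetical description of a planned TPU deployment."""
    version: str
    all_capacity_mode: bool = False
    multislice: bool = False
    collection_scheduling: bool = False


def compute_engine_blockers(request: TpuRequest) -> list[str]:
    """List the limitations that prevent this request from running on Compute Engine."""
    blockers = []
    if request.version not in COMPUTE_ENGINE_TPU_VERSIONS:
        blockers.append(f"TPU version {request.version} isn't supported")
    if request.all_capacity_mode:
        blockers.append("All Capacity mode isn't available")
    if request.multislice:
        blockers.append("Multislice requires GKE")
    if request.collection_scheduling:
        blockers.append("Collection scheduling requires GKE")
    # An empty list means no listed limitation applies.
    return blockers
```

An empty result means none of the listed limitations apply. Any entry that mentions GKE points you to the GKE path described earlier on this page.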

What's next