About TPUs on Google Cloud

Tensor Processing Units (TPUs) are Google's custom-developed, application-specific integrated circuits (ASICs) designed to accelerate machine learning (ML) and artificial intelligence (AI) workloads. Whether you are training complex foundation models for weeks or running large-scale inference, TPUs offer scalable, specialized computing resources optimized for frameworks like JAX and PyTorch.

Cloud TPUs are engineered to tackle the most demanding AI workloads. The key benefits include:

Optimized for matrix computations: TPUs are specifically designed with Matrix Multiply Units (MXUs) to execute the massive matrix operations fundamental to ML algorithms with exceptional efficiency.
High-bandwidth memory (HBM): On-chip high-bandwidth memory lets you train and serve larger models and effectively utilize larger batch sizes.
Massive scalability with slices: TPU chips can be connected in groups called slices. Slices let your workloads scale to thousands of TPU chips for large training jobs.
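
One way to see why hardware built around matrix units can pay off is to compare how much arithmetic an operation performs per element of data it touches. The following Python sketch is illustrative only: the function names are hypothetical, the counts are textbook estimates for dense n x n operands, and nothing here measures or models a real TPU.

```python
def matmul_flops_per_element(n: int) -> float:
    """Estimated FLOPs per element touched for C = A @ B with n x n matrices."""
    flops = 2 * n**3      # n*n outputs, each needing n multiplies and about n adds
    elements = 3 * n**2   # read A and B once, write C once
    return flops / elements


def elementwise_flops_per_element(n: int) -> float:
    """Estimated FLOPs per element touched for C = A + B with n x n matrices."""
    flops = n**2          # one add per output element
    elements = 3 * n**2   # read A and B once, write C once
    return flops / elements


# Compare the two ratios for a few matrix sizes.
for n in (128, 1024, 8192):
    print(n, matmul_flops_per_element(n), elementwise_flops_per_element(n))
```

Under these assumptions, the ratio for matrix multiplication grows linearly with n while the element-wise ratio stays constant, which is consistent with the guidance that workloads dominated by element-wise operations are a weaker fit.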

When to use TPUs

TPUs are optimized for specific workloads, such as the following:

Models dominated by matrix computations
Models with no custom PyTorch/JAX operations inside the main training loop
Models that train for weeks or months
Large models with large effective batch sizes
Models with ultra-large embeddings common in advanced ranking and recommendation workloads

TPUs are not suited to the following workloads:

Linear algebra programs that require frequent branching or contain many element-wise algebra operations
Workloads that require high-precision arithmetic
Neural network workloads that contain custom operations in the main training loop
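
To illustrate the point about high-precision arithmetic, the sketch below emulates rounding to bfloat16, a 16-bit floating-point format with 8 bits of significand precision. This page does not say which numeric formats TPUs use, so treat bfloat16 here as an assumption chosen for illustration. `to_bfloat16` is a hypothetical helper written in plain Python, not part of any TPU library, and it does not handle NaN or infinity.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round a finite Python float to the nearest bfloat16 value (emulated)."""
    # Reinterpret the value as the 32 bits of an IEEE 754 float32.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits; round to nearest, ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Accumulate many small increments, rounding to bfloat16 after every step.
total = 1.0
for _ in range(1000):
    total = to_bfloat16(total + 0.001)

print(total)               # emulated bfloat16 running sum
print(1.0 + 1000 * 0.001)  # the same sum in double precision
```

In this emulation the running total never leaves 1.0, because each increment is smaller than half the spacing between adjacent bfloat16 values near 1 (2^-7). Computations that depend on small differences like this are the kind of workload the list above flags as a poor fit.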

Provisioning options on Google Cloud

You can provision and access TPUs through the following Google Cloud products, depending on your operational needs.

Compute Engine

Compute Engine lets you create and manage individual TPU VMs or slices, giving you full lifecycle management of TPU VMs. Google recommends that you use Compute Engine over the legacy Cloud TPU API to provision your TPU resources.

To learn more, see Cloud TPU resources in Compute Engine.

Google Kubernetes Engine

Google Kubernetes Engine (GKE) provides a fully managed, multi-tenant Kubernetes environment for orchestrating large-scale AI workloads. GKE supports TPU node and node pool lifecycle management, including creating, configuring, and deleting TPU VMs.

To learn more, see About TPUs in GKE.

Cloud TPU

The Cloud TPU API, including the Google Cloud CLI and Cloud Client Libraries for Cloud TPU, is no longer under development. For provisioning and managing TPU resources, Google recommends that you use Compute Engine or GKE, based on your orchestration and workload needs.

For more information, see Migrate from the Cloud TPU API.

TPU versions supported in Compute Engine

Compute Engine supports the following TPU versions:

TPU7x (Ironwood)
TPU v6e (Trillium)
TPU v5p

For more information about each TPU version, see TPU machines.

What's next

Learn about Cloud TPU resources in Compute Engine
Learn about TPU hardware
