Tensor Processing Units (TPUs) are Google's custom-developed, application-specific integrated circuits (ASICs) designed to accelerate machine learning (ML) and artificial intelligence (AI) workloads. Whether you are training complex foundation models for weeks or running large-scale inference, TPUs offer scalable, specialized computing resources optimized for frameworks like JAX and PyTorch.
Cloud TPUs are engineered to tackle the most demanding AI workloads. The key benefits include:
- Optimized for matrix computations: TPUs include Matrix Multiply Units (MXUs) designed to execute the large matrix operations at the core of ML algorithms with high efficiency.
- High-bandwidth memory (HBM): Each TPU chip includes high-bandwidth memory, which lets you train and serve larger models and use larger batch sizes effectively.
- Massive scalability with slices: TPU chips can be connected in groups called slices, which let your workloads scale to thousands of TPU chips for large training jobs.
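As a minimal sketch of how a workload spreads across the chips it can see, the following JAX code (assuming JAX is installed) shards a batch across every device that `jax.devices()` reports and runs a jitted matrix multiplication on it. On a single-host TPU VM the devices are TPU devices; on a machine without TPUs, JAX falls back to CPU and the same code runs on one device. Multi-host slices typically require launching the program on every host, which this sketch does not cover. The mesh axis name `"data"` and the array sizes are arbitrary choices for illustration.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange all visible devices (TPU devices on a TPU VM, otherwise CPU)
# into a one-dimensional mesh with a single axis named "data".
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))

# Split the batch dimension across the "data" axis; replicate the weights.
batch_sharding = NamedSharding(mesh, P("data", None))
replicated = NamedSharding(mesh, P())

# Keep the global batch divisible by the number of devices.
global_batch = 8 * len(devices)
x = jax.device_put(jnp.ones((global_batch, 256)), batch_sharding)
w = jax.device_put(jnp.ones((256, 128)), replicated)

@jax.jit
def forward(x, w):
    # XLA compiles this matmul once; each device multiplies its own
    # shard of the batch by the replicated weight matrix.
    return jnp.dot(x, w)

y = forward(x, w)
print(y.shape)
print(y.sharding)
```

Sharding only the batch dimension is the simplest data-parallel layout; larger models usually also shard the weights, which changes the `PartitionSpec` values but not the overall pattern.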
When to use TPUs
TPUs are optimized for specific workloads, such as the following:
- Models dominated by matrix computations
- Models with no custom PyTorch/JAX operations inside the main training loop
- Models that train for weeks or months
- Large models with large effective batch sizes
- Models with ultra-large embeddings common in advanced ranking and recommendation workloads
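To make "dominated by matrix computations" and "no custom operations inside the main training loop" concrete, here is a minimal sketch, assuming JAX is installed, of a training step built only from standard JAX operations. The linear-regression model, data sizes, and learning rate are hypothetical choices for illustration. Because every operation is a standard JAX primitive, `jax.jit` can compile the forward pass, gradient, and parameter update into a single XLA program; on a TPU, the matrix multiplications in that program run on the MXUs.

```python
import jax
import jax.numpy as jnp

# Hypothetical hyperparameter chosen for this toy problem.
LEARNING_RATE = 0.1

def loss_fn(params, x, y):
    # Mean squared error of a linear model; the forward pass is one matmul.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@jax.jit
def train_step(params, x, y):
    # Forward pass, backward pass, and update compile into one XLA program.
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    # Plain SGD applied leaf by leaf to the parameter pytree.
    params = jax.tree_util.tree_map(
        lambda p, g: p - LEARNING_RATE * g, params, grads
    )
    return params, loss

# Synthetic data generated from a known linear map.
key = jax.random.PRNGKey(0)
kx, kw = jax.random.split(key)
x = jax.random.normal(kx, (256, 16))
true_w = jax.random.normal(kw, (16, 1))
y = x @ true_w

params = {"w": jnp.zeros((16, 1)), "b": jnp.zeros((1,))}
losses = []
for step in range(100):
    params, loss = train_step(params, x, y)
    losses.append(float(loss))

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.6f}")
```

Replacing one of these standard operations with a custom kernel or a host callback inside `train_step` is the kind of change the list above warns about, because it can prevent XLA from compiling the step as a single program.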
TPUs are not suited to the following workloads:
- Linear algebra programs that require frequent branching or contain many element-wise algebra operations
- Workloads that require high-precision arithmetic
- Neural network workloads that contain custom operations in the main training loop
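The following minimal sketch, assuming JAX is installed, illustrates how branching looks in a compiled JAX program. Inside `jax.jit`, a Python `if` on an array value fails because the branch depends on data that is only known at run time, so branches must be written as array operations such as `jnp.where` or `jax.lax.cond`. The function names `clipped_relu` and `scale_or_shift` are hypothetical. Note that `jnp.where` computes both alternatives for every element and then selects one, which is one reason branch-heavy code tends to map less efficiently onto hardware built for dense matrix math.

```python
import jax
import jax.numpy as jnp

@jax.jit
def clipped_relu(x, threshold):
    # A Python `if x > threshold:` would fail here because x is traced.
    # jnp.where evaluates both branches element-wise and selects per element.
    return jnp.where(x > threshold, threshold, jnp.maximum(x, 0.0))

@jax.jit
def scale_or_shift(x, use_scale):
    # jax.lax.cond expresses a data-dependent scalar branch inside the
    # compiled program instead of in Python.
    return jax.lax.cond(use_scale, lambda v: v * 2.0, lambda v: v + 1.0, x)

x = jnp.array([-1.0, 0.5, 3.0])
clipped = clipped_relu(x, 2.0)       # values 0.0, 0.5, 2.0
scaled = scale_or_shift(x, True)     # values -2.0, 1.0, 6.0
shifted = scale_or_shift(x, False)   # values 0.0, 1.5, 4.0
print(clipped, scaled, shifted)
```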
Provisioning options on Google Cloud
You can provision TPUs through the following Google Cloud products, depending on your operational needs.
Compute Engine
Compute Engine lets you create and manage individual TPU VMs or slices, giving you full lifecycle management of your TPU VMs. Google recommends that you use Compute Engine instead of the legacy Cloud TPU API to provision your TPU resources.
To learn more, see Cloud TPU resources in Compute Engine.
Google Kubernetes Engine
Google Kubernetes Engine (GKE) provides a fully managed, multi-tenant Kubernetes environment for orchestrating large-scale AI workloads. GKE supports TPU node and node pool lifecycle management, including creating, configuring, and deleting TPU VMs.
To learn more, see About TPUs in GKE.
Cloud TPU
The Cloud TPU API, including the Google Cloud CLI and Cloud Client Libraries for Cloud TPU, is no longer under development. For provisioning and managing TPU resources, Google recommends that you use Compute Engine or GKE, based on your orchestration and workload needs.
For more information, see Migrate from the Cloud TPU API.
TPU versions supported in Compute Engine
Compute Engine supports the following TPU versions:
- TPU7x (Ironwood)
- TPU v6e (Trillium)
- TPU v5p
For more information about each TPU version, see TPU machines.
What's next
- Learn about Cloud TPU resources in Compute Engine
- Learn about TPU hardware