TPU machines in accelerator-optimized machine family

This document describes the Compute Engine instances in the accelerator-optimized machine family that have Tensor Processing Units (TPUs). TPUs are Google's custom-developed, application-specific integrated circuits (ASICs) that are optimized specifically for artificial intelligence (AI) and machine learning (ML) workloads.

Compute Engine supports the following TPU versions:

  • TPU7x
  • TPU v6e
  • TPU v5p

Each machine type within a version has a specific topology and a number of TPU chips attached.

Fundamentals of TPU architecture

Understanding the fundamentals of TPU architecture helps you to choose the TPU version and machine type for your workload.

  • TPU chip: A TPU chip is a specialized accelerator designed by Google for machine learning. Each TPU chip contains one or more TensorCores to handle massive matrix operations. Each TensorCore consists of one or more matrix-multiply units (MXUs), which use a systolic array architecture to perform thousands of multiply-accumulate operations per cycle without constant memory access. While primarily used for high-speed matrix processing, the TPU chip also includes vector and scalar units for general computation and control flow operations.

  • TPU Pod: A TPU Pod is a contiguous set of TPUs grouped together over a specialized network. The number of TPU chips in a TPU Pod depends on the TPU version.

  • TPU VM: A TPU VM is a Linux virtual machine that runs on a TPU host and has access to the underlying TPUs. You can connect directly to TPU VMs using SSH. You have root access to the VM, so you can run arbitrary code. You can access compiler and runtime debug logs and error messages.

  • TPU slice: A logical group of interconnected TPU chips, accessed through one or more TPU VMs. Slices have one of the following scopes:

    • Single-host slice: A slice consisting of one host machine. In general, this maps to one TPU VM.
    • Multi-host slice: A slice consisting of multiple TPU VMs interconnected using a high-speed inter-chip interconnect (ICI).

  • TPU cube: A 4x4x4 topology of interconnected TPU chips. This is only applicable to 3D topologies.

  • SparseCore: SparseCores are dataflow processors that accelerate models using sparse operations. A primary use case is accelerating recommendation models, which rely heavily on embeddings.

  • TPU versions: The exact architecture of a TPU chip depends on the TPU version that you use. Each TPU version also supports different slice sizes and configurations.

For information about how TPUs work, see the TPU architecture document in the Cloud TPU documentation.

Recommended TPU versions by workload types

TPU version Primary workload types
TPU7x (Ironwood)
  • Large-scale dense and Mixture-of-Experts (MoE) models
  • Intensive pre-training for massive foundation models
  • Sampling and decode-heavy inference
TPU v6e (Trillium)
  • Training & fine-tuning (Transformers, CNNs)
  • Large-scale inference (Gemma 2, Llama, Diffusion models)
  • Recommendation engines and personalization (using SparseCore)
TPU v5p
  • Highest performance for large-scale foundation model training
  • Massive-scale multi-modal AI training
  • Embedding-dense workloads like large recommendation systems

Consumption options

To optimize resource utilization and cost while balancing workload performance, Compute Engine supports the following TPU consumption options:

  • On-demand: to consume TPUs without arranging capacity in advance. Before requesting resources, you must have enough on-demand quota for the specific type and quantity of TPU VMs. On-demand is the most flexible consumption option; however, there is no guarantee that enough on-demand resources will be available to fulfill your request.

  • Spot VMs: to provision TPUs at a significant discount. Spot VMs can be preempted at any time, with a 30-second warning. For more information, see About Spot VMs.

  • Flex-start: to provision Flex-start VMs for up to seven days, with Compute Engine automatically allocating the hardware on a best-effort basis based on availability. For more information, see About Flex-start VMs.

  • Future reservation: to request a future reservation for one year or longer. For more information, see Request a future reservation for one year or longer in the Cloud TPU documentation.

  • Future reservation in calendar mode: to provision TPU resources for a specified time period of up to 90 days. For more information, see About future reservation requests in calendar mode.

On-demand is the default consumption model for TPUs if you don't specify another option.
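The duration limits stated for these options can be captured in a small lookup. The following is a minimal sketch, not an official API: the `ConsumptionOption` class and the `options_for_duration` helper are hypothetical names, 365 days stands in for "one year", and the sketch ignores quota, availability, preemption, and allowlist restrictions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ConsumptionOption:
    """Hypothetical record of the duration limits stated for one option."""
    name: str
    min_days: Optional[int]  # None: no minimum duration is stated
    max_days: Optional[int]  # None: no maximum duration is stated


# Limits taken from the list above; 365 is an assumed stand-in for "one year".
OPTIONS = [
    ConsumptionOption("On-demand", None, None),
    ConsumptionOption("Spot VMs", None, None),
    ConsumptionOption("Flex-start", None, 7),
    ConsumptionOption("Future reservation", 365, None),
    ConsumptionOption("Future reservation in calendar mode", None, 90),
]


def options_for_duration(days: int) -> List[str]:
    """Return the options whose stated duration limits allow `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return [
        o.name
        for o in OPTIONS
        if (o.min_days is None or days >= o.min_days)
        and (o.max_days is None or days <= o.max_days)
    ]


if __name__ == "__main__":
    for d in (3, 30, 400):
        print(d, options_for_duration(d))
```

The lookup only filters by duration. Whether an option can actually be fulfilled still depends on capacity and on the availability of that option for the TPU version.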

For information about the provisioning models that underlie each consumption option, see About VM provisioning models.

Consumption option availability by TPU versions

The following table summarizes the availability of each consumption option by TPU versions.

TPU version On-demand Spot Flex-start On-demand reservations Future reservations Future reservations in calendar mode
1 1 1

1 For TPU7x, Spot, Flex-start, and Future reservations in calendar mode are restricted by an allowlist. To request access, contact your account team or the sales team.

TPU versions comparison

The following table compares the characteristics of the TPU versions.

TPU7x TPU v6e TPU v5p
Accelerator optimized Accelerator optimized Accelerator optimized
VM VM VM
Intel Emerald Rapids AMD EPYC Genoa Intel Sapphire Rapids
x86 x86 x86
224 44 to 180 208
Thread Thread Thread
960 GB 176 to 1440 GB 448 GB
NUMA NUMA NUMA
NVMe NVMe NVMe
gVNIC gVNIC gVNIC
400 Gbps 50 to 400 Gbps 200 Gbps
4 8 4
discounts discounts discounts
discounts discounts discounts

TPU architecture specifications

The following table lists the key specifications for each TPU version.

Specification TPU7x TPU v6e TPU v5p
Number of chips per pod 9216 256 8960
Peak compute per chip (BF16) (TFLOPs) 2307 918 459
Peak compute per chip (FP8) (TFLOPs) 4614 918 459
HBM capacity per chip (GiB) 192 32 95
HBM bandwidth per chip (GiBps) 7380 1638 2575
Number of vCPUs (4-chip VM) 224 180 208
RAM (GiB) (4-chip VM) 960 720 448
Number of TensorCores per chip 2 1 2
Number of SparseCores per chip 4 2 4
Bidirectional inter-chip interconnect (ICI) bandwidth per chip (GBps) 1200 800 1200
Data center network (DCN) bandwidth per chip (Gbps) 100 100 50
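Multiplying the per-chip figures by the number of chips per pod gives rough pod-level totals. The following sketch uses only values from the table above. The `SPECS` dictionary and the `pod_totals` function are illustrative names, and the results are theoretical peaks rather than measured throughput.

```python
# Per-chip values copied from the specification table above.
SPECS = {
    "TPU7x": {"chips_per_pod": 9216, "bf16_tflops": 2307, "hbm_gib": 192},
    "TPU v6e": {"chips_per_pod": 256, "bf16_tflops": 918, "hbm_gib": 32},
    "TPU v5p": {"chips_per_pod": 8960, "bf16_tflops": 459, "hbm_gib": 95},
}


def pod_totals(version: str) -> dict:
    """Return theoretical peak BF16 compute and total HBM for a full pod."""
    spec = SPECS[version]
    chips = spec["chips_per_pod"]
    return {
        # 1 PFLOPs = 1000 TFLOPs
        "peak_bf16_pflops": chips * spec["bf16_tflops"] / 1000,
        # 1 TiB = 1024 GiB
        "hbm_tib": chips * spec["hbm_gib"] / 1024,
    }


if __name__ == "__main__":
    for name in SPECS:
        print(name, pod_totals(name))
```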

TPU machine types

The following sections describe the machine types available for each TPU version.

TPU7x (Ironwood)

Each TPU7x virtual machine (VM) contains 4 TPU chips. All TPU7x slices use full-host, 4-chip VMs.

Each TPU7x chip contains two TensorCores and four SparseCores.

The Ironwood programming model exposes two TPU devices, instead of the single logical core architecture used in previous generations. For more information, see Dual-chiplet architecture in the Cloud TPU documentation.

Machine type Number of vCPUs Instance memory (GiB) Physical NIC count Maximum network bandwidth (Gbps) Number of TPU chips per VM Number of NUMA nodes Total TPU memory (GiB HBM)
tpu7x-standard-4t 224 960 2 400 4 2 768

For more information about the TPU7x architecture, see TPU7x (Ironwood) in the Cloud TPU documentation.

TPU v6e (Trillium)

Each TPU v6e VM can contain 1, 4, or 8 TPU chips. VMs with 4 or fewer TPU chips use a single non-uniform memory access (NUMA) node.

v6e slices are created using half-host VMs, each with 4 TPU chips, except for the following:

  • ct6e-standard-1t with only a single TPU chip is primarily intended for testing.
  • ct6e-standard-8t is a full-host VM that has been optimized for an inference use case, allowing all 8 TPU chips that are attached to a single VM to be used in a single serving workload.

Machine type Number of vCPUs Instance memory (GB) Physical NIC count Maximum network bandwidth (Gbps) Number of TPU chips per VM Number of NUMA nodes Total TPU memory (GiB HBM)
ct6e-standard-1t 44 176 1/4 50 1 1 32
ct6e-standard-4t 180 720 2 400 4 1 128
ct6e-standard-8t 360 1440 1 200 8 2 256

For more information about the TPU v6e architecture, see TPU v6e in the Cloud TPU documentation.

TPU v5p

A TPU v5p Pod is composed of 8960 TPU chips interconnected with reconfigurable high-speed links. TPU v5p's flexible networking lets you connect the TPU chips in a same-sized slice in multiple ways. Single-slice training is supported for up to 6144 TPU chips.

Machine type Number of vCPUs Instance memory (GB) Physical NIC count Maximum network bandwidth (Gbps) Number of TPU chips per VM Number of NUMA nodes Total TPU memory (GiB HBM)
ct5p-hightpu-4t 208 448 1 200 4 2 380

For more information about the TPU v5p architecture, see TPU v5p in the Cloud TPU documentation.
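The "Total TPU memory" column in the machine type tables equals the number of TPU chips per VM multiplied by the HBM capacity per chip. The following sketch checks that relationship with values copied from this document. The dictionary layout and the `total_hbm_gib` helper are illustrative, not part of any Google Cloud API.

```python
# HBM capacity per chip (GiB), from the TPU architecture specifications table.
HBM_GIB_PER_CHIP = {"TPU7x": 192, "TPU v6e": 32, "TPU v5p": 95}

# Machine type -> (TPU version, number of TPU chips per VM), from the tables above.
MACHINE_TYPES = {
    "tpu7x-standard-4t": ("TPU7x", 4),
    "ct6e-standard-1t": ("TPU v6e", 1),
    "ct6e-standard-4t": ("TPU v6e", 4),
    "ct6e-standard-8t": ("TPU v6e", 8),
    "ct5p-hightpu-4t": ("TPU v5p", 4),
}


def total_hbm_gib(machine_type: str) -> int:
    """Return the total HBM (GiB) attached to one VM of the given machine type."""
    version, chips = MACHINE_TYPES[machine_type]
    return chips * HBM_GIB_PER_CHIP[version]


if __name__ == "__main__":
    for mt in MACHINE_TYPES:
        print(f"{mt}: {total_hbm_gib(mt)} GiB HBM")
```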

TPU topology

The topology defines the physical arrangement of TPUs within a TPU slice. Depending on the TPU version, the topology is two- or three-dimensional. You can identify the number of TPU chips in a slice by calculating the product of each size in the topology. For example:

  • The tpu7x-standard-4t machine type with a 2x2x2 topology is an 8-chip multi-host TPU7x slice.
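The chip count calculation described above can be sketched as a short helper. The function name `chips_in_topology` is hypothetical. It assumes the topology is written as sizes separated by "x", as in the examples in this document.

```python
from math import prod


def chips_in_topology(topology: str) -> int:
    """Return the number of TPU chips in a topology string such as '2x2x2'."""
    dims = [int(size) for size in topology.lower().split("x")]
    # Topologies in this document are two- or three-dimensional.
    if len(dims) not in (2, 3) or any(size < 1 for size in dims):
        raise ValueError(f"unsupported topology: {topology!r}")
    return prod(dims)


if __name__ == "__main__":
    print(chips_in_topology("2x2x2"))  # 2 * 2 * 2
    print(chips_in_topology("16x16"))  # 16 * 16
```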

The following table lists the topologies available for each TPU version.

TPU version Machine type Scope Technical specifications
TPU7x (Ironwood) tpu7x-standard-4t Single-host
  • Topology: 2x2x1
  • Number of TPU chips for the topology: 4
  • Number of hosts: 1
  • Number of VMs: 1
  • Cubes count: 1/16
TPU7x (Ironwood) tpu7x-standard-4t Multi-host
  • Topology: 2x2x2
  • Number of TPU chips for the topology: 8
  • Number of hosts: 2
  • Number of VMs: 2
  • Cubes count: 1/8
TPU7x (Ironwood) tpu7x-standard-4t Multi-host
  • Topology: 2x2x4
  • Number of TPU chips for the topology: 16
  • Number of hosts: 4
  • Number of VMs: 4
  • Cubes count: 1/4
TPU7x (Ironwood) tpu7x-standard-4t Multi-host
  • Topology: 2x4x4
  • Number of TPU chips for the topology: 32
  • Number of hosts: 8
  • Number of VMs: 8
  • Cubes count: 1/2
TPU7x (Ironwood) tpu7x-standard-4t Multi-host
  • Topology: 4x4x4
  • Number of TPU chips for the topology: 64
  • Number of hosts: 16
  • Number of VMs: 16
  • Cubes count: 1
TPU7x (Ironwood) tpu7x-standard-4t Multi-host
  • Topology: 4x4x8
  • Number of TPU chips for the topology: 128
  • Number of hosts: 32
  • Number of VMs: 32
  • Cubes count: 2
TPU7x (Ironwood) tpu7x-standard-4t Multi-host
  • Topology: 4x8x8
  • Number of TPU chips for the topology: 256
  • Number of hosts: 64
  • Number of VMs: 64
  • Cubes count: 4
TPU7x (Ironwood) tpu7x-standard-4t Multi-host
  • Topology: 8x8x8
  • Number of TPU chips for the topology: 512
  • Number of hosts: 128
  • Number of VMs: 128
  • Cubes count: 8
TPU7x (Ironwood) tpu7x-standard-4t Multi-host
  • Topology: 8x8x16
  • Number of TPU chips for the topology: 1024
  • Number of hosts: 256
  • Number of VMs: 256
  • Cubes count: 16
TPU7x (Ironwood) tpu7x-standard-4t Multi-host
  • Topology: AxBxC (where A, B, and C are multiples of two)
  • Number of TPU chips for the topology: A*B*C
  • Number of hosts: (A*B*C)/4
  • Number of VMs: (A*B*C)/4
  • Cubes count: (A*B*C)/64
TPU v6e (Trillium) ct6e-standard-1t Single-host
  • Topology: 1x1
  • Number of TPU chips for the topology: 1
  • Number of VMs: 1
TPU v6e (Trillium) ct6e-standard-8t Single-host
  • Topology: 2x4
  • Number of TPU chips for the topology: 8
  • Number of VMs: 1
TPU v6e (Trillium) ct6e-standard-4t Single-host
  • Topology: 2x2
  • Number of TPU chips for the topology: 4
  • Number of VMs: 1
TPU v6e (Trillium) ct6e-standard-4t Multi-host
  • Topology: 2x4
  • Number of TPU chips for the topology: 8
  • Number of VMs: 2
TPU v6e (Trillium) ct6e-standard-4t Multi-host
  • Topology: 4x4
  • Number of TPU chips for the topology: 16
  • Number of VMs: 4
TPU v6e (Trillium) ct6e-standard-4t Multi-host
  • Topology: 4x8
  • Number of TPU chips for the topology: 32
  • Number of VMs: 8
TPU v6e (Trillium) ct6e-standard-4t Multi-host
  • Topology: 8x8
  • Number of TPU chips for the topology: 64
  • Number of VMs: 16
TPU v6e (Trillium) ct6e-standard-4t Multi-host
  • Topology: 8x16
  • Number of TPU chips for the topology: 128
  • Number of VMs: 32
TPU v6e (Trillium) ct6e-standard-4t Multi-host
  • Topology: 16x16
  • Number of TPU chips for the topology: 256
  • Number of VMs: 64
TPU v5p ct5p-hightpu-4t Single-host
  • Topology: 2x2x1
  • Number of TPU chips for the topology: 4
  • Number of VMs: 1
TPU v5p ct5p-hightpu-4t Multi-host
  • Topology: 2x2x2
  • Number of TPU chips for the topology: 8
  • Number of VMs: 2
TPU v5p ct5p-hightpu-4t Multi-host
  • Topology: 2x2x4
  • Number of TPU chips for the topology: 16
  • Number of VMs: 4
TPU v5p ct5p-hightpu-4t Multi-host
  • Topology: 2x4x4
  • Number of TPU chips for the topology: 32
  • Number of VMs: 8
TPU v5p ct5p-hightpu-4t Multi-host
  • Topology: AxBxC (where A, B, and C are multiples of two)
  • Number of TPU chips for the topology: A*B*C
  • Number of VMs: (A*B*C)/4 (the topology product divided by four)
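The VM and cube counts in the table follow from the chip count: the number of VMs is the chip count divided by the number of chips per VM, and for 3D topologies the cube count is the chip count divided by 64 (one 4x4x4 cube). The following sketch reproduces those columns. The `slice_shape` helper is a hypothetical name, and applying the cube formula to every 3D topology is an assumption based on the TPU7x rows.

```python
from fractions import Fraction
from math import prod


def slice_shape(topology: str, chips_per_vm: int = 4) -> dict:
    """Derive chip, VM, and cube counts for a topology such as '4x4x8'."""
    dims = tuple(int(size) for size in topology.lower().split("x"))
    chips = prod(dims)
    if chips % chips_per_vm != 0:
        raise ValueError("chip count is not a multiple of chips per VM")
    return {
        "chips": chips,
        "vms": chips // chips_per_vm,
        # A cube is a 4x4x4 block of 64 chips and only applies to 3D topologies.
        "cubes": Fraction(chips, 64) if len(dims) == 3 else None,
    }


if __name__ == "__main__":
    print(slice_shape("2x2x1"))                  # single-host, 4-chip VM
    print(slice_shape("4x4x8"))                  # multi-host, 4-chip VMs
    print(slice_shape("2x4", chips_per_vm=8))    # ct6e-standard-8t, single-host
```

Using `Fraction` keeps fractional cube counts such as 1/16 exact instead of converting them to floating-point values.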

What's next