TPU machines in accelerator-optimized machine family

This document describes the Compute Engine instances in the accelerator-optimized machine family that have Tensor Processing Units (TPUs). TPUs are Google's custom-developed, application-specific integrated circuits (ASICs) that are optimized specifically for artificial intelligence (AI) and machine learning (ML) workloads.

Compute Engine supports the following TPU versions:

TPU7x
TPU v6e
TPU v5p

Each machine type within a version has a specific topology and a number of TPU chips attached.

Fundamentals of TPU architecture

Understanding the fundamentals of TPU architecture helps you to choose the TPU version and machine type for your workload.

TPU chip: A TPU chip is a specialized accelerator designed by Google for machine learning. Each TPU chip contains one or more TensorCores to handle massive matrix operations. Each TensorCore consists of one or more matrix-multiply units (MXUs), which use a systolic array architecture to perform thousands of multiply-accumulate operations per cycle without constant memory access. While primarily used for high-speed matrix processing, the TPU chip also includes vector and scalar units for general computation and control flow operations.
TPU Pod: A TPU Pod is a contiguous set of TPUs grouped together over a specialized network. The number of TPU chips in a TPU Pod depends on the TPU version.
TPU VM: A TPU VM is a Linux virtual machine that runs on a TPU host and has access to the underlying TPUs. You can connect directly to TPU VMs using SSH. You have root access to the VM, so you can run arbitrary code. You can access compiler and runtime debug logs and error messages.
TPU slice: A logical group of interconnected TPU chips, accessed through one or more TPU VMs. Slices have one of the following scopes:
- Single-host slice: A slice consisting of one host machine. In general, this maps to one TPU VM.
- Multi-host slice: A slice consisting of multiple TPU VMs interconnected using a high-speed inter-chip interconnect (ICI).
TPU cube: A 4x4x4 topology of interconnected TPU chips. This is only applicable to 3D topologies.
SparseCore: SparseCores are dataflow processors that accelerate models using sparse operations. A primary use case is accelerating recommendation models, which rely heavily on embeddings.
TPU versions: The exact architecture of a TPU chip depends on the TPU version that you use. Each TPU version also supports different slice sizes and configurations.
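
To make the term multiply-accumulate from the TPU chip definition concrete, the following Python sketch multiplies two small matrices and counts the multiply-accumulate operations involved. It is an illustration only: it runs sequentially on a CPU and does not model how an MXU or a systolic array schedules work, and the function name `matmul_with_mac_count` is hypothetical.

```python
def matmul_with_mac_count(a, b):
    """Multiply two matrices (lists of rows) and count multiply-accumulates."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    result = [[0] * cols for _ in range(rows)]
    macs = 0
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(inner):
                acc += a[i][k] * b[k][j]  # one multiply-accumulate operation
                macs += 1
            result[i][j] = acc
    return result, macs


product, macs = matmul_with_mac_count([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)  # [[19, 22], [43, 50]]
print(macs)     # 8
```

An M x K by K x N multiplication needs M * N * K multiply-accumulate operations, which is why hardware that performs many of them per cycle suits matrix-heavy ML workloads.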

For information about how TPUs work, see the TPU architecture document in the Cloud TPU documentation.

Recommended TPU versions by workload types

TPU version	Primary workload types
TPU7x (Ironwood)	Large-scale dense and Mixture-of-Experts (MoE) models; intensive pre-training for massive foundation models; sampling and decode-heavy inference
TPU v6e (Trillium)	Training and fine-tuning (Transformers, CNNs); large-scale inference (Gemma 2, Llama, Diffusion models); recommendation engines and personalization (using SparseCore)
TPU v5p	Highest performance for large-scale foundation model training; massive-scale multi-modal AI training; embedding-dense workloads like large recommendation systems

Consumption options

To optimize resource utilization and cost while balancing workload performance, Compute Engine supports the following TPU consumption options:

On-demand: to consume TPUs without arranging capacity in advance. Before requesting resources, you must have enough on-demand quota for the specific type and quantity of TPU VMs. On-demand is the most flexible consumption option; however, there is no guarantee that enough on-demand resources will be available to fulfill your request.
Spot VMs: to provision TPUs at a significant discount. Spot VMs can be preempted at any time, with a 30-second warning. For more information, see About Spot VMs.
Flex-start: to provision Flex-start VMs for up to seven days, with Compute Engine automatically allocating the hardware on a best-effort basis based on availability. For more information, see About Flex-start VMs.
Future reservation: to request a future reservation for one year or longer. For more information, see Request a future reservation for one year or longer in the Cloud TPU documentation.
Future reservation in calendar mode: to provision TPU resources for up to 90 days, for a specified time period. For more information, see About future reservation requests in calendar mode.

On-demand is the default consumption model for TPUs if you don't specify another option.

For information about the underlying provisioning model that enables each consumption option, see About VM provisioning models.
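
As a rough aid for comparing the options above, the following Python sketch lists the consumption options whose stated duration limits accommodate a requested duration. It encodes only the limits mentioned in this document (up to seven days for Flex-start, up to 90 days for calendar mode, one year or longer for future reservations). Treating one year as 365 days is an assumption, and the names `DURATION_LIMITS_DAYS` and `options_for_duration` are hypothetical. It does not model quota, capacity, preemption, or availability by TPU version.

```python
# (minimum_days, maximum_days) per option; None means this document states no limit.
# Assumption for this sketch: "one year" is treated as 365 days.
DURATION_LIMITS_DAYS = {
    "On-demand": (None, None),
    "Spot VMs": (None, None),
    "Flex-start": (None, 7),
    "Future reservation": (365, None),
    "Future reservation in calendar mode": (None, 90),
}


def options_for_duration(days):
    """Return the options whose stated duration limits allow `days`."""
    matches = []
    for option, (minimum, maximum) in DURATION_LIMITS_DAYS.items():
        if minimum is not None and days < minimum:
            continue
        if maximum is not None and days > maximum:
            continue
        matches.append(option)
    return matches


print(options_for_duration(5))
print(options_for_duration(30))
print(options_for_duration(400))
```

The result is only a first filter. Spot VMs, for example, pass every duration check here even though they can be preempted at any time.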

Consumption option availability by TPU versions

The following table summarizes the availability of each consumption option by TPU versions.

TPU version	Spot	Flex-start	Future reservations in calendar mode
TPU7x	¹	¹	¹
TPU v6e
TPU v5p

¹ Spot, Flex-start, and Future reservations in calendar mode for TPU7x are restricted by an allowlist. To request access, contact your account team or the sales team.

TPU versions comparison

Compare the characteristics of the different TPU versions in the following table.

	TPU7x	v6e	v5p
Workload type	Accelerator optimized	Accelerator optimized	Accelerator optimized
Instance type	VM	VM	VM
CPU type	Intel Emerald Rapids	AMD EPYC Genoa	Intel Sapphire Rapids
Architecture	x86	x86	x86
vCPUs	224	44 to 180	208
vCPU definition	Thread	Thread	Thread
Memory	960 GB	176 to 1440 GB	448 GB
Shared memory architecture	NUMA	NUMA	NUMA
Custom machine types	—	—	—
Extended memory	—	—	—
Sole tenancy	—	—	—
Nested virtualization	—	—	—
Confidential Computing	—		—
Disk interface type	NVMe	NVMe	NVMe
Hyperdisk Balanced			—
Hyperdisk Balanced HA	—	—	—
Hyperdisk Extreme	—	—	—
Hyperdisk ML
Hyperdisk Throughput	—	—	—
Local SSD	—	—	—
Standard PD	—	—	—
Balanced PD	—	—
SSD PD	—	—	—
Extreme PD	—	—	—
Network interfaces	gVNIC	gVNIC	gVNIC
Maximum network bandwidth	400 Gbps	50 to 400 Gbps	200 Gbps
Max TPUs per VM	4	8	4
Sustained use discounts	—	—	—
Resource-based committed use discounts (CUDs)	discounts	discounts	discounts
Compute flexible CUDs	— discounts	— discounts	— discounts
Spot VM discounts

TPU architecture specifications

The following table lists the key specifications for each TPU version.

Specification	TPU7x	TPU v6e	TPU v5p
Number of chips per pod	9216	256	8960
Peak compute per chip (BF16) (TFLOPs)	2307	918	459
Peak compute per chip (FP8) (TFLOPs)	4614	918	459
HBM capacity per chip (GiB)	192	32	95
HBM bandwidth per chip (GiBps)	7380	1638	2575
Number of vCPUs (4-chip VM)	224	180	208
RAM (GiB) (4-chip VM)	960	720	448
Number of TensorCores per chip	2	1	2
Number of SparseCores per chip	4	2	4
Bidirectional inter-chip interconnect (ICI) bandwidth per chip (GBps)	1200	800	1200
Data center network (DCN) bandwidth per chip (Gbps)	100	100	50
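
The per-chip figures can be scaled to a VM or slice by multiplying by the number of chips. The following Python sketch does this for HBM capacity and peak BF16 compute, using values copied from the table. The names `CHIP_SPECS` and `aggregate` are hypothetical, and the compute figure is a theoretical peak, not an expected throughput for a real workload.

```python
# Per-chip values copied from the specifications table in this document.
CHIP_SPECS = {
    "TPU7x": {"hbm_gib": 192, "bf16_tflops": 2307},
    "TPU v6e": {"hbm_gib": 32, "bf16_tflops": 918},
    "TPU v5p": {"hbm_gib": 95, "bf16_tflops": 459},
}


def aggregate(version, chips):
    """Scale per-chip HBM capacity and peak BF16 compute to `chips` chips."""
    spec = CHIP_SPECS[version]
    return {
        "total_hbm_gib": spec["hbm_gib"] * chips,
        "peak_bf16_tflops": spec["bf16_tflops"] * chips,
    }


for version in CHIP_SPECS:
    # A 4-chip VM is used here because every TPU version offers one.
    print(version, aggregate(version, 4))
```

For a 4-chip VM, the HBM totals come out to 768, 128, and 380 GiB, which agree with the Total TPU memory values listed for `tpu7x-standard-4t`, `ct6e-standard-4t`, and `ct5p-hightpu-4t`.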

TPU machine types

The following sections describe the machine types available for each TPU version.

TPU7x (Ironwood)

Each TPU7x virtual machine (VM) contains 4 TPU chips. All TPU7x slices use full-host, 4-chip VMs.

Each TPU7x chip contains two TensorCores and four SparseCores.

The Ironwood programming model lets you access two TPU devices instead of the single logical core architecture used in previous generations. For more information, see Dual-chiplet architecture in the Cloud TPU documentation.

Machine type	Number of vCPUs	Instance memory (GiB)	Physical NIC count	Maximum network bandwidth (Gbps)	Number of TPU chips per VM	Number of NUMA nodes	Total TPU memory (GiB HBM)
`tpu7x-standard-4t`	224	960	2	400	4	2	768

For more information about the TPU7x architecture, see TPU7x (Ironwood) in the Cloud TPU documentation.

TPU v6e (Trillium)

Each TPU v6e VM can contain 1, 4, or 8 TPU chips. In slices with 4 or fewer TPU chips, all chips share the same non-uniform memory access (NUMA) node.

v6e slices are created using half-host VMs, each with 4 TPU chips, except for the following:

ct6e-standard-1t with only a single TPU chip is primarily intended for testing.
ct6e-standard-8t is a full-host VM that has been optimized for an inference use case, allowing all 8 TPU chips that are attached to a single VM to be used in a single serving workload.

Machine type	Number of vCPUs	Instance memory (GB)	Physical NIC count	Maximum network bandwidth (Gbps)	Number of TPU chips per VM	Number of NUMA nodes	Total TPU memory (GiB HBM)
`ct6e-standard-1t`	44	176	1/4	50	1	1	32
`ct6e-standard-4t`	180	720	2	400	4	1	128
`ct6e-standard-8t`	180	1440	1	200	8	2	256

For more information about the TPU v6e architecture, see TPU v6e in the Cloud TPU documentation.

TPU v5p

A TPU v5p Pod is composed of 8960 TPU chips interconnected with reconfigurable high-speed links. TPU v5p's flexible networking lets you connect the TPU chips in a same-sized slice in multiple ways. Single slice training is supported for up to 6144 TPU chips.

Machine type	Number of vCPUs	Instance memory (GB)	Physical NIC count	Maximum network bandwidth (Gbps)	Number of TPU chips per VM	Number of NUMA nodes	Total TPU memory (GiB HBM)
`ct5p-hightpu-4t`	208	448	1	200	4	2	380

For more information about the TPU v5p architecture, see TPU v5p in the Cloud TPU documentation.

TPU topology

The topology defines the physical arrangement of TPUs within a TPU slice. Depending on the TPU version, the topology is two- or three-dimensional. To find the number of TPU chips in a slice, multiply the dimensions of the topology together. For example:

The tpu7x-standard-4t machine type with a 2x2x2 topology is an 8-chip multi-host TPU7x slice.
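
The chip-count rule above can be sketched in a few lines of Python. The function name `chips_in_topology` is hypothetical, and the sketch only parses the string and multiplies the dimensions. It does not check whether a topology is actually offered for a given TPU version.

```python
import math


def chips_in_topology(topology):
    """Return the number of TPU chips in a topology string such as '2x2x2'."""
    dims = [int(part) for part in topology.lower().split("x")]
    if len(dims) not in (2, 3) or any(d < 1 for d in dims):
        raise ValueError(f"expected a 2D or 3D topology, got {topology!r}")
    return math.prod(dims)


print(chips_in_topology("2x2x2"))  # 8
print(chips_in_topology("16x16"))  # 256
```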

The following table lists the topologies available for each TPU version.

TPU version	Machine type	Scope	Technical specifications
TPU7x (Ironwood)	`tpu7x-standard-4t`	Single-host	Topology: 2x2x1; Number of TPU chips for the topology: 4; Number of hosts: 1; Number of VMs: 1; Cubes count: 1/16
TPU7x (Ironwood)	`tpu7x-standard-4t`	Multi-host	Topology: 2x2x2; Number of TPU chips for the topology: 8; Number of hosts: 2; Number of VMs: 2; Cubes count: 1/8
TPU7x (Ironwood)	`tpu7x-standard-4t`	Multi-host	Topology: 2x2x4; Number of TPU chips for the topology: 16; Number of hosts: 4; Number of VMs: 4; Cubes count: 1/4
TPU7x (Ironwood)	`tpu7x-standard-4t`	Multi-host	Topology: 2x4x4; Number of TPU chips for the topology: 32; Number of hosts: 8; Number of VMs: 8; Cubes count: 1/2
TPU7x (Ironwood)	`tpu7x-standard-4t`	Multi-host	Topology: 4x4x4; Number of TPU chips for the topology: 64; Number of hosts: 16; Number of VMs: 16; Cubes count: 1
TPU7x (Ironwood)	`tpu7x-standard-4t`	Multi-host	Topology: 4x4x8; Number of TPU chips for the topology: 128; Number of hosts: 32; Number of VMs: 32; Cubes count: 2
TPU7x (Ironwood)	`tpu7x-standard-4t`	Multi-host	Topology: 4x8x8; Number of TPU chips for the topology: 256; Number of hosts: 64; Number of VMs: 64; Cubes count: 4
TPU7x (Ironwood)	`tpu7x-standard-4t`	Multi-host	Topology: 8x8x8; Number of TPU chips for the topology: 512; Number of hosts: 128; Number of VMs: 128; Cubes count: 8
TPU7x (Ironwood)	`tpu7x-standard-4t`	Multi-host	Topology: 8x8x16; Number of TPU chips for the topology: 1024; Number of hosts: 256; Number of VMs: 256; Cubes count: 16
TPU7x (Ironwood)	`tpu7x-standard-4t`	Multi-host	Topology: {A}x{B}x{C} (where A, B, and C are multiples of two); Number of TPU chips for the topology: A×B×C; Number of hosts: (A×B×C)/4; Number of VMs: (A×B×C)/4; Cubes count: (A×B×C)/64
TPU v6e (Trillium)	`ct6e-standard-1t`	Single-host	Topology: 1x1; Number of TPU chips for the topology: 1; Number of VMs: 1
TPU v6e (Trillium)	`ct6e-standard-8t`	Single-host	Topology: 2x4; Number of TPU chips for the topology: 8; Number of VMs: 1
TPU v6e (Trillium)	`ct6e-standard-4t`	Single-host	Topology: 2x2; Number of TPU chips for the topology: 4; Number of VMs: 1
TPU v6e (Trillium)	`ct6e-standard-4t`	Multi-host	Topology: 2x4; Number of TPU chips for the topology: 8; Number of VMs: 2
TPU v6e (Trillium)	`ct6e-standard-4t`	Multi-host	Topology: 4x4; Number of TPU chips for the topology: 16; Number of VMs: 4
TPU v6e (Trillium)	`ct6e-standard-4t`	Multi-host	Topology: 4x8; Number of TPU chips for the topology: 32; Number of VMs: 8
TPU v6e (Trillium)	`ct6e-standard-4t`	Multi-host	Topology: 8x8; Number of TPU chips for the topology: 64; Number of VMs: 16
TPU v6e (Trillium)	`ct6e-standard-4t`	Multi-host	Topology: 8x16; Number of TPU chips for the topology: 128; Number of VMs: 32
TPU v6e (Trillium)	`ct6e-standard-4t`	Multi-host	Topology: 16x16; Number of TPU chips for the topology: 256; Number of VMs: 64
TPU v5p	`ct5p-hightpu-4t`	Single-host	Topology: 2x2x1; Number of TPU chips for the topology: 4; Number of VMs: 1
TPU v5p	`ct5p-hightpu-4t`	Multi-host	Topology: 2x2x2; Number of TPU chips for the topology: 8; Number of VMs: 2
TPU v5p	`ct5p-hightpu-4t`	Multi-host	Topology: 2x2x4; Number of TPU chips for the topology: 16; Number of VMs: 4
TPU v5p	`ct5p-hightpu-4t`	Multi-host	Topology: 2x4x4; Number of TPU chips for the topology: 32; Number of VMs: 8
TPU v5p	`ct5p-hightpu-4t`	Multi-host	Topology: {A}x{B}x{C} (where A, B, and C are multiples of two); Number of TPU chips for the topology: A×B×C; Number of VMs: (A×B×C)/4¹

¹ Calculated by dividing the topology product by four.
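
The VM and cube counts in the table follow from the chip count: VMs are the chip count divided by the chips per VM, and cubes are the chip count divided by 64 (one 4x4x4 cube), for 3D topologies only. The following Python sketch reproduces that arithmetic. The function name `vm_and_cube_counts` and its `chips_per_vm` parameter are hypothetical, and the default of 4 chips per VM is an assumption that fits `tpu7x-standard-4t`, `ct6e-standard-4t`, and `ct5p-hightpu-4t`.

```python
import math
from fractions import Fraction


def vm_and_cube_counts(dims, chips_per_vm=4):
    """Derive chip, VM, and (for 3D topologies) cube counts from dimensions."""
    chips = math.prod(dims)
    if chips % chips_per_vm:
        raise ValueError("chip count is not a multiple of chips_per_vm")
    counts = {"chips": chips, "vms": chips // chips_per_vm}
    if len(dims) == 3:
        # A cube is a 4x4x4 block of 64 chips; smaller slices use a fraction of one.
        counts["cubes"] = Fraction(chips, 64)
    return counts


print(vm_and_cube_counts((2, 4, 4)))                  # 32 chips, 8 VMs, 1/2 cube
print(vm_and_cube_counts((8, 16)))                    # 128 chips, 32 VMs
print(vm_and_cube_counts((2, 4), chips_per_vm=8))     # 8 chips, 1 VM
```

The last call corresponds to the single-host `ct6e-standard-8t` row, where all 8 chips are attached to one VM.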

What's next

Learn about TPU resources in Compute Engine
Try the quickstart: Create a single TPU VM