TPU7x (Ironwood)
This page describes the architecture and available configurations for TPU7x, the latest TPU available on Google Cloud. TPU7x is the first release in the Ironwood family, Google Cloud's seventh-generation TPU. The Ironwood generation is designed for large-scale AI training and inference.
TPU7x has a 9,216-chip pod footprint and shares many architectural similarities with TPU v5p. TPU7x provides high performance for large-scale dense and MoE models, pre-training, sampling, and decode-heavy inference.
To use TPU7x, you must use Google Kubernetes Engine (GKE). For more information, see About TPUs in GKE.
You can also use TPU7x and GKE with TPU Cluster Director. TPU Cluster Director is available through an All Capacity mode reservation, which gives you full access to all of your reserved capacity (no hold-backs) and full visibility into the TPU hardware topology, utilization status, and health status. For more information, see All Capacity mode overview.
To get access to TPU7x, contact your account team.
System architecture
Each TPU7x chip contains two TensorCores and four SparseCores. The following table shows the key specifications and their values for TPU7x compared to prior generations.
| Specification | v5p | v6e (Trillium) | TPU7x (Ironwood) |
|---|---|---|---|
| Number of chips per pod | 8960 | 256 | 9216 |
| Peak compute per chip (BF16) (TFLOPs) | 459 | 918 | 2307 |
| Peak compute per chip (FP8) (TFLOPs) | 459 | 918 | 4614 |
| HBM capacity per chip (GiB) | 95 | 32 | 192 |
| HBM bandwidth per chip (GB/s) | 2765 | 1638 | 7380 |
| Number of vCPUs (4-chip VM) | 208 | 180 | 224 |
| RAM (GB) (4-chip VM) | 448 | 720 | 960 |
| Number of TensorCores per chip | 2 | 1 | 2 |
| Number of SparseCores per chip | 4 | 2 | 4 |
| Bi-directional inter-chip interconnect (ICI) bandwidth per chip (GB/s) | 1200 | 800 | 1200 |
| Data center network (DCN) bandwidth per chip (Gb/s) | 50 | 100 | 100 |
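As a back-of-the-envelope illustration (not an official metric), the per-chip figures in the table above can be rolled up to pod-level totals. The numbers below come directly from the table; the aggregation itself is the only thing this sketch adds:

```python
# Pod-level totals derived from the per-chip figures in the table above.
# Illustrative only: real deliverable performance depends on workload,
# interconnect utilization, and software stack.
SPECS = {
    # generation: (chips_per_pod, bf16_tflops_per_chip, hbm_gib_per_chip)
    "v5p": (8960, 459, 95),
    "v6e": (256, 918, 32),
    "TPU7x": (9216, 2307, 192),
}

def pod_totals(gen):
    chips, tflops, hbm = SPECS[gen]
    return {
        "peak_bf16_exaflops": chips * tflops / 1e6,  # TFLOPs -> EFLOPs
        "hbm_pib": chips * hbm / (1024 ** 2),        # GiB -> PiB
    }

totals = pod_totals("TPU7x")
# A full TPU7x pod: about 21.26 peak BF16 EFLOPs and about 1.69 PiB of HBM.
```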
The following diagram illustrates the architecture of Ironwood:

Dual-chiplet architecture
The Ironwood programming model exposes each chip as two TPU devices, instead of the single logical core (also known as MegaCore) used in previous generations (TPU v4 and v5p). This change improves the cost-effectiveness and efficiency of manufacturing the chip. While this represents an architectural shift, the new design ensures that you can reuse existing software models with minimal changes.
Ironwood TPUs are composed of two distinct chiplets. This is a departure from the unified memory space of the MegaCore architecture.
- Chiplet composition: Each chiplet is a self-contained unit with one TensorCore, two SparseCores, and 96 GB of high-bandwidth memory (HBM).
- High-speed interconnect: The two chiplets are connected by a die-to-die (D2D) interface that is six times faster than a 1D inter-chip interconnect (ICI) link. Inter-chiplet communication is managed using collective operations.
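Because each chip is two identical chiplets, the chip-level totals are simply twice the per-chiplet figures, consistent with the specification table earlier on this page. A trivial consistency check, not an API:

```python
# Per-chiplet resources as described above; chip totals are 2x.
CHIPLET = {"tensorcores": 1, "sparsecores": 2, "hbm_gib": 96}
CHIP = {k: 2 * v for k, v in CHIPLET.items()}
# CHIP matches the table: 2 TensorCores, 4 SparseCores, 192 GiB HBM.
```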
Programming model and framework exposure
The programming model for Ironwood is similar to that of TPU generations earlier than v4, such as TPU v3. The new architecture is exposed in the following ways:
- Two devices per chip: Frameworks like JAX expose each Ironwood chip as two separate "devices," one for each chiplet.
- 4D topology: JAX adds a fourth dimension to the topology to specify which of the two on-chip devices to use. This lets you use existing software models with minimal modification.
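As a conceptual sketch only (this models the coordinate scheme in plain Python; it is not an actual JAX API call), the 4D device enumeration for a slice can be thought of as the 3D chip coordinates plus a fourth "core-on-chip" axis:

```python
from itertools import product

def device_coords(x, y, z, chiplets_per_chip=2):
    """Enumerate logical device coordinates for an x*y*z Ironwood slice.

    Each chip exposes one device per chiplet, so the coordinate tuple
    gains a fourth on-chip axis. Conceptual model, not a JAX API.
    """
    return list(product(range(x), range(y), range(z), range(chiplets_per_chip)))

devices = device_coords(4, 4, 4)
# A 4x4x4 slice: 64 chips * 2 chiplets = 128 logical devices,
# from (0, 0, 0, 0) through (3, 3, 3, 1).
```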
For more information about achieving optimal performance with the dual-chiplet architecture, see Performance recommendations for Ironwood's dual-chiplet architecture.
Supported configurations
TPU7x chips have a direct connection to the nearest neighboring chips in 3 dimensions, resulting in a 3D mesh of networking connections. Slices of 64 or more chips are made up of one or more 4x4x4 "cubes" of chips.
The following table shows common 3D slice shapes that are supported for TPU7x:
| Topology | TPU chips | Hosts | VMs | Cubes | Scope |
|---|---|---|---|---|---|
| 2x2x1 | 4 | 1 | 1 | 1/16 | Single-host |
| 2x2x2 | 8 | 2 | 2 | 1/8 | Multi-host |
| 2x2x4 | 16 | 4 | 4 | 1/4 | Multi-host |
| 2x4x4 | 32 | 8 | 8 | 1/2 | Multi-host |
| 4x4x4 | 64 | 16 | 16 | 1 | Multi-host |
| 4x4x8 | 128 | 32 | 32 | 2 | Multi-host |
| 4x8x8 | 256 | 64 | 64 | 4 | Multi-host |
| 8x8x8 | 512 | 128 | 128 | 8 | Multi-host |
| 8x8x16 | 1024 | 256 | 256 | 16 | Multi-host |
| 8x16x16 | 2048 | 512 | 512 | 32 | Multi-host |
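The relationships in the table above follow directly from the slice geometry: the chip count is the product of the topology dimensions, each full-host VM holds 4 chips, and a cube is 64 chips. A short sketch (assuming those two facts, as stated on this page):

```python
from math import prod

def slice_layout(topology):
    """Derive chip, host, and cube counts for a TPU7x slice topology
    string such as '4x4x8'. Assumes full-host 4-chip VMs and 4x4x4
    (64-chip) cubes, per the tables above."""
    dims = tuple(int(d) for d in topology.split("x"))
    chips = prod(dims)
    return {
        "chips": chips,
        "hosts": max(chips // 4, 1),  # one host (and one VM) per 4 chips
        "cubes": chips / 64,          # fractional for sub-cube slices
    }

layout = slice_layout("4x4x8")
# {'chips': 128, 'hosts': 32, 'cubes': 2.0}, matching the table row.
```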
TPU7x VM
Each TPU7x virtual machine (VM) contains 4 chips. Each VM has access to two NUMA nodes. For more information about NUMA nodes, see Non-uniform memory access on Wikipedia.
All TPU7x slices use full-host, 4-chip VMs. The technical specifications for a TPU7x VM are:
- Number of vCPUs per VM: 224
- RAM per VM: 960 GB
- Number of NUMA nodes per VM: 2
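For NUMA-aware workload placement, it can help to reason about per-node resources. The split below assumes a symmetric layout across the two NUMA nodes (an assumption for illustration; inspect the VM itself, for example with `numactl --hardware`, for authoritative numbers):

```python
# TPU7x VM specifications from the list above.
VM = {"vcpus": 224, "ram_gb": 960, "chips": 4, "numa_nodes": 2}

# Assumed symmetric split across the VM's two NUMA nodes.
per_node = {
    "vcpus": VM["vcpus"] // VM["numa_nodes"],    # 112 vCPUs per node
    "ram_gb": VM["ram_gb"] // VM["numa_nodes"],  # 480 GB per node
    "chips": VM["chips"] // VM["numa_nodes"],    # 2 chips per node
}
```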
Hyperdisk
By default, the VM boot disk for TPU7x is Hyperdisk Balanced. You can attach additional Hyperdisk Balanced disks to your TPU VM for additional storage.
For more information about Hyperdisk, see Hyperdisk overview. For more information about storage options for Cloud TPU, see Storage options for Cloud TPU data.
What's next
- Use TPU7x with GKE
- Use TPU7x with TPU Cluster Director
- Use the Google Cloud ML Diagnostics platform to optimize and diagnose your workloads