This document provides an overview of Ironwood (TPU7x) in Google Kubernetes Engine (GKE). Ironwood (TPU7x) is Google's seventh-generation Tensor Processing Unit (TPU), custom-designed for large-scale AI workloads. It offers a significant performance improvement over previous TPU generations, which lets you train and serve larger and more complex models.
Characteristics of Ironwood (TPU7x)
Ironwood (TPU7x) introduces unique features that differentiate it from other TPU versions. These features affect availability, node pool configuration, and workload performance.
For information about the underlying hardware, see Ironwood (TPU7x) architecture.
Availability
Ironwood (TPU7x) is available in GKE Standard clusters that run version 1.34.0-gke.2201000 and later, and in Autopilot clusters that run version 1.34.1-gke.3084001 and later.
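For example, the following command is a minimal sketch of creating a Standard cluster at a qualifying version; the cluster name, location, and project are placeholders, and the exact patch version that you can use depends on what's available in your release channel.

```bash
# A minimal sketch; the cluster name, location, and project are
# placeholders. Pick a 1.34 patch version at or above the minimum
# that is available in your release channel.
gcloud container clusters create tpu7x-cluster \
    --location=us-central1 \
    --cluster-version=1.34.0-gke.2201000 \
    --project=PROJECT_ID
```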
Workload policy for multi-host node pools
Ironwood (TPU7x) uses a workload policy to configure the physical placement of the underlying infrastructure when you create multi-host TPU slice node pools. You create a workload policy and then apply it by using the `--placement-policy` flag. This policy replaces the `--tpu-topology` flag that other TPU versions use.
A workload policy is a type of resource policy that lets you configure the physical placement of infrastructure. Ironwood (TPU7x) supports the High throughput workload policy. This policy colocates the TPU VMs to reduce network latency and lets you define the maintenance strategy to minimize workload disruptions.
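The following commands are a minimal sketch of that flow, not a definitive recipe: the policy name, cluster name, region, accelerator topology, machine type, and node count are placeholder assumptions, so substitute the values that apply to your slice.

```bash
# A minimal sketch of the flow described above; all names, the region,
# the topology, and the machine type are placeholder assumptions.

# 1. Create a High Throughput workload policy that defines the
#    accelerator topology for the multi-host slice.
gcloud compute resource-policies create workload-policy tpu7x-policy \
    --type=HIGH_THROUGHPUT \
    --accelerator-topology=4x4x4 \
    --region=us-central1 \
    --project=PROJECT_ID

# 2. Create the multi-host TPU slice node pool and apply the policy
#    with --placement-policy instead of --tpu-topology. A 4x4x4
#    topology is 64 chips; at four chips per VM, that's 16 nodes.
gcloud container node-pools create tpu7x-pool \
    --cluster=CLUSTER_NAME \
    --location=us-central1 \
    --node-locations=us-central1-a \
    --machine-type=tpu7x-standard-4t \
    --num-nodes=16 \
    --placement-policy=tpu7x-policy \
    --project=PROJECT_ID
```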
NUMA binding
The Ironwood (TPU7x) architecture includes the following elements:
- Each Ironwood (TPU7x) virtual machine (VM) contains four chips and two NICs.
- Each VM contains two Non-Uniform Memory Access (NUMA) nodes.
- The CPU, memory, and NIC resources are split equally between the two NUMA nodes.
Accessing resources across different NUMA nodes (cross-NUMA access) can introduce performance bottlenecks in your workloads. Therefore, to optimize your workload performance, GKE lets you deploy your workloads in a multi-container setup. This binds each container to the CPU, memory, and TPU resources within a given NUMA node.
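The following manifest is a minimal sketch of that multi-container setup, assuming a VM with four chips split two per NUMA node; the node selector value, container image, and CPU and memory figures are illustrative assumptions, not documented values.

```bash
# A minimal sketch of the multi-container pattern, applied with a
# heredoc. The image, node selector value, and resource figures are
# illustrative assumptions; each container requests half of the VM's
# chips, CPU, and memory so that it maps onto a single NUMA node.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: tpu7x-numa-example
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu7x   # assumed label value
  containers:
  - name: worker-numa-0   # intended for NUMA node 0
    image: us-docker.pkg.dev/PROJECT_ID/REPO/IMAGE:TAG
    resources:
      requests:
        google.com/tpu: "2"   # half of the VM's four chips
        cpu: "100"            # illustrative: half the VM's vCPUs
        memory: 200Gi         # illustrative: half the VM's memory
      limits:
        google.com/tpu: "2"
        cpu: "100"
        memory: 200Gi
  - name: worker-numa-1   # intended for NUMA node 1
    image: us-docker.pkg.dev/PROJECT_ID/REPO/IMAGE:TAG
    resources:
      requests:
        google.com/tpu: "2"
        cpu: "100"
        memory: 200Gi
      limits:
        google.com/tpu: "2"
        cpu: "100"
        memory: 200Gi
EOF
```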
Reference implementations of LLMs
To learn how to deploy large language models (LLMs) on Ironwood (TPU7x), see the following reference implementations. You can use one of the following options for cluster creation:
- GKE XPK: use Accelerated Processing Kit (XPK) to quickly create GKE clusters and run workloads for proofs of concept and testing, as shown in the sketch after this list. For more information, see the XPK documentation.
- GKE on Google Cloud CLI: use the Google Cloud CLI to create your GKE cluster manually, for precise customization or to expand existing production GKE environments.
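As a minimal sketch of the XPK path, the following commands assume an accelerator type string of `tpu7x-64` and placeholder cluster, workload, zone, and project values; check the XPK documentation for the exact accelerator type strings and flags that apply to Ironwood (TPU7x).

```bash
# A minimal XPK sketch; the tpu-type string, names, zone, and project
# are placeholder assumptions -- consult the XPK documentation for the
# values that apply to Ironwood (TPU7x) in your project.

# Create a GKE cluster with one multi-host Ironwood slice.
xpk cluster create \
    --cluster=tpu7x-demo \
    --tpu-type=tpu7x-64 \
    --num-slices=1 \
    --zone=us-central1-a \
    --project=PROJECT_ID

# Run a placeholder training command on the slice.
xpk workload create \
    --cluster=tpu7x-demo \
    --workload=pretrain-demo \
    --tpu-type=tpu7x-64 \
    --num-slices=1 \
    --zone=us-central1-a \
    --project=PROJECT_ID \
    --command="python3 train.py"
```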
| LLM | GKE XPK | GKE on Google Cloud CLI |
|---|---|---|
| Llama 3.1 70B with BF16 and a 4x4x4 topology | Pretrain llama3.1-70b workload on Ironwood GKE clusters with XPK | Pretrain llama3.1-70b workload on Ironwood GKE clusters with Kubernetes JobSet |
| DeepSeek-V3 671B with BF16 and a 4x4x8 topology | Pretrain deepseek3-671b workload on Ironwood GKE clusters with XPK | Pretrain deepseek3-671b workload on Ironwood GKE clusters with Kubernetes JobSet |
| GPT-OSS 120B with BF16 and a 4x4x4 topology | Pretrain gpt-oss-120b workload on Ironwood GKE clusters with XPK | Pretrain gpt-oss-120b workload on Ironwood GKE clusters with Kubernetes JobSet |
| Qwen3-235B-A22B with BF16 and a 4x8x8 topology | Pretrain qwen3-235b-a22b workload on Ironwood GKE clusters with XPK | Not available |
What's next
- Learn how to plan TPUs in GKE.
- Learn how to deploy TPUs in GKE.
- Try the end-to-end tutorials for Ironwood (TPU7x):