This document provides an overview of Ironwood (TPU7x) in Google Kubernetes Engine (GKE). Ironwood (TPU7x) is Google's seventh-generation Tensor Processing Unit (TPU), custom-designed for large-scale AI workloads. It offers a significant performance improvement over previous TPU generations, which lets you train and serve larger and more complex models.
Characteristics of Ironwood (TPU7x)
Ironwood (TPU7x) introduces unique features that differentiate it from other TPU versions. These features affect availability, node pool configuration, and workload performance.
For information about the underlying hardware, see Ironwood (TPU7x) architecture.
Availability
Ironwood (TPU7x) is available in GKE Standard clusters that run version 1.34.0-gke.2201000 and later, and in Autopilot clusters that run version 1.34.1-gke.3084001 and later.
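For example, the following command is a minimal sketch of creating a Standard cluster at a qualifying version; the cluster name, location, and project are placeholders, and the exact patch version that you can use depends on what's available in your release channel.

```bash
# A minimal sketch; the cluster name, location, and project are
# placeholders. Pick a 1.34 patch version at or above the minimum
# that is available in your release channel.
gcloud container clusters create tpu7x-cluster \
    --location=us-central1 \
    --cluster-version=1.34.0-gke.2201000 \
    --project=PROJECT_ID
```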
Workload policy for multi-host node pools
Ironwood (TPU7x) uses a workload policy to configure the physical placement of the underlying infrastructure when you create multi-host TPU slice node pools. You create a workload policy and then apply it by using the `--placement-policy` flag. This policy replaces the `--tpu-topology` flag that other TPU versions use.
A workload policy is a type of resource policy that lets you configure the physical placement of infrastructure. Ironwood (TPU7x) supports the High throughput workload policy. This policy colocates the TPU VMs to reduce network latency and lets you define the maintenance strategy to minimize workload disruptions.
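The following commands are a minimal sketch of that flow, not a definitive recipe: the policy name, cluster name, region, accelerator topology, machine type, and node count are placeholder assumptions, so substitute the values that apply to your slice.

```bash
# A minimal sketch of the flow described above; all names, the region,
# the topology, and the machine type are placeholder assumptions.

# 1. Create a High Throughput workload policy that defines the
#    accelerator topology for the multi-host slice.
gcloud compute resource-policies create workload-policy tpu7x-policy \
    --type=HIGH_THROUGHPUT \
    --accelerator-topology=4x4x4 \
    --region=us-central1 \
    --project=PROJECT_ID

# 2. Create the multi-host TPU slice node pool and apply the policy
#    with --placement-policy instead of --tpu-topology. A 4x4x4
#    topology is 64 chips; at four chips per VM, that's 16 nodes.
gcloud container node-pools create tpu7x-pool \
    --cluster=CLUSTER_NAME \
    --location=us-central1 \
    --node-locations=us-central1-a \
    --machine-type=tpu7x-standard-4t \
    --num-nodes=16 \
    --placement-policy=tpu7x-policy \
    --project=PROJECT_ID
```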
NUMA binding
The Ironwood (TPU7x) architecture includes the following elements:
- Each Ironwood (TPU7x) virtual machine (VM) contains four chips and two NICs.
- Each VM contains two Non-Uniform Memory Access (NUMA) nodes.
- The CPU, memory, and NIC resources are split equally between the two NUMA nodes.
Accessing resources across different NUMA nodes (cross-NUMA access) can introduce performance bottlenecks in your workloads. Therefore, to optimize your workload performance, GKE lets you deploy your workloads in a multi-container setup. This binds each container to the CPU, memory, and TPU resources within a given NUMA node.
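The following manifest is a minimal sketch of that multi-container setup, assuming a VM with four chips split two per NUMA node; the node selector value, container image, and CPU and memory figures are illustrative assumptions, not documented values.

```bash
# A minimal sketch of the multi-container pattern, applied with a
# heredoc. The image, node selector value, and resource figures are
# illustrative assumptions; each container requests half of the VM's
# chips, CPU, and memory so that it maps onto a single NUMA node.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: tpu7x-numa-example
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu7x   # assumed label value
  containers:
  - name: worker-numa-0   # intended for NUMA node 0
    image: us-docker.pkg.dev/PROJECT_ID/REPO/IMAGE:TAG
    resources:
      requests:
        google.com/tpu: "2"   # half of the VM's four chips
        cpu: "100"            # illustrative: half the VM's vCPUs
        memory: 200Gi         # illustrative: half the VM's memory
      limits:
        google.com/tpu: "2"
        cpu: "100"
        memory: 200Gi
  - name: worker-numa-1   # intended for NUMA node 1
    image: us-docker.pkg.dev/PROJECT_ID/REPO/IMAGE:TAG
    resources:
      requests:
        google.com/tpu: "2"
        cpu: "100"
        memory: 200Gi
      limits:
        google.com/tpu: "2"
        cpu: "100"
        memory: 200Gi
EOF
```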
Reference implementations of LLMs
To learn how to deploy large language models (LLMs) on Ironwood (TPU7x), see the following reference implementations. You can use one of the following options for cluster creation:
- GKE XPK: use Accelerated Processing Kit (XPK) to quickly create GKE clusters and run workloads for proofs of concept and testing, as shown in the sketch after this list. For more information, see the XPK documentation.
- GKE on Google Cloud CLI: use the Google Cloud CLI to create your GKE cluster manually, for precise customization or to expand existing production GKE environments.
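As a minimal sketch of the XPK path, the following commands assume an accelerator type string of `tpu7x-64` and placeholder cluster, workload, zone, and project values; check the XPK documentation for the exact accelerator type strings and flags that apply to Ironwood (TPU7x).

```bash
# A minimal XPK sketch; the tpu-type string, names, zone, and project
# are placeholder assumptions -- consult the XPK documentation for the
# values that apply to Ironwood (TPU7x) in your project.

# Create a GKE cluster with one multi-host Ironwood slice.
xpk cluster create \
    --cluster=tpu7x-demo \
    --tpu-type=tpu7x-64 \
    --num-slices=1 \
    --zone=us-central1-a \
    --project=PROJECT_ID

# Run a placeholder training command on the slice.
xpk workload create \
    --cluster=tpu7x-demo \
    --workload=pretrain-demo \
    --tpu-type=tpu7x-64 \
    --num-slices=1 \
    --zone=us-central1-a \
    --project=PROJECT_ID \
    --command="python3 train.py"
```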
| LLM | GKE XPK | GKE on Google Cloud CLI |
|---|---|---|
| Llama 3.1 70B with BF16 and a 4x4x4 topology | Pretrain llama3.1-70b workload on Ironwood GKE clusters with XPK | Pretrain llama3.1-70b workload on Ironwood GKE clusters with Kubernetes JobSet |
| DeepSeek-V3 671B with BF16 and a 4x4x8 topology | Pretrain deepseek3-671b workload on Ironwood GKE clusters with XPK | Pretrain deepseek3-671b workload on Ironwood GKE clusters with Kubernetes JobSet |
| GPT-OSS 120B with BF16 and a 4x4x4 topology | Pretrain gpt-oss-120b workload on Ironwood GKE clusters with XPK | Pretrain gpt-oss-120b workload on Ironwood GKE clusters with Kubernetes JobSet |
| Qwen3-235B-A22B with BF16 and a 4x8x8 topology | Pretrain qwen3-235b-a22b workload on Ironwood GKE clusters with XPK | Not available |
What's next
- Learn how to plan TPUs in GKE.
- Learn how to deploy TPUs in GKE.
- Try the end-to-end tutorials for Ironwood (TPU7x):