TPU v5p
This document describes the architecture and supported configurations of Cloud TPU v5p.
System architecture
This section describes the system architecture specific to the v5p version. Each TensorCore has four Matrix Multiply Units (MXU), a vector unit, and a scalar unit.
There are 8960 chips in a v5p Pod. The largest job that can be scheduled is a 96-cube (6144-chip) job.
The following table shows the key specifications for TPU v5p.
| Key specifications | v5p values |
|---|---|
| Peak compute per chip (BF16) | 459 TFLOPs |
| HBM2e capacity and bandwidth | 95 GB, 2765 GBps |
| TPU Pod size | 8960 chips |
| Interconnect topology | 3D torus |
| Inter-chip interconnect (ICI) bandwidth | 4800 Gbps |
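As a rough illustration of how the per-chip figures in the table scale with slice size, the following Python sketch multiplies them by a chip count. The function name `slice_totals` and the output keys are hypothetical and chosen only for this example. The results are theoretical peaks, not measured throughput.

```python
# Per-chip figures copied from the key specifications table.
PEAK_BF16_TFLOPS_PER_CHIP = 459
HBM_GB_PER_CHIP = 95
ICI_GBITS_PER_SEC_PER_CHIP = 4800  # gigabits per second


def slice_totals(num_chips: int) -> dict:
    """Return theoretical aggregate figures for a slice of num_chips chips.

    Hypothetical helper for illustration; it only multiplies the table values.
    """
    if num_chips <= 0:
        raise ValueError("num_chips must be positive")
    return {
        # 1 PFLOP = 1000 TFLOPs
        "peak_bf16_pflops": num_chips * PEAK_BF16_TFLOPS_PER_CHIP / 1000,
        # 1 TB = 1000 GB
        "hbm_tb": num_chips * HBM_GB_PER_CHIP / 1000,
        # Convert the per-chip ICI figure from gigabits to gigabytes per second.
        "ici_gbytes_per_sec_per_chip": ICI_GBITS_PER_SEC_PER_CHIP / 8,
    }


if __name__ == "__main__":
    # One cube (64 chips) and the largest supported slice (6144 chips).
    for chips in (64, 6144):
        print(chips, slice_totals(chips))
```

For example, a single 64-chip cube works out to about 29.4 PFLOPs of peak BF16 compute and about 6.1 TB of HBM under these figures.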
Configurations
A TPU v5p Pod is composed of 8960 chips interconnected with reconfigurable high-speed links. TPU v5p's flexible networking lets you connect the chips in a same-sized slice in multiple ways.
The following table shows the most common single-slice shapes supported with v5p, plus most (but not all) full cube shapes greater than 1 cube. The maximum v5p shape is 16x16x24 (6144 chips, 96 cubes).
| Topology | Cores | Chips | Hosts | Cubes | Supports twisted? |
|---|---|---|---|---|---|
| 2x2x1 | 8 | 4 | 1 | N/A | N/A |
| 2x2x2 | 16 | 8 | 2 | N/A | N/A |
| 2x4x4 | 64 | 32 | 8 | N/A | N/A |
| 4x4x4 | 128 | 64 | 16 | 1 | N/A |
| 4x4x8 | 256 | 128 | 32 | 2 | Yes |
| 4x8x8 | 512 | 256 | 64 | 4 | Yes |
| 8x8x8 | 1024 | 512 | 128 | 8 | N/A |
| 8x8x16 | 2048 | 1024 | 256 | 16 | Yes |
| 8x16x16 | 4096 | 2048 | 512 | 32 | Yes |
| 16x16x16 | 8192 | 4096 | 1024 | 64 | N/A |
| 16x16x24 | 12288 | 6144 | 1536 | 96 | N/A |
Single-slice training is supported for up to 6144 chips. You can scale up to 18432 chips using Multislice. For more information about Multislice, see Cloud TPU Multislice Overview.
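The columns of the topology table follow from the topology string by simple arithmetic. The sketch below reproduces that arithmetic in Python. The helper name `describe_topology` is hypothetical, and the per-chip and per-host ratios are read off the table rows (2 cores per chip, 4 chips per host, 64 chips per cube). The function does not check that a shape is actually offered, only that it does not exceed the 6144-chip single-slice limit.

```python
import math

CORES_PER_CHIP = 2      # from the table: Cores = 2 x Chips
CHIPS_PER_HOST = 4      # from the table: Hosts = Chips / 4
CHIPS_PER_CUBE = 64     # one cube is a 4x4x4 block
MAX_SLICE_CHIPS = 6144  # 16x16x24, the largest single-slice shape


def describe_topology(topology: str) -> dict:
    """Derive cores, chips, hosts, and cubes from a string such as '4x4x8'.

    Hypothetical helper for illustration only.
    """
    dims = [int(d) for d in topology.lower().split("x")]
    if len(dims) != 3 or any(d <= 0 for d in dims):
        raise ValueError(f"expected three positive dimensions, got {topology!r}")
    chips = math.prod(dims)
    if chips > MAX_SLICE_CHIPS:
        raise ValueError(f"{topology} has {chips} chips, above the single-slice limit")
    # Shapes smaller than 4x4x4 are not built from full cubes (N/A in the table).
    full_cubes = all(d % 4 == 0 for d in dims)
    return {
        "cores": chips * CORES_PER_CHIP,
        "chips": chips,
        "hosts": math.ceil(chips / CHIPS_PER_HOST),
        "cubes": chips // CHIPS_PER_CUBE if full_cubes else None,
    }


if __name__ == "__main__":
    for shape in ("2x2x1", "4x4x8", "16x16x24"):
        print(shape, describe_topology(shape))
```

Calling `describe_topology("4x4x8")` gives 256 cores, 128 chips, 32 hosts, and 2 cubes, matching the corresponding table row.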
Cloud TPU ICI resiliency
ICI resiliency helps improve fault tolerance of the optical links and optical circuit switches (OCS) that connect TPUs between cubes. (ICI connections within a cube use copper links, which are not affected.) ICI resiliency allows ICI connections to be routed around OCS and optical ICI faults. As a result, it improves the scheduling availability of TPU slices, with the trade-off of temporary degradation in ICI performance.
As with Cloud TPU v4, ICI resiliency is enabled by default for v5p slices that are one cube (4x4x4 topology) or larger.
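The default described above can be expressed as a small check on the topology string. This is a minimal sketch, assuming that "one cube or larger" means every dimension is a multiple of 4, which holds for all cube-based shapes in the topology table. The function name is hypothetical and is not part of any Cloud TPU API.

```python
def ici_resiliency_enabled_by_default(topology: str) -> bool:
    """Return True if a v5p slice of this shape is one cube (4x4x4) or larger.

    Hypothetical helper; assumes cube-based shapes have dimensions that are
    all multiples of 4, as in the topology table.
    """
    dims = [int(d) for d in topology.lower().split("x")]
    if len(dims) != 3 or any(d <= 0 for d in dims):
        raise ValueError(f"expected three positive dimensions, got {topology!r}")
    return all(d % 4 == 0 for d in dims)


if __name__ == "__main__":
    for shape in ("2x2x2", "2x4x4", "4x4x4", "8x8x16"):
        print(shape, ici_resiliency_enabled_by_default(shape))
```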
VM, host and slice properties
| Property | Value in a TPU |
|---|---|
| # of v5p chips | 4 |
| # of vCPUs | 208 (only half is usable if using NUMA binding to avoid cross-NUMA performance penalty) |
| RAM (GB) | 448 (only half is usable if using NUMA binding to avoid cross-NUMA performance penalty) |
| # of NUMA nodes | 2 |
| NIC throughput (Gbps) | 200 |
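To make the NUMA note in the table concrete, the following Python sketch computes the vCPUs and RAM available to a process bound to a single NUMA node. It assumes the host's resources are split evenly across its two NUMA nodes, which is consistent with the "only half is usable" wording. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostResources:
    """Per-VM values copied from the properties table."""
    chips: int = 4
    vcpus: int = 208
    ram_gb: int = 448
    numa_nodes: int = 2


def usable_with_numa_binding(host: HostResources = HostResources()) -> dict:
    """Resources available when a process is bound to one NUMA node.

    Assumes an even split of vCPUs and RAM across NUMA nodes.
    """
    return {
        "vcpus": host.vcpus // host.numa_nodes,
        "ram_gb": host.ram_gb // host.numa_nodes,
    }


if __name__ == "__main__":
    print(usable_with_numa_binding())
```

Under that assumption, a NUMA-bound process sees 104 vCPUs and 224 GB of RAM.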
| | Cores | Chips | Hosts/VMs | Cubes |
|---|---|---|---|---|
| Host | 8 | 4 | 1 | |
| Cube (rack) | 128 | 64 | 16 | 1 |
| Largest supported slice | 12288 | 6144 | 1536 | 96 |
| v5p full Pod | 17920 | 8960 | 2240 | 140 |