This document describes dynamic slicing in Google Kubernetes Engine (GKE). Dynamic slicing lets you configure provisioned TPU sub-blocks into different topologies. This capability reduces the need to re-create node pools, enhances fault tolerance by allowing automatic recovery when a failure occurs, and optimizes resource utilization.
Dynamic slicing is intended for AI/ML engineers and platform administrators who want to optimize TPU utilization, reduce provisioning time, and improve fault tolerance for large-scale training and inference workloads.
Before reading this document, you should be familiar with the following:
- TPUs in GKE.
- TPU Cluster Director. Dynamic slicing is a TPU feature enabled by TPU Cluster Director.
- All Capacity mode reservations. Dynamic slicing features are available exclusively on TPUs that use All Capacity mode.
What is dynamic slicing?
Dynamic slicing gives you flexibility in managing Cloud TPU capacity by decoupling TPU provisioning from slice topology. Dynamic slicing involves the following process:
- Provision resources as smaller units: you provision resources as units called sub-blocks. A sub-block is the fundamental logical building unit of Ironwood (TPU7x) capacity. For Ironwood (TPU7x), a sub-block represents a 16-node group of TPU VMs with a 4x4x4 topology of interconnected TPU chips. In the context of TPU All Capacity mode and dynamic slicing, a node pool maps directly to a sub-block.
- Stitch sub-blocks together: dynamic slicing stitches these sub-blocks together into larger slices.
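The arithmetic behind stitching can be sketched as follows. This is an illustrative calculation only, not a GKE API: the function name and the tiling assumption (each axis of the requested topology being a multiple of the sub-block axis length) are this sketch's own; the fixed numbers come from the sub-block definition above (4x4x4 chips across 16 TPU VMs).

```python
# Illustrative sketch: how many 4x4x4 sub-blocks are stitched into a
# requested slice topology. Numbers follow the TPU7x sub-block
# definition (16 TPU VMs, 4x4x4 interconnected chips).
from math import prod

SUB_BLOCK_TOPOLOGY = (4, 4, 4)                   # chip topology of one sub-block
CHIPS_PER_SUB_BLOCK = prod(SUB_BLOCK_TOPOLOGY)   # 64 chips
VMS_PER_SUB_BLOCK = 16

def sub_blocks_needed(topology):
    """Return how many sub-blocks tile the requested chip topology.

    Assumes (for this sketch) that each axis of the requested topology
    is a multiple of the corresponding sub-block axis, so sub-blocks
    tile the shape exactly.
    """
    for axis, base in zip(topology, SUB_BLOCK_TOPOLOGY):
        if axis % base != 0:
            raise ValueError(f"axis {axis} is not a multiple of {base}")
    return prod(topology) // CHIPS_PER_SUB_BLOCK

# Example: an 8x8x8 slice (512 chips) is stitched from 8 sub-blocks,
# which together span 8 * 16 = 128 TPU VMs.
blocks = sub_blocks_needed((8, 8, 8))
print(blocks, blocks * VMS_PER_SUB_BLOCK)  # 8 128
```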
Benefits of dynamic slicing
Dynamic slicing helps you to achieve the following:
- Reduce time to provision: provisioning sub-blocks individually leads to faster overall provisioning because a failure in any single sub-block doesn't block the rest of the capacity from coming online.
- Reduce time to recover: if a TPU chip fails, only its sub-block is affected, because the sub-block is the smallest unit of failure. Dynamic slicing isolates faulty sub-blocks so that workloads can be rescheduled on healthy sub-blocks faster than re-provisioning an entire large slice.
- Reshape capacity: if you have diverse workload requirements, you don't need to delete and re-create node pools for topology changes. Instead, you can dynamically reconfigure the provisioned node pools to match specified shapes.
Key elements of dynamic slicing
Dynamic slicing introduces the following key concepts:
- Incremental provisioning of node pools: dynamic slicing uses incremental provisioning, which is a fault-tolerant provisioning model for node pools. This model converts all your TPU capacity into node pools, each a 16-node group of TPU VMs.
- Slice controller: a Kubernetes Custom Resource controller running within the GKE control plane that manages dynamic slicing. The slice controller manages the lifecycle of a Slice custom resource, which represents a dynamic slice. The slice controller handles creating, continuously monitoring, and deleting the Slice. When you use a scheduler, the scheduler directs the creation and deletion of the Slice custom resource.
- Slice custom resource: represents a dynamic slice, in which sub-blocks are stitched together based on the requested TPU topology. This process relies on the dynamic reconfiguration of the OCS network to connect the TPU node pools, which helps to ensure optimized performance. You can check the progress or health of dynamic slice formation by inspecting the Slice custom resource's status fields.
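As a rough illustration of the pattern described above, a Slice custom resource manifest might look like the following. This is a sketch only: the API group, version, and field names shown here are assumptions for illustration, not the documented schema. Consult the GKE reference for the actual Slice API.

```yaml
# Hypothetical sketch of a Slice custom resource. The apiVersion and
# all field names are assumptions for illustration only.
apiVersion: tpu.gke.io/v1        # assumed API group and version
kind: Slice
metadata:
  name: training-slice
spec:
  # Requested TPU topology to stitch from 4x4x4 sub-blocks (assumed field).
  topology: 8x8x8
status:
  # The slice controller reports formation progress and health in
  # status fields like these (assumed fields).
  conditions:
  - type: Ready
    status: "True"
```

In this pattern, the slice controller reconciles the `spec` (the topology you request) against the `status` (what has actually been stitched together), which is the standard Kubernetes custom-resource lifecycle the section describes.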
Schedulers for dynamic slicing
You can configure Kueue and Topology Aware Scheduling (TAS) to automatically create a Slice custom resource. You can also use your own scheduler to manage Slice custom resources.
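For the Kueue path, the general Topology Aware Scheduling pattern is to define a Kueue Topology resource and reference it from a ResourceFlavor so that Kueue can place workloads with topology awareness. The following is a generic sketch of that pattern, not a dynamic-slicing-specific configuration: the API versions, node label keys, and label values are assumptions; check the Kueue and GKE documentation for the exact values to use with dynamic slicing.

```yaml
# Generic Kueue Topology Aware Scheduling sketch. API versions and
# node label keys/values are assumptions for illustration.
apiVersion: kueue.x-k8s.io/v1alpha1
kind: Topology
metadata:
  name: tpu-topology
spec:
  levels:
  - nodeLabel: cloud.google.com/gke-tpu-topology   # assumed label key
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: tpu7x-flavor
spec:
  topologyName: tpu-topology   # ties this flavor to the Topology above
```

With a configuration along these lines, Kueue's Topology Aware Scheduling can direct the creation and deletion of Slice custom resources as described above, rather than you managing them directly.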