This document describes how to use dynamic slicing by directly interacting with Slice custom resources. You can create slices, monitor partition states, and verify slice health.
Before following these instructions, ensure that you understand the concepts of dynamic slicing.
Why use dynamic slicing with a custom scheduler?
Use your own scheduler to manage Slice custom resources if you have complex scheduling requirements or if you want to integrate dynamic slicing with your existing scheduling infrastructure.
If you prefer to use a scheduler instead of managing Slice custom resources directly, GKE provides integration with Kueue and Topology Aware Scheduling (TAS). For more information, see Schedule dynamic slices with Kueue and TAS.
Overview of the workflow
To use dynamic slicing with a custom scheduler, you perform the following tasks in this document:
- Enable the slice controller.
- Create node pools with incremental provisioning.
- Create Slice custom resources based on your workload requirements. Apply the Slice custom resource to your cluster.
- Monitor the partition states and slice health.
- Delete the slice when you are done.
For more information about the fields and status of the Slice custom resource, see the Slice custom resource reference information.
Requirements
To use dynamic slicing in GKE, you must meet the following requirements:
- Use a Standard cluster in version 1.35.2-gke.1842000 or later, in the Rapid channel.
- Use the Ironwood (TPU7x) TPU version.
- Use the Container-Optimized OS image for your nodes.
- To use incremental provisioning, use All Capacity mode reservations. All Capacity mode is a feature enabled by TPU Cluster Director.
Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the `gcloud components update` command. Earlier gcloud CLI versions might not support running the commands in this document.
- Ensure that you have an existing Standard cluster in version 1.35.2-gke.1842000 or later, in the Rapid channel. To create a new cluster, see Creating a regional cluster.
- Ensure you have sufficient quota for Ironwood (TPU7x) in your region.
- If you plan to run multislice workloads, install JobSet v0.10.1 or later.
- Request TPU capacity in All Capacity mode.
Enable the slice controller
To use dynamic slicing, enable the slice controller in your cluster.
Update your cluster:
```
gcloud container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --enable-slice-controller
```

Replace the following:

- CLUSTER_NAME: the name of your cluster.
- LOCATION: the region with your available TPU capacity.

Get credentials so that you can communicate with your cluster with `kubectl` commands:

```
gcloud config set container/cluster CLUSTER_NAME
gcloud container clusters get-credentials CLUSTER_NAME \
    --location=LOCATION
```

Verify that the `slices.accelerator.gke.io` value is present in the output of the following command:

```
kubectl get crd slices.accelerator.gke.io
```

The output is similar to the following:

```
slices.accelerator.gke.io   2026-01-09T23:58:02Z
```
Create node pools with incremental provisioning
This section describes how to create the TPU node pools with incremental provisioning. GKE converts all your TPU capacity into node pools of 16-node groups of TPU VMs, or sub-blocks. GKE provisions these node pools even when it can't find all 16 healthy VMs, by placing nodes on healthy parts of the host machines and incrementally provisioning unhealthy machines while they are repaired.
You can target your node pool to belong to any of the following:
- A specific block of TPUs, which is exposed in All Capacity mode reservations. Block targeting allows GKE to create the node pool in any available sub-block within the specified block.
- A specific sub-block, or a specific 16-node group of TPU VMs, of TPUs for more granular control.
Create a workload policy
To create a TPU slice node pool with Ironwood (TPU7x), you must first create a
workload policy with the accelerator-topology-mode field set to provision_only. This setting
triggers the incremental provisioning process.
Create a workload policy:
```
gcloud compute resource-policies create workload-policy WORKLOAD_POLICY_NAME \
    --project=PROJECT_ID \
    --region=REGION \
    --type=HIGH_THROUGHPUT \
    --accelerator-topology=4x4x4 \
    --accelerator-topology-mode=provision_only
```
Replace the following:
- WORKLOAD_POLICY_NAME: a name for your workload policy.
- PROJECT_ID: your Google Cloud project ID.
- REGION: the region for the workload policy.
In this command, do the following:
- Always set the `accelerator-topology` field to `4x4x4` to match the total number of chips within a single sub-block.
- Always set the `accelerator-topology-mode` field to `provision_only` to ensure that the incremental provisioning process is triggered. When the `provision_only` value is set, the node pool provisions TPU nodes without forming ICI or OCS links.
Target your node pool to belong to a block or a sub-block
You can target specific sub-blocks or blocks within your All Capacity mode reservation.
- Target a block: each node pool uses capacity from a specified block. GKE places the node pool within an available sub-block in that block. You must create as many node pools as there are sub-blocks in the block you want to use.
- Target a sub-block: each node pool maps to a specific, available sub-block. When using sub-block targeting, GKE creates the node pool if at least one VM is healthy. Incremental provisioning helps ensure that all nodes are placed within the specified sub-block.
Block
To retrieve the name of the block in a reservation and the count of available sub-blocks in the block, complete the following steps in the View the topology and health status of All Capacity Mode reservations document:
- Identify the name of the block by listing all reservation blocks and copying the value in the `name:` field. This value is the name of the block, or BLOCK_NAME in this document.
- Determine how many node pools to create by describing a reservation block and identifying the value in the `reservationSubBlockCount` field. This value is the number of sub-blocks available. For example, a `reservationSubBlockCount: 4` value indicates that the block has four sub-blocks available, and you need to create four separate node pools.
Set the reservation path:
```
export RESERVATION_PATH="projects/PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/BLOCK_NAME"
```

Replace the following:

- PROJECT_ID: your Google Cloud project ID.
- RESERVATION_NAME: the name of your TPU reservation.
- BLOCK_NAME: the name of the block.

Create a node pool for each sub-block identified in the preceding step. For example, if the count is 4, run this command four times. Use a unique name for each node pool.

```
gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --node-locations=ZONE \
    --machine-type=tpu7x-standard-4t \
    --num-nodes=16 \
    --placement-policy=WORKLOAD_POLICY_NAME \
    --reservation-affinity=specific \
    --reservation=${RESERVATION_PATH}
```

Replace the following:

- NODE_POOL_NAME: the name of your new node pool.
- CLUSTER_NAME: the name of your GKE cluster.
- ZONE: the zone for the node pool, for example, us-central1-a.
- WORKLOAD_POLICY_NAME: the name of the workload policy that you created.
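Because block targeting requires one node pool per sub-block, the repeated creation step can be scripted. The following is a minimal sketch that generates one uniquely named pool per sub-block; the `SUB_BLOCK_COUNT` value and the `tpu7x-pool` naming convention are illustrative assumptions, and the actual `gcloud` invocation is left commented out so you can review it before running.

```shell
#!/usr/bin/env bash
# Sketch: create one node pool per sub-block in the block.
# Assumes CLUSTER_NAME, ZONE, WORKLOAD_POLICY_NAME, and RESERVATION_PATH are
# already exported, and that SUB_BLOCK_COUNT holds the
# reservationSubBlockCount value you looked up earlier.
SUB_BLOCK_COUNT=4          # hypothetical value from reservationSubBlockCount
POOL_PREFIX="tpu7x-pool"   # hypothetical naming convention

# Build a unique node pool name for a given sub-block index.
make_pool_name() {
  echo "${POOL_PREFIX}-$1"
}

for i in $(seq 1 "${SUB_BLOCK_COUNT}"); do
  pool="$(make_pool_name "$i")"
  echo "Would create node pool: ${pool}"
  # Uncomment to actually create the pool (same command as shown above):
  # gcloud container node-pools create "${pool}" \
  #   --cluster="${CLUSTER_NAME}" \
  #   --node-locations="${ZONE}" \
  #   --machine-type=tpu7x-standard-4t \
  #   --num-nodes=16 \
  #   --placement-policy="${WORKLOAD_POLICY_NAME}" \
  #   --reservation-affinity=specific \
  #   --reservation="${RESERVATION_PATH}"
done
```

Keeping the create command commented out makes the loop safe to dry-run first; the echoed names show exactly which pools would be created.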
Sub-block
To retrieve the name of the block and the IDs of the available sub-blocks, complete the following steps in the View the topology and health status of All Capacity Mode reservations document:
- To identify the name of the block, list all reservation blocks and copy the value in the `name:` field. This value is the name of the block, or BLOCK_NAME in this document.
- To identify the names of the sub-blocks, list all sub-blocks of a block and copy the value in the `name:` field for each entry under `reservationSubBlocks`. This value is the name of the sub-block, or SUBBLOCK_NAME in this document.

Set the reservation path:

```
export RESERVATION_PATH="projects/PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/BLOCK_NAME/reservationSubBlocks/SUBBLOCK_NAME"
```

Replace the following:

- PROJECT_ID: your Google Cloud project ID.
- RESERVATION_NAME: the name of your TPU reservation.
- BLOCK_NAME: the name of the block.
- SUBBLOCK_NAME: the name of the sub-block.

Create the node pool:

```
gcloud container node-pools create NODE_POOL_NAME \
    --project=PROJECT_ID \
    --cluster=CLUSTER_NAME \
    --node-locations=ZONE \
    --machine-type=tpu7x-standard-4t \
    --num-nodes=16 \
    --placement-policy=WORKLOAD_POLICY_NAME \
    --reservation-affinity=specific \
    --reservation=${RESERVATION_PATH}
```

Replace the following:

- NODE_POOL_NAME: a unique name for your new node pool, for example, sub-block-pool-1.
- PROJECT_ID: your Google Cloud project ID.
- CLUSTER_NAME: the name of your GKE cluster.
- ZONE: the zone for the node pool, for example, us-central2-b.
- WORKLOAD_POLICY_NAME: the name of the workload policy that you created.
At this stage, the nodes are created, but their Inter-Chip Interconnect (ICI) links are not yet active. Therefore, you can't run workloads on these node pools directly.
To enable all the necessary ICI links to form the slice and allow workloads to be scheduled, create a dynamic slice by using one of the following methods:
- Create a Slice custom resource. Instead of Pods, you use a Slice custom resource to define the specified topology, which the slice controller activates.
- Schedule GKE workloads with Kueue and TAS. Kueue automatically handles the creation and deletion of Slice custom resources. Avoid manually modifying Slice custom resources created by Kueue.
Form a dynamic slice
After you create the node pools, you can form a larger dynamic slice by creating a Slice custom resource. Instead of Pods, you use a Slice custom resource to define the specified topology, which the slice controller then activates.
Verify the status of the nodes and the partitions
To get the node names from the node pool, run the following command:
```
kubectl get nodes -l cloud.google.com/gke-nodepool=${NODE_POOL_NAME}
```

The output is similar to the following:

```
NAME                                 STATUS   ROLES    AGE    VERSION
gke-np-status-update-7b4c890c-0jhp   Ready    <none>   2d1h   v1.35.1-gke.1396002
gke-np-status-update-7b4c890c-377r   Ready    <none>   2d1h   v1.35.1-gke.1396002
gke-np-status-update-7b4c890c-gb51   Ready    <none>   2d1h   v1.35.1-gke.1396002
```

Verify the node's provisioning model:

```
kubectl describe node NODE_NAME | grep "cloud.google.com/gke-accelerator-topology-mode"
```

The output is similar to the following:

```
cloud.google.com/gke-accelerator-topology-mode: PROVISION_ONLY
```

This value matches the `accelerator-topology-mode=provision_only` setting that you defined when you created the workload policy.

Retrieve the node label information:

```
kubectl describe node NODE_NAME | grep "cloud.google.com/gke-tpu-partition-4x4x4-id"
```

Replace NODE_NAME with the name of one of the nodes in the node pool.

The output is similar to the following:

```
cloud.google.com/gke-tpu-partition-4x4x4-id=fba785f80d18552357dcdef6d3d16c27
```

The `cloud.google.com/gke-tpu-partition-4x4x4-state` label indicates whether the node is available to form a dynamic slice. This label supports the following values:

- HEALTHY: the node is healthy and fully functional.
- DEGRADED: the node is impaired but still usable for dynamic slice formation.
- UNHEALTHY: the node is malfunctioning and can't be used to form a slice.
- UNSET: the state is undefined because of insufficient nodes in the node pool.
- INCOMPLETE: not all nodes within the partition are provisioned.

Verify that the node includes the `node.gke.io/created-by-mig` annotation:

```
kubectl describe node NODE_NAME | grep "node.gke.io/created-by-mig"
```

Replace NODE_NAME with the name of one of the nodes in the node pool.

The output is similar to the following:

```
node.gke.io/created-by-mig: projects/735972712744/zones/us-central1-ai1a/team/string
```

The `node.gke.io/created-by-mig` annotation lets the GKE control plane link Kubernetes nodes to their underlying Compute Engine resources.
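To form a slice, you need one unique partition ID per node pool. The following sketch dedupes the IDs; the commented `kubectl` line shows how you might feed it the partition label described above, but the jsonpath expression is an assumption, so verify it against your cluster before relying on it.

```shell
# dedupe_ids: print each partition ID once, given one ID per line on stdin.
dedupe_ids() {
  sort -u
}

# With a live cluster, collect the gke-tpu-partition-4x4x4-id label from every
# node in the pool (jsonpath expression is an assumption; verify it first):
# kubectl get nodes -l cloud.google.com/gke-nodepool="${NODE_POOL_NAME}" \
#   -o jsonpath='{range .items[*]}{.metadata.labels.cloud\.google\.com/gke-tpu-partition-4x4x4-id}{"\n"}{end}' \
#   | dedupe_ids

# Demo with sample IDs: four nodes spanning two distinct partitions.
printf 'fba785f8\n2c794639\nfba785f8\n2c794639\n' | dedupe_ids
```

The deduplicated list maps directly onto the `spec.partitionIds` field of a Slice custom resource.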
Create a Slice custom resource
Define the Slice custom resource:
```
apiVersion: accelerator.gke.io/v1beta1
kind: Slice
metadata:
  # Name of the slice resource
  name: SLICE_NAME
spec:
  # Specify the type of accelerator for this slice
  type: "tpu7x"
  # Define the desired topology for the accelerator slice
  topology: TOPOLOGY
  partitionIds:
  - PARTITION_ID # Example: a9476d1b02bd4f4e75ffffae3bd23c01
  - PARTITION_ID_2
  # ... add more partition IDs as needed
```

Replace the following:

- SLICE_NAME: a name for your slice. The name must meet the `metadata.name` conditions.
- TOPOLOGY: the topology for the dynamic slice. The topology must meet the following conditions:
  - Each dimension of the requested topology must be a multiple of four, for example, `4A x 4B x 4C`.
  - The three values in the topology dimensions, `AxBxC`, must be in non-decreasing order (A ≤ B ≤ C). For example, `4x4x8` is valid, but `4x8x4` is not. This order helps ensure consistent slice formation and avoids unexpected behavior.
  - The product of the three values in the topology dimensions, `A × B × C`, must not exceed 9,216.
- PARTITION_ID: a list of strings that identify the `4x4x4` partitions that make up the slice. Calculate the number of partitions based on the total number of chips, where each partition consists of 64 chips. The number of items in your `spec.partitionIds` list must exactly match the calculated number of partitions, `(A × B × C) / 64`. The partition IDs must meet the following conditions:
  - Each partition must map to a reservation sub-block.
  - All associated sub-blocks must belong to the same reservation.
  - All associated blocks must reside within the same reservation.
  - The associated node pools must have all nodes in the `Ready` state.

The value of the `type` field must be `tpu7x`.

Optionally, to enable the slice controller to automatically retry during slice formation, add the `slice.gke.io/retry-on-failure: "true"` annotation to the Slice custom resource. If the slice isn't created because of the `SliceCreationFailed` status reason, the controller retries until the slice is successfully formed.

For example, to create a `4x8x8` slice, you need to provide four unique partition IDs:

```
apiVersion: accelerator.gke.io/v1beta1
kind: Slice
metadata:
  name: test-slice-example
  annotations:
    slice.gke.io/retry-on-failure: "true" # Optional annotation to retry slice formation
spec:
  type: "tpu7x"
  topology: "4x8x8" # (4*8*8)/64 = 4 partitions
  partitionIds:
  - "p0"
  - "p1"
  - "p2"
  - "p3"
```

Apply the Slice custom resource:
```
kubectl apply -f test-slice-example.yaml
```

At this point, GKE attempts to create the slice. If one of the following issues occurs, slice creation fails, and the status reason in the Slice custom resource is updated to `SliceCreationFailed` or `FAILED`:

- If the selected nodes on the custom resource don't exist, the status reason is `SliceCreationFailed`.
- If any nodes on the custom resource are used by another slice, the status reason is `SliceCreationFailed`.
- If the nodes on the custom resource aren't part of the same reservation block, the status reason is `FAILED`.
- If the nodes aren't in the same reservation, the status reason is `FAILED`.
- If the topology doesn't match the number of partitions, the status reason is `SliceCreationFailed`.

To learn more about the status of the Slice custom resource, see Slice status.
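As a local sanity check before applying a Slice manifest, you can verify the topology rules and compute the required partition count. The following helpers are hypothetical (not gcloud or kubectl commands) and simply encode the constraints listed above.

```shell
# partition_count: number of 4x4x4 partition IDs a topology needs, (A*B*C)/64.
partition_count() {
  local a b c
  IFS=x read -r a b c <<< "$1"
  echo $(( a * b * c / 64 ))
}

# valid_topology: enforce the Slice topology rules described above: each
# dimension a multiple of four, dimensions in non-decreasing order, and a
# product of at most 9,216 chips.
valid_topology() {
  local a b c
  IFS=x read -r a b c <<< "$1"
  (( a % 4 == 0 && b % 4 == 0 && c % 4 == 0 )) || return 1
  (( a <= b && b <= c )) || return 1
  (( a * b * c <= 9216 )) || return 1
}

valid_topology 4x8x8 && echo "4x8x8 needs $(partition_count 4x8x8) partition IDs"
valid_topology 4x8x4 || echo "4x8x4 is invalid: dimensions must be non-decreasing"
```

Running these checks before `kubectl apply` catches the topology-related `SliceCreationFailed` cases without a round trip to the cluster.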
Monitor the status of the Slice custom resource
To check the status of the Slice custom resource, run the following command:
```
kubectl describe slice SLICE_NAME
```

Replace SLICE_NAME with the name of the slice.
The output is similar to the following:
```
Name:         test-slice
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  accelerator.gke.io/v1beta1
Kind:         Slice
Metadata:
  Creation Timestamp:  2026-01-11T23:45:15Z
  Finalizers:
    accelerator.gke.io/slice-finalizer
  Generation:        1
  Resource Version:  1768175347356335006
  UID:               d0b71e5c-be3f-4788-aead-930c7afec4f2
Spec:
  Partition Ids:
    2c79463990ff67c4e3c2648666bfedfa
    ba898ffcac0ad0946e8ff036d771ee53
    [more partition IDs]
  Topology:  8x16x16
  Type:      tpu7x
Status:
  Conditions:
    Last Transition Time:  2026-01-11T23:45:38Z
    Message:               ""
    Reason:                FAILED
    Status:                False
    Type:                  Ready
Events:
```
The reason field in the status of the Slice custom resource indicates the current state of the slice.
The lifecycle of a Slice custom resource follows this progression:

- `SliceNotCreated`: the controller performs initialization and resource checks.
  - If prerequisites are not met, the state transitions to `SliceCreationFailed`.
  - If validation passes, the state transitions to `ACTIVATING`.
- `ACTIVATING`: GKE is forming the slice.
  - If successful, the state transitions to `ACTIVE`.
  - If sub-blocks are degraded but the slice is usable, the state transitions to `ACTIVE_DEGRADED`.
  - If formation fails, the state transitions to `FAILED`.
- `DEACTIVATING`: if the Slice custom resource is deleted or a critical failure occurs in an active or failed state, the slice begins dismantling.
- `INCOMPLETE`: the terminal step before the resource is fully deleted.
To learn more about the status of the Slice custom resource, see Slice status.
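When scripting against this lifecycle, it helps to distinguish transitional reasons from settled ones. The following sketch uses a hypothetical `is_settled` helper; the commented `kubectl` line assumes the `Ready` condition layout shown in the earlier `describe` output, so verify the jsonpath against your cluster first.

```shell
# is_settled: succeed when a Slice status reason is a settled state rather
# than a transitional one, based on the lifecycle described above.
is_settled() {
  case "$1" in
    ACTIVE|ACTIVE_DEGRADED|FAILED|SliceCreationFailed) return 0 ;;
    *) return 1 ;;  # SliceNotCreated, ACTIVATING, DEACTIVATING, INCOMPLETE, etc.
  esac
}

# With a live cluster (jsonpath expression is an assumption; verify it):
# reason="$(kubectl get slice "${SLICE_NAME}" \
#   -o jsonpath='{.status.conditions[?(@.type=="Ready")].reason}')"
# is_settled "${reason}" && echo "slice settled in state: ${reason}"

is_settled ACTIVATING || echo "ACTIVATING is still transitional"
```

A polling loop can call `is_settled` on each iteration and stop once the slice reaches `ACTIVE`, `ACTIVE_DEGRADED`, or a failure state.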
Run workloads on dynamic slicing
When the Slice custom resource is in an ACTIVE state, you can run workloads on it.
The following section includes examples of workloads that use
dynamic slicing. The workloads are submitted as Jobs or JobSets.
Example 1: single workload uses a single slice
The following example shows a workload that uses a single sub-block slice.
Save the following sample manifest as `tpu-job-jax-v7x-64.yaml`:

```
apiVersion: v1
kind: Service
metadata:
  name: headless-svc
spec:
  clusterIP: None
  selector:
    job-name: tpu-job-jax-v7x-64
---
apiVersion: batch/v1
kind: Job
metadata:
  name: tpu-job-jax-v7x-64
spec:
  backoffLimit: 0
  completions: 16
  parallelism: 16
  completionMode: Indexed
  template:
    metadata:
      annotations:
        cloud.google.com/gke-tpu-slice-topology: 4x4x4
    spec:
      nodeSelector:
        cloud.google.com/gke-tpu-topology: 4x4x4
        cloud.google.com/gke-tpu-accelerator: tpu7x
        cloud.google.com/gke-tpu-slice: test-slice
      subdomain: headless-svc
      restartPolicy: Never
      containers:
      - name: tpu-job-jax
        env:
        - name: TPU_ACCELERATOR_TYPE
          value: tpu7x-128
        image: python:3.12
        securityContext:
          privileged: false
        command:
        - bash
        - -c
        - |
          set -ex
          pip install -U --pre jax jaxlib libtpu requests -i https://us-python.pkg.dev/ml-oss-artifacts-published/jax/simple/ -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
          pip list
          python -c 'import jax; print("Total TPU devices (cores):", jax.device_count())'
        resources:
          requests:
            google.com/tpu: 4
          limits:
            google.com/tpu: 4
```

In this manifest:

- `cloud.google.com/gke-tpu-slice-topology` and `cloud.google.com/gke-tpu-topology` define the topology of the dynamic slice.
- `env.value: tpu7x-128` is the TPU accelerator type and the total number of cores in the slice. The number of cores is calculated by multiplying the dimensions of the topology by the number of cores per chip. For example, for a `4x4x4` topology, the calculation is `4 × 4 × 4 × 2 = 128`, where `2` is the number of cores per chip for `tpu7x` (Ironwood). Therefore, the `TPU_ACCELERATOR_TYPE` is `tpu7x-128`.
Apply the `tpu-job-jax-v7x-64.yaml` manifest:

```
kubectl apply -f tpu-job-jax-v7x-64.yaml
```
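The `TPU_ACCELERATOR_TYPE` calculation described above (topology dimensions multiplied by two cores per chip) can be sketched as a small helper; `accelerator_type` is a hypothetical name for illustration, not a GKE tool.

```shell
# accelerator_type: derive the tpu7x accelerator type string from a topology
# by multiplying the dimensions by 2 cores per chip (e.g. 4x4x4 -> tpu7x-128).
accelerator_type() {
  local a b c
  IFS=x read -r a b c <<< "$1"
  echo "tpu7x-$(( a * b * c * 2 ))"
}

accelerator_type 4x4x4   # prints tpu7x-128
accelerator_type 4x8x8   # prints tpu7x-512
```

This keeps the manifest's `env.value` consistent with the topology in the annotations when you adapt the example to a larger slice.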
Example 2: deploy a workload on multislice node pools by using JobSet
This example demonstrates how to deploy a workload on multislice node pools by using JobSet.
Install JobSet:

```
kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.10.1/manifests.yaml
```

Save the following sample manifest as `tpu-multislice-jax.yaml`:

```
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: tpu-multislice-jax
  annotations:
    alpha.jobset.sigs.k8s.io/exclusive-topology: cloud.google.com/gke-tpu-slice
spec:
  failurePolicy:
    maxRestarts: 3
  replicatedJobs:
  - name: slice-job
    replicas: 2
    template:
      spec:
        parallelism: 16
        completions: 16
        backoffLimit: 0
        completionMode: Indexed
        template:
          metadata:
            annotations:
              # The shape of the slice
              cloud.google.com/gke-tpu-slice-topology: 4x4x4
          spec:
            hostNetwork: true
            dnsPolicy: ClusterFirstWithHostNet
            nodeSelector:
              cloud.google.com/gke-tpu-topology: 4x4x4
              cloud.google.com/gke-tpu-accelerator: tpu7x
              # IMPORTANT: Do NOT put 'cloud.google.com/gke-tpu-slice' here manually.
              # The exclusive-topology annotation handles the slice assignment automatically.
            containers:
            - name: jax-worker
              image: python:3.12
              securityContext:
                privileged: true
              ports:
              - containerPort: 8471
              command:
              - bash
              - -c
              - |
                set -ex
                pip install -U --pre jax jaxlib libtpu requests -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
                # Verify JobSet injected the specific slice ID for this worker
                echo "JobSet Index: $JOB_COMPLETION_INDEX"
                python -c 'import jax; print("Total TPU devices:", jax.device_count())'
              resources:
                requests:
                  google.com/tpu: 4
                limits:
                  google.com/tpu: 4
```

Apply the `tpu-multislice-jax.yaml` manifest:

```
kubectl apply -f tpu-multislice-jax.yaml
```

In this manifest:

- The `replicas: 2` field under `replicatedJobs` indicates that JobSet creates two separate Jobs, each corresponding to a `4x4x4` TPU slice.
- The `alpha.jobset.sigs.k8s.io/exclusive-topology: cloud.google.com/gke-tpu-slice` annotation helps ensure that each Job is assigned to a unique TPU slice.
- The `cloud.google.com/gke-tpu-slice-topology: 4x4x4` annotation defines the topology of each dynamic slice.
- The `TPU_ACCELERATOR_TYPE` environment variable is not explicitly set in this example because JobSet handles the slice assignment. The JAX code automatically detects the available TPU devices within its assigned slice.
Delete the slice
Delete the slice:
```
kubectl patch slice $SLICE_NAME --type json \
    -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
```

Verify that the slice is deleted:

```
kubectl get slices
```
Disable the slice controller
To disable the slice controller, remove it from the cluster.
Check that no Slice custom resources remain:

```
kubectl get slice -A
```

Update the cluster to disable the slice controller:

```
gcloud container clusters update ${CLUSTER_NAME} \
    --location=${REGION} \
    --no-enable-slice-controller
```

Delete the Slice custom resource definition:

```
kubectl delete crd slices.accelerator.gke.io
```

Verify that the Slice custom resource definition is deleted:

```
kubectl get crd | grep slices.accelerator.gke.io
```

Remove the following labels added by the slice controller:

- cloud.google.com/gke-tpu-slice
- cloud.google.com/gke-tpu-slice-topology

To remove the labels from a specific node, update the node name and run the following commands:

```
export NODE_NAME="gke-tpu-bdac9600-3bdg"
kubectl label node $NODE_NAME cloud.google.com/gke-tpu-slice- cloud.google.com/gke-tpu-slice-topology-
```

To remove the labels from every node in your cluster, run the following command:

```
kubectl label nodes --all cloud.google.com/gke-tpu-slice- cloud.google.com/gke-tpu-slice-topology-
```

Check the node labels and confirm that they are removed:

```
export NODE_NAME="gke-tpu-bdac9600-3bdg"
kubectl describe node $NODE_NAME | grep "cloud.google.com/gke-tpu-slice"
```
What's next
- Learn more about dynamic slicing concepts.
- Learn about the Slice custom resource.