This page describes how to enable and manage graphics processing unit (GPU) workloads on Google Distributed Cloud. To take advantage of this functionality, you must have a Distributed Cloud hardware configuration that contains GPUs. GPU support is disabled by default. You must explicitly enable GPU support on your Distributed Cloud cluster. Keep in mind that Distributed Cloud Servers don't support GPU workloads.
To plan for and order such a configuration, choose configuration 2 in the Distributed Cloud planning and ordering documentation.
If your Distributed Cloud rack includes GPUs, you can configure your Distributed Cloud workloads to use GPU resources.
Distributed Cloud workloads can run in containers and on virtual machines:
- GPU workloads running in containers. When you enable GPU support, all GPU resources on your Distributed Cloud cluster are initially allocated to workloads running in containers. The GPU driver for running GPU-based containerized workloads is included in Distributed Cloud. Within each container, GPU libraries are mounted at /opt/nvidia.
- GPU workloads running on virtual machines. To run a GPU-based workload on a virtual machine, you must allocate GPU resources on the target Distributed Cloud node to virtual machines, as described later on this page. Doing so bypasses the built-in GPU driver and passes the GPUs directly through to virtual machines. You must manually install a compatible GPU driver on each virtual machine's guest operating system. You must also secure all the licensing required to run specialized GPU drivers on your virtual machines. 
To confirm that GPUs are present on a Distributed Cloud
node, verify that the node has the vm.cluster.gke.io.gpu=true label. If the
label is not present on the node, then there are no GPUs installed on the
corresponding Distributed Cloud physical machine.
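The label check described above can be sketched with a label selector. This is a minimal sketch, not an official procedure: the function name list_gpu_nodes is hypothetical, and it assumes that kubectl is already configured for your Distributed Cloud cluster.

```shell
# Hypothetical helper: list only the nodes that carry the GPU label.
# Assumes kubectl is already configured for the target cluster.
list_gpu_nodes() {
  kubectl get nodes --selector vm.cluster.gke.io.gpu=true
}
```

Run list_gpu_nodes after defining it. A node that does not appear in the output does not carry the label.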
Enable GPU support
To enable GPU support for your workloads, you must create or modify the VMRuntime custom
resource that contains the enableGPU parameter with its value set to true,
and then apply it to your Distributed Cloud cluster.
For example:
    apiVersion: vm.cluster.gke.io/v1
    kind: VMRuntime
    metadata:
      name: vmruntime
    spec:
      # Enable GPU support
      enableGPU: true
Depending on the type of cluster on which you want to enable the VM Runtime on GDC virtual machine subsystem, do one of the following:
- For Cloud control plane clusters on which you have not yet enabled the VM Runtime on GDC
virtual machine subsystem, you must manually create the VMRuntime resource.
- For Cloud control plane clusters on which you have already enabled the VM Runtime on GDC
virtual machine subsystem, you must edit the existing VMRuntime resource.
- For local control plane clusters, you must edit the existing VMRuntime resource.
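The two cases above (create or edit) might look like the following sketch. The function names and the vmruntime.yaml file name are hypothetical. The resource name vmruntime comes from the example manifest. The edit command assumes that the VMRuntime resource can be addressed without a namespace, which you should confirm on your cluster.

```shell
# Hypothetical helper: create the VMRuntime resource from a local manifest.
# vmruntime.yaml is an assumed file name that holds the example manifest.
create_vmruntime() {
  kubectl apply --filename "${1:-vmruntime.yaml}"
}

# Hypothetical helper: open the existing resource named "vmruntime" in your
# editor so that you can set enableGPU to true.
# Assumption: the resource is addressable without a --namespace flag.
edit_vmruntime() {
  kubectl edit vmruntime vmruntime
}
```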
This same VMRuntime resource also configures VM Runtime on GDC
support on your cluster
by using the enable parameter. Make sure that you configure the two parameters
according to your workload needs. You do not have to enable
VM Runtime on GDC support to enable GPU support on your
Distributed Cloud cluster.
The following table describes the available configurations.
| enable value | enableGPU value | Resulting configuration |
|---|---|---|
| false | false | Workloads run only in containers and cannot use GPU resources. | 
| false | true | Workloads run only in containers and can use GPU resources. | 
| true | true | Workloads can run on virtual machines and in containers. Both types of workloads can use GPU resources. | 
| true | false | Workloads can run on virtual machines and in containers. Neither type of workload can use GPU resources. | 
Verify that GPU support has been enabled
To verify that GPU support has been enabled on your cluster, use the following command:
kubectl get pods --namespace vm-system
The command returns output similar to the following example:
NAME                                         READY   STATUS     RESTARTS  AGE
...
gpu-controller-controller-manager-vbv4w      2/2     Running    0         31h
kubevirt-gpu-dp-daemonset-gxj7g              1/1     Running    0         31h
nvidia-gpu-dp-daemonset-bq2vj                1/1     Running    0         31h
...
In the output, you can verify that the GPU controller Pods have been deployed
and are running in the vm-system namespace.
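If you want to script this check, one possible approach is to filter the Pod list and confirm that every GPU-related Pod reports Running. The sketch below is illustrative only: the function name gpu_pods_running is hypothetical, and the name patterns are taken from the sample output, so they might differ on your cluster.

```shell
# Hypothetical helper: succeeds only if at least one GPU-related Pod is listed
# and every such Pod reports the Running status (third column of the output).
# The name patterns come from the sample output and may differ on your cluster.
gpu_pods_running() {
  kubectl get pods --namespace vm-system --no-headers |
    awk '/gpu-controller|gpu-dp-daemonset/ { seen = 1; if ($3 != "Running") bad = 1 }
         END { exit (seen && !bad) ? 0 : 1 }'
}
```

For example, `gpu_pods_running && echo "GPU Pods are running"` prints the message only when the check passes.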
Allocate GPU resources
By default, when you enable GPU support on your Distributed Cloud cluster, all GPU resources on each node in the cluster are allocated to containerized workloads. To customize the allocation of GPU resources on each node, complete the steps in this section.
Configure GPU resource allocation
- To allocate GPU resources on a Distributed Cloud node, use the following command to edit the GPUAllocation custom resource on the target node:

      kubectl edit gpuallocation NODE_NAME --namespace vm-system

  Replace NODE_NAME with the name of the target Distributed Cloud node.

  In the following example, the command's output shows the factory-default GPU resource allocation. By default, all GPU resources are allocated to containerized (pod) workloads, and no GPU resources are allocated to virtual machine (vm) workloads:

      ...
      spec:
        pod: 2  # Number of GPUs allocated for container workloads
        vm: 0   # Number of GPUs allocated for VM workloads
- Set your GPU resource allocations as follows:

  - To allocate a GPU resource to containerized workloads, increase the value of the pod field and decrease the value of the vm field by the same amount.
  - To allocate a GPU resource to virtual machine workloads, increase the value of the vm field and decrease the value of the pod field by the same amount.

  The total number of allocated GPU resources must not exceed the number of GPUs installed on the physical Distributed Cloud machine on which the node runs; otherwise, the node rejects the invalid allocation.

  In the following example, two GPU resources have been reallocated from containerized (pod) workloads to virtual machine (vm) workloads:

      ...
      spec:
        pod: 0  # Number of GPUs allocated for container workloads
        vm: 2   # Number of GPUs allocated for VM workloads

  When you finish, apply the modified GPUAllocation resource to your cluster and wait for its status to change to AllocationFulfilled.
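Before you edit the resource, you can sanity-check a planned split against the constraint stated above (the pod and vm values together must not exceed the installed GPU count). The following is a minimal local sketch. The function name allocation_fits is hypothetical, and the check runs entirely on your workstation without contacting the cluster.

```shell
# Hypothetical pre-check: succeeds when the planned pod and vm values fit
# within the number of GPUs installed on the physical machine.
# Arguments: POD_COUNT VM_COUNT INSTALLED_GPUS (non-negative integers).
allocation_fits() {
  [ "$1" -ge 0 ] && [ "$2" -ge 0 ] && [ $(( $1 + $2 )) -le "$3" ]
}
```

For example, on a machine with two GPUs, `allocation_fits 0 2 2` succeeds, while `allocation_fits 2 1 2` fails because three GPUs would be requested.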
Check GPU resource allocation
- To check your GPU resource allocation, use the following command:

      kubectl describe gpuallocations NODE_NAME --namespace vm-system

  Replace NODE_NAME with the name of the target Distributed Cloud node.

  The command returns output similar to the following example:

      Name:         mynode1
      ...
      spec:
        node:  mynode1
        pod:   2   # Number of GPUs allocated for container workloads
        vm:    0   # Number of GPUs allocated for VM workloads
      Status:
        Allocated:  true
        Conditions:
          Last Transition Time:  2022-09-23T03:14:10Z
          Message:
          Observed Generation:   1
          Reason:                AllocationFulfilled
          Status:                True
          Type:                  AllocationStatus
          Last Transition Time:  2022-09-23T03:14:16Z
          Message:
          Observed Generation:   1
          Reason:                DeviceStateUpdated
          Status:                True
          Type:                  DeviceStateUpdated
        Consumption:
          pod:         0/2   # Number of GPUs currently consumed by container workloads
          vm:          0/0   # Number of GPUs currently consumed by VM workloads
        Device Model:  Tesla T4
      Events:          <none>
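To wait for an allocation to take effect, one option is to look for the AllocationFulfilled reason in the describe output. This sketch is an assumption-laden convenience rather than a documented interface: the function name allocation_fulfilled is hypothetical, and it matches the text layout of the sample output, which could change between releases.

```shell
# Hypothetical helper: succeeds when the describe output for the given node
# contains the AllocationFulfilled reason shown in the sample output.
# Argument: the name of the target Distributed Cloud node.
allocation_fulfilled() {
  kubectl describe gpuallocations "$1" --namespace vm-system |
    grep 'Reason:[[:space:]]*AllocationFulfilled' > /dev/null
}
```

A simple polling loop built on it could be `until allocation_fulfilled NODE_NAME; do sleep 5; done`.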
Configure a container to use GPU resources
To configure a container running on Distributed Cloud to use GPU resources, configure its specification as shown in the following example, and then apply it to your cluster:
    apiVersion: v1
    kind: Pod
    metadata:
      name: my-gpu-pod
    spec:
      containers:
      - name: my-gpu-container
        image: CUDA_TOOLKIT_IMAGE
        command: ["/bin/bash", "-c", "--"]
        args: ["while true; do sleep 600; done;"]
        env:
        resources:
          requests:
            nvidia.com/gpu-pod-TESLA_T4: 2
          limits:
            nvidia.com/gpu-pod-TESLA_T4: 2
      nodeSelector:
        kubernetes.io/hostname: NODE_NAME
Replace the following:
- CUDA_TOOLKIT_IMAGE: the full path and name of the NVIDIA CUDA toolkit image. The version of the CUDA toolkit must match the version of the NVIDIA driver running on your Distributed Cloud cluster. To determine your NVIDIA driver version, see the Distributed Cloud release notes. To find the matching CUDA toolkit version, see CUDA Compatibility.
- NODE_NAME: the name of the target Distributed Cloud node.
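After you fill in the placeholders, applying the manifest and checking the mounted GPU libraries might look like the following sketch. The function names and the my-gpu-pod.yaml file name are hypothetical. The Pod name my-gpu-pod and the /opt/nvidia path come from this page. The sketch assumes that the container image provides the ls command.

```shell
# Hypothetical helper: apply the Pod manifest from a local file.
# my-gpu-pod.yaml is an assumed file name that holds the example manifest.
apply_gpu_pod() {
  kubectl apply --filename "${1:-my-gpu-pod.yaml}"
}

# Hypothetical helper: list the GPU libraries mounted at /opt/nvidia inside
# the Pod. Assumes the container image provides the ls command.
list_gpu_libraries() {
  kubectl exec "${1:-my-gpu-pod}" -- ls /opt/nvidia
}
```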
Configure a virtual machine to use GPU resources
To configure a virtual machine running on
Distributed Cloud to use GPU resources, configure its
VirtualMachine resource specification as shown in the following example,
and then apply it to your cluster:
    apiVersion: vm.cluster.gke.io/v1
    kind: VirtualMachine
    ...
    spec:
      ...
      gpu:
        model: nvidia.com/gpu-vm-TESLA_T4
        quantity: 2