This document describes the pre-configured, bootable operating system (OS) images that Cluster Director uses to deploy Compute Engine instances in your clusters.
The OS images that you can use in Cluster Director come with pre-configured machine learning (ML) frameworks and libraries. These frameworks and libraries remove the need for manual installation and simplify model creation and training for large-scale workloads. By understanding the available OS images in Cluster Director and their lifecycle, you can choose the right OS image for your workload, keep your clusters secure, and prevent workload disruptions caused by obsolete software.
Slurm OS images for Cluster Director
When you deploy a Slurm cluster, Cluster Director provides OS images with pre-configured ML frameworks and libraries for the nodes in your cluster. These OS images offer the following benefits:
The OS images provision every node in your cluster with a consistent software stack, which includes the required NVIDIA drivers and CUDA versions.
The OS images are extensions of the Ubuntu LTS OS images, and they include all necessary system software for cluster and workload management.
Included software
The custom OS images for Cluster Director include the following software components by default:
OS images: The following table lists the supported image families and image versions for each machine series.

| Machine series | Image family | Image version |
| --- | --- | --- |
| A4X | Ubuntu 24.04 LTS with NVIDIA driver version 580 and CUDA 13 | projects/clusterdirector-public-images/global/images/family/a4x-ubuntu-2404-arm64-nvidia-580-slurm-2505-v20251118 |
| A4, A3 Ultra, or N2 | Ubuntu 22.04 LTS with NVIDIA driver version 580 and CUDA 13 | projects/clusterdirector-public-images/global/images/family/common-ubuntu-2204-amd64-nvidia-580-slurm-2505-v20251113 (default) |
| A4, A3 Ultra, or N2 | Ubuntu 22.04 LTS with NVIDIA driver version 570 and CUDA 12 | projects/clusterdirector-public-images/global/images/family/common-ubuntu-2204-amd64-nvidia-570-slurm-2505-v20250918 |
| A3 Mega | Ubuntu 22.04 LTS with NVIDIA driver version 570 and CUDA 12 | projects/clusterdirector-public-images/global/images/family/a3m-ubuntu-2204-amd64-nvidia-570-slurm-2505-v20251114 (default) |
| A3 Mega | Ubuntu 22.04 LTS with NVIDIA driver version 570 and CUDA 12 | projects/clusterdirector-public-images/global/images/family/a3m-ubuntu-2204-amd64-nvidia-570-slurm-2505-v20250918 |
Orchestration: Slurm version 25.05 and its dependencies, such as MariaDB.
Containerization tools: the NVIDIA enroot container runtime and NVIDIA pyxis, used for running containerized workloads on Slurm clusters.
Drivers: NVIDIA driver version 580 with CUDA Toolkit 13, or version 570 with CUDA Toolkit 12.
Libraries: libraries for GPUDirect RDMA, including ibverbs-utils and rdma-core.
Parallel computing libraries: Open MPI and PMIx for managing parallel processing tasks across the cluster.
Google Cloud integrations: the Ops Agent for monitoring and logging, and Cloud Storage FUSE for accessing Cloud Storage buckets from the cluster nodes.
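If you script cluster deployments, it can help to encode the supported-image table as data so that each machine series resolves to its default image. The following is a minimal sketch, not part of Cluster Director. The names IMAGES_BY_SERIES, default_image, and images_with_driver are hypothetical. The image paths are copied from the table, and the driver filter assumes that the "-nvidia-<driver>-" fragment in each image name reflects the installed NVIDIA driver version.

```python
# Hypothetical lookup helper built from the supported-image table.
# The first path in each list is the default image for that machine series.
PREFIX = "projects/clusterdirector-public-images/global/images/family/"

_COMMON = [
    PREFIX + "common-ubuntu-2204-amd64-nvidia-580-slurm-2505-v20251113",  # default
    PREFIX + "common-ubuntu-2204-amd64-nvidia-570-slurm-2505-v20250918",
]

IMAGES_BY_SERIES: dict[str, list[str]] = {
    "A4X": [PREFIX + "a4x-ubuntu-2404-arm64-nvidia-580-slurm-2505-v20251118"],
    "A4": _COMMON,
    "A3 Ultra": _COMMON,
    "N2": _COMMON,
    "A3 Mega": [
        PREFIX + "a3m-ubuntu-2204-amd64-nvidia-570-slurm-2505-v20251114",  # default
        PREFIX + "a3m-ubuntu-2204-amd64-nvidia-570-slurm-2505-v20250918",
    ],
}


def default_image(machine_series: str) -> str:
    """Return the default image path for a machine series."""
    if machine_series not in IMAGES_BY_SERIES:
        raise ValueError(f"unsupported machine series: {machine_series!r}")
    return IMAGES_BY_SERIES[machine_series][0]


def images_with_driver(machine_series: str, driver: int) -> list[str]:
    """Return image paths whose name contains the given NVIDIA driver version.

    Assumes the "-nvidia-<driver>-" naming convention seen in the table.
    """
    if machine_series not in IMAGES_BY_SERIES:
        raise ValueError(f"unsupported machine series: {machine_series!r}")
    tag = f"-nvidia-{driver}-"
    return [path for path in IMAGES_BY_SERIES[machine_series] if tag in path]


if __name__ == "__main__":
    print(default_image("A3 Mega"))
    print(images_with_driver("N2", 570))
```

Raising ValueError for an unknown series makes an unsupported machine type fail early instead of silently falling back to an image that was not built for it. Because this mapping is a snapshot of the table, update it when new image versions are published.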
For information about patch releases and end-of-support dates for an image family, see the Image family release notes.
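When you cross-check an image against the release notes, it helps to know which Ubuntu release, driver, and Slurm version an image path refers to. The listed image names appear to follow a consistent pattern: a prefix, the Ubuntu release, the CPU architecture, the NVIDIA driver version, the Slurm release, and a trailing vYYYYMMDD date stamp. This pattern is inferred from the names in this document and is not a documented contract. The following minimal sketch uses a hypothetical parse_image helper and ImageInfo class to split a path into those fields. The release notes remain the authoritative source for lifecycle information.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime

# Assumed naming pattern, inferred from the image names listed in this document:
# <prefix>-ubuntu-<YYMM>-<arch>-nvidia-<driver>-slurm-<YYMM>-v<YYYYMMDD>
_NAME_RE = re.compile(
    r"^(?P<prefix>[a-z0-9]+)-ubuntu-(?P<ubuntu>\d{4})-(?P<arch>amd64|arm64)"
    r"-nvidia-(?P<driver>\d+)-slurm-(?P<slurm>\d{4})-v(?P<stamp>\d{8})$"
)


@dataclass(frozen=True)
class ImageInfo:
    ubuntu: str   # Ubuntu release, for example "22.04"
    arch: str     # CPU architecture: "amd64" or "arm64"
    driver: int   # NVIDIA driver version, for example 580
    slurm: str    # Slurm release, for example "25.05"
    stamp: date   # date encoded in the trailing v<YYYYMMDD> suffix


def parse_image(path: str) -> ImageInfo:
    """Split an image path or name into its components (hypothetical helper)."""
    name = path.rsplit("/", 1)[-1]  # keep only the last path segment
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized image name: {name!r}")
    return ImageInfo(
        ubuntu=f"{m['ubuntu'][:2]}.{m['ubuntu'][2:]}",
        arch=m["arch"],
        driver=int(m["driver"]),
        slurm=f"{m['slurm'][:2]}.{m['slurm'][2:]}",
        stamp=datetime.strptime(m["stamp"], "%Y%m%d").date(),
    )


if __name__ == "__main__":
    info = parse_image(
        "projects/clusterdirector-public-images/global/images/family/"
        "a3m-ubuntu-2204-amd64-nvidia-570-slurm-2505-v20251114"
    )
    print(info)
```

The regular expression is deliberately strict, so a name that does not match the assumed pattern raises an error instead of producing misleading fields. To find the most recent of several candidate images, you can compare their date stamps, for example `max(paths, key=lambda p: parse_image(p).stamp)`.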