You can use Cluster Director to deploy, manage, and monitor clusters that run artificial intelligence (AI), machine learning (ML), or high performance computing (HPC) workloads. Cluster Director is a Google Cloud product that automates the complex setup and configuration of clusters, helping you configure compute, networking, and storage resources to maximize performance and minimize downtime.
Cluster Director is designed for IT administrators and AI researchers who want to focus on running their AI, ML, or HPC workloads instead of managing a cluster.
Cluster Director key features
Cluster Director integrates multiple Google Cloud services into a cohesive system that simplifies cluster management and provides the following features:
Automated cluster lifecycle management: the Cluster Director platform is built around a unified control plane with its own API and UI in the Google Cloud console. This control plane is a central place to perform all cluster lifecycle operations, letting you programmatically deploy, manage, and scale clusters with all their necessary compute, networking, and storage resources.
Managed cluster orchestration and scaling: Cluster Director uses a managed Slurm environment for fault-tolerant and scalable job scheduling. This managed environment lets you dynamically scale an existing cluster by adding machine configurations to it, or by editing its current ones, to adjust resources based on your workload demands.
Resilient, performance-optimized infrastructure: clusters are built on a foundation of performance-optimized hardware, including compute-optimized and accelerator-optimized virtual machine (VM) instances. The system uses topology-aware scheduling to colocate VMs and minimize network latency. It also features built-in resiliency, with autohealing capabilities that automatically detect and replace failed VMs to help minimize downtime.
Integrated observability: Cluster Director provides built-in tools for monitoring the health, topology, and performance of your clusters through a dashboard in the Google Cloud console. These tools give you an at-a-glance view of your cluster's component health.
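To make the idea of topology-aware scheduling more concrete, the following Python sketch shows one simple way a scheduler could colocate the VMs for a job. It is an illustration only, not Cluster Director's or Slurm's actual algorithm: the `(block, rack)` topology labels, the `place_job` function, and the VM names are all hypothetical. The sketch prefers a single rack, then a single block, and only spans blocks when nothing smaller fits.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical topology label for a VM: (block, rack).
# This is an assumption for illustration, not Cluster Director's data model.
Topology = Tuple[str, str]


def place_job(free_vms: Dict[str, Topology], vm_count: int) -> Optional[List[str]]:
    """Pick vm_count free VMs, preferring the tightest topology domain that fits."""
    if vm_count <= 0 or vm_count > len(free_vms):
        return None

    # Index the free VMs by rack and by block, in a deterministic name order.
    by_rack = defaultdict(list)
    by_block = defaultdict(list)
    for name, (block, rack) in sorted(free_vms.items()):
        by_rack[(block, rack)].append(name)
        by_block[block].append(name)

    # Try a single rack first, then a single block. Among the domains that
    # fit, choose the smallest so that larger domains stay free for larger jobs.
    for domains in (by_rack, by_block):
        fitting = [vms for vms in domains.values() if len(vms) >= vm_count]
        if fitting:
            return min(fitting, key=len)[:vm_count]

    # Otherwise span blocks, filling from the largest blocks first to keep
    # the number of blocks involved small.
    chosen: List[str] = []
    for vms in sorted(by_block.values(), key=len, reverse=True):
        chosen.extend(vms[: vm_count - len(chosen)])
        if len(chosen) == vm_count:
            break
    return chosen


# Hypothetical inventory of free VMs and their topology labels.
free = {
    "vm-a1": ("block-1", "rack-1"),
    "vm-a2": ("block-1", "rack-1"),
    "vm-b1": ("block-1", "rack-2"),
    "vm-c1": ("block-2", "rack-3"),
    "vm-c2": ("block-2", "rack-3"),
    "vm-c3": ("block-2", "rack-3"),
}

print(place_job(free, 2))  # ['vm-a1', 'vm-a2'] (fits in rack-1)
print(place_job(free, 3))  # ['vm-c1', 'vm-c2', 'vm-c3'] (fits in rack-3)
print(place_job(free, 4))  # ['vm-a1', 'vm-a2', 'vm-b1', 'vm-c1'] (spans blocks)
```

Choosing the smallest domain that fits is a best-fit heuristic: it keeps VMs for a job close together while leaving larger contiguous domains available for larger jobs. A production scheduler would also weigh factors this sketch ignores, such as machine type, reservations, and VM health.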
Cluster Director use cases
Cluster Director is designed to meet the needs of the following use cases:
| Use case | Example workloads |
|---|---|
| Large-scale AI workloads | |
| HPC workloads | |
Pricing
There is no charge for using Cluster Director itself. Instead, you incur charges for the underlying Google Cloud resources that your clusters use, such as compute, storage, and networking resources. For more information, see the related pricing documentation:
Pricing for storage services: