Supported storage services for Cluster Director

This document provides a conceptual overview of the Google Cloud storage services supported for Cluster Director.

High-performance storage is a critical component for large-scale artificial intelligence (AI) and high-performance computing (HPC) workloads. The storage services that you use with Cluster Director handle everything from preparing and loading training data to saving model checkpoints and managing shared user environments.

Shared file system requirements for Slurm

When you deploy a cluster with Slurm as the orchestrator, a shared file system is required for the /home directory on all controller and login nodes. To meet this requirement, you can configure Cluster Director to create a new Filestore or Google Cloud Managed Lustre instance, or use an existing instance. This configuration provides a shared space for user files and configurations.
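
The requirement above can be expressed as a small validation check. The following is a minimal sketch, not a Cluster Director API: the `StorageMount` class, the service identifiers, and the `check_slurm_home` function are hypothetical names chosen for illustration. It only encodes the rule stated here, that a Slurm cluster needs /home backed by Filestore or Managed Lustre, using either a new or an existing instance.

```python
from dataclasses import dataclass

# Hypothetical identifiers for illustration only; not Cluster Director values.
HOME_CAPABLE_SERVICES = {"filestore", "managed-lustre"}


@dataclass(frozen=True)
class StorageMount:
    service: str            # e.g. "filestore", "managed-lustre", "cloud-storage"
    mount_path: str         # where the storage is mounted on the nodes
    existing: bool = False  # True to reuse an existing instance, False to create one


def check_slurm_home(mounts):
    """Return a list of problems with the /home setup (empty list if it is valid)."""
    home_mounts = [m for m in mounts if m.mount_path == "/home"]
    if not home_mounts:
        return ["a Slurm cluster needs a shared file system mounted at /home"]
    # Report every /home mount that uses a service other than the two allowed ones.
    return [
        f"/home must use Filestore or Managed Lustre, not {m.service}"
        for m in home_mounts
        if m.service not in HOME_CAPABLE_SERVICES
    ]
```

For example, `check_slurm_home([StorageMount("managed-lustre", "/home", existing=True)])` returns an empty list, while a configuration with no /home mount returns one problem message.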

Supported storage services

In addition to the Filestore or Managed Lustre instance required for the /home directory, you can attach other storage services to your cluster to meet the specific needs of your workloads. Cluster Director supports the following storage services:

Storage service: Filestore
Features: Filestore is a fully managed, high-performance NFS file storage service. It provides a familiar file system interface and is suitable for a wide range of use cases. A Filestore instance is the default option for home directories in Cluster Director clusters.
Recommended for:
  • Home directories
  • General-purpose file storage
  • Workloads requiring an NFS interface

Storage service: Managed Lustre
Features: Managed Lustre is a high-performance, fully managed parallel file system optimized for AI and HPC applications. With its ultra-low latency and full POSIX support, it's ideal for migrating on-premises AI workloads to Google Cloud.
Recommended for:
  • Home directories
  • Migrating AI or ML workloads to Google Cloud
  • Model simulations
  • Workloads with frequent small reads and writes

Storage service: Cloud Storage
Features: Cloud Storage is a highly scalable, durable, and cost-effective object store. Through integration with Cloud Storage FUSE, Cloud Storage buckets can be mounted as local file systems, making it suitable for storing vast datasets for training and model checkpoints.
Recommended for:
  • Cost-effective data storage
  • Data processing and preparation
  • Storing model training data and checkpoints
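
The recommendations listed above can be treated as a simple lookup when planning which services to attach. The following is a minimal sketch that transcribes the "Recommended for" entries into a dictionary and searches them by keyword. The `RECOMMENDED_FOR` layout and the `services_for` function are hypothetical helpers for illustration, not part of any Google Cloud API.

```python
# Recommendations transcribed from the list above. The dictionary layout and
# the function name are hypothetical, chosen only for this illustration.
RECOMMENDED_FOR = {
    "Filestore": [
        "Home directories",
        "General-purpose file storage",
        "Workloads requiring an NFS interface",
    ],
    "Managed Lustre": [
        "Home directories",
        "Migrating AI or ML workloads to Google Cloud",
        "Model simulations",
        "Workloads with frequent small reads and writes",
    ],
    "Cloud Storage": [
        "Cost-effective data storage",
        "Data processing and preparation",
        "Storing model training data and checkpoints",
    ],
}


def services_for(keyword):
    """Return the services whose recommended uses mention the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        service
        for service, uses in RECOMMENDED_FOR.items()
        if any(needle in use.lower() for use in uses)
    ]


print(services_for("home directories"))  # ['Filestore', 'Managed Lustre']
print(services_for("checkpoints"))       # ['Cloud Storage']
```

A keyword match is only a starting point. The feature descriptions above, such as NFS access, POSIX support, or object storage mounted through Cloud Storage FUSE, still determine which service fits a given workload.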

What's next?