Storage services provide the essential data architecture that enables high-performance model training, inference, and fine-tuning in the AI Hypercomputer ecosystem. While multiple storage services are available in Google Cloud, the most suitable choice depends on your requirements for I/O, throughput, scale, and latency across the artificial intelligence (AI) and machine learning (ML) lifecycle.
This document introduces and compares storage services in Google Cloud that can best help you optimize GPU or TPU performance. It also provides recommendations on the ideal service for specific AI and ML use cases.
Introduction to storage services
Google Cloud offers multiple storage solutions that are optimized for AI and ML use cases:
Cloud Storage is an object storage system that's designed for processing and storing massive datasets, like those required for training or bulk inference. Cloud Storage offers several capabilities to help you optimize your data storage for AI and ML tasks.
Google Cloud Managed Lustre is a fully managed, POSIX-compliant parallel file system that's designed to deliver the specialized low-latency, high-concurrency metadata performance that training and inference workloads require.
The following sections provide more information about each storage service.
Cloud Storage
Cloud Storage is a foundational object store that's designed to offer global scalability, durability, and cost efficiency. When you use Cloud Storage, you store data as objects in containers called buckets. Cloud Storage offers multiple capabilities for your buckets that help optimize AI and ML workload performance:
Products in the Cloud Storage Rapid family are designed to clear data bottlenecks for your AI and ML workloads by bringing your data closer to your compute resources. These products let you colocate your data in the same zones as your compute workloads and enable high-performance, cost-efficient data storage scaling for your GPU or TPU clusters. Cloud Storage Rapid products include the following:
Rapid Bucket provides the fastest read and write performance in Cloud Storage for zonal buckets. Objects in zonal buckets are stored in the Rapid storage class, a high-performance storage class that's optimized for I/O-intensive workloads. In addition to lower latency, Rapid Bucket delivers significantly higher throughput (up to 15 TB/s) compared to other products and bucket locations in Cloud Storage.
Rapid Cache accelerates data reads from existing buckets without requiring code changes. Rapid Cache is an SSD-backed zonal read cache for Cloud Storage buckets that serves data for read requests. The product offers higher throughput (up to 2.5 TB/s) and lower latency than buckets without a cache.
Rapid Cache is often set up for multi-region buckets, where accelerator capacity is fragmented across Google Cloud regions. Data read from the cache incurs lower data transfer fees than data read directly from a multi-region bucket.
Cloud Storage FUSE is an open source FUSE adapter that lets you mount buckets as local file systems, enabling applications to interact with object storage by using standard file system semantics. This capability lets you leverage the global scalability, durability, and cost efficiency of Cloud Storage with local file access. Cloud Storage FUSE is actively maintained and supported by Google.
Cloud Storage FUSE offers multiple client-side caching and tuning parameters, such as parallel downloads. These capabilities abstract away development complexity and help you achieve peak performance by sharding or parallelizing read streams.
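As an illustration of this kind of client-side tuning, the following configuration-file sketch enables the file cache and parallel downloads in Cloud Storage FUSE. The cache directory path and the size values are hypothetical placeholders, not recommended defaults; check the Cloud Storage FUSE documentation for the options that your gcsfuse version supports.

```yaml
# Hypothetical Cloud Storage FUSE tuning sketch; the path and sizes
# below are illustrative placeholders, not recommended defaults.
cache-dir: /var/cache/gcsfuse   # local SSD-backed directory for the file cache
file-cache:
  max-size-mb: -1               # -1 lets the cache use the available space
  enable-parallel-downloads: true
  parallel-downloads-per-file: 16
metadata-cache:
  ttl-secs: 600                 # cache object metadata for 10 minutes
```

A bucket could then be mounted with a command such as `gcsfuse --config-file=config.yaml BUCKET MOUNT_POINT`.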
Hierarchical namespace enables a true file system structure in buckets and provides efficient data management capabilities, including atomic folder renames and faster file lookups when the bucket is mounted with Cloud Storage FUSE. Hierarchical namespace offers 8 times higher queries per second (QPS) for object reads and writes than buckets without hierarchical namespace. For more information about the benefits of using hierarchical namespace, see performance and management benefits.
We highly recommend enabling hierarchical namespace for workloads that require high-throughput data loading and frequent model checkpointing. Hierarchical namespace is required when you create zonal buckets with Rapid Bucket.
Managed Lustre
Google Cloud Managed Lustre is a high-performance, POSIX-compliant, fully managed parallel file system that's optimized for AI and ML applications. The Managed Lustre architecture is ideally suited for high-throughput, low-latency, and high-metadata-concurrency AI and ML workloads, such as checkpointing, high-speed weight propagation in reinforcement learning, and key-value (KV) caching.
For more information about common use cases for Managed Lustre, see Business cases.
Comparison of storage services
The following table provides a high-level comparison of Cloud Storage and Managed Lustre across key characteristics:
| Characteristic | Cloud Storage | Managed Lustre |
|---|---|---|
| Architecture | Object store | Parallel file system |
| Storage capacity | Scales up to EBs of capacity. | Scales up to 80 PB of capacity, depending on the instance's performance tier. |
| Performance | Supports the following: | Supports the following: |
| Pricing | For details, see Cloud Storage pricing. | For details, see Managed Lustre pricing. |
| Recommendations by requirements | Recommended for applications that need a scalable object store and general cost efficiency for training datasets, asynchronous multi-tier checkpointing, and model weight storage. In particular, Cloud Storage Rapid is recommended for high-performance and cost-efficient data scaling. | Recommended for applications that need a fully POSIX-compliant parallel file system or home directories. Also recommended for latency-sensitive or high-metadata-concurrency workloads, such as KV caching offloads, synchronous checkpointing, and high-speed weight propagation for reinforcement learning. |
Storage service recommendations by use case
| Use case | Storage service recommendation | Reason for recommendation |
|---|---|---|
| Training and preparing datasets | Primary recommendation: Cloud Storage Rapid Bucket | Cloud Storage buckets provide the capacity, throughput scale, cost efficiency, and durability that are often needed for massive volumes of training and inference datasets. When you use Rapid Bucket to create a zonal bucket, the zonal bucket benefits from very high throughput (up to 15 TB/s) and sub-millisecond latency for open files at optimal cost. |
| | Secondary recommendation: Managed Lustre | Managed Lustre provides sub-millisecond latency. It's helpful as a dedicated, ultra-fast workspace for your most intensive training and dataset preparation tasks where low latency and metadata concurrency performance are a high priority. |
| Moving or saving model weights for checkpointing or weight transfers | Primary recommendation: Managed Lustre | Managed Lustre provides sub-millisecond latency and parallel data access, enabling thousands of rollout workers to pull the same weight file simultaneously without slowing down. |
| | Secondary recommendation: Cloud Storage Rapid Bucket | Rapid Bucket is well suited for asynchronous multi-tiered or distributed checkpointing when it's used with GCSFS through fsspec or Cloud Storage FUSE with client-side performance tuning. |
| Storing and downloading models for inference | Primary recommendation: Cloud Storage Rapid Cache or Rapid Bucket | Rapid Cache acts as a booster that helps reduce inference cold starts. With Rapid Cache, the model weights can be pre-warmed in the same zone as your inference nodes, allowing a new inference instance to quickly download the model weights and process its first request. Rapid Bucket serves as a high-performance, accelerated zonal storage engine, letting you locate model weights in the same zone as your inference fleet. For model serving, we recommend using the Run:ai Model Streamer for vLLM for peak download performance. For other inference stacks, tuning the Cloud Storage FUSE parallel download parameters can significantly reduce cold-start latency during model weight downloads. |
| | Secondary recommendation: Managed Lustre | Managed Lustre provides sub-millisecond latency and parallel data access, benefiting performance-sensitive models and the scaling of concurrent GPUs that download the same model simultaneously. |
| KV cache offloading | Primary recommendation: Managed Lustre | Managed Lustre provides sub-millisecond latency and parallel data access, allowing different nodes to "pull" the KV cache and resume chats without reprocessing the whole chat history. |
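To make the FUSE-based checkpointing pattern from the table concrete, the following sketch writes and restores a model checkpoint through ordinary file system calls, the way an application would against a mounted bucket. It is a minimal illustration under stated assumptions: a temporary directory stands in for the Cloud Storage FUSE mount point so the sketch is self-contained, and pickle stands in for a framework's own checkpoint serializer.

```python
import os
import pickle
import tempfile

# In production this would be the Cloud Storage FUSE mount point of a
# bucket (a hypothetical path such as /mnt/gcs). A temporary directory
# stands in here so the sketch runs anywhere.
mount_point = tempfile.mkdtemp()

def save_checkpoint(state, step, root):
    """Write a checkpoint atomically: write to a temp name, then rename.

    On a bucket mounted with hierarchical namespace enabled, renames are
    efficient metadata operations rather than copy-and-delete.
    """
    path = os.path.join(root, f"ckpt-{step:06d}.pkl")
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename on POSIX file systems
    return path

def load_latest_checkpoint(root):
    """Return the state from the highest-numbered checkpoint, or None."""
    ckpts = sorted(p for p in os.listdir(root) if p.endswith(".pkl"))
    if not ckpts:
        return None
    with open(os.path.join(root, ckpts[-1]), "rb") as f:
        return pickle.load(f)

path = save_checkpoint({"step": 100, "weights": [0.1, 0.2]}, 100, mount_point)
restored = load_latest_checkpoint(mount_point)
print(restored["step"])  # → 100
```

Because the mounted bucket behaves like a local file system, the same functions work unchanged whether `root` is a local scratch directory, a Cloud Storage FUSE mount, or a Managed Lustre mount.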
What's next
Learn more about Cloud Storage Rapid, a family of products in Cloud Storage that are designed for AI, ML, and data-intensive analytics.
Learn how to optimize performance when using Cloud Storage FUSE or the Cloud Storage FUSE CSI driver to download datasets.
Learn how to accelerate model loading on Google Kubernetes Engine.