Overview of storage services for AI and ML workloads in AI Hypercomputer

Storage services provide the essential data architecture that helps enable high-performance model training, inference, and fine-tuning in the AI Hypercomputer ecosystem. Although multiple storage services are available in Google Cloud, the most suitable choice depends on your requirements for I/O, throughput, scale, and latency across use cases in the artificial intelligence (AI) and machine learning (ML) lifecycle.

This document introduces and compares storage services in Google Cloud that can best help you optimize GPU or TPU performance. It also provides recommendations on the ideal service for specific AI and ML use cases.

Introduction to storage services

Google Cloud offers multiple storage solutions that are optimized for AI and ML use cases:

  • Cloud Storage is an object storage system that's designed for processing and storing massive datasets, like those required for training or bulk inference. Cloud Storage offers several capabilities to help you optimize your data storage for AI and ML tasks.

  • Google Cloud Managed Lustre is a fully managed, POSIX-compliant parallel file system that's designed for the specialized low-latency, high-concurrency metadata performance that training and inference workloads require.

The following sections provide more information about each storage service.

Cloud Storage

Cloud Storage is a foundational object store that's designed to offer global scalability, durability, and cost efficiency. When you use Cloud Storage, you store data as objects in containers called buckets. Cloud Storage offers multiple capabilities for your buckets that help optimize AI and ML workload performance:

  • Products in the Cloud Storage Rapid family are designed to clear data bottlenecks for your AI and ML workloads by bringing your data closer to your compute resources. These products let you colocate your data in the same zones as your compute workloads, and they enable high-performance, cost-efficient data storage scaling for your GPU or TPU clusters. Cloud Storage Rapid products include the following:

    • Rapid Bucket provides the fastest read and write performance in Cloud Storage for zonal buckets. Objects in zonal buckets are stored in the Rapid storage class, a high-performance storage class that's optimized for I/O-intensive workloads. In addition to lower latency, Rapid Bucket delivers significantly higher throughput (up to 15 TB/s) compared to other products and bucket locations in Cloud Storage.

    • Rapid Cache accelerates data reads from existing buckets without requiring code changes. Rapid Cache is an SSD-backed zonal read cache for Cloud Storage buckets that serves data read requests. The product offers higher throughput (up to 2.5 TB/s) and lower latency than buckets without a cache.

      Rapid Cache is often set up for multi-region buckets, where accelerator capacity is fragmented across Google Cloud regions. Data read from the cache incurs lower data transfer fees than data read directly from a multi-region bucket.

  • Cloud Storage FUSE is an open source FUSE adapter that lets you mount buckets as local file systems, enabling applications to interact with object storage by using standard file system semantics. This capability lets you leverage the global scalability, durability, and cost efficiency of Cloud Storage with local file access. Cloud Storage FUSE is actively maintained and supported by Google.

    Cloud Storage FUSE offers multiple client-side caching and tuning parameters, such as parallel downloads. These capabilities can abstract development complexities and help achieve peak performance by sharding or parallelizing streams.

  • Hierarchical namespace enables a true file system structure in buckets and provides efficient data management capabilities, including atomic folder renames and faster file lookups when the bucket is mounted with Cloud Storage FUSE. Hierarchical namespace offers 8 times higher queries per second (QPS) for object reads and writes than buckets without hierarchical namespace. For more information about the benefits of using hierarchical namespace, see performance and management benefits.

    Enabling hierarchical namespace is highly recommended for workloads that require high-throughput data loading and frequent model checkpointing. Hierarchical namespace is required when you create zonal buckets with Rapid Bucket.
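As a sketch of how these capabilities fit together, the following commands create a bucket with hierarchical namespace enabled and then mount it with Cloud Storage FUSE with the file cache and parallel downloads turned on. The bucket name, location, mount point, and cache directory are placeholders, and the exact flags can vary by gcloud and gcsfuse version:

```shell
# Create a bucket with hierarchical namespace enabled (hierarchical
# namespace requires uniform bucket-level access).
gcloud storage buckets create gs://my-training-bucket \
    --location=us-central1 \
    --uniform-bucket-level-access \
    --enable-hierarchical-namespace

# Cloud Storage FUSE config that enables the file cache and
# parallel downloads for large sequential reads.
cat > gcsfuse-config.yaml <<'EOF'
cache-dir: /mnt/local-ssd/gcsfuse-cache
file-cache:
  max-size-mb: -1                  # no cache size limit
  enable-parallel-downloads: true  # shard large reads across streams
EOF

# Mount the bucket as a local file system.
gcsfuse --config-file=gcsfuse-config.yaml my-training-bucket /mnt/gcs
```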

Managed Lustre

Google Cloud Managed Lustre is a high-performance, POSIX-compliant, fully managed parallel file system that's optimized for AI and ML applications. The Managed Lustre architecture is ideally suited for high-throughput, low-latency, and high-metadata-concurrency AI and ML workloads, such as checkpointing, high-speed weight propagation in reinforcement learning, and key-value (KV) caching.

For more information about common use cases for Managed Lustre, see Business cases.
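As an illustrative sketch (not an official procedure), a Managed Lustre instance is mounted with the standard Lustre client once the client packages are installed on the VM. The MGS address (10.0.0.2) and file system name (lustrefs) below are placeholders for the values that your instance reports:

```shell
# Mount a Managed Lustre instance with the standard Lustre client.
# 10.0.0.2 and "lustrefs" are placeholders for the instance's
# MGS address and file system name.
sudo mkdir -p /mnt/lustre
sudo mount -t lustre 10.0.0.2@tcp:/lustrefs /mnt/lustre
```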

Comparison of storage services

The following table provides a high-level comparison of Cloud Storage and Managed Lustre across key characteristics:

Architecture

  Cloud Storage: Object store

    • Data is stored in flat buckets by default, in zonal, regional, dual-region, or multi-region locations. Dual-region and multi-region buckets offer geo-redundancy, and data access can be accelerated with Cloud Storage Rapid capabilities.
    • You can optionally enable hierarchical namespace to create buckets that support storing data in a file system structure.
    • You can optionally use Cloud Storage FUSE to mount buckets as local file systems.

  Managed Lustre: Parallel file system

    • Data is stored as files in Managed Lustre instances, which are mounted as local file systems across your accelerator clusters without any additional tuning.

Storage capacity

  Cloud Storage: Scales up to EBs of capacity.

  Managed Lustre: Scales up to 80 PB of capacity, depending on the instance's performance tier.

Performance

  Cloud Storage supports the following:

    • Sub-millisecond latency for open files with Rapid Bucket
    • Tens of millions of IOPS/TiB with Rapid Bucket
    • Up to 2.5 TB/s of bandwidth with Rapid Cache
    • Up to 15 TB/s of bandwidth with Rapid Bucket
    • Bandwidth increase requests

  Managed Lustre supports the following:

    • Sub-millisecond latency
    • Tens of millions of IOPS/TiB
    • Up to 10 TB/s of bandwidth

Pricing

  Cloud Storage: For details, see Cloud Storage pricing.

  Managed Lustre: For details, see Managed Lustre pricing.

Recommendations by requirements

  Cloud Storage: Recommended for applications that need a scalable object store and general cost efficiency for training datasets, asynchronous multi-tier checkpointing, and model weight storage. In particular, Cloud Storage Rapid is recommended for high-performance and cost-efficient data scaling.

  Managed Lustre: Recommended for applications that need a fully POSIX-compliant parallel file system or home directories. Also recommended for latency-sensitive or high-metadata-concurrency workloads, such as KV cache offloads, synchronous checkpointing, and high-speed weight propagation for reinforcement learning.

Storage service recommendations by use case

Training and preparing datasets

  • Primary recommendation: Cloud Storage Rapid Bucket. Cloud Storage buckets provide the capacity, throughput, cost efficiency, and durability that massive volumes of training and inference datasets often require. When you use Rapid Bucket to create a zonal bucket, the bucket benefits from very high throughput (up to 15 TB/s) and sub-millisecond latency for open files at optimal cost.

  • Secondary recommendation: Managed Lustre. Managed Lustre provides sub-millisecond latency. It's helpful as a dedicated, ultra-fast workspace for your most intensive training and dataset preparation tasks where low latency and metadata concurrency are high priorities.

Moving or saving model weights for checkpointing or weight transfers

  • Primary recommendation: Managed Lustre. Managed Lustre provides sub-millisecond latency and parallel data access, enabling thousands of rollout workers to pull the same weight file simultaneously without slowing down.

  • Secondary recommendation: Cloud Storage Rapid Bucket. Rapid Bucket is well suited for asynchronous multi-tiered or distributed checkpointing when it's used with GCSFS through fsspec, or with Cloud Storage FUSE and client-side performance tuning.

Storing and downloading models for inference

  • Primary recommendation: Cloud Storage Rapid Cache or Rapid Bucket.

    Rapid Cache acts as a booster that helps reduce inference cold starts. With Rapid Cache, model weights can be pre-warmed in the same zone as your inference nodes, which lets a new inference instance quickly download the weights and process its first request.

    Rapid Bucket serves as a high-performance, accelerated zonal storage engine, letting you locate model weights in the same zone as your inference fleet.

    For model serving, we recommend the Run:ai Model Streamer for vLLM for peak download performance. For other inference stacks, tuning the Cloud Storage FUSE parallel download parameters can significantly reduce cold-start latency during model weight downloads.

  • Secondary recommendation: Managed Lustre. Managed Lustre provides sub-millisecond latency and parallel data access, which benefits performance-sensitive models and scales well when many GPUs download the same model concurrently.

KV cache offloading

  • Primary recommendation: Managed Lustre. Managed Lustre provides sub-millisecond latency and parallel data access, allowing different nodes to pull the KV cache and resume chats without reprocessing the whole chat history.
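To illustrate the fsspec-based checkpointing path mentioned for Cloud Storage, the following Python sketch writes and reads raw checkpoint bytes through an fsspec URL. With the gcsfs package installed and application-default credentials configured, a gs:// URL routes these calls to Cloud Storage; the bucket and object names shown are hypothetical:

```python
# Sketch: saving a training checkpoint through fsspec.
# Assumes the fsspec package is installed; with gcsfs also installed,
# the same functions work against gs:// URLs.
import fsspec


def save_checkpoint(state_bytes: bytes, url: str) -> None:
    """Write raw checkpoint bytes to any fsspec-supported URL."""
    with fsspec.open(url, "wb") as f:
        f.write(state_bytes)


def load_checkpoint(url: str) -> bytes:
    """Read raw checkpoint bytes back from an fsspec-supported URL."""
    with fsspec.open(url, "rb") as f:
        return f.read()


# With gcsfs and credentials in place (hypothetical bucket name):
# save_checkpoint(state, "gs://my-ckpt-bucket/step-1000/ckpt.bin")
```

Frameworks that checkpoint through fsspec-compatible file systems can target Cloud Storage the same way, simply by passing a gs:// path.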

What's next