Rapid Bucket

This page describes Rapid Bucket, a capability that lets you store objects in the Rapid storage class by setting a zone as a bucket's location. Colocating your data storage with your compute resources in the same zone delivers significantly lower latency and higher throughput than other storage classes in Cloud Storage. Workloads in other zones and regions can also access the bucket, with performance that depends on their network distance from the bucket's zone.

To create a zonal bucket using Rapid Bucket, see Create zonal buckets. You can view the list of supported locations in Zones. To read and append to objects in zonal buckets, see Use objects in zonal buckets.

Benefits

Rapid Bucket is built to remove storage bottlenecks and is ideal for your most data-intensive applications, such as AI/ML and data analytics. Rapid Bucket supports sub-millisecond latency, up to 15 TB/s of aggregate throughput, and up to 20 million queries per second (QPS). Low latency enables fast data retrieval, which helps real-time inference applications perform at scale. High throughput and QPS help keep GPU clusters saturated with data, which can significantly reduce model training times.

Rapid Bucket terminology

The Cloud Storage documentation uses the following terms:

Rapid Bucket: The product that enables buckets to be created with a zonal location and the Rapid storage class.
Rapid storage: The storage class that offers the highest data access and I/O operation performance in Cloud Storage. When you use Rapid Bucket, you create a bucket that uses Rapid storage. For more information about Rapid storage, see Storage classes.
Zonal bucket: A bucket that's located in a zone. Objects in zonal buckets are always stored in Rapid storage and are appendable.

Capabilities of zonal buckets

In addition to providing low latency and high throughput, zonal buckets enable you to do the following:

- Append to objects in the zonal bucket without performing a full object rewrite.
- Open objects and maintain a stream as you perform operations, which accelerates subsequent reads and writes.

Use cases

Rapid Bucket is best suited to AI/ML and other data-intensive workloads, such as model checkpointing, evaluation, and serving, as well as logging and messaging queues. You can also use it for streaming data or as storage for databases.

To take full advantage of the low latency and high throughput provided by Rapid Bucket, make sure to enable gRPC direct connectivity.

Access to objects in zonal buckets

To get the performance benefits of a zonal bucket, open objects for streaming and maintain the stream as you perform operations on them. After a stream is established, subsequent read or write operations on the object have very low latency. For example, when reading a Parquet file, you can read the file's metadata (the footer) and then the specific rows you need over the same open stream. This is more efficient than issuing a separate request for each step.
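The following minimal sketch illustrates the Parquet example: the footer lookup and the later data read both go through one open stream. `InMemoryStream` and its `read_range` method are hypothetical stand-ins for a read stream opened by a client library, not a Cloud Storage API. The file layout it relies on (a 4-byte little-endian footer length followed by the magic bytes `PAR1` at the end of the file) comes from the Parquet format.

```python
import struct


class InMemoryStream:
    """Hypothetical stand-in for one open read stream on a zonal object.

    It serves byte ranges from an in-memory buffer and counts how many
    range reads were issued over the same stream.
    """

    def __init__(self, data: bytes) -> None:
        self._data = data
        self.reads = 0

    def size(self) -> int:
        return len(self._data)

    def read_range(self, offset: int, length: int) -> bytes:
        self.reads += 1
        return self._data[offset:offset + length]


PARQUET_MAGIC = b"PAR1"


def read_parquet_footer(stream: InMemoryStream) -> bytes:
    """Return the raw Thrift-encoded Parquet footer using two reads on one stream."""
    size = stream.size()
    if size < 12:
        raise ValueError("too small to be a Parquet file")
    # The file ends with a 4-byte little-endian footer length, then the magic "PAR1".
    tail = stream.read_range(size - 8, 8)
    if tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing Parquet magic bytes")
    (footer_len,) = struct.unpack("<I", tail[:4])
    # The footer sits immediately before those last 8 bytes.
    return stream.read_range(size - 8 - footer_len, footer_len)


# Usage with a synthetic file: magic + data + footer + footer length + magic.
body = b"column-chunk-bytes"
fake_footer = b"fake-thrift-footer"
data = PARQUET_MAGIC + body + fake_footer + struct.pack("<I", len(fake_footer)) + PARQUET_MAGIC
stream = InMemoryStream(data)
footer = read_parquet_footer(stream)
# A real reader would decode the footer to locate row groups; here the data starts at byte 4.
rows = stream.read_range(len(PARQUET_MAGIC), len(body))
```

All three reads reuse the same stream object, which is the access pattern that benefits from a stream kept open on a zonal object.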

Once established, object streams are kept open by default when you access zonal bucket objects using Cloud Storage FUSE or the Cloud Storage client libraries.

You can open any number of read streams to an object, from any number of hosts.

Appending objects

You can append data to objects in zonal buckets. The following semantics apply to appends:

- Appendable objects appear in the bucket namespace as soon as you start writing to them and can be read while they're still being written.
- There's no limit on the number of appends you can make to an object or on the number of bytes per append. You can keep appending until an object reaches the maximum object size of 5 TiB.
- An object's size grows in real time as appended data is persisted (flushed). If you have an open read stream, expect a short delay before the object's updated size is visible.
- Appendable objects can have only one writer at a time. If a new write stream is established for an object that already has a write stream, Cloud Storage returns an error to the original stream, which can no longer write. The new writer can resume appending from the last persisted offset without appends from another writer being interleaved.
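To make the single-writer rule concrete, here is a toy in-memory model of these semantics. `AppendableObject`, `Writer`, and `WriterPreempted` are hypothetical names, not part of any Cloud Storage client library, and the model simplifies by treating every append as persisted immediately rather than on a separate flush.

```python
from dataclasses import dataclass, field

MAX_OBJECT_SIZE = 5 * 2**40  # 5 TiB maximum object size


class WriterPreempted(Exception):
    """Raised when a newer write stream has taken over the object."""


@dataclass
class AppendableObject:
    """Toy model of single-writer append semantics; not a Cloud Storage API."""

    data: bytearray = field(default_factory=bytearray)
    current_writer: int = 0  # id of the only writer allowed to append

    def open_writer(self) -> "Writer":
        # Establishing a new write stream takes over from any existing writer.
        self.current_writer += 1
        return Writer(self, self.current_writer)

    @property
    def persisted_size(self) -> int:
        return len(self.data)


@dataclass
class Writer:
    obj: AppendableObject
    writer_id: int

    def append(self, chunk: bytes) -> int:
        """Append `chunk` and return the new persisted offset."""
        if self.writer_id != self.obj.current_writer:
            raise WriterPreempted(f"writer {self.writer_id} was replaced")
        if self.obj.persisted_size + len(chunk) > MAX_OBJECT_SIZE:
            raise ValueError("append would exceed the 5 TiB object size limit")
        # In this model every append is persisted immediately (no separate flush).
        self.obj.data.extend(chunk)
        return self.obj.persisted_size


# Usage: a second writer takes over, and the first writer's next append fails.
obj = AppendableObject()
first = obj.open_writer()
first.append(b"log line 1\n")
second = obj.open_writer()
first_rejected = False
try:
    first.append(b"stale write\n")
except WriterPreempted:
    first_rejected = True
resume_offset = obj.persisted_size  # the new writer continues from here
second.append(b"log line 2\n")
```

Because the preempted writer is rejected outright, the object's content is exactly the first writer's persisted bytes followed by the new writer's appends, with nothing interleaved.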

Finalizing objects

After an object is finalized, you can no longer append to it, but you can still overwrite it. A finalized object's metadata remains mutable; for example, you can add new tags or rename the object.
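The rules above can be summarized as a small state model. `ZonalObject` and its methods (`finalize`, `add_tag`, `rename`, `overwrite`) are hypothetical and only illustrate which operations this page says remain allowed after finalization. Whether an overwritten object starts out unfinalized is an assumption of the model, not documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ZonalObject:
    """Toy model of finalization rules for an object in a zonal bucket (illustrative only)."""

    name: str
    data: bytearray = field(default_factory=bytearray)
    tags: dict = field(default_factory=dict)
    finalized: bool = False

    def append(self, chunk: bytes) -> None:
        if self.finalized:
            raise PermissionError(f"{self.name} is finalized; appends are rejected")
        self.data.extend(chunk)

    def finalize(self) -> None:
        self.finalized = True

    # Metadata stays mutable after finalization.
    def add_tag(self, key: str, value: str) -> None:
        self.tags[key] = value

    def rename(self, new_name: str) -> None:
        self.name = new_name

    def overwrite(self, new_data: bytes) -> "ZonalObject":
        # Overwriting replaces the object under the same name. That the
        # replacement starts out unfinalized is an assumption of this model.
        return ZonalObject(self.name, bytearray(new_data))


obj = ZonalObject("checkpoints/step-100")
obj.append(b"weights")
obj.finalize()
obj.add_tag("stage", "final")
obj.rename("checkpoints/best")
append_rejected = False
try:
    obj.append(b"more")
except PermissionError:
    append_rejected = True
replacement = obj.overwrite(b"new weights")
```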

Mounting zonal buckets

You can mount and access zonal buckets by using Cloud Storage FUSE or the Cloud Storage FUSE CSI driver. Make sure to use Cloud Storage FUSE version 3.7.2 or later. To use the Cloud Storage FUSE CSI driver, ensure that your Google Kubernetes Engine version is 1.35.0-gke.3047001 or later.
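If you automate environment checks, a helper like the following can compare installed versions against the minimums on this page (Cloud Storage FUSE 3.7.2, GKE 1.35.0-gke.3047001, and gcloud CLI 553.0.0 from the Limitations section). It's a sketch that assumes you already have the bare version strings; how you obtain them, for example from `gcsfuse --version` or your cluster details, is left out. A GKE version without the `-gke.N` suffix, such as `1.35.0`, compares as lower than the minimum, which errs on the cautious side.

```python
import re
from typing import Tuple

# Minimum versions stated on this page.
MIN_GCSFUSE_VERSION = "3.7.2"
MIN_GKE_VERSION = "1.35.0-gke.3047001"
MIN_GCLOUD_VERSION = "553.0.0"


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract every run of digits, so '1.35.0-gke.3047001' becomes (1, 35, 0, 3047001)."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def meets_minimum(version: str, minimum: str) -> bool:
    # Tuples compare element by element, which gives numeric ordering
    # (3.10.0 > 3.7.2) for the dotted formats used here.
    return parse_version(version) >= parse_version(minimum)
```

For example, `meets_minimum("3.10.0", MIN_GCSFUSE_VERSION)` is true, whereas a plain string comparison of `"3.10.0"` and `"3.7.2"` would get the order wrong.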

Pricing

Using Rapid Bucket incurs charges for data storage, operations, and networking. For more information, see Pricing.

Limitations

Zonal buckets must have hierarchical namespace and uniform bucket-level access enabled.
Google Cloud CLI limitations:
- Minimum supported Google Cloud CLI version: Zonal buckets require gcloud CLI version 553.0.0 or later; earlier versions aren't compatible. We recommend using the latest version of the gcloud CLI to get the latest features and bug fixes.
- Visibility of incomplete uploads: Unlike buckets in other storage classes, where objects only appear in the namespace after an upload completes, partially uploaded objects in zonal buckets are immediately visible. If a Google Cloud CLI upload command fails or is interrupted, you might see incomplete objects in your bucket. You can still resume these uploads by re-running the command.
- Object overwrites: Standard Google Cloud CLI behavior applies to zonal buckets: if a file or object with the same name already exists at the destination, the gcloud storage cp, mv, and rsync commands overwrite it by default. To prevent overwrites, use the --no-clobber flag. The Google Cloud CLI doesn't support appending data to an existing object; you must re-upload the entire source.
- Object finalization: Objects uploaded to a zonal bucket by using the Google Cloud CLI might occasionally experience a brief delay before the object's metadata is fully synchronized. Because this metadata update is eventually consistent, downloading an object immediately after upload can fail with a hash mismatch error if the metadata isn't updated yet.

  If a download fails with a hash mismatch error shortly after an upload, retry the command. Downloads either succeed in full or fail explicitly; partial or corrupted downloads never occur silently.
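A minimal sketch of the retry advice, assuming you wrap the download (for example, a subprocess call to gcloud storage cp) in a callable that raises a hypothetical `HashMismatchError`; how you detect the mismatch from the CLI's output or exit status is up to you. Exponential backoff is a common choice, not something this page prescribes, and any other error propagates immediately.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HashMismatchError(Exception):
    """Hypothetical error raised by `download` when the computed hash doesn't match."""


def download_with_retry(
    download: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `download`, retrying with exponential backoff on hash mismatches only."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return download()
        except HashMismatchError:
            if attempt == attempts:
                raise  # out of retries: surface the explicit failure
            sleep(delay)
            delay *= 2
    raise AssertionError("unreachable")


# Usage: a download that fails twice while metadata catches up, then succeeds.
call_log = []


def flaky_download() -> bytes:
    call_log.append("attempt")
    if len(call_log) < 3:
        raise HashMismatchError("metadata not yet synchronized")
    return b"object contents"


delays = []
result = download_with_retry(flaky_download, sleep=delays.append)
```

Injecting `sleep` keeps the example fast and testable; in real use the default `time.sleep` applies.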

Incompatibilities

Zonal buckets are incompatible with the following tools, operations, products, and metadata:

Tools

Note: If your application uses client libraries to write to Cloud Storage, you'll need to make changes to your code to use supported APIs. To learn which APIs are supported, refer to the code samples for your client library in Use objects in zonal buckets.
- Writes using the XML API or the JSON API
- XML API multipart uploads
- Writes for non-appendable objects by using gRPC
Data protection and disaster recovery
- Object Versioning
- Soft delete
Data management
- Rapid Cache
- Autoclass
- Bucket Lock
- Composing objects
- Object holds
- The Object Lifecycle Management SetStorageClass action
- Object Retention Lock
- Relocating buckets
- Resumable uploads
- Rewriting objects
- Requester Pays
Access control
- Object-level access control lists (ACLs)
- CORS configurations
- Customer-supplied encryption key (CSEK)
- HMAC keys
Metadata
- Objects in zonal buckets don't have an MD5 hash.
- The metadata properties associated with unsupported features and products don't appear in the resource representation of a zonal bucket or an appendable object, or are otherwise unwriteable. For example:
  - The softDeleteTime and hardDeleteTime metadata properties don't appear in the resource representation of the Objects resource because soft delete isn't supported for objects in zonal buckets.
  - The storageClass property of objects in zonal buckets always has the value RAPID and can't be rewritten, because zonal buckets must always use the Rapid storage class.
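Because objects in zonal buckets have no MD5 hash, client-side integrity checks have to rely on another checksum. Cloud Storage reports a CRC32C checksum in an object's crc32c metadata property, encoded in base64 in big-endian byte order. The sketch below assumes that zonal objects expose this property, so confirm it in your objects' metadata before relying on it. The pure-Python implementation is for illustration only; a compiled package such as google-crc32c is much faster. The optional `crc` argument lets you extend a checksum as data is appended, which suits appendable objects.

```python
import base64
import struct
from typing import List

_CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial, bit-reversed form


def _make_table() -> List[int]:
    """Precompute the 256-entry lookup table for byte-at-a-time CRC-32C."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes, crc: int = 0) -> int:
    """Compute CRC-32C; pass a previous result as `crc` to continue over appended data."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_base64(data: bytes) -> str:
    """Encode the checksum as big-endian bytes in base64, the format of the crc32c property."""
    return base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")
```

Compare `crc32c_base64(local_bytes)` with the crc32c value in the downloaded object's metadata; a mismatch indicates the local copy differs from the stored object.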

Quotas

Each project has a storage bytes quota for each zone, as well as a per-zone quota for data egress from Cloud Storage to Google services. To see how much storage or egress quota is available, refer to the Quotas & System Limits page. To learn how to request more quota, see Manage your quotas.

Best practices

To optimize performance when you use zonal buckets with Cloud Storage FUSE, keep a file handle to a mounted object open and reuse it for multiple operations. This lets Cloud Storage FUSE avoid unnecessary network round trips on repeated reads.
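The pattern looks like this in ordinary file I/O: open the file once and serve several reads from the same handle. In practice the path would point into a Cloud Storage FUSE mount, such as the hypothetical `/mnt/zonal-bucket`. The example uses a temporary local file so that it runs anywhere, and it doesn't measure the performance difference.

```python
import os
import tempfile
from typing import Iterable, List, Tuple


def read_ranges(path: str, ranges: Iterable[Tuple[int, int]]) -> List[bytes]:
    """Read several (offset, length) ranges through a single open file handle."""
    results = []
    # Open the file once and reuse the handle, rather than reopening it for every read.
    with open(path, "rb") as f:
        for offset, length in ranges:
            f.seek(offset)
            results.append(f.read(length))
    return results


# Usage: a temporary file stands in for an object under a Cloud Storage FUSE mount
# point such as /mnt/zonal-bucket (hypothetical path).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header|record-1|record-2")
    path = tmp.name
try:
    chunks = read_ranges(path, [(0, 6), (7, 8), (16, 8)])
finally:
    os.remove(path)
```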

What's next

Learn how to create zonal buckets.
Learn how to use objects in zonal buckets.