Rapid Cache

This page describes Rapid Cache, a feature that provides an SSD-backed zonal read cache for Cloud Storage buckets, giving you higher throughput and lower latency on your stored data. Rapid Cache provides storage capacity and bandwidth that automatically scale up or down to match your needs.

These benefits make Rapid Cache well suited to improving the performance and reducing the network costs of read-heavy workloads.

See Create and manage caches to learn how to create and manage caches in Rapid Cache.

How does it work?

Rapid Cache lets you create caches in the same zone as your workloads. When you create a cache in a zone, data read requests originating from that zone are served by the cache instead of the bucket. Each cache serves clients within the same zone as the cache. Data is ingested into the cache from your bucket only when that data is read by a VM that resides in the same zone as the cache. In addition, data can be ingested when it's written to your bucket if you configure ingest on write. Metadata isn't cached; requests for object metadata are always served by the bucket.

Rapid Cache is a fully managed service and always returns consistent data.

Cache size and bandwidth limit autoscaling

Rapid Cache provides temporary storage capacity and bandwidth that automatically scale up or down according to the amount of data stored in a cache.

The cache bandwidth limit starts at 100 Gbps and scales at a rate of 20 Gbps per 1 TiB of stored data. You can increase the total bandwidth limit by storing more data in the cache or by creating more caches in the zone, or you can request a higher starting bandwidth or bandwidth limit by contacting your Technical Account Manager or Google representative.
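As an illustration of the scaling rule above, the following sketch computes a cache's bandwidth limit from the amount of stored data. The function name and the simple linear model are illustrative only; actual limits are managed by the service.

```python
def cache_bandwidth_limit_gbps(stored_tib: float) -> float:
    """Illustrative only: the limit starts at 100 Gbps and grows by
    20 Gbps for each TiB of data stored in the cache."""
    return 100.0 + 20.0 * stored_tib
```

Under this rule, an empty cache starts at 100 Gbps, and a cache holding 5 TiB of data would have a 200 Gbps limit.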

To learn more about size and bandwidth limits for Rapid Cache, see Cloud Storage quotas & limits.

Caching data in zones

When you create a cache for a bucket, the cache must be created in a zone within the location of your bucket. For example, if your bucket is located in the us-east1 region, you can create a cache in us-east1-b but not us-central1-c. If your bucket is located in the ASIA dual-region, you can create a cache in any of the zones that make up the asia-east1 and asia-southeast1 regions.

For each bucket, you can create a maximum of one cache per zone. For example, if a bucket is located in the us-east1 region, you could create a cache in us-east1-b and another cache in us-east1-c. If a bucket is located in a multi-region that encompasses us-central1 and us-east1, you could create a cache in us-central1-a and another cache in us-east1-b.
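The one-cache-per-zone rule can be enforced client-side before issuing create requests, as in this hypothetical helper (the class and its methods are illustrative, not part of any Cloud Storage client library):

```python
class CachePlanner:
    """Hypothetical bookkeeping: at most one cache per (bucket, zone) pair."""

    def __init__(self):
        self._planned = set()  # (bucket, zone) pairs already planned

    def plan_cache(self, bucket: str, zone: str) -> None:
        """Record a planned cache, rejecting duplicates for the same pair."""
        if (bucket, zone) in self._planned:
            raise ValueError(f"bucket {bucket} already has a cache in {zone}")
        self._planned.add((bucket, zone))
```

For example, planning caches for one bucket in us-east1-b and us-east1-c succeeds, but planning a second cache in us-east1-b raises an error.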

You can create caches in zones as long as capacity is available for the zone. If the capacity for creating a cache is unavailable, Rapid Cache continues trying to create the cache until capacity becomes available or you cancel the creation process. Capacity might remain unavailable for a long period of time.

You can use Rapid Cache in the following zones. Which zones you can use depends on the location type of your bucket (region, dual-region, multi-region, or custom dual-region).

Asia

  • asia-east1-a
  • asia-east1-b
  • asia-east1-c
  • asia-northeast1-a
  • asia-northeast1-b
  • asia-northeast1-c
  • asia-south1-a
  • asia-south1-b
  • asia-south1-c
  • asia-southeast1-a
  • asia-southeast1-b
  • asia-southeast1-c

Europe

  • europe-north1-a
  • europe-north1-b
  • europe-north1-c
  • europe-west1-b
  • europe-west1-c
  • europe-west1-d
  • europe-west4-a
  • europe-west4-b
  • europe-west4-c
  • europe-west6-a
  • europe-west6-b

United States

  • us-central1-a
  • us-central1-b
  • us-central1-c
  • us-central1-f
  • us-central1-ai1a (AI zone)
  • us-east1-b
  • us-east1-c
  • us-east1-d
  • us-east4-a
  • us-east4-b
  • us-east4-c
  • us-east5-a
  • us-east5-b
  • us-east5-c
  • us-south1-a
  • us-south1-b
  • us-south1-c
  • us-south1-ai1b (AI zone)
  • us-west1-a
  • us-west1-b
  • us-west1-c
  • us-west3-a
  • us-west3-b
  • us-west3-c
  • us-west4-a
  • us-west4-b
  • us-west4-c

Data ingestion

Data is always ingested into the cache when it's first read from a bucket. The first read is served as a cache miss, and subsequent reads are served as cache hits, accelerating data reads. You can optionally configure a cache to ingest data on write to avoid the initial cache miss, which benefits use cases like restoring checkpoints or preparing data pipelines for training models.

When ingesting data into a cache, Rapid Cache breaks objects into smaller, fixed-size chunks. Breaking objects into chunks allows for more granular caching, especially for large files where only specific parts are accessed.

A chunk is a 2 MB block of data. When a request is made for an object, Rapid Cache identifies which 2 MB chunks cover the requested byte range and manages those chunks independently.

The data ingestion behavior differs depending on the size of the object being ingested into the cache:

  • For read requests to objects larger than 2 MB, only the chunks containing the requested byte range are ingested. For example, reading the first 1 MB of a 100 MB file ingests only the first 2 MB chunk.

  • For read requests to objects smaller than 2 MB (for example, a 500 KB image), the entire object is ingested into the cache.
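The chunk selection described above can be sketched as follows. This is a simplified model: it assumes binary 2 MiB chunk boundaries and half-open byte ranges, and the service's actual internals may differ.

```python
CHUNK_SIZE = 2 * 1024 * 1024  # 2 MiB, per the chunking model above

def chunks_to_ingest(start: int, end: int, object_size: int) -> list[int]:
    """Return the indices of the chunks ingested for a read of bytes
    [start, end). Objects smaller than one chunk are ingested whole."""
    if object_size <= CHUNK_SIZE:
        start, end = 0, object_size  # whole-object ingestion
    first = start // CHUNK_SIZE
    last = (end - 1) // CHUNK_SIZE
    return list(range(first, last + 1))
```

Reading the first 1 MB of a 100 MB object maps to chunk 0 only, matching the example above, while a read spanning a chunk boundary pulls in every chunk it touches.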

Cache configuration

You can set the following properties when you configure a cache:

Time to live (TTL)

The TTL is the longest time that a chunk of data remains in the cache after it was last read. For example, with a TTL of 24 hours, a chunk of data that is last read at 11 AM on Monday, with no subsequent reads, is evicted from the cache at 11 AM on Tuesday. You can set a TTL between 24 hours and 7 days. If unspecified, the TTL defaults to 24 hours.
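The TTL rule can be modeled with simple date arithmetic. This is a sketch of the documented behavior only; actual eviction timing is managed by the service.

```python
from datetime import datetime, timedelta

MIN_TTL, MAX_TTL = timedelta(hours=24), timedelta(days=7)

def eviction_time(last_read: datetime, ttl: timedelta) -> datetime:
    """A chunk is evicted one TTL after its most recent read."""
    if not MIN_TTL <= ttl <= MAX_TTL:
        raise ValueError("TTL must be between 24 hours and 7 days")
    return last_read + ttl

# The example from the text: last read Monday 11 AM, 24-hour TTL
monday_11am = datetime(2025, 6, 2, 11, 0)  # 2025-06-02 is a Monday
assert eviction_time(monday_11am, timedelta(hours=24)) == datetime(2025, 6, 3, 11, 0)
```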

Ingest on write

Ingesting data into the cache on object write accelerates read-after-write workloads, such as checkpointing and writing prepared data for a training job. When you configure a cache to ingest data on write, data is written to the cache as it's uploaded to the bucket. This proactive approach removes the initial cache miss and lets your workloads get a cache hit on the very first read.

You can enable ingest on write only by updating an existing cache's ingestion criteria; it can't be configured during initial cache creation.

Performance considerations

  • Chunk misses: If a request covers multiple chunks and some chunks are in the cache while others are not, Rapid Cache transparently retrieves the missing chunks from the source bucket.

  • TTL and eviction: The Time to Live (TTL) and Least Recently Used (LRU) eviction policies also operate on chunks. Frequently used parts of a large file may remain in the cache while infrequently used parts are evicted.
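To make the chunk-level eviction concrete, here is a toy LRU cache keyed by (object name, chunk index). It is not the service's actual algorithm, only an illustration of why frequently read chunks of a large file can outlive cold ones.

```python
from collections import OrderedDict

class ChunkLRU:
    """Toy least-recently-used cache over (object_name, chunk_index) keys."""

    def __init__(self, capacity_chunks: int):
        self.capacity = capacity_chunks
        self._chunks = OrderedDict()

    def touch(self, key) -> None:
        """Record a read of a chunk, evicting the coldest chunk when full."""
        self._chunks[key] = True
        self._chunks.move_to_end(key)  # mark as most recently used
        while len(self._chunks) > self.capacity:
            self._chunks.popitem(last=False)  # drop least recently used

    def __contains__(self, key) -> bool:
        return key in self._chunks
```

With capacity for two chunks, repeatedly reading chunk 0 of an object keeps it cached while a chunk read only once is evicted first.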

Pricing

For pricing for using Rapid Cache, see Rapid Cache pricing.

Cost controls

The following tips can help you minimize the costs of running caches:

Bucket selection

You should only create caches for buckets that contain data you want to cache.

Zone selection

You should only create caches in zones where your workload will benefit from caching.

TTL setting

You should specify the shortest TTL that meets your needs for storing data in the cache. The TTL can be changed without disruption. The default is 24 hours.

Disabling the cache

You can disable a cache to permanently remove it from the service and stop all associated cache fees from accruing.

Benefits

When you cache your data with Rapid Cache, you get the following benefits:

  • Get faster data access: Rapid Cache co-locates your data in the same zone as your compute resources and is fully backed by SSD. This enables your workloads to get up to 2.5 TB/s of throughput and reduces latency for faster reads.

  • Reduce multi-region data transfer fees: Data that's read from the cache is charged reduced data transfer fees compared to data that's read directly from a multi-region bucket.

  • Reduce retrieval fees: Retrieval fees for buckets in Nearline storage, Coldline storage, and Archive storage don't apply for data reads from the cache.

  • Accrue lower costs from read operations: Read operations served from Rapid Cache are priced lower than Class B operations served from a bucket in Standard storage.

  • Autoscale your cache size: Rapid Cache's dynamic SSD caching scales automatically based on usage without you needing to specify a cache size.

  • Use caches efficiently: Rapid Cache can be enabled on existing buckets without requiring changes to your existing applications or APIs. Data stored within Rapid Cache is strongly consistent.

For details about pricing, see Rapid Cache pricing. For information about quotas, see Rapid Cache quotas.

When should you use Rapid Cache?

Use Rapid Cache for data that's infrequently changed and frequently read to accelerate data reads for analytics workloads and AI/ML model training and loading.

Say you're training an AI model across many Google Kubernetes Engine nodes running in the same zone, all repeatedly reading data that's stored in your Cloud Storage buckets. When you create a cache in the zone where your workload is running, the cache provides extra bandwidth and helps you reduce the data transfer fees associated with reading data in multi-region buckets, letting you run larger, scaled workloads more efficiently.

Using Rapid Cache to accelerate reads for BigQuery

Rapid Cache can be used to serve data for object read requests issued by BigQuery. Using Rapid Cache, you can accelerate data reads for your applications while optimizing cost efficiency.

While BigQuery is a regional service, its underlying compute resources might occasionally shift between zones for load balancing. As a best practice, enable Rapid Cache for a BigQuery workload in all zones of a region to ensure there's an available cache to use in case the underlying compute resources change zones. If a cache in a zone is not used, it doesn't incur additional cost, as Rapid Cache is pay-per-use. Note that if a workload's resources change zones, the cache in the new zone will need to re-ingest the data, potentially incurring a one-time increase in data ingestion costs.

Rapid Cache recommender

The Rapid Cache recommender provides recommendations and insights for creating caches in bucket-zone pairs by analyzing your data usage and storage. For overview information and instructions on using the Rapid Cache recommender, see Rapid Cache recommender.

Cache operations

This section describes operations you can perform on Rapid Cache caches. Some operations are asynchronous and return a long-running operation, while others are synchronous and immediately return an AnywhereCache resource.

Create a cache

When you create a cache, the cache enters a CREATING state while it's being created and enters a RUNNING state when it becomes active. A cache creation operation can take up to 48 hours, after which the operation times out.

The AnywhereCaches Create API is asynchronous; a create request returns a long-running operation. The long-running operation provides the status of the create operation and lets you cancel it before it completes.
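A caller might poll the long-running operation until the cache leaves the CREATING state, as in this sketch. Here `poll` stands in for whatever status call your client provides, and the state names follow the text; the function is illustrative, not a real client-library API.

```python
import time

def wait_for_cache(poll, timeout_s: float, interval_s: float,
                   sleep=time.sleep) -> str:
    """Poll until the cache leaves CREATING, or give up after timeout_s.
    `poll` is any zero-argument callable returning the cache state."""
    waited = 0.0
    while True:
        state = poll()
        if state != "CREATING":
            return state  # e.g. RUNNING
        if waited >= timeout_s:
            raise TimeoutError("cache creation did not complete in time")
        sleep(interval_s)
        waited += interval_s
```

In a test you can inject a fake `poll` and a no-op `sleep`; in real use, `poll` would read the long-running operation's status.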

Update a cache

You can update the TTL or ingestion behavior of a cache in a RUNNING state. While an update is in progress, the cache's pending_update field is true, and the cache can't be updated again until the update completes.

A cache in a CREATING or DISABLED state cannot be updated. The AnywhereCaches Update API is asynchronous and returns a long-running operation.

When the TTL of a cache has finished updating, the new TTL is immediately applied to both existing and new data in the cache.

Get a cache

When you get a cache, Rapid Cache returns the state and configuration of the cache instance. The AnywhereCaches Get API is synchronous and returns an AnywhereCache resource.

List caches

You can list the caches associated with a given bucket. The AnywhereCaches List API is synchronous and supports pagination.

Disable a cache

You can disable a cache to permanently remove the cache from your bucket's configuration. When you disable a cache, it enters a DISABLED state, during which you can still read existing data from the cache but can't ingest new data into it.

After you disable a cache, there's a 1-hour grace period during which you can cancel the disablement by resuming the cache; the cache then returns to the RUNNING state. After the grace period, the cache is deleted: all data within the cache is evicted, and the cache is removed from the bucket.
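The disable and resume lifecycle can be modeled as a small state machine. This is a sketch: the states and 1-hour grace period follow the text, but the real service deletes the cache asynchronously, and resuming after the grace period is best-effort rather than an immediate failure as modeled here.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(hours=1)

class CacheLifecycle:
    """Sketch of the RUNNING -> DISABLED -> (resumed | deleted) flow."""

    def __init__(self):
        self.state = "RUNNING"
        self._disabled_at = None

    def disable(self, now: datetime) -> None:
        self.state = "DISABLED"
        self._disabled_at = now

    def resume(self, now: datetime) -> None:
        if self.state != "DISABLED":
            raise ValueError("only a DISABLED cache can be resumed")
        if now - self._disabled_at > GRACE_PERIOD:
            # Real behavior is best-effort; we fail here for simplicity.
            raise RuntimeError("grace period elapsed; cache may be deleted")
        self.state = "RUNNING"
        self._disabled_at = None
```

Resuming within 30 minutes of disabling returns the cache to RUNNING; attempting a resume two hours later falls outside the grace period.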

The AnywhereCaches Disable API is synchronous and returns an AnywhereCache resource.

Resume a cache

You can resume a cache that's in a DISABLED state, as long as it's within the 1-hour grace period. After the grace period, the resume operation is performed on a best-effort basis, because the cache could be deleted at any point after the grace period. Once a cache has been resumed, it enters a RUNNING state.

The AnywhereCaches Resume API is synchronous and returns an AnywhereCache resource.

Limitations and restrictions

  • To delete a bucket, you must first delete all its associated caches. The only exception is when deleting a bucket using the Google Cloud console, which deletes all associated caches along with the bucket.

  • When performing the cache create, disable, resume, or update operations, limit the rate of operations to no more than one operation per second. Performing more than one operation per second can result in failures.

  • Rapid Cache is not durable storage and data may be evicted from the cache in various scenarios. One scenario is when the cache gets automatically resized to ensure that sufficient resources are available for your workloads. In this scenario, some data might get evicted according to a least-recently-used (LRU) algorithm until the Rapid Cache service has finished increasing the cache size.

    In any case, your data remains safely stored in your source bucket. When data gets dropped from the cache due to reasons besides TTL expiry, the Rapid Cache service will attempt to re-ingest the data into the cache transparently and at no cost to you. If the data cannot be transparently re-ingested or was dropped due to TTL expiry, the Rapid Cache service will re-ingest the data upon first read.

  • Recommendations and insights generated by the Rapid Cache recommender cannot be read using BigQuery.
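The one-operation-per-second limit in the list above can be honored client-side with a simple throttle. This helper is illustrative, not part of any Cloud Storage client library; injectable `clock` and `sleep` make it easy to test.

```python
import time

class OneOpPerSecond:
    """Client-side throttle for cache create, disable, resume, and
    update calls, honoring the one-operation-per-second limit."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self) -> None:
        """Block until at least one second has passed since the last call."""
        now = self._clock()
        if self._last is not None and now - self._last < 1.0:
            self._sleep(1.0 - (now - self._last))
        self._last = self._clock()
```

Call `throttle.wait()` before each cache operation; the first call returns immediately, and later calls sleep just long enough to keep one second between operations.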

Troubleshooting temporary resource shortages

The following sections describe how to troubleshoot a temporary resource shortage, which occurs when there isn't enough SSD capacity or serving capacity in a specified zone to create a cache, increase a cache's size, or increase a cache's bandwidth limit.

Failure to create a new cache

Rapid Cache can fail to create a new cache in a specific zone due to a temporary shortage of SSD capacity or throughput-serving resources. During this period, Rapid Cache attempts to create the new cache for up to 48 hours. If resources become available within that timeframe, the cache creation request completes successfully; otherwise, the request fails.

How to troubleshoot: To avoid disruption to your caching, you can manually cancel the cache creation operation and create a new cache in a different zone that might have capacity available. To monitor or cancel a cache creation operation, see Use long-running operations.

Failure to increase cache size

Rapid Cache can fail to increase a cache's size when the required amount of SSD capacity isn't available in the cache's zone.

Although Rapid Cache offers automatic cache size increases on-demand, cache size increases are contingent upon SSD capacity availability. If SSD capacity isn't available when the automatic cache size increase request is made, Rapid Cache continues to submit the request until the temporary resource shortage ends or an increase in cache size is no longer needed.

During a temporary resource shortage, new data is ingested and existing data in the cache is evicted on a least-recently-used basis. Caches that are large enough to store most of the hot data experience little to no impact to cache metrics. Caches with less capacity than the amount of hot data can evict and re-ingest the same data more often than caches not affected by resource shortages. When the actual size of your cache is much smaller than the needed capacity, you might experience the following resource shortage-related behavior:

  • A lower cache bandwidth limit, lower cache throughput, higher data transfer bandwidth quota consumption, and a possible impact on other metrics
  • Billing might be affected in the following ways:
    • Increased costs from the cache ingestion fee
    • Decreased costs from the cache storage fee
    • Decreased costs from the cache data transfer out fee
    • Decreased costs from the cache data transfer out operation fees
    • Increased costs from the multi-region data transfer fee
    • Increased costs from the usage of Class B operations

For information about these fees, see Rapid Cache pricing.

How to troubleshoot: For best results during a temporary resource shortage, we recommend monitoring your caches and disabling unnecessary caches or workloads based on your needs.

Failure to scale up a cache's bandwidth limit

A cache bandwidth limit shortage can occur temporarily during a cache size increase, when throughput-serving resources in a specific zone are insufficient to scale the cache bandwidth limit of existing caches at 20 Gbps per TiB. During a shortage of available cache bandwidth, Rapid Cache doesn't scale the cache bandwidth limit at 20 Gbps per TiB of data, but the cache continues to serve read requests. You can request more cache bandwidth by contacting your Technical Account Manager or Google representative. During a shortage of available cache bandwidth, you might also see an increase in your bucket's data transfer bandwidth consumption.

How to troubleshoot: For best results during a temporary resource shortage, we recommend monitoring your caches and disabling unnecessary caches or workloads based on your needs.

What's next