Troubleshoot issues

This page describes common error scenarios and explains how to resolve them.

Replication scenarios

This section explains replication issues that might occur with your cluster.

How do you monitor replication lag?

Memorystore for Redis Cluster has the /cluster/replication/maximum_offset_diff metric. This metric monitors the maximum replication offset difference (in bytes) for a node in a primary cluster.

By keeping the replication offset difference low, replicas can perform incremental sync operations more frequently and at a lower cost than full sync operations.

We recommend that you set a threshold for the maximum_offset_diff metric. If the metric exceeds the threshold, then Memorystore for Redis Cluster can notify you with an alert.

Based on the node type of your cluster, we recommend that you set the threshold as follows:

  • If the node type is redis-shared-core-nano, redis-standard-small, redis-highmem-medium, redis-highcpu-medium, or redis-standard-large, then set the threshold to be less than 64 MB.

  • If the node type is redis-highmem-xlarge or redis-highmem-2xlarge, then set the threshold to be less than 1 GB.
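The recommendations above can be sketched as a small lookup, for example to validate a threshold before you configure an alert from a script. This is only an illustrative sketch: the helper names and the dictionary are hypothetical, and reading 64 MB and 1 GB as binary units (MiB and GiB) is an assumption, not something this page specifies.

```python
# Hypothetical helpers; not part of Memorystore for Redis Cluster.
# Assumption: "64 MB" and "1 GB" are interpreted as 64 MiB and 1 GiB.
MIB = 1024 * 1024
GIB = 1024 * MIB

# Upper bound (in bytes) recommended for the maximum_offset_diff threshold.
RECOMMENDED_UPPER_BOUND_BYTES = {
    "redis-shared-core-nano": 64 * MIB,
    "redis-standard-small": 64 * MIB,
    "redis-highmem-medium": 64 * MIB,
    "redis-highcpu-medium": 64 * MIB,
    "redis-standard-large": 64 * MIB,
    "redis-highmem-xlarge": 1 * GIB,
    "redis-highmem-2xlarge": 1 * GIB,
}


def recommended_upper_bound_bytes(node_type: str) -> int:
    """Return the recommended upper bound for the alert threshold."""
    try:
        return RECOMMENDED_UPPER_BOUND_BYTES[node_type]
    except KeyError:
        raise ValueError(f"No recommendation for node type: {node_type}") from None


def threshold_is_within_recommendation(node_type: str, threshold_bytes: int) -> bool:
    """True if the threshold is strictly less than the recommended bound."""
    return threshold_bytes < recommended_upper_bound_bytes(node_type)
```

For example, threshold_is_within_recommendation("redis-highmem-xlarge", 512 * MIB) returns True, because 512 MiB is below the 1 GiB bound assumed for that node type.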

Connectivity error scenarios

This section explains connectivity issues that your instance might encounter.

Connection error caused by firewall rules

Firewall rules might cause connection errors by blocking the ports that Memorystore for Redis Cluster uses. For both of your instance's Private Service Connect endpoints, allowlist ports 11000 through 13047. For more information about these endpoints, see Reserved network addresses.
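As a rough aid for checking this, the following sketch tests whether a set of allowed port ranges covers ports 11000 through 13047, and includes a simple TCP probe. The function names and the representation of firewall rules as (first_port, last_port) tuples are assumptions made for illustration. How you export your actual firewall rules depends on your environment.

```python
import socket

# Port range that this page says must be allowlisted.
REQUIRED_FIRST_PORT = 11000
REQUIRED_LAST_PORT = 13047


def ranges_cover_required_ports(allowed_ranges) -> bool:
    """Return True if the union of (first, last) ranges covers 11000-13047."""
    next_needed = REQUIRED_FIRST_PORT
    for first, last in sorted(allowed_ranges):
        if first > next_needed:
            break  # Gap: at least one required port is not allowed.
        next_needed = max(next_needed, last + 1)
        if next_needed > REQUIRED_LAST_PORT:
            return True
    return False


def can_connect(host: str, port: int, timeout_seconds: float = 3.0) -> bool:
    """Try to open a TCP connection; False on timeout, refusal, or other error."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False
```

A successful probe only shows that the port is reachable from the client where the script runs. A failed probe can have causes other than firewall rules.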

Connection error caused by organization policies

An organization policy might block Private Service Connect connections to your Memorystore for Redis Cluster instance.

If your organization uses the compute.restrictPrivateServiceConnectProducer policy, then add folder number 961333125034 to the allowlist. This folder is specific to Memorystore for Redis Cluster. For example:

name: organizations/Consumer-org-1/policies/compute.restrictPrivateServiceConnectProducer
spec:
    rules:
      - values:
          allowedValues:
          - under:folders/961333125034

If your organization uses the compute.disablePrivateServiceConnectCreationForConsumers policy, then add SERVICE_PRODUCERS to the allowlist. For example:

name: organizations/Consumer-org-1/policies/compute.disablePrivateServiceConnectCreationForConsumers
spec:
    rules:
      - values:
          allowedValues:
          - SERVICE_PRODUCERS

CPU usage scenarios

This section explains CPU usage issues that your cluster might encounter.

The output buffer of your cluster runs out of space

If the output buffer of your cluster runs out of space, then consider how your cluster uses memory:

When the memory of your cluster is full, and a new write comes in, Memorystore for Redis Cluster evicts keys to make room for the write, based on your cluster's maxmemory policy. The allkeys-lru policy evicts the least recently used (LRU) keys from the entire keyset.

We recommend that you monitor your cluster's maxmemory value and its used memory. Doing so shows you when your cluster approaches its provisioned capacity. Also, if you reduce the value of the maxmemory parameter, then you leave more memory available for overhead.
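One way to compare used memory against maxmemory is to parse the text that the Redis INFO memory command returns. The following is a minimal sketch that assumes your client can run INFO memory and hands you the raw reply as a string. The function names are hypothetical. The used_memory and maxmemory fields are standard fields of the Redis INFO memory section.

```python
from typing import Optional


def parse_info(raw_info: str) -> dict:
    """Parse 'key:value' lines of a Redis INFO reply into a dictionary."""
    fields = {}
    for line in raw_info.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and section headers such as "# Memory".
        key, separator, value = line.partition(":")
        if separator:
            fields[key] = value
    return fields


def memory_usage_ratio(raw_info: str) -> Optional[float]:
    """Return used_memory / maxmemory, or None if maxmemory is 0 (no limit)."""
    fields = parse_info(raw_info)
    used = int(fields["used_memory"])
    maxmemory = int(fields["maxmemory"])
    if maxmemory == 0:
        return None
    return used / maxmemory
```

A ratio that stays close to 1.0 suggests that the cluster is near its memory limit and that key eviction is likely, depending on the maxmemory policy.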

Persistence scenarios

This section explains persistence issues that might occur with your cluster.

Your write traffic exceeds Memorystore for Redis Cluster's ability to compact and reclaim space through AOF rewriting

If this situation occurs, then the Append-Only File (AOF) grows faster than the rewrite process can manage. This leads to disk exhaustion, causes write failures, and blocks operations that require replica creation and full synchronization.

Memorystore for Redis Cluster implements guardrails that regulate write throughput so that AOF rewriting can keep pace with sustained high-write workloads.