This page shows you how to identify and troubleshoot latency issues in your Spanner components. To learn more about possible latency points in a Spanner request, see Latency points in a Spanner request.
You can measure and compare the request latencies between different components and the database to determine which component is causing the latency. These latencies include end-to-end latency, Google Front End (GFE) latency, Spanner API request latency, and query latency.
In the client application that uses your service, confirm that end-to-end latency has increased. Check the following dimensions from your client-side metrics. For more information, see Client-side metrics descriptions.
- client_name: the client library name and version.
- location: the Google Cloud region where the client-side metrics are published. If your application is deployed outside Google Cloud, then the metrics are published to the global region.
- method: the RPC method name, for example, spanner.commit.
- status: the RPC status, for example, OK or INTERNAL.
Group by these dimensions to see if the issue is limited to a specific client, status, or method. For dual-region or multi-regional workloads, see if the issue is limited to a specific client or Spanner region.
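The grouping step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records in SAMPLES are made-up values (including the client names and the second method name), and the latency_ms field and both helper functions are hypothetical names, not part of any Spanner client library. In practice the samples would come from your exported client-side metrics.

```python
from collections import defaultdict
from statistics import mean

# Made-up latency samples for illustration only. The keys mirror the
# client-side metric dimensions; "latency_ms" is a hypothetical field name.
SAMPLES = [
    {"client_name": "client-a", "method": "spanner.commit", "status": "OK", "latency_ms": 12.0},
    {"client_name": "client-a", "method": "spanner.commit", "status": "OK", "latency_ms": 14.0},
    {"client_name": "client-b", "method": "spanner.commit", "status": "OK", "latency_ms": 250.0},
    {"client_name": "client-b", "method": "spanner.commit", "status": "INTERNAL", "latency_ms": 310.0},
    {"client_name": "client-a", "method": "spanner.read", "status": "OK", "latency_ms": 9.0},
]


def mean_latency_by(samples, dimension):
    """Return {dimension value: mean latency in ms} for one dimension."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[dimension]].append(sample["latency_ms"])
    return {value: mean(latencies) for value, latencies in groups.items()}


def worst_group(samples, dimension):
    """Return the dimension value with the highest mean latency."""
    by_value = mean_latency_by(samples, dimension)
    return max(by_value, key=by_value.get)


if __name__ == "__main__":
    for dim in ("client_name", "method", "status"):
        print(dim, mean_latency_by(SAMPLES, dim), "worst:", worst_group(SAMPLES, dim))
```

With this made-up data, grouping by client_name singles out client-b, which is the kind of signal that would suggest the issue is limited to one client rather than to a method or status.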
Check your client application health, especially the computing infrastructure on the client side (for example, VM, CPU, or memory utilization, connections, file descriptors, and so on).
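As a rough illustration of the client health check above, the following sketch collects a few host-level indicators using only the Python standard library. It assumes the client runs on a Unix-like host; the function name client_health_snapshot and the dictionary keys are hypothetical, and a production setup would more likely rely on your existing monitoring agent.

```python
import os

try:
    import resource  # Available on Unix-like systems only.
except ImportError:
    resource = None


def client_health_snapshot():
    """Collect a few coarse indicators of client host health."""
    snapshot = {"cpu_count": os.cpu_count()}

    # 1-minute load average; not available on every platform.
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_avg_1m"] = os.getloadavg()[0]
        except OSError:
            pass

    # Soft limit on open file descriptors for this process.
    if resource is not None:
        soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
        snapshot["fd_soft_limit"] = soft_limit

    # Approximate count of open descriptors (Linux only; the count
    # includes the descriptor used to list the directory itself).
    fd_dir = "/proc/self/fd"
    if os.path.isdir(fd_dir):
        snapshot["open_fds"] = len(os.listdir(fd_dir))

    return snapshot


if __name__ == "__main__":
    print(client_health_snapshot())
```

A load average well above cpu_count, or open_fds close to fd_soft_limit, would be a reason to investigate the client host before looking at Spanner itself.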
Check latency in Spanner components by viewing the client-side metrics:
a. Check end-to-end latency using the spanner.googleapis.com/client/operation_latencies metric.
b. Check Google Front End (GFE) latency using the spanner.googleapis.com/client/gfe_latencies metric.
Check the following dimensions for Spanner metrics:
- database: the Spanner database name.
- method: the RPC method name, for example, spanner.commit.
- status: the RPC status, for example, OK or INTERNAL.
Group by these dimensions to see if the issue is limited to a specific database, status, or method. For dual-region or multi-regional workloads, check to see if the issue is limited to a specific region.
Check Spanner API request latency using the spanner.googleapis.com/api/request_latencies metric. For more information, see Spanner metrics.
If you have high end-to-end latency, but low GFE latency and low Spanner API request latency, the application code might have an issue. It could also indicate a networking issue between the client and the regional GFE. If your application has a performance issue that causes some code paths to be slow, then the end-to-end latency for each API request might increase. There might also be an issue in the client computing infrastructure that was not detected in the previous step.
If you have a high GFE latency, but a low Spanner API request latency, it might have one of the following causes:
- Accessing a database from another region. This action can lead to high GFE latency and low Spanner API request latency. For example, traffic from a client in the us-east1 region to an instance in the us-central1 region might have a high GFE latency but a lower Spanner API request latency.
- There's an issue at the GFE layer. Check the Google Cloud Status Dashboard to see if there are any ongoing networking issues in your region. If there aren't any issues, then open a support case and include this information so that support engineers can help with troubleshooting the GFE.
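The comparisons described above can be sketched as a small decision function. This is a minimal sketch, not an official diagnostic: the function name, the return labels, and the single high_ms threshold are all hypothetical, and what counts as "high" depends on your workload's baseline. Treating a high Spanner API request latency as pointing at the Spanner side is also an assumption of this sketch.

```python
def locate_latency(end_to_end_ms, gfe_ms, api_request_ms, high_ms=100.0):
    """Suggest where to look first, given three latency measurements.

    high_ms is a hypothetical threshold; choose it from your own baseline.
    """
    if api_request_ms >= high_ms:
        # Assumption: high API request latency points at the Spanner side
        # (instance CPU utilization, hotspots, queries, transactions).
        return "spanner"
    if gfe_ms >= high_ms:
        # High GFE latency but low API request latency: cross-region
        # access or an issue at the GFE layer.
        return "gfe_or_cross_region"
    if end_to_end_ms >= high_ms:
        # High end-to-end latency but low GFE and API request latency:
        # application code, client infrastructure, or client-to-GFE network.
        return "client_or_network"
    return "no_obvious_latency"


if __name__ == "__main__":
    print(locate_latency(end_to_end_ms=500, gfe_ms=20, api_request_ms=10))
    print(locate_latency(end_to_end_ms=500, gfe_ms=400, api_request_ms=10))
```

The checks run from the database outward, so a request that is slow at every layer is attributed to the innermost slow component first.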
Check the CPU utilization of the instance. If the CPU utilization of the instance is above the recommended level, manually add more nodes or set up autoscaling. For more information, see Autoscaling overview.
Observe and troubleshoot potential hotspots or unbalanced access patterns using Key Visualizer, and try to roll back any application code changes that strongly correlate with the issue timeframe.
Check for any traffic pattern changes.
Check Query insights and Transaction insights to see if there might be any query or transaction performance bottlenecks.
Use the procedures in Oldest active queries to find any expensive queries that might cause a performance bottleneck, and cancel the queries as needed.
Use procedures in the troubleshooting sections in the following topics to troubleshoot the issue further using Spanner introspection tools:
What's next
- Now that you've identified the component that contains the latency, explore the problem further using the built-in client-side metrics.
- Learn how to use metrics to diagnose latency.
- Learn how to troubleshoot Spanner deadline exceeded errors.