This document describes how to detect and debug hotspots in your database. You can access statistics about hotspots in splits with both GoogleSQL and PostgreSQL.
Spanner stores your data as a contiguous key space, ordered by the primary keys of your tables and indexes. A split is a range of rows from a set of tables or an index. The split's start is called the split start. The split limit sets the end of the split. The split includes the split start, but not the split limit.
In Spanner, hotspots are situations where too many requests are sent to the same server, which saturates the resources of the server and potentially causes high latencies. The splits affected by hotspots are known as hot or warm splits.
A split's hotspot statistic (identified in the system as CPU_USAGE_SCORE) is a
measurement of the load on a split that's constrained by the resources available
on the server. This measurement is given as a percentage. If more than 50% of
the load on a split is constrained by the available resources, then the split is
considered warm. If 100% of the load on a split is constrained, then the split
is considered hot. Such hot splits might also affect the latency of the requests
served by them.
The CPU_USAGE_SCORE of a split can remain constant or vary over time based on
the workload accessing the split and changes in the split boundaries.
Based on the warm and hot split resource constraints, Spanner
might use load-based
splitting to evenly
distribute the load across the key space. The warm and hot splits can be moved
across the instance's servers for load balancing. Spanner
performs load-based splitting in the background, minimizing the impact on
latency. However, Spanner might not be able to balance the load,
even after multiple attempts at splitting, due to
anti-patterns
in the application. The UNSPLITTABLE_REASONS column in the statistics views
provides specific reasons why a hot or warm split couldn't be divided further.
Hence, persistent warm or hot splits that last at least 10 minutes might need
further troubleshooting and potential application changes, especially when
UNSPLITTABLE_REASONS are present.
The Spanner hot split statistics help you identify the splits
where hotspots occur and understand why they may persist. These statistics,
combined with UNSPLITTABLE_REASONS codes, can help you diagnose what actions
you need to take to resolve hotspots. You can then make changes to your
application or schema, as needed.
How to access hot split statistics
Spanner provides the hot split statistics in the SPANNER_SYS
schema. SPANNER_SYS data is available through GoogleSQL and
PostgreSQL interfaces. You can access this data in the following ways:
- A database's Spanner Studio page in the Google Cloud console.
- The
gcloud spanner databases execute-sqlcommand. - The
executeSqlor theexecuteStreamingSqlmethod.
The following single read methods that Spanner provides don't
support SPANNER_SYS:
- Performing a strong read from a single row or multiple rows in a table.
- Performing a stale read from a single row or multiple rows in a table.
- Reading from a single row or multiple rows in a secondary index.
Hot split statistics
You use the following views to track hot splits:
SPANNER_SYS.SPLIT_STATS_TOP_MINUTE: shows splits that are hot during 1-minute intervals.SPANNER_SYS.SPLIT_STATS_TOP_10MINUTE: shows splits that are hot during any part of a 10-minute interval.SPANNER_SYS.SPLIT_STATS_TOP_HOUR: shows splits that are hot during any part of a 1-hour interval.
These views have the following properties:
- Each view contains data for non-overlapping time intervals of the duration the view name specifies.
- Intervals are based on clock times:
- 1-minute intervals end on the minute.
- 10-minute intervals end on the 10th minute of the hour, for example, 11:10:00, 11:20:00.
- 1-hour intervals end on the hour.
- After each interval, Spanner collects data from all servers
and then makes the data available in the
SPANNER_SYSviews shortly thereafter. For example, at 11:59:30 AM, the most recent intervals available to SQL queries are:- 1 minute: 11:58:00-11:58:59 AM
- 10 minutes: 11:40:00-11:49:59 AM
- 1 hour: 10:00:00-10:59:59 AM
- Spanner groups the statistics by splits.
- Each row contains statistics, including the
CPU_USAGE_SCOREpercentage which indicates how hot or warm a split is, for each split that Spanner captures statistics for during the specified interval. - The
SPANNER_SYS.SPLIT_STATS_TOP_MINUTEview offers the granular split statistics for every minute. Use this view for detailed debugging of recent events. - The
SPANNER_SYS.SPLIT_STATS_TOP_10MINUTEandSPANNER_SYS.SPLIT_STATS_TOP_HOURviews provide an aggregated view within 10-minute and hour intervals, respectively. Use these views for trend analysis or investigating issues over the past few days or weeks. For more information on aggregation, see View event aggregation. - If Spanner is unable to store all the hot splits during the
interval, the system prioritizes the splits with the highest
CPU_USAGE_SCOREpercentage during the specified interval. If there are no splits returned, it's an indication of the absence of any hot splits.
Data retention
The maximum amount of data that Spanner retains for each view, at any point in time, is as follows:
SPANNER_SYS.SPLIT_STATS_TOP_MINUTE: intervals covering the previous 24 hours.SPANNER_SYS.SPLIT_STATS_TOP_10MINUTE: intervals covering the previous 4 days.SPANNER_SYS.SPLIT_STATS_TOP_HOUR: intervals covering the previous 30 days.
These retention periods cannot be increased or decreased, and you can't prevent Spanner from collecting hot split statistics.
- To delete statistics data, you must either delete the database being tracked or wait until the statistics data rolls out of retention.
- To retain statistics data for longer periods, periodically copy data out of the hot split statistics views.
View schema
The following table shows the schema for hot split statistics:
| Column name | Type | Description |
|---|---|---|
INTERVAL_END |
TIMESTAMP |
End of the time interval during which the split was warm or hot. |
SPLIT_START |
STRING |
The starting key of the range of rows in the split. The split start might also be <begin>, indicating the beginning of the key space. |
SPLIT_LIMIT |
STRING |
The limit key for the range of rows in the split. The limit key might also be <end>, indicating the end of the key space. |
CPU_USAGE_SCORE |
INT64 |
The CPU_USAGE_SCORE percentage of the splits. A
CPU_USAGE_SCORE percentage of 50% indicates the presence of warm
or hot splits. |
AFFECTED_TABLES |
STRING ARRAY |
The tables whose rows might be in the split. |
UNSPLITTABLE_REASONS |
STRING ARRAY |
Identifies the type of hotspots present that load-based
splitting cannot mitigate, often due to anti-patterns. The presence
of any reason indicates user intervention, such as schema or workload
adjustments, is likely needed. An empty array means either no
unsplittable conditions were detected during this interval or the
high load was too short-lived for Spanner to determine if it was
unsplittable. See
UNSPLITTABLE_REASONS types
for more details. |
Split start and split limit keys
A split is a contiguous row range of a database, and is defined by its start and limit keys. A split can be a single row, a narrow row range, or a wide row range, and the split can include multiple tables or indexes.
The SPLIT_START and SPLIT_LIMIT columns identify the primary keys of a warm
or hot split.
Example schema
The following schema is an example table for the topics in this page.
GoogleSQL
CREATE TABLE Users (
UserId INT64 NOT NULL,
FirstName STRING(MAX),
LastName STRING(MAX),
) PRIMARY KEY(UserId);
CREATE INDEX UsersByFirstName ON Users(FirstName DESC);
CREATE TABLE Threads (
UserId INT64 NOT NULL,
ThreadId INT64 NOT NULL,
Starred BOOL,
) PRIMARY KEY(UserId, ThreadId),
INTERLEAVE IN PARENT Users ON DELETE CASCADE;
CREATE TABLE Messages (
UserId INT64 NOT NULL,
ThreadId INT64 NOT NULL,
MessageId INT64 NOT NULL,
Subject STRING(MAX),
Body STRING(MAX),
) PRIMARY KEY(UserId, ThreadId, MessageId),
INTERLEAVE IN PARENT Threads ON DELETE CASCADE;
CREATE INDEX MessagesIdx ON Messages(UserId, ThreadId, Subject),
INTERLEAVE IN Threads;
PostgreSQL
CREATE TABLE users
(
userid BIGINT NOT NULL PRIMARY KEY,-- INT64 to BIGINT
firstname VARCHAR(max),-- STRING(MAX) to VARCHAR(MAX)
lastname VARCHAR(max)
);
CREATE INDEX usersbyfirstname
ON users(firstname DESC);
CREATE TABLE threads
(
userid BIGINT NOT NULL,
threadid BIGINT NOT NULL,
starred BOOLEAN, -- BOOL to BOOLEAN
PRIMARY KEY (userid, threadid),
CONSTRAINT fk_threads_user FOREIGN KEY (userid) REFERENCES users(userid) ON
DELETE CASCADE -- Interleave to Foreign Key constraint
);
CREATE TABLE messages
(
userid BIGINT NOT NULL,
threadid BIGINT NOT NULL,
messageid BIGINT NOT NULL PRIMARY KEY,
subject VARCHAR(max),
body VARCHAR(max),
CONSTRAINT fk_messages_thread FOREIGN KEY (userid, threadid) REFERENCES
threads(userid, threadid) ON DELETE CASCADE
-- Interleave to Foreign Key constraint
);
CREATE INDEX messagesidx ON messages(userid, threadid, subject), REFERENCES
threads(userid, threadid);
Imagine your key space looks like this:
| PRIMARY KEY |
|---|
<begin> |
Users() |
Threads() |
Users(2) |
Users(3) |
Threads(3) |
Threads(3,"a") |
Messages(3,"a",1) |
Messages(3,"a",2) |
Threads(3, "aa") |
Users(9) |
Users(10) |
Threads(10) |
UsersByFirstName("abc") |
UsersByFirstName("abcd") |
<end> |
Example of splits
The following shows some example splits to help you understand what splits look like.
The SPLIT_START and SPLIT_LIMIT might indicate the row of a table or index,
or they can be <begin> and <end>, representing the boundaries of the key
space of the database. The SPLIT_START and SPLIT_LIMIT might also contain
truncated keys, which are keys preceding any full key in the table. For example,
Threads(10) is a prefix for any Threads row interleaved in Users(10).
| SPLIT_START | SPLIT_LIMIT | AFFECTED_TABLES | EXPLANATION |
|---|---|---|---|
Users(3) |
Users(10) |
UsersByFirstName, Users, Threads, Messages, MessagesIdx |
Split starts at row with UserId=3 and ends at the row before the row with UserId = 10. The split contains the Users table rows and all its interleaved tables rows for UserId=3 to 10. |
Messages(3,"a",1) |
Threads(3,"aa") |
Threads, Messages, MessagesIdx |
The split starts at the row with UserId=3, ThreadId="a" and MessageId=1 and ends at the row preceding the row with the key of UserId=3 and ThreadsId = "aa". The split contains all the tables between Messages(3,"a",1) and Threads(3,"aa"). As the split_start and split_limit are interleaved in the same top-level table row, the split contains the interleaved tables rows between the start and limit. See schemas-overview to understand how interleaved tables are co-located. |
Messages(3,"a",1) |
<end> |
UsersByFirstName, Users, Threads, Messages, MessagesIdx |
The split starts in the messages table at the row with key UserId=3, ThreadId="a" and MessageId=1. The split hosts all the rows from the split_start to <end>, the end of the key space of the database. All the rows of the tables following the split_start, like Users(4) are included in the split. |
<begin> |
Users(9) |
UsersByFirstName, Users, Threads, Messages, MessagesIdx |
The split starts at <begin>, the beginning of the key space of the database and ends at the row preceding the Users row with UserId=9. So the split has all the table rows preceding Users and all the rows of Users table preceding UserId=9 and the rows of its interleaved tables. |
Messages(3,"a",1) |
Threads(10) |
UsersByFirstName, Users, Threads, Messages, MessagesIdx |
Split starts at Messages(3,"a", 1) interleaved in Users(3) and ends at the row preceding Threads(10). Threads(10) is a truncated split key that is a prefix of any key of the Threads table interleaved in Users(10). |
Users() |
<end> |
UsersByFirstName, Users, Threads, Messages, MessagesIdx |
The split starts at the truncated split key of Users() which precedes any full key of the Users table. The split extends until the end of the possible key space in the database. The affected_tables hence cover the Users table, its interleaved tables and indexes and all the tables that might appear after users. |
Threads(10) |
UsersByFirstName("abc") |
UsersByFirstName, Users, Threads, Messages, MessagesIdx |
The split starts at the Threads row with UserId = 10 and ends at the index, UsersByFirstName at the key preceding "abc". |
UNSPLITTABLE_REASONS types
When Spanner cannot mitigate a hotspot through load-based
splitting, the UNSPLITTABLE_REASONS column in the SPLIT_STATS_TOP_* views
cites one or more of the following reasons:
HOT_ROW
Description: High load is concentrated on a single row. Spanner cannot add split points within an individual row.
Common Causes:
- Frequent high-volume operations (reads, writes, or updates) on a single key.
- Schema designs that centralize access to a single row.
Mitigation Strategies:
- Reduce QPS to the hot split.
- Redesign schema to distribute load. For example, shard counters across multiple rows.
- Review Schema design best practices.
MOVING_HOT_SPOT
Description: The key range experiencing high load shifts over time, often sequentially. Load-based splitting is ineffective as the hotspot relocates before Spanner can split the previously affected range.
Common Causes:
- Inserts with a monotonically increasing or decreasing leading key part, such as a commit timestamp.
- Sequential point reads across the keyspace of a table.
Mitigation Strategies:
- Avoid monotonically increasing or decreasing keys for the first part of the primary key in write-intensive workloads. For detailed mitigation strategies, see Schema design best practices. Techniques include using UUIDs or prepending a hash of the key.
LARGE_SCAN_HOT_SPOT
Description: The split experiences high load due to frequent or resource-intensive operations that scan across a key range. This can include range reads (including reads issued as part of a transaction) or queries. Spanner refrains from excessively splitting the ranges covered by such operations to avoid potential performance degradation for these scans, which can occur if the data becomes too fragmented across many small splits.
Common Causes:
- Queries or read operations executing broad range scans on frequently accessed data.
- DML statements (
UPDATE,DELETE) with WHERE clauses that require scanning ranges. - Absence of suitable indexes, leading to base table scans.
Mitigation Strategies:
- Optimize SQL statements (
SELECT,UPDATE,DELETE) to reduce the number of rows scanned. - Create appropriate indexes to support common query and DML predicates, minimizing the number of rows scanned.
UNISOLATABLE_HOT_ROW
Description: Spanner identifies a narrow, high-load key but
cannot isolate it by inserting new split points, due to the unavailability of a
suitable split point. This case is similar to HOT_ROW, but the SPLIT_START
and SPLIT_LIMIT don't completely isolate the hotspot.
Common Causes:
- Intense, localized load on one row or adjacent rows sharing a key prefix.
Mitigation Strategies:
- Analyze application access patterns for the keys within the reported
SPLIT_STARTandSPLIT_LIMIT. - Mitigation strategies often overlap with
HOT_ROW, focusing on reducing direct operational load on the problematic narrow key range.
UNSPECIFIED
Description: The split is experiencing high load and cannot be split, but the cause does not fall under the other specific categories. This can happen in complex load scenarios or due to internal system behavior.
Mitigation Strategies:
- Investigate application workloads, queries, or transactions accessing the
tables within the hot split (listed in
AFFECTED_TABLES) that have shown increased load. - Use tools like Query Insights and Transaction Insights to identify expensive operations.
- Evaluate the workload and ensure that you are using the Schema design best practices and SQL best practices.
- If the hotspot persists for more than 10 minutes despite the prior optimizations, open a support case.
View event aggregation
Entries in the SPANNER_SYS.SPLIT_STATS_TOP_10MINUTE and SPANNER_SYS.SPLIT_STATS_TOP_HOUR views represent an aggregation of the
1-minute intervals within their respective windows:
CPU_USAGE_SCORE: This shows the maximum CPU_USAGE_SCORE recorded for the
split in any 1-minute interval within the 10-minute or 1-hour window.
UNSPLITTABLE_REASONS: This array is a union of all unique
UNSPLITTABLE_REASONS observed for the split across all 1-minute intervals
within the window.
A split appears in these views if its CPU_USAGE_SCORE was 50% or higher in at
least one of the constituent 1-minute intervals.
Example of aggregation
Examine the split from Users(101) to Users(102). The following table shows
its potential entries in the MINUTE view over a 10-minute period from
10:00:00 to 10:10:00:
INTERVAL_END |
SPLIT_START |
SPLIT_LIMIT |
CPU_USAGE_SCORE |
AFFECTED_TABLES |
UNSPLITTABLE_REASONS |
|---|---|---|---|---|---|
| 10:01:00 | Users(101) | Users(102) | 60 | [Messages,Users,Threads] | [] |
| 10:02:00 | Users(101) | Users(102) | 95 | [Messages,Users,Threads] | [HOT_ROW] |
| 10:03:00 | Users(101) | Users(102) | 80 | [Messages,Users,Threads] | [HOT_ROW] |
| 10:04:00 | Users(101) | Users(102) | 55 | [Users,Threads] | [] |
| 10:06:00 | Users(101) | Users(102) | 70 | [Users,Threads] | [LARGE_SCAN_HOT_SPOT] |
| 10:07:00 | Users(101) | Users(102) | 65 | [Users,Threads] | [LARGE_SCAN_HOT_SPOT] |
| 10:09:00 | Users(101) | Users(102) | 52 | [Users,Threads] | [] |
The corresponding aggregated entry in 10MINUTE for the interval ending at
10:10:00 for this split would be:
INTERVAL_END |
SPLIT_START |
SPLIT_LIMIT |
CPU_USAGE_SCORE |
AFFECTED_TABLES |
UNSPLITTABLE_REASONS |
|---|---|---|---|---|---|
| 10:10:00 | Users(101) | Users(102) | 95 | [Messages,Users,Threads] | [HOT_ROW, LARGE_SCAN_HOT_SPOT] |
CPU_USAGE_SCORE: 95 is the maximum value from theCPU_USAGE_SCOREcolumn in the 1-minute view for this split within the window.UNSPLITTABLE_REASONS:[HOT_ROW, LARGE_SCAN_HOT_SPOT]is the union of all unique reasons present in theUNSPLITTABLE_REASONScolumn in the 1-minute view.
This example shows how the 10MINUTE view summarizes the most intense load and
all types of unsplittable issues encountered during the period. The HOUR view
follows the same aggregation logic over a 60-minute window.
Find hot splits
You can use the following SQL statement to retrieve hot split statistics. You can run these SQL statements using the client libraries, Google Cloud CLI, or the Google Cloud console.
SELECT
t.interval_end,
t.split_start,
t.split_limit,
t.cpu_usage_score,
t.affected_tables,
t.unsplittable_reasons
FROM
SPANNER_SYS.SPLIT_STATS_TOP_DURATION AS t
WHERE
-- Optional: Filter by a specific interval end time
-- t.interval_end = 'INTERVAL_END_TIME'
ORDER BY
t.interval_end DESC, t.cpu_usage_score DESC;
Replace the following:
DURATION: chooseMINUTE,10MINUTE, orHOUR, based on the observation period. For example,SPANNER_SYS.SPLIT_STATS_TOP_HOUR.INTERVAL_END_TIME: Replace with aTIMESTAMPof the end time of your observation period. For example,2072-06-08 08:30:00Z.
Interpret query results
For a complete list of UNSPLITTABLE_REASONS codes and their possible
diagnoses, see UNSPLITTABLE_REASONS types.
For example, your query output might look like the following:
SPLIT_START |
SPLIT_LIMIT |
CPU_USAGE_SCORE |
AFFECTED_TABLES |
UNSPLITTABLE_REASONS |
|---|---|---|---|---|
| Threads(10) | Threads(10, "aa") | 100 | Messages,Threads | [UNISOLATABLE_HOT_ROW] |
| Messages(631, "abc", 1) | Messages(631, "abc", 3) | 100 | Messages | [HOT_ROW] |
| Users(620) | <end> | 100 | Messages,Users,Threads | [MOVING_HOT_SPOT] |
| Users(101) | Users(102) | 90 | Messages,Users,Threads | [HOT_ROW] |
| Users(13) | Users(76) | 82 | Messages,Users,Threads | [LARGE_SCAN_HOT_SPOT] |
| Threads(12, "zebra") | Users(14) | 76 | Messages,Users,Threads | [] |
From these results, you could infer the following issues:
- Threads(10) to Threads(10, "aa"): Hot at 100% with
[UNISOLATABLE_HOT_ROW]. A single key, a key range prefix in theThreadstable, or an interleaved table is hot, and Spanner cannot split the range further. - Messages(631, "abc", 1) to Messages(631, "abc", 3): Hot at 100% with
[HOT_ROW]. Load is concentrated on MessageId 1 and 2 for this User and Thread. - Users(620) to
<end>: Hot at 100% with[MOVING_HOT_SPOT]. This often indicates a pattern of inserts with monotonically increasing or decreasing User IDs, causing the end of the key space to be persistently hot. - Users(101) to Users(102): Hot at 90% with
[HOT_ROW]. The load is concentrated on the single Users row UserId = 101 and its interleaved children. - Users(13) to Users(76): Hot at 82% with
[LARGE_SCAN_HOT_SPOT]. This suggests frequent or expensive scans across this range of User IDs. - Threads(12, "zebra") to Users(14): Warm at 76% usage. No unsplittable reasons were detected in this interval. Spanner might still be able to split this if the load persists or increases.
Troubleshoot hotspots using hot split statistics
This section describes how to detect and troubleshoot hotspots.
Select a time period to investigate
Check the latency metrics for your Spanner database to find the time period when your application experienced high latency and CPU usage. For example, it might show you that an issue started around 10:50 PM on May 18, 2072.
Check for unsplittable reasons
As Spanner balances load with load-based
splitting, we
recommend that you investigate hotspots that continue for more than 10 minutes,
especially if they have UNSPLITTABLE_REASONS. The presence of
UNSPLITTABLE_REASONS indicates that Spanner cannot split
the hot split, and schema or workload changes might be needed to mitigate the
hotspot.
You can query for UNSPLITTABLE_REASONS as shown in the following example
query:
SELECT
reason,
COUNT(*) AS occurrences
FROM
SPANNER_SYS.SPLIT_STATS_TOP_MINUTE AS t,
UNNEST(t.unsplittable_reasons) AS reason
WHERE
t.cpu_usage_score >= 50
AND ARRAY_LENGTH(t.unsplittable_reasons) > 0
AND t.interval_end >= "2072-05-18T17:40:00Z" -- Start of window
AND t.interval_end <= "2072-05-18T17:50:00Z" -- End of window
GROUP BY
reason
ORDER BY
occurrences DESC;
The presence of UNSPLITTABLE_REASONS indicates a need for further debugging.
You can also monitor unsplittable reasons using Cloud Monitoring. The
metric to use is unsplittable_reason_count. For more information, see
Spanner metrics.
Find the splits with the highest CPU_USAGE_SCORE and their UNSPLITTABLE_REASONS
For this example, we run the following SQL to find the row ranges with the
highest CPU_USAGE_SCORE level and their corresponding UNSPLITTABLE_REASONS:
GoogleSQL
SELECT t.split_start,
t.split_limit,
t.cpu_usage_score,
t.affected_tables,
t.unsplittable_reasons
FROM SPANNER_SYS.SPLIT_STATS_TOP_MINUTE t
WHERE t.cpu_usage_score >= 50
AND t.interval_end = "interval_end_date_time";
Replace interval_end_date_time with the date and time for the
interval, using the format YYYY-MM-DDTHH:MM:SSZ. For example,
2072-05-18T17:40:00Z.
PostgreSQL
SELECT t.split_start,
t.split_limit,
t.cpu_usage_score,
t.affected_tables,
t.unsplittable_reasons
FROM spanner_sys.split_stats_top_minute t
WHERE t.cpu_usage_score >= 50
AND t.interval_end = 'interval_end_date_time'::timestamptz;
Replace interval_end_date_time with the date and time for the
interval, using the format YYYY-MM-DDTHH:MM:SSZ. For example,
2072-05-18T17:40:00Z.
The previous SQL outputs the following:
SPLIT_START |
SPLIT_LIMIT |
CPU_USAGE_SCORE |
AFFECTED_TABLES |
UNSPLITTABLE_REASONS |
|---|---|---|---|---|
Users(180) |
<end> |
85 |
Messages,Users,Threads |
[MOVING_HOT_SPOT] |
Users(24) |
Users(76) |
76 |
Messages,Users,Threads |
[HOT_ROW, LARGE_SCAN_HOT_SPOT] |
Threads(10) |
UsersByFirstName("abc") |
100 |
UsersByFirstName, Users, Threads, Messages, MessagesIdx |
[] |
From this table of results, we can see that there are three hot splits and two
of them are unsplittable. Hotspots with UNSPLITTABLE_REASONS that persist over
time warrant further investigation. To understand what each reason means and how
to mitigate it, see UNSPLITTABLE_REASONS types.
Best practices to mitigate hotspots
Note: If you've had an increase in your load and have recently upscaled your instance, Spanner might take a few minutes to perform its load-balancing operations before your latency decreases.
If load-balancing doesn't decrease latency, the next step is to identify the cause of the hotspots. After that, options are to either reduce the hotspot workload, or optimize the application schema and logic to avoid hotspots.
Identify the cause
- Use Lock & Transaction Insights to look for transactions that have high lock wait time where the row range start key is within the hot split.
- Use Query Insights to look for queries that read from the table that contains the hot split, and have recently increased latency, or a higher ratio of latency to CPU.
- Use Oldest Active Queries to look for queries that read from the table that contains the hot split, and have higher than expected latency.
Some special cases to watch for:
- Check to see if time to live (TTL)
was enabled recently. If there are a lot of splits from old data, then TTL
can raise
CPU_USAGE_SCORElevels during mass deletes. In this case, the issue should self-resolve once the initial deletions complete.
Optimize the workload
- Follow SQL best practices. Consider stale reads, writes that don't perform reads first, or adding indexes.
- Follow Schema best practices. Ensure your schema is designed to handle load balancing and avoid hotspots.
What's next
- Learn about schema design best practices.
- Learn about Key Visualizer.
- Look through examples of schema designs.
- Learn how to use the split insights dashboard to detect hotspots.