This document describes how you can use AI assistance to help you monitor and troubleshoot your Spanner resources. You can use the AI-assisted troubleshooting tools of Spanner and Gemini Cloud Assist to troubleshoot high database load.
Before you begin
Set up Gemini Cloud Assist for your Google Cloud user account and project.
After you set up Gemini Cloud Assist, the service takes up to five minutes to propagate. Wait for propagation to complete before you enable AI-assisted troubleshooting in Spanner.
Required roles
To get the permissions that you need to use AI-assisted troubleshooting, ask your administrator to grant you the following IAM roles on your Spanner databases:
- Cloud Spanner Database User (roles/spanner.databaseUser)
- Database Insights viewer (roles/databaseinsights.viewer)
- Gemini Cloud Assist Investigation Owner (roles/geminicloudassist.investigationOwner)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
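As a quick pre-flight check, the following minimal Python sketch compares a list of role names against the three roles listed above. The `missing_roles` helper and its input list are hypothetical and exist only for illustration. The sketch does not call any Google Cloud API, and it only compares role names, so it cannot tell whether a custom role or another predefined role already grants the same permissions.

```python
# The three predefined roles named in this document.
REQUIRED_ROLES = {
    "roles/spanner.databaseUser",
    "roles/databaseinsights.viewer",
    "roles/geminicloudassist.investigationOwner",
}


def missing_roles(granted):
    """Return the required role names that are absent from `granted`.

    `granted` is any iterable of role-name strings that you collected
    yourself (hypothetical input; this function does not query IAM).
    """
    return sorted(REQUIRED_ROLES - set(granted))


if __name__ == "__main__":
    # Example: a user who currently holds only the database user role.
    print(missing_roles(["roles/spanner.databaseUser"]))
```

An empty result means that all three role names are present. A non-empty result is only a hint to ask your administrator, because equivalent permissions might come from other roles.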
Open Gemini Cloud Assist
In the Google Cloud console, go to the Spanner Instances page.
To open the Overview page of an instance, click the instance name.
To open Gemini, click Open or close Gemini Cloud Assist chat.
In the Gemini Cloud Assist pane, enter a prompt that describes the information you're interested in.
After you enter the prompt, click Send prompt. Gemini returns a response to your prompt based on information from the last hour.
Troubleshoot high database load
From the Query insights dashboard or the System insights dashboard in the Google Cloud console, you can analyze your database and troubleshoot events when your system experiences a higher-than-average database load. To calculate the expected load of your database, Spanner uses the 24 hours of data before your selected time range. You can look into the reasons for the high-load events and analyze the evidence behind reduced performance. Spanner also provides recommendations for optimizing your database to improve performance.
To use AI assistance with troubleshooting high database load, go to the System insights dashboard or the Query insights dashboard in the Google Cloud console.
Query insights dashboard
Troubleshoot high database load with AI assistance in the Query insights dashboard using the following steps:
In the Google Cloud console, go to the Spanner Instances page.
To open the Overview page of an instance, click the instance name.
Optional: In the Databases list, click a database.
In the navigation menu, click Query insights.
Optional: Use the Time range filter to select 1 hour, 6 hours, 1 day, 7 days, 30 days, or a custom range.
Optional: Zoom in to the sections of the chart that show high load that you want to analyze. For example, an area of high load might display CPU utilization levels close to 100%. To zoom in, click and select a portion of the chart.
In the Total CPU Utilization (All Queries) chart, click the Investigate performance button to start troubleshooting latency with AI assistance from Gemini Cloud Assist.
After about two minutes, the Investigation details pane opens with the following sections:
- Issue. A description of the issue being investigated, including the investigation's start and stop time.
- Observations. A list of observations about the issue. For example, these can include lock contention details, such as a longer-than-expected lock wait ratio for the query.
- Hypotheses. A list of AI-recommended actions to take to help address the slow-running query.
System insights dashboard
Troubleshoot high database load with AI assistance in the System insights dashboard using the following steps:
In the Google Cloud console, go to the Spanner Instances page.
To open the Overview page of an instance, click the instance name.
Optional: Under Databases, click a database.
In the navigation menu, click System insights.
Optional: Use the Time range filter to select 1 hour, 6 hours, 1 day, 7 days, 30 days, or a custom range.
Optional: Zoom in to the sections of the chart that show high load that you want to analyze. For example, an area of high load might display CPU utilization levels close to 100%. To zoom in, click and select a portion of the chart.
Click the Explore Investigations button to start troubleshooting database load with AI assistance from Gemini Cloud Assist.
After about two minutes, the Investigation details pane opens with the following sections:
- Issue. A description of the issue being investigated, including the investigation's start and stop time.
- Observations. A list of observations about the issue. For example, these can include lock contention details, such as a longer-than-expected lock wait ratio for the query.
- Hypotheses. A list of AI-recommended actions to take to help address the slow-running query.
Analyze high database load
Using AI assistance, you can analyze and troubleshoot the details of your database load.
Analysis time period
Spanner analyzes your database for the time period that you select in your database load chart from the Query insights dashboard or the System insights dashboard. If you select a time period of less than 24 hours, then Spanner analyzes the entire time period. If you select a time period greater than 24 hours, then Spanner selects only the last 24 hours of the time period for analysis.
To calculate the baseline performance of your database, Spanner adds a 24-hour baseline time period to its analysis. If your selected time period occurs on a day other than Monday, then the baseline is the 24 hours immediately before your selected time period. If your selected time period occurs on a Monday, then the baseline is the 7th day before your selected time period.
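The window rules above can be sketched in Python. This is an illustration under stated assumptions, not Spanner's implementation. The function names are hypothetical. The sketch assumes that "occurs on a Monday" is decided by the weekday of the analysis window's start, and that "the 7th day before" means a 24-hour window starting exactly seven days before the analysis start.

```python
from datetime import datetime, timedelta

DAY = timedelta(hours=24)


def analysis_window(start, end):
    """Clamp the selected range to at most its last 24 hours."""
    if end <= start:
        raise ValueError("end must be after start")
    if end - start > DAY:
        # Ranges longer than 24 hours keep only the final 24 hours.
        start = end - DAY
    return start, end


def baseline_window(analysis_start):
    """Return the 24-hour baseline window for an analysis start time.

    Assumption: the weekday of `analysis_start` decides the Monday rule.
    """
    if analysis_start.weekday() == 0:  # Monday
        # Assumption: the same clock time seven days earlier.
        baseline_start = analysis_start - timedelta(days=7)
    else:
        # The 24 hours immediately before the analysis window.
        baseline_start = analysis_start - DAY
    return baseline_start, baseline_start + DAY


if __name__ == "__main__":
    # A three-day selection ending on Thursday, January 4, 2024.
    start, end = analysis_window(datetime(2024, 1, 1), datetime(2024, 1, 4))
    print("analysis:", start, "to", end)
    print("baseline:", *baseline_window(start))
```

The Monday rule compares against the 7th day before rather than the preceding day, which in this sketch is modeled by shifting the window start back seven days.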
Metrics analysis
When Spanner starts the analysis, it checks for significant changes in various metrics, including but not limited to the following:
- CPU utilization
- Read and write latencies, P50 and P99
- Read and write queries per second (QPS)
- Node count
- Session metrics
- Lock wait time
- Transaction abort count
- Query statistics
- Transaction statistics
- Lock statistics
- Split statistics
Spanner compares the baseline aggregated data for your database with the performance data of your analysis time window. If Spanner detects that a key metric has changed beyond a significance threshold, then Spanner indicates a possible situation with your database. The identified situation might explain a cause for the high load on your database over the selected time period.
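To make the comparison concrete, the following Python sketch flags metrics whose average in the analysis window differs from the baseline average by more than a relative threshold. It is a simplified illustration: the `significant_changes` function, the metric names, the use of a plain mean, and the 50% default threshold are all assumptions, and the document does not state how Spanner aggregates data or which thresholds it applies.

```python
from statistics import mean


def significant_changes(baseline, current, threshold=0.5):
    """Return {metric: relative_change} for metrics that moved notably.

    `baseline` and `current` map metric names to lists of samples.
    `threshold` is a hypothetical relative-change cutoff (0.5 = 50%).
    """
    flagged = {}
    for name, base_samples in baseline.items():
        cur_samples = current.get(name)
        if not base_samples or not cur_samples:
            continue  # Skip metrics missing from either window.
        base_avg = mean(base_samples)
        if base_avg == 0:
            continue  # Relative change is undefined for a zero baseline.
        change = (mean(cur_samples) - base_avg) / base_avg
        if abs(change) >= threshold:
            flagged[name] = change
    return flagged


if __name__ == "__main__":
    baseline = {"cpu_utilization": [40, 40], "lock_wait_time": [10, 10]}
    current = {"cpu_utilization": [90, 90], "lock_wait_time": [11, 11]}
    print(significant_changes(baseline, current))
```

In this example only the CPU metric is flagged, because its average more than doubles while lock wait time rises by 10%, which is below the assumed cutoff.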
Recommendations
When Gemini Cloud Assist completes its analysis, the Hypotheses section of the Investigation details pane lists actionable insights to help remediate the issue.
In some situations, the analysis might not produce a recommendation.
What's next
- Write SQL with Gemini assistance.
- Understand latency metrics.
- Investigate high CPU utilization.
- Performance overview.
- Monitor instances with system insights.