Cloud SQL performance capture overview

Cloud SQL for MySQL performance capture helps you diagnose and resolve complex and transient performance issues in your MySQL database.

When your instance experiences performance degradation, such as a database slowdown or stall, standard metrics can be insufficient for determining the root cause. Performance capture solves this issue by capturing detailed, point-in-time snapshots of the database at the moment when a problem is detected. You can use configurable triggers to take system-wide snapshots when transient issues occur and to detect long-running transactions.

Example use cases

This section lists example use cases for how you can use performance capture after you enable it for your instance.

Diagnose a database stall

Issue: An instance has been unresponsive for several minutes, but the metrics show only a drop in queries per second (QPS) and an increase in connections.

Example usage: Define a threshold trigger using runningThreadsThreshold. The performance capture log reveals a high number of semaphore waits, which identifies a specific mutex contention (for example, on the adaptive hash index) as the root cause.

Analyze query degradation

Issue: Query performance suddenly degrades system-wide.

Example usage: The performance capture log reveals a single, long-running transaction that accumulated a massive number of undo logs. The log identifies the long-running transaction, user, and query text.

Investigate replication lag

Issue: A read replica is falling significantly behind its source.

Example usage: You configure a high value trigger threshold for secondsBehindSourceThreshold. You can check the performance capture log to pinpoint the specific GTID causing the lag.

Manage long-running transactions

Issue: A batch job or user query runs for an excessive amount of time, holding locks.

Example usage: You configure a threshold trigger for transactionDurationThreshold. The performance capture log identifies the transaction that exceeds the threshold. You can use this information to investigate.

How performance data is captured

Performance capture operates as an agent-based service that monitors your instance. When you enable performance capture, your Cloud SQL instance does the following to capture performance data:

  1. The agent probes your database configuration to read the triggers that you've defined. The agent then probes the metrics of your database at a configurable interval, which is set to 30 seconds by default.

  2. If a problem is detected and the threshold of a trigger has been exceeded, then the agent continues to compare the database's live state to your rules. To prevent any false alarms from temporary spikes, the agent triggers a full performance capture only if it detects the problem for a consecutive number of probes. For example, the agent might trigger a performance capture if it detects that the number of threads are high for three probes in a row.

  3. When a performance capture is triggered, the agent connects to the database and runs a series of diagnostic commands to capture a detailed snapshot.

  4. The captured information is formatted into log entries and sent directly to the project's Cloud Logging for the Cloud SQL instance under a specific log stream named mysql-performance-capture.log.

Configurable triggers

You can configure the following triggers for performance capture:

  • runningThreadsThreshold: triggers when the number of active threads running on a primary instance exceeds the specified value. For example, you might configure the threshold to run performance capture if the number of active running threads goes higher than 100.

  • secondsBehindSourceThreshold: triggers for replicas when replication lag exceeds the specified number of seconds. For example, you might configure the threshold to run performance capture if the read replica lag is more than 300 seconds.

  • transactionDurationThreshold: triggers logging for individual transactions that run longer than the specified duration. For example, you might set up the trigger to log any single transaction that runs for more than 10 minutes.

Cooldown period after a performance capture

To prevent excessive logging and system overhead during a sustained performance event, performance capture implements a cooldown period of 30 minutes after a successful snapshot capture. This cooldown period is automatically activated to prevent the agent from triggering new, redundant captures while the system is in an extended problem state.

Pricing

Performance capture stores logs in Cloud Logging, which can incur additional storage costs.

For more information about the pricing for storing logs in Logging, see Pricing.

Limitations

  • You must have query insights enabled to use performance capture. If you disable query insights, then you also disable performance capture.
  • Performance capture is available only for Cloud SQL for MySQL 5.7 and later.

What's next