Troubleshoot CPU bus locks

This document explains how to identify and troubleshoot CPU bus locks in Linux guest operating systems. It covers the symptoms of CPU bus locks, how to diagnose these issues by using kernel log messages, how to locate the faulty code, and how to apply mitigations or fixes.

Overview

A CPU bus lock occurs when a processor must assert a hardware LOCK# signal to acquire exclusive access to the system-wide memory bus. This issue typically happens in one of the following circumstances:

  • An atomic instruction operates on unaligned memory that crosses a cache line boundary (a split lock).
  • An atomic instruction operates on memory that is designated as uncacheable (UC), such as Memory-Mapped I/O (MMIO).

Because the CPU asserts a global bus lock, all other processors and devices in the guest operating system must wait while the memory operation completes. A high rate of bus locks might severely degrade CPU performance.

While older processors didn't track bus locks, modern x86 processors, such as Intel Sapphire Rapids and later or AMD Zen 5 and later, include a hardware feature that detects CPU bus locks. When an instruction triggers a CPU bus lock, the CPU issues a debug exception (#DB) immediately after the instruction completes.

Beginning with Linux kernel version 5.13 on Intel and 6.13 on AMD, the Linux kernel intercepts this #DB exception and applies a mitigation, typically by rate-limiting the faulty process. By intentionally forcing the thread to sleep, the kernel prevents a single application from saturating the memory bus, preserving performance for the rest of the compute instance at the cost of the faulty application's performance.
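If you have shell access to the instance, you can check which detection mode is in effect. A minimal check (the exact dmesg wording varies by kernel version, and dmesg might require root):

```shell
# Check whether a split_lock_detect mode was set at boot. No match
# means the kernel default applies, which is "warn" on CPUs that
# support detection.
grep -o 'split_lock_detect=[a-z]*' /proc/cmdline \
  || echo "split_lock_detect not set on the kernel command line"

# Recent kernels also report detection support at boot.
dmesg 2>/dev/null | grep -i "split lock detection" || true
```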

Symptoms

If a process in your Linux guest is triggering CPU bus locks, you might experience the following symptoms:

  • Degraded application performance: CPU bus locks might introduce unanticipated latency for applications.
  • System-wide load spikes: overall system responsiveness might decline.
  • Unexpected application crashes: if you configure the kernel to handle split or bus locks strictly (split_lock_detect=fatal), then the faulty application might crash with a SIGBUS error.

Identify CPU bus locks

To identify whether your compute instance is experiencing CPU bus locks, check the kernel log (for example, in the serial console output) for a split lock detection message similar to the following:

Example CPU bus lock trace

x86/split lock detection: #DB: <process_name>/<pid> took a bus_lock trap at address: 0x<address>
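For example, to search an instance's kernel logs for past bus lock traps (dmesg might require root, and journalctl applies to systemd-based distributions):

```shell
# Search the kernel ring buffer for split lock or bus lock traps.
dmesg 2>/dev/null | grep -E "bus_lock|split lock" || true

# On systemd-based distributions, also search persisted kernel logs.
journalctl -k 2>/dev/null | grep "took a bus_lock trap" || true
```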

To detect future CPU bus locks, do the following:

  1. Enable serial port output logging.
  2. Create a log-based alerting policy for the following log:

    resource.type="gce_instance" log_id("serialconsole.googleapis.com/serial_port_1_output") textPayload=~"took a bus_lock trap"

    This log entry gives you the process name (<process_name>) and process ID (<pid>) of the process responsible for the CPU bus lock, as well as the instruction pointer address where the fault occurred.

Troubleshoot CPU bus locks

If you are developing or compiling the faulty application, you can use specific C or C++ compiler warnings to identify variables and structures that might cause split locks.

Compiler warnings

If you use GCC or Clang, compile your code with the following flags to help identify alignment issues:

  • -Wcast-align or -Wcast-align=strict: these flags warn you when a pointer cast increases the required alignment of the target. Casting a generic char* buffer to a uint64_t* and performing an atomic operation on it is a classic cause of split locks.
  • -Waddress-of-packed-member: this flag warns you when you take the address of a packed struct member (for example, using #pragma pack(1) or __attribute__((packed))). Because packed structures disregard natural memory alignment, any atomic operation on a member of a packed struct has a high probability of crossing a 64-byte cache line boundary.

Catch uncacheable (UC) memory locks

If atomic operations on uncacheable memory cause the CPU bus lock, compiler warnings won't catch them. This issue typically happens when code interacts with device memory. To find and fix these cases, do the following:

  • Audit memory mappings: review your code for uses of mmap with flags like O_SYNC or direct access to /dev/mem or /dev/uio.
  • Avoid atomics on MMIO: don't use atomic operations like __sync_fetch_and_add or std::atomic on memory regions mapped to device registers or uncacheable memory buffers.

Fix CPU bus locks

You can fix CPU bus lock issues by correcting the memory alignment in the application's source code.

  • Avoid using #pragma pack or __attribute__((packed)) on structures that contain atomic variables, mutexes, or spinlocks.
  • Use standard alignment directives (like alignas(64) in C++11 or __attribute__((aligned(64))) in C) to force variables that are heavily used in atomic operations to align to cache line boundaries.
  • Make sure that there are no alignment-related warnings during compilation.
  • Make sure that you only use standard locking mechanisms (mutexes, spinlocks) or atomic instructions on standard, cacheable RAM, never on MMIO or UC memory.

If the troubleshooting steps didn't resolve the issue, then contact Cloud Customer Care and include all of the information you gathered during troubleshooting.