Manage high availability

Manage high availability for your AlloyDB Omni clusters to add resilience to outages and failures through automated failovers and recovery mechanisms.

Limitations

  • Standby nodes can't be used as readable replicas.

  • Even if the data plane is healthy, an automatic failover occurs if the AlloyDB Omni cluster manager is down for more than 90 seconds, by default. You can configure this duration in Configure the high availability specification using the HEALTHCHECK_PERIOD and AUTOFAILOVER_TRIGGER_THRESHOLD variables.

Configure the high availability specification

To configure high availability, fill out the following information in your DBCluster specification:

DBCluster:
  metadata:
    ...
  spec:
    ...
    availability:
      numberOfStandbys: NUMBER_OF_STANDBYS
      enableAutoFailover: true
      enableAutoHeal: true
      replayReplicationSlotsOnStandbys: false
      healthcheckPeriodSeconds: HEALTHCHECK_PERIOD
      autoFailoverTriggerThreshold: AUTOFAILOVER_TRIGGER_THRESHOLD
      autoHealTriggerThreshold: AUTOHEAL_TRIGGER_THRESHOLD

Replace the following variables:

  • NUMBER_OF_STANDBYS: number of standby nodes to set up. Setting this value to 0 disables high availability. The maximum value is 5. If you're not sure how many standby nodes you need, start with 2 for high resiliency.

  • (Optional) HEALTHCHECK_PERIOD: number of seconds to wait between each health check. The default value is 30. The minimum value is 1. The maximum value is 86400 (one day).

  • (Optional) AUTOFAILOVER_TRIGGER_THRESHOLD: number of times the health check can fail before a failover occurs. The default value is 3. The minimum value is 0, but if the value is set to 0, AlloyDB Omni uses the default value.

    An automatic failover occurs if the health check fails AUTOFAILOVER_TRIGGER_THRESHOLD consecutive times, that is, after approximately HEALTHCHECK_PERIOD * AUTOFAILOVER_TRIGGER_THRESHOLD seconds.

  • (Optional) AUTOHEAL_TRIGGER_THRESHOLD: number of times the health check can fail before auto-heal begins. The default value is 3. The minimum value is 0, but if the value is set to 0, AlloyDB Omni uses the default value.

    An automatic recovery occurs if the health check fails AUTOHEAL_TRIGGER_THRESHOLD consecutive times, that is, after approximately HEALTHCHECK_PERIOD * AUTOHEAL_TRIGGER_THRESHOLD seconds.
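For example, the following availability block uses illustrative values and keeps the default health-check timing, so an automatic failover triggers after roughly 30 * 3 = 90 seconds of failed health checks. Adjust the values for your environment:

```yaml
DBCluster:
  metadata:
    ...
  spec:
    ...
    availability:
      numberOfStandbys: 2              # two standby nodes for high resiliency
      enableAutoFailover: true
      enableAutoHeal: true
      replayReplicationSlotsOnStandbys: false
      healthcheckPeriodSeconds: 30     # default health-check interval
      autoFailoverTriggerThreshold: 3  # failover after 3 failed checks (~90 seconds)
      autoHealTriggerThreshold: 3      # auto-heal after 3 failed checks
```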

Apply your DBCluster specification

To apply your configured DBCluster specification, run one of the following commands:

alloydbctl

alloydbctl apply -d "DEPLOYMENT_SPEC" -r "DBCLUSTER_SPECIFICATION"

Replace the following variables:

  • DEPLOYMENT_SPEC: path to the deployment specification you created in Install AlloyDB Omni components.

  • DBCLUSTER_SPECIFICATION: path to the DBCluster specification you created in Create a cluster.

Ansible

ansible-playbook DBCLUSTER_PLAYBOOK -i "DEPLOYMENT_SPEC" \
      -e resource_spec="DBCLUSTER_SPECIFICATION"

Replace the following variables:

  • DBCLUSTER_PLAYBOOK: path to the playbook that you created for your DBCluster CRD.

  • DEPLOYMENT_SPEC: path to the deployment specification you created in Install AlloyDB Omni components.

  • DBCLUSTER_SPECIFICATION: path to the DBCluster specification you created in Create a cluster.

Switchover to a standby instance

You can perform a switchover to test your high availability setup, or for any other planned maintenance activity that requires swapping the primary and standby roles. After the switchover completes, the direction of replication and the roles of the primary and standby are reversed.

Switchovers perform the following actions:

  1. AlloyDB Omni orchestrator takes the primary offline.

  2. AlloyDB Omni orchestrator promotes the standby to be the new primary.

  3. AlloyDB Omni orchestrator converts the old primary into a standby.

  4. AlloyDB Omni starts the newly converted standby.

Perform a switchover

To perform a switchover, complete the following steps:

  1. Verify that your primary and standby instances are healthy.

  2. Verify that the high availability status.phase is Ready.

    alloydbctl

    alloydbctl get -d "DEPLOYMENT_SPEC" -t DBCluster -n DBCLUSTER_SPECIFICATION -o yaml

    Replace the variables as described in Apply your DBCluster specification.

    Ansible

    ansible-playbook status.yaml -i DEPLOYMENT_SPEC -e resource_type=DBCluster \
      -e resource_name=DBCLUSTER_SPECIFICATION

    Replace the variables as described in Apply your DBCluster specification.

  3. Create a Switchover specification using the following format:

    Switchover:
      metadata:
        name: SWITCHOVER_NAME
      spec:
        dbClusterRef: DBCLUSTER_NAME
        newPrimary: NEW_PRIMARY_NAME
    

    Replace the following variables:

    • SWITCHOVER_NAME: name for this Switchover specification. For example, my-switchover-1. This name must be unique every time a switchover is performed.

    • DBCLUSTER_NAME: name of your database cluster that you defined in Create a cluster.

    • (Optional) NEW_PRIMARY_NAME: name of the standby instance to promote to the new primary.
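    As an illustration, a completed Switchover specification might look like the following. The names my-switchover-1, my-db-cluster, and my-standby-node are placeholders; substitute the names from your own deployment:

    ```yaml
    Switchover:
      metadata:
        name: my-switchover-1        # must be unique for each switchover
      spec:
        dbClusterRef: my-db-cluster  # the DBCluster defined in Create a cluster
        newPrimary: my-standby-node  # optional: the standby to promote
    ```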

  4. If you're using Ansible, create a playbook for your Switchover specification.

    - name: SWITCHOVER_PLAYBOOK_NAME
      hosts: localhost
      vars:
        ansible_become: true
        ansible_user: ANSIBLE_USER
        ansible_ssh_private_key_file: ANSIBLE_SSH_PRIVATE_KEY_FILE
      roles:
      - role: google.alloydbomni_orchestrator.switchover
    

    Replace the following variables:

    • SWITCHOVER_PLAYBOOK_NAME: name of your Ansible playbook. For example, My Switchover.

    • ANSIBLE_USER: OS user that Ansible uses to log into your AlloyDB Omni nodes.

    • ANSIBLE_SSH_PRIVATE_KEY_FILE: private key Ansible uses to connect to your AlloyDB Omni nodes using SSH.

  5. Apply your Switchover specification.

    alloydbctl

    alloydbctl apply -d "DEPLOYMENT_SPEC" -r "SWITCHOVER_SPECIFICATION"

    Replace the following variables:

    • DEPLOYMENT_SPEC: path to the deployment specification you created in Install AlloyDB Omni components.

    • SWITCHOVER_SPECIFICATION: path to the Switchover specification you created in step three.

    Ansible

    ansible-playbook SWITCHOVER_PLAYBOOK -i "DEPLOYMENT_SPEC" \
      -e resource_spec="SWITCHOVER_SPECIFICATION"

    Replace the following variables:

    • SWITCHOVER_PLAYBOOK: path to the playbook that you created for your Switchover CRD in step four.

    • DEPLOYMENT_SPEC: path to the deployment specification you created in Install AlloyDB Omni components.

    • SWITCHOVER_SPECIFICATION: path to the Switchover specification you created in step three.

Load balancer for high availability

The load balancer (HAProxy) achieves high availability by pairing its nodes with Keepalived and a virtual IP. Keepalived utilizes the Virtual Router Redundancy Protocol (VRRP) to control a floating, virtual IP. Database client applications connect to this virtual IP instead of the database node's IP address.

In configurations where a dedicated load balancer isn't used, Keepalived is installed directly on the database nodes. In this scenario, high availability is achieved by dynamically assigning the virtual IP to the current primary node, ensuring seamless failover if the primary becomes unavailable.

To establish a stable election, Keepalived assigns VRRP priorities to the database cluster nodes. The first load balancer node assumes the primary role with a higher Keepalived priority, for example, 110. Subsequent nodes act as secondaries with a lower priority, for example, 100.

To ensure that the virtual IP points to a healthy node, Keepalived runs continuous health checks every two seconds, verifying the state of the systemd HAProxy service. If the HAProxy service on the primary fails, Keepalived migrates the virtual IP to a healthy secondary node.
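The behavior described above can be sketched with a minimal Keepalived configuration. This is an illustrative fragment, not the configuration that AlloyDB Omni generates; the interface name, router ID, and IP address are placeholders:

```
vrrp_script check_haproxy {
  script "/usr/bin/systemctl is-active --quiet haproxy"  # verify the systemd HAProxy service
  interval 2             # run the health check every two seconds
  fall 2                 # mark the node failed after two missed checks
}

vrrp_instance VI_1 {
  state MASTER           # first load balancer node
  interface eth0         # placeholder network interface
  virtual_router_id 51   # placeholder VRRP router ID
  priority 110           # higher priority on the primary; secondaries use 100
  virtual_ipaddress {
    10.0.0.100           # placeholder floating virtual IP
  }
  track_script {
    check_haproxy
  }
}
```

When the tracked script fails on the primary, VRRP demotes it and a secondary node with the next-highest priority takes over the virtual IP.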

If the database cluster's membership changes, HAProxy and Keepalived automatically point to the new active database nodes. The underlying routing configuration updates without dropping live client connections.

Configure the load balancer

To configure the virtual IP for the load balancer nodes, add the following dbLoadBalancerOptions field to the primarySpec field in your DBCluster specification:

DBCluster:
  spec:
    primarySpec:
      ...
      dbLoadBalancerOptions:
        onprem:
          loadBalancerIP: "VIRTUAL_IP"
          loadBalancerType: "internal"
          loadBalancerInterface: "VIRTUAL_IP_INTERFACE"

Replace the following variables:

  • VIRTUAL_IP: static IP address used for the floating, virtual IP. Database client applications use the IP address defined here. To ensure that Keepalived can broadcast gratuitous ARP messages successfully, this IP address must be available, must not be a loopback address, and, for on-premises deployments, must belong to the same subnet as your primary node interfaces.

  • VIRTUAL_IP_INTERFACE: network interface where VIRTUAL_IP is configured. The default value is eth0.
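For example, with a placeholder virtual IP of 10.0.0.100 on the default eth0 interface, the fragment looks like the following:

```yaml
DBCluster:
  spec:
    primarySpec:
      ...
      dbLoadBalancerOptions:
        onprem:
          loadBalancerIP: "10.0.0.100"    # placeholder virtual IP on the node subnet
          loadBalancerType: "internal"
          loadBalancerInterface: "eth0"   # default interface
```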