Manage high availability

Manage high availability for your AlloyDB Omni clusters to add resilience to outages and failures through automated failovers and recovery mechanisms.

Limitations

  • Standby nodes can't be used as readable replicas.

  • Even if the data plane is healthy, an automatic failover occurs if the AlloyDB Omni cluster manager is down for more than 90 seconds, by default. You can configure this duration in Configure the high availability specification using the HEALTHCHECK_PERIOD and AUTOFAILOVER_TRIGGER_THRESHOLD variables.

Configure the high availability specification

To configure high availability, fill out the following information in your DBCluster specification:

DBCluster:
  metadata:
    ...
  spec:
    ...
    availability:
      numberOfStandbys: NUMBER_OF_STANDBYS
      enableAutoFailover: true
      enableAutoHeal: true
      replayReplicationSlotsOnStandbys: false
      healthcheckPeriodSeconds: HEALTHCHECK_PERIOD
      autoFailoverTriggerThreshold: AUTOFAILOVER_TRIGGER_THRESHOLD
      autoHealTriggerThreshold: AUTOHEAL_TRIGGER_THRESHOLD

Replace the following variables:

  • NUMBER_OF_STANDBYS: number of standby nodes to set up. Setting this value to 0 disables high availability. The maximum value is 5. If you're not sure how many standby nodes you need, start with 2 for high resiliency.

  • (Optional) HEALTHCHECK_PERIOD: number of seconds to wait between each health check. The default value is 30. The minimum value is 1. The maximum value is 86400 (one day).

  • (Optional) AUTOFAILOVER_TRIGGER_THRESHOLD: number of times the health check can fail before a failover occurs. The default value is 3. The minimum value is 0, but if the value is set to 0, AlloyDB Omni uses the default value.

    An automatic failover occurs if the health check fails AUTOFAILOVER_TRIGGER_THRESHOLD consecutive times, that is, after approximately HEALTHCHECK_PERIOD * AUTOFAILOVER_TRIGGER_THRESHOLD seconds.

  • (Optional) AUTOHEAL_TRIGGER_THRESHOLD: number of times the health check can fail before auto-heal begins. The default value is 3. The minimum value is 0, but if the value is set to 0, AlloyDB Omni uses the default value.

    An automatic recovery occurs if the health check fails AUTOHEAL_TRIGGER_THRESHOLD consecutive times, that is, after approximately HEALTHCHECK_PERIOD * AUTOHEAL_TRIGGER_THRESHOLD seconds.
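For example, the following availability block uses illustrative values and keeps the default health-check timing, so an automatic failover triggers after roughly 30 * 3 = 90 seconds of failed health checks. Adjust the values for your environment:

```yaml
DBCluster:
  metadata:
    ...
  spec:
    ...
    availability:
      numberOfStandbys: 2              # two standby nodes for high resiliency
      enableAutoFailover: true
      enableAutoHeal: true
      replayReplicationSlotsOnStandbys: false
      healthcheckPeriodSeconds: 30     # default health-check interval
      autoFailoverTriggerThreshold: 3  # failover after 3 failed checks (~90 seconds)
      autoHealTriggerThreshold: 3      # auto-heal after 3 failed checks
```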

Apply your DBCluster specification

To apply your configured DBCluster specification, run one of the following commands:

alloydbctl

alloydbctl apply -d "DEPLOYMENT_SPEC" -r "DBCLUSTER_SPECIFICATION"

Replace the following variables:

  • DEPLOYMENT_SPEC: path to the deployment specification you created in Install AlloyDB Omni components.

  • DBCLUSTER_SPECIFICATION: path to the DBCluster specification you created in Create a cluster.

Ansible

ansible-playbook DBCLUSTER_PLAYBOOK -i "DEPLOYMENT_SPEC" \
      -e resource_spec="DBCLUSTER_SPECIFICATION"

Replace the following variables:

  • DBCLUSTER_PLAYBOOK: path to the playbook that you created for your DBCluster CRD.

  • DEPLOYMENT_SPEC: path to the deployment specification you created in Install AlloyDB Omni components.

  • DBCLUSTER_SPECIFICATION: path to the DBCluster specification you created in Create a cluster.

Switchover to a standby instance

You can perform a switchover to test your high availability setup, or for any other planned maintenance activity that requires swapping the primary and standby roles. After the switchover completes, the direction of replication and the roles of the primary and standby are reversed.

Switchovers perform the following actions:

  1. AlloyDB Omni orchestrator takes the primary offline.

  2. AlloyDB Omni orchestrator promotes the standby to be the new primary.

  3. AlloyDB Omni orchestrator converts the old primary into a standby.

  4. AlloyDB Omni starts the newly converted standby.

Perform a switchover

To perform a switchover, complete the following steps:

  1. Verify that your primary and standby instances are healthy.

  2. Verify that the high availability status.phase is Ready.

    alloydbctl

    alloydbctl get -d "DEPLOYMENT_SPEC" -t DBCluster -n DBCLUSTER_SPECIFICATION -o yaml

    Replace the variables as described in Apply your DBCluster specification.

    Ansible

    ansible-playbook status.yaml -i DEPLOYMENT_SPEC -e resource_type=DBCluster \
      -e resource_name=DBCLUSTER_SPECIFICATION

    Replace the variables as described in Apply your DBCluster specification.

  3. Create a Switchover specification using the following format:

    Switchover:
      metadata:
        name: SWITCHOVER_NAME
      spec:
        dbClusterRef: DBCLUSTER_NAME
        newPrimary: NEW_PRIMARY_NAME
    

    Replace the following variables:

    • SWITCHOVER_NAME: name for this Switchover specification. For example, my-switchover-1. This name must be unique every time a switchover is performed.

    • DBCLUSTER_NAME: name of your database cluster that you defined in Create a cluster.

    • (Optional) NEW_PRIMARY_NAME: name of the standby instance to promote to the new primary.
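    As an illustration, a completed Switchover specification might look like the following. The names my-switchover-1, my-db-cluster, and my-standby-node are placeholders; substitute the names from your own deployment:

    ```yaml
    Switchover:
      metadata:
        name: my-switchover-1        # must be unique for each switchover
      spec:
        dbClusterRef: my-db-cluster  # the DBCluster defined in Create a cluster
        newPrimary: my-standby-node  # optional: the standby to promote
    ```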

  4. If you're using Ansible, create a playbook for your Switchover specification.

    - name: SWITCHOVER_PLAYBOOK_NAME
      hosts: localhost
      vars:
        ansible_become: true
        ansible_user: ANSIBLE_USER
        ansible_ssh_private_key_file: ANSIBLE_SSH_PRIVATE_KEY_FILE
      roles:
      - role: google.alloydbomni_orchestrator.switchover
    

    Replace the following variables:

    • SWITCHOVER_PLAYBOOK_NAME: name of your Ansible playbook. For example, My Switchover.

    • ANSIBLE_USER: OS user that Ansible uses to log into your AlloyDB Omni nodes.

    • ANSIBLE_SSH_PRIVATE_KEY_FILE: private key Ansible uses to connect to your AlloyDB Omni nodes using SSH.

  5. Apply your Switchover specification.

    alloydbctl

    alloydbctl apply -d "DEPLOYMENT_SPEC" -r "SWITCHOVER_SPECIFICATION"

    Replace the following variables:

    • DEPLOYMENT_SPEC: path to the deployment specification you created in Install AlloyDB Omni components.

    • SWITCHOVER_SPECIFICATION: path to the Switchover specification you created in step three.

    Ansible

    ansible-playbook SWITCHOVER_PLAYBOOK -i "DEPLOYMENT_SPEC" \
      -e resource_spec="SWITCHOVER_SPECIFICATION"

    Replace the following variables:

    • SWITCHOVER_PLAYBOOK: path to the playbook that you created for your Switchover CRD in step four.

    • DEPLOYMENT_SPEC: path to the deployment specification you created in Install AlloyDB Omni components.

    • SWITCHOVER_SPECIFICATION: path to the Switchover specification you created in step three.

Load balancer for high availability

The load balancer (HAProxy) achieves high availability by pairing its nodes with Keepalived and a virtual IP. Keepalived utilizes the Virtual Router Redundancy Protocol (VRRP) to control a floating, virtual IP. Database client applications connect to this virtual IP instead of the database node's IP address.

In configurations where a dedicated load balancer isn't used, Keepalived is installed directly on the database nodes. In this scenario, high availability is achieved by dynamically assigning the virtual IP to the current primary node, ensuring seamless failover if the primary becomes unavailable.

To establish a stable election, Keepalived assigns VRRP priorities to the database cluster nodes. The first load balancer node assumes the primary role with a higher Keepalived priority, for example, 110. Subsequent nodes act as secondaries with a lower priority, for example, 100.

To ensure that the virtual IP points to a healthy node, Keepalived runs continuous health checks every two seconds, verifying the state of the systemd HAProxy service. If the HAProxy service on the primary fails, Keepalived migrates the virtual IP to a healthy secondary node.
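The behavior described above can be sketched with a minimal Keepalived configuration. This is an illustrative fragment, not the configuration that AlloyDB Omni generates; the interface name, router ID, and IP address are placeholders:

```
vrrp_script check_haproxy {
  script "/usr/bin/systemctl is-active --quiet haproxy"  # verify the systemd HAProxy service
  interval 2             # run the health check every two seconds
  fall 2                 # mark the node failed after two missed checks
}

vrrp_instance VI_1 {
  state MASTER           # first load balancer node
  interface eth0         # placeholder network interface
  virtual_router_id 51   # placeholder VRRP router ID
  priority 110           # higher priority on the primary; secondaries use 100
  virtual_ipaddress {
    10.0.0.100           # placeholder floating virtual IP
  }
  track_script {
    check_haproxy
  }
}
```

When the tracked script fails on the primary, VRRP demotes it and a secondary node with the next-highest priority takes over the virtual IP.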

If the database cluster's membership changes, HAProxy and Keepalived automatically point to the new active database nodes. The underlying routing configuration updates without dropping live client connections.

Configure the load balancer

To configure the virtual IP for the load balancer nodes, add the following dbLoadBalancerOptions field to the primarySpec field in your DBCluster specification:

DBCluster:
  spec:
    primarySpec:
      ...
      dbLoadBalancerOptions:
        onprem:
          loadBalancerIP: "VIRTUAL_IP"
          loadBalancerType: "internal"
          loadBalancerInterface: "VIRTUAL_IP_INTERFACE"

Replace the following variables:

  • VIRTUAL_IP: static IP address used for the floating, virtual IP. Database client applications use the IP address defined here. To ensure that Keepalived can broadcast gratuitous ARP messages successfully, this IP address must be available, must not be a loopback address, and, for on-premises deployments, must belong to the same subnet as your primary node interfaces.

  • VIRTUAL_IP_INTERFACE: network interface where VIRTUAL_IP is configured. The default value is eth0.
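For example, with a placeholder virtual IP of 10.0.0.100 on the default eth0 interface, the fragment looks like the following:

```yaml
DBCluster:
  spec:
    primarySpec:
      ...
      dbLoadBalancerOptions:
        onprem:
          loadBalancerIP: "10.0.0.100"    # placeholder virtual IP on the node subnet
          loadBalancerType: "internal"
          loadBalancerInterface: "eth0"   # default interface
```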