Configure Pacemaker-aware snapshots for IBM Db2 HADR

Backup and DR Service provides a Pacemaker-aware snapshot feature for IBM Db2 High Availability Disaster Recovery (HADR) databases. Use this feature to safely take Persistent Disk snapshots of a standby database that a Pacemaker cluster manages.

This process prevents Pacemaker from misinterpreting the temporary database deactivation during a snapshot as a failure, which avoids an unnecessary database restart.

Before you begin

Grant the Db2 instance OS user the necessary permissions to run Pacemaker commands with sudo privileges on all Db2 HADR standby nodes.

As the root user, open the /etc/sudoers.d/db2_pacemaker_access file for editing:
```
visudo -f /etc/sudoers.d/db2_pacemaker_access
```

Add the following line to the file:

DB2_INSTANCE_OS_USER  ALL=(root)    NOPASSWD: /usr/sbin/pcs status, /usr/sbin/pcs resource * DB2_HADR_PACEMAKER_RESOURCE_REGEX

Replace the following placeholders:
- DB2_INSTANCE_OS_USER: your Db2 instance OS username.
- DB2_HADR_PACEMAKER_RESOURCE_REGEX: the regular expression that matches your Db2 HADR Pacemaker resource name. For example, *_db2.

Note: The wildcard * after the resource name lets you pass flags like maintenance=true to the pcs resource command.
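
As an illustration, the following sketch fills in the two placeholders and prints the resulting sudoers entry. The user name db2inst1 and the pattern *_db2 are assumed example values, and make_sudoers_line is a hypothetical helper that is not part of Backup and DR.

```shell
# Hypothetical helper: prints the sudoers entry for a given Db2 instance
# OS user ($1) and Pacemaker resource pattern ($2).
make_sudoers_line() {
  printf '%s  ALL=(root)    NOPASSWD: /usr/sbin/pcs status, /usr/sbin/pcs resource * %s\n' "$1" "$2"
}

# Example values (assumptions): instance user db2inst1, resource pattern *_db2.
make_sudoers_line db2inst1 '*_db2'

# After saving the file, you can check its syntax as root with:
#   visudo -cf /etc/sudoers.d/db2_pacemaker_access
# and list what the instance user is allowed to run with:
#   sudo -l -U db2inst1
```

Checking the syntax before you rely on the entry helps catch a mistyped user name or pattern early.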

Enable Pacemaker-aware backups

To enable this feature, specify the Pacemaker resource name in the backup settings for your Db2 application.

1. In the Backup and DR management console, go to the Db2 application backup settings.
2. Locate the Db2 HADR CLUSTER PACEMAKER RESOURCE option.
3. Enter the Pacemaker resource name for the Db2 instance that manages the HADR cluster.

How Pacemaker-aware snapshots work

The Pacemaker-aware backup process coordinates Pacemaker and Db2 through a specific sequence of operations.

1. The script runs pre-checks to verify that it is running on the standby node and that the HADR pair is healthy.
2. The script places the Db2 instance resource into Pacemaker's maintenance mode. This action instructs Pacemaker to stop monitoring the resource.
3. The script deactivates the database with the db2 deactivate command to freeze database I/O.
4. The script takes the Persistent Disk storage snapshot.
5. The script activates the database with the db2 activate command to unfreeze I/O.
6. The script removes the Db2 instance resource from maintenance mode, which lets Pacemaker resume monitoring.
7. The script logs all actions and runs cleanup routines.
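
The ordering of these operations can be sketched in shell as follows. This is a minimal illustration, not the script that Backup and DR runs. The helper names (run, take_snapshot, pacemaker_aware_snapshot), the DRY_RUN switch, the resource name db2_hadr_res, and the database name SAMPLE are all hypothetical. The pcs resource meta invocation is an assumption about how maintenance mode might be toggled, and the pre-checks and logging steps are omitted.

```shell
# Illustrative sketch only; this is not the script that Backup and DR runs.
# DRY_RUN=1 (the default here) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical placeholder: Backup and DR takes the real snapshot.
take_snapshot() {
  run echo "take Persistent Disk snapshot"
}

# Usage: pacemaker_aware_snapshot RESOURCE_NAME DATABASE_NAME
pacemaker_aware_snapshot() {
  resource="$1"
  db="$2"
  rc=0

  # Enter maintenance mode (assumed pcs invocation).
  run sudo pcs resource meta "$resource" maintenance=true || return 1

  # Deactivate, snapshot, reactivate. Reactivate even if the snapshot fails.
  if run db2 deactivate database "$db"; then
    take_snapshot || rc=1
    run db2 activate database "$db" || rc=1
  else
    rc=1
  fi

  # Always leave maintenance mode so that Pacemaker resumes monitoring.
  run sudo pcs resource meta "$resource" maintenance=false || rc=1
  return "$rc"
}

# Example values (assumptions): resource db2_hadr_res, database SAMPLE.
pacemaker_aware_snapshot db2_hadr_res SAMPLE
```

The point of the structure is that the exit from maintenance mode runs on every path after the entry succeeds, so a failed deactivation or snapshot does not leave the resource unmonitored.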

Troubleshoot

If a backup job fails, examine the following logs on the database standby node for more details:

- /var/act/log/customapp-db2instance.log: contains information about standby database deactivation and activation, and about the Pacemaker resource entering and exiting maintenance mode.
- /act/tmpdata/BACKUP_JOB_NAME/pcs_background_cleanup.log: indicates whether the Pacemaker resource was automatically taken out of maintenance mode because the snapshot exceeded the two-minute timeout.
- /var/log/pacemaker/pacemaker.log: provides additional context from Pacemaker.
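
To look at all three logs for one job in a single pass, a small helper along these lines might be convenient. The function names and the example job name Job_0123456 are hypothetical; only the paths come from the list above.

```shell
# Prints the log files to inspect for a given backup job name ($1).
backup_log_paths() {
  printf '%s\n' \
    "/var/act/log/customapp-db2instance.log" \
    "/act/tmpdata/$1/pcs_background_cleanup.log" \
    "/var/log/pacemaker/pacemaker.log"
}

# Shows the last 50 lines of each log that exists and is readable.
show_backup_logs() {
  backup_log_paths "$1" | while IFS= read -r path; do
    if [ -r "$path" ]; then
      echo "== $path"
      tail -n 50 "$path"
    else
      echo "== $path (not found or not readable)"
    fi
  done
}

# Example job name (assumption): Job_0123456.
show_backup_logs Job_0123456
```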

Snapshot timeout

If a Persistent Disk snapshot job exceeds two minutes, Backup and DR automatically takes the Pacemaker resource out of maintenance mode. The backup job status changes to Retrying, and the scheduler initiates a new backup ten minutes later.

Aborted or blocked backup jobs

If a backup job aborts or becomes blocked, the system exits Pacemaker resource maintenance mode after two minutes, and the backup job fails.
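
The general idea of a time-limited operation with a cleanup action can be sketched as follows. This is an illustration of the pattern under stated assumptions, not the Backup and DR implementation: run_with_deadline is a hypothetical helper, the sleep command stands in for the snapshot, and the sketch relies on the GNU coreutils timeout utility, which exits with status 124 when the limit is reached.

```shell
# Illustrative sketch only; not the Backup and DR implementation.
# Runs an external command under a time limit (in seconds) and reports
# when the limit is hit. `timeout` cannot run shell functions directly.
run_with_deadline() {
  limit="$1"
  shift
  timeout "$limit" "$@"
  rc=$?
  if [ "$rc" -eq 124 ]; then
    # In the real feature, this is the point where the resource would be
    # taken out of maintenance mode.
    echo "deadline of ${limit}s exceeded: leave maintenance mode and retry later"
  fi
  return "$rc"
}

# Example: a stand-in "snapshot" (sleep 3) that overruns a 1-second limit.
# The limit described in this document is two minutes (120 seconds).
run_with_deadline 1 sleep 3
echo "exit status: $?"
```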

Configure alerts

You can configure alerts in the Backup and DR management console to send emails when a backup job fails. We also recommend that you configure Pacemaker alerts based on the output of pcs status.
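
One simple way to build a check on top of pcs status is to filter its output for lines that suggest trouble. The sketch below is only a starting point: pcs_status_problems is a hypothetical helper, the search patterns are assumptions that may need adjusting for your Pacemaker version, and the sample text is simplified, made-up output rather than output from a real cluster.

```shell
# Illustrative sketch only. Reads `pcs status` output on standard input and
# prints the lines that match the assumed problem patterns. Returns 0 even
# when nothing matches, so it is safe to use in a pipeline.
pcs_status_problems() {
  grep -E 'FAILED|Stopped|unmanaged|maintenance' || true
}

# Example with canned text instead of a live cluster (names are made up).
sample_status='  * db2_hadr_res (ocf:heartbeat:db2): Started node1
  * db2_vip (ocf:heartbeat:IPaddr2): FAILED node2'

printf '%s\n' "$sample_status" | pcs_status_problems

# On a cluster node, a scheduled job could run something like:
#   sudo pcs status | pcs_status_problems
# and send a notification when the output is not empty.
```

Including maintenance in the patterns means the check also fires during a normal Pacemaker-aware snapshot, so you might want to alert only when that state persists across several checks.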

What's next

Learn more about Backup and DR concepts.
Learn more about Backup and DR for IBM Db2 databases.