Backup and DR Service provides a Pacemaker-aware snapshot feature for IBM Db2 High Availability Disaster Recovery (HADR) databases. Use this feature to safely take Persistent Disk snapshots of a standby database that a Pacemaker cluster manages.
This process prevents Pacemaker from misinterpreting the temporary database deactivation during a snapshot as a failure, which avoids an unnecessary database restart.
Before you begin
Grant the Db2 instance OS user the necessary permissions to run Pacemaker commands with sudo privileges on all Db2 HADR standby nodes.
As the root user, open the /etc/sudoers.d/db2_pacemaker_access file for editing:
visudo -f /etc/sudoers.d/db2_pacemaker_access
Add the following line to the file:
DB2_INSTANCE_OS_USER ALL=(root) NOPASSWD: /usr/sbin/pcs status, /usr/sbin/pcs resource * DB2_HADR_PACEMAKER_RESOURCE_REGEX
Replace the following placeholders:
- DB2_INSTANCE_OS_USER: your Db2 instance OS username.
- DB2_HADR_PACEMAKER_RESOURCE_REGEX: the regular expression that matches your Db2 HADR Pacemaker resource name. For example, *_db2.
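As a worked example of the placeholder substitution, the following sketch builds the sudoers entry from two shell variables and prints it. The values db2inst1 and *_db2 are hypothetical; the commented commands at the end are optional manual checks to run as root, and they are not part of the documented procedure.

```shell
# Hypothetical example values; substitute your own.
db2_instance_os_user="db2inst1"
db2_hadr_pacemaker_resource_regex="*_db2"

# Build the sudoers entry in the form shown in the step above.
sudoers_line="${db2_instance_os_user} ALL=(root) NOPASSWD: /usr/sbin/pcs status, /usr/sbin/pcs resource * ${db2_hadr_pacemaker_resource_regex}"
printf '%s\n' "$sudoers_line"

# Optional manual checks, run as root (not executed here):
#   visudo -c -f /etc/sudoers.d/db2_pacemaker_access   # syntax check of the file
#   sudo -l -U db2inst1                                 # list commands the user may run
```

Printing the line before pasting it into visudo makes it easier to spot a stray space or a wrong resource pattern.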
Enable Pacemaker-aware backups
To enable this feature, specify the Pacemaker resource name in the backup settings for your Db2 application.
- In the Backup and DR management console, go to the Db2 application backup settings.
- Locate the Db2 HADR CLUSTER PACEMAKER RESOURCE option.
- Enter the Pacemaker resource name for the Db2 instance that manages the HADR cluster.
How Pacemaker-aware snapshots work
The Pacemaker-aware backup process coordinates Pacemaker and Db2 through a specific sequence of operations.
- The script runs pre-checks to verify that it is on the standby node and the HADR pair is in a healthy state.
- The script places the Db2 instance resource into Pacemaker's maintenance mode. This action instructs Pacemaker to stop monitoring the resource.
- The script deactivates the database with the db2 deactivate command to freeze database I/O.
- The script takes the Persistent Disk storage snapshot.
- The script activates the database with the db2 activate command to unfreeze I/O.
- The script removes the Db2 instance resource from maintenance mode, which lets Pacemaker resume monitoring.
- The script logs all actions and runs cleanup routines.
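The ordering of the steps above can be illustrated with a dry-run sketch. It is not the Backup and DR script: every step only prints its name, the function names step and run_sequence are hypothetical, and the choice to run the undo steps even when the snapshot command fails is an assumption made for illustration.

```shell
# Dry-run sketch of the documented sequence. Each step only prints its name;
# no pcs, db2, or snapshot command is actually run.
step() {
  printf '%s\n' "$1"
}

# $1: command standing in for the Persistent Disk snapshot (hypothetical).
run_sequence() {
  snapshot_cmd="$1"
  rc=0
  step "pre-checks: standby node, healthy HADR pair"
  step "enter Pacemaker maintenance mode"
  step "db2 deactivate"
  step "take Persistent Disk snapshot"
  "$snapshot_cmd" || rc=$?
  # Assumption: the undo steps run even when the snapshot fails.
  step "db2 activate"
  step "exit Pacemaker maintenance mode"
  step "log actions and clean up"
  return "$rc"
}

# Simulate a successful snapshot.
run_sequence true
```

Calling run_sequence false simulates a failed snapshot: the activation and maintenance-mode exit steps still print, and the function returns a nonzero status.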
Troubleshoot
If a backup job fails, examine the following logs on the database standby node for more details:
- /var/act/log/customapp-db2instance.log: contains information about standby database deactivation and activation, and Pacemaker resource maintenance mode entry and exit.
- /act/tmpdata/BACKUP_JOB_NAME/pcs_background_cleanup.log: indicates if the Pacemaker resource was automatically taken out of maintenance mode because the snapshot exceeded the two-minute timeout.
- /var/log/pacemaker/pacemaker.log: provides additional context from Pacemaker.
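A small helper can make it quicker to look at the end of each log. The following is a minimal sketch; the function name show_recent and the default of 50 lines are assumptions, and BACKUP_JOB_NAME remains a placeholder for your job name.

```shell
# Print the last lines of a log file, or report that it cannot be read.
# $1: log path, $2: number of lines (defaults to 50).
show_recent() {
  log_file="$1"
  line_count="${2:-50}"
  if [ -r "$log_file" ]; then
    tail -n "$line_count" "$log_file"
  else
    printf 'cannot read: %s\n' "$log_file" >&2
    return 1
  fi
}

# Example calls on the standby node (not executed here):
#   show_recent /var/act/log/customapp-db2instance.log
#   show_recent /act/tmpdata/BACKUP_JOB_NAME/pcs_background_cleanup.log 20
#   show_recent /var/log/pacemaker/pacemaker.log 100
```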
Snapshot timeout
If a Persistent Disk snapshot job exceeds two minutes, Backup and DR automatically takes the Pacemaker resource out of maintenance mode. The backup job status changes to Retrying, and the scheduler initiates a new backup in ten minutes.
Aborted or blocked backup jobs
If a backup job aborts or becomes blocked, the system exits Pacemaker resource maintenance mode after two minutes, and the backup job fails.
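The timeout behavior described in these two sections resembles a watchdog pattern: a background timer runs a cleanup action unless the main job finishes and cancels it first. The following sketch shows that general pattern only. It is not the Backup and DR implementation, and start_watchdog, cancel_watchdog, exit_maintenance_mode, and take_snapshot are hypothetical names.

```shell
# Run a callback after a timeout unless the watchdog is cancelled first.
# $1: timeout in seconds; remaining arguments: callback command (hypothetical).
start_watchdog() {
  watchdog_timeout="$1"
  shift
  ( sleep "$watchdog_timeout" && "$@" ) >/dev/null 2>&1 &
  watchdog_pid=$!
}

# Cancel the watchdog, for example once the snapshot finishes in time.
cancel_watchdog() {
  kill "$watchdog_pid" 2>/dev/null || true
  wait "$watchdog_pid" 2>/dev/null || true
}

# Example shape of a caller (not executed here); 120 seconds mirrors the
# two-minute limit described above:
#   start_watchdog 120 exit_maintenance_mode
#   take_snapshot
#   cancel_watchdog
```

With this design the cleanup action still fires if the main job hangs or is killed, which matches the documented outcome that the resource does not stay in maintenance mode indefinitely.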
Configure alerts
You can configure alerts from the Backup and DR management console to send emails when a backup job fails. We also recommend that you configure Pacemaker alerts based on pcs status.
What's next
- Learn more about Backup and DR concepts.
- Learn more about Backup and DR for IBM Db2 databases.