Creating physical database backups with support for point-in-time recovery

While logical database backups require very little space, they also have the disadvantage of taking a great deal of time to restore for anything other than the smallest of Evergreen systems. Physical database backups are little more than a copy of the database file system, meaning that the space required for each physical backup will match the space used by your production database. However, physical backups offer the great advantage of almost instantaneous recovery, because the indexes already exist and simply need to be validated when you begin database recovery. Your backup server should match the configuration of your master server as closely as possible including the version of the operating system and PostgreSQL.

Like logical backups, physical backups also represent a snapshot of the data at the point in time at which you began the backup. However, if you combine physical backups with write-ahead-log (WAL) segment archiving, you can restore a version of your database that represents any point in time between the time the backup began and the time at which the last WAL segment was archived, a feature referred to as point-in-time recovery (PITR). PITR enables you to undo the damage that an accidentally or deliberately harmful UPDATE or DELETE statement could inflict on your production data, so while the recovery process can be complex, it provides fine-grained insurance for the integrity of your data when you run upgrade scripts against your database, deploy new custom functionality, or make global changes to your data.

To set up WAL archiving for your production Evergreen database, you need to modify your PostgreSQL configuration (typically located on Debian and Ubuntu servers in /etc/postgresql/<version>/postgresql.conf):

  1. Change the value of archive_mode to on
  2. Set the value of archive_command to a command that accepts the parameters %f (representing the file name of the WAL segment) and %p (representing the complete path name for the WAL segment, including the file name). You should copy the WAL segments to a remote file system that can be read by the same server on which you plan to create your physical backups. For example, if /data/wal represents a remote file system to which your database server can write, a possible value of archive_command could be: test ! -f /data/wal/%f && cp %p /data/wal/%f, which effectively tests to see if the destination file already exists, and if it does not, copies the WAL segment to that location. This command can be and often is much more complex (for example, using scp or rsync to transfer the file to the remote destination rather than relying on a network share), but you can start with something simple.

Once you have modified your PostgreSQL configuration, you need to restart the PostgreSQL server before the configuration changes will take hold: . Stop your OpenSRF services. . Restart your PostgreSQL server. . Start your OpenSRF services and restart your Apache HTTPD server.

To create a physical backup of your production Evergreen database:

  1. From your backup server, issue the pg_basebackup -x -D <data-destination-directory> -U <user-name> -h <hostname> <database-name> command to create a physical backup of database <database-name> on your backup server.

You should establish a process for creating regular physical backups at periodic intervals, bearing in mind that the longer the interval between physical backups, the more WAL segments the backup database will have to replay at recovery time to get back to the most recent changes to the database. For example, to be able to relatively quickly restore the state of your database to any point in time over the past four weeks, you might take physical backups at weekly intervals, keeping the last four physical backups and all of the corresponding WAL segments.