PostgreSQL WAL Archiving and Point-In-Time-Recovery

Enterprise PostgreSQL Solutions

Comments are off

PostgreSQL WAL Archiving and Point-In-Time-Recovery

WAL is short for Write-Ahead-Log. Any change to the data is first recorded in a WAL file. The WAL files are mainly used by RDBMS as a way to achieve durability and consistency while writing data to storage systems.

Before we move forward, let’s first see why we need a WAL archiving and Point in Time Recovery (PITR). Consider if you have accidentally dropped some table(s) or deleted some data? How do you recover from such mistakes? WAL archiving and PITR is the answer to that. A WAL file can be replayed on a server to recreate the recorded changes on that server. Hence we can use the WALs to recover from such dangerous situations. Whereas the PITR is a way to stop the replay of WALs at the specified point and have a consistent snapshot of data at that time. i.e. just before the table is dropped or data is removed.

How to Perform WAL Archiving

Normally, PostgreSQL databases keep the WAL files in the pg_wal directory of the $PGDATA. However, these WAL files may get recycled and can be deleted/overwritten by the server. So to avoid such scenarios, we keep a copy of WAL files in a separate directory other than $PGDATA.
In order to do that, the PostgreSQL server provides us a way to copy the WAL file to a different location as soon as a WAL file is generated by the server. This way depends on three commands (options) namely archive_mode, archive_command, and wal_level. These options can be set in the $PGDATA/postgresql.conf configuration file.

Archiving Options

PostgreSQL server provides us with some options through which we can control the WAL archiving. Let’s see what these options are and how to use them.

  • archive_mode signifies whether we want to enable the WAL archiving. It can accept the following values:
    • on – to enable the archiving
    • off – disable the archiving
    • always – normally this option is the same as ‘on’. This enables archiving for a standby server as well. If the standby shares the same path with another server, it may lead to WAL file corruption. So care must be taken in this case.
  • archive_command specifies how to archive (copy) the WAL files and where. This option accepts the shell command or a shell script. Which is executed whenever there is a WAL file generated by the server to archive it. This option accepts the following placeholders:
    • %f – if present it’s replaced with the filename of the WAL file.
    • %p – if present it is replaced with the pathname of the WAL file.
    • %% – is replaced with ‘%’
  • wal_level is another important option. In PostgreSQL version 10+, it defaults to ‘replica’, prior to this version it was set to minimal by default. wal_level accepts the following values:
    • Minimal – adds only the information that is required for crash recovery or from immediate shutdown. It’s not usable for replication or archiving purposes.
    • Replica – signifies that WAL will have enough information for WAL archiving and replication.
    • Logical – adds information required for logical replication.
an example of wal archive.
vim $PGDATA/postgresql.conf
archive_mode = on
archive_command = ‘cp %p /path/to//archive_dir/%f’
wal_level = replica

Point In Time Recovery

In PostgreSQL, PITR is a way to stop the replay of WAL files at an appropriate point in time. There can be many WAL files in the archive but we may not want to replay all of them. Replaying all WALs will result in the same state where we had made some mistake. There are two important prerequisites required for PITR to work.

  • Availability of a full base backup (usually taken with pg_basebackup)
  • WAL files (WAL archive)

In order to achieve the PITR, the first step would be to restore an earlier taken base backup and then create a recovery setup. The setup requires configuring restore_command and recovery_target options.

  • restore_command specifies from where to look up the WAL files to replay on this server. This command accepts the same placeholders as archive_command.
  • recovery_target_time This option tells the server when to stop the recovery or replay process. The process will stop as soon as the given timestamp is reached.

recovery_target_inclusive This option controls whether to stop the replay of WALs just after the recovery_target_time is reached (if set to true) or just before (if set to false).

an example of PITR recovery options.
vim $BACKUP/postgresql.con
restore_command = ‘cp  /path/to//archive_dir/%f %p’
recovery_target_time = ‘’
recovery_target_inclusive = false


Let’s combine all of the above in a practical demonstration and see how this all works.

# Start server
./pg_ctl -D $PGDATA start
# take a base backup
./pg_basebackup -D $BACKUP -Fp
# connect and put some data
./psql postgres
postgres=# create table foo(c1 int, c2 timestamp default current_timestamp);
postgres=#  insert into foo select generate_series(1, 1000000), clock_timestamp();
INSERT 0 1000000
postgres=# select current_timestamp;
 2020-10-01 18:01:18.157764+05
(1 row)
postgres=# delete from foo;
DELETE 1000000
postgres=# select current_timestamp;
 2020-10-01 18:01:36.272033+05
(1 row)

Let’s stop the server and create a recovery setup on the backup to stop before the deletion of data occurred.

./pg_ctl -D $PGDATA stop
# tell this cluster to start on recovery mode.
touch $BACKUP/recovery.signal

# edit configuration file to setup recovery options on the backup cluster.
vim  $BACKUP/postgresql.conf
# Recovery Options
restore_command = ‘cp  $HOME/wal_archive/%f %p’
recovery_target_time = ‘2020-10-01 18:01:18.157764+05’
recovery_target_inclusive = false

# let’s start the backup cluster and start the recovery process.
./pg_ctl -D $BACKUP start
 2020-10-01 18:03:17.365 PKT [71219] LOG:  listening on IPv4 address "", port 5432
2020-10-01 18:03:17.366 PKT [71219] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-10-01 18:03:17.371 PKT [71220] LOG:  database system was interrupted; last known up at 2020-10-01 18:00:45 PKT
2020-10-01 18:03:17.413 PKT [71220] LOG:  starting point-in-time recovery to 2020-10-01 18:01:18.157764+05
2020-10-01 18:03:17.441 PKT [71220] LOG:  restored log file "000000010000000000000002" from archive
2020-10-01 18:03:17.463 PKT [71220] LOG:  redo starts at 0/2000028
2020-10-01 18:03:17.463 PKT [71220] LOG:  consistent recovery state reached at 0/2000100
2020-10-01 18:03:17.463 PKT [71219] LOG:  database system is ready to accept read only connections
2020-10-01 18:03:17.501 PKT [71220] LOG:  restored log file "000000010000000000000003" from archive
2020-10-01 18:03:18.182 PKT [71220] LOG:  restored log file "000000010000000000000004" from archive
2020-10-01 18:03:18.851 PKT [71220] LOG:  restored log file "000000010000000000000005" from archive
2020-10-01 18:03:19.539 PKT [71220] LOG:  restored log file "000000010000000000000006" from archive
2020-10-01 18:03:20.195 PKT [71220] LOG:  restored log file "000000010000000000000007" from archive
2020-10-01 18:03:20.901 PKT [71220] LOG:  restored log file "000000010000000000000008" from archive
2020-10-01 18:03:21.574 PKT [71220] LOG:  restored log file "000000010000000000000009" from archive
2020-10-01 18:03:22.286 PKT [71220] LOG:  restored log file "00000001000000000000000A" from archive
2020-10-01 18:03:22.697 PKT [71220] LOG:  recovery stopping before commit of transaction 508, time 2020-10-01 18:01:32.304189+05
2020-10-01 18:03:22.697 PKT [71220] LOG:  pausing at the end of recovery
2020-10-01 18:03:22.697 PKT [71220] HINT:  Execute pg_wal_replay_resume() to promote.

Connect to this cluster and see if we still have data in the foo table?

postgres=# select count(*) from foo;
(1 row)

See when the recovery stopped, we still have the data!


PITR is a critical process to recover important data in case of a server crash which may result in an invalid/corrupt data directory leading to downtime. However, what can be prevented is the impact of a crash resulting in data loss. 

Keeping an up-to-date full backup with WAL files will lead to a simple and convenient restoration process. The time recovery time will depend on how recent the full backup was and how many WAL files were generated past it.