Various Restoration Techniques Using PostgreSQL Point-In-Time Recovery

Enterprise PostgreSQL Solutions

Comments are off

Various Restoration Techniques Using PostgreSQL Point-In-Time Recovery

Introduction

This blog is aimed at beginners trying to learn the basics of PostgreSQL but already have some experience under their belt. For this tutorial, we will assume you have PostgreSQL correctly installed on Ubuntu. All of these steps were done using PostgreSQL 16 (development version) and Ubuntu 22.10. We’ll go over 3 different restoration methods, the “default” restore to “latest” method, restore by Log Sequence Number (LSN) and restore by timestamp.

Point-In-Time Recovery (PITR) is the process of restoring a database to its state at a specific point in time by replaying the Write Ahead Logging (WAL) files. We can show this process by creating two copies of a database and adding data to one. We can then archive the WAL file created by the first database and have the second database replay them in order to get them in sync. This would be similar to if our first database crashed and we lost data. We are at some previous state in the database with only the WAL files to get us back. Replaying the WAL files acts just like rerunning each command we issued from the last checkpoint up until a point we specify. This then leaves us exactly where we were before the crash.

Getting Started

First, create a new database and an archive folder

$ initdb -D data/
$ mkdir archive/

Then edit this database’s postgresql.conf file

port = 5432
wal_level = replica
archive_mode = on
archive_command = 'cp %p $HOME/pg/archive/%f'
archive_timeout = 60

Now, make a copy of the entire database cluster (notice we have no tables or data to actually backup):

$ cp -rp data/* data.backup/

Start the first database

$ pg_ctl -D data/ -l data.log start
$ createuser -s postgres

Let’s add some data

$ psql -U postgres -c "create database test"
CREATE DATABASE
$ psql -U postgres -d test -c "create table t1 (id int, val text)"
CREATE TABLE
$ psql -U postgres -d test -c "insert into t1 values (1, 'hello')"
INSERT 0 1
$ psql -U postgres -d test -c "insert into t1 values (2, 'world')"
INSERT 0 1
$ psql -U postgres -d test -c "insert into t1 values (3, 'foo')"
INSERT 0 1
$ psql -U postgres -d test -c "insert into t1 values (4, 'bar')"
INSERT 0 1

Check the archive to make sure our command is working correctly. If you don’t see any files in the archive we can manually tell the database to create the unfinished WAL files. Be careful using this command too much though as unfinished WAL files are the same size as finished ones so this can quickly bloat our database.

$ psql -U postgres -c "select pg_switch_wal()"
 pg_switch_wal 
---------------
 0/187BBC8
(1 row)

$ ls -l archive/
total 16384
-rw------- 1 tristen tristen 16777216 Apr 1 09:49 000000010000000000000001

Stop the main server

$ pg_ctl -D data stop

Now, edit the copy’s postgresql.conf. We need to set the recovery_target_timeline to “latest” in order to have the copy replayed to the most recent state of the WAL files.

port = 5433
archive_mode = off
restore_command = 'cp $HOME/pg/archive/%f %p'
recovery_target_timeline = 'latest'

Lastly, we need to create a recovery.signal file to tell the database to start in recovery mode

$ touch data.backup/recovery.signal

Start the copy server and it should start up in recovery mode and begin replaying the WAL files.

$ pg_ctl -D data.backup -l data.backup.log start
$ tail data.backup.log
2023-04-14 09:50:16.459 PDT [21530] LOG:  redo done at 0/2000060 system usage: CPU: user: 0.00 s, system: 0.01 s, elapsed: 0.03 s
2023-04-14 09:50:16.459 PDT [21530] LOG:  last completed transaction was at log time 2023-04-14 09:49:13.53674-07
2023-04-14 09:50:16.468 PDT [21530] LOG:  restored log file "000000010000000000000002" from archive
cp: cannot stat '/home/tristen/pg/archive/00000002.history': No such file or directory
2023-04-14 09:50:16.479 PDT [21530] LOG:  selected new timeline ID: 2
cp: cannot stat '/home/tristen/pg/archive/00000001.history': No such file or directory
2023-04-14 09:50:16.496 PDT [21530] LOG:  archive recovery complete
2023-04-14 09:50:16.497 PDT [21528] LOG:  checkpoint starting: end-of-recovery immediate wait
2023-04-14 09:50:16.513 PDT [21528] LOG:  checkpoint complete: wrote 905 buffers (5.5%); 0 WAL file(s) added, 0 removed, 2 recycled; write=0.006 s, sync=0.007 s, total=0.018 s; sync files=259, longest=0.001 s, average=0.001 s; distance=28136 kB, estimate=28136 kB; lsn=0/3000028, redo lsn=0/3000028
2023-04-14 09:50:16.515 PDT [21525] LOG:  database system is ready to accept connections

Let’s check the data in the copy server

$ psql -U postgres -d test -c "select * from t1" -p 5433
 id |  val  
----+-------
  1 | hello
  2 | world
  3 | foo
  4 | bar
(4 rows)

This shows our backup replayed the WAL files up until the “latest” change.

Point-In-Time Recovery Using LSN

Follow all the steps above, but stop right before you shut down the main server.

We’re then going to execute a command to get the LSN of the current WAL file of our main database:

$ psql -U postgres -d test -c "select pg_current_wal_insert_lsn()"
 pg_current_wal_insert_lsn 
---------------------------
 0/2000060
(1 row)

Using this value, update the postgresql.conf of the copy database

port = 5433
archive_mode = off
restore_command = 'cp $HOME/pg/archive/%f %p'
recovery_target_lsn = '0/2000060'

Now we can shut down the main database and do our recovery like before:

$ pg_ctl -D data stop
$ touch data.backup/recovery.signal
$ pg_ctl -D data.backup -l data.backup.log start
$ psql -U postgres -d test -c "select * from t1" -p 5433
 id |  val  
----+-------
  1 | hello
  2 | world
  3 | foo
  4 | bar
(4 rows)

Point-In-Time Recovery Using Timestamps

Using timestamps is a little more involved as we need to reference specific points in time in order to restore our database. For this, proceed like the default example until its time to insert data:

$ psql -U postgres -c "create database test"
CREATE DATABASE
$ psql -U postgres -d test -c "create table t1 (id int, val text)"
CREATE TABLE
$ psql -U postgres -d test -c "insert into t1 values (1, 'hello')"
INSERT 0 1
$ psql -U postgres -d test -c "insert into t1 values (2, 'world')"
INSERT 0 1
$ sleep 60
$ psql -U postgres -c "select now()"
              now              
-------------------------------
 2023-04-14 12:54:23.160389-07
(1 row)

$ psql -U postgres -d test -c "insert into t1 values (3, 'foo')"
INSERT 0 1
$ psql -U postgres -d test -c "insert into t1 values (4, 'bar')"
INSERT 0 1
$ sleep 60
$ psql -U postgres -c "select now()"
              now              
-------------------------------
 2023-04-14 12:55:23.184277-07
(1 row)

As you can see, we’ve placed sleep calls in the middle of adding data in order to make the timestamps easier to manage. We also issue the command “select now()” to get the current timestamp from the database. Like before, finish up any remaining WAL files and create a recovery.signal dummy file.

$ psql -U postgres -c "select pg_switch_wal()"
$ pg_ctl -D data stop
$ touch data.backup/recovery.signal

This time, in the copy database’s postgresql.conf file we’re going to add:

port = 5433
archive_mode = off
restore_command = 'cp $HOME/pg/archive/%f %p'
recovery_target_time = '2023-04-14 12:54:23'

This recovery_target_time corresponds directly to the first timestamp we took after the first two pieces of data were inserted but before the second two pieces. Let’s start the copy server and see what it contains:

$ pg_ctl -D data.backup -l data.backup.log start
$ psql -U postgres -d test -c "select * from t1" -p 5433
 id |  val  
----+-------
  1 | hello
  2 | world
(2 rows)

We can see that only the first two pieces of data are present, as we expect. Now, let’s try restoring again 1 second after this. Edit the recovery_target_time as follows:

recovery_target_time = '2023-04-14 12:54:24'

Then recreate the recovery.signal dummy file as it is deleted upon successful recovery:

$ touch data.backup/recovery.signal

Finally, restart the copy server and select all the values in the table like before:

pg_ctl -D data.backup -l data.backup.log restart
psql -U postgres -d test -c "select * from t1" -p 5433
 id |  val  
----+-------
  1 | hello
  2 | world
  3 | foo
  4 | bar
(4 rows)

Depending on how fast or slow you entered the commands to send data to the server you might have to alter the recovery_target_time parameter to get a good result. If you go too far you might get the following error in your logfile:

FATAL:  recovery ended before configured recovery target was reached

This means that the time you want to recover to is too far in the future. Try setting the recovery time a few seconds back and then starting the server again.

Conclusion

There we have it! Point-in-time recovery using various methods to restore our database. In this article we covered first restoring the database to the latest timeline, then how to restore to a specific LSN and finally how to restore to a specific timestamp. Each of these methods has its specific use cases but knowing them all will undoubtedly help to make your databases easier to manage and more reliable.

References

The PostgreSQL Global Development Group. (2023, February 9). 26.3. continuous archiving and point-in-time recovery (pitr). PostgreSQL Documentation. https://www.postgresql.org/docs/current/continuous-archiving.html

RapidLoop. (2019, June 4). PostgreSQL incremental backup and point-in-time recovery. pgDash. https://pgdash.io/blog/postgres-incremental-backup-recovery.html