Replication Failover with pg_rewind in PG12

Enterprise PostgreSQL Solutions

Comments are off

Replication Failover with pg_rewind in PG12

1. Overview

In the previous blog, we have discussed how to correctly set up streaming replication clusters between one master and one slave in Postgres version 12. In this blog, we will simulate a failover scenario on the master database, which causes the replica (or slave) database cluster to be promoted as new master and continue the operation. We will also simulate a failback scenario to reuse the old master cluster after the failover scenario with the help of pg_rewind.

Normally it is quite easy to do a failback to the old master after slave gets promoted to master but if there is data written to the old master after slave promotion, we will have an out-of-sync case between them both and we will have to use the pg_rewind tool to synchronize the two data directories in order to bring the old master to match the state of the new master. Please note that the pg_rewind tool will remove transactions from the old master in order to match up with the new, so certain pre-caution is needed to use this tool.

Here’s a brief overview of list of actions we are going to perform:

  • simulate failover by promoting slave cluster, so it becomes a new master
  • simulate data insertion to master cluster, also referred as old master after promotion
  • shutdown the old master cluster and set it up as a standby server
  • run pg_rewind on old master to synchronize transaction states with new master
  • bring up old master as a standby server to synchronize with the new master

This blog assumes you already have streaming replication setup between one master and one slave from previous blog. If you have not checked out the previous blog titled “Streaming Replication Setup in PG12 – How to Do it Right”, it is recommended to give that a read first.

The procedures illustrated in this blog is based on Postgres version 12 built from source running on Ubuntu 18.04

2. Simulate a Failover Case

We will simply promote the slave database cluster to simulate a failover.

$ pg_ctl promote -D db-slave

2019-10-30 11:10:16.951 PDT [16643] LOG:  received promote request
2019-10-30 11:10:16.951 PDT [16651] FATAL:  terminating walreceiver process due to administrator command
2019-10-30 11:10:16.966 PDT [16643] LOG:  redo done at 0/3003B60
2019-10-30 11:10:16.991 PDT [16643] LOG:  selected new timeline ID: 2
2019-10-30 11:10:17.030 PDT [16643] LOG:  archive recovery complete
2019-10-30 11:10:17.051 PDT [16642] LOG:  database system is ready to accept connections

As seen above, After slave gets promoted, it switches to a new timeline for future data operations. At this point the master and slave are no longer streaming WAL files from each other and we essentialyl have two independent database clusters running. We will call them old master and new master in the following sections instead so it is clear.

3. Insert Some Data to the Old Master

We would like to create a data out-of-sync case by inserting some more data to the old master cluster.

$ psql -d clusterdb -U cary -c "INSERT INTO test_table(x) SELECT y FROM generate_series(1, 100) a(y);" -p 5432
INSERT 0 100

Check that both the old master and new master are clearly out of sync:

## new master ##
$ psql -d clusterdb -U cary -c "SELECT count(*) from test_table" -p 5433
(1 row)

## old master ##
$ psql -d clusterdb -U cary -c "SELECT count(*) from test_table" -p 5432
(1 row)

4. Configure the Old Master as Standby Server to Sync with New Master

Now we would like to attempt a failover to make the old master as a standy server to syncrhonize with the new master.

Let’s shutdown the old master cluster.

$ pg_ctl -D db-master stop
waiting for server to shut down.... done
server stopped

Let’s update postgresql.conf in the old master:

############# db-master/postgresql.conf #############
recovery_target_timeline = 'latest'
archive_cleanup_command = 'pg_archivecleanup /home/caryh/streaming-replication/archivedir %r'
restore_command = 'cp /home/caryh/streaming-replication/archivedir/%f %p'
primary_slot_name = 'main'
primary_conninfo = 'user=cary passfile=''/home/caryh/.pgpass'' host= port=5433 sslmode=prefer sslcompression=0 gssencmode=disable target_session_attrs=any'

the primary_conninfo tells the old master to stream WAL files from the new master located at

Also, do not forget to touch the standby.signal file to tell the cluster to run in standby mode:

touch db-master/standby.signal

We specified in the old master to connect to primary_slot_name = main. Let’s create the matching replication slot on the new master.

$ psql -d clusterdb -U cary -c "select * from pg_create_physical_replication_slot('main');" -p 5433
 slot_name | lsn 
 main      | 
(1 row)

$ psql -d clusterdb -U cary -c "select * from pg_replication_slots;" -p 5433 -x
-[ RECORD 1 ]-------+---------
slot_name           | main
plugin              | 
slot_type           | physical
datoid              | 
database            | 
temporary           | f
active              | f
active_pid          | 
xmin                | 
catalog_xmin        | 
restart_lsn         | 
confirmed_flush_lsn | 

Now the new master has a matching replication slot called main and it is not active at the moment. Now we are ready to start the old master as standby server

5. Start the Old Master as Standby Server

Now, we are ready to start the old master as a standby:

$ pg_ctl -D db-master start

2019-10-30 11:30:04.071 PDT [1610] HINT:  Future log output will go to log destination "syslog".
2019-10-30 11:30:04.075 PDT [1611] LOG:  database system was shut down at 2019-10-30 11:29:13 PDT
2019-10-30 11:30:04.079 PDT [1611] LOG:  restored log file "00000002.history" from archive
2019-10-30 11:30:04.082 PDT [1611] LOG:  entering standby mode
2019-10-30 11:30:04.084 PDT [1611] LOG:  restored log file "00000002.history" from archive
2019-10-30 11:30:04.095 PDT [1611] FATAL:  requested timeline 2 is not a child of this server's history
2019-10-30 11:30:04.095 PDT [1611] DETAIL:  Latest checkpoint is at 0/4000028 on timeline 1, but in the history of the requested timeline, the server forked off from that timeline at 0/3003B98.
2019-10-30 11:30:04.096 PDT [1610] LOG:  startup process (PID 1611) exited with exit code 1
2019-10-30 11:30:04.096 PDT [1610] LOG:  aborting startup due to startup process failure
2019-10-30 11:30:04.098 PDT [1610] LOG:  database system is shut down

As you can see above, the old master refuses to start because there is a timeline difference between the old master and the new master. This is caused by the additional data insertions that happens to the old master after the promotion event in step number 3.

This is where pg_rewind comes handy in situation like this, to synchronize the two clusters.

6. Use pg_rewind to Synchronize the two Clusters

Now, let’s synchronize the two database clusters with pg_rewind.

$ pg_rewind --target-pgdata=db-master --source-server="port=5433 user=cary dbname=clusterdb" --progress
pg_rewind: connected to server
pg_rewind: servers diverged at WAL location 0/3003E58 on timeline 1
pg_rewind: rewinding from last common checkpoint at 0/2000060 on timeline 1
pg_rewind: reading source file list
pg_rewind: reading target file list
pg_rewind: reading WAL in target
pg_rewind: need to copy 53 MB (total source directory size is 78 MB)
54363/54363 kB (100%) copied
pg_rewind: creating backup label and updating control file
pg_rewind: syncing target data directory
pg_rewind: Done!

After pg_rewind is finished, we will have to edit once more the configuration of the old master because the tool copies most of the configuration settings from the new master to the old master as a synchronization process. Let’s examine both db-master/ and db-master/postgresql.conf and make sure of the followings again.

############# db-master/postgresql.conf #############
primary_slot_name = 'main'
recovery_target_timeline = 'latest'
port = 5432

############# db-master/ #############
primary_conninfo = 'user=cary passfile=''/home/caryh/.pgpass'' host= port=5433 sslmode=disable sslcompression=0 gssencmode=disable target_session_attrs=any'

and also, don’t forget about this:

touch db-master/standby.signal

Now, we should be ready to start the old master again.

7. Start the Old Master Agian as Standby Server

$ pg_ctl -D db-master start

2019-10-30 12:27:28.140 PDT [5095] LOG:  restored log file "000000010000000000000002" from archive
2019-10-30 12:27:28.167 PDT [5095] LOG:  redo starts at 0/2000028
2019-10-30 12:27:28.182 PDT [5095] LOG:  consistent recovery state reached at 0/3027258
2019-10-30 12:27:28.183 PDT [5095] LOG:  invalid record length at 0/3027258: wanted 24, got 0
2019-10-30 12:27:28.183 PDT [5095] LOG:  redo done at 0/3027230
2019-10-30 12:27:28.183 PDT [5095] LOG:  last completed transaction was at log time 2019-10-30 12:20:34.056723-07
019-10-30 12:27:28.226 PDT [5094] LOG:  database system is ready to accept connections

The old master can now start as a streaming replication to the new master and we can observe that after using pg_rewind the additional data that was inserted to old master in step number 3 is now removed, as it has been rewound from 300 entries to 200 entries to match up with the new master.

## new master ##
$ psql -d clusterdb -U cary -c "SELECT count(*) from test_table" -p 5433
(1 row)

## old master ##
$ psql -d clusterdb -U cary -c "SELECT count(*) from test_table" -p 5432
(1 row)

8. Summary

In this blog, we have simulated a failover case and observe the effect of promoting a standby slave server while more data insertions happening to the original master server. We have demonstrated how to use pg_rewind tool to synchronize both master and slave after the slave promotion. Though it results some data deletion at the original master, in the end, we are able to resolve the timeline conflict with pg_rewind and complete the database failover scenario.