In this blog, we’ll be going over some more advanced topics new in Postgres 16. Having some experience with Linux, Postgres, and SQL is necessary as we’ll not only be going over these new features but also how to implement them. This blog was written using PostgreSQL 16 (Development Version) running on Ubuntu 23.04. First I’ll go over some background and a brief introduction to what Bi-Directional Replication is, and why it’s important, then finish off with how we implement Bi-Directional Logical Replication.
Before we can start learning about Bi-Directional Logical Replication we first have to understand Logical Replication.
Basics of Logical Replication
Logical replication has been supported since PostgreSQL 10 and has received numerous updates and improvements in the following years. Logical Replication is the process of copying (ie. replicating) data objects represented as their changes. This way we can copy only specific changes of objects like tables rather than whole databases, and stream these changes across different platforms and versions. This is all in contrast to Physical replication which uses exact block addresses and as a result, is limited to only copying entire databases and cannot stream across platforms or versions since the data must match in both.
Logical replication also introduces two very important elements necessary for understanding its Bi-Directional counterpart. These are Publishers and Subscribers, you can think of them as a leader node (Publisher) and a follower node (Subscriber). The Publisher will gather up its recent changes and send them as an ordered list of commands to the Subscriber. Once received the Subscriber takes this series of commands and applies it to its data. If both databases started with the same data, then the Subscriber will be up-to-date with the Publisher.
Now that we understand what Logical Replication is, what is Bi-Directional Replication doing differently? In short, Bi-Directional Logical Replication is when all nodes in the replication are both Publisher and Subscriber. Each database can now handle read and write requests, and all the changes will be streamed to one another. This is the Bi-Directional aspect, as rather than changes flowing in one direction as before, they flow in both directions.
What Postgres 16 adds is a new parameter to the WITH statement that filters out replication from certain nodes. Bi-Directional Logical Replication uses this parameter WITH(ORIGIN = NONE), this filters out all replication from connections with origins that are not NONE. Essentially, this only allows newly added data to be replicated, you can probably see why this is the case. If one database inserts new data and replicates it to a second, this second database will replicate the data also inserting it thus triggering another replication to the original database. We quickly get an infinite loop of replication, which is why this option is necessary to keep everything finite.
The main benefit of Bi-Direction Logical Replication is that it allows more availability for both read and write requests since we have two Primary nodes. This can be extremely beneficial for a wide range of applications where writing is especially needed.
Bi-Directional Logical Replication requires a few preconditions to operate correctly, as such many of its drawbacks are from these specific conditions. For example, when setting up replication the tables in each database must follow the same schema. Same name and columns, otherwise the Subscriber will not be able to find the table. Until Logical Replication can support replication of the Data Definition Language (DDL) used to create the tables, the user must do this manually to ensure consistency.
Now that we understand the basics of Bi-Directional Logical Replication, we can look into how we implement it between two databases. The beginning will be quite similar to setting up regular Logical Replication, but with a very important difference when we are creating the Publishers and Subscribers.
First, we will create the two primary databases which will follow each other:
$ initdb -D database1
$ initdb -D database2
In each database’s postgres.conf file set each’s way_level to logical and give each one a unique port number:
port = 5432
wal_level = logical
port = 5433
wal_level = logical
Start both databases:
pg_ctl -D database1 -l database1.log start
pg_ctl -D database2 -l database2.log start
Create the Publishers for each Database:
# CREATE PUBLICATION mypub1 FOR TABLE mytable;
# CREATE PUBLICATION mypub2 FOR TABLE mytable;
Create the Subscribers for each Database:
# CREATE SUBSCRIPTION mysub1 CONNECTION 'host=127.0.0.1 port=5433 user=postgres dbname=postgres' PUBLICATION mypub2 WITH(ORIGIN = NONE);
# CREATE SUBSCRIPTION mysub2 CONNECTION 'host=127.0.0.1 port=5432 user=postgres dbname=postgres' PUBLICATION mypub1 WITH(ORIGIN = NONE);
Note the order we created the publishers and subscribers, it is very important to first create the Publishers and then the Subscribers. You can refer to Figure 2 if you want a more visual representation, the number in the corner of each component denotes the order of their creation.
Now, when any data is inserted into either database, it should be replicated across both nodes.
In this blog, we went over the new Bi-Directional Logical Replication feature in PostgreSQL 16. To start, we went over a brief background on Logical Replication and the Publisher/Subscriber model used for synchronizing data. We then went over how Bi-Directional Logical Replication works and the new parameter that allows it to function without triggering infinite replication loops. Finally, we looked at how to set up Bi-Directional Replication with two primary PostgreSQL databases. With support for synchronization between primary nodes, increasing availability and data persistence should be a breeze for any of your database applications.
- C, Vigneshwaran. Bi-Directional Replication Using Origin Filtering in PostgreSQL, Fujitsu, 31 Aug. 2023, www.postgresql.fastware.com/blog/bi-directional-replication-using-origin-filtering-in-postgresql.
Tristen received his Bachelor of Applied Science in Computer Engineering from the University of British Columbia in May 2023. He joined HighGo Software Inc. as a Software Engineer fresh out of university and is very excited for his engineering journey to begin. His main interests include Machine Learning, Embedded Systems, and Database Management Systems. With experience in C/C++ and advanced relational databases, Tristen hopes to contribute significantly to the PostgreSQL community as he continues to learn and grow his expertise.