An Overview of Logical Replication in PostgreSQL

Enterprise PostgreSQL Solutions

Comments are off

An Overview of Logical Replication in PostgreSQL

Logical Replication appeared in Postgres10, it came along with number of keywords like ‘logical decoding’, ‘pglogical’, ‘wal2json’, ‘BDR’ etc. These words puzzle me so much so I decided to start the blog with explaining these terms and describing the relationships between them. I will also mention a new idea for a tool called ‘logical2sql’, it is designed for synchronization of data from Postgres to databases of different kinds and architectures.

Difference Between Physical And Logical Replication

Replication (Or Physical Replication) is a more ancient or traditional data synchronize approach. It plays an important role in the data synchronization of Postgres. It copies data in the WAL record by using exact block addresses and byte-by-byte replication. So it will be almost the same between master and slave and the ‘ctid’ is the same which identifies the location of a data row. ​ This blog will also briefly discuss logical replication, I have mentioned physical replication here just to differentiate physical and logical replication for the readers.

physical replication

Physical Replication Schematics

Logical Replication support table-level data synchronization, in princial it is very different from physical replication. The detailed analysis below will help in understanding this :

Logical Replication : When the WAL record is produced, the logical decoding module analyzes the record to a ‘ReorderBufferChange’ struct (you can deem it as HeapTupleData). The pgoutput plugin will then do a filter and data reorganization to the ReorderBufferChanges and send it to subscription.

The publication will get some ‘HeapTupleData’ and will remake insert, delete, update to the received data. Notice that Physical Replication will copy data from wal record to data pages while Logical Replication will execute an insert, update, delete again in the subscription.

Logical Replication Schematics

Concepts About Recognize Logical

A picture to known every things about logical

Logical Decoding is the key technology of all logical replication feature. The test_decoding plugin can be used to test logical decoding, it doesn’t do a whole lot but it can be used to develop other output plugin for logical decoding. The wal2json appeared as a contrib which has better output than test_decoding. BDR and pglogical are contrib modules who encompass the same feature as logical replication, BDR supports multi-master, DDL replication, etc. The pglogical almost has the same feature as built-in Logical Replication with different operational steps however we can use pglogical for a low version of Postgres while Logical Replication can support from pg10.

1. Key:logical decoding

logical decoding Schematics

The walsender process continue receiving wal record from replicate slot, the wal records will be filtered and sent to ReorderBufferChange (It is mainly oldtuple and newtuple for DML records). The ReorderBufferChange will be set into recordbuffer by their TransactionID, as one TransactionID is committed, the ReorderBufferChange in recordbuffer related to the transaction will be analyzed by the pointed plugin.

2. test_decoding Plugin

The test_decoding is an plugin of Postgres and it appears with logical decoding,it can turn the wal record to format that human-readable. We can use the plugin in two ways as below, you can check it in the document.

2.1 Used By SQL

postgres=# SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
    slot_name    |    lsn    
 regression_slot | 0/16569D8
(1 row)
postgres=# \d t1
                      Table "public.t1"
 Column |       Type        | Collation | Nullable | Default 
 i      | integer           |           |          | 
 j      | integer           |           |          | 
 k      | character varying |           |          | 
postgres=# insert into t1 values(1,1,'test_decoding use by SQL');
postgres=# SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
    lsn    | xid |                                                data                                                
 0/16569D8 | 492 | BEGIN 492
 0/16569D8 | 492 | table public.t1: INSERT: i[integer]:1 j[integer]:1 k[character varying]:'test_decoding use by SQL'
 0/1656A60 | 492 | COMMIT 492
(3 rows)

It’s Schematics

2.2 Used By pg_recvlogical

movead@home:~/ld $ pg_recvlogical -d postgres --slot=test2 --create-slot
movead@home:~/ld $ pg_recvlogical -d postgres --slot=test2 --start -f ld.out &
[1] 19807
movead@home:~/ld $ psql -c "insert into t1 values(2,2,'test_decoding use by pg_recvlogical');"
movead@home:~/ld $ vi ld.out
movead@home:~/ld $

Then you can see the output in file ld.out

table public.t1: INSERT: i[integer]:2 j[integer]:2 k[character varying]:'test_decoding use by pg_recvlogical'

It’s Schematics

3. Wal2json

wal2json plugin is an enhanced version of test_decoding, it’s output is relatively easier to use.

movead@home:~/ld $ pg_recvlogical -d postgres --slot test_slot_wal2json --create-slot -P wal2json
movead@home:~/ld $ pg_recvlogical -d postgres --slot test_slot_wal2json --start -o pretty-print=1 -f ld_wal2json.out
movead@home:~/ld $ vi ld_wal2json.out
movead@home:~/ld $

now can check the output in ld_wal2json.out file.

      "change": [
                      "kind": "insert",
                      "schema": "public",
                      "table": "t1",
                      "columnnames": ["i", "j", "k"],
                      "columntypes": ["integer", "integer", "character varying"],
                      "columnvalues": [1, 1, "wal2json use by pg_recvlogical"]


4. Logical Replication

Postgres user can use test_decoding to recognize the logical replication but can hardly use it without a secondary development or third-party contrib. The Logical Replication makes us use logical replicate with a build-in feature of Postgres. In other words, users cannot directly use the test_decoding function for production environments unless you use third-party plug-ins to synchronize data, while logical replication makes logical decoding easier to use in production environments.

Here is simple Logical Replication progress, I will show what Postgres do with every step.

4.1 Prepare Test Enviroment

//Publication IP43
create user user_logical43 replication encrypted password '123';
create table t1(i int, j int ,k varchar);
alter table t1 add primary key(i);
grant select on t1 to user_logical43;

//Subscription IP89
create table t1(i int,j int,k varchar);

4.2 Logical Replication Deploy

IP43 create publication

We will use pub43 later, others is for introduce.

create publication pub43 for table t1;
create publication pub43_1 for all tables;
create publication pub43_2 for all tables with (publish='insert,delete');

publication create will modify the catalog relation

postgres=# select * from pg_publication ;
oid | pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate
16420 | pub43   |       10 | f           | t         | t         | t         | t
16422 | pub43_1 |       10 | t           | t         | t         | t         | t
16423 | pub43_2 |       10 | t           | t         | f         | t         | f
(3 rows)
postgres=#  select * from pg_publication_rel ;
oid | prpubid | prrelid
16421 |   16420 |   16392
(1 row)
postgres=# select oid,relname from pg_class where relname ='t1' or relname ='t2';
oid | relname
16402 | t2
16392 | t1
(2 rows)

According to the test above we can see that the publication name will appear in catalog pg_publication, and if the publication is not for all tables then a mapping between publication and tables will appear in catalog pg_publication_rel. You can use publish option to point more detail publication information(insert,delete,update.truncate)

IP89 create subscription

create subscription sub89 connection 'host= port=5432 dbname=postgres user=user_logical43 password=123' publication pub43;

The catalog relations changed

//For IP89
postgres=# select * from pg_subscription;
oid | subdbid | subname | subowner | subenabled |                                 subconninfo                                   | subslotname | subsync
commit | subpublications
16394 |   13591 | sub89   |       10 | t         | host= port=5432 dbname=postgres user=user_logical43 password=123 | sub89       | off    
      | {pub43}
(1 row)

postgres=# select * from pg_subscription_rel;
srsubid | srrelid | srsubstate | srsublsn  
  16394 |   16384 | r         | 0/16A2700
(1 row)

postgres=# select oid,relname from pg_class where relname ='t1';
oid | relname
16384 | t1
(1 row)

//For IP43
postgres=# select * from pg_replication_slots ;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | ca
talog_xmin | restart_lsn | confirmed_flush_lsn
sub89     | pgoutput | logical   |  13594 | postgres | f         | t     |       8940 |     |  
      524 | 0/16A26C8   | 0/16A2700
(1 row)

Also, walsend and logical replication work processes are created for logical replication services in IP43 and IP89, respectively.

4.3 Check Result

Insert a row on IP43

lichuancheng@IP43:~ $ psql
psql (13devel)
Type "help" for help.

postgres=# insert into t1 values(1,1,'data from ip43');

Query in IP89

[lichuancheng@IP89 data]$ psql
psql (13devel)
Type "help" for help.

postgres=# select *from t1;
i | j |       k        
1 | 1 | data from ip43
(1 row)


Yes, the row is applied as below describe.

4.4 How it works

Now we have completed a simple demo of the function of logical replication. I believe you are not familiar with this process. This paper mainly explains the principle instead of going into many functional demonstrations. The principle of convective replication has been described in detail in the previous section Difference Between Physical And Logical Replication below is a miniature of this simple insert statement in Logical Replication Schematics.

Insert Logical Replication Schematics

5. BDR and Pglogical

There is only one version of BDR (Bi Directional Replication) in github which only supports Postgres9.4. ​ The pglogical and Logical Replication are built on the same principle. Pglogical can support pg 9.4 and onwards while Logical Replication is supported from pg10, pglogical can also handle conflict of data.

The New Concept: logical2sql

1. What And Why Logical2sql

The test_decoding or wal2json can get data from the database however we need a tool that can apply the data to other databases as-well, this is where logical2sql will come into play. The process of logical2sql is described in the diagram below :

Implementation Schematics

However it seems to have some problem with a delay caused by writing and reading between pg_recvlogical and tool used for forwarding the data. Look at the red box in the picture, the pg_recvlogic program writes data to the temp file and forward tool reads data from the temp file which results in the loss of data transmission efficiency. Secondly, it will cause a problem with the escape character. If we can combine the two tools, we can solve the problem by avoiding the loss of transmission efficiency of read and write data. At the same time because there is no intermediate file, there is no need to worry about the problem of escape characters. It is Logical2sqlⅠ.

2. Logical2sql Design

There was a need to develop a new plugin which can change RecordBufferChange to SQL to adapt to different databases. We can then build a new tool which will receive the SQL and forward to other databases instead of writing down to the temporary file.


3. Logical2sql Design Ⅱ

There was also a data transfer in the process of SQL from the new output plugin to the new tool. We can even abandon new tool to directly implement SQL forwarding in the plugin. In the logical2sql idea mapI we can see that the SQL is formed in the new output plugin and then sent to the new tool, the SQL forwarding is done by the new tool. SQL from the new output plugin to the new tool will also have some efficiency loss. We can finish the SQL forwarding process without any tools after the SQL is formed in the new output plugin. It is logical2sql Ⅱ.

logical2sql Ⅱ

I think it will be a good approach,Let’s try it?