The Origin in PostgreSQL logical decoding

Enterprise PostgreSQL Solutions

Comments are off

The Origin in PostgreSQL logical decoding

While studying the WAL structure, I came across a keyword named “origin” in the WAL log that I found intriguing. There is not much details on PG official documentation on this keyword in the WAL record and its functions. Hence I set upon the research in the code to find more about this secret data, the purpose of this blog is to share my research about Origin and its purpose in the WAL log.

What is Origin

Based on my understanding from the codebase, the purpose of the “origin” is to stored the source of the tuple. The diagram given below show the data being fed into the PG0 database from four different sources. These are the logical replication subscriptions from PG1 and PG2 database and the data coming from the two applications App1 and App2. The WAL record of the PG0 database needs to store the source of each tuple hence the purpose of strong the origin. This is explained further in the blog…

How to create Origin

First of all, let’s see the catalog that manages or store Origin in PostgreSQL.

postgres=#  \d pg_replication_origin
    Table "pg_catalog.pg_replication_origin"
 Column  | Type | Collation | Nullable | Default 
---------+------+-----------+----------+---------
 roident | oid  |           | not null | 
 roname  | text | C         | not null | 
Indexes:
    "pg_replication_origin_roiident_index" UNIQUE, btree (roident), tablespace "pg_global"
    "pg_replication_origin_roname_index" UNIQUE, btree (roname), tablespace "pg_global"
Tablespace: "pg_global"

postgres=#

‘roident’ is the identifier assigned to Origin when it is created.

‘roname’ is the name of this Origin.

There are currently two ways to create Origin in a PostgreSQL database:

Subscribing to a remote publication can create a origin locally.

--publication
postgres=# \d
       List of relations
 Schema | Name | Type  | Owner  
--------+------+-------+--------
 public | t1   | table | movead
(1 row)

postgres=# create publication pub1 for all tables ;
CREATE PUBLICATION

--subscription
postgres=# create subscription sub1 connection 'host=xxxxxxxx port=5432 dbname=postgres user=movead' publication pub1;
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
postgres=#

Using the pg_replication_origin_create () function can manually create a origin

postgres=# select pg_replication_origin_create('test_origin');
 pg_replication_origin_create 
------------------------------
                            2
(1 row)

Take a look at the Origin we created

postgres=# select * from pg_replication_origin;
 roident |   roname    
---------+-------------
       1 | pg_16389
       2 | test_origin
(2 rows)

postgres=#

‘pg_16389’ is a origin created by subscription (16389 is the subscription Oid), and ‘test_origin’ is a manually created origin. Here you can also see that ‘roident’ is not a database Oid, it is just an ordinary value.

Use the origin

In the case of an origin generated by a subscription, what you need to do only is to insert some data on the publication side then you will get wal record with an origin marked. To use a manually created origin, you should call the pg_replication_origin_session_setup () API to bind the session to the origin.

1. manually created origin

postgres=# select pg_replication_origin_session_setup('test_origin');
 pg_replication_origin_session_setup 
-------------------------------------

(1 row)
postgres=# insert into t1 values(100);select pg_current_wal_lsn();
INSERT 0 1
 pg_current_wal_lsn 
--------------------
 0/4000230
(1 row)

postgres=#

2. subscription origin

postgres=# insert into t1 values(200);select pg_current_wal_lsn();
INSERT 0 1
 pg_current_wal_lsn 
--------------------
 0/156BD18
(1 row)

postgres=# 

3. Check in wal

movead@Movead:/h2/pg_stream_re/bin$ ./pg_waldump ../data_logical/pg_wal/000000010000000000000003 
rmgr: Heap        len (rec/tot):     62/    62, tx:        519, lsn: 0/03000028, prev 0/02009BE0, desc: INSERT+INIT off 1 flags 0x08, blkref #0: rel 1663/12664/16394 blk 0
rmgr: Transaction len (rec/tot):     65/    65, tx:        519, lsn: 0/03000068, prev 0/03000028, desc: COMMIT 2020-04-16 17:09:01.989257 CST; origin: node 1, lsn 0/0, at 2020-04-16 17:05:49.767511 CST
rmgr: Standby     len (rec/tot):     54/    54, tx:          0, lsn: 0/030000B0, prev 0/03000068, desc: RUNNING_XACTS nextXid 520 latestCompletedXid 518 oldestRunningXid 519; 1 xacts: 519
rmgr: Heap        len (rec/tot):     62/    62, tx:        520, lsn: 0/030000E8, prev 0/030000B0, desc: INSERT off 2 flags 0x08, blkref #0: rel 1663/12664/16394 blk 0
rmgr: Transaction len (rec/tot):     65/    65, tx:        520, lsn: 0/03000128, prev 0/030000E8, desc: COMMIT 2020-04-16 17:09:09.327941 CST; origin: node 2, lsn 0/156BB28, at 2020-04-16 17:09:09.268948 CST
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/03000170, prev 0/03000128, desc: RUNNING_XACTS nextXid 521 latestCompletedXid 520 oldestRunningXid 521
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/030001A8, prev 0/03000170, desc: RUNNING_XACTS nextXid 521 latestCompletedXid 520 oldestRunningXid 521
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/030001E0, prev 0/030001A8, desc: CHECKPOINT_ONLINE redo 0/30001A8; tli 1; prev tli 1; fpw true; xid 0:521; oid 24576; multi 1; offset 0; oldest xid 478 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 521; online
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/03000258, prev 0/030001E0, desc: RUNNING_XACTS nextXid 521 latestCompletedXid 520 oldestRunningXid 521
rmgr: XLOG        len (rec/tot):     24/    24, tx:          0, lsn: 0/03000290, prev 0/03000258, desc: SWITCH 
rmgr: Transaction len (rec/tot):     65/    65, tx:        519, lsn: 0/03000068, prev 0/03000028, desc: COMMIT 2020-04-16 17:09:01.989257 CST; origin: node 1, lsn 0/0, at 2020-04-16 17:05:49.767511 CST

In this wal parsing record, ‘origin: node 1, lsn 0/0, at 2020-04-16 17: 05: 49.767511 CST’ is the origin record, which indicates that tuples in this transaction comes from node 1(The origin which roident is 1).

rmgr: Transaction len (rec/tot):     65/    65, tx:        520, lsn: 0/03000128, prev 0/030000E8, desc: COMMIT 2020-04-16 17:09:09.327941 CST; origin: node 2, lsn 0/156BB28, at 2020-04-16 17:09:09.268948 CST

In this wal parsing record, ‘node 2, lsn 0 / 156BB28, at 2020-04-16 17: 09: 09.268948 CST’ is the origin record, which indicates that tuples in this transaction comes from node 2(The origin which roident is 2).

So far, it proves that there is a mark for the origin in the wal. Can we know the origin of a record without parsing the wal? I investigated the source code and found that the origin id was saved in commit_ts, but no query API was provided. If necessary, a query API can be opened easily.

Origin’s use in the logical decoding

Origin is designed for logical decoding to filter unwanted data in the logical analyse plugin. Currently, ‘test_decoding’ and ‘pgoutput’ parsing plugins are built into the PostgreSQL kernel, and ‘test_decoding’ realizes the use of Origin. The following are the two external interfaces used for test_decoding

postgres=# select pg_logical_slot_get_changes('regression_slot', NULL,NULL,'only-local','true');
movead@Movead:/h2/pg_stream_re/bin$ ./pg_recvlogical -d postgres --option='only-local=true' --slot=test --start -f -

You can use the ‘only-local’ option to define the result set parsed by the test_decoding plugin.

‘pgoutput’ is plug-in for logical replication, but pgoutput does not support the analysis of Origin data, so logical replication can’t distinguishing data origin. I think we can enhance pgoutput to make a magic effect of logical replication.

As shown in the figure, the two subscriptions from PG1 and PG2 are two origins. App1 and App2 can specify an origin for themselves through a manual origin. After ‘pgoutput’ has been enhanced, when PG3 and PG4 subscript the same thing from PG0, PG3 can receive data from PG1 and APP2 and PG4 can receive data from PG2 and APP1.

Origin life cycle

Looking at the picture above, suppose there is a piece of data that is logically copied from PG1 to PG0, and logically copied to PG3. Does this data in PG3 still remember that it came from PG1? The answer is no. It can not record multiple Origins, but I found this is a todo item in the code. Assuming that we have completed this function, the sytem can still know where it came from after multiple logical copy transfers, and it will log it’s footprint all the time.

Summary

Origin is a very magical thing, but fewer people notice it, maybe it will play a big role in the future.