While studying the WAL structure, I came across a keyword named “origin” in the WAL log that I found intriguing. There is not much details on PG official documentation on this keyword in the WAL record and its functions. Hence I set upon the research in the code to find more about this secret data, the purpose of this blog is to share my research about Origin and its purpose in the WAL log.
What is Origin
Based on my understanding from the codebase, the purpose of the “origin” is to stored the source of the tuple. The diagram given below show the data being fed into the PG0 database from four different sources. These are the logical replication subscriptions from PG1 and PG2 database and the data coming from the two applications App1 and App2. The WAL record of the PG0 database needs to store the source of each tuple hence the purpose of strong the origin. This is explained further in the blog…

How to create Origin
First of all, let’s see the catalog that manages or store Origin in PostgreSQL.
postgres=# \d pg_replication_origin Table "pg_catalog.pg_replication_origin" Column | Type | Collation | Nullable | Default ---------+------+-----------+----------+--------- roident | oid | | not null | roname | text | C | not null | Indexes: "pg_replication_origin_roiident_index" UNIQUE, btree (roident), tablespace "pg_global" "pg_replication_origin_roname_index" UNIQUE, btree (roname), tablespace "pg_global" Tablespace: "pg_global" postgres=#
‘roident’ is the identifier assigned to Origin when it is created.
‘roname’ is the name of this Origin.
There are currently two ways to create Origin in a PostgreSQL database:
Subscribing to a remote publication can create a origin locally.
--publication postgres=# \d List of relations Schema | Name | Type | Owner --------+------+-------+-------- public | t1 | table | movead (1 row) postgres=# create publication pub1 for all tables ; CREATE PUBLICATION --subscription postgres=# create subscription sub1 connection 'host=xxxxxxxx port=5432 dbname=postgres user=movead' publication pub1; NOTICE: created replication slot "sub1" on publisher CREATE SUBSCRIPTION postgres=#
Using the pg_replication_origin_create () function can manually create a origin
postgres=# select pg_replication_origin_create('test_origin'); pg_replication_origin_create ------------------------------ 2 (1 row)
Take a look at the Origin we created
postgres=# select * from pg_replication_origin; roident | roname ---------+------------- 1 | pg_16389 2 | test_origin (2 rows) postgres=#
‘pg_16389’ is a origin created by subscription (16389 is the subscription Oid), and ‘test_origin’ is a manually created origin. Here you can also see that ‘roident’ is not a database Oid, it is just an ordinary value.
Use the origin
In the case of an origin generated by a subscription, what you need to do only is to insert some data on the publication side then you will get wal record with an origin marked. To use a manually created origin, you should call the pg_replication_origin_session_setup () API to bind the session to the origin.
1. manually created origin
postgres=# select pg_replication_origin_session_setup('test_origin'); pg_replication_origin_session_setup ------------------------------------- (1 row) postgres=# insert into t1 values(100);select pg_current_wal_lsn(); INSERT 0 1 pg_current_wal_lsn -------------------- 0/4000230 (1 row) postgres=#
2. subscription origin
postgres=# insert into t1 values(200);select pg_current_wal_lsn(); INSERT 0 1 pg_current_wal_lsn -------------------- 0/156BD18 (1 row) postgres=#
3. Check in wal
movead@Movead:/h2/pg_stream_re/bin$ ./pg_waldump ../data_logical/pg_wal/000000010000000000000003 rmgr: Heap len (rec/tot): 62/ 62, tx: 519, lsn: 0/03000028, prev 0/02009BE0, desc: INSERT+INIT off 1 flags 0x08, blkref #0: rel 1663/12664/16394 blk 0 rmgr: Transaction len (rec/tot): 65/ 65, tx: 519, lsn: 0/03000068, prev 0/03000028, desc: COMMIT 2020-04-16 17:09:01.989257 CST; origin: node 1, lsn 0/0, at 2020-04-16 17:05:49.767511 CST rmgr: Standby len (rec/tot): 54/ 54, tx: 0, lsn: 0/030000B0, prev 0/03000068, desc: RUNNING_XACTS nextXid 520 latestCompletedXid 518 oldestRunningXid 519; 1 xacts: 519 rmgr: Heap len (rec/tot): 62/ 62, tx: 520, lsn: 0/030000E8, prev 0/030000B0, desc: INSERT off 2 flags 0x08, blkref #0: rel 1663/12664/16394 blk 0 rmgr: Transaction len (rec/tot): 65/ 65, tx: 520, lsn: 0/03000128, prev 0/030000E8, desc: COMMIT 2020-04-16 17:09:09.327941 CST; origin: node 2, lsn 0/156BB28, at 2020-04-16 17:09:09.268948 CST rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/03000170, prev 0/03000128, desc: RUNNING_XACTS nextXid 521 latestCompletedXid 520 oldestRunningXid 521 rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/030001A8, prev 0/03000170, desc: RUNNING_XACTS nextXid 521 latestCompletedXid 520 oldestRunningXid 521 rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 0/030001E0, prev 0/030001A8, desc: CHECKPOINT_ONLINE redo 0/30001A8; tli 1; prev tli 1; fpw true; xid 0:521; oid 24576; multi 1; offset 0; oldest xid 478 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 521; online rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/03000258, prev 0/030001E0, desc: RUNNING_XACTS nextXid 521 latestCompletedXid 520 oldestRunningXid 521 rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 0/03000290, prev 0/03000258, desc: SWITCH
rmgr: Transaction len (rec/tot): 65/ 65, tx: 519, lsn: 0/03000068, prev 0/03000028, desc: COMMIT 2020-04-16 17:09:01.989257 CST; origin: node 1, lsn 0/0, at 2020-04-16 17:05:49.767511 CST
In this wal parsing record, ‘origin: node 1, lsn 0/0, at 2020-04-16 17: 05: 49.767511 CST’ is the origin record, which indicates that tuples in this transaction comes from node 1(The origin which roident is 1).
rmgr: Transaction len (rec/tot): 65/ 65, tx: 520, lsn: 0/03000128, prev 0/030000E8, desc: COMMIT 2020-04-16 17:09:09.327941 CST; origin: node 2, lsn 0/156BB28, at 2020-04-16 17:09:09.268948 CST
In this wal parsing record, ‘node 2, lsn 0 / 156BB28, at 2020-04-16 17: 09: 09.268948 CST’ is the origin record, which indicates that tuples in this transaction comes from node 2(The origin which roident is 2).
So far, it proves that there is a mark for the origin in the wal. Can we know the origin of a record without parsing the wal? I investigated the source code and found that the origin id was saved in commit_ts, but no query API was provided. If necessary, a query API can be opened easily.
Origin’s use in the logical decoding
Origin is designed for logical decoding to filter unwanted data in the logical analyse plugin. Currently, ‘test_decoding’ and ‘pgoutput’ parsing plugins are built into the PostgreSQL kernel, and ‘test_decoding’ realizes the use of Origin. The following are the two external interfaces used for test_decoding
postgres=# select pg_logical_slot_get_changes('regression_slot', NULL,NULL,'only-local','true');
movead@Movead:/h2/pg_stream_re/bin$ ./pg_recvlogical -d postgres --option='only-local=true' --slot=test --start -f -
You can use the ‘only-local’ option to define the result set parsed by the test_decoding plugin.
‘pgoutput’ is plug-in for logical replication, but pgoutput does not support the analysis of Origin data, so logical replication can’t distinguishing data origin. I think we can enhance pgoutput to make a magic effect of logical replication.
As shown in the figure, the two subscriptions from PG1 and PG2 are two origins. App1 and App2 can specify an origin for themselves through a manual origin. After ‘pgoutput’ has been enhanced, when PG3 and PG4 subscript the same thing from PG0, PG3 can receive data from PG1 and APP2 and PG4 can receive data from PG2 and APP1.

Origin life cycle
Looking at the picture above, suppose there is a piece of data that is logically copied from PG1 to PG0, and logically copied to PG3. Does this data in PG3 still remember that it came from PG1? The answer is no. It can not record multiple Origins, but I found this is a todo item in the code. Assuming that we have completed this function, the sytem can still know where it came from after multiple logical copy transfers, and it will log it’s footprint all the time.
Summary
Origin is a very magical thing, but fewer people notice it, maybe it will play a big role in the future.

Movead.Li is kernel development of Highgo Software. Since joining into Highgo Software in 2016, Movead takes the most time on researching the code of Postgres and is good at ‘Write Ahead Log’ and ‘Database Backup And Recovery’. Base on the experience Movead has two open-source software on the Postgres database. One is Walminer which can analyze history wal file to SQL. The other one is pg_lightool which can do a single table or block recovery base on base backup and walfiles or walfiles only.
Hello
Now he has joined the HighGo community team and hopes to make more contributions to the community in the future.
Recent Comments