A Deeper Look Inside PostgreSQL Visibility Check Mechanism

Enterprise PostgreSQL Solutions

Leave a Comment

A Deeper Look Inside PostgreSQL Visibility Check Mechanism

What is Visibility?

Simply put, the visibility refers to whether a row of data (Heap Tuple by default) should be displayed to the user in certain states, backend processes, or transactions.

For example,

  • A user changes a data record from ‘A’ to ‘B’ via UPDATE
  • PostgreSQL handles this by INSERTing a new record call ‘B’ while marking record ‘A’ as ‘invisible’
  • Both records exist in the system, but ‘B’ is visible, while ‘A’ is not.
  • A deleted or invisible record is also referred as a ‘dead’ tuple
  • One of the responsibilities of VACUUM process is to clear these ‘dead’ tuples to free up space.

MVCC (Multi-Version Concurrency Control) is a method in which each write operation creates a “new version” of the data while retaining the “old version”. This allows concurrent read and write operations without blocking each other. PostgreSQL uses a variant of MVCC, also called Snapshot Isolation to isolate concurrent transactions.

So, it is possible that a single piece of data could have multiple “versions” of it, and it is PostgreSQL’s responsibility to determine which ‘version’ shall be presented to the user based on multiple factors. This act is also known as the “visibility check” or “visibility control”

In this blog, we will dive into PostgreSQL’s visibility check mechanism to understand how it works.

Visibility Indicators

To my knowledge today, PostgreSQL determines a data’s visibility based on the following indicators:

  • Transaction ID
  • xmin
  • xmax
  • cid
  • transaction snapshot
  • CLOG (Commit Log)
  • hintbit

what exactly are these? Let’s find out.

transaction ID

This should be self-explanatory. All transactions in PostgreSQL are associated with a ID number called transaction ID. This is mainly used to check if a data in question is inserted or deleted in the current transaction. We will explore more on this in the later part of the blog.

xmin, xmax, cid and hintbit

These indicators are categorized together because they are all stored in each heap data tuple’s header and each data tuple would have different values, resulting in different visibility check results. The image below illustrates where they are stored on the tuple header:

xmin:

  • this is the transaction ID that inserts this heap data tuple

xmax:

  • this is the transaction ID that removes this heap data tuple
  • if a tuple header has a valid, committed xmax value, this tuple is normally said to be ‘dead’ or ‘invisible’

cid:

  • this is the command ID that this data tuple is inserted within a transaction
  • mostly used to determine visibility within the current transaction
  • if a transaction contains SELECTs + other DML queries (INSERT, UPDATE and DELETE), then cid can be used to determine if a data tuple is inserted or removed before or after a SELECT.

hintbit:

  • this is a flag field that stores the query results from CLOG so PostgreSQL does not have to look up CLOG all the time. CLOG stands for Commit Log and is explained below
  • if hintbit has HEAP_XMIN_COMMITTED set, this means that the transaction ID that inserts this tuple (a.k.a xmin) is committed
  • if PostgreSQL can get this information directly from hintbit, then it can save some time looking up from the CLOG, resulting in some performance gain.
  • possible hintbit flag values are:

CLOG (Commit Log)

CLOG is a data structure stored in shared memory and persisted on $PGDATA/pg_xact directory. The purpose of CLOG is to record the commit status of transactions. Status can be one of:

  • COMMITTED
  • IN_PROGRESS
  • ABORTED
  • SUB_COMMITTED

For example:

If a tuple’s xmin is 103, and CLOG says 103 is ABORTED (image below), this means 103 is not visible because the transaction that inserts this data tuple is ABORTED. Then, for the sake of performance, PostgreSQL will mark hintbit as HEAP_XMIN_INVALID, so it would not look up from the CLOG next time we visit the same data tuple again.

Transaction Snapshot

A transaction snapshot is another data structure that stores active transaction information at a certain point in time. It is mostly used to isolate the data tuples in concurrent transactions. A data tuple could be inserted and committed by another transaction but the transaction snapshot I have could be from the past. Then in this case, the data tuple is still invisible to me according to my snapshot in the past.

A transaction snapshot can be represented by several transaction numbers:

[xmin, xmax, xip_list]
  • xmin: minimum Transaction ID still active – anything less than this = transaction committed or rolled back.
  • xmax: maximum Transaction ID that has been committed or rolled back + 1 – greater than or equal to this value = the transaction has not been committed or rolled back.
  • xip_list: list of ongoing transaction numbers – should be greater than xmin and less than xmax

Please note that the definitions of xmin and xmax in transaction snapshot are different from those stored in the tuple header above.

Transaction snapshot is better explained with examples.

Say we have 3 concurrent backends A, B and C starting transactions 747, 748 and 749, each inserting a data record ‘A’, ‘B’ and ‘C’.

postgres=# BEGIN;
Postgres*# INSERT INTO mytable VALUES('A');
Postgres*# SELECT txid_current();
 txid_current
--------------
          747
(1 row)

postgres=# BEGIN;
postgres=# INSERT INTO mytable VALUES('B');
postgres=# SELECT txid_current();
 txid_current
--------------
          748
(1 row)
postgres=# BEGIN;
Postgres*# INSERT INTO mytable VALUES('C');
Postgres*# SELECT txid_current();
 txid_current
--------------
          749
(1 row)

Then, we commit transaction 749 and obtain a current transaction snapshot via “pg_current_snapshot()”

postgres=# BEGIN;
Postgres*# INSERT INTO mytable VALUES('A');
Postgres*# SELECT txid_current();
 txid_current
--------------
          747
(1 row)

postgres=# BEGIN;
postgres=# INSERT INTO mytable VALUES('B');
postgres=# SELECT txid_current();
 txid_current
--------------
          748
(1 row)
postgres=# BEGIN;
Postgres*# INSERT INTO mytable VALUES('C');
Postgres*# SELECT txid_current();
 txid_current
--------------
          749
(1 row)

Postgres*# COMMIT;

postgres=# select pg_current_snapshot();
 pg_current_snapshot
---------------------
 747:750:747,748
(1 row)

In the perspective of backend C, the transaction snapshot shows:

  • 747 is the minimum transaction ID still active
  • 750 is the maximum transaction ID that has been committed or aborted + 1 (749 + 1)
  • 747 and 748 are still on-going transactions.

What does it mean though?

  • C is able to see the data tuples inserted by transaction IDs less than 750 except except for 747 and 748 because they are still in progress (not committed or rollbacked)
  • C is not able to see data tuples inserted in the future ( larger than 750)
  • C is able to see the data tuple inserted by itself in transaction 749 because it has just committed it.

we can also view the transaction snapshots of A and B while in transaction

postgres=# BEGIN;
Postgres*# INSERT INTO mytable VALUES('A');
Postgres*# SELECT txid_current();
 txid_current
--------------
          747
(1 row)



Postgres*# select pg_current_snapshot();
 pg_current_snapshot
---------------------
 747:750:748
(1 row)

postgres=# BEGIN;
postgres=# INSERT INTO mytable VALUES('B');
postgres=# SELECT txid_current();
 txid_current
--------------
          748
(1 row)



postgres=*# select pg_current_snapshot();
 pg_current_snapshot
---------------------
 747:750:747
(1 row)

postgres=# BEGIN;
Postgres*# INSERT INTO mytable VALUES('C');
Postgres*# SELECT txid_current();
 txid_current
--------------
          749
(1 row)

Postgres*# COMMIT;

postgres=# select pg_current_snapshot();
 pg_current_snapshot
---------------------
 747:750:747,748
(1 row)

As you can see that A’s transaction snapshot indicates that B (748) is in progress, so it cannot see the data tuples inserted by B(748) yet. 747 is the current transaction it is in and is not marked in snapshot anywhere, so it can see the data inserted by itself within the current transaction. B’s transaction snapshot indicates the exact opposite.

Determine Visibility

The visibility indicators mentioned above are just in pieces; this means that we cannot determine the true visibility of a data tuple purely based one one or a few of them. They all have to be considered to compute the correct visibility.

Let’s summarize where things are to determine visibility:

  • xmin, xmax, cid and hintbit are stored in the tuple header
  • snapshot, transaction id are provided by the transaction manager when starting a transaction
  • CLOG is available in shared memory

The image below is a simplified illustration of visibility checking:

This only gives you a brief overview of the entire visibility checking process. This process can be found in src/backend/heap/heapam_visibility.c in the function heapTupleSatisfiesMVCC(). Of course, the actual logic is more complex than the illustration above.

I have studied the entire process of PostgreSQL’s visibility checking logic and come up with a more elaborated flow diagram below: A visibility check can go as deep as this diagram shows, but thanks to the use of hintbit, majority of the visibility check is done pretty quickly by simply checking the hintbit and the snapshot with the xmin and xmax values of a data tuple.

Summary

If you are a PostgreSQL developer working on distributed database, shared storage cluster or sharding, I believe this blog could provide you with an insight of how PostgreSQL handles visibility check. Data atomicity and visibility is always an issue when we start to interact with multiple database nodes. How to ensure data is consistent across the cluster is a common challenge. Knowing the fundamentals of transaction and visibility could help us toggle those problems.