CREATE TABLE one_column_table (field INTEGER);
The above SQL creates ‘one_column_table’ with only a single column. But does the table actually have only one column?
Let’s query the pg_attribute
catalog to find out how many columns our one_column_table has.
SELECT attname, attnum, atttypid::regtype from pg_attribute WHERE attrelid = 'one_column_table'::regclass; attname | attnum | atttypid ----------+--------+---------- tableoid | -6 | oid cmax | -5 | cid xmax | -4 | xid cmin | -3 | cid xmin | -2 | xid ctid | -1 | tid field | 1 | integer (7 rows)
What?? As per the ‘pg_attribute’ catalog, the table we just created has seven columns.
Somehow PostgreSQL has added six extra columns to our table and all of these implicit columns have -ve ‘attnum’ value.
This post is about these implicit system columns, what data these fields hold, and what we can do with that data.
So let us look at the system columns one by one.
ctid
TID is an acronym for a tuple identifier (row identifier), which is a pair of values (block number, tuple index within the block). Since the data files in PostgreSQL are logically divided into fixed-size blocks (8k by default), the CTID column identifies the physical location of the two within its table.
So what we can do with this CTID column?
Knowing the exact row location of the tuple within the table’s data file can be handy for fixing the data corruption in some cases. but it has another very interesting use cases. This CTID column could be used to identify the unique ROW within the table if it has duplicate data.
For example
Suppose we accidentally inserted the duplicate data in our example ‘one_column_table’ table.
INSERT INTO one_column_table SELECT generate_series(1,5); --insert the same data 2nd time INSERT INTO one_column_table SELECT generate_series(1,5); SELECT * FROM one_column_table; field ------- 1 2 3 4 5 1 2 3 4 5 (10 rows)
So let’s see how we can leverage this implicit CTID column to get rid of all duplicate rows.
The data in the user-defined column is duplicated in the table but since the ctid of each row is always unique within the table so selecting the ctid along with the rest of data would make each row uniquely identifiable.
SELECT ctid,* FROM one_column_table; ctid | field --------+------- (0,1) | 1 (0,2) | 2 (0,3) | 3 (0,4) | 4 (0,5) | 5 (0,6) | 1 (0,7) | 2 (0,8) | 3 (0,9) | 4 (0,10) | 5 (10 rows)
Now we can use this ctid column to delete the duplicate rows.
DELETE FROM one_column_table WHERE ctid NOT IN ( SELECT MAX(ctid) FROM one_column_table GROUP BY field ); DELETE 5 postgres=# SELECT * FROM one_column_table; field ------- 1 2 3 4 5 (5 rows)
In short, the ctid column contains the information related to the location of a tuple within the table’s data file (although it can get changed by vacuum) and can be leveraged to uniquely identify the row or its position in the data file.
tableoid
This implicit column represents the object id (row identifier) of the row in the ‘pg_class
‘ catalog table that holds the metadata for this particular table.
We can use this ‘tableoid‘ value to easily get the meta-information for the table.
SELECT c.relname, n.nspname as namespace, t.spcname as tablespace, c.relhasrules has_rules, c.relhasindex has_indexes FROM pg_class c LEFT OUTER JOIN pg_tablespace t on (t.oid = c.reltablespace) LEFT OUTER JOIN pg_namespace n on (n.oid = c.relnamespace) WHERE c.oid = ( SELECT tableoid FROM one_column_table LIMIT 1 ); relname | namespace | tablespace | has_rules | has_indexes ------------------+-----------+------------+-----------+------------- one_column_table | public | | f | f (1 row)
But that’s not the only thing ‘tableoid’ can do: it can serve another very interesting purpose.
‘tableoid’ can be used to identify the origin of the row, especially when we are dealing with partitioned tables or UNION queries.
--CREATE partitioned table CREATE TABLE OCT_partitioned (field INTEGER) PARTITION BY range (field); CREATE TABLE OCT_part1 PARTITION OF OCT_partitioned FOR VALUES FROM (1) TO (5); CREATE TABLE OCT_part2 PARTITION OF OCT_partitioned FOR VALUES FROM (6) TO (10); CREATE TABLE OCT_part3 PARTITION OF OCT_partitioned DEFAULT; --INSERT some data INSERT INTO OCT_partitioned SELECT generate_series(1,15); --Issue SELECT SELECT tableoid::regclass, field from OCT_partitioned; tableoid | field -----------+------- oct_part1 | 1 oct_part1 | 2 oct_part1 | 3 oct_part1 | 4 oct_part2 | 6 oct_part2 | 7 oct_part2 | 8 oct_part2 | 9 oct_part3 | 5 oct_part3 | 10 oct_part3 | 11 oct_part3 | 12 oct_part3 | 13 oct_part3 | 14 oct_part3 | 15 (15 rows)
As can be seen from the above query, using the ‘tableoid’ column we can easily identify the origin of rows in case of partitioned tables.
Similarly, ‘tableoid’ comes in handy to identify the row origin when using UNION queries.
--CREATE a table CREATE TABLE second_one_column_table ( field INTEGER); --INSERT some data in second_one_column_table INSERT INTO second_one_column_table values(999); INSERT INTO second_one_column_table values(9999); INSERT INTO second_one_column_table values(99999); --Issue a union select query SELECT field,tableoid::regclass from second_one_column_table UNION SELECT field,tableoid::regclass from one_column_table; field | tableoid -------+------------------------- 999 | second_one_column_table 250 | one_column_table 200 | one_column_table 1004 | one_column_table 1005 | one_column_table 300 | one_column_table 99999 | second_one_column_table 1002 | one_column_table 1003 | one_column_table 4 | one_column_table 9999 | second_one_column_table 400 | one_column_table 1001 | one_column_table 150 | one_column_table 100 | one_column_table 1 | one_column_table 2 | one_column_table 3 | one_column_table 500 | one_column_table 5 | one_column_table (20 rows)
xmin
The identifier type used by the system for transaction ID is ‘XID’, or transaction (abbreviated XACT) identifier. This is the 32-bit number that identifies the transaction in the system.
‘xmin‘ implicit column holds the transaction-ID of the inserting transaction for this row version. Let’s insert some more data into our example ‘one_column_table‘ table and see how we can use ‘xmin’ column.
--Insert some more rows in our example table INSERT INTO one_column_table values ( 100); INSERT INTO one_column_table values ( 200); INSERT INTO one_column_table values ( 300);
Now see what we have in the xmin column
SELECT xmin, * from one_column_table; xmin | field ------+------- 509 | 1 509 | 2 509 | 3 509 | 4 509 | 5 516 | 100 517 | 200 518 | 300 (8 rows)
So using this ‘xmin’ column we can also identify the time when the row was inserted.
SELECT to_char(pg_xact_commit_timestamp(xmin) ,'YYYY/MM/DD HH:MM:SS') AS row_inserted_at, field FROM one_column_table; row_inserted_at | field ---------------------+------- 2020/05/18 05:05:31 | 1 2020/05/18 05:05:31 | 2 2020/05/18 05:05:31 | 3 2020/05/18 05:05:31 | 4 2020/05/18 05:05:31 | 5 2020/05/18 08:05:38 | 100 2020/05/18 08:05:41 | 200 2020/05/18 08:05:43 | 300 (8 rows)
Note:
pg_xact_commit_timestamp
is only available iftrack_commit_timestamp
configuration is enabled inpostgresql.conf
xmax
Unlike other pseudo columns, ‘xmax’ can hold two types of values.
Either ‘xmax’ stores the transaction ID (XID) of the transaction that deleted the tuple or it can hold the XID of transaction that is holding the row lock on the row.
Let’s see whats in the ‘xmax’ value for rows in our example table.
SELECT xmax, field FROM one_column_table; xmax | field ------+------- 0 | 1 0 | 2 0 | 3 0 | 4 0 | 5 0 | 100 0 | 200 0 | 300 (8 rows)
Now open another session and issue the UPDATE on one_column_table table.
Note: REMEMBER UPDATE is “delete + insert” under the hood in PostgreSQL
BEGIN; select txid_current(); txid_current -------------- 519 (1 row) UPDATE one_column_table SET field=field+1;
Before ending the transaction, re-run the same query in the first session
SELECT xmax, field FROM one_column_table; xmax | field ------+------- 519 | 1 519 | 2 519 | 3 519 | 4 519 | 5 519 | 100 519 | 200 519 | 300 (8 rows)
As you can see all the rows got xmax = 519, this shows that transaction with XID = 519 is deleting these versions of rows from another session.
End the transaction and this time try locking a row.
From the second session open a transaction and lock row with field=200
BEGIN; postgres=*# SELECT txid_current(); txid_current -------------- 523 (1 row) SELECT * from one_column_table where field = 200 for update; field ------- 200 (1 row)
Now back to session one and see what we have in the ‘xmax’ column.
SELECT xmax, field from one_column_table; xmax | field ------+------- 0 | 1 0 | 2 0 | 3 0 | 4 0 | 5 0 | 100 523 | 200 0 | 300 (8 rows)
Here the row with a field value of 200 has 523 in the ‘xmax’ column. but this time it means transaction 523 is holding the lock on the row.
So how can we figure out what is the meaning of the current value of xmax?
infomask
flags in the heap page identify the value of ‘xmax’ field. So let’s install pageinspect
EXTENSION to inspect the content of the heap page.
CREATE EXTENSION pageinspect; SELECT t.field, t.xmax, t.ctid, CASE WHEN (t_infomask & 128)::boolean then 'LOCK' WHEN (t_infomask & 1024)::boolean then 'COMMITTED' WHEN (t_infomask & 2048)::boolean then 'ROLLBACKED' WHEN (t_infomask & 4096)::boolean then 'MULTI XACT' END as xmax_status FROM one_column_table t LEFT OUTER JOIN heap_page_items(get_raw_page('one_column_table', 0)) h ON (t.ctid = h.t_ctid) WHERE t.field = 200 and h.t_xmax = t.xmax; field | xmax | ctid | xmax_status -------+------+--------+------------- 200 | 523 | (0,28) | LOCK (1 row)
We see in the above query output that the row is locked. In this case, it is because of the FOR UPDATE we issue from another session. Similarly, the same query would output the status of the transaction as COMMITTED or ROLLBACKED when the ‘xmax’ identifies the deleting transaction instead of the lock on the row.
cmin and cmax
‘cmin‘ and ‘cmax‘ are overlapped fields and are used within the same transaction to identify the command that changed a tuple.
As per the documentation:
‘cmin’ is: The command identifier (starting at zero) within the inserting transaction.
‘cmax‘ is: The command identifier within the deleting transaction, or zero.
Well I can’t think of some real usefulness of these, other than that we can use the column value to see the sequence of inserts within the same transaction.
BEGIN; --identify the transaction ID SELECT txid_current(); txid_current -------------- 525 (1 row) -- INSERT some rows within the same transaction INSERT INTO one_column_table values (1003); INSERT INTO one_column_table values (1002); INSERT INTO one_column_table values (1001); INSERT INTO one_column_table values (1004); INSERT INTO one_column_table values (1005); END; --cmin contains the order of rows inserted --within the transaction 525 SELECT cmin, field FROM one_column_table WHERE xmin = 525 ORDER BY field; cmin | field ------+------- 2 | 1001 1 | 1002 0 | 1003 3 | 1004 4 | 1005 (5 rows)
Bottomline
The post only gives a small preview of what we can do with implicit pseudo columns that get created with every table in PostgreSQL. The examples may not show the use cases of these columns in real-world scenarios, they are merely used to get an understanding of the implicit columns and how they can be useful.

Muhammad Usama is a database architect / PostgreSQL consultant at HighGo Software and also Pgpool-II core committer. Usama has been involved with database development (PostgreSQL) since 2006, he is the core committer for open source middleware project Pgpool-II and has played a pivotal role in driving and enhancing the product. Prior to coming to open source development, Usama was doing software design and development with the main focus on system-level embedded development. After joining the EnterpriseDB, an Enterprise PostgreSQL’s company in 2006 he started his career in open source development specifically in PostgreSQL and Pgpool-II. He is a major contributor to the Pgpool-II project and has contributed to many performance and high availability related features.
Recent Comments