When inserting data into a table, you can specify that the value of a field is null. For example, for a table t (i int, j int, k int), we can execute insert into t values (8,1,6), or insert into t values (3, null, 7) to insert a record with a null value. This blog will explore some technical details of storing null values in PostgreSQL.
The issue how PG storage null value
A tuple in PG is divided into TupleHead part and data part. Two examples of inserting data into table ‘t’ are mentioned above. Here, a picture is used to throw this question: how to represent this null value in tuples?
As shown in the figure, an int data type can occupy 4 bytes of space without a null value. But in the case of null value, how should the database indicate that there is a null between 3 and 7?
Storage method of null value in PG
Using the pageinspact tool to observe how null values are stored. The following tests are performed:
postgres=# create table t(i int, j int, k int); CREATE TABLE postgres=# insert into t values(8,1,6); INSERT 0 1 postgres=# insert into t values(3,NULL,7); INSERT 0 1 postgres=# insert into t values(4,9,2); INSERT 0 1 postgres=# select lp,t_infomask,t_bits,t_data from heap_page_items(get_raw_page('t', 0)); lp | t_infomask | t_bits | t_data ----+------------+----------+---------------------------- 1 | 2048 | | \x080000000100000006000000 2 | 2049 | 10100000 | \x0300000007000000 3 | 2048 | | \x040000000900000002000000 (3 rows) postgres=#
As you can see, null values are not marked in the data area.
At the same time, we notice that the ‘t_infomask’ and ‘t_bits’ values of the second data with null value are inconsistent with those of the first and third data. Therefore, this may be the secret of reading the null value.
When (t_infomask & HEAP_HASNULL) is true, the column representing the tuple has a null value.
The ‘t_bits’ stores all the null columns in the tuple.
Therefore, before PG parses each column of a tuple, it will get whether there be some column is null via t_infomask and t_bits. If a column is null, it is considered that there is no data for the column in the data area.
‘t_its’ is an array of uint8. When there’s nothing null column in the tuple, t_bits can be considered empty, when the tuple has null columns, the t_bits array uses a bit to represent whether a column is null.
postgres=# create table t0(i1 int,i2 int,i3 int,i4 int,i5 int,i6 int, postgres(# i7 int,i8 int,i9 int,i10 int,i11 int,i12 int postgres(# ); CREATE TABLE postgres=# insert into t0 values(1,2,3,4,5,6,7,8,9,10,NULL,12); INSERT 0 1 postgres=# insert into t0 values(1,2,3,4,5,6,7,8,9,NULL,NULL,12); INSERT 0 1 postgres=# insert into t0 values(1,2,3,4,5,6,7,8,NULL,NULL,NULL,12); INSERT 0 1 postgres=# insert into t0 values(1,2,3,4,5,6,7,8,9,10,11,12); INSERT 0 1 postgres=# postgres=# select lp,t_infomask,t_bits from heap_page_items(get_raw_page('t0', 0)); lp | t_infomask | t_bits ----+------------+------------------ 1 | 2049 | 1111111111010000 2 | 2049 | 1111111110010000 3 | 2049 | 1111111100010000 4 | 2048 | (4 rows) postgres=#
The above script will create the table t0 with 12 columns, it then inserts several tuples in the table and execute the pageinspect tool query to inspect the tuples. Please see the t_bits value of the tuple :
It can be concluded that:
The value of the tag bit unused in t_bits is 0.
The 1 value indicates that the corresponding column is not null, otherwise it is null.
The columns of the table that are dropped
After a table column is dropped in PG, the data structure of this column will be saved in the catalog in a way that is invisible to the user. Do a test as follows:
postgres=# alter table t0 drop column i1; ALTER TABLE postgres=# insert into t0 values(2,3,4,5,6,7,8,9,10,11,12); INSERT 0 1 postgres=# select lp,t_infomask,t_bits from heap_page_items(get_raw_page('t0', 0)); lp | t_infomask | t_bits ----+------------+------------------ 1 | 2049 | 1111111111010000 2 | 2049 | 1111111110010000 3 | 2049 | 1111111100010000 4 | 2048 | 5 | 2049 | 0111111111110000 (5 rows) postgres=#
In the above test, we drop column ‘I1’ of t0 and insert a record with no null value into t0. Using the pageinspact tool, we found that the inserted data (lp=5) also has null value. The t_bits value of the dropped column show that it is also considered as a null value.
When a record has nothing null value, t_bits is not stored in the record. Once there is a null value, t_bits is stored in the tuple to record the null value status of all columns. A dropped column of a table is considered as null column. Therefore, when there are many columns in a table, do not drop a column, otherwise extra t_bits will be generated for each record, which causes the storage to swell.
Movead.Li is kernel development of Highgo Software. Since joining into Highgo Software in 2016, Movead takes the most time on researching the code of Postgres and is good at ‘Write Ahead Log’ and ‘Database Backup And Recovery’. Base on the experience Movead has two open-source software on the Postgres database. One is Walminer which can analyze history wal file to SQL. The other one is pg_lightool which can do a single table or block recovery base on base backup and walfiles or walfiles only.
Now he has joined the HighGo community team and hopes to make more contributions to the community in the future.