The way PG store null value in record

Enterprise PostgreSQL Solutions

Comments are off

The way PG store null value in record

When inserting data into a table, you can specify that the value of a field is null. For example, for a table t (i int, j int, k int), we can execute insert into t values (8,1,6), or insert into t values (3, null, 7) to insert a record with a null value. This blog will explore some technical details of storing null values in PostgreSQL.

The issue how PG storage null value

A tuple in PG is divided into TupleHead part and data part. Two examples of inserting data into table ‘t’ are mentioned above. Here, a picture is used to throw this question: how to represent this null value in tuples?

tuple_struct

As shown in the figure, an int data type can occupy 4 bytes of space without a null value. But in the case of null value, how should the database indicate that there is a null between 3 and 7?

Storage method of null value in PG

Using the pageinspact tool to observe how null values are stored. The following tests are performed:

postgres=# create table t(i int, j int, k int);
CREATE TABLE
postgres=# insert into t values(8,1,6);
INSERT 0 1
postgres=# insert into t values(3,NULL,7);
INSERT 0 1
postgres=# insert into t values(4,9,2);
INSERT 0 1
postgres=# select lp,t_infomask,t_bits,t_data  from  heap_page_items(get_raw_page('t', 0));
 lp | t_infomask |  t_bits  |           t_data           
----+------------+----------+----------------------------
  1 |       2048 |          | \x080000000100000006000000
  2 |       2049 | 10100000 | \x0300000007000000
  3 |       2048 |          | \x040000000900000002000000
(3 rows)

postgres=#

As you can see, null values are not marked in the data area.

tuple_struct

At the same time, we notice that the ‘t_infomask’ and ‘t_bits’ values of the second data with null value are inconsistent with those of the first and third data. Therefore, this may be the secret of reading the null value.

  1. When (t_infomask & HEAP_HASNULL) is true, the column representing the tuple has a null value.

  2. The ‘t_bits’ stores all the null columns in the tuple.

Therefore, before PG parses each column of a tuple, it will get whether there be some column is null via t_infomask and t_bits. If a column is null, it is considered that there is no data for the column in the data area.

‘t_bits’ design

‘t_its’ is an array of uint8. When there’s nothing null column in the tuple, t_bits can be considered empty, when the tuple has null columns, the t_bits array uses a bit to represent whether a column is null.

postgres=# create table t0(i1 int,i2 int,i3 int,i4 int,i5 int,i6 int,
postgres(# i7 int,i8 int,i9 int,i10 int,i11 int,i12 int
postgres(# );
CREATE TABLE
postgres=# insert into t0 values(1,2,3,4,5,6,7,8,9,10,NULL,12);
INSERT 0 1
postgres=# insert into t0 values(1,2,3,4,5,6,7,8,9,NULL,NULL,12);
INSERT 0 1
postgres=# insert into t0 values(1,2,3,4,5,6,7,8,NULL,NULL,NULL,12);
INSERT 0 1
postgres=# insert into t0 values(1,2,3,4,5,6,7,8,9,10,11,12);
INSERT 0 1
postgres=#
postgres=# select lp,t_infomask,t_bits  from  heap_page_items(get_raw_page('t0', 0));
 lp | t_infomask |      t_bits      
----+------------+------------------
  1 |       2049 | 1111111111010000
  2 |       2049 | 1111111110010000
  3 |       2049 | 1111111100010000
  4 |       2048 | 
(4 rows)

postgres=#

The above script will create the table t0 with 12 columns, it then inserts several tuples in the table and execute the pageinspect tool query to inspect the tuples. Please see the t_bits value of the tuple :

t_bits

It can be concluded that:

The value of the tag bit unused in t_bits is 0.

The 1 value indicates that the corresponding column is not null, otherwise it is null.

The columns of the table that are dropped

After a table column is dropped in PG, the data structure of this column will be saved in the catalog in a way that is invisible to the user. Do a test as follows:

postgres=# alter table t0 drop column i1;
ALTER TABLE
postgres=# insert into t0 values(2,3,4,5,6,7,8,9,10,11,12);
INSERT 0 1
postgres=# select lp,t_infomask,t_bits  from  heap_page_items(get_raw_page('t0', 0));
 lp | t_infomask |      t_bits      
----+------------+------------------
  1 |       2049 | 1111111111010000
  2 |       2049 | 1111111110010000
  3 |       2049 | 1111111100010000
  4 |       2048 | 
  5 |       2049 | 0111111111110000
(5 rows)

postgres=#

In the above test, we drop column ‘I1’ of t0 and insert a record with no null value into t0. Using the pageinspact tool, we found that the inserted data (lp=5) also has null value. The t_bits value of the dropped column show that it is also considered as a null value.

droped_column

Conclusion

When a record has nothing null value, t_bits is not stored in the record. Once there is a null value, t_bits is stored in the tuple to record the null value status of all columns. A dropped column of a table is considered as null column. Therefore, when there are many columns in a table, do not drop a column, otherwise extra t_bits will be generated for each record, which causes the storage to swell.