The way PG store null value in record

Enterprise PostgreSQL Solutions

2 comments

The way PG store null value in record

When inserting data into a table, you can specify that the value of a field is null. For example, for a table t (i int, j int, k int), we can execute insert into t values (8,1,6), or insert into t values (3, null, 7) to insert a record with a null value. This blog will explore some technical details of storing null values in PostgreSQL.

The issue how PG storage null value

A tuple in PG is divided into TupleHead part and data part. Two examples of inserting data into table ‘t’ are mentioned above. Here, a picture is used to throw this question: how to represent this null value in tuples?

tuple_struct

As shown in the figure, an int data type can occupy 4 bytes of space without a null value. But in the case of null value, how should the database indicate that there is a null between 3 and 7?

Storage method of null value in PG

Using the pageinspact tool to observe how null values are stored. The following tests are performed:

postgres=# create table t(i int, j int, k int);
CREATE TABLE
postgres=# insert into t values(8,1,6);
INSERT 0 1
postgres=# insert into t values(3,NULL,7);
INSERT 0 1
postgres=# insert into t values(4,9,2);
INSERT 0 1
postgres=# select lp,t_infomask,t_bits,t_data  from  heap_page_items(get_raw_page('t', 0));
 lp | t_infomask |  t_bits  |           t_data           
----+------------+----------+----------------------------
  1 |       2048 |          | \x080000000100000006000000
  2 |       2049 | 10100000 | \x0300000007000000
  3 |       2048 |          | \x040000000900000002000000
(3 rows)

postgres=#

As you can see, null values are not marked in the data area.

tuple_struct

At the same time, we notice that the ‘t_infomask’ and ‘t_bits’ values of the second data with null value are inconsistent with those of the first and third data. Therefore, this may be the secret of reading the null value.

  1. When (t_infomask & HEAP_HASNULL) is true, the column representing the tuple has a null value.

  2. The ‘t_bits’ stores all the null columns in the tuple.

Therefore, before PG parses each column of a tuple, it will get whether there be some column is null via t_infomask and t_bits. If a column is null, it is considered that there is no data for the column in the data area.

‘t_bits’ design

‘t_its’ is an array of uint8. When there’s nothing null column in the tuple, t_bits can be considered empty, when the tuple has null columns, the t_bits array uses a bit to represent whether a column is null.

postgres=# create table t0(i1 int,i2 int,i3 int,i4 int,i5 int,i6 int,
postgres(# i7 int,i8 int,i9 int,i10 int,i11 int,i12 int
postgres(# );
CREATE TABLE
postgres=# insert into t0 values(1,2,3,4,5,6,7,8,9,10,NULL,12);
INSERT 0 1
postgres=# insert into t0 values(1,2,3,4,5,6,7,8,9,NULL,NULL,12);
INSERT 0 1
postgres=# insert into t0 values(1,2,3,4,5,6,7,8,NULL,NULL,NULL,12);
INSERT 0 1
postgres=# insert into t0 values(1,2,3,4,5,6,7,8,9,10,11,12);
INSERT 0 1
postgres=#
postgres=# select lp,t_infomask,t_bits  from  heap_page_items(get_raw_page('t0', 0));
 lp | t_infomask |      t_bits      
----+------------+------------------
  1 |       2049 | 1111111111010000
  2 |       2049 | 1111111110010000
  3 |       2049 | 1111111100010000
  4 |       2048 | 
(4 rows)

postgres=#

The above script will create the table t0 with 12 columns, it then inserts several tuples in the table and execute the pageinspect tool query to inspect the tuples. Please see the t_bits value of the tuple :

t_bits

It can be concluded that:

The value of the tag bit unused in t_bits is 0.

The 1 value indicates that the corresponding column is not null, otherwise it is null.

The columns of the table that are dropped

After a table column is dropped in PG, the data structure of this column will be saved in the catalog in a way that is invisible to the user. Do a test as follows:

postgres=# alter table t0 drop column i1;
ALTER TABLE
postgres=# insert into t0 values(2,3,4,5,6,7,8,9,10,11,12);
INSERT 0 1
postgres=# select lp,t_infomask,t_bits  from  heap_page_items(get_raw_page('t0', 0));
 lp | t_infomask |      t_bits      
----+------------+------------------
  1 |       2049 | 1111111111010000
  2 |       2049 | 1111111110010000
  3 |       2049 | 1111111100010000
  4 |       2048 | 
  5 |       2049 | 0111111111110000
(5 rows)

postgres=#

In the above test, we drop column ‘I1’ of t0 and insert a record with no null value into t0. Using the pageinspact tool, we found that the inserted data (lp=5) also has null value. The t_bits value of the dropped column show that it is also considered as a null value.

droped_column

Conclusion

When a record has nothing null value, t_bits is not stored in the record. Once there is a null value, t_bits is stored in the tuple to record the null value status of all columns. A dropped column of a table is considered as null column. Therefore, when there are many columns in a table, do not drop a column, otherwise extra t_bits will be generated for each record, which causes the storage to swell.

2 Responses

  1. Movead Li Andrew Kozin says:

    Movead, thank you for the post! I love your approach to inspecting nuts and bolts of the database. This is really nice in… well… luck of documentation about this part of big picture.

    But I have a question — have you run `vacuum full t0` before getting nullified column values?
    I couldn’t reproduce this until full vacuuming (notice, that vacuum by itself doesn’t change `t_bits` field in any way). I tested in PostgreSQL 11:

    “`
    postgres=# select * from t0;
    i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 | i9 | i10 | i11 | i12
    —-+—-+—-+—-+—-+—-+—-+—-+—-+—–+—–+—–
    1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | | 12
    1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | | 12
    1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | | | | 12
    1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12

    postgres=# select t_bits, t_infomask from heap_page_items (get_raw_page(‘t0’, 0));
    t_bits | t_infomask
    ——————+————
    1111111111010000 | 2305
    1111111110010000 | 2305
    1111111100010000 | 2305
    | 2304
    “`

    After columns removal:

    “`
    postgres=# select * from t0;
    i2 | i3 | i4 | i5 | i6 | i7 | i8 | i10 | i11 | i12
    —-+—-+—-+—-+—-+—-+—-+—–+—–+—–
    2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | | 12
    2 | 3 | 4 | 5 | 6 | 7 | 8 | | | 12
    2 | 3 | 4 | 5 | 6 | 7 | 8 | | | 12
    2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 11 | 12

    postgres=# select t_bits, t_infomask from heap_page_items (get_raw_page(‘t0’, 0));
    t_bits | t_infomask
    ——————+————
    1111111111010000 | 2305
    1111111110010000 | 2305
    1111111100010000 | 2305
    | 2304

    postgres=# vacuum t0;
    VACUUM
    postgres=# select t_bits, t_infomask from heap_page_items (get_raw_page(‘t0’, 0));
    t_bits | t_infomask
    ——————+————
    1111111111010000 | 2817
    1111111110010000 | 2817
    1111111100010000 | 2817
    | 2816

    postgres=# vacuum full t0;
    VACUUM
    postgres=# select t_bits, t_infomask from heap_page_items (get_raw_page(‘t0’, 0));
    t_bits | t_infomask
    ——————+————
    0111111101010000 | 2817
    0111111100010000 | 2817
    0111111100010000 | 2817
    0111111101110000 | 2817

    postgres=# select * from t0;
    i2 | i3 | i4 | i5 | i6 | i7 | i8 | i10 | i11 | i12
    —-+—-+—-+—-+—-+—-+—-+—–+—–+—–
    2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | | 12
    2 | 3 | 4 | 5 | 6 | 7 | 8 | | | 12
    2 | 3 | 4 | 5 | 6 | 7 | 8 | | | 12
    2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 11 | 12
    “`

    It seems, that PostgreSQL just keeps these bits for removed columns for consistency, but doesn’t need it anymore.
    Only full vacuuming forces bits nullification.

    This is really interesting because I would expect the full vacuum to re-build the table with new definitions, but it doesn’t.

    Then I was iterested even more: what if some columns are added? At first, nothing has changed

    “`
    postgres=# select * from t0;
    i2 | i3 | i4 | i5 | i6 | i7 | i8 | i10 | i11 | i12 | i13 | i14 | i15 | i16 | i17 | i18 | i19
    —-+—-+—-+—-+—-+—-+—-+—–+—–+—–+—–+—–+—–+—–+—–+—–+—–
    2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | | 12 | | | | | | |
    2 | 3 | 4 | 5 | 6 | 7 | 8 | | | 12 | | | | | | |
    2 | 3 | 4 | 5 | 6 | 7 | 8 | | | 12 | | | | | | |
    2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 11 | 12 | | | | | | |

    postgres=# select t_bits, t_infomask from heap_page_items (get_raw_page(‘t0’, 0));
    t_bits | t_infomask
    ——————+————
    0111111101010000 | 2817
    0111111100010000 | 2817
    0111111100010000 | 2817
    0111111101110000 | 2817
    “`

    But when we add a new row with non-null value of some big column, the magics occur:

    “`
    postgres=# insert into t0 (i19) values (19);
    INSERT 0 1
    nepalez=# select * from t0;
    i2 | i3 | i4 | i5 | i6 | i7 | i8 | i10 | i11 | i12 | i13 | i14 | i15 | i16 | i17 | i18 | i19
    —-+—-+—-+—-+—-+—-+—-+—–+—–+—–+—–+—–+—–+—–+—–+—–+—–
    2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | | 12 | | | | | | |
    2 | 3 | 4 | 5 | 6 | 7 | 8 | | | 12 | | | | | | |
    2 | 3 | 4 | 5 | 6 | 7 | 8 | | | 12 | | | | | | |
    2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 11 | 12 | | | | | | |
    | | | | | | | | | | | | | | | | 19
    (5 строк)

    postgres=# select t_bits, t_infomask from heap_page_items (get_raw_page(‘t0’, 0));
    t_bits | t_infomask
    ————————–+————
    0111111101010000 | 2817
    0111111100010000 | 2817
    0111111100010000 | 2817
    0111111101110000 | 2817
    000000000000000000100000 | 2305
    “`

    Now `t_bits` expanded to 3×8 = 24 bit values for 19 columns to fit (17 current, and 2 deleted which are still kept in the same order as before). As expected, further updates would use longer `t_bits` from the begining, but I skip snippets for brevity ))

  2. Movead Li says:

    Thanks for comment.
    When you do a vacuum, it erases the invalid tuple on pages, it will not touch the real data for valid tuple, I think it’s for performance. However when you do a vacuum full, it removes all valid data to another relfilenode, so it gets a chance to modify all valid tuple, so the data for dropped column can be wipe for reduce the space of a tuple.

    Every table column has a record in pg_attribute catalog, you can search from pg_attribute catalog, when you drop a column, record in pg_attribute will never remove, so it need a bit in t_bits.

    When we add a column it have nothing influence to already existing tuples.

Leave a Reply

Your email address will not be published. Required fields are marked *