Features In PG13 – Deduplication in B-Tree Indexes

PG13; not to be confused with PG-13, only PostgreSQL Guidance required here!

PostgreSQL 13 Beta 1 was released on May 21, 2020, and Beta 2 on June 25, 2020. A beta might not be bug free, but it almost always includes all core features of the final release, and PG betas are no different: Beta 2 includes all the major features we expect to see in the release. Some of these are fairly useful and exciting, and worth mentioning.

Key Feature Highlights

Like any other major PostgreSQL release, version 13 looks to improve performance and usability. Some of the notable features are:

  • Deduplication of Data in B-Tree
  • Incremental Sorting
  • BEFORE Row-Level Triggers for Partition Tables
  • FETCH FIRST WITH TIES

This blog will focus specifically on the deduplication of data in B-Tree indexes introduced in PG13. We are going to look at how it impacts index sizes and performance in PG13 versus PG12.

Duplicates in a B-Tree

Contrary to general belief, duplicates occur in indexes more often than one would anticipate. By definition, duplicates exist when at least two entries in the leaf pages of the same B-Tree index carry the same data for all index column(s).

Duplicates occur in another way too. When a record in a table is updated, a new version of the tuple is actually created and the old one is marked as deleted. The new tuple resides at a different physical location, so the index must also carry an entry pointing to that location. This ends up creating a new entry in the B-Tree index for the same key value.
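This is easy to observe: updating an indexed column creates a new tuple version at a new ctid, and the index gains an entry pointing at it. A small illustrative sketch (the table name t is a throwaway assumption, and the exact ctid values depend on your setup):

```sql
-- Hypothetical throwaway table to illustrate the point.
CREATE TABLE t (id int PRIMARY KEY);
INSERT INTO t VALUES (1);

SELECT ctid FROM t;    -- e.g. (0,1): original physical location

-- Updating the indexed column creates a new tuple version,
-- so the index must point at the new location as well.
UPDATE t SET id = id;
SELECT ctid FROM t;    -- e.g. (0,2): new location, new index entry
```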

Deduplication

Deduplication is an option that groups duplicate B-Tree index entries together into a single posting list tuple. This avoids repeating the column data for every entry; instead, a sorted list of TIDs (tuple identifiers) is maintained that points to the rows in the table. The TID also serves to uniquely identify a tuple, thereby fulfilling one of Lehman and Yao’s algorithmic conditions. Treating the heap TID as a tiebreaker column in B-Tree indexes was introduced in PG12.
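Posting list tuples can be inspected with the pageinspect extension, whose bt_page_items() function gained htid and tids columns in PG13. A sketch, assuming an index named btree_idx with at least one leaf page (block number 1 is an assumption):

```sql
-- pageinspect requires superuser privileges.
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- For a posting list tuple, "tids" holds the sorted list of
-- heap TIDs; for an ordinary tuple it is NULL.
SELECT itemoffset, ctid, itemlen, htid, tids
FROM   bt_page_items('btree_idx', 1)
LIMIT  5;
```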

How Does Deduplication Work?

One would imagine that managing a B-Tree is pretty straightforward. That’s not really the case. When we discuss index B-Trees, there are many algorithms in play managing various aspects, which makes for fairly complex code covering tree structure, free-space management, and so on. And that’s precisely why deduplication is such a big highlight in PG13.

One would assume that with deduplication enabled, we’ll end up with a B-Tree without any duplicates. That’s not quite true. You may still have duplicates in a B-Tree.

Deduplication only happens when new items are being inserted into a B-Tree index, and only when the remaining free space on the target index page is less than the space required by the new index tuple, after a single-page vacuum and page compaction have already been tried. Simply put, when a B-Tree index page is about to be split, deduplication is attempted on the current index page first, not on a new one created by the split; any duplicates on the page are grouped together into posting list tuples, which may free enough space to avoid the split altogether. This also helps distribute data more evenly across index pages.

Setting Up Environment

So now that we understand how deduplication works, let’s see it in action. To demonstrate, we are going to compare PG version 12.2 with PG version 13 Beta 2. Create a table and a B-Tree index on both versions:

postgres=# CREATE TABLE btree_dups AS (SELECT GENERATE_SERIES(1, 1000000)::BIGINT AS val);
SELECT 1000000
postgres=# CREATE INDEX btree_idx ON btree_dups(val);
CREATE INDEX
postgres=# SELECT  c.relname
        , c.relkind
        , pg_size_pretty( pg_relation_size(c.oid) )
FROM    pg_class c
WHERE   c.relname = 'btree_dups' 
        OR  c.oid IN 
        (
            SELECT i.indexrelid FROM pg_index i WHERE i.indrelid = 'btree_dups'::regclass 
        );
  relname   | relkind | pg_size_pretty 
------------+---------+----------------
 btree_dups | r       | 35 MB
 btree_idx  | i       | 21 MB
(2 rows)

The initial table and index sizes are the same on both versions as well. Let’s try to change that by making some updates.

postgres=# UPDATE btree_dups SET val = val + 1;
UPDATE 1000000

Run the above query 3 times on both versions.
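For convenience, the three rounds can be scripted in one psql block. A sketch; note that wrapping them in a single DO block runs them in one transaction, which may bloat the index slightly differently than three separate statements:

```sql
DO
$$
BEGIN
    FOR i IN 1..3 LOOP
        UPDATE btree_dups SET val = val + 1;
    END LOOP;
END;
$$;
```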

Deduplication in Action

Time to check how much the table and index sizes have increased (skipping the size query for brevity).

12.2

  relname   | relkind | pg_size_pretty 
------------+---------+----------------
 btree_dups | r       | 104 MB
 btree_idx  | i       | 86 MB
(2 rows)

13 Beta 2

  relname   | relkind | pg_size_pretty 
------------+---------+----------------
 btree_dups | r       | 104 MB
 btree_idx  | i       | 62 MB
(2 rows)

That’s a significant difference in size. Should we expect a read performance improvement then? Although the scope of this blog does not include performance benchmarking, running a simple SELECT with \timing enabled in psql shows the following results.

12.2

postgres=# EXPLAIN SELECT val FROM btree_dups;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Seq Scan on btree_dups  (cost=0.00..23275.00 rows=1000000 width=8)
(1 row)

Time: 2.415 ms
postgres=# DO
postgres-# $$BEGIN
postgres$# PERFORM * FROM btree_dups;
postgres$# END;
postgres$# $$;
DO
Time: 190.101 ms

13 Beta 2

postgres=# EXPLAIN SELECT val FROM btree_dups;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Seq Scan on btree_dups  (cost=0.00..23274.00 rows=1000000 width=8)
(1 row)

Time: 2.221 ms
postgres=# DO
postgres-# $$BEGIN
postgres$# PERFORM * FROM btree_dups;
postgres$# END;
postgres$# $$;
DO
Time: 174.843 ms

I’ve picked the median values after running the above SELECT statement many, many times. So there is a visible performance benefit. How much? That requires a separate and more detailed analysis.

Limitation and Syntax

By default, deduplication is enabled for B-Tree indexes. However, you may turn it on or off for a given index by setting the index storage parameter “deduplicate_items”. Turning off deduplication for an index does not change the existing posting list tuples.

Deduplication does not occur for all B-Tree indexes. It is disabled for certain data types and container types where equal values can have different representations, as well as for indexes with non-key (INCLUDE) columns.
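For reference, the storage parameter can be set at index creation or changed later. A sketch (the index name is an assumption; changing the setting does not rewrite existing posting list tuples until the index is rebuilt):

```sql
-- Disable deduplication for a new index.
CREATE INDEX btree_idx_nodedup ON btree_dups (val)
    WITH (deduplicate_items = off);

-- Re-enable it; existing tuples are left untouched.
ALTER INDEX btree_idx_nodedup SET (deduplicate_items = on);

-- REINDEX rebuilds the index under the current setting.
REINDEX INDEX btree_idx_nodedup;
```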

Conclusion

In the end, deduplication is a major enhancement that reduces the load on disk and RAM and improves performance where duplicate data would otherwise cause index bloat. However, there is some overhead as well, since deduplication is an additional step that may not always pay off. It is therefore recommended to use it wisely: keep it on by default, but consider disabling it for indexes where the insertion frequency is high or where updates yield new values that did not previously exist.

2 Responses

  1. Hamid Akhtar Mohammad Mahmoud Alhashash says:

    This example does not actually represent B-Tree index deduplication; there are no duplicate values at all in this data (1 to 1M, incremented three times). The increase in index size is a result of dead tuples that will be removed by VACUUM or REINDEX.

    Try the command `REINDEX INDEX btree_idx;` and the index in both versions will be reduced back to 21MB.

    You may test actual deduplication by using the modulus of the series. For example:
    `CREATE TABLE btree_dups AS (SELECT GENERATE_SERIES(1, 1000000)::BIGINT % 1000 AS val0);`

    In this case the table will have 1000 distinct values, each with 1000 duplicates. The resulting index size in PG13 is 7MB instead of 21MB.

    You may also try different duplication frequencies by changing the modulus.

    • Hamid Akhtar Mohammad Mahmoud Alhashash says:

      Also, both queries do not use the index at all! It is clearly just a table scan, as shown in the explain result “Seq Scan on btree_dups”. The difference is mostly noise and PG-version related.

      To use the index, you need a relatively high-selectivity query like `SELECT val FROM btree_dups WHERE val=200;`
