Types of Indexes in PostgreSQL

Enterprise PostgreSQL Solutions

Comments are off

Types of Indexes in PostgreSQL

Finding relevant information quickly speeds up performance. For example, while reading a book in which you have to find a topic that you would like to read, if you know that it is in a certain chapter then you will simply go to that chapter, perhaps look through it and start reading the desired topic.

However if you don’t know then you can consult with the book index that’s usually available at the end of the book first, and find out which pages might list the desired topic. This way you will go only those pages to look for the topic. However if the index is not available then you will have to browse through the whole book looking for the pages of interest. This will obviously consume a lot of time in finding the relevant pages instead of quickly jumping to the desired pages right away.

The analogy of using a book index for searching the desired pages is similar to searching a database table for the desired tuples. If you can create an index on a database table the query execution may get quite fast and if you don’t have the index then it may end up taking quite a lot of time. I used the word “may” because in some cases a full table scan might be faster than an index scan, we will explore some of these in this blog moving forward.

PostgreSQL provides a variety of indexes and also quite a number of ways to create these indexes. The blog provides a brief introduction of all the different index types available in PostgreSQL, and also provides some examples to elaborate the index types.

1. B-Tree Index

It is the default index type in PostgreSQL that gets created when you do a ‘CREATE INDEX’ statement without mentioning the index name.

This index is much suitable for the data that can be sorted and can handle equality and range queries. The following command is used to create a btree index:

CREATE INDEX name ON table (column); or
CREATE INDEX name ON table USING BTREE (column);

2. Hash Index

The hash index prior to PostgreSQL version 10 were almost discouraged for various reasons such as:

  • Not WAL logged hence not crash safe
  • Not replicated to stand-by server
  • Performance is not so good and may take a long time to build the index (depending on the  table size)

However since version 10, these problems have been resolved. They are now crash safe and are able to be replicated to standby server. They sometimes perform better than b-tree indexes and are also space efficient.

The Hash index only works with equality operators that means that you can only look up for data that matches exactly. However it is now much optimized and does a much faster lookup which makes this index a bit of specialized index that can be used where equal comparison is more important. The following command is used to create a hash index:

CREATE INDEX name ON table USING HASH (column);

3. Gist Index

Gist or Generalized Search Tree are useful when the data to be indexed is more complex than to do a simple equate or ranged comparison like finding nearest-neighbor and pattern matching. The example of such data includes geometric data, network address comparisons and full-text searches. 

The Gist index itself provides an infrastructure to implement different strategies for indexing data such as B-trees and R-trees. The following command is used to create a hash index:

CREATE INDEX name ON table USING gist (column);

4. SP-Gist Index

SP-Gist or Space partitioned Gist indexes are useful when the data can be grouped into non-overlapping groupings. Like Gist index, it also provides an infrastructure for implementing different indexing strategies. However SP-Gist is non-balanced in nature and divides data in partitions, so it allows the implementation of quad-trees, k-d trees, and radix trees (tries).

5. Gin Index

Generalized Inverted indexes are useful in indexing data that consist of multiple elements in a single column such as arrays, json documents (jsonb) or text search documents (tsvector).

CREATE INDEX name ON table USING gin (column);

6. BRIN Index

Block Range Indexes are useful for large size tables that have columns with some natural sort order. The BRIN index divides the table into block ranges and keeps a summary of those blocks. This summary includes min, max values of the range. The following command is used to create a hash index:

CREATE INDEX name ON table USING brin (column);

The above list discribes the available index algorithms availble in postgres database, now lets see some of the characteristics of indexes that can be used to further tweek and enhance the performance of indexes.

Multicolumn Indexes

PostgreSQL does allow creation of an index on multiple columns. However you can only have a maximum of 32 columns and such indexes only work with Btree, Gist, Gin and Brin. An example of such index is:

CREATE TABLE test (x int, y int);
CREATE INDEX multi_idx ON test (x, y);

postgres=# EXPLAIN (costs off) SELECT * from test WHERE x = 10 AND y = 100;
               QUERY PLAN                
-----------------------------------------
 Index Only Scan using multi_idx on test
   Index Cond: ((x = 10) AND (y = 100))
(2 rows)

Unique Indexes

A unique index enforces the uniqueness of the values in the column. In PostgreSQL a unique index can be created on one or multiple columns. If a unique index is created for multiple columns the uniqueness is ensured using the combined values of columns. however only B-tree index can be declared unique.

CREATE TABLE test (x int, y int);
CREATE UNIQUE INDEX unique_idx ON test (x);
postgres=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 x      | integer |           |          | 
 y      | integer |           |          | 
Indexes:
    "unique_idx" UNIQUE, btree (x)

Let’s create a multi-column index.

CREATE UNIQUE INDEX unique_multi_col_idx ON test (x, y);
postgres=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 x      | integer |           |          | 
 y      | integer |           |          | 
Indexes:
    "unique_idx" UNIQUE, btree (x)
    "unique_multi_col_idx" UNIQUE, btree (x, y)

NULL values are not considered equal, hence they are considered unique values. Adding multiple NULLs in a unique column won’t result in index violation.

Also note that, if you declare a unique constraint or primary key for a column, PostgreSQL automatically creates a unique index. An example of such index is:

CREATE TABLE test (x int PRIMARY KEY, y int UNIQUE);
postgres=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 x      | integer |           | not null | 
 y      | integer |           |          | 
Indexes:
    "test_pkey" PRIMARY KEY, btree (x)
    "test_y_key" UNIQUE CONSTRAINT, btree (y)

Indexes on Expressions

It is also possible in PostgreSQL database to create the indexed columns based on the result of a function or scalar expression computed from one or more columns of the table. Since the result of computed expression is stored on the index and it won’t be computed at run time, the access to table data becomes much faster. However the insertion and updation of the data becomes more expensive. Some example of this index are:

CREATE TABLE test (x int, y text);
postgres=# EXPLAIN (costs off) SELECT * from test WHERE lower(y) = 'value';
              QUERY PLAN              
--------------------------------------
 Seq Scan on test
   Filter: (lower(y) = 'value'::text)
(2 rows)

Now lets create an indexed expression and see the query again. It should show the index being built using the expression now, instead of the table field.

postgres=# CREATE INDEX expr_idx ON test (lower(y));
CREATE INDEX
postgres=# EXPLAIN (costs off) SELECT * from test WHERE lower(y) = 'value';
                   QUERY PLAN                   
------------------------------------------------
 Bitmap Heap Scan on test
   Recheck Cond: (lower(y) = 'value'::text)
   ->  Bitmap Index Scan on expr_idx
         Index Cond: (lower(y) = 'value'::text)
(4 rows)

postgres=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 x      | integer |           |          | 
 y      | text    |           |          | 
Indexes:
    "expr_idx" btree (lower(y)

Partial Indexes

There are some situations where you don’t want to index the whole table, instead you will want to filter out some specific data based on some conditions. This kind of index is called a partial index. The partial index only contains the data for those rows that fulfill that specific condition.

These indexes contain a WHERE clause and result in much smaller index sizes than the usual indexes (without conditions). Since they have a smaller footprint, they result in faster data access. An example of such index is:

CREATE TABLE test (x int, y text);
postgres=# EXPLAIN (costs off) SELECT * from test WHERE y IS NULL;
      QUERY PLAN       
-----------------------
 Seq Scan on test
   Filter: (y IS NULL)
(2 rows)

Lets create a partial index

postgres=# CREATE INDEX partial_idx ON test (y) WHERE y IS NULL;
CREATE INDEX
postgres=# EXPLAIN (costs off) SELECT * from test WHERE y IS NULL;
               QUERY PLAN               
----------------------------------------
 Bitmap Heap Scan on test
   Recheck Cond: (y IS NULL)
   ->  Bitmap Index Scan on partial_idx
(3 rows)

Index-Only Scans

Index only scans are the index where all the needed data by the query is available directly from the index. Normally when an index is created, it is stored separately from the table data and whenever a query is executed to fetch the data rows it is read from the index and table files. However, if the query only consists of indexed columns then there is no need to fetch the data from relation files, it can be directly returned from the index. Which can improve the query performance.

CREATE TABLE test (x int, y int);
CREATE INDEX col_idx ON test (x);
postgres=# EXPLAIN (costs off) select x from test where x > 10;
              QUERY PLAN               
---------------------------------------
 Index Only Scan using col_idx on test
   Index Cond: (x > 10)
(2 rows)

But if you include a column that’s not part of the index then that would not be covered under index only scan.

postgres=# EXPLAIN (costs off) select x, y from test where x > 10 ;
            QUERY PLAN            
----------------------------------
 Index Scan using col_idx on test
   Index Cond: (x > 10)
(2 rows)

However not all index types provide support for index-only scans. Currently the B-tree, Gist and SP-Gist index provide the support.

Covering Indexes

The covering index is similar to Index-Only scans with a little difference. What if you only want to create an index on a specific column but still want to be able to answer the query from the index without going to the main table. That’s what the covering scans are. PostgreSQL has the option to allow this by using the ‘INCLUDE’ clause while creating the index.

CREATE TABLE test (x int, y int);
CREATE INDEX col_idx ON test (x) include (y);
postgres=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 x      | integer |           |          | 
 y      | integer |           |          | 
Indexes:
    "col_idx" btree (x) INCLUDE (y)

postgres=# EXPLAIN (costs off) select x, y from test where x > 10 ;
              QUERY PLAN               
---------------------------------------
 Index Only Scan using col_idx on test
   Index Cond: (x > 10)
(2 rows)

This query would have used Index Scan instead of Index-Only scan if we did not specify the INCLUDE clause.

Conclusion

Having the ability to choose from many different types of indexes is a power that must be used carefully as well. The performance that indexes offer can very easily degrade when incorrect type of index or poorly create one is used. It is therefore imperative that you understand the specific use case and define an index or multiple indexes based on the need. While indexes are mainly created for quick data retrieval, the database system does a lot of background work to keep those in sync with updated data in the table which may lead to bloated indexes. So keep data updates in perspective as well the specific retrieval use case when building an index.