Understand PostgreSQL’s Planner – Simple Scan Paths vs Plans

Introduction

When you send a query to PostgreSQL, it normally goes through several stages of query processing before the results are returned to you. These stages are known as:

  • Parse
  • Analyze
  • Rewrite
  • Plan
  • Execute

I wrote another blog that briefly explains the responsibility of each query processing stage; you can find it here. In this blog, we will focus only on the “plan” stage, or the “planner” module, as this is perhaps the most interesting, or the most complex, stage if you will. I will share my understanding of the planner module as I investigate how it internally handles a simple sequential scan. This is based on PostgreSQL 16.

The planner has a very simple objective though: identify the fastest “path” among the available options and make a “plan” out of it so that the “executor” module can execute it in the next stage. Yet identifying the fastest “path” is exactly what makes the planner complicated. Let’s take a look.

Where it all starts

The function exec_simple_query() in postgres.c is where the query processing stages take place. We will focus on what happens after it lands on pg_plan_query(), and I will only mention the important functions it passes through.

What happens behind pg_plan_query?

A lot of things actually happen here, such as:

  • identify subqueries, partitioned tables, foreign tables, joins, etc.
  • estimate the sizes of all tables involved, with the help of the table access method
  • identify all possible paths to complete the query
    • sequential scan, index scan, tid scan, parallel worker, etc.
  • out of all the paths, find the best one, normally the cheapest
  • make a plan out of it (a quick way to peek at the resulting plan tree follows this list)
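
As a side note, if you would like to see the plan tree that comes out of this stage without attaching a debugger, PostgreSQL can dump it to the server log. A minimal example (the output is verbose and is emitted at LOG level):

SET debug_print_plan = on;        -- dump the final plan tree produced by the planner
SET client_min_messages = log;    -- optionally echo the LOG output back to psql
SELECT 1;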

Taking a simple SELECT query on a single table, without any join or subquery, the approximate call stack under pg_plan_query can be visualized as below:

This call stack diagram is greatly reduced, but it illustrates several key elements of the planner module. You may need to download the image for a better view in case the words are too small. The block marked with a blue star will be explained in detail next.
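
In text form, the reduced call chain for this simple case looks roughly like the following (based on my reading of the PostgreSQL 16 sources; many intermediate calls are omitted):

exec_simple_query()
  pg_plan_query()
    standard_planner()
      subquery_planner()
        grouping_planner()
          query_planner()
            make_one_rel()
              set_base_rel_sizes()
              set_base_rel_pathlist()
                set_rel_pathlist()
                  add_path() / add_partial_path() for each candidate scan path
          generate_gather_paths()        <- via generate_useful_gather_paths()
      get_cheapest_fractional_path()
      create_plan()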

set_base_rel_sizes()

As the name suggests, this is the main entry point for estimating the size of all relations (tables, views, indexes, etc.) involved. The size includes the estimated number of rows (tuples), their width and the number of pages. This information normally has to be obtained from the “heap access method” (the table access method), which in turn has access to the “buffer manager” and “storage manager” to provide the size estimates.

The total size is the sum of the sizes of all the involved tables. This is important for the “cost estimate” stage later.
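
Roughly speaking, the estimate is derived from the relation’s current physical size as reported by the storage manager, scaled by the statistics kept in pg_class. You can peek at those statistics yourself; the table name test here is simply the one used in the examples later in this post:

SELECT relname, relpages, reltuples
FROM   pg_class
WHERE  relname = 'test';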

set_base_rel_pathlist()

For a simple sequential scan on a simple table, this is where the program will end up. Other, more complex queries have different path-building techniques; refer to “set_rel_pathlist()” in allpaths.c for those.

Currently, 4 scan paths are added by default:

  • sequential scan
    • scan everything sequentially
  • partial sequential scan
    • this is added as a partial path because it is to be aggregated by the “gather” node considered in a later stage; this essentially means a parallel sequential scan
    • only added if the relation and the query are parallel safe
  • index scan
    • if the table has an index, it can be considered as a potential path
  • tid scan
    • if the query contains a ctid range restriction clause (WHERE ctid > '(1,30)' AND ctid < '(30,5)'), then a tid scan could be an option, as shown in the example right after this list
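
For example, the following query reuses that ctid range clause against the test table used later in this post; you should see a “Tid Range Scan” node in the resulting plan:

EXPLAIN SELECT * FROM test WHERE ctid > '(1,30)' AND ctid < '(30,5)';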

All of these paths involve some cost, estimated from the number of tuples and pages there are and the cost factor per tuple / page as configured below:

# - Planner Cost Constants -

seq_page_cost = 1.0                    # measured on an arbitrary scale
random_page_cost = 4.0                 # same scale as above
cpu_tuple_cost = 0.01                  # same scale as above
cpu_index_tuple_cost = 0.005           # same scale as above
cpu_operator_cost = 0.0025             # same scale as above
parallel_setup_cost = 1000.0           # same scale as above
parallel_tuple_cost = 0.1              # same scale as above

Different path methods have different cost calculations; they call the following functions to compute a “startup cost” and a “run cost” (a rough worked example follows the list):

  • cost_seqscan()
  • cost_index()
  • cost_tidscan()
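
For example, cost_seqscan() essentially charges seq_page_cost for every page plus cpu_tuple_cost (and the filter’s cpu_operator_cost) for every tuple scanned. As a rough sketch, not the exact planner code path, the following query reproduces that estimate from pg_class statistics for a table named test with one simple comparison in the WHERE clause:

SELECT relpages * current_setting('seq_page_cost')::float8
     + reltuples * ( current_setting('cpu_tuple_cost')::float8
                   + current_setting('cpu_operator_cost')::float8 ) AS approx_seqscan_total_cost
FROM pg_class
WHERE relname = 'test';

If you run this against the table used in the plain sequential scan example near the end of this post, the result should come out close to the 676994.40 total cost reported there.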

You can essentially influence the planner’s decision when it chooses the most ideal path to make a plan. For example, if you prefer the planner to use parallel scans more often, you could make the cost per parallel-scan tuple cheaper by setting “parallel_tuple_cost” to a smaller value, such as 0.001.
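
You can experiment with this at the session level; the table and query here are simply the ones used later in this post:

SET parallel_tuple_cost = 0.001;   -- make transferring tuples from workers look cheaper
EXPLAIN SELECT * FROM test WHERE a > 500000 AND a < 600000;
RESET parallel_tuple_cost;         -- go back to the configured value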

The “add_path” function is called to add a path to the list of potential paths, but keep in mind that the planner’s path-building mechanism does have an ejection mechanism. This means that if we intend to add a path that is significantly better than the paths already added, the planner may drop the existing paths while accepting the new one. Likewise, if the path to add is significantly worse, it will not be added at all.

The “add_partial_path” function is called if the planner considers a parallel sequential scan safe. This kind of sequential scan is “partial” because its results need to be collected and aggregated to form the final result, which adds extra cost there, so parallelism may not always be ideal. Here’s a rule of thumb (a quick way to try both on your own data follows the list):

  • if PostgreSQL has to scan a lot of data but only a small fraction of it is what we want, parallelism can help
  • if PostgreSQL has to scan a lot of data and most of it is what we want, parallelism may be slower.
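
A simple way to compare the two cases is to run the same query with parallelism disabled for the session; the query below is just the one used in the next section:

EXPLAIN ANALYZE SELECT * FROM test WHERE a > 500000 AND a < 600000;

SET max_parallel_workers_per_gather = 0;   -- force a non-parallel plan
EXPLAIN ANALYZE SELECT * FROM test WHERE a > 500000 AND a < 600000;
RESET max_parallel_workers_per_gather;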

generate_gather_paths

This is called if some partial paths have already been added, normally the partial sequential scan sub-path. This routine adds a new path type called “gather”, which has a sub-path such as the partial sequential scan. The gather path has to account for the cost of each sub-path plus the cost of fetching the tuples from the parallel workers and aggregating them into the final form.
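
Roughly, the gather path’s total cost is the partial sub-path’s total cost plus parallel_setup_cost plus parallel_tuple_cost for every row it expects to pull from the workers. Using the numbers from the first EXPLAIN output shown below as a sanity check:

317479.40 (partial seq scan) + 1000.0 (parallel_setup_cost) + 0.1 × 112390 (parallel_tuple_cost × estimated rows) = 329718.40 (Gather total cost)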

get_cheapest_fractional_path and create_plan

Once all the potential path candidates have been added, this function is called to choose the cheapest one, the one that has the lowest total cost. The chosen path is then fed into “create_plan”, where the path (and its sub-paths, if any) is processed recursively and turned into a final plan structure that the executor can understand and execute.

Examine the Plan

We can put EXPLAIN ANALYZE in front of our query to examine the cheapest plan that the planner has chosen and its cost details. The following example is a query plan that consists of one main plan node called “gather”, which contains one partial plan called “sequential scan” executed by 2 workers. You can tell by the arrow (->), which denotes a sub-path.

postgres=# explain analyze select * from test where a > 500000 and a <600000;
                          QUERY PLAN
------------------------------------------------------------
 Gather  (cost=1000.00..329718.40 rows=112390 width=36) (actual time=62.362..5106.295 rows=99999 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test  (cost=0.00..317479.40 rows=46829 width=36) (actual time=58.020..3416.544 rows=33333 loops=3)
         Filter: ((a > 500000) AND (a < 600000))
         Rows Removed by Filter: 13300000
 Planning Time: 0.489 ms
 Execution Time: 5110.030 ms
(8 rows)

If the planner chooses the sequential scan main path without any sub-paths, the query plan would look like:

postgres=# explain analyze select * from test where a > 500000;
                          QUERY PLAN
------------------------------------------------------------
 Seq Scan on test  (cost=0.00..676994.40 rows=39571047 width=6) (actual time=0.011..7852.896 rows=39500000 loops=1)
   Filter: (a > 500000)
   Rows Removed by Filter: 500000
 Planning Time: 0.115 ms
 Execution Time: 9318.773 ms
(5 rows)

Summary

The planner is complex, but I hope this blog helps you understand its basic working principles, so that if you would like to “hack” PostgreSQL’s planner module to improve some aspect of it or add new path options, you at least know where to start.