Optimizing SQL – Step 1: EXPLAIN Costs and Plans in PostgreSQL – Part 2

This is the second blog in a series walking you through the query optimization process. We started from the very basics of understanding the “EXPLAIN” command. Reading part 1 is not a prerequisite; however, if you wish to understand how you can reconstruct a query from a given plan, Optimizing SQL – Step 1: EXPLAIN in PostgreSQL – Part 1 may help.

In this blog, we are going to dig a little deeper into the “EXPLAIN” command, learn more about costs and “EXPLAIN ANALYZE”, and compare plans in order to better understand potential areas for optimizing queries.

Set Up

We are going to create two tables to explore and understand the “EXPLAIN” command further.

-- big_table has 100 unique rows
postgres=# CREATE TABLE big_table AS (SELECT GENERATE_SERIES(1,100) id, 'hello ' || GENERATE_SERIES(1,100) AS name);
SELECT 100
 
-- small_table has 3 sets of 10 rows with the same data.
postgres=# CREATE TABLE small_table AS (SELECT (GENERATE_SERIES(0,29) % 10) + 1 id, 'hello ' || ((GENERATE_SERIES(0,29) % 10) + 1) AS name);
SELECT 30

The duplicate tuples will help us later when we use aggregate functions in a select statement.

Costs

PostgreSQL uses a cost-based optimizer. What this really means is that there are usually multiple ways a query may be executed, and the server needs a way to choose a good option, if not the best one, from the many available. This requires comparing different execution strategies, so PostgreSQL assigns a cost to each competing strategy and chooses the one with the lowest cost.

Cost estimates are arbitrary values assigned to each step of query execution based on the resource load that step is expected to create. As per the PostgreSQL documentation, cost is an arbitrary unit representing the planner’s estimate of the work involved. The “seq_page_cost” GUC is used as the baseline cost value: it is an arbitrary estimate of the effort of fetching a disk page as part of a series of sequential fetches, and the costs of other operations are defined relative to it. See the “Planner Cost Constants” section of “postgresql.conf” for the other cost parameters.
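
Before moving on, it is worth knowing that you can inspect the current values of these parameters from psql itself; the queries below against “pg_settings” are one simple way to do so (the values shown will be whatever your installation is configured with).

-- Baseline cost of a sequential page fetch (1.0 by default).
postgres=# SHOW seq_page_cost;

-- List all planner cost constants and their current values.
postgres=# SELECT name, setting, short_desc FROM pg_settings WHERE name LIKE '%_cost';

Let’s take a closer look at the costs in the “EXPLAIN” plan given below.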

postgres=# EXPLAIN SELECT COUNT(*), bt.id, bt.name, st.id 
FROM big_table bt INNER JOIN small_table st
ON bt.id = st.id
WHERE bt.id < 8 AND st.id < 5
GROUP BY bt.id, bt.name, st.id
HAVING COUNT(*) > 1
ORDER BY bt.id DESC, bt.name;
                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 GroupAggregate  (cost=3.79..3.82 rows=1 width=24)
   Group Key: bt.id, bt.name, st.id
   Filter: (count(*) > 1)
   ->  Sort  (cost=3.79..3.80 rows=1 width=16)
         Sort Key: bt.id DESC, bt.name
         ->  Hash Join  (cost=1.50..3.78 rows=1 width=16)
               Hash Cond: (bt.id = st.id)
               ->  Seq Scan on big_table bt  (cost=0.00..2.25 rows=6 width=12)
                     Filter: (id < 8)
               ->  Hash  (cost=1.38..1.38 rows=10 width=4)
                     ->  Seq Scan on small_table st  (cost=0.00..1.38 rows=10 width=4)
                           Filter: (id < 5)
(12 rows)

There are some critical points to note here with regard to costs:

  • First and foremost, cost is a double-precision value, which gives the planner the flexibility to represent very small or very large estimates. 
  • Cost is reported as a range of two values.

The first of these values is the estimated startup cost: the effort expended before the output phase can begin, including any effort expended by child nodes. Have a look at the startup cost of the Sort node (3.79) and compare it with the total cost of its child, the Hash Join node (3.78). The parent starts only after the child has completed. This indicates a consolidation of data; i.e. the Sort node is waiting to collect all the rows before it starts working. Any node with a 0.00 startup cost can start pretty much immediately.

The second value is the estimated total cost, i.e. the cost once the node has finished emitting all of its rows to its parent. For the Sort node, that means only a small additional cost (3.80 against a startup cost of 3.79) is incurred to hand the sorted rows up to the GroupAggregate node above it.
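
To see this waiting behaviour in isolation, try an “EXPLAIN” on a plain “ORDER BY” (the exact numbers depend on your installation, so the output is not reproduced here): the Sort node’s startup cost comes out almost as large as its total cost, because it must read and sort every row before it can emit the first one.

-- The Sort node cannot return its first row until all rows are sorted,
-- so its startup cost is nearly equal to its total cost.
postgres=# EXPLAIN SELECT * FROM big_table ORDER BY name;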

In some cases, the parent node can start receiving data immediately from the child node. Have a look at the following output.

postgres=# EXPLAIN SELECT * FROM big_table, small_table;
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Nested Loop  (cost=0.00..40.88 rows=3000 width=48)
   ->  Seq Scan on big_table  (cost=0.00..2.00 rows=100 width=12)
   ->  Materialize  (cost=0.00..1.45 rows=30 width=36)
         ->  Seq Scan on small_table  (cost=0.00..1.30 rows=30 width=36)
(4 rows)

This is a cross join between big_table and small_table. You can clearly see that the “Materialize” node starts receiving data as soon as the sequential scan on small_table begins, and the “Nested Loop” also starts at the same time.

Cost is really a relative measure of where “potentially” the resources are spent. It gives us a good indication of areas that need improvement.

Optimizing Costs

For optimum performance, you may want to calibrate the costs set in your “postgresql.conf” file according to your hardware. For example, “random_page_cost” defaults to 4.0, i.e. a random page fetch is estimated to be four times as expensive as a sequential one. That is because the kernel is expected to do read-ahead optimization, thereby reducing the cost of sequential fetches. The random page cost should be lower for an SSD than for an HDD. You can also set custom costs for a tablespace that resides on a different storage device; see PostgreSQL’s “CREATE TABLESPACE” documentation for more details.
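
As a rough sketch of what such calibration might look like (the numbers below are illustrative only, not recommendations, and “fast_ssd” is a hypothetical tablespace name):

-- Lower the random page cost cluster-wide, e.g. for SSD-backed storage;
-- the new value takes effect once the configuration is reloaded.
postgres=# ALTER SYSTEM SET random_page_cost = 1.1;
postgres=# SELECT pg_reload_conf();

-- Or override the cost parameters only for a tablespace on faster media.
postgres=# ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1, seq_page_cost = 1.0);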

Explain Analyze

Before we discuss “EXPLAIN ANALYZE”, you MUST remember that it runs the actual query; “EXPLAIN” without “ANALYZE” does not! So take special care when running “EXPLAIN ANALYZE”, especially on DML queries. If you wish to run “EXPLAIN ANALYZE” on an “INSERT”, “UPDATE”, “DELETE”, or “CREATE TABLE AS” statement, wrap it in a transaction block so the changes can be rolled back.
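
A minimal sketch of that pattern, using our big_table (the “UPDATE” below is only an example statement):

postgres=# BEGIN;
-- The UPDATE really executes while being analyzed...
postgres=# EXPLAIN ANALYZE UPDATE big_table SET name = name || '!' WHERE id < 8;
-- ...so roll it back to leave the data untouched.
postgres=# ROLLBACK;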

Re-running the above query with “ANALYZE”, we get:

postgres=# EXPLAIN ANALYZE SELECT COUNT(*), bt.id, bt.name, st.id 
FROM big_table bt INNER JOIN small_table st
ON bt.id = st.id
WHERE bt.id < 8 AND st.id < 5
GROUP BY bt.id, bt.name, st.id
HAVING COUNT(*) > 1
ORDER BY bt.id DESC, bt.name;
                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=3.79..3.82 rows=1 width=24) (actual time=2.499..2.829 rows=4 loops=1)
   Group Key: bt.id, bt.name, st.id
   Filter: (count(*) > 1)
   ->  Sort  (cost=3.79..3.80 rows=1 width=16) (actual time=2.412..2.550 rows=12 loops=1)
         Sort Key: bt.id DESC, bt.name
         Sort Method: quicksort  Memory: 25kB
         ->  Hash Join  (cost=1.50..3.78 rows=1 width=16) (actual time=1.683..1.990 rows=12 loops=1)
               Hash Cond: (bt.id = st.id)
               ->  Seq Scan on big_table bt  (cost=0.00..2.25 rows=6 width=12) (actual time=0.646..0.742 rows=7 loops=1)
                     Filter: (id < 8)
                     Rows Removed by Filter: 93
               ->  Hash  (cost=1.38..1.38 rows=10 width=4) (actual time=0.958..0.969 rows=12 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 9kB
                     ->  Seq Scan on small_table st  (cost=0.00..1.38 rows=10 width=4) (actual time=0.599..0.749 rows=12 loops=1)
                           Filter: (id < 5)
                           Rows Removed by Filter: 18
 Planning Time: 2.358 ms
 Execution Time: 3.093 ms
(18 rows)

In addition to cost, “EXPLAIN ANALYZE” gives us actual time, rows, and loops information, since it has executed the query. Comparing the costs and timings of the two sequential scan nodes, we can see that the scan on big_table with a cost of 2.25 takes up to 0.742 ms, whereas the scan on small_table with a cost of 1.38 takes up to 0.749 ms; a lower estimated cost does not necessarily mean a node finishes faster.

While the value for “rows” is self-explanatory, “loops” might not be. And no, we are not talking about cursors here. In some cases, a query might need to execute a node multiple times. Let’s re-run the cross join statement with “ANALYZE”, as it needs to scan one of the two tables again and again.

postgres=# EXPLAIN ANALYZE SELECT * FROM big_table, small_table;
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..40.88 rows=3000 width=48) (actual time=12.700..121.419 rows=3000 loops=1)
   ->  Seq Scan on big_table  (cost=0.00..2.00 rows=100 width=12) (actual time=10.894..12.014 rows=100 loops=1)
   ->  Materialize  (cost=0.00..1.45 rows=30 width=36) (actual time=0.017..0.376 rows=30 loops=100)
         ->  Seq Scan on small_table  (cost=0.00..1.30 rows=30 width=36) (actual time=0.582..0.976 rows=30 loops=1)
 Planning Time: 2.109 ms
 Execution Time: 156.926 ms
(6 rows)

The Materialize node runs 100 times, which is exactly the number of rows in big_table.

Comparing Plans

The costs help us better understand the different choices available to us when writing an SQL query. Let’s take a simple example of choosing between a subquery and a join. Which one is better in our particular case? The following queries return the exact same data.

postgres=# SELECT * FROM big_table WHERE id IN (SELECT DISTINCT(id) FROM small_table) ORDER BY id, name;
 
postgres=# SELECT DISTINCT(big_table.*) FROM big_table INNER JOIN small_table ON big_table.id = small_table.id ORDER BY id, name;
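
To compare the two, we can simply prefix each statement with “EXPLAIN ANALYZE” and look at the reported “Execution Time” (the exact timings will vary from machine to machine and run to run):

postgres=# EXPLAIN ANALYZE SELECT * FROM big_table
WHERE id IN (SELECT DISTINCT(id) FROM small_table) ORDER BY id, name;

postgres=# EXPLAIN ANALYZE SELECT DISTINCT(big_table.*) FROM big_table
INNER JOIN small_table ON big_table.id = small_table.id ORDER BY id, name;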

The first query takes “4.537 ms” and the second one takes “3.636 ms”. Reviewing the plans for both shows where each query spends its time; in this particular case, the join variant comes out slightly ahead of the subquery.

Conclusion

Understanding where a query spends most of its time is critical to optimizing it. Query cost may translate into actual dollar cost, so keeping query costs and runtimes low can reduce the overall impact on both your hardware and your bank account.

This completes the part on EXPLAIN in my series of blogs on optimizing SQL. In the next parts, we are going to learn about other factors that impact query planning, optimization, and execution. Stay tuned.