Understanding the Execution Plan of a Hash Join

Enterprise PostgreSQL Solutions

Comments are off

Understanding the Execution Plan of a Hash Join

A hash join is one of the most common join methods used by PostgreSQL and other relational databases. It works by building a hash table from the smaller input (called the build side) and then probing it with rows from the larger input (the probe side) to find matching join keys.
Hash joins are especially efficient for large, unsorted datasets—particularly when there are no useful indexes on the join columns.

This post uses a concrete example to explain how a hash join works. The example is run on PostgreSQL 18.

Preparing the Data

-- Create tables
CREATE TABLE regions (
    id SERIAL PRIMARY KEY,
    region_name TEXT
);

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name TEXT,
    region_id INT
);

-- Insert sample data
INSERT INTO regions (region_name)
SELECT 'region_' || g FROM generate_series(1, 10) g;

INSERT INTO customers (name)
SELECT 'customer_' || g FROM generate_series(1, 100000) g;

-- Assign each customer a region
UPDATE customers
SET region_id = (random() * 9)::int + 1;

The Query

EXPLAIN
SELECT c.id, c.name, r.region_name
FROM regions r
LEFT JOIN customers c ON c.region_id = r.id;

Output:

Hash Right Join  (cost=38.58..2575.95 rows=100000 width=50)
  Hash Cond: (c.region_id = r.id)
  ->  Seq Scan on customers c  (cost=0.00..2274.00 rows=100000 width=22)
  ->  Hash  (cost=22.70..22.70 rows=1270 width=36)
        ->  Seq Scan on regions r  (cost=0.00..22.70 rows=1270 width=36)

At first glance, this seems weird: the SQL uses LEFT JOIN, but the plan shows RIGHT JOIN. Why does that happen? This post is going to explain the puzzle.

Understanding JOIN Types

Let’s assume simplified data:

customers

idnameregion_id
1Alice1
2Bob2
3CarolNULL
4Dave4

regions

idregion_name
1North
2South
3East

LEFT JOIN

SELECT c.id, c.name, r.region_name
FROM customers c
LEFT JOIN regions r ON c.region_id = r.id;

Result:

idnameregion_name
1AliceNorth
2BobSouth
3CarolNULL
4DaveNULL

With a LEFT JOIN, all rows from the left table (customers) are kept. If a row has no match in the right table, its right-side columns are filled with NULL. This behavior can be described as “fill left table.”

RIGHT JOIN

SELECT c.id, c.name, r.region_name
FROM customers c
RIGHT JOIN regions r ON c.region_id = r.id;

Result:

idnameregion_name
1AliceNorth
2BobSouth
NULLNULLEast

A RIGHT JOIN keeps all rows from the right table (regions). If no customer matches a region, the customer columns are filled with NULL. This is “fill right table.”

INNER JOIN

SELECT c.id, c.name, r.region_name
FROM customers c
INNER JOIN regions r ON c.region_id = r.id;

Result:

idnameregion_name
1AliceNorth
2BobSouth

Only matching rows are kept — no “fill” behavior.

FULL JOIN

SELECT c.id, c.name, r.region_name
FROM customers c
FULL JOIN regions r ON c.region_id = r.id;

Result:

idnameregion_name
1AliceNorth
2BobSouth
3CarolNULL
4DaveNULL
NULLNULLEast

FULL JOIN keeps all rows from both sides — it “fills both tables.”

Why LEFT JOIN Becomes RIGHT JOIN in the Plan

The following two statements are logically identical:

SELECT c.id, c.name, r.region_name
FROM regions r
LEFT JOIN customers c ON c.region_id = r.id;

SELECT c.id, c.name, r.region_name
FROM customers c
RIGHT JOIN regions r ON c.region_id = r.id;

The first LEFT JOIN statement “fill left table (regions)”; the second RIGHT JOIN statement “fill in right table (regions)” . So both produce the same result. The difference is only in which table is conceptually “left” or “right.” PostgreSQL’s planner may swap them for efficiency, depending on which table is smaller.

How Hash Join Works

When performing a hash join, PostgreSQL chooses the smaller table and loads all its rows into memory to build a hash table.
Then it scans the larger table one row at a time and probes the hash table for matches.

In planner terminology:

  • The table used to build the hash table is called the inner table.
  • The table scanned row by row is called the outer table.

Hash Join Process

Step 1: Build phase (inner table loaded into hash table)
-------------------------------------------------------
Inner Table (regions)        Hash Table
+----+-------------+        +----+-------------+
| id | region_name |  --->  | id | region_name |
+----+-------------+        +----+-------------+
| 1  | North       |        | 1  | North       |
| 2  | South       |        | 2  | South       |
| 3  | East        |        | 3  | East        |
+----+-------------+        +----+-------------+

Step 2: Probe phase (scan outer table)
--------------------------------------
Outer Table (customers)
+----+-------+-----------+
| id | name  | region_id |
+----+-------+-----------+
| 1  | Alice | 1         |
| 2  | Bob   | 2         |
| 3  | Carol | NULL      |
| 4  | Dave  | 4         |
+----+-------+-----------+

For each outer row:
- Look up region_id in the hash table.
- If a match is found, join the rows.
- If no match (NULL or missing id), fill inner columns with NULL.

Step 3: Result of LEFT JOIN
+----+-------+-----------+-------------+
| id | name  | region_id | region_name |
+----+-------+-----------+-------------+
| 1  | Alice | 1         | North       |
| 2  | Bob   | 2         | South       |
| 3  | Carol | NULL      | NULL        |
| 4  | Dave  | 4         | NULL        |
+----+-------+-----------+-------------+

Join Type Decision Table

In PostgreSQL’s plan, whether it shows LEFT JOIN or RIGHT JOIN depends on which table the planner chooses as the inner or outer side. The “fill” direction is decided by the SQL statement, while the hash build side is decided by table size.

Fill InnerFill OuterJoin Type
truefalseRight Join
falsetrueLeft Join
truetrueFull Join
falsefalseInner Join

So in our example, since regions is smaller, PostgreSQL builds the hash table from it (making it the inner table). The query requests a fill on regions (LEFT JOIN). Combining these conditions results in a Hash Right Join in the plan — even though the SQL used LEFT JOIN.

Summary

  • PostgreSQL’s planner can flip join sides for efficiency.
  • A “LEFT JOIN” in SQL may appear as a “RIGHT JOIN” in the plan if the smaller table is on the opposite side.
  • The join type name in the plan reflects which side is the hash build side, not necessarily the SQL syntax.