How to run Hierarchical Queries with PostgreSQL

Enterprise PostgreSQL Solutions

1 comment

How to run Hierarchical Queries with PostgreSQL

A hierarchical query is built upon parent-child relationship, the relationship exist in the same table or view. The relationship dictates that each child can have one parent while a parent can have many children. Hierarchical query is a SQL query that handles data of hierarchical model i.e. an organisation structure where every employee has one manager and one manager who is also an employee can have many employees in his reporting, another example is a family tree where one person can only have one parent while a person can have many children. There are many examples where the data contains parent-child relationship i.e. hierarchical model and traversing through a self-reference table using the standard SQL constructs can be a challenging and difficult task. The simple queries of hierarchical data in the same table or view is possible using the table join etc but as the relationship becomes deeper, it is difficult to traverse the data to get expected results.

All major database engines including PostgreSQL, Oracle, DB2 and MS SQL provide support for querying such kind of data. The hierarchical query constructs were one of the first features that were added to Oracle more than 20 years ago. PostgreSQL has introduced the WITH Queries clause (Common table expression) to handle such type of data, using WITH Queries along with the optional recursive clause, it is possible to refer to its own output in a single query. The CTE can also be thought of as a defining temporary table that exist just for one query.

In this blog we will be using an organisation structure in order to demonstrate the usage of hierarchical queries. The blog will show the usage of Hierarchical queries feature in Oracle and then demonstrate how we can use the CTE with Recursive keyword to achieve similar results in PostgreSQL. We will also discuss the ongoing work in the PostgreSQL community for improving the recursive self-referencing queries. We will be using the well known (EMP) table (as shown below) that contains a hierarchical type of data, the root of the relationship is “KING”. King is the president in this employee data, below the President each manager has its own department and each department contains one or more employees reporting to the manager. Using the hierarchical query feature, the user can traverse through the hierarchy in the EMP table, it can traverse from bottom up in order to retrieve the top hierarchy of an employee or top down to find out all the employee under one manager. In this blog, we will use the hierarchical query feature in Oracle and Recursive query of PostgreSQL for performing bottom-up search and top-down search of EMP data.

So this is how the EMP table looks like :

postgres=# select * from emp;
 empno | ename  |    job    | mgr  |  hiredate  |   sal   |  comm   | deptno 
-------+--------+-----------+------+------------+---------+---------+--------
  7369 | SMITH  | CLERK     | 7902 | 1980-12-17 |  800.00 |         |     20
  7499 | ALLEN  | SALESMAN  | 7698 | 1981-02-20 | 1600.00 |  300.00 |     30
  7521 | WARD   | SALESMAN  | 7698 | 1981-02-22 | 1250.00 |  500.00 |     30
  7566 | JONES  | MANAGER   | 7839 | 1981-04-02 | 2975.00 |         |     20
  7654 | MARTIN | SALESMAN  | 7698 | 1981-09-28 | 1250.00 | 1400.00 |     30
  7698 | BLAKE  | MANAGER   | 7839 | 1981-05-01 | 2850.00 |         |     30
  7782 | CLARK  | MANAGER   | 7839 | 1981-06-09 | 2450.00 |         |     10
  7788 | SCOTT  | ANALYST   | 7566 | 1987-04-19 | 3000.00 |         |     20
  7839 | KING   | PRESIDENT |      | 1981-11-17 | 5000.00 |         |     10
  7844 | TURNER | SALESMAN  | 7698 | 1981-09-08 | 1500.00 |    0.00 |     30
  7876 | ADAMS  | CLERK     | 7788 | 1987-05-23 | 1100.00 |         |     20
  7900 | JAMES  | CLERK     | 7698 | 1981-12-03 |  950.00 |         |     30
  7902 | FORD   | ANALYST   | 7566 | 1981-12-03 | 3000.00 |         |     20
  7934 | MILLER | CLERK     | 7782 | 1982-01-23 | 1300.00 |         |     10
(14 rows)

CONNECT BY, PRIOR and START WITH in Oracle

In Oracle, the hierarchical query is defined using the two mandatory keywords i.e. CONNECT BY and START WITH. The hierarchy is built when the CONNECT BY defines the relationship between parent and child, the PRIOR keyword used with CONNECT by specifies the parent. The START_WITH clause identify the record that the user wants to start with. There are other pseudo columns and clauses like LEVEL, connect_by_root and ORDER SIBLINGS BY that we will discuss in the next series of this blog, in this article we will focus on the mandatory clauses that are used by Oracle to build the hierarchical tree and how we can achieve the same results in PostgreSQL using the CTE Recursive queries.

Here is a simple hierarchical query that uses the connect by to define the hierarchy and start with where manager is null. As mentioned above King is the President and King and no value for mgr, using the query below we are traversing the tree top down.

SQL> select ename,job,mgr from emp 
  2  connect by prior empno = mgr
  3  start with mgr is null;
ENAME	   JOB		    MGR
---------- --------- ----------
KING	   PRESIDENT
JONES	   MANAGER	   7839
SCOTT	   ANALYST	   7566
ADAMS	   CLERK	   7788
FORD	   ANALYST	   7566
SMITH	   CLERK	   7902
BLAKE	   MANAGER	   7839
ALLEN	   SALESMAN	   7698
WARD	   SALESMAN	   7698
MARTIN	   SALESMAN	   7698
TURNER	   SALESMAN	   7698

ENAME	   JOB		    MGR
---------- --------- ----------
JAMES	   CLERK	   7698
CLARK	   MANAGER	   7839
MILLER	   CLERK	   7782

14 rows selected.

If we are only interested to see all the employees reporting to BLAKE, we can execute the following query :

SQL> 
  1 select empno,ename,job,mgr from emp
  2 connect by prior empno = mgr
  3 start with ename = 'BLAKE'

     EMPNO ENAME      JOB	       MGR
---------- ---------- --------- ----------
      7698 BLAKE      MANAGER	      7839
      7499 ALLEN      SALESMAN	      7698
      7521 WARD       SALESMAN	      7698
      7654 MARTIN     SALESMAN	      7698
      7844 TURNER     SALESMAN	      7698
      7900 JAMES      CLERK	      7698

6 rows selected.

The above queries show a top down search where the root is the top most employee in the hierarchy (based on start with clause) and then displaying the employees in the reporting. We can also perform a bottom up search by starting with employee that is at the bottom of the hierarchy and then displaying the tree upward.

This is done by moving the PRIOR keyword to parent side of the relationship. As shown in the query below, we have moved the PRIOR to mgr and started with ADAMS which is CLERK, the query results build the tree with ADAMS as the root and all the way upto the KING which is the highest level in the employee hierarchy.      

1 select empno,ename,job,mgr from emp
2 connect by empno = prior mgr
3 start with empno = 7876

     EMPNO ENAME      JOB	       MGR
---------- ---------- --------- ----------
      7876 ADAMS      CLERK	      7788
      7788 SCOTT      ANALYST	      7566
      7566 JONES      MANAGER	      7839
      7839 KING       PRESIDENT 

Hierarchical Queries in PostgreSQL

Hierarchical queries can be achieved using CTE recursive queries, PostgreSQL provides the WITH  statement that allows to create temporary tables or auxiliary statements that are for use in the same query. The temporary tables created in CTE query exist only during the execution of the query.  Recursive is an optional keyword used with the CTE query, the optional recursive changes enables the WITH query to refer to its output. The recursive queries are typically used to deal with hierarchical or tree structure data. The general form of recursive WITH query has the following structure :

WITH Recursive cte_query AS 
(
	CTE_Query Definition;
UNION ALL
	CTE_Query Definition;
)
SELECT * FROM cte_query;

The first part of CTE_Query Definition is non-recursive term, the second part is recursive term, the recursive term can contain a reference to the Query’s own output.The last part of the CTE query is the termination block. 

Let’s start with writing the simple query to achieve the same result as first query used in the above section and using this example understand how the recursive CTE works. 

WITH RECURSIVE cte_query 
AS 
(
select empno, ename, mgr
from emp e
where mgr is null
UNION ALL
select e.empno, e.ename, e.mgr
From emp e
INNER JOIN cte_query c ON c.empno = e.mgr
)
Select * from cte_query;

So the first part of the cte_query definition is the non-recursive part which generates the base result. In this case, it returns the root of the hierarchy (top-down) that is KING President with no manager.

select empno, ename, mgr
from emp e
where mgr is null


empno | ename | mgr 
-------+-------+-----
  7839 | KING  |    
(1 row)

The second part of the the cte_query is the recursive part which generate the output by joining the emp table with the output of non-recursive query.  

select e.empno, e.ename, e.mgr
From emp e
INNER JOIN cte_query c ON c.empno = e.mgr

Since this is traversing the tree from top-down, the second recursive part of the query will bring back all the direct report of King in the first iteration and in the remaining iterations it will traverse the hierarchy and continue to join the emp table with cte_query.

WITH RECURSIVE cte_query                                                                                                                AS                                                                                                                                                 (                                                                                                                                                  select empno,ename,job,mgr                                                                                                                         from emp                                                                                                                                           where mgr is null                                                                                                                                  UNION ALL                                                                                                                                          select e.empno, e.ename,e.job, e.mgr                                                                                                               From emp e                                                                                                                                         JOIN cte_query c ON c.empno = e.mgr                                                                                                                )                                                                                                                                                  Select * from cte_query;

 empno | ename  |    job    | mgr  
-------+--------+-----------+------
  7839 | KING   | PRESIDENT |     
  7566 | JONES  | MANAGER   | 7839
  7698 | BLAKE  | MANAGER   | 7839
  7782 | CLARK  | MANAGER   | 7839
  7499 | ALLEN  | SALESMAN  | 7698
  7521 | WARD   | SALESMAN  | 7698
  7654 | MARTIN | SALESMAN  | 7698
  7788 | SCOTT  | ANALYST   | 7566
  7844 | TURNER | SALESMAN  | 7698
  7900 | JAMES  | CLERK     | 7698
  7902 | FORD   | ANALYST   | 7566
  7934 | MILLER | CLERK     | 7782
  7369 | SMITH  | CLERK     | 7902
  7876 | ADAMS  | CLERK     | 7788
(14 rows)

Now lets try about CTE recursive query to get all the subordinates of Manager BLAKE : 

WITH RECURSIVE cte_query                                                                                                                AS                                                                                                                                                 (                                                                                                                                                  select empno,ename,job,mgr                                                                                                                         from emp                                                                                                                                           where ename = 'BLAKE'                                                                                                                              UNION ALL                                                                                                                                          select e.empno, e.ename,e.job, e.mgr                                                                                                               From emp e                                                                                                                                         JOIN cte_query c ON c.empno = e.mgr                                                                                                                )                                                                                                                                                  Select * from cte_query;
 empno | ename  |   job    | mgr  
-------+--------+----------+------
  7698 | BLAKE  | MANAGER  | 7839
  7499 | ALLEN  | SALESMAN | 7698
  7521 | WARD   | SALESMAN | 7698
  7654 | MARTIN | SALESMAN | 7698
  7844 | TURNER | SALESMAN | 7698
  7900 | JAMES  | CLERK    | 7698
(6 rows)

Again the non-recursive part brings back the starting point of the tree i.e. Manager ‘BLAKE’ and the recursive part brings back all the sub-ordinates of BLAKE.

Now lets try a bottom-up search using CTE recursive query by getting all the member of the hierarchy starting from the bottom.

WITH RECURSIVE cte_query                                                                                                                AS                                                                                                                                                 (                                                                                                                                                  select empno,ename,job,mgr                                                                                                                         from emp                                                                                                                                           where empno=7876                                                                                                                                   UNION ALL                                                                                                                                          select e.empno, e.ename,e.job, e.mgr                                                                                                               From emp e                                                                                                                                         INNER JOIN cte_query c ON c.mgr = e.empno                                                                                                          )                                                                                                                                                  Select * from cte_query;
 empno | ename |    job    | mgr  
-------+-------+-----------+------
  7876 | ADAMS | CLERK     | 7788
  7788 | SCOTT | ANALYST   | 7566
  7566 | JONES | MANAGER   | 7839
  7839 | KING  | PRESIDENT |     
(4 rows)

The non-recursive part of the query returns ADAMS CLERK who is at the bottom of the hierarchy. The recursive part of the query produce the rest of the tree by joining the cte_query with emp table.  

Conclusion

In this blog I have provided an introduction to hierarchical queries in Oracle and demonstrated how we can run hierarchical queries in PostgreSQL using the CTE recursive queries. I have provided example of both Oracle and PostgreSQL to explain how hierarchical queries are built. This is the first blog of this series and it uses simple examples to demonstrate this really important feature. Next I will explain the other pseudo columns and clauses that are supported by Oracle and how we can run those in PostgreSQL.

The PostgreSQL community is doing some work in improving the recursive queries, please refer to the following hacker threads to see the ongoing development in the community for recursive queries. 

Hackers email subject “Bug with subqueries in recursive CTEs?”

Hackers email subject “[PATCH] Allow multiple recursive self-references”

One Response

  1. […] post How to run Hierarchical Queries with PostgreSQL appeared first on Highgo Software […]

Leave a Reply

Your email address will not be published. Required fields are marked *