A hierarchical query is built upon parent-child relationship, the relationship exist in the same table or view. The relationship dictates that each child can have one parent while a parent can have many children. Hierarchical query is a SQL query that handles data of hierarchical model i.e. an organisation structure where every employee has one manager and one manager who is also an employee can have many employees in his reporting, another example is a family tree where one person can only have one parent while a person can have many children. There are many examples where the data contains parent-child relationship i.e. hierarchical model and traversing through a self-reference table using the standard SQL constructs can be a challenging and difficult task. The simple queries of hierarchical data in the same table or view is possible using the table join etc but as the relationship becomes deeper, it is difficult to traverse the data to get expected results.
All major database engines including PostgreSQL, Oracle, DB2 and MS SQL provide support for querying such kind of data. The hierarchical query constructs were one of the first features that were added to Oracle more than 20 years ago. PostgreSQL has introduced the WITH Queries clause (Common table expression) to handle such type of data, using WITH Queries along with the optional recursive clause, it is possible to refer to its own output in a single query. The CTE can also be thought of as a defining temporary table that exist just for one query.
In this blog we will be using an organisation structure in order to demonstrate the usage of hierarchical queries. The blog will show the usage of Hierarchical queries feature in Oracle and then demonstrate how we can use the CTE with Recursive keyword to achieve similar results in PostgreSQL. We will also discuss the ongoing work in the PostgreSQL community for improving the recursive self-referencing queries. We will be using the well known (EMP) table (as shown below) that contains a hierarchical type of data, the root of the relationship is “KING”. King is the president in this employee data, below the President each manager has its own department and each department contains one or more employees reporting to the manager. Using the hierarchical query feature, the user can traverse through the hierarchy in the EMP table, it can traverse from bottom up in order to retrieve the top hierarchy of an employee or top down to find out all the employee under one manager. In this blog, we will use the hierarchical query feature in Oracle and Recursive query of PostgreSQL for performing bottom-up search and top-down search of EMP data.
So this is how the EMP table looks like :
postgres=# select * from emp; empno | ename | job | mgr | hiredate | sal | comm | deptno -------+--------+-----------+------+------------+---------+---------+-------- 7369 | SMITH | CLERK | 7902 | 1980-12-17 | 800.00 | | 20 7499 | ALLEN | SALESMAN | 7698 | 1981-02-20 | 1600.00 | 300.00 | 30 7521 | WARD | SALESMAN | 7698 | 1981-02-22 | 1250.00 | 500.00 | 30 7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975.00 | | 20 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1250.00 | 1400.00 | 30 7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850.00 | | 30 7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450.00 | | 10 7788 | SCOTT | ANALYST | 7566 | 1987-04-19 | 3000.00 | | 20 7839 | KING | PRESIDENT | | 1981-11-17 | 5000.00 | | 10 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1500.00 | 0.00 | 30 7876 | ADAMS | CLERK | 7788 | 1987-05-23 | 1100.00 | | 20 7900 | JAMES | CLERK | 7698 | 1981-12-03 | 950.00 | | 30 7902 | FORD | ANALYST | 7566 | 1981-12-03 | 3000.00 | | 20 7934 | MILLER | CLERK | 7782 | 1982-01-23 | 1300.00 | | 10 (14 rows)
CONNECT BY, PRIOR and START WITH in Oracle
In Oracle, the hierarchical query is defined using the two mandatory keywords i.e. CONNECT BY and START WITH. The hierarchy is built when the CONNECT BY defines the relationship between parent and child, the PRIOR keyword used with CONNECT by specifies the parent. The START_WITH clause identify the record that the user wants to start with. There are other pseudo columns and clauses like LEVEL, connect_by_root and ORDER SIBLINGS BY that we will discuss in the next series of this blog, in this article we will focus on the mandatory clauses that are used by Oracle to build the hierarchical tree and how we can achieve the same results in PostgreSQL using the CTE Recursive queries.
Here is a simple hierarchical query that uses the connect by to define the hierarchy and start with where manager is null. As mentioned above King is the President and King and no value for mgr, using the query below we are traversing the tree top down.
SQL> select ename,job,mgr from emp 2 connect by prior empno = mgr 3 start with mgr is null;
ENAME JOB MGR ---------- --------- ---------- KING PRESIDENT JONES MANAGER 7839 SCOTT ANALYST 7566 ADAMS CLERK 7788 FORD ANALYST 7566 SMITH CLERK 7902 BLAKE MANAGER 7839 ALLEN SALESMAN 7698 WARD SALESMAN 7698 MARTIN SALESMAN 7698 TURNER SALESMAN 7698 ENAME JOB MGR ---------- --------- ---------- JAMES CLERK 7698 CLARK MANAGER 7839 MILLER CLERK 7782 14 rows selected.
If we are only interested to see all the employees reporting to BLAKE, we can execute the following query :
SQL> 1 select empno,ename,job,mgr from emp 2 connect by prior empno = mgr 3 start with ename = 'BLAKE' EMPNO ENAME JOB MGR ---------- ---------- --------- ---------- 7698 BLAKE MANAGER 7839 7499 ALLEN SALESMAN 7698 7521 WARD SALESMAN 7698 7654 MARTIN SALESMAN 7698 7844 TURNER SALESMAN 7698 7900 JAMES CLERK 7698 6 rows selected.
The above queries show a top down search where the root is the top most employee in the hierarchy (based on start with clause) and then displaying the employees in the reporting. We can also perform a bottom up search by starting with employee that is at the bottom of the hierarchy and then displaying the tree upward.
This is done by moving the PRIOR keyword to parent side of the relationship. As shown in the query below, we have moved the PRIOR to mgr and started with ADAMS which is CLERK, the query results build the tree with ADAMS as the root and all the way upto the KING which is the highest level in the employee hierarchy.
1 select empno,ename,job,mgr from emp 2 connect by empno = prior mgr 3 start with empno = 7876 EMPNO ENAME JOB MGR ---------- ---------- --------- ---------- 7876 ADAMS CLERK 7788 7788 SCOTT ANALYST 7566 7566 JONES MANAGER 7839 7839 KING PRESIDENT
Hierarchical Queries in PostgreSQL
Hierarchical queries can be achieved using CTE recursive queries, PostgreSQL provides the WITH statement that allows to create temporary tables or auxiliary statements that are for use in the same query. The temporary tables created in CTE query exist only during the execution of the query. Recursive is an optional keyword used with the CTE query, the optional recursive changes enables the WITH query to refer to its output. The recursive queries are typically used to deal with hierarchical or tree structure data. The general form of recursive WITH query has the following structure :
WITH Recursive cte_query AS ( CTE_Query Definition; UNION ALL CTE_Query Definition; ) SELECT * FROM cte_query;
The first part of CTE_Query Definition is non-recursive term, the second part is recursive term, the recursive term can contain a reference to the Query’s own output.The last part of the CTE query is the termination block.
Let’s start with writing the simple query to achieve the same result as first query used in the above section and using this example understand how the recursive CTE works.
WITH RECURSIVE cte_query AS ( select empno, ename, mgr from emp e where mgr is null UNION ALL select e.empno, e.ename, e.mgr From emp e INNER JOIN cte_query c ON c.empno = e.mgr ) Select * from cte_query;
So the first part of the cte_query definition is the non-recursive part which generates the base result. In this case, it returns the root of the hierarchy (top-down) that is KING President with no manager.
select empno, ename, mgr from emp e where mgr is null empno | ename | mgr -------+-------+----- 7839 | KING | (1 row)
The second part of the the cte_query is the recursive part which generate the output by joining the emp table with the output of non-recursive query.
select e.empno, e.ename, e.mgr From emp e INNER JOIN cte_query c ON c.empno = e.mgr
Since this is traversing the tree from top-down, the second recursive part of the query will bring back all the direct report of King in the first iteration and in the remaining iterations it will traverse the hierarchy and continue to join the emp table with cte_query.
WITH RECURSIVE cte_query AS ( select empno,ename,job,mgr from emp where mgr is null UNION ALL select e.empno, e.ename,e.job, e.mgr From emp e JOIN cte_query c ON c.empno = e.mgr ) Select * from cte_query; empno | ename | job | mgr -------+--------+-----------+------ 7839 | KING | PRESIDENT | 7566 | JONES | MANAGER | 7839 7698 | BLAKE | MANAGER | 7839 7782 | CLARK | MANAGER | 7839 7499 | ALLEN | SALESMAN | 7698 7521 | WARD | SALESMAN | 7698 7654 | MARTIN | SALESMAN | 7698 7788 | SCOTT | ANALYST | 7566 7844 | TURNER | SALESMAN | 7698 7900 | JAMES | CLERK | 7698 7902 | FORD | ANALYST | 7566 7934 | MILLER | CLERK | 7782 7369 | SMITH | CLERK | 7902 7876 | ADAMS | CLERK | 7788 (14 rows)
Now lets try about CTE recursive query to get all the subordinates of Manager BLAKE :
WITH RECURSIVE cte_query AS ( select empno,ename,job,mgr from emp where ename = 'BLAKE' UNION ALL select e.empno, e.ename,e.job, e.mgr From emp e JOIN cte_query c ON c.empno = e.mgr ) Select * from cte_query; empno | ename | job | mgr -------+--------+----------+------ 7698 | BLAKE | MANAGER | 7839 7499 | ALLEN | SALESMAN | 7698 7521 | WARD | SALESMAN | 7698 7654 | MARTIN | SALESMAN | 7698 7844 | TURNER | SALESMAN | 7698 7900 | JAMES | CLERK | 7698 (6 rows)
Again the non-recursive part brings back the starting point of the tree i.e. Manager ‘BLAKE’ and the recursive part brings back all the sub-ordinates of BLAKE.
Now lets try a bottom-up search using CTE recursive query by getting all the member of the hierarchy starting from the bottom.
WITH RECURSIVE cte_query AS ( select empno,ename,job,mgr from emp where empno=7876 UNION ALL select e.empno, e.ename,e.job, e.mgr From emp e INNER JOIN cte_query c ON c.mgr = e.empno ) Select * from cte_query; empno | ename | job | mgr -------+-------+-----------+------ 7876 | ADAMS | CLERK | 7788 7788 | SCOTT | ANALYST | 7566 7566 | JONES | MANAGER | 7839 7839 | KING | PRESIDENT | (4 rows)
The non-recursive part of the query returns ADAMS CLERK who is at the bottom of the hierarchy. The recursive part of the query produce the rest of the tree by joining the cte_query with emp table.
In this blog I have provided an introduction to hierarchical queries in Oracle and demonstrated how we can run hierarchical queries in PostgreSQL using the CTE recursive queries. I have provided example of both Oracle and PostgreSQL to explain how hierarchical queries are built. This is the first blog of this series and it uses simple examples to demonstrate this really important feature. Next I will explain the other pseudo columns and clauses that are supported by Oracle and how we can run those in PostgreSQL.
The PostgreSQL community is doing some work in improving the recursive queries, please refer to the following hacker threads to see the ongoing development in the community for recursive queries.
Hackers email subject “Bug with subqueries in recursive CTEs?”
Hackers email subject “[PATCH] Allow multiple recursive self-references”
Ahsan Hadi is a VP of Development with HighGo Software Inc. Prior to coming to HighGo Software, Ahsan had worked at EnterpriseDB as a Senior Director of Product Development, Ahsan worked with EnterpriseDB for 15 years. The flagship product of EnterpriseDB is Postgres Plus Advanced server which is based on Open source PostgreSQL. Ahsan has vast experience with Postgres and has lead the development team at EnterpriseDB for building the core compatibility of adding Oracle compatible layer to EDB’s Postgres Plus Advanced Server. Ahsan has also spent number of years working with development team for adding Horizontal scalability and sharding to Postgres. Initially, he worked with postgres-xc which is multi-master sharded cluster and later worked on managing the development of adding horizontal scalability/sharding to Postgres. Ahsan has also worked a great deal with Postgres foreign data wrapper technology and worked on developing and maintaining FDW’s for several sql and nosql databases like MongoDB, Hadoop and MySQL.
Prior to EnterpriseDB, Ahsan worked for Fusion Technologies as a Senior Project Manager. Fusion Tech was a US based consultancy company, Ahsan lead the team that developed java based job factory responsible for placing items on shelfs at big stores like Walmart. Prior to Fusion technologies, Ahsan worked at British Telecom as a Analyst/Programmer and developed web based database application for network fault monitoring.
Ahsan joined HighGo Software Inc (Canada) in April 2019 and is leading the development teams based in multiple Geo’s, the primary responsibility is community based Postgres development and also developing HighGo Postgres server.