Hierarchical Query Enhancements in Oracle Database 10g
Some applications make extensive use of hierarchical data such as an organization chart, a bill of material in a manufacturing and assembly plant, or a family tree. These types of information are most conveniently represented in a tree structure. However, such data can be easily fit into a relational table by using a self-referential relationship, as in the following definition of the EMPLOYEE table:
CREATE TABLE EMPLOYEE ( EMP_ID NUMBER (4) CONSTRAINT EMP_PK PRIMARY KEY, EMP_NAME VARCHAR2 (15) NOT NULL, DEPT_ID NUMBER (2) NOT NULL, MGR_ID NUMBER (4) CONSTRAINT EMP_FK REFERENCES EMPLOYEE 9EMP_ID) SALARY NUMBER (7,2) NOT NULL, HIRE_DATE DATE NOT NULL);
Note the foreign key constraint EMP_FK on the column MGR_ID, which references the EMP_ID column of the same table (and hence the term “self-referential”). In this table, the MGR_ID represents the EMP_ID of an employee’s manager.
Oracle provides some useful extensions to ANSI SQL to manipulate hierarchical data represented in a relational table. Up to Oracle9i, Oracle’s hierarchical extensions include the START WITH … CONNECT BY clause, the PRIOR operator, and the LEVEL pseudo-column. The following example lists the employees in a hierarchical order and indents the subordinates under an employee:
SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || EMP_NAME "EMPLOYEE", EMP_ID, MGR_ID FROM EMPLOYEE START WITH MGR_ID IS NULL CONNECT BY PRIOR EMP_ID = MGR_ID; LEVEL EMPLOYEE EMP_ID MGR_ID ---------- -------------------- ---------- ---------- 1 KING 7839 2 JONES 7566 7839 3 SCOTT 7788 7566 4 ADAMS 7876 7788 3 FORD 7902 7566 4 SMITH 7369 7902 2 BLAKE 7698 7839 3 ALLEN 7499 7698 3 WARD 7521 7698 3 MARTIN 7654 7698 3 TURNER 7844 7698 3 JAMES 7900 7698 2 CLARK 7782 7839 3 MILLER 7934 7782
Using Oracle provided extensions, you can construct complex hierarchical operations on a tree-structured data. With Oracle Database 10g, new features have been added to Oracle’s support for hierarchical queries. In this article, we will discuss these new features in detail, with examples.
New Features
The new hierarchical query features in Oracle Database 10g are:
- New Operator
- CONNECT_BY_ROOT
- New Pseudocolumns
- CONNECT_BY_ISCYCLE
- CONNECT_BY_ISLEAF
- New Function
- SYS_CONNECT_BY_PATH (Oracle9i)
- New Keywords
- NOCYCLE
- SIBLINGS (Oracle9i)
We will discuss each of these in the following sections.
CONNECT_BY_ROOT
The CONNECT_BY_ROOT operator, when applied to a column, returns the value for that column for the root row. The following example illustrates how the CONNECT_BY_ROOT operator is used:
SELECT EMP_NAME, CONNECT_BY_ISLEAF FROM EMPLOYEE START WITH EMP_ID = 7839 CONNECT BY PRIOR EMP_ID = MGR_ID; Employee Top Manager -------------------- ------------ JONES JONES SCOTT JONES ADAMS JONES FORD JONES SMITH JONES BLAKE BLAKE ALLEN BLAKE WARD BLAKE MARTIN BLAKE TURNER BLAKE JAMES BLAKE CLARK CLARK MILLER CLARK
In this example, the organization tree is built by starting with the rows that have MGR_ID = 7839. This means that anyone whose manager is “7839” will be considered a root for this query. Now, all the employees who come under the organizations under these roots will be displayed in the result set of this query along with the name of their top-most manager in the tree. The CONNECT_BY_ROOT operator determines the top-most node in the tree for a given row.
NOCYCLE
Cycles are not allowed in a true tree structure. But some hierarchical data may contain cycles. In a hierarchical structure, if a descendant is also an ancestor, it is called a cycle. It is sometimes difficult to identify cycles in hierarchical data. The hierarchical construct “START WITH … CONNECT BY … PRIOR” will report an error if there is a cycle in the data.
To allow the “START WITH … CONNECT BY … PRIOR” construct to work properly even if cycles are present in the data, Oracle Database 10g provides a new keyword, NOCYCLE. If there are cycles in the data, you can use the NOCYCLE keyword in the CONNECT BY clause, and you will not get the error mentioned earlier.
The test data we have in the EMPLOYEE table doesn’t have a cycle. To test the NOCYCLE feature, let’s introduce a cycle into the existing EMPLOYEE data, by updating the MGR_ID column of the top-most employee (KING with EMP_ID=7839) with the EMP_ID of one of the lowest-level employees (MARTIN with EMP_ID = 7654).
UPDATE EMPLOYEE
SET MGR_ID = 7654
WHERE MGR_ID IS NULL;
Now, if you perform a hierarchical query, you will get an ORA-01436 error:
SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || EMP_NAME "EMPLOYEE", EMP_ID, MGR_ID FROM EMPLOYEE START WITH EMP_ID = 7839 CONNECT BY PRIOR EMP_ID = MGR_ID; LEVEL EMPLOYEE EMP_ID MGR_ID ---------- -------------------- ---------- ---------- 1 KING 7839 7654 2 JONES 7566 7839 3 SCOTT 7788 7566 4 ADAMS 7876 7788 3 FORD 7902 7566 4 SMITH 7369 7902 2 BLAKE 7698 7839 3 ALLEN 7499 7698 3 WARD 7521 7698 3 MARTIN 7654 7698 4 KING 7839 7654 5 JONES 7566 7839 6 SCOTT 7788 7566 7 ADAMS 7876 7788 6 FORD 7902 7566 ERROR: ORA-01436: CONNECT BY loop in user data
Besides the error, note that the whole tree starting with KING starts repeting under MARTIN. This is erroneous and confusing. The NOCYCLE keyword can be used in the CONNECT BY clause to get rid of this error:
SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || EMP_NAME "EMPLOYEE", EMP_ID, MGR_ID FROM EMPLOYEE START WITH EMP_ID = 7839 CONNECT BY NOCYCLE PRIOR EMP_ID = MGR_ID; LEVEL EMPLOYEE EMP_ID MGR_ID ---------- -------------------- ---------- ---------- 1 KING 7839 7654 2 JONES 7566 7839 3 SCOTT 7788 7566 4 ADAMS 7876 7788 3 FORD 7902 7566 4 SMITH 7369 7902 2 BLAKE 7698 7839 3 ALLEN 7499 7698 3 WARD 7521 7698 3 MARTIN 7654 7698 3 TURNER 7844 7698 3 JAMES 7900 7698 2 CLARK 7782 7839 3 MILLER 7934 7782
The above query recognizes that there is a cycle and ignore the cycle (as an impact of the NOCYCLE keyword), and returns the rows as if there were no cycle.
CONNECT_BY_ISCYCLE
It is sometimes difficult to identify cycles in hierarchical data. Oracle 10g’s new pseudocolumn CONNECT_BY_ISCYCLE can help you identify the cycles in the data easily. The CONNECT_BY_ISCYCLE can be used only in conjunction with the NOCYCLE keyword in a hierarchical query. The CONNECT_BY_ISCYCLE pseudocolumn returns 1 if the current row has a child which is also its ancestor; otherwise it returns 0. For example:
SELECT EMP_NAME, CONNECT_BY_ISCYCLE FROM EMPLOYEE START WITH EMP_ID = 7839 CONNECT BY NOCYCLE PRIOR EMP_ID = MGR_ID; EMP_NAME CONNECT_BY_ISCYCLE --------------- ------------------ KING 0 JONES 0 SCOTT 0 ADAMS 0 FORD 0 SMITH 0 BLAKE 0 ALLEN 0 WARD 0 MARTIN 1 TURNER 0 JAMES 0 CLARK 0 MILLER 0
Note that since MARTIN is KING’s manager in this data set, and also MARTIN comes under the organization tree under KING, the row for MARTIN has a value 1 for CONNECT_BY_ISCYCLE.
Note: For correct results in the subsequent queries, we should revert the data back to its original state by rolling back (if you have not committed) the earlier change we did to force a cycle in the data. If you have already committed the change, then update the MGR_ID for KING to NULL.
CONNECT_BY_ISLEAF
In a tree structure, the nodes at the lowest level of the tree are referred to as leaf nodes. Leaf nodes have no children. CONNECT_BY_ISLEAF is a pseudocolumn that returns 1 if the current row is a leaf, and returns 0 if the current row is not a leaf. For example:
SELECT EMP_NAME, CONNECT_BY_ISLEAF FROM EMPLOYEE START WITH EMP_ID = 7839 CONNECT BY PRIOR EMP_ID = MGR_ID; EMP_NAME CONNECT_BY_ISLEAF --------------- ----------------- KING 0 JONES 0 SCOTT 0 ADAMS 1 FORD 0 SMITH 1 BLAKE 0 ALLEN 1 WARD 1 MARTIN 1 TURNER 1 JAMES 1 CLARK 0 MILLER 1
This new feature can help simplify SQL statements that need to identify all the leaf nodes. Without this pseudocolumn, to identify the leaf nodes, you would probably write a query like the following:
SELECT EMP_ID, EMP_NAME, SALARY, HIRE_DATE FROM EMPLOYEE E WHERE NOT EXISTS (SELECT EMP_ID FROM EMPLOYEE E1 WHERE E.EMP_ID = E1.MGR_ID); EMP_ID EMP_NAME SALARY HIRE_DATE ------- --------------- ---------- --------- 7369 SMITH 800 17-DEC-80 7499 ALLEN 1600 20-FEB-81 7521 WARD 1250 22-FEB-81 7654 MARTIN 1250 28-SEP-81 7844 TURNER 1500 08-SEP-81 7876 ADAMS 1100 23-MAY-87 7900 JAMES 950 03-DEC-81 7934 MILLER 1300 23-JAN-82
However, this query can be made much simpler with the new pseudocolumn CONNECT_BY_ISLEAF, as shown below:
SELECT EMP_ID, EMP_NAME, SALARY, HIRE_DATE FROM EMPLOYEE E WHERE CONNECT_BY_ISLEAF = 1 START WITH MGR_ID IS NULL CONNECT BY PRIOR EMP_ID = MGR_ID; EMP_ID EMP_NAME SALARY HIRE_DATE ------- --------------- ---------- --------- 7876 ADAMS 1100 23-MAY-87 7369 SMITH 800 17-DEC-80 7499 ALLEN 1600 20-FEB-81 7521 WARD 1250 22-FEB-81 7654 MARTIN 1250 28-SEP-81 7844 TURNER 1500 08-SEP-81 7900 JAMES 950 03-DEC-81 7934 MILLER 1300 23-JAN-82
SYS_CONNECT_BY_PATH
The SYS_CONNECT_BY_PATH function was introduced in Oracle9i. However, it makes sense to discuss it along with the enhancements in Oracle Database 10g. The SYS_CONNECT_BY_PATH is function takes two arguments — a column name, and a character string — and returns the value of the column from the root node to each node, separated by the character string. For example:
SELECT SYS_CONNECT_BY_PATH(EMP_NAME, '::') FROM EMPLOYEE START WITH EMP_ID = 7839 CONNECT BY NOCYCLE PRIOR EMP_ID = MGR_ID; SYS_CONNECT_BY_PATH(EMP_NAME,'::') ---------------------------------------- ::KING ::KING::JONES ::KING::JONES::SCOTT ::KING::JONES::SCOTT::ADAMS ::KING::JONES::FORD ::KING::JONES::FORD::SMITH ::KING::BLAKE ::KING::BLAKE::ALLEN ::KING::BLAKE::WARD ::KING::BLAKE::MARTIN ::KING::BLAKE::TURNER ::KING::BLAKE::JAMES ::KING::CLARK ::KING::CLARK::MILLER
ORDER SIBLINGS BY
The SIBLINGS keyword was introduced in Oracle9i. However, it makes sense to discuss it along with the enhancements in Oracle Database 10g. A hierarchical query with a “START WITH … CONNECT BY … PRIOR …” construct displays the results in an arbitrary order, as shown in the following example:
SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || EMP_NAME "EMPLOYEE", EMP_ID, MGR_ID FROM EMPLOYEE START WITH EMP_ID = 7839 CONNECT BY PRIOR EMP_ID = MGR_ID; LEVEL EMPLOYEE EMP_ID MGR_ID ------- -------------------- ---------- ---------- 1 KING 7839 2 JONES 7566 7839 3 SCOTT 7788 7566 4 ADAMS 7876 7788 3 FORD 7902 7566 4 SMITH 7369 7902 2 BLAKE 7698 7839 3 ALLEN 7499 7698 3 WARD 7521 7698 3 MARTIN 7654 7698 3 TURNER 7844 7698 3 JAMES 7900 7698 2 CLARK 7782 7839 3 MILLER 7934 7782
As always, you can use an ORDER BY clause to order the result rows in the way you want. However, in this case, an ORDER BY clause can destroy the hierarchical layers of the displayed data, as shown in the following example:
SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || EMP_NAME "EMPLOYEE", EMP_ID, MGR_ID FROM EMPLOYEE START WITH EMP_ID = 7839 CONNECT BY PRIOR EMP_ID = MGR_ID ORDER BY EMP_NAME; LEVEL EMPLOYEE EMP_ID MGR_ID ----- -------------------- ---------- ---------- 4 ADAMS 7876 7788 3 ALLEN 7499 7698 2 BLAKE 7698 7839 2 CLARK 7782 7839 3 FORD 7902 7566 3 JAMES 7900 7698 2 JONES 7566 7839 1 KING 7839 3 MARTIN 7654 7698 3 MILLER 7934 7782 3 SCOTT 7788 7566 4 SMITH 7369 7902 3 TURNER 7844 7698 3 WARD 7521 7698
As you can see from the above output, it is impossible to identify the hierarchical relationship between the rows. To resolve this problem, Oracle Database 10g has introduced a new keyword SIBLINGS, that you can use in an ORDER BY clause, and order the result set properly. For example:
SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || EMP_NAME "EMPLOYEE", EMP_ID, MGR_ID FROM EMPLOYEE START WITH EMP_ID = 7839 CONNECT BY PRIOR EMP_ID = MGR_ID ORDER SIBLINGS BY EMP_NAME; LEVEL EMPLOYEE EMP_ID MGR_ID ----- -------------------- ---------- ---------- 1 KING 7839 2 BLAKE 7698 7839 3 ALLEN 7499 7698 3 JAMES 7900 7698 3 MARTIN 7654 7698 3 TURNER 7844 7698 3 WARD 7521 7698 2 CLARK 7782 7839 3 MILLER 7934 7782 2 JONES 7566 7839 3 FORD 7902 7566 4 SMITH 7369 7902 3 SCOTT 7788 7566 4 ADAMS 7876 7788
In the above output, BLAKE, CLARK and JONES are siblings, and they are displayed in the ascending order. So are BLAKE’s children – ALLEN, JAMES, MARTIN, TURNER and WARD.
Conclusion
Oracle Database 10g enhances the already powerful hierarchical query features of the Oracle database. Among the new features are the easy ways to identify leafs and cycles in the data. The ordering of siblings provides a great way to improve the readability of the result sets. Developers who are familiar with Oracle’s hierarchical query constructs will find these features very useful.
--
Sanjay Mishra has more than 12 years of industry experience, and has extensively worked in the areas of database architecture, database management, performance tuning, scalability, ETL, backup / recovery, parallel server, and parallel execution. He has coauthored three Oracle books published by O'Reilly & Associates (Mastering Oracle SQL, Oracle SQL Loader: The Definitive Guide, Oracle Parallel Processing). Sanjay can be reached at smishra_tech@yahoo.com.
Sanjay Mishra
Last modified 2005-04-16 09:23 AM