Complex Constraints
Relational purists insist that full support of arbitrary complex declarative constraints is one of the characteristic features of True Relational DBMS (as compared to SQL DBMS). In this article we investigate if SQL DBMS are indeed incapable of supporting complex constraints.
Subqueries in the Check Constraints
One of the most powerful SQL features is scalar subquery. Scalar subquery can be a part of where or select clause, as opposed to inline view, which is an element of the from list. It is natural to expect that scalar subquery can substitute any scalar variable or constant in an arbitrary expression. For example, given a constraint:
ALTER TABLE emp
ADD CONSTRAINT ck_0_le_1 CHECK( 0 < 2 )
which trivially holds true, we can try substituting the constant 2 with a subquery as follows:
ALTER TABLE emp
ADD CONSTRAINT ck_fk CHECK(
0 < (select count(1) from dept
where dept.deptno=emp.deptno)
)
Assuming that emp relation doesn’t admit NULLs, we easily recognize that the above check constraint is essentially referential integrity constraint or, simply, foreign key. Although there seems to be little point reinventing foreign key constraints as check constraints, the next example demonstrates that subqueries in check constraints greatly increase constraint’s expressive power.
Consider the customer supertype which has two subtypes: business_customer and home_customer. Further, assume that our data model doesn’t have the customer table. If we introduce one more entity — order —, then it is natural to reinforce many-to-one relationship between orders and customers as the following constraint:
ALTER TABLE order
ADD CONSTRAINT ck_fk CHECK(
0 < (select count(1) from business_customer b
where b.cust_id=order.cust_id)
OR
0 < (select count(1) from home_customer h
where h.cust_id=order.cust_id)
)
The question of how to impose a “generalized” foreign key constraint onto a set of more than two tables is frequently asked on various database forums.
Limitations
Unfortunately, the RDBMS that I use leaves no doubt that, today, this is not possible:
ORA-02251: subquery not allowed here
ALTER TABLE emp |
What about User Defined SQL functions (UDF)? Like subquery, UDF returns a scalar value as well, which is normally allowed in any place of expression where ordinary scalar variable or constant is allowed. However, if a RDBMS of my choice were allowing UDF calls within check constraint expression, then I could claim that the implementation is inconsistent. Indeed, UDF can contain embedded SQL inside its body:
function deptno_cnt( deptno integer )
RETURN integer IS
ret_val integer;
BEGIN
select count(1) into ret_val from dept
where dept.deptno = deptno
RETURN ret_val;
END;
which effectively makes UDFs as powerful as scalar subqueries.
Your mileage might vary, depending on what RDBMS you use, but the problem is more fundamental than just allowing a naive subquery evaluation, or a blind UDF call when validating check constraint.
Constraints as a View Equation
There are two issues of paramount importance as far as constraints are concerned: performance and concurrency. In this article we’ll focus on performance only. Let's analyze the above “foreign key” constraint:
ALTER TABLE emp
ADD CONSTRAINT ck_fk CHECK(
0 < (select count(1) from dept
where dept.deptno=emp.deptno)
)
from that perspective.
Suppose a tuple were inserted into the emp table then, how is RDBMS supposed to check whether the constraint hasn’t been violated? That’s easy — the RDBMS just has to check if the expression is valid for that specific tuple. This, in turn, would trigger an execution of the query :
select count(1) from dept
where dept.deptno=:1
where :1 bind variable matches current emp.deptno value. If there is an index on dept.deptno, then query optimizer should be able to leverage it. With the dept table growth the performance index range scan would degrade gracefully (logarithmically to be more exact), which mimics the “authentic” foreign key implementation. (Typical RDBMS requires foreign key constraint to refer to the master table column which has a unique key constraint on it, and unique key constraint implies index). So far so good: the technique where constraint is checked only for those tuples in the emp and the dept tables which are affected by the transaction is called incremental constraint evaluation.
What if a tuple is deleted from the dept table, which condition should the RDBMS check? Unlike the previous case, constraint doesn’t favor single tuple evaluation at the dept table side. It has to be rewritten.
Informally, the constraint says:
“Any tuple in the emp table satisfies the check condition”
which is equivalent to:
“No tuple in the emp table refutes the check condition”
In other words, if we take the emp table and select all the tuples that don’t satisfy the check condition, then the resulting view must be empty. The last sentence could be easily translated back into a formal expression:
select 1 from emp where not
0 < (select count(1) from dept
where dept.deptno=emp.deptno)
= {}
which we interpret as view equation. Indeed, in this form it looks very similar to equations in math. The empty set {} at the right side of the view equation mimics zero element in math equation.
Next, the view can be rewritten as antijoin:
select 1 from emp where not exists
(select 1 from dept
where dept.deptno=emp.deptno)
= {}
and, then, antijoin can be transformed into a set operation:
select deptno from emp
minus
select deptno from dept
= {}
What a charming view equation! It’s immediate that the emp-dept referential integrity constraint can be violated only if the left side view becomes nonempty. It could happen when a tuple is inserted into emp table, or deleted from the dept table.
This is as far as we can get in our quest for symmetry in a formal representation of our foreign key constraint. Now we can focus on implementation.
Materialized Views
In the RDBMS world, queries and updates are often conflicting goals from a performance perspective. We can speed up some queries at the cost of introducing some auxiliary structures. Those structures are maintained incrementally; the change to the structures is small when the update transaction is small, which makes the overall performance acceptable. Incremental query evaluation is, arguably, one of the most important performance ideas.
Ubiquitous examples of incremental query evaluation are indexes and materialized views. More exclusive examples include materializing transitive closure relation and various hierarchy encodings. (Admittedly, the last two incremental evaluation techniques are not supported by query rewrite, when a user’s query is agnostic of the auxiliary structures.)
Incrementally maintained materialized views are ideal candidates for view equation evaluation. In our last example, the RDBMS has to check if the materialized view:
select deptno from emp
minus
select deptno from dept
is empty. Formally, this condition is check constraint on the aggregated materialized view:
CREATE MATERIALIZED VIEW emp_minus_dept_mv AS
select count(1) cnt from (
select deptno from emp
minus
select deptno from dept
)
ALTER TABLE emp_minus_dept_mv
ADD CONSTRAINT ck_card_eq_0 CHECK( cnt = 0 )
Although we succeeded in reducing the problem of generic constraints to check constraints with simple expressions, incrementally maintaining materialized views is technically difficult and, therefore, has limitations on the implementation side.
Oracle, for example, doesn’t allow set operations in the definition of incrementally maintained materialized view (the Oracle term is fast refreshable). The allowed relational operations that preserve the fast refreshable property of a materialized view are: join, outer join, and aggregation. We have to change the design in order to work around this restriction.
An alternative solution would use outer join and check if the column is not null:
CREATE MATERIALIZED VIEW LOG ON emp WITH ROWID INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW LOG ON dept WITH ROWID INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW emp_dept_oj_mv REFRESH FAST ON COMMIT AS select d.deptno ddept, d.rowid drid, e.rowid erid from emp e, dept d where e.deptno=d.deptno(+); ALTER TABLE emp_dept_oj_mv ADD CONSTRAINT ck_oj_mv CHECK(ddept is not null); delete from dept where deptno = 20; commit;
ORA-12008: error in materialized view refresh path ORA-02290: check constraint (SCOTT.CK_OJ_MV) violated
commit |
As expected, deleting a key in the dept table causes update of emp_dept_oj_mv materialized view with NULL value in the ddept column, which is not allowed by ck_oj_mv check constraint.
This completes our program implementing foreign key constraint as check constraint. From a performance perspective our last solution is still not quite satisfactory. Unlike our original materialized view with minus operator, outer join view grows with the base tables growing. This storage penalty is totally unnecessary and, once more, could be avoided, when the incremental view update mechanism becomes more sophisticated to allow set operations.
Now that the idea of implementing complex constraints via check constraints on materialized views looks sound, we can investigate other cases.
Unique Key Constraint Spanning Multiple Tables
Returning to supertype-subtype example, let’s implement a constraint that spans both business_customer and home_customer tables such that cust_id column in the view:
select cust_id from business_customer
union all
select cust_id from home_customer
is unique. This condition is met if and only if cust_id is a unique key in each table, and view:
select cust_id from business_customer
intersect
select cust_id from home_customer
is empty. One more time, since set operation is not supported we have to rewrite the view as join. The complete test script is:
create table home_customer(cust_id integer); create table business_customer(cust_id integer); insert into home_customer(cust_id) values(1); CREATE MATERIALIZED VIEW LOG ON home_customer WITH ROWID INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW LOG ON business_customer WITH ROWID INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW bh_ij_mv REFRESH FAST ON COMMIT AS select h.cust_id custid, h.rowid hrid, b.rowid brid from home_customer h, business_customer b where h.cust_id=b.cust_id; ALTER TABLE bh_ij_mv ADD CONSTRAINT uk_bh_ij_mv CHECK(custid is null); insert into business_customer(cust_id) values(1); commit;
ORA-12008: error in materialized view refresh path ORA-02290: check constraint (SCOTT.UK_BH_IJ_MV) violated
commit |
The uk_bh_ij_mv constraint doesn’t check bh_ij_mv’s cardinality explicitly. It checks if the column is NULL, which confirms that cardinality is 0. It is possible to nest materialized views and verify cardinality directly.
Note, that unlike view with outer join in the previous section, there is no storage concern.
Cardinality Constraint
Incremental view evaluation with aggregates is supported — that is all required for rather straightforward implementation:
CREATE MATERIALIZED VIEW LOG ON emp WITH ROWID INCLUDING NEW VALUES; CREATE MATERIALIZED VIEW emp_ag_mv REFRESH FAST ON COMMIT AS select count(1) c from emp e; ALTER TABLE emp_ag_mv ADD CONSTRAINT ck_ag_mv CHECK(c<15); insert into emp (empno) values (6); commit;
ORA-12008: error in materialized view refresh path ORA-02290: check constraint (SCOTT.CK_AG_MV) violated
commit |
Conclusion
Overall, it is quite surprising to see that RDBMS features that were aimed to solve one set of problems (data warehousing) can be leveraged in completely different area (constraint enforcement).
Acknowledgments
The idea of leveraging materialized views is credited to Tony Andrews. Many thanks to Bob Jenkins and Benoit Dageville, discussions with whom also contributed to this article.
--
Vadim Tropashko works for Real World Performance group at Oracle Corp.
Contributors : Vadim Tropashko, Tony Andrews, Bob Jenkins, Benoit Dageville
Last modified 2005-04-20 10:04 AM