One True Lookup Table
I think that Paul Keister was the first person to coin the phrase “OTLT” (One True Lookup Table) for a common SQL programming technique that is popular with newbies. The technique crops up time and time again, but I’ll give him credit as the first guy to give it a name. Simply put, the idea is to have one table to do all of the code look-ups in the schema. It usually looks like this:
CREATE TABLE Lookups
(code_type CHAR(10) NOT NULL,
 code_value VARCHAR(255) NOT NULL,
 code_description VARCHAR(255) NOT NULL,
 PRIMARY KEY (code_value, code_type));
So if we have Dewey Decimal Classification (library codes), ICD (International Classification of Diseases), and two-letter ISO-3166 country codes in the schema, we have them all in one honking big table.
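To make the mixture concrete, a few rows might look like this; the sample values are only illustrative:

INSERT INTO Lookups VALUES ('DDC', '500.000', 'Natural Sciences and Mathematics');
INSERT INTO Lookups VALUES ('ICD', '500.000', 'Coal Workers'' Pneumoconiosis');
INSERT INTO Lookups VALUES ('ISO3166', 'US', 'United States');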
Let’s start with the problems in the DDL and then look at the awful queries you have to write (or hide in VIEWs). So we need to go back to the original DDL and add a CHECK() constraint on the code_type column. (Otherwise, we might “invent” a new encoding system by typographical error.)
The Dewey Decimal and ICD codes are numeric, and the ISO-3166 codes are alphabetic. Oops, we need another CHECK constraint that will look at the code_type and make sure that the string is in the right format for that scheme. Now the table looks something like this, if anyone attempted to do it right, which is not usually the case:
CREATE TABLE Lookups
(code_type CHAR(10) NOT NULL
     CHECK (code_type IN ('DDC', 'ICD', 'ISO3166', ..)),
 code_value VARCHAR(255) NOT NULL,
 CHECK (CASE
        WHEN code_type = 'DDC'
             AND code_value SIMILAR TO '[0-9][0-9][0-9].[0-9][0-9][0-9]'
        THEN 1
        WHEN code_type = 'ICD'
             AND code_value SIMILAR TO '[0-9][0-9][0-9].[0-9][0-9][0-9]'
        THEN 1
        WHEN code_type = 'ISO3166'
             AND code_value SIMILAR TO '[A-Z][A-Z]'
        THEN 1
        ELSE 0 END = 1),
 code_description VARCHAR(255) NOT NULL,
 PRIMARY KEY (code_value, code_type));
The SIMILAR TO predicate is Standard SQL's version of grep(), if you are not familiar with it. Since the typical application database can have dozens and dozens of codes in it, just keep extending this pattern for as long as required. Not very pretty, is it? That's why most OTLT programmers don't bother with it.
The next thing you notice about this table is that the columns are pretty wide VARCHAR(n), or, even worse, NVARCHAR(n). The value of (n) is most often the largest one allowed in that particular SQL product.
Since you have no idea what is going to be shoved into the table, there is no way to predict and design with a safe, reasonable maximum size. The size constraint has to be put into the WHEN clauses of that second CHECK() constraint that ties code_type to code_value.
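In practice that means either extending the CASE expression above or bolting on yet another constraint. Here is a sketch of the latter, assuming a hypothetical 'PARTNO' scheme with codes of one to ten characters that has been added to the code_type list:

ALTER TABLE Lookups
ADD CONSTRAINT partno_value_size
CHECK (CASE
       WHEN code_type = 'PARTNO'   -- hypothetical scheme, one to ten characters
            AND CHAR_LENGTH(code_value) BETWEEN 1 AND 10
            AND code_value SIMILAR TO '[A-Z0-9]+'
       THEN 1
       WHEN code_type <> 'PARTNO'  -- other schemes are policed by the original CHECK
       THEN 1
       ELSE 0 END = 1);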
These large sizes tend to invite bad data. You give someone a VARCHAR(n) column, and you eventually get a string with a lot of white space and a small odd character sitting at the end of it. You give someone an NVARCHAR(255) column and eventually it will get a Buddhist sutra in Chinese Unicode.
If you make an error in the code_type or code_description among codes with the same structure, it might not be detected. You can turn 500.000 from “Natural Sciences and Mathematics” in the Dewey Decimal codes into “Coal Workers’ Pneumoconiosis” in ICD and vice versa. This can be really hard to find when one of the similarly structured schemes has unused codes in it.
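With the illustrative rows shown earlier, the only thing standing between a book classification and a disease is that code_type string, and a query like this sketch is about all you have to untangle them:

SELECT code_type, code_value, code_description
  FROM Lookups
 WHERE code_value = '500.000';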
Now let’s consider the problems with actually using the OTLT in the DML. It is always necessary to add the code_type as well as the value which you are trying to look up.
SELECT P1.ssn, P1.lastname, .., L1.code_description
  FROM Lookups AS L1, Personnel AS P1
 WHERE L1.code_type = 'ICD'
   AND L1.code_value = P1.sickness
   AND ..;
In this sample query, I need to know the code_type of the Personnel table’s sickness column and of every other encoded column in the table. If you get a code_type wrong, you can still get a result; it just will not be the one you wanted.
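For example, this sketch repeats the join above but names the wrong scheme; since the Dewey and ICD codes share the same three-digits, dot, three-digits shape, it will happily hand back a book classification for a disease code:

SELECT P1.ssn, P1.lastname, L1.code_description
  FROM Lookups AS L1, Personnel AS P1
 WHERE L1.code_type = 'DDC'   -- wrong scheme; this column is really ICD
   AND L1.code_value = P1.sickness;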
I also need to allow for some overhead for type conversions. It would be much more natural to use DECIMAL(6,3) for Dewey Decimal codes instead of VARCHAR(n), so that is probably how the column appears in the Personnel table, which means every join against the VARCHAR(n) code_value has to convert between the two types. But why not use CHAR(7) for the code instead? If I had a separate table for each encoding scheme, then I would have used a FOREIGN KEY and matched the data types in the referenced and referencing tables. There is no definitive guide for data type choices in the OTLT approach.
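Here is a rough sketch of that alternative, showing only the columns used in the query above; the table name and the column sizes are my own assumptions:

CREATE TABLE ICD_Codes
(icd_code CHAR(7) NOT NULL PRIMARY KEY
     CHECK (icd_code SIMILAR TO '[0-9][0-9][0-9].[0-9][0-9][0-9]'),
 icd_description VARCHAR(100) NOT NULL);

CREATE TABLE Personnel
(ssn CHAR(9) NOT NULL PRIMARY KEY,
 lastname VARCHAR(35) NOT NULL,
 sickness CHAR(7) NOT NULL
     REFERENCES ICD_Codes (icd_code));

The look-up is then a plain equi-join on two columns of the same declared type, and there is no code_type to get wrong.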
When I go to execute a query, I have to pull in the entire lookup table, even if I only use one code. If one code is at the start of the physical storage, and another is at the end of physical storage, I can do a lot of paging. When I update the Lookup table, I have to lock out everyone until I am finished. It is like having to carry an encyclopedia set with you when all you needed was a magazine article.
I am going to venture a guess that this idea came from OO programmers who think of it as some kind of polymorphism done in SQL. They say to themselves that a table is a class, which it isn’t, and therefore it ought to have polymorphic behaviors, which it doesn’t.
Maybe there are good reasons for the data modeling principle that a well-designed table is a set of things of the same kind instead of a pile of unrelated items.
--
Joe Celko was a member of the ANSI X3H2 Database Standards Committee and helped write the SQL-92 standards. He is the author of over 750 magazine columns and seven books, the best known of which is SQL for Smarties, now in its third edition. For more articles, puzzles, resources, and more, go to www.celko.com.