System Tables
Introduction
How come the title of this article is “System Tables”?
I was tempted to call this article “Well, you know, that stuff you get when you ask your DBMS what tables and columns and constraints it has.” Or possibly, “What SQL Server calls ‘system tables,’ but IBM calls ‘system catalog,’ but Oracle calls ‘data dictionary’.” The problem with the name is manifold:
- If the title said “information schema,” those who’ve never heard the term (and there are many) wouldn’t understand it.
- If the title said “catalog” or “system catalog,” some might believe the discussion will pertain to the standard SQL catalog.schema.table hierarchy (and “catalog” means “database” to some people).
- If the title said “metadata,” some would think this is an article about data warehousing.
- And so on.
So it’s not really that simple ... but I decided that “System Tables” was a good compromise for an article that discusses how IBM DB2 v8, Microsoft SQL Server 2000, and Oracle9i handle the data that describes the database structure itself. I’ll call this the “metadata.”
Describing the Database Structure
In standard SQL, the data that describes the database is stored in a schema called INFORMATION_SCHEMA. Specifically:
“The Information Schema is a definition of the SQL data model, specified as an SQL-schema, in terms of <SQL schema statement>s as defined in ISO/IEC 9075. ... The Information Schema views are defined as being in a schema named INFORMATION_SCHEMA, enabling these views to be accessed in the same way as any other tables in any other schema. SELECT on all of these views is granted to PUBLIC WITH GRANT OPTION, so that they can be queried by any user ... No other privilege is granted on them, so they cannot be updated. ... The Information Schema tables may be supposed to be represented in the Definition Schema in the same way as any other tables, and are hence self-describing.”
— the SQL Standard
So, if you’re using a standard SQL-compliant DBMS, and you want to know something about the structure of a table to which you have access, you can just use a regular SELECT statement to retrieve the information you need from the INFORMATION_SCHEMA.
But what about the Big Three? Here are some quotes from each vendor’s documentation:
“Each database includes a set of system catalog tables, which describe the logical and physical structure of the data. ... These tables contain information about the definitions of database objects such as user tables, views, and indexes, as well as security information about the authority that users have on these objects. They are created when the database is created, and are updated during the course of normal operation. You cannot explicitly create or drop them, but you can query and view their contents using the catalog views. ... SELECT privilege on these views is granted to PUBLIC by default.”
— IBM DB2 documentation
“These views are defined in a special schema named INFORMATION_SCHEMA, which is contained in each database, but visible only in the master database. Each INFORMATION_SCHEMA view contains meta data for all data objects stored in that particular database.”
— Microsoft SQL Server documentation
“F021, Basic information schema — Oracle does not have any of the views in this feature. However, Oracle makes the same information available in other metadata views ... These tables and views are called static, because they change only when a change is made to the data dictionary (for example, when a new table is created or a user is granted new privileges). Data dictionary tables are not directly accessible, but you can access information in them through data dictionary views.”
— Oracle documentation
In each case, we see that the DBMS in question provides access to metadata via a set of system views that can be operated upon with the SQL SELECT statement. SQL Server even provides the proper INFORMATION_SCHEMA structure required by standard SQL.
But why did the makers of the Standard — and the Big Three — decide that such a construct was necessary? Well, they were (as always) following the rules set forth by the father of relational DBMS theory.
Codd’s Rule #4
Dr. Edward F. (Ted) Codd started up relational database theory in 1970. Fifteen years later, in ComputerWorld, his article about “The Twelve Rules” that concisely define an ideal relational database appeared (there were actually thirteen rules, but we’re stuck with the name). The fourth rule was:
Rule 4: Dynamic On-Line Catalog Based on the Relational Model
- A relational database must provide access to its structure through the same tools that are used to access the data. This is usually accomplished by storing the structure definition within special system tables.
To paraphrase the master:
“Rule #4 requires a relational DBMS to logically represent the description of the database (and all of the objects contained therein) in the same way as ordinary data, so that authorized users can retrieve the information with the same relational language they use when retrieving the ordinary data. One major consequence of the rule is that users will need to learn only one data model — an advantage that non-relational systems usually do not offer.”
You’ll find many other paraphrases and glosses all over the Internet (I’ve collected a few links for the References section at the end of this article). But here’s my own attempt to explain why Rule #4 is a Good Thing:
- You don’t have to learn new terms or keywords to retrieve metadata information. You already know SELECT.
- There are numerous possible ways to select and format the results, compared to the Procrustean ways like “LIST TABLES” (an old DB2 command) or “sp_help” (an old SQL Server procedure) or “SQLTables()” (the function from the ODBC lagoon).
- It’s portable because every major DBMS either already does it this way, or will soon.
- Your online command tool or GUI interface won’t need tweaking in order to display the data you retrieve.
Access to Metadata in Eleven SQL DBMSs
Here’s a vague example question:
How do I get some information about the tables?
Now, here are 11 answers, from 11 major DBMSs. Look particularly at the naming convention, and at any qualifiers the DBMS uses.
-
CLOUDSCAPE
Metadata table names start with “sys.”
Example:SELECT SCHEMAID, TABLENAME
FROM SYSTABLES -
DB2
Metadata tables are in the “syscat” or “sysibm” schema.
Example:SELECT TABSCHEMA, TABNAME
FROM SYSCAT.TABLES -
INFORMIX
Metadata table names start with “sys.”
Example:SELECT TABNAME, TABID
FROM SYSTABLES
WHERE TABID > 99 -
INGRES
Metadata table names start with “ii.”
Example:SELECT TABLE_NAME, CREATE_DATE
FROM IITABLES -
INTERBASE
Metadata table names start with “rdb$.” Example:SELECT rdb$relation, rdb$owner_name
FROM rdb$relations -
MIMER
Mimer v8.2, like SQL Server, supports the SQL:1992
INFORMATION_SCHEMA requirement. Metadata tables are in a schema named INFORMATION_SCHEMA. Example:SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' -
ORACLE
Metadata tables are owned by “sys.”
Example:SELECT
OWNER, TABLE_NAME FROM ALL_TABLES -
POSTGRESQL
Metadata table names start with “pg_.”
Example:SELECT relname,
relkind FROM pg_class -
SAP DB
Metadata tables are (in effect) in a schema named “DOMAIN.”
Example:SELECT OWNER, TABLENAME
FROM DOMAIN.TABLE -
SQL Server (old style)
In earlier versions of SQL Server, you could retrieve
metadata information from the sysobjects table. sysobjects
had one entry for every database object (e.g., tables, views,
constraints, and so on) and could be queried just like any other
table.Example:SELECT *
FROM SYSOBJECTS
WHERE XTYPE = 'PK' -
SQL SERVER (new style from v7 and up)
Although Microsoft still supports the sysobjects option,
there is now a better way to retrieve metadata. Metadata
tables are in a schema named INFORMATION_SCHEMA.
Example:SELECT TABLE_NAME, TABLE_TYPE
FROM INFORMATION_SCHEMA.TABLES -
SYBASE
Metadata table names start with “sys.”
Example:SELECT NAME, CRDATE
FROM SYSOBJECTS
WHERE TYPE ='U'
There’s a bit of a divide: some DBMSs distinguish the system tables by giving them a special prefix like “sys.” Others (the more SQL Standard-compatible ones) put the system tables in a separate “schema” (they may call this an “owner”) such as INFORMATION_SCHEMA. The latter way is better, because then one doesn’t have to worry about silly restrictions like “you can’t create a table name that starts with sys.”
So if you want to guarantee portability, never create tables with names that start with “sys” or “ii” or “pg_,” or names that contain “$.” And don’t hire new employees with names like “sys” or “domain” or “information_schema.”
Two Kinds of Access
Quite often, there are really two ways to access system tables: a low-level access to the base table or an access via an immutable view of the base table. I’ve just illustrated this for Microsoft, but it’s true of several DBMSs. In fact it’s even true of the SQL Standard, which defines a gaggle of base tables in a DEFINITION_SCHEMA that nobody ever will implement (and the Standard states does not need to be implemented), and a gaggle of views, based on the DEFINITION_SCHEMA tables, in INFORMATION_SCHEMA. In the words of the Standard:
“The views of the Information Schema are viewed tables defined in terms of the base tables of the Definition Schema. The only purpose of the Definition Schema is to provide a data model to support the Information Schema and to assist understanding. An implementation need do no more than simulate the existence of the Definition Schema, as viewed through the Information Schema views.”
With some DBMSs, especially in old versions, you can do interesting things:
- Put a trigger on a system table so that any change can cause an action (for example, this was possible in SQL Server 6.5).
- Revoke SELECT permission from a system table so that most users can’t see it (usually it’s assumed that SELECT is granted to PUBLIC so anybody who has a privilege on an object can see that object’s description).
- Add indexes or other mechanisms to speed up access to a system table.
But those are merely interesting details. More important is the stern warning from every vendor: that you only use the system tables for reading, and that you use the highest-level access method that the DBMS provides:
“The catalog views cannot be explicitly created or dropped. The views are updated during normal operation in response to SQL data definition statements ... Data in the system catalog views is available through normal SQL query facilities. The system catalog views ... cannot be modified using normal SQL data manipulation statements.”
— IBM DB2 documentation
“System tables should not be altered directly by any user. For example, do not attempt to modify system tables with DELETE, UPDATE, or INSERT statements, or user-defined triggers.”
— Microsoft SQL Server documentation
“To access the data dictionary, use SQL statements. Because the data dictionary is read-only, you can issue only queries (SELECT statements) against the tables and views of the data dictionary. ... No Oracle user should ever alter (update, delete, or insert) any rows or schema objects contained in the SYS schema, because such activity can compromise data integrity.”
— Oracle documentation
Since the system tables can be a bottleneck and can be pretty large, it may often be useful to make a copy (globally or for each user) of static but commonly-used system tables. This is possible with CREATE ... SELECT or with permanent materialized views, depending which DBMS you’re using. Such a copy is sometimes called a “shadow catalog.”
Some Differences Between The Big Three Implementations
Looking at the metadata retrieval options provided by the Big Three (DB2, Oracle, SQL Server), you’ll notice that each has a particular flavor of system tables management.
DB2:
There are several flavors of DB2, and naturally the system tables can be different in each — especially if the metadata has something to do with physical characteristics or operating-system file information.
Oracle:
Oracle calls its system tables the “data dictionary” and normally splits the tables into three, progressively more global, sets:
- USER views show descriptions of the objects that are in the user’s schema.
- ALL views show descriptions of every object that the user can access.
- DBA views show descriptions of all objects in every schema.
In Oracle7 and earlier, data dictionary tables were used to track tablespace space utilization. Although this method still exists to support older databases, Oracle recommends against using it.
SQL Server:
The original designers had an idea that DEFAULT is a type of constraint (odd) and that NOT NULL is a column attribute, but now that SQL Server follows standard SQL rules you won’t notice this so much. Currently there are different systems lying side by side: table names that start with “sys” (as in Sybase), stored procedure calls like “sp_help db,” and (the modern way) INFORMATION_SCHEMA.
Sum-it-up
Nobody can agree about the name, but everybody supports system tables in some fashion. Ted Codd was right — they’re good things. SQL Server 2000 deserves special kudos for being the first of the Big Three to support standard SQL INFORMATION_SCHEMA access.
References
Boyle, Neil. “Leveraging the information_schema”
Chigrik, Alexander. “SQL Server 2000 undocumented system tables”
Codd, E.F. “Is Your DBMS Really Relational?,”
ComputerWorld, (Part 1: October 14, 1985, Part 2: October 21, 1985).
Unfortunately Codd’s original paper is not available online. Here are some Web pages that list Codd’s 12 Rules, sometimes with considerable paraphrasing (no two are alike):
http://mylinux.rmz.uni-lueneburg.de/ss03/db/4/codd.htm
http://www.geocities.com/rajesh_purohit/database/12coddsrule.html
http://www.itworld.com/nl/db_mgr/05072001/
http://www.indianflair.com/articles/rdbms.htm
Gennick, Jonathan. “The master key to Oracle’s data dictionary.”
ISO/IEC JTC 1/SC 32. “Information technology — Database languages — SQL — Part 11: Information and Definition Schemas (SQL/Schemata).”
Kreines, David. “Oracle Data Dictionary Pocket Reference: sample pages.”
Parker, Jack. “Using the Informix System Catalogs.”
Seiner, Robert S. “DBAs Don’t Need Metadata.”
Sharma, Rahul. “Information Schema Views and Object Properties.”
Tabernick, John. “Using SQL Server System Tables Metadata.”
--
Trudy Pelzer has worked with SQL since 1986. She has co-authored four books, including one thick book about the SQL Standard (SQL-99 Complete, Really) and one thin book about portability and optimization for DB2, Oracle, SQL Server, and five other DBMSs (SQL Performance Tuning). Trudy has written other articles for DBAzine.
Contributors : Trudy Pelzer
Last modified 2005-04-18 09:43 PM