Tags Do Not a Language Make - Part 1
Part 1 | Part 2
"XML was, and continues to be, a revolutionary technology."
— Ed Dumbill, xml.com
"It's historical fact that the syntax of XML was defined before its data model, the XML Information Set (Infoset). While this contributed to the speed of delivery of the XML specification, it also leads to a number of subsequent problems; most notably, the discontinuities between the DOM and XPath, both of which define different tree models for XML documents."
— Leigh Dodds, xml.com
In their seminal article in Scientific American, Bosak and Bray describe the objective of XML as follows:
"Give people a few hints, and they can figure out the rest. They can look at this page, see some large type followed by blocks of small type and know that they are looking at the start of a magazine article. They can look at a list of groceries and see shopping instructions. They can look at some rows of numbers and understand the state of their bank account. Computers, of course, are not that smart; they need to be told exactly what things are, how they are related and how to deal with them. The solution, in theory, is very simple: use tags that say what the information is, not what it looks like. For example, label the parts of an order for a shirt not as boldface, paragraph, row, and column — what HTML offers — but as price, size, quantity and color." [emphasis mine]
As a database professional, do you recognize this?
- What things are
- How they are related
- How to deal with them
Unless you have an adequate knowledge and understanding of data fundamentals — which is quite rare — you probably don’t realize this is more or less a definition — albeit informal — of a data model. If database professionals don’t recognize it, there is no reason to expect the two authors, who lack database background, realized it either.
"Model" is one of the most abused concepts in the IT industry, which is riddled with all sorts of confusions (see "Models, Models Everywhere, Nor Any Time to Think"). To set yet again this matter straight, a data model is a general theory of data that serves as a "translator" for mapping enterprise-specific business models to enterprise-specific logical models (see Practical Issues in Database Management). To that end, a data model provides four logical constructs — data types, structure, integrity, and manipulation — used to map business models, which computers don’t understand, to logical representations that computers can deal with (the three kinds of model are routinely confused in the industry; see "On What Is a Data Model," "Something to Call One’s Own").
Figure 1 shows the mapping of a business model on the left to the logical model on the right, using the relational data model for translation (e.g., entity types map to tables, attributes to columns)
Relational mapping of a business model (A) to a logical model (B).
The four constructs underlie a data language via which database designers "tell" a DBMS "what things are" (data types), "how they are related" (structure and integrity) and "how to deal with them" (manipulation).
This is fundamental and hardly novel: without a data language based on some data model, no DBMS would know what data means and, therefore, would not be able to manage it (protect its integrity, manipulate it correctly, and so on). This is another way of saying that database management is impossible without a data language (syntax) reflecting a data model (semantics), because without them, only users and applications would know what the data means, which is exactly how things were prior to databases and DBMSs.
The desirable properties of a data model are:
- Generality: ability to represent as many kinds of data as possible
- Formality: a sound theoretical foundation
- Completeness: inclusion of all four components
- Simplicity: as simple as possible (but not too simple!)
The relational model is currently the only data model that has all four desirable properties.
- There is no information that it cannot represent (claims to the contrary notwithstanding)
- It has dual theoretical foundation: predicate logic and set theory
- It includes all four components:
- data types: domains
- structure: R-tables
- full integrity: domain, column, table and database constraints
- manipulation: R-operations
But XML is being heavily promoted now as a revolutionary data management technology, claimed to be as suitable to the task as relational technology, if not better. To stake such a claim, proponents of XML databases must demonstrate that:
- XML has a data model;
- The data model is new;
- Relative to the relational model, the XML data model:
- is as general, or more so;
- has a sound theoretical foundation;
- includes all four components;
- is as simple, or more simple.
Failure to demonstrate any one of these, let alone all, disqualifies any database technology, XML included, as an equal, let alone superior, alternative to relational technology.
For how XML scores, stay tuned to this column.
--
Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for more than 15 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on database technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, IRS. He is founder and editor of Database Debunkings, a web site dedicated to dispelling prevailing fallacies and misconceptions in the database industry, where C.J. Date is a senior contributor. He has contributed extensively to most trade publications, including Database Programming and Design, DBMS, DataBased Advisor, Byte, Infoworld and Computerworld,and is columnist for The Data Administration Newsletter, and the Journal of Conceptual Modeling. His book, Practical Issues in Database Management serves as text for his seminars.
Contributors : Fabian Pascal
Last modified 2006-01-04 01:46 PM