The Name Game
"In our industry, there is a strong desire to put names on things. This is natural enough, given the amount of information that we have to classify and deal with in our work. To give something a name is to gain control over it, and this is not necessarily a bad thing. The problem is when the name takes the place of true understanding of the thing named. Discourse tends to be the bantering of names, without true understanding of the concepts involved."
-- David Hay, News Flash: Ron Ross May Be Mistaken
The IT industry has a long and profitable tradition of proliferating terminology and acronyms which, more often than not, are new labels for old concepts, or for technologies that had already been tried and discarded, or had become obsolete. XML (hierarchic databases) and managing data in application files by application programs come readily to mind. Anybody who understands the fundamentals recognizes these "innovations" for what they are, new labels notwithstanding.
Let's consider Ron Ross' so-called "business rules approach" to data management. In his article, "What are Fact Models and Why Do You Need Them?," he states that "data modelers usually try to accomplish two goals at once - often unknowingly [using] the data model to explore business requirements with users, while at the same time, to develop system requirements and database designs." He argues that the primary audience for the data model is system designers and DBAs and proposes "to stop using data models for developing business requirements" and to use his "fact model," a part of the business model that is for "business analysts and subject matter experts."
Ross refers to three models (data model, business model and fact model) but defines none of them. As Chris Date argues in "Models, Models Everywhere, Nor Any Time to Think," modeling is one of the areas most riddled with imprecision and confusion: practitioners either use different terms to mean the same thing, or use the same term to mean different things. As I explain more precisely in my book Practical Issues in Database Management, database design involves three kinds of model at three representation levels (conceptual, logical and physical):
- conceptual model (also referred to as business model, or entity-relationship model): model of the persistent data of some specific enterprise (e.g., Figure 1A)
- logical model: the logical representation in the database of some enterprise-specific conceptual model (e.g., Figure 1B)
- physical model (or internal model): the physical representations on disk of some enterprise-specific logical model
Figure 1A: Conceptual Model
Figure 1B: Logical Model
Conceptual models are expressed in business terms (entities, attributes, relationships), e.g., employees, departments, salaries and so on. Users understand these terms, but no DBMS does. So, to be represented in the database, business models must be translated into something a DBMS can understand, namely logical models.
Consider the conceptual model in Figure 1A. For example, the entity type Employee, shown in Figure 2, is essentially a graphic representation of the following:
An employee is identified by an employee number, has a name, works in a department, was hired on a hire-date, is of a sex, earns a salary.
Figure 2: Graphical representation of a predicate
This is a generalized fact in the real world, generalized in the sense that it applies to any and all employees of a certain enterprise. In logic, such generalized facts are called predicates, and the terms that take values (employee number, name, department, hire-date, sex, salary) are called value-holders.
A DBMS does not know what an employee is, what a department is, what "works in a department" means, and so on. But we can represent a predicate in the database in a form that a DBMS can understand, a logical model. A data model is a general "translation language," so to speak, via which to map specific conceptual models such as that in Figure 1A to their specific logical representations in the database such as that in Figure 1B. To that end, a data model provides the following constructs:
- structure (or organization): how the facts are organized in the database
- data types: represent attributes in the database
- integrity constraints: represent business rules in the database
- manipulation: represents behavior in the database
The relational model is a data model that provides four such constructs: R-tables (organization); domains (data types); integrity constraints; R-operations (manipulation).
Consider now the slightly revised predicate (sex was dropped, employee fingerprint was added and the value-holders were given abbreviated names) as follows:
- Employee is identified by employee number (EMP#), has name (ENAME), works in department (DEPT#), was hired on hire-date (HIREDATE), earns salary (SALARY), has fingerprint (EFP).
The table in Figure 3 is the relational representation in the database of the predicate: the columns EMP#, ENAME, DEPT#, HIREDATE, SALARY and EFP represent the value-holders, and the rows represent facts about individual employees (propositions in logic). The two rows in the table are sets of values that represent two propositions about two employees. When these row values are plugged into the value-holders in the predicate, the following propositions about those employees obtain:
- Employee is identified by employee number 100, has name Spenser, works in department E21, was hired on hire-date 6/19/1980, earns salary $26,150, has fingerprint EFP1.
- Employee is identified by employee number 160, has name Pianka, works in department D11, was hired on hire-date 10/11/1977, earns salary $27,250, has fingerprint EFP2.
Figure 3: Logical representation of a predicate and its propositions
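The step from predicate to propositions can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any DBMS works: the `<COLUMN>` placeholder syntax, the names `PREDICATE`, `ROWS`, `render` and `instantiate`, and the date and currency formatting are all hypothetical. It shows that each row is just a set of values, one per value-holder, and that a proposition results only when those values are plugged into the predicate, whose meaning is supplied by users, not by the DBMS.

```python
from datetime import date

# Hypothetical encoding of the revised Employee predicate: each value-holder
# is written as <COLUMN>, using the column names of Figure 3.
PREDICATE = (
    "Employee is identified by employee number <EMP#>, has name <ENAME>, "
    "works in department <DEPT#>, was hired on hire-date <HIREDATE>, "
    "earns salary <SALARY>, has fingerprint <EFP>."
)

COLUMNS = ("EMP#", "ENAME", "DEPT#", "HIREDATE", "SALARY", "EFP")

# The two rows of Figure 3: each is a set of values, one per value-holder.
ROWS = [
    {"EMP#": 100, "ENAME": "Spenser", "DEPT#": "E21",
     "HIREDATE": date(1980, 6, 19), "SALARY": 26150, "EFP": "EFP1"},
    {"EMP#": 160, "ENAME": "Pianka", "DEPT#": "D11",
     "HIREDATE": date(1977, 10, 11), "SALARY": 27250, "EFP": "EFP2"},
]

def render(col, value):
    """Display a value as the article writes it (the formatting is an assumption)."""
    if isinstance(value, date):
        return f"{value.month}/{value.day}/{value.year}"
    if col == "SALARY":
        return f"${value:,}"
    return str(value)

def instantiate(predicate, row):
    """Plug one row's values into the value-holders, yielding a proposition."""
    text = predicate
    for col in COLUMNS:
        text = text.replace(f"<{col}>", render(col, row[col]))
    return text

for row in ROWS:
    print(instantiate(PREDICATE, row))
```

Run as is, it prints the two propositions listed above. Note that nothing in the code "knows" what an employee or a department is; it only substitutes values, which is precisely why the interpretation must come from the conceptual model.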
Thus, the conceptual model consists of facts as understood by users and the logical model is a representation of those facts in the database in a form understood by the DBMS. The data model facilitates such representations.
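To make the four data-model constructs concrete, here is a minimal Python sketch (emphatically not a DBMS) of how each might look for the Employee table of Figure 3: domains as value checks, an R-table as a set of rows, one integrity constraint, and two R-operations. The class and function names, the specific domain rules (e.g., a DEPT# is a 3-character code) and the treatment of EMP# as the key are illustrative assumptions, not taken from the figures.

```python
from dataclasses import dataclass
from datetime import date

# Domains (data types): the set of legal values for each value-holder.
# These particular rules are illustrative assumptions, not from Figure 3.
def check_emp_no(v):
    if not (isinstance(v, int) and v > 0):
        raise ValueError(f"EMP# must be a positive integer: {v!r}")

def check_dept_no(v):
    if not (isinstance(v, str) and len(v) == 3):
        raise ValueError(f"DEPT# must be a 3-character code: {v!r}")

def check_salary(v):
    if not (isinstance(v, int) and v >= 0):
        raise ValueError(f"SALARY must be a non-negative whole-dollar amount: {v!r}")

@dataclass(frozen=True)
class EmployeeRow:
    """One row: a proposition about one employee."""
    emp_no: int      # EMP#
    ename: str       # ENAME
    dept_no: str     # DEPT#
    hiredate: date   # HIREDATE
    salary: int      # SALARY, in dollars
    efp: str         # EFP (fingerprint, shown as a label here)

    def __post_init__(self):
        # Every value must come from its domain.
        check_emp_no(self.emp_no)
        check_dept_no(self.dept_no)
        check_salary(self.salary)

class EmployeeTable:
    """Structure: an R-table holds a *set* of rows (no duplicates, no order)."""

    def __init__(self):
        self.rows = set()

    def insert(self, row):
        # Integrity constraint (a business rule): EMP# identifies an employee,
        # so no two rows may share an EMP# value.
        if any(r.emp_no == row.emp_no for r in self.rows):
            raise ValueError(f"key violation: EMP# {row.emp_no} already exists")
        self.rows.add(row)

    # Manipulation: R-operations that return sets of rows or value tuples.
    def restrict(self, condition):
        return {r for r in self.rows if condition(r)}

    def project(self, *attrs):
        return {tuple(getattr(r, a) for a in attrs) for r in self.rows}

emp = EmployeeTable()
emp.insert(EmployeeRow(100, "Spenser", "E21", date(1980, 6, 19), 26150, "EFP1"))
emp.insert(EmployeeRow(160, "Pianka", "D11", date(1977, 10, 11), 27250, "EFP2"))
print(sorted(emp.project("ename", "dept_no")))  # [('Pianka', 'D11'), ('Spenser', 'E21')]
```

The sketch simplifies in one obvious way: a real relational algebra returns relations that can themselves be operated on, whereas `restrict` and `project` here return plain Python sets. The point is only that the business rule (EMP# identifies an employee) and the domains live in the logical representation, enforced uniformly for every row.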
The three kinds of representation model and the data model are routinely confused in the industry. For example, when Ross and Hay use the term 'data model', both actually mean logical model, which is, of course, confusing and makes it difficult to assess their respective arguments. Business and physical models are also frequently referred to as data models, and vice versa.
Ross is quite correct when he says that data modelers "try to accomplish two goals at once"; that is exactly their function: to extract the conceptual model from end-users and managers, map it via the data model to a logical model, and pass that to the DBA for physical implementation (the DBA then provides access to the logical model to application developers and end-users, who should never be exposed to physical implementation details). Data modelers ought to be educated to play this dual communication role between users and IS personnel, and to that end they should be conversant with both the conceptual and logical models, as well as with the data model. The problem, in practice, is that this task is usually, and wrongly, assigned to IS personnel who know mostly, if not only, software and hardware tools and, at best, physical models, but are not educated (at least, not sufficiently) in the data, conceptual and logical models. That is why the dual task is performed "unknowingly" rather than knowingly, and why communication with users is in the wrong language (not to mention the poor modeling). Here's a typical modeling job description, characterized by fuzziness, an emphasis on tools and a few contradictions:
"Deep understanding of extensible, commercial design patterns. Ability to analyse [sic] and reverse engineer best-of-breed database designs. Excellent Oracle PL/SQL skills. Rigorous analytical thought. Facility with Erwin for deployment on Oracle 8i. Commercial software experience a strong plus. Must have excellent written and communication skills and be prepared for exciting, 4 month roller-coaster."
Data modelers should have conceptual and logical modeling skills and knowledge of the data model, not Oracle skills. They should map business models to logical models, not reverse-engineer logical models from physical ones without knowledge of either the data model or the conceptual model. Roller-coaster indeed.
Conceptual (business) models are, and have always been, nothing but organized collections of facts. It is good that Ross has finally realized this, but it is hardly a revelation, let alone some new approach to data modeling, as he claims. He admits that his "fact model" is "part of the business model," but the fact is (pun intended) that business models are fact models. Note also that business rules are facts too (the above predicate is, essentially, a rule; a conceptual model is, therefore, a collection of all the rules in effect; see Chapter 2, "The Rule of Rules," in Practical Issues in Database Management). It follows that business rules are nothing new either. As Chris Date points out:
"... the theoretical foundation for business rules is essentially the same as that for relational databases! In other words, it's nothing more nor less than the relational model all over again. Indeed ... business rule technology is very much in the spirit of, and is fully consistent with, Codd's original relational vision. This is one major reason why business rules are not just another fad in this generally fad-ridden industry -- they really are going to have a far-reaching and long-lasting effect on the way we do business in the IT world."
These are data fundamentals which every practitioner should know. Unfortunately, one can have a very successful career and even become an "expert" without ever being exposed to the fundamentals, or understanding them (see my article on "respected analysts"). Ross' response to Hay's criticism is quite instructive in this context:
"You make some excellent points, and clearly you do know what 'data' modeling is about. However, I'm afraid 'data modeling' has become a de-based term, and has so much historical baggage that it fails to connect very well to the business side. That is one reason we strongly recommend the new term 'Fact Modeling' for what we do with that community. In fact, I think you undercut your own arguments by use of the following definition: '... a data model (as I learned of the term), fundamentally describes 'things of significance to the business about which we wish to hold information.' The primary purpose of a Fact Model is NOT to figure out what things about which we wish to 'hold information.' That's exactly the way a system designer or database designer would say it. A Fact Model is about structuring basic knowledge -- nothing more and nothing less. It's NOT a blueprint for 'holding' anything ... nor is it really about 'information' in the way most people use the term. It is about what we need to know to run the business. That knowledge is 'data' only in the eyes of designers."
It is, unfortunately, true that 'data model' has become a debased term, but inventing yet another unnecessary term (when we obviously already have more than we can keep straight) only exacerbates the problem rather than solving it. Hay's use of terms is not accurate either:
"Based on my experience, a preferable strategy would be to stop trying to use the data modeling process to develop system requirements and database designs. In point of fact, his 'fact models' that he proposes as an alternative to data models are almost exactly what I produce when I am producing what I call a data model."
Fact (that is, business) models are not alternatives to a data model. Nor does Hay produce a data model: he first produces a business model, then maps it to a logical model using the data model.
End-users, DBAs and application programmers should not be entrusted with modeling without a proper education in fundamentals. Were they so educated, they would talk facts, including business rules, to users at the conceptual level, and tables, domains and integrity constraints to DBAs and application developers at the logical level.
If Ross wants to criticize modelers for not employing the correct language (that of the conceptual and logical models) when communicating with users, that's fine. But constantly inventing new terms without understanding the fundamental concepts will never get us far; it will only add to the confusion.
--
Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for more than 15 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on database technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, IRS. He is founder and editor of Database Debunkings, a Web site dedicated to dispelling prevailing fallacies and misconceptions in the database industry, where C.J. Date is a senior contributor. He has contributed extensively to most trade publications, including Database Programming and Design, DBMS, DataBased Advisor, Byte, Infoworld, and Computerworld and is author of the contrarian column "Against the Grain." His third book, Practical Issues in Database Management - a Guide for the Thinking Practitioner (Addison Wesley, June 2000), serves as text for a seminar bearing the same name. He can be contacted at editor@dbdebunk.com.
Contributors: Fabian Pascal
Last modified 2005-04-12 06:21 AM