The "Future of DBMS" - Part 1
I recently came across an exchange in the section called The Future of RDBMS at Devshed. It does not differ from most of what is being said in the industry, but two things warrant some clarifications. First, it refers to Database Debunkings. Second, it offers excellent examples of how poor knowledge in the database field really is. How poor is it (as Johnny Carson used to ask)? So poor that even the minds of those with their "heart in the right place" have been contaminated by the industry. Indeed, one thing I have learned is never to assume that, because somebody agrees with my arguments or advances similar ones, they necessarily have a correct understanding of the fundamentals. This is the first part of a two-part series of my comments, which are prefixed with FP.
Rycamor: The future of RDBMS: "Object-oriented databases are the next step in evolution of data management …The relational data model is dying, because it is too limited and non-intuitive for modern data needs ... the industry is moving toward a more XML- or object-centric approach to data management … hierarchical, tree-structured, complex data types, instead of just columns and rows ... blah, blah, etc."
For the past couple of years, every time I heard or read about this sort of industry hype, I became a little more uneasy. At first I couldn't put my finger on it. I thought it might be because I was afraid I would have to learn something new, even if it was supposedly better (I mean, it's the next step). But every time I read someone going on about how easy it is to work with "arbitrary levels and attributes" and "non-structured data types," etc., I couldn't help thinking this is exactly what I didn't want my database to do. The whole reason I liked using a database was to prevent myself from arbitrarily adding information, and the above sounded like a recipe for headaches.
I now believe my feelings are justified, after taking a closer look at the original concept of the relational data model, proposed by Codd & Date. It's not just "one way to look at things." It is more like a self-evident mathematical theorem. In other words, they didn't invent it, they discovered it. By definition, there is no other way to really truly make sense of your data, to get rid of all redundancy, and to maintain true integrity of your data.
Before anyone jumps on me for making such a "generalization," you might want to spend some time at http://www.dbdbunk.com/. This site is run by a couple of interesting people, one of whom happens to be C.J. Date himself. Yes ... one of the original two guys who set the whole thing in motion. He and his two associates have some very interesting things to say about the current state of databases and the computer industry in general. For example:
- SQL != Relational Data Management -- according to them, SQL is actually an incomplete implementation of the true relational data model, and so the study of SQL itself is not enough to truly understand relational data. Yes, it's the best we can do, for now, but we have hung on to SQL far too long, and it is time to come up with a truly capable declarative language for data management.
- XML databases -- it's already been tried. It's called the hierarchical data model, and the problems associated with it in the 60s are exactly what led to the need for the relational data model.
- Object-oriented databases don't have a well-defined data model, or even a good concept of what a data type is. Thus, there is literally no way they can provide true data integrity.
- The universities and the computer companies are losing sight of the importance of understanding data fundamentals. Instead, they are just pushing developers to learn a vendor-specific implementation -- it's a cookbook approach to learning: all how-tos, with no "why" or "why not." They all tend to dismiss any serious discussion of the fundamentals as "academic theory," and not "practical" for business usage. Companies don't look for a database designer, but for an Oracle person, or a Sybase person, etc. ...
The above is just a glance at what you will find in the topics and articles on the site. I found it very inspirational material, in the sense that these guys cut through all the industry BS, buzzwords, etc., and focus on what is really at stake. It's surprisingly readable, too, for a bunch of "theorists."
FP: Pretty good assessment of the industry, and summary of Database Debunkings (Peter Robson, the webmaster, and I are partners in the site; Chris Date is senior contributor). The arguments quoted by Rycamor are backed up in many of my articles and in my online columns Against the Grain and at DBAzine, "The Data Administration Newsletter," and the Journal of Conceptual Modeling.
Andnaess: I visited Database Debunkings some time ago and read several of the articles. Very interesting stuff. Have you read The Third Manifesto by Date? I don't think Date and Codd invented the relational model, that was the work of Codd only, Date recognized the beauty of it early on and started working with it.
I firmly believe in the relational model having a future. It's such a simple and elegant model that it can never become unfashionable. XML has its uses, object-oriented databases are, as far as I can see, (and to put it bluntly) ****e. The presentations I have seen just make me think something like, "Oh no, not this mess again."
It's interesting to read what Date and Pascal say because it shows that there still is room for improvement. I'm hoping to write my Master's thesis on relational databases, and currently it's definitely the field that interests me the most.
What especially interests me is the fact that today we have a one-to-one relationship between the logical view and the physical view. This leads to nasty things like denormalization for the sake of speed. Yuckh. Such details should be hidden deep inside any RDBMS. We should only work with the logical view of the data, which of course is perfectly normalized.
And of course, it's always interesting when someone says that an entire industry is doing things the wrong way ...
FP:
- Good attitude and perspective for a student. Let's hope his mind won't be corrupted by the industry and academia, but I won't hold my breath (for academic examples see "Denormalization for Performance: Et Tu Academia," and my May and June Against the Grain columns).
- Whether the relational model has a future in the industry I am not sure, given that it does not even have a present (SQL is very far from it, so much so that Date and Darwen explicitly reject it as a basis for implementing a true RDBMS in The Third Manifesto, and "room for improvement" is a huge understatement). However, if what is meant is that it won't be readily superseded by other models, then that is, of course, very likely. Indeed, Codd applied predicate logic to database management, and it is logic that guarantees the integrity and reliability of relational databases and RDBMSs. I keep asking those pushing, or claiming to have invented, "better data models" to provide formal definitions and to say what theoretical foundation they have substituted for logic, but I have yet to receive an answer, and I am not holding my breath, particularly since they don't even know what a data model is (see "On Respected Technical Analysts," "On the So-Called 'Associative Model of Data'," "On What is a Data Model - Reply to Simon Williams," and "Models, Models Everywhere, Nor Any Time to Think").
Rycamor: Yes, I suppose you're right. Codd was the discoverer, and Date helped him refine the theory. And yes, it is always fun when someone has the guts to take on an entire industry. Their Quotes of the Week section is merciless. Intelligent Enterprise has some more good articles by Date.
I noticed that they mention Ingres as one of the few databases [sic] that was designed with the possibility of expanding into a true relational database. Interestingly enough, PostgreSQL is the descendant of Ingres. I found out about the Database Debunkings website from one of the PostgreSQL mailing lists. It seems like PostgreSQL might show some promise in these areas in the future. I have been learning a lot about PostgreSQL lately, and it is pretty impressive. Also, see them have some fun with MySQL (it's rather sad, actually).
FP: I am not familiar with PostgreSQL, but even the slightest relation to SQL makes it a questionable proposition.
Binky: Scary conversation ... are we talking about the implementation or the concept of relational databases? In other words, you keep talking about actualities such as Oracle, Sybase, Postgres, but I thought that the relational database was a concept and not an actual product.
As far as I understand the idea of RDBs, it's the concept of creating a data storage method whereby data is separated into smaller chunks by means of identifying links between those chunks of data. [FP: huh?]
I've always been confused about object-orientated databases; does this mean that you control the data, not through SQL, but through object-orientated programming? Hence the act of updating a record is simply a case of calling a method within the database that will do the update? I don't really get how that works, though I can see some logic. Is the database still tabular or are we now thinking databases in a different way to how we used to?
I could understand the theory of having a BLOB field in a database that contains a further database or XML file, thereby making the data quasi-3D. So a table of employees could have a field called pet that would contain an XML file that details the person's pet in a structured way that allows for much more flexibility/locality than one or more tables. Help, need to cool brain down now.
FP: Scary indeed. Confusion about OO databases I would expect - OO is really confused and confusing (see Date's series on the Liskov Substitution Principle). But apparently that's not the only confusion here. This is a typical example of practitioners never bothering to educate themselves on the fundamentals, but rather relying on intuition, product experience and online exchanges for information and knowledge. The "definition" of a relational database is pure nonsense (here the problem is much more serious than just database or relational knowledge, see Date's "Why Is It Important to Think Clearly?" in Relational Database Writings 1994-1997). See if you can provide a precise formal definition of a database yourself. Incidentally, there is no theory behind BLOBs (see Chapter 1 in Practical Issues in Database Management).
Rycamor: We are talking about both the implementation and the concept. Specifically, no one has ever succeeded in a complete implementation. And rather than pursue this complete implementation, the vendors instead are creating a plethora of add-on "solutions", application servers and whatnot, in the hope of redirecting people away from the main problem. See Codd's 12 (actually 13) rules which define a relational database.
Yes, you are right about OO databases. One of the chief complaints is that in order to access the data, you will need to use programming rather than simple declarative statements, such as in SQL. This breaks the concept of data independence, where any program or any person can access the data without depending on a specific piece of custom software. Those who are pushing this concept say it will be much easier to do "employee.getRecord(12)" instead of "SELECT * FROM employees WHERE id=12". The problem is that this may sound nice and intuitive at first (to the programmer), but when you have to combine data in complex ad hoc queries, the OO method will be a nightmare. Also, your database now requires that much more work to connect with other applications, because methods are not standard. A method such as "employee.getRecord()" belongs at the application level, where the application makes the SQL query internally.
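To make Rycamor's contrast concrete, here is a minimal Python sketch using the standard-library sqlite3 module. The schema, the data, and the Employee class with its get_record method (a snake_case stand-in for "employee.getRecord()") are hypothetical, invented for illustration. SQLite's SQL is used only as a convenient declarative language, not as an example of a true RDBMS. The method wraps a query at the application level, while the second query answers an ad hoc question that no predefined method anticipated.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employees (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES departments(dept_id),
        salary  INTEGER NOT NULL
    );
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Research');
    INSERT INTO employees VALUES
        (12, 'Ann', 1, 50000),
        (13, 'Bob', 1, 42000),
        (14, 'Cy',  2, 61000);
""")


class Employee:
    """Hypothetical application-level convenience wrapper.

    The method hides a declarative query; the data itself remains
    accessible to any other program through the DBMS.
    """

    def __init__(self, connection):
        self.conn = connection

    def get_record(self, emp_id):
        # The query is issued internally by the application.
        return self.conn.execute(
            "SELECT id, name, dept_id, salary FROM employees WHERE id = ?",
            (emp_id,),
        ).fetchone()


# Access through the wrapper method.
print(Employee(conn).get_record(12))  # (12, 'Ann', 1, 50000)

# An ad hoc question no method anticipated: average salary per department.
ad_hoc = conn.execute("""
    SELECT d.name, AVG(e.salary)
    FROM employees AS e JOIN departments AS d ON e.dept_id = d.dept_id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()
print(ad_hoc)
```

Writing the second question as a method would mean anticipating it in advance; stated declaratively, it needs no new code in the application.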
The problem with storing an XML file in a BLOB column is: Guess what? You have just broken the relational model. Now, in order to get the details on that pet, you have to rely on additional programming methods. Programming should be used for decision-making and user interaction, not complex storing and retrieval. Yes, SQL doesn't give us the easiest way to deal with a "tree-structured" relationship, but that is a shortcoming of SQL, not the relational data model itself.
Don't get me wrong: I can see how, in certain circumstances, the data in the BLOB might not be an integral part of your main data needs, so this might be an expedient way to deal with it. But if that data becomes more important later, such as when integrating it with reports and statistics from your main database, you are going to have problems. In the end, the relational data model, applied properly, allows for more flexibility and efficiency than any other approach.
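Rycamor's pet example can be sketched in Python with the standard-library sqlite3 and xml.etree.ElementTree modules. Table names, columns, and data are hypothetical. In approach A the XML is opaque to the DBMS, so answering "who owns a dog?" requires application code to fetch and parse every stored value; in approach B the pet data is a relation of its own, and the same question is an ordinary query. This is only a sketch of the situation described above, not a claim about how any particular product handles XML or BLOBs.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    -- Approach A (hypothetical): pet details packed into an opaque BLOB column.
    CREATE TABLE employees_blob (id INTEGER PRIMARY KEY, name TEXT NOT NULL, pet BLOB);

    -- Approach B (hypothetical): pet details as a relation of their own.
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE pets (
        owner_id INTEGER NOT NULL REFERENCES employees(id),
        pet_name TEXT NOT NULL,
        species  TEXT NOT NULL,
        PRIMARY KEY (owner_id, pet_name)
    );
    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO pets VALUES (1, 'Rex', 'dog'), (2, 'Tom', 'cat');
""")
conn.executemany(
    "INSERT INTO employees_blob VALUES (?, ?, ?)",
    [
        (1, "Ann", b"<pet><name>Rex</name><species>dog</species></pet>"),
        (2, "Bob", b"<pet><name>Tom</name><species>cat</species></pet>"),
    ],
)


def dog_owners_blob(connection):
    """Approach A: the DBMS cannot look inside the XML, so every row is
    fetched and parsed by application code."""
    owners = []
    for name, pet_xml in connection.execute(
        "SELECT name, pet FROM employees_blob ORDER BY id"
    ):
        if ET.fromstring(pet_xml).findtext("species") == "dog":
            owners.append(name)
    return owners


# Approach B: the same question is a single declarative query evaluated by the DBMS.
dog_owners_rel = [
    name
    for (name,) in conn.execute(
        """
        SELECT e.name
        FROM employees AS e JOIN pets AS p ON p.owner_id = e.id
        WHERE p.species = 'dog'
        ORDER BY e.id
        """
    )
]

print(dog_owners_blob(conn), dog_owners_rel)  # ['Ann'] ['Ann']
```

Both approaches give the same answer here, but only in approach B can the question be asked without writing and maintaining parsing code in every application that needs it.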
FP:
- Codd's 12 Rules do not define a relational DBMS (not database!), but rather specify a set of principles and features that a DBMS should adhere to/provide to be considered a true RDBMS. They were intended as criteria by which to assess the relational fidelity of commercial products. Even though they are helpful in screening out pretenders, the rules have been found somewhat problematic, and no longer in "active use."
- The problem with OO databases is much more serious than just the necessity of programmed access. Aside from being fuzzy and lacking a theoretical foundation, OO is essentially a programming approach -- a set of guidelines for developing "good programs"-- and does not provide a specific data model analogous to the relational model, because it was not intended for data management.
- The principle of data independence means that database functions (data types, structure, integrity, manipulation, security, concurrency control, physical management, optimization) are performed not in applications, but by the DBMS. If a DBMS supports BLOB data types with operators for values of those types, it does not violate data independence or the relational model. However, such support raises some other nontrivial problems (see Chapter 1 in my book and "Unstructured Thinking," an article forthcoming at Database Debunkings).
- Chapter 7 in my book also demonstrates that a truly relational DBMS can handle tree structures better than hierarchic DBMSs, and the fact that SQL does not is its own fault, not the model's.
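As a rough illustration of the point about trees, here is a Python sketch (hypothetical data, not necessarily the approach taken in Chapter 7) in which a tree is represented as an ordinary binary relation, a set of (parent, child) tuples, and queried through transitive closure, a relational operation. The same derived relation answers both "who is below X?" and "who is above Y?" without any predefined parent-to-child access path. The naive fixpoint loop is chosen for clarity, not efficiency.

```python
from typing import FrozenSet, Tuple

# A tree (here, a hypothetical reporting hierarchy) stored as a binary
# relation: a set of (parent, child) tuples.
Edge = Tuple[str, str]


def transitive_closure(rel: FrozenSet[Edge]) -> FrozenSet[Edge]:
    """Return every (ancestor, descendant) pair implied by rel.

    Composes the relation with itself until no new tuples appear
    (a naive fixpoint, adequate for small relations).
    """
    closure = set(rel)
    while True:
        new_pairs = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        } - closure
        if not new_pairs:
            return frozenset(closure)
        closure |= new_pairs


reports_to = frozenset({
    ("CEO", "CTO"),
    ("CEO", "CFO"),
    ("CTO", "Dev Lead"),
    ("Dev Lead", "Engineer"),
})

tc = transitive_closure(reports_to)

# Downward and upward questions are ordinary restrictions on the same relation.
under_cto = sorted(d for (a, d) in tc if a == "CTO")
above_engineer = sorted(a for (a, d) in tc if d == "Engineer")
print(under_cto)       # ['Dev Lead', 'Engineer']
print(above_engineer)  # ['CEO', 'CTO', 'Dev Lead']
```

A hierarchic structure typically favors one traversal direction; as a relation, the tree supports both equally, which is the sense in which the limitation lies in the language rather than in the model.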
--
Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author, and lecturer specializing in data management. He was affiliated with Codd & Date and for more than 15 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on database technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, IRS. He is founder and editor of Database Debunkings, a Web site dedicated to dispelling prevailing fallacies and misconceptions in the database industry, where C.J. Date is a senior contributor. He has contributed extensively to most trade publications, including Database Programming and Design, DBMS, DataBased Advisor, Byte, Infoworld, and Computerworld and is author of the contrarian column Against the Grain. His third book, Practical Issues in Database Management - a Guide for the Thinking Practitioner (Addison Wesley, June 2000), serves as text for a seminar bearing the same name.
Contributors : Fabian Pascal
Last modified 2006-01-04 02:02 PM