Wanted: Knowledge and Clear Thinking
A while ago, Chris Date published a two-part article, "Why Is It Important to Think Precisely," which he wrote because -- believe it or not -- somebody actually asked him this question during one of his seminars. This says it all about the state of database practice. I am frequently reminded of that article, as I was by the InfoWorld article "Dawn of a New Database" by Ephraim Schwartz. From the way it begins, it is quite clear that it won't be very enlightening:
"Oracle makes an OK database. Microsoft's SQL Server and IBM's DB2 aren't bad either. But as data gets collected over wireless and the demand for a warp-speed response increases, all of these well established -- dare I say, old-time -- companies may soon get a rude awakening. You see, transactional databases are coming into their own."
This paragraph utterly lacks any informational value, but, then, this is standard operating language in the trade media (note that Schwartz uses the term 'database' when he actually means 'DBMS', a common mistake in the industry).
Schwartz mentions several companies whose business requires very fast response from their DBMSs, who found Oracle and "some other traditional SQL-type databases" too slow, and concluded that "Only InterSystems Caché transactional database could handle them." One of them "is talking to InterSystems ... about a way to store all of its SMS messages, [a]gain, something a relational database can't really handle", Schwartz concludes.
To: ephraim_schwartz@infoworld.com
Hi,
I recommend that trade journalists educate themselves on data fundamentals before they write on the subject of databases. Regarding your article in InfoWorld:
1st, you mean DBMS, not database. It's not the same thing.
2nd, there is no relational DBMS available. They are all, including Oracle, SQL DBMSs. It's not the same thing.
3rd, performance is determined entirely at the physical level and has nothing to do with the relational data model, which is purely logical.
4th, Cache is not a "new" type of DBMS. To the extent that Cache has good performance, it's entirely due to its physical implementation, and not to the "type" of DBMS it is.
5th, any DBMS vendor is entirely free to do whatever they damn please at the physical level to maximize performance, regardless of whether the DBMS is relational or not. To the extent that a product does not perform well, they have only themselves, not relational technology, to blame, and that does not mean a "new type" of DBMS is needed.
One reason the industry goes from fad to fad, often even recycling old discarded technologies (e.g. hierarchic DBMSs relabeled XML), is precisely because practitioners, vendors and the trade press ignore the fundamentals.
You may want to check my recent article on a piece by a colleague of yours.
Before I delve into debunking this, it's enlightening to know how InterSystems promotes its product, hyped by one of Schwartz's colleagues (often, regurgitating vendors' claims is about all the media does, which serves its advertising interests):
"Caché 4.0 combines the speed and scalability of a transactional multidimensional data model with the power and flexibility of object technology, regardless of how the data is modeled. The system supports tables, objects, and even multidimensional arrays. Caché's proprietary Unified Data Architecture provides automatic access to all data in either object or table form, sidestepping the need to map from one format to the other. Better yet, because Caché stores its data in a multidimensional format, it assures exceptional performance even under heavy load."
-- Tim Fielden, "Caché Binds Data," InfoWorld.
I used to be amazed at how much confusion and how many errors can be squeezed into short paragraphs, but I am not anymore (if you can't figure it out in this case, you're not sufficiently informed either). Such "you-name-it-we've-got-it" marketing claims (see also "Comments on an Interview with Jim Gray") make quite clear the level of knowledge of vendors and the trade press (for further evidence, see the several exchanges on "What Do You Mean Post-Relational?", triggered by similar pronouncements by InterSystems' Joe DeSantis). And the fact that they can get away with it does not suggest that the majority of readers know much more.
But, back to Schwartz. Note the shift from traditional SQL-type databases (er, DBMSs, of course) to relational DBMSs, and from performance to handling message text. Those without an adequate education in database fundamentals -- read: the vast majority of practitioners -- will probably come away from this with the following conclusion: "Relational DBMSs, of which SQL-type ones are the norm, are unable to provide fast performance, or handle text, like the new type of 'transactional' DBMSs (whatever that means), such as Caché." And it is these types of articles -- and the conclusions they (mis)lead to -- that cause technology to regress rather than progress (see my first editorial at Database Debunkings).
First of all, SQL DBMSs, even though they have some relational origins, are so far from what the relational model requires and, consequently, fail to provide so many of the practical benefits from the model, that one of the proscriptions in The Third Manifesto by Chris Date and Hugh Darwen is that SQL should not be the basis of any future true implementation of the relational model (the book outlines a data language, Tutorial D, as the basis for such implementations).
Second, as I've explained so many times in so many places (see, for example, several Against the Grain columns on normalization), performance is determined entirely at the physical level and has absolutely nothing to do with the data model -- a purely logical construct -- underlying databases and DBMSs. In other words, whether a DBMS is relational or not says nothing about its performance.
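To make the logical-physical distinction concrete, here is a minimal sketch (not any product's actual design). It assumes a hypothetical relation of SMS messages and two hypothetical storage classes, ScanStore and HashStore, invented purely for illustration. Both answer the same logical query with the same result; only the physical access path, and hence the speed, differs.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical relation of SMS messages: (message_id, text).
Row = Tuple[int, str]


class ScanStore:
    """Physical design 1 (illustrative): unordered list, lookups scan every row."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows: List[Row] = list(rows)

    def select_by_id(self, message_id: int) -> List[Row]:
        # Cost grows with the number of stored rows.
        return [r for r in self._rows if r[0] == message_id]


class HashStore:
    """Physical design 2 (illustrative): in-memory hash index on message_id."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._index: Dict[int, List[Row]] = {}
        for r in rows:
            self._index.setdefault(r[0], []).append(r)

    def select_by_id(self, message_id: int) -> List[Row]:
        # Cost is roughly constant, independent of the number of stored rows.
        return list(self._index.get(message_id, []))


rows = [(i, f"msg {i}") for i in range(1000)]
scan, hashed = ScanStore(rows), HashStore(rows)

# Same logical query, same logical answer, whatever the physical design.
assert scan.select_by_id(42) == hashed.select_by_id(42) == [(42, "msg 42")]
```

The user of either store sees only rows in and rows out; swapping one physical design for the other changes response time and nothing else, which is why performance says nothing about the data model.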
Consider now the reaction Schwartz got from Bob Shimp, Oracle vice president of database product marketing, when confronted with the claims made for Caché:
"They [InterSystems] are an extremely small niche product designed for a highly specialized type of market. However, we're interested in this market, in memory database systems, but not ready to announce anything yet." (emphasis mine)
So the high performance claimed for Caché is largely due to its "in-memory" (physical) implementation, and there is nothing, of course, to prevent relational DBMSs, or even SQL DBMSs, from adopting such an implementation. Yet a competitor does not say so even when it is in his best interest, because he simply doesn't know and understand the fundamentals. It's the declared objective of the press to assess technology for its readers. Yeah, right; if you believe that, I have a bridge in Brooklyn to sell you. Without adequate knowledge, they wouldn't be able to do that even if they wanted to, which they don't anyway (see my first editorial).
Schwartz also quotes Paul Grabscheid, vice president of strategic planning at InterSystems:
"Think about it. In the old days, only employees had access to a company database, typically used to collect and analyze data and issue reports. Now, anybody with a cell phone can access a database, and it must respond to not hundreds but hundreds of thousands of users. Oracle is an old technology, a quarter-century old. They are clearly king of the hill now and no one is going to knock them off, but I believe their time has passed."
I don't know whether Oracle's time has passed or not. Software, particularly the more complex system software, never dies; it is a sort of trap from which users (and even vendors) find it difficult to escape. In the preface to one of my books, I wrote that there is an economic incentive for complexity in the computer industry. Simplicity does not require so many books, seminars, consultants and programmers, which is how most of the money is made. That is at least part of the reason why true relational products have not been implemented, while Oracle -- an unnecessarily complex SQL system -- has been so successful.
There is, however, one sense in which I agree with Grabscheid, although we derive entirely opposed conclusions. It is true that SQL and its commercial implementations are old technology (not to mention bad technology, relationally speaking). But the true solution does not lie with nonrelational products such as Caché or XML, which throw us back decades (see last month's column and the Against the Grain columns on XML). Rather, the promise lies in true implementations of the relational model. I have alluded in several writings to a recently developed technology that facilitates the implementation of, among other software tools, true RDBMSs, with performance potentially several times higher than that of SQL products, and much simpler administration to boot. This technology offers a much wider range of optimizability and is particularly suited to in-memory implementations (unfortunately, for legal reasons I cannot say more at this point, but stay tuned to this column and Database Debunkings).
That two senior vendor personnel and one trade journalist cannot, between them, get things straight speaks volumes about the so-called "dawn of a new database." Unfortunately, given this state of knowledge in the database field, whether this technology will actually be used in the industry for this purpose is another matter altogether. I won't hold my breath either for that to happen or for a reply to the e-mail I sent Schwartz (see sidebar). To quote from Chris Date's preface to Understanding Relational Databases:
"... [DBMS] deficiencies are, it seems to me, directly due to the widespread lack of understanding (not least on the part of vendors), of fundamental database principles. Certainly it is undeniable that they flout those principles in numerous ways. And the practical consequences are all too obvious: First, users must understand where the deficiencies lie; second, they have to understand just why they are deficiencies; third, they have to understand how to work around them; and fourth, they have to devote time and effort in persuading the vendors to remedy them. The trouble is, of course, users too tend to be unaware of those same fundamental principles and, hence, find themselves unable to carry out their side of the "contract" ... What is more, this sad state of affairs is not likely to change, given the apparent lack of interest on the part of the trade press -- itself ignorant of those same principles -- in trying to improve matters. It's a vicious cycle."
--
Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author, and lecturer specializing in data management. He was affiliated with Codd & Date and for more than 15 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on database technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, IRS. He is founder and editor of Database Debunkings, a Web site dedicated to dispelling prevailing fallacies and misconceptions in the database industry, where C.J. Date is a senior contributor. He has contributed extensively to most trade publications, including Database Programming and Design, DBMS, DataBased Advisor, Byte, InfoWorld, and Computerworld and is author of the contrarian column Against the Grain. His third book, Practical Issues in Database Management - a Guide for the Thinking Practitioner (Addison Wesley, June 2000), serves as text for a seminar bearing the same name.
Contributors : Fabian Pascal
Last modified 2005-04-12 06:21 AM