XML storage: Oracle should be hearing footsteps

17.01.2006
Twenty-four years ago, I raised a furor in the database management systems industry. As a rookie analyst -- a stock analyst, no less -- I argued that the then-dominant hierarchical/network data architectures should and would be replaced by "index-based" systems. Over the next few years, I was proved right, as inverted-list and relational products took over the DBMS market.

Recently, I've argued a contrasting position: XML-based data architectures should and will take on an important role in IT, in applications where tabular databases don't do a great job. Thus, I think that IBM's and Microsoft's more-or-less native XML storage systems will be more than niche curiosities, and Oracle will soon have to offer a worthy competitor.

There are three basic parts to the argument:

1. There are applications for which XML offers a superior logical architecture to SQL. These fall into two groups. First, there are apps in traditional categories -- CRM, SCM and so on -- that don't have naturally concise relational schemas. We can say that the natural schema is highly variable, or we can say that the overarching schema that takes this variability into account is horrifically complex. Either way, stuffing these apps into a relational straitjacket causes a lot of unnecessary grief.

Second, there are apps that deal with new kinds of complex, dynamic documents. Before XML, either these documents didn't exist at all or their processing couldn't be fully automated.

2. For many of these applications, native XML storage is more efficient than traditional relational storage. Before Microsoft's and IBM's recent announcements, there were two ways to store XML in a relational database. First, since an XML document is a string of characters, you could stick it in a Clob, or Character Large Object. But updating or retrieving specific data values inside the Clob is very inefficient; you basically have to process the whole document.
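
To make the Clob problem concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor's implementation: a SQLite TEXT column stands in for a Clob, and the table name, the sample order document and the update_qty helper are all hypothetical. The point it shows is that changing one small value means fetching, parsing, re-serializing and rewriting the entire document.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumption: a SQLite TEXT column stands in for a Clob column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, doc TEXT)")

# Hypothetical sample document, stored as one opaque character string.
doc = "<order><customer>Acme</customer><item sku='A1'><qty>2</qty></item></order>"
conn.execute("INSERT INTO orders (id, doc) VALUES (?, ?)", (1, doc))


def update_qty(conn, order_id, sku, new_qty):
    """Change one quantity inside a stored document (hypothetical helper)."""
    # 1. Fetch the whole document; the database cannot see inside it.
    (text,) = conn.execute(
        "SELECT doc FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    # 2. Parse the whole document just to reach one element.
    root = ET.fromstring(text)
    for item in root.findall("item"):
        if item.get("sku") == sku:
            item.find("qty").text = str(new_qty)
    # 3. Serialize and write back the whole document.
    new_text = ET.tostring(root, encoding="unicode")
    conn.execute("UPDATE orders SET doc = ? WHERE id = ?", (new_text, order_id))
    # Return how many characters were read and written for a one-value change.
    return len(text), len(new_text)


chars_read, chars_written = update_qty(conn, 1, "A1", 5)
```

In this toy case the document is tiny, but the work done grows with the size of the document rather than with the size of the change, which is the inefficiency described above.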