Database sea change drawing near

08.04.2005
By Ephraim Schwartz

There is no nonbiological entity that can compare to the human brain's ability to process massive amounts of different types of data, integrate it, analyze it, and respond to it in a split second. But to hear the major database vendors tell it, they're coming close.

Major changes in database technology on the way from vendors such as IBM, Microsoft, Oracle, and Sybase include the ability to store semi-structured and unstructured data in a relational database and the convergence of OLTP and analytics into a single database that handles both transactions and analysis simultaneously. Lower total cost of ownership (TCO) will also be a central theme in the next generation of database technology.

The marriage of analytics and OLTP, in fact, goes hand in glove with the move by IT leaders to consolidate systems in the datacenter, according to Barry Shuler, CTO at Marriott International.

"We want to store and retrieve data faster and take it out of a warehouse. There is no need to have separate reporting from the transaction system. We would want to run the transaction system and reporting system on a single physical entity," Shuler said.

In an exclusive to InfoWorld, Sybase revealed that this fall it will release ASE 15 (Adaptive Server Enterprise), an OLTP relational database that will include a sophisticated query processing engine with algorithmic similarities to the analytics done by Sybase IQ, its data warehousing tool.

"It"s not replacing IQ, but it is expanding what ASE can do in a mixed environment," said Tom Traubitz, senior marketing manager at Sybase.

By mixed environment Traubitz is referring to changing requirements in telecommunications, health care, and retail. In these industries and others, customers want a database to process, read, and write data, as is typical of an OLTP database, while simultaneously retrieving and analyzing the data as happens in a data warehouse.

The database paradigm is shifting from request-response to triggers that emit messages when something interesting happens, according to Traubitz. As the database becomes more of an active partner in the environment, sending messages to applications, in three to five years customers can expect to see a technology that Traubitz calls "streams."

"The concept here is that you define the question in advance then you file the data past the question. The data is like a stock ticker going by," Traubitz said.

Microsoft also is moving slowly and steadily toward a convergence of OLTP and analytics, according to Tom Rizzo, Microsoft's director of product management for SQL Server. The forthcoming version of SQL Server, due in the second half of 2005, will have analysis services on top of relational rows and columns, as well as non-relational unstructured data, Rizzo said. Within the ETL (extraction, transformation, and loading) tool, Microsoft will integrate the analytics right into the pipeline as the data flows in real time, he added.

Microsoft will also add to SQL Server what it calls "proactive caching." Under this design, as data comes into the OLTP system, it is cached and aggregated according to standard statistical functions such as "sum," "deviation," "min and max," "average," and "ceiling."

"We can slice the data according to whatever you tell us," Rizzo said.

While all the database vendors say they can do this now, Donald Feinberg, vice president and distinguished analyst at Gartner, said it just isn't so.

"They are dreaming," said Feinberg, who points out that it can be done on a small to medium-size database, but as you get to an enterprise data warehouse and thousands of OLTPs, the database can"t do both -- just yet.

"It won"t be possible [in a large database] until 2010," Feinberg added.

Jerry Kerr, senior research analyst at the Standish Group, is also doubtful that the full benefits of a converged database will materialize in the next year or two. He warns that this is not just a software problem but a hardware issue as well.

Kerr said that maximum memory sizes, processor speeds, and I/O rates all have to increase before a database can achieve an acceptable level of performance for simultaneous OLTP and data warehousing.

XML for the near term

However, new technology emerging this year offers the capability to store data in its native XML format in a relational database.

With compliance driving the enterprise to store e-mails, Weblogs, and someday perhaps every phone call made at the office, the need to store data that does not conform to columns and rows is critical.

"Data that was low value, like e-mail messages, is now high value," said Andy Mendolsohn, senior vice president of database server technologies at Oracle.

Oracle"s next database release will support XML Query language called XQuery, a new W3C standard for native XML access.

The next version of IBM DB2, expected out by 2006, and Microsoft SQL Server 2005, due to ship in the second half of 2005, will also support XQuery. All of these databases will allow mixed SQL and XML queries.
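
As an illustration of what a mixed query might look like, the sketch below embeds an XQuery expression inside a SQL statement in the style of the SQL/XML standard's XMLQUERY function. The table, column, and path names are invented, and each vendor's actual syntax differs (SQL Server 2005, for example, exposes XQuery through methods on its xml data type), so treat this only as a shape of the idea.

```python
# Illustrative mixed SQL/XML query in the style of the SQL/XML standard's
# XMLQUERY function. Table, column, and path names are invented, and the
# exact syntax varies by vendor; the string would be passed to the
# database through its normal driver.
MIXED_QUERY = """
SELECT o.order_id,
       o.order_date,
       XMLQUERY('$d/purchaseOrder/shipTo/city'
                PASSING o.order_doc AS "d") AS ship_to_city
FROM   orders o
WHERE  o.order_date >= DATE '2005-01-01'
"""

if __name__ == "__main__":
    # e.g. cursor.execute(MIXED_QUERY) with a vendor-specific driver
    print(MIXED_QUERY)
```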

Making that integration effort even easier will be the next version of Office, which will make XML the default file format in Word, PowerPoint, and Excel.

Once native XML data is supported in a relational database, the full fidelity of the XML data model becomes available, said Peter Kelly, senior analyst with the Burton Group.

"You will be able to put a digital signature on the whole document. You can have a purchase order or RFQ, without having to transform it, which certifies the document has not been tampered with," said Kelly.

However, compliance also touches privacy issues, as in HIPAA compliance, and that brings yet another challenge to database vendors, according to Kerr at the Standish Group.

The medical industry hasn't moved to electronic storage of patient records largely because of security and privacy issues.

"Within a single record you need different levels of security. Some people may need to access some fields but not the entire record," said Kerr.

Kerr points out that no standard database available today offers this capability as a built-in security layer.
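
In the absence of such a feature, the usual workaround is to enforce field-level access in views or in the application layer. Here is a minimal sketch of the application-layer approach; the roles, fields, and sample record are invented for illustration.

```python
# Sketch of field-level access control enforced outside the database,
# since no standard database offered it natively at the time. Roles,
# fields, and the sample record are invented for illustration.
PATIENT_RECORD = {
    "patient_id": "P-1001",
    "name": "Jane Doe",
    "blood_type": "O+",
    "diagnosis": "hypertension",
    "billing_code": "99213",
}

# Which fields each role is allowed to see within a single record.
FIELD_ACCESS = {
    "billing_clerk": {"patient_id", "name", "billing_code"},
    "nurse": {"patient_id", "name", "blood_type"},
    "physician": set(PATIENT_RECORD),   # full record
}

def redact(record: dict, role: str) -> dict:
    allowed = FIELD_ACCESS.get(role, set())
    return {field: value for field, value in record.items() if field in allowed}

if __name__ == "__main__":
    print(redact(PATIENT_RECORD, "billing_clerk"))
    print(redact(PATIENT_RECORD, "nurse"))
```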

Cutting out costs

Most analysts agree that lowering the cost of supporting a database is another big driver for change.

"The biggest cost is people," said Feinberg at Gartner.

IBM appears to agree. Jeff Jones, director of strategy at IBM Information Management Software, says databases need to require fewer DBAs.

"Each version of DB2 will require fewer DBAs until not one is required," said Jones.

That may be a bit optimistic, but according to Feinberg, if DBAs can be freed from the day-to-day management of the database, they can be redeployed to more productive areas such as schema design and performance optimization. Microsoft will have some optimization technology in SQL Server 2005, according to Rizzo.

A "closed loop system" will optimize a query by capturing statistics about that query and trying different optimization plans the next time the query is used.

Until now, Oracle has focused on automating the management side of the database. Oracle 10g includes automatic backup and recovery, memory management, statistics gathering, and storage management. The next version will focus on self-diagnostics, self-tuning, and self-repair, said Mark Townsend, senior director of database product management at Oracle.

IBM is investing a "huge amount" in taking DB2 to the point where it automatically improves performance without a DBA adjusting SQL, according to Jones. DB2 Version 8.2 includes features for automatic performance optimization and automatic configuration when the database is set up during installation. Future versions will also focus on self-diagnostics and self-healing.

The reason the battle among database vendors is being waged so fiercely, according to Feinberg, is the belief that whichever vendor controls, supports, and stores all the data gains the power to leverage its other products and services.