Viper set to strike

08.07.2005
By Ed Scannell

With the delivery in 2006 of the next version of DB2, code-named Viper, IBM's quest for a database that hooks up structured and unstructured data and that supports queries by SQL and XQuery might reach its end. At least that is the belief of Janet Perna, general manager of IBM Software's information management division. Perna sees Viper, expected to enter formal beta testing by September, as a big step forward technologically, as big a leap as the one made in going from hierarchical to relational databases. However, she says this transition should be significantly easier for larger IT shops because Viper is already built on relational technology and contains a much heavier dose of XML, which will make integration and development easier.

Perna sat down with InfoWorld Editor at Large Ed Scannell to discuss the upcoming product, which is expected to go head to head with archrival Oracle's flagship database as well as fend off Microsoft's efforts on the low end, and its strategic importance to Big Blue's overall integration strategy.

InfoWorld: Will Viper finally be the crystallization of the long-held dream of integrating structured and unstructured data?

Perna: Yes, I think so. First, you must understand that XML is pervasive to everything we are doing in information management, whether it involves managing structured and unstructured information, or as a transport, or mapping standards for integrating information. It becomes part and parcel of things.

InfoWorld: So XML is fundamentally responsible for next-generation databases, like Viper, taking a giant step forward in their evolution?

Perna: Yes. It will be as dramatic as when we went from hierarchical to relational databases. Now, did that happen overnight? No. The world didn't just turn and go to relational on day one. But when it did, it was very significant. This will be as revolutionary as that move was.

InfoWorld: What do larger IT shops need to consider from a technology standpoint before they commit to Viper?

Perna: [Viper] will be easier than the move to relational because it is already built on a relational base. All the same skills can apply, such as administration and the tooling. What has changed is largely under the hood. So the question is how quickly will they move to building a lot of [XML-based] applications and have a lot of XML documents. It is beginning to evolve right now, but everyone won't be doing it on day one. However, on day one anyone can adopt Viper without changing their skills, people, and tools. They will also get a performance benefit, and an ease-of-administration benefit. And they will have the ability to build applications that marry structured and unstructured data.

InfoWorld: Do you see XML as increasingly important to marrying structured and unstructured data? Has XML progressed enough over the past three or four years to make that happen?

Perna: Yes, because there will be more and more XML content being produced and delivered. Why did people go from hierarchical to relational databases? Well, it was easier to design for and the query language made it very easy to get information out of the database. You look at XML in relational [databases] today -- does it work? Sure. Can we make it easier to use and perform better? Sure. And as more and more XML content is created, that will become more important. As we build out our metadata and metadata management capabilities in DB2, XML will be a key underpinning of that.

InfoWorld: Why did the purebred XML databases not succeed?

Perna: I look at them in the same way I did the pure object databases. Part of the reason was they did not have the robustness from an availability and scalability point of view. Could they get there? Sure. But it is a matter of where do you start. Do you start here with a pure object or pure XML database and put all the other stuff in it? Or do you start over here on a relational database and put the XML in?

InfoWorld: What is front of mind for CXOs when you talk to them today?

Perna: When I talk to them it is about cost reductions, or faster application development for an SOA architecture, or information integration. In some cases these issues are not just the issues among CIOs but among the CEOs, which to me is amazing. It is surprising to me how information integration hopped up the list as one of the things, along with revenue growth and employee productivity, as a top concern for them.

InfoWorld: How cooked is Viper now in terms of its features? Are you going to add in more capabilities as it goes through the first beta, as you are doing with range partitioning?

Perna: There will be other things added in Viper. There will be more autonomic stuff, like self-management and self-healing capabilities. The range partitioning capabilities are really about performance and ease of administration and ease of design for massively parallel applications. It is not just a performance feature. It is an important feature if you are an Informix user. We have not had range partitioning for the last number of years, but not having it in there has not slowed us down.

InfoWorld: If it hasn't slowed you down, why is it so important?

Perna: If we want to continue to move Oracle users to DB2, this is a feature we need to have. There are many different partitioning techniques and this is one technique we didn't have. This is a good thing for us.

InfoWorld: Any other technologies besides partitioning and autonomic?

Perna: There will always be performance things we can do. But there will be security enhancements you'll see and other technologies we are building right now. When you look at what the requirements are for a database engine, they are pretty much the same as they have been for 40 years: performance, scalability, availability, reliability, security, and the ease of building the applications. That's it. Look at the types of applications customers want to support -- OLTP continues to be important, as are data warehousing and complex querying. They continue to drive a lot of database growth.

And as companies optimize their business processes and start to do analytics in real time, they will need a database platform that can ingest information at transaction speeds. Companies with only a data warehousing platform will find the transaction part of real-time analytics difficult.

InfoWorld: Besides getting Viper into beta testing, what else has been a focus for you?

Perna: I'm actually spending a lot of my time going beyond the database, because as companies are looking at how they deal with all of their information assets, that is what they are focusing on as well. If we look at the universe of information, historically, most companies and most of the data providers have been focused at the data level on capturing transaction-oriented data, and then focused on the efficient storage and management of that data. I call that passive because not much is happening with the data. As long as we backed it up and recovered it and did so cheaply, that was the name of the game. But now companies are trying to use that data to gain insight into their business. An average Fortune 500 company last year had 177 terabytes of storage, and that is up from 7 terabytes in 1996. Their raw storage and data is growing tremendously and some 85 percent of it is unstructured data. So the challenge then is how do we take this data and integrate it with other data, other sources of information in order to gain some insight?

InfoWorld: What has been your bigger picture acquisition strategy over the past few years? You have been rather busy.

Perna: Yes, we have. The acquisitions I have been making the last few years have gone toward filling out this technology framework. With Cross Access it was for mainframe databases that could do federation, with Formatica it was around content repositories. Ascential is centered around things like data cleansing, data profiling, and metadata management. With Alphablox and SRD it was around embedded. Green Pastures was for document management and Trigo for reference information for products.

InfoWorld: How does Viper fit into all this?

Perna: It is under the hood of everything we are building. It sits here in the database part, but if you look at our content management repository, it is used as a metadata store for the content, so it is under the hood of the Content Manager.