StreamBase's Stonebraker touts streaming apps

11.01.2005
By Paul Krill
Paul Krill is an editor at our U.S. sister publication, InfoWorld.

Mike Stonebraker has had a well-traveled career in IT, specifically in the area of data management. He was the main architect of the Ingres relational database, the Postgres object-relational database, and the Mariposa federated data system. He was founder and CTO of the Ingres, Illustra, and Cohera corporations as well as CTO at Informix and Required Technology. His latest project is StreamBase Systems Inc., where he is founder and CTO. StreamBase is attempting to ride what it believes will be a great wave of need for high-speed streams-based data applications. Stonebraker recently met with InfoWorld Editor at Large Paul Krill in Pacific Grove, Calif., to discuss StreamBase and recent goings-on with Stonebraker's previous projects. Stonebraker also provided his views on the open source phenomenon and Oracle Corp.'s acquisition of PeopleSoft Inc.

InfoWorld: What are StreamBase's goals?

Stonebraker: This is a commercialization of an academic prototype and Stan Zdonik, who's (professor of computer science) at Brown University, and I four years ago basically recognized it. If you want to do real-time stream processing such as Wall Street does bunches of, they're badly served by all (current) system software. So we set about building a new piece of system software from the ground up that's very good at inhaling fire hoses of incoming data and doing fairly complicated processing on it.

InfoWorld: You say it's system software. If it's not analogous to a database, what would it be analogous to?

Stonebraker: When I say system software I mean things like database systems, application servers, messaging systems. There are big differences between what a database system does and what we do. For instance, electronic trading is driving up the feed data rates on all the exchanges because the electronic trading systems are very good at probing the market. So data rates are exploding on Wall Street, trades are going up, and electronic trading says "Do it right now." And right now means millisecond response time, inhaling fire hoses of data. We have an engine that's good at this. We're (currently) selling almost exclusively in financial services. We're a startup. We're a 25-person company. We're very well-funded. The financial services (industry) is willing to take risks on startups. They're willing to deal with new technology. Other places where there are big applications (for StreamBase) are in military and homeland security.

InfoWorld: Why those?

Stonebraker: We did a prototype that dealt with army battalion monitoring. When an army battalion is 30,000 humans and 12,000 vehicles, the army is deadly serious about getting a vital signs monitor on every one of the humans so they can do combat medical triage or (take other actions). They already have a GPS system in every vehicle, but that didn't keep Jessica Lynch's convoy from getting lost.

They want to turn this into a system to watch the position of every vehicle and compare it against where you're supposed to be. They also want to put a sensor on the gun turret. Together with position, that allows you to detect crossfire, which is a big problem in Iraq. (Also,) they want to put a monitor on the gas gauge and figure out do you have enough fuel to accomplish your mission. It's this style of application which is large amounts of real-time data with real-time actions to take.

InfoWorld: StreamBase is not really transactional software, correct?

Stonebraker: No, none of these applications are transactional. This is not a bread-and-butter business data processing model ... The military is a good market because they've got fire hoses of real-time data and want to take real-time actions based on it.

(Other opportunities include) industrial process control, monitoring Cheerios factories, and oil refineries; it's a very conservative market and downstream (these are) something we'll look at, but not right now. Any continuous industrial manufacturing process is monitoring what's happening and wanting to take corrective action before you spit out too much bad product and so it's a ton of sensors watching intermediate steps of the process, and you want to take corrective action if those sensors get out of whack. Another (opportunity) is in the networking space, the denial of service attacks, intrusion detection -- people would love to do it in real time so that they can take corrective action quickly.

Another example is in financial services (where) the fear is that the bad guys that do credit card fraud and identity theft will target financial services networks. So financial services companies want to do things like watch every application-level event in a worldwide network (for) the same customer logged in from two or more IP addresses that look like they're more than a mile apart. (We) watch a complicated network for application-level events that have this or that property, the idea being to look for situations that might be fraud. These are applications which are currently typically done by salting away every event in the database and then looking at it in batch after close of business. That's like locking the hen house after the fox has already been in there. (Users) want to move that to real time and this is another example of network monitoring.
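The "same customer, two IP addresses more than a mile apart" check can be sketched as a single pass over a stream of login events. This is only an illustration in Python, not StreamBase's interface: the `LoginEvent` record, the `detect_suspicious_logins` function, the five-minute window, and the premise that each IP address has already been mapped to a latitude and longitude are all assumptions made for the example.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8


@dataclass(frozen=True)
class LoginEvent:
    """Hypothetical application-level event; fields are assumptions."""
    customer_id: str
    ip: str
    lat: float  # assumed to come from an IP-geolocation lookup
    lon: float
    ts: float   # event time in seconds


def miles_between(a: LoginEvent, b: LoginEvent) -> float:
    """Great-circle (haversine) distance between two events, in miles."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def detect_suspicious_logins(events, max_miles=1.0, window_s=300.0):
    """Yield (previous, current) pairs as soon as a customer appears from a
    different IP more than `max_miles` away within `window_s` seconds.

    Only the most recent event per customer is kept in memory; nothing is
    written to a database before the check runs.
    """
    last_seen = {}
    for ev in events:
        prev = last_seen.get(ev.customer_id)
        if (prev is not None
                and ev.ip != prev.ip
                and ev.ts - prev.ts <= window_s
                and miles_between(prev, ev) > max_miles):
            yield prev, ev
        last_seen[ev.customer_id] = ev


stream = [
    LoginEvent("alice", "198.51.100.7", 42.36, -71.06, 0.0),   # Boston area
    LoginEvent("bob", "203.0.113.9", 37.77, -122.42, 10.0),
    LoginEvent("alice", "192.0.2.44", 40.71, -74.01, 60.0),    # New York area
    LoginEvent("bob", "203.0.113.9", 37.77, -122.42, 90.0),    # same IP, no alert
]
alerts = list(detect_suspicious_logins(stream))
```

Because the function is a generator, an alert is produced the moment the second login arrives rather than in a batch after close of business.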

There's a sea change in micro-sensor technology of various sorts. You've probably heard the most about RFID which (I think is one of the least) interesting technologies. But what's going to happen is that everything of material significance is going to get sensor-tagged by one or another technology in the next decade or so, and it's going to report its state or location in real time and that's going to generate a (great) deal of new monitoring applications.

(Another example is E-ZPass toll road systems.) What's going to happen fairly quickly is that your E-ZPass system is going to position you in real time. That will allow the turnpike authorities to do congestion-based tolling, which is when how much to charge you in toll depends on how many other people are trying to use the same road.
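To make the idea of congestion-based tolling concrete, here is a toy pricing rule in Python. Every number and name in it (`congestion_toll`, the base toll, the free-flow capacity, the per-vehicle surcharge, the cap) is invented for illustration; the interview does not describe any actual pricing formula.

```python
def congestion_toll(vehicles_on_segment,
                    base_toll=1.00,
                    free_flow_capacity=100,
                    surcharge_per_extra_vehicle=0.01,
                    max_toll=5.00):
    """Return a toll in dollars that rises with the number of vehicles
    currently positioned on the same road segment.

    All parameters are hypothetical. Below `free_flow_capacity` the base
    toll applies; each vehicle beyond that adds a fixed surcharge, up to
    `max_toll`.
    """
    extra = max(0, vehicles_on_segment - free_flow_capacity)
    toll = base_toll + extra * surcharge_per_extra_vehicle
    return min(max_toll, round(toll, 2))


quiet_road = congestion_toll(50)      # under capacity: base toll only
busy_road = congestion_toll(200)      # 100 vehicles over capacity
gridlock = congestion_toll(10_000)    # surcharge is capped
```

The input to such a rule is the live vehicle count per segment, which is exactly what real-time positioning would supply.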

InfoWorld: How does your software work?

Stonebraker: What we do is we read TCP/IP streams. We produce asynchronous messages to TCP/IP. The messages we produce, the customer has to write an application that consumes them. They give you an API so you don't have to directly use TCP/IP so they can use it with an (app). In financial services, there's a dozen or so popular feed formats and we've written adapters for most of them to convert to our internal format. So in comes the market feed, this is an asynchronous message stream.

One way to think about it is that we insist that it obey a database-style (scheme) so that we read binary messages off of the wire and if they're not in our format, then there's a converter that converts them. We give you a workflow-oriented GUI with a bunch of primitives and you assemble an application by dragging and dropping all of our primitives onto a workspace and then you run the workspace.

InfoWorld: So where is the data stored, if necessary?

Stonebraker: Any database system has an architecture where it's the big oil can in the middle called storage. In comes data into storage, once you store it, commit the transaction, index the data, write a log record, you can do any outbound query processing you want. So basically storage is in the latency loop and processing comes after storage and we call that outbound processing. We do not do that.

What we do is we do the same style of processing on the bytes as they fly by in virtual memory. There's no requirement that you store the data ... We are not a database system. We are a stream-processing engine that is good at doing processing on streams as they fly by. Not requiring storing the data makes a certain class of applications go way, way faster ... Our implementation (in one particular application) runs 140,000 messages a second on a $1,500 PC. And that took a few days total to implement. We tried the same application on one of the elephant RDBMSs, (meaning large, commercially available relational databases). The best we could get it to was 900 messages a second.
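The contrast between processing after storage and processing "as they fly by" can be illustrated with a small Python generator. This is a sketch of the general technique only, not of StreamBase's engine: the function name `moving_average` and the window size are assumptions. Each incoming value is folded into a running result immediately, and only the last few values are held in memory. For scale, the figures quoted above work out to 140,000 / 900, or roughly 155 times the throughput.

```python
from collections import deque


def moving_average(prices, window=3):
    """Yield the mean of the most recent `window` prices as each price
    arrives. Nothing is stored beyond the current window."""
    recent = deque(maxlen=window)
    total = 0.0
    for price in prices:
        if len(recent) == window:
            total -= recent[0]   # value about to fall out of the window
        recent.append(price)     # deque drops the oldest item automatically
        total += price
        yield total / len(recent)


ticks = [10, 20, 30, 40]         # stand-in for an unbounded market feed
averages = list(moving_average(ticks, window=3))
```

In a store-first design the same result would require inserting each tick, committing, and then querying; here the answer is available as soon as the tick has been read.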

InfoWorld: Where does XML fit in this?

Stonebraker: XML is the last thing in the world we would ever do.

InfoWorld: Why?

Stonebraker: We have an adapter for XML so that if you have an XML stream coming in, we convert it on input to something that's high-performance. If we internally processed XML, it's just way, way too slow and there are two reasons. One is it's a self-describing format so messages get really big and then every box has to parse every message to figure out what it means ... I don't mean to be glib, but all of our customers, if you mention the word XML they just laugh. There's no way to run 100,000 messages a second in XML.
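The two costs named here, message size and per-message parsing, can be seen in a small Python comparison. StreamBase's internal format is not described in the interview, so the fixed binary layout below (`TICK_LAYOUT`) and the sample tick are assumptions chosen only to show the difference between a self-describing message and one whose schema is agreed in advance.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical fixed layout: 8-byte symbol, 8-byte float price, 4-byte size.
# "<" means little-endian with no padding, so every record is 20 bytes.
TICK_LAYOUT = struct.Struct("<8sdI")

xml_message = b"<tick><symbol>IBM</symbol><price>95.25</price><size>100</size></tick>"
binary_message = TICK_LAYOUT.pack(b"IBM", 95.25, 100)


def decode_xml(message):
    """Self-describing: field names travel with every message and the
    text must be parsed before any value can be used."""
    root = ET.fromstring(message)
    return (root.findtext("symbol"),
            float(root.findtext("price")),
            int(root.findtext("size")))


def decode_binary(message):
    """Schema known in advance: fields sit at fixed offsets."""
    symbol, price, size = TICK_LAYOUT.unpack(message)
    return symbol.rstrip(b"\x00").decode("ascii"), price, size


same_values = decode_xml(xml_message) == decode_binary(binary_message)
size_ratio = len(xml_message) / len(binary_message)
```

Both decoders recover the same three values, but the XML form carries its tag names in every message and needs a full parse at every processing step, which is the overhead an input adapter removes once, at the edge.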

InfoWorld: Is there any kind of open source story for StreamBase?

Stonebraker: We embed Berkeley DB, which is an open source storage engine. We have extensibility in C++; you can use whatever C++ system you want for extensibility. This is not an open source engine.

InfoWorld: There are no plans to make StreamBase open source?

Stonebraker: Not right now, no. Since we can get up applications in small numbers of days, the business model is basically find a customer who's interested, and we volunteer to write a pilot program on our nickel. We say "Point us at your hardest problem," then we'll go away and we'll pilot it for you and come back and show it to you. If you like what you see, we'll talk some more; otherwise we'll go away.

InfoWorld: You've worked with Informix, Ingres, Postgres, now you're doing this. Ingres and Postgres are open source and IBM has purchased Informix. What are your feelings about what's happened with all those technologies?

Stonebraker: I think Ingres suffered from benign neglect under the tutelage of Computer Associates. (Going open source) is a very reasonable marketing strategy on the part of CA to try and get some traction for Ingres. I just wonder if it's too little, too late. It's a rock solid relational database system. ... I think it's a very good relational database system that is very, very competitive with products like MySQL.

InfoWorld: What about Postgres?

Stonebraker: Postgres (also known as PostgreSQL) has been an open source DBMS forever and it has a dedicated following that is maintaining it, upgrading it. There's a grassroots effort that keeps Postgres going. It would be great if somebody with some marketing muscle got behind pushing Postgres, because I think it's a very high-functionality system that is very attractive if you need an extendible relational database system. It's a very good system. Ingres is not extensible in the same way. Postgres has a very natural market niche that it appeals to. It's alive and well and there's quite a cadre of people who love it.

InfoWorld: What about Informix, which you got involved in through the acquisition of Illustra?

Stonebraker: I think it's so blended into IBM nobody really knows what color's what at this point. Informix was sold to IBM and so IBM inherited two code lines. They inherited the Illustra code line and they inherited all the Informix code line. You'd have to ask them what their plans for these code lines are. ... I think DB2 has a lot of the good features from Informix Version 9.0.

InfoWorld: Such as?

Stonebraker: The Informix-Universal Server (database) had a very nice time series data blade which was very good at storing historical financial services data, and my understanding is that they've taken that data blade and put it into DB2. They're sort of scavenging Universal Server for stuff and putting it into DB2. As near as I can tell, their strategy is to convert all the Informix users to DB2 and to take what there is good out of the Informix code line and put it into DB2.

InfoWorld: Do you think Oracle is wearing too many hats in being in applications and buying PeopleSoft? Do you think they're in over their heads with an acquisition like that, or do you just wish them well on it?

Stonebraker: Buying PeopleSoft to me is (in line with) their goal to become a big player in the application market. As near as I can tell, their market strategy is (to be) all things to all people. This requires them to have good execution, as well as an increasingly large span of areas of expertise. ... Their challenge is to integrate PeopleSoft technology without alienating the customer base. They've got their hands full. So I wish them well. If I look out into the future, I think there's going to be half a dozen or less software elephants, certainly, to include Microsoft, SAP, and Oracle.

InfoWorld: We have three others left then.

Stonebraker: And there may be some others, maybe BEA. It'll be interesting to see because I think increasingly the objective is to provide one-stop shopping over everything, all of enterprise software.

InfoWorld: What do you think of the open source movement? What do you think that means for commercial software companies if people are looking for the stuff for free?

Stonebraker: I think all the more power to them. I just chuckle that Microsoft is fighting Linux as hard as they can and so I think open source is attractive and I think that Linux is a fabulous model.