Accessing the web of databases

02.05.2006
I've just posted the fourth installment (http://www.infoworld.com/4109) in my new series of Friday podcasts. It's an interview with Kingsley Idehen, CEO of OpenLink Software.

The flagship product of OpenLink (http://www.openlinksw.com/) is Virtuoso (http://www.infoworld.com/699), a universal database and application server that I last wrote about in 2003.

I convened the interview mainly to discuss Virtuoso's recent transition to open source (http://www.openlinksw.com/blog/~kidehen/?id=951), but our wide-ranging conversation helped me clarify a theme that's been central to my own work and that I believe will dominate the next phase of the Internet's evolution. The Web is becoming a database -- or, more precisely, a network of databases. All of the trends that inform this column -- including Web services, REST (Representational State Transfer), AJAX (Asynchronous JavaScript and XML), and interpersonal as well as interprocess collaboration -- can be usefully refracted through that lens.

I've always regarded the Web as a programmable data source as well as a platform for the document/software hybrid that we call a Web page. Early on, programmable access to Web data entailed a lot of screen scraping. Nowadays it often still does, but it's becoming common to find APIs that serve up the Web's data. If you want to remix the InfoWorld metadata explorer (http://www.infoworld.com/4110), for example, as Mike Parsons did, you can fetch its data directly as XML.
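Here's a minimal sketch, in Python, of the difference that makes: instead of scraping HTML, you point an XML parser at a data URL and walk the elements. The URL is a stand-in for illustration, not the explorer's actual endpoint.

    # Sketch: fetch a site's data directly as XML rather than scraping its pages.
    # The endpoint URL below is hypothetical.
    import urllib.request
    import xml.etree.ElementTree as ET

    url = "http://example.com/metadata/export.xml"  # hypothetical XML feed
    with urllib.request.urlopen(url) as response:
        root = ET.parse(response).getroot()

    # Walk the top-level elements and print their tags and text.
    for element in root:
        print(element.tag, (element.text or "").strip())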

Free text search is an even more popular access API. Nearly every site provides that service, or outsources it to Google or another engine.
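From a client's point of view, search-as-API usually amounts to a query string sent to a URL and a structured payload coming back. A minimal sketch follows; the endpoint, its "q" parameter, and the XML result format are assumptions for illustration, not any particular site's documented interface.

    # Sketch: treat a site's free text search as an API.
    import urllib.parse
    import urllib.request

    # Hypothetical search endpoint; many sites expose something similar.
    query = urllib.parse.urlencode({"q": "virtuoso open source", "format": "xml"})
    url = "http://example.com/search?" + query

    with urllib.request.urlopen(url) as response:
        results = response.read().decode("utf-8")

    # The payload is whatever structured format the site chooses to return.
    print(results[:500])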

And, of course, sites that act as database front ends support canned queries, the results of which may (if you're lucky) be accessible by way of APIs such as RSS.
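When those results do come back as RSS, consuming them takes only a few lines. Here's a sketch using nothing but the standard library; the feed URL is a placeholder, and the item fields are the standard RSS 2.0 title and link.

    # Sketch: read a canned query whose results are published as an RSS feed.
    import urllib.request
    import xml.etree.ElementTree as ET

    feed_url = "http://example.com/query/recent-articles.rss"  # hypothetical feed
    with urllib.request.urlopen(feed_url) as response:
        root = ET.parse(response).getroot()

    # RSS 2.0 puts each result under channel/item, with a title and a link
    # back to the underlying record.
    for item in root.findall("./channel/item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        print(title, "->", link)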