Market researcher revs up data warehouse grid

19.10.2006

Before its recent move to Oracle grid technology, R.L. Polk stored most of its data in Oracle 9 or 10 databases running on Sun Solaris servers, which were connected to EMC storage gear in storage-area networks.

Now, R.L. Polk's grid consists of 100 two- and four-way servers, all running Red Hat Enterprise Linux. Besides the data warehouse, the grid serves up applications and powers the rule-processing engine. It can "easily double" to 200 servers, leaving room for growth.

Only a tiny portion of the grid, four four-way servers, is currently allocated to the data warehouse. Much of the rest runs R.L. Polk's new Web-based applications, which both import data into the data warehouse from 260 discrete sources, such as car dealers and state licensing boards, and stream it out to paying customers, such as carmakers, car dealers and parts suppliers.

The data warehouse, a massive database, serves as R.L. Polk's "single source of truth." It holds records on 500 million individual cars, almost 85 percent of all cars in the world as of 2002, along with data on 250 million households and 3 billion transactions.

R.L. Polk cleanses the names and addresses in all incoming records, adds location data such as latitude and longitude, and uses the 17-character vehicle identification number (VIN) unique to every car to derive that car's individual features and styling. It's a complicated process, but as his team continues to tune the Oracle grid engine, Vasconi expects to shorten the import time to less than 24 hours.
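The enrichment steps described above (cleansing names and addresses, attaching location data, and decoding the VIN) can be illustrated with a minimal Python sketch. It is not R.L. Polk's code: the lookup tables `ZIP_TO_LATLON` and `WMI_TO_MAKE`, their placeholder entries, and the functions `cleanse` and `enrich` are all hypothetical, and real geocoding and VIN decoding rely on far larger reference datasets.

```python
import re

# Hypothetical placeholder reference tables; the article does not describe Polk's real data.
ZIP_TO_LATLON = {"00000": (0.0, 0.0)}
WMI_TO_MAKE = {"XYZ": "ExampleMake"}

# A VIN has 17 characters; the letters I, O and Q are never used.
VIN_PATTERN = re.compile(r"^[A-HJ-NPR-Z0-9]{17}$")


def cleanse(text: str) -> str:
    """Trim, collapse internal whitespace and upper-case a name or address field."""
    return " ".join(text.split()).upper()


def enrich(record: dict) -> dict:
    """Return a cleansed copy of a record with location and VIN-derived fields added."""
    out = {
        "name": cleanse(record["name"]),
        "address": cleanse(record["address"]),
        "zip": record["zip"].strip(),
    }
    # Attach latitude/longitude when the ZIP code is known; otherwise None.
    out["lat_lon"] = ZIP_TO_LATLON.get(out["zip"])

    vin = record["vin"].strip().upper()
    if not VIN_PATTERN.match(vin):
        raise ValueError(f"invalid VIN: {vin!r}")
    out["vin"] = vin
    # ISO 3779 splits a VIN into WMI (positions 1-3), VDS (4-9) and VIS (10-17).
    out["wmi"], out["vds"], out["vis"] = vin[:3], vin[3:9], vin[9:]
    # Look up the manufacturer from the (hypothetical) WMI table.
    out["make"] = WMI_TO_MAKE.get(out["wmi"], "UNKNOWN")
    return out


if __name__ == "__main__":
    print(enrich({"name": "  jane   doe ", "address": "1 main st",
                  "zip": "00000", "vin": "xyz12345678901234"}))
```

Rejecting malformed VINs up front, rather than guessing at a decode, keeps bad records out of the warehouse; a production pipeline would more likely route them to a review queue than raise an exception.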