Firm uses Oracle app to call airfare trends

19.12.2006
At first glance, Farecast.com Inc.'s claim that its Web site can predict with 75 percent accuracy whether a particular airfare is going to rise or fall in the next seven days doesn't sound that impressive. Isn't flipping a coin accurate 50 percent of the time?

The Seattle firm uses a finely tuned data mining engine to analyze more than 150 billion actual airfare price quotes, collected over the past 18 months for flights to and from 75 major U.S. cities, to come up with its predictions. And garnering those extra 25 percentage points of accuracy apparently really matters: More than a million unique would-be fliers have tried the free Farecast.com service since August.

"It's a very complex problem," said Jay Bartot, vice president of technology at Farecast.com. "Our data mining engine is very large and sophisticated. We do a lot of post-processing, deriving new data from our existing data, which is then fed into our predictive engine."

In other words, Farecast.com constantly generates airfare predictions on its own, in addition to those requested by consumers. It then checks the results against the actual price quotes generated by the airlines, which lets Farecast.com measure how accurate it really is and further fine-tune its data mining operation.
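The self-checking loop described above boils down to scoring each "rise or fall" call against the fare actually observed seven days later. The sketch below is purely illustrative and assumes a hypothetical `ScoredPrediction` record; the article does not describe Farecast.com's real data model or scoring rules. It also shows why 75 percent is meaningful: a coin flip would be expected to score about 0.5 on the same test.

```python
from dataclasses import dataclass


@dataclass
class ScoredPrediction:
    # Hypothetical record; field names are illustrative, not Farecast.com's schema.
    route: str                 # e.g. "SEA-BOS"
    fare_at_prediction: float  # fare quoted on the day the prediction was made
    predicted_rise: bool       # True if the model said the fare would go up
    fare_after_7_days: float   # fare actually observed seven days later


def directional_accuracy(predictions):
    """Return the fraction of predictions whose rise/fall call matched reality.

    Assumption: an unchanged fare counts as "did not rise".
    """
    if not predictions:
        raise ValueError("no predictions to score")
    hits = 0
    for p in predictions:
        actually_rose = p.fare_after_7_days > p.fare_at_prediction
        if actually_rose == p.predicted_rise:
            hits += 1
    return hits / len(predictions)


# Usage: three of these four made-up calls are right.
sample = [
    ScoredPrediction("SEA-BOS", 300.0, True, 320.0),   # rose, predicted rise: hit
    ScoredPrediction("SEA-JFK", 250.0, False, 240.0),  # fell, predicted fall: hit
    ScoredPrediction("LAX-ORD", 180.0, True, 170.0),   # fell, predicted rise: miss
    ScoredPrediction("SFO-DEN", 120.0, False, 110.0),  # fell, predicted fall: hit
]
print(directional_accuracy(sample))  # → 0.75
```

Running this kind of check continuously over self-generated predictions gives a running accuracy figure that can be compared against the 50 percent coin-flip baseline.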

Started in 2003, Farecast.com was spun out of research by professors at the University of Washington and the University of Southern California, yielding what is essentially a business intelligence service for consumers. The choice of database technology was key, and the company experimented with several open-source databases, including PostgreSQL and Berkeley DB, before initially settling on MySQL.

Even so, as Farecast.com neared a launch date, Bartot worried. "We knew we would have to scale out in a major way. I had read some stories about companies doing huge rollouts of MySQL clusters, but in at least one case relevant to us, it turned out to be more of an experiment," Bartot said.

Having had experience with Oracle at previous jobs, Bartot decided to move off MySQL to an Oracle 10g-based grid.

Farecast.com now runs a four-node cluster using Real Application Clusters, Oracle Partitioning and Enterprise Manager 10g. Each node is running SUSE Linux Enterprise Server with two dual-core AMD Opteron 275HE processors, 8GB of DDR 3300 memory and remote NFS-attached storage.

Farecast.com dedicates one node to ad hoc queries. Another node handles administrative tasks. A third node handles the key task of loading data. With information-sharing agreements with all of the major domestic carriers except Southwest Airlines, Farecast.com adds more than three billion airfare quotes each month. That data arrives around the clock in XML format via a provider called ITA Software Inc. before being transformed into SQL and other formats by Farecast.com's in-house tools. It is then loaded into the Oracle data warehouse.
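The XML-to-SQL step can be pictured with a small sketch. Everything here is an assumption: the `<quote>` element layout, its attribute names and the `fare_quotes` table are hypothetical, since the article describes neither ITA Software's feed format nor Farecast.com's in-house tools, and Python's built-in `sqlite3` stands in for the Oracle warehouse purely so the example runs on its own.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical feed layout; the real ITA Software format is not described in the article.
SAMPLE_FEED = """
<quotes>
  <quote origin="SEA" dest="BOS" carrier="AA" fare="312.40" observed="2006-12-19T08:00:00"/>
  <quote origin="SEA" dest="JFK" carrier="UA" fare="289.00" observed="2006-12-19T08:05:00"/>
</quotes>
"""


def parse_quotes(xml_text):
    """Turn each <quote> element into a tuple ready for a parameterized INSERT."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for q in root.iter("quote"):
        rows.append((
            q.get("origin"),
            q.get("dest"),
            q.get("carrier"),
            float(q.get("fare")),
            q.get("observed"),
        ))
    return rows


def load_quotes(conn, rows):
    """Bulk-insert parsed quotes; sqlite3 is only a stand-in for Oracle here."""
    with conn:  # commits the whole batch as one transaction on success
        conn.execute(
            """CREATE TABLE IF NOT EXISTS fare_quotes (
                   origin TEXT, dest TEXT, carrier TEXT,
                   fare REAL, observed TEXT)"""
        )
        conn.executemany("INSERT INTO fare_quotes VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


# Usage: parse the sample feed and load it into an in-memory database.
conn = sqlite3.connect(":memory:")
print(load_quotes(conn, parse_quotes(SAMPLE_FEED)))  # → 2
```

Parameterized batch inserts (rather than building SQL strings by hand) keep the loader safe from malformed input and let the database handle each batch as a single transaction, which matters when data arrives around the clock.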

Bartot called the Oracle technology "robust" and "a great product" that is easy enough for just two system administrators to handle.

There are rare occasions when usage has spiked enough to cause the Oracle database to lock up, but Bartot said "they are easily things at the application layer we can change to fix, nothing I would characterize as shortcomings" in Oracle.

The 5TB data warehouse, which is compressed to 1TB to fit on disk, is growing fast. According to Bartot, the company is adding more data sources, such as airfares involving non-U.S. cities, in preparation for a likely launch of international price predictions late in 2007.