Large Data Set Analysis in the Cloud: Hadoop gets a boost

09.04.2009

The interesting thing about the two Hadoop offerings is that each brings something unique to the table. Amazon removes the need for a Hadoop user to locate spare computing resources, always a tough task in a typical corporate data center. I mean, who has 15 or 20 machines sitting around idle, just waiting to be used for Hadoop? Cloudera's offering, on the other hand, avoids the need to upload large amounts of data to Amazon, a challenge given the limited bandwidth available to most companies. Using Amazon's offering also imposes data movement costs, since Amazon charges for data transferred in and out of AWS.

I suspect that both offerings will prove popular. Each will be used by companies grappling with the need to analyze Internet-scale data, and the constraints of a particular project or company will determine which solution is preferred. In fact, I would not be surprised to see many companies embrace both approaches to Hadoop once they begin to understand its power.

Bernard Golden is CEO of consulting firm HyperStratus, which specializes in virtualization, cloud computing and related issues. He is also the author of "Virtualization for Dummies," the best-selling book on virtualization to date.