What Hadoop can and can't do

14.06.2012

That's not to say you can't use Hadoop for structured data. In fact, many solutions take advantage of Hadoop's relatively low storage cost per terabyte to store structured data there instead of in a relational database management system (RDBMS). But if your storage needs are modest, shifting data back and forth between Hadoop and an RDBMS would be overkill.

One area where you would not want to use Hadoop is transactional data. Transactional data is complex by nature: a single transaction on an e-commerce site can involve many steps that all have to be executed quickly. That scenario is not at all suited to Hadoop.

Nor would Hadoop be a good fit for structured data sets that require very low latency, such as when a Web site is served by a MySQL database in a typical LAMP stack. Hadoop serves that kind of speed requirement poorly.

What Hadoop can do

Because it is built for batch processing, Hadoop should be deployed in situations such as index building, pattern recognition, recommendation engines, and sentiment analysis: all cases where data is generated in high volume, stored in Hadoop, and queried at length later using MapReduce jobs.
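To make the batch model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases applied to index building (an inverted index mapping each word to the documents that contain it). It is plain Python, not Hadoop's API: the function names map_phase, shuffle, reduce_phase, and build_inverted_index are hypothetical, and a real Hadoop job would run these phases across a cluster and read its input from HDFS rather than from an in-memory dict.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical, single-process illustration of MapReduce; NOT the Hadoop API.

def map_phase(doc_id, text):
    """Map: emit a (word, doc_id) pair for every word in one document."""
    for word in text.lower().split():
        yield word, doc_id

def shuffle(pairs):
    """Shuffle/sort: group emitted values by key, as the framework does between phases."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]

def reduce_phase(word, doc_ids):
    """Reduce: collapse one word's doc ids into a sorted, de-duplicated posting list."""
    return word, sorted(set(doc_ids))

def build_inverted_index(documents):
    """Run map, shuffle, and reduce over a {doc_id: text} dict in one process."""
    mapped = (pair for doc_id, text in documents.items()
              for pair in map_phase(doc_id, text))
    return dict(reduce_phase(word, ids) for word, ids in shuffle(mapped))

if __name__ == "__main__":
    docs = {
        "d1": "hadoop stores data",
        "d2": "mapreduce processes data in batch",
    }
    index = build_inverted_index(docs)
    print(index["data"])  # ['d1', 'd2']
```

Because each map call sees only one document and each reduce call sees only one word, both phases can be spread over many machines. The trade-off is that results arrive only after the whole batch finishes, which is why this model suits index building rather than the low-latency lookups of a LAMP-style Web site.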