Apache Hadoop to get more user-friendly

20.07.2011
Relief is on the way for users of the open source Apache Hadoop distributed computing platform who have wrestled with the complexity of the technology.

A planned upgrade to the Hadoop distributed computing platform, which has become popular for analyzing large volumes of data, is intended to make the platform more user-friendly, said Eric Baldeschwieler, CEO of HortonWorks, a company formed with the intent of building a support and training business around Hadoop. The upgrade will also feature improvements in high availability, installation, and data management. Beta releases are due later this year, with general availability eyed for the second quarter of 2012; the release will probably be called Hadoop 0.23.

"A big focus for us is going to be adding tools for monitoring and distributing and management, [with the goal of making it] much easier for organizations to use Hadoop. The problem now is it takes a pretty sophisticated operations staff to install and use it," Baldeschwieler said during an interview at HortonWorks's Silicon Valley offices this week. He formerly was vice president of Hadoop engineering at Yahoo, which has been instrumental in Hadoop development.

Version 0.23 is also set for improvements in availability, performance, and scalability. "That's a big one for very large customers," such as Yahoo and Facebook, Baldeschwieler said. One goal will be to address single points of failure in Hadoop's master nodes.

Also, the new HCatalog data management software layer planned for Hadoop 0.23 will let users store data in a more traditional table style, enabling them to move data transparently between tools. It also benefits the MapReduce programming model used with Hadoop. Currently, users can work with two higher-level languages on top of Hadoop, Pig and Hive, Baldeschwieler said, and each has its own specialty data store. "What HCatalog's going to allow is for Pig and Hive and MapReduce itself to operate on one set of tables," he said.
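The idea of several tools operating on one set of tables can be illustrated with a toy model. The sketch below is not HCatalog's API and uses no Hadoop code; the `SharedCatalog` class, the two tool functions, and the `visits` table are all hypothetical names invented for illustration. It only shows the general pattern: a single table registry that one tool writes to and another reads from, instead of each tool keeping its own data store.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """A named table: column names plus rows, stored once in one place."""
    columns: list
    rows: list = field(default_factory=list)


class SharedCatalog:
    """Hypothetical stand-in for a shared table layer (not HCatalog's API)."""

    def __init__(self):
        self._tables = {}

    def create_table(self, name, columns):
        self._tables[name] = Table(columns=list(columns))

    def append(self, name, row):
        table = self._tables[name]
        if len(row) != len(table.columns):
            raise ValueError("row does not match table schema")
        table.rows.append(tuple(row))

    def read(self, name):
        # Return each row as a dict keyed by column name.
        table = self._tables[name]
        return [dict(zip(table.columns, row)) for row in table.rows]


def script_style_tool(catalog):
    """Plays the role of a Pig-like script: filters records into a table."""
    raw = [("alice", 3), ("bob", 0), ("carol", 7)]
    catalog.create_table("visits", ["user", "count"])
    for user, count in raw:
        if count > 0:
            catalog.append("visits", (user, count))


def query_style_tool(catalog):
    """Plays the role of a Hive-like query: aggregates the same table."""
    return sum(r["count"] for r in catalog.read("visits"))


catalog = SharedCatalog()
script_style_tool(catalog)          # first tool writes the table
total = query_style_tool(catalog)   # second tool reads it, with no export step
```

In this toy version both tools see the same schema and rows because they go through the same registry; that is the property the quote describes, though the real software layer would involve storage formats and metadata services not modeled here.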