Yahoo working on Hadoop MapReduce 2

24.03.2011
The next version of Hadoop MapReduce, the software implementation that allows batch processing of petabytes of data, is expected out this year, says a Yahoo executive.

Todd Papaioannou, vice president of architecture at Yahoo, told Computerworld this week that current iterations of Hadoop lack the ability to effectively manage resources across thousands of servers in a cluster.

So developers are working on improving utilization, scheduling and management of resources.

For example, the new architecture will include a global ResourceManager that tracks server availability and scheduling invariants, while a per-application ApplicationMaster runs inside the cluster and tracks the program semantics for a given job, Yahoo developer Arun Murthy wrote in a blog post.
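
The division of labor described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Hadoop's actual API: the class names ResourceManager and ApplicationMaster come from Murthy's description, while the slot counts, the `allocate`, `release` and `run` methods, and the server and task names are hypothetical and exist only to show which component knows what.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Global component: knows only which servers have free capacity.

    The slot-based model is a hypothetical simplification for illustration.
    """
    free_slots: dict[str, int]  # server name -> number of free slots

    def allocate(self, wanted: int) -> list[str]:
        """Grant up to `wanted` slots; returns one server name per slot."""
        granted: list[str] = []
        for server in sorted(self.free_slots):
            while self.free_slots[server] > 0 and len(granted) < wanted:
                self.free_slots[server] -= 1
                granted.append(server)
        return granted

    def release(self, server: str) -> None:
        """Return one slot on `server` to the free pool."""
        self.free_slots[server] += 1


@dataclass
class ApplicationMaster:
    """Per-job component: knows the job's tasks, not the cluster's state."""
    job_name: str
    pending_tasks: list[str]
    finished: list[tuple[str, str]] = field(default_factory=list)

    def run(self, rm: ResourceManager) -> None:
        """Request slots in rounds until every task of this job has run."""
        while self.pending_tasks:
            servers = rm.allocate(len(self.pending_tasks))
            if not servers:
                raise RuntimeError("cluster has no free capacity")
            for server in servers:
                task = self.pending_tasks.pop(0)
                # A real system would launch the task here; the sketch
                # only records where it would have run.
                self.finished.append((task, server))
                rm.release(server)


rm = ResourceManager({"node-a": 1, "node-b": 2})
am = ApplicationMaster("wordcount", ["map-0", "map-1", "map-2", "reduce-0"])
am.run(rm)
print(am.finished)
```

In this sketch the ResourceManager never sees task names and the ApplicationMaster never sees slot counts, which mirrors the separation between cluster-wide resource tracking and per-job logic.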

Papaioannou said Yahoo contributed about 70% of the code for the current iteration of Hadoop and the Hadoop Distributed File System (HDFS).

Earlier this year, Yahoo dropped its own distribution of Hadoop and began working more closely with the Apache Hadoop community because doing so allows the open source community to help with development efforts, Papaioannou said.