Alpha debut for Apache Hadoop 2.0
The Apache Hadoop developers have released the first alpha of Apache Hadoop 2.0, which features YARN, the next-generation MapReduce implementation. According to the announcement by Arun Murthy, the release manager for Hadoop 2.0, the alpha is far from production-ready and should be regarded as a preview release.
YARN, introduced in Hadoop's 0.23 branch, splits the functionality of Hadoop's JobTracker into two daemons: a global ResourceManager (RM) and a per-application ApplicationMaster (AM) that handles job scheduling. The ResourceManager manages the cluster's resources, while each ApplicationMaster negotiates with it for the resources needed to run its application.
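The division of labour between the two daemons can be illustrated with a toy model. The following Python sketch is not Hadoop code and does not use any YARN API: the class names mirror the daemons described above, but the methods (`allocate`, `release`, `run`), the node names and the notion of a fixed number of container slots per node are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResourceManager:
    """Toy global resource manager: tracks free container slots per node."""
    free_slots: dict  # hypothetical layout: node name -> free container slots

    def allocate(self, app_id: str, wanted: int) -> list:
        """Grant up to `wanted` containers as (node, app_id) pairs."""
        granted = []
        for node in sorted(self.free_slots):
            while self.free_slots[node] > 0 and len(granted) < wanted:
                self.free_slots[node] -= 1
                granted.append((node, app_id))
        return granted

    def release(self, containers: list) -> None:
        """Return finished containers to the pool."""
        for node, _app_id in containers:
            self.free_slots[node] += 1


@dataclass
class ApplicationMaster:
    """Toy per-application master: schedules its own tasks on granted containers."""
    app_id: str
    tasks: list

    def run(self, rm: ResourceManager) -> list:
        """Negotiate containers in rounds until every task has been placed."""
        placements = []
        pending = list(self.tasks)
        while pending:
            granted = rm.allocate(self.app_id, len(pending))
            if not granted:
                raise RuntimeError("cluster has no free containers")
            for node, _app_id in granted:
                placements.append((pending.pop(0), node))
            rm.release(granted)  # tasks are assumed to finish immediately
        return placements
```

With a cluster of `{"node1": 2, "node2": 1}` and four tasks, the toy ApplicationMaster needs two rounds of negotiation: three containers are granted in the first round and one in the second. The point of the sketch is only that the resource manager knows nothing about tasks and the application master knows nothing about other applications.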
Other features in the Hadoop 2.0 alpha that also appeared in Hadoop 0.23 include HDFS Federation and HDFS HA (High Availability, with manual failover). Performance has also been improved in the alpha. Both HDFS and YARN now offer wire compatibility thanks to a switch to protobufs (Protocol Buffers) for communication.
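Why protobufs help with wire compatibility can be sketched in a few lines. Each field on the wire is prefixed with its field number and a wire type, so a reader built against an older message definition can skip fields it does not recognise. The Python sketch below hand-rolls a small subset of the protobuf encoding (varint and length-delimited fields only) rather than using the protobuf library, and the field numbers and values in the usage note are hypothetical, not taken from Hadoop's actual message definitions.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Decode a varint starting at `pos`; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_message(fields: dict) -> bytes:
    """Encode {field number: int or bytes} as tagged fields."""
    out = bytearray()
    for number, value in sorted(fields.items()):
        if isinstance(value, int):
            # wire type 0: varint
            out += encode_varint((number << 3) | 0) + encode_varint(value)
        else:
            # wire type 2: length-delimited bytes
            out += encode_varint((number << 3) | 2)
            out += encode_varint(len(value)) + value
    return bytes(out)


def decode_message(buf: bytes, known: set) -> dict:
    """Decode tagged fields, silently skipping field numbers not in `known`."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:
            fields[number] = value
    return fields
```

If a newer sender encodes `{1: 42, 2: b"block-7", 3: 300}` and an older receiver decodes it with `known={1, 2}`, the receiver still recovers fields 1 and 2 and simply ignores field 3. That tolerance of unknown fields is what allows daemons and clients from different releases to keep talking to each other.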
A number of features have yet to be incorporated – such as HDFS Snapshots and automatic failover for the HA NameNode – and work is needed on the stability and performance of YARN. Additionally, Murthy notes that some of the APIs need further work because of the switch to protobufs, and that HDFS HA and YARN need more testing and validation. He also points out that many features are still works in progress.
Further details on the Hadoop 2.0.0 alpha release are available from the Hadoop releases page and in the release notes. Apache Hadoop 2.0.0 alpha can be downloaded from Apache mirrors, and documentation for the release is available. The most recent full release of Hadoop was version 1.0, released in January. Hadoop is licensed, as all Apache projects are, under the Apache 2.0 licence.