Second alpha for Apache Hadoop 2.0
The Apache Hadoop developers have voted on and released the second alpha of Apache Hadoop 2.0, which carries the version number 2.0.2-alpha. The latest developments in the distributed computing and storage framework include significant improvements to the high availability variant of HDFS and a more stable version of YARN, which has already been tested against a 2,000-node cluster. The release notes describe all the changes in detail.
The developers at Hortonworks say they believe they are well on their way to signing off on Hadoop-2.x early next year. Hadoop developers are now finishing the last of the feature work, such as HDFS HA without the need for shared storage (which is already committed), YARN ResourceManager availability, and scheduling enhancements. The 2.0.2-alpha's YARN implementation is already running at Yahoo on 2,000- and 3,600-node clusters, though those clusters are actually running Hadoop 0.23.4, which "essentially is 2.0.2 alpha without HDFS high availability".
YARN is a next-generation MapReduce implementation. It splits the functionality of Hadoop's JobTracker into two daemons: a global ResourceManager (RM) and a per-application ApplicationMaster (AM) that handles job scheduling. The ResourceManager manages the cluster's resources, while each ApplicationMaster negotiates with the ResourceManager for the resources it needs to run its application.
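The division of labour between the two daemons can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not YARN's actual API: the class and method names (`allocate`, `release`, `run`) are hypothetical, memory is the only resource tracked, and every task needs the same amount of it.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Toy stand-in for the global RM: tracks free memory in the cluster."""
    free_mb: int
    granted: dict = field(default_factory=dict)  # app_id -> MB currently held

    def allocate(self, app_id: str, containers: int, mb_each: int) -> int:
        """Grant as many of the requested containers as currently fit."""
        fit = min(containers, self.free_mb // mb_each)
        self.free_mb -= fit * mb_each
        self.granted[app_id] = self.granted.get(app_id, 0) + fit * mb_each
        return fit

    def release(self, app_id: str) -> None:
        """Return everything an application holds to the free pool."""
        self.free_mb += self.granted.pop(app_id, 0)


@dataclass
class ApplicationMaster:
    """Toy per-application master: asks the RM for what its job needs."""
    app_id: str
    tasks: int
    mb_per_task: int

    def run(self, rm: ResourceManager) -> int:
        """Run all tasks in waves and return the number of waves needed."""
        remaining, waves = self.tasks, 0
        while remaining > 0:
            got = rm.allocate(self.app_id, remaining, self.mb_per_task)
            if got == 0:
                raise RuntimeError("cluster cannot fit a single container")
            remaining -= got
            waves += 1
            rm.release(self.app_id)  # the tasks in this wave have finished
        return waves


if __name__ == "__main__":
    rm = ResourceManager(free_mb=4096)
    am = ApplicationMaster("app-1", tasks=10, mb_per_task=1024)
    print(am.run(rm))  # 10 tasks, 4 containers fit at a time: prints 3
```

The point of the sketch is that the ResourceManager knows nothing about tasks or jobs; it only hands out capacity, while scheduling decisions for an individual job stay inside that job's ApplicationMaster.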
Apache Hadoop 2.0.2 alpha is available to download from various Apache mirrors, but, as a development release, it is not suitable for production use.