Hadoop distribution from Yahoo!
Yahoo! has announced the availability of its own distribution of Apache Hadoop framework for distributed computing. The source-only distribution of Yahoo's Hadoop is the version of the Apache Hadoop code that is used internally by Yahoo! on tens of thousands of systems.
The Yahoo! release includes patches that Yahoo! has incorporated to improve the performance and stability of their own clusters. Yahoo! points out that all these patches have been contributed back to the Apache project, but may not be available yet, in an official Apache release. The code is being hosted on GitHub and is available now. Yahoo! do not offer support for the distribution.
A recent demonstration of Hadoop by Yahoo! saw a petabyte of data sorted in 16.25 hours, using a 3,800 node cluster comprised of dual, quad-core, 2.5Ghz Xeon systems. The same system configuration could sort a terabyte of data in 62 seconds.