Project Serengeti: Hadoop in the VMware cloud
VMware has introduced a new open source project designed to allow the Apache Hadoop big data framework to be used in virtualised and cloud environments. The company's plan is for the Serengeti technology to establish vSphere as the main virtualisation platform for Hadoop applications.
The technology can handle various Hadoop distributions, including Apache Hadoop 1.0, Cloudera's CDH 3, Hortonworks 1.0, Greenplum HD 1.0, and distributions from IBM and MapR. Additionally, VMware has announced that it will contribute code to the Hadoop project. The company plans to make code for the HDFS (Hadoop Distributed File System) and Hadoop MapReduce components available so that data and MapReduce jobs can be distributed across a virtual infrastructure in an optimal way.
The developers have also updated the Spring for Apache Hadoop project, which allows programmers to use Hadoop as an analytics tool in the Java applications they create with the Spring framework. It also enables them to create, configure and execute Hadoop services such as MapReduce, Hive and Pig from within Spring. The new projects were announced at the Hadoop Summit in San Jose, where other companies, including Cloudera, DataStax, Hortonworks, MapR and Pentaho, also announced new Hadoop-related products.
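As a rough illustration of the declarative style Spring for Apache Hadoop supports, a MapReduce job might be wired up in a Spring XML application context along these lines. This is a sketch only: the cluster address, class names and paths are placeholders, and the exact schema attributes may vary between versions of the project.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/hadoop
           http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <!-- Points the Spring context at the Hadoop cluster (address is a placeholder) -->
    <hdp:configuration>
        fs.default.name=hdfs://localhost:9000
    </hdp:configuration>

    <!-- Declares a MapReduce job as a Spring bean; mapper/reducer classes are hypothetical -->
    <hdp:job id="wordCountJob"
             mapper="com.example.WordCountMapper"
             reducer="com.example.WordCountReducer"
             input-path="/input/"
             output-path="/output/"/>

    <!-- Submits the job when the application context starts -->
    <hdp:job-runner id="runner" job-ref="wordCountJob" run-at-startup="true"/>
</beans>
```

The point of this approach is that the Hadoop job becomes an ordinary Spring bean, so it can be injected, parameterised and scheduled like any other component in the application.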