Intel plays for big data with optimised Hadoop distribution
Intel has surprised developers by announcing its own distribution of Apache Hadoop, the distributed "big data" framework. The Intel Distribution for Apache Hadoop Software is, says the company, optimised for Intel Xeon processors, Intel SSDs and Intel's 10GbE networking. Hadoop, developed by the Apache Software Foundation, is a Java framework for scalable distributed systems built around the MapReduce approach.
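For readers unfamiliar with MapReduce, the classic word-count example shows its three phases: a map step turns each input record into key/value pairs, a shuffle step groups the pairs by key, and a reduce step combines each group into one result. The sketch below is a single-process illustration in plain Java (16 or later, for `record`). It does not use Hadoop's own API; the class and method names (`MapReduceSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are hypothetical. In a real Hadoop cluster the map and reduce calls would run in parallel on many nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceSketch {

    // A key/value pair emitted by the map phase (hypothetical type, not Hadoop's).
    record Pair(String key, int value) {}

    // Map phase: split one input record (a line of text) into (word, 1) pairs.
    static List<Pair> map(String line) {
        List<Pair> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new Pair(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group every emitted value by its key, keeping keys sorted.
    static Map<String, List<Integer>> shuffle(List<Pair> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Pair p : pairs) {
            groups.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
        }
        return groups;
    }

    // Reduce phase: combine all values for one key into a single count.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run map over all inputs, shuffle the output, then reduce each group.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Pair> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {big=2, clusters=1, data=2, moves=1}
        System.out.println(wordCount(List.of("big data big clusters", "data moves")));
    }
}
```

Because each map call depends only on its own input line, and each reduce call only on one key's values, both phases can be spread across machines. That independence is what lets frameworks such as Hadoop scale out.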
Intel's distribution focuses on enhancing the open source framework with performance and security features. According to the company, its enhancements and hardware cut the time needed to analyse a terabyte of data from up to four hours to seven minutes. Intel also points to the Xeon processors' support for AES New Instructions (AES-NI), hardware-accelerated encryption built into the chip, which has been integrated with Hadoop so that the cluster can process encrypted data in the Hadoop Distributed File System (HDFS) with little performance degradation.
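The article does not describe how Intel wired AES-NI into HDFS. As a generic illustration only, the sketch below encrypts and decrypts a block of data with AES-GCM using the JDK's standard `javax.crypto` API. HotSpot JVMs can use AES-NI through intrinsics on supported x86 CPUs, which is the kind of hardware acceleration the announcement refers to. The class name `AesBlockSketch`, the IV-prefixed layout and the `roundTrip` helper are assumptions made for this example, not part of Intel's distribution.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class AesBlockSketch {
    private static final int IV_BYTES = 12;  // 96-bit nonce, the usual size for GCM
    private static final int TAG_BITS = 128; // authentication tag length in bits

    // Encrypt a block; the random IV is stored in front of the ciphertext (hypothetical layout).
    static byte[] encrypt(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(plain);
        byte[] out = new byte[IV_BYTES + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
        return out;
    }

    // Decrypt a block produced by encrypt(); fails if the data was tampered with.
    static byte[] decrypt(SecretKey key, byte[] blob) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES));
        return c.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
    }

    // Generate a fresh 128-bit AES key.
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        return kg.generateKey();
    }

    // Encrypt then decrypt a string, wrapping checked exceptions for convenience.
    public static String roundTrip(String text) {
        try {
            SecretKey key = newKey();
            byte[] blob = encrypt(key, text.getBytes(StandardCharsets.UTF_8));
            return new String(decrypt(key, blob), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("a block of HDFS data"));
    }
}
```

GCM is chosen here because it authenticates as well as encrypts, so a corrupted or altered block is rejected on decryption rather than silently returned.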
The distribution also includes deployment, configuration and monitoring tools for administrators rolling out new applications on their clusters. These include the Intel Active Tuner for Hadoop, which automatically configures Hadoop for the best performance, removing the need for specialist knowledge of Hadoop's configuration or for separate performance benchmarking.
Many of the distribution's tools are being developed by Intel itself, rather than within the Apache Hadoop project, and are not open source software. However, the company has announced plans to extend the Hadoop framework with Project Rhino, a set of data protection enhancements that will add a framework for encryption and key management, a common authorisation framework, token-based authentication and single sign-on for Hadoop components, more granular ACL support in HBase, and improved audit logging.
Intel named thirty partners working with it, including Amazon, Cisco, Cray, Dell, LucidWorks, Pentaho, Red Hat, SAP, SAS and Wipro. SAP will integrate its HANA in-memory data-processing capabilities with Intel's Hadoop distribution and build bridges to and from Hadoop through SAP Data Services. Cray has also announced plans to build links between the Intel distribution and its own Xtreme-X and Xtreme-cool supercomputers.