In association with heise online

02 March 2012, 12:25

Spring Hadoop makes Java/Hadoop interaction easier

SpringSource, the VMware division that is home to the Spring framework for Java, has announced Spring Hadoop, which brings support for Spring, Spring Batch and Spring Integration to Apache Hadoop applications. This allows Spring application developers to use the data and computing capabilities of Hadoop clusters as an analytical tool. According to developer Costin Leau, who introduced this first release, the project has been in development for the last few months.

Among the capabilities that Spring Hadoop provides are mechanisms to create, configure and run Hadoop MapReduce jobs and to access Hadoop-hosted services. MapReduce jobs can, for example, be created and submitted to the cluster from within Spring applications. Administrators could use JVM-based scripting languages such as Groovy, JRuby, Jython or Rhino to interact with the Hadoop file system API in order to simplify provisioning, add new files, parse results or clean up; SpringSource suggests this is a middle ground between Hadoop's FileSystem Java API and Hadoop's shell.
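For readers unfamiliar with the kind of job being submitted, the following is a minimal sketch of the MapReduce model itself, written in plain Java and run in memory. It does not use the Hadoop or Spring Hadoop APIs; the class name `WordCountSketch` and its methods are hypothetical and chosen only for illustration. On a real cluster, the grouping step shown here would be performed by Hadoop's shuffle phase across many machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory imitation of the map, shuffle and reduce
// steps of a word count job. No Hadoop classes are involved.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all the counts that were emitted for a single word.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run the map phase, group values by key (the "shuffle"),
    // then run the reduce phase once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("spring hadoop", "hadoop hive pig", "spring batch");
        System.out.println(run(lines));
    }
}
```

In a Spring Hadoop application, the mapper and reducer would instead be Hadoop classes, and the job would be configured and submitted to the cluster by the framework rather than driven by a local loop as above.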

Developers can also work with Hive, the distributed database, starting servers and accessing the database either through a Thrift client or with a JDBC driver using Spring's JDBC functionality. Similar abstractions for HBase and Pig are available. Spring Batch has a tasklet for executing Hive queries, along with tasklets for Pig, scripting, Hadoop tools and Cascading. Further details are available in the reference documentation.

Spring Hadoop is a subproject of the Spring Data project, which is developing data access connectivity for new non-relational databases, MapReduce frameworks and cloud-based data storage. Existing Spring Data projects include JDBC support for Oracle RAC and support for GemFire, Redis, Riak, MongoDB, Neo4J and Amazon S3. Spring Hadoop is available to download and is published under the Apache 2 Licence. The most recent release currently available is the development release 1.0.0 Milestone 1.

