In association with heise online

03 April 2012, 17:29

Hadoop data transfer tool Sqoop now an Apache Top-Level Project


The Apache Software Foundation has officially announced the graduation of Sqoop, the data transfer tool for Apache Hadoop, to a Top-Level Project. Apache Sqoop is designed to efficiently transfer bulk data between the Hadoop "Big Data" platform and other data stores.

For example, Sqoop can import data from a database into the Hadoop Distributed File System, Hive or HBase, or export data from those distributed platforms back into databases or other storage such as an enterprise data warehouse. Sqoop is able to parallelise the data transfer to improve performance and make the best use of system and network resources; it is built on the Hadoop infrastructure itself and utilises its capabilities to distribute the transfer among Hadoop nodes. It also allows companies to write SQL queries that are executed against their databases and to have the results imported into Hadoop. It comes complete with connectors to MySQL, PostgreSQL, Oracle, SQL Server and DB2.
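
To illustrate the general idea of a parallelised transfer, the sketch below divides a numeric key range into contiguous slices and builds one bounded query per slice, so that each slice could be fetched by a separate worker. It is a minimal illustration and not Sqoop's own code: the function names `split_ranges` and `split_queries`, the table name `orders` and the column name `id` are all hypothetical.

```python
def split_ranges(min_id, max_id, num_splits):
    """Divide the inclusive range [min_id, max_id] into at most
    num_splits contiguous, non-overlapping inclusive ranges."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if max_id < min_id:
        return []  # empty key range: nothing to transfer

    total = max_id - min_id + 1
    # Never create more slices than there are key values.
    num_splits = min(num_splits, total)
    base, extra = divmod(total, num_splits)

    ranges = []
    start = min_id
    for i in range(num_splits):
        # The first `extra` slices take one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def split_queries(table, column, min_id, max_id, num_splits):
    """Build one bounded SELECT per slice (hypothetical helper).

    Identifiers are interpolated directly for readability only;
    real code should validate or quote them.
    """
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} <= {hi}"
        for lo, hi in split_ranges(min_id, max_id, num_splits)
    ]


if __name__ == "__main__":
    for query in split_queries("orders", "id", 1, 10, 4):
        print(query)
```

In this sketch each query covers a disjoint slice of the key range, so the slices can be read independently and together cover every row exactly once.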

The Foundation says Sqoop has been embraced as an "ideal SQL-to-Hadoop data transfer solution". Development of Sqoop began in May 2009 as a module for Apache Hadoop, contributed as a patch by Aaron Kimball. Over the course of the year it saw 56 patches, and in April 2010 it was decoupled from Hadoop and hosted on GitHub by Cloudera. Another year, four releases and 191 patches later, Sqoop was proposed for incubation at the Apache Software Foundation to boost the community around the project. After another 116 patches in incubation, Sqoop and its "Apache Way" governance are now ready for the top level of the Apache Software Foundation. The most recent release of Sqoop is version 1.4.1-incubating, which is available for download from Apache mirror sites.

Sqoop was one of four projects cleared for graduation to the top level last month; Rave, the social mashup engine, was first. The graduations of Lucy, a C-based full text search engine, and Accumulo, a distributed key/value store built on top of Apache Hadoop, ZooKeeper and Thrift, will be officially announced in the coming weeks.

