10gen announces MongoDB Hadoop Connector
10gen, the company behind MongoDB, has announced the general availability of a connector between its open source NoSQL database and Apache Hadoop, the MapReduce framework and distributed computing platform. According to its developers, version 1.0 of the connector is the "culmination of over a year of work to bring our users a solid integration layer between their MongoDB deployments and Hadoop clusters for data processing".
Using the connector, developers can feed MongoDB data into Hadoop MapReduce jobs and write the results of those jobs back to MongoDB. The initial release also includes support for writing to MongoDB from Apache Pig and from the Flume distributed logging system, as well as support for writing MapReduce jobs in Python that read from and write to MongoDB via Hadoop Streaming.
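The round trip described above (documents read from MongoDB, processed by a map and a reduce step, and written back as new documents) can be illustrated with a small plain-Python simulation. This is a sketch under stated assumptions, not the connector's API: plain dicts stand in for MongoDB documents, the field names `user` and `count` are hypothetical, and in a real deployment Hadoop (or Hadoop Streaming) would run the mapper and reducer while the connector handles reading from and writing to MongoDB.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Plain dicts stand in for MongoDB documents (hypothetical schema).
Document = Dict[str, object]


def mapper(docs: Iterable[Document]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (key, 1) for each document's hypothetical 'user' field."""
    for doc in docs:
        yield str(doc["user"]), 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> List[Document]:
    """Shuffle + reduce step: group pairs by key and sum their values."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    # Each result is shaped as a document that could be written back to
    # MongoDB, using the reduce key as its _id.
    return [{"_id": key, "count": count} for key, count in sorted(totals.items())]


def run_job(docs: Iterable[Document]) -> List[Document]:
    """Chain the two steps, as a MapReduce job would for one input collection."""
    return reducer(mapper(docs))


if __name__ == "__main__":
    input_docs = [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}]
    for out in run_job(input_docs):
        print(out)
```

Keying the output documents by `_id` is one simple choice that lets each reduce result map to exactly one document in the output collection.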
The MongoDB Hadoop Connector currently supports MongoDB 2.0 or later, but the developers note that "it should (mostly) work with 1.8.x". Future releases of the connector will add Pig input support and Ruby support for Streaming, as well as support for reading and writing MongoDB Backup Files for offline batch processing.
More details about the connector can be found in the announcement blog post. Version 1.0 of the MongoDB Hadoop Connector is available to download from GitHub, where a full list of requirements, documentation and examples is provided.