Graph processing platform Apache Giraph reaches 1.0
Used by Facebook and Yahoo, the Apache Giraph project for distributed graph processing has released version 1.0. This is the first new version since the project left incubation and became a top-level project in May 2012, though for some reason it has yet to appear in the Apache index of top-level projects.
Giraph allows social graphs and other richly interconnected data structures with many billions of edges to be analysed using hundreds of machines. It is inspired by the Bulk Synchronous Parallel abstract computer model and Google's Pregel system for large-scale graph processing. The developers of Giraph say that, unlike those systems, Giraph is an open source, scalable platform built atop the Apache Hadoop infrastructure and has no single point of failure by design. The documentation includes an introduction to Giraph's iterative graph processing and explains how to implement graph processing functions in Java. The Giraph project has seen contributions from Yahoo!, Twitter, Facebook and LinkedIn and from academic institutions around the world.
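To give a feel for the Bulk Synchronous Parallel style of iterative processing mentioned above, here is a minimal single-machine sketch in plain Java. It does not use the Giraph API: the class `SuperstepSketch`, its `Edge` type, the `demoGraph` helper and the choice of shortest paths as the example are all assumptions made for illustration. Each pass of the loop plays the role of one superstep, in which vertices read their incoming messages, update their value and send messages along their out-edges. Edge weights are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy simulation of vertex-centric supersteps on one machine; not the Giraph API. */
class SuperstepSketch {

    /** A directed, weighted edge pointing at the vertex with index {@code target}. */
    static final class Edge {
        final int target;
        final double weight;

        Edge(int target, double weight) {
            this.target = target;
            this.weight = weight;
        }
    }

    /** Shortest distances from {@code source}; graph.get(v) holds the out-edges of vertex v. */
    public static double[] shortestPaths(List<List<Edge>> graph, int source) {
        double[] dist = new double[graph.size()];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);

        // Messages to be delivered in the current superstep, keyed by receiving vertex.
        Map<Integer, List<Double>> inbox = new HashMap<>();
        inbox.put(source, List.of(0.0));

        // One loop iteration is one superstep; the run halts when no messages are pending.
        while (!inbox.isEmpty()) {
            Map<Integer, List<Double>> outbox = new HashMap<>();
            for (Map.Entry<Integer, List<Double>> entry : inbox.entrySet()) {
                int v = entry.getKey();
                double best = Collections.min(entry.getValue());
                if (best < dist[v]) {
                    // The vertex improved its value, so it notifies its neighbours.
                    dist[v] = best;
                    for (Edge e : graph.get(v)) {
                        outbox.computeIfAbsent(e.target, k -> new ArrayList<>())
                              .add(best + e.weight);
                    }
                }
            }
            // Barrier: messages sent in this superstep are only visible in the next one.
            inbox = outbox;
        }
        return dist;
    }

    /** Small hypothetical graph: 0->1 (1.0), 0->2 (4.0), 1->2 (2.0); vertex 3 is isolated. */
    public static List<List<Edge>> demoGraph() {
        List<List<Edge>> graph = new ArrayList<>();
        graph.add(List.of(new Edge(1, 1.0), new Edge(2, 4.0)));
        graph.add(List.of(new Edge(2, 2.0)));
        graph.add(List.of());
        graph.add(List.of());
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(shortestPaths(demoGraph(), 0)));
    }
}
```

In a distributed system the vertices would be partitioned across many workers and the barrier would be a cluster-wide synchronisation point; the sketch keeps everything in one loop purely to show the message-passing pattern.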
The 1.0 version builds on previous releases by using memory more efficiently, through fast byte-based serialisation by default, and by making better use of multicore machines with multithreaded input and computation. There's also a simplified API for working with vertexes (also referred to as nodes), the ability to input data based on those vertexes or on the graph edges between them, a master compute API for managing application-wide logic during processing, and sharded aggregators for aggregations that need a lot of memory. Finally, there's simple access to Hive tables and support for Hadoop's new YARN processing.
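The difference between vertex-based and edge-based input can be sketched in plain Java as follows. This is only an illustration of the idea, not Giraph's input format classes: the class `GraphInputSketch`, its method names and the two whitespace-separated text formats are hypothetical. Both readers produce the same adjacency map, one from lines that list a vertex with all its neighbours and one from lines that each hold a single edge.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Two hypothetical text readers that build the same adjacency map; not the Giraph API. */
class GraphInputSketch {

    /** Vertex-based input: each line is "vertexId neighbour neighbour ...". */
    public static Map<Integer, Set<Integer>> fromVertexLines(List<String> lines) {
        Map<Integer, Set<Integer>> adjacency = new TreeMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            Set<Integer> neighbours =
                adjacency.computeIfAbsent(Integer.parseInt(tokens[0]), k -> new TreeSet<>());
            for (int i = 1; i < tokens.length; i++) {
                neighbours.add(Integer.parseInt(tokens[i]));
            }
        }
        return adjacency;
    }

    /** Edge-based input: each line is "sourceId targetId". */
    public static Map<Integer, Set<Integer>> fromEdgeLines(List<String> lines) {
        Map<Integer, Set<Integer>> adjacency = new TreeMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            int source = Integer.parseInt(tokens[0]);
            int target = Integer.parseInt(tokens[1]);
            adjacency.computeIfAbsent(source, k -> new TreeSet<>()).add(target);
            // A vertex that only ever appears as a target still gets an entry.
            adjacency.computeIfAbsent(target, k -> new TreeSet<>());
        }
        return adjacency;
    }

    public static void main(String[] args) {
        System.out.println(fromVertexLines(List.of("1 2 3", "2 3", "3")));
        System.out.println(fromEdgeLines(List.of("1 2", "1 3", "2 3")));
    }
}
```

Edge-based input suits data that is already stored as one relationship per record, such as a table of "follows" pairs, while vertex-based input suits data that is grouped by entity.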
The Apache 2.0 licensed Giraph is available to download from the project's releases page. The developers hope to release more frequently in future "now that we are more familiar with the process."