Giraph: Production-grade graph processing infrastructure for trillion-edge graphs
Analyzing large graphs gives social networking and web companies valuable insights for content ranking and recommendations. In this talk, we describe the usability, performance, and scalability improvements we made to Apache Giraph, an open-source graph processing system, so that we could deploy it in production at Facebook on graphs of up to a trillion edges. We also detail significant changes to the original vertex-centric computational model that make it possible to develop a broader range of production graph applications. Finally, we will dive deep into a few production applications, share what we learned in the past year and a half of production experience, and describe our future directions for the platform.
Avery has a PhD in parallel computing from Northwestern University. He worked at Yahoo! Search for four years on the web map analytics platform, large-scale ad hoc serving infrastructure, and cluster management. For the past two and a half years, he has been working at Facebook on big data computational frameworks (Corona for scalable MapReduce and Giraph for scalable graph processing).