Big Data Dictionary

GraphLab

GraphLab is a graph-based, high-performance, distributed computation framework written in C++. It provides a unified multicore and distributed API, backed by an optimized C++ execution engine that leverages extensive multithreading and asynchronous I/O from HDFS. GraphLab places data and computation using sophisticated scheduling algorithms, and it is designed to scale to graphs with billions of vertices and edges. It combines advances in machine learning algorithms, asynchronous distributed graph computation, prioritized scheduling, and graph placement with optimized low-level system design and efficient data structures to achieve high performance and scalability on challenging machine learning tasks.

The above figure illustrates the software stack of the GraphLab system. In principle, GraphLab is an asynchronous, distributed, shared-memory abstraction in which vertex-programs have shared access to a distributed graph with data stored on every vertex and edge. In this programming abstraction, each vertex-program can directly access information on the current vertex, its adjacent edges, and its adjacent vertices, irrespective of edge direction. The GraphLab abstraction consists of three main parts: the data graph, the update function, and the sync operation. The data graph represents the user-modifiable program state: it stores the mutable user-defined data and encodes the sparse computational dependencies. The update function represents the user computation and operates on the data graph by transforming data in small overlapping contexts called scopes. The sync operation concurrently maintains global aggregates over the data graph. At runtime, the GraphLab execution model enables more efficient distributed execution by relaxing the execution-ordering requirements of the shared-memory abstraction and allowing the GraphLab runtime engine to determine the best order in which to execute vertices. For example, the engine may choose to execute vertices in an order that minimizes network communication or latency. The only requirement imposed by the GraphLab abstraction is that all vertices are eventually executed.

The GraphChi project is a spin-off of the GraphLab project, designed to enable a single desktop computer (specifically, a Mac Mini) to run large-scale graph-based computations.