VENUS: Vertex-Centric Streamlined Graph Computation on a Single PC
Jiefeng Cheng1, Qin Liu2, Zhenguo Li1,
Wei Fan1, John C.S. Lui2, Cheng He1
1Huawei Noah’s Ark Lab
2 The Chinese University of Hong Kong
ICDE’15
Graph is everywhere
We have large graphs
• Web graph
• Social graph
• User-movie ratings graph
• …

Graph Computation
• PageRank
• Community detection
• ALS for collaborative filtering
• …
Mining from Big Graphs: two feasible ways
Distributed systems
• Pregel[SIGMOD'10], GraphLab[OSDI'12], GraphX[OSDI'14], Giraph, ...
• Expensive cluster, complex setup, writing distributed programs
Single-machine systems
• Disk: GraphChi[OSDI'12], X-Stream[SOSP'13]
• SSD: TurboGraph[KDD'13], FlashGraph[FAST'15]
• Computation time close to distributed systems
  • PageRank on Twitter graph (41M nodes, 1.4B edges)
    • Spark: 8.1 min with 50 machines (each with 2 CPUs, 7.5G RAM) [Stanton KDD'12]
    • VENUS: 8 min on a single machine with a quad-core CPU, 16G RAM
• Affordable, easy to program/debug
Existing Systems
Vertex-centric programming model: popularized by Pregel / GraphLab / GraphChi
• Each vertex updates itself based on its neighborhood

GraphChi
• Updated data on each vertex must be propagated to its neighbors through disk
• Extensive disk I/O

X-Stream
• Different API: edge-centric programming
• Less expressive; common algorithms must be re-implemented
• Also uses disk to propagate updates
Our Contributions
Design and implement a disk-based system, VENUS
• A new vertex-centric streamlined processing model
• Separates mutable vertex data from immutable edge data
• Reads/writes less data than other systems

Evaluation on large graphs
• Outperforms GraphChi and X-Stream
• Verifies that our design reduces data access
Vertex-Centric Programming
Consider GraphChi
for each iteration:
    for each vertex v:
        update(v)

void update(v):
    fetch data from each in-edge
    update data on v
    spread data to each out-edge
[Figure: vertex v with its data duplicated on in- and out-edges]
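The GraphChi-style loop above can be made concrete with a small Python sketch (a toy illustration, not GraphChi's real API): each edge carries a mutable value, so a vertex's new result must be copied onto every out-edge, which is exactly the duplicated data noted above.

```python
# Sketch of the GraphChi-style vertex-centric model: every edge stores a
# mutable value, so a vertex update must be spread onto each out-edge.
# Illustrative names only, not GraphChi's actual API.

class Edge:
    def __init__(self, src, dst):
        self.src, self.dst = src, dst
        self.value = 0.0          # mutable data carried on the edge

def update(v, in_edges, out_edges, damping=0.85):
    """One PageRank-style update for vertex v."""
    rank = (1 - damping) + damping * sum(e.value for e in in_edges[v])
    for e in out_edges[v]:        # propagate through (disk-resident) edges
        e.value = rank / max(len(out_edges[v]), 1)
    return rank

# toy graph: 1->2, 1->3, 2->3
edges = [Edge(1, 2), Edge(1, 3), Edge(2, 3)]
in_edges = {v: [e for e in edges if e.dst == v] for v in (1, 2, 3)}
out_edges = {v: [e for e in edges if e.src == v] for v in (1, 2, 3)}

ranks = {}
for it in range(2):               # two iterations over all vertices
    for v in (1, 2, 3):
        ranks[v] = update(v, in_edges, out_edges)
```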
Vertex-Centric Programming
VENUS:
• Only store mutable values on vertices

Pros
• Less data access
• Enables "streamlined" processing

Cons
• Limited expressiveness
void update(v):
    fetch values from each in-neighbor
    update data on v
[Figure: vertex v reading values directly from its in-neighbors]
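Under the VENUS model, only vertices carry mutable data, and an update reads its in-neighbors' current values directly instead of per-edge copies. A minimal sketch with assumed names (not the system's API), again using a PageRank-style computation:

```python
# Sketch of the VENUS model: mutable data lives only on vertices; the
# immutable edge structure just tells update() who the in-neighbors are.

def update(v, in_nbrs, value, out_deg, damping=0.85):
    """New value of v computed from its in-neighbors' current values."""
    s = sum(value[u] / max(out_deg[u], 1) for u in in_nbrs[v])
    return (1 - damping) + damping * s

# toy graph: 1->2, 1->3, 2->3 (immutable structure)
in_nbrs = {1: [], 2: [1], 3: [1, 2]}
out_deg = {1: 2, 2: 1, 3: 0}
value = {v: 0.0 for v in (1, 2, 3)}   # the only mutable state

for it in range(2):                   # two iterations over all vertices
    for v in (1, 2, 3):
        value[v] = update(v, in_nbrs, value, out_deg)
```

Note that no edge is ever written: the same ranks are produced as in the edge-value formulation, but the only data written back is one value per vertex.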
VENUS Architecture
VENUS Architecture
Disk storage (offline)
• Sharding
• Separation of edge data and vertex data

Computing model (online)
• Load edge data sequentially
• Execute the update function on each vertex
• How to load vertex data and propagate updates?
Sharding
Graph cannot fit in RAM?
• Split the graph into shards

Each shard corresponds to an interval of vertices:
• G-shard: immutable structure of the graph
  • In-edges of nodes in the interval
• V-shard: mutable vertex values
  • Values of all vertices in the shard
Structure table: all g-shards
Value table: all vertex data (Vertex ID 1–12, one data slot per vertex)

Interval   I1=[1,4]           I2=[5,8]           I3=[9,12]
G-shard    7,9,10 → 1         6,7,8,11 → 5       2,3,4,10,11 → 9
           6,10 → 2           1,10 → 6           11 → 10
           1,2,6 → 3          3,10,11 → 7        4,6 → 11
           1,2,6,7,10 → 4     3,6,11 → 8         2,3,9,10,11 → 12
V-shard    I1∪{6,7,9,10}      I2∪{1,3,10,11}     I3∪{2,3,4,6}
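The table above follows a simple rule that a short sketch can reproduce: an edge goes into the g-shard of the interval containing its target, and the v-shard is the interval plus all in-edge sources outside it (helper names are illustrative):

```python
def build_shards(edges, intervals):
    """edges: list of (src, dst); intervals: list of (lo, hi), inclusive.
    Returns one (g_shard, v_shard) pair per interval."""
    shards = []
    for lo, hi in intervals:
        # g-shard: immutable in-edges of all vertices in the interval
        g = [(s, d) for (s, d) in edges if lo <= d <= hi]
        # v-shard: interval vertices plus every in-edge source
        v = set(range(lo, hi + 1)) | {s for (s, d) in g}
        shards.append((g, v))
    return shards

# edge list matching the example graph above (dst: [srcs])
edges = [(s, d) for d, srcs in {
    1: [7, 9, 10], 2: [6, 10], 3: [1, 2, 6], 4: [1, 2, 6, 7, 10],
    5: [6, 7, 8, 11], 6: [1, 10], 7: [3, 10, 11], 8: [3, 6, 11],
    9: [2, 3, 4, 10, 11], 10: [11], 11: [4, 6], 12: [2, 3, 9, 10, 11],
}.items() for s in srcs]

shards = build_shards(edges, [(1, 4), (5, 8), (9, 12)])
```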
Vertex-Centric Streamlined Processing
V-shards are much smaller than g-shards
• Load each v-shard entirely into memory

Scan each g-shard sequentially
• Execute the update function in parallel
Execution
Load v-shard 1
  scan g-shard 1: 7,9,10 → 1 | 6,10 → 2 | 1,2,6 → 3 | 1,2,6,7,10 → 4
Update v-shard 1 / Load v-shard 2
  scan g-shard 2: 6,7,8,11 → 5 | 1,10 → 6 | 3,10,11 → 7 | 3,6,11 → 8
Update v-shard 2 / Load v-shard 3
  scan g-shard 3: 2,3,4,10,11 → 9 | 11 → 10 | 4,6 → 11 | 2,3,9,10,11 → 12
Update v-shard 3

Parallelize execution and loading: update shard i while loading shard i+1
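The pipelining above can be sketched with a background prefetch thread, a minimal illustration using hypothetical load/update callbacks rather than the actual VENUS implementation:

```python
import threading

def run_pipeline(shards, load, update):
    """Overlap loading of shard i+1 with the update of shard i."""
    current = load(shards[0])              # load the first shard up front
    for i in range(len(shards)):
        box, t = [None], None
        if i + 1 < len(shards):
            def prefetch(j=i + 1):
                box[0] = load(shards[j])   # background load of next shard
            t = threading.Thread(target=prefetch)
            t.start()
        update(current)                    # compute on the current shard
        if t is not None:
            t.join()                       # wait for the prefetch to finish
            current = box[0]

# trace the order of operations on three shards
log = []
run_pipeline([1, 2, 3],
             load=lambda s: (log.append(('load', s)), s)[1],
             update=lambda s: log.append(('update', s)))
```

Each shard is fully loaded before it is updated, while the load of the next shard runs concurrently with the current update.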
Load and Update v-shards
Two I/O-efficient algorithms
• Algorithm 1: extension of PSW in GraphChi (skipped)
• Algorithm 2: merge-join
  • Load: merge-join between the value table and the v-shard
  • Update: write the values of interval [1,4] back to the value table

Use a value buffer to cache the value table

Value table on disk:             ID 1 2 3 4 5 6 7 8 9 10 11 12, each with its data
Vertices in v-shard 1 (on disk): ID 1 2 3 4 6 7 9 10
Loaded v-shard 1:                ID 1 2 3 4 6 7 9 10, each with its data
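The merge-join step can be sketched as one linear pass over the two sorted ID sequences (assuming, as in the example, that both the value table and the v-shard ID list are sorted by vertex ID; names are illustrative):

```python
def merge_join(value_table, vshard_ids):
    """value_table: list of (vertex_id, data) sorted by id.
    vshard_ids: sorted list of vertex ids needed by the v-shard.
    A single sequential scan of both yields the loaded v-shard."""
    loaded, i = {}, 0
    for vid, data in value_table:          # one pass over the value table
        while i < len(vshard_ids) and vshard_ids[i] < vid:
            i += 1
        if i < len(vshard_ids) and vshard_ids[i] == vid:
            loaded[vid] = data             # id present in the v-shard
            i += 1
    return loaded

# value table for vertices 1..12; v-shard 1 needs {1,2,3,4,6,7,9,10}
table = [(v, v * 10) for v in range(1, 13)]
vshard1 = merge_join(table, [1, 2, 3, 4, 6, 7, 9, 10])
```

Because both inputs are consumed strictly in order, the load is one sequential read, which is what makes the algorithm I/O-efficient on disk.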
Evaluation of VENUS
Setup: a commodity PC
• quad-core 3.4GHz CPU
• 16GB RAM and 4TB hard disk

Main competitors
• GraphChi and X-Stream

Applications
• PageRank
• WCC: weakly connected components
• CD: community detection
• ALS: alternating least squares for collaborative filtering
• Shortest path, label propagation, etc.
PageRank on Twitter
Twitter follow-graph: 41M nodes, 1.4B edges
Cost of update propagation: data writes and reads
Applications: WCC, CD, ALS
We could not implement CD on X-Stream due to its edge-centric programming model
Web-Scale Graph
Clueweb12: a web-scale graph
• 978 million nodes, 42.5 billion edges
• 402 GB on disk
• 2 iterations of PageRank

Computation time
• GraphChi: 4.3 hours
• X-Stream: 7.4 hours
• VENUS-I: 2 hours
• VENUS-II: 1.8 hours
Conclusion
We present VENUS, a disk-based graph computation system

Our design of graph storage and execution reduces data access and I/O

Evaluations show that VENUS outperforms GraphChi and X-Stream

VENUS also handles billion-scale problems
Thank you! Q&A