RStream:Marrying Relational Algebra with Streaming for Efficient Graph Mining on A Single Machine

Kai Wang¹, Zhiqiang Zuo², John Thorpe¹, Tien Quang Nguyen³, Guoqing Harry Xu¹

¹UCLA  ²Nanjing University  ³Facebook

Big Graph

• Graph Datasets
• Graph Systems: GraphChi, GridGraph

Graph Analytical Problems

Graph Computation
• PageRank
• Connected Component
• Iterative value computation
• GraphChi, Think Like a Vertex

Graph Mining
• Frequent Subgraph Mining
• Clique Finding
• Discover structural patterns
• ?

Existing Mining Systems

• Enumerate all possible subgraphs
• For each subgraph, check if it matches the pattern
• Pattern is application-specific (clique finding, motif counting, frequent subgraph mining)

Existing Datalog Systems

• Relational predicates
  - TC(a, b, c) ← R(a, b), a < b, R(b, c), b < c, R(c, a)
  - count TC(a, b, c)
• Relational algebra enables composition of small structures into big structures
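As a rough illustration of what the triangle-counting rule computes, the sketch below evaluates the rule body directly in plain Python rather than in a Datalog engine. The edge relation `R` is made-up illustration data, stored in both directions, and the function name `triangles` is hypothetical.

```python
# Hypothetical edge relation R(a, b), stored in both directions (illustration data only).
R = {(1, 2), (2, 1), (2, 3), (3, 2), (3, 1), (1, 3), (3, 4), (4, 3)}

def triangles(R):
    """Evaluate TC(a, b, c) for R(a, b), a < b, R(b, c), b < c, R(c, a)."""
    return {
        (a, b, c)
        for (a, b) in R if a < b                        # R(a, b), a < b
        for (b2, c) in R if b2 == b and b < c           # R(b, c), b < c
        if (c, a) in R                                  # R(c, a)
    }

tc = triangles(R)
print(len(tc))  # count TC(a, b, c)
```

The ordering constraints a < b < c make each undirected triangle appear exactly once in `tc`.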

Challenges in Graph Mining

• # of subgraphs grows exponentially with the size of subgraphs

Size of subgraphs   # of subgraphs
1                   4k
2                   22k
3                   335k
4                   7.8M
5                   117M
6                   1.7B

Arabesque [CHC Teixeira et al., SOSP’15]

Problems with Distributed Mining Systems

• Suffer from large startup and communication overhead
  - Arabesque on 10-node cluster, 35s startup, 3s execution
• Need enterprise clusters with large amounts of memory
  - DistGraph on 128-node cluster, 32,768GB memory
• Poor load balancing due to dynamic working sets
  - some nodes out of memory, other nodes with memory usage < 10%

Problems with Datalog Systems

• Programming model is not expressive enough for complex graph mining algorithms

Thoughts and Insight

• Not all users have access to an enterprise cluster
• Many users are domain experts with limited background in hosting a cluster
• Distributed mining systems drawbacks: large startup, underutilized CPUs, poor load balancing

Increasingly large SSDs

Our Proposal: RStream

A single machine, out-of-core graph mining system

• A simple and expressive API
  - Gather-Apply-Scatter + Relational Algebra => GRAS
• An efficient runtime engine
  - implements relational algebra with streaming

GAS

• Gather information from neighbor vertices
• Apply and update the vertex property
• Scatter information to neighbor vertices
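To make the three GAS steps concrete, here is a minimal in-memory sketch that labels connected components, one of the graph computation problems named earlier in the talk. It is not RStream's API; the function name and the choice of "smallest vertex ID" as the component label are assumptions for illustration.

```python
def gas_connected_components(vertices, edges):
    """Label each vertex with the smallest vertex ID in its component."""
    label = {v: v for v in vertices}
    neighbors = {v: [] for v in vertices}
    for u, v in edges:                      # treat the graph as undirected
        neighbors[u].append(v)
        neighbors[v].append(u)

    active = set(vertices)
    while active:
        next_active = set()
        for v in active:
            # Gather: collect labels from neighbor vertices.
            gathered = [label[n] for n in neighbors[v]]
            # Apply: update the vertex property.
            new_label = min(gathered + [label[v]])
            if new_label != label[v]:
                label[v] = new_label
                # Scatter: tell neighbors they may need to update.
                next_active.update(neighbors[v])
        active = next_active
    return label

print(gas_connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
```

The loop stops once an iteration changes no vertex property, which is the "iterative value computation" pattern that GAS supports.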

GRAS

• GAS: supports iterative graph processing
• Relational Algebra: enables composition of structures
• GRAS: iterative composition of structures

Edge Streaming

• Use streaming to reduce I/O costs
• Sequentially access (larger) datasets from disk, randomly access (smaller) datasets held in memory

X-Stream [A Roy et al., SOSP’13]

A graph is partitioned into streaming partitions. Each streaming partition contains a vertex table, an edge table, and an update table:

Vertex Table     Edge Table      Update Table
VID  Value       Src  Dest       Value  Dest
1    1           1    4          1      4
2    2           2    5          2      5

Streaming for Scatter/Gather

[Figure: scatter, shuffle, and gather/apply over Streaming Partition 1 and Streaming Partition 2; steps are labeled Load, Streaming, and Shuffle]

Scatter

Vertex Table     Edge Table      Update Table
ID  value        src  dest       value  dest
1   a            1    2          a      2
2   b            2    5          b      5

Gather/Apply

Vertex Table     Update Table    Result
ID  value        value  dest     value  dest
1   a            a      2        a+b    2
2   b
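The scatter, shuffle, and gather/apply steps on this slide can be sketched in memory as follows. This is an illustration under stated assumptions, not RStream or X-Stream code: partitions are plain Python lists instead of files on disk, `partition_of` is a hypothetical modulo rule, and the apply step is assumed to be a sum, matching the `a+b` result on the slide.

```python
def partition_of(vid, num_partitions):
    # Hypothetical partitioning rule: assign vertices to partitions by ID modulo.
    return vid % num_partitions

def scatter_gather(vertex_values, edges, num_partitions=2):
    # Per-partition tables: the vertex table is loaded, the edge table is streamed.
    vertex_tables = [dict() for _ in range(num_partitions)]
    edge_tables = [[] for _ in range(num_partitions)]
    for vid, value in vertex_values.items():
        vertex_tables[partition_of(vid, num_partitions)][vid] = value
    for src, dest in edges:
        edge_tables[partition_of(src, num_partitions)].append((src, dest))

    # Scatter: stream each partition's edges and emit (value, dest) updates.
    # Shuffle: append each update to the partition that owns dest.
    update_tables = [[] for _ in range(num_partitions)]
    for p in range(num_partitions):
        for src, dest in edge_tables[p]:
            update = (vertex_tables[p][src], dest)
            update_tables[partition_of(dest, num_partitions)].append(update)

    # Gather/Apply: stream each partition's updates into its vertex table.
    for p in range(num_partitions):
        for value, dest in update_tables[p]:
            vertex_tables[p][dest] = vertex_tables[p].get(dest, 0) + value

    merged = {}
    for table in vertex_tables:
        merged.update(table)
    return merged

# Slide example with a = 10 and b = 20: vertex 2 ends up with a + b.
print(scatter_gather({1: 10, 2: 20, 5: 0}, [(1, 2), (2, 5)]))
```

Only the vertex table of one partition is accessed randomly at a time; edges and updates are read sequentially, which is the access pattern the Edge Streaming slide describes.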

RStream API

[Figure: a program is a sequence of phases: Scatter, Relational, Relational, GatherApply, ..., Scatter, GatherApply, Relational]

Example: Triangle Counting

Phases: Scatter, R1, R2

Scatter: produces update table 1 from the edge table and the vertex table

edge table      vertex table     update table 1
src  dest       VID  value       c1  c2
1    4          1    4           1   4
2    5          2    5           2   5
…    …          …    …           …   …

R1: (a, b) ⋈ (b, c) → (a, b, c)

update table 1     edge table      update table 2
c1  c2          ⋈  src  dest       c1  c2  c3
1   4              4    9          1   4   9
2   5              5    8          2   5   8
…   …              …    …          …   …   …

R2: (a, b, c) ⋈ (c, a) → (a, b, c, a)

update table 2     edge table
c1  c2  c3      ⋈  src  dest
1   4   9          9    1
2   5   8          8    2
…   …   …          …    …
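The Scatter, R1, R2 pipeline for triangle counting can be sketched with a small in-memory join. The helper names `join` and `triangle_tuples` are hypothetical and are not RStream's API; the edge table extends the rows shown on the slides with the closing edges from the R2 step.

```python
def join(update_table, edge_table):
    """Join each tuple's last column with the src column of the edge table."""
    by_src = {}
    for src, dest in edge_table:
        by_src.setdefault(src, []).append(dest)
    return [t + (dest,) for t in update_table for dest in by_src.get(t[-1], [])]

def triangle_tuples(edge_table):
    update1 = [(src, dest) for src, dest in edge_table]   # Scatter: (a, b)
    update2 = join(update1, edge_table)                    # R1: (a, b, c)
    update3 = join(update2, edge_table)                    # R2: (a, b, c, d)
    return [t for t in update3 if t[3] == t[0]]            # keep (a, b, c, a)

edge_table = [(1, 4), (2, 5), (4, 9), (5, 8), (9, 1), (8, 2)]
for t in triangle_tuples(edge_table):
    print(t)
```

In this sketch each directed 3-cycle appears once per starting vertex; the ordering constraints in the Datalog rule shown earlier are one way to keep a single representative.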

Outline

• How to provide a general programming interface for graph mining algorithms?
• How to implement relational operators efficiently for graphs?

Streaming for Join Operator

[Figure: Locality-Aware Join across Streaming Partition 1 and Streaming Partition 2; steps are labeled Load, Streaming, and Shuffle]

Update Table     Edge Table      Update Table
C1  C2        ⋈  Src  Dest       C1  C2  C3
3   1            1    2          3   1   2
6   2            2    5          6   2   5
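A partitioned version of the join might look like the sketch below. It assumes, beyond what the slide states, that each partition's edge table is the side loaded into memory, that update tuples are the side streamed, and that tuples are placed by a hypothetical modulo rule on the join column, so matching rows always sit in the same streaming partition.

```python
def partition_of(vid, num_partitions):
    # Hypothetical rule: a vertex ID maps to a streaming partition by modulo.
    return vid % num_partitions

def streaming_join(update_partitions, edge_partitions):
    """update_partitions[p]: tuples whose last column belongs to partition p.
    edge_partitions[p]: edges whose src belongs to partition p."""
    n = len(edge_partitions)
    out_partitions = [[] for _ in range(n)]
    for p in range(n):
        # Load: index this partition's edges by src in memory.
        by_src = {}
        for src, dest in edge_partitions[p]:
            by_src.setdefault(src, []).append(dest)
        # Streaming: scan update tuples sequentially and probe the index.
        for t in update_partitions[p]:
            for dest in by_src.get(t[-1], []):
                # Shuffle: route the new tuple by its new last column.
                out_partitions[partition_of(dest, n)].append(t + (dest,))
    return out_partitions

# Slide example: (3, 1) and (6, 2) joined with edges (1, 2) and (2, 5).
updates = [[(6, 2)], [(3, 1)]]      # partition 0 holds key 2, partition 1 holds key 1
edges = [[(2, 5)], [(1, 2)]]
print(streaming_join(updates, edges))
```

Because the output is shuffled by its last column, the next join phase can run partition by partition in the same way.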

Structural Information

[Figure: the subgraph on vertices 1, 2, 3 is joined with edge (3, 4) in one case and with edge (2, 4) in the other]

(1, 2, 3) ⋈ (3, 4) → update tuple (1, 2, 3, 4)
(1, 2, 3) ⋈ (2, 4) → update tuple (1, 2, 3, 4)

• Same update tuples, different subgraphs
• Structural info is missing!

Missing Structural Information

• Identical tuples may represent different structures
• Different tuples may represent identical structures

Adding Structural Info

• Encodes the history of joins in update tuples

subgraph                            update tuple      index
edge 6-8                            6 8               0 1
joined on 8 with vertex 7           6 8 7(1)          0 1 2
joined on 8 again with vertex 5     6 8 7(1) 5(1)     0 1 2 3

The number in parentheses is the index of the tuple element that the new vertex was joined on, so 7(1) means vertex 7 is attached to the element at index 1 (vertex 8).
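One way to picture this encoding is the sketch below, which stores each tuple element as a (vertex, join index) pair. The representation and the function names are assumptions for illustration; in particular, giving the second element of the starting edge a join index of 0 is a choice made here, since the slide leaves the first two elements unannotated.

```python
def start_tuple(edge):
    """An edge (u, v) becomes a two-element tuple; v is attached to index 0."""
    u, v = edge
    return [(u, None), (v, 0)]

def extend(tuple_, join_index, new_vertex):
    """Append new_vertex and record the index of the element it was joined on."""
    return tuple_ + [(new_vertex, join_index)]

def edges_of(tuple_):
    """Recover the subgraph's edges from the recorded join history."""
    return [(tuple_[idx][0], v) for v, idx in tuple_ if idx is not None]

# Slide example: start from edge 6-8, then join on index 1 (vertex 8) twice.
t = extend(extend(start_tuple((6, 8)), 1, 7), 1, 5)
print(edges_of(t))

# The ambiguous case from the previous slides: same vertices, different structure.
base = extend(start_tuple((1, 2)), 1, 3)
via_3 = extend(base, 2, 4)      # vertex 4 attached to vertex 3
via_2 = extend(base, 1, 4)      # vertex 4 attached to vertex 2
print(via_3 != via_2)
```

With the join index stored, the two tuples over vertices 1, 2, 3, 4 no longer compare equal, which is the missing structural information the earlier slides point out.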

Is Join Enough?

• Join grows a subgraph from one of its vertices
• For Frequent Subgraph Mining, we need to explore all possibilities of existing subgraphs
• A different way of joining to grow a subgraph from all of its vertices

Join on All Columns

• Joins update table with edge table on every column

[Figure: subgraphs grown step by step from the tuple (1, 2)]

(1, 2) grows into (1, 2, 3) and (1, 2, 4)
(1, 2, 3) grows with vertex 5, 6, or 7
(1, 2, 4) grows with vertex 8, 9, or 10
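A join on every column can be sketched as below. The function name is hypothetical, and skipping vertices already in the tuple is an assumption made here to keep the example small; the slides do not say how such cases are handled.

```python
def join_all_columns(update_table, edge_table):
    """Extend every tuple with each neighbor of each of its vertices."""
    by_src = {}
    for src, dest in edge_table:
        by_src.setdefault(src, []).append(dest)
    out = []
    for t in update_table:
        for column, vertex in enumerate(t):           # join on every column
            for dest in by_src.get(vertex, []):
                if dest not in t:                     # assumption: only add new vertices
                    out.append((t + (dest,), column))
    return out

# Made-up edges: vertex 1 also connects to 3, vertex 2 also connects to 4.
edge_table = [(1, 2), (1, 3), (2, 4)]
print(join_all_columns([(1, 2)], edge_table))
```

Each result carries the column it was joined on, which is the same history that the structural-info encoding records.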

Automorphism and Isomorphism

• Different threads can generate identical (automorphic) update tuples
  - thread 1: 1 2 3
  - thread 2: 1 2 3
• Select and keep one, remove all the other duplicates
• Different tuples may belong to same isomorphism class
• Aggregate to count number of each distinct shape

[Figure: two subgraphs, on vertices 1, 2, 3 and on vertices 4, 5, 6, with the same shape; Aggregation(shape, 2)]

Arabesque [CHC Teixeira et al., SOSP’15]
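The two steps on this slide, dropping duplicate tuples and then counting subgraphs per shape, can be illustrated with a brute-force sketch. It is not how RStream or Arabesque implement these checks: the canonical form here tries every vertex ordering, which is only practical for very small subgraphs, and all names are hypothetical.

```python
from collections import Counter
from itertools import permutations

def normalize(edges):
    """Order-independent form of one subgraph: a frozenset of undirected edges."""
    return frozenset(frozenset(e) for e in edges)

def canonical_shape(subgraph):
    """Smallest relabeled edge list over all vertex orderings (brute force)."""
    vertices = sorted({v for e in subgraph for v in e})
    best = None
    for perm in permutations(range(len(vertices))):
        relabel = dict(zip(vertices, perm))
        shape = tuple(sorted(tuple(sorted(relabel[v] for v in e)) for e in subgraph))
        if best is None or shape < best:
            best = shape
    return best

def aggregate(subgraphs):
    unique = {normalize(s) for s in subgraphs}            # keep one copy of duplicates
    return Counter(canonical_shape(s) for s in unique)    # count per distinct shape

found = [
    [(1, 2), (2, 3)],     # generated by one thread
    [(3, 2), (2, 1)],     # the same subgraph generated by another thread
    [(5, 4), (4, 6)],     # a different subgraph with the same shape
]
print(aggregate(found))
```

The duplicate is removed first, and the two remaining paths fall into one isomorphism class with a count of 2.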

Evaluation

• Platform
  - 10-node cluster, 5TB SSD
  - Each node: 2 Xeon(R) CPU E5-2640 v3 processors, 32GB memory
• Applications
  - Triangle Counting
  - Transitive Closure
  - N-Clique Finding
  - N-Motif Counting
  - Frequent Subgraph Mining
• Input graphs

Graphs        #Edges   #Vertices
Citeseer      4,732    3,312
Mico          1.1M     100K
Patents       14M      2.7M
LiveJournal   69M      4.8M
Orkut         117M     3M
UK-2005       936M     39.5M

Comparisons with Mining Systems

Application         System         Citeseer   Mico     Patent
Triangle Counting   RStream        0.04       15.8     6.7
                    Arabesque-10   38.1       43.1     114.9
5-Clique            RStream        0.01       115.1    35.3
                    Arabesque-10   42.8       132      174.5
3-FSM 1K            RStream        0.06       351.7    383.7
                    Arabesque-10   35.6       5790.1   -
                    ScaleMine-10   1.2        802.6    -
                    DistGraph-10   0.4        -        -

RStream outperforms Arabesque by 60.9x, ScaleMine by 12.1x, and DistGraph by 7.2x

[Figure: FSM on patent graph. x-axis: subgraph size - support (3-10K, 3-15K, 3-20K, 4-15K, 4-20K, 4-25K, 5-15K, 5-20K, 5-25K); y-axis: running time (seconds), 0 to 2000; series: RStream, ScaleMine, Arabesque]

Comparisons with Datalog Systems

Triangle Counting   LiveJournal   Orkut
RStream             87            827.4
BigDatalog-10       94.8          1205.3
BigDatalog-5        109.6         1850.3
BigDatalog-1        567.3         -
SociaLite           896.1         -

[Figure: Transitive Closure. x-axis: BD-1, BD-5, BD-10, SL, RS; y-axis: Time (seconds), 0 to 1,000; one value is labeled 8,021]

Size of Intermediate Data

4-Motif Counting on Mico

Phase   #MB
0       16.5
1       2086
2       886378
3       672194
Total   1.49TB

13MB initial graph, 68182X
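The totals on this slide can be checked with a few lines of arithmetic. The sketch assumes 1TB = 1024 × 1024 MB, and it reads the 68182X figure as the largest phase relative to the 13MB initial graph, which is an interpretation of the slide rather than something it states.

```python
phase_mb = {0: 16.5, 1: 2086, 2: 886378, 3: 672194}   # per-phase sizes from the table
initial_graph_mb = 13

total_mb = sum(phase_mb.values())
total_tb = total_mb / 1024 / 1024                     # assuming 1TB = 1024 * 1024 MB
largest_ratio = max(phase_mb.values()) / initial_graph_mb

print(f"total: {total_mb} MB = {total_tb:.2f} TB")
print(f"largest phase / initial graph: {largest_ratio:.1f}x")
```

Under these assumptions the per-phase sizes sum to about 1.49TB, and phase 2 alone is a little over 68182 times the size of the input graph.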

Conclusions

RStream: A single machine, out-of-core graph mining system

• A simple and expressive API
  - GAS + Relational Algebra => GRAS
• An efficient runtime engine
  - implements relational algebra with tuple streaming

https://github.com/rstream-system