
Distributed Estimation of Graph 4-Profiles

Ethan R. Elenberg, Karthikeyan Shanmugam, Michael Borokhovich, Alexandros G. Dimakis

University of Texas, Austin, USA

April 14, 2016

E. R. Elenberg Distributed Estimation of Graph 4-Profiles 1/23


Introduction

• Perform analytics on large graphs

- How does information spread across the Web?

- Is this social network user a spam bot?

- Classify a protein as helpful or harmful

• Generalize existing subgraph analysis

- Triangle counts, clustering coefficient, graphlet frequencies

• Scalable, distributed algorithms


Graph 3-profile

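• Count the induced subgraphs formed by selecting all triples of vertices

[Figure: example graph with the four possible induced subgraphs on 3 vertices, H0 through H3]

Counts for the example graph: H0 = 0, H1 = 5, H2 = 5, H3 = 0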
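A minimal brute-force sketch of this count, assuming H_i denotes the triple with exactly i edges (on three vertices the edge count fixes the subgraph type). The function name `three_profile` and the 5-cycle example are illustrative choices, not taken from the slides, and enumerating all triples is only practical for small graphs.

```python
from itertools import combinations


def three_profile(vertices, edges):
    """Return [H0, H1, H2, H3]: number of vertex triples inducing 0, 1, 2, 3 edges."""
    edge_set = {frozenset(e) for e in edges}
    profile = [0, 0, 0, 0]
    for triple in combinations(vertices, 3):
        # Count how many of the 3 possible pairs inside the triple are edges.
        k = sum(frozenset(pair) in edge_set for pair in combinations(triple, 2))
        profile[k] += 1
    return profile


# Hypothetical example graph: a cycle on 5 vertices.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(three_profile(range(5), cycle))  # -> [0, 5, 5, 0]
```

The entries always sum to the number of triples, here 10 for 5 vertices.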


Graph 4-profile

• Similarly, count the induced subgraphs formed by selecting all sets of 4 vertices

• n(G) = [0, 0, 0, 0, 2, 0, 0, 1, 2, 0, 0]


Graph 4-profile

Definition

Let $N_i$ be the number of $F_i$'s in a graph G. The vector $n(G) = [N_0, N_1, \dots, N_{10}]$ is called the global 4-profile of G.

- Always sums to $\binom{|V|}{4}$, the total number of 4-subgraphs

Definition

For each $v \in V$, the local 4-profile counts how many times v participates in each $F_i$ with 3 other vertices.
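The two definitions can be sketched by brute force on a small graph. The slides define the ordering F_0 to F_10 through figures that are not reproduced here, so this sketch instead keys each 4-vertex subgraph type by its sorted degree sequence, which is distinct for each of the 11 types. That keying, the function names, and the 5-cycle example are assumptions made for illustration. The local version counts participation per subgraph type only.

```python
from collections import Counter
from itertools import combinations
from math import comb


def _degree_sequence(quad, edge_set):
    """Sorted degree sequence of the subgraph induced by 4 vertices."""
    degree = dict.fromkeys(quad, 0)
    for u, w in combinations(quad, 2):
        if frozenset((u, w)) in edge_set:
            degree[u] += 1
            degree[w] += 1
    return tuple(sorted(degree.values()))


def global_four_profile(vertices, edges):
    """Count induced 4-vertex subgraphs, keyed by degree sequence."""
    edge_set = {frozenset(e) for e in edges}
    return Counter(_degree_sequence(q, edge_set) for q in combinations(vertices, 4))


def local_four_profiles(vertices, edges):
    """For each vertex, count the 4-subsets it belongs to, per subgraph type."""
    vertices = list(vertices)
    edge_set = {frozenset(e) for e in edges}
    local = {v: Counter() for v in vertices}
    for quad in combinations(vertices, 4):
        key = _degree_sequence(quad, edge_set)
        for v in quad:
            local[v][key] += 1
    return local


# Hypothetical example graph: a cycle on 5 vertices.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
profile = global_four_profile(range(5), cycle)
# The global profile sums to the number of 4-subsets of the vertex set.
assert sum(profile.values()) == comb(5, 4)
```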


Motivation

• Local 4-profiles embed each vertex into an 11-dimensional feature space

- Spam detection

- Generative models

• Global 4-profile concisely describes local connectivity

- Molecule classification


Introduction

• Problem: Compute (or approximate) 4-profile quantities for a large graph

• Previous approaches have 2 drawbacks

- Require global communication

- Many vertices redundantly repeat the same calculations


Contributions

1 Design novel, distributed algorithm to calculate local 4-profiles

2 Derive improved concentration bounds for 4-profile sparsifiers

3 Evaluate performance on real-world datasets


Related Work

Well studied across several communities (graphlets, motifs, subgraph frequencies):

• Graph sub-sampling

[Kim, Vu ’00] [Tsourakakis, et al. ’08 -’11] [Jha, et al. ’15]

• Large-scale triangle counting

[Shank ’07] [Satish, et al. ’14] [Eden, et al. ’15]

• Subgraph/graphlet counting equations

[Kloks, et al. ’00] [Kowaluk, et al. ’13] Orca [Hocevar, Demsar ’14] [E. ’15] [Ahmed, et al. ’15]


Outline

1 Introduction

2 4-Prof-Dist Algorithm

3 4-profile Sparsifier

- Edge Sub-sampling Process

- Concentration Bound

4 Experiments

5 Conclusions


4-Prof-Dist

• Message passing algorithm in the Gather-Apply-Scatter framework

- GraphLab, Pregel, Spark GraphX, etc.

- Communication only allowed between adjacent vertices

- Intermediate results stored as edge data


4-Prof-Dist

1 Each vertex computes its local 3-profile and triangle list

2 Each vertex solves a 17 × 17 system of equations relating its local 4-profile to its neighbors’ local 3-profiles

- Edge Pivots [Kloks, et al. ’00] [Kowaluk, et al. ’13] [E. ’15]

- 4-clique Counting

- 2-hop Histogram


Edge Sub-sampling Process

• Sub-sample each edge in the graph independently with probability p

• Relate the original and sub-sampled graphs via a 1-step Markov chain
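The sub-sampling step itself can be sketched in a few lines. This assumes the graph is given as a list of undirected edges. The function name `subsample_edges` and the `seed` parameter are hypothetical conveniences for reproducibility, not part of the slides.

```python
import random


def subsample_edges(edges, p, seed=None):
    """Keep each edge independently with probability p."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so p = 1 keeps every edge
    # and p = 0 keeps none.
    return [e for e in edges if rng.random() < p]


# Hypothetical example: a path on 4 vertices, sampled at p = 0.5.
edges = [(0, 1), (1, 2), (2, 3)]
sparse = subsample_edges(edges, p=0.5, seed=1)
assert set(sparse) <= set(edges)
```

Because edges are only ever removed, each induced subgraph of the original graph can only lose edges in the sub-sampled graph, which is what makes a one-step transition description possible.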


Edge Sub-sampling Process: 4-clique

[Figure: transition diagram from a 4-clique in the original graph to the subgraphs it can become in the sub-sampled graph. The 4-clique is retained with probability $p^6$; other legible transition labels include $6p^5(1-p)$ and $3p^4(1-p)^2$.]

Estimator (4-clique count) = $\frac{1}{p^6}$ × Sub-sampled (4-clique count)
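A small sketch of this estimator, under the assumption that 4-cliques are counted by brute force (practical only for small graphs). A 4-clique has 6 edges, so it survives sub-sampling with probability p^6, and dividing the sub-sampled count by p^6 gives an unbiased estimate. The function names and the `seed` parameter are hypothetical.

```python
import random
from itertools import combinations


def count_four_cliques(vertices, edges):
    """Count 4-vertex subsets whose 6 pairs are all edges."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        all(frozenset(pair) in edge_set for pair in combinations(quad, 2))
        for quad in combinations(vertices, 4)
    )


def estimate_four_cliques(vertices, edges, p, seed=None):
    """Sub-sample edges with probability p, then rescale the 4-clique count."""
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]
    return count_four_cliques(vertices, kept) / p**6


# Hypothetical example: the complete graph on 5 vertices has 5 four-cliques.
k5 = list(combinations(range(5), 2))
print(count_four_cliques(range(5), k5))          # exact count
print(estimate_four_cliques(range(5), k5, 0.9, seed=0))  # one random estimate
```

A single estimate can be far from the true count on a graph this small. The concentration result later in the slides addresses when the estimate is reliable.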


Edge Sub-sampling Process

In general, construct an invertible transition matrix H

$$H_{ij} = P(\text{sub-sampled} = F_i \mid \text{original} = F_j)$$

Unbiased estimators = $H^{-1}$ × Sub-sampled 4-profile
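The same construction is easier to see on 3-profiles, so the sketch below uses that smaller case as a stand-in rather than the 11 × 11 matrix for 4-profiles. It assumes H_i is the triple with i edges, so a triple with j edges becomes one with i edges with binomial probability. The function names are hypothetical, and the inverse is applied by back substitution because the matrix is upper triangular.

```python
from math import comb


def transition_matrix(p):
    """H[i][j] = P(sub-sampled triple has i edges | original has j edges)."""
    return [
        [comb(j, i) * p**i * (1 - p) ** (j - i) if i <= j else 0.0
         for j in range(4)]
        for i in range(4)
    ]


def unbiased_estimate(subsampled_profile, p):
    """Solve H x = subsampled_profile for x by back substitution (needs p > 0)."""
    H = transition_matrix(p)
    x = [0.0] * 4
    for i in range(3, -1, -1):
        tail = sum(H[i][j] * x[j] for j in range(i + 1, 4))
        x[i] = (subsampled_profile[i] - tail) / H[i][i]
    return x


# Check on the expected sub-sampled profile of a hypothetical true profile.
true_profile = [0, 5, 5, 0]
H = transition_matrix(0.5)
expected = [sum(H[i][j] * true_profile[j] for j in range(4)) for i in range(4)]
print(unbiased_estimate(expected, 0.5))
```

Applying the inverse to the expected sub-sampled profile recovers the true profile, which is the sense in which the estimators are unbiased.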


Main Concentration Result

• $N_{10}$ - # 4-cliques

• $k_{10}$ - Maximum # 4-cliques sharing a common edge

Theorem (4-clique sparsifier)

If the sampling probability

$$p \ge \left( \frac{\log(2/\delta)\, k_{10}}{2 \epsilon^2 N_{10}} \right)^{1/12},$$

then the relative error is bounded by $\epsilon$ with probability at least $1 - \delta$.

Proof Sketch:

- 4-clique estimator is associated with a read-$k_{10}$ function family

$$f(G, p) = e_1 e_2 e_4 e_5 e_7 e_8 + e_4 e_5 e_6 e_8 e_9 e_{10} + \dots$$
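To get a feel for the theorem, the threshold on p can be evaluated directly. This sketch only computes the right-hand side of the inequality, assuming the natural logarithm. The function name and the input values in the example are hypothetical, not measurements from the slides.

```python
from math import log


def min_sampling_probability(n10, k10, eps, delta):
    """Right-hand side of the 4-clique sparsifier condition on p.

    n10: number of 4-cliques, k10: max number of 4-cliques sharing an edge,
    eps: target relative error, delta: allowed failure probability.
    A value above 1 means the condition cannot be met by any p <= 1.
    """
    return (log(2 / delta) * k10 / (2 * eps**2 * n10)) ** (1 / 12)


# Hypothetical inputs chosen only to show the call.
print(min_sampling_probability(n10=10**6, k10=10, eps=0.1, delta=0.05))
```

Because of the 1/12 exponent, the threshold falls slowly as the number of 4-cliques grows relative to $k_{10}$.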


Implementation

• GraphLab PowerGraph v2.2

• Multicore server

- 256 GB RAM, 72 logical cores

• EC2 cluster (Amazon Web Services)

- 20 c3.8xlarge, 60 GB RAM, 32 logical cores each

Datasets

Name          Vertices    Edges (undirected)
WEB-NOTRE     325,729     1,090,108
LiveJournal   4,846,609   42,851,237


Results: AWS Full Neighborhood vs. Histogram, 10 runs

[Figure: WEB-NOTRE, AWS. Network sent [bytes] (×10^10, axis from 0.0 to 2.0) for 12, 14, 16, 18, and 20 nodes; series: 2-hop and 2-hop histogram.]


Results: AWS Full Neighborhood vs. Histogram, 10 runs

[Figure: WEB-NOTRE, AWS. Running time [sec] (axis from 0 to 50) for 12, 14, 16, 18, and 20 nodes; series: 2-hop and 2-hop histogram.]


Results: Concentration Bounds

[Figure: LiveJournal, Sparsifier Accuracy. x-axis: edge sampling probability (0.0 to 1.0); y-axis: accuracy, |exact - approx| (log scale, 10^0 to 10^30); legend text: Kim-Vu, Read-k, 4-cliques, Measured.]

Improves on previous bounds if p = Ω(1/log |E|)


Results: Multicore Running Time Comparison, 10 runs

[Figure: LiveJournal, Asterix. Running time [sec] (axis from 0 to 2500) for 10, 20, 30, 40, 50, and 60 cores; reference line: Orca: 1288 sec.]

4-Prof-Dist, p = 1 vs. Orca [Hocevar, Demsar ’14]


Results: Multicore Running Time Comparison, 10 runs

[Figure: LiveJournal, Asterix. Running time [sec] (axis from 0 to 600) for p = 1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1.]


Results: AWS Running Time Comparison, 10 runs

[Figure: LiveJournal, AWS. Running time [sec] (axis from 0 to 200) for 8, 12, 16, and 20 nodes; series: p = 1, 0.7, 0.4, 0.1.]


Summary

1 2-hop histogram reduces network communication in a distributed setting

2 Edge sub-sampling produces fast, accurate 4-profile estimates

- Bounds for other subgraphs in the full paper

3 Distributed/parallel implementation improves performance at scale

github.com/eelenberg/4-profiles


(Backup) Results: AWS Network Communication, 10 runs

[Figure: LiveJournal, AWS. Network sent [bytes] (×10^11, axis from 0.0 to 3.5) for 8, 12, 16, and 20 nodes; series: p = 1, 0.7, 0.4, 0.1.]


(Backup) 2-hop Histogram

• At each vertex a, store a $(p, c_a[p])$ pair for each neighbor p

• For any $a \in \Gamma(v)$ and $p \notin \Gamma(v)$,

$$c_a[p] = 1 \iff vap \text{ forms a 2-path}$$

• Gather across neighbors to count the number of distinct 2-paths from v to p

$$\sum_{p \notin \Gamma(v)} \binom{\bigoplus_{a \in \Gamma(v)} c_a[p]}{2} = F_7(v) + F_9(v)$$
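The left-hand side of this identity can be sketched on a single machine as follows. It assumes an adjacency-set representation, reads the gather operator as an ordinary sum of the per-neighbor indicators, and excludes p = v from the sum, which the slide leaves implicit. The function name is hypothetical, and the sketch does not compute the right-hand side.

```python
from collections import Counter
from math import comb


def two_hop_histogram_sum(adj, v):
    """Sum over non-neighbors p of C(number of 2-paths from v to p, 2).

    adj maps each vertex to the set of its neighbors (undirected, no loops).
    """
    two_paths = Counter()
    for a in adj[v]:                      # each neighbor a contributes its list
        for p in adj[a]:
            if p != v and p not in adj[v]:
                two_paths[p] += 1         # v-a-p is a 2-path with p not adjacent to v
    return sum(comb(count, 2) for count in two_paths.values())


# Hypothetical example: a 4-cycle 0-1-2-3-0. Vertex 2 is reached from
# vertex 0 by two distinct 2-paths, so the sum is C(2, 2) = 1.
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(two_hop_histogram_sum(square, 0))  # -> 1
```

Only a count per 2-hop vertex is kept rather than the full list of connecting neighbors, which is the point of storing a histogram.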


(Backup) Local 3-profile

Vertex program in the Gather-Apply-Scatter framework [E. ’15]

1 For each vertex v: Gather and Apply vertex IDs to store Γ(v)

2 For each edge va: Scatter

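$$n_{3,va} = |\Gamma(v) \cap \Gamma(a)|, \qquad n^c_{2,va} = |\Gamma(v)| - |\Gamma(v) \cap \Gamma(a)| - 1, \quad \dots$$

3 For each vertex v: Gather and Apply

$$n_{3,v} = \frac{1}{2} \sum_{a \in \Gamma(v)} n_{3,va}, \qquad n^c_{2,v} = \frac{1}{2} \sum_{a \in \Gamma(v)} n^c_{2,va}, \quad \dots$$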
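The two quantities written out above can be sketched sequentially, with a loop over vertices standing in for the Gather-Apply-Scatter phases. This covers only $n_{3,v}$ and $n^c_{2,v}$, not the terms elided by the dots. The function name and the adjacency-set input are assumptions.

```python
def local_three_profile_counts(adj):
    """Return (n3, n2c): per-vertex triangle count and count of open
    2-paths centered at the vertex.

    adj maps each vertex to the set of its neighbors (undirected, no loops).
    """
    n3, n2c = {}, {}
    for v, nbrs in adj.items():
        triangles_twice = 0
        wedges_twice = 0
        for a in nbrs:
            common = len(nbrs & adj[a])              # n_{3,va}
            triangles_twice += common
            wedges_twice += len(nbrs) - common - 1   # n^c_{2,va}
        # Each triangle and each open 2-path at v is seen from two edges.
        n3[v] = triangles_twice // 2
        n2c[v] = wedges_twice // 2
    return n3, n2c


# Hypothetical example: triangle 0-1-2 with a pendant vertex 3 attached to 0.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
n3, n2c = local_three_profile_counts(graph)
print(n3[0], n2c[0])  # -> 1 2
```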


(Backup) Edge Pivot Equations

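Local 3-profile quantities at v: $H_0(v)$, $H^e_1(v)$, $H^d_1(v)$, $H^c_2(v)$, $H^e_2(v)$, $H_3(v)$

$$\sum_{a \in \Gamma(v)} \binom{n^e_{1,va}}{2} = F_1(v) + F_2(v)$$

$$\sum_{a \in \Gamma(v)} n^e_{1,va}\, n_{3,va} = 2F_5(v) + F''_8(v)$$

$$\sum_{a \in \Gamma(v)} \binom{n^c_{2,va}}{2} = 3F'_6(v) + F'_8(v)$$

$$\sum_{a \in \Gamma(v)} n^c_{2,va}\, n^e_{2,va} = F'_4(v) + 2F_7(v)$$

$$\sum_{a \in \Gamma(v)} \binom{n_{3,va}}{2} = F'_9(v) + 3F_{10}(v)$$

$$\sum_{a \in \Gamma(v)} n^c_{2,va}\, n_{3,va} = 2F'_8(v) + 2F'_9(v)$$

$$\sum_{a \in \Gamma(v)} n^e_{1,va}\, n^c_{2,va} = 2F'_3(v) + F'_4(v)$$

$$n^d_{1,v}\, |\Gamma(v)| = F_2(v) + F_4(v) + F_8(v)$$

[Figure: legend of the distinct vertex positions in $F_3(v)$, $F_4(v)$, $F_6(v)$, $F_8(v)$, and $F_9(v)$, marked F, F′, and F′′]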


(Backup) Edge Pivot Equations

• Relate local 4-profile at v to neighboring local 3-profiles

• Accumulate pairs of subgraph counts involving edge va,summed over all neighbors

v a

∑a∈Γ(v) n

c2,van

e2,va F

′4(v) 2F7(v)

= +

E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23

Page 68: Ethan R. Elenberg, Karthikeyan Shanmugam, Michael ...eelenberg.github.io/Elenberg4profile · Ethan R. Elenberg, Karthikeyan Shanmugam, Michael Borokhovich, Alexandros G. Dimakis University

(Backup) Clique Counting

• Directly count the number of 4-cliques, F10

1. Initially, each vertex stores its triangle list ∆(v)

2. For each edge va: gather triangles of a that contain 3 neighbors of v

[Figure: vertices v, a, b, c]

$\sum_{a \in \Gamma(v)} \left| \{ (b, c) \in \Delta(a) : b \in \Gamma(v),\ c \in \Gamma(v) \} \right|$
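The two steps above can be sketched on a single machine as follows. This is a minimal illustration, not the distributed implementation: it assumes ∆(a) stores each triangle at a as one unordered pair (b, c), so every 4-clique {v, a, b, c} is gathered once for each of the three choices of a and the sum is divided by 3. The function names are hypothetical.

```python
from itertools import combinations

def build_adj(edges):
    """Build an adjacency map {vertex: set of neighbors} from an edge list."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    return adj

def triangle_lists(adj):
    """Step 1: each vertex a stores its triangles as unordered pairs (b, c)."""
    tri = {a: set() for a in adj}
    for a in adj:
        for b, c in combinations(sorted(adj[a]), 2):
            if c in adj[b]:
                tri[a].add((b, c))
    return tri

def local_four_cliques(adj, tri, v):
    """Step 2: gather, over neighbors a of v, triangles of a whose other two
    vertices are also neighbors of v, then correct for triple counting."""
    gathered = 0
    for a in adj[v]:
        gathered += sum(1 for b, c in tri[a] if b in adj[v] and c in adj[v])
    return gathered // 3  # each 4-clique at v is seen from 3 neighbors

# Example (an assumption for illustration): the complete graph on 5 vertices.
k5_adj = build_adj(list(combinations(range(5), 2)))
k5_tri = triangle_lists(k5_adj)
print(local_four_cliques(k5_adj, k5_tri, 0))
```

Storing triangles as sorted pairs keeps the membership test in step 2 to two set lookups per triangle.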

(Backup) Concentration Result

• Because there are no large constants, our concentration bounds improve on earlier work in practical settings

Corollary (Comparison with [Kim,Vu])

Let G be a graph with m edges. If $p = \Omega(1/\log m)$ and $\delta = \Omega(1/m)$, then read-k provides better triangle sparsifier accuracy than [Kim,Vu]. If additionally $k_{10} \le N_{10}^{5/6}$, then read-k provides better 4-clique sparsifier accuracy than [Kim,Vu].
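To make the quantity in the corollary concrete, the sketch below shows one common form of triangle sparsifier, assumed here rather than taken from the slides: keep each edge independently with probability p, count triangles in the sampled graph, and rescale by 1/p^3. The function names and the `seed` parameter are hypothetical, and the edge list is assumed to contain no duplicates or self-loops.

```python
import random
from itertools import combinations

def count_triangles(edges):
    """Exact triangle count; each triangle is seen once from each of its 3 edges."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    return sum(len(adj[u] & adj[w]) for u, w in edges) // 3

def sparsified_triangle_estimate(edges, p, seed=0):
    """Keep each edge with probability p, then rescale the sampled count by 1/p^3."""
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p**3

# Example (an assumption for illustration): the complete graph on 5 vertices.
k5_edges = list(combinations(range(5), 2))
print(count_triangles(k5_edges), sparsified_triangle_estimate(k5_edges, p=0.7))
```

A triangle survives sampling only if all three of its edges are kept, which happens with probability p^3; this is why the rescaled count is unbiased under the stated sampling assumption.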

(Backup) Results: 4-profile Sparsifier Accuracy, 10 runs

[Figure: LiveJournal, Accuracy, 4-profiles. x-axis: p = 0.9, 0.7, 0.4, 0.3, 0.1; y-axis: Accuracy: exact/approx, ticks from 0.90 to 1.10; series: F7, F8, F9, F10]
