Leopard: Lightweight Partitioning and Replication for Dynamic Graphs


Jiewen Huang and Daniel Abadi, Yale University

Facebook Social Graph

Social Graphs

Web Graphs

Semantic Graphs

Many systems use hash partitioning

● Results in many edges being “cut”

Given a graph G and an integer k, partition the vertices into k disjoint sets such that:

● as few cuts as possible

● as balanced as possible

Graph Partitioning

NP-hard
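Written as an optimization problem (a compact sketch; the epsilon-slack form of the balance constraint is a standard way to make "as balanced as possible" precise and is not taken from the slides):

\min_{V_1, \dots, V_k} \; \bigl| \{ (u,v) \in E : u \in V_i,\ v \in V_j,\ i \neq j \} \bigr|
\quad \text{s.t.} \quad |V_i| \le (1 + \epsilon)\,\frac{|V|}{k} \ \text{ for all } i,

where V_1, ..., V_k are disjoint and together cover V.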

Multilevel scheme: coarsening phase

State of the Art

The only constant is change.

-- Heraclitus

To Make the Problem More Complicated

Social graphs: new people and friendships
Semantic Web graphs: new knowledge
Web graphs: new websites and links

Dynamic Graphs

[Figure: vertex A in a graph split across Partition 1 and Partition 2]

Is Partition 1 still the best partition for A?

Repartitioning the entire graph on every change is far too expensive

New Framework

Leopard:

● Locally reassess partitioning as a result of changes, without a full re-partitioning

● Integrate consideration of replication with partitioning

Outline

Background and Motivation

Leopard

Overview

Computation Skipping

Replication

Experiments

Algorithm Overview

For each added/deleted edge <V1, V2>:

Compute the best partition for V1 using a heuristic

Reassign V1 if needed

Do the same for V2
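A minimal sketch of this per-edge flow in Python (function and variable names are illustrative, not the system's actual API; the scoring heuristic is the one shown on the next slides):

# Minimal sketch of the per-edge reassessment (illustrative names).
from collections import defaultdict

def score(neighbors_in_p, vertices_in_p, capacity):
    # Heuristic from the talk: # neighbours * (1 - # vertices / capacity)
    return neighbors_in_p * (1 - vertices_in_p / capacity)

def best_partition(neighbors, assignment, num_partitions, capacity, v):
    sizes = defaultdict(int)
    for p in assignment.values():
        sizes[p] += 1
    best_p, best_s = None, float("-inf")
    for p in range(num_partitions):
        nbrs_in_p = sum(1 for u in neighbors[v] if assignment.get(u) == p)
        s = score(nbrs_in_p, sizes[p], capacity)
        if s > best_s:
            best_p, best_s = p, s
    return best_p

def on_edge_added(neighbors, assignment, num_partitions, capacity, v1, v2):
    neighbors[v1].add(v2)
    neighbors[v2].add(v1)
    for v in (v1, v2):                      # reassess both endpoints of the new edge
        assignment[v] = best_partition(neighbors, assignment,
                                       num_partitions, capacity, v)

# Usage: neighbors is an adjacency map, assignment maps vertex -> partition id.
neighbors, assignment = defaultdict(set), {}
on_edge_added(neighbors, assignment, num_partitions=2, capacity=6, v1="A", v2="B")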

Example: Adding an Edge

[Figure: adding edge (A, B); Partition 1 and Partition 2]

Compute the Partition for B

Partition 1: # neighbours of B = 1, # vertices = 5
Partition 2: # neighbours of B = 3, # vertices = 3

Goals: (1) few cuts and (2) balanced

Heuristic: # neighbours * (1 - # vertices / capacity), with capacity = 6

Partition 1: 1 * (1 - 5/6) = 0.17
Partition 2: 3 * (1 - 3/6) = 1.5   (higher score)

This heuristic is simplified for the sake of presentation; more advanced heuristics are discussed in the paper.
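Plugging the slide's numbers into the heuristic (a quick check in Python; capacity = 6 as implied by the arithmetic above):

capacity = 6
p1 = 1 * (1 - 5 / capacity)   # B has 1 neighbour in Partition 1, which holds 5 vertices: ~0.17
p2 = 3 * (1 - 3 / capacity)   # B has 3 neighbours in Partition 2, which holds 3 vertices: 1.5
assert p2 > p1                # so B scores higher for Partition 2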

Compute the Partition for A

Partition 1: # neighbours of A = 1, # vertices = 4
Partition 2: # neighbours of A = 2, # vertices = 4

Goals: (1) few cuts and (2) balanced

Heuristic: # neighbours * (1 - # vertices / capacity), with capacity = 6

Partition 1: 1 * (1 - 4/6) = 0.33
Partition 2: 2 * (1 - 4/6) = 0.66   (higher score)

Example: Adding an Edge

(1) B stays put
(2) A moves to Partition 2

Outline

Background and Motivation

Leopard

Overview

Computation Skipping

Replication

Experiments

Computation Cost

For each new edge:
● For both vertices involved in the edge:
○ Calculate the heuristic for each partition (may involve communication for remote vertex-location lookups)

Computation Skipping

Observation: As the number of neighbors of a vertex increases, the influence of a new neighbor decreases.

Computation Skipping

Basic idea: accumulate changes for a vertex; once the accumulated changes exceed a threshold, recompute the partition for that vertex.

For example, let the threshold on the ratio (# accumulated changes / # neighbors) be 20%.

(1) Compute the partition when V has 10 neighbors. Then 2 new edges are added for V: 2 / 12 = 17% < 20%. Don’t recompute

(2) When 1 more new edge is added for V: 3 / 13 = 23% > 20%. Recompute the partition for V. Reset # accumulated changes to 0.
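A small sketch of this rule (illustrative class and names; the 20% threshold and the counts match the example above):

THRESHOLD = 0.20   # recompute once accumulated changes exceed 20% of the neighbour count

class VertexState:
    def __init__(self, num_neighbors):
        self.num_neighbors = num_neighbors       # current number of neighbours of V
        self.accumulated_changes = 0             # edges added since the last recomputation

    def on_new_edge(self):
        self.num_neighbors += 1
        self.accumulated_changes += 1
        if self.accumulated_changes / self.num_neighbors > THRESHOLD:
            self.accumulated_changes = 0         # reset after deciding to recompute
            return True                          # caller recomputes the partition for V
        return False                             # skip the recomputation

v = VertexState(num_neighbors=10)                # partition last computed when V had 10 neighbours
print(v.on_new_edge(), v.on_new_edge(), v.on_new_edge())   # 1/11, 2/12, 3/13 -> False False True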

Outline

Background and Motivation

Leopard

Overview

Computation Skipping

Replication

Experiments

Goals of replication:

fault tolerance (several copies of each data point/block)

further cut reduction

Replication

Minimum-average replication takes two parameters:

● minimum: for fault tolerance

● average: for cut reduction

Minimum-Average Replication

Example

# copies vertices

2 A,C,D,E,H,J,K,L

3 F,I

4 B,G

min = 2, average = 2.5

[Figure legend: first copy vs. replica]
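The average follows from the table (a quick check; there are 12 vertices in total):

\text{average} = \frac{8 \cdot 2 + 2 \cdot 3 + 2 \cdot 4}{12} = \frac{30}{12} = 2.5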


How Many Copies?

Scores of each partition for A:
Partition 1 = 0.1, Partition 2 = 0.2, Partition 3 = 0.3, Partition 4 = 0.4

minimum = 2, average = 3

How Many Copies?

minimum = 2, average = 3

Minimum requirement: place copies of A on the 2 highest-scoring partitions (Partition 4 and Partition 3). What about the remaining partitions?

Always keep the last n computed scores.

Comparing against Past Scores

[Figure: the last n computed scores, sorted from high (0.9, 0.87, ...) to low (..., 0.11, 0.1)]

minimum = 2, average = 3

cutoff: the top (average - 1)/(k - 1) percent of the stored scores
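One consistent reading of this cutoff (an interpretation, not a quote from the paper): the first copy of a vertex is always placed, and each of the remaining k - 1 candidate partitions earns a copy when its score lands in the top (average - 1)/(k - 1) fraction of the stored scores, so that in expectation

\mathbb{E}[\#\text{copies}] = 1 + (k - 1)\cdot\frac{\text{average} - 1}{k - 1} = \text{average},

with the minimum then enforced as a floor for fault tolerance.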

cutoff: the 30th highest of the stored scores

[Figure: each remaining partition's score for a vertex is compared against this cutoff; depending on how many scores clear it, the vertex ends up with 2, 3, or 4 copies]
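A minimal sketch of the copy-count decision under the reading above (illustrative names; past_scores stands in for the last n stored scores, and minimum is enforced as a floor):

# Sketch: decide how many copies a vertex gets under minimum-average replication.
def choose_copies(partition_scores, past_scores, minimum, average):
    k = len(partition_scores)
    # Cutoff: the score ranked at the top (average - 1)/(k - 1) fraction of the stored scores.
    n_qualify = int(len(past_scores) * (average - 1) / (k - 1))
    ranked_past = sorted(past_scores, reverse=True)
    cutoff = ranked_past[n_qualify - 1] if n_qualify > 0 else float("inf")
    ranked = sorted(range(k), key=lambda p: partition_scores[p], reverse=True)
    copies = [ranked[0]]                                   # first copy: highest-scoring partition
    copies += [p for p in ranked[1:] if partition_scores[p] >= cutoff]  # extras above the cutoff
    while len(copies) < minimum:                           # enforce the fault-tolerance floor
        copies.append(ranked[len(copies)])
    return copies

# With the slide's scores 0.1-0.4 (partitions indexed 0-3 here) and a uniform score history,
# no remaining partition clears the cutoff, so only the minimum of 2 copies is placed.
print(choose_copies([0.1, 0.2, 0.3, 0.4],
                    past_scores=[i / 100 for i in range(100)],
                    minimum=2, average=3))                 # -> [3, 2]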

Outline

Background and Motivation

Leopard

Experiments

Experiment Setup

● Comparison points
○ Leopard with FENNEL heuristics

○ One-pass FENNEL (no vertex reassignment)

○ METIS (static graphs)

○ ParMETIS (repartitioning for dynamic graphs)

○ Hash Partitioning

● Graph datasets
○ Types: social graphs, collaboration graphs, Web graphs, email graphs, and synthetic graphs

○ Size: up to 66 million vertices and 1.8 billion edges

Edge Cut

Computation Skipping

Effect of Replication on Edge Cut

Thanks!

Q & A
