Leopard: Lightweight Partitioning and Replication for Dynamic Graphs


Jiewen Huang and Daniel Abadi, Yale University

Facebook Social Graph

Social Graphs

Web Graphs

Semantic Graphs

Many systems use hash partitioning

● Results in many edges being “cut”

Given a graph G and an integer k, partition the vertices into k disjoint sets such that:

● as few cuts as possible

● as balanced as possible

Graph Partitioning

NP-hard
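Written as an optimization problem (a compact sketch; the epsilon-slack form of the balance constraint is a standard way to make "as balanced as possible" precise and is not taken from the slides):

\min_{V_1, \dots, V_k} \; \bigl| \{ (u,v) \in E : u \in V_i,\ v \in V_j,\ i \neq j \} \bigr|
\quad \text{s.t.} \quad |V_i| \le (1 + \epsilon)\,\frac{|V|}{k} \ \text{ for all } i,

where V_1, ..., V_k are disjoint and together cover V.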

Multilevel scheme: coarsening phase

State of the Art

The only constant is change.

-- Heraclitus

To Make the Problem More Complicated

Social graphs: new people and friendships
Semantic Web graphs: new knowledge
Web graphs: new websites and links

Dynamic Graphs

[Figure: vertex A in a graph split across Partition 1 and Partition 2]

Is Partition 1 still the best partition for A?

Repartitioning the entire graph on every change is far too expensive

New Framework

Leopard:

● Locally reassess partitioning as a result of changes, without a full re-partitioning

● Integrate consideration of replication with partitioning

Outline

Background and Motivation

Leopard

Overview

Computation Skipping

Replication

Experiments

Algorithm Overview

For each added/deleted edge <V1, V2>:

Compute the best partition for V1 using a heuristic

Reassign V1 if needed

Do the same for V2
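A minimal sketch of this per-edge flow in Python (function and variable names are illustrative, not the system's actual API; the scoring heuristic is the one shown on the next slides):

# Minimal sketch of the per-edge reassessment (illustrative names).
from collections import defaultdict

def score(neighbors_in_p, vertices_in_p, capacity):
    # Heuristic from the talk: # neighbours * (1 - # vertices / capacity)
    return neighbors_in_p * (1 - vertices_in_p / capacity)

def best_partition(neighbors, assignment, num_partitions, capacity, v):
    sizes = defaultdict(int)
    for p in assignment.values():
        sizes[p] += 1
    best_p, best_s = None, float("-inf")
    for p in range(num_partitions):
        nbrs_in_p = sum(1 for u in neighbors[v] if assignment.get(u) == p)
        s = score(nbrs_in_p, sizes[p], capacity)
        if s > best_s:
            best_p, best_s = p, s
    return best_p

def on_edge_added(neighbors, assignment, num_partitions, capacity, v1, v2):
    neighbors[v1].add(v2)
    neighbors[v2].add(v1)
    for v in (v1, v2):                      # reassess both endpoints of the new edge
        assignment[v] = best_partition(neighbors, assignment,
                                       num_partitions, capacity, v)

# Usage: neighbors is an adjacency map, assignment maps vertex -> partition id.
neighbors, assignment = defaultdict(set), {}
on_edge_added(neighbors, assignment, num_partitions=2, capacity=6, v1="A", v2="B")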

Example: Adding an Edge

[Figure: adding edge (A, B); Partition 1 and Partition 2]

Compute the Partition for B

Partition 1: # neighbours of B = 1, # vertices = 5
Partition 2: # neighbours of B = 3, # vertices = 3

Goals: (1) few cuts and (2) balanced

Heuristic: # neighbours * (1 - # vertices / capacity), with capacity = 6

Partition 1: 1 * (1 - 5/6) = 0.17
Partition 2: 3 * (1 - 3/6) = 1.5   (higher score)

This heuristic is simplified for the sake of presentation; more advanced heuristics are discussed in the paper.
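Plugging the slide's numbers into the heuristic (a quick check in Python; capacity = 6 as implied by the arithmetic above):

capacity = 6
p1 = 1 * (1 - 5 / capacity)   # B has 1 neighbour in Partition 1, which holds 5 vertices: ~0.17
p2 = 3 * (1 - 3 / capacity)   # B has 3 neighbours in Partition 2, which holds 3 vertices: 1.5
assert p2 > p1                # so B scores higher for Partition 2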

Compute the Partition for A

Partition 1: # neighbours of A = 1, # vertices = 4
Partition 2: # neighbours of A = 2, # vertices = 4

Goals: (1) few cuts and (2) balanced

Heuristic: # neighbours * (1 - # vertices / capacity), with capacity = 6

Partition 1: 1 * (1 - 4/6) = 0.33
Partition 2: 2 * (1 - 4/6) = 0.66   (higher score)

Example: Adding an Edge

(1) B stays put
(2) A moves to Partition 2

Outline

Background and Motivation

Leopard

Overview

Computation Skipping

Replication

Experiments

Computation Cost

For each new edge:
● For both vertices involved in the edge:
○ Calculate the heuristic for each partition (may involve communication for remote vertex-location lookups)

Computation Skipping

Observation: As the number of neighbors of a vertex increases, the influence of a new neighbor decreases.

Computation Skipping

Basic idea: accumulate changes for a vertex; once the accumulated changes exceed a threshold, recompute the partition for that vertex.

For example, let the threshold on the ratio (# accumulated changes / # neighbors) be 20%.

(1) Compute the partition when V has 10 neighbors. Then 2 new edges are added for V: 2 / 12 = 17% < 20%. Don’t recompute

(2) When 1 more new edge is added for V: 3 / 13 = 23% > 20%. Recompute the partition for V. Reset # accumulated changes to 0.
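A small sketch of this rule (illustrative class and names; the 20% threshold and the counts match the example above):

THRESHOLD = 0.20   # recompute once accumulated changes exceed 20% of the neighbour count

class VertexState:
    def __init__(self, num_neighbors):
        self.num_neighbors = num_neighbors       # current number of neighbours of V
        self.accumulated_changes = 0             # edges added since the last recomputation

    def on_new_edge(self):
        self.num_neighbors += 1
        self.accumulated_changes += 1
        if self.accumulated_changes / self.num_neighbors > THRESHOLD:
            self.accumulated_changes = 0         # reset after deciding to recompute
            return True                          # caller recomputes the partition for V
        return False                             # skip the recomputation

v = VertexState(num_neighbors=10)                # partition last computed when V had 10 neighbours
print(v.on_new_edge(), v.on_new_edge(), v.on_new_edge())   # 1/11, 2/12, 3/13 -> False False True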

Outline

Background and Motivation

Leopard

Overview

Computation Skipping

Replication

Experiments

Goals of replication:

fault tolerance (several copies of each data point/block)

further cut reduction

Replication

Minimum-average replication takes two parameters:

● minimum: for fault tolerance

● average: for cut reduction

Minimum-Average Replication

Example

# copies vertices

2 A,C,D,E,H,J,K,L

3 F,I

4 B,G

min = 2, average = 2.5

[Figure legend: first copy vs. replica]
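The average follows from the table (a quick check; there are 12 vertices in total):

\text{average} = \frac{8 \cdot 2 + 2 \cdot 3 + 2 \cdot 4}{12} = \frac{30}{12} = 2.5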


How Many Copies?

Scores of each partition for A:
Partition 1 = 0.1, Partition 2 = 0.2, Partition 3 = 0.3, Partition 4 = 0.4

minimum = 2, average = 3

How Many Copies?

minimum = 2, average = 3

Minimum requirement: place copies of A on the 2 highest-scoring partitions (Partition 4 and Partition 3). What about the remaining partitions?

Always keep the last n computed scores.

Comparing against Past Scores

[Figure: the last n computed scores, sorted from high (0.9, 0.87, ...) to low (..., 0.11, 0.1)]

minimum = 2, average = 3

cutoff: the top (average - 1)/(k - 1) percent of the stored scores
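One consistent reading of this cutoff (an interpretation, not a quote from the paper): the first copy of a vertex is always placed, and each of the remaining k - 1 candidate partitions earns a copy when its score lands in the top (average - 1)/(k - 1) fraction of the stored scores, so that in expectation

\mathbb{E}[\#\text{copies}] = 1 + (k - 1)\cdot\frac{\text{average} - 1}{k - 1} = \text{average},

with the minimum then enforced as a floor for fault tolerance.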

cutoff: the 30th highest of the stored scores

[Figure: each remaining partition's score for a vertex is compared against this cutoff; depending on how many scores clear it, the vertex ends up with 2, 3, or 4 copies]
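A minimal sketch of the copy-count decision under the reading above (illustrative names; past_scores stands in for the last n stored scores, and minimum is enforced as a floor):

# Sketch: decide how many copies a vertex gets under minimum-average replication.
def choose_copies(partition_scores, past_scores, minimum, average):
    k = len(partition_scores)
    # Cutoff: the score ranked at the top (average - 1)/(k - 1) fraction of the stored scores.
    n_qualify = int(len(past_scores) * (average - 1) / (k - 1))
    ranked_past = sorted(past_scores, reverse=True)
    cutoff = ranked_past[n_qualify - 1] if n_qualify > 0 else float("inf")
    ranked = sorted(range(k), key=lambda p: partition_scores[p], reverse=True)
    copies = [ranked[0]]                                   # first copy: highest-scoring partition
    copies += [p for p in ranked[1:] if partition_scores[p] >= cutoff]  # extras above the cutoff
    while len(copies) < minimum:                           # enforce the fault-tolerance floor
        copies.append(ranked[len(copies)])
    return copies

# With the slide's scores 0.1-0.4 (partitions indexed 0-3 here) and a uniform score history,
# no remaining partition clears the cutoff, so only the minimum of 2 copies is placed.
print(choose_copies([0.1, 0.2, 0.3, 0.4],
                    past_scores=[i / 100 for i in range(100)],
                    minimum=2, average=3))                 # -> [3, 2]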

Outline

Background and Motivation

Leopard

Experiments

Experiment Setup

● Comparison points
○ Leopard with FENNEL heuristics

○ One-pass FENNEL (no vertex reassignment)

○ METIS (static graphs)

○ ParMETIS (repartitioning for dynamic graphs)

○ Hash Partitioning

● Graph datasets
○ Types: social graphs, collaboration graphs, Web graphs, email graphs, and synthetic graphs

○ Size: up to 66 million vertices and 1.8 billion edges

Edge Cut

Computation Skipping

Effect of Replication on Edge Cut

Thanks!

Q & A
