66
Carnegie Mellon Universit Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Joe Hellerstein Alex Smola @ e Next Generation of the GraphLab Abstractio Jay Gu 2

Joseph Gonzalez Joint work with

  • Upload
    skylar

  • View
    37

  • Download
    0

Embed Size (px)

DESCRIPTION

2. @. The Next Generation of the GraphLab Abstraction. Joseph Gonzalez Joint work with. Yucheng Low. Aapo Kyrola. Danny Bickson. Carlos Guestrin. Joe Hellerstein. Jay Gu. Alex Smola. How will we design and implement parallel learning systems?. ... a popular answer :. - PowerPoint PPT Presentation

Citation preview

Page 1: Joseph Gonzalez Joint work with

Carnegie Mellon University

Joseph GonzalezJoint work with

YuchengLow

AapoKyrola

DannyBickson

CarlosGuestrin

JoeHellerstein

AlexSmola

@The Next Generation of the GraphLab Abstraction.

JayGu

2

Page 2: Joseph Gonzalez Joint work with

How will wedesign and implement

parallel learning systems?

Page 3: Joseph Gonzalez Joint work with

Map-Reduce / HadoopBuild learning algorithms on-top of

high-level parallel abstractions

... a popular answer:

Page 4: Joseph Gonzalez Joint work with

BeliefPropagation

Label Propagation

KernelMethods

Deep BeliefNetworks

NeuralNetworks

Tensor Factorization

PageRank

Lasso

Map-Reduce for Data-Parallel MLExcellent for large data-parallel tasks!

4

Data-Parallel Graph-Parallel

CrossValidation

Feature Extraction

Map Reduce

Computing SufficientStatistics

Page 5: Joseph Gonzalez Joint work with

Example of Graph Parallelism

Page 6: Joseph Gonzalez Joint work with

PageRank ExampleIterate:

Where:α is the random reset probabilityL[j] is the number of links on page j

1 32

4 65

Page 7: Joseph Gonzalez Joint work with

Properties of Graph Parallel Algorithms

DependencyGraph

IterativeComputation

My Rank

Friends Rank

Factored Computation

Page 8: Joseph Gonzalez Joint work with

BeliefPropagation

SVM

KernelMethods

Deep BeliefNetworks

NeuralNetworks

Tensor Factorization

PageRank

Lasso

Map-Reduce for Data-Parallel MLExcellent for large data-parallel tasks!

8

Data-Parallel Graph-Parallel

CrossValidation

Feature Extraction

Map Reduce

Computing SufficientStatistics

Map Reduce?Pregel (Giraph)?

Page 9: Joseph Gonzalez Joint work with

BarrierPregel (Giraph)

Bulk Synchronous Parallel Model:

Compute Communicate

Page 10: Joseph Gonzalez Joint work with

PageRank in Giraph (Pregel)

public void compute(Iterator<DoubleWritable> msgIterator) {

double sum = 0;while (msgIterator.hasNext())

sum += msgIterator.next().get();DoubleWritable vertexValue =

new DoubleWritable(0.15 + 0.85 * sum);setVertexValue(vertexValue);if (getSuperstep() < getConf().getInt(MAX_STEPS,

-1)) {long edges = getOutEdgeMap().size();sentMsgToAllEdges(

new DoubleWritable(getVertexValue().get() / edges));

} else voteToHalt();}

Page 11: Joseph Gonzalez Joint work with

Carnegie Mellon University

Bulk synchronous computation can be inefficient.

11

Problem

Page 12: Joseph Gonzalez Joint work with

Curse of the Slow Job

Data

Data

Data

Data

Data

Data

Data

Data

Data

Data

Data

Data

Data

Data

CPU 1

CPU 2

CPU 3

CPU 1

CPU 2

CPU 3

Data

Data

Data

Data

Data

Data

Data

CPU 1

CPU 2

CPU 3

Iterations

Barr

ier

Barr

ier

Data

Data

Data

Data

Data

Data

Data

Barr

ier

Page 13: Joseph Gonzalez Joint work with

Curse of the Slow JobAssuming runtime is drawn from an exponential distribution with mean 1.

0 2 4 6 8 10 120

2

4

6

8

10

12

Number of Jobs

Runti

me

Mul

tiple

Page 14: Joseph Gonzalez Joint work with

Problem with MessagingStorage Overhead:

Requires keeping Old and New Messages [2x Overhead]Redundant messages:

PageRank: send a copy of your own rank to all neighborsO(|V|) O(|E|)

Often requires complex protocolsWhen will my neighbors need information about me?

Unable to constrain neighborhood stateHow would you implement graph coloring?

CPU

1 CPU 2

Sends the same message three times!

Page 15: Joseph Gonzalez Joint work with

Converge More Slowly

1 2 3 4 5 6 7 80

1000

2000

3000

4000

5000

6000

7000

8000

9000

Number of CPUs

Runti

me

in S

econ

ds

Optimized in Memory Bulk Synchronous

Asynchronous Splash BP

Page 16: Joseph Gonzalez Joint work with

Carnegie Mellon University

Bulk synchronous computation can be wrong!

16

Problem

Page 17: Joseph Gonzalez Joint work with

17

The problem with Bulk Synchronous Gibbs

Adjacent variables cannot be sampled simultaneously.

Strong PositiveCorrelation

t=0

Parallel Execution

t=2 t=3

Strong PositiveCorrelation

t=1

Sequential

Execution

Strong NegativeCorrelation

Page 18: Joseph Gonzalez Joint work with

BeliefPropagationSVM

KernelMethods

Deep BeliefNetworks

NeuralNetworks

Tensor Factorization

PageRank

Lasso

The Need for a New AbstractionIf not Pregel, then what?

18

Data-Parallel Graph-Parallel

CrossValidation

Feature Extraction

Map Reduce

Computing SufficientStatistics

Pregel (Giraph)

Page 19: Joseph Gonzalez Joint work with

What is GraphLab?

Page 20: Joseph Gonzalez Joint work with

The GraphLab Framework

Scheduler Consistency Model

Graph BasedData Representation

Update FunctionsUser Computation

20

Page 21: Joseph Gonzalez Joint work with

Data Graph

21

A graph with arbitrary data (C++ Objects) associated with each vertex and edge.

Vertex Data:• User profile text• Current interests estimates

Edge Data:• Similarity weights

Graph:• Social Network

Page 22: Joseph Gonzalez Joint work with

Comparison with Pregel

PregelData is associated only with vertices

GraphLabData is associated with both vertices and edges

Page 23: Joseph Gonzalez Joint work with

pagerank(i, scope){ // Get Neighborhood data (R[i], Wij, R[j]) scope;

// Update the vertex data

// Reschedule Neighbors if needed if R[i] changes then reschedule_neighbors_of(i); }

;][)1(][][

iNj

ji jRWiR

Update Functions

23

An update function is a user defined program which when applied to a vertex transforms the data in the scope of the vertex

Page 24: Joseph Gonzalez Joint work with

PageRank in GraphLab2

struct pagerank : public iupdate_functor<graph, pagerank> {

void operator()(icontext_type& context) {vertex_data& vdata =

context.vertex_data(); double sum = 0;foreach ( edge_type edge,

context.in_edges() )sum +=

1/context.num_out_edges(edge.source()) *

context.vertex_data(edge.source()).rank;double old_rank = vdata.rank;vdata.rank = RESET_PROB + (1-RESET_PROB) *

sum;double residual = abs(vdata.rank –

old_rank) /

context.num_out_edges();if (residual > EPSILON)

context.reschedule_out_neighbors(pagerank());}

};

Page 25: Joseph Gonzalez Joint work with

Comparison with Pregel

PregelData must be sent to adjacent verticesThe user code describes the movement of data as well as computation

GraphLabData is read from adjacent verticesUser code only describes the computation

Page 26: Joseph Gonzalez Joint work with

The Scheduler

26

CPU 1

CPU 2

The scheduler determines the order that vertices are updated.

e f g

kjih

dcba b

ih

a

i

b e f

j

c

Sche

dule

r

The process repeats until the scheduler is empty.

Page 27: Joseph Gonzalez Joint work with

The GraphLab Framework

Scheduler Consistency Model

Graph BasedData Representation

Update FunctionsUser Computation

27

Page 28: Joseph Gonzalez Joint work with

Ensuring Race-Free CodeHow much can computation overlap?

Page 29: Joseph Gonzalez Joint work with

GraphLab Ensures Sequential Consistency

29

For each parallel execution, there exists a sequential execution of update functions which produces the same result.

CPU 1

CPU 2

SingleCPU

Parallel

Sequential

time

Page 30: Joseph Gonzalez Joint work with

Consistency Rules

30

Guaranteed sequential consistency for all update functions

Data

Page 31: Joseph Gonzalez Joint work with

Full Consistency

31

Page 32: Joseph Gonzalez Joint work with

Obtaining More Parallelism

32

Page 33: Joseph Gonzalez Joint work with

Edge Consistency

33

CPU 1 CPU 2

Safe

Read

Page 34: Joseph Gonzalez Joint work with

Is pretty neat!

In Summary …

Page 35: Joseph Gonzalez Joint work with

02000

40006000

800010000

1200014000

1.00E-021.00E+001.00E+021.00E+041.00E+061.00E+08

GraphLabPregel

Runtime (s)

L1 E

rror

Pregel vs. GraphLabMulticore PageRank (25M Vertices, 355M Edges)

Pregel [Simulated]Synchronous ScheduleNo Skipping [Unfair updates comparison]No Combiner [Unfair runtime comparison]

0.0E+00 5.0E+08 1.0E+09 1.5E+09 2.0E+091.00E-02

1.00E+00

1.00E+02

1.00E+04

1.00E+06

1.00E+08GraphLabPregel

Updates

L1 E

rror

Page 36: Joseph Gonzalez Joint work with

Update Count Distribution

0 10 20 30 40 50 60 700

2000000

4000000

6000000

8000000

10000000

12000000

14000000

Number of Updates

Num

-Ver

tices

Most vertices need to be updated infrequently

Page 37: Joseph Gonzalez Joint work with

Bayesian Tensor Factorization

Gibbs Sampling

Dynamic Block Gibbs Sampling

MatrixFactorization

Lasso

SVM

Belief Propagation

PageRank

CoEM

K-Means

SVD

LDA

…Many others…

Page 38: Joseph Gonzalez Joint work with

Startups Using GraphLab

Companies experimenting with Graphlab

Academic projects Exploring Graphlab

1600++ Unique Downloads Tracked(possibly many more from direct repository checkouts)

Page 39: Joseph Gonzalez Joint work with

Why do we need a NEW GraphLab?

Page 40: Joseph Gonzalez Joint work with

Natural Graphs

Page 41: Joseph Gonzalez Joint work with

41

Natural Graphs Power Law

Top 1% vertices is adjacent to53% of the edges!

Yahoo! Web Graph

“Power Law”

Page 42: Joseph Gonzalez Joint work with

Problem: High Degree Vertices

High degree vertices limit parallelism:

Touch a LargeAmount of State

Requires Heavy Locking

Processed Sequentially

Page 43: Joseph Gonzalez Joint work with

High Degree Vertices are Common

Use

rs

Movies

Netflix

“Social” People Popular Movies

θZwZwZwZw

θZwZwZwZw

θZwZwZwZw

θZwZwZwZw

Hyper Parameters

Docs

Words

Freq.

Common Words

Obama

Page 44: Joseph Gonzalez Joint work with

Proposed Four Solutions

Decomposable Update FunctorsExpose greater parallelism by further factoring update functions

Commutative- Associative Update FunctorsTransition from stateless to stateful update functions

Abelian Group Caching (concurrent revisions)Allows for controllable races through diff operations

Stochastic ScopesReduce degree through sampling

Page 45: Joseph Gonzalez Joint work with

PageRank in GraphLab

struct pagerank : public iupdate_functor<graph, pagerank> {

void operator()(icontext_type& context) {vertex_data& vdata =

context.vertex_data(); double sum = 0;foreach ( edge_type edge,

context.in_edges() )sum +=

1/context.num_out_edges(edge.source()) *

context.vertex_data(edge.source()).rank;double old_rank = vdata.rank;vdata.rank = RESET_PROB + (1-RESET_PROB) *

sum;double residual = abs(vdata.rank –

old_rank) /

context.num_out_edges();if (residual > EPSILON)

context.reschedule_out_neighbors(pagerank());}

};

Page 46: Joseph Gonzalez Joint work with

PageRank in GraphLab

struct pagerank : public iupdate_functor<graph, pagerank> {

void operator()(icontext_type& context) {vertex_data& vdata =

context.vertex_data(); double sum = 0;foreach ( edge_type edge,

context.in_edges() )sum +=

1/context.num_out_edges(edge.source()) *

context.vertex_data(edge.source()).rank;double old_rank = vdata.rank;vdata.rank = RESET_PROB + (1-RESET_PROB) *

sum;double residual = abs(vdata.rank –

old_rank) /

context.num_out_edges();if (residual > EPSILON)

context.reschedule_out_neighbors(pagerank());}

};

Atomic Single Vertex Apply

Parallel Scatter [Reschedule]

Parallel “Sum” Gather

Page 47: Joseph Gonzalez Joint work with

Decomposable Update Functors

Decompose update functions into 3 phases:

Locks are acquired only for region within a scope Relaxed Consistency

+ + … + Δ

Y YY

ParallelSum

User Defined:Gather( ) ΔY

Δ1 + Δ2 Δ3

Y Scope

Gather

Y

YApply( , Δ) Y

Apply the accumulated value to center vertexUser Defined:

Apply

Y

Scatter( )

Update adjacent edgesand vertices.

User Defined:Y

Scatter

Page 48: Joseph Gonzalez Joint work with

Factorized PageRankstruct pagerank : public iupdate_functor<graph, pagerank> { double accum = 0, residual = 0;

void gather(icontext_type& context, const edge_type& edge) {

accum += 1/context.num_out_edges(edge.source()) *

context.vertex_data(edge.source()).rank;}void merge(const pagerank& other) { accum +=

other.accum; }void apply(icontext_type& context) {

vertex_data& vdata = context.vertex_data();double old_value = vdata.rank;vdata.rank = RESET_PROB + (1 - RESET_PROB)

* accum; residual = fabs(vdata.rank – old_value) /

context.num_out_edges();}void scatter(icontext_type& context, const

edge_type& edge) {if (residual > EPSILON)

context.schedule(edge.target(), pagerank());

}};

Page 49: Joseph Gonzalez Joint work with

Y

Split computation across machines:

Decomposable Execution Model

( o )( )Y

YYF1 F2

YY

Page 50: Joseph Gonzalez Joint work with

Weaker ConsistencyNeighboring vertices maybe be updated simultaneously:

A B

CGather

Gather Gather

Apply

Page 51: Joseph Gonzalez Joint work with

Other Decomposable AlgorithmsLoopy Belief Propagation

Gather: Accumulates product (log sum) of in messagesApply: Updates central beliefScatter: Computes out messages and schedules adjacent vertices

Alternating Least Squares (ALS)

y1

y2

y3

y4

w1

w2

x1

x2

x3Use

r Fac

tors

(W)

Movie Factors (X)

Use

rs MoviesNetflix

Use

rs

≈x

Movies

Page 52: Joseph Gonzalez Joint work with

Convergent Gibbs SamplingCannot be done:

A B

CGather

Gather Gather

Unsafe

Page 53: Joseph Gonzalez Joint work with

Decomposable FunctorsFits many algorithms

Loopy Belief Propagation, Label Propagation, PageRank…

Addresses the earlier concerns

Problem: Does not exploit asynchrony at the vertex level.

Large State

DistributedGather and Scatter

Heavy Locking

Fine GrainedLocking

Sequential

ParallelGather and Scatter

Page 54: Joseph Gonzalez Joint work with

Need for Vertex Level Asynchrony

Exploit commutative associative “sum”

Y

+ + + + + Y

Costly gather for a single change!

Page 55: Joseph Gonzalez Joint work with

Need for Vertex Level Asynchrony

Exploit commutative associative “sum”

Y

+ + + + + Y

Page 56: Joseph Gonzalez Joint work with

Need for Vertex Level Asynchrony

Exploit commutative associative “sum”

Y

+ + + + + + Δ Y

Page 57: Joseph Gonzalez Joint work with

Need for Vertex Level Asynchrony

Exploit commutative associative “sum”

Y

+ + + + + + Δ YOld (Cached) Sum

Page 58: Joseph Gonzalez Joint work with

Need for Vertex Level Asynchrony

Exploit commutative associative “sum”

Y

+ + + + + + Δ YOld (Cached) Sum

Δ Δ Δ Δ

Page 59: Joseph Gonzalez Joint work with

Commutative-Associative Updatestruct pagerank : public iupdate_functor<graph, pagerank> {

double delta;pagerank(double d) : delta(d) { }void operator+=(pagerank& other) { delta +=

other.delta; }void operator()(icontext_type& context) {

vertex_data& vdata = context.vertex_data();

vdata.rank += delta;if(abs(delta) > EPSILON) {

double out_delta = delta * (1 – RESET_PROB) *

1/context.num_out_edges(edge.source());

context.schedule_out_neighbors(pagerank(out_delta));}

}};// Initial Rank: R[i] = 0;// Initial Schedule: pagerank(RESET_PROB);

Page 60: Joseph Gonzalez Joint work with

Scheduling Composes UpdatesCalling reschedule neighbors forces update function composition:

pagerank(3) Pending: pagerank(7)

reschedule_out_neighbors(pagerank(3))pagerank(3)

Pending: pagerank(3)

Pending: pagerank(10)

Page 61: Joseph Gonzalez Joint work with

Experimental Comparison

Page 62: Joseph Gonzalez Joint work with

Comparison of Abstractions:Multicore PageRank (25M Vertices, 355M Edges)

0 1000 2000 3000 4000 5000 60001.00E-021.00E-011.00E+001.00E+011.00E+021.00E+031.00E+041.00E+051.00E+061.00E+071.00E+08

GraphLab1FactorizedDelta

Runtime (s)

L1 E

rror

Page 63: Joseph Gonzalez Joint work with

Comparison of Abstractions:Distributed PageRank (25M Vertices, 355M Edges)

2 3 4 5 6 7 80

50

100

150

200

250

300

350

400GL 1 (Chromatic)GL 2 Delta (Asynchronous)

# Machines (8 CPUs per Machine)

Runti

me

(s)

2 3 4 5 6 7 80

5

10

15

20

25

30

35GL 1 (Chromatic)GL 2 Delta (Asynchronous)

# Machines (8 CPUs per Machine)

Tota

l Com

mun

icati

on (G

B)

Page 64: Joseph Gonzalez Joint work with

PageRank on the Web circa 2000Invented Comparison:

GraphLab2 (512)

Pegasus (800)

Priter (200)

Dryad (960)

0

500

1000

1500

2000

2500

3000

3500Ru

ntim

e Se

cond

s

Page 65: Joseph Gonzalez Joint work with

Ongoing work

Extending all of GraphLab2 to the distributed settingImplemented push based engines (chromatic)Need to build GraphLab2 distributed locking engine

Improving storage efficiency of the distributed data-graphPorting large set of Danny’s applications

Page 66: Joseph Gonzalez Joint work with

Questionshttp://graphlab.org