GraphLab: A New Framework for Parallel Machine Learning
Yucheng Low, Aapo Kyrola, Carlos Guestrin, Joseph Gonzalez, Danny Bickson, Joe Hellerstein (Carnegie Mellon)


Page 1: GraphLab A New Framework for  Parallel Machine Learning

Carnegie Mellon

GraphLab: A New Framework for Parallel Machine Learning

Yucheng Low, Aapo Kyrola, Carlos Guestrin, Joseph Gonzalez, Danny Bickson, Joe Hellerstein

Page 2: GraphLab A New Framework for  Parallel Machine Learning

2

Exponential Parallelism

[Plot: Processor Speed (GHz) vs. Release Date, 1988 to 2010: exponentially increasing sequential performance flattens into constant sequential performance, while parallel performance keeps increasing exponentially]

13 Million Wikipedia Pages
3.6 Billion photos on Flickr

Page 3: GraphLab A New Framework for  Parallel Machine Learning

Parallel Programming is Hard

Designing efficient Parallel Algorithms is hard

Race conditions and deadlocks
Parallel memory bottlenecks
Architecture-specific concurrency
Difficult to debug

ML experts repeatedly address the same parallel design challenges

3

Avoid these problems by using high-level abstractions.

Graduate students

Page 4: GraphLab A New Framework for  Parallel Machine Learning

CPU 1 CPU 2 CPU 3 CPU 4

MapReduce – Map Phase

4

Embarrassingly Parallel independent computation

[Figure: CPU 1 through CPU 4 each compute an independent value]

No Communication needed

Page 5: GraphLab A New Framework for  Parallel Machine Learning

CPU 1 CPU 2 CPU 3 CPU 4

MapReduce – Map Phase

5

Embarrassingly Parallel independent computation

[Figure: the CPUs compute a second batch of independent values]

No Communication needed

Page 6: GraphLab A New Framework for  Parallel Machine Learning

CPU 1 CPU 2 CPU 3 CPU 4

MapReduce – Map Phase

6

Embarrassingly Parallel independent computation

[Figure: a third batch of values computed independently across the four CPUs]

No Communication needed

Page 7: GraphLab A New Framework for  Parallel Machine Learning

CPU 1 CPU 2

MapReduce – Reduce Phase

7

[Figure: CPU 1 and CPU 2 aggregate the mapped values into partial results]

Fold/Aggregation
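To make the pattern on these slides concrete, here is a minimal C++ sketch (not GraphLab code) of MapReduce on a shared-memory machine: an embarrassingly parallel map phase with no communication, followed by a fold/aggregation in the reduce phase. The squaring map function and the sample values are purely illustrative.

#include <future>
#include <iostream>
#include <numeric>
#include <vector>

// Independent per-element work: no communication with other elements.
double map_fn(double x) { return x * x; }

int main() {
  std::vector<double> data = {12.9, 42.3, 21.3, 25.8};

  // Map phase: each element may run on a different CPU.
  std::vector<std::future<double>> futures;
  for (double x : data)
    futures.push_back(std::async(std::launch::async, map_fn, x));

  std::vector<double> mapped;
  for (auto& f : futures) mapped.push_back(f.get());

  // Reduce phase: fold/aggregate the mapped values into one result.
  double total = std::accumulate(mapped.begin(), mapped.end(), 0.0);
  std::cout << "sum = " << total << "\n";
  return 0;
}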

Page 8: GraphLab A New Framework for  Parallel Machine Learning

Related Data

8

Interdependent Computation: Not MapReduceable

Page 9: GraphLab A New Framework for  Parallel Machine Learning

Parallel Computing and ML

Not all algorithms are efficiently data parallel

9

Data-Parallel: Cross Validation, Feature Extraction

Complex Parallel Structure: Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, Sampling, Lasso

Page 10: GraphLab A New Framework for  Parallel Machine Learning

Common Properties

10

1) Sparse Data Dependencies

2) Local Computations

3) Iterative Updates

• Sparse Primal SVM
• Tensor/Matrix Factorization
• Expectation Maximization
• Optimization
• Sampling
• Belief Propagation

[Figure: Operation A and Operation B applied as local computations over sparse data dependencies]

Page 11: GraphLab A New Framework for  Parallel Machine Learning

Gibbs Sampling

11

[Figure: a Markov random field over variables X1 through X9]

1) Sparse Data Dependencies

2) Local Computations

3) Iterative Updates

Page 12: GraphLab A New Framework for  Parallel Machine Learning

GraphLab is the Solution
Designed specifically for ML needs:
Express data dependencies
Iterative

Simplifies the design of parallel programs:
Abstract away hardware issues
Automatic data synchronization
Addresses multiple hardware architectures

Implementation here is multi-core
Distributed implementation in progress

12

Page 13: GraphLab A New Framework for  Parallel Machine Learning

Carnegie Mellon

GraphLab: A New Framework for Parallel Machine Learning

Page 14: GraphLab A New Framework for  Parallel Machine Learning

GraphLab

14

GraphLab Model: Data Graph, Shared Data Table, Scheduling, Update Functions and Scopes

Page 15: GraphLab A New Framework for  Parallel Machine Learning

Data Graph

15

A Graph with data associated with every vertex and edge.

Vertex data, e.g. x3: sample value, C(X3): sample counts
Edge data, e.g. Φ(X6, X9): binary potential

[Figure: a graph over variables X1 through X11 with data attached to every vertex and edge]

Page 16: GraphLab A New Framework for  Parallel Machine Learning

Update Functions

16

Update functions are operations which are applied to a vertex and transform the data in the scope of that vertex

Gibbs Update:
- Read samples on adjacent vertices
- Read edge potentials
- Compute a new sample for the current vertex
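A sketch of the Gibbs update just described, reusing the vertex_data and edge_data structs from the data-graph sketch above. The scope struct here is an illustrative stand-in; the real GraphLab scope and scheduler types differ.

#include <cstdlib>
#include <vector>

// Illustrative stand-in for the scope handed to an update function.
struct scope {
  vertex_data* center;                        // data of the vertex being updated
  std::vector<const vertex_data*> neighbors;  // data of adjacent vertices
  std::vector<const edge_data*> edges;        // edges[i] connects center to neighbors[i]
};

void gibbs_update(scope& s) {
  const size_t k = s.center->counts.size();   // number of possible assignments
  std::vector<double> cond(k, 1.0);

  // Read samples on adjacent vertices and the edge potentials.
  for (size_t i = 0; i < s.neighbors.size(); ++i)
    for (size_t x = 0; x < k; ++x)
      cond[x] *= s.edges[i]->potential[x * k + s.neighbors[i]->sample];

  // Sample a new value for the current vertex from the conditional.
  double z = 0.0;
  for (double p : cond) z += p;
  const double u = z * std::rand() / RAND_MAX;
  double acc = 0.0;
  for (size_t x = 0; x < k; ++x) {
    acc += cond[x];
    if (u <= acc) { s.center->sample = static_cast<int>(x); break; }
  }
  s.center->counts[s.center->sample] += 1;
}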

Page 17: GraphLab A New Framework for  Parallel Machine Learning

Update Function Schedule

17

[Figure: a shared schedule of vertex update tasks (a, h, a, i, b, d) consumed by CPU 1 and CPU 2 over a graph with vertices a through k]

Page 18: GraphLab A New Framework for  Parallel Machine Learning

Update Function Schedule

18

[Figure: CPU 1 and CPU 2 take the next update tasks (a, i, b, d) from the schedule]

Page 19: GraphLab A New Framework for  Parallel Machine Learning

Static Schedule
The scheduler determines the order of update function evaluations

19

Synchronous Schedule: Every vertex updated simultaneously

Round Robin Schedule: Every vertex updated sequentially
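To contrast the two static schedules, here is a minimal sketch (not GraphLab code) on a toy chain graph where each vertex repeatedly averages with its neighbors: the synchronous schedule reads only the previous iteration's values, while the round-robin schedule updates vertices in place so later updates see earlier ones.

#include <vector>

using values = std::vector<double>;

// The "update function": average a vertex with its chain neighbors.
double local_avg(const values& x, size_t v) {
  double sum = x[v];
  int cnt = 1;
  if (v > 0)            { sum += x[v - 1]; ++cnt; }
  if (v + 1 < x.size()) { sum += x[v + 1]; ++cnt; }
  return sum / cnt;
}

// Synchronous schedule: every vertex updated "simultaneously",
// reading only the previous iteration's values.
void synchronous_iteration(values& x) {
  values next(x.size());
  for (size_t v = 0; v < x.size(); ++v) next[v] = local_avg(x, v);
  x.swap(next);
}

// Round robin schedule: vertices updated sequentially in place,
// so later updates immediately see earlier ones.
void round_robin_iteration(values& x) {
  for (size_t v = 0; v < x.size(); ++v) x[v] = local_avg(x, v);
}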

Page 20: GraphLab A New Framework for  Parallel Machine Learning

Need for Dynamic Scheduling

[Figure: one region of the graph has converged while another is still slowly converging; dynamic scheduling focuses effort where it is needed]

20

Page 21: GraphLab A New Framework for  Parallel Machine Learning

Dynamic Schedule

21

[Figure: as CPU 1 and CPU 2 execute tasks from the shared schedule, the update functions insert new tasks into it]

Page 22: GraphLab A New Framework for  Parallel Machine Learning

Dynamic Schedule
Update functions can insert new tasks into the schedule

22

FIFO Queue: Wildfire BP [Selvatici et al.]
Priority Queue: Residual BP [Elidan et al.]
Splash Schedule: Splash BP [Gonzalez et al.]

Obtain different algorithms simply by changing a flag!

--scheduler=fifo --scheduler=priority --scheduler=splash
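A minimal sketch (not GraphLab code) of the dynamic, priority-queue scheduling idea behind residual-style schedules: tasks carry a priority, and an update whose value changes a lot re-queues its neighbors. The local-averaging "update" is only a stand-in for BP or Gibbs work.

#include <cmath>
#include <queue>
#include <vector>

struct task { int vertex; double priority; };
struct by_priority {
  bool operator()(const task& a, const task& b) const { return a.priority < b.priority; }
};

// adj is an adjacency list; value holds the per-vertex data.
void run_priority_schedule(std::vector<double>& value,
                           const std::vector<std::vector<int>>& adj,
                           double tolerance) {
  std::priority_queue<task, std::vector<task>, by_priority> queue;
  for (int v = 0; v < static_cast<int>(value.size()); ++v) queue.push({v, 1.0});

  while (!queue.empty()) {
    const task t = queue.top();
    queue.pop();

    // Stand-in update: average the vertex with its neighbors.
    double sum = value[t.vertex];
    int cnt = 1;
    for (int u : adj[t.vertex]) { sum += value[u]; ++cnt; }
    const double updated = sum / cnt;
    const double residual = std::fabs(updated - value[t.vertex]);
    value[t.vertex] = updated;

    // Dynamic scheduling: a large change re-queues the neighbors with
    // priority equal to the residual, as in residual-style schedules.
    if (residual > tolerance)
      for (int u : adj[t.vertex]) queue.push({u, residual});
  }
}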

Page 23: GraphLab A New Framework for  Parallel Machine Learning

Global Information

What if we need global information?

23

Sum of all the vertices?

Algorithm Parameters?
Sufficient Statistics?

Page 24: GraphLab A New Framework for  Parallel Machine Learning

Shared Data Table (SDT)
Global constant parameters

Constant: Total # Samples

Constant: Temperature

Page 25: GraphLab A New Framework for  Parallel Machine Learning

Sync Operation
Sync is a fold/reduce operation over the graph

Accumulate Function: Add
Apply Function: Divide by |V|

[Figure: Sync sweeps the graph, folding each vertex value into a running sum, then applies the final division]

Accumulate performs an aggregation over the vertices
Apply makes a final modification to the accumulated data
Example: Compute the average of all the vertices
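A minimal sketch of the average example above, with illustrative function names rather than the exact GraphLab Sync API: accumulate folds every vertex into a running sum, apply divides by |V|, and the result would be written back into the Shared Data Table.

#include <cstddef>
#include <vector>

// Accumulate: fold one vertex value into the running sum.
double accumulate_add(double acc, double vertex_value) {
  return acc + vertex_value;
}

// Apply: final modification to the accumulated data (divide by |V|).
double apply_divide(double acc, std::size_t num_vertices) {
  return acc / static_cast<double>(num_vertices);
}

// The Sync operation itself: a fold/reduce over all vertices, with the
// result conceptually stored in the Shared Data Table.
double sync_average(const std::vector<double>& vertex_values) {
  double acc = 0.0;                                // initial accumulator value
  for (double v : vertex_values) acc = accumulate_add(acc, v);
  return apply_divide(acc, vertex_values.size());
}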

Page 26: GraphLab A New Framework for  Parallel Machine Learning

Shared Data Table (SDT)
Global constant parameters
Global computation (Sync Operation)

26

Constant: Total # Samples
Sync: Sample Statistics
Sync: Log-likelihood
Constant: Temperature

Page 27: GraphLab A New Framework for  Parallel Machine Learning

Carnegie Mellon

Safety and Consistency

27

Page 28: GraphLab A New Framework for  Parallel Machine Learning

Write-Write Race

28

Write-Write Race: if adjacent update functions write simultaneously

[Figure: the left update and the right update both write the shared value; the final value depends on which write lands last]
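For concreteness, a minimal C++ sketch (not GraphLab code) of the write-write race in the figure: two threads standing in for adjacent update functions write the same shared value, and the written constants here are purely illustrative.

#include <iostream>
#include <thread>

double shared_value = 0.0;   // data shared by two adjacent update functions

int main() {
  std::thread left ([] { shared_value = 27.3; });   // "left update" writes
  std::thread right([] { shared_value = 42.1; });   // "right update" writes
  left.join();
  right.join();
  // Which write "wins" depends on timing, and the unsynchronized access is
  // itself a data race (undefined behavior). GraphLab's scope rules avoid
  // this by never running conflicting adjacent updates at the same time.
  std::cout << shared_value << "\n";
  return 0;
}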

Page 29: GraphLab A New Framework for  Parallel Machine Learning

Race Conditions + Deadlocks

Just one of the many possible races
Race-free code is extremely difficult to write

29

GraphLab design ensures race-free operation

Page 30: GraphLab A New Framework for  Parallel Machine Learning

Scope Rules

30

Guaranteed safety for all update functions

Page 31: GraphLab A New Framework for  Parallel Machine Learning

Full Consistency

31

Only allow update functions two vertices apart to be run in parallel
Reduced opportunities for parallelism

Page 32: GraphLab A New Framework for  Parallel Machine Learning

Obtaining More Parallelism

32

Not all update functions will modify the entire scope!

Belief Propagation: only uses edge data
Gibbs Sampling: only needs to read adjacent vertices

Page 33: GraphLab A New Framework for  Parallel Machine Learning

Edge Consistency

33

Page 34: GraphLab A New Framework for  Parallel Machine Learning

Obtaining More Parallelism

34

“Map” operations. Feature extraction on vertex data

Page 35: GraphLab A New Framework for  Parallel Machine Learning

Vertex Consistency

35
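Summarizing the three scope models on these slides, here is an illustrative sketch of what each one protects before an update runs on a vertex v. The types and lock functions are placeholders, not the GraphLab implementation; adjacent edge data (writable under full and edge scope) is omitted for brevity.

#include <vector>

using vertex_t = int;
std::vector<vertex_t> neighbors_of(vertex_t) { return {}; }  // placeholder adjacency lookup
void write_lock(vertex_t) {}                                  // placeholder lock operations
void read_lock(vertex_t) {}

enum class consistency_model { full, edge, vertex };

void acquire_scope(consistency_model m, vertex_t v) {
  switch (m) {
    case consistency_model::full:     // v, its edges, and its neighbors are writable
      write_lock(v);
      for (vertex_t u : neighbors_of(v)) write_lock(u);
      break;
    case consistency_model::edge:     // v and its edges writable, neighbors only readable
      write_lock(v);
      for (vertex_t u : neighbors_of(v)) read_lock(u);
      break;
    case consistency_model::vertex:   // only v's own data is protected
      write_lock(v);
      break;
  }
}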

Page 36: GraphLab A New Framework for  Parallel Machine Learning

Sequential Consistency
GraphLab guarantees sequential consistency

36

For every parallel execution, there exists a sequential execution of update functions which will produce the same result.

[Figure: a parallel execution of update functions on CPU 1 and CPU 2 over time, next to an equivalent sequential execution on a single CPU]

Page 37: GraphLab A New Framework for  Parallel Machine Learning

GraphLab

37

GraphLab Model: Data Graph, Shared Data Table, Scheduling, Update Functions and Scopes

Page 38: GraphLab A New Framework for  Parallel Machine Learning

Carnegie Mellon

Experiments

38

Page 39: GraphLab A New Framework for  Parallel Machine Learning

Experiments
Shared memory implementation in C++ using Pthreads
Tested on a 16 processor machine: 4x Quad Core AMD Opteron 8384, 64 GB RAM

Belief Propagation + Parameter Learning
Gibbs Sampling
CoEM
Lasso
Compressed Sensing
SVM
PageRank
Tensor Factorization

Page 40: GraphLab A New Framework for  Parallel Machine Learning

Graphical Model Learning

40

3D retinal image denoising

Data Graph: 256x64x64 vertices

Update Function: Belief Propagation

Sync: Edge-potential
Acc: Compute inference statistics
Apply: Take a gradient step

Page 41: GraphLab A New Framework for  Parallel Machine Learning

Graphical Model Learning

41

[Plot: Speedup vs. Number of CPUs (up to 16) for the Approx. Priority Schedule and the Splash Schedule, compared against optimal: 15.5x speedup on 16 CPUs]

Page 42: GraphLab A New Framework for  Parallel Machine Learning

Graphical Model Learning

42

Standard parameter learning: take a gradient step only after inference is complete (iterated inference and gradient steps)

With GraphLab: take gradient steps while inference is running (parallel inference + gradient steps)

[Figure: runtime comparison, Iterated 2100 sec vs. Simultaneous 700 sec: 3x faster!]

Page 43: GraphLab A New Framework for  Parallel Machine Learning

Gibbs Sampling
Two methods for sequential consistency:

Scopes: Edge Scope
graphlab(gibbs, edge, sweep);

Scheduling: Graph Coloring
[Figure: vertices of each color updated in parallel across CPU 1, CPU 2, and CPU 3 at time steps t0 through t3]
graphlab(gibbs, vertex, colored);
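A minimal sketch of the graph-coloring scheduling idea: greedily color the graph so adjacent vertices never share a color, then sample all vertices of one color in parallel before moving to the next. This is a generic greedy coloring, not the GraphLab scheduler itself.

#include <set>
#include <vector>

// Greedily color an undirected graph (adjacency-list form) so that adjacent
// vertices never share a color.
std::vector<int> greedy_color(const std::vector<std::vector<int>>& adj) {
  std::vector<int> color(adj.size(), -1);
  for (size_t v = 0; v < adj.size(); ++v) {
    std::set<int> used;
    for (int u : adj[v])
      if (color[u] >= 0) used.insert(color[u]);   // colors already taken by neighbors
    int c = 0;
    while (used.count(c)) ++c;                    // smallest free color
    color[v] = c;
  }
  return color;
}
// The colored schedule then sweeps color 0, then color 1, and so on, updating
// every vertex of the current color in parallel (one barrier per color), so no
// two adjacent Gibbs updates ever run at the same time.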

Page 44: GraphLab A New Framework for  Parallel Machine Learning

Gibbs Sampling
Protein-protein interaction networks [Elidan et al. 2006]
Pairwise MRF: 14K vertices, 100K edges

10x speedup
Scheduling reduces locking overhead

44

[Plot: Speedup vs. Number of CPUs (up to 16) for the Round Robin Schedule and the Colored Schedule, compared against optimal]

Page 45: GraphLab A New Framework for  Parallel Machine Learning

CoEM (Rosie Jones, 2005)
Named Entity Recognition Task
Is “Dog” an animal? Is “Catalina” a place?

[Figure: a bipartite graph linking noun phrases (the dog, Australia, Catalina Island) to contexts (<X> ran quickly, travelled to <X>, <X> is pleasant)]

Graph sizes:
Small: 0.2M vertices, 20M edges
Large: 2M vertices, 200M edges

Hadoop: 95 cores, 7.5 hrs

Page 46: GraphLab A New Framework for  Parallel Machine Learning

CoEM (Rosie Jones, 2005)

[Plot: Speedup vs. Number of CPUs (up to 16) for the Small and Large datasets, compared against optimal]

GraphLab: 16 cores, 30 min
Hadoop: 95 cores, 7.5 hrs
15x faster! 6x fewer CPUs!

Page 47: GraphLab A New Framework for  Parallel Machine Learning

Lasso

47

L1 regularized Linear Regression

Shooting Algorithm (Coordinate Descent)
Due to the properties of the update, full consistency is needed

Page 48: GraphLab A New Framework for  Parallel Machine Learning

Lasso

48

L1 regularized Linear Regression

Shooting Algorithm (Coordinate Descent)
Due to the properties of the update, full consistency is needed

Page 49: GraphLab A New Framework for  Parallel Machine Learning

Lasso

49

L1 regularized Linear Regression

Shooting Algorithm (Coordinate Descent)
Due to the properties of the update, full consistency is needed

Finance Dataset from Kogan et al [2009].
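For reference, a minimal sequential sketch of the shooting algorithm (cyclic coordinate descent for L1-regularized least squares); this is not the GraphLab implementation. Each coordinate update reads the full residual, which every other coordinate affects, which is why the slide notes that full consistency is needed when the updates run in parallel.

#include <cmath>
#include <vector>

double soft_threshold(double c, double lambda) {
  if (c > lambda)  return c - lambda;
  if (c < -lambda) return c + lambda;
  return 0.0;
}

// Cyclic coordinate descent for  min_w ||Xw - y||^2 + lambda * ||w||_1.
// X is n x d (row-major), y has length n; returns the weight vector w.
std::vector<double> shooting(const std::vector<std::vector<double>>& X,
                             const std::vector<double>& y,
                             double lambda, int sweeps) {
  const size_t n = X.size(), d = X[0].size();
  std::vector<double> w(d, 0.0);
  std::vector<double> r(y);                        // residual r = y - Xw (w starts at 0)
  for (int s = 0; s < sweeps; ++s) {
    for (size_t j = 0; j < d; ++j) {
      double a = 0.0, c = 0.0;
      for (size_t i = 0; i < n; ++i) {
        a += X[i][j] * X[i][j];
        c += X[i][j] * (r[i] + X[i][j] * w[j]);    // correlation with coordinate j removed from r
      }
      const double w_new = (a > 0.0) ? soft_threshold(c, lambda / 2.0) / a : 0.0;
      for (size_t i = 0; i < n; ++i)               // incremental residual update
        r[i] += X[i][j] * (w[j] - w_new);
      w[j] = w_new;
    }
  }
  return w;
}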

Page 50: GraphLab A New Framework for  Parallel Machine Learning

Full Consistency

50

[Plot: Speedup vs. Number of CPUs (up to 16) under full consistency for the Dense and Sparse datasets, compared against optimal]

Page 51: GraphLab A New Framework for  Parallel Machine Learning

Relaxing Consistency

Why does this work? (Open Question)

[Plot: Speedup vs. Number of CPUs (up to 16) with relaxed consistency for the Dense and Sparse datasets, compared against optimal]

Page 52: GraphLab A New Framework for  Parallel Machine Learning

GraphLab
An abstraction tailored to Machine Learning. Provides a parallel framework which compactly expresses:
Data/computational dependencies
Iterative computation

Achieves state-of-the-art parallel performance on a variety of problems
Easy to use

52

Page 53: GraphLab A New Framework for  Parallel Machine Learning

Future Work

Distributed GraphLab
Load Balancing
Minimize Communication
Latency Hiding
Distributed Data Consistency
Distributed Scalability

GPU GraphLab
Memory bus bottleneck
Warp alignment

State-of-the-art performance for <Your Algorithm Here>.

53

Page 54: GraphLab A New Framework for  Parallel Machine Learning

Carnegie Mellon

Parallel GraphLab 1.0

Available Today

http://graphlab.ml.cmu.edu

54

Documentation… Code… Tutorials…