Ciel universal distributed execution engine

DESCRIPTION

Presentation of the CIEL framework as described in the paper: "CIEL: a universal execution engine for distributed data-flow computing"

Motivation, CIEL, Skywriting, Optimizations & Fault Tolerance, Evaluation & Future work

CIEL universal distributed execution engine

Presenter: Emmanouil Dimogerontakis @{AdvDS}

EMDC KTH

November 6, 2012

Presenter: Emmanouil Dimogerontakis @{AdvDS} CIEL universal distributed execution engine 1/23

1 Motivation: Distributed Execution Engines

2 CIEL: Dynamic Task Graphs, Architecture

3 Skywriting

4 Optimizations & Fault Tolerance

5 Evaluation & Future work: Evaluation, Future Work, Conclusions

Purpose

Execute a Task Graph providing:

Task Scheduling

Data Distribution

Load Balancing

Transparent Fault Tolerance

Figure: Task Graph

Limitations

Task graphs used up to now:

Static

Acyclic

Limitations:

Limited Expressive Power

Poor Performance

Insufficient Fault Tolerance

Overview

Figure: Distributed Execution Engines comparison

CIEL

Why universal?

It supports the same class of algorithms as a Turing machine (TM).

How?

By using dynamic task graphs.

CIEL primitives

objects

references

tasks

Figure: A Task Graph

Source: http://www.cl.cam.ac.uk/~dgm36/CIEL-NSDI-slides.pdf
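The three primitives can be pictured with a small Python sketch. The class and field names below (`Reference`, `Task`, `is_concrete`, `runnable`, `object_store`) are illustrative assumptions, not CIEL's actual data structures: an object is modelled as plain bytes, a reference names an object and may or may not yet have a known location, and a task becomes runnable once all of its dependencies exist somewhere.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    """A name for an object, plus the locations (if any) that hold it."""
    name: str
    locations: set = field(default_factory=set)

    def is_concrete(self) -> bool:
        # A reference with no known location behaves like a future.
        return bool(self.locations)


@dataclass
class Task:
    """A unit of computation that depends on references and names its outputs."""
    task_id: str
    dependencies: list
    outputs: list

    def runnable(self) -> bool:
        # Runnable only when every input object already exists somewhere.
        return all(ref.is_concrete() for ref in self.dependencies)


# Objects are modelled here as plain bytes stored under their name (assumption).
object_store = {"input-0": b"some unstructured data"}

a = Reference("input-0", {"worker-1"})
b = Reference("partial-result")            # not produced yet
t = Task("t1", [a, b], [Reference("out-1")])

print(t.runnable())   # False: b has no location yet
b.locations.add("worker-2")
print(t.runnable())   # True: both dependencies are now concrete
```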

Dynamic Task Graphs

Figure: A Dynamic Task Graph
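To make the idea of a graph that grows at run time concrete, here is a minimal single-process sketch in Python. It is not CIEL code: the `run` loop and `make_iteration` helper are hypothetical names, and the queue stands in for a real scheduler. Each task either publishes a result or spawns a successor, so the number of tasks depends on the data (here, on when Newton's method for the square root of 2 converges).

```python
from collections import deque


def run(initial_task):
    """Run tasks from a queue; each task may append further tasks to it."""
    queue = deque([initial_task])
    results = {}
    executed = 0
    while queue:
        task = queue.popleft()
        task(queue, results)   # a task may publish a result or spawn a successor
        executed += 1
    return results, executed


def make_iteration(x):
    """Build a task that performs one Newton step towards sqrt(2)."""
    def task(queue, results):
        new_x = 0.5 * (x + 2.0 / x)
        if abs(new_x - x) < 1e-12:
            results["root"] = new_x                # converged: publish the output
        else:
            queue.append(make_iteration(new_x))    # not converged: spawn next step
    return task


results, executed = run(make_iteration(1.0))
print(results["root"], executed)
```

A static, acyclic graph would have to fix the number of iterations in advance; here the loop ends only when the convergence test passes.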

Master & Worker

Figure: CIEL Master

Figure: CIEL Worker

Architecture

Figure: CIEL Architecture

Creating Tasks with Skywriting

Figure: Spawning a new task

Figure: Dereferencing futures
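The two operations named in the figures, spawning a task and dereferencing a future, have a rough analogue in Python's standard `concurrent.futures` module. The sketch below is plain Python, not Skywriting syntax: `submit` plays the role of spawn, `result()` plays the role of dereferencing, and the `square` function is a made-up example task. The sketch simply blocks the calling thread while waiting and does not model how CIEL schedules around unfinished futures.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """A trivial task used only for illustration."""
    return n * n


with ThreadPoolExecutor(max_workers=4) as pool:
    # "Spawn": submit returns immediately with a future for each result.
    futures = [pool.submit(square, n) for n in range(5)]
    # "Dereference": wait for each future and read its value.
    total = sum(f.result() for f in futures)

print(total)  # 30
```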

Optimizations

Globally unique identifiers enable memoization

Streaming partially written objects between tasks
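The memoization point can be illustrated with a short Python sketch. It assumes that an output's identifier is derived deterministically from what the task runs and what it reads, so that a second request for the same work finds the result already stored. The names `output_name`, `run_task`, `store`, and `stats` are hypothetical, and the SHA-256 naming scheme is an assumption chosen for the example.

```python
import hashlib
import json

store = {}                  # object name -> value (stands in for the object store)
stats = {"executed": 0}     # counts how many tasks actually ran


def output_name(func_name, args):
    """Derive a deterministic name from the task's identity and its inputs."""
    payload = json.dumps([func_name, args], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_task(func, args):
    """Run func(*args) unless an output with the same name already exists."""
    name = output_name(func.__name__, args)
    if name in store:       # output already exists: skip re-execution
        return store[name]
    stats["executed"] += 1
    store[name] = func(*args)
    return store[name]


def add(a, b):
    return a + b


first = run_task(add, [2, 3])
second = run_task(add, [2, 3])   # served from the store, not recomputed
print(first, second, stats["executed"])  # 5 5 1
```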

Fault Tolerance

Client (no driver program)

Worker (periodic heartbeat)

Master (persistent logging, secondary masters, object table reconstruction)
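Worker failure detection via periodic heartbeats can be sketched as follows. This is a minimal illustration, not CIEL's implementation: the `HeartbeatMonitor` class, the 10-second timeout, and the worker names are all assumptions. Timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per worker and reports the ones that went silent."""

    def __init__(self, timeout):
        self.timeout = timeout      # seconds of silence before a worker is suspected
        self.last_seen = {}         # worker name -> time of last heartbeat

    def heartbeat(self, worker, now):
        # Record that the worker was alive at time `now`.
        self.last_seen[worker] = now

    def failed_workers(self, now):
        # A worker is considered failed if its last heartbeat is older than the timeout.
        return sorted(w for w, t in self.last_seen.items() if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=10.0)
monitor.heartbeat("worker-1", now=0.0)
monitor.heartbeat("worker-2", now=0.0)
monitor.heartbeat("worker-1", now=8.0)    # worker-2 stays silent

print(monitor.failed_workers(now=12.0))   # ['worker-2']
```

A master that detects a silent worker in this way could then hand that worker's unfinished tasks to another machine.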

Performance Comparison with a Production System

Figure: Distributed grep on Hadoop and CIEL

Performance of an Iterative Algorithm

Figure: K-means on Hadoop and CIEL with 20 workers

Overheads

Figure: Speedup of Binomial Options Pricing Model on 47 workers

Future Work

Integrate CIEL with existing programming languages

Partition master state

Explore use of multiple cores (see [5])

Explore use of non-deterministic parallelism (see [3])

Conclusions

CIEL [4, 1] and Skywriting [2]

are not good for:

sharing large amounts of data

fine-grain parallelization

fully automatic parallelism

relational algebra environment

distributed operating system

are really good for:

writing iterative algorithms

data-dependent control flow using dynamic task graphs

transparent fault tolerance and automatic distribution

scaling across hundreds of machines

Questions?

[1] D.G. Murray. A distributed execution engine supporting data-dependent control flow. PhD thesis, Univ. of Cambridge, 2011.

[2] D.G. Murray and S. Hand. Scripting the cloud with Skywriting. In Proceedings of the 2nd USENIX conference on Hot topics in cloud computing, pages 12–12. USENIX Association, 2010.

[3] D.G. Murray and S. Hand. Non-deterministic parallelism considered useful. In HotOS XIII, 13th Workshop on Hot Topics in Operating Systems, 2011.

[4] D.G. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy, and S. Hand. CIEL: a universal execution engine for distributed data-flow computing. In Proceedings of the 8th USENIX conference on Networked systems design and implementation, page 9. USENIX Association, 2011.

[5] M. Schwarzkopf, D.G. Murray, and S. Hand. Condensing the cloud: running CIEL on many-core. Proceedings of EuroSys SFMA, 2011.

Part I: Appendix

6 CIEL

7 Skywriting

8 Experiments

Hidden slide 1

Figure: Task and Object table maintained in Master node

Hidden slide 2

Figure: Spawning Tasks

Source: http://www.cl.cam.ac.uk/~dgm36/CIEL-NSDI-slides.pdf

Hidden slide 3

Figure: Blocking on futures

Source: http://www.cl.cam.ac.uk/~dgm36/CIEL-NSDI-slides.pdf

Hidden slide 4

Figure: Primary Master Failure
