DESCRIPTION
Presentation of the CIEL framework as described in the paper: "CIEL: a universal execution engine for distributed data-flow computing"
CIEL universal distributed execution engine
Presenter: Emmanouil Dimogerontakis@{AdvDS}
EMDC KTH
November 6, 2012
Presenter: Emmanouil Dimogerontakis @{AdvDS} CIEL universal distributed execution engine 1/23
1. Motivation
   Distributed Execution Engines
2. CIEL
   Dynamic Task Graphs
   Architecture
3. Skywriting
4. Optimizations & Fault Tolerance
5. Evaluation & Future work
   Evaluation
   Future Work
   Conclusions
Purpose
Execute a Task Graph providing:
Task Scheduling
Data Distribution
Load Balancing
Transparent Fault Tolerance
Figure: Task Graph
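As a rough illustration of what executing a task graph involves, the sketch below runs a small static graph in dependency order and places each task on the least-loaded worker. It is not CIEL code: `run_task_graph`, the worker names, and the in-process "execution" are assumptions made for the example, and data distribution and fault tolerance are left out.

```python
from graphlib import TopologicalSorter

def run_task_graph(deps, funcs, workers):
    """Run every task after its dependencies; return (results, placement)."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    results, placement = {}, {}
    load = {w: 0 for w in workers}
    while sorter.is_active():
        for task in sorter.get_ready():
            # Load balancing: pick the worker with the fewest tasks so far.
            worker = min(workers, key=lambda w: load[w])
            load[worker] += 1
            placement[task] = worker
            # Scheduling: a task only runs once its inputs exist.
            inputs = [results[d] for d in deps.get(task, ())]
            results[task] = funcs[task](*inputs)
            sorter.done(task)
    return results, placement

# Hypothetical three-task graph: "sum" depends on "a" and "b".
deps = {"a": [], "b": [], "sum": ["a", "b"]}
funcs = {"a": lambda: 2, "b": lambda: 3, "sum": lambda x, y: x + y}
results, placement = run_task_graph(deps, funcs, ["w1", "w2"])
```

Here the two independent tasks land on different workers, and `sum` runs only after both have produced their results.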
Limitations
Task graphs used up to now are:
Static
Acyclic
This leads to:
Limited Expressive Power
Poor Performance
Insufficient Fault Tolerance
Overview
Figure: Distributed Execution Engines comparison
CIEL
Why universal?
It supports the same class of algorithms as a Turing machine (TM)
How?
By using Dynamic Task Graphs
CIEL primitives
objects
references
tasks
Figure: A Task Graph (Source: http://www.cl.cam.ac.uk/~dgm36/CIEL-NSDI-slides.pdf)
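The three primitives can be pictured with a small data model. This is a minimal sketch under assumptions, not CIEL's actual data structures: the class and field names are hypothetical, objects are reduced to entries in an object table, and a reference with no known location stands in for a future.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Reference:
    """A name for an object, plus (optionally) where its data is stored."""
    name: str
    location: Optional[str] = None  # None models a future reference

    @property
    def is_concrete(self) -> bool:
        return self.location is not None

@dataclass
class Task:
    """A unit of computation with reference dependencies and one output."""
    task_id: str
    dependencies: list
    output: Reference

    def is_runnable(self, object_table: dict) -> bool:
        # Runnable once every dependency has a known location.
        return all(object_table.get(d.name) is not None
                   for d in self.dependencies)

# Hypothetical object table: object name -> worker holding the data.
object_table = {"input": "worker-1", "partial": None}
t = Task("t1", [Reference("input"), Reference("partial")], Reference("result"))
before = t.is_runnable(object_table)   # "partial" has not been produced yet
object_table["partial"] = "worker-2"
after = t.is_runnable(object_table)    # all dependencies are now available
```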
Dynamic Task Graphs
Figure: A Dynamic Task Graph
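What makes the graph dynamic is that a task may either publish its output or spawn further tasks and delegate its output to them, so the number of tasks depends on the data. The sketch below imitates this with plain Python functions. The `("publish", ...)` and `("spawn", ...)` tuples and the `run_dynamic` driver are assumptions for illustration only, and everything runs in one process.

```python
def run_dynamic(root, arg):
    """Run a task that either publishes a value or spawns a successor."""
    executed = 0
    task, value = root, arg
    while True:
        executed += 1
        outcome = task(value)
        if outcome[0] == "publish":
            return outcome[1], executed
        # "spawn": the task delegates its output to a newly created task.
        _, task, value = outcome

def halve_until_small(x):
    # Data-dependent control flow: iterate until the value drops below 1.
    if x < 1.0:
        return ("publish", x)
    return ("spawn", halve_until_small, x / 2)

result, n_tasks = run_dynamic(halve_until_small, 10.0)
```

Starting from 10.0, the graph grows to five tasks before the last one publishes 0.625. A static graph would have needed the iteration count in advance.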
Master & Worker
Figure: CIEL Master
Figure: CIEL Worker
Architecture
Figure: CIEL Architecture
Creating Tasks with Skywriting
Figure: Spawning a new task
Figure: Dereferencing futures
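Since the Skywriting code from the figures is not reproduced here, the following is only an analogy in Python, not Skywriting syntax. It uses the standard `concurrent.futures` module to show the same two steps: spawning returns a future immediately, and dereferencing waits for the value. The `square` function is a hypothetical task body.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

with ThreadPoolExecutor(max_workers=2) as pool:
    # "Spawning": submit() returns at once with a future for the result.
    futures = [pool.submit(square, n) for n in range(4)]
    # "Dereferencing": result() waits until the value is available.
    total = sum(f.result() for f in futures)
```

In this analogy a thread blocks on `result()`. How CIEL handles a task that dereferences an unfinished future is not shown by this sketch.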
Optimizations
Globally unique identifiers enable memoization
Streaming partially written objects between tasks
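To see why unique identifiers enable memoization, consider naming each output deterministically from the task and its inputs: if the name already exists in the store, the task does not need to run again. The sketch below assumes a SHA-256 hash over the function name and input names. That naming scheme, and the `store` and `runs` variables, are illustrative assumptions rather than CIEL's actual scheme.

```python
import hashlib

store = {}   # object name -> value
runs = []    # output names of tasks that actually executed

def output_name(func_name, *input_names):
    """Derive a deterministic name from the task's code and input names."""
    key = "|".join((func_name,) + input_names)
    return hashlib.sha256(key.encode()).hexdigest()

def run_memoized(func, *input_names):
    name = output_name(func.__name__, *input_names)
    if name not in store:  # compute only when the output is not yet known
        runs.append(name)
        store[name] = func(*(store[i] for i in input_names))
    return name

def add(a, b):
    return a + b

store["x"], store["y"] = 1, 2
first = run_memoized(add, "x", "y")
second = run_memoized(add, "x", "y")  # same name, so the task is skipped
```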
Fault Tolerance
Client (no driver program)
Worker (periodic heartbeat)
Master (persistent logging, secondary masters, object table reconstruction)
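Worker failure detection by periodic heartbeat can be sketched as a table of last-seen timestamps checked against a timeout. The `HeartbeatMonitor` class, the 10-second timeout, and the explicit `now` arguments are assumptions chosen to keep the example deterministic. What the master does after detecting a failure is not modelled.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per worker and reports the silent ones."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, worker, now):
        self.last_seen[worker] = now

    def failed_workers(self, now):
        # A worker is presumed failed if it has been silent past the timeout.
        return sorted(w for w, t in self.last_seen.items()
                      if now - t > self.timeout)

monitor = HeartbeatMonitor(timeout=10.0)
monitor.heartbeat("w1", now=0.0)
monitor.heartbeat("w2", now=0.0)
monitor.heartbeat("w1", now=8.0)   # w1 keeps reporting, w2 goes silent
failed = monitor.failed_workers(now=12.0)
```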
Performance Comparison with a Production System
Figure: Distributed Grep on Hadoop and CIEL
Performance of an Iterative Algorithm
Figure: K-means on Hadoop and CIEL with 20 workers
Overheads
Figure: Speedup of Binomial Options Pricing Model on 47 workers
Future Work
Integrate CIEL with existing programming languages
Partition master state
Explore use of multiple cores (see [5])
Explore use of non-deterministic parallelism (see [3])
Conclusions
CIEL [4, 1] and Skywriting [2]
are not good for:
sharing large amounts of data
fine-grained parallelization
fully automatic parallelism
use as a relational algebra environment
use as a distributed operating system
are really good for:
writing iterative algorithms
data-dependent control flow using dynamic task graphs
transparent fault tolerance and automatic distribution
scaling across hundreds of machines
Questions?
[1] D.G. Murray.
A distributed execution engine supporting data-dependent control flow.
PhD thesis, Univ. of Cambridge, 2011.
[2] D.G. Murray and S. Hand.
Scripting the cloud with Skywriting.
In Proceedings of the 2nd USENIX conference on Hot topics in cloud computing, pages 12–12. USENIX Association, 2010.
[3] D.G. Murray and S. Hand.
Non-deterministic parallelism considered useful.
In HotOS XIII, 13th Workshop on Hot Topics in Operating Systems, 2011.
[4] D.G. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy, and S. Hand.
CIEL: a universal execution engine for distributed data-flow computing.
In Proceedings of the 8th USENIX conference on Networked systems design and implementation, page 9. USENIX Association, 2011.
[5] M. Schwarzkopf, D.G. Murray, and S. Hand.
Condensing the cloud: running CIEL on many-core.
Proceedings of EuroSys SFMA, 2011.
Appendix
6. CIEL
7. Skywriting
8. Experiments
Hidden slide 1
Figure: Task and Object table maintained in Master node
Hidden slide 2
Figure: Spawning Tasks (Source: http://www.cl.cam.ac.uk/~dgm36/CIEL-NSDI-slides.pdf)
Hidden slide 3
Figure: Blocking on futures (Source: http://www.cl.cam.ac.uk/~dgm36/CIEL-NSDI-slides.pdf)
Hidden slide 4
Figure: Primary Master Failure