Summarized by: Michael Bowen

ChEsS: Cost-Effective Scheduling Across Multiple Heterogeneous MapReduce Clusters

Nikos Zacheilas, Vana Kalogeraki

2016 IEEE International Conference on Autonomic Computing

Presentation Summary

❖ Key Terminology

❖ Problem Statement

❖ Contributions

❖ Challenges

❖ Variables

❖ Strategy (Methodology)

❖ Impact Estimation

❖ Optimization Problem

❖ Adaptive Weighted Sum

❖ Evaluation

Key Terminology

MapReduce - Word Count - the canonical example

Photo Credit: wikis.nyu.edu

http://wikis.nyu.edu
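The word-count figure itself is not reproduced here. As a rough stand-in, the sketch below walks through the same map, shuffle, and reduce phases in a single Python process. It does not use any Hadoop API; the function names and the sample input are illustrative assumptions.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (the word)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_phase(groups):
    """Reduce: sum the grouped values for each key."""
    return {word: sum(counts) for word, counts in groups.items()}

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))

# Hypothetical input split into three lines, one per mapper.
counts = word_count(["Deer Bear River", "Car Car River", "Deer Car Bear"])
```

In a real cluster each line (or block) would go to a separate map task and each key group to a reduce task; here everything runs sequentially to keep the data flow visible.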

Key Terminology

❖ Isolation

❖ Data Isolation

❖ Privacy and security

❖ Failure Isolation

❖ Hide failures across clusters

❖ Version Isolation

❖ Dependency and version management

❖ Performance Isolation

❖ Separate Prod, Dev, and Test clusters - possibly several Prod clusters with different delineations

Key Terminology

❖ Per-job considerations

❖ Performance

❖ Monetary Cost

❖ Data Locality

❖ Scheduling Policy

❖ FIFO, Fair, and Capacity - more recently EDF and Least-Laxity

Key Terminology

❖ Makespan

❖ End-to-end execution time of the submitted job

❖ Pareto-based analysis

❖ Many possible courses of action competing for attention

Problem Statement

Challenges

❖ The number of possible job-to-cluster assignments grows exponentially as the number of jobs and clusters increases

❖ Difficult to manually determine these assignments

❖ Budget required vs workload makespan
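To make the growth concrete, here is a small sketch that counts and enumerates assignments. It assumes each job is placed on exactly one cluster, so there are |Clusters|^|Jobs| possible assignments; the job and cluster names are hypothetical.

```python
from itertools import product

def count_assignments(num_jobs, num_clusters):
    """Each job independently picks one cluster: num_clusters ** num_jobs."""
    return num_clusters ** num_jobs

def enumerate_assignments(jobs, clusters):
    """Yield every job-to-cluster mapping as a dict (only feasible for tiny inputs)."""
    for choice in product(clusters, repeat=len(jobs)):
        yield dict(zip(jobs, choice))

# Three jobs on two clusters: small enough to list exhaustively.
small = list(enumerate_assignments(["j1", "j2", "j3"], ["c1", "c2"]))

# Twenty jobs on four clusters: already over a trillion candidates.
large = count_assignments(20, 4)
```

Exhaustive enumeration stops being practical very quickly, which is why manual assignment and brute-force search are both ruled out.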

Contributions

❖ Parameter impact estimates

❖ Jobs locality constraints, intra-job scheduling algorithms, etc…

❖ Pareto-frontier search algorithm improvements

❖ Budget vs Makespan tradeoff analysis

❖ Evaluation study of industry workloads

Variables

❖ c ∈ Clusters

❖ VMs_c

❖ # virtual machines

❖ mslots_c, rslots_c

❖ map/reduce slots

❖ scheduler_c

❖ scheduling algorithm

❖ cost_c

❖ per-hour cost (EC2)

❖ threads_c

❖ # threads spawned for execution

❖ Jobs_c

❖ set of jobs assigned to cluster c

❖ makespan_c

❖ total execution time in seconds of all jobs assigned to c

❖ budget_c

❖ required budget

❖ j ∈ Jobs

❖ mtasks_j, rtasks_j

❖ # map/reduce tasks used by job j

❖ mslots_{j,c}, rslots_{j,c}

❖ # map/reduce slots reserved by j from c

❖ size_j

❖ input data size of j

❖ dataHost_j

❖ where input resides

❖ mtime_{j,c}, rtime_{j,c}, stime_{j,c}

❖ map/reduce/shuffle time estimates

❖ JTime_{j,c}

❖ execution time of job j on cluster c
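One way to hold these variables in code is sketched below. The field names mirror the slide's symbols, but the types, units, and the choice to key per-pair quantities by (job, cluster) are assumptions made for illustration, not something the paper prescribes.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    mtasks: int      # number of map tasks
    rtasks: int      # number of reduce tasks
    size: float      # input data size (unit is an assumption: GB)
    data_host: str   # cluster where the input resides

@dataclass
class Cluster:
    name: str
    vms: int         # number of virtual machines
    mslots: int      # map slots
    rslots: int      # reduce slots
    scheduler: str   # intra-cluster scheduling algorithm, e.g. "FIFO"
    cost: float      # per-hour cost
    threads: int     # threads spawned for execution
    jobs: list = field(default_factory=list)  # Jobs_c

# Quantities indexed by both j and c, such as JTime_{j,c}, fit a dict
# keyed by (job name, cluster name).
jtime = {}

c1 = Cluster("c1", vms=4, mslots=8, rslots=4, scheduler="FIFO", cost=0.5, threads=2)
j1 = Job("j1", mtasks=10, rtasks=4, size=2.0, data_host="c1")
c1.jobs.append(j1)
jtime[(j1.name, c1.name)] = 120.0  # hypothetical estimate in seconds
```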

Strategy

1. Estimate impact of intra-cluster scheduling policies and locality constraints on makespan and budget

2. Formulate multi-objective optimization problem

3. Solve using Adaptive Weighted Sum (AWS)

Impact Estimation

Key Assumption - repetitive, aperiodic jobs

❖ Execution time -

❖ Lower bound and upper bound, each computed for the map, shuffle, and reduce phases

❖ Final estimate - average of the two limits
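The bound formulas from the slide did not survive extraction. The sketch below therefore assumes the well-known makespan bounds for greedy task assignment (as used in ARIA-style MapReduce performance models): for n tasks on k slots, n·avg/k as the lower bound and (n-1)·avg/k + max as the upper bound. The paper's exact expressions may differ; only the final averaging step is taken from the slide.

```python
def phase_bounds(num_tasks, num_slots, avg_duration, max_duration):
    """Assumed bounds for one phase (map, shuffle, or reduce)."""
    lower = num_tasks * avg_duration / num_slots
    upper = (num_tasks - 1) * avg_duration / num_slots + max_duration
    return lower, upper

def estimate_job_time(phases):
    """phases: iterable of (num_tasks, num_slots, avg_duration, max_duration).

    Sums the per-phase bounds and returns (lower, upper, estimate),
    where estimate is the average of the two limits.
    """
    bounds = [phase_bounds(*phase) for phase in phases]
    lower = sum(b[0] for b in bounds)
    upper = sum(b[1] for b in bounds)
    return lower, upper, (lower + upper) / 2

# Hypothetical task profile, durations in seconds.
lower, upper, estimate = estimate_job_time([
    (10, 5, 20, 30),  # map
    (4, 2, 5, 8),     # shuffle
    (4, 2, 10, 15),   # reduce
])
```

The average and maximum task durations would come from earlier runs of the same job, which is where the assumption of repetitive jobs matters.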

Impact Estimation

Key Assumption - repetitive, aperiodic jobs

❖ Locality Constraints -

❖ Add the time needed to transfer the input data as overhead on top of the execution time

❖ Makespan -

❖ Simulator Engine

❖ Input - scheduling policy and set of jobs

❖ Output - makespan

❖ Budget -

❖ Budget vs Exec Time -
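As an illustration of the simulator's interface (scheduling policy and set of jobs in, makespan out), here is a deliberately simplified sketch. It assumes every job occupies one slot for its whole estimated time, and it supports only two toy policies. The budget function is also an assumption: makespan in hours times the number of VMs times the per-hour cost. None of these formulas are taken from the paper.

```python
import heapq

def simulate_makespan(job_times, num_slots, policy="fifo"):
    """List-scheduling simulation; job_times are estimates in seconds."""
    if policy == "fifo":
        queue = list(job_times)                  # submission order
    elif policy == "longest_first":
        queue = sorted(job_times, reverse=True)  # toy alternative policy
    else:
        raise ValueError(f"unknown policy: {policy}")
    slots = [0.0] * num_slots                    # time at which each slot is free
    heapq.heapify(slots)
    for duration in queue:
        start = heapq.heappop(slots)             # earliest available slot
        heapq.heappush(slots, start + duration)
    return max(slots)

def budget(makespan_seconds, num_vms, cost_per_hour):
    """Assumed cost model: all VMs are paid for until the last job finishes."""
    return makespan_seconds / 3600 * num_vms * cost_per_hour

jobs = [1800, 1800, 3600]
fifo = simulate_makespan(jobs, num_slots=2, policy="fifo")
longest = simulate_makespan(jobs, num_slots=2, policy="longest_first")
cost = budget(longest, num_vms=4, cost_per_hour=0.5)
```

Even in this toy setting the policy changes the makespan for the same jobs and slots, and a longer makespan translates directly into a larger budget.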

Optimization Problem

❖ Pareto-frontier

❖ Detect optimal solutions with respect to constraints

❖ The result helps the user choose among the candidate solutions

❖ Example with two job-to-cluster assignments, P and Q

❖ Q dominates P if and only if both conditions hold:

❖ Budget_Q ≤ Budget_P AND Makespan_Q ≤ Makespan_P

❖ Budget_Q < Budget_P OR Makespan_Q < Makespan_P

❖ The set of non-dominated assignments is the solution space of interest - known as the Pareto-frontier
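A minimal sketch of the dominance test and a brute-force frontier filter follows, using the standard definition of Pareto dominance (no worse in both objectives, strictly better in at least one). Each assignment is reduced to a hypothetical (budget, makespan) pair; the numbers are made up for illustration.

```python
def dominates(q, p):
    """True if assignment q dominates p; both are (budget, makespan) tuples."""
    no_worse = q[0] <= p[0] and q[1] <= p[1]
    strictly_better = q[0] < p[0] or q[1] < p[1]
    return no_worse and strictly_better

def pareto_frontier(points):
    """Keep every point that no other point dominates (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

assignments = [(10, 50), (12, 40), (15, 45), (20, 30), (20, 35)]
frontier = pareto_frontier(assignments)
```

Here (15, 45) is dropped because (12, 40) is cheaper and faster, and (20, 35) is dropped because (20, 30) costs the same but finishes sooner. The filter is only usable once the candidates are known; enumerating them is the expensive part.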

Adaptive Weighted Sum

❖ Pareto-frontier search is very costly - use Adaptive Weighted Sum as an approximation

❖ Regular weighted sum -

❖ Greedy - assign jobs to clusters that lead to min utilityScore

❖ Challenge -

❖ Detected solutions are non-uniformly distributed

❖ Cannot detect solutions in non-convex regions of the solution space

❖ Adaptive Weighted Sum -

❖ Perform single-objective optimization in unexplored regions of the solution space
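The weighted-sum formula is missing from the extracted slide. The sketch below assumes the usual form, utilityScore = w·budget + (1-w)·makespan with both objectives normalized to [0, 1], and uses three hypothetical solutions to show why a plain weight sweep misses a Pareto-optimal point in a non-convex region. The `refine_between` helper is only a loose illustration of searching an unexplored region with a single objective; it is not the paper's AWS procedure.

```python
def utility_score(point, w):
    """Assumed scalarization of a normalized (budget, makespan) pair."""
    budget, makespan = point
    return w * budget + (1 - w) * makespan

def weighted_sum_sweep(points, steps=11):
    """Minimize utility_score for evenly spaced weights in [0, 1]."""
    found = set()
    for i in range(steps):
        w = i / (steps - 1)
        found.add(min(points, key=lambda p: utility_score(p, w)))
    return found

def refine_between(points, a, b):
    """Minimize makespan alone among points whose budget lies between a and b."""
    low, high = sorted((a[0], b[0]))
    region = [p for p in points if low < p[0] < high]
    return min(region, key=lambda p: p[1]) if region else None

# All three points are non-dominated, but (0.6, 0.6) lies above the
# line joining the other two, i.e. in a non-convex part of the frontier.
points = [(0.0, 1.0), (0.6, 0.6), (1.0, 0.0)]
found = weighted_sum_sweep(points)
extra = refine_between(points, (0.0, 1.0), (1.0, 0.0))
```

For any weight, one of the two extreme points scores at most 0.5 while the middle point scores about 0.6, so the sweep only ever returns the extremes. Restricting the search to the gap between them recovers the middle point.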

Evaluation

❖ Note - developed for INSIGHT, which provides real-time event detection in Dublin

❖ Used industry workloads based on Yahoo’s Hadoop clusters

❖ Used scientific workloads based on traces from Open-Cloud cluster provider

❖ Four possible clusters considered for the possible jobs

Evaluation - Execution time estimation error

Evaluation - d_j parameter impact

Evaluation - Scheduling Algorithm Impact

Evaluation - Locality Constraints Impact

Evaluation - Comparison With Optimal

Critique

❖ Assumption of repetitive, aperiodic jobs

❖ Understandable constraint - difficult to model otherwise

❖ Unsure of how realistic this constraint is

❖ MapReduce is more of a legacy system at this point

❖ Rapidly losing market-share to Spark
