Summarized by: Michael Bowen
ChEsS: Cost-Effective Scheduling across multiple heterogeneous mapreduce clusters
Nikos Zacheilas, Vana Kalogeraki
2016 IEEE International Conference on Autonomic Computing
Presentation Summary
❖ Key Terminology
❖ Problem Statement
❖ Contributions
❖ Challenges
❖ Variables
❖ Strategy (Methodology)
❖ Impact Estimation
❖ Optimization Problem
❖ Adaptive Weighted Sum
❖ Evaluation
Key Terminology
MapReduce - Word Count - the canonical example
Photo Credit: wikis.nyu.edu
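The word-count example can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map, shuffle, and reduce phases, not Hadoop code; the function names and the sample input are assumptions made for the sketch.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (the word)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Run the three phases in sequence on an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical sample input, one string per input split.
counts = word_count(["deer bear river", "car car river", "deer car bear"])
print(counts)
```

In a real cluster the map and reduce calls run as parallel tasks in the map/reduce slots, which is why the number of slots matters for the execution-time estimates later in the deck.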
Key Terminology
❖ Isolation
❖ Data Isolation
❖ Privacy and security
❖ Failure Isolation
❖ Hide failures across clusters
❖ Version Isolation
❖ Dependency and version management
❖ Performance Isolation
❖ Prod, Dev, Test clusters - possibly several Prod clusters, delineated differently
Key Terminology
❖ Per-job considerations
❖ Performance
❖ Monetary Cost
❖ Data Locality
❖ Scheduling Policy
❖ FIFO, Fair, and Capacity - more recently EDF and Least-Laxity
Key Terminology
❖ Makespan
❖ End-to-end execution time of the submitted job
❖ Pareto-based analysis
❖ Many candidate solutions that trade off competing objectives (here, budget and makespan)
Problem Statement
Challenges
❖ The number of possible job-to-cluster assignments grows exponentially with the number of jobs (k clusters and n jobs give k^n assignments)
❖ Difficult to manually determine these assignments
❖ Budget required vs workload makespan
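To make the size of the search space concrete, the sketch below enumerates every job-to-cluster assignment for a tiny instance. The job and cluster names are placeholders, and brute-force enumeration is shown only to illustrate the growth, not as a proposed method.

```python
from itertools import product


def all_assignments(jobs, clusters):
    """Yield every mapping of jobs to clusters: len(clusters) ** len(jobs) in total."""
    for choice in product(clusters, repeat=len(jobs)):
        yield dict(zip(jobs, choice))


# Placeholder names for a toy instance.
jobs = ["j1", "j2", "j3"]
clusters = ["c1", "c2"]
assignments = list(all_assignments(jobs, clusters))
print(len(assignments), "assignments for 3 jobs on 2 clusters")

# The count explodes quickly even with only four clusters.
for n in (10, 20, 40):
    print(n, "jobs on 4 clusters:", 4 ** n, "assignments")
```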
Contributions
❖ Parameter impact estimates
❖ Jobs locality constraints, intra-job scheduling algorithms, etc…
❖ Pareto-frontier search algorithm improvements
❖ Budget vs Makespan tradeoff analysis
❖ Evaluation study of industry workloads
Variables
❖ c ∈ Clusters
❖ VMs_c
❖ # virtual machines
❖ mslots_c, rslots_c
❖ map/reduce slots
❖ scheduler_c
❖ scheduling algorithm
❖ cost_c
❖ per hour cost (ec2)
❖ threads_c
❖ # threads spawned for execution
❖ Jobs_c
❖ set of jobs assigned to cluster c
❖ makespan_c
❖ total execution time in seconds of all jobs assigned to c
❖ budget_c
❖ required budget
❖ j ∈ Jobs
❖ mtasks_j, rtasks_j
❖ # map/reduce tasks used by job j
❖ mslots_{j,c}, rslots_{j,c}
❖ # map/reduce slots reserved by j from c
❖ size_j
❖ input data size of j
❖ dataHost_j
❖ where input resides
❖ mtime_{j,c}, rtime_{j,c}, stime_{j,c}
❖ map/reduce/shuffle time estimates
❖ JTime_{j,c}
❖ execution time of job j
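One way to hold the static per-cluster and per-job variables in code is a pair of dataclasses, sketched below. The field names mirror the slide's symbols, but the types, the gigabyte unit for the input size, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    mtasks: int        # number of map tasks
    rtasks: int        # number of reduce tasks
    size_gb: float     # input data size (unit assumed to be GB)
    data_host: str     # name of the cluster where the input resides


@dataclass
class Cluster:
    name: str
    vms: int               # number of virtual machines
    mslots: int            # map slots
    rslots: int            # reduce slots
    scheduler: str         # intra-cluster policy, e.g. "FIFO"
    cost_per_hour: float   # hourly cost
    threads: int           # threads spawned for execution
    jobs: list = field(default_factory=list)  # jobs assigned to this cluster


# Hypothetical values, not taken from the paper.
wordcount = Job("wordcount", mtasks=100, rtasks=10, size_gb=50.0, data_host="c1")
c1 = Cluster("c1", vms=4, mslots=8, rslots=4, scheduler="FIFO",
             cost_per_hour=0.5, threads=2)
c1.jobs.append(wordcount)
print(c1)
```

Derived quantities such as the per-cluster makespan, the budget, and the per-job time estimates are deliberately left out of these records, since they depend on the assignment being evaluated.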
Strategy
1. Estimate impact of intra-cluster scheduling policies and locality constraints on makespan and budget
2. Formulate multi-objective optimization problem
3. Solve using Adaptive Weighted Sum (AWS)
Impact Estimation
Key Assumption - repetitive, aperiodic jobs
Execution time - Lower Bound
Upper Bound
Map/Reduce/Shuffle for lower and upper
Final estimate - average of two limits
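The bound formulas themselves did not survive in this summary. As a rough sketch only, the code below uses the well-known makespan bounds for n tasks greedily assigned to k slots (lower bound n·avg/k, upper bound (n−1)·avg/k + max, as in the ARIA performance model) and averages the two limits. Treat these formulas, the function names, and the sample numbers as assumptions; they are not confirmed to be the exact expressions used in ChEsS.

```python
def phase_bounds(n_tasks, n_slots, avg_time, max_time):
    """Lower and upper bound (seconds) for one phase of n_tasks run on n_slots.

    Assumed formulas: lower = n * avg / k, upper = (n - 1) * avg / k + max.
    """
    if n_tasks == 0:
        return 0.0, 0.0
    lower = n_tasks * avg_time / n_slots
    upper = (n_tasks - 1) * avg_time / n_slots + max_time
    return lower, upper


def estimate_job_time(phases):
    """Sum per-phase bounds and return (lower, upper, average of the two).

    phases: iterable of (n_tasks, n_slots, avg_time, max_time) tuples,
    one each for map, shuffle, and reduce.
    """
    lower = upper = 0.0
    for n_tasks, n_slots, avg_time, max_time in phases:
        lo, hi = phase_bounds(n_tasks, n_slots, avg_time, max_time)
        lower += lo
        upper += hi
    return lower, upper, (lower + upper) / 2


# Hypothetical profile of a previous run of the same (repetitive) job.
profile = [
    (100, 20, 10.0, 15.0),  # map
    (10, 10, 4.0, 6.0),     # shuffle
    (10, 10, 8.0, 12.0),    # reduce
]
lower, upper, estimate = estimate_job_time(profile)
print(f"lower={lower:.2f}s upper={upper:.2f}s estimate={estimate:.2f}s")
```

The average and maximum task durations come from profiling earlier runs, which is where the assumption of repetitive jobs enters.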
Impact Estimation
Key Assumption - repetitive, aperiodic jobs
❖ Locality Constraints -
❖ Add overhead of time to transfer data to execution time
❖ Makespan -
❖ Simulator Engine
❖ Input - scheduling policy and set of jobs
❖ Output - makespan
❖ Budget -
❖ Budget vs Exec Time -
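The formulas for this slide are not reproduced in the summary, so the sketch below only illustrates the three ideas under stated assumptions: the locality overhead is modeled as input size divided by a fixed transfer bandwidth, the simulator is replaced by a toy FIFO model that runs one job at a time, and the budget assumes every VM is billed per started hour. None of these choices is confirmed by the source.

```python
import math


def transfer_overhead(size_gb, bandwidth_gb_per_s, data_host, cluster):
    """Seconds needed to copy the input when it does not already reside on the cluster."""
    if data_host == cluster:
        return 0.0
    return size_gb / bandwidth_gb_per_s


def fifo_makespan(job_times):
    """Toy stand-in for the simulator engine: FIFO, one job at a time."""
    return sum(job_times)


def cluster_budget(makespan_s, cost_per_hour, vms):
    """Assumed pricing: each VM is billed for every started hour of the makespan."""
    return math.ceil(makespan_s / 3600) * cost_per_hour * vms


# Hypothetical numbers: job A is local to c1, job B must be copied from c2.
time_a = 1200.0 + transfer_overhead(10.0, 0.125, data_host="c1", cluster="c1")
time_b = 2100.0 + transfer_overhead(50.0, 0.125, data_host="c2", cluster="c1")
makespan = fifo_makespan([time_a, time_b])
budget = cluster_budget(makespan, cost_per_hour=0.5, vms=4)
print(f"makespan={makespan:.0f}s budget={budget:.2f}")
```

Because billing is rounded up to whole hours in this model, a small change in execution time can leave the budget unchanged or push it over the next hour boundary.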
Optimization Problem
❖ Pareto-frontier
❖ Detect optimal solutions with respect to constraints
❖ Result helps user decide amongst solution space
❖ Example with two job-to-cluster assignments, P and Q
❖ Q dominates P if and only if both of the following hold
❖ Budget Q ≤ Budget P AND Makespan Q ≤ Makespan P
❖ Budget Q < Budget P OR Makespan Q < Makespan P
❖ The set of non-dominated assignments is the solution space of interest - known as the Pareto-frontier
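A minimal sketch of the dominance test and a brute-force frontier filter follows, using the standard definition of Pareto dominance (no worse in both objectives, strictly better in at least one). Each assignment is represented only by its (budget, makespan) pair, and the sample points are made up for illustration.

```python
def dominates(q, p):
    """True if q dominates p, where q and p are (budget, makespan) pairs."""
    no_worse = q[0] <= p[0] and q[1] <= p[1]
    strictly_better = q[0] < p[0] or q[1] < p[1]
    return no_worse and strictly_better


def pareto_frontier(points):
    """Keep the points that no other point dominates (quadratic-time filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (budget, makespan in seconds) outcomes of five assignments.
candidates = [(10, 500), (12, 400), (15, 450), (20, 300), (10, 600)]
frontier = pareto_frontier(candidates)
print(frontier)
```

Filtering is cheap once the candidates are known; the expensive part is generating and evaluating the exponentially many assignments in the first place.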
Adaptive Weighted Sum
❖ Pareto-frontier search is very costly - use Adaptive Weighted Sum as an approximation
❖ Regular weighted sum -
❖ Greedy - assign jobs to clusters that lead to min utilityScore
❖ Challenge -
❖ Detected solutions are non-uniformly distributed
❖ Cannot detect solutions in non-convex regions of the solution space
❖ Adaptive Weighted Sum -
❖ Perform single-objective optimization in unexplored regions of the solution space
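The utilityScore formula is not reproduced in this summary. The sketch below assumes the usual weighted-sum form, w · makespan + (1 − w) · budget with both terms normalized, and a greedy pass that places each job on the cluster giving the lowest score. It covers only the regular weighted sum; the adaptive refinement of unexplored regions is not implemented. The cost model (jobs on one cluster run back to back, billing per started hour) and all names and numbers are assumptions.

```python
import math


def evaluate(assignment, job_time, cost_per_hour):
    """Return (makespan_s, budget) for a job -> cluster mapping under the toy model."""
    busy = {}
    for job, cluster in assignment.items():
        busy[cluster] = busy.get(cluster, 0.0) + job_time[job][cluster]
    makespan = max(busy.values(), default=0.0)
    budget = sum(math.ceil(t / 3600) * cost_per_hour[c] for c, t in busy.items())
    return makespan, budget


def greedy_weighted_sum(jobs, clusters, job_time, cost_per_hour, w, scale):
    """Greedily assign each job to the cluster that minimizes the assumed utility score."""
    assignment = {}
    for job in jobs:
        def score(cluster):
            m, b = evaluate({**assignment, job: cluster}, job_time, cost_per_hour)
            return w * m / scale[0] + (1 - w) * b / scale[1]
        assignment[job] = min(clusters, key=score)
    return assignment, evaluate(assignment, job_time, cost_per_hour)


# Hypothetical instance: a fast, expensive cluster and a slow, cheap one.
jobs = ["j1", "j2"]
clusters = ["fast", "cheap"]
job_time = {"j1": {"fast": 1800, "cheap": 3600},
            "j2": {"fast": 1200, "cheap": 2000}}
cost_per_hour = {"fast": 4.0, "cheap": 1.0}
scale = (5600.0, 5.0)  # normalization constants for makespan and budget

for w in (0.0, 0.5, 1.0):
    print(w, greedy_weighted_sum(jobs, clusters, job_time, cost_per_hour, w, scale))
```

Sweeping w at fixed steps tends to return the same few solutions repeatedly, which illustrates the non-uniform coverage listed above as a challenge of the regular weighted sum.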
Evaluation
❖ Note - developed for INSIGHT, which provides real-time event detection in Dublin
❖ Used industry workloads based on Yahoo’s Hadoop clusters
❖ Used scientific workloads based on traces from Open-Cloud cluster provider
❖ Four possible clusters considered for the possible jobs
Evaluation
Execution time estimation error
Evaluation
d_j parameter impact
Evaluation
Scheduling Algorithm Impact
Evaluation
Locality Constraints Impact
Evaluation
Comparison With Optimal
Critique
❖ Assumption of repetitive, aperiodic jobs
❖ Understandable constraint - difficult to model otherwise
❖ Unsure of how realistic this constraint is
❖ MapReduce is more of a legacy system at this point
❖ Rapidly losing market-share to Spark