Public
Auto-scaling Techniques for Elastic Data Stream Processing
Thomas Heinze, Valerio Pappalardo, Zbigniew Jerzak, Christof Fetzer
March 2014
Outline
1. Introduction
2. Auto-scaling Techniques for Elastic Data Stream Processing
3. Evaluation
4. Conclusion and Future Work
Utilization within Cloud Environments
Twitter's cluster[1] has an average CPU utilization below 20%, yet ~80% of the resources are reserved
The Google cluster trace[2] shows an average CPU utilization of 25-35% and memory utilization of 40%
The average utilization within public clouds is estimated to be between 6% and 12%[3]
[1] Benjamin Hindman, et al. "Mesos: A platform for fine-grained resource sharing in the data center." NSDI, 2011.
[2] Charles Reiss, et al. "Heterogeneity and dynamicity of clouds at scale: Google trace analysis." SOCC, ACM, 2012.
[3] Arunchandar Vasan, et al. "Worth their watts? An empirical study of datacenter servers." HPCA, 2010.
Elasticity
Users need to reserve the required resources upfront
Limited understanding of the performance of the system
Limited knowledge of the characteristics of the workload
[Figure: workload (load over time) with static vs. elastic provisioning, illustrating underprovisioning and overprovisioning]
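The gap between static and elastic provisioning can be illustrated with a toy calculation; the load values and the capacity per host below are made-up numbers, not from the slides.

```python
# Toy comparison of static vs. elastic provisioning for a varying load.
# Load values and per-host capacity are illustrative assumptions.
import math

load = [2, 5, 9, 4, 1]            # required capacity per time step
capacity_per_host = 3

# Static provisioning must cover the peak load at all times.
static_hosts = math.ceil(max(load) / capacity_per_host)
static_cost = static_hosts * len(load)                       # host-hours

# Elastic provisioning adapts the host count to the current load.
elastic_hosts = [math.ceil(l / capacity_per_host) for l in load]
elastic_cost = sum(elastic_hosts)                            # host-hours

# Here elastic provisioning uses 9 host-hours vs. 15 for static.
```

Static provisioning pays for the peak even during low load (overprovisioning); a fixed allocation below the peak would instead underprovision.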
Elastic Data Stream Processing
Long-standing continuous queries over potentially infinite data streams
Small memory footprint (MB – GB) for most use cases, allowing fast scale out
Strict requirements on end-to-end latency
Unpredictable workload with high variability (within seconds to minutes)
Load balancing influences running queries
[Figure: input streams flowing into a data stream processing engine, producing output streams]
Auto-scaling Techniques[1]
Different algorithmic approaches for various domains and use cases.
1) Threshold-based approaches
2) Time series analysis
3) Reinforcement learning
4) Queuing theory
5) Control theory
[1] T. Lorido-Botran et al., "Auto-scaling techniques for elastic applications in cloud environments," Tech. Rep., 2012.
Requirements
Workload independence: Independent of workload characteristics; we make no assumptions about the input workload.
Adaptivity: Adapts online to changing conditions such as different workload characteristics.
Configurability: Easy to set up and configure by an end user.
Computational feasibility: The algorithm has to be computationally feasible for scale out within seconds.
Feasible: Threshold-based approaches, Reinforcement Learning, Control Theory.
Not feasible: Time Series Analysis, Queuing Models.
Threshold-based Approach
Upper threshold:
"If the CPU utilization of a host is larger than x for y seconds, the host is marked as overloaded."
Lower threshold:
"If the CPU utilization of a host is smaller than z for w seconds, the host is marked as underloaded."
Additional parameters: target utilization, grace period
Two variants: local thresholds vs. global thresholds
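The two threshold rules above can be sketched as a small classifier; the parameter names and default values are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of the threshold-based approach described above.
# Thresholds (upper/lower) and window lengths (y_secs/w_secs) are
# illustrative defaults, not values from the slides.

def classify_host(cpu_history, upper=0.8, lower=0.3, y_secs=30, w_secs=30):
    """Classify a host from its per-second CPU utilization samples (newest last)."""
    # Upper threshold: utilization above `upper` for the last y_secs seconds
    if len(cpu_history) >= y_secs and all(u > upper for u in cpu_history[-y_secs:]):
        return "overloaded"   # candidate for scale out
    # Lower threshold: utilization below `lower` for the last w_secs seconds
    if len(cpu_history) >= w_secs and all(u < lower for u in cpu_history[-w_secs:]):
        return "underloaded"  # candidate for scale in
    return "normal"
```

With local thresholds each host is classified on its own utilization as shown here; global thresholds would instead apply the rule to an aggregate over all hosts.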
Reinforcement Learning[1]
A lookup table describes the best action for each state
The best action is adapted by an online learning algorithm
Extension for Elastic Data Stream Processing:
Local policy
Initialisation based on threshold-based policy
Grace Period
[1] R. Das et al., "Model-based and model-free approaches to autonomic resource allocation," IBM Research Report, 2005.
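The lookup-table idea above can be sketched as follows; the state discretization, action set, and update rule are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a lookup-table reinforcement-learning policy: one entry per
# discretized utilization state, adapted online from observed rewards.
import random

class RLScaler:
    ACTIONS = ("scale_out", "scale_in", "no_op")

    def __init__(self, n_states=10, alpha=0.1, epsilon=0.1):
        # q[state][action]: estimated value of taking `action` in `state`
        self.q = {s: {a: 0.0 for a in self.ACTIONS} for s in range(n_states)}
        self.alpha, self.epsilon = alpha, epsilon
        self.n_states = n_states

    def state(self, utilization):
        # Discretize average utilization in [0, 1] into n_states buckets
        return min(int(utilization * self.n_states), self.n_states - 1)

    def choose(self, utilization):
        s = self.state(utilization)
        if random.random() < self.epsilon:           # explore occasionally
            return random.choice(self.ACTIONS)
        return max(self.q[s], key=self.q[s].get)     # exploit best known action

    def learn(self, utilization, action, reward):
        s = self.state(utilization)
        # Move the estimate towards the observed reward (online update)
        self.q[s][action] += self.alpha * (reward - self.q[s][action])
```

The extensions listed above fit this sketch naturally: the table could be seeded from a threshold-based policy instead of zeros, kept per host for a local policy, and updates suppressed during a grace period.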
Elastic Data Stream Processing
[Figure: system architecture — queries and input streams enter a distributed data stream processing engine; operator placement, coordination, and the auto-scaling technique manage elastic CEP engine instances that produce the output streams]

Heinze, Thomas, et al. "Elastic Complex Event Processing under Varying Query Load." BD3, VLDB, 2013.
Integration of Auto-Scaling Techniques

[Figure: control loop — the measured host utilization feeds into the auto-scaling technique, which is parameterized by max./min. host utilization and target utilization and produces a scaling decision]
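The integration described above — measured utilization in, scaling decision out — can be sketched as a simple control loop; the function names and decision strings are assumptions for illustration, not the system's actual API.

```python
# Illustrative control loop for integrating an auto-scaling technique.
# `measure_utilization`, `technique`, and `apply_decision` are assumed
# callables, not the actual interfaces of the system.

def autoscaling_loop(measure_utilization, technique, apply_decision, steps):
    """Feed measured host utilization into the technique; act on its decision."""
    for _ in range(steps):
        utilization = measure_utilization()   # measured host utilization
        decision = technique(utilization)     # "scale_out" | "scale_in" | "no_op"
        if decision != "no_op":
            apply_decision(decision)          # e.g. add/remove a host
```

Any of the techniques — threshold-based, reinforcement learning, or control theory — can be plugged in as `technique` without changing the loop.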
Setup
Based on data taken from Frankfurt Stock Exchange
35 aggregation queries using varying data selection and time ranges
up to 10 processing nodes
Experiment duration: ~1 hour
[Figure: event count (events/s) over time for Day 1 and Day 2 of the workload, ranging from 0 to ~4,500 events/s]
Configurability
Local thresholds are more robust, while global thresholds are highly sensitive.

[Figure: latency (s) and utilization for threshold configurations 0.2-0.8, 0.3-0.8, and 0.3-0.9, comparing global thresholds (left) and local thresholds (right); series: median latency, avg/max/min utilization]
Configurability
Reinforcement learning achieves the best utilization and latency values.

[Figure: latency (s) and utilization for threshold configurations 0.2-0.8, 0.3-0.8, and 0.3-0.9, comparing reinforcement learning (left) and local thresholds (right); series: median latency, avg/max/min utilization]
Different Workloads
Reinforcement learning reduces variability between different workloads.

[Figure: latency (s) and utilization for Day 1, Day 2, and Day 3, comparing reinforcement learning (left) and local thresholds (right); series: median latency, avg/max/min utilization]
Lessons Learned
Elasticity for data stream processing systems poses new challenges for auto-scaling techniques
Global thresholds create many overload situations
Adaptive reinforcement learning is more stable than local thresholds, improves utilization by 5%, and reduces latency by 30%
Need to better handle variations within a workload
Investigate additional parameters like latency and queue length
Study the influence of the placement algorithm