DEXA 2005
Control-based Quality Adaptation in Data Stream
Management Systems (DSMS)
Yicheng Tu†, Mohamed Hefeeda‡, Yuni Xia†, Sunil Prabhakar†, and Song Liu¥
†Department of Computer Sciences, Purdue University, USA
‡School of Computing Science, Simon Fraser University at Surrey, Canada
¥ School of Mechanical Engineering, Purdue University, USA
Data Stream Management
• Continuous data, discarded after being processed
• Continuous query
• Data-active query-passive model
• Applications
– Financial analysis
– Mobile services
– Sensor networks
– Network monitoring
– More …
[Figure: DSMS overview; labels: User, Data, Query, DSMS, Results]
DSMS architecture
• Network of query operators (O1 – O3)
• Each operator has its own queue (q1 – q4)
• Scheduler decides which operator to execute
• Query results (Q1, Q2) pushed to clients
• Example systems:
– Aurora/Borealis
– STREAM
Quality-of-Service (QoS) in DSM
• Data processing is QoS-critical in DSMS
– Tuple delay is the major concern: results generated from old data are useless!
• A highly dynamic environment makes QoS hard to maintain
– Bursty data input
– Unpredictable unit processing cost
• Overloading during spikes → degraded (delay) QoS
• Solution: adjust the following (i.e., quality adaptation)
– Sampling rate (source side)
– Data loss (DSMS side) → load shedding
Load Shedding
• Eliminating excess load by dropping data items → fewer QoS violations
• Basic algorithm (Tatbul et al., 2003): runs periodically
• CPU is the bottleneck resource
• Key questions
– When?
– How much?
– Where?
– Which tuples?
What’s missing?
• Current solutions focus on steady-state performance
• Assuming input level changes between stable states
• However, arrivals are bursty in practice – always in transient state
• Taking averages (baseline) wouldn’t work
[Figure: load over time, plotted against CPU capacity]
Our approach
• View load shedding as a feedback control problem
• Feedback control: manipulation of system behavior by adjusting system input based on system output
– Cruise control of automobiles, room temperature control, etc.
• The feedback control loop:
– Plant
– Monitor
– Controller
– Actuator
• How it works
– Error = measured output - desired output
– Focal point: the controller, which maps the error to a control signal
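The loop described above can be sketched in a few lines of Python. This is a toy illustration only, not the authors' system: the plant is a hypothetical single queue drained at a fixed capacity, queue backlog stands in for the delay measurement, and the controller is a plain proportional rule. The names `run_loop`, `capacity`, `target_backlog`, and `gain` are all assumptions made for this example.

```python
def run_loop(arrivals, capacity, target_backlog, gain=0.5):
    """Simulate a toy feedback loop (hypothetical plant and controller)."""
    backlog = 0.0
    admit_limit = capacity  # initial control signal: admit up to capacity
    history = []
    for offered in arrivals:
        # Actuator: admit at most admit_limit tuples and shed the rest.
        admitted = min(offered, max(admit_limit, 0.0))
        # Plant: the queue grows by the admitted load and drains by capacity.
        backlog = max(backlog + admitted - capacity, 0.0)
        # Monitor: error follows the slide's convention (measured - desired).
        error = backlog - target_backlog
        # Controller: a positive error (too much backlog) lowers the next limit.
        admit_limit = capacity - gain * error
        history.append((offered, admitted, backlog))
    return history


if __name__ == "__main__":
    # Sustained overload: 150 tuples offered per period, capacity 100.
    for offered, admitted, backlog in run_loop([150.0] * 10, 100.0, 20.0):
        print(offered, admitted, round(backlog, 3))
```

In this toy setup the backlog settles near the target instead of growing without bound, which is the behavior the feedback loop is meant to provide. A real controller would be designed from an identified model rather than a hand-picked gain.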
Why Feedback Control?
• Maintain system performance under internal/external uncertainties
• Control theory provides tools to choose and tune the controller toward desired performance
– The current load shedding solution is also feedback-based
– Difference: we use control theory to guide the controller design
• Steps of problem-solving using control theory
1. Map the problem to a feedback control loop; determine input/output
2. System identification: model the input/output relationship
3. Controller design: can be done analytically
The feedback control loop
• Plant: current DSMS
– Input: load admitted
– Output: delay QoS
– Reference output: specified by DBA
• Actuator (adaptor): load shedder
– Admission controller
• Monitor: new
• Controller: new
• System dynamics: disturbances
• Discrete control: control period T
System identification
• Build a dynamic model that describes the relationship between input and output
• Most systems can be modeled by the following linear difference equation:
$$O(k) = \sum_{i=1}^{n} a_i\,O(k-i) + \sum_{i=1}^{n} b_i\,I(k-i)$$
– I(x): input at period x
– O(x): output at period x
– n: order of the equation
– a_i, b_i: system-specific coefficients
• Determine n, a_i, b_i by experiments using synthetic inputs
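As a rough illustration of the identification step, the sketch below fits a first-order instance of the difference equation, O(k) = a·O(k-1) + b·I(k-1), by ordinary least squares. The first-order choice, the least-squares method, and the names `simulate_first_order` and `fit_first_order` are assumptions for this example; the slides do not say which fitting procedure was used.

```python
def simulate_first_order(a, b, inputs):
    """Generate outputs of O(k) = a*O(k-1) + b*I(k-1), starting from O(0) = 0."""
    outputs = [0.0]
    for k in range(1, len(inputs)):
        outputs.append(a * outputs[k - 1] + b * inputs[k - 1])
    return outputs


def fit_first_order(inputs, outputs):
    """Least-squares estimate of (a, b) in O(k) = a*O(k-1) + b*I(k-1)."""
    # Accumulate the entries of the 2x2 normal equations.
    s_oo = s_oi = s_ii = s_oy = s_iy = 0.0
    for k in range(1, len(outputs)):
        o_prev, i_prev, y = outputs[k - 1], inputs[k - 1], outputs[k]
        s_oo += o_prev * o_prev
        s_oi += o_prev * i_prev
        s_ii += i_prev * i_prev
        s_oy += o_prev * y
        s_iy += i_prev * y
    det = s_oo * s_ii - s_oi * s_oi
    if abs(det) < 1e-12:
        raise ValueError("inputs are not varied enough to identify a and b")
    # Solve the normal equations with Cramer's rule.
    a = (s_oy * s_ii - s_iy * s_oi) / det
    b = (s_iy * s_oo - s_oy * s_oi) / det
    return a, b


if __name__ == "__main__":
    # Synthetic, deliberately varied input signal (hypothetical values).
    test_inputs = [float((7 * k) % 5) for k in range(50)]
    test_outputs = simulate_first_order(0.6, 0.3, test_inputs)
    print(fit_first_order(test_inputs, test_outputs))
```

On noise-free synthetic data the fit recovers the coefficients used to generate it; on measured data the residual would indicate whether a higher order n is needed.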
Controller design
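• PI controller:
$$I_d(k) = g\,E(k) + r \sum_{i=0}^{k} E(i)$$
– E(k): error
– g, r: controller coefficients
– I_d(k): desired input
• More efficiently, in incremental form (equivalent up to a rescaling of g and r):
$$I_d(k) = I_d(k-1) + g\,[E(k) - r\,E(k-1)]$$
• Transfer function of the PI controller:
$$C(z) = \frac{g\,(z - r)}{z - 1}$$
• For example, a second-order system has TF:
$$G(z) = \frac{O(z)}{I(z)} = \frac{b_1 z + b_2}{z^2 - a_1 z - a_2}$$
• Closed-loop TF (CLTF): obtained by combining C(z) and G(z) around the feedback loop
• Determine g and r by pole placement of the CLTF (details skipped)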
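The two ways of writing the PI law can be compared numerically. The sketch below assumes the sum form reads I_d(k) = g·E(k) + r·ΣE(i) for i = 0..k and the incremental form reads I_d(k) = I_d(k-1) + g·[E(k) - r·E(k-1)]; both readings are assumptions about the slide. Under those readings the forms coincide only after a change of coefficients, which the hypothetical helper `incremental_from_positional` makes explicit.

```python
class PositionalPI:
    """Sum form: I_d(k) = g*E(k) + r*sum_{i=0..k} E(i)."""

    def __init__(self, g, r):
        self.g, self.r = g, r
        self.error_sum = 0.0

    def step(self, error):
        self.error_sum += error
        return self.g * error + self.r * self.error_sum


class IncrementalPI:
    """Incremental form: I_d(k) = I_d(k-1) + g*(E(k) - r*E(k-1))."""

    def __init__(self, g, r):
        self.g, self.r = g, r
        self.prev_output = 0.0
        self.prev_error = 0.0

    def step(self, error):
        output = self.prev_output + self.g * (error - self.r * self.prev_error)
        self.prev_output, self.prev_error = output, error
        return output


def incremental_from_positional(g_pos, r_pos):
    """Map sum-form coefficients to the equivalent incremental-form (g, r)."""
    g_inc = g_pos + r_pos
    return g_inc, g_pos / g_inc


if __name__ == "__main__":
    pos = PositionalPI(0.5, 0.2)
    inc = IncrementalPI(*incremental_from_positional(0.5, 0.2))
    for e in [1.0, -0.5, 2.0, 0.0]:
        print(pos.step(e), inc.step(e))
```

The incremental form needs only the previous output and previous error, so it avoids keeping a running sum; that is the sense in which it is the more efficient of the two.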
Actuator (load shedder) design
• Id(k) is the desired load (number of data tuples) entering the DSMS during the next control period k
• Let S(k) be the actual load during period k; we need to discard S(k) - Id(k) tuples
• Two implementations of the load shedder:
– Admit the first Id(k) tuples during period k
  • Pros: easy to implement; generates a (100%) accurate control signal
  • Cons: skewed toward early arrivals
– Sampling-based shedding: each tuple is discarded with probability 1-p, where p = Id(k) / S(k)
  • However, S(k) is unknown at the beginning of period k
  • Solution: use S(k-1) to estimate S(k); this does not affect controller performance (see backup slide)
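The two shedder variants can be sketched as follows. This is a minimal illustration under stated assumptions: tuples for one control period are held in a Python list, and the function names `shed_first_n` and `shed_by_sampling` as well as the clipping of p to [0, 1] are choices made for this example rather than details from the slides.

```python
import random


def shed_first_n(tuples, desired_load):
    """Admit the first Id(k) tuples of the period and drop the rest."""
    n = max(int(desired_load), 0)
    return tuples[:n]


def shed_by_sampling(tuples, desired_load, previous_load, rng=random):
    """Keep each tuple with probability p = Id(k) / S(k-1).

    S(k-1), the load seen in the previous period, stands in for the
    unknown S(k). p is clipped to [0, 1] (an assumption of this sketch).
    """
    if previous_load <= 0:
        p = 1.0  # no estimate available yet: admit everything
    else:
        p = min(max(desired_load / previous_load, 0.0), 1.0)
    return [t for t in tuples if rng.random() < p]


if __name__ == "__main__":
    period = list(range(1000))
    print(len(shed_first_n(period, 600)))
    print(len(shed_by_sampling(period, 600, 1000, random.Random(42))))
```

The first variant hits the target exactly but always keeps the earliest arrivals; the second spreads the drops across the period at the cost of a control signal that is only correct on average.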
Determining control period
• Control period T is critical in controller design
• Two primary concerns in setting T
– T should be short enough to capture changes in the input rate
  • Nyquist-Shannon sampling theorem
  • The shorter the better
– The output signal (delay) is measured as an average over all data tuples in one control period
  • T too short → small number of sampled tuples
  • T cannot be too short, as the output signal may then fail to represent the real system status
• We trade off the two factors above and set T to one second
Experiments
• We evaluate our control-based solution by simulations
• Set four classes of delays: 500ms – 2000ms
• Operator scheduling policy: Earliest Deadline First
– Input: CPU utilization
– Output: deadline miss ratio
• Small query network with 13 operators
• Stream data:
– Synthetic: Poisson, Pareto
– Real: TCP traces
• Comparison: static shedding
– Amount of shedding follows a pre-determined STEPSIZE
– Similar to TCP rate control
Simulation results: Poisson inputs
Target deadline miss ratio (control goal) is set to zero
[Figure: two panels, Inputs and Outputs]
Simulation results: bursty inputs
[Figure panels: a. Pareto, b. TCP trace]
• Far fewer deadline misses than static shedding
• The same or lower level of data loss (load shed)
• Hard to get an appropriate STEPSIZE in static shedding – not a problem in control-based approach
Summary
• Load shedding is an important quality adaptation method
• Current solutions, which focus on steady-state performance, do not work well under bursty inputs
• We propose an approach to guide load shedding in a highly dynamic environment based on feedback control theory
• Initial simulation results show the promise of our approach
Verification of model
First order linear model
Simulation: unpredictable unit processing cost
Control-based method learns the real cost
Controller stability after replacing S(k) with S(k-1)
Let Id’(k) be the input signal that results from using S(k-1) instead of S(k). We have
Id’(k) = p S(k-1)
and thus
S(k-1) Id(k) = S(k) Id’(k).
In the z-domain, we get
Id(z) = z Id’(z).
Plugging the above into the CLTF gives a modified closed-loop transfer function; according to control theory, the controller is still stable.
Ongoing work
• Performed all three steps in a real DSMS – the Borealis system
• We set output to average delay
• System identification gives a first-order model structure
• Control function:
$$I_d(k) = b_0\,E(k) + b_1\,E(k-1) + a\,I_d(k-1)$$
• Controller analysis gives the following set of parameters:
$$b_0 = 0.4,\quad b_1 = 0.31,\quad a = 0.8$$
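The control function on this slide can be put in transfer-function form with a short derivation. This is a sketch that assumes the control function reads I_d(k) = b_0 E(k) + b_1 E(k-1) + a I_d(k-1); the plus signs between the terms are an assumption, since the operators are not legible in the slide text.

```latex
% Assumed control function (operator signs are an assumption):
%   I_d(k) = b_0 E(k) + b_1 E(k-1) + a I_d(k-1)
% Take the z-transform, using x(k-1) -> z^{-1} X(z):
\begin{align*}
I_d(z) &= b_0 E(z) + b_1 z^{-1} E(z) + a z^{-1} I_d(z) \\
C(z) = \frac{I_d(z)}{E(z)} &= \frac{b_0 + b_1 z^{-1}}{1 - a z^{-1}}
      = \frac{b_0 z + b_1}{z - a}
\end{align*}
```

Under this assumed form the controller has a single pole at z = a, and for a = 1 it reduces to the PI structure g(z - r)/(z - 1) with g = b_0 and r = -b_1/b_0.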
Ongoing work: results
• Control target: 2000ms
• Comparison:
– Adaptive: static shedding
– BASELINE
– NON-CTRL
• Metrics:
– Total delay violations
– Total delayed tuples
– Max delay
– Load shed
Ongoing work: results