DEXA 2005

Control-based Quality Adaptation in Data Stream Management Systems (DSMS)

Yicheng Tu†, Mohamed Hefeeda‡, Yuni Xia†, Sunil Prabhakar†, and Song Liu¥

†Department of Computer Sciences, Purdue University, USA

‡School of Computing Science, Simon Fraser University at Surrey, Canada

¥ School of Mechanical Engineering, Purdue University, USA

Data Stream Management

• Continuous data, discarded after being processed

• Continuous query
• Data-active query-passive model
• Applications
  – Financial analysis
  – Mobile services
  – Sensor networks
  – Network monitoring
  – More …

[Figure: DSMS with Data and Query inputs, returning Results to Users]

DSMS architecture

• Network of query operators (O1 – O3)

• Each operator has its own queue (q1 – q4)

• Scheduler decides which operator to execute

• Query results (Q1, Q2) pushed to clients

• Example systems:
  – Aurora/Borealis
  – STREAM

Quality-of-Service (QoS) in DSM

• Data processing is QoS-critical in DSMS
  – Tuple delay is the major concern: results generated from old data are useless!
• A highly dynamic environment makes QoS hard to maintain
  – Bursty data input
  – Unpredictable unit processing cost
• Overloading during spikes leads to degraded (delay) QoS
• Solution: adjust the following (i.e., quality adaptation)
  – Sampling rate (source side)
  – Data loss (DSMS side), i.e., load shedding

Load Shedding

• Eliminating excessive load by dropping data items leads to fewer QoS violations
• Basic algorithm (Tatbul et al., 2003): periodically
• CPU is the bottleneck resource
• Key questions
  – When?
  – How much?
  – Where?
  – Which tuples?

What’s missing?

• Current solutions focus on steady-state performance

• Assuming input level changes between stable states

• However, arrivals are bursty in practice – always in transient state

• Taking averages (baseline) wouldn’t work

[Figure: Load over Time, compared against CPU capacity]

Our approach

• View load shedding as a feedback control problem
• Feedback control: manipulation of system behavior by adjusting system input based on system output
  – Cruise control of automobiles, room temperature control, etc.
• The feedback control loop:
  – Plant
  – Monitor
  – Controller
  – Actuator
• How it works
  – Error = measured output − desirable output
  – Focal point: controller, which maps error to control signal
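The loop components above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy first-order plant, the integral-style controller, the gain of 0.5, and the name `run_feedback_loop` are all hypothetical and do not come from the slides. The error follows the sign convention on this slide (measured minus desirable), so the controller lowers the input when the output is too high.

```python
def run_feedback_loop(reference, periods=50, gain=0.5):
    """Toy closed loop; plant dynamics and gain are illustrative assumptions."""
    admitted = 0.0  # control signal (system input) chosen by the controller
    output = 0.0    # plant output as seen by the monitor
    history = []
    for _ in range(periods):
        # Monitor: measure the current plant output.
        measured = output
        # Error = measured output - desirable output.
        error = measured - reference
        # Controller: map the error to a new control signal.
        admitted -= gain * error
        # Actuator + plant: apply the input to a toy first-order system.
        output = 0.5 * output + 0.5 * admitted
        history.append(output)
    return history


trace = run_feedback_loop(reference=10.0)
```

With these assumed dynamics the output settles at the reference value, which is the behavior the controller is meant to enforce.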

Why Feedback Control?

• Maintain system performance under internal/external uncertainties
• Control theory provides tools to choose and tune the controller toward desired performance
  – Current load shedding solution is also feedback-based
  – Difference: we use control theory to guide the controller design
• Steps of problem-solving using control theory
  1. Mapping the problem to a feedback control loop and determining input/output
  2. System identification: modeling the input/output relationship
  3. Controller design: can be done analytically

The feedback control loop

• Plant: current DSMS
  – Input: load admitted
  – Output: delay QoS
  – Reference output: specified by DBA
• Actuator (adaptor): load shedder
  – admission controller
• Monitor: new
• Controller: new
• System dynamics: disturbances
• Discrete control: control period T

System identification

• To build a dynamic model that describes the relationship between input and output
• Most systems can be modeled by the following linear difference equation:

$$O(k) = \sum_{i=1}^{n} a_i\, O(k-i) + \sum_{i=1}^{n} b_i\, I(k-i)$$

  – I(x): input at period x
  – O(x): output at period x
  – n: order of the equation
  – a_i, b_i: system-specific coefficients
• Determine n, a_i, b_i by experiments using synthetic inputs
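As a rough illustration of the identification step, the sketch below fits the first-order case (n = 1), O(k) = a·O(k−1) + b·I(k−1), by ordinary least squares on synthetic inputs. The fitting method, the "true" coefficients 0.6 and 0.25, and the helper names are assumptions made for this example; the slides do not say which estimation procedure was used.

```python
import random


def simulate_first_order(a, b, inputs):
    """Outputs of O(k) = a*O(k-1) + b*I(k-1), starting from O(0) = 0."""
    outputs = [0.0]
    for k in range(1, len(inputs)):
        outputs.append(a * outputs[k - 1] + b * inputs[k - 1])
    return outputs


def fit_first_order(inputs, outputs):
    """Least-squares estimate of (a, b) from the 2x2 normal equations."""
    s_oo = s_oi = s_ii = s_oy = s_iy = 0.0
    for k in range(1, len(outputs)):
        o_prev, i_prev, y = outputs[k - 1], inputs[k - 1], outputs[k]
        s_oo += o_prev * o_prev
        s_oi += o_prev * i_prev
        s_ii += i_prev * i_prev
        s_oy += o_prev * y
        s_iy += i_prev * y
    det = s_oo * s_ii - s_oi * s_oi
    if det == 0:
        raise ValueError("inputs are not rich enough to identify the model")
    a = (s_oy * s_ii - s_iy * s_oi) / det
    b = (s_iy * s_oo - s_oy * s_oi) / det
    return a, b


# Synthetic experiment: drive the (assumed) system with random input levels.
rng = random.Random(0)
synthetic_inputs = [rng.uniform(0.0, 100.0) for _ in range(200)]
observed = simulate_first_order(0.6, 0.25, synthetic_inputs)
a_hat, b_hat = fit_first_order(synthetic_inputs, observed)
```

On noise-free data the estimates recover the coefficients used to generate it; with real measurements the fit would only be approximate, and the order n would be chosen by comparing fits of different orders.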

Controller design

• PI controller:

$$I_d(k) = g\,E(k) + g(1-r)\sum_{i=0}^{k-1} E(i)$$

  – E(k): error
  – g, r: controller coefficients
  – I_d(k): desirable input
• More efficiently:

$$I_d(k) = I_d(k-1) + g\,[E(k) - r\,E(k-1)]$$

• Transfer function of the PI controller:

$$C(z) = \frac{g(z - r)}{z - 1}$$

• For example, a second order system has TF:

$$G(z) = \frac{O(z)}{I(z)} = \frac{b_1 z + b_2}{z^2 - a_1 z - a_2}$$

• Closed-loop TF (CLTF): obtained by closing the loop around C(z) and G(z)
• Determine g and r by pole placement of the CLTF (details skipped)
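The incremental ("more efficient") PI form needs only the previous error and the previous control signal, whereas the running-sum form depends on all past errors. The sketch below implements both and lets them be compared numerically. The gains g = 0.5 and r = 0.8, the error sequence, and the names `PIController` and `pi_sum_form` are illustrative assumptions, not values from the slides; both forms assume zero error and zero signal before period 0.

```python
class PIController:
    """Incremental PI law: I_d(k) = I_d(k-1) + g * (E(k) - r * E(k-1))."""

    def __init__(self, g, r):
        self.g = g
        self.r = r
        self.prev_error = 0.0   # E(k-1), assumed 0 before the first period
        self.prev_signal = 0.0  # I_d(k-1), assumed 0 before the first period

    def update(self, error):
        signal = self.prev_signal + self.g * (error - self.r * self.prev_error)
        self.prev_error = error
        self.prev_signal = signal
        return signal


def pi_sum_form(errors, g, r):
    """Same law with a running sum: g*E(k) + g*(1-r)*sum(E(0..k-1))."""
    signals = []
    past_sum = 0.0  # sum of E(0) .. E(k-1)
    for e in errors:
        signals.append(g * e + g * (1.0 - r) * past_sum)
        past_sum += e
    return signals


errors = [4.0, 3.0, 1.5, -0.5, 0.0, 2.0]  # made-up error sequence
controller = PIController(g=0.5, r=0.8)
incremental = [controller.update(e) for e in errors]
summed = pi_sum_form(errors, g=0.5, r=0.8)
```

The two sequences agree period by period, which is what makes the constant-state incremental update a drop-in replacement for the running sum.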

Actuator (load shedder) design

• I_d(k) is the desirable load (# of data tuples) entering the DSMS during the next control period k
• Let S(k) be the real load during period k; we need to discard S(k) − I_d(k) tuples
• Two implementations of the load shedder:
  – Admit the first I_d(k) tuples during period k
    • Pros: easy to implement, generates a (100%) accurate control signal
    • Cons: skewed toward the early arrivals
  – Sampling-based shedding: each tuple is discarded with probability 1 − p, where p = I_d(k) / S(k)
    • However, S(k) is unknown at the beginning of period k
    • Solution: use S(k−1) to estimate S(k); this does not affect controller performance (see backup slide)
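The two shedder variants described above might look like the following sketch. The function names, the clamping of p to [0, 1], and the choice to admit everything when no previous load is known are assumptions added for the example; the slides only specify the admission rule and the use of S(k−1) as the estimate of S(k).

```python
import random


def admit_first(tuples, desired_load):
    """Variant 1: admit the first I_d(k) tuples of the period, drop the rest."""
    limit = max(0, int(desired_load))
    return tuples[:limit]


def sampling_shedder(tuples, desired_load, previous_load, rng=random):
    """Variant 2: keep each tuple with probability p = I_d(k) / S(k-1)."""
    if previous_load <= 0:
        # No usable estimate of S(k) yet: admit everything (an assumption).
        return list(tuples)
    p = min(1.0, max(0.0, desired_load / previous_load))
    # Each tuple is discarded independently with probability 1 - p.
    return [t for t in tuples if rng.random() < p]


arrivals = list(range(10000))
head = admit_first(arrivals, desired_load=4)
sampled = sampling_shedder(arrivals, desired_load=2500,
                           previous_load=10000, rng=random.Random(1))
```

The first variant hits the target exactly but keeps only early arrivals; the second spreads the drops over the whole period, at the cost of admitting only approximately I_d(k) tuples.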

Determining control period

• Control period T is critical in controller design
• Two primary concerns in setting T
  – Should be short enough to capture the changes of input rate
    • Nyquist-Shannon sampling theorem
    • The shorter the better
  – Output signal (delay) is measured as an average over all data tuples in one control period
    • If T is too short, only a small number of tuples are sampled
    • T cannot be too short, as the output signal may fail to represent the real system status
• We make a tradeoff between the above two factors and set T to one second
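The effect of T on the measured output can be seen with a small monitor sketch that averages tuple delays per control period. The sample format (departure time in seconds, delay in milliseconds), the sample values, and the function name are hypothetical; the point is only that a shorter period leaves fewer tuples behind each average.

```python
from collections import defaultdict


def average_delay_per_period(samples, period):
    """Group (departure_time_s, delay_ms) samples by control period.

    Returns {period_index: (mean_delay_ms, tuple_count)}.
    """
    buckets = defaultdict(list)
    for departure_time, delay in samples:
        buckets[int(departure_time // period)].append(delay)
    return {k: (sum(v) / len(v), len(v)) for k, v in sorted(buckets.items())}


samples = [(0.1, 100.0), (0.6, 300.0), (1.2, 500.0)]  # made-up measurements
per_second = average_delay_per_period(samples, period=1.0)
per_half_second = average_delay_per_period(samples, period=0.5)
```

With T = 1 s the first period averages two tuples; with T = 0.5 s every period contains a single tuple, so each "average" is just one observation and is a noisier picture of the system status.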

Experiments

• We evaluate our control-based solution by simulations

• Set four classes of delays: 500ms – 2000ms

• Operator scheduling policy: Earliest Deadline First
  – Input: CPU utilization
  – Output: deadline miss ratio
• Small query network with 13 operators
• Stream data:
  – Synthetic: Poisson, Pareto
  – Real: TCP traces
• Comparison: static shedding
  – Amount of shedding follows a pre-determined STEPSIZE
  – Similar to TCP rate control

Simulation results: Poisson inputs

Target deadline miss ratio (control goal) is set to zero

[Figure panels: Inputs, Outputs]

Simulation results: bursty inputs

[Figure panels: (a) Pareto, (b) TCP trace]

• Far fewer deadline misses than static shedding
• The same or lower level of data loss (load shed)
• Hard to get an appropriate STEPSIZE in static shedding; not a problem in the control-based approach

Summary

• Load shedding is an important quality adaptation method
• Current solutions focusing on steady-state performance do not work well under bursty inputs
• We propose an approach to guide load shedding in a highly dynamic environment based on feedback control theory
• Initial experimental results by simulation show promising potential of our approach

Verification of model

First order linear model

Simulation: unpredictable unit processing cost

Control-based method learns the real cost

Controller stability after replacing S(k) with S(k-1)

Let I_d'(k) be the input signal that results from using S(k−1) instead of S(k). We have

I_d'(k) = p S(k−1)

and thus

S(k−1) I_d(k) = S(k) I_d'(k).

In the z-domain, we get

I_d(k) = z I_d'(k).

Plugging the above into the CLTF and applying control theory, the controller is still stable.

Ongoing work

• Performed all three steps in a real DSMS: the Borealis system
• We set output to average delay
• System identification gives a first-order model structure
• Control function:

$$I_d(k) = b_0\,E(k) + b_1\,E(k-1) + a\,I_d(k-1)$$

• Controller analysis gives the following set of parameters:

$$b_0 = 0.4,\quad b_1 = 0.31,\quad a = 0.8$$

Ongoing work: results

• Control target: 2000ms
• Comparison:
  – Adaptive: static shedding
  – BASELINE
  – NON-CTRL
• Metrics:
  – Total delay violations
  – Total delayed tuples
  – Max delay
  – Load shed

Ongoing work: results