Data Quality and Query Cost in Pervasive Sensing Systems

David J. Yates
Bentley College
Computer Information Systems Dept.
Waltham, Massachusetts, USA
[email protected]




Joint Work With …

Erich Nahum
IBM T.J. Watson Research Center
19 Skyline Drive
Hawthorne, New York, USA

James Kurose and Prashant Shenoy
Dept. of Computer Science
University of Massachusetts
Amherst, Massachusetts, USA


Talk Outline

• Data quality and query cost for pervasive sensing systems
• Motivation and introduction
    • Pervasive sensing applications
    • Resource-constrained sensor fields
    • Sensor networks and backbone networks
• Data management techniques to conserve resources
    • Sensor network data server and cache
    • Query cost, data quality, delay, value deviation
    • Cost and quality performance
• Summary and Conclusions


Research Contributions

• Define and quantify data quality and query cost performance in pervasive sensing systems

• Develop policies that approximate sensor field values using cached values for nearby locations

• Prove analytic upper bound on sensor field query rate

• Show cost and quality win-win for pervasive sensing applications for which response time is most important

• Show cost vs. quality tradeoff for sensing applications for which accuracy is most important

• Results are robust with respect to the manner in which the query workload changes


Pervasive Sensing Applications

• Microsensors, on-board processing, wireless interfaces feasible at very small scale – can monitor phenomena “up close”

• Enables spatially and temporally dense monitoring and control

Pervasive sensing will reveal previously unobservable phenomena:

• Data center management
• Manufacturing engineering
• Environmental monitoring
• Natural disaster response

Embedded, energy-constrained (wireless, small form-factor), unattended systems


Sensors Embedded in Infrastructure

• The day after a moderate earthquake jolts the city of San Francisco, building inspectors check on the structural integrity of an office building in the financial district. Sensors embedded in the walls of the building to monitor and record vibration data confirm that the structure is safe to enter. (Intel 2005)


From Sensor Networks to Applications

• Sensor fields (blue), backbone (yellow), monitoring & control applications (red)
• Queries submitted from sensing applications
• Replies received from sensor fields
• Our focus – Data management at data server

[Figure: sensor fields (light, sound, …) connected through a data server / gateway (and cache) and backbone routers & switches to a sensing application. The sensor fields are labeled as embedded, energy-constrained (wireless, small form-factor), unattended systems.]


Data Server Node Without Cache

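[Figure: data server node with a sensor network query queue (queries toward the sensor field) and a gateway reply queue (replies toward applications). In the sensor field, sensors (s) surround query locations l_1 and l_2, which are labeled {t_1} and {t_2}.]

l_i = query location i
t_i = timestamp associated with value sampled in sensor field at location i
s = sensor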


Data Server Node Without Cache

[Figure: the same data server node and sensor field, now with Query_m entering the node and Reply_m leaving it.]

End-to-end delay occurs between Query_m and Reply_m.
Value deviation is between the value in Reply_m and the value at l_i as Reply_m leaves the gateway reply queue.

l_i = query location i
t_i = timestamp associated with value sampled in sensor field at location i
s = sensor
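The two per-reply measurements defined on this slide can be sketched in a few lines of Python. This is only an illustration: the `Reply` record and its field names are hypothetical, and taking the absolute difference as the value deviation is an assumption, since the slide only says the deviation is "between" the two values.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    """Hypothetical record for one Query_m / Reply_m pair."""
    location: int      # query location l_i
    value: float       # value carried in Reply_m
    query_time: float  # time Query_m entered the data server node (seconds)
    reply_time: float  # time Reply_m left the gateway reply queue (seconds)


def end_to_end_delay(reply: Reply) -> float:
    """Delay between Query_m and Reply_m."""
    return reply.reply_time - reply.query_time


def value_deviation(reply: Reply, value_at_location: float) -> float:
    """Gap between the replied value and the value at l_i when Reply_m
    leaves the gateway reply queue (absolute difference is an assumption)."""
    return abs(reply.value - value_at_location)


reply = Reply(location=1, value=410.0, query_time=10.0, reply_time=12.5)
delay = end_to_end_delay(reply)
deviation = value_deviation(reply, value_at_location=425.0)
```

Both quantities feed the data quality measure introduced later in the talk.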


Data Server Node With Cache

[Figure: data server node with a gateway query queue, a cache, a cache update queue, a sensor network query queue, and a gateway reply queue. Arrows are labeled Queries, Hit, Miss or Prefetch, Updates, and Updates or replies. Query_m enters the node and Reply_m leaves it. In the sensor field, sensors (s) surround query locations l_1, l_2, and l_3.]

l_i = query location
e_li = cache entry for query location l_i, with e_li = {l_i, v_i, t_i}
v_i = value in cache associated with location i
t_i = timestamp of value associated with location i
s = sensor

Locations l_1 and l_2 are cached in entries e_l1 and e_l2.

For a cache hit or a miss, end-to-end delay occurs between Query_m and Reply_m. Also, value deviation is between the value in Reply_m and the value at l_i as Reply_m leaves the gateway reply queue.
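A minimal Python sketch of the cache entry e_li = {l_i, v_i, t_i} and a precise lookup is given below. The class and method names are hypothetical, and treating an entry as usable only while its age is at most T (the age parameter introduced on a later slide) is an assumption about how hits and misses are decided.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    location: int     # l_i
    value: float      # v_i
    timestamp: float  # t_i, in seconds


class SensorCache:
    """Hypothetical per-location cache kept at the data server node."""

    def __init__(self, max_age: float) -> None:
        self.max_age = max_age  # age parameter T (may be 0 or float("inf"))
        self.entries: Dict[int, CacheEntry] = {}

    def update(self, location: int, value: float, timestamp: float) -> None:
        """Store the latest value reported by the sensor field."""
        self.entries[location] = CacheEntry(location, value, timestamp)

    def lookup(self, location: int, now: float) -> Optional[CacheEntry]:
        """Return the entry on a hit, or None on a miss (absent or too old)."""
        entry = self.entries.get(location)
        if entry is None or now - entry.timestamp > self.max_age:
            return None
        return entry


cache = SensorCache(max_age=90.0)
cache.update(location=1, value=300.0, timestamp=0.0)
hit = cache.lookup(1, now=60.0)    # entry is 60 s old: hit
stale = cache.lookup(1, now=120.0) # entry is 120 s old: miss
absent = cache.lookup(2, now=0.0)  # location never cached: miss
```

On a miss, the query would be forwarded to the sensor network query queue, and the reply would refresh the entry through `update`.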


Query Cost and Data Quality

Cost to query location l_i is normalized such that

$$\min(\mathrm{Cost}_{l_i}) = 1 \text{ unit}$$

Normalized quality using softmax normalization

$$Q_n = A\,S'_d + (1 - A)\,D'_v$$

$$S'_d = \frac{1}{1 + e^{b}}, \quad \text{where } b = \frac{S_d - \mathrm{mean}(S_d)}{\mathrm{stddev}(S_d)}, \text{ and}$$

$$D'_v = \frac{1}{1 + e^{c}}, \quad \text{where } c = \frac{D_v - \mathrm{mean}(D_v)}{\mathrm{stddev}(D_v)}$$

where $S_d$ is system end-to-end delay, $D_v$ is value divergence, and $A$ is the relative importance of $S_d$ vs. $D_v$.
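The normalized quality Q_n = A S'_d + (1 - A) D'_v can be sketched in Python as follows. Two points are assumptions rather than something the slide states: the sign of the exponent is chosen so that lower delay and lower deviation give higher quality, and the population standard deviation is used for stddev. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def inverted_softmax(x: float, samples: Sequence[float]) -> float:
    """Map x into (0, 1); values below the sample mean score above 0.5."""
    sigma = pstdev(samples)  # population stddev: an assumption
    if sigma == 0.0:
        return 0.5           # all samples equal, so no spread to normalize by
    z = (x - mean(samples)) / sigma
    return 1.0 / (1.0 + math.exp(z))


def normalized_quality(delay: float, deviation: float,
                       delays: Sequence[float],
                       deviations: Sequence[float],
                       a: float) -> float:
    """Q_n = A * S'_d + (1 - A) * D'_v, with A in [0, 1]."""
    s_norm = inverted_softmax(delay, delays)
    d_norm = inverted_softmax(deviation, deviations)
    return a * s_norm + (1.0 - a) * d_norm


delays = [1.0, 2.0, 3.0]         # observed end-to-end delays (seconds)
deviations = [10.0, 20.0, 30.0]  # observed value deviations (lux)
q_mean = normalized_quality(2.0, 20.0, delays, deviations, a=0.9)
q_fast = normalized_quality(1.0, 20.0, delays, deviations, a=0.9)
q_slow = normalized_quality(3.0, 20.0, delays, deviations, a=0.9)
```

A reply with mean delay and mean deviation scores exactly 0.5 for any A, which makes the measure easy to sanity-check.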


Caching and Lookup Policies

• All hits (no queries)
• All misses
• Simple lookup
• Piggyback queries
• Greedy age-based lookup
• Greedy distance-based lookup
• Median-of-3 lookup

All misses, simple lookup, and piggyback queries use precise lookups and queries. The greedy and median-of-3 policies use approximate lookups and queries.

Policies incorporate an age parameter T. T can be 0, finite, or infinite.
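The slides name the approximate policies without defining them, so the Python sketch below is only one plausible reading, not the authors' algorithms. It assumes that greedy distance-based lookup returns the value of the nearest cached location whose age is at most T, and that median-of-3 lookup returns the median value of the three nearest such locations. The `Entry` type, the planar coordinates, and all function names are hypothetical.

```python
import math
from statistics import median
from typing import List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    xy: Tuple[float, float]  # position of the cached location (assumed planar)
    value: float             # cached value v_i
    timestamp: float         # t_i, in seconds


def fresh_by_distance(query_xy: Tuple[float, float], cache: List[Entry],
                      now: float, max_age: float) -> List[Entry]:
    """Entries no older than max_age (T), nearest to the query first."""
    fresh = [e for e in cache if now - e.timestamp <= max_age]
    return sorted(fresh, key=lambda e: math.dist(e.xy, query_xy))


def greedy_distance_lookup(query_xy: Tuple[float, float], cache: List[Entry],
                           now: float, max_age: float) -> Optional[float]:
    """Value of the nearest fresh entry, or None if nothing is fresh."""
    ranked = fresh_by_distance(query_xy, cache, now, max_age)
    return ranked[0].value if ranked else None


def median_of_3_lookup(query_xy: Tuple[float, float], cache: List[Entry],
                       now: float, max_age: float) -> Optional[float]:
    """Median value of the three nearest fresh entries, or None if fewer."""
    ranked = fresh_by_distance(query_xy, cache, now, max_age)
    if len(ranked) < 3:
        return None
    return median(e.value for e in ranked[:3])


cache = [
    Entry((0.0, 0.0), 100.0, 0.0),
    Entry((1.0, 0.0), 400.0, 50.0),
    Entry((0.0, 1.0), 300.0, 10.0),
    Entry((5.0, 5.0), 900.0, 80.0),
]
nearest = greedy_distance_lookup((0.9, 0.0), cache, now=90.0, max_age=90.0)
med3 = median_of_3_lookup((0.9, 0.0), cache, now=90.0, max_age=90.0)
```

In this reading, a `None` result would correspond to a miss that is forwarded to the sensor field.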


Research Contributions

• Defined and quantified data quality and query cost performance in pervasive sensing systems

• Developed policies that approximate sensor field values using cached values for nearby locations

• Prove analytic upper bound on sensor field query rate

• Show cost and quality win-win for pervasive sensing applications for which response time is most important

• Show cost vs. quality tradeoff for sensing applications for which accuracy is most important

• Results are robust with respect to the manner in which the query workload changes


Lab Trace Data

Trace data from multi-sensor motes deployed at Intel Berkeley lab (Deshpande 2004)


Lab Environment and Workload

• 2.3 million readings taken over 35+ days
• Use readings with largest changes in value in our simulator (light measured in lux)
• Changes occur slowly relative to correlated changes (about 1 location every 1.4 seconds)
• But, range of values is large
• Applications determine values for A and T


Bounded Resource Consumption

• N is set of locations in sensor field
• Cache entry for each location used by multiple queries for periods of T seconds (requires blocking behind pending queries)
• Sensor field query rate can be bounded by:

$$\frac{|N|}{T} \text{ queries per second}$$

• Proof: Induction on size of N
• Sensor field transmissions dominate resource consumption
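The |N| / T bound can be checked with a small Python simulation. It is a simplified sketch, not the authors' simulator: it assumes the sensor field answers instantly (so no blocking behind pending queries is needed) and that each forwarded query refreshes a cache entry that then serves that location for T seconds. The function name and the workload are hypothetical.

```python
from typing import Dict, List, Tuple


def count_sensor_field_queries(arrivals: List[Tuple[int, int]],
                               max_age: int) -> int:
    """Count queries forwarded to the sensor field.

    arrivals: (time_in_seconds, location) pairs sorted by time.
    max_age:  age parameter T in seconds.
    """
    last_refresh: Dict[int, int] = {}
    forwarded = 0
    for t, loc in arrivals:
        # Forward only if the location has no entry or its entry has expired.
        if loc not in last_refresh or t - last_refresh[loc] >= max_age:
            last_refresh[loc] = t
            forwarded += 1
    return forwarded


num_locations = 5  # |N|
max_age = 10       # T, in seconds
duration = 1000    # seconds simulated

# One query per second, cycling through the locations.
arrivals = [(t, t % num_locations) for t in range(duration)]
forwarded = count_sensor_field_queries(arrivals, max_age)
observed_rate = forwarded / duration
bound = num_locations / max_age  # |N| / T queries per second
```

With this workload every entry is refreshed as soon as it expires, so the observed rate meets the bound exactly. Sparser workloads stay below it.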


Data Quality Driven by Response Time

Picking a large value of A means delay is more important than value deviation.
Consider normalized quality when A = 0.9.


Cost and Quality Performance when Response Time drives Quality

[Figure: cost (y-axis, 0 to 20) vs. quality (x-axis, 0.1 to 0.9) with trace-driven changes, one series per policy: All hits, All misses, Simple lookup, Greedy age lookup, Greedy dist lookup, Median-of-3 lookup, Piggyback queries.]

A = 0.9, T = 90 sec
Query rate = 0.9 lps
Change rate = 1.4 lps

Approximate greedy lookups outperform other policies.
There is a win-win here!


Delay when Response Time drives Quality

Delay Quality, highly

[Figure: delay (y-axis, 0 to 5) vs. quality (x-axis, 0.1 to 0.9) with trace-driven changes, one series per policy: All hits, All misses, Simple lookup, Greedy age lookup, Greedy dist lookup, Median-of-3 lookup, Piggyback queries.]


Research Contributions

• Defined and quantified data quality and query cost performance in pervasive sensing systems

• Developed policies that approximate sensor field values using cached values for nearby locations

• Proved analytic upper bound on sensor field query rate

• Showed cost and quality win-win for pervasive sensing applications for which response time is most important

• Show cost vs. quality tradeoff for sensing applications for which accuracy is most important

• Results are robust with respect to the manner in which the query workload changes


Data Quality Driven by Accuracy

$$Q_n = A\,S'_d + (1 - A)\,D'_v$$

$$S'_d = \frac{1}{1 + e^{b}}, \quad \text{where } b = \frac{S_d - \mathrm{mean}(S_d)}{\mathrm{stddev}(S_d)}, \text{ and}$$

$$D'_v = \frac{1}{1 + e^{c}}, \quad \text{where } c = \frac{D_v - \mathrm{mean}(D_v)}{\mathrm{stddev}(D_v)}$$

where $S_d$ is system end-to-end delay, $D_v$ is value divergence, and $A$ is the relative importance of $S_d$ vs. $D_v$.

Choosing a small value of A means value deviation is more important to data quality than delay.
For example, consider normalized quality when A = 0.1.
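A short worked example shows how the weight A shifts the normalized quality Q_n = A S'_d + (1 - A) D'_v. The normalized scores below (S'_d = 0.8 for a fast reply, D'_v = 0.3 for an inaccurate one) are made-up illustrative numbers, not results from the talk.

```latex
\begin{align*}
A = 0.9:\quad Q_n &= 0.9 \times 0.8 + 0.1 \times 0.3 = 0.72 + 0.03 = 0.75 \\
A = 0.1:\quad Q_n &= 0.1 \times 0.8 + 0.9 \times 0.3 = 0.08 + 0.27 = 0.35
\end{align*}
```

The same reply therefore rates well when response time drives quality and poorly when accuracy drives quality.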


Cost vs. Quality when Accuracy drives Quality

[Figure: cost (y-axis, 0 to 20) vs. quality (x-axis, 0.3 to 0.7) with trace-driven changes, one series per policy: All hits, All misses, Simple lookup, Greedy age lookup, Greedy dist lookup, Median-of-3 lookup, Piggyback queries.]

A = 0.1, T = 90 sec
Query rate = 0.9 lps
Change rate = 1.4 lps

There is a tradeoff between cost and quality here.


Value Deviation when Accuracy drives Quality

[Figure: value deviation (y-axis, 0 to 400) vs. quality (x-axis, 0.3 to 0.7) with trace-driven changes, one series per policy: All hits, All misses, Simple lookup, Greedy age lookup, Greedy dist lookup, Median-of-3 lookup, Piggyback queries.]

Significant differences in accuracy between policies


Cost and Quality Trends when Response Time drives Quality

[Figure: three panels of cost (y-axis, 0 to 20) vs. quality (x-axis, 0.1 to 0.9) with trace-driven changes, one series per policy: All hits, All misses, Simple lookup, Greedy age lookup, Greedy dist lookup, Median-of-3 lookup, Piggyback queries.]

A = 0.9, T = 9 sec
Query rate = 90, 9, and 0.9 lps

Again, there is a win-win here!


Cost vs. Quality Trends when Accuracy drives Quality

[Figure: three panels of cost (y-axis, 0 to 20) vs. quality (x-axis, 0.3 to 0.7) with trace-driven changes, one series per policy: All hits, All misses, Simple lookup, Greedy age lookup, Greedy dist lookup, Median-of-3 lookup, Piggyback queries.]

A = 0.1, T = 9 sec
Query rate = 90, 9, and 0.9 lps

Same relative performance


Talk Summary

• Define and quantify data quality and query cost performance in pervasive sensing systems

• Develop policies that approximate sensor field values using cached values for nearby locations

• Prove analytic upper bound on sensor field query rate

• Show cost and quality win-win for pervasive sensing applications for which response time is most important

• Show cost vs. quality tradeoff for sensing applications for which accuracy is most important

• Results are robust with respect to the manner in which the query workload changes


Thank You!

• Further questions ???
• …

David J. Yates
Bentley College
Computer Information Systems Dept.
Waltham, Massachusetts, USA
[email protected]