Time-Decaying Sketches for Sensor Data Aggregation
Graham Cormode
AT&T Labs, Research
Srikanta Tirthapura
Dept. of Electrical and Computer Engineering
Iowa State University
Bojian Xu
Dept. of Electrical and Computer Engineering
Iowa State University
Mean of the Temperatures in the Last 30 Minutes
[Figure: sensors report timestamped temperature readings (e.g., 76F at 11:45, 73F at 11:40, 79F at 11:30, 70F at 11:22); the query asks for the mean of the readings from the last 30 minutes]
Sketch
[Figure: each sensor summarizes its timestamped temperature readings into a sketch]
Sketch Merging
[Figure: sketches from different sensors are merged, and the merged sketch produces the answer]
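General Time Decay
• General decay function: a non-increasing function f of an element's age
• Time-decayed value of element $(v, t, id)$ at time c is $v \cdot f(c - t)$
[Figure: f plotted against age, starting from age 0]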
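To make the decay model concrete, here is a minimal Python sketch of three familiar decay functions (sliding window, exponential, polynomial) and of the decayed value $v \cdot f(c - t)$. The particular functions and their parameter names (`window`, `lam`, `alpha`) are illustrative assumptions, not definitions taken from the talk.

```python
import math
from typing import Callable

# A decay function maps an element's age (c - t >= 0) to a weight.
# The three examples below are all non-increasing in age; they are illustrative choices.
DecayFn = Callable[[float], float]

def sliding_window(window: float) -> DecayFn:
    # Weight 1 while the element is younger than `window`, 0 afterwards.
    return lambda age: 1.0 if age < window else 0.0

def exponential(lam: float) -> DecayFn:
    # Weight exp(-lam * age).
    return lambda age: math.exp(-lam * age)

def polynomial(alpha: float) -> DecayFn:
    # Weight (age + 1) ** (-alpha).
    return lambda age: (age + 1.0) ** (-alpha)

def decayed_value(v: float, t: float, c: float, f: DecayFn) -> float:
    # Time-decayed value at time c of an element with value v and timestamp t.
    return v * f(c - t)

if __name__ == "__main__":
    for name, f in [("window", sliding_window(30)),
                    ("exponential", exponential(0.1)),
                    ("polynomial", polynomial(1.0))]:
        print(name, decayed_value(76.0, 0.0, 10.0, f))
```

Passing the decay function as a plain callable keeps the model "user defined": any non-increasing function of age can be plugged in.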
Formal Model of the Data (on One Sensor)
• Data stream: $e_0 = (v_0, t_0, id_0)$, $e_1 = (v_1, t_1, id_1)$, …
– v: value
– t: timestamp of creation
– id: a unique id of the observation
• User-defined time decay function f
• Asynchronous arrival: it is possible that $t_i > t_j$ while $i < j$
• Duplicates: $id_i = id_j$ is possible
– Assume: if $id_i = id_j$, then $v_i = v_j$ and $t_i = t_j$
Contribution
First mergeable sketch that combines all of the following:
• Logarithmic space in the universe size
• Guaranteed accuracy
• Any time-decay model
• Asynchronous arrival
• Duplicate insensitivity
Aggregates supported: Sum, Quantile, Frequent elements
Enables data aggregation under any multi-path routing protocol
Related Work
Properties compared: any time-decay model, asynchronous arrival, duplicate insensitivity, and support for Sum, Quantile and Frequent Elements
• Works 1–4: each provides two of these six
• Our work: provides all six
1. S. Nath, P. B. Gibbons, S. Seshan and Z. R. Anderson, “Synopsis diffusion for robust aggregation in sensor networks”, SenSys 2004
2. J. Considine, F. Li, G. Kollios and J. Byers, “Approximate Aggregation Techniques for Sensor Databases”, ICDE 2004
3. E. Cohen and M. Strauss, “Maintaining time-decaying stream aggregates”, PODS 2003; Journal of Algorithms 2006
4. S. Tirthapura, B. Xu and C. Busch, “Sketching Asynchronous Streams Over Sliding Windows”, PODC 2006
Outline
• Problem: time-decayed sum of distinct elements over an asynchronous stream
• Focus on the integral decay model: the time-decayed value of each element is always an integer
Estimate of the Sum (on One Sensor)
• Given:
– Stream: $R = (v_0, t_0, id_0), \ldots, (v_n, t_n, id_n), \ldots$
– User-defined decay function: f
• Maintain:
– c: current time
– D: set of distinct elements in R
Estimate of the Sum (cont’d)
• Linear-space lower bound for duplicate-insensitive sum (Alon, Matias and Szegedy, STOC 1996), for:
– any deterministic approximation algorithm
– any randomized algorithm that returns the exact result
• Goal: continuously maintain an $(\epsilon, \delta)$-estimate of the time-decayed sum $V = \sum_{(v, t, id) \in D} v \cdot f(c - t)$
– User inputs: $\epsilon, \delta$
– D: set of distinct elements in R
An $(\epsilon, \delta)$-estimate for X is a random variable Y such that $\Pr[|Y - X| > \epsilon X] < \delta$.
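For reference, the target quantity can be computed exactly by storing every distinct element, which takes space linear in the number of distinct elements, the cost the lower bound says a deterministic approximation cannot avoid. A minimal Python sketch, assuming elements arrive as `(v, t, id)` records; the `Element` type and `exact_decayed_sum` name are hypothetical.

```python
from typing import Callable, Dict, Iterable, NamedTuple

class Element(NamedTuple):
    v: int    # value
    t: int    # timestamp of creation
    id: int   # unique observation id (duplicates share the id)

def exact_decayed_sum(stream: Iterable[Element], c: int,
                      f: Callable[[int], float]) -> float:
    # Keep one copy per id. Arrival order is irrelevant, so asynchronous
    # arrival (t_i > t_j with i < j) needs no special handling.
    distinct: Dict[int, Element] = {}
    for e in stream:
        distinct.setdefault(e.id, e)  # a duplicate id carries the same (v, t)
    # V = sum over distinct elements of v * f(c - t)
    return sum(e.v * f(c - e.t) for e in distinct.values())

if __name__ == "__main__":
    # Element values from the talk's worked example; the last one duplicates (32, 17, 9).
    stream = [Element(22, 16, 6), Element(32, 17, 9), Element(7, 16, 11),
              Element(21, 18, 8), Element(32, 17, 9)]
    print(exact_decayed_sum(stream, 20, lambda age: 1.0))
    print(exact_decayed_sum(stream, 20, lambda age: 1.0 if age < 4 else 0.0))
```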
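Algorithm for Sum (High Level Picture)
• Treat the sum as a count of integers: e.g., $v_1 = 4$ and $v_2 = 8$ together contribute $4 + 8 = 12$ integers
• Random sampling: select each integer with probability p (SampleRate = p)
• Count the number of selected integers
• Multiply the count by 1/p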
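The high-level picture can be sketched in a few lines of Python. This minimal illustration uses independent coin flips from Python's `random` module; the algorithm in the talk drives selection with a hash function instead, and the function name here is hypothetical.

```python
import random
from typing import List

def estimate_sum_by_sampling(values: List[int], p: float, seed: int = 0) -> float:
    # View the sum as a count: value v contributes v unit integers.
    # Select each integer with probability p, count the selected ones,
    # and scale the count by 1/p, which gives an unbiased estimate of the sum.
    rng = random.Random(seed)
    selected = 0
    for v in values:
        for _ in range(v):
            if rng.random() < p:
                selected += 1
    return selected / p

if __name__ == "__main__":
    print(estimate_sum_by_sampling([4, 8], 1.0))  # 12.0: with p = 1 every integer is kept
    print(estimate_sum_by_sampling([4, 8], 0.5))
```

Enumerating every integer costs time proportional to the values themselves, which is why a faster way to count selected integers matters.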
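Duplicate Detection
• Random sampling through a hash function: whether integer x is selected depends only on x
• Copy 1 and Copy 2 of the same element cover the same integers, so they make exactly the same selections and a duplicate adds nothing to the sample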
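A minimal Python sketch of hash-based selection, showing that a duplicate leaves the estimate unchanged. Two details are assumptions for illustration only: the element with a given id owns the block of integers starting at `id * V_MAX` (a hypothetical encoding with a hypothetical value bound `V_MAX`), and SHA-256 stands in for whatever hash family the talk uses.

```python
import hashlib
from typing import List, Set, Tuple

V_MAX = 1000  # hypothetical upper bound on values; each id gets its own block of integers

def hash_unit(x: int, seed: int = 42) -> float:
    # Deterministic pseudo-random number in [0, 1) that depends only on (seed, x).
    digest = hashlib.sha256(f"{seed}:{x}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def selected_integers(v: int, ident: int, p: float) -> Set[int]:
    # Hypothetical encoding: the element owns integers ident*V_MAX .. ident*V_MAX + v - 1.
    base = ident * V_MAX
    return {x for x in range(base, base + v) if hash_unit(x) < p}

def estimate_distinct_sum(stream: List[Tuple[int, int, int]], p: float) -> float:
    # Union of selected integers: every copy of an element selects the same set.
    sample: Set[int] = set()
    for v, _t, ident in stream:
        sample |= selected_integers(v, ident, p)
    return len(sample) / p

if __name__ == "__main__":
    stream = [(22, 16, 6), (32, 17, 9), (7, 16, 11), (21, 18, 8)]
    print(estimate_distinct_sum(stream, 0.5))
    print(estimate_distinct_sum(stream + [(32, 17, 9)], 0.5))  # same value: duplicate ignored
```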
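Intuition - I
• Sample the integers of each element (v, t, id) at some sample rate
• By Chebyshev's inequality, for an ε-approximation of the count with constant probability it suffices that the sample contain at least $c/\epsilon^2$ selected integers in expectation, for a suitable constant c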
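The Chebyshev step can be spelled out as follows. Assume each of the X integers is selected with probability p, with selections at least pairwise independent (the variance bound below holds under pairwise independence), and let S be the number of selected integers, so that S/p is the estimate of X.

```latex
\begin{align*}
\mathbb{E}[S] &= pX, \qquad \operatorname{Var}[S] = Xp(1-p) \le pX,\\
\Pr\!\left[\left|\tfrac{S}{p} - X\right| > \epsilon X\right]
  &= \Pr\!\left[\,|S - pX| > \epsilon p X\,\right]
  \le \frac{\operatorname{Var}[S]}{(\epsilon p X)^2}
  \le \frac{1}{\epsilon^2 \, p X}.
\end{align*}
```

So if the expected sample size pX is at least $4/\epsilon^2$, the estimate lies within $(1 \pm \epsilon)X$ with probability at least 3/4.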
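Intuition - II
• The decayed value of an element at time t differs from its decayed value at a later time
• Sample rate? A rate suited to the decayed sum at one time may not suit it at another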
Maintain Multiple Samples
• Sample j uses SampleRate $p_j = 2^{-j}$: $p_0 = 1$, $p_1 = 1/2$, $p_2 = 1/4$, …
• Size of each sample?
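One way to realize samples at rates $p_j = 2^{-j}$ is to hash each integer once to a number in [0, 1) and place it in every level j whose rate exceeds that number, which makes the samples nested. The nesting and the SHA-256-based hash are assumptions made for illustration, not details stated in the talk; the function names are hypothetical.

```python
import hashlib
from typing import Dict, List

def hash_unit(x: int, seed: int = 7) -> float:
    # Deterministic pseudo-random number in [0, 1) that depends only on (seed, x).
    digest = hashlib.sha256(f"{seed}:{x}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def level_samples(integers: List[int], num_levels: int) -> Dict[int, List[int]]:
    # Integer x is in the level-j sample iff hash_unit(x) < 2**-j,
    # so level j+1 is always a subset of level j.
    samples: Dict[int, List[int]] = {j: [] for j in range(num_levels)}
    for x in integers:
        u = hash_unit(x)
        for j in range(num_levels):
            if u < 2.0 ** -j:
                samples[j].append(x)
            else:
                break  # u >= 2**-j implies u >= 2**-(j+1): no higher level selects x
    return samples

def estimates(samples: Dict[int, List[int]]) -> Dict[int, float]:
    # Each level gives its own estimate: sample size divided by p_j = 2**-j.
    return {j: len(s) * 2.0 ** j for j, s in samples.items()}

if __name__ == "__main__":
    print(estimates(level_samples(list(range(1000)), 5)))
```

Lower levels give more accurate estimates but hold more integers; higher levels stay small but are noisier.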
Faster Sampling
• RangeSample (Pavan & Tirthapura, SICOMP 2007)
– Efficiently computes the number of selected integers in a range at SampleRate $p_j$, without enumerating the range
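Expiry Time
• Element e = (v, t, id): as time passes, its decayed value can only shrink, so fewer of its integers remain selected at a given level
• Expiry time of e at a level: the time at which e no longer has any selected integer at that level
• Computed by binary search over [t, tmax] using RangeSample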
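A minimal Python sketch of the expiry-time binary search. Several pieces are assumptions for illustration: a brute-force `count_selected` stands in for RangeSample, the element's decayed value at time c is represented by the first `decayed(v, c - t)` integers of its block (a hypothetical encoding), and `halving` (floor of v / 2^age) is just one example of an integral decay.

```python
import hashlib
from typing import Callable

V_MAX = 1000  # hypothetical bound on values; element `ident` owns [ident*V_MAX, ident*V_MAX + v)

def hash_unit(x: int, seed: int = 3) -> float:
    digest = hashlib.sha256(f"{seed}:{x}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def count_selected(lo: int, hi: int, p: float) -> int:
    # Selected integers in [lo, hi). Brute-force stand-in for RangeSample,
    # which obtains this count without scanning the range.
    return sum(1 for x in range(lo, hi) if hash_unit(x) < p)

def expiry_time(v: int, t: int, ident: int, level: int, t_max: int,
                decayed: Callable[[int, int], int]) -> int:
    # decayed(v, age) is an integer that never increases with age, so the
    # selected count never increases with time and binary search applies.
    p = 2.0 ** -level
    base = ident * V_MAX

    def alive(c: int) -> bool:
        return count_selected(base, base + decayed(v, c - t), p) > 0

    # Earliest c in [t, t_max] with no selected integer (t_max + 1 if none).
    lo, hi = t, t_max + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if alive(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo

def halving(v: int, age: int) -> int:
    # Example integral decay: floor(v / 2**age).
    return v >> age

if __name__ == "__main__":
    print([expiry_time(22, 16, 6, level, 100, halving) for level in range(3)])
```

At level 0 every integer is selected, so element (22, 16, 6) expires at the first age where the halved value reaches 0, i.e., time 21.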
Sketch Structure
• One sample per level: Level 0, 1, 2, 3, … with sample rates p = 1, 1/2, 1/4, 1/8, …
• Sample j stores elements together with their expiry times at level j
• $t_j$: largest expiry time of all the elements discarded from sample j
Example: Updating the Sketch (each sample holds at most 3 elements; sample rates p = 1, 1/2, 1/4 at Levels 0, 1, 2)
Current time | Element | (v, t, id) | Expiry0 | Expiry1 | Expiry2
17 | e1 | (22, 16, 6) | 22 | 19 | 17
18 | e2 | (32, 17, 9) | 23 | 21 | 18
18 | e3 | (7, 16, 11) | 21 | 16 | 16
20 | e4 | (21, 18, 8) | 23 | 21 | 20
20 | e5 | (32, 17, 9) | 23 | 21 | 18
• Time 17: e1 arrives; Level 0 holds (e1, 22) and Level 1 holds (e1, 19)
• Time 18: e2 and e3 arrive; Level 0 holds (e3, 21), (e2, 23), (e1, 22) and Level 1 holds (e2, 21), (e1, 19)
• Time 20: e4 arrives; Level 0 is full, so discard the element with the smallest expiry time, e3, giving $t_0 = 21$; Level 0 holds (e4, 23), (e2, 23), (e1, 22) and Level 1 holds (e4, 21), (e2, 21), (e1, 19)
• Time 20: e5 arrives; it is a duplicate of e2 (same id 9), so the sketch is unchanged
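The example can be replayed with a small Python model of the sketch levels. It takes the expiry times directly from the example rather than computing them, and it assumes two rules inferred from the example: an element is not stored at a level whose expiry time is not later than the current time, and an element whose id is already stored is ignored. The `Level` class and `replay` function are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

class Level:
    """One sample: at most `capacity` distinct elements with the largest expiry
    times, plus t_j, the largest expiry time discarded so far."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: Dict[int, Tuple[str, int]] = {}  # id -> (name, expiry)
        self.threshold: Optional[int] = None         # t_j; None until a discard happens

    def insert(self, name: str, ident: int, expiry: int, now: int) -> None:
        if ident in self.items:   # duplicate id: nothing new (inferred rule)
            return
        if expiry <= now:         # no selected integer left at this level (inferred rule)
            return
        self.items[ident] = (name, expiry)
        if len(self.items) > self.capacity:
            # Discard the element with the smallest expiry time and record it in t_j.
            victim = min(self.items, key=lambda k: self.items[k][1])
            _, dropped = self.items.pop(victim)
            self.threshold = dropped if self.threshold is None else max(self.threshold, dropped)

    def contents(self) -> List[Tuple[str, int]]:
        return sorted(self.items.values(), key=lambda item: (-item[1], item[0]))

# (name, id, current time, expiry times at Levels 0, 1, 2), copied from the example
EVENTS = [
    ("e1", 6, 17, (22, 19, 17)),
    ("e2", 9, 18, (23, 21, 18)),
    ("e3", 11, 18, (21, 16, 16)),
    ("e4", 8, 20, (23, 21, 20)),
    ("e5", 9, 20, (23, 21, 18)),  # duplicate of e2 (same id 9)
]

def replay(capacity: int = 3) -> List[Level]:
    levels = [Level(capacity) for _ in range(3)]
    for name, ident, now, expiries in EVENTS:
        for level, expiry in zip(levels, expiries):
            level.insert(name, ident, expiry, now)
    return levels

if __name__ == "__main__":
    for j, level in enumerate(replay()):
        print(j, level.contents(), level.threshold)
```

Under these rules Level 2 stays empty, because each element's Level 2 expiry time is no later than its arrival time.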
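Answer a Query for the Decayed Sum
• Current time = 20
• Level 0 cannot be used: $t_0 = 21$, so an element discarded from Level 0 may still contribute at time 20
• Level 1 is the level used to answer the query: it holds (e4, 21), (e2, 21), (e1, 19); e1 has expired, so e2 and e4 are the elements used
• Estimate: number of selected integers of e2 and e4 at Level 1, multiplied by $1/p_1 = 2$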
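The query step can be sketched in Python on the sketch state from the example. The level-selection rule (use the lowest level with $t_j$ no later than the current time) is inferred from the example rather than stated in the talk, and the per-element counts of selected integers, which RangeSample would supply, are hypothetical placeholder inputs.

```python
from typing import Dict, List, Optional, Tuple

# Sketch state after the example: level -> [(element, expiry)], thresholds t_j
# (None: nothing discarded yet). Sample rate at level j is 2**-j.
LEVELS: List[List[Tuple[str, int]]] = [
    [("e4", 23), ("e2", 23), ("e1", 22)],
    [("e4", 21), ("e2", 21), ("e1", 19)],
    [],
]
THRESHOLDS: List[Optional[int]] = [21, None, None]

def choose_level(now: int) -> Optional[int]:
    # Lowest level whose discarded elements have all expired (t_j <= now), so its
    # sample still holds every element that has selected integers at that level.
    for j, t_j in enumerate(THRESHOLDS):
        if t_j is None or t_j <= now:
            return j
    return None

def answer_query(now: int, selected_count: Dict[str, int]) -> Tuple[int, List[str], float]:
    j = choose_level(now)
    if j is None:
        raise ValueError("no level can answer the query")
    alive = [name for name, expiry in LEVELS[j] if expiry > now]
    # Estimate = selected integers of the alive elements at level j, divided by p_j.
    total = sum(selected_count[name] for name in alive)
    return j, alive, total * 2.0 ** j

if __name__ == "__main__":
    # Hypothetical counts of selected integers at time 20 (RangeSample would supply these).
    print(answer_query(20, {"e1": 0, "e2": 5, "e4": 3}))
```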
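Over the Whole Sensor Network
• Each sample keeps the 3 distinct items with the largest expiry times
• Merging: take the union of corresponding samples, level by level, and again keep the 3 distinct items with the largest expiry times
Sketch 1: (e3, 13), (e2, 9), (e1, 6)
Sketch 2: (e3, 13), (e5, 10), (e4, 6)
Result of merging sketches 1 & 2: (e3, 13), (e5, 10), (e2, 9)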
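Merging one level of two sketches can be sketched in Python as a union followed by a top-3 selection. The function name and the extra return value (the largest expiry time discarded during the merge, which by the definition of $t_j$ would be folded into the merged threshold together with both inputs' $t_j$) are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

Sample = List[Tuple[str, int]]  # (element, expiry time)

def merge_samples(a: Sample, b: Sample, capacity: int = 3) -> Tuple[Sample, Optional[int]]:
    # Union of two samples from the same level: an element seen by both sensors
    # appears once. Keep the `capacity` items with the largest expiry times and
    # report the largest expiry time discarded by the merge (None if none).
    union: Dict[str, int] = {}
    for name, expiry in a + b:
        union[name] = max(expiry, union.get(name, expiry))
    ranked = sorted(union.items(), key=lambda item: (-item[1], item[0]))
    kept, dropped = ranked[:capacity], ranked[capacity:]
    discarded = max((e for _, e in dropped), default=None)
    return kept, discarded

if __name__ == "__main__":
    sketch1 = [("e3", 13), ("e2", 9), ("e1", 6)]
    sketch2 = [("e3", 13), ("e5", 10), ("e4", 6)]
    print(merge_samples(sketch1, sketch2))
```

Because the union ignores repeated elements, the merge is duplicate insensitive, which is what allows aggregation along any multi-path routing.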
Algorithm Complexity
• Space complexity:
• Time complexity
– Expected time for processing one item
– Time for answering a query
– Time for merging two sketches
Conclusion
First sketch that combines all of the following:
• Logarithmic space in the universe size
• Guaranteed accuracy
• Any time-decay model
• Asynchronous arrival
• Duplicate insensitivity
Aggregates supported: Sum, Quantile, Frequent elements
Enables data aggregation under any multi-path routing protocol
Ongoing and Future Work
• Implementation
– Observed results better than theoretical predictions
• Better duplicate-insensitive sketches for specific decay models?
• Other aggregates, such as variance and clustering?
THANKS