Approximating Sensor Network Queries Using In-Network Summaries

Alexandra Meliou, Carlos Guestrin, Joseph Hellerstein
Approximate Answer Queries

Approximate representation of the world:
• Discrete locations
• Lossy communication
• Noisy measurements

Applications do not expect exact values (tolerance to noise)
Example: Return the temperature at all locations ±1°C, with 95% confidence
Query satisfaction: in expectation, the requested fraction of sensor values lies within the error range
In-network Decisions

Use in-network models to make routing decisions
No centralized planning
In-network Summaries

Spanning tree T(V, E′) + models M_v for all nodes v
M_v represents the whole subtree rooted at v
Model Complexity

Need for compression
Gaussian distributions at the leaves:
• good for modeling individual node measurements
Talk “outline”

Compression · Traversal · Construction · In-network summaries
Collapsing Gaussian Mixtures

Compress an m-size mixture to a k-size mixture.
Simple case (k = 1): minimize KL-divergence? Problem: “fake” mass.
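As an illustrative aside (not spelled out in the talk), the KL-minimizing single Gaussian for a mixture is obtained by moment matching; a minimal sketch:

```python
def collapse_to_single_gaussian(weights, means, variances):
    """Collapse a Gaussian mixture to one Gaussian by moment matching.

    Matching the mixture's mean and variance minimizes
    KL(mixture || single Gaussian) over all single Gaussians.
    """
    total = sum(weights)
    ws = [w / total for w in weights]            # normalize weights
    mu = sum(w * m for w, m in zip(ws, means))   # mixture mean
    # mixture variance = E[X^2] - mu^2
    var = sum(w * (v + m * m) for w, m, v in zip(ws, means, variances)) - mu * mu
    return mu, var

mu, var = collapse_to_single_gaussian([0.5, 0.5], [-1.0, 1.0], [0.25, 0.25])
# -> mu = 0.0, var = 1.25 (between-component spread inflates the variance)
```

The inflated variance is exactly the “fake mass” issue the slide points at: the collapsed Gaussian spreads mass into regions where the mixture has almost none.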
Quality of Compression

Depends on the query workload:
• query with acceptable error window W
• query with acceptable error window W′ < W

Compression keeps the mass inside the interval accurate, but gives no guarantee on the tails.
max_z ∫_{z−w}^{z+w} f(x) dx

∫_{μ−w}^{μ+w} N(μ, σ²) dx = Σ_i ∫_{μ−w}^{μ+w} N_i(μ_i, σ_i²) dx
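The mass-matching condition above can be solved numerically; a minimal sketch, assuming the window is centered at the chosen mean μ and using bisection on σ (interval mass is monotone in σ, so bisection converges):

```python
import math

def norm_cdf(x, mu, var):
    """Gaussian CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))

def interval_mass(mu, var, lo, hi):
    return norm_cdf(hi, mu, var) - norm_cdf(lo, mu, var)

def mass_matching_variance(weights, means, variances, mu, w):
    """Find sigma^2 so that N(mu, sigma^2) puts the same mass in
    [mu - w, mu + w] as the mixture does (bisection on sigma)."""
    target = sum(wt * interval_mass(m, v, mu - w, mu + w)
                 for wt, m, v in zip(weights, means, variances))
    lo, hi = 1e-9, 1e6                 # bracket for sigma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # interval mass shrinks as sigma grows
        if interval_mass(mu, mid * mid, mu - w, mu + w) > target:
            lo = mid                   # too much mass inside: sigma too small
        else:
            hi = mid
    return lo * lo
```

Sanity check: for a "mixture" that is already a single N(0, 1) with w = 1, the recovered variance is 1.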
Talk “outline”

Compression · Traversal · Construction · In-network summaries
Query Satisfaction

A response R = {r1 … rn} satisfies query Q(w, δ) if, in expectation, the values of at least δn nodes lie within [ri − w, ri + w]:

Σ_i ∫_{ri−w}^{ri+w} f_i(x) dx ≥ δn
[Figure: query Q descends the in-network summary; response R = [r1, r2, r3, r4, r5, r6, r7, r8, r9, r10], with values within error bounds]
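The satisfaction condition can be checked directly when each per-node model f_i is a Gaussian; a minimal sketch (the Gaussian per-node model and the numbers are illustrative assumptions):

```python
import math

def norm_cdf(x, mu, var):
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))

def satisfies(response, models, w, delta):
    """Check sum_i P(|X_i - r_i| <= w) >= delta * n, with each node i
    modeled as a Gaussian (mu_i, var_i)."""
    mass = sum(norm_cdf(r + w, mu, var) - norm_cdf(r - w, mu, var)
               for r, (mu, var) in zip(response, models))
    return mass >= delta * len(response)

models = [(20.0, 0.01)] * 10            # tight per-node models
print(satisfies([20.0] * 10, models, w=1.0, delta=0.95))   # True
print(satisfies([25.0] * 10, models, w=1.0, delta=0.95))   # False
```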
Optimal Traversal

Given: tree T = G(V, E) and models M_v
Find: subtree G(V′, E′), E′ ⊆ E, such that

Σ_{leaves v} Mass(M_v, w) ≥ δn

Can be computed with dynamic programming
Response: [μ of the leaf models]
Greedy Traversal

If the local model satisfies

∫_{μ−w}^{μ+w} f(x) dx ≥ δ

return μ; else descend to the child nodes.
More conservative solution: enforces query satisfiability on every subtree instead of the whole tree.
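The greedy descent can be sketched as a recursive check on each node's SGM (the Node class and the Gaussian models are illustrative assumptions, not the paper's implementation):

```python
import math

def norm_cdf(x, mu, var):
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))

class Node:
    """Summary-tree node with a single-Gaussian model (mu, var)."""
    def __init__(self, mu, var, children=()):
        self.mu, self.var, self.children = mu, var, list(children)

def greedy_traverse(node, w, delta):
    """Top-down greedy traversal: answer from this node's model if it
    puts at least delta mass within [mu - w, mu + w]; else recurse."""
    mass = norm_cdf(node.mu + w, node.mu, node.var) - \
           norm_cdf(node.mu - w, node.mu, node.var)
    if mass >= delta or not node.children:
        return [node.mu]            # one value covers this whole subtree
    out = []
    for child in node.children:
        out.extend(greedy_traverse(child, w, delta))
    return out

# Root model too diffuse for the window, so traversal descends to leaves:
root = Node(15.0, 100.0, [Node(10.0, 0.01), Node(20.0, 0.01)])
print(greedy_traverse(root, w=1.0, delta=0.95))   # [10.0, 20.0]
```

Note how the per-node check applies δ to every subtree, which is what makes this more conservative than the optimal (whole-tree) criterion.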
Traversal Evaluation
Talk “outline”

Compression · Traversal · Construction · In-network summaries
Optimal Tree Construction

Given a structure, we know how to build the models. But how do we pick the structure?

Traversal = cut
Theorem: In a fixed-fanout tree, the cost of the traversal is

(F / (F − 1)) · (|C| − 1)

where |C| is the size of the cut and F the fanout.

Intuition: minimize the cut size.
Group nodes into a minimum number of groups that satisfy the query constraints → a clustering problem
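The cost formula is easy to sanity-check: a full F-ary tree with |C| leaves has (|C| − 1)/(F − 1) internal nodes, hence |C| − 1 + (|C| − 1)/(F − 1) = F/(F − 1) · (|C| − 1) edges. A one-line sketch:

```python
def traversal_cost(cut_size, fanout):
    """Traversal cost F/(F-1) * (|C| - 1) for a fixed-fanout tree."""
    return fanout / (fanout - 1) * (cut_size - 1)

print(traversal_cost(4, 2))    # 6.0  (binary tree: twice |C| - 1)
print(traversal_cost(10, 3))   # 13.5
```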
Optimal Clustering

Given a query Q(w, δ), optimal clustering is NP-hard (related to the Group Steiner Tree problem).
A greedy algorithm achieves a log(n) approximation factor: greedily pick the maximum-size cluster.
Issue: it does not enforce connectivity of clusters.

Greedy Clustering

• Include extra nodes to enforce connectivity
• Augment clusters only with accessible nodes (losing the log n guarantee)

Clustering Comparison

Two distributed clustering algorithms are compared to the centralized greedy clustering.
Talk “outline”

Compression · Traversal · Construction · In-network summaries · Enriched models
Enriched Models

Support more complex models (SGM = Single Gaussian Model):
• k-mixtures: compress to a k-size mixture instead of an SGM
• Virtual nodes: every component of the k-size mixture is stored as a separate “virtual node”
• SGMs on multiple windows: maintain additional SGMs for different window sizes

Cost: more space, more expensive model updates
Evaluation of Enriched Models

The SGM is surprisingly effective at representing the underlying data.
Sensitivity analysis
Talk “outline”

Compression · Traversal · Construction · In-network summaries
Tree Construction Parameters and Effect on Performance

• Confidence: performance for workloads with a different confidence than the hierarchy design
• Error window: broader vs. narrower ranges of window sizes; assignment of windows across tree levels
• Temporal changes: how often should the models be updated?

Confidence

For a workload of 0.95 confidence, the design confidence does not have a big impact on performance.

Error Windows

A wide range is not always better, because it forces the traversal of more levels.
Model Updates
Sensitivity analysis
Conclusions

• Analyzed compression schemes for in-network summaries
• Evaluated summary traversal
• Studied optimal hierarchy construction
• Studied increased-complexity models; showed that simple SGMs are sufficient
• Analyzed the effect of various parameters on efficiency
Compression · Traversal · Construction · In-network summaries · Enriched models