Upload
virgil-dennis
View
220
Download
1
Embed Size (px)
Citation preview
1
Pradeep Kumar Gunda(Thanks to Jigar Doshi and Shivnath Babu for some slides)
TAG: a Tiny Aggregation Service for Ad-Hoc Sensor
NetworksSamuel Madden, Michael J Franklin, Joseph M Hellerstein, Wei Hong
2
TAG - Motivation
Sensor Networks used for monitoring in various fieldsCivil engineers to monitor buildings during earthquakes
Biologists for habitat monitoring
People prefer summary reports not individual values
Aggregation common to all these applications!
Must be a core service and easy to use.
TAG fills this void
3
Before TAG
Centralized approachTransfer everything to base station
No suppression – high energy usage, traffic
Directed DiffusionViewed aggregation as a application-specific operation
Aggregation API in routing layer
No declarative query language like TAG
Not for any generic aggregation operators
4
What is TAG
Tiny Aggregation for Sensor Networks
SQL – like interface eg. Min, Max, Count
Sensitive to constraints of ad-hoc sensor networks
Query inserted into network over an existing routing protocol
Aggregation done along the reverse path
Combines the research in networking community with database community
5
DBMS in a nutshell
Select max(wins), team fromBasketballwins
Where year=‘2002’
Group by tournament
Team Year Wins Tournament
Duke 2002 25 ACC
Duke 2003 35 NCAA
UNC 2002 20 ACC
UNC 2003 25 NCAA
6
DataBase Management System
High-levelHigh-levelQuery QQuery Q
DBMS
Data
Answer
Translates Q intobest execution plan
for current conditions,runs plan
Keeps data safe and correct
despite failures, concurrent
updates, online processing, etc.
7
Data Streams
User/ApplicationUser/Application
Register Register Continuous QueryContinuous Query(Standing Query)(Standing Query)
Stream QueryProcessorInput streams
ResultResult
8
What can DBMS offer for Sensor Networks??
Express aggregation as SQLSpecify what you want. Not how to getUsers need not write low level programming language code!!!
Less bugsDon’t worry about optimization
Techniques from parallel/distributed dbSensor network is a stream of sensor readings to base station
9
Query ModelOne Table : sensorsSELECT AVG(volume), room FROM sensorsWHERE floor = 6GROUP BY roomHAVING AVG(VOLUME) > thresholdEPOCH DURATION 30s
In generalSELECT {agg(expr), attrs} FROM sensorsWHERE {selPreds}GROUP BY {attrs}HAVING{havingPreds}EPOCH DURATION IDifference between TAG & SQL : Continuous Output
10
Aggregate Structure
Standard SQL supports “the basic 5”:MIN, MAX, SUM, AVERAGE, and COUNT
TAG supports any function conforming toInitializer i: Instantiates a record for a single sensor valueMerging function f. Merges two partial state recordsEvaluator e: Computes the actual value of the aggregate from a partial state record
Example - averagei{v} <v,1>f{<S1, C1>, <S2, C2>} < S1 + S2 , C1 + C2>e{<S1, C1>} S1/C1
TAG supports MEDIAN, HISTOGRAM and COUNT DISTINCT also
11
Classifying AggregatesDuplicate Sensitive (yes/no)Exemplary/SummaryMonotonics’ = f(a,b) e(s’) >= MAX(e(s1),e(s2)) OR e(s’) <= MIN(e(s1),e(s2))
Decides whether predicate can be applied in network
Partial StateDistributive (partial state’s size same as final aggregate)Algebraic (partial states are not themselves aggregate)Holistic (No useful partial aggregation)UniqueContent Sensitive
12
Aggregate Taxonomy
Distributive Algebraic Holistic Unique Content-sensitive
Partial statesize
Same asfinalaggregate
Constant Dataproportional
Distinctvalueproportional
Proportionalto somedataproperty
13
Requirements of the Routing Algorithm
Deliver Query requests to all nodes
Route from every node to root
No Duplicates ! (Affects some aggregates like count, avg)
Does it violate the end-to-end principle?
A simple example proposed : tree based routing
14
Tree Based Routing
One rootAny interior node sets sender as parent and sets its level to that of parent + 1RebroadcastsMessage sent by node to its parent eventually reaches rootReselect parent after k silent epochs
Query
P:0, L:1
2
1
5
3
4
6
P:1, L:2
P:1, L:2
P:3, L:3
P:2, L:3
P:4, L:4
15
The TAG Algorithm
2 PhasesDistribution: Queries are pushed down the network.
Parents broadcast queries to their children
Collection: Aggregate values continuously sent from children to parents
Reply from all children required before forwarding an aggregate value
TDMA like partitioningChildren must deliver records during a parent-specified time interval
Parent collects all values (including its own) and sends the aggregate up the tree
16
Flow of partial State
Parent reception interval must be chosen carefullyAll children must be able to reportCannot exceed end of epoch
However we can always make the algorithm pipelined
17
Pipelined Aggregation
122132
111111
123143
54321
1
2 3
4
5
1
2
31
4
Sensor #
Ep
och
#
Epoch 3SELECT COUNT(*) FROM sensors
18
Pipelined Aggregation
123143
122132
111111
123154
54321
1
2 3
4
5
1
2
31
5
Sensor #
Ep
och
#
Epoch 4SELECT COUNT(*) FROM sensors
19
Pipelined Aggregation
123154
123143
122132
111111
123155
54321
1
2 3
4
5
1
2
31
5
Sensor #
Ep
och
#
Epoch 5SELECT COUNT(*) FROM sensors
20
Grouping
Simple aggregation mechanismComplicated by HAVING clauseGroup eviction to solve storage problemEvicted tenant sent to parent
21
Simulation Environment
Java-based simulation & visualization for validating algorithms, collecting data.
Sensors arranged on a grid, radio connectivity by Euclidian distance
Communication modelLossless: All neighbors hear all messages
Symmetric linksNo collisions, hidden terminals, etc.
RealisticNumber of hops related to distance but not proportional
22
Simulation Results
2500 nodes, d=50TAG outperforms centralized approach by an order of magnitude in most cases
Does equally well in the worst caseActual benefit depends on the topology
23
Optimizations
SnoopingOverhear packets – can initiate aggregation if missed
Can also be used for suppression!
Hypothesis testing Guess the value of aggregate & suppress
Send only if current val > MAX
Can be applied to a variety of aggregates such as MAX
24
Experiment: Hypothesis Testing
Uniform Value Distribution, MAX Query
Messages/ Epoch vs. Network Diameter
0
500
1000
1500
2000
2500
3000
10 20 30 40 50
Network Diameter
Messages / Epoch
No GuessGuess = 50
Guess = 90Snooping
25
TAG Loss Tolerance
Maintain List of the link signal quality to the neighbors and if better shift parent
Pick a new parent if no hello for a .You can pick a node below you in the tree so child may have to reselect their parent
26
Experiment: Effects of Loss
Percent Error From Single Loss vs. Network Diameter
0
0.5
1
1.5
2
2.5
3
3.5
10 20 30 40 50
Network Diameter
Percent Error From Single Loss
AVERAGECOUNT
MAXMEDIAN
27
Experiment: Benefit of Cache
Percentage of Network I nvolved vs. Network Diameter
0
0.2
0.4
0.6
0.8
1
1.2
10 20 30 40 50
Network Diameter
% Network
No Cache
5 Rounds Cache9 Rounds Cache
15 Rounds Cache
28
Summary
TAG is based on a declarative interfaceMakes network tasking easier for the end user
TAG outperforms centralized approaches in most cases
Relies heavily on underlying routing layerPlacement of query tree constrained by characteristics of routing tree
29
Critique
Fault Tolerance – too simplisticHow high is the failed node on the tree
MobilityMultiple Queries ?Generic?Multiple sinks / Sink mobility?More hypothesis testing - more problems!Compression + aggregation ?Energy budgeting ?