Querying the Physical World
------------ Cornell University
Event Detection Services Using Data Service Middleware in Distributed Sensor Networks
------------ University of Virginia
Presented By Gary Zhou @ UVA
CS 862 Presentation
Comparison between these two papers
Query the physical world | Event Detection Service
No avi (absolute validity interval) value for each data item, so not truly real-time. | Each data item has an avi value, so it is truly real-time.
Point of special interest: representing device functions | Point of special interest: providing an event detection service
Concentrates on individual motes | Group-based robust coordination
Provides a database-like abstraction to applications | Provides a database-like abstraction to applications
Outline --- Querying the Physical World
Device Networks & Their Query Processing
  Description of Device Networks
  Three kinds of queries
  Two approaches
Device Database System
  Device & Function
  User representation
  Internal representation
  Queries
Query Processing over Device Database System
  Performance Metrics
  Distributed Query Execution Plans
  Experiments
Discussions
Outline --- Event Detection Service
Motivation
Data services in sensor networks
Data Service Middleware (DSWare)
Pay more attention to Event Detection Service
Experiments and performance
Discussions
Device Networks & Their Query Processing
Description of Device Network
  The widespread deployment of sensors, actuators and mobile devices is transforming the physical world into a computing platform.
  Emerging networking techniques ensure that devices are interconnected and accessible from local- or wide-area networks.
  Using this new computing platform, users interact with portions of the physical world.
Three kinds of Queries
Historical queries
  These are typically aggregate queries over historical data obtained from the device network.
  An example --- For each rainfall sensor in 1800 JPA, display the average level of rainfall for 1999.
Snapshot queries
  These queries concern the device network at a given point in time.
  An example --- Retrieve the current rainfall level for all sensors in 1800 JPA.
Long-running queries
  These queries concern the device network over a time interval.
  An example --- For the next 5 hours, retrieve every 30 seconds the rainfall level for all sensors in 1800 JPA.
Two Approaches
Device database system
  Definition --- A database system that enables distributed query processing over a device network.
The warehousing approach
  Definition --- In this approach, data are extracted from the devices in a predefined way and stored in a centralized database system that is responsible for query processing.
Two Approaches --- warehousing
Advantages of warehousing approach
  It is well suited for aggregate queries over historical data.
Disadvantages of warehousing approach
  It uses valuable resources to transfer large amounts of raw data from devices to the database server.
  It dissociates access to the devices from the query workload.
Two Approaches --- Device database system
Device database system
  Device & Function
  User representation
  Internal representation
  Queries
Device & Function
Device
  Each device is a mini-server that supports a set of functions and can process portions of the queries directly at the device.
  Example: a function that detects an abnormal rainfall level.
Function
  A function either
  a) acquires, stores and processes data, or
  b) triggers an action in the physical world.
Synchronous function
  It returns its result immediately, on demand.
  It is used to monitor continuous phenomena, for example, a function that returns the rainfall level.
Asynchronous function
  It returns its result after an arbitrary period of time.
  It is used to monitor threshold events, for example, a function that detects an abnormal rainfall level.
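The difference between the two kinds of function can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class RainfallSensor, its method names, and the callback mechanism are hypothetical and do not come from the paper. The synchronous function returns a reading as soon as it is called; the asynchronous one only registers interest and delivers its result later, when the threshold is crossed.

```python
from typing import Callable, List, Tuple


class RainfallSensor:
    """Hypothetical device offering one synchronous and one asynchronous function."""

    def __init__(self, level_mm: float = 0.0) -> None:
        self._level_mm = level_mm
        # (threshold, callback) pairs registered by asynchronous calls
        self._watchers: List[Tuple[float, Callable[[float], None]]] = []

    def get_rainfall_level(self) -> float:
        """Synchronous function: returns the current reading immediately."""
        return self._level_mm

    def detect_abnormal_rainfall(
        self, threshold_mm: float, callback: Callable[[float], None]
    ) -> None:
        """Asynchronous function: the result arrives later through the callback."""
        self._watchers.append((threshold_mm, callback))

    def update(self, level_mm: float) -> None:
        """Simulates the physical world changing the reading."""
        self._level_mm = level_mm
        for threshold_mm, callback in self._watchers:
            if level_mm > threshold_mm:
                callback(level_mm)
```

A caller polls get_rainfall_level() whenever it wants a value, whereas detect_abnormal_rainfall() returns nothing at call time and may invoke the callback after an arbitrary delay, or never.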
User representation
Devices are represented as ADTs
Abstract Data Type (ADT) objects
  ADT objects are single attribute values encapsulating a collection of related data. They provide controlled access to the encapsulated data through a well-defined interface.
  An example: RFSensors (Sensor, X, Y) provides Sensor.getRainfallLevel()
Internal representation
Device functions are represented as virtual relations
Virtual relation
  It is a tabular representation of a function. A record in it contains the input arguments and the output argument of the function it is associated with.
  Arguments of device function: a1, ..., aM
  Attributes of virtual relation: Device ADT ID, a1, ..., aM, Output value, Time stamp
Properties of virtual relation
  It is append-only.
  It is naturally partitioned across all devices represented by the same device ADT.
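One way to picture a virtual relation is as a small in-memory structure. The sketch below is an assumption-laden illustration, not the system's actual storage layer: the names VirtualRecord and VirtualRelation are hypothetical. It keeps the attributes listed above (device ID, input arguments, output value, time stamp), only allows appends, and stores one partition per device.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class VirtualRecord:
    """One row: device ID, input arguments a1..aM, output value, time stamp."""
    device_id: str
    args: Tuple
    value: float
    timestamp: float


class VirtualRelation:
    """Append-only relation, partitioned by the device that produced each record."""

    def __init__(self) -> None:
        self._partitions: Dict[str, List[VirtualRecord]] = {}

    def append(self, record: VirtualRecord) -> None:
        # The only mutation offered: there is no update and no delete.
        self._partitions.setdefault(record.device_id, []).append(record)

    def partition(self, device_id: str) -> Tuple[VirtualRecord, ...]:
        """Records held by a single device, in insertion order."""
        return tuple(self._partitions.get(device_id, ()))

    def scan(self) -> List[VirtualRecord]:
        """All records across all partitions."""
        return [r for part in self._partitions.values() for r in part]
```

Returning tuples from partition() keeps callers from modifying stored records, which is one simple way to respect the append-only property.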
Queries
Historical queries and snapshot queries
  They are naturally formulated as declarative queries in SQL.
An example of long-running query
  SELECT R.Sensor.getRainfallLevel()
  FROM RFSensors R
  WHERE R.Sensor.getRainfallLevel() > 50
  AND $every(30)
The function $every(30) specifies that a new record is inserted every 30 seconds into the append-only virtual relation corresponding to the function RFSensor.getRainfallLevel().
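The effect of $every(30) can be imitated with a simulated clock. The following is a rough sketch, not the query processor described in the paper: the function run_long_running_query and its parameters are hypothetical, each sensor is modelled as a plain function of time, and sampling is assumed to start at t = 0. Every period one new reading per sensor is produced, and only readings above the threshold become answer records.

```python
from typing import Callable, Dict, List, Tuple


def run_long_running_query(
    sensors: Dict[str, Callable[[float], float]],
    period_s: int,
    duration_s: int,
    threshold: float,
) -> List[Tuple[int, str, float]]:
    """Samples every sensor once per period and keeps readings above threshold."""
    answers: List[Tuple[int, str, float]] = []
    for step in range(duration_s // period_s):
        t = step * period_s  # simulated time of this sampling round
        for sensor_id, read in sensors.items():
            level = read(t)  # one new virtual record per sensor per period
            if level > threshold:  # WHERE ... > threshold
                answers.append((t, sensor_id, level))
    return answers
```

With period_s=30, duration_s=90 and threshold=50, a sensor whose level is 40 + t yields answers at t = 30 and t = 60, while a sensor stuck at 10 never appears in the result.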
Query Processing over Device Database System
Performance Metrics
Traditional performance metrics
  Throughput --- the average number of queries processed per unit of time.
  Response time --- the time needed by the system to produce all answer records to a query.
New performance metrics
  Resource usage --- the total amount of energy consumed by the devices when executing a query.
  Reaction time --- the interval between the time a function, called on devices, returns the value and the time the corresponding answer is produced on the front-end.
Distributed Query Execution Plans
Query --- Retrieve every 30 seconds the rainfall level if it is greater than 50 mm.
  SELECT VR.value
  FROM VRFSensorsGetRainfallLevel VR, RFSensors R
  WHERE VR.Sensor = R.Sensor AND VR.value > 50
  AND $every(30)
Plan T
  Data extracted from the devices are materialized in the relation VR that is located on the front-end.
  Both R and VR are on the front-end, and the join is executed on the front-end.
  Join relation R and relation VR (using join condition VR.Sensor = R.Sensor AND VR.value > 50).
Plan A
  It is a simple tree where R is joined on the front-end with relation VR partitioned across a set of devices.
  The front-end asks each device to measure the rainfall level and to transfer the resulting virtual records back to the front-end.
  Each virtual record arriving on the front-end is then joined with relation R.
  Disadvantage --- All devices with rainfall sensors transmit data to the front-end, while the query only concerns the sensors that measure a rainfall level greater than 50.
Plan B
  Define a semi-join between R and the partitions of VR located on the devices. The semi-join projects out the joining attribute from R (here the device ID Sensor) and sends it to all devices.
  On the devices, whenever the rainfall level is measured, a virtual record is generated and joined with the portion of relation R sent by the front-end (using joining condition R.Sensor = VR.Sensor AND VR.value > 50).
  If the joining condition is verified, the virtual record is sent back to the front-end to be joined with the complete records from relation R.
Plan C
  It only pushes the selection (VR.value > 50) onto the device. Only records that verify the condition are sent back to the front-end, where they are joined with relation R.
  Compared to Plan B, there is no subset of relation R transmitted to the devices.
Resource usage for sensors located outside a flood area
  With Plan A, data is sent back to the front-end whenever it is generated.
  With Plan B, a semi-join is pushed to the device. The condition on the rainfall level is checked on the device and no data is sent back, because the sensor is outside the flood area.
  Plan B pays the initial cost of transferring a fragment of relation R to the devices. This initial cost is amortized (compared to Plan A) during the lifespan of the long-running query.
  With Plan C, a selection is pushed to the device. The condition on the rainfall level is checked on the device and, again, no data is sent back, because the sensor is outside the flood area.
Resource usage for sensors located inside a flood area
  With all plans, data is always sent back to the front-end.
  The initial cost of Plan B is never amortized here, so curve B rises rapidly over time.
  Question: Why do Plan C and Plan A have nearly identical curves?
  Because the cost of performing a selection is low compared to the cost of sending data.
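The qualitative behavior of the three plans, inside and outside the flood area, can be reproduced with a toy energy model. Everything numeric here is an assumption made for illustration: the Costs fields and their values are hypothetical, not measurements from the paper, and the model charges Plan B a per-reading semi-join cost on top of its one-time transfer of the fragment of R.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Costs:
    """Hypothetical per-device energy costs, in arbitrary units."""
    send: float = 10.0     # transmit one virtual record to the front-end
    select: float = 0.125  # evaluate VR.value > 50 on the device (Plan C)
    join: float = 1.0      # evaluate the semi-join on the device (Plan B)
    ship_r: float = 50.0   # one-time receipt of the fragment of R (Plan B)


def device_energy(
    plan: str,
    readings: Sequence[float],
    threshold: float = 50.0,
    costs: Costs = Costs(),
) -> float:
    """Total energy one device spends on a long-running query under a plan."""
    n = len(readings)
    passing = sum(1 for v in readings if v > threshold)
    if plan == "A":  # every reading is shipped; filtering happens on the front-end
        return n * costs.send
    if plan == "B":  # semi-join on the device, after receiving part of R
        return costs.ship_r + n * costs.join + passing * costs.send
    if plan == "C":  # selection on the device, no part of R is shipped
        return n * costs.select + passing * costs.send
    raise ValueError(f"unknown plan: {plan}")
```

Under these assumed costs, 100 readings outside the flood area give Plan A the highest total and Plan C the lowest, while 100 readings inside the flood area put Plan C only slightly above Plan A and Plan B above both, which matches the ordering described on the slides.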
Conclusion of Plans
  Pushing a selection as in Plan C is optimal. This is intuitive, since the query filters out uninteresting events generated on the devices.
  Pushing the selection allows the device database system to efficiently trade increased processing on the devices for reduced communication.
Discussions
  I love the idea of using virtual relations to represent device functions.
  The complete query semantics over a Device Database are not given here.
  No avi value for each data item, so not truly real-time.
  Individual nodes are not important, and a mote’s sensor may get damaged and report wrong values, so group-based coordination should be introduced.
Event Detection Service
Motivation
Sensor networks are data-centric and real-time based
  -- Abstraction of real-time data semantics needed
Individual nodes in sensor networks are unreliable
  -- Group-based robust coordination needed
Detection of some events relies on more than one type of sensor data
  -- The relationship can help to increase the reliability of data decisions
Data Services in sensor networks
  Queries (location, frequency, duration)
  Data/Event dissemination
  Data Aggregation
  Data-centric Storage/Caching
  Event Detection
  Data Security and Access Authorization
Data Service Middleware (DSWare)
Data Storage
  Map the key to a logical node
  Map a logical node to multiple physical nodes
Caching
  Spread copies along the routing path
Compare?
  Data Storage: static copies, provides reliability
  Caching: variable copies, improves performance
[Figure: Services in Data Service Middleware. Labels: Application, DSWare, Sensor nodes, Database-like abstraction, Subscription, Real-time Scheduling, Event Detection, Group Management, Aggregation, Data Storage, Caching, Authorization.]
Problems with current event detection schemes
An external node collects reports of atomic events and determines whether the compound event occurs.
[Figure: atomic event reports for an explosion are collected by a node that determines the occurrence of compound events.]
  Reduces the opportunity for in-network processing and concentrates unnecessary traffic around the decision node.
  Increases detection delay (unacceptable for some time-critical applications).
Event Detection Service in DSWare
Event: an activity in the environment that is of interest to the application and can be monitored or detected
[Figure: Explosion. Detected in the area: high temperature, light intensity change, acoustic changes.]
Hierarchy of events
  Atomic event:
    detected through a single sensor’s observation
    e.g. high temperature, light intensity change, acoustic change
  Compound event:
    consists of a set of atomic events
    detected based on the detection of the atomic events it consists of
    e.g. Explosion
Event Detection Scheme in DSWare
Confidence
  Every compound event detection report has a confidence value, which indicates the reliability of the report.
  The confidence function is designed based on data semantics:
    Relative importance of different atomic sub-events
    Temporal continuity of events
    Statistical models
    Similarity among adjacent regions
Waiting Time Window
  The time that an aggregation node waits for the arrival of all possible atomic event reports.
  When the time window (TW) expires, a compound event is reported if the confidence value reaches the minimum confidence requirement of this event.
  Avoids waiting endlessly when messages are lost.
  Enables event detection based on the partial information collected.
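The waiting time window logic can be sketched as a single decision made when the window expires. This is a simplified illustration, not DSWare's implementation: the function detect_compound_event, its parameters, and the idea of passing the confidence function in as an argument are all assumptions. Reports that arrive outside the window, or never arrive, simply do not contribute, so the decision is made on partial information.

```python
from typing import Callable, Iterable, Optional, Set, Tuple


def detect_compound_event(
    reports: Iterable[Tuple[float, str]],
    window_start: float,
    window_length: float,
    confidence: Callable[[Set[str]], float],
    min_confidence: float,
) -> Optional[float]:
    """Decides, at window expiry, whether to report the compound event.

    reports holds (arrival_time, atomic_event_kind) pairs. The return value is
    the confidence if it reaches min_confidence, otherwise None.
    """
    window_end = window_start + window_length
    # Only atomic events that arrived inside the window are counted.
    seen = {kind for (t, kind) in reports if window_start <= t < window_end}
    value = confidence(seen)
    return value if value >= min_confidence else None
```

For instance, with a confidence function that returns the fraction of three expected sub-events seen, two timely reports out of three give a confidence of about 0.67, which is enough for a minimum of 0.6 but not for 0.9.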
A Simple Example: Explosion (E)
Sub-events: high temperature (T), special light (L), acoustic changes (A)
[Figure: timelines at the group leader. Atomic event reports T, L and A arrive within a time window and update the confidence f (labels: f=0.3h, f=0.6h, f=0.9h, f=1.2h); one L report is lost, the time window is shifted, and E is reported at f=0.9h and at f=1.2h.]
Confidence function: f = [0.6 * BOOL(T) + 0.3 * BOOL(L) + 0.3 * BOOL(A)] * h
(h: history factor, increases if the explosion event has been detected in the previous waiting time window. Assume 1 ≤ h ≤ 2)
Minimum Confidence: 0.8
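The confidence function above is easy to evaluate by hand, and a few lines of Python make the cases explicit. The function names explosion_confidence and report_explosion are hypothetical; the weights, the range of h, and the minimum confidence of 0.8 are taken from the slide.

```python
MIN_CONFIDENCE = 0.8  # minimum confidence stated on the slide


def explosion_confidence(t: bool, l: bool, a: bool, h: float = 1.0) -> float:
    """f = [0.6*BOOL(T) + 0.3*BOOL(L) + 0.3*BOOL(A)] * h, with 1 <= h <= 2."""
    if not 1.0 <= h <= 2.0:
        raise ValueError("history factor h must lie in [1, 2]")
    return (0.6 * t + 0.3 * l + 0.3 * a) * h


def report_explosion(t: bool, l: bool, a: bool, h: float = 1.0) -> bool:
    """True if the compound event E would be reported for these sub-events."""
    return explosion_confidence(t, l, a, h) >= MIN_CONFIDENCE
```

With h = 1, receiving T and A but losing L gives f = 0.9, so E is reported. Receiving only L and A gives f = 0.6, which is below the minimum unless the history factor is large enough: with h = 2 the same two reports give f = 1.2 and E is reported.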
Some other issues in event detection
Temporal resolution
  -- Some events last much longer than the sensing interval of a sensor, so some applications will probably report a single event repeatedly, which is unnecessary.
Spatial resolution
  -- If the size of a detection group is too small compared to the event, several groups within the event’s coverage might report the same event.
Performance in Reduction of Communication
Base line:
  -- Only one report of an environment property is generated from a group during each sensing interval.
  -- Send all reports to an outside node and the entire analysis will be done there.
DSWare has less communication.
Performance in Differentiating Events and Event-like Factors
  How are repeated reports of an event differentiated from event-like factors?
  How does performance vary with different time window sizes and different minimum confidence values?
Discussions
  The idea of the event detection service is well developed and thoroughly discussed.
  In DSWare, data is replicated in multiple physical nodes that can be mapped to a single logical node, so consistency among these nodes is a key issue. The paper mentions “weak consistency”, but what is the definition of “weak consistency” in a sensor network?
  Since multiple physical nodes are mapped to a single logical node, why is data caching needed? What are the different purposes of introducing both of them?
  It is mentioned that the application can specify the actual scheduling scheme in the sensor network based on its most important concerns. But is it a good idea for the application to do that? It does not seem to be a simple task.
Discussions --- (cont.)
  What is the position of real-time scheduling in the system? How is real-time behavior provided?
  Two questions about Fig 5.
    How are repeated reports of an event differentiated from event-like factors?
    How does performance vary with different time window sizes and different minimum confidence values?
  A small typo:
    In the last sentence before 5.1, “an explosion event will be reported if the Confidence_E is not less than 0.9” should be “an explosion event will be reported if the Confidence_E is no less than 0.9”