11/26/07 – IRADSN’07
Stream Hierarchy Data Mining for Sensor Data
Margaret H. Dunham
SMU
Dallas, Texas 75275
[email protected]
Vijay Kumar
UMKC
Kansas City, Missouri 64110
[email protected]
From Sensors to Streams – An Outline
• Data Stream Overview
• Data Stream Visualization
  • Temporal Heat Map
• Data Stream Modeling
  • Extensible Markov Model
• Data Stream Hierarchy
From Sensors to Streams
• Data captured and sent by a set of sensors is usually referred to as “stream data”.
• It is a real-time sequence of encoded signals that contain the desired information.
• It is a continuous, ordered sequence of items (ordered implicitly by arrival time, or explicitly by timestamp or by geographic coordinates).
• Stream data is infinite: the data keeps coming.
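The idea of an unbounded, implicitly ordered stream can be illustrated with a generator. This is only a sketch: the function name `sensor_stream`, its parameters, and the synthetic values are hypothetical and do not come from the presentation.

```python
import itertools
from typing import Iterator, Tuple


def sensor_stream(start: float = 20.0, step: float = 0.5) -> Iterator[Tuple[int, float]]:
    """Yield (arrival_index, value) pairs forever.

    The arrival index provides the implicit ordering; the values are synthetic.
    """
    for i in itertools.count():
        yield i, start + step * (i % 4)


# A consumer can only ever look at a finite prefix of an infinite stream.
first_five = list(itertools.islice(sensor_stream(), 5))
```

Because the generator never terminates, any consumer has to decide how much of the stream it keeps, which is the motivation for the summarization discussed later.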
Data Stream Management Systems (DSMS)
Software to facilitate querying and managing stream data.
• Retrieve the most recent information from the stream
• Data aggregation facilitates merging together multiple streams
• Modeling stream data to “summarize” the stream
• Visualization is needed to observe, in real time, the spatial and temporal patterns and trends hidden in the data.
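Two of the DSMS tasks listed above, merging several streams and retrieving the most recent information, can be sketched in a few lines. The `Reading` type, `merge_streams`, and `RecentWindow` are hypothetical names for illustration; the sketch assumes each input stream is already ordered by timestamp.

```python
import heapq
from collections import deque
from typing import Deque, Iterable, Iterator, List, NamedTuple


class Reading(NamedTuple):
    timestamp: float  # explicit ordering key
    sensor_id: str
    value: float


def merge_streams(*streams: Iterable[Reading]) -> Iterator[Reading]:
    """Lazily merge timestamp-ordered streams into one timestamp-ordered stream."""
    return heapq.merge(*streams, key=lambda r: r.timestamp)


class RecentWindow:
    """Keeps only the most recent `size` readings; older ones are discarded."""

    def __init__(self, size: int) -> None:
        self._items: Deque[Reading] = deque(maxlen=size)

    def push(self, reading: Reading) -> None:
        self._items.append(reading)

    def latest(self) -> List[Reading]:
        return list(self._items)

    def mean(self) -> float:
        return sum(r.value for r in self._items) / len(self._items)


# Usage: merge two small streams and keep the three most recent readings.
stream_a = [Reading(1.0, "a", 10.0), Reading(3.0, "a", 12.0)]
stream_b = [Reading(2.0, "b", 20.0), Reading(4.0, "b", 22.0)]
window = RecentWindow(size=3)
for reading in merge_streams(stream_a, stream_b):
    window.push(reading)
```

A bounded window is one simple way to answer "most recent" queries without storing the whole stream; a real DSMS would offer richer window and aggregation semantics.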
DSMS Problems
• Stream management development is in a state similar to that of databases prior to the 1970s
  • Each system/researcher looks at a specific application or system
  • No standards concerning functionality
  • No standard query language
• It is unreasonable to expect that end users will access raw data, data in the DSMS, or even data at a summarized view
• Domain experts need to “see” a higher level of data
Our Proposal
A four-level data abstraction to facilitate the creation of actionable intelligence for domain experts evaluating sensor data.
Assumptions for Our Research
End User:
• May not be knowledgeable concerning sensors
• Probably a domain expert
• May not need to see exact sensor values
• Concerned with trends and approximate values
• Needs to see data from MANY sensors at one time
• Needs to see data continuously in a visualization of the stream
Suppose There Were MANY Sensors
• Traditional line graphs would be very difficult to read
• Requirements for a new visualization technique:
  • High-level summary of data
  • Handle multiple sensors at once
  • Continuous
  • Temporal
  • Spatial
Temporal Heat Map
• Also called Temporal Chaos Game Representation (TCGR)
• Temporal Heat Map (THM) is a visualization technique for streaming data derived from multiple sensors.
• It is a two-dimensional structure similar to an infinite table.
• Each row of the table is associated with one sensor value.
• Each column of the table is associated with a point in time.
• Each cell within the THM is a color representation of the sensor value.
• Colors are normalized (in our examples):
  • 0 – White
  • 0.5 – Blue
  • 1.0 – Red
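The color scale above can be sketched as a mapping from a normalized value to an RGB triple. The three anchor colors come from the slide; the linear interpolation between them, the min-max normalization, and all function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]

# Anchor colors from the slide: 0 -> white, 0.5 -> blue, 1.0 -> red.
WHITE: RGB = (255, 255, 255)
BLUE: RGB = (0, 0, 255)
RED: RGB = (255, 0, 0)


def _lerp(c0: RGB, c1: RGB, t: float) -> RGB:
    """Linear interpolation between two colors (an assumed blending rule)."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def value_to_color(v: float) -> RGB:
    """Map a normalized value in [0, 1] to a color; out-of-range values are clamped."""
    v = min(1.0, max(0.0, v))
    if v <= 0.5:
        return _lerp(WHITE, BLUE, v / 0.5)
    return _lerp(BLUE, RED, (v - 0.5) / 0.5)


def normalize(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Min-max normalization against a fixed range (assumed, not from the slide)."""
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in values]


def heat_map_column(sensor_values: Sequence[float], lo: float, hi: float) -> List[RGB]:
    """One THM column: one color per sensor (row) for a single point in time."""
    return [value_to_color(v) for v in normalize(sensor_values, lo, hi)]
```

Appending one such column per time step yields the "infinite table" described above, with rows for sensors and columns for time.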
10/11/07 – NGDM'07
Cisco – Internal VoIP Traffic Data
[Figure: temporal heat map of the complete stream (CiscoEMM.png); axes labeled "Time" and "Values"]
• VoIP traffic data was provided by Cisco Systems and represents logged VoIP traffic in their Richardson, Texas facility from Mon Sep 22 12:17:32 2003 to Mon Nov 17 11:29:11 2003.
Derwent River (UK)
[Figure: Derwent Temporal Heat Map (derwentrotate.png); labels: 28043, 28011, 28048, 28010, 28023, 28117]
Data Stream Modeling Requirements
• Summarization (synopsis) of data
• Use data, NOT a SAMPLE
• Temporal and spatial
• Dynamic
• Continuous (infinite stream)
• Learn
• Forget
• Sublinear growth rate – clustering
Extensible Markov Model
• Extensible Markov Model (EMM): at any time t, an EMM consists of a Markov chain (MC) with a designated current node, Nn, and algorithms to modify it, where the algorithms include:
  • EMMCluster, which defines a technique for matching between input data at time t + 1 and existing states in the MC at time t.
  • EMMIncrement, which updates the MC at time t + 1 given the MC at time t and the clustering measure result at time t + 1.
  • EMMDecrement, which removes nodes from the EMM when needed.
• In addition, the EMM has associated data mining functions such as rare event detection and prediction.
Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004, pp. 371-374.
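The three algorithms named above can be sketched as methods of a small class. This is not the authors' implementation: the Euclidean nearest-neighbor match with a fixed threshold, the use of the first matching point as the node's representative, and the way counts are adjusted on removal are all assumptions chosen to keep the example short.

```python
import math
from collections import defaultdict
from typing import Dict, Optional, Sequence, Tuple


class EMM:
    """Illustrative Markov chain that grows and shrinks as data arrives."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # assumed matching radius
        self.centroids: Dict[int, Tuple[float, ...]] = {}
        self.visits: Dict[int, int] = defaultdict(int)  # transitions leaving a node
        self.links: Dict[Tuple[int, int], int] = defaultdict(int)
        self.current: Optional[int] = None
        self._next_id = 1

    def cluster(self, x: Sequence[float]) -> Optional[int]:
        """EMMCluster: nearest existing node within the threshold, else None."""
        best, best_d = None, self.threshold
        for node, centroid in self.centroids.items():
            d = math.dist(x, centroid)
            if d <= best_d:
                best, best_d = node, d
        return best

    def increment(self, x: Sequence[float]) -> int:
        """EMMIncrement: add a node if needed, count the transition, move current."""
        node = self.cluster(x)
        if node is None:
            node = self._next_id
            self._next_id += 1
            self.centroids[node] = tuple(x)
        if self.current is not None:
            self.visits[self.current] += 1
            self.links[(self.current, node)] += 1
        self.current = node
        return node

    def decrement(self, node: int) -> None:
        """EMMDecrement: remove a node and every link that touches it."""
        self.centroids.pop(node, None)
        self.visits.pop(node, None)
        for src, dst in list(self.links):
            if node in (src, dst):
                count = self.links.pop((src, dst))
                if dst == node and src in self.visits:
                    self.visits[src] -= count
        if self.current == node:
            self.current = None

    def probability(self, src: int, dst: int) -> float:
        """Estimated transition probability: link count / count of exits from src."""
        total = self.visits.get(src, 0)
        return self.links.get((src, dst), 0) / total if total else 0.0

    def predict(self) -> Optional[int]:
        """Most frequent successor of the current node, if one is known."""
        out = {d: c for (s, d), c in self.links.items() if s == self.current}
        return max(out, key=out.get) if out else None
```

Feeding vectors one at a time through `increment` corresponds to learning, and calling `decrement` corresponds to forgetting. A transition with a low `probability` value could serve as a simple rare-event signal, although the paper cited above defines its own measures.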
EMM Learning
Input vectors:
• <18,10,3,3,1,0,0>
• <17,10,2,3,1,0,0>
• <16,9,2,3,1,0,0>
• <14,8,2,3,1,0,0>
• <14,8,2,3,0,0,0>
• <18,10,3,3,1,1,0>
[Figure: snapshots of the EMM during learning; nodes N1, N2, N3 with transition labels 1/1, 2/2, 1/2, 1/3, 2/3]
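EMM Forgetting
[Figure: two snapshots of the EMM; the first shows nodes N1, N2, N3, N5, N6 with transition labels 2/2, 1/3, 1/3, 1/3, 1/2; the second shows nodes N1, N3, N5, N6 with transition labels 1/6, 1/6, 1/6, 1/3, 1/3, 1/3]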
EMM Sublinear Growth Rate
Minnesota Department of Transportation (MnDot)
Traditional DBMS Data Abstraction
• Three levels of data abstraction: Physical, Logical, External
• Data is normally pulled to the user by a query
Proposed DSMS Data Abstraction
• Level 0 – Physical Level
  • Raw data from sensors
  • Cannot be stored
• Level 1 – DSMS
  • Sensor data is merged, aggregated, and cleansed.
  • DSMS queries may be processed against this data.
• Level 2 – Model
  • Summarization (synopsis) of data
• Level 3 – Domain Expert
  • Summary visualization
• Data is normally pushed to the user
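The four levels and the push-style delivery can be sketched as a small pipeline. Everything below is a hypothetical illustration: the function and class names, the per-time-step mean used as the Level 1 aggregate, and the bucket-count synopsis standing in for the Level 2 model are assumptions, not the system described in the presentation.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

RawReading = Tuple[int, str, Optional[float]]  # (time step, sensor id, value or None)


def level0_sensors(raw: Iterable[RawReading]) -> Iterator[RawReading]:
    """Level 0: raw readings pass through once and are never stored."""
    yield from raw


def level1_dsms(readings: Iterable[RawReading]) -> Iterator[Tuple[int, float]]:
    """Level 1: cleanse (drop missing values) and aggregate (mean per time step)."""
    step: Optional[int] = None
    values: List[float] = []
    for t, _sensor, value in readings:
        if step is not None and t != step:
            if values:
                yield step, sum(values) / len(values)
            values = []
        step = t
        if value is not None:
            values.append(value)
    if step is not None and values:
        yield step, sum(values) / len(values)


class Level2Model:
    """Level 2: a tiny synopsis that counts how often each value bucket occurs."""

    def __init__(self, bucket_width: float) -> None:
        self.bucket_width = bucket_width
        self.counts: Dict[int, int] = {}

    def update(self, value: float) -> Tuple[int, bool]:
        bucket = int(value // self.bucket_width)
        is_new = bucket not in self.counts
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return bucket, is_new


def run_pipeline(
    raw: Iterable[RawReading],
    model: Level2Model,
    notify: Callable[[int, int], None],
) -> None:
    """Level 3: push a notification to the domain expert when a new bucket appears."""
    for t, mean_value in level1_dsms(level0_sensors(raw)):
        bucket, is_new = model.update(mean_value)
        if is_new:
            notify(t, bucket)
```

The domain expert never issues a query here; the `notify` callback is invoked by the pipeline, which is the "pushed to the user" behavior contrasted with the pull model of a traditional DBMS.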
Hierarchy | Levels | Lowest Level | Highest Level Abstraction | Inter-level Data Migration
Memory Hierarchy | n | External Storage | Subset/Cache/Buffer | Fetch/Prefetch
DBMS Data Hierarchy | 3 | Physical Storage | External View | Fetch, Prefetch
Data Warehouse | n | Operational Data | Cube/Multidimensional View | Aggregation
Stream Hierarchy | 4 | Sensor Data | Visualization/Triggers | Automatic Push
[Figure: stream hierarchy architecture. Level labels: LEVEL 0 Sensors, LEVEL 1 DSMS, LEVEL 2 Model, LEVEL 3 Domain Expert. The model is drawn as a Markov chain with nodes N1 to N5 and transitions P12, P15, P21, P24, P31, P34, P41, P53, P55. Other labels: Streams; Data Stream Management System (DSMS); Query; Scratch Space; Temporal Dynamic Synopsis; Data Mining Applications; Triggers, Lookmarks, Anomalies; Visualization; Actionable Intelligence]
Stream Hierarchy Summary
• Except for the inter-level functionality requirements, the functionality of each level is independent of the others and may differ across implementations.
• The model used must capture time and ordering of data, be able to both learn and forget, and use some variation of clustering.
• Visualization at the domain expert level must capture both time and ordering. In addition, it should be easy to “read” for many sets of sensors.