Give me purpose!

Why?

SenseML 2014 Keynote

Immanuel Schweizer

Background

• Immanuel Schweizer

• TU Darmstadt, Germany

• Telecooperation Lab

• Ubiquitous Computing

• Smart Urban Networks


Background

• Graph-based optimization for P2P networks

• PhD Thesis

• Energy-efficient network protocols for wireless sensor networks

• Flow Control

• Topology Control

• Application: Urban Management


Inductive Loops

• >150 traffic lights

• ~3,000 sensors

• Two parameters

• Utilization

• Count


Street Cars

• ~10 sensors

• Deployed on streetcars

• Solar cells, Zigbee (868 MHz), temperature, GPS, …


Phones / Noisemap

• Noise pollution via microphone

• More than 2,000 installations

• 30 active users per day

• ~750,000 data points

• Gamification

• Calibration


da_sense


More sensors…

…more data!

And more data…

OpenSense (ETH Zurich, http://www.opensense.ethz.ch/trac/)

DeviceAnalyzer (University of Cambridge, https://deviceanalyzer.cl.cam.ac.uk/)


What do we do with all that data?

• Help with planning tasks

• Understand human activity

• Environmental models

• Detect events

• Track users

• Nowcasting / Forecasting

• …


Machine Learning


What's special about sensor data?

Where does sensor data come from?


Sensor Infrastructure

• High cost per sensor

• Mostly wired

• High quality of information

• Some kind of certification


Wireless Sensor Networks

• Cheaper hardware

• Mostly wireless

• Battery-powered

• Mixed quality of information

• High diversity


Mobile Sensing

• Easy development and deployment

• Almost no hardware cost

• Lack of control over quality of information

• Privacy

• Humans-in-the-loop


Sensor Infrastructure

(Wireless) Sensor Networks

Mobile Sensing / User-generated Data

Quantity

Quality

What's special about sensor data?

Heterogeneity

• Unstructured vs. Structured data

• Different hardware

• Different sensors

• Mobile phones vs. dedicated hardware

• Heterogeneity of data sources

• Spatial and time resolution

Quality-of-Information

• Low cost sensors

• Mobility

• Human-in-the-loop

• Faults

• Placement

• …


Preprocessing

• Data Fusion

• Integrating External Sources

• Filtering

• Approximation

• Fault Detection

• Manual Cleaning

• …
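Two of the steps listed above, filtering and fault detection, can be sketched in a few lines. This is only an illustration under stated assumptions: the range limits, the window size, and the sample readings are made up, and the slides do not say which filters were actually used.

```python
from statistics import median


def drop_out_of_range(values, low, high):
    """Simple fault detection: keep only readings inside [low, high]."""
    return [v for v in values if low <= v <= high]


def median_filter(values, window=3):
    """Smooth a series with a sliding median; the window shrinks at the edges."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


# Made-up sound levels in dB(A); 190.0 and -3.0 stand in for faulty readings.
readings = [52.0, 53.5, 190.0, 54.0, 51.5, -3.0, 55.0]
plausible = drop_out_of_range(readings, low=20.0, high=130.0)
smoothed = median_filter(plausible, window=3)
```

A range check only catches gross faults; drifting or badly placed sensors need the other steps on the list.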


Example 1: Location


Example 2: Filtering Noisemap


Example 3: Road Network

• Traffic measurements

• Noise measurements

• Idea: Predict traffic, based on noise measurements


Road network data processing

Road Segment

• Road Characteristics: Road Type, Surface Type, Maximum Speed, Oneway, Number of lanes, etc.

• Road Segment Geometry: A polygon area in the WGS 84 coordinate system

• Selection area geometry: An area around the road segment, excluding the space near neighboring segments and the areas of surrounding buildings

• Average sound pressure level for a time interval

• Traffic level

• Weather conditions
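One way to compute an average sound pressure level per time interval is sketched below. It assumes energetic (equivalent-level) averaging, which is the usual way to average decibel values; the slides do not state which averaging was used, and the function names and the one-hour interval are hypothetical.

```python
import math
from collections import defaultdict


def equivalent_level(levels_db):
    """Energy-average of sound pressure levels given in dB."""
    if not levels_db:
        raise ValueError("need at least one level")
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)


def average_per_interval(samples, interval_s=3600):
    """Group (unix_timestamp, level_db) samples into intervals and average each."""
    buckets = defaultdict(list)
    for timestamp, level in samples:
        buckets[int(timestamp // interval_s)].append(level)
    return {key: equivalent_level(levels) for key, levels in sorted(buckets.items())}
```

Because decibels are logarithmic, the energetic average of 50 dB and 70 dB is about 67 dB rather than 60 dB, so a few loud samples dominate an interval.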

Road network data processing

OpenStreetMap

• Goal - create road segments automatically

• Largest free road network dataset

• OSM Data format

• Node, way, relation

• Attributes
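The node/way structure can be read with the Python standard library, as in the sketch below. The sample document and the function name `parse_osm` are made up for illustration; relations are ignored, and only ways carrying a `highway` tag are kept, which is an assumption about what counts as a road.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written OSM XML document (coordinates are illustrative).
OSM_SAMPLE = """<osm version="0.6">
  <node id="1" lat="49.8728" lon="8.6512"/>
  <node id="2" lat="49.8731" lon="8.6520"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
    <tag k="lanes" v="2"/>
  </way>
</osm>"""


def parse_osm(xml_text):
    """Return ({node_id: (lat, lon)}, [road ways with node refs and tags])."""
    root = ET.fromstring(xml_text)
    nodes = {
        n.get("id"): (float(n.get("lat")), float(n.get("lon")))
        for n in root.iter("node")
    }
    ways = []
    for way in root.iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
        if "highway" not in tags:  # keep roads only
            continue
        refs = [nd.get("ref") for nd in way.findall("nd")]
        ways.append({"id": way.get("id"), "nodes": refs, "tags": tags})
    return nodes, ways


nodes, ways = parse_osm(OSM_SAMPLE)
```

Attributes such as `lanes` arrive as strings and are optional in OSM data, so downstream code has to handle missing or non-numeric values.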


Road network data processing

OSM - Non-planar topology

• Straightforward planarization not possible

• Road segments split into multiple polylines

Road network data processing

• Misclassified road links

• Remove "unclassified" roads

• Filter by length

• Represent multiple ways as single way

• Merge ways

• Missing common node

• Merge nodes within 5 cm of each other
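The node-merging step could look like the sketch below. It assumes nodes are given as `(lat, lon)` pairs, uses an equirectangular distance approximation (adequate at centimetre scale), and compares all pairs, which is only practical for small inputs; a spatial index would be needed for a city-sized network. All names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def distance_m(a, b):
    """Approximate distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return EARTH_RADIUS_M * math.hypot(x, y)


def merge_close_nodes(nodes, tolerance_m=0.05):
    """Map every node id to a representative id; close nodes share one."""
    parent = {node_id: node_id for node_id in nodes}

    def find(node_id):
        # Union-find lookup with path halving.
        while parent[node_id] != node_id:
            parent[node_id] = parent[parent[node_id]]
            node_id = parent[node_id]
        return node_id

    ids = sorted(nodes)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if distance_m(nodes[a], nodes[b]) <= tolerance_m:
                parent[find(b)] = find(a)
    return {node_id: find(node_id) for node_id in ids}


# "a" and "b" are roughly 2 cm apart; "c" is about a kilometre away.
example = {"a": (49.87, 8.65), "b": (49.87 + 2e-7, 8.65), "c": (49.88, 8.65)}
mapping = merge_close_nodes(example)
```

After merging, ways that previously ended at distinct but nearly identical nodes share a common node and can be connected.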


Road network data processing

• Clean up

• Combine parallel ways of the same street


Road network data processing

• 2D geometry

• Based on number of lanes
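A minimal sketch of deriving a 2D geometry from the lane count follows. It assumes a straight segment in planar metre coordinates and a fixed lane width of 3 m; both are illustrative assumptions, since the slides do not give the lane width or the construction that was used.

```python
import math


def segment_polygon(start, end, lanes, lane_width_m=3.0):
    """Rectangle around a straight segment; coordinates are planar metres."""
    (x1, y1), (x2, y2) = start, end
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        raise ValueError("segment has zero length")
    half_width = lanes * lane_width_m / 2
    # Unit normal, perpendicular to the driving direction.
    nx, ny = -(y2 - y1) / length, (x2 - x1) / length
    return [
        (x1 + nx * half_width, y1 + ny * half_width),
        (x2 + nx * half_width, y2 + ny * half_width),
        (x2 - nx * half_width, y2 - ny * half_width),
        (x1 - nx * half_width, y1 - ny * half_width),
    ]


corners = segment_polygon((0.0, 0.0), (10.0, 0.0), lanes=2)
```

Real road segments are polylines, so each piece would need its own rectangle, and the joins between pieces would need extra handling.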

$\mathit{SelectionArea} = A \setminus (B_1 \cup B_2 \cup \dots \cup B_n)$

Road network data processing

Spatial filter

• Which sound pressure records to include?

• Straightforward approach: select measurements based on proximity

• Two spatial buffers around each segment
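The proximity-based selection can be sketched as a point-to-segment distance test. The sketch assumes planar metre coordinates, a single buffer distance, and a hypothetical record layout with an `xy` field; how the two buffers on the slide are defined is not stated, so only one is shown.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line and clamp the projection to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def select_by_proximity(measurements, a, b, buffer_m):
    """Keep measurements whose position lies within buffer_m of the segment."""
    return [m for m in measurements if distance_to_segment(m["xy"], a, b) <= buffer_m]


# Made-up measurements around a segment from (0, 0) to (10, 0).
measurements = [
    {"xy": (5.0, 2.0), "db": 61.0},
    {"xy": (5.0, 30.0), "db": 48.0},
    {"xy": (-3.0, 4.0), "db": 55.0},
]
selected = select_by_proximity(measurements, (0.0, 0.0), (10.0, 0.0), buffer_m=10.0)
```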


Road network data processing

• Exclude buildings

• Location accuracy: falsely included/excluded measurements

• Inward/outward offsetting

• Inward: minimize the number of included measurements that were recorded outside

• Outward: minimize the number of filtered-out measurements that were recorded inside
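The effect of excluding buildings and of inward/outward offsetting can be illustrated with axis-aligned boxes standing in for real polygons. This is a deliberate simplification: the `Box` type, the offset convention (positive grows the selection area, negative shrinks it), and the coordinates are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in planar metres (a stand-in for real polygons)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def grow(self, d):
        """Offset outward by d metres; a negative d offsets inward."""
        return Box(self.min_x - d, self.min_y - d, self.max_x + d, self.max_y + d)

    def contains(self, point):
        x, y = point
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y


def in_selection_area(point, area, buildings, offset_m=0.0):
    """True if point lies in the area minus all buildings, after offsetting.

    Growing the selection area means growing the outer area and shrinking
    the excluded buildings; a negative offset does the opposite.
    """
    outer = area.grow(offset_m)
    excluded = [b.grow(-offset_m) for b in buildings]
    return outer.contains(point) and not any(b.contains(point) for b in excluded)


area = Box(0.0, 0.0, 100.0, 20.0)
buildings = [Box(40.0, 12.0, 60.0, 20.0)]
```

With these values, a point at (50, 11.5) just outside the building is kept at offset 0 but dropped with an inward offset of 1 m, while a point at (50, 12.5) just inside the building is dropped at offset 0 but kept with an outward offset of 1 m.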


Example 3: Road Network


What's special about sensor data?

=?


Real-world data

• Classes for classification

• Sound Level

• Traffic Level
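Turning a continuous measurement into classes for classification can be sketched with fixed thresholds. The cut points and class names below are hypothetical; choosing them is exactly the open question this slide raises.

```python
from bisect import bisect_right


def to_class(value, thresholds, labels):
    """Map a continuous value to a class label using ascending thresholds."""
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    return labels[bisect_right(thresholds, value)]


SOUND_THRESHOLDS_DB = [55.0, 70.0]        # hypothetical cut points
SOUND_LABELS = ["low", "medium", "high"]  # hypothetical class names

quiet = to_class(48.2, SOUND_THRESHOLDS_DB, SOUND_LABELS)
loud = to_class(82.0, SOUND_THRESHOLDS_DB, SOUND_LABELS)
```

A value that equals a threshold falls into the upper class here; the opposite convention works just as well as long as it is applied consistently.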


Example: Traffic Level


Real-world data

• Classes for classification

• Sound Level

• Traffic Level

• Evaluation

• Transferability


Example: Noise Pollution

[Figure: processing pipeline for sound level prediction, with two numbered paths (1 and 2)]

• Initial Dataset: Noisemap (instances of noise data) and Point of Interest (geocoordinates)

• External Data Sources: OpenStreetMap (extracting OSM information about nearby streets), LinkedGeoData (extracting information about nearby buildings; object data in RDF, SPARQL), WeatherData (extracting weather information in the surrounding area)

• Additional Data: adding additional information (attributes, data file)

• ARFF Writer

• Classification: Decision Tree Learning, Final Model

• Sound Level Prediction

• Visualization
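The ARFF Writer step in the pipeline can be sketched as a small function that renders rows in the ARFF text format (`@relation`, `@attribute`, `@data`, with `?` for missing values). The attribute names and rows are made up; values containing spaces or commas would need quoting, which this sketch leaves out.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes: list of (name, spec); spec is "numeric" or a list of
    nominal values. None in a row is written as a missing value.
    """
    lines = [f"@relation {relation}", ""]
    for name, spec in attributes:
        kind = spec if isinstance(spec, str) else "{" + ",".join(spec) + "}"
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if value is None else str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical attributes for a noise dataset.
text = to_arff(
    "noise",
    [
        ("lanes", "numeric"),
        ("road_type", ["primary", "residential"]),
        ("sound_class", ["low", "high"]),
    ],
    [[2, "residential", "low"], [4, "primary", None]],
)
```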


Evaluation

• Cross Validation

• Accuracy, Precision, Recall ~80%

• Other Models

• Same Resolution

• Same Input Data

• Difference?

• Human-readable rules
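The evaluation measures named above can be sketched as follows. The helper names are hypothetical, the folds are contiguous and unshuffled for brevity (real data would normally be shuffled or stratified first), and precision and recall are computed for one class treated as positive.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def scores(y_true, y_pred, positive):
    """Accuracy, precision and recall with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


splits = list(k_fold_splits(10, 3))
result = scores(["high", "high", "low", "low"], ["high", "low", "low", "high"], "high")
```

In a full cross-validation, a model is trained on each `train` index set and scored on the matching `test` set, and the per-fold scores are averaged.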


Transferability

• Perfect Model for Darmstadt

• No noise data in Nancy, France

• Same Features?

• External data sources

• Different regulations

• …

What's special about sensor data?

Pipeline

[Figure: the processing pipeline shown under "Example: Noise Pollution"]

Pipelines

[Figure: the processing pipeline shown under "Example: Noise Pollution", arranged in three layers (Layer 1, Layer 2, Layer 3) together with a second pipeline made of the components below]

• Measurements, Traffic Data, OSM XML

• Measurement Filter, Traffic Parser, OSM Parser

• Training Set Builder

• Machine Learning Model

Pipelines

• Standardized Toolbox

• Rapidminer++

• Generalize Components (with interfaces)

• Learn and share

• What parts can be generalized? Why?

• Share your experience about building these pipelines
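What "generalize components (with interfaces)" might mean in code is sketched below: every stage exposes the same `run` method, so stages can be swapped and shared between pipelines. The `Stage` protocol and the two example stages are hypothetical and are not taken from the talk or from RapidMiner.

```python
from typing import Any, Callable, Iterable, Protocol

Record = dict[str, Any]


class Stage(Protocol):
    """Common interface: a stage turns records into records."""

    def run(self, records: Iterable[Record]) -> Iterable[Record]: ...


class RangeFilter:
    """Drop records whose field value lies outside [low, high]."""

    def __init__(self, field: str, low: float, high: float) -> None:
        self.field, self.low, self.high = field, low, high

    def run(self, records: Iterable[Record]) -> Iterable[Record]:
        return [r for r in records if self.low <= r[self.field] <= self.high]


class AddAttribute:
    """Enrich each record with one derived attribute."""

    def __init__(self, name: str, fn: Callable[[Record], Any]) -> None:
        self.name, self.fn = name, fn

    def run(self, records: Iterable[Record]) -> Iterable[Record]:
        return [{**r, self.name: self.fn(r)} for r in records]


def run_pipeline(stages: list[Stage], records: Iterable[Record]) -> list[Record]:
    """Feed the records through each stage in order."""
    for stage in stages:
        records = stage.run(records)
    return list(records)


output = run_pipeline(
    [RangeFilter("db", 20.0, 130.0), AddAttribute("loud", lambda r: r["db"] >= 60.0)],
    [{"db": 62.0}, {"db": 250.0}],
)
```

Keeping the interface this narrow makes stages easy to reuse, at the cost of pushing all configuration into each stage's constructor.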


What's special about sensor data?

• Preprocessing

• Heterogeneity

• QoI

• Real-World

• Classes

• Evaluation

• Transferability

• Pipeline

• Share, learn, and standardize?

• More automation
