joaquin-vanschoren
Terabyte-scale sensor network data analysis using MapReduce/Hadoop
Clustering
• The sensor signal is not labeled. For classification, we first need to label it, e.g. by clustering events
Truck? Car? Noise?
Figure: processing pipeline, applied to each part of the data in parallel: convolve, then windowing, then clustering (start from random clusters, calculate distance, average per cluster, update clusters)
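The convolution and windowing stages of this pipeline can be sketched in plain Python for one part of the data. The slides do not specify the convolution kernel or the window parameters, so the Gaussian kernel, the clamped edges and the size/step values below are assumptions, and gaussian_kernel, convolve and windows are hypothetical helper names. The clustering stage is not included here.

```python
import math


def gaussian_kernel(sigma: float, radius: int) -> list[float]:
    """Normalized Gaussian weights for offsets -radius..radius (assumed kernel)."""
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve(signal: list[float], kernel: list[float]) -> list[float]:
    """Same-length convolution with a symmetric kernel; indices are clamped at the edges."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for t in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(t + j - radius, 0), n - 1)  # clamp to the first/last sample
            acc += w * signal[idx]
        out.append(acc)
    return out


def windows(signal: list[float], size: int, step: int) -> list[list[float]]:
    """Cut the smoothed signal into fixed-size windows starting every `step` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


# Hypothetical usage: smooth a short signal, then cut it into overlapping windows.
smoothed = convolve([0.0, 0.0, 4.0, 0.0, 0.0, 1.0, 0.0, 0.0], gaussian_kernel(1.0, 2))
segments = windows(smoothed, size=4, step=2)
```

Each window would then be one item to cluster; overlapping windows (step smaller than size) reduce the chance that an event is cut in half at a window boundary.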
Clustering in Hadoop
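Figure: first MapReduce job (data massage), from raw data to subsequences (with lead-in/out). The raw data, records <ts, {s1, s2, ...}>, is divided into splits 1 to n. Each Map emits every value s1 under its own timestamp, <tsi, {0,s1}>, and under neighbouring timestamps together with its offset, e.g. <tsi-1, {1,s1}> and <tsi+1, {-1,s1}>. Each Reduce (one per tsi) assembles the subsequence for tsi, including its lead-in and lead-out, giving <ts1, subsequence>, <ts2, subsequence>, ...
• Lead-in/out is needed to center bumps (snapping)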
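The data-massage job can be simulated in-process to show how the {offset, value} pairs rebuild each subsequence. This is a minimal sketch, assuming integer timestamps, a lead-in/lead-out of LEAD time steps, and that keys near the edges of the data (without a full lead-in or lead-out) are dropped. The dictionary stands in for Hadoop's shuffle, the centering of bumps (snapping) is not shown, and map_record, reduce_key and run are hypothetical names.

```python
from collections import defaultdict

LEAD = 1  # hypothetical lead-in/lead-out length, in time steps


def map_record(ts, value, lead=LEAD):
    """Emit <ts, (0, value)> plus copies for the neighbouring keys.

    Relative to key ts - d the value sits at offset +d (lead-out of that key);
    relative to key ts + d it sits at offset -d (lead-in of that key).
    """
    yield ts, (0, value)
    for d in range(1, lead + 1):
        yield ts - d, (d, value)
        yield ts + d, (-d, value)


def reduce_key(ts, pairs, lead=LEAD):
    """Order the pairs by offset to rebuild lead-in, value, lead-out for key ts."""
    ordered = sorted(pairs)
    if len(ordered) != 2 * lead + 1:  # incomplete near the data edges: drop
        return None
    return ts, [value for _, value in ordered]


def run(records, lead=LEAD):
    """Simulate one MapReduce pass; the dict plays the role of the shuffle."""
    groups = defaultdict(list)
    for ts, value in records:
        for key, pair in map_record(ts, value, lead):
            groups[key].append(pair)
    results = (reduce_key(key, groups[key], lead) for key in sorted(groups))
    return [r for r in results if r is not None]
```

For six records at ts1 to ts6 and LEAD = 1, run returns the subsequences for ts2 to ts5, each with one value of lead-in and one of lead-out. Copying every value to 2·LEAD + 1 keys costs extra shuffle volume but lets every reducer work on its key independently.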
• Uses some clever partitioning of keys
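The slides do not say which key partitioning was used. One plausible option, purely an assumption here, is a range partitioner that sends contiguous blocks of timestamps to the same reducer, instead of scattering neighbouring timestamps the way Hadoop's default HashPartitioner does. In Hadoop this logic would live in a custom Partitioner's getPartition(key, value, numPartitions) method; the hypothetical Python function below only illustrates the arithmetic.

```python
def range_partition(ts: int, t_min: int, t_max: int, num_reducers: int) -> int:
    """Map a timestamp key to a reducer so each reducer gets one contiguous time range.

    Hypothetical scheme: [t_min, t_max] is split into num_reducers near-equal ranges.
    Integer arithmetic keeps the mapping monotone, so later timestamps never go to
    an earlier reducer and each reducer's output covers a contiguous stretch of time.
    """
    if not t_min <= ts <= t_max:
        raise ValueError("timestamp outside the expected range")
    span = t_max - t_min + 1
    return (ts - t_min) * num_reducers // span
```

With timestamps 0 to 99 and four reducers, each reducer receives 25 consecutive timestamps, so its sorted output is already a contiguous, time-ordered block of subsequences.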
Clustering in Hadoop
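Figure: second MapReduce job (clustering). The input is the subsequences <ts1, subsequence> to <ts6, subsequence>, each with lead-in and lead-out, divided into splits 1 to n. Each Map computes the distance from its subsequences to the current cluster centroids (initially k random ones) and emits <clusi, partial sums>. Each Reduce (one per clusi) combines the partial sums into updated cluster centroids <clus1, centroid>, <clus2, centroid>, ...; the current cluster centroids are then updated and the process iterates.
• Distance calculation runs in parallel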
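The clustering job can be sketched as k-means written in MapReduce form: each map task assigns the subsequences in its split to the nearest current centroid and emits <clusi, partial sums>, and one reduce call per cluster turns those partial sums into the updated centroid. This is a minimal in-memory sketch, assuming Euclidean distance, a fixed number of iterations instead of a convergence test, and that an empty cluster keeps its previous centroid; nearest, map_split, reduce_cluster and kmeans are hypothetical names.

```python
import math
import random
from collections import defaultdict

# Sketch assumptions: Euclidean distance, fixed iteration count, in-memory "shuffle".


def nearest(x, centroids):
    """Index of the centroid closest to subsequence x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))


def map_split(split, centroids):
    """Map task: assign each subsequence, emit <cluster, (partial sum, count)>."""
    partial = {}
    for x in split:
        c = nearest(x, centroids)
        total, count = partial.get(c, ([0.0] * len(x), 0))
        partial[c] = ([a + b for a, b in zip(total, x)], count + 1)
    return list(partial.items())


def reduce_cluster(partials):
    """Reduce call for one cluster: add up partial sums and divide by the count."""
    total, count = [0.0] * len(partials[0][0]), 0
    for s, n in partials:
        total = [a + b for a, b in zip(total, s)]
        count += n
    return [v / count for v in total]


def kmeans(splits, k, iterations=10, seed=0):
    """Driver: start from k random subsequences, then run map/shuffle/reduce rounds."""
    rng = random.Random(seed)
    centroids = rng.sample([x for split in splits for x in split], k)
    for _ in range(iterations):
        grouped = defaultdict(list)  # plays the role of the shuffle
        for split in splits:
            for c, part in map_split(split, centroids):
                grouped[c].append(part)
        # A cluster that received no subsequences keeps its previous centroid.
        centroids = [reduce_cluster(grouped[c]) if grouped[c] else centroids[c]
                     for c in range(k)]
    return centroids
```

Emitting one partial sum and count per cluster from each map task, rather than one record per subsequence, keeps the shuffle small: its size depends on k and the number of splits, not on the amount of sensor data.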
Clustering
Resulting clusters:
• Truck (traffic jam)
• Small truck (traffic jam)
• Cars (traffic jam)
• Heavy truck
• Medium truck
• Small truck
• Car
• Idle (noise)
Truck! Car!
Performance
Figure: run time in hours (axis 0 to 40) of Convolution and Clustering versus amount of sensor data (3 days, 10 days, 1 month, 3 months)
• MapReduce: the techniques scale linearly (6-node cluster)
• Noticeable overhead on small amounts of data
• 66-node cluster
Multi-scale analysis
• The sensor signal is a composite of events that happen at different time scales
• Passing truck (small scale), traffic jam (medium scale), seasonal change (long scale)
• Try to decompose signals into ‘natural’ time scales
• Basic idea:
• Convolve data at different scales (scale space)
• Subtract key convolutions (band-pass filters)
Figure: scale space
Multi-scale analysis
• Subtraction of two such convolutions (band-pass filter)
Figure: scale space decomposition into S2, S4-S2 and S4
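The scale-space idea can be sketched as Gaussian smoothing at several widths, with a band-pass component obtained as the difference of two smoothed signals. This is a sketch under stated assumptions: the Gaussian kernel, the clamped edges, and reading S2 and S4 as the signal smoothed at scales 2 and 4 are assumptions rather than details given in the slides, and gaussian_smooth, scale_space and band are hypothetical names.

```python
import math


def gaussian_smooth(signal, sigma):
    """Convolve with a normalized Gaussian of width sigma; edges are clamped."""
    radius = max(1, int(3 * sigma))
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    norm = sum(kernel)
    n = len(signal)
    return [
        sum(w * signal[min(max(t + j - radius, 0), n - 1)] for j, w in enumerate(kernel)) / norm
        for t in range(n)
    ]


def scale_space(signal, sigmas):
    """One smoothed copy of the signal per scale (the 'scale space')."""
    return {s: gaussian_smooth(signal, s) for s in sigmas}


def band(space, a, b):
    """Band-pass component: smoothing at scale a minus smoothing at scale b."""
    return [x - y for x, y in zip(space[a], space[b])]


# Hypothetical usage: a step change plus a short bump, viewed at scales 2 and 4.
signal = [0.0] * 30 + [5.0] * 30
signal[10] += 3.0
space = scale_space(signal, [2, 4])
s4_minus_s2 = band(space, 4, 2)
```

Under that reading, band(space, 4, 2) computes S4-S2; swapping the arguments only flips the sign. A short bump such as the one at index 10 survives the narrow smoothing much better than the wide one, so it stands out in the band.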
• Large-scale preprocessing allows advanced analysis
• Equation discovery
• Long-term trends (regression)
• E.g. change in response, eigenfrequency, ...
• Correlations:
• ...
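For the long-term trends and correlations listed above, two basic computations are an ordinary least-squares line fit over time and a Pearson correlation between two series. The sketch below assumes the preprocessed data has already been reduced to small in-memory summaries (for instance one eigenfrequency estimate per day); linear_trend and pearson are hypothetical names, and the slides do not say which regression method was used.

```python
import math

# Hypothetical helpers; the slides do not specify the regression method.


def linear_trend(ts, ys):
    """Least-squares line y = a + b*t; b is the long-term trend per time unit."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    sxx = sum((t - mt) ** 2 for t in ts)
    sxy = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    b = sxy / sxx
    return my - b * mt, b


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

A slope b that stays clearly away from zero over months of data would indicate a slow drift, such as a gradual change in response or eigenfrequency.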
Up to speed...
s100(t) = 1.196 s101(t) - 0.272 s102(t) + 0.156 s106(t)
Figure: temperature and strain
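The slides do not say how the relation s100(t) = 1.196 s101(t) - 0.272 s102(t) + 0.156 s106(t) was found. Assuming a linear least-squares fit of one sensor against a few others (one common way to discover such equations), the coefficients can be estimated by solving the normal equations, as in this minimal pure-Python sketch; least_squares is a hypothetical name.

```python
# Assumption: the equation was found by linear least squares; the slides do not say.


def least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gaussian elimination."""
    m = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution yields one coefficient per input sensor.
    w = [0.0] * m
    for r in range(m - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, m))) / A[r][r]
    return w
```

Given rows of [s101(t), s102(t), s106(t)] as X and the matching s100(t) values as y, least_squares(X, y) returns the three coefficients. For many sensors or ill-conditioned data, a QR-based solver is numerically safer than forming the normal equations.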
Thanks
Gracias
Xie Xie
Danke
Dank U
Merci
Efharisto
Dhanyavaad
Grazie
Spasiba
Obrigado
Tesekkurler
Diolch
Köszönöm
Arigato
Hvala
Toda