
Hadoop sensordata part3


Terabyte-scale sensor network data analysis using MapReduce/Hadoop


Page 1: Hadoop sensordata part3

Clustering

• The sensor signal is not labeled. For classification, we first need labels, e.g. obtained by clustering events

Page 2: Hadoop sensordata part3

Truck? Car? Noise?


Page 3: Hadoop sensordata part3

[Diagram: five parallel branches, each labeled Convolute, Windowing, Calculate distance, and Average per cluster; the shared Clustering steps are labeled "random clusters" and "update clusters".]
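The clustering labels on this slide (random clusters, calculate distance, average per cluster, update clusters) match the steps of a k-means style loop. The following is a minimal single-machine sketch under that assumption; the slides do not name the algorithm or the distance measure, so squared Euclidean distance and all function names here are illustrative choices, not the author's implementation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(window, centroids):
    """Index of the centroid closest to the window (calculate distance)."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(window, centroids[c]))


def average(windows):
    """Point-wise mean of a non-empty list of windows (average per cluster)."""
    n = len(windows)
    return [sum(column) / n for column in zip(*windows)]


def kmeans(windows, k, iterations=20, seed=0):
    """Assumed k-means loop: random start, assign, average, update."""
    rng = random.Random(seed)
    centroids = [list(w) for w in rng.sample(windows, k)]  # random clusters
    for _ in range(iterations):
        members = [[] for _ in range(k)]
        for w in windows:
            members[nearest(w, centroids)].append(w)
        # An empty cluster keeps its previous centroid.
        updated = [average(m) if m else centroids[c]
                   for c, m in enumerate(members)]
        if updated == centroids:  # no change, so the loop has converged
            break
        centroids = updated  # update clusters
    labels = [nearest(w, centroids) for w in windows]
    return centroids, labels
```

Calling `kmeans(windows, k=2)` on a list of equal-length windows returns the centroids and one cluster label per window; those labels are what a later classification step could train on.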

Page 4: Hadoop sensordata part3

Clustering in Hadoop

Data massage: raw data to subsequences (with lead-in/out)

[Diagram: Raw data <ts, {s1, s2, ...}> (split 1, split 2, ..., split n) → Map → <ts_i, {0, s1}>, <ts_{i-1}, {1, s1}>, ..., <ts_{i+1}, {-1, s1}> → Reduce (per ts_i) → Subsequences (with lead-in/out): <ts1, >, <ts2, >, ..., <ts6, >, each laid out as lead-in | ts | lead-out]

lead-in/out needed to center bumps (snapping)
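One reading of the diagram is that each raw window is emitted three times: under its own key with tag 0, under the previous key with tag 1, and under the next key with tag -1, so that the reducer for ts_i sees its own samples plus both neighbours. The sketch below simulates that in memory. It assumes integer window indices as keys and uses whole neighbouring windows as lead-in and lead-out; the slides specify neither, and the function names are hypothetical.

```python
from collections import defaultdict


def map_window(ts, samples):
    """Emit a raw window under its own key and under both neighbouring keys."""
    yield ts, (0, samples)       # the window itself
    yield ts - 1, (1, samples)   # serves as lead-out of the previous window
    yield ts + 1, (-1, samples)  # serves as lead-in of the next window


def reduce_window(tagged):
    """Join lead-in, window, and lead-out in time order for one key."""
    parts = dict(tagged)
    if 0 not in parts:
        return None  # key only received neighbour data (edge of the data set)
    sequence = []
    for tag in (-1, 0, 1):
        sequence.extend(parts.get(tag, []))
    return sequence


def build_subsequences(raw):
    """In-memory stand-in for the map, shuffle, and reduce phases."""
    groups = defaultdict(list)
    for ts, samples in raw:
        for key, value in map_window(ts, samples):
            groups[key].append(value)  # shuffle: group values by key
    result = {}
    for ts in sorted(groups):
        sequence = reduce_window(groups[ts])
        if sequence is not None:
            result[ts] = sequence
    return result
```

For `raw = [(1, [1, 2]), (2, [3, 4]), (3, [5, 6])]`, the middle window comes out as `[1, 2, 3, 4, 5, 6]`, while the first and last windows only get the one neighbour that exists.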

Page 5: Hadoop sensordata part3

Clustering in Hadoop

Some clever partitioning of keys

Page 6: Hadoop sensordata part3

Clustering in Hadoop

Clustering: subsequences to cluster centroids

[Diagram: Subsequences (with lead-in/out) <ts1, >, <ts2, >, ..., <ts6, > (split 1, split 2, ..., split n), together with the current cluster centroids (initially k random) → Map (distance calculation in parallel) → <clus_i, partial sums> → Reduce (per clus_i) → cluster centroids <clus1, >, <clus2, >, ... → update current cluster centroids, iterate]
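A single iteration of this scheme can be sketched as follows: every Map task assigns the subsequences of its split to the nearest current centroid and emits per-cluster partial sums, and the Reduce step per cluster merges those partial sums into a new centroid. This is an in-memory illustration, not Hadoop code; the squared Euclidean distance, the (sums, count) layout of the partial sums, and the function names are assumptions.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_split(split, centroids):
    """Map: assign each subsequence to its nearest centroid and
    accumulate per-cluster partial sums for this split."""
    partial = {}
    for seq in split:
        cid = min(range(len(centroids)),
                  key=lambda i: squared_distance(seq, centroids[i]))
        sums, count = partial.get(cid, ([0.0] * len(seq), 0))
        partial[cid] = ([s + x for s, x in zip(sums, seq)], count + 1)
    return list(partial.items())  # [(cluster id, (sums, count)), ...]


def reduce_cluster(partials):
    """Reduce (per cluster): merge partial sums into the new centroid."""
    total, n = None, 0
    for sums, count in partials:
        total = sums if total is None else [t + s for t, s in zip(total, sums)]
        n += count
    return [t / n for t in total]


def iterate(splits, centroids):
    """One pass: map every split, group by cluster id, reduce per cluster."""
    grouped = defaultdict(list)
    for split in splits:
        for cid, partial in map_split(split, centroids):
            grouped[cid].append(partial)
    # A cluster that received no subsequences keeps its old centroid.
    return [reduce_cluster(grouped[c]) if c in grouped else centroids[c]
            for c in range(len(centroids))]
```

Emitting partial sums instead of individual subsequences keeps the data sent to the reducers proportional to the number of clusters per split rather than to the number of subsequences. Repeating `centroids = iterate(splits, centroids)` until the centroids stop changing corresponds to the "iterate" arrow in the diagram.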

Page 17: Hadoop sensordata part3

Clustering

Cluster labels:

• Truck (traffic jam)

• Small truck (traffic jam)

• Cars (traffic jam)

• Heavy truck

• Medium truck

• Small truck

• Car

• Idle (noise)

Truck! Car!

Page 18: Hadoop sensordata part3

Performance

[Chart: run time (hours, axis from 0 to 40) versus amount of sensor data (3 days, 10 days, 1 month, 3 months); series: Convolution, Clustering]

• MapReduce: Techniques scale linearly (6 node cluster)

• Noticeable overhead on small amounts of data

Page 19: Hadoop sensordata part3

Performance


66 node cluster


Page 21: Hadoop sensordata part3

Multi-scale analysis

• The sensor signal is a composite of events that happen at different time scales

• Passing truck (small scale), traffic jam (medium scale), seasonal change (long scale)

• Try to decompose signals into ‘natural’ time scales

• Basic idea:

• Convolve data at different scales (scale space)

• Subtract key convolutions (band-pass filters)
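The two steps of the basic idea can be sketched as smoothing the signal with kernels of different widths and subtracting one smoothed version from another. The slides do not say which kernel is used; the Gaussian kernel, the edge clamping, the sign convention of the subtraction, and the function names below are assumptions for illustration only.

```python
import math


def gaussian_kernel(sigma):
    """Normalized Gaussian weights covering about three sigma on each side."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-0.5 * (i / sigma) ** 2)
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve the signal with a Gaussian of the given scale."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for t in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(t + j - radius, 0), n - 1)  # clamp at the edges
            acc += w * signal[idx]
        out.append(acc)
    return out


def band_pass(signal, fine_sigma, coarse_sigma):
    """Difference of two smoothed versions: keeps structure whose
    time scale lies between the two sigmas."""
    fine = smooth(signal, fine_sigma)
    coarse = smooth(signal, coarse_sigma)
    return [f - c for f, c in zip(fine, coarse)]
```

Evaluating `smooth` for a range of sigmas gives the stack of progressively coarser signals that the slides call scale space; `band_pass(signal, 2, 4)` then isolates what changes between those two scales.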

Page 22: Hadoop sensordata part3

Scale space

Amount of sensor data

Page 23: Hadoop sensordata part3

Multi-scale analysis

• Subtraction of two such convolutions (band-pass filter)

Amount of sensor data

Page 24: Hadoop sensordata part3

Scale space

Page 25: Hadoop sensordata part3

Decomposition

S2

S4-S2

S4

Page 26: Hadoop sensordata part3

Up to speed...

• Large-scale preprocessing allows advanced analysis

• Equation discovery

• Long-term trends (regression)

• E.g. change in response, eigenfrequency, ...

• Correlations:

• ...

s100(t) = 1.196 s101(t) - 0.272 s102(t) + 0.156 s106(t)

[Plot axes: temperature, strain]

Page 27: Hadoop sensordata part3

Thanks

Gracias

Xie Xie

Danke

Dank U

Merci

Efharisto

Dhanyavaad

Grazie

Spasiba

Obrigado

Tesekkurler

Diolch

Köszönöm

Arigato

Hvala

Toda