1 Clustering of location-based data Mohammad Rezaei May 2013


Clustering of location-based data

Mohammad Rezaei

May 2013


Data mining and Clustering

- Huge amounts of location-based data

- Need for mechanisms to extract knowledge

- Clustering as an important field in spatio-temporal data mining


Clustering


Some applications

- Routing
- Interesting places
- Recommendation of services
- Marketing management
- Users with the same interests
- Visualization


Clustering Problems in Mopsi

- Clutter of markers on the map
- Similar services or photos in a list
- Categorization of services
- Distribution of users’ locations
- Timeline view of photos
- Clustering of events


Clutter of markers


Search results

Clustering


Photos


Users


Solutions

- Grid-based clustering
- Distance-based clustering


Google Maps version 3.0

- Using location in pixels for grid-based clustering
- 22 zoom levels (0 to 21)
- From 256 × 256 pixels at zoom level 0 to 536870912 × 536870912 pixels at zoom level 21
- ≈ 60 × 10^12 cells at zoom level 21 with a cell size of (60, 80) pixels
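The figures on this slide can be checked with a small calculation. The sketch below is a minimal Python illustration, assuming the standard Web Mercator world-coordinate projection used by the Google Maps JavaScript API (256 × 256 world pixels at zoom 0, doubling with each level) and reading the cell size (60, 80) as 60 pixels wide and 80 pixels high; `world_size`, `to_pixel` and `cell_count` are hypothetical helper names.

```python
import math

TILE_SIZE = 256  # width/height of the whole map in pixels at zoom level 0


def world_size(zoom: int) -> int:
    """Width (= height) of the whole map in pixels at the given zoom level."""
    return TILE_SIZE * 2 ** zoom


def to_pixel(lat: float, lon: float, zoom: int) -> tuple:
    """Project (lat, lon) in degrees to Web Mercator pixel coordinates."""
    siny = math.sin(math.radians(lat))
    # Clamp so that the logarithm stays finite near the poles.
    siny = min(max(siny, -0.9999), 0.9999)
    x = TILE_SIZE * (0.5 + lon / 360.0)
    y = TILE_SIZE * (0.5 - math.log((1 + siny) / (1 - siny)) / (4 * math.pi))
    scale = 2 ** zoom
    return x * scale, y * scale


def cell_count(zoom: int, cell_w: int = 60, cell_h: int = 80) -> int:
    """Number of grid cells needed to cover the whole map at a zoom level."""
    size = world_size(zoom)
    return math.ceil(size / cell_w) * math.ceil(size / cell_h)


print(world_size(21))           # 536870912
print(f"{cell_count(21):.2e}")  # 6.00e+13, i.e. about 60 * 10^12 cells
```

Clustering in pixel space rather than in (lat, lon) means that one W × H cell always covers the same screen area, whatever the zoom level.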


Some issues

- Photos are added or deleted dynamically

- Queries for a certain time, a certain user, or by photo description

- Different zoom levels, moving map


Hierarchical Clustering on server


Hierarchical Clustering on server

Individual clustering for different zoom levels

Clustering of whole data

How to extract clusters for a specific query?

Can clusters for a lower zoom level be derived from those of a higher level?


Client side clustering

- Query the server (resulting in N objects)

- Take the zoom view

Not too many cells

- Take only the objects in the zoom view and cluster them (M objects)

- Finding the objects in the zoom view takes O(N)!
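A minimal Python sketch of the O(N) step, assuming each marker has already been projected to pixel coordinates for the current zoom level and that the view is an axis-aligned rectangle; `Marker` and `in_view` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    x: float  # pixel x at the current zoom level
    y: float  # pixel y at the current zoom level


def in_view(markers, xmin, ymin, xmax, ymax):
    """Return the M markers inside the view rectangle using one pass over all N."""
    return [m for m in markers if xmin <= m.x <= xmax and ymin <= m.y <= ymax]


visible = in_view([Marker(10, 10), Marker(500, 20), Marker(30, 40)], 0, 0, 100, 100)
print(len(visible))  # 2
```

Only the M markers returned here go on to the clustering step, so the clustering cost depends on M rather than N, but the filter itself still touches all N objects.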


Grid-based clustering

Input
- Location (lat, lon) of the markers
- Width and height of the markers (Wm, Hm)
- Width and height of the cells in the grid (W, H)

Output
- Location of the clusters

[Figure: a marker of size Wm × Hm at its location inside a grid cell of size W × H]


Representation - Middle of cell

- No overlap
- Locations can be misleading


Representation - First object


Representation - Average location
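The three representation options can be written as small functions. This is a sketch, assuming a cell is identified by 0-based (row, column) indices in a grid of W × H cells anchored at (xmin, ymin) and that markers are (x, y) pixel tuples; the function names are hypothetical.

```python
def cell_center(row, column, xmin, ymin, w, h):
    """'Middle of cell': a fixed position per cell, so cluster icons never overlap."""
    return (xmin + (column + 0.5) * w, ymin + (row + 0.5) * h)


def first_object(points):
    """'First object': the location of the first marker placed in the cell."""
    return points[0]


def average_location(points):
    """'Average location': the centroid of the markers in the cell."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


pts = [(12.0, 14.0), (18.0, 10.0), (15.0, 18.0)]  # markers in cell (0, 0)
print(cell_center(0, 0, 0.0, 0.0, 60.0, 80.0))  # (30.0, 40.0)
print(first_object(pts))                         # (12.0, 14.0)
print(average_location(pts))                     # (15.0, 14.0)
```

The middle of the cell can be far from every actual marker, which is why such locations can be misleading. The average location follows the data at the cost of possible overlaps between neighbouring clusters.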


Proposed approach

- The grid starts from the origin of the whole map, not of the current view
- The grid is extended over the current zoom view
  → Moving the map does not change the clusters
- The average location is used as the representative
  → Moving the map does not change the clusters

[Figure: grid of W × H cells covering the view from (xmin, ymin) to (xmax, ymax)]
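Anchoring the grid at the map origin can be illustrated by snapping the current view outwards to multiples of W and H. A minimal sketch, assuming pixel coordinates whose origin (0, 0) is the corner of the whole map; `aligned_grid_bounds` is a hypothetical name.

```python
import math


def aligned_grid_bounds(view_xmin, view_ymin, view_xmax, view_ymax, w, h):
    """Extend the view to the global grid whose cells start at the map origin."""
    xmin = math.floor(view_xmin / w) * w
    ymin = math.floor(view_ymin / h) * h
    xmax = math.ceil(view_xmax / w) * w
    ymax = math.ceil(view_ymax / h) * h
    return xmin, ymin, xmax, ymax


print(aligned_grid_bounds(130, 50, 400, 300, 60, 80))  # (120, 0, 420, 320)
print(aligned_grid_bounds(140, 60, 410, 310, 60, 80))  # (120, 0, 420, 320) after panning
```

Because every cell edge lies on a multiple of W or H, a cell that stays visible while the map is moved keeps the same boundaries and therefore the same members.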


Algorithm

1. nRow = ceil((ymax-ymin)/H)
2. nColumn = ceil((xmax-xmin)/W)
3. nCell = nRow * nColumn
4. Clusters = all cells // empty clusters
5. For all the markers
6.     row = floor((y-ymin)/H)
7.     column = floor((x-xmin)/W)
8.     cellNum = row*nColumn + column
9.     Add the marker to Clusters[cellNum]
10.    Update the cluster Clusters[cellNum]

[Figure: 5 × 5 grid of W × H cells between (xmin, ymin) and (xmax, ymax); cells are numbered row by row, 1 to 5 in the first row, 6 to 10 in the second, up to 25; the marker at (x, y) falls into one of them]
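A runnable Python version of the grid algorithm, offered as a sketch. It assumes markers are (x, y) pixel tuples, numbers cells from 0 (the figure numbers them from 1), runs rows along y with height H and columns along x with width W, and implements "update the cluster" as a running average of the members, i.e. the average-location representative. Markers outside [xmin, xmax) × [ymin, ymax) are skipped; `grid_cluster` is a hypothetical name.

```python
import math


def grid_cluster(markers, xmin, ymin, xmax, ymax, w, h):
    """Assign (x, y) markers to W x H cells; return {cellNum: (n, mean x, mean y)}."""
    n_row = math.ceil((ymax - ymin) / h)     # rows run along y
    n_column = math.ceil((xmax - xmin) / w)  # columns run along x
    n_cell = n_row * n_column
    # One initially empty cluster per cell: [size, mean x, mean y]
    clusters = [[0, 0.0, 0.0] for _ in range(n_cell)]
    for x, y in markers:
        if not (xmin <= x < xmax and ymin <= y < ymax):
            continue  # outside the gridded area
        row = math.floor((y - ymin) / h)
        column = math.floor((x - xmin) / w)
        cell = clusters[row * n_column + column]
        # Update the cluster: an incremental mean keeps the average location current
        cell[0] += 1
        cell[1] += (x - cell[1]) / cell[0]
        cell[2] += (y - cell[2]) / cell[0]
    return {i: tuple(c) for i, c in enumerate(clusters) if c[0] > 0}


print(grid_cluster([(10, 10), (20, 30), (70, 10), (130, 90)], 0, 0, 180, 160, 60, 80))
# {0: (2, 15.0, 20.0), 1: (1, 70.0, 10.0), 5: (1, 130.0, 90.0)}
```

Each marker is handled once, so this step is O(N) in the number of markers plus O(nCell) to allocate the cells.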


Merging algorithm - Average location as representative

1. MergeClusters(clusters)
2.     Sort the clusters in descending order of size
3.     Set the parent of each cluster to the cluster itself
4.     k = 1 // K is the number of clusters
5.     while (k < K)
6.         if (cluster k is not "processed")
7.             checkNeighbors(k)
8.             mark cluster k "processed"
9.         k = k+1

10. checkNeighbors(k)
11.     cluster1 = clusters[k]
12.     for all 8 neighbors
13.         cluster2 = one of the neighbors
14.         if cluster2 is not an empty cell
15.             checkNeighbor(cluster1, cluster2)


Merging algorithm

1. checkNeighbor(cluster1, cluster2)
2.     find the distance d between the two clusters
3.     if d < T // distance threshold T
4.         while (cluster2.parent is not cluster2) // cluster2 has been merged
5.             cluster2 = clusters[cluster2.parent]
6.         if (cluster2 is not cluster1)
7.             MergeClusters(cluster1, cluster2)

8. MergeClusters(cluster1, cluster2)
9.     n1 and n2: sizes of the clusters
10.    (x1, y1) and (x2, y2): locations of the clusters
11.    x = (n1*x1 + n2*x2)/(n1+n2)
12.    y = (n1*y1 + n2*y2)/(n1+n2)
13.    x1 ← x, y1 ← y, n1 ← n1+n2
14.    mark the second cluster "processed"
15.    cluster2.parent = k // k: index of cluster1
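A Python sketch of the merging step, under stated assumptions: the input is the non-empty cells produced by the grid step, keyed by 0-based row-major cell number with values [n, x, y]; distances are Euclidean in pixels; a merged cluster is recognised by its parent pointer no longer pointing to itself; and the surviving cluster's size is updated together with its average location. `merge_clusters` is a hypothetical name.

```python
import math


def merge_clusters(cells, n_column, t):
    """Merge neighbouring grid clusters whose locations are closer than t.

    cells: {cellNum: [n, x, y]} for the non-empty cells, numbered row by row.
    Returns the surviving clusters as (n, x, y) tuples.
    """
    cells = {c: list(v) for c, v in cells.items()}  # do not modify the caller's data
    order = sorted(cells, key=lambda c: cells[c][0], reverse=True)  # largest first
    parent = {c: c for c in cells}  # every cluster starts as its own parent
    processed = set()

    for k in order:
        if k in processed:  # already merged into another cluster
            continue
        row, col = divmod(k, n_column)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                r, c = row + dr, col + dc
                if (dr, dc) == (0, 0) or r < 0 or not 0 <= c < n_column:
                    continue
                nb = r * n_column + c
                if nb not in cells:  # empty cell
                    continue
                n1, x1, y1 = cells[k]
                if math.dist((x1, y1), cells[nb][1:]) >= t:
                    continue
                while parent[nb] != nb:  # follow parents of merged clusters
                    nb = parent[nb]
                if nb == k:
                    continue
                n2, x2, y2 = cells[nb]
                n = n1 + n2
                cells[k] = [n, (n1 * x1 + n2 * x2) / n, (n1 * y1 + n2 * y2) / n]
                parent[nb] = k
                processed.add(nb)
        processed.add(k)
    return [tuple(v) for c, v in cells.items() if parent[c] == c]


cells = {0: [3, 10.0, 10.0], 1: [1, 70.0, 10.0], 5: [1, 130.0, 90.0]}
print(merge_clusters(cells, n_column=3, t=70))  # [(4, 25.0, 10.0), (1, 130.0, 90.0)]
```

Processing the largest clusters first lets big clusters absorb their small neighbours rather than the other way round, and only the 8 neighbouring cells are ever compared.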


Grid-based clustering

Width and height of a cell: W > Wm and H > Hm

Minimum distance between markers to avoid overlap:

$$d = \sqrt{H_m^2 + W_m^2}$$

[Figure: two markers of size Wm × Hm whose locations are a distance d apart]
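The minimum distance d is a sufficient condition rather than a necessary one, which a small check makes concrete. This sketch assumes all markers have the same size Wm × Hm and the same anchor point relative to their location, so two markers overlap exactly when their horizontal offset is below Wm and their vertical offset is below Hm; the function names are hypothetical.

```python
import math


def min_safe_distance(wm, hm):
    """d = sqrt(Hm^2 + Wm^2): markers at least this far apart cannot overlap."""
    return math.hypot(wm, hm)


def overlap(p, q, wm, hm):
    """Two equally anchored Wm x Hm markers at p and q overlap iff both offsets are small."""
    return abs(p[0] - q[0]) < wm and abs(p[1] - q[1]) < hm


print(min_safe_distance(30, 40))         # 50.0
print(overlap((0, 0), (29, 39), 30, 40))  # True
print(overlap((0, 0), (30, 0), 30, 40))   # False, although 30 < d
```

If two locations are at least d apart, the offsets cannot both be small, because |dx| < Wm and |dy| < Hm together would give a distance below sqrt(Wm² + Hm²). The converse does not hold: markers can be closer than d along one axis and still not overlap.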


Distance-based clustering

Input
- Location (lat, lon) of the markers
- Width and height of the markers (Wm, Hm)

Output
- Location of the clusters

Time complexity: O(N^2)


Algorithm

1. i = 0
2. while (i < N) // N = number of markers
3.     if (marker i is not clustered)
4.         label marker i as clustered
5.         calculate the distance dj from marker i to every other non-clustered marker j
6.         for all non-clustered markers j
7.             if dj < T // T: distance threshold
8.                 merge markers i and j
9.                 label marker j as clustered
10.    i = i+1
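A Python sketch of the distance-based algorithm. The slides do not say where a merged cluster is drawn, so this version assumes the average location of its members; distances are Euclidean in pixels and measured from the seed marker i, and `distance_cluster` is a hypothetical name. Only markers after i are scanned, since every earlier marker is already clustered.

```python
import math


def distance_cluster(markers, t):
    """Greedy O(N^2) clustering: each unclustered marker i absorbs every later
    unclustered marker closer than t to it. Returns (n, mean x, mean y) tuples."""
    n = len(markers)
    clustered = [False] * n
    clusters = []
    for i in range(n):
        if clustered[i]:
            continue
        clustered[i] = True
        members = [markers[i]]
        for j in range(i + 1, n):
            if not clustered[j] and math.dist(markers[i], markers[j]) < t:
                members.append(markers[j])
                clustered[j] = True
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        clusters.append((len(members), cx, cy))
    return clusters


print(distance_cluster([(0, 0), (3, 4), (100, 100), (6, 8)], 6))
# [(2, 1.5, 2.0), (1, 100.0, 100.0), (1, 6.0, 8.0)]
```

The result depends on marker order: with T = 6, the marker at (6, 8) is only 5 pixels from (3, 4) but stays separate because it is 10 pixels from the seed (0, 0). The nested loop is the source of the O(N^2) time complexity.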


Timeline view of photos

Displaying n photos in a limited space


Timeline view of photos

Input
- Timestamps
- Number of clusters

Output
- Partitions

Algorithm
- K-means
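Because timestamps are one-dimensional, K-means fits in a few lines. This is a plain Lloyd's-algorithm sketch, assuming timestamps are numbers such as Unix seconds and initialising the centroids at evenly spaced sorted values (an assumption; the slides do not specify initialisation). `kmeans_1d` is a hypothetical name.

```python
def kmeans_1d(timestamps, k, iterations=100):
    """Lloyd's k-means on scalar timestamps; returns k lists of timestamps."""
    ts = sorted(timestamps)
    # Start from k evenly spaced values of the sorted data.
    centroids = [ts[(i * (len(ts) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for t in ts:
            nearest = min(range(k), key=lambda c: abs(t - centroids[c]))
            groups[nearest].append(t)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        new = [sum(g) / len(g) if g else centroids[c] for c, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return groups


print(kmeans_1d([1, 2, 3, 100, 101, 102, 500, 505], 3))
# [[1, 2, 3], [100, 101, 102], [500, 505]]
```

In one dimension every nearest-centroid group is a contiguous time interval, so each partition can be shown as one segment of the timeline.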


Location clusters

[Figure: map of location clusters labeled Homes of users, Shop, Walking street, Marketplace, Swimhall and Science Park]


Clustering of trajectories


Similarity or distance

Start and end of the routes
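One simple way to turn start and end points into a distance is to add the distance between the two starts and the distance between the two ends. The sketch assumes routes are lists of (x, y) points in a planar coordinate system such as metres; the summation and the name `endpoint_distance` are assumptions, not taken from the slides.

```python
import math


def endpoint_distance(route_a, route_b):
    """Dissimilarity of two routes using only their start and end points."""
    return math.dist(route_a[0], route_b[0]) + math.dist(route_a[-1], route_b[-1])


print(endpoint_distance([(0, 0), (5, 5), (10, 0)], [(0, 3), (10, 4)]))  # 7.0
```

Routes driven in opposite directions get a large distance under this definition; taking the minimum over both orientations would treat them as similar instead.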


Similarity or distance

Speed, length, acceleration, time, etc.

[Figure: routes with speeds of 70, 72, 50, 30 and 60 km/h; the 70 km/h and 72 km/h routes are more similar in speed than the others]
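Feature-based similarity can be sketched by deriving per-route features and comparing them. This sketch assumes routes are lists of (x, y, t) samples with x, y in metres and t in seconds, and compares routes by average speed only; the function names are hypothetical, and the other features on the slide (length, acceleration, time) could be compared in the same way.

```python
import math


def route_features(points):
    """Length (m), duration (s) and average speed (km/h) of a route.

    points: list of (x, y, t) samples, x and y in metres, t in seconds.
    """
    length = sum(math.dist(p[:2], q[:2]) for p, q in zip(points, points[1:]))
    duration = points[-1][2] - points[0][2]
    speed_kmh = length / duration * 3.6 if duration > 0 else 0.0
    return length, duration, speed_kmh


def speed_distance(route_a, route_b):
    """Dissimilarity of two routes by average speed alone (km/h)."""
    return abs(route_features(route_a)[2] - route_features(route_b)[2])


slow = [(0, 0, 0), (1000, 0, 60)]  # 1 km in 60 s, about 60 km/h
fast = [(0, 0, 0), (0, 1000, 50)]  # 1 km in 50 s, about 72 km/h
print(round(speed_distance(slow, fast), 6))  # 12.0
```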


Similarity or distance

Closeness of points and shape (comparing whole routes or segments of the routes)

[Figure: route T1 (points t1 to t8) and route T2 (points t1 to t4) compared in two ways: closest pair distance and sum of pair distances]
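Both measures can be sketched in Python. Closest pair distance is taken as the minimum distance over all point pairs of T1 and T2. For sum of pair distance, the figure pairs routes with different numbers of points (8 and 4), so this sketch assumes each point of T2 is paired with its nearest point in T1; a common variant for routes with equal numbers of points instead pairs the i-th points. Points are (x, y) tuples and the function names are hypothetical.

```python
import math


def closest_pair_distance(t1, t2):
    """Smallest distance between any point of T1 and any point of T2."""
    return min(math.dist(p, q) for p in t1 for q in t2)


def sum_of_pair_distance(t1, t2):
    """Sum over the points of T2 of the distance to the nearest point of T1
    (assumed pairing, since T1 and T2 may have different numbers of points)."""
    return sum(min(math.dist(p, q) for q in t1) for p in t2)


t1 = [(0, 0), (1, 0), (2, 0), (3, 0)]
t2 = [(0, 1), (3, 2)]
print(closest_pair_distance(t1, t2))  # 1.0
print(sum_of_pair_distance(t1, t2))   # 3.0
```

Closest pair distance only says whether the routes touch somewhere, while the sum also reflects how closely their shapes follow each other; both take O(|T1| · |T2|) time.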


Cluttering problem for routes