K-means Clustering
Islamic University of Lebanon
Faculty of Engineering Computer and Communication – Computer – Master
Prepared by: Ahmed Ramzi Rashid, Ahmed Sedeeq Baker
Supervised by: Dr. Ali Haroun
3rd semester, 2017-2-6



Page 2: K means clustering

Outline

1 - Introduction
    - Cluster
    - K-means
    - Problem
2 - Calculation using K-means (step 0, step 1, step 2)
3 - Results and suggestions
4 - References
5 - Index

Page 3: K means clustering

Introduction

Cluster: in general, a cluster is a grouping of objects such that the objects in the same group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.

Page 4: K means clustering

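K-means algorithm: how does it work?

Given the number of clusters (K), the K-means algorithm is carried out in three steps:

1 - Use initial seed points to partition the data.
2 - Compute the new centroids of the current partition.
3 - Repeat the first two steps.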
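The loop behind these steps can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the function name `kmeans`, the `max_iter` cap, and the rule of keeping the old centroid when a group is empty are all assumptions made for the example, and the four test points are invented.

```python
import math

def kmeans(points, seeds, max_iter=100):
    """Plain K-means on 2-D points; returns (labels, centroids)."""
    centroids = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # no point changed group: converged
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # assumption: keep the old centroid if a group is empty
                centroids[k] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids

# Invented toy data: two obvious groups, seeded with one point from each.
labels, centroids = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)],
                           seeds=[(0, 0), (10, 10)])
print(labels, centroids)
```

The loop stops as soon as an assignment pass leaves every label unchanged, which is the usual stopping rule for "repeat the first two steps".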


Page 5: K means clustering

Problem

A marketing researcher wishes to determine market segments in a community based on patterns of loyalty to brands and stores. A small sample of seven respondents is selected as a pilot test of how cluster analysis is applied. Two measures of loyalty, V1 (store loyalty) and V2 (brand loyalty), were measured for each respondent on a 0-10 scale.

Page 6: K means clustering

Step 1: Use initial seed points for partitioning

D0: C1 = (3, 2), group 1; C2 = (4, 5), group 2

With K = 2, we use the Euclidean distance to calculate the distance from each point P to each centroid C:

$d(P, C) = \sqrt{(x_P - x_C)^2 + (y_P - y_C)^2}$

      A     B     C     D     E     F     G
C1    0     3.16  5.09  5.09  5     6.40  3.61
C2    3.16  0     2     2.83  2.24  3.61  2.24
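The D0 distances and the resulting groups can be checked with a short Python sketch. The coordinates of the seven respondents are an assumption: the data table did not survive in the text, so they are read back from the seed points and from the sums used in the centroid formulas on the later slides. Values here are rounded, so 5.099 prints as 5.1 where the slides show 5.09.

```python
import math

# Assumed coordinates (V1, V2), recovered from the seeds and centroid sums
# on the slides rather than from an explicit data table.
points = {"A": (3, 2), "B": (4, 5), "C": (4, 7), "D": (2, 7),
          "E": (6, 6), "F": (7, 7), "G": (6, 4)}
seeds = {1: points["A"], 2: points["B"]}  # C1 and C2 for D0

groups = {}
for name, p in points.items():
    # Euclidean distance from this point to each seed.
    dists = {g: math.dist(p, c) for g, c in seeds.items()}
    groups[name] = min(dists, key=dists.get)  # nearest seed wins
    print(name, {g: round(d, 2) for g, d in dists.items()},
          "-> group", groups[name])
```

Under these assumed coordinates only A stays with C1; every other point is nearer to C2.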

Page 7: K means clustering

D0

d(A,C1) = 0       d(A,C2) = 3.16
d(B,C1) = 3.16    d(B,C2) = 0
d(C,C1) = 5.09    d(C,C2) = 2
d(D,C1) = 5.09    d(D,C2) = 2.83
d(E,C1) = 5       d(E,C2) = 2.24
d(F,C1) = 6.40    d(F,C2) = 3.61
d(G,C1) = 3.61    d(G,C2) = 2.24

With K = 2, we have two centroids, taken at random from the data: the points A and B.

Page 8: K means clustering

Step 2: Compute new centroids of the current partition

Group 1 contains only A, so C1 = (3, 2). Group 2 contains B, C, D, E, F and G:

C2 = ((4+4+2+6+7+6)/6, (5+7+7+6+7+4)/6) = (29/6, 36/6) ≈ (4.8, 6)

D1: C1 = (3, 2), group 1; C2 = (4.8, 6), group 2

      A     B     C     D     E     F     G
C1    0     3.16  5.09  5.09  5     6.40  3.61
C2    4.39  1.28  1.28  2.97  1.2   2.41  2.33
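The centroid update is just a coordinate-wise mean. As a small sketch (the helper name `centroid` is made up for this example, and the six coordinates are the terms of the two sums in the C2 formula, assumed to be listed in the order B to G):

```python
def centroid(members):
    """Mean point of a list of (x, y) tuples."""
    n = len(members)
    return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)

# Assumed coordinates of B..G, taken from the sums in the C2 formula.
group2 = [(4, 5), (4, 7), (2, 7), (6, 6), (7, 7), (6, 4)]
c2 = centroid(group2)
print(tuple(round(v, 2) for v in c2))  # (4.83, 6.0)
```

The exact x-coordinate is 29/6, about 4.83, which the slides shorten to 4.8.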

Page 9: K means clustering

D1

d(A,C1) = 0       d(A,C2) = 4.39
d(B,C1) = 3.16    d(B,C2) = 1.28
d(C,C1) = 5.09    d(C,C2) = 1.28
d(D,C1) = 5.09    d(D,C2) = 2.97
d(E,C1) = 5       d(E,C2) = 1.2
d(F,C1) = 6.40    d(F,C2) = 2.41
d(G,C1) = 3.61    d(G,C2) = 2.33

The first cluster contains only A, which is also its centroid. The second cluster has a new centroid and contains the points around it: B, C, D, E, F and G.

Page 10: K means clustering

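Step 3: Repeat the first two steps

Here G is grouped with A, so C1 is the mean of A and G, and C2 is the mean of B, C, D, E and F. Note that in D1 the point G is still nearer to C2 (2.33) than to C1 (3.61), so the nearest-centroid rule by itself would leave G in group 2.

C1 = ((3+6)/2, (2+4)/2) = (4.5, 3)
C2 = ((4+4+2+6+7)/5, (5+7+7+6+7)/5) = (4.6, 6.4)

D2: C1 = (4.5, 3), group 1; C2 = (4.6, 6.4), group 2

      A     B     C     D     E     F     G
C1    1.80  2.06  4.03  4.71  3.35  4.71  1.80
C2    4.68  1.52  0.84  2.66  1.46  2.47  2.77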
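To check these two centroids, and to see whether the grouping {A, G} / {B, C, D, E, F} stays put once it is reached, the following Python sketch recomputes the means and then reassigns every point to its nearest centroid. As before, the coordinates are an assumption recovered from the sums in the formulas, and the names `groups`, `nearest` and `stable` are only illustrative.

```python
import math

# Assumed coordinates (V1, V2), recovered from the sums on the slides.
points = {"A": (3, 2), "B": (4, 5), "C": (4, 7), "D": (2, 7),
          "E": (6, 6), "F": (7, 7), "G": (6, 4)}
groups = {1: ["A", "G"], 2: ["B", "C", "D", "E", "F"]}

def centroid(names):
    """Mean of the named points."""
    xs = [points[n][0] for n in names]
    ys = [points[n][1] for n in names]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

centroids = {g: centroid(names) for g, names in groups.items()}

# Reassign every point to its nearest centroid and compare with the grouping.
nearest = {n: min(centroids, key=lambda g: math.dist(p, centroids[g]))
           for n, p in points.items()}
stable = all(nearest[n] == g for g, names in groups.items() for n in names)
print(centroids, stable)
```

With these centroids no point changes group on reassignment, so this grouping is a fixed point of the two steps.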

Page 11: K means clustering

D2

d(A,C1) = 1.80    d(A,C2) = 4.68
d(B,C1) = 2.06    d(B,C2) = 1.52
d(C,C1) = 4.03    d(C,C2) = 0.84
d(D,C1) = 4.71    d(D,C2) = 2.66
d(E,C1) = 3.35    d(E,C2) = 1.46
d(F,C1) = 4.71    d(F,C2) = 2.47
d(G,C1) = 1.80    d(G,C2) = 2.77

The first cluster now has a new centroid and contains the points A and G. The centroid of the second cluster has also changed, and the points around it are B, C, D, E and F.

Page 12: K means clustering

Results and suggestions

- In this presentation we took a set of items and divided it into two groups using the K-means algorithm. The items in each group are similar to one another in some qualities, and each group has a centroid, the mean of its items, which minimizes the sum of squared distances to the items in the group.

- If we use K = 3 or more, might we reduce the running time and get better performance?
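One hedged way to approach the K = 3 question is to measure how tight a grouping is with the within-cluster sum of squared distances, sketched below in Python. The function name `sse` and the point coordinates are assumptions (the coordinates are recovered from the centroid sums on the slides). Two cautions apply: the best achievable value of this measure can only go down or stay the same as K grows, so it cannot pick K on its own, and each pass computes one distance per point per centroid, so a larger K does not obviously reduce running time.

```python
def sse(clusters):
    """Sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for members in clusters:
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members)
    return total

# Final grouping from the slides, with assumed coordinates.
group1 = [(3, 2), (6, 4)]                          # A, G
group2 = [(4, 5), (4, 7), (2, 7), (6, 6), (7, 7)]  # B, C, D, E, F

two_groups = sse([group1, group2])  # K = 2
one_group = sse([group1 + group2])  # K = 1, for comparison
print(round(two_groups, 2), round(one_group, 2))
```

The same function can be applied to any K = 3 grouping of the seven points to compare it against the two-group result.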

Page 13: K means clustering

References

1 - Pang-Ning Tan, Instructor's Solution Manual, Pearson Addison-Wesley, p. 125, 2006.

2 - Margaret Rouse, Cluster, http://searchexchange.techtarget.com/definition/cluster, p. 2, 2013.

3 - Slideshare, Clustering with K-means algorithm.

Page 14: K means clustering

Thank you all for listening

Page 15: K means clustering

Index: K-Means Clustering Visual Basic Code

Sub kMeanCluster(Data() As Variant, numCluster As Integer)
    ' main function to cluster data into k number of Clusters
    ' input:
    '   + Data matrix (0 to 2, 1 to TotalData);
    '     Row 0 = cluster, 1 = X, 2 = Y; data in columns
    '   + numCluster: number of cluster user want the data to be clustered
    '   + private variables: Centroid, TotalData
    ' output:
    '   o) update centroid
    '   o) assign cluster number to the Data (= row 0 of Data)

    Dim i As Integer
    Dim j As Integer
    Dim X As Single
    Dim Y As Single
    Dim min As Single
    Dim cluster As Integer
    Dim d As Single
    Dim sumXY()
    Dim isStillMoving As Boolean

    isStillMoving = True
    If totalData <= numCluster Then
        ' only the last data is put here because it designed to be interactive
        Data(0, totalData) = totalData               ' cluster No = total data
        Centroid(1, totalData) = Data(1, totalData)  ' X
        Centroid(2, totalData) = Data(2, totalData)  ' Y
    Else
        ' calculate minimum distance to assign the new data
        min = 10 ^ 10 ' big number
        X = Data(1, totalData)
        Y = Data(2, totalData)
        For i = 1 To numCluster
            d = dist(X, Y, Centroid(1, i), Centroid(2, i))
            If d < min Then
                min = d
                cluster = i
            End If
        Next i
        Data(0, totalData) = cluster

        Do While isStillMoving
            ' this loop will surely convergent
            ' calculate new centroids
            ' 1 = X, 2 = Y, 3 = count number of data
            ReDim sumXY(1 To 3, 1 To numCluster)
            For i = 1 To totalData
                sumXY(1, Data(0, i)) = Data(1, i) + sumXY(1, Data(0, i))
                sumXY(2, Data(0, i)) = Data(2, i) + sumXY(2, Data(0, i))
                sumXY(3, Data(0, i)) = 1 + sumXY(3, Data(0, i))
            Next i
            For i = 1 To numCluster
                Centroid(1, i) = sumXY(1, i) / sumXY(3, i)
                Centroid(2, i) = sumXY(2, i) / sumXY(3, i)
            Next i

            ' assign all data to the new centroids
            isStillMoving = False
            For i = 1 To totalData
                min = 10 ^ 10 ' big number
                X = Data(1, i)
                Y = Data(2, i)
                For j = 1 To numCluster
                    d = dist(X, Y, Centroid(1, j), Centroid(2, j))
                    If d < min Then
                        min = d
                        cluster = j
                    End If
                Next j
                If Data(0, i) <> cluster Then
                    Data(0, i) = cluster
                    isStillMoving = True
                End If
            Next i
        Loop
    End If
End Sub