K-means Clustering
Islamic University of Lebanon
Faculty of Engineering Computer and Communication – Computer – Master
Prepared by: Ahmed Ramzi Rashid, Ahmed Sedeeq Baker
Supervised by: Dr. Ali Haroun
3rd semester, 2017-2-6
Outline
1 - Introduction.
- Cluster.
- K-means.
- Problem.
2 - Calculation using k-means.
(step 0, step 1, step 2)
3 - Results and suggestions.
4 - References.
5 - Index.
Introduction
Cluster: In general, a cluster is a grouping of objects such that the objects in the same group (cluster) are similar (or related) to one another and different from (or unrelated to) the objects in other groups.
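K-means algorithm: how does it work?
Given the number of clusters (K), the K-means algorithm is carried out in three steps:
1 - Use initial seed points for partitioning: assign each object to its nearest seed point (centroid).
2 - Compute new centroids of the current partition (the mean of the objects in each group).
3 - Repeat the first two steps, reassigning each object to its nearest centroid, until no object changes group.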
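The three steps can be sketched in Python as below. This is a minimal sketch, not the presenters' code: it assumes 2-D points and Euclidean distance, and the names `kmeans`, `seeds` and `max_passes` are hypothetical. The respondent scores in `data` are the ones implied by the calculations later in these slides.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def kmeans(points: Dict[str, Point], seeds: List[Point], max_passes: int = 100):
    """Hypothetical helper: plain k-means on 2-D points.

    Returns (labels, centroids, passes): labels maps each point name to a
    0-based cluster index; passes counts the assignment passes performed.
    """
    centroids = list(seeds)
    labels: Dict[str, int] = {}
    for passes in range(1, max_passes + 1):
        # Steps 1 and 3: assign every point to its nearest centroid
        # (min() keeps the lowest cluster index on ties).
        new_labels = {
            name: min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for name, p in points.items()
        }
        if new_labels == labels:
            return labels, centroids, passes  # no point moved: stop
        labels = new_labels
        # Step 2: move each centroid to the mean of its current members.
        for k in range(len(centroids)):
            members = [points[n] for n, lab in labels.items() if lab == k]
            if members:  # an empty cluster keeps its old centroid
                xs, ys = zip(*members)
                centroids[k] = (sum(xs) / len(xs), sum(ys) / len(ys))
    return labels, centroids, max_passes


# (V1, V2) scores inferred from the slide calculations (assumption).
data = {"A": (3, 2), "B": (4, 5), "C": (4, 7), "D": (2, 7),
        "E": (6, 6), "F": (7, 7), "G": (6, 4)}
labels, centroids, passes = kmeans(data, seeds=[(3, 2), (4, 5)])
print(labels, centroids, passes)
```

Starting from the slides' seeds A = (3, 2) and B = (4, 5), this sketch stops after the second assignment pass with groups {A} and {B, C, D, E, F, G}. Ties, which the slides do not discuss, go to the lower-numbered cluster.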
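A marketing researcher wishes to determine market segments in a community based on patterns of loyalty to brands and stores. A small sample of seven respondents is selected as a pilot test of how cluster analysis is applied. Two measures of loyalty, V1 (store loyalty) and V2 (brand loyalty), were measured for each respondent on a 0–10 scale. The scores used in the calculations below are:
Respondent | A | B | C | D | E | F | G
V1 (store loyalty) | 3 | 4 | 4 | 2 | 6 | 7 | 6
V2 (brand loyalty) | 2 | 5 | 7 | 7 | 6 | 7 | 4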
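We now use the Euclidean distance to compute the distance from each respondent P to each centroid C (K = 2):
$$d(P, C) = \sqrt{(V_{1,P} - V_{1,C})^2 + (V_{2,P} - V_{2,C})^2}$$
D0 | A | B | C | D | E | F | G
C1 = (3, 2) | 0 | 3.16 | 5.10 | 5.10 | 5.00 | 6.40 | 3.61
C2 = (4, 5) | 3.16 | 0 | 2.00 | 2.83 | 2.24 | 3.61 | 2.24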
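The D0 matrix can be reproduced with a short Python sketch. The scores in `DATA` are inferred from the slide calculations (an assumption), `distance_matrix` is a hypothetical helper, and values are rounded to two decimals as in the table.

```python
import math

# (V1 store loyalty, V2 brand loyalty), inferred from the slide calculations.
DATA = {"A": (3, 2), "B": (4, 5), "C": (4, 7), "D": (2, 7),
        "E": (6, 6), "F": (7, 7), "G": (6, 4)}


def distance_matrix(points, centroids):
    """Euclidean distance from every centroid to every point, rounded to 2 d.p."""
    return {
        c_name: {p_name: round(math.dist(p, c), 2) for p_name, p in points.items()}
        for c_name, c in centroids.items()
    }


D0 = distance_matrix(DATA, {"C1": (3, 2), "C2": (4, 5)})
for row_name, row in D0.items():
    print(row_name, row)
```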
Step 1: Use initial seed points for partitioning
D0: C1 = (3, 2) for group 1, C2 = (4, 5) for group 2
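D0
$d(A, C_1) = \sqrt{(3-3)^2 + (2-2)^2} = 0$, $d(A, C_2) = \sqrt{(3-4)^2 + (2-5)^2} = 3.16$
$d(B, C_1) = \sqrt{(4-3)^2 + (5-2)^2} = 3.16$, $d(B, C_2) = \sqrt{(4-4)^2 + (5-5)^2} = 0$
$d(C, C_1) = \sqrt{(4-3)^2 + (7-2)^2} = 5.10$, $d(C, C_2) = \sqrt{(4-4)^2 + (7-5)^2} = 2$
$d(D, C_1) = \sqrt{(2-3)^2 + (7-2)^2} = 5.10$, $d(D, C_2) = \sqrt{(2-4)^2 + (7-5)^2} = 2.83$
$d(E, C_1) = \sqrt{(6-3)^2 + (6-2)^2} = 5$, $d(E, C_2) = \sqrt{(6-4)^2 + (6-5)^2} = 2.24$
$d(F, C_1) = \sqrt{(7-3)^2 + (7-2)^2} = 6.40$, $d(F, C_2) = \sqrt{(7-4)^2 + (7-5)^2} = 3.61$
$d(G, C_1) = \sqrt{(6-3)^2 + (4-2)^2} = 3.61$, $d(G, C_2) = \sqrt{(6-4)^2 + (4-5)^2} = 2.24$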
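The two initial centroids are taken at random as respondents A and B (K = 2). Assigning each respondent to the nearer centroid puts A in group 1 and B, C, D, E, F, G in group 2.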
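Step 1's rule, assign each respondent to the nearer seed, can be sketched as follows. `nearest` is a hypothetical helper, the scores are inferred from the slide calculations, and ties (not covered in the slides) would go to the first centroid listed.

```python
import math

# (V1, V2) scores inferred from the slide calculations (assumption).
DATA = {"A": (3, 2), "B": (4, 5), "C": (4, 7), "D": (2, 7),
        "E": (6, 6), "F": (7, 7), "G": (6, 4)}
SEEDS = {"C1": (3, 2), "C2": (4, 5)}  # respondents A and B


def nearest(points, centroids):
    """Map each point name to the name of its nearest centroid (Euclidean)."""
    return {
        name: min(centroids, key=lambda c: math.dist(p, centroids[c]))
        for name, p in points.items()
    }


groups = nearest(DATA, SEEDS)
group1 = [n for n, g in groups.items() if g == "C1"]
group2 = [n for n, g in groups.items() if g == "C2"]
print("group 1:", group1, "group 2:", group2)
```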
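Step 2: Compute new centroids of the current partition
Group 1 contains only A, so C1 stays at A; C2 is the mean of B, C, D, E, F, G:
C1 = (3, 2)
C2 = ((4 + 4 + 2 + 6 + 7 + 6)/6, (5 + 7 + 7 + 6 + 7 + 4)/6) = (29/6, 6) ≈ (4.83, 6), rounded to (4.8, 6) in the distances below
D1: C1 = (3, 2) for group 1, C2 = (4.8, 6) for group 2
D1 | A | B | C | D | E | F | G
C1 = (3, 2) | 0 | 3.16 | 5.10 | 5.10 | 5.00 | 6.40 | 3.61
C2 = (4.8, 6) | 4.39 | 1.28 | 1.28 | 2.97 | 1.20 | 2.42 | 2.33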
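The Step 2 update (each centroid becomes the mean of its group) and the C2 row of D1 can be checked like this. It is a sketch: `mean_point` is a hypothetical helper, the scores are inferred from the slide calculations, and, following the slides, the distances use C2 rounded to one decimal.

```python
import math

# (V1, V2) scores inferred from the slide calculations (assumption).
DATA = {"A": (3, 2), "B": (4, 5), "C": (4, 7), "D": (2, 7),
        "E": (6, 6), "F": (7, 7), "G": (6, 4)}


def mean_point(points):
    """Centroid (component-wise mean) of a list of 2-D points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


c2_exact = mean_point([DATA[n] for n in "BCDEFG"])  # (29/6, 6.0)
c2 = (round(c2_exact[0], 1), c2_exact[1])  # rounded as in the slides
d1_c2 = {n: round(math.dist(p, c2), 2) for n, p in DATA.items()}
print(c2_exact, c2, d1_c2)
```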
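D1 (C1 is unchanged, so its distances are the same as in D0)
$d(A, C_1) = 0$, $d(A, C_2) = \sqrt{(3-4.8)^2 + (2-6)^2} = 4.39$
$d(B, C_1) = 3.16$, $d(B, C_2) = \sqrt{(4-4.8)^2 + (5-6)^2} = 1.28$
$d(C, C_1) = 5.10$, $d(C, C_2) = \sqrt{(4-4.8)^2 + (7-6)^2} = 1.28$
$d(D, C_1) = 5.10$, $d(D, C_2) = \sqrt{(2-4.8)^2 + (7-6)^2} = 2.97$
$d(E, C_1) = 5$, $d(E, C_2) = \sqrt{(6-4.8)^2 + (6-6)^2} = 1.20$
$d(F, C_1) = 6.40$, $d(F, C_2) = \sqrt{(7-4.8)^2 + (7-6)^2} = 2.42$
$d(G, C_1) = 3.61$, $d(G, C_2) = \sqrt{(6-4.8)^2 + (4-6)^2} = 2.33$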
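The first cluster keeps A as its centroid, and the second cluster, with its new centroid, still contains B, C, D, E, F, G. Every respondent, including G (2.33 to C2 versus 3.61 to C1), is nearer to the centroid of its current group, so no assignment changes; under the usual stopping rule (stop when no assignment changes) the algorithm would end here.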
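A quick way to confirm that no respondent changes group after Step 2 is to reassign everyone with the D1 centroids and compare with the Step 1 grouping. This sketch assumes the scores inferred from the slide calculations and the slides' rounded C2 = (4.8, 6); the variable names are hypothetical.

```python
import math

# (V1, V2) scores inferred from the slide calculations (assumption).
DATA = {"A": (3, 2), "B": (4, 5), "C": (4, 7), "D": (2, 7),
        "E": (6, 6), "F": (7, 7), "G": (6, 4)}
STEP1_GROUPS = {"A": "C1", **{n: "C2" for n in "BCDEFG"}}
D1_CENTROIDS = {"C1": (3, 2), "C2": (4.8, 6)}

# Reassign each respondent to the nearest D1 centroid.
step2_groups = {
    name: min(D1_CENTROIDS, key=lambda c: math.dist(p, D1_CENTROIDS[c]))
    for name, p in DATA.items()
}
moved = [n for n in DATA if step2_groups[n] != STEP1_GROUPS[n]]
print("moved:", moved)  # an empty list means the grouping is stable
```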
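Step 3: Repeat the first two steps
In this step G is placed in group 1 together with A, so group 1 = {A, G} and group 2 = {B, C, D, E, F}:
C1 = ((3 + 6)/2, (2 + 4)/2) = (4.5, 3)
C2 = ((4 + 4 + 2 + 6 + 7)/5, (5 + 7 + 7 + 6 + 7)/5) = (4.6, 6.4)
D2: C1 = (4.5, 3) for group 1, C2 = (4.6, 6.4) for group 2
D2 | A | B | C | D | E | F | G
C1 = (4.5, 3) | 1.80 | 2.06 | 4.03 | 4.72 | 3.35 | 4.72 | 1.80
C2 = (4.6, 6.4) | 4.68 | 1.52 | 0.85 | 2.67 | 1.46 | 2.47 | 2.78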
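The Step 3 centroids follow directly from the groups {A, G} and {B, C, D, E, F}. A sketch with a hypothetical `mean_point` helper and the scores inferred from the slide calculations:

```python
# (V1, V2) scores inferred from the slide calculations (assumption).
DATA = {"A": (3, 2), "B": (4, 5), "C": (4, 7), "D": (2, 7),
        "E": (6, 6), "F": (7, 7), "G": (6, 4)}


def mean_point(points):
    """Centroid (component-wise mean) of a list of 2-D points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


c1 = mean_point([DATA[n] for n in "AG"])     # group 1 = {A, G}
c2 = mean_point([DATA[n] for n in "BCDEF"])  # group 2 = {B, C, D, E, F}
print(c1, c2)
```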
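D2
$d(A, C_1) = \sqrt{(3-4.5)^2 + (2-3)^2} = 1.80$, $d(A, C_2) = \sqrt{(3-4.6)^2 + (2-6.4)^2} = 4.68$
$d(B, C_1) = \sqrt{(4-4.5)^2 + (5-3)^2} = 2.06$, $d(B, C_2) = \sqrt{(4-4.6)^2 + (5-6.4)^2} = 1.52$
$d(C, C_1) = \sqrt{(4-4.5)^2 + (7-3)^2} = 4.03$, $d(C, C_2) = \sqrt{(4-4.6)^2 + (7-6.4)^2} = 0.85$
$d(D, C_1) = \sqrt{(2-4.5)^2 + (7-3)^2} = 4.72$, $d(D, C_2) = \sqrt{(2-4.6)^2 + (7-6.4)^2} = 2.67$
$d(E, C_1) = \sqrt{(6-4.5)^2 + (6-3)^2} = 3.35$, $d(E, C_2) = \sqrt{(6-4.6)^2 + (6-6.4)^2} = 1.46$
$d(F, C_1) = \sqrt{(7-4.5)^2 + (7-3)^2} = 4.72$, $d(F, C_2) = \sqrt{(7-4.6)^2 + (7-6.4)^2} = 2.47$
$d(G, C_1) = \sqrt{(6-4.5)^2 + (4-3)^2} = 1.80$, $d(G, C_2) = \sqrt{(6-4.6)^2 + (4-6.4)^2} = 2.78$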
The first cluster now has a new centroid and contains A and G. The second cluster's centroid has also moved, and it contains B, C, D, E, F. With the D2 distances every respondent is again nearest to the centroid of its own group, so this grouping is also stable.
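To check that the Step 3 grouping is stable, reassign every respondent with the D2 centroids. A sketch, assuming the scores inferred from the slide calculations; the variable names are hypothetical.

```python
import math

# (V1, V2) scores inferred from the slide calculations (assumption).
DATA = {"A": (3, 2), "B": (4, 5), "C": (4, 7), "D": (2, 7),
        "E": (6, 6), "F": (7, 7), "G": (6, 4)}
D2_CENTROIDS = {"C1": (4.5, 3), "C2": (4.6, 6.4)}

# Nearest-centroid assignment with the Step 3 centroids.
groups = {
    name: min(D2_CENTROIDS, key=lambda c: math.dist(p, D2_CENTROIDS[c]))
    for name, p in DATA.items()
}
print(sorted(n for n, g in groups.items() if g == "C1"))  # ['A', 'G']
print(sorted(n for n, g in groups.items() if g == "C2"))  # ['B', 'C', 'D', 'E', 'F']
```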
Results and suggestions:
- In this presentation we took a set of seven items and used the k-means algorithm to divide them into two groups, so that the items in each group are similar to one another in the measured qualities (store and brand loyalty). Each group has a centroid, the mean of its items, which gives the minimum sum of squared distances to the items in that group.
- Suggestion: if we use K = 3 or more, might we reduce the running time or obtain better performance?
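One common yardstick for comparing groupings is the within-cluster sum of squared distances (SSE), which no k-means pass increases. The sketch below compares the two groupings that appear in these slides; the `sse` helper is hypothetical and the scores are inferred from the slide calculations.

```python
# (V1, V2) scores inferred from the slide calculations (assumption).
DATA = {"A": (3, 2), "B": (4, 5), "C": (4, 7), "D": (2, 7),
        "E": (6, 6), "F": (7, 7), "G": (6, 4)}


def sse(groups):
    """Sum over groups of squared Euclidean distances from members to the group mean."""
    total = 0.0
    for names in groups:
        pts = [DATA[n] for n in names]
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in pts)
    return total


print(round(sse(["A", "BCDEFG"]), 2))  # Step 2 grouping
print(round(sse(["AG", "BCDEF"]), 2))  # Step 3 grouping
```

With this data the sketch gives about 24.83 for the Step 2 grouping and 24.90 for the Step 3 grouping, so by this measure the two groupings are almost equally compact.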
References
1 - Pang-Ning Tan, Instructor's Solution Manual, Pearson Addison-Wesley, p. 125, 2006.
2 - Margaret Rouse, Cluster, http://searchexchange.techtarget.com/definition/cluster, p. 2, 2013.
3 - Slideshare, Clustering with K-means algorithm.
Thank you all for listening.
Index: K-Means Clustering Visual Basic Code
' Module-level ("private") variables used by kMeanCluster.
' Assumption: the caller sizes Centroid as (1 To 2, 1 To numCluster)
' and keeps totalData equal to the number of data columns.
Private Centroid() As Single
Private totalData As Integer

Sub kMeanCluster(Data() As Variant, numCluster As Integer)
    ' Main routine: clusters the data into numCluster (k) clusters.
    ' Input:
    '   + Data matrix (0 To 2, 1 To totalData); data are stored in columns:
    '     row 0 = cluster number, row 1 = X, row 2 = Y
    '   + numCluster: number of clusters the user wants
    '   + private variables: Centroid, totalData
    ' Output:
    '   o) updated centroids
    '   o) cluster number assigned to each datum (row 0 of Data)
    Dim i As Integer
    Dim j As Integer
    Dim X As Single
    Dim Y As Single
    Dim min As Single
    Dim cluster As Integer
    Dim d As Single
    Dim sumXY()
    Dim isStillMoving As Boolean

    isStillMoving = True
    If totalData <= numCluster Then
        ' Only the newest datum is handled here because the routine is designed
        ' to be interactive: each of the first k data becomes its own cluster.
        Data(0, totalData) = totalData              ' cluster No = total data
        Centroid(1, totalData) = Data(1, totalData) ' X
        Centroid(2, totalData) = Data(2, totalData) ' Y
    Else
        ' Assign the new datum to the cluster with the nearest centroid.
        min = 10 ^ 10 ' big number
        X = Data(1, totalData)
        Y = Data(2, totalData)
        For i = 1 To numCluster
            d = dist(X, Y, Centroid(1, i), Centroid(2, i))
            If d < min Then
                min = d
                cluster = i
            End If
        Next i
        Data(0, totalData) = cluster

        Do While isStillMoving
            ' This loop terminates: k-means converges after finitely many passes.
            ' Recalculate the centroids; sumXY rows: 1 = sum of X, 2 = sum of Y, 3 = count.
            ReDim sumXY(1 To 3, 1 To numCluster)
            For i = 1 To totalData
                sumXY(1, Data(0, i)) = Data(1, i) + sumXY(1, Data(0, i))
                sumXY(2, Data(0, i)) = Data(2, i) + sumXY(2, Data(0, i))
                sumXY(3, Data(0, i)) = 1 + sumXY(3, Data(0, i))
            Next i
            For i = 1 To numCluster
                If sumXY(3, i) > 0 Then ' skip empty clusters to avoid dividing by zero
                    Centroid(1, i) = sumXY(1, i) / sumXY(3, i)
                    Centroid(2, i) = sumXY(2, i) / sumXY(3, i)
                End If
            Next i

            ' Reassign all data to the nearest new centroid.
            isStillMoving = False
            For i = 1 To totalData
                min = 10 ^ 10 ' big number
                X = Data(1, i)
                Y = Data(2, i)
                For j = 1 To numCluster
                    d = dist(X, Y, Centroid(1, j), Centroid(2, j))
                    If d < min Then
                        min = d
                        cluster = j
                    End If
                Next j
                If Data(0, i) <> cluster Then
                    Data(0, i) = cluster
                    isStillMoving = True
                End If
            Next i
        Loop
    End If
End Sub

Private Function dist(X1 As Single, Y1 As Single, X2 As Single, Y2 As Single) As Single
    ' Euclidean distance between (X1, Y1) and (X2, Y2).
    ' Assumed helper: kMeanCluster calls dist, but its definition is not shown in the slides.
    dist = Sqr((X2 - X1) ^ 2 + (Y2 - Y1) ^ 2)
End Function