Unsupervised Optimal Fuzzy Clustering. Presented by Asya Nikitina. I. Gath and A. B. Geva, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1989, 11(7), 773-781.


Page 1: Unsupervised Optimal Fuzzy Clustering

Unsupervised Optimal Fuzzy Clustering

Presented by Asya Nikitina

I. Gath and A. B. Geva. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1989, 11(7), 773-781

Page 2: Unsupervised Optimal Fuzzy Clustering

Fuzzy Sets and Membership Functions

You are approaching a red light and must advise a driving student when to apply the brakes. What would you say: “Begin braking 74 feet from the crosswalk” or “Apply the brakes pretty soon”?

Everyday language is one example of the ways vagueness is used and propagated.

Imprecision in data and information gathered from and about our environment is either statistical (e.g., the outcome of a coin toss is a matter of chance) or nonstatistical (e.g., “apply the brakes pretty soon”).

This latter type of uncertainty is called fuzziness.

Page 3: Unsupervised Optimal Fuzzy Clustering

Fuzzy Sets and Membership Functions

We all assimilate and use fuzzy data, vague rules, and imprecise information.

Accordingly, computational models of real systems should also be able to recognize, represent, manipulate, interpret, and use both fuzzy and statistical uncertainties.

Statistical models deal with random events and outcomes; fuzzy models attempt to capture and quantify nonrandom imprecision.

Page 4: Unsupervised Optimal Fuzzy Clustering

Fuzzy Sets and Membership Functions

Conventional (or crisp) sets contain objects that satisfy precise properties required for membership. For example, the set of numbers H from 6 to 8 is crisp:

H = {r ∈ ℛ | 6 ≤ r ≤ 8}

mH(r) = 1 if 6 ≤ r ≤ 8; mH(r) = 0 otherwise (mH is the membership function of H)

Crisp sets correspond to 2-valued logic: is or isn’t, on or off, black or white, 1 or 0.

Page 5: Unsupervised Optimal Fuzzy Clustering

Fuzzy Sets and Membership Functions

Fuzzy sets contain objects that satisfy imprecise properties to varying degrees, for example, the “set” of numbers F that are “close to 7.”

In the case of fuzzy sets, the membership function mF(r) maps numbers into the entire unit interval [0,1]. The value mF(r) is called the grade of membership of r in F.

Fuzzy sets correspond to continuously-valued logic: all shades of gray between black (= 1) and white (= 0).
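The two kinds of membership are easy to sketch in code. In the Python sketch below, m_H is the crisp set from the previous slide; the triangular shape and width chosen for “close to 7” are illustrative assumptions, since any reasonable mF would do:

```python
def m_H(r):
    # crisp set H = {r | 6 <= r <= 8}: membership is exactly 0 or 1
    return 1.0 if 6 <= r <= 8 else 0.0

def m_F(r):
    # fuzzy set F of numbers "close to 7": one possible (triangular)
    # membership function -- the shape and width are the modeler's choice
    return max(0.0, 1.0 - abs(r - 7.0) / 2.0)
```

Here 7 belongs to F fully, 6 only partially, and numbers far from 7 not at all, while m_H jumps abruptly between 0 and 1.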

Page 6: Unsupervised Optimal Fuzzy Clustering

Fuzzy Sets and Membership Functions

Because the property “close to 7” is fuzzy, there is not a unique membership function for F. Rather, it is left to the modeler to decide, based on the potential application and properties desired for F, what mF(r) should be like.

The membership function is the basic idea in fuzzy set theory; its values measure degrees to which objects satisfy imprecisely defined properties.

Fuzzy memberships represent similarities of objects to imprecisely defined properties.

Membership values determine how much fuzziness a fuzzy set contains.

Page 7: Unsupervised Optimal Fuzzy Clustering

Fuzziness and Probability

L = {all liquids}; ℒ = fuzzy subset of L: ℒ = {all potable liquids}

For two bottles A and B: mℒ(A) = 0.91 (the degree to which the liquid in A is potable), while Pr(B ∈ ℒ) = 0.91 (the probability that the liquid in B is potable).

What might the bottles contain: swamp water? beer? H2O? HCl?

Page 8: Unsupervised Optimal Fuzzy Clustering

Clustering

Clustering is a mathematical tool that attempts to discover structure or patterns in a data set, where the objects inside each cluster show a certain degree of similarity.

Page 9: Unsupervised Optimal Fuzzy Clustering

Hard clustering assigns each feature vector to one and only one cluster, with a degree of membership equal to one and well-defined boundaries between clusters.

Page 10: Unsupervised Optimal Fuzzy Clustering

Fuzzy clustering allows each feature vector to belong to more than one cluster, with membership degrees between 0 and 1 and vague or fuzzy boundaries between clusters.

Page 11: Unsupervised Optimal Fuzzy Clustering

Difficulties with Fuzzy Clustering

The optimal number of clusters K to be created has to be determined (the number of clusters cannot always be defined a priori and a good cluster validity criterion has to be found).

The character and location of the cluster prototypes (centers) are not necessarily known a priori, and initial guesses have to be made.

Page 12: Unsupervised Optimal Fuzzy Clustering

Difficulties with Fuzzy Clustering

Data characterized by large variability in cluster shape, cluster density, and the number of points (feature vectors) in different clusters have to be handled.

Page 13: Unsupervised Optimal Fuzzy Clustering

Objectives and Challenges

Create an algorithm for fuzzy clustering that partitions the data set into an optimal number of clusters.

This algorithm should account for variability in cluster shapes, cluster densities, and the number of data points in each of the subsets.

Cluster prototypes would be generated through a process of unsupervised learning.

Page 14: Unsupervised Optimal Fuzzy Clustering

The Fuzzy k-Means Algorithm

N – the number of feature vectors
K – the number of clusters (partitions)
q – weighting exponent (fuzzifier; q > 1)
uik – the degree of membership of the kth feature vector in the ith cluster (uik: X → [0,1])
Σi uik = 1; 0 < Σk uik < N
Vi – the cluster prototype (the mean of all feature vectors in cluster i, or the center of cluster i)
Jq(U,V) – the objective function

Page 15: Unsupervised Optimal Fuzzy Clustering

The Fuzzy k-Means Algorithm

Partition a set of feature vectors X into K clusters (subgroups) represented as fuzzy sets F1, F2, …, FK by minimizing the objective function Jq(U,V):

Jq(U,V) = Σi=1..K Σk=1..N (uik)q d2(Xk – Vi)

Larger membership values indicate higher confidence in the assignment of the pattern to the cluster.
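As a sketch, the objective can be evaluated with NumPy, assuming a Euclidean d2 and an N×K membership matrix U (the function and argument names are illustrative):

```python
import numpy as np

def objective_jq(X, V, U, q=2.0):
    # J_q(U, V) = sum_i sum_k (u_ik)^q * d^2(X_k - V_i)
    # X: (N, m) feature vectors, V: (K, m) prototypes, U: (N, K) memberships
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean, (N, K)
    return float(((U ** q) * d2).sum())
```

A crisper partition (memberships near 0 or 1, points near their centers) yields a smaller Jq.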

Page 16: Unsupervised Optimal Fuzzy Clustering

Description of Fuzzy Partitioning

1) Choose primary cluster prototypes Vi for the first computation of the memberships.

2) Compute the degree of membership of all feature vectors in all clusters:

uij = [1/d2(Xj – Vi)]1/(q-1) / Σk [1/d2(Xj – Vk)]1/(q-1)   (1)

under the constraint: Σi uij = 1

Page 17: Unsupervised Optimal Fuzzy Clustering

Description of Fuzzy Partitioning

3) Compute new cluster prototypes Vi:

Vi = Σj (uij)q Xj / Σj (uij)q   (2)

4) Iterate back and forth between (1) and (2) until the memberships or cluster centers for successive iterations differ by less than some prescribed value (a termination criterion).
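Steps 1)–4) combine into one loop. Below is a minimal NumPy sketch, assuming Euclidean distance and a naive choice of primary prototypes (the paper's unsupervised initialization comes later in the presentation):

```python
import numpy as np

def fuzzy_k_means(X, K, q=2.0, tol=1e-5, max_iter=100):
    """Sketch of steps 1)-4); Euclidean distance and the simple
    initialization are assumptions, not the paper's choices."""
    # Step 1: primary cluster prototypes (here simply the first K points)
    V = X[:K].astype(float).copy()
    for _ in range(max_iter):
        # Step 2: memberships, eq. (1)
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
        d2 = np.maximum(d2, 1e-12)                    # avoid division by zero
        inv = (1.0 / d2) ** (1.0 / (q - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)      # rows sum to 1
        # Step 3: new prototypes, eq. (2)
        W = U ** q
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 4: terminate when centers move by less than tol
        done = np.abs(V_new - V).max() < tol
        V = V_new
        if done:
            break
    return U, V
```

On well-separated data the memberships become nearly crisp and the prototypes settle at the subgroup centroids.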

Page 18: Unsupervised Optimal Fuzzy Clustering

The Fuzzy k-Means Algorithm

Computation of the degree of membership uij depends on the definition of the distance measure, d2(Xj – Vi):

d2(Xj – Vi) = (Xj – Vi)T A-1 (Xj – Vi)

A = I  =>  the distance is Euclidean; the shape of the clusters is assumed to be hyperspherical.

A arbitrary  =>  the shape of the clusters is assumed to be arbitrary (hyperellipsoidal).

Page 19: Unsupervised Optimal Fuzzy Clustering

The Fuzzy k-Means Algorithm

For hyperellipsoidal clusters, an “exponential” distance measure, d2e(Xj – Vi), based on ML estimation was defined:

d2e(Xj – Vi) = [det(Fi)]1/2 / Pi · exp[(Xj – Vi)T Fi-1 (Xj – Vi) / 2]

Fi – the fuzzy covariance matrix of the ith cluster
Pi – the a priori probability of selecting the ith cluster

h(i|Xj) = [1/d2e(Xj – Vi)] / Σk [1/d2e(Xj – Vk)]

h(i|Xj) – the posterior probability (the probability of selecting the ith cluster given the jth vector)
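A direct transcription of the exponential distance and the posterior into NumPy (names are illustrative; Fs and Ps hold the per-cluster covariance matrices and priors):

```python
import numpy as np

def exp_dist2(x, v, F, P):
    # d2e(x - v) = det(F)^(1/2) / P * exp((x - v)^T F^-1 (x - v) / 2)
    diff = x - v
    return float(np.sqrt(np.linalg.det(F)) / P
                 * np.exp(diff @ np.linalg.solve(F, diff) / 2.0))

def posterior(x, V, Fs, Ps):
    # h(i|x) = (1 / d2e(x, V_i)) / sum_k (1 / d2e(x, V_k))
    inv = np.array([1.0 / exp_dist2(x, v, F, P) for v, F, P in zip(V, Fs, Ps)])
    return inv / inv.sum()
```

Note how the distance grows exponentially away from the center and shrinks with cluster volume det(Fi), so dense, probable clusters claim nearby points strongly.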

Page 20: Unsupervised Optimal Fuzzy Clustering

The Fuzzy k-Means Algorithm

It is easy to see that for q = 2, h(i|Xj) plays the role of uij.

Thus, substituting uij with h(i|Xj) results in the fuzzy modification of the ML estimation (FMLE).

Additional calculations for the FMLE:

Vi = Σj h(i|Xj) Xj / Σj h(i|Xj)
Fi = Σj h(i|Xj)(Xj – Vi)(Xj – Vi)T / Σj h(i|Xj)
Pi = (1/N) Σj h(i|Xj)

Page 21: Unsupervised Optimal Fuzzy Clustering

The Major Advantage of FMLE

Good partition results are obtained when starting from “good” classification prototypes.

The first layer of the algorithm, unsupervised tracking of initial centroids, is based on the fuzzy k-means algorithm.

The next phase, the optimal fuzzy partition, is carried out with the FMLE algorithm.

Page 22: Unsupervised Optimal Fuzzy Clustering

Unsupervised Tracking of Cluster Prototypes

Different choices of classification prototypes may lead to different partitions.

Given a partition with k cluster prototypes, place the next, (k+1)th, cluster center in a region where data points have a low degree of membership in the existing k clusters.

Page 23: Unsupervised Optimal Fuzzy Clustering

Unsupervised Tracking of Cluster Prototypes

1) Compute the average and standard deviation of the whole data set.

2) Choose the first initial cluster prototype at the average location of all feature vectors.

3) Choose an additional classification prototype equally distant from all data points.

4) Calculate a new partition of the data set according to steps 1) and 2) of the fuzzy k-means algorithm.

5) If k, the number of clusters, is less than a given maximum, go to step 3; otherwise stop.

Page 24: Unsupervised Optimal Fuzzy Clustering

Common Fuzzy Cluster Validity

Each data point has K memberships, so it is desirable to summarize the information by a single number, which indicates how well the data point Xk is classified by the clustering:

Σi (uik)2 – partition coefficient

-Σi uik log uik – classification entropy

maxi uik – proportional coefficient

The cluster validity is just the average of any of those functions over the entire data set.
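As a sketch, averaging the first two measures over the data set (U is the N×K membership matrix; names are illustrative):

```python
import numpy as np

def partition_coefficient(U):
    # average over the data set of sum_i (u_ik)^2; 1 = crisp, 1/K = maximally fuzzy
    return float((U ** 2).sum(axis=1).mean())

def classification_entropy(U, eps=1e-12):
    # average over the data set of -sum_i u_ik log(u_ik); 0 = crisp
    return float(-(U * np.log(U + eps)).sum(axis=1).mean())
```

A crisp partition maximizes the partition coefficient and minimizes the entropy; uniform memberships do the opposite.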

Page 25: Unsupervised Optimal Fuzzy Clustering

Proposed Performance Measures

“Good” clusters are actually not very fuzzy.

The criteria for the definition of an “optimal partition” of the data into subgroups were based on the following requirements:

1. Clear separation between the resulting clusters
2. Minimal volume of the clusters
3. Maximal number of data points concentrated in the vicinity of the cluster centroid

Page 26: Unsupervised Optimal Fuzzy Clustering

Proposed Performance Measures

The fuzzy hypervolume, FHV, is defined by:

FHV = Σi=1..K [det(Fi)]1/2

where Fi, the fuzzy covariance matrix of the ith cluster, is given by:

Fi = Σj uij (Xj – Vi)(Xj – Vi)T / Σj uij
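A sketch of FHV, assuming the membership-weighted fuzzy covariance of each cluster (names are illustrative):

```python
import numpy as np

def fuzzy_covariance(X, v, u):
    # F_i = sum_j u_ij (X_j - v)(X_j - v)^T / sum_j u_ij
    # X: (N, m) data, v: (m,) prototype, u: (N,) memberships in cluster i
    diff = X - v
    outer = diff[:, :, None] * diff[:, None, :]           # (N, m, m) outer products
    return (u[:, None, None] * outer).sum(axis=0) / u.sum()

def fuzzy_hypervolume(covs):
    # F_HV = sum_i det(F_i)^(1/2): total "volume" occupied by the clusters
    return float(sum(np.sqrt(np.linalg.det(F)) for F in covs))
```

Compact clusters have small covariance determinants, so a partition that matches the data's true subgroups minimizes FHV.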

Page 27: Unsupervised Optimal Fuzzy Clustering

Proposed Performance Measures

The average partition density, DPA, is calculated from:

DPA = (1/K) Σi Si / [det(Fi)]1/2

where Si, the “sum of the central members,” is given by:

Si = Σj uij, taken over those Xj for which (Xj – Vi)T Fi-1 (Xj – Vi) < 1

Page 28: Unsupervised Optimal Fuzzy Clustering

Proposed Performance Measures

The partition density, PD, is calculated from:

PD = S / FHV, where S = Σi Si

Page 29: Unsupervised Optimal Fuzzy Clustering

Sample Runs

In order to test the performance of the algorithm, N artificial m-dimensional feature vectors were generated from multivariate normal distributions with different parameters and densities.

Situations of large variability of cluster shapes, densities, and number of data points in each cluster were simulated.

Page 30: Unsupervised Optimal Fuzzy Clustering

FCM Clustering with Varying Density

The higher-density cluster attracts all other cluster prototypes, so that the prototype of the right cluster is slightly drawn away from the original cluster center, and the prototype of the left cluster migrates completely into the dense cluster.

Page 31: Unsupervised Optimal Fuzzy Clustering


Page 32: Unsupervised Optimal Fuzzy Clustering

Fig. 3. Partition of 12 clusters generated from a five-dimensional multivariate Gaussian distribution with unequally variable features, variable densities, and a variable number of data points in each cluster (only three of the features are displayed).

(a) Data points before partitioning. (b) Partition into 12 subgroups using the UFP-ONC algorithm. All data points have been classified correctly.

Page 33: Unsupervised Optimal Fuzzy Clustering


Page 34: Unsupervised Optimal Fuzzy Clustering


Conclusions

The new algorithm, UFP-ONC (unsupervised fuzzy partition – optimal number of classes), which combines the most favorable features of the fuzzy k-means algorithm and the FMLE, together with unsupervised tracking of classification prototypes, was created. The algorithm performs extremely well in situations of large variability of cluster shapes, densities, and number of data points in each cluster.