Compiled By: Raj Gaurang Tiwari Assistant Professor SRMGPC, Lucknow Iterative Optimization and Cluster Validation

Page 1:

Iterative Optimization and Cluster Validation

Compiled By: Raj Gaurang Tiwari, Assistant Professor, SRMGPC, Lucknow

Page 2:

Exhaustive Enumeration

Once a criterion function has been selected, clustering becomes a well-defined problem: find those partitions of the set of samples that extremize the criterion function.

Since the sample set is finite, there are only a finite number of possible partitions. Thus, in theory, the clustering problem can always be solved by exhaustive enumeration.

Page 3:

However, the computational complexity renders such an approach unthinkable for all but the simplest problems: there are approximately $c^n/c!$ ways of partitioning a set of $n$ elements into $c$ subsets.

For example, an exhaustive search for the best set of 5 clusters in 100 samples would require considering more than $5^{97}$ ($\approx 10^{67}$) partitions.

Thus exhaustive search is completely infeasible.
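The size of this search space can be checked directly. The sketch below is illustrative only: the function names `stirling2` and `approx_partitions` are made up for this example. It computes the exact number of ways to split n labelled samples into c non-empty subsets (the Stirling number of the second kind, via its inclusion-exclusion formula) and compares it with the approximation $c^n/c!$.

```python
from math import comb, factorial


def stirling2(n, c):
    """Exact count of partitions of n labelled items into c non-empty subsets."""
    total = sum((-1) ** j * comb(c, j) * (c - j) ** n for j in range(c + 1))
    return total // factorial(c)


def approx_partitions(n, c):
    """The approximation c^n / c! quoted in the text (integer part)."""
    return c ** n // factorial(c)


exact = stirling2(100, 5)
approx = approx_partitions(100, 5)

# Both counts exceed 5^97, so enumerating every partition is hopeless.
print(exact > 5 ** 97, approx > 5 ** 97)
print(f"exact count has {len(str(exact))} digits")
```

For small inputs the count is easy to verify by hand: 4 samples can be split into 2 non-empty clusters in 7 ways.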

Page 4:

Iterative Optimization

The basic idea is to find some reasonable initial partition and to “move” samples from one group to another if such a move will improve the value of the criterion function.

Different starting points can lead to different solutions, and one never knows whether or not the best solution has been found.

Despite these limitations, the fact that the computational requirements are bearable makes this approach attractive.

Page 5:

Let us consider the use of iterative improvement to minimize the sum-of-squared-error criterion $J_e$, written as

$$J_e = \sum_{i=1}^{c} \sum_{x \in D_i} \lVert x - m_i \rVert^2,$$

where $D_i$ is the set of samples in cluster $i$ and $m_i$ is their mean.

Page 6:
Page 7:

Suppose a sample $\hat{x}$ currently in cluster $i$ (with mean $m_i$) is tentatively moved to cluster $j$ (with mean $m_j$). The transfer is advantageous if it lowers $J_e$, which typically happens whenever $\hat{x}$ is closer to $m_j$ than to $m_i$.
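The transfer condition can be made explicit with the standard single-sample update for the sum-of-squared-error criterion. The block below is a sketch under assumed notation: $D_i$ is cluster $i$, $n_i$ its number of samples, $J_i$ its contribution to $J_e$, and $n_i > 1$ so that cluster $i$ is not emptied by the move.

```latex
% Assumed notation: J_e = \sum_i J_i, with J_i = \sum_{x \in D_i} \lVert x - m_i \rVert^2.
\begin{align*}
% Cluster j gains \hat{x}: its mean shifts toward \hat{x} and J_j grows.
m_j^{*} &= m_j + \frac{\hat{x} - m_j}{n_j + 1}, &
J_j^{*} &= J_j + \frac{n_j}{n_j + 1}\,\lVert \hat{x} - m_j \rVert^2, \\
% Cluster i loses \hat{x}: its mean shifts away from \hat{x} and J_i shrinks.
m_i^{*} &= m_i - \frac{\hat{x} - m_i}{n_i - 1}, &
J_i^{*} &= J_i - \frac{n_i}{n_i - 1}\,\lVert \hat{x} - m_i \rVert^2.
\end{align*}
% The move lowers J_e exactly when the decrease in J_i exceeds the increase in J_j:
\[
\frac{n_i}{n_i - 1}\,\lVert \hat{x} - m_i \rVert^2
\;>\;
\frac{n_j}{n_j + 1}\,\lVert \hat{x} - m_j \rVert^2 .
\]
```

Because $n_i/(n_i-1) > 1 > n_j/(n_j+1)$, the inequality always holds when $\hat{x}$ is closer to $m_j$ than to $m_i$, and it can also hold when $\hat{x}$ is slightly closer to $m_i$.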

Page 8:

Basic iterative minimum-squared-error clustering
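A minimal sketch of such a procedure follows, not a definitive implementation. It assumes samples are tuples of floats and that every cluster is non-empty in the initial partition. To keep the result reproducible it visits samples in a fixed cyclic order rather than at random, and it stops after a full pass with no transfer. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def cluster_mean(points):
    """Component-wise mean of a list of tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def sum_squared_error(samples, labels, c):
    """J_e: total squared distance of each sample to its cluster mean."""
    total = 0.0
    for k in range(c):
        members = [x for x, lab in zip(samples, labels) if lab == k]
        if members:
            m = cluster_mean(members)
            total += sum(sq_dist(x, m) for x in members)
    return total


def iterative_mse(samples, initial_labels, c):
    """Move one sample at a time while a move strictly lowers J_e."""
    labels = list(initial_labels)
    n = len(samples)
    pos = 0        # index of the sample examined next
    failures = 0   # consecutive attempts that produced no transfer
    while failures < n:
        x, i = samples[pos], labels[pos]
        groups = [[s for s, lab in zip(samples, labels) if lab == k]
                  for k in range(c)]
        moved = False
        if len(groups[i]) > 1:  # never empty a cluster
            rho = []
            for j, members in enumerate(groups):
                n_j = len(members)
                d = sq_dist(x, cluster_mean(members))
                # Decrease in J_i if x leaves i; increase in J_j if x joins j.
                rho.append(n_j / (n_j - 1) * d if j == i
                           else n_j / (n_j + 1) * d)
            k = min(range(c), key=lambda j: rho[j])
            if rho[k] < rho[i]:
                labels[pos] = k
                moved = True
        failures = 0 if moved else failures + 1
        pos = (pos + 1) % n
    return labels


# Two well-separated groups on a line, deliberately mixed at the start.
samples = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
start = [0, 1, 0, 1, 0, 1]
final = iterative_mse(samples, start, 2)
print(final, sum_squared_error(samples, final, 2))
```

Each accepted transfer strictly lowers $J_e$, so the loop terminates. The result is only a local minimum, and it depends on the initial partition.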

Page 9:

CS583, Bing Liu, UIC

Cluster Validation

The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. But how do we decide what constitutes a good clustering? The best criterion depends heavily on the final aim of the clustering.

Page 10:

It is the user who must supply this criterion, in such a way that the result of the clustering will suit their needs.

For instance, one could be interested in finding representatives for homogeneous groups (data reduction), in finding natural clusters and describing their unknown properties (natural data types), in finding useful and suitable groupings (useful data classes), or in finding unusual data objects (outlier detection).

Whatever the intention of the clustering may be, the number of clusters sought is almost always an unknown quantity.

Page 11:

Cluster Validity

In supervised classification, a model can be evaluated against known labels. For cluster analysis, the analogous question is how to evaluate the “goodness” of the resulting clusters.

But “clusters are in the eye of the beholder”!

Then why do we want to evaluate them?

To avoid finding patterns in noise

To compare clustering algorithms

To compare two sets of clusters

To compare two clusters

Page 12:

The Problem of Validity

When clustering is done by extremizing a criterion function, a common approach is to repeat the clustering procedure for c = 1, c = 2, c = 3, etc., and to see how the criterion function changes with c.

For example, the minimum value of the sum-of-squared-error criterion $J_e$ must decrease monotonically with c, since the squared error can be reduced each time c is increased, for instance by moving a single sample into a new cluster of its own.

If the n samples are really grouped into $\hat{c}$ compact, well-separated clusters, one would expect to see $J_e$ decrease rapidly until $c = \hat{c}$, decreasing much more slowly thereafter until it reaches zero at c = n.
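This behaviour can be seen on a toy example. The sketch below uses made-up one-dimensional data with three tight groups. It relies on the fact that, in one dimension, a minimum-squared-error partition consists of contiguous runs of the sorted values, so the true minimum of $J_e$ for each c is found by trying every choice of cut points. The function names are hypothetical, and the exhaustive search is only practical for very small n.

```python
from itertools import combinations


def sse(values):
    """Squared error of one cluster about its own mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def min_sse_1d(values, c):
    """Minimum J_e over all partitions of 1-D data into c clusters."""
    xs = sorted(values)
    n = len(xs)
    best = float("inf")
    # Choose c - 1 cut positions between consecutive sorted values.
    for cuts in combinations(range(1, n), c - 1):
        bounds = (0, *cuts, n)
        total = sum(sse(xs[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = min(best, total)
    return best


# Three compact, well-separated groups, so the "true" number of clusters is 3.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
curve = [min_sse_1d(data, c) for c in range(1, len(data) + 1)]

for c, je in enumerate(curve, start=1):
    print(f"c = {c}: J_e = {je:.3f}")
```

The printed values fall steeply up to c = 3 and only marginally afterwards, reaching zero at c = n.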