
Page 1:

Unsupervised learning: Clustering

Ata Kaban

The University of Birmingham

http://www.cs.bham.ac.uk/~axk

Page 2:

The Clustering Problem

Unsupervised Learning

Data (input) → ‘Interesting structure’ (output)

- should contain essential traits

- discard unessential details

- provide a compact summary of the data

- be interpretable for humans

- …

Objective function that expresses our notion of ‘interestingness’ for this data

Page 3:

Here is some data…

Pages 4–9: [figure-only slides]
Page 10:

Formalising

• Data points x_n, n = 1, 2, …, N

• Assume K clusters

• Binary indicator variables z_kn associated with each data point and cluster: z_kn = 1 if x_n is in cluster k, and 0 otherwise

• Define a measure of cluster compactness as the total squared distance of a cluster's points from the cluster mean: Σ_n z_kn ||x_n − m_k||²

Page 11:

• Cluster quality objective (the smaller the better):

J = Σ_{k=1..K} Σ_{n=1..N} z_kn ||x_n − m_k||²

• Two sets of parameters: the cluster mean values m_k and the cluster allocation indicator variables z_kn

• Minimise the above objective over each set of variables in turn, while holding the other set fixed. This is exactly what the K-means algorithm is doing! (Can you prove it? A sketch follows below.)
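A sketch of the argument behind that exercise (standard reasoning, not spelled out on the slide):

With the means m_k held fixed, J decomposes into independent terms, one per data point, so it is minimised by assigning each point to its nearest mean:

\[ z_{kn} = 1 \;\iff\; k = \arg\min_j \lVert x_n - m_j \rVert^2 . \]

With the assignments z_kn held fixed, setting the gradient with respect to m_k to zero gives

\[ \frac{\partial J}{\partial m_k} = -2 \sum_n z_{kn} (x_n - m_k) = 0 \;\Longrightarrow\; m_k = \frac{\sum_n z_{kn} x_n}{\sum_n z_{kn}} , \]

i.e. each mean becomes the average of the points currently assigned to it. Alternating these two steps is exactly K-means, and each step can only decrease J.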

Page 12:

– Pseudo-code of the K-means algorithm:

Begin
    initialise μ1, μ2, …, μK (randomly selected)
    do
        classify the N samples according to the nearest μi
        recompute each μi as the mean of its assigned samples
    until no change in any μi
    return μ1, μ2, …, μK
End
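A minimal NumPy sketch of the same loop (illustrative only: the function name, the convergence test, and the handling of empty clusters are my own choices, not from the slides):

import numpy as np

def kmeans(X, K, rng=None):
    """Basic K-means on an (N, d) data array X, returning labels and means."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Initialise the means with K randomly selected data points.
    means = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    while True:
        # Classification step: assign each sample to its nearest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each mean over its assigned samples.
        new_means = means.copy()
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:  # keep the old mean if a cluster is empty
                new_means[k] = members.mean(axis=0)
        if np.allclose(new_means, means):  # no change in the means: converged
            return labels, means
        means = new_means

E.g. labels, means = kmeans(data, K=3) for an (N, d) array data.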

Pages 13–15: [figure-only slides]
Page 16:

Other forms of clustering

• Often clusters are not simply disjoint: a cluster may have subclusters, which in turn have sub-subclusters.

Hierarchical clustering

Page 17:

• Given any two samples x and x’, they will be grouped together at some level, and once they are grouped at level k, they remain grouped for all higher levels

• The tree representation of a hierarchical clustering is called a dendrogram

Page 18:

• The similarity values may help to determine whether the groupings are natural or forced, but if they are evenly distributed no information can be gained

• Another representation is based on sets, e.g., Venn diagrams. This representation reveals the hierarchical structure, but does not represent similarities quantitatively!
Page 19:

• Hierarchical clustering can be divided into agglomerative and divisive approaches.

• Agglomerative (bottom up, clumping): start with N singleton clusters and form the sequence by successively merging clusters

• Divisive (top down, splitting): start with all of the samples in one cluster and form the sequence by successively splitting clusters

Page 20:

Agglomerative hierarchical clustering

• The procedure terminates when the specified number of clusters has been obtained, and returns the clusters as sets of points, rather than a mean or a representative vector for each cluster (a sketch follows below)
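For illustration, a small sketch of agglomerative clustering and its dendrogram using SciPy (the library choice, the single-linkage distance, and the toy data are my own, not the slides'):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)
# Toy data: two loose blobs of 2-D points.
X = np.vstack([rng.normal(0, 1, size=(20, 2)),
               rng.normal(5, 1, size=(20, 2))])

# Agglomerative (bottom-up): start from 40 singletons and repeatedly
# merge the closest pair of clusters (single-linkage distance here).
Z = linkage(X, method="single")

# Cut the merge tree at a chosen number of clusters; the result is a
# cluster membership for each point, not a mean vector per cluster.
labels = fcluster(Z, t=2, criterion="maxclust")

dendrogram(Z)  # the tree (dendrogram) representation of the merges
plt.show()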

Page 21:

Application to image segmentation
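One common recipe is to run K-means on the pixel colours and replace each pixel by its cluster centre; a hypothetical sketch reusing kmeans() from the earlier page (the file name and K = 4 are placeholders):

import matplotlib.pyplot as plt

img = plt.imread("some_image.png")          # hypothetical input image
H, W, C = img.shape
pixels = img.reshape(-1, C)                 # one data point per pixel

labels, centres = kmeans(pixels, K=4)       # kmeans() as sketched earlier
segmented = centres[labels].reshape(H, W, C)

plt.imshow(segmented)
plt.axis("off")
plt.show()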

Page 22:

Application to clustering face images

Cluster centres = face prototypes

Page 23:

The problem of the number of clusters

• Typically, the number of clusters is known.

• When it’s not, that is a hard problem called model selection. There are several ways to proceed.

• A common approach is to repeat the clustering with K=1, K=2, K=3, etc.

When J_e is the sum-of-squared-error criterion, J_e always decreases as the number of clusters c increases (in the extreme, every sample can be made a singleton cluster). Therefore, as c increases, J_e decreases quickly up to c = c*, the natural number of clusters, and then more slowly up to c = n.
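A sketch of that inspection (the "elbow" method), assuming the data array X and the kmeans() function from the earlier sketch; the helper name sse and the range of K are my own:

import matplotlib.pyplot as plt

def sse(X, labels, means):
    # Sum-of-squared-error J_e of a clustering.
    return sum(((X[labels == k] - m) ** 2).sum() for k, m in enumerate(means))

Js = []
for K in range(1, 11):
    labels, means = kmeans(X, K)
    Js.append(sse(X, labels, means))

# Look for the "elbow": the K beyond which J_e stops dropping quickly.
plt.plot(range(1, 11), Js, marker="o")
plt.xlabel("number of clusters K")
plt.ylabel("J_e")
plt.show()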
Page 24:

What did we learn today?

• Data clustering

• K-means algorithm in detail

• How K-means can get stuck and how to take care of that

• An outline of hierarchical clustering methods