11-755 Machine Learning for Signal Processing. Clustering. Class 14, 18 Oct 2011.

Source: mlsp.cs.cmu.edu/courses/fall2011/class14.18oct.clustering/class14.clustering.pdf


Page 1

11-755 Machine Learning for Signal Processing

Clustering

Class 14. 18 Oct 2011

Page 2

Clustering



Page 7

Clustering: What is clustering?

Clustering is the determination of naturally occurring groupings of data/instances, with low within-group variability and high between-group variability.

How is it done? Find groupings of the data such that the groups optimize a "within-group-variability" objective function of some kind.

The objective function used affects the nature of the discovered clusters. E.g., Euclidean distance and distance from center result in different clusters in this example.


Page 9

Why Clustering?

Automatic grouping into "classes": different clusters may show different behavior.

Quantization: all data within a cluster are represented by a single point.

Preprocessing step for other algorithms: indexing, categorization, etc.

Page 10

Clustering criteria

Compactness criterion: a measure that shows how "good" the clusters are. This is the objective function.

Distance of a point from a cluster: used to determine the cluster a data vector belongs to.


Page 17

"Compactness" criteria for clustering

Distance-based measures:
Total distance between each element in the cluster and every other element in the cluster.
Distance between the two farthest points in the cluster.
Total distance of every element in the cluster from the centroid of the cluster.

Distance measures are often weighted Minkowski metrics:

dist(a, b) = (w1 |a1 - b1|^n + w2 |a2 - b2|^n + ... + wM |aM - bM|^n)^(1/n)
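As a minimal sketch of the weighted Minkowski metric above (the function name and example values are illustrative, not from the slides):

```python
def weighted_minkowski(a, b, w, n=2):
    """Weighted Minkowski distance of order n between vectors a and b."""
    return sum(wi * abs(ai - bi) ** n
               for wi, ai, bi in zip(w, a, b)) ** (1.0 / n)

# with unit weights and n = 2 this reduces to the Euclidean distance
d = weighted_minkowski([0.0, 0.0], [3.0, 4.0], w=[1.0, 1.0], n=2)  # 5.0
```

Setting n = 1 gives the (weighted) Manhattan distance; larger n weights the largest coordinate difference more heavily.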


Page 22

Clustering: Distance from cluster

How far is a data point from a cluster?

Euclidean or Minkowski distance from the centroid of the cluster.

Distance from the closest point in the cluster.

Distance from the farthest point in the cluster.

Probability of the data measured on the cluster distribution.

Fit of the data to a cluster-based regression.

Page 23

Optimal clustering: Exhaustive enumeration

All possible combinations of the data must be evaluated. If there are M data points, and we desire N clusters, the number of ways of separating M instances into N clusters is

(1/N!) * Σ_{i=0..N} (-1)^i * C(N, i) * (N - i)^M

Exhaustive-enumeration-based clustering requires that the objective function (the "goodness measure") be evaluated for every one of these, and the best one chosen.

This is the only correct way of optimal clustering. Unfortunately, it is also computationally unrealistic.
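The count above is a Stirling number of the second kind, and the formula can be evaluated directly. This sketch (the helper name is my own) also shows how explosively the count grows, which is the point of the slide:

```python
from math import comb, factorial

def num_clusterings(M, N):
    """Number of ways to split M data points into N non-empty clusters:
    (1/N!) * sum_{i=0..N} (-1)^i * C(N, i) * (N - i)^M."""
    return sum((-1) ** i * comb(N, i) * (N - i) ** M
               for i in range(N + 1)) // factorial(N)
```

For example, 4 points split into 2 clusters gives only 7 possibilities, but 100 points split into 5 clusters already exceeds 10^65 possibilities, so exhaustive evaluation is hopeless.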

Page 24

Not-quite non sequitur: Quantization

Signal Value        Bits    Mapped to
S >= 3.75v          11      3 * const
3.75v > S >= 2.5v   10      2 * const
2.5v > S >= 1.25v   01      1 * const
1.25v > S >= 0v     00      0

[Figure: probability of the analog value; arrows mark the quantization levels]

Linear quantization (uniform quantization): each digital value represents an equally wide range of analog values, regardless of the distribution of the data. Digital-to-analog conversion is represented by a "uniform" table.
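A minimal sketch of the uniform quantizer in the table above (the function name is illustrative; v_max = 5.0 is an assumption chosen so that the step size is 1.25 v, matching the table's thresholds):

```python
def uniform_quantize(s, v_max=5.0, bits=2):
    """Map an analog value s in [0, v_max) to one of 2**bits equal-width levels,
    returning the integer level (the 'Bits' column of the table)."""
    step = v_max / (2 ** bits)               # 1.25 v for the table above
    return min(int(s / step), 2 ** bits - 1)  # clamp values at the top edge
```

Every level covers the same 1.25 v of analog range, whether or not any data ever falls there; that is exactly the weakness the next slides address.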

Page 25

Not-quite non sequitur: Quantization

Signal Value      Bits    Mapped to
S >= 4v           11      4.5
4v > S >= 2.5v    10      3.25
2.5v > S >= 1v    01      1.25
1.0v > S >= 0v    00      0.5

[Figure: probability of the analog value; arrows mark the quantization levels]

Non-linear quantization: each digital value represents a different range of analog values, with finer resolution in high-density areas. Mu-law / A-law assumes a Gaussian-like distribution of the data. Digital-to-analog conversion is represented by a "non-uniform" table.

Page 26

Non-uniform quantization

[Figure: two probability distributions over the analog value]

What if the data distribution is not Gaussian-ish? Mu-law / A-law are not optimal. How do we compute the optimal ranges for quantization, or the optimal table?

Page 27

The Lloyd Quantizer

[Figure: probability of the analog value; arrows show the quantization levels]

Lloyd quantizer: an iterative algorithm for computing optimal quantization tables for non-uniformly distributed data. It is learned from "training" data.


Page 29

Lloyd Quantizer

Randomly initialize the quantization points (the right-column entries of the quantization table).

Assign all training points to the nearest quantization point; draw the boundaries.

Re-estimate the quantization points.

Iterate until convergence.
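The iteration above can be sketched in one dimension as follows (all names and the training values are illustrative, not from the slides):

```python
import random

def lloyd_quantizer(samples, levels, iters=50):
    """1-D Lloyd quantizer: learn `levels` quantization points from training data."""
    points = random.sample(samples, levels)          # random initialization
    for _ in range(iters):
        # assign every training sample to its nearest quantization point
        buckets = [[] for _ in range(levels)]
        for s in samples:
            i = min(range(levels), key=lambda k: abs(s - points[k]))
            buckets[i].append(s)
        # re-estimate each point as the mean of the samples assigned to it
        points = [sum(b) / len(b) if b else points[i]
                  for i, b in enumerate(buckets)]
    return sorted(points)

# two tight groups of analog values: the learned points settle near 0.1 and 10.0
training = [0.0, 0.1, 0.2, 9.9, 10.0, 10.1]
table = lloyd_quantizer(training, 2)
```

The bucket boundaries are implicit here: a sample belongs to whichever point is nearest, which in 1-D puts each boundary at the midpoint between adjacent quantization points.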


Page 32

Generalized Lloyd Algorithm: K-means clustering

K-means is an iterative algorithm for clustering vector data.
MacQueen, J. 1967. "Some methods for classification and analysis of multivariate observations." Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 281-297.

General procedure:
Initially group the data into the required number of clusters somehow (initialization).
Assign each data point to the closest cluster.
Once all data points are assigned to clusters, redefine the clusters.
Iterate.

Page 33

K-means

Problem: given a set of data vectors, find natural clusters.

The clustering criterion is scatter: distance from the centroid.

Every cluster has a centroid; the centroid represents the cluster.

Definition: the centroid is the weighted mean of the cluster (weight w_i = 1 for the basic scheme):

m_cluster = (Σ_{i in cluster} w_i x_i) / (Σ_{i in cluster} w_i)

Page 34

K-means

1. Initialize a set of centroids randomly.
2. For each data point x, find the distance from the centroid of each cluster: d_cluster = distance(x, m_cluster).
3. Put the data point in the cluster of the closest centroid (the cluster for which d_cluster is minimum).
4. When all data points are clustered, recompute each cluster centroid: m_cluster = (1 / N_cluster) Σ_{i in cluster} x_i.
5. If not converged, go back to 2.
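The five steps above can be sketched directly (the function name and the toy data are illustrative; Euclidean distance is used, as in the basic scheme):

```python
import random

def kmeans(data, k, iters=100):
    """Basic K-means on a list of equal-length tuples, Euclidean distance."""
    centroids = random.sample(data, k)          # 1. initialize centroids randomly
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            # 2-3. assign x to the cluster with the closest centroid
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
            clusters[j].append(x)
        # 4. recompute each centroid as the mean of its cluster
        centroids = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return centroids, clusters

# two well-separated blobs: the centroids land near (1/3, 1/3) and (31/3, 31/3)
blobs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, _ = kmeans(blobs, 2)
```

A fixed iteration count stands in for the convergence test of step 5; a fuller version would stop as soon as no point changes cluster.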


Page 43

K-means

1. Initialize a set of centroids randomly.
2. For each data point x, find the distance from the centroid of each cluster: d_cluster = distance(x, m_cluster).
3. Put the data point in the cluster of the closest centroid (the cluster for which d_cluster is minimum).
4. When all data points are clustered, recompute the centroids as weighted means: m_cluster = (Σ_{i in cluster} w_i x_i) / (Σ_{i in cluster} w_i).
5. If not converged, go back to 2.


Page 45

K-means: comments

The distance metric determines the clusters. In the original formulation, the distance is the L2 (Euclidean) distance, with w_i = 1:

distance(x, m_cluster) = ||x - m_cluster||^2,  m_cluster = (1 / N_cluster) Σ_{i in cluster} x_i

If we replace every x by m_cluster(x), we get Vector Quantization.

K-means is an instance of generalized EM.

It is not guaranteed to converge for all distance metrics.

Page 46

Initialization

Random initialization.

Top-down clustering: initially partition the data into two (or a small number of) clusters using K-means; partition each of the resulting clusters into two (or a small number of) clusters, also using K-means; terminate when the desired number of clusters is obtained.

Page 47

K-Means for Top-Down clustering

1. Start with one cluster.
2. Split each cluster into two: perturb the centroid of the cluster slightly (by < 5%) to generate two centroids.
3. Initialize K-means with the new set of centroids.
4. Iterate K-means until convergence.
5. If the desired number of clusters is not obtained, return to 2.
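The splitting trick in step 2 can be sketched as follows (the function name, the +/-1% perturbation, and the example centroid are illustrative choices within the slide's "< 5%" guideline):

```python
def split_centroid(centroid, eps=0.01):
    """Split one centroid into two by perturbing each coordinate by +/- eps."""
    return ([c * (1 + eps) for c in centroid],
            [c * (1 - eps) for c in centroid])

# two nearby starting centroids for K-means to pull apart
up, down = split_centroid([2.0, 4.0])
```

The two perturbed copies begin almost on top of each other; the subsequent K-means iterations separate them along whatever direction the data in the parent cluster actually spreads.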


Page 54

Non-Euclidean clusters

Basic K-means results in good clusters in Euclidean spaces. Stated differently, it will only find clusters that are "good" in terms of Euclidean distances; it will not find other types of clusters.

Page 55

Non-Euclidean clusters

[Figure: f([x, y]) -> [x, y, z] with x = x, y = y, z = (x^2 + y^2)]

For other forms of clusters we must modify the distance measure, e.g. distance from a circle. This may be viewed as a distance in a higher-dimensional space, i.e. a kernel distance: kernel K-means.

Other related clustering mechanisms: spectral clustering (non-linear weighting of adjacency), normalized cuts, ...

Page 56

The Kernel Trick

[Figure: f([x, y]) -> [x, y, z] with x = x, y = y, z = (x^2 + y^2)]

Transform the data into a synthetic higher-dimensional space where the desired patterns become natural clusters, e.g. the quadratic transform above.

Problem: what is the function/space?

Problem: distances in the higher-dimensional space are more expensive to compute, yet they carry only the same information as the lower-dimensional space.

Page 57

Distance in higher-dimensional space

Transform the data x through an unknown function Φ(x) into a higher (potentially infinite) dimensional space: z = Φ(x).

The distance between two points is computed in the higher-dimensional space: d(x1, x2) = ||z1 - z2||^2 = ||Φ(x1) - Φ(x2)||^2.

d(x1, x2) can be computed without computing z, since it is a direct function of x1 and x2.

Page 58

Distance in higher-dimensional space

Squared distance is a combination of dot products:
||z1 - z2||^2 = (z1 - z2)^T (z1 - z2) = z1.z1 + z2.z2 - 2 z1.z2

Distance in the higher-dimensional space:
d(x1, x2) = ||Φ(x1) - Φ(x2)||^2 = Φ(x1).Φ(x1) + Φ(x2).Φ(x2) - 2 Φ(x1).Φ(x2)

d(x1, x2) can be computed without knowing Φ(x) if Φ(x1).Φ(x2) can be computed for any x1 and x2 without knowing Φ(.).

Page 59:

The Kernel function

A kernel function K(x1, x2) is a function such that K(x1, x2) = Φ(x1)·Φ(x2).

Once such a kernel function is found, the distance in the higher-dimensional space can be found in terms of the kernel:
d(x1, x2) = ||Φ(x1) - Φ(x2)||²
= Φ(x1)·Φ(x1) + Φ(x2)·Φ(x2) - 2 Φ(x1)·Φ(x2)
= K(x1, x1) + K(x2, x2) - 2 K(x1, x2)

But what is K(x1, x2)?
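This identity can be sketched concretely, assuming the polynomial kernel K(x, y) = (x·y)² on 2-D vectors, whose explicit feature map Φ(x) = (x1², x2², √2·x1·x2) is known: the kernel-only distance matches the one computed through the explicit map. The test vectors are made up.

```python
import math

def K(x, y):
    # Polynomial kernel (x . y)^2 for 2-D vectors
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    # An explicit feature map with phi(x) . phi(y) = K(x, y)
    return (x[0] ** 2, x[1] ** 2, math.sqrt(2) * x[0] * x[1])

def kernel_dist2(x, y):
    # Squared feature-space distance using only kernel evaluations:
    # d(x, y) = K(x, x) + K(y, y) - 2 K(x, y)
    return K(x, x) + K(y, y) - 2 * K(x, y)

x, y = (1.0, 2.0), (3.0, 0.5)
explicit = sum((a - b) ** 2 for a, b in zip(phi(x), phi(y)))
print(kernel_dist2(x, y), explicit)  # the two agree
```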

Page 60:

A property of the dot product

For any vector v, vᵀv = ||v||² >= 0. This is just the squared length of v and is therefore non-negative.

For any vector u = Σᵢ aᵢ vᵢ, ||u||² >= 0
=> (Σᵢ aᵢ vᵢ)ᵀ(Σⱼ aⱼ vⱼ) >= 0
=> Σᵢ Σⱼ aᵢ aⱼ vᵢ·vⱼ >= 0

This holds for ANY real {a1, a2, …}.

Page 61:

The Mercer Condition

If z = Φ(x) is a high-dimensional vector derived from x, then for all real {a1, a2, …} and any set {z1, z2, …} = {Φ(x1), Φ(x2), …}:
Σᵢ Σⱼ aᵢ aⱼ zᵢ·zⱼ >= 0
Σᵢ Σⱼ aᵢ aⱼ Φ(xᵢ)·Φ(xⱼ) >= 0

If K(x1, x2) = Φ(x1)·Φ(x2), then Σᵢ Σⱼ aᵢ aⱼ K(xᵢ, xⱼ) >= 0.

Any function K(·) that satisfies the above condition is a valid kernel function.

Page 62:

The Mercer Condition

K(x1, x2) = Φ(x1)·Φ(x2) implies Σᵢ Σⱼ aᵢ aⱼ K(xᵢ, xⱼ) >= 0.

A corollary: if a kernel K(·) satisfies the Mercer condition, then
d(x1, x2) = K(x1, x1) + K(x2, x2) - 2 K(x1, x2)
satisfies the following requirements for a "distance":
d(x, x) = 0
d(x, y) >= 0
d(x, w) + d(w, y) >= d(x, y)
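The Mercer condition can be spot-checked numerically. A sketch assuming a Gaussian kernel, which does satisfy Mercer: every quadratic form over its Gram matrix is non-negative up to rounding. The sample points and the 200 random coefficient vectors are arbitrary choices for this illustration.

```python
import math, random

def gauss_K(x, y, sigma=1.0):
    # Gaussian kernel exp(-||x - y||^2 / sigma^2); positive semi-definite
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / sigma ** 2)

random.seed(0)
pts = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(8)]
G = [[gauss_K(p, q) for q in pts] for p in pts]

# Mercer condition: sum_i sum_j a_i a_j K(x_i, x_j) >= 0 for ANY real {a_i}
quad_forms = [
    sum(a[i] * a[j] * G[i][j] for i in range(8) for j in range(8))
    for a in ([random.uniform(-1, 1) for _ in range(8)] for _ in range(200))
]
worst = min(quad_forms)
print(worst)  # non-negative up to floating-point rounding
```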

Page 63:

Typical Kernel Functions

Linear: K(x, y) = xᵀy + c
Polynomial: K(x, y) = (a xᵀy + c)ⁿ
Gaussian: K(x, y) = exp(-||x - y||² / σ²)
Exponential: K(x, y) = exp(-||x - y|| / λ)
Several others.

Choosing the right kernel with the right parameters for your problem is an art form.
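These kernels are direct to write down. In this sketch the parameter names and defaults (c, a, n, sigma, lam) are illustrative choices; sigma and lam stand for the usual width parameters.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear(x, y, c=0.0):
    # K(x, y) = x'y + c
    return dot(x, y) + c

def polynomial(x, y, a=1.0, c=1.0, n=2):
    # K(x, y) = (a x'y + c)^n
    return (a * dot(x, y) + c) ** n

def gaussian(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / sigma^2)
    d2 = sum((u - v) ** 2 for u, v in zip(x, y))
    return math.exp(-d2 / sigma ** 2)

def exponential(x, y, lam=1.0):
    # K(x, y) = exp(-||x - y|| / lambda)
    d = math.sqrt(sum((u - v) ** 2 for u, v in zip(x, y)))
    return math.exp(-d / lam)

x = (1.0, 0.0)
print(linear(x, x), polynomial(x, x), gaussian(x, x), exponential(x, x))
```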

Page 64:

Kernel K-means

Example kernel: K(x, y) = (xᵀy + c)²

Perform the K-means in the kernel space, the space of z = Φ(x).

The algorithm..

Page 65:

Kernel K-means

Initialize the clusters with a random set of K points, so that each cluster starts with one point:
m_cluster = (1 / Σ_{i∈cluster} wᵢ) Σ_{i∈cluster} wᵢ Φ(xᵢ)

For each data point x, find the closest cluster:
cluster(x) = argmin_cluster d(x, cluster) = argmin_cluster ||Φ(x) - m_cluster||²

Writing C = 1 / Σ_{i∈cluster} wᵢ, the distance expands into dot products:
d(x, cluster) = ||Φ(x) - m_cluster||²
= Φ(x)ᵀΦ(x) - 2C Σ_{i∈cluster} wᵢ Φ(x)ᵀΦ(xᵢ) + C² Σ_{i∈cluster} Σ_{j∈cluster} wᵢ wⱼ Φ(xᵢ)ᵀΦ(xⱼ)
= K(x, x) - 2C Σ_{i∈cluster} wᵢ K(x, xᵢ) + C² Σ_{i∈cluster} Σ_{j∈cluster} wᵢ wⱼ K(xᵢ, xⱼ)

Computed entirely using only the kernel function!
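The expansion above can be turned into a working sketch. This is not the lecture's code: the weights are fixed at wᵢ = 1, the initialization is a simple deterministic split rather than the random choice on the slide, and the example data and kernel width are made up.

```python
import math

def kernel_kmeans(K, k, iters=20):
    # Kernel K-means using only the Gram matrix K (unit weights w_i = 1):
    # d(p, c) = K[p][p] - (2/|c|) sum_{i in c} K[p][i]
    #           + (1/|c|^2) sum_{i,j in c} K[i][j]
    n = len(K)
    assign = [i % k for i in range(n)]  # simple deterministic init for this sketch
    for _ in range(iters):
        clusters = [[i for i in range(n) if assign[i] == c] for c in range(k)]
        self_term = [sum(K[i][j] for i in c for j in c) / len(c) ** 2 if c else 0.0
                     for c in clusters]

        def dist(p, c):
            members = clusters[c]
            if not members:
                return math.inf  # an empty cluster is never chosen
            return (K[p][p]
                    - 2 * sum(K[p][i] for i in members) / len(members)
                    + self_term[c])

        new = [min(range(k), key=lambda c: dist(p, c)) for p in range(n)]
        if new == assign:
            break
        assign = new
    return assign

# Two tight, well-separated groups; Gaussian kernel Gram matrix
A = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
B = [(x + 10.0, y + 10.0) for x, y in A]
pts = A + B
Gram = [[math.exp(-((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) / 4.0) for q in pts]
        for p in pts]
labels = kernel_kmeans(Gram, 2)
print(labels)
```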

Page 66:

K-means
1. Initialize a set of centroids randomly
2. For each data point x, find the distance from the centroid of each cluster: d_cluster = d(x, m_cluster)
3. Put the data point in the cluster of the closest centroid (the cluster for which d_cluster is minimum)
4. When all data points are clustered, recompute the cluster centroids: m_cluster = (1 / N_cluster) Σ_{i∈cluster} xᵢ
5. If not converged, go back to 2
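The five steps map directly to code. A minimal sketch with Euclidean distance; the seed, iteration cap, and toy data are made up, and an empty cluster simply keeps its old centroid (one of several common conventions).

```python
import random

def kmeans(data, k, iters=100, seed=1):
    rng = random.Random(seed)
    # 1. Initialize centroids randomly (here: k random data points)
    centroids = rng.sample(data, k)
    assign = [0] * len(data)
    for _ in range(iters):
        # 2-3. Assign each point to the cluster of the closest centroid
        assign = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(x, centroids[c])))
                  for x in data]
        # 4. Recompute each centroid as the mean of its points
        new = []
        for c in range(k):
            members = [x for x, a in zip(data, assign) if a == c]
            if members:
                new.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new.append(centroids[c])  # keep an empty cluster's centroid
        # 5. Stop when converged, else repeat
        if new == centroids:
            break
        centroids = new
    return assign, centroids

data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
assign, cents = kmeans(data, 2)
print(assign, cents)
```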

Page 75:

K-means
1. Initialize a set of centroids randomly
2. For each data point x, find the distance from the centroid of each cluster: d_cluster = d(x, m_cluster)
3. Put the data point in the cluster of the closest centroid (the cluster for which d_cluster is minimum)
4. When all data points are clustered, recompute the centroids as weighted means: m_cluster = (1 / Σ_{i∈cluster} wᵢ) Σ_{i∈cluster} wᵢ xᵢ
5. If not converged, go back to 2

Page 76:

Kernel K-means
1. Initialize a set of centroids randomly
2. For each data point x, find the distance from the centroid of each cluster: d_cluster = d(x, m_cluster)
3. Put the data point in the cluster of the closest centroid (the cluster for which d_cluster is minimum)
4. When all data points are clustered, recompute the centroids: m_cluster = (1 / Σ_{i∈cluster} wᵢ) Σ_{i∈cluster} wᵢ Φ(xᵢ)
5. If not converged, go back to 2
(In practice the centroid is never computed explicitly; the distances are evaluated using only the kernel function.)

Page 77:

How many clusters?

Assumptions:
The dimensionality of the kernel space > the number of clusters
Clusters represent separate directions in the kernel space

Kernel correlation matrix K: K_ij = K(xᵢ, xⱼ)

Find the eigenvalues λ and eigenvectors e of the kernel matrix.
Number of clusters = number of dominant λᵢ (1ᵀeᵢ) terms.
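This eigenvalue criterion can be sketched numerically. Power iteration with deflation stands in for a proper eigensolver here, and the 0.1 dominance threshold and the example points (two tight groups of 5 and 3) are arbitrary choices for the illustration, not from the slides.

```python
import math, random

def top_eigs(M, m, iters=300, seed=0):
    # Power iteration with deflation for a symmetric matrix: a rough
    # stand-in for a real eigensolver, adequate for this illustration
    rng = random.Random(seed)
    n = len(M)
    M = [row[:] for row in M]
    pairs = []
    for _ in range(m):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):          # deflate: remove the found component
            for j in range(n):
                M[i][j] -= lam * v[i] * v[j]
    return pairs

# Kernel matrix for 5 + 3 tightly grouped points: two "directions" in kernel space
A = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
B = [(8.0, 8.0), (8.1, 8.0), (8.0, 8.1)]
pts = A + B
G = [[math.exp(-((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) / 4.0) for q in pts]
     for p in pts]

# Count the dominant lambda_i * (1' e_i) terms
terms = [lam * abs(sum(v)) for lam, v in top_eigs(G, 4)]
est = sum(t > 0.1 * max(terms) for t in terms)
print(terms, est)  # est matches the number of groups
```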

Page 78:

Spectral Methods

"Spectral" methods attempt to find "principal" subspaces of the high-dimensional kernel space. Clustering is performed in the principal subspaces.
Normalized cuts
Spectral clustering

Involves finding eigenvectors and eigenvalues of the kernel matrix.
Fortunately, provably analogous to kernel K-means.

Page 79:

Other clustering methods

Regression-based clustering:
Find a regression representing each cluster
Associate each point to the cluster with the best regression
Related to kernel methods
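The regression-clustering idea can be sketched for lines in the plane: alternate between fitting one least-squares line per cluster and re-assigning each point to the line with the smallest squared residual. The deterministic initialization, k = 2, and the toy data are illustrative choices, not from the slides.

```python
def fit_line(points):
    # Least-squares fit of y = a*x + b
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0.0:                 # all x equal: fall back to a flat line
        return 0.0, sy / n
    a = (n * sxy - sx * sy) / denom
    return a, (sy - a * sx) / n

def regression_clustering(points, k=2, iters=20):
    # Alternate: fit one regression per cluster, then re-assign each point
    # to the cluster whose regression predicts it best
    assign = [i % k for i in range(len(points))]  # simple deterministic init
    lines = []
    for _ in range(iters):
        lines = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            lines.append(fit_line(members) if len(members) >= 2 else (0.0, 0.0))
        new = [min(range(k),
                   key=lambda c: (y - lines[c][0] * x - lines[c][1]) ** 2)
               for x, y in points]
        if new == assign:
            break
        assign = new
    return assign, lines

# Points drawn from two different lines: y = 2x and y = -x + 5
L1 = [(float(x), 2.0 * x) for x in range(5)]
L2 = [(float(x), -x + 5.0) for x in range(5)]
assign, lines = regression_clustering(L1 + L2)
print(assign, lines)
```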

Page 80:

Clustering..

Many, many other variants
Many applications..

Important: an appropriate choice of feature may eliminate the need for the kernel trick..

Google is your friend.