Introduction to Machine Learning
Clustering
Varun Chandola
Computer Science & Engineering
State University of New York at Buffalo
Buffalo, NY, USA
Chandola@UB CSE 474/574 1 / 19
Outline
Clustering
- Clustering Definition
- K-Means Clustering
- Instantiations and Variants of K-Means
- Choosing Parameters
- Initialization Issues
- K-Means Limitations

Spectral Clustering
- Graph Laplacian
- Spectral Clustering Algorithm
Publishing a Magazine
- Imagine you are a magazine editor
- You need to produce the next issue
- What do you do?
- Call your four assistant editors:
  1. Politics
  2. Health
  3. Technology
  4. Sports
- Ask each to send in k articles
- Join them all to create an issue
Treating a Magazine Issue as a Data Set
- Each article is a data point consisting of words, etc.
- Each article has a (hidden) type: sports, health, politics, or technology
- Now imagine you are the reader: can you assign the type to each article?
- Simpler problem: can you group the articles by type?
- This is clustering
What is Clustering?
- Grouping similar things together
- Requires a notion of a similarity or distance metric
- A type of unsupervised learning: learning without any labels or targets
Expected Outcome of Clustering
[Figure: scatter plots in feature space (x1 vs. x2) showing the articles before and after being grouped into Technology, Health, Politics, and Sports clusters]
K-Means Clustering
- Objective: group a set of N points (∈ ℝ^D) into K clusters.

1. Start with K randomly initialized points in D-dimensional space
   - Denoted as {c_k}, k = 1, …, K
   - Also called cluster centers
2. Assign each input point x_n (∀n ∈ [1, N]) to the cluster k that minimizes the distance to its center, i.e., k = \arg\min_k dist(x_n, c_k)
3. Revise each cluster center c_k using all points assigned to cluster k
4. Repeat steps 2 and 3 until the assignments stop changing
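The steps above can be sketched in a few lines of NumPy. This is an illustrative implementation of Lloyd's algorithm (the function and variable names are my own), not a production version:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Illustrative sketch of the K-means steps above (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize the K cluster centers as randomly chosen input points.
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each x_n to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Step 3: revise each center to the mean of its assigned points
        # (keeping the old center if a cluster happens to be empty).
        new_centers = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                                else centers[k] for k in range(K)])
        # Step 4: repeat until the centers stop moving.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign
```

On well-separated data this typically converges in a handful of iterations; the initialization issues discussed later explain why multiple restarts are still advisable.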
Variants of K-Means
- Finding distance
  - Euclidean distance is popular
- Finding cluster centers
  - Mean for K-means
  - The most centrally located data point (medoid) for K-medoids
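The effect of the center update is easy to see with an outlier. The coordinate-wise median below is the K-medians-style update, shown purely to illustrate robustness (K-medoids additionally restricts the center to be an actual data point); the data are made up:

```python
import numpy as np

# Points assigned to one cluster, with a single outlier.
pts = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [10.0, 10.0]])

mean_center = pts.mean(axis=0)           # K-means update: dragged toward the outlier
median_center = np.median(pts, axis=0)   # median update: robust to the outlier

# The median center stays near the bulk of the cluster; the mean does not.
assert np.linalg.norm(median_center - [1.0, 1.0]) < 0.5
assert np.linalg.norm(mean_center - [1.0, 1.0]) > 2.0
```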
Choosing Parameters
1. Similarity/distance metric
   - Can use non-linear transformations
   - K-means with Euclidean distance produces "circular" clusters
2. How to set K?
   - Trial and error
   - How do we evaluate a clustering? Use the K-means objective function:

     J(c, R) = \sum_{n=1}^{N} \sum_{k=1}^{K} R_{nk} \|x_n - c_k\|^2

   - R is the N × K cluster assignment matrix:

     R_{nk} = 1 if x_n ∈ cluster k, and 0 otherwise
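The objective J(c, R) is straightforward to compute, and one common version of "trial and error" is to plot it against K and look for a sharp bend (the "elbow" heuristic, not covered on the slide). A minimal sketch with made-up data:

```python
import numpy as np

def kmeans_objective(X, centers, assign):
    """J(c, R): sum of squared distances from each point to its assigned center."""
    return float(((X - centers[assign]) ** 2).sum())

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
assign = np.array([0, 0, 1, 1])

centers_a = np.array([[0.0, 0.0], [5.0, 5.0]])   # arbitrary centers
centers_b = np.array([[0.0, 0.5], [5.0, 5.5]])   # the per-cluster means

# Moving each center to the mean of its cluster can only decrease J,
# which is why the center-update step of K-means makes progress.
assert kmeans_objective(X, centers_b, assign) < kmeans_objective(X, centers_a, assign)
```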
Initialization Issues
- A bad initialization can lead to a poor clustering
- Better strategies:
  1. Choose the first centroid randomly, the second farthest away from the first, the third farthest away from the first and second, and so on.
  2. Make multiple runs and choose the best
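Strategy 1 (greedy farthest-point initialization) can be sketched as follows; the function name is my own:

```python
import numpy as np

def farthest_first_init(X, K, seed=0):
    """Pick the first centroid at random, then each next centroid as the
    point farthest from all centroids chosen so far (strategy 1 above)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        # Distance from every point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    return np.array(centers)
```

The related k-means++ scheme samples the next centroid with probability proportional to this squared distance rather than taking the maximum, which makes it less sensitive to outliers.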
Strengths and Limitations of K-Means
Strengths
- Simple
- Can be extended to other types of data
- Easy to parallelize

Weaknesses
- Produces "circular" clusters (though not with kernelized versions)
- Choosing K is always an issue
- Not guaranteed to find the optimal solution
- Works well only if the natural clusters are round and of equal density
- Hard clustering: each point belongs to exactly one cluster
Issues with K-Means
- "Hard clustering": every data point is assigned to exactly one cluster
- Probabilistic clustering: each data point can belong to multiple clusters with varying probabilities
- In general, P(x_i ∈ C_j) > 0 for all j = 1, …, K
- For hard clustering, the probability is 1 for one cluster and 0 for all others
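As a toy illustration of soft assignment (not from the slides), probabilities can be obtained by passing negative squared distances through a softmax; a full probabilistic treatment would instead fit a mixture model such as a Gaussian mixture with EM:

```python
import numpy as np

def soft_assign(X, centers, beta=1.0):
    """Toy soft clustering: P(x_i in C_j) from a softmax over negative
    squared distances to the centers (beta controls how 'hard' it is)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-beta * d2)
    return w / w.sum(axis=1, keepdims=True)

X = np.array([[0.0, 0.0], [2.5, 2.5]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
P = soft_assign(X, centers)

assert np.allclose(P.sum(axis=1), 1.0)   # each row is a probability distribution
assert (P > 0).all()                     # every point belongs a little to every cluster
```

As beta → ∞, the softmax sharpens and this recovers hard assignment: probability 1 for the nearest cluster and 0 elsewhere.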
Spectral Clustering
- An alternative approach to clustering
- Let the data be a set of N points: X = {x_1, x_2, …, x_N}
- Let S be an N × N similarity matrix: S_{ij} = sim(x_i, x_j), where sim(·, ·) is a similarity function
- Construct a weighted undirected graph from S with adjacency matrix W:

  W_{ij} = sim(x_i, x_j) if x_i is a nearest neighbor of x_j, and 0 otherwise

- More than one nearest neighbor can be used to construct the graph
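A common concrete choice (an assumption here, since the slides leave sim(·, ·) abstract) is the Gaussian similarity sim(x_i, x_j) = exp(−γ‖x_i − x_j‖²), kept only for nearest-neighbor pairs:

```python
import numpy as np

def knn_similarity_graph(X, n_neighbors=2, gamma=1.0):
    """Build a symmetric adjacency matrix W: Gaussian similarities,
    kept only between k-nearest-neighbor pairs (illustrative sketch)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    S = np.exp(-gamma * d2)
    np.fill_diagonal(S, 0.0)                   # no self-loops
    W = np.zeros_like(S)
    for i in range(len(X)):
        nn = np.argsort(-S[i])[:n_neighbors]   # the most similar points to x_i
        W[i, nn] = S[i, nn]
    return np.maximum(W, W.T)                  # symmetrize: undirected graph
```

Symmetrizing with the elementwise maximum keeps an edge whenever either endpoint considers the other a nearest neighbor, which is one standard convention for kNN graphs.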
Spectral Clustering as a Graph Min-cut Problem
- Clustering X into K clusters is equivalent to finding K cuts in the graph W: A_1, A_2, …, A_K
- A possible objective function:

  cut(A_1, A_2, \ldots, A_K) \triangleq \frac{1}{2} \sum_{k=1}^{K} W(A_k, \bar{A}_k)

- where \bar{A}_k denotes the nodes of the graph that are not in A_k, and

  W(A, B) \triangleq \sum_{i \in A, j \in B} W_{ij}
Straight min-cut results in trivial solution
- For K = 2, an optimal solution would place only one node in A_1 and the rest in A_2 (or vice versa)

Normalized Min-cut Problem

  normcut(A_1, A_2, \ldots, A_K) \triangleq \frac{1}{2} \sum_{k=1}^{K} \frac{W(A_k, \bar{A}_k)}{vol(A_k)}

where vol(A) \triangleq \sum_{i \in A} d_i and d_i is the weighted degree of node i
NP Hard Problem
- Equivalent to solving a 0-1 knapsack problem
- Find N binary vectors c_i of length K such that c_{ik} = 1 only if point i belongs to cluster k
- If we relax the constraints to allow c_{ik} to be real-valued, the problem becomes an eigenvector problem
  - Hence the name: spectral clustering
The Graph Laplacian
  L \triangleq D - W

- D is a diagonal matrix whose diagonal values are the degrees of the corresponding nodes

Properties of the Laplacian Matrix
1. Each row sums to 0
2. The all-ones vector 1 is an eigenvector, with eigenvalue 0
3. Symmetric and positive semi-definite
4. Has N non-negative real-valued eigenvalues
5. If the graph (W) has K connected components, then L has K eigenvectors with eigenvalue 0, spanned by the indicator vectors 1_{A_1}, …, 1_{A_K}
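These properties are easy to verify numerically on a small example: a four-node graph with two connected components ({0, 1} and {2, 3}), so K = 2 and the eigenvalue 0 appears with multiplicity 2:

```python
import numpy as np

# Adjacency matrix of a graph with two connected components.
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))   # diagonal degree matrix
L = D - W                    # the graph Laplacian

assert np.allclose(L.sum(axis=1), 0)         # property 1: rows sum to 0
assert np.allclose(L, L.T)                   # property 3: symmetric
eigvals = np.linalg.eigvalsh(L)              # real, sorted (L is symmetric)
assert np.all(eigvals >= -1e-10)             # properties 3-4: PSD, non-negative
assert np.sum(np.abs(eigvals) < 1e-10) == 2  # property 5: K = 2 zero eigenvalues
```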
Spectral Clustering Algorithm
Observation
- In practice, W might not have exactly K isolated connected components
- By perturbation theory, the eigenvectors of L with the smallest eigenvalues will still be close to the ideal indicator vectors

Algorithm
- Compute the first (smallest) K eigenvectors of L
- Let U be the N × K matrix with these eigenvectors as its columns
- Perform K-means clustering on the rows of U
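Putting the pieces together, a minimal sketch of the algorithm above (assuming a precomputed adjacency matrix W, and using a deterministic farthest-first initialization for the final K-means step):

```python
import numpy as np

def spectral_clustering(W, K):
    """Embed the nodes via the K eigenvectors of L = D - W with the smallest
    eigenvalues, then run a basic K-means on the rows of the embedding."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    _, eigvecs = np.linalg.eigh(L)      # columns sorted by ascending eigenvalue
    U = eigvecs[:, :K]                  # N x K matrix of the smallest eigenvectors
    # Basic K-means on the rows of U, with farthest-first initialization.
    centers = [U[0]]
    for _ in range(K - 1):
        d = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(100):
        assign = np.linalg.norm(U[:, None] - centers[None], axis=2).argmin(axis=1)
        new = np.array([U[assign == k].mean(axis=0) if np.any(assign == k)
                        else centers[k] for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign
```

On a graph with exactly K connected components, the rows of U are identical within each component, so K-means recovers the components directly; on a perturbed graph the rows are merely close, which is where the perturbation argument above comes in.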