
Introduction to Machine Learning

Clustering

Varun Chandola
Computer Science & Engineering
State University of New York at Buffalo, Buffalo, NY, USA
chandola@buffalo.edu

Outline

- Clustering
  - Clustering Definition
  - K-Means Clustering
  - Instantiations and Variants of K-Means
  - Choosing Parameters
  - Initialization Issues
  - K-Means Limitations
- Spectral Clustering
  - Graph Laplacian
  - Spectral Clustering Algorithm

Publishing a Magazine

- Imagine you are a magazine editor
- You need to produce the next issue
- What do you do? Call your four assistant editors:
  1. Politics
  2. Health
  3. Technology
  4. Sports
- Ask each to send in k articles
- Join all the articles to create an issue

Treating a Magazine Issue as a Data Set

- Each article is a data point consisting of words, etc.
- Each article has a (hidden) type: sports, health, politics, or technology
- Now imagine you are the reader. Can you assign the type to each article?
- A simpler problem: can you group the articles by type?
- That grouping problem is clustering.

What is Clustering?

- Grouping similar things together
- Requires a notion of a similarity or distance metric
- A type of unsupervised learning: learning without any labels or targets

Expected Outcome of Clustering

[Figure: two scatter plots over features x1 and x2; the raw points, and the same points grouped into four clusters labeled Technology, Health, Politics, and Sports.]

K-Means Clustering

Objective: group a set of N points ($\in \mathbb{R}^D$) into K clusters.

1. Start with K randomly initialized points in D-dimensional space, denoted $\{c_k\}_{k=1}^{K}$ and also called the cluster centers.
2. Assign each input point $x_n$ ($\forall n \in [1, N]$) to the cluster $k$ that attains $\min_k \mathrm{dist}(x_n, c_k)$.
3. Revise each cluster center $c_k$ using all points assigned to cluster $k$.
4. Repeat from step 2 until the assignments stop changing.
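A minimal NumPy sketch of these four steps follows; the function name, its arguments, and the convergence test are illustrative choices rather than prescribed by the slides, and Euclidean distance is assumed.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """K-means (Lloyd's algorithm): X is an (N, D) data matrix."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize the K cluster centers with randomly chosen data points.
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to the nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Step 3: move each center to the mean of the points assigned to it.
        new_centers = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                                else centers[k] for k in range(K)])
        # Step 4: repeat until the centers stop moving.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign
```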

Variants of K-Means

- Finding the distance: Euclidean distance is popular
- Finding the cluster centers:
  - Mean for K-means
  - Median for K-medoids
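In code, these variants differ only in the center-update step; a sketch (the helper name is illustrative):

```python
import numpy as np

def update_center(points_in_k, variant="mean"):
    """Center update for one cluster: the only step that changes across variants."""
    if variant == "mean":
        return points_in_k.mean(axis=0)       # K-means
    return np.median(points_in_k, axis=0)     # median-based update, per the slide
```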

Choosing Parameters

1. Similarity/distance metric
   - Can use non-linear transformations
   - K-means with Euclidean distance produces "circular" clusters
2. How to set K?
   - Trial and error
   - How to evaluate a clustering? Use the K-means objective function:

$$J(\mathbf{c}, \mathbf{R}) = \sum_{n=1}^{N} \sum_{k=1}^{K} R_{nk} \, \| x_n - c_k \|^2$$

where $\mathbf{R}$ is the cluster assignment matrix:

$$R_{nk} = \begin{cases} 1 & \text{if } x_n \in \text{cluster } k \\ 0 & \text{otherwise} \end{cases}$$
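The objective translates directly into code; a sketch for scoring a clustering, e.g. to compare different values of K during trial and error (`kmeans_objective` is an illustrative name):

```python
import numpy as np

def kmeans_objective(X, centers, assign):
    """J(c, R): total squared distance from each point to its assigned center."""
    diffs = X - centers[assign]    # x_n - c_k for the center each point is assigned to
    return float(np.sum(diffs ** 2))
```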

Initialization Issues

- A bad initialization can lead to a wrong clustering
- Better strategies:
  1. Choose the first centroid randomly, choose the second farthest away from the first, the third farthest away from the first and second, and so on.
  2. Make multiple runs and choose the best
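Strategy 1 has a direct implementation; a sketch assuming Euclidean distance (the function name is illustrative; the well-known k-means++ method is a randomized relative of the same idea):

```python
import numpy as np

def farthest_point_init(X, K, seed=0):
    """Pick the first center at random, then each next center as the point
    farthest from all centers chosen so far."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        # For every point, the distance to its nearest already-chosen center.
        d = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2).min(axis=1)
        centers.append(X[d.argmax()])    # the farthest point becomes the next center
    return np.array(centers)
```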

Strengths and Limitations of K-Means

Strengths
- Simple
- Can be extended to other types of data
- Easy to parallelize

Weaknesses
- Produces "circular" clusters (though not in kernelized versions)
- Choosing K is always an issue
- Not guaranteed to be optimal
- Works well only if the natural clusters are round and of equal density
- Hard clustering

Issues with K-Means

- "Hard clustering": every data point is assigned to exactly one cluster
- Probabilistic clustering: each data point can belong to multiple clusters with varying probabilities; in general,

$$P(x_i \in C_j) > 0 \quad \forall j = 1, \ldots, K$$

- For hard clustering, the probability is 1 for one cluster and 0 for all others
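The contrast is easy to see on a single point; an illustrative example (the softmax weighting is one possible soft assignment, not one the slides prescribe):

```python
import numpy as np

# Distances from one point x_i to K = 3 cluster centers.
d = np.array([0.5, 1.0, 3.0])

# Hard clustering: probability 1 for the nearest cluster, 0 for the others.
hard = np.eye(3)[d.argmin()]             # -> [1., 0., 0.]

# Soft (probabilistic) clustering: every probability is > 0 and they sum to 1.
soft = np.exp(-d) / np.exp(-d).sum()     # closer centers get higher probability
```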

Spectral Clustering

- An alternative approach to clustering
- Let the data be a set of N points:

$$X = \{x_1, x_2, \ldots, x_N\}$$

- Let $S$ be an $N \times N$ similarity matrix:

$$S_{ij} = \mathrm{sim}(x_i, x_j)$$

where $\mathrm{sim}(\cdot, \cdot)$ is a similarity function.

- Construct a weighted undirected graph from $S$ with adjacency matrix $W$:

$$W_{ij} = \begin{cases} \mathrm{sim}(x_i, x_j) & \text{if } x_i \text{ is a nearest neighbor of } x_j \\ 0 & \text{otherwise} \end{cases}$$

- Can use more than one nearest neighbor to construct the graph
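A sketch of the graph construction; the slides leave $\mathrm{sim}(\cdot, \cdot)$ abstract, so the Gaussian kernel below is an assumed choice, as are the function and parameter names:

```python
import numpy as np

def knn_similarity_graph(X, n_neighbors=5, sigma=1.0):
    """Adjacency matrix W of a weighted, undirected k-nearest-neighbor graph."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    S = np.exp(-sq_dists / (2 * sigma ** 2))       # similarity matrix S
    W = np.zeros_like(S)
    for i in range(len(X)):
        # Keep edges only to the n_neighbors most similar points (skip i itself).
        nbrs = np.argsort(S[i])[::-1][1:n_neighbors + 1]
        W[i, nbrs] = S[i, nbrs]
    return np.maximum(W, W.T)                      # symmetrize: undirected graph
```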

Spectral Clustering as a Graph Min-cut Problem

- Clustering X into K clusters is equivalent to finding K cuts in the graph W: $A_1, A_2, \ldots, A_K$
- A possible objective function:

$$\mathrm{cut}(A_1, A_2, \ldots, A_K) \triangleq \frac{1}{2} \sum_{k=1}^{K} W(A_k, \bar{A}_k)$$

where $\bar{A}_k$ denotes the nodes in the graph that are not in $A_k$, and

$$W(A, B) \triangleq \sum_{i \in A,\, j \in B} W_{ij}$$
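Evaluating the cut objective for a given partition is straightforward; a sketch (function and argument names are illustrative):

```python
import numpy as np

def cut_value(W, labels, K):
    """cut(A_1, ..., A_K) = 1/2 * sum_k W(A_k, complement of A_k)."""
    total = 0.0
    for k in range(K):
        in_k = labels == k
        total += W[np.ix_(in_k, ~in_k)].sum()   # weight of edges leaving A_k
    return total / 2.0
```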

Straight min-cut results in a trivial solution

- For K = 2, an optimal solution would place only one node in $A_1$ and the rest in $A_2$ (or vice versa)

Normalized Min-cut Problem

$$\mathrm{normcut}(A_1, A_2, \ldots, A_K) \triangleq \frac{1}{2} \sum_{k=1}^{K} \frac{W(A_k, \bar{A}_k)}{\mathrm{vol}(A_k)}$$

where $\mathrm{vol}(A) \triangleq \sum_{i \in A} d_i$ and $d_i$ is the weighted degree of node $i$.
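Normalization only changes the denominator; a sketch that mirrors `cut_value` (again with illustrative names):

```python
import numpy as np

def normcut_value(W, labels, K):
    """normcut = 1/2 * sum_k W(A_k, complement of A_k) / vol(A_k)."""
    d = W.sum(axis=1)                 # weighted degree d_i of each node
    total = 0.0
    for k in range(K):
        in_k = labels == k
        total += W[np.ix_(in_k, ~in_k)].sum() / d[in_k].sum()   # vol(A_k) = sum of d_i in A_k
    return total / 2.0
```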

An NP-Hard Problem

- Equivalent to solving a 0-1 knapsack problem
- Find N binary vectors $c_i$ of length K such that $c_{ik} = 1$ only if point $i$ belongs to cluster $k$
- If we relax the constraints to allow $c_{ik}$ to be real-valued, the problem becomes an eigenvector problem; hence the name: spectral clustering

The Graph Laplacian

$$L \triangleq D - W$$

- $D$ is a diagonal matrix whose diagonal values are the degrees of the corresponding nodes

Properties of the Laplacian Matrix

1. Each row sums to 0
2. $\mathbf{1}$ is an eigenvector with eigenvalue equal to 0
3. Symmetric and positive semi-definite
4. Has N non-negative real-valued eigenvalues
5. If the graph (W) has K connected components, then the eigenspace of L for eigenvalue 0 is spanned by the indicator vectors $\mathbf{1}_{A_1}, \ldots, \mathbf{1}_{A_K}$
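These properties can be checked numerically; a small self-contained example on a toy graph with two connected components:

```python
import numpy as np

def graph_laplacian(W):
    """L = D - W, where D is the diagonal matrix of weighted node degrees."""
    return np.diag(W.sum(axis=1)) - W

# Toy graph: two connected components, {0, 1} and {2, 3}.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 0., 0.],
              [0., 0., 0., 1.],
              [0., 0., 1., 0.]])
L = graph_laplacian(W)

assert np.allclose(L.sum(axis=1), 0)        # property 1: rows sum to 0
vals = np.linalg.eigvalsh(L)                # symmetric, so eigenvalues are real
assert np.all(vals >= -1e-12)               # properties 3-4: PSD, non-negative spectrum
assert np.sum(np.isclose(vals, 0)) == 2     # property 5: K = 2 components -> two 0 eigenvalues
```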

Spectral Clustering Algorithm

Observation
- In practice, W might not have exactly K isolated connected components
- By perturbation theory, the eigenvectors of L with the smallest eigenvalues will be close to the ideal indicator vectors

Algorithm
- Compute the first (smallest) K eigenvectors of L
- Let U be the N × K matrix with these eigenvectors as its columns
- Perform K-means clustering on the rows of U
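Putting the pieces together; a sketch that reuses the `graph_laplacian` and `kmeans` sketches above:

```python
import numpy as np

def spectral_clustering(W, K):
    """Unnormalized spectral clustering on adjacency matrix W, per the slide."""
    L = graph_laplacian(W)
    vals, vecs = np.linalg.eigh(L)    # eigenpairs in ascending order of eigenvalue
    U = vecs[:, :K]                   # N x K matrix: the K smallest eigenvectors as columns
    _, labels = kmeans(U, K)          # cluster the rows of U
    return labels
```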
