
Introduction to Machine Learning

Clustering

Varun Chandola
Computer Science & Engineering
State University of New York at Buffalo, Buffalo, NY, USA
chandola@buffalo.edu


Outline

- Clustering
  - Clustering Definition
  - K-Means Clustering
  - Instantiations and Variants of K-Means
  - Choosing Parameters
  - Initialization Issues
  - K-Means Limitations
- Spectral Clustering
  - Graph Laplacian
  - Spectral Clustering Algorithm


Publishing a Magazine

- Imagine you are a magazine editor
- You need to produce the next issue
- What do you do?
- Call your four assistant editors:
  1. Politics
  2. Health
  3. Technology
  4. Sports
- Ask each to send in k articles
- Join all the articles to create an issue


Treating a Magazine Issue as a Data Set

- Each article is a data point consisting of words, etc.
- Each article has a (hidden) type: sports, health, politics, or technology

Now imagine you are the reader:
- Can you assign the type to each article?
- Simpler problem: can you group the articles by type?
- This is clustering


What is Clustering?

- Grouping similar things together
- Requires a notion of a similarity or distance metric
- A type of unsupervised learning: learning without any labels or targets


Expected Outcome of Clustering

[Figure: scatter plots of the articles over axes x1 and x2, before and after clustering; cluster labels: Technology, Health, Politics, Sports]


K-Means Clustering

- Objective: group a set of $N$ points ($\in \mathbb{R}^D$) into $K$ clusters.

1. Start with $K$ randomly initialized points in $D$-dimensional space
   - Denoted as $\{c_k\}_{k=1}^{K}$
   - Also called cluster centers
2. Assign each input point $x_n$ ($\forall n \in [1, N]$) to the cluster $k$ that achieves $\min_k \, \mathrm{dist}(x_n, c_k)$
3. Revise each cluster center $c_k$ using all points assigned to cluster $k$
4. Repeat steps 2 and 3 until the assignments stop changing (a sketch of this loop follows below)
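A minimal NumPy sketch of the loop above, assuming Euclidean distance and mean-based center updates; the function and variable names are my own, not from the slides:

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Basic K-Means sketch: X is an (N, D) array, K the number of clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize the K cluster centers as randomly chosen input points.
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    prev = None
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        if prev is not None and np.array_equal(assign, prev):
            break  # assignments stopped changing: converged
        prev = assign
        # Step 3: move each center to the mean of the points assigned to it.
        for k in range(K):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    return centers, assign
```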


Variants of K-Means

- Finding the distance
  - Euclidean distance is popular
- Finding cluster centers
  - Mean for K-Means
  - Medoid (the most centrally located data point) for K-medoids


Choosing Parameters

1. Similarity/distance metric
   - Can use non-linear transformations
   - K-Means with Euclidean distance produces "circular" clusters
2. How to set $K$?
   - Trial and error
   - How to evaluate a clustering? Use the K-Means objective function (a sketch of using it to pick $K$ follows below):

$$J(c, R) = \sum_{n=1}^{N} \sum_{k=1}^{K} R_{nk} \, \lVert x_n - c_k \rVert^2$$

   - $R$ is the cluster assignment matrix:

$$R_{nk} = \begin{cases} 1 & \text{if } x_n \in \text{cluster } k \\ 0 & \text{otherwise} \end{cases}$$
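One common way to do the trial and error is the "elbow" heuristic: compute $J$ for several values of $K$ and look for the point where it stops dropping sharply. A sketch, reusing the `kmeans` function from above on synthetic data (the heuristic is a standard practice, not named on the slide):

```python
import numpy as np

# Synthetic data: three well-separated 2-D blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(50, 2)) for m in (0, 4, 8)])

def objective(X, centers, assign):
    # J(c, R): sum of squared distances of points to their assigned centers.
    return float(np.sum((X - centers[assign]) ** 2))

for K in range(1, 7):
    centers, assign = kmeans(X, K)
    print(K, objective(X, centers, assign))  # J drops sharply until K = 3
```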


Initialization Issues

- A poor initialization can lead to a wrong clustering
- Better strategies (strategy 1 is sketched below):
  1. Choose the first centroid randomly, choose the second farthest away from the first, the third farthest away from the first and second, and so on.
  2. Make multiple runs and choose the best
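A minimal sketch of the farthest-point strategy, assuming Euclidean distance; the function name is my own:

```python
import numpy as np

def farthest_first_init(X, K, seed=0):
    """Pick the first center at random; each subsequent center is the
    point farthest from all centers chosen so far."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        # Distance from each point to its nearest already-chosen center.
        d = np.min(
            np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2),
            axis=1,
        )
        centers.append(X[d.argmax()])  # farthest point becomes the next center
    return np.array(centers)
```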


Strengths and Limitations of K-Means

Strengths
- Simple
- Can be extended to other types of data
- Easy to parallelize

Weaknesses
- Produces "circular" clusters (though not with kernelized versions)
- Choosing K is always an issue
- Not guaranteed to find the optimal clustering
- Works well only if the natural clusters are round and of equal densities
- Hard clustering


Issues with K-Means

- "Hard clustering": assigns every data point to exactly one cluster
- Probabilistic clustering
  - Each data point can belong to multiple clusters with varying probabilities
  - In general, $P(x_i \in C_j) > 0 \quad \forall j = 1, \ldots, K$
  - For hard clustering, the probability is 1 for one cluster and 0 for all others


Spectral Clustering

- An alternative approach to clustering
- Let the data be a set of $N$ points: $X = \{x_1, x_2, \ldots, x_N\}$
- Let $S$ be an $N \times N$ similarity matrix: $S_{ij} = \mathrm{sim}(x_i, x_j)$, where $\mathrm{sim}(\cdot, \cdot)$ is a similarity function
- Construct a weighted undirected graph from $S$ with adjacency matrix $W$:

$$W_{ij} = \begin{cases} \mathrm{sim}(x_i, x_j) & \text{if } x_i \text{ is a nearest neighbor of } x_j \\ 0 & \text{otherwise} \end{cases}$$

- More than one nearest neighbor can be used to construct the graph (a construction sketch follows below)
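A sketch of building such a k-nearest-neighbor similarity graph. The Gaussian (RBF) similarity used here is a common choice but an assumption of mine; the slide leaves sim(,) unspecified:

```python
import numpy as np

def knn_graph(X, n_neighbors=5, sigma=1.0):
    """Symmetric kNN adjacency matrix W with Gaussian (RBF) similarities."""
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # squared distances
    S = np.exp(-D2 / (2 * sigma ** 2))                         # similarity matrix
    W = np.zeros_like(S)
    for i in range(len(X)):
        # Keep edges only to the n_neighbors most similar points (excluding i).
        nbrs = np.argsort(S[i])[::-1]
        nbrs = nbrs[nbrs != i][:n_neighbors]
        W[i, nbrs] = S[i, nbrs]
    return np.maximum(W, W.T)  # symmetrize: keep an edge if either endpoint chose it
```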


Spectral Clustering as a Graph Min-cut Problem

- Clustering $X$ into $K$ clusters is equivalent to finding $K$ cuts in the graph $W$: $A_1, A_2, \ldots, A_K$
- A possible objective function:

$$\mathrm{cut}(A_1, A_2, \ldots, A_K) \triangleq \frac{1}{2} \sum_{k=1}^{K} W(A_k, \bar{A}_k)$$

- where $\bar{A}_k$ denotes the nodes in the graph that are not in $A_k$, and

$$W(A, B) \triangleq \sum_{i \in A,\, j \in B} W_{ij}$$


Straight min-cut results in a trivial solution
- For $K = 2$, an optimal solution would place only one node in $A_1$ and the rest in $A_2$ (or vice versa)

Normalized Min-cut Problem

$$\mathrm{normcut}(A_1, A_2, \ldots, A_K) \triangleq \frac{1}{2} \sum_{k=1}^{K} \frac{W(A_k, \bar{A}_k)}{\mathrm{vol}(A_k)}$$

where $\mathrm{vol}(A) \triangleq \sum_{i \in A} d_i$ and $d_i$ is the weighted degree of node $i$.


NP Hard Problem

- Equivalent to solving a 0-1 knapsack problem
- Find $N$ binary vectors $c_i$ of length $K$ such that $c_{ik} = 1$ only if point $i$ belongs to cluster $k$
- If we relax the constraints to allow $c_{ik}$ to be real-valued, the problem becomes an eigenvector problem
  - Hence the name: spectral clustering


The Graph Laplacian

$$L \triangleq D - W$$

- $D$ is a diagonal matrix with the degree of the corresponding node as each diagonal value

Properties of the Laplacian Matrix
1. Each row sums to 0
2. $\mathbf{1}$ is an eigenvector with eigenvalue 0
3. Symmetric and positive semi-definite
4. Has $N$ non-negative real-valued eigenvalues
5. If the graph ($W$) has $K$ connected components, then $L$ has $K$ eigenvectors spanned by $\mathbf{1}_{A_1}, \ldots, \mathbf{1}_{A_K}$ with eigenvalue 0.
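A small sketch that computes $L = D - W$ and numerically checks a few of the listed properties on a toy graph with two connected components:

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    D = np.diag(W.sum(axis=1))  # weighted degrees on the diagonal
    return D - W

# Toy graph: two connected components {0, 1} and {2, 3}.
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = graph_laplacian(W)

assert np.allclose(L.sum(axis=1), 0)        # property 1: rows sum to 0
eigvals = np.linalg.eigvalsh(L)
assert np.all(eigvals >= -1e-10)            # property 3/4: non-negative eigenvalues
assert np.sum(np.isclose(eigvals, 0)) == 2  # property 5: 2 components -> 2 zero eigenvalues
```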


Spectral Clustering Algorithm

Observation
- In practice, $W$ might not have exactly $K$ isolated connected components
- By perturbation theory, the smallest eigenvectors of $L$ will be close to the ideal indicator functions

Algorithm
- Compute the first (smallest) $K$ eigenvectors of $L$
- Let $U$ be the $N \times K$ matrix with these eigenvectors as its columns
- Perform K-Means clustering on the rows of $U$
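Putting the pieces together, a sketch of the full algorithm that reuses the `knn_graph`, `graph_laplacian`, and `kmeans` sketches from above (all names mine):

```python
import numpy as np

def spectral_clustering(X, K, n_neighbors=5, sigma=1.0):
    """Cluster X by embedding points via the K smallest eigenvectors of L."""
    W = knn_graph(X, n_neighbors=n_neighbors, sigma=sigma)
    L = graph_laplacian(W)
    # eigh returns eigenvalues in ascending order, so the first K columns
    # are the eigenvectors with the smallest eigenvalues.
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :K]            # N x K spectral embedding
    _, assign = kmeans(U, K)      # K-Means on the rows of U
    return assign
```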

