60
Clustering CS273P

Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

  • Upload
    others

  • View
    16

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

ClusteringCS273P

Page 2: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Topics● Clustering

● K-Means clustering

● Agglomerative Clustering

● Gaussian Mixtures and Expectation-Maximization (EM)

Page 3: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Supervised learning

Supervised learning: given labeled examples

model/predictor

label

label1

label3

label4

label5

Page 4: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Unsupervised learning

Unsupervised learning: given data, i.e. examples, but no labels

Page 5: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Unsupervised learningSupervised learning

● Predict target value (“y”) given features (“x”)

Unsupervised learning

● Understand patterns of data (just “x”)● Useful for many reasons

○ Data mining (“explain”)○ Missing data values (“impute”)○ Representation (feature generation or selection)

One example: clustering

● Describe data by discrete “groups” with some characteristics

Page 6: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

ClusteringClustering describes data by “groups”

The meaning of “groups” may vary by data!

Examples

Page 7: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Data from Garber et al.PNAS (98), 2001.

Gene expression data

Page 8: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Face clustering

Page 9: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

K-Means clustering

Page 10: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

K-Means ClusteringA simple clustering algorithm

Iterate between

● Updating the assignment of data to clusters● Updating the cluster’s summarization

Notation:1. Data example i has features xi2. Assume K clusters, (K=3)3. Each cluster c “described” by a center μc4. Each cluster will “claim” a set of nearby

points

Page 11: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

K-means: an example

Page 12: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

K-means: Initialize centers randomly

Page 13: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

K-means: assign points to nearest center

Page 14: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

K-means: readjust centers

Page 15: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

K-means: assign points to nearest center

Page 16: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

K-means: readjust centers

Page 17: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

K-means: assign points to nearest center

Page 18: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

K-means: readjust centers

Page 19: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

K-means: assign points to nearest center

No changes: Done

Page 20: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

K-Means ClusteringIterate until convergence:

(A) For each datum, find the closest cluster: (zi denotes cluster membership)

(B) Set each cluster to the mean of all assigned, for each cluster c:

Page 21: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

K-Means clusteringOptimizing the cost function:

Coordinate descent:

Over the cluster assignments (fixed μ)

● Only one term in sum depends on zi ● Minimized by selecting closest μc

Over the cluster centers (fixed z)

● Cluster c only depends on xi with zi =c● Minimized by selecting the mean

Guaranteed to converge after finite number of steps!

Page 22: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

InitializationMultiple local optima, depending on initialization

Try different (randomized) initializations

Can use cost C to decide which we prefer

Page 23: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Initialization methodsRandom

● Usually, choose random data index● Ensures centers are near some data● Issue: may choose nearby points

Page 24: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Initialization methodsDistance based

● Start with one random data point● Find the point farthest from the

clusters chosen so far● Issue: may choose outliers

Page 25: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Initialization methodsRandom + distance (“kmeans++”)

● Choose next points “far but randomly”

○ p(x) ~ squared distance from x to current centers

● Likely to put a cluster far away, in a region with lots of data

Page 26: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Choosing Number of ClustersWith cost function

what is the optimal value of k?

Cost always decreases with k!

A model complexity issue…

Page 27: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Choosing the number of clustersOne solution is to penalize for complexity

Add penalty: Total = Error + Complexity

Now more clusters can increase cost, if they don’t help “enough”

Ex: simplified BIC penalty

More precise version: see e.g. “X means” ( Pelleg & Moore, 2000)

Page 28: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

SummaryK-Means clustering● Clusters described as locations (“centers”) in feature space

Procedure● Initialize cluster centers● Iterate:

○ assign each data point to its closest cluster center○ move cluster centers to minimize mean squared error

Properties● Coordinate descent on MSE criterion● Prone to local optima; initialization important

Choosing the # of clusters , K● Model selection problem; penalize for complexity (BIC, etc.)

Page 29: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Agglomerative Clustering

Page 30: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Hierarchical Agglomerative Clustering

● A simple clustering algorithm● Define a distance (or dissimilarity) between

clusters (we’ll return to this)● Initialize: every example is a cluster● Iterate:

○ Compute distances between all clusters (store for efficiency)

○ Merge two closest clusters● Save both clustering and sequence of cluster

operations● Dendrogram

Page 31: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Iteration 1Builds up a sequence of clusters (“hierarchical”)

Page 32: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Iteration 2Builds up a sequence of clusters (“hierarchical”)

Page 33: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Iteration 3Builds up a sequence of clusters (“hierarchical”)

Page 34: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Iteration m-3Builds up a sequence of clusters (“hierarchical”)

Page 35: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Iteration m-2Builds up a sequence of clusters (“hierarchical”)

Page 36: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Iteration m-1Builds up a sequence of clusters (“hierarchical”)

Page 37: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

From dentrogram to clustersGiven the sequence, can select a number of clusters or a dissimilarity threshold:

Page 38: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Cluster distances

Page 39: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Sec. 17.2

Page 40: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Sec. 17.2

Page 41: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Cluster distances - difference choices will affect clusters created

Page 42: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Example: gene expression clusteringMeasure gene expression

Various experimental conditions● Disease vs. normal● Time● Subjects

Explore similarities● What genes change together?● What conditions are similar?

Cluster on both genes and conditions

Page 43: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

SummaryAgglomerative clustering● Choose a cluster distance / dissimilarity scoring method● Successively merge closest pair of clusters● “Dendrogram” shows sequence of merges & distances

Agglomerative clusters depend critically on dissimilarity● Choice determines characteristics of “found” clusters

“Clustergram ” for understanding data matrix● Build clusters on rows (data) and columns (features)● Reorder data & features to expose behavior across groups

Page 44: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Gaussian Mixtures and EM

Page 45: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Mixtures of GaussiansK-means algorithm● Assigned each example to exactly one cluster● What if clusters are overlapping?

○ Hard to tell which cluster is right○ Maybe we should try to remain uncertain

● Used Euclidean distance● What if cluster has a non circular shape?

Gaussian mixture models● Clusters modeled as Gaussians

○ Not just by their mean

● EM algorithm: assign data to cluster with some probability● Gives probability model of x! (“generative”)

Page 46: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Mixtures of GaussiansStart with parameters describing each cluster

Mean μc , variance Σc , “size” πc

Probability distribution:

Page 47: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Mixtures of GaussiansStart with parameters describing each clusterMean μc , variance Σc , “size” πcProbability distribution:

Equivalent “latent variable” form:● Select a mixture component with probability πc● Sample from that component’s Gaussian

“Latent assignment” z:we observe x, but z is hidden

p(x) = marginal over x

Page 48: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Multivariate Gaussian Models

Page 49: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

EM Algorithm: E-stepStart with clusters: Mean μc , Covariance Σc , size πc

E-step (“Expectation”)● For each datum (example) xi ,● Compute ric, the probability that it belongs to cluster c

○ Compute its probability under model c○ Normalize to sum to one (over clusters c)

Page 50: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

EM Algorithm: E-stepStart with clusters: Mean μc , Covariance Σc , size πc

E-step (“Expectation”)● For each datum (example) xi ,● Compute ric, the probability that it belongs to cluster c

○ Compute its probability under model c○ Normalize to sum to one (over clusters c)

If xi is very likely under the c-th Gaussian, it gets high weightDenominator just makes r’ s sum to one

Page 51: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

EM Algorithm: M-StepStart with assignment probabilities ricUpdate parameters: mean μc , Covariance Σc , size πcM-step (Maximization)● For each cluster (Gaussian) z = c,● Update its parameters using the (weighted) data points

Weighted covariance of assigned dataWeighted mean of assigned data

Total responsibility allocated to cluster c

Fraction of total assigned to cluster c

Page 52: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization
Page 53: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization
Page 54: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization
Page 55: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization
Page 56: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization
Page 57: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization
Page 58: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization
Page 59: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization
Page 60: Clusteringxhx/courses/CS273P/11...K-Means Clustering A simple clustering algorithm Iterate between Updating the assignment of data to clusters Updating the cluster’s summarization

Demo Time

https://lukapopijac.github.io/gaussian-mixture-model/