
CSCI 4520 – Introduction to Machine Learning

Spring 2020

Mehdi Allahyari, Georgia Southern University

(slides borrowed from Tom Mitchell, Maria Florina Balcan, Ali Borji, Ke Chen)

Clustering


Clustering, Informal Goals

Goal: Automatically partition unlabeled data into groups of similar datapoints.

Question: When and why would we want to do this?

Useful for:
• Automatically organizing data.
• Representing high-dimensional data in a low-dimensional space (e.g., for visualization purposes).
• Understanding hidden structure in data.
• Preprocessing for further analysis.


Applications (Clustering comes up everywhere…)

• Cluster news articles, web pages, or search results by topic.
• Cluster protein sequences by function, or genes according to expression profile.
• Cluster users of social networks by interest (community detection).
  [Figure: Facebook network, Twitter network]
• Cluster customers according to purchase history.
• Cluster galaxies or nearby stars (e.g., Sloan Digital Sky Survey).
• And many, many more applications…

Clustering

CS 2750 Machine Learning, Lecture 17: Clustering
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Clustering groups together “similar” instances in the data sample.

Basic clustering problem:
• distribute data into k different groups such that data points similar to each other are in the same group
• similarity between data points is defined in terms of some distance metric (which can be chosen)

Clustering is useful for:
• Similarity/dissimilarity analysis: analyze which data points in the sample are close to each other
• Dimensionality reduction: high-dimensional data replaced with a group (cluster) label

Example

Clustering example:
• We see data points and want to partition them into groups.
• Which data points belong together?

[Figure: scatterplot of unlabeled 2-D data points, both axes ranging from -3 to 3]


Example

Clustering example:
• We see data points and want to partition them into groups.
• This requires a distance metric that tells us which points are close to each other and hence belong in the same group, e.g., the Euclidean distance.

[Figure: the same 2-D data points, grouped using the Euclidean distance]

Clustering example:
• A set of patient cases.
• We want to partition them into groups based on similarities.

Patient #   Age  Sex  Heart Rate  Blood Pressure  …
Patient 1   55   M    85          125/80
Patient 2   62   M    87          130/85
Patient 3   67   F    80          126/86
Patient 4   65   F    90          130/90
Patient 5   70   M    84          135/85


How to design the distance metric to quantify similarities?
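One simple way to approach this design question is a Gower-style mix: rescale each numeric attribute and use a 0/1 mismatch for categorical ones. The sketch below is only an illustration I added (the slides do not prescribe a particular metric); it uses the patient attributes from the table above, and the equal weighting of attributes is an assumption.

```python
import numpy as np

# Hypothetical sketch: combine rescaled numeric attributes with a 0/1 mismatch for "Sex".
patients = [
    # (age, sex, heart_rate, systolic_bp)
    (55, "M", 85, 125),
    (62, "M", 87, 130),
    (67, "F", 80, 126),
    (65, "F", 90, 130),
    (70, "M", 84, 135),
]

numeric = np.array([[p[0], p[2], p[3]] for p in patients], dtype=float)
# Rescale each numeric column to [0, 1] so no single attribute dominates the distance.
lo, hi = numeric.min(axis=0), numeric.max(axis=0)
scaled = (numeric - lo) / (hi - lo)

def patient_distance(i, j):
    """Gower-style distance: average of per-attribute dissimilarities."""
    num_part = np.abs(scaled[i] - scaled[j]).sum()
    sex_part = 0.0 if patients[i][1] == patients[j][1] else 1.0
    return (num_part + sex_part) / (scaled.shape[1] + 1)

print(patient_distance(0, 1))  # Patient 1 vs Patient 2
print(patient_distance(0, 3))  # Patient 1 vs Patient 4
```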

Clustering Example. Distance Measures

In general, one can choose an arbitrary distance measure. Properties of distance metrics (for data entries a, b, c):
• Positiveness: $d(a, b) \ge 0$
• Symmetry: $d(a, b) = d(b, a)$
• Identity: $d(a, a) = 0$
• Triangle inequality: $d(a, c) \le d(a, b) + d(b, c)$
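These four properties are easy to check numerically for a concrete metric such as the Euclidean distance used later in these slides. The small script below is my own sanity check, not part of the original material; the random test points are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def d(a, b):
    """Euclidean distance between two k-dimensional points."""
    return np.sqrt(np.sum((a - b) ** 2))

# Draw a few random 5-dimensional data entries and check the metric axioms.
a, b, c = rng.normal(size=(3, 5))

assert d(a, b) >= 0                          # positiveness
assert np.isclose(d(a, b), d(b, a))          # symmetry
assert np.isclose(d(a, a), 0.0)              # identity
assert d(a, c) <= d(a, b) + d(b, c) + 1e-12  # triangle inequality
print("all four metric properties hold for these points")
```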

Distance Measures

Assume pure real-valued data points, e.g.:

12    34.5  78.5  89.2  19.2
23.5  41.4  66.3  78.8  8.9
33.6  36.7  78.3  90.3  21.4
17.2  30.1  71.6  88.5  12.5
…

What distance metric to use?

• Euclidean distance: works for an arbitrary k-dimensional space
  $d(a, b) = \sqrt{\sum_{i=1}^{k} (a_i - b_i)^2}$

• Squared Euclidean distance: works for an arbitrary k-dimensional space
  $d^2(a, b) = \sum_{i=1}^{k} (a_i - b_i)^2$

• Manhattan distance: works for an arbitrary k-dimensional space
  $d_1(a, b) = \sum_{i=1}^{k} |a_i - b_i|$

• Etc.

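To make the three formulas concrete, the snippet below (my own addition) computes them for the first two rows of the example data above, under the assumption that each row of five values is one data point.

```python
import numpy as np

# First two data points from the example matrix (each a 5-dimensional vector).
a = np.array([12.0, 34.5, 78.5, 89.2, 19.2])
b = np.array([23.5, 41.4, 66.3, 78.8, 8.9])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # d(a, b)
squared_euclidean = np.sum((a - b) ** 2)    # d^2(a, b)
manhattan = np.sum(np.abs(a - b))           # d_1(a, b)

print(f"Euclidean:         {euclidean:.2f}")
print(f"Squared Euclidean: {squared_euclidean:.2f}")
print(f"Manhattan:         {manhattan:.2f}")
```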

Clustering Algorithms

Clustering is useful for:
• Similarity/dissimilarity analysis: analyze which data points in the sample are close to each other
• Dimensionality reduction: high-dimensional data replaced with a group (cluster) label
• Data reduction: replaces many data points with the point representing the group mean

Problems:
• Pick the correct similarity measure (problem specific)
• Choose the correct number of groups
  – many clustering algorithms require us to provide the number of groups ahead of time

Clustering algorithms:
• K-means algorithm
  – suitable only when data points have continuous values; groups are defined in terms of cluster centers (also called means); refinement of the method to categorical values: K-medoids
• Probabilistic methods (with EM)
  – latent variable models: the class (cluster) is represented by a latent (hidden) variable value
  – every point goes to the class with the highest posterior
  – examples: mixture of Gaussians, Naïve Bayes with a hidden class
• Hierarchical methods
  – agglomerative
  – divisive

Introduction

Partitioning clustering approach:
• a typical clustering analysis approach: iteratively partition the training data set to learn a partition of the given data space
• learn a partition on a data set to produce several non-empty clusters (usually, the number of clusters is given in advance)
• in principle, the optimal partition is achieved by minimizing the sum of squared distances from each point to the “representative object” of its cluster, e.g., using the squared Euclidean distance
  $d^2(\mathbf{x}_n, \mathbf{m}_k) = \|\mathbf{x}_n - \mathbf{m}_k\|^2$,
  which gives the total cost
  $E = \sum_{k=1}^{K} \sum_{\mathbf{x} \in C_k} d^2(\mathbf{x}, \mathbf{m}_k)$

• Given K, find a partition of K clusters that optimizes the chosen partitioning criterion (cost function)
  – global optimum: exhaustively search all partitions
• The K-means algorithm: a heuristic method
  – K-means algorithm (MacQueen '67): each cluster is represented by the center of the cluster, and the algorithm converges to stable centroids of clusters
  – K-means is the simplest partitioning method for clustering analysis and is widely used in data mining applications

K-means Algorithm

Given the cluster number K, the K-means algorithm is carried out in three steps after initialisation:
• Initialisation: set seed points (randomly)
1) Assign each object to the cluster of the nearest seed point, measured by a specific distance metric
2) Compute new seed points as the centroids of the clusters of the current partition (the centroid is the centre, i.e., mean point, of the cluster)
3) Go back to Step 1); stop when no new assignments are made (i.e., membership in each cluster no longer changes)

K-means Clustering

• Choose a number of clusters k
• Initialize cluster centers μ1, …, μk
  – could pick k data points and set the cluster centers to these points
  – or could randomly assign points to clusters and take the means of the clusters
• For each data point, compute the cluster center it is closest to (using some distance measure) and assign the data point to this cluster
• Re-compute the cluster centers (mean of the data points in each cluster)
• Stop when there are no new re-assignments
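A minimal NumPy sketch of the procedure just described: pick k data points as initial centers, assign every point to its nearest center, recompute centers as cluster means, and stop when the assignments no longer change. This is my own illustration under those assumptions, not code from the course.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Basic K-means: returns (centers, labels) for data X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Initialisation: pick k data points as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Assignment step: each point goes to the nearest center (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):   # no re-assignments: stop
            break
        labels = new_labels
        # Update step: recompute each center as the mean of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Example usage on two well-separated blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), rng.normal(size=(50, 2)) + 5])
centers, labels = kmeans(X, k=2)
print(centers)
```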

Example

Problem: Suppose we have 4 types of medicines, each with two attributes (pH and weight index). Our goal is to group these objects into K = 2 groups of medicine.

Medicine  Weight  pH-Index
A         1       1
B         2       1
C         4       3
D         5       4

[Figure: the four medicines A, B, C, D plotted in the weight / pH-index plane]

Step 1: Use the initial seed points $c_1 = A$, $c_2 = B$ for partitioning, and assign each object to the cluster with the nearest seed point (Euclidean distance). For example,
$d(D, c_1) = \sqrt{(5-1)^2 + (4-1)^2} = 5$
$d(D, c_2) = \sqrt{(5-2)^2 + (4-1)^2} = 4.24$
This gives the clusters {A} and {B, C, D}.

[Figure: the initial partition of A, B, C, D]

Step 2: Compute the new centroids of the current partition. Knowing the members of each cluster ({A} and {B, C, D}), we compute the new centroid of each group based on these new memberships:
$c_1 = (1, 1)$
$c_2 = \left(\tfrac{2+4+5}{3}, \tfrac{1+3+4}{3}\right) = \left(\tfrac{11}{3}, \tfrac{8}{3}\right)$

Then renew the membership based on the new centroids: compute the distance of all objects to the new centroids and assign each object to its nearest centroid.

Step 3: Repeat the first two steps until convergence. Knowing the members of each cluster (now {A, B} and {C, D}), we compute the new centroids:
$c_1 = \left(\tfrac{1+2}{2}, \tfrac{1+1}{2}\right) = (1.5, 1)$
$c_2 = \left(\tfrac{4+5}{2}, \tfrac{3+4}{2}\right) = (4.5, 3.5)$

Compute the distance of all objects to the new centroids. Stop: there are no new assignments, i.e., the membership in each cluster no longer changes.
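The three iterations above can be reproduced in a few lines of NumPy. This is my own sketch of the worked example (same data, same seeds c1 = A and c2 = B, Euclidean distance), not code provided with the slides.

```python
import numpy as np

# Medicines A, B, C, D as (weight, pH-index) points.
X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)
names = ["A", "B", "C", "D"]
centers = X[[0, 1]].copy()          # initial seeds: c1 = A, c2 = B

for step in range(1, 10):
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)   # nearest-seed assignment (Euclidean)
    new_centers = np.array([X[labels == j].mean(axis=0) for j in range(2)])
    print(f"step {step}: clusters "
          f"{[n for n, l in zip(names, labels) if l == 0]} / "
          f"{[n for n, l in zip(names, labels) if l == 1]}, "
          f"centroids {new_centers.round(2).tolist()}")
    if np.allclose(new_centers, centers):
        break                       # membership and centroids no longer change
    centers = new_centers
```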

Exercise

For the medicine data set, use K-means with the Manhattan distance metric for clustering analysis by setting K = 2 and initialising the seeds as c1 = A and c2 = C. Answer the following three questions:
1. How many steps are required for convergence?
2. What are the memberships of the two clusters after convergence?
3. What are the centroids of the two clusters after convergence?

Medicine  Weight  pH-Index
A         1       1
B         2       1
C         4       3
D         5       4

Euclidean k-means Clustering

Input: A set of n datapoints $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ in $\mathbb{R}^d$; target number of clusters k.

Output: k representatives $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_k \in \mathbb{R}^d$.

Objective: choose $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_k \in \mathbb{R}^d$ to minimize
$\sum_{i=1}^{n} \min_{j \in \{1, \ldots, k\}} \|\mathbf{x}_i - \mathbf{c}_j\|^2$


Natural assignment: each point is assigned to its closest center, which leads to a Voronoi partition.
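Written as code, the objective simply sums, over all points, the squared distance to the nearest representative. The helper below is an illustration I added; the example call reuses the medicine data and the converged centers from the earlier worked example.

```python
import numpy as np

def kmeans_cost(X, centers):
    """Sum over points of the squared Euclidean distance to the nearest center."""
    # dists[i, j] = ||x_i - c_j||
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.sum(dists.min(axis=1) ** 2)

# Example: the four medicine points with the converged centers from the example.
X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)
centers = np.array([[1.5, 1.0], [4.5, 3.5]])
print(kmeans_cost(X, centers))   # 0.25 + 0.25 + 0.5 + 0.5 = 1.5
```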


Computational complexity: NP-hard, even for k = 2 [Dasgupta '08] or d = 2 [Mahajan-Nimbhorkar-Varadarajan '09].

There are a couple of easy cases…

An Easy Case for k-means: k = 1

Input: A set of n datapoints $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ in $\mathbb{R}^d$.

Output: $\mathbf{c} \in \mathbb{R}^d$ to minimize $\sum_{i=1}^{n} \|\mathbf{x}_i - \mathbf{c}\|^2$.

Solution: the optimal choice is $\mathbf{c} = \boldsymbol{\mu} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i$.

Idea: a bias/variance-like decomposition,
$\frac{1}{n}\sum_{i=1}^{n} \|\mathbf{x}_i - \mathbf{c}\|^2 = \|\boldsymbol{\mu} - \mathbf{c}\|^2 + \frac{1}{n}\sum_{i=1}^{n} \|\mathbf{x}_i - \boldsymbol{\mu}\|^2$
(the average k-means cost w.r.t. $\mathbf{c}$ equals the squared distance of $\mathbf{c}$ from $\boldsymbol{\mu}$ plus the average k-means cost w.r.t. $\boldsymbol{\mu}$), so the optimal choice for $\mathbf{c}$ is $\boldsymbol{\mu}$.
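The decomposition is easy to verify numerically. The check below is my own addition; the data and the candidate center c are arbitrary random choices.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))        # n = 200 points in R^3
c = rng.normal(size=3)               # an arbitrary candidate center
mu = X.mean(axis=0)                  # the optimal choice for k = 1

lhs = np.mean(np.sum((X - c) ** 2, axis=1))                    # avg cost w.r.t. c
rhs = np.sum((mu - c) ** 2) + np.mean(np.sum((X - mu) ** 2, axis=1))
print(np.isclose(lhs, rhs))          # True: the bias/variance-like identity holds
```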

Idea: bias/variance like decomposition


k-means Clustering Issues

• Computational complexity: O(tKn), where n is the number of objects, K is the number of clusters, and t is the number of iterations; normally K, t << n.
• Local optimum:
  – sensitive to the initial seed points
  – converges to a local optimum, which may be an unwanted solution
• Other problems:
  – need to specify K, the number of clusters, in advance
  – unable to handle noisy data and outliers (K-medoids algorithm)
  – not suitable for discovering clusters with non-convex shapes
  – applicable only when the mean is defined; what about categorical data? (K-modes algorithm)
  – how to evaluate the K-means performance?

Hierarchical Clustering

[Figure: a topic hierarchy; "All topics" splits into "sports" and "fashion", with "soccer" and "tennis" under sports and "Gucci" and "Lacoste" under fashion]

• A hierarchy might be more natural.
• Different users might care about different levels of granularity, or even different prunings.

Top-down (divisive):
• Partition the data into 2 groups (e.g., with 2-means).
• Recursively cluster each group.

Bottom-Up (agglomerative):
• Start with every point in its own cluster.
• Repeatedly merge the “closest” two clusters.
• Different definitions of “closest” give different algorithms.

Bottom-Up (agglomerative)

Have a distance measure on pairs of objects: d(x, y) = distance between x and y (e.g., # keywords in common, edit distance, etc.).

• Single linkage: $\operatorname{dist}(A, B) = \min_{x \in A,\, x' \in B} \operatorname{dist}(x, x')$
• Average linkage: $\operatorname{dist}(A, B) = \operatorname{avg}_{x \in A,\, x' \in B} \operatorname{dist}(x, x')$
• Complete linkage: $\operatorname{dist}(A, B) = \max_{x \in A,\, x' \in B} \operatorname{dist}(x, x')$
• Ward's method
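A naive sketch of the bottom-up procedure with a pluggable linkage function (min gives single linkage, max complete linkage, an averaging function average linkage). This is an illustration I wrote for these definitions, not a reference implementation from the course; the 1-D toy positions are the ones used in the single-linkage example on the next slides.

```python
def agglomerative(points, num_clusters, linkage=min):
    """Repeatedly merge the two "closest" clusters until num_clusters remain.

    `linkage` aggregates all pairwise distances between two clusters:
    min -> single linkage, max -> complete linkage,
    (lambda ds: sum(ds) / len(ds)) -> average linkage.
    """
    clusters = [[i] for i in range(len(points))]   # start: every point alone

    def d(i, j):                                   # base distance on pairs
        return abs(points[i] - points[j])

    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = linkage([d(i, j) for i in clusters[a] for j in clusters[b]])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])            # merge the closest pair
        del clusters[b]
    return clusters

pts = [-3.0, -2.0, 0.0, 2.1, 3.2, 6.0]             # 1-D toy points
print(agglomerative(pts, 2, linkage=min))          # single linkage: index groups
print(agglomerative(pts, 2, linkage=max))          # complete linkage: index groups
```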

Single Linkage

Bottom-up (agglomerative):
• Start with every point in its own cluster.
• Repeatedly merge the “closest” two clusters, where
  single linkage: $\operatorname{dist}(A, B) = \min_{x \in A,\, x' \in B} \operatorname{dist}(x, x')$

[Figure: six points A-F on a number line (positions -3, -2, 0, 2.1, 3.2, 6) and the dendrogram built by the single-linkage merges]

One way to think of it: at any moment, we see the connected components of the graph in which any two points at distance < r are connected. Watch as r grows; only n-1 values of r are relevant, namely those at which two points in different clusters become connected and the clusters merge.

Complete Linkage

Bottom-up (agglomerative):
• Start with every point in its own cluster.
• Repeatedly merge the “closest” two clusters, where
  complete linkage: $\operatorname{dist}(A, B) = \max_{x \in A,\, x' \in B} \operatorname{dist}(x, x')$

One way to think of it: keep the maximum cluster diameter as small as possible at any level.

[Figure: the same six points A-F and the dendrogram built by the complete-linkage merges]

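If SciPy is available (my assumption; the slides do not name a library), the single- and complete-linkage merges on these six points can be reproduced directly with scipy.cluster.hierarchy; the coordinates are the positions read off the slide's number line.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six 1-D points as on the slide's number line.
X = np.array([[-3.0], [-2.0], [0.0], [2.1], [3.2], [6.0]])

single = linkage(X, method="single")       # merge order under single linkage
complete = linkage(X, method="complete")   # merge order under complete linkage

# Cut each tree into two clusters and compare the resulting partitions.
print(fcluster(single, t=2, criterion="maxclust"))
print(fcluster(complete, t=2, criterion="maxclust"))
# scipy.cluster.hierarchy.dendrogram(single) would draw the single-linkage tree.
```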

Other Clustering Algorithms

Hierarchical clustering:
• Advantage:
  – smaller computational cost; avoids scanning all possible clusterings
• Disadvantage:
  – the greedy choice fixes the order in which clusters are merged and cannot be repaired
• Partial solution:
  – combine hierarchical clustering with iterative algorithms like k-means

Other clustering methods:
• Spectral clustering
  – uses the similarity matrix and its spectral decomposition (eigenvalues and eigenvectors)
• Multidimensional scaling
  – techniques often used in data visualization for exploring similarities or dissimilarities in data
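As a quick illustration of the spectral-clustering bullet (scikit-learn is my assumption here; the slides only name the method), the snippet below clusters two concentric rings, a non-convex shape where plain k-means struggles while a similarity-graph-based method typically recovers the rings.

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Two concentric rings: non-convex clusters that a straight k-means cut splits badly.
X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# Spectral clustering builds a similarity graph and clusters its eigenvectors.
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit_predict(X)

print("k-means  ARI:", adjusted_rand_score(y, km))   # typically near 0
print("spectral ARI:", adjusted_rand_score(y, sc))   # typically near 1
```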
