
Page 1

Announcements

• Homework 3 due tonight!

• HW 4 will be posted tonight. Start early.

Page 2

Clustering

Machine Learning – CSE546 Kevin Jamieson University of Washington

November 21, 2016

Page 3

Clustering images

[Goldberger et al.]

Set of Images

Page 4

Clustering web search results

Page 5

Hierarchical Clustering

Pick one:
- Bottom up: start with every point as its own cluster and merge
- Top down: start with a single cluster containing all points and split

Different rules for splitting/merging; there is no “right answer”.

Gives an apparently interpretable tree representation. However, warning: even random data with no structure will produce a tree that “appears” to be structured.
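As a concrete illustration of the bottom-up (agglomerative) option, here is a minimal sketch using SciPy; the toy data, the "average" linkage rule, and the cut into two flat clusters are illustrative assumptions, not choices made on the slide.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two loose blobs in 2D (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(20, 2)),
               rng.normal(5, 1, size=(20, 2))])

# Bottom up: start with every point as its own cluster and repeatedly
# merge the two closest clusters ("average" = mean pairwise distance).
Z = linkage(X, method="average")

# Cut the resulting tree to obtain flat cluster labels.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```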

Page 6

Some Data

Page 7

K-means

1. Ask user how many clusters they’d like. (e.g. k=5)

Page 8

K-means

1. Ask user how many clusters they’d like. (e.g. k=5)

2. Randomly guess k cluster Center locations

Page 9

K-means

1. Ask user how many clusters they’d like. (e.g. k=5)

2. Randomly guess k cluster Center locations

3. Each datapoint finds out which Center it’s closest to. (Thus each Center “owns” a set of datapoints)

Page 10

K-means

1. Ask user how many clusters they’d like. (e.g. k=5)

2. Randomly guess k cluster Center locations

3. Each datapoint finds out which Center it’s closest to.

4. Each Center finds the centroid of the points it owns

Page 11

K-means

1. Ask user how many clusters they’d like. (e.g. k=5)

2. Randomly guess k cluster Center locations

3. Each datapoint finds out which Center it’s closest to.

4. Each Center finds the centroid of the points it owns…

5. …and jumps there

6. …Repeat until terminated!

Page 12

K-means

■ Randomly initialize $k$ centers: $\mu^{(0)} = \mu_1^{(0)}, \dots, \mu_k^{(0)}$

■ Classify: Assign each point $j \in \{1, \dots, N\}$ to the nearest center:

■ Recenter: $\mu_i$ becomes the centroid of its points:

Equivalent to $\mu_i \leftarrow$ average of its points!
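The Classify/Recenter formulas on this slide are figures that did not survive extraction; the following is a minimal NumPy sketch of the loop the bullets describe (Lloyd's algorithm). The function name, initialization scheme, and stopping rule are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal Lloyd's algorithm: X is (N, d); returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    # Randomly initialize k centers (here: k distinct data points).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Classify: assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        # Recenter: each center jumps to the mean of the points it owns.
        new_centers = np.array([X[assign == i].mean(axis=0) if np.any(assign == i)
                                else centers[i] for i in range(k)])
        if np.allclose(new_centers, centers):   # terminated: centers stopped moving
            break
        centers = new_centers
    return centers, assign
```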

Page 13

What is K-means optimizing?

■ Potential function $F(\mu, C)$ of centers $\mu$ and point allocations $C$:

■ Optimal K-means: $\min_{\mu} \min_{C} F(\mu, C)$
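The definition of $F$ is a figure on the original slide; the standard k-means potential consistent with the Classify/Recenter steps above would be

$$ F(\mu, C) \;=\; \sum_{j=1}^{N} \big\lVert \mu_{C(j)} - x_j \big\rVert_2^{2}. $$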

Page 14

Does K-means converge??? Part 1

■ Optimize potential function:

■ Fix $\mu$, optimize $C$

Page 15

Does K-means converge??? Part 2

■ Optimize potential function:

■ Fix $C$, optimize $\mu$
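The two parts above are the steps of coordinate descent on $F$; the per-step solutions are figures on the original slides, but under the standard potential sketched earlier they would be

$$ \text{Fix } \mu:\;\; C(j) \leftarrow \arg\min_{i} \lVert \mu_i - x_j \rVert_2^{2}, \qquad \text{Fix } C:\;\; \mu_i \leftarrow \frac{1}{|\{j : C(j) = i\}|} \sum_{j:\,C(j)=i} x_j. $$

Each step can only decrease $F$, $F \ge 0$, and there are finitely many allocations $C$, so the iteration terminates; the limit is a local (not necessarily global) minimum.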

Page 16

Vector Quantization, Fisher Vectors

Vector Quantization (for compression):
1. Represent image as grid of patches
2. Run k-means on the patches to build code book
3. Represent each patch as a code word


Page 18

Vector Quantization, Fisher Vectors

Vector Quantization (for compression):
1. Represent image as grid of patches
2. Run k-means on the patches to build code book
3. Represent each patch as a code word

Similar reduced representation can be used as a feature vector.

Typical output of k-means on patches.

Coates, Ng, Learning Feature Representations with K-means, 2012
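A minimal sketch of the three-step pipeline above, using scikit-learn's KMeans for the code book; the image size, patch size, and number of code words are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_patches(img, p=8):
    """Cut a grayscale image (H, W) into non-overlapping p x p patches, flattened."""
    H, W = img.shape
    img = img[:H - H % p, :W - W % p]                       # drop any ragged border
    return img.reshape(H // p, p, W // p, p).swapaxes(1, 2).reshape(-1, p * p)

rng = np.random.default_rng(0)
img = rng.random((64, 64))                                  # stand-in for a real image

patches = extract_patches(img, p=8)                         # 1. grid of patches
codebook = KMeans(n_clusters=16, n_init=10).fit(patches)    # 2. k-means code book
codes = codebook.predict(patches)                           # 3. each patch -> code word

# Compressed representation: one small integer per patch plus the code book;
# reconstruct by replacing each patch with its nearest code word.
reconstruction = codebook.cluster_centers_[codes]
```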

Page 19

Spectral Clustering

Adjacency matrix $W$: $W_{i,j}$ = weight of edge $(i, j)$

Given feature vectors, one could construct:
- a k-nearest neighbor graph with weights in $\{0, 1\}$
- a weighted graph with arbitrary similarities, e.g. $W_{i,j} = e^{-\gamma \lVert x_i - x_j \rVert^2}$

Degree matrix: $D_{i,i} = \sum_{j=1}^{n} W_{i,j}$    Graph Laplacian: $L = D - W$

Let $f \in \mathbb{R}^n$ be a function over the nodes.

Page 20

Spectral Clustering

Adjacency matrix $W$: $W_{i,j}$ = weight of edge $(i, j)$

Given feature vectors, one could construct a (k=10)-nearest neighbor graph with weights in $\{0, 1\}$.

Degree matrix: $D_{i,i} = \sum_{j=1}^{n} W_{i,j}$    Graph Laplacian: $L = D - W$

It is popular to use the Laplacian $L$, or its normalized form $\tilde{L} = I - D^{-1}W$, as a regularizer for learning over graphs.
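These slides stop at the Laplacian; a common way to complete the spectral clustering pipeline (an assumption here, not spelled out in the extracted text) is to embed the nodes with the bottom eigenvectors of $L$ and run k-means on that embedding. A minimal NumPy/SciPy sketch:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, k, gamma=1.0):
    """Cluster rows of X into k groups via the unnormalized graph Laplacian."""
    # Fully connected similarity graph: W_ij = exp(-gamma * ||x_i - x_j||^2).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-gamma * sq_dists)
    np.fill_diagonal(W, 0.0)

    D = np.diag(W.sum(axis=1))      # degree matrix D_ii = sum_j W_ij
    L = D - W                       # unnormalized Laplacian L = D - W

    # Bottom k eigenvectors of L give a spectral embedding of the nodes.
    _, vecs = eigh(L)
    embedding = vecs[:, :k]

    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```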

Page 21

Mixtures of Gaussians

Machine Learning – CSE546 Kevin Jamieson University of Washington

November 21, 2016

Page 22

(One) bad case for k-means

■ Clusters may overlap
■ Some clusters may be “wider” than others


Page 24

Mixture models

$\ell(\theta; \mathbf{Z}) = \sum_{i=1}^{n} \log\!\big[(1 - \pi)\,\phi_{\theta_1}(y_i) + \pi\,\phi_{\theta_2}(y_i)\big]$

$\mathbf{Z} = \{y_i\}_{i=1}^{n}$ is observed data

If $\phi_\theta(x)$ is the Gaussian density with parameters $\theta = (\mu, \sigma^2)$, then:
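The expression following "then:" is a figure on the original slide; given the log-likelihood above, the implied two-component model for an observation $y$ is the mixture density

$$ g(y) = (1 - \pi)\,\phi_{\theta_1}(y) + \pi\,\phi_{\theta_2}(y), \qquad \phi_{\theta}(y) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\Big(-\frac{(y-\mu)^{2}}{2\sigma^{2}}\Big). $$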

Page 25

Mixture models

$\ell(\theta; y_i, \Delta_i = 0) = $

$\ell(\theta; y_i, \Delta_i = 1) = $

If $\phi_\theta(x)$ is the Gaussian density with parameters $\theta = (\mu, \sigma^2)$, then:

$\mathbf{Z} = \{y_i\}_{i=1}^{n}$ is observed data
$\Delta = \{\Delta_i\}_{i=1}^{n}$ is unobserved data

Page 26

Mixture models

$\ell(\theta; \mathbf{Z}, \Delta) = \sum_{i=1}^{n} \Big[(1 - \Delta_i)\log\!\big[(1 - \pi)\,\phi_{\theta_1}(y_i)\big] + \Delta_i \log\!\big[\pi\,\phi_{\theta_2}(y_i)\big]\Big]$

$\mathbf{Z} = \{y_i\}_{i=1}^{n}$ is observed data
$\Delta = \{\Delta_i\}_{i=1}^{n}$ is unobserved data

If $\phi_\theta(x)$ is the Gaussian density with parameters $\theta = (\mu, \sigma^2)$, then:

If we knew $\Delta$, how would we choose $\theta$?

Page 27

Mixture models

(Same model, data, and complete-data log-likelihood $\ell(\theta; \mathbf{Z}, \Delta)$ as on the previous slide.)

If we knew $\theta$, how would we choose $\Delta$?

Page 28

Mixture models

(Same model, data, and complete-data log-likelihood $\ell(\theta; \mathbf{Z}, \Delta)$ as above.)

$\gamma_i(\theta) = E[\Delta_i \mid \theta, \mathbf{Z}] = $
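The right-hand side is a figure on the original slide; for the two-component mixture above, this expectation works out to the usual responsibility

$$ \gamma_i(\theta) = E[\Delta_i \mid \theta, \mathbf{Z}] = \frac{\pi\,\phi_{\theta_2}(y_i)}{(1 - \pi)\,\phi_{\theta_1}(y_i) + \pi\,\phi_{\theta_2}(y_i)}. $$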

Page 29

Mixture models

Page 30

Gaussian Mixture Example: Start

Page 31

After first iteration

Page 32

After 2nd iteration

Page 33

After 3rd iteration

Page 34

After 4th iteration

Page 35

After 5th iteration

Page 36

After 6th iteration

Page 37

After 20th iteration

Page 38

Some Bio Assay data

Page 39

GMM clustering of the assay data

Page 40

Resulting Density Estimator

Page 41

Expectation Maximization Algorithm

The iterative Gaussian mixture model (GMM) fitting algorithm is a special case of EM:

$\mathbf{Z}$ is observed data
$\Delta$ is unobserved data
$\mathbf{T} = (\mathbf{Z}, \Delta)$
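To make the special case concrete, here is a minimal NumPy sketch of EM for the two-component univariate mixture used on the preceding slides, alternating the responsibility (E) step and the parameter (M) step; the initialization and iteration count are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def em_two_component_gmm(y, n_iters=50):
    """EM for y_i ~ (1 - pi) * N(mu1, s1^2) + pi * N(mu2, s2^2)."""
    # Crude initialization (an assumption): split on low/high quartiles.
    mu1, mu2 = np.percentile(y, [25, 75])
    s1 = s2 = y.std()
    pi = 0.5
    for _ in range(n_iters):
        # E step: responsibilities gamma_i = E[Delta_i | theta, Z].
        p1 = (1 - pi) * norm.pdf(y, mu1, s1)
        p2 = pi * norm.pdf(y, mu2, s2)
        gamma = p2 / (p1 + p2)
        # M step: responsibility-weighted maximum likelihood estimates.
        mu1 = np.sum((1 - gamma) * y) / np.sum(1 - gamma)
        mu2 = np.sum(gamma * y) / np.sum(gamma)
        s1 = np.sqrt(np.sum((1 - gamma) * (y - mu1) ** 2) / np.sum(1 - gamma))
        s2 = np.sqrt(np.sum(gamma * (y - mu2) ** 2) / np.sum(gamma))
        pi = gamma.mean()
    return (mu1, s1), (mu2, s2), pi
```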

Page 42

Missing data example

$x_i \sim \mathcal{N}(\mu, \Sigma)$, but suppose some entries of $x_i$ are missing.

$\ell(\theta; \mathbf{T}) = -\tfrac{1}{2}\big[\log(2\pi|\Sigma|) + (x_i - \mu)^{T}\Sigma^{-1}(x_i - \mu)\big]$

$\mathbf{Z}$ is observed data
$\Delta$ is unobserved data
$\mathbf{T} = (\mathbf{Z}, \Delta)$

E Step: $E[\ell(\theta'; \mathbf{T}) \mid \mathbf{Z}, \hat{\theta}^{(j)}]$

Natural choice for $\hat{\theta}^{(0)}$?

Page 43

Missing data example

(Same setup, E step, and initialization question as on the previous slide.)

M Step: $\hat{\theta}^{(j+1)} = \arg\max_{\theta'} E[\ell(\theta'; \mathbf{T}) \mid \mathbf{Z}, \hat{\theta}^{(j)}]$

$E[Y \mid X = x] = \mu_Y + \Sigma_{YX}\Sigma_{XX}^{-1}(x - \mu_X)$

Page 44

Missing data example

(Same setup, E step, and M step as on the previous slides.)

$E[Y \mid X = x] = \mu_Y + \Sigma_{YX}\Sigma_{XX}^{-1}(x - \mu_X)$

Connection to matrix factorization?
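A minimal sketch of how the conditional-expectation formula above can be used in an E step to fill in the missing entries of each $x_i$ given its observed entries and the current estimates $(\hat\mu, \hat\Sigma)$; the helper names are illustrative, and this simplified M step omits the conditional-covariance correction a full EM update would include.

```python
import numpy as np

def impute_missing(x, mu, Sigma):
    """Fill NaN entries of x with E[x_miss | x_obs] under N(mu, Sigma)."""
    miss = np.isnan(x)
    obs = ~miss
    if not miss.any():
        return x.copy()
    # Partition Sigma into missing (Y) and observed (X) blocks.
    S_yx = Sigma[np.ix_(miss, obs)]
    S_xx = Sigma[np.ix_(obs, obs)]
    cond_mean = mu[miss] + S_yx @ np.linalg.solve(S_xx, x[obs] - mu[obs])
    x_filled = x.copy()
    x_filled[miss] = cond_mean            # E[Y | X = x_obs]
    return x_filled

def em_step(X, mu, Sigma):
    """One simplified EM pass: impute with current estimates, then refit mu, Sigma.
    (A natural choice for the initial estimates is the mean/covariance computed
    from the observed entries alone; the covariance update here ignores the
    conditional-covariance term a full M step would add.)"""
    X_filled = np.array([impute_missing(x, mu, Sigma) for x in X])
    return X_filled.mean(axis=0), np.cov(X_filled, rowvar=False)
```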

Page 45

Density Estimation

Machine Learning – CSE546 Kevin Jamieson University of Washington

November 21, 2016

Page 46

Kernel Density Estimation

A very “lazy” GMM
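The estimator itself appears only as figures on these slides; the "very lazy GMM" remark suggests the usual kernel density estimate, which places one fixed-bandwidth Gaussian bump with equal weight at every training point. A minimal NumPy sketch under that reading (bandwidth and data are illustrative):

```python
import numpy as np

def kde_gaussian(x_query, X_train, bandwidth=0.5):
    """Kernel density estimate p_hat(x) = (1/n) * sum_i N(x; x_i, bandwidth^2)."""
    # One Gaussian "bump" per training point, all with the same bandwidth.
    z = (x_query[:, None] - X_train[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

# Example: density estimate on a grid from 1-D samples.
X_train = np.concatenate([np.random.normal(0, 1, 100), np.random.normal(5, 1, 100)])
grid = np.linspace(-4, 9, 200)
p_hat = kde_gaussian(grid, X_train)
```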

Page 47

Kernel Density Estimation

Page 48

Kernel Density Estimation

Predict $\arg\max_{m} \hat{r}_{im}$

What is the Bayes optimal classification rule?
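Reading $\hat{r}_{im}$ as a plug-in estimate of $P(\text{class } m \mid x_i)$ built from one kernel density estimate per class (an assumption; the slide's formulas are figures), the prediction rule can be sketched as follows, mimicking the Bayes optimal rule with estimated class priors and densities.

```python
import numpy as np

def kde_class_scores(x_query, X_by_class, bandwidth=0.5):
    """r_hat[i, m] proportional to pi_m * p_hat_m(x_i), p_hat_m a Gaussian KDE on class m."""
    n_total = sum(len(Xm) for Xm in X_by_class)
    scores = []
    for Xm in X_by_class:
        z = (x_query[:, None] - Xm[None, :]) / bandwidth
        p_m = (np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))).mean(axis=1)
        prior_m = len(Xm) / n_total               # estimated class prior pi_m
        scores.append(prior_m * p_m)
    r_hat = np.stack(scores, axis=1)
    return r_hat / r_hat.sum(axis=1, keepdims=True)   # normalize to class posteriors

# Predict argmax_m r_hat[i, m].
X_by_class = [np.random.normal(0, 1, 100), np.random.normal(5, 1, 100)]
x = np.linspace(-4, 9, 50)
pred = kde_class_scores(x, X_by_class).argmax(axis=1)
```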

Page 49

Generative vs Discriminative