56
Dirichlet Processes Machine Learning: Jordan Boyd-Graber University of Maryland INTRODUCTION Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 1/1

users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

  • Upload
    others

  • View
    14

  • Download
    0

Embed Size (px)

Citation preview

Page 1: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Dirichlet Processes

Machine Learning: Jordan Boyd-GraberUniversity of MarylandINTRODUCTION

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 1 / 1

Page 2: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Content Questions

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 2 / 1

Page 3: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Administrivia

� Feeback on projects

� Work on first deliverable

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 3 / 1

Page 4: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

DPMM

� Don’t know how many clusters there are

� Gibbs sampling: change the assignment of one cluster conditioned onall other clusters

� Convergence harder to detect

� Equation

p(zi = k |~z−i ,~x ,{θk},α)∝

¨�

nkn·+α

N�

x , nx̄n+1 ,1

existingα

n·+αN (x ,0,1) new

(1)

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 4 / 1

Page 5: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Simplification

We’ll assume that:

p(x | x̄)∝ exp

¨

x1−n

n + 1x̄1

�2

+

x2−n

n + 1x̄2

�2«

(2)

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 5 / 1

Page 6: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Data

10 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 111110

9876543210123456789

1011

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 6 / 1

Page 7: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 0

Compute the (proportional) probability of assigning data 0 to a new clusterand cluster 1.Recall that α= 0.25 and

p(x | x̄)∝ exp

¨

x1−n

n + 1x̄1

�2

+

x2−n

n + 1x̄2

�2«

(3)

i x1 x2 zi

0 10 101 8 9 12 7 6 23 -9 -10 34 -5 -10 45 -7 -7 56 1 1 6

� There are currently 6 clusters(4)

� After normalization:{new: 0.00 1: 0.80 2: 0.19 3: 0.00 4: 0.00 5: 0.00 6:0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 7 / 1

Page 8: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 0

� There are currently 6 clusters

(3)

� After normalization:{new: 0.00 1: 0.80 2: 0.19 3: 0.00 4: 0.00 5: 0.00 6:0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 7 / 1

Page 9: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 0

� There are currently 6 clusters

p(z0 = new | ~z−0 ,~x ,α)∝0.25

6 + 0.25N

10.0010.00

| 0.000.00

,1�

= 0.04×0.00000 (3)

(4)

� After normalization:{new: 0.00 1: 0.80 2: 0.19 3: 0.00 4: 0.00 5: 0.00 6:0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 7 / 1

Page 10: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 0

� There are currently 6 clusters

p(z0 = new | ~z−0 ,~x ,α)∝0.25

6 + 0.25N

10.0010.00

| 0.000.00

,1�

= 0.04×0.00000 (3)

p(z0 = 1 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| 4.004.50

,1�

= 0.16×0.00029 (4)

(5)

� After normalization:{new: 0.00 1: 0.80 2: 0.19 3: 0.00 4: 0.00 5: 0.00 6:0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 7 / 1

Page 11: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 0� There are currently 6 clusters

p(z0 = new | ~z−0 ,~x ,α)∝0.25

6 + 0.25N

10.0010.00

| 0.000.00

,1�

= 0.04×0.00000 (3)

p(z0 = 1 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| 4.004.50

,1�

= 0.16×0.00029 (4)

p(z0 = 2 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| 3.503.00

,1�

= 0.16×0.00007 (5)

p(z0 = 3 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| −4.50−5.00

,1�

= 0.16×0.00000 (6)

p(z0 = 4 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| −2.50−5.00

,1�

= 0.16×0.00000 (7)

p(z0 = 5 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| −3.50−3.50

,1�

= 0.16×0.00000 (8)

p(z0 = 6 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| 0.500.50

,1�

= 0.16×0.00000 (9)

� After normalization:{new: 0.00 1: 0.80 2: 0.19 3: 0.00 4: 0.00 5: 0.00 6:0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 7 / 1

Page 12: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 0� There are currently 6 clusters

p(z0 = new | ~z−0 ,~x ,α)∝0.25

6 + 0.25N

10.0010.00

| 0.000.00

,1�

= 0.04×0.00000 (3)

p(z0 = 1 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| 4.004.50

,1�

= 0.16×0.00029 (4)

p(z0 = 2 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| 3.503.00

,1�

= 0.16×0.00007 (5)

p(z0 = 3 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| −4.50−5.00

,1�

= 0.16×0.00000 (6)

p(z0 = 4 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| −2.50−5.00

,1�

= 0.16×0.00000 (7)

p(z0 = 5 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| −3.50−3.50

,1�

= 0.16×0.00000 (8)

p(z0 = 6 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| 0.500.50

,1�

= 0.16×0.00000 (9)

� After normalization:{new: 0.00 1: 0.80 2: 0.19 3: 0.00 4: 0.00 5: 0.00 6:0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 7 / 1

Page 13: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 0� There are currently 6 clusters

p(z0 = new | ~z−0 ,~x ,α)∝0.25

6 + 0.25N

10.0010.00

| 0.000.00

,1�

= 0.04×0.00000 (3)

p(z0 = 1 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| 4.004.50

,1�

= 0.16×0.00029 (4)

p(z0 = 2 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| 3.503.00

,1�

= 0.16×0.00007 (5)

p(z0 = 3 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| −4.50−5.00

,1�

= 0.16×0.00000 (6)

p(z0 = 4 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| −2.50−5.00

,1�

= 0.16×0.00000 (7)

p(z0 = 5 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| −3.50−3.50

,1�

= 0.16×0.00000 (8)

p(z0 = 6 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| 0.500.50

,1�

= 0.16×0.00000 (9)

� After normalization:{new: 0.00 1: 0.80 2: 0.19 3: 0.00 4: 0.00 5: 0.00 6:0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 7 / 1

Page 14: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 0� There are currently 6 clusters

p(z0 = new | ~z−0 ,~x ,α)∝0.25

6 + 0.25N

10.0010.00

| 0.000.00

,1�

= 0.04×0.00000 (3)

p(z0 = 1 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| 4.004.50

,1�

= 0.16×0.00029 (4)

p(z0 = 2 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| 3.503.00

,1�

= 0.16×0.00007 (5)

p(z0 = 3 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| −4.50−5.00

,1�

= 0.16×0.00000 (6)

p(z0 = 4 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| −2.50−5.00

,1�

= 0.16×0.00000 (7)

p(z0 = 5 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| −3.50−3.50

,1�

= 0.16×0.00000 (8)

p(z0 = 6 | ~z−0 ,~x ,α)∝1.00

6 + 0.25N

10.0010.00

| 0.500.50

,1�

= 0.16×0.00000 (9)

� After normalization:{new: 0.00 1: 0.80 2: 0.19 3: 0.00 4: 0.00 5: 0.00 6:0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 7 / 1

Page 15: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Assignments after sampling point 0

10 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 111110

9876543210123456789

1011

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 8 / 1

Page 16: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 1

Compute the (proportional) probability of assigning data 1 to clusters 1 and2.Recall that α= 0.25 and

p(x | x̄)∝ exp

¨

x1−n

n + 1x̄1

�2

+

x2−n

n + 1x̄2

�2«

(10)

i x1 x2 zi

0 10 10 11 8 92 7 6 23 -9 -10 34 -5 -10 45 -7 -7 56 1 1 6

� There are currently 6 clusters(11)

� After normalization:{new: 0.00 1: 0.92 2: 0.08 3: 0.00 4: 0.00 5: 0.00 6:0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 9 / 1

Page 17: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 1

� There are currently 6 clusters

p(z1 = 1 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| 5.005.00

,1�

= 0.16×0.00674 (10)

(11)

� After normalization:{new: 0.00 1: 0.92 2: 0.08 3: 0.00 4: 0.00 5: 0.00 6:0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 9 / 1

Page 18: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 1

� There are currently 6 clusters

p(z1 = 1 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| 5.005.00

,1�

= 0.16×0.00674 (10)

p(z1 = 2 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| 3.503.00

,1�

= 0.16×0.00055 (11)

(12)

� After normalization:{new: 0.00 1: 0.92 2: 0.08 3: 0.00 4: 0.00 5: 0.00 6:0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 9 / 1

Page 19: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 1� There are currently 6 clusters

p(z1 = new | ~z−1 ,~x ,α)∝0.25

6 + 0.25N

8.009.00

| 0.000.00

,1�

= 0.04×0.00001 (10)

p(z1 = 1 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| 5.005.00

,1�

= 0.16×0.00674 (11)

p(z1 = 2 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| 3.503.00

,1�

= 0.16×0.00055 (12)

p(z1 = 3 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| −4.50−5.00

,1�

= 0.16×0.00000 (13)

p(z1 = 4 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| −2.50−5.00

,1�

= 0.16×0.00000 (14)

p(z1 = 5 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| −3.50−3.50

,1�

= 0.16×0.00000 (15)

p(z1 = 6 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| 0.500.50

,1�

= 0.16×0.00001 (16)

� After normalization:{new: 0.00 1: 0.92 2: 0.08 3: 0.00 4: 0.00 5: 0.00 6:0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 9 / 1

Page 20: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 1� There are currently 6 clusters

p(z1 = new | ~z−1 ,~x ,α)∝0.25

6 + 0.25N

8.009.00

| 0.000.00

,1�

= 0.04×0.00001 (10)

p(z1 = 1 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| 5.005.00

,1�

= 0.16×0.00674 (11)

p(z1 = 2 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| 3.503.00

,1�

= 0.16×0.00055 (12)

p(z1 = 3 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| −4.50−5.00

,1�

= 0.16×0.00000 (13)

p(z1 = 4 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| −2.50−5.00

,1�

= 0.16×0.00000 (14)

p(z1 = 5 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| −3.50−3.50

,1�

= 0.16×0.00000 (15)

p(z1 = 6 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| 0.500.50

,1�

= 0.16×0.00001 (16)

� After normalization:{new: 0.00 1: 0.92 2: 0.08 3: 0.00 4: 0.00 5: 0.00 6:0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 9 / 1

Page 21: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 1� There are currently 6 clusters

p(z1 = new | ~z−1 ,~x ,α)∝0.25

6 + 0.25N

8.009.00

| 0.000.00

,1�

= 0.04×0.00001 (10)

p(z1 = 1 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| 5.005.00

,1�

= 0.16×0.00674 (11)

p(z1 = 2 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| 3.503.00

,1�

= 0.16×0.00055 (12)

p(z1 = 3 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| −4.50−5.00

,1�

= 0.16×0.00000 (13)

p(z1 = 4 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| −2.50−5.00

,1�

= 0.16×0.00000 (14)

p(z1 = 5 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| −3.50−3.50

,1�

= 0.16×0.00000 (15)

p(z1 = 6 | ~z−1 ,~x ,α)∝1.00

6 + 0.25N

8.009.00

| 0.500.50

,1�

= 0.16×0.00001 (16)

� After normalization:{new: 0.00 1: 0.92 2: 0.08 3: 0.00 4: 0.00 5: 0.00 6:0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 9 / 1

Page 22: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Assignments after sampling point 1

10 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 111110

9876543210123456789

1011

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 10 / 1

Page 23: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 2

Compute the (proportional) probability of assigning data 2 to cluster 1 (butnothing else; there won’t be other options).Recall that α= 0.25 and

p(x | x̄)∝ exp

¨

x1−n

n + 1x̄1

�2

+

x2−n

n + 1x̄2

�2«

(17)

i x1 x2 zi

0 10 10 11 8 9 12 7 63 -9 -10 34 -5 -10 45 -7 -7 56 1 1 6

� There are currently 5 clusters(18)

� After normalization:{new: 0.00 1: 1.00 3: 0.00 4: 0.00 5: 0.00 6: 0.00}� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 11 / 1

Page 24: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 2

� There are currently 5 clusters

p(z2 = 1 | ~z−2 ,~x ,α)∝2.00

6 + 0.25N

7.006.00

| 6.006.33

,1�

= 0.32×0.34851 (17)

(18)

� After normalization:{new: 0.00 1: 1.00 3: 0.00 4: 0.00 5: 0.00 6: 0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 11 / 1

Page 25: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 2

� There are currently 5 clusters

p(z2 = new | ~z−2 ,~x ,α)∝0.25

6 + 0.25N

7.006.00

| 0.000.00

,1�

= 0.04×0.00010 (17)

p(z2 = 1 | ~z−2 ,~x ,α)∝2.00

6 + 0.25N

7.006.00

| 6.006.33

,1�

= 0.32×0.34851 (18)

p(z2 = 3 | ~z−2 ,~x ,α)∝1.00

6 + 0.25N

7.006.00

| −4.50−5.00

,1�

= 0.16×0.00000 (19)

p(z2 = 4 | ~z−2 ,~x ,α)∝1.00

6 + 0.25N

7.006.00

| −2.50−5.00

,1�

= 0.16×0.00000 (20)

p(z2 = 5 | ~z−2 ,~x ,α)∝1.00

6 + 0.25N

7.006.00

| −3.50−3.50

,1�

= 0.16×0.00000 (21)

p(z2 = 6 | ~z−2 ,~x ,α)∝1.00

6 + 0.25N

7.006.00

| 0.500.50

,1�

= 0.16×0.00020 (22)

� After normalization:{new: 0.00 1: 1.00 3: 0.00 4: 0.00 5: 0.00 6: 0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 11 / 1

Page 26: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 2

� There are currently 5 clusters

p(z2 = new | ~z−2 ,~x ,α)∝0.25

6 + 0.25N

7.006.00

| 0.000.00

,1�

= 0.04×0.00010 (17)

p(z2 = 1 | ~z−2 ,~x ,α)∝2.00

6 + 0.25N

7.006.00

| 6.006.33

,1�

= 0.32×0.34851 (18)

p(z2 = 3 | ~z−2 ,~x ,α)∝1.00

6 + 0.25N

7.006.00

| −4.50−5.00

,1�

= 0.16×0.00000 (19)

p(z2 = 4 | ~z−2 ,~x ,α)∝1.00

6 + 0.25N

7.006.00

| −2.50−5.00

,1�

= 0.16×0.00000 (20)

p(z2 = 5 | ~z−2 ,~x ,α)∝1.00

6 + 0.25N

7.006.00

| −3.50−3.50

,1�

= 0.16×0.00000 (21)

p(z2 = 6 | ~z−2 ,~x ,α)∝1.00

6 + 0.25N

7.006.00

| 0.500.50

,1�

= 0.16×0.00020 (22)

� After normalization:{new: 0.00 1: 1.00 3: 0.00 4: 0.00 5: 0.00 6: 0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 11 / 1

Page 27: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 2

� There are currently 5 clusters

p(z2 = new | ~z−2 ,~x ,α)∝0.25

6 + 0.25N

7.006.00

| 0.000.00

,1�

= 0.04×0.00010 (17)

p(z2 = 1 | ~z−2 ,~x ,α)∝2.00

6 + 0.25N

7.006.00

| 6.006.33

,1�

= 0.32×0.34851 (18)

p(z2 = 3 | ~z−2 ,~x ,α)∝1.00

6 + 0.25N

7.006.00

| −4.50−5.00

,1�

= 0.16×0.00000 (19)

p(z2 = 4 | ~z−2 ,~x ,α)∝1.00

6 + 0.25N

7.006.00

| −2.50−5.00

,1�

= 0.16×0.00000 (20)

p(z2 = 5 | ~z−2 ,~x ,α)∝1.00

6 + 0.25N

7.006.00

| −3.50−3.50

,1�

= 0.16×0.00000 (21)

p(z2 = 6 | ~z−2 ,~x ,α)∝1.00

6 + 0.25N

7.006.00

| 0.500.50

,1�

= 0.16×0.00020 (22)

� After normalization:{new: 0.00 1: 1.00 3: 0.00 4: 0.00 5: 0.00 6: 0.00}

� New assignment = 1

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 11 / 1

Page 28: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Assignments after sampling point 2

10 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 111110

9876543210123456789

1011

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 12 / 1

Page 29: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 3

Compute the (proportional) probability of assigning data 3 to cluster 4 and5.Recall that α= 0.25 and

p(x | x̄)∝ exp

¨

x1−n

n + 1x̄1

�2

+

x2−n

n + 1x̄2

�2«

(23)

i x1 x2 zi

0 10 10 11 8 9 12 7 6 13 -9 -104 -5 -10 45 -7 -7 56 1 1 6

� There are currently 4 clusters(24)

� After normalization:{new: 0.00 1: 0.00 4: 0.58 5: 0.42 6: 0.00}� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 13 / 1

Page 30: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 3

� There are currently 4 clusters

(23)

� After normalization:{new: 0.00 1: 0.00 4: 0.58 5: 0.42 6: 0.00}

� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 13 / 1

Page 31: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 3

� There are currently 4 clusters

p(z3 = 4 | ~z−3 ,~x ,α)∝1.00

6 + 0.25N

−9.00−10.00

| −2.50−5.00

,1�

= 0.16×0.00027 (23)

(24)

� After normalization:{new: 0.00 1: 0.00 4: 0.58 5: 0.42 6: 0.00}

� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 13 / 1

Page 32: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 3

� There are currently 4 clusters

p(z3 = 4 | ~z−3 ,~x ,α)∝1.00

6 + 0.25N

−9.00−10.00

| −2.50−5.00

,1�

= 0.16×0.00027 (23)

p(z3 = 5 | ~z−3 ,~x ,α)∝1.00

6 + 0.25N

−9.00−10.00

| −3.50−3.50

,1�

= 0.16×0.00020 (24)

(25)

� After normalization:{new: 0.00 1: 0.00 4: 0.58 5: 0.42 6: 0.00}

� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 13 / 1

Page 33: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 3

� There are currently 4 clusters

p(z3 = new | ~z−3 ,~x ,α)∝0.25

6 + 0.25N

−9.00−10.00

| 0.000.00

,1�

= 0.04×0.00000 (23)

p(z3 = 1 | ~z−3 ,~x ,α)∝3.00

6 + 0.25N

−9.00−10.00

| 6.256.25

,1�

= 0.48×0.00000 (24)

p(z3 = 4 | ~z−3 ,~x ,α)∝1.00

6 + 0.25N

−9.00−10.00

| −2.50−5.00

,1�

= 0.16×0.00027 (25)

p(z3 = 5 | ~z−3 ,~x ,α)∝1.00

6 + 0.25N

−9.00−10.00

| −3.50−3.50

,1�

= 0.16×0.00020 (26)

p(z3 = 6 | ~z−3 ,~x ,α)∝1.00

6 + 0.25N

−9.00−10.00

| 0.500.50

,1�

= 0.16×0.00000 (27)

� After normalization:{new: 0.00 1: 0.00 4: 0.58 5: 0.42 6: 0.00}

� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 13 / 1

Page 34: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 3

� There are currently 4 clusters

p(z3 = new | ~z−3 ,~x ,α)∝0.25

6 + 0.25N

−9.00−10.00

| 0.000.00

,1�

= 0.04×0.00000 (23)

p(z3 = 1 | ~z−3 ,~x ,α)∝3.00

6 + 0.25N

−9.00−10.00

| 6.256.25

,1�

= 0.48×0.00000 (24)

p(z3 = 4 | ~z−3 ,~x ,α)∝1.00

6 + 0.25N

−9.00−10.00

| −2.50−5.00

,1�

= 0.16×0.00027 (25)

p(z3 = 5 | ~z−3 ,~x ,α)∝1.00

6 + 0.25N

−9.00−10.00

| −3.50−3.50

,1�

= 0.16×0.00020 (26)

p(z3 = 6 | ~z−3 ,~x ,α)∝1.00

6 + 0.25N

−9.00−10.00

| 0.500.50

,1�

= 0.16×0.00000 (27)

� After normalization:{new: 0.00 1: 0.00 4: 0.58 5: 0.42 6: 0.00}

� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 13 / 1

Page 35: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 3

� There are currently 4 clusters

p(z3 = new | ~z−3 ,~x ,α)∝0.25

6 + 0.25N

−9.00−10.00

| 0.000.00

,1�

= 0.04×0.00000 (23)

p(z3 = 1 | ~z−3 ,~x ,α)∝3.00

6 + 0.25N

−9.00−10.00

| 6.256.25

,1�

= 0.48×0.00000 (24)

p(z3 = 4 | ~z−3 ,~x ,α)∝1.00

6 + 0.25N

−9.00−10.00

| −2.50−5.00

,1�

= 0.16×0.00027 (25)

p(z3 = 5 | ~z−3 ,~x ,α)∝1.00

6 + 0.25N

−9.00−10.00

| −3.50−3.50

,1�

= 0.16×0.00020 (26)

p(z3 = 6 | ~z−3 ,~x ,α)∝1.00

6 + 0.25N

−9.00−10.00

| 0.500.50

,1�

= 0.16×0.00000 (27)

� After normalization:{new: 0.00 1: 0.00 4: 0.58 5: 0.42 6: 0.00}

� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 13 / 1

Page 36: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Assignments after sampling point 3

10 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 111110

9876543210123456789

1011

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 14 / 1

Page 37: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 4

Compute the (proportional) probability of assigning data 4 to cluster 4 and5.Recall that α= 0.25 and

p(x | x̄)∝ exp

¨

x1−n

n + 1x̄1

�2

+

x2−n

n + 1x̄2

�2«

(28)

i x1 x2 zi

0 10 10 11 8 9 12 7 6 13 -9 -10 44 -5 -105 -7 -7 56 1 1 6

� There are currently 4 clusters(29)

� After normalization:{new: 0.00 1: 0.00 4: 0.84 5: 0.16 6: 0.00}� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 15 / 1

Page 38: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 4

� There are currently 4 clusters

p(z4 = 4 | ~z−4 ,~x ,α)∝1.00

6 + 0.25N

−5.00−10.00

| −4.50−5.00

,1�

= 0.16×0.00657 (28)

(29)

� After normalization:{new: 0.00 1: 0.00 4: 0.84 5: 0.16 6: 0.00}

� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 15 / 1

Page 39: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 4

� There are currently 4 clusters

p(z4 = 4 | ~z−4 ,~x ,α)∝1.00

6 + 0.25N

−5.00−10.00

| −4.50−5.00

,1�

= 0.16×0.00657 (28)

p(z4 = 5 | ~z−4 ,~x ,α)∝1.00

6 + 0.25N

−5.00−10.00

| −3.50−3.50

,1�

= 0.16×0.00127 (29)

(30)

� After normalization:{new: 0.00 1: 0.00 4: 0.84 5: 0.16 6: 0.00}

� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 15 / 1

Page 40: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 4

� There are currently 4 clusters

p(z4 = new | ~z−4 ,~x ,α)∝0.25

6 + 0.25N

−5.00−10.00

| 0.000.00

,1�

= 0.04×0.00001 (28)

p(z4 = 1 | ~z−4 ,~x ,α)∝3.00

6 + 0.25N

−5.00−10.00

| 6.256.25

,1�

= 0.48×0.00000 (29)

p(z4 = 4 | ~z−4 ,~x ,α)∝1.00

6 + 0.25N

−5.00−10.00

| −4.50−5.00

,1�

= 0.16×0.00657 (30)

p(z4 = 5 | ~z−4 ,~x ,α)∝1.00

6 + 0.25N

−5.00−10.00

| −3.50−3.50

,1�

= 0.16×0.00127 (31)

p(z4 = 6 | ~z−4 ,~x ,α)∝1.00

6 + 0.25N

−5.00−10.00

| 0.500.50

,1�

= 0.16×0.00001 (32)

� After normalization:{new: 0.00 1: 0.00 4: 0.84 5: 0.16 6: 0.00}

� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 15 / 1

Page 41: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 4

� There are currently 4 clusters

p(z4 = new | ~z−4 ,~x ,α)∝0.25

6 + 0.25N

−5.00−10.00

| 0.000.00

,1�

= 0.04×0.00001 (28)

p(z4 = 1 | ~z−4 ,~x ,α)∝3.00

6 + 0.25N

−5.00−10.00

| 6.256.25

,1�

= 0.48×0.00000 (29)

p(z4 = 4 | ~z−4 ,~x ,α)∝1.00

6 + 0.25N

−5.00−10.00

| −4.50−5.00

,1�

= 0.16×0.00657 (30)

p(z4 = 5 | ~z−4 ,~x ,α)∝1.00

6 + 0.25N

−5.00−10.00

| −3.50−3.50

,1�

= 0.16×0.00127 (31)

p(z4 = 6 | ~z−4 ,~x ,α)∝1.00

6 + 0.25N

−5.00−10.00

| 0.500.50

,1�

= 0.16×0.00001 (32)

� After normalization:{new: 0.00 1: 0.00 4: 0.84 5: 0.16 6: 0.00}

� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 15 / 1

Page 42: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 4

� There are currently 4 clusters

p(z4 = new | ~z−4 ,~x ,α)∝0.25

6 + 0.25N

−5.00−10.00

| 0.000.00

,1�

= 0.04×0.00001 (28)

p(z4 = 1 | ~z−4 ,~x ,α)∝3.00

6 + 0.25N

−5.00−10.00

| 6.256.25

,1�

= 0.48×0.00000 (29)

p(z4 = 4 | ~z−4 ,~x ,α)∝1.00

6 + 0.25N

−5.00−10.00

| −4.50−5.00

,1�

= 0.16×0.00657 (30)

p(z4 = 5 | ~z−4 ,~x ,α)∝1.00

6 + 0.25N

−5.00−10.00

| −3.50−3.50

,1�

= 0.16×0.00127 (31)

p(z4 = 6 | ~z−4 ,~x ,α)∝1.00

6 + 0.25N

−5.00−10.00

| 0.500.50

,1�

= 0.16×0.00001 (32)

� After normalization:{new: 0.00 1: 0.00 4: 0.84 5: 0.16 6: 0.00}

� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 15 / 1

Page 43: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Assignments after sampling point 4

10 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 111110

9876543210123456789

1011

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 16 / 1

Page 44: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 5

Compute the (proportional) probability of assigning data 5 to cluster 4 (butnothing else is viable).Recall that α= 0.25 and

p(x | x̄)∝ exp

¨

x1−n

n + 1x̄1

�2

+

x2−n

n + 1x̄2

�2«

(33)

i x1 x2 zi

0 10 10 11 8 9 12 7 6 13 -9 -10 44 -5 -10 45 -7 -76 1 1 6

� There are currently 3 clusters(34)

� After normalization:{new: 0.00 1: 0.00 4: 1.00 6: 0.00}� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 17 / 1

Page 45: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 5

� There are currently 3 clusters

p(z5 = 4 | ~z−5 ,~x ,α)∝2.00

6 + 0.25N

−7.00−7.00

| −4.67−6.67

,1�

= 0.32×0.09470 (33)

(34)

� After normalization:{new: 0.00 1: 0.00 4: 1.00 6: 0.00}

� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 17 / 1

Page 46: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 5

� There are currently 3 clusters

p(z5 = new | ~z−5 ,~x ,α)∝0.25

6 + 0.25N

−7.00−7.00

| 0.000.00

,1�

= 0.04×0.00005 (33)

p(z5 = 1 | ~z−5 ,~x ,α)∝3.00

6 + 0.25N

−7.00−7.00

| 6.256.25

,1�

= 0.48×0.00000 (34)

p(z5 = 4 | ~z−5 ,~x ,α)∝2.00

6 + 0.25N

−7.00−7.00

| −4.67−6.67

,1�

= 0.32×0.09470 (35)

p(z5 = 6 | ~z−5 ,~x ,α)∝1.00

6 + 0.25N

−7.00−7.00

| 0.500.50

,1�

= 0.16×0.00002 (36)

� After normalization:{new: 0.00 1: 0.00 4: 1.00 6: 0.00}

� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 17 / 1

Page 47: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 5

� There are currently 3 clusters

p(z5 = new | ~z−5 ,~x ,α)∝0.25

6 + 0.25N

−7.00−7.00

| 0.000.00

,1�

= 0.04×0.00005 (33)

p(z5 = 1 | ~z−5 ,~x ,α)∝3.00

6 + 0.25N

−7.00−7.00

| 6.256.25

,1�

= 0.48×0.00000 (34)

p(z5 = 4 | ~z−5 ,~x ,α)∝2.00

6 + 0.25N

−7.00−7.00

| −4.67−6.67

,1�

= 0.32×0.09470 (35)

p(z5 = 6 | ~z−5 ,~x ,α)∝1.00

6 + 0.25N

−7.00−7.00

| 0.500.50

,1�

= 0.16×0.00002 (36)

� After normalization:{new: 0.00 1: 0.00 4: 1.00 6: 0.00}

� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 17 / 1

Page 48: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 5

� There are currently 3 clusters

p(z5 = new | ~z−5 ,~x ,α)∝0.25

6 + 0.25N

−7.00−7.00

| 0.000.00

,1�

= 0.04×0.00005 (33)

p(z5 = 1 | ~z−5 ,~x ,α)∝3.00

6 + 0.25N

−7.00−7.00

| 6.256.25

,1�

= 0.48×0.00000 (34)

p(z5 = 4 | ~z−5 ,~x ,α)∝2.00

6 + 0.25N

−7.00−7.00

| −4.67−6.67

,1�

= 0.32×0.09470 (35)

p(z5 = 6 | ~z−5 ,~x ,α)∝1.00

6 + 0.25N

−7.00−7.00

| 0.500.50

,1�

= 0.16×0.00002 (36)

� After normalization:{new: 0.00 1: 0.00 4: 1.00 6: 0.00}

� New assignment = 4

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 17 / 1

Page 49: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Assignments after sampling point 5

10 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 111110

9876543210123456789

1011

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 18 / 1

Page 50: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 6

Compute the (proportional) probability of assigning data 6 to a new clusterand cluster 1.Recall that α= 0.25 and

p(x | x̄)∝ exp

¨

x1−n

n + 1x̄1

�2

+

x2−n

n + 1x̄2

�2«

(37)

i x1 x2 zi

0 10 10 11 8 9 12 7 6 13 -9 -10 44 -5 -10 45 -7 -7 46 1 1

� There are currently 2 clusters(38)

� After normalization:{new: 0.97 1: 0.03 4: 0.00}� New assignment = 0

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 19 / 1

Page 51: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 6

� There are currently 2 clusters

p(z6 = new | ~z−6 ,~x ,α)∝0.25

6 + 0.25N

1.001.00

| 0.000.00

,1�

= 0.04×0.24312 (37)

(38)

� After normalization:{new: 0.97 1: 0.03 4: 0.00}

� New assignment = 0

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 19 / 1

Page 52: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 6

� There are currently 2 clusters

p(z6 = new | ~z−6 ,~x ,α)∝0.25

6 + 0.25N

1.001.00

| 0.000.00

,1�

= 0.04×0.24312 (37)

p(z6 = 1 | ~z−6 ,~x ,α)∝3.00

6 + 0.25N

1.001.00

| 6.256.25

,1�

= 0.48×0.00060 (38)

(39)

� After normalization:{new: 0.97 1: 0.03 4: 0.00}

� New assignment = 0

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 19 / 1

Page 53: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 6

� There are currently 2 clusters

p(z6 = new | ~z−6 ,~x ,α)∝0.25

6 + 0.25N

1.001.00

| 0.000.00

,1�

= 0.04×0.24312 (37)

p(z6 = 1 | ~z−6 ,~x ,α)∝3.00

6 + 0.25N

1.001.00

| 6.256.25

,1�

= 0.48×0.00060 (38)

p(z6 = 4 | ~z−6 ,~x ,α)∝3.00

6 + 0.25N

1.001.00

| −5.25−6.75

,1�

= 0.48×0.00005 (39)

� After normalization:{new: 0.97 1: 0.03 4: 0.00}

� New assignment = 0

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 19 / 1

Page 54: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 6

� There are currently 2 clusters

p(z6 = new | ~z−6 ,~x ,α)∝0.25

6 + 0.25N

1.001.00

| 0.000.00

,1�

= 0.04×0.24312 (37)

p(z6 = 1 | ~z−6 ,~x ,α)∝3.00

6 + 0.25N

1.001.00

| 6.256.25

,1�

= 0.48×0.00060 (38)

p(z6 = 4 | ~z−6 ,~x ,α)∝3.00

6 + 0.25N

1.001.00

| −5.25−6.75

,1�

= 0.48×0.00005 (39)

� After normalization:{new: 0.97 1: 0.03 4: 0.00}

� New assignment = 0

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 19 / 1

Page 55: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Sampling point 6

� There are currently 2 clusters

p(z6 = new | ~z−6 ,~x ,α)∝0.25

6 + 0.25N

1.001.00

| 0.000.00

,1�

= 0.04×0.24312 (37)

p(z6 = 1 | ~z−6 ,~x ,α)∝3.00

6 + 0.25N

1.001.00

| 6.256.25

,1�

= 0.48×0.00060 (38)

p(z6 = 4 | ~z−6 ,~x ,α)∝3.00

6 + 0.25N

1.001.00

| −5.25−6.75

,1�

= 0.48×0.00005 (39)

� After normalization:{new: 0.97 1: 0.03 4: 0.00}

� New assignment = 0

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 19 / 1

Page 56: users.umiacs.umd.eduusers.umiacs.umd.edu/~jbg/teaching/CMSC_726/15b.pdfSampling point 2 Compute the (proportional) probability of assigning data 2 to cluster 1 (but nothing else; there

Assignments after sampling point 6

10 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 111110

9876543210123456789

1011

Machine Learning: Jordan Boyd-Graber | UMD Dirichlet Processes | 20 / 1