Isomap + t-SNE

Machine Learning for Data Science (CS4786), Lecture 9






Course webpage: http://www.cs.cornell.edu/Courses/cs4786/2017fa/


MANIFOLD-BASED DIMENSIONALITY REDUCTION

Key assumption: points live on a low-dimensional manifold.

Manifold: a space that locally looks Euclidean.

Given data, can we uncover this manifold?

Can we unfold this?


METHOD I: ISOMAP

1 For every point, find its k nearest neighbors.

2 Form the nearest-neighbor graph.

3 For every pair of points A and B, the distance between A and B is the shortest-path distance between them on the graph.

4 Find points in low-dimensional space such that distances between points in this space equal the graph distances.

[Figure: pairwise graph-distance matrix]
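A minimal NumPy/scikit-learn sketch of these four steps, with step 4 realized as classical MDS on the graph distances (the name isomap_embed and the parameter defaults are illustrative, not part of the lecture):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap_embed(X, n_neighbors=10, n_components=2):
    # Steps 1-2: k-nearest-neighbor graph with Euclidean edge weights.
    G = kneighbors_graph(X, n_neighbors, mode="distance")
    # Step 3: geodesic distances = shortest paths on the graph.
    D = shortest_path(G, directed=False)
    # Step 4: classical MDS on the geodesic distance matrix.
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    B = -0.5 * H @ (D ** 2) @ H                     # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)            # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:n_components]  # keep the top eigenpairs
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0.0))
```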


ISOMAP: PITFALLS

1 If we don't take enough nearest neighbors, the graph may not be connected.

2 If we connect points that are too far apart, points that should not be connected get connected.

3 There may not be a single right number of nearest neighbors to consider!
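With the isomap_embed sketch above, pitfall 1 is easy to detect: a disconnected neighbor graph shows up as infinite shortest-path distances (X below stands for the same data matrix as before):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

# Infinite geodesic distances mean the k-NN graph is disconnected;
# increase n_neighbors or embed each connected component separately.
G = kneighbors_graph(X, n_neighbors=5, mode="distance")
D = shortest_path(G, directed=False)
print("disconnected:", np.isinf(D).any())
```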


STOCHASTIC NEIGHBORHOOD EMBEDDING

Use a probabilistic notion of which points are neighbors.

Nearby points are neighbors with high probability. E.g., for point x_t, point x_s is picked as a neighbor with probability

p_{t \to s} = \frac{\exp\left(-\|x_s - x_t\|^2 / 2\sigma^2\right)}{\sum_{u \neq t} \exp\left(-\|x_u - x_t\|^2 / 2\sigma^2\right)}

The probability that points s and t are connected is

P_{s,t} = P_{t,s} = \frac{p_{t \to s} + p_{s \to t}}{2n}

Goal: find y_1, \dots, y_n with stochastic neighborhood distribution Q such that "P and Q are similar", i.e. minimize

\mathrm{KL}(P \,\|\, Q) = \sum_{s,t} P_{s,t} \log \frac{P_{s,t}}{Q_{s,t}} = \sum_{s,t} P_{s,t} \log P_{s,t} - \sum_{s,t} P_{s,t} \log Q_{s,t}

[Figure: stochastic neighborhood distribution P]
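A NumPy sketch of the P construction and the KL objective, assuming a single global σ for simplicity (real SNE tunes a per-point σ to hit a perplexity target; neighbor_probs and kl_divergence are illustrative names):

```python
import numpy as np

def neighbor_probs(X, sigma=1.0):
    # Squared Euclidean distances ||x_s - x_t||^2.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    logits = -sq / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)      # exclude u = t (no self-neighbors)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)      # row t holds p_{t->s}
    n = X.shape[0]
    return (p + p.T) / (2.0 * n)           # P_{s,t} = (p_{t->s} + p_{s->t}) / (2n)

def kl_divergence(P, Q, eps=1e-12):
    # KL(P||Q); terms with P_{s,t} = 0 contribute nothing.
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))
```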


CHOICE FOR Q

Just like we defined P, we can define Q for a given y_1, \dots, y_n by

q_{t \to s} = \frac{\exp\left(-\|y_s - y_t\|^2 / 2\sigma^2\right)}{\sum_{u \neq t} \exp\left(-\|y_u - y_t\|^2 / 2\sigma^2\right)}

and then set Q_{s,t} = \frac{q_{t \to s} + q_{s \to t}}{2n}.

However, we are faced with the crowding problem. In high dimension there is a lot of room: e.g., in d dimensions we can place d + 1 mutually equidistant points.

For d-dimensional Gaussians, most points are found at distance \sqrt{d} from the mean!

If we use Gaussians in both the high- and low-dimensional spaces, all the points get squished into a small region: too many points crowd the center!
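The \sqrt{d} concentration is easy to check numerically (a quick NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    x = rng.standard_normal((10_000, d))  # samples from a d-dim standard Gaussian
    r = np.linalg.norm(x, axis=1)         # distances from the mean
    print(d, round(r.mean() / np.sqrt(d), 3))  # ratio approaches 1 as d grows
```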


METHOD II: T-SNE

Instead, for Q we use a Student t distribution, which is heavy-tailed:

q_{t \to s} = \frac{\left(1 + \|y_s - y_t\|^2\right)^{-1}}{\sum_{u \neq t} \left(1 + \|y_u - y_t\|^2\right)^{-1}}

and then set Q_{s,t} = \frac{q_{t \to s} + q_{s \to t}}{2n}.

It can be verified that

\nabla_{y_t} \mathrm{KL}(P \,\|\, Q) = 4 \sum_{s=1}^{n} (P_{s,t} - Q_{s,t})(y_t - y_s)\left(1 + \|y_s - y_t\|^2\right)^{-1}

Algorithm: find y_1, \dots, y_n by performing gradient descent.
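A minimal gradient-descent sketch of this update, reusing the neighbor_probs helper above for P (the step size and iteration count are illustrative; real t-SNE implementations also add momentum and early exaggeration):

```python
import numpy as np

def tsne(P, n_components=2, lr=100.0, n_iters=500, seed=0):
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    Y = 1e-2 * rng.standard_normal((n, n_components))  # small random init
    for _ in range(n_iters):
        sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
        W = 1.0 / (1.0 + sq)                  # (1 + ||y_s - y_t||^2)^(-1)
        np.fill_diagonal(W, 0.0)              # exclude u = t
        q_cond = W / W.sum(axis=1, keepdims=True)
        Q = (q_cond + q_cond.T) / (2.0 * n)   # Q_{s,t} = (q_{t->s} + q_{s->t}) / (2n)
        # Gradient from the slide: 4 * sum_s (P_{s,t} - Q_{s,t})(y_t - y_s) W_{s,t}
        M = (P - Q) * W
        grad = 4.0 * (M.sum(axis=1, keepdims=True) * Y - M @ Y)
        Y -= lr * grad
    return Y
```

Usage: Y = tsne(neighbor_probs(X)) gives a 2-D embedding of the rows of X.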


Demo