Page 1: visualization data using t-SNE

Visualizing Data using t-SNE

Laurens van der Maaten and Geoffrey Hinton, JMLR 2008

Kevin Zhao

[email protected]

October 30, 2014

Laurens van der Maaten and Geoffrey Hinton, JMLR 2008 (MCLab)t-SNE October 30, 2014 1 / 33

Page 2: visualization data using t-SNE

Overview

1 Overview

2 t-Distributed Stochastic Neighbor Embedding

3 Experiment Setup and Results

4 Code and Web Resources

Page 3: visualization data using t-SNE

Introduction

Overview

We are given a collection of N high-dimensional objects $x_1, \ldots, x_N$. How can we get a feel for how these objects are arranged in the data space?

Page 4: visualization data using t-SNE

Introduction

Principal Components Analysis

Page 5: visualization data using t-SNE

Introduction

Principal Components Analysis

Page 6: visualization data using t-SNE

Introduction

Swiss Roll

When reducing dimensionality, PCA is mainly concerned with preserving large pairwise distances in the map

Page 7: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

Introduction

Distance Preservation

Neighbor Preservation

Page 8: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

Introduction

Page 9: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

Introduction

Preserve the neighborhood

Page 10: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

Introduction

Measure pairwise similarities between the high-dimensional objects and between the low-dimensional objects

$$p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)}$$
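
As a rough illustration of the formula above, the following NumPy sketch computes the full matrix of conditional similarities for a toy dataset. The function names, the random data, and the fixed bandwidths (every sigma_i set to 1) are assumptions made for this example and do not come from the slides.

```python
import numpy as np

def squared_distances(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.maximum(D, 0.0)  # clip tiny negatives caused by round-off

def conditional_p(X, sigma):
    """Row i holds p_{j|i} for all j; sigma holds one bandwidth per point."""
    D = squared_distances(X)
    logits = -D / (2.0 * sigma[:, None] ** 2)
    np.fill_diagonal(logits, -np.inf)            # exclude k = i from the sum
    logits -= logits.max(axis=1, keepdims=True)  # stabilise the exponentials
    W = np.exp(logits)
    return W / W.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))                      # toy data: 6 points, 4 dimensions
P_cond = conditional_p(X, sigma=np.ones(6))      # assumed bandwidths
```

Each row of `P_cond` is a probability distribution over the other points, and the matrix is in general not symmetric because every row has its own normaliser.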

Page 11: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

Stochastic Neighbor Embedding

Convert the high-dimensional Euclidean distances into conditional probabilities that represent similarities

Similarity of datapoints in High Dimension

$$p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)}$$

Similarity of datapoints in Low Dimension

$$q_{j|i} = \frac{\exp\left(-\|y_i - y_j\|^2\right)}{\sum_{k \neq i} \exp\left(-\|y_i - y_k\|^2\right)}$$

Cost function

$$C = \sum_i KL(P_i \| Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$$

Minimize the cost function using gradient descent
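
The SNE cost can be evaluated directly once both similarity matrices are available. The sketch below is a minimal NumPy version under stated assumptions: the helper names, the random toy data, the candidate map `Y`, and the unit bandwidths are all illustrative choices rather than anything given on the slides.

```python
import numpy as np

def row_similarities(Z, scale):
    """Row-normalised Gaussian similarities; scale[i] multiplies -||z_i - z_j||^2."""
    D = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    logits = -D * scale[:, None]
    np.fill_diagonal(logits, -np.inf)            # a point is not its own neighbour
    logits -= logits.max(axis=1, keepdims=True)
    W = np.exp(logits)
    return W / W.sum(axis=1, keepdims=True)

def sne_cost(P, Q, eps=1e-12):
    """Sum over i of KL(P_i || Q_i); entries with p = 0 contribute nothing."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))                      # high-dimensional points (toy data)
Y = rng.normal(size=(8, 2))                      # a candidate 2-D map
sigma = np.ones(8)                               # assumed bandwidths
P = row_similarities(X, 1.0 / (2.0 * sigma ** 2))
Q = row_similarities(Y, np.ones(8))              # map kernel: exp(-||y_i - y_j||^2)
cost = sne_cost(P, Q)
```

The cost is zero only when every row of `Q` matches the corresponding row of `P`, which is what gradient descent on the map points tries to approach.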

Page 12: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

Stochastic Neighbor Embedding

Gradient has a surprisingly simple form

$$\frac{\partial C}{\partial y_i} = 2 \sum_{j \neq i} (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j})(y_i - y_j)$$

The gradient update with momentum term is given by

$$Y^{(t)} = Y^{(t-1)} - \eta \frac{\partial C}{\partial Y} + \beta(t)\left(Y^{(t-1)} - Y^{(t-2)}\right)$$
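
A minimal sketch of the SNE gradient and a momentum update is given below. It is written as a descent step, so the gradient term is subtracted, and it keeps the constant factor of 2 that comes out of differentiating the cost (a constant factor only rescales the learning rate). The stand-in target matrix `P`, the learning rate `eta`, the momentum `beta`, and the iteration count are assumptions chosen for illustration.

```python
import numpy as np

def conditional_q(Y):
    """q_{j|i}: row-normalised exp(-||y_i - y_j||^2) with a zero diagonal."""
    D = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = np.exp(-D)
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)

def sne_gradient(P, Y):
    """dC/dy_i = 2 * sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}) (y_i - y_j)."""
    Q = conditional_q(Y)
    M = (P - Q) + (P - Q).T        # M[i, j] combines both conditional directions
    # sum_j M[i, j] (y_i - y_j) = rowsum(M)[i] * y_i - (M @ Y)[i]
    return 2.0 * (M.sum(axis=1)[:, None] * Y - M @ Y)

def momentum_step(Y, Y_prev, grad, eta, beta):
    """One descent step plus a momentum term built from the previous move."""
    return Y - eta * grad + beta * (Y - Y_prev)

rng = np.random.default_rng(0)
# Stand-in target similarities: Gaussian conditionals of random 3-D points.
P = conditional_q(rng.normal(size=(10, 3)))
Y_prev = Y = 1e-2 * rng.normal(size=(10, 2))     # small random initial map
for t in range(50):
    Y, Y_prev = momentum_step(Y, Y_prev, sne_gradient(P, Y), eta=0.05, beta=0.5), Y
```

The matrix form of the gradient avoids an explicit double loop over pairs; the cost per iteration is still quadratic in the number of points.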

Page 13: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

Symmetric SNE

Minimize the sum of the KL divergences between the conditional probabilities

$$C = \sum_i KL(P_i \| Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$$

Minimize a single KL divergence between joint probability distributions

$$C = KL(P \| Q) = \sum_i \sum_{j \neq i} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

The obvious way to redefine the pairwise similarities is

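$$p_{ij} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma^2\right)}{\sum_{k \neq l} \exp\left(-\|x_l - x_k\|^2 / 2\sigma^2\right)}$$

$$q_{ij} = \frac{\exp\left(-\|y_i - y_j\|^2\right)}{\sum_{k \neq l} \exp\left(-\|y_l - y_k\|^2\right)}$$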

Page 14: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

Symmetric SNE

Then $p_{ij} = p_{ji}$ and $q_{ij} = q_{ji}$; the main advantage is a simpler gradient

$$\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)$$

However, in practice we symmetrize (or average) the conditionals

$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}$$
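
The averaging step can be sketched in a few lines of NumPy. The random row-stochastic matrix below is only a stand-in for real conditional similarities, and the function name is an assumption for this example.

```python
import numpy as np

def symmetrize(P_cond):
    """p_ij = (p_{j|i} + p_{i|j}) / (2N) from a row-normalised conditional matrix."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)

rng = np.random.default_rng(0)
A = rng.random((5, 5))
np.fill_diagonal(A, 0.0)
P_cond = A / A.sum(axis=1, keepdims=True)        # stand-in conditionals, rows sum to 1
P_joint = symmetrize(P_cond)
```

Because every row of `P_cond` sums to 1, each row of `P_joint` sums to at least 1/(2N), so no point, not even an outlier, ends up with a negligible share of the cost.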

Set the bandwidth $\sigma_i$ such that the conditional distribution $P_i$ has a fixed perplexity (effective number of neighbors), $Perp(P_i) = 2^{H(P_i)}$, where $H(P_i)$ is the Shannon entropy of $P_i$ in bits; typical values are between 5 and 50
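
One common way to find such a bandwidth is a bisection search, since the perplexity grows monotonically with the bandwidth. The sketch below is a hedged illustration for a single point: the search bounds, the iteration count, the target perplexity of 15, and the toy data are all assumptions made for the example.

```python
import numpy as np

def perplexity(p):
    """2 ** H(p), with the Shannon entropy H measured in bits."""
    p = p[p > 0]
    return 2.0 ** (-np.sum(p * np.log2(p)))

def conditional_row(sq_dists, sigma):
    """p_{j|i} for one point i, given its squared distances to all other points."""
    logits = -sq_dists / (2.0 * sigma ** 2)
    w = np.exp(logits - logits.max())            # shift for numerical stability
    return w / w.sum()

def find_sigma(sq_dists, target, lo=1e-3, hi=1e3, iters=60):
    """Bisection on log(sigma) until the row perplexity matches the target."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)                   # geometric midpoint of the bracket
        if perplexity(conditional_row(sq_dists, mid)) > target:
            hi = mid                             # too many effective neighbours
        else:
            lo = mid                             # too few effective neighbours
    return np.sqrt(lo * hi)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                    # toy data
i = 0
sq_dists = np.sum((np.delete(X, i, axis=0) - X[i]) ** 2, axis=1)
sigma_i = find_sigma(sq_dists, target=15.0)
achieved = perplexity(conditional_row(sq_dists, sigma_i))
```

Repeating the search for every point gives one bandwidth per point, so dense regions get a small bandwidth and sparse regions a large one.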

Page 15: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

t-Distribution

Use a distribution with heavier tails than a Gaussian in the low-dimensional space; we choose

$$q_{ij} \propto \left(1 + \|y_i - y_j\|^2\right)^{-1}$$

The gradient then becomes

$$\frac{\partial C}{\partial y_i} = 4 \sum_{j \neq i} (p_{ij} - q_{ij})\left(1 + \|y_i - y_j\|^2\right)^{-1}(y_i - y_j)$$

Page 16: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

t-Distributed Stochastic Neighbor Embedding

Similarity of datapoints in High Dimension

$$p_{ij} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma^2\right)}{\sum_{k \neq l} \exp\left(-\|x_l - x_k\|^2 / 2\sigma^2\right)}$$

Similarity of datapoints in Low Dimension

$$q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \|y_k - y_l\|^2\right)^{-1}}$$
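
The low-dimensional similarities can be sketched as follows. The function name and the random map are assumptions for illustration; the last lines compare the two kernels at a distance of 3 to show how much more mass the Student-t kernel leaves in the tail.

```python
import numpy as np

def student_t_q(Y):
    """Joint similarities q_ij from a Student-t kernel with one degree of freedom."""
    D = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + D)                          # (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(W, 0.0)                     # pairs with k = l are excluded
    return W / W.sum()                           # one normaliser for all pairs

rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 2))                      # toy 2-D map
Q = student_t_q(Y)

d = 3.0
gaussian_tail = np.exp(-d ** 2)                  # Gaussian kernel value at distance 3
student_tail = 1.0 / (1.0 + d ** 2)              # Student-t kernel value at distance 3
```

At this distance the Student-t kernel is larger than the Gaussian one by roughly three orders of magnitude, which is what lets moderately dissimilar points sit far apart in the map.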

Page 17: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

t-Distributed Stochastic Neighbor Embedding

Cost function

$$C = KL(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

Large $p_{ij}$ modeled by small $q_{ij}$: large penalty

Small $p_{ij}$ modeled by large $q_{ij}$: small penalty

t-SNE mainly preserves the local similarity structure of the data

Gradient

$$\frac{\partial C}{\partial y_i} = 4 \sum_{j \neq i} (p_{ij} - q_{ij})\left(1 + \|y_i - y_j\|^2\right)^{-1}(y_i - y_j)$$
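
The cost and gradient above translate almost line by line into NumPy. The sketch below is an illustration under assumptions: the function name, the random symmetric stand-in for `P`, and the random map are not from the slides. It ends with a central finite-difference check on one coordinate as a sanity test of the analytic gradient.

```python
import numpy as np

def tsne_cost_and_grad(P, Y):
    """KL(P || Q) and its gradient for a Student-t map kernel."""
    diff = Y[:, None, :] - Y[None, :, :]             # diff[i, j] = y_i - y_j
    W = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))     # (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()
    off = ~np.eye(len(Y), dtype=bool)                # skip the i = j terms
    cost = np.sum(P[off] * np.log(P[off] / Q[off]))
    grad = 4.0 * np.einsum("ij,ijk->ik", (P - Q) * W, diff)
    return cost, grad

rng = np.random.default_rng(0)
A = rng.random((7, 7))
A = A + A.T
np.fill_diagonal(A, 0.0)
P = A / A.sum()                                      # stand-in symmetric similarities
Y = rng.normal(size=(7, 2))
cost, grad = tsne_cost_and_grad(P, Y)

# Central finite difference on coordinate (2, 0).
h = 1e-6
E = np.zeros_like(Y)
E[2, 0] = h
numeric = (tsne_cost_and_grad(P, Y + E)[0] - tsne_cost_and_grad(P, Y - E)[0]) / (2 * h)
```

`numeric` and `grad[2, 0]` should agree to several decimal places; a mismatch usually points to a sign or normalisation slip in the gradient code.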

Page 18: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

Gradient Interpretation

Pairwise Euclidean distance between two points in the high-dimensional and in the low-dimensional data representation

Figure: Gradient of SNE and t-SNE

Page 19: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

Gradient Interpretation

We can interpret the t-SNE gradient as a simulation of an N-body system

$$\frac{\partial C}{\partial y_i} = 4 \sum_{j \neq i} (p_{ij} - q_{ij})\left(1 + \|y_i - y_j\|^2\right)^{-1}(y_i - y_j)$$

Page 20: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

Gradient Interpretation

We can interpret the t-SNE gradient as a simulation of an N-body system

Displacement: $(y_i - y_j)$

Page 21: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

Gradient Interpretation

We can interpret the t-SNE gradient as a simulation of an N-body system

Exertion / Compression

$$(p_{ij} - q_{ij})\left(1 + \|y_i - y_j\|^2\right)^{-1}$$
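
To make the spring picture concrete, the sketch below computes the scalar spring term and the resulting pairwise force for a hand-built three-point example. The function name and the numbers in `P` and `Y` are assumptions chosen so that one pair is similar but far apart in the map and another is dissimilar but close.

```python
import numpy as np

def pairwise_forces(P, Y):
    """Spring term (p_ij - q_ij)(1 + d_ij^2)^-1 and force = term * (y_i - y_j)."""
    diff = Y[:, None, :] - Y[None, :, :]             # displacement y_i - y_j
    W = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()
    stiffness = (P - Q) * W
    return stiffness, stiffness[:, :, None] * diff

# Points 0 and 1 are similar in the data; point 2 is dissimilar to both.
P = np.array([[0.0,   0.49,  0.005],
              [0.49,  0.0,   0.005],
              [0.005, 0.005, 0.0]])
# In the map, point 1 is far from point 0 while point 2 sits right next to it.
Y = np.array([[0.0, 0.0],
              [5.0, 0.0],
              [0.1, 0.0]])
stiffness, forces = pairwise_forces(P, Y)
```

Here `stiffness[0, 1]` is positive and `stiffness[0, 2]` is negative. Since gradient descent moves each point against the summed forces, the positive term pulls point 0 toward point 1 and the negative term pushes it away from point 2.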

Page 22: visualization data using t-SNE

t-Distributed Stochastic Neighbor Embedding

Gradient Interpretation

We can interpret the t-SNE gradient as a simulation of an N-body system

N-Body, summation

$$\frac{\partial C}{\partial y_i} = 4 \sum_{j \neq i} (p_{ij} - q_{ij})\left(1 + \|y_i - y_j\|^2\right)^{-1}(y_i - y_j)$$

Reduce the complexity from $O(N^2)$ to $O(N \log N)$ via the Barnes-Hut (tree-based) algorithm
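
The core idea behind the tree-based speed-up can be shown without building a tree: a group of points that is small and far away is replaced by a single summary at its centre of mass. The sketch below is not the Barnes-Hut algorithm itself, only this one approximation applied to the repulsive part of the gradient, which up to the global normaliser is a sum of $(1 + \|y_i - y_j\|^2)^{-2}(y_i - y_j)$ terms. The function names, the cluster position, and its spread are assumptions for illustration.

```python
import numpy as np

def repulsive_term(y_i, others):
    """Exact sum over j of (1 + ||y_i - y_j||^2)^-2 (y_i - y_j)."""
    diff = y_i - others
    w = 1.0 / (1.0 + np.sum(diff ** 2, axis=1))
    return np.sum((w ** 2)[:, None] * diff, axis=0)

def summarized_term(y_i, cell_points):
    """Treat every point of a distant cell as if it sat at the cell's centre of mass."""
    centre = cell_points.mean(axis=0)
    diff = y_i - centre
    w = 1.0 / (1.0 + diff @ diff)
    return len(cell_points) * w ** 2 * diff

rng = np.random.default_rng(0)
y_i = np.zeros(2)
# A compact cluster of 200 map points, far away from y_i.
cell = np.array([20.0, 10.0]) + 0.5 * rng.normal(size=(200, 2))
exact = repulsive_term(y_i, cell)                    # 200 kernel evaluations
approx = summarized_term(y_i, cell)                  # 1 kernel evaluation
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
```

The relative error stays small as long as the cell is small compared with its distance from `y_i`; a tree over the map points lets an implementation apply this test recursively to cells of different sizes.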

Page 23: visualization data using t-SNE

Experiment Setup and Results

Experiment & Results

MNIST

Randomly selected 6,000 images

28 × 28 = 784 pixels

Olivetti faces

400 images (10 per individual)

92 × 112 = 10,304 pixels

COIL-20

20 different objects and 72 equally spaced orientations, yielding a total of 1,440 images

32 × 32 = 1,024 pixels

Start by using PCA to reduce the dimensionality of the data to 30
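
A PCA preprocessing step of this kind can be sketched with a singular value decomposition. The function name is an assumption, and the random matrix below merely stands in for 784-pixel image vectors; it is not the MNIST data.

```python
import numpy as np

def pca_reduce(X, n_components=30):
    """Project the centred data onto its leading principal directions."""
    Xc = X - X.mean(axis=0)                          # centre every feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                  # rows of Vt are the directions

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 784))                      # stand-in for 784-pixel images
X30 = pca_reduce(X)
variances = X30.var(axis=0)
```

The reduced matrix `X30` then replaces the raw pixels as input to the pairwise-distance computation, which cuts the cost of that step and suppresses some noise.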

Page 24: visualization data using t-SNE

Experiment Setup and Results

Experiment & Results

Page 25: visualization data using t-SNE

Experiment Setup and Results

MNIST t-SNE

Page 26: visualization data using t-SNE

Experiment Setup and Results

MNIST Sammon

Page 27: visualization data using t-SNE

Experiment Setup and Results

MNIST Isomap

Page 28: visualization data using t-SNE

Experiment Setup and Results

MNIST LLE

Page 29: visualization data using t-SNE

Experiment Setup and Results

Olivetti faces

Page 30: visualization data using t-SNE

Experiment Setup and Results

COIL-20

Page 31: visualization data using t-SNE

Code and Web Resources

Web Resources

Google: t-sne

Link: http://homepage.tudelft.nl/19j49/t-SNE.html

Page 32: visualization data using t-SNE

Code and Web Resources

Source Codes

t-SNE (Matlab, CUDA, Binary, Python, Torch, Julia, R and JavaScript)

Parametric t-SNE (Matlab)

Barnes-Hut-SNE (with C++, Matlab, Python, Torch, and R wrappers)

Page 33: visualization data using t-SNE

Code and Web Resources

Thanks for your patience
