Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology. Visualizing Data using t-SNE. Presenter: Wei-Hao Huang. Authors: Laurens van der Maaten and Geoffrey Hinton. JMLR 2008.

Visualizing Data using t-SNE


Page 1: Visualizing Data using t-SNE

Intelligent Database Systems Lab

國立雲林科技大學 National Yunlin University of Science and Technology

Visualizing Data using t-SNE

Presenter: Wei-Hao Huang
Authors: Laurens van der Maaten and Geoffrey Hinton

JMLR 2008

Page 2: Visualizing Data using t-SNE

N.Y.U.S.T.

I. M.

Outline

· Motivation
· Objectives
· Methodology
· Experiments
· Conclusions
· Comments

Page 3: Visualizing Data using t-SNE

Motivation

· Visualization of high-dimensional data is an important problem, and it has to deal with data of widely varying dimensionality.

· Linear vs. nonlinear dimensionality reduction techniques.

· Many existing techniques perform strongly on artificial data sets but are much less successful at visualizing real high-dimensional data.

Page 4: Visualizing Data using t-SNE

Objectives

• To convert a high-dimensional data set into a matrix of pairwise similarities.

• To introduce a new technique, called "t-SNE", for visualizing the resulting similarity data.
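The first objective can be illustrated with a minimal sketch, assuming NumPy and a single fixed Gaussian bandwidth `sigma` for every point. The function name `pairwise_similarities` and the fixed bandwidth are illustrative assumptions; SNE and t-SNE instead choose a separate bandwidth per point from a perplexity setting.

```python
import numpy as np

def pairwise_similarities(X, sigma=1.0):
    """Return an (n, n) matrix of Gaussian similarities, each row normalized to sum to 1."""
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distances between all pairs of rows.
    sq_norms = np.sum(X ** 2, axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    D = np.maximum(D, 0.0)            # clip tiny negatives caused by round-off
    P = np.exp(-D / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)          # a point is not its own neighbor
    P /= P.sum(axis=1, keepdims=True)
    return P

# Toy data (assumption): 5 points in 10 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))
P = pairwise_similarities(X)
```

Row i of `P` can be read as a probability distribution over the neighbors of point i.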

Page 5: Visualizing Data using t-SNE

Methodology

· Stochastic Neighbor Embedding (SNE)

· t-Distributed Stochastic Neighbor Embedding (t-SNE)
  ─ Symmetric SNE
  ─ Mismatched tails can compensate for mismatched dimensionalities

Page 6: Visualizing Data using t-SNE

Stochastic Neighbor Embedding

· Data space
· Map space
· Cost function
· Perplexity
· Gradient descent method
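The equations on this slide did not survive extraction. For reference, the standard SNE definitions from the t-SNE paper that correspond to the five labels are given below. Here x_i are the data points, y_i the map points, and σ_i the Gaussian bandwidth of point i; whether the slide used exactly this notation is an assumption.

```latex
\begin{align*}
p_{j|i} &= \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)}
        && \text{(data space)} \\
q_{j|i} &= \frac{\exp\!\left(-\lVert y_i - y_j\rVert^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert y_i - y_k\rVert^2\right)}
        && \text{(map space)} \\
C &= \sum_i \mathrm{KL}(P_i \,\|\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}
        && \text{(cost function)} \\
\mathrm{Perp}(P_i) &= 2^{H(P_i)}, \qquad H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i}
        && \text{(perplexity)} \\
\frac{\partial C}{\partial y_i} &= 2 \sum_j \left(p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}\right)(y_i - y_j)
        && \text{(gradient)}
\end{align*}
```

The bandwidth σ_i is chosen per point so that the perplexity of P_i equals a value fixed by the user.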

Page 7: Visualizing Data using t-SNE

Symmetric SNE (t-SNE)

t-SNE uses a Student-t distribution in the map space to improve on SNE. It addresses two problems:

• The SNE cost function is difficult to optimize: use a symmetrized cost function.
• The crowding problem: use a heavy-tailed distribution in the map space.

· Cost function
· Data space
· Map space
· Gradient descent method
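The equations behind these labels were lost in extraction. As a reference sketch, the symmetric SNE definitions from the t-SNE paper are shown below, with n the number of data points and p_{j|i} the conditional similarities of SNE; the notation used on the original slide is an assumption.

```latex
\begin{align*}
p_{ij} &= \frac{p_{j|i} + p_{i|j}}{2n}
       && \text{(data space)} \\
q_{ij} &= \frac{\exp\!\left(-\lVert y_i - y_j\rVert^2\right)}
              {\sum_{k \neq l} \exp\!\left(-\lVert y_k - y_l\rVert^2\right)}
       && \text{(map space)} \\
C &= \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}
       && \text{(cost function)} \\
\frac{\partial C}{\partial y_i} &= 4 \sum_j \left(p_{ij} - q_{ij}\right)(y_i - y_j)
       && \text{(gradient)}
\end{align*}
```

Compared with SNE, a single joint distribution replaces the n conditional ones, which gives the simpler gradient in the last line.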

Page 8: Visualizing Data using t-SNE

Mismatched Tails can Compensate for Mismatched Dimensionalities (t-SNE)

· Map space
· Gradient descent method
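The two quantities named on this slide can be sketched in NumPy as follows. The code uses the Student-t map similarities q_ij ∝ (1 + ‖y_i − y_j‖²)⁻¹ and the gradient 4 Σ_j (p_ij − q_ij)(y_i − y_j)(1 + ‖y_i − y_j‖²)⁻¹ from the t-SNE paper. The function name `tsne_q_and_gradient` and the random toy inputs are illustrative assumptions.

```python
import numpy as np

def tsne_q_and_gradient(P, Y):
    """Student-t map similarities Q and the gradient of KL(P || Q) with respect to Y."""
    sq_norms = np.sum(Y ** 2, axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T
    W = 1.0 / (1.0 + np.maximum(D, 0.0))   # heavy-tailed kernel (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()                        # joint distribution over all pairs
    # dC/dy_i = 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j)
    A = (P - Q) * W
    grad = 4.0 * (np.diag(A.sum(axis=1)) - A) @ Y
    return Q, grad

# Toy inputs (assumption): a random symmetric joint distribution and a random 2-D map.
rng = np.random.default_rng(0)
n = 6
P = rng.random((n, n))
P = P + P.T
np.fill_diagonal(P, 0.0)
P /= P.sum()
Y = rng.normal(size=(n, 2))
Q, grad = tsne_q_and_gradient(P, Y)
```

Because the kernel decays polynomially rather than exponentially, moderately dissimilar points can sit far apart in the map without a large penalty, which is how the heavy tail eases the crowding problem.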

Page 9: Visualizing Data using t-SNE

t-SNE Algorithm
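The algorithm listing on this slide was an image and is not recoverable. The following is a minimal NumPy sketch of a basic t-SNE loop: perplexity-calibrated similarities, symmetrization, a small random initial map, and gradient descent with momentum. The function names, default step size, iteration count, and toy data are illustrative assumptions, and refinements such as early exaggeration and adaptive step sizes are left out.

```python
import numpy as np

def squared_distances(Z):
    s = np.sum(Z ** 2, axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2.0 * Z @ Z.T, 0.0)

def conditional_p(D, perplexity, tol=1e-5, max_steps=50):
    """Rows p_{j|i}; each precision beta_i = 1 / (2 sigma_i^2) is found by bisection."""
    n = D.shape[0]
    P = np.zeros((n, n))
    target = np.log(perplexity)            # target entropy in nats
    for i in range(n):
        d = np.delete(D[i], i)
        d = d - d.min()                    # shift for stability; normalized p is unchanged
        lo, hi, beta = 0.0, np.inf, 1.0
        for _ in range(max_steps):
            w = np.exp(-d * beta)
            p = w / w.sum()
            H = -np.sum(p * np.log(p + 1e-12))
            if abs(H - target) < tol:
                break
            if H > target:                 # too flat: sharpen by increasing beta
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (lo + hi) / 2.0
            else:                          # too peaked: decrease beta
                hi = beta
                beta = (lo + hi) / 2.0
        P[i, np.arange(n) != i] = p
    return P

def tsne(X, n_components=2, perplexity=5.0, n_iter=300, learning_rate=50.0, seed=0):
    n = X.shape[0]
    P_cond = conditional_p(squared_distances(X), perplexity)
    P = np.maximum((P_cond + P_cond.T) / (2.0 * n), 1e-12)   # symmetrized joint P
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-2, size=(n, n_components))        # small random start
    Y_prev = Y.copy()
    for t in range(n_iter):
        W = 1.0 / (1.0 + squared_distances(Y))
        np.fill_diagonal(W, 0.0)
        Q = np.maximum(W / W.sum(), 1e-12)
        A = (P - Q) * W
        grad = 4.0 * (np.diag(A.sum(axis=1)) - A) @ Y
        momentum = 0.5 if t < 250 else 0.8
        Y_new = Y - learning_rate * grad + momentum * (Y - Y_prev)
        Y_prev, Y = Y, Y_new
    return Y

# Toy data (assumption): two Gaussian clusters of 15 points each in 5 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(15, 5)),
               rng.normal(8.0, 1.0, size=(15, 5))])
Y = tsne(X, perplexity=5.0, n_iter=200)
```

The cost function is non-convex, so different seeds can give different maps; results from a sketch like this should be treated as exploratory.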

Page 10: Visualizing Data using t-SNE

Experiments

· Data sets
  ─ MNIST data set, Olivetti faces data set, COIL-20 data set, word-features data set, and Netflix data set.

· Experimental setup
  ─ PCA is used first to reduce the dimensionality of the data.
  ─ Cost function parameter settings.
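The PCA preprocessing step can be sketched with NumPy's SVD, as below. The target of 30 components follows the setup reported in the t-SNE paper; the slide itself gives no number, so treat it as an assumption. The function name `pca_reduce` and the random stand-in data are also illustrative.

```python
import numpy as np

def pca_reduce(X, n_components=30):
    """Project X onto its top principal components (SVD of the centered data)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, Vt.shape[0])            # cannot exceed the data's rank bound
    return Xc @ Vt[:k].T                          # coordinates along the top-k directions

# Stand-in data (assumption): 200 random vectors the size of flattened 28x28 images.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 784))
X30 = pca_reduce(X, n_components=30)
```

Reducing the dimensionality first speeds up the pairwise distance computation and suppresses some noise before the similarities are built.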

Page 11: Visualizing Data using t-SNE

Visualizations of 6,000 handwritten digits from the MNIST data set

[Figure panels: t-SNE, Sammon mapping, Isomap, LLE]

Page 12: Visualizing Data using t-SNE

Visualizations of the Olivetti faces data set

[Figure panels: t-SNE, Sammon mapping, Isomap, LLE]

Page 13: Visualizing Data using t-SNE

Visualizations of the COIL-20 data set

[Figure panels: t-SNE, Sammon mapping, Isomap, LLE]

Page 14: Visualizing Data using t-SNE

Applying t-SNE to Large Data Sets

[Figure: neighborhood graph with K = 20]
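The figure on this slide is lost; its label indicates a nearest-neighbor graph with K = 20. A brute-force construction of such a graph can be sketched as follows. The function name `knn_graph` and the toy data are illustrative assumptions, and the sketch covers only the graph itself, not the landmark-based similarity computation that the approach builds on it.

```python
import numpy as np

def knn_graph(X, k=20):
    """Return an (n, k) array: row i lists the indices of the k nearest neighbors of point i."""
    X = np.asarray(X, dtype=float)
    s = np.sum(X ** 2, axis=1)
    D = s[:, None] + s[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(D, np.inf)                   # exclude each point from its own list
    return np.argsort(D, axis=1)[:, :k]

# Toy data (assumption): 100 points in 8 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
nbrs = knn_graph(X, k=20)
```

This brute-force version needs memory quadratic in the number of points, so a genuinely large data set would call for an approximate or tree-based neighbor search instead.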

Page 15: Visualizing Data using t-SNE

Weaknesses

· Dimensionality reduction for other purposes.
· Curse of intrinsic dimensionality.
· Non-convexity of the t-SNE cost function.

Page 16: Visualizing Data using t-SNE

Conclusions

• t-SNE is capable of retaining the local structure of the data while also revealing some important global structure.

• A landmark approach makes it possible to successfully visualize large real-world data sets.

Page 17: Visualizing Data using t-SNE

Comments

· Advantages
  ─ Visualizes high-dimensional data very well.
  ─ Open source.

· Applications
  ─ Data visualization.