31
Representation learning Deep learning overview, representation learning methods in detail (sammons map, t-sne), the backprop algorithm in detail, and regularization and its impact on optimization. Dimensionality reduction Word2vec PCA Sammon's map Regularization t-SNE Factorized Embeddings Latent variable models ALI/BiGAN Slides by Joseph Paul Cohen 2020

Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

Representation learningDeep learning overview, representation learning methods in detail (sammons map, t-sne), the backprop algorithm in detail, and regularization and its impact on optimization.

Dimensionality reductionWord2vecPCASammon's mapRegularizationt-SNEFactorized EmbeddingsLatent variable modelsALI/BiGAN

Slides by Joseph Paul Cohen 2020

Page 2: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

Dim reduction overviewWe would like a mapping from R100 (or anything) to R2 to visualize it: aka: encoding, code, embedding, representation

Invertible vs non-invertible: If we allow for the loss of information we cannot have a lossless reconstruction. Some methods just preserve distances, no mapping learned.

Page 3: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

PCA for dim reduction (quick recap)

In 2d data this vector captures the most variance

A is the eigenvectors of X such that x=AATx, A's columns are orthogonal, and the columns of A form a basis which encodes the most variance.

Page 4: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

word2vecPresented at NeurIPS 2013

Strategy to learn representations for word tokens given their context.

Relevant to problems where context defines concepts (with redundancy)

4

Tomas Mikolov

Page 5: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

What to do with word embeddings?

● We can compose them to create paragraph embeddings (bag of embeddings).● Use in place of words for an RNN● Augment learned representations on small datasets

● `

[Cultural Shift or Linguistic Drift, Hamilton, 2016]

Study how the meaning between two texts varies (or hospitals, or doctors)?

[Pennington, 2014][Mikolov, 2013]

Study the compositionality of the learned latent space

Page 6: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

Token representationsOne-hot encoding: binary vector per token

Example: cat = [0 0 1 0 0 0 0 0 0 0 0 0 0 0 … 0]dog = [0 0 0 0 1 0 0 0 0 0 0 0 0 0 … 0]house = [1 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0]

Note!

If x is one hotThe dot product of Wx= a Single column of W M x N N x 1

=

M x 1

Page 7: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

word2vecTarget wordContext wordContext word Context word

involving respiratory system and other chest symptomsContext word

involving

respiratory

doctor

chest

Mikolov, Efficient Estimation of Word Representations in Vector Space, 2013

1. Each word is a training example2. Each word is used in many contexts3. The context defines each word

system

1

1

0

1

0

0

0

0

0

1-1

5.1

involving

respiratory

doctor

chest

system

Context window = 2

Page 8: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

Learning in progress

Page 9: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

king + (woman - man) = ?

The point that is closest is queen!

Page 10: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

Try it yourself!https://colab.research.google.com/drive/1VU4mm_DThBaQc9t0Cf6ajjHQDEw-Q1H2

Page 11: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

Sammon's mapDescribed by John W. Sammon in 1969

Method of non-linear dim reduction based on gradient descent.

Basic method of preserving distances in a low dim space.

11

John W. Sammon

Page 12: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

Colab Notebookhttps://colab.research.google.com/drive/1FDJ2FlVfN5PYYrNKEW2w48_BuSknhKif

Page 13: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

= distance computed between each learned representation

= distance function (or matrix) that you want to represent

First: a basic non-linear dimensionality reduction

Learn a representation that maintains pairwise distances.

Page 14: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

= distance computed between each learned representation

= distance function (or matrix) that you want to represent

Constant!

Scale discrepancy by true distance.Small distance = more important

Page 15: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

= distance computed between each learned representation

= distance function (or matrix) that you want to represent

source_d = torch.pdist(source)target_d = torch.pdist(target)

stress = (((target_d - source_d)**2)/(source_d+1)).sum()

Page 16: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

Output is a non-square (non redundant) distance matrix between vectors

Page 17: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

source = torch.Tensor(data.values)target = torch.randn(source.shape[0],2, requires_grad=True)

optimizer = torch.optim.SGD([target], lr=0.5)optimizer.zero_grad() # get ready for new gradients

source_d = torch.pdist(source) # compute distancestarget_d = torch.pdist(target) # compute distances

stress = (((target_d - source_d)**2)/(source_d+1)).sum()

stress.backward() # compute gradients for target

optimizer.step() # adjust the target tensor

Page 18: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

This paper calls the learning rate the "magic factor" !

Page 19: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

Exercise (regularization)

How to control the representation learned?

Adjust the objective function so that the minimum has the property you want.

dloss += 0.01*torch.pdist(target[label==6]).mean()

Page 20: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

DiscussionSimple cases? hard cases?

What does a learning rate over 1 mean?

What are the drawbacks of this sammon's map?

What does regularization change about training?

Page 21: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

t-SNETL;DR: Sammon's map but distances delay exponentially

= conditional probability that xi is next to xj given a Gaussian centered at xi

data space embedding space

Ratio between distances weighted by source data distance.Drives p and q to be equal but only for nearby points.

Page 22: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

Setting the σ (perplexity)t-SNE performs a binary search for the value of σi that produces a Pi with a fixed perplexity.

Data space (but in 2d)

Small σ

Image from: https://www.dataminingapps.com/2019/11/a-refresher-on-t-sne/

Perp=30-> H = ~4.9

Large σ

Page 23: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

DiscussionHow does t-SNE differ from sammon's map?

Which distances are meaningful?

Page 24: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

Factorized EmbeddingsTL;DR: two spaces of non-linear embeddings.

Conditioned on each other to predict data.

GTEx Dataset: 8,910 samples, 56,000 genes

[Trofimov et al. ICML WCB 2017]

Tissue

Page 25: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

Color represents expression predicted when conditioned on

a gene

Evaluating sample space

Value predicted for entire space

Page 26: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

Latent variable modelsWe learn a mapping from a latent

variable z to a complicated x = Something simple like a Gaussian

where

The conditional prob is modeled by a neural network and the latent space

is a distribution we understand.

Image from Ward, A. D, 3D Surface Parameterization Using Manifold Learning for Medial Shape Representation

Page 27: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

ALI/BiGANICLR 2017. Two papers, same idea.

Matches joint distribution p(x,z).

Trains an encoder and decoder.

27

Vincent Dumoulin Jeff Donahue

Page 28: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

ALI/BiGANMatches the joint distribution p(z,x) with q(z,x) using an adversarial loss.

p(z,x)

q(z,x)

Classifier learns to tell the difference between inputs. Update E and G so D cannot distinguish.

Typically a Gaussian generates

points here.

Page 29: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

The generator learns to match the target

distribution to fool the discriminator

Generator

Fake Image

Discriminator

Real or Fake?

Fake Real

Noise

Quick intro to adversarial distribution matching

29

Page 30: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms

Homework1) Find a single/multi cell RNA-seq dataset compute a PCA, Sammon's Map,

and t-SNE. Color points by some relevant value.

2) What are the challenges for representation learning?

3)

Page 31: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms