Lecture 9: Unsupervised, Generative & Adversarial Networks
Deep Learning @ UvA – Efstratios Gavves
Previous Lecture
o Recurrent Neural Networks (RNNs) for sequences
o Backpropagation Through Time
o Vanishing and Exploding Gradients and Remedies
o RNNs using Long Short-Term Memory (LSTM)
o Applications of Recurrent Neural Networks
Lecture Overview
o Latent space data manifolds
o Autoencoders and Variational Autoencoders
o Boltzmann Machines
o Adversarial Networks
o Self-Supervised Networks
The manifold hypothesis
Data in theory
o One image alone can take $256^{256 \times 256 \times 3}$ distinct values (see the quick computation below)
◦ 256 height, 256 width, 3 RGB channels, 256 pixel intensities
o Virtually every one of these images looks like the noise image in the background
o For text the equivalent would be generating random letter sequences, e.g.
dakndfqblznqrnbecaojdwlzbirnxxbesjntxapkklsndtuwhc
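To get a feeling for how large this space is, here is a quick back-of-the-envelope computation (my own illustration, plain Python):

import math

num_pixels = 256 * 256 * 3                    # pixel values per 256x256 RGB image
log10_images = num_pixels * math.log10(256)   # log10 of 256 ** num_pixels
print(f"about 10^{log10_images:,.0f} possible images")  # ~10^473,479

For comparison, the observable universe contains roughly 10^80 atoms, so almost none of these images will ever be observed.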
Data in reality: The manifold hypothesis
o Data live on manifolds
o A manifold is a latent structure of much lower dimensionality: $\dim_{\text{manifold}} \ll \dim_{\text{data}}$
o Nobody "forces" the data to be on the manifold; they just are
o The manifold occupies only a tiny fraction of the possible space
Data in reality: The manifold hypothesis
o A trajectory along the manifold defines transformations/variances
o Rotation
◦ E.g., the face turns
o Global appearance changes
◦ E.g., a different face
o Or even local appearance changes
◦ E.g., eyes open/closed
Data in reality: The manifold hypothesis
o Each image is either a set of coordinates in the full data space, e.g.
$\text{Image} = [0.1, -0.2, 0.8, 0.3, -0.5, -0.3, 0.8, 0.1, -0.4]$
Data in reality: The manifold hypothesis
o Or a smaller set of intrinsic manifold coordinates
(Image taken from Pascal Vincent, Deep Learning Summer School, 2015)
So what?
o Manifold ⟺ data distribution
o Learning the manifold ⟹ learning the data distribution and the data variances
o How do we learn these variances automatically?
o Unsupervised and/or generative learning
Unsupervised and Generative Learning
What is unsupervised learning?
o Latent space manifolds
o Autoencoders and Variational Autoencoders
o Boltzmann Machines
o Adversarial Networks
o Self-Supervised Networks
Why unsupervised learning?
o There is much more unlabeled than labeled data
◦ Large data ⟹ better models
◦ Ideally, not pay for annotations
o What is even the correct annotation for learning the data distribution and/or the data variances?
o Discovering structure
◦ What are the important features in the dataset?
What are generative models?
o Latent space manifolds
o Autoencoders and Variational Autoencoders
o Boltzmann Machines
o Adversarial Networks
o Self-Supervised Networks
Why generative?
o Force models to go beyond the surface statistical regularities of existing data
◦ Go beyond direct association of observable inputs to observable outputs
◦ Avoid silly predictions (adversarial examples)
o Understand the world
o Understand the data variances, disentangle and recreate them
o Detect unexpected events in your data
o Pave the way towards reasoning and decision making
Unsupervised & generative learning of the manifold
o Autoencoders and Variational Autoencoders
o Boltzmann Machines
o Adversarial Networks
o Self-Supervised Networks
Autoencoders
Standard Autoencoder
o Architecture: input $x$ → encoder $f$ → latent space $z$ → decoder $g$ → output: reconstruction $\hat{x}$
o $z$ is usually computed after a non-linearity
◦ E.g. $z = h(Wx + b)$, with $h \equiv \sigma(\cdot)$ or $\tanh(\cdot)$
o Reconstruction error: $\mathcal{L} = \sum_x \ell\big(x, g(f(x))\big)$
o Often the error is the Euclidean loss: $\mathcal{L} = \lVert x - \hat{x} \rVert^2$ (see the sketch below)
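As a concrete illustration (not from the original slides), here is a minimal PyTorch sketch of this encoder/decoder structure with a Euclidean reconstruction loss; the layer sizes are arbitrary assumptions:

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Minimal bottleneck autoencoder; sizes are illustrative assumptions."""
    def __init__(self, dim_data=784, dim_latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_data, dim_latent), nn.Tanh())  # z = tanh(Wx + b)
        self.decoder = nn.Linear(dim_latent, dim_data)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.randn(16, 784)                      # stand-in data batch
x_hat = model(x)
loss = ((x - x_hat) ** 2).sum(dim=1).mean()   # Euclidean reconstruction loss ||x - x_hat||^2
loss.backward()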
Standard Autoencoder
o The latent space should have fewer dimensions than the input
◦ Undercomplete representation
◦ Bottleneck architecture
o Otherwise an (overcomplete) autoencoder might learn the identity function: $g \circ f \equiv I \Rightarrow \hat{x} = x \Rightarrow \mathcal{L} = 0$
◦ Assuming no regularization
◦ Often it still works in practice, though
o Also, if $z = Wx + b$ (linear), the autoencoder learns the same subspace as PCA
Stacking Autoencoders
o Stack layers on top of each other
o Bigger depth ⟹ higher-level abstractions
o Slower training
Denoising Autoencoder
o Add random noise to the input
◦ Dropout, Gaussian
o Architecture: input $x$ → noise $\tilde{x} \sim q(\tilde{x} \mid x, \sigma)$ → corrupted input $\tilde{x}$ → encoder $f$ → latent space $z$ → decoder $g$ → output: reconstruction $\hat{x}$
o The loss includes an expectation over the noise distribution:
$\mathcal{L} = \sum_x \mathbb{E}_{q(\tilde{x} \mid x)}\, \ell\big(x, g(f(\tilde{x}))\big)$
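A hedged sketch of the corresponding training step (my own illustration; architecture sizes and the Gaussian noise level are assumptions). The key point: corrupt the input, but measure the error against the clean target.

import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(784, 32), nn.Tanh())   # encoder f
dec = nn.Linear(32, 784)                             # decoder g
x = torch.randn(16, 784)                             # stand-in clean batch
sigma = 0.3                                          # assumed noise level
x_tilde = x + sigma * torch.randn_like(x)            # x~ ~ q(x~|x, sigma), Gaussian corruption
x_hat = dec(enc(x_tilde))                            # reconstruct from the corrupted input
loss = ((x - x_hat) ** 2).sum(dim=1).mean()          # but compare against the clean x

Averaging this loss over many sampled corruptions approximates the expectation over $q(\tilde{x} \mid x)$.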
Denoising Autoencoder
o The network does not overlearn the data
◦ Can even use overcomplete latent spaces
o The model is forced to learn more intelligent, robust representations
◦ Learns to ignore the noise and trivial solutions (identity)
◦ Focuses on the "underlying" data generation process
Stochastic Contractive Autoencoders
o Instead of being invariant to the input noise, make the output representations invariant
◦ Similar inputs with differently sampled noise ⟹ similar representations
o Add a regularization term over the noisy representations:
$\mathcal{L} = \sum_x \ell\big(x, g(f(x))\big) + \lambda\, \mathbb{E}_{q(\tilde{x} \mid x)}\big[\lVert h(x) - h(\tilde{x}) \rVert^2\big]$
(reconstruction error + regularization error, where $h(x)$ is the latent representation)
o Trivial solution?
◦ $h(x) = 0$
◦ To avoid it, make sure the encoder and decoder weights are shared: $W_d = W_e^T$
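A minimal sketch of the stochastic version of this regularizer (again my own illustration; sizes, noise level, and the weight lambda are assumptions):

import torch
import torch.nn as nn

h = nn.Sequential(nn.Linear(784, 32), nn.Tanh())     # encoder h(x)
g = nn.Linear(32, 784)                               # decoder
x = torch.randn(16, 784)
sigma, lam = 0.3, 0.1                                # assumed noise level and weight
x_tilde = x + sigma * torch.randn_like(x)
recon = ((x - g(h(x))) ** 2).sum(dim=1).mean()       # reconstruction of the clean input
reg = ((h(x) - h(x_tilde)) ** 2).sum(dim=1).mean()   # one-sample estimate of the expectation
loss = recon + lam * reg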
Stochastic Analytic Regularization
o Taylor series expansion about a point $y = a$:
$h(y) = h(a) + h'(a)(y - a) + 0.5\, h''(a)(y - a)^2 + \ldots$
o For $y = \tilde{x} = x + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2 I)$, the (first-order) Taylor expansion gives
$h(\tilde{x}) = h(x + \varepsilon) = h(x) + \frac{\partial h}{\partial x}\, \varepsilon$
o The stochastic regularizer then becomes analytic:
$\mathbb{E}_{q(\tilde{x} \mid x)}\big[\lVert h(x) - h(\tilde{x}) \rVert^2\big] = \mathbb{E}\big[\lVert \tfrac{\partial h}{\partial x}\, \varepsilon \rVert^2\big] \approx \sigma^2 \big\lVert \tfrac{\partial h}{\partial x} \big\rVert^2$
o Higher-order expansions are also possible, although computationally expensive
◦ Alternative: compute the higher-order derivatives stochastically
◦ Analytic and stochastic regularization
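A quick numerical sanity check of this first-order approximation, for a toy scalar "encoder" h(x) = tanh(1.7 x) (all values are illustrative assumptions):

import torch

torch.manual_seed(0)
h = lambda x: torch.tanh(1.7 * x)                    # toy scalar encoder
x = torch.tensor(0.5, requires_grad=True)
dh_dx = torch.autograd.grad(h(x), x)[0]              # analytic sensitivity dh/dx
sigma = 0.01
eps = sigma * torch.randn(100_000)                   # eps ~ N(0, sigma^2)
stochastic = ((h(x.detach() + eps) - h(x.detach())) ** 2).mean()
analytic = sigma ** 2 * dh_dx ** 2
print(stochastic.item(), analytic.item())            # nearly identical for small sigma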
Some geometric intuition
o Gradients show the sensitivity of a function around a point:
$\mathcal{L} = \sum_x \ell\big(x, g(f(x))\big) + \lambda \big\lVert \tfrac{\partial h}{\partial x} \big\rVert^2$
o The regularization term penalizes sensitivity in all directions
o The reconstruction term enforces sensitivity only along the directions of the manifold
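A hedged sketch of how this Jacobian penalty can be computed with autograd (torch.autograd.functional.jacobian; sizes are assumptions, and here only for a single input point):

import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

h = nn.Sequential(nn.Linear(784, 32), nn.Tanh())     # encoder h(x)
x0 = torch.randn(784)                                # a single input point
J = jacobian(h, x0)                                  # Jacobian dh/dx, shape (32, 784)
penalty = (J ** 2).sum()                             # squared Frobenius norm ||dh/dx||^2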
Some geometric intuition: let's get real
o Let's check in which directions the gradients are sensitive around our manifold
o Gradients ⟹ Jacobian
o Strongest directions of the Jacobian ⟹ compute the SVD of the Jacobian: $\frac{\partial h}{\partial x} = U \Sigma V^T$
o Take the strongest singular values in $\Sigma$ and the respective singular vectors from $U$
◦ These are our strongest tangents: the directions in which the Jacobian is most sensitive
o Hopefully the tangents follow the directions of the manifold and are not random
o How to check? Visualize!! (see the sketch below)
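An illustrative sketch of extracting and visualizing the strongest tangents, continuing the assumptions above. One caveat: for a Jacobian of shape (dim latent × dim data), the input-space directions that can be rendered as images are the right singular vectors (rows of V^T); which factor holds them depends on the Jacobian's orientation.

import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

h = nn.Sequential(nn.Linear(784, 32), nn.Tanh())     # a trained encoder is assumed; random here
x0 = torch.randn(784)                                # a point on (or near) the manifold
J = jacobian(h, x0)                                  # Jacobian dh/dx, shape (32, 784)
U, S, Vh = torch.linalg.svd(J, full_matrices=False)  # J = U diag(S) V^T, S sorted descending
tangents = Vh[:4]                                    # directions of the 4 strongest singular values
images = tangents.reshape(4, 28, 28)                 # visualize each tangent as a 28x28 image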
Strongest tangents
Variational Autoencoder
o We want to model the data distribution
$p(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz$
o The posterior $p_\theta(z \mid x)$ is intractable for complicated likelihood functions $p_\theta(x \mid z)$, e.g. a neural network ⟹ $p(x)$ is also intractable
o Introduce an inference machine $q_\phi(z \mid x)$ (e.g. another neural network) that learns to approximate the posterior $p_\theta(z \mid x)$
◦ Since we cannot know $p_\theta(z \mid x)$, define a variational lower bound to optimize instead:
$\mathcal{L}(\theta, \phi; x) = -D_{KL}\big(q_\phi(z \mid x) \,\Vert\, p_\theta(z)\big) + \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]$
(regularization term + reconstruction term)
Variational Autoencoder
o Optimize the inference machine $q_\phi(z \mid x)$ and the likelihood machine $p_\theta(x \mid z)$ jointly over the variational lower bound
◦ $q_\phi(z \mid x)$ should follow a Gaussian distribution
o Reparameterization trick
◦ Instead of $z$ being stochastic, introduce a stochastic noise variable $\varepsilon$
◦ Then think of $z$ as deterministic: $z = \mu_z(x) + \varepsilon\, \sigma_z(x)$
◦ Simultaneous optimization of $q_\phi(z \mid x)$ and $p_\theta(x \mid z)$ is now possible (see the sketch below)
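A hedged PyTorch sketch of the reparameterization trick and the resulting objective; the Gaussian encoder/decoder and the layer sizes are assumptions for illustration:

import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal sketch: Gaussian q(z|x) with standard-normal prior."""
    def __init__(self, dim_data=784, dim_latent=32):
        super().__init__()
        self.enc = nn.Linear(dim_data, 2 * dim_latent)   # predicts mu_z(x) and log sigma_z^2(x)
        self.dec = nn.Linear(dim_latent, dim_data)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        eps = torch.randn_like(mu)                        # all stochasticity lives in eps
        z = mu + eps * torch.exp(0.5 * logvar)            # z = mu_z(x) + eps * sigma_z(x)
        x_hat = self.dec(z)
        recon = ((x - x_hat) ** 2).sum(dim=1)             # reconstruction term (Gaussian p(x|z))
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(dim=1)  # KL(q(z|x) || N(0, I))
        return (recon + kl).mean()                        # negative lower bound, to minimize

loss = VAE()(torch.randn(16, 784))
loss.backward()

Because z is now a deterministic function of x and eps, gradients flow through mu and sigma to both networks.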
Examples
Adversarial Networks
Generative Adversarial Networks
o So far we have been trying to express the pdf of the data, $p(x)$
o Why bother? Is that really necessary? Is it limiting?
o Can we model the sampling of data directly?
o Yes, by modelling sampling as a "Thief vs. Police" game
Generative Adversarial Networks
o Composed of two successive networks
◦ A generator network (like the upper half of an autoencoder), fed with noise $z$
◦ A discriminator network (like a convnet)
o Learning
◦ Sample "noise" vectors $z$
◦ For each $z$ the generator produces a sample $x$
◦ Make a batch where half the samples are real and half are the generated ones
◦ The discriminator must predict what is real and what is fake
"Police vs. Thief"
o The generator and discriminator networks are optimized together
◦ The generator (thief) tries to fool the discriminator
◦ The discriminator (police) tries not to get fooled by the generator
o Mathematically:
$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$
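A minimal PyTorch sketch of one alternating optimization step of this minimax game (my own illustration; the architectures, data, and hyperparameters are placeholder assumptions, and the generator uses the common non-saturating variant rather than the literal minimax loss):

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 784), nn.Tanh())            # generator: z -> x
D = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())          # discriminator: x -> P(real)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

x_real = torch.randn(16, 784)                               # stand-in for a real data batch
z = torch.randn(16, 64)

# Discriminator (police) step: classify real as 1, generated as 0.
d_loss = -(torch.log(D(x_real)) + torch.log(1 - D(G(z).detach()))).mean()
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator (thief) step: fool D (non-saturating form, maximize log D(G(z))).
g_loss = -torch.log(D(G(z))).mean()
opt_g.zero_grad(); g_loss.backward(); opt_g.step()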
A graphical interpretation
Adversarial network hardships
o This is (very) fresh research
o As such, the theory behind adversarial networks is far from mature
o Moreover, GANs are quite unstable
o The optimal solution is a saddle point between the generator and discriminator losses
◦ Easy to mess up, little stability
o Solutions?
◦ No great ones yet, although several researchers are actively working on that
Examples of generated images
Bedrooms
Image "arithmetic"
Summary
o Latent space data manifolds
o Autoencoders and Variational Autoencoders
o Boltzmann Machines
o Adversarial Networks
o Self-Supervised Networks
Reading material & references
o http://www.deeplearningbook.org/
◦ Part II: Chapter 10
o Excellent blog posts on Backpropagation Through Time
◦ http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
◦ http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/
◦ http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/
o Excellent blog post explaining LSTMs
◦ http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Next lecture
o Memory networks
o Advanced applications of Recurrent and Memory Networks