CSCE 496/896 Lecture 5: Autoencoders

Stephen Scott

(Adapted from Paul Quint and Ian Goodfellow)

sscott@cse.unl.edu


Introduction

Autoencoding is training a network to replicate its input to its output

Applications:
- Unlabeled pre-training for semi-supervised learning
- Learning embeddings to support information retrieval
- Generation of new instances similar to those in the training set
- Data compression


Outline

- Basic idea
- Stacking
- Types of autoencoders
  - Denoising
  - Sparse
  - Contractive
  - Variational
- Generative adversarial networks


Basic Idea

Sigmoid activation functions, 5000 training epochs, square loss, no regularization

What's special about the hidden layer outputs?


Basic Idea

An autoencoder is a network trained to learn the identity function: output = input

- Subnetwork called encoder f(·) maps input to an embedded representation
- Subnetwork called decoder g(·) maps back to input space

- Can be thought of as lossy compression of input
- Need to identify the important attributes of inputs to reproduce faithfully
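
To make this concrete, here is a minimal sketch of an undercomplete autoencoder in the TF 1.x-style API used elsewhere in these slides (layer sizes and optimizer settings are illustrative assumptions, not from the slides):

    import tensorflow as tf

    n_inputs = 784    # e.g., a flattened 28x28 MNIST digit
    n_hidden = 32     # embedding dimension; undercomplete since 32 < 784

    X = tf.placeholder(tf.float32, shape=[None, n_inputs])
    hidden = tf.layers.dense(X, n_hidden, activation=tf.nn.relu, name="encoder")   # f(x)
    outputs = tf.layers.dense(hidden, n_inputs, activation=None, name="decoder")   # g(f(x))

    reconstruction_loss = tf.reduce_mean(tf.square(outputs - X))   # square loss vs. the input itself
    training_op = tf.train.AdamOptimizer(0.001).minimize(reconstruction_loss)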


Basic Idea

General types of autoencoders based on size of hidden layer:

Undercomplete autoencoders have hidden layer size smaller than input layer size
⇒ Dimension of embedded space lower than that of input space
⇒ Cannot simply memorize training instances

Overcomplete autoencoders have much larger hidden layer sizes
⇒ Regularize to avoid overfitting, e.g., enforce a sparsity constraint


Basic Idea
Example: Principal Component Analysis

A 3-2-3 autoencoder with linear units and square loss performs principal component analysis: find a linear transformation of the data that maximizes variance
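
As a sanity check, a hedged numpy sketch (data, learning rate, and step count are illustrative) comparing the subspace learned by a linear 3-2-3 autoencoder with the top-2 principal components:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.3])   # anisotropic toy data
    X -= X.mean(axis=0)                                         # center, as PCA assumes

    # PCA via SVD: the top-2 right singular vectors span the principal subspace
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    pca_basis = Vt[:2]

    # Linear 3-2-3 autoencoder trained with square loss and gradient descent
    W_enc = rng.normal(scale=0.1, size=(3, 2))
    W_dec = rng.normal(scale=0.1, size=(2, 3))
    lr = 0.01
    for _ in range(2000):
        H = X @ W_enc                      # encode
        err = H @ W_dec - X                # reconstruction error
        W_dec -= lr * H.T @ err / len(X)
        W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

    # The rows of W_dec should (approximately) span the same plane as pca_basis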


Stacked Autoencoders

A stacked autoencoder has multiple hidden layers

Can share parameters to reduce their number by exploiting symmetry: W4 = W1ᵀ and W3 = W2ᵀ

    weights1 = tf.Variable(weights1_init, dtype=tf.float32, name="weights1")
    weights2 = tf.Variable(weights2_init, dtype=tf.float32, name="weights2")
    weights3 = tf.transpose(weights2, name="weights3")  # shared weights
    weights4 = tf.transpose(weights1, name="weights4")  # shared weights
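
For context, a hedged sketch of how these tied weights might be wired into the full encoder/decoder stack (X, n_inputs, n_hidden1, n_hidden2, and the activation choice are assumptions not shown on the slide; biases stay independent because only the weight matrices are shared):

    biases1 = tf.Variable(tf.zeros(n_hidden1), name="biases1")
    biases2 = tf.Variable(tf.zeros(n_hidden2), name="biases2")
    biases3 = tf.Variable(tf.zeros(n_hidden1), name="biases3")
    biases4 = tf.Variable(tf.zeros(n_inputs), name="biases4")

    act = tf.nn.elu
    hidden1 = act(tf.matmul(X, weights1) + biases1)
    hidden2 = act(tf.matmul(hidden1, weights2) + biases2)   # the embedding
    hidden3 = act(tf.matmul(hidden2, weights3) + biases3)
    outputs = tf.matmul(hidden3, weights4) + biases4        # reconstruction of X

    reconstruction_loss = tf.reduce_mean(tf.square(outputs - X))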


Stacked Autoencoders
Incremental Training

- Can simplify training by starting with single hidden layer H1
- Then, train a second AE to mimic the output of H1
- Insert this into first network
- Can build by using H1's output as training set for Phase 2


Stacked Autoencoders
Incremental Training (Single TF Graph)

Previous approach requires multiple TensorFlow graphs
Can instead train both phases in a single graph: first the left side, then the right


Stacked Autoencoders
Visualization

(Figure: input MNIST digit alongside the network's output)

Weights (features selected) for five nodes from H1:


Stacked Autoencoders
Semi-Supervised Learning

Can pre-train network with unlabeled data
⇒ learn useful features and then train "logic" of dense layer with labeled data

TF code


Transfer Learning from Trained Classifier

Can also transfer from a classifier trained on a different task, e.g., transfer a GoogLeNet architecture to ultrasound classification

Often choose existing one from a model zoo


Denoising Autoencoders
Vincent et al. (2010)

Can train an autoencoder to learn to denoise input by giving it corrupted instance x̃ as input and targeting uncorrupted instance x

Example noise models (sketched in code below):
- Gaussian noise: x̃ = x + z, where z ∼ N(0, σ²I)
- Masking noise: zero out some fraction ν of components of x
- Salt-and-pepper noise: choose some fraction ν of components of x and set each to its min or max value (equally likely)
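
A hedged numpy sketch of the three corruption processes (function names and the [0, 1] pixel range used for salt-and-pepper are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)

    def gaussian_noise(x, sigma=0.1):
        # x_tilde = x + z, with z ~ N(0, sigma^2 I)
        return x + rng.normal(scale=sigma, size=x.shape)

    def masking_noise(x, nu=0.3):
        # zero out a fraction nu of the components of x
        x_tilde = x.copy()
        x_tilde[rng.random(x.shape) < nu] = 0.0
        return x_tilde

    def salt_and_pepper_noise(x, nu=0.3, lo=0.0, hi=1.0):
        # set a fraction nu of the components to their min or max value, equally likely
        corrupt = rng.random(x.shape) < nu
        extremes = np.where(rng.random(x.shape) < 0.5, lo, hi)
        return np.where(corrupt, extremes, x)

The denoising autoencoder is then trained with loss J(x, g(f(x̃))), i.e., it reconstructs the clean x from the corrupted x̃.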


Denoising Autoencoders


Denoising Autoencoders
Example


Denoising Autoencoders

How does it work?
- Even though, e.g., MNIST data are in a 784-dimensional space, they lie on a low-dimensional manifold that captures their most important features
- Corruption process moves instance x off of the manifold
- Encoder fθ and decoder gθ′ are trained to project x back onto the manifold


Sparse Autoencoders

- An overcomplete architecture
- Regularize outputs of hidden layer to enforce sparsity:

    J̃(x) = J(x, g(f(x))) + αΩ(h),

  where J is loss function, f is encoder, g is decoder, h = f(x), and Ω penalizes non-sparsity of h
- E.g., can use Ω(h) = Σᵢ |hᵢ| and ReLU activation to force many zero outputs in hidden layer
- Can also measure average activation of hᵢ across mini-batch and compare it to user-specified target sparsity value p (e.g., 0.1) via square error or Kullback-Leibler divergence (see the sketch below):

    p log(p/q) + (1 − p) log((1 − p)/(1 − q)),

  where q is average activation of hᵢ over mini-batch
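
A hedged TF 1.x-style sketch of the KL-divergence version of the sparsity penalty (tensor names, sizes, and hyperparameters are illustrative):

    p = 0.1                 # target sparsity
    alpha = 0.2             # penalty weight
    eps = 1e-10             # avoid log(0)

    hidden = tf.layers.dense(X, n_hidden, activation=tf.nn.sigmoid)   # h = f(x)
    q = tf.reduce_mean(hidden, axis=0)     # average activation of each h_i over the mini-batch
    kl = p * tf.log(p / (q + eps)) + (1 - p) * tf.log((1 - p) / (1 - q + eps))

    outputs = tf.layers.dense(hidden, n_inputs)
    reconstruction_loss = tf.reduce_mean(tf.square(outputs - X))      # J(x, g(f(x)))
    loss = reconstruction_loss + alpha * tf.reduce_sum(kl)            # + alpha * Omega(h)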


Contractive Autoencoders

Similar to sparse autoencoder, but use

    Ω(h) = Σⱼ Σᵢ (∂hᵢ/∂xⱼ)²,

summing over all n hidden units i and all m input components j

- I.e., penalize large partial derivatives of encoder outputs wrt input values
- This contracts the output space by mapping input points in a neighborhood near x to a smaller output neighborhood near f(x)
  ⇒ Resists perturbations of input x
- If h has sigmoid activation, the encoding is nearly binary and a CAE pushes embeddings to the corners of a hypercube
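
For a single sigmoid encoder layer h = σ(XW + b), the penalty has a closed form, since ∂hᵢ/∂xⱼ = hᵢ(1 − hᵢ)Wⱼᵢ. A hedged TF 1.x-style sketch (assuming X, n_inputs, n_hidden, reconstruction_loss, and the weight alpha are defined as in the earlier sketches):

    W = tf.Variable(tf.random_normal([n_inputs, n_hidden], stddev=0.1), name="W")
    b = tf.Variable(tf.zeros([n_hidden]), name="b")
    h = tf.nn.sigmoid(tf.matmul(X, W) + b)

    dh = h * (1 - h)                                      # sigmoid derivative, shape [batch, n_hidden]
    w_sq = tf.reduce_sum(tf.square(W), axis=0)            # sum_j W_ji^2, one value per hidden unit
    omega = tf.reduce_sum(tf.square(dh) * w_sq, axis=1)   # ||Jacobian||_F^2 per example
    loss = reconstruction_loss + alpha * tf.reduce_mean(omega)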


Variational Autoencoders

- A VAE is an autoencoder that is also a generative model
  ⇒ Can generate new instances according to a probability distribution
  - E.g., hidden Markov models, Bayesian networks
  - Contrast with discriminative models, which predict classifications
- Encoder f outputs [µ, σ]ᵀ
- Pair (µᵢ, σᵢ) parameterizes Gaussian distribution for dimension i = 1, . . . , n
- Draw zᵢ ∼ N(µᵢ, σᵢ)
- Decode this latent variable z to get g(z)


Variational Autoencoders
Latent Variables

- Independence of z dimensions makes it easy to generate instances wrt complex distributions via decoder g
- Latent variables can be thought of as values of attributes describing inputs
  - E.g., for MNIST, latent variables might represent "thickness", "slant", "loop closure"


Variational Autoencoders
Architecture


Variational Autoencoders
Optimization

Maximum likelihood (ML) approach for training generative models: find a model (θ) with maximum probability of generating the training set X

Achieve this by minimizing the sum of:
- End-to-end AE loss (e.g., square, cross-entropy)
- Regularizer measuring distance (K-L divergence) between latent distribution q(z | x) and N(0, I) (= standard multivariate Gaussian)

N(0, I) is also considered the prior distribution over z (= distribution when no x is known)

    eps = 1e-10
    latent_loss = 0.5 * tf.reduce_sum(
        tf.square(hidden3_sigma) + tf.square(hidden3_mean)
        - 1 - tf.log(eps + tf.square(hidden3_sigma)))
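
This latent_loss is the closed-form KL divergence between the encoder's Gaussian q(z | x) = N(µ, σ²I) and the prior N(0, I): ½ Σᵢ (σᵢ² + µᵢ² − 1 − log σᵢ²); the eps term only guards against log(0).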


Variational Autoencoders
Reparameterization Trick

Cannot backprop error signal through random samples
Reparameterization trick emulates z ∼ N(µ, σ) with ε ∼ N(0, 1), z = εσ + µ (see the sketch below)
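
A hedged TF 1.x-style sketch, reusing the hidden3_mean / hidden3_sigma names from the latent-loss code above (the hidden2 layer and n_latent size are assumptions):

    hidden3_mean = tf.layers.dense(hidden2, n_latent, activation=None)    # mu
    hidden3_gamma = tf.layers.dense(hidden2, n_latent, activation=None)   # gamma = log(sigma^2)
    hidden3_sigma = tf.exp(0.5 * hidden3_gamma)

    # z = mu + sigma * eps with eps ~ N(0, I): the sample is now a deterministic,
    # differentiable function of mu and sigma, so gradients flow to the encoder
    noise = tf.random_normal(tf.shape(hidden3_sigma), dtype=tf.float32)
    z = hidden3_mean + hidden3_sigma * noise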


Variational Autoencoders
Example Generated Images: Random

Draw z ∼ N (0, I) and display g(z)


Variational Autoencoders
Example Generated Images: Manifold

Uniformly sample points in z space and decode


Variational Autoencoders
2D Cluster Analysis

Cluster analysis by digit


Generative Adversarial Network

- GANs are also generative models, like VAEs
- Models a game between two players:
  - Generator creates samples intended to come from training distribution
  - Discriminator attempts to discern the "real" (original training) samples from the "fake" (generated) ones
- Discriminator trains as a binary classifier, generator trains to fool the discriminator


Generative Adversarial Network
How the Game Works

- Let D(x) be discriminator parameterized by θ^(D)
  - Goal: find θ^(D) minimizing J^(D)(θ^(D), θ^(G))
- Let G(z) be generator parameterized by θ^(G)
  - Goal: find θ^(G) minimizing J^(G)(θ^(D), θ^(G))
- A Nash equilibrium of this game is (θ^(D), θ^(G)) such that each θ^(i), i ∈ {D, G}, yields a local minimum of its corresponding J


Generative Adversarial Network
Training

Each training step:
- Draw a minibatch of x values from the dataset
- Draw a minibatch of z values from the prior (e.g., N(0, I))
- Simultaneously update θ^(G) to reduce J^(G) and θ^(D) to reduce J^(D), via, e.g., Adam

- For J^(D), common to use cross-entropy where label is 1 for real and 0 for fake (see the sketch below)
- Since generator wants to trick discriminator, can use J^(G) = −J^(D)
  - Others exist that are generally better in practice, e.g., based on ML
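
A hedged TF 1.x-style sketch of one training step with the cross-entropy losses above (generator() and discriminator() are assumed helper functions built under the variable scopes named below; batch_size, n_latent, X, and the learning rate are illustrative):

    z = tf.random_normal([batch_size, n_latent])          # minibatch from the prior
    G_sample = generator(z)                                # fake samples
    D_real = discriminator(X)                              # logits on the real minibatch X
    D_fake = discriminator(G_sample, reuse=True)           # logits on the fakes

    # Discriminator: label 1 for real, 0 for fake
    D_loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=D_real, labels=tf.ones_like(D_real)) +
        tf.nn.sigmoid_cross_entropy_with_logits(logits=D_fake, labels=tf.zeros_like(D_fake)))
    # Generator: the simplest choice is J(G) = -J(D); the "non-saturating" loss below
    # (push the discriminator toward labeling fakes as 1) usually works better in practice
    G_loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=D_fake, labels=tf.ones_like(D_fake)))

    D_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="discriminator")
    G_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="generator")
    D_train = tf.train.AdamOptimizer(2e-4).minimize(D_loss, var_list=D_vars)
    G_train = tf.train.AdamOptimizer(2e-4).minimize(G_loss, var_list=G_vars)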


Generative Adversarial Network
DCGAN: Radford et al. (2015)

"Deep convolutional GAN"
Generator uses transposed convolutions (e.g., tf.layers.conv2d_transpose) without pooling to upsample images for input to discriminator
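
A hedged sketch of a generator tail using tf.layers.conv2d_transpose to upsample a 7x7 feature map to a 28x28 image (z is the generator's latent input; sizes are chosen for MNIST-scale data and all names and hyperparameters are illustrative):

    x = tf.layers.dense(z, 7 * 7 * 128, activation=tf.nn.relu)
    x = tf.reshape(x, [-1, 7, 7, 128])
    x = tf.layers.conv2d_transpose(x, filters=64, kernel_size=5, strides=2,
                                   padding="same", activation=tf.nn.relu)    # -> 14x14x64
    img = tf.layers.conv2d_transpose(x, filters=1, kernel_size=5, strides=2,
                                     padding="same", activation=tf.nn.tanh)  # -> 28x28x1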


Generative Adversarial Network
DCGAN Generated Images: Bedrooms

Trained from LSUN dataset, sampled z space


Generative Adversarial Network
DCGAN Generated Images: Adele Facial Expressions

Trained from frame grabs of interview, sampled z space


Generative Adversarial Network
DCGAN Generated Images: Latent Space Arithmetic

Performed semantic arithmetic in z space!

(Non-center images have noise added in z space; center is noise-free)
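
The arithmetic itself is just vector addition on averaged latent codes; a hedged sketch (the concept names, the z_samples_* arrays, and the decode() helper are illustrative, following the example in Radford et al.):

    # z_samples_* are numpy arrays of latent vectors, one sample per row
    z_smiling_woman = z_samples_smiling_woman.mean(axis=0)
    z_neutral_woman = z_samples_neutral_woman.mean(axis=0)
    z_neutral_man = z_samples_neutral_man.mean(axis=0)

    # "smiling woman" - "neutral woman" + "neutral man" ~ "smiling man"
    z_new = z_smiling_woman - z_neutral_woman + z_neutral_man
    image = decode(z_new)   # run the trained generator on the new latent vector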
