ALTERNATE LAYER SPARSITY & INTERMEDIATE FINE-TUNING FOR DEEP AUTOENCODERS

Submitted by: Ankit Bhutani (Y9227094)
Supervised by: Prof. Amitabha Mukerjee, Prof. K S Venkatesh

AUTOENCODERS

AUTO-ASSOCIATIVE NEURAL NETWORKS

OUTPUT IS TRAINED TO BE SIMILAR TO THE INPUT
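A minimal NumPy sketch of the idea on this slide: a single-hidden-layer autoencoder maps an input to a hidden code and back to a reconstruction of the same shape. The layer sizes, the sigmoid activations and the random (untrained) weights are illustrative assumptions, not values from the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
n_in, n_hidden = 8, 3  # illustrative sizes (assumption)

# Encoder and decoder parameters, randomly initialised and untrained.
W_enc = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
b_enc = np.zeros(n_hidden)
W_dec = rng.normal(0.0, 0.1, size=(n_hidden, n_in))
b_dec = np.zeros(n_in)

def autoencode(x):
    """Return the hidden code and the reconstruction of x."""
    h = sigmoid(x @ W_enc + b_enc)  # hidden code
    z = sigmoid(h @ W_dec + b_dec)  # reconstruction, same shape as x
    return h, z

x = rng.random((5, n_in))  # five toy inputs in [0, 1)
h, z = autoencode(x)
```

Training would adjust the four parameter arrays so that z approaches x.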

DIMENSIONALITY REDUCTION

BOTTLENECK CONSTRAINT
LINEAR ACTIVATION – PCA [Baldi et al., 1989]
NON-LINEAR PCA [Kramer, 1991] – 5 LAYERED NETWORK
ALTERNATE SIGMOID AND LINEAR ACTIVATION
EXTRACTS NON-LINEAR FACTORS
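To illustrate the link between a linear bottleneck and PCA, the sketch below does not train a network. It computes the PCA projection directly and treats it as a linear encoder/decoder pair with tied weights, assuming a squared-error reconstruction loss. The reconstruction error then equals the energy in the discarded singular values, which is the best a k-unit linear bottleneck can achieve. The toy data and k are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))  # toy correlated data
Xc = X - X.mean(axis=0)                                  # centre the data

k = 2  # bottleneck width (assumption)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
V_k = Vt[:k].T  # top-k principal directions, shape (6, k)

codes = Xc @ V_k       # linear "encoder" with k units
recon = codes @ V_k.T  # linear "decoder" with tied weights

err = np.sum((Xc - recon) ** 2)  # squared reconstruction error
discarded = np.sum(s[k:] ** 2)   # energy in the dropped components
```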

ADVANTAGES OF NETWORKS WITH MULTIPLE LAYERS

ABILITY TO LEARN HIGHLY COMPLEX FUNCTIONS

TACKLE THE NON-LINEAR STRUCTURE OF UNDERLYING DATA

HIERARCHICAL REPRESENTATION

RESULTS FROM CIRCUIT THEORY – A SINGLE-LAYERED NETWORK WOULD NEED AN EXPONENTIALLY HIGH NUMBER OF HIDDEN UNITS

PROBLEMS WITH DEEP NETWORKS

DIFFICULTY IN TRAINING DEEP NETWORKS
NON-CONVEX NATURE OF OPTIMIZATION
GETS STUCK IN LOCAL MINIMA
VANISHING OF GRADIENTS DURING BACKPROPAGATION

SOLUTION
"INITIAL WEIGHTS MUST BE CLOSE TO A GOOD SOLUTION" – [Hinton et al., 2006]
GENERATIVE PRE-TRAINING FOLLOWED BY FINE-TUNING

HOW TO TRAIN DEEP NETWORKS?

PRE-TRAINING
INCREMENTAL LAYER-WISE TRAINING
EACH LAYER ONLY TRIES TO REPRODUCE THE HIDDEN LAYER ACTIVATIONS OF PREVIOUS LAYER
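The layer-wise procedure can be sketched roughly as follows, here with deterministic shallow autoencoders as the building block. The helper names (`train_shallow_ae`, `pretrain_stack`), the layer sizes, the learning rate, the epoch count and the cross-entropy loss are all illustrative assumptions rather than the settings used in this work.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_shallow_ae(data, n_hidden, rng, epochs=50, lr=0.5):
    """Train one sigmoid autoencoder on `data` with batch gradient descent
    on a cross-entropy loss. Returns (W_enc, b_enc, W_dec, b_dec)."""
    n_in = data.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
    b2 = np.zeros(n_in)
    for _ in range(epochs):
        h = sigmoid(data @ W1 + b1)
        z = sigmoid(h @ W2 + b2)
        # Gradient of the cross-entropy w.r.t. the output pre-activation.
        d_out = (z - data) / len(data)
        # Backpropagate through the decoder and the hidden sigmoid.
        d_hid = (d_out @ W2.T) * h * (1.0 - h)
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (data.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2

def pretrain_stack(X, layer_sizes, rng):
    """Greedy layer-wise pre-training: each new autoencoder is trained to
    reproduce the hidden activations produced by the layer before it."""
    layers, inp = [], X
    for n_hidden in layer_sizes:
        params = train_shallow_ae(inp, n_hidden, rng)
        layers.append(params)
        W1, b1 = params[0], params[1]
        inp = sigmoid(inp @ W1 + b1)  # activations become the next input
    return layers

rng = np.random.default_rng(0)
X = rng.random((64, 16))                 # toy data in [0, 1)
layers = pretrain_stack(X, [8, 4], rng)  # 16 -> 8 -> 4
```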

FINE-TUNING

INITIALIZE THE AUTOENCODER WITH WEIGHTS LEARNT BY PRE-TRAINING

PERFORM BACKPROPAGATION AS USUAL
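One common way to initialise the deep autoencoder is to "unroll" the pre-trained layers: encoders in order, followed by decoders in reverse order. The sketch below uses random stand-ins for the pre-trained parameters; `unroll`, `forward` and the layer sizes are hypothetical names and values chosen for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def unroll(layers):
    """layers: list of (W_enc, b_enc, W_dec, b_dec), first layer to last.
    Returns the (W, b) pairs of the deep autoencoder: all encoders in
    order, then all decoders in reverse order."""
    enc = [(W1.copy(), b1.copy()) for W1, b1, _, _ in layers]
    dec = [(W2.copy(), b2.copy()) for _, _, W2, b2 in reversed(layers)]
    return enc + dec

def forward(x, weights):
    """Forward pass through every layer of the unrolled network."""
    a = x
    for W, b in weights:
        a = sigmoid(a @ W + b)
    return a

# Random stand-ins for parameters that pre-training would have produced.
rng = np.random.default_rng(0)
sizes = [16, 8, 4]
layers = [
    (rng.normal(0.0, 0.1, (m, n)), np.zeros(n),
     rng.normal(0.0, 0.1, (n, m)), np.zeros(m))
    for m, n in zip(sizes[:-1], sizes[1:])
]

deep = unroll(layers)  # 16 -> 8 -> 4 -> 8 -> 16
x = rng.random((5, 16))
z = forward(x, deep)
```

Fine-tuning then backpropagates the reconstruction error through all four layers jointly, starting from these weights.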

MODELS USED FOR PRE-TRAINING

STOCHASTIC – RESTRICTED BOLTZMANN MACHINES (RBMs)
HIDDEN LAYER ACTIVATIONS (0-1) USED TO TAKE A PROBABILISTIC DECISION OF PUTTING 0 OR 1
MODEL LEARNS THE JOINT PROBABILITY OF 2 BINARY DISTRIBUTIONS - 1 IN INPUT AND THE OTHER IN HIDDEN LAYER

EXACT METHODS – COMPUTATIONALLY INTRACTABLE

NUMERICAL APPROXIMATION - CONTRASTIVE DIVERGENCE
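A rough sketch of one contrastive-divergence update with a single Gibbs step (CD-1) for a binary RBM. The function name `cd1_step`, the learning rate, the layer sizes and the toy data are assumptions for illustration. The hidden probabilities are sampled to 0 or 1, which is the "probabilistic decision" mentioned above.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_step(v0, W, b_vis, b_hid, rng, lr=0.1):
    """One CD-1 update on a batch of binary visible vectors v0.
    Updates W, b_vis and b_hid in place; returns the reconstruction
    probabilities of the visible units."""
    # Positive phase: hidden probabilities and a binary sample.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step back to visible, then to hidden.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_hid)
    # Difference between data-driven and model-driven statistics.
    n = len(v0)
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / n
    b_vis += lr * (v0 - v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return p_v1

rng = np.random.default_rng(0)
v = (rng.random((20, 6)) < 0.5).astype(float)  # toy binary data
W = rng.normal(0.0, 0.01, (6, 4))
b_vis = np.zeros(6)
b_hid = np.zeros(4)
W_before = W.copy()
p_v1 = cd1_step(v, W, b_vis, b_hid, rng)
```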

MODELS USED FOR PRE-TRAINING

DETERMINISTIC – SHALLOW AUTOENCODERS
HIDDEN LAYER ACTIVATIONS (0-1) ARE DIRECTLY USED FOR INPUT TO NEXT LAYER
TRAINED BY BACKPROPAGATION
DENOISING AUTOENCODERS
CONTRACTIVE AUTOENCODERS
SPARSE AUTOENCODERS

CLASSIFIERS & AUTOENCODERS

TASK \ MODEL | RBM | SHALLOW AE
CLASSIFIER | [Hinton et al, 2006] and many others since then | Investigated by [Bengio et al, 2007], [Ranzato et al, 2007], [Vincent et al, 2008], [Rifai et al, 2011] etc.
DEEP AE | [Hinton & Salakhutdinov, 2006] | No significant results reported in literature - Gap

DATASETS

MNIST

Big and Small Digits

DATASETS

Square & Room

2d Robot Arm

3d Robot Arm

Libraries used
Numpy, Scipy
Theano – takes care of parallelization

GPU Specifications
Tesla C1060
Memory – 256 MB
Frequency – 33 MHz
Number of Cores – 240

MEASURE FOR PERFORMANCE

REVERSE CROSS-ENTROPY

X – Original input
Z – Output
Θ – Parameters – Weights and Biases
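The formula on this slide did not survive extraction. As a hedged stand-in, the sketch below computes the standard cross-entropy reconstruction error between an input X and an output Z with entries in [0, 1]. Whether the "reverse" variant named above differs from this form cannot be recovered from the text. The function name, the clipping constant and the toy values are assumptions.

```python
import numpy as np

def cross_entropy(x, z, eps=1e-12):
    """Mean over examples of -sum_k [x_k log z_k + (1 - x_k) log(1 - z_k)]."""
    z = np.clip(z, eps, 1.0 - eps)  # avoid log(0)
    per_example = -np.sum(x * np.log(z) + (1.0 - x) * np.log(1.0 - z), axis=1)
    return per_example.mean()

x = np.array([[1.0, 0.0], [0.0, 1.0]])
good = cross_entropy(x, np.array([[0.9, 0.1], [0.1, 0.9]]))  # close reconstruction
bad = cross_entropy(x, np.array([[0.5, 0.5], [0.5, 0.5]]))   # uninformative output
```

A lower value means the output Z is closer to the input X.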

BRIDGING THE GAP

RESULTS FROM PRELIMINARY EXPERIMENTS

PRELIMINARY EXPERIMENTS

TIME TAKEN FOR TRAINING

CONTRACTIVE AUTOENCODERS TAKE VERY LONG TO TRAIN

SPARSITY FOR DIMENSIONALITY REDUCTION

EXPERIMENT USING SPARSE REPRESENTATIONS
STRATEGY A – BOTTLENECK
STRATEGY B – SPARSITY + BOTTLENECK
STRATEGY C – NO CONSTRAINT + BOTTLENECK
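The slides do not say which sparsity penalty is used. One widely used choice in sparse autoencoders is a KL-divergence term that pushes the mean activation of each hidden unit towards a small target value. The sketch below assumes that form. The function name and the target `rho` are hypothetical.

```python
import numpy as np

def kl_sparsity_penalty(h, rho=0.05, eps=1e-12):
    """Sum over hidden units of KL(rho || rho_hat), where rho_hat is the
    mean activation of each unit over the batch h (shape: batch x units)."""
    rho_hat = np.clip(h.mean(axis=0), eps, 1.0 - eps)
    return np.sum(
        rho * np.log(rho / rho_hat)
        + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat))
    )

sparse_h = np.full((10, 4), 0.05)  # mean activation already at the target
dense_h = np.full((10, 4), 0.5)    # units active half the time

penalty_sparse = kl_sparsity_penalty(sparse_h)
penalty_dense = kl_sparsity_penalty(dense_h)
```

The penalty is added to the reconstruction loss with a weighting coefficient, so a layer trained with it is driven towards sparse codes.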

ALTERNATE SPARSITY

OTHER IMPROVEMENTS

MOMENTUM
INCORPORATING THE PREVIOUS UPDATE
CANCELS OUT COMPONENTS IN OPPOSITE DIRECTIONS – PREVENTS OSCILLATION
ADDS UP COMPONENTS IN SAME DIRECTION – SPEEDS UP TRAINING

WEIGHT DECAY
REGULARIZATION
PREVENTS OVER-FITTING
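Both ideas can be combined in a single parameter update, sketched below under assumed hyperparameters. The velocity term carries the previous update forward (momentum), and an L2 term proportional to the weights is added to the gradient (weight decay). The function name `sgd_step` and the toy quadratic objective are illustrative only.

```python
import numpy as np

def sgd_step(w, grad, velocity, lr=0.1, momentum=0.9, weight_decay=1e-4):
    """One gradient step with momentum and L2 weight decay.
    Returns the updated weights and the updated velocity."""
    velocity = momentum * velocity - lr * (grad + weight_decay * w)
    return w + velocity, velocity

# Toy objective f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = sgd_step(w, w, v)
```

Successive gradients that point the same way accumulate in the velocity, while gradients that flip sign partly cancel.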

COMBINING ALL

USING ALTERNATE LAYER SPARSITY WITH MOMENTUM & WEIGHT DECAY YIELDS BEST RESULTS

INTERMEDIATE FINE-TUNING

MOTIVATION

PROCESS

RESULTS

CONCLUDING REMARKS

NEURAL NETWORK BASICS

BACKPROPAGATION

RBM

AUTOENCODERS