37
Regularization

Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Regularization

Page 2: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Classical Regularization: Parameter Norm Penalty

Page 3: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

𝐿2Parameter Regularization, weight decay

Page 4: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of
Page 5: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of
Page 6: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of
Page 7: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Dataset Augmentation

Page 8: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Dataset Augmentation

Page 9: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Noise injection

Page 10: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Noise injection

Page 11: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Injecting Noise at the Input

Page 12: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of
Page 13: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of
Page 14: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of
Page 15: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of
Page 16: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Manifold Tangent Classifier

Page 17: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Manifold Tangent Classifier

Page 18: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Manifold Tangent Classifier

Page 19: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Early Stopping as a Form of Regularization

Page 20: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Early Stopping as a Form of Regularization

Page 21: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of
Page 22: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of
Page 23: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of
Page 24: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Parameter Tying and Parameter Sharing

Page 25: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Parameter Tying and Parameter Sharing

Page 26: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Bagging and Other Ensemble Methods

Page 27: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Bagging and Other Ensemble Methods

Page 28: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Bagging and Other Ensemble Methods

Page 29: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Bagging and Other Ensemble Methods

Page 30: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Dropout

• Dropout is one of the techniques for preventing overfitting in deep neural network which contains a large number of parameters.

Page 31: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Original Paper

• Title:– Dropout: A Simple Way to Prevent Neural Networks from Overfitting.

• Authors:– Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan

Salakhutdinov

• Organization:– Department of Computer Science, University of Toronto

• URL:– https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf

Page 32: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Overview

• The key idea is to randomly drop units from the neural network during training.

• During training, dropout samples from an exponential number of different “thinned” network.

• At test time, we approximate the effect of averaging the predictions of all these thinned networks.

Page 33: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of
Page 34: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Model

Page 35: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Training

Page 36: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Max-norm Regularization

Page 37: Regularization · regularization term given by it has the effect of penalizing large gradients of the function ÿ(x). it has the effect of reducing the sensitivity of the output of

Test Time