REGULARIZATION (ADDITIONAL) Soft Computing Week 4


Page 1: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

REGULARIZATION (ADDITIONAL)

Soft Computing Week 4

Page 2: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

The problem of overfitting

Page 3: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Example: Linear regression (housing prices)

Overfitting: if we have too many features, the learned hypothesis may fit the training set very well (cost \( J(\theta) \approx 0 \)), but fail to generalize to new examples (e.g., fail to predict prices for new houses).
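Here \( J(\theta) \) is the usual squared-error training cost for linear regression:

\[
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
\]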

[Figure: three plots of Price vs. Size, illustrating underfitting, a good fit, and overfitting.]

Page 4: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Example: Logistic regression

Hypothesis: \( h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots) \)  (\( g \) = sigmoid function)
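For reference, the sigmoid function is

\[
g(z) = \frac{1}{1 + e^{-z}},
\]

which maps any real input into the interval \( (0, 1) \).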

[Figure: three decision boundaries in the \( (x_1, x_2) \) plane, illustrating underfitting, a good fit, and overfitting.]

Page 5: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Addressing overfitting:

Options:
1. Reduce the number of features.
   ― Manually select which features to keep.
   ― Model selection algorithm (later in the course).
2. Regularization.
   ― Keep all the features, but reduce the magnitude/values of the parameters \( \theta_j \).
   ― Works well when we have a lot of features, each of which contributes a bit to predicting \( y \).

Page 6: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Regularization: Cost function
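The regularized cost for linear regression penalizes large parameter values; note that the penalty sum starts at \( j = 1 \), so the intercept \( \theta_0 \) is conventionally not penalized:

\[
J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]
\]

Here \( \lambda \) is the regularization parameter; it controls the trade-off between fitting the training data well and keeping the parameters small.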

Page 7: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Intuition

Suppose we penalize and make \( \theta_3 \), \( \theta_4 \) really small.
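For a fourth-order hypothesis \( \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 \), one concrete way to do this is to add the penalty directly to the objective (the multiplier 1000 stands in for any "large" constant):

\[
\min_\theta \; \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2
\]

Minimizing this forces \( \theta_3 \approx 0 \) and \( \theta_4 \approx 0 \), so the fitted curve behaves almost like a quadratic.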

[Figure: two plots of Price vs. Size of house, before and after penalizing \( \theta_3, \theta_4 \); the penalized fit is close to a quadratic.]

Page 8: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Regularization.

Small values for the parameters \( \theta_0, \theta_1, \ldots, \theta_n \):
― "Simpler" hypothesis
― Less prone to overfitting

Housing:
― Features: \( x_1, x_2, \ldots, x_n \)
― Parameters: \( \theta_0, \theta_1, \ldots, \theta_n \)

Page 9: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Regularization.

[Figure: Price vs. Size of house, showing the smoother fit obtained with regularization.]

Page 10: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Regularization

In regularized linear regression, we choose \( \theta \) to minimize the regularized cost \( J(\theta) \) defined above.

What if \( \lambda \) is set to an extremely large value (perhaps too large for our problem, say \( \lambda = 10^{10} \))?
― Algorithm works fine; setting \( \lambda \) to be very large can't hurt it.
― Algorithm fails to eliminate overfitting.
― Algorithm results in underfitting (fails to fit even the training data well).
― Gradient descent will fail to converge.

Page 11: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Regularization

In regularized linear regression, we choose \( \theta \) to minimize the regularized cost \( J(\theta) \) defined above.

What if \( \lambda \) is set to an extremely large value (perhaps too large for our problem, say \( \lambda = 10^{10} \))?

[Figure: Price vs. Size of house. With \( \lambda \) extremely large, \( \theta_1, \ldots, \theta_n \approx 0 \), so the hypothesis reduces to the flat line \( h_\theta(x) \approx \theta_0 \): underfitting.]

Page 12: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Regularized linear regression

Page 13: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Regularized linear regression
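Besides gradient descent (next page), regularized linear regression also admits a closed-form normal-equation solution; a standard form (added here for reference, under the usual setup with design matrix \( X \) and target vector \( y \)) is:

\[
\theta = \left( X^\top X + \lambda \, \mathrm{diag}(0, 1, \ldots, 1) \right)^{-1} X^\top y
\]

The leading 0 in the diagonal matrix keeps \( \theta_0 \) unpenalized; for \( \lambda > 0 \) the matrix being inverted is invertible.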

Page 14: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Gradient descent

Repeat {

\[
\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}
\]

\[
\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right] \qquad (j = 1, 2, \ldots, n)
\]

}
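Grouping the \( \theta_j \) terms makes the effect of regularization visible: each update first shrinks \( \theta_j \) by a constant factor slightly less than 1, then applies the usual gradient step:

\[
\theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\]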

Page 15: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Regularized logistic regression

Page 16: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Regularized logistic regression.

Cost function:

[Figure: decision boundary in the \( (x_1, x_2) \) plane.]
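The regularized logistic regression cost is the standard cross-entropy cost plus the same penalty term (again leaving \( \theta_0 \) unpenalized):

\[
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
\]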

Page 17: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Gradient descent

Repeat {

\[
\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}
\]

\[
\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right] \qquad (j = 1, 2, \ldots, n)
\]

}
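Although this update looks identical to the one for regularized linear regression, it is a different algorithm, because the hypothesis is now the sigmoid:

\[
h_\theta(x) = \frac{1}{1 + e^{-\theta^\top x}}
\]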

Page 18: REGULARIZATION (ADDITIONAL) Soft Computing Week 4

Advanced optimization

function [jVal, gradient] = costFunction(theta)
  jVal = [code to compute J(θ)];
  gradient(1) = [code to compute ∂J(θ)/∂θ_0];
  gradient(2) = [code to compute ∂J(θ)/∂θ_1];
  gradient(3) = [code to compute ∂J(θ)/∂θ_2];
  ...
  gradient(n+1) = [code to compute ∂J(θ)/∂θ_n];
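A minimal runnable sketch of such a cost function, written here for regularized logistic regression; the extra arguments X, y, and lambda and their handling are my own illustration rather than part of the slide:

function [jVal, gradient] = costFunction(theta, X, y, lambda)
  % Regularized logistic regression cost and gradient.
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));              % sigmoid hypothesis
  % Cross-entropy cost plus penalty; theta(1), i.e. theta_0, is not penalized.
  jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h)) ...
         + (lambda / (2*m)) * sum(theta(2:end) .^ 2);
  % Gradient, with the regularization term added for j >= 1 only.
  gradient = (1/m) * (X' * (h - y));
  gradient(2:end) = gradient(2:end) + (lambda / m) * theta(2:end);
end

It can then be handed to an advanced optimizer such as fminunc:

% X: m x (n+1) design matrix with a leading column of ones,
% y: m x 1 vector of 0/1 labels, lambda: regularization parameter.
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(size(X, 2), 1);
[optTheta, jValOpt] = fminunc(@(t) costFunction(t, X, y, lambda), initialTheta, options);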