Ridge regression and Bayesian linear regression Kenneth D. Harris 6/5/15


Multiple linear regression

What are you predicting?
• Data type: Continuous
• Dimensionality: 1

What are you predicting it from?
• Data type: Continuous
• Dimensionality: p

How many data points do you have? Enough
What sort of prediction do you need? Single best guess
What sort of relationship can you assume? Linear

Multiple linear regression

What are you predicting?
• Data type: Continuous
• Dimensionality: 1

What are you predicting it from?
• Data type: Continuous
• Dimensionality: p

How many data points do you have? Not enough
What sort of prediction do you need? Single best guess
What sort of relationship can you assume? Linear

Multiple predictors, one predicted variable

• Choose $\mathbf{w}$ to minimize the sum-squared error:

$$E = \sum_i (y_i - \mathbf{x}_i \cdot \mathbf{w})^2 = |\mathbf{y} - X\mathbf{w}|^2$$

Optimal weight vector (in MATLAB: w = X\y):

$$\hat{\mathbf{w}} = (X^\top X)^{-1} X^\top \mathbf{y}$$
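A minimal NumPy sketch of the least-squares solution (the data and variable names here are illustrative, not from the slides); `np.linalg.lstsq` is the numerically safer equivalent of MATLAB's `X\y`:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 3
X = rng.standard_normal((N, p))          # design matrix, one row per data point
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(N)

# Normal-equations solution w_hat = (X'X)^-1 X'y
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent, better-conditioned route (MATLAB's X\y)
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```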

Too many predictors

• If $p \geq N$, you can fit the training data perfectly: $X\mathbf{w} = \mathbf{y}$ is $N$ equations in $p$ unknowns.

• If $p > N$, the solution is underconstrained ($X^\top X$ is not invertible).

• But even if $p < N$, you can have problems with too many predictors.

[Figures: simulated fits with $N=40$, $p=30$; first $y = x_1$ exactly, then $y = x_1 + \text{noise}$.]
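The overfitting shown in these figures can be reproduced with a short simulation (a sketch; the noise level is an assumption, as the slides do not specify it):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 40, 30
X = rng.standard_normal((N, p))
y = X[:, 0] + 0.5 * rng.standard_normal(N)   # y = x1 + noise

# Ordinary least squares spreads large spurious weights over all 30 predictors
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
train_err = np.mean((y - X @ w_ols) ** 2)

# On fresh data the fitted weights generalise poorly
X_test = rng.standard_normal((1000, p))
y_test = X_test[:, 0] + 0.5 * rng.standard_normal(1000)
test_err = np.mean((y_test - X_test @ w_ols) ** 2)
```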

Geometric interpretation

The target can be fit exactly, by having a massive positive weight for $\mathbf{x}_1$ and a massive negative weight for $\mathbf{x}_2$.

It would be better to just fit the signal component.

[Figure: the target vector, with its signal and noise components, together with the predictors $\mathbf{x}_1$ and $\mathbf{x}_2$.]


Overfitting = large weight vectors

• Solution: penalize large weight vectors. Minimize

$$|\mathbf{y} - X\mathbf{w}|^2 + \lambda |\mathbf{w}|^2$$

Optimal weight vector:

$$\hat{\mathbf{w}} = (X^\top X + \lambda I)^{-1} X^\top \mathbf{y}$$

The inverse can always be taken (for $\lambda > 0$), even for $p \geq N$.
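The closed-form ridge solution above is a one-liner; this sketch (illustrative data) shows that increasing $\lambda$ shrinks the weight vector:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution w = (X'X + lam*I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(2)
N, p = 40, 30
X = rng.standard_normal((N, p))
y = X[:, 0] + 0.5 * rng.standard_normal(N)

w0 = ridge(X, y, 1e-9)    # essentially ordinary least squares
w3 = ridge(X, y, 3.0)     # penalized: smaller weight vector
```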

Example

[Figures: ridge fits with $\lambda=0$ and $\lambda=3$.]

Ridge regression introduces a bias

[Figures: ridge fits with $\lambda=0$ and $\lambda=50$.]

A quick trick to do ridge regression

• Ordinary linear regression minimizes $|\mathbf{y} - X\mathbf{w}|^2$. Define the augmented data

$$\tilde{X} = \begin{pmatrix} X \\ \sqrt{\lambda}\, I_p \end{pmatrix}, \qquad \tilde{\mathbf{y}} = \begin{pmatrix} \mathbf{y} \\ \mathbf{0}_p \end{pmatrix}$$

Then the ordinary least-squares fit of $\tilde{\mathbf{y}}$ on $\tilde{X}$ is the solution to ridge regression. (Why? Because $|\tilde{\mathbf{y}} - \tilde{X}\mathbf{w}|^2 = |\mathbf{y} - X\mathbf{w}|^2 + \lambda|\mathbf{w}|^2$.)
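The augmentation trick can be checked numerically against the closed-form ridge solution (a sketch with illustrative data):

```python
import numpy as np

rng = np.random.default_rng(3)
N, p, lam = 40, 30, 3.0
X = rng.standard_normal((N, p))
y = X[:, 0] + 0.5 * rng.standard_normal(N)

# Closed-form ridge solution
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Trick: append sqrt(lam)*I rows to X and p zeros to y,
# then run ordinary least squares on the augmented data
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])
w_trick, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
```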

Regression as a probability model

What are you predicting?
• Data type: Continuous
• Dimensionality: 1

What are you predicting it from?
• Data type: Continuous
• Dimensionality: p

How many data points do you have? Enough
What sort of prediction do you need? Probability distribution
What sort of relationship can you assume? Linear

Regression as a probability model

• Assume $\mathbf{y}$ is random, but $X$ and $\mathbf{w}$ are just numbers:

$$y_i = \mathbf{x}_i \cdot \mathbf{w} + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2)$$

Then the likelihood is

$$p(\mathbf{y}|X, \mathbf{w}) = \prod_i \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \mathbf{x}_i \cdot \mathbf{w})^2}{2\sigma^2}\right)$$

so $-\log p(\mathbf{y}|X, \mathbf{w}) = \frac{1}{2\sigma^2}|\mathbf{y} - X\mathbf{w}|^2 + \text{const}$. Maximum likelihood is the same as a least-squares fit.

Bayesian linear regression

• Now consider $\mathbf{w}$ to also be random, with prior distribution $\mathbf{w} \sim N(0, \sigma_w^2 I)$:

$$p(\mathbf{w}) \propto \exp\left(-\frac{|\mathbf{w}|^2}{2\sigma_w^2}\right)$$

The posterior distribution is

$$p(\mathbf{w}|\mathbf{y}, X) \propto p(\mathbf{y}|X, \mathbf{w})\, p(\mathbf{w}) \propto \exp\left(-\frac{|\mathbf{y} - X\mathbf{w}|^2}{2\sigma^2} - \frac{|\mathbf{w}|^2}{2\sigma_w^2}\right)$$

Bayesian linear regression

The exponent of the posterior is quadratic in $\mathbf{w}$. So $\mathbf{w}$ is Gaussian distributed.

Bayesian linear regression

Completing the square gives

$$\mathbf{w} \,|\, \mathbf{y}, X \sim N(\hat{\mathbf{w}}, \Sigma), \qquad \hat{\mathbf{w}} = (X^\top X + \lambda I)^{-1} X^\top \mathbf{y}, \qquad \Sigma = \sigma^2 (X^\top X + \lambda I)^{-1}, \qquad \lambda = \frac{\sigma^2}{\sigma_w^2}$$

The mean of $\mathbf{w}$ is exactly the same as in ridge regression. But we also get a covariance matrix for $\mathbf{w}$.
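The posterior mean and covariance are easy to compute directly; a sketch with illustrative noise and prior variances (these values are assumptions, not from the slides):

```python
import numpy as np

def posterior(X, y, sigma2, sigma_w2):
    """Posterior mean and covariance of w under a N(0, sigma_w2*I) prior
    and Gaussian noise of variance sigma2."""
    p = X.shape[1]
    lam = sigma2 / sigma_w2
    A = X.T @ X + lam * np.eye(p)
    w_hat = np.linalg.solve(A, X.T @ y)     # same as the ridge solution
    Sigma = sigma2 * np.linalg.inv(A)       # posterior covariance
    return w_hat, Sigma

rng = np.random.default_rng(4)
X = rng.standard_normal((40, 30))
y = X[:, 0] + 0.5 * rng.standard_normal(40)
w_hat, Sigma = posterior(X, y, sigma2=0.25, sigma_w2=1.0)
```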

Bayesian predictions

• Given a training set $(X, \mathbf{y})$ and a new value $\mathbf{x}^*$. Assume $\mathbf{w}$ is random but $X$ and $\mathbf{y}$ are fixed.

• To make a prediction of $y^*$, integrate over all possible $\mathbf{w}$:

$$p(y^* | \mathbf{x}^*, X, \mathbf{y}) = \int p(y^* | \mathbf{x}^*, \mathbf{w})\, p(\mathbf{w} | X, \mathbf{y})\, d\mathbf{w} = N\!\left(\mathbf{x}^* \cdot \hat{\mathbf{w}},\ \sigma^2 + \mathbf{x}^{*\top} \Sigma\, \mathbf{x}^*\right)$$

The mean is the same as in ridge regression, but we also get a variance: $\sigma^2 + \mathbf{x}^{*\top} \Sigma\, \mathbf{x}^*$.

The variance does not depend on the training targets $\mathbf{y}$. It is low when many of the training set inputs $\mathbf{x}_i$ are collinear with $\mathbf{x}^*$.
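The dependence of the predictive variance on the training inputs can be checked directly: a direction collinear with a training point gets low variance, a direction orthogonal to all of them gets high variance. A sketch (the noise and prior variances are illustrative assumptions):

```python
import numpy as np

def predictive(X, y, x_star, sigma2=0.25, sigma_w2=1.0):
    """Predictive mean and variance for a new input x_star."""
    p = X.shape[1]
    lam = sigma2 / sigma_w2
    A = X.T @ X + lam * np.eye(p)
    w_hat = np.linalg.solve(A, X.T @ y)
    Sigma = sigma2 * np.linalg.inv(A)
    return x_star @ w_hat, sigma2 + x_star @ Sigma @ x_star

rng = np.random.default_rng(5)
N, p = 10, 30                        # fewer points than predictors
X = rng.standard_normal((N, p))
y = X[:, 0] + 0.5 * rng.standard_normal(N)

# A direction collinear with a training input...
x_aligned = X[0] / np.linalg.norm(X[0])
# ...vs one orthogonal to every training input (row space projected out)
z = rng.standard_normal(p)
z -= X.T @ np.linalg.lstsq(X.T, z, rcond=None)[0]
x_orth = z / np.linalg.norm(z)

_, var_aligned = predictive(X, y, x_aligned)
_, var_orth = predictive(X, y, x_orth)
```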
