Regression / Calibration
Paul Geladi, SBT, SLU
MLR, RR, PCR, PLS
Univariate regression
[Figure: a fitted straight line in the x-y plane, with offset a, slope b, and residual e]

y = a + bx + e
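As a quick illustration (not from the slides; the data are invented), the univariate model can be fitted by least squares with NumPy:

import numpy as np

# Invented example: y depends linearly on x, plus noise e
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

# Fit y = a + bx by least squares; polyfit returns [slope b, offset a]
b, a = np.polyfit(x, y, deg=1)
e = y - (a + b * x)            # residuals
print(a, b)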
[Figure: four x-y panels comparing a linear fit, an underfit, an overfit, and a quadratic fit]
Multivariate linear regression
Things to do
• Set up the equation
• Solve the equation
• Diagnose the equation
• Visualise the results
• Use the equation
Things to do
• Check residuals
• Check for outliers
• Check for nonlinearity
• Correct for nonlinearity
• Wavelength reduction
Why not simply fit y = f(x)?
• Works sometimes, but only for a few variables
• Measurement noise!
• ∞ possible functions
[Figure: data matrix X (I × K) and response vector y (I × 1)]

y = f(x) is simplified by the linear approximation:

y = b0 + b1x1 + b2x2 + ... + bKxK + f
Nomenclature
y = b0 + b1x1 + b2x2 + ... + bKxK + f
y : response
xk : predictors
bk : regression coefficients
b0 : offset (constant)
f : residual
[Figure: X (I × K) and y (I × 1)]
If X and y are mean-centered, the offset b0 drops out.
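A small sketch (invented data) of how mean-centering makes the offset term unnecessary:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))                     # I = 10 samples, K = 3 predictors
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0         # true offset b0 = 3

# After centering, the model y = Xb + f needs no b0 term
Xc = X - X.mean(axis=0)
yc = y - y.mean()
b = np.linalg.lstsq(Xc, yc, rcond=None)[0]       # recovers [1, -2, 0.5]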
y = b1x1 + b2x2 + ... + bKxK + f   (written once for each of the I samples)
[Figure: the matrix equation y = Xb + f, with y (I × 1), X (I × K), b (K × 1), f (I × 1)]

y = Xb + f

X and y are known (measurable); b and f are unknown.
As it stands there is no unique solution: f must be constrained.
The MLR solution
Multiple Linear Regression / Ordinary Least Squares (OLS)

Least squares: b = (X'X)^-1 X'y

Problems?
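Before turning to the problems, a minimal NumPy sketch of this OLS solution on invented, mean-centered data (np.linalg.solve is used rather than an explicit inverse, for numerical stability):

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))                     # I = 20 > K = 3
X -= X.mean(axis=0)
y = X @ np.array([0.5, 1.5, -1.0]) + rng.normal(scale=0.1, size=20)
y -= y.mean()

# b = (X'X)^-1 X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
f = y - X @ b                                    # residual vector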
3b1 + 4b2 = 1
4b1 + 5b2 = 0
One solution
3b1 + 4b2 = 1
4b1 + 5b2 = 0
b1 + b2 = 4
No solution
3b1 + 4b2 + b3 = 1
4b1 + 5b2 + b3 = 0
∞ solutions

Problems with b = (X'X)^-1 X'y:
- K > I: ∞ solutions
- I > K: no exact solution
- error in X
- error in y
- the inverse may not exist
- the inverse may be unstable
3b1 + 4b2 + e1 = 1
4b1 + 5b2 + e2 = 0
b1 + b2 + e3 = 4
With residuals e, a least-squares solution exists.

Wanted solution:
- I ≥ K
- no inverse problems
- no noise in X
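The overdetermined 3 × 2 system above can be checked directly; np.linalg.lstsq returns the b that minimises e'e:

import numpy as np

# The three equations above in matrix form
X = np.array([[3.0, 4.0],
              [4.0, 5.0],
              [1.0, 1.0]])
y = np.array([1.0, 0.0, 4.0])

b, ss_res, rank, sv = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b                 # the residuals e1, e2, e3
print(b, e)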
Diagnostics
y = Xb + f
SStot = SSmod + SSres
R2 = SSmod / SStot = 1 - SSres / SStot
Coefficient of determination

SSres = f'f
RMSEC = [SSres / (I - A)]^1/2
Root Mean Squared Error of Calibration
(A: the number of components or coefficients estimated)
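A sketch of both diagnostics in NumPy (the count n_params for A is an assumption: the number of fitted coefficients for MLR, the number of components for PCR/PLS):

import numpy as np

def r2_rmsec(y, yhat, n_params):
    f = y - yhat
    ss_res = f @ f                          # SSres = f'f
    ss_tot = ((y - y.mean()) ** 2).sum()    # SStot
    r2 = 1.0 - ss_res / ss_tot              # coefficient of determination
    rmsec = np.sqrt(ss_res / (len(y) - n_params))
    return r2, rmsec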
Alternatives to MLR/OLS

Principal Component Regression (PCR)
- I ≥ K
- Easy inversion
Principal Component Regression (PCR)
[Figure: PCA compresses X (I × K) into the score matrix T (I × A)]
- A ≤ I
- T orthogonal
- Noise in X removed

Regression is then done on the scores:
y = Td + f
d = (T'T)^-1 T'y
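A compact PCR sketch via the SVD (an illustration, not the only formulation; n_comp is chosen by hand here, in practice by validation):

import numpy as np

def pcr(X, y, n_comp):
    # PCA of mean-centered X through the SVD; scores T = U * S
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_comp] * S[:n_comp]
    # d = (T'T)^-1 T'y is trivial because T'T is diagonal (T orthogonal)
    d = (T.T @ yc) / (S[:n_comp] ** 2)
    # Express the model as a coefficient vector in the original X space
    return Vt[:n_comp].T @ d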
Problem
- How many components should be used?

Advantages
- PCA is done on the data
- Outliers become visible
- Classes become visible
- Noise in X removed
Partial Least Squares Regression (PLS)

[Figure: X (I × K) is decomposed into scores t and weights w'; Y is decomposed into scores u and loadings q' (the outer relationships). The X scores t and the Y scores u are linked by the inner relationship. With A components the score matrices are I × A; p' are the X loadings.]

Advantages
- X decomposed
- Y decomposed
- Noise in X left out
- Noise in Y left out
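A sketch of one common PLS algorithm for a single y (PLS1 by NIPALS; X and y assumed mean-centered; this is an illustration, not the only formulation):

import numpy as np

def pls1_nipals(X, y, n_comp):
    X, y = X.copy(), y.copy()
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)      # X weights w
        t = X @ w                   # X scores t
        tt = t @ t
        p = X.T @ t / tt            # X loadings p'
        q = (y @ t) / tt            # inner relationship: y ~ t * q
        X -= np.outer(t, p)         # deflate X: next component fits the residual
        y -= t * q                  # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    # Regression vector b such that yhat = Xcentered @ b
    return W @ np.linalg.solve(P.T @ W, np.array(Q))

The deflation step is the "one component at a time" idea described next: each new component is computed on the residual left by the previous one.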
PCR and PLS are one-component-at-a-time methods: after each component a residual is calculated, and the next component is computed on that residual.
Another view
y = Xb + f (MLR)
y = Xb_RR + f_RR (RR, ridge regression)
y = Xb_PCR + f_PCR
y = Xb_PLS + f_PLS
Prediction
[Figure: calibration set Xcal (I × K) with ycal, and test set Xtest (J × K) with ytest, from which yhat is predicted]
Prediction diagnostics
yhat = Xtest b
ftest = ytest - yhat
PRESS = ftest'ftest
RMSEP = [PRESS / J]^1/2
Root Mean Squared Error of Prediction
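In NumPy (a sketch; the names are illustrative):

import numpy as np

def rmsep(ytest, yhat):
    ftest = ytest - yhat
    press = ftest @ ftest            # PRESS = ftest'ftest
    return np.sqrt(press / len(ytest))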
Prediction diagnostics
R2test = Q2 = 1 - ftest'ftest / ytest'ytest

Some rules of thumb
- R2 > 0.65 (with 5 PLS comp.)
- R2test > 0.5
- R2 - R2test < 0.2
Bias
f = y - Xb: in calibration the bias is always 0
ftest = ytest - yhat
bias = (1/J) Σ ftest
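A one-line check of the test-set bias (illustrative):

import numpy as np

def bias(ytest, yhat):
    # (1/J) * sum of the test residuals
    return float(np.mean(ytest - yhat))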
Leverage - influence
b = (X'X)^-1 X'y
yhat = Xb = X(X'X)^-1 X'y = Hy
H is the hat matrix; the diagonal elements of H are the leverages.
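The leverages can be read off the hat matrix directly (a sketch, fine for small K; solve avoids the explicit inverse):

import numpy as np

def leverage(X):
    # Diagonal of H = X (X'X)^-1 X'
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return np.diag(H)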
Residual plots
- Check the histogram of f
- Check the X-residual matrix E variable-wise
- Check E object-wise
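A sketch of the three checks with matplotlib (E is the X-residual matrix from PCR/PLS; the names are illustrative):

import numpy as np
import matplotlib.pyplot as plt

def residual_plots(f, E):
    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].hist(f, bins=20)                  # histogram of the y-residuals f
    axes[0].set_title("histogram of f")
    axes[1].plot((E ** 2).sum(axis=0))        # residual SS per variable
    axes[1].set_title("E, variable-wise")
    axes[2].plot((E ** 2).sum(axis=1), "o")   # residual SS per object
    axes[2].set_title("E, object-wise")
    plt.show()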