Regression / Calibration
Paul Geladi, SBT, SLU
MLR, RR, PCR, PLS
Univariate regression
[Figure: a fitted straight line in the x-y plane, with offset a, slope b, and residual e]

y = a + bx + e
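As a quick illustration (not from the slides; the data are invented), the univariate model can be fitted by least squares with NumPy:

import numpy as np

# Invented example: y depends linearly on x, plus noise e
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

# Fit y = a + bx by least squares; polyfit returns [slope b, offset a]
b, a = np.polyfit(x, y, deg=1)
e = y - (a + b * x)            # residuals
print(a, b)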
[Figure: four x-y panels comparing a linear fit, an underfit, an overfit, and a quadratic fit]
Multivariate linear regression
Things to do
• Set up the equation
• Solve the equation
• Diagnose the equation
• Visualise the results
• Use the equation
Things to do
• Check residuals
• Check for outliers
• Check for nonlinearity
• Correct for nonlinearity
• Wavelength reduction
Why not simply fit y = f(x)?
• Works sometimes, but only for a few variables
• Measurement noise!
• ∞ possible functions
[Figure: data matrix X (I × K) and response vector y (I × 1)]

y = f(x) is simplified by the linear approximation:

y = b0 + b1x1 + b2x2 + ... + bKxK + f
Nomenclature
y = b0 + b1x1 + b2x2 + ... + bKxK + f
y : response
xk : predictors
bk : regression coefficients
b0 : offset (constant)
f : residual
[Figure: X (I × K) and y (I × 1)]
If X and y are mean-centered, the offset b0 drops out.
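A small sketch (invented data) of how mean-centering makes the offset term unnecessary:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))                     # I = 10 samples, K = 3 predictors
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0         # true offset b0 = 3

# After centering, the model y = Xb + f needs no b0 term
Xc = X - X.mean(axis=0)
yc = y - y.mean()
b = np.linalg.lstsq(Xc, yc, rcond=None)[0]       # recovers [1, -2, 0.5]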
y = b1x1 + b2x2 + ... + bKxK + f   (written once for each of the I samples)
[Figure: the matrix equation y = Xb + f, with y (I × 1), X (I × K), b (K × 1), f (I × 1)]

y = Xb + f

X and y are known (measurable); b and f are unknown.
As it stands there is no unique solution: f must be constrained.
The MLR solution
Multiple Linear Regression / Ordinary Least Squares (OLS)

Least squares: b = (X'X)^-1 X'y

Problems?
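Before turning to the problems, a minimal NumPy sketch of this OLS solution on invented, mean-centered data (np.linalg.solve is used rather than an explicit inverse, for numerical stability):

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))                     # I = 20 > K = 3
X -= X.mean(axis=0)
y = X @ np.array([0.5, 1.5, -1.0]) + rng.normal(scale=0.1, size=20)
y -= y.mean()

# b = (X'X)^-1 X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
f = y - X @ b                                    # residual vector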
3b1 + 4b2 = 1
4b1 + 5b2 = 0
One solution
3b1 + 4b2 = 1
4b1 + 5b2 = 0
b1 + b2 = 4
No solution
3b1 + 4b2 + b3 = 1
4b1 + 5b2 + b3 = 0
∞ solutions

Problems with b = (X'X)^-1 X'y:
- K > I: ∞ solutions
- I > K: no exact solution
- error in X
- error in y
- the inverse may not exist
- the inverse may be unstable
3b1 + 4b2 + e1 = 1
4b1 + 5b2 + e2 = 0
b1 + b2 + e3 = 4
With residuals e, a least-squares solution exists.

Wanted solution:
- I ≥ K
- no inverse problems
- no noise in X
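The overdetermined 3 × 2 system above can be checked directly; np.linalg.lstsq returns the b that minimises e'e:

import numpy as np

# The three equations above in matrix form
X = np.array([[3.0, 4.0],
              [4.0, 5.0],
              [1.0, 1.0]])
y = np.array([1.0, 0.0, 4.0])

b, ss_res, rank, sv = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b                 # the residuals e1, e2, e3
print(b, e)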
Diagnostics
y = Xb + f
SStot = SSmod + SSres
R2 = SSmod / SStot = 1 - SSres / SStot
Coefficient of determination

SSres = f'f
RMSEC = [SSres / (I - A)]^1/2
Root Mean Squared Error of Calibration
(A: the number of components or coefficients estimated)
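A sketch of both diagnostics in NumPy (the count n_params for A is an assumption: the number of fitted coefficients for MLR, the number of components for PCR/PLS):

import numpy as np

def r2_rmsec(y, yhat, n_params):
    f = y - yhat
    ss_res = f @ f                          # SSres = f'f
    ss_tot = ((y - y.mean()) ** 2).sum()    # SStot
    r2 = 1.0 - ss_res / ss_tot              # coefficient of determination
    rmsec = np.sqrt(ss_res / (len(y) - n_params))
    return r2, rmsec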
Alternatives to MLR/OLS

Principal Component Regression (PCR)
- I ≥ K
- Easy inversion
Principal Component Regression (PCR)
[Figure: PCA compresses X (I × K) into the score matrix T (I × A)]
- A ≤ I
- T orthogonal
- Noise in X removed

Regression is then done on the scores:
y = Td + f
d = (T'T)^-1 T'y
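A compact PCR sketch via the SVD (an illustration, not the only formulation; n_comp is chosen by hand here, in practice by validation):

import numpy as np

def pcr(X, y, n_comp):
    # PCA of mean-centered X through the SVD; scores T = U * S
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_comp] * S[:n_comp]
    # d = (T'T)^-1 T'y is trivial because T'T is diagonal (T orthogonal)
    d = (T.T @ yc) / (S[:n_comp] ** 2)
    # Express the model as a coefficient vector in the original X space
    return Vt[:n_comp].T @ d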
Problem
- How many components should be used?

Advantages
- PCA is done on the data
- Outliers become visible
- Classes become visible
- Noise in X removed
Partial Least Squares Regression (PLS)

[Figure: X (I × K) is decomposed into scores t and weights w'; Y is decomposed into scores u and loadings q' (the outer relationships). The X scores t and the Y scores u are linked by the inner relationship. With A components the score matrices are I × A; p' are the X loadings.]

Advantages
- X decomposed
- Y decomposed
- Noise in X left out
- Noise in Y left out
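A sketch of one common PLS algorithm for a single y (PLS1 by NIPALS; X and y assumed mean-centered; this is an illustration, not the only formulation):

import numpy as np

def pls1_nipals(X, y, n_comp):
    X, y = X.copy(), y.copy()
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)      # X weights w
        t = X @ w                   # X scores t
        tt = t @ t
        p = X.T @ t / tt            # X loadings p'
        q = (y @ t) / tt            # inner relationship: y ~ t * q
        X -= np.outer(t, p)         # deflate X: next component fits the residual
        y -= t * q                  # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    # Regression vector b such that yhat = Xcentered @ b
    return W @ np.linalg.solve(P.T @ W, np.array(Q))

The deflation step is the "one component at a time" idea described next: each new component is computed on the residual left by the previous one.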
PCR and PLS are one-component-at-a-time methods: after each component a residual is calculated, and the next component is computed on that residual.
Another view
y = Xb + f (MLR)
y = Xb_RR + f_RR (RR, ridge regression)
y = Xb_PCR + f_PCR
y = Xb_PLS + f_PLS
Prediction
[Figure: calibration set Xcal (I × K) with ycal, and test set Xtest (J × K) with ytest, from which yhat is predicted]
Prediction diagnostics
yhat = Xtest b
ftest = ytest - yhat
PRESS = ftest'ftest
RMSEP = [PRESS / J]^1/2
Root Mean Squared Error of Prediction
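In NumPy (a sketch; the names are illustrative):

import numpy as np

def rmsep(ytest, yhat):
    ftest = ytest - yhat
    press = ftest @ ftest            # PRESS = ftest'ftest
    return np.sqrt(press / len(ytest))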
Prediction diagnostics
R2test = Q2 = 1 - ftest'ftest / ytest'ytest

Some rules of thumb
- R2 > 0.65 (with 5 PLS comp.)
- R2test > 0.5
- R2 - R2test < 0.2
Bias
f = y - Xb: in calibration the bias is always 0
ftest = ytest - yhat
bias = (1/J) Σ ftest
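A one-line check of the test-set bias (illustrative):

import numpy as np

def bias(ytest, yhat):
    # (1/J) * sum of the test residuals
    return float(np.mean(ytest - yhat))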
Leverage - influence
b = (X'X)^-1 X'y
yhat = Xb = X(X'X)^-1 X'y = Hy
H is the hat matrix; the diagonal elements of H are the leverages.
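The leverages can be read off the hat matrix directly (a sketch, fine for small K; solve avoids the explicit inverse):

import numpy as np

def leverage(X):
    # Diagonal of H = X (X'X)^-1 X'
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return np.diag(H)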
Residual plots
- Check the histogram of f
- Check the X-residual matrix E variable-wise
- Check E object-wise
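A sketch of the three checks with matplotlib (E is the X-residual matrix from PCR/PLS; the names are illustrative):

import numpy as np
import matplotlib.pyplot as plt

def residual_plots(f, E):
    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].hist(f, bins=20)                  # histogram of the y-residuals f
    axes[0].set_title("histogram of f")
    axes[1].plot((E ** 2).sum(axis=0))        # residual SS per variable
    axes[1].set_title("E, variable-wise")
    axes[2].plot((E ** 2).sum(axis=1), "o")   # residual SS per object
    axes[2].set_title("E, object-wise")
    plt.show()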