Environmental Data Analysis with MatLab
Lecture 6: The Principle of Least Squares
SYLLABUS
Lecture 01 Using MatLab
Lecture 02 Looking At Data
Lecture 03 Probability and Measurement Error
Lecture 04 Multivariate Distributions
Lecture 05 Linear Models
Lecture 06 The Principle of Least Squares
Lecture 07 Prior Information
Lecture 08 Solving Generalized Least Squares Problems
Lecture 09 Fourier Series
Lecture 10 Complex Fourier Series
Lecture 11 Lessons Learned from the Fourier Transform
Lecture 12 Power Spectra
Lecture 13 Filter Theory
Lecture 14 Applications of Filters
Lecture 15 Factor Analysis
Lecture 16 Orthogonal functions
Lecture 17 Covariance and Autocorrelation
Lecture 18 Cross-correlation
Lecture 19 Smoothing, Correlation and Spectra
Lecture 20 Coherence; Tapering and Spectral Analysis
Lecture 21 Interpolation
Lecture 22 Hypothesis testing
Lecture 23 Hypothesis Testing continued; F-Tests
Lecture 24 Confidence Limits of Spectra, Bootstraps
purpose of the lecture
estimate model parameters using the
principle of least-squares
part 1
the least squares estimation of model parameters and their covariance
the prediction error
$e_i = d_i^{obs} - d_i^{pre}$
motivates us to define an error vector, $\mathbf{e} = \mathbf{d}^{obs} - \mathbf{d}^{pre}$
prediction error in straight line case
[Figure: plot of linedata01.txt, showing data, d, against the auxiliary variable, x; for each point the error ei is the difference between the observed datum diobs and the predicted datum dipre]
total error
single number summarizing the error
sum of squares of individual errors, $E = \sum_{i=1}^{N} e_i^2 = \mathbf{e}^T \mathbf{e}$
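A minimal sketch of the total error for one trial straight line. The data values and the trial parameters below are hypothetical, chosen only for illustration; they are not taken from linedata01.txt.

```matlab
% hypothetical data, roughly following d = 1 + 2x (not from the lecture)
x    = [-2; -1; 0; 1; 2];
dobs = [-3.1; -0.9; 1.2; 2.8; 5.0];

% trial model parameters, picked by hand: intercept 1, slope 2
m = [1; 2];

G    = [ones(length(x),1), x];  % data kernel for a straight line
dpre = G*m;                     % predicted data
e    = dobs - dpre;             % individual errors
E    = e'*e;                    % total error: sum of squared individual errors
```

A different trial m gives a different E; the principle of least squares selects the m for which this single number is smallest.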
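principle of least-squares
the best estimate, $\mathbf{m}^{est}$, is the $\mathbf{m}$ that minimizes
$E(\mathbf{m}) = \mathbf{e}^T \mathbf{e}$, where $\mathbf{e} = \mathbf{d}^{obs} - \mathbf{G}\mathbf{m}$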
least-squares and probability
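suppose that each observation has a Normal p.d.f.
$p(d_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left\{ -\frac{(d_i - \bar{d}_i)^2}{2\sigma_i^2} \right\}$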
for uncorrelated data
the joint p.d.f. is just the product of
the individual p.d.f.’s
least-squares formula for E suggests a link
between probability and least-squares
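now assume that $\mathbf{G}\mathbf{m}$ predicts the mean of $\mathbf{d}$
with $\mathbf{G}\mathbf{m}$ substituted for the mean $\bar{\mathbf{d}}$, and for data of uniform variance, the exponent of $p(\mathbf{d})$ is proportional to $-E(\mathbf{m})$, so
minimizing $E(\mathbf{m})$ is equivalent to maximizing $p(\mathbf{d})$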
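The link can be written out in a few lines. This is a sketch under two stated assumptions: the data are uncorrelated, and every datum has the same variance $\sigma_d^2$.

$$p(\mathbf{d}) = \prod_{i=1}^{N} p(d_i) = \frac{1}{(2\pi)^{N/2}\,\sigma_d^{N}} \exp\left\{ -\frac{1}{2\sigma_d^2} \sum_{i=1}^{N} (d_i - \bar{d}_i)^2 \right\}$$

Substituting $\bar{\mathbf{d}} = \mathbf{G}\mathbf{m}$ and evaluating at $\mathbf{d} = \mathbf{d}^{obs}$ gives

$$p(\mathbf{d}^{obs}) = \frac{1}{(2\pi)^{N/2}\,\sigma_d^{N}} \exp\left\{ -\frac{E(\mathbf{m})}{2\sigma_d^2} \right\}, \qquad E(\mathbf{m}) = (\mathbf{d}^{obs} - \mathbf{G}\mathbf{m})^T (\mathbf{d}^{obs} - \mathbf{G}\mathbf{m})$$

Since the exponential decreases monotonically as $E$ grows, the $\mathbf{m}$ that minimizes $E(\mathbf{m})$ is the same $\mathbf{m}$ that maximizes $p(\mathbf{d}^{obs})$.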
the principle of least-squares
determines the $\mathbf{m}$
that makes the observations “most probable”
in the sense of maximizing $p(\mathbf{d}^{obs})$
the principle of least-squares
determines the model parameters
that make the observations “most probable”
(provided that the data are Normal)
this is
the principle of maximum likelihood
a formula for $\mathbf{m}^{est}$
at the point of minimum error, $E$,
$\partial E / \partial m_i = 0$
so solve this equation for $\mathbf{m}^{est}$
Result
$\mathbf{m}^{est} = [\mathbf{G}^T \mathbf{G}]^{-1} \mathbf{G}^T \mathbf{d}^{obs}$
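where the result comes from
$E = \sum_{i=1}^{N} \left( d_i - \sum_{q=1}^{M} G_{iq} m_q \right)^2$
so, using the chain rule,
$\frac{\partial E}{\partial m_k} = -2 \sum_{i=1}^{N} \left( d_i - \sum_{q=1}^{M} G_{iq} m_q \right) \sum_{j=1}^{M} G_{ij} \frac{\partial m_j}{\partial m_k}$
where $\partial m_j / \partial m_k$ is unity when k=j and zero when k≠j, since the m’s are independent
so just delete the sum over j and replace j with k
$\frac{\partial E}{\partial m_k} = -2 \sum_{i=1}^{N} G_{ik} \left( d_i - \sum_{q=1}^{M} G_{iq} m_q \right) = 0$
which gives
$\sum_{i=1}^{N} G_{ik} \sum_{q=1}^{M} G_{iq} m_q = \sum_{i=1}^{N} G_{ik} d_i$, or in matrix form $\mathbf{G}^T \mathbf{G} \mathbf{m} = \mathbf{G}^T \mathbf{d}$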
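covariance of $\mathbf{m}^{est}$
$\mathbf{m}^{est}$ is a linear function of $\mathbf{d}$ of the form $\mathbf{m}^{est} = \mathbf{M}\mathbf{d}$
so $\mathbf{C}_m = \mathbf{M} \mathbf{C}_d \mathbf{M}^T$, with $\mathbf{M} = [\mathbf{G}^T\mathbf{G}]^{-1}\mathbf{G}^T$
assume $\mathbf{C}_d$ uncorrelated with uniform variance, $\sigma_d^2$, so that $\mathbf{C}_d = \sigma_d^2 \mathbf{I}$
then
$\mathbf{C}_m = \sigma_d^2 [\mathbf{G}^T\mathbf{G}]^{-1}$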
two methods of estimating the variance of the data
prior estimate: use knowledge of measurement technique
the ruler has 1 mm tic marks, so σd ≈ ½ mm
posterior estimate: use prediction error
$\sigma_d^2 \approx \frac{E}{N-M} = \frac{1}{N-M} \sum_{i=1}^{N} e_i^2$
posterior estimates are overestimates when the model is poor
reduce N by M since an M-parameter model can exactly fit M data
confidence intervals for the estimated model parameters
(assuming uncorrelated data of equal variance)
so
$\sigma_{m_i} = \sqrt{[\mathbf{C}_m]_{ii}}$
and
$m_i = m_i^{est} \pm 2\sigma_{m_i}$ (95% confidence)
MatLab script for least squares solution
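mest = (G'*G)\(G'*d);    % least-squares estimate of the model parameters
Cm = sd2 * inv(G'*G);    % covariance of mest; sd2 is the variance of the data
sm = sqrt(diag(Cm));     % standard deviation of each model parameter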
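As a cross-check on the normal-equations form, MatLab's backslash operator applied directly to an overdetermined G also returns the least-squares solution. The sketch below compares the two on hypothetical straight-line data; the data values and the prior variance sd2 are assumptions made for illustration.

```matlab
% hypothetical straight-line data (not from the lecture)
x = [-2; -1; 0; 1; 2];
d = [-3.1; -0.9; 1.2; 2.8; 5.0];
G = [ones(length(x),1), x];

% normal-equations form used in the lecture
mest_normal = (G'*G)\(G'*d);

% backslash applied directly to the overdetermined system G*m = d
mest_backslash = G\d;

% the two estimates should agree to rounding error
maxdiff = max(abs(mest_normal - mest_backslash));

% covariance and standard deviations, with an assumed prior variance
sd2 = 0.01;
Cm  = sd2 * inv(G'*G);
sm  = sqrt(diag(Cm));
```

For small, well-conditioned problems like this one the two forms are interchangeable; applying backslash to G directly avoids forming G'*G, which can matter when G is poorly conditioned.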
part 2
exemplary least squares problems
Example 1: the mean of data
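the model: every datum equals the same constant, $d_i = m_1$, so $\mathbf{G} = [1, 1, \ldots, 1]^T$
the constant $m_1$ will turn out to be the mean
formula for mean: $\mathbf{G}^T\mathbf{G} = N$ and $\mathbf{G}^T\mathbf{d} = \sum_{i=1}^{N} d_i$, so $m_1^{est} = \frac{1}{N} \sum_{i=1}^{N} d_i = \bar{d}$, the usual formula for the mean
formula for covariance: $\mathbf{C}_m = \sigma_d^2 [\mathbf{G}^T\mathbf{G}]^{-1} = \sigma_d^2 / N$, so the variance decreases with number of data
combining the two into confidence limits
$m_1^{est} = \bar{d} \pm \frac{2\sigma_d}{\sqrt{N}}$ (95% confidence)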
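A quick numerical check of this example, using hypothetical data and an assumed prior variance sd2: the least-squares estimate with a column of ones as the data kernel should match MatLab's mean function, and the variance of the estimate should equal sd2/N.

```matlab
% hypothetical data (not from the lecture)
d = [2; 4; 6; 8];
N = length(d);

G    = ones(N,1);           % data kernel for the model d_i = m1
mest = (G'*G)\(G'*d);       % least-squares estimate of the constant

sd2 = 0.25;                 % assumed prior variance of the data
Cm  = sd2 * inv(G'*G);      % variance of the estimate, sd2/N
sm  = sqrt(Cm);

mlow95  = mest - 2*sm;      % 95% confidence limits
mhigh95 = mest + 2*sm;
```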
Example 2: fitting a straight line
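the model: $d_i = m_1 + m_2 x_i$, with intercept $m_1$ and slope $m_2$, so row $i$ of $\mathbf{G}$ is $[1, \; x_i]$
$\mathbf{G}^T\mathbf{G} = \begin{bmatrix} N & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{bmatrix}$
$[\mathbf{G}^T\mathbf{G}]^{-1} = \frac{1}{N \sum_i x_i^2 - \left( \sum_i x_i \right)^2} \begin{bmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & N \end{bmatrix}$
(uses the rule $\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$)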
intercept and slope are uncorrelated
when the mean of x is zero
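The statement about zero-mean x can be checked numerically. The sketch below uses two hypothetical sets of x values with the same spacing, one centered on zero and one shifted by 10, and compares the off-diagonal element of inv(G'*G), which controls the covariance between intercept and slope.

```matlab
x0 = [-2; -1; 0; 1; 2];     % hypothetical x values with zero mean
x1 = x0 + 10;               % same spacing, mean of x is 10

G0 = [ones(5,1), x0];
G1 = [ones(5,1), x1];

C0 = inv(G0'*G0);           % proportional to the covariance of intercept and slope
C1 = inv(G1'*G1);

offdiag0 = C0(1,2);         % zero: intercept and slope uncorrelated
offdiag1 = C1(1,2);         % nonzero: intercept and slope correlated
```

The slope variance, element (2,2), is the same in both cases; only the intercept variance and the off-diagonal term change when x is shifted.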
keep in mind that none of this algebraic manipulation is needed if we just compute the solution numerically
using MatLab
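Generic MatLab script for least-squares problems
mest = (G'*G)\(G'*dobs);       % least-squares estimate
dpre = G*mest;                 % predicted data
e = dobs-dpre;                 % prediction error
E = e'*e;                      % total error
sigmad2 = E / (N-M);           % posterior variance of the data
covm = sigmad2 * inv(G'*G);    % covariance of mest
sigmam = sqrt(diag(covm));     % standard deviation of each parameter
mlow95 = mest - 2*sigmam;      % lower 95% confidence limit
mhigh95 = mest + 2*sigmam;     % upper 95% confidence limit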
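The generic script expects G, dobs, N and M to exist already. A minimal sketch of a complete run, assuming a straight-line model and hypothetical data values (not from the lecture):

```matlab
% hypothetical straight-line data
x    = [-2; -1; 0; 1; 2];
dobs = [-3.1; -0.9; 1.2; 2.8; 5.0];

N = length(dobs);           % number of data
M = 2;                      % number of model parameters: intercept, slope
G = [ones(N,1), x];         % data kernel

% generic least-squares steps
mest    = (G'*G)\(G'*dobs);
dpre    = G*mest;
e       = dobs - dpre;
E       = e'*e;
sigmad2 = E / (N-M);        % posterior variance of the data
covm    = sigmad2 * inv(G'*G);
sigmam  = sqrt(diag(covm));
mlow95  = mest - 2*sigmam;
mhigh95 = mest + 2*sigmam;
```

For these values the estimated intercept is 1 and the estimated slope is 1.99, with a minimum total error of E = 0.099.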
Example 3: modeling long-term trend and annual cycle in
Black Rock Forest temperature data
[Figure: three panels plotted against time t, days (0 to 5000): observed temperature d(t)obs, deg C; predicted temperature d(t)pre, deg C; error e(t), deg C]
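the model:
$d(t) = m_1 + m_2 t + m_3 \cos(2\pi t / T_y) + m_4 \sin(2\pi t / T_y)$
long-term trend: $m_1 + m_2 t$
annual cycle: the cosine and sine terms, with period $T_y$ = 365.25 days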
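MatLab script to create the data kernel
Ty=365.25;                 % period of the annual cycle, days
G=zeros(N,4);
G(:,1)=1;                  % constant term
G(:,2)=t;                  % long-term trend
G(:,3)=cos(2*pi*t/Ty);     % annual cycle, cosine part
G(:,4)=sin(2*pi*t/Ty);     % annual cycle, sine part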
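One way to test a data kernel like this is to feed it noise-free synthetic data built from known parameters and confirm that least squares recovers them. The time vector and the parameter values mtrue below are hypothetical; they are not the Black Rock Forest data or its fitted parameters.

```matlab
N  = 5000;
t  = (0:N-1)';              % hypothetical daily time samples, days
Ty = 365.25;                % period of the annual cycle, days

G = zeros(N,4);
G(:,1) = 1;
G(:,2) = t;
G(:,3) = cos(2*pi*t/Ty);
G(:,4) = sin(2*pi*t/Ty);

% hypothetical "true" parameters: offset, trend per day, cosine and sine amplitudes
mtrue = [10; -0.001; 5; 3];
dsyn  = G*mtrue;            % noise-free synthetic data

mest   = (G'*G)\(G'*dsyn);  % least-squares estimate
maxerr = max(abs(mest - mtrue));
```

With no noise the recovery should be exact to within rounding error, so a large maxerr would point to a mistake in the construction of G.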
prior variance of data
based on accuracy of thermometer
σd = 0.01 deg C
posterior variance of data
based on error of fit
σd = 5.60 deg C
huge difference, since the model does not include diurnal cycle of weather patterns
long-term slope
95% confidence limits based on prior variance
m2 = -0.03 ± 0.00002 deg C / yr
95% confidence limits based on posterior variance
m2 = -0.03 ± 0.00460 deg C / yr
in both cases, the cooling trend is significant, in the sense that the confidence intervals do not include zero or positive slopes.
However
The fit to the data is poor, so the results should be used with caution. More effort needs to be put into developing a better model.
part 3
covariance and the shape of the error surface
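[Figure: error surface E(m) plotted over the (m1, m2) plane, both axes running from 0 to 4, with the minimum marked at mest = (m1est, m2est)]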
solutions within the region of low error are almost as good as mest
small range of m2
large range of m1
[Figure: E(m) plotted against mi, with its minimum at miest]
near the minimum the error is shaped like a parabola. The curvature of the parabola
controls the width of the region of low error
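near the minimum, the Taylor series for the error is:
$E(\mathbf{m}) \approx E(\mathbf{m}^{est}) + \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} \left. \frac{\partial^2 E}{\partial m_i \partial m_j} \right|_{\mathbf{m}^{est}} (m_i - m_i^{est})(m_j - m_j^{est})$
the first-derivative term vanishes at the minimum; the second derivative is the
curvature of the error surface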
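starting with the formula for error
$E = (\mathbf{d} - \mathbf{G}\mathbf{m})^T (\mathbf{d} - \mathbf{G}\mathbf{m})$
we compute its 2nd derivative
$\frac{1}{2} \frac{\partial^2 E}{\partial m_i \partial m_j} = [\mathbf{G}^T\mathbf{G}]_{ij}$
but
$\mathbf{C}_m = \sigma_d^2 [\mathbf{G}^T\mathbf{G}]^{-1}$
so
$\mathbf{C}_m = \sigma_d^2 \left[ \frac{1}{2} \frac{\partial^2 E}{\partial \mathbf{m} \, \partial \mathbf{m}} \right]^{-1}$
the left side is the covariance of the model parameters; the bracketed term on the right is the curvature of the error surface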
the covariance of the least squares solution
is expressed
in the shape of the error surface
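The relation between curvature and variance can be checked numerically. The sketch below, on hypothetical straight-line data, estimates the second derivative of E along each parameter axis by central differences and compares it with 2*diag(G'*G). Because E is exactly quadratic in m, the finite-difference value should match to rounding error for any step size h.

```matlab
% hypothetical straight-line data (not from the lecture)
x    = [-2; -1; 0; 1; 2];
dobs = [-3.1; -0.9; 1.2; 2.8; 5.0];
G    = [ones(5,1), x];

Efun = @(m) (dobs - G*m)'*(dobs - G*m);   % total error as a function of m
mest = (G'*G)\(G'*dobs);

h    = 0.1;                               % finite-difference step
curv = zeros(2,1);
for i = 1:2
    dm = zeros(2,1);
    dm(i) = h;
    % central-difference estimate of the 2nd derivative of E along m_i
    curv(i) = (Efun(mest+dm) - 2*Efun(mest) + Efun(mest-dm)) / h^2;
end

curv_exact = 2*diag(G'*G);                % analytic curvature
Cunit = inv(G'*G);                        % covariance for unit data variance
```

Here G'*G happens to be diagonal, so each variance is simply the reciprocal of half the corresponding curvature: the slope has twice the curvature of the intercept and half its variance.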
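[Figure: two plots of E(m) against mi near miest; a broad, gently curved minimum corresponds to large variance, and a narrow, sharply curved minimum corresponds to small variance]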