Models, estimation and goodness-of-fit
Generalized least squares
Misspecifications and orthogonalization
Linear models and their mathematical foundations: Multiple linear regression, part I

Steffen Unkel
Department of Medical Statistics, University Medical Center Göttingen, Germany

Winter term 2018/19
Introduction
In multiple linear regression, we attempt to predict a continuous (random) response variable y on the basis of an assumed linear relationship with several (fixed) predictor variables x_1, x_2, ..., x_k.

Given a sample of n observations on y and the associated x variables, the n model equations can be written as
\[
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_{11} & x_{12} & \dots & x_{1k} \\
1 & x_{21} & x_{22} & \dots & x_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \dots & x_{nk}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
\]

or, more compactly,

\[
\underset{n \times 1}{\mathbf{y}} = \underset{n \times (k+1)}{\mathbf{X}} \, \underset{(k+1) \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}} .
\]
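The matrix form lends itself directly to computation. A minimal NumPy sketch of setting up the model equations (the sample size, coefficient values, and noise level are illustrative assumptions, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                                   # n observations, k predictors

# Design matrix: a leading column of ones for the intercept, then the k predictors
predictors = rng.normal(size=(n, k))           # the x_ij values
X = np.column_stack([np.ones(n), predictors])  # shape n x (k+1)

beta = np.array([2.0, 1.0, -0.5, 0.3])         # (beta_0, ..., beta_k), illustrative
eps = rng.normal(scale=0.1, size=n)            # error vector
y = X @ beta + eps                             # all n model equations at once
```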
Assumptions
Model assumptions:
A1 E(ε) = 0.
A2 Cov(ε) = σ²I.

Occasionally, we will make use of the following additional assumption:

A3 ε ∼ N_n(0, σ²I).
For the time being, we assume that for the n × (k+1) design matrix it holds that n > k+1 and rank(X) = k+1.

The β regression coefficients are sometimes referred to as partial regression coefficients.
Least squares estimation of β
To find β̂, we solve the optimization problem min_β ε^⊤ε.

If y = Xβ + ε, where X has size n × (k+1) with n > k+1 and rank(X) = k+1, then the (ordinary) least squares estimator β̂ that minimizes ε^⊤ε is

\[
\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y} .
\]
The least squares estimator is derived without any of the assumptions A1–A3.

If β̂ = (X^⊤X)⁻¹X^⊤y, then ε̂ = y − Xβ̂ = y − ŷ is the vector of residuals.
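As a sketch, the estimator can be computed by solving the normal equations X^⊤Xβ̂ = X^⊤y on simulated data (solving is numerically preferable to forming the inverse explicitly); the residual vector is then orthogonal to the columns of X:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -1.0])              # illustrative true coefficients
y = X @ beta + rng.normal(scale=0.5, size=n)

# Ordinary least squares: solve (X'X) beta_hat = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat                           # fitted values
resid = y - y_hat                              # residual vector eps_hat = y - y_hat

# The residuals lie orthogonal to the column space of X
assert np.allclose(X.T @ resid, 0.0, atol=1e-8)
```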
Basic geometry of least squares
Figure: A general point Xβ in the prediction space.
Basic geometry of least squares (2)
Figure: The right-angled triangle of vectors y, ŷ = Xβ̂ and ε̂ = y − ŷ.
Properties of the least squares estimator β̂
1. If assumption A1 holds, then E(β̂) = β.

2. If assumption A2 holds, then Cov(β̂) = σ²(X^⊤X)⁻¹.
3. Gauss–Markov theorem: If A1 and A2 hold, the least squares estimators β̂_j, j = 0, ..., k, have minimum variance among all linear unbiased estimators; the β̂_j (j = 0, ..., k) are best linear unbiased estimators (BLUE).

Corollary: If A1 and A2 hold, the BLUE of a^⊤β is a^⊤β̂, where β̂ = (X^⊤X)⁻¹X^⊤y.
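Properties 1 and 2 can be checked by simulation: holding the design X fixed and redrawing ε, the empirical mean and covariance of β̂ should match β and σ²(X^⊤X)⁻¹. A sketch with illustrative values (σ = 1, 5000 replications):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma = 40, 2, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # fixed design
beta = np.array([0.5, 1.5, -2.0])                           # illustrative truth

XtX_inv = np.linalg.inv(X.T @ X)
P = XtX_inv @ X.T                       # maps a response y to beta_hat

# Monte Carlo: redraw the errors, re-estimate, and average
reps = 5000
ests = np.empty((reps, k + 1))
for r in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    ests[r] = P @ y

# E(beta_hat) = beta and Cov(beta_hat) = sigma^2 (X'X)^{-1}, up to Monte Carlo error
assert np.allclose(ests.mean(axis=0), beta, atol=0.05)
assert np.allclose(np.cov(ests.T), sigma**2 * XtX_inv, atol=0.05)
```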
Properties (2)
4. If x = (1, x_1, ..., x_k)^⊤ and z = (1, c_1x_1, ..., c_kx_k)^⊤, then ŷ = β̂^⊤x = β̂_z^⊤z, where β̂_z is the least squares estimator from the regression of y on z.

Corollary: The fitted value ŷ is invariant to a full-rank linear transformation on the x variables.
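The invariance in property 4 is easy to verify numerically for a diagonal rescaling z_j = c_j x_j (the data and scaling constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=n)

# Rescale the predictors: z_j = c_j * x_j (c_0 = 1 keeps the intercept column)
c = np.array([1.0, 10.0, 0.25])
Z = X * c

beta_hat_x = np.linalg.solve(X.T @ X, X.T @ y)
beta_hat_z = np.linalg.solve(Z.T @ Z, Z.T @ y)

# Fitted values agree even though the coefficients differ
assert np.allclose(X @ beta_hat_x, Z @ beta_hat_z)
# The coefficients themselves rescale inversely: beta_hat_z,j = beta_hat_x,j / c_j
assert np.allclose(beta_hat_z * c, beta_hat_x)
```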
Estimation of σ²
We estimate σ² by

\[
s^2 = \frac{1}{n-k-1} \sum_{i=1}^{n} (y_i - \mathbf{x}_i^\top \hat{\boldsymbol{\beta}})^2
= \frac{1}{n-k-1} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^\top (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})
= \frac{\mathbf{y}^\top \mathbf{y} - \hat{\boldsymbol{\beta}}^\top \mathbf{X}^\top \mathbf{y}}{n-k-1}
= \frac{\mathrm{SSE}}{n-k-1} ,
\]

where x_i^⊤ is the ith row of X and SSE = y^⊤y − β̂^⊤X^⊤y.
If A1 and A2 hold, then E(s²) = σ² and an unbiased estimator of Cov(β̂) is Ĉov(β̂) = s²(X^⊤X)⁻¹.
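A sketch of computing s² and the resulting standard errors, including a numerical check of the identity SSE = y^⊤y − β̂^⊤X^⊤y (simulated data with illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.5, 2.0]) + rng.normal(scale=2.0, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

# s^2 = SSE / (n - k - 1), the unbiased estimator of sigma^2
sse = resid @ resid
s2 = sse / (n - k - 1)

# Quadratic-form identity: SSE = y'y - beta_hat' X'y
assert np.isclose(sse, y @ y - beta_hat @ X.T @ y)

# Estimated covariance matrix of beta_hat and the coefficient standard errors
cov_hat = s2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov_hat))
```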
Maximum likelihood estimation
To the assumptions A1 and A2, we now add A3: ε ∼ N_n(0, σ²I).
If y ∼ N_n(Xβ, σ²I) and X is an n × (k+1) design matrix with rank(X) = k+1 < n, then the maximum likelihood estimators (MLEs) of β and σ² are

\[
\hat{\boldsymbol{\beta}}_{\mathrm{MLE}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y} , \qquad
\hat{\sigma}^2_{\mathrm{MLE}} = \frac{1}{n} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^\top (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) .
\]
Whereas β̂_MLE is the same as the least squares estimator, σ̂²_MLE is biased.
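The bias can be seen by simulation: since SSE/σ² ∼ χ²(n−k−1), we have E(σ̂²_MLE) = σ²(n−k−1)/n < σ², while E(s²) = σ². A sketch (β = 0 is taken without loss of generality, because the residuals (I − H)y do not depend on Xβ):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, sigma2 = 20, 2, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix: projection onto col(X)

# Draw many response vectors at once; each row is one replication
reps = 10000
Y = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
resid = Y - Y @ H                        # H is symmetric, so (H y)^T = y^T H
sse = np.einsum('ij,ij->i', resid, resid)

mle = sse / n                            # sigma2_MLE: divides by n, biased
unbiased = sse / (n - k - 1)             # s^2: divides by n-k-1, unbiased

# E(sigma2_MLE) = sigma^2 (n-k-1)/n, while E(s^2) = sigma^2
assert np.isclose(mle.mean(), sigma2 * (n - k - 1) / n, rtol=0.02)
assert np.isclose(unbiased.mean(), sigma2, rtol=0.02)
```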
Some properties of the MLEs
1. The MLEs β̂_MLE and σ̂²_MLE have the following distributional properties:

(i) β̂_MLE ∼ N_{k+1}(β, σ²(X^⊤X)⁻¹).
(ii) nσ̂²_MLE/σ² ∼ χ²(n − k − 1).
(iii) β̂_MLE and σ̂²_MLE are independent.

2. If y ∼ N_n(Xβ, σ²I), then β̂_MLE and σ̂²_MLE are jointly sufficient statistics for the parameters β and σ².
The multiple linear regression model in centered form
Let x̄_j = ∑_{i=1}^{n} x_{ij}/n (j = 1, ..., k). The centered multiple linear regression model for y is

\[
\mathbf{y} = (\mathbf{1}_n \;\, \mathbf{X}_c) \begin{pmatrix} \alpha \\ \boldsymbol{\beta}_1 \end{pmatrix} + \boldsymbol{\varepsilon} ,
\]

where α = β_0 + β_1x̄_1 + ··· + β_kx̄_k, β_1 = (β_1, ..., β_k)^⊤, X_c = (I_n − n⁻¹1_n1_n^⊤)X_1 and

\[
\mathbf{X}_1 = \begin{pmatrix}
x_{11} & x_{12} & \dots & x_{1k} \\
x_{21} & x_{22} & \dots & x_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \dots & x_{nk}
\end{pmatrix} .
\]
Recall the centering matrix I_n − n⁻¹1_n1_n^⊤.
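The centering matrix is a symmetric, idempotent projection that removes the column means; a quick numerical check on illustrative data:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 10, 3
X1 = rng.normal(size=(n, k))

# Centering matrix C = I_n - (1/n) 1_n 1_n'
C = np.eye(n) - np.ones((n, n)) / n
Xc = C @ X1

# Each column of Xc has mean zero, i.e. Xc' 1_n = 0
assert np.allclose(Xc.mean(axis=0), 0.0)
# C is symmetric and idempotent (a projection matrix)
assert np.allclose(C, C.T)
assert np.allclose(C @ C, C)
```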
Least squares estimators in the centered model
The least squares estimators are given by

\[
\hat{\alpha} = \bar{y} , \qquad \hat{\boldsymbol{\beta}}_1 = (\mathbf{X}_c^\top \mathbf{X}_c)^{-1} \mathbf{X}_c^\top \mathbf{y} .
\]

The estimators above are the same as β̂ = (X^⊤X)⁻¹X^⊤y with the adjustment

\[
\hat{\beta}_0 = \hat{\alpha} - \hat{\beta}_1 \bar{x}_1 - \dots - \hat{\beta}_k \bar{x}_k = \bar{y} - \hat{\boldsymbol{\beta}}_1^\top \bar{\mathbf{x}} .
\]

We can express ŷ_1, ..., ŷ_n in centered form as follows: ŷ_i = α̂ + β̂_1(x_{i1} − x̄_1) + ··· + β̂_k(x_{ik} − x̄_k).

We can write the error sum of squares as follows:

\[
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \bar{y})^2 - \hat{\boldsymbol{\beta}}_1^\top \mathbf{X}_c^\top \mathbf{y} = \mathbf{y}^\top \mathbf{y} - \hat{\boldsymbol{\beta}}^\top \mathbf{X}^\top \mathbf{y} .
\]
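A sketch verifying that the centered fit reproduces the uncentered one (simulated data with non-zero predictor means, so that the centering actually matters):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 60, 2
X1 = rng.normal(loc=5.0, size=(n, k))          # predictors with non-zero means
X = np.column_stack([np.ones(n), X1])
y = X @ np.array([3.0, 1.0, -2.0]) + rng.normal(size=n)

# Uncentered fit
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Centered fit: alpha_hat = y_bar, beta1_hat from the centered predictors
Xc = X1 - X1.mean(axis=0)
alpha_hat = y.mean()
beta1_hat = np.linalg.solve(Xc.T @ Xc, Xc.T @ y)

# Slopes agree; the intercept is recovered as y_bar - beta1_hat' x_bar
assert np.allclose(beta1_hat, beta_hat[1:])
assert np.isclose(alpha_hat - beta1_hat @ X1.mean(axis=0), beta_hat[0])
```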
Coefficient of determination
Recall the coefficient of determination:

\[
R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} .
\]
For the multiple linear regression model, 0 ≤ R² ≤ 1 can be written as

\[
R^2 = \frac{\hat{\boldsymbol{\beta}}^\top \mathbf{X}^\top \mathbf{y} - n\bar{y}^2}{\mathbf{y}^\top \mathbf{y} - n\bar{y}^2}
= 1 - \frac{\mathbf{y}^\top \mathbf{y} - \hat{\boldsymbol{\beta}}^\top \mathbf{X}^\top \mathbf{y}}{\mathbf{y}^\top \mathbf{y} - n\bar{y}^2}
= \frac{\hat{\boldsymbol{\beta}}_1^\top \mathbf{X}_c^\top \mathbf{X}_c \hat{\boldsymbol{\beta}}_1}{\mathbf{y}^\top \mathbf{y} - n\bar{y}^2} .
\]
Some properties of R2
1. The positive square root of R² is the multiple correlation R between the response and the predictors.
2. The multiple correlation is equal to the simple correlation between the observed y_i's and the fitted ŷ_i's.

3. R² is invariant to full-rank linear transformations on the x's and to a scale change on y.
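Property 2 can be confirmed numerically: the squared correlation between y and ŷ equals R² (illustrative simulated data):

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

# R^2 = SSR / SST (the fitted values have mean y_bar because of the intercept)
sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
R2 = ssr / sst

# The multiple correlation R equals the simple correlation between y and y_hat
R = np.corrcoef(y, y_hat)[0, 1]
assert np.isclose(R**2, R2)
```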
Adjusted R²

Adding a predictor to the model cannot decrease the value of R².

However, this may conflict with the principle of parsimony.

An adjusted R²_a has been proposed that includes a penalty for adding a predictor variable to the model.
It is defined as

\[
R_a^2 = \frac{\left( R^2 - \frac{k}{n-1} \right)(n-1)}{n-k-1} = \frac{(n-1)R^2 - k}{n-k-1} .
\]
R²_a can be negative.
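A small sketch of the formula, illustrating that R²_a can indeed drop below zero when a weak fit is bought with many predictors (the input values are illustrative):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """R^2_a = ((n - 1) R^2 - k) / (n - k - 1)."""
    return ((n - 1) * r2 - k) / (n - k - 1)

# A weak fit with many predictors is penalized below zero:
# ((19 * 0.05) - 5) / 14 = -4.05 / 14, which is negative
assert adjusted_r2(0.05, n=20, k=5) < 0

# A perfect fit is left untouched: ((29 * 1) - 4) / 25 = 1
assert adjusted_r2(1.0, n=30, k=4) == 1.0
```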
Model setting
We now consider situations for which the assumption A2 is violated.

Instead we impose the assumption Cov(ε) = σ²V, where V ≠ I is a known symmetric positive definite matrix of size n × n.
The matrix V has n diagonal elements and n(n − 1)/2 elements above (or below) the diagonal.

In certain applications, a simpler structure for V (e.g. diagonal) is assumed.
Generalized least squares (GLS) estimators
For the model with Cov(ε) = σ²V, we obtain the following results:

(i) The BLUE of β is β̂ = (X^⊤V⁻¹X)⁻¹X^⊤V⁻¹y.

(ii) The covariance matrix for β̂ is Cov(β̂) = σ²(X^⊤V⁻¹X)⁻¹.

(iii) An unbiased estimator of σ² is

\[
s^2 = \frac{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^\top \mathbf{V}^{-1} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})}{n-k-1}
= \frac{\mathbf{y}^\top \left[ \mathbf{V}^{-1} - \mathbf{V}^{-1}\mathbf{X}(\mathbf{X}^\top \mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^\top \mathbf{V}^{-1} \right] \mathbf{y}}{n-k-1} ,
\]

where β̂ is given in (i).
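A sketch of the GLS estimator for a diagonal V (heteroscedastic errors, an illustrative choice); it also checks the equivalent "whitening" route, which premultiplies the model by V^(−1/2) and then runs ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -1.0])

# Known diagonal V: Var(eps_i) = sigma^2 * v_i (illustrative variances)
v = rng.uniform(0.5, 4.0, size=n)
y = X @ beta + rng.normal(scale=np.sqrt(v))

# GLS: beta_hat = (X' V^{-1} X)^{-1} X' V^{-1} y
Vinv = np.diag(1.0 / v)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

# Whitening: divide each row by sqrt(v_i), then ordinary least squares
Xw = X / np.sqrt(v)[:, None]
yw = y / np.sqrt(v)
beta_ols_w = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)

assert np.allclose(beta_gls, beta_ols_w)
```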
Maximum likelihood estimators
For the model with Cov(ε) = σ²V, the maximum likelihood estimators are

\[
\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{V}^{-1}\mathbf{X})^{-1} \mathbf{X}^\top \mathbf{V}^{-1} \mathbf{y} , \qquad
\hat{\sigma}^2 = \frac{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^\top \mathbf{V}^{-1} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})}{n} .
\]
Misspecification of the error structure
Suppose the model is y = Xβ + ε with E(y) = Xβ and Cov(y) = σ²V, and one uses the ordinary least squares estimator β̂_OLS = (X^⊤X)⁻¹X^⊤y to estimate β.

The consequences of using the ordinary least squares estimator on E(β̂_OLS) and Cov(β̂_OLS) for the case that the error structure Cov(ε) = σ²V holds will be discussed in the tutorial.
Model misspecification
Suppose the model is y = Xβ + ε with E(y) = Xβ and Cov(y) = σ²I.
Let the model be partitioned as

\[
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
= (\mathbf{X}_1 \;\, \mathbf{X}_2) \begin{pmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{pmatrix} + \boldsymbol{\varepsilon}
= \mathbf{X}_1 \boldsymbol{\beta}_1 + \mathbf{X}_2 \boldsymbol{\beta}_2 + \boldsymbol{\varepsilon} .
\]
Suppose we leave out X_2β_2 when it should be included, i.e., when β_2 ≠ 0.
By doing so, we misspecify E(y).
Reduced model
We consider estimation of β_1 when underfitting.

We write the reduced model as

\[
\mathbf{y} = \mathbf{X}_1 \boldsymbol{\beta}_1^* + \boldsymbol{\varepsilon}^* ,
\]

using β_1^* to emphasize that these parameters and their estimates β̂_1^* will be different from β_1 and β̂_1, respectively, in the full model.
Fitting the reduced model
If we fit the model y = X_1β_1^* + ε^* when the correct model is y = X_1β_1 + X_2β_2 + ε with Cov(y) = σ²I, then the following results for the least squares estimator β̂_1^* = (X_1^⊤X_1)⁻¹X_1^⊤y can be obtained:

E(β̂_1^*) = will be discussed in the tutorial,
Cov(β̂_1^*) = σ²(X_1^⊤X_1)⁻¹.

Furthermore, Cov(β̂_1) − Cov(β̂_1^*) = σ²AB⁻¹A^⊤, which is a positive definite matrix, where A = (X_1^⊤X_1)⁻¹X_1^⊤X_2 and B = X_2^⊤X_2 − X_2^⊤X_1A. Therefore, Var(β̂_j) > Var(β̂_j^*).
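The variance comparison can be checked numerically: the difference of the two covariance matrices should be positive semidefinite, so each full-model variance is at least as large as its reduced-model counterpart (the design matrices below are illustrative, with σ² = 1):

```python
import numpy as np

rng = np.random.default_rng(10)
n, sigma2 = 40, 1.0
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])  # kept predictors
X2 = rng.normal(size=(n, 2))                            # omitted predictors
X = np.column_stack([X1, X2])

# Cov(beta1_hat) in the full model: top-left block of sigma^2 (X'X)^{-1};
# Cov(beta1_hat*) in the reduced model: sigma^2 (X1'X1)^{-1}
cov_full = sigma2 * np.linalg.inv(X.T @ X)[:2, :2]
cov_red = sigma2 * np.linalg.inv(X1.T @ X1)

# The difference is positive semidefinite, so full-model variances are larger
diff = cov_full - cov_red
assert np.all(np.linalg.eigvalsh(diff) >= -1e-10)
assert np.all(np.diag(cov_full) >= np.diag(cov_red) - 1e-12)
```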
Underfitting and overfitting
Underfitting leads in general to biased results but lower variances.

Overfitting leads to unbiased results but greater variances.

Seek an adequate balance between a biased model and a model with large variances.
Task: find an optimum subset of predictors.
Orthogonalization
Suppose that in the full model y = X_1β_1 + X_2β_2 + ε the columns of X_1 are orthogonal to the columns of X_2, that is, X_1^⊤X_2 = O.

If X_1^⊤X_2 = O, then the least squares estimator β̂_1^* obtained from fitting the reduced model is unbiased: E(β̂_1^*) = β_1.

Moreover, if X_1^⊤X_2 = O, then the estimator of β_1 in the full model is the same as the estimator of β_1^* in the reduced model.
The process of orthogonalization can give additional insights into the meaning of the regression coefficients.
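A sketch of the orthogonal case: X_2 is constructed orthogonal to X_1 by projection, after which the full-model and reduced-model estimates of β_1 coincide (illustrative data):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 30
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])

# Make X2 orthogonal to X1 by projecting random columns off col(X1)
Z = rng.normal(size=(n, 2))
X2 = Z - X1 @ np.linalg.solve(X1.T @ X1, X1.T @ Z)
assert np.allclose(X1.T @ X2, 0.0, atol=1e-10)   # X1' X2 = O

y = X1 @ np.array([1.0, 2.0]) + X2 @ np.array([0.5, -0.5]) + rng.normal(size=n)

# With X1' X2 = O, X'X is block diagonal, so the beta_1 estimates agree
X = np.column_stack([X1, X2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)
beta_red = np.linalg.solve(X1.T @ X1, X1.T @ y)
assert np.allclose(beta_full[:2], beta_red)
```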