1
Lecture 3: Linear Models
Bruce Walsh lecture notes, Uppsala EQG course, version 28 Jan 2012
2
Quick Review of the Major Points
The general linear model can be written as

y = Xβ + e

• y = vector of observed dependent values
• X = design matrix: observations of the variables in the assumed linear model
• β = vector of unknown parameters to estimate
• e = vector of residuals (deviations from the model fit), e = y − Xβ
3
y = Xβ + e. The solution for β depends on the covariance structure (= covariance matrix) of the vector e of residuals.

Ordinary least squares (OLS)
• OLS: e ~ MVN(0, σ²I). Residuals are homoscedastic and uncorrelated, so that we can write the covariance matrix of e as Cov(e) = σ²I. The OLS estimate is OLS(β) = (X^T X)^(-1) X^T y

Generalized least squares (GLS)
• GLS: e ~ MVN(0, V). Residuals are heteroscedastic and/or dependent. GLS(β) = (X^T V^(-1) X)^(-1) X^T V^(-1) y
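As a concrete illustration (an editorial addition, not from the original notes), here is a minimal numpy sketch of both estimators; the data y, X, and the residual covariance V below are made-up placeholders.

```python
import numpy as np

# Made-up example data: n = 5 observations, p = 2 parameters (intercept + slope)
X = np.column_stack([np.ones(5), np.array([9., 10., 11., 12., 13.])])
y = np.array([60., 138., 131., 170., 221.])

# OLS: beta = (X'X)^-1 X'y, appropriate when Cov(e) = sigma^2 * I
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# GLS: beta = (X'V^-1 X)^-1 X'V^-1 y, for a general residual covariance V
V = np.diag([1., 1., 2., 2., 4.])          # hypothetical heteroscedastic variances
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

print(beta_ols, beta_gls)
```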
4
BLUE
• Both the OLS and GLS solutions are also called the Best Linear Unbiased Estimator (or BLUE for short)
• Whether the OLS or GLS form is used depends on the assumed covariance structure for the residuals
  – Special case of Var(e) = σ²_e I -- OLS
  – All others, i.e., Var(e) = R -- GLS
5
Linear Models

One tries to explain a dependent variable y as a linear function of a number of independent (or predictor) variables.

A multiple regression is a typical linear model,

y = µ + β1 x1 + β2 x2 + · · · + βn xn + e

Here e is the residual, or deviation between the true value observed and the value predicted by the linear model.

The (partial) regression coefficients are interpreted as follows: a unit change in xi while holding all other variables constant results in a change of βi in y.
6
Linear Models
As with a univariate regression (y = a + bx + e), the model parameters are typically chosen by least squares, wherein they are chosen to minimize the sum of squared residuals, Σ e_i².

This unweighted sum of squared residuals assumes an OLS error structure, so all residuals are equally weighted (homoscedastic) and uncorrelated.

If the residuals differ in variances and/or some are correlated (GLS conditions), then we need to minimize the weighted sum e^T V^(-1) e, which removes correlations and gives all residuals equal variance.
7
Predictor and Indicator Variables

Suppose we are measuring the offspring of p sires. One linear model would be

yij = µ + si + eij

yij = trait value of offspring j from sire i

µ = overall mean. This term is included to give the si terms a mean value of zero, i.e., they are expressed as deviations from the mean

si = the effect for sire i (the mean of its offspring). Recall that the variance in the si estimates Cov(half sibs) = VA/4

eij = the deviation of the jth offspring from the family mean of sire i. The variance of the e's estimates the within-family variance.
8
Predictor and Indicator Variables

In a regression, the predictor variables are typically continuous, although they need not be.

yij = µ + si + eij

Note that the predictor variables here are the si (the value associated with sire i), something that we are trying to estimate.

We can write this in linear model form by using indicator variables,

x_ik = 1 if sire k = i, 0 otherwise
9
Models consisting entirely of indicator variables are typically called ANOVA, or analysis of variance, models.

Models that contain no indicator variables (other than for the mean), but rather consist of observed values of continuous or discrete variables, are typically called regression models.

Both are special cases of the General Linear Model (or GLM).

Example: a nested half-sib/full-sib design with an age correction β on the trait,

yijk = µ + si + dij + βxijk + eijk
10
Example: a nested half-sib/full-sib design with an age correction β on the trait,

yijk = µ + si + dij + βxijk + eijk

The indicator-variable terms si and dij form the ANOVA part of the model, while βxijk is the regression part.

si = effect of sire i
dij = effect of dam j crossed to sire i
xijk = age of the kth offspring from the i × j cross
11
Linear Models in Matrix Form
Suppose we have 3 variables in a multiple regression, with four (y, x) vectors of observations,

yi = µ + β1 xi1 + β2 xi2 + β3 xi3 + ei

In matrix form, y = Xβ + e, with

\[
\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}, \quad
\boldsymbol{\beta} = \begin{pmatrix} \mu \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}, \quad
\mathbf{e} = \begin{pmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix} 1 & x_{11} & x_{12} & x_{13} \\ 1 & x_{21} & x_{22} & x_{23} \\ 1 & x_{31} & x_{32} & x_{33} \\ 1 & x_{41} & x_{42} & x_{43} \end{pmatrix}
\]

The design matrix X: details of both the experimental design and the observed values of the predictor variables all reside solely in X.
12
The Sire Model in Matrix Form
The model here is yij = µ + si + eij

Consider three sires. Sire 1 has 2 offspring, sire 2 has one, and sire 3 has three. The GLM form is

\[
\mathbf{y} = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{31} \\ y_{32} \\ y_{33} \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{pmatrix}, \quad
\boldsymbol{\beta} = \begin{pmatrix} \mu \\ s_1 \\ s_2 \\ s_3 \end{pmatrix}, \quad \text{and} \quad
\mathbf{e} = \begin{pmatrix} e_{11} \\ e_{12} \\ e_{21} \\ e_{31} \\ e_{32} \\ e_{33} \end{pmatrix}
\]

We still need to specify the covariance structure of the residuals. For example, there could be common-family effects that make some of them correlated.
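A small numpy sketch (an editorial addition, not part of the original notes) of how this indicator-variable design matrix might be built; the offspring trait values are hypothetical placeholders.

```python
import numpy as np

# Offspring counts per sire: sire 1 has 2, sire 2 has 1, sire 3 has 3
sire = np.array([1, 1, 2, 3, 3, 3])
y = np.array([9.2, 10.1, 8.7, 11.0, 10.4, 10.9])   # hypothetical trait values

# Design matrix: a column of 1s for mu, plus one 0/1 indicator column per sire
X = np.column_stack([np.ones(len(sire))] + [(sire == k).astype(float) for k in (1, 2, 3)])
print(X)

# Note X is not of full rank (the sire columns sum to the mu column), so
# (X'X)^-1 does not exist; a generalized inverse (pinv) gives one solution
beta = np.linalg.pinv(X) @ y
print(beta)
```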
13
In-class Exercise

Suppose you measure height and sprint speed for five individuals, with heights (x) of 9, 10, 11, 12, 13 and associated sprint speeds (y) of 60, 138, 131, 170, 221.

1) Write in matrix form (i.e., the design matrix X and vector β of unknowns) the following models:
• y = bx
• y = a + bx
• y = bx²
• y = a + bx + cx²

2) Using the X and y associated with these models, compute the OLS BLUE, β = (X^T X)^(-1) X^T y, for each.
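One possible numpy setup for part 2 of this exercise (this sketch is an editorial addition, not part of the original notes):

```python
import numpy as np

x = np.array([9., 10., 11., 12., 13.])
y = np.array([60., 138., 131., 170., 221.])
ones = np.ones_like(x)

# Design matrices for the four models
designs = {
    "y = bx":            np.column_stack([x]),
    "y = a + bx":        np.column_stack([ones, x]),
    "y = bx^2":          np.column_stack([x**2]),
    "y = a + bx + cx^2": np.column_stack([ones, x, x**2]),
}

# OLS BLUE: beta = (X'X)^-1 X'y
for name, X in designs.items():
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    print(name, beta)
```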
14
Rank of the design matrix

• With n observations and p unknowns, X is an n × p matrix, so that X^T X is p × p
• Thus, at most X can provide unique estimates for up to p < n parameters
• The rank of X is the number of independent rows of X. If X is of full rank, then rank = p
• A parameter is said to be estimable if we can provide a unique estimate of it. If the rank of X is k < p, then exactly k parameters are estimable (some as linear combinations, e.g., β1 − 3β3 = 4)
• If det(X^T X) = 0, then X is not of full rank
• The number of nonzero eigenvalues of X^T X gives the rank of X
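For instance, a quick numpy check of these rank diagnostics on the (rank-deficient) sire design matrix from the earlier slide; this sketch is an illustrative addition:

```python
import numpy as np

# Sire design matrix from the earlier slide: mu column plus three sire indicators
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)

XtX = X.T @ X
print(np.linalg.det(XtX))            # ~0: X is not of full rank
eigvals = np.linalg.eigvalsh(XtX)    # X'X is symmetric
print(np.sum(eigvals > 1e-10))       # number of nonzero eigenvalues = rank = 3
print(np.linalg.matrix_rank(X))      # same answer directly
```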
15
Experimental design and X

• The structure of X determines not only which parameters are estimable, but also the expected sample variances, as Var(β) = k (X^T X)^(-1)
• Experimental design determines the structure of X before an experiment (of course, missing data almost always means the final X is different from the proposed X)
• Different criteria are used for an optimal design. Let V = (X^T X)^(-1). The idea is to choose a design for X, given the constraints of the experiment, that:
  – A-optimality: minimizes tr(V)
  – D-optimality: minimizes det(V)
  – E-optimality: minimizes the leading eigenvalue of V
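As an illustration (an editorial addition), all three criteria are easy to compute for any candidate design matrix; the example design below is hypothetical:

```python
import numpy as np

def design_criteria(X):
    """Return the A-, D-, and E-optimality criteria for a design matrix X."""
    V = np.linalg.inv(X.T @ X)
    return {
        "A (trace)":              np.trace(V),
        "D (determinant)":        np.linalg.det(V),
        "E (leading eigenvalue)": np.max(np.linalg.eigvalsh(V)),
    }

# Hypothetical design: intercept plus a covariate measured at 9..13
X = np.column_stack([np.ones(5), np.arange(9., 14.)])
print(design_criteria(X))
```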
16
Ordinary Least Squares (OLS)

When the covariance structure of the residuals has a certain form, we solve for the vector β using OLS.

If the residuals are homoscedastic and uncorrelated, σ²(e_i) = σ²_e and σ(e_i, e_j) = 0. Hence, each residual is equally weighted, and the sum of squared residuals can be written as

\[
\sum_{i=1}^{n} \hat{e}_i^2 = \hat{\mathbf{e}}^T \hat{\mathbf{e}} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})
\]

where Xβ gives the predicted values of the y's.

If the residuals follow a MVN distribution, the OLS solution is also the ML solution.
17
Ordinary Least Squares (OLS)

Taking (matrix) derivatives shows that the sum of squared residuals

\[
\sum_{i=1}^{n} \hat{e}_i^2 = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})
\]

is minimized by

\[
\boldsymbol{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}
\]

This is the OLS estimate of the vector β. The variance-covariance matrix for the sample estimates is

\[
\mathbf{V}_{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\sigma_e^2
\]

The ij-th element gives the covariance between the estimates of β_i and β_j.
18
Sample Variances/Covariances
*"2e =
1n! rank(X)
n(
i=1
)e2i
V! = (XTX)!1"2e
The residual variance can be estimated as
The estimated residual variance can be substituted into
To give an approximation for the sampling variance and covariances of our estimates.
Confidence intervals follow since the vector of estimates ~ MVN(!, V!)
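A numpy sketch of these formulas (an editorial illustration using the exercise data from earlier; the 1.96 multiplier assumes an approximate normal interval):

```python
import numpy as np

x = np.array([9., 10., 11., 12., 13.])
y = np.array([60., 138., 131., 170., 221.])
X = np.column_stack([np.ones_like(x), x])        # model y = a + bx

beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

# Residual variance: sum of squared residuals / (n - rank(X))
n, rank = len(y), np.linalg.matrix_rank(X)
sigma2_e = resid @ resid / (n - rank)

# Sampling variance-covariance matrix of the estimates
V_beta = np.linalg.inv(X.T @ X) * sigma2_e
se = np.sqrt(np.diag(V_beta))

# Approximate 95% confidence intervals (normal approximation)
for b, s in zip(beta, se):
    print(f"{b:8.2f}  +/- {1.96 * s:6.2f}")
```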
19
Example: Regression Through the Origin
yi = βxi + ei

Here

\[
\mathbf{X} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad
\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
\boldsymbol{\beta} = (\beta)
\]

giving

\[
\mathbf{X}^T\mathbf{X} = \sum_{i=1}^{n} x_i^2, \qquad
\mathbf{X}^T\mathbf{y} = \sum_{i=1}^{n} x_i y_i
\]

so that

\[
\beta = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y} = \frac{\sum x_i y_i}{\sum x_i^2}
\]

The sampling variance of the estimate and the estimated residual variance are

\[
\sigma^2(b) = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\sigma_e^2 = \frac{\sigma_e^2}{\sum x_i^2}, \qquad
\hat{\sigma}_e^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(y_i - \beta x_i\right)^2
\]
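A brief numerical check of these closed forms (an editorial addition, using made-up x and y values):

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])    # made-up data, roughly y = 2x

# Closed form for regression through the origin
beta = np.sum(x * y) / np.sum(x**2)

# Same thing via the general matrix formula, with X a single column
X = x.reshape(-1, 1)
beta_matrix = np.linalg.solve(X.T @ X, X.T @ y)[0]

# Residual variance and sampling variance of beta
sigma2_e = np.sum((y - beta * x)**2) / (len(x) - 1)
var_beta = sigma2_e / np.sum(x**2)

print(beta, beta_matrix, var_beta)
```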
20
Polynomial Regressions
The GLM can easily handle any function of the observed predictor variables, provided the parameters to estimate are still linear, e.g., y = α + β1 f(x) + β2 g(x) + … + e

Quadratic regression:

yi = α + β1 xi + β2 xi² + ei

\[
\boldsymbol{\beta} = \begin{pmatrix} \alpha \\ \beta_1 \\ \beta_2 \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix}
\]
21
Interaction Effects

Interaction terms (e.g., sex × age) are handled similarly,

yi = α + β1 xi1 + β2 xi2 + β3 xi1 xi2 + ei

\[
\boldsymbol{\beta} = \begin{pmatrix} \alpha \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix} 1 & x_{11} & x_{12} & x_{11}x_{12} \\ 1 & x_{21} & x_{22} & x_{21}x_{22} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n1} & x_{n2} & x_{n1}x_{n2} \end{pmatrix}
\]

With x1 held constant, a unit change in x2 changes y by β2 + β3 x1 (i.e., the slope in x2 depends on the current value of x1).

Likewise, a unit change in x1 changes y by β1 + β3 x2.
22
The GLM lets you build your own model!

• Suppose you want a quadratic regression forced through the origin where the slope of the quadratic term can vary over the sexes
• yi = β1 xi + β2 xi² + β3 si xi²
• si is an indicator (0/1) variable for the sex (0 = male, 1 = female)
  – Male quadratic slope = β2
  – Female quadratic slope = β2 + β3
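A sketch of the corresponding design matrix in numpy (editorial addition; the x, sex, and y values are placeholders):

```python
import numpy as np

x   = np.array([9., 10., 11., 12., 13., 14.])
sex = np.array([0,  1,   0,   1,   0,   1 ])                   # 0 = male, 1 = female
y   = np.array([60., 150., 131., 190., 221., 280.])            # hypothetical

# Columns: x (shared linear term), x^2 (male quadratic slope),
# sex*x^2 (female deviation in the quadratic slope).
# No intercept column: the regression is forced through the origin.
X = np.column_stack([x, x**2, sex * x**2])

b1, b2, b3 = np.linalg.solve(X.T @ X, X.T @ y)
print("male quadratic slope:", b2, " female quadratic slope:", b2 + b3)
```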
23
Fixed vs. Random Effects

In linear models we are trying to accomplish two goals: estimate the values of the model parameters and estimate any appropriate variances.

For example, in the simplest regression model, y = α + βx + e, we estimate the values of α and β and also the variance of e. We can, of course, also estimate the ei = yi − (α + βxi).

Note that α and β are fixed constants we are trying to estimate (fixed factors or fixed effects), while the ei values are drawn from some probability distribution (typically normal with mean 0, variance σ²_e). The ei are random effects.
24
This distinction between fixed and random effects is extremely important in terms of how we analyze a model. If a parameter is a fixed constant we wish to estimate, it is a fixed effect. If a parameter is drawn from some probability distribution and we are trying to make inferences on either the distribution and/or specific realizations from this distribution, it is a random effect.

We generally speak of estimating fixed factors and predicting random effects.

"Mixed" models contain both fixed and random factors,

y = Xb + Zu + e,   u ~ MVN(0, R),   e ~ MVN(0, σ²_e I)

Key: one needs to specify the covariance structures for a mixed model.
25
Example: Sire model

yij = µ + si + eij

Here µ is a fixed effect and e a random effect. Is the sire effect s fixed or random?

It depends. If we have (say) 10 sires, and we are ONLY interested in the values of these particular 10 sires and don't care to make any other inferences about the population from which the sires are drawn, then we can treat them as fixed effects. In this case, the model is fully specified by the covariance structure for the residuals. Thus, we need to estimate µ, s1 to s10, and σ²_e, and we write the model as yij = µ + si + eij, with σ²(e) = σ²_e I.
26
yij = µ + si + eij

Conversely, if we are not only interested in these 10 particular sires but also wish to make some inference about the population from which they were drawn (such as the additive variance, since σ²_A = 4σ²_s), then the si are random effects. In this case we wish to estimate µ and the variances σ²_s and σ²_w. Since 2si also estimates (or predicts) the breeding value for sire i, we also wish to estimate (predict) these as well. Under a random-effects interpretation, we write the model as yij = µ + si + eij, with σ²(e) = σ²_e I and σ²(s) = σ²_A A.
27
Generalized Least Squares (GLS)
Suppose the residuals no longer have the same variance (i.e., they display heteroscedasticity). Clearly we do not wish to minimize the unweighted sum of squared residuals, because those residuals with smaller variance should receive more weight.

Likewise, in the event the residuals are correlated, we also wish to take this into account (i.e., perform a suitable transformation to remove the correlations) before minimizing the sum of squares.

Either of the above settings leads to a GLS solution in place of an OLS solution.
28
In the GLS setting, the covariance matrix for the vector e of residuals is written as R, where Rij = σ(ei, ej).

The linear model becomes y = Xβ + e, with cov(e) = R.

The GLS solution for β is

\[
\mathbf{b} = \left(\mathbf{X}^T\mathbf{R}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{R}^{-1}\mathbf{y}
\]

The variance-covariance matrix of the estimated model parameters is given by

\[
\mathbf{V}_b = \left(\mathbf{X}^T\mathbf{R}^{-1}\mathbf{X}\right)^{-1}\sigma_e^2
\]
29
Example

One common setting is where the residuals are uncorrelated but heteroscedastic, with σ²(ei) = σ²_e / wi.

For example, sample i is the mean value of ni individuals, with σ²(ei) = σ²_e / ni. Here wi = ni.

Consider the model yi = α + βxi + ei. Here

\[
\boldsymbol{\beta} = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \quad
\mathbf{R} = \mathrm{Diag}\left(w_1^{-1}, w_2^{-1}, \ldots, w_n^{-1}\right), \quad
\mathbf{R}^{-1} = \mathrm{Diag}\left(w_1, w_2, \ldots, w_n\right)
\]

giving

\[
\mathbf{X}^T\mathbf{R}^{-1}\mathbf{X} = w \begin{pmatrix} 1 & \overline{x}_w \\ \overline{x}_w & \overline{x^2}_w \end{pmatrix}, \qquad
\mathbf{X}^T\mathbf{R}^{-1}\mathbf{y} = w \begin{pmatrix} \overline{y}_w \\ \overline{xy}_w \end{pmatrix}
\]

where the weighted averages are

\[
w = \sum_{i=1}^{n} w_i, \quad
\overline{x}_w = \sum_{i=1}^{n} \frac{w_i x_i}{w}, \quad
\overline{x^2}_w = \sum_{i=1}^{n} \frac{w_i x_i^2}{w}, \quad
\overline{y}_w = \sum_{i=1}^{n} \frac{w_i y_i}{w}, \quad
\overline{xy}_w = \sum_{i=1}^{n} \frac{w_i x_i y_i}{w}
\]

This gives the GLS estimators of α and β as

\[
a = \overline{y}_w - b\,\overline{x}_w, \qquad
b = \frac{\overline{xy}_w - \overline{x}_w\,\overline{y}_w}{\overline{x^2}_w - \overline{x}_w^{\,2}}
\]

Likewise, the resulting variances and covariance for these estimators are

\[
\sigma^2(b) = \frac{\sigma_e^2}{w\left(\overline{x^2}_w - \overline{x}_w^{\,2}\right)}, \qquad
\sigma(a, b) = \frac{-\sigma_e^2\,\overline{x}_w}{w\left(\overline{x^2}_w - \overline{x}_w^{\,2}\right)}, \qquad
\sigma^2(a) = \frac{\sigma_e^2\,\overline{x^2}_w}{w\left(\overline{x^2}_w - \overline{x}_w^{\,2}\right)}
\]
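A numpy sketch of this weighted (GLS) regression (an editorial illustration with made-up group means and sample sizes):

```python
import numpy as np

# Made-up example: each y_i is a group mean based on n_i individuals, so w_i = n_i
x = np.array([1., 2., 3., 4., 5.])
y = np.array([2.3, 3.1, 4.2, 4.8, 6.1])
w = np.array([10., 4., 8., 2., 6.])

W = np.sum(w)
xw, yw = np.sum(w * x) / W, np.sum(w * y) / W            # weighted means
x2w, xyw = np.sum(w * x**2) / W, np.sum(w * x * y) / W

# Closed-form GLS (weighted least squares) estimates
b = (xyw - xw * yw) / (x2w - xw**2)
a = yw - b * xw

# Same result from the general GLS formula with R = Diag(1/w_i)
X = np.column_stack([np.ones_like(x), x])
Rinv = np.diag(w)
ab = np.linalg.solve(X.T @ Rinv @ X, X.T @ Rinv @ y)
print((a, b), ab)
```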
32
Chi-square and F distributions
Let Ui ~ N(0, 1), i.e., a unit normal. The sum U1² + U2² + … + Uk² is a chi-square random variable with k degrees of freedom.

Under appropriate normality assumptions, the sums of squares that appear in linear models are also chi-square distributed. In particular, for xi ~ N(µ, σ²),

\[
\sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{\sigma^2} \sim \chi^2_{n-1}
\]

The ratio of two chi-squares is an F distribution.
33
In particular, an F distribution with k numerator degrees of freedom and n denominator degrees of freedom is given by

\[
\frac{\chi^2_k / k}{\chi^2_n / n} \sim F_{k,n}
\]

Thus, F distributions frequently arise in tests of linear models, as these usually involve ratios of sums of squares.
34
Sums of Squares in linear models
The total sum of squares (SST) of a linear model can be written as the sum of the error (or residual) sum of squares and the model (or regression) sum of squares,

\[
\mathrm{SS_T} = \mathrm{SS_M} + \mathrm{SS_E}, \qquad
\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i (y_i - \hat{y}_i)^2
\]

r², the coefficient of determination, is the fraction of variation accounted for by the model,

\[
r^2 = \frac{\mathrm{SS_M}}{\mathrm{SS_T}} = 1 - \frac{\mathrm{SS_E}}{\mathrm{SS_T}}
\]
35
Sums of Squares are quadratic products

\[
\mathrm{SS_E} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{e}_i^2
\]

\[
\mathrm{SS_T} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2
\]

We can write these as quadratic products, where J is a matrix all of whose elements are 1's:

\[
\mathrm{SS_T} = \mathbf{y}^T\mathbf{y} - \frac{1}{n}\mathbf{y}^T\mathbf{J}\mathbf{y} = \mathbf{y}^T\left(\mathbf{I} - \frac{1}{n}\mathbf{J}\right)\mathbf{y}
\]

\[
\mathrm{SS_E} = \mathbf{y}^T\left(\mathbf{I} - \mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\right)\mathbf{y}
\]

\[
\mathrm{SS_M} = \mathrm{SS_T} - \mathrm{SS_E} = \mathbf{y}^T\left(\mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T - \frac{1}{n}\mathbf{J}\right)\mathbf{y}
\]
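A numpy check of these quadratic-form expressions against the direct sums (editorial addition, reusing the height/speed data from the exercise):

```python
import numpy as np

x = np.array([9., 10., 11., 12., 13.])
y = np.array([60., 138., 131., 170., 221.])
X = np.column_stack([np.ones_like(x), x])
n = len(y)

I = np.eye(n)
J = np.ones((n, n))                               # matrix of all 1's
H = X @ np.linalg.inv(X.T @ X) @ X.T              # projection ("hat") matrix

SST = y @ (I - J / n) @ y
SSE = y @ (I - H) @ y
SSM = y @ (H - J / n) @ y

r2 = SSM / SST
print(SST, SSE + SSM, r2)                         # SST should equal SSE + SSM
```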
36
Expected value of sums of squares

• In ANOVA tables, the E(MS), or expected value of the Mean Squares (scaled SS, or Sums of Squares), often appears
• This follows directly from the quadratic product: if E(x) = µ and Var(x) = V, then
  – E(x^T A x) = tr(AV) + µ^T A µ
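A quick Monte Carlo check of this identity (editorial addition; the A, V, and µ below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([1.0, -0.5, 2.0])
V = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 0.5]])
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 2.0, 0.0],
              [1.0, 0.0, 3.0]])

# Theory: E(x'Ax) = tr(AV) + mu'A mu
theory = np.trace(A @ V) + mu @ A @ mu

# Simulation: average x'Ax over draws x ~ MVN(mu, V)
xs = rng.multivariate_normal(mu, V, size=200_000)
sim = np.mean(np.einsum("ni,ij,nj->n", xs, A, xs))

print(theory, sim)    # these should agree closely
```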
37
Hypothesis testing
Provided the residual errors in the model are MVN, then for a model with n observations and p estimated parameters,

\[
\frac{\mathrm{SS_E}}{\sigma_e^2} \sim \chi^2_{n-p}
\]

Consider the comparison of a full (p parameters) and a reduced (q < p parameters) model, where SSE_r = error SS for the reduced model and SSE_f = error SS for the full model. The difference in the error sums of squares for the full and reduced models provides a test of whether the model fits are the same:

\[
\frac{(\mathrm{SSE}_r - \mathrm{SSE}_f)/(p - q)}{\mathrm{SSE}_f/(n - p)}
= \left(\frac{n-p}{p-q}\right)\left(\frac{\mathrm{SSE}_r}{\mathrm{SSE}_f} - 1\right)
\]

This ratio follows an F(p−q, n−p) distribution.
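A sketch of this full-versus-reduced test in Python (editorial addition; it uses scipy.stats.f for the p-value and the height/speed data as the example):

```python
import numpy as np
from scipy import stats

x = np.array([9., 10., 11., 12., 13.])
y = np.array([60., 138., 131., 170., 221.])

def sse(X, y):
    """Error sum of squares after OLS fitting."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid

# Full model: y = a + bx + cx^2 (p = 3); reduced model: y = a + bx (q = 2)
X_full = np.column_stack([np.ones_like(x), x, x**2])
X_red  = np.column_stack([np.ones_like(x), x])
n, p, q = len(y), X_full.shape[1], X_red.shape[1]

F = ((sse(X_red, y) - sse(X_full, y)) / (p - q)) / (sse(X_full, y) / (n - p))
p_value = stats.f.sf(F, p - q, n - p)
print(F, p_value)
```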
38
Does our model account for a significant fraction of the variation?

Here the reduced model is just yi = µ + ei. In this case, the error sum of squares for the reduced model is just the total sum of squares, and the F-test ratio becomes

\[
\left(\frac{n-p}{p-1}\right)\left(\frac{\mathrm{SS_T}}{\mathrm{SSE}_f} - 1\right)
= \left(\frac{n-p}{p-1}\right)\left(\frac{r^2}{1 - r^2}\right)
\]

This ratio follows an F(p−1, n−p) distribution.
39
Model diagnostics
• It's all about the residuals
• Plot the residuals
  – A quick and easy screen for outliers
• Test for normality among the estimated residuals
  – Q-Q plot
  – Shapiro-Wilk test
  – If non-normal, try transformations, such as log
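A minimal diagnostics sketch (editorial addition) using scipy.stats.shapiro for the normality test; matplotlib is assumed for the plots:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

x = np.array([9., 10., 11., 12., 13.])
y = np.array([60., 138., 131., 170., 221.])
X = np.column_stack([np.ones_like(x), x])

beta = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ beta
resid = y - fitted

# Residuals vs fitted values: a quick screen for outliers and heteroscedasticity
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.scatter(fitted, resid)
ax1.axhline(0.0, linestyle="--")
ax1.set_xlabel("fitted"); ax1.set_ylabel("residual")

# Q-Q plot against the normal, plus a Shapiro-Wilk test of normality
stats.probplot(resid, dist="norm", plot=ax2)
W, p = stats.shapiro(resid)
print("Shapiro-Wilk W =", W, "p =", p)
plt.show()
```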
40
OLS, GLS summary
41
Different statistical models
• GLM = general linear model
  – OLS (ordinary least squares): e ~ MVN(0, cI)
  – GLS (generalized least squares): e ~ MVN(0, R)
• Mixed models
  – Both fixed and random effects (beyond the residual)
• Mixture models
  – A weighted mixture of distributions
• Generalized linear models
  – Nonlinear functions, non-normality
42
Mixture models

• Under a mixture model, an observation potentially comes from one of several different distributions, so that the density function is π1 φ1 + π2 φ2 + π3 φ3
  – The mixture proportions πi sum to one
  – The φi represent different distributions, e.g., normal with mean µi and variance σ²
• Mixture models come up in QTL mapping -- an individual could have QTL genotype QQ, Qq, or qq
  – See Lynch & Walsh Chapter 13
• They also come up in codon models of evolution, where a site may be neutral, deleterious, or advantageous, each with a different distribution of selection coefficients
  – See Walsh & Lynch (volume 2A website), Chapters 10, 11
43
Generalized linear models
These typically assume a non-normal distribution for the residuals, e.g., Poisson, binomial, gamma, etc.
44
Likelihoods for GLMs

Under the assumption of multivariate normality, x ~ MVN(β, V), the likelihood function becomes

\[
L(\boldsymbol{\beta}, \mathbf{V} \mid \mathbf{x}) \propto (2\pi)^{-n/2}\,|\mathbf{V}|^{-1/2} \exp\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\beta})^T \mathbf{V}^{-1} (\mathbf{x} - \boldsymbol{\beta})\right]
\]

Variance components (e.g., σ²_A, σ²_e, etc.) are included in V.

REML = restricted maximum likelihood. This is the method of choice for variance components, as it maximizes that part of the likelihood function that is independent of the fixed effects, β.