Upload
others
View
12
Download
0
Embed Size (px)
Citation preview
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Contents
1 Models with fixed effects1 Modelisation preambles and linear (fixed effects) models (LM)2 Generalized linear (fixed effects) models (GLM)
2 Linear Mixed Models (LMM)1 Definition and notations2 Estimation and algorithms3 Models selection
3 Generalized Linear Mixed Models (GLMM)1 Definition and notations2 Estimation and algorithms3 Model selection
4 Longitudinal, overdispersion and non linear mixed models examples
5 Experimentation using R
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Basics for model construction
1 a set of observations for :
y a response variable (i.e. dependent variable)x1, x2, ... explanatory variables
with matrix notation :
y =
y1
...yn
X =
x1,1 . . . x1,K
......
...xn,1 . . . xn,K
y is a realization of the n × 1 response random vector YX is the observation of the n × K design matrix
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
2 an assumption on the distribution of Y
Definition (Parametric density family)
A parametric density family is a collection of density functions fθ(y),described by a finite-dimensional parameter θ. The set of all allowablevalues for the parameter is denoted Θ ⊆ RK , and the model itself iswritten as
F = {fθ(y) | θ ∈ Θ}
Example
Gaussian parametric model
F =
{fµ,σ(y) = 1√
2πσe− (y−µ)2
2σ2 | µ ∈ R, σ > 0
}Exponential parametric model
F ={
fλ(y) = λe−λy | λ > 0}
Binomial parametric model
F ={
fp(y) = C yn py (1− p)n−y | 0 ≤ p ≤ 1
}C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
3 a link between Y and the X ’s
for ’linear models’, this link is assumed between :
the mean (i.e. expected value)
µ = E(Y )
a linear combination of the X ’s through the linear predictor
η = Xβ
byg(µ) = η
with g(µ) = µ for the normality assumption !
Remark
Linearity is expressed in terms of linear combination of parameters
η = µ+ β1x + β2x2
takes part of a linear model but relation between y and x is no morelinear.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
About explanatory variables
2 types of explanatory variables :
1 factors↪→ interest is in attributing variability in y to various categories ofthe factor
Example: patients classified by gender (M/F) and age group(A/B/C)
ηij = µ+ αi + βj i = 1, 2 j = 1, 2, 3
↪→ parameter values give the impact of factor’s levels on theresponse variable
factors may be crossed or nestedfactors may have main effect and interaction effect
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
2 regressors↪→ interest is in attributing variability in y to changes in values of acontinuous covariable
Example: changes due to weight x
η = β0 + β1x
↪→ parameter values give the impact of an increase in x on theresponse variable
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Fixed vs Random effects
Example
4 loaves of bread are taken from each of 6 batches of bread baked at 3different temperatures
interest of course in each particular baking temperature used
no interest in each batch which are very depending on thecircumstances
batches effect can be viewed as a sample of a random batch effect(levels are chosen at random from an infinite set of batches levels)
interest in estimating the variance of the batch effect as a source ofrandom variation in the data
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
fixed effect (factor) is defined with a finite set of levels and wheninterest lies in the estimation of each particular level effect
random effect (factor) is defined with an infinite set of levels (withonly a finite subset present in the data collection) and when interestlies more in the variance induced by these levels than in theestimation of the levels themselves
↪→ Consequence: data collected within each level of the random effectfactor are linked to a same realization of a random variable. Thisintroduce dependency between this data.
Example
data collected on the 4 loaves in each batch share something linked tothe batch itself !
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Example (Placebo and drug)
Clinical trial of treating epileptics with a drug.Patients randomly allocated to either drug or placebo.Response: number of seizures for patient k receiving treatment i
E(Yik ) = µi = µ+ αi
The 2 treatments are the only 2 being used and there is no thought forany other treatments. Each level is of interest.
↪→ Treatment is a fixed effect factor.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Example (cont.)
Suppose the clinical trial is composed of repetition at 20 different clinicsin the city.Response: number of seizures for patient k receiving treatment i inclinics j
E(Yijk ) = µij = µ+ αi + bj
Clinics can be viewed as a random sample of clinics.Inference will be made on the population of clinics effects and inparticular the variance (magnitude of the variations among clinics).
↪→ Clinics is a random effect factor.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Example (cont.)
Assume the procedure is very specialized so that the trial is onlyconducted in a very few number of referral hospitals.We can no longer consider the clinics selected as a sample from a largergroup of clinics.We want to make inference only to the clinics in the study.
↪→ Clinics is a fixed effect factor.
The situation is determining whether clinics effect is to be consideredfixed or random.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Help for decision random or fixed
Are the levels of the factor going to be considered as a random samplefrom a population of effects ?
Is there enough information about a factor to decide that the levels of itin the data are like a random sample ?
Does each level of the factor have a constant effect we want to estimate ?
Do we want to identify the levels variation as a source of variability ?
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Mathematical notation for fixed effects
E(Yik ) = µ+ αi
Mathematical notation and properties for random effects
E(Yik |ai ) = µ+ ai
when ai are the levels of a random effect; ai ’s are treated as randomvariables.2 usual assumptions :
1 ai ’s are independently and identically distributed
2 ai ’s have zero mean and all the same variance σ2a .
ai ∼ i .i .d .(0, σ2a)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Is height depending on sex and weight?
Observations
(y , s,w) the height,the sex and theweight of 100
individuals sampledfrom a population
Height in cm
Fre
quen
cy
160 165 170 175
05
1015
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Model construction
yij is a realization for individual i of sex j of Yij from a parametricdensity family FWe assume:
F is the gaussian parametric family,
Height in cm
Fre
quen
cy
160 165 170 175
05
1015
Residuals
Fre
quen
cy
−2 −1 0 1 2 3
05
1015
20
Yij are independentE(Yij ) = µ+ αj + βwij where µ is the intercept, αj the sex fixedeffects (male and female), β fixed effect of weight.V(Yij ) = σ2,∀i , j
this implies1 Yij ∼ N (µ+ αj + βwij , σ
2)2 µ, αj , β and σ2 are the unknown parameters.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Is number of foliar blotches depending on clone?
Observations
(y , c) number offoliar blotches and
clone identifiersampled from apopulation. 25
individuals for eachof 4 clones havebeen measured.
0 2 4 6 8 10 12 14
010
2030
4050
60
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Is number of foliar blotches depending on clone?
Model construction
yij is a realization for individual i of clone j of Yij from theparametric Poisson family FF = { e−λλy
y ! , λ > 0} and E(Y ) = λ.
We assume:
Yij are independentηij = µ+ αj where µ is the intercept, αj the clone fixed effectslog link: log[E(Yij )] = log(λij ) = ηij
this implies1 V(Yij ) = E(Yij )2 µ and αj ’s are unknown parameters.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Is number of foliar blotches depending on clone?
Observations
(y , c) number offoliar blotches and
clone identifiersampled from apopulation. 25
individuals per clonehave beenmeasured.
z = 1 if tree is non infectedz = 0 if tree is infected
number of non-infected trees per clone
C1 C2 C3 C419 17 4 0
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Model construction
zij is a realization for individual i of clone j of Zij from theparametric Bernoulli family FF =
{fp(z) = pz (1− p)1−z | 0 ≤ p ≤ 1
}and E(Z ) = p.
We assume:
Zij ’s are independentηij = µ+ αj where µ is the intercept, αj the clone fixed effectslogit link: logit(pij ) = log(
pij
1−pij) = ηij
this implies1 V(Zij ) = pij (1− pij )2 µ and αj ’s are unknown parameters.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Is height depending on family?
Observations
30 families havebeen studied and 10
individuals perfamily have been
measured. Let(y , f ) height andfamily of the 300
individuals sampledfrom a population
0 5 10 15 20
020
4060
80
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Model construction 1: select the best families
Estimate the value of each family jWe assume:
1 Yij comes from the gaussian parametric density,2 Yij are independent3 E(Yij ) = µ+ αj where µ is the intercept, αj the family fixed effects4 V(Yij ) = σ2,∀i , j
this implies1 family fixed effects2 individuals belonging to a same family are assumed to be independent
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Model construction 2: genetic breeding program
estimate the family variability in the whole populationWe assume:
1 Yij comes from the gaussian parametric density,2 E(Yij |αj ) = µ+ αj where µ is the intercept, αj the family random
effects3 conditionnally to αj , Yij are independent4 V(Yij |αj ) = σ2, ∀i , j5 αj ∼ N (0, σ2
α)
this implies1 marginal expectation E(Yij ) = µ2 marginal correlation
ρ(Yij ,Yi′j′) =
{σ2α
σ2α+σ2 if j = j ′
0 if j 6= j ′
3 individuals belonging to a same family are not independent
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Linear Models definition
Let (yi , xi1, . . . , xiK )ni=1 denote K + 1 observations for n individuals.
(yi )ni=1 realizations of n random variables (Yi )
ni=1. Yi are modeled by a
linear model when
1 Yi comes from the gaussian parametric density family
2 Yi are independent
3 predictor: ηi = β0 +∑K
k=1 xikβk (linear combination of parameters)
4 expectation with identity link: E(Yi ) = ηi
5 variance: V(Yi ) = σ2, ∀i
Yi = β0 +K∑
k=1
xikβk︸ ︷︷ ︸fixed part
+ εi︸︷︷︸error part
εi iid ∼ N (0, σ2)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Matrix formulation
Whatever the nature of the explanatory variables (regressors, factors),linear model always could be defined as follows:
Y = Xβ + ε
with β = (β0, β1, . . . , βK )′
6 observations3 regressors
Y1
.
.
.Y6
=
1 x1,1 . . . x1,3
.
.
.
.
.
.
.
.
.
.
.
.1 x6,1 . . . x6,3
β0β1β2β3
+ε
1 factor, 2 levels
Y1
.
.
.Y6
=
1 1 01 1 01 1 01 0 11 0 11 0 1
β0
β1β2
+ε
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Terminology :
Multiple Linear Regression/ANOVA/ANCOVA
if matrix X only contains regressors, models are called multipleregression models
if matrix X only contains factors, model are called Analysis OfVariance (ANOVA) models (X is a matrix composed with 0 or 1).
if matrix X contains both regressors and factors, models are calledAnalysis of Covariance (ANCOVA) models.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
R procedure
> res.lm <- lm(formula=y ∼ x+f+x:f, data = data, offset
=rep(0,n), weights=rep(1,n))
> summary(res.lm)
> anova(res.lm)
> names(res.lm)
> res.lm$coefficients
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Gaussian hypothesis
Graphical
histogram, QQ-plot,
> hist(res.lm$residuals)
> par(mfrow=c(2,2)
> plot(res.lm)
Statistical test
Kolmogorov-Smirnov
> ks.test(res.lm$res, "pnorm")
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Homoscedasticity hypothesis
Graphical
residual/fitted
> par(mfrow=c(1,2)
> plot(res.lm$fitted.values,
res.lm$res) ;
> plot(res.lm$res)
> abline(v=0)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Independence hypothesis
Difficult to test !!!
have a look at the plot of the residualsfor time correlation : Durbin-Watson test ...
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Estimation
Let’s assume a linear model :
Y = Xβ + ε
with ε ∼ N (0, σ2Id)
Parameters to be estimated are β, σIn all the following, X is supposed of full rank: rank(X )= K
Least squares approach : min(||Y − Xβ||2)
βls = (X ′X )−1X ′Y
best linear unbiased estimator of β
βls ∼ N (β, σ2(X ′X )−1)
best quadratic unbiased estimator of σ2
σ2ls =
1
n − K(Y − X β)′(Y − X β) and σ2
ls ∼σ2
n − Kχ2(n − K )
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Preambles for estimation
Definition (Likelihood function)
The likelihood function is a function of the parameters of a statisticalmodel and is defined as
Θ ↪→ Rθ 7→ fθ(y),
and will be denoted as L(θ; y)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Maximum likelihood approach
Likelihood
L(β, σ, y) =n∏
i=1
1√2πσ2
e−1
2σ2 (yi−x′i β)′(yi−x′i β)
Log-likelihood
`(β, σ, y) = −n
2log (2πσ2)− 1
2σ2(y − Xβ)′(y − Xβ)
Maximum log-likelihood
∂β,σ`(β, σ, y) = 0⇒
βml = (X ′X )−1X ′Y
σ2ml =
1
n(Y − X β)′(Y − X β)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
βls = βml
but σ2ls 6= σ2
ml
σ2ls :
is unbiasedis calculated on the orthogonal of Xit takes into account the difference between Y and its projection onX : X β
σ2ml :
is biasedestimation of σ2 and β are made jointlyit does note take into account the lost in degrees of freedom due tothe estimation of β
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Goodness of fit criterion
Adjusted R-square
R2 = 1−∑n
i=1(yi − x ′i β)2/(n − K )∑ni=1(yi − y)2/(n − 1)
Akaike’s Information Criterion
AIC = −2 logL(βml , σml , y) + 2K
Bayesian Information Criterion
BIC = −2 logL(βml , σml , y) + K log(n)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Variable selection in multiple regression
The main approaches
Forward selection, which involves starting with no variables in themodel, trying out the variables one by one and including them ifthey are statistically significant.
Backward elimination, which involves starting with all candidatevariables and testing them one by one for statistical significance,deleting any that are not significant.
Stepwise methods that are a combination of the above, testing ateach stage for variables to be included or excluded.
step function
For AIC: > step(model, data, direction = c("both",
"backward", "forward"), k=2, trace)
k=2 is by default
For BIC: > step(model, data, direction,
k=log(nrow(data)), trace)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Generalized Linear Models – Introduction
Agricultural Science - different types of data (responses):
continuous: weight, height, diameterdiscrete: count, proportion
Model choice - important part of the research: search for a simplemodel which explains well the data.
All models envolve:
a systematic component - related to the explanatory variables(regression model, analysis of variance model, analysis of covariancemodel);a random component - related to the distributions followed by theresponse variables;a link between systematic and random components.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Motivating example – Carnation meristem culture
0.0 0.1 0.3 0.5 1.0 2.0
s l v s l v s l v s l v s l v s l v
1 2.5 0 3 5.5 1 5 4.8 1 9 2.8 0 10 2.0 1 12 1.7 12 2.5 0 2 4.3 1 5 3.0 1 10 2.3 1 8 2.3 1 15 2.5 11 3.0 0 6 3.3 0 4 2.7 0 8 2.7 1 12 2.0 1 15 2.3 12 2.5 1 3 4.3 0 4 3.1 1 11 3.2 0 13 1.0 1 12 1.5 11 4.0 0 4 5.4 0 5 2.9 0 8 2.9 1 14 2.8 1 13 1.7 11 4.0 0 3 3.8 1 6 3.3 1 8 1.5 1 14 2.0 1 16 2.0 12 3.0 0 3 4.3 1 6 2.1 1 8 2.5 0 14 2.7 1 17 1.7 11 3.0 0 4 6.0 1 5 3.7 1 8 2.8 0 9 1.8 1 15 2.0 11 5.0 0 3 5.0 1 4 3.8 1 8 1.8 1 13 1.8 1 17 2.0 11 4.0 0 2 5.0 0 5 3.8 1 11 2.0 0 9 2.1 1 14 2.3 11 2.0 1 3 4.5 0 6 3.3 0 9 2.7 1 15 1.3 1 16 2.5 11 4.0 0 3 4.0 1 6 2.6 1 12 1.8 1 15 1.2 1 21 1.3 12 3.0 0 4 3.3 0 5 2.3 0 12 2.3 1 16 1.2 1 18 1.3 12 3.5 1 3 4.3 1 4 3.6 1 10 1.5 1 9 1.0 1 16 1.8 11 3.0 0 3 4.5 1 3 4.8 1 10 1.5 1 13 1.7 1 18 1.0 12 3.0 0 2 3.8 0 4 2.0 0 7 1.0 1 14 1.7 1 20 1.3 12 5.5 0 3 4.7 1 6 1.7 0 8 3.0 1 16 1.3 1 22 1.5 11 3.0 0 4 2.2 0 5 2.5 0 12 2.0 1 13 1.8 1 20 1.3 11 2.5 0 2 3.8 1 5 2.0 0 9 3.0 11 2.0 0 3 5.0 0 5 2.0 0
Response variables: number of shoots (s), average length of shoots(l), vitrification (v)
Distributions: ??
Systematic component: regression model, completely randomizedexperiment.
Links: ??
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
********************
**
*
***************
**
****************
****
**
*
*
*****
*
*
**
**
**
*
**
*
*****
*
*
*
***
*
**
*
**
**
**
**
*
*
*
*
*
*
*
*
*
*
*
0.0 0.5 1.0 1.5 2.0
510
1520
BAP dose
Num
ber o
f sho
ots
*
*
*
*
**
*
*
*
*
*
*
*
*
*
*
*
*
*
**
*
*
******
*
**
*
**
*
*
*
**
*
******
******
*
*
***** **
*****************
****************** ******************
0.0 0.5 1.0 1.5 2.0
01
23
45
BAP dose
Aver
age
leng
th o
f sho
ots
***
*
******
*
**
*
******
**
***
****
**
*
*
**
*
*
*
*
*
**
*
*
*
*****
*
*
*
**
***** *
**
*
**
**
*
*
********* ****************** ******************
0.0 0.5 1.0 1.5 2.0
0.0
0.2
0.4
0.6
0.8
1.0
BAP dose
Vitri
ficat
ion
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Motivating example – Rotenon toxicity
Dose (di ) mi yi
0.0 49 02.6 50 63.8 48 165.1 46 247.7 49 42
10.2 50 44
Response variable: Yi – number of dead insects out of mi insects(Martin, 1942).
Distribution: Binomial.
Systematic component: regression model, completely randomizedexperiment.
Aim: Lethal doses.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
*
*
*
*
* *
0 2 4 6 8 10
0.0
0.2
0.4
0.6
0.8
Dose
Obs
erve
d pr
opor
tions
the curve of the observed proportions against doses has an S shape
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Motivation for the link function
Bioassay or biological assay – the measurement of the potency of astimulus by means of the reactions it produces in living matter
The stimulus may be physiological, chemical, biological, etc.
insects exposed to chemical stimuli (insecticides or biological, eg avirus, stimuli)other agricultural bioassays may involve large animals, fungi, plantsor plant parts such as leaves.
The response here is quantal or binary – just 2 possible responses(success or failure).
an insect dies or survivesa seed germinates or fails to germinatea cutting roots or fails to root
The rotenone data example
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
The tolerance of an individual (eg an insect) is the dose orconcentration or intensity of the stimulus above which the individualresponds (eg dies) and below which it does not respond.
An individual dies if his tolerance is smaller than a given dose.
Tolerance varies between individuals – it is a random variable.
Supposez : tolerance of a randomly chosen individualfZ (z): the probability density function of Zx : dose of the stimulus
5 10 15 20 25 30 35
0.0
0.1
0.2
0.3
0.4
z
f(z)
5 10 15 20 25 30 35
0.00
0.02
0.04
0.06
0.08
0.10
z
f(z)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Then, for an individual chosen at random from the population
P(death|x) = P(Z < x) =
∫ x
−∞f (z)dz
5 10 15 20 25 30 35
0.0
0.1
0.2
0.3
0.4
z
f(z) π
10 15 20 25 30
0.0
0.2
0.4
0.6
0.8
1.0
z
Pro
port
ion
of d
ead
inse
cts
LD50
It is often reasonable to assume that the tolerance Z has a normaldistribution, that is, Z ∼ N(µ, σ2). Then, the cumulative normaldistribution is
π = P(death|x) = P(Z < x) = P
(Z − µσ
<x − µσ
)= Φ
(x − µσ
)The inverse of the cumulative normal function,probit(π) = Φ−1(π) = α + βx , where α = −µσ and β = 1
σ , is calledthe probit transformation.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Alternatively, we can assume that tolerance Z has a logisticdistribution with mean E(Z ) = µ, variance Var(Z ) = π2τ 2/3 andpdf
fZ (z ;µ, τ) =1
τ
exp
(z − µτ
)[
1 + exp
(z − µτ
)]2 , µ ∈ R, τ > 0,
Then, the cumulative logistic distribution is
π = P(Z ≤ x) = F(x) =
∫ x
−∞
βeα+βz
(1 + eα+βz )2 dz =
eα+βx
1 + eα+βx
where α = −µ/τ and β = 1/τ .
The inverse of the cumulative distribution of the logistic distributionis called the logit transformation
logit(π) = logπ
1− π= α + βx
The normal and logistic distributions are symmetrical around themean.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
A different distribution, which is asymmetrical, for the tolerance Z isthe extreme value (Gumbel) distribution with mean E(Z ) = α + γτ ,variance Var(Z ) = π2τ 2/6, where γ ≈ 0, 577216 is the Euler numberdefined by γ = −ψ(1) = limn→∞(
∑ni=1 i−1 − log n),
ψ(p) = d log Γ(p)/dp is the digamma function, and pdf
fZ (z ;α, τ) =1
τexp
(z − ατ
)exp
[− exp
(z − ατ
)], α ∈ R, τ > 0,
Then, the cumulative Gumbel distribution is
π = F (x) =
∫ x
−∞β exp
(α + βz − eα+βz
)dz = 1− exp [− exp(α + βx)]
where α = −µ/τ and β = 1/τ .
The inverse of the cumulative Gumbel distribution is calledcomplementary log-log transformation
c-loglog(π) = log(− log(1− π)) = α + βx
The exposed theory shows one type of biological justification for thelink function.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
5 10 15 20 25 30 35
0.0
0.1
0.2
0.3
0.4
z
f(z)
5 10 15 20 25 30 35
0.00
0.04
0.08
z
f(z)
5 10 15 20 25 30 35
0.0
0.4
0.8
zPro
port
ion
of d
ead
inse
cts
5 10 15 20 25 30 35
0.0
0.4
0.8
zPro
port
ion
of d
ead
inse
cts
5 10 15 20 25 30 35
−3−1
13
z
Pro
bit(p
ropo
rtio
n)
5 10 15 20 25 30 35
−6−4
−20
2
z
C−l
og−l
og(p
ropo
rtio
n)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Generalized Linear Models – Definition
The three components of a generalized linear model (Nelder andWedderburn, 1972; McCullagh and Nelder, 1989) are:
independent random variables Yi , i = 1, . . . , n, from an exponentialfamily distribution with means µi and constant dispersion (scale)parameter φ,
f (y) = exp
{yθ − b(θ)
φ+ c(y , φ)
}where µ = E[Y ] = b′(θ) and Var(Y ) = φb′′(θ) = φV (µ), V (µ)called variance function.
a linear predictor vector η given by
η = Xβ
where β is a vector of p unknown parameters and X = [x1, . . . , xn]′
is the n × p design matrix;
a link function g(·) relating the mean to the linear predictor, i.e.
g(µi ) = ηi = x′iβ
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Normal Models
Yi , i = 1, . . . , n, a continuous response variable,Yi ∼ N(µi , σ
2) with mean µi and constant variance σ2
We model the mean µi in terms of the explanatory variables xi .As a glm
Random component:
f (yi ;µi , σ2) =
1√
2πσ2exp
[−
(yi − µi )2
2σ2
]= exp
[1
σ2
(yiµi −
µ2i
2
)−
1
2log(2πσ2)−
y2i
2σ2
]
where
θi = µi , φ = σ2, b(θi ) =
µ2i
2=θ2
i
2, c(yi , φ) = −
1
2
[y2
i
σ2+ log(2πσ2)
], V (µi ) = 1
Systematic component: ηi = x′iβ
Link function: ηi = µi (identity link, canonical link)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Binomial regression model
Yi , count of successes out of a sample of size mi , i = 1, . . . , nYi ∼ Bin(mi , πi ) with mean E[Yi ] = µi = miπi , andVar(Yi ) = miπi (1− πi )We model the expected proportions πi ∈ [0, 1] in terms of theexplanatory variables xi . As a glm
Random component:
f (yi ;πi ) =(m
y
)π
yii (1− πi )
mi−yi = exp
[yi log
(πi
1− πi
)+ mi log(1− πi ) + log
(mi
yi
)],
where yi = 0, 1, . . . ,mi ,
φ = 1, θi = log
(πi
1− πi
)= log
(µi
mi − µi
)⇒ µi =
meθi(1 + eθi )
,
b(θ) = −mi log(1− πi ) = mi log (1 + eθi ), c(yi , φ) = log(m
y
), V (µi ) = µi (mi − µi )/mi
Systematic component:
ηi = x′iβ
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Link function:
The canonical link function is the logit
ηi = g(πi ) = g(µi
mi) = log
(µi
mi − µi
)= log
(πi
1− πi
)Other common choices are
probit ηi = g(πi ) = g( µi
mi) = Φ−1(µi/mi ) = Φ−1(πi )
complementary log-log link
ηi = g(πi ) = g(µi
mi) = log{− log(1− πi )}.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Poisson regression models
Yi , i = 1, . . . , n, are counts with means µi
Yi ∼ Pois(µi ) with mean µi and variance Var(Yi ) = µi
As a glm
Random component:
f (yi ;µi ) =e−µiµyi
i
yi != exp[yi log(µi )− µi − log(yi !)]
where
φ = 1, θi = log(µi )⇒ µi = exp(θi ), b(θ) = µi = exp(θi ), c(yi , φ) = log(yi !), V (µi ) = µi
Systematic component:
ηi = x′iβ
Link function:
The canonical link function is the log
ηi = g(µi ) = log(µi )
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
For different observation periods/areas/volumes:
Yi ∼ Pois(tiλi )
Taking a log-linear model for the rates,
log(λi ) = x′iβ
results in the following log-linear model for the Poisson means
log(µi ) = log(tiλi ) = log(ti ) + x′iβ,
where the log(ti ) is included as a fixed term, or offset, in the model.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Components of some distributions from exponential family
Distribution φ θ b(θ) c(y, φ) µ(θ) V (µ)
Normal: N(µ, σ2) σ2 µθ2
2−
1
2
y2
σ2+ log(2πσ2)
θ 1
Poisson: P(µ) 1 log(µ) eθ − log(y !) eθ µ
Binomial: B(m, π) 1 log
(µ
m − µ
)m log(1 + eθ ) log
(m
y
) meθ
1 + eθ
µ
m(m − µ)
Negative Binomial: BN(µ, k) 1 log
(µ
µ + k
)−k log(1 − eθ ) log
[Γ(k + y)
Γ(k)y !
]keθ
1 − eθµ
(µk
+ 1
)Gamma: G(µ, ν) ν−1 −
1
µ− log(−θ) ν log(νy) − log(y) − log Γ(ν) −
1
θµ2
Inverse Gaussian: IG(µ, σ2) σ2 −1
2µ2−(−2θ)1/2 −
1
2
[log(2πσ2y3) +
1
σ2y
](−2θ)−1/2 µ3
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Canonical link functions for some distributions
Distribution Canonical link functionsNormal Identity: η = µPoisson Logarithmic: η = log(µ)
Binomial Logistic: η = log
(π
1− π
)= log
(µ
m − µ
)Gamma Reciprocal: η =
1
µ
Inverse Gaussian Reciprocal squared: η =1
µ2
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Estimation
Maximum likelihood estimation.
Estimation algorithm (Nelder and Wedderburn, 1972) – Iterativellyweighted least squares (IWLS)
X′W[t]Xβ[t+1] = X′W[t]z[t]
where
t denotes the iteration index
X = [x1, x2, . . . , xn]′ is a design matrix n × p,
W = diag{Wi} – depends on the known (prior) weights (wi ), variancefunction (distribution) and link function
Wi = wiV (µi )
(dµidηi
)2
β – parameter vector p × 1
z – a vector n × 1 (adjusted response variable) – depends on y and linkfunction
zi = ηi + (yi − µi )dηidµi
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
When the dispersion parameter is unknown, it may be estimated bythe Pearson Estimator
φ =1
n − p
n∑i=1
wi (yi − µi )2
V (µi )
where µi = g−1(x′i β) is the ith fitted value.
Some computer packages estimate φ by the deviance estimatorD(β)/(n − p); but it cannot be recommended because of problemswith bias and inconsistency in the case of a non-constant variancefunction.
For positive data, the deviance may also be sensitive to roundingerrors for small values of yi .
The asymptotic variance of β is estimated by the inverse (Fisher)information matrix, giving
Var(β) = K = φ(X′WX)−1,
where W is calculated from β.
The standard error se(βj ) is calculated as the square-root of the jthdiagonal element of this matrix, for j = 1, . . . , p
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
When φ is known, a 1− α confidence interval for βj is defined bythe endpoints
βj ± se(βj )z1−α/2
where z1−α/2 is the 1− α/2 standard normal quantile.
For φ unknown, we replace φ by φ in K and a 1− α confidenceinterval for βj is defined by the endpoints
βj ± se(βj )t(1−α/2)(n − p)
where t(1−α/2)(n − p) is the 1− α/2 quantile of Student’s tdistribution with n − p degrees of freedom.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Analysis of Deviance – Goodness of fitting and modelselection
Analysis of deviance is the method of parameter inference forgeneralized linear models based on the deviance, generalizing ideasfrom ANOVA, and first introduced by Nelder and Wedderburn(1972).
The situation is similar to regression analysis, in the sense thatmodel terms must be eliminated sequentially, and the significance ofa term may depend on which other terms are in the model.
The deviance D measures the distance between y and µ, given by
S =D(β)
φ= −2[log L(µ, y)−log L(y , y)] = 2φ−1
n∑i=1
wi [yi (θi−θi )+b(θi )−b(θi )]
where L(µ, y) and L(y , y) are the likelihood function values for thecurrent and saturated models, θi = θ(yi ), θi = θ(µi ) andD(β) =
∑ni=1 wi d(yi ; µi ).
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Deviance for some models
Model Deviance
Normal Dp =n∑
i=1
(yi − µi )2
Binomial Dp = 2n∑
i=1
[yi log
(yi
µi
)+ (mi − yi ) log
(mi − yi
mi − µi
)]Poisson Dp = 2
n∑i=1
[yi log
(yi
µi
)+ (µi − yi )
]Negative Binomial Dp = 2
n∑i=1
[yi log
(yi
µi
)+ (yi + k) log
(µi + k
yi + k
)]Gamma Dp = 2
n∑i=1
[log
(µi
yi
)+
yi − µi
µi
]Inverse Gaussian Dp =
n∑i=1
(yi − µi )2
yi µ2i
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
We consider separately the cases where φ is known and unknown,but first we introduce some notation.
Let M1 denote a model with p parameters, and let D1 = D(β)denote the minimized deviance under M1
Let M2 denote a sub-model of M1 with q < p parameters, and letD2 denote the corresponding minimized deviance, where D2 ≥ D1
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Known dispersion φ parameter
Mainly relevant for discrete data, for which, in general, φ = 1.
The deviance D1 is a measure of goodness-of-fit of the model M1;and is also known as the G 2 statistic in discrete data analysis.
A more traditional goodness-of-fit statistic is Pearson’s X 2 statistic
X 2 =∑ wi (yi − µi )
2
V (µi )
Asymptotically, for large w the statistics D1 and X 2 are equivalentand distributed as χ2(n − p) under M1.
Various numerical and analytical investigations have shown that thelimiting χ2 distribution is approached faster for the X 2 statistic thanfor D1, at least for discrete data.
A formal level α goodness-of-fit test for M1 is obtained by rejectingM1 if X 2 > χ2
(1−α)(n − p)
This test may be interpreted as a test for overdispersion.
The fit of a model is a complex question, cannot be summarized in asingle number – supplement with an inspection of residuals.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
To test the sub-model M2 with q < p we use the log likelihood ratiostatistic
D2 − D1 ∼ χ2(p − q)
M2 is rejected at level α if D2 − D1 > χ2(1−α)(p − q)
In the case where φ 6= 1 we use the scaled deviance D/φ instead ofD; and the scaled Pearson statistic X 2/φ instead of X 2 and so on.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Unknown dispersion φ parameter
The dispersion parameter is usually unknown for continuous data
In the discrete case we may prefer to work with unknown dispersionparameter, if evidence of overdispersion has been found in the data.
There is no formal goodness-of-fit test available based on X 2 – thefit of the model M1 to the data must be checked by residual analysis.
X 2 is used to estimate the dispersion parameter
φ =1
n − p
∑ wi (yi − µi )2
V (µi )
where µi = g−1(β′xi ) is the ith fitted value.
To test the sub-model M2 with q < p parameters inference may bebased on F−statistic,
F =(D2 − D1)/(p − q)
φ∼ F (p − q, n − p)
We reject M2 at level α if F > F1−α(p − q, n − p)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Deviance Table – An example
Suppose a completely randomized experiment with two factors A (with alevels) and B (with b levels) and r replications
Model DF Deviance Deviance Diff. DF Diff. MeaningNull rab − 1 D1
D1 − DA a− 1 A ignoring BA a(rb − 1) DA
DA − DA+B b − 1 B including AA+B a(rb − 1)− (b − 1) DA+B
DA+B − DA∗B (a− 1)(b − 1) Interaction ABincluded A and B
A+B+A.B ab(r − 1) DA∗B
DA∗B ab(r − 1) ResidualSaturated 0 0
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Residual analysis
Pearson residual
rPi =yi − µi√
V (µi )
reflect the skewness of the underlying distribution.
Deviance residual
rDi = sign(yi − µi )√
d(yi ; µi )
which is much closer to being normal than the Pearson residual, buthas a bias (Jorgensen, 2011).
Modified deviance residual (Jorgensen, 1997)
r∗Di = rDi +φ
rDilog
rWi
rDi
where rWi is the Wald residual defined by
rWi = [g0(yi )− g0(µi )]√
V (yi )
where g0 is the canonical link.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
All those residuals have approximately mean zero and varianceφ(1− hi ), where hi is the ith diagonal element of the matrixH = W1/2X(X′WX)−1X′W1/2.
Use standardized residuals such as r∗Di (1− hi )1/2, which are nearly
normal with variance φ
Plot residuals against the fitted values – to check the proposedvariance function
Normal Q-Q plot (or normal Q-Q plot with simulated envelopes) forthe residuals – to check the correctness of the distributionalassumption
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Checking for the link function
Suppose the link function g0(µ) = g(µ, λ0) = Xβ, nested in a parametricfamily g(µ, λ), indexed by the parameter λ, for example,
g(µ, λ) =
µλ − 1
λλ 6= 0
log(µ) λ = 0
which includes the identity, logarithmic links, or the Aranda-Ordaz family,
µ =
{1− (1 + λeη)−
1λ λeη > −11 c.c.
which includes the identity, complementary log-log links.The Taylor expansion for g(µ, λ) around λ0, gives
g(µ, λ) ' g(µ, λ0) + (λ− λ0)u(λ0) = Xβ + γu(λ0)
where u(λ0) =∂g(µ, λ)
∂λ
∣∣∣∣λ=λ0
.
In general it is used u(λ0) = η2.C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Justification
Suppose the link function used was η = g(µ) and that the true link isg∗(µ). Then,
g(µ) = g [g∗−1(η)] = h(η)
The null hypothesis is H0 : h(η) = η and Ha : h(η) = non linear.Using Taylor expansion for g(µ) we have:
g(µ) ' h(0) + ηh′(0) + η2 h′′(0)
2
then, the added variable is η2, assuming that the model has terms forwhich the mean is marginal.Formal tests
Likelihood ratio test
Score Test
Wald Test
Graphics
Added variable plot
Partial residual plot
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
R commands for GLM
glm(resp ~ linear predictor + offset(of), weights = w,
family=familyname(link ="linkname" ))
The resp is the response variable y . For a binomial regression model it isnecessary to create:
resp<-cbind(y,n-y)
The possible families (”canonical link”) are:
binomial(link = "logit")
gaussian(link = "identity")
Gamma(link = "inverse")
inverse.gaussian(link = "1/mu^2")
poisson(link = "log")
quasi(link = "identity", variance = "constant")
quasibinomial(link = "logit")
quasipoisson(link = "log")
The default family is the gaussian family and default links are the canonicallinks (don’t need to be declared). Other possible links are ”probit”, ”cloglog”,”cauchit”, ”sqrt”,etc. To see more, type
? glm
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Normal models – Transform or link
Example
Average daily fat yields (kg/day) from milk from a single cow for each of35 weeks (McCulloch, 2001)
0.31 0.39 0.50 0.58 0.59 0.640.68 0.66 0.67 0.70 0.72 0.680.65 0.64 0.57 0.48 0.46 0.450.31 0.33 0.36 0.30 0.26 0.340.29 0.31 0.29 0.20 0.15 0.180.11 0.07 0.06 0.01 0.01
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Fat yield example
A typical model:Fat yield ≈ αtβeγt where t=week
TransformYi = αtβi eγti eεi
log(Yi ) = µi + εi = logα + β log(ti ) + γi t + εi
log(Yi ) ∼ N(logα + β log(ti ) + γi t, σ2)
E[log(Yi )] = logα + β log(ti ) + γi t
LinkYi = µi + ξi = αtβi eγti + ξi
Yi ∼ N(αtβi eγti , τ 2)
E[Yi ] = αtβi eγti
log(E[Yi ]) = logα + β log(ti ) + γi t
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Plot of fat yield for each week – Observed values and fitted curve
0 5 10 15 20 25 30 35
0.0
0.2
0.4
0.6
0.8
Weeks
Fat y
ield
(kg
/day
)
*
*
*
* *
**
* **
**
* *
*
** *
**
*
**
*
**
*
*
**
** *
* *
*
ObservedLog transformationLog link
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
R program
# Average daily fat yields (kg/day) from milk
# from a single cow for each of 35 weeks
# McCulloch (2001)
# Ruppert, Cressie, Carroll (1989), Biometrics, 45:637-656
fatyield.dat<-scan(what=list(yield=0))
0.31 0.39 0.50 0.58 0.59 0.64
0.68 0.66 0.67 0.70 0.72 0.68
0.65 0.64 0.57 0.48 0.46 0.45
0.31 0.33 0.36 0.30 0.26 0.34
0.29 0.31 0.29 0.20 0.15 0.18
0.11 0.07 0.06 0.01 0.01
fatyield.dat$week=1:35
attach(fatyield.dat)
lweek<-log(week)
## Normal model for log(fat)
lyield<-log(yield)
mod1<-lm(lyield~week+lweek)
summary(mod1)
anova(mod1)
R program (cont.)
## Normal model with log link
mod2<-glm(yield~week+lweek, (family=gaussian(link="log")))
fit2<-fitted(mod2)
summary(mod2)
anova(mod2)
# Plotting
plot(c(0,35), c(0,0.9), type="n", xlab="Weeks",
ylab="Fat yield (kg/day)")
points(week,yield, pch="*")
w<-seq(1,35,0.1)
lines(w, predict(mod2,data.frame(week=w, lweek=log(w)),
type="response"),
col="green", lty=1)
lines(w, exp(predict(mod1,data.frame(week=w, lweek=log(w)),
type="response")),
col="red", lty=2)
legend(20,0.9,c("Observed","Log transformation", "Log link"),
lty=c(-1,2,1),
pch=c("*"," "," "), col=c("black","red","green"),cex=.6)
title(sub="Figure 1. Fat yield (kg/day) for each week")
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Binomial regression model
Example
Batches of 20 pyrethroid-resistant moths (Heliothis virescens), a cottoncrop pest) of each sex were exposed to a range of doses of cypermethrintwo days after emergence from pupation. The number of moths whichwere either knocked down or dead was recorded after 72h.
Doses (di ) Males Females
1.0 1 02.0 4 24.0 9 68.0 13 10
16.0 18 1232.0 20 16
Response variable: Yi – number of dead insects out of 20 insects.Distribution: Binomial.Systematic component: regression model, completely randomizedexperiment.Aim: Lethal doses.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Cypermethrin toxicity example
Deviance residuals
Terms fitted in model d.f. Deviance X 2
Constant 11 124.9 101.4Sex 10 118.8 97.4
Dose 6 15.2 12.9Sex + Dose 5 5.0 3.7
Analysis of deviance
Source d.f. Deviance p-valueSex 1 6.1 0.0144Sex|Dose 1 10.2 0.0002Dose 5 109.7 < 0.0001Dose|Sex 5 113.8 < 0.0001Residual 5 5.0 0.5841Total 11 124.9
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Cypermethrin toxicity example
Modelslogit(p) = αj + βj log2(dose) – different logistic regression lineslogit(p) = αj + β log2(dose) – common slopelogit(p) = α + βj log2(dose) – common interceptlogit(p) = α + β log2(dose) – same logistic regression line.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Cypermethrin toxicity example
Residual Deviances
Terms fitted in model d.f. Deviance X 2
Constant 11 124.9 101.4Sex + Sex log2(dose) 8 4.99 3.51Sex + log2(dose) 9 6.75 5.31Const. + Sex log2(dose) 9 5.04 3.50Const. + log2(dose) 10 16.98 14.76
Analysis of deviance
Source d.f. Deviance p-valueSex 1 6.1 0.0144Linear Regression 1 112.0 < 0.0001Residual 9 6.8 0.7473Total 11 124.9
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Cypermethrin toxicity example
Regression equationsmales
logp
1− p= −2.372 + 1.535 log2(dose)
females
logp
1− p= −3.473 + 1.535 log2(dose)
Lethal Dosesmales
log2(LD50) =2.372
1.535= 1.55⇒ LD50 = 4.69
females
log2(LD50) =3.473
1.535= 2.26⇒ LD50 = 9.61
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Cypermethrin toxicity example
Plot of observed proportions and fitted curves
1 2 5 10 20
0.0
0.2
0.4
0.6
0.8
1.0
log(dose)
Pro
port
ions
*
*
*
*
*
*
+
+
+
+
+
+
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
R program
y <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex<-factor(rep(c("M","F"), c(6,6)))
ldose<-rep(0:5,2)
dose<-2**ldose
dose<-factor(dose)
Cyper.dat <- data.frame(sex, dose, ldose, y)
attach(Cyper.dat)
plot(ldose,y/20, pch=c(rep("*",6),rep("+",6)),
col=c(rep("green",6), rep("red",6)),
xlab="log(dose)", ylab="Proportion killed")
resp<-cbind(y,20-y)
mod1<-glm(resp~1, family=binomial)
mod2<-glm(resp~dose, family=binomial)
mod3<-glm(resp~sex, family=binomial)
mod4<-glm(resp~dose+sex, family=binomial)
anova(mod1, mod2, mod4, test="Chisq")
anova(mod1, mod3, mod4, test="Chisq")
R program (cont.)
mod5<-glm(resp~ldose, family=binomial)
mod6<-glm(resp~sex+ldose-1, family=binomial)
mod7<-glm(resp~ldose/sex, family=binomial)
mod8<-glm(resp~ldose*sex, family=binomial)
anova(mod1, mod5, mod6, mod8, test="Chisq")
anova(mod1, mod5, mod7, mod8, test="Chisq")
summary(mod6)
## Plotting
plot(c(1,32), c(0,1), type="n", xlab="log(dose)",
ylab="Proportions", log="x")
points(2**ldose,y/20, pch=c(rep("*",6),rep("+",6)),
col=c(rep("green",6),rep("red",6)))
ld<-seq(0,5,0.1)
lines(2**ld, predict(mod6,data.frame(ldose=ld,
sex=factor(rep("M",length(ld)),levels=levels(sex))),
type="response"), col="green")
lines(2**ld, predict(mod6,data.frame(ldose=ld,
sex=factor(rep("F",length(ld)),levels=levels(sex))),
type="response"), col="red")
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
CS2 toxicity
Mortality of adult beetles after five hours’ exposure to gaseous carbon disulphid(Bliss, 1935).
Log Number Numberdosage exposed killed
1.6907 59 61.7242 60 131.7552 62 181.7842 56 281.8113 63 521.8369 59 531.8610 62 611.8839 60 60
Considerations
Response variable: Yi – number of killed beetles out of mi exposed beetles(Pregibon, 1980).
Distribution: Binomial.
Systematic component: regression model, completely randomizedexperiment.
Aim: Lethal doses.C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Binomial logistic model with η = β0 + β1log(dose) gives deviance11.23 on 6 d.f. ⇒ lack of fit
Deviance for link test 8.037 on 1 d.f. ⇒ need for different link
Added variable plot for U = η2 ⇒ evidence that γ 6= 0, spreadthroughout data
●
●
●
●
●
●
●●
−2 0 2 4 6 8 10
−1.5
−1.0
−0.5
0.0
0.5
1.0
1.5
Ordinary residuals
Pear
son
resid
uals
Figure: CS2 - Added variable plot
C-loglog link with a linear lp gives good fit
Logistic link with a quadratic lp gives good fit
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
1.70 1.75 1.80 1.85
0.0
0.2
0.4
0.6
0.8
1.0
Log(dose)
Pro
port
ion
of b
eetle
s ki
lled
*
*
*
*
*
*
* **
ObservedLogitCloglog
Figure: Beetle mortality - Observed proportions and fitted curvesC. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
R program
# *** Mortality of adult beetle Example ***# Pregibon (1980) - Applied Statistics, 29: 20
y <- c(6, 13, 18, 28, 52, 53, 61, 60)m <- c(59, 60, 62, 56, 63, 59, 62, 60)ldose <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839)beetle.dat <- data.frame(ldose, m, y)attach(beetle.dat)resp<-cbind(y,m-y)
## Logistic link functionmodl<-glm(resp ~ poly(I(ldose))+poly(I(ldose),2), family=binomial(link="logit"))anova(modl, test="Chisq")
mod1l<-glm(resp ~ 1, family=binomial(link="logit"))1-pchisq(deviance(mod1l), df.residual(mod1l))print(sum(residuals(mod1l, 'pearson')^2))mod2l<-glm(resp ~ I(ldose), family=binomial(link="logit"))1-pchisq(deviance(mod2l), df.residual(mod2l))print(sum(residuals(mod2l, 'pearson')^2))
## Link function testLP2l <- (predict(mod2l,type = c("link")))^2mod3l<-update(mod2l , .~. +LP2l, family=binomial(link="logit"))anova(mod3l, test="Chisq")
mod4l<-glm(resp ~ poly(I(ldose),2), family=binomial(link="logit"))1-pchisq(deviance(mod4l), df.residual(mod4l))print(sum(residuals(mod4l, 'pearson')^2))
R program
## Link function testLP2q <- (predict(mod4l,type = c("link")))^2mod5l<-update(mod4l , .~. +LP2q, family=binomial(link="logit"))anova(mod5l, test="Chisq")
## Added variable plotrp <- residuals(mod2l, 'pearson')W <- mod2l$fitted*(1- mod2l$fitted)*mU <- LP2lmod1n <- glm(U~I(ldose), family=gaussian, weights=W)rU <- (U - mod1n$fitted)plot(rU,rp, xlab="Ordinary residuals",ylab="Pearson residuals")
# Partial residual plotresparc <- residuals(mod3l, 'pearson') + mod3l$coef[3]*Uplot(U,resparc, xlab="U",ylab="Residual + component")
summary(mod6l<-glm(resp ~ I(ldose)+I(ldose^2), family=binomial(link="logit")))anova(mod6l, test="Chisq")library(MASS)# variance-covariance matrix of estimated parametersvcov(mod6l)
R program
## Complementary log-log link functionmodc<-glm(resp ~ poly(I(ldose))+poly(I(ldose),2), family=binomial(link="cloglog"))anova(modc, test="Chisq")
mod1c<-glm(resp ~ 1, family=binomial(link="cloglog"))1-pchisq(deviance(mod1c), df.residual(mod1c))print(sum(residuals(mod1c, 'pearson')^2))mod2c<-glm(resp ~ I(ldose), family=binomial(link="cloglog"))1-pchisq(deviance(mod2c), df.residual(mod2c))print(sum(residuals(mod2c, 'pearson')^2))
## Link function testLP2l <- (predict(mod2c,type = c("link")))^2mod3c<-update(mod2c , .~. +LP2l, family=binomial(link="cloglog"))anova(mod3c, test="Chisq")
## Added variable plotrp <- residuals(mod2c, 'pearson')W <- mod2c$fitted*(1- mod2c$fitted)*mU <- LP2lmod1n <- glm(U~I(ldose), family=gaussian, weights=W)rU <- (U - mod1n$fitted)plot(rU,rp, xlab="Ordinary residuals",ylab="Pearson residuals")
# Partial residual plotresparc <- residuals(mod3c, 'pearson') + mod3c$coef[3]*Uplot(U,resparc, xlab="U", ylab="Residual + component")
R program
summary(mod5c<-glm(resp ~ I(ldose), family=binomial(link="cloglog")))library(MASS)# variance-covariance matrix of estimated parametersvcov(mod5c)
# Doses LD50 and LD90dose.p(mod5c); dose.p(mod5c, p=0.9)
# Doses LD25, LD50, LD75dose.p(mod5c, p = 1:3/4)
# Plottingplot(c(1.69,1.89), c(0,1), type="n", xlab="Log(dose)",ylab="Proportion of beetles killed")points(ldose, y/m, pch="*")ld<-seq(1.69,1.89,0.01)lines(ld, predict(mod4l,data.frame(ldose=ld),type="response"), col="blue", lty=1)lines(ld, predict(mod2c,data.frame(ldose=ld),type="response"), col="red", lty=2)legend(1.7,1.0,c("Observed","Logit", "Cloglog"), lty=c(-1,1,2),pch=c("*"," "," "), col=c("black","blue", "red"))
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Germination of Orobanche seed
Example
O. aegyptiaca 75 O. aegyptiaca 73
Bean Cucumber Bean Cucumber
10/39 5/6 8/16 3/12
23/62 53/74 10/30 22/41
23/81 55/72 8/28 15/30
26/51 32/51 23/45 32/51
17/39 46/79 0/4 3/7
10/13
Response variable: Yi – number of germinated seeds out of mi seeds.Distribution: Binomial.Systematic component: factorial 2× 2 (2 species, 2 extracts),completely randomized experiment (Crowder, 1978).Aim: to see how germination is affected by species and extracts.Problem: overdispersion.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Germination of Orobanche seed
Simple Binomial Model
No of seeds germinating yi as y-variable
Binomial logit model
Species * Extract interaction model
Analysis of deviance
Source d.f. Deviance p-valueE 1 55.97E | S 1 56.49S 1 2.54S | E 1 3.06S.E 1 6.41Residual 17 33.28 0.0096Total 20 98.72
Interaction significant
Some evidence of overdispersion?
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Germination of Orobanche seed
Normal plot with simulated envelope
−1.5 −1.0 −0.5 0.0 0.5 1.0 1.5
−2
−1
01
2
quantile
devi
ance
res
idua
ls
● ●
●
●
●
●
●
●
●
●
●●
●●
● ●
●
●
● ●
●
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
R program
Species <- factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2))
Extract <- factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
1, 1, 1, 1, 1, 2, 2, 2, 2, 2))
y <- c(10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8,
10, 8, 23, 0, 3, 22, 15, 32, 3)
m <- c(39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16,
30, 28, 45, 4, 12, 41, 30, 51, 7)
orobanch <- data.frame(Species, Extract, m, y)
attach(orobanch)
resp<-cbind(y,m-y)
p<-y/m
# Binomial fit
orobanchB.fit<-glm(resp~Species*Extract, family=binomial)
summary(orobanchB.fit)
anova(orobanchB.fit, test="Chisq")
orobanchB.fit<-glm(resp~Extract*Species, family=binomial)
summary(orobanchB.fit)
anova(orobanchB.fit, test="Chisq")
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Poisson regression model
Example
Storing of micro-organisms
Bacterial concentrations (counts per fixed area) measured at initial freezing(−70oC) and at 1, 2, 6, 12 months.
Time 0 1 2 6 12Count 31 26 19 15 20
Hypothesized that count decays over time, i.e.
average count ∝ 1
(Time)γ
Model
Count ∼ Pois(µ)
logµ = β0 + β1 log(Time)
Avoid problems at time 0 by using
logµ = β0 + β1 log(Time + 0.1)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Storing of micro-organisms example
Residual deviances
Model d.f. Deviance X 2
Constant 4 7.0672 7.1532log(Time) 3 1.8338 1.8203
Analysis of deviance
Source d.f. Deviance p-valueLinear Regression 1 5.2334 0.0222Error 3 1.8338Total 4 7.0672
log(µ) = 3.149− 0.1261 log(Time)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Storing of micro-organisms example
Plot of bacterial concentration: observed values and fitted curve
0 2 4 6 8 10 12
1520
2530
Time in months
Cou
nts
*
*
*
*
*
Figure 1. Data and Fitted curveC. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
R program
ltime <- log((tim <- c(0, 1, 2, 6, 12))+ 0.1)
lcount <- log(count <- c(31, 26, 19, 15, 20))
bacteria.dat <- data.frame(tim, count, ltime, lcount)
with(bacteria.dat,{
par(mfrow=c(1,2))
plot(tim, count, xlab="Time in months", ylab="Counts")
plot(ltime,lcount, xlab="Log(time in months)", ylab="Log(counts)")
par(mfrow=c(1,1))
})
mod1<-glm(count ~ tim, family=poisson, bacteria.dat)
anova(mod1, test="Chisq")
mod2<-glm(count ~ ltime, family=poisson)
anova(mod2, test="Chisq")
plot(c(0,12), c(15,31), type="n", xlab="Time in months",
ylab="Counts")
points(tim,count,pch="*")
x<-seq(0,12,0.1)
lp<-predict(mod2,data.frame(ltime=log(x+0.1)), type="response")
lines(x,lp,lty=1)
title(sub="Figure 1. Data and Fitted curve")
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Exercise Relative potency - Toxicity of insecticide to flour beetles
Insecticide Deposit of Insecticide
2.00 2.64 3.48 4.59 6.06 8.00
DDT 3/50 5/49 19/47 19/50 24/49 35/50
γ-BHC 2/50 14/49 20/50 27/50 41/50 40/50
DDT + γ-BHC 28/50 37/50 46/50 48/50 48/50 50/50
Considerations
Response variable: Yi – number of dead insects out of mi insects.
Distribution: Binomial.
Systematic component: ANOVA with regression model, completelyrandomized experiment.
Aim: Lethal doses and comparison of insecticides.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear modelsLinear Models Generalized Linear Models
Exercise
Count of the number of plant species on plots that have differentbiomass (a continuous explanatory variable with three levels: high, midand low) (Venables and Ripley, 1994)
Tabela: Number of plant species (Y), quantity of biomass (X) and levels of pHof the soil.
pH level Y X Y X Y X Y X Y X18 0,1008 15 2,6292 13 0,6526 8 3,6787 9 1,507919 0,1385 9 3,2522 9 1,5553 2 4,8315 8 2,3259
Low 15 0,8635 3 4,4172 8 1,6716 17 0,2897 12 2,995719 1,2929 2 4,7808 14 2,8700 14 0,0775 14 3,538112 2,4691 18 0,0501 13 2,5107 15 1,4290 7 4,364511 2,3665 19 0,4828 4 3,4976 17 1,1207 3 4,8705
29 0,1757 30 1,3767 21 2,5510 18 3,0002 13 4,905613 5,3433 9 7,7000 24 0,5536 26 1,9902 26 2,9126
Mid 20 3,2164 21 4,9798 15 5,6587 8 8,1000 31 0,739528 1,5269 18 2,2321 16 3,8852 19 4,6265 20 5,12096 8,3000 25 0,5112 23 1,4782 25 2,9345 22 3,5054
15 4,6179 11 5,6969 17 6,0930 24 0,7300 27 1,1580
30 0,4692 39 1,7308 44 2,0897 35 3,9257 25 4,266729 5,4819 23 6,6846 18 7,5116 19 8,1322 12 9,5721
High 39 0,0866 35 1,2369 30 2,5320 30 3,4079 33 4,605020 5,3677 26 6,5608 36 7,2420 18 8,5036 7 9,390939 0,7648 39 1,1764 34 2,3251 31 3,2228 24 4,136125 5,1371 20 6,4219 21 7,0655 12 8,7459 11 9,9817
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Linear Mixed Models
Violation of independence in Linear Models
sampling process
cluster trials (randomized units: families, countries, factories)multilevel (classes into schools into towns, repetitions into block intotrials). . .
longitudinal or repeated data
spatial data
Then,Y comes from the multivariate gaussian parametric family density :
F =
{1
(2π)n/2det(V )−1/2e−
12 (Y−Xβ)′V−1(Y−Xβ)
}dimension of V is n(n + 1)/2difficult to specify directly these elements⇒ with random effects we can deal with variances and covariances.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Eucalyptus example
Example
Breeding program of Eucalypt.Ten families of trees full-sibs have been measured each year. There arethree blocks, and 36 individuals per family in each block.Let assume that
Yijk = µ+ Bk + Fj + εijk
with Bk , k = 1, 2, 3 random block effects, Fj , j = 1, . . . , 10 random familyeffects. Then
1 variances : Var(Yijk ) = σ2B + σ2
F + σ2ε
2 covariances :
cov(Yijk ,Yi ′jk ) = σ2B + σ2
F
cov(Yijk ,Yi ′j′k ) = σ2B and cov(Yijk ,Yi ′jk′) = σ2
F
cov(Yijk ,Yi ′j′k′) = 0
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Definition
Let y denote the observation vector, for n individuals (y a realization ofrandom vector Y).Let β be a vector of fixed unknown parameters and u be the unobservedrealized vector of the random effect vector U.Let X and Z be two known model matrices.
Y is modeled by a linear mixed model when
1 Y comes from the multivariate gaussian parametric density family
2 the conditional expectation E(Y|U = u) is related to the predictorη = Xβ + Zu through the identity link: E(Y|U = u) = η
3 the conditional variance: V(Y |U = u) = R
4 U ∼ N (0,D) and D is an unknown covariance matrix
This is commonly written:
Y = Xβ + Zu + ε
with U and ε independent.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Marginal moments
marginal expectation
E (Y) = EU [E(Y|U)]
= EU (Xβ + ZU)
= Xβ
marginal variance
V (Y) = EU(V(Y |U)) + VU(E(Y |U))
= R + VU(Xβ + ZU)
= R + ZDZ ′
marginal distribution of Y
Y ∼ N (Xβ,ZDZ ′ + R)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Estimation for V known
Coming back to the very general case Y ∼ N (Xβ,V )Maximum likelihood estimator of β?
Log-likelihood
`(β; y) = −1
2(y − Xβ)′V−1(y − Xβ)− 1
2log [det(V )]− n
2log(2π)
Score function
U(β) =∂`(β; y)
∂β= X ′V−1(y − Xβ)
Solution of U(β) = 0
β0 = (X ′V−1X )−X ′V−1Y
where A− is the generalized inverse of A
Theorem (Kruskal):
β0 = (X ′X )−X ′Y iff VX = X
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Properties of β0 and tests
Properties
expectationE(β0) = β
varianceV(β0) = (X ′V−1X )−
multivariate gaussian distribution
β0 ∼ NK
[β, (X ′V−1X )−
]Tests
H0 : Lβ = m
Test statistics: X 2 = (Lβ0 −m)′[L(X ′V−1X )−L′]−1(Lβ0 −m)
Distribution: X 2 ∼ χ2(r(L))
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Estimation for V known up to a constant
Let V = σ2W , W known, σ2 unknownMaximum likelihood estimator of β and σ2?
Log-likelihood
`(β; y) = − 1
2σ2(y−Xβ)′W−1(y−Xβ)−n
2log(2πσ2)−1
2log [det(W )]
Score function for β
U(β; σ2) =∂`(β, σ2; y)
∂β= X ′(σ2W )−1(y − Xβ)
Score function for σ2
U(σ2; β) =∂`(β, σ2; y)
∂σ2=
1
2(σ2)2(y − X β)′W−1(y − X β)− n
2σ2
Solution of U(β; σ2) = 0 and U(σ2; β) = 0
β0 = (X ′W−1X )−X ′W−1Y
σ2ml =
1
n(Y − Xβ0)′W−1(Y − Xβ0)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Properties and tests
Properties
β0 ∼ NK
[β, σ2(X ′W−1X )−
]E(σ2
ml ) =σ2(n − r(X ))
nσ2
ml is biased but asymptotically unbiased
σ2ls =
n
n − r(X )σ2
ml is unbiased
σ2ls ∼
σ2
n − r(X )χ2(n − r(X ))
Tests
H0 : Lβ = m
Test statistics: F =(Lβ0 −m)′[L(X ′W−1X )−L′]−1(Lβ0 −m)
r(L)σ2ls
Distribution: F ∼ F(r(L), n − r(X ))
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Geometrical interpretation with W = Id
Y ε
Y = Xβ^
^
Vec(X)
Vec┴(X)
Y = X β ∈ Vec(X )dim(Vec(X )) = r(X )Y projection of Y on Vec(X )
ε = (Y − Y) ∈ Vec⊥(X )dim(Vec⊥(X )) = n − r(X )P = (Id − X (X ′X )−1X ′)
σ2ls = ε′ε
n−r(X )
σ2ml =
Y′PY
n=ε′ε
n
↪→ ML does not take into account dim(Vec⊥(X ))↪→ REML estimation
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Restricted likelihood
Let L such as LX = 0
Restricted log-likelihood of y (log-likelihood of Ly)
−2`reml (σ2; y) = y′L′(Lσ2WL′)−1Ly + log [det(Lσ2WL′)] + (n − r(X )) log (2π)
= C1 + y′Py + log [det(σ2W )] + log {det[X ′(σ2W )−1X ]}= C2 + y′Py + n log(σ2)− r(X ) log(σ2)
= C2 + y′Py + (n − r(X )) log(σ2)
= C2 +1
σ2y′Py + (n − r(X )) log(σ2)
where
C1 = (n − r(X )) log(2π)− log det(X ′X ) + log det(LL′)
C2 = C1 + log det(W ) + log {det[X ′W−1X ]}P = V−1 − V−1X (X ′V−1X )−1X ′V−1
P = W−1 −W−1X (X ′W−1X )−1X ′W−1
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Score function
Ureml (σ2) =
1
2(σ2)2y′Py − n − r(X )
2σ2
Solution
σ2reml =
y′Py
n − r(X )
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
ML estimation for V unknown
Let Y ∼ N (Xβ,V )Let assume that V depends on a parameter vector φ: V = V (φ)Maximum likelihood estimator of β and φ?
Score functions U(β) = ∂`(β;φ,y)∂β = X ′V−1(y − Xβ), where V = V (φ)
−2U(φ) = −2∂`(φ;β,y)∂φ = tr(V−1 ∂V
∂φ )− (y − X β)′V−1 ∂V∂φV−1(y − X β)
Solutions1
β = (X ′V−1X )−X ′V−1Y
2
tr(V−1 ∂V
∂φ) = y′P
∂V
∂φPy
where P = V−1 − V−1X (X ′V−1X )−X ′V−1.↪→ not a closed form; use iterative algorithms
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Properties and tests
Properties
(X ′V−1X )− is not an estimator of Var(β): we should take intoaccount variability in V
Var(β − β) = Var(β0 − β) + Var(β − β0) ≥ Var(β0 − β)
Var∞(β, φ) = I−1
with
I =
(X ′V−1X 0
0 ( 12 tr(V−1 ∂V
∂φmV−1 ∂V
∂φm′))m,m′=1,...,M
)
the Fisher information matrix I = E(−∂
2`(β,φ;y)∂γ∂γ′
)with γ = (β, φ)′
and β assymptotically gaussian
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Test
H0 : Lβ = m
Test statistics: F =(Lβ0 −m)′[L(X ′V−1X )−L′]−1(Lβ0 −m)
r(L)
Approximate distribution F ∼ F(r(L), df )
df = n − r(X ) if V known up to a constant
many corrections of degrees of freedom df have been proposed
Satterthwaite (1941)Kenward and Roger (1997)...
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
REML estimation for V unknown
Restricted maximum likelihood estimator of φ?
Restricted likelihood
−2`reml (σ2; y) = y′Py + log(det(V )) + log(det(X ′V−1X ))
Score functions{−2Ureml (φ) = −2∂`reml (φ;β,y)
∂φ = tr(P ∂V∂φ )− y′P ∂V
∂φPy
Solution
tr(P∂V
∂φ) = y′P
∂V
∂φPy
where P = V−1 − V−1X (X ′V−1X )−X ′V−1.↪→ not a closed form; use iterative algorithms
Remark 1 : E(Ureml (φ; Y)) = 0
Remark 2 : β = (X ′V−reml 1X )−X ′V−reml 1y
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Properties
Var∞(β, φ) = I−1
with
I =
(X ′V−1X 0
0 ( 12 tr(P ∂V
∂φmP ∂V∂φm′
))m,m′=1,...,M
)and β assymptotically gaussian.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Estimation in the linear mixed model
Coming back to the linear mixed model definition:
Y = Xβ +L∑
l=1
Zl ul + ε
with L different random effects ul ∼ N (0, σ2l Iql )
and ε ∼ N (0, σ20 In)
Parameters to be estimated : β and σ20 , σ
21 , . . . , σ
2L.
Conditional moments
E(Y|u1, . . . , uL) = Xβ +L∑
l=1
Zl ul
V(Y|u1, . . . , uL) = σ20 In
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Marginal moments
E(Y) = Xβ
V(Y) =L∑
l=1
σ2l Zl Z
′l + σ2
0 In
=L∑
l=0
σ2l Vl with V0 = In and Vl = Zl Z
′l
σ2 = (σ20 , σ
21 , . . . , σ
2L) are called variance components
↪→ V is a linear combination of variance components
↪→ ∂V∂σ2
l= Vl for all l ∈ {1, . . . , L}.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
ML estimation
1 βml = (X ′V−1X )−X ′V−1Y
2 tr(V−1Vl ) = y′PVl Py
i.e.∑L
m=1 tr(V−1Vl V−1Vm)σ2
m = y′PVl Py
↪→ iterative linear system of equations :(tr(V−1Vl V
−1Vm))
l,m=1,...,L
(σ2
m
)m=1,...,L
=(
y′PVmPy)
m=1,...,L
REML estimation
1 βreml = (X ′V−1X )−X ′V−1Y
2 tr(PVl ) = y′PVl Py
i.e.∑L
m=1 tr(PVl PVm)σ2m = y′PVl Py
↪→ iterative linear system of equations :(tr(PVl PVm)
)l,m=1,...,L
(σ2
m
)m=1,...,L
=(
y′PVmPy)
m=1,...,L
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Properties
ML estimationVar∞(βml , σ
2ml ) = I−1
with
I =
(X ′V−1X 0
0 ( 12 tr(V−1Vl V
−1Vm))l,m=1,...,L
)REML estimation
Var∞(βreml , σ2reml ) = I−1
with
I =
(X ′V−1X 0
0 ( 12 tr(PVl PVm))l,m=1,...,L
)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Likelihood ratio test for fixed effects
Test model M0 embeded in M1
(M0 and M1 have the same variance structure)
with m0 = dim(Vec(X0)) and m1 = dim(Vec(X1)) (m0 < m1).Let `M0 and `M1 denote the associated log-likelihood calculated at theML-estimated parameter values (not REML).Then
X 2 = −2`M0 + 2`M1 ∼H0 χ2m1−m0
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Comparison of models
Akaike’s Information Criterion
AIC = −2 logL(βml , σml , y) + 2K
Bayesian Information Criterion
BIC = −2 logL(βml , σml , y) + K log(n)
Other criterions with other penalty terms
Danger !!! Softwares often give AIC or BIC calculated at reml-estimatedparameter values.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
σ2 =(σ2
0 , . . . , σ2L
)are solution of the non-linear equations:
F (σ2)σ2 = g(σ2)
where F (σ)2 is (L + 1)× (L + 1) matrix
F σ2 =L∑l
tr(V−1V ′l V−1Vl ), l ′ = 0; . . . L
andg(σ2) = y′PVl′ Py
Iterative algorithm should be employed
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Prediction of random effects
β are unknown but fixed parameters to be estimatedu are unobserved realized values of the latent random vector U↪→ estimation of U has no sense but prediction has !
As the joint distribution (Y,U) is gaussian
U = E (U|Y) = DZ ′V−1(Y − Xβ)
is the best linear predictor (BLP) in the sense of minimizing the meansquare error
if V is known,U0 = DZ ′V−1(Y − Xβ0)
is the best Unbiased linear predictor (BLUP)
if V is unknownU = DZ ′V−1(Y − X β)
is the estimate of the best linear predictor
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
ANOVA with random effects
Example
2 varieties (Resistant/Susceptible)4 pesticide doses5 blocks40 observations of a response YYijk response for the i th dose j th variety and k th block.Evaluation of the influence of varieties, pesticide doses and interaction onthe response.Take into account the variablity between blocks and between interactionsblock × dose.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Example
Yijk = µ+ δi + γj + (δγ)ij + rk + wik + εijk
rk ∼ N (0, σ2r I5), wik ∼ N (0, σ2
w I20) and εijk ∼ N (0, σ2I40) mutuallyindependent.
cov(Yijk ,Yi ′j′k′) =
σ2
r + σ2w + σ2 if k = k ′ i = i ′ and j = j ′
σ2r + σ2
w if k = k ′ and i = i ′
σ2r if k = k ′
0 otherwise
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
> variety <- read.table("DataSets/Variety_eval.csv",sep=";",
+ header=T, colClasses=c(rep("factor",3),rep("numeric",2)))
> interaction.plot(variety$dose,variety$type,variety$y)
> plot(variety$y~I(variety$dose:variety$type))
1820
2224
2628
variety$dose
mea
n of
var
iety
$y
1 2 4 8
variety$type
sr
●
●
●
1:r 2:r 4:r 8:r
1520
2530
I(variety$dose:variety$type)
varie
ty$y
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Basic R code
> res.reml <- lmer(y ~ dose*type + (1|block) + (1|block:dose),
+ data=variety,REML = TRUE)
res.reml is a mer object
methods are available (e.g. fixef(res.reml), VarCorr(res.reml))
slots are available (e.g. res.reml@ranef, res.reml@mu)
Summary
> res.sum <- summary(res.reml)> print(c(slotNames(res.sum)[1:5],"...."))
[1] "methTitle" "logLik" "ngrps" "sigma" "coefs" "...."
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Summary
> res.sum@coefs
Estimate Std. Error t value
(Intercept) 20.00 1.476828 13.5425397
dose2 7.84 1.879586 4.1711322
dose4 8.18 1.879586 4.3520231
dose8 4.80 1.879586 2.5537544
types -2.94 1.314363 -2.2368242
dose2:types 0.30 1.858791 0.1613953
dose4:types 3.14 1.858791 1.6892704
dose8:types 3.94 1.858791 2.1196577
> data.frame(res.sum@REmat)
Groups Name Variance Std.Dev.
1 block:dose (Intercept) 4.5132 2.1244
2 block (Intercept) 2.0735 1.4400
3 Residual 4.3189 2.0782
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
ML versus REML
Random effects
Groups var.ML std.ML var.REML std.REML1 block:dose 3.6106 1.9002 4.5132 2.12442 block 1.6588 1.2879 2.0735 1.44003 Residual 3.4551 1.8588 4.3189 2.0782
Fixed effects
Est.ML t.value.ML Est.REML t.value.REML(Intercept) 20.00 15.14 20.00 13.54
dose2 7.84 4.66 7.84 4.17dose4 8.18 4.87 8.18 4.35dose8 4.80 2.86 4.80 2.55types -2.94 -2.50 -2.94 -2.24
dose2:types 0.30 0.18 0.30 0.16dose4:types 3.14 1.89 3.14 1.69dose8:types 3.94 2.37 3.94 2.12
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Tests for fixed effects: each parameters
Use maximum likelihood : REML=FALSE
t-test approach, with n-p degrees of freedom (40-8)
Est.ML t.value.ML p.val(Intercept) 20.00 15.14 0.0000
dose2 7.84 4.66 0.0001dose4 8.18 4.87 0.0000dose8 4.80 2.86 0.0075types -2.94 -2.50 0.0177
dose2:types 0.30 0.18 0.8579dose4:types 3.14 1.89 0.0680dose8:types 3.94 2.37 0.0240
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Tests for fixed effects: each parameters
resampling approach : available for simple linear mixed models> ic <- mcmcsamp(res.ml,n = 100)> ic <- ic@fixef> ic <- t(apply(ic,1,function(x) quantile(x,prob=c(0.025,0.975))))> xtable(ic)
2.5% 97.5%(Intercept) 16.83 23.12
dose2 4.12 11.44dose4 4.30 11.64dose8 0.77 8.50types -5.78 1.14
dose2:types -4.16 5.12dose4:types -1.73 7.88dose8:types -1.03 8.52
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Tests for fixed effects: global test
F-test approach, with n-p degrees of freedom (40-8) usinganova(res.reml) R command
Df Sum.Sq Mean.Sq F.value p.valdose 3 176.54 58.85 13.625 0.0000type 1 11.99 11.99 2.776 0.1054
dose:type 3 29.64 9.88 2.288 0.0974
balanced case: exact FtestError: block
Df Sum Sq Mean Sq F value Pr(>F)Residuals 4 119.73 29.933
Error: block:doseDf Sum Sq Mean Sq F value Pr(>F)
dose 3 545.50 181.835 13.625 0.0003595 ***Residuals 12 160.14 13.345---Signif. codes: 0 S***S 0.001 S**S 0.01 S*S 0.05 S.S 0.1 S S 1
Error: block:dose:typeDf Sum Sq Mean Sq F value Pr(>F)
type 1 11.990 11.9903 2.7762 0.1151dose:type 3 29.643 9.8809 2.2878 0.1176Residuals 16 69.102 4.3189
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Model Comparisons
we want to compare different nested models : use ML (not REML)
full model: dose*type
additive model ( dose+type) versus full model
an approximated test using chi-squarre distribution:
> compar <- anova(res.ml,additif)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)additif 8 212.85 226.36 -98.43res.ml 11 211.71 230.29 -94.86 7.14 3 0.0676
AIC or BIC criterion
> AICTab <-rbind(summary(additif)@AICtab,summary(res.ml)@AICtab)
> rownames(AICTab) <- c("additif", "res.ml")
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Predictions
conditional: E (Y |U) = Xβ + Zu
> pred.cond <- res.sum@mu
marginal: E (Y |U) = Xβ
> pred.mar <- res.sum@X%*%res.sum@coefs[,1]
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
ANCOVA with random effects
Example
32 steers dispatched into8 farms4 diet treatmentseach steer in farm received one treatment
influence of the initial weight of steers
32 observations of a response Y
Yij response for the i th diet treatment j th farm .
Determine the optimal level of feed additive to maximize the averagedaily gain of steer.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
ANCOVA with random farm effects
Yij = dtrti + γwij + farmj + εij
farmj ∼ N (0, σ2f I8), εij ∼ N (0, σ2I32) mutually independent.
cov(Yij ,Yi ′j′) =
σ2f + σ2 if i = i ′ and j = j ′
σ2f if j = j ′
0 otherwise
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Pairwise comparison of treatments via Tukey
> steers.ml <- lmer(adg~dtrt+iwt + (1|farm), data=steers,REML=FALSE)
> library(multcomp)
> summary(glht(steers.ml, linfct = mcp(dtrt = "Tukey")))
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: lmer(formula = adg ~ dtrt + iwt + (1 | farm), data = steers,
REML = FALSE)
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
10 - 0 == 0 0.48366 0.10308 4.692 <1e-04 ***
20 - 0 == 0 0.46401 0.10235 4.534 <1e-04 ***
30 - 0 == 0 0.55180 0.10487 5.262 <1e-04 ***
20 - 10 == 0 -0.01966 0.10251 -0.192 0.998
30 - 10 == 0 0.06814 0.10867 0.627 0.923
30 - 20 == 0 0.08779 0.10622 0.827 0.842
---
Signif. codes: 0 S***S 0.001 S**S 0.01 S*S 0.05 S.S 0.1 S S 1
(Adjusted p values reported -- single-step method)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
ANCOVA with random farm effects and specific fixed weight slope
Yij = dtrti + γi wij + farmj + εij
farmj ∼ N (0, σ2f I8), εij ∼ N (0, σ2I32) mutually independent.
cov(Yij ,Yi ′j′) =
σ2f + σ2 if i = i ′ and j = j ′
σ2f if j = j ′
0 otherwise
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
R code
> steers2.ml <- lmer(adg~dtrt*iwt + (1|farm), data=steers, REML=FALSE)
> anova(steers.ml,steers2.ml)
Data: steers
Models:
steers.ml: adg ~ dtrt + iwt + (1 | farm)
steers2.ml: adg ~ dtrt * iwt + (1 | farm)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
steers.ml 7 27.617 37.877 -6.8086
steers2.ml 10 29.971 44.628 -4.9855 3.6463 3 0.3023
> library(ggplot2)
> datapred <- data.frame(iwt= steers$iwt,pred=
+ steers.ml@X%*%summary(steers.ml)@coefs[,1],
+ dtrt=steers$dtrt)
> steers.plot <- ggplot(datapred, aes(y=pred,x=iwt,group=dtrt,
+ colour=dtrt))+geom_line()
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
iwt
pred
1.2
1.4
1.6
1.8
2.0
350 400 450
dtrt
0
10
20
30
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Random coefficient model
Example
10 varieties of winter wheat sampled in population
6 clones per varieties
60 combinations completely randomized in the field
pre-planting moisture at each plot have been measured
60 observations of a response Y of wheat yield (in bushel)
Yij response for the i th variety j th clone
moisture influences the wheat yield .
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Plot
> library(ggplot2)
> wheat <- read.table("DataSets/wheat.csv", sep=";",
+ colClasses=c("factor","numeric","numeric"),header=T)
> wheat.plot <- ggplot(wheat, aes(y=yield,x=moisture,group=wheat,
+ colour=wheat))+geom_line()
moisture
yiel
d
30
40
50
60
70
10 20 30 40 50 60
wheat
1
10
2
3
4
5
6
7
8
9
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Models
1 fixed coefficients
Yij = µ+ αwij + vi + γi wij + εij
but varieties sampled from population,
2 random variety effects
Yij = µ+ αwij + vi + γi wij + εij
where
v ∼ N (0, σ2v I10)
γ ∼ N (0, σ2γ I10)
↪→ are γ and v independent?
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
R code: independent case
> wheat.ind <- lmer(yield~moisture + (1|wheat)
+ + (0+moisture|wheat), data=wheat, REML=FALSE)
> xtable(data.frame(summary(wheat.ind)@REmat))
Groups Name Variance Std.Dev.
1 wheat (Intercept) 16.3594335 4.0446802 wheat moisture 0.0019991 0.0447113 Residual 0.3551692 0.595961
R code: dependent case
> wheat.dep <- lmer(yield~moisture + (1+moisture|wheat),
+ data=wheat, REML=FALSE)
Groups Name Variance Std.Dev. Corr
1 wheat (Intercept) 16.9590713 4.1181392 moisture 0.0021012 0.045839 -0.3403 Residual 0.3524533 0.593678
Fixed part is the same for both models, REML=TRUE option could bepossible
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Comparing two models
> xtable(anova(wheat.ind,wheat.dep))
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)wheat.ind 5 193.10 203.57 -91.55wheat.dep 6 194.06 206.62 -91.03 1.04 1 0.3076
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Is random slope to be included?
> xtable(data.frame(summary(wheat.ind)@REmat))
Groups Name Variance Std.Dev.1 wheat (Intercept) 16.3594335 4.0446802 wheat moisture 0.0019991 0.0447113 Residual 0.3551692 0.595961
> wheat.int <- lmer(yield~moisture + (1|wheat),
+ data=wheat, REML=FALSE)
> xtable(anova(wheat.int,wheat.ind))
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
wheat.int 4 208.26 216.64 -100.13wheat.ind 5 193.10 203.57 -91.55 17.16 1 0.0000
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
GLMM construction
Linear models
LM
�����*
HHHHHj Mixed
LMM
Generalized
GLMHHHHHj
�����*
GLMM
Flexibility to account for:
categorical, discrete ... non gaussian data ... distributions in theexponential family↪→ mean non linearly related to the model parameters↪→ variances of the data related to the mean
random effect factors
correlation structure in the data ... repeated data
over-dispersion
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Example
8 clinics sampled from a larger population
At clinic j , - nj1 receive treatment 1- nj2 receive treatment 2
The response Yjk at treament k and clinic j is the number of ‘favorable’responses.
fjk =Yjk
njkis the proportion of ‘favorable’ responses.
↪→ binomial data in a multi-center clinical trial.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Model definition
. Two random sources in the data:
• uj j = 1, .., q (q clusters) j th random effect
• Yijk k = 1, .., nij k th observation for subject i in cluster j
. Conditional building of the model:
• conditional distribution of Y given random effect:
Yi |U ∼ indep. ExpFam(µ, φ)with ∀i :
E(Yi |U) = µi = g−1(ηi ) = g−1(x ′i β + z ′i u)V(Yi |U) = φv(µi )
with g the link functionand v the variance function
• random effects distribution U :
Uj ∼ Nqj (0, σ2j Idqj )
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Example (cont.)
Yjk |U ∼ indep. BinomFam(njk , pjk )
E(Yjk |U) = µjk = njk × pjk = njk × g−1(ηjk )
V(Yjk |U) = φv(µjk )
U ∼ N8(0, σ2Id8)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Marginal moments
E(Yi ) = E(E(Yi |U))= E(g−1(x ′i β + z ′i u))= ?
↪→ cannot in general be simplified due to the non linear functiong−1.
V(Yi ) = V(E(Yi |U)) + E(V(Yi |U))= V(g−1(x ′i β + z ′i u)) + E(φv(g−1(x ′i β + z ′i u)))= ?
↪→ cannot in general be simplified.
cov(Yi ,Yj ) = cov(E(Yi |U),E(Yj |U)) + E(cov(Yi ,Yj |U))= cov(g−1(x ′i β + z ′i u), g−1(x ′j β + z ′j u)) + 0= ?
↪→ cannot in general be simplified.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Likelihood definition
parameters to be estimated: β, σ2
observed data: y
L(β, σ2; y) =∫IRq fY |U (y) fU (u) du
=∫IRq
∏i fYi |Uj
(yi ) fU (u) du
Problem: This integral cannot be formally expressed in closed formwith exceptions e.g. gaussian distribution, beta-binomial model,Poisson-gamma model ...
↪→ no analytic expression for the marginal distribution of Y↪→ need for approximation
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Estimation
2 main estimation approaches:
numerical integration for calculating the likelihood and hencenumerical maximisation of the likelihood
linearization methods using Taylor expansions to approximate themodel
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Estimation by numerical integration
L(β, σ2; y) =∫IRq
∏i fYi |U (yi ) fU (u) du
. Gaussian quadrature:integral approximated by a weighted sum:cf. Abramowitz & Stegun (1964)If Uj are independent:
L(β, σ2; y) =∫IRq
∏i fYi |U (yi ) fU (u) du
=∏
j
∫IR
∏i fYij |Uj
(yij ) fUj (uj ) duj
=∏
j
∫IR
∏i fYij |U∗j (yij ) fU∗j
(u∗j ) du∗j
`(β, σ2; y) '∑
j ln[∑
l
(∏i fYij |U∗j (yij , u
∗cj,l ))
wj,l
]with - quadrature points: u∗cj,l
- weights: wj,l
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
. Difficulties:
choice of the quadrature points
low dimension of random effects
a single random effects or two, no more
nested : OKcrossed : impossible
↪→ works only in very simple situations
↪→ implemented in SAS macro nlmixed devoted to non linear mixedmodels
. Other numerical integration techniques:MCMC, EM, MCEM ... suffer from nearly the same difficulties
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Estimation by linearization
. Simple approach cf . Schall (1991)
2 steps method:
⇒ conditionaly on U: the GLMM model is a GLM
recall estimation of β in a GLM : weighted least squares
X ′W−1β Xβ = X ′W−1
β zβ
- on pseudo data (or working variates) : zβ = η + (y − µβ)g ′(µβ)↪→ seen as an expansion of the link function around the mean- with covariance matrix : Wβ = diag{g ′(µβ)2 v(y)}
linearized model M[t] (iterative procedure) :
Z [t] = Xβ + ε where V(ε) = Wβ[t]
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
given U, linearized model M[t]u :
Z [t] = Xβ + ZU + ε where V(ε) = Wβ[t],u[t] = W [t]
⇒ estimation in the LMM
iterative estimation of parameters in the LMM (random effects recoveringtheir random nature)
↪→ double iterative algorithm
Other justifications to similar computational algorithms:
. Penalized quasi-likelihood cf. Breslow & Clayton (1993)
. Laplace approximation cf. Wolfinger (1993)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
. Advantages:- easy to compute- numerically fast- no limit on the number or type of random effects
. Drawbacks:- small bias (correction proposed by Lin & Breslow (1996))- normality assumption of the random effects- small value of variance components
↪→ implemented in SAS macro glimmix
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Multi-center clinical trial
clinic trt fav unfav
1 1 drug 11.00 25.002 1 cntl 10.00 27.003 2 drug 16.00 4.004 2 cntl 22.00 10.005 3 drug 14.00 5.006 3 cntl 7.00 12.007 4 drug 2.00 14.008 4 cntl 1.00 16.009 5 drug 6.00 11.00
10 5 cntl 0.00 12.0011 6 drug 1.00 10.0012 6 cntl 0.00 10.0013 7 drug 1.00 4.0014 7 cntl 1.00 8.0015 8 drug 4.00 2.0016 8 cntl 6.00 1.00
Do drug and clinic have an effecton the probability of favorableresponse?
↪→ each clinic is measured twicefor drug and control: dependencybetween the 2 reponses for eachclinic
↪→ clinics sampled from a targetpopulation : between clinicsvariability
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Modeling approach
Denote pjk the probability of favorable response for treatment k andclinic j .
A linear model would be described:
fjk = pjk + εjk
withpjk = µ+ αk + cj + (γ)jk
Problems:
non normal distribution of the data
values of linear combination of effects are outside of the range forprobabilities
interaction γ and error ε are confounded in the model
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Binomial model with canonical logit link
ηjk = logpjk
1− pjk= µ+ αk + cj + (γ)jk
with cj ∼ N (0, σ2c ) and (γ)jk ∼ N (0, σ2
γ).
> clinics <- read.table("DataSets/clinics.txt", header=T,
+ colClasses = c(rep("factor",2),rep("numeric",2)))
> clinics$trt <- relevel(clinics$trt,"drug")#change reference level
> clinics.glmm <- glmer(cbind(fav,unfav)~ trt + (1|clinic/trt),
+ data=clinics, family=binomial(link="logit"))
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Binomial model with canonical logit link
> summary(clinics.glmm)
Generalized linear mixed model fit by the Laplace approximationFormula: cbind(fav, unfav) ~ trt + (1 | clinic/trt)
Data: clinicsAIC BIC logLik deviance
43.81 46.9 -17.9 35.81Random effects:Groups Name Variance Std.Dev.trt:clinic (Intercept) 0.0057396 0.07576clinic (Intercept) 1.9317511 1.38987Number of obs: 16, groups: trt:clinic, 16; clinic, 8
Fixed effects:Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.4573 0.5442 -0.840 0.4008trtcntl -0.7423 0.3001 -2.473 0.0134 *---Signif. codes: 0 S***S 0.001 S**S 0.01 S*S 0.05 S.S 0.1 S S 1
Correlation of Fixed Effects:(Intr)
trtcntl -0.263
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Binomial model with other link functions
> clinicsProb.glmm <- glmer(cbind(fav,unfav)~ trt + (1|clinic/trt),+ data=clinics, family=binomial(link="probit"))> summary(clinicsProb.glmm)
Generalized linear mixed model fit by the Laplace approximationFormula: cbind(fav, unfav) ~ trt + (1 | clinic/trt)
Data: clinicsAIC BIC logLik deviance
44.12 47.21 -18.06 36.12Random effects:Groups Name Variance Std.Dev.trt:clinic (Intercept) 0.00045787 0.021398clinic (Intercept) 0.66611291 0.816157Number of obs: 16, groups: trt:clinic, 16; clinic, 8
Fixed effects:Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.2680 0.3181 -0.842 0.3996trtcntl -0.4425 0.1735 -2.550 0.0108 *---Signif. codes: 0 S***S 0.001 S**S 0.01 S*S 0.05 S.S 0.1 S S 1
Correlation of Fixed Effects:(Intr)
trtcntl -0.266
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
You can get predicted values:- on the predictor scale / on the inverse link scale (here probability scale)- with BLUP of random effects or not.
BLUP NOBLUP
Inv.pred g−1(x ′β + z ′u) g−1(x ′β)
pred x ′β + z ′u x ′β
different scale
> xtable(data.frame(logits=clinics.glmm@eta[1:5], ilogits=clinics.glmm@mu[1:5],+ ilogit2 = exp(clinics.glmm@eta[1:5])/(1+exp(clinics.glmm@eta[1:5]))))
logits ilogits ilogit21 -0.57 0.36 0.362 -1.29 0.22 0.223 1.39 0.80 0.804 0.65 0.66 0.665 0.54 0.63 0.63
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Longitudinal data
Longitudinal (or repeated) data are successive observations on each of acollection of observational units.
Example
traits are measured at different times
tree or animal growthchange in survivorship over time among populations
individuals are exposed to different level of a same treatment
same plants exposed to varying nitrogen dose (discrete or continuous)patients receiving several treatments along time periods
General objectives
→ Modeling the evolution among repetitions of traits depending onco-variables (factor or regressor) taking into account dependencies withinsubjects→ Modeling the between subjects variability
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Advantages:
an experiment more efficienthelps keep the variability lowhelps keep the validity of the results higher
Drawbacks :
may not be possible for each participant to be in all conditions of theexperimentunbalanced designmissing data
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Main question
observations on same subject
⇓
dependencies within subjects
but consecutive observations can be more correlated than more distantobservations
⇓
How to model within subject dependencies?
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Simple treatment×time model with repeated measures
Example
Pharmaceutical company tests a new drug on respiratory ability
A=standard, C=new, P = placebo
24 patients
each patient will receive each drug in a randomly assigned order
respiratory ability (FEV1) measured hourly for 8 hours followingtreatment
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Hypothesis
How does the response mean differ among treatment?
↪→ Is there a treatment main effect?
How does the response mean change over time?
↪→ Is there a time main effect?
How do response differences among treatment change over time?
↪→ Is there treatment×time interactions?
Treatment is called the between-subjects factor
Time is called within-subjects factor
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Data and reshape
> respir <- read.table("DataSets/respiratory.csv",
+ colClasses=c("factor", rep("numeric",9), "factor"),
+ header=TRUE, sep=";")
> colnames(respir) <-c("patient", 0:8,"drug")
> library(reshape)
> respir.lon <- melt(respir,id=c("patient","drug") )
> respir.lon$time <- as.numeric(respir.lon[,3])-1
> colnames(respir.lon)[4] <- "ability"
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Data and reshape
> qplot(time,ability,data=respir.lon, facets =~drug,
+ group=patient, geom="line", colour=patient)
time
abili
ty
1
2
3
4
1
2
3
4
a
p
0 2 4 6 8
c
0 2 4 6 8
patient
201
202
203
204
205
206
207
208
209
210
211
212
214
215
216
217
218
219
220
221
222
223
224
232
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Fixed model
gaussian familyY = µ+ ε
expectation for individual i in the treatment j at time k
E[Yijk ] = µ+ αj + γk + (αγ)jk
where µ the intercept, α the treatment effects, γ the time effect and(αγ) the interaction time × treatment
varianceV(Y ) = σ2Id
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Simple ANCOVA model
> Subrespir.lon <- respir.lon[respir.lon$variable!=0,]
> respir.lm <- lm(ability~drug*variable,data=Subrespir.lon)
> anova(respir.lm)
Analysis of Variance Table
Response: ability
Df Sum Sq Mean Sq F value Pr(>F)
drug 2 25.783 12.8913 25.6060 2.320e-11 ***
variable 7 17.170 2.4529 4.8722 2.386e-05 ***
drug:variable 14 6.280 0.4486 0.8910 0.5687
Residuals 552 277.903 0.5034
---
Signif. codes: 0 S***S 0.001 S**S 0.01 S*S 0.05 S.S 0.1 S S 1
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Simple plots
> residus <- matrix(respir.lm$res,ncol=8)
> pairs(residus[,1:5])
var 1
−1.5 −0.5 0.5
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
−1.5 −0.5 0.5 1.5
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
−1.
5−
0.5
0.5
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
−1.
5−
0.5
0.5
●
●
●
●●●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
var 2●
●
●
●● ●
●●●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●● ●
●●●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
● ●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
● ●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
● ●
●
●●●
●
●●
●
●
var 3●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
● ●
●
●●●
●
●●
●
●
−1.
50.
01.
0
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
● ●
●
●●●
●
● ●
●
●
−1.
5−
0.5
0.5
1.5
●●●
●●
●
●
●
●
●
●
●
● ●
●
●
● ●●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
● ●●
●
●●
●
●
●●●
●●
●
●
●
●
●
●
●
● ●
●
●
● ●●
●
●
●
●
●●
●
●
● ●●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
● ●●
●
●●
●
●
●●●
●●
●
●
●
●
●
●
●
●●
●
●
● ●●
●
●
●
●
●●
●
●
● ●●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
● ●●
●
●●
●
●
var 4●
●●
●●
●
●
●
●
●
●
●
●●
●
●
● ●●
●
●
●
●
●●●
●
● ●●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●●
●
●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●●●
●
● ●
●
●
−1.5 −0.5 0.5
●
●
●
●
●●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●● ●
●
●
●
●
●
●
●
●●
●
●
●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●
● ●●
●
●
● ●
●
●
●
●
●
●●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●● ●
●
●
●
●
●
●
●
●●
●
●
●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
● ●●
●
●
● ●
●
−1.5 0.0 1.0
●
●
●
●
●●
●●
●
●
●
●
●● ●
●
●
●
●
●
●
●●
●● ●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●
● ●●
●
●
● ●
●
●
●
●
●
●●
●●
●
●
●
●
●● ●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●●●
●
●
● ●
●
−1.5 0.0 1.0
−1.
50.
01.
0
var 5
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Linear mixed model
> respir1.lmm <- lmer(ability~drug*variable+(1|patient),
+ data=Subrespir.lon, REML=FALSE)
> respir2.lmm <- lmer(ability~drug*variable+(1|patient/drug),
+ data=Subrespir.lon, REML=FALSE)
> anova(respir1.lmm,respir2.lmm)
Data: Subrespir.lon
Models:
respir1.lmm: ability ~ drug * variable + (1 | patient)
respir2.lmm: ability ~ drug * variable + (1 | patient/drug)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
respir1.lmm 26 456.37 569.63 -202.19
respir2.lmm 27 294.02 411.63 -120.01 164.35 1 < 2.2e-16 ***
---
Signif. codes: 0 S***S 0.001 S**S 0.01 S*S 0.05 S.S 0.1 S S 1
> data.frame(summary(respir2.lmm)@REmat)
Groups Name Variance Std.Dev.
1 drug:patient (Intercept) 0.053485 0.23127
2 patient (Intercept) 0.368488 0.60703
3 Residual 0.060497 0.24596
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Comparing fixed parameters estimation
> aic.lm <- c(AIC(respir.lm),BIC(respir.lm))
> aic.lm
[1] 1264.807 1373.710
> aic.lmm <- summary(respir2.lmm)@AICtab
> aic.lmm
AIC BIC logLik deviance REMLdev
294.0151 411.63 -120.0075 240.0151 329.7807
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Comparing fixed part given covariance structure
> respir3.lmm <- lmer(ability~drug+variable+(1|patient/drug),
+ data=Subrespir.lon, REML=FALSE)
> anova(respir2.lmm,respir3.lmm)
Data: Subrespir.lon
Models:
respir3.lmm: ability ~ drug + variable + (1 | patient/drug)
respir2.lmm: ability ~ drug * variable + (1 | patient/drug)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
respir3.lmm 13 360.41 417.04 -167.20
respir2.lmm 27 294.02 411.63 -120.01 94.391 14 5.584e-14 ***
---
Signif. codes: 0 S***S 0.001 S**S 0.01 S*S 0.05 S.S 0.1 S S 1
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Comments
Balanced experimental design
↪→ estimator of fixed effects do not depend on variance estimations↪→ estimation of fixed effects are equals for both cases
> coefs
[,1] [,2]
(Intercept) 3.48875000 3.48875000
drugc 0.19625000 0.19625000
drugp -0.67375000 -0.67375000
variable2 -0.07666667 -0.07666667
variable3 -0.28958333 -0.28958333
Residual variances are different
0.5 without random effects and 0.06 taking into account randomeffects
↪→ interaction becomes ”significant”.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Modelling covariance function
are residual variances constant over time?
are covariances between times depending on lag?
are covariances depending on treatments?
...
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
From here, we consider:
V =
V11 0 0 0
0 V12 0 0
0 0. . . 0
0 0 0 V83
We assume no patient dependency from one drug to the other.We work on the 8× 8-covariance sub-block matrix
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Studying relation between estimated covariance and time
Let assume an unstructured covariance model: the mostparameterized model
V(Yij ) =
σ21 σ12 σ13 . . .
σ22 σ23 . . .
. . .. . .
. . .
> Subrespir.lon <- respir.lon[respir.lon$variable!=0,]
> Subrespir.lon$indic <- I(Subrespir.lon$patient:Subrespir.lon$drug)
> respir.un <- gls(ability~drug*variable, data=Subrespir.lon,
+ correlation = corSymm(form = ~1 | indic),
+ weights= varIdent( form = ~1 | time),method="ML")> VarCov
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8][1,] 0.4351987 0.4395978 0.4256334 0.3981156 0.4168162 0.3769659 0.3413459 0.3679788[2,] 0.4395978 0.4947652 0.4607254 0.4492399 0.4737156 0.4077123 0.3825432 0.4079240[3,] 0.4256334 0.4607254 0.4717923 0.4491765 0.4641391 0.4085257 0.3853456 0.4078965[4,] 0.3981156 0.4492399 0.4491765 0.4732017 0.4635553 0.4004393 0.3855756 0.4073768[5,] 0.4168162 0.4737156 0.4641391 0.4635553 0.5538420 0.4738645 0.4449387 0.4744100[6,] 0.3769659 0.4077123 0.4085257 0.4004393 0.4738645 0.4701743 0.4267930 0.4438868[7,] 0.3413459 0.3825432 0.3853456 0.3855756 0.4449387 0.4267930 0.4786082 0.4308936[8,] 0.3679788 0.4079240 0.4078965 0.4073768 0.4744100 0.4438868 0.4308936 0.4821444
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
times
un
0.35
0.40
0.45
0.50
0.55
0 1 2 3 4 5 6 7
factor(gp)
1
2
3
4
5
6
7
8
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Autoregressive structure R
Let assume autoregressive covariance matrix:
V(Yij ) = σ2R where rkk′ = cor(Yijk ,Yijk′) = ρ|k−k′|
> respir.ar <- gls(ability~drug*variable, data = Subrespir.lon,
+ correlation = corAR1(0.2, form= ~time|indic), method="ML")
φ = 0.9219 and σ2 = 0.4685
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Plot between estimated unstructured and AR(1)covariances
ar
un
0.25
0.30
0.35
0.40
0.45
0.50
0.55
0.60
0.25 0.30 0.35 0.40 0.45 0.50
factor(gp)
1
2
3
4
5
6
7
8
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Random effect added to autoregressive structure
Autoregressive conditional covariance matrix for 1 subject within drug:
V(Yij |u) = σ2R where rkk′ = cor(Yijk ,Yijk′ |u) = ρ|k−k′|
Marginal covariance matrix for 1 subject within drug:
Vij =
(σ2 + σ2
u σ2ρ|k−k′| + σ2u
σ2ρ|k−k′| + σ2u σ2 + σ2
u
)> respir.lmear <- lme(ability~drug*variable, data=Subrespir.lon,
+ random= ~1|indic, correlation = corAR1(0.2, form= ~time|indic),method="ML")
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Plot between estimated unstructured and AR(1)+REcovariances
lmear
un
0.35
0.40
0.45
0.50
0.55
0.40 0.42 0.44 0.46
factor(gp)
1
2
3
4
5
6
7
8
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Impact of the patient RE
> respir2.lmear <- lme(ability~drug*variable, data=Subrespir.lon,+ random= ~1|patient/drug,+ correlation = corAR1(0.2, form= ~time|patient/drug),method="ML")> anova(respir.lmear,respir2.lmear)
Model df AIC BIC logLik Test L.Ratio p-valuerespir.lmear 1 27 258.8141 376.4290 -102.4070respir2.lmear 2 28 190.0226 311.9936 -67.0113 1 vs 2 70.79148 <.0001
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Some remarks
taking into account dependencies within subjects can inscrease testfor fixed effects
random effects are a way to model within subjects dependencies butare not necessarily sufficient
defining appropriate residual correlation form is a difficult task
plots are useful tools but not sufficientdifferent models should be compared but:
↪→ conclusions may differ according to different criterion
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Overdispersion in glms (Hinde and Demetrio, 1998a,b)
Residual Deviance ≈ Residual d.f.
What if Residual Deviance � Residual d.f.?1 Badly fitting model
omitted terms/variablesincorrect relationship (link)outliers
2 variation greater than predicted by model: ⇒ Overdispersion
count data: Var(Y ) > µcounted proportion data:
Var(Y ) > nπ(1− π)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Overdispersion in glms (Hinde and Demetrio, 1998a,b)
Causes of Overdispersion
variability of experimental material– individual level variability
correlation between individual responsese.g. litters of rats
cluster samplinge.g. areas; schools; classes; children
aggregate level data
omitted unobserved variables
excess zero counts (structural and sampling zeros)
ConsequencesWith correct mean model we have consistent estimates of β but:
incorrect standard errors
selection of overly complex models
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Worldwide Airline Fatalities, 1976-85
Example
Year Fatal Passenger Passengeraccidents deaths miles
(100 million)
1976 24 734 38631977 25 516 43001978 31 754 50271979 31 877 54811980 22 814 58141981 21 362 60331982 26 764 58771983 20 809 62231984 16 223 74331985 22 1066 7107
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Worldwide Airline Fatalities, 1976-85
Simple Models
Passenger miles (mi ) as exposure variable
Poisson log-linear model
Linear time trend
Yi ∼ Pois(miλi ) (1)
log λi = β0 + β1yeari
Fatal accidents:Deviance(time trend) = 20.68Residual Deviance = 5.46 on 8 d.f.
Passenger deaths:Deviance(time trend) = 202.1Residual Deviance = 1051.5 on 8 d.f.
⇒ compounding with aircraft size
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Models for Overdispersion (Hinde and Demetrio, 1998a,b)
Two broad categories
assume some more general form for the variance function, possiblywith additional parameters.Estimation using moment methods, quasi-likelihood, extendedquasi-likelihood, pseudo-likelihood, . . .
assume a two-stage model for the response with the response modelparameter following some distribution.Maximum likelihood estimation (conjugate distribution models) orapproximate methods (e.g. using first two moments as above)Full hierarchical model – Bayesian methods
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Mean-variance Models (Hinde and Demetrio, 1998a,b)
Overdispersed Proportion DataYi successes out of mi trials, i = 1, . . . , n.
Model expected proportions πi with link function g and
g(πi ) = β′xi
constant overdispersion
Var(Yi ) = φmiπi (1− πi )
A general variance function:Overdispersion allowed to depend upon both mi and πi .
Var(Yi ) = miπi (1− πi )×[1 + φ(mi − 1)δ1{πi (1− πi )}δ2
]
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Mean-variance Models (Hinde and Demetrio, 1998a,b)
Overdispersed Count dataRandom variables Yi represent counts with means µi .
Constant overdispersion
Var(Yi ) = φµi
can arise through a simple compounding process.Suppose that Ni ∼ Pois(µN ) and T =
∑Ni
i=1 Xi , Xi are iid randomvariables.
E[T ] = µT = EN (E[T |N]) = µNµX
Var(T ) = EN [V(T |N)] + VN (E[T |N])
= µT
(σ2
X
µX+ µX
)= µT
E[X 2]
E[X ]
A general variance function
Var(Yi ) = µi
{1 + φµδi
}C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Two-stage Models – Binomial (Hinde and Demetrio, 1998a,b)
Beta-BinomialYi |Pi ∼ Bin(mi ,Pi )
E(Pi ) = πi Var(Pi ) = φπi (1− πi )
Unconditionally, E(Yi ) = miπi and
Var(Yi ) = miπi (1− πi )[1 + (mi − 1)φ]
Taking Pi ∼ Beta(αi , βi ), with αi + βi fixed, gives beta-binomialdistribution for Yi with the same variance function.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Two-stage Models – Binomial (Hinde and Demetrio, 1998a,b)
The same variance function results from assuming that individual binaryresponses are not independent but have a constant correlation.Writing Yi =
∑mi
j=1 Rij , where Rij are Bernoulli random variables with
E[Rij ] = πi and Var(Rij ) = πi (1− πi )
then, assuming a constant correlation ρ between the Rij ’s for j 6= k, wehave
Cov(Rij ,Rik ) = ρπi (1− πi )
and
E[Yi ] = miπi
Var(Yi ) =
mi∑j=1
Var(Rij ) +
mi∑j=1
∑k 6=j
Cov(Rij ,Rik )
= miπi (1− πi ) + mi (mi − 1)[ρπi (1− πi )]
= miπi (1− πi )[1 + ρ(mi − 1)],
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Two-stage Models – Binomial (Hinde and Demetrio, 1998a,b)
Logistic-normal and related modelsRandom effect in the linear predictor
ηi = β′xi + σzi
assume zi ∼ N(0, 1)
Probit-normal model - a convenient interpretation as a thresholdmodel for a normally distributed latent variable (McCulloch, 1994).Logistic-normal using EM algorithm with Gaussian quadrature.Approximate approach using a Williams type III model with
Var(Yi ) = miπi (1− πi )[1 + φ(mi − 1)πi (1− πi )]
make no specific distributional assumption about z - estimate adiscrete mixing distribution by non-parametric maximum likelihood(NPML).
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Two-stage Models – Binomial (Hinde and Demetrio, 1998a,b)
Considered as the two-stage model, the logit(Pi ) have a normal distributionwith variance σ2, i.e. logit(Pi ) ∼ N(x′iβ, σ
2). Writing
Ui = logit(Pi ) = logPi
(1− Pi )⇒ Pi =
eUi
(1 + eUi )
and using Taylor series for Pi , around Ui = E[Ui ] = x′iβ, we have
Pi =ex′i β
(1 + ex′iβ)
+ex′i β
(1 + ex′iβ)2
(Ui − x′iβ) + o(Ui − x′iβ).
Then
E(Pi ) ≈ex′i β
(1 + ex′iβ)
:= πi
and
Var(Pi ) ≈
[ex′i β
(1 + ex′iβ)2
]2
Var(Ui ) = σ2π2i (1− πi )
2
Consequently the variance function for the logistic-normal model can beapproximated by
Var(Yi ) ≈ miπi (1− πi )[1 + σ2(mi − 1)πi (1− πi )]
which Williams (1982) refers to as a type III variance function.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Two-stage models – Count data (Hinde and Demetrio, 1998a,b)
Negative Binomial
Variation in Poisson rate parameter:
Yi |θi ∼ Pois(θi ), θi ∼ Gamma(k, λi )
leads to negative binomial distribution with
E[Yi ] = µi = k/λi
and
Var(Yi ) = µi +µ2
ik
For known k, in the 1-parameter exponential family so still in glmframework.
Different assumptions for the Γ-distribution lead to differentparameterizations with different overdispersed variance functions,e.g. θi ∼ Γ(ki , λ) gives
Var(Yi ) = µi
(1 + 1
λ
)= φµi
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Two-stage models – Count data (Hinde and Demetrio, 1998a,b)
Negative binomial distribution
Yi |θi ∼ Pois(θi )θi ∼ Gamma(k, λi ), i = 1, . . . , n
This leads to a negative binomial distribution for the Yi with
fYi (yi ;µi , k) =Γ(k + yi )
Γ(k)yi !
µyii kk
(µi + k)k+yi, yi = 0, 1, . . .
and
E(Yi ) = k/λi = µi
Var(Yi ) = Eθi [Var(Yi |θi )] + Varθi (E[Yi |θi ])
= E[θi ] + V(θi ) =k
λi+
k
λ2i
Var(Yi ) = µi +µ2
i
k
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Two-stage models – Count data (Hinde and Demetrio, 1998a,b)
Poisson-normal and related modelsIndividual level random effect in the linear predictor
ηi = β′xi + σZi
assume Zi ∼ N(0, 1), so
Yi |Zi ∼ Pois(λi ) with log λi = x′iβ + σZi
where Zi ∼ N(0, 1), which gives
E[Yi ] = EZi (E[Yi |Zi ]) = EZi [ex′i β+σZi ]
= ex′i β+ 12σ2
:= µi
Var(Yi ) = EZi [Var(Yi |Zi )] + VarZi (E[Yi |Zi ])
= ex′i β+ 12σ2
+ VarZi (ex′i β+σZi )
= ex′i β+ 12σ2
+ e2x′i β+σ2
(eσ2
− 1).
i.e. a variance function of the form
Var(Yi ) = µi + k ′µ2i
make no specific distributional assumption about Z - estimate adiscrete mixing distribution by NPML.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Germination of Orobanche seed
Example
O. aegyptiaca 75 O. aegyptiaca 73
Bean Cucumber Bean Cucumber
10/39 5/6 8/16 3/12
23/62 53/74 10/30 22/41
23/81 55/72 8/28 15/30
26/51 32/51 23/45 32/51
17/39 46/79 0/4 3/7
10/13
Response variable: Yi – number of germinated seeds out of mi seeds.Distribution: Binomial.Systematic component: factorial 2× 2 (2 species, 2 extracts),completely randomized experiment (Crowder, 1978).Aim: to see how germination is affected by species and extracts.Problem: overdispersion.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Models for Orobanche Data (Hinde and Demetrio, 1998a,b)
Binomial:
Var(Yi ) = miπi (1− πi )
• residual deviance for the interaction model is 33.28 on 17 df. – overdispersion
Quasi-likelihood:
Var(Yi ) = miπi (1− πi )
• constant overdispersion φ = 1.862• only marginal evidence of interaction• extract only important factor
Williams:
Var(Yi ) = miπi (1− πi )[1 + φ(mi − 1)]
• moment estimate φ = 0.0249• only marginal evidence of interaction• extract only important factor
• note similarity to QL, even though mi not equal.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Table: Deviances with overdispersion estimated from maximal model.
Bin Const Beta-BinomialML QL ML
E | S 56.49 30.34 32.69S | E 3.06 1.64 2.88S.Et 6.41 3.44 4.45
φ 1.862 0.012
Table: Deviances with overdispersion re-estimated for each model.
Beta-Binomial Logistic-NormalSource ML MLE | S 15.44 15.40S | E 2.73 2.71S.E 4.13 4.17
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Half-normal plots with simulation envelopes (Hinde and Demetrio, 1998a,b)
0.5 1.0 1.5 2.0
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
Binomial: Logit
Half-normal scores
Devia
nce
resid
uals
0.5 1.0 1.5 2.0
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
Constant Overdispersion (QL)
Half-normal scores
Devia
nce
resid
uals
0.5 1.0 1.5 2.0
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
Williams Type II
Half-normal scores
Devia
nce
resid
uals
0.5 1.0 1.5 2.0
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
Williams Type III
Half-normal scores
Devia
nce
resid
uals
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
R program
Species <- factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,2, 2, 2, 2, 2, 2, 2, 2, 2, 2))
Extract <- factor(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,1, 1, 1, 1, 1, 2, 2, 2, 2, 2))
y <- c(10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8,10, 8, 23, 0, 3, 22, 15, 32, 3)
m <- c(39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16,30, 28, 45, 4, 12, 41, 30, 51, 7)
orobanch <- data.frame(Species, Extract, m, y)rm(Species, Extract, m, y)attach(orobanch)resp<-cbind(y,m-y)
# Binomial fitoro.Bin<-glm(resp~Species*Extract, family=binomial)anova(oro.Bin, test="Chisq")summary(oro.Bin, cor=FALSE)
oro.Bin<-glm(resp~Extract*Species, family=binomial)anova(oro.Bin, test="Chisq")summary(oro.Bin, cor=FALSE)
## Pearson estimate of phi(X2<-sum(residuals(oro.Bin, 'pearson')^2))(phi<-X2/df.residual(oro.Bin))
R program (cont)
# Quasilikelihhod fitoro.QL<-glm(resp~Species*Extract, family=quasibinomial)summary(oro.QL, cor=FALSE)summary(oro.QL)$dispersionanova(orobanchQL.fit, test="F")
##### Beta-binomial modellibrary(aod)# re-estimating phioro.BBin4 <- betabin(cbind(y,m-y) ~Extract,~1, data=orobanch)oro.BBin3 <- betabin(cbind(y,m-y) ~Species,~1, data=orobanch)oro.BBin2 <- betabin(cbind(y,m-y) ~Extract+Species,~1, data=orobanch)oro.BBin1 <- betabin(cbind(y,m-y) ~Extract+Species+Extract:Species,~1, data=orobanch)anova(oro.BBin4,oro.BBin2, oro.BBin1)anova(oro.BBin3,oro.BBin2, oro.BBin1)summary(oro.BBin2)
# fixing phi estimated by the maximum modelf1 <- betabin(cbind(y, m - y) ~ Extract+Species+Extract:Species, ~ 1, data=orobanch,
fixpar = list(5, 0.01238))f2 <- betabin(cbind(y, m - y) ~ Extract+Species, ~ 1, data=orobanch,
fixpar = list(4, 0.01238))f3 <- betabin(cbind(y, m - y) ~ Species, ~ 1, data=orobanch,
fixpar = list(3, 0.01238))f4 <- betabin(cbind(y, m - y) ~ Extract, ~ 1, data=orobanch,
fixpar = list(3, 0.01238))
anova(f4,f2,f1)anova(f3,f2,f1)
R program (cont)
# logistic normallibrary(lme4)ind <- (1:length(y))# re-estimating phioro.LN1<-glmer(resp~Extract*Species + (1|ind), family=binomial(link="logit"))oro.LN1summary(oro.LN1)@coefssummary(oro.LN1)@REmatoro.LN2<-glmer(resp~Extract+Species + (1|ind), family=binomial(link="logit"))oro.LN2oro.LN4<-glmer(resp~Extract + (1|ind), family=binomial(link="logit"))oro.LN4oro.LN3<-glmer(resp~Species + (1|ind), family=binomial(link="logit"))oro.LN3
anova(oro.LN4,oro.LN2,oro.LN1)anova(oro.LN3,oro.LN2,oro.LN1)
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Worldwide Airline Fatalities, 1976-85
Example
Year Fatal Passenger Passengeraccidents deaths miles
(100 million)
1976 24 734 38631977 25 516 43001978 31 754 50271979 31 877 54811980 22 814 58141981 21 362 60331982 26 764 58771983 20 809 62231984 16 223 74331985 22 1066 7107
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Worldwide Airline Fatalities, 1976-85
Simple Models
Passenger miles (mi ) as exposure variable
Poisson log-linear model
Linear time trend
Yi ∼ Pois(miλi ) (2)
log λi = β0 + β1yeari
Fatal accidents:Deviance(time trend) = 20.68Residual Deviance = 5.46 on 8 d.f.
Passenger deaths:Deviance(time trend) = 202.1Residual Deviance = 1051.5 on 8 d.f.
⇒ compounding with aircraft size
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Confidence Intervals – airline data
1976 1978 1980 1982 1984
400
600
800
1000
1200
Year
Num
ber
of d
eath
s
●
●
●
●
●
●
●
●
●
●
*
ObservedPoissonNegative binomialQuasi−Poisson
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
R program
airline.dat <- scan(what=list(acc=0, death=0, miles=0))24 734 386325 516 430031 754 502731 877 548122 814 581421 362 603326 764 587720 809 622316 223 743322 1066 7107
airline <-data.frame(airline.dat)airline.dat$year<-seq(1976,1985,by=1)attach(airline.dat)plot(year,acc, main="Number of Fatal Accidents vs Year")
acc1.fit<-glm(acc~year+offset(log(miles)), family=poisson)summary(acc1.fit)anova(acc1.fit,test="Chisq")
plot(c(1976,1985), c(15,35), type="n", xlab="Year", ylab="Number of Accidents")points(year,acc)x<-seq(1976,1985,1)lp<-predict(acc1.fit,data.frame(year=x))fv<-exp(lp)lines(x,fv,lty=1)title(sub="Observed values and fitted curve")
qqnorm(resid(acc1.fit),ylab="Residuo")qqline(resid(acc1.fit))
R program (cont)
# Number of deaths# Poisson fitplot(year,death, main="Number of Fatal Accidents vs Year")
death1.fit<-glm(death~year+offset(log(miles)), family=poisson)summary(death1.fit)anova(death1.fit, test="Chisq")inflim <- with(predict(death1.fit, se = TRUE,type="response"),fit -pnorm(0.975)*se.fit)suplim <- with(predict(death1.fit, se = TRUE,type="response"),fit + pnorm(0.975)*se.fit)
plot(c(1976,1985), c(350,1200), type="n", xlab="Year", ylab="Number of deaths")points(year,death)x<-seq(1976,1985,1)fv1<-predict(death1.fit,data.frame(year=x),type="response")lines(x,fv1,lty=1)lines(x,inflim,lty=2)lines(x,suplim,lty=2)
# Negative binomial fitlibrary(MASS)death2.fit<-glm.nb(death~year+offset(log(miles)),link = log)summary(death2.fit)anova(death2.fit, test="F")inflim <- with(predict(death2.fit, se = TRUE,type="response"),fit -qt(0.975,8)*se.fit)suplim <- with(predict(death2.fit, se = TRUE,type="response"),fit + qt(0.975,8)*se.fit)
R program (cont)
fv2<-predict(death2.fit,data.frame(year=x),type="response")#lines(x,fv2,lty=1)#lines(x,fv1,lty=1,col="blue")lines(x,inflim,lty=3, col="blue")lines(x,suplim,lty=3, col="blue")
# Quasilikelihhod fitdeath3.fit<-glm(death~year+offset(log(miles)), family=quasipoisson)summary(death3.fit)anova(death3.fit, test="F")inflim <- with(predict(death3.fit, se = TRUE,type="response"),fit -qt(0.975,8)*se.fit)suplim <- with(predict(death3.fit, se = TRUE,type="response"),fit + qt(0.975,8)*se.fit)
fv3<- predict(death3.fit,data.frame(year=x),type="response")#lines(x,fv3,lty=1,col="red")lines(x,inflim,lty=4, col="red")lines(x,suplim,lty=4, col="red")legend(1976,1200.0,c("Observed","Poisson", "Negative binomial","Quasi-Poisson"), lty=c(-1,2,3,4),pch=c("*"," "," "," "), col=c("black","black","blue", "red"))
# Poisson log-normallibrary(lme4)ind <- (1:length(death))death.LN1<-glmer(death~year+offset(log(miles)) + (1|ind),family=poisson(link="log"))death.LN1summary(death.LN1)@coefssummary(death.LN1)@REmat
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Orange data (Pinheiro & Bates, 2000; Molenberghs &Verbeke, 2008)
Example
The data are the trunk circumference (mm) of each of five orange treesat seven different occasions.
ResponseDay Tree 1 Tree 2 Tree 3 Tree 4 Tree 5118 30 33 30 32 30484 58 69 51 62 49664 87 111 75 112 81
1004 115 156 108 167 1251231 120 172 115 179 1421372 142 203 139 209 1741582 145 203 140 214 177
All trees measured on the same occasions – balanced, longitudinal data.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Time since December 31, 1968 (days)
Trun
k ci
rcum
fere
nce
(mm
)
50
100
150
200
500 1000 1500
●
●
●
●●
● ●
3
●
●
●
●●
● ●
1
500 1000 1500
●
●
●
●
●
● ●
5
●
●
●
●
●
● ●
2
500 1000 1500
50
100
150
200
●
●
●
●
●
●●
4
Time since December 31, 1968 (days)
Trun
k ci
rcum
fere
nce
(mm
)
50
100
150
200
500 1000 1500
●
●
●
●
●
●●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●●
1
3 1 5 2 4
Much variability between trees
Much less variability within trees
Non-linear trend
Possible correlation along the time
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Models for Orange data
Model 1: Logistic model for tree i at age xij , i = 1, . . . , 5, j = 1, . . . , 7without random effects (4 parameters)
Yij =φ1
1 + exp[−(xij − φ2)/φ3]+ εij
φ1 is an asymptotic trunk circumference,φ2 is the age at which the tree attains half of it asymptotic trunkcircumference,φ3 represents the growth scale,the error term εij ∼ N(0, σ2)
Model 2: Separate logistic curve to each tree (16 parameters)
Yij =φ1i
1 + exp[−(xij − φ2i )/φ3i ]+ εij
εij ∼ N(0, σ2)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Models for Orange data (cont.)
Model 3: Logistic model with random efects (10 parameters)
Yij =φ1i
1 + exp[−(xij − φ2i )/φ3i ]+ εij
φi =
ψ1i
ψ2i
ψ3i
=
β1
β2
β3
+
b1i
b2i
b3i
= β+ bi , ψ =
σ21 σ12 σ13
σ12 σ22 σ23
σ13 σ23 σ23
bi ∼ N(0,ψ), εij ∼ N(0, σ2)
Model 4: Logistic model with random effect on the asymptote (5parameters)
Yij =φ1i
1 + exp[−(xij − φ2)/φ3]+ εij
φ1i = β1 + bi , bi ∼ N(0, σ2b), εij ∼ N(0, σ2)
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Model 1: Fixed logistic model
Fitted values
Sta
ndar
dize
d re
sidu
als
−1
0
1
2
50 100 150
●
●
●
●
●
●
●
● ●
● ●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
Residuals
Tree
3
1
5
2
4
−40 −20 0 20 40
●
●
●
●
●
The variability in the residuals increases with the fitted values, dueto correlation among observations in the same treeThe residuals are mostly negative for trees 1 and 3 and mostlypositive for trees 2 and 4
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Model 2: Separate logistic curve to each treeTr
ee
3
1
5
2
4
150 200 250
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Asym
600 800 1000
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
xmid
200 300 400 500 600
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
scal
Residuals (mm)Tr
ee
3
1
5
2
4
−10 −5 0 5 10
●
●
●
●
●
The estimates for all the parameters vary with tree, there appears tobe relatively more variability in the Asym estimates.Asym – the only parameter for which all the confidence intervals donot overlap, suggesting that it is the only parameter for whichrandom effects are needed to account for variation among trees.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Models 3 and 4: Random logistic models
Model df AIC BIC logLik Test L.Ratio p-value3 10 279.98 295.53 -129.994 5 273.17 280.95 -131.58 3 vs 4 3.1896 0.6708
Note: The default estimation method in nlme in R is maximumlikelihood (ML), while in lme is restricted maximum likelihood(REML).
The large p-value for the likelihood ratio test confirms that thesimpler model (random effect only for the asymptote) is to bepreferred.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Model 1 Model 4Par. Value Std. Error DF t-value Value Std. Error DF t-value
Asym 192.68 20.24 32 9.52 191.05 16.15 28 11.83xmid 728.71 107.27 32 6.79 722.56 35.15 28 20.56scal 353.49 81.46 32 4.34 344.17 27.15 28 12.68
Residual 546.26 61.56
The fixed-effects estimates are similar, but the standard errors aremuch smaller in the nlme fit.
The estimated within-group standard error is also considerablysmaller in the non-linear mixed model.
This is because the between-group variability is not incorporated inthe non-linear fixed model, being absorbed in the standard error.
This pattern is generally observed when comparing mixed-effectsversus fixed-effects fits.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
Fitted values (mm)
Sta
ndar
dize
d re
sidu
als
−1
0
1
50 100 150 200
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
Time since December 31, 1968 (days)
Trun
k ci
rcum
fere
nce
(mm
)
50
100
150
200
500 1000 1500
●
●
●
●
●
● ●
3
500 1000 1500
●
●
●
●
●
●●
1
500 1000 1500
●
●
●
●
●
●●
5
500 1000 1500
●
●
●
●
●
● ●
2
500 1000 1500
●
●
●
●
●
●
●
4
fixed Tree
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
R program
library(nlme)data() # lists all datasetsdata(Orange, package='datasets')str(Orange)head(Orange)summary(Orange)par(mfrow=c(1,2))plot(Orange) ## five curves in five different panelsplot(Orange, outer=~1) ## all five curves in one panel
## no random effects - Model 1(fm1Oran.nls <- nls(circumference ~ SSlogis(age,Asym,xmid,scal),data = Orange))summary(fm1Oran.nls)plot(fm1Oran.nls)plot(fm1Oran.nls, Tree ~ resid(.), abline=0)
## separate logistic curve to each tree - Model 2(fm1Oran.lis <- nlsList(circumference ~ SSlogis(age, Asym, xmid, scal)|Tree,data = Orange))summary(fm1Oran.lis)plot(intervals(fm1Oran.lis), layout = c(3,1))plot(fm1Oran.lis, Tree ~ resid(.), abline = 0)
R program (cont.)
## random effect model - Model 3## no need to specify groups, as Orange is a groupedData object# random is ommitted - by default is equal to fixed(fm1Oran.nlme <- nlme(circumference ~ SSlogis(age,Asym,xmid,scal),data = Orange,fixed=Asym+xmid+scal~1, start=fixef(fm1Oran.lis)))## or simplyfm1Oran.nlme <- nlme(fm1Oran.lis)summary(fm1Oran.nlme) ## compare with next (look at the standard errors)summary(fm1Oran.nls)
## random effect model - Model 4fm2Oran.nlme <- update(fm1Oran.nlme, random = Asym ~ 1)anova(fm1Oran.nlme,fm2Oran.nlme) ## compares Model 3 and 4summary(fm2Oran.nlme)plot(fm1Oran.nlme) ## plots the standardized residuals versus the fitted valuesplot(augPred(fm2Oran.nlme, level = 0:1), layout = c(5,1))qqnorm(fm2Oran.nlme, abline = c(0,1),cex=0.7)
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
References I
Bates D.M. (2010). lme4: Mixed-effects modeling, with R, Springer,http://lme4.r-forge.r-project.org/book/
Cornillon, P.A. and Matzner-Lober, E. (2011). Regression avec R. Springer, Collection Pratique R,1st edition.
Faraway J.J.(2006). Extending the Linear Model with R. Texts in Statistical Science. Chapman &Hall
Hinde, J. and Demetrio, C.G.B. (1998a). Overdispersion: models and estimation. ComputationalStatistics and Data Analysis, 27, 151–170.
Hinde, J. and Demetrio, C.G.B. (1998b). Overdispersion: models and estimation. Sao Paulo: ABE.Jorgensen, B. (1997) The Theory of Dispersion Models. Chapman and Hall, London.Jorgensen, B. (2011) Generalized linear models. Encyclopedia of Environmetrics, to appear.Littell, R.C.; Milliken, G.A.; Stroup, W.W.; Wolfinger, R.D. and Schabenberger, O. (2006). SAS
for Mixed Models, SAS Institute.McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, 2ed., Chapman and Hall,
London.McCulloch, C.E. and Searle, S.R. (2001). Generalized, Linear, and Mixed Models, Wiley Series in
Probability and Statistics.Nelder, J.A. and Wedderburn, R.W.M. (1972). Generalized linear models, Journal of the Royal
Statistical Society, A, 135 370–384.Pinheiro, J. C., and Bates, D. (2000), Mixed Effects Models in S and S-PLUS, New York: Springer.Searle, S.R.; Casella, G. and McCulloch, C.E. (1992). Variance components, Wiley Series in
Probability and Statistics.Vazquez, A. I.; Bates,D. M.; Rosa,G. J. M.; Gianola, D.; Weigel, K. A. (2010) Technical note: An
R package for fitting generalized linear mixed models in animal breeding. J. Anim. Sci..88:497-504.
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications
Plan Models with fixed effects Linear Mixed Models (LMM) Generalized Linear Mixed Models (GLMM) Longitudinal case Overdispersion Nonlinear models
References II
Venables, W.N. and Ripley, B.D. (1994) Modern applied statistics with S-Plus. Springer-Verlag,New York.
Verbeke, G. and Molenberghs, G. (2000) Linear Mixed Models for Longitudinal Data. New York:Springer-Verlag.
Verbeke, G. and Molenberghs, G. (2008) Models for Longitudinal and Incomplete Data. Slides of ashort course.
West, B.T.; Welch, K.B,; Galecki, A.T. (2007) Linear Mixed Models. A practical Guide UsingStatistical Software New York: Chapman & Hall/CRC.
Journal of Societe Francaise de Statistique (2002) Special Number on Modeles Mixtes etBiometrie. Vol.143 - No 1-2
C. Demetrio, F. Mortier & C. Trottier Mixed Models, theory and applications