Definitions Estimation Inference Challenges & open questions References
Generalized linear mixed models: overview and open questions
Ben Bolker
McMaster University, Mathematics & Statistics and Biology
12 November 2013
Ben Bolker
GLMMs
Outline
1 Examples and definitions
2 Estimation: overview, methods
3 Inference
4 Challenges & open questions
(Generalized) linear mixed models
(G)LMMs: a statistical modeling framework incorporating:
Linear combinations of categorical and continuous predictors, and interactions
Response distributions in the exponential family (binomial, Poisson, and extensions)
Any smooth, monotonic link function (e.g. logistic, exponential models)
Flexible combinations of blocking factors (clustering; random effects)
Applications in ecology, neurobiology, behaviour, epidemiology, real estate, . . .
Examples
ecology: survival, predation, etc. (experimental plots)
genomics: presence/absence of polymorphisms, gene expression (individuals)
educational assessment: student scores (students × teachers)
psychology/sensometrics: decisions, responses to stimuli (individuals)
epidemiology: disease prevalence (postal codes, provinces, countries)
Coral protection by symbionts
[Figure: number of blocks (y-axis, 0 to 10) against number of predation events (0, 1, 2), one panel per symbiont treatment: none, shrimp, crabs, both.]
Environmental stress: Glycera cell survival
[Figure: Glycera cell survival (scale 0.0 to 1.0) as a function of H2S (0, 0.03, 0.1, 0.32) and copper (0, 33.3, 66.6, 133.3); panels: Osm = 12.8, 22.4, 32, 41.6, 51.2, each under normoxia and anoxia.]
Arabidopsis response to fertilization & clipping
[Figure: log(1 + fruit set) (y-axis, 0 to 5) for unclipped vs. clipped plants; panel: nutrient (1, 8); color: genotype.]
Coral demography
[Figure: mortality probability (y-axis, 0 to 1) against previous size (cm, 0 to 50); panels: Before, Experimental; treatment: Present, Removed.]
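Technical definition
\[
\underbrace{Y_i}_{\text{response}} \sim \overbrace{\text{Distr}}^{\text{conditional distribution}}\bigl(\underbrace{g^{-1}(\eta_i)}_{\text{inverse link function}},\ \underbrace{\phi}_{\text{scale parameter}}\bigr)
\]
\[
\underbrace{\eta}_{\text{linear predictor}} = \underbrace{X\beta}_{\text{fixed effects}} + \underbrace{Zb}_{\text{random effects}}
\]
\[
\underbrace{b}_{\text{conditional modes}} \sim \text{MVN}\bigl(0,\ \underbrace{\Sigma(\theta)}_{\text{variance-covariance matrix}}\bigr)
\]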
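One way to read the definition is as a recipe for simulating data. The sketch below does this for the simplest case, assuming a Bernoulli response, a logit link, one continuous predictor and a single random intercept per group. The function name `simulate_glmm` and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random

def simulate_glmm(n_groups, n_per_group, beta0, beta1, sigma_b, seed=1):
    """Simulate binary responses from a random-intercept logistic GLMM."""
    rng = random.Random(seed)
    rows = []
    for g in range(n_groups):
        b_g = rng.gauss(0.0, sigma_b)            # b ~ N(0, sigma_b^2), one draw per group
        for _ in range(n_per_group):
            x = rng.uniform(-1.0, 1.0)           # continuous predictor (arbitrary choice)
            eta = beta0 + beta1 * x + b_g        # linear predictor: X beta + Z b
            mu = 1.0 / (1.0 + math.exp(-eta))    # inverse link g^{-1}: logistic
            y = 1 if rng.random() < mu else 0    # conditional distribution: Bernoulli(mu)
            rows.append((g, x, y))
    return rows

data = simulate_glmm(n_groups=10, n_per_group=20, beta0=-0.5, beta1=1.0, sigma_b=1.0)
print(len(data), data[0])
```

Here Σ(θ) is just the scalar sigma_b squared; correlated or crossed random effects would need a multivariate draw for b.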
Overview
Maximum likelihood estimation
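\[
\underbrace{L(Y_i \mid \theta, \beta)}_{\text{likelihood}} = \int \cdots \int \underbrace{L(Y_i \mid \beta, b)}_{\text{data} \mid \text{random effects}} \times \underbrace{L(b \mid \Sigma(\theta))}_{\text{random effects}} \, db
\]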
Best fit is a compromise between two components (consistency of data with β and random effects, consistency of random effects with RE distribution)
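For a single cluster with one random intercept the integral is one-dimensional and can be evaluated by brute force. The sketch below assumes y successes out of n binomial trials with a logit link and b ~ N(0, sigma^2). The function names and the trapezoid-rule settings are illustrative assumptions, not part of any package.

```python
import math

def binom_lik(y, n, eta):
    """Conditional likelihood L(y | beta, b): binomial with logit link."""
    p = 1.0 / (1.0 + math.exp(-eta))
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)

def normal_pdf(b, sigma):
    """Random-effects density L(b | sigma^2)."""
    return math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def marginal_lik(y, n, beta0, sigma, half_width=10.0, steps=4000):
    """Integrate the product of the two components over b (trapezoid rule)."""
    lo, hi = -half_width * sigma, half_width * sigma
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        b = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * binom_lik(y, n, beta0 + b) * normal_pdf(b, sigma)
    return total * h

print(marginal_lik(y=7, n=10, beta0=0.0, sigma=1.0))
```

The integrand is exactly the compromise described above: large b may fit the data well but is penalized by the random-effects density. With many clusters or several random effects per cluster this grid approach becomes infeasible, which motivates the approximations that follow.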
Integrated (marginal) likelihood
[Figure: scaled probability (y-axis, 0 to 1) against conditional mode value u (−10 to 10); curves: L(b | σ²), L(x | b, β), and their product L_prod.]
Shrinkage
[Figure: Arabidopsis block estimates; mean (log) fruit set (y-axis) by genotype (x-axis, 0 to 25).]
Methods
Estimation methods
deterministic (precision vs. computational cost): penalized quasi-likelihood, Laplace approximation, adaptive Gauss-Hermite quadrature (Breslow, 2004) . . .
stochastic (Monte Carlo): frequentist and Bayesian (Booth and Hobert, 1999; Ponciano et al., 2009; Sung, 2007)
Penalized quasi-likelihood (PQL)
alternate steps of estimating GLM using known RE variances to calculate weights; estimate LMMs given GLM fit (Breslow, 2004)
flexible (allows spatial/temporal correlations, crossed REs)
biased for small unit samples (e.g. counts < 5, binary or low-survival data)
widely used (SAS PROC GLIMMIX, R MASS:glmmPQL): in ≈90% of small-unit-sample cases
descendants: higher-order PQL, hierarchical GLM
Breslow (2004) on PQL
As usual when software for complicated statistical inference procedures is broadly disseminated, there is potential for abuse and misinterpretation. In spite of the fact that PQL was initially advertised as a procedure for approximate inference in GLMMs, and its tendency to give seriously biased estimates of variance components and a fortiori regression parameters with binary outcome data was emphasized in multiple publications [5, 6, 24], some statisticians seemed to ignore these warnings and to think of PQL as synonymous with GLMM.
Laplace approximation
for given β, θ (RE parameters), find conditional modes by penalized, iterated reweighted least squares; then use second-order Taylor expansion around the conditional modes
more accurate than PQL
reasonably fast and flexible
lme4:glmer, glmmML, glmmADMB, R2ADMB (AD Model Builder)
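The two steps above can be sketched for the smallest possible case: one cluster, y successes out of n binomial trials, logit link, one random intercept b ~ N(0, sigma^2). This is an illustrative assumption, not the algorithm used by any of the packages listed. With a scalar b, plain Newton iterations stand in for penalized iteratively reweighted least squares, and all function names are hypothetical.

```python
import math

def log_joint(b, y, n, beta0, sigma):
    """log[ L(y | beta, b) * L(b | sigma^2) ] for one cluster."""
    eta = beta0 + b
    log_binom = math.log(math.comb(n, y)) + y * eta - n * math.log1p(math.exp(eta))
    log_norm = -0.5 * (b / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    return log_binom + log_norm

def conditional_mode(y, n, beta0, sigma, tol=1e-10, max_iter=100):
    """Newton iterations for the value of b that maximizes log_joint."""
    b = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
        grad = (y - n * p) - b / sigma**2              # first derivative in b
        hess = -n * p * (1.0 - p) - 1.0 / sigma**2     # second derivative (always < 0)
        step = grad / hess
        b -= step
        if abs(step) < tol:
            break
    return b

def laplace_marginal_lik(y, n, beta0, sigma):
    """Second-order Taylor expansion of log_joint around the conditional mode."""
    b_hat = conditional_mode(y, n, beta0, sigma)
    p = 1.0 / (1.0 + math.exp(-(beta0 + b_hat)))
    neg_hess = n * p * (1.0 - p) + 1.0 / sigma**2
    return math.exp(log_joint(b_hat, y, n, beta0, sigma)) * math.sqrt(2.0 * math.pi / neg_hess)

print(conditional_mode(7, 10, 0.0, 1.0), laplace_marginal_lik(7, 10, 0.0, 1.0))
```

The expansion replaces the integrand by a Gaussian with the same mode and curvature, so the integral has a closed form. An outer optimizer would then maximize this approximate likelihood over β and θ.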
Gauss-Hermite quadrature (GHQ)
as above, but compute additional terms in the integral (typically 8, but often up to 20)
most accurate
slowest, hence not flexible (2–3 REs at most, maybe only 1)
lme4:glmer, glmmML, repeated
Adaptive vs. non-adaptive GHQ
Adaptive GHQ is more expensive at a given n, but makes up for it in accuracy
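A rough illustration of the trade-off, again assuming a single cluster with a binomial response, logit link and one random intercept. The nodes are computed from scratch (bisection between interlacing roots of orthonormal Hermite polynomials) to keep the sketch free of dependencies. All function names are hypothetical, and the data values are picked so that the likelihood is sharply peaked away from zero.

```python
import math

def hermite_orthonormal(k, x):
    """Orthonormal (physicists') Hermite polynomial of degree k at x."""
    p_prev, p = 0.0, math.pi ** -0.25
    for j in range(1, k + 1):
        p_prev, p = p, x * math.sqrt(2.0 / j) * p - math.sqrt((j - 1) / j) * p_prev
    return p

def gauss_hermite(n):
    """Nodes and weights for integrating f(x) * exp(-x^2) with n points."""
    roots = []
    for k in range(1, n + 1):
        bound = math.sqrt(2.0 * k + 1.0)
        edges = [-bound] + roots + [bound]     # degree-k roots interlace degree-(k-1) roots
        new_roots = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            f_lo = hermite_orthonormal(k, lo)
            for _ in range(80):                # bisection inside each bracket
                mid = 0.5 * (lo + hi)
                f_mid = hermite_orthonormal(k, mid)
                if (f_mid > 0) == (f_lo > 0):
                    lo, f_lo = mid, f_mid
                else:
                    hi = mid
            new_roots.append(0.5 * (lo + hi))
        roots = new_roots
    weights = [1.0 / (n * hermite_orthonormal(n - 1, x) ** 2) for x in roots]
    return roots, weights

def log_cond_lik(b, y, n, beta0):
    """log L(y | beta, b): binomial with logit link."""
    eta = beta0 + b
    return math.log(math.comb(n, y)) + y * eta - n * math.log1p(math.exp(eta))

def log_re_density(b, sigma):
    """log L(b | sigma^2): normal random-effects density."""
    return -0.5 * (b / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def ghq_nonadaptive(y, n, beta0, sigma, n_points):
    """Nodes centred at 0 and scaled by the random-effects standard deviation."""
    nodes, weights = gauss_hermite(n_points)
    total = sum(w * math.exp(log_cond_lik(math.sqrt(2.0) * sigma * x, y, n, beta0))
                for x, w in zip(nodes, weights))
    return total / math.sqrt(math.pi)

def ghq_adaptive(y, n, beta0, sigma, n_points):
    """Nodes centred at the conditional mode and scaled by the local curvature."""
    b_hat = 0.0
    for _ in range(50):                        # Newton iterations for the mode
        p = 1.0 / (1.0 + math.exp(-(beta0 + b_hat)))
        b_hat -= ((y - n * p) - b_hat / sigma**2) / (-n * p * (1.0 - p) - 1.0 / sigma**2)
    p = 1.0 / (1.0 + math.exp(-(beta0 + b_hat)))
    scale = 1.0 / math.sqrt(n * p * (1.0 - p) + 1.0 / sigma**2)
    nodes, weights = gauss_hermite(n_points)
    total = 0.0
    for x, w in zip(nodes, weights):
        b = b_hat + math.sqrt(2.0) * scale * x
        total += w * math.exp(x * x + log_cond_lik(b, y, n, beta0) + log_re_density(b, sigma))
    return math.sqrt(2.0) * scale * total

for n_points in (1, 5, 20):
    print(n_points, ghq_nonadaptive(45, 50, 0.0, 3.0, n_points),
          ghq_adaptive(45, 50, 0.0, 3.0, n_points))
```

The extra cost of the adaptive rule is the mode search per cluster. With `n_points=1` it reduces to the Laplace approximation, while the non-adaptive rule with few points can miss the peak of the integrand almost entirely.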
Stochastic approaches
Mostly Bayesian (Bayesian computation handles high-dimensional integration)
various flavours: Gibbs sampling, MCMC, MCEM, etc.
generally slower but more flexible
simplifies many inferential problems
must specify priors, assess convergence/error
specialized: glmmAK, MCMCglmm (Hadfield, 2010), INLA, bernor
general: glmmBUGS, R2WinBUGS, BRugs (WinBUGS/OpenBUGS), R2jags, rjags (JAGS)
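To make the stochastic route concrete, here is a minimal random-walk Metropolis sketch for a random-intercept binomial model. It is not how any of the packages above work internally. It assumes a fixed random-effects standard deviation, a N(0, 10²) prior on the intercept, made-up data and an untuned proposal step, all chosen purely for illustration.

```python
import math
import random

def log_posterior(beta0, b, y, n, sigma, prior_sd=10.0):
    """Unnormalized log posterior; sigma is held fixed (an assumption)."""
    lp = -0.5 * (beta0 / prior_sd) ** 2                      # prior on beta0 (assumed)
    for b_j, y_j, n_j in zip(b, y, n):
        eta = beta0 + b_j
        lp += y_j * eta - n_j * math.log1p(math.exp(eta))    # binomial, logit link
        lp += -0.5 * (b_j / sigma) ** 2                      # b_j ~ N(0, sigma^2)
    return lp

def metropolis(y, n, sigma, n_iter=6000, burn_in=1000, step=0.5, seed=1):
    """Update beta0 and each b_j in turn with Gaussian random-walk proposals."""
    rng = random.Random(seed)
    params = [0.0] * (1 + len(y))            # params[0] = beta0, params[1:] = b
    current = log_posterior(params[0], params[1:], y, n, sigma)
    draws = []
    for it in range(n_iter):
        for k in range(len(params)):
            proposal = params[:]
            proposal[k] += rng.gauss(0.0, step)
            cand = log_posterior(proposal[0], proposal[1:], y, n, sigma)
            if rng.random() < math.exp(min(0.0, cand - current)):   # accept/reject
                params, current = proposal, cand
        if it >= burn_in:
            draws.append(params[:])
    return draws

y, n = [7, 9, 5, 8, 6], [10] * 5             # made-up counts for five clusters
draws = metropolis(y, n, sigma=1.0)
beta0 = sorted(d[0] for d in draws)
mean = sum(beta0) / len(beta0)
lower, upper = beta0[int(0.025 * len(beta0))], beta0[int(0.975 * len(beta0))]
print(f"beta0: posterior mean {mean:.2f}, 95% interval ({lower:.2f}, {upper:.2f})")
```

No integral over b is ever computed: the random effects are sampled alongside β, and intervals come from summarizing the draws. A real analysis would also sample sigma and check convergence across several chains.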
Estimation: example (McKeon et al., 2012)
[Figure: estimated log-odds of predation (x-axis, −6 to 2) for the terms Symbiont, Crab vs. Shrimp, and Added symbiont; methods compared: GLM (fixed), GLM (pooled), PQL, Laplace, AGQ.]
Wald tests
Wald tests (e.g. typical results of summary)
based on information matrix
assume quadratic log-likelihood surface
exact for regular linear models; only asymptotically OK for GLM(M)s
computationally cheap
approximation is sometimes awful (Hauck-Donner effect)
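The Hauck-Donner effect is easy to reproduce in the simplest binomial setting. The sketch below uses a plain GLM with no random effects, assumed here only to keep the fit closed-form. It tests H0: logit(p) = 0 from y successes in n = 50 trials and compares the Wald statistic with the likelihood ratio statistic. Function names are hypothetical.

```python
import math

def wald_z(y, n):
    """Wald statistic for H0: logit(p) = 0, from y successes in n trials."""
    est = math.log(y / (n - y))                  # MLE of the log-odds
    se = math.sqrt(1.0 / y + 1.0 / (n - y))      # from the information matrix
    return est / se

def lr_stat(y, n):
    """Likelihood ratio statistic (deviance difference) for the same hypothesis."""
    p_hat = y / n
    ll_hat = y * math.log(p_hat) + (n - y) * math.log(1.0 - p_hat)
    ll_null = n * math.log(0.5)
    return 2.0 * (ll_hat - ll_null)

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

for y in (30, 40, 45, 48, 49):
    z = wald_z(y, 50)
    print(y, round(z, 2), two_sided_p(z), round(lr_stat(y, 50), 2))
```

Moving from 45 to 49 successes is stronger evidence against the null, and the likelihood ratio statistic keeps growing, yet the Wald statistic shrinks because the standard error grows faster than the estimate as the fitted probability approaches 1.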
2D profiles for coral predation
[Figure: scatter plot matrix of 2D profiles for the parameters .sig01, (Intercept), tttcrabs, tttshrimp, tttboth.]
Likelihood ratio tests
better, but still have to deal with two finite-size problems:
when scale parameter is free (Gamma, etc.), deviance is ∼ F rather than ∼ χ², with poorly defined denominator df
in GLM(M) case, numerator is only asymptotically χ² anyway
Bartlett corrections (Cordeiro and Ferrari, 1998; Cordeiro et al., 1994), higher-order asymptotics: cond [neither extended to GLMMs!]
Profile confidence intervals: moderately difficult/fragile
Parametric bootstrapping
fit null model to data
simulate "data" from null model
fit null and working model, compute likelihood difference
repeat to estimate null distribution
should be OK but ??? not well tested (assumes estimated parameters are "sufficiently" good)
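The four steps can be sketched with a deliberately simple stand-in for the model fits. This is an assumption made so that both fits are closed-form, and it is not a GLMM: the null model is one common binomial probability and the working model is a separate probability per group. Function names and data values are hypothetical.

```python
import math
import random

def loglik(y, n, p):
    """Binomial log-likelihood, dropping the constant binomial coefficient."""
    ll = 0.0
    if y > 0:
        ll += y * math.log(p)
    if y < n:
        ll += (n - y) * math.log(1.0 - p)
    return ll

def lr_statistic(ys, ns):
    """2 * (log-likelihood of working model - log-likelihood of null model)."""
    p0 = sum(ys) / sum(ns)                                        # null fit: common p
    ll_null = sum(loglik(y, n, p0) for y, n in zip(ys, ns))
    ll_full = sum(loglik(y, n, y / n) for y, n in zip(ys, ns))    # working fit: p per group
    return 2.0 * (ll_full - ll_null)

def parametric_bootstrap(ys, ns, n_sim=2000, seed=1):
    rng = random.Random(seed)
    observed = lr_statistic(ys, ns)
    p0 = sum(ys) / sum(ns)                                        # 1. fit null model to data
    exceed = 0
    for _ in range(n_sim):                                        # 4. repeat
        sim = [sum(rng.random() < p0 for _ in range(n)) for n in ns]   # 2. simulate from null
        if lr_statistic(sim, ns) >= observed - 1e-12:             # 3. refit both, compare
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)

observed, p_value = parametric_bootstrap(ys=[12, 20], ns=[30, 30])
print(observed, p_value)
```

For a GLMM, `lr_statistic` would be replaced by two full model fits per simulated data set, which is where the computational cost comes from. The caveat above still applies: the simulations are only as good as the fitted null parameters.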
Parametric bootstrap results
[Figure: inferred p value against true p value (both 0.02 to 0.08); panels: Osm, Cu, H2S, Anoxia.]
Bayesian approaches
Provided that we have a good sample from the posterior distribution (Markov chains have converged etc. etc.) we get most of the inferences we want for free by summarizing the marginal posteriors
Model selection is still an open question: reversible-jump MCMC, deviance information criterion
Next steps
Dealing with complex random effects: regularization, model selection, penalized methods (lasso/fence)
Flexible correlation structures: spatial, temporal, phylogenetic
hybrid & improved MCMC methods (mcmcsamp, Stan)
Reliable assessment of out-of-sample performance
Glycera estimates
[Figure: effect on survival (x-axis, −60 to 60) for Osm, Cu, H2S, Anoxia and all of their two-, three- and four-way interactions; methods compared: MCMCglmm, glmer(OD:2), glmer(OD), glmmML, glmer.]
Acknowledgments
lme4: Doug Bates, Martin Mächler, Steve Walker
Data: Adrian Stier (UBC/OSU), Sea McKeon (Smithsonian), David Julian (UF), Jada-Simone White (Univ Hawai'i)
NSERC (Discovery)
SHARCnet
References
Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society. Series B, 61(1):265–285. doi:10.1111/1467-9868.00176.
Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pages 1–22. Springer. ISBN 0387208623.
Cordeiro, G.M. and Ferrari, S.L.P., 1998. Journal of Statistical Planning and Inference, 71(1-2):261–269. ISSN 0378-3758. doi:10.1016/S0378-3758(98)00005-6.
Cordeiro, G.M., Paula, G.A., and Botter, D.A., 1994. International Statistical Review / Revue Internationale de Statistique, 62(2):257–274. ISSN 0306-7734. doi:10.2307/1403512.
Hadfield, J.D., 2010. Journal of Statistical Software, 33(2):1–22. ISSN 1548-7660.
McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):1095–1103. ISSN 0029-8549. doi:10.1007/s00442-012-2275-2.
Pinheiro, J.C. and Bates, D.M., 1996. Statistics and Computing, 6(3):289–296. doi:10.1007/BF00140873.
Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356–362. ISSN 0012-9658.
Sung, Y.J., 2007. The Annals of Statistics, 35(3):990–1011. ISSN 0090-5364. doi:10.1214/009053606000001389.