Definitions Estimation Inference Challenges & open questions References

Generalized linear mixed models: overview and open questions

Ben Bolker

McMaster University, Mathematics & Statistics and Biology

12 November 2013

Ben Bolker

GLMMs

Outline

1 Examples and definitions

2 Estimation (Overview; Methods)

3 Inference

4 Challenges & open questions

1 Examples and definitions

(Generalized) linear mixed models

(G)LMMs: a statistical modeling framework incorporating:

Linear combinations of categorical and continuous predictors, and interactions

Response distributions in the exponential family (binomial, Poisson, and extensions)

Any smooth, monotonic link function (e.g. logistic, exponential models)

Flexible combinations of blocking factors (clustering; random effects)

Applications in ecology, neurobiology, behaviour, epidemiology, real estate, ...

Examples

ecology: survival, predation, etc. (experimental plots)

genomics: presence/absence of polymorphisms, gene expression (individuals)

educational assessment: student scores (students × teachers)

psychology/sensometrics: decisions, responses to stimuli (individuals)

epidemiology: disease prevalence (postal codes, provinces, countries)

Coral protection by symbionts

[Figure: number of blocks (0 to 10) by number of predation events (0, 1, 2), for each symbiont treatment: none, shrimp, crabs, both.]

Environmental stress: Glycera cell survival

[Figure: cell survival (scale 0.0 to 1.0) as a function of H2S (0, 0.03, 0.1, 0.32) and copper (0, 33.3, 66.6, 133.3); panels by Osm (12.8, 22.4, 32, 41.6, 51.2) and oxygen condition (normoxia, anoxia).]

Arabidopsis response to fertilization & clipping

[Figure: log(1 + fruit set) (0 to 5) for unclipped vs. clipped plants; panels: nutrient 1 and nutrient 8; color: genotype.]

Coral demography

[Figure: mortality probability (0 to 1) vs. previous size (cm, 0 to 50); panels: Before and Experimental; treatment: Present vs. Removed.]

Technical de�nition

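$$\underbrace{Y_i}_{\text{response}} \sim \overbrace{\text{Distr}}^{\text{conditional distribution}}\Big(\underbrace{g^{-1}(\eta_i)}_{\text{inverse link function}},\ \underbrace{\phi}_{\text{scale parameter}}\Big)$$

$$\underbrace{\eta}_{\text{linear predictor}} = \underbrace{X\beta}_{\text{fixed effects}} + \underbrace{Zb}_{\text{random effects}}$$

$$\underbrace{b}_{\text{conditional modes}} \sim \text{MVN}\big(0,\ \underbrace{\Sigma(\theta)}_{\text{variance-covariance matrix}}\big)$$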
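As a rough illustration of the definition above, the following sketch simulates data from one of the simplest GLMMs: a Bernoulli response, logit link, one continuous predictor and a random intercept per group. The design sizes, parameter values and variable names are all hypothetical choices made for this example; for a Bernoulli response the scale parameter φ is fixed at 1, and Σ(θ) reduces to sigma_b² times the identity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design and parameter values (not from the slides)
n_groups, n_per_group = 10, 20
beta = np.array([-0.5, 1.0])     # fixed effects: intercept and slope
sigma_b = 0.8                    # random-intercept SD; Sigma(theta) = sigma_b^2 * I

group = np.repeat(np.arange(n_groups), n_per_group)
x = rng.normal(size=group.size)

X = np.column_stack([np.ones_like(x), x])    # fixed-effects design matrix
Z = np.eye(n_groups)[group]                  # random-effects indicator matrix
b = rng.normal(0.0, sigma_b, size=n_groups)  # b ~ MVN(0, Sigma(theta))

eta = X @ beta + Z @ b                       # linear predictor
mu = 1.0 / (1.0 + np.exp(-eta))              # inverse link g^{-1}(eta), here inverse logit
y = rng.binomial(1, mu)                      # Y_i ~ Bernoulli(mu_i)
```

Each row of Z has a single 1 marking the observation's group, so Z @ b just adds that group's intercept to the fixed-effects part of the linear predictor.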

2 Estimation

Overview

Maximum likelihood estimation

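$$\underbrace{L(Y_i \mid \theta, \beta)}_{\text{likelihood}} = \int \cdots \int \underbrace{L(Y_i \mid \beta, b)}_{\text{data} \mid \text{random effects}} \times \underbrace{L(b \mid \Sigma(\theta))}_{\text{random effects}} \, db$$

Best fit is a compromise between two components (consistency of data with β and random effects, consistency of random effects with RE distribution)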
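To make the integral concrete, here is a minimal sketch for the one-dimensional case: a single cluster with y successes out of n trials, logit link, intercept β and one random intercept b ~ N(0, σ²). The integral over b is evaluated by brute force with the trapezoid rule on a fine grid. The function names, grid settings and data values are assumptions for illustration; real GLMM software does not integrate this way.

```python
import numpy as np
from math import comb

def binom_pmf(y, n, p):
    """L(y | beta, b): binomial probability of y successes in n trials."""
    return comb(n, y) * p**y * (1.0 - p)**(n - y)

def normal_pdf(b, sigma):
    """L(b | sigma): density of the random effect, N(0, sigma^2)."""
    return np.exp(-0.5 * (b / sigma)**2) / (sigma * np.sqrt(2.0 * np.pi))

def marginal_likelihood(y, n, beta, sigma, half_width=10.0, n_grid=4001):
    """Integrate L(y | beta, b) * L(b | sigma) over b with the trapezoid rule."""
    b = np.linspace(-half_width * sigma, half_width * sigma, n_grid)
    p = 1.0 / (1.0 + np.exp(-(beta + b)))          # inverse logit
    integrand = binom_pmf(y, n, p) * normal_pdf(b, sigma)
    step = b[1] - b[0]
    return step * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

# Hypothetical cluster: 7 successes out of 10 trials
lik = marginal_likelihood(7, 10, beta=0.0, sigma=1.0)

# Sanity check: marginal probabilities over all possible outcomes sum to ~1
total = sum(marginal_likelihood(k, 10, 0.0, 1.0) for k in range(11))
```

The integrand is the product of the two components named above: values of b that fit the data well but are implausible under the random-effects distribution (or vice versa) contribute little.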

Integrated (marginal) likelihood

[Figure: scaled probability (0 to 1) vs. conditional mode value u (−10 to 10); curves: L(b | σ²), L(x | b, β), and their product L_prod.]

Shrinkage

[Figure: Arabidopsis block estimates; mean(log) fruit set (y-axis, −15 to 3) by genotype (x-axis, 0 to 25), with a row of per-group counts annotated along the axis.]
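Shrinkage is easiest to see in the simplest linear case, which is an assumption made here for illustration rather than the model behind the figure: a Gaussian response with random intercepts, where the grand mean and both variances are treated as known. The estimate for a group is then its raw mean pulled toward the grand mean by the factor σ_b² / (σ_b² + σ²/n_j), so small groups are shrunk more. All names and numbers below are hypothetical.

```python
import numpy as np

def shrunken_means(group_means, group_sizes, grand_mean, sigma_b, sigma_resid):
    """Conditional group estimates for a Gaussian random-intercept model
    with known grand mean and known variance components."""
    group_means = np.asarray(group_means, dtype=float)
    group_sizes = np.asarray(group_sizes, dtype=float)
    # Weight on the raw group mean: near 1 for large groups, near 0 for tiny ones
    weight = sigma_b**2 / (sigma_b**2 + sigma_resid**2 / group_sizes)
    return grand_mean + weight * (group_means - grand_mean)

# Hypothetical raw group means and sample sizes
means = np.array([-3.0, 0.5, 2.0])
sizes = np.array([2, 10, 43])
est = shrunken_means(means, sizes, grand_mean=0.0, sigma_b=1.0, sigma_resid=2.0)
print(est)
```

In a real fit the grand mean and variances are estimated jointly with the group effects, and for non-Gaussian responses there is no closed form, but the qualitative pattern (extreme estimates from small groups move most) is the same.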

Methods

Estimation methods

deterministic (precision vs. computational cost): penalized quasi-likelihood, Laplace approximation, adaptive Gauss-Hermite quadrature (Breslow, 2004) ...

stochastic (Monte Carlo): frequentist and Bayesian (Booth and Hobert, 1999; Ponciano et al., 2009; Sung, 2007)

Penalized quasi-likelihood (PQL)

alternate steps of estimating GLM using known RE variances to calculate weights; estimate LMMs given GLM fit (Breslow, 2004)

flexible (allows spatial/temporal correlations, crossed REs)

biased for small unit samples (e.g. counts < 5, binary or low-survival data)

widely used: SAS PROC GLIMMIX, R MASS:glmmPQL; in ≈90% of small-unit-sample cases

descendants: higher-order PQL, hierarchical GLM

Breslow (2004) on PQL

As usual when software for complicated statistical inference procedures is broadly disseminated, there is potential for abuse and misinterpretation. In spite of the fact that PQL was initially advertised as a procedure for approximate inference in GLMMs, and its tendency to give seriously biased estimates of variance components and a fortiori regression parameters with binary outcome data was emphasized in multiple publications [5, 6, 24], some statisticians seemed to ignore these warnings and to think of PQL as synonymous with GLMM.

Laplace approximation

for given β, θ (RE parameters), find conditional modes by penalized, iteratively reweighted least squares; then use second-order Taylor expansion around the conditional modes

more accurate than PQL

reasonably fast and flexible

lme4:glmer, glmmML, glmmADMB, R2ADMB (AD Model Builder)
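The two steps above can be sketched for a single cluster with one random intercept (y successes out of n trials, logit link, b ~ N(0, σ²)). This is a toy version under stated assumptions: β and σ are held fixed, plain Newton iterations stand in for penalized iteratively reweighted least squares, and a brute-force grid integral is included only as a reference value. Function names and data values are hypothetical.

```python
import numpy as np
from math import comb, log, exp, pi, sqrt

def log_joint(b, y, n, beta, sigma):
    """log[ L(y | beta, b) * L(b | sigma) ] for a binomial-logit cluster."""
    eta = beta + b
    log_binom = log(comb(n, y)) + y * eta - n * np.log1p(np.exp(eta))
    log_normal = -0.5 * (b / sigma)**2 - 0.5 * log(2.0 * pi * sigma**2)
    return log_binom + log_normal

def laplace_marginal(y, n, beta, sigma, tol=1e-10, max_iter=50):
    """Return (conditional mode, Laplace approximation to the marginal likelihood)."""
    b = 0.0
    for _ in range(max_iter):                      # Newton steps toward the mode
        p = 1.0 / (1.0 + exp(-(beta + b)))
        grad = y - n * p - b / sigma**2            # d log_joint / db
        hess = -n * p * (1.0 - p) - 1.0 / sigma**2 # d2 log_joint / db2 (always < 0)
        step = grad / hess
        b -= step
        if abs(step) < tol:
            break
    p = 1.0 / (1.0 + exp(-(beta + b)))
    hess = -n * p * (1.0 - p) - 1.0 / sigma**2
    # Second-order Taylor expansion of log_joint around the mode gives a Gaussian integral
    return b, exp(log_joint(b, y, n, beta, sigma)) * sqrt(2.0 * pi / -hess)

def grid_marginal(y, n, beta, sigma, half_width=10.0, n_grid=4001):
    """Brute-force trapezoid integral, used here only as a reference."""
    b = np.linspace(-half_width * sigma, half_width * sigma, n_grid)
    f = np.exp(log_joint(b, y, n, beta, sigma))
    return (b[1] - b[0]) * (f.sum() - 0.5 * (f[0] + f[-1]))

b_hat, approx = laplace_marginal(7, 10, beta=0.0, sigma=1.0)
reference = grid_marginal(7, 10, beta=0.0, sigma=1.0)
print(b_hat, approx, reference)
```

Note that the conditional mode lies between 0 and the raw logit log(7/3): the random-effects density pulls it toward zero.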

Gauss-Hermite quadrature (GHQ)

as above, but compute additional terms in the integral (typically 8, but often up to 20)

most accurate

slowest, hence not flexible (2–3 RE at most, maybe only 1)

lme4:glmer, glmmML, repeated

Adaptive vs. non-adaptive GHQ

Adaptive GHQ is more expensive at a given n (number of quadrature points), but makes up for it in accuracy
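The difference can be sketched on a single cluster whose conditional likelihood is sharply peaked away from zero (hypothetically, 45 successes out of 50 trials with σ = 2). Non-adaptive GHQ places its nodes according to the N(0, σ²) density; the adaptive version first finds the conditional mode and curvature, then centres and scales the nodes there. With one node the adaptive rule reduces to the Laplace approximation. All names and values are assumptions for illustration, and the grid integral serves only as a reference.

```python
import numpy as np
from math import comb, log, pi

Y, N, BETA, SIGMA = 45, 50, 0.0, 2.0   # hypothetical cluster and parameters

def log_joint(b):
    """log[ L(Y | BETA, b) * N(b | 0, SIGMA^2) ]."""
    eta = BETA + b
    return (log(comb(N, Y)) + Y * eta - N * np.log1p(np.exp(eta))
            - 0.5 * (b / SIGMA)**2 - 0.5 * log(2.0 * pi * SIGMA**2))

def mode_and_scale(max_iter=100, tol=1e-12):
    """Newton search for the conditional mode; scale = 1/sqrt(-second derivative)."""
    b = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(BETA + b)))
        hess = -N * p * (1.0 - p) - 1.0 / SIGMA**2
        step = (Y - N * p - b / SIGMA**2) / hess
        b -= step
        if abs(step) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(BETA + b)))
    return b, 1.0 / np.sqrt(N * p * (1.0 - p) + 1.0 / SIGMA**2)

def nonadaptive_ghq(n_nodes):
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * SIGMA * x
    # Strip the normal density from log_joint: the quadrature weights supply it
    log_cond = log_joint(b) + 0.5 * (b / SIGMA)**2 + 0.5 * log(2.0 * pi * SIGMA**2)
    return np.sum(w * np.exp(log_cond)) / np.sqrt(pi)

def adaptive_ghq(n_nodes):
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    b_hat, s_hat = mode_and_scale()
    b = b_hat + np.sqrt(2.0) * s_hat * x           # nodes centred at the mode
    return np.sqrt(2.0) * s_hat * np.sum(w * np.exp(x**2 + log_joint(b)))

def grid_reference(half_width=20.0, n_grid=8001):
    b = np.linspace(-half_width, half_width, n_grid)
    f = np.exp(log_joint(b))
    return (b[1] - b[0]) * (f.sum() - 0.5 * (f[0] + f[-1]))

reference = grid_reference()
for k in (1, 3, 5, 9):
    print(k,
          abs(nonadaptive_ghq(k) / reference - 1.0),   # relative error, non-adaptive
          abs(adaptive_ghq(k) / reference - 1.0))      # relative error, adaptive
```

The extra cost of the adaptive rule is the mode search per cluster at every evaluation of the likelihood; the payoff is that far fewer nodes are needed for a given accuracy.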

Stochastic approaches

Mostly Bayesian (Bayesian computation handles high-dimensional integration)

various flavours: Gibbs sampling, MCMC, MCEM, etc.

generally slower but more flexible

simplifies many inferential problems

must specify priors, assess convergence/error

specialized: glmmAK, MCMCglmm (Hadfield, 2010), INLA, bernor

general: glmmBUGS, R2WinBUGS, BRugs (WinBUGS/OpenBUGS), R2jags, rjags (JAGS)

Estimation: example (McKeon et al., 2012)

[Figure: estimated log-odds of predation (−6 to 2) for Symbiont, Crab vs. Shrimp, and Added symbiont, comparing GLM (fixed), GLM (pooled), PQL, Laplace, and AGQ.]

3 Inference

Wald tests

Wald tests (e.g. typical results of summary)

based on information matrix; assume quadratic log-likelihood surface

exact for regular linear models; only asymptotically OK for GLM(M)s

computationally cheap

approximation is sometimes awful (Hauck-Donner effect)
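The Hauck-Donner effect can be seen in a deliberately simple setting, chosen here for illustration: a single binomial proportion (k successes out of n, with 0 < k < n), testing logit(p) = 0. The Wald statistic uses the estimate and its standard error on the logit scale; the likelihood ratio statistic compares the fitted and null log-likelihoods directly. This is a plain GLM example with hypothetical data, not a GLMM, but the quadratic-surface assumption fails in the same way.

```python
from math import log, sqrt, erfc

def wald_z(k, n):
    """Wald z statistic for H0: logit(p) = 0, from k successes in n trials."""
    est = log(k / (n - k))                    # MLE of the log-odds
    se = sqrt(1.0 / k + 1.0 / (n - k))        # SE from the information matrix
    return est / se

def lrt_stat(k, n):
    """Likelihood ratio statistic (deviance difference) for H0: p = 0.5."""
    p_hat = k / n
    return 2.0 * (k * log(p_hat / 0.5) + (n - k) * log((1.0 - p_hat) / 0.5))

def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return erfc(sqrt(stat / 2.0))

for k in (60, 75, 90, 99):
    z = wald_z(k, 100)
    print(k, round(z, 2), round(lrt_stat(k, 100), 2),
          chi2_1_pvalue(z**2), chi2_1_pvalue(lrt_stat(k, 100)))
```

As k approaches n the evidence against p = 0.5 keeps growing and the likelihood ratio statistic grows with it, but the standard error on the logit scale grows faster than the estimate, so the Wald z for k = 99 is smaller than for k = 90.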

2D profiles for coral predation

[Figure: scatter plot matrix of pairwise likelihood profiles for the parameters .sig01, (Intercept), tttcrabs, tttshrimp, and tttboth.]

Likelihood ratio tests

better, but still have to deal with two finite-size problems:

when scale parameter is free (Gamma, etc.), deviance is ∼ F rather than ∼ χ², with poorly defined denominator df

in GLM(M) case, numerator is only asymptotically χ² anyway

Bartlett corrections (Cordeiro and Ferrari, 1998; Cordeiro et al., 1994), higher-order asymptotics: cond [neither extended to GLMMs!]

Profile confidence intervals: moderately difficult/fragile

Parametric bootstrapping

fit null model to data

simulate "data" from null model

fit null and working model, compute likelihood difference

repeat to estimate null distribution

should be OK, but not well tested (assumes estimated parameters are "sufficiently" good)
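The recipe above can be sketched with a model simple enough to fit in closed form. As a stand-in for a GLMM, this example compares two Poisson models for counts in two groups: a null model with one common mean and a working model with one mean per group, whose maximum likelihood estimates are just sample means. The data, function names and number of simulations are hypothetical; with a real GLMM each refit would be a call to the fitting software.

```python
import numpy as np

def poisson_loglik(y, mu):
    """Poisson log-likelihood, dropping the -sum(log(y!)) term that cancels in differences."""
    mu = max(mu, 1e-12)                     # guard against log(0) if all counts are zero
    return np.sum(y * np.log(mu) - mu)

def lrt_statistic(y, g):
    """2 * (log-lik of group-means model - log-lik of common-mean model)."""
    ll_null = poisson_loglik(y, y.mean())
    ll_alt = sum(poisson_loglik(y[g == k], y[g == k].mean()) for k in (0, 1))
    return 2.0 * (ll_alt - ll_null)

def parametric_bootstrap_p(y, g, n_sim=2000, seed=0):
    rng = np.random.default_rng(seed)
    observed = lrt_statistic(y, g)
    mu_null = y.mean()                                  # fit null model to data
    sims = np.empty(n_sim)
    for i in range(n_sim):
        y_sim = rng.poisson(mu_null, size=y.size)       # simulate "data" from null model
        sims[i] = lrt_statistic(y_sim, g)               # refit both models, compare
    # Proportion of simulated statistics at least as large as the observed one
    return (1 + np.sum(sims >= observed)) / (n_sim + 1)

# Hypothetical counts for two groups of eight units
g = np.repeat([0, 1], 8)
y = np.array([3, 5, 4, 6, 2, 5, 4, 3, 7, 9, 6, 8, 10, 7, 9, 8])
p_value = parametric_bootstrap_p(y, g)
print(lrt_statistic(y, g), p_value)
```

The simulated statistics replace the asymptotic χ² reference distribution, which is the point of the method; the caveat on the slide still applies, because the simulations are only as good as the fitted null parameters.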

Parametric bootstrap results

[Figure: inferred p value vs. true p value (both 0.02 to 0.08); panels: Osm, Cu, H2S, Anoxia.]

Bayesian approaches

Provided that we have a good sample from the posterior distribution (Markov chains have converged etc. etc.) we get most of the inferences we want for free by summarizing the marginal posteriors

Model selection is still an open question: reversible-jump MCMC, deviance information criterion
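As a toy version of "summarizing the marginal posteriors", the sketch below draws the random intercept b of a single cluster with a random-walk Metropolis sampler and then reads off a posterior mean and a 95% interval from the draws. The setup is an assumption made to keep the example short: 7 successes out of 10 trials, logit link, and β and σ treated as known, so the N(0, σ²) random-effects distribution acts as the prior for b. A real Bayesian GLMM would sample β, θ and b jointly with explicit priors and convergence checks.

```python
import numpy as np

Y, N, BETA, SIGMA = 7, 10, 0.0, 1.0   # hypothetical data and fixed parameters

def log_posterior(b):
    """Unnormalized log posterior of b: binomial-logit likelihood times N(0, SIGMA^2)."""
    eta = BETA + b
    return Y * eta - N * np.log1p(np.exp(eta)) - 0.5 * (b / SIGMA)**2

def metropolis(n_samples=20000, burn_in=2000, step_sd=1.0, seed=0):
    rng = np.random.default_rng(seed)
    b = 0.0
    current = log_posterior(b)
    draws = np.empty(n_samples)
    for i in range(n_samples + burn_in):
        proposal = b + rng.normal(0.0, step_sd)        # symmetric random-walk proposal
        candidate = log_posterior(proposal)
        if np.log(rng.uniform()) < candidate - current:  # Metropolis acceptance rule
            b, current = proposal, candidate
        if i >= burn_in:
            draws[i - burn_in] = b
    return draws

draws = metropolis()
post_mean = draws.mean()
ci_low, ci_high = np.quantile(draws, [0.025, 0.975])
print(post_mean, ci_low, ci_high)
```

Once the draws are available, any other summary (tail probabilities, functions of parameters) comes from the same sample, which is the sense in which inference is "free" after sampling.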

4 Challenges & open questions

Next steps

Dealing with complex random effects: regularization, model selection, penalized methods (lasso/fence)

Flexible correlation structures: spatial, temporal, phylogenetic

hybrid & improved MCMC methods (mcmcsamp, Stan)

Reliable assessment of out-of-sample performance

Glycera estimates

[Figure: estimated effects on survival (−60 to 60) for Osm, Cu, H2S, Anoxia, and all their two-, three-, and four-way interactions, comparing MCMCglmm, glmer(OD:2), glmer(OD), glmmML, and glmer.]

Acknowledgments

lme4: Doug Bates, Martin Mächler, Steve Walker

Data: Adrian Stier (UBC/OSU), Sea McKeon (Smithsonian), David Julian (UF), Jada-Simone White (Univ Hawai'i)

NSERC (Discovery)

SHARCnet

References

Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society. Series B, 61(1):265–285. doi:10.1111/1467-9868.00176.

Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pages 1–22. Springer. ISBN 0387208623.

Cordeiro, G.M. and Ferrari, S.L.P., 1998. Journal of Statistical Planning and Inference, 71(1-2):261–269. ISSN 0378-3758. doi:10.1016/S0378-3758(98)00005-6.

Cordeiro, G.M., Paula, G.A., and Botter, D.A., 1994. International Statistical Review / Revue Internationale de Statistique, 62(2):257–274. ISSN 0306-7734. doi:10.2307/1403512.

Hadfield, J.D., 2010. Journal of Statistical Software, 33(2):1–22. ISSN 1548-7660.

McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):1095–1103. ISSN 0029-8549. doi:10.1007/s00442-012-2275-2.

Pinheiro, J.C. and Bates, D.M., 1996. Statistics and Computing, 6(3):289–296. doi:10.1007/BF00140873.

Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356–362. ISSN 0012-9658.

Sung, Y.J., 2007. The Annals of Statistics, 35(3):990–1011. ISSN 0090-5364. doi:10.1214/009053606000001389.
