Trondheim glmm

talk on GLMMs at Trondheim

Page 1: Trondheim glmm

Precursors GLMMs Results Conclusions References

Generalized linear mixed models for ecologists: coping with non-normal, spatially and temporally correlated data

Ben Bolker

McMaster University, Departments of Mathematics & Statistics and Biology

30 August 2011

Ben Bolker McMaster University Departments of Mathematics & Statistics and Biology

GLMMs

Page 2: Trondheim glmm

Outline

1 Precursors: Examples, Definitions, ANOVA vs. (G)LMMs

2 GLMMs: Estimation, Inference

3 Results: Coral symbionts, Glycera, Arabidopsis

4 Conclusions

Page 4: Trondheim glmm

Examples

Coral protection by symbionts

[Figure: number of blocks (y-axis, 0 to 10) by number of predation events (0, 1, 2), for symbiont treatments none, shrimp, crabs, both]

Page 5: Trondheim glmm

Examples

Environmental stress: Glycera cell survival

[Figure: Glycera cell survival (proportion, 0.0 to 1.0) by H2S (0, 0.03, 0.1, 0.32) and copper (0, 33.3, 66.6, 133.3); panels: Osm = 12.8, 22.4, 32, 41.6, 51.2 under normoxia and anoxia]

Page 6: Trondheim glmm

Examples

Arabidopsis response to fertilization & clipping

[Figure: log(1 + fruit set) (0 to 5) for unclipped vs. clipped plants; panel: nutrient (1, 8), color: genotype]

Page 7: Trondheim glmm

Definitions

Generalized linear models (GLMs)

non-normal data: binary, binomial, count (Poisson/negative binomial)

non-linearity: log/exponential, logit/logistic: link function L

flexibility via linear predictor: L(response) = a + b_i + c x + . . .

stable, robust, fast, easy to use
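
A minimal sketch of the link-function and linear-predictor idea, using base R's glm() on simulated counts. The data, variable names, and coefficient values (0.5, 1.2) are illustrative assumptions, not taken from the talk.

```r
# Simulated example: counts whose log-mean is linear in x.
# The "true" coefficients (0.5, 1.2) are arbitrary illustrative values.
set.seed(101)
n <- 200
x <- runif(n)
y <- rpois(n, lambda = exp(0.5 + 1.2 * x))

# Poisson family with log link: log(E[y]) = a + c * x
fit <- glm(y ~ x, family = poisson(link = "log"))

# Estimates are on the link (log) scale ...
coef(fit)
# ... so exponentiating gives multiplicative effects on the expected count
exp(coef(fit))
```

Swapping family = binomial (logit link) gives the logistic case mentioned above with the same model syntax.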

Page 8: Trondheim glmm

Definitions

Random vs. fixed effects

Fixed effects (FE): interested in specific levels (“treatments”)

Random effects (RE) [2]: interested in distribution (“blocks”)

  Experimental
  Temporal, spatial
  Genera, species, genotypes
  Individuals (“repeated measures”)

inference on population of blocks
(blocks randomly selected?)
(large number of blocks [> 5–7]?)

Page 12: Trondheim glmm

ANOVA vs. (G)LMMs

Mixed models: classical approach

traditional approach to non-independence

nested, randomized block, split-plot . . .

sum-of-squares decomposition/ANOVA: figure out treatment SSQ/df, error SSQ/df [3]

Page 13: Trondheim glmm

ANOVA vs. (G)LMMs

You can use an ANOVA if . . .

data are normal (or can be transformed)

responses are linear

design is (nearly) balanced

simple design (single or nested REs; not crossed REs, e.g. year effects that apply across all spatial blocks)

no spatial or temporal correlation within blocks

Page 14: Trondheim glmm

ANOVA vs. (G)LMMs

“Modern” mixed models

Data still normal(izable), linear, but unbalanced/crossed/correlated

Balance (dispersion of observations around block mean) with (dispersion of block means around overall average)

Good for large, messy data . . . and when variation is interesting
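
The balance between within-block and among-block dispersion can be sketched numerically. The example below assumes a simple normal model with known variances, where each block mean is pulled toward the overall mean with weight w_j = tau2 / (tau2 + sigma2 / n_j). All numbers and names here are hypothetical; real mixed-model software estimates the variances from the data rather than taking them as given.

```r
# Hypothetical block summaries (illustrative numbers only)
block_mean <- c(2.0, 3.5, 5.0, 9.0)   # raw mean of each block
block_n    <- c(20, 10, 5, 2)         # observations per block
sigma2 <- 4   # assumed within-block (observation) variance
tau2   <- 1   # assumed among-block variance

# Precision-weighted overall mean
prec  <- 1 / (tau2 + sigma2 / block_n)
grand <- sum(prec * block_mean) / sum(prec)

# Weight on each block's own mean: near 1 for large blocks, smaller for small ones
w <- tau2 / (tau2 + sigma2 / block_n)

# Shrunk estimates lie between the raw block mean and the overall mean
shrunk <- w * block_mean + (1 - w) * grand
data.frame(block_n, block_mean, weight = round(w, 2), shrunk = round(shrunk, 2))
```

Blocks with few observations get small weights and are shrunk most strongly toward the overall average.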

Page 15: Trondheim glmm

ANOVA vs. (G)LMMs

Shrinkage (Arabidopsis)

[Figure: Arabidopsis block estimates; x-axis: genotype (0 to 25); y-axis: mean(log) fruit set]

Page 16: Trondheim glmm

ANOVA vs. (G)LMMs

Shrinkage (sparrows)

[Figure: heterozygosity (0.68 to 0.80) vs. log(harmonic mean pop size) (2.0 to 4.5), by island: Hestmannøy, Sleneset, Gjerøy, Indre Kvarøy, Husøy, Selvær, Ytre Kvarøy, Aldra, Myken, Lovund, Onøy, Nesøy, Lurøy, Sundøy]

Page 17: Trondheim glmm

ANOVA vs. (G)LMMs

GLMMs

Data not normal(izable), nonlinear

Standard distributions (Poisson, binomial etc.)

Specific forms of nonlinearity (exponential, logistic etc.)

Conceptually v. similar to LMMs, but harder

Page 18: Trondheim glmm

ANOVA vs. (G)LMMs

Challenges

Small # RE levels (<5–6)

Big data (> 1000 observations)

Spatial/temporal correlation structure (in GLMMs)

Unusual distributions of data (in GLMMs)

Page 20: Trondheim glmm

Estimation

Penalized quasi-likelihood (PQL) [1]

flexible (e.g. handles spatial/temporal correlations)

least accurate: biased for small samples (low counts per block)

SAS PROC GLIMMIX, R MASS:glmmPQL

Page 21: Trondheim glmm

Estimation

Laplace and Gauss-Hermite quadrature

more accurate than PQL: speed/accuracy tradeoff

lme4:glmer, glmmML, glmmADMB, R2ADMB (AD Model Builder), gamlss.mx:gamlssNP, repeated

Page 22: Trondheim glmm

Estimation

Bayesian approaches

usually slow but flexible

best confidence intervals

must specify priors, assess convergence

specialized: glmmAK, MCMCglmm [6], INLA

general: BUGS (glmmBUGS, R2WinBUGS, BRugs, WinBUGS, OpenBUGS, R2jags, rjags, JAGS)

Page 23: Trondheim glmm

Estimation

Extensions

Overdispersion: variance > expected from statistical model

  quasi-likelihood: MASS:glmmPQL
  overdispersed distributions (e.g. negative binomial): glmmADMB, gamlss.mx:gamlssNP
  observation-level random effects (e.g. lognormal-Poisson): lme4, MCMCglmm

Zero-inflation: overabundance of zeros in a discrete distribution

  zero-inflated models: glmmADMB, MCMCglmm
  hurdle models: MCMCglmm
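
A rough way to see overdispersion, sketched in base R on simulated data. For simplicity this ignores random effects and uses a plain GLM. The simulated negative binomial counts and all parameter values are illustrative assumptions.

```r
# Simulate counts that are more variable than a Poisson model allows:
# negative binomial variance is mu + mu^2/size, which exceeds mu.
set.seed(42)
n  <- 300
x  <- rnorm(n)
mu <- exp(1 + 0.5 * x)
y  <- rnbinom(n, mu = mu, size = 1)

fit_pois <- glm(y ~ x, family = poisson)

# Sum of squared Pearson residuals over residual df:
# roughly 1 if the Poisson variance assumption holds, larger if overdispersed
disp <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
disp

# Quasi-Poisson keeps the same point estimates but scales the
# standard errors by the square root of the estimated dispersion
fit_quasi <- glm(y ~ x, family = quasipoisson)
summary(fit_quasi)$dispersion
```

In a mixed model, the packages listed above handle the same problem through quasi-likelihood, an overdispersed distribution, or an observation-level random effect.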

Page 24: Trondheim glmm

Inference

Wald tests/CIs

Widely available (e.g. summary())

Assume data set is large/well-behaved

Always approximate, sometimes awful; bad for variance estimates

Page 25: Trondheim glmm

Inference

Likelihood ratio tests

Compare models (easy)

Confidence intervals: expensive and rarely available (lme4a for LMMs)

Asymptotic assumption

  LMMs: F tests; estimate “equivalent” denominator df? Approximations [8, 13]: doBy:KRmodcomp
  GLMMs: don’t really know what to do; OK if number of observations ≫ number of parameters and large # of blocks . . .

Page 26: Trondheim glmm

Inference

Information-theoretic approaches

Above issues apply, but less well understood [4, 5, 7, 11]: AIC is asymptotic too

For comparing models with different REs, or for AICc, what is p?

“Level of focus” issue: what are you trying to predict? [5, 14, 15] (cAIC)

Page 27: Trondheim glmm

Inference

Bootstrapping

1 fit null model to data

2 simulate “data” from null model

3 fit null and working model, compute likelihood difference

4 repeat to estimate null distribution

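simulate/refit methods: bootMer in lme4a (LMMs only!), doBy:PBModComp, or “by hand”:

> library(lme4)  # refit() and the simulate()/logLik() methods for fitted mixed models
> pboot <- function(m0, m1) {
    # one response vector simulated from the null model m0
    s <- simulate(m0)[[1]]
    # likelihood-ratio statistic: 2 * (log-likelihood of m1 - log-likelihood of m0)
    as.numeric(2 * (logLik(refit(m1, s)) - logLik(refit(m0, s))))
  }
> # fm2 is the null (reduced) model, fm1 the working model
> replicate(1000, pboot(fm2, fm1))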
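
The simulated values from replicate() form the null distribution of the likelihood-ratio statistic. One common way to turn them into a p-value is sketched below. The helper name boot_pvalue and the stand-in chi-square null distribution are illustrative assumptions, not part of the talk.

```r
# Bootstrap p-value: proportion of simulated null statistics at least as
# large as the observed one. The +1 terms count the observed statistic
# as one draw from the null, so the p-value can never be exactly zero.
boot_pvalue <- function(null_dev, obs_dev) {
  null_dev <- null_dev[is.finite(null_dev)]  # drop any failed refits
  (sum(null_dev >= obs_dev) + 1) / (length(null_dev) + 1)
}

# Stand-in null distribution for illustration only (chi-square, 1 df);
# in practice this would be the output of the replicate() call.
set.seed(1)
null_dev <- rchisq(999, df = 1)
boot_pvalue(null_dev, obs_dev = 3.84)
```

The observed statistic would be computed from the two models fitted to the real data, in the same way as inside pboot.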

Page 28: Trondheim glmm

Inference

Bayesian inference

CIs, prediction intervals etc. computationally “free” after estimation

Post hoc MCMC sampling: (glmmADMB, R2ADMB, lme4:MCMCsamp)

Page 29: Trondheim glmm

Inference

Bottom line

Large data: computation slow, inference easy

Bayesian: computation slow, inference easy

Small data: computation fast

  Problems with zero variance (blme), correlations = ±1
  Bootstrapping for inference?

Page 31: Trondheim glmm

Coral symbionts

Coral symbionts: comparison of results

[Figure: regression estimates (−6 to 2) for Symbiont, Crab vs. Shrimp, and Added symbiont; methods compared: GLM (fixed), GLM (pooled), PQL, Laplace, AGQ]

Page 32: Trondheim glmm

Glycera

Glycera fit comparisons

[Figure: effect on survival (logit scale, −60 to 60) for Osm, Cu, H2S, Anoxia and all of their two-, three- and four-way interactions; methods compared: MCMCglmm, glmer(OD:2), glmer(OD), glmmML, glmer]

Page 33: Trondheim glmm

Glycera

Glycera: parametric bootstrap results

[Figure: inferred p value vs. true p value (both 0.001 to 0.5); panels: Osm, Cu, H2S, Anoxia; variable: normal, t7, t14]

Page 34: Trondheim glmm

Arabidopsis

Arabidopsis results

[Figure: regression estimates (−1.0 to 1.0) for nutrient8, amdclipped, nutrient8:amdclipped, rack2, statusPetri.Plate, statusTransplant]

Page 36: Trondheim glmm

What about space and/or time?

if in blocks, no problem (crossed random effects) [10]

test residuals, try to fail to reject the null hypothesis of no autocorrelation

if normal (LMM), corStruct in lme, spdep

otherwise . . . spatcounts, geoRglm, geoBUGS, . . . ???

big data [9]
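
A minimal sketch of the "test residuals" step for the simplest temporal case, using base R only. The simulated series and the choice of a plain linear model are illustrative assumptions. Residuals from a fitted mixed model, ordered in time within blocks, could be checked in a similar way.

```r
# Simulated time series with independent errors (illustrative only)
set.seed(7)
t <- 1:100
y <- 2 + 0.05 * t + rnorm(100)
fit <- lm(y ~ t)

# Ljung-Box test of the null hypothesis of no autocorrelation up to lag 5;
# a large p-value means we fail to reject independence of the residuals
bt <- Box.test(residuals(fit), lag = 5, type = "Ljung-Box")
bt

# Sample autocorrelations of the residuals at successive lags
acf(residuals(fit), plot = FALSE)
```

For spatial data an analogous check would need a spatial statistic; this temporal example is only the one-dimensional case.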

Page 37: Trondheim glmm

Primary tools

Special-purpose:

  lme4: multiple/crossed REs, (profiling): fast
  MCMCglmm: Bayesian, fairly flexible
  glmmADMB: negative binomial, zero-inflated etc.

General-purpose:

  AD Model Builder (and interfaces)
  BUGS/JAGS (and interfaces)
  INLA [12]

Tools are getting better, but still not easy!

Info: http://glmm.wikidot.com

Slides: http://www.slideshare.net/bbolker

Page 38: Trondheim glmm

Acknowledgements

Funding: NSF, NSERC, NCEAS

Data: Josh Banta and Massimo Pigliucci (Arabidopsis); Adrian Stier and Seabird McKeon (coral symbionts); Courtney Kagan, Jocelynn Ortega, David Julian (Glycera)

Co-authors: Mollie Brooks, Connie Clark, Shane Geange, John Poulsen, Hank Stevens, Jada White

Page 39: Trondheim glmm

References

[1] Breslow NE, 2004. In DY Lin & PJ Heagerty, eds., Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pp. 1–22. Springer. ISBN 0387208623.

[2] Gelman A, 2005. Annals of Statistics, 33(1):1–53. doi:10.1214/009053604000001048.

[3] Gotelli NJ & Ellison AM, 2004. A Primer of Ecological Statistics. Sinauer, Sunderland, MA.

[4] Greven S, 2008. Non-Standard Problems in Inference for Additive and Linear Mixed Models. Cuvillier Verlag, Göttingen, Germany. ISBN 3867274916. URL http://www.cuvillier.de/flycms/en/html/30/-UickI3zKPS,3cEY=/Buchdetails.html?SID=wVZnpL8f0fbc.

[5] Greven S & Kneib T, 2010. Biometrika, 97(4):773–789. URL http://www.bepress.com/jhubiostat/paper202/.

[6] Hadfield JD, Feb. 2010. Journal of Statistical Software, 33(2):1–22. ISSN 1548-7660. URL http://www.jstatsoft.org/v33/i02.

[7] Hurvich CM & Tsai CL, Jun. 1989. Biometrika, 76(2):297–307. doi:10.1093/biomet/76.2.297. URL http://biomet.oxfordjournals.org/content/76/2/297.abstract.

[8] Kenward MG & Roger JH, 1997. Biometrics, 53(3):983–997.

[9] Latimer AM, Banerjee S et al., 2009. Ecology Letters, 12(2):144–154.

[10] Ozgul A, Oli MK et al., Apr. 2009. Ecological Applications: A Publication of the Ecological Society of America, 19(3):786–798. ISSN 1051-0761. URL http://www.ncbi.nlm.nih.gov/pubmed/19425439. PMID: 19425439.

[11] Richards SA, 2005. Ecology, 86(10):2805–2814. doi:10.1890/05-0074.

[12] Rue H, Martino S, & Chopin N, 2009. Journal of the Royal Statistical Society, Series B, 71(2):319–392.

[13] Schaalje G, McBride J, & Fellingham G, 2002. Journal of Agricultural, Biological & Environmental Statistics, 7(4):512–524. URL http://www.ingentaconnect.com/content/asa/jabes/2002/00000007/00000004/art00004.

[14] Spiegelhalter DJ, Best N et al., 2002. Journal of the Royal Statistical Society B, 64:583–640.

[15] Vaida F & Blanchard S, Jun. 2005. Biometrika, 92(2):351–370. doi:10.1093/biomet/92.2.351. URL http://biomet.oxfordjournals.org/cgi/content/abstract/92/2/351.

Page 40: Trondheim glmm

Extras

Spatial and temporal correlation (R-side effects): MASS:glmmPQL (sort of), GLMMarp, INLA; WinBUGS, AD Model Builder

Additive models: amer, gamm4, mgcv, lmeSplines

Ordinal models: ordinal

Population genetics: pedigreemm, kinship

Survival: coxme, kinship, phmm
