Page 1

lme4: interface, testing, and community issues

Ben Bolker, McMaster University, Departments of Mathematics & Statistics and Biology

15 April 2014

Page 2

Outline

Introduction

Interface issues

User guidance

Testing

Future directions

Page 3

lme4

- R package for mixed models

- linear, generalized, nonlinear

- speed and generality

- alternatives (also see http://glmm.wikidot.com/pkg-comparison)

  - R: MCMCglmm, glmmADMB, hglm, others

  - other: AD Model Builder, Stata (GLLAMM, xtmixed, xtmelogit), AS-REML, MLWiN, HLM, SAS PROC GLIMMIX/MIXED/NLMIXED, NIMBLE (http://www.slideshare.net/dlebauer/de-valpine-nimble)

- Bayesian frameworks: INLA, BUGS (JAGS: glm module), Stan

Page 4

Features

- formula interface

- scalar and vector random effects

- GLMMs: basic + user-specified family/link functions

- extract deviance function

- standard accessors: fixed and random coefficients, residuals, etc.

- predict and simulate methods

- likelihood profiling and parametric bootstrapping (see sketch below)
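A minimal sketch illustrating several of these features on the built-in sleepstudy data (a hedged example, not from the slides; function names as in current lme4, output omitted):

library(lme4)
# formula interface with a vector (intercept + slope) random effect
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
fixef(fm); ranef(fm); residuals(fm)        # standard accessors
predict(fm, re.form = NA)                  # population-level predictions
simulate(fm, nsim = 2)                     # simulate new responses
confint(fm, method = "profile")            # likelihood profiling
confint(fm, method = "boot", nsim = 100)   # parametric bootstrap
devfun <- update(fm, devFunOnly = TRUE)    # extract the deviance function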

Page 5

Downstream packages

afex

agridat

AICcmodavg

aod

aods3

arm

BayesFactor

Bayesthresh

BBRecapture

benchmark

blme

boss

BradleyTerry2

car

catdata

clusterPower

DAAG

difR

dlnm

doBy

effects

expp

ez

flexmix

gamm4

glmulti

gmodels

GWAF

HLMdiag

HSAUR

HSAUR2

influence.ME

irtrees

kulife

kyotil

languageR

lava

lme4

LMERConvenienceFunctions

lmerTest

longpower

lsmeans

mediation

MEMSS

metafor

Metatron

MethComp

mi

mice

miceadds

mixAK

mixlm

MixMAP

mlmRev

MPDiR

multcomp

multiDimBio

MuMIn

NanoStringNorm

nonrandom

ordinal

pamm

pan

papeR

PBImisc

pbkrtest

pedigreemm

phia

phmm

polytomous

prLogistic

R2admb

R2STATS

RcmdrPlugin.NMBU

refund

RLRsim

robustlmm

RVAideMemoire

SASmixed

sirt

spacom

SPOT

Surrogate

texreg

TripleR

ZeligMultilevel

Page 6

Outline

Introduction

Interface issues

User guidance

Testing

Future directions

Page 7

Challenges

- Wide range of users/developers

- Evolving goals

- R is a hacker language ...

- choice of object-orientation systems (S3/S4/ref class); fortunes::fortune(121):

  Rolf Turner: If you want to simultaneously handcuff yourself, strap yourself into a strait jacket, and tie yourself in knots, and moreover write code which is incomprehensible to the human mind, then S4 methods are indeed the way to go.

Page 8

Goals

- Simplicity for end-users (formula interface)

- Flexibility for downstream developers (modular chunks)

  - wrappers (ez, afex)

  - inference and diagnostics (pbkrtest, lmerTest)

  - extended models (pedigreemm, blme)

- Modularity for core development/maintenance

- Stability

Page 9

Layers

i linear algebra: RcppEigen/CHOLMOD

ii PWRSS/PIRLS computations

iii nonlinear optimization

iv API/formula interface, higher-level functions (profiling, bootstrap, etc.)

Page 10

Modular structure

(g)lFormula: formula plus data → model elements (model frame, X, ReTrms = {Zt, Lambdat, Lind, ...})

mk(Gl|L)merDevfun: model elements → deviance function (layers i and ii)

optimize(Gl|L)mer: deviance function + starting conditions → estimates of θ and β (layer iii)

mkMerMod: optimization results → merMod object

getME: general-purpose accessor function

Page 11

Modularity in action

lmod <- lFormula(Reaction ~ Days + (Days | Subject), sleepstudy)
names(lmod)
## [1] "fr" "X" "reTrms" "REML"
## [5] "formula"
devfun <- do.call(mkLmerDevfun, lmod)
(opt <- optimizeLmer(devfun))
## parameter estimates: 0.967 0.0152 0.231
## objective: 1744
## number of function evaluations: 98
result <- mkMerMod(environment(devfun), opt, lmod$reTrms, fr = lmod$fr)

Page 12

Fit with pseudo-fixed effects

lmod2 <- lFormula(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject), sleepstudy)
devfun2 <- do.call(mkLmerDevfun, lmod2)
# hold the first variance parameter (relative SD of the Subject intercept)
# fixed at 20 and optimize only the remaining one
tmpf <- function(th) devfun2(c(20, th))
minqa::bobyqa(par = 1, fn = tmpf, lower = 0)
## parameter estimates: 0.248
## objective: 1824
## number of function evaluations: 22

Page 13

Is it working?

- most downstream packages successfully ported to v 1.0

- most users weaned from @ accessors (?)

- development seems easier

- will we be able to make large internal changes?

Page 14

Outline

Introduction

Interface issues

User guidance

Testing

Future directions

Page 15

Design issues

- Prevent/warn of silly usage (see sketch below)

  - Unidentifiable models (e.g. rank-deficient, single level per random effect)

  - Ill-advised models (e.g. small number of random-effect levels)

- Prevent/warn of "bad" fits
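A hedged sketch of how these pre-fit checks are surfaced to users (option names assumed from the lmerControl/glmerControl documentation in recent lme4 versions):

library(lme4)
# pre-fit sanity checks can be tightened or relaxed per fit via lmerControl()
ctrl <- lmerControl(check.nobs.vs.nlev = "stop",     # observations vs. random-effect levels
                    check.nlev.gtreq.5 = "warning")  # warn if a grouping factor has < 5 levels
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, control = ctrl)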

Page 16

Recent changes

- v. ???: move from nlminb to other default optimizers (no more "false convergence" warnings)

- v. ???: introduce pre-fit checking

- v. 1.0-1: loosen pre-fit checks

- v. 1.0-5: introduce convergence checks

- soon: loosen/restructure convergence checks (use relative rather than absolute gradients; see sketch below)

Open questions: gradient, Hessian calculations?
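For illustration, a minimal sketch (assuming the check.conv.* options and the .makeCC() helper described in ?lmerControl) of adjusting the convergence checks and the optimizer from the user side:

library(lme4)
# switch the optimizer and adjust the gradient convergence check;
# the tolerance shown is illustrative, not a recommended value
ctrl <- lmerControl(optimizer = "bobyqa",
                    check.conv.grad = .makeCC("warning", tol = 2e-3))
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, control = ctrl)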

Page 17

Problems

- Computational overhead (e.g. rank-checking)

- Unusual use cases

- Detecting and identifying fitting problems

Page 18

Model use issues

- Inference for mixed models is tough (e.g. the great degrees-of-freedom debate)

- Ethics: should you provide questionable, imperfect, or poorly understood methods? (e.g. Wald intervals; standard errors on predictions)

- ... or should you let your users flounder?

(cartoon: Roz Chast)

Page 19

Outline

Introduction

Interface issues

User guidance

Testing

Future directions

Page 20

Testing

- Computational core is all floating-point

- Small differences between platforms, compilers, etc. ...

- ... but there are many unstable cases

- have to go beyond unit tests (see sketch below)

- test examples are large, slow, and sometimes confidential
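A minimal sketch of the kind of tolerance-based check this implies (testthat-style; the reference values are rounded from the standard sleepstudy fit and are illustrative only):

library(lme4)
library(testthat)
# compare against stored reference values with a tolerance, rather than
# exact floating-point equality, to absorb platform/compiler differences
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
test_that("sleepstudy fixed effects are stable", {
  expect_equal(unname(fixef(fm)), c(251.405, 10.467), tolerance = 1e-4)
})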

Page 21

Outline

Introduction

Interface issues

User guidance

Testing

Future directions

Page 22

Model extensions

- non-linear fitting (present but underdeveloped; see sketch below)

- negative binomial, zero-inflated models: EM/iterative algorithms or add to level III

- flexible variance structures: flexLambda branch

- structure in residuals ("R-side")
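On the nonlinear side, a minimal sketch of what is already present, adapted from the Orange-tree example in the nlmer documentation:

library(lme4)
# logistic growth of orange trees, with a random asymptote per tree
nm1 <- nlmer(circumference ~ SSlogis(age, Asym, xmid, scal) ~ Asym | Tree,
             data = Orange,
             start = c(Asym = 200, xmid = 725, scal = 350))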

Page 23

Open questions

- restore post hoc MCMC sampling? other (faster) methods for inference and ...

- limitations of formula interface

- how important is GHQ? (see sketch below)
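For context, adaptive Gauss-Hermite quadrature is already exposed through glmer()'s nAGQ argument (currently limited to models with a single scalar random effect); a minimal sketch using the built-in cbpp data:

library(lme4)
# nAGQ = 1 is the default Laplace approximation; larger values use
# adaptive Gauss-Hermite quadrature with that many quadrature points
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial, nAGQ = 9)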

Page 24

The really big picture

- Switch to Julia, or ??

- Commodities

  - Computational linear algebra

  - Nonlinear optimizers

- Language-switching: interface friction

- Advantages of established framework