Page 1

lme4: interface, testing, and community issues

Ben Bolker, McMaster University, Departments of Mathematics & Statistics and Biology

15 April 2014

Page 2

Outline

Introduction

Interface issues

User guidance

Testing

Future directions

Page 3

lme4

- R package for mixed models

- linear, generalized, nonlinear

- speed and generality

- alternatives (also see http://glmm.wikidot.com/pkg-comparison)

  - R: MCMCglmm, glmmADMB, hglm, others

  - other: AD Model Builder, Stata (GLLAMM, xtmixed, xtmelogit), AS-REML, MLWiN, HLM, SAS PROC GLIMMIX/MIXED/NLMIXED, NIMBLE (http://www.slideshare.net/dlebauer/de-valpine-nimble)

- Bayesian frameworks: INLA, BUGS (JAGS: glm module), Stan

Page 4

Features

- formula interface

- scalar and vector random effects

- GLMMs: basic + user-specified family/link functions

- extract deviance function

- standard accessors: fixed and random coefficients, residuals, etc.

- predict and simulate methods

- likelihood profiling and parametric bootstrapping (see sketch below)
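A minimal sketch illustrating several of these features on the built-in sleepstudy data (a hedged example, not from the slides; function names as in current lme4, output omitted):

library(lme4)
# formula interface with a vector (intercept + slope) random effect
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
fixef(fm); ranef(fm); residuals(fm)        # standard accessors
predict(fm, re.form = NA)                  # population-level predictions
simulate(fm, nsim = 2)                     # simulate new responses
confint(fm, method = "profile")            # likelihood profiling
confint(fm, method = "boot", nsim = 100)   # parametric bootstrap
devfun <- update(fm, devFunOnly = TRUE)    # extract the deviance function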

Page 5

Downstream packages

afex

agridat

AICcmodavg

aod

aods3

arm

BayesFactor

Bayesthresh

BBRecapture

benchmark

blme

boss

BradleyTerry2

car

catdata

clusterPower

DAAG

difR

dlnm

doBy

effects

expp

ez

flexmix

gamm4

glmulti

gmodels

GWAF

HLMdiag

HSAUR

HSAUR2

influence.ME

irtrees

kulife

kyotil

languageR

lava

lme4

LMERConvenienceFunctions

lmerTest

longpower

lsmeans

mediation

MEMSS

metafor

Metatron

MethComp

mi

mice

miceadds

mixAK

mixlm

MixMAP

mlmRev

MPDiR

multcomp

multiDimBio

MuMIn

NanoStringNorm

nonrandom

ordinal

pamm

pan

papeR

PBImisc

pbkrtest

pedigreemm

phia

phmm

polytomous

prLogistic

R2admb

R2STATS

RcmdrPlugin.NMBU

refund

RLRsim

robustlmm

RVAideMemoire

SASmixed

sirt

spacom

SPOT

Surrogate

texreg

TripleR

ZeligMultilevel

Page 6

Outline

Introduction

Interface issues

User guidance

Testing

Future directions

Page 7

Challenges

- Wide range of users/developers

- Evolving goals

- R is a hacker language ...

- choice of object-orientation systems (S3/S4/ref class); fortunes::fortune(121):

  Rolf Turner: If you want to simultaneously handcuff yourself, strap yourself into a strait jacket, and tie yourself in knots, and moreover write code which is incomprehensible to the human mind, then S4 methods are indeed the way to go.

Page 8

Goals

- Simplicity for end-users (formula interface)

- Flexibility for downstream developers (modular chunks)

  - wrappers (ez, afex)

  - inference and diagnostics (pbkrtest, lmerTest)

  - extended models (pedigreemm, blme)

- Modularity for core development/maintenance

- Stability

Page 9

Layers

i linear algebra: RcppEigen/CHOLMOD

ii PWRSS/PIRLS computations

iii nonlinear optimization

iv API/formula interface, higher-level functions (profiling, bootstrap, etc.)

Page 10

Modular structure

(g)lFormula: formula plus data → model elements (model frame, X, ReTrms = {Zt, Lambdat, Lind, ...})

mk(Gl|L)merDevfun: model elements → deviance function (layers i and ii)

optimize(Gl|L)mer: deviance function + starting conditions → estimates of θ and β (layer iii)

mkMerMod: optimization results → merMod object

getME: general-purpose accessor function

Page 11

Modularity in action

lmod <- lFormula(Reaction ~ Days + (Days | Subject), sleepstudy)
names(lmod)
## [1] "fr" "X" "reTrms" "REML"
## [5] "formula"
devfun <- do.call(mkLmerDevfun, lmod)
(opt <- optimizeLmer(devfun))
## parameter estimates: 0.967 0.0152 0.231
## objective: 1744
## number of function evaluations: 98
result <- mkMerMod(environment(devfun), opt, lmod$reTrms, fr = lmod$fr)

Page 12

Fit with pseudo-fixed effects

lmod2 <- lFormula(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject), sleepstudy)
devfun2 <- do.call(mkLmerDevfun, lmod2)
# hold the first variance parameter (relative SD of the Subject intercept)
# fixed at 20 and optimize only the remaining one
tmpf <- function(th) devfun2(c(20, th))
minqa::bobyqa(par = 1, fn = tmpf, lower = 0)
## parameter estimates: 0.248
## objective: 1824
## number of function evaluations: 22

Page 13

Is it working?

- most downstream packages successfully ported to v 1.0

- most users weaned from @ accessors (?)

- development seems easier

- will we be able to make large internal changes?

Page 14

Outline

Introduction

Interface issues

User guidance

Testing

Future directions

Page 15

Design issues

- Prevent/warn of silly usage (see sketch below)

  - Unidentifiable models (e.g. rank-deficient, single level per random effect)

  - Ill-advised models (e.g. small number of random-effect levels)

- Prevent/warn of "bad" fits
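A hedged sketch of how these pre-fit checks are surfaced to users (option names assumed from the lmerControl/glmerControl documentation in recent lme4 versions):

library(lme4)
# pre-fit sanity checks can be tightened or relaxed per fit via lmerControl()
ctrl <- lmerControl(check.nobs.vs.nlev = "stop",     # observations vs. random-effect levels
                    check.nlev.gtreq.5 = "warning")  # warn if a grouping factor has < 5 levels
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, control = ctrl)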

Page 16

Recent changes

- v. ???: move from nlminb to other default optimizers (no more "false convergence" warnings)

- v. ???: introduce pre-fit checking

- v. 1.0-1: loosen pre-fit checks

- v. 1.0-5: introduce convergence checks

- soon: loosen/restructure convergence checks (use relative rather than absolute gradients; see sketch below)

Open questions: gradient, Hessian calculations?
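For illustration, a minimal sketch (assuming the check.conv.* options and the .makeCC() helper described in ?lmerControl) of adjusting the convergence checks and the optimizer from the user side:

library(lme4)
# switch the optimizer and adjust the gradient convergence check;
# the tolerance shown is illustrative, not a recommended value
ctrl <- lmerControl(optimizer = "bobyqa",
                    check.conv.grad = .makeCC("warning", tol = 2e-3))
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, control = ctrl)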

Page 17

Problems

- Computational overhead (e.g. rank-checking)

- Unusual use cases

- Detecting and identifying fitting problems

Page 18

Model use issues

- Inference for mixed models is tough (e.g. the great degrees-of-freedom debate)

- Ethics: should you provide questionable, imperfect, or poorly understood methods? (e.g. Wald intervals; standard errors on predictions)

- ... or should you let your users flounder?

(cartoon: Roz Chast)

Page 19

Outline

Introduction

Interface issues

User guidance

Testing

Future directions

Page 20

Testing

- Computational core is all floating-point

- Small differences between platforms, compilers, etc. ...

- ... but there are many unstable cases

- have to go beyond unit tests (see sketch below)

- test examples are large, slow, and sometimes confidential
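A minimal sketch of the kind of tolerance-based check this implies (testthat-style; the reference values are rounded from the standard sleepstudy fit and are illustrative only):

library(lme4)
library(testthat)
# compare against stored reference values with a tolerance, rather than
# exact floating-point equality, to absorb platform/compiler differences
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
test_that("sleepstudy fixed effects are stable", {
  expect_equal(unname(fixef(fm)), c(251.405, 10.467), tolerance = 1e-4)
})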

Page 21

Outline

Introduction

Interface issues

User guidance

Testing

Future directions

Page 22

Model extensions

- non-linear fitting (present but underdeveloped; see sketch below)

- negative binomial, zero-inflated models: EM/iterative algorithms or add to level III

- flexible variance structures: flexLambda branch

- structure in residuals ("R-side")
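On the nonlinear side, a minimal sketch of what is already present, adapted from the Orange-tree example in the nlmer documentation:

library(lme4)
# logistic growth of orange trees, with a random asymptote per tree
nm1 <- nlmer(circumference ~ SSlogis(age, Asym, xmid, scal) ~ Asym | Tree,
             data = Orange,
             start = c(Asym = 200, xmid = 725, scal = 350))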

Page 23

Open questions

- restore post hoc MCMC sampling? other (faster) methods for inference and ...

- limitations of formula interface

- how important is GHQ? (see sketch below)
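For context, adaptive Gauss-Hermite quadrature is already exposed through glmer()'s nAGQ argument (currently limited to models with a single scalar random effect); a minimal sketch using the built-in cbpp data:

library(lme4)
# nAGQ = 1 is the default Laplace approximation; larger values use
# adaptive Gauss-Hermite quadrature with that many quadrature points
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial, nAGQ = 9)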

Page 24

The really big picture

- Switch to Julia, or ??

- Commodities

  - Computational linear algebra

  - Nonlinear optimizers

- Language-switching: interface friction

- Advantages of established framework