Multi-state modelling software, and encouraging statistical software development.

Chris Jackson, MRC Biostatistics Unit, Cambridge, U.K.

MRC Biostatistics Unit Centenary Conference, 25 March 2014

Chris Jackson, MRC-BSU Cambridge Multi-state modelling, and encouraging more software 1/ 24

Overview

Part 1. Software for multi-state models

• Two different types of multi-state model
• msm package for R: features, design principles, potential developments
• survival + mstate packages for R.

Part 2. Encouraging statistical software development in biostatistics research

• What’s needed, and how to do it. Start discussion...


Part I

Software for multi-state modelling: state of the art and future.


Multi-state models in continuous time

Example:

[Figure: example model with states DISEASE STAGE 1, DISEASE STAGE 2, ..., DISEASE STAGE n−2, DISEASE STAGE n−1 and ABSORBING STATE n]

Defined by matrix Q of transition intensities: instantaneous risk of moving from state r to state s ≠ r at time t:

$$q_{rs}(t, F(t^-)) = \lim_{\delta t \to 0} P(S(t + \delta t) = s \mid S(t) = r, F(t^-)) / \delta t.$$

e.g. in a Markov model, $q_{rs}$ is independent of the history $F(t^-)$; in a time-homogeneous model, $q_{rs}$ is independent of t.

• Single period (sojourn time) in state r ∼ Exp(mean = $-1/q_{rr}$)
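As a small illustration of the definitions above, the following base R sketch builds an intensity matrix Q for a three-state model and reads the mean sojourn times off its diagonal. The state labels and rate values are made up for illustration and are not taken from the talk.

```r
# Hypothetical 3-state model: well -> ill -> dead (illustrative rates only).
# Off-diagonal entries are the transition intensities q_rs.
Q <- rbind(c(0, 0.10, 0.02),
           c(0, 0,    0.25),
           c(0, 0,    0))
dimnames(Q) <- list(from = c("well", "ill", "dead"),
                    to   = c("well", "ill", "dead"))

# By convention each diagonal entry is minus the sum of the other entries
# in its row, so that every row of Q sums to zero.
diag(Q) <- -rowSums(Q)

# Under a time-homogeneous Markov model the sojourn time in state r is
# exponential with mean -1/q_rr. An absorbing state has q_rr = 0, so its
# sojourn time is taken to be infinite.
q_diag <- diag(Q)
mean_sojourn <- ifelse(q_diag < 0, -1 / q_diag, Inf)

# Probability that the next state is s, given a move out of r: q_rs / -q_rr.
offdiag <- Q[1:2, ]
offdiag[cbind(1:2, 1:2)] <- 0
next_state_prob <- offdiag / -q_diag[1:2]
```

With these values the mean sojourn time is about 8.3 time units in the first state and 4 in the second.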


Data for multi-state models (1): intermittently-observed

[Figure: state (State 1 to State 4) against time; the state is observed only at times t0, t1, t2, t3, t4]

Panel data. State only observed at a finite number of times $t_j$.

• Don’t know the state between these times
• e.g. chronic disease only measurable at clinic visit / screening test

• Likelihood is product of transition probabilities between states $S(t_j)$ observed at successive $t_j$ (Kalbfleisch & Lawless, JASA 1985).

$$L(Q) = \prod_j P_{S(t_j),\,S(t_{j+1})}(t_{j+1} - t_j).$$

• Closed form for corresponding matrix $P(t) = \mathrm{Exp}(tQ)$ only if Q is constant / piecewise constant with time t.
• Non-Markov models difficult (see later...)

msm package for R: designed for this type of data
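The panel-data likelihood above can be sketched in a few lines of base R. This is only an illustration of the formula, not how msm computes it: `mat_exp` and `panel_loglik` are hypothetical helper names, the matrix exponential uses a simple scaling-and-squaring Taylor series, and the Q matrix and observations are invented.

```r
# Matrix exponential by scaling and squaring with a truncated Taylor series.
# A plain base R sketch; a dedicated library would normally be used instead.
mat_exp <- function(A, terms = 20) {
  a_norm <- max(1, norm(A, "I"))
  s <- ceiling(log2(a_norm)) + 1      # number of halvings applied to A
  A <- A / 2^s
  E <- diag(nrow(A))
  term <- diag(nrow(A))
  for (k in seq_len(terms)) {
    term <- term %*% A / k            # A^k / k!
    E <- E + term
  }
  for (i in seq_len(s)) E <- E %*% E  # undo the scaling by repeated squaring
  E
}

# Log-likelihood of one subject's panel observations under a
# time-homogeneous Markov model with intensity matrix Q:
# sum over j of log P[S(t_j), S(t_{j+1})](t_{j+1} - t_j).
panel_loglik <- function(Q, states, times) {
  ll <- 0
  for (j in seq_len(length(states) - 1)) {
    P <- mat_exp(Q * (times[j + 1] - times[j]))
    ll <- ll + log(P[states[j], states[j + 1]])
  }
  ll
}

# Hypothetical 3-state model and one made-up subject.
Q <- rbind(c(-0.12,  0.10, 0.02),
           c( 0,    -0.25, 0.25),
           c( 0,     0,    0))
states <- c(1, 1, 2, 3)
times  <- c(0, 1.0, 2.5, 4.0)
ll <- panel_loglik(Q, states, times)
```

Summing `panel_loglik` over subjects and maximising over the free entries of Q would give the maximum likelihood estimate in this simple setting.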


Data for multi-state models (2): completely-observed

[Figure: state (State 1, State 2, State 3, Death) against time from 0 to 12, with every change of state shown]

Observe all changes of state

• know the complete process history.
• e.g. changes of state represent events
    • MI, stroke, periods in hospital.
• event times may be known from administrative data.

• Time-to-event data with competing event times censored.
    • substantial literature on survival / competing risks
• Only Markov models supported by msm.
    • exponential / piecewise-exponential event times.
• Can estimate transition rates under more flexible models (e.g. Cox semi-Markov) using standard survival analysis software.

survival and mstate packages for R designed for this


msm R package for multi-state modelling

http://CRAN.R-project.org/package=msm

Jackson (J Stat. Soft. 2011), Jackson et al. (Statistician 2003)

Used in health, finance, ecology, social science, engineering...

General and flexible. Fit continuous-time Markov models
• with any state structure / transition matrix
• covariates (proportional intensities) for any / all transitions
    • subject-specific time-constant or
    • piecewise-constant time-dependent, including time itself
• to various patterns of observation, particularly intermittently-observed...

    msm(state ~ time, subject=subj, data=mydata,
        covariates = list("1-2" = ~ age,
                          "2-3" = ~ age + treatment),
        qmatrix=rbind(c(0,1,1),
                      c(0,0,1),
                      c(0,0,0)), gen.inits=TRUE)


Other observation schemes

[Figure: state (State 1, State 2, State 3, 4 = Death) against time; axis ticks at 0.0, 1.5, 3.5, 5.0, 9.0]

State entry time observed, but state at previous time unknown (typical for times of death)

    # state 4 like this
    msm(..., death=4, ...)

[Figure: same axes, with "?" marking observations where the exact state is not known]

States only observed to be in a certain set of possible states (e.g. alive, but unknown disease severity)

    # (put 999 in data at these times)
    msm(..., censor=999,
        censor.states=c(1,2,3))

Appropriate likelihoods computed / maximised by msm. (No truncated samples, or informative observation times.)


Hidden Markov models

[Figure: state (State 1 to State 4) against time; axis ticks at 0.0, 1.5, 3.5, 5.0, 10.0]

States observed with error, from a true hidden Markov chain.

    msm(..., ematrix = rbind(c(0,1,0),
                             c(1,0,0),
                             c(0,0,0)), ...)

[Figure: marker level (40 to 100) against months (0 to 100); observed and underlying trajectories, with regions labelled DISEASE-FREE, DISEASE STAGE 1 and DISEASE STAGE 2]

Continuous outcome, conditional on a hidden Markov chain (many outcome distributions)

    msm(y ~ time,
        hmodel = list(hmmNorm(mean=100, sd=16),
                      hmmNorm(mean=54, sd=18),
                      hmmIdent(999)), ...)

msm jointly estimates (with covariates on either)
• Markov transition intensities Q
• misclassification probabilities / outcome parameters


Other features

• Constrain parameters to equal other parameters or to known values: parsimony / model checking

    msm(...,
        constraint = list(age = c(1,1,1,2,2),
                          treatment = c(1,2,3,4,4)),
        fixedpars = c(4,9),
        ...)

• Mixture of observation types (intermittent, exact, misclassified...) in the same data

    msm(..., obstype = obs, obstrue = obst, ... )

• Model assessment: plots and formal tests of fit [1].

    plot.prevalence.msm(x), pearson.msm(x), ...

[1] Titman & Sharples, Stat Meth Med Res (2010) 19(6):621-651

Programming principles (1)

(I don’t always follow these! Takes time)

• Minimum input from user: sensible defaults, that advanced users can tweak.
• Complete documentation, user guides and examples.
• Maintain it / fix bugs. Reply helpfully to emails!
    • Update documentation if people find something unclear / commonly misuse in same way
• Outputs that are meaningful / helpful.
    • e.g. confidence intervals instead of SEs / p-values, hazard ratios as well as log HRs.
• Useful messages. Tell the user how to fix any errors:

    Data inconsistent with transition matrix

    Data inconsistent with transition matrix: subject 200 moves from
    state 4 to state 2 at non-missing observation 1436


Programming principles (2)

(I don’t always follow these! Takes time)

• Modular / maintainable code.
• Avoid reinventing the wheel: build on others’ work
    • e.g. msm now depends on expm (Goulet et al.), a nice library for matrix exponentiation, instead of relying on its own methods
• Automatic “unit testing”: for each package function, when updating package

    stopifnot(result_from_new_version() == known_result)

• Efficiency: vectorised R code, heavy numerical work done in C
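The unit-testing idea in the list above might look as follows in base R. This is a sketch only: `mean_sojourn` is a hypothetical stand-in for a package function (it is not part of msm), and the known result was worked out by hand. Using `all.equal()` rather than `==` is a design choice here, since exact comparison of floating-point results can fail after harmless changes to the code.

```r
# Hypothetical package function under test: mean sojourn times implied by an
# intensity matrix Q (illustrative only, not an msm function).
mean_sojourn <- function(Q) {
  stopifnot(is.matrix(Q), nrow(Q) == ncol(Q))
  q <- diag(Q)
  ifelse(q < 0, -1 / q, Inf)
}

# Regression test: compare the current output with a result checked by hand.
# all.equal() tolerates floating-point rounding, which == does not.
test_mean_sojourn <- function() {
  Q <- rbind(c(-0.5,  0.5),
             c( 0.1, -0.1))
  known_result <- c(2, 10)
  stopifnot(isTRUE(all.equal(mean_sojourn(Q), known_result)))
  invisible(TRUE)
}

# Run whenever the package is updated; an error signals a changed result.
test_mean_sojourn()
```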


Future developments in msm

• Many internal cleanups, and unglamorous new features, in past year.
• In progress: modified AIC to compare models with differently-aggregated state structures (Howard Thom)
• Future: test assumptions (Markov, time / between-patient homogeneity) through fitting more complex models...
• ... but difficult with intermittent observations: unobserved event times / paths through states.
    • Integrating likelihoods over possible paths / event times only feasible for simple transition structures (e.g. well/illness/death)


Semi-Markov models for panel data

[Figure: state (State 1 to State 4) against time, observed only at times t0, t1, t2, t3, t4]

• Transition intensity depends on time spent in current state.
• Difficult for panel data: don’t know time of entry into state.

Phase-type models (Titman and Sharples, Biometrics 2010; Titman, Statistics and Computing 2011)

[Figure: current state divided into Substate 1, Substate 2, ..., Substate k, leading to Substate k+1 (the next state)]

Exponential sojourn in current state replaced by embedded Markov model with k phases: sojourn time is time to state k + 1.

May be implemented as hidden Markov models in msm
• many more parameters, how to interpret?
• identifiability constraints not implemented yet
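To make the phase-type idea concrete, here is a base R sketch of the sojourn-time moments for a state split into k = 2 phases. It assumes a Coxian structure (entry in phase 1, exit possible from either phase), which is one common choice and not necessarily the parameterisation used in the cited papers; the rate values are invented.

```r
# Two-phase Coxian sojourn distribution (illustrative rates, not estimates).
lambda <- 0.5   # phase 1 -> phase 2
mu1    <- 0.1   # exit to the next state from phase 1
mu2    <- 0.4   # exit to the next state from phase 2

# Sub-intensity matrix over the two phases (the exit column is left out).
T_sub <- rbind(c(-(lambda + mu1), lambda),
               c(0,               -mu2))
alpha <- c(1, 0)   # the sojourn always starts in phase 1
ones  <- c(1, 1)

# Mean sojourn time: -alpha %*% T^{-1} %*% 1.
mean_sojourn <- -drop(alpha %*% solve(T_sub, ones))

# Second moment 2 * alpha %*% T^{-2} %*% 1, which gives the variance.
second_moment <- 2 * drop(alpha %*% solve(T_sub, solve(T_sub, ones)))
var_sojourn <- second_moment - mean_sojourn^2

# Closed form for the mean in this two-phase case, as a check.
mean_closed <- 1 / (lambda + mu1) + lambda / ((lambda + mu1) * mu2)

# Squared coefficient of variation: equal to 1 for an exponential sojourn,
# so a value away from 1 shows the extra flexibility of the phase-type model.
cv2 <- var_sojourn / mean_sojourn^2
```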


Random effects models

• Unexplained heterogeneity in transition intensities between individuals / groups.
• Likelihood only tractable for specific cases
    • e.g. discrete random effects distribution (Cook et al, Biometrics 2004)
    • same random effect for all intensities (Satten, Biometrics 1999).
    • conjugate gamma frailties, exact event times (Putter & Houwelingen, SMMR 2011), mixture models / mover-stayer (O’Keeffe et al. Stat. Med. 2012)...
• MCMC approaches (JAGS / BUGS / Stan software) or Monte Carlo EM (Sutradhar and Cook, JRSS C, 2008)?
    • experimental facility available in msm to generate code to fit same model in JAGS


Time-inhomogeneous models for panel data

[Figure: state (State 1 to State 4) against time, observed only at times t0, t1, t2, t3, t4]

• Likelihood needs transition probability matrix P with r, s entry $\Pr(S(t_1) = s \mid S(t_0) = r)$.
• Kolmogorov forward equations $dP(t_0, t_1)/dt_1 = P(t_0, t_1)\,Q(t_1)$
• Q not constant or piecewise constant with time → no analytic solution.

Or numerically solve the differential equation (Titman, Biometrics 2011).

• Allows e.g. Weibull or spline functions for Q(t): smoother / more realistic than piecewise constant
• Need to solve for each distinct covariate value: hard for continuous covariates / big datasets.
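A numerical solution of the forward equations can be sketched in base R as below. This uses a plain fixed-step fourth-order Runge-Kutta scheme, which is an assumption for illustration and not necessarily the solver used in the cited paper; `solve_forward` and `Q_weibull` are hypothetical names and the parameter values are invented.

```r
# Solve dP/dt = P %*% Q(t) with P(t0) = I by classical fourth-order
# Runge-Kutta. Q_fun is a function of time returning the matrix Q(t).
solve_forward <- function(Q_fun, t0, t1, n_steps = 200) {
  h <- (t1 - t0) / n_steps
  P <- diag(nrow(Q_fun(t0)))
  t <- t0
  for (i in seq_len(n_steps)) {
    k1 <- P %*% Q_fun(t)
    k2 <- (P + h / 2 * k1) %*% Q_fun(t + h / 2)
    k3 <- (P + h / 2 * k2) %*% Q_fun(t + h / 2)
    k4 <- (P + h * k3) %*% Q_fun(t + h)
    P <- P + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t <- t + h
  }
  P
}

# Hypothetical two-state model (alive -> dead) with a Weibull-type intensity
# q_12(t) = shape * scale * t^(shape - 1); values are illustrative.
shape <- 2
scale <- 0.1
Q_weibull <- function(t) {
  q12 <- shape * scale * t^(shape - 1)
  rbind(c(-q12, q12),
        c(0,    0))
}

P <- solve_forward(Q_weibull, t0 = 0, t1 = 3)

# In this simple model the answer is known in closed form, which gives a
# check on the numerical solution: P_11(0, t) = exp(-scale * t^shape).
P11_exact <- exp(-scale * 3^shape)
```

For a model with covariates, `Q_fun` would differ for each distinct covariate value, which is the computational burden noted in the last bullet above.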


Future developments in msm

Ideally any new methods should work with any multi-state structure.

• or at least for common structures: e.g. progressive disease, or progression and death
• unmaintainable if handle too many special cases in different ways.


All transition times known: survival/mstate system

Model fitting: survival R package (Therneau)

• Estimate event-specific hazards == transition rates
    • under Cox or fully parametric models for times to each event.

Prediction: mstate R package (de Wreede et al. J. Stat. Soft. 2011)

• Estimate cumulative incidences from a Cox model (Breslow)
• Convert these to transition probabilities over a time period
    • Aalen-Johansen estimator for inhomogeneous Markov models
    • Individual patient simulation for semi-Markov models
• No documentation for using parametric models
    • needed e.g. for extrapolation in health economic evaluations
• Data need some awkward manipulation

Tutorials / courses to clear up msm vs. mstate confusion / make all their methods more accessible?


Part II

Encouraging more software development in biostatistics research


Why encourage software development?

Statistical methods need accessible software.

• allows the method to be used by more people, especially non-experts
• saves time for everyone, even experts
• increases transparency / trust in research results
• good for promoting a new method
    • “the other way our ideas get out there is through software ... software implementation is a kind of publication, indeed, one of the best kinds.” (http://andrewgelman.com/2014/03/12/publishing-journals)
    • see, e.g. popularity of DIC in BUGS
    • impact, citations...

→ good for science (and scientists).


Types of statistical software

Ad-hoc code, often to accompany a journal paper

• Typically only usable by experts, maybe only after tweaking
• OK for reproducibility / transparency
• Bare minimum

R (or Stata) packages.

• Documented and maintained, easy to install and use...
• Builds on great work done in the last 10-15 years to make R and CRAN the main platform for statistics research.
• Ideal that we can aim for in new methodology.
    • see e.g. Jeff Leek (Johns Hopkins) group policy: all PhD students to develop and maintain an R package: http://simplystatistics.org/2013/10/07/the-leek-group-policy-for-developing-sustainable-r-packages/

Standalone software / large libraries (e.g. BUGS, JAGS, Stan).

• Often to accompany a major methodological advance (e.g. MCMC).
• Needs advanced programming / software engineering skills to develop.
• Still needs users for feedback / bug reports / testing / support


How can we develop more accessible software?

Culture shift: software viewed as a valuable research output

• Funding bodies / grant reviewers, research assessors, PhD examiners, journal editors, supervisors and line managers...

Time and money. . .

• Priorities: consider a methodology project not finished without usable software, not an “optional extra”
• A lot of tedious work involved in writing software, but same is true for writing papers!

“...[our academic culture] has traditionally put a very high premium on being clever and a relatively low premium on being willing to go through the schlep. [As applied statistics grows] the schlep becomes just as important as the clever idea. If you aren’t willing to put in the time to code your methods up and make them accessible to other investigators, then who will be?” Jeff Leek (http://simplystatistics.org/2012/05/28/schlep-blindness-in-statistics)

“creating an R package is building something. It is something you can point to and say, “I made that”. Leaving aside all the tangible benefits to your career, the profession, etc. it is maybe the most gratifying feeling you get when working on research.” (Jeff Leek, https://github.com/jtleek/rpackages)

People and skills. . .

• More collaborative programming, just like we do collaborative writing (tools could help e.g. GitHub)
• Informal / internal peer review by local software experts
• Training of students / researchers in software development
    • Software user / discussion groups (e.g. for R techniques)
    • Courses and online resources. A lot to be learnt from “open source” community!
• Collaborations with computing specialists, especially for major software projects: http://www.timeshighereducation.co.uk/news/save-your-work-give-software-engineers-a-career-track/2006431.article


More than just software

“Knowledge transfer”: Link software with documentation and training → make methods more accessible, avoid them being misused

• Tutorial papers
• “Vignettes”: tutorials with worked examples, packaged with the software
• Journal of Statistical Software, R Journal, Stata Journal: gives author a publication!
• Short courses


Discussion

Disclaimer: My perspective is limited to MRC-BSU / biostatistics, and biased by my own enthusiasm!

• What’s your experience?
• Are we doing OK as we are?
• What else can we do?
