
Semi-Markov models under panel observation

Andrew Titman

Lancaster University

March 8, 2012



Overview

Multi-state modelling

Computational issues with semi-Markov models

Phase-type sojourn distributions

Phase-type approximations to parametric distributions

Application to data on post-lung-transplantation patients

Further extensions



Multi-state models

Generalisation of standard survival analysis

Model transition intensities between multiple states

Applications

Medical: e.g. chronic diseases, HIV, breast cancer screening, cognitive decline.

Financial: e.g. credit scoring models

Social science/economics: e.g. employment status



Multi-state models

Inference methods dependent on the observation scheme

Continuous observation up to right-censoring:

Natural generalisations of estimators from standard survival analysis available

Non-parametric estimation of baseline intensities commonly used.

Panel observation

State of individual only observed at discrete (irregularly spaced, patient specific) time points

Parametric estimation most common: Markov, time homogeneous (Kalbfleisch & Lawless, 1985).



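Example: Bronchiolitis obliterans syndrome in transplantation patients

[Diagram: three-state model with states 1 (BOS free), 2 (BOS) and 3 (Death). Transition intensities: $q_{12}(t,\mathcal{F}_t)$ from BOS free to BOS, $q_{21}(t,\mathcal{F}_t)$ from BOS to BOS free, $q_{13}(t,\mathcal{F}_t)$ from BOS free to Death and $q_{23}(t,\mathcal{F}_t)$ from BOS to Death.]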



Transition intensities

Multi-state models are typically parameterised via the transition intensities

\[ q_{rs}(t,\mathcal{F}_t) = \lim_{\delta t \to 0} \frac{P(X(t+\delta t) = s \mid X(t) = r, \mathcal{F}_t)}{\delta t} \]

for process $X(t)$ with filtration (or history) $\mathcal{F}_t$.

Necessary to make some kind of assumptions:

Homogeneous Markov: $q_{rs}(t,\mathcal{F}_t) = q_{rs}$

Markov: $q_{rs}(t,\mathcal{F}_t) = q_{rs}(t)$

Semi-Markov: $q_{rs}(t,\mathcal{F}_t) = q_{rs}(t, t^*)$, where $t^* < t$ is the time of entry into the current state.

Vast majority of work for panel observed data focusses on Markov cases.



Why consider a semi-Markov model?

Might be more realistic for particular applications

e.g. spells of a disease unlikely to be very short → exponential distribution not appropriate

e.g. people at less risk of disease the longer they have been disease free.

As a model diagnostic

Way of directly testing the Markov assumption

Likely to also pick up some frailty type effects



Likelihood for Markov model

The likelihood for a single individual observed in states $x_0, \ldots, x_{n_i}$ at time points $0 = t_0, t_1, \ldots, t_{n_i}$ is

\[ \prod_{j=1}^{n_i} p_{x_{j-1} x_j}(t_{j-1}, t_j) \]

where $p_{rs}(t_1, t_2) = P(X(t_2) = s \mid X(t_1) = r)$.

$P(t_1, t)$ relates to $Q(t)$, the generator matrix of transition intensities, through the Kolmogorov forward equations (KFE)

\[ \frac{dP(t_1, t)}{dt} = P(t_1, t)Q(t), \qquad P(t_1, t_1) = I. \]

In the time homogeneous case $P(t_1, t) = \exp((t - t_1)Q_0)$, i.e. a matrix exponential.
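As a minimal sketch of the time homogeneous case, the Python below evaluates the product of transition probabilities with `scipy.linalg.expm`. The function name `markov_panel_loglik`, the zero-based state labels and the numbers in `Q` are illustrative assumptions, not values from the talk. Every observation, including the absorbing state, is treated here as a panel observation.

```python
import numpy as np
from scipy.linalg import expm

def markov_panel_loglik(Q, states, times):
    """Log-likelihood of one individual's panel data under a
    time homogeneous Markov model with generator matrix Q."""
    loglik = 0.0
    for j in range(1, len(states)):
        # Transition probability matrix over the gap between observations
        P = expm((times[j] - times[j - 1]) * Q)
        loglik += np.log(P[states[j - 1], states[j]])
    return loglik

# Hypothetical three-state generator (state 2 absorbing); rows sum to zero
Q = np.array([[-0.3, 0.2, 0.1],
              [0.1, -0.4, 0.3],
              [0.0, 0.0, 0.0]])
states = [0, 0, 1, 2]
times = [0.0, 1.0, 2.5, 4.0]
loglik = markov_panel_loglik(Q, states, times)
```

Because each factor depends only on the pair of adjacent observations, the cost is one matrix exponential per observation interval.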



Likelihood for Semi-Markov model

No longer possible to factorise likelihood in terms of transition probabilities between pairs of events

$p_{rs}(t_1, t_2)$ now depends on time of entry into state $r$.

In general

\[ P(X_1 = x_1, \ldots, X_n = x_n) = \sum_{H} \int_{S \mid H} L_h(s)\, ds \]

Sum over all possible paths, $H$, consistent with the observed history and for each history integrate over the possible sojourns in each state.

If no recovery possible then involves numerical quadrature (Foucher et al, 2010).



Current status data

Simplest possible interval censoring scenario is where the process is initiated in state 1 at time 0 and subjects are only observed once

Here likelihood can be expressed in terms of $p_r(t) = P(X(t) = r \mid X(0) = 1)$, which is the solution to a system of integral equations.

\[ p_r(t) = \sum_{j \neq r} \int_0^t p_j(u)\, q_{jr}(t-u) \exp\{-Q_j(t-u)\}\, du + \delta_{1r} \exp\{-Q_1(t)\} \]

where $Q_j(t) = \sum_{k=1}^{R} \int_0^t q_{jk}(u)\, du$.

But more generally require nested equations because current time spent in each state is not known.



Computation for semi-Markov likelihood

Kang & Lagakos (2007) considered the direct integral equation approach, but with restrictions:

At least one state of the process has an exponential sojourn time, to allow partial factorisation.

Other states have a minimum sojourn time (guarantee time), to limit the maximum number of jumps occurring between observations.

Some potential for simulation based approaches to the problem

e.g. Stopping-time resampling (Chen et al, 2005)



Phase type distribution

Distribution of time to absorption of a time homogeneous Markov process

Matrix analytic representation

\[ f(t) = \pi \exp(tS) S_0 \]

\[ S(t) = \pi \exp(tS) \mathbf{1} \]

where $\pi$ is the vector of initial state occupancy probabilities, $S$ is the subgenerator matrix and $S_0 = -S\mathbf{1}$.
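The matrix analytic formulas translate almost line for line into code. The sketch below is illustrative only: the function names `ph_density`, `ph_survivor` and `coxian_subgenerator` are assumptions introduced here. The Coxian layout assumed is the usual one, with each phase either moving to the next phase or being absorbed.

```python
import numpy as np
from scipy.linalg import expm

def ph_density(t, pi, S):
    """f(t) = pi exp(tS) S0, with S0 = -S 1."""
    pi, S = np.asarray(pi, dtype=float), np.asarray(S, dtype=float)
    S0 = -S @ np.ones(S.shape[0])
    return float(pi @ expm(t * S) @ S0)

def ph_survivor(t, pi, S):
    """S(t) = pi exp(tS) 1."""
    pi, S = np.asarray(pi, dtype=float), np.asarray(S, dtype=float)
    return float(pi @ expm(t * S) @ np.ones(S.shape[0]))

def coxian_subgenerator(mu, xi):
    """Subgenerator for a Coxian distribution (assumed layout).
    mu[i]: absorption intensity from phase i (length N).
    xi[i]: intensity of moving from phase i to phase i + 1 (length N - 1)."""
    N = len(mu)
    S = np.zeros((N, N))
    for i in range(N):
        S[i, i] = -mu[i]
        if i < N - 1:
            S[i, i] -= xi[i]
            S[i, i + 1] = xi[i]
    return S

# One phase with intensity 2 is an exponential distribution with rate 2
f_half = ph_density(0.5, [1.0], [[-2.0]])
# Two phases, hypothetical intensities
S2 = coxian_subgenerator([1.0, 1.0], [0.7])
surv = ph_survivor(1.3, [1.0, 0.0], S2)
```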



Coxian phase-type distribution

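[Diagram: transient phases 1, 2, 3, ..., N and absorbing state N + 1. Phase i moves to phase i + 1 with intensity $\xi_i$ (i = 1, ..., N − 1) and to the absorbing state with intensity $\mu_i$ (i = 1, ..., N).]

$\theta = (\mu_1, \mu_2, \ldots, \mu_N, \xi_1, \xi_2, \ldots, \xi_{N-1})$.

$\pi = (1, 0, \ldots, 0)$.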



General idea

Phase-type distributions offer a very flexible class of waiting-time distributions.

If the sojourn times of the semi-Markov model are restricted to have phase-type distributions, then the likelihood remains tractable

Can be represented as an aggregated Markov model.

Hidden Markov model likelihood methods apply.



Likelihood

If each state, $r$, of the process has an $N$ phase-type sojourn distribution can define sub-states $r_1, \ldots, r_N$.

Representing the phases of the phase-type distribution

The latent process, $X^*$, of sub-states is Markov.

Observed process then has a hidden Markov model representation, e.g.

\[ \begin{aligned} P(X_1, X_2, X_3) &= \sum_{i,j,k} P(X^*_1 = 1_i, X^*_2 = 2_j, X^*_3 = 3_k) \\ &= \sum_{i,j,k} P(X^*_1 = 1_i)\, P(X^*_2 = 2_j \mid X^*_1 = 1_i)\, P(X^*_3 = 3_k \mid X^*_2 = 2_j) \end{aligned} \]

Can recursively evaluate summation by using Forward algorithm.
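The recursion can be sketched as follows for a panel-observed aggregated Markov model without misclassification. This is an illustration under stated assumptions, not the implementation used in the talk: the function name `aggregated_loglik`, the `groups` mapping from sub-states to observed states, and the toy model (one state with a 2-phase Coxian sojourn followed by an absorbing state, with made-up intensities) are all hypothetical.

```python
import numpy as np
from scipy.linalg import expm

def aggregated_loglik(Q_latent, init, groups, obs_states, times):
    """Forward algorithm for a panel-observed aggregated Markov model.

    Q_latent   : generator matrix of the latent sub-state process X*
    init       : distribution over latent sub-states at times[0]
    groups     : groups[a] is the observed state containing sub-state a
    obs_states : observed state at each observation time
    times      : observation times
    """
    groups = np.asarray(groups)
    # alpha[a] is proportional to P(observations so far, X* = a now)
    alpha = np.asarray(init, dtype=float) * (groups == obs_states[0])
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for j in range(1, len(times)):
        P = expm((times[j] - times[j - 1]) * Q_latent)
        # Propagate, then keep only sub-states consistent with the observation
        alpha = (alpha @ P) * (groups == obs_states[j])
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()  # rescale to avoid numerical underflow
    return loglik

# Hypothetical example: state 0 has a 2-phase Coxian sojourn, state 1 is absorbing
mu1, mu2, xi = 0.5, 0.2, 1.0
Q_latent = np.array([[-(mu1 + xi), xi, mu1],
                     [0.0, -mu2, mu2],
                     [0.0, 0.0, 0.0]])
groups = [0, 0, 1]        # two phases of state 0, then the absorbing state
init = [1.0, 0.0, 0.0]    # the sojourn starts in the first phase
loglik = aggregated_loglik(Q_latent, init, groups, [0, 0, 1], [0.0, 1.0, 3.0])
```

In this toy case the likelihood is the probability that the sojourn ends between times 1 and 3, which gives a simple check against the Coxian survivor function.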



Advantages

Computation of likelihood relatively fast

Provided the number of latent states is not excessive.

Often, in addition to panel observation can have misclassification of the state.

$P(O_t = s \mid X_t = r) = e_{rs}$ and assumed that $O_1, \ldots, O_n$ independent conditional on $X_1, \ldots, X_n$.

Very natural extension to these models under the phase-type framework because already using a hidden Markov model likelihood.

Some scope to fit these models in existing software e.g. msm package in R.
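To illustrate why misclassification is a natural extension, the sketch below replaces the 0/1 indicator of "sub-state consistent with the observation" in the forward recursion by the misclassification probability $e_{rs}$ of the sub-state's parent state. The function name `misclassified_loglik`, the two-state model with recovery and all numerical values are assumptions made for illustration only.

```python
import numpy as np
from scipy.linalg import expm

def misclassified_loglik(Q_latent, init, groups, E, obs, times):
    """Forward algorithm with misclassification.
    E[r, s] = P(O = s | X = r); groups[a] is the true state of sub-state a."""
    groups = np.asarray(groups)
    E = np.asarray(E, dtype=float)
    # A sub-state emits with the probabilities of the state it belongs to
    alpha = np.asarray(init, dtype=float) * E[groups, obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for j in range(1, len(times)):
        P = expm((times[j] - times[j - 1]) * Q_latent)
        alpha = (alpha @ P) * E[groups, obs[j]]
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()  # rescale to avoid numerical underflow
    return loglik

# Hypothetical two-state model with recovery; each state has two phases
Q_latent = np.array([[-0.8, 0.5, 0.3, 0.0],
                     [0.0, -0.1, 0.1, 0.0],
                     [0.4, 0.0, -1.0, 0.6],
                     [0.2, 0.0, 0.0, -0.2]])
groups = [0, 0, 1, 1]
E = np.array([[0.90, 0.10],
              [0.05, 0.95]])
init = [1.0, 0.0, 0.0, 0.0]
obs = [0, 1, 1, 0]
times = [0.0, 1.0, 2.0, 3.5]
loglik = misclassified_loglik(Q_latent, init, groups, E, obs, times)
```

Setting `E` to the identity matrix recovers the likelihood without misclassification.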



Disadvantages

When using phase-type distributions with multiple phases, run into identifiability problems very quickly

Many parameters close to being redundant or become redundant

Difficulties even for right-censored data

Only feasible for very simple phase-type distributions in the panel data case.

If comparing with Markov model cannot perform standard likelihood ratio test

Non-standard conditions: some parameters of the phase-type model are unidentifiable under the null Markov model.

Not uncommon to get boundary estimates e.g. 0 hazard of death from one state.



2-Phase Coxian distribution

Simplest non-trivial phase-type distribution

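Defined by three parameters which roughly determine the initial intensity, terminal intensity and the rate of change between these levels.

[Diagram: phases 1 and 2 and absorbing state 3. Phase 1 moves to phase 2 with intensity $\xi$; phases 1 and 2 move to state 3 with intensities $\mu_1$ and $\mu_2$.]

$\xi$ not identifiable if $\mu_1 = \mu_2$.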
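The non-identifiability can be checked directly from the survivor function. The following is a worked sketch, assuming phase 1 is left at total intensity $\mu_1 + \xi$ (absorption at $\mu_1$, progression to phase 2 at $\xi$) and that $\mu_1 + \xi \neq \mu_2$. Adding the probabilities of still being in phase 1 and of being in phase 2 at time $t$ gives

\[ S(t) = e^{-(\mu_1+\xi)t} + \frac{\xi}{\mu_1 + \xi - \mu_2}\left( e^{-\mu_2 t} - e^{-(\mu_1+\xi)t} \right). \]

Setting $\mu_1 = \mu_2 = \mu$ makes the fraction equal to 1, so

\[ S(t) = e^{-(\mu+\xi)t} + e^{-\mu t} - e^{-(\mu+\xi)t} = e^{-\mu t}, \]

an exponential survivor function in which $\xi$ no longer appears. This is the Markov special case, which is why $\xi$ cannot be estimated there.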



2-Phase Semi-Markov model for the BOS data

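[Diagram: latent sub-states $1_1$, $1_2$ (BOS free), $2_1$, $2_2$ (BOS) and absorbing state 3 (Death). Intensities shown: $\xi_1$, $\xi_2$, $\mu^{(12)}_1$, $\mu^{(12)}_2$, $\mu^{(13)}_1$, $\mu^{(13)}_2$, $\mu^{(21)}_1$, $\mu^{(21)}_2$, $\mu^{(23)}_1$, $\mu^{(23)}_2$.]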



Alternative approach

In stochastic control theory, the use of phase-type approximations to parametric distributions is common.

e.g. in the analysis of queues.

However, typically analysing a process with known waiting distribution.

Principle could be applied to estimating semi-Markov models.

Join phase-type approximations for different states together.



Approximation of Weibull distribution

Weibull hazard function is monotonically increasing or decreasing

Good phase-type approximation can be obtained with relatively few phases.

Here consider 5-phase Coxian distribution with 9 parameters.

Seek S(θ) that minimizes the Kullback-Leibler distance

Don’t need to fit to tails of distribution, e.g. if follow-up in study is 10 years, don’t need to fit distribution beyond 10 years.

Just need accurate amount of mass after upper point
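The idea can be sketched numerically. The code below is an assumption-laden illustration rather than the procedure used in the talk: it fits a 2-phase (not 5-phase) Coxian to a unit-rate Weibull with shape 1.2, it takes the "censored" Kullback-Leibler distance to be the integral of $f \log(f/g)$ up to the cut-off plus one term for the mass beyond it, it uses a coarse trapezoidal rule, and it optimises with Nelder-Mead. All function names are hypothetical.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def coxian_subgenerator(mu, xi):
    """Phase i is absorbed at intensity mu[i] or moves on at intensity xi[i]."""
    N = len(mu)
    S = np.zeros((N, N))
    for i in range(N):
        S[i, i] = -mu[i]
        if i < N - 1:
            S[i, i] -= xi[i]
            S[i, i + 1] = xi[i]
    return S

def coxian_on_grid(mu, xi, t_max, n):
    """Coxian density and survivor function at n + 1 equally spaced times."""
    S = coxian_subgenerator(mu, xi)
    ones = np.ones(len(mu))
    step = expm((t_max / n) * S)
    row = np.zeros(len(mu))
    row[0] = 1.0                       # pi = (1, 0, ..., 0)
    dens, surv = np.empty(n + 1), np.empty(n + 1)
    for k in range(n + 1):
        dens[k] = row @ (-S @ ones)    # pi exp(tS) S0
        surv[k] = row @ ones           # pi exp(tS) 1
        row = row @ step               # advance by one grid step
    return dens, surv

def truncated_kl(alpha, mu, xi, cutoff=2.0, n=400):
    """KL distance from a unit-rate Weibull (shape alpha >= 1) to a Coxian,
    with everything beyond the cut-off collapsed into a single mass."""
    t = np.linspace(0.0, cutoff, n + 1)
    f = alpha * t ** (alpha - 1) * np.exp(-t ** alpha)
    g, surv_g = coxian_on_grid(mu, xi, cutoff, n)
    integrand = np.zeros(n + 1)
    pos = f > 0                        # f(0) = 0 when alpha > 1; contributes 0
    integrand[pos] = f[pos] * np.log(f[pos] / g[pos])
    integral = np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(t))
    tail_f, tail_g = np.exp(-cutoff ** alpha), surv_g[-1]
    return integral + tail_f * np.log(tail_f / tail_g)

def objective(log_theta, alpha):
    mu1, mu2, xi = np.exp(log_theta)   # log scale keeps intensities positive
    return truncated_kl(alpha, [mu1, mu2], [xi])

kl_start = objective(np.zeros(3), 1.2)   # start from an exponential with rate 1
fit = minimize(objective, np.zeros(3), args=(1.2,), method="Nelder-Mead")
```

`fit.fun` holds the distance achieved by the fitted 2-phase Coxian and `np.exp(fit.x)` the fitted intensities; a larger number of phases would be handled the same way with a longer parameter vector.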



B-spline family fit

In order to fit the semi-Markov model will want phase-type fits for a large range of Weibull distributions.

Impractical to do a custom fit for every point.

Too time consuming

Resulting likelihood not smooth

In general seek $\theta(\alpha)$ that minimizes

\[ \int_{\alpha_l}^{\alpha_u} KL(f_{\alpha,\lambda}, f_{S(\theta)})\, d\alpha \qquad (1) \]

Find B-spline approximations to the solution of (1)

\[ \theta_i(\alpha) = \sum_j w_{ij} B_{ij}(\alpha) \]

for $i = 1, \ldots, 9$.



Demonstration of fit: α = 1.2

[Figure: four slides of paired panels, log[f(t)] against t and S(t) against t. The first three slides cover t from 0 to 2, the last covers t from 0 to 5.]



Demonstration of fit: Kullback-Leibler distance

[Figure: "Comparison of approximations". Kullback-Leibler distance (KL) against α, for α from 0.5 to 2.0, with curves for the pointwise and B-spline approximations.]



Phase-type approximation to Weibull semi-Markov process

Optimisation needed to establish the approximation is quite large

But only has to be performed once:

Can fit for Weibull rate λ = 1 for a given cut-off point, e.g. t = 2.

Taking λS(θ) then gives optimal estimate for rate λ for cut-off t = 2/λ.

Resulting likelihood is differentiable so standard approaches to maximum likelihood estimation applicable e.g. BFGS, BHHH or other quasi-Newton methods.
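The rescaling step can be written out explicitly. This is a sketch assuming the Weibull is parameterised with survivor function $\exp\{-(\lambda t)^{\alpha}\}$. Suppose $T$ has a phase-type distribution with subgenerator $S(\theta)$ chosen so that

\[ P(T > t) = \pi \exp(t S(\theta)) \mathbf{1} \approx \exp\{-t^{\alpha}\}, \qquad 0 \le t \le 2. \]

Then for any rate $\lambda > 0$,

\[ P(T/\lambda > t) = P(T > \lambda t) = \pi \exp(t\, \lambda S(\theta)) \mathbf{1} \approx \exp\{-(\lambda t)^{\alpha}\}, \qquad 0 \le t \le 2/\lambda, \]

so $T/\lambda$ is phase-type with subgenerator $\lambda S(\theta)$. The Kullback-Leibler distance is unchanged when both distributions are rescaled in time by the same factor, so the fit for $\lambda = 1$ with cut-off 2 carries over to rate $\lambda$ with cut-off $2/\lambda$.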



Embedded system

Each (non-absorbing) state of the semi-Markov process is made up of 5 sub-states

If there are multiple destinations from a state:

Overall intensity out of state $r$ taken to be $\alpha_r \lambda_r^{\alpha_r} t^{\alpha_r - 1}$

Individual intensity from $r \to s$

\[ \alpha_r \lambda_{rs} \{\lambda_r t\}^{\alpha_r - 1} \]

where $\lambda_r = \sum_{s \neq r} \lambda_{rs}$

NB: Not the same as having competing Weibull intensities with separate shape parameters.
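A short check, writing $\lambda_r = \sum_{s \neq r} \lambda_{rs}$ for the total over destinations, shows that the individual intensities add up to the overall Weibull intensity:

\[ \sum_{s \neq r} \alpha_r \lambda_{rs} \{\lambda_r t\}^{\alpha_r - 1} = \alpha_r \lambda_r \{\lambda_r t\}^{\alpha_r - 1} = \alpha_r \lambda_r^{\alpha_r} t^{\alpha_r - 1}. \]

The ratio of an individual intensity to the overall intensity is $\lambda_{rs} / \lambda_r$, which does not depend on $t$, so under this construction the destination is independent of the time spent in the state. With competing Weibull intensities $\alpha_{rs} \lambda_{rs}^{\alpha_{rs}} t^{\alpha_{rs} - 1}$ and separate shapes $\alpha_{rs}$, the sum is not of Weibull form and the ratio varies with $t$.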



Quality of approximation to the likelihood

Difficult to assess because “exact” likelihood very difficult to compute for examples of interest.

In simulations estimates based on maximising the approximate likelihood are close to unbiased and have accurate standard errors.

For a simple two state ‘switching’ model where all subjects observed at equally spaced intervals and one sojourn distribution is exponential can use direct simulation to get likelihood curve.

Sufficient statistic is simple.



Quality of approximation: Simple 2 state example

[Figure: "Comparison of likelihood curves". Log-likelihood l(α) against α, for α from 0.6 to 1.2.]



Example: Post-lung transplantation patients

Bronchiolitis obliterans syndrome

Deterioration in lung function over time

364 double-lung or heart-lung transplantation patients.

6 month survivors

‘Normal’ lung function determined in first 6 months

BOS state defined by % of normal lung function based on FEV1 measurements.

Subject to misclassification between BOS free & BOS states.

2654 assessments on lung function, 193 deaths.



Results for BOS

              Markov     2-PH Semi-Markov    Weibull Semi-Markov
−2 × LL       3005.06    2976.5              2979.7
Parameters    9          13                  11

Clear evidence against homogeneous Markov model.

Fit of 2-phase Coxian and Weibull semi-Markov models quite comparable.
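The arithmetic behind the comparison can be laid out as below. The reductions in −2 × LL come straight from the table. The AIC column is an addition made here for illustration and is not reported in the talk; it is one way to compare fits when a standard likelihood ratio test is unavailable.

```python
# Values of -2 x log-likelihood and parameter counts from the results table
fits = {
    "Markov": (3005.06, 9),
    "2-PH Semi-Markov": (2976.5, 13),
    "Weibull Semi-Markov": (2979.7, 11),
}

base_dev, base_par = fits["Markov"]
summary = {}
for name, (dev, n_par) in fits.items():
    drop = base_dev - dev        # reduction in -2 x LL relative to the Markov model
    extra = n_par - base_par     # additional parameters relative to the Markov model
    aic = dev + 2 * n_par        # AIC = -2 x LL + 2 x number of parameters
    summary[name] = (drop, extra, aic)
    print(f"{name}: drop = {drop:.2f} on {extra} extra parameters, AIC = {aic:.2f}")
```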



Results for BOS

Semi-Markov models estimate decreasing hazard with time since entry into the state for both the BOS-free and BOS states.

Possible interpretations:

Patient heterogeneity: some patients have rapid declines.

Problem with model assumptions regarding state misclassification.

Partly accounts for time non-homogeneity with respect to time since transplant.



Comparison of overall survival estimates

[Figure: "Estimated survival for heart-lung transplant patients". S(t) against time since transplant (years), from 0 to 10, for the Markov, 2-PH and Weibull models.]



Comparison of conditional survival estimates

[Figure: "Estimated conditional survival given a 5 year sojourn in state 2". P(Alive) against time (years), from 0 to 10, for the Weibull and 2PH models, each with 95% CI.]



Further extensions

Covariates on intensities straightforward provided assume

\[ q_{rs}(t; z) = \alpha_r \lambda_{rs} \exp(\beta_{rs} z) \Big\{ \sum_j \lambda_{rj} \exp(\beta_{rj} z)\, t \Big\}^{\alpha_r - 1} \]

Not a proportional intensities model.

Alternative competing Weibull intensities possible in principle

But requires a much larger number of latent states (e.g. $5^N$ for $N$ competing events).

Pattern mixture representation also possible.



Further extensions

Non-homogeneous semi-Markov models are possible by applying existing methods for non-homogeneous HMMs

Piecewise constant intensities:

\[ q_{rs}(t) = \begin{cases} q_{rs1} & t < t_u \\ q_{rs2} & t \geq t_u \end{cases} \]

‘Time transformation’ models:

\[ Q(t) = Q_0 g(t), \qquad g(t) > 0. \]

Intensities of the observed process then depend both on time since entry in the state and time since initiation (or calendar time).
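In both cases the transition probability matrix of the latent Markov process stays in closed form, which is what keeps the hidden Markov likelihood tractable. The expressions below are a sketch, with $Q_1$ and $Q_2$ denoting the generator matrices before and after the change point $t_u$ (notation introduced here). For piecewise constant intensities and $t_1 < t_u \le t_2$,

\[ P(t_1, t_2) = \exp\{(t_u - t_1) Q_1\} \exp\{(t_2 - t_u) Q_2\}, \]

while an interval lying entirely on one side of $t_u$ needs a single matrix exponential. For the time transformation model,

\[ P(t_1, t_2) = \exp\Big\{ Q_0 \int_{t_1}^{t_2} g(u)\, du \Big\}, \]

which satisfies the Kolmogorov forward equations because $\exp\{Q_0 c\}$ commutes with $Q_0$ for any scalar $c$.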



Conclusions

Models with phase-type sojourn distributions can be used to obtain tractable likelihoods for semi-Markov models under panel observation due to equivalence with a class of hidden Markov models.

Can use either directly:

Simple 2-phase Coxian distribution

Or indirectly as approximations to other parametric survival distributions:

One-off optimisation to establish B-spline family of approximations to Weibull distributions

These approximations then embedded within overall system.



Conclusions

Enables a way of checking (homogeneous) Markov assumption.

But doesn’t imply semi-Markov model is the best model.

Non-homogeneous Markov

Frailty/random effects

State misclassification

May depend on the application which is most preferable.

2-Phase Coxian and Weibull models give very similar results

Very slight improvement in efficiency for Weibull estimates.



References

Chen, Y., Xie, J., Liu, JS. (2005). Stopping-time resampling for sequential Monte Carlo methods. JRSS B. 67: 199-217.

Foucher, Y., Giral, M., Soulillou, JP., Daures, JP. (2010). A flexible semi-Markov model for interval-censored data and goodness-of-fit testing. Statistical Methods in Medical Research. 19: 127-145.

Kalbfleisch, J.D., Lawless, J.F. (1985). The analysis of panel data under a Markov assumption. JASA. 80: 863-871.

Kang, M., Lagakos, S.W. (2007). Statistical methods for panel data from a semi-Markov process, with application to HPV. Biostatistics. 8: 252-264.

Titman, AC., Sharples, LD. (2010). Semi-Markov models with phase-type sojourn distributions. Biometrics. 66: 742-752.


