Lexis diagrams and analysis of register data

Practical use of Lexis diagrams in the analysis and routine reporting from population registers

Bendix Carstensen
Steno Diabetes Center & Department of Biostatistics, University of Copenhagen
www.biostat.ku.dk/~bxc

Statistics for Health Registers and Linked Databases
Milton Keynes, UK, May 2009


Outline

Health registers

Lexis diagrams

Tabulation of follow-up

Models and likelihood

Diabetes incidence

Diabetes mortality

Cox modelling

Advantage of parametric hazards

Summary

Health events

• Diagnoses (hospitals, clinics)
• Procedures (treatments, measurements)
• Purchases (prescriptions)

Linkage by person ID ⇒ (partial) health history of persons.
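A minimal sketch of linkage by person ID in base R, assuming two hypothetical register extracts (hospital diagnoses and prescription purchases) that share a person ID column `id`; all IDs, dates and event labels are invented for illustration. Stacking the events and ordering them by person and date gives each person's (partial) health history.

```r
# Hypothetical extracts from two registers; IDs, dates and events are invented.
diagnoses <- data.frame(
  id    = c(101, 102, 101),
  date  = as.Date(c("2001-03-04", "2003-07-15", "2005-01-20")),
  event = c("Diagnosis: DM", "Diagnosis: DM", "Procedure: eye examination")
)
purchases <- data.frame(
  id    = c(101, 103),
  date  = as.Date(c("2001-04-01", "2004-09-30")),
  event = c("Purchase: insulin", "Purchase: metformin")
)

# Stack the events from both registers, recording where each came from.
events <- rbind(
  data.frame(diagnoses, source = "hospital"),
  data.frame(purchases, source = "prescription")
)

# Order by person ID, then by date: each person's (partial) health history.
history <- events[order(events$id, events$date), ]
rownames(history) <- NULL
histories <- split(history, history$id)   # one data frame per person
print(histories[["101"]])
```

With universal person IDs the same pattern extends to any number of registers; the ID is the only key needed.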

Population registers

• All persons in the population are included.
• Universal linkage of persons.
• The Nordic countries (DK, FI, IS, NO, SE):
  ◦ Person ID for all citizens
  ◦ Used for health care purposes
  ◦ Used for all taxation and social records as well

Not only health histories, but also health histories linked to social, economic and educational status are possible.

• Censuses replaced by register tabulations.

Mortality rates from registers

• Entry time (e.g. date of diagnosis).
• Exit time (e.g. date of death).

Implicit here is the current state: alive with a diagnosis.

Incidence rates from registers

• Entry time (e.g. date of birth).
• Exit time (e.g. date of diagnosis).

Implicit here is the current state: alive without a diagnosis.

Usually not compiled directly, but based on:

• Cases from a register
• Follow-up derived from censuses

Note: the follow-up is derived from a register of all persons at risk; in this case the entire population.

General health history

• Current state
• Entry point in state
• Exit point from state
• Next state

Multistate models:

Well → DM: rate $\lambda$
Well → Dead: rate $\mu_W$
DM → Dead: rate $\mu_D$

Wilhelm Lexis

Wilhelm Lexis (1837–1914), German statistician and economist.

Lexis diagram

• Shows the follow-up of a person from entry to exit as a function of date and age.
• In general: follow-up shown on two timescales.

Lexis diagrams for a 1‰ random sample of the Danish National Diabetes Register.

[Figure: Lexis diagram of the sample, Date (1994–2006) by Age (0–100).]
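A Lexis diagram draws each person's follow-up as a line from (entry date, entry age) to (exit date, exit age); because age and calendar time advance together, every line has slope 1. The base-R sketch below computes those end points for a few hypothetical persons and draws them with `segments()`. The data and the helper `lexis_lines` are invented for illustration; in the deck itself the diagrams are made by `plot.Lexis` from the Epi package.

```r
# Hypothetical follow-up records; all times in decimal calendar years.
fu <- data.frame(
  dob    = c(1940.3, 1952.8, 1961.1),   # date of birth
  entry  = c(1996.0, 1998.5, 2001.2),   # date of entry (e.g. diagnosis)
  exit   = c(2006.0, 2003.7, 2006.0),   # date of exit
  status = c(0, 1, 0)                   # 1 = event at exit, 0 = censored
)

# End points of each life line: age = date - date of birth at entry and at exit,
# so age and date increase together and every line has slope 1.
lexis_lines <- function(d) {
  data.frame(x0 = d$entry, y0 = d$entry - d$dob,
             x1 = d$exit,  y1 = d$exit  - d$dob,
             status = d$status)
}
ll <- lexis_lines(fu)

# Draw the diagram: one line per person, a filled dot where an event occurs.
plot(NA, xlim = c(1994, 2006), ylim = c(0, 100), xlab = "Date", ylab = "Age")
segments(ll$x0, ll$y0, ll$x1, ll$y1)
ev <- ll$status == 1
points(ll$x1[ev], ll$y1[ev], pch = 16)
```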

[Figure: detail of a Lexis diagram, Date (1990–2010) by Age (45–65).]

Tabulation by age, period and cohort

Two extra complications:

1. Population risk time must be split in triangles. This can be done from population figures, which are available for entire nations in the Human Mortality Database.

2. Midpoints of age intervals are no longer the average age in the classes; the same holds for period and cohort. Midpoints should be offset by 1/3 to give the average age at follow-up.

Both complications are treated in “Age-Period-Cohort models for the Lexis diagram”, Statistics in Medicine, 2007 [1, 2, 3].
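A short derivation of where the 1/3 comes from, assuming risk time is spread uniformly over a 1-year age class $[a, a+1)$ by 1-year period $[p, p+1)$. The diagonal splits this square into a lower triangle with corners (period, age) $(p, a), (p+1, a), (p+1, a+1)$, holding the cohort born in $(p-a, p-a+1)$, and an upper triangle with corners $(p, a), (p, a+1), (p+1, a+1)$, holding the cohort born in $(p-a-1, p-a)$. Under uniformity the mean of each coordinate is the centroid, i.e. the average of the three corners:

```latex
\begin{aligned}
\text{Lower triangle:}\quad
  \bar a &= a + \tfrac13, & \bar p &= p + \tfrac23, & \bar c &= \bar p - \bar a = (p - a) + \tfrac13,\\
\text{Upper triangle:}\quad
  \bar a &= a + \tfrac23, & \bar p &= p + \tfrac13, & \bar c &= \bar p - \bar a = (p - a) - \tfrac13.
\end{aligned}
```

So the average age lies 1/3 or 2/3 into the age class rather than at its midpoint $a + \tfrac12$, and the same holds for period and cohort.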

Construction of rates

• Rate = Events / Risk time = D/Y
• Mortality rate = red blobs / length of lines
• Incidence rate = green blobs / population risk time

For which subsets of the Lexis diagram should this be done?

This is essentially asking:

“What timescales are we interested in?”
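As a minimal illustration of Rate = D/Y, the sketch below computes events and risk time from individual (entry, exit, status) records for one subset, the whole diagram. Dates are decimal years and the records are hypothetical.

```r
# Hypothetical individual records: entry, exit (decimal years) and exit status.
fu <- data.frame(
  entry  = c(1996.0, 1998.5, 2001.2, 1999.0),
  exit   = c(2006.0, 2003.7, 2006.0, 2000.5),
  status = c(0, 1, 0, 1)     # 1 = event at exit (a "blob"), 0 = censored
)

D    <- sum(fu$status)             # number of events
Y    <- sum(fu$exit - fu$entry)    # total risk time ("length of lines"), in years
rate <- D / Y                      # events per person-year
cat(sprintf("D = %d, Y = %.1f years, rate = %.1f per 1000 pyrs\n",
            as.integer(D), Y, 1000 * rate))
```

Restricting the sums to the part of each line that falls in a given age or date band is exactly what the splitting of follow-up does.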

Data manipulations

• From: individual records (entry, exit, status)
• To: tables of (events, risk time) = (D, Y) by timescales (age, date, duration, ...)

Each individual contributes to many cells of the table, so it is not a tabulation of the individuals; it is a tabulation of the follow-up:

• Split follow-up into small pieces, each with (d, y) recorded.
• Tabulate (d, y) by timescales (and other covariates of interest, such as sex and date of birth).
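A minimal base-R sketch of the splitting step for one person on the age scale, assuming 1-year age bands: each piece records its risk time y, and the event (d = 1) goes only to the piece that ends at exit. The helper `split_age` is hypothetical and written for illustration; `splitLexis` in the Epi package does this for general time scales.

```r
# Split one person's follow-up at multiples of `width` on the age scale.
# age_in / age_out: age at entry / exit (years); status: 1 if event at exit.
split_age <- function(age_in, age_out, status, width = 1) {
  stopifnot(age_out > age_in)
  first_cut <- ceiling(age_in / width) * width
  cuts <- if (first_cut <= age_out) seq(first_cut, age_out, by = width) else numeric(0)
  bounds <- sort(unique(c(age_in, cuts, age_out)))
  start <- head(bounds, -1)
  stop  <- tail(bounds, -1)
  data.frame(
    age_band = floor(start / width) * width,      # left end of the age class
    y        = stop - start,                      # risk time in the piece
    d        = c(rep(0, length(stop) - 1), status) # event only in the last piece
  )
}

pieces <- split_age(age_in = 57.4, age_out = 60.2, status = 1)
print(pieces)
```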

Keeping track

Functions for splitting time, from one record per person to one record per period of follow-up, while keeping track of time scale, risk time and events:

SAS: Macro %Lexis, available from www.biostat.ku.dk/~bxc/Lexis.

Stata: Functions stset and stsplit.

R: Functions Lexis, splitLexis and cutLexis, available in the Epi package.

Keeping track in practice

The Epi package has a Lexis machinery (designed by Martyn Plummer, IARC, Lyon).

• Keeps track of multiple states and multiple time scales
• Provides tools for summarizing and tabulating follow-up
• The Lexis diagrams shown here are made by plot.Lexis
• An overview of the Lexis machinery is included in the package as a .pdf document.

Tabulation

Split records (i.e. (d, y)) are then tabulated by:

1. Fixed covariates (sex, genotype, date of birth, ...)
2. Timescales (age, calendar time, duration, ...)

Rates can now be computed by any of the variables in the tabulation.

Analysis proceeds as if the observations were independent Poisson observations.
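Once follow-up is split, tabulation is an ordinary aggregation of (d, y). A base-R sketch with `aggregate()`, assuming hypothetical split records with columns `sex`, `age_band`, `d` and `y`:

```r
# Hypothetical split records: one row per piece of follow-up.
pieces <- data.frame(
  sex      = c("M", "M", "M", "F", "F", "F", "M"),
  age_band = c(57, 58, 58, 57, 57, 58, 57),
  d        = c(0, 0, 1, 0, 1, 0, 0),
  y        = c(0.6, 1.0, 0.4, 1.0, 0.3, 0.8, 1.0)
)

# Sum events (d) and risk time (y) within each sex-by-age cell.
tab <- aggregate(cbind(d, y) ~ sex + age_band, data = pieces, FUN = sum)
tab$rate <- tab$d / tab$y          # empirical rate in each cell
print(tab)
```

Each person may contribute to several rows of `tab`; the table summarises follow-up, not individuals.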

Analysis of rates

Rate = Events/Risk time = D/Y

This is based on the log-likelihood for observing D events during Y risk time with a constant rate λ:

$\ell(\lambda) = D\log(\lambda) - \lambda Y$

Apart from the term $D\log(Y)$, this is the log-likelihood for a Poisson observation D with mean $\mu = \lambda Y$, i.e. $\log(\mu) = \log(\lambda) + \log(Y)$.

The empirical rate is the ML estimator in the constant-rate model: setting $\ell'(\lambda) = D/\lambda - Y = 0$ gives $\hat\lambda = D/Y$.
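The claim that D/Y is the maximum-likelihood estimate can be checked numerically. A sketch with hypothetical split records `d`, `y`: maximising $\ell(\lambda)$ directly with `optimize()`, and fitting a Poisson GLM with offset log(y), both reproduce D/Y.

```r
# Hypothetical split records: events d and risk time y (years) per piece.
d <- c(0, 1, 0, 2)
y <- c(0.5, 1.0, 2.0, 1.5)
D <- sum(d); Y <- sum(y)

# Log-likelihood for a constant rate lambda (omitting the term D*log(Y)).
loglik <- function(lambda) D * log(lambda) - lambda * Y
lambda_opt <- optimize(loglik, interval = c(1e-6, 10), maximum = TRUE)$maximum

# The same model as a Poisson GLM: log(mu) = log(lambda) + log(y).
fit <- glm(d ~ 1, family = poisson, offset = log(y))
lambda_glm <- exp(coef(fit)[["(Intercept)"]])

c(empirical = D / Y, optimize = lambda_opt, glm = lambda_glm)
```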

Likelihood for one person

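The likelihood from several intervals from one individual is a product of conditional probabilities:

$$
\begin{aligned}
P\{\text{event at } t_4 \mid \text{alive at } t_0\} = {} & P\{\text{event at } t_4 \mid \text{alive at } t_4\} \\
& \times P\{\text{survive } (t_3, t_4) \mid \text{alive at } t_3\} \\
& \times P\{\text{survive } (t_2, t_3) \mid \text{alive at } t_2\} \\
& \times P\{\text{survive } (t_1, t_2) \mid \text{alive at } t_1\} \\
& \times P\{\text{survive } (t_0, t_1) \mid \text{alive at } t_0\}
\end{aligned}
$$

This can computationally be treated as the likelihood of 4 independent Poisson observations, (1, 0, 0, 0), with possibly different means.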
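A numerical check of this argument, assuming hypothetical piecewise-constant rates `lam` on four intervals of lengths `y`. The intervals are indexed in time order here, so the observations read (0, 0, 0, 1) rather than (1, 0, 0, 0). The exact log-likelihood (hazard at $t_4$ times the four survival probabilities) and the sum of the four Poisson terms differ only by $\sum d \log y$, which does not involve the rates.

```r
# Four intervals (t0,t1), ..., (t3,t4) in time order; hypothetical values.
y   <- c(1.0, 2.0, 0.5, 1.5)        # interval lengths (risk time)
lam <- c(0.02, 0.03, 0.05, 0.08)    # constant rate within each interval
d   <- c(0, 0, 0, 1)                # event at t4, i.e. in the last interval

# Log of: hazard at t4 times survival through each interval.
ll_surv <- log(lam[4]) - sum(lam * y)

# Sum of Poisson log-likelihood terms with means lam * y.
ll_pois <- sum(dpois(d, lambda = lam * y, log = TRUE))

# They differ only by sum(d * log(y)), which is free of the rates.
all.equal(ll_pois, ll_surv + sum(d * log(y)))
```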

Likelihood for varying rates

• If we assume rates are constant in each interval, the log-likelihood from one individual is a sum of Poisson terms.
• Each term refers to one interval of follow-up; the terms are not independent, but the likelihood is a product.
• The purpose of splitting the follow-up is to allow the rates to vary within the follow-up of each person.
• Intervals should be so small that rates can be assumed constant within each. (5-year age intervals usually are not.)

Analysis of split records

• Splitting the records allows rates to vary across follow-up.
• The split records are analysed as independent Poisson observations.
• Tabulation makes the analysis technically more manageable, but it is formally superfluous.
• NOTE: A separate parameter for each tabulation interval is not necessary.
• Use the interval midpoints as a continuous covariate and model the effect by splines, fractional polynomials, etc.
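A sketch of the last point, assuming hypothetical tabulated data in 1-year age classes: the midpoint `A` enters a Poisson model through a natural spline (`splines::ns`), so a handful of parameters replaces one parameter per class. The simulated counts are illustrative only.

```r
library(splines)   # ns(): natural (restricted cubic) spline basis

# Hypothetical tabulated data: 1-year age classes with midpoints A,
# risk time Y (person-years) and event counts D.
A <- seq(40.5, 79.5, by = 1)
Y <- rep(1000, length(A))
D <- round(Y * 0.001 * exp(0.08 * (A - 40)))

# One smooth curve for log(rate) in the midpoints, not one parameter per class.
fit <- glm(D ~ ns(A, df = 4) + offset(log(Y)), family = poisson)

# Predicted rates per person-year: Y = 1 makes the offset log(1) = 0.
pred <- predict(fit, newdata = data.frame(A = A, Y = 1), type = "response")
head(round(1000 * pred, 2))   # rates per 1000 person-years
```

Here 5 parameters describe 40 age classes; the number of knots (`df`) is a modelling choice, not something fixed by the data layout.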

Register data analysis

• The nature of the data (individual records of event dates) allows an arbitrarily fine split of follow-up.
• The amount of data creates technical problems that can be solved by tabulation.
• Analysis should report smoothed versions of rates, possibly on multiple timescales.

Tabulation of incident DM cases

• Follow-up split by age and date (1-year classes).
• Cases (green dots) by age, date, sex and date of birth.
• Population figures, with the risk time among DM patients subtracted to give the risk time in the non-DM population.

[Figure: detail of a Lexis diagram, Date (1990–2010) by Age (45–65).]
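The subtraction described above is simple arithmetic once population risk time is available. A sketch for one hypothetical 1-year age-by-period cell, with all numbers invented; approximating population risk time by the average of the population counts at the start and end of the year is an assumption made here for illustration, not the exact triangle-based computation.

```r
# One hypothetical cell: age 60, calendar year 2004; all numbers invented.
N_start <- 41200                   # persons aged 60 at the start of the year
N_end   <- 40900                   # persons aged 60 at the end of the year
Y_pop   <- (N_start + N_end) / 2   # approximate person-years in the cell (assumption)

Y_dm  <- 2650                      # person-years lived by known DM patients in the cell
D_new <- 310                       # incident DM cases (green dots) in the cell

Y_nondm <- Y_pop - Y_dm            # risk time among persons without DM
rate    <- D_new / Y_nondm         # incidence rate per person-year
round(1000 * rate, 2)              # per 1000 person-years
```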

Model for DM incidence rates

Model (for each sex):

$\lambda(a, p) = f(a) \times g(p), \quad g(2004) = 1$

where a is the current age and p the current date (period).

f(a) and g(p) are modelled by natural splines (restricted cubic splines).

Reported in detail in [4]: The Danish National Diabetes Register: Trends in incidence, prevalence and mortality. Diabetologia, 2008.
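A sketch of how such a model can be fitted as a Poisson GLM, assuming hypothetical tabulated data for one sex (columns `A`, `P`, `Y`, `D` are invented). Multiplicative effects f(a) × g(p) become additive spline terms on the log scale; the constraint g(2004) = 1 is imposed afterwards by dividing predicted rates by the rate at 2004 for the same age. The spline degrees of freedom are illustrative choices.

```r
library(splines)

# Hypothetical tabulation for one sex: midpoints of 1-year age (A) and period (P) classes.
tab <- expand.grid(A = seq(30.5, 89.5, by = 1), P = seq(1995.5, 2006.5, by = 1))
tab$Y <- 5000
tab$D <- round(tab$Y * 5e-4 * exp(0.05 * (tab$A - 30) + 0.04 * (tab$P - 2004)))

# log(lambda) = log f(A) + log g(P): additive on the log scale, multiplicative rates.
fit <- glm(D ~ ns(A, df = 5) + ns(P, df = 3) + offset(log(Y)),
           family = poisson, data = tab)

# Rate ratio g(p) / g(2004), evaluated at a fixed age.
rr_at_age <- function(age, periods) {
  nd  <- data.frame(A = age, P = c(2004, periods), Y = 1)
  lam <- predict(fit, newdata = nd, type = "response")
  lam[-1] / lam[1]
}
rr <- rr_at_age(60, c(1996, 2000, 2004, 2006))
round(rr, 3)
```

Because the model is multiplicative, the same rate ratios result whichever age is used as reference.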

[Figure: left panel, incidence rate per 1000 pyrs (0.1–10, log scale) by age (0–100); right panel, rate ratio (0.1–10) by date of inclusion (1996–2008).]

Tabulation of DM deaths

• Follow-up split by age and date (1-year classes).
• Cases (red dots) and risk time (gray lines) by age, date, duration, sex and date of birth.

[Figure: detail of a Lexis diagram, Date (1990–2010) by Age (45–65).]

Models for DM mortality rates

Model using current age (a), date of diagnosis (p − d) and duration (d) (two timescales):

$\lambda(a, p, d) = f(a) \times g(p - d) \times h(d), \quad g(2004) = 1,\ h(0) = 1$

Model using age at diagnosis (a − d), date of diagnosis (p − d) and duration (d) (one timescale):

$\lambda(a, p, d) = f(a - d) \times g(p - d) \times h(d), \quad g(2004) = 1,\ h(0) = 1$

f, g and h are modelled by natural splines (restricted cubic splines).
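A sketch of fitting both models to split follow-up, assuming hypothetical records with midpoints of current age `A`, calendar time `P` and duration `dur`. Date and age at diagnosis are derived as `P - dur` and `A - dur`. The two models differ only in whether current age or age at diagnosis enters. Only the normalisation h(0) = 1 is illustrated; all data and degrees of freedom are invented.

```r
library(splines)

# Hypothetical split follow-up of DM patients (midpoints of classes).
dm <- expand.grid(A   = seq(45.5, 89.5, by = 2),
                  dur = seq(0.5, 11.5, by = 1),
                  P   = seq(1998.5, 2006.5, by = 2))
dm$pdx <- dm$P - dm$dur     # date of diagnosis (p - d)
dm$adx <- dm$A - dm$dur     # age at diagnosis (a - d)
dm$Y   <- 2000
dm$D   <- round(dm$Y * 0.002 * exp(0.07 * (dm$A - 60) - 0.03 * (dm$pdx - 2004)) *
                (1 + exp(-dm$dur)))

# Two timescales: current age a and duration d (plus date of diagnosis).
m2 <- glm(D ~ ns(A, df = 4) + ns(pdx, df = 3) + ns(dur, df = 3) + offset(log(Y)),
          family = poisson, data = dm)
# One timescale: only duration d changes during follow-up.
m1 <- glm(D ~ ns(adx, df = 4) + ns(pdx, df = 3) + ns(dur, df = 3) + offset(log(Y)),
          family = poisson, data = dm)

# Same number of parameters, so the deviances are directly comparable.
c(two_timescales = deviance(m2), one_timescale = deviance(m1))

# h(d) / h(0) in the two-timescale model, at a fixed age and date of diagnosis
# (d = 0 lies just below the data, so ns() extrapolates linearly there).
nd <- data.frame(A = 65, pdx = 2000, dur = c(0, 1, 5, 10), Y = 1)
h  <- predict(m2, newdata = nd, type = "response")
round(h / h[1], 3)
```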

DM mortality, two timescales

[Figure: left panel, mortality rate per 1000 pyrs (2–200, log scale) by age (30–90); remaining panels, rate ratio (0.2–20) by inclusion date and by time since inclusion (0–12 years).]

DM mortality, one timescale

[Figure: left panel, mortality rate per 1000 pyrs (2–200, log scale) by age at inclusion (30–90); remaining panels, rate ratio (0.2–20) by inclusion date and by time since inclusion (0–12 years).]

Why not a Cox model?

Need to choose an underlying timescale:

• Age (i.e. current age)
• Duration (i.e. time since diagnosis)

The other timescale is accommodated in a Cox model by splitting the follow-up on it and including it as a covariate.

So you can accommodate more than one timescale in a Cox model, but the hassle with time-splitting is the same.

The Poisson approach is easier because the rates are estimated directly, using smoothers.

Age at entry?

If duration is taken as the timescale and age at entry (e = a − d) as a covariate:

$$
\lambda(a, d, x) = \lambda_0(d)\exp\big(\alpha(a - d) + x\beta\big) = \lambda_0(d)\,e^{-\alpha d}\exp(\alpha a + x\beta)
$$

The effect of current age is thus taken to be linear on the log scale, i.e. an exponential effect of age.

In many cases this is not too far from reality, but unless you are prepared to split the data you cannot check the feasibility of the model.

The real advantage

Well → DM: rate $\lambda(a)$
Well → Dead (no DM): rate $\mu_W(a)$
DM → Dead (DM): rate $\mu_D(a, d)$

The relationships between the rates and the probabilities are:

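$$
P\{\text{Well at } a\} = \exp\left(-\int_0^a \big(\lambda(s) + \mu_W(s)\big)\,ds\right)
$$

$$
P\{\text{Dead (well) at } a\} = \int_0^a \mu_W(s)\exp\left(-\int_0^s \big(\lambda(u) + \mu_W(u)\big)\,du\right)ds
$$

$$
\begin{aligned}
P\{\text{DM at } a\} &= \int_0^a P\{\text{DM diagnosis at } s\} \times P\{\text{survive with DM from } s \text{ to } a\}\,ds \\
&= \int_0^a \lambda(s)\exp\left(-\int_0^s \big(\lambda(u) + \mu_W(u)\big)\,du\right) \times \exp\left(-\int_s^a \mu_D(u, u - s)\,du\right)ds
\end{aligned}
$$

$$
P\{\text{Dead (DM) at } a\} = 1 - P\{\text{Well at } a\} - P\{\text{Dead (well) at } a\} - P\{\text{DM at } a\}
$$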
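A worked special case that can serve as a check on these expressions (not from the slides): with constant rates $\lambda(s) = \lambda$ and $\mu_W(s) = \mu_W$, the integrals have closed forms, and the probability of ever having been diagnosed (alive or dead) does not depend on $\mu_D$:

```latex
\begin{aligned}
P\{\text{Well at } a\} &= e^{-(\lambda+\mu_W)a},\\
P\{\text{Dead (well) at } a\} &= \int_0^a \mu_W e^{-(\lambda+\mu_W)s}\,ds
  = \frac{\mu_W}{\lambda+\mu_W}\left(1 - e^{-(\lambda+\mu_W)a}\right),\\
P\{\text{DM at } a\} + P\{\text{Dead (DM) at } a\} &= \int_0^a \lambda e^{-(\lambda+\mu_W)s}\,ds
  = \frac{\lambda}{\lambda+\mu_W}\left(1 - e^{-(\lambda+\mu_W)a}\right).
\end{aligned}
```

The three right-hand sides add up to 1, as required.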

The real advantage

Poisson models give parametric expressions for the rates, so calculation of the integrals is simple; they are just sums:

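# Hypothetical inputs so the code runs (not register estimates): annual rates, ages 0-99
A   <- 100                          # number of 1-year age intervals
age <- seq_len(A) - 0.5             # interval midpoints
inc <- 1e-4 * exp(0.06 * age)       # DM incidence rate, lambda(a)
m.w <- 5e-5 * exp(0.09 * age)       # mortality without DM, mu_W(a)
m.d <- outer(2 * m.w, rep(1, A))    # mortality with DM, mu_D(a, d): rows = age, columns = duration
# Evaluate the cumulative rates at the *end* of the intervals
Inc <- cumsum( inc )
M.w <- cumsum( m.w )
# Probability of being in the "Well" state
P.w <- exp( -(Inc+M.w) )
# Probability of being dead without disease, i.e. in the "Dead well" state
P.mw <- cumsum( m.w * exp( -(Inc+M.w) ) )
# Probability of being alive with disease
P.wd <- x <- numeric( A )
for( a in 1:A ){
  for( d in 1:a ) # here d plays the role of age at diagnosis
    x[d] <- inc[d] * exp( -(Inc[d]+M.w[d]) ) *
            exp( -sum( m.d[cbind(d:a,1:(a-d+1))] ) )
  P.wd[a] <- sum( x[1:a] )
}
# Columns: Well, alive with DM, dead with DM, dead without DM
res <- cbind( P.w, P.wd, 1-P.w-P.mw-P.wd, P.mw )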

[Figure: state probabilities by age (0–100) on a 0–1 scale: P(Alive, well), P(Alive, DM), P(†, DM) and P(†, well).]

[Figure: P(DM before age a) (%), 0–30, by age (0–100).]
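One reading of P(DM before age a), assumed here, is the probability of being diagnosed before age a when death without DM competes with diagnosis. A sketch with hypothetical 1-year piecewise-constant rates (`inc`, `m.w`, invented values) that uses the exact within-interval probability of a diagnosis; the helper `p_dm_before` is written for illustration only.

```r
# Hypothetical annual rates for ages 0-99 (interval k covers ages k-1 to k).
A   <- 100
age <- seq_len(A) - 0.5
inc <- 1e-4 * exp(0.06 * age)    # DM incidence rate, per year
m.w <- 5e-5 * exp(0.09 * age)    # mortality without DM, per year

# P(diagnosed with DM before age a), death without DM as competing event,
# assuming both rates are constant within each 1-year interval.
p_dm_before <- function(inc, mort) {
  r     <- inc + mort                          # total rate of leaving "Well"
  S     <- exp(-c(0, head(cumsum(r), -1)))     # P(Well) at the start of each interval
  p_int <- S * inc / r * (1 - exp(-r))         # P(diagnosis within the interval)
  cumsum(p_int)
}

p     <- p_dm_before(inc, m.w)
naive <- 1 - exp(-cumsum(inc))   # ignores competing mortality; never smaller than p
round(100 * c(age50 = p[50], age80 = p[80], age100 = p[100]), 1)
```

With constant rates this reduces to the closed form $\lambda/(\lambda+\mu_W)\,(1 - e^{-(\lambda+\mu_W)a})$.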

Summary

• Registers provide follow-up for health events.
• For the initial event, population risk time is needed to compute rates.
• Tables of events and risk time should use narrow time intervals.
• Effects of timescales are modelled by using the interval midpoints as quantitative variables.
• Show data in Lexis diagrams.
• Show rates as interpretable curves.
• Usually many timescales are available; an informed choice is needed.

References

[1] B Carstensen. Age-Period-Cohort models for the Lexis diagram. Statistics in Medicine, 26(15):3018–3045, July 2007.

[2] J Rosenbauer and K Strassburger. Comments on: “Age-Period-Cohort models for the Lexis diagram”. Statistics in Medicine, 27:1557–1561, 2007.

[3] B Carstensen. Age-Period-Cohort models for the Lexis diagram (author’s reply). Statistics in Medicine, 27:1561–1564, 2007.

[4] B Carstensen, JK Kristensen, P Ottosen, and K Borch-Johnsen. The Danish National Diabetes Register: Trends in incidence, prevalence and mortality. Diabetologia, 51:2187–2196, 2008.

The presentation is available on my homepage: www.biostat.ku.dk/~bxc/
