EPI 5344: Survival Analysis in Epidemiology

Time varying covariates

March 24, 2015

Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa

Objectives

• Introduce time varying covariates
• Methods of inclusion into Cox models
• SAS (computer) issues

Introduction (1)

• Does heart transplantation improve survival?
  – Epidemiological study with incidence density (ID) measures
  – Observational study (not an RCT)

Introduction (2)

• Assume that transplant has no effect on survival
  – Incidence density ratio (IDR) = 1.0
• 800 candidates for transplant
• 2 year follow-up
• No losses
• 50% of people get a transplant
  – Always occurs on the first anniversary of entering the study
• 25% of the group die in the first year
• 25% of first-year survivors die in the second year

Introduction (3)

Ignore transplant status

Time     N     Deaths   PT      ID/PY
Year 1   800   200      700     0.286
Year 2   600   150      525     0.286
Total          350      1,225   0.286

Introduction (4)

Stratify by transplant status: Transplant Done

Time     N     Deaths   PT      ID/PY
Year 1   400   0        400     0.0
Year 2   400   100      350     0.286
Total          100      750     0.133

Introduction (5)

Stratify by transplant status: NO Transplant Done

Time     N     Deaths   PT      ID/PY
Year 1   400   200      300     0.667
Year 2   200   50       175     0.286
Total          250      475     0.526

Introduction (6)

• What is the observed IDR under this method of analysis?
• Transplant ID = 0.133/yr
• No transplant ID = 0.526/yr
• IDR = 0.253
• Correct IDR = 1.0

STRONG BIAS

Doing an RCT does NOT fix this issue as long as transplant is not done at time ‘0’
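The biased figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming (as the PT columns in the tables imply) that deaths occur on average at mid-year, so each death contributes half a person-year. The function and variable names are illustrative only.

```python
# Naive analysis: classify people by whether they EVER received a transplant.
# All counts come from the hypothetical 800-person cohort in the tables.

def person_years(n_start, deaths):
    """Person-time for one year, assuming deaths occur on average mid-year."""
    return n_start - deaths / 2

def incidence_density(years):
    """Total deaths divided by total person-time over the listed years."""
    deaths = sum(d for _, d in years)
    pt = sum(person_years(n, d) for n, d in years)
    return deaths / pt

# (N at start of year, deaths during year) for years 1 and 2
ever_transplant = [(400, 0), (400, 100)]
never_transplant = [(400, 200), (200, 50)]

id_tx = incidence_density(ever_transplant)       # 100 / 750
id_no_tx = incidence_density(never_transplant)   # 250 / 475
naive_idr = id_tx / id_no_tx

print(round(id_tx, 3), round(id_no_tx, 3), round(naive_idr, 3))
# prints: 0.133 0.526 0.253
```

The transplant group is credited with 400 person-years in year 1 during which, by construction, nobody in it could die. That "immortal" person-time is what drives the IDR away from 1.0.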

Introduction (7)

• How do we fix this?
  – No-one is at risk of dying with a transplant until the transplant has taken place
• Solution using epi methods:
  – People who never have transplant
  – People who have a transplant
• Accumulate PT (and events) to the non-transplant group until after a transplant occurs
• Accumulate PT (and events) to the transplant group only after transplant occurs

Introduction (8)

CORRECT WAY: No Transplant Done

Time     N     Deaths   PT      ID/PY
Year 1   800   200      700     0.286
Year 2   200   50       175     0.286
Total          250      875     0.286

Introduction (9)

CORRECT WAY: Transplant Done

Time     N     Deaths   PT      ID/PY
Year 1   0     0        0       ND
Year 2   400   100      350     0.286
Total          100      350     0.286

Introduction (10)

• What is the observed IDR under this method of analysis?
• Transplant ID = 0.286/yr
• No transplant ID = 0.286/yr
• IDR = 1.0
• Correct IDR = 1.0

TIME VARYING COVARIATE: Transplant status
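The corrected allocation of person-time can be checked the same way. This is a sketch under the same mid-year-death assumption implied by the PT columns; names are illustrative.

```python
# Correct analysis: person-time and deaths count toward the transplant
# group only AFTER the transplant has taken place.

def person_years(n_start, deaths):
    """Person-time for one year, assuming deaths occur on average mid-year."""
    return n_start - deaths / 2

def incidence_density(years):
    """Total deaths divided by total person-time over the listed years."""
    deaths = sum(d for _, d in years)
    pt = sum(person_years(n, d) for n, d in years)
    return deaths / pt

# Year 1: nobody is transplanted yet, so all 800 sit in the no-transplant group.
# Year 2: 400 of the 600 survivors are transplanted; 200 are not.
no_transplant = [(800, 200), (200, 50)]
transplant = [(400, 100)]

id_no_tx = incidence_density(no_transplant)   # 250 / 875
id_tx = incidence_density(transplant)         # 100 / 350
idr = id_tx / id_no_tx

print(round(id_tx, 3), round(id_no_tx, 3), round(idr, 3))
# prints: 0.286 0.286 1.0
```

Treating transplant status as something that changes during follow-up, rather than as a fixed group label, recovers the true IDR of 1.0.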

Time Varying Covariates (1)

• Exposures can change during follow-up
  – People stop/start smoking
  – BP increases
  – Air pollution varies from year to year
• Hazard often depends more strongly on recent values than original exposure
  – Not always true
  – Can depend on
    • Cumulative exposure
    • Lagged exposure

• Produces non-proportional hazards
  – Change in exposure level causes hazard to change in one group
• Still proportional conditional on value of time varying exposure.


Before t*, HR = 1.0

After t*, HR* < 1.0

NOT PH over all time

If we ignore the time of exposure and just treat these as two groups with PH, we get a biased estimate of the hazard ratio

  – A type of average of 1.0 and HR* (so > HR*)

BUT:

• Before t*, hazards are proportional
• After t*, hazards are proportional
• The true impact of the exposure is HR* and only occurs after t*
• Need an analysis approach to reflect this


• Is this hard to do?
  – YES and NO
• Consider a situation where all subjects start off as ‘unexposed’ but at some time in the future, some people become exposed



Standard Cox Model


Time Varying Cox Model

Only change
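In standard notation (assumed here; the lecture's own symbols may differ), the two models can be written as follows, where h_0(t) is the baseline hazard and β is the log hazard ratio. The only change is that the fixed covariate x becomes a function of time, x(t):

```latex
\begin{align*}
\text{Standard Cox model:} \quad & h(t \mid x) = h_0(t)\,\exp(\beta x) \\
\text{Time varying Cox model:} \quad & h\bigl(t \mid x(t)\bigr) = h_0(t)\,\exp\bigl(\beta\, x(t)\bigr)
\end{align*}
```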

• The theory really is this simple!
• WHY?

RISK SETS

• Likelihood function for Cox model is computed at each time point when an event occurs
  – Depends only on subjects “at risk” at the event time
  – RISK SET

xij is the value of ‘x’ AT THE TIME of this event

Fixed covariates:

  xij is the same at all times

Time varying covariates:

  Use the xij which corresponds to the event time of this risk set

  Keep doing this over all risk sets
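As a sketch of what this looks like in the likelihood, the usual Cox partial likelihood (ignoring ties) can be written as below. The indexing is an assumption made for illustration: i runs over the D event times t_i, R_i is the risk set at t_i, x_{ij} is the covariate value for subject j at time t_i, and x_{i(i)} is the value for the subject who has the event at t_i.

```latex
\[
L(\beta) \;=\; \prod_{i=1}^{D}
  \frac{\exp\bigl(\beta\, x_{i(i)}\bigr)}
       {\sum_{j \in R_i} \exp\bigl(\beta\, x_{ij}\bigr)},
\qquad
x_{ij} =
\begin{cases}
  x_j        & \text{fixed covariate} \\
  x_j(t_i)   & \text{time varying covariate}
\end{cases}
\]
```

Each factor only needs covariate values at its own event time, which is why moving to a time varying covariate leaves the form of the likelihood unchanged.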

• So why isn’t it simple to do this?
• Practical issues intrude!
• To fit a time varying covariate, SAS needs to know the value of the covariate for every risk set.
  – Need to compute a value of the covariate at the time of every event.
• Interpretation is also tricky (later)


Example

  – 4 subjects
  – 2 get transplant at t = 15 & t = 25
  – Want to include a time-varying covariate for transplant status.

ID   Outcome   Time of event   Transplant   Time of transplant
1    dead      10              N            .
2    dead      20              Y            15
3    dead      30              N            .
4    dead      40              Y            25

4 risk sets at t = 10, 20, 30, & 40

Risk set   ID           Xtrans
10         1, 2, 3, 4   0, 0, 0, 0
20         2, 3, 4      1, 0, 0
30         3, 4         0, 1
40         4            1
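The risk-set table for the 4-subject example can be built mechanically. The following is a minimal sketch: the tuple layout and the function name `risk_set_table` are illustrative choices, and because every subject in the example dies, every follow-up time is treated as an event time.

```python
# Illustrative encoding: (id, event_time, transplant_time or None)
subjects = [
    (1, 10, None),
    (2, 20, 15),
    (3, 30, None),
    (4, 40, 25),
]

def risk_set_table(subjects):
    """For each event time, list the subjects still at risk together with
    their transplant indicator evaluated AT that event time."""
    table = {}
    for t in sorted({time for _, time, _ in subjects}):
        rows = []
        for sid, time, tx_time in subjects:
            if time >= t:  # subject is still at risk at this event time
                xtrans = 1 if tx_time is not None and tx_time <= t else 0
                rows.append((sid, xtrans))
        table[t] = rows
    return table

table = risk_set_table(subjects)
for t, rows in table.items():
    print(t, rows)
# 10 [(1, 0), (2, 0), (3, 0), (4, 0)]
# 20 [(2, 1), (3, 0), (4, 0)]
# 30 [(3, 0), (4, 1)]
# 40 [(4, 1)]
```

Subject 4 is coded 0 in the risk sets at t = 10 and t = 20 and 1 from t = 30 onward, because the transplant at t = 25 has only happened by then.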

• Two ways to do this in SAS:
  – Use programming statements in ‘Proc Phreg’.
  – Re-structure the data set and use a different method of describing the model to SAS
    • Counting Process Input.
• Other programmes have similar options and choices


• We’ll look at both ways.
  – Some things can only be done in the Phreg programming approach
  – Counting Process input has some strong benefits.
  – Counting process approach can be tricky to use with age as the time scale


Proc Phreg programming (1)

• SAS lets you include programme statements within PROC PHREG:

  proc phreg data=njb1;
    model surv*vs(0)=age sex x1;
    if (surv > 20) then x1 = 2;
    else x1 = 1;
  run;

Proc Phreg programming (2)

• This code is processed once for each risk set
• ‘surv’ is the time when the risk set occurs
  – It is NOT the survival time for the subject
• ‘x1’ is the value of the variable in the subject at the time of the specific risk set under consideration.
  – Here, it is ‘1’ if the risk set occurs at or before time 20 but ‘2’ otherwise
• File can get VERY BIG
• Hard to de-bug your code
  – But, SAS 9.4 allows ‘out’ statements to be used

Stanford Heart Transplant Study

Standard phreg analysis. Defines the ‘transplant’ status in the ‘data step’ using code like this:

  data njb1;
    set stanford;
    if (dot = .) then trans = 0;
    else trans = 1;
  run;

  proc phreg data=njb1;
    model time*cens(0)=trans;
  run;

Trans = 1 means:

  a) Had a transplant
  b) Lived long enough to have a transplant

Hazard curves look something like this.

[Figure: hazard curves for the Transplant and No Transplant groups, with the transplant time marked. In the interval before the transplant time, HR = 0. The overall HR is biased.]

Stanford Heart Transplant Study: with time varying effect

ID   Surv1   Dead   Wait
1    49      1      .
2    5       1      .
3    15      1      0
4    38      1      35
5    17      1      .
6    2       1      .
7    674     1      50

For each event time, we need to define the transplant variable for every subject still in the risk set:

  plant = 0   no transplant by risk set time
  plant = 1   transplant done on or before risk set time

Risk set time   IDs                   Wait time               plant
2               1, 2, 3, 4, 5, 6, 7   ., ., 0, 35, ., ., 50   0, 0, 1, 0, 0, 0, 0
5               1, 2, 3, 4, 5, 7      ., ., 0, 35, ., 50      0, 0, 1, 0, 0, 0
15              1, 3, 4, 5, 7         ., 0, 35, ., 50         0, 1, 0, 0, 0
17              1, 4, 5, 7            ., 35, ., 50            0, 0, 0, 0
38              1, 4, 7               ., 35, 50               0, 1, 0
49              1, 7                  ., 50                   0, 0
674             7                     50                      1

SAS Code to create ‘plant’ and run analysis

  proc phreg data=stan;
    model surv1*dead(0)=plant surg ageaccept / ties=exact;
    if (wait > surv1 or wait = .) then plant = 0;
    else plant = 1;
  run;

Counting Process Input (1)

• Counting processes are a different way to look at survival
  – mathematically more powerful
  – essentially, each subject follows a ‘process’
    • ‘count up’ the events they experience
    • can handle recurrent events
    • enhances modeling of exposure.
• Don’t need to know all this to use SAS counting process style input.

• Data set needs to be restructured.
• To date
  – one record per subject
  – To code covariate changes, need multiple variables
    • value at baseline (v1)
    • time of first change (t1) and new value (v2)
    • and so on
  – Need to use ‘phreg’ programming to define value at risk set.

• New approach
  – Similar to piece-wise exponential model
  – Split data for each subject into multiple records
    • Define intervals where every covariate is constant
      – [t1, t2)
    • Each interval has one line (record) of data
  – Intervals continue until:
    • Subject censored
    • Subject has outcome event.

• Need to re-structure data file
• Each interval needs a record in the data set
• Need to code
  – Start of this interval
  – End of this interval
  – Outcome status at end of interval
  – Value of time varying covariate(s) during the interval
  – Values of fixed covariates, etc.


• Let’s use data from the Stanford Heart Transplant Study
  – The same data as before.
  – But, we only include transplant status
  – Ignore other variables for now.
  – Only have one time varying covariate.


Original data

ID   Surv1   Dead   Wait
1    49      1      .
2    5       1      .
3    15      1      0
4    38      1      35
5    17      1      .
6    2       1      .
7    674     1      50

Re-structured data

ID   Start   Stop   Status   plant
1    0       49     1        0
2    0       5      1        0
3    0       .1     0        0
3    .1      15     1        1
4    0       35     0        0
4    35      38     1        1
5    0       17     1        0
6    0       2      1        0
7    0       50     0        0
7    50      674    1        1
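The restructuring rule can be sketched in a few lines. This is an illustration only: the function name `split_record` and the tuple layout are assumptions, `None` stands in for a missing wait time, and a transplant at time 0 is moved to 0.1 to avoid a zero-length interval, as in the row for subject 3.

```python
def split_record(sid, surv, dead, wait):
    """Return (id, start, stop, status, plant) rows for one subject.
    `wait` is the transplant time, or None if there was no transplant."""
    if wait is None:
        # Never transplanted: one interval covering all of follow-up
        return [(sid, 0, surv, dead, 0)]
    # A transplant at time 0 would create a zero-length interval,
    # so it is nudged to 0.1
    cut = wait if wait > 0 else 0.1
    return [
        (sid, 0, cut, 0, 0),        # before transplant: no event in this interval
        (sid, cut, surv, dead, 1),  # after transplant: carries the final outcome
    ]

# (id, surv1, dead, wait) for the seven subjects in the original data
original = [
    (1, 49, 1, None), (2, 5, 1, None), (3, 15, 1, 0), (4, 38, 1, 35),
    (5, 17, 1, None), (6, 2, 1, None), (7, 674, 1, 50),
]

long_rows = [row for rec in original for row in split_record(*rec)]
for row in long_rows:
    print(row)
```

Seven subjects become ten records: the three transplanted subjects contribute two intervals each, with the transplant indicator switching from 0 to 1 at the transplant time.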

SAS Code to re-structure data

Basic version:

  DATA stanlong;
    SET allison.stan;
    plant=0;
    start=0;
    IF (trans=0) THEN DO;
      dead2=dead;
      stop=surv1;
      OUTPUT;
    END;
    ELSE DO;
      stop=wait;
      dead2=0;
      OUTPUT;
      plant=1;
      start=wait;
      stop=surv1;
      dead2=dead;
      OUTPUT;
    END;
  RUN;

Version that avoids zero-length intervals (a time of 0 is replaced by .1):

  DATA stanlong;
    SET allison.stan;
    plant=0;
    start=0;
    IF (trans=0) THEN DO;
      dead2=dead;
      stop=surv1;
      IF (stop=0) THEN stop=.1;
      OUTPUT;
    END;
    ELSE DO;
      stop=wait;
      IF (stop=0) THEN stop=.1;
      dead2=0;
      OUTPUT;
      plant=1;
      start=wait;
      IF (stop=.1) THEN start=.1;
      stop=surv1;
      dead2=dead;
      OUTPUT;
    END;
  RUN;

SAS Code for counting-process input analysis

  PROC PHREG DATA=stanlong;
    MODEL (start,stop)*dead2(0)=plant surg ageaccpt / TIES=EFRON;
  RUN;

Identical to previous time-varying analysis

Types of time varying covariates

• Internal (endogenous)
  – Change in the covariate is related to the behaviour of the subject.
  – Measurement requires subject to be under periodic examination
    • Blood pressure
    • Cholesterol
    • Smoking
  – More challenging for analysis
    • Often part of causal pathway


• External (exogenous)
  – Variables which vary independently of the subject’s normal biological processes.
  – The values do not depend on subject-specific information
  – Measurement does not require subject monitoring
    • Hourly pollen count


• Some pattern types
  – Non-reversible dichotomy
    • Transplant
  – Reversible dichotomy
    • Smoking
    • Drug use
  – Continuous variable
    • Cholesterol


• Some issues
  – Need for valid measures for all subjects at all follow-up time
    • Missing data
    • ‘Coarse’ measurement intervals
    • Imputation
    • Interpolation
  – Computationally intense
• Reverse causation effects
• Intermediate variables in the causal pathway


Some Logical fallacies

• Cannot use the future to predict the future!
• Example #1
  – Recruit a cohort of neonates
    • Age at entry = 0 for all subjects
      – Not useful as a predictor
  – Suggestion is made to use average age during follow-up to predict outcome
  – INVALID
    • Average age during follow-up depends on ‘future’ information
    • High average age is due to long survival


Intermediaries (Internal covariates)

• RCT of anti-hypertensive treatment
• Outcome: time to stroke
• Main Q: Does the drug reduce the rate of stroke?
• Model 1: ln(HR) = β1 (drug)
• BUT, we measured BP on all subjects during follow-up.
  – Why not include this as a time-varying covariate?


Intermediaries (cont)

• Model 1: ln(HR) = β1 (drug)
• Model 2: ln(HR) = β1* (drug) + β2 BP(t)
• Results
  – Model 1 β1: p < 0.001
  – Model 2 β1*: p = 0.6

WHY?

Drug → drop in BP → drop in stroke risk

• Effect of drug on stroke is already accounted for in the BP term
• Estimate from model of ‘drug’ effect is the effect of the drug after adjusting for changes in BP
  – That is, after adjusting for the drug effect.


SAS examples (1)

• Study of prisoners released from jail
  – One year follow-up
  – Monitor every week
    • If subject was re-arrested, record the week of the arrest
    • Recidivated
  – Key question
    • Does financial security post-release reduce risk of recidivism?

SAS examples (2)

• Study also collected information about employment status for every week of follow-up after release
  – Time varying covariate
• Hypothesis
  – Being in full-time employment reduces the risk of recidivism.

Data layout for employment information

ID   EMP1   EMP2   EMP3   …   EMP52
1    1      1      0      …   0
2    0      0      0      …   1
3    1      0      0      …   0

… and so on

  PROC PHREG DATA=allison.recid;
    MODEL week*arrest(0)=fin age race wexp mar paro prio employed / TIES=EFRON;
    ARRAY emp(*) emp1-emp52;
    employed=emp[week];
  RUN;

BUT: if you get arrested in week 10, you can’t work full-time in week 10

REVERSE CAUSATION

Lagged exposure

  title 'Single week lag';
  PROC PHREG data=allison.recid;
    WHERE week>1;
    MODEL week*arrest(0)=fin age race wexp mar paro prio employed / TIES=EFRON;
    ARRAY emp(*) emp1-emp52;
    employed=emp[week-1];
  RUN;
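The difference between the same-week and the lagged lookup can be seen on a single made-up subject. This is a sketch only: the function `employed_at` and the employment history are hypothetical, and Python lists are 0-based, so week 1 is stored at index 0.

```python
def employed_at(emp, week, lag=0):
    """Employment indicator used at the risk set for `week`.
    `emp` holds weekly statuses for weeks 1..52 (emp[0] is week 1)."""
    source_week = week - lag
    if source_week < 1:
        return None  # no earlier week exists to look back to
    return emp[source_week - 1]

# Hypothetical subject: employed in weeks 1-9, arrested in week 10
# and therefore recorded as not employed from week 10 onward
emp = [1] * 9 + [0] * 43

same_week = employed_at(emp, 10)        # value recorded for week 10
lagged = employed_at(emp, 10, lag=1)    # value recorded for week 9
print(same_week, lagged)
# prints: 0 1
```

With no lag, the arrest itself makes the subject look unemployed at the event time. The one-week lag uses the status from before the arrest, and week 1 has no lagged value, which is why the SAS code restricts to week > 1.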

SAS examples (3)

• Allison looks at some other models
  – Other lag intervals
  – Cumulative work experience
• Worth reviewing for code examples and interpretation

SAS examples (4)

• Albumin and death
  – Question:
    • Does a falling serum albumin predict an increased likelihood of death?

SAS examples (5)

• Albumin measured on the first day of each month
  – Ad-hoc measurement
  – Not available on every day of the month
• Cannot use ‘average’ albumin around death date
  – No post-death value
• Use ‘closest’ value before risk set date

  DATA bloodcount;
    INFILE 'c:\blood.dat';
    INPUT deathday status alb1-alb12;
    ARRAY alb(*) alb1-alb12;
    status2=0;
    deathmon=CEIL(deathday/30.4);
    DO j=1 TO deathmon;
      start=(j-1)*30.4;
      stop=start+30.4;
      albumin=alb(j);
      IF (j=deathmon) THEN DO;
        status2=status;
        stop=deathday;  /* final interval ends at the death (or censoring) day */
      END;
      OUTPUT;
    END;
  RUN;

  PROC PHREG DATA=bloodcount;
    MODEL (start,stop)*status2(0)=albumin;
  RUN;

Uses counting process style input
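The month-by-month splitting can be sketched for a single subject as follows. The function name `monthly_records` and the albumin values are made up for illustration. In this sketch the final interval ends at the death (or censoring) day itself, so that every record satisfies start < stop on the original time scale.

```python
import math

MONTH = 30.4  # average days per month, the constant used in the SAS code

def monthly_records(deathday, status, alb):
    """Split one subject into (start, stop, status2, albumin) rows,
    one row per month of follow-up. alb[0] is the month-1 measurement."""
    last_month = math.ceil(deathday / MONTH)
    rows = []
    for j in range(1, last_month + 1):
        start = (j - 1) * MONTH
        stop = start + MONTH
        status2 = 0
        if j == last_month:
            stop = deathday    # final interval ends at death or censoring
            status2 = status   # only the final interval can carry the event
        rows.append((start, stop, status2, alb[j - 1]))
    return rows

# Hypothetical subject: dies on day 70, with made-up monthly albumin values
rows = monthly_records(70, 1, [40, 38, 35, 33])
for row in rows:
    print(row)
```

A death on day 70 falls in month 3, so the subject contributes three records, and only the last one has status2 = 1.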

SAS examples (6)

• Alcohol cirrhosis and survival
  – Prothrombin time (a measure of blood clotting) is hypothesized as a predictor of survival
  – Cohort of men were followed up
  – Lab measures were taken at ‘clinically relevant’ times
    • No pattern to the times
    • Varied for each subject

  DATA alcocount;
    SET allison.alco;
    time1=0;
    time11=.;
    ARRAY t(*) time1-time11;
    ARRAY p(*) pt1-pt10;
    dead2=0;
    DO j=1 TO 10 WHILE (t(j) NE .);
      start=t(j);
      pt=p(j);
      stop=t(j+1);
      IF (t(j+1)=.) THEN DO;
        stop=surv;
        dead2=dead;
      END;
      OUTPUT;
    END;
  RUN;

  PROC PHREG DATA=alcocount;
    MODEL (start,stop)*dead2(0)=pt;
  RUN;

Uses counting process style input
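The same idea with irregular measurement times can be sketched as below. The function name `interval_records` and the measurement values are hypothetical. Each measurement is carried forward until the next one, and the last interval ends at the survival time.

```python
def interval_records(times, values, surv, dead):
    """times[k] is when values[k] was measured (times[0] is 0).
    Each value is carried forward until the next measurement; the last
    interval ends at `surv` and carries the outcome `dead`."""
    rows = []
    for k, (start, pt) in enumerate(zip(times, values)):
        is_last = k == len(times) - 1
        stop = surv if is_last else times[k + 1]
        status = dead if is_last else 0
        rows.append((start, stop, status, pt))
    return rows

# Hypothetical subject: measured at days 0, 90 and 200, died on day 260
rows = interval_records([0, 90, 200], [12.5, 14.0, 17.5], surv=260, dead=1)
for row in rows:
    print(row)
# (0, 90, 0, 12.5)
# (90, 200, 0, 14.0)
# (200, 260, 1, 17.5)
```

Because the intervals are defined per subject, no common measurement schedule is needed, which is what makes counting process input convenient for this kind of data.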
