01/2015
EPI 5344: Survival Analysis in Epidemiology
Time varying covariates
March 24, 2015
Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa
Objectives
• Introduce time varying covariates
• Methods of inclusion into Cox models
• SAS (computer) issues
Introduction (1)
• Does heart transplantation improve survival?
  – Epidemiological study with ID (incidence density) measures
  – Observational study (not an RCT)
Introduction (2)
• Assume that transplant has no effect on survival
  – IDR = 1.0
• 800 candidates for transplant
• 2 year follow-up
• No losses
• 50% of people get a transplant
  – Always occurs on their first anniversary of entering study
• 25% of group die in first year
• 25% of first year survivors die in second year
Introduction (3)
Ignore transplant status

Time     N    Deaths  PT     ID/PY
Year 1   800  200     700    0.286
Year 2   600  150     525    0.286
Total         350     1,225  0.286

PT = person-time (person-years); ID/PY = incidence density per person-year
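The PT and ID/PY columns can be reproduced with a short DATA step. This is only a sketch: the dataset and variable names (`crude`, `n_start`, `deaths`, `pt`, `id_rate`) are made up for illustration, and it assumes deaths occur on average at mid-year, which is what the slide's PT values imply (800 - 200/2 = 700).

```sas
/* Sketch only: names are illustrative, not from the course files.        */
/* Assumes each death contributes half a year of person-time, on average. */
data crude;
  input year n_start deaths;
  pt      = n_start - deaths / 2;   /* person-years at risk in that year */
  id_rate = deaths / pt;            /* incidence density per person-year */
  datalines;
1 800 200
2 600 150
;
run;

proc print data=crude noobs;
  format id_rate 6.3;
run;
```

The same two lines of arithmetic apply to every table in this introduction; only the starting N and the deaths change.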
Introduction (4)
Stratify by transplant status: Transplant Done

Time     N    Deaths  PT    ID/PY
Year 1   400  0       400   0.0
Year 2   400  100     350   0.286
Total         100     750   0.133
Introduction (5)
Stratify by transplant status: NO Transplant Done

Time     N    Deaths  PT    ID/PY
Year 1   400  200     300   0.667
Year 2   200  50      175   0.286
Total         250     475   0.526
Introduction (6)
• What is the observed IDR under this method of analysis?
• Transplant ID = 0.133/yr
• No transplant ID = 0.526/yr
• IDR = 0.133/0.526 = 0.253
• Correct IDR = 1.0
STRONG BIAS
Doing an RCT does NOT fix this issue as long as transplant is not done at time ‘0’
Introduction (7)
• How do we fix this?
  – No-one is at risk of dying with a transplant until the transplant has taken place
• Solution using epi methods:
  – People who never have transplant
  – People who have a transplant
    • Accumulate PT (and events) to the non-transplant group until the transplant occurs
    • Accumulate PT (and events) to the transplant group only after transplant occurs
Introduction (8)
CORRECT WAY: No Transplant Done

Time     N    Deaths  PT    ID/PY
Year 1   800  200     700   0.286
Year 2   200  50      175   0.286
Total         250     875   0.286
Introduction (9)
CORRECT WAY: Transplant Done

Time     N    Deaths  PT    ID/PY
Year 1   0    0       0     ND
Year 2   400  100     350   0.286
Total         100     350   0.286
Introduction (10)
• What is the observed IDR under this method of analysis?
• Transplant ID = 0.286/yr
• No transplant ID = 0.286/yr
• IDR = 1.0
• Correct IDR = 1.0
TIME VARYING COVARIATE: Transplant status
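The two ways of allocating person-time can be put side by side in a few lines of SAS. This is a hedged sketch, not course code: the dataset `alloc` and the variables `method`, `arm`, `tot_deaths`, `tot_pt` and `id_rate` are hypothetical names, and the deaths and person-years are simply the per-year values from the tables in this introduction.

```sas
/* Sketch only: dataset and variable names are illustrative.            */
/* deaths and pt are the per-year values taken from the slide tables.   */
data alloc;
  input method $ arm $ year deaths pt;
  datalines;
naive   trans   1   0 400
naive   trans   2 100 350
naive   notrans 1 200 300
naive   notrans 2  50 175
correct trans   1   0   0
correct trans   2 100 350
correct notrans 1 200 700
correct notrans 2  50 175
;
run;

proc sql;
  /* incidence density = total deaths / total person-years, per group */
  create table rates as
  select method, arm,
         sum(deaths) as tot_deaths,
         sum(pt)     as tot_pt,
         calculated tot_deaths / calculated tot_pt as id_rate format=6.3
  from alloc
  group by method, arm;
quit;

proc print data=rates noobs;
run;
```

Dividing the transplant rate by the no-transplant rate within each method gives the IDR quoted on the slides: about 0.25 for the naive split and 1.0 when pre-transplant person-time is credited to the non-transplant group.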
Time Varying Covariates (1)
• Exposures can change during follow-up
  – People stop/start smoking
  – BP increases
  – Air pollution varies from year to year
• Hazard often depends more strongly on recent values than original exposure
  – Not always true
  – Can depend on
    • Cumulative exposure
    • Lagged exposure
Time Varying Covariates (2)
• Produces non-proportional hazards
  – Change in exposure level causes hazard to change in one group
• Still proportional conditional on value of time varying exposure.
Time Varying Covariates (3)
Before t*, HR = 1.0
After t*, HR* < 1.0
NOT PH over all time
If we ignore the time of exposure and just treat these as two groups with PH, we get a biased estimate of the hazard ratio
  – A type of average of 1.0 and HR* (so the estimate is > HR*, that is, pulled towards 1.0)
Time Varying Covariates (4)
BUT: before t*, hazards are proportional
     after t*, hazards are proportional
• The true impact of the exposure is HR* and only occurs after t*
• Need an analysis approach to reflect this
Time Varying Covariates (5)
• Is this hard to do?
  – YES and NO
• Consider a situation where all subjects start off as ‘unexposed’ but at some time in the future, some people become exposed
Time Varying Covariates (6)
Standard Cox Model
Time Varying Cox Model
Only change: the covariate value is allowed to depend on time
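The formulas on this slide did not survive extraction. In the usual textbook notation (an assumption here, since the slide's own symbols are not visible), the two models for a single covariate can be written as:

```latex
% Standard Cox model: covariate x is fixed at its baseline value
h(t \mid x) = h_0(t)\,\exp(\beta x)

% Time-varying Cox model: the only change is x -> x(t)
h(t \mid x(t)) = h_0(t)\,\exp\bigl(\beta\, x(t)\bigr)
```

Here h_0(t) is the baseline hazard and β is the log hazard ratio; nothing else in the model changes.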
Time Varying Covariates (7)
• The theory really is this simple!
• WHY?
RISK SETS
Time Varying Covariates (8)
• Likelihood function for Cox model is computed at each time point when an event occurs
  – Depends only on subjects “at risk” at the event time
  – RISK SET
xij is the value of ‘x’ AT THE TIME of this event
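For reference, a standard way to write the partial likelihood that this slide describes is sketched below. The indexing (t_j, R_j, i_j) is chosen here for illustration and may not match the slide's own x_ij convention; tied event times are ignored for simplicity.

```latex
% Cox partial likelihood (no tied event times assumed).
% t_j : j-th ordered event time
% R_j : risk set at t_j (subjects still under observation just before t_j)
% i_j : the subject who has the event at t_j
L(\beta) = \prod_{j}
  \frac{\exp\bigl(\beta\, x_{i_j}(t_j)\bigr)}
       {\sum_{k \in R_j} \exp\bigl(\beta\, x_k(t_j)\bigr)}
```

Each factor uses only covariate values evaluated at the event time t_j, which is why a time-varying covariate fits into the same likelihood without new theory.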
Time Varying Covariates (9)
Fixed covariates:
  xij is the same at all times
Time varying covariates:
  Use the xij which corresponds to the event time of this risk set
  Keep doing this over all risk sets
Time Varying Covariates (10)
• So why isn’t it simple to do this?
• Practical issues intrude!
• To fit a time varying covariate, SAS needs to know the value of the covariate for every risk set.
  – Need to compute a value of the covariate at the time of every event.
• Interpretation is also tricky (later)
Time Varying Covariates (11)
Example
  – 4 subjects
  – 2 get transplant at t = 15 & t = 25
  – Want to include a time-varying covariate for transplant status.

ID  Outcome  Time of event  Transplant  Time of transplant
1   dead     10             N           .
2   dead     20             Y           15
3   dead     30             N           .
4   dead     40             Y           25

4 risk sets at t = 10, 20, 30 & 40
Time Varying Covariates (12)

Risk set (time)  IDs in risk set  Xtrans (same order)
10               1, 2, 3, 4       0, 0, 0, 0
20               2, 3, 4          1, 0, 0
30               3, 4             0, 1
40               4                1
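The risk-set table above can be generated mechanically, which is a useful check when the rule is applied to real data. The following is a sketch under assumptions: the dataset `small` and the variables `evtime`, `transtime`, `riskset` and `xtrans` are hypothetical names for the four-subject example, and the rule used is the one stated later in these slides (covariate = 1 if the transplant happened on or before the risk set time).

```sas
/* Sketch only: names are illustrative; data are the 4-subject example. */
data small;
  input id evtime transtime;
  datalines;
1 10 .
2 20 15
3 30 .
4 40 25
;
run;

proc sql;
  /* Pair every event time (e) with every subject (s) still at risk then. */
  /* The explicit missing check matters: a missing numeric value sorts    */
  /* below every number, so "transtime <= evtime" alone would be true     */
  /* for subjects who never had a transplant.                             */
  create table risksets as
  select e.evtime as riskset,
         s.id,
         (s.transtime is not missing and s.transtime <= e.evtime) as xtrans
  from small as e, small as s
  where s.evtime >= e.evtime
  order by riskset, id;
quit;

proc print data=risksets noobs;
run;
```

This only works as written because every subject has an event and all event times are distinct; with censoring or ties the join would need to be restricted to distinct event times.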
Time Varying Covariates (13)
• Two ways to do this in SAS:
  – Use programming statements in ‘Proc Phreg’.
  – Re-structure the data set and use a different method of describing the model to SAS
    • Counting Process Input.
• Other programmes have similar options and choices
Time Varying Covariates (14)
• We’ll look at both ways.
  – Some things can only be done in the Phreg programming approach
  – Counting Process input has some strong benefits.
  – Counting process approach can be tricky to use with age as the time scale
Proc Phreg programming (1)
• SAS lets you include programme statements within PROC PHREG:

proc phreg data=njb1;
  model surv*vs(0)=age sex x1;
  /* evaluated for each risk set: 'surv' here is the risk set time */
  if (surv > 20) then x1 = 2;
  else x1 = 1;
run;
Proc Phreg programming (2)
• This code is processed once for each risk set
• ‘surv’ is the time when the risk set occurs
  – It is NOT the survival time for the subject
• ‘x1’ is the value of the variable in the subject at the time of the specific risk set under consideration.
  – Here, it is ‘1’ if the risk set occurs at or before time 20 but ‘2’ otherwise
• File can get VERY BIG
• Hard to de-bug your code
  – But, SAS 9.4 allows ‘out’ statements to be used
Stanford Heart Transplant Study
Standard phreg analysis. Defines the ‘transplant’ status in the ‘data step’ using code like this:

data njb1;
  set stanford;
  if (dot = .) then trans = 0;
  else trans = 1;
run;

proc phreg data=njb1;
  model time*cens(0)=trans;
run;
Trans=1 means the subject:
  a) Had a transplant
  b) Lived long enough to have a transplant
[Figure: hazard curves for the Transplant and No Transplant groups, with the transplant time marked. In the interval before the transplant time, HR = 0. The overall HR is biased.]
Stanford Heart Transplant Study: with time varying effect

ID  Surv1  Dead  Wait
1   49     1     .
2   5      1     .
3   15     1     0
4   38     1     35
5   17     1     .
6   2      1     .
7   674    1     50

For each event time, we need to define the transplant variable for every subject still in risk set
plant = 0  no transplant by risk set time
plant = 1  transplant done on or before risk set time
Risk set time  IDs                  Wait time              plant
2              1, 2, 3, 4, 5, 6, 7  ., ., 0, 35, ., ., 50  0, 0, 1, 0, 0, 0, 0
5              1, 2, 3, 4, 5, 7     ., ., 0, 35, ., 50     0, 0, 1, 0, 0, 0
15             1, 3, 4, 5, 7        ., 0, 35, ., 50        0, 1, 0, 0, 0
17             1, 4, 5, 7           ., 35, ., 50           0, 0, 0, 0
38             1, 4, 7              ., 35, 50              0, 1, 0
49             1, 7                 ., 50                  0, 0
674            7                    50                     1
SAS Code to create ‘plant’ and run analysis

proc phreg data=stan;
  model surv1*dead(0)=plant surg ageaccept / ties=exact;
  /* surv1 is the current risk set time when these statements run */
  if (wait > surv1 or wait = .) then plant = 0;
  else plant = 1;
run;
Counting Process Input (1)
• Counting processes are a different way to look at survival
  – mathematically more powerful
  – essentially, each subject follows a ‘process’
    • ‘count up’ the events they experience
    • can handle recurrent events
    • enhances modeling of exposure.
• Don’t need to know all this to use SAS counting process style input.
Counting Process Input (2)
• Data set needs to be restructured.
• To date
  – one record per subject
  – To code covariate changes, need multiple variables
    • value at baseline (v1)
    • time of first change (t1) and new value (v2)
    • and so on
  – Need to use ‘phreg’ programming to define value at risk set.
Counting Process Input (3)
• New approach
  – Similar to piece-wise exponential model
  – Split data for each subject into multiple records
    • Define intervals where every covariate is constant
      – [t1, t2)
    • Each interval has one line (record) of data
  – Intervals continue until:
    • Subject censored
    • Subject has outcome event.
Counting Process Input (4)
• Need to re-structure data file
• Each interval needs a record in the data set
• Need to code
  • Start of this interval
  • End of this interval
  • Outcome status at end of interval
  • Value of time varying covariate(s) during the interval
  • Values of fixed covariates, etc.
Counting Process Input (5)
• Let’s use data from the Stanford Heart Transplant Study
  • the same data as before.
  • But, we only include transplant status
  • Ignore other variables for now.
  • Only have one time varying covariate.
Original data

ID  Surv1  Dead  Wait
1   49     1     .
2   5      1     .
3   15     1     0
4   38     1     35
5   17     1     .
6   2      1     .
7   674    1     50

Re-structured data

ID  Start  Stop  Status  plant
1   0      49    1       0
2   0      5     1       0
3   0      .1    0       0
3   .1     15    1       1
4   0      35    0       0
4   35     38    1       1
5   0      17    1       0
6   0      2     1       0
7   0      50    0       0
7   50     674   1       1
SAS Code to re-structure data

DATA stanlong;
  SET allison.stan;
  plant=0;
  start=0;
  IF (trans=0) THEN DO;
    /* never transplanted: one record covering all of follow-up */
    dead2=dead;
    stop=surv1;
    IF (stop=0) THEN stop=.1;   /* avoid a zero-length interval */
    OUTPUT;
  END;
  ELSE DO;
    /* record 1: entry to transplant, always ends without the event */
    stop=wait;
    IF (stop=0) THEN stop=.1;   /* transplant at time 0: use a short interval */
    dead2=0;
    OUTPUT;
    /* record 2: transplant to death or censoring */
    plant=1;
    start=wait;
    IF (stop=.1) THEN start=.1;
    stop=surv1;
    dead2=dead;
    OUTPUT;
  END;
RUN;

The two IF (stop=0) lines and the IF (stop=.1) line are the additions needed for subjects with a time of 0; without them the code is otherwise the same.
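Before fitting a model to re-structured data, it can help to confirm that every record describes a usable interval. The check below is a sketch rather than part of the original slides: it assumes the `stanlong` dataset created above, and `badrecs` is a hypothetical name for the output.

```sas
/* Sketch: list records whose interval is empty or reversed.           */
/* Counting process input expects stop to be greater than start.       */
data badrecs;
  set stanlong;
  if stop <= start;   /* subsetting IF: keep only the problem records */
run;

proc print data=badrecs;
run;
```

An empty `badrecs` dataset is what one would hope to see; any rows listed point to subjects whose transplant and event times need a closer look.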
SAS Code for counting-process input analysis

PROC PHREG DATA=stanlong;
  MODEL (start,stop)*dead2(0)=plant surg ageaccpt / TIES=EFRON;
RUN;

Identical to previous time-varying analysis
Time Varying Covariates (15)
Types of time varying covariates
• Internal (endogenous)
  – Change in the covariate is related to the behaviour of the subject.
  – Measurement requires subject to be under periodic examination
    • Blood pressure
    • Cholesterol
    • Smoking
  – More challenging for analysis
    • Often part of causal pathway
Time Varying Covariates (16)
• External (exogenous)
  – Variables which vary independently of the subject’s normal biological processes.
  – The values do not depend on subject-specific information
  – Measurement does not require subject monitoring
    • Hourly pollen count
Time Varying Covariates (17)
• Some pattern types
  – Non-reversible dichotomy
    • Transplant
  – Reversible dichotomy
    • Smoking
    • Drug use
  – Continuous variable
    • Cholesterol
Time Varying Covariates (18)
• Some issues
  – Need for valid measures for all subjects at all follow-up times
    • Missing data
    • ‘coarse’ measurement intervals
    • Imputation
    • Interpolation
  – Computationally intense
• Reverse causation effects
• Intermediate variables in the causal pathway
Time Varying Covariates (19)
Some logical fallacies
• Cannot use the future to predict the future!
• Example #1
  – Recruit a cohort of neonates
    • Age at entry = 0 for all subjects
      – Not useful as a predictor
  – Suggestion is made to use average age during follow-up to predict outcome
  – INVALID
    • Average age during follow-up depends on ‘future’ information
    • High average age is due to long survival
Time Varying Covariates (20)
Intermediaries (Internal covariates)
• RCT of anti-hypertensive treatment
• Outcome: time to stroke
• Main Q: Does drug reduce rate of stroke?
• Model 1: ln(HR) = β1 (drug)
• BUT, we measured BP on all subjects during follow-up.
  – Why not include this as a time-varying covariate?
Time Varying Covariates (21)
Intermediaries (cont)
• Model 1: ln(HR) = β1 (drug)
• Model 2: ln(HR) = β1* (drug) + β2 BP(t)
• Results
  • Model 1 β1: p < 0.001
  • Model 2 β1*: p = 0.6
WHY?
Time Varying Covariates (22)
Drug → drop in BP → drop in stroke risk
• Effect of drug on stroke is already accounted for in the BP term
• Estimate from model of ‘drug’ effect is the effect of the drug after adjusting for changes in BP
  • That is, after adjusting for the drug effect.
SAS examples (1)
• Study of prisoners released from jail
  – One year follow-up
  – Monitor every week
    • If subject was re-arrested, record the week of the arrest
    • Recidivated
  – Key question
    • Does financial security post-release reduce risk of recidivism?
SAS examples (2)
• Study also collected information about employment status for every week of follow-up after release
  • Time varying covariate
• Hypothesis
  – Being in full-time employment reduces the risk of recidivism.
Data layout for employment information

ID  EMP1  EMP2  EMP3  ………  EMP52
1   1     1     0     ………  0
2   0     0     0     ………  1
3   1     0     0     ………  0
… and so on
PROC PHREG DATA=allison.recid;
  MODEL week*arrest(0)=fin age race wexp mar paro prio employed / TIES=EFRON;
  ARRAY emp(*) emp1-emp52;
  /* 'week' is the risk set time, so this picks employment status in that week */
  employed=emp[week];
RUN;
BUT: if you get arrested in week 10, you can’t work full-time in week 10
REVERSE CAUSATION
Solution: lagged exposure
title 'Single week lag';
PROC PHREG data=allison.recid;
  WHERE week>1;
  MODEL week*arrest(0)=fin age race wexp mar paro prio employed / TIES=EFRON;
  ARRAY emp(*) emp1-emp52;
  employed=emp[week-1];
RUN;
SAS examples (3)
• Allison looks at some other models
  – Other lag intervals
  – cumulative work experience
• Worth reviewing for code examples and interpretation
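As one illustration of a different lag interval, the single-week-lag code can be extended to carry both a one-week and a two-week lag. This is a sketch written for these notes, not a reproduction of Allison's code: `employ1` and `employ2` are hypothetical variable names, and the rest follows the `allison.recid` layout used above.

```sas
title 'One- and two-week lags (sketch)';
PROC PHREG DATA=allison.recid;
  /* Drop subjects whose follow-up ended in week 1 or 2, so that      */
  /* emp[week-2] is always a valid array element.                     */
  WHERE week>2;
  MODEL week*arrest(0)=fin age race wexp mar paro prio employ1 employ2 / TIES=EFRON;
  ARRAY emp(*) emp1-emp52;
  employ1=emp[week-1];   /* employment status one week before the risk set  */
  employ2=emp[week-2];   /* employment status two weeks before the risk set */
RUN;
```

The WHERE cut-off has to move with the longest lag in the model, which also means the sample shrinks slightly as the lag grows.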
SAS examples (4)
• Albumin and death
  – Question:
    • Does a falling serum albumin predict an increased likelihood of death?
SAS examples (5)
• Albumin measured on the first day of each month
  – Ad-hoc measurement
  – Not available on every day of the month
• Cannot use ‘average’ albumin around death date
  – No post-death value
• Use ‘closest’ value before risk set date
DATA bloodcount;
INFILE 'c:\blood.dat';
INPUT deathday status alb1-alb12;
ARRAY alb(*) alb1-alb12;
status2=0;
deathmon=CEIL(deathday/30.4);
DO j=1 TO deathmon;
start=(j-1)*30.4;
stop=start+30.4;
albumin=alb(j);
IF (j=deathmon) THEN DO;
status2=status;
stop=deathday;  /* last interval ends on the death/censoring day, on the same time scale as start */
END;
OUTPUT;
END;
Run;
PROC PHREG DATA=bloodcount;
MODEL (start,stop)*status2(0)=albumin;
RUN;
Uses counting process style input
SAS examples (6)
• Alcoholic cirrhosis and survival
  – Prothrombin time (a measure of blood clotting) is hypothesized as a predictor of survival
  – Cohort of men were followed up
  – Lab measures were taken at ‘clinically relevant’ times
    • No pattern to the times
    • Varied for each subject
DATA alcocount;
  SET allison.alco;
  time1=0;
  time11=.;
  ARRAY t(*) time1-time11;
  ARRAY p(*) pt1-pt10;
  dead2=0;
  DO j=1 TO 10 WHILE (t(j) NE .);
    start=t(j);
    pt=p(j);
    stop=t(j+1);
    IF (t(j+1)=.) THEN DO;
      stop=surv;
      dead2=dead;
    END;
    OUTPUT;
  END;
run;

PROC PHREG DATA=alcocount;
  MODEL (start,stop)*dead2(0)=pt;
RUN;

Uses counting process style input