© Willett & Singer, Harvard University Graduate School of Education S077/Week #6– Slide 1
S077: Applied Longitudinal Data Analysis
Week #6: What Are The Topics Covered In Today’s Overview?
I. Describing Continuous-Time Event Occurrence Data:
1. Salient Features of Continuous-Time Event Data.
2. Redefining the Survivor And Hazard Functions And Strategies For Estimation.
3. The Cumulative Hazard Function.
4. Developing Your Intuition About Survivor, Cumulative Hazard and Kernel-Smoothed Hazard Functions.

II. Fitting Cox Regression Models:
1. Towards a Statistical Model for Continuous-Time Hazard.
2. Fitting the Continuous-Time Hazard Model to Data.
3. Evaluating the Results of Model Fitting.
4. Graphically Displaying the Results of Model Fitting.

III. Extending the Cox Regression Model:
1. Including Time-Varying Predictors in the Cox Regression Model.
2. Non-Proportional Hazards Cox Regression Models.
… And There Exist An Infinite Number Of Such Instants … any division of continuous time (weeks, days, hours, etc.) can always be made finer (in contrast to the finite, and usually small, number of values for TIME in discrete time).

… the Probability Of Observing Any Particular Event Time Is Infinitesimally Small … (and approaches 0 as time’s divisions get finer).
• This has serious implications for the definition of hazard -- the lynchpin of any survival analysis.
• We must define continuous-time hazard differently, making it more difficult to estimate and display it in data analysis.
… the Probability That Ties (When Two Or More People Have the Same Event Time) Will Occur Is Infinitesimally Small …
• Continuous-time survival methods were developed assuming that ties never occur.
• Unfortunately, ties are usually present in real “continuous time” data. Why? Because, while the underlying true times to event may be truly continuous, the times recorded in the data are usually rounded to the nearest unit (year, month, week, etc.).
• This can lead to difficulties, and to ad-hoc fix-ups.
In Continuous Time … We Know The Precise Instant That the Events Occur …
e.g., Jane took her first drink at 6:19 after release from an alcohol treatment program
(ALDA, Section 13.1.1, pp 469-471)
What Happens When We Record Event Occurrence In Continuous Time?
This Implies that…
Data source: Diekmann & colleagues (1996), Journal of Social Psychology.
Sample: 57 motorists in Munich, Germany (purposefully) blocked at a green light by a Volkswagen Jetta.
Research design: Tracked from light change until horn honk:
n=43 (75.4%) honked their horns before the light turned red; the rest are censored.
Event time recorded to the nearest 100th of a second!
(ALDA, Section 13.1.1, pp 471-472)
The only tie!
A few very patient people?
What Do Continuous Time Event History Data Look Like?
(ALDA, Section 13.1.2 & 13.1.2, pp 472-475)
S(t_ij) = Pr[T_i > t_j]
Defining the Continuous Time Survivor Function
Notation for Continuous-Time Event Data: Ti is a continuous random variable representing the event time for individual i.
t_j indexes the infinite number of instants (the “true” time) when the event could actually occur.
CENSORi indicates whether Ti is censored.
Survival Probability, and the Survivor Function, have the same definition in continuous-time as they do in discrete-time because they refer to the probability that a person will experience the event after a particular instant …that is, in an interval!
How Do We Estimate Survival Probability and the Survivor Function in Continuous Time?
Several approaches (see ALDA):
• Discrete-Time Method.
• Actuarial Method.
• Kaplan-Meier (“Product Limit”) Method.
(ALDA, Section 13.3, p 483-491)
Estimated median lifetime=3.5769 seconds
Kaplan-Meier Estimate Of The Survivor Function:
Note how smooth these estimates are.
Ŝ(t_j) = [1 − p̂(t_1)] × [1 − p̂(t_2)] × … × [1 − p̂(t_j)]
Conditional Probability Of Event Occurrence in period j:
Note how erratic they are, especially as risk set declines in later intervals.
p̂(t_j) = (n events_j) / (n at risk_j)
Key idea: Use observed event times to construct time intervals of different lengths, such that each interval contains only one observed event time, then apply standard discrete-time methods:
• By convention, construct an initial interval [0, 1.41).
• Since the first 3 observed event times are 1.41, 1.51 and 1.67, construct two subsequent intervals: [1.41, 1.51) and [1.51, 1.67).
• Continue through all the observed event times.
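The product-limit bookkeeping described above can be sketched in a few lines of pure Python (a minimal illustration, not the course’s own code; the toy call reuses the first Munich event times with a hypothetical censored tie at 1.51):

```python
def kaplan_meier(times, censored):
    """Kaplan-Meier (product-limit) estimate of the survivor function.

    times: observed time for each person (event or censoring time).
    censored: parallel list of booleans; True means the time is censored.
    Returns (event_time, S_hat) pairs at each distinct observed event time.
    """
    data = sorted(zip(times, censored))
    n_at_risk = len(data)
    s_hat = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [c for tt, c in data if tt == t]   # everyone with time t
        events = sum(1 for c in tied if not c)    # events at time t
        if events:
            p_hat = events / n_at_risk            # conditional event probability
            s_hat *= 1.0 - p_hat                  # product-limit update
            curve.append((t, s_hat))
        n_at_risk -= len(tied)                    # these people leave the risk set
        i += len(tied)
    return curve

# Toy call: Munich-style times, plus a hypothetical censored tie at 1.51.
curve = kaplan_meier([1.41, 1.51, 1.51, 1.67], [False, False, True, False])
```

Each distinct event time defines one "interval," exactly as in the slide's construction; censored observations shrink the risk set without dropping the survivor estimate.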
Estimating The Continuous-Time Survivor Function: Kaplan-Meier Approach
But How Do We Estimate the Continuous-Time Equivalent of the Hazard Probability – In Fact, Is There Such An Equivalent?
Advantages of the KM approach:
• Uses all the observed information on the continuous event times, without grouping or “binning up.”
• If event occurrence is recorded in a truly continuous time-metric, the estimated survivor function appears almost ‘continuous.’
• The estimated survivor function is as refined as the fineness of the data collection.
Drawbacks of the KM approach:
• When examining plots for subgroups, any “drops” will occur in different places, making visual comparison trickier.
• No corresponding (decent) estimate of hazard is available. You could compute

  ĥ_KM(t_j) = p̂_KM(t_j) / width_j

but these estimates tend to be too erratic to be of much direct use.
(ALDA, Section 13.3, p 483-491)
[Figure: Kaplan-Meier estimate of the survivor function, S(t_j) (0 to 1.00) vs. seconds after the light turns green (0 to 20).]
Kaplan-Meier Estimates of the Sample Survivor Function: Pros And Cons
(ALDA, Section 13.1.2 & 13.1.2, pp 472-475)
Hazard
Hazard again assesses risk, at a particular moment, that an individual who has not yet done so will experience the event. But it cannot be defined as a probability because, in continuous time, any such probability will always tend to zero!
Instead, We Define Hazard As A “Rate” … the limit of the probability that T_i falls in the interval, divided by the width of the interval (Δt), as Δt → 0.

Divide time into an infinite number of vanishingly small intervals [t_j, t_j + Δt):

  h(t_ij) = lim_{Δt → 0} { Pr[T_i falls in the interval [t_j, t_j + Δt) | T_i ≥ t_j] / Δt }
Tips on Interpreting Continuous-Time Hazard:
• It’s not a probability; it’s a “rate,” or “probability per unit of time.”
• You need to be explicit about the unit of time: 60 mph, 60K/yr.
• Unlike discrete-time hazard probabilities, continuous-time hazard rates can exceed 1 (this has implications for the statistical model: we model log hazard).
• (Intuition: similar to thinking about the number of events occurring in a finite period, and then dividing by the length of the period.)
(each interval includes t_j but excludes t_j + Δt)
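The “rate” intuition above can be made concrete with hypothetical counts: the same underlying risk yields different numeric values (one above 1, one below) depending on the time unit chosen:

```python
# Hazard is a rate (events per unit of exposure time), so its numeric
# value depends on the time unit. Counts here are hypothetical.
events = 10
years = 2.0
rate_per_year = events / years            # 5.0 events per person-year
rate_per_month = events / (years * 12)    # the same risk, expressed per month
```

This is why a continuous-time hazard rate, unlike a probability, can exceed 1 in one metric and not in another.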
Defining the Continuous-Time “Hazard-Rate”
(ALDA, Section 13.4, pp 488-491)
Cumulative Hazard-Rate Function
• Assesses the total amount of accumulated risk individual i has faced from the beginning of time (t_0) to the present (t_j).
• By definition, begins at 0 and rises monotonically over time (never decreasing).
• Has no directly interpretable metric, and is not a probability.
• Cumulation prevents you from using it to directly assess unique risk, but examining its changing shape allows us to deduce the information we need.
• And, the good news is that it can be estimated directly from the survivor function.

  H(t_ij) = cumulation of h(t_ij) between t_0 and t_j
Conclusion: If you had a way of estimating H(t), you could deduce the shape of h(t) by studying how the gradient of H(t) changed over time; any change in the gradient would reflect a corresponding change in h(t).

First, let’s think conceptually and imagine the transition from h(t) to H(t). In this example, because h(t) is constant, the corresponding H(t) would increase linearly with time (because the same fixed amount of risk, the constant value of hazard, is added to the prior cumulative level at each successive instant).

Now, think your way back from H(t) to h(t), because this is what you will actually need to do in practice:
• Guesstimate the rate of increase in H(t) at different points in time.
• Because the slopes are identical here, the rate of change in H(t) is constant over time, indicating that the level of h(t) is constant over time.
Estimating the Hazard-Rate: The Key is the Cumulative Hazard-Rate Function
(ALDA, Section 13.4.1, pp 488-491)
When H(t) accelerates over time, h(t) must be increasing (the linear increase in h(t) is not guaranteed, but a steady increase is).
When H(t) decelerates over time, h(t) must be decreasing: over time, a smaller amount of risk is added to H(t), suggesting an asymptote in h(t).
When H(t) accelerates then decelerates, h(t) must be initially low, then increase and then decrease; when the rate of increase in H(t) reverses itself, h(t) has hit a peak (or trough).
From Cumulative Hazard-Rate To Hazard-Rate: Develop Your Intuition
(ALDA, Section 13.4.2, p 491-494)
Conclusion: Hazard-rate is initially low, increases until around the 5th second, and then decreases again.

How Can We Systematically Quantify These Changing Rates Of Increase So As To Estimate Hazard-Rate?
[Figure: estimated cumulative hazard Ĥ(t_j) (0 to 3.50) vs. seconds after the light turns green (0 to 20); examine the changing slopes in Ĥ(t) to learn about the hazard-rate.]
The “−ln S(t)” Method
• It requires calculus to prove, but it can be established that H(t_j) = −ln S(t_j).
• So … you can estimate H(t) by taking the negative log of the KM-estimated survivor function.
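A minimal sketch of the −ln S(t) method, applied to hypothetical Kaplan-Meier survivor estimates:

```python
import math

def cumulative_hazard(km_curve):
    """Estimate H(t) = -ln S(t) from Kaplan-Meier survivor estimates.

    km_curve: (time, S_hat) pairs with S_hat > 0.
    Returns (time, H_hat) pairs.
    """
    return [(t, -math.log(s)) for t, s in km_curve]

# Hypothetical survivor estimates at three event times:
H = cumulative_hazard([(1, 0.9), (2, 0.75), (3, 0.5)])
```

Because S(t) never increases, the resulting Ĥ(t) rises monotonically, as a cumulative hazard must.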
[Figure: negative log survivor function, annotated: slowest rate of increase at first, then a faster rate of increase, then slowing down.]
Cumulative Hazard-Rate Function In Practice: Estimation Methods & Data Analytic Practice
(ALDA, Section 13.5, p 494-497)
[Figure: kernel-smoothed hazard estimates computed with bandwidth = 1, 2, and 3.]
Kernel-Smoothed Estimates Of The Hazard-Rate Function
Idea: Use the changing rates of change in cumulative hazard to generate (admittedly erratic) hazard-rate estimates, and smooth them out …
• h(t_j) = rate of change in {−ln S(t_j)}.
• So … successive differences in sample cumulative hazard yield “pseudo-slope” estimates of hazard.
• Slide a temporal window (a “bandwidth”) across the plot and aggregate these estimates together in a moving average.
• Yields “kernel-smoothed” approximate hazard-rate estimates.
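A rough sketch of this pseudo-slope-plus-moving-average idea, using a uniform (boxcar) kernel rather than any particular package’s smoother; all names and data are ours:

```python
def kernel_smoothed_hazard(times, H, bandwidth):
    """Crude kernel-smoothed hazard estimate (uniform kernel).

    times, H: cumulative-hazard estimates at successive event times.
    For each interval midpoint, average the 'pseudo-slopes'
    (differences in H over differences in time) whose midpoints
    fall within +/- bandwidth.
    """
    mids, slopes = [], []
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        mids.append((times[i] + times[i - 1]) / 2)
        slopes.append((H[i] - H[i - 1]) / dt)       # erratic raw estimates
    smoothed = []
    for t in mids:
        window = [s for m, s in zip(mids, slopes) if abs(m - t) <= bandwidth]
        smoothed.append((t, sum(window) / len(window)))  # moving average
    return smoothed

# Hypothetical check: a linearly rising H implies a constant hazard.
out = kernel_smoothed_hazard([1, 2, 3, 4], [0.2, 0.4, 0.6, 0.8], bandwidth=1)
```

Widening `bandwidth` averages over more pseudo-slopes, which is exactly the smoothness-versus-fidelity trade-off described on the next slide.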
Finally… A Computational Window On The Continuous-Time Hazard-Rate
But, as the bandwidth widens:
• The link between the smoothed function and the actual hazard-rate is weakened, because we are estimating the hazard-rate’s average within a broader timeframe.
• The estimates cannot be computed near the beginning and the end, because of the need to average (a big problem if hazard is highest initially).
Sample: 194 inmates released from a minimum security prison
Research Design: Each was followed for up to 3 years.
Event: Whether and, if so, when they were re-arrested.
Arrest recorded to the nearest day.
N=106 (54.6%) were reincarcerated.
Data source:Kristin Henning and colleagues, Criminal Justice and Behavior
Person-Level Dataset (note: we do not use a person-period data set)
• PERSONAL identifies the 61 former inmates (31.4%) who had a history of person-related crimes (e.g., assault, kidnapping).
• PROPERTY identifies the 158 (81.4%) who had a history of property crimes.
• AGE at release, centered on the sample mean of 30.7.
Event Occurrence Information
Introducing Cox Regression Analysis: Illustrative Data Example
(ALDA, Section 14.1.1, pp. 504-507)
Sample Survivor Functions: Recidivism is high in both groups, and those with a history of person-related crimes are at greater risk (estimated median lifetimes of 17.3 vs. 13.1).

Sample Cumulative Hazard Functions: Approximately linear immediately after release, each soon accelerates (but at different times); eventually both decelerate. This suggests that each underlying hazard function is initially steady, then rises, then falls.

Sample Kernel-Smoothed Hazard Functions: We can’t describe risk immediately after release, but by month 8 we can see that the hazard-rate for those with PERSONAL=1 is consistently higher than for those with PERSONAL=0.
Intuitively, a continuous-time hazard-rate model should look like a DT hazard probability model, in which a sensible transformation of hazard is expressed as the sum of two components:
• A baseline function, the value of transformed hazard-rate when all predictors are 0
• A weighted linear combination of predictors.
Realistically, because we lack a complete picture of hazard, we develop the model conceptually in terms of cumulative hazard. Then, we use algebra to deduce an equivalent specification of the model in terms of hazard-rate.
Towards The Cox Model For Continuous-Time Hazard-Rate: Sample Functions, By PERSONAL
(ALDA, Section 14.1.1, pp. 504-507)
What Kind Of Statistical Model Should We Use To Model Log H(t)?
A dual partition still makes sense, with log H(t) expressed as the sum of two parts:
[Figure: sample cumulative hazard functions H(t_j) (0 to 1.50) vs. months after release (0 to 36), by PERSONAL.]
Problem: Cumulative Hazard-Rate is bounded below at 0.
[Figure: sample log cumulative hazard functions, log H(t_j) (−6.00 to 1.00) vs. months after release (0 to 36), by PERSONAL.]
Solution: Model the Log of Cumulative Hazard-Rate …
Taking logs expands the vertical separation at smaller values and compresses it at higher values.
• A baseline function, now the value of log H(t) when all predictors are 0.
• A weighted linear combination of the predictors.
But, How Do We Specify This Baseline?
As in discrete-time survival analysis, we use a completely general, unconstrained profile, which we’ll call log H_0(t_j), and we won’t even estimate it!!! You might think that this in-built vagueness creates problems for estimation, but the beauty of Cox’s approach is that it’s perfectly fine.
Towards The Cox Regression Model For Cumulative Hazard-Rate
(ALDA, Section 14.1.2, pp. 507-512)
log H(t_ij) = log H_0(t_j) + β1·PERSONAL_i

when PERSONAL = 0:  log H(t_ij) = log H_0(t_j)
when PERSONAL = 1:  log H(t_ij) = log H_0(t_j) + β1

When PERSONAL = 1, the baseline function shifts “vertically” by β1.
Mapping the model onto sample log cumulative hazard functions (using distinct plotting symbols to denote the estimated subsample values):

[Figure: log H(t_ij) (−5.00 to 1.00) vs. months after release (0 to 36); the lower curve is log H_0(t_j) (PERSONAL = 0), the upper curve is log H_0(t_j) + β1 (PERSONAL = 1).]

The curves are hypothesized population log cumulative hazard-rate functions (they should go through the sample data, but we don’t expect them to fit perfectly).
The vertical distance between the functions, β1, captures the magnitude of the predictor’s effect. (We assume that the effect is constant regardless of how long the offender has been out of prison.)
Specifying The Cox Regression Model In Terms Of Log Cumulative Hazard-Rate
(ALDA, Section 14.1.2, pp. 507-512)
log H(t_ij) = log H_0(t_j) + β1·PERSONAL_i
H(t_ij) = H_0(t_j) · e^(β1·PERSONAL_i)

when PERSONAL = 0:  H(t_ij) = H_0(t_j)
when PERSONAL = 1:  H(t_ij) = H_0(t_j) · e^(β1)

When PERSONAL = 1, the baseline function is no longer shifted vertically; instead, it is multiplied by exp(β1).
Mapping the model onto sample cumulative hazard-rate functions (using distinct plotting symbols to denote the estimated subsample values):

[Figure: H(t_ij) (0 to 1.50) vs. months after release (0 to 36); the lower curve is H_0(t_j) (PERSONAL = 0), the upper curve is H_0(t_j)·exp(β1) (PERSONAL = 1).]

The curves are hypothesized population cumulative hazard-rate functions.
Ratio of Cumulative Hazard-Rate Functions:

  [H_0(t_j) · exp(β1)] / H_0(t_j) = exp(β1)
When the outcome is the raw Cumulative Hazard-Rate, the functions are magnifications and diminutions of each other: they are Proportional. Yet we still say the effect is constant over time, because their ratio is constant.
Antilogging To Specify The Cox Regression Model In Terms Of Cumulative Hazard-Rate
(ALDA, Section 14.1.3, pp. 512-516)
Using calculus, we can show that the Cox Regression Models just specified in terms of Cumulative Hazard-Rate are identical to those expressed in terms of Raw Hazard-Rate.
log H(t_ij) = log H_0(t_j) + β1·X_i
H(t_ij) = H_0(t_j) · e^(β1·X_i)
Cumulative Hazard-Rate Format:

[Figure, left panel: H(t_ij) (0 to 16.00) vs. time (0 to 100); curves H_0(t_j) and H_0(t_j)·exp(β1); ratio = exp(β1).]
[Figure, right panel: log H(t_ij) (−14.00 to 4.00) vs. time (0 to 100); curves log H_0(t_j) and log H_0(t_j) + β1; difference = β1.]

When expressed on a log scale, β1 represents a constant vertical distance. When expressed on a raw scale, exp(β1) represents a constant vertical ratio.

Raw Hazard-Rate Format:

  log h(t_ij) = log h_0(t_j) + β1·X_i
  h(t_ij) = h_0(t_j) · e^(β1·X_i)

[Figure, left panel: h(t_ij) (0 to 0.25) vs. time (0 to 100); curves h_0(t_j) and h_0(t_j)·exp(β1); ratio = exp(β1).]
[Figure, right panel: log h(t_ij) (−14.00 to 4.00) vs. time (0 to 100); curves log h_0(t_j) and log h_0(t_j) + β1; difference = β1.]
Practical Consequences
1. Can conduct exploratory data analysis using cumulative hazard-rate.
2. Can interpret parameter estimates in terms of predictors’ effects on hazard-rate.
3. Because raw hazard-rate profiles at different levels of the predictors are proportional, the Cox model is often called a “proportional hazards model.”
Hazard-Rate Representation Of The Cox Regression Model
(ALDA, Section 14.2, pp 516-523)
log h(t_ij) = log h_0(t_j) + β1·X_1ij + β2·X_2ij + … + βP·X_Pij

h(t_ij) = h_0(t_j) · exp(β1·X_1ij + β2·X_2ij + … + βP·X_Pij)
Estimation
In addition to specifying a statistical model for hazard-rate, Cox developed an ingenious method for fitting his “Cox Regression Model” to data, called the method of Partial Maximum Likelihood Estimation; it is available in all major stat packages (see ALDA §14.2).
Three Practical Consequences Of Cox’s Method
• The Shape Of The Baseline Hazard-Rate Function Is Irrelevant. Unlike parametric methods, we need not make any assumptions about the shape of the baseline hazard-rate function.
• The Precise Event Times Turn Out To Be Irrelevant; Only Their Rank Order Matters. Cox regression analysis is semi-parametric. The very data that you took pains to collect so precisely is converted effectively into ranks during model fitting!
• Ties Can Create Analytic Difficulties. Even though the specific time values are irrelevant, their ranking does matter. In theory, there should be no ties; in reality, there always are. (In the recidivism data, there are 5 days when 2 people were arrested—9, 77, 178, 207, & 528.) All packages have one or more ad-hoc ways of dealing with this (we use Efron’s Method).
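To see why only the rank order of the event times matters, here is a toy log partial likelihood for a single predictor with no ties (a sketch of the idea, not a production fitter; the data are hypothetical). Rescaling every time by 10 changes nothing, because the risk sets, and hence the likelihood, depend only on the ordering:

```python
import math

def log_partial_likelihood(beta, times, events, x):
    """Cox log partial likelihood for one predictor, assuming no ties.

    times: observed times; events: 1 if event, 0 if censored;
    x: predictor values. The risk set at an event time is everyone
    whose observed time is >= that time.
    """
    ll = 0.0
    for i, (t, e) in enumerate(zip(times, events)):
        if e:
            # Sum of exp(beta * x) over the risk set at time t.
            risk = sum(math.exp(beta * x[j])
                       for j, tj in enumerate(times) if tj >= t)
            ll += beta * x[i] - math.log(risk)
    return ll

t = [2.0, 3.5, 5.1, 7.9]
e = [1, 1, 0, 1]          # third observation is censored
x = [1, 0, 1, 0]
a = log_partial_likelihood(0.5, t, e, x)
b = log_partial_likelihood(0.5, [10 * v for v in t], e, x)  # same ranks
```

The baseline hazard h_0(t) cancels out of every ratio, which is exactly why its shape is irrelevant, and why ties (which make the ordering ambiguous) require approximations such as Efron’s method.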
Fitting The Cox Regression Model To Data
Simple Uncontrolled Models
Overall Model
Strategy for interpreting parameter estimates: each summarizes the impact of a one-unit difference in the predictor on the log hazard-rate, controlling for the other predictors in the model.

The log hazard-rate for someone with a history of personal offenses is 0.479 units higher than for someone without this history.

What Does This Look Like Graphically? Returning to the earlier sample log cumulative hazard-rate functions, by PERSONAL, we estimate that, in the population, the average distance between them is 0.479.
But, Is There A More Intuitive Way Of Explaining This?
(ALDA, Section 14.3.1, p. 524-528)
Interpreting Parameter Estimates From a Fitted Cox Regression Model
You can antilog the parameter estimates and interpret them as fitted hazard-rate ratios associated with a 1-unit difference in the predictor …

exp(1.1946) = 3.30: the estimated hazard-rate describing recidivism among offenders with a history of property offenses is more than three times that of those with no such history.

For continuous predictors, compute the percentage difference in hazard associated with a 1-unit difference in the predictor: 100 × (hazard ratio − 1).

100 × (0.9342 − 1) = −6.58%: the estimated hazard-rate for recidivism is 6.6% lower for each additional year of age upon release.
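Both interpretations follow from simple antilogging; the numbers below are the slide’s own estimates (coefficient 1.1946 for PROPERTY, hazard ratio 0.9342 for AGE):

```python
import math

def pct_difference_in_hazard(hazard_ratio):
    """Percentage difference in hazard per 1-unit predictor difference."""
    return 100 * (hazard_ratio - 1)

# PROPERTY: antilog the coefficient to get a hazard-rate ratio.
property_ratio = math.exp(1.1946)             # a bit over 3.30

# AGE: convert the hazard ratio to a percentage difference per year.
age_pct = pct_difference_in_hazard(0.9342)    # about -6.58% per year
```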
Careful: You Can Only Make Comparative Statements about Fitted Hazard-Rate, from Cox Regression Output …
• You can say that the hazard-rate for one group is three times that of another, but you cannot say how high, or low, either function actually is.
• This is the critical compromise associated with partial ML approach used to fit the Cox Regression Model.
(ALDA, Section 14.3.1, p. 524-528)
Interpreting Parameter Estimates as Hazard-Rate Ratios In A Fitted Cox Regression Model
(ALDA, Section 14.3.4, p. 532-535)
Q: How Do You Compare Each Person’s Unique Risk Of Event Occurrence To That Of A “Baseline Person”? (e.g., to someone with values of all predictors equal to 0; here, a person of average AGE at release (30.7), with no history of PERSONAL or PROPERTY crime)
A: You do it by taking the ratio of their hazard functions:

  h(t_ij) / h_0(t_j) = [h_0(t_j) · exp(β1·X_1i + β2·X_2i + … + βP·X_Pi)] / h_0(t_j)
                     = exp(β1·X_1i + β2·X_2i + … + βP·X_Pi), the “risk score”
Average Comparative Risk: But participants arrived by different routes:
• ID 22 was of average age on release, with no history of these crimes.
• ID 8 had a history of both crimes, but was 22 years older than the average inmate upon release.

High Comparative Risk:
• All were younger than average on release.
• The risk that ID 5 will re-offend is over seven times that of a baseline person.
Low comparative risk:• All much older than average on release.• None has history of both crimes.
Risk scores are useful for demonstrating that there is more than one way to attain a given level of risk but….
Careful: Changing the baseline by centering the predictors changes the values of the risk scores …
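A small sketch of risk-score arithmetic. The coefficients below are hypothetical round numbers in the spirit of the fitted model, not the actual estimates; the last line illustrates the warning that recentering a predictor changes every risk score:

```python
import math

def risk_score(coefs, x):
    """exp(x'b): hazard relative to the baseline person (all predictors 0)."""
    return math.exp(sum(b * v for b, v in zip(coefs, x)))

# Hypothetical coefficients for (PERSONAL, PROPERTY, AGE centered at 30.7):
coefs = [0.48, 1.19, -0.07]

baseline = risk_score(coefs, [0, 0, 0])             # exactly 1 by construction
older_no_history = risk_score(coefs, [0, 0, 10.0])  # below 1: lower risk
young_both = risk_score(coefs, [1, 1, -8.0])        # well above 1: higher risk

# Recenter AGE (shift the baseline person): every score changes.
recentered = risk_score(coefs, [0, 0, 10.0 - 5.0])
```

Different predictor profiles can land on similar scores, which is the point of the slide: there is more than one route to a given level of risk, but the scores are only meaningful relative to the chosen baseline.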
Using Risk Scores: Summarizing the Impact Of Several Predictors Simultaneously
(ALDA, Section 14.4, p. 535-542)
Even though we have repeatedly stated that Cox Regression Analysis provides no information about the Baseline Hazard-Rate Function, it is actually possible to recover estimated baseline functions from the fitted model …
Useful for documenting the combined effects of predictors. Here, we use Model D to control for AGE and demonstrate the combined effect of the predictors PERSONAL and PROPERTY, documenting the large differences in survival associated with variation in them.
Recovering Survivor & Cumulative Hazard-Rate Functions From A Fitted Cox Regression Model
(ALDA, Section 15.1, pp 544-545)
Model specification is easy: just add the subscript j to the time-varying predictors.

But data demands can be enormously high (sometimes insurmountable):
• You need to know the value of any time-varying predictor, for everyone still at risk, at every moment when anyone experiences the event.
• This is the same requirement as in discrete time, but it was unproblematic there because (1) the number of unique event times was relatively small, and (2) event occurrence and predictors are typically assessed on the same schedule.
• In continuous time, you typically can’t set the data-collection schedule to coincide with event occurrence for everyone still at risk.
h(t_ij) = h_0(t_j) · exp(β1·X_1ij + β2·X_2ij)
Practical Implications
• If you’re interested in time-varying predictors, research design is crucial: don’t wait until the data are collected.
• Time-varying predictors that are non-reversible dichotomies (those that themselves represent event occurrence) are easiest to collect data on (e.g., 1st marriage, HS graduation).
• Reversible dichotomies and continuous predictors usually require data imputation, and things can get very complex very quickly (discussed in Sections 15.1.2 and 15.1.3).
Including Time-Varying Predictors In A Cox Regression Model
(ALDA, Section 15.1.1, pp 545-551)
Sample: 1,658 men interviewed twice (in 1974 and 1985); 382 (23.0%) started using cocaine between ages 17 and 41.
Data source: Burton and colleagues (1996), Journal of Health and Social Behavior.
Three Time-Invariant Predictors EARLYMJ and EARLYOD indicate whether the respondent had initiated marijuana (7.2%) or other drugs (3.7%) so early that he could be characterized as a previous user at t0 (age 17).
BIRTHYR (1961-1985), to account for societal changes (included as a control predictor in every model).
Four Time-Varying Predictors
• USEDMJ_j, SOLDMJ_j, USEDOD_j and SOLDOD_j each identify, at each age t_j, whether the respondent had previously used or sold marijuana (MJ) or other drugs (OD).
• Conceptually, think about a person-period data set in which these variables switch from 0 to 1 in the relevant year and stay at 1 thereafter.
• In reality, we do not use a person-period data set, but rather computer code in a person-level data set (Section 15.1, p. 547).
• Rather than using contemporaneous values of the TV predictors, we lag them by one year. This addresses issues of rate- and state-dependence (discussed in Section 12.3.3, p. 440).
Including Time-Varying Non-Reversible Dichotomies As Predictors In A Cox Model: Example
(ALDA, Section 15.1.1, pp 545-551)
A: Only Time-Invariant Predictors Included
• All 3 stat sig.

B: Substitute Time-Varying Use Predictors
• Effects much larger (and still stat sig.).
• Fit much better.

C: Add Time-Varying Sales Predictors
• Creates an ordinal variable when paired with the use predictors.
• Both use and sales are stat sig.
• Hazards add up: someone who both used and sold MJ and OD has a hazard ratio of exp(5.1606) = 164.27!
• Best-fitting model so far.

D: Add Back Time-Invariant Predictors
• Estimates are not stat sig.
• D fits no better than C.
• We prefer model C.

Note: Diminishing BIRTHYR effects
• Uncontrolled estimate = 0.2026.
• Drops from 0.1551 to 0.0849 from A to C.
• Effects previously attributable to BIRTHYR get absorbed by TV drug use (known as substitution effects).
Interpreting Results Of Fitting Cox Regression Models With Time-Varying Predictors
(ALDA, Section 15.3.2, pp 564-570)
Sample: 174 teens admitted to a psychiatric hospital.

Research Design: True experiment. Half (n=88) had traditional psychiatric treatment and services (TREAT=0); the other half (n=86) were randomly selected to participate in an innovative program that provided coordinated mental health services regardless of setting (in- or out-patient). Everyone was tracked for up to 3 months to determine whether and, if so, when they were released.

RQ: Does provision of comprehensive mental health services reduce the length of hospital stay?

Data source: Michael Foster and colleagues (1996), Evaluation and Program Planning.
Fitted Cox model for TREAT: Does this statistically non-significant effect mean that the TREATment has no effect? Perhaps, but not necessarily: it could be that the effect of TREATment varies over time.
Might The Effect Of A Predictor Differ Over TIME?
(ALDA, Section 15.3.2, pp 564-570)
log h(t_ij) = log h_0(t_j) + β1·TREAT_i
h(t_ij) = h_0(t_j) · e^(β1·TREAT_i)
Proportional Hazards Assumption: any predictor’s effect must only produce a constant difference in the elevation of the log hazard-rate profile (or a magnification/diminution of the raw hazard profiles).

In discrete time, we could plot the sample hazard probability functions to see if there was a violation; in continuous time, it’s not easy to plot sample hazard-rate functions.

Solution: inspect sample cumulative hazard-rate functions, because model equivalence means that these plots can tell us what we need to know about potential violations of the proportionality assumption.
[Figure: fitted log H(t) (−4 to 2) vs. days in hospital (0 to 77), for the Treatment and Comparison groups. The TREATment effect appears large early; there is little TREATment effect late.]
If The Proportionality Assumption Is Violated For A Predictor, Then There Is An Interaction Between The Predictor And TIME.
How Might We Detect A Violation Of The “Proportional Hazards” Assumption?
(ALDA, Section 15.3.2, pp 564-570)
Three Common Specifications For Interactions With TIME
A. Linear interaction with TIME: the effect of TREAT declines smoothly (linearly) over time.
B. Step-function: the effect of TREAT differs across epochs (here, we’ll ask about weeks).
C. Logarithmic: similar to linear, but handles the typically long tails for TIME.
Statistically Significant Linear Interaction with TIME
• By centering TIME on day 1, 0.7064 is the TREATment effect on the first day of hospitalization.

Effect of TREAT Differs by Week
• Note the decline in the estimates across the early weeks.

Logarithmic Interaction with TIME
• exp(2.5335) = 12.60 is the estimated hazard ratio on day #1.
• The estimated log hazard for TREAT declines by 0.5301 each time length of stay doubles (1 to 2, 2 to 4, 4 to 8, etc.): Day 1 = 12.60; Day 8 = 2.56; Day 32 = 0.89.
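The day-by-day hazard ratios quoted above follow directly from the logarithmic-interaction estimates; a quick check (using the slide’s coefficients 2.5335 and −0.5301, with log base 2 so the decline applies per doubling of stay):

```python
import math

def treat_hazard_ratio(day, b_treat=2.5335, b_interact=-0.5301):
    """Fitted TREAT hazard ratio under a log2(day) interaction with TIME,
    using the coefficient estimates reported on the slide."""
    return math.exp(b_treat + b_interact * math.log2(day))

hr_day1 = treat_hazard_ratio(1)    # about 12.6
hr_day8 = treat_hazard_ratio(8)    # about 2.57
hr_day32 = treat_hazard_ratio(32)  # about 0.89
```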
Fitting Non-Proportional Cox Regression Models