Threshold Regression Models

Mei-Ling Ting Lee, University of Maryland, College Park

MLTLEE@UMD.EDU

Outline

• An example to demonstrate the usefulness of the first-hitting time based threshold regression (TR) model.

• Introduction of the TR model

• Connections with the PH model

• Semi-parametric TR model

• Simulations

A non-proportional hazard example: Time to infection of kidney dialysis patients with different catheterization procedures (Nahman et al. 1992; Klein & Moeschberger 2003)

• Surgical group: 43 patients utilized a surgically placed catheter

• Percutaneous group: 76 patients utilized a percutaneous placement of their catheter

The survival time is defined by the time to cutaneous exit-site infection.

Kaplan-Meier Estimate versus PH Cox Model

Weibull versus Lognormal

Loglogistic versus Gamma

Kaplan-Meier Estimate versus First-hitting-time based Threshold Regression Model

First-hitting Time Based Threshold Regression: Modeling Event Times by a Stochastic Process Reaching a Boundary (Lee & Whitmore 2006, Statistical Science)

• Example: Equipment Failure: Equipment fails when its cumulative wear first reaches a failure threshold.

Question: What is the influence of ambient temperature on failure?

Occupational and Environmental Health

Occupational risk: A railroad worker is exposed to diesel exhaust in the workplace. The exposure and other influences cause the worker’s health status to gradually tend downward toward a critical threshold that will result in death from a particular cause (e.g., lung cancer).

Question: Does diesel exhaust exposure increase the risk of lung cancer and, if so, to what degree?

[Figure: a sample path of the process Y(t) plotted against time t, starting at y0 and first reaching the boundary at level 0 at time S.]

First hitting time S of a fixed boundary at level zero for a stochastic process of interest Y(t)

First Hitting Time (FHT) Models

{Y(t)} : the stochastic process of interest

B : the boundary set

First hitting time S defined by

S = inf { t : Y(t) ∈ B }
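
The definition above can be illustrated with a small simulation. The sketch below is not from the slides: it assumes Y(t) is a Wiener process, takes the boundary set to be B = (-inf, 0], and approximates the path on a time grid of width dt. The function name and all numeric values are hypothetical.

```python
import math
import random


def simulate_first_hitting_time(y0, mu, sigma=1.0, dt=0.01, t_max=100.0, rng=None):
    """Simulate one discretized Wiener path Y(t) and return the first grid time
    at which it lies in B = (-inf, 0], or None if that has not happened by
    t_max (the path is then still 'surviving' at the end of follow-up)."""
    rng = rng or random.Random()
    y, t = y0, 0.0
    step_sd = sigma * math.sqrt(dt)
    while t < t_max:
        # Euler step: drift contribution plus a Gaussian increment.
        y += mu * dt + step_sd * rng.gauss(0.0, 1.0)
        t += dt
        if y <= 0.0:  # Y(t) has entered the boundary set B
            return t
    return None


# Hypothetical settings: start at y0 = 2 with drift -0.5 toward the boundary.
rng = random.Random(2024)
times = [simulate_first_hitting_time(2.0, -0.5, rng=rng) for _ in range(200)]
observed = [s for s in times if s is not None]
print(len(observed), "of", len(times), "paths hit the boundary before t_max")
```

Because the path is only checked on the grid, the simulated S slightly overstates the continuous-time first hitting time; a smaller dt reduces that bias.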

Examples of first hitting time (FHT) models:

• Wiener diffusion to a fixed boundary: progress of multiple myeloma until death

• Renewal process to a fixed count of renewal events: time to the nth epileptic seizure

• Semi-Markov process to an absorbing state: multi-state model for disease with death as an absorbing state

[Figure: sample paths Y(t) plotted against time t, starting at y0, with S and L marked on the time axis and the boundary at level 0.]

Two sample paths of a stochastic process of interest:

(1) One path experiences ‘failure’ at first hitting time S

(2) One path is ‘surviving’ at end of follow up at time L

Parameters for the FHT Model

Model parameters for the latent process Y(t) :

• Process parameters: θ = (μ, σ²), where μ is the mean drift and σ² is the variance

• Baseline level of process: Y(0) = y0

• Because Y(t) is latent, we set σ² = 1

Likelihood Inference for the FHT Model

The likelihood contribution of each sample subject is as follows.

• If the subject fails at S=s:

f(s | y0, μ) = Pr [ first-hitting-time in (s, s+ds) ]

• If the subject survives beyond time L:

1 - F(L | y0, μ) = Pr [ no first-hitting-time before L ]

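The log-likelihood over the n sample subjects is

$$\ln L(x_0, \mu) = \sum_{i=1}^{n} \left[ d_i \ln f(t_i \mid x_0, \mu) + (1 - d_i) \ln \bar{F}(t_i \mid x_0, \mu) \right],$$

where

• $x_0$ is the baseline level Y(0) of the process

• $d_i$ is the failure indicator for subject $i$

• $t_i$ is a censored survival time ($t_i = s_i$ if subject $i$ fails)

• $f$ and $\bar{F}$ denote the FHT p.d.f. and complementary c.d.f.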
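
As a rough illustration of this likelihood, the sketch below evaluates it for a Wiener process with unit variance hitting zero, for which the first hitting time has the standard inverse Gaussian density and survival function. The function names, the use of one common (y0, mu) for all subjects, and the data are assumptions made for the example, not material from the slides.

```python
import math


def std_normal_cdf(x):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def fht_density(t, y0, mu):
    """p.d.f. of the first hitting time of zero for a Wiener process with
    Y(0) = y0 > 0, mean drift mu and unit variance (inverse Gaussian form)."""
    return y0 / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-(y0 + mu * t) ** 2 / (2.0 * t))


def fht_survival(t, y0, mu):
    """Complementary c.d.f.: probability of no first hitting time before t."""
    root_t = math.sqrt(t)
    return (std_normal_cdf((y0 + mu * t) / root_t)
            - math.exp(-2.0 * y0 * mu) * std_normal_cdf((mu * t - y0) / root_t))


def log_likelihood(times, failed, y0, mu):
    """Sum over subjects of d_i ln f(t_i) + (1 - d_i) ln F-bar(t_i)."""
    total = 0.0
    for t, d in zip(times, failed):
        if d:
            total += math.log(fht_density(t, y0, mu))   # subject failed at t
        else:
            total += math.log(fht_survival(t, y0, mu))  # subject censored at t
    return total


# Hypothetical data: three failures and two subjects censored at t = 10.
times = [2.1, 3.4, 5.0, 10.0, 10.0]
failed = [1, 1, 1, 0, 0]
print(log_likelihood(times, failed, y0=2.0, mu=-0.3))
```

In a regression setting y0 and mu would differ by subject through their link functions; the common values here only keep the example short.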

Threshold Regression

Link Functions: parametric or semi-parametric

Simultaneous regressions: possible link functions for the baseline parameter Y(0) and the drift parameter μ include

• Linear combinations of covariates X1, …, Xp

• Polynomial combinations of X1, …, Xp

• Regression splines

• Penalized regression splines

• Random effects
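
For the simplest of these options, linear combinations of covariates, the two simultaneous regressions might look like the sketch below. It assumes a log link for the baseline level (which keeps it positive) and an identity link for the drift; that pairing is a common choice but is not stated on this slide, and the function name and coefficient values are hypothetical.

```python
import math


def tr_parameters(covariates, gamma, beta):
    """Map one subject's covariates to (y0, mu).

    covariates: values X1..Xp; gamma and beta: coefficient lists of length
    p + 1 whose first entry is the intercept.
    """
    design = [1.0] + list(covariates)
    ln_y0 = sum(g * x for g, x in zip(gamma, design))  # log link keeps y0 > 0
    mu = sum(b * x for b, x in zip(beta, design))      # identity link for the drift
    return math.exp(ln_y0), mu


# Hypothetical coefficients for a single covariate X1 = 1.
y0, mu = tr_parameters([1.0], gamma=[0.5, 0.2], beta=[-0.1, -0.3])
print(y0, mu)
```

The same covariate can enter both regressions, so its effect on the starting level and its effect on the rate of decline are estimated separately.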

Threshold regression (TR)

More than one simultaneous regression function, each with its own link, may be used to estimate the parameters of:

1. Process Y(t): Wiener process, gamma process, etc

2. Boundary threshold: straight lines or curves

3. Time scale: calendar or running time, analytical time

References

• Aalen, O. O. and Gjessing, H. K. (2001). Understanding the shape of the hazard rate: a process point of view. Statistical Science, 16: 1-22.

• Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data, Second Edition. Wiley.

• Lee, M.-L. T. and Whitmore, G. A. (2006). Threshold regression for survival analysis: modeling event times by a stochastic process reaching a boundary. Statistical Science, 21: 501-513.

• Aalen, O. O., Borgan, O., and Gjessing, H. K. (2008). Survival and Event History Analysis: A Process Point of View. Springer.

Threshold Regression Interpretation of PH functions

• Most survival distributions can be related to hitting time distributions for some stochastic processes.

• Families of PH functions can be generated by varying time scales or boundaries of a TR model

• The same family of PH functions can be produced by different TR models.

• Simulation studies: the TR model and the PH model hold simultaneously, based on standard Brownian motion with variation of the time scale (Julia Batishev’s presentation in sec. 2 of Track 2 on July 3rd).

Threshold Regression

If Y(t) is a Wiener process, then the first hitting time S has an inverse Gaussian distribution.

Consider

Semi-parametric Threshold Regression (Joint work with Z. Yu and W. Tu)

When Y(t) is a Wiener process, the first hitting time S has an inverse Gaussian distribution.

We consider

where the functional form of the nonparametric function is unspecified.

Semi-parametric TR using Regression Spline

• Use linear link for covariates X1, …, Xp

• For covariate Z, write the nonparametric function of z as a linear combination of a set of basis functions Bj(z), i.e., a sum over j of a coefficient times Bj(z).

• Select the smoothing parameter and the number of knots.
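
To make the basis-function idea concrete, the sketch below evaluates such a linear combination with a cubic truncated power basis. The slides do not say which basis is used, so the basis choice, the function names, the knot locations, and the coefficient values are all assumptions for illustration.

```python
def truncated_power_basis(z, knots, degree=3):
    """Evaluate the basis functions B_j(z): z, z^2, ..., z^degree, followed by
    one truncated power term (z - k)_+^degree per knot."""
    powers = [z ** d for d in range(1, degree + 1)]
    truncated = [max(z - k, 0.0) ** degree for k in knots]
    return powers + truncated


def spline_value(z, coefficients, knots, degree=3):
    """Linear combination: sum over j of coefficient_j * B_j(z)."""
    basis = truncated_power_basis(z, knots, degree)
    if len(coefficients) != len(basis):
        raise ValueError("need one coefficient per basis function")
    return sum(c * b for c, b in zip(coefficients, basis))


knots = [0.25, 0.5, 0.75]                        # hypothetical knot locations
coefficients = [1.0, -0.5, 0.2, 0.8, -1.1, 0.4]  # hypothetical coefficient values
print(spline_value(0.6, coefficients, knots))
```

Since the function is linear in its coefficients, they can be estimated together with the coefficients of X1, …, Xp in the same likelihood; the number of knots controls how flexible the fitted curve is.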

Semi-parametric TR using Penalized Spline

In addition to regression spline, we also consider a cubic spline approach with penalty on the second derivative of the nonparametric function
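
A penalty of this kind is usually written as a penalized log-likelihood. The display below is a generic sketch rather than the authors' exact criterion: $g$ stands for the nonparametric function, $\beta$ for the coefficients of the linear terms, $\ell$ for the log-likelihood, and $\lambda \ge 0$ for the smoothing parameter; all of these symbols are assumptions introduced here.

```latex
\ell_{p}(\beta, g) \;=\; \ell(\beta, g) \;-\; \frac{\lambda}{2} \int \left\{ g''(z) \right\}^{2} \, dz
```

With $\lambda = 0$ the fit is an unpenalized spline, and as $\lambda$ grows the fitted function is pushed toward a straight line, whose second derivative is zero.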

Cross Validation

Simulation Procedures

Simulation Results

Table 1 summarizes simulation results using both spline approaches.

For the penalized spline approach, over 400 replications, the mean estimates of the two regression coefficients are 0.491 and 0.504 (both very close to the true value 0.5).

The means of the estimated standard errors are 0.311 and 0.180, which are very close to the empirical standard errors 0.307 and 0.177.

The empirical coverage probabilities are 0.952 and 0.947.

Our simulation results show satisfactory performance of the penalized spline approach with respect to regression coefficients and corresponding variance estimation.

For the parameter whose true value is 2, the mean of the estimate over 400 replications is 1.99, which is very close to the true value.

Analyzing Longitudinal Survival Data Using Threshold Regression:

Comparison with Cox Regression

Mei-Ling Ting Lee, University of Maryland

G. A. Whitmore, McGill University

Bernard Rosner, Harvard Medical School

Longitudinal Data Structure

Health examples:

• Annual monitoring of blood pressure

• Current status of disease

• Cohort study of smoking and lung cancer with bi-annual medical checks

[Figure: Longitudinal data structure for threshold regression. Process X(t) starts at x0 and is observed at a sequence of time points t1, t2, …, tm, with observations (x, z, f, c); S and the boundary level 0 are marked.]

Longitudinal Data Structure (cont.)

Individual observation sequences may include:

• Process level x

• Covariate vector z

• Failure indicator f

• Censoring indicator c

Longitudinal Data Structure (cont.)

Uncoupling Longitudinal Data

Definition of uncoupling: Break each longitudinal record into a series of single records.

Handling longitudinal data is simple with uncoupling.
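
A minimal sketch of uncoupling is given below. The record layout (a dict with a time-ordered list of visits) and the field names are assumptions made for the example; the slides do not prescribe a data format.

```python
def uncouple(record):
    """Split one subject's longitudinal record into single-interval records.

    record: dict with 'id' and 'visits', a time-ordered list of dicts holding
    time 't', covariates 'z', failure indicator 'f' and censoring indicator 'c'.
    Each output row pairs the covariates at the start of an interval with the
    outcome observed at its end.
    """
    rows = []
    visits = record["visits"]
    for start, end in zip(visits, visits[1:]):
        rows.append({
            "id": record["id"],
            "t_start": start["t"],
            "t_end": end["t"],
            "z": start["z"],
            "f": end["f"],
            "c": end["c"],
        })
        if end["f"] or end["c"]:  # the sequence stops at failure or censoring
            break
    return rows


# Hypothetical subject seen at years 0, 2 and 4, failing by the last visit.
record = {
    "id": 1,
    "visits": [
        {"t": 0, "z": {"age0": 50}, "f": 0, "c": 0},
        {"t": 2, "z": {"age0": 52}, "f": 0, "c": 0},
        {"t": 4, "z": {"age0": 54}, "f": 1, "c": 0},
    ],
}
rows = uncouple(record)
for row in rows:
    print(row)
```

Each resulting row can then be analyzed as an ordinary single-interval survival record, which is what makes the handling simple.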

Under what conditions is uncoupling valid?

Analysis of Longitudinal Threshold Regression Data with Uncoupling

Define the observation vector for each visit j :

The longitudinal observation sequence is stopped by censoring or failure at some visit m:

Analysis of Longitudinal Threshold Regression Data with Uncoupling

Probability of the stopped observation sequence:

Invoke a Markov assumption:
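
For orientation, a first-order Markov assumption is normally expressed as the factorization below. This is a sketch in assumed notation, with $A_j$ denoting the observation vector at visit $j$ and $m$ the visit at which the sequence stops; it is not copied from the slide.

```latex
P(A_0, A_1, \ldots, A_m)
  \;=\; P(A_0) \prod_{j=1}^{m} P(A_j \mid A_0, \ldots, A_{j-1})
  \;=\; P(A_0) \prod_{j=1}^{m} P(A_j \mid A_{j-1})
```

The first equality always holds; the second is the Markov assumption, under which each visit depends on the past only through the previous visit. This is what allows every interval to contribute its own one-step term.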

A Common Analytical Situation for P(Aj | Aj-1)

Consider independent censoring c, failure indicator f (f = 1 if failed, f = 0 if not), process reading x, and covariate z.

Without readings on process { X (t) } (i.e., the process of interest is latent), probability elements for the likelihood then involve simple conditional expressions:

Case Illustration

Threshold regression output for the illustrative longitudinal data set

Assume X(t) follows a standard Brownian motion from initial status x0 to a fixed barrier at zero. We can then make inferences about the influence of covariates on x0 at each visit.

Case Illustration

Case Illustration (cont.)

Threshold regression output has the familiar look of conventional regression output but offers greater insights and a richer interpretation

Uncoupling in Cox PH Regression

The main probability element

has the following form when Cox PH regression is uncoupled

Term h0j denotes a segment of a discrete baseline hazard function over [tj-1, tj).

Output

Case Illustration

The Nurses’ Health Study cohort data set

Questionnaire completed every two years

Interest in incidence of lung cancer

Longitudinal records from 1986 to 2000:

• 115,768 subjects

• 748,007 observational intervals

• 1,577,382 person-years at risk

The health process is latent.

Case Illustration (cont.)

Assume independent censoring.

Assume a Wiener diffusion process with zero mean and unit variance (Brownian motion).

• A zero mean is consistent with the data.

• A latent health status scale allows the variance to be fixed arbitrarily.

The only parameter is x0, the initial health status at baseline for each observation interval.

Case Illustration (cont.)

For each observation interval, the covariates are:

1. Baseline cumulative smoking (pkyrs0, in pack years).

2. Baseline age (age0, in years)

A log-linear link is used, i.e., ln(x0)

Case Illustration (cont.)

Threshold regression output based on longitudinal uncoupling

Case Illustration (cont.)

Uncoupling breaks each longitudinal record in the data set into a series of single records.

Once parameters are estimated, predictive inferences can be made by splicing together forward records of a case using specified covariate conditions. In essence, splicing is the reverse of uncoupling.

Case Illustration (cont.)

The splicing process involves multiplying estimated conditional event probabilities as follows.
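
One way to sketch this multiplication, under the Brownian motion model with zero mean and unit variance used in this case illustration, is shown below. For that model the probability of not reaching zero within an interval of length dt, starting from x0 > 0, is erf(x0 / sqrt(2 dt)). The function names and the x0 values are hypothetical; in practice each x0 would come from the fitted link function at the specified covariate conditions.

```python
import math


def interval_survival(x0, dt):
    """P(no hit of zero within dt) for standard Brownian motion started at x0 > 0."""
    return math.erf(x0 / math.sqrt(2.0 * dt))


def spliced_survival(x0_by_interval, dt_by_interval):
    """Multiply the conditional interval survival probabilities (splicing)."""
    prob = 1.0
    for x0, dt in zip(x0_by_interval, dt_by_interval):
        prob *= interval_survival(x0, dt)
    return prob


# Hypothetical initial health status for three consecutive two-year intervals.
x0_values = [3.0, 2.8, 2.5]
dt_values = [2.0, 2.0, 2.0]
survival = spliced_survival(x0_values, dt_values)
print("P(event-free over 6 years) =", survival)
print("P(event within 6 years)    =", 1.0 - survival)
```

The product relies on the Markov assumption: each factor is conditional on the subject having survived to the start of that interval.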

Partial Likelihood

Explicit representation of the initial probability and one-step transition probabilities:

Partial Likelihood (cont.)

Factor conditional probability

Partial Likelihood (cont.)

Build the partial likelihood for parameters of the parent process, running time and boundary.

Set aside the likelihood contribution of the covariate process and censoring mechanism

Common Analytical Situation (cont.)

Longitudinal records are broken into a series of single records with the following elements.
