Absolute risk estimation in a case cohort study of prostate cancer

BackgroundMethodsResults

Summary

Absolute Risk Estimation in a Case Cohort Studyof Prostate Cancer

Sahir Rai Bhatnagar

Queen’s University

[email protected]

August 30, 2013

1 / 35


Summary

RationaleObjectivePrevious Work

Incidence vs. Mortality of Prostate Cancer in Canada

prostate

25

50

75

100

125

1990 2000 2010year

age−

stan

dard

ized

rat

e

incidence mortality

Possible reasons for gap:

Better treatment outcomes

Death from other causes(comorbidities)

Overdiagnosis

2 / 35


Summary



prostate

25

50

75

100

125

1990 2000 2010year

age−

stan

dard

ized

rat

e

incidence mortality




Overdiagnosis

2 / 35


Summary



prostate

25

50

75

100

125

1990 2000 2010year

age−

stan

dard

ized

rat

e

incidence mortality




Overdiagnosis

2 / 35


Summary



prostate

25

50

75

100

125

1990 2000 2010year

age−

stan

dard

ized

rat

e

incidence mortality




Overdiagnosis

2 / 35


Summary


Why is this Gap a Problem?

Overdiagnosis

1 Screen detected cancer that would have otherwise not beendiagnosed before death from other causes

2 Can lead to overtreatment

Overtreatment

Treatment for a cancer that is clinically not significant → lowtumour grade

Morbidity associated with being diagnosed with cancer

Quality of life effects, incontinence, erectile dysfunction

Lower life expectancy

3 / 35


Summary



Overdiagnosis



Overtreatment





3 / 35


Summary



Overdiagnosis



Overtreatment





3 / 35


Summary



Overdiagnosis



Overtreatment





3 / 35


Summary



Overdiagnosis



Overtreatment





3 / 35


Summary



Overdiagnosis



Overtreatment





3 / 35


Summary



Overdiagnosis



Overtreatment





3 / 35


Summary


Solution to Overdiagnosis and Overtreatment

Individualised Decision Making

Active surveillance for patients with favorable risk disease(Gleason score ≤ 6)

Clinical tools that calculate life expectancy based onpredisposing factors such as age and comorbidities

4 / 35


Summary






4 / 35


Summary






4 / 35


Summary






4 / 35


Summary


Objectives

Research Statement

1 Using a semiparametric model based on both age andcomorbidity, obtain absolute risk estimates of death fromprostate cancer in a population based case cohort study

2 Create a web tool for life expectancy based on this model, foruse in a clinical setting

5 / 35


Summary


Objectives

Research Statement

1 Using a semiparametric model based on both age andcomorbidity, obtain absolute risk estimates of death fromprostate cancer in a population based case cohort study

2 Create a web tool for life expectancy based on this model, foruse in a clinical setting

5 / 35


Summary


Wykes (2011)

Wykes (2011) proposed an ad hoc approach to estimating therelative risk and baseline hazard function for the case cohort design:

Modified PH Model

S(t) = (S0(t))α exp {αβ1 (age − µage) +

β2

(CIRS-Gpros − µCIRS-Gpros

)}S0(t), β1, µage are derived from the full cohort

β2, µCIRS-Gpros are from case cohort

α = β1 from case cohortβ1 from full cohort

6 / 35


Summary


Wykes (2011)

Wykes (2011) proposed an ad hoc approach to estimating therelative risk and baseline hazard function for the case cohort design:

Modified PH Model

S(t) = (S0(t))α exp {αβ1 (age − µage) +

β2

(CIRS-Gpros − µCIRS-Gpros

)}S0(t), β1, µage are derived from the full cohort

β2, µCIRS-Gpros are from case cohort

α = β1 from case cohortβ1 from full cohort

6 / 35

Case Cohort Study Design

40 45 50 55 60 65 70

(a) Cohort, no exposure info

40 45 50 55 60 65 70

(b) Case cohort sampling

40 45 50 55 60 65 70

(c) Add back failures

CIRS=10CIRS=2

CIRS=7CIRS=6

CIRS=14CIRS=13

CIRS=1

CIRS=9

CIRS=9CIRS=4

CIRS=4

CIRS=10CIRS=1

CIRS=3

CIRS=6CIRS=7

40 45 50 55 60 65 70

(d) Get exposure information

Figure: Each line represents how long a subject is at risk. A failure is represented by ared dot. Subjects without a red dot are censored individuals. The axis represents agevalues.


Summary

Case Cohort Study DesignPopulationModelSoftware

Study Population

Population-based case cohort study of 2,740 patientsdiagnosed with prostate cancer between 1990 and 1998 inOntario

Stratified by CCOR random sample

Cases were sampled independently of subcohort

Death from prostate cancer and death from other causesassessed → death clearance date December 31, 1999

Death from any cause updated for subcohort only → deathclearance date December 31, 2008

8 / 35


Summary


Study Population






8 / 35


Summary


Study Population






8 / 35


Summary


Study Population






8 / 35


Summary


Study Population






8 / 35


Summary


Study Population






8 / 35


Summary


Improvements

Stratified by CCOR Cox PH

Sj(t) = S0j(t)exp(β1·Age+β2·CIRS-Gpros), j = 1, . . . , 8

We improve on Wykes (2011) work in two ways:

we estimate both the baseline survival function and relativerisk parameters using the case cohort population

we use a stratified by CCOR analysis, which assumes adifferent baseline hazard for each region

9 / 35


Summary


Improvements






9 / 35


Summary


Improvements






9 / 35


Summary


Improvements






9 / 35


Summary


Relative Risk Estimation

Prentice (1986) proposed a pseudolikelihood given by

L(β) =D∏j=1

exp(zT(j)β

)∑

k∈Rtj

exp(zTkβ

)wk

t1, . . . , tD : D distinct failure times

Rtj : the risk set at failure time tj

wk : the weight applied to the risk set

10 / 35


Summary




L(β) =D∏j=1

exp(zT(j)β

)∑

k∈Rtj

exp(zTkβ

)wk




10 / 35


Summary




L(β) =D∏j=1

exp(zT(j)β

)∑

k∈Rtj

exp(zTkβ

)wk




10 / 35


Summary




L(β) =D∏j=1

exp(zT(j)β

)∑

k∈Rtj

exp(zTkβ

)wk




10 / 35


Summary


Weighting Schemes

exp(zTi β

)Yi (tj)wi (tj) exp

(zTi β

)+

∑k∈SC ,k 6=i

Yk(tj)wk(tj) exp(zTkβ

)

Table: Comparing different values of wk

Outcome type and timing Prentice Self & Prentice Barlow(1986) (1988) (1994)

Case outside SC before failure 0 0 0Case outside SC at failure 1 0 1Case in SC before failure 1 1 1/αCase in SC at failure 1 1 1SC control 1 1 1/α

11 / 35


Summary



Therneau & Li (1999) showed that Var(β) can be computed byadjusting the outputted variance from standard Cox modelprograms:

I−1 + (1− α)DTSCDSC

I−1: variance returned by the Cox model program

DSC : subset of matrix of dfbeta residuals that containssubcohort members only

12 / 35


Summary







12 / 35


Summary







12 / 35


Summary


Absolute Risk Estimation

A weighted version of the Breslow estimator for the full cohortwith covariate history z0(u) is given by

dΛ(u, z0(u)) = dΛ0(u) exp(zT0 (u)β

)=

n∑i=1

dNi (u)

( nm )∑

j∈Ri (u) exp(zTj (u)β

) exp(zT0 (u)β

)

=n∑

i=1

dNi (u)

( nm )∑

j∈Ri (u) exp((zj(u) − z0(u))

T β)

=n∑

i=1

m

n

dNi (u)∑j∈Ri (u) exp

((zj(u) − z0(u))

T β)

13 / 35


Summary



The sum of these increments can be used to calculate the riskbetween two time points, say s and t, for s < t:

Λ(s, t, z0) = Λ(0, t, z0)− Λ(0, s, z0) =

t∫s

dΛ(u, z0(u))

For this project, we will be only considering the case when s = 0.The corresponding survival probability is

S(0, t, z0) = exp(−Λ(0, t, z0)

)

14 / 35


Summary



The sum of these increments can be used to calculate the riskbetween two time points, say s and t, for s < t:

Λ(s, t, z0) = Λ(0, t, z0)− Λ(0, s, z0) =

t∫s

dΛ(u, z0(u))

For this project, we will be only considering the case when s = 0.The corresponding survival probability is

S(0, t, z0) = exp(−Λ(0, t, z0)

)

14 / 35


Summary


Software

PROC PHREG in SAS/STAT software and coxph in R

SAS macro developed by Langholz & Jiao (2007)

15 / 35


Summary


Software

PROC PHREG in SAS/STAT software and coxph in R

SAS macro developed by Langholz & Jiao (2007)

15 / 35


Summary


Computational Methods for the Case Cohort Design

Standard Cox PH functions in SAS and R cannot directlycompute the relative and absolute risk estimates

Modifications must be made to the dataset to make use ofPROC PHREG and coxph functions

16 / 35


Summary


Computational Methods for the Case Cohort Design

Standard Cox PH functions in SAS and R cannot directlycompute the relative and absolute risk estimates

Modifications must be made to the dataset to make use ofPROC PHREG and coxph functions

16 / 35

Computation of Pseudolikelihood

Manually

Subcohort failure

Non-Subcohort failure

Subcohort failure

40 45 50 55 60 65 70

7

6

5

4

3

2

1

L(β) =

exp(Z7β)

exp(Z1β) + exp(Z2β) + exp(Z6β) + exp(Z7β)

×exp(Z5β)

exp(Z1β) + exp(Z2β) + exp(Z4β) + exp(Z5β) + exp(Z6β)

×exp(Z1β)

exp(Z1β) + exp(Z3β) + exp(Z4β)

Computationally

40 45 50 55 60 65 70

7.27.1

6

5

4

3

2

1.21.1

L(β) =

exp(Z7.2β)

exp(Z1.1β) + exp(Z2β) + exp(Z6β) + exp(Z7.2β)

×exp(Z5β)

exp(Z1.1β) + exp(Z2β) + exp(Z4β) + exp(Z5β) + exp(Z6β)

×exp(Z1.2β)

exp(Z1.2β) + exp(Z3β) + exp(Z4β)

Computation of Pseudolikelihood

Manually

Subcohort failure

Non-Subcohort failure

Subcohort failure

40 45 50 55 60 65 70

7

6

5

4

3

2

1

L(β) =

exp(Z7β)

exp(Z1β) + exp(Z2β) + exp(Z6β) + exp(Z7β)

×exp(Z5β)

exp(Z1β) + exp(Z2β) + exp(Z4β) + exp(Z5β) + exp(Z6β)

×exp(Z1β)

exp(Z1β) + exp(Z3β) + exp(Z4β)

Computationally

40 45 50 55 60 65 70

7.27.1

6

5

4

3

2

1.21.1

L(β) =

exp(Z7.2β)

exp(Z1.1β) + exp(Z2β) + exp(Z6β) + exp(Z7.2β)

×exp(Z5β)

exp(Z1.1β) + exp(Z2β) + exp(Z4β) + exp(Z5β) + exp(Z6β)

×exp(Z1.2β)

exp(Z1.2β) + exp(Z3β) + exp(Z4β)


Summary

Relative Risk EstimatesAbsolute Risk EstimatesWeb Life Expectancy Tool

Table: All Cause mortality

All Cause

Covariate 1999 2008 Wykes (2011)

Age 1.04 (1.02, 1.05) 1.05 (1.04, 1.06) 1.02 (1.01, 1.03)CIRS-Gpros 1.14 (1.10, 1.18) 1.13 (1.10, 1.16) 1.12 (1.09, 1.15)

Table: Cause Specific mortality

Cause Specific

Covariate PCa (1999)a OC (1999)

Age 1.00 (0.99, 1.02) 1.07 (1.05, 1.09)CIRS-Gpros 1.032 (0.99, 1.07) 1.25 (1.20, 1.30)

a predictors not significant

18 / 35


Summary


Table: All Cause mortality

All Cause

Covariate 1999 2008 Wykes (2011)

Age 1.04 (1.02, 1.05) 1.05 (1.04, 1.06) 1.02 (1.01, 1.03)CIRS-Gpros 1.14 (1.10, 1.18) 1.13 (1.10, 1.16) 1.12 (1.09, 1.15)

Table: Cause Specific mortality

Cause Specific

Covariate PCa (1999)a OC (1999)

Age 1.00 (0.99, 1.02) 1.07 (1.05, 1.09)CIRS-Gpros 1.032 (0.99, 1.07) 1.25 (1.20, 1.30)

a predictors not significant

18 / 35


Summary


●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●

●●●●●●●●●●

●

●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●

●●●●●●●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

All cause (1999) All cause (2008) OC (1999)

PCa (1999) Wykes (2011)

10

20

30

10

20

30

0 3 4 5 6 7 8 9101112 0 3 4 5 6 7 8 9101112CIRS−G_pros

life

expe

ctan

cy (

year

s)

50

60

70

80Age

Figure: Estimates from East Region (Wykes estimates based on all cause unstratifiedsubcohort at 2008) 19 / 35


Summary


5

10

15

20

25

0 3 4 5 6 7 8 9 10 11 12CIRS−G_pros

life

expe

ctan

cy (

year

s) Age

55

65

75

Model

All cause (2008)

Wykes (2011)

Figure: Estimates from East Region (Wykes estimates based on all cause unstratifiedsubcohort at 2008) 20 / 35

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95age (years)

surv

ival

pro

babi

lity

CCO North West region

(a) All cause (1999)

0.4

0.5

0.6

0.7

0.8

0.9

1.0

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95age (years)

surv

ival

pro

babi

lity


(b) Other cause(1999)

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90age (years)

surv

ival

pro

babi

lity


(c) Prostate cancer(1999)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100age (years)

surv

ival

pro

babi

lity


(d) All cause (2008)

Figure: Survival curve with 95% confidence band for North West CCOR for age=50,CIRS-Gpros=5, based on the four models developed.


Summary


Tool for Use in a Clinical Setting

Built using shiny package for R

Requires R and an internet connection

Three simple lines of code

22 / 35


Summary






22 / 35


Summary






22 / 35


Summary

ContributionsFuture WorkReferences

Contributions

1 Life expectancy estimates based on age and comorbidity scorefrom a population based case cohort study

2 Improves previous work done with this dataset by usingappropriate statistical methodology for case cohort design

3 Easy-to-use interactive web tool for clinical use

23 / 35


Summary


Contributions




23 / 35


Summary


Contributions




23 / 35


Summary


Future Work

Internal validation → risk prediction accuracy, PHassumption, compare expected values from 1999 model toobserved 2008 subcohort

External validation → generalizability

Convert Langholz & Jiao (2007) SAS macro to R for absoluterisk estimates

24 / 35


Summary


Future Work




24 / 35


Summary


Future Work




24 / 35


Summary


Alternative Analysis

Treat the case cohort design as a cohort study with missingdata (NestedCohort and Survey packages for R) (Mark &Katki (2008), Breslow & Lumley (2009))

Competing Risk analysis using the subdistribution hazard ofFine & Gray

25 / 35


Summary


Alternative Analysis

Treat the case cohort design as a cohort study with missingdata (NestedCohort and Survey packages for R) (Mark &Katki (2008), Breslow & Lumley (2009))

Competing Risk analysis using the subdistribution hazard ofFine & Gray

25 / 35


Summary


References I

M. Center et al.International Variation in Prostate Cancer Incidence andMortality RatesEuropean Association of Urology, 61:2012

Bryan Langholz and Jenny JiaoComputational methods for case cohort studiesComputational Statistics & Data Analysis, 51:2007

Therneau, T and Li, HComputing the Cox model for case cohort designsLifetime data analysis, 2(5):1999

26 / 35


Summary


References II

PRENTICE, R. L.A case-cohort design for epidemiologic cohort studies anddisease prevention trialsBiometrika, 73(1):1986

RStudio and Inc.shiny: Web Application Framework for RR package version 0.6.0, 17(25):2012

T. Lumleysurvey: analysis of complex survey samplesR package version 3.28-2,2012

27 / 35


Summary


References III

Katki, H.A. and Mark, S.D.Survival Analysis for Cohorts with Missing CovariateInformationThe R Journal,2008

28 / 35


Summary


Influential Analysis

Dfbeta residuals

Are the approximate changes in the parameter estimates (β − β(j))

when the j th observation is omitted. These variables are aweighted transform of the score residual variables and are useful inassessing local influence and in computing approximate and robustvariance estimates

Standardized dfbeta residuals:

(β − β(j))

sd(β − β(j))

29 / 35


Summary


Why is the PsL not a Partial Likelihood

Not a PL → Asymptotic distribution theory for the PLestimators breaks down because cases outside the subcohortinduce non-nesting of the σ-fields, or sets on which thecounting process probability measure is defined

cannot use Likelihood ratio tests since PsL are not reallikelihoods

30 / 35


Summary


Consistent Estimators

An estimator θ will perform better and better as we obtain moresamples. If at the limit n→∞ the estimator tends to be alwaysright, it is said to be consistent. This notion is equivalent toconvergence in probability

Convergence in Probability

Let X1, . . . ,Xn be a sequence of iid RVs drawn from a distributionwith parameter θ and θ an estimator for θ. θ is consistent as anestimator of θ if θ

p→ θ

limn→∞

P(|θ − θ| > ε) = 0, ∀ ε > 0

31 / 35


Summary


Competing Risks

when disease of interest has a delayed occurrence, otherevents may preclude observation of disease of interest givingrise to a competing risk situation

to analyze the event of interest taking into account thecompeting risks, one has to model the hazard ofsubdistribution. In the presence of competing risks, theprobability to observe the event of interest spans the interval[0, p], p < 1. Because p does not reach 1, this function is nota proper distribution function and it is called a subdistribution

Modify the partial likelihood in the Cox model to modelhazard of subdistribution, such that the individualsexperiencing competing risks are always in the risk set withtheir contribution regulated by a weight

32 / 35


Summary


Competing Risks

Fine and Gray Partial Likelihood:

PL(β) =n∏

j=1

exp {βxj}∑r∈Rj

wrj exp {βxr}

δj

(1)

wrj =G (tj)

G (min(tj , tr ))(2)

Cj =

1 if the event of interest was observed at time tj

2 if the competing risk event was observed at time tj

0 if no event was observed at time tj

δj =

{1 Cj = 1

0 Cj = 0 or Cj = 233 / 35


Summary


Competing Risks

G (tj): estimated probability of being censored at time tj byeither the event of interest or a competing risks eventRj = {r ; tr ≥ tj or Cr = 2}individuals experiencing competing risk event are alwaysconsidered in the risk set at all time pointsThe weight controls how much a specific observationparticipates in the sum in the denominatorIf the observation at time tr is censored (Cr = 0) or has theevent of interest (Cr = 1), the weight is either 1 or 0depending whether the observation is in the risk set (tr ≥ tj)or it is not in the risk set (tr < tj)If the observation has a competing risk event, then it is alwaysin the risk set and the weight is 1 if (tr ≥ tj) or a quantity lessthan 1 if tr < tjThis quantity decreases as the distance between tr and tjincreasesHazard ratio depends on time and thus the proportionality ofhazard is not satisfied in general

34 / 35


Summary


Case-cohort in the presence of competing risks

PsL(β) =n∏

j=1

exp {βxj}∑r∈Rj

Irjwrj exp {βxr}

δj

δj =

{1 Cj = 1

0 Cj = 0 or Cj = 2

Irj =

{1 r is in subcohort or r is a case outside the subcohort and tr = tj

0 r is a case outside the subcohort and tr 6= tj

wrj =G(tj )

G(min(tj ,tr )), where G (tj) is the estimation of the probability

of censoring regardless of the fact that in a case-cohort study thecases are oversampled

35 / 35

Science

Absolute risk estimation in a case cohort study of prostate cancer