sol survival R - UMass Amherst

BIOSTATS 640 – Spring 2021 Intermediate Biostatistics Unit 8. Survival Analysis

Sol_survival_R.docx Page 1 of 10

Unit 8 – Introduction to Survival Analysis Practice Problems

SOLUTIONS R Users

Before you begin: Be sure to have installed the package survival.

The following are some hypothetical data on two groups, smokers and non-smokers, in a study that investigated survival (days) following a root canal.

Group       Days (X)   Status at Last Follow-up (C)
group       days       status
smokers        4       alive
smokers        7       dead
smokers        8       alive
nonsmoker     29       alive
smokers       29       dead
smokers       31       alive
nonsmoker     40       dead
smokers       65       dead
nonsmoker     69       dead
nonsmoker     78       alive
nonsmoker     79       alive
nonsmoker    106       dead
smokers      107       alive
nonsmoker    129       dead
smokers      130       alive
smokers      140       alive
smokers      142       alive
smokers      149       dead
smokers      158       alive
smokers      160       dead
nonsmoker    161       dead
smokers      162       alive
smokers      187       dead
smokers      188       alive
nonsmoker    197       dead
nonsmoker    204       alive
nonsmoker    208       alive
smokers      221       dead
nonsmoker    228       dead
nonsmoker    231       alive

1. By any means you like, create a data set (Excel or Stata or R) of the observations in the table on page 1. Tips - (1) Create a 0/1 variable called group with 1=smokers, (2) create a 0/1 variable called status with 1=dead.

Solution: I created an Excel data set called hw_survival.xlsx. Here is a screen capture of the top portion.

group    days    status
1.00     4.00    0.00
1.00     7.00    1.00
1.00     8.00    0.00
0.00    29.00    0.00
1.00    29.00    1.00
1.00    31.00    0.00
0.00    40.00    1.00
1.00    65.00    1.00
0.00    69.00    1.00
0.00    78.00    0.00
0.00    79.00    0.00

In RStudio, from the toolbar at top, I used the dropdown menu to import hw_survival.xlsx.
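For readers who do not have the spreadsheet, the same data can also be typed directly into R. This is a minimal sketch and not the method used in this solution. It assumes the coding from the tips in exercise #1 (group: 1 = smokers, status: 1 = dead) and creates `dat` directly, so the rename step shown later is not needed.

```r
# Enter the 30 observations from the table on page 1, in the order listed.
# group:  1 = smoker, 0 = nonsmoker
# status: 1 = dead,   0 = alive at last follow-up (censored)
dat <- data.frame(
  group  = c(1, 1, 1, 0, 1, 1, 0, 1, 0, 0,
             0, 0, 1, 0, 1, 1, 1, 1, 1, 1,
             0, 1, 1, 1, 0, 0, 0, 1, 0, 0),
  days   = c(  4,   7,   8,  29,  29,  31,  40,  65,  69,  78,
              79, 106, 107, 129, 130, 140, 142, 149, 158, 160,
             161, 162, 187, 188, 197, 204, 208, 221, 228, 231),
  status = c(0, 1, 0, 0, 1, 0, 1, 1, 1, 0,
             0, 1, 0, 1, 0, 0, 0, 1, 0, 1,
             1, 0, 1, 0, 1, 0, 0, 1, 1, 0)
)

# Quick checks against the table: counts per group and number of deaths
print(table(dat$group))
print(sum(dat$status))
```

The two checks should agree with the table: 13 nonsmokers, 17 smokers, and 14 deaths in total.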

setwd("/Users/cbigelow/Desktop")
dat <- hw_survival   # Optional: I decided to rename my data to something shorter
group <- as.factor(dat$group)
status <- as.factor(dat$status)
dat

##    group days status
## 1      1    4      0
## 2      1    7      1
## 3      1    8      0
## 4      0   29      0

--- rows not shown ----

## 28     1  221      1
## 29     0  228      1
## 30     0  231      0

Preliminary - Declare data to be survival data

library(survival)
# use Surv( ) function in package {survival} to declare data as survival data
# KEY: Surv(TIMETOEVENTVARIABLE, CENSORINGVARIABLE)
surv.object <- with(dat, Surv(days,status))
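To see what `Surv()` produces, it can help to print a small survival object. The sketch below is illustrative only and uses just the first three smokers from the table; the object name `s` is made up for this example.

```r
library(survival)

# First three smokers: days 4, 7, 8 with status 0 (censored), 1 (dead), 0 (censored)
s <- Surv(time = c(4, 7, 8), event = c(0, 1, 0))

# Printing a right-censored Surv object marks censored times with a trailing "+"
print(s)

# Underneath, it is a two-column matrix holding the time and the status
print(s[, "time"])
print(s[, "status"])
```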

2. Obtain the Kaplan-Meier estimates of survival, separately for smokers and non-smokers. Suggestion. Do this first by hand (see page 25 of the Unit 8 notes). Then try using R.

In both worksheets, t = an actual time of death or censoring, and t- = the instant before t.

Smokers:

ID    x     c    t       # At Risk    # Surviving   Conditional    # at Risk to
                         at t-        t             % Surviving    carry forward
Always start at "0"      17           17            17/17          17
1     4     0    Drop                                              16
2     7     1    7       16           15            15/16          15
3     8     0    Drop                                              14
4     29    1    29      14           13            13/14          13
5     31    0    Drop                                              12
6     65    1    65      12           11            11/12          11
7     107   0    Drop                                              10
8     130   0    Drop                                              9
9     140   0    Drop                                              8
10    142   0    Drop                                              7
11    149   1    149     7            6             6/7            6
12    158   0    Drop                                              5
13    160   1    160     5            4             4/5            4
14    162   0    Drop                                              3
15    187   1    187     3            2             2/3            2
16    188   0    Drop                                              1
17    221   1    221     1            0             0/1            0

Key: ID = Subject Identifier, X = Time on Study, C = Censoring Indicator (C=1 if Event of Death, C=0 if Censored)

NON-Smokers:

ID    x     c    t       # At Risk    # Surviving   Conditional    # at Risk to
                         at t-        t             % Surviving    carry forward
Always start at "0"      13           13            13/13          13
1     29    0    Drop                                              12
2     40    1    40      12           11            11/12          11
3     69    1    69      11           10            10/11          10
4     78    0    Drop                                              9
5     79    0    Drop                                              8
6     106   1    106     8            7             7/8            7
7     129   1    129     7            6             6/7            6
8     161   1    161     6            5             5/6            5
9     197   1    197     5            4             4/5            4
10    204   0    Drop                                              3
11    208   0    Drop                                              2
12    228   1    228     2            1             1/2            1
13    231   0    Drop                                              0

Key: ID = Subject Identifier, X = Time on Study, C = Censoring Indicator (C=1 if Event of Death, C=0 if Censored)

Kaplan-Meier Estimates S[t] for Smokers:

t      Formula for Estimating S[t]                                           S[t]
0      S[0] = Pr[T>0] = 17/17                                                1.0
7      S[7] = Pr[T>0] Pr[T>7|T>0] = (17/17)(15/16)                           .9375
29     S[29] = Pr[T>0] Pr[T>7|T>0] Pr[T>29|T>7] = (17/17)(15/16)(13/14)      .8705
65     … = (17/17)(15/16)(13/14)(11/12)                                      .7980
149    … = (17/17)(15/16)(13/14)(11/12)(6/7)                                 .6840
160    … = (17/17)(15/16)(13/14)(11/12)(6/7)(4/5)                            .5472
187    … = (17/17)(15/16)(13/14)(11/12)(6/7)(4/5)(2/3)                       .3648
221    … = (17/17)(15/16)(13/14)(11/12)(6/7)(4/5)(2/3)(0/1)                  0

Kaplan-Meier Estimates S[t] for NON-Smokers:

t      Formula for Estimating S[t]                                           S[t]
0      S[0] = Pr[T>0] = 13/13                                                1.0
40     S[40] = Pr[T>0] Pr[T>40|T>0] = (13/13)(11/12)                         .9167
69     S[69] = Pr[T>0] Pr[T>40|T>0] Pr[T>69|T>40] = (13/13)(11/12)(10/11)    .8333
106    … = (13/13)(11/12)(10/11)(7/8)                                        .7292
129    … = (13/13)(11/12)(10/11)(7/8)(6/7)                                   .6250
161    … = (13/13)(11/12)(10/11)(7/8)(6/7)(5/6)                              .5208
197    … = (13/13)(11/12)(10/11)(7/8)(6/7)(5/6)(4/5)                         .4167
228    … = (13/13)(11/12)(10/11)(7/8)(6/7)(5/6)(4/5)(1/2)                    .2083
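The running products in the hand calculation can be checked with a few lines of base R. This is a sketch for checking the arithmetic, not a replacement for `survfit()`. The helper name `km_by_hand` is made up for this example, and only the smokers are entered.

```r
# Smokers only, in the order of the hand worksheet
days   <- c(4, 7, 8, 29, 31, 65, 107, 130, 140, 142, 149, 158, 160, 162, 187, 188, 221)
status <- c(0, 1, 0,  1,  0,  1,   0,   0,   0,   0,   1,   0,   1,   0,   1,   0,   1)

# Product-limit estimate: at each death time, multiply the running product
# by (# surviving t) / (# at risk at t-)
km_by_hand <- function(days, status) {
  death_times <- sort(unique(days[status == 1]))
  n_risk  <- sapply(death_times, function(tt) sum(days >= tt))
  n_death <- sapply(death_times, function(tt) sum(days == tt & status == 1))
  data.frame(t = death_times,
             n.risk = n_risk,
             n.death = n_death,
             S = cumprod((n_risk - n_death) / n_risk))
}

km_smokers <- km_by_hand(days, status)
print(round(km_smokers, 4))
```

Calling the same function with the non-smokers' days and status values should give the second table.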

# Command survfit ( ) to obtain Kaplan-Meier estimates of survival. Package: survival
library(survival)
q2fit <- survival::survfit(surv.object~group,data=dat)
summary(q2fit)

## Call: survfit(formula = surv.object ~ group, data = dat)
##
##                 group=0
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    40     12       1    0.917  0.0798       0.7729        1.000
##    69     11       1    0.833  0.1076       0.6470        1.000
##   106      8       1    0.729  0.1355       0.5066        1.000
##   129      7       1    0.625  0.1510       0.3893        1.000
##   161      6       1    0.521  0.1577       0.2877        0.943
##   197      5       1    0.417  0.1568       0.1993        0.871
##   228      2       1    0.208  0.1669       0.0433        1.000
##
##                 group=1
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     7     16       1    0.938  0.0605        0.826        1.000
##    29     14       1    0.871  0.0856        0.718        1.000
##    65     12       1    0.798  0.1048        0.617        1.000
##   149      7       1    0.684  0.1386        0.460        1.000
##   160      5       1    0.547  0.1651        0.303        0.989
##   187      3       1    0.365  0.1852        0.135        0.987
##   221      1       1    0.000     NaN           NA           NA

# Command plot( ) to obtain curves
plot(q2fit,
     xlab="Survival Time in Days",
     ylab="% Surviving",
     yscale=100,
     col=c("red","blue"),
     main="Survival Distributions by Group")
# Colors are assigned in the strata order of q2fit: group=0 (non-smokers) is red, group=1 (smokers) is blue
legend("topright", title="Group", c("Non-smokers", "Smokers"), fill=c("red", "blue"))

3. By hand, perform a log rank test of the null hypothesis of equal survival curves for smokers and non-smokers. Tip - See page 32 of the course notes for unit 8 (Introduction to Survival Analysis).

Worksheet.

t      O1t    n1t    dt    (Nt - dt)    n2t    Nt
7      1      16     1     28           13     29
29     1      14     1     26           13     27
40     0      12     1     23           12     24
65     1      12     1     22           11     23
69     0      11     1     21           11     22
106    0      11     1     18           8      19
129    0      10     1     16           7      17
149    1      7      1     12           6      13
160    1      5      1     10           6      11
161    0      4      1     9            6      10
187    1      3      1     7            5      8
197    0      1      1     5            5      6
221    1      1      1     2            2      3
228    0      0      1     1            2      2

Key: O1t = 1 if death in smoker, 0 if death in nonsmoker; n1t = # at risk among smokers; dt = # deaths; (Nt - dt) = # surviving; n2t = # at risk among nonsmokers; Nt = Total # at risk

Worksheet - continued.

$$E[O_{1t}] = (n_{1t})\left[\frac{d_t}{N_t}\right] \qquad V[O_{1t}] = \left[\frac{n_{2t}\,(N_t - d_t)}{N_t\,(N_t - 1)}\right]\left[E(O_{1t})\right]$$

t         O1t    E[O1t]    V[O1t]
7         1      0.5517    0.2473
29        1      0.5185    0.2497
40        0      0.5000    0.2500
65        1      0.5217    0.2495
69        0      0.5000    0.2500
106       0      0.5789    0.2438
129       0      0.5882    0.2422
149       1      0.5385    0.2485
160       1      0.4545    0.2479
161       0      0.4000    0.2400
187       1      0.3750    0.2344
197       0      0.1667    0.1389
221       1      0.3333    0.2222
228       0      0         0
Totals    7      6.0272    3.0644
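The worksheet can be reproduced in base R as a check on the hand arithmetic. The sketch below re-enters the data so that it runs on its own. The variable names and the `worksheet` data frame are made up for this example, and the final statistic uses the usual log rank form, (total O - total E) squared over total V, referred to a chi square distribution on 1 df.

```r
# All 30 observations (group: 1 = smoker; status: 1 = dead)
days   <- c(4, 7, 8, 29, 29, 31, 40, 65, 69, 78, 79, 106, 107, 129, 130,
            140, 142, 149, 158, 160, 161, 162, 187, 188, 197, 204, 208, 221, 228, 231)
group  <- c(1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1,
            1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0)
status <- c(0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
            0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0)

death_times <- sort(unique(days[status == 1]))

# One worksheet row per distinct death time
worksheet <- do.call(rbind, lapply(death_times, function(tt) {
  n1 <- sum(days >= tt & group == 1)                 # smokers at risk at t-
  n2 <- sum(days >= tt & group == 0)                 # nonsmokers at risk at t-
  N  <- n1 + n2                                      # total at risk
  d  <- sum(days == tt & status == 1)                # deaths at t
  O1 <- sum(days == tt & status == 1 & group == 1)   # deaths among smokers
  E1 <- n1 * d / N
  V1 <- if (N > 1) E1 * n2 * (N - d) / (N * (N - 1)) else 0
  data.frame(t = tt, O1 = O1, n1 = n1, d = d, n2 = n2, N = N, E1 = E1, V1 = V1)
}))

totals <- colSums(worksheet[, c("O1", "E1", "V1")])
chisq  <- unname((totals["O1"] - totals["E1"])^2 / totals["V1"])
pvalue <- pchisq(chisq, df = 1, lower.tail = FALSE)

print(round(worksheet, 4))
print(round(totals, 4))
print(round(c(chisq = chisq, p = pvalue), 4))
```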

Log rank test statistic for exercise #3, from the worksheet totals:

$$\chi^2_{\text{log rank};\,1\,df} = \frac{\left(\sum_{t=1}^{\#\text{deaths}} O_{1t} - \sum_{t=1}^{\#\text{deaths}} E(O_{1t})\right)^2}{\sum_{t=1}^{\#\text{deaths}} V(O_{1t})} = \frac{(7 - 6.02717)^2}{3.0644} = 0.3088$$

4. Using R, reproduce the log rank test that you did by hand in exercise #3.

# survival curves (logrank test) using command survdiff(). Package: survival
library(survival)
survival::survdiff(surv.object~group, data=dat)

## Call:
## survdiff(formula = surv.object ~ group, data = dat)
##
##          N Observed Expected (O-E)^2/E (O-E)^2/V
## group=0 13        7     7.97     0.119     0.309
## group=1 17        7     6.03     0.157     0.309
##
##  Chisq= 0.3  on 1 degrees of freedom, p= 0.578

5. Write an expression for a Cox Proportional Hazards Model that could be explored to investigate the association of survival time following root canal with smoking status. Define all terms.

Solution: A Cox PH model for the hazard of death following root canal and its association with smoking status is

$$h(t; Z) = h_0(t)\,\exp[\beta Z]$$

where -

$h(t; Z)$ = instantaneous hazard of death at time "t" given survival to "t-" for person with covariate Z

$h_0(t)$ = baseline hazard of death at time "t" given survival to "t-"

Z = group = indicator of smoking status with Z=1 for smokers, 0 for nonsmokers.

$\beta$ = log hazard ratio comparing smokers (Z=1) to nonsmokers (Z=0), so that $\exp[\beta]$ is the hazard ratio.

6. What assumptions must hold in order for this model to be valid?

Solution:

(1) Model: A Cox PH model for the hazard of death following root canal and its association with smoking status is

$$h(t; Z) = h_0(t)\,\exp[\beta Z]$$

where Z=group and

$h(t; Z)$ = instantaneous hazard of death at time "t" given survival to "t-" for person with covariate Z

$h_0(t)$ = baseline hazard of death at time "t" given survival to "t-"

Z = group and Z=1 for smokers, 0 for nonsmokers.

(2) Proportional Hazards: The hazard of death for smokers is a constant multiple (called the hazard ratio) of the hazard of death for non-smokers at all times.

(3) Independence: The observations are independent.

7. Using Stata or R (or whatever software you like), fit the model you stated in exercise #5. Report your output and provide annotations that explain the output.

# The function coxph() will fit a Cox PH Model. Package: survival
library(survival)
q7cox <- survival::coxph(surv.object~group, data=dat)
q7cox

## Call:
## coxph(formula = surv.object ~ group, data = dat)
##
##        coef exp(coef) se(coef)    z    p
## group 0.317     1.373    0.573 0.55 0.58
##
## Likelihood ratio test=0.31  on 1 df, p=0.579
## n= 30, number of events= 14

In this sample, the estimated hazard of death following root canal is 37% greater for smokers than for nonsmokers, but the difference is not statistically significant (p=.58). (HR = 1.37 with 95% CI limits 0.45 to 4.22)
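The hazard ratio and its confidence limits follow from the coef and se(coef) columns of the coxph output. A minimal sketch of that arithmetic, assuming the usual Wald interval on the log hazard scale and using the rounded values printed above (the variable names are made up for this example):

```r
coef_hat <- 0.317   # coef for group, from the coxph output
se_hat   <- 0.573   # se(coef) for group, from the coxph output

# Hazard ratio = exp(coef); Wald 95% limits = exp(coef -/+ z * se)
z  <- qnorm(0.975)
hr <- exp(coef_hat)
ci <- exp(coef_hat + c(-1, 1) * z * se_hat)

print(round(c(HR = hr, lower = ci[1], upper = ci[2]), 2))
```

With the fitted model available, `summary(q7cox)` reports the same interval under the columns `lower .95` and `upper .95`.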

8. Compare the fit of the model you obtained for exercise #7 to the results of the log-rank test that you got for exercises #3 and #4.

Solution: It's a match! The score test from a Cox PH model with one 0/1 predictor is equivalent to the log rank test for the comparison of two groups. (The score test is printed by summary(q7cox); the default printout in exercise #7 shows only the likelihood ratio test.)

Log Rank Test: Chi Square = 0.3088 on df=1, p-value = .5784
Cox PH Model Score Test for significance of 0/1 GROUP: Chi Square = 0.3088 on df=1, p-value = .5784
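The comparison can also be made directly in R by extracting the two statistics. The sketch below re-enters the data so that it runs on its own. The object names `lr` and `cox` are made up for this example, and it relies on the `chisq` component of a survdiff fit and the `sctest` component of a coxph summary.

```r
library(survival)

# All 30 observations (group: 1 = smoker; status: 1 = dead)
dat <- data.frame(
  days   = c(4, 7, 8, 29, 29, 31, 40, 65, 69, 78, 79, 106, 107, 129, 130,
             140, 142, 149, 158, 160, 161, 162, 187, 188, 197, 204, 208, 221, 228, 231),
  group  = c(1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1,
             1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0),
  status = c(0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
             0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0)
)

lr  <- survdiff(Surv(days, status) ~ group, data = dat)   # log rank test
cox <- coxph(Surv(days, status) ~ group, data = dat)      # Cox PH model

# Log rank chi square and Cox score test (test statistic, df, p-value)
print(lr$chisq)
print(summary(cox)$sctest)
```

The two statistics agree here because no two deaths occur at the same time; with tied death times the agreement can depend on how ties are handled.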