Lecture 6: Non-parametric Estimation II
- Confidence intervals and bands
- Mean and median
Point-wise Confidence Intervals
• Recall last time we discussed several possibilities for constructing point-wise confidence intervals for S(t) at a particular t
• The estimates rely heavily on the Greenwood estimate of the variance of Ŝ(t)
• Recall that this is the sum in Greenwood's formula:
$$\sigma_S^2(t) = \frac{\hat V[\hat S(t)]}{\hat S^2(t)} = \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}$$
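As a small worked illustration of the Greenwood sum, the sketch below uses the first four event times of the diploid life table that appears later in these slides. The variable names are illustrative assumptions, not part of the original lecture code.

```r
# Risk-set sizes Y_i and event counts d_i at the first four diploid
# event times (t = 1, 3, 4, 5), copied from the diploid life table.
event_time <- c(1, 3, 4, 5)
Y <- c(28, 27, 26, 25)
d <- c(1, 1, 1, 2)

S_hat  <- cumprod(1 - d / Y)          # Kaplan-Meier estimate at each event time
sigma2 <- cumsum(d / (Y * (Y - d)))   # Greenwood sum sigma_S^2(t)
se_S   <- S_hat * sqrt(sigma2)        # standard error of S_hat(t)

round(cbind(time = event_time, S_hat, se_S), 4)
```

The last row should agree, up to rounding, with the survival (0.8214) and std.err (0.0724) columns that survfit reports at t = 5.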
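Types of Point-wise CIs
In each case $\sigma_S^2(t) = \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}$.
• Linear
$$\hat S(t) \pm z_{1-\alpha/2}\, \sigma_S(t)\, \hat S(t)$$
• Log
$$\hat S(t)\, \exp\left\{ \pm z_{1-\alpha/2}\, \sigma_S(t) \right\}$$
• Log-log
$$\left[ \hat S(t)^{1/\theta},\ \hat S(t)^{\theta} \right], \quad \text{where } \theta = \exp\left\{ \frac{z_{1-\alpha/2}\, \sigma_S(t)}{\ln \hat S(t)} \right\}$$
• Arcsine Square root
$$\sin^2\left\{ \max\left[ 0,\ \arcsin\left(\hat S(t)^{1/2}\right) - 0.5\, z_{1-\alpha/2}\, \sigma_S(t) \left( \frac{\hat S(t)}{1 - \hat S(t)} \right)^{1/2} \right] \right\} \le S(t)$$
$$S(t) \le \sin^2\left\{ \min\left[ \frac{\pi}{2},\ \arcsin\left(\hat S(t)^{1/2}\right) + 0.5\, z_{1-\alpha/2}\, \sigma_S(t) \left( \frac{\hat S(t)}{1 - \hat S(t)} \right)^{1/2} \right] \right\}$$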
Example: Tongue Cancer
• 80 subjects with tongue cancer
• Outcome: Time to death (in weeks)
• Two tumor types
– Aneuploid
– Diploid
Example: Tongue Cancer
[Figure: Kaplan-Meier survival curves for the Aneuploid and Diploid groups. x-axis: Time to Death (labeled in months on the slide), 0 to 400; y-axis: Survival, 0.0 to 1.0]
R Life Tables
> library(survival)
> dat<-Surv(tongue$Time, tongue$Cens)
> type<-tongue$Type
> fit<-survfit(dat~type)
> summary(fit)
Call: survfit(formula = dat ~ type)

type=1 (Aneuploid)
 time n.risk n.event survival std.err L 95% CI U 95% CI
    1     52       1    0.981  0.0190    0.944    1.000
    3     51       2    0.942  0.0323    0.881    1.000
    4     49       1    0.923  0.0370    0.853    0.998
   10     48       1    0.904  0.0409    0.827    0.988
   13     47       2    0.865  0.0473    0.777    0.963
   16     45       2    0.827  0.0525    0.730    0.936
   24     43       1    0.808  0.0547    0.707    0.922
   26     42       1    0.788  0.0566    0.685    0.908
   27     41       1    0.769  0.0584    0.663    0.893
   28     40       1    0.750  0.0600    0.641    0.877
   30     39       2    0.712  0.0628    0.598    0.846
   32     37       1    0.692  0.0640    0.578    0.830
   41     36       1    0.673  0.0651    0.557    0.813
   51     35       1    0.654  0.0660    0.537    0.797
   65     33       1    0.634  0.0669    0.516    0.780
 …
R Life Tables
type=2 (Diploid)
 time n.risk n.event survival std.err L 95% CI U 95% CI
    1     28       1   0.9643  0.0351   0.8979    1.000
    3     27       1   0.9286  0.0487   0.8379    1.000
    4     26       1   0.8929  0.0585   0.7853    1.000
    5     25       2   0.8214  0.0724   0.6911    0.976
    8     23       1   0.7857  0.0775   0.6475    0.953
   12     21       1   0.7483  0.0824   0.6031    0.929
   13     20       1   0.7109  0.0863   0.5603    0.902
   18     19       1   0.6735  0.0895   0.5190    0.874
   23     18       1   0.6361  0.0921   0.4790    0.845
   26     17       1   0.5986  0.0939   0.4402    0.814
   27     16       1   0.5612  0.0952   0.4025    0.783
   30     15       1   0.5238  0.0959   0.3658    0.750
   42     14       1   0.4864  0.0961   0.3302    0.716
   56     13       1   0.4490  0.0957   0.2956    0.682
   62     12       1   0.4116  0.0948   0.2621    0.646
 …
CIs for S(52) for Aneuploid Tumors
• Linear
• Log
• Log-log
• Arcsine-Square root
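The four intervals for S(52) can be reproduced by hand from the aneuploid life table. The sketch below is one way to do it, assuming the risk sets and event counts printed by summary(fit) up to t = 51, the last event time at or before 52. Object names such as log_ci and arcsin_ci are illustrative.

```r
# Aneuploid group: risk sets and events at the event times up to t = 51
# (values copied from the life table in these slides).
Y <- c(52, 51, 49, 48, 47, 45, 43, 42, 41, 40, 39, 37, 36, 35)
d <- c( 1,  2,  1,  1,  2,  2,  1,  1,  1,  1,  2,  1,  1,  1)

S     <- prod(1 - d / Y)                  # Kaplan-Meier estimate of S(52)
sigma <- sqrt(sum(d / (Y * (Y - d))))     # sigma_S(52) from the Greenwood sum
z     <- qnorm(0.975)                     # 95% two-sided critical value

linear <- S + c(-1, 1) * z * sigma * S
log_ci <- S * exp(c(-1, 1) * z * sigma)
theta  <- exp(z * sigma / log(S))
loglog <- c(S^(1 / theta), S^theta)
half   <- 0.5 * z * sigma * sqrt(S / (1 - S))
arcsin_ci <- c(sin(max(0, asin(sqrt(S)) - half))^2,
               sin(min(pi / 2, asin(sqrt(S)) + half))^2)

round(rbind(linear, log = log_ci, loglog, arcsine = arcsin_ci), 3)
```

The log row should match the default survfit limits in the table (0.537, 0.797) up to rounding, and the arcsine row should match the output of the arcsincis() function shown later in the slides.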
Comparing CI’s
R Code
library(survival)   # provides Surv() and survfit()

### Aneuploid only
afit.lin<-survfit(dat[type==1]~1, conf.type="plain")
afit.log<-survfit(dat[type==1]~1, conf.type="log")
afit.loglog<-survfit(dat[type==1]~1, conf.type="log-log")

### Diploid only
dfit.lin<-survfit(dat[type==2]~1, conf.type="plain")
dfit.log<-survfit(dat[type==2]~1, conf.type="log")
dfit.loglog<-survfit(dat[type==2]~1, conf.type="log-log")
R Code
par(mfrow=c(2,2))
plot(afit.lin, conf.int=T, col=1, lwd=2, lty=c(2,2), main="Linear")
lines(dfit.lin, conf.int=T, col=3, lwd=2, lty=c(1,1))
mtext("S(t)", side=2, line=3, at=-0.4)
plot(afit.log, conf.int=T, col=1, lwd=2, lty=c(2,2), main="Log")
lines(dfit.log, conf.int=T, col=3, lwd=2, lty=c(1,1))
plot(afit.loglog, conf.int=T, col=1, lwd=2, lty=c(2,2), main="Log-log")
lines(dfit.loglog, conf.int=T, col=3, lwd=2, lty=c(1,1))
mtext("Time to Death (days)", side=1, line=3, at=500)
plot(afit.lin, conf.int=F, col=1, lwd=2, lty=2, main="Arcsine squareroot")
lines(dfit.lin, conf.int=F, col=3, lwd=2, lty=1)
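# aarcsn and darcsn are not defined on these slides; they are assumed to hold the arcsine
# square root limits at each time in the fit (column 1 = lower, column 3 = upper)
lines(afit.lin$time, aarcsn[,1], type="s", lwd=2, lty=2)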
lines(afit.lin$time, aarcsn[,3], type="s", lwd=2, lty=2)
lines(dfit.lin$time, darcsn[,1], type="s", col=3, lwd=2, lty=1)
lines(dfit.lin$time, darcsn[,3], type="s", col=3, lwd=2, lty=1)
R Code
par(mfrow=c(1,2), xpd=NA)
plot(afit.loglog, conf.int=T, col=2, lwd=2, lty=c(3,3), main="Aneuploid", ylab="S(t)")
lines(afit.log, conf.int=T, col=3, lwd=2, lty=c(2,2))
lines(afit.lin, conf.int=T, col=1, lwd=2, lty=c(1,1))
lines(afit.lin$time, aarcsn[,1], type="s", col=4, lwd=2, lty=4)
lines(afit.lin$time, aarcsn[,3], type="s", col=4, lwd=2, lty=4)
mtext("Time to Death (days)", side=1, line=3, at=500)
plot(dfit.loglog, conf.int=T, col=2, lwd=2, lty=c(3,3), main="Diploid")
lines(dfit.log, conf.int=T, col=3, lwd=2, lty=c(2,2))
lines(dfit.lin, conf.int=T, col=1, lwd=2, lty=c(1,1))
lines(dfit.lin$time, darcsn[,1], type="s", col=4, lwd=2, lty=4)
lines(dfit.lin$time, darcsn[,3], type="s", col=4, lwd=2, lty=4)
legend(x=125, y=1, legend=c("Linear","Log","Log-log","Arcsine sqrt"), col=c(1,3,2,4), lty=c(1,2,3,4), lwd=2, cex=0.75, bty="n")
CI Function for Arcsine Sqrt
# fit is a fitted survival curve from survfit
# t is the time at which we want to estimate S(t) and its CI
# alpha is the significance level we want to consider
arcsincis<-function(fit, t, alpha)
{
  # inherits() is used because class(fit) may have more than one element
  if(!inherits(fit, "survfit")) stop("The object is not of class 'survfit'")
  tloc<-max(which(fit$time<=t))   # index of the last observed time <= t
  St_hat<-fit$surv[tloc]
  # Greenwood sum: d_i/(Y_i*(Y_i-d_i)) over all observed times up to t
  gw<-c()
  for (i in 1:tloc)
  {
    di<-fit$n.event[i]
    yi<-fit$n.risk[i]
    gw<-append(gw, di/(yi*(yi-di)))
  }
  gwsum<-sum(gw)
  half<-0.5*qnorm(1-alpha/2)*sqrt(gwsum*St_hat/(1-St_hat))
  lo<-round((sin(max(0, asin(sqrt(St_hat))-half)))^2, 3)
  hi<-round((sin(min(pi/2, asin(sqrt(St_hat))+half)))^2, 3)
  St_hat<-round(St_hat, 3)
  cat("Arcsine Square root ", 100*(1-alpha), "% Confidence interval for S(", t, ") is: ",
      lo, " <= ", St_hat, " <= ", hi, sep="", "\n")
}
CI Function for Arcsine Sqrt
> afit<-survfit(dat[type==1]~1, conf.type="none")
> arcsincis(fit=afit, t=52, alpha=0.05)
Arcsine Square root 95% Confidence interval for S(52) is: 0.52 <= 0.654 <= 0.776
Just Some Reminders About Usage…
• For N > 25 and < 50% censoring
– Log-log is good
– Arcsine square-root good
– All three give ~ nominal coverage for 95% CI
– Exception: extreme right tail where there is little data
• Linear approach requires larger N for good coverage
Which Ones are More/Less Conservative
• Arcsine square root
– Slightly conservative
– A little wider than necessary
• Log
– Conservative for upper limit
– Anti-conservative on lower limit
• Log-log
– For small N, slightly anti-conservative
– A little too narrow
• Linear
– For small N, overly anti-conservative
– Too narrow
• Large Samples: all about the same
Confidence Intervals
• What is most commonly produced by software packages
• Valid ONLY for point-wise intervals
• Problem is they are often misinterpreted:
– Plot a set of point-wise 95% CIs
– Interpret as confidence “band”
– These “bands” are too narrow!
Confidence Bands
• A band for which we are 100(1-α)% confident that the survival function falls within the band for all t in some interval
• Tend to be wider than the point-wise estimates
• Looking for two random functions, L(t) and U(t), s.t.
$$1 - \alpha = P\left[ L(t) \le S(t) \le U(t) \ \text{ for all } t_L \le t \le t_U \right]$$
EP (“equal probability”) bands (Nair)
• Proportional to point-wise confidence bands
• Several steps:
1. Define a’s:
$$a_L = \frac{n\, \sigma_S^2(t_L)}{1 + n\, \sigma_S^2(t_L)}, \qquad a_U = \frac{n\, \sigma_S^2(t_U)}{1 + n\, \sigma_S^2(t_U)}$$
2. Define confidence coefficient: $c_\alpha(a_L, a_U)$ from table C3
3. Use linear, log-log, or arcsine approach…
Linear Confidence Bands
$$\hat S(t) \pm c_\alpha(a_L, a_U)\, \sigma_S(t)\, \hat S(t)$$
where
$$\sigma_S^2(t) = \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}$$
for $t_L \le t \le t_U$
Log-Log Confidence Bands
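• Define θ:
$$\theta = \exp\left\{ \frac{c_\alpha(a_L, a_U)\, \sigma_S(t)}{\ln \hat S(t)} \right\}$$
• Confidence band:
$$\left( \hat S(t)^{1/\theta},\ \hat S(t)^{\theta} \right)$$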
Arcsine-Square Root Confidence Bands
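$$\text{lower: } \sin^2\left\{ \max\left[ 0,\ \arcsin\left(\hat S(t)^{1/2}\right) - 0.5\, c_\alpha(a_L, a_U)\, \sigma_S(t) \left( \frac{\hat S(t)}{1 - \hat S(t)} \right)^{1/2} \right] \right\}$$
$$\text{upper: } \sin^2\left\{ \min\left[ \frac{\pi}{2},\ \arcsin\left(\hat S(t)^{1/2}\right) + 0.5\, c_\alpha(a_L, a_U)\, \sigma_S(t) \left( \frac{\hat S(t)}{1 - \hat S(t)} \right)^{1/2} \right] \right\}$$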
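The three EP band formulas differ from the point-wise intervals only in replacing z with the coefficient c_α(a_L, a_U). A minimal sketch follows, assuming the user looks the coefficient up in the table and passes it in. The function name ep_bands and its arguments are hypothetical, and the coefficient 2.8 in the usage line is a placeholder, not a value read from table C3.

```r
# S:       Kaplan-Meier estimates at times inside [t_L, t_U]
# sigma2:  Greenwood sums sigma_S^2(t) at the same times
# c_alpha: confidence coefficient c_alpha(a_L, a_U), supplied by the user
ep_bands <- function(S, sigma2, c_alpha,
                     type = c("linear", "log-log", "arcsine")) {
  type  <- match.arg(type)
  sigma <- sqrt(sigma2)
  if (type == "linear") {
    lower <- S - c_alpha * sigma * S
    upper <- S + c_alpha * sigma * S
  } else if (type == "log-log") {
    theta <- exp(c_alpha * sigma / log(S))
    lower <- S^(1 / theta)
    upper <- S^theta
  } else {
    half  <- 0.5 * c_alpha * sigma * sqrt(S / (1 - S))
    lower <- sin(pmax(0, asin(sqrt(S)) - half))^2
    upper <- sin(pmin(pi / 2, asin(sqrt(S)) + half))^2
  }
  cbind(lower = lower, S = S, upper = upper)
}

# Illustration only: 2.8 is a placeholder coefficient, NOT taken from table C3.
ep_bands(S = c(0.862, 0.173), sigma2 = c(0.0019988, 0.137027),
         c_alpha = 2.8, type = "log-log")
```

The linear version is not clipped to [0, 1], which mirrors the formula as written; the log-log and arcsine versions stay inside [0, 1] by construction.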
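Hall-Wellner Confidence Bands
• Similar to previous approach
• Confidence coefficients $k_\alpha(a_L, a_U)$ in table C4
• Linear:
$$\hat S(t) \pm \frac{k_\alpha(a_L, a_U) \left[ 1 + n\, \sigma_S^2(t) \right]}{n^{1/2}}\, \hat S(t)$$
• Log-log:
$$\left( \hat S(t)^{1/\theta},\ \hat S(t)^{\theta} \right), \quad \text{where } \theta = \exp\left\{ \frac{k_\alpha(a_L, a_U) \left[ 1 + n\, \sigma_S^2(t) \right]}{n^{1/2}\, \ln \hat S(t)} \right\}$$
• Arcsine:
$$\sin^2\left\{ \max\left[ 0,\ \arcsin\left(\hat S(t)^{1/2}\right) - 0.5\, \frac{k_\alpha(a_L, a_U) \left[ 1 + n\, \sigma_S^2(t) \right]}{n^{1/2}} \left( \frac{\hat S(t)}{1 - \hat S(t)} \right)^{1/2} \right] \right\} \le S(t)$$
$$S(t) \le \sin^2\left\{ \min\left[ \frac{\pi}{2},\ \arcsin\left(\hat S(t)^{1/2}\right) + 0.5\, \frac{k_\alpha(a_L, a_U) \left[ 1 + n\, \sigma_S^2(t) \right]}{n^{1/2}} \left( \frac{\hat S(t)}{1 - \hat S(t)} \right)^{1/2} \right] \right\}$$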
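The Hall-Wellner bands have the same three shapes as the EP bands, with the multiplier k_α(a_L, a_U)[1 + nσ_S²(t)]/√n in place of c_α(a_L, a_U)σ_S(t). A minimal sketch under the same assumptions: hw_bands is a hypothetical name, and the coefficient must be supplied by the user from table C4.

```r
# S:       Kaplan-Meier estimates at times inside [t_L, t_U]
# sigma2:  Greenwood sums sigma_S^2(t) at the same times
# k_alpha: confidence coefficient k_alpha(a_L, a_U), supplied by the user
# n:       total sample size
hw_bands <- function(S, sigma2, k_alpha, n,
                     type = c("linear", "log-log", "arcsine")) {
  type <- match.arg(type)
  w <- k_alpha * (1 + n * sigma2) / sqrt(n)   # Hall-Wellner multiplier
  if (type == "linear") {
    lower <- S - w * S
    upper <- S + w * S
  } else if (type == "log-log") {
    theta <- exp(w / log(S))
    lower <- S^(1 / theta)
    upper <- S^theta
  } else {
    half  <- 0.5 * w * sqrt(S / (1 - S))
    lower <- sin(pmax(0, asin(sqrt(S)) - half))^2
    upper <- sin(pmin(pi / 2, asin(sqrt(S)) + half))^2
  }
  cbind(lower = lower, S = S, upper = upper)
}

# Illustration only: k_alpha = 1.3 is a placeholder, NOT a value from table C4.
hw_bands(S = c(0.862, 0.173), sigma2 = c(0.0019988, 0.137027),
         k_alpha = 1.3, n = 80, type = "arcsine")
```

Because the multiplier grows with nσ_S²(t), these bands widen relative to the EP bands as t moves into the right tail.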
Life-Table for Tongue Cancer
> sort(tongue$Time)
 [1]   1   1   3   3   3   4   4   5   5   8   8  10  12  13  13  13  16  16  18  23  24  26  26  27  27  28  30  30  30  32
[31]  41  42  51  56  61  62  65  67  67  69  70  72  73  74  76  77  79  80  81  87  87  88  89  91  93  93  96
[58]  97 100 101 104 104 104 104 104 108 109 112 120 129 131 150 157 167 176 181 231 231 240 400
> dat<-Surv(tongue$Time, tongue$Cens)
> full.mod<-survfit(dat~1)
> summary(full.mod)
Call: survfit(formula = dat ~ 1)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1     80       2    0.975  0.0175       0.9414        1.000
    3     78       3    0.938  0.0271       0.8859        0.992
    4     75       2    0.913  0.0316       0.8526        0.977
    5     73       2    0.888  0.0353       0.8209        0.960
    8     71       1    0.875  0.0370       0.8054        0.951
   10     69       1    0.862  0.0386       0.7900        0.941
   12     68       1    0.850  0.0400       0.7747        0.932
 …
  157      8       1    0.252  0.0634       0.1541        0.413
  167      7       1    0.216  0.0638       0.1213        0.386
  181      5       1    0.173  0.0640       0.0838        0.357
Example: Tongue Cancer
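• Steps:
1. Choose an interval of time: $t_L = 10$ and $t_U = 200$
2. Define a’s:
$$\sigma_S^2(t_L) = \sigma_S^2(10) = \frac{2}{80 \cdot 78} + \frac{3}{78 \cdot 75} + \frac{2}{75 \cdot 73} + \frac{2}{73 \cdot 71} + \frac{1}{71 \cdot 70} + \frac{1}{69 \cdot 68} = 0.0019988$$
$$\sigma_S^2(t_U) = \sigma_S^2(200) = \sigma_S^2(181) = 0.137027$$
$$a_L = \frac{n\, \sigma_S^2(t_L)}{1 + n\, \sigma_S^2(t_L)} = \frac{80 \cdot 0.0019988}{1 + 80 \cdot 0.0019988} = 0.1379$$
$$a_U = \frac{n\, \sigma_S^2(t_U)}{1 + n\, \sigma_S^2(t_U)} = \frac{80 \cdot 0.137027}{1 + 80 \cdot 0.137027} = 0.9164$$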
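The arithmetic in step 2 can be checked in a few lines of R. This is a sketch using the n.risk and n.event columns of the full-sample life table up to t = 10; the value of σ_S²(181) is taken as given on the slide, and the helper name a_of is an illustrative assumption.

```r
n <- 80
# Risk sets and events at the event times 1, 3, 4, 5, 8, 10 (full-sample life table)
Y <- c(80, 78, 75, 73, 71, 69)
d <- c( 2,  3,  2,  2,  1,  1)

sigma2_L <- sum(d / (Y * (Y - d)))   # sigma_S^2(10)
sigma2_U <- 0.137027                 # sigma_S^2(181), as stated on the slide

a_of <- function(s2) n * s2 / (1 + n * s2)
a_L <- a_of(sigma2_L)
a_U <- a_of(sigma2_U)

round(c(sigma2_L = sigma2_L, a_L = a_L, a_U = a_U), 4)
```

With a_L and a_U in hand, the confidence coefficient is read from the table at the nearest tabulated pair.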
Example: Tongue Cancer
• Steps:
3. Define confidence coefficient: $c_\alpha(a_L, a_U)$ from table C3 (or $k_\alpha(a_L, a_U)$ from table C4)
4. Use linear, (log), log-log, or arcsine approach…
Example: Confidence Bands (Nair)
[Figure: four panels of Nair confidence bands, titled Linear, Log, Loglog, and Arcsine-Sqrt. x-axis in each panel: 0 to 250; y-axis: 0.0 to 1.0]
There is an R Package for That…
• R package “km.ci”
– Will estimate different types of confidence intervals
  • But no arcsine square root
– Will estimate both the Nair and Hall-Wellner confidence bands
  • But only the log-log transformed
• It does include the $c_\alpha$ and $k_\alpha$ tables from Klein and Moeschberger!
– So… you could write your own confidence band function
Confidence Band Performance
• Linear is poor for n < 200
– Poor coverage probability
• Log-log and arcsine square-root have pretty accurate coverage probabilities (even for n = 20)
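Mean Estimation
• Recall:
$$\mu = \int_0^\infty S(t)\, dt$$
• Non-parametric approach
– Can be done using the integral of the estimated survival function
– Requires that the last observed time is not censored
  • “fixes” exist if the last time is censored
$$\hat\mu_\tau = \int_0^\tau \hat S(t)\, dt \quad \& \quad \widehat{Var}\left[\hat\mu_\tau\right] = \sum_{i=1}^{D} \left[ \int_{t_i}^{\tau} \hat S(t)\, dt \right]^2 \frac{d_i}{Y_i (Y_i - d_i)}$$
$$100(1-\alpha)\%\ \text{CI}: \quad \hat\mu_\tau \pm z_{1-\alpha/2} \sqrt{\widehat{Var}\left[\hat\mu_\tau\right]}$$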
Example: MP-6 treatment for Leukemia
• Time in months to relapse for acute leukemia patients
• 21 patients observed for up to 35 weeks

Time  di  Yi  S(t)
   6   3  21  0.857
   7   1  17  0.807
  10   1  15  0.753
  13   1  12  0.690
  16   1  11  0.628
  22   1   7  0.538
  23   1   6  0.448
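A minimal sketch of the mean calculation for this table: the area under the Kaplan-Meier step function up to τ, with the variance formula from the Mean Estimation slide. Taking τ = 35 (the end of follow-up stated above) is an assumption made for illustration, as are the object names.

```r
event_time <- c(6, 7, 10, 13, 16, 22, 23)
d   <- c(3, 1, 1, 1, 1, 1, 1)
Y   <- c(21, 17, 15, 12, 11, 7, 6)
tau <- 35                                   # assumed upper limit of integration

S <- cumprod(1 - d / Y)                     # Kaplan-Meier estimate at each event time

# S(t) is 1 on [0, t_1), S[i] on [t_i, t_{i+1}) and S[D] on [t_D, tau]
widths  <- diff(c(0, event_time, tau))
heights <- c(1, S)
mu_hat  <- sum(widths * heights)            # restricted mean: area under the curve

# Area under the curve from each t_i to tau, used in the variance formula
area_from_ti <- rev(cumsum(rev(widths[-1] * S)))
var_mu <- sum(area_from_ti^2 * d / (Y * (Y - d)))

ci <- mu_hat + c(-1, 1) * qnorm(0.975) * sqrt(var_mu)
round(c(mean = mu_hat, se = sqrt(var_mu), lower = ci[1], upper = ci[2]), 3)
```

Because the curve has not reached zero at τ, this is the mean restricted to [0, τ] rather than the unrestricted mean.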
Median Survival
• Any quantile of S(t) can be estimated in the same way
• The most common is median (p = 0.50)
• Definition of median survival:
$$\hat x_{0.5} = \inf\left\{ t : \hat S(t) \le 0.5 \right\}$$
• That is, the smallest event time for which Ŝ(t) is less than or equal to 0.50
Other Quantiles
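• For the pth quantile
$$\hat x_p = \inf\left\{ t : \hat S(t) \le 1 - p \right\}$$
• So for example, the 0.25 quantile (25th percentile) is
$$\hat x_{0.25} = \inf\left\{ t : \hat S(t) \le 1 - 0.25 \right\} = \inf\left\{ t : \hat S(t) \le 0.75 \right\}$$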
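The quantile definition translates directly into a lookup on the Kaplan-Meier table. A small sketch, using the leukemia life table from earlier in these slides; the helper name km_quantile is hypothetical.

```r
# Smallest event time at which the estimated survival drops to 1 - p or below.
# Returns NA if the curve never gets that low.
km_quantile <- function(event_time, S, p) {
  hit <- which(S <= 1 - p)
  if (length(hit) == 0) NA_real_ else event_time[min(hit)]
}

event_time <- c(6, 7, 10, 13, 16, 22, 23)
S <- c(0.857, 0.807, 0.753, 0.690, 0.628, 0.538, 0.448)

km_quantile(event_time, S, p = 0.50)   # median: 23
km_quantile(event_time, S, p = 0.25)   # 0.25 quantile: 13
```

Returning NA when the curve never reaches 1 - p reflects the fact that upper quantiles may not be estimable under heavy censoring.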
Precision of pth Quantile
• Technically difficult…
• Requires knowledge of the density function of X at $x_p$
• Approximation approach by Brookmeyer and Crowley (1982)
• Most commonly used approach for estimating a confidence interval for median survival
Brookmeyer-Crowley Approach
• For each observed t, estimate a z-score
• Example
– 95% confidence interval
  • Calculate Z for each t
  • All t for which |Z| < 1.96 are included in the confidence interval
• Looks similar to an approach for estimating the confidence interval for a mean
Brookmeyer-Crowley Approach
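• Linear
$$Z = \frac{\hat S(t) - (1 - p)}{\sqrt{\hat V[\hat S(t)]}}$$
• Log-log
$$Z = \frac{\left\{ \ln\left[-\ln \hat S(t)\right] - \ln\left[-\ln(1 - p)\right] \right\} \hat S(t) \ln \hat S(t)}{\sqrt{\hat V[\hat S(t)]}}$$
• Arcsine square root
$$Z = \frac{2 \left\{ \arcsin\left[\hat S(t)^{1/2}\right] - \arcsin\left[(1 - p)^{1/2}\right] \right\} \left[ \hat S(t) \left(1 - \hat S(t)\right) \right]^{1/2}}{\sqrt{\hat V[\hat S(t)]}}$$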
Example
• Kim paper
• Event = time to relapse
• Data:
– 10, 20+, 35, 40+, 50+, 55, 70+, 71+, 80, 90+
Kim Data
[Figure: estimated survival function for the Kim data, comparing the Kaplan-Meier and Nelson-Aalen estimates. x-axis: Time to Relapse (months), 0 to 80; y-axis: Survival Function, 0.0 to 1.0]
Kim Data

 t  di  Yi  S(t)  V(t)  Linear Z  Log-log Z  Arcsine Z
10   1  10
35   1   8
55   1   5
80   1   2
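The remaining columns of this table follow from t, d_i and Y_i alone. The sketch below computes them for the median (p = 0.5) using the three Brookmeyer-Crowley Z statistics; the column names are illustrative.

```r
t_i <- c(10, 35, 55, 80)
d   <- c(1, 1, 1, 1)
Y   <- c(10, 8, 5, 2)
p   <- 0.5                                  # median

S <- cumprod(1 - d / Y)                     # Kaplan-Meier estimate
V <- S^2 * cumsum(d / (Y * (Y - d)))        # Greenwood variance of S(t)

z_lin    <- (S - (1 - p)) / sqrt(V)
z_loglog <- (log(-log(S)) - log(-log(1 - p))) * S * log(S) / sqrt(V)
z_arcsin <- 2 * (asin(sqrt(S)) - asin(sqrt(1 - p))) * sqrt(S * (1 - S)) / sqrt(V)

round(data.frame(t = t_i, d, Y, S, V, z_lin, z_loglog, z_arcsin), 3)

# Event times retained by the linear 95% rule |Z| < 1.96
t_i[abs(z_lin) < qnorm(0.975)]
```

Only event times are evaluated here; with four events the retained set is coarse, which is the small-sample limitation the next slide points out.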
CI Around Median
• Median for Kim data is 80 weeks
• CI around this value is the set of all points that satisfy the selected inequality
– E.g. for 90%, Z is 1.645
– In the case of the Kim data, estimates highly variable
– Limitation with SMALL data set
For Next Time
• Left Truncated data
• Competing Risks