Lecture 6: Non-parametric Estimation II

- Confidence intervals and bands
- Mean and median

Point-wise Confidence Intervals

• Recall that last time we discussed several ways to construct point-wise confidence intervals for S(t) at a particular t

• These intervals rely heavily on the Greenwood estimate of the variance of $\hat S(t)$

• Recall that $\sigma_S^2(t)$ denotes the sum in Greenwood's formula:

$$\hat V[\hat S(t)] = \hat S(t)^2 \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)} = \hat S(t)^2\, \sigma_S^2(t)$$
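The Greenwood sum can be computed directly from a life table. Below is a minimal base-R sketch; the function name `greenwood()` is hypothetical (not from any package), and it assumes the distinct event times, deaths $d_i$, and numbers at risk $Y_i$ are already tabulated. The inputs are the first six event times of the full tongue-cancer life table used later in this lecture, so $\sigma_S^2(10)$ should come out near 0.0019988.

```r
# Greenwood variance of the Kaplan-Meier estimate at time t (hypothetical helper).
#   time : distinct event times t_i
#   d    : number of deaths d_i at each t_i
#   Y    : number at risk Y_i at each t_i
greenwood <- function(time, d, Y, t) {
  keep   <- time <= t                                        # event times t_i <= t
  S_hat  <- prod(1 - d[keep] / Y[keep])                      # Kaplan-Meier estimate
  sigma2 <- sum(d[keep] / (Y[keep] * (Y[keep] - d[keep])))   # sigma_S^2(t)
  list(S = S_hat, sigma2 = sigma2, var = S_hat^2 * sigma2)   # var = Greenwood V[S(t)]
}

# First six event times of the full tongue-cancer life table
time <- c(1, 3, 4, 5, 8, 10)
d    <- c(2, 3, 2, 2, 1, 1)
Y    <- c(80, 78, 75, 73, 71, 69)
g <- greenwood(time, d, Y, t = 10)
round(g$sigma2, 7)   # the sum in Greenwood's formula
sqrt(g$var)          # standard error of S(10)
```

The standard error should match the `std.err` column of `summary()` for the full data at t = 10 (0.0386).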

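Types of Point-wise CIs

In each case $\sigma_S^2(t) = \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}$ is the Greenwood sum.

• Linear:

$$\hat S(t) \pm z_{1-\alpha/2}\, \sigma_S(t)\, \hat S(t)$$

• Log:

$$\left[\hat S(t)\, e^{-z_{1-\alpha/2}\, \sigma_S(t)},\; \hat S(t)\, e^{z_{1-\alpha/2}\, \sigma_S(t)}\right]$$

• Log-log:

$$\left[\hat S(t)^{1/\theta},\; \hat S(t)^{\theta}\right], \quad \text{where } \theta = \exp\left\{\frac{z_{1-\alpha/2}\, \sigma_S(t)}{\ln \hat S(t)}\right\}$$

• Arcsine square root:

$$\text{lower: } \sin^2\left\{\max\left[0,\; \arcsin\left(\hat S(t)^{1/2}\right) - 0.5\, z_{1-\alpha/2}\, \sigma_S(t) \left(\frac{\hat S(t)}{1 - \hat S(t)}\right)^{1/2}\right]\right\}$$

$$\text{upper: } \sin^2\left\{\min\left[\frac{\pi}{2},\; \arcsin\left(\hat S(t)^{1/2}\right) + 0.5\, z_{1-\alpha/2}\, \sigma_S(t) \left(\frac{\hat S(t)}{1 - \hat S(t)}\right)^{1/2}\right]\right\}$$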

Example: Tongue Cancer

• 80 subjects with tongue cancer
• Outcome: time to death (in weeks)
• Two tumor types:
  – Aneuploid
  – Diploid

[Figure: Kaplan-Meier curves of time to death by tumor type (Aneuploid, Diploid)]

R Life Tables

```
> dat <- Surv(tongue$Time, tongue$Cens)
> type <- tongue$Type
> fit <- survfit(dat ~ type)
> summary(fit)
Call: survfit(formula = dat ~ type)

type=1 (Aneuploid)
 time n.risk n.event survival std.err L 95% CI U 95% CI
    1     52       1    0.981  0.0190    0.944    1.000
    3     51       2    0.942  0.0323    0.881    1.000
    4     49       1    0.923  0.0370    0.853    0.998
   10     48       1    0.904  0.0409    0.827    0.988
   13     47       2    0.865  0.0473    0.777    0.963
   16     45       2    0.827  0.0525    0.730    0.936
   24     43       1    0.808  0.0547    0.707    0.922
   26     42       1    0.788  0.0566    0.685    0.908
   27     41       1    0.769  0.0584    0.663    0.893
   28     40       1    0.750  0.0600    0.641    0.877
   30     39       2    0.712  0.0628    0.598    0.846
   32     37       1    0.692  0.0640    0.578    0.830
   41     36       1    0.673  0.0651    0.557    0.813
   51     35       1    0.654  0.0660    0.537    0.797
   65     33       1    0.634  0.0669    0.516    0.780
 …
```

```
type=2 (Diploid)
 time n.risk n.event survival std.err L 95% CI U 95% CI
    1     28       1   0.9643  0.0351   0.8979    1.000
    3     27       1   0.9286  0.0487   0.8379    1.000
    4     26       1   0.8929  0.0585   0.7853    1.000
    5     25       2   0.8214  0.0724   0.6911    0.976
    8     23       1   0.7857  0.0775   0.6475    0.953
   12     21       1   0.7483  0.0824   0.6031    0.929
   13     20       1   0.7109  0.0863   0.5603    0.902
   18     19       1   0.6735  0.0895   0.5190    0.874
   23     18       1   0.6361  0.0921   0.4790    0.845
   26     17       1   0.5986  0.0939   0.4402    0.814
   27     16       1   0.5612  0.0952   0.4025    0.783
   30     15       1   0.5238  0.0959   0.3658    0.750
   42     14       1   0.4864  0.0961   0.3302    0.716
   56     13       1   0.4490  0.0957   0.2956    0.682
   62     12       1   0.4116  0.0948   0.2621    0.646
 …
```

CIs for S(52) for Aneuploid Tumors

• Linear

• Log

• Log-log

• Arcsine-Square root
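A minimal base-R sketch of the four point-wise intervals for S(52) in the aneuploid group. It assumes only the event-time rows of the `summary(fit)` output above with t ≤ 51 are needed (51 is the last event time not exceeding 52); the helper `pointwise_cis()` is a hypothetical name introduced here, not a `survival` package function. The log interval should agree with the `survfit` output above (0.537, 0.797), and the arcsine interval with the `arcsincis()` result later in the lecture.

```r
# Point-wise 100(1-alpha)% CIs for S(t) under four transformations
# (hypothetical helper), given the KM estimate S and the Greenwood sum sigma2.
pointwise_cis <- function(S, sigma2, alpha = 0.05) {
  z     <- qnorm(1 - alpha / 2)
  sigma <- sqrt(sigma2)
  # Linear (plain): S +/- z * sigma * S, truncated to [0, 1]
  linear <- c(max(0, S - z * sigma * S), min(1, S + z * sigma * S))
  # Log: S * exp(-/+ z * sigma), upper limit truncated at 1
  log_ci <- c(S * exp(-z * sigma), min(1, S * exp(z * sigma)))
  # Log-log: (S^(1/theta), S^theta) with theta = exp(z * sigma / ln S)
  theta  <- exp(z * sigma / log(S))
  loglog <- c(S^(1 / theta), S^theta)
  # Arcsine square root
  half   <- 0.5 * z * sigma * sqrt(S / (1 - S))
  arcsin <- c(sin(max(0, asin(sqrt(S)) - half))^2,
              sin(min(pi / 2, asin(sqrt(S)) + half))^2)
  rbind(linear = linear, log = log_ci, loglog = loglog, arcsine = arcsin)
}

# Aneuploid event times 1, 3, 4, ..., 51 weeks (deaths d_i, at risk Y_i)
d <- c(1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1)
Y <- c(52, 51, 49, 48, 47, 45, 43, 42, 41, 40, 39, 37, 36, 35)
S52    <- prod(1 - d / Y)            # Kaplan-Meier estimate of S(52)
sigma2 <- sum(d / (Y * (Y - d)))     # Greenwood sum sigma_S^2(52)
ci <- round(pointwise_cis(S52, sigma2), 3)
colnames(ci) <- c("lower", "upper")
ci
```

The linear interval is the narrowest at the lower end and the log-log interval is shifted furthest down, which matches the small-sample behavior summarized in the reminders that follow.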

Comparing CIs

R Code

```r
library(survival)

### Aneuploid only
afit.lin    <- survfit(dat[type == 1] ~ 1, conf.type = "plain")
afit.log    <- survfit(dat[type == 1] ~ 1, conf.type = "log")
afit.loglog <- survfit(dat[type == 1] ~ 1, conf.type = "log-log")

### Diploid only
dfit.lin    <- survfit(dat[type == 2] ~ 1, conf.type = "plain")
dfit.log    <- survfit(dat[type == 2] ~ 1, conf.type = "log")
dfit.loglog <- survfit(dat[type == 2] ~ 1, conf.type = "log-log")
```

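R Code

```r
par(mfrow = c(2, 2))

# Linear (plain) CIs: Aneuploid dashed black, Diploid solid green
plot(afit.lin, conf.int = TRUE, col = 1, lwd = 2, lty = c(2, 2), main = "Linear")
lines(dfit.lin, conf.int = TRUE, col = 3, lwd = 2, lty = c(1, 1))
mtext("S(t)", side = 2, line = 3, at = -0.4)

# Log-transformed CIs
plot(afit.log, conf.int = TRUE, col = 1, lwd = 2, lty = c(2, 2), main = "Log")
lines(dfit.log, conf.int = TRUE, col = 3, lwd = 2, lty = c(1, 1))

# Log-log CIs
plot(afit.loglog, conf.int = TRUE, col = 1, lwd = 2, lty = c(2, 2), main = "Log-log")
lines(dfit.loglog, conf.int = TRUE, col = 3, lwd = 2, lty = c(1, 1))
mtext("Time to Death (weeks)", side = 1, line = 3, at = 500)

# Arcsine square root CIs, drawn by hand. aarcsn / darcsn are assumed to be
# matrices of limits at each time in afit.lin$time / dfit.lin$time
# (column 1 = lower, column 3 = upper), computed beforehand (not shown).
plot(afit.lin, conf.int = FALSE, col = 1, lwd = 2, lty = 2, main = "Arcsine square root")
lines(dfit.lin, conf.int = FALSE, col = 3, lwd = 2, lty = 1)
lines(afit.lin$time, aarcsn[, 1], type = "s", lwd = 2, lty = 2)
lines(afit.lin$time, aarcsn[, 3], type = "s", lwd = 2, lty = 2)
lines(dfit.lin$time, darcsn[, 1], type = "s", col = 3, lwd = 2, lty = 1)
lines(dfit.lin$time, darcsn[, 3], type = "s", col = 3, lwd = 2, lty = 1)
```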

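R Code

```r
par(mfrow = c(1, 2), xpd = NA)

# Aneuploid: all four point-wise CI types on one panel.
# aarcsn / darcsn: arcsine-square-root limits (column 1 = lower,
# column 3 = upper), assumed computed beforehand (not shown).
plot(afit.loglog, conf.int = TRUE, col = 2, lwd = 2, lty = c(3, 3), main = "Aneuploid", ylab = "S(t)")
lines(afit.log, conf.int = TRUE, col = 3, lwd = 2, lty = c(2, 2))
lines(afit.lin, conf.int = TRUE, col = 1, lwd = 2, lty = c(1, 1))
lines(afit.lin$time, aarcsn[, 1], type = "s", col = 4, lwd = 2, lty = 4)
lines(afit.lin$time, aarcsn[, 3], type = "s", col = 4, lwd = 2, lty = 4)
mtext("Time to Death (weeks)", side = 1, line = 3, at = 500)

# Diploid: same four CI types
plot(dfit.loglog, conf.int = TRUE, col = 2, lwd = 2, lty = c(3, 3), main = "Diploid")
lines(dfit.log, conf.int = TRUE, col = 3, lwd = 2, lty = c(2, 2))
lines(dfit.lin, conf.int = TRUE, col = 1, lwd = 2, lty = c(1, 1))
lines(dfit.lin$time, darcsn[, 1], type = "s", col = 4, lwd = 2, lty = 4)
lines(dfit.lin$time, darcsn[, 3], type = "s", col = 4, lwd = 2, lty = 4)
legend(x = 125, y = 1, legend = c("Linear", "Log", "Log-log", "Arcsine sqrt"),
       col = c(1, 3, 2, 4), lty = c(1, 2, 3, 4), lwd = 2, cex = 0.75, bty = "n")
```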

CI Function for Arcsine Sqrt

```r
# Arcsine-square-root CI for S(t)
#   fit   : fitted survival curve from survfit()
#   t     : time at which we want to estimate S(t) and its CI
#   alpha : significance level (0.05 gives a 95% CI)
arcsincis <- function(fit, t, alpha) {
  if (!inherits(fit, "survfit")) stop("The object is not of class 'survfit'")
  tloc   <- max(which(fit$time <= t))    # index of the last time <= t
  St_hat <- fit$surv[tloc]
  # Greenwood sum: d_i / (Y_i (Y_i - d_i)) over all times <= t
  di    <- fit$n.event[1:tloc]
  yi    <- fit$n.risk[1:tloc]
  gwsum <- sum(di / (yi * (yi - di)))
  half  <- 0.5 * qnorm(1 - alpha / 2) * sqrt(gwsum * St_hat / (1 - St_hat))
  lo <- round(sin(max(0, asin(sqrt(St_hat)) - half))^2, 3)
  hi <- round(sin(min(pi / 2, asin(sqrt(St_hat)) + half))^2, 3)
  cat("Arcsine Square root ", 100 * (1 - alpha), "% Confidence interval for S(", t,
      ") is: ", lo, " <= ", round(St_hat, 3), " <= ", hi, "\n", sep = "")
}
```

```
> afit <- survfit(dat[type == 1] ~ 1, conf.type = "none")
> arcsincis(fit = afit, t = 52, alpha = 0.05)
Arcsine Square root 95% Confidence interval for S(52) is: 0.52 <= 0.654 <= 0.776
```

Just Some Reminders About Usage…

• For N > 25 and < 50% censoring:
  – Log-log is good
  – Arcsine square root is good
  – All three give approximately nominal coverage for a 95% CI
  – Exception: the extreme right tail, where there are few data
• The linear approach requires a larger N for good coverage

Which Ones are More/Less Conservative

• Arcsine square root
  – Slightly conservative
  – A little wider than necessary
• Log
  – Conservative for the upper limit
  – Anti-conservative for the lower limit
• Log-log
  – For small N, slightly anti-conservative
  – A little too narrow
• Linear
  – For small N, overly anti-conservative
  – Too narrow
• Large samples: all about the same

Confidence Intervals

• Point-wise intervals are what software packages most commonly produce
• They are valid ONLY point-wise, i.e., at one fixed t at a time
• The problem is that they are often misinterpreted:
  – Plot a set of point-wise 95% CIs
  – Interpret them as a confidence "band"
  – These "bands" are too narrow!

Confidence Bands

• A band within which we are 100(1-α)% confident that the survival function falls for all t in some interval $[t_L, t_U]$

• Tends to be wider than the point-wise intervals

• We look for two random functions, L(t) and U(t), such that

$$P\left[L(t) \le S(t) \le U(t) \ \text{ for all } t \in [t_L, t_U]\right] = 1 - \alpha$$

EP (“equal probability”) bands (Nair)

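• Proportional to the point-wise confidence intervals
• Several steps:

1. Define the a's:

$$a_L = \frac{n\, \sigma_S^2(t_L)}{1 + n\, \sigma_S^2(t_L)}, \qquad a_U = \frac{n\, \sigma_S^2(t_U)}{1 + n\, \sigma_S^2(t_U)}$$

2. Find the confidence coefficient $c_\alpha(a_L, a_U)$ from Table C3

3. Use the linear, log-log, or arcsine approach…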

Linear Confidence Bands

$$\hat S(t) \pm c_\alpha(a_L, a_U)\, \sigma_S(t)\, \hat S(t), \qquad t_L \le t \le t_U$$

Log-Log Confidence Bands

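• Define θ:

$$\theta = \exp\left\{\frac{c_\alpha(a_L, a_U)\, \sigma_S(t)}{\ln \hat S(t)}\right\}$$

• Confidence band:

$$\left[\hat S(t)^{1/\theta},\; \hat S(t)^{\theta}\right]$$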

Arcsine-Square Root Confidence Bands

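$$\text{lower: } \sin^2\left\{\max\left[0,\; \arcsin\left(\hat S(t)^{1/2}\right) - 0.5\, c_\alpha(a_L, a_U)\, \sigma_S(t) \left(\frac{\hat S(t)}{1 - \hat S(t)}\right)^{1/2}\right]\right\}$$

$$\text{upper: } \sin^2\left\{\min\left[\frac{\pi}{2},\; \arcsin\left(\hat S(t)^{1/2}\right) + 0.5\, c_\alpha(a_L, a_U)\, \sigma_S(t) \left(\frac{\hat S(t)}{1 - \hat S(t)}\right)^{1/2}\right]\right\}$$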

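Hall-Wellner Confidence Bands

• Similar to the previous approach
• Confidence coefficients $k_\alpha(a_L, a_U)$ from Table C4

• Linear:

$$\hat S(t) \pm \frac{k_\alpha(a_L, a_U)\, [1 + n\, \sigma_S^2(t)]}{n^{1/2}}\, \hat S(t)$$

• Log-log:

$$\left[\hat S(t)^{1/\theta},\; \hat S(t)^{\theta}\right], \quad \text{where } \theta = \exp\left\{\frac{k_\alpha(a_L, a_U)\, [1 + n\, \sigma_S^2(t)]}{n^{1/2} \ln \hat S(t)}\right\}$$

• Arcsine:

$$\text{lower: } \sin^2\left\{\max\left[0,\; \arcsin\left(\hat S(t)^{1/2}\right) - 0.5\, \frac{k_\alpha(a_L, a_U)\, [1 + n\, \sigma_S^2(t)]}{n^{1/2}} \left(\frac{\hat S(t)}{1 - \hat S(t)}\right)^{1/2}\right]\right\}$$

$$\text{upper: } \sin^2\left\{\min\left[\frac{\pi}{2},\; \arcsin\left(\hat S(t)^{1/2}\right) + 0.5\, \frac{k_\alpha(a_L, a_U)\, [1 + n\, \sigma_S^2(t)]}{n^{1/2}} \left(\frac{\hat S(t)}{1 - \hat S(t)}\right)^{1/2}\right]\right\}$$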
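All of the band formulas share the structure of the point-wise intervals, with $z_{1-\alpha/2}\,\sigma_S(t)$ replaced by a band multiplier: $c_\alpha(a_L, a_U)\,\sigma_S(t)$ for Nair's EP bands and $k_\alpha(a_L, a_U)[1 + n\sigma_S^2(t)]/n^{1/2}$ for Hall-Wellner bands. The base-R sketch below exploits that shared structure; the function names are hypothetical, and $c_\alpha$ and $k_\alpha$ must still be read from Tables C3 and C4, so no table values are hard-coded. As a check, plugging in the point-wise multiplier reproduces the arcsine interval for the aneuploid S(52), whose Greenwood sum is about 0.010181.

```r
# Band limits for S(t) given a multiplier m(t); assumes 0 < S(t) < 1.
#   Point-wise:   m = z_{1-alpha/2} * sigma_S(t)
#   EP (Nair):    m = c_alpha(a_L, a_U) * sigma_S(t)                  (Table C3)
#   Hall-Wellner: m = k_alpha(a_L, a_U) * (1 + n sigma_S^2) / sqrt(n)  (Table C4)
band_limits <- function(S, m, type = c("linear", "loglog", "arcsine")) {
  type <- match.arg(type)
  if (type == "linear") {
    lower <- pmax(0, S - m * S)
    upper <- pmin(1, S + m * S)
  } else if (type == "loglog") {
    theta <- exp(m / log(S))
    lower <- S^(1 / theta)
    upper <- S^theta
  } else {
    half  <- 0.5 * m * sqrt(S / (1 - S))
    lower <- sin(pmax(0, asin(sqrt(S)) - half))^2
    upper <- sin(pmin(pi / 2, asin(sqrt(S)) + half))^2
  }
  cbind(lower = lower, upper = upper)
}

# Multipliers (vectorized over t); c_alpha / k_alpha come from Tables C3 / C4
ep_multiplier <- function(sigma2, c_alpha) c_alpha * sqrt(sigma2)
hw_multiplier <- function(sigma2, k_alpha, n) k_alpha * (1 + n * sigma2) / sqrt(n)

# Check: the point-wise multiplier reproduces the point-wise arcsine CI
S      <- 34 / 52     # aneuploid S(52)
sigma2 <- 0.010181    # Greenwood sum at t = 52
round(band_limits(S, qnorm(0.975) * sqrt(sigma2), "arcsine"), 3)

# For a band over [t_L, t_U], pass vectors of S(t) and sigma_S^2(t) and the
# table value instead, e.g. band_limits(S_vec, ep_multiplier(s2_vec, c_alpha), "loglog")
```

Writing the bands this way makes it easy to see why they are wider than point-wise intervals: the table coefficients exceed $z_{1-\alpha/2}$, so every limit moves outward.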

Life-Table for Tongue Cancer

```
> sort(tongue$Time)
 [1]   1   1   3   3   3   4   4   5   5   8   8  10  12  13  13  13  16  16  18  23  24  26  26  27  27  28  30  30  30
[30]  32  41  42  51  56  61  62  65  67  67  69  70  72  73  74  76  77  79  80  81  87  87  88  89  91  93  93  96
[58]  97 100 101 104 104 104 104 104 108 109 112 120 129 131 150 157 167 176 181 231 231 240 400
> dat <- Surv(tongue$Time, tongue$Cens)
> full.mod <- survfit(dat ~ 1)
> summary(full.mod)
Call: survfit(formula = dat ~ 1)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1     80       2    0.975  0.0175       0.9414        1.000
    3     78       3    0.938  0.0271       0.8859        0.992
    4     75       2    0.913  0.0316       0.8526        0.977
    5     73       2    0.888  0.0353       0.8209        0.960
    8     71       1    0.875  0.0370       0.8054        0.951
   10     69       1    0.862  0.0386       0.7900        0.941
   12     68       1    0.850  0.0400       0.7747        0.932
 …
  157      8       1    0.252  0.0634       0.1541        0.413
  167      7       1    0.216  0.0638       0.1213        0.386
  181      5       1    0.173  0.0640       0.0838        0.357
```

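• Steps:

1. Choose an interval of time: $t_L = 10$ and $t_U = 200$

2. Define the a's:

$$\sigma_S^2(t_L) = \sigma_S^2(10) = \frac{2}{80 \cdot 78} + \frac{3}{78 \cdot 75} + \frac{2}{75 \cdot 73} + \frac{2}{73 \cdot 71} + \frac{1}{71 \cdot 70} + \frac{1}{69 \cdot 68} = 0.0019988$$

$$\sigma_S^2(t_U) = \sigma_S^2(200) = \sigma_S^2(181) = 0.137027$$

$$a_L = \frac{80 \times 0.0019988}{1 + 80 \times 0.0019988} = 0.1379, \qquad a_U = \frac{80 \times 0.137027}{1 + 80 \times 0.137027} = 0.9164$$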
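The arithmetic for $a_L$ and $a_U$ can be checked in a few lines of base R. This sketch recomputes $\sigma_S^2(10)$ from the life-table rows and takes $\sigma_S^2(200) = 0.137027$ from the slide; that value is consistent with the reported standard error at t = 181, since $(0.0640/0.173)^2 \approx 0.137$. The helper name `a_const()` is hypothetical.

```r
n <- 80
# Greenwood sum at t_L = 10: event times 1, 3, 4, 5, 8, 10
d <- c(2, 3, 2, 2, 1, 1)
Y <- c(80, 78, 75, 73, 71, 69)
sigma2_L <- sum(d / (Y * (Y - d)))
# Greenwood sum at t_U = 200 (last event time 181), taken from the slide
sigma2_U <- 0.137027

# a = n * sigma^2 / (1 + n * sigma^2)
a_const <- function(s2, n) n * s2 / (1 + n * s2)
a_L <- a_const(sigma2_L, n)
a_U <- a_const(sigma2_U, n)
round(c(sigma2_L = sigma2_L, a_L = a_L, a_U = a_U), 4)
```

These two values are the row and column coordinates used to look up $c_\alpha(a_L, a_U)$ or $k_\alpha(a_L, a_U)$ in the tables.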

• Steps (continued):

3. Find the confidence coefficient $c_\alpha(a_L, a_U)$ from Table C3 (or $k_\alpha(a_L, a_U)$ from Table C4)

4. Use the linear, (log), log-log, or arcsine approach…

Example: Confidence Bands (Nair)

[Figure: Nair (EP) confidence bands for the tongue cancer data, time axis 0 to 250; panels: Linear, Log-log, Arcsine-Sqrt]

There is an R Package for That…

• R package "km.ci"
  – Will estimate different types of confidence intervals
    · But no arcsine square root
  – Will estimate both the Nair and Hall-Wellner confidence bands
    · But only the log-log transformed versions
  – It does include the $c_\alpha$ and $k_\alpha$ tables from Klein and Moeschberger!
    · So… you could write your own confidence band function

Confidence Band Performance

• Linear is poor for n < 200
  – Poor coverage probability

• Log-log and arcsine square root have fairly accurate coverage probabilities (even for n = 20)

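Mean Estimation

• Recall: $\mu = \int_0^\infty S(t)\, dt$

• Non-parametric approach: replace S(t) by the Kaplan-Meier estimate $\hat S(t)$
  – Can be done using the integral of the step function
  – Requires that the largest observed time is not censored; "fixes" exist if it is

$$\hat\mu = \int_0^\infty \hat S(t)\, dt \quad \text{and} \quad \hat V[\hat\mu] = \hat V\left[\int_0^\infty \hat S(t)\, dt\right]$$

$$100(1-\alpha)\%\ \text{CI}: \quad \hat\mu \pm z_{1-\alpha/2}\, \hat V[\hat\mu]^{1/2}$$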

Example: 6-MP Treatment for Leukemia

• Time in weeks to relapse for acute leukemia patients

• 21 patients observed for up to 35 weeks

```
Time  d_i  Y_i   S(t)
   6    3   21  0.857
   7    1   17  0.807
  10    1   15  0.753
  13    1   12  0.690
  16    1   11  0.628
  22    1    7  0.538
  23    1    6  0.448
```
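Because the largest observed time (35 weeks) is censored, the integral of $\hat S(t)$ is not fully determined. A common fix, assumed in this sketch, is to restrict the integral to $\tau = 35$, giving the restricted mean $\hat\mu_\tau = \int_0^\tau \hat S(t)\,dt$ with the variance estimator given in Klein and Moeschberger, $\hat V[\hat\mu_\tau] = \sum_i \left[\int_{t_i}^{\tau} \hat S(t)\,dt\right]^2 \frac{d_i}{Y_i(Y_i - d_i)}$. The inputs are the rows of the 6-MP table above; the choice of $\tau$ is an assumption, not something stated on the slide.

```r
# Restricted mean survival for the 6-MP group, integrating the
# Kaplan-Meier step function up to tau = 35 weeks.
time <- c(6, 7, 10, 13, 16, 22, 23)
d    <- c(3, 1, 1, 1, 1, 1, 1)
Y    <- c(21, 17, 15, 12, 11, 7, 6)
tau  <- 35

S <- cumprod(1 - d / Y)              # KM estimate just after each t_i
knots  <- c(0, time, tau)            # interval boundaries on [0, tau]
levels <- c(1, S)                    # S(t) on each interval (S = 1 before 6)
widths <- diff(knots)
mu_hat <- sum(levels * widths)       # area under the step function

# Area under S(t) from each t_i to tau (reverse cumulative sum)
A_i    <- rev(cumsum(rev(levels[-1] * widths[-1])))
var_mu <- sum(A_i^2 * d / (Y * (Y - d)))
se_mu  <- sqrt(var_mu)
z <- qnorm(0.975)
round(c(mean = mu_hat, se = se_mu,
        lower = mu_hat - z * se_mu, upper = mu_hat + z * se_mu), 3)
```

A different $\tau$ gives a different (restricted) mean, so the value of $\tau$ should always be reported alongside the estimate.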


Median Survival

• Any quantile of S(t) can be estimated in the same way

• The most common is the median (p = 0.50)

• Definition of the estimated median survival:

$$\hat x_{0.5} = \inf\{t : \hat S(t) \le 0.5\}$$

• That is, the smallest event time at which $\hat S(t)$ is less than or equal to 0.50

Other Quantiles

• For the pth quantile:

$$\hat x_p = \inf\{t : \hat S(t) \le 1 - p\}$$

• So, for example, the 25th percentile is

$$\hat x_{0.25} = \inf\{t : \hat S(t) \le 1 - 0.25\} = \inf\{t : \hat S(t) \le 0.75\}$$
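The quantile definition translates directly into code: scan the event times in order and return the first one at which $\hat S(t) \le 1 - p$. Below is a minimal base-R sketch using the 6-MP estimates from earlier in the lecture; the helper name `km_quantile()` is hypothetical. It returns `NA` when the curve never drops to $1 - p$, which happens for upper quantiles under heavy censoring.

```r
# p-th quantile of survival from a Kaplan-Meier curve:
#   x_p = inf{ t : S(t) <= 1 - p }
# Returns NA when S(t) never drops to 1 - p (quantile not estimable).
km_quantile <- function(time, S, p) {
  hit <- which(S <= 1 - p + 1e-12)   # tiny tolerance guards against rounding
  if (length(hit) == 0) return(NA_real_)
  time[min(hit)]
}

# 6-MP group: event times and KM estimates
time <- c(6, 7, 10, 13, 16, 22, 23)
S    <- cumprod(1 - c(3, 1, 1, 1, 1, 1, 1) / c(21, 17, 15, 12, 11, 7, 6))
km_quantile(time, S, 0.50)   # median
km_quantile(time, S, 0.25)   # 25th percentile
```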

Precision of pth Quantile

• Technically difficult…
• Requires knowledge of the density function of X at $x_p$

• Approximation approach by Brookmeyer and Crowley (1982)

• The most commonly used approach for estimating a confidence interval for median survival

Brookmeyer-Crowley Approach

• For each observed event time t, compute a Z-score
• Example: 95% confidence interval
  – Calculate Z for each t
  – All t for which |Z| < 1.96 are included in the confidence interval

• Looks similar to the approach for constructing a confidence interval for a mean

Brookmeyer-Crowley Approach

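• Linear:

$$Z = \frac{\hat S(t) - (1 - p)}{\hat V^{1/2}[\hat S(t)]}$$

• Log-log:

$$Z = \frac{\left\{\ln[-\ln \hat S(t)] - \ln[-\ln(1 - p)]\right\} \hat S(t) \ln \hat S(t)}{\hat V^{1/2}[\hat S(t)]}$$

• Arcsine square root:

$$Z = \frac{2\left\{\arcsin\left[\hat S(t)^{1/2}\right] - \arcsin\left[(1 - p)^{1/2}\right]\right\} \left[\hat S(t)\left(1 - \hat S(t)\right)\right]^{1/2}}{\hat V^{1/2}[\hat S(t)]}$$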

Example

• Kim paper
• Event = time to relapse
• Data (+ = censored):
  – 10, 20+, 35, 40+, 50+, 55, 70+, 71+, 80, 90+

Kim Data

[Figure: Kaplan-Meier and Nelson-Aalen estimates for the Kim data; x-axis: time to relapse (months)]

Kim Data

```
 t   d_i  Y_i  S(t)  V(t)  Linear Z  Log-log Z  Arcsine Z
10    1   10
35    1    8
55    1    5
80    1    2
```
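The remaining columns of the Kim table can be recomputed in base R. This sketch assumes V(t) is the Greenwood variance $\hat V[\hat S(t)]$ and applies the three Brookmeyer-Crowley Z formulas with p = 0.5; it then lists the event times whose linear-scale |Z| is below 1.645, the 90% criterion discussed next.

```r
# Kim data: 10, 20+, 35, 40+, 50+, 55, 70+, 71+, 80, 90+  (+ = censored)
obs    <- c(10, 20, 35, 40, 50, 55, 70, 71, 80, 90)
status <- c(1, 0, 1, 0, 0, 1, 0, 0, 1, 0)      # 1 = relapse, 0 = censored

# Kaplan-Meier estimate and Greenwood variance at each event time
t_ev <- sort(unique(obs[status == 1]))
d <- sapply(t_ev, function(t) sum(obs == t & status == 1))   # deaths d_i
Y <- sapply(t_ev, function(t) sum(obs >= t))                 # at risk Y_i
S <- cumprod(1 - d / Y)
V <- S^2 * cumsum(d / (Y * (Y - d)))

# Brookmeyer-Crowley Z statistics for the p-th quantile (p = 0.5: median)
p  <- 0.5
se <- sqrt(V)
Z_linear  <- (S - (1 - p)) / se
Z_loglog  <- (log(-log(S)) - log(-log(1 - p))) * S * log(S) / se
Z_arcsine <- 2 * (asin(sqrt(S)) - asin(sqrt(1 - p))) * sqrt(S * (1 - S)) / se

kim <- data.frame(t = t_ev, d = d, Y = Y, S = round(S, 4), V = round(V, 4),
                  Z_linear = round(Z_linear, 3), Z_loglog = round(Z_loglog, 3),
                  Z_arcsine = round(Z_arcsine, 3))
kim
# Event times inside a 90% CI for the median on the linear scale: |Z| < 1.645
kim$t[abs(Z_linear) < qnorm(0.95)]
```

With only four event times, the linear-scale 90% set contains just 55 and 80, and no later event time bounds the interval from above, which illustrates the small-sample limitation noted for these data.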

CI Around Median

• The median for the Kim data is 80 weeks
• The CI around this value is the set of all event times that satisfy the selected inequality
  – E.g., for a 90% CI, the critical Z is 1.645
  – For the Kim data, the estimates are highly variable
  – A limitation of a SMALL data set

For Next Time

• Left-truncated data
• Competing risks
