
IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


Page 1: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION

UW Winter 07


IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION

Donald A. Pierce, Oregon Health Sciences Univ

Ruggero Bellio, Udine, Italy

These slides are at www.science.oregonstate.edu/~piercedo/

Page 2: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


Nearly all survival analysis uses first-order asymptotics: limiting distributions of the MLE, LR, or scores; interest here is only in Cox regression with partial likelihood

Usually these approximations are quite good, but it is of interest to verify this or improve on them (Samuelsen, Lifetime Data Analysis, 2003)

We consider both higher-order asymptotics and more direct simulation of P-values

Primary issue: inference beyond first-order requires more than the likelihood function

This may lead to unreasonable dependence of methods on the censoring model and baseline hazard

Our approach involves forms of conditioning on censoring

Page 3: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


Consider first direct simulation of P-values without such conditioning (the same issues arise in higher-order asymptotics)

One must estimate the baseline hazard, sample failure times according to this, then apply the censoring model which may involve estimating a censoring distribution

Quite unattractive in view of the essential nature of Cox regression

With suitable conditioning, and some further conventions regarding the censoring model, this can be avoided

Aim is to maintain the rank-based nature of inference in the presence of censoring (simulation: sample failures from exponential distn, apply censoring to ranks)
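As a quick illustration of this rank-based structure, the following R sketch (toy data and parameter values invented for illustration) checks that a Cox fit is unchanged when the observed times are replaced by their ranks.

```r
## Quick check (toy data): the Cox partial likelihood depends on the observed
## times only through their ranks, so any monotone transform of time gives
## the same fit.
library(survival)
set.seed(1)
n      <- 40
z      <- rbinom(n, 1, 0.5)                  # binary covariate
t_fail <- rexp(n, rate = exp(0.7 * z))       # illustrative log relative risk 0.7
t_cens <- rexp(n, rate = 0.3)                # random censoring
time   <- pmin(t_fail, t_cens)
status <- as.integer(t_fail <= t_cens)

fit_time <- coxph(Surv(time, status) ~ z)
fit_rank <- coxph(Surv(rank(time), status) ~ z)   # monotone transform of time
all.equal(coef(fit_time), coef(fit_rank))         # TRUE: only the ranks matter
```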

We provide convenient Stata and R routines for carrying out both the simulation and higher-order asymptotics.

Page 4: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


COX REGRESSION: Hazards of the form $\lambda(t; z, \beta) = \lambda_0(t)\, e^{z\beta}$, with $\lambda_0(\cdot)$ unspecified. Interest parameter $\psi$ is a scalar function of $\beta$, with the remaining coordinates as nuisance parameters.

Write $t_{(1)} < t_{(2)} < \cdots$ for the ordered failure times. Risk set $R_i$: those alive at failure time $t_{(i)}$. Multinomial likelihood contribution

$L_i(\beta) = e^{z_{(i)}\beta} \big/ \textstyle\sum_{j \in R_i} e^{z_j \beta}$,

the probability that it is individual $(i)$ among these that fails.

Partial likelihood: $L(\beta) = \prod_i L_i(\beta)$
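To make the construction concrete, here is a minimal R sketch (invented toy data, single covariate) that assembles the log partial likelihood directly from the risk-set contributions above and checks it against coxph from the survival package.

```r
## Minimal sketch (toy data): the log partial likelihood built from the
## risk-set contributions, checked against coxph (no tied failure times here).
library(survival)

time   <- c(2, 5, 7, 8, 11)
status <- c(1, 1, 0, 1, 1)              # 1 = failure, 0 = censored
z      <- c(0.5, -1.0, 0.3, 1.2, -0.7)

log_partial_lik <- function(beta) {
  sum(sapply(which(status == 1), function(i) {
    R_i <- which(time >= time[i])       # risk set: still under observation at t_(i)
    z[i] * beta - log(sum(exp(z[R_i] * beta)))
  }))
}

fit <- coxph(Surv(time, status) ~ z)
c(by_hand = log_partial_lik(coef(fit)), coxph = fit$loglik[2])  # should agree
```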

Page 5: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


Useful reference sets for interpreting a given dataset:

(i) data-production frame of reference

(ii) conditional on “censoring configuration”

(iii) treating all risk sets as fixed

Using (i) involves the censoring model, estimation of the baseline hazard and censoring distribution (see Dawid 1991 JRSS-A regarding data-production and inferential reference sets)

That of (ii) requires some development/explanation. By “censoring configuration” we mean the numbers of censorings between successive ordered failures

Approach (iii) is not really “conditional”, but many may feel this is the most appropriate reference set --- things are certainly simple from this viewpoint. Applies when risk sets arise in complicated ways, and to time-dependent covariables

Page 6: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


EXTREME* EXAMPLE TO SHOW HOW THINGS ARE WORKING:

n = 40 with 30% random censoring; log(RR) interest parameter 1.0 with a binary covariable; 5 nuisance parameters in the RR involving exponential covariables. Hypotheses taken where the one-sided Wald P-value is 0.05. (*Extreme: 6 covariables with < 30 failures.)

Results typical for datasets:

                                          Lower    Upper
LR first order                            0.046    0.062
Data production, exact (simulation)       0.090    0.020
Conditional, exact (simulation)           0.103    0.024
Conditional, 2nd order asymptotics        0.096    0.025
Fixed risk sets, exact (simulation)       0.054    0.051
Fixed risk sets, 2nd order asymptotics    0.052    0.052

Page 7: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


With fewer failures and fewer nuisance parameters, adjustments are smaller and thus harder to summarize. However, the following results for a typical dataset show their essential nature. This is for n = 20 with 25% censoring, interest parameter as before, and only 1 nuisance parameter.

                                          Lower    Upper
LR first order                            0.042    0.065
Data production, exact (simulation)       0.053    0.040
Conditional, exact (simulation)           0.054    0.037
Conditional, 2nd order asymptotics        0.060    0.043
Fixed risk sets, 2nd order asymptotics    0.047    0.051

Samuelsen’s conclusion, that in small samples the Wald and LR confidence intervals are conservative, does not seem to hold up with any useful generality

Page 8: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


CONDITIONING ON “CENSORING CONFIGURATION”

That is, on the vector $q = (q_0, q_1, \ldots, q_k)$, where $q_j$ is the number censored following the jth ordered failure

Seems easy to accept that this is “ancillary” information, for inference about relative risk when using partial likelihood. It could be that “ancillary” is not the best term for this (comments please!!)

The further convention involved in making this useful pertains to which individuals are censored

Our convention for this: in martingale fashion, sample from the risk sets the $q_j$ individuals to be censored, with probabilities possibly depending on covariables (comments please!!)

Unless these probabilities depend on covariables, a quite exceptional assumption, results of Kalbfleisch & Prentice (1973 Bka) apply: partial likelihood is the likelihood of “reduced ranks”

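A small R sketch of these ideas (illustrative helpers we introduce here, not the authors' routines): one function extracts the censoring configuration from observed data, the other reimposes it on simulated failure times under the uniform-sampling convention just described.

```r
## Sketch of the conditioning quantities (illustrative helpers, not the
## authors' routines). censoring_config() extracts q = (q_0, ..., q_k) from
## observed data; apply_censoring_config() reimposes a configuration on a set
## of simulated failure times, censoring q_j individuals drawn at random from
## the risk set after the j-th ordered failure (uniformly here, i.e. the
## censoring probabilities do not depend on covariables).

censoring_config <- function(time, status) {
  ord <- order(time, -status)              # failures precede censorings at ties
  st  <- status[ord]
  j   <- cumsum(st)                        # failures observed so far
  as.vector(table(factor(j[st == 0], levels = 0:sum(status))))
}

apply_censoring_config <- function(fail_time, q) {
  n <- length(fail_time)
  k <- length(q) - 1                       # number of failures to retain
  stopifnot(sum(q) + k == n)
  at_risk <- seq_len(n)
  time    <- fail_time
  status  <- integer(n)
  last_fail <- min(fail_time) / 2          # censoring "time" before 1st failure
  for (j in 0:k) {
    if (j > 0) {                           # the j-th observed failure
      i_fail <- at_risk[which.min(fail_time[at_risk])]
      status[i_fail] <- 1
      last_fail <- fail_time[i_fail]
      at_risk <- setdiff(at_risk, i_fail)
    }
    if (q[j + 1] > 0) {                    # censor q_j individuals at random
      i_cens <- at_risk[sample.int(length(at_risk), q[j + 1])]
      time[i_cens] <- last_fail            # censored just after the j-th failure
      status[i_cens] <- 0
      at_risk <- setdiff(at_risk, i_cens)
    }
  }
  data.frame(time = time, status = status)
}
```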

Page 9: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


Recall that a probability model for censoring is often (but with notable exceptions) sort of a "fiction" concocted by the statistician, with the following aims

A common model is that for each individual there is a fixed, or random, latent censoring time and what is observed is the minimum of the failure and censoring time

Leads to the usual likelihood function: the product over individuals of $f_i(t_i)^{1-c_i}\,\Pr(T_i > t_i)^{c_i}$, where $c_i$ is the censoring indicator
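For concreteness, here is a minimal R sketch of that likelihood for an assumed parametric case (exponential failure times); this is our own illustration, not part of the partial-likelihood machinery.

```r
## Sketch: the usual censored-data loglikelihood above, written out for an
## illustrative exponential failure model with rate lambda.
## Here status = 1 - c_i is the failure indicator used elsewhere in these notes.
loglik_censored <- function(lambda, time, status) {
  sum(status * dexp(time, rate = lambda, log = TRUE) +                            # f_i(t_i)
      (1 - status) * pexp(time, rate = lambda, lower.tail = FALSE, log.p = TRUE)) # Pr(T_i > t_i)
}

## e.g. maximizing it recovers the familiar estimate sum(status) / sum(time):
# optimize(loglik_censored, c(1e-6, 10), time = time, status = status, maximum = TRUE)
```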

The use of censoring models is usually only to consider whether this likelihood is valid (censoring is “uninformative”) --- model is not used beyond this

But usual models as above render the problem not to be one only involving ranks, whereas our conditioning and convention maintain the rank-based inferential structure


Page 10: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


“Reduced ranks”, or marginal distribution of ranks, concept – individual 3 is here censored

[Diagram: four individuals plotted in time order as x x O x, the O marking the censored individual 3]

Compatible ranks for uncensored data – the single “reduced ranks” outcome:

2, 3, 4, 1
2, 4, 3, 1
2, 4, 1, 3

Partial likelihood, as a function of the data, provides the distribution of these reduced ranks

Page 11: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


Thus with our conditioning and convention, and no direct dependence of censoring on covariates, the K&P result yields that the partial likelihood is the actual likelihood for the “reduced rank” data

Means that all the theory of higher-order likelihood inference applies to partial likelihood (subject to minor issues of discreteness) --- a more general argument exists for the data-production reference set

Higher-order asymptotics depend only on certain covariances of scores and loglikelihood

Either exact or asymptotic results can in principle be computed from the K&P result, but simulation is both simpler and more computationally efficient

Simulation for asymptotics is considerably simpler than for exact results (no need to fit models for each trial), but many will prefer the latter when it is not problematic

Page 12: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


SIMULATION OF P-VALUES:

With conditioning, one may: (i) simulate failure times using a constant baseline hazard, since only the ranks matter; (ii) apply the censoring process to the rank data; and (iii) fit the two models, as sketched below
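A minimal sketch of these three steps, under assumptions added only for illustration: a hypothetical data frame d with columns time, status, an interest covariate z and a single nuisance covariate x1, plus the censoring_config()/apply_censoring_config() helpers sketched earlier. This is not the Stata/R routine referred to below.

```r
## Sketch of steps (i)-(iii): conditional simulation of the one-sided
## P-value for H0: beta_z = beta0, via the signed root of the LR statistic.
library(survival)

signed_root <- function(d, beta0) {
  fit_full <- coxph(Surv(time, status) ~ z + x1, data = d)
  fit_null <- coxph(Surv(time, status) ~ x1 + offset(beta0 * z), data = d)
  lr <- 2 * (fit_full$loglik[2] - fit_null$loglik[2])
  sign(coef(fit_full)["z"] - beta0) * sqrt(max(lr, 0))
}

cox_pvalue_sim <- function(d, beta0 = 0, nsim = 5000) {
  r_obs <- signed_root(d, beta0)
  q     <- censoring_config(d$time, d$status)              # condition on this
  fit_null <- coxph(Surv(time, status) ~ x1 + offset(beta0 * z), data = d)
  rate  <- exp(beta0 * d$z + coef(fit_null)["x1"] * d$x1)  # null relative risks
  r_sim <- replicate(nsim, {
    lat <- rexp(nrow(d), rate = rate)                      # (i) exponential failures
    sim <- apply_censoring_config(lat, q)                  # (ii) censor the ranks
    d_s <- data.frame(sim, z = d$z, x1 = d$x1)
    signed_root(d_s, beta0)                                # (iii) fit both models
  })
  mean(r_sim >= r_obs)                                     # one-sided (upper) P-value
}
```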

Our primary aim is to lay out assumptions justifying (i) and (ii). (comments please!!)

Highly tractable, except that the null and alternative models must be fitted for each trial

Quite often one must allow for “infinite” MLEs, and even with this the fitting can be problematic for small samples

Primary advantage over asymptotics is the transparency

Stata procedure uses same syntax as the ordinary fitting routine, takes about a minute for 5,000 trials

Page 13: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


SECOND-ORDER METHODS: This is for inference about scalar functions of the RR. It involves the quantity proposed by Barndorff-Nielsen,

$r^* = r + \mathrm{adj}$,

where $r$ is the signed root of the maximized LR statistic, and adj involves more than the likelihood function.

Insight into limitations of first-order methods derives from decomposing this adjustment as

$\mathrm{adj} = \mathrm{NP} + \mathrm{INF}$,

where NP allows for fitting nuisance parameters and INF basically allows for moving from likelihood to frequency inference.

Generally, INF is only important for fairly small samples, but NP can be important for reasonable amounts of data when there are several nuisance parameters.
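For orientation, a small numeric illustration in R (the values of r, NP and INF are invented, not taken from the examples here) of how the adjusted signed root converts to a P-value:

```r
## Numeric illustration only: invented values of r, NP and INF
r   <- 1.645      # signed root of the LR statistic
NP  <- 0.12       # nuisance-parameter adjustment
INF <- 0.03       # likelihood-to-frequency ("information") adjustment
r_star <- r + NP + INF

pnorm(r,      lower.tail = FALSE)   # first-order one-sided P-value, about 0.050
pnorm(r_star, lower.tail = FALSE)   # second-order P-value,          about 0.036
```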


Page 14: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


COMPUTATION OF THIS: Will not give (the fairly simple) formulas here, but they involve computing

$\mathrm{cov}_{\theta_0}\{\ell'(\theta_0),\, \ell'(\theta_1)\}$ and $\mathrm{cov}_{\theta_0}\{\ell'(\theta_0),\, \ell(\theta_1) - \ell(\theta_0)\}$,

where the parameters $\theta_0, \theta_1$ are then evaluated at the constrained and full MLEs (formulas: Pierce & Bellio, Bka 2006, 425)

These must be computed by simulation, raising the same issues about reference sets, but this is far easier than the simulation of likelihood ratios

Quantities above pertain to statistical curvature, and at least in our setting the magnitude and direction of the NP adjustment relate to the extent and direction of the curvature introduced by variation in composition of risk sets


Page 15: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


RISK SETS AS FIXED: Things simplify considerably for the inferential reference set where the risk sets are taken as fixed (and experiments on these as independent)

Use of this reference set often seems necessary when the risk sets arise in complex ways; it is mainly useful for inference about relative risk beyond the analysis of simple response-time data

It is also quite adequate for all needs when the numbers at risk are large in relation to the number of failures (rare events).

Page 16: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


FORMULAS FOR FIXED RISK SETS: In this case the setting is one of independent multinomial experiments defined on the risk sets. Following is for loglinear RR

Formulas of Pierce & Peters 1992 JRSS-B apply, yielding

$\mathrm{INF} = r^{-1}\log(w/r)$, $\qquad \mathrm{NP} = r^{-1}\log(\rho^{1/2})$,

where $w$ is the Wald statistic, and $\rho$ is the ratio of determinants of the nuisance parameter information at the full and constrained MLEs

May be useful in exploring for what settings the NP adjustment is important: nuisance parameter information must “vary rapidly” with the value of the interest parameter

However, these adjustments are smaller than for our other reference sets
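Under the fixed-risk-set reference set these pieces can be read off two coxph fits. A hedged R sketch (hypothetical data frame d with columns time, status, z, x1; the direction of the determinant ratio rho follows the wording above):

```r
## Sketch: fixed-risk-set NP and INF for H0: beta_z = beta0, using the
## decomposition quoted above (illustrative only; rho is taken as the
## nuisance information determinant at the full fit over the constrained fit).
library(survival)

npinf_fixed <- function(d, beta0 = 0) {
  fit_full <- coxph(Surv(time, status) ~ z + x1, data = d)
  fit_null <- coxph(Surv(time, status) ~ x1 + offset(beta0 * z), data = d)
  r <- unname(sign(coef(fit_full)["z"] - beta0) *
              sqrt(2 * (fit_full$loglik[2] - fit_null$loglik[2])))
  w <- unname((coef(fit_full)["z"] - beta0) / sqrt(vcov(fit_full)["z", "z"]))
  j_full <- solve(vcov(fit_full))                # observed information, full fit
  nuis   <- setdiff(rownames(j_full), "z")
  rho    <- det(j_full[nuis, nuis, drop = FALSE]) /
            det(solve(vcov(fit_null)))           # nuisance information ratio
  INF <- log(w / r) / r
  NP  <- 0.5 * log(rho) / r
  c(r = r, NP = NP, INF = INF, r_star = r + NP + INF)
}
```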


Page 17: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


SAME AS FIRST EXAMPLE (5 nuisance parameters) BUT WITH:

n = 500 with 97% random censoring (fewer failures than before, namely about 15) – rare disease case

Remainder of model specification as in first example, results when Wald P-value is 0.05

Typical results for a single dataset, lower limits:

                                          Lower
LR first order                            0.057
Data-production refset                    0.059
Conditional, exact (direct simulation)    0.054
Conditional, second-order                 0.054
Fixed risk sets, exact (simulation)       0.055
Fixed risk sets, 2nd order                0.052

Page 18: IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION


OVERALL RECOMMENDATIONS:

1. It seems that adjustments will usually be small, but it is at least worthwhile to verify this in many instances when convenient enough.

2. We will provide routines in Stata and R. The Stata routine largely uses the same syntax as the basic fitting command.

3. When failures are a substantial fraction of those at risk, use conditional simulation of P-values unless problems with fitting are encountered.

4. If those problems are likely or encountered, then use the 2nd-order methods. These also provide more insight.

5. When failures are a small fraction of those at risk, or when risk sets arise in some special way, use the asymptotic fixed-risk-set calculations.