Nonparametric maximum likelihood estimation (MLE) for bivariate censored data Marloes H. Maathuis advisors: Piet Groeneboom and Jon A. Wellner


Motivation

Estimate the distribution function of the incubation period of HIV/AIDS:

– Nonparametrically
– Based on censored data:

• Time of HIV infection is interval censored

• Time of onset of AIDS is interval censored or right censored

Approach

• Use MLE to estimate the bivariate distribution

• Integrate over diagonal strips: P(Y-X ≤ z)

[Figure: diagonal strip determined by z in the plane of X (HIV) versus Y (AIDS)]
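As a rough illustration of integrating over a diagonal strip: an estimate that only assigns mass to rectangles does not say where inside each rectangle the mass sits, so a simple sketch can bracket P(Y-X ≤ z) rather than pin it down. The function name `strip_probability_bounds`, the rectangle format `(x1, x2, y1, y2)`, and the example numbers are all hypothetical and not taken from the slides.

```python
def strip_probability_bounds(rects, masses, z):
    """Bracket P(Y - X <= z) when mass is only known per rectangle.

    rects  : list of (x1, x2, y1, y2), meaning [x1, x2] x [y1, y2]
    masses : probability mass assigned to each rectangle
    """
    lower = upper = 0.0
    for (x1, x2, y1, y2), m in zip(rects, masses):
        # Largest value of y - x on the rectangle is y2 - x1:
        # if even that is <= z, the whole rectangle lies in the strip.
        if y2 - x1 <= z:
            lower += m
        # Smallest value of y - x on the rectangle is y1 - x2:
        # if that is <= z, at least part of the rectangle lies in the strip.
        if y1 - x2 <= z:
            upper += m
    return lower, upper


# Made-up example: two rectangles carrying mass 0.6 and 0.4.
rects = [(0, 1, 2, 3), (0, 2, 5, 9)]
masses = [0.6, 0.4]
lower, upper = strip_probability_bounds(rects, masses, z=4)
```

The first rectangle lies entirely inside the strip and the second only partly, so the two bounds differ by the mass of the second rectangle.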

Main focus of the project

• MLE for bivariate censored data:
– Computational aspects
– (In)consistency and methods to repair the inconsistency


[Figure: an observation rectangle formed by the interval of HIV infection on the X (HIV) axis (ticks 1980, 1983, 1986) and the interval of onset of AIDS on the Y (AIDS) axis (ticks 1980, 1992, 1996)]

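Observation rectangle R_i

\[
\max_{F \in \mathcal{F}} \sum_{i=1}^{n} \log P_F\big((X_i, Y_i) \in R_i\big)
\]

Maximal intersections

[Figure: observation rectangles R_i and their maximal intersections in the plane of X (HIV) versus Y (AIDS)]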


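[Figure: observation rectangles R_i in the plane of X (HIV) versus Y (AIDS); the four maximal intersections carry masses α1, α2, α3, α4]

In terms of the masses α1, ..., α4, the maximization problem for this example reads

\[
\max_{\alpha \in \mathbb{R}^4} \; \log(\alpha_1) + \log(\alpha_1 + \alpha_3) + \log(\alpha_1 + \alpha_2) + \log(\alpha_2 + \alpha_4) + \log(\alpha_3 + \alpha_4)
\]

\[
\text{s.t.} \quad \alpha_i \ge 0, \; i = 1, \dots, 4, \quad \text{and} \quad \sum_{i=1}^{4} \alpha_i = 1
\]

Solution shown in the figure: α1 = 3/5, α2 = 0, α3 = 0, α4 = 2/5.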
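The masses 3/5, 0, 0, 2/5 can be checked against the optimality conditions for maximizing a concave function over the probability simplex. The check below assumes the objective is the sum of the five log terms log α1, log(α1+α3), log(α1+α2), log(α2+α4), log(α3+α4); the symbol ℓ for the log-likelihood is introduced here only for illustration.

\[
\ell(\alpha) = \log\alpha_1 + \log(\alpha_1+\alpha_3) + \log(\alpha_1+\alpha_2) + \log(\alpha_2+\alpha_4) + \log(\alpha_3+\alpha_4)
\]

At \(\alpha = (3/5,\, 0,\, 0,\, 2/5)\):

\[
\begin{aligned}
\frac{\partial \ell}{\partial \alpha_1} &= \frac{1}{\alpha_1} + \frac{1}{\alpha_1+\alpha_3} + \frac{1}{\alpha_1+\alpha_2} = 3 \cdot \frac{5}{3} = 5, \\
\frac{\partial \ell}{\partial \alpha_4} &= \frac{1}{\alpha_2+\alpha_4} + \frac{1}{\alpha_3+\alpha_4} = \frac{5}{2} + \frac{5}{2} = 5, \\
\frac{\partial \ell}{\partial \alpha_2} &= \frac{1}{\alpha_1+\alpha_2} + \frac{1}{\alpha_2+\alpha_4} = \frac{5}{3} + \frac{5}{2} = \frac{25}{6} < 5, \\
\frac{\partial \ell}{\partial \alpha_3} &= \frac{1}{\alpha_1+\alpha_3} + \frac{1}{\alpha_3+\alpha_4} = \frac{5}{3} + \frac{5}{2} = \frac{25}{6} < 5.
\end{aligned}
\]

The partial derivatives equal n = 5 where the mass is positive and are smaller than 5 where the mass is zero, which is what the Kuhn-Tucker conditions require for a maximizer under these constraints.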

The α_i's are not always uniquely determined: mixture non-uniqueness
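A small made-up configuration shows how such non-uniqueness can arise. Assume two vertical and two horizontal observation rectangles whose four crossings are the maximal intersections, labeled α1, α2 (top) and α3, α4 (bottom); this layout, the matrix `A`, and the function `log_likelihood` are hypothetical and not taken from the slides.

```python
import math


def log_likelihood(alpha, clique):
    """Sum over observations of log(total mass inside the observation rectangle).

    clique[i][j] is 1 if maximal intersection j lies inside rectangle i.
    """
    return sum(
        math.log(sum(a * c for a, c in zip(alpha, row)))
        for row in clique
    )


# Hypothetical layout: each rectangle contains two of the four crossings.
A = [
    [1, 0, 1, 0],  # left vertical rectangle:   alpha1 + alpha3
    [0, 1, 0, 1],  # right vertical rectangle:  alpha2 + alpha4
    [1, 1, 0, 0],  # top horizontal rectangle:  alpha1 + alpha2
    [0, 0, 1, 1],  # bottom horizontal one:     alpha3 + alpha4
]

uniform = [0.25, 0.25, 0.25, 0.25]
diagonal = [0.5, 0.0, 0.0, 0.5]

ll_uniform = log_likelihood(uniform, A)
ll_diagonal = log_likelihood(diagonal, A)
```

Both mass vectors give every rectangle probability 1/2, so they reach the same log-likelihood even though they place the mass differently.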

Computation of the MLE

• Reduction step: determine the maximal intersections

• Optimization step: determine the amounts of mass assigned to the maximal intersections
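One generic way to carry out the optimization step is an EM (self-consistency) iteration on the masses. The sketch below is not claimed to be the method used in the project. It assumes the reduction step has already produced a 0/1 matrix `clique`, and the example matrix encodes five observations whose log terms are log α1, log(α1+α3), log(α1+α2), log(α2+α4), log(α3+α4). The names `em_masses` and `clique` are hypothetical.

```python
def em_masses(clique, n_iter=2000):
    """EM iteration for the masses on the maximal intersections.

    clique[i][j] is 1 if maximal intersection j lies inside
    observation rectangle i, and 0 otherwise.
    """
    n, m = len(clique), len(clique[0])
    alpha = [1.0 / m] * m  # start from equal, strictly positive masses
    for _ in range(n_iter):
        # Probability of each observation rectangle under the current masses.
        p = [sum(a * c for a, c in zip(alpha, row)) for row in clique]
        # Self-consistency update; the new masses again sum to one.
        alpha = [
            alpha[j] * sum(clique[i][j] / p[i] for i in range(n)) / n
            for j in range(m)
        ]
    return alpha


# Assumed example: five observation rectangles, four maximal intersections.
A = [
    [1, 0, 0, 0],  # log(alpha1)
    [1, 0, 1, 0],  # log(alpha1 + alpha3)
    [1, 1, 0, 0],  # log(alpha1 + alpha2)
    [0, 1, 0, 1],  # log(alpha2 + alpha4)
    [0, 0, 1, 1],  # log(alpha3 + alpha4)
]
alpha = em_masses(A)
```

EM is simple but can be slow on large problems; it is used here only because it fits in a few lines.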


Existing reduction algorithms

• Betensky and Finkelstein (1999, Stat. in Medicine)
• Gentleman and Vandal (2001, JCGS)
• Song (2001, Ph.D. thesis)
• Bogaerts and Lesaffre (2003, Tech. report)

The first three algorithms are very slow; the last algorithm is of complexity O(n^3).

New algorithms

• Tree algorithm

• Height map algorithm:
– based on the idea of a height map of the observation rectangles
– very simple
– very fast: O(n^2)

Height map algorithm: O(n^2)

[Figure: height map of the observation rectangles, shown as a grid of integer heights (values 0 to 3) over the cells formed by the rectangle edges]
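To make the height map idea concrete, here is a brute-force sketch that builds the height map cell by cell and then reads off the maximal intersections as the maximal sets of overlapping rectangles. It is deliberately naive and does not reproduce the O(n^2) height map algorithm. It assumes rectangles given as `(x1, x2, y1, y2)` with all endpoint coordinates distinct, and the function name and example rectangles are hypothetical.

```python
from itertools import product


def height_map_and_maximal_intersections(rects):
    """Brute-force height map and maximal intersections.

    rects: list of (x1, x2, y1, y2) with distinct endpoint coordinates.
    Returns (heights, maximal), where heights[j][i] is the number of
    rectangles covering grid cell (i, j) and maximal is a list of
    frozensets of rectangle indices.
    """
    xs = sorted({v for (x1, x2, _, _) in rects for v in (x1, x2)})
    ys = sorted({v for (_, _, y1, y2) in rects for v in (y1, y2)})
    # One representative point (the midpoint) per grid cell.
    x_mid = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    y_mid = [(a + b) / 2 for a, b in zip(ys, ys[1:])]

    cover = {}
    for (i, x), (j, y) in product(enumerate(x_mid), enumerate(y_mid)):
        cover[(i, j)] = frozenset(
            k for k, (x1, x2, y1, y2) in enumerate(rects)
            if x1 < x < x2 and y1 < y < y2
        )

    heights = [
        [len(cover[(i, j)]) for i in range(len(x_mid))]
        for j in range(len(y_mid))
    ]
    # Keep the covering sets that are not strictly contained in another one.
    sets = set(cover.values()) - {frozenset()}
    maximal = [s for s in sets if not any(s < t for t in sets)]
    return heights, sorted(maximal, key=sorted)


# Made-up example: rectangle 1 overlaps both 0 and 2; 0 and 2 are disjoint.
rects = [(0, 4, 0, 4), (2, 6, 2, 6), (5, 9, 1, 3)]
heights, maximal = height_map_and_maximal_intersections(rects)
```

Each returned set lists the observation rectangles that share one maximal intersection, which is the information the optimization step needs.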

Main focus of the project

• MLE of bivariate censored data:
– Computational aspects
– (In)consistency and methods to repair the inconsistency

Time of HIV infection is interval censored case 2

[Figure: HIV axis with observation times u1 and u2, plotted against the AIDS axis]

Time of onset of AIDS is right censored

[Figure: HIV axis with observation times u1 and u2; AIDS axis with t = min(c,y)]

Inconsistency of the naive MLE


Methods to repair inconsistency

• Transform the lines into strips

• MLE on a sieve of piecewise constant densities

• Kullback-Leibler approach

X = time of HIV infection
Y = time of onset of AIDS
Z = Y-X = incubation period

• P(Z ≤ z) cannot be estimated consistently

• An example of a parameter we can estimate consistently is: P(Z ≤ z | x1 < X ≤ x2)

Conclusions (1)

• Our algorithms for the parameter reduction step are significantly faster than other existing algorithms.

• We proved that in general the naive MLE is an inconsistent estimator for our AIDS model.

Conclusions (2)

• We explored several methods to repair the inconsistency of the naive MLE.

• P(Z ≤ z) cannot be estimated consistently without additional assumptions. An alternative parameter that we can estimate consistently is: P(Z ≤ z | x1 < X ≤ x2).

Acknowledgements

• Piet Groeneboom

• Jon Wellner