
Modern Survival Analysis

David Steinsaltz¹

University of Oxford

¹University lecturer at the Department of Statistics, University of Oxford


Contents

1 Time processes and counting processes
   1.1 Definition of counting process
   1.2 Examples of counting processes
      1.2.1 Single survival time
      1.2.2 Sum of counting processes
   1.3 Poisson process
   1.4 Intensity
      1.4.1 Deterministic intensity
      1.4.2 Random intensity

2 σ-algebras and conditioning
   2.1 σ-algebras
      2.1.1 Background
      2.1.2 σ-algebras and information
      2.1.3 Examples
      2.1.4 Intersections of σ-algebras
      2.1.5 σ-algebra generated by a family of sets
      2.1.6 Joins of σ-algebras
      2.1.7 σ-algebra generated by a random variable
      2.1.8 Filtrations of σ-algebras
      2.1.9 Complete filtrations
      2.1.10 Stopping times
      2.1.11 Predictable processes
   2.2 Conditioning
      2.2.1 Definition of conditioning
      2.2.2 Intuitive definition for discrete random variables
      2.2.3 Properties of conditional expectations
      2.2.4 Examples
      2.2.5 In search of past time
      2.2.6 Examples

3 Martingales
   3.1 Martingales
      3.1.1 Definitions
   3.2 Compensators
      3.2.1 Inhomogeneous Poisson counting process
   3.3 Cheater's guide to stochastic integrals
   3.4 Variation processes
      3.4.1 Intuitive definitions
      3.4.2 Formal definitions
      3.4.3 Useful facts about variation processes
      3.4.4 Caveats (not examinable)
   3.5 Examples
      3.5.1 Independent sums
      3.5.2 Weighted independent sums
      3.5.3 Compensated homogeneous Poisson process
      3.5.4 Compensated inhomogeneous Poisson process
   3.6 Normal approximation for martingales

4 Non-parametric estimation for basic survival models
   4.1 Background
      4.1.1 Introduction to non-parametric estimation
      4.1.2 The multiplicative intensity model
      4.1.3 Martingale analysis of the multiplicative intensity model
   4.2 The Nelson–Aalen estimator
      4.2.1 Distinct event times: Informal derivation
      4.2.2 Distinct event times: Formal derivation of the Nelson–Aalen estimator
      4.2.3 Pointwise confidence intervals
      4.2.4 Simulated data set
      4.2.5 Breaking ties

5 More about non-parametric estimation
   5.1 The nobody-left problem
   5.2 The Kaplan–Meier estimator
      5.2.1 Deriving the Kaplan–Meier estimator
      5.2.2 The relation between Nelson–Aalen and Kaplan–Meier
      5.2.3 Duhamel's equation
      5.2.4 Confidence intervals for the Kaplan–Meier estimator

6 Pointwise comparisons and quantiles
   6.1 Pointwise hypothesis tests for survival
   6.2 Survival to ∞
   6.3 Medians and other quantiles
   6.4 Computing survival estimators in R
      6.4.1 Survival objects with only right-censoring
      6.4.2 Other survival objects

7 Comparing distributions: Excess mortality
   7.1 Estimating excess mortality: One-sample setting
   7.2 Excess mortality: Two-sample case

8 Nonparametric tests for equality of survival distributions
   8.1 One-sample setting
      8.1.1 No ties
      8.1.2 Weight functions and particular tests
      8.1.3 With ties
      8.1.4 An example
   8.2 Two-sample setting
      8.2.1 No ties
      8.2.2 Weight functions and particular tests
      8.2.3 With ties
      8.2.4 The AML example
      8.2.5 Kidney dialysis example

9 Relative-risk models
   9.1 The relative-risk regression model
   9.2 Partial likelihood
   9.3 Significance testing

10 Relative-risk models II
   10.1 Estimating baseline hazard
      10.1.1 Breslow's estimator
      10.1.2 Individual risk ratios
   10.2 Dealing with ties
   10.3 Asymptotic properties of partial likelihood
   10.4 The AML example
   10.5 The Cox model in R

11 Additive hazards regression
   11.1 Describing the model
   11.2 Fitting the model
   11.3 Variance estimation
      11.3.1 Martingale representation of additive hazards model
      11.3.2 Estimating the covariance matrix
   11.4 Examples
      11.4.1 Single categorical covariate
      11.4.2 Simulated data

12 Model diagnostics
   12.1 General principles of model selection
      12.1.1 The idea of model diagnostics
      12.1.2 A simulated example
   12.2 Cox–Snell residuals
   12.3 Bone marrow transplantation example

13 Model diagnostics II
   13.1 Martingale residuals
      13.1.1 Definition of martingale residuals
      13.1.2 Application of martingale residuals for estimating covariate transforms
   13.2 Graphical tests of the proportional hazards assumption
      13.2.1 Log cumulative hazard plot
      13.2.2 Andersen plot
      13.2.3 Arjas plot
      13.2.4 Leukaemia example

14 Censoring and truncation revisited
   14.1 Left censoring
      14.1.1 Method
   14.2 Doubly-censored data: Turnbull's algorithm
   14.3 Interval-censored data

15 Frailty models
   15.1 Proportional frailty model
   15.2 Examples of frailty distributions
      15.2.1 Gamma frailty
      15.2.2 PVF family
   15.3 Hazard and frailty of survivors
   15.4 Effects on the hazard ratio
      15.4.1 Changing relative risk
      15.4.2 Association between individuals

16 Measurement error
   16.1 Motivation: Linear regression
      16.1.1 Regression calibration
   16.2 Cox model with additive errors
      16.2.1 Approximate solutions
      16.2.2 The additive normal model
   16.3 Small-error approximation
   16.4 Example: Thyroid cancer in Hiroshima and Nagasaki

A Using R for survival analysis

B Notes on the Poisson Process
   B.1 Point processes
   B.2 The Poisson process on R+
      B.2.1 Local definition of the Poisson process
      B.2.2 Global definition of the Poisson process
      B.2.3 Defining the interarrival process
      B.2.4 Equivalence of the definitions
      B.2.5 The Poisson process as Markov process
   B.3 Examples and extensions
   B.4 Some basic calculations
   B.5 Thinning and merging
   B.6 Poisson process and the uniform distribution

C Assignments
   C.1 Modern Survival Problem sheet 1: Counting processes and martingales


Website: http://www.steinsaltz.me.uk/survival/survival.html

Classes: There will be 6 classes, held Monday mornings 9–10 and 10–11 in the seminar room of 1 SPR in weeks 4 through 8, and week 1 of Hilary Term. Work for each class is to be turned in at the statistics department by Friday noon.

Overview: Students will learn how to use the basic mathematical tools that are used to evaluate survival models. They will learn both mathematical facts about standard survival models currently in use, and how to use standard R packages to fit these models to data. They will learn how to interpret models critically, and how to choose an appropriate model.

Prerequisites: Part A Probability and Statistics are required. BS3a and BS3b would be helpful. Basic computer skills, including some familiarity with R, are assumed; this is at a level that an interested student with no R experience could acquire in a few hours.

Synopsis

• Point processes and compensators. Introduction to martingales.

• Non-parametric estimation. Semi-parametric estimation and the Cox model.

• Additive hazards regression. Varieties of data, such as current-status and randomly truncated data.

• Model selection: Hypothesis testing and information criteria.

• Model Diagnostics: Graphical methods, Residual methods.

• Advanced topics, such as: Influence and robustness, measurement error and longitudinal data, recurrent event models, frailty models, isotonic (shape-based) fits and hypothesis testing (in particular, increasing hazards). Bayesian approaches to survival.

Reading: The primary source for material in this course will be

• O. O. Aalen, O. Borgan, H. K. Gjessing, Survival and Event History Analysis: A Process Point of View

Other material will come from

• J. P. Klein and M. L. Moeschberger, Survival Analysis: Techniques for Censored and Truncated Data (2nd edition)

• T. R. Fleming and D. P. Harrington, Counting Processes and Survival Analysis

Klein and Moeschberger is the most applied, least theoretical book. Fleming and Harrington is more rigorous than the level of this course. Aalen et al. is at more or less the level we are aiming for. Nonetheless, some material will be adapted, as needed, from the other two books, and from some other sources as well. But if you're looking for more detail about the mathematics, Fleming and Harrington is a good place to start. If you're looking for a more straightforward presentation, try Klein and Moeschberger.


Chapter 1

Time processes and counting processes

1.1 Definition of counting process

Given a sequence¹ of random times $T_1, \dots, T_n$, where $n$ in principle could be $\infty$, we have an associated counting process
$$N(t) := \#\{i : T_i \le t\} = \sum_{i=1}^{n} \mathbf{1}_{\{T_i \le t\}}. \qquad (1.1)$$

Note that this definition makes counting processes continuous from the right. From the left they have limits², but are discontinuous at each $T_i$. We write $N(t-)$ for the left-limit at $t$, which is
$$N(t-) = \begin{cases} N(t) - \#\{i : T_i = t\} & \text{if } t \in \{T_1, \dots, T_n\},\\ N(t) & \text{otherwise.} \end{cases}$$

In general, a random function that is piecewise constant and right-continuous, with all jumps being positive integers, may be called a counting process, and it may be associated to a sequence of random variables in which $T \in \mathbb{R}$ appears exactly $N(T) - N(T-)$ times. (Note that the number of $T$ where this difference is nonzero is finite on any bounded interval.)

¹We may or may not be concerned with the order, but we must consider the $T_1, \dots, T_n$ as a sequence, not as a set, because in principle times may be repeated. An alternative is to think of $\{T_1, \dots, T_n\}$ as a multiset.

²Functions that are continuous from the right, but only have limits from the left, are sometimes called càdlàg, from the French continue à droite, limite à gauche. We will not be using this term.


We will refer to a counting process whose jumps are all of size 1 (with probability 1) as a simple counting process. The counting process associated to any independent sequence of continuous random variables is simple (with probability 1), since ties among independent continuous random variables have probability 0.

1.2 Examples of counting processes

1.2.1 Single survival time

Let $T$ be a nonnegative random variable. The counting process associated with $T$ is a function that is 0 for $t < T$, and then jumps to $N(t) = 1$ at $t = T$.

1.2.2 Sum of counting processes

If $N_1, \dots, N_k$ are counting processes, then $N(t) = \sum_{i=1}^{k} N_i(t)$ is also a counting process. In particular, the counting process associated with $T_1, \dots, T_k$ is the sum of the counting processes associated with the individual $T_i$. It may also be written as
$$N(t) = \#\{i : T_i \le t\}.$$

The sum of independent simple counting processes may not be simple. But it will be if the processes are all generated by times with continuous distributions.

1.3 Poisson process

The fundamental counting process is the Poisson process. This is a topic in Part A Probability and in Statistics BS3a (Applied Probability), so it should be familiar to you, but notes from Part A Probability are included as a reminder in Appendix B.

As noted there, the term "Poisson process" is used sometimes for the point process — the random collection of points — and sometimes for the counting process associated with this point process. When we need to make clear which we are referring to, we will use the terms Poisson point process and Poisson counting process.

The Poisson counting process is the simplest nontrivial continuous-time, discrete-space homogeneous Markov process on $\mathbb{Z}_+$. It starts at 0, and moves up by one step after holding times that are i.i.d. exponential. Thus it has independent increments (which in particular gives the Markov property), meaning that for any $0 \le s_1 < t_1 \le s_2 < t_2 \le \cdots \le s_n < t_n$, the random variables
$$\big(N(t_i) - N(s_i)\big)_{i=1}^{n}$$
are independent.
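For concreteness, this construction is easy to simulate; here is a minimal R sketch (the rate $\lambda = 2$ and the evaluation points are arbitrary illustrative choices):

## rate-lambda Poisson counting process from i.i.d. exponential holding times
set.seed(7)
lambda   <- 2
gaps     <- rexp(50, rate = lambda)        # i.i.d. Exp(lambda) holding times
T.events <- cumsum(gaps)                   # event times
N <- stepfun(T.events, 0:length(T.events), right = FALSE)   # right-continuous N(t)
N(1); N(3)                                 # counts by times 1 and 3; expectations 2 and 6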


1.4 Intensity

1.4.1 Deterministic intensity

Let $N(t)$ be a Poisson counting process with rate 1 on $\mathbb{R}_+$, let $\lambda : \mathbb{R}_+ \to \mathbb{R}_+$ be a piecewise continuous, locally bounded (that is, bounded on bounded intervals) function, and let $\Lambda(t) := \int_0^t \lambda(s)\,ds$. Define
$$N^{(\lambda)}(t) := N(\Lambda(t)). \qquad (1.2)$$

Remember that the Poisson process with rate 1 has the property that
$$\lim_{\delta \downarrow 0} \delta^{-1} P\big\{N(t+\delta) - N(t) > 0\big\} = 1;$$
that is, points accumulate at rate 1. The time-changed process $N^{(\lambda)}$ has
$$\begin{aligned}
\lim_{\delta \downarrow 0} \delta^{-1} P\big\{N^{(\lambda)}(t+\delta) - N^{(\lambda)}(t) > 0\big\}
&= \lim_{\delta \downarrow 0} \delta^{-1} P\big\{N(\Lambda(t+\delta)) - N(\Lambda(t)) > 0\big\}\\
&= \lim_{\delta \downarrow 0} \frac{\Lambda(t+\delta) - \Lambda(t)}{\delta}\cdot \delta_\Lambda^{-1} P\big\{N(\Lambda(t)+\delta_\Lambda) - N(\Lambda(t)) > 0\big\}\\
&\qquad\text{where } \delta_\Lambda = \Lambda(t+\delta) - \Lambda(t),\\
&= \lambda(t).
\end{aligned}$$

We call $N^{(\lambda)}$ the (inhomogeneous) Poisson counting process with intensity $\lambda$, and the corresponding point process is the (inhomogeneous) Poisson point process with intensity $\lambda$. It has the same general properties as the Poisson process (no clustering, independent increments), but without the assumption of constant rate. The expected number of points on an interval $(s, t)$ is not proportional to $t - s$ now, but to $\int_s^t \lambda(x)\,dx$.

As with the homogeneous Poisson process, the sum of independent inhomogeneous Poisson processes with intensities $\lambda_i(t)$ is an inhomogeneous Poisson process with intensity $\sum \lambda_i(t)$.

We may also think of the inhomogeneous Poisson process with intensity $\lambda$, like the Poisson process, as a sequence of interarrival times. The sequence is no longer independent, but the event times are Markov, in the sense that if we define $0 = T_0 < T_1 < T_2 < \cdots < T_n$ to be the event times, and $\tau_i := T_i - T_{i-1}$, then conditioned on $(T_0, \dots, T_i)$ the next interarrival time $\tau_{i+1}$ has hazard rate $\lambda(t + T_i)$, which translates to a density
$$f_{\tau_{i+1} \mid (T_0,\dots,T_i)}(t) = \lambda(T_i + t)\, \exp\Big\{-\int_{T_i}^{T_i + t} \lambda(s)\,ds\Big\}. \qquad (1.3)$$


Recall that, for a positive random variable $T$ with density $f$, cdf $F(t) = \int_0^t f(s)\,ds$, and survival function $S(t) = 1 - F(t)$, the hazard rate of $T$ is defined as
$$\lambda(t) := \frac{f(t)}{1 - F(t)} = \frac{-S'(t)}{S(t)} = \lim_{\delta \downarrow 0} \delta^{-1} P\big\{t \le T < t + \delta \,\big|\, T \ge t\big\}. \qquad (1.4)$$

The intensity is the same as the hazard rate of the first waiting time.
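Here is a minimal R sketch of the time-change construction (1.2), assuming for illustration the intensity $\lambda(t) = 2t$, so that $\Lambda(t) = t^2$ and $\Lambda^{-1}(s) = \sqrt{s}$:

## inhomogeneous Poisson process by time-changing a rate-1 Poisson process
set.seed(1)
t.max      <- 5
Lambda     <- function(t) t^2              # cumulative intensity (illustrative choice)
Lambda.inv <- function(s) sqrt(s)          # its inverse

S <- cumsum(rexp(1000))                    # rate-1 Poisson points
S <- S[S <= Lambda(t.max)]

T.events <- Lambda.inv(S)                  # event times of N^(lambda)
N.lambda <- stepfun(T.events, 0:length(T.events), right = FALSE)
N.lambda(3)                                # approx. Poisson with mean Lambda(3) = 9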

1.4.2 Random intensity

A counting process with fixed intensity is useful for some purposes — modelling customer arrivals at a queue, for instance, where customers are more likely to come at certain times of day — but generally not for survival modelling. The reason is that the most common survival applications involve a fixed population of individuals, which is depleted at each event. Random censoring — for example, individuals dropping out of the study — will also cause the rate of new events to change.

We will allow, then, for the possibility that the intensity $\lambda(t)$ depends on some stochastic process (that is, a random function). It will be important that we only allow the intensity to depend on the past. We will be developing a mathematical definition of "conditioning on the past" in the coming chapters, but eschewing complete mathematical rigour. Those who are interested may wish to look at chapter 5 of [Dur10] (or almost any graduate-level text on stochastic processes).

Survival experiments

A paradigm example is where we have $n$ individuals, each of whom has a random event time $T_i$. The population is not replaced, so the number of individuals at risk is changing.

Suppose that the $T_i$ are i.i.d., with hazard rate $\lambda(t)$. We define a downward counting process
$$Y(t) := \#\{i : t < T_i\} = n - \#\{i : t \ge T_i\}. \qquad (1.5)$$

This is called the number at risk; that is, the number of individuals at time $t$ whose event has not yet occurred. Then if $N(t)$ is the number of events that have happened up to time $t$, the intensity at time $t$ is $\lambda(t)Y(t)$. Since $Y(t)$ is random, the intensity is also random. On the interval $t \in [T_{(i)}, T_{(i+1)})$ the intensity is $(n - i)\lambda(t)$.


In some applications the individuals will have different hazard rates for their waiting times. For such applications we need to think of a risk set $R(t)$, the set of individuals at risk at time $t$ (so $Y(t) = |R(t)|$), and then the random intensity at time $t$ would be given by
$$\lambda(t) = \sum_{i \in R(t)} \lambda_i(t).$$

Right censoring

A slightly more complicated example is when each individual has a pair of positive random variables $(X_i, C_i)$: $X_i$ is the event time, $C_i$ is the censoring time. We observe $(T_i, \delta_i)$ where $T_i = X_i \wedge C_i$ and $\delta_i = \mathbf{1}_{\{X_i \le C_i\}}$. The idea is that the event time is not observed if it comes after $C_i$ (so-called right-censoring); in that case, all we know is that censoring occurred at the time $T_i$. The number at risk is now given by
$$Y(t) := \#\{i : t < X_i \wedge C_i\} = n - \#\{i : t \ge X_i \wedge C_i\}. \qquad (1.6)$$
As before, if the $X_i$ all have hazard rate $\lambda(t)$, the intensity is $Y(t)\lambda(t)$. The difference is that $Y(t)$ is no longer computable from the $(T_i)$.
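A small R sketch of this setup (the exponential event times, uniform censoring times, and evaluation point are arbitrary illustrative choices):

## right-censored observations (T_i, delta_i) and the number at risk Y(t)
set.seed(2)
n     <- 200
X     <- rexp(n, rate = 0.5)               # event times with hazard lambda(t) = 0.5
C     <- runif(n, 0, 3)                    # independent censoring times
T.obs <- pmin(X, C)                        # T_i = X_i ^ C_i
delta <- as.numeric(X <= C)                # 1 if the event was observed

Y <- function(t) sum(T.obs > t)            # Y(t) = #{i : t < X_i ^ C_i}
Y(1) * 0.5                                 # intensity Y(t) * lambda(t) at t = 1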

Notation 1.4.1. We use the notations $a \wedge b = \min\{a, b\}$ and $a \vee b = \max\{a, b\}$. If $T_1, \dots, T_n$ are real numbers, we write $T_{(1)} \le \cdots \le T_{(n)}$ for the ordered sequence. (Tie-breaking may be done arbitrarily. The mathematical subtleties of tie-breaking will not concern us.) We will generally use capital letters $T_1, \dots, T_n$ to represent independent times (which may be event times or censoring times), and $t_1 < \cdots < t_k$ to represent the ordered event times.

We will mainly be concerned with the case where $X_i$ and $C_i$ are independent of each other, called random or non-informative censoring. When we neglect to state otherwise, it will be assumed that censoring is random. Right censoring covers several common practical situations, including

(i). Loss to follow-up: the subject moves away and we can't contact him/her.

(ii). Drop out: The subject declines or is unable to continue participation. As this is often due to side-effects in medical trials, or a deterioration in condition, it is clear that independence of drop-out will often be approximate at best.


(iii). End of trial: This is the simplest case, since a deterministic planned end to the study is presumptively independent of the event times.

"End of trial" is called type I censoring. If individuals have distinct censoring times that are fixed in advance, this is called progressive type I censoring.

In general, it is clear that dependence between $X_i$ and $C_i$ can cause arbitrarily difficult problems with statistical analysis (which may be resolved by including the censoring process in our statistical model, which we generally do not wish to do). But independence turns out to be a stronger condition than we need, and excludes some important examples of interest; in particular, so-called type II right censoring. In type II censoring, a population of $N$ individuals is observed until exactly $r$ of them have had their event, at which point the study is concluded. Clearly the censoring times are not independent of the event times, but it also seems intuitively clear that there is no particular problem with using the observed event times to draw conclusions about that portion of the survival curve. The next few lectures will develop a mathematical framework that allows us to draw the appropriate distinctions for determining, among other things, when censoring is "sufficiently independent".


Chapter 2

σ-algebras and conditioning

2.1 σ-algebras

2.1.1 Background

If you have learned any measure-theoretic probability, you will be familiar with the fundamental "probability triple" $(\Omega, \mathcal{F}, P)$. Here $\Omega$ is the sample space, $P$ is the probability distribution, and $\mathcal{F}$ is the set of "events": the subsets of $\Omega$ that have probabilities. For discrete probability $\mathcal{F}$ can include all the subsets, so there's no need to think about it. One of the first theorems you prove in measure-theoretic probability is that it is impossible to define a continuous distribution on the real numbers such that all subsets have probabilities. This means that a mathematically rigorous treatment requires that we think carefully about what events are allowed. The collection of events forms a σ-algebra, meaning that it satisfies the following conditions:

(i). $\Omega \in \mathcal{F}$;

(ii). If $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$;

(iii). If $A_1, A_2, \dots$ are a countable collection of elements of $\mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.

These clearly parallel the probability axioms, saying that any set whose probability could be computed from knowing the probabilities of events in $\mathcal{F}$ is also an event in $\mathcal{F}$. The smallest σ-algebra that contains all the open intervals is called the Borel σ-algebra, and that is the σ-algebra that we conventionally use for continuous probability on $\mathbb{R}$.

The Borel σ-algebra is BIG. It doesn't include all subsets of $\mathbb{R}$, but actually constructing a set that is not Borel is not something you are likely to do by accident. Thus, we can still be pretty rigorous about our probability theory — as rigorous as we intend to be in this course — without ever defining precisely which sets are "events". We will be ignoring these "measurability" questions.

2.1.2 σ-algebras and information

So why are we talking about σ-algebras? They turn out to be exactly the right mathematical language for talking about different sets of information. There are good reasons for restricting to a smaller class of events than the maximum possible. The basic idea is that the full σ-algebra $\mathcal{F}$ represents complete information about which outcome occurred, while smaller σ-algebras represent reduced information: all you can know is which events in the σ-algebra occurred.

2.1.3 Examples

Trivial σ-algebra

The minimal σ-algebra — the smallest amount of information you could have — is represented by $\mathbb{T} = \{\Omega, \emptyset\}$. This corresponds to having no information at all. This σ-algebra is contained in any other σ-algebra, by definition.

An example on [0, 1]

Let $\Omega = [0, 1]$. Then $\mathcal{F} = \{\Omega, [0, \tfrac12], (\tfrac12, 1], \emptyset\}$ is a σ-algebra. It may be thought of as representing the information about which half of the interval $\omega$ is in.

Coin tossing

Let $\Omega = \{0, 1\}^n$ (where $n$ may be $\infty$). Let $A(k_1, \dots, k_j; x_1, \dots, x_j)$, for $1 \le k_i \le n$ distinct integers and $x_i \in \{0, 1\}$, be the "cylinder set"
$$A(k_1, \dots, k_j; x_1, \dots, x_j) := \big\{(\omega_1, \dots, \omega_n) : \omega_{k_i} = x_i \text{ for } 1 \le i \le j\big\}.$$
This is the set of outcomes such that the $k_i$-th coordinate is fixed to be $x_i$. That is, it is the set of sequences of coin-flips where some particular flips (given by the $k_i$) are known to have particular values (given by the $x_i$). There are $3^m$ different cylinder sets with $1 \le k_1 < k_2 < \cdots < k_j \le m$, which we may list as $A_1, \dots, A_{3^m}$. Then if we define
$$\mathcal{F}_m := \big\{A_{i_1} \cup \cdots \cup A_{i_j} : 0 \le j \le 3^m,\ 1 \le i_1 < i_2 < \cdots < i_j \le 3^m\big\}$$


then $\mathcal{F}_m$ may be thought of as representing the information in the first $m$ flips.

2.1.4 Intersections of σ-algebras

If $\mathcal{F}_1, \dots, \mathcal{F}_n$ are σ-algebras, the intersection represents the information that is common to all of them, written
$$\mathcal{F}_1 \wedge \cdots \wedge \mathcal{F}_n = \bigwedge_{i=1}^{n} \mathcal{F}_i = \bigcap_{i=1}^{n} \mathcal{F}_i.$$
As with any other sort of intersection, we may take intersections over arbitrary collections, so if $\{\mathcal{F}_i : i \in I\}$ for any index class $I$, then the intersection $\bigwedge_{i \in I} \mathcal{F}_i = \bigcap_{i \in I} \mathcal{F}_i$ is a σ-algebra.

2.1.5 σ-algebra generated by a family of sets

If $F$ is any collection of subsets of $\Omega$, the intersection of all σ-algebras containing $F$ is a σ-algebra, called the σ-algebra generated by $F$, written $\sigma(F)$. As already mentioned, the σ-algebra generated by the open intervals is called the Borel σ-algebra.

Coin-tossing

The σ-algebra described in section 2.1.3 is generated by the family of sets
$$B_i = \big\{(\omega_1, \dots, \omega_n) : \omega_i = 0\big\}.$$

2.1.6 Joins of σ-algebras

If $\mathcal{F}$ and $\mathcal{G}$ are σ-algebras, the join of $\mathcal{F}$ and $\mathcal{G}$, written $\mathcal{F} \vee \mathcal{G}$, is the σ-algebra generated by sets of the form $A \cap B$, where $A \in \mathcal{F}$ and $B \in \mathcal{G}$. It may be thought of as the σ-algebra containing all the information of $\mathcal{F}$ and $\mathcal{G}$ together.

We may also define the join of an arbitrary collection of σ-algebras, as being generated by all finite intersections of elements of the individual σ-algebras.

2.1.7 σ-algebra generated by a random variable

If $X$ is a real-valued random variable, the σ-algebra $\langle X \rangle$ is defined to be the σ-algebra generated by the family of events of the form $\{\omega : X(\omega) \le x\}$, for any $x \in \mathbb{R}$. In other words, it consists of sets of the form $X^{-1}(B)$, where $B \subset \mathbb{R}$ is any Borel set. This may be thought of as the σ-algebra of information about the outcome that is contained in the value of $X$.

If $X$ and $Y$ are random variables, the σ-algebra generated by them jointly is $\langle X, Y \rangle = \langle X \rangle \vee \langle Y \rangle$. We may likewise speak of the σ-algebra generated by an arbitrary collection of random variables.

If $\mathcal{F}$ is a σ-algebra with $\{\omega : X(\omega) \le a\} \in \mathcal{F}$ for all $a \in \mathbb{R}$, we say that $X \in \mathcal{F}$. Equivalently, we say that $X$ is $\mathcal{F}$-measurable. Thus, we can think of a σ-algebra as a collection of random variables, and this is the perspective we will be taking. A σ-algebra is a way of summarising the statement that we know the values of certain random variables. The axioms of a σ-algebra mean that for any collection of random variables $X_1, \dots, X_n \in \mathcal{F}$, $\mathcal{F}$ also includes any other random variables that could be computed as a function of $X_1, \dots, X_n$.

Coin-tossing

The σ-algebra $\mathcal{F}_m$ described in section 2.1.3 is generated by the random variables $X_1, \dots, X_m$, where $X_i$ = outcome of the $i$-th flip.

2.1.8 Filtrations of σ-algebras

A filtration is a collection of σ-algebras $\mathcal{F}_t$ such that $\mathcal{F}_s \subset \mathcal{F}_t$ when $s \le t$. (They are said to be increasing.)

A stochastic process $(X(t))_{t \ge 0}$ is said to be adapted to the filtration $\mathcal{F}_t$ if $X(t) \in \mathcal{F}_t$. The filtration $\mathcal{F}_t$ generated by $\{X_s : s \le t\}$ is called the past σ-algebra of $X$. It is the minimal filtration such that $X$ is adapted to $\mathcal{F}$.

We define
$$\mathcal{F}_{t+} := \bigwedge_{s > t} \mathcal{F}_s, \qquad \mathcal{F}_{t-} := \bigvee_{s < t} \mathcal{F}_s.$$

Intuitively, $\mathcal{F}_{t+}$ includes all information that is available at all times after $t$, while $\mathcal{F}_{t-}$ includes all information that is available strictly before time $t$. We call a filtration right-continuous if $\mathcal{F}_t = \mathcal{F}_{t+}$. Unless otherwise indicated, we will be assuming that filtrations are right-continuous.

The natural filtration of a stochastic process $(X_t)_{t \ge 0}$ is the filtration generated by $X_t$. This is the smallest filtration with respect to which $(X_t)$ is adapted, but we will often need to include additional information — additional random variables — in the filtration. In particular, we will include information about censoring and truncation events up to time $t$ in the filtration.

2.1.9 Complete filtrations

A filtration $(\mathcal{F}_t)_{t \ge 0}$ is complete if $\mathcal{F}_0$ includes all events with probability 0. We will always assume our filtrations are complete. This reflects the notion that there is no information in events of probability 0, because they "almost surely" don't happen.

2.1.10 Stopping times

A stopping time (also called a Markov random time) is a random time such that we know at time $t$ whether it has happened. That is, it is a positive-real-valued random variable $T$ such that $\{\omega : T \le t\} \in \mathcal{F}_t$. Some examples:

• The time of the first event is a stopping time.

• The time of the last death in a survival experiment starting with a (possibly random) collection of individuals without censoring is a stopping time.

• The time of the last observed death in a survival experiment starting with a (possibly random) collection of individuals with random right censoring is not a stopping time. This is because any given event may be the last one, if all survivors are ultimately censored. There is no way to know this at the time of the event (unless, of course, there are no survivors).

If we think of $(\mathcal{F}_t)$ as representing sets $A$ such that we may determine whether the outcome $\omega$ is in $A$ using only information available at time $t$, then for a stopping time $T$ we may speak of the σ-algebra $\mathcal{F}_T$ of events whose membership may be determined using information available at the random time $T$: that is, when $T(\omega)$ occurs, whenever that is, we know whether $\omega \in A$. Or we may say that a random variable $X \in \mathcal{F}_T$ if we may calculate $X$ at time $T$, once we know what $T$ is. Thus, for example, if $(\mathcal{F}_t)$ is the natural filtration of a stochastic process $(X(t))_{t \ge 0}$, then $T$ and $X(T)$ are $\mathcal{F}_T$-measurable, as are $X(T - 1)$ and $\int_0^T X(s)\,ds$.

The formal definition (which we will not have cause to use) is
$$\mathcal{F}_T = \bigvee_{t \ge 0} \Big\{\{\omega : T(\omega) \le t\} \cap A : A \in \mathcal{F}_t\Big\}.$$


2.1.11 Predictable processes

A crucial concept for the models we want to develop is that of a predictable process. Intuitively, a predictable process is an $\mathcal{F}_t$-adapted process $(X(t))$ such that no new information appears suddenly: we can "predict" the value of the process at any time on the basis of information available at least infinitesimally before that time. Clearly a left-continuous process would meet this definition. A jump process that has a clock that accumulates time — possibly at a random rate, according to an auxiliary stochastic process — and then jumps when the clock hits 1 would also be predictable. On the other hand, the process

$$X(t) = \begin{cases} 0 \text{ for all } t & \text{with probability } \tfrac12;\\ \mathbf{1}_{\{t \ge 1\}} & \text{with probability } \tfrac12, \end{cases}$$

is clearly not predictable (when $\mathcal{F}_t$ is the history of $X$). The value of $X(t)$ cannot be calculated from any information available in the values $X(s)$ for $s < 1$.

This suggests a definition

$(X(t))$ is predictable if $X(t) \in \mathcal{F}_{t-}$ for each $t$.

This definition is not restrictive enough, though, because an unpredictable process may not be unpredictable at any particular time. A Poisson counting process is quintessentially unpredictable — independent increments implies that there is no way to know that a jump is coming until it actually comes — but for any fixed $t$ the probability of an unpredictable jump actually occurring at time $t$ is 0. Thus

$$X(t) = X(t-) + \big[X(t) - X(t-)\big],$$
where $X(t-) := \lim_{\delta \downarrow 0} X(t - \delta)$. Clearly $X(t-)$ is $\mathcal{F}_{t-}$-measurable, and $X(t) - X(t-)$ is almost surely 0, so it is $\mathcal{F}_0$-measurable (by completeness). So this definition would not exclude the Poisson process.

An alternative definition would be

(X(t)) is predictable if it is left-continuous.

This clearly is too restrictive, since "predictability" would not be infringed if there were a right-continuous jump at a deterministic time, for example. The technical definition then extends this to processes whose random bits are left-continuous. We will not need this generality.


For purposes of this course, predictable processes are equivalent to left-continuous processes.

The more general definition may be found in section 1.4 of [FH91].

2.2 Conditioning

One of the most important concepts in probability is the conditional expectation of one random variable conditioned on another — or on a collection of other random variables. An excellent elementary introduction to this concept may be found in chapter 6 of [Pit93]. A more sophisticated treatment is in chapter 5 of [Dur10].

2.2.1 Definition of conditioning

Let $\mathcal{F}$ be a σ-algebra, thought of as a collection of random variables, and $X$ a random variable that may or may not be included in $\mathcal{F}$. As we have said, these random variables represent information. Then
$$E\big[X \,\big|\, \mathcal{F}\big], \qquad (2.1)$$
the expectation of $X$ conditioned on $\mathcal{F}$, is, intuitively, the best approximation — the best guess — we can make to $X$, given only the information in $\mathcal{F}$. It is the projection of $X$ onto the $\mathcal{F}$-measurable random variables. This may be taken as a formal definition:

$$E\big[X \,\big|\, \mathcal{F}\big] \text{ is the } \mathcal{F}\text{-measurable r.v. } Y \text{ that minimises } E\big[(X - Y)^2\big]. \qquad (2.2)$$

Alternatively,
$$E\big[X \,\big|\, \mathcal{F}\big] \text{ is the } \mathcal{F}\text{-measurable r.v. } Y \text{ that satisfies } E[XZ] = E[YZ] \text{ for any } Z \in \mathcal{F}. \qquad (2.3)$$

These definitions aren’t very convenient to work with, and they’re notobviously even definitions — Does such a Y exist? Is it unique? — so wedelve into some more intuitive descriptions.


2.2.2 Intuitive definition for discrete random variables

Suppose $\mathcal{F} = \langle W \rangle$ is the σ-algebra generated by the discrete random variable $W$. Then $E[X|\mathcal{F}]$ is some function $g(W)$ — that is, if we know that $W(\omega) = w$, then we can compute $E[X|\mathcal{F}](\omega) = g(w)$. What would $g(w)$ be? According to (2.3), if we take $Z = \mathbf{1}_{\{W=w\}}$,
$$E\big[X \mathbf{1}_{\{W=w\}}\big] = E\big[g(W)\mathbf{1}_{\{W=w\}}\big] = g(w)\,P\{W = w\}.$$
Thus
$$g(w) = \frac{E\big[X \mathbf{1}_{\{W=w\}}\big]}{P\{W = w\}},$$
which is our old definition for $E[X \mid W = w]$. That is, $E[X|\mathcal{F}] = E[X|W]$ is a random variable that may be written as a function of $W$, by the rule that assigns to $E[X|W](\omega)$ the value $E[X \mid W = w]$ whenever $W(\omega) = w$.

It is clear that this may be generalised to σ-algebras generated by multiple discrete random variables. Generalising to continuous (or more complicated) random variables takes some abstract mathematics, but it still works, and the results are unique (or, unique enough), and it is still true that when $\mathcal{F} = \langle W_1, \dots, W_k \rangle$ then
$$E[X|\mathcal{F}](\omega) = E\big[X \,\big|\, W_1 = w_1, \dots, W_k = w_k\big] \quad \text{whenever } W_i(\omega) = w_i. \qquad (2.4)$$
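A quick Monte Carlo check of the discrete rule $g(w) = E[X\mathbf{1}_{\{W=w\}}]/P\{W=w\}$, with an arbitrary illustrative choice of $W$ and $X$:

## E[X | W = w] computed two ways for a discrete W
set.seed(3)
n <- 1e5
W <- sample(1:6, n, replace = TRUE)        # a die roll
X <- W^2 + rnorm(n)                        # X depends on W plus independent noise

w <- 4
mean(X * (W == w)) / mean(W == w)          # g(w) = E[X 1{W=w}] / P{W=w}
mean(X[W == w])                            # E[X | W = w] directly; both approx. 16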

2.2.3 Properties of conditional expectations

• If $X \in \mathcal{F}$ then $E[X|\mathcal{F}] = X$. This holds if $X$ is not in $\mathcal{F}$ but is a function of some random variables in $\mathcal{F}$.

• Suppose $X$ is independent of $\mathcal{F}$, meaning that for any real $x$ and any $A \in \mathcal{F}$,
$$P\big(\{X \le x\} \cap A\big) = P\{X \le x\}\,P(A).$$
Then $E[X|\mathcal{F}] = E[X]$. This applies, in particular, to the case when $\mathcal{F} = \{\Omega, \emptyset\}$. Since $X$ is always independent of $\mathbb{T}$, we see that $E[X|\mathbb{T}]$ is the constant $E[X]$.

• If $\mathcal{F} \subset \mathcal{G}$ are two σ-algebras then
$$E\Big[E[X|\mathcal{F}] \,\Big|\, \mathcal{G}\Big] = E\big[X \,\big|\, \mathcal{F}\big] = E\Big[E[X|\mathcal{G}] \,\Big|\, \mathcal{F}\Big]. \qquad (2.5)$$

In other words, the cruder estimate dominates. The first equality follows immediately from the fact that $E[X|\mathcal{F}]$ is already $\mathcal{G}$-measurable. The second equality follows, intuitively, from the fact that an estimate of an estimate cannot be better than an estimate of the original quantity. (Obviously, this depends on the fact that the estimate $E[X|\mathcal{G}]$ is unbiased, in the proper sense. So there is some work to be done for a formal proof.)

• If $X$ and $Y$ are any random variables,
$$E\big[X + Y \,\big|\, \mathcal{F}\big] = E\big[X \,\big|\, \mathcal{F}\big] + E\big[Y \,\big|\, \mathcal{F}\big]. \qquad (2.6)$$

Since integrals behave like sums, if we have a collection of bounded random variables $\lambda(u)$ (for $u \in [s, t]$),
$$E\Big[\int_s^t \lambda(u)\,du \,\Big|\, \mathcal{F}\Big] = \int_s^t E\big[\lambda(u) \,\big|\, \mathcal{F}\big]\,du. \qquad (2.7)$$

• If $Y \in \mathcal{F}$, then
$$E\big[XY \,\big|\, \mathcal{F}\big] = E\big[X \,\big|\, \mathcal{F}\big] \cdot Y. \qquad (2.8)$$

Proof. Intuitively this makes sense: if we know the value of $Y$, the best approximation to $XY$ must be obtained by multiplying $Y$ by the best approximation to $X$. Formally, we use the characterisation (2.3). First of all, since $Y \in \mathcal{F}$ and $E[X|\mathcal{F}] \in \mathcal{F}$, clearly $Y\,E[X|\mathcal{F}] \in \mathcal{F}$. If $Z$ is any other $\mathcal{F}$-measurable random variable then also $ZY \in \mathcal{F}$. Thus, by (2.3),
$$E\big[Z \cdot Y E[X|\mathcal{F}]\big] = E\big[ZY \cdot E[X|\mathcal{F}]\big] = E[ZXY].$$

2.2.4 Examples

Constants

The most trivial collection of random variables is the collection $\mathcal{F}$ of all constants (or deterministic random variables).¹ Then $E[X|\mathcal{F}] = E[X]$.

We could also think of $\mathcal{F}$ as being an empty collection of random variables; constants may be thought of as functions with no arguments.

Conditioning on a single random variable

If $\mathcal{F}$ is generated by a random variable $Y$, we write $E[X|\mathcal{F}]$ as $E[X|Y]$, as it is sometimes defined; that is, $E[X|Y]$ is a random variable that is a function of $Y$, calculated according to the rule: when $Y = y$, $E[X|Y] = E[X \mid Y = y]$.

¹In the usual language of σ-algebras, $\mathcal{F}$ is the σ-algebra $\{\emptyset, \Omega\}$.


Uniform distribution

Let $X$ be uniformly distributed on $[0, 1]$, and let $Y = X(1 - X)$. Then $\mathcal{F} := \langle X \rangle$ includes all open intervals, hence is the complete Borel σ-algebra, while $\mathcal{G} := \langle Y \rangle$ includes only the Borel sets that are symmetric about $\tfrac12$. Clearly $E[X|\mathcal{F}] = X$ and $E[X^2|\mathcal{F}] = X^2$, since $X$ and $X^2$ are both $\mathcal{F}$-measurable. On the other hand,
$$E\big[X \,\big|\, \mathcal{G}\big] = \frac12, \qquad E\big[X^2 \,\big|\, \mathcal{G}\big] = E\big[X - Y \,\big|\, \mathcal{G}\big] = \frac12 - Y.$$

The second calculation follows from the first, using (2.6). The first seems intuitively clear: given the value of $Y$, which is the information in $\mathcal{G}$, we know that $X = \frac12(1 \pm \sqrt{1 - 4y})$, with both possibilities equally likely (since the distribution is uniform). Thus, the conditional expectation is
$$E[X \mid Y = y] = \frac12 \cdot \frac12\big(1 + \sqrt{1 - 4y}\big) + \frac12 \cdot \frac12\big(1 - \sqrt{1 - 4y}\big) = \frac12.$$

More formally, we can write $Z = \operatorname{sgn}(2X - 1)$. Then $X = \frac12\big(1 + Z\sqrt{1 - 4Y}\big)$, so
$$E\big[X \,\big|\, \mathcal{G}\big] = \frac12 + \frac12 E\big[Z\sqrt{1 - 4Y} \,\big|\, \mathcal{G}\big] = \frac12 + \frac12 E\big[Z \,\big|\, \mathcal{G}\big]\sqrt{1 - 4Y}$$
by (2.8). We know that
$$E\big[Z \,\big|\, \mathcal{G}\big] = \frac{f\big((1 + \sqrt{1 - 4Y})/2\big) - f\big((1 - \sqrt{1 - 4Y})/2\big)}{f\big((1 + \sqrt{1 - 4Y})/2\big) + f\big((1 - \sqrt{1 - 4Y})/2\big)},$$
where $f$ is the density of $X$. In the case where $f$ is uniform, this is 0, but we can compute this more generally.
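These conditional expectations can also be checked by simulation; binning on $Y$ is a crude stand-in for conditioning on $\mathcal{G}$:

## check E[X | Y] = 1/2 and E[X^2 | Y] = 1/2 - Y for X ~ Unif(0,1), Y = X(1-X)
set.seed(4)
X <- runif(1e6)
Y <- X * (1 - X)
bins <- cut(Y, breaks = seq(0, 0.25, by = 0.05))
tapply(X, bins, mean)                      # all close to 1/2
tapply(X^2, bins, mean)                    # close to 1/2 minus the bin's mean of Y
0.5 - tapply(Y, bins, mean)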

2.2.5 In search of past time

We now can describe, informally, the general intensity of a counting process as
$$\lambda(t)\,dt = E\big[dN(t) \,\big|\, \mathcal{F}_{t-}\big]. \qquad (2.9)$$

This is a useful way of thinking about it, even if it's not quite right, because $dt$ and $dN(t)$ aren't real mathematical objects (unless we're doing nonstandard analysis).

This intuition allows us to think of intensities as being relative to the information available at time $t$, and we may want to be able to change our minds about what information is available for computing the intensity. For example, in many of the models we are concerned with, the hazard rates of individuals depend on covariates. A given covariate may be observed or unobserved in a given setting, and the filtration is part of the model. It is crucial that $\mathcal{F}_t$ (or whatever we call the filtration in a given setting) contains only random variables whose value is known by time $t$, which is in general a subset of the random variables whose values have been physically determined by time $t$.

This leads to an important theorem, the Innovation Theorem, which is an application of (2.5): Suppose we have two different collections of random variables $\mathcal{F}_t$ and $\mathcal{G}_t$ for each $t$, such that $\mathcal{F}_t$ contains only partial information up to time $t$ — that is, only some of the random variables determined by time $t$ — and $\mathcal{G}_t$ contains all the information. (Or, $\mathcal{G}_t$ may also contain only partial information, but in any case $\mathcal{F}_t \subset \mathcal{G}_t$.) Suppose we have computed the intensity of the process based on the information in $\mathcal{G}_t$, but we actually want the intensity based on $\mathcal{F}_t$. Then by (2.5)
$$\lambda_{\mathcal{F}}(t) = E\big[dN(t) \,\big|\, \mathcal{F}_{t-}\big] = E\Big[E\big[dN(t) \,\big|\, \mathcal{G}_{t-}\big] \,\Big|\, \mathcal{F}_{t-}\Big].$$

Theorem 2.1 (Innovation Theorem). Suppose $\mathcal{F}_t \subset \mathcal{G}_t$ for every $t$. Then the associated intensities satisfy
$$\lambda_{\mathcal{F}}(t) = E\big[\lambda_{\mathcal{G}}(t) \,\big|\, \mathcal{F}_{t-}\big]. \qquad (2.10)$$

2.2.6 Examples

Coin flipping

Let $\Omega = \{0, 1\}^\infty$ be the space of infinite sequences of coin flips, with $X_i = \omega_i$ the outcome of the $i$-th flip. Let $P$ be the probability distribution that makes the $X_i$ i.i.d. uniform on $\{0, 1\}$. Let $\mathcal{F}_t$ be the σ-algebra generated by $X_1, \dots, X_m$ where $m = \lfloor t \rfloor$. Of course, we would ordinarily look at this process in discrete time, but there is no problem with embedding it in continuous time. We may also define $X(t) := X_{\lfloor t \rfloor}$ and $S(t) = \sum_{1 \le i \le t} X_i$. Then $X(t)$ and $S(t)$ are right-continuous and in $\mathcal{F}_t$; $(\mathcal{F}_t)$ is a right-continuous filtration.

If $n \ge m$ then
$$E\big[S(n) \,\big|\, \mathcal{F}_m\big] = E\big[S(m) \,\big|\, \mathcal{F}_m\big] + E\big[S(n) - S(m) \,\big|\, \mathcal{F}_m\big] = S(m) + E\big[S(n) - S(m)\big] = S(m) + \frac{n - m}{2},$$


since $S(n) - S(m)$ is independent of $\mathcal{F}_m$.

Poisson process

Let $N(t)$ be a Poisson counting process with intensity $\lambda$, and $\mathcal{F}_t$ the associated filtration. For any fixed $t$ and $t' > t$,
$$E\big[N(t') \,\big|\, \mathcal{F}_t\big] = E\big[N(t') - N(t) \,\big|\, \mathcal{F}_t\big] + E\big[N(t) \,\big|\, \mathcal{F}_t\big] = \lambda(t' - t) + N(t),$$
and
$$\begin{aligned}
E\big[N(t')^2 \,\big|\, \mathcal{F}_t\big] &= E\big[(N(t') - N(t))^2 \,\big|\, \mathcal{F}_t\big] + E\big[N(t)^2 \,\big|\, \mathcal{F}_t\big] + 2E\big[(N(t') - N(t))N(t) \,\big|\, \mathcal{F}_t\big]\\
&= \lambda(t' - t) + \lambda^2(t' - t)^2 + N(t)^2 + 2N(t)\cdot\lambda(t' - t),
\end{aligned}$$
so that
$$\operatorname{Var}\big(N(t') \,\big|\, \mathcal{F}_t\big) = E\big[N(t')^2 \,\big|\, \mathcal{F}_t\big] - E\big[N(t') \,\big|\, \mathcal{F}_t\big]^2 = \lambda(t' - t).$$
Thus, unsurprisingly (since $N(t') - N(t)$ is independent of $\mathcal{F}_t$), the conditional variance of $N(t')$ conditioned on $\mathcal{F}_t$ is precisely the same as $\operatorname{Var}(N(t') - N(t))$.

Frailty process

Suppose we have a single individual, whose mortality rate as a function of age is Gompertz — that is, $\lambda(t) = Be^{\theta t}$. Suppose now that $B$ is itself a random "frailty", determined already at age 0. For definiteness, let us say that $B$ has a Gamma distribution with parameters $(r, \mu)$. (Recall that this distribution has density
$$f_{r,\mu}(x) = \frac{\mu^r}{\Gamma(r)}x^{r-1}e^{-\mu x} \qquad (2.11)$$
on $x \in (0, \infty)$. Its expectation is $r/\mu$ and its variance $r/\mu^2$.) Let $\mathcal{G}_t$ be the complete past up to time $t$, while $\mathcal{F}_t$ is the past excluding the random variable $B$. (In this case, it will be generated by $T\mathbf{1}_{\{T \le t\}}$.)

Since $\mathcal{G}$ includes all the information,
$$\lambda_{\mathcal{G}}(t) = Be^{\theta t}\mathbf{1}_{\{T \ge t\}}.$$
That is, up to time $t$ we know $B$, and so we know that the rate of the single event occurring at time $t$ — intuitively, the "instantaneous probability per unit of time" — is the hazard rate, unless it has already happened, in which case the intensity is 0.

$\lambda_{\mathcal{F}}$ is the intensity you would estimate if you could not observe $B$. The Innovation Theorem tells us that
$$\lambda_{\mathcal{F}}(t) = E\big[Be^{\theta t}\mathbf{1}_{\{T \ge t\}} \,\big|\, \mathcal{F}_{t-}\big] = e^{\theta t}E\big[B\mathbf{1}_{\{T \ge t\}} \,\big|\, T\mathbf{1}_{\{T < t\}}\big],$$
because $T\mathbf{1}_{\{T < t\}}$ generates $\mathcal{F}_{t-}$. Now, on the event $\{T\mathbf{1}_{\{T<t\}} > 0\}$ — that is, when $T < t$, so it has already happened — $\lambda_{\mathcal{F}}(t) = \lambda_{\mathcal{G}}(t) = 0$. On the remaining event, where $T \ge t$, we get
$$\begin{aligned}
\lambda_{\mathcal{F}}(t) = e^{\theta t}E\big[B\mathbf{1}_{\{T \ge t\}} \,\big|\, T \ge t\big] &= e^{\theta t}E\big[B \,\big|\, T \ge t\big]\\
&= e^{\theta t}\frac{E\big[B\cdot\mathbf{1}_{\{T \ge t\}}\big]}{P\{T \ge t\}}\\
&= e^{\theta t}\frac{\int_0^\infty x f_{r,\mu}(x)\,P\{T \ge t \mid B = x\}\,dx}{\int_0^\infty f_{r,\mu}(x)\,P\{T \ge t \mid B = x\}\,dx}\\
&= e^{\theta t}\frac{\int_0^\infty x^r e^{-\mu x}\,P\{T \ge t \mid B = x\}\,dx}{\int_0^\infty x^{r-1}e^{-\mu x}\,P\{T \ge t \mid B = x\}\,dx}.
\end{aligned}$$
We have
$$P\big\{T \ge t \,\big|\, B = x\big\} = \exp\Big\{-\int_0^t xe^{\theta s}\,ds\Big\} = \exp\Big\{-x\Big(\frac{e^{\theta t} - 1}{\theta}\Big)\Big\}.$$
So
$$\lambda_{\mathcal{F}}(t) = e^{\theta t}\frac{\int_0^\infty x^r\exp\big\{-x\big(\mu + \frac{e^{\theta t}-1}{\theta}\big)\big\}\,dx}{\int_0^\infty x^{r-1}\exp\big\{-x\big(\mu + \frac{e^{\theta t}-1}{\theta}\big)\big\}\,dx} = e^{\theta t}\frac{\Gamma(r+1)\big(\mu + (e^{\theta t}-1)/\theta\big)^{-(r+1)}}{\Gamma(r)\big(\mu + (e^{\theta t}-1)/\theta\big)^{-r}} = \frac{re^{\theta t}}{\mu + (e^{\theta t}-1)/\theta}.$$
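The closed form is easy to check numerically, for instance in R (parameter values are arbitrary illustrative choices):

## gamma-frailty intensity: integral form vs closed form
r <- 2; mu <- 3; theta <- 0.1; t <- 5
surv.given.B <- function(x) exp(-x * (exp(theta * t) - 1) / theta)
num <- integrate(function(x) x^r     * exp(-mu * x) * surv.given.B(x), 0, Inf)$value
den <- integrate(function(x) x^(r-1) * exp(-mu * x) * surv.given.B(x), 0, Inf)$value
exp(theta * t) * num / den                              # integral form
r * exp(theta * t) / (mu + (exp(theta * t) - 1)/theta)  # closed form; should agree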


Chapter 3

Martingales

3.1 Martingales

3.1.1 Definitions

Intuitively, a martingale is a "fair game". That is, based on what has happened up to time $t$, the expected future change is 0. The average best guess about the value at any future time is that it will be equal to the current value. Formally, the stochastic process $(M(t))_{t \ge 0}$ is a martingale with respect to the filtration $\mathcal{F}_t$ (or an "$\mathcal{F}_t$-martingale") if for any $t \ge s$,
$$E\big[M(t) \,\big|\, \mathcal{F}_s\big] = M(s). \qquad (3.1)$$

Fact 3.1. If $(M(t))$ is a martingale, and $Y \in \mathcal{F}_s$, then $E[(M(t) - M(s))Y] = 0$ for any $t \ge s$, and $E[M(t)Y] = E[M(s)Y]$.

This is an immediate consequence of (2.8).

Fact 3.2 (Optional stopping). If $T > s$ is a bounded stopping time, then $E[M(T) \,|\, \mathcal{F}_s] = M(s)$.

Example: Sums of i.i.d. random variables

If $M(t) = \sum_{i=1}^{\lceil t \rceil} \xi_i$, where $\xi_1, \xi_2, \dots$ are i.i.d. with $E[\xi_i] = 0$, then $(M(t))$ is a martingale. (We can also think of this as a discrete-time martingale.)

The reason martingales are interesting is that they have many of the same nice properties as sums of i.i.d. random variables — Laws of Large Numbers and the Central Limit Theorem — but are much more general.


Brownian motion

As the number of jumps in a discrete martingale gets large, but the size of the jumps remains small, the process can be rescaled to converge to a continuous martingale called a Brownian motion or Wiener process. The mathematical analysis of this object is a fascinating subject in its own right (which many of you will have seen in other courses), but we will not have much to say about it in this course. Still, it is useful to know that there is this universal limiting object that plays the same role for whole stochastic-process paths that the Gaussian distribution plays for one-dimensional averages. The essential properties of the Brownian motion are:

• Continuous: Brownian motion is a random continuous function $B : \mathbb{R}_+ \to \mathbb{R}$.

• Independent increments: For any $0 \le s_1 \le t_1 \le s_2 \le t_2 \le \cdots \le s_n \le t_n$, the random variables $B(t_1) - B(s_1), B(t_2) - B(s_2), \dots, B(t_n) - B(s_n)$ are independent.

• Normal distribution: For any $s \le t$, $B(t) - B(s)$ is normally distributed with mean 0 and variance $t - s$.

3.2 Compensators

If $N(t)$ is a Poisson counting process with intensity $\lambda$, it obviously isn't a martingale, since it only goes up. But if we define $M(t) := N(t) - \lambda t$, then
$$\begin{aligned}
E\big[M(t) \,\big|\, \mathcal{F}_s\big] &= E\big[M(s) + \big(M(t) - M(s)\big) \,\big|\, \mathcal{F}_s\big]\\
&= M(s) + E\big[N(t) - N(s) - \lambda(t - s) \,\big|\, \mathcal{F}_s\big]\\
&= M(s) + E\big[N(t) - N(s) \,\big|\, \mathcal{F}_s\big] - \lambda(t - s)\\
&= M(s).
\end{aligned}$$

The last line uses the fact that $N(t) - N(s)$ is independent of $\mathcal{F}_s$, which implies that
$$E\big[N(t) - N(s) \,\big|\, \mathcal{F}_s\big] = E\big[N(t) - N(s)\big] = \lambda(t - s).$$


So we subtract the function $\lambda t$ from $N$ and get a martingale. We say $\lambda t$ is the compensator of the Poisson counting process.

A compensator for the random process $N(t)$ is a (possibly random) process $A(t)$ with the following properties:

• A(t) is non-decreasing;

• A(t) is predictable;

• $N(t) - A(t)$ is a martingale.

Of course, if $N(t)$ has a compensator $A(t)$, then taking $M(t) := N(t) - A(t)$, for any $t \ge s$
$$E\big[N(t) \,\big|\, \mathcal{F}_s\big] = E\big[M(t) + A(t) \,\big|\, \mathcal{F}_s\big] = E\big[M(t) \,\big|\, \mathcal{F}_s\big] + A(s) + E\big[A(t) - A(s) \,\big|\, \mathcal{F}_s\big] \ge M(s) + A(s),$$
since $A(t) - A(s)$ is always $\ge 0$, so
$$E\big[N(t) \,\big|\, \mathcal{F}_s\big] \ge N(s). \qquad (3.2)$$

A process $N$ satisfying (3.2) is called a submartingale. There is a result, the Doob–Meyer decomposition, telling us that every submartingale has a compensator — that is, it may be written as a sum of a martingale and a non-decreasing predictable process.

Intuitively, the compensator is the cumulative conditional rate of instantaneous average increase of the process. For a Poisson-like counting process this is exactly the same thing that we have vaguely defined as a cumulative intensity.
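A small simulation sketch of the compensated Poisson process (the rate, time points, and conditioning event are arbitrary illustrative choices):

## M(t) = N(t) - lambda*t: increments have conditional mean 0
set.seed(5)
lambda <- 2; s <- 1; t <- 4
n.rep  <- 1e4
N.s    <- rpois(n.rep, lambda * s)                  # N(s)
N.t    <- N.s + rpois(n.rep, lambda * (t - s))      # add an independent increment
M.s    <- N.s - lambda * s
M.t    <- N.t - lambda * t
mean(M.t - M.s)                 # approx. 0
mean((M.t - M.s)[N.s >= 3])     # still approx. 0, whatever we condition on in the past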

3.2.1 Inhomogeneous Poisson counting process

Let $N(t)$ be a counting process with predictable intensity $\lambda(t)$. We make the assumption that $\lambda(t)$ does not change, on average, very rapidly, in the sense that there is a constant $C$ such that for all $t \ge s \ge u \ge 0$,
\[
\mathbb{E}\bigl[\,|\lambda(t)-\lambda(s)| \bigm| \mathcal{F}_u\bigr] \le C(t-s). \tag{3.3}
\]

We have not formally defined what it means for $\lambda(t)$ to be the intensity of $N(t)$. We define it now (not completely rigorously) by the relations
\[
\begin{aligned}
\mathbb{E}\bigl[N(t+\delta)-N(t) \bigm| \mathcal{F}_t\bigr] &= \lambda(t)\delta + o(\delta), \quad \text{uniformly,} \\
\mathbb{E}\bigl[\bigl(N(t+\delta)-N(t)\bigr)\,\mathbf{1}_{\{N(t+\delta)-N(t)\ge 2\}} \bigm| \mathcal{F}_t\bigr] &= o(\delta), \quad \text{uniformly.}
\end{aligned} \tag{3.4}
\]


That is, conditioned on all the information up to time $t$, the estimated rate of new points appearing in the next instant is $\lambda(t)$; and the rate of multiple points in a tiny interval is vanishingly small. The errors are small with respect to $\delta$ in a way that is uniform over all random outcomes and all $t$.

If we define $M(t) := N(t) - \Lambda(t)$ (where $\Lambda(t) = \int_0^t \lambda(u)\,du$), then for $t \ge s$
\[
\mathbb{E}\bigl[M(t+\delta) \bigm| \mathcal{F}_s\bigr] = \mathbb{E}\bigl[\,\mathbb{E}\bigl[M(t+\delta) \bigm| \mathcal{F}_t\bigr] \bigm| \mathcal{F}_s\bigr].
\]

Thus

\[
\begin{aligned}
\frac{d}{dt}\,\mathbb{E}\bigl[M(t) \bigm| \mathcal{F}_s\bigr]
&= \lim_{\delta\downarrow 0} \delta^{-1}\,\mathbb{E}\Bigl[\,\mathbb{E}\bigl[M(t+\delta)-M(t) \bigm| \mathcal{F}_t\bigr] \Bigm| \mathcal{F}_s\Bigr] \\
&= \lim_{\delta\downarrow 0} \mathbb{E}\Bigl[\,\delta^{-1}\mathbb{E}\bigl[N(t+\delta)-N(t) \bigm| \mathcal{F}_t\bigr] - \mathbb{E}\Bigl[\delta^{-1}\!\int_t^{t+\delta}\!\lambda(u)\,du \Bigm| \mathcal{F}_t\Bigr] \Bigm| \mathcal{F}_s\Bigr] \\
&= \lim_{\delta\downarrow 0} \mathbb{E}\Bigl[\,\delta^{-1}\mathbb{E}\bigl[\delta\lambda(t) + o(\delta) \bigm| \mathcal{F}_t\bigr] - \mathbb{E}\bigl[\lambda(t) \bigm| \mathcal{F}_t\bigr] \Bigm| \mathcal{F}_s\Bigr] \\
&= 0.
\end{aligned}
\]
Thus $\mathbb{E}[M(t) \mid \mathcal{F}_s]$ does not change as $t$ increases beyond $s$, so it equals $M(s)$: the compensated process $M$ is a martingale, and $\Lambda(t)$ is the compensator of $N(t)$.
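For intuition, here is a small R simulation (my own sketch, not from the notes) of a counting process with the deterministic intensity $\lambda(t) = 2 + \sin t$, generated by thinning a homogeneous process. The mean of $N(t)$ across simulations should be close to $\Lambda(t) = 2t + 1 - \cos t$, i.e. $M(t) = N(t) - \Lambda(t)$ averages to about zero.

# Inhomogeneous Poisson process by thinning (illustrative sketch):
# keep each point of a rate-3 homogeneous process with probability lambda(t)/3
set.seed(3)
lambda <- function(t) 2 + sin(t)
Lambda <- function(t) 2 * t + 1 - cos(t)    # cumulative intensity
sim.N <- function(t) {
  cand <- cumsum(rexp(60, rate = 3))        # candidate points, rate 3 >= lambda(t)
  cand <- cand[cand <= t]
  sum(runif(length(cand)) < lambda(cand) / 3)
}
t0 <- 7
mean(replicate(4000, sim.N(t0))) - Lambda(t0)   # close to 0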

3.3 Cheater’s guide to stochastic integrals

You may be familiar with the Riemann–Stieltjes integral
\[
\int_0^t f(x)\,dG(x).
\]

Without concerning ourselves too much with the formalities, we may think of this as meaning the integral of a bounded function $f(x)$ with respect to changes in $G$. If $G$ is a differentiable function whose derivative is $g$, then $dG = (dG/dx)\,dx$, so we may write
\[
\int_0^t f(x)\,dG(x) = \int_0^t f(x)g(x)\,dx.
\]

What if $G$ has jumps, but is differentiable between the jumps? Suppose $G'(x) = g(x)$ for $x \notin \{x_1,\dots,x_n\}$, where $x_1 < \cdots < x_n$, and $G$ has a jump of size $y_i$ at $x_i$ — that is, $G(x_i) - G(x_i-) = y_i$. Then the ``change in $G$'' has a jump of size $y_i$ at $x_i$, so
\[
\int_0^t f(x)\,dG(x) = \int_0^t f(x)g(x)\,dx + \sum_{i=1}^n f(x_i)\,y_i. \tag{3.5}
\]

The random functions we will be concerned with are piecewise differentiable with jumps, so we may apply formula (3.5) to define the integral with


respect to one of them. In particular, if $N(t)$ is a counting process with events at $0 \le T_1 < \cdots < T_K \le t$ (where $K$ may now also be random), then for any bounded predictable stochastic process $X(s)$,
\[
\int_0^t X(s)\,dN(s) = \sum_{i=1}^K X(T_i). \tag{3.6}
\]

Example 3.1: Compensated counting process

Given the compensator $\Lambda(s) = \int_0^s \lambda(u)\,du$, we define the martingale $M(s) = N(s) - \Lambda(s)$. Then for any predictable process $(X(t))$,
\[
\int_0^t X(s)\,dM(s) = \sum_{T_i \le t} X(T_i) - \int_0^t X(s)\lambda(s)\,ds. \tag{3.7}
\]
⌅

Note that the changes in $M$ all have expectation 0 (because $M$ is a martingale). What we are doing is taking the change $dM(s)$ and multiplying it by $X(s)$, which is already known before time $s$, so may be thought of as a constant. The resulting change is larger or smaller than $dM(s)$, in proportion to $X(s)$, but it is still zero on average.

Fact 3.3. If $X$ is a random process such that $X(s) \in \mathcal{F}_{s-}$ for all $s$, and $M$ a martingale, then
\[
Y(t) := \int_0^t X(s)\,dM(s) \quad \text{is a martingale.} \tag{3.8}
\]
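To see Fact 3.3 in action, the following R sketch (my own illustration, not from the notes) takes $M = N - \lambda t$ for a rate-$\lambda$ Poisson process and the predictable integrand $X(s) = N(s-)$, evaluates the integral via (3.7), and checks that its mean across simulations is near zero.

# Stochastic integral int_0^t X(s) dM(s), with X(s) = N(s-) and M = N - lambda*t
# (illustrative sketch)
set.seed(4)
lambda <- 2; tmax <- 5
one.integral <- function() {
  Ti <- cumsum(rexp(40, rate = lambda)); Ti <- Ti[Ti <= tmax]
  jump.part  <- sum(seq_along(Ti) - 1)          # sum over jumps of X(Ti) = N(Ti-) = i - 1
  pieces     <- diff(c(0, Ti, tmax))            # X(s) is piecewise constant between jumps
  drift.part <- lambda * sum(pieces * (0:length(Ti)))
  jump.part - drift.part
}
mean(replicate(5000, one.integral()))           # close to 0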

3.4 Variation processes

3.4.1 Intuitive definitions

One of the nice properties of independent sums is that the variance of the sum is the sum of the variances, since the covariances are all 0. The same is true for martingale sums, as long as we define variance to mean the conditional variance based on the past. We define the predictable variation process $\langle M\rangle(t)$ to be an increasing function that accumulates all of the conditional variance up to time $t$:
\[
d\langle M\rangle(t) = \operatorname{Var}\bigl(dM(t) \bigm| \mathcal{F}_{t-}\bigr).
\]


It is random because it sums variances conditional on the developments up to time $t$. The optional variation process is the sum of the actual squared changes in the process: If $M$ has jumps of size $Y_i$ at points $T_i$, then
\[
[M](t) = \sum_{T_i \le t} Y_i^2.
\]

(This is in our special setting where the stochastic processes have only jumps and differentiable pieces.)¹

3.4.2 Formal definitions

We may divide up the interval $[0,t]$ into $n$ equal subintervals $[t_i, t_{i+1})$ with $t_i = it/n$, and treat $M(t_i)$ like a discrete-time process. (We won't be using discrete-time martingales, but they are defined in the obvious way, and discussed in section 2.1 of [ABG08].) Letting $\Delta M_i = M(t_i) - M(t_{i-1})$, we see that $\mathbb{E}[\Delta M_i \mid \mathcal{F}_{t_{i-1}}] = 0$, and we define
\[
\langle M\rangle(t) = \lim_{n\to\infty} \sum_{i=1}^n \operatorname{Var}\bigl(\Delta M_i \bigm| \mathcal{F}_{t_{i-1}}\bigr),
\]
and
\[
[M](t) = \lim_{n\to\infty} \sum_{i=1}^n (\Delta M_i)^2.
\]

Of course, we need a bit of mathematical work — which we will skip — to show that these limits always exist for the sorts of processes we are concerned with here.

3.4.3 Useful facts about variation processes

Fact 3.4. If M is a martingale with M(0) = 0 then

$M^2 - \langle M\rangle$ is a mean-zero martingale;

$M^2 - [M]$ is a mean-zero martingale.

In particular,

\[
\operatorname{Var}\bigl(M(t)\bigr) = \mathbb{E}\bigl[M(t)^2\bigr] = \mathbb{E}\bigl[\langle M\rangle(t)\bigr] = \mathbb{E}\bigl[[M](t)\bigr]. \tag{3.9}
\]

¹If $M$ is differentiable over the interval $[s,t]$, and we break it up into $K$ equal pieces $t_i = s + (t-s)\,i/K$, then $\sum (M(t_{i+1}) - M(t_i))^2 \to 0$ as $K \to \infty$, so the only contribution to the optional variation comes from the jumps. It's different when $M$ is a continuous Markov process such as a Brownian motion, but we won't consider that here.


Fact 3.5. If $X$ is a predictable stochastic process, and $M = N - \Lambda$ a counting-process martingale, then
\[
Y(t) := \int_0^t X(s)\,dM(s)
\]
has variation processes
\[
\langle Y\rangle(t) = \int_0^t X(s)^2\,d\Lambda(s) = \int_0^t X(s)^2\lambda(s)\,ds, \tag{3.10}
\]
\[
[Y](t) = \int_0^t X(s)^2\,dN(s). \tag{3.11}
\]

In other words, predictable variation is driven by the continuous part of $M$, while optional variation is driven by the random jumps.
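A numerical check of (3.9)–(3.10) (my own R sketch, not part of the notes): take $X(s) = s$ and $M$ the compensated rate-$\lambda$ Poisson martingale, so that $\langle Y\rangle(t) = \int_0^t s^2\lambda\,ds = \lambda t^3/3$ is deterministic, and the sample variance of $Y(t)$ should be close to it.

# Var(Y(t)) should match E[<Y>(t)] = lambda * t^3 / 3 for Y(t) = int_0^t s dM(s)
# (illustrative sketch)
set.seed(5)
lambda <- 2; tmax <- 3
one.Y <- function() {
  Ti <- cumsum(rexp(30, rate = lambda)); Ti <- Ti[Ti <= tmax]
  sum(Ti) - lambda * tmax^2 / 2          # sum X(Ti)  minus  int_0^t s * lambda ds
}
var(replicate(20000, one.Y()))           # close to ...
lambda * tmax^3 / 3                      # ... the predicted value 18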

3.4.4 Caveats (not examinable)

We are calling these “facts” rather than “theorems” because

(i). We are not proving them.

(ii). They’re not technically true. But they’re true enough for our purposes.

The problem is, $M$ could be a martingale but $M(t)^2$ might not in general even have a finite expectation. Similarly, the claim made in (3.8) did not impose any assumptions on the integrand process $X$. It would clearly work if $X$ were bounded, but we need to allow for integrals like $\int M(s-)\,dM(s)$. Even if $M$ is derived from a counting process with bounded intensity it won't be bounded. It will be locally bounded, though, by which we mean something like: $M$ is bounded on finite intervals with high probability. The result of the integration is a local square integrable martingale, which means that it essentially satisfies the conditions for a martingale, including having variation processes that satisfy the formulas above, except for small probabilities that it could be much larger than expected on small intervals.

For the sorts of examples we will be considering here, the statements are true; and they can be proved in significantly more generality if we expand our definitions by replacing martingales by local martingales. The proofs, together with the definitions of local martingales and related concepts, are not difficult, but they would take an extra couple of lectures. If you're interested, there is a straightforward treatment — in the context of counting processes — in chapter 2 of [FH91].


3.5 Examples

3.5.1 Independent sums

Let $\xi_1, \xi_2, \dots$ be independent normal random variables, with mean 0 and $\operatorname{Var}(\xi_i) = \sigma_i^2$, and let
\[
M(t) := \sum_{i \le t} \xi_i.
\]

Thus $M(t)$ is a right-continuous process that makes jumps precisely at positive-integer times. $\mathcal{F}_t$ includes the values of all jumps that happened up to and including time $t$, and of course any function of a combination of these. When $t$ is a positive integer, $\mathcal{F}_{t-}$ does not include the value of $\xi_t$; otherwise, $\mathcal{F}_{t-}$ is identical with $\mathcal{F}_t$.

For any s < t,

\[
\begin{aligned}
\mathbb{E}\bigl[M(t) \bigm| \mathcal{F}_s\bigr] &= \mathbb{E}\bigl[M(t)-M(s) \bigm| \mathcal{F}_s\bigr] + \mathbb{E}\bigl[M(s) \bigm| \mathcal{F}_s\bigr] \\
&= \mathbb{E}\Bigl[\,\sum_{s < i \le t} \xi_i \Bigm| \mathcal{F}_s\Bigr] + M(s) \\
&= \sum_{s < i \le t} \mathbb{E}\bigl[\xi_i \bigm| \mathcal{F}_s\bigr] + M(s) \\
&= M(s).
\end{aligned}
\]

The last equality follows from the fact that $\xi_i$ is independent of $\mathcal{F}_s$ for $i > s$, so that $\mathbb{E}[\xi_i \mid \mathcal{F}_s] = \mathbb{E}[\xi_i] = 0$.

The predictable variation is flat except at positive integers $t$. At those times it increments by
\[
d\langle M\rangle(t) = \operatorname{Var}\bigl(dM(t) \bigm| \mathcal{F}_{t-}\bigr) = \operatorname{Var}\bigl(\xi_t \bigm| \mathcal{F}_{t-}\bigr) = \sigma_t^2,
\]

since $\xi_t$ is independent of $\mathcal{F}_{t-}$ with variance $\sigma_t^2$. Thus
\[
\langle M\rangle(t) = \sum_{i \le t} \sigma_i^2.
\]

Note that the predictable variation is deterministic, because the increments are independent of the past. The optional variation process is also flat away from positive integers, but the jumps are the squares of the jumps in $M$, so
\[
[M](t) = \sum_{i \le t} \xi_i^2.
\]


3.5.2 Weighted independent sums

Imagine that a gambler is betting on the outcomes of the random variables $\xi_i$ in section 3.5.1. This means that at time $i$ she gets to choose, based on everything she has seen so far — formally, this means that the random variable $C_{i+1} \in \mathcal{F}_i$ — to bet an amount $C_{i+1}$ (bounded by some fixed $C$), which will return $C_{i+1}\xi_{i+1}$ at time $i+1$. Her fortune at time $t$ (relative to her initial fortune, which is set to 0), is
\[
M(t) := \sum_{i \le t} C_i\,\xi_i.
\]

Thus $M(t)$ is a right-continuous process that makes jumps precisely at positive-integer times. $\mathcal{F}_t$ includes the values of all jumps that happened up to and including time $t$, as well as $C_{\lfloor t\rfloor + 1}$.

For any s < t,

\[
\begin{aligned}
\mathbb{E}\bigl[M(t) \bigm| \mathcal{F}_s\bigr] &= \mathbb{E}\bigl[M(t)-M(s) \bigm| \mathcal{F}_s\bigr] + \mathbb{E}\bigl[M(s) \bigm| \mathcal{F}_s\bigr] \\
&= \mathbb{E}\Bigl[\,\sum_{s < i \le t} C_i\,\xi_i \Bigm| \mathcal{F}_s\Bigr] + M(s) \\
&= \sum_{s < i \le t} \mathbb{E}\bigl[C_i\,\xi_i \bigm| \mathcal{F}_s\bigr] + M(s) \\
&= M(s).
\end{aligned}
\]

The last equality follows from the fact that $\mathcal{F}_s \subset \mathcal{F}_{i-1}$ (because $s < i$, and no new information arrives strictly between integer times), so that
\[
\begin{aligned}
\mathbb{E}\bigl[C_i\,\xi_i \bigm| \mathcal{F}_s\bigr] &= \mathbb{E}\Bigl[\,\mathbb{E}\bigl[C_i\,\xi_i \bigm| \mathcal{F}_{i-1}\bigr] \Bigm| \mathcal{F}_s\Bigr] \\
&= \mathbb{E}\Bigl[\,C_i\,\mathbb{E}\bigl[\xi_i \bigm| \mathcal{F}_{i-1}\bigr] \Bigm| \mathcal{F}_s\Bigr] \quad \text{since } C_i \in \mathcal{F}_{i-1} \\
&= \mathbb{E}\bigl[C_i \cdot 0 \bigm| \mathcal{F}_s\bigr] \\
&= 0,
\end{aligned}
\]
since $\xi_i$ is independent of $\mathcal{F}_{i-1}$, so that $\mathbb{E}[\xi_i \mid \mathcal{F}_{i-1}] = \mathbb{E}[\xi_i] = 0$. (This is just a formal way of saying that at time $i$ the random variable $C_i$ is like a constant, so $C_i\xi_i$ is a constant times $\xi_i$, with expectation 0.)

The predictable variation is flat except at positive integers $t$. At those times it increments by
\[
d\langle M\rangle(t) = \operatorname{Var}\bigl(dM(t) \bigm| \mathcal{F}_{t-}\bigr) = \operatorname{Var}\bigl(C_t\,\xi_t \bigm| \mathcal{F}_{t-}\bigr).
\]


Since $\mathbb{E}\bigl[C_t\,\xi_t \bigm| \mathcal{F}_{t-}\bigr] = 0$,
\[
\begin{aligned}
\operatorname{Var}\bigl(C_t\,\xi_t \bigm| \mathcal{F}_{t-}\bigr) &= \mathbb{E}\bigl[(C_t\,\xi_t)^2 \bigm| \mathcal{F}_{t-}\bigr] \\
&= C_t^2\,\mathbb{E}\bigl[\xi_t^2 \bigm| \mathcal{F}_{t-}\bigr] \quad \text{since } C_t^2 \in \mathcal{F}_{t-} \\
&= C_t^2\,\sigma_t^2.
\end{aligned}
\]
So
\[
\langle M\rangle(t) = \sum_{i \le t} C_i^2\,\sigma_i^2.
\]

Note that this is random, because $C_i$ is random. But it is ``predictable'', in the sense that $\langle M\rangle(t) \in \mathcal{F}_{t-}$ — that is, it is known before time $t$.

The optional variation process is also flat away from positive integers, but the jumps are the squares of the jumps in $M$, so
\[
[M](t) = \sum_{i \le t} C_i^2\,\xi_i^2.
\]
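The point of this example is that no predictable betting strategy can generate drift. Here is a tiny R simulation (my own sketch, not from the notes) of a strategy that bets more aggressively after losses; each $C_i$ is decided before $\xi_i$ is revealed, and the mean fortune stays at zero.

# Predictable bets give no drift: C_i depends only on the past (illustrative sketch)
set.seed(6)
n <- 50; sigma <- 1
one.M <- function() {
  xi <- rnorm(n, 0, sigma)
  M <- 0
  for (i in 1:n) {
    Ci <- if (M < 0) 1 else 0.5   # chosen before xi[i] is seen, bounded by C = 1
    M  <- M + Ci * xi[i]
  }
  M
}
mean(replicate(20000, one.M()))   # close to 0 despite the "clever" strategy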

3.5.3 Compensated homogeneous Poisson process

Let $N(t)$ be a homogeneous Poisson counting process with intensity $\lambda$, and
\[
M(t) := N(t) - \lambda t.
\]

We already know that $M$ is a martingale. The predictable variation is no longer piecewise flat. Instead, regardless of the past, $dM(t)$ is $1 - \lambda\,dt$ with probability $\lambda\,dt$, and $-\lambda\,dt$ otherwise, making
\[
d\langle M\rangle(t) = \operatorname{Var}\bigl(dM(t) \bigm| \mathcal{F}_{t-}\bigr) = \lambda\,dt + O(dt^2).
\]
Thus $\langle M\rangle(t) = \lambda t$.

The optional variation process is just $[M](t) = N(t)$.

3.5.4 Compensated inhomogeneous Poisson process

Let $N(t)$ be an inhomogeneous Poisson counting process with intensity $\lambda(t)$ at time $t$, and
\[
M(t) := N(t) - \Lambda(t).
\]

Now, conditioned on the past, $dM(t)$ is $1 - \lambda(t)\,dt$ with probability $\lambda(t)\,dt$, and $-\lambda(t)\,dt$ otherwise, making
\[
d\langle M\rangle(t) = \operatorname{Var}\bigl(dM(t) \bigm| \mathcal{F}_{t-}\bigr) = \lambda(t)\,dt + O(dt^2).
\]
Thus $\langle M\rangle(t) = \Lambda(t)$.

The optional variation process is again just $[M](t) = N(t)$.


3.6 Normal approximation for martingales

Like sums of i.i.d. random variables, martingales are approximately normal. The simplest Central Limit Theorem, taught in Part A Probability, says that if $\xi_1, \xi_2, \dots$ are i.i.d. random variables with $\mathbb{E}[\xi_i] = \mu$ and $\operatorname{Var}(\xi_i) = \sigma^2$, then defining $M_n := \sum_{i=1}^n (\xi_i - \mu)$, we have $M_n/\sigma\sqrt{n} \to_d N(0,1)$; that is, $M_n$ standardised to have variance 1 converges to a standard normal distribution.

The same proof extends easily to any sequence of independent random variables with $\mathbb{E}[\xi_i] = \mu_i$ and $\operatorname{Var}(\xi_i) = \sigma_i^2$. Then, defining $M_n := \sum_{i=1}^n (\xi_i - \mu_i)$, we have
\[
\Bigl(\sum_{i=1}^n \sigma_i^2\Bigr)^{-1/2} M_n \xrightarrow[n\to\infty]{d} N(0,1),
\]
as long as $\sum_{i=1}^\infty \sigma_i^2 = \infty$. (We must also impose some sort of condition on the third moments. It would suffice, for instance, to know that there is a constant $C$ such that $\mathbb{E}[|\xi_i|^3] \le C\sigma_i^2$.) Check that you understand that the i.i.d. Central Limit Theorem is just a special case of this one.

These generalise to martingales, with one important complication: The variance may be random. The quantity corresponding to $\sigma_i^2$ is the increment to the predictable variation process. What we need is for the predictable variation to be approximately a fixed deterministic function, and for individual jumps to be small.

Suppose that we have a mean-zero martingale $M^{(n)}(t)$ with a parameter $n$ — for instance, the number of subjects — such that the predictable variation converges to the function $V(t)$ as $n\to\infty$:
\[
\lim_{n\to\infty} \langle M^{(n)}\rangle(t) = V(t) \quad \text{for each } t. \tag{3.12}
\]

Suppose, too, that individual jumps are small — for instance, that letting $T_1, T_2, \dots$ be the times of jumps,
\[
\lim_{n\to\infty} \mathbb{E}\Bigl[\,\sum_{T_i \le t} \bigl|M^{(n)}(T_i) - M^{(n)}(T_i-)\bigr|^3\Bigr] = 0 \quad \text{for each } t. \tag{3.13}
\]

Then $M^{(n)}(t)/\sqrt{V(t)} \xrightarrow[n\to\infty]{d} N(0,1)$ for each $t$. In fact, we have a functional CLT, telling us that the entire random function $M^{(n)}$ converges to the time-changed Brownian motion $W(V(\cdot))$, where $W$ is Brownian motion. But we won't need this formal result, though we will occasionally refer to it for intuition.

A version that will be adequate for our purposes is the following: Suppose $M^{(n)}(t) = N^{(n)}(t) - \Lambda^{(n)}(t)$ is a counting-process martingale, and $H^{(n)}(t)$ is a predictable process. Then $\widetilde M^{(n)}(t) := \int_0^t H^{(n)}(s)\,dM^{(n)}(s)$ is another martingale, and

Theorem 3.6. Suppose there is a function $v$ (and $V(t) = \int_0^t v(s)\,ds$) such that
\[
H^{(n)}(s)^2\,\lambda^{(n)}(s) \xrightarrow[n\to\infty]{P} v(s), \quad \text{and} \tag{3.14}
\]
\[
H^{(n)}(s)^2 \xrightarrow[n\to\infty]{P} 0. \tag{3.15}
\]
Then $\widetilde M^{(n)}$ converges in distribution to the stochastic process $W(V(t))$, where $W$ is Brownian motion. In particular, $\widetilde M^{(n)}(t)$ converges to a normal distribution with mean 0 and variance $V(t)$.

More details may be found in section 2.3 of [ABG08].
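To illustrate Theorem 3.6 in the simplest possible case (my own R sketch, not from the notes): take $N^{(n)}$ a rate-$n$ Poisson process and $H^{(n)}(s) \equiv n^{-1/2}$, so that conditions (3.14)–(3.15) hold with $v(s) = 1$ and $V(t) = t$. The standardised value $\widetilde M^{(n)}(t) = n^{-1/2}(N^{(n)}(t) - nt)$ should then be approximately $N(0, t)$.

# Martingale CLT in the simplest case: rate-n Poisson, H = n^(-1/2) (illustrative sketch)
set.seed(7)
n <- 400; t0 <- 2
Mtilde <- (rpois(10000, lambda = n * t0) - n * t0) / sqrt(n)
c(mean(Mtilde), var(Mtilde))                 # approximately 0 and V(t0) = 2
quantile(Mtilde, c(0.05, 0.5, 0.95))         # compare with ...
qnorm(c(0.05, 0.5, 0.95), sd = sqrt(t0))     # ... the N(0, 2) quantiles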


Bibliography

[ABG08] Odd O. Aalen, Ørnulf Borgan, and Håkon K. Gjessing. Survival and Event History Analysis: A Process Point of View. Springer Verlag, 2008.

[AG82] Per Kragh Andersen and Richard D. Gill. Cox's regression model for counting processes: a large sample study. Annals of Statistics, pages 1100–1120, 1982.

[AS02] Thomas Augustin and Regina Schwarz. Cox's proportional hazards model under covariate measurement error. In Total Least Squares and Errors-in-Variables Modeling, pages 179–188. Springer, 2002.

[Cox06] David R. Cox. Principles of Statistical Inference. Cambridge University Press, 2006.

[CRSC10] Raymond J. Carroll, David Ruppert, Leonard A. Stefanski, and Ciprian M. Crainiceanu. Measurement Error in Nonlinear Models: A Modern Perspective. CRC Press, 2010.

[Dur10] Rick Durrett. Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, fourth edition, 2010.

[EEH+77] Stephen H. Embury, Laurence Elias, Philip H. Heller, Charles E. Hood, Peter L. Greenberg, and Stanley L. Schrier. Remission maintenance therapy in acute myelogenous leukemia. The Western Journal of Medicine, 126:267–72, April 1977.

[FH91] Thomas R. Fleming and David P. Harrington. Counting Processes and Survival Analysis. Wiley, 1991.


[KM03] John P. Klein and Melvin L. Moeschberger. Survival Analysis: Techniques for Censored and Truncated Data. Springer Verlag, 2nd edition, 2003.

[MGM01] Rupert G. Miller, Gail Gong, and Alvaro Munoz. Survival Analysis. Wiley, 2001.

[Pit93] Jim Pitman. Probability. Springer Verlag, 1993.

[Pre82] R. L. Prentice. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika, 69(2):331–42, 1982.

[TK98] Howard M. Taylor and Samuel Karlin. An Introduction to Stochastic Modeling. Academic Press, 3rd edition, 1998.

[Tur74] Bruce W. Turnbull. Nonparametric estimation of a survivorship function with doubly censored data. Journal of the American Statistical Association, 69(345):169–173, 1974.


Appendix A

Using R for survival analysis

The basic package for survival analysis is survival. The data sets from Klein and Moeschberger's book [KM03] are available in the KMsurv package. Some useful guidelines to use of the survival package are available in lecture notes by David Diez. However, there are some errors in the syntax of the survfit command, possibly due to changes in the package. It gives as an example

> attach(tongue)
> my.surv <- Surv(time[type==1], delta[type==1])
> survfit(my.surv)

This produces an error message. The current version of survfit can't be applied to a single survival object. It needs a formula. If you instead enter

> my.surv <- survfit(Surv(time, delta) ~ type, data = tongue)
> my.surv

you get something like the desired output

Call: survfit(formula = Surv(time, delta) ~ type, data = tongue)

        records n.max n.start events median 0.95LCL 0.95UCL
type=1       52    52      52     31     93      67      NA
type=2       28    28      28     22     42      23     112

If you really wanted just the first row, you could have first subsetted the data frame to include only the portion with type=1. And if you really just had a single population, you could have just added type as a dummy variable always equal to 1. Now plot(my.surv) will produce the plot in Figure A.1.


Figure A.1: Plot of Kaplan–Meier survival curves for patients with two different types of tongue cancer.


Appendix B

Notes on the Poisson Process

(Mainly copied from the BS3b Lecture Notes)

B.1 Point processes

A joint n-dimensional distribution may be thought of as a model for a random point in $\mathbb{R}^n$. A point process is a model for choosing a set of points. Some examples of phenomena that may be modelled this way are:

• The pattern of faults on a silicon chip;

• The accumulation of mutations within an evolutionary tree;

• Appearance of a certain pattern of pixels in a photograph;

• The times when a radioactive sample emits a particle;

• The arrival times of customers at a bank;

• Times when customers register financial transactions.

Note that a point process — thought of as a random subset of $\mathbb{R}^n$ (or some more general space) — may also be identified with a counting process $N$ mapping regions of $\mathbb{R}^n$ to natural numbers. For any region $A$, $N(A)$ is the (random) number of points in $A$.

A Poisson process is the simplest possible point process. It has the following properties (which we will formalise later on):

• Points in disjoint regions are selected independently;


• The number of points in a region is proportional purely to the size of the region;

• There are finitely many points in any bounded region, and points are all distinct.

We will be focusing in this course on one-dimensional point processes, so random collections of points in $\mathbb{R}$. Note that the counting process is uniquely determined by its restriction to half-infinite intervals: We write
\[
N_t := N\bigl((-\infty, t]\bigr) = \#\{\text{points} \le t\}.
\]

If we imagine starting from 0 and progressing to the right, the succession of points may be identified with a succession of ``interarrival times''.

B.2 The Poisson process on R+

The Poisson process on $\mathbb{R}_+$ is the simplest nontrivial point process. We have three equivalent ways of representing a point process on $\mathbb{R}_+$:

Random set of points $\longleftrightarrow$ Interarrival times $\longleftrightarrow$ Counting process $N_t$

B.2.1 Local definition of the Poisson process

A Poisson arrival process with parameter $\lambda$ is an integer-valued stochastic process $N(t)$ that satisfies the following properties:

(PPI.1) N(0) = 0.

(PPI.2) Independent increments: If $0 \le s_1 < t_1 \le s_2 < t_2 \le \cdots \le s_k < t_k$, then the random variables $\bigl(N(t_i) - N(s_i)\bigr)_{i=1}^k$ are independent.

(PPI.3) Constant rate: $\mathbb{P}\{N(t+h) - N(t) = 1\} = \lambda h + o(h)$. That is,
\[
\lim_{h\downarrow 0} h^{-1}\,\mathbb{P}\{N(t+h) - N(t) = 1\} = \lambda.
\]

(PPI.4) No clustering: $\mathbb{P}\{N(t+h) - N(t) \ge 2\} = o(h)$.

The corresponding point process has a discrete set of points at those $t$ where $N(t)$ jumps. That is, $\{t : N(t) = N(t-) + 1\}$.


B.2.2 Global definition of the Poisson process

A Poisson arrival process with parameter $\lambda$ is an integer-valued stochastic process $N(t)$ that satisfies the following properties:

(PPII.1) N(0) = 0.

(PPII.2) Independent increments: If $0 \le s_1 < t_1 \le s_2 < t_2 \le \cdots \le s_k < t_k$, then the random variables $\bigl(N(t_i) - N(s_i)\bigr)_{i=1}^k$ are independent.

(PPII.3) Poisson distribution:
\[
\mathbb{P}\{N(t+s) - N(s) = n\} = e^{-\lambda t}\,\frac{(\lambda t)^n}{n!}.
\]

The corresponding point process has a discrete set of points at those $t$ where $N(t)$ jumps. That is, $\{t : N(t) = N(t-) + 1\}$.

B.2.3 Defining the Interarrival process

A Poisson process with parameter $\lambda$ may be defined by letting $\tau_1, \tau_2, \dots$ be i.i.d. random variables with exponential distribution with parameter $\lambda$. Then the point process is made up of the cumulative sums $T_k := \sum_{i=1}^k \tau_i$; and the counting process is
\[
N(t) = \#\{k : 0 \le T_k \le t\}.
\]
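The interarrival definition is also the easiest way to simulate the process. The following R sketch (my own illustration, not from the notes) builds the counting process from i.i.d. Exp($\lambda$) interarrival times and checks that $N(t)$ has approximately the Poisson($\lambda t$) distribution promised by the global definition.

# Poisson process from i.i.d. exponential interarrival times (illustrative sketch)
set.seed(8)
lambda <- 0.7; t0 <- 10
N.t <- replicate(10000, {
  Tk <- cumsum(rexp(30, rate = lambda))   # arrival times T_k = tau_1 + ... + tau_k
  sum(Tk <= t0)                           # the counting process N(t0)
})
c(mean(N.t), var(N.t))                    # both close to lambda * t0 = 7
c(mean(N.t == 7), dpois(7, lambda * t0))  # empirical vs. theoretical P{N(t0) = 7}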


Figure B.1: Representations of a Poisson process. $T_1, T_2, \dots$ are the locations of the points (the arrival times). $\tau_1, \tau_2, \dots$ are the i.i.d. interarrival times. The green line represents the counting process, which increases by 1 at each arrival time.

B.2.4 Equivalence of the definitions

Proposition 1. The local, global, and interarrival definitions define the same stochastic process.

Proof. We show that the local definition is equivalent to the other definitions. We start with the local definition and show that it implies the global definition. The first two conditions are the same. Consider an interval $[s, t]$, and suppose the process satisfies the local definition. Choose a large integer $K$, and define
\[
\xi_i = N\Bigl(s + t\,\frac{i}{K}\Bigr) - N\Bigl(s + t\,\frac{i-1}{K}\Bigr).
\]
Then $N(t+s) - N(s) = \sum_{i=1}^K \xi_i$, and $\xi_i$ is close to being Bernoulli with parameter $\lambda t/K$, so $N(t+s) - N(s)$ should be close to the $\mathrm{Binom}(K, \lambda t/K)$ distribution, which we know converges to $\mathrm{Poisson}(\lambda t)$. Formally, we can


assume that the $\xi_i$ are all 0 or 1, since
\[
\mathbb{P}\bigl\{\xi_i \ge 2 \text{ for some } 1 \le i \le K\bigr\} \le K \cdot o(t/K) \xrightarrow[K\to\infty]{} 0.
\]
The new $\xi_i$, which are no more than 1, have mgf
\[
M_\xi(\theta) = 1 + \frac{\lambda t}{K}\,(1 + o(1))\,(e^\theta - 1).
\]

Thus the mgf of $N(t+s) - N(s)$ is
\[
M_\xi(\theta)^K = \Bigl(1 + \frac{\lambda t}{K}\,(1 + o(1))\,(e^\theta - 1)\Bigr)^K \xrightarrow[K\to\infty]{} e^{\lambda t(e^\theta - 1)},
\]
which is the mgf of a $\mathrm{Poi}(\lambda t)$ random variable.

Now assume the global definition. Since there are a finite number of points in any finite interval, we may list them in order, giving us a sequence of random variables $T_1, T_2, \dots$, and the interarrival times $\tau_1, \tau_2, \dots$. We need to show that these are independent with $\mathrm{Exp}(\lambda)$ distribution. It will suffice to show that for any positive numbers $t_1, t_2, \dots, t_k$,

\[
\mathbb{P}\bigl(\tau_k > t_k \bigm| \tau_1 = t_1, \dots, \tau_{k-1} = t_{k-1}\bigr) = e^{-\lambda t_k}.
\]

(This means that, independent of $\tau_i$ for $i \le k-1$, $\tau_k$ has $\mathrm{Exp}(\lambda)$ distribution.) Let $t = t_1 + \cdots + t_{k-1}$. By property (PPII.2), the numbers and locations of the points on $[0, t]$ and on $(t, t+t_k]$ are independent.¹ The event $\{\tau_k > t_k, \tau_1 = t_1, \dots, \tau_{k-1} = t_{k-1}\}$ is identical to the event $\{N(t_k + t) - N(t) = 0, \tau_1 = t_1, \dots, \tau_{k-1} = t_{k-1}\}$, so by independence the probability is simply the probability of a $\mathrm{Poi}(\lambda t_k)$ random variable being 0, which is
\[
\mathbb{P}\bigl(N(t_k + t) - N(t) = 0\bigr) = e^{-\lambda t_k}.
\]

Finally, suppose the process is defined by the interarrival definition. It's trivial that $N(0) = 0$. The independent increment condition is satisfied because of the memoryless property of the exponential distribution. (Formally, we can show that, conditioned on any placement of all the points on the interval $[0, t]$, the next point still has distribution $t + \tau$, where $\tau$ is $\mathrm{Exp}(\lambda)$. Then we proceed by induction on the number of intervals.) Finally, $\mathbb{P}\{N(t+h) - N(t) \ge 1\} = 1 - e^{-\lambda h} = \lambda h + o(h)$, while $\mathbb{P}\{N(t+h) - N(t) \ge 2\} \le \mathbb{P}\{\tau < h\}^2$, where $\tau$ is an $\mathrm{Exp}(\lambda)$ random variable, so this is on the order of $\lambda^2 h^2$.

1If we’re going to be completely rigorous, we would need to take an infinitesimal intervalaround each t

i

, and show that the events of ⌧i

being in all of these intervals are independent.


B.2.5 The Poisson process as Markov process

We note that the Poisson process in one dimension is also a Markov process. It is the simplest version of a Markov process in continuous time. We do not formally define the Markov property in continuous time, but intuitively it must be that the past contains no information about the future of the process that is not in the current position. For a counting process, the ``future'' is merely the time when it will make the next jump (and the next one after that, and so on), while the past is simply how long since the last jump, and the one before that, and so on. So the time to the next jump must be independent of the time since the last jump, which can only happen if the distribution of the time is the ``memoryless'' exponential distribution.

Thus, if a counting process is to be Markov, it must be something like a Poisson process. The only exception is that it would be possible to change the arrival rate, depending on the cumulative count of the arrivals.

More generally, a continuous-time Markov process on a discrete state space is defined by linking a Markov chain of transitions among the states with independent exponentially distributed waiting times between transitions, with the rate parameter of the waiting times determined by the state currently occupied. But this goes beyond the scope of this course.

B.3 Examples and extensions

Part of learning about any probability distribution or process is to know what the standard situations are where that distribution is considered to be the default model. The one-dimensional Poisson process is the default model for a process of identical events happening at a constant rate, such that none of the events influences the timing of the others. Standard examples:

• Arrival times of customers in a shop.

• Calls coming into a telephone exchange (the traditional version) or service requests coming into an internet server.

• Particle emissions from a radioactive material.

• Times when surgical accidents occur in a hospital.

Some of these examples we would expect to be not exactly like a Poisson process, particularly as regards homogeneity: Customers might be more likely to come in the morning than in the afternoon, or accidents might be more likely to occur in the hospital late at night than in mid-afternoon.


Generalising to Poisson-like processes where the arrival rates may change with time, for instance, is fairly straightforward. As with many other modelling problems, we might best think of the Poisson process as a kind of ``null model'': Probably too simplistic to be really accurate, but providing a baseline for starting the modelling process, from which we can consider whether more detailed realism is worth the effort.

B.4 Some basic calculations

Example 2.1: Queueing at the bank

Customers arrive at a bank throughout the day according to a Poisson process, at a rate of 2 per minute. Calculate:

(i) The probability that the first customer to arrive after 12 noon arrives after 12:02;

(ii) The probability that exactly 4 customers arrive between 12:03 and 12:06;

(iii) The probability that there are at least 3 customers to arrive between 12:00 and 12:01.

(iv) The expected time between the 4th customer arrival after noon and the 7th.

(v) The distribution of the time between the 4th customer arrival after noon and the 7th.

Solution:

(i) The number of arrivals in a 2-minute period has $\mathrm{Poi}(4)$ distribution. So the probability that this number is 0 is $e^{-4} = 0.018$. Alternatively, we can think in terms of waiting times. The waiting time has $\mathrm{Exp}(2)$ distribution. Regardless of how long since the last arrival at noon, the remaining waiting time is still $\mathrm{Exp}(2)$. So the probability that this is at least 2 minutes is $e^{-4}$.

(ii) The number $N_3$ arriving in 3 minutes has $\mathrm{Poi}(6)$ distribution. The probability this is exactly 4 is $e^{-6}6^4/4! = 0.134$.


(iii) The number is $\mathrm{Poi}(2)$. Then
\[
\mathbb{P}\{N_1 \ge 3\} = 1 - \mathbb{P}\{N_1 \le 2\} = 1 - e^{-2}\Bigl(1 + \frac{2}{1!} + \frac{2^2}{2!}\Bigr) = 0.323.
\]

(iv) The expected time between arrivals is 1/2 minute. Thus the expected sum of three interarrival times is 1.5 minutes.

(v) This is the sum of three independent $\mathrm{Exp}(2)$ random variables, so has a Gamma distribution with rate 2 and shape 3, with density $4t^2 e^{-2t}$.
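These numbers are easy to confirm in R (a quick check of my own, not part of the notes), using the built-in Poisson and Gamma functions.

# Quick numerical checks of Example 2.1 (illustrative)
dpois(0, 4)                        # (i)   P{no arrivals in 2 minutes}        = 0.018
dpois(4, 6)                        # (ii)  P{exactly 4 arrivals in 3 minutes} = 0.134
1 - ppois(2, 2)                    # (iii) P{at least 3 arrivals in 1 minute} = 0.323
3 * (1 / 2)                        # (iv)  expected sum of 3 interarrival times = 1.5
pgamma(1.5, shape = 3, rate = 2)   # (v)   e.g. P{that time is at most 1.5 minutes}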

Example 2.2: The waiting time paradox

Central stations of the Moscow subway have (or had, when I visited a dozen or so years ago) an electronic sign that shows the time since the last train arrived on the line. That is, it is a timer that resets itself to 0 every time a train arrives, and then counts the seconds up to the next arrival. Suppose trains arrive on average once every 2 minutes, and arrivals are distributed as a Poisson process. A woman comes every day, boards the next train, and writes down the number on the timer as the train arrives. What is the long-term average of these numbers?

Solution: You might think that the average should be 2 minutes. After all, she is observing an interarrival time between trains, and the expected interarrival time is 2 minutes. It is true that if she stood there all day and wrote down the numbers on the timer when the trains arrive, the average should converge to 2 minutes. But she isn't averaging all the interarrival times, only the ones that span the moments when she comes to the station. Isn't that the same thing?

No! The intervals that span her arrival times are size-biased. Imagine the train arrival times being marked on the time-line. Now the woman picks a time to come into the station at random, independent of the train arrivals. (We are implicitly assuming that she uses no information about the actual arrival times. This would be true if she comes at the same time every day, or at a random time independent of the trains.) This is like dropping


a point at random onto the real line. There will be some wide intervals and some narrow intervals, and the point will naturally be more likely to end up in one of the wider intervals. In fact, if the interarrival times have density $f(t)$, the size-biased interarrival times will have density $t f(t) / \int_0^\infty s f(s)\,ds$. In the Poisson process, the interarrival times have exponential distribution, so the density of the observed interarrival times is
\[
\frac{t \cdot \lambda e^{-\lambda t}}{\int_0^\infty s\,\lambda e^{-\lambda s}\,ds} = \lambda^2 t e^{-\lambda t}.
\]

An easier way to see this is to think of the time that the woman waits for the train, and the time since the previous train at the moment when she enters the station. By the memoryless property of the exponential distribution, the waiting time until the next train, at the moment when she enters, is still precisely exponential with mean 2 minutes. By the symmetry of the Poisson process, it's clear that if we go backwards looking for the previous train, the time will also be exponential with mean 2 minutes, and the two times will be independent. Thus, the interarrival times observed by the woman will actually be the sum of two independent exponential random variables, each with mean 2 minutes, so will have a gamma distribution with shape parameter 2 and rate parameter 1/2 — with mean 4 minutes. ⌅
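A simulation makes the size-biasing vivid (my own R sketch, not from the notes): trains arrive as a Poisson process with rate 1/2 per minute, the woman arrives at a fixed time, and we record the length of the interarrival interval that spans her arrival.

# Waiting-time paradox: the spanning interval averages about 4 minutes, not 2
# (illustrative sketch)
set.seed(10)
rate <- 1 / 2; arrival <- 60
spanning.interval <- function() {
  trains <- cumsum(rexp(200, rate = rate))   # train times, extending well past minute 60
  after  <- min(trains[trains > arrival])
  before <- max(c(0, trains[trains <= arrival]))
  after - before
}
mean(replicate(20000, spanning.interval()))  # close to 4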

Example 2.3: Genetic recombination model

A simple model of genetic recombination is illustrated in Figure B.2. Each individual has two copies of each chromosome (maternal and paternal — one from the mother and one from the father). Genes are lined up on the chromosomes, so for the genes on a given chromosome, your children should inherit either all the genes you got from your mother, or all the genes you got from your father. Not exactly, though, because of recombination.

During meiosis — the process that creates sperm and ova — the chromosomes are broken at random points where they ``cross over'', making new chromosomes out of pieces of the maternal and paternal chromosomes. In early genetic research biologists worked to situate genes on chromosomes by measuring how likely the genes were to stay together, generation after generation.


Figure B.2: Illustration of the genetic recombination model. At top we see the maternal and paternal chromosomes lined up, with the locations of crossover events — determined by a Poisson process — marked by x's. The middle sketch shows them crossing over. The bottom sketch shows the new chromosomes that result. Genes are inherited together if they are on positions where the same colour goes on the same new chromosome. Thus the positions marked by x and y are inherited together, while x and z are not.


Genes that were on different chromosomes should be passed on independently (Mendel's law). Genes that were close together on a chromosome should almost always be passed on as a unit. And genes that were farther apart should be more likely than chance to be inherited together, but not certain.

In our model, the chromosomes are the unit interval, and the crossover points are a Poisson process with intensity $\lambda$. Consider two points $x < y$ on the interval, representing the location of two genes. We might first ask for the probability that there is no crossover between $x$ and $y$. Since $N_y - N_x$ has Poisson distribution with parameter $\lambda(y-x)$,
\[
\mathbb{P}\{\text{no crossover}\} = \mathbb{P}\{N_y - N_x = 0\} = e^{-\lambda(y-x)}.
\]

But this isn't really what we want to compute. Looking at the inheritance of $x$ and $y$, we can't tell if there was no recombination between those points or 2 or 4 or any even number. So we compute
\[
\begin{aligned}
\mathbb{P}\{\text{even number of crossovers}\} &= \sum_{k=0}^\infty \mathbb{P}\{N_y - N_x = 2k\} \\
&= \sum_{k=0}^\infty e^{-\lambda(y-x)}\,\frac{(\lambda(y-x))^{2k}}{(2k)!} \\
&= e^{-\lambda(y-x)} \cdot \frac12\Bigl(e^{\lambda(y-x)} + e^{-\lambda(y-x)}\Bigr) \\
&= \frac12\Bigl(1 + e^{-2\lambda(y-x)}\Bigr).
\end{aligned}
\]

Thus, if we observe that genes at $x$ and $y$ are inherited together with probability $p > \frac12$, we can estimate the distance between them as
\[
-\frac{1}{2\lambda}\log(2p-1).
\]
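A quick simulation check of the even-crossover formula (my own R sketch; the values of $\lambda$, $x$, $y$ are arbitrary choices):

# P{even number of crossovers between x and y} vs. the formula (illustrative)
set.seed(11)
lambda <- 3; x <- 0.2; y <- 0.7
crossings <- rpois(100000, lambda * (y - x))     # N_y - N_x ~ Poi(lambda * (y - x))
mean(crossings %% 2 == 0)                        # simulated probability of an even count
0.5 * (1 + exp(-2 * lambda * (y - x)))           # the formula (1 + e^{-2 lambda (y-x)}) / 2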

B.5 Thinning and merging

Consider the following problems: A hospital intensive care unit admits 4 patients per day, according to a Poisson process. One patient in twenty, on


average, develops a dangerous infection. What is the probability that there will be more than 2 dangerous infections in the course of a week?

Or: The casualty department takes in victims of accidents at the rate of 4 per hour through the night, and heart attack/stroke victims at the rate of 2 per hour, each of them according to a Poisson process. What is the distribution of the total number of patients that arrive during an 8-hour shift? What can we say about the distribution of patient arrivals, ignoring their cause?

Theorem B.1. Suppose we have a Poisson process with parameter $\lambda$. Denote the arrival times $T_1 < T_2 < \cdots$. We thin the process by the following procedure: We are given a probability distribution on $\{1, \dots, K\}$ (so, we have numbers $p_i = P(\{i\})$ for $i = 1, \dots, K$). Each arrival time is assigned to category $i$ with probability $p_i$. The assignments are independent. Let $T_1^{(i)} < T_2^{(i)} < \cdots$ be the arrival times in category $i$. Then these are independent Poisson processes, with rate parameters $\lambda p_i$ for process #i.

Conversely, suppose we have independent Poisson processes $T_1^{(i)} < T_2^{(i)} < \cdots$ with rate parameters $\lambda_i$. We form the merged process $T_1 < T_2 < \cdots$ by taking the union of all the times, ignoring which process they come from. Then the merged process is also a Poisson process, with parameter $\sum \lambda_i$.

Proof. Thinning: We use the local definition of the Poisson process. Start from the process $T_1 < T_2 < \dots$. Let $T_1^{(i)} < T_2^{(i)} < \cdots$ be the $i$-th thinned process. Clearly $N_i(0)$ is still 0. If we look at the number of events occurring on disjoint intervals, they are being thinned from independent random variables. Since a function applied to independent random variables produces independent random variables, we still have independent increments. We have
\[
\begin{aligned}
\mathbb{P}\bigl(N_i(t+h) - N_i(t) = 1\bigr)
&= \mathbb{P}\bigl(N(t+h) - N(t) = 1\bigr) \cdot \mathbb{P}\bigl(\text{assign category } i\bigr) \\
&\quad + \mathbb{P}\bigl(N(t+h) - N(t) \ge 2\bigr) \cdot \mathbb{P}\bigl(\text{assign category } i \text{ to exactly one}\bigr) \\
&= (\lambda h + o(h))\,p_i + o(h) \\
&= p_i\,\lambda h + o(h).
\end{aligned}
\]
And by the same approach, we see that $\mathbb{P}\bigl(N_i(t+h) - N_i(t) \ge 2\bigr) = o(h)$.

Independence is slightly less obvious. In general, if you take a fixed number of points and allocate them to categories, the numbers in the different categories will not be independent. The key is that there is not a fixed number of points; moving from left to right, there is always the same chance $\lambda\,dt$ of

number of points and allocate them to categories, the numbers in the di↵erentcategories will not be independent. The key is that there is not a fixed numberof points; moving from left to right, there is always the same chance �dt of

Page 56: Modern Survival Analysis - Steinsaltz notes/Survival... · 2015-05-06 · survival. Reading: The primary source for material in this course will be O. O. Aalen, O. Borgan, H. K. Gjessing,

Poisson process: Examples and extensions XV

getting an event at the next moment, and these may be allocated to anyof the categories, independent of the points already allocated. A rigorousproof is easiest with the global definition. Consider N1(t), N2(t), . . . , NK(t)for fixed t. These may be generated by the following process: Let N(t) bePoi(�t), and let (N1(t), N2(t), . . . , NK(t)) be multinomial with parameters(N(t); (pi)). That is, supposing N(t) = n, allocate points to bins 1, . . . ,Kaccording to the probabilities p1, . . . , pK . Then

P�

N1(t) = n1, . . . , NK(t) = nK

= P�

N(t) = n

P�

N1(t) = n1, . . . , NK(t) = nK

�N(t) = n

= e��t (�t)n

n!· n!

n1! · · ·nK !pn11 · · · pnK

K

=KY

i=1

e��i

t (�t)ni

ni!

=KY

i=1

P�

Ni(t) = ni

Since counts involving distinct intervals are clearly independent, this com-pletes the proof.

Merging: This is left as an exercise.

Thus, in the questions originally posed, the arrivals of patients who develop dangerous infections (assuming they are independent) is a Poisson process with rate $4/20 = 0.2$. The number of such patients in the course of a week is then $\mathrm{Poi}(1.4)$, so the probability that this is $> 2$ is
\[
1 - e^{-1.4}\Bigl(1 + \frac{1.4}{1} + \frac{1.4^2}{2}\Bigr) = 0.167.
\]

The casualty department takes in two independent Poisson streams of patients with total rate 6, so it is a Poisson process with parameter 6. The number of patients in 8 hours has $\mathrm{Poi}(48)$ distribution.
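Both answers can be checked with one line of R each (my own quick check, not part of the notes):

# Numerical checks for the thinning and merging examples (illustrative)
1 - ppois(2, 0.2 * 7)    # P{more than 2 dangerous infections in a week} = 0.167
dpois(48, 6 * 8)         # e.g. P{exactly 48 patients in an 8-hour shift}, N ~ Poi(48)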

B.6 Poisson process and the uniform distribution

The presentation here is based on section V.4 of [TK98].

Example 2.4: Discounted income


Companies evaluate their future income stream in terms of present value: Earning £1 next year is not worth as much as £1 today; put simply, £1 income $t$ years in the future is worth £$e^{-\theta t}$ today, where $\theta$ is the interest rate.

A company makes deals at random times, according to a Poisson process with parameter $\lambda$. For simplicity, let us say that each deal is worth £1. What is the expectation and variance of the total present value of all its future deals? ⌅

The problem here is that, while we know the distribution of the number of deals over a span $[0, t]$, the quantity of interest depends on the precise times.

Theorem B.2. Let $T_1 < T_2 < \cdots$ be the arrival process of a Poisson process. For any $s < t$, conditioned on the event $\{N(t) - N(s) = n\}$, the subset of points on the interval $(s, t]$ is jointly distributed as $n$ independent points uniform on $(s, t]$.

Proof. Intuitively this is clear: The process is uniform, so there can't be a higher probability density at one point than at another. And finding a point in $(t, t + \delta t]$ doesn't affect the locations of any other points.

We can prove this formally by calculating that for any $s < u < t$, conditioned on $N(t) - N(s) = n$, the number of points in $(s, u]$ (that is, $N(u) - N(s)$) has binomial distribution with parameters $n$ and $(u-s)/(t-s)$. That is, the number of points in the subinterval has exactly the same distribution as you would find by allocating the $n$ points independently and uniformly. Then we need to argue that the distribution of the number of points in every subinterval determines the joint distribution. The details are left as an exercise.

Solution to the Discounted income problem: Consider the present value of all deals in the interval $[0, t]$. Conditioned on there being $n$ deals, these are independent and uniformly distributed on $[0, t]$. The expected present value of a single deal is then

\[
\int_0^t t^{-1} e^{-\theta s}\,ds = (\theta t)^{-1}\bigl(1 - e^{-\theta t}\bigr),
\]

and the variance of a single deal’s value is

\[
\begin{aligned}
\sigma^2(t) &:= (2\theta t)^{-1}\bigl(1 - e^{-2\theta t}\bigr) - (\theta t)^{-2}\bigl(1 - e^{-\theta t}\bigr)^2 \\
&= \frac{1}{2\theta^2 t}\Bigl((\theta - 2t^{-1}) + 4t^{-1}e^{-\theta t} - (\theta + 2t^{-1})e^{-2\theta t}\Bigr).
\end{aligned}
\]


Thus, the present value up to time $t$, call it $V_t$, has conditional expectation and variance
\[
\begin{aligned}
\mathbb{E}\bigl[V_t \bigm| N(t)\bigr] &= N(t)\,(\theta t)^{-1}\bigl(1 - e^{-\theta t}\bigr), \\
\operatorname{Var}\bigl(V_t \bigm| N(t)\bigr) &= N(t)\,\sigma^2(t).
\end{aligned}
\]

(The formula for the conditional variance depends on independence.) So we have
\[
\mathbb{E}[V_t] = \mathbb{E}\bigl[\,\mathbb{E}[V_t \mid N(t)]\bigr] = \lambda\theta^{-1}\bigl(1 - e^{-\theta t}\bigr).
\]

Of course, as $t\to\infty$ this will converge to $\lambda/\theta$. For the variance, we use the formula $\operatorname{Var}(V) = \mathbb{E}[\operatorname{Var}(V\mid X)] + \operatorname{Var}(\mathbb{E}[V\mid X])$, so that

\[
\begin{aligned}
\operatorname{Var}(V_t) &= \mathbb{E}[N(t)]\,\sigma^2(t) + (\theta t)^{-2}\bigl(1 - e^{-\theta t}\bigr)^2 \operatorname{Var}\bigl(N(t)\bigr) \\
&= \lambda \cdot \frac{1}{2\theta^2}\Bigl((\theta - 2t^{-1}) + 4t^{-1}e^{-\theta t} - (\theta + 2t^{-1})e^{-2\theta t}\Bigr) + \theta^{-2}t^{-2}\bigl(1 - e^{-\theta t}\bigr)^2 \lambda t \\
&\xrightarrow[t\to\infty]{} \frac{\lambda}{2\theta}.
\end{aligned}
\]
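A simulation confirms both limits (my own R sketch; the values of $\lambda$ and $\theta$ are arbitrary choices):

# Present value of a Poisson stream of unit deals, discounted at rate theta (illustrative)
set.seed(13)
lambda <- 3; theta <- 0.5; tmax <- 40
one.V <- function() {
  deals <- cumsum(rexp(200, rate = lambda))    # deal times, extending well past tmax
  sum(exp(-theta * deals[deals <= tmax]))      # discounted value of deals up to tmax
}
V <- replicate(20000, one.V())
c(mean(V), lambda / theta)          # both about 6
c(var(V),  lambda / (2 * theta))    # both about 3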