U.U.D.M. Project Report 2020:41

Classification of survival data by comparison of survival functions: an application to prostate cancer registry data

Alexander Christiansson

Degree project in mathematics, 15 credits (hp)
Supervisor: Marcus Westerberg
Subject reviewer: Rolf Larsson
Examiner: Martin Herschend
August 2020

Department of Mathematics
Uppsala University




Abstract

We present and compare four methods for classification of survival data by comparison of survival functions. Three of the methods are regression-based and one is based on minimizing a dissimilarity measure between survival functions. We also review the theory of survival analysis.

These methods are applied to a dataset consisting of 80367 men diagnosed with prostate cancer between 1998 and 2016, derived from a nation-wide register of prostate cancer data (NPCR/PCBaSe), to classify the metastatic status of individuals with unknown metastatic status.

Two of the four methods produced classification rules such that the prognosis of individuals classified by them as having metastasized disease was similar to the prognosis of individuals with confirmed metastasized disease. With appropriate improvements these methods could be used to make future register-based studies less affected by misclassification bias.


Contents

1 Introduction
    1.1 Purpose/aim
    1.2 Hypotheses
    1.3 Delimitations
    1.4 Method summation
    1.5 Acknowledgments

2 Background
    2.1 Prostate cancer
    2.2 NPCR and PCBaSe
    2.3 Previous research

3 Theory
    3.1 Statistical theory
        3.1.1 Maximum likelihood estimation
        3.1.2 Logistic regression
    3.2 Survival analysis
        3.2.1 Censoring
        3.2.2 Survival and hazard functions
        3.2.3 Likelihood derivation
        3.2.4 The Kaplan-Meier estimator
        3.2.5 Log-rank and Wilcoxon tests
        3.2.6 The Cox proportional hazards model
        3.2.7 Parametric survival analysis
        3.2.8 Another comparison test

4 Statistical analysis
    4.1 Logistic regression
    4.2 Parametric survival analysis
    4.3 Semiparametric survival analysis
    4.4 Dissimilarity minimization

5 Results
    5.1 Baseline
    5.2 Logistic regression
    5.3 Parametric survival analysis
    5.4 Semiparametric survival analysis
    5.5 Dissimilarity minimization
    5.6 Summary

6 Discussion
    6.1 Summary of method and results
    6.2 Strengths and limitations
    6.3 Comparison of methods
    6.4 Previous research
    6.5 Implications and conclusion


Chapter 1

Introduction

Prostate cancer is the most commonly diagnosed form of cancer among men in Sweden. Almost all (> 98%) diagnoses are entered into the nation-wide register NPCR/PCBaSe, where the cancers are classified into five risk groups based on a modified version of the National Comprehensive Cancer Network (NCCN) guidelines [11], a system which has been shown to be strongly prognostic [16, 21].

According to this system a cancer is considered to be of the highest risk category (distant metastases) if skeleton imaging found metastases, or if metastatic status is unknown and the prostate-specific antigen (PSA) serum level was ≥ 100 ng/ml at the time of diagnosis (this rule will be denoted PSA100 henceforth). The inclusion of individuals without evidence of metastases but with high PSA levels in the highest risk category is justified by studies [14, 15] that have found that high PSA levels are a very strong indicator of metastatic disease.

However, in [18] it was shown that about a fourth of the men in NPCR/PCBaSe with PSA levels ≥ 100 ng/ml who had undergone imaging in fact had no metastases. These men had a better prognosis than men with metastases and PSA levels ≥ 100 ng/ml. This suggests that the current classification system might overclassify men into the highest risk category, and raises the question of whether it can be improved so that risk category corresponds more closely to prognosis.

1.1 Purpose/aim

The purpose of this thesis is to develop an improved risk classification system for men in NPCR/PCBaSe to make future studies less influenced by misclassification bias. In particular we would like to refine the highest risk category (distant metastases) to only include men with metastases or with comparable prognosis in terms of risk of death by prostate cancer.

1.2 Hypotheses

Based on the findings in [18] we expect to be able to find better classification rules for determining metastatic status. In particular we expect to be able to find two subsets with significantly differing survival in the set of men with unknown metastatic status, corresponding to those with and without metastases. We will also attempt to identify subgroups where it is hard to discriminate between metastatic and non-metastatic disease.


1.3 Delimitations

We only consider men diagnosed in 1998 or later. We do a complete case analysis, except for the variable corresponding to metastatic status. We also consider all non-prostate cancer deaths as censored observations (see section 3.2.1).

1.4 Method summation

91959 men diagnosed with prostate cancer between 1998 and 2016 were selected at random from PCBaSe. After removing incomplete observations 80367 men remained.

In addition to the baseline method PSA100 described above, we use four methods to classify individuals: logistic regression, fully parametric survival analysis, semiparametric survival analysis (Cox regression) and a method based on minimizing the area between survival curves.

Logistic regression is a standard statistical method, whereas the latter three methods are based on a set of statistical methods known as survival analysis, developed for the analysis of data where the outcome variable is the time until the occurrence of some event (in our case death). Survival analysis is reviewed in some detail in section 3.2.

1.5 Acknowledgments

I’d like to thank my advisor Marcus Westerberg for all the help and support I’ve gotten through the whole process. I’d also like to thank Hans Garmo for many useful ideas and comments. Finally I’d like to thank Rolf Larsson for proofreading and spotting many mistakes.


Chapter 2

Background

2.1 Prostate cancer

Prostate cancer is the most commonly diagnosed form of cancer among men in Sweden. The disease is often non-fatal (87.9% relative ten-year survival) since it mostly appears at a late stage of life and develops slowly [17].

Prostate cancers are classified according to the Tumor, Node, Metastasis (TNM) system to make comparison of data obtained from different hospitals and health care systems easier [12]. The tumor stage (T-stage) roughly corresponds to the extension of the primary tumor. It ranges from T0 to T4, where T0 corresponds to no evidence of a tumor and T4 to a large tumor invading adjacent structures. The lymph node stage (N-stage) is N1 if metastases in the lymph nodes have been found, and N0 otherwise. The metastasis stage (M-stage) is M1 if evidence of distant metastases has been found, and M0 otherwise. If tumor, node or metastasis status has not been assessed, the stage is said to be TX, NX or MX respectively.

Two other important indicators are the Gleason score and the serum level of prostate-specific antigen (PSA). The Gleason score is based on the results of a biopsy, with a higher Gleason score corresponding to a more aggressive tumor. The prostate-specific antigen is a protein that is produced by healthy prostates but often has elevated serum levels in men with prostate cancer. Serum levels of PSA therefore serve as an indicator of how far a prostate cancer has progressed [12].

2.2 NPCR and PCBaSe

The data used in the statistical analysis was taken from the National Prostate Cancer Register (NPCR) of Sweden and PCBaSe. NPCR is a register containing 98% of all prostate cancer diagnoses in Sweden since 1998, amounting to 192,755 cases as of September 3rd 2019. Its predecessor was set up in 1987, and since 1998 all six health care regions in Sweden have been a part of it.

NPCR includes data on survival time, tumor stage, metastasis, PSA level and age at diagnosis. More variables, such as the number of biopsies and prostate volume, have been added over time. In total there are 50-100 variables per individual [13].

PCBaSe is an extension of NPCR in which the register has been linked, using the unique personal identity number, to a number of other health care and demographic registers such as the death cause register.


The data quality in the register has been found to be generally high, and more than 130 studies based on NPCR have been registered on PubMed since 2010 [13].

2.3 Previous research

Two studies on which the current classification system is based are [14] and [15]. An article questioning this classification system is [18].


Chapter 3

Theory

3.1 Statistical theory

3.1.1 Maximum likelihood estimation

The statistical models we consider below will be fit using the method of maximum likelihood estimation [1]. The goal of the method is to find the distribution in a space of distribution functions that is most likely to have generated the sample. We consider both parametric and the less common non-parametric maximum likelihood estimation.

In the parametric case, the distribution function is assumed to be indexed by elements of a finite-dimensional vector space $\Theta$. The elements $\theta \in \Theta$ are called parameters. That is, the class of distribution functions considered is of the form

\[
\{ f(\cdot \mid \theta) \mid \theta \in \Theta \}.
\]

We consider a sample of size $n$, $X = (X_1, \dots, X_n)$, where the $X_i$ are random variables, independently and identically distributed (i.i.d.) according to $f(\cdot \mid \theta)$. Let $f_n(\cdot \mid \theta)$ denote the joint distribution of $X$. Given a realization $x = (x_1, \dots, x_n)$ of the sample, we define the likelihood $L$ of the observation as a function of $\theta$:

\[
L(\theta) = f_n(x \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta).
\]

The goal of maximum likelihood estimation is to find the value of $\theta \in \Theta$ that maximizes $L$. This value $\hat{\theta} \in \Theta$ is called the maximum likelihood estimate. The maximum likelihood estimate is generally found numerically, for example by the Newton-Raphson method, though in some simple cases it is possible to find a closed-form solution.

In section 3.2.4 we will consider nonparametric maximum likelihood estimation. It follows the same principles, but instead of defining the likelihood as a function on a parameter space, we define it on a space of density functions, so

\[
L(f) = \prod_{i=1}^{n} f(x_i).
\]

The maximum likelihood estimate is here the density function $f$ that maximizes $L$.

In practice one often works with the log-likelihood

\[
\ell(\theta) = \log L(\theta)
\]

instead of the likelihood. Due to the monotonicity of the logarithm, maximizing the likelihood is equivalent to maximizing the log-likelihood.
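As a small illustration of numerical maximization of a log-likelihood, the sketch below applies the Newton-Raphson method to an exponential density $f(x \mid \theta) = \theta \exp(-\theta x)$. This model is chosen here only because its maximum likelihood estimate has the closed form $n / \sum_i x_i$ to compare against; the function name, starting value and data are hypothetical and not taken from the thesis.

```python
def exponential_mle_newton(xs, tol=1e-12, max_iter=100):
    """Maximize l(theta) = n*log(theta) - theta*sum(xs) by Newton-Raphson."""
    n, total = len(xs), sum(xs)
    # Starting value 1/max(xs) never exceeds the optimum n/sum(xs),
    # which keeps the Newton iteration for this model convergent.
    theta = 1.0 / max(xs)
    for _ in range(max_iter):
        score = n / theta - total        # first derivative l'(theta)
        curvature = -n / theta ** 2      # second derivative l''(theta) < 0
        step = score / curvature
        theta -= step
        if abs(step) < tol:
            break
    return theta


data = [0.5, 1.2, 0.3, 2.0, 1.0]         # made-up positive observations
theta_hat = exponential_mle_newton(data)
closed_form = len(data) / sum(data)      # analytic maximizer for comparison
```

For models without a closed-form solution only the numerical route is available, which is the situation for the regression models considered below.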

3.1.2 Logistic regression

Logistic regression is a method for determining the probability that a categorical response variable will take a particular value. We will only consider the case of binary logistic regression, that is, when the response variable is binary. The extension to variables taking more than two values follows the same principles.

We consider a binary random variable $Y$ that takes either the value 0 or 1. We wish to model the probability $p$ that $Y = 1$ given the values of $k$ covariates $X = (1, X^{(1)}, \dots, X^{(k)})$. We will associate one parameter to each coordinate of $X$, so the 1 added at the beginning of the vector allows us to introduce a constant parameter to the model, i.e. one that affects all observations independently of covariate values.

Instead of modeling $p = \Pr(Y = 1 \mid X)$ directly, we will model the logit or log-odds of $p$,

\[
\operatorname{logit}(p) = \log\left(\frac{p}{1 - p}\right).
\]

One reason that we do this is that the range of $\operatorname{logit}(p)$ is all of $\mathbb{R}$ while the range of $p$ is just $(0, 1)$, so we can take

\[
\operatorname{logit}(p) = \beta' X \tag{3.1}
\]

as a model, where $\beta = (\beta_0, \beta_1, \dots, \beta_k)$ is a vector of parameters. We can solve for $p$ in (3.1), yielding

\[
p = \frac{\exp(\beta' X)}{1 + \exp(\beta' X)} = \frac{1}{1 + \exp(-\beta' X)}.
\]

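To determine the values of the parameters $\beta$ we will use maximum likelihood estimation. Consider an i.i.d. sample $((X_1, Y_1), \dots, (X_n, Y_n))$ of size $n$. The likelihood given the realization $((x_1, y_1), \dots, (x_n, y_n))$ of the sample is then

\[
L(\beta) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}
\]

where $p_i = \Pr(Y = 1 \mid X = x_i)$. The corresponding log-likelihood is

\[
\begin{aligned}
\ell(\beta) &= \sum_{i=1}^{n} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] \\
&= \sum_{i=1}^{n} \log(1 - p_i) + \sum_{i=1}^{n} y_i \left[ \log(p_i) - \log(1 - p_i) \right] \\
&= \sum_{i=1}^{n} \log\left(1 - \frac{1}{1 + \exp(-\beta' x_i)}\right) + \sum_{i=1}^{n} y_i \log\left(\frac{p_i}{1 - p_i}\right) \\
&= \sum_{i=1}^{n} \log\left(\frac{\exp(-\beta' x_i)}{1 + \exp(-\beta' x_i)}\right) + \sum_{i=1}^{n} y_i \beta' x_i \\
&= \sum_{i=1}^{n} \log\left(\frac{1}{1 + \exp(\beta' x_i)}\right) + \sum_{i=1}^{n} y_i \beta' x_i \\
&= -\sum_{i=1}^{n} \log(1 + \exp(\beta' x_i)) + \sum_{i=1}^{n} y_i \beta' x_i.
\end{aligned}
\]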


Differentiating with respect to the $j$-th component of $\beta$,

\[
\frac{\partial \ell}{\partial \beta_j} = -\sum_{i=1}^{n} \frac{\exp(\beta' x_i)}{1 + \exp(\beta' x_i)} \, x_i^{(j)} + \sum_{i=1}^{n} y_i x_i^{(j)}.
\]

Simultaneously setting these derivatives to zero and solving for $\beta$ yields the maximum likelihood estimate of $\beta$. No closed-form solution is available, so this has to be done numerically.
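The numerical solution can be sketched as follows. The derivative above simplifies to $\sum_i (y_i - p_i) x_i^{(j)}$, and the sketch climbs the log-likelihood by plain gradient ascent until these derivatives vanish. Gradient ascent is swapped in here only for brevity (statistical software typically uses Newton-Raphson or a variant of it), and the step size, iteration count, function names and toy data are all hypothetical.

```python
import math


def sigmoid(z):
    """Inverse of the logit: p = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def score(beta, xs, ys):
    """Gradient of the log-likelihood: component j is sum_i (y_i - p_i) x_i^(j)."""
    grad = [0.0] * len(beta)
    for x, y in zip(xs, ys):
        p = sigmoid(sum(b * v for b, v in zip(beta, x)))
        for j, v in enumerate(x):
            grad[j] += (y - p) * v
    return grad


def fit_logistic(xs, ys, rate=0.1, n_iter=5000):
    """Gradient ascent on the log-likelihood, starting from beta = 0."""
    beta = [0.0] * len(xs[0])
    for _ in range(n_iter):
        grad = score(beta, xs, ys)
        beta = [b + rate * g for b, g in zip(beta, grad)]
    return beta


# Toy data: each row is (1, x) so that beta[0] acts as the constant term.
# The classes overlap, so a finite maximum likelihood estimate exists.
xs = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0),
      (1.0, 2.0), (1.0, -1.0), (1.0, 1.0), (1.0, 0.0)]
ys = [0, 0, 0, 1, 1, 1, 0, 1]
beta_hat = fit_logistic(xs, ys)
```

If the two classes were perfectly separated by the covariate, the likelihood would have no finite maximizer and the iteration would drift without converging, which is why the toy data overlap.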

3.2 Survival analysis

Survival analysis is a set of statistical methods for analyzing time-to-event data, that is, data where the outcome variable is the time elapsed from a time origin until the occurrence of a chosen event of interest. This type of data is common in medical studies, where the time origin often corresponds to entry into the study and the event of interest to death, thus the name survival analysis. It is possible to make other choices of time origin or event; for example one could consider the time until recurrence of a disease after a period of remission. Even if the event is not death, the time until the event is called survival time, with the interpretation being time survived until occurrence of the event.

Survival analysis can also be applied in non-medical contexts, for example in the actuarial sciences or in engineering, where it can be used to predict the time until failure of a mechanical component. We focus below on the applications for medical sciences.

3.2.1 Censoring

A major complication in the analysis of survival data is the presence of censored observations. An observation of an individual is said to be censored when only partial information on that individual’s survival time is available. For example we might know that an individual survived up to a given time, but have no information on the survival past that point. This might happen because the event had not occurred at the end of the study, or because the individual was lost to follow-up (for example by dropping out of the study for personal or administrative reasons). This form of censoring is known as right censoring, and is the most important form of censoring in medical studies. Other forms of censoring are left censoring, where the event has already occurred when an individual enters the study, and interval censoring, where either the origin or event time is only known up to some time period, for example because the status of the individual is only checked once a month.

Another cause of missing information in survival data is truncation, where only events occurring within some time interval are observed, so that we have no information on events occurring outside this interval. In the data used in the analysis below only right censoring will be relevant, so we will not consider other forms of censoring or truncation here. For details on other kinds of censoring and truncation, see [8].

Another obstacle in the analysis of survival data is the presence of competing risks. A competing risk is another event that makes observation of the event of interest impossible. An example of a competing risk is death from a cause other than the one of interest. It is not always clear how to treat losses to competing risks, since there may be correlation between competing events and the event of interest. As mentioned in the introduction (section 1.3) we will below consider all deaths due to competing risks as censored observations (censored at the time of the occurrence of the competing event).

An important assumption that generally needs to be made when dealing with censored data is that the process of censoring should be non-informative [3]. This means that the knowledge that an individual was censored at a given time should not provide any further information about that individual’s survival time, other than that the individual survived up to at least that time. This assumption could for example be violated if individuals that are close to experiencing the event are routinely censored.

There are at least four basic ways of dealing with censored data [9]. A naive option is to analyze complete data only, but doing so means discarding large portions of the data and leads to a biased analysis unless censoring occurs completely at random, which is rare in practice.

Alternatively one could dichotomize the data and only consider whether or not each individual experienced the event before a given time. The problem is then reduced to a regular classification problem. However, this approach also discards large amounts of data.

A third option is to impute the survival time of the censored observations. This requires us to model the survival of censored observations, and this model can be very hard to validate, especially when the cause of censoring is death.

Finally one can use a likelihood-based approach, where one constructs a likelihood of the observations taking censoring into account and maximizes it under some model assumptions. This is the most common approach in survival analysis.

3.2.2 Survival and hazard functions

Let $T$ and $C$ be random variables representing survival and censoring time respectively. Due to censoring we will not in general observe both $T$ and $C$, but rather the smaller of the two, $Y = \min(T, C)$, and a binary $\Delta = I(T \le C)$ indicating whether the observation was censored or not. In other words, $\Delta = 0$ if the observation was censored and $\Delta = 1$ if the observation was an event. We call this $\Delta$ the censoring status of an observation [8].
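The observation scheme can be illustrated with a small simulation. The sketch below draws hypothetical survival and censoring times from exponential distributions (an assumption made purely for illustration, as are the rates, the seed and the sample size), forms the observed pairs $(Y, \Delta)$, and compares the mean of the true survival times with the mean over uncensored observations only, which is the complete-data analysis described as biased in section 3.2.1.

```python
import random


def observe(t, c):
    """Return (y, delta) with y = min(t, c) and delta = 1 if t <= c, else 0."""
    return (min(t, c), 1 if t <= c else 0)


random.seed(0)
n = 10000
true_times = [random.expovariate(1.0) for _ in range(n)]   # T with mean 1
cens_times = [random.expovariate(0.5) for _ in range(n)]   # C with mean 2
sample = [observe(t, c) for t, c in zip(true_times, cens_times)]

# Mean of the (in practice unobservable) survival times
mean_true = sum(true_times) / n

# Mean over events only: long survival times are more likely to be censored,
# so this average is pulled towards short survival times
event_times = [y for y, delta in sample if delta == 1]
mean_complete_cases = sum(event_times) / len(event_times)
```

In this setup the complete-case mean falls clearly below the true mean, because individuals with long survival times are the ones most likely to be censored.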

In survival analysis we usually want to make inferences about the distribution of $T$, and are generally not interested in the distribution of $C$. There are two common ways of representing the distribution of $T$. The first is the survival function, defined as the probability of surviving past a time $t$,

\[
S(t) = 1 - F(t) = P(T > t)
\]

where $F(t)$ is the distribution function of $T$. Note that if $f(t)$ is the density function of $T$, then $f(t) = -S'(t)$.

A second way of representing the distribution of $T$ is the hazard function,

\[
\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t \cdot S(t)}.
\]

Intuitively we may interpret the hazard function as the risk of experiencing an event at time $t$ given that it has not occurred up to time $t$. The hazard function completely specifies the survival function, since

\[
\begin{aligned}
\lambda(t) &= \frac{1}{S(t)} \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t} \\
&= \frac{1}{S(t)} \lim_{\Delta t \to 0} \frac{S(t) - S(t + \Delta t)}{\Delta t} \\
&= -\frac{S'(t)}{S(t)}
\end{aligned}
\]

so that

\[
\int_0^t \lambda(s)\,ds = -\int_0^t \frac{S'(s)}{S(s)}\,ds = -\ln S(t),
\]

where the last equality uses $S(0) = 1$. This yields $S(t) = \exp\left(-\int_0^t \lambda(s)\,ds\right)$. The function $\Lambda(t) = \int_0^t \lambda(s)\,ds$ is known as the cumulative hazard function and is interpreted as the amount of risk an individual has experienced up to the time $t$.
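The relation between hazard and survival function can be checked numerically. The sketch below approximates the cumulative hazard by the midpoint rule and exponentiates its negative; the function name, the number of integration steps and the two example hazards are hypothetical choices, with the constant hazard $\lambda(t) = 1$ matching the one plotted in Figure 3.1.

```python
import math


def survival_from_hazard(hazard, t, steps=10000):
    """Approximate S(t) = exp(-integral_0^t hazard(s) ds) by the midpoint rule."""
    width = t / steps
    cumulative_hazard = sum(hazard((k + 0.5) * width) for k in range(steps)) * width
    return math.exp(-cumulative_hazard)


# Constant hazard 1 gives S(t) = exp(-t)
s_constant = survival_from_hazard(lambda s: 1.0, 2.0)

# Linearly increasing hazard 2s gives S(t) = exp(-t^2)
s_linear = survival_from_hazard(lambda s: 2.0 * s, 1.5)
```

Both examples have closed-form survival functions, which makes it easy to confirm that the numerical route reproduces them.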

Figure 3.1: Plot of the constant hazard λ(t) = 1 and the survival function determined by it.

Figure 3.2: Plot of a "bathtub"-shaped hazard, starting high but quickly decreasing, plateauing and then slowly increasing, and the survival function determined by the hazard.


3.2.3 Likelihood derivation

We derive the likelihood of a survival function $S$, largely following [8]. As above, the underlying random variables are the survival time $T$ and the censoring time $C$. Due to censoring we observe $Y = \min(T, C)$ and a binary indicator $\Delta = I(T \le C)$, denoting whether the observation was an event ($\Delta = 1$) or censored ($\Delta = 0$). We denote the density functions of $Y$ and $T$ by $f_Y$ and $f_T$ respectively.

First note that for an event occurring at time $t$, i.e. $Y = T = t$, $\Delta = 1$, the contribution to the likelihood is equal to the joint density of $Y$ and $\Delta$ at the point $(t, 1)$:

\[
\begin{aligned}
f_Y(t \mid \Delta = 1) \Pr(\Delta = 1) &= f_T(t \mid T \le t) \Pr(T \le t) \\
&= \frac{f_T(t)}{\Pr(T \le t)} \Pr(T \le t) \\
&= f_T(t).
\end{aligned}
\tag{3.2}
\]

For a censored observation at time $t$, i.e. $Y = C = t$, $\Delta = 0$, the contribution to the likelihood is equal to the probability of not experiencing the event until that time, i.e. $S(t)$. This can be seen by noting that the joint density of $Y$ and $\Delta$ at the point $(t, 0)$ is equal to

\[
f_Y(t \mid \Delta = 0) \Pr(\Delta = 0) = 1 \cdot \Pr(\Delta = 0) = \Pr(T \ge t) = S(t). \tag{3.3}
\]

Here the assumption of non-informative censoring is used to conclude that $\Pr(\Delta = 0) = \Pr(T \ge t)$.

Now consider a realization $(y_i, \delta_i)$ of an i.i.d. sample of $n$ survival times and censoring statuses. Let $t_1, \dots, t_r$ be the $y_i$'s that correspond to events (i.e. where $\delta_i = 1$) and $c_1, \dots, c_{n-r}$ be the $y_i$'s corresponding to censored observations. Since the sample was assumed to be independent, we can multiply the likelihoods calculated in (3.2) and (3.3) to find the total likelihood:

\[
L = \prod_{j=1}^{r} f_T(t_j) \prod_{j=1}^{n-r} S(c_j).
\]

More compactly,

\[
L = \prod_{i=1}^{n} f_T(y_i)^{\delta_i} S(y_i)^{1 - \delta_i} = \prod_{i=1}^{n} \left(\frac{f_T(y_i)}{S(y_i)}\right)^{\delta_i} S(y_i) = \prod_{i=1}^{n} \lambda(y_i)^{\delta_i} S(y_i),
\]

where $\lambda(t) = f_T(t)/S(t)$ is the hazard function defined in section 3.2.2. The corresponding log-likelihood is

\[
\ell = \sum_{i=1}^{n} \left[ \delta_i \log(\lambda(y_i)) + \log(S(y_i)) \right]. \tag{3.4}
\]
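The log-likelihood (3.4) translates directly into code. The sketch below evaluates it for an arbitrary pair of hazard and survival functions and, as an illustration, for a constant hazard, where (3.4) reduces to $r \log \lambda - \lambda \sum_i y_i$ with $r$ the number of events. The function names and the four observations are hypothetical.

```python
import math


def censored_log_likelihood(hazard, survival, ys, deltas):
    """Evaluate (3.4): sum over i of delta_i * log(hazard(y_i)) + log(survival(y_i))."""
    return sum(d * math.log(hazard(y)) + math.log(survival(y))
               for y, d in zip(ys, deltas))


def exponential_log_likelihood(rate, ys, deltas):
    """Special case of (3.4) with constant hazard `rate` and S(t) = exp(-rate * t)."""
    return censored_log_likelihood(lambda t: rate,
                                   lambda t: math.exp(-rate * t),
                                   ys, deltas)


ys = [2.0, 3.0, 1.0, 4.0]     # observed times y_i (made up)
deltas = [1, 0, 1, 1]         # 1 = event, 0 = censored
r = sum(deltas)

# For a constant hazard the maximizer is r / sum(y_i), here 3 / 10
rate_hat = r / sum(ys)
ll_at_optimum = exponential_log_likelihood(rate_hat, ys, deltas)
```

Note that the censored observation still contributes through its survival term, so its follow-up time is not discarded.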

3.2.4 The Kaplan-Meier estimator

The Kaplan-Meier estimator, or product-limit estimator, is a nonparametric estimator of the survival function $S(t)$. It is nonparametric in the sense that it makes no assumptions about the shape of $S(t)$. The estimator was introduced by Kaplan and Meier [7] and is fundamental to survival analysis.

Consider a realization $(y_i, \delta_i)$ of an $n$-sized i.i.d. sample of survival times and censoring statuses. Let $t_1 < \dots < t_r$ be the distinct event times (i.e. the distinct $y_i$'s such that $\delta_i = 1$, in increasing order). Here $r \le n$, with equality if all times $y_i$ are distinct and no observations are censored.

For $j = 1, \dots, r$, let $d_j$ denote the number of events at $t_j$. We define an individual with observed survival time $y_i$ to be at risk at a time $t$ if it has not experienced the event just before $t$, i.e. if $y_i \ge t$. We denote the number of individuals at risk at time $t_j$ by $n_j$. The Kaplan-Meier estimator is then defined as

\[
\hat{S}(t) = \prod_{j \mid t_j \le t} \left(1 - \frac{d_j}{n_j}\right).
\]

This is the maximum likelihood estimate of $S$ given the observed sample under the assumption that censoring is non-informative [7].

The variance of $\hat{S}$ can be estimated using Greenwood’s formula,

\[
\widehat{\operatorname{Var}}\left(\hat{S}(t)\right) = \hat{S}(t)^2 \sum_{j \mid t_j \le t} \frac{d_j}{n_j (n_j - d_j)}.
\]

See for example [6] for details. This estimate of the variance along with a normality assumption allows the calculation of confidence intervals for $S$.

Using this estimator we can easily draw survival curves that summarize the estimated survival function. These survival curves are often stratified to allow visual comparison of estimated survival between groups.
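A minimal sketch of the estimator and of Greenwood's formula is given below. It is written for clarity rather than speed, the function name and the six observations are hypothetical, and the handling of the case where everyone at risk fails (variance reported as NaN) is a convention chosen here, since Greenwood's formula is undefined at that point.

```python
def kaplan_meier(ys, deltas):
    """Return a list of (t_j, S_hat(t_j), Greenwood variance) at the distinct event times."""
    event_times = sorted({y for y, d in zip(ys, deltas) if d == 1})
    surv, greenwood_sum, curve = 1.0, 0.0, []
    for t in event_times:
        n_j = sum(1 for y in ys if y >= t)                              # at risk at t
        d_j = sum(1 for y, d in zip(ys, deltas) if y == t and d == 1)   # events at t
        surv *= 1.0 - d_j / n_j
        if n_j > d_j:
            greenwood_sum += d_j / (n_j * (n_j - d_j))
            variance = surv ** 2 * greenwood_sum
        else:
            # Everyone at risk failed: S_hat is 0 and the formula is undefined
            variance = float("nan")
        curve.append((t, surv, variance))
    return curve


ys = [1, 2, 2, 3, 4, 5]          # observed times (made up)
deltas = [1, 1, 0, 1, 0, 1]      # 1 = event, 0 = censored
km_curve = kaplan_meier(ys, deltas)
```

The estimated curve is a step function that is constant between event times, so evaluating $\hat{S}(t)$ at an arbitrary $t$ amounts to taking the value at the last event time not exceeding $t$ (or 1 before the first event).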

3.2.5 Log-rank and Wilcoxon tests

While a visual comparison of estimated survival functions allows us to quickly make conjectures about differences in survival between two groups, this has to be weighed against the hypothesis that there is no underlying difference in survival, and that the apparent differences are a result of random variation. The log-rank and Wilcoxon tests can be used to determine whether the survival of two groups differs in a statistically significant way. Both tests work under the null hypothesis that there is no underlying difference in survival between the two groups [3].

We start with the realization $(y_i, \delta_i, \beta_i)$ of an $n$-sized i.i.d. sample, where $\beta_i$ is 1 or 2 depending on which group individual $i$ belongs to and $y_i$ and $\delta_i$ are as before. We order the distinct event times in increasing order, $t_1 < \dots < t_r$. For each $j = 1, \dots, r$ denote the number of events in group 1 and 2 at time $t_j$ by $d_{1j}$ and $d_{2j}$ respectively. Similarly we denote the number of individuals at risk in each group at $t_j$ by $n_{1j}$ and $n_{2j}$. The total number of events and individuals at risk at $t_j$ are then $d_j = d_{1j} + d_{2j}$ and $n_j = n_{1j} + n_{2j}$ respectively.

We now consider each $j = 1, \dots, r$ separately. The values of $n_{1j}$, $n_{2j}$ and $d_j$ are known and can be considered as fixed. We want to determine the distribution of $d_{1j}$ (and thus also $d_{2j} = d_j - d_{1j}$) under the null hypothesis. If there is no difference in the survival of the groups, the number of events in group 1 ($d_{1j}$) should correspond to the number of distinguished objects drawn when randomly drawing $n_{1j}$ objects from a population of size $n_j$ where $d_j$ objects are distinguished. This means that $d_{1j}$ is hypergeometrically distributed, and by the known expectation and variance of the hypergeometric distribution,

\[
e_{1j} = E(d_{1j}) = \frac{n_{1j} d_j}{n_j}, \qquad
v_{1j} = \operatorname{Var}(d_{1j}) = \frac{n_{1j} n_{2j} d_j (n_j - d_j)}{n_j^2 (n_j - 1)}.
\]


We now define a test statistic as the sum of differences between the observed and expected number of deaths in group 1 at each time,

\[
U_L = \sum_{j=1}^{r} (d_{1j} - e_{1j}).
\]

For large $r$ this is approximately normally distributed with zero mean [3]. We also calculate the variance of $U_L$,

\[
V_L = \operatorname{Var}(U_L) = \sum_{j=1}^{r} v_{1j}.
\]

Then $U_L / \sqrt{V_L}$ has approximately a standard normal distribution for large $r$. Since the square of a standard normal random variable is chi-square distributed with 1 degree of freedom, we conclude that approximately

\[
W_L = \frac{U_L^2}{V_L} \sim \chi^2_1.
\]

Larger values of $W_L$ constitute greater evidence against the null hypothesis that there is no difference in survival between the two groups. Since we know the distribution of $W_L$ we can calculate a p-value that allows us to either reject or fail to reject the null hypothesis.

The Wilcoxon test is defined similarly, but instead of $U_L$ we take as test statistic

\[
U_W = \sum_{j=1}^{r} n_j (d_{1j} - e_{1j}),
\]

i.e. at each time we multiply the divergence from the expected number of events by the number of individuals at risk. This makes the test more sensitive to early differences in survival between the groups. We calculate the variance,

\[
V_W = \operatorname{Var}(U_W) = \sum_{j=1}^{r} n_j^2 v_{1j}.
\]

Since $U_W$ is also approximately normally distributed with mean 0 for large $r$, we get

\[
W_W = \frac{U_W^2}{V_W} \sim \chi^2_1.
\]
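Both statistics can be computed in a single pass over the distinct event times, as in the sketch below. The function names and the eight observations are hypothetical; skipping event times with fewer than two individuals at risk is a choice made here because $v_{1j}$ is then undefined. The p-value uses the identity $P(\chi^2_1 > w) = \operatorname{erfc}(\sqrt{w/2})$, which follows from $\chi^2_1$ being the square of a standard normal variable.

```python
import math


def two_group_tests(ys, deltas, groups):
    """Return (W_L, W_W), the log-rank and Wilcoxon statistics for groups coded 1 and 2."""
    event_times = sorted({y for y, d in zip(ys, deltas) if d == 1})
    u_l = v_l = u_w = v_w = 0.0
    for t in event_times:
        n1 = sum(1 for y, g in zip(ys, groups) if y >= t and g == 1)
        n2 = sum(1 for y, g in zip(ys, groups) if y >= t and g == 2)
        d1 = sum(1 for y, d, g in zip(ys, deltas, groups)
                 if y == t and d == 1 and g == 1)
        d = sum(1 for y, dl in zip(ys, deltas) if y == t and dl == 1)
        n = n1 + n2
        if n < 2:
            continue  # hypergeometric variance undefined with one individual at risk
        e1 = n1 * d / n
        v1 = n1 * n2 * d * (n - d) / (n ** 2 * (n - 1))
        u_l += d1 - e1
        v_l += v1
        u_w += n * (d1 - e1)
        v_w += n ** 2 * v1
    return u_l ** 2 / v_l, u_w ** 2 / v_w


def chi2_1_pvalue(w):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(w / 2.0))


# Made-up data: group 1 fails early, group 2 fails late, no censoring
ys = [1, 2, 3, 4, 5, 6, 7, 8]
deltas = [1] * 8
groups = [1, 1, 1, 1, 2, 2, 2, 2]
w_logrank, w_wilcoxon = two_group_tests(ys, deltas, groups)
p_logrank = chi2_1_pvalue(w_logrank)
```

Swapping the group labels leaves both statistics unchanged, since the deviations in group 2 are the negatives of those in group 1 and the variance formula is symmetric in the two groups.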

3.2.6 The Cox proportional hazards model

The Cox proportional hazards model [3, 4] is used for regression modeling of survival data. The model assumes that for an individual with covariate values $x$, the hazard function takes the form

\[
h(t \mid x) = h_0(t) \exp(\beta' x)
\]

where $\beta$ is a parameter corresponding to weights given to the covariates and $h_0$ is the baseline hazard function, common to all covariate values. The model is called semiparametric because the baseline hazard is not assumed to follow any parametric distribution.

For two subjects with covariate values $x_1$ and $x_2$, the ratio between their hazard functions,

\[
\frac{h(t \mid x_1)}{h(t \mid x_2)} = \frac{h_0(t) \exp(\beta' x_1)}{h_0(t) \exp(\beta' x_2)} = \exp(\beta' [x_1 - x_2]), \tag{3.5}
\]

depends only on the values of $x_1$ and $x_2$, i.e. is independent of time. This means that the model assumes that the hazards for individuals with given covariate values are proportionally constant at all times, explaining the choice of the term proportional hazards.

The Cox model is fit using a technique called partial likelihood. The reason we cannot use regular likelihoods is that under the Cox model, observations can only specify the hazard where an event occurs; all possible values of $h(t \mid x)$ at non-event times are consistent with any observation. However, the partial likelihood behaves as a likelihood for purposes of asymptotic inference [19].

We derive the partial likelihood under the assumption that no events occur at the same time; this makes the derivation less technical. Denote the set of individuals at risk at time $t_i$ by $R(t_i)$. The probability of individual $i$ experiencing the event at time $t_i$, given that an event occurs at $t_i$, is

\[
\begin{aligned}
\Pr(Y_i = t_i \mid x_i, \text{one event at } t_i) &= \frac{\Pr(Y_i = t_i \mid x_i)}{\Pr(\text{one event at } t_i)} \\
&= \frac{\Pr(Y_i = t_i \mid x_i)}{\sum_{\ell \in R(t_i)} \Pr(T_\ell = t_i)} \\
&= \frac{h(t_i \mid x_i)}{\sum_{\ell \in R(t_i)} h(t_i \mid x_\ell)} \\
&= \frac{h_0(t_i) \exp(\beta' x_i)}{\sum_{\ell \in R(t_i)} h_0(t_i) \exp(\beta' x_\ell)} \\
&= \frac{\exp(\beta' x_i)}{\sum_{\ell \in R(t_i)} \exp(\beta' x_\ell)}.
\end{aligned}
\]

Taking the product of this expression over all events yields the partial likelihood

\[
L = \prod_{i=1}^{n} \left( \frac{\exp(\beta' x_i)}{\sum_{\ell \in R(t_i)} \exp(\beta' x_\ell)} \right)^{\delta_i}. \tag{3.6}
\]

The derivative of $\log L$ with respect to each component of $\beta$ can then be set to 0 and solved for $\beta$ numerically.

It is also possible to estimate the baseline hazard $h_0(t)$, but we will not do this here because we only need to be able to compare the survival between groups, which can be done without knowing $h_0(t)$ due to equation (3.5).
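The logarithm of the partial likelihood (3.6) can be evaluated as in the sketch below, which assumes, as in the derivation, that no two events share the same time. The function name and the four observations are hypothetical. The example also illustrates why the baseline hazard is not needed: a term that is common to all individuals, such as a constant covariate, cancels between numerator and denominator.

```python
import math


def cox_partial_log_likelihood(beta, xs, ys, deltas):
    """Logarithm of (3.6); xs holds one covariate vector per individual."""
    linear = [sum(b * v for b, v in zip(beta, x)) for x in xs]
    total = 0.0
    for i, (y_i, d_i) in enumerate(zip(ys, deltas)):
        if d_i == 1:
            # Risk set: everyone whose observed time is at least y_i
            risk_set = [l for l, y_l in enumerate(ys) if y_l >= y_i]
            total += linear[i] - math.log(sum(math.exp(linear[l]) for l in risk_set))
    return total


ys = [1.0, 2.0, 3.0, 4.0]                 # observed times (made up, no ties)
deltas = [1, 0, 1, 1]                     # 1 = event, 0 = censored
xs = [(1.0,), (0.0,), (1.0,), (0.0,)]     # a single binary covariate

# With beta = 0 every individual at risk is equally likely to fail, so the
# log partial likelihood is minus the sum of log risk-set sizes: -(log 4 + log 2 + log 1)
pl_at_zero = cox_partial_log_likelihood([0.0], xs, ys, deltas)

# Adding a constant covariate with any coefficient leaves the value unchanged
xs_with_constant = [(1.0,) + x for x in xs]
pl_plain = cox_partial_log_likelihood([0.7], xs, ys, deltas)
pl_shifted = cox_partial_log_likelihood([5.0, 0.7], xs_with_constant, ys, deltas)
```

A numerical optimizer applied to this function would give the estimate of $\beta$; the optimization step itself is left out of the sketch.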

3.2.7 Parametric survival analysis

Another approach to regression modeling of survival data is to assume that the survivaltimes follow a specific probability distribution. This is called parametric survival analysis[3]. It is usually prefered to model the hazard instead of the survival times directly sincethe hazard function often takes a simpler form.

A simple model assumes that the hazard function is constant, i.e. h(t) = λ. Then

$$
S(t) = \exp\left( -\int_0^t \lambda \, ds \right) = \exp(-\lambda t) \quad \text{and} \quad f(t) = \lambda \exp(-\lambda t)
$$

so the survival times follow an exponential distribution. However, this assumption is often too restrictive to be used in practice. A more flexible model of the hazard function is $h(t) = \lambda\gamma t^{\gamma-1}$. Then

$$
S(t) = \exp\left( -\int_0^t \lambda\gamma s^{\gamma-1} \, ds \right) = \exp(-\lambda t^{\gamma}), \qquad
f(t) = \lambda\gamma t^{\gamma-1} \exp(-\lambda t^{\gamma}).
$$

A random variable with this density function is said to be Weibull distributed. The parameters λ, γ > 0 are called the scale and shape respectively. We note that the exponential distribution above is a special case of this distribution where γ = 1.
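For completeness, the integral behind the Weibull survival function can be written out step by step. The symbol H(t) for the cumulative hazard is notation introduced here for convenience only.

$$
\begin{aligned}
H(t) &= \int_0^t \lambda\gamma s^{\gamma-1} \, ds = \lambda \left[ s^{\gamma} \right]_0^t = \lambda t^{\gamma}, \\
S(t) &= \exp(-H(t)) = \exp(-\lambda t^{\gamma}), \\
f(t) &= -\frac{d}{dt} S(t) = h(t) S(t) = \lambda\gamma t^{\gamma-1} \exp(-\lambda t^{\gamma}).
\end{aligned}
$$

For γ < 1 the hazard decreases with time, for γ > 1 it increases, and for γ = 1 it is constant.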

A Weibull model is fit using maximum likelihood. Let (yi, δi) be a realization of an i.i.d. sample of size n. For our particular choice of h(t), the log-likelihood (3.4) becomes

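$$
\begin{aligned}
\ell &= \sum_{i=1}^{n} \left[ \delta_i \log\left( \lambda\gamma y_i^{\gamma-1} \right) - \lambda y_i^{\gamma} \right] \\
&= \log(\lambda\gamma) \sum_{i=1}^{n} \delta_i + (\gamma - 1) \sum_{i=1}^{n} \delta_i \log(y_i) - \lambda \sum_{i=1}^{n} y_i^{\gamma} \\
&= \log(\lambda\gamma) \, r + (\gamma - 1) \sum_{i=1}^{n} \delta_i \log(y_i) - \lambda \sum_{i=1}^{n} y_i^{\gamma}.
\end{aligned}
$$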

Differentiating with respect to λ and γ yields

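$$
\frac{d\ell}{d\lambda} = \frac{r}{\lambda} - \sum_{i=1}^{n} y_i^{\gamma} \tag{3.7}
$$

$$
\frac{d\ell}{d\gamma} = \frac{r}{\gamma} + \sum_{i=1}^{n} \delta_i \log(y_i) - \lambda \sum_{i=1}^{n} y_i^{\gamma} \log(y_i). \tag{3.8}
$$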

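To maximize ℓ we find λ and γ where both derivatives are zero. By (3.7) we have $\lambda = r / \sum_{i=1}^{n} y_i^{\gamma}$, so plugging into (3.8) yields

$$
\frac{r}{\gamma} + \sum_{i=1}^{n} \delta_i \log(y_i) - \frac{r}{\sum_{i=1}^{n} y_i^{\gamma}} \sum_{i=1}^{n} y_i^{\gamma} \log(y_i) = 0.
$$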

This is non-linear in γ and has to be solved numerically.

We can add covariates to the Weibull model by allowing the hazard function to vary for each individual through its covariates. Let xi be an m-vector containing the m covariates for individual i. We also add a parameter β to the model, an m-vector corresponding to the weights of the covariates. As model we take

$$
h(t \mid x_i) = \exp(\beta' x_i) \, \lambda\gamma t^{\gamma-1}.
$$

The log-likelihood for this model is

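$$
\begin{aligned}
\ell &= \sum_{i=1}^{n} \left\{ \delta_i \left[ \beta' x_i + \log(\lambda\gamma) + (\gamma - 1) \log(y_i) \right] - \lambda \exp(\beta' x_i) y_i^{\gamma} \right\} \\
&= \sum_{i=1}^{n} \left\{ \delta_i \left[ \beta' x_i + \log(\lambda\gamma) + \gamma \log(y_i) \right] - \lambda \exp(\beta' x_i) y_i^{\gamma} \right\} - \sum_{i=1}^{n} \delta_i \log y_i.
\end{aligned}
$$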

We can drop the last term from the log-likelihood since it does not depend on any of the model's parameters. The maximum likelihood estimates of the parameters λ, γ and β are found numerically [3].
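To make the phrase "solved numerically" concrete for the covariate-free Weibull model, the equation obtained by plugging (3.7) into (3.8) can be solved for γ by bisection, after which λ follows from (3.7). The sketch below is one possible approach in plain Python, not the procedure used in this report; the function name `weibull_mle`, the bracket `[lo, hi]` and the toy data are assumptions made for illustration. Bisection is applicable because the left-hand side of the equation is decreasing in γ.

```python
import math

def weibull_mle(times, events, lo=1e-3, hi=50.0, tol=1e-10):
    """Solve the profile score equation for gamma by bisection, then recover
    lambda from (3.7). Assumes at least one event and that the root lies in
    the (hypothetical) bracket [lo, hi]."""
    r = sum(events)
    sum_dlog = sum(d * math.log(y) for y, d in zip(times, events))

    def score(gamma):
        # Left-hand side of the equation obtained by plugging (3.7) into (3.8).
        s0 = sum(y ** gamma for y in times)
        s1 = sum(y ** gamma * math.log(y) for y in times)
        return r / gamma + sum_dlog - r * s1 / s0

    if score(lo) <= 0 or score(hi) >= 0:
        raise ValueError("root not bracketed; widen [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:  # score is decreasing in gamma
            lo = mid
        else:
            hi = mid
    gamma = 0.5 * (lo + hi)
    lam = r / sum(y ** gamma for y in times)
    return lam, gamma

# Hypothetical toy data (times in years, 1 = event, 0 = censored).
toy_times = [0.5, 1.2, 2.0, 3.1, 4.4, 5.0]
toy_events = [1, 1, 0, 1, 1, 0]
lam_hat, gamma_hat = weibull_mle(toy_times, toy_events)
```

With covariates the same idea applies, but the score equations for β, λ and γ no longer reduce to a single equation, so a general-purpose optimizer is the more natural tool.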


3.2.8 Another comparison test

Another method for comparing survival functions was proposed in [10] and further discussed in [2]. This method amounts to integrating the absolute difference between two Kaplan-Meier estimates, which is an easy task since the estimates are step functions. Compared to the log-rank and Wilcoxon tests, which are primarily significance tests, this method performs better at detecting the overall difference between two survival curves in many situations.

Let S1 and S2 be the survival functions of two groups of individuals with corresponding Kaplan-Meier estimates Ŝ1 and Ŝ2. We define the dissimilarity between S1 and S2 to be

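$$
\operatorname{dis}(S_1, S_2) = \frac{1}{\tau} \int_0^{\tau} \left| \hat{S}_1(t) - \hat{S}_2(t) \right| dt
= \frac{1}{\tau} \sum_{j \mid t_j < \tau} \left| \hat{S}_1(t_j) - \hat{S}_2(t_j) \right| (t_{j+1} - t_j)
$$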

where τ is the last time point at which the two survival functions are comparable. If the last observation in both groups was an event, this is simply the last observable time. However, if the last observation in either group was censored, we cannot compare survival past that point.
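Because both Kaplan-Meier estimates are step functions, the integral reduces to a finite sum over the intervals on which both are constant. The following plain-Python sketch shows one way to compute it. The helper names are hypothetical, and τ is taken here as the smaller of the two largest observed times, which is only one possible reading of "the last time point at which the two survival functions are comparable".

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier estimate as a list of (time, survival) steps."""
    steps, surv = [], 1.0
    for t in sorted(set(t for t, d in zip(times, events) if d)):
        at_risk = sum(1 for y in times if y >= t)
        deaths = sum(1 for y, d in zip(times, events) if y == t and d)
        surv *= 1.0 - deaths / at_risk
        steps.append((t, surv))
    return steps

def step_value(steps, t):
    """Value at time t of the right-continuous step function given by steps."""
    value = 1.0
    for time, surv in steps:
        if time > t:
            break
        value = surv
    return value

def dissimilarity(group1, group2):
    """Integrate |S1_hat - S2_hat| over [0, tau] and divide by tau.

    Each group is a pair (times, events). Assumption: tau is the smaller of
    the two largest observed times.
    """
    km1, km2 = kaplan_meier(*group1), kaplan_meier(*group2)
    tau = min(max(group1[0]), max(group2[0]))
    # Both estimates are constant between consecutive grid points.
    grid = sorted({0.0, tau} | {t for t, _ in km1 + km2 if t < tau})
    area = 0.0
    for left, right in zip(grid, grid[1:]):
        diff = abs(step_value(km1, left) - step_value(km2, left))
        area += diff * (right - left)
    return area / tau

# Hypothetical toy groups: (times, event indicators).
group_a = ([1, 2, 3, 4], [1, 1, 1, 1])
group_b = ([2, 4], [1, 1])
dis_ab = dissimilarity(group_a, group_b)
```

Dividing by τ keeps the measure between 0 and 1, so values from groups with different follow-up lengths remain comparable.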


Chapter 4

Statistical analysis

4.1 Logistic regression

A breakpoint t was chosen and all observations censored before time t were discarded. We added an indicator variable to all remaining observations according to whether they had died due to prostate cancer before time t or not. We then performed a logistic regression (section 3.1.2) using the new indicator variable as response and with T-stage, Gleason score, age and PSA as covariates. We also added M-stage as a covariate and allowed for interactions between it and all other covariates. We used a generalized additive model (using the R package gam [5]) to add smoothing splines to model non-linear effects in age and PSA.

For each MX observation we predicted the probability of dying before the breakpoint under the fitted model, supposing that the observation was M0, M1 and MX respectively. We then determined a value θ corresponding to the probability of an observation being M1, as described below.

First let A denote the set of individuals for which M-stage is known, and B the set of individuals with unknown M-stage. Let D correspond to the event of dying due to prostate cancer before the breakpoint t. We wish to determine θ = PrB(M1). By the law of total probability,

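$$
\mathrm{Pr}_B(D) = \mathrm{Pr}_B(D \mid \mathrm{M0}) \, \mathrm{Pr}_B(\mathrm{M0}) + \mathrm{Pr}_B(D \mid \mathrm{M1}) \, \mathrm{Pr}_B(\mathrm{M1}).
$$

Using that PrB(M0) = 1 − PrB(M1) and solving for PrB(M1), we get

$$
\theta = \mathrm{Pr}_B(\mathrm{M1}) = \frac{\mathrm{Pr}_B(D) - \mathrm{Pr}_B(D \mid \mathrm{M0})}{\mathrm{Pr}_B(D \mid \mathrm{M1}) - \mathrm{Pr}_B(D \mid \mathrm{M0})}.
$$

Under the assumption that PrB(D|M0) = PrA(D|M0) and PrB(D|M1) = PrA(D|M1) (that is, that the probability of dying before the breakpoint, conditional on M-stage, is the same in both groups) this becomes

$$
\theta = \frac{\mathrm{Pr}_B(D) - \mathrm{Pr}_A(D \mid \mathrm{M0})}{\mathrm{Pr}_A(D \mid \mathrm{M1}) - \mathrm{Pr}_A(D \mid \mathrm{M0})}.
$$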

We then classified individuals as M1 if θ was above a given value, and as M0 otherwise.
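The classification step can be summarised in a few lines. The sketch below assumes that the three predicted probabilities for an MX observation (supposing it were MX, M0 and M1) are already available from the fitted model; the function names and the numbers are hypothetical and serve only to illustrate the formula for θ.

```python
def theta(p_mx, p_m0, p_m1):
    """theta = (Pr_B(D) - Pr_A(D|M0)) / (Pr_A(D|M1) - Pr_A(D|M0))."""
    if p_m1 == p_m0:
        raise ValueError("M0 and M1 predictions coincide; theta is undefined")
    return (p_mx - p_m0) / (p_m1 - p_m0)

def classify(p_mx, p_m0, p_m1, cutoff=0.5):
    """Label an MX observation 'M1' when theta exceeds the cutoff, else 'M0'."""
    return "M1" if theta(p_mx, p_m0, p_m1) > cutoff else "M0"

# Hypothetical predicted probabilities of death before the breakpoint.
label_high = classify(p_mx=0.45, p_m0=0.10, p_m1=0.50)
label_low = classify(p_mx=0.15, p_m0=0.10, p_m1=0.50)
```

An observation whose predicted risk as MX lies close to its predicted risk as M1 gets θ near 1, while one whose risk lies close to the M0 prediction gets θ near 0.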

4.2 Parametric survival analysis

We fit a Weibull model (section 3.2.7) with covariates as in the logistic regression method, i.e. T-stage, Gleason score, age, PSA, allowing for interactions between M-stage and all other covariates.

For each MX observation we predicted the scale parameters λM0, λM1 and λMX under the model supposing that the observation were M0, M1 and MX respectively. We then calculated θ similarly as above, by

$$
\theta = \frac{\lambda_{\mathrm{MX}} - \lambda_{\mathrm{M0}}}{\lambda_{\mathrm{M1}} - \lambda_{\mathrm{M0}}}.
$$

In this case θ does not have a direct interpretation as a probability, but as long as λMX is between λM0 and λM1, θ will take a value between 0 and 1 and will measure how close the survival of the observation is to what it would be if the observation were M1.

Again we classified individuals as M0 or M1 based on θ, classifying individuals as M1 if θ was high enough.

4.3 Semiparametric survival analysis

This proceeded in the same way as the parametric method, with the exception that we used a Cox proportional hazards model (section 3.2.6) instead of a Weibull model.

4.4 Dissimilarity minimization

The observations were split into three subsets according to T-stage. For each subset we tested 120 rules for splitting the MX observations into two disjoint groups. The rules were of the form age ≥ x and PSA ≥ y. For each rule we calculated the dissimilarity between the group consisting of MX observations matching the rule and the group consisting of all M1 observations in the subset. We also calculated the proportion p of MX observations matched by the rule. We ranked these rules according to a measure of goodness

$$
\text{goodness} = \frac{p}{\operatorname{dis}(S_1, S_2)^{r}}
$$

where r ≥ 0 is a tuning parameter whose value determines the weight given to how similar the survival curves are. Higher values of r lead to a closer match of the survival curves, at the expense of matching fewer observations.

For each subset we selected the rule with highest goodness.
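The effect of the tuning parameter r on the selected rule can be seen in a small example. The sketch below ranks candidate rules for one subset, assuming the proportion p and the dissimilarity have already been computed for each rule; the function names and all numbers are made up for illustration and are not results from this report.

```python
def goodness(p, dis, r):
    """goodness = p / dis**r, the ranking criterion of section 4.4."""
    return p / dis ** r

def best_rule(candidates, r):
    """Pick the candidate with the highest goodness.

    candidates -- list of (rule_label, p, dis) triples, where p is the
                  proportion of MX observations matched by the rule and dis
                  the dissimilarity to the M1 survival curve (dis > 0).
    """
    return max(candidates, key=lambda c: goodness(c[1], c[2], r))

# Hypothetical candidates for one T-stage subset (numbers are made up).
candidates = [
    ("PSA >= 50, age >= 50", 0.20, 0.10),
    ("PSA >= 100, age >= 60", 0.08, 0.02),
    ("PSA >= 250, age >= 60", 0.03, 0.01),
]
chosen = {r: best_rule(candidates, r)[0] for r in (0.5, 1.0, 1.5)}
```

With these made-up numbers, r = 0.5 selects the loosest rule, r = 1.0 the intermediate one and r = 1.5 the strictest one, mirroring the trade-off between similarity of the curves and the number of matched observations.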


Chapter 5

Results

5.1 Baseline

                            M0                 M1                 MX
                            N = 21688          N = 6559           N = 52120

Follow up time (years)      2.47 5.87 10.80    0.96 1.97 3.85     2.69 5.62 9.05
Event
  Censored                  63% (13769)        22% (1423)         67% (34811)
  Other death               21% (4641)         17% (1123)         22% (11413)
  PC death                  15% (3278)         61% (4013)         11% (5896)
PSA (ng/ml)                 8.2 15.0 31.6      42.0 140.0 500.0   5.2 8.1 16.0
Age (years)                 64 69 75           67 74 80           63 69 76
Diagnosis year
  1998-2002                 28% (6136)         31% (2030)         16% (8280)
  2003-2007                 25% (5433)         26% (1694)         29% (14946)
  2008-2012                 19% (4118)         20% (1298)         33% (17071)
  2013-2016                 28% (6001)         23% (1537)         23% (11823)
T stage
  T1-2                      73% (15730)        29% (1905)         85% (44405)
  T3                        26% (5566)         52% (3393)         13% (6697)
  T4                        2% (392)           19% (1261)         2% (1018)
Gleason WHO
  GS 2-6                    27% (5903)         7% (470)           53% (27837)
  GS 7                      44% (9631)         31% (2053)         33% (17459)
  GS 8-10                   28% (6154)         62% (4036)         13% (6824)

a b c represent the lower quartile a, the median b, and the upper quartile c for continuous variables. N is the number of non-missing values. Numbers after percents are frequencies.

Table 5.1: Baseline characteristics by M stage.

We see in table 5.1 that most men did not undergo imaging (MX). The individuals with positive imaging results (M1) were followed for a shorter time and were more likely to die from prostate cancer during follow-up than other men. They also had much higher PSA at diagnosis, were more likely to have higher T-stage and Gleason WHO, and were diagnosed at a higher age (median 74 years vs 69 for M0 and MX). We also see that individuals with negative imaging results (M0) had, on average, higher PSA at diagnosis, higher T-stage, and higher Gleason WHO than individuals who did not undergo imaging (MX).


Figure 5.1: Kaplan-Meier estimates of survival by M-stage with MX individuals split into a high- and low-risk group according to the current classification system. Shaded areas represent 95% confidence intervals.

In figure 5.1 we again see that men with M1 had the worst prognosis. It is also seen that MX individuals who were classified into the highest risk category by the baseline PSA100 method have a prognosis similar to, but significantly better than, men with M1 (p < 0.001 by log-rank). Further, in the long term, MX men with PSA lower than 100 ng/ml had slightly better prognosis than men with M0.

5.2 Logistic regression

We selected the breakpoint t = 2 years, close to the median survival time of men with M1 (2.2-2.7 years [20]). After removing individuals censored before the breakpoint, n = 66921 individuals remained. We classified individuals as M1 using the cutoffs θ = 0.4, 0.5 and 0.6. The resulting survival curves are shown in figure 5.2. The survival curves of the MX individuals classified as M1 are visually similar to the survival curves of M1 individuals, and the logrank test is unable to detect a significant difference between these curves when θ = 0.5 and θ = 0.6.


Figure 5.2: Kaplan-Meier estimates of survival of individuals with M0, M1 as well as individuals with MX classified according to the rule described in section 4.1, using cutoffs θ = 0.4, 0.5, 0.6. Logrank test for difference p < 0.01, p = 0.8, p = 0.2 respectively.

5.3 Parametric survival analysis

The method described in 4.2 produced results similar to those of the method based on semiparametric survival analysis, and the results are omitted here.

5.4 Semiparametric survival analysis

We classified individuals as M1 using the method described in 4.3 with the cutoffs θ = 0.4, 0.5 and 0.6. Figure 5.3 shows the resulting survival curves. None of the parameter values makes the survival curve of MX individuals classified as M1 similar to the survival curve of M1 individuals.


Figure 5.3: Kaplan-Meier estimates of survival of individuals with M0, M1 as well as individuals with MX classified according to the rule described in section 4.3, using cutoffs θ = 0.4, 0.5, 0.6. Logrank test for difference p < 0.01 for all θ.

5.5 Dissimilarity minimization

Here individuals were classified using the method described in 4.4 with r = 0.5, 1 and 1.5 respectively. The corresponding survival curves are shown in figure 5.4. We note that the survival curve of the MX individuals classified as M1 becomes more similar to the survival curve of M1 individuals as r increases. The logrank test is unable to detect a difference between the survival curves when r = 1.5.

r     T1-2                    T3                      T4

0.5   PSA ≥ 50, age ≥ 55      PSA ≥ 50, age ≥ 50      PSA ≥ 50, age ≥ 50
1.0   PSA ≥ 125, age ≥ 50     PSA ≥ 275, age ≥ 50     PSA ≥ 100, age ≥ 60
1.5   PSA ≥ 250, age ≥ 60     PSA ≥ 275, age ≥ 50     PSA ≥ 250, age ≥ 80

Table 5.2: The rules with highest goodness in each T-stage group for the different values of r.


Figure 5.4: Kaplan-Meier estimates of survival of individuals with M0, M1 as well as individuals with MX classified according to the rule described in section 4.4, using parameter values r = 0.5, 1.0, 1.5. Logrank test for difference p < 0.01, p < 0.01 and p = 0.5 respectively.

5.6 Summary

Table 5.3 shows the proportion of individuals in different subgroups that were classified as having metastasized disease by each method. We see that as θ and r increase, the proportion of individuals classified as having metastasized disease decreases. We also note that with higher T-stages, all methods classify a larger proportion of men as having metastasized disease.


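T-stage  Method                            Parameter  MX all  MX PSA<100  MX PSA≥100  M0     M1

All      N                                            52120   49555       2565        21688  6559
         Logistic regression               θ = 0.4    3%      1%          27%         1%     16%
         Logistic regression               θ = 0.5    1%      1%          19%         1%     10%
         Logistic regression               θ = 0.6    1%      0%          12%         0%     7%
         Semiparametric survival analysis  θ = 0.4    4%      3%          20%         2%     12%
         Semiparametric survival analysis  θ = 0.5    2%      2%          14%         1%     8%
         Semiparametric survival analysis  θ = 0.6    1%      1%          10%         1%     6%
         Dissimilarity minimization        r = 0.5    10%     5%          100%        15%    71%
         Dissimilarity minimization        r = 1.0    3%      0%          68%         3%     43%
         Dissimilarity minimization        r = 1.5    2%      0%          40%         1%     29%
         PSA100                                       5%      -           -           6%     57%

T1-2     N                                            44405   43590       815         15730  1905
         Logistic regression               θ = 0.4    0%      0%          2%          0%     1%
         Logistic regression               θ = 0.5    0%      0%          1%          0%     0%
         Logistic regression               θ = 0.6    0%      0%          1%          0%     0%
         Semiparametric survival analysis  θ = 0.4    0%      0%          3%          0%     2%
         Semiparametric survival analysis  θ = 0.5    0%      0%          1%          0%     1%
         Semiparametric survival analysis  θ = 0.6    0%      0%          1%          0%     1%
         Dissimilarity minimization        r = 0.5    4%      3%          100%        9%     54%
         Dissimilarity minimization        r = 1.0    1%      0%          81%         2%     35%
         Dissimilarity minimization        r = 1.5    1%      0%          38%         1%     23%
         PSA100                                       2%      -           -           3%     40%

T3       N                                            6697    5472        1225        5566   3393
         Logistic regression               θ = 0.4    12%     8%          28%         3%     14%
         Logistic regression               θ = 0.5    5%      2%          16%         1%     7%
         Logistic regression               θ = 0.6    2%      1%          9%          0%     4%
         Semiparametric survival analysis  θ = 0.4    20%     20%         24%         7%     14%
         Semiparametric survival analysis  θ = 0.5    11%     11%         15%         3%     8%
         Semiparametric survival analysis  θ = 0.6    6%      5%          10%         1%     5%
         Dissimilarity minimization        r = 0.5    35%     20%         100%        29%    76%
         Dissimilarity minimization        r = 1.0    8%      0%          46%         4%     37%
         Dissimilarity minimization        r = 1.5    8%      0%          46%         4%     37%
         PSA100                                       18%     -           -           13%    60%

T4       N                                            1018    493         525         392    1261
         Logistic regression               θ = 0.4    55%     47%         63%         28%    43%
         Logistic regression               θ = 0.5    42%     31%         51%         17%    32%
         Logistic regression               θ = 0.6    28%     18%         38%         9%     24%
         Semiparametric survival analysis  θ = 0.4    40%     43%         38%         20%    24%
         Semiparametric survival analysis  θ = 0.5    32%     33%         31%         16%    20%
         Semiparametric survival analysis  θ = 0.6    25%     26%         24%         11%    16%
         Dissimilarity minimization        r = 0.5    68%     35%         100%        55%    85%
         Dissimilarity minimization        r = 1.0    51%     2%          97%         31%    69%
         Dissimilarity minimization        r = 1.5    15%     0%          28%         3%     16%
         PSA100                                       52%     -           -           34%    74%

Table 5.3: The proportion of individuals within subgroups being classified as having metastasized disease using the different methods and different values of the parameters θ and r. The row labelled N in each T-stage block gives the number of individuals in that subgroup. For the PSA100 method only the proportions for all MX, M0 and M1 individuals are reported.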


Chapter 6

Discussion

6.1 Summary of method and results

We evaluated four methods for classifying metastatic status of men with prostate cancer. Two of the methods (logistic regression and dissimilarity minimization) managed to partition the MX group into two subgroups with one of the subgroups having a similar survival function as the M1 group.

6.2 Strengths and limitations

As seen in table 5.3 the three methods considered are more conservative than the baseline PSA100 method in classifying individuals as having metastatic disease. For example, the baseline method classifies 6% of individuals with negative imaging result (M0) as having metastases whereas the corresponding proportion with our methods is only 0-3%. On the other hand, our methods classify fewer men with positive imaging result (M1) as having metastatic disease. Further, the survival functions of men classified as having metastasized cancer by the logistic regression and dissimilarity minimization methods are very similar to those of men with metastasized disease.

We remark that in the subset of T4 individuals, all methods (including the baseline) classify many men with M0 as having metastatic disease. This might be caused by the fact that T4 is itself correlated with bad prognosis [13], so that even without metastases these men may have a survival function similar to that of men with metastasized disease.

There is large room for improvement of the methods. Firstly, we have only considered individuals with complete data. It is possible that some of the mechanisms that cause data to go missing are themselves interesting. Further, we have only used a small subset of the available variables in NPCR/PCBaSe. Another limitation is that the data on metastatic and tumor stage is only registered at diagnosis. A man who undergoes imaging with negative result at diagnosis will for all we know be M0 even if his cancer later is found to have metastasized. Finally, we have both found the classification rules and evaluated them on the same data, which likely limits how well the rules generalize to unseen data. Our preliminary tests suggest that the rules found by the logistic regression method perform slightly better than those found by dissimilarity minimization when classifying previously unseen data, but this would have to be studied further.


6.3 Comparison of methods

An important difference between the methods is that the one based on minimizing dissimilarity provides a binary classification for each individual, whereas the regression-based methods produce for each individual a value θ between 0 and 1 corresponding to how likely it is that this individual is M1, from which a binary classification can be obtained by picking a θ-cutoff. Both have benefits and drawbacks; the former is arguably easier to interpret and is in a sense an upgraded version of the current classification system. On the other hand, the regression-based methods have the benefit of allowing the user of the classification to easily vary the θ-cutoff. This is useful because the cost of misclassifying an M1 observation as M0 and conversely can differ between studies.

It will likely be harder to add more variables to the dissimilarity minimization method than to the others, since it cannot handle non-ordinal categorical variables in an obvious way and will suffer from subsets becoming very small as the rules considered become more and more specific.

It is somewhat surprising that the logistic regression seemingly performs so much better than the semiparametric method, since the latter can see the entire distribution of survival times while the former can only see whether or not an individual survived past a fixed breakpoint. There are several possible explanations for this. Firstly, it might be the case that with a suitably chosen breakpoint, even this binary value contains enough information to distinguish a prognosis associated with metastases from one without. Secondly, we used smoothing splines in the logistic regression to account for non-linear effects in some of the covariates. However, even when splines were not used the logistic regression still performed better. Finally, it might be the case that the proportional hazards assumption (section 3.2.6) is not satisfied, in which case the semiparametric model does not work.

6.4 Previous research

Compared to the studies on which the current classification system is based [14, 15], we considered a much larger population (60/129 individuals vs. 84611 individuals). In addition, our data was obtained from a population-based register, ensuring strong external validity.

We also note that in the study by Rana [15], the mean PSA serum level of men with metastatic disease was very high, so it may be the case that the claimed 100% predictive value of the rule PSA ≥ 100 ng/ml could also have been obtained by a higher cutoff (say ≥ 200 ng/ml) [18].

6.5 Implications and conclusion

The methods proposed above could, with suitable improvements, be used in future studies based on data from NPCR/PCBaSe.

To illustrate by example, one could imagine two studies, one in which one would like to investigate some property of men with metastasized disease and another where one wants to investigate men with nonmetastasized disease. In the first study, the cost of misclassifying an individual without metastases as having metastases is likely to be higher than the converse, since doing so would lead to the inclusion of men who do not belong in the study. In this case one would want to choose a higher value of θ if the logistic regression or semiparametric methods were used. In the second study, the situation may be reversed, i.e. the cost of misclassifying a man as being free from metastases may be higher than the converse. Here one would prefer a lower value of θ.


Bibliography

[1] George Casella and Roger L. Berger. Statistical Inference. Thomson Learning, 2002.

[2] Dechang Chen, Huan Wang, Li Sheng, Matthew T. Hueman, Donald E. Henson, Arnold M. Schwartz, and Jigar A. Patel. An Algorithm for Creating Prognostic Systems for Cancer. Journal of Medical Systems, 40(7):160, July 2016.

[3] David Collett. Modelling survival data in medical research. Texts in statistical science. CRC Press, Boca Raton, Fl., 3rd edition, 2015.

[4] D. R. Cox. Regression Models and Life-Tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2):187–202, 1972.

[5] Trevor Hastie. gam: Generalized Additive Models, 2019. https://CRAN.R-project.org/package=gam.

[6] John D. Kalbfleisch and Ross L. Prentice. The Statistical Analysis of Failure Time Data. Wiley-Interscience, Hoboken, N.J, 2nd edition, September 2002.

[7] E. L. Kaplan and Paul Meier. Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association, 53(282):457–481, June 1958.

[8] John P. Klein and Melvin L. Moeschberger. Survival analysis: techniques for censored and truncated data. Statistics for biology and health. Springer, New York, NY, 2nd ed., corr. 3rd print edition, 2010. OCLC: 837651820.

[9] Kwan-Moon Leung, Robert M. Elashoff, and Abdelmonem A. Afifi. Censoring Issues in Survival Analysis. Annual Review of Public Health, 18(1):83–104, 1997.

[10] Xun Lin and Qiang Xu. A new method for the comparison of survival distributions. Pharmaceutical Statistics, 9(1):67–76, March 2010.

[11] James Mohler, Robert R. Bahnson, Barry Boston, J. Erik Busby, Anthony D'Amico, James A. Eastham, Charles A. Enke, Daniel George, Eric Mark Horwitz, Robert P. Huben, Philip Kantoff, Mark Kawachi, Michael Kuettel, Paul H. Lange, Gary MacVicar, Elizabeth R. Plimack, Julio M. Pow-Sang, Mack Roach, Eric Rohren, Bruce J. Roth, Dennis C. Shrieve, Matthew R. Smith, Sandy Srinivas, Przemyslaw Twardowski, and Patrick C. Walsh. Prostate Cancer. Journal of the National Comprehensive Cancer Network, 8(2):162–200, February 2010.

[12] N Mottet, R.C.N. van den Bergh, E Briers, L Bourke, P Cornford, M De Santis, S Gillessen, A Govorov, J Grummet, A.M. Henry, T.B. Lam, M.D. Mason, H.G. van der Poel, T.H. van der Kwast, O Rouviere, and T Wiegel. European Association of Urology Guidelines. 2018 Edition. volume presented at the EAU Annual Congress Copenhagen 2018. European Association of Urology Guidelines Office, 2018.

[13] NPCR. Prostatacancer, Nationell kvalitetsrapport for 2018. Technical report, September 2019.

[14] J M O'Donoghue, E Rogers, H Grimes, P McCarthy, M Corcoran, H Bredin, and H F Given. A reappraisal of serial isotope bone scans in prostate cancer. The British Journal of Radiology, 66(788):672–676, 1993.

[15] A. Rana, K. Karamanis, M. G. Lucas, and G. D. Chisholm. Identification of metastatic disease by T category, gleason score and serum PSA level in patients with carcinoma of the prostate. British Journal of Urology, 69(3):277–281, March 1992.

[16] Jennifer R. Rider, Fredrik Sandin, Ove Andren, Peter Wiklund, Jonas Hugosson, and Par Stattin. Long-term outcomes among noncuratively treated men according to prostate cancer risk category in a nationwide, population-based study. European Urology, 63(1):88–96, January 2013.

[17] Socialstyrelsen. Cancer i siffror 2018, June 2018.

[18] Karl Stattin, Fredrik Sandin, Ola Bratt, and Mats Lambe. The Risk of Distant Metastases and Cancer Specific Survival in Men with Serum Prostate Specific Antigen Values above 100 ng/ml. Journal of Urology, 194(6):1594–1600, December 2015.

[19] Terry M. Therneau and Patricia M. Grambsch. Modeling Survival Data: Extending the Cox Model. Springer Science & Business Media, November 2013.

[20] Marcus Westerberg, Ingela Franck Lissbrant, Jan Erik Damber, David Robinson, Hans Garmo, and Par Stattin. Temporal changes in survival in men with de novo metastatic prostate cancer: nationwide population-based study. Acta Oncologica (Stockholm, Sweden), pages 1–6, September 2019.

[21] Renata Zelic, Hans Garmo, Daniela Zugna, Par Stattin, Lorenzo Richiardi, Olof Akre, and Andreas Pettersson. Predicting Prostate Cancer Death with Different Pretreatment Risk Stratification Tools: A Head-to-head Comparison in a Nationwide Cohort Study. European Urology, October 2019.
