Model Selection and Model Averaging for Longitudinal Data with Application in Personalized Medicine

by

Hui Yang

Submitted in Partial Fulfillment of the
Requirements for the Degree
Doctor of Philosophy

Supervised by
Professor Hua Liang

Department of Biostatistics and Computational Biology
School of Medicine and Dentistry
University of Rochester
Rochester, New York
2013
Biographical Sketch
Hui Yang was born in Tianjin, People’s Republic of China, on August 11, 1983. In
2006, she received her Bachelor of Science degree in Statistics in the Department of
Statistics, School of Mathematical Sciences, at Nankai University. Prior to coming to
Rochester, she spent two years in Texas and received her Master of Science degree in
Mathematics in 2009 in the Department of Mathematics, College of Arts and Sciences,
at the University of North Texas.
Thereafter, Hui joined the Ph.D. program in the Department of Biostatistics and
Computational Biology, School of Medicine and Dentistry, at the University of Rochester.
In 2010, she received her Master of Arts degree in Statistics, and she began her Ph.D.
thesis research under the guidance of Professor Hua Liang in 2011.
Hui presented her work at the 2013 International Biometric Society Meeting in
Orlando, Florida and at the 2013 Joint Statistical Meetings in Montreal, Canada. She
is a member of the American Statistical Association and the International Biometric
Society.
Acknowledgments
I would first like to express my sincere gratitude to Professor Hua Liang for his
inspiration and constant guidance, support and encouragement throughout my Ph.D.
research. He not only made this thesis possible but also exemplified for me the
scientific spirit of a true scholar.
Many thanks also to the rest of my thesis committee members: Professor Hulin Wu,
Professor Tanzy Love, and Professor Jean-Philippe Couderc. I very much appreciate
their invaluable suggestions and comments, which helped improve this thesis.
I would also like to thank Professor Guohua Zou for his insight on my thesis research; Professor Michael McDermott for his advice and guidance in planning my pursuit of a Ph.D. degree; and Ms. Cheryl-Bliss Clark for her endless support and care.
I am very grateful to have spent wonderful years in the Department of Biostatistics
and Computational Biology. The graduate courses, lectures and professional activities
helped develop my knowledge and skills and sparked my professional motivation. I
enjoyed interacting with and learning from the faculty, staff and my student colleagues.
Their support and friendships enriched my Ph.D. study.
Finally, I would like to express my love and gratitude to my family, including my
wonderful parents, Xiulan Song and Qiuwei Yang. With their endless loving care, I am
blessed.
Abstract
Longitudinal data are sometimes collected with a large number of potential explanatory variables. To obtain better statistical inference and make more accurate predictions, model selection has become an important procedure in longitudinal studies. Nevertheless, inference based on a single model may ignore the uncertainty introduced by the selection procedure and therefore underestimate the variability. As an alternative, the model averaging approach combines estimates from different candidate models in the form of a weighted mean to reduce the effect of selection instability. There is an extensive literature on model selection and averaging for cross-sectional data, but more effort is needed for longitudinal data.
My thesis focuses on model selection and model averaging procedures in the lon-
gitudinal data context. We propose an AIC-type model selection criterion (∆AIC) in-
corporating the generalized estimating equations approach. Specifically, we consider
the difference between the quasi-likelihood of a candidate model and a narrow model
plus a penalty term in order to avoid the complicated integration calculation from the
quasi-likelihood. This criterion inherits the theoretical asymptotic properties of AIC.
In the second part, we develop a focused information criterion (QFIC) and a Frequentist model averaging (QFMA) procedure on the basis of a quasi-score function incorporating the generalized estimating equations approach. These methods are shown to have desirable asymptotic properties. We also conduct extensive simulation studies to examine the numerical performance of the proposed methods.
The third part aims to apply the focused information criterion to personalized medicine.
Based on the individual level information from clinical observations, demographics,
and genetics, this criterion provides a personalized predictive model to make a prog-
nosis and diagnosis for an individual subject. Consideration of the heterogeneity of
individuals helps to reduce prediction uncertainty and improve prediction accuracy.
Several real case studies from biomedical research are presented as illustrations.
Contributors and Funding Sources
This thesis was supervised by a dissertation committee: Professor Hua Liang (ad-
visor), Professor Hulin Wu, and Professor Tanzy Love from the Department of Bio-
statistics and Computational Biology, and Professor Jean-Philippe Couderc from the
Department of Medicine, Cardiology at the University of Rochester.
The content of this thesis mainly consists of three research projects conducted during the doctoral study at the University of Rochester. Two research papers are in preparation as follows:
Hui, Y., Peng, L., Guohua, Z., and Hua, L. Variable Selection and Model
Averaging for Longitudinal Data Incorporating GEE Approach, Submitted
to Statistica Sinica.
Hui, Y., Hua, L. Focused Information Criterion on Predictive Models in
Personalized Medicine, In preparation.
This thesis was advised by Professor Hua Liang. All work was completed by the student. The graduate study was supported by a fellowship from the University of Rochester Medical Center.
Table of Contents

1 Introduction
  1.1 Background and Motivation
  1.2 Estimation and Inference
  1.3 Model Selection and Averaging Approach
  1.4 Outline of the Thesis

2 AIC-Type Model Selection Criterion Incorporating the GEE Approach
  2.1 Introduction
  2.2 Quasi-likelihood-based ∆AIC
  2.3 Simulation Studies
  2.4 A Numerical Example
  2.5 Conclusion and Remarks

3 Focused Information Criterion and the Frequentist Model Averaging Procedure Incorporating the GEE Approach
  3.1 Introduction
  3.2 Model Selection and Averaging Procedures
  3.3 Simulation Studies
  3.4 A Numerical Example
  3.5 Conclusion and Remarks

4 Predictive Models in Personalized Medicine
  4.1 Introduction
  4.2 Prostate Cancer Case Study
  4.3 Relapsing Remitting Multiple Sclerosis Case Study
  4.4 Veteran's Lung Cancer Case Study
  4.5 Conclusion and Remarks

5 Discussion and Future Work

Bibliography

Appendix
  A.1 Regularity Assumptions
  A.2 Technical Lemmas
  A.3 Proof of Theorem 2.1
  A.4 Proof of Theorem 3.1
  A.5 Proof of Theorem 3.2
List of Tables

1.1 Structure of the Typical Longitudinal Dataset
2.1 ∆AIC - Candidate Models in Simulation Studies
2.2 ∆AIC - Frequencies of Candidate Models Selected by ∆AIC and QIC in Simulation I with True Exchangeable Correlation Structure EX(0.5)
2.3 ∆AIC - Frequencies of Candidate Models Selected by ∆AIC and QIC in Simulation I with True Autoregressive Correlation Structure AR(0.5)
2.4 ∆AIC - Frequencies of Candidate Models Selected by ∆AIC and QIC in Simulation II with True Mixed Correlation Structure MIX
2.5 WESDR - Statistical Inference under Full Model with IN, EX and AR Working Correlation Matrices
2.6 WESDR - ∆AIC Values and Ranks of Candidate Models
2.7 WESDR - QIC and ∆AIC Values of Models Selected by QIC
3.1 QFIC and QFMA - Candidate Models in Simulation I with Continuous Response
3.2 QFIC and QFMA - Candidate Models in Simulation II with Binary Response
3.3 A5055 - Statistical Inference under Full Model with IN, EX and AR Working Correlation Matrices
3.4 A5055 - ∆AIC and QFIC Values on 12 Nested Models Selected by ∆AIC
3.5 A5055 - QIC and QFIC Values on 12 Nested Models Selected by QIC
3.6 A5055 - QFIC Values and Coefficient Estimates on 12 Nested Models Selected by QFIC for CD4
3.7 A5055 - QFIC Values and Coefficient Estimates on 12 Nested Models Selected by QFIC for CD8
3.8 A5055 - QFIC Values and Coefficient Estimates on 12 Nested Models Selected by QFIC for Age
4.1 Prostate Cancer - Statistical Inference under Full Model
4.2 Prostate Cancer - Candidate Models
4.3 Prostate Cancer - Group Partition Criteria
4.4 Prostate Cancer - Group-Specific Percentages and Prediction Error Rates of Targeted Patients with Four Partition Criteria
4.5 RRMS - Statistical Inference under Full Model
4.6 RRMS - Candidate Models
4.7 RRMS - Group-Specific Percentages and Prediction Error Rates for the Targeted Patients at the Targeted Visit Days with Four Partition Criteria
4.8 RRMS - Personalized Predictive Models Concluded by the Personalized QFIC for Targeted Patients under Twelve Scenarios
4.9 Lung Cancer - Statistical Inference under Full Model
4.10 Lung Cancer - Candidate Models
List of Figures

2.1 WESDR - ∆AIC Values of Candidate Models
3.1 QFMA and QFIC - MSE and CP for Focused Parameter ζ in Simulation I on Continuous Responses with True Exchangeable, Autoregressive and Mixed Correlation Matrices EX(0.5), AR(0.5) and MIX
3.2 QFMA and QFIC - MSE and CP for Focused Parameter ζ in Simulation II on Binary Responses with True Exchangeable, Autoregressive and Mixed Correlation Matrices EX(0.5), AR(0.5) and MIX
3.3 A5055 - Prediction Error Rates of Model Selection and Model Averaging Procedures with Different Values of Weight Parameter κ
4.1 Prostate Cancer - Frequency of Candidate Models Selected by the Personalized FIC as the Personalized Predictive Models for 376 Patients
4.2 Prostate Cancer - Histograms of Tumor Volume and Age
4.3 RRMS - Empirical and Estimated Exacerbation Rates on Visit Days and Duration Time
4.4 RRMS - Frequencies of Candidate Models Selected by the Personalized QFIC as the Personalized Predictive Models for 822 Observations
4.5 RRMS - Frequencies of Candidate Models Selected by the Personalized QFIC as the Personalized Predictive Models for 50 Patients at Visit Days of 7, 31, 61 and 104
4.6 RRMS - Exacerbation Rate Predictions for Targeted Patients under the Single Predictive Model and the Twelve Personalized Predictive Models
4.7 Lung Cancer - Frequencies of Candidate Models Selected by the Personalized FIC as the Personalized Predictive Models for 137 Veterans
4.8 Lung Cancer - Kaplan-Meier Estimates and Karnofsky Score Histograms on Veterans in Groups G8, G14 and G16
4.9 Lung Cancer - Frequencies of Candidate Models Selected by the Personalized FIC as the Personalized Predictive Models for Veterans with Different Tumor Cell Types
1 Introduction
1.1 Background and Motivation
Longitudinal data, in the form of repeated measurements on the same individual over time or space, arise in a broad range of fields, including biomedical, pharmaceutical, social, and public health research. Instead of merely comparing the same characteristics across different individuals at a single time point, as in cross-sectional studies, longitudinal studies allow one to analyze changes in the responses, as well as their influencing factors, over a long period of time.
We acknowledge that, for each individual, multiple observations may be correlated, even if the individuals themselves are independent of each other. To obtain more reliable statistical inference, this possible correlation has to be taken into account. During the last three decades, a substantial literature has developed on the statistical analysis of longitudinal data, such as Harville (1977), Laird and Ware (1982), Liang and Zeger (1986), Prentice (1988), Zhao and Prentice (1990), Breslow and Clayton (1993), Qu et al. (2000), Diggle et al. (2002), and Fitzmaurice et al. (2009). Generally speaking, the analyses can be categorized into three model-fitting classes: marginal models, mixed effects models, and transition models. More details about these model-fitting approaches are discussed in Section 1.2.
Sometimes many potential explanatory variables are collected in a study. Including all of them may result in an overfitted model with poor predictive performance. Therefore, statistical analysis generally starts with choosing an appropriate model that includes only the important and necessary variables. There is an extensive literature on model selection, but it mainly focuses on classical linear regression models; see, for example, Buckland et al. (1997), Shao (1997), George (2000), and Miller (2002). More recently, some of the traditional model selection criteria have been extended to longitudinal data, especially for mixed effects models and for marginal models such as generalized estimating equations. These criteria are reviewed in Section 1.3.
Nevertheless, all these traditional model selection criteria are data-oriented and select a single final model with the best overall fit, regardless of which parameters are of interest. Hansen (2005) pointed out that "models should be evaluated based on their purposes." In other words, different models should be chosen for analyzing different individuals or subgroups, or for estimating different focused parameters, as mentioned in Hand and Vinciotti (2003). From this perspective, Claeskens and Hjort (2003) proposed the focused information criterion (hereafter "FIC"), which chooses the model with the smallest estimated mean square error of the focused parameter's estimate. That paper also developed the corresponding large-sample properties.
One concern about model selection procedures relates to overoptimistic confidence intervals. Inference based on a single model ignores the uncertainty introduced during the model selection process and therefore underestimates the variability, which may result in confidence intervals that are too narrow, as shown in Danilov and Magnus (2004) and Shen et al. (2004). As an alternative, the model averaging approach avoids model selection instability by averaging the estimates from different candidate models. It reduces the risk of selecting a poor model and improves the coverage probability of the corresponding confidence intervals. This strategy has been widely studied, including Draper (1995), Buckland et al. (1997), Burnham et al. (2002), Danilov and Magnus (2004), and Leeb and Pötscher (2006). Most of this work, however, takes a Bayesian perspective. Hjort and Claeskens (2003) proposed the Frequentist model averaging procedure (hereafter "FMA"), which uses weights obtained from certain model selection criteria. Section 1.3 also discusses the Frequentist model averaging framework for classical linear regression models.
FIC and the FMA procedure have been well studied in commonly used models, such as generalized linear models in Claeskens et al. (2006), Cox proportional hazards models in Hjort and Claeskens (2006), semiparametric partially linear models in Claeskens and Carroll (2007), and generalized additive partial linear models in Zhang and Liang (2011). Since longitudinal data have become more common, novel analysis approaches are in high demand to attain better statistical inference and make more accurate predictions. This demand motivates us to study model selection and model averaging procedures in the longitudinal data context. In particular, the defining characteristic of FIC, tailoring the final model to the targeted parameter, inspires us to apply FIC to predictive models in personalized medicine. Section 1.4 briefly describes the thesis work. All technical details are provided in the Appendix.
1.2 Estimation and Inference
Consider a longitudinal study with $n$ independent subjects. Subject $i$ has $m_i$ visits, where the $j$th visit collects the response $y_{ij}$ and a set of covariates $x_{ij} = (x_{1ij}, \ldots, x_{kij})$. Let $N = \sum_{i=1}^{n} m_i$ be the total number of observations in this study. Table 1.1 illustrates the structure of a typical longitudinal dataset.
In longitudinal data analysis, marginal models focus mainly on the explanatory variables' effects on the mean responses, regardless of the correlation structure within each subject. To fit marginal models, the generalized estimating equations (hereafter "GEE") approach, proposed by Liang and Zeger (1986), has been widely used. It provides consistent estimates while specifying only the first two marginal moments and a working correlation matrix. The corresponding estimation and inference are provided
Table 1.1: Structure of the Typical Longitudinal Dataset

Subject   Observation   Response     Explanatory Variables
1         1             y_{11}       x_{111}, ..., x_{k11}
1         2             y_{12}       x_{112}, ..., x_{k12}
...       ...           ...          ...
1         m_1           y_{1m_1}     x_{11m_1}, ..., x_{k1m_1}
...       ...           ...          ...
n         1             y_{n1}       x_{1n1}, ..., x_{kn1}
n         2             y_{n2}       x_{1n2}, ..., x_{kn2}
...       ...           ...          ...
n         m_n           y_{nm_n}     x_{1nm_n}, ..., x_{knm_n}
in Subsection 1.2.1. Instead of targeting the population level, mixed effects models allow the regression coefficients to vary randomly for each individual and therefore also provide subject-specific inference. Laird and Ware (1982) introduced linear mixed effects models (hereafter "LMM") for continuous longitudinal data, based on a normality assumption. Later, generalized linear mixed effects models (hereafter "GLMM") were proposed to fit categorical longitudinal data. Both classes of models arrive at estimates by integrating the random effects out of the joint likelihood. More details are presented in Subsection 1.2.2.
1.2.1 Generalized Estimating Equations

In the framework of the GEE approach, the mean of $y_{ij}$ is connected to $x_{ij}$ through a link function $g(\cdot)$ as follows:
$$E(y_{ij}) = \mu_{ij} \quad \text{and} \quad g(\mu_{ij}) = x_{ij}^\top \beta,$$
where $\beta = (\beta_1, \ldots, \beta_k)^\top$ is a vector of unknown parameters. The variance of $y_{ij}$ can be expressed as a known function $\nu(\cdot)$ of $\mu_{ij}$ with a nuisance parameter $\phi$:
$$\mathrm{var}(y_{ij}) = \phi\,\nu(\mu_{ij}).$$
Starting from these two basic assumptions, Wedderburn (1974) defined the log quasi-likelihood function $K(\mu_{ij}, \phi; y_{ij})$ through the relation
$$\frac{\partial K(\mu_{ij}, \phi; y_{ij})}{\partial \mu_{ij}} = \frac{y_{ij} - \mu_{ij}}{\phi\,\nu(\mu_{ij})}.$$
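As a concrete instance (a standard example, not specific to this thesis), taking a Poisson-type variance $\nu(\mu) = \mu$ with $\phi = 1$, the defining relation integrates to $K(\mu; y) = y\log\mu - \mu$ up to a term free of $\mu$. A quick numerical check of the derivative:

```python
import numpy as np

# Poisson-type variance: nu(mu) = mu, phi = 1, so dK/dmu = (y - mu)/mu,
# which integrates to K(mu; y) = y*log(mu) - mu (up to a term free of mu)
def K(mu, y):
    return y * np.log(mu) - mu

y, mu, h = 3.0, 2.0, 1e-6
numeric = (K(mu + h, y) - K(mu - h, y)) / (2 * h)   # central difference
analytic = (y - mu) / mu                            # the defining relation
```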
Let $y_i = (y_{i1}, \ldots, y_{im_i})^\top$ and $x_i = (x_{i1}, \ldots, x_{im_i})^\top$. In the context of longitudinal data, the log quasi-likelihood function can be defined similarly through
$$\frac{\partial Q(\beta, R_i(\alpha), \phi; y_i)}{\partial \beta} = D_i^\top V_i^{-1}(y_i - \mu_i),$$
where $\mu_i = E(y_i)$, $D_i = D_i(\beta) = \partial \mu_i / \partial \beta^\top$, $V_i = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$, $R_i(\alpha)$ is an $m_i \times m_i$ working correlation matrix, and $A_i$ is an $m_i \times m_i$ diagonal matrix with $j$th diagonal element $\nu(\mu_{ij})$. Let $\mathcal{D} = \{(y_1, x_1), \ldots, (y_n, x_n)\}$. The estimates of $\beta$ can be obtained by solving the corresponding quasi-score equations, known as the generalized estimating equations:
$$U(\beta, R(\alpha), \phi; \mathcal{D}) = \frac{\partial Q(\beta, R(\alpha), \phi; \mathcal{D})}{\partial \beta} = \sum_{i=1}^{n} D_i^\top V_i^{-1}(y_i - \mu_i) = 0.$$
The main advantage of the GEE estimate $\hat\beta_{gee}$ is its consistency under mild regularity conditions, even when the working correlation matrix is misspecified. It has also been shown that $\sqrt{n}(\hat\beta_{gee} - \beta)$ follows an asymptotic normal distribution with mean zero and variance-covariance matrix
$$V_{gee} = \lim_{n\to\infty} n \left(\sum_{i=1}^{n} D_i^\top V_i^{-1} D_i\right)^{-1} \left\{\sum_{i=1}^{n} D_i^\top V_i^{-1} \mathrm{cov}(y_i)\, V_i^{-1} D_i\right\} \left(\sum_{i=1}^{n} D_i^\top V_i^{-1} D_i\right)^{-1}.$$
By replacing $\mathrm{cov}(y_i)$ with $\{y_i - \mu_i(\hat\beta)\}\{y_i - \mu_i(\hat\beta)\}^\top$ and substituting $\alpha$, $\beta$, and $\phi$ by their $\sqrt{n}$-consistent estimates, $V_{gee}$ can be estimated consistently; the estimate is known as the sandwich estimate, or the robust variance-covariance estimate of White (1980).
Liang and Zeger (1986) also suggested several commonly used working correlation matrices: the independence working correlation matrix (IN) with $R_i = I_{m_i}$; the exchangeable working correlation matrix (EX) with $[R_i]_{jk} = \alpha$ $(j \neq k)$; the first-order autoregressive working correlation matrix (AR) with $[R_i]_{jk} = \alpha^{|j-k|}$ $(j \neq k)$; and the unstructured working correlation matrix (UN) with $[R_i]_{jk} = \alpha_{jk}$ $(j \neq k)$. Although the GEE approach provides robust estimates regardless of the choice of $R_i$, choosing one close to the true correlation increases efficiency.
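To make the estimating-equation machinery concrete, the following is a minimal sketch, not the thesis's implementation, of a Gaussian identity-link GEE fit with an exchangeable working correlation; the helper names (`exchangeable`, `ar1`, `gee_gaussian`) and the simple moment estimator for $\alpha$ are illustrative choices:

```python
import numpy as np

def exchangeable(m, alpha):
    """EX working correlation: 1 on the diagonal, alpha off the diagonal."""
    return (1 - alpha) * np.eye(m) + alpha * np.ones((m, m))

def ar1(m, alpha):
    """AR(1) working correlation: [R]_{jk} = alpha^|j-k|."""
    idx = np.arange(m)
    return alpha ** np.abs(idx[:, None] - idx[None, :])

def gee_gaussian(ys, xs, corr=exchangeable, iters=20):
    """Toy GEE for the Gaussian/identity case: alternate the beta update
    with a moment estimate of alpha from scaled residual cross-products."""
    p = xs[0].shape[1]
    beta, alpha = np.zeros(p), 0.0
    for _ in range(iters):
        lhs, rhs = np.zeros((p, p)), np.zeros(p)
        for y, x in zip(ys, xs):
            vinv = np.linalg.inv(corr(len(y), alpha))  # phi, A_i = 1 here
            lhs += x.T @ vinv @ x
            rhs += x.T @ vinv @ y
        beta = np.linalg.solve(lhs, rhs)
        num = den = 0.0
        cnt = tot = 0
        for y, x in zip(ys, xs):
            r = y - x @ beta
            den += r @ r
            num += r.sum() ** 2 - r @ r   # sum of off-diagonal products
            cnt += len(r) * (len(r) - 1)
            tot += len(r)
        alpha = (num / cnt) / (den / tot)  # moment estimator of alpha
    return beta, alpha

# simulate n subjects whose errors have exchangeable correlation 0.5
rng = np.random.default_rng(0)
n, m, beta_true = 200, 4, np.array([1.0, -2.0])
L = np.linalg.cholesky(exchangeable(m, 0.5))
xs = [np.column_stack([np.ones(m), rng.normal(size=m)]) for _ in range(n)]
ys = [x @ beta_true + L @ rng.normal(size=m) for x in xs]
beta_hat, alpha_hat = gee_gaussian(ys, xs)
```

Passing `corr=ar1` instead fits the same data under an AR(1) working structure; by the robustness property above, `beta_hat` remains consistent either way.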
1.2.2 Mixed Effects Models

For the longitudinal dataset above, if the response variables are continuous, the linear mixed effects models proposed by Laird and Ware (1982) take the form
$$y_i = x_i\beta + z_i b_i + \varepsilon_i, \quad b_i \sim N(0, \sigma^2 H), \quad \varepsilon_i \sim N(0, \sigma^2 K_i).$$
Here the design matrix $x_i$, composed of the $k$ fixed effects, links the unknown population parameters $\beta$ to the response $y_i$, while the design matrix $z_i$, including all $l$ random effects, links the unknown individual parameters $b_i$ to $y_i$. In particular, $z_i = (z_{i1}, \ldots, z_{im_i})^\top$, where $z_{ij} = (z_{1ij}, \ldots, z_{lij})$. By distinguishing fixed and random effects, LMM allows some parameters to be fixed while others vary randomly across subjects. The covariance of the repeated measurements can therefore be specified as
$$\mathrm{cov}(y_i) = V_i = \sigma^2 z_i H z_i^\top + \sigma^2 K_i.$$
Denote $Y_{N\times 1} = (y_1^\top, \ldots, y_n^\top)^\top$, $X_{N\times k} = (x_1^\top, \ldots, x_n^\top)^\top$, $Z_{N\times nl} = \mathrm{diag}(z_1, \ldots, z_n)$, $b_{nl\times 1} = (b_1^\top, \ldots, b_n^\top)^\top$, $\varepsilon_{N\times 1} = (\varepsilon_1^\top, \ldots, \varepsilon_n^\top)^\top$, $\mathbf{H}_{nl\times nl} = \mathrm{diag}(H, \ldots, H)$, and $\mathbf{K}_{N\times N} = \mathrm{diag}(K_1, \ldots, K_n)$. LMM can also be written in matrix notation as
$$Y = X\beta + Zb + \varepsilon, \quad b \sim N(0, \sigma^2\mathbf{H}).$$
If the variance components $H$ and $K_i$ are given, the estimates of $\beta$ and $b_i$ can be obtained by likelihood or generalized least squares methods as the best linear unbiased estimates, as shown in Robinson (1991):
$$\hat\beta = (X^\top V^{-1} X)^{-1} X^\top V^{-1} Y \quad \text{and} \quad \hat{b}_i = \sigma^2 H z_i^\top V_i^{-1}(y_i - x_i\hat\beta),$$
where $V = \mathrm{diag}(V_1, \ldots, V_n)$. If the variance matrices are unknown, $\beta$ and $b_i$ can be estimated by maximum likelihood or restricted maximum likelihood via the EM algorithm, as proposed by Dempster et al. (1977) and Laird and Ware (1982).
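A minimal numerical sketch of these two formulas for a random-intercept model, assuming the variance components are known (the variable names are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 5
sigma2, H, K = 1.0, 4.0, 1.0            # variance components, assumed known
beta_true = np.array([2.0, -1.0])

xs, zs, ys = [], [], []
for _ in range(n):
    x = np.column_stack([np.ones(m), rng.normal(size=m)])
    z = np.ones((m, 1))                  # random intercept only (l = 1)
    b = rng.normal(scale=np.sqrt(sigma2 * H), size=1)
    y = x @ beta_true + z @ b + rng.normal(scale=np.sqrt(sigma2 * K), size=m)
    xs.append(x); zs.append(z); ys.append(y)

# V_i = sigma2 * (H z_i z_i' + K I); GLS for beta, then BLUPs for b_i
Vis = [sigma2 * (H * (z @ z.T) + K * np.eye(m)) for z in zs]
A = sum(x.T @ np.linalg.solve(V, x) for x, V in zip(xs, Vis))
c = sum(x.T @ np.linalg.solve(V, y) for x, y, V in zip(xs, ys, Vis))
beta_hat = np.linalg.solve(A, c)
b_hat = [sigma2 * H * z.T @ np.linalg.solve(V, y - x @ beta_hat)
         for x, z, y, V in zip(xs, zs, ys, Vis)]
```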
To model categorical longitudinal data, generalized linear mixed effects models have also been studied; see, for example, Stiratelli et al. (1984), Schall (1991), and Breslow and Clayton (1993). Conditional on the individual effect, the mean of $y_{ij}$ is connected to $x_{ij}$ through a link function $g(\cdot)$:
$$E(y_{ij} \mid b_i) = \mu_{ij} \quad \text{and} \quad g(\mu_{ij}) = x_{ij}^\top \beta + z_{ij}^\top b_i.$$
The responses are conditionally independent, with conditional density of the form
$$f(y_{ij} \mid b_i; \beta, \sigma_0^2) = \exp\left[\frac{\omega_{ij}}{\sigma_0^2}\{y_{ij}\theta_{ij} - a(\theta_{ij})\} + c(y_{ij}, \sigma_0^2/\omega_{ij})\right],$$
where the $\omega_{ij}$ are known weights, as shown in McCullagh and Nelder (1989). The mean and canonical parameters are linked through $\mu = a'(\theta)$. When the conditional distribution is normal with the identity link, GLMM reduces to LMM. Though there is no closed form for the estimates in GLMM, the EM algorithm or Newton-Raphson methods can be applied instead. The Gibbs sampling of Zeger and Karim (1991) or the Laplace approximation of Breslow and Clayton (1993) may also be considered when the dimension of the random effects is relatively high.
1.3 Model Selection and Averaging Approach

In regression, given a large set of explanatory variables, we need to choose the appropriate ones to obtain better statistical inference and make more accurate predictions. Model selection therefore becomes a necessary procedure. Fortunately, a number of model selection criteria have been well studied for regression models.
One of the most widely used criteria is Akaike's information criterion (AIC), proposed in Akaike (1973) as an asymptotically unbiased estimate of the Kullback-Leibler information between a candidate model and the true model. Trading off goodness of fit against model complexity, the AIC value of a candidate model is defined as
$$\mathrm{AIC}_k = -2\log(L_k) + 2k,$$
with $L_k$ the likelihood and $k$ the number of explanatory variables in the candidate model.

From the Bayesian perspective, Schwarz (1978) proposed the Schwarz information criterion, also known as the Bayesian information criterion (BIC). The BIC value of a candidate model is defined as
$$\mathrm{BIC}_k = -2\log(L_k) + k\log(n),$$
where $n$ is the number of observations in the study. Depending on the sample size and the number of explanatory variables, BIC usually penalizes the complexity of a candidate model more strongly than AIC. The model with the smallest AIC or BIC value is chosen as the final model.
There are also criteria built on the residual sum of squares (hereafter "RSS"), such as the residual mean square (hereafter "RMS"), the squared multiple correlation coefficient $R^2$, and the adjusted $R^2$. Mallows (1973) proposed Mallows' $C_p$, defined as
$$C_p = \mathrm{RSS}_k/\hat\sigma^2 + 2k - n,$$
where $\hat\sigma^2$ is the RMS after regression on the complete set of explanatory variables. The final model, chosen as the one with the smallest criterion value, is in effect a compromise among the sample size, the effect sizes, and the degree of collinearity within the explanatory variables.
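The criteria above can be computed directly for a sequence of nested Gaussian linear models; the sketch below is illustrative only (here $k$ counts mean parameters, and the Gaussian log-likelihood is profiled over the error variance):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
# columns 0-2 carry signal; column 3 is pure noise
y = X[:, :3] @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

def rss_of(Xs):
    beta = np.linalg.lstsq(Xs, y, rcond=None)[0]
    return np.sum((y - Xs @ beta) ** 2)

sigma2_full = rss_of(X) / (n - X.shape[1])   # RMS from the full model

aic, bic, cp = {}, {}, {}
for k in range(1, X.shape[1] + 1):           # nested models: first k columns
    rss = rss_of(X[:, :k])
    # Gaussian log-likelihood profiled over the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    aic[k] = -2 * loglik + 2 * k
    bic[k] = -2 * loglik + k * np.log(n)
    cp[k] = rss / sigma2_full + 2 * k - n
```

With a strong signal, the underfitting models score much worse on all three criteria, while the noise column pays the penalty term.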
With the development of model fitting for longitudinal data, several traditional model selection criteria from classical linear regression have been extended to longitudinal analysis. This has occurred particularly for the GEE approach, reviewed in Subsection 1.3.1, and for LMM and GLMM, briefly presented in Subsection 1.3.2.
1.3.1 Model Selection for the GEE Approach

For the non-likelihood-based GEE approach, the traditional model selection criteria cannot be applied directly. Pan (2001a) proposed an Akaike-type information criterion for generalized estimating equations, called the quasi-likelihood under the independence model criterion (QIC). Replacing the likelihood component with the quasi-likelihood introduced in McCullagh and Nelder (1989), it is defined as
$$\mathrm{QIC}(R) = -2Q(\hat\beta_{gee}(R), I; \mathcal{D}) + 2\,\mathrm{trace}(\hat\Omega_I \hat{V}_{gee}).$$
Here, $\hat\beta_{gee}(R)$ and the sandwich estimate $\hat{V}_{gee}$ are both obtained with the working correlation matrix $R$. The logarithm of the quasi-likelihood, $Q(\hat\beta_{gee}(R), I; \mathcal{D})$, is evaluated under the working-independence assumption, as is $\hat\Omega_I$, the inverse of the model-based variance estimate of $\hat\beta_{gee}(I)$. With components analogous to those of AIC, a quasi-likelihood plus a penalty term, QIC picks the model with the smallest value.
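A minimal sketch of QIC for the Gaussian identity-link case with $\phi = 1$, fitted under working independence so that the GEE estimate is just least squares; `qic_gaussian` is a hypothetical helper, not Pan's code:

```python
import numpy as np

def qic_gaussian(ys, xs):
    """QIC for the Gaussian/identity case with phi = 1, under working
    independence: -2*Q(beta_hat; I) + 2*trace(Omega_I @ V_sandwich)."""
    X, y = np.vstack(xs), np.concatenate(ys)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    omega = X.T @ X                         # -d2Q/dbeta2 under independence
    bread = np.linalg.inv(omega)
    meat = sum(np.outer(x.T @ (yi - x @ beta), x.T @ (yi - x @ beta))
               for x, yi in zip(xs, ys))    # sum_i X_i' r_i r_i' X_i
    v_sandwich = bread @ meat @ bread
    q = -0.5 * np.sum((y - X @ beta) ** 2)  # Gaussian quasi-likelihood
    return -2 * q + 2 * np.trace(omega @ v_sandwich)

# a covariate with a real effect should be kept by QIC
rng = np.random.default_rng(3)
xs = [np.column_stack([np.ones(4), rng.normal(size=4)]) for _ in range(100)]
ys = [x @ np.array([1.0, 1.5]) + rng.normal(size=4) for x in xs]
qic_full = qic_gaussian(ys, xs)
qic_null = qic_gaussian(ys, [x[:, :1] for x in xs])
```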
For criteria based on the RSS, the GEE version of the residual sum of squares, $\mathrm{RSS}_{gee}$, can be extended simply as
$$\mathrm{RSS}_{gee} = \sum_{i=1}^{n}\sum_{j=1}^{m_i}\{y_{ij} - g^{-1}(x_{ij}^\top \hat\beta_{gee})\}^2.$$
Cantoni et al. (2005) also proposed a weighted residual sum of squares that accounts for heteroscedasticity across observations:
$$\mathrm{RSS}_w = \sum_{i=1}^{n}\sum_{j=1}^{m_i} c_{ij}\,\frac{\{y_{ij} - g^{-1}(x_{ij}^\top \hat\beta_{gee})\}^2}{\phi\,\nu(\hat\mu_{ij})},$$
where the $c_{ij}$ are weights chosen from experience. Cantoni et al. (2005) further extended Mallows' $C_p$ criterion to a generalized $C_p$, denoted $GC_p$, for the GEE approach:
$$GC_p = \sum_{i=1}^{n}\sum_{j=1}^{m_i}\frac{\{y_{ij} - g^{-1}(x_{ij}^\top \hat\beta_{gee})\}^2}{\phi\,\nu(\hat\mu_{ij})} - N + 2\,\mathrm{trace}(M^{-1}N^*),$$
with $M = n^{-1}\sum_{i=1}^{n} D_i^\top V_i^{-1} D_i$ and $N^* = n^{-1}\sum_{i=1}^{n} D_i^\top A_i^{-1} D_i$ (written $N^*$ here to avoid a clash with the total number of observations $N$).
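Under working independence in the Gaussian identity-link case (with $\phi = \nu = 1$), the two matrices in the trace term coincide, so the trace reduces to the number of regression parameters and the criterion behaves like Mallows' $C_p$; a hypothetical sketch:

```python
import numpy as np

def gcp_indep(ys, xs):
    """Generalized C_p under working independence for the Gaussian/identity
    case (phi = nu = 1): the two matrices in the trace term coincide, so
    trace reduces to the number of regression parameters."""
    X, y = np.vstack(xs), np.concatenate(ys)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta) ** 2)
    return rss - len(y) + 2 * X.shape[1]

# the full model should beat an underfit (intercept-only) model
rng = np.random.default_rng(7)
xs = [np.column_stack([np.ones(4), rng.normal(size=4)]) for _ in range(100)]
ys = [x @ np.array([1.0, 1.5]) + rng.normal(size=4) for x in xs]
gcp_full = gcp_indep(ys, xs)
gcp_null = gcp_indep(ys, [x[:, :1] for x in xs])
```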
1.3.2 Model Selection for LMM and GLMM

LMM and GLMM involve two types of model selection issues, targeting inference at the population and individual levels separately: (1) identification of the significant fixed and random effects, and (2) identification of the significant fixed effects only, when the random effects are not subject to selection, as mentioned in Dziak and Li (2007).

Liu et al. (1999) proposed the predicted residual sum of squares (PRESS), based on a leave-one-out cross-validation experiment:
$$\mathrm{PRESS} = \sum_{i=1}^{n}\left\|y_i - x_i\hat\beta_{(-i)}\right\|^2,$$
where $\hat\beta_{(-i)}$ is estimated with the $i$th subject deleted from the analysis. PRESS can be used for selection of the fixed effects only.
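A direct sketch of PRESS with whole subjects (not single observations) deleted, computed here for a toy linear marginal fit; the names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 30, 4
xs = [np.column_stack([np.ones(m), rng.normal(size=m)]) for _ in range(n)]
ys = [x @ np.array([1.0, 2.0]) + rng.normal(size=m) for x in xs]

def beta_without(i):
    """OLS fit with subject i held out (a stand-in for any LMM fit)."""
    X = np.vstack([x for j, x in enumerate(xs) if j != i])
    y = np.concatenate([yj for j, yj in enumerate(ys) if j != i])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# leave one SUBJECT (all of its m observations) out at a time
press = sum(np.sum((ys[i] - xs[i] @ beta_without(i)) ** 2) for i in range(n))

# in-sample RSS for comparison: PRESS is larger, reflecting honest
# prediction error rather than goodness of fit
X_all, y_all = np.vstack(xs), np.concatenate(ys)
beta_all = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
rss_all = np.sum((y_all - X_all @ beta_all) ** 2)
```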
When focusing on individual-level inference, Vaida and Blanchard (2005) proposed the conditional AIC:
$$\mathrm{cAIC} = -2\log(cL) + 2\rho.$$
Here, $cL$ is the model likelihood conditional on $b_i = \hat{b}_i$, and $\rho$ is the effective degrees of freedom, defined as $\rho = \mathrm{trace}(H)$, where $H$ here denotes the hat matrix mapping the observations $y$ to the fitted values $\hat{y}$.
1.3.3 Frequentist Model Averaging

In this subsection, we take the classical linear regression model as an example to illustrate the framework of the FMA procedure proposed in Hjort and Claeskens (2003). The notation introduced here is limited to this subsection.

Suppose we have the linear regression model
$$Y_{n\times 1} = X_{n\times p}\,\beta_{p\times 1} + Z_{n\times q}\,\gamma_{q\times 1} + \varepsilon_{n\times 1}.$$
The design matrix $X$ includes all of the explanatory variables that are certain to be included in the final model, whereas $Z$ is composed of the variables about which we are uncertain. The unknown parameters $\beta$ and $\gamma$ link $X$ and $Z$ to the response $Y$. Here, we assume that the matrix $(X, Z)$ has full column rank $p + q$. This framework allows us to start with a "narrow" model that includes all of the necessary explanatory variables in $X$, and then to add one or more of the additional variables in $Z$. Each subset $S$ of $\{1, \ldots, q\}$ represents one candidate model.
Suppose we are interested in the unknown quantity µ. Denote the estimate obtained
from candidate model S by:
µS = µ(βS, γS).
During the traditional model selection procedure, one final model is chosen from the
corresponding 2q candidate models based on a model selection criterion. We then make
the statistical inference and prediction under this final model. However, Hjort and
Claeskens (2003) have demonstrated the overoptimistic nature of the corresponding
confidence intervals with respect to coverage probability. This excess optimism pro-
vides the motivation to propose the model averaging procedure with the compromise
estimate, taking the form of:
µ =∑
S
ωSµS.
The choice of the weights ω_S distinguishes the Frequentist from the Bayesian perspective. Instead of using weights based on prior information, as in the Bayesian model averaging procedure, the Frequentist model averaging procedure uses weights that are determined entirely by the data. Hjort and Claeskens (2003) also provide a partial list of particularly attractive weights. Specifically, when the weight function is an indicator function of the final model selected by a certain model selection criterion, the model averaging estimate coincides with the model selection estimate associated with that criterion.
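A toy numeric sketch of the compromise estimate (all candidate-model estimates and weights below are hypothetical numbers), showing that an indicator weight function recovers the corresponding model selection estimate:

```python
# Compromise estimate mu_hat = sum_S w_S * mu_hat_S over candidate models.
# Model names, estimates, and weights are hypothetical illustrations only.

mu_hat = {"narrow": 1.10, "add_z1": 1.25, "add_z2": 1.18, "full": 1.31}

# data-driven (frequentist) weights summing to one
w_fma = {"narrow": 0.40, "add_z1": 0.25, "add_z2": 0.20, "full": 0.15}
mu_fma = sum(w_fma[s] * mu_hat[s] for s in mu_hat)

# indicator ("hard core") weights: all mass on the single selected model
w_sel = {"narrow": 0.0, "add_z1": 1.0, "add_z2": 0.0, "full": 0.0}
mu_sel = sum(w_sel[s] * mu_hat[s] for s in mu_hat)  # equals mu_hat["add_z1"]
```

The averaged estimate lies between the candidate estimates, while the indicator weights reduce model averaging to model selection.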
1.4 Outline of the Thesis
In this thesis, model selection and model averaging procedures are further studied in the longitudinal data context. We mainly consider estimates incorporating the GEE approach.
In Chapter 2, we propose and study another quasi-likelihood based AIC-type model
selection criterion incorporating the GEE approach, ∆AIC, by considering the quasi-
likelihood difference between the candidate model and a narrow model plus a penalty
term. Theoretical asymptotic properties are derived and proven. As a byproduct, we
also give a theoretical justification of the equivalence in distribution between the quasi-
likelihood ratio test and the Wald test incorporating the GEE approach. Simulation studies and a real data analysis are then performed to demonstrate the better performance of this approach.
We also extend FIC and the FMA procedure to longitudinal data and propose the
quasi-likelihood-based focused information criterion (QFIC) and Frequentist model av-
eraging (QFMA) procedure incorporating the GEE approach in Chapter 3. The impact
of various weight functions on QFMA estimates is examined and a suggestion for the choice of weights is given from a numerical perspective. Simulation studies are also
performed to provide evidence of the superiority of the proposed procedures. The pro-
cedure is further applied to a real data example.
FIC tries to select the model with the minimum estimated mean square error of a
targeted parameter’s estimation. In Chapter 4, we redefine the personalized FIC and
apply it in predictive models in personalized medicine. Based on individual level infor-
mation from clinical observations, demographics, genetics, etc., this criterion can pro-
vide a personalized predictive model for a targeted patient and make a corresponding
personalized prognosis and diagnosis. Consideration of the population’s heterogene-
ity helps reduce prediction uncertainty and improve prediction accuracy. Several case
studies from biomedical research, involving not just longitudinal but also survival and cross-sectional data, are analyzed as illustrations.
Due to the popularity of LMM and GLMM in longitudinal studies, the extension of
the model averaging procedure to LMM and GLMM with their corresponding optimal
weights’ choice can be a direction for future research and is discussed in Chapter 5.
2 AIC-Type Model Selection
Criterion Incorporating the GEE
Approach
2.1 Introduction
The example motivating our study in this chapter is from the Wisconsin Epidemiolog-
ical Study of Diabetic Retinopathy (WESDR, Klein et al., 1984), where 996 insulin-
taking younger-onset diabetics in southern Wisconsin were examined for the presence
of diabetic retinopathy in both their left and right eyes. The objective of our study is to
determine the main risk factors of diabetic retinopathy from thirteen potential factors,
which were collected at the same time. In this analysis, the strong correlation between
the two eyes of each participant must be considered.
As we mentioned in Chapter 1, LMM has been widely used for analyzing longi-
tudinal data. As a likelihood-based approach, it relies on the assumption that data are
drawn from certain known distributions, which in reality may be unknown. Even if the
distributions are specified, it is still sometimes very challenging to derive the complete
likelihood, especially for non-Gaussian data. Instead of specifying the complicated joint distribution of the responses, Liang and Zeger (1986) developed the GEE approach, which provides consistent estimates by specifying only the first two marginal moments and a working correlation matrix.
Subsection 1.3.1 lists certain model selection criteria that have been extended to
the GEE approach. Pan (2001a)'s QIC can be easily computed using well developed statistical packages in S-plus/R and SAS. It is worth pointing out, however, that the neglect of a significant term in the QIC's derivation and the reliance on working independence leave QIC without theoretical asymptotic properties. Cantoni et al. (2005)'s GCp criterion uses weighted quadratic predictive risk as a measure of a model's adequacy for prediction. It, however, requires bootstrap
sampling or Monte Carlo simulation that can be computationally expensive. Another
extended cross-validation approach based on expected predictive bias was suggested
by Pan (2001b). This approach received little attention due to the computational re-
quirement as well. On the other hand, Fu (2003) proposed the penalized generalized
estimating equations for variable selection. Wang and Qu (2009) proposed a BIC-type
model selection criterion based on quadratic inference function. They both require an
extra searching algorithm for tuning parameters.
This chapter aims to propose another quasi-likelihood-based AIC-type model se-
lection criterion for longitudinal data incorporating the GEE approach. We choose a
narrow model as a benchmark and consider the quasi-likelihood difference between a
candidate model and the narrow model, thereby avoiding the complicated calculation
of the whole quasi-likelihood and making the implementation feasible and easier. The
idea is inspired by the local misspecification framework setting in Hjort and Claeskens
(2003). Under certain regularity conditions, the proposed criterion is shown to have
similar asymptotic properties as AIC.
In this chapter, Section 2.2 proposes the new model selection criterion ∆AIC and
provides corresponding theoretical insights. Simulation studies and the WESDR real
data study are carried out in Sections 2.3 and 2.4. In the final section, we conclude with
some remarks.
2.2 Quasi-likelihood-based ∆AIC
Claeskens and Hjort (2008) pointed out that, among all the candidate models, when the true model lies at a fixed distance from the narrow model and the sample size is large, the dominating bias always favors the full model as the final model. This motivates us to study and propose a model selection criterion for longitudinal data incorporating the GEE approach under a local misspecification framework, similar to that studied in Hjort and Claeskens (2003).
2.2.1 Local Misspecification Framework
Consider the longitudinal data introduced in Chapter 1. We start with the full model, where the covariates fall into two categories: p certain covariates, which are definitely included in the final model, and q uncertain ones. The corresponding unknown coefficients are therefore composed of the certain coefficients θ = (θ_1, · · · , θ_p) and the uncertain coefficients γ = (γ_1, · · · , γ_q), written as:

β = (θ, γ).
Any candidate model S can therefore be written as a special case of the full model:

β_S = (θ, γ_S, 0_{S^c}),

where γ_S is the q_S × 1 subvector of γ indexed by S, 0_{S^c} is the corresponding (q − q_S) × 1 subvector of the q × 1 zero vector, and S ⊂ {1, · · · , q}. When S = N, the narrow model,

β_N = (θ, 0),

includes the certain covariates only. The true model is defined in a framework similar to Hjort and Claeskens (2003):

β_0 = (θ_0, γ_0) = (θ_0, δ/√n).
Here δ = (δ_1, · · · , δ_q) measures how far the true model is from the narrow model in each of the directions 1, · · · , q, at order O(1/√n), and some δ_i's can be 0. Under this scenario, the squared model biases and the model variances are both of size O(1/n), so that bias and variance are balanced in the large sample approximation.
To simplify the discussion, in the context of the GEE approach, we ignore the treatment of the nuisance parameters α and φ and assume the consistency of α̂(β, φ) and φ̂(β) and the boundedness of ∂α̂(β, φ)/∂φ, as presented in Liang and Zeger (1986).
Thus, the quasi-score of the full model, evaluated at (θ_0, 0), can be written as:

U = (U_1, U_2)^T = ( ∂Q(θ, γ; D)/∂θ, ∂Q(θ, γ; D)/∂γ )^T |_{θ=θ_0, γ=0}.
The corresponding (p + q) × (p + q) quasi-likelihood information matrix is denoted by:

Σ = var_N(U) = [ Σ_{00}  Σ_{01}
                 Σ_{10}  Σ_{11} ]   and   Σ^{-1} = [ Σ^{00}  Σ^{01}
                                                     Σ^{10}  Σ^{11} ],

where Σ^{11} = (Σ_{11} − Σ_{10} Σ_{00}^{-1} Σ_{01})^{-1}. Let π_S be the q_S × q projection matrix mapping γ to γ_S, with q_S being the size of S. The quasi-score of the candidate model S, evaluated at (θ_0, 0), can be written as:
U_S = (U_1, U_{2,S})^T = (U_1, π_S U_2)^T.
The corresponding quasi-likelihood information matrix has dimension (p + q_S) × (p + q_S):

Σ_S = [ Σ_{00}      Σ_{01} π_S^T
        π_S Σ_{10}  π_S Σ_{11} π_S^T ]   and   (Σ_S^{11})^{-1} = π_S (Σ^{11})^{-1} π_S^T.
2.2.2 Quasi-likelihood-based ∆AIC
Let (θ̂_S, γ̂_S) be the GEE estimates under candidate model S. Recall that the AIC value of model S can be obtained as:

−2 ∑_{i=1}^{n} log f(y_i; θ̂_S, γ̂_S) + 2|S|,

where |S| is the number of parameters in model S. Similarly, the quasi-likelihood-based AIC value of model S can be calculated through:

QAIC_{n,S} = −2 ∑_{i=1}^{n} Q(θ̂_S, γ̂_S; y_i) + 2|S|.
As we mentioned earlier, due to the complicated correlation structure of longitudinal
data, QAIC is generally very difficult to implement, especially the part with the inte-
gration involving the inverse of the working covariance matrix in the quasi-likelihood
component. Nevertheless in the previous framework, every candidate model includes
all the certain parameters θ, of which the narrow model is composed. By subtracting
the QAIC value of the narrow model from that of every candidate model, we can avoid calculating the log quasi-likelihood directly. Thus, we propose the AIC-type quasi-likelihood-based model selection criterion for longitudinal data incorporating the GEE approach as:

∆AIC_{n,S} = QAIC_{n,S} − QAIC_{n,N}.
The following theorem gives the specific form and the large sample behavior of ∆AICn,S.
Theorem 2.1 Under the Regularity Assumptions given in the Appendix, as n goes to infinity,

∆AIC_{n,S} =_d −n γ̂^T (Σ^{11})^{-1} π_S^T Σ_S^{11} π_S (Σ^{11})^{-1} γ̂ + 2|S/N| →_d −χ²_{|S/N|}(λ_S) + 2|S/N|,

with non-centrality parameter λ_S = n γ_0^T (Σ^{11})^{-1} π_S^T Σ_S^{11} π_S (Σ^{11})^{-1} γ_0. The degrees of freedom, |S/N|, is the number of covariates in the candidate model S but not in the narrow model. Here and below, “=_d” denotes equality in distribution and “→_d” denotes convergence in distribution.
Theorem 2.1 indicates that in the large sample context, the behavior of ∆AIC_{n,S} is fully dictated by the full model's GEE estimates γ̂. Also, the limiting behaviors of all ∆AIC_{n,S} in principle determine the limits of all the candidate models' selection probabilities through:

P(∆AIC_n selects model S | γ̂) → P(∆AIC selects model S | γ_0).
As shown in the proof of Theorem 2.1 in the Appendix, the subtraction cancels the complicated component in QAIC, and the remaining terms involve only the uncertain parameters and the quasi-likelihood information matrix, which can be consistently estimated incorporating the GEE approach. In particular, the estimates of Σ^{11} and Σ_S^{11} = {π_S (Σ^{11})^{-1} π_S^T}^{-1} can be obtained from the sandwich estimate Σ̂_gee. Consistent with AIC, the model with the smallest ∆AIC value is selected as the final model.
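The computable quadratic form above can be sketched as follows for q = 2 uncertain covariates; the values of n, γ̂, and the estimated block Σ̂^{11} are hypothetical stand-ins for full-model GEE output:

```python
# Delta-AIC via the quadratic form of Theorem 2.1 for all subsets of q = 2
# uncertain covariates. All numeric inputs are hypothetical.

def inv2(M):
    # inverse of a 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

n = 100
gamma_hat = [0.30, 0.01]                 # full-model estimates of uncertain coefficients
Sigma11 = [[0.04, 0.01], [0.01, 0.05]]   # estimated block Sigma^{11}

K = inv2(Sigma11)                        # (Sigma^{11})^{-1}
u = [sum(K[i][j] * gamma_hat[j] for j in range(2)) for i in range(2)]

delta_aic = {}
for S in [(), (0,), (1,), (0, 1)]:
    if len(S) == 0:                      # narrow model: quadratic form is zero
        a = 0.0
    elif len(S) == 1:
        j = S[0]                         # Sigma_S^{11} reduces to 1 / K[j][j]
        a = u[j] ** 2 / K[j][j]
    else:                                # full model: form reduces to gamma' K gamma
        a = sum(gamma_hat[i] * K[i][j] * gamma_hat[j]
                for i in range(2) for j in range(2))
    delta_aic[S] = -n * a + 2 * len(S)   # penalty 2|S/N|

best = min(delta_aic, key=delta_aic.get)  # model with the smallest Delta-AIC
```

With these hypothetical numbers the criterion keeps the first uncertain covariate and drops the weak second one, as the negligible gain in fit does not offset the penalty.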
Remark 2.1 Due to the lack of a likelihood, there are no likelihood ratio tests available incorporating the GEE approach for hypothesis testing, as mentioned in Lipsitz and Fitzmaurice (2009). Nevertheless, the availability of the quasi-likelihood and the previous theorem motivate us to consider quasi-likelihood ratio tests. Consider the following hypotheses:

H_0 : γ = 0   vs   H_a : γ ≠ 0.
The null model can be viewed as the narrow model with only the certain parameter vector θ. The alternative model can be viewed as the full model, which includes θ and also the uncertain parameter vector γ. The quasi-likelihood ratio test statistic between the alternative and null models, and therefore between the full and narrow models, can be written as:

QLR_n = 2[Q(θ̂, γ̂; D) − Q(θ̂_N, 0; D)]
      = −QAIC_{n,F} + 2|F| + QAIC_{n,N} − 2|N|
      = −∆AIC_{n,F} + 2|F/N|
      =_d n γ̂^T (Σ^{11})^{-1} γ̂.
This shares the same form as the quadratic-style Wald test statistic. Thus, Theorem 2.1 simultaneously gives a theoretical justification of the equivalence in distribution between the quasi-likelihood ratio test and the Wald test incorporating the GEE approach.
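A numeric sketch of the resulting test with a single uncertain coefficient (q = 1); the estimate and variance term below are hypothetical:

```python
import math

# QLR_n = n * gamma_hat^2 / sigma11 for q = 1, referred under H0 to a
# chi-square distribution with one degree of freedom. Inputs are hypothetical.

n = 200
gamma_hat = 0.12      # full-model GEE estimate of the uncertain coefficient
sigma11 = 0.9         # estimated variance term: var of sqrt(n) * gamma_hat

qlr = n * gamma_hat ** 2 / sigma11
# chi-square(1) upper tail via the error function: P(X > x) = 1 - erf(sqrt(x/2))
p_value = 1.0 - math.erf(math.sqrt(qlr / 2.0))
```

Here the statistic is not significant at the 5% level, so the data do not reject the narrow model.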
2.3 Simulation Studies
In this section, we investigate the performance of our proposed model selection crite-
rion ∆AIC. To compare with QIC, we use the same model setting as in Pan (2001a),
where the longitudinal simulation studies have the moderate sample size of n = 50 or
100 subjects and m = 3 visit times for each subject.
Four potential explanatory covariates x_1, x_2, x_3 and x_4 are considered in the study. They are generated from:

x_{1ij} ~ i.i.d. Bernoulli(1/2),   x_{2ij} = j − 1,   and   x_{3ij}, x_{4ij} ~ i.i.d. Uniform(−1, 1),

where x_{3ij} and x_{4ij} are also independent of x_{1ij}. The binary response y_{ij} has the
conditional expectation µ_{ij} = E(y_{ij} | x_{1ij}, x_{2ij}, x_{3ij}, x_{4ij}), which is connected to the covariates through:

logit(µ_{ij}) = β_0 + β_1 x_{1ij} + β_2 x_{2ij} + β_3 x_{3ij} + β_4 x_{4ij},

where i ∈ {1, · · · , n} and j ∈ {1, · · · , m}. The coefficients are set to be:

β_0 = −β_1 = −β_2 = 0.25   and   β_3 = β_4 = 0.
Therefore, the model with an intercept term (int.), x_1, and x_2 is the true model. The narrow model includes only int. and x_1, to be consistent with Pan (2001a). The final model is then selected from the remaining 2³ = 8 candidate models listed in Table 2.1.
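The marginal data-generating model above can be sketched as follows; for brevity the sketch draws the three visit responses independently, whereas the study itself induces EX(0.5) or AR(0.5) within-subject dependence through a copula:

```python
import math
import random

# Marginal simulation design: n subjects, m = 3 visits, logit link.
# NOTE: responses here are independent across visits; the actual study adds
# within-subject correlation via a copula, which this sketch omits.

random.seed(1)
n, m = 50, 3
beta = (0.25, -0.25, -0.25, 0.0, 0.0)    # (intercept, x1, x2, x3, x4)

data = []
for i in range(n):
    for j in range(1, m + 1):
        x1 = float(random.random() < 0.5)          # Bernoulli(1/2)
        x2 = float(j - 1)                          # x2ij = j - 1
        x3 = random.uniform(-1.0, 1.0)
        x4 = random.uniform(-1.0, 1.0)
        eta = beta[0] + beta[1] * x1 + beta[2] * x2 + beta[3] * x3 + beta[4] * x4
        mu = 1.0 / (1.0 + math.exp(-eta))          # inverse logit
        y = float(random.random() < mu)            # binary response
        data.append((i, j, x1, x2, x3, x4, y))

print(len(data))  # n * m = 150 rows
```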
We first use the copula package of Yan (2007) to generate two types of correlation structures among the three response observations of each subject: exchangeable and autoregressive with correlation coefficient ρ = 0.5, denoted by EX(0.5) and AR(0.5). Based on one thousand simulation replications, the frequencies of the candidate models selected by ∆AIC and QIC as the final model under these two scenarios
Table 2.1: ∆AIC - Candidate Models in Simulation Studies
Model Covariates Model Covariates
m1 - Full int. x1 x2 x3 x4 m5 int. x1 x3 x4
m2 int. x1 x2 x3 m6 int. x1 x3
m3 int. x1 x2 x4 m7 int. x1 x4
m4 - True int. x1 x2 m8 - Narrow int. x1
incorporating the GEE approach with three different working correlation matrices, IN,
EX, and AR, are listed in Tables 2.2 and 2.3.
Generally speaking, Tables 2.2 and 2.3 both show the better performance of ∆AIC compared to QIC, in terms of the relatively higher frequencies of selecting the true model as the final model among all eight candidates. In particular, with the correctly specified working correlation matrices, i.e., EX for the EX(0.5) scenario and AR for the AR(0.5) scenario, ∆AIC works observably better than QIC. With the IN working correlation matrix, QIC turns out to be comparable with ∆AIC. These patterns also reveal the bias that QIC incurs by simplifying under the working independence model and ignoring the complicated part in the derivation.
One more point is worth mentioning in Table 2.3. Under the true autoregressive correlation structure AR(0.5), when the sample size is small, n = 50, both ∆AIC and QIC have slightly higher frequencies of choosing the narrow model as the final model. As the sample size becomes large, n = 100, ∆AIC and QIC both work better, with much higher frequencies of selecting the true model. This may be because the true autoregressive correlation structure is more complicated than the exchangeable one; the added complexity may require a relatively larger sample size to arrive at a better estimation.
As we showed above, the first simulation study assumes the simple predictable cor-
Table 2.2: ∆AIC - Frequencies of Candidate Models Selected by ∆AIC and QIC in
Simulation I with True Exchangeable Correlation Structure EX(0.5)
n Criterion R m1 m2 m3 m4 m5 m6 m7 m8
50 ∆AIC IN 19 80 81 375 10 72 75 288
EX 17 86 80 371 11 85 77 273
AR 23 88 78 367 11 75 81 277
QIC IN 20 77 80 364 10 73 74 302
EX 28 83 91 343 13 80 80 282
AR 20 81 88 354 14 75 78 290
100 ∆AIC IN 21 101 108 542 7 29 31 161
EX 17 107 105 544 8 27 25 167
AR 15 105 110 540 8 34 22 166
QIC IN 20 102 111 541 6 31 31 158
EX 24 117 119 515 9 32 28 156
AR 19 107 113 535 9 33 27 157
Table 2.3: ∆AIC - Frequencies of Candidate Models Selected by ∆AIC and QIC in
Simulation I with True Autoregressive Correlation Structure AR(0.5)
n Criterion R m1 m2 m3 m4 m5 m6 m7 m8
50 ∆AIC IN 17 68 66 335 14 69 67 364
EX 17 80 70 322 17 77 78 339
AR 15 83 64 323 18 80 84 333
QIC IN 19 66 69 333 14 75 68 356
EX 23 76 70 322 16 74 76 343
AR 20 76 73 315 20 80 78 338
100 ∆AIC IN 17 100 113 473 12 38 35 212
EX 14 101 107 480 7 48 37 206
AR 20 87 113 486 12 50 35 197
QIC IN 16 98 115 475 11 41 35 209
EX 20 109 123 452 13 44 35 204
AR 20 101 121 462 14 43 33 206
relation structures among each subject’s repeated response measurements. In many real
longitudinal studies, however, it is impossible to know the true underlying correlation
structure pattern. Thus, the scenario with more complicated correlation structures be-
comes more interesting. Here we generate the longitudinal data, MIX, 30% of which
come from EX(0.5), 30% from AR(0.5), and the rest have the following specified corre-
lation structure:
R(α) = [ 1.0  0.4  0.1
         0.4  1.0  0.7
         0.1  0.7  1.0 ].

Again, ∆AIC and QIC are applied to this scenario for model selection incorporating
the GEE approach. The results of one thousand simulation replications are shown in
Table 2.4.
Table 2.4 shows a similar pattern to the previous tables. With the small sample
size, n = 50, QIC works better with IN, while ∆AIC works better with EX and AR.
When the sample size is large, n = 100, ∆AIC works better with all three working
correlation matrices, though it is close to QIC under IN. This also shows the better
large sample properties of ∆AIC compared to QIC.
2.4 A Numerical Example
We now apply our proposed model selection criterion ∆AIC to the WESDR dataset
mentioned in the beginning of this chapter. This dataset was also examined by Barnhart
and Williamson (1998) and Pan (2001a). Here, we consider only 720 individuals who
have the complete information from examinations in both eyes. Therefore, there are
1440 total observations with a possibly natural correlation between the two eyes of
each individual. The binary response, retinpy, indicates diabetic retinopathy (1 -
presence and 0 - absence). The study aims to determine the main risk factors for diabetic
retinopathy from thirteen potential ones.
Table 2.4: ∆AIC - Frequencies of Candidate Models Selected by ∆AIC and QIC in
Simulation II with True Mixed Correlation Structure MIX
n Criterion R m1 m2 m3 m4 m5 m6 m7 m8
50 ∆AIC IN 20 73 67 308 14 81 75 362
EX 13 71 70 316 16 87 61 366
AR 19 70 69 313 11 89 70 359
QIC IN 21 68 68 317 19 87 68 352
EX 26 76 76 303 24 92 68 335
AR 23 76 72 305 24 89 72 339
100 ∆AIC IN 12 94 92 497 9 56 36 204
EX 15 90 98 496 8 58 35 200
AR 15 83 95 506 8 54 32 207
QIC IN 14 91 93 496 10 54 36 206
EX 27 92 94 478 14 58 37 200
AR 24 100 94 476 15 57 33 201
Based on the univariate analysis and the goodness-of-fit tests conducted in Barnhart
and Williamson (1998), we consider only eight risk factors that were found marginally
significant for the response: iop, intraocular pressure; diab, duration of diabetes (in
years); gh, glycosylated hemoglobin level; sbp, systolic blood pressure; dbp, diastolic
blood pressure; bmi, body mass index; pr, pulse rate (beats/30 seconds); and prot,
proteinuria (0 - absence and 1 - presence). The model concluded by Barnhart and
Williamson (1998) includes diab, gh, dbp, bmi, diab2, and bmi2, and is used as the
narrow model in our setting, as in Pan (2001a). The full model, therefore, includes the eight risk factors and the two quadratic terms of diab and bmi. It can be written as:

logit(µ_{ij}) = β_0 + β_1 diab_{ij} + β_2 gh_{ij} + β_3 dbp_{ij} + β_4 bmi_{ij} + β_5 (diab_{ij})² + β_6 (bmi_{ij})² + β_7 iop_{ij} + β_8 sbp_{ij} + β_9 pr_{ij} + β_{10} prot_{ij},

with i = 1, · · · , 720 and j = 1, 2, where µ_{ij} is the conditional expectation of retinpy_{ij}. We thus consider four uncertain risk factors: iop, sbp, pr and prot, resulting in 2⁴ = 16
candidate models. From these candidates, the final model is selected by ∆AIC and
QIC.
Due to the natural correlation in this dataset, the full marginal logistic regression model is fitted incorporating the GEE approach with three different working correlation matrices: IN, EX and AR. The corresponding coefficient estimates and p-values are listed in Table 2.5 in the order of their significance. Table 2.5 shows that, other than the risk factors in the narrow model, the uncertain covariate proteinuria (prot) also has a relatively small p-value (0.04). Moreover, the three working correlation matrices provide very similar estimates. For the sake of simplicity, we only incorporate the GEE approach with the exchangeable working correlation matrix.
The values of ∆AIC and the corresponding ranks for all 16 candidate models are
listed in Table 2.6 and plotted in Figure 2.1. As shown in Table 2.6, the top model
concluded by ∆AIC includes one uncertain risk factor, prot (statistically significant as
shown in Table 2.5). The next four models are either the narrow model or models adding one more risk factor besides prot. These patterns further show the importance of prot.
Table 2.5: WESDR - Statistical Inference under Full Model with IN, EX and AR
Working Correlation Matrices
IN EX AR
Covariate Estimate P-value Estimate P-value Estimate P-value
int. -1.7e-00 0.0e-00 -1.7e-00 0.0e-00 -1.7e-00 0.0e-00
diab -2.6e-01 0.0e-00 -2.6e-01 0.0e-00 -2.6e-01 0.0e-00
diab2 7.0e-03 0.0e-00 7.0e-03 0.0e-00 7.0e-03 0.0e-00
bmi2 6.1e-01 4.4e-05 6.1e-01 4.2e-05 6.1e-01 4.2e-05
gh -1.5e-01 2.5e-05 -1.5e-01 2.6e-05 -1.5e-01 2.6e-05
bmi -6.0e-01 3.3e-03 -6.0e-01 3.4e-03 -6.0e-01 3.4e-03
dbp -2.3e-02 2.8e-02 -2.3e-02 2.5e-02 -2.3e-02 2.5e-02
prot -7.1e-01 4.0e-02 -7.1e-01 4.0e-02 -7.1e-01 4.0e-02
iop -4.3e-02 1.5e-01 -4.0e-02 1.5e-01 -4.0e-02 1.5e-01
sbp 1.1e-02 1.8e-01 1.1e-02 1.8e-01 1.1e-02 1.8e-01
pr -1.7e-02 2.0e-01 -1.8e-02 2.0e-01 -1.8e-02 2.0e-01
Table 2.6: WESDR - ∆AIC Values and Ranks of Candidate Models
Rank ∆AIC Uncertain Covariates Rank ∆AIC Uncertain Covariates
1 -0.733 prot 9 0.555 iop
2 -0.274 prot, iop 10 0.615 prot, sbp, pr
3 0.000 N 11 0.765 pr
4 0.009 prot, sbp 12 1.223 iop, pr
5 0.019 prot, pr 13 1.731 sbp
6 0.171 prot, iop, sbp 14 2.239 iop, sbp
7 0.311 prot, iop, pr 15 2.407 sbp, pr
8 0.491 prot, iop, sbp, pr 16 2.727 iop, sbp, pr
Figure 2.1: WESDR - ∆AIC Values of Candidate Models

[Figure: ∆AIC values (ranging from about −0.5 to 2.5) plotted against the 16 candidate models, with the narrow and full models marked on the horizontal axis.]
For comparison with Pan (2001a), we also list the ∆AIC values for the top four candidate models selected by QIC, along with the full and narrow models, in Table 2.7. Pan concluded that the top four candidate models are very close in terms of their similar QIC values: 1185.5, 1185.7, 1185.8 and 1186.0. By implementing ∆AIC, however, we do see relatively large differences: −0.274, −0.733, 0.171 and 0.009. Thus, ∆AIC suggests one single model as the final model, which includes: diab, gh, dbp, bmi, diab², bmi² and prot.
We also use the quasi-likelihood ratio test described in Remark 2.1 and the ANOVA method referred to in Højsgaard et al. (2006) to compare the relatively better model concluded by QIC, narrow + iop + prot, with the final model selected by ∆AIC, narrow + prot, where narrow = diab + gh + dbp + bmi + diab² + bmi². Both approaches give a test statistic of 2.15 with a p-value of 0.1429, indicating an insignificant difference between these two models and a preference for the simpler final model chosen by our ∆AIC.
Table 2.7: WESDR - QIC and ∆AIC Values of Models Selected by QIC
QIC ∆AIC
Uncertain Covariates Rank IN EX AR IN EX AR
prot, iop 1 1185.5 1185.1 1185.1 -0.291 -0.274 -0.274
prot 2 1185.7 1185.7 1185.7 -0.717 -0.733 -0.733
prot, iop, sbp 3 1185.8 1185.4 1185.4 0.190 0.171 0.171
prot, sbp 4 1186.0 1186.0 1186.0 0.045 0.009 0.009
prot, iop, sbp, pr 8 1186.5 1186.0 1186.0 0.541 0.491 0.491
N 10 1189.8 1189.8 1189.8 0.000 0.000 0.000
2.5 Conclusion and Remarks
The key point of our proposed approach is to consider the difference between a candidate model and a narrow model, obtained via a Taylor expansion, in order to avoid calculating the integration involved in the quasi-likelihood. The resulting criterion, ∆AIC, can be easily implemented by fitting the full model with a penalty term. This advantage becomes more critical for discrete response variables. Although our criterion is built
under the AIC framework, we can analogously define a BIC-type quasi-likelihood-based model selection criterion, ∆BIC, for longitudinal data incorporating the GEE approach by simply changing the penalty term. As Yang (2005) mentioned, BIC aims to consistently select the true model, which is required to be in the set of all the candidate models, while AIC aims to minimize the distance, in terms of likelihood, between the selected model and the data. ∆AIC and ∆BIC therefore have characteristics similar to AIC and BIC. Other criteria can also be extended to longitudinal data in a similar way.
It is worth mentioning that the choice of a narrow model is necessary for implementing ∆AIC. We suggest prefitting the full model first and picking the covariates with “small” p-values as the certain covariates, thereby composing a narrow model. Other covariates can also be included based on interest and experience. Both theoretical and numerical evidence suggests that the choice of a narrow model only lightly influences the results. When the signals of some covariates become weaker, smaller models are favored by both ∆AIC and QIC.
Two issues arise concerning model selection for longitudinal data incorporating the
GEE approach: variable selection and working correlation matrix selection. Currently
∆AIC is limited to variable selection only. More work needs to be done for the selection
of a working correlation matrix.
3 Focused Information Criterion and
the Frequentist Model Averaging
Procedure Incorporating the GEE
Approach
3.1 Introduction
In clinical studies, longitudinal data are commonly used to analyze the long-term effects of explanatory variables on response variables. One example is the AIDS clinical study A5055, which aimed to predict the long-term antiviral treatment responses of HIV-1 infected patients by considering pharmacokinetics, drug adherence and susceptibility. In this study, each patient was visited multiple times over 24 weeks after entry. Therefore, correlations among the repeated measurements of each patient are expected and have to be accounted for in the analysis.
As we mentioned in Chapter 1, all existing model selection criteria incorporating the GEE approach are data-oriented and result in a single model with good overall properties. Claeskens and Hjort (2003) therefore proposed the model selection criterion FIC to select different models for different targeted parameters. At the same time, Hjort and Claeskens (2003) also proposed the FMA procedure to reduce the risk of choosing a poor model and thereby improve the coverage probability of confidence intervals. This chapter aims to propose the quasi-likelihood-based focused information criterion (QFIC) and the Frequentist model averaging (QFMA) procedure for longitudinal data incorporating the GEE approach. QFIC and the QFMA procedure inherit certain asymptotic properties from FIC and the FMA procedure due to the similarities between the quasi-likelihood and the likelihood.
Section 3.2 introduces QFIC and the QFMA procedures and constructs the modified
confidence intervals based on QFMA estimation. Simulation studies and the A5055
real data study are performed in Section 3.3 and Section 3.4 respectively. In the final
section, we conclude with additional remarks.
3.2 Model Selection and Averaging Procedures
3.2.1 Focused Information Criterion
As mentioned in Subsection 2.2.2, denote the GEE estimates under candidate model S by (θ̂_S, γ̂_S). The corresponding asymptotic distribution of the GEE estimates is given in the following proposition.
Proposition 3.1 Under the local misspecification framework and the Regularity Assumptions given in the Appendix, as n goes to infinity, we have:

√n (θ̂ − θ_0, γ̂)^T →_d Σ^{-1} (Σ_{01}δ + M_1, Σ_{11}δ + M_2)^T ~ N_{p+q}( Σ^{-1} (Σ_{01}, Σ_{11})^T δ, Σ^{-1} ),

where (M_1, M_2)^T ~ N_{p+q}(0, Σ). In particular, under candidate model S:

√n (θ̂_S − θ_0, γ̂_S)^T →_d Σ_S^{-1} (Σ_{01}δ + M_1, π_S Σ_{11}δ + π_S M_2)^T ~ N_{p+q_S}( Σ_S^{-1} (Σ_{01}, π_S Σ_{11})^T δ, Σ_S^{-1} ).
To simplify the notation, let W = Σ^{11}(M_2 − Σ_{10} Σ_{00}^{-1} M_1). By Proposition 3.1, the estimates of the uncertain parameters under the full model satisfy:

√n γ̂ →_d δ + W = ∆ ~ N_q(δ, Σ^{11}).

In particular, under candidate model S:

√n γ̂_S →_d Σ_S^{11} π_S (Σ^{11})^{-1} (δ + W) = Σ_S^{11} π_S (Σ^{11})^{-1} ∆.
Assume that the focused parameter can be written as a function of the model parameters, denoted by ζ = ζ(θ, γ), with continuous partial derivatives in a neighborhood of ζ_0 = ζ(θ_0, γ_0). Denote:

ω = Σ_{10} Σ_{00}^{-1} ∂ζ/∂θ − ∂ζ/∂γ,   τ_0² = (∂ζ/∂θ)^T Σ_{00}^{-1} (∂ζ/∂θ),   and   D_S = π_S^T Σ_S^{11} π_S (Σ^{11})^{-1}.
The following theorem provides the limiting distribution of the focused parameter’s
estimate incorporating the GEE approach under candidate model S.
Theorem 3.1 Under the Regularity Assumptions given in the Appendix, as n goes to infinity,

√n (ζ̂_S − ζ_0) →_d Ω_S = Ω_0 + ω^T δ − ω^T D_S ∆,

where

Ω_0 = (∂ζ/∂θ)^T Σ_{00}^{-1} M_1 ~ N(0, τ_0²).

The limiting variable Ω_S follows a normal distribution with mean ω^T (I_q − D_S) δ and variance τ_0² + ω^T π_S^T Σ_S^{11} π_S ω.
The limiting mean square error follows from Theorem 3.1 as:

mse(Ω_S) = τ_0² + ω^T π_S^T Σ_S^{11} π_S ω + [ω^T (I_q − D_S) δ]²,

where the parameters τ_0, ω, Σ_S^{11}, D_S and δ can all be estimated incorporating the GEE approach under the full model. Therefore, we propose the quasi-likelihood-based focused information criterion (QFIC) for longitudinal data incorporating the GEE approach as:

QFIC_{n,S} = 2 ω̂^T π_S^T Σ̂_S^{11} π_S ω̂ + n [ω̂^T (I_q − D̂_S) γ̂]².
In the large sample context, the behavior of QFIC is related not only to the uncertain parameters γ, but is also influenced by ω, which is determined by the focused parameter ζ. Therefore, QFIC chooses different models for estimating different focused parameters. The model with the smallest QFIC value, and therefore the smallest estimated mean square error of the focused parameter's estimate, is selected as the final model.
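A minimal sketch of the QFIC computation for q = 2 uncertain covariates, with hypothetical full-model quantities (γ̂ and Σ̂^{11}) and a hypothetical focus gradient ω:

```python
# QFIC_{n,S} = 2 w' pi_S' Sigma_S^{11} pi_S w + n [w'(I - D_S) gamma]^2
# evaluated over all subsets of q = 2 uncertain covariates.
# All numeric inputs below are hypothetical.

def inv2(M):
    # inverse of a 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

n = 100
gamma_hat = [0.30, 0.01]                 # full-model estimates of uncertain coefficients
Sigma11 = [[0.04, 0.01], [0.01, 0.05]]   # estimated block Sigma^{11}
omega = [1.0, 0.5]                       # gradient vector for the chosen focus

K = inv2(Sigma11)                        # (Sigma^{11})^{-1}
u = [sum(K[i][j] * gamma_hat[j] for j in range(2)) for i in range(2)]

def qfic(S):
    if len(S) == 0:                      # narrow model: D_S = 0, no variance term
        bias = sum(o * g for o, g in zip(omega, gamma_hat))
        return n * bias ** 2
    if len(S) == 2:                      # full model: D_S = I, no bias term
        var = sum(omega[i] * Sigma11[i][j] * omega[j]
                  for i in range(2) for j in range(2))
        return 2.0 * var
    j = S[0]                             # single covariate: Sigma_S^{11} = 1 / K[j][j]
    var = omega[j] ** 2 / K[j][j]
    resid = list(gamma_hat)
    resid[j] -= u[j] / K[j][j]           # j-th component of (I - D_S) gamma
    bias = sum(o * r for o, r in zip(omega, resid))
    return 2.0 * var + n * bias ** 2

scores = {S: qfic(S) for S in [(), (0,), (1,), (0, 1)]}
best = min(scores, key=scores.get)       # model minimizing estimated MSE
```

Changing ω (i.e., changing the focus) changes the bias/variance trade-off and hence, in general, the selected model.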
3.2.2 The Frequentist Model Averaging Procedure
A model selection procedure aims to select a single final model, either capturing the overall information from the data, such as ∆AIC, or minimizing the mean square error of a focused parameter's estimate, such as QFIC. The inference based on this final model, however, ignores the uncertainty introduced by the selection process and results in overly optimistic confidence intervals. The FMA procedure, as an alternative to model selection, addresses this problem and provides relatively robust statistical inference.
Similarly, the quasi-likelihood-based Frequentist model averaging (QFMA) estimate of the focused parameter ζ can be defined as the weighted average of the estimates obtained from all the candidate models incorporating the GEE approach:

ζ̂(γ̂) = ∑_S p(S | γ̂) ζ̂_S,

where p(· | ·) is a weight function satisfying ∑_S p(S | γ̂) = 1, with each individual weight taking values in [0, 1]. The following theorem shows the asymptotic properties of the model averaging estimate ζ̂.
Theorem 3.2 Under the Regularity Assumptions given in the Appendix, as n goes to infinity,

√n (ζ̂ − ζ_0) →_d Ω = Ω_0 + ω^T δ − ω^T δ̂(∆),

where

δ̂(∆) = ∑_S p(S | ∆) D_S ∆.

The mean and variance of the limiting variable Ω are given by:

E(Ω) = ω^T δ − ω^T E[δ̂(∆)]   and   var(Ω) = τ_0² + ω^T var[δ̂(∆)] ω.
Motivated by Theorem 3.2, we modify the traditional confidence intervals of the focused parameter ζ, based on the model averaging estimate ζ̂, as:

low_n = ζ̂ − ω̂^T [γ̂_n − δ̂(γ̂_n)/√n] − z_k τ̂/√n,
up_n  = ζ̂ − ω̂^T [γ̂_n − δ̂(γ̂_n)/√n] + z_k τ̂/√n,

where z_k is the kth standard normal quantile and τ̂/√n is the consistent estimate of the standard deviation of ζ̂ under the full model, which can be written as:

τ̂/√n = n^{-1/2} (τ̂_0² + ω̂^T Σ̂^{11} ω̂)^{1/2}.

By shifting the center of the confidence intervals away from ζ̂ by the amount ω̂^T[γ̂_n − δ̂(γ̂_n)/√n], and widening them to τ̂/√n instead of τ̂_S/√n, thereby accounting for the model selection uncertainty, the coverage probability is shown to be consistent with the nominal coverage probability by the following theorem.
Theorem 3.3 Under the Regularity Assumptions given in the Appendix, as n goes to infinity,

\[ \Pr(\mathrm{low}_n \le \zeta_0 \le \mathrm{up}_n) \to 2\Phi(z_k) - 1, \]

where Φ(·) is the standard normal distribution function. In particular,

\[ Z_n = \Big[\sqrt{n}\,(\hat\zeta - \zeta_0) - \hat\omega^\top\big\{\hat\Delta_n - \hat\delta(\hat\Delta_n)\big\}\Big]\big/\hat\tau \;\xrightarrow{d}\; \big(\Omega_0 + \omega^\top\delta - \omega^\top\Delta\big)\big/\tau, \]

and the limit has a standard normal distribution. Theorem 3.2 can be easily proven by the simultaneous convergence in distribution

\[ \Big(\sqrt{n}\,(\hat\zeta - \zeta_0),\; \hat\Delta_n\Big) \;\xrightarrow{d}\; \Big(\Omega_0 + \omega^\top\delta - \omega^\top\hat\delta(\Delta),\; \Delta\Big). \]
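The shifted-and-widened interval of Theorem 3.3 can be sketched numerically. All numeric inputs below are hypothetical stand-ins for quantities estimated from a fitted full model (they are not output of an actual GEE fit):

```python
import numpy as np
from statistics import NormalDist

def qfma_ci(zeta_hat, omega, gamma_hat, delta_hat, tau_hat, n, level=0.95):
    """Model-averaging CI: center = zeta_hat - omega'(gamma_hat - delta_hat/sqrt(n)),
    half-width = z * tau_hat / sqrt(n), with z the normal quantile for `level`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    omega = np.asarray(omega, dtype=float)
    shift = omega @ (np.asarray(gamma_hat, dtype=float)
                     - np.asarray(delta_hat, dtype=float) / np.sqrt(n))
    half = z * tau_hat / np.sqrt(n)
    return zeta_hat - shift - half, zeta_hat - shift + half

# Hypothetical inputs with two uncertain parameters
low, up = qfma_ci(zeta_hat=2.0, omega=[0.5, -0.5],
                  gamma_hat=[0.1, 0.2], delta_hat=[0.3, 0.1],
                  tau_hat=1.2, n=100)
print(low, up)
```

Note how the center is shifted away from ζ̂ itself, which is what restores the nominal coverage after model averaging.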
36
3.2.3 The Choices of Weight Functions
The model averaging estimate takes the form of a weighted combination of the estimates from all the candidate models. It reduces to a model selection estimate under a specific weight function. In particular, the final model selected by ∆AIC, S_∆AIC, corresponds to an indicator weight function, called the hard-core weight function:

\[ \hat\zeta_{\Delta\mathrm{AIC}} = \sum_S I(S = S_{\Delta\mathrm{AIC}})\,\hat\zeta_S = \hat\zeta_{S_{\Delta\mathrm{AIC}}}. \]

Likewise, the final model selected by QIC can be written as

\[ \hat\zeta_{\mathrm{QIC}} = \sum_S I(S = S_{\mathrm{QIC}})\,\hat\zeta_S = \hat\zeta_{S_{\mathrm{QIC}}}, \]

and the final model selected by QFIC as

\[ \hat\zeta_{\mathrm{QFIC}} = \sum_S I(S = S_{\mathrm{QFIC}})\,\hat\zeta_S = \hat\zeta_{S_{\mathrm{QFIC}}}. \]
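The indicator ("hard-core") weights above can be sketched directly: putting weight 1 on the criterion-best model makes the averaged estimate collapse to the selection estimate. The criterion values and focused-parameter estimates below are purely illustrative:

```python
import numpy as np

def hard_core_weights(criterion_values):
    """Indicator weight function: weight 1 on the model with the smallest
    criterion value (e.g. QIC or QFIC), weight 0 on all the others."""
    w = np.zeros(len(criterion_values))
    w[int(np.argmin(criterion_values))] = 1.0
    return w

crit = [204.0, 210.0, 213.0, 216.0]          # hypothetical criterion values
zeta_hats = np.array([1.9, 2.1, 2.0, 2.2])   # hypothetical focused-parameter estimates
w = hard_core_weights(crit)
print(w @ zeta_hats)  # equals the estimate from the selected (best) model: 1.9
```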
Buckland et al. (1997), however, suggested that the weights in model averaging estimates be proportional to exp(f_S − |S|), where f_S is the maximized log-likelihood of candidate model S. For longitudinal data fitted with the GEE approach, the weights should thus be proportional to exp(Q_S − |S|), with Q_S the quasi-likelihood of candidate model S. The corresponding smoothed weight functions for ∆AIC and QIC can be represented as

\[ \frac{\exp\big(-\Delta\mathrm{AIC}_{n,S}/2\big)}{\sum_T \exp\big(-\Delta\mathrm{AIC}_{n,T}/2\big)} \qquad \text{and} \qquad \frac{\exp\big(-\mathrm{QIC}_{n,S}/2\big)}{\sum_T \exp\big(-\mathrm{QIC}_{n,T}/2\big)}. \]

It can also be beneficial to use the information carried by QFIC through a smoothed QFIC weight. The weight function is similar to the one suggested in Hjort and Claeskens (2003):

\[ \frac{\exp\Big(-\dfrac{\kappa}{2}\,\dfrac{\mathrm{QFIC}_{n,S}}{\hat\omega^\top\hat\Sigma_{11}\hat\omega}\Big)}{\sum_T \exp\Big(-\dfrac{\kappa}{2}\,\dfrac{\mathrm{QFIC}_{n,T}}{\hat\omega^\top\hat\Sigma_{11}\hat\omega}\Big)}, \qquad \kappa \ge 0. \]
Here, κ is a weight parameter bridging the weight function from uniform (κ close to 0) to hard core (large κ). When the performances of all the candidate models are very close, we would choose κ so that the weight function is close to uniform. When certain candidate models behave much better than others, a larger κ is the better option: it pushes the weight function toward hard core, placing higher weights on the better-behaved models and lower weights on the poorly behaved ones.
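This bridging behavior of κ can be sketched as follows; the QFIC values and the scalar ω̂⊤Σ̂₁₁ω̂ are hypothetical placeholders, not output of an actual fit:

```python
import numpy as np

def smoothed_qfic_weights(qfic_values, omega_sigma_omega, kappa):
    """Smoothed QFIC weights proportional to
    exp(-kappa/2 * QFIC_S / (omega' Sigma11 omega)), normalized over the
    candidate models. kappa = 0 gives uniform weights; a large kappa
    approaches the hard-core (indicator) weight on the QFIC-best model."""
    q = np.asarray(qfic_values, dtype=float)
    scores = -0.5 * kappa * q / omega_sigma_omega
    scores -= scores.max()        # stabilize the exponentials
    w = np.exp(scores)
    return w / w.sum()

qfic = [2.956, 2.847, 2.797, 15.50]   # hypothetical QFIC values, smaller is better
print(smoothed_qfic_weights(qfic, omega_sigma_omega=1.0, kappa=0))    # uniform
print(smoothed_qfic_weights(qfic, omega_sigma_omega=1.0, kappa=500))  # nearly hard core
```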
3.3 Simulation Studies
This section investigates the performance of our proposed QFIC and QFMA procedures for longitudinal data fitted with the GEE approach. The model selection procedures using QFIC, the ∆AIC proposed in Chapter 2, and Pan (2001a)'s QIC (denoted P-QFIC, P-∆AIC and P-QIC) are compared to their smoothed-weight model averaging counterparts (denoted S-QFIC, S-∆AIC and S-QIC). In particular, we calculate the coverage probabilities (hereafter "CP") of the estimated 95% confidence intervals (hereafter "CIs") and the estimated mean square errors (hereafter "MSE") for the targeted parameter. As a reference, inference based on the full model (hereafter "Full") is reported as well. Specifically, we consider discrete and continuous responses with n = 100 subjects, each with m = 3 visits.
3.3.1 Continuous Response Variable
The continuous response variable is generated from

\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i, \qquad i = 1, \dots, n. \]

The covariates x_{1i} = (x_{1i1}, x_{1i2}, x_{1i3})⊤, x_{2i} = (x_{2i1}, x_{2i2}, x_{2i3})⊤, and x_{3i} = (x_{3i1}, x_{3i2}, x_{3i3})⊤ are independently generated from a multivariate normal distribution with mean (1, 1, 1)⊤ and identity covariance matrix. The error term ε_i = (ε_{i1}, ε_{i2}, ε_{i3})⊤ is independent of the covariates and is generated from a three-dimensional normal distribution with mean 0 and marginal variance 1. Section 2.3 introduces three types of correlation structures among the three repeated response measurements of each subject: two simple predictable structures, EX(0.5) and AR(0.5), and a complex one, MIX. Here, we consider the same correlation structures for ε_i. The narrow model contains only int. and x1, with (β_0, β_1) = (2, 1). The coefficients of the other two covariates are set to (β_2, β_3) = (2, −2)/√(mn). In total, four candidate models are considered, as listed in Table 3.1.
Table 3.1: QFIC and QFMA - Candidate Models in Simulation I with Continuous Response

Model          Covariates         Model           Covariates
m1 (Full)      int. x1 x2 x3      m2              int. x1 x3
m3             int. x1 x2         m4 (Narrow)     int. x1
Here, we consider a single focused parameter:

\[ \zeta = -2\beta_0 + 2\beta_1 - 0.5\beta_2 + 0.5\beta_3. \]

The focused parameter is not limited to linear combinations of the coefficients; we also tried the quadratic form β_1² + β_2 and observed a similar pattern. All models are fitted with the GEE approach under three different working correlation matrices: IN, EX and AR. The simulation results, based on one thousand replications, are presented in Figure 3.1 in terms of the MSE and the CP of the estimated 95% CIs for the focused parameter.
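The data-generating scheme described above can be sketched as follows. This is a minimal sketch: the MIX correlation structure and the GEE fitting step are omitted, and the seed and array layout are arbitrary choices of this illustration:

```python
import numpy as np

def corr_matrix(kind, rho=0.5, m=3):
    """True/working correlation: EX (exchangeable) or AR (autoregressive, AR-1)."""
    if kind == "EX":
        return rho * np.ones((m, m)) + (1 - rho) * np.eye(m)
    if kind == "AR":
        idx = np.arange(m)
        return rho ** np.abs(idx[:, None] - idx[None, :])
    raise ValueError(kind)

rng = np.random.default_rng(0)
n, m = 100, 3
beta = np.array([2.0, 1.0, 2.0 / np.sqrt(m * n), -2.0 / np.sqrt(m * n)])

# Covariates x_1i, x_2i, x_3i ~ N((1,1,1)', I), independently per subject
X = rng.normal(loc=1.0, size=(n, 3, m))       # axes: (subject, covariate, visit)
# Errors with true EX(0.5) correlation across the m visits
eps = rng.multivariate_normal(np.zeros(m), corr_matrix("EX"), size=n)
y = beta[0] + np.einsum("k,nkt->nt", beta[1:], X) + eps
print(y.shape)  # (100, 3)
```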
Figure 3.1: QFMA and QFIC - MSE and CP for Focused Parameter ζ in Simulation I on Continuous Responses with True Exchangeable, Autoregressive and Mixed Correlation Matrices EX(0.5), AR(0.5) and MIX. [Six panels: MSE (top row) and CP (bottom row) under EX(0.5), AR(0.5) and MIX, comparing Full, S-QFIC, S-∆AIC, S-QIC, P-QFIC, P-∆AIC and P-QIC with working correlations IN, EX and AR.]
Regardless of the working correlation matrix, the three MSE plots in the upper panel of Figure 3.1 consistently show that the model averaging procedures outperform the model selection procedures, with relatively smaller MSE values. They also show that the model selection criterion QFIC performs better than ∆AIC and QIC. Comparing ∆AIC to QIC, S-∆AIC behaves similarly to S-QIC, while P-∆AIC works better than P-QIC. This shows both the superiority of ∆AIC over QIC and the stability of the averaging versions relative to the selection versions. As a reference, the full model does provide unbiased estimates, but at the price of a large increase in variability; it therefore has a relatively larger MSE.
We now compare the procedures under different working correlation matrices. In the first two predictable correlation scenarios, the GEE estimates with the correct working correlation matrix, i.e., EX for EX(0.5) and AR for AR(0.5), always have the smallest MSE values across all the model selection and averaging procedures. This pattern is consistent with the efficiency of the true correlation pointed out in Liang and Zeger (1986). In the third scenario (MIX), AR gives the smallest MSE, which may be because AR is closer to the true correlation structure than EX. In all three scenarios, IN yields the largest MSE due to the high correlation coefficient in the simulation setting.
The three CP plots in the bottom panel of Figure 3.1 likewise indicate the better performance of the modified CIs from the averaging procedures compared to the traditional CIs from the selection procedures. The CPs of all the modified CIs are very close to 95%, whereas the CPs of the traditional CIs can sometimes drop dramatically, down to 90%. In contrast to the differing CP behaviors of the traditional CIs from P-QFIC, P-∆AIC and P-QIC, the three modified CIs perform similarly, again showing the more stable behavior of the model averaging procedure.

In the CP plots, AR and EX work very similarly in the first two correlation scenarios and better than IN, although the correct working correlation matrix, i.e., EX for EX(0.5) and AR for AR(0.5), does work a little better. In the MIX scenario, AR works better than the other two, for the same reason as in the MSE plots.
3.3.2 Binary Response Variable
For the binary longitudinal data, we generate the binary response with the same model as in Section 2.3, but with a different combination of coefficients:

\[ (\beta_0, \beta_1) = (3, -3) \qquad \text{and} \qquad (\beta_2, \beta_3, \beta_4) = (1, 1, -1)/\sqrt{mn}. \]

The narrow model therefore contains int. and x1. As shown in Table 3.2, there are 8 candidate models. Here, we focus on the specified parameter

\[ \zeta = 2\beta_1 + 2\beta_2 + 0.5\beta_3 + 0.5\beta_4 + 0.5\beta_5. \]

The simulation results, based on one thousand replications, are presented in Figure 3.2.
Table 3.2: QFIC and QFMA - Candidate Models in Simulation II with Binary Response

Model          Covariates            Model           Covariates
m1 (Full)      int. x1 x2 x3 x4      m5              int. x1 x3 x4
m2             int. x1 x2 x3         m6              int. x1 x3
m3             int. x1 x2 x4         m7              int. x1 x4
m4             int. x1 x2            m8 (Narrow)     int. x1
Figure 3.2: QFMA & QFIC - MSE & CP for Focused Parameter ζ in Simulation II on Binary Responses with True Exchangeable, Autoregressive and Mixed Correlation Matrices EX(0.5), AR(0.5) and MIX. [Six panels: MSE (top row) and CP (bottom row) under EX(0.5), AR(0.5) and MIX, comparing Full, S-QFIC, S-∆AIC, S-QIC, P-QFIC, P-∆AIC and P-QIC with working correlations IN, EX and AR.]
The patterns for the binary longitudinal data, shown in Figure 3.2, are similar to the continuous case. Generally speaking, the model averaging estimates have relatively smaller MSE than the model selection estimates, and the QFIC estimates have smaller MSE than those from ∆AIC and QIC. Across working correlation matrices, EX and AR provide similar MSE values, both smaller than IN; the true working correlation matrices for the first two scenarios, i.e., EX for EX(0.5) and AR for AR(0.5), perform slightly better. Regarding ∆AIC and QIC, with EX and AR, ∆AIC has a much smaller MSE than QIC, whereas with IN they work almost the same. This again shows the superiority of ∆AIC over QIC and the bias introduced by using working independence for QIC. For CP, the modified CIs are observably closer to 95% than the traditional CIs.
In summary, for both the continuous and binary longitudinal simulation studies, the MSE and CP plots consistently show the advantage of QFIC over the traditional model selection criteria ∆AIC and QIC. They also demonstrate that the QFMA procedure behaves better than the traditional model selection procedure.
3.4 A Numerical Example
In this section, we apply our proposed QFIC and QFMA procedures, with the GEE approach, to the AIDS Clinical Trials Group protocol A5055 longitudinal study. A5055 was a Phase I/II, randomized, open-label, 24-week comparative study of the pharmacokinetics, tolerability, safety and antiretroviral effects of two regimens of indinavir (IDV), ritonavir (RTV) and two nucleoside analogue reverse transcriptase inhibitors in HIV-1-infected patients who had failed protease inhibitor-containing antiretroviral therapies.
In this study, 42 patients were randomized to one of the two regimens and were seen at entry, at weeks 1, 2 and 4, and every 4 weeks thereafter through week 24 of follow-up. Plasma HIV-1 RNA testing was conducted at each visit, providing a binary response rna (0 - negative and 1 - positive). A series of potentially explanatory variables was collected at the same time, including: age; cd4, CD4 cell count; cd8, CD8 cell count; ic50, phenotypic determination of antiretroviral drug resistance; icmin and rcmin, trough levels of IDV and RTV concentration in plasma; ic12h and rc12h, IDV and RTV concentrations in plasma measured 12 h after dosing; icmax and rcmax, maximum IDV and RTV concentrations in plasma; iauc and rauc, areas under the plasma concentration-time curve for IDV and RTV; and iadh and radh, pill counts for monitoring adherence. More detailed descriptions and analyses are reported in Wu et al. (2005), Huang et al. (2008), and Acosta et al. (2004). Given all these potentially explanatory factors, we aim to identify the pertinent ones to better predict the antiretroviral treatment response for a new patient.
We first fit the full model with all fourteen possible covariates in order to identify the highly significant ones. The full model can be written as

\[ \operatorname{logit}(\mu_{ij}) = \beta_0 + \beta_1\,\mathrm{cd4}_{ij} + \beta_2\,\mathrm{cd8}_{ij} + \beta_3\,\mathrm{age}_{ij} + \beta_4\,\mathrm{ic50}_{ij} + \beta_5\,\mathrm{radh}_{ij} + \beta_6\,\mathrm{iadh}_{ij} + \beta_{13}\,\mathrm{rauc}_{ij} + \beta_{14}\,\mathrm{iauc}_{ij} + \beta_7\,\mathrm{rcmin}_{ij} + \beta_8\,\mathrm{icmin}_{ij} + \beta_9\,\mathrm{rcmax}_{ij} + \beta_{10}\,\mathrm{icmax}_{ij} + \beta_{11}\,\mathrm{rc12h}_{ij} + \beta_{12}\,\mathrm{ic12h}_{ij}, \]

with i = 1, ..., 42, j = 1, ..., t_i, and µ_ij the conditional expectation of rna_ij. Again, due to the complicated correlation structure among each patient's repeated observations, the corresponding marginal logistic regression model is fitted with the GEE approach under three different working correlation matrices: IN, EX and AR. Ordered by the covariates' significance, the model fitting results are listed in Table 3.3 in terms of the corresponding coefficient estimates and p-values.
In Table 3.3, the working correlation matrices IN and EX give very similar coefficient estimates and corresponding p-values, which differ somewhat from those under AR. Regardless of the working correlation matrix, however, all the results identify the same highly significant covariates: int., cd4, cd8 and age. These four covariates are therefore included as the certain covariates, and we run the model selection and averaging procedures over the remaining eleven uncertain ones. Nevertheless, if we
Table 3.3: A5055 - Statistical Inference under Full Model with IN, EX and AR Working Correlation Matrices
IN EX AR
Covariate Estimate P-value Estimate P-value Estimate P-value
int. 4.0e+01 1.2e-02 3.9e+01 1.4e-02 3.7e+01 2.3e-02
cd4 1.0e-02 1.1e-03 1.1e-02 1.1e-03 1.1e-02 1.3e-03
cd8 -1.7e+00 4.3e-03 -1.8e+00 3.7e-03 -1.7e+00 3.2e-03
age -8.1e-02 6.4e-03 -8.0e-02 7.1e-03 -7.8e-02 8.7e-03
icmax -2.3e+00 3.8e-02 -2.3e+00 4.1e-02 -2.2e+00 3.7e-02
iauc 9.2e-02 8.0e-02 9.1e-02 8.5e-02 9.2e-02 8.5e-02
ic50 2.1e-01 9.1e-02 2.1e-01 9.8e-02 1.9e-01 9.2e-02
rcmin 1.8e-01 1.1e-01 1.8e-01 1.1e-01 1.9e-01 1.1e-01
rc12h 7.9e-01 1.3e-01 8.1e-01 1.2e-01 8.0e-01 1.4e-01
ic12h -6.3e-04 2.3e-01 -6.5e-04 2.2e-01 -6.1e-04 2.4e-01
rcmax -0.7e+00 2.6e-01 -1.6e+00 2.8e-01 -1.4e+00 3.7e-01
iadh -4.8e+00 2.9e-01 -4.7e+00 3.0e-01 -3.9e+00 3.4e-01
radh 2.2e+00 6.5e-01 2.1e+00 6.7e-01 1.3e+00 7.9e-01
icmin 7.6e-05 7.9e-01 8.3e-05 7.7e-01 1.3e-04 6.5e-01
rauc -3.3e-03 8.5e-01 -3.9e-03 8.3e-01 -6.7e-03 7.2e-01
consider all the possible candidate models, 2^11 = 2048 models need to be estimated. A backward elimination approach, as introduced in Claeskens et al. (2006), is thus used here as an alternative to an exhaustive search. We start with the full model, delete one covariate at each step based on a given model selection criterion, and end up with twelve nested candidate models. The model selection and averaging procedures are then carried out over these twelve models.
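A minimal sketch of this greedy search follows, with a toy "cost" criterion standing in for ∆AIC, QIC or QFIC; the `cost` table and function names are illustrative assumptions, not the thesis's actual criteria:

```python
def backward_elimination(full_covariates, criterion):
    """Greedy backward elimination: starting from the full model, drop the one
    covariate whose removal gives the best (smallest) criterion value at each
    step, producing a nested sequence of candidate models."""
    model = tuple(full_covariates)
    path = [model]
    while len(model) > 1:
        candidates = [tuple(c for c in model if c != drop) for drop in model]
        model = min(candidates, key=criterion)
        path.append(model)
    return path

# Toy criterion: each covariate has a fixed "cost"; smaller total is better,
# so x1 (strongly useful, negative cost) survives to the end.
cost = {"x1": -5.0, "x2": 1.0, "x3": 2.0, "x4": 3.0}
path = backward_elimination(["x1", "x2", "x3", "x4"],
                            lambda m: sum(cost[c] for c in m))
print(path)  # nested sequence from the full model down to ("x1",)
```

With eleven uncertain covariates this visits 11 + 10 + ... + 1 submodels instead of all 2^11, which is the computational saving the backward search buys.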
We examine the predictive power of the six model selection and averaging procedures, S-QFIC, S-∆AIC, S-QIC, P-QFIC, P-∆AIC and P-QIC, by a cross-validation experiment. Due to the complicated correlation structure among each patient's repeated observations, a leave-one-patient-out cross-validation experiment is a better choice than a leave-one-observation-out experiment. The prediction error rates, evaluated as the percentage of wrong predictions among one thousand replications, are plotted in Figure 3.3.
Figure 3.3: A5055 - Prediction Error Rates of Model Selection and Model Averaging
Procedures with Different Values of Weight Parameter κ
[Two panels: the left panel plots the prediction error rate of S-QFIC against κ from 0 to 10, with the P-QFIC error rate shown as a dashed line; the right panel plots the prediction error rates of S-QFIC, P-QFIC, S-∆AIC, P-∆AIC, S-QIC and P-QIC under working correlations IN, EX and AR.]
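The leave-one-patient-out scheme can be sketched generically as below. The `fit` and `predict` callables are placeholders for an actual GEE fit/predict pair; the toy "majority-class" model at the bottom exists only to make the sketch runnable:

```python
import numpy as np

def loo_patient_cv(patient_ids, y, fit, predict):
    """Leave-one-patient-out cross-validation for longitudinal data: all
    repeated observations of one patient are held out together, so the
    within-patient correlation never leaks into the training set.
    fit(train_idx) -> model;  predict(model, test_idx) -> 0/1 predictions."""
    ids = np.asarray(patient_ids)
    y = np.asarray(y)
    errors = total = 0
    for pid in np.unique(ids):
        test = np.flatnonzero(ids == pid)
        train = np.flatnonzero(ids != pid)
        model = fit(train)
        pred = np.asarray(predict(model, test))
        errors += int(np.sum(pred != y[test]))
        total += test.size
    return errors / total  # prediction error rate

# Toy check: "fit" learns the majority class of the training responses.
ids = np.array([1, 1, 2, 2, 3, 3])
y = np.array([1, 1, 1, 0, 0, 0])
rate = loo_patient_cv(ids, y,
                      fit=lambda tr: int(y[tr].mean() >= 0.5),
                      predict=lambda m, te: np.full(te.size, m))
print(rate)  # 5/6: each held-out patient is poorly predicted by the others
```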
As we mentioned in Section 3.2, the weight parameter κ bridges the QFIC-based weight function from uniform to hard core. The left panel of Figure 3.3 gives the prediction error rates of S-QFIC for κ values ranging from 0 to 10. When κ = 0, the model averaging estimate is simply the arithmetic mean of the estimates from the 12 nested candidate models; ignoring the information about the models' differing behaviors, this estimate yields the largest error rate. The dashed line indicates the error rate of P-QFIC, which is equivalent to S-QFIC with the hard-core weight function, assigning weight 1 to the best model selected by QFIC and 0 to the rest. For this specific data set and model setting, the prediction error rate of S-QFIC decreases dramatically as κ moves from 0 to 1, falls below the P-QFIC error rate at κ = 2, and reaches its minimum around κ = 5. Eventually, it converges to the error rate of P-QFIC as κ → ∞.
The right panel of Figure 3.3 plots the prediction error rates of S-QFIC with κ = 5, together with S-∆AIC, S-QIC, P-QFIC, P-∆AIC and P-QIC, fitted with the GEE approach under IN, EX and AR. From the plot, we observe a smaller error rate for QFIC than for ∆AIC and QIC, and a smaller error rate for ∆AIC than for QIC. These patterns indicate the advantage of predicting with different sets of explanatory covariates for different patients at different visits, and once again show the better behavior of ∆AIC relative to QIC. The plot also shows smaller error rates for S-QFIC than P-QFIC, S-∆AIC than P-∆AIC, and S-QIC than P-QIC, although the difference for the QFIC pair is not substantial. This shows that the model averaging procedure behaves better than the selection procedure. Across working correlation matrices, IN and EX give almost the same prediction error rates for all six estimates, while AR performs much worse. This may be due to the similarity of EX and IN to the unknown true correlation.
To demonstrate that the final models chosen by QFIC differ across parameters of interest, we also consider three focused parameters: the coefficients of cd4, cd8 and age. The backward elimination selection is carried out with the GEE approach under IN, based on ∆AIC, QIC and QFIC. The corresponding 12 selected nested models are listed in Tables 3.4-3.8, along with the values of the model selection criteria and the focused parameters' estimates.
Regardless of the focused parameter, ∆AIC and QIC each arrive at their own final model, selected from their twelve corresponding nested candidate models. For ∆AIC in particular, the covariates deleted in the first three steps of the backward elimination search are the most insignificant ones in the full model, with p-values ranging from 0.6 to 0.9 as shown in Table 3.3. The subsequent deletions follow a different order, which may be due to changes in significance after removing the most insignificant covariates. The model selection criterion QFIC, however, selects a different final model for each focused parameter, from its own set of twelve nested candidate models.
3.5 Conclusion and Remarks
Based on the quasi-likelihood, we propose the parameter-oriented model selection criterion QFIC and the Frequentist model averaging (QFMA) procedure for longitudinal data fitted with the GEE approach, and derive the asymptotic properties of the proposed procedures. Both the simulation studies and the real data analysis show their superiority in terms of smaller mean square error, coverage probability closer to 95%, and smaller prediction error rates.

In the study of the weight choice, we note the effect of the weight parameter κ on the weight function. From a numerical point of view, when the performances of the candidate models differ substantially, a large value of κ is preferable to stretch the differences among the weights; as a consequence, much higher weights are given to the better-behaved candidate models. On the other hand, when all the candidate models behave similarly, a small κ is chosen to shrink the differences among the weights. However, there is no explicit rule derived for selecting κ from a theoretical perspective, and more research
Table 3.4: A5055 - ∆AIC and QFIC Values on 12 Nested Models Selected by ∆AIC
Covariate 1 2 3 4 5 6 7 8 9 10 11 12
icmax × × × × × × ×
iauc × × × × × × × × × × ×
ic50 × × × × × × × × × ×
rcmin × × × × × ×
rc12h × × × × ×
ic12h × × × ×
rcmax × × × × × × × ×
iadh × × × × × × × × ×
radh × × ×
icmin × ×
rauc ×
∆AIC[e-00] -45.7 -47.7 -49.6 -51.3 -50.8 -51.1 -48.9 -40.9 -32.2 -18.0 -7.7 0.0
QFICβcd4[e-03] 10.45 10.41 10.42 10.37 9.98 8.63 8.36 9.05 9.29 9.36 9.38 9.47
QFICβcd8[e-00] -1.73 -1.70 -1.66 -1.68 -1.62 -1.39 -1.32 -1.00 -1.12 -1.14 -0.86 -0.87
QFICβage[e-02] -8.07 -8.08 -8.16 -8.06 -9.27 -8.43 -9.07 -6.47 -5.39 -5.54 -5.57 -4.44
NOTE: × indicates presence of the covariate in the model and a blank means its absence. Row ∆AIC[e-00] lists the ∆AIC values. Row QFICβcd4[e-03] lists the QFIC values ×10^-3 when we focus on parameter βcd4. Row QFICβcd8[e-00] lists the QFIC values when we focus on parameter βcd8. Row QFICβage[e-02] lists the QFIC values ×10^-2 when we focus on parameter βage.
Table 3.5: A5055 - QIC and QFIC Values on 12 Nested Model Selected by QIC
Covariate 1 2 3 4 5 6 7 8 9 10 11 12
icmax × × × × × × × ×
iauc × × × × × ×
ic50 × × × ×
rcmin × × ×
rc12h ×
ic12h × × × × × × ×
rcmax × × × × × × × × × × ×
iadh × ×
radh × × × × × × × × ×
icmin × × × × ×
rauc × × × × × × × × × ×
QIC[e-00] 204 210 213 216 216 217 216 216 215 214 212 211
QFICβcd4[e-03] 10.45 8.78 8.51 8.28 8.35 8.20 9.09 9.26 9.19 9.25 9.25 9.47
QFICβcd8[e-00] -1.73 -1.44 -1.48 -1.51 -1.29 -0.86 -0.77 -0.74 -0.76 -0.78 -0.78 -0.87
QFICβage[e-02] -8.07 -8.13 -7.73 -8.24 -7.73 -8.16 -7.28 -5.40 -5.17 -5.21 -5.20 -4.44
NOTE: × indicates presence of the covariate in the model and a blank means its absence. Row QIC[e-00] lists the QIC values. Row QFICβcd4[e-03] lists the QFIC values ×10^-3 when we focus on parameter βcd4. Row QFICβcd8[e-00] lists the QFIC values when we focus on parameter βcd8. Row QFICβage[e-02] lists the QFIC values ×10^-2 when we focus on parameter βage.
Table 3.6: A5055 - QFIC Values and Coefficient Estimates on 12 Nested Models
Selected by QFIC for CD4
Covariate 1 2 3 4 5 6 7 8 9 10 11 12
icmax × × × × × × × ×
iauc × × × × ×
ic50 × × × × × ×
rcmin × × × × × × × × × × ×
rc12h × × × × × × × × ×
ic12h × × × ×
rcmax × × × × × × × × × ×
iadh × × ×
radh ×
icmin × × × × × × ×
rauc × ×
QFICβcd4[e-04] 2.956 2.847 2.797 2.773 2.700 2.119 2.047 2.048 2.175 6.560 8.500 15.50
βcd4[e-03] 10.45 10.41 10.37 10.25 9.96 10.80 10.17 10.35 10.35 9.41 9.87 9.47

NOTE: × indicates presence of the covariate in the model and a blank means its absence. Row QFICβcd4[e-04] lists the QFIC values ×10^-4 when we focus on parameter βcd4. Row βcd4[e-03] lists the values of βcd4 ×10^-3.
Table 3.7: A5055 - QFIC Values and Coefficient Estimates on 12 Nested Models
Selected by QFIC for CD8
Covariate 1 2 3 4 5 6 7 8 9 10 11 12
icmax × × × × ×
iauc × × × × × ×
ic50 × × × × × × × × × × ×
rcmin × × × × × × × × ×
rc12h × × × × × × × ×
ic12h × × × ×
rcmax × × × × × × × × × ×
iadh × × × × × × ×
radh × ×
icmin × × ×
rauc ×
QFICβcd8[e-00] 2.67 2.23 1.94 1.86 1.85 1.86 1.87 1.98 4.15 6.26 9.00 18.91
βcd8[e-00] -1.73 -1.70 -1.73 -1.68 -1.62 -1.50 -1.51 -1.49 -1.16 -1.02 -1.12 -0.87

NOTE: × indicates presence of the covariate in the model and a blank means its absence. Row QFICβcd8[e-00] lists the QFIC values when we focus on parameter βcd8. Row βcd8[e-00] lists the values of βcd8.
Table 3.8: A5055 - QFIC Values and Coefficient Estimates on 12 Nested Models
Selected by QFIC for Age
Covariate 1 2 3 4 5 6 7 8 9 10 11 12
icmax × × × × × × × ×
iauc × × × × ×
ic50 × × × ×
rcmin × × × × × × × × ×
rc12h × × × × × × ×
ic12h × ×
rcmax × × ×
iadh × × × × × ×
radh ×
icmin × × × × × × × × × × ×
rauc × × × × × × × × × ×
QFICβage[e-02] 2.78 2.45 2.32 1.78 1.71 1.52 1.39 0.92 1.42 2.12 5.30 2.25
βage[e-02] -8.07 -7.97 -8.94 -7.78 -7.35 -6.28 -6.35 -5.67 -5.55 -6.27 -5.71 -4.44

NOTE: × indicates presence of the covariate in the model and a blank means its absence. Row QFICβage[e-02] lists the QFIC values ×10^-2 when we focus on parameter βage. Row βage[e-02] lists the values of βage ×10^-2.
needs to be done to investigate the theoretical properties of κ.
In high-dimensional settings with a large number of uncertain parameters, averaging over all possible candidate models is practically infeasible. A backward elimination or forward selection procedure, as introduced by Claeskens et al. (2006), is preferable in order to dramatically reduce the computational burden. However, backward elimination and forward selection may sometimes result in different final models, and a further investigation of these different final models is warranted.
4 Predictive Models in Personalized Medicine
4.1 Introduction
As biotechnology and genomics continue to grow, personalized medicine has become an important topic in current medical practice. Evidence-based medicine selects therapy based on a whole group of patients, but it ignores the heterogeneity among the patients within the cohort. In oncology studies, cancers have been shown to be diverse in their oncogenesis, pathogenesis and responsiveness to therapy even when they share the same primary site and stage, as mentioned in Simon (2013). A medication with a significant treatment effect on some patients may be of no use to others, and its misuse may expose patients to the risks of adverse events with no benefit, as illustrated in Dumas et al. (2007). By utilizing individual-level characteristics, such as patient demographics, imaging and exam results, laboratory parameters, and genetic or genomic information, personalized predictive models can improve individualized prognosis and diagnosis and correspondingly individualize and optimize therapy, as mentioned in Simon (2005) and Simon (2012). They can therefore be applied to many fields in personalized medicine, including personalized preventive care; personalized prognosis, diagnosis and monitoring; and personalized therapy selection.
In this century, there has been vigorous statistical research on personalized therapy selection, such as Murphy (2002), Robins (2004), Moodie et al. (2007), Robins et al. (2008), Li et al. (2008), Qian and Murphy (2011), Brinkley et al. (2010), Gunter et al. (2011) and Zhang et al. (2012). Most of this research involves a single or a series of sequential decision-making processes and focuses on estimating optimal treatment regimes. Some statisticians have also conducted subgroup analyses to tailor their findings to a specific group, such as Bonetti and Gelber (2000), Bonetti and Gelber (2004), Song and Pepe (2004), Pfeffer and Jarcho (2006), Wang et al. (2007), and Cai et al. (2011). For certain patients or subgroups, the therapy that would result in the best estimated mean response, based on specified models with specified explanatory variables, is chosen from a set of candidate therapies. Most of these methods, however, assume that the specified model with the specified explanatory variables is the true underlying model. Due to the heterogeneity of the population, different explanatory variables might be identified as significant for different patients or subgroups. Therefore in this chapter, instead of focusing on personalized therapy selection, we target personalized prognosis and diagnosis using personalized predictive models.
The construction of a reliable prediction rule for future responses depends heavily on the "adequacy" of the fitted model. As mentioned in Section 1.1, the final model resulting from traditional model selection criteria, such as AIC and BIC, is the model with the best overall properties for the whole population, regardless of individuals. Even if a model is selected for a certain subgroup, it captures the overall information of that subgroup but does not necessarily work best for each individual in it, as illustrated in Henderson and Keiding (2005).
The focused information criterion (FIC), as mentioned in Chapter 1, focuses attention directly on the parameter of primary interest and aims to select the model with the minimum estimated mean square error for that parameter's estimate. The final model is therefore, ideally, the best model for that parameter only. This characteristic motivates us to apply FIC to personalized medicine, in particular to the field of personalized prognosis and diagnosis.
Based on the notation introduced in Sections 2.2 and 3.2, for patient j we assume that the prediction of his/her response outcome can be written as a function of the model parameters, denoted by ζ_j = ζ_j(θ, γ). The personalized FIC for patient j under candidate model S can then be written and estimated as

\[ \mathrm{FIC}_{j,S} = 2\,\hat\omega_j^\top \pi_S^\top \hat\Sigma^{11}_S \pi_S \hat\omega_j + n\big[\hat\omega_j^\top (I_q - \hat D_S)\,\hat\gamma\big]^2, \]

where ω_j = Σ_{10}Σ_{00}^{-1}\,∂ζ_j/∂θ − ∂ζ_j/∂γ is determined by ζ_j. Even for the same candidate model S, predictions for different targeted patients can yield different personalized FIC values.
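The personalized FIC formula translates directly into code. The matrices below (selection matrix π_S, covariance block Σ̂¹¹_S, matrix D̂_S, and the estimates ω̂_j, γ̂) are small hypothetical inputs chosen only to make the sketch self-contained:

```python
import numpy as np

def personalized_fic(omega_j, pi_S, Sigma11_S, D_S, gamma_hat, n):
    """Personalized FIC for patient j under candidate model S:
        2 * w' Pi' Sigma11_S Pi w  +  n * ( w' (I_q - D_S) gamma_hat )^2
    Shapes: omega_j (q,), pi_S (|S|, q), Sigma11_S (|S|, |S|), D_S (q, q),
    gamma_hat (q,). All inputs are estimates from the fitted models."""
    w = np.asarray(omega_j, dtype=float)
    q = w.size
    var_term = 2.0 * (w @ pi_S.T @ Sigma11_S @ pi_S @ w)
    bias_term = n * (w @ (np.eye(q) - D_S) @ np.asarray(gamma_hat, float)) ** 2
    return float(var_term + bias_term)

# Hypothetical inputs with q = 2 uncertain parameters; S keeps only the first
fic = personalized_fic(omega_j=[1.0, 0.5],
                       pi_S=np.array([[1.0, 0.0]]),
                       Sigma11_S=np.array([[2.0]]),
                       D_S=np.array([[1.0, 0.0], [0.0, 0.0]]),
                       gamma_hat=[0.3, 0.2], n=100)
print(fic)  # 5.0 = variance term 4.0 + squared-bias term 1.0
```

Because ω̂_j changes with the patient, re-running this over the candidate set with each patient's own ω̂_j is what yields patient-specific model rankings.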
As mentioned in Chapter 1, FIC has been extended to several commonly used models, and in Chapter 3 we proposed QFIC with the GEE approach for longitudinal data. Building on this established framework, we first illustrate an application of the classic personalized FIC in a cross-sectional binary case study, providing a personalized diagnosis of tumor penetration of the prostatic capsule for prostate cancer patients in Section 4.2. In Section 4.3, the personalized QFIC is applied to a longitudinal case study to make personalized prognoses of patients' treatment responses in relapsing-remitting multiple sclerosis. Survival data are very common in oncology studies; an application of the corresponding personalized FIC for survival data is discussed in Section 4.4, where we aim to make a personalized prediction of the survival rate of veterans with advanced lung cancer. We conclude in the final section.
4.2 Prostate Cancer Case Study
Prostate cancer is one of the most common cancers in American men. As it advances, cancer cells may spread from the prostate to the capsule. Knowing the cancer stage can help the doctor make a diagnosis and select a corresponding therapy. The first case study we discuss in this chapter is a prostate cancer trial with possible capsular involvement, introduced in Hosmer and Lemeshow (1989).
In this trial, 151 of 376 patients had prostate cancer that penetrated the prostatic capsule. The binary response, penetrat, indicates tumor penetration (0 - absence and 1 - presence). The potential explanatory factors include: dre, result of the digital rectal exam (1 - no nodule, 2 - unilobar left nodule, 3 - unilobar right nodule, and 4 - bilobar nodule); caps, detection of capsular involvement in the rectal exam (1 - absence and 2 - presence); psa, prostate-specific antigen value (in mg/ml); volume, tumor volume obtained from ultrasound (in cm3); gscore, total Gleason score (0-10); race (1 - white and 2 - black); and age.
In this section, we aim to select a personalized predictive model for a targeted prostate cancer patient based on the personalized FIC. By doing so, we can better predict the targeted patient's tumor penetration rate and therefore provide a personalized diagnosis of cancer progression.
4.2.1 Model Selection Implementation
We first prefit the data with the classic logistic regression model with all the potential
explanatory covariates listed above. The full model can be written as:
logit(µ) = β0 + β1 gscore + β2 dre + β3 psa + β4 race + β5 caps + β6 volume + β7 age,
where µ is the conditional expectation of penetrat. In order of significance, the
corresponding statistical inference is listed in Table 4.1 in terms of the coefficients'
estimates, standard errors, and p-values.
Based on Table 4.1, we identify four highly significant covariates: int., gscore,
dre, and psa, which are the certain covariates. The predictive model, therefore, is
selected by the personalized FIC and also the traditional AIC (for comparison) from
Table 4.1: Prostate Cancer - Statistical Inference under Full Model
Covariate Estimate Std.err Z-value P-value
int. -6.1e+00 1.9e+00 -3.2 1.6e-03
gscore 9.7e-01 1.7e-01 5.8 5.8e-09
dre(2) 7.3e-01 3.6e-01 2.1 4.0e-02
dre(3) 1.5e+00 3.8e-01 4.0 5.9e-05
dre(4) 1.4e+00 4.6e-01 3.0 2.5e-03
psa 2.9e-02 1.0e-02 3.0 3.5e-03
race -6.8e-01 4.7e-01 -1.4 1.5e-01
caps 5.3e-01 4.6e-01 1.1 2.6e-01
volume -2.6e-03 2.6e-03 -1.0 3.2e-01
age -1.3e-02 2.0e-02 -0.7 5.0e-01
the remaining 2^4 = 16 candidate models listed in Table 4.2. Here and below, "×"
indicates the presence of the specific covariate in the specific candidate model and a
blank means its absence.
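This candidate set is simply every subset of the four uncertain covariates appended to the certain covariates. A minimal sketch of the enumeration (variable names are illustrative, not from the thesis code; the within-size ordering need not match Table 4.2 exactly):

```python
from itertools import combinations

# Certain covariates stay in every candidate model; uncertain ones may be dropped.
certain = ["int.", "gscore", "dre", "psa"]
uncertain = ["race", "caps", "volume", "age"]

# Enumerate all 2^4 = 16 subsets of the uncertain covariates, largest first,
# so the first candidate is the full model and the last is the narrow model.
candidates = []
for k in range(len(uncertain), -1, -1):
    for subset in combinations(uncertain, k):
        candidates.append(certain + list(subset))

print(len(candidates))  # 16
print(candidates[0])    # full model: certain + all uncertain covariates
print(candidates[-1])   # narrow model: certain covariates only
```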
Based on AIC, m8 is selected as the single overall predictive model for all the
patients in the study and is circled in Table 4.2. Other than the four certain covariates,
m8 also contains one uncertain covariate race.
Nevertheless, for implementation of the personalized FIC, we consider each pa-
tient’s capsular penetration prediction as an individual targeted parameter. Therefore,
376 personalized predictive models are selected individually for the corresponding 376
patients by the personalized FIC from the 16 candidate models. Figure 4.1 provides
the frequencies of the 16 candidate models selected as the final personalized predictive
models for the 376 patients. From this histogram, we observe that instead of the single
predictive model m8, the personalized predictive models are mainly distributed among the
Table 4.2: Prostate Cancer - Candidate Models
race caps volume age race caps volume age
m1 × × × × m9 × × ×
m2 × × m10 × ×
m3 × × × m11 × ×
m4 × × m12 ×
m5 × × × m13 × ×
m6 × × m14 ×
m7 × × m15 ×
m8 × m16
NOTE: × indicates presence of the covariate in the candidate model and a blank means its absence.
candidate models: m6, m7, m8, m11, m12, m14, and m16. In particular, more than 50
patients choose m8, m12, and m16 as their predictive models.
4.2.2 Cross-Validation and Simulation Examination
In order to examine the predictive power of the personalized predictive models and the
single final model m8, we run a leave-one-out cross-validation experiment. The cor-
responding prediction error rates for the personalized predictive models and the single
predictive model are 0.345 and 0.351. The smaller prediction error rate of the person-
alized predictive models indicates the superiority of the personalized FIC compared to
the traditional AIC.
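The leave-one-out error rates above come from a loop of this shape. The `select_and_fit` and `classify` callables are hypothetical placeholders standing in for the FIC/AIC model-selection-plus-fit step; the toy model below is a simple majority-class rule included only to make the skeleton runnable:

```python
def loo_error_rate(X, y, select_and_fit, classify):
    """Leave-one-out CV: refit (including model selection) on n-1 patients,
    predict the held-out patient, and average the 0/1 prediction errors."""
    n = len(y)
    errors = 0
    for i in range(n):
        X_train = X[:i] + X[i+1:]
        y_train = y[:i] + y[i+1:]
        model = select_and_fit(X_train, y_train)
        errors += int(classify(model, X[i]) != y[i])
    return errors / n

# Toy stand-ins: "model selection" just learns the majority class.
def majority_fit(X_train, y_train):
    return int(sum(y_train) * 2 >= len(y_train))

def majority_classify(model, x):
    return model

y = [1, 1, 1, 0]
X = [[0]] * 4
print(loo_error_rate(X, y, majority_fit, majority_classify))  # 0.25
```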
We also conduct a simulation study to compare the models’ performance at a rela-
tively small sample size level. In order to mimic the patients in this study, we randomly
sample with replacement from the 376 patients and generate 100 pseudo-patients with
observations on the response and seven potential covariates. Again, we implement the
Figure 4.1: Prostate Cancer - Frequency of Candidate Models Selected by the
Personalized FIC as the Personalized Predictive Models for 376 Patients
[Figure: histogram of selection frequencies over candidate models 1-16; y-axis: frequency, 0-60.]
personalized FIC and the traditional AIC on these 100 pseudo-patients and identify 100
personalized predictive models and one overall predictive model, based on which we
make the corresponding penetration rate predictions. The corresponding mean square
errors can be obtained by comparing the predictions to the true penetration rates. We
calculate the true penetration rates through the following formula:
p = exp(τ) / (1 + exp(τ)),

where

τ = −6.1 + 0.97 gscore + 0.73 dre(2) + 1.5 dre(3) + 1.4 dre(4) + 0.029 psa − 0.68 race + 0.53 caps − 0.0026 volume − 0.013 age.
The coefficients used here are from Table 4.1. With one thousand replications, we arrive
at estimated mean square errors of 3.14 and 3.25 for the personalized predictive models
and the single predictive model. The smaller mean square error once again shows the
better behavior of the personalized FIC compared to AIC.
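The penetration-rate calculation above is the inverse-logit applied to the linear predictor τ built from the Table 4.1 estimates. A minimal sketch (the patient's covariate values are illustrative, not a subject from the trial):

```python
import math

def penetration_rate(gscore, dre, psa, race, caps, volume, age):
    """Inverse-logit of the linear predictor tau from the full-model
    coefficient estimates in Table 4.1; dre is coded 1-4 with level 1
    as the reference category."""
    tau = (-6.1 + 0.97 * gscore
           + {1: 0.0, 2: 0.73, 3: 1.5, 4: 1.4}[dre]
           + 0.029 * psa - 0.68 * race + 0.53 * caps
           - 0.0026 * volume - 0.013 * age)
    return math.exp(tau) / (1.0 + math.exp(tau))

# Illustrative patient: Gleason score 6, unilobar right nodule, psa 14,
# white (race = 1), no capsular involvement on exam (caps = 1),
# tumor volume 10 cm^3, age 65.
p = penetration_rate(gscore=6, dre=3, psa=14, race=1, caps=1, volume=10, age=65)
print(round(p, 3))
```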
4.2.3 Group-Specific Analysis
In order to illustrate the personalized FIC’s consideration of the patients’ heterogeneity,
we also perform the group-specific analysis based on four uncertain covariates.
Figure 4.2 presents the histograms of the observations on the two continuous uncer-
tain covariates, volume and age. In particular, about 50% of the patients have a tumor
volume of at most 2 cm³ as measured by ultrasound. Based on these two histograms and the
outcomes of the two binary uncertain covariates, race and caps, we categorize all 376
patients into two groups under four different partition criteria, as listed in Table 4.3.
Table 4.3: Prostate Cancer - Group Partition Criteria
Criterion Group A Group B
race white black
caps presence absence
volume > 2 cm³ ≤ 2 cm³
age (60,75) [40,60] or [75,80]
In this subsection, we particularly target the patients whose personalized predictive
models are different from the single predictive model m8 in terms of each uncertain
covariate. As we reported in Subsection 4.2.1, m8 includes one extra uncertain
covariate, race, in addition to the certain covariates. These targeted patients'
personalized predictive models, therefore, either (i) exclude race, which is included
in m8 (race[ ]); (ii) include caps, which is excluded from m8 (caps[×]); (iii) include
volume, which is excluded from m8 (volume[×]); or (iv) include age, which is excluded
from m8 (age[×]).
For each partition criterion, the percentage (pct.) of the targeted patients is calcu-
lated based on the number of patients in each group (size) as reported in Table 4.4.
The percentage measures the difference shown in each group between the personalized
predictive models and the single predictive model in terms of each specific uncertain
covariate.

Figure 4.2: Prostate Cancer - Histograms of Tumor Volume and Age
[Figure: histograms of tumor volume (0-150 cm³, frequency 0-150) and age (50-80 years, frequency 0-25).]

The corresponding prediction error rates of the personalized predictive models
(erFIC) and the single predictive model (erAIC) are calculated only based on the targeted
patients and also shown in Table 4.4.
We highlight the relatively higher percentages, which are greater than 50%. Par-
ticularly in the bottom row of the table, a total of 56% of the patients exclude race in
their predictive models, regardless of the group partition criteria. The smaller predic-
tion error rates of the personalized predictive model compared to the single predictive
model for almost every category show the advantage of tailoring the predictive model
individually based on the patient’s personal information.
Based on each group partition criterion, we also compare the percentages of the
targeted patients within the corresponding two groups. Generally speaking, various
percentages in Table 4.4 do show the differences in each group-specific comparison.
This is especially true of the pairs with
quite different percentages. For the race-based partition, 61% of black patients include
caps in their personalized predictive models while only 29% of white patients do so.
Since white patients form the majority in this study (340 out of 376) yet show a low
percentage, this also reflects the population-level fit of m8 selected by the
traditional AIC. Based on the caps partition criterion, for the patients with and
without capsular involvement, 60% vs. 20% of patients exclude race and 25% vs. 65%
include volume in their final personalized predictive models. Groups A and B
partitioned based on volume also reveal quite different percentages, 72% vs. 36%,
in terms of race's presence in their personalized predictive models.
In summary, by considering the individual level information of prostate cancer pa-
tients, the personalized FIC considers patients’ heterogeneity and provides the best per-
sonalized predictive model for the targeted patient only. The smaller prediction error
rate, smaller mean square error, and the results of the group-specific analysis all show
the advantage of the personalized predictive models selected by the personalized FIC
over the single predictive model selected by the traditional AIC. Therefore, diagnosis of
Table 4.4: Prostate Cancer - Group-Specific Percentages and Prediction Error Rates of
Targeted Patients with Four Partition Criteria
Criterion  Group  Inference  race[ ]  caps[×]  volume[×]  age[×]  size
race       A      pct.       56%      29%      29%        26%     340
                  erFIC      0.337    0.342    0.317      0.302
                  erAIC      0.339    0.347    0.320      0.310
           B      pct.       61%      61%      33%        36%     36
                  erFIC      0.355    0.399    0.460      0.175
                  erAIC      0.398    0.437    0.484      0.211
caps       A      pct.       60%      36%      25%        25%     336
                  erFIC      0.338    0.355    0.347      0.291
                  erAIC      0.342    0.366    0.344      0.295
           B      pct.       20%      3%       65%        45%     40
                  erFIC      0.365    0.009    0.286      0.264
                  erAIC      0.418    0.004    0.317      0.308
volume     A      pct.       72%      48%      27%        26%     211
                  erFIC      0.349    0.368    0.381      0.290
                  erAIC      0.357    0.378    0.383      0.302
           B      pct.       36%      12%      33%        28%     165
                  erFIC      0.312    0.274    0.283      0.282
                  erAIC      0.314    0.284    0.291      0.291
age        A      pct.       58%      35%      23%        33%     267
                  erFIC      0.335    0.320    0.341      0.287
                  erAIC      0.339    0.327    0.343      0.295
           B      pct.       50%      27%      44%        11%     109
                  erFIC      0.349    0.456    0.322      0.278
                  erAIC      0.362    0.480    0.331      0.315
Total             pct.       56%      32%      29%        27%     376
NOTE: pct. indicates the percentage of the targeted patients in each group; erFIC and erAIC indicate the
prediction error rates of the personalized predictive models and the single predictive model based on the
targeted patients in each group; size indicates the number of patients in each group; race[ ] indicates
the targeted patients whose personalized predictive models exclude race; caps[×] indicates the targeted
patients whose personalized predictive models include caps; volume[×] indicates the targeted patients
whose personalized predictive models include volume; and age[×] indicates the targeted patients whose
personalized predictive models include age.
the targeted prostate cancer patients’ capsular penetration can be better made individu-
ally based on the different personalized predictive models chosen by the personalized
FIC.
4.3 Relapsing Remitting Multiple Sclerosis Case Study
Other than the cross-sectional study, the personalized predictive models can also be
used for individualized prognosis and diagnosis in longitudinal studies. As an illus-
tration, the second case study we perform is from a longitudinal clinical trial, which
aims to assess the effects of neutralizing antibodies on interferon beta-1 (IFNB) in re-
lapsing remitting multiple sclerosis (RRMS), a disease that destroys the myelin sheath
surrounding the nerves.
We particularly focus on a 15-week magnetic resonance imaging (MRI) study in-
volving 50 patients in two locations, randomized into three treatment groups: 17 in
placebo, 17 in low-dose and 16 in high-dose. At each of 17 scheduled visits, a binary
exacerbation outcome exacerb was recorded at the time of each MRI scan, according
to whether an exacerbation began since the previous scan (1 - positive and 0 - nega-
tive). The potential explanatory covariates include: edss, expanded disability status
scale; dose, treatment groups (0 - placebo, 1 - low dose, and 2 - high dose); duration,
rrms duration (in years); lot, location indicator (0 - location A and 1 - location B);
sex; and visit, the visit times (in days).
The goal of this study is to identify a prediction rule with which we can accurately
predict a targeted patient's exacerbation response to a specific treatment, even at a
targeted visit time.
4.3.1 Model Selection Implementation
We consider the following generalized additive partially linear models incorporating
the GEE approach for this study:
logit(µ) = η1(visit) + η2(duration) + β1 edss + β2 dose + β3 lot + β4 age + β5 sex,
where visit and duration are set in nonparametric components and µ is the con-
ditional expectation of exacerb. Figure 4.3 plots the empirical exacerbation rates at
different visit days and with different RRMS duration time. It confirms the nonlinear
trends of these two covariates on the log odds ratio of the response.
We therefore prefit this full model using the polynomial spline method, incorporating
the GEE approach with an exchangeable (EX) working correlation structure. Degree-two
natural splines are used to approximate the two nonparametric functions. The fitted
curves of these two nonparametric components, η1(visit) and η2(duration), are depicted
as the dashed lines in Figure 4.3. Summaries of the remaining coefficients, including
their estimates, standard errors, and corresponding p-values, are listed in Table 4.5.
Table 4.5: RRMS - Statistical Inference under Full Model
Covariate Estimate Std.err Wald P-value
edss 2.9e-01 8.8e-02 11.1 8.7e-04
int. -1.4e+00 8.2e-01 3.0 8.6e-02
dose(1) 7.5e-02 3.1e-01 0.1 8.1e-01
dose(2) -3.5e-01 3.1e-01 1.3 2.5e-01
lot 3.9e-01 3.3e-01 1.4 2.4e-01
age -1.3e-02 1.5e-02 0.8 3.8e-01
sex 1.4e-01 3.4e-01 0.2 6.7e-01
Based on Table 4.5, other than the two nonparametric components, we also include
the highly significant covariate edss in the narrow model and perform the model selec-
tion procedure among the remaining five uncertain factors. There are a total of 2^5 = 32
candidate models, as listed in Table 4.6.

Figure 4.3: RRMS - Empirical and Estimated Exacerbation Rates on Visit Days and
Duration Time
[Figure: two panels; left: exacerbation rate (0.10-0.25) vs. visit day (20-100); right: exacerbation rate (0.0-0.4) vs. RRMS duration (5-25 years); empirical rates shown with estimated (dashed) curves.]
Table 4.6: RRMS - Candidate Models
int. dose lot age sex int. dose lot age sex
m1 × × × × × m17 × × × ×
m2 × × × × m18 × × ×
m3 × × × × m19 × × ×
m4 × × × m20 × ×
m5 × × × × m21 × × ×
m6 × × × m22 × ×
m7 × × × m23 × ×
m8 × × m24 ×
m9 × × × × m25 × × ×
m10 × × × m26 × ×
m11 × × × m27 × ×
m12 × × m28 ×
m13 × × × m29 × ×
m14 × × m30 ×
m15 × × m31 ×
m16 × m32
NOTE: × indicates presence of the covariate in the candidate model and a blank means its absence.
The traditional AIC-type model selection criterion ∆AIC for longitudinal data in-
corporating the GEE approach, proposed in Chapter 2, selects m30 as the final single
predictive model, which can be written as:
logit(µ) = η1(visit) + η2(duration) + β1edss + β4age.
It is circled in Table 4.6. Regardless of the different characteristics of the different
patients at the different visit times, m30 is the overall best choice for this longitudinal
study.
On the other hand, the personalized QFIC, proposed in Chapter 3, considers the
observations’ heterogeneity among the patients and even among the same patient’s dif-
ferent visit times. By taking each time point’s exacerbation prediction as the individual
targeted parameter, the personalized QFIC chooses different personalized predictive
models for different patients at the different visit times. In this study, we have 50 pa-
tients and about 17 visit times for each patient, therefore totaling 822 observations.
Figure 4.4 provides the frequencies of the 32 candidate models chosen as the corre-
sponding 822 personalized predictive models. From the histogram in Figure 4.4, we
can observe that other than ∆AIC’s single predictive model m30, m16 also has a rela-
tively high frequency.
Figure 4.4: RRMS - Frequencies of Candidate Models Selected by the Personalized
QFIC as the Personalized Predictive Models for 822 Observations
[Figure: histogram of selection frequencies over candidate models 1-32; y-axis: frequency, 0-250.]
In particular, on certain targeted visit days, namely days 7, 31, 61 and 104, the
frequencies of the candidate models selected as the personalized predictive models for
these 50 patients are also plotted in Figure 4.5. We do observe a slight difference among
these four histograms. Due to the relatively strong correlation among each patient’s
repeated measurements, the four histograms all have relatively higher frequencies to
choose m16 and m30 for the 50 patients. This general trend in these histograms is
consistent with the trend shown in the overall histogram in Figure 4.4.
By using a cross-validation experiment, we also examine the predictive powers of
the 822 personalized predictive models and the single predictive model m30. Again,
due to the complicated correlation structure of each patient’s repeated measurements,
a leave-one-patient-out experiment is used. Based on one thousand replications, the
prediction error rates of 0.265 for the personalized predictive models and 0.272 for the
single predictive model show the superiority of tailoring predictive models individually
by the personalized QFIC.
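Because repeated measurements within a patient are correlated, the hold-out unit here is the whole patient rather than a single observation. A sketch of that grouping logic follows; the `fit` and `predict` callables are hypothetical placeholders (a majority-outcome rule is used only to make the sketch runnable):

```python
def leave_one_patient_out(records, fit, predict):
    """records: list of (patient_id, x, y) observations.
    Hold out ALL observations of one patient at a time, refit on the rest,
    and return the overall misclassification rate on the held-out visits."""
    ids = sorted({pid for pid, _, _ in records})
    errors = total = 0
    for held in ids:
        train = [r for r in records if r[0] != held]
        test = [r for r in records if r[0] == held]
        model = fit(train)
        for _, x, y in test:
            errors += int(predict(model, x) != y)
            total += 1
    return errors / total

# Toy stand-ins: predict the majority outcome seen in training.
fit = lambda train: int(sum(y for _, _, y in train) * 2 >= len(train))
predict = lambda model, x: model

records = [(1, None, 0), (1, None, 0), (2, None, 0), (2, None, 1)]
print(leave_one_patient_out(records, fit, predict))  # 0.75
```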
4.3.2 Group-Specific Analysis
Similar to the discussion in Subsection 4.2.3, in order to illustrate the personalized
QFIC’s consideration of the patients’ heterogeneity, we also carry out the group-specific
analysis based on four uncertain covariates: dose, lot, sex and age. The analysis is
performed respectively at four different target visit days, namely days 7, 31, 61 and
104. Again, we specifically focus on the patients whose personalized predictive models
are different from the single predictive model m30 in terms of the presence or absence
of each uncertain covariate.
Figure 4.5: RRMS - Frequencies of Candidate Models Selected by the Personalized
QFIC as the Personalized Predictive Models for 50 Patients at Visit Days of 7, 31,
61 and 104
[Figure: four histograms (Day 7, Day 31, Day 61, Day 104) of selection frequencies over candidate models 1-32; y-axis: frequency, 0-20.]
As reported in Subsection 4.3.1, m30 includes one extra uncertain covariate age,
other than the certain covariates. Therefore, the targeted patients in this subsection have
their personalized predictive models either (i) include dose (dose[×]) based on the
dose-group partition criterion; (ii) include lot (lot[×]) based on the lot partition
criterion; (iii) include sex (sex[×]) based on the sex criterion; or (iv) exclude age
(age[ ]) based on the age criterion. The
corresponding percentages, prediction error rates estimated through the personalized
predictive models and the single predictive model for the targeted patients at the visit
days of 7, 31, 61, and 104 are reported in Table 4.7.
The high percentages of targeted patients in Table 4.7 show the relatively large
differences of the personalized predictive models from the single predictive model in
the corresponding groups in terms of the specific uncertain covariate’s existence. In
particular, we highlight the percentages greater than 30%.
For the uncertain covariate age, at the earlier visit days of 7 and 31, only the
younger group, composed of patients who are at most 30 years old, has the higher
percentages. In other words, 50% of the patients in the younger group exclude age from
their personalized predictive models at days 7 and 31. The majority of the patients in the
remaining two groups have their personalized predictive models consistent with m30 in
terms of the existence of age. But in the later visit days of 61 and 104, more than 30%
of patients in all three groups exclude age from their personalized predictive models,
as highlighted in Table 4.7. This indicates the
larger difference between the predictive models selected by the personalized QFIC
and ∆AIC in the later visit days compared to the earlier days. It simultaneously shows
the personalized QFIC’s consideration of the heterogeneity among observations at dif-
ferent visits.
In addition, regarding the heterogeneity among the patients, consider the placebo
group under the dose-group partition criterion. At all four
visit days, more than 35% of patients in the placebo group include dose in their person-
alized predictive models compared to the other two treatment groups. It is reasonable
to infer that the patients in the placebo group tend to have no treatment effect. There-
fore, the treatment indicator dose may be significant for patients in the placebo group
to better predict their exacerbation rate.
Finally, most of the categories in Table 4.7 show smaller prediction error rates under
the personalized QFIC than under ∆AIC. This again shows the advantage of tailoring
the predictive model individually based on the individual level information.
4.3.3 Statistical Inference on Targeted Patients
Rather than focusing only on prediction accuracy for the patients in the study, we also
try to predict outcomes for future patients. In this section, we particularly consider
36-year-old patients who have had RRMS for 8.8 years with an expanded disability status
scale of 4. These values are the medians of the observations on the continuous potential
covariates age, duration, and edss in the current study.
To illustrate that the personalized predictive models are tailored by the personalized
QFIC for each targeted patient, we place these patients into twelve different scenarios
based on the three categorical potential factors of sex, lot, and dose. Table 4.8 records
the corresponding twelve personalized predictive models selected by the personalized
QFIC. The corresponding exacerbation rate predictions through the 17 visit times for
each scenario are also plotted in Figure 4.6.
In Table 4.8, the female patients in the placebo and low-dose groups at location A,
and the male patients in the low-dose group at location A and in the high-dose group
at location B, have personalized predictive models that include only age, and are
thus consistent with the single predictive model m30.
Table 4.7: RRMS - Group-Specific Percentages and Prediction Error Rates for the
Targeted Patients at the Targeted Visit Days with Four Partition Criteria

Criterion  Group      Day 7                   Day 31                  Day 61                  Day 104
                      pct.  erFIC erAIC size  pct.  erFIC erAIC size  pct.  erFIC erAIC size  pct.  erFIC erAIC size
dose[×]    placebo    35%   0.293 0.303 17    50%   0.233 0.238 16    38%   0.170 0.224 17    38%   0.170 0.224 16
           low        0%    -     -     17    0%    -     -     17    0%    -     -     17    0%    -     -     15
           high       19%   0.151 0.244 16    19%   0.088 0.117 16    7%    0.805 0.842 16    7%    0.805 0.842 14
           total      18%   0.246 0.284 50    22%   0.194 0.205 49    16%   0.261 0.312 50    16%   0.261 0.312 45
lot[×]     A          0%    -     -     10    0%    -     -     9     0%    -     -     10    0%    -     -     8
           B          3%    0.255 0.420 40    3%    0.818 0.758 40    5%    0.192 0.218 40    11%   0.530 0.515 37
           total      2%    0.255 0.420 50    2%    0.818 0.758 49    4%    0.192 0.218 50    9%    0.530 0.515 45
sex[×]     male       0%    -     -     38    5%    0.405 0.426 37    18%   0.2609 0.243 38   6%    0.552 0.628 35
           female     8%    0.292 0.472 12    17%   0.096 0.195 12    17%   0.099 0.155 12    10%   0.142 0.303 10
           total      2%    0.292 0.472 50    8%    0.250 0.311 49    18%   0.225 0.224 50    7%    0.416 0.520 45
age[ ]     <= 30      50%   0.517 0.532 8     50%   0.639 0.644 8     75%   0.917 0.887 8     50%   0.452 0.445 6
           30-40      19%   0.110 0.114 26    28%   0.132 0.148 25    35%   0.074 0.085 26    40%   0.213 0.243 25
           >= 40      25%   0.071 0.093 16    25%   0.046 0.048 16    50%   0.302 0.296 16    50%   0.282 0.295 14
           total      26%   0.142 0.152 50    31%   0.161 0.171 49    46%   0.204 0.207 50    44%   0.261 0.282 45

NOTE: At each targeted visit day, pct. indicates the percentage of the targeted patients in each group;
erFIC and erAIC indicate the prediction error rates of the personalized predictive models and the single
predictive model based on the targeted patients in each group; size indicates the number of patients in
each group; age[ ] indicates the targeted patients whose personalized predictive models exclude age;
dose[×] indicates the targeted patients whose personalized predictive models include dose; lot[×]
indicates the targeted patients whose personalized predictive models include lot; and sex[×] indicates
the targeted patients whose personalized predictive models include sex.
Table 4.8: RRMS - Personalized Predictive Models Concluded by the Personalized
QFIC for Targeted Patients under Twelve Scenarios
sex
dose lot male female
placebo A int. age sex age
B int. int. sex
low A age age
B int. lot.
high A dose age dose age sex
B age int. lot sex
The targeted patients who receive high-dose in location A include dose in their
personalized predictive model. The female patients who are in low-dose and high-dose
groups in location B include lot in their personalized predictive models. The uncer-
tain explanatory covariate age is significant for all patients in location A, regardless
of their gender and treatment. In location B, however, only high-dose males identify
age’s significance. Among these twelve scenarios, females tend to include sex in their
personalized predictive models.
From Figure 4.6 we observe that both the personalized predictive models and the
single predictive model m30 show a U-shaped exacerbation rate prediction along
with the visit time. But in the different treatment groups, namely the placebo, low-
dose, and high-dose groups, the personalized predicted exacerbation rate decreases
as the dose level changes from placebo to high, whereas it stays the same under the
single predictive model m30.
In conclusion, the personalized QFIC utilizes the individual level information and
considers the heterogeneity among RRMS patients and even among the repeated mea-
surements from the same patient at different visit times. With the personalized predic-
tive model selected by the personalized QFIC, we can therefore reach a more accurate
exacerbation rate prediction and make a better prognosis and diagnosis on treatments
for the targeted patient only.

Figure 4.6: RRMS - Exacerbation Rate Predictions for Targeted Patients under the
Single Predictive Model and the Twelve Personalized Predictive Models
[Figure: four panels (male at location A, male at location B, female at location A, female at location B) plotting exacerbation rate predictions (0.10-0.30) over visit days (20-100) under ∆AIC's m30 and the QFIC-selected models for the placebo, low-dose and high-dose groups: male/A - m13, m30, m22; male/B - m30, m30, m21; female/A - m16, m16, m30; female/B - m15, m28, m30.]
NOTE: QFICplacebo, QFIClow and QFIChigh indicate the personalized predictive models for the targeted
patients in the placebo, low-dose and high-dose groups, respectively.
4.4 Veteran’s Lung Cancer Case Study
As we mentioned in Section 4.1, cancers can be very diverse even if they are in the
same primary site and stage, as discussed in Simon (2013). Heterogeneity therefore
merits careful consideration in oncology studies. Since survival outcomes are very
common in such studies, we are motivated to apply the adjusted focused information
criterion, introduced in Hjort and Claeskens (2006), to a lung cancer survival study.
As mentioned in Harris et al. (1989) and Campling et al. (2005), “Lung cancer is
an urgent priority among veterans. Not only is the incidence higher, but the survival is
lower than in civilian populations.” The third case study in this chapter concerns lung
cancer veterans. The dataset was collected by the Veterans Administration Lung Cancer
Study Group and reported in Prentice (1973). It has been studied by Kalbfleisch and
Prentice (2002) and further discussed in Bennett (1983) and Pettitt (1984).
In this trial, 137 veterans with advanced inoperable lung cancer were randomized
to one of two chemotherapeutic agents. The primary failure time was the time to
death, and nine survival times were censored. The potential explanatory covariates
include: kscore, Karnofsky performance score of the patient's daily living activities
measured at randomization (10-30: completely hospitalized; 40-60: partial confinement;
70-90: able to care for self); diagt, time from diagnosis to randomization (in months);
age, the patient's age at the beginning of the study (in years); prior, an indicator
of whether the patient has had prior therapy (0 - no and 10 - yes); treat,
chemotherapeutic agent (1 - standard and 2 - test); and type, histological type of
tumor cells (1 - squamous, 2 - small, 3 - adenocarcinoma, and 4 - large).
For patients with cancer, especially advanced cancer, survival rates are commonly
used by a doctor as a standard and effective way of discussing individualized prog-
nosis. In this study, we focus mainly on the prediction of each lung cancer veteran’s
survival rate, in particular, on the 30th day since randomization. We can thus provide
the corresponding personalized prognosis for each lung cancer veteran.
4.4.1 Model Selection Implementation
We first prefit the data with the Cox proportional hazard linear regression model with
all the potential explanatory covariates, as listed above:
λ(t) = λ0(t) exp(β1 kscore + β2 type + β3 treat + β4 age + β5 prior + β6 diagt).
The corresponding estimates, standard errors and p-values are listed in Table 4.9.
Table 4.9: Lung Cancer - Statistical Inference under Full Model
Covariate Estimates Exp(estimates) Std.err Z-value P-value
kscore -3.3e-02 9.7e-01 5.5e-03 -6.0 2.6e-09
type(2) 8.6e-01 2.4e+00 2.8e-01 3.1 1.8e-03
type(3) 1.2e+00 3.3e+00 3.0e-01 4.0 7.1e-05
type(4) 4.0e-01 1.5e+00 2.8e-01 1.4 1.6e-01
treat 2.9e-01 1.3e+00 2.1e-01 1.4 1.6e-01
age -8.7e-03 9.9e-01 9.3e-03 -0.9 3.5e-01
prior 7.2e-03 1.0e+00 2.3e-02 0.3 7.6e-01
diagt 8.1e-05 1.0e+00 9.1e-03 0.0 9.9e-01
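For intuition, the Cox model above is fit by maximizing the partial likelihood, which compares each veteran who dies with everyone still at risk at that time. A self-contained one-covariate sketch, ignoring tied event times (the data are toy values, not the VA trial):

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood for a one-covariate Cox model (no ties).
    Each observed death contributes x_i * beta minus the log of the sum
    of exp(x_j * beta) over subjects still at risk (t_j >= t_i)."""
    ll = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        risk = [math.exp(x[j] * beta)
                for j in range(len(times)) if times[j] >= times[i]]
        ll += x[i] * beta - math.log(sum(risk))
    return ll

# Two subjects, both deaths, binary covariate; at beta = 0 every death is
# equally likely within its risk set, so the log-likelihood is -log(2).
print(cox_partial_loglik(0.0, times=[1.0, 2.0], events=[1, 1], x=[1.0, 0.0]))
```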
Under the full model, Table 4.9 indicates the highly significant covariates, kscore
and type, which are included in the narrow model as the certain covariates. We there-
fore run the model selection procedure among the remaining four uncertain covariates:
treat, age, prior, and diagt. Accordingly, there are 2^4 = 16 candidate models, as
listed in Table 4.10.
Table 4.10: Lung Cancer - Candidate Models
treat age prior diagt treat age prior diagt
m1 × × × × m9 × × ×
m2 × × × m10 × ×
m3 × × × m11 × ×
m4 × × m12 ×
m5 × × × m13 × ×
m6 × × m14 ×
m7 × × m15 ×
m8 × m16
NOTE: × indicates presence of the covariate in the candidate model and a blank means its absence.
The traditional AIC selects the narrow model m16, circled in Table 4.10, as the final
single predictive model with the overall best fit for all the lung cancer veterans.
Nevertheless, by considering the veterans' heterogeneity, the personalized FIC provides
different personalized predictive models for different targeted patients based on their
individual information. The frequencies of the 16 candidate models selected as the
personalized predictive models for all 137 lung cancer veterans are plotted in
Figure 4.7. Relatively consistent with AIC, Figure 4.7 reveals that m16 is most
frequently chosen as the personalized predictive model. Other than the narrow model,
m8 with treat and m14 with prior also have relatively high frequencies.
4.4.2 Group-Specific Analysis
To check the heterogeneity among the lung cancer veterans in the study, we perform
group-specific analysis from two different perspectives. Due to the higher frequencies
Figure 4.7: Lung Cancer - Frequencies of Candidate Models Selected by the
Personalized FIC as the Personalized Predictive Models for 137 Veterans
[Bar chart: frequency (0 to 70) of each of the 16 candidate models.]
of m8, m14, and m16 in Figure 4.7, we first compare three groups, G8, G14 and G16, composed of the lung cancer veterans whose personalized predictive models are chosen to be m8, m14 and m16, respectively.

Figure 4.8 shows the Kaplan-Meier estimated survival curves for all three groups. From the graphs, we observe that the lung cancer veterans in G8 have the smallest estimated survival rate at the 30th day, around 0.2. Veterans in G14 have a larger estimated survival rate of around 0.8 compared to G8, and G16 has the largest survival rate, around 0.9.
Crooks et al. (1991) noted that patients with higher Karnofsky scores at the time of tumor diagnosis have better survival and quality of life over the course of their illness. Therefore, histograms of the Karnofsky performance score for all three groups are plotted in Figure 4.8 as well. Consistent with the Kaplan-Meier estimates, the Karnofsky scores in G8 tend to be relatively low, while the lung cancer veterans in G16 tend to have relatively high Karnofsky scores compared to the other two groups.
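The Kaplan-Meier curves compared above come from the standard product-limit estimator. The following is a minimal Python sketch with made-up toy data, not the code used to produce Figure 4.8:

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  : observed times (event or censoring) for each patient
    events : 1 if the event (death) was observed, 0 if censored
    Returns a list of (event time, estimated survival rate) pairs.
    """
    n = len(times)
    # Sort by time; at ties, process events before censorings.
    order = sorted(range(n), key=lambda i: (times[i], 1 - events[i]))
    at_risk, surv, curve = n, 1.0, []
    for i in order:
        if events[i] == 1:
            surv *= (at_risk - 1) / at_risk  # step down at each observed event
            curve.append((times[i], surv))
        at_risk -= 1                         # subject leaves the risk set
    return curve

# Toy data: five patients, three events and two censorings.
curve = kaplan_meier([1, 2, 2, 3, 5], [1, 1, 0, 1, 0])
```

With these toy data the estimate steps to 0.8 at time 1, 0.6 at time 2 and 0.3 at time 3 (up to floating point).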
Figure 4.8: Lung Cancer - Kaplan-Meier Estimates and Karnofsky Score
Histograms for Veterans in Groups G8, G14 and G16
[Three Kaplan-Meier curves (survival rate versus time in days, with Day 30 marked), one for each of G8, G14 and G16, each paired with a histogram of Karnofsky scores (0 to 100) for that group.]
Per the previous discussion, m8 tends to be assigned to the lung cancer veterans who have a relatively low kscore and a relatively small survival rate at the 30th day. The inclusion of the uncertain covariate treat in m8 therefore indicates that the treatment assignment is an important factor in predicting the survival rate of veterans with much more advanced lung cancer. Compared to the narrow model m16, m14 includes one extra uncertain covariate, prior. It is also reasonable to infer that information about whether prior therapy was given is important for the relatively advanced lung cancer veterans. In addition, more than 50% of the lung cancer veterans select m16 as their personalized predictive model, which is consistent with the overall character of the single predictive model selected by AIC.
The other perspective of group-specific analysis we consider is the heterogeneity among patients in terms of their tumor cell types. Figure 4.9 presents the frequencies with which the candidate models are selected as the final personalized predictive models for lung cancer veterans with different tumor cell types. The histograms show that the majority of veterans with squamous and large tumor cells select the narrow model, m16, as their final personalized predictive model, whereas the final models for veterans with small and adeno tumor cells are roughly equally distributed among the candidate models m8, m14 and m16.
4.4.3 Adjusted Prediction Error
Due to the presence of censoring and of time-dependence in survival analysis, Schumacher et al. (2007) adjusted the prediction error via the Brier score (Brier, 1950) as a function of time.

For the targeted patient j, denote by (yj, sj) the observed time and the censoring status, where yj is the censoring time when sj = 1 and the event time when sj = 0. At time t, pj(t) denotes the survival status; in other words, if the patient is alive then pj(t) = 1,
Figure 4.9: Lung Cancer - Frequencies of Candidate Models Selected by the
Personalized FIC as the Personalized Predictive Models for Veterans with Different
Tumor Cell Types
[Four bar charts (Squamous, Small, Adeno and Large cell), each showing the frequency (0 to 25) of the 16 candidate models.]
otherwise, pj(t) = 0. The adjusted prediction error at time t is defined as:
\[
\widehat{\mathrm{er}}(t) = \sum_j \bigl[p_j(t) - r_j(t)\bigr]^2 \, w\bigl[t, \hat g(t)\bigr],
\]
where rj(t) is the estimated survival rate for patient j at time t. The weights w remove the large-sample censoring bias and are given by:
\[
w\bigl[t, \hat g(t)\bigr] = \frac{I\{y_j \le t,\ s_j = 0\}}{\hat g(y_j^-)} + \frac{I\{y_j > t\}}{\hat g(t)},
\]
where \(\hat g(t)\) denotes an estimate of the conditional probability of being uncensored at time t (Gerds and Schumacher, 2006; van der Laan and Robins, 2003). Here, we substitute the Kaplan-Meier estimate of the censoring survival function for g, which makes the estimated prediction error consistent (Gerds and Schumacher, 2006; Graf et al., 1999; Korn and Simon, 1991).
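The adjusted prediction error above can be transcribed almost directly. Below is a minimal Python sketch (function and variable names are ours; we divide by the number of patients to report an error rate, and g(yj) stands in for the left limit at yj):

```python
def adjusted_prediction_error(t, y, s, r_t, g):
    """Censoring-adjusted Brier score at time t.

    y   : observed times y_j
    s   : censoring status s_j (1 = censored, 0 = event, as in the text)
    r_t : estimated survival rates r_j(t), one per patient
    g   : estimate of the censoring survival function, e.g. the
          Kaplan-Meier estimate of staying uncensored beyond u
    """
    total = 0.0
    for yj, sj, rj in zip(y, s, r_t):
        if yj <= t and sj == 0:
            # Event observed by time t: true status p_j(t) = 0.
            total += (0.0 - rj) ** 2 / g(yj)
        elif yj > t:
            # Still under observation at t: true status p_j(t) = 1.
            total += (1.0 - rj) ** 2 / g(t)
        # Patients censored before t receive weight zero.
    return total / len(y)

# Toy evaluation with the censoring survival estimate taken to be 1.
err = adjusted_prediction_error(3, [1, 5, 2], [0, 0, 1],
                                [0.2, 0.9, 0.5], lambda u: 1.0)
```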
With a leave-one-out cross-validation experiment, we calculate the adjusted prediction error rate as 0.273 for the single final predictive model and 0.233 for the personalized predictive models. The better performance of the personalized predictive models once again shows the advantage of considering heterogeneity and using different explanatory factors for different targeted individuals. Therefore, the future personalized prognosis, in terms of the survival rate at day 30, can be more accurately predicted through the personalized predictive model selected by the personalized FIC.
4.5 Conclusion and Remarks
Through these three case studies, namely a cross-sectional study in prostate cancer, a longitudinal study in relapsing-remitting multiple sclerosis and a survival study in lung cancer, we illustrate the application of the personalized FIC to identifying personalized predictive models for personalized prognosis and diagnosis. We thus show the applicability of the personalized FIC in one area of personalized medicine.
Different from the traditional model selection criteria, FIC does not attempt to assess the overall fit of candidate models but instead focuses attention directly on the parameter of primary interest. Generally speaking, in the model selection procedure, including unnecessary covariates may lead to estimates with small bias but high variance, while excluding necessary covariates typically yields large bias though small variance. FIC balances the goals of small bias and small variance, aiming for a small mean square error of the estimates.
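Schematically, this is the familiar decomposition that FIC estimates term by term for the focus parameter under each candidate model S; the display below is only a shorthand for the construction in Claeskens and Hjort (2003), whose exact bias and variance estimators we do not restate here:

```latex
\operatorname{mse}\bigl(\widehat{\mu}_S\bigr)
  = \operatorname{bias}^2\bigl(\widehat{\mu}_S\bigr)
  + \operatorname{var}\bigl(\widehat{\mu}_S\bigr),
\qquad
\operatorname{FIC}_S
  = \widehat{\operatorname{bias}^2}\bigl(\widehat{\mu}_S\bigr)
  + \widehat{\operatorname{var}}\bigl(\widehat{\mu}_S\bigr).
```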
With the information from all the patients in a study, the traditional procedure makes statistical inference and prediction for any targeted patient with the "overall fitting" model selected by a traditional model selection criterion. Due to the patients' heterogeneity, the model with the best overall properties may not be the best for the targeted patient. By using the individual level information from the targeted patient, the personalized FIC focuses on individual prediction and aims to find his or her own best model, in the sense of the minimum mean square error estimate of his or her own prediction. Leave-one-out cross-validation experiments and group-specific analyses were performed for all three case studies. The smaller prediction error rates attained by using the different personalized predictive models, compared to the single "overall best" model, show the superiority of our perspective.
In this chapter, we apply FIC's individualized perspective only to personalized prognosis and diagnosis. More research on applications of the personalized FIC needs to be invested in the field of personalized medicine, such as personalized therapy selection and monitoring.
5 Discussion and Future Work
Quasi-likelihood based model selection and model averaging procedures incorporating the GEE approach, proposed in this thesis as ∆AIC, QFIC and QFMA, were originally designed for analyzing regular (fully observed) longitudinal and correlated data only. Nevertheless, missing data arise quite often in longitudinal studies. Following Little and Rubin (1987) and Little (1995), missing mechanisms fall mainly into three categories: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR).
For longitudinal data with missing outcomes, the GEE approach attempts to be robust by relaxing assumptions on the model, but at the price of imposing a relatively strong missing mechanism assumption, MCAR. Even if the working correlation matrix is correctly specified, failing to meet the MCAR assumption can result in severely biased estimation. The weighted generalized estimating equations (WGEE) approach was therefore proposed by Robins et al. (1994). It aims to avoid the bias by ignoring the unavailable observations and placing more weight on the remainder. One direction of future research is to extend the current model selection and model averaging procedures incorporating the GEE approach to procedures incorporating the WGEE approach, so as to deal with missing outcomes in longitudinal studies.
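The weighting idea behind WGEE can be sketched in a few lines. This illustrates only the inverse-probability weighting, with hypothetical names, not the estimating-equation machinery itself; in practice the observation probabilities are modeled, for example by logistic regression on the observed history:

```python
def ipw_weights(observed, p_obs):
    """Inverse-probability weights used by the WGEE idea.

    observed : 1 if the outcome at a visit is available, 0 if missing
    p_obs    : modeled probability that the outcome is observed
    A missing observation gets weight 0; an available one is up-weighted
    by 1 / p_obs, so that it also stands in for comparable patients
    whose outcomes are missing.
    """
    return [1.0 / p if o == 1 else 0.0 for o, p in zip(observed, p_obs)]

weights = ipw_weights([1, 0, 1], [0.5, 0.8, 0.25])  # [2.0, 0.0, 4.0]
```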
One advantage of the GEE estimates is their consistency even if the working correlation matrix is misspecified. Nevertheless, choosing a working correlation matrix close to the true correlation structure can increase the estimates' efficiency. There are, therefore, two issues involved in model selection and averaging for longitudinal data incorporating the GEE approach: variable selection and working correlation matrix selection.
In this thesis, we consider only the model selection and averaging procedures regarding the potential explanatory covariates. For longitudinal data incorporating the GEE approach, more research needs to be conducted on the selection and averaging of the working correlation matrix. We can build a similar local misspecification framework, as mentioned in Subsection 2.2.1, where the independence structure IN can be viewed as the narrow model, with the diagonal parameters of the matrix as the certain parameters. The remaining parameters in the exchangeable (EX), autoregressive (AR) or even unstructured (UN) matrices can all be treated as the uncertain parameters.
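The structures mentioned here are easy to write down explicitly. A minimal Python sketch of the three common working correlation matrices (IN, EX and AR(1)); the unstructured (UN) case has no closed form beyond its free off-diagonal entries:

```python
def working_correlation(m, structure, alpha=0.0):
    """Working correlation matrix of size m x m for one cluster.

    structure : "IN" (independence, the narrow model in the framework
                sketched above), "EX" (exchangeable) or "AR" (AR(1))
    alpha     : the correlation parameter for EX and AR
    """
    if structure == "IN":
        return [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    if structure == "EX":
        return [[1.0 if i == j else alpha for j in range(m)] for i in range(m)]
    if structure == "AR":
        return [[alpha ** abs(i - j) for j in range(m)] for i in range(m)]
    raise ValueError("unknown structure: " + structure)
```

For example, `working_correlation(3, "AR", 0.5)` has 0.25 in its corners, reflecting the correlation decaying with the lag between visits.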
Another direction of future research is to extend the model selection and averaging procedures to mixed effects models and to develop a strategy for choosing the optimal weights, from both methodological and practical perspectives. Because statistical inference proceeds at two stages, the population level and the individual level, as mentioned in Chapter 1, the development for mixed effects models can be very challenging. Beyond the selection of the fixed effects, the random effects and their covariance matrix structures have to be chosen as well, especially for individual level inference.
In the process of theoretical derivation, there are more scenarios that have to be considered. For longitudinal data collected over space, the number of sites or clusters may be limited in a study, so a small sample size in terms of the number of clusters is involved:
\[
n = O(1) \quad \text{and} \quad m_i \to \infty, \qquad i = 1, \dots, n.
\]
If the repeated measurements are collected over time, the number of visits per individual may instead be limited for certain studies:
\[
m_i = O(1), \quad i = 1, \dots, n, \quad \text{and} \quad n \to \infty.
\]
In addition, as mentioned in Chapter 1, there is no closed form for the estimates of the fixed and random effects in generalized linear mixed effects models, so certain approximation methods have to be used as well.
Bibliography
E.P. Acosta, H. Wu, S.M. Hammer, S. Yu, D.R. Kuritzkes, A. Walawander, J.J. Eron,
C.J. Fichtenbaum, C. Pettinelli, D. Neath, E. Ferguson, A.J. Saah, and J.G. Gerber,
“Comparison of two indinavir/ritonavir regimens in the treatment of HIV-infected
individuals.,” J Acquir Immune Defic Syndr, 37:1358–66, 2004.
H. Akaike, “Maximum Likelihood Identification of Gaussian Autoregressive Moving
Average Models,” Biometrika, 60:255–265, 1973.
Huiman X. Barnhart and John M. Williamson, “Goodness-of-Fit Tests for GEE Mod-
eling with Binary Responses,” Biometrics, 54:720–729, 1998.
Steve Bennett, “Analysis of survival data by the proportional odds model,” Statistics in
Medicine, 2(2):273–277, 1983.
Marco Bonetti and Richard D. Gelber, “A graphical method to assess treatment-covariate interactions using the Cox model on subsets of the data,” Statistics in Medicine, 19(19):2595–2609, 2000.
Marco Bonetti and Richard D. Gelber, “Patterns of treatment effects in subsets of
patients in clinical trials,” Biostatistics, 5(3):465–481, 2004.
N.E. Breslow and D. G. Clayton, “Approximate inference in generalized linear mixed
models,” Journal of the American Statistical Association, 88:9–25, 1993.
G. W. Brier, “Verification of Forecasts Expressed in Terms of Probability,” Monthly
Weather Review, 78:1, 1950.
Jason Brinkley, Anastasios Tsiatis, and Kevin J. Anstrom, “A Generalized Estimator of
the Attributable Benefit of an Optimal Treatment Regime,” Biometrics, 66(2):512–
522, 2010.
S.T. Buckland, K.P. Burnham, and N.H. Augustin, “Model Selection: An Integral Part
of Inference,” Biometrics, 53:603–618, 1997.
Kenneth P. Burnham and David R. Anderson, Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, Springer, 2nd edition, 2002.
Tianxi Cai, Lu Tian, Peggy H. Wong, and L. J. Wei, “Analysis of randomized compara-
tive clinical trial data for personalized treatment selections,” Biostatistics, 12(2):270–
282, 2011.
Barbara G. Campling, Wei-Ting Hwang, Jiameng Zhang, Stephanie Thompson,
Leslie A. Litzky, Anil Vachani, Ilene M. Rosen, and Kenneth M. Algazy, “A
population-based study of lung carcinoma in Pennsylvania,” Cancer, 104(4):833–
840, 2005.
Eva Cantoni, Joanna Mills Flemming, and Elvezio Ronchetti, “Variable selection for
marginal longitudinal generalized linear models,” Biometrics, 61(2):507–514, 2005.
G. Claeskens and R. J. Carroll, “An asymptotic theory for model selection inference in
general semiparametric problems,” Biometrika, 94:249–265, 2007.
G. Claeskens, C. Croux, and J. van Kerckhoven, “Variable selection for logistic re-
gression using a prediction-focused information criterion,” Biometrics, 62:972–979,
2006.
G. Claeskens and N.L. Hjort, “The focused information criterion,” Journal of the
American Statistical Association, 98:900–916, 2003.
Gerda. Claeskens and N. L. Hjort, Model Selection and Model Averaging, Cambridge
University Press, Cambridge, 2008.
Valerie Crooks, Susan Waller, Tom Smith, and Theodore J. Hahn, “The Use of the
Karnofsky Performance Scale in Determining Outcomes and Risk in Geriatric Out-
patients,” Journal of Gerontology, 46(4):M139–M144, 1991.
Dmitry Danilov and Jan R. Magnus, “On the Harm That Ignoring Pretesting Can
Cause,” Journal of Econometrics, 122:27–46, 2004.
A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incom-
plete data via the EM algorithm,” Journal of the Royal Statistical Society Series
B-Methodological, 39(1):1–38, 1977.
P.J. Diggle, P. Heagerty, K.Y. Liang, and S.L. Zeger, Analysis of Longitudinal Data,
Oxford University Press, Oxford, 2 edition, 2002.
D. Draper, “Assessment and propagation of model uncertainty,” Journal of the Royal
Statistical Society, Series B, 57:45–70, 1995.
E. Todd Dumas, Roy L. Hawke, and Craig R. Lee, “Warfarin Dosing and the Promise of Pharmacogenomics,” Current Clinical Pharmacology, 2(1):11–21, January 2007.
John J. Dziak and Runze Li, An Overview on Variable Selection for Longitudinal Data,
World Sciences Publisher, Singapore, 2007.
Garrett M. Fitzmaurice, Marie Davidian, Geert Verbeke, and Geert Molenberghs,
Longitudinal Data Analysis, Wiley Series in Probability and Statistics. Wiley-
Interscience [John Wiley & Sons], Hoboken, NJ, 2009.
Wenjiang J. Fu, “Penalized estimating equations,” Biometrics, 59(1):126–132, 2003.
Edward I. George, “The variable selection problem,” Journal of the American Statisti-
cal Association, 95(452):1304–1308, 2000.
Thomas A. Gerds and Martin Schumacher, “Consistent Estimation of the Expected
Brier Score in General Survival Models with Right-Censored Event Times,” Biomet-
rical Journal, 48(6):1029–1040, 2006.
Erika Graf, Claudia Schmoor, Willi Sauerbrei, and Martin Schumacher, “Assessment
and comparison of prognostic classification schemes for survival data,” Statistics in
Medicine, 18(17-18):2529–2545, 1999.
L. Gunter, J. Zhu, and S. A. Murphy, “Variable Selection for Qualitative Interactions,” Statistical Methodology, 8(1):42–55, 2011.
D. J. Hand and V. Vinciotti, “Local versus global models for classification problems:
Fitting models where it matters,” AMST, 57:124–131, 2003.
B. E. Hansen, “Challenges for econometric model selection,” Econometric Theory,
21:60–68, 2005.
Randall E. Harris, James R. Hebert, and Ernst L. Wynder, “Cancer risk in male veterans
utilizing the veterans administration medical system,” Cancer, 64(5):1160–1168,
1989.
David A Harville, “Maximum Likelihood Approaches to Variance Component Esti-
mation and to Related Problems,” Journal of the American Statistical Association,
72(358):320–338, 1977.
R. Henderson and N. Keiding, “Individual survival time prediction using statistical
models,” Journal of Medical Ethics, 31(12):703–706, December 2005.
Nils Lid Hjort and Gerda Claeskens, “Focused information criteria and model aver-
aging for the Cox hazard regression model,” Journal of the American Statistical
Association, 101:1449–1464, 2006.
N.L. Hjort and G. Claeskens, “Frequentist model average estimators,” Journal of the
American Statistical Association, 98:879–899, 2003.
Søren Højsgaard, Ulrich Halekoh, and Jun Yan, “The R Package geepack for Generalized Estimating Equations,” Journal of Statistical Software, 15(2):1–11, 2006.
D. W. Hosmer and S. Lemeshow, Applied Logistic Regression, John Wiley & Sons,
New York, 1989.
Y.X. Huang, H. Liang, and H. L. Wu, “Identifying predictors for anti-HIV treatment
response: mechanism-based differential equation models versus empirical semipara-
metric regression models,” Statistics in Medicine, 27:4722–4739, 2008.
J. D. Kalbfleisch and R. L. Prentice, The Statistical Analysis of Failure Time Data,
Wiley, New York, 2002.
Ronald Klein, Barbara E. K. Klein, Scot E. Moss, Matthew D. Davis, and David L.
DeMets, “The Wisconsin Epidemiologic Study of Diabetic Retinopathy: II. Preva-
lence and Risk of Diabetic Retinopathy When Age at Diagnosis Is Less Than 30
Years,” Arch Ophthalmol, 102(4):520–526, 1984.
Edward L. Korn and Richard Simon, “Explained Residual Variation, Explained Risk, and Goodness of Fit,” The American Statistician, 45(3):201–206, 1991.
S. Kullback and R. A. Leibler, “On information and sufficiency,” Annals of Mathemat-
ical Statistics, 22:49–86, 1951.
N. M. Laird and J. H. Ware, “Random-effects Models for Longitudinal Data,” Biomet-
rics, 38:963–974, 1982.
Hannes Leeb and Benedikt M. Potscher, “Can one estimate the conditional distribution
of post-model-selection estimators?,” The Annals of Statistics, 34:2554–2591, 2006.
Chuan-Yun Li, Xizeng Mao, and Liping Wei, “Genes and (Common) Pathways Under-
lying Drug Addiction,” PLoS Comput Biol, 4(1):e2+, 2008.
K. Y. Liang and S. L. Zeger, “Longitudinal Data Analysis Using Generalized Linear
Models,” Biometrika, 73:13–22, 1986.
Stuart Lipsitz and Garrett Fitzmaurice, “Generalized estimating equations for longitudi-
nal data analysis,” In Longitudinal data analysis, Chapman & Hall/CRC Handbooks
of Modern Statistical Methods, pages 43–78. 2009.
Roderick J. A. Little, “Modeling the drop-out mechanism in repeated-measures stud-
ies,” Journal of the American Statistical Association, 90:1112–1121, 1995.
Roderick J. A. Little and Donald B. Rubin, Statistical Analysis With Missing Data,
Wiley Series in Probability and Mathematical Statistics: Applied Probability and
Statistics. John Wiley & Sons Inc., New York, 1987.
Honghu Liu, Robert E. Weiss, Robert I. Jennrich, and Neil S. Wenger, “PRESS model
selection in repeated measures data,” Computational Statistics & Data Analysis,
30(2):169–184, 1999.
C. L. Mallows, “Some comments on Cp,” Technometrics, 15:661–675, 1973.
P. McCullagh, “Quasi-Likelihood Functions,” The Annals of Statistics, 11:59–67, 1983.
P. McCullagh and J. A. Nelder, Generalized linear models, Chapman and Hall, London
New York, 2nd edition, 1989.
A. J. Miller, Subset Selection in Regression, Chapman and Hall, London, 2 edition,
2002.
Erica E. M. Moodie, Thomas S. Richardson, and David A. Stephens, “Demystifying
Optimal Dynamic Treatment Regimes,” Biometrics, 63(2):447–455, 2007.
S. A. Murphy, “Optimal Dynamic Treatment Regimes,” Journal of the Royal Statistical
Society, Series B, 65:331–366, 2002.
Wei Pan, “Akaike’s information criterion in generalized estimating equations,” Bio-
metrics, 57(1):120–125, 2001.
Wei Pan, “Model selection in estimating equations,” Biometrics, 57(2):529–534, 2001.
A. N. Pettitt, “Proportional Odds Models for Survival Data and Estimates Using Ranks,” Journal of the Royal Statistical Society, Series C (Applied Statistics), 33(2):169–175, 1984.
Marc A Pfeffer and John A Jarcho, “The charisma of subgroups and the subgroups of
CHARISMA.,” N Engl J Med, 354(16):1744–6, 2006.
R. L. Prentice, “Exponential survivals with censoring and explanatory variables,”
Biometrika, 60(2):279–288, 1973.
Ross L. Prentice, “Correlated binary regression with covariates specific to each binary
observation,” Biometrics, 44:1033–1048, 1988.
Min Qian and Susan A. Murphy, “Performance guarantees for individualized treatment
rules,” Annals of statistics, 39(2):1180–1210, 2011.
Annie Qu, Bruce G. Lindsay, and Bing Li, “Improving generalised estimating equations
using quadratic inference functions,” Biometrika, 87(4):823–836, 2000.
J. Robins, L. Orellana, and A. Rotnitzky, “Estimation and extrapolation of optimal
treatment and testing strategies.,” Stat Med, 27(23):4678–721, 2008.
James M. Robins, “Optimal Structural Nested Models for Optimal Sequential Decisions,” In Proceedings of the Second Seattle Symposium in Biostatistics, Springer, 2004.
J.M. Robins, A. Rotnitzky, and L.P. Zhao, “Estimation of regression coefficients when
some regressors are not always observed,” Journal of the American Statistical Asso-
ciation, 89:846–866, 1994.
G. K. Robinson, “That BLUP is a Good Thing: The Estimation of Random Effects,”
Statistical Science, 6(1):15–32, 1991.
R. Schall, “Estimation in generalized linear models with random effects,” Biometrika,
78:717–727, 1991.
Martin Schumacher, Harald Binder, and Thomas Gerds, “Assessment of survival pre-
diction models based on microarray data,” Bioinformatics, 23:1768–1774, 2007.
G. Schwarz, “Estimating the dimension of a model,” The Annals of Statistics, 6:461–
464, 1978.
J. Shao, “An Asymptotic Theory for Linear Model Selection,” Statistica Sinica, 7:221–
264, 1997.
X. Shen, H-C. Huang, and J. Ye, “Adaptive model selection and assessment for expo-
nential family models,” Technometrics, 46:306–317, 2004.
R. Simon, “Roadmap for developing and validating therapeutically relevant genomic
classifiers,” J Clin Oncol, 23:7332–7341, 2005.
Richard Simon, “Clinical trials for predictive medicine,” Statistics in Medicine,
31(25):3031–3040, 2012.
R.M. Simon, Genomic Clinical Trials and Predictive Medicine, Practical Guides to
Biostatistics and Epidemiology. Cambridge University Press, 2013.
X. Song and M.S. Pepe, “Evaluating markers for selecting a patient’s treatment.,”
Biometrics, 60(4):874–83, 2004.
Robert Stiratelli, Nan Laird, and James H. Ware, “Random-Effects Models for Serial Observations with Binary Response,” Biometrics, 40(4):961–971, 1984.
Florin Vaida and Suzette Blanchard, “Conditional Akaike information for mixed-effects
models,” Biometrika, 92:351–370, 2005.
Mark J. van der Laan and James M. Robins, Unified methods for censored longitudinal
data and causality, Springer, 2003.
Lan Wang and Annie Qu, “Consistent model selection and data-driven smooth tests
for longitudinal data in the estimating equations approach,” Journal of the Royal
Statistical Society, Series B, 71(1):177–190, 2009.
R. Wang, S. W. Lagakos, J. H. Ware, D. J. Hunter, and Jm Drazen, “Statistics in
medicine–reporting of subgroup analyses in clinical trials,” New England Journal of
Medicine, 357(21):2189–94+, 2007.
R. W. M. Wedderburn, “Quasi-likelihood functions, generalized linear models, and the
Gauss-Newton method,” Biometrika, 61:439–447, 1974.
Halbert White, “A heteroskedasticity-consistent covariance matrix estimator and a di-
rect test for heteroskedasticity,” Econometrica, 48:817–838, 1980.
H. Wu, Y. Huang, E.P. Acosta, S.L. Rosenkranz, D.R. Kuritzkes, J.J. Eron, A.S. Perel-
son, and J.G. Gerber, “Modeling long-term HIV dynamics and antiretroviral re-
sponse: effects of drug potency, pharmacokinetics, adherence, and drug resistance.,”
J Acquir Immune Defic Syndr, 39:272–83, 2005.
Jun Yan, “Enjoy the Joy of Copulas: With a Package copula,” Journal of Statistical
Software, 21(4):1–21, 2007.
Y. H. Yang, “Can the Strengths of AIC and BIC be Shared?- A Conflict Between Model
Identification and Regression Estimation,” Biometrika, 92:937–950, 2005.
S.L. Zeger and M.R. Karim, “Generalized linear models with random effects: a Gibbs
sampling approach,” Journal of the American Statistical Association, 86:79–86,
1991.
Baqun Zhang, Anastasios A. Tsiatis, Eric B. Laber, and Marie Davidian, “A Robust
Method for Estimating Optimal Treatment Regimes,” Biometrics, 68(4):1010–1018,
2012.
X. Y. Zhang and H. Liang, “Focused information criterion and model averaging for
generalized additive partial linear models,” The Annals of Statistics, 39:174–200,
2011.
Lue Ping Zhao and Ross L. Prentice, “Correlated binary regression using a quadratic
exponential model,” Biometrika, 77:642–648, 1990.
Appendix
Let \(f(y; \theta, \gamma)\) be the density function. Denote the corresponding score function, evaluated at \((\theta_0, 0)\), by:
\[
T = \begin{bmatrix} T_1 \\ T_2 \end{bmatrix}
  = \begin{bmatrix} \partial \log f(y;\theta,\gamma)/\partial\theta \\ \partial \log f(y;\theta,\gamma)/\partial\gamma \end{bmatrix}_{\theta=\theta_0,\,\gamma=0}.
\]
Denote the corresponding quasi-score function, evaluated at \((\theta_0, 0)\), by:
\[
U = \begin{bmatrix} U_1 \\ U_2 \end{bmatrix}
  = \begin{bmatrix} \partial Q(\theta,\gamma; y)/\partial\theta \\ \partial Q(\theta,\gamma; y)/\partial\gamma \end{bmatrix}_{\theta=\theta_0,\,\gamma=0}.
\]
The corresponding second derivatives are denoted by:
\[
H = \begin{bmatrix}
  \partial^2 Q/\partial\theta\partial\theta^\top & \partial^2 Q/\partial\theta\partial\gamma^\top \\
  \partial^2 Q/\partial\gamma\partial\theta^\top & \partial^2 Q/\partial\gamma\partial\gamma^\top
\end{bmatrix}_{\theta=\theta_0,\,\gamma=0}.
\]
To study the large sample properties of the proposed model selection criterion ∆AIC,
we need some regularity conditions.
A.1 Regularity Assumptions
(C.1): The log density function \(\log f(y;\theta,\gamma)\) has three continuous partial derivatives with respect to \((\theta,\gamma)\) in a neighborhood of \((\theta_0, 0)\), which are dominated by functions with finite means under \(f_N(y) = f(y;\theta_0, 0)\). The true density \(f_0(y) = f(y;\theta_0, \delta n^{-1/2})\) can be represented in terms of \(f_N(y)\) as:
\[
f_0(y) = f_N(y)\bigl\{1 + T_2^\top(y)\,\delta n^{-1/2} + r(y, \delta n^{-1/2})\bigr\},
\]
where \(r(y, t)\) is small enough that \(f_N(y)\,r(y, t)\) is of order \(o(\|t\|^2)\) uniformly in \(y\).
(C.2): The log quasi-likelihood function \(Q(\theta,\gamma; y)\) has three continuous derivatives with respect to \((\theta,\gamma)\) in a neighborhood of \((\theta_0, 0)\), which are dominated by functions with finite means under \(f_N(y)\). The quasi-information matrix \(\Sigma\) (defined below) exists and is non-singular under \(f_N(y)\):
\[
\Sigma = E_N(-H) = \operatorname{var}_N(U)
       = \begin{bmatrix} \Sigma_{00} & \Sigma_{01} \\ \Sigma_{10} & \Sigma_{11} \end{bmatrix}
\quad \text{and} \quad
\Sigma^{-1} = \begin{bmatrix} \Sigma^{00} & \Sigma^{01} \\ \Sigma^{10} & \Sigma^{11} \end{bmatrix}.
\]
(C.3): The integrals \(\int U(y) f_N(y)\, r(y, t)\,dy\) and \(\int \|U(y)\|^2 f_N(y)\, r(y, t)\,dy\) are of order \(o(\|t\|^2)\).
(C.4): For some \(\xi > 0\), the integrals \(\int \|U(y)\|^{2+\xi} f_N(y)\,dy\) and \(\int \|U(y)\|^{2+\xi} f_N(y)\, r(y, t)\,dy\) are of order \(O(1)\). Also, the variables \(|U_{1k}^{2+\xi}(y)\, T_{2r}(y)|\) and \(|U_{2l}^{2+\xi}(y)\, T_{2r}(y)|\) have finite means under the null density \(f_N(y)\), for \(k \in \{1,\dots,p\}\) and \(r, l \in \{1,\dots,q\}\), with \(U_{1k} = \partial Q/\partial\theta_k\), \(U_{2l} = \partial Q/\partial\gamma_l\) and \(T_{2r} = \partial \log f/\partial\gamma_r\).
These assumptions are customarily made in the literature on quasi-likelihood functions, GEE and the local misspecification framework; see Wedderburn (1974), McCullagh (1983), Liang and Zeger (1986) and Hjort and Claeskens (2003).
A.2 Technical Lemmas
Two lemmas are introduced in this section. Lemma A.1 describes the large sample behavior of the quasi-score. Lemma A.2 establishes the relationship between the GEE estimates and the quasi-score, and hence the large sample behavior of the GEE estimates.
Lemma A.1 Under the misspecification framework and the Regularity Assumptions, we have:
\[
\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}
\xrightarrow{d} N_{p+q}\!\left(\begin{bmatrix} \Sigma_{01} \\ \Sigma_{11} \end{bmatrix}\delta,\ \Sigma\right),
\]
where
\[
R_{1,n} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} U_1(y_i)
\quad \text{and} \quad
R_{2,n} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} U_2(y_i).
\]
In particular, in candidate model \(S\):
\[
\begin{bmatrix} R_{1,n} \\ R_{2,S,n} \end{bmatrix}
\xrightarrow{d} N_{p+q_S}\!\left(\begin{bmatrix} \Sigma_{01} \\ \pi_S\Sigma_{11} \end{bmatrix}\delta,\ \Sigma_S\right).
\]
Here, \(\xrightarrow{d}\) denotes convergence in distribution under the sequence of true densities \(f_0(y)\).
Proof. We shall finish the proof in three steps. In the first two steps, we calculate the
expectation and variance of the quasi-score under f0(y). In the third step, we verify the
requirement for the Lyapounov central limit theorem and complete the proof.
Step 1. Consider \(E_0(U_1)\) first; \(E_0(U_2)\) can be handled by similar arguments. A direct calculation yields:
\[
E_0(U_1) = \int U_1(y) f_N(y)\,dy
 + \int U_1(y) f_N(y)\, T_2^\top(y)\,\delta n^{-1/2}\,dy
 + \int U_1(y) f_N(y)\, r(y, \delta n^{-1/2})\,dy. \tag{A.1}
\]
The first term in equation (A.1) equals zero by the fact that \(U(y) = D^\top V^{-1}(y - \mu)\) with \(\mu = E_N(y)\). Note that:
\[
\begin{aligned}
\int U(y) f_N(y)\, T^\top(y)\,dy
&= \int U(y) f_N(y)\,\bigl[\partial \log f_N(y)/\partial\beta\bigr]^\top dy \\
&= \int D^\top V^{-1}(y-\mu)\,\bigl[\partial f_N(y)/\partial\beta\bigr]^\top dy \\
&= D^\top V^{-1}\int y\,\bigl[\partial f_N(y)/\partial\beta^\top\bigr]\,dy
   - D^\top V^{-1}\mu \int \bigl[\partial f_N(y)/\partial\beta^\top\bigr]\,dy \\
&= D^\top V^{-1}\frac{\partial}{\partial\beta^\top}\int y\, f_N(y)\,dy
   - D^\top V^{-1}\mu\,\frac{\partial}{\partial\beta^\top}\int f_N(y)\,dy \\
&= D^\top V^{-1}\frac{\partial\mu}{\partial\beta^\top} - 0 = D^\top V^{-1} D = \Sigma,
\end{aligned}
\]
where the interchanges of differentiation and integration are justified by assumption (C.1), under which \(|T(y)|\) is dominated by a function with finite mean under \(f_N(y)\), and assumption (C.4), under which \(|U(y)T(y)|\) has finite mean under \(f_N(y)\). Thus, the second term in equation (A.1) is \(\Sigma_{01}\delta n^{-1/2}\). Also, by assumption (C.3), the third term in equation (A.1) is of order \(o(1/n)\).

By similar arguments, \(E_0(U_2) = \Sigma_{11}\delta/\sqrt{n} + o(1/n)\). As a result, the expectation of the quasi-score under \(f_0(y)\) becomes:
\[
E_0\begin{bmatrix} U_1 \\ U_2 \end{bmatrix}
 = \begin{bmatrix} \Sigma_{01} \\ \Sigma_{11} \end{bmatrix}\frac{\delta}{\sqrt{n}} + o(1/n).
\]
Step 2. As in the calculation of the expectation, we first consider \(\operatorname{var}_0(U_1)\); the remaining terms can be handled by similar arguments. Note that:
\[
E_0(U_1U_1^\top) = \int U_1(y)U_1^\top(y) f_N(y)\,dy
 + \int U_1(y)U_1^\top(y) f_N(y)\, T_2^\top(y)\,\delta n^{-1/2}\,dy
 + \int U_1(y)U_1^\top(y) f_N(y)\, r(y,\delta n^{-1/2})\,dy. \tag{A.2}
\]
The first term is \(E_N(U_1U_1^\top)\). By assumption (C.4), we know:
\[
\int U_{1k}^2(y)\, T_{2r}(y)\, f_N(y)\,dy
 \le \int \bigl|U_{1k}^2(y)\, T_{2r}(y)\bigr| f_N(y)\,dy = O(1)
\]
and, for \(k_1, k_2 \in \{1,\dots,p\}\),
\[
\left|\int U_{1k_1}(y)U_{1k_2}(y)\, T_{2r}(y)\, f_N(y)\,dy\right|
 \le \frac{1}{2}\left[\int \bigl|U_{1k_1}^2(y)T_{2r}(y) f_N(y)\bigr|\,dy
 + \int \bigl|U_{1k_2}^2(y)T_{2r}(y) f_N(y)\bigr|\,dy\right] = O(1).
\]
Therefore, \(\int U_1(y)U_1^\top(y) f_N(y)\, T_2^\top(y)\,dy\) is of order \(O(1)\), and it follows that the second term in equation (A.2) is of order \(O(1/\sqrt{n})\). By assumption (C.3), the third term in (A.2) is of order \(o(1/n)\).

A direct simplification yields:
\[
\operatorname{var}_0(U_1) = \operatorname{var}_N(U_1) + O(1/\sqrt{n}) = \Sigma_{00} + O(1/\sqrt{n}).
\]
Going through similar arguments for \(\operatorname{var}_0(U_2)\), \(\operatorname{cov}_0(U_1, U_2^\top)\) and \(\operatorname{cov}_0(U_2, U_1^\top)\), the variance of the quasi-score under \(f_0(y)\) can be expressed as:
\[
\operatorname{var}_0\begin{bmatrix} U_1 \\ U_2 \end{bmatrix}
 = \begin{bmatrix} \Sigma_{00} & \Sigma_{01} \\ \Sigma_{10} & \Sigma_{11} \end{bmatrix} + O(1/\sqrt{n})
 = \Sigma + O(1/\sqrt{n}).
\]
Step 3. Because the \(y_i\) are independent, the corresponding quasi-scores, denoted by \(U_{F,i} = U(y_i)\), are independent too. By assumption (C.4), for some \(\xi > 0\):
\[
E_0\bigl(\|U(y)\|^{2+\xi}\bigr)
 = \int \|U(y)\|^{2+\xi} f_N(y)\,dy
 + \int \|U(y)\|^{2+\xi} f_N(y)\, T_2^\top(y)\,\delta n^{-1/2}\,dy
 + \int \|U(y)\|^{2+\xi} f_N(y)\, r(y,\delta n^{-1/2})\,dy = O(1).
\]
Therefore \(\|U_{F,i}\|^{2+\xi}\) has finite mean under the true density \(f_0(y)\), and so does \(\|U_{F,i} - E_0(U_{F,i})\|^{2+\xi}\). Denote the true distribution of \(y_i\) by \(F_{0,i}(y)\). It follows that:
\[
\lim_{n\to\infty} n^{-(1+\xi/2)}\sum_{i=1}^{n}\int \|U - E_0(U_{F,i})\|^{2+\xi}\,dF_{0,i}(y) = 0,
\]
so the Lyapounov condition holds. Applying the Lyapounov central limit theorem to the quasi-scores \(U_{F,i}\) gives:
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl\{U_{F,i} - E_0(U_{F,i})\bigr\}
 \xrightarrow{d} N_{p+q}(0, \Sigma).
\]
Therefore,
\[
\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}
 \xrightarrow{d} N_{p+q}\!\left(\begin{bmatrix} \Sigma_{01} \\ \Sigma_{11} \end{bmatrix}\delta,\ \Sigma\right).
\]
Q.E.D.
Lemma A.2 Under the misspecification framework and the Regularity Assumptions, the GEE estimates satisfy the following distributional equivalence:
\[
\sqrt{n}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma \end{bmatrix}
 = \Sigma^{-1}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1).
\]
In particular, under candidate model \(S\):
\[
\sqrt{n}\begin{bmatrix} \hat\theta_S - \theta_0 \\ \hat\gamma_S \end{bmatrix}
 = \Sigma_S^{-1}\begin{bmatrix} R_{1,n} \\ \pi_S R_{2,n} \end{bmatrix} + o_p(1).
\]
Proof. Consider a Taylor series expansion of the quasi-score around \((\theta_0, 0)\):
\[
\begin{aligned}
\begin{bmatrix} R_{1,n}(\hat\theta,\hat\gamma) \\ R_{2,n}(\hat\theta,\hat\gamma) \end{bmatrix}
&= \begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}
 + \begin{bmatrix}
     \partial R_{1,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{1,n}(\theta,\gamma)/\partial\gamma^\top \\
     \partial R_{2,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{2,n}(\theta,\gamma)/\partial\gamma^\top
   \end{bmatrix}_{\theta=\theta_0,\gamma=0}
 \begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix} \\
&\quad + \frac{1}{2}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix}^\top
 \begin{bmatrix}
     \partial^2 R_{1,n}(\theta,\gamma)/\partial\theta^\top\partial\theta & \partial^2 R_{1,n}(\theta,\gamma)/\partial\gamma^\top\partial\theta \\
     \partial^2 R_{2,n}(\theta,\gamma)/\partial\theta^\top\partial\gamma & \partial^2 R_{2,n}(\theta,\gamma)/\partial\gamma^\top\partial\gamma
   \end{bmatrix}_{\theta=\theta^*,\gamma=\gamma^*}
 \begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix},
\end{aligned} \tag{A.3}
\]
with $\theta^*$ between $\theta_0$ and $\hat\theta$, and $\gamma^*$ between $0$ and $\hat\gamma$. Recalling the consistency of the GEE estimates, $\theta^* = \theta_0 + o_p(1)$ and $\gamma^* = o_p(1)$. Moreover, assumption (C.1) implies that the matrix of second derivatives in the third term of equation (A.3) is bounded, so the third term is of order $o_p(1)$. Equation (A.3) then becomes
\[
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
= \begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}
+ \begin{bmatrix} \partial R_{1,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{1,n}(\theta,\gamma)/\partial\gamma^\top \\
\partial R_{2,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{2,n}(\theta,\gamma)/\partial\gamma^\top \end{bmatrix}_{\theta=\theta_0,\gamma=0}
\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix} + o_p(1).
\]
Therefore,
\[
\sqrt{n}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix}
= -\sqrt{n}\begin{bmatrix} \partial R_{1,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{1,n}(\theta,\gamma)/\partial\gamma^\top \\
\partial R_{2,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{2,n}(\theta,\gamma)/\partial\gamma^\top \end{bmatrix}^{-1}_{\theta=\theta_0,\gamma=0}
\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1).
\]
Again, assumption (C.1) and the law of large numbers yield
\[
\frac{1}{\sqrt{n}}\begin{bmatrix} \partial R_{1,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{1,n}(\theta,\gamma)/\partial\gamma^\top \\
\partial R_{2,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{2,n}(\theta,\gamma)/\partial\gamma^\top \end{bmatrix}_{\theta=\theta_0,\gamma=0}
= -\Sigma + o_p(1)
\]
and
\[
\sqrt{n}\begin{bmatrix} \partial R_{1,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{1,n}(\theta,\gamma)/\partial\gamma^\top \\
\partial R_{2,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{2,n}(\theta,\gamma)/\partial\gamma^\top \end{bmatrix}^{-1}_{\theta=\theta_0,\gamma=0}
= -\Sigma^{-1} + o_p(1).
\]
Therefore,
\[
\sqrt{n}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix}
= -\bigl\{-\Sigma^{-1} + o_p(1)\bigr\}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1)
= \Sigma^{-1}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1).
\]
This completes the proof. Q.E.D.
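Lemma A.2 is the familiar one-step linearization of an M-estimator. As a hedged illustration (a toy one-parameter logistic model with simulated `x`, `y`, and `beta0`, not the GEE setting of the lemma), solving the estimating equation exactly and taking a single Newton step from the null value give nearly identical answers:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit

rng = np.random.default_rng(1)
n, beta0 = 5000, 0.7
x = rng.normal(size=n)
y = rng.binomial(1, expit(beta0 * x))

def score(b):
    # Estimating equation of a one-parameter logistic model
    return np.sum(x * (y - expit(b * x)))

# Exact root of the estimating equation (score is decreasing in b)
beta_hat = brentq(score, -5.0, 5.0)

# Linearized (one-step) version: beta0 + J^{-1} * score(beta0),
# mirroring sqrt(n)(theta_hat - theta_0) = Sigma^{-1} R + o_p(1)
p0 = expit(beta0 * x)
J = np.sum(x ** 2 * p0 * (1 - p0))   # minus the score derivative at beta0
beta_onestep = beta0 + score(beta0) / J

print(abs(beta_hat - beta_onestep))  # second-order small
```

The gap between the exact root and the one-step estimate is of smaller order than the sampling error itself, which is the content of the $o_p(1)$ remainder in the lemma.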
A.3 Proof of Theorem 2.1
Based on Lemma A.2, the estimate of the uncertain parameter under the full model becomes
\[
\sqrt{n}\hat\gamma = \Sigma^{10} R_{1,n} + \Sigma^{11} R_{2,n} + o_p(1)
= \Sigma^{11}\bigl(R_{2,n} - \Sigma_{10}\Sigma_{00}^{-1} R_{1,n}\bigr) + o_p(1).
\]
The estimates of the uncertain parameters under candidate model $S$ can be written as
\[
\sqrt{n}\hat\gamma_S = \Sigma_S^{11}\bigl(\pi_S R_{2,n} - \Sigma_{10,S}\Sigma_{00}^{-1} R_{1,n}\bigr) + o_p(1)
= \Sigma_S^{11}\pi_S\bigl(R_{2,n} - \Sigma_{10}\Sigma_{00}^{-1} R_{1,n}\bigr) + o_p(1).
\tag{A.4}
\]
A direct calculation gives the following relationship between $\hat\gamma_S$ and $\hat\gamma$:
\[
\sqrt{n}\hat\gamma_S = \sqrt{n}\,\Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\hat\gamma + o_p(1).
\tag{A.5}
\]
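The passage from (A.4) to (A.5) rests on the Schur-complement identity $\pi_S(\Sigma^{11})^{-1}\pi_S^\top = (\Sigma_S^{11})^{-1}$. A small numerical sanity check of that identity (random $\Sigma$ and hypothetical dimensions, not thesis code):

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 3, 4
S = [0, 2]                                   # indices of uncertain parameters kept in model S

# Random symmetric positive-definite (p+q) x (p+q) matrix Sigma
A = rng.normal(size=(p + q, p + q))
Sigma = A @ A.T + (p + q) * np.eye(p + q)
S00, S01 = Sigma[:p, :p], Sigma[:p, p:]
S10, S11 = Sigma[p:, :p], Sigma[p:, p:]

pi_S = np.zeros((len(S), q))
pi_S[np.arange(len(S)), S] = 1.0             # projection matrix pi_S

# Sigma^{11}: inverse of the Schur complement of Sigma00 in Sigma
Sup11 = np.linalg.inv(S11 - S10 @ np.linalg.inv(S00) @ S01)
# Same construction for the submodel matrix Sigma_S
Sup11_S = np.linalg.inv(pi_S @ S11 @ pi_S.T
                        - pi_S @ S10 @ np.linalg.inv(S00) @ S01 @ pi_S.T)

# Key identity behind (A.5): pi_S (Sigma^{11})^{-1} pi_S^T = (Sigma_S^{11})^{-1}
lhs = pi_S @ np.linalg.inv(Sup11) @ pi_S.T
rhs = np.linalg.inv(Sup11_S)
print(np.allclose(lhs, rhs))
```

The same identity is what later makes the quadratic form in $\Delta\mathrm{AIC}_{n,S}$ idempotent, so the chi-squared degrees of freedom come out right.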
The large-sample behavior of the GEE estimates can also be derived from Lemma A.1 and Lemma A.2:
\[
\sqrt{n}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma \end{bmatrix}
\xrightarrow{d} N_{p+q}\Bigl(\Sigma^{-1}\begin{bmatrix} \Sigma_{01} \\ \Sigma_{11} \end{bmatrix}\delta,\ \Sigma^{-1}\Bigr).
\tag{A.6}
\]
We are now ready to prove the main theorem. To derive the specific form of $\Delta\mathrm{AIC}$, consider a Taylor series expansion of the log quasi-likelihood around $(\theta_0, 0)$:
\[
\begin{aligned}
Q(\hat\theta, \hat\gamma; D) &= Q(\theta_0, 0; D)
+ \sqrt{n}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}^\top
\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix} \\
&\quad + \frac{\sqrt{n}}{2}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix}^\top
\begin{bmatrix} \partial R_{1,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{1,n}(\theta,\gamma)/\partial\gamma^\top \\
\partial R_{2,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{2,n}(\theta,\gamma)/\partial\gamma^\top \end{bmatrix}_{\theta=\theta^*,\gamma=\gamma^*}
\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix},
\end{aligned}
\]
where $\theta^*$ is between $\theta_0$ and $\hat\theta$, and $\gamma^*$ is between $0$ and $\hat\gamma$. It follows that
\[
\begin{aligned}
Q(\hat\theta, \hat\gamma; D) - Q(\theta_0, 0; D)
&= \begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}^\top
\sqrt{n}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix}
+ \frac{\sqrt{n}}{2}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix}^\top
\bigl\{-\sqrt{n}\,\Sigma + o_p(\sqrt{n})\bigr\}
\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix} \\
&= \begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}^\top
\biggl\{\Sigma^{-1}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1)\biggr\}
- \frac{1}{2}\biggl\{\Sigma^{-1}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1)\biggr\}^\top
\bigl\{\Sigma + o_p(1)\bigr\}
\biggl\{\Sigma^{-1}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1)\biggr\} \\
&= \frac{1}{2}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}^\top
\Sigma^{-1}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1).
\end{aligned}
\]
The second equality follows from Lemma A.1 and Lemma A.2. In particular,
\[
Q(\hat\theta, \hat\gamma_S; D) - Q(\theta_0, 0; D)
= \frac{1}{2}\begin{bmatrix} R_{1,n} \\ \pi_S R_{2,n} \end{bmatrix}^\top
\Sigma_S^{-1}\begin{bmatrix} R_{1,n} \\ \pi_S R_{2,n} \end{bmatrix} + o_p(1).
\tag{A.7}
\]
For the narrow model, this becomes
\[
Q(\hat\theta_N, 0; D) - Q(\theta_0, 0; D)
= \frac{1}{2} R_{1,n}^\top \Sigma_{00}^{-1} R_{1,n} + o_p(1).
\tag{A.8}
\]
Recall the definition of $\Delta\mathrm{AIC}_{n,S}$:
\[
\Delta\mathrm{AIC}_{n,S}
= -2\sum_{i=1}^n Q(\hat\theta, \hat\gamma_S; y_i) + 2\sum_{i=1}^n Q(\hat\theta_N, 0; y_i) + 2|S/N|
= -2\bigl[Q(\hat\theta, \hat\gamma_S; D) - Q(\hat\theta_N, 0; D)\bigr] + 2|S/N|.
\]
Equations (A.7) and (A.8) then give
\[
\begin{aligned}
\Delta\mathrm{AIC}_{n,S}
&= -2\bigl[Q(\hat\theta, \hat\gamma_S; D) - Q(\theta_0, 0; D)\bigr]
+ 2\bigl[Q(\hat\theta_N, 0; D) - Q(\theta_0, 0; D)\bigr] + 2|S/N| \\
&= -\begin{bmatrix} R_{1,n} \\ \pi_S R_{2,n} \end{bmatrix}^\top
\Sigma_S^{-1}\begin{bmatrix} R_{1,n} \\ \pi_S R_{2,n} \end{bmatrix}
+ R_{1,n}^\top \Sigma_{00}^{-1} R_{1,n} + 2|S/N| + o_p(1).
\end{aligned}
\]
Using the expressions given in equations (A.4) and (A.5), $\Delta\mathrm{AIC}_{n,S}$ can be further expressed as
\[
\begin{aligned}
\Delta\mathrm{AIC}_{n,S}
&= -\bigl(\pi_S R_{2,n} - \pi_S \Sigma_{10}\Sigma_{00}^{-1} R_{1,n}\bigr)^\top
\Sigma_S^{11}
\bigl(\pi_S R_{2,n} - \pi_S \Sigma_{10}\Sigma_{00}^{-1} R_{1,n}\bigr) + 2|S/N| + o_p(1) \\
&= -\sqrt{n}\,\hat\gamma_S^\top \bigl(\Sigma_S^{11}\bigr)^{-1} \sqrt{n}\,\hat\gamma_S + 2|S/N| + o_p(1) \\
&= -n\,\hat\gamma^\top (\Sigma^{11})^{-1}\pi_S^\top \Sigma_S^{11} \pi_S (\Sigma^{11})^{-1}\hat\gamma + 2|S/N| + o_p(1).
\end{aligned}
\]
Recalling equation (A.6), we have shown that $\sqrt{n}\hat\gamma \xrightarrow{d} N_q(\delta, \Sigma^{11})$; therefore the main component of $\Delta\mathrm{AIC}_{n,S}$ converges to a non-central chi-squared distribution and
\[
\Delta\mathrm{AIC}_{n,S} \xrightarrow{d} -\chi^2_{|S/N|}(\lambda_S) + 2|S/N|,
\]
with $\lambda_S = n\gamma_0^\top (\Sigma^{11})^{-1}\pi_S^\top \Sigma_S^{11} \pi_S (\Sigma^{11})^{-1}\gamma_0$, where $\gamma_0 = \delta/\sqrt{n}$. We thus complete the proof. Q.E.D.
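The limiting law $-\chi^2_{|S/N|}(\lambda_S) + 2|S/N|$ can be simulated directly. In the sketch below, `k` and `lam` are arbitrary illustrative values for $|S/N|$ and $\lambda_S$ (not from the thesis); since a noncentral $\chi^2_k(\lambda)$ variable has mean $k + \lambda$, the limit of $\Delta\mathrm{AIC}_{n,S}$ has mean $k - \lambda$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
k, lam, reps = 2, 1.5, 200_000       # k = |S/N|, lam = lambda_S (illustrative)

z = stats.ncx2.rvs(df=k, nc=lam, size=reps, random_state=rng)
delta_aic = -z + 2 * k               # draws from the limit law of Delta-AIC_{n,S}

# E[chi2_k(lam)] = k + lam, so the limiting mean of Delta-AIC is k - lam
print(round(delta_aic.mean(), 2), k - lam)
```

This makes the bias-penalty trade-off visible: larger noncentrality $\lambda_S$ (bigger neglected signal) pushes $\Delta\mathrm{AIC}_{n,S}$ down, favoring the richer model.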
A.4 Proof of Theorem 3.1
Combining Lemma A.1 and Lemma A.2, the GEE estimates under candidate model $S$ converge in distribution as follows:
\[
\sqrt{n}\begin{bmatrix} \hat\theta_S - \theta_0 \\ \hat\gamma_S \end{bmatrix}
\xrightarrow{d}
\begin{bmatrix}
(\Sigma_S^{00}\Sigma_{01} + \Sigma_S^{01}\pi_S\Sigma_{11})\delta + \Sigma_S^{00} M_1 + \Sigma_S^{01}\pi_S M_2 \\
(\Sigma_S^{10}\Sigma_{01} + \Sigma_S^{11}\pi_S\Sigma_{11})\delta + \Sigma_S^{10} M_1 + \Sigma_S^{11}\pi_S M_2
\end{bmatrix}
=
\begin{bmatrix}
\Sigma_{00}^{-1}\Sigma_{01}\delta + \Sigma_{00}^{-1} M_1
 - \Sigma_{00}^{-1}\Sigma_{01}\pi_S^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta \\
\Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta
\end{bmatrix}.
\]
Since $\zeta$ is a function of $(\theta, \gamma)$, $\sqrt{n}\bigl(\hat\zeta_S - \zeta_0\bigr)$ can be expanded via a Taylor expansion and the delta method:
\[
\begin{aligned}
\sqrt{n}(\hat\zeta_S - \zeta_0)
&= \sqrt{n}\bigl\{\zeta(\hat\theta_S, \hat\gamma_S) - \zeta(\theta_0, \delta/\sqrt{n})\bigr\} \\
&\xrightarrow{d}
\Bigl(\frac{\partial\zeta}{\partial\theta}\Bigr)^\top \sqrt{n}(\hat\theta_S - \theta_0)
+ \Bigl(\frac{\partial\zeta}{\partial\gamma_S}\Bigr)^\top \sqrt{n}(\hat\gamma_S - \gamma_0)
- \Bigl(\frac{\partial\zeta}{\partial\gamma}\Bigr)^\top \delta \\
&\xrightarrow{d}
\Bigl(\frac{\partial\zeta}{\partial\theta}\Bigr)^\top
\bigl\{\Sigma_{00}^{-1}\Sigma_{01}\delta + \Sigma_{00}^{-1} M_1
- \Sigma_{00}^{-1}\Sigma_{01}\pi_S^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta\bigr\}
+ \Bigl(\frac{\partial\zeta}{\partial\gamma_S}\Bigr)^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta
- \Bigl(\frac{\partial\zeta}{\partial\gamma}\Bigr)^\top \delta \\
&= \Bigl\{\Bigl(\frac{\partial\zeta}{\partial\theta}\Bigr)^\top \Sigma_{00}^{-1}\Sigma_{01}
- \Bigl(\frac{\partial\zeta}{\partial\gamma}\Bigr)^\top\Bigr\}\delta
- \Bigl\{\Bigl(\frac{\partial\zeta}{\partial\theta}\Bigr)^\top \Sigma_{00}^{-1}\Sigma_{01}\pi_S^\top
- \Bigl(\frac{\partial\zeta}{\partial\gamma_S}\Bigr)^\top\Bigr\}
\Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta
+ \Bigl(\frac{\partial\zeta}{\partial\theta}\Bigr)^\top \Sigma_{00}^{-1} M_1 \\
&= \Bigl(\frac{\partial\zeta}{\partial\theta}\Bigr)^\top \Sigma_{00}^{-1} M_1
+ \omega^\top\delta
- \omega^\top \pi_S^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta.
\end{aligned}
\]
Therefore,
\[
\sqrt{n}(\hat\zeta_S - \zeta_0) \xrightarrow{d}
\Omega_S = \Omega_0 + \omega^\top\delta - \omega^\top \pi_S^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta,
\]
where $\Omega_0 \sim N(0, \tau_0^2)$. The limiting variable $\Omega_S$ follows a normal distribution with mean $\omega^\top\delta - \omega^\top \pi_S^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\delta$ and variance $\tau_0^2 + \omega^\top \pi_S^\top \Sigma_S^{11}\pi_S\,\omega$.
Q.E.D.
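The mean and variance formulas for $\Omega_S$ can be checked by Monte Carlo. The sketch below uses arbitrary toy values for $\omega$, $\delta$, $\tau_0$ and a random $\Sigma^{11}$ (all assumed, not from the thesis), and draws $\Omega_0$ independent of $\Delta$, as in the limit experiment:

```python
import numpy as np

rng = np.random.default_rng(4)
q, reps = 3, 200_000
S = [0, 2]
pi_S = np.zeros((len(S), q))
pi_S[np.arange(len(S)), S] = 1.0

# Toy ingredients (assumed values, purely illustrative)
B = rng.normal(size=(q, q))
Sup11 = B @ B.T + q * np.eye(q)                                # Sigma^{11}
Sup11_S = np.linalg.inv(pi_S @ np.linalg.inv(Sup11) @ pi_S.T)  # Sigma_S^{11}
omega = np.array([0.5, -1.0, 0.8])
delta = np.array([1.0, 0.0, -0.5])
tau0 = 0.7

# Linear map applied to Delta in the limit expression for Omega_S
K = omega @ pi_S.T @ Sup11_S @ pi_S @ np.linalg.inv(Sup11)

# Draw Delta ~ N(delta, Sigma^{11}) and independent Omega_0 ~ N(0, tau0^2)
L = np.linalg.cholesky(Sup11)
Delta = delta + rng.normal(size=(reps, q)) @ L.T
Omega0 = tau0 * rng.normal(size=reps)
Omega_S = Omega0 + omega @ delta - Delta @ K

# Moments predicted by Theorem 3.1
mean_theory = omega @ delta - K @ delta
var_theory = tau0 ** 2 + omega @ pi_S.T @ Sup11_S @ pi_S @ omega
print(round(Omega_S.mean() - mean_theory, 2), round(Omega_S.var() - var_theory, 2))
```

The variance check implicitly uses the identity $\pi_S(\Sigma^{11})^{-1}\pi_S^\top = (\Sigma_S^{11})^{-1}$, which collapses $K\,\Sigma^{11}K^\top$ to $\omega^\top\pi_S^\top\Sigma_S^{11}\pi_S\,\omega$.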
A.5 Proof of Theorem 3.2
Since the compromise estimator has the form $\hat\zeta = \sum_S p(S\,|\,\Delta)\,\hat\zeta_S$, it follows that
\[
\sqrt{n}(\hat\zeta - \zeta_0) \xrightarrow{d}
\Omega = \sum_S p(S\,|\,\Delta)\,\Omega_S
= \Omega_0 + \omega^\top\delta
- \omega^\top \sum_S p(S\,|\,\Delta)\,\pi_S^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta.
\]
The limiting variable $\Omega$ has mean $\omega^\top\delta - \omega^\top E\bigl[\hat\delta(\Delta)\bigr]$ and variance $\tau_0^2 + \omega^\top \mathrm{var}\bigl[\hat\delta(\Delta)\bigr]\omega$, where $\hat\delta(\Delta) = \sum_S p(S\,|\,\Delta)\,\pi_S^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta$.
Q.E.D.
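Theorem 3.2's moment decomposition can be illustrated in the simplest scalar case ($q = 1$, $\Sigma^{11} = 1$) with two candidate models, narrow and full, and a smooth AIC-type weight. All numbers below (`delta`, `omega`, `tau0`, and the logistic weight function) are hypothetical choices, not the thesis's; the check is that the moments of $\Omega$ decompose as $\omega\delta - \omega\,E[\hat\delta(\Delta)]$ and $\tau_0^2 + \omega^2\,\mathrm{var}[\hat\delta(\Delta)]$:

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(5)
reps = 200_000
delta, omega, tau0 = 1.2, 0.8, 0.5    # toy scalar values (assumed)

# Scalar case with Sigma^{11} = 1: Delta ~ N(delta, 1)
Delta = delta + rng.normal(size=reps)

# Smooth AIC-type weight for the full model; p(N|Delta) = 1 - p(F|Delta).
# The narrow model sets delta_hat = 0; the full model uses Delta itself.
pF = expit((Delta ** 2 - 2.0) / 2.0)
delta_hat = pF * Delta                 # compromise estimator delta_hat(Delta)

# Omega_0 drawn independently of Delta, as in the limit experiment
Omega0 = tau0 * rng.normal(size=reps)
Omega = Omega0 + omega * delta - omega * delta_hat

# Moments predicted by Theorem 3.2 (E and var taken over the same draws)
mean_theory = omega * delta - omega * delta_hat.mean()
var_theory = tau0 ** 2 + omega ** 2 * delta_hat.var()
print(round(Omega.mean() - mean_theory, 3), round(Omega.var() - var_theory, 3))
```

Because $\Omega_0$ and $\Delta$ are drawn independently, the cross term vanishes and the simulated variance of $\Omega$ matches $\tau_0^2 + \omega^2\,\mathrm{var}[\hat\delta(\Delta)]$ up to Monte Carlo error.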