Model Selection and Model Averaging for Longitudinal Data with Application in Personalized Medicine

by

Hui Yang

Submitted in Partial Fulfillment of the
Requirements for the Degree
Doctor of Philosophy

Supervised by
Professor Hua Liang

Department of Biostatistics and Computational Biology
School of Medicine and Dentistry
University of Rochester
Rochester, New York
2013
Biographical Sketch
Hui Yang was born in Tianjin, People’s Republic of China, on August 11, 1983. In
2006, she received her Bachelor of Science degree in Statistics in the Department of
Statistics, School of Mathematical Sciences, at Nankai University. Prior to coming to
Rochester, she spent two years in Texas and received her Master of Science degree in
Mathematics in 2009 in the Department of Mathematics, College of Arts and Sciences,
at the University of North Texas.
Thereafter, Hui joined the Ph.D. program in the Department of Biostatistics and
Computational Biology, School of Medicine and Dentistry, at the University of Rochester.
In 2010, she received her Master of Arts degree in Statistics, and she began her Ph.D.
thesis research under the guidance of Professor Hua Liang in 2011.
Hui presented her work at the 2013 International Biometric Society Meeting in
Orlando, Florida and at the 2013 Joint Statistical Meetings in Montreal, Canada. She
is a member of the American Statistical Association and the International Biometric
Society.
Acknowledgments
I would first like to express my sincere gratitude to Professor Hua Liang for his
inspiration and constant guidance, support and encouragement throughout my Ph.D.
research. He not only made this thesis possible but also exemplified for me the
scientific spirit of a true scholar.
Many thanks also to the rest of my thesis committee members: Professor Hulin Wu,
Professor Tanzy Love, and Professor Jean-Philippe Couderc. I very much appreciate
their invaluable suggestions and comments, which helped improve this thesis.
I would also like to thank Professor Guohua Zou for his insight on my thesis research; Professor Michael McDermott for his advice and guidance in planning my pursuit of a Ph.D. degree; and Ms. Cheryl-Bliss Clark for her endless support and care.
I am very grateful to have spent wonderful years in the Department of Biostatistics
and Computational Biology. The graduate courses, lectures and professional activities
helped develop my knowledge and skills and sparked my professional motivation. I
enjoyed interacting with and learning from the faculty, staff and my student colleagues.
Their support and friendships enriched my Ph.D. study.
Finally, I would like to express my love and gratitude to my family, including my
wonderful parents, Xiulan Song and Qiuwei Yang. With their endless loving care, I am
blessed.
Abstract
Longitudinal data are sometimes collected with a large number of potential explanatory variables. To obtain better statistical inference and make more accurate predictions, model selection has become an important procedure in longitudinal studies. Nevertheless, inference based on a single model may ignore the uncertainty introduced by the selection procedure and therefore underestimate the variability. As an alternative, the model averaging approach combines estimates from different candidate models in the form of a weighted mean to reduce the effect of selection instability. There is an extensive literature on model selection and averaging for cross-sectional data, but more effort is needed for longitudinal data.
My thesis focuses on model selection and model averaging procedures in the lon-
gitudinal data context. We propose an AIC-type model selection criterion (∆AIC) in-
corporating the generalized estimating equations approach. Specifically, we consider
the difference between the quasi-likelihood of a candidate model and a narrow model
plus a penalty term in order to avoid the complicated integration calculation from the
quasi-likelihood. This criterion inherits the theoretical asymptotic properties of AIC.
In the second part, we develop a focused information criterion (QFIC) and a Frequentist model averaging (QFMA) procedure on the basis of a quasi-score function incorporating the generalized estimating equations approach. These methods are shown to have desirable asymptotic properties. We also conduct extensive simulation studies to examine the numerical performance of the proposed methods.
The third part aims to apply the focused information criterion to personalized medicine.
Based on the individual level information from clinical observations, demographics,
and genetics, this criterion provides a personalized predictive model to make a prog-
nosis and diagnosis for an individual subject. Consideration of the heterogeneity of
individuals helps to reduce prediction uncertainty and improve prediction accuracy.
Several real case studies from biomedical research are presented as illustrations.
Contributors and Funding Sources
This thesis was supervised by a dissertation committee: Professor Hua Liang (ad-
visor), Professor Hulin Wu, and Professor Tanzy Love from the Department of Bio-
statistics and Computational Biology, and Professor Jean-Philippe Couderc from the
Department of Medicine, Cardiology at the University of Rochester.
The content of this thesis mainly consists of three research projects conducted during the doctoral study at the University of Rochester. Two research papers are in preparation as follows:
Hui, Y., Peng, L., Guohua, Z., and Hua, L. Variable Selection and Model
Averaging for Longitudinal Data Incorporating GEE Approach, Submitted
to Statistica Sinica.
Hui, Y., Hua, L. Focused Information Criterion on Predictive Models in
Personalized Medicine, In preparation.
This thesis was advised by Professor Hua Liang. All work was completed by the student. The graduate study was supported by a fellowship from the University of Rochester Medical Center.
Table of Contents

1 Introduction
  1.1 Background and Motivation
  1.2 Estimation and Inference
  1.3 Model Selection and Averaging Approach
  1.4 Outline of the Thesis

2 AIC-Type Model Selection Criterion Incorporating the GEE Approach
  2.1 Introduction
  2.2 Quasi-likelihood-based ∆AIC
  2.3 Simulation Studies
  2.4 A Numerical Example
  2.5 Conclusion and Remarks

3 Focused Information Criterion and the Frequentist Model Averaging Procedure Incorporating the GEE Approach
  3.1 Introduction
  3.2 Model Selection and Averaging Procedures
  3.3 Simulation Studies
  3.4 A Numerical Example
  3.5 Conclusion and Remarks

4 Predictive Models in Personalized Medicine
  4.1 Introduction
  4.2 Prostate Cancer Case Study
  4.3 Relapsing Remitting Multiple Sclerosis Case Study
  4.4 Veteran's Lung Cancer Case Study
  4.5 Conclusion and Remarks

5 Discussion and Future Work

Bibliography

Appendix
  A.1 Regularity Assumptions
  A.2 Technical Lemmas
  A.3 Proof of Theorem 2.1
  A.4 Proof of Theorem 3.1
  A.5 Proof of Theorem 3.2
List of Tables

1.1 Structure of the Typical Longitudinal Dataset
2.1 ∆AIC - Candidate Models in Simulation Studies
2.2 ∆AIC - Frequencies of Candidate Models Selected by ∆AIC and QIC in Simulation I with True Exchangeable Correlation Structure EX(0.5)
2.3 ∆AIC - Frequencies of Candidate Models Selected by ∆AIC and QIC in Simulation I with True Autoregressive Correlation Structure AR(0.5)
2.4 ∆AIC - Frequencies of Candidate Models Selected by ∆AIC and QIC in Simulation II with True Mixed Correlation Structure MIX
2.5 WESDR - Statistical Inference under Full Model with IN, EX and AR Working Correlation Matrices
2.6 WESDR - ∆AIC Values and Ranks of Candidate Models
2.7 WESDR - QIC and ∆AIC Values of Models Selected by QIC
3.1 QFIC and QFMA - Candidate Models in Simulation I with Continuous Response
3.2 QFIC and QFMA - Candidate Models in Simulation II with Binary Response
3.3 A5055 - Statistical Inference under Full Model with IN, EX and AR Working Correlation Matrices
3.4 A5055 - ∆AIC and QFIC Values on 12 Nested Models Selected by ∆AIC
3.5 A5055 - QIC and QFIC Values on 12 Nested Models Selected by QIC
3.6 A5055 - QFIC Values and Coefficient Estimates on 12 Nested Models Selected by QFIC for CD4
3.7 A5055 - QFIC Values and Coefficient Estimates on 12 Nested Models Selected by QFIC for CD8
3.8 A5055 - QFIC Values and Coefficient Estimates on 12 Nested Models Selected by QFIC for Age
4.1 Prostate Cancer - Statistical Inference under Full Model
4.2 Prostate Cancer - Candidate Models
4.3 Prostate Cancer - Group Partition Criteria
4.4 Prostate Cancer - Group-Specific Percentages and Prediction Error Rates of Targeted Patients with Four Partition Criteria
4.5 RRMS - Statistical Inference under Full Model
4.6 RRMS - Candidate Models
4.7 RRMS - Group-Specific Percentages and Prediction Error Rates for the Targeted Patients at the Targeted Visit Days with Four Partition Criteria
4.8 RRMS - Personalized Predictive Models Concluded by the Personalized QFIC for Targeted Patients under Twelve Scenarios
4.9 Lung Cancer - Statistical Inference under Full Model
4.10 Lung Cancer - Candidate Models
List of Figures

2.1 WESDR - ∆AIC Values of Candidate Models
3.1 QFMA and QFIC - MSE and CP for Focused Parameter ζ in Simulation I on Continuous Responses with True Exchangeable, Autoregressive and Mixed Correlation Matrices EX(0.5), AR(0.5) and MIX
3.2 QFMA and QFIC - MSE and CP for Focused Parameter ζ in Simulation II on Binary Responses with True Exchangeable, Autoregressive and Mixed Correlation Matrices EX(0.5), AR(0.5) and MIX
3.3 A5055 - Prediction Error Rates of Model Selection and Model Averaging Procedures with Different Values of Weight Parameter κ
4.1 Prostate Cancer - Frequency of Candidate Models Selected by the Personalized FIC as the Personalized Predictive Models for 376 Patients
4.2 Prostate Cancer - Histograms of Tumor Volume and Age
4.3 RRMS - Empirical and Estimated Exacerbation Rates on Visit Days and Duration Time
4.4 RRMS - Frequencies of Candidate Models Selected by the Personalized QFIC as the Personalized Predictive Models for 822 Observations
4.5 RRMS - Frequencies of Candidate Models Selected by the Personalized QFIC as the Personalized Predictive Models for 50 Patients at Visit Days of 7, 31, 61 and 104
4.6 RRMS - Exacerbation Rate Predictions for Targeted Patients under the Single Predictive Model and the Twelve Personalized Predictive Models
4.7 Lung Cancer - Frequencies of Candidate Models Selected by the Personalized FIC as the Personalized Predictive Models for 137 Veterans
4.8 Lung Cancer - Kaplan-Meier Estimates and Karnofsky Score Histograms on Veterans in Groups G8, G14 and G16
4.9 Lung Cancer - Frequencies of Candidate Models Selected by the Personalized FIC as the Personalized Predictive Models for Veterans with Different Tumor Cell Types
1 Introduction
1.1 Background and Motivation
Longitudinal data, in the form of repeated measurements on the same individual over time or space, arise in a broad range of fields, including biomedical, pharmaceutical, social, and public health research. Instead of merely comparing the same characteristics across different individuals at a single time point, as in cross-sectional studies, longitudinal studies allow one to analyze changes in the responses, as well as their influencing factors, over a long period of time.
We acknowledge that, for each individual, multiple observations may be correlated, even if the individuals themselves are independent of each other. To obtain more reliable statistical inference, this possible correlation has to be taken into account. During the last three decades, a substantial literature has developed on the statistical analysis of longitudinal data, such as Harville (1977), Laird and Ware (1982), Liang and Zeger (1986), Prentice (1988), Zhao and Prentice (1990), Breslow and Clayton (1993), Qu et al. (2000), Diggle et al. (2002), and Fitzmaurice et al. (2009). Generally speaking, the analyses can be categorized into three model-fitting classes: marginal models, mixed effects models, and transition models. More details about these model-fitting approaches are discussed in Section 1.2.
Sometimes many potential explanatory variables are collected in a study. Including all of them may result in an overfitted model with poor predictive performance. Therefore, statistical analysis generally starts with choosing an appropriate model that includes only the important and necessary variables. There is an extensive literature on model selection, but it mainly focuses on classical linear regression models; see, for example, Buckland et al. (1997), Shao (1997), George (2000), and Miller (2002). More recently, some of the traditional model selection criteria have been extended to longitudinal data, especially for mixed effects models and for marginal models such as generalized estimating equations. These criteria are reviewed in Section 1.3.
Nevertheless, all these traditional model selection criteria are data-oriented and select a single final model with the best overall fit, regardless of which parameters are of interest. Hansen (2005) pointed out that "models should be evaluated based on their purposes." In other words, different models should be chosen for analyzing different individuals or subgroups, or for estimating different focused parameters, as mentioned in Hand and Vinciotti (2003). From this perspective, Claeskens and Hjort (2003) proposed the focused information criterion (hereafter "FIC"), which chooses the model with the smallest estimated mean square error of the focused parameter's estimate. That paper also developed the corresponding large-sample properties.
One concern about model selection procedures relates to overoptimistic confidence intervals. Inference based on a single model ignores the uncertainty introduced during the model selection process and therefore underestimates the variability, which may result in confidence intervals that are too narrow, as shown in Danilov and Magnus (2004) and Shen et al. (2004). As an alternative, the model averaging approach avoids model selection instability by averaging the estimates from different candidate models. It reduces the risk of selecting a poor model and improves the coverage probability of the corresponding confidence intervals. This strategy has been widely studied, including Draper (1995), Buckland et al. (1997), Burnham et al. (2002), Danilov and Magnus (2004), and Leeb and Pötscher (2006). Most of this work, however, takes a Bayesian perspective. Hjort and Claeskens (2003) proposed the Frequentist model averaging procedure (hereafter "FMA"), which uses weights obtained from certain model selection criteria. Section 1.3 also discusses the Frequentist model averaging framework for classical linear regression models.
FIC and the FMA procedure have been well studied in commonly used models, such as generalized linear models in Claeskens et al. (2006), Cox proportional hazards models in Hjort and Claeskens (2006), semiparametric partially linear models in Claeskens and Carroll (2007), and generalized additive partial linear models in Zhang and Liang (2011). Since longitudinal data have become more common, novel analysis approaches are in high demand to attain better statistical inference and make more accurate predictions. This demand motivates us to study model selection and model averaging procedures in the longitudinal data context. In particular, the defining characteristic of FIC, tailoring the final model to the targeted parameter, inspires us to apply FIC to predictive models in personalized medicine. Section 1.4 briefly describes the thesis work. All technical details are provided in the Appendix.
1.2 Estimation and Inference
Consider a longitudinal study with $n$ independent subjects. Subject $i$ has $m_i$ visits, where the $j$th visit collects the response $y_{ij}$ and a set of covariates $x_{ij} = (x_{1ij}, \ldots, x_{kij})$. Let $N = \sum_{i=1}^{n} m_i$ be the total number of observations in this study. Table 1.1 illustrates the structure of a typical longitudinal dataset.
In longitudinal data analysis, marginal models focus mainly on the explanatory variables' effects on the mean responses, regardless of the correlation structure within each subject. To fit marginal models, the generalized estimating equations (hereafter "GEE") approach, proposed by Liang and Zeger (1986), has been widely used. It provides consistent estimates while specifying only the first two marginal moments and a working correlation matrix. The corresponding estimation and inference are provided
Table 1.1: Structure of the Typical Longitudinal Dataset

Subject   Observation   Response     Explanatory Variables
1         1             y_{11}       x_{111}, ..., x_{k11}
1         2             y_{12}       x_{112}, ..., x_{k12}
...       ...           ...          ...
1         m_1           y_{1m_1}     x_{11m_1}, ..., x_{k1m_1}
...       ...           ...          ...
n         1             y_{n1}       x_{1n1}, ..., x_{kn1}
n         2             y_{n2}       x_{1n2}, ..., x_{kn2}
...       ...           ...          ...
n         m_n           y_{nm_n}     x_{1nm_n}, ..., x_{knm_n}
in Subsection 1.2.1. Instead of targeting the population level, mixed effects models allow the regression coefficients to vary randomly for each individual and therefore also provide subject-specific inference. Laird and Ware (1982) introduced linear mixed effects models (hereafter "LMM") for continuous longitudinal data, based on a normality assumption. Later, generalized linear mixed effects models (hereafter "GLMM") were proposed to fit categorical longitudinal data. Both classes of models arrive at estimates by integrating the random effects out of the joint likelihood. More details are presented in Subsection 1.2.2.
1.2.1 Generalized Estimating Equations

In the framework of the GEE approach, the mean of $y_{ij}$ is connected to $x_{ij}$ through a link function $g(\cdot)$ as follows:
$$E(y_{ij}) = \mu_{ij} \quad \text{and} \quad g(\mu_{ij}) = x_{ij}^\top \beta,$$
where $\beta = (\beta_1, \ldots, \beta_k)^\top$ is a vector of unknown parameters. The variance of $y_{ij}$ can be expressed as a known function $\nu(\cdot)$ of $\mu_{ij}$ with a nuisance parameter $\phi$:
$$\mathrm{var}(y_{ij}) = \phi\,\nu(\mu_{ij}).$$
Starting from these two basic assumptions, Wedderburn (1974) defined the log quasi-likelihood function $K(\mu_{ij}, \phi; y_{ij})$ through the relation
$$\frac{\partial K(\mu_{ij}, \phi; y_{ij})}{\partial \mu_{ij}} = \frac{y_{ij} - \mu_{ij}}{\phi\,\nu(\mu_{ij})}.$$
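As a concrete instance (a standard example, not specific to this thesis), taking a Poisson-type variance $\nu(\mu) = \mu$ with $\phi = 1$, the defining relation integrates to $K(\mu; y) = y\log\mu - \mu$ up to a term free of $\mu$. A quick numerical check of the derivative:

```python
import numpy as np

# Poisson-type variance: nu(mu) = mu, phi = 1, so dK/dmu = (y - mu)/mu,
# which integrates to K(mu; y) = y*log(mu) - mu (up to a term free of mu)
def K(mu, y):
    return y * np.log(mu) - mu

y, mu, h = 3.0, 2.0, 1e-6
numeric = (K(mu + h, y) - K(mu - h, y)) / (2 * h)   # central difference
analytic = (y - mu) / mu                            # the defining relation
```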
Let $y_i = (y_{i1}, \ldots, y_{im_i})^\top$ and $x_i = (x_{i1}, \ldots, x_{im_i})^\top$. In the context of longitudinal data, the log quasi-likelihood function can be defined similarly through
$$\frac{\partial Q(\beta, R_i(\alpha), \phi; y_i)}{\partial \beta} = D_i^\top V_i^{-1}(y_i - \mu_i),$$
where $\mu_i = E(y_i)$, $D_i = D_i(\beta) = \partial \mu_i / \partial \beta^\top$, $V_i = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$, $R_i(\alpha)$ is an $m_i \times m_i$ working correlation matrix, and $A_i$ is an $m_i \times m_i$ diagonal matrix with $j$th diagonal element $\nu(\mu_{ij})$. Let $\mathcal{D} = \{(y_1, x_1), \ldots, (y_n, x_n)\}$. The estimates of $\beta$ can be obtained by solving the corresponding quasi-score equations, known as the generalized estimating equations:
$$U(\beta, R(\alpha), \phi; \mathcal{D}) = \frac{\partial Q(\beta, R(\alpha), \phi; \mathcal{D})}{\partial \beta} = \sum_{i=1}^{n} D_i^\top V_i^{-1}(y_i - \mu_i) = 0.$$
The main advantage of the GEE estimate $\hat\beta_{gee}$ is its consistency under mild regularity conditions, even when the working correlation matrix is misspecified. It has also been shown that $\sqrt{n}(\hat\beta_{gee} - \beta)$ follows an asymptotic normal distribution with mean zero and variance-covariance matrix
$$V_{gee} = \lim_{n\to\infty} n \left(\sum_{i=1}^{n} D_i^\top V_i^{-1} D_i\right)^{-1} \left\{\sum_{i=1}^{n} D_i^\top V_i^{-1} \mathrm{cov}(y_i)\, V_i^{-1} D_i\right\} \left(\sum_{i=1}^{n} D_i^\top V_i^{-1} D_i\right)^{-1}.$$
By replacing $\mathrm{cov}(y_i)$ with $\{y_i - \mu_i(\hat\beta)\}\{y_i - \mu_i(\hat\beta)\}^\top$ and substituting $\alpha$, $\beta$, and $\phi$ by their $\sqrt{n}$-consistent estimates, $V_{gee}$ can be estimated consistently; the estimate is known as the sandwich estimate, or the robust variance-covariance estimate of White (1980).
Liang and Zeger (1986) also suggested several commonly used working correlation matrices: the independence working correlation matrix (IN) with $R_i = I_{m_i}$; the exchangeable working correlation matrix (EX) with $[R_i]_{jk} = \alpha$ $(j \neq k)$; the first-order autoregressive working correlation matrix (AR) with $[R_i]_{jk} = \alpha^{|j-k|}$ $(j \neq k)$; and the unstructured working correlation matrix (UN) with $[R_i]_{jk} = \alpha_{jk}$ $(j \neq k)$. Although the GEE approach provides robust estimates regardless of the choice of $R_i$, choosing one close to the true correlation increases efficiency.
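To make the estimating-equation machinery concrete, the following is a minimal sketch, not the thesis's implementation, of a Gaussian identity-link GEE fit with an exchangeable working correlation; the helper names (`exchangeable`, `ar1`, `gee_gaussian`) and the simple moment estimator for $\alpha$ are illustrative choices:

```python
import numpy as np

def exchangeable(m, alpha):
    """EX working correlation: 1 on the diagonal, alpha off the diagonal."""
    return (1 - alpha) * np.eye(m) + alpha * np.ones((m, m))

def ar1(m, alpha):
    """AR(1) working correlation: [R]_{jk} = alpha^|j-k|."""
    idx = np.arange(m)
    return alpha ** np.abs(idx[:, None] - idx[None, :])

def gee_gaussian(ys, xs, corr=exchangeable, iters=20):
    """Toy GEE for the Gaussian/identity case: alternate the beta update
    with a moment estimate of alpha from scaled residual cross-products."""
    p = xs[0].shape[1]
    beta, alpha = np.zeros(p), 0.0
    for _ in range(iters):
        lhs, rhs = np.zeros((p, p)), np.zeros(p)
        for y, x in zip(ys, xs):
            vinv = np.linalg.inv(corr(len(y), alpha))  # phi, A_i = 1 here
            lhs += x.T @ vinv @ x
            rhs += x.T @ vinv @ y
        beta = np.linalg.solve(lhs, rhs)
        num = den = 0.0
        cnt = tot = 0
        for y, x in zip(ys, xs):
            r = y - x @ beta
            den += r @ r
            num += r.sum() ** 2 - r @ r   # sum of off-diagonal products
            cnt += len(r) * (len(r) - 1)
            tot += len(r)
        alpha = (num / cnt) / (den / tot)  # moment estimator of alpha
    return beta, alpha

# simulate n subjects whose errors have exchangeable correlation 0.5
rng = np.random.default_rng(0)
n, m, beta_true = 200, 4, np.array([1.0, -2.0])
L = np.linalg.cholesky(exchangeable(m, 0.5))
xs = [np.column_stack([np.ones(m), rng.normal(size=m)]) for _ in range(n)]
ys = [x @ beta_true + L @ rng.normal(size=m) for x in xs]
beta_hat, alpha_hat = gee_gaussian(ys, xs)
```

Passing `corr=ar1` instead fits the same data under an AR(1) working structure; by the robustness property above, `beta_hat` remains consistent either way.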
1.2.2 Mixed Effects Models

For the longitudinal dataset above, if the response variables are continuous, the linear mixed effects models proposed by Laird and Ware (1982) take the form
$$y_i = x_i\beta + z_i b_i + \varepsilon_i, \quad b_i \sim N(0, \sigma^2 H), \quad \varepsilon_i \sim N(0, \sigma^2 K_i).$$
Here the design matrix $x_i$, composed of the $k$ fixed effects, links the unknown population parameters $\beta$ to the response $y_i$, while the design matrix $z_i$, including all $l$ random effects, links the unknown individual parameters $b_i$ to $y_i$. In particular, $z_i = (z_{i1}, \ldots, z_{im_i})^\top$, where $z_{ij} = (z_{1ij}, \ldots, z_{lij})$. By distinguishing fixed and random effects, LMM allows some parameters to be fixed while others vary randomly across subjects. The covariance of the repeated measurements can therefore be specified as
$$\mathrm{cov}(y_i) = V_i = \sigma^2 z_i H z_i^\top + \sigma^2 K_i.$$
Denote $Y_{N\times 1} = (y_1^\top, \ldots, y_n^\top)^\top$, $X_{N\times k} = (x_1^\top, \ldots, x_n^\top)^\top$, $Z_{N\times nl} = \mathrm{diag}(z_1, \ldots, z_n)$, $b_{nl\times 1} = (b_1^\top, \ldots, b_n^\top)^\top$, $\varepsilon_{N\times 1} = (\varepsilon_1^\top, \ldots, \varepsilon_n^\top)^\top$, $\mathbf{H}_{nl\times nl} = \mathrm{diag}(H, \ldots, H)$, and $\mathbf{K}_{N\times N} = \mathrm{diag}(K_1, \ldots, K_n)$. LMM can also be written in matrix notation as
$$Y = X\beta + Zb + \varepsilon, \quad b \sim N(0, \sigma^2\mathbf{H}).$$
If the variance components $H$ and $K_i$ are given, the estimates of $\beta$ and $b_i$ can be obtained by likelihood or generalized least squares methods as the best linear unbiased estimates, as shown in Robinson (1991):
$$\hat\beta = (X^\top V^{-1} X)^{-1} X^\top V^{-1} Y \quad \text{and} \quad \hat{b}_i = \sigma^2 H z_i^\top V_i^{-1}(y_i - x_i\hat\beta),$$
where $V = \mathrm{diag}(V_1, \ldots, V_n)$. If the variance matrices are unknown, $\beta$ and $b_i$ can be estimated by maximum likelihood or restricted maximum likelihood via the EM algorithm, as proposed by Dempster et al. (1977) and Laird and Ware (1982).
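A minimal numerical sketch of these two formulas for a random-intercept model, assuming the variance components are known (the variable names are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 5
sigma2, H, K = 1.0, 4.0, 1.0            # variance components, assumed known
beta_true = np.array([2.0, -1.0])

xs, zs, ys = [], [], []
for _ in range(n):
    x = np.column_stack([np.ones(m), rng.normal(size=m)])
    z = np.ones((m, 1))                  # random intercept only (l = 1)
    b = rng.normal(scale=np.sqrt(sigma2 * H), size=1)
    y = x @ beta_true + z @ b + rng.normal(scale=np.sqrt(sigma2 * K), size=m)
    xs.append(x); zs.append(z); ys.append(y)

# V_i = sigma2 * (H z_i z_i' + K I); GLS for beta, then BLUPs for b_i
Vis = [sigma2 * (H * (z @ z.T) + K * np.eye(m)) for z in zs]
A = sum(x.T @ np.linalg.solve(V, x) for x, V in zip(xs, Vis))
c = sum(x.T @ np.linalg.solve(V, y) for x, y, V in zip(xs, ys, Vis))
beta_hat = np.linalg.solve(A, c)
b_hat = [sigma2 * H * z.T @ np.linalg.solve(V, y - x @ beta_hat)
         for x, z, y, V in zip(xs, zs, ys, Vis)]
```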
To model categorical longitudinal data, generalized linear mixed effects models have also been studied; see, for example, Stiratelli et al. (1984), Schall (1991), and Breslow and Clayton (1993). Conditional on the individual effect, the mean of $y_{ij}$ is connected to $x_{ij}$ through a link function $g(\cdot)$:
$$E(y_{ij} \mid b_i) = \mu_{ij} \quad \text{and} \quad g(\mu_{ij}) = x_{ij}^\top \beta + z_{ij}^\top b_i.$$
The responses are conditionally independent, with conditional density of the form
$$f(y_{ij} \mid b_i; \beta, \sigma_0^2) = \exp\left[\frac{\omega_{ij}}{\sigma_0^2}\{y_{ij}\theta_{ij} - a(\theta_{ij})\} + c(y_{ij}, \sigma_0^2/\omega_{ij})\right],$$
where the $\omega_{ij}$ are known weights, as shown in McCullagh and Nelder (1989). The mean and canonical parameters are linked through $\mu = a'(\theta)$. When the conditional distribution is normal with the identity link, GLMM reduces to LMM. Though there is no closed form for the estimates in GLMM, the EM algorithm or Newton-Raphson methods can be applied instead. The Gibbs sampling of Zeger and Karim (1991) or the Laplace approximation of Breslow and Clayton (1993) may also be considered when the dimension of the random effects is relatively high.
1.3 Model Selection and Averaging Approach

In regression, given a large set of explanatory variables, we need to choose the appropriate ones to obtain better statistical inference and make more accurate predictions. Model selection therefore becomes a necessary procedure. Fortunately, a number of model selection criteria have been well studied for regression models.
One of the most widely used criteria is Akaike's information criterion (AIC), proposed in Akaike (1973) as an asymptotically unbiased estimate of the Kullback-Leibler information between a candidate model and the true model. Trading off goodness of fit against model complexity, the AIC value of a candidate model is defined as
$$\mathrm{AIC}_k = -2\log(L_k) + 2k,$$
with $L_k$ the likelihood and $k$ the number of explanatory variables in the candidate model.

From the Bayesian perspective, Schwarz (1978) proposed the Schwarz information criterion, also known as the Bayesian information criterion (BIC). The BIC value of a candidate model is defined as
$$\mathrm{BIC}_k = -2\log(L_k) + k\log(n),$$
where $n$ is the number of observations in the study. Depending on the sample size and the number of explanatory variables, BIC usually penalizes the complexity of a candidate model more strongly than AIC. The model with the smallest AIC or BIC value is chosen as the final model.
There are also criteria built on the residual sum of squares (hereafter "RSS"), such as the residual mean square (hereafter "RMS"), the squared multiple correlation coefficient $R^2$, and the adjusted $R^2$. Mallows (1973) proposed Mallows' $C_p$, defined as
$$C_p = \mathrm{RSS}_k/\hat\sigma^2 + 2k - n,$$
where $\hat\sigma^2$ is the RMS after regression on the complete set of explanatory variables. The final model, chosen as the one with the smallest criterion value, is in effect a compromise among the sample size, the effect sizes, and the degree of collinearity within the explanatory variables.
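The criteria above can be computed directly for a sequence of nested Gaussian linear models; the sketch below is illustrative only (here $k$ counts mean parameters, and the Gaussian log-likelihood is profiled over the error variance):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
# columns 0-2 carry signal; column 3 is pure noise
y = X[:, :3] @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

def rss_of(Xs):
    beta = np.linalg.lstsq(Xs, y, rcond=None)[0]
    return np.sum((y - Xs @ beta) ** 2)

sigma2_full = rss_of(X) / (n - X.shape[1])   # RMS from the full model

aic, bic, cp = {}, {}, {}
for k in range(1, X.shape[1] + 1):           # nested models: first k columns
    rss = rss_of(X[:, :k])
    # Gaussian log-likelihood profiled over the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    aic[k] = -2 * loglik + 2 * k
    bic[k] = -2 * loglik + k * np.log(n)
    cp[k] = rss / sigma2_full + 2 * k - n
```

With a strong signal, the underfitting models score much worse on all three criteria, while the noise column pays the penalty term.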
With the development of model fitting for longitudinal data, several traditional model selection criteria from classical linear regression have been extended to longitudinal analysis. This has occurred particularly for the GEE approach, reviewed in Subsection 1.3.1, and for LMM and GLMM, briefly presented in Subsection 1.3.2.
1.3.1 Model Selection for the GEE Approach

For the non-likelihood-based GEE approach, the traditional model selection criteria cannot be applied directly. Pan (2001a) proposed an Akaike-type information criterion for generalized estimating equations, called the quasi-likelihood under the independence model criterion (QIC). Replacing the likelihood component with the quasi-likelihood introduced in McCullagh and Nelder (1989), it is defined as
$$\mathrm{QIC}(R) = -2Q(\hat\beta_{gee}(R), I; \mathcal{D}) + 2\,\mathrm{trace}(\hat\Omega_I \hat{V}_{gee}).$$
Here, $\hat\beta_{gee}(R)$ and the sandwich estimate $\hat{V}_{gee}$ are both obtained with the working correlation matrix $R$. The logarithm of the quasi-likelihood, $Q(\hat\beta_{gee}(R), I; \mathcal{D})$, is evaluated under the working-independence assumption, as is $\hat\Omega_I$, the inverse of the model-based variance estimate of $\hat\beta_{gee}(I)$. With components analogous to those of AIC, a quasi-likelihood plus a penalty term, QIC picks the model with the smallest value.
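A minimal sketch of QIC for the Gaussian identity-link case with $\phi = 1$, fitted under working independence so that the GEE estimate is just least squares; `qic_gaussian` is a hypothetical helper, not Pan's code:

```python
import numpy as np

def qic_gaussian(ys, xs):
    """QIC for the Gaussian/identity case with phi = 1, under working
    independence: -2*Q(beta_hat; I) + 2*trace(Omega_I @ V_sandwich)."""
    X, y = np.vstack(xs), np.concatenate(ys)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    omega = X.T @ X                         # -d2Q/dbeta2 under independence
    bread = np.linalg.inv(omega)
    meat = sum(np.outer(x.T @ (yi - x @ beta), x.T @ (yi - x @ beta))
               for x, yi in zip(xs, ys))    # sum_i X_i' r_i r_i' X_i
    v_sandwich = bread @ meat @ bread
    q = -0.5 * np.sum((y - X @ beta) ** 2)  # Gaussian quasi-likelihood
    return -2 * q + 2 * np.trace(omega @ v_sandwich)

# a covariate with a real effect should be kept by QIC
rng = np.random.default_rng(3)
xs = [np.column_stack([np.ones(4), rng.normal(size=4)]) for _ in range(100)]
ys = [x @ np.array([1.0, 1.5]) + rng.normal(size=4) for x in xs]
qic_full = qic_gaussian(ys, xs)
qic_null = qic_gaussian(ys, [x[:, :1] for x in xs])
```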
For criteria based on the RSS, the GEE version of the residual sum of squares, $\mathrm{RSS}_{gee}$, can be extended simply as
$$\mathrm{RSS}_{gee} = \sum_{i=1}^{n}\sum_{j=1}^{m_i}\{y_{ij} - g^{-1}(x_{ij}^\top \hat\beta_{gee})\}^2.$$
Cantoni et al. (2005) also proposed a weighted residual sum of squares that accounts for heteroscedasticity across observations:
$$\mathrm{RSS}_w = \sum_{i=1}^{n}\sum_{j=1}^{m_i} c_{ij}\,\frac{\{y_{ij} - g^{-1}(x_{ij}^\top \hat\beta_{gee})\}^2}{\phi\,\nu(\hat\mu_{ij})},$$
where the $c_{ij}$ are weights chosen from experience. Cantoni et al. (2005) further extended Mallows' $C_p$ criterion to a generalized $C_p$, denoted $GC_p$, for the GEE approach:
$$GC_p = \sum_{i=1}^{n}\sum_{j=1}^{m_i}\frac{\{y_{ij} - g^{-1}(x_{ij}^\top \hat\beta_{gee})\}^2}{\phi\,\nu(\hat\mu_{ij})} - N + 2\,\mathrm{trace}(M^{-1}N^*),$$
with $M = n^{-1}\sum_{i=1}^{n} D_i^\top V_i^{-1} D_i$ and $N^* = n^{-1}\sum_{i=1}^{n} D_i^\top A_i^{-1} D_i$ (written $N^*$ here to avoid a clash with the total number of observations $N$).
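Under working independence in the Gaussian identity-link case (with $\phi = \nu = 1$), the two matrices in the trace term coincide, so the trace reduces to the number of regression parameters and the criterion behaves like Mallows' $C_p$; a hypothetical sketch:

```python
import numpy as np

def gcp_indep(ys, xs):
    """Generalized C_p under working independence for the Gaussian/identity
    case (phi = nu = 1): the two matrices in the trace term coincide, so
    trace reduces to the number of regression parameters."""
    X, y = np.vstack(xs), np.concatenate(ys)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta) ** 2)
    return rss - len(y) + 2 * X.shape[1]

# the full model should beat an underfit (intercept-only) model
rng = np.random.default_rng(7)
xs = [np.column_stack([np.ones(4), rng.normal(size=4)]) for _ in range(100)]
ys = [x @ np.array([1.0, 1.5]) + rng.normal(size=4) for x in xs]
gcp_full = gcp_indep(ys, xs)
gcp_null = gcp_indep(ys, [x[:, :1] for x in xs])
```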
1.3.2 Model Selection for LMM and GLMM

LMM and GLMM involve two types of model selection issues, targeting inference at the population and individual levels separately: (1) identification of the significant fixed and random effects, and (2) identification of the significant fixed effects only, when the random effects are not subject to selection, as mentioned in Dziak and Li (2007).

Liu et al. (1999) proposed the predicted residual sum of squares (PRESS), based on a leave-one-out cross-validation experiment:
$$\mathrm{PRESS} = \sum_{i=1}^{n}\left\|y_i - x_i\hat\beta_{(-i)}\right\|^2,$$
where $\hat\beta_{(-i)}$ is estimated with the $i$th subject deleted from the analysis. PRESS can be used for selection of the fixed effects only.
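A direct sketch of PRESS with whole subjects (not single observations) deleted, computed here for a toy linear marginal fit; the names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 30, 4
xs = [np.column_stack([np.ones(m), rng.normal(size=m)]) for _ in range(n)]
ys = [x @ np.array([1.0, 2.0]) + rng.normal(size=m) for x in xs]

def beta_without(i):
    """OLS fit with subject i held out (a stand-in for any LMM fit)."""
    X = np.vstack([x for j, x in enumerate(xs) if j != i])
    y = np.concatenate([yj for j, yj in enumerate(ys) if j != i])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# leave one SUBJECT (all of its m observations) out at a time
press = sum(np.sum((ys[i] - xs[i] @ beta_without(i)) ** 2) for i in range(n))

# in-sample RSS for comparison: PRESS is larger, reflecting honest
# prediction error rather than goodness of fit
X_all, y_all = np.vstack(xs), np.concatenate(ys)
beta_all = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
rss_all = np.sum((y_all - X_all @ beta_all) ** 2)
```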
When focusing on individual-level inference, Vaida and Blanchard (2005) proposed the conditional AIC:
$$\mathrm{cAIC} = -2\log(cL) + 2\rho.$$
Here, $cL$ is the model likelihood conditional on $b_i = \hat{b}_i$, and $\rho$ is the effective degrees of freedom, defined as $\rho = \mathrm{trace}(H)$, where $H$ here denotes the hat matrix mapping the observations $y$ to the fitted values $\hat{y}$.
1.3.3 Frequentist Model Averaging

In this subsection, we take the classical linear regression model as an example to illustrate the framework of the FMA procedure proposed in Hjort and Claeskens (2003). The notation introduced here is limited to this subsection.

Suppose we have the linear regression model
$$Y_{n\times 1} = X_{n\times p}\,\beta_{p\times 1} + Z_{n\times q}\,\gamma_{q\times 1} + \varepsilon_{n\times 1}.$$
The design matrix $X$ includes all of the explanatory variables that are certain to be included in the final model, whereas $Z$ is composed of the variables about which we are uncertain. The unknown parameters $\beta$ and $\gamma$ link $X$ and $Z$ to the response $Y$. Here, we assume that the matrix $(X, Z)$ has full column rank $p + q$. This framework allows us to start with a "narrow" model that includes all of the necessary explanatory variables in $X$, and then to add one or more of the additional variables in $Z$. Each subset $S$ of $\{1, \ldots, q\}$ represents one candidate model.
Suppose we are interested in the unknown quantity µ. Denote the estimate obtained
from candidate model S by:
µS = µ(βS, γS).
During the traditional model selection procedure, one final model is chosen from the
corresponding 2q candidate models based on a model selection criterion. We then make
the statistical inference and prediction under this final model. However, Hjort and
Claeskens (2003) have demonstrated the overoptimistic nature of the corresponding
confidence intervals with respect to coverage probability. This excess optimism pro-
vides the motivation to propose the model averaging procedure with the compromise
estimate, taking the form of:
µ =∑
S
ωSµS.
The choice of the weights ω_S distinguishes the Frequentist from the Bayesian perspective. Instead of using weights based on prior information, as in the Bayesian model averaging procedure, the Frequentist model averaging procedure uses weights that are determined entirely by the data. Hjort and Claeskens (2003) also provide a partial list of particularly attractive weights. Specifically, when the weight function is an indicator function of the final model selected by a certain model selection criterion, the model averaging estimate coincides with the model selection estimate associated with that criterion.
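A toy numeric sketch of the compromise estimate (all candidate-model estimates and weights below are hypothetical numbers), showing that an indicator weight function recovers the corresponding model selection estimate:

```python
# Compromise estimate mu_hat = sum_S w_S * mu_hat_S over candidate models.
# Model names, estimates, and weights are hypothetical illustrations only.

mu_hat = {"narrow": 1.10, "add_z1": 1.25, "add_z2": 1.18, "full": 1.31}

# data-driven (frequentist) weights summing to one
w_fma = {"narrow": 0.40, "add_z1": 0.25, "add_z2": 0.20, "full": 0.15}
mu_fma = sum(w_fma[s] * mu_hat[s] for s in mu_hat)

# indicator ("hard core") weights: all mass on the single selected model
w_sel = {"narrow": 0.0, "add_z1": 1.0, "add_z2": 0.0, "full": 0.0}
mu_sel = sum(w_sel[s] * mu_hat[s] for s in mu_hat)  # equals mu_hat["add_z1"]
```

The averaged estimate lies between the candidate estimates, while the indicator weights reduce model averaging to model selection.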
1.4 Outline of the Thesis
In this thesis, model selection and model averaging procedures are further studied in the longitudinal data context. We mainly consider estimates incorporating the GEE approach.
In Chapter 2, we propose and study another quasi-likelihood based AIC-type model
selection criterion incorporating the GEE approach, ∆AIC, by considering the quasi-
likelihood difference between the candidate model and a narrow model plus a penalty
term. Theoretical asymptotic properties are derived and proven. As a byproduct, we
also give a theoretical justification of the equivalence in distribution between the quasi-
likelihood ratio test and the Wald test incorporating the GEE approach. Simulation studies and a real data analysis are then performed to demonstrate the better performance of this approach.
We also extend FIC and the FMA procedure to longitudinal data and propose the
quasi-likelihood-based focused information criterion (QFIC) and Frequentist model av-
eraging (QFMA) procedure incorporating the GEE approach in Chapter 3. The impact
of various weight functions on QFMA estimates is examined and a suggestion for the choice of weights is given from a numerical perspective. Simulation studies are also
performed to provide evidence of the superiority of the proposed procedures. The pro-
cedure is further applied to a real data example.
FIC tries to select the model with the minimum estimated mean square error of a
targeted parameter’s estimation. In Chapter 4, we redefine the personalized FIC and
apply it in predictive models in personalized medicine. Based on individual level infor-
mation from clinical observations, demographics, genetics, etc., this criterion can pro-
vide a personalized predictive model for a targeted patient and make a corresponding
personalized prognosis and diagnosis. Consideration of the population’s heterogene-
ity helps reduce prediction uncertainty and improve prediction accuracy. Several case
studies from biomedical research, involving not just longitudinal but also survival and cross-sectional data, are analyzed as illustrations.
Due to the popularity of LMM and GLMM in longitudinal studies, the extension of
the model averaging procedure to LMM and GLMM with their corresponding optimal
weights’ choice can be a direction for future research and is discussed in Chapter 5.
2 AIC-Type Model Selection
Criterion Incorporating the GEE
Approach
2.1 Introduction
The example motivating our study in this chapter is from the Wisconsin Epidemiolog-
ical Study of Diabetic Retinopathy (WESDR, Klein et al., 1984), where 996 insulin-
taking younger-onset diabetics in southern Wisconsin were examined for the presence
of diabetic retinopathy in both their left and right eyes. The objective of our study is to
determine the main risk factors of diabetic retinopathy from thirteen potential factors,
which were collected at the same time. In this analysis, the strong correlation between
the two eyes of each participant must be considered.
As we mentioned in Chapter 1, LMM has been widely used for analyzing longi-
tudinal data. As a likelihood-based approach, it relies on the assumption that data are
drawn from certain known distributions, which in reality may be unknown. Even if the
distributions are specified, it is still sometimes very challenging to derive the complete
likelihood, especially for non-Gaussian data. Instead of specifying the complicated joint distribution of the responses, Liang and Zeger (1986) developed the GEE approach, which provides consistent estimates by specifying only the first two marginal moments and a working correlation matrix.
Subsection 1.3.1 lists certain model selection criteria that have been extended to
the GEE approach. Pan (2001a)'s QIC can be easily computed using well developed statistical packages in S-plus/R and SAS. It is worth pointing out, however, that the neglect of a significant term in the QIC's derivation and the reliance on working independence leave QIC without theoretical asymptotic properties. Cantoni et al. (2005)'s GCp criterion uses weighted quadratic predictive risk as a measure of a model's adequacy for prediction. It, however, requires bootstrap
sampling or Monte Carlo simulation that can be computationally expensive. Another
extended cross-validation approach based on expected predictive bias was suggested
by Pan (2001b). This approach received little attention due to the computational re-
quirement as well. On the other hand, Fu (2003) proposed the penalized generalized
estimating equations for variable selection. Wang and Qu (2009) proposed a BIC-type
model selection criterion based on quadratic inference function. They both require an
extra searching algorithm for tuning parameters.
This chapter aims to propose another quasi-likelihood-based AIC-type model se-
lection criterion for longitudinal data incorporating the GEE approach. We choose a
narrow model as a benchmark and consider the quasi-likelihood difference between a
candidate model and the narrow model, thereby avoiding the complicated calculation
of the whole quasi-likelihood and making the implementation feasible and easier. The
idea is inspired by the local misspecification framework setting in Hjort and Claeskens
(2003). Under certain regularity conditions, the proposed criterion is shown to have
similar asymptotic properties as AIC.
In this chapter, Section 2.2 proposes the new model selection criterion ∆AIC and
provides corresponding theoretical insights. Simulation studies and the WESDR real
data study are carried out in Sections 2.3 and 2.4. In the final section, we conclude with
some remarks.
2.2 Quasi-likelihood-based ∆AIC
Claeskens and Hjort (2008) pointed out that, among all the candidate models, when the true model lies at a fixed distance from the narrow model and the sample size is large, the dominating bias always favors the full model as the final model. This motivates us to study and propose a model selection criterion for longitudinal data incorporating the GEE approach under a local misspecification framework, similar to that studied in Hjort and Claeskens (2003).
2.2.1 Local Misspecification Framework
Consider the longitudinal data introduced in Chapter 1. We start with the full model, where the covariates fall into two categories: p certain covariates, which are definitely included in the final model, and q uncertain ones. The corresponding unknown coefficients are therefore composed of the certain coefficients θ = (θ_1, · · · , θ_p) and the uncertain coefficients γ = (γ_1, · · · , γ_q), written as:

β = (θ, γ).
Any candidate model S can therefore be written as a special case of the full model:

β_S = (θ, γ_S, 0_{S^c}),

where γ_S is the q_S × 1 subvector of γ indexed by S, 0_{S^c} is the corresponding (q − q_S) × 1 subvector of the q × 1 zero vector, and S ⊂ {1, · · · , q}. When S = N, the narrow model,

β_N = (θ, 0),

includes the certain covariates only. The true model is defined in a framework similar to Hjort and Claeskens (2003):

β_0 = (θ_0, γ_0) = (θ_0, δ/√n).
Here δ = (δ_1, · · · , δ_q) measures how far the true model is from the narrow model in each of the directions 1, · · · , q, at order O(1/√n), and some δ_i's can be 0. Under this scenario, the squared model biases and the model variances are both of size O(1/n), so that bias and variance are balanced in the large sample approximation.
To simplify the discussion, in the context of the GEE approach, we ignore the treatment of the nuisance parameters α and φ and assume the consistency of α̂(β, φ) and φ̂(β) and the boundedness of ∂α̂(β, φ)/∂φ, as presented in Liang and Zeger (1986).
Thus, the quasi-score of the full model, evaluated at (θ_0, 0), can be written as:

U = (U_1, U_2)^T = ( ∂Q(θ, γ; D)/∂θ, ∂Q(θ, γ; D)/∂γ )^T |_{θ=θ_0, γ=0}.
The corresponding (p + q) × (p + q) quasi-likelihood information matrix is denoted by:

Σ = var_N(U) = [ Σ_{00}  Σ_{01}
                 Σ_{10}  Σ_{11} ]   and   Σ^{-1} = [ Σ^{00}  Σ^{01}
                                                     Σ^{10}  Σ^{11} ],

where Σ^{11} = (Σ_{11} − Σ_{10} Σ_{00}^{-1} Σ_{01})^{-1}. Let π_S be the q_S × q projection matrix mapping γ to γ_S, with q_S being the size of S. The quasi-score of the candidate model S, evaluated at (θ_0, 0), can be written as:
U_S = (U_1, U_{2,S})^T = (U_1, π_S U_2)^T.
The corresponding quasi-likelihood information matrix has dimension (p + q_S) × (p + q_S):

Σ_S = [ Σ_{00}      Σ_{01} π_S^T
        π_S Σ_{10}  π_S Σ_{11} π_S^T ]   and   (Σ_S^{11})^{-1} = π_S (Σ^{11})^{-1} π_S^T.
2.2.2 Quasi-likelihood-based ∆AIC
Let (θ̂_S, γ̂_S) be the GEE estimates under candidate model S. Recall that the AIC value of model S can be obtained as:

−2 ∑_{i=1}^{n} log f(y_i; θ̂_S, γ̂_S) + 2|S|,

where |S| is the number of parameters in model S. Similarly, the quasi-likelihood-based AIC value of model S can be calculated through:

QAIC_{n,S} = −2 ∑_{i=1}^{n} Q(θ̂_S, γ̂_S; y_i) + 2|S|.
As we mentioned earlier, due to the complicated correlation structure of longitudinal
data, QAIC is generally very difficult to implement, especially the part with the inte-
gration involving the inverse of the working covariance matrix in the quasi-likelihood
component. Nevertheless in the previous framework, every candidate model includes
all the certain parameters θ, of which the narrow model is composed. By subtracting
the QAIC value of the narrow model from that of every candidate model, we can avoid calculating the log quasi-likelihood directly. Thus, we propose the AIC-type quasi-likelihood-based model selection criterion for longitudinal data incorporating the GEE approach as:

∆AIC_{n,S} = QAIC_{n,S} − QAIC_{n,N}.
The following theorem gives the specific form and the large sample behavior of ∆AICn,S.
Theorem 2.1 Under the Regularity Assumptions given in the Appendix, as n goes to infinity,

∆AIC_{n,S} =_d −n γ̂^T (Σ^{11})^{-1} π_S^T Σ_S^{11} π_S (Σ^{11})^{-1} γ̂ + 2|S/N| →_d −χ²_{|S/N|}(λ_S) + 2|S/N|,

with non-centrality parameter λ_S = n γ_0^T (Σ^{11})^{-1} π_S^T Σ_S^{11} π_S (Σ^{11})^{-1} γ_0. The degrees of freedom, |S/N|, is the number of covariates in the candidate model S but not in the narrow model. Here and below, “=_d” denotes equality in distribution and “→_d” denotes convergence in distribution.
Theorem 2.1 indicates that in the large sample context, the behavior of ∆AIC_{n,S} is fully dictated by the full model's GEE estimates γ̂. Also, the limiting behaviors of all ∆AIC_{n,S} in principle determine the limits of all the candidate models' selection probabilities through:

P(∆AIC_n selects model S | γ̂) → P(∆AIC selects model S | γ_0).
As shown in the proof of Theorem 2.1 in the Appendix, the subtraction cancels the complicated component in QAIC, and the remaining terms involve only the uncertain parameters and the quasi-likelihood information matrix, which can be consistently estimated incorporating the GEE approach. In particular, the estimates of Σ^{11} and Σ_S^{11} = {π_S (Σ^{11})^{-1} π_S^T}^{-1} can be obtained from the sandwich estimate Σ̂_gee. Consistent with AIC, the model with the smallest ∆AIC value is selected as the final model.
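The computable quadratic form above can be sketched as follows for q = 2 uncertain covariates; the values of n, γ̂, and the estimated block Σ̂^{11} are hypothetical stand-ins for full-model GEE output:

```python
# Delta-AIC via the quadratic form of Theorem 2.1 for all subsets of q = 2
# uncertain covariates. All numeric inputs are hypothetical.

def inv2(M):
    # inverse of a 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

n = 100
gamma_hat = [0.30, 0.01]                 # full-model estimates of uncertain coefficients
Sigma11 = [[0.04, 0.01], [0.01, 0.05]]   # estimated block Sigma^{11}

K = inv2(Sigma11)                        # (Sigma^{11})^{-1}
u = [sum(K[i][j] * gamma_hat[j] for j in range(2)) for i in range(2)]

delta_aic = {}
for S in [(), (0,), (1,), (0, 1)]:
    if len(S) == 0:                      # narrow model: quadratic form is zero
        a = 0.0
    elif len(S) == 1:
        j = S[0]                         # Sigma_S^{11} reduces to 1 / K[j][j]
        a = u[j] ** 2 / K[j][j]
    else:                                # full model: form reduces to gamma' K gamma
        a = sum(gamma_hat[i] * K[i][j] * gamma_hat[j]
                for i in range(2) for j in range(2))
    delta_aic[S] = -n * a + 2 * len(S)   # penalty 2|S/N|

best = min(delta_aic, key=delta_aic.get)  # model with the smallest Delta-AIC
```

With these hypothetical numbers the criterion keeps the first uncertain covariate and drops the weak second one, as the negligible gain in fit does not offset the penalty.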
Remark 2.1 Due to the lack of a likelihood, there are no likelihood ratio tests available incorporating the GEE approach for hypothesis testing, as mentioned in Lipsitz and Fitzmaurice (2009). Nevertheless, the availability of the quasi-likelihood and the previous theorem motivate us to consider quasi-likelihood ratio tests. Consider the following hypotheses:

H_0 : γ = 0   vs   H_a : γ ≠ 0.
The null model can be viewed as the narrow model with only the certain parameter vector θ. The alternative model can be viewed as the full model, which includes θ and also the uncertain parameter vector γ. The quasi-likelihood ratio test statistic between the alternative and null models, and therefore between the full and narrow models, can be written as:

QLR_n = 2[Q(θ̂, γ̂; D) − Q(θ̂_N, 0; D)]
      = −QAIC_{n,F} + 2|F| + QAIC_{n,N} − 2|N|
      = −∆AIC_{n,F} + 2|F/N|
      =_d n γ̂^T (Σ^{11})^{-1} γ̂.
This shares the same form as the quadratic-style Wald test statistic. Thus, Theorem 2.1 simultaneously gives a theoretical justification of the equivalence in distribution between the quasi-likelihood ratio test and the Wald test incorporating the GEE approach.
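A numeric sketch of the resulting test with a single uncertain coefficient (q = 1); the estimate and variance term below are hypothetical:

```python
import math

# QLR_n = n * gamma_hat^2 / sigma11 for q = 1, referred under H0 to a
# chi-square distribution with one degree of freedom. Inputs are hypothetical.

n = 200
gamma_hat = 0.12      # full-model GEE estimate of the uncertain coefficient
sigma11 = 0.9         # estimated variance term: var of sqrt(n) * gamma_hat

qlr = n * gamma_hat ** 2 / sigma11
# chi-square(1) upper tail via the error function: P(X > x) = 1 - erf(sqrt(x/2))
p_value = 1.0 - math.erf(math.sqrt(qlr / 2.0))
```

Here the statistic is not significant at the 5% level, so the data do not reject the narrow model.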
2.3 Simulation Studies
In this section, we investigate the performance of our proposed model selection crite-
rion ∆AIC. To compare with QIC, we use the same model setting as in Pan (2001a),
where the longitudinal simulation studies have the moderate sample size of n = 50 or
100 subjects and m = 3 visit times for each subject.
Four potential explanatory covariates x_1, x_2, x_3 and x_4 are considered in the study. They are generated from:

x_{1ij} ~ i.i.d. Bernoulli(1/2),   x_{2ij} = j − 1,   and   x_{3ij}, x_{4ij} ~ i.i.d. Uniform(−1, 1),

where x_{3ij} and x_{4ij} are also independent of x_{1ij}. The binary response y_{ij} has the
conditional expectation µ_{ij} = E(y_{ij} | x_{1ij}, x_{2ij}, x_{3ij}, x_{4ij}), which is connected to the covariates through:

logit(µ_{ij}) = β_0 + β_1 x_{1ij} + β_2 x_{2ij} + β_3 x_{3ij} + β_4 x_{4ij},

where i ∈ {1, · · · , n} and j ∈ {1, · · · , m}. The coefficients are set to be:

β_0 = −β_1 = −β_2 = 0.25   and   β_3 = β_4 = 0.
Therefore, the model with an intercept term (int.), x_1, and x_2 is the true model. The narrow model includes only int. and x_1, to be consistent with Pan (2001a). The final model is then selected from the remaining 2³ = 8 candidate models listed in Table 2.1.
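The marginal data-generating model above can be sketched as follows; for brevity the sketch draws the three visit responses independently, whereas the study itself induces EX(0.5) or AR(0.5) within-subject dependence through a copula:

```python
import math
import random

# Marginal simulation design: n subjects, m = 3 visits, logit link.
# NOTE: responses here are independent across visits; the actual study adds
# within-subject correlation via a copula, which this sketch omits.

random.seed(1)
n, m = 50, 3
beta = (0.25, -0.25, -0.25, 0.0, 0.0)    # (intercept, x1, x2, x3, x4)

data = []
for i in range(n):
    for j in range(1, m + 1):
        x1 = float(random.random() < 0.5)          # Bernoulli(1/2)
        x2 = float(j - 1)                          # x2ij = j - 1
        x3 = random.uniform(-1.0, 1.0)
        x4 = random.uniform(-1.0, 1.0)
        eta = beta[0] + beta[1] * x1 + beta[2] * x2 + beta[3] * x3 + beta[4] * x4
        mu = 1.0 / (1.0 + math.exp(-eta))          # inverse logit
        y = float(random.random() < mu)            # binary response
        data.append((i, j, x1, x2, x3, x4, y))

print(len(data))  # n * m = 150 rows
```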
We first use the copula package of Yan (2007) to generate two types of correlation structures among the three response observations of each subject: exchangeable and autoregressive with correlation coefficient ρ = 0.5, denoted by EX(0.5) and AR(0.5). Based on one thousand simulation replications, the frequencies of the candidate models selected by ∆AIC and QIC as the final model under these two scenarios
Table 2.1: ∆AIC - Candidate Models in Simulation Studies
Model Covariates Model Covariates
m1 - Full int. x1 x2 x3 x4 m5 int. x1 x3 x4
m2 int. x1 x2 x3 m6 int. x1 x3
m3 int. x1 x2 x4 m7 int. x1 x4
m4 - True int. x1 x2 m8 - Narrow int. x1
incorporating the GEE approach with three different working correlation matrices, IN,
EX, and AR, are listed in Tables 2.2 and 2.3.
Generally speaking, Tables 2.2 and 2.3 both show the better performance of ∆AIC compared to QIC, in terms of the relatively higher frequencies of selecting the true model as the final model among all eight candidates. In particular, with the correctly specified working correlation matrices, i.e., EX for the EX(0.5) scenario and AR for the AR(0.5) scenario, ∆AIC works observably better than QIC. With the IN working correlation matrix, QIC turns out to be comparable with ∆AIC. These patterns also reveal the bias that QIC incurs by simplifying under the working independence model and ignoring the complicated part in the derivation.
One more point is worth mentioning in Table 2.3. Under the true autoregressive correlation structure AR(0.5), when the sample size is small, n = 50, both ∆AIC and QIC have slightly higher frequencies of choosing the narrow model as the final model. As the sample size becomes large, n = 100, ∆AIC and QIC both work better, with much higher frequencies of selecting the true model. This may be because the true autoregressive correlation structure is more complicated than the exchangeable one; the added complexity may require a relatively larger sample size to arrive at a better estimation.
As we showed above, the first simulation study assumes the simple predictable cor-
Table 2.2: ∆AIC - Frequencies of Candidate Models Selected by ∆AIC and QIC in
Simulation I with True Exchangeable Correlation Structure EX(0.5)
n Criterion R m1 m2 m3 m4 m5 m6 m7 m8
50 ∆AIC IN 19 80 81 375 10 72 75 288
EX 17 86 80 371 11 85 77 273
AR 23 88 78 367 11 75 81 277
QIC IN 20 77 80 364 10 73 74 302
EX 28 83 91 343 13 80 80 282
AR 20 81 88 354 14 75 78 290
100 ∆AIC IN 21 101 108 542 7 29 31 161
EX 17 107 105 544 8 27 25 167
AR 15 105 110 540 8 34 22 166
QIC IN 20 102 111 541 6 31 31 158
EX 24 117 119 515 9 32 28 156
AR 19 107 113 535 9 33 27 157
Table 2.3: ∆AIC - Frequencies of Candidate Models Selected by ∆AIC and QIC in
Simulation I with True Autoregressive Correlation Structure AR(0.5)
n Criterion R m1 m2 m3 m4 m5 m6 m7 m8
50 ∆AIC IN 17 68 66 335 14 69 67 364
EX 17 80 70 322 17 77 78 339
AR 15 83 64 323 18 80 84 333
QIC IN 19 66 69 333 14 75 68 356
EX 23 76 70 322 16 74 76 343
AR 20 76 73 315 20 80 78 338
100 ∆AIC IN 17 100 113 473 12 38 35 212
EX 14 101 107 480 7 48 37 206
AR 20 87 113 486 12 50 35 197
QIC IN 16 98 115 475 11 41 35 209
EX 20 109 123 452 13 44 35 204
AR 20 101 121 462 14 43 33 206
relation structures among each subject’s repeated response measurements. In many real
longitudinal studies, however, it is impossible to know the true underlying correlation
structure pattern. Thus, the scenario with more complicated correlation structures be-
comes more interesting. Here we generate the longitudinal data, MIX, 30% of which
come from EX(0.5), 30% from AR(0.5), and the rest have the following specified corre-
lation structure:
R(α) = [ 1.0  0.4  0.1
         0.4  1.0  0.7
         0.1  0.7  1.0 ].

Again, ∆AIC and QIC are applied to this scenario for model selection incorporating
the GEE approach. The results of one thousand simulation replications are shown in
Table 2.4.
Table 2.4 shows a similar pattern to the previous tables. With the small sample
size, n = 50, QIC works better with IN, while ∆AIC works better with EX and AR.
When the sample size is large, n = 100, ∆AIC works better with all three working
correlation matrices, though it is close to QIC under IN. This also shows the better
large sample properties of ∆AIC compared to QIC.
2.4 A Numerical Example
We now apply our proposed model selection criterion ∆AIC to the WESDR dataset
mentioned in the beginning of this chapter. This dataset was also examined by Barnhart
and Williamson (1998) and Pan (2001a). Here, we consider only 720 individuals who
have the complete information from examinations in both eyes. Therefore, there are
1440 total observations with a possibly natural correlation between the two eyes of
each individual. The binary response, retinpy, indicates diabetic retinopathy (1 -
presence and 0 - absence). The study aims to determine the main risk factors for diabetic
retinopathy from thirteen potential ones.
Table 2.4: ∆AIC - Frequencies of Candidate Models Selected by ∆AIC and QIC in
Simulation II with True Mixed Correlation Structure MIX
n Criterion R m1 m2 m3 m4 m5 m6 m7 m8
50 ∆AIC IN 20 73 67 308 14 81 75 362
EX 13 71 70 316 16 87 61 366
AR 19 70 69 313 11 89 70 359
QIC IN 21 68 68 317 19 87 68 352
EX 26 76 76 303 24 92 68 335
AR 23 76 72 305 24 89 72 339
100 ∆AIC IN 12 94 92 497 9 56 36 204
EX 15 90 98 496 8 58 35 200
AR 15 83 95 506 8 54 32 207
QIC IN 14 91 93 496 10 54 36 206
EX 27 92 94 478 14 58 37 200
AR 24 100 94 476 15 57 33 201
Based on the univariate analysis and the goodness-of-fit tests conducted in Barnhart
and Williamson (1998), we consider only eight risk factors that were found marginally
significant for the response: iop, intraocular pressure; diab, duration of diabetes (in
years); gh, glycosylated hemoglobin level; sbp, systolic blood pressure; dbp, diastolic
blood pressure; bmi, body mass index; pr, pulse rate (beats/30 seconds); and prot,
proteinuria (0 - absence and 1 - presence). The model concluded by Barnhart and
Williamson (1998) includes diab, gh, dbp, bmi, diab2, and bmi2, and is used as the
narrow model in our setting, as in Pan (2001a). The full model, therefore, includes the eight risk factors and the two quadratic terms of diab and bmi. It can be written as:

logit(µ_{ij}) = β_0 + β_1 diab_{ij} + β_2 gh_{ij} + β_3 dbp_{ij} + β_4 bmi_{ij} + β_5 (diab_{ij})² + β_6 (bmi_{ij})² + β_7 iop_{ij} + β_8 sbp_{ij} + β_9 pr_{ij} + β_{10} prot_{ij},

with i = 1, · · · , 720 and j = 1, 2, where µ_{ij} is the conditional expectation of retinpy_{ij}. We thus consider four uncertain risk factors: iop, sbp, pr and prot, resulting in 2⁴ = 16
candidate models. From these candidates, the final model is selected by ∆AIC and
QIC.
Due to the natural correlation in this dataset, the full marginal logistic regression model is fitted incorporating the GEE approach with three different working correlation matrices: IN, EX and AR. The corresponding coefficient estimates and p-values are listed in Table 2.5 in the order of their significance. Table 2.5 shows that, other than the risk factors in the narrow model, the uncertain covariate proteinuria (prot) also has a relatively small p-value (0.04). Moreover, the three working correlation matrices provide very similar estimates. For the sake of simplicity, we only incorporate the GEE approach with the exchangeable working correlation matrix.
The values of ∆AIC and the corresponding ranks for all 16 candidate models are
listed in Table 2.6 and plotted in Figure 2.1. As shown in Table 2.6, the top model
concluded by ∆AIC includes one uncertain risk factor, prot (statistically significant as
shown in Table 2.5). The next four models are either the narrow model or models adding one more risk factor besides prot. These patterns further show the importance of prot.
Table 2.5: WESDR - Statistical Inference under Full Model with IN, EX and AR
Working Correlation Matrices
IN EX AR
Covariate Estimate P-value Estimate P-value Estimate P-value
int. -1.7e-00 0.0e-00 -1.7e-00 0.0e-00 -1.7e-00 0.0e-00
diab -2.6e-01 0.0e-00 -2.6e-01 0.0e-00 -2.6e-01 0.0e-00
diab2 7.0e-03 0.0e-00 7.0e-03 0.0e-00 7.0e-03 0.0e-00
bmi2 6.1e-01 4.4e-05 6.1e-01 4.2e-05 6.1e-01 4.2e-05
gh -1.5e-01 2.5e-05 -1.5e-01 2.6e-05 -1.5e-01 2.6e-05
bmi -6.0e-01 3.3e-03 -6.0e-01 3.4e-03 -6.0e-01 3.4e-03
dbp -2.3e-02 2.8e-02 -2.3e-02 2.5e-02 -2.3e-02 2.5e-02
prot -7.1e-01 4.0e-02 -7.1e-01 4.0e-02 -7.1e-01 4.0e-02
iop -4.3e-02 1.5e-01 -4.0e-02 1.5e-01 -4.0e-02 1.5e-01
sbp 1.1e-02 1.8e-01 1.1e-02 1.8e-01 1.1e-02 1.8e-01
pr -1.7e-02 2.0e-01 -1.8e-02 2.0e-01 -1.8e-02 2.0e-01
Table 2.6: WESDR - ∆AIC Values and Ranks of Candidate Models
Rank ∆AIC Uncertain Covariates Rank ∆AIC Uncertain Covariates
1 -0.733 prot 9 0.555 iop
2 -0.274 prot, iop 10 0.615 prot, sbp, pr
3 0.000 N 11 0.765 pr
4 0.009 prot, sbp 12 1.223 iop, pr
5 0.019 prot, pr 13 1.731 sbp
6 0.171 prot, iop, sbp 14 2.239 iop, sbp
7 0.311 prot, iop, pr 15 2.407 sbp, pr
8 0.491 prot, iop, sbp, pr 16 2.727 iop, sbp, pr
Figure 2.1: WESDR - ∆AIC Values of Candidate Models

[Figure: ∆AIC values (ranging from about −0.5 to 2.5) plotted against the 16 candidate models, with the narrow and full models marked on the horizontal axis.]
For comparison with Pan (2001a), we also list the ∆AIC values for the top four candidate models selected by QIC, along with the full and narrow models, in Table 2.7. Pan concluded that the top four candidate models are very close in terms of their similar QIC values: 1185.5, 1185.7, 1185.8 and 1186.0. By implementing ∆AIC, however, we do see relatively large differences: −0.274, −0.733, 0.171 and 0.009. Thus, ∆AIC suggests one single model as the final model, which includes: diab, gh, dbp, bmi, diab², bmi² and prot.
We also use the quasi-likelihood ratio test described in Remark 2.1 and the ANOVA method referred to in Højsgaard et al. (2006) to compare the relatively better model concluded by QIC, narrow + iop + prot, with the final model selected by ∆AIC, narrow + prot, where narrow = diab + gh + dbp + bmi + diab² + bmi². Both approaches give a test statistic of 2.15 with a p-value of 0.1429, indicating an insignificant difference between these two models and a preference for the simpler final model chosen by our ∆AIC.
Table 2.7: WESDR - QIC and ∆AIC Values of Models Selected by QIC
QIC ∆AIC
Uncertain Covariates Rank IN EX AR IN EX AR
prot, iop 1 1185.5 1185.1 1185.1 -0.291 -0.274 -0.274
prot 2 1185.7 1185.7 1185.7 -0.717 -0.733 -0.733
prot, iop, sbp 3 1185.8 1185.4 1185.4 0.190 0.171 0.171
prot, sbp 4 1186.0 1186.0 1186.0 0.045 0.009 0.009
prot, iop, sbp, pr 8 1186.5 1186.0 1186.0 0.541 0.491 0.491
N 10 1189.8 1189.8 1189.8 0.000 0.000 0.000
2.5 Conclusion and Remarks
The key point of our proposed approach is to consider the difference between a candidate model and a narrow model, obtained via a Taylor expansion, in order to avoid calculating the integration involved in the quasi-likelihood. The resulting criterion, ∆AIC, can be easily implemented by fitting the full model with a penalty term. This advantage becomes more critical for discrete response variables. Although our criterion is built
under the AIC framework, we can analogously define a BIC-type quasi-likelihood-based model selection criterion, ∆BIC, for longitudinal data incorporating the GEE approach by simply changing the penalty term. As Yang (2005) mentioned, BIC aims to consistently select the true model, which is required to be in the set of all the candidate models, while AIC aims to minimize the distance, in terms of likelihood, between the selected model and the data. ∆AIC and ∆BIC therefore have characteristics similar to AIC and BIC. Other criteria can also be extended to longitudinal data in a similar way.
It is worth mentioning that the choice of a narrow model is necessary for implementing ∆AIC. We suggest prefitting the full model first and picking the covariates with “small” p-values as the certain covariates, thereby composing a narrow model. Other covariates can also be included based on interest and experience. Both theoretical and numerical evidence suggests that the choice of a narrow model only lightly influences the results. When the signals of some covariates become weaker, smaller models are favored by both ∆AIC and QIC.
Two issues arise concerning model selection for longitudinal data incorporating the
GEE approach: variable selection and working correlation matrix selection. Currently
∆AIC is limited to variable selection only. More work needs to be done for the selection
of a working correlation matrix.
3 Focused Information Criterion and
the Frequentist Model Averaging
Procedure Incorporating the GEE
Approach
3.1 Introduction
In clinical studies, longitudinal data are commonly used to analyze the long-term effects of explanatory variables on response variables. One example is the AIDS clinical study A5055, which aimed to predict the long-term antiviral treatment responses of HIV-1 infected patients by considering pharmacokinetics, drug adherence and susceptibility. In this study, each patient was visited multiple times over 24 weeks after entry. Therefore, correlations among the repeated measurements of each patient are expected and have to be accounted for in the analysis.
As we mentioned in Chapter 1, all existing model selection criteria incorporating the GEE approach are data-oriented and result in a single model with good overall properties. Claeskens and Hjort (2003) therefore proposed the model selection criterion FIC to select different models for different targeted parameters. At the same time, Hjort and Claeskens (2003) also proposed the FMA procedure to reduce the risk of choosing a poor model and thereby improve the coverage probability of confidence intervals. This chapter aims to propose the quasi-likelihood-based focused information criterion (QFIC) and the Frequentist model averaging (QFMA) procedure for longitudinal data incorporating the GEE approach. QFIC and the QFMA procedure inherit certain asymptotic properties from FIC and the FMA procedure due to the similarities between the quasi-likelihood and the likelihood.
Section 3.2 introduces QFIC and the QFMA procedures and constructs the modified
confidence intervals based on QFMA estimation. Simulation studies and the A5055
real data study are performed in Section 3.3 and Section 3.4 respectively. In the final
section, we conclude with additional remarks.
3.2 Model Selection and Averaging Procedures
3.2.1 Focused Information Criterion
As mentioned in Subsection 2.2.2, denote the GEE estimates under candidate model S by (θ̂_S, γ̂_S). The corresponding asymptotic distribution of the GEE estimates is given in the following proposition.
Proposition 3.1 Under the local misspecification framework and the Regularity Assumptions given in the Appendix, as n goes to infinity, we have:

√n (θ̂ − θ_0, γ̂)^T →_d Σ^{-1} (Σ_{01}δ + M_1, Σ_{11}δ + M_2)^T ~ N_{p+q}( Σ^{-1} (Σ_{01}, Σ_{11})^T δ, Σ^{-1} ),

where (M_1, M_2)^T ~ N_{p+q}(0, Σ). In particular, under candidate model S:

√n (θ̂_S − θ_0, γ̂_S)^T →_d Σ_S^{-1} (Σ_{01}δ + M_1, π_S Σ_{11}δ + π_S M_2)^T ~ N_{p+q_S}( Σ_S^{-1} (Σ_{01}, π_S Σ_{11})^T δ, Σ_S^{-1} ).
To simplify the notation, let W = Σ^{11}(M_2 − Σ_{10} Σ_{00}^{-1} M_1). By Proposition 3.1, the estimates of the uncertain parameters under the full model satisfy:

√n γ̂ →_d δ + W = ∆ ~ N_q(δ, Σ^{11}).

In particular, under candidate model S:

√n γ̂_S →_d Σ_S^{11} π_S (Σ^{11})^{-1} (δ + W) = Σ_S^{11} π_S (Σ^{11})^{-1} ∆.
Assume that the focused parameter can be written as a function of the model parameters, denoted by ζ = ζ(θ, γ), with continuous partial derivatives in a neighborhood of ζ_0 = ζ(θ_0, γ_0). Denote:

ω = Σ_{10} Σ_{00}^{-1} ∂ζ/∂θ − ∂ζ/∂γ,   τ_0² = (∂ζ/∂θ)^T Σ_{00}^{-1} (∂ζ/∂θ),   and   D_S = π_S^T Σ_S^{11} π_S (Σ^{11})^{-1}.
The following theorem provides the limiting distribution of the focused parameter’s
estimate incorporating the GEE approach under candidate model S.
Theorem 3.1 Under the Regularity Assumptions given in the Appendix, as n goes to infinity,

√n (ζ̂_S − ζ_0) →_d Ω_S = Ω_0 + ω^T δ − ω^T D_S ∆,

where

Ω_0 = (∂ζ/∂θ)^T Σ_{00}^{-1} M_1 ~ N(0, τ_0²).

The limiting variable Ω_S follows a normal distribution with mean ω^T (I_q − D_S) δ and variance τ_0² + ω^T π_S^T Σ_S^{11} π_S ω.
The limiting mean square error follows from Theorem 3.1 as:

mse(Ω_S) = τ_0² + ω^T π_S^T Σ_S^{11} π_S ω + [ω^T (I_q − D_S) δ]²,

where the parameters τ_0, ω, Σ_S^{11}, D_S and δ can all be estimated incorporating the GEE approach under the full model. Therefore, we propose the quasi-likelihood-based focused information criterion (QFIC) for longitudinal data incorporating the GEE approach as:

QFIC_{n,S} = 2 ω̂^T π_S^T Σ̂_S^{11} π_S ω̂ + n [ω̂^T (I_q − D̂_S) γ̂]².
In the large sample context, the behavior of QFIC is related not only to the uncertain parameters γ, but is also influenced by ω, which is determined by the focused parameter ζ. Therefore, QFIC chooses different models for estimating different focused parameters. The model with the smallest QFIC value, and therefore the smallest estimated mean square error of the focused parameter's estimate, is selected as the final model.
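A minimal sketch of the QFIC computation for q = 2 uncertain covariates, with hypothetical full-model quantities (γ̂ and Σ̂^{11}) and a hypothetical focus gradient ω:

```python
# QFIC_{n,S} = 2 w' pi_S' Sigma_S^{11} pi_S w + n [w'(I - D_S) gamma]^2
# evaluated over all subsets of q = 2 uncertain covariates.
# All numeric inputs below are hypothetical.

def inv2(M):
    # inverse of a 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

n = 100
gamma_hat = [0.30, 0.01]                 # full-model estimates of uncertain coefficients
Sigma11 = [[0.04, 0.01], [0.01, 0.05]]   # estimated block Sigma^{11}
omega = [1.0, 0.5]                       # gradient vector for the chosen focus

K = inv2(Sigma11)                        # (Sigma^{11})^{-1}
u = [sum(K[i][j] * gamma_hat[j] for j in range(2)) for i in range(2)]

def qfic(S):
    if len(S) == 0:                      # narrow model: D_S = 0, no variance term
        bias = sum(o * g for o, g in zip(omega, gamma_hat))
        return n * bias ** 2
    if len(S) == 2:                      # full model: D_S = I, no bias term
        var = sum(omega[i] * Sigma11[i][j] * omega[j]
                  for i in range(2) for j in range(2))
        return 2.0 * var
    j = S[0]                             # single covariate: Sigma_S^{11} = 1 / K[j][j]
    var = omega[j] ** 2 / K[j][j]
    resid = list(gamma_hat)
    resid[j] -= u[j] / K[j][j]           # j-th component of (I - D_S) gamma
    bias = sum(o * r for o, r in zip(omega, resid))
    return 2.0 * var + n * bias ** 2

scores = {S: qfic(S) for S in [(), (0,), (1,), (0, 1)]}
best = min(scores, key=scores.get)       # model minimizing estimated MSE
```

Changing ω (i.e., changing the focus) changes the bias/variance trade-off and hence, in general, the selected model.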
3.2.2 The Frequentist Model Averaging Procedure
A model selection procedure aims to select a single final model, either capturing the overall information from the data, such as ∆AIC, or minimizing the mean square error of a focused parameter's estimate, such as QFIC. The inference based on this final model, however, ignores the uncertainty introduced by the selection process and results in overly optimistic confidence intervals. The FMA procedure, as an alternative to model selection, addresses this problem and provides relatively robust statistical inference.
Similarly, the quasi-likelihood-based Frequentist model averaging (QFMA) estimate of the focused parameter ζ can be defined as the weighted average of the estimates obtained from all the candidate models incorporating the GEE approach:

ζ̂(γ̂) = ∑_S p(S | γ̂) ζ̂_S,

where p(· | ·) is a weight function satisfying ∑_S p(S | γ̂) = 1, with each individual weight taking values in [0, 1]. The following theorem shows the asymptotic properties of the model averaging estimate ζ̂.
Theorem 3.2 Under the Regularity Assumptions given in the Appendix, as n goes to infinity,

√n (ζ̂ − ζ_0) →_d Ω = Ω_0 + ω^T δ − ω^T δ̂(∆),

where

δ̂(∆) = ∑_S p(S | ∆) D_S ∆.

The mean and variance of the limiting variable Ω are given by:

E(Ω) = ω^T δ − ω^T E[δ̂(∆)]   and   var(Ω) = τ_0² + ω^T var[δ̂(∆)] ω.
Motivated by Theorem 3.2, we modify the traditional confidence intervals of the focused parameter ζ, based on the model averaging estimate ζ̂, as:

low_n = ζ̂ − ω̂^T [γ̂_n − δ̂(γ̂_n)/√n] − z_k τ̂/√n,
up_n  = ζ̂ − ω̂^T [γ̂_n − δ̂(γ̂_n)/√n] + z_k τ̂/√n,

where z_k is the kth standard normal quantile and τ̂/√n is the consistent estimate of the standard deviation of ζ̂ under the full model, which can be written as:

τ̂/√n = n^{-1/2} (τ̂_0² + ω̂^T Σ̂^{11} ω̂)^{1/2}.

By shifting the center of the confidence intervals away from ζ̂ by the amount ω̂^T[γ̂_n − δ̂(γ̂_n)/√n], and widening them to τ̂/√n instead of τ̂_S/√n, thereby accounting for the model selection uncertainty, the coverage probability is shown to be consistent with the nominal coverage probability by the following theorem.
Theorem 3.3 Under the Regularity Assumptions given in the Appendix, as n goes to infinity,

\[ \Pr(\mathrm{low}_n \le \zeta_0 \le \mathrm{up}_n) \to 2\Phi(z_k) - 1, \]

where Φ(·) is the standard normal distribution function. In particular,

\[ Z_n = \Big[\sqrt{n}\,(\hat\zeta - \zeta_0) - \hat\omega^\top\big\{\hat\Delta_n - \hat\delta(\hat\Delta_n)\big\}\Big]\big/\hat\tau \;\xrightarrow{d}\; \big(\Omega_0 + \omega^\top\delta - \omega^\top\Delta\big)\big/\tau, \]

and the limit has a standard normal distribution. Theorem 3.2 can be easily proven by the simultaneous convergence in distribution

\[ \Big(\sqrt{n}\,(\hat\zeta - \zeta_0),\; \hat\Delta_n\Big) \;\xrightarrow{d}\; \Big(\Omega_0 + \omega^\top\delta - \omega^\top\hat\delta(\Delta),\; \Delta\Big). \]
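The shifted-and-widened interval of Theorem 3.3 can be sketched numerically. All numeric inputs below are hypothetical stand-ins for quantities estimated from a fitted full model (they are not output of an actual GEE fit):

```python
import numpy as np
from statistics import NormalDist

def qfma_ci(zeta_hat, omega, gamma_hat, delta_hat, tau_hat, n, level=0.95):
    """Model-averaging CI: center = zeta_hat - omega'(gamma_hat - delta_hat/sqrt(n)),
    half-width = z * tau_hat / sqrt(n), with z the normal quantile for `level`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    omega = np.asarray(omega, dtype=float)
    shift = omega @ (np.asarray(gamma_hat, dtype=float)
                     - np.asarray(delta_hat, dtype=float) / np.sqrt(n))
    half = z * tau_hat / np.sqrt(n)
    return zeta_hat - shift - half, zeta_hat - shift + half

# Hypothetical inputs with two uncertain parameters
low, up = qfma_ci(zeta_hat=2.0, omega=[0.5, -0.5],
                  gamma_hat=[0.1, 0.2], delta_hat=[0.3, 0.1],
                  tau_hat=1.2, n=100)
print(low, up)
```

Note how the center is shifted away from ζ̂ itself, which is what restores the nominal coverage after model averaging.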
36
3.2.3 The Choices of Weight Functions
The model averaging estimate takes the form of a weighted combination of the estimates from all the candidate models. It reduces to a model selection estimate under a specific weight function. In particular, the final model selected by ∆AIC, S_∆AIC, corresponds to an indicator weight function, called the hard-core weight function:

\[ \hat\zeta_{\Delta\mathrm{AIC}} = \sum_S I(S = S_{\Delta\mathrm{AIC}})\,\hat\zeta_S = \hat\zeta_{S_{\Delta\mathrm{AIC}}}. \]

Likewise, the final model selected by QIC can be written as

\[ \hat\zeta_{\mathrm{QIC}} = \sum_S I(S = S_{\mathrm{QIC}})\,\hat\zeta_S = \hat\zeta_{S_{\mathrm{QIC}}}, \]

and the final model selected by QFIC as

\[ \hat\zeta_{\mathrm{QFIC}} = \sum_S I(S = S_{\mathrm{QFIC}})\,\hat\zeta_S = \hat\zeta_{S_{\mathrm{QFIC}}}. \]
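The indicator ("hard-core") weights above can be sketched directly: putting weight 1 on the criterion-best model makes the averaged estimate collapse to the selection estimate. The criterion values and focused-parameter estimates below are purely illustrative:

```python
import numpy as np

def hard_core_weights(criterion_values):
    """Indicator weight function: weight 1 on the model with the smallest
    criterion value (e.g. QIC or QFIC), weight 0 on all the others."""
    w = np.zeros(len(criterion_values))
    w[int(np.argmin(criterion_values))] = 1.0
    return w

crit = [204.0, 210.0, 213.0, 216.0]          # hypothetical criterion values
zeta_hats = np.array([1.9, 2.1, 2.0, 2.2])   # hypothetical focused-parameter estimates
w = hard_core_weights(crit)
print(w @ zeta_hats)  # equals the estimate from the selected (best) model: 1.9
```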
Buckland et al. (1997), however, suggested that the weights in model averaging estimates be proportional to exp(f_S − |S|), where f_S is the maximized log-likelihood of candidate model S. For longitudinal data fitted with the GEE approach, the weights should thus be proportional to exp(Q_S − |S|), with Q_S the quasi-likelihood of candidate model S. The corresponding smoothed weight functions for ∆AIC and QIC can be represented as

\[ \frac{\exp\big(-\Delta\mathrm{AIC}_{n,S}/2\big)}{\sum_T \exp\big(-\Delta\mathrm{AIC}_{n,T}/2\big)} \qquad \text{and} \qquad \frac{\exp\big(-\mathrm{QIC}_{n,S}/2\big)}{\sum_T \exp\big(-\mathrm{QIC}_{n,T}/2\big)}. \]

It can also be beneficial to use the information carried by QFIC through a smoothed QFIC weight. The weight function is similar to the one suggested in Hjort and Claeskens (2003):

\[ \frac{\exp\Big(-\dfrac{\kappa}{2}\,\dfrac{\mathrm{QFIC}_{n,S}}{\hat\omega^\top\hat\Sigma_{11}\hat\omega}\Big)}{\sum_T \exp\Big(-\dfrac{\kappa}{2}\,\dfrac{\mathrm{QFIC}_{n,T}}{\hat\omega^\top\hat\Sigma_{11}\hat\omega}\Big)}, \qquad \kappa \ge 0. \]
Here, κ is a weight parameter bridging the weight function from uniform (κ close to 0) to hard core (large κ). When the performances of all the candidate models are very close, we would choose κ so that the weight function is close to uniform. When certain candidate models behave much better than others, a larger κ is the better option: it pushes the weight function toward hard core, placing higher weights on the better-behaved models and lower weights on the poorly behaved ones.
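This bridging behavior of κ can be sketched as follows; the QFIC values and the scalar ω̂⊤Σ̂₁₁ω̂ are hypothetical placeholders, not output of an actual fit:

```python
import numpy as np

def smoothed_qfic_weights(qfic_values, omega_sigma_omega, kappa):
    """Smoothed QFIC weights proportional to
    exp(-kappa/2 * QFIC_S / (omega' Sigma11 omega)), normalized over the
    candidate models. kappa = 0 gives uniform weights; a large kappa
    approaches the hard-core (indicator) weight on the QFIC-best model."""
    q = np.asarray(qfic_values, dtype=float)
    scores = -0.5 * kappa * q / omega_sigma_omega
    scores -= scores.max()        # stabilize the exponentials
    w = np.exp(scores)
    return w / w.sum()

qfic = [2.956, 2.847, 2.797, 15.50]   # hypothetical QFIC values, smaller is better
print(smoothed_qfic_weights(qfic, omega_sigma_omega=1.0, kappa=0))    # uniform
print(smoothed_qfic_weights(qfic, omega_sigma_omega=1.0, kappa=500))  # nearly hard core
```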
3.3 Simulation Studies
This section investigates the performance of our proposed QFIC and QFMA procedures for longitudinal data fitted with the GEE approach. The model selection procedures using QFIC, the ∆AIC proposed in Chapter 2, and Pan (2001a)'s QIC (denoted P-QFIC, P-∆AIC and P-QIC) are compared to their smoothed-weight model averaging counterparts (denoted S-QFIC, S-∆AIC and S-QIC). In particular, we calculate the coverage probabilities (hereafter "CP") of the estimated 95% confidence intervals (hereafter "CIs") and the estimated mean square errors (hereafter "MSE") for the targeted parameter. As a reference, inference based on the full model (hereafter "Full") is reported as well. Specifically, we consider discrete and continuous responses with n = 100 subjects, each with m = 3 visits.
3.3.1 Continuous Response Variable
The continuous response variable is generated from

\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i, \qquad i = 1, \dots, n. \]

The covariates x_{1i} = (x_{1i1}, x_{1i2}, x_{1i3})⊤, x_{2i} = (x_{2i1}, x_{2i2}, x_{2i3})⊤, and x_{3i} = (x_{3i1}, x_{3i2}, x_{3i3})⊤ are independently generated from a multivariate normal distribution with mean (1, 1, 1)⊤ and identity covariance matrix. The error term ε_i = (ε_{i1}, ε_{i2}, ε_{i3})⊤ is independent of the covariates and is generated from a three-dimensional normal distribution with mean 0 and marginal variance 1. Section 2.3 introduces three types of correlation structures among the three repeated response measurements of each subject: two simple predictable structures, EX(0.5) and AR(0.5), and a complex one, MIX. Here, we consider the same correlation structures for ε_i. The narrow model contains only int. and x1, with (β_0, β_1) = (2, 1). The coefficients of the other two covariates are set to (β_2, β_3) = (2, −2)/√(mn). In total, four candidate models are considered, as listed in Table 3.1.
Table 3.1: QFIC and QFMA - Candidate Models in Simulation I with Continuous Response

Model          Covariates         Model           Covariates
m1 (Full)      int. x1 x2 x3      m2              int. x1 x3
m3             int. x1 x2         m4 (Narrow)     int. x1
Here, we consider a single focused parameter:

\[ \zeta = -2\beta_0 + 2\beta_1 - 0.5\beta_2 + 0.5\beta_3. \]

The focused parameter is not limited to linear combinations of the coefficients; we also tried the quadratic form β_1² + β_2 and observed a similar pattern. All models are fitted with the GEE approach under three different working correlation matrices: IN, EX and AR. The simulation results, based on one thousand replications, are presented in Figure 3.1 in terms of the MSE and the CP of the estimated 95% CIs for the focused parameter.
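The data-generating scheme described above can be sketched as follows. This is a minimal sketch: the MIX correlation structure and the GEE fitting step are omitted, and the seed and array layout are arbitrary choices of this illustration:

```python
import numpy as np

def corr_matrix(kind, rho=0.5, m=3):
    """True/working correlation: EX (exchangeable) or AR (autoregressive, AR-1)."""
    if kind == "EX":
        return rho * np.ones((m, m)) + (1 - rho) * np.eye(m)
    if kind == "AR":
        idx = np.arange(m)
        return rho ** np.abs(idx[:, None] - idx[None, :])
    raise ValueError(kind)

rng = np.random.default_rng(0)
n, m = 100, 3
beta = np.array([2.0, 1.0, 2.0 / np.sqrt(m * n), -2.0 / np.sqrt(m * n)])

# Covariates x_1i, x_2i, x_3i ~ N((1,1,1)', I), independently per subject
X = rng.normal(loc=1.0, size=(n, 3, m))       # axes: (subject, covariate, visit)
# Errors with true EX(0.5) correlation across the m visits
eps = rng.multivariate_normal(np.zeros(m), corr_matrix("EX"), size=n)
y = beta[0] + np.einsum("k,nkt->nt", beta[1:], X) + eps
print(y.shape)  # (100, 3)
```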
Figure 3.1: QFMA and QFIC - MSE and CP for Focused Parameter ζ in Simulation I on Continuous Responses with True Exchangeable, Autoregressive and Mixed Correlation Matrices EX(0.5), AR(0.5) and MIX. [Six panels: MSE (top row) and CP (bottom row) under EX(0.5), AR(0.5) and MIX, comparing Full, S-QFIC, S-∆AIC, S-QIC, P-QFIC, P-∆AIC and P-QIC with working correlations IN, EX and AR.]
Regardless of the working correlation matrix, the three MSE plots in the upper panel of Figure 3.1 consistently show that the model averaging procedures outperform the model selection procedures, with relatively smaller MSE values. They also show that the model selection criterion QFIC performs better than ∆AIC and QIC. Comparing ∆AIC to QIC, S-∆AIC behaves similarly to S-QIC, while P-∆AIC works better than P-QIC. This shows both the superiority of ∆AIC over QIC and the stability of the averaging versions relative to the selection versions. As a reference, the full model does provide unbiased estimates, but at the price of a large increase in variability; it therefore has a relatively larger MSE.
We now compare the procedures under different working correlation matrices. In the first two predictable correlation scenarios, the GEE estimates with the correct working correlation matrix, i.e., EX for EX(0.5) and AR for AR(0.5), always have the smallest MSE values across all the model selection and averaging procedures. This pattern is consistent with the efficiency of the true correlation pointed out in Liang and Zeger (1986). In the third scenario (MIX), AR gives the smallest MSE, which may be because AR is closer to the true correlation structure than EX. In all three scenarios, IN yields the largest MSE due to the high correlation coefficient in the simulation setting.
The three CP plots in the bottom panel of Figure 3.1 likewise indicate the better performance of the modified CIs from the averaging procedures compared to the traditional CIs from the selection procedures. The CPs of all the modified CIs are very close to 95%, whereas the CPs of the traditional CIs can sometimes drop dramatically, down to 90%. In contrast to the differing CP behaviors of the traditional CIs from P-QFIC, P-∆AIC and P-QIC, the three modified CIs perform similarly, again showing the more stable behavior of the model averaging procedure.

In the CP plots, AR and EX work very similarly in the first two correlation scenarios and better than IN, although the correct working correlation matrix, i.e., EX for EX(0.5) and AR for AR(0.5), does work a little better. In the MIX scenario, AR works better than the other two, for the same reason as in the MSE plots.
3.3.2 Binary Response Variable
For the binary longitudinal data, we generate the binary response with the same model as in Section 2.3, but with a different combination of coefficients:

\[ (\beta_0, \beta_1) = (3, -3) \qquad \text{and} \qquad (\beta_2, \beta_3, \beta_4) = (1, 1, -1)/\sqrt{mn}. \]

The narrow model therefore contains int. and x1. As shown in Table 3.2, there are 8 candidate models. Here, we focus on the specified parameter

\[ \zeta = 2\beta_1 + 2\beta_2 + 0.5\beta_3 + 0.5\beta_4 + 0.5\beta_5. \]

The simulation results, based on one thousand replications, are presented in Figure 3.2.
Table 3.2: QFIC and QFMA - Candidate Models in Simulation II with Binary Response

Model          Covariates            Model           Covariates
m1 (Full)      int. x1 x2 x3 x4      m5              int. x1 x3 x4
m2             int. x1 x2 x3         m6              int. x1 x3
m3             int. x1 x2 x4         m7              int. x1 x4
m4             int. x1 x2            m8 (Narrow)     int. x1
Figure 3.2: QFMA & QFIC - MSE & CP for Focused Parameter ζ in Simulation II on Binary Responses with True Exchangeable, Autoregressive and Mixed Correlation Matrices EX(0.5), AR(0.5) and MIX. [Six panels: MSE (top row) and CP (bottom row) under EX(0.5), AR(0.5) and MIX, comparing Full, S-QFIC, S-∆AIC, S-QIC, P-QFIC, P-∆AIC and P-QIC with working correlations IN, EX and AR.]
The patterns for the binary longitudinal data, shown in Figure 3.2, are similar to the continuous case. Generally speaking, the model averaging estimates have relatively smaller MSE than the model selection estimates, and the QFIC estimates have smaller MSE than those from ∆AIC and QIC. Across working correlation matrices, EX and AR provide similar MSE values, both smaller than IN; the true working correlation matrices for the first two scenarios, i.e., EX for EX(0.5) and AR for AR(0.5), perform slightly better. Regarding ∆AIC and QIC, with EX and AR, ∆AIC has a much smaller MSE than QIC, whereas with IN they work almost the same. This again shows the superiority of ∆AIC over QIC and the bias introduced by using working independence for QIC. For CP, the modified CIs are observably closer to 95% than the traditional CIs.
In summary, for both the continuous and binary longitudinal simulation studies, the MSE and CP plots consistently show the advantage of QFIC over the traditional model selection criteria ∆AIC and QIC. They also demonstrate that the QFMA procedure behaves better than the traditional model selection procedure.
3.4 A Numerical Example
In this section, we apply our proposed QFIC and QFMA procedures, with the GEE approach, to the AIDS Clinical Trials Group protocol A5055 longitudinal study. A5055 was a Phase I/II, randomized, open-label, 24-week comparative study of the pharmacokinetics, tolerability, safety and antiretroviral effects of two regimens of indinavir (IDV), ritonavir (RTV) and two nucleoside analogue reverse transcriptase inhibitors in HIV-1-infected patients who had failed protease inhibitor-containing antiretroviral therapies.
In this study, 42 patients were randomized to one of the two regimens and were seen at entry, at weeks 1, 2 and 4, and every 4 weeks thereafter through week 24 of follow-up. Plasma HIV-1 RNA testing was conducted at each visit, providing a binary response rna (0 - negative and 1 - positive). A series of potentially explanatory variables was collected at the same time, including: age; cd4, CD4 cell count; cd8, CD8 cell count; ic50, phenotypic determination of antiretroviral drug resistance; icmin and rcmin, trough levels of IDV and RTV concentration in plasma; ic12h and rc12h, IDV and RTV concentrations in plasma measured 12 h after dosing; icmax and rcmax, maximum IDV and RTV concentrations in plasma; iauc and rauc, areas under the plasma concentration-time curve for IDV and RTV; and iadh and radh, pill counts for monitoring adherence. More detailed descriptions and analyses are reported in Wu et al. (2005), Huang et al. (2008), and Acosta et al. (2004). Given all these potentially explanatory factors, we aim to identify the pertinent ones to better predict the antiretroviral treatment response for a new patient.
We first fit the full model with all fourteen possible covariates in order to identify the highly significant ones. The full model can be written as

\[ \operatorname{logit}(\mu_{ij}) = \beta_0 + \beta_1\,\mathrm{cd4}_{ij} + \beta_2\,\mathrm{cd8}_{ij} + \beta_3\,\mathrm{age}_{ij} + \beta_4\,\mathrm{ic50}_{ij} + \beta_5\,\mathrm{radh}_{ij} + \beta_6\,\mathrm{iadh}_{ij} + \beta_{13}\,\mathrm{rauc}_{ij} + \beta_{14}\,\mathrm{iauc}_{ij} + \beta_7\,\mathrm{rcmin}_{ij} + \beta_8\,\mathrm{icmin}_{ij} + \beta_9\,\mathrm{rcmax}_{ij} + \beta_{10}\,\mathrm{icmax}_{ij} + \beta_{11}\,\mathrm{rc12h}_{ij} + \beta_{12}\,\mathrm{ic12h}_{ij}, \]

with i = 1, ..., 42, j = 1, ..., t_i, and µ_ij the conditional expectation of rna_ij. Again, due to the complicated correlation structure among each patient's repeated observations, the corresponding marginal logistic regression model is fitted with the GEE approach under three different working correlation matrices: IN, EX and AR. Ordered by the covariates' significance, the model fitting results are listed in Table 3.3 in terms of the corresponding coefficient estimates and p-values.
In Table 3.3, the working correlation matrices IN and EX give very similar coefficient estimates and corresponding p-values, which differ somewhat from those under AR. Regardless of the working correlation matrix, however, all the results identify the same highly significant covariates: int., cd4, cd8 and age. These four covariates are therefore included as the certain covariates, and we run the model selection and averaging procedures over the remaining eleven uncertain ones. Nevertheless, if we
Table 3.3: A5055 - Statistical Inference under Full Model with IN, EX and AR Working Correlation Matrices
IN EX AR
Covariate Estimate P-value Estimate P-value Estimate P-value
int. 4.0e+01 1.2e-02 3.9e+01 1.4e-02 3.7e+01 2.3e-02
cd4 1.0e-02 1.1e-03 1.1e-02 1.1e-03 1.1e-02 1.3e-03
cd8 -1.7e+00 4.3e-03 -1.8e+00 3.7e-03 -1.7e+00 3.2e-03
age -8.1e-02 6.4e-03 -8.0e-02 7.1e-03 -7.8e-02 8.7e-03
icmax -2.3e+00 3.8e-02 -2.3e+00 4.1e-02 -2.2e+00 3.7e-02
iauc 9.2e-02 8.0e-02 9.1e-02 8.5e-02 9.2e-02 8.5e-02
ic50 2.1e-01 9.1e-02 2.1e-01 9.8e-02 1.9e-01 9.2e-02
rcmin 1.8e-01 1.1e-01 1.8e-01 1.1e-01 1.9e-01 1.1e-01
rc12h 7.9e-01 1.3e-01 8.1e-01 1.2e-01 8.0e-01 1.4e-01
ic12h -6.3e-04 2.3e-01 -6.5e-04 2.2e-01 -6.1e-04 2.4e-01
rcmax -0.7e+00 2.6e-01 -1.6e+00 2.8e-01 -1.4e+00 3.7e-01
iadh -4.8e+00 2.9e-01 -4.7e+00 3.0e-01 -3.9e+00 3.4e-01
radh 2.2e+00 6.5e-01 2.1e+00 6.7e-01 1.3e+00 7.9e-01
icmin 7.6e-05 7.9e-01 8.3e-05 7.7e-01 1.3e-04 6.5e-01
rauc -3.3e-03 8.5e-01 -3.9e-03 8.3e-01 -6.7e-03 7.2e-01
consider all the possible candidate models, 2^11 = 2048 models need to be estimated. A backward elimination approach, as introduced in Claeskens et al. (2006), is thus used here as an alternative to an exhaustive search. We start with the full model, delete one covariate at each step based on a given model selection criterion, and end up with twelve nested candidate models. The model selection and averaging procedures are then carried out over these twelve models.
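A minimal sketch of this greedy search follows, with a toy "cost" criterion standing in for ∆AIC, QIC or QFIC; the `cost` table and function names are illustrative assumptions, not the thesis's actual criteria:

```python
def backward_elimination(full_covariates, criterion):
    """Greedy backward elimination: starting from the full model, drop the one
    covariate whose removal gives the best (smallest) criterion value at each
    step, producing a nested sequence of candidate models."""
    model = tuple(full_covariates)
    path = [model]
    while len(model) > 1:
        candidates = [tuple(c for c in model if c != drop) for drop in model]
        model = min(candidates, key=criterion)
        path.append(model)
    return path

# Toy criterion: each covariate has a fixed "cost"; smaller total is better,
# so x1 (strongly useful, negative cost) survives to the end.
cost = {"x1": -5.0, "x2": 1.0, "x3": 2.0, "x4": 3.0}
path = backward_elimination(["x1", "x2", "x3", "x4"],
                            lambda m: sum(cost[c] for c in m))
print(path)  # nested sequence from the full model down to ("x1",)
```

With eleven uncertain covariates this visits 11 + 10 + ... + 1 submodels instead of all 2^11, which is the computational saving the backward search buys.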
We examine the predictive power of the six model selection and averaging procedures, S-QFIC, S-∆AIC, S-QIC, P-QFIC, P-∆AIC and P-QIC, by a cross-validation experiment. Due to the complicated correlation structure among each patient's repeated observations, a leave-one-patient-out cross-validation experiment is a better choice than a leave-one-observation-out experiment. The prediction error rates, evaluated as the percentage of wrong predictions among one thousand replications, are plotted in Figure 3.3.
Figure 3.3: A5055 - Prediction Error Rates of Model Selection and Model Averaging
Procedures with Different Values of Weight Parameter κ
[Two panels: the left panel plots the prediction error rate of S-QFIC against κ from 0 to 10, with the P-QFIC error rate shown as a dashed line; the right panel plots the prediction error rates of S-QFIC, P-QFIC, S-∆AIC, P-∆AIC, S-QIC and P-QIC under working correlations IN, EX and AR.]
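The leave-one-patient-out scheme can be sketched generically as below. The `fit` and `predict` callables are placeholders for an actual GEE fit/predict pair; the toy "majority-class" model at the bottom exists only to make the sketch runnable:

```python
import numpy as np

def loo_patient_cv(patient_ids, y, fit, predict):
    """Leave-one-patient-out cross-validation for longitudinal data: all
    repeated observations of one patient are held out together, so the
    within-patient correlation never leaks into the training set.
    fit(train_idx) -> model;  predict(model, test_idx) -> 0/1 predictions."""
    ids = np.asarray(patient_ids)
    y = np.asarray(y)
    errors = total = 0
    for pid in np.unique(ids):
        test = np.flatnonzero(ids == pid)
        train = np.flatnonzero(ids != pid)
        model = fit(train)
        pred = np.asarray(predict(model, test))
        errors += int(np.sum(pred != y[test]))
        total += test.size
    return errors / total  # prediction error rate

# Toy check: "fit" learns the majority class of the training responses.
ids = np.array([1, 1, 2, 2, 3, 3])
y = np.array([1, 1, 1, 0, 0, 0])
rate = loo_patient_cv(ids, y,
                      fit=lambda tr: int(y[tr].mean() >= 0.5),
                      predict=lambda m, te: np.full(te.size, m))
print(rate)  # 5/6: each held-out patient is poorly predicted by the others
```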
As we mentioned in Section 3.2, the weight parameter κ bridges the QFIC-based weight function from uniform to hard core. The left panel of Figure 3.3 gives the prediction error rates of S-QFIC for κ values ranging from 0 to 10. When κ = 0, the model averaging estimate is simply the arithmetic mean of the estimates from the 12 nested candidate models; ignoring the information about the models' differing behaviors, this estimate yields the largest error rate. The dashed line indicates the error rate of P-QFIC, which is equivalent to S-QFIC with the hard-core weight function, assigning weight 1 to the best model selected by QFIC and 0 to the rest. For this specific data set and model setting, the prediction error rate of S-QFIC decreases dramatically as κ moves from 0 to 1, falls below the P-QFIC error rate at κ = 2, and reaches its minimum around κ = 5. Eventually, it converges to the error rate of P-QFIC as κ → ∞.
The right panel of Figure 3.3 plots the prediction error rates of S-QFIC with κ = 5, together with S-∆AIC, S-QIC, P-QFIC, P-∆AIC and P-QIC, fitted with the GEE approach under IN, EX and AR. From the plot, we observe a smaller error rate for QFIC than for ∆AIC and QIC, and a smaller error rate for ∆AIC than for QIC. These patterns indicate the advantage of predicting with different sets of explanatory covariates for different patients at different visits, and once again show the better behavior of ∆AIC relative to QIC. The plot also shows smaller error rates for S-QFIC than P-QFIC, S-∆AIC than P-∆AIC, and S-QIC than P-QIC, although the difference for the QFIC pair is not substantial. This shows that the model averaging procedure behaves better than the selection procedure. Across working correlation matrices, IN and EX give almost the same prediction error rates for all six estimates, while AR performs much worse. This may be due to the similarity of EX and IN to the unknown true correlation.
To demonstrate that the final models chosen by QFIC differ across parameters of interest, we also consider three focused parameters: the coefficients of cd4, cd8 and age. The backward elimination selection is carried out with the GEE approach under IN, based on ∆AIC, QIC and QFIC. The corresponding 12 selected nested models are listed in Tables 3.4-3.8, along with the values of the model selection criteria and the focused parameters' estimates.
Regardless of the focused parameter, ∆AIC and QIC each arrive at their own final model, selected from their twelve corresponding nested candidate models. For ∆AIC in particular, the covariates deleted in the first three steps of the backward elimination search are the most insignificant ones in the full model, with p-values ranging from 0.6 to 0.9 as shown in Table 3.3. The subsequent deletions follow a different order, which may be due to changes in significance after removing the most insignificant covariates. The model selection criterion QFIC, however, selects a different final model for each focused parameter, from its own set of twelve nested candidate models.
3.5 Conclusion and Remarks
Based on the quasi-likelihood, we propose the parameter-oriented model selection criterion QFIC and the Frequentist model averaging (QFMA) procedure for longitudinal data fitted with the GEE approach, and derive the asymptotic properties of the proposed procedures. Both the simulation studies and the real data analysis show their superiority in terms of smaller mean square error, coverage probability closer to 95%, and smaller prediction error rates.

In the study of the weight choice, we note the effect of the weight parameter κ on the weight function. From a numerical point of view, when the performances of the candidate models differ substantially, a large value of κ is preferable to stretch the differences among the weights; as a consequence, much higher weights are given to the better-behaved candidate models. On the other hand, when all the candidate models behave similarly, a small κ is chosen to shrink the differences among the weights. However, there is no explicit rule derived for selecting κ from a theoretical perspective, and more research
Table 3.4: A5055 - ∆AIC and QFIC Values on 12 Nested Models Selected by ∆AIC
Covariate 1 2 3 4 5 6 7 8 9 10 11 12
icmax × × × × × × ×
iauc × × × × × × × × × × ×
ic50 × × × × × × × × × ×
rcmin × × × × × ×
rc12h × × × × ×
ic12h × × × ×
rcmax × × × × × × × ×
iadh × × × × × × × × ×
radh × × ×
icmin × ×
rauc ×
∆AIC[e-00] -45.7 -47.7 -49.6 -51.3 -50.8 -51.1 -48.9 -40.9 -32.2 -18.0 -7.7 0.0
QFICβcd4[e-03] 10.45 10.41 10.42 10.37 9.98 8.63 8.36 9.05 9.29 9.36 9.38 9.47
QFICβcd8[e-00] -1.73 -1.70 -1.66 -1.68 -1.62 -1.39 -1.32 -1.00 -1.12 -1.14 -0.86 -0.87
QFICβage[e-02] -8.07 -8.08 -8.16 -8.06 -9.27 -8.43 -9.07 -6.47 -5.39 -5.54 -5.57 -4.44
NOTE: × indicates presence of the covariate in the model and a blank means its absence. Row ∆AIC[e-00] lists the ∆AIC values. Row QFICβcd4[e-03] lists the QFIC values ×10^-3 when we focus on parameter βcd4. Row QFICβcd8[e-00] lists the QFIC values when we focus on parameter βcd8. Row QFICβage[e-02] lists the QFIC values ×10^-2 when we focus on parameter βage.
Table 3.5: A5055 - QIC and QFIC Values on 12 Nested Model Selected by QIC
Covariate 1 2 3 4 5 6 7 8 9 10 11 12
icmax × × × × × × × ×
iauc × × × × × ×
ic50 × × × ×
rcmin × × ×
rc12h ×
ic12h × × × × × × ×
rcmax × × × × × × × × × × ×
iadh × ×
radh × × × × × × × × ×
icmin × × × × ×
rauc × × × × × × × × × ×
QIC[e-00] 204 210 213 216 216 217 216 216 215 214 212 211
QFICβcd4[e-03] 10.45 8.78 8.51 8.28 8.35 8.20 9.09 9.26 9.19 9.25 9.25 9.47
QFICβcd8[e-00] -1.73 -1.44 -1.48 -1.51 -1.29 -0.86 -0.77 -0.74 -0.76 -0.78 -0.78 -0.87
QFICβage[e-02] -8.07 -8.13 -7.73 -8.24 -7.73 -8.16 -7.28 -5.40 -5.17 -5.21 -5.20 -4.44
NOTE: × indicates presence of the covariate in the model and a blank means its absence. Row QIC[e-00] lists the QIC values. Row QFICβcd4[e-03] lists the QFIC values ×10^-3 when we focus on parameter βcd4. Row QFICβcd8[e-00] lists the QFIC values when we focus on parameter βcd8. Row QFICβage[e-02] lists the QFIC values ×10^-2 when we focus on parameter βage.
Table 3.6: A5055 - QFIC Values and Coefficient Estimates on 12 Nested Models
Selected by QFIC for CD4
Covariate 1 2 3 4 5 6 7 8 9 10 11 12
icmax × × × × × × × ×
iauc × × × × ×
ic50 × × × × × ×
rcmin × × × × × × × × × × ×
rc12h × × × × × × × × ×
ic12h × × × ×
rcmax × × × × × × × × × ×
iadh × × ×
radh ×
icmin × × × × × × ×
rauc × ×
QFICβcd4[e-04] 2.956 2.847 2.797 2.773 2.700 2.119 2.047 2.048 2.175 6.560 8.500 15.50
βcd4[e-03] 10.45 10.41 10.37 10.25 9.96 10.80 10.17 10.35 10.35 9.41 9.87 9.47

NOTE: × indicates presence of the covariate in the model and a blank means its absence. Row QFICβcd4[e-04] lists the QFIC values ×10^-4 when we focus on parameter βcd4. Row βcd4[e-03] lists the values of βcd4 ×10^-3.
Table 3.7: A5055 - QFIC Values and Coefficient Estimates on 12 Nested Models
Selected by QFIC for CD8
Covariate 1 2 3 4 5 6 7 8 9 10 11 12
icmax × × × × ×
iauc × × × × × ×
ic50 × × × × × × × × × × ×
rcmin × × × × × × × × ×
rc12h × × × × × × × ×
ic12h × × × ×
rcmax × × × × × × × × × ×
iadh × × × × × × ×
radh × ×
icmin × × ×
rauc ×
QFICβcd8[e-00] 2.67 2.23 1.94 1.86 1.85 1.86 1.87 1.98 4.15 6.26 9.00 18.91
βcd8[e-00] -1.73 -1.70 -1.73 -1.68 -1.62 -1.50 -1.51 -1.49 -1.16 -1.02 -1.12 -0.87

NOTE: × indicates presence of the covariate in the model and a blank means its absence. Row QFICβcd8[e-00] lists the QFIC values when we focus on parameter βcd8. Row βcd8[e-00] lists the values of βcd8.
Table 3.8: A5055 - QFIC Values and Coefficient Estimates on 12 Nested Models
Selected by QFIC for Age
Covariate 1 2 3 4 5 6 7 8 9 10 11 12
icmax × × × × × × × ×
iauc × × × × ×
ic50 × × × ×
rcmin × × × × × × × × ×
rc12h × × × × × × ×
ic12h × ×
rcmax × × ×
iadh × × × × × ×
radh ×
icmin × × × × × × × × × × ×
rauc × × × × × × × × × ×
QFICβage[e-02] 2.78 2.45 2.32 1.78 1.71 1.52 1.39 0.92 1.42 2.12 5.30 2.25
βage[e-02] -8.07 -7.97 -8.94 -7.78 -7.35 -6.28 -6.35 -5.67 -5.55 -6.27 -5.71 -4.44

NOTE: × indicates presence of the covariate in the model and a blank means its absence. Row QFICβage[e-02] lists the QFIC values ×10^-2 when we focus on parameter βage. Row βage[e-02] lists the values of βage ×10^-2.
needs to be done to investigate the theoretical properties of κ.
In high-dimensional settings with a large number of uncertain parameters, averaging over all possible candidate models is practically infeasible. A backward elimination or forward selection procedure, as introduced by Claeskens et al. (2006), is preferable in order to dramatically reduce the computational burden. However, backward elimination and forward selection may sometimes result in different final models, and a further investigation of these different final models is warranted.
4 Predictive Models in Personalized Medicine
4.1 Introduction
As biotechnology and genomics continue to grow, personalized medicine has become an important topic in current medical practice. Evidence-based medicine selects therapy based on a whole group of patients, but it ignores the heterogeneity among the patients within the cohort. In oncology studies, cancers have been shown to be diverse in their oncogenesis, pathogenesis and responsiveness to therapy even when they share the same primary site and stage, as mentioned in Simon (2013). A medication with a significant treatment effect on some patients may be of no use to others, and its misuse may expose patients to the risks of adverse events with no benefit, as illustrated in Dumas et al. (2007). By utilizing individual-level characteristics, such as patient demographics, imaging and exam results, laboratory parameters, and genetic or genomic information, personalized predictive models can improve individualized prognosis and diagnosis and correspondingly individualize and optimize therapy, as mentioned in Simon (2005) and Simon (2012). They can therefore be applied to many fields in personalized medicine, including personalized preventive care; personalized prognosis, diagnosis and monitoring; and personalized therapy selection.
In this century, there has been vigorous statistical research on personalized therapy selection, such as Murphy (2002), Robins (2004), Moodie et al. (2007), Robins et al. (2008), Li et al. (2008), Qian and Murphy (2011), Brinkley et al. (2010), Gunter et al. (2011) and Zhang et al. (2012). Most of this research involves a single or a series of sequential decision-making processes and focuses on estimating optimal treatment regimes. Some statisticians have also conducted subgroup analyses to tailor their findings to a specific group, such as Bonetti and Gelber (2000), Bonetti and Gelber (2004), Song and Pepe (2004), Pfeffer and Jarcho (2006), Wang et al. (2007), and Cai et al. (2011). For certain patients or subgroups, the therapy that would result in the best estimated mean response, based on specified models with specified explanatory variables, is chosen from a set of candidate therapies. Most of these methods, however, assume that the specified model with the specified explanatory variables is the true underlying model. Due to the heterogeneity of the population, different explanatory variables might be identified as significant for different patients or subgroups. Therefore in this chapter, instead of focusing on personalized therapy selection, we target personalized prognosis and diagnosis using personalized predictive models.
The construction of a reliable prediction rule for future responses depends heavily on the "adequacy" of the fitted model. As mentioned in Section 1.1, the final model resulting from traditional model selection criteria, such as AIC and BIC, is the model with the best overall properties for the whole population, regardless of individuals. Even if a model is selected for a certain subgroup, it captures the overall information of that subgroup but does not necessarily work best for each individual in it, as illustrated in Henderson and Keiding (2005).
The focused information criterion (FIC), as mentioned in Chapter 1, focuses attention directly on the parameter of primary interest and aims to select the model with the minimum estimated mean square error for that parameter's estimate. The final model is therefore, ideally, the best model for that parameter only. This characteristic motivates us to apply FIC to personalized medicine, in particular to the field of personalized prognosis and diagnosis.
Based on the notation introduced in Sections 2.2 and 3.2, for patient j we assume that the prediction of his/her response outcome can be written as a function of the model parameters, denoted by ζ_j = ζ_j(θ, γ). The personalized FIC for patient j under candidate model S can then be written and estimated as

\[ \mathrm{FIC}_{j,S} = 2\,\hat\omega_j^\top \pi_S^\top \hat\Sigma^{11}_S \pi_S \hat\omega_j + n\big[\hat\omega_j^\top (I_q - \hat D_S)\,\hat\gamma\big]^2, \]

where ω_j = Σ_{10}Σ_{00}^{-1}\,∂ζ_j/∂θ − ∂ζ_j/∂γ is determined by ζ_j. Even for the same candidate model S, predictions for different targeted patients can yield different personalized FIC values.
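The personalized FIC formula translates directly into code. The matrices below (selection matrix π_S, covariance block Σ̂¹¹_S, matrix D̂_S, and the estimates ω̂_j, γ̂) are small hypothetical inputs chosen only to make the sketch self-contained:

```python
import numpy as np

def personalized_fic(omega_j, pi_S, Sigma11_S, D_S, gamma_hat, n):
    """Personalized FIC for patient j under candidate model S:
        2 * w' Pi' Sigma11_S Pi w  +  n * ( w' (I_q - D_S) gamma_hat )^2
    Shapes: omega_j (q,), pi_S (|S|, q), Sigma11_S (|S|, |S|), D_S (q, q),
    gamma_hat (q,). All inputs are estimates from the fitted models."""
    w = np.asarray(omega_j, dtype=float)
    q = w.size
    var_term = 2.0 * (w @ pi_S.T @ Sigma11_S @ pi_S @ w)
    bias_term = n * (w @ (np.eye(q) - D_S) @ np.asarray(gamma_hat, float)) ** 2
    return float(var_term + bias_term)

# Hypothetical inputs with q = 2 uncertain parameters; S keeps only the first
fic = personalized_fic(omega_j=[1.0, 0.5],
                       pi_S=np.array([[1.0, 0.0]]),
                       Sigma11_S=np.array([[2.0]]),
                       D_S=np.array([[1.0, 0.0], [0.0, 0.0]]),
                       gamma_hat=[0.3, 0.2], n=100)
print(fic)  # 5.0 = variance term 4.0 + squared-bias term 1.0
```

Because ω̂_j changes with the patient, re-running this over the candidate set with each patient's own ω̂_j is what yields patient-specific model rankings.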
As mentioned in Chapter 1, FIC has been extended to several commonly used models, and in Chapter 3 we proposed QFIC with the GEE approach for longitudinal data. Building on this established framework, we first illustrate an application of the classic personalized FIC in a cross-sectional binary case study, providing a personalized diagnosis of tumor penetration of the prostatic capsule for prostate cancer patients in Section 4.2. In Section 4.3, the personalized QFIC is applied to a longitudinal case study to make personalized prognoses of patients' treatment responses in relapsing-remitting multiple sclerosis. Survival data are very common in oncology studies; an application of the corresponding personalized FIC for survival data is discussed in Section 4.4, where we aim to make a personalized prediction of the survival rate of veterans with advanced lung cancer. We conclude in the final section.
4.2 Prostate Cancer Case Study
Prostate cancer is one of the most common cancers in American men. As it advances, cancer cells may spread from the prostate to the capsule. Knowing the cancer stage can help the doctor make a diagnosis and select a corresponding therapy. The first case study we discuss in this chapter is a prostate cancer trial with possible capsular involvement, introduced in Hosmer and Lemeshow (1989).
In this trial, 151 of 376 patients had prostate cancer that penetrated the prostatic capsule. The binary response, penetrat, indicates tumor penetration (0 - absence and 1 - presence). The potential explanatory factors include: dre, result of the digital rectal exam (1 - no nodule, 2 - unilobar left nodule, 3 - unilobar right nodule, and 4 - bilobar nodule); caps, detection of capsular involvement in the rectal exam (1 - absence and 2 - presence); psa, prostate-specific antigen value (in mg/ml); volume, tumor volume obtained from ultrasound (in cm3); gscore, total Gleason score (0-10); race (1 - white and 2 - black); and age.
In this section, we aim to select a personalized predictive model for a targeted prostate cancer patient based on the personalized FIC. By doing so, we can better predict the targeted patient's tumor penetration rate and therefore provide a personalized diagnosis of cancer progression.
4.2.1 Model Selection Implementation
We first prefit the data with the classic logistic regression model with all the potential
explanatory covariates listed above. The full model can be written as:
logit(µ) = β0 + β1 gscore + β2 dre + β3 psa + β4 race + β5 caps + β6 volume + β7 age,
where µ is the conditional expectation of penetrat. In order of significance, the
corresponding statistical inference is listed in Table 4.1 in terms of the coefficients'
estimates, standard errors, and p-values.
Based on Table 4.1, we identify four highly significant covariates: int., gscore,
dre, and psa, which are the certain covariates. The predictive model, therefore, is
selected by the personalized FIC and also the traditional AIC (for comparison) from
Table 4.1: Prostate Cancer - Statistical Inference under Full Model
Covariate Estimate Std.err Z-value P-value
int. -6.1e+00 1.9e+00 -3.2 1.6e-03
gscore 9.7e-01 1.7e-01 5.8 5.8e-09
dre(2) 7.3e-01 3.6e-01 2.1 4.0e-02
dre(3) 1.5e+00 3.8e-01 4.0 5.9e-05
dre(4) 1.4e+00 4.6e-01 3.0 2.5e-03
psa 2.9e-02 1.0e-02 3.0 3.5e-03
race -6.8e-01 4.7e-01 -1.4 1.5e-01
caps 5.3e-01 4.6e-01 1.1 2.6e-01
volume -2.6e-03 2.6e-03 -1.0 3.2e-01
age -1.3e-02 2.0e-02 -0.7 5.0e-01
the remaining 2^4 = 16 candidate models listed in Table 4.2. Here and below, "×"
indicates the presence of the specific covariate in the specific candidate model and a
blank means its absence.
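This candidate set is simply every subset of the four uncertain covariates appended to the certain covariates. A minimal sketch of the enumeration (variable names are illustrative, not from the thesis code; the within-size ordering need not match Table 4.2 exactly):

```python
from itertools import combinations

# Certain covariates stay in every candidate model; uncertain ones may be dropped.
certain = ["int.", "gscore", "dre", "psa"]
uncertain = ["race", "caps", "volume", "age"]

# Enumerate all 2^4 = 16 subsets of the uncertain covariates, largest first,
# so the first candidate is the full model and the last is the narrow model.
candidates = []
for k in range(len(uncertain), -1, -1):
    for subset in combinations(uncertain, k):
        candidates.append(certain + list(subset))

print(len(candidates))  # 16
print(candidates[0])    # full model: certain + all uncertain covariates
print(candidates[-1])   # narrow model: certain covariates only
```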
Based on AIC, m8 is selected as the single overall predictive model for all the
patients in the study and is circled in Table 4.2. Other than the four certain covariates,
m8 also contains one uncertain covariate race.
Nevertheless, for implementation of the personalized FIC, we consider each pa-
tient’s capsular penetration prediction as an individual targeted parameter. Therefore,
376 personalized predictive models are selected individually for the corresponding 376
patients by the personalized FIC from the 16 candidate models. Figure 4.1 provides
the frequencies of the 16 candidate models selected as the final personalized predictive
models for the 376 patients. From this histogram, we observe that instead of the single
predictive model m8, the personalized predictive models are mainly distributed among the
Table 4.2: Prostate Cancer - Candidate Models
race caps volume age race caps volume age
m1 × × × × m9 × × ×
m2 × × m10 × ×
m3 × × × m11 × ×
m4 × × m12 ×
m5 × × × m13 × ×
m6 × × m14 ×
m7 × × m15 ×
m8 × m16
NOTE: × indicates presence of the covariate in the candidate model and a blank means its absence.
candidate models: m6, m7, m8, m11, m12, m14, and m16. In particular, more than 50
patients choose m8, m12, and m16 as their predictive models.
4.2.2 Cross-Validation and Simulation Examination
In order to examine the predictive power of the personalized predictive models and the
single final model m8, we run a leave-one-out cross-validation experiment. The cor-
responding prediction error rates for the personalized predictive models and the single
predictive model are 0.345 and 0.351. The smaller prediction error rate of the person-
alized predictive models indicates the superiority of the personalized FIC compared to
the traditional AIC.
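The leave-one-out error rates above come from a loop of this shape. The `select_and_fit` and `classify` callables are hypothetical placeholders standing in for the FIC/AIC model-selection-plus-fit step; the toy model below is a simple majority-class rule included only to make the skeleton runnable:

```python
def loo_error_rate(X, y, select_and_fit, classify):
    """Leave-one-out CV: refit (including model selection) on n-1 patients,
    predict the held-out patient, and average the 0/1 prediction errors."""
    n = len(y)
    errors = 0
    for i in range(n):
        X_train = X[:i] + X[i+1:]
        y_train = y[:i] + y[i+1:]
        model = select_and_fit(X_train, y_train)
        errors += int(classify(model, X[i]) != y[i])
    return errors / n

# Toy stand-ins: "model selection" just learns the majority class.
def majority_fit(X_train, y_train):
    return int(sum(y_train) * 2 >= len(y_train))

def majority_classify(model, x):
    return model

y = [1, 1, 1, 0]
X = [[0]] * 4
print(loo_error_rate(X, y, majority_fit, majority_classify))  # 0.25
```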
We also conduct a simulation study to compare the models’ performance at a rela-
tively small sample size level. In order to mimic the patients in this study, we randomly
sample with replacement from the 376 patients and generate 100 pseudo-patients with
observations on the response and seven potential covariates. Again, we implement the
Figure 4.1: Prostate Cancer - Frequency of Candidate Models Selected by the
Personalized FIC as the Personalized Predictive Models for 376 Patients
[Figure: histogram of selection frequencies over candidate models 1-16; y-axis: frequency, 0-60.]
personalized FIC and the traditional AIC on these 100 pseudo-patients and identify 100
personalized predictive models and one overall predictive model, based on which we
make the corresponding penetration rate predictions. The corresponding mean square
errors can be obtained by comparing the predictions to the true penetration rates. We
calculate the true penetration rates through the following formula:
p = exp(τ) / (1 + exp(τ)),

where

τ = −6.1 + 0.97 gscore + 0.73 dre(2) + 1.5 dre(3) + 1.4 dre(4) + 0.029 psa − 0.68 race + 0.53 caps − 0.0026 volume − 0.013 age.
The coefficients used here are from Table 4.1. With one thousand replications, we arrive
at estimated mean square errors of 3.14 and 3.25 for the personalized predictive models
and the single predictive model. The smaller mean square error once again shows the
better behavior of the personalized FIC compared to AIC.
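The penetration-rate calculation above is the inverse-logit applied to the linear predictor τ built from the Table 4.1 estimates. A minimal sketch (the patient's covariate values are illustrative, not a subject from the trial):

```python
import math

def penetration_rate(gscore, dre, psa, race, caps, volume, age):
    """Inverse-logit of the linear predictor tau from the full-model
    coefficient estimates in Table 4.1; dre is coded 1-4 with level 1
    as the reference category."""
    tau = (-6.1 + 0.97 * gscore
           + {1: 0.0, 2: 0.73, 3: 1.5, 4: 1.4}[dre]
           + 0.029 * psa - 0.68 * race + 0.53 * caps
           - 0.0026 * volume - 0.013 * age)
    return math.exp(tau) / (1.0 + math.exp(tau))

# Illustrative patient: Gleason score 6, unilobar right nodule, psa 14,
# white (race = 1), no capsular involvement on exam (caps = 1),
# tumor volume 10 cm^3, age 65.
p = penetration_rate(gscore=6, dre=3, psa=14, race=1, caps=1, volume=10, age=65)
print(round(p, 3))
```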
4.2.3 Group-Specific Analysis
In order to illustrate the personalized FIC’s consideration of the patients’ heterogeneity,
we also perform the group-specific analysis based on four uncertain covariates.
Figure 4.2 presents the histograms of the observations on the two continuous uncer-
tain covariates, volume and age. In particular, about 50% of the patients have a tumor
volume of at most 2 cm³ as measured by ultrasound. Based on these two histograms and the
outcomes of the two binary uncertain covariates, race and caps, we categorize all 376
patients into two groups under four different partition criteria, as listed in Table 4.3.
Table 4.3: Prostate Cancer - Group Partition Criteria
Criterion Group A Group B
race white black
caps presence absence
volume > 2 cm³ ≤ 2 cm³
age (60,75) [40,60] or [75,80]
In this subsection, we particularly target the patients whose personalized predictive
models are different from the single predictive model m8 in terms of each uncertain
covariate. As we reported in Subsection 4.2.1, m8 includes one extra uncertain
covariate, race, in addition to the certain covariates. These targeted patients'
personalized predictive models, therefore, either (i) exclude race, which is included
in m8 (race[ ]); (ii) include caps, which is excluded from m8 (caps[×]); (iii) include
volume, which is excluded from m8 (volume[×]); or (iv) include age, which is excluded
from m8 (age[×]).
For each partition criterion, the percentage (pct.) of the targeted patients is calcu-
lated based on the number of patients in each group (size) as reported in Table 4.4.
The percentage measures the difference shown in each group between the personalized
predictive models and the single predictive model in terms of each specific uncertain
covariate.

Figure 4.2: Prostate Cancer - Histograms of Tumor Volume and Age
[Figure: histograms of tumor volume (0-150 cm³, frequency 0-150) and age (50-80 years, frequency 0-25).]

The corresponding prediction error rates of the personalized predictive models
(erFIC) and the single predictive model (erAIC) are calculated only based on the targeted
patients and also shown in Table 4.4.
We highlight the relatively higher percentages, which are greater than 50%. Par-
ticularly in the bottom row of the table, a total of 56% of the patients exclude race in
their predictive models, regardless of the group partition criteria. The smaller predic-
tion error rates of the personalized predictive model compared to the single predictive
model for almost every category show the advantage of tailoring the predictive model
individually based on the patient’s personal information.
Based on each group partition criterion, we also compare the percentages of the
targeted patients within the corresponding two groups. Generally speaking, various
percentages in Table 4.4 do show the differences in each group-specific comparison.
This is especially true of the pairs with
quite different percentages. For the race-based partition, 61% of black patients include
caps in their personalized predictive models while only 29% of white patients do so.
Since white patients form the majority in this study (340 out of 376) yet show a low
percentage, this also reflects the population-level fit of m8 selected by the
traditional AIC. Based on the caps partition criterion, for the patients with and
without capsular involvement, 60% vs. 20% of patients exclude race and 25% vs. 65%
include volume in their final personalized predictive models. Groups A and B
partitioned based on volume also reveal quite different percentages, 72% vs. 36%,
in terms of race's presence in their personalized predictive models.
In summary, by considering the individual level information of prostate cancer pa-
tients, the personalized FIC considers patients’ heterogeneity and provides the best per-
sonalized predictive model for the targeted patient only. The smaller prediction error
rate, smaller mean square error, and the results of the group-specific analysis all show
the advantage of the personalized predictive models selected by the personalized FIC
over the single predictive model selected by the traditional AIC. Therefore, diagnosis of
Table 4.4: Prostate Cancer - Group-Specific Percentages and Prediction Error Rates of
Targeted Patients with Four Partition Criteria
Criterion  Group  Inference  race[ ]  caps[×]  volume[×]  age[×]  size
race       A      pct.       56%      29%      29%        26%     340
                  erFIC      0.337    0.342    0.317      0.302
                  erAIC      0.339    0.347    0.320      0.310
           B      pct.       61%      61%      33%        36%     36
                  erFIC      0.355    0.399    0.460      0.175
                  erAIC      0.398    0.437    0.484      0.211
caps       A      pct.       60%      36%      25%        25%     336
                  erFIC      0.338    0.355    0.347      0.291
                  erAIC      0.342    0.366    0.344      0.295
           B      pct.       20%      3%       65%        45%     40
                  erFIC      0.365    0.009    0.286      0.264
                  erAIC      0.418    0.004    0.317      0.308
volume     A      pct.       72%      48%      27%        26%     211
                  erFIC      0.349    0.368    0.381      0.290
                  erAIC      0.357    0.378    0.383      0.302
           B      pct.       36%      12%      33%        28%     165
                  erFIC      0.312    0.274    0.283      0.282
                  erAIC      0.314    0.284    0.291      0.291
age        A      pct.       58%      35%      23%        33%     267
                  erFIC      0.335    0.320    0.341      0.287
                  erAIC      0.339    0.327    0.343      0.295
           B      pct.       50%      27%      44%        11%     109
                  erFIC      0.349    0.456    0.322      0.278
                  erAIC      0.362    0.480    0.331      0.315
Total             pct.       56%      32%      29%        27%     376
NOTE: pct. indicates the percentage of the targeted patients in each group; erFIC and erAIC indicate the
prediction error rates of the personalized predictive models and the single predictive model based on the
targeted patients in each group; size indicates the number of patients in each group; race[ ] indicates
the targeted patients whose personalized predictive models exclude race; caps[×] indicates the targeted
patients whose personalized predictive models include caps; volume[×] indicates the targeted patients
whose personalized predictive models include volume; and age[×] indicates the targeted patients whose
personalized predictive models include age.
the targeted prostate cancer patients’ capsular penetration can be better made individu-
ally based on the different personalized predictive models chosen by the personalized
FIC.
4.3 Relapsing Remitting Multiple Sclerosis Case Study
Other than the cross-sectional study, the personalized predictive models can also be
used for individualized prognosis and diagnosis in longitudinal studies. As an illus-
tration, the second case study we perform is from a longitudinal clinical trial, which
aims to assess the effects of neutralizing antibodies on interferon beta-1 (IFNB) in re-
lapsing remitting multiple sclerosis (RRMS), a disease that destroys the myelin sheath
surrounding the nerves.
We particularly focus on a 15-week magnetic resonance imaging (MRI) study in-
volving 50 patients in two locations, randomized into three treatment groups: 17 in
placebo, 17 in low-dose and 16 in high-dose. At each of 17 scheduled visits, a binary
exacerbation outcome exacerb was recorded at the time of each MRI scan, according
to whether an exacerbation began since the previous scan (1 - positive and 0 - nega-
tive). The potential explanatory covariates include: edss, expanded disability status
scale; dose, treatment groups (0 - placebo, 1 - low dose, and 2 - high dose); duration,
rrms duration (in years); lot, location indicator (0 - location A and 1 - location B);
sex; and visit, the visit times (in days).
The goal of this study is to identify a prediction rule with which we can accurately
predict a targeted patient's exacerbation response to a specific treatment, even at a
targeted visit time.
4.3.1 Model Selection Implementation
We consider the following generalized additive partially linear models incorporating
the GEE approach for this study:
logit(µ) = η1(visit) + η2(duration) + β1 edss + β2 dose + β3 lot + β4 age + β5 sex,
where visit and duration are set in nonparametric components and µ is the con-
ditional expectation of exacerb. Figure 4.3 plots the empirical exacerbation rates at
different visit days and with different RRMS duration time. It confirms the nonlinear
trends of these two covariates on the log odds ratio of the response.
We therefore prefit this full model using the polynomial spline method, incorporating
the GEE approach with an exchangeable (EX) working correlation structure. Degree-two
natural splines are used to approximate the two nonparametric functions. The fitted
curves of these two nonparametric components, η1(visit) and η2(duration), are depicted
as the dashed lines in Figure 4.3. Summaries of the remaining coefficients, including
their estimates, standard errors, and corresponding p-values, are listed in Table 4.5.
Table 4.5: RRMS - Statistical Inference under Full Model
Covariate Estimate Std.err Wald P-value
edss 2.9e-01 8.8e-02 11.1 8.7e-04
int. -1.4e+00 8.2e-01 3.0 8.6e-02
dose(1) 7.5e-02 3.1e-01 0.1 8.1e-01
dose(2) -3.5e-01 3.1e-01 1.3 2.5e-01
lot 3.9e-01 3.3e-01 1.4 2.4e-01
age -1.3e-02 1.5e-02 0.8 3.8e-01
sex 1.4e-01 3.4e-01 0.2 6.7e-01
Based on Table 4.5, other than the two nonparametric components, we also include
the highly significant covariate edss in the narrow model and perform the model selec-
tion procedure among the remaining five uncertain factors. There are a total of 2^5 = 32
candidate models, as listed in Table 4.6.

Figure 4.3: RRMS - Empirical and Estimated Exacerbation Rates on Visit Days and
Duration Time
[Figure: two panels; left: exacerbation rate (0.10-0.25) vs. visit day (20-100); right: exacerbation rate (0.0-0.4) vs. RRMS duration (5-25 years); empirical rates shown with estimated (dashed) curves.]
Table 4.6: RRMS - Candidate Models
int. dose lot age sex int. dose lot age sex
m1 × × × × × m17 × × × ×
m2 × × × × m18 × × ×
m3 × × × × m19 × × ×
m4 × × × m20 × ×
m5 × × × × m21 × × ×
m6 × × × m22 × ×
m7 × × × m23 × ×
m8 × × m24 ×
m9 × × × × m25 × × ×
m10 × × × m26 × ×
m11 × × × m27 × ×
m12 × × m28 ×
m13 × × × m29 × ×
m14 × × m30 ×
m15 × × m31 ×
m16 × m32
NOTE: × indicates presence of the covariate in the candidate model and a blank means its absence.
The traditional AIC-type model selection criterion ∆AIC for longitudinal data in-
corporating the GEE approach, proposed in Chapter 2, selects m30 as the final single
predictive model, which can be written as:
logit(µ) = η1(visit) + η2(duration) + β1edss + β4age.
It is circled in Table 4.6. Regardless of the different characteristics of the different
patients at the different visit times, m30 is the overall best choice for this longitudinal
study.
On the other hand, the personalized QFIC, proposed in Chapter 3, considers the
observations’ heterogeneity among the patients and even among the same patient’s dif-
ferent visit times. By taking each time point’s exacerbation prediction as the individual
targeted parameter, the personalized QFIC chooses different personalized predictive
models for different patients at the different visit times. In this study, we have 50 pa-
tients and about 17 visit times for each patient, therefore totaling 822 observations.
Figure 4.4 provides the frequencies of the 32 candidate models chosen as the corre-
sponding 822 personalized predictive models. From the histogram in Figure 4.4, we
can observe that other than ∆AIC’s single predictive model m30, m16 also has a rela-
tively high frequency.
Figure 4.4: RRMS - Frequencies of Candidate Models Selected by the Personalized
QFIC as the Personalized Predictive Models for 822 Observations
[Figure: histogram of selection frequencies over candidate models 1-32; y-axis: frequency, 0-250.]
In particular, on certain targeted visit days, namely days 7, 31, 61 and 104, the
frequencies of the candidate models selected as the personalized predictive models for
these 50 patients are also plotted in Figure 4.5. We do observe a slight difference among
these four histograms. Due to the relatively strong correlation among each patient’s
repeated measurements, the four histograms all have relatively higher frequencies to
choose m16 and m30 for the 50 patients. This general trend in these histograms is
consistent with the trend shown in the overall histogram in Figure 4.4.
By using a cross-validation experiment, we also examine the predictive powers of
the 822 personalized predictive models and the single predictive model m30. Again,
due to the complicated correlation structure of each patient’s repeated measurements,
a leave-one-patient-out experiment is used. Based on one thousand replications, the
prediction error rates of 0.265 for the personalized predictive models and 0.272 for the
single predictive model show the superiority of tailoring predictive models individually
by the personalized QFIC.
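Because repeated measurements within a patient are correlated, the hold-out unit here is the whole patient rather than a single observation. A sketch of that grouping logic follows; the `fit` and `predict` callables are hypothetical placeholders (a majority-outcome rule is used only to make the sketch runnable):

```python
def leave_one_patient_out(records, fit, predict):
    """records: list of (patient_id, x, y) observations.
    Hold out ALL observations of one patient at a time, refit on the rest,
    and return the overall misclassification rate on the held-out visits."""
    ids = sorted({pid for pid, _, _ in records})
    errors = total = 0
    for held in ids:
        train = [r for r in records if r[0] != held]
        test = [r for r in records if r[0] == held]
        model = fit(train)
        for _, x, y in test:
            errors += int(predict(model, x) != y)
            total += 1
    return errors / total

# Toy stand-ins: predict the majority outcome seen in training.
fit = lambda train: int(sum(y for _, _, y in train) * 2 >= len(train))
predict = lambda model, x: model

records = [(1, None, 0), (1, None, 0), (2, None, 0), (2, None, 1)]
print(leave_one_patient_out(records, fit, predict))  # 0.75
```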
4.3.2 Group-Specific Analysis
Similar to the discussion in Subsection 4.2.3, in order to illustrate the personalized
QFIC’s consideration of the patients’ heterogeneity, we also carry out the group-specific
analysis based on four uncertain covariates: dose, lot, sex and age. The analysis is
performed respectively at four different target visit days, namely days 7, 31, 61 and
104. Again, we specifically focus on the patients whose personalized predictive models
are different from the single predictive model m30 in terms of the presence or absence
of each uncertain covariate.
Figure 4.5: RRMS - Frequencies of Candidate Models Selected by the Personalized
QFIC as the Personalized Predictive Models for 50 Patients at Visit Days of 7, 31,
61 and 104
[Figure: four histograms (Day 7, Day 31, Day 61, Day 104) of selection frequencies over candidate models 1-32; y-axis: frequency, 0-20.]
As reported in Subsection 4.3.1, m30 includes one extra uncertain covariate age,
other than the certain covariates. Therefore, the targeted patients in this subsection have
their personalized predictive models either (i) include dose (dose[×]) based on the
dose-group partition criterion; (ii) include lot (lot[×]) based on the lot partition
criterion; (iii) include sex (sex[×]) based on the sex criterion; or (iv) exclude age
(age[ ]) based on the age criterion. The
corresponding percentages, prediction error rates estimated through the personalized
predictive models and the single predictive model for the targeted patients at the visit
days of 7, 31, 61, and 104 are reported in Table 4.7.
The high percentages of targeted patients in Table 4.7 show the relatively large
differences of the personalized predictive models from the single predictive model in
the corresponding groups in terms of the specific uncertain covariate’s existence. In
particular, we highlight the percentages greater than 30%.
For the uncertain covariate age, at the earlier visit days of 7 and 31, only the
younger group, composed of patients who are at most 30 years old, has the higher
percentages. In other words, 50% of the patients in the younger group exclude age from
their personalized predictive models at days 7 and 31. The majority of the patients in the
remaining two groups have their personalized predictive models consistent with m30 in
terms of the existence of age. But in the later visit days of 61 and 104, more than 30%
of patients in all three groups exclude age from their personalized predictive models,
as highlighted in Table 4.7. This indicates the
larger difference between the predictive models selected by the personalized QFIC
and ∆AIC in the later visit days compared to the earlier days. It simultaneously shows
the personalized QFIC’s consideration of the heterogeneity among observations at dif-
ferent visits.
In addition, regarding the heterogeneity among the patients, consider the placebo
group under the dose-group partition criterion. At all four
visit days, more than 35% of patients in the placebo group include dose in their person-
alized predictive models compared to the other two treatment groups. It is reasonable
to infer that the patients in the placebo group tend to have no treatment effect. There-
fore, the treatment indicator dose may be significant for patients in the placebo group
to better predict their exacerbation rate.
Finally, most of the categories in Table 4.7 show smaller prediction error rates under
the personalized QFIC than under ∆AIC. This again shows the advantage of tailoring
the predictive model individually based on the individual level information.
4.3.3 Statistical Inference on Targeted Patients
Rather than focusing only on prediction accuracy for the patients in the study, we also
try to predict outcomes for future patients. In this section, we particularly consider
36-year-old patients who have had RRMS for 8.8 years with an expanded disability status
scale of 4. These values are the medians of the observations on the continuous potential
covariates age, duration, and edss in the current study.
To illustrate that the personalized predictive models are tailored by the personalized
QFIC for each targeted patient, we place these patients into twelve different scenarios
based on the three categorical potential factors of sex, lot, and dose. Table 4.8 records
the corresponding twelve personalized predictive models selected by the personalized
QFIC. The corresponding exacerbation rate predictions through the 17 visit times for
each scenario are also plotted in Figure 4.6.
In Table 4.8, the female patients in the placebo and low-dose groups at location A,
and the male patients in the low-dose group at location A and in the high-dose group
at location B, have personalized predictive models that include only age, and are
thus consistent with the single predictive model m30.
Table 4.7: RRMS - Group-Specific Percentages and Prediction Error Rates for the
Targeted Patients at the Targeted Visit Days with Four Partition Criteria

Criterion  Group      Day 7                   Day 31                  Day 61                  Day 104
                      pct.  erFIC erAIC size  pct.  erFIC erAIC size  pct.  erFIC erAIC size  pct.  erFIC erAIC size
dose[×]    placebo    35%   0.293 0.303 17    50%   0.233 0.238 16    38%   0.170 0.224 17    38%   0.170 0.224 16
           low        0%    -     -     17    0%    -     -     17    0%    -     -     17    0%    -     -     15
           high       19%   0.151 0.244 16    19%   0.088 0.117 16    7%    0.805 0.842 16    7%    0.805 0.842 14
           total      18%   0.246 0.284 50    22%   0.194 0.205 49    16%   0.261 0.312 50    16%   0.261 0.312 45
lot[×]     A          0%    -     -     10    0%    -     -     9     0%    -     -     10    0%    -     -     8
           B          3%    0.255 0.420 40    3%    0.818 0.758 40    5%    0.192 0.218 40    11%   0.530 0.515 37
           total      2%    0.255 0.420 50    2%    0.818 0.758 49    4%    0.192 0.218 50    9%    0.530 0.515 45
sex[×]     male       0%    -     -     38    5%    0.405 0.426 37    18%   0.2609 0.243 38   6%    0.552 0.628 35
           female     8%    0.292 0.472 12    17%   0.096 0.195 12    17%   0.099 0.155 12    10%   0.142 0.303 10
           total      2%    0.292 0.472 50    8%    0.250 0.311 49    18%   0.225 0.224 50    7%    0.416 0.520 45
age[ ]     <= 30      50%   0.517 0.532 8     50%   0.639 0.644 8     75%   0.917 0.887 8     50%   0.452 0.445 6
           30-40      19%   0.110 0.114 26    28%   0.132 0.148 25    35%   0.074 0.085 26    40%   0.213 0.243 25
           >= 40      25%   0.071 0.093 16    25%   0.046 0.048 16    50%   0.302 0.296 16    50%   0.282 0.295 14
           total      26%   0.142 0.152 50    31%   0.161 0.171 49    46%   0.204 0.207 50    44%   0.261 0.282 45

NOTE: At each targeted visit day, pct. indicates the percentage of the targeted patients in each group;
erFIC and erAIC indicate the prediction error rates of the personalized predictive models and the single
predictive model based on the targeted patients in each group; size indicates the number of patients in
each group; age[ ] indicates the targeted patients whose personalized predictive models exclude age;
dose[×] indicates the targeted patients whose personalized predictive models include dose; lot[×]
indicates the targeted patients whose personalized predictive models include lot; and sex[×] indicates
the targeted patients whose personalized predictive models include sex.
Table 4.8: RRMS - Personalized Predictive Models Concluded by the Personalized
QFIC for Targeted Patients under Twelve Scenarios
sex
dose lot male female
placebo A int. age sex age
B int. int. sex
low A age age
B int. lot.
high A dose age dose age sex
B age int. lot sex
The targeted patients who receive high-dose in location A include dose in their
personalized predictive model. The female patients who are in low-dose and high-dose
groups in location B include lot in their personalized predictive models. The uncer-
tain explanatory covariate age is significant for all patients in location A, regardless
of their gender and treatment. In location B, however, only high-dose males identify
age’s significance. Among these twelve scenarios, females tend to include sex in their
personalized predictive models.
From Figure 4.6 we observe that both the personalized predictive models and the
single predictive model m30 show a U-shaped exacerbation rate prediction along
with the visit time. But in the different treatment groups, namely the placebo, low-
dose, and high-dose groups, the personalized predicted exacerbation rate decreases
as the dose level changes from placebo to high, whereas it stays the same under the
single predictive model m30.
In conclusion, the personalized QFIC utilizes the individual level information and
considers the heterogeneity among RRMS patients and even among the repeated mea-
surements from the same patient at different visit times. With the personalized predic-
tive model selected by the personalized QFIC, we can therefore reach a more accurate
exacerbation rate prediction and make a better prognosis and diagnosis on treatments
for the targeted patient only.

Figure 4.6: RRMS - Exacerbation Rate Predictions for Targeted Patients under the
Single Predictive Model and the Twelve Personalized Predictive Models
[Figure: four panels (male at location A, male at location B, female at location A, female at location B) plotting exacerbation rate predictions (0.10-0.30) over visit days (20-100) under ∆AIC's m30 and the QFIC-selected models for the placebo, low-dose and high-dose groups: male/A - m13, m30, m22; male/B - m30, m30, m21; female/A - m16, m16, m30; female/B - m15, m28, m30.]
NOTE: QFICplacebo, QFIClow and QFIChigh indicate the personalized predictive models for the targeted
patients in the placebo, low-dose and high-dose groups, respectively.
4.4 Veteran’s Lung Cancer Case Study
As we mentioned in Section 4.1, cancers can be very diverse even if they are in the
same primary site and stage, as discussed in Simon (2013). Heterogeneity therefore
merits careful consideration in oncology studies. Since survival outcomes are very
common in such studies, we are motivated to apply the adjusted focused information
criterion, introduced in Hjort and Claeskens (2006), to a lung cancer survival study.
As mentioned in Harris et al. (1989) and Campling et al. (2005), “Lung cancer is
an urgent priority among veterans. Not only is the incidence higher, but the survival is
lower than in civilian populations.” The third case study in this chapter concerns lung
cancer veterans. The dataset was collected by the Veterans Administration Lung Cancer
Study Group and reported in Prentice (1973). It has been studied by Kalbfleisch and
Prentice (2002) and further discussed in Bennett (1983) and Pettitt (1984).
In this trial, 137 veterans with advanced inoperable lung cancer were randomized
to one of two chemotherapeutic agents. The primary failure time was the time to
death, and nine survival times were censored. The potential explanatory covariates
include: kscore, Karnofsky performance score of the patient's daily living activities
measured at randomization (10-30: completely hospitalized; 40-60: partial confinement;
70-90: able to care for self); diagt, time from diagnosis to randomization (in months);
age, the patient's age at the beginning of the study (in years); prior, an indicator
of whether the patient has had prior therapy (0 - no and 10 - yes); treat,
chemotherapeutic agent (1 - standard and 2 - test); and type, histological type of
tumor cells (1 - squamous, 2 - small, 3 - adenocarcinoma, and 4 - large).
For patients with cancer, especially advanced cancer, survival rates are commonly
used by a doctor as a standard and effective way of discussing individualized prog-
nosis. In this study, we focus mainly on the prediction of each lung cancer veteran’s
survival rate, in particular, on the 30th day since randomization. We can thus provide
the corresponding personalized prognosis for each lung cancer veteran.
4.4.1 Model Selection Implementation
We first prefit the data with the Cox proportional hazard linear regression model with
all the potential explanatory covariates, as listed above:
λ(t) = λ0(t) exp(β1 kscore + β2 type + β3 treat + β4 age + β5 prior + β6 diagt).
The corresponding estimates, standard errors and p-values are listed in Table 4.9.
Table 4.9: Lung Cancer - Statistical Inference under Full Model
Covariate Estimates Exp(estimates) Std.err Z-value P-value
kscore -3.3e-02 9.7e-01 5.5e-03 -6.0 2.6e-09
type(2) 8.6e-01 2.4e+00 2.8e-01 3.1 1.8e-03
type(3) 1.2e+00 3.3e+00 3.0e-01 4.0 7.1e-05
type(4) 4.0e-01 1.5e+00 2.8e-01 1.4 1.6e-01
treat 2.9e-01 1.3e+00 2.1e-01 1.4 1.6e-01
age -8.7e-03 9.9e-01 9.3e-03 -0.9 3.5e-01
prior 7.2e-03 1.0e+00 2.3e-02 0.3 7.6e-01
diagt 8.1e-05 1.0e+00 9.1e-03 0.0 9.9e-01
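For intuition, the Cox model above is fit by maximizing the partial likelihood, which compares each veteran who dies with everyone still at risk at that time. A self-contained one-covariate sketch, ignoring tied event times (the data are toy values, not the VA trial):

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood for a one-covariate Cox model (no ties).
    Each observed death contributes x_i * beta minus the log of the sum
    of exp(x_j * beta) over subjects still at risk (t_j >= t_i)."""
    ll = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        risk = [math.exp(x[j] * beta)
                for j in range(len(times)) if times[j] >= times[i]]
        ll += x[i] * beta - math.log(sum(risk))
    return ll

# Two subjects, both deaths, binary covariate; at beta = 0 every death is
# equally likely within its risk set, so the log-likelihood is -log(2).
print(cox_partial_loglik(0.0, times=[1.0, 2.0], events=[1, 1], x=[1.0, 0.0]))
```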
Under the full model, Table 4.9 indicates the highly significant covariates, kscore
and type, which are included in the narrow model as the certain covariates. We there-
fore run the model selection procedure among the remaining four uncertain covariates:
treat, age, prior, and diagt. Accordingly, there are 2^4 = 16 candidate models, as
listed in Table 4.10.
Table 4.10: Lung Cancer - Candidate Models
treat age prior diagt treat age prior diagt
m1 × × × × m9 × × ×
m2 × × × m10 × ×
m3 × × × m11 × ×
m4 × × m12 ×
m5 × × × m13 × ×
m6 × × m14 ×
m7 × × m15 ×
m8 × m16
NOTE: × indicates presence of the covariate in the candidate model and a blank means its absence.
The traditional AIC selects the narrow model m16, circled in Table 4.10, as the final
single predictive model with the overall best fit for all the lung cancer veterans.
Nevertheless, by considering the veterans' heterogeneity, the personalized FIC provides
different personalized predictive models for different targeted patients based on their
individual information. The frequencies of the 16 candidate models selected as the
personalized predictive models for all 137 lung cancer veterans are plotted in
Figure 4.7. Relatively consistent with AIC, Figure 4.7 reveals that m16 is most
frequently chosen as the personalized predictive model. Other than the narrow model,
m8 with treat and m14 with prior also have relatively high frequencies.
4.4.2 Group-Specific Analysis
To check the heterogeneity among the lung cancer veterans in the study, we perform
group-specific analysis from two different perspectives. Due to the higher frequencies
Figure 4.7: Lung Cancer - Frequencies of Candidate Models Selected by the
Personalized FIC as the Personalized Predictive Models for 137 Veterans
[Bar chart: frequency (0 to 70) of each of the 16 candidate models.]
of m8, m14, and m16 in Figure 4.7, we first compare three groups, G8, G14 and G16, composed of the lung cancer veterans whose personalized predictive models are chosen to be m8, m14 and m16, respectively.

Figure 4.8 shows the Kaplan-Meier estimated survival curves for all three groups. From the graphs, we observe that the lung cancer veterans in G8 have the smallest estimated survival rate at the 30th day, around 0.2. Veterans in G14 have a larger estimated survival rate of around 0.8 compared to G8, and G16 has the largest survival rate, around 0.9.
Crooks et al. (1991) noted that patients with higher Karnofsky scores at the time of tumor diagnosis have better survival and quality of life over the course of their illness. Therefore, histograms of the Karnofsky performance score for all three groups are plotted in Figure 4.8 as well. Consistent with the Kaplan-Meier estimates, the Karnofsky scores in G8 tend to be relatively low, while the lung cancer veterans in G16 tend to have relatively high Karnofsky scores compared to the other two groups.
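The Kaplan-Meier curves compared above come from the standard product-limit estimator. The following is a minimal Python sketch with made-up toy data, not the code used to produce Figure 4.8:

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  : observed times (event or censoring) for each patient
    events : 1 if the event (death) was observed, 0 if censored
    Returns a list of (event time, estimated survival rate) pairs.
    """
    n = len(times)
    # Sort by time; at ties, process events before censorings.
    order = sorted(range(n), key=lambda i: (times[i], 1 - events[i]))
    at_risk, surv, curve = n, 1.0, []
    for i in order:
        if events[i] == 1:
            surv *= (at_risk - 1) / at_risk  # step down at each observed event
            curve.append((times[i], surv))
        at_risk -= 1                         # subject leaves the risk set
    return curve

# Toy data: five patients, three events and two censorings.
curve = kaplan_meier([1, 2, 2, 3, 5], [1, 1, 0, 1, 0])
```

With these toy data the estimate steps to 0.8 at time 1, 0.6 at time 2 and 0.3 at time 3 (up to floating point).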
Figure 4.8: Lung Cancer - Kaplan-Meier Estimates and Karnofsky Score
Histograms for Veterans in Groups G8, G14 and G16
[Three Kaplan-Meier curves (survival rate versus time in days, with Day 30 marked), one for each of G8, G14 and G16, each paired with a histogram of Karnofsky scores (0 to 100) for that group.]
Per the previous discussion, m8 tends to be assigned to the lung cancer veterans who have a relatively low kscore and a relatively small survival rate at the 30th day. The inclusion of the uncertain covariate treat in m8 therefore indicates that the treatment assignment is an important factor in predicting the survival rate of veterans with much more advanced lung cancer. Compared to the narrow model m16, m14 includes one extra uncertain covariate, prior. It is also reasonable to infer that information about whether prior therapy was given is important for the relatively advanced lung cancer veterans. In addition, more than 50% of the lung cancer veterans select m16 as their personalized predictive model, which is consistent with the overall character of the single predictive model selected by AIC.
The other perspective of group-specific analysis we consider is the heterogeneity among patients in terms of their tumor cell types. Figure 4.9 presents the frequencies with which the candidate models are selected as the final personalized predictive models for lung cancer veterans with different tumor cell types. The histograms show that the majority of veterans with squamous and large tumor cells select the narrow model, m16, as their final personalized predictive model, whereas the final models for veterans with small and adeno tumor cells are roughly equally distributed among the candidate models m8, m14 and m16.
4.4.3 Adjusted Prediction Error
Due to the presence of censoring and of time-dependence in survival analysis, Schumacher et al. (2007) adjusted the prediction error via the Brier score (Brier, 1950) as a function of time.

For the targeted patient j, denote by (yj, sj) the observed time and the censoring status, where yj is the censoring time when sj = 1 and the event time when sj = 0. At time t, pj(t) denotes the survival status; in other words, if the patient is alive then pj(t) = 1,
Figure 4.9: Lung Cancer - Frequencies of Candidate Models Selected by the
Personalized FIC as the Personalized Predictive Models for Veterans with Different
Tumor Cell Types
[Four bar charts (Squamous, Small, Adeno and Large cell), each showing the frequency (0 to 25) of the 16 candidate models.]
otherwise, pj(t) = 0. The adjusted prediction error at time t is defined as:
\[
\widehat{\mathrm{er}}(t) = \sum_j \bigl[p_j(t) - r_j(t)\bigr]^2 \, w\bigl[t, \hat g(t)\bigr],
\]
where rj(t) is the estimated survival rate for patient j at time t. The weights w remove the large-sample censoring bias and are given by:
\[
w\bigl[t, \hat g(t)\bigr] = \frac{I\{y_j \le t,\ s_j = 0\}}{\hat g(y_j^-)} + \frac{I\{y_j > t\}}{\hat g(t)},
\]
where \(\hat g(t)\) denotes an estimate of the conditional probability of being uncensored at time t (Gerds and Schumacher, 2006; van der Laan and Robins, 2003). Here, we substitute the Kaplan-Meier estimate of the censoring survival function for g, which makes the estimated prediction error consistent (Gerds and Schumacher, 2006; Graf et al., 1999; Korn and Simon, 1991).
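The adjusted prediction error above can be transcribed almost directly. Below is a minimal Python sketch (function and variable names are ours; we divide by the number of patients to report an error rate, and g(yj) stands in for the left limit at yj):

```python
def adjusted_prediction_error(t, y, s, r_t, g):
    """Censoring-adjusted Brier score at time t.

    y   : observed times y_j
    s   : censoring status s_j (1 = censored, 0 = event, as in the text)
    r_t : estimated survival rates r_j(t), one per patient
    g   : estimate of the censoring survival function, e.g. the
          Kaplan-Meier estimate of staying uncensored beyond u
    """
    total = 0.0
    for yj, sj, rj in zip(y, s, r_t):
        if yj <= t and sj == 0:
            # Event observed by time t: true status p_j(t) = 0.
            total += (0.0 - rj) ** 2 / g(yj)
        elif yj > t:
            # Still under observation at t: true status p_j(t) = 1.
            total += (1.0 - rj) ** 2 / g(t)
        # Patients censored before t receive weight zero.
    return total / len(y)

# Toy evaluation with the censoring survival estimate taken to be 1.
err = adjusted_prediction_error(3, [1, 5, 2], [0, 0, 1],
                                [0.2, 0.9, 0.5], lambda u: 1.0)
```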
With a leave-one-out cross-validation experiment, we calculate the adjusted prediction error rate as 0.273 for the single final predictive model and 0.233 for the personalized predictive models. The better performance of the personalized predictive models once again shows the advantage of considering heterogeneity and using different explanatory factors for different targeted individuals. Therefore, the future personalized prognosis, in terms of the survival rate at day 30, can be more accurately predicted through the personalized predictive model selected by the personalized FIC.
4.5 Conclusion and Remarks
Through these three case studies, namely a cross-sectional study in prostate cancer, a longitudinal study in relapsing-remitting multiple sclerosis and a survival study in lung cancer, we illustrate the application of the personalized FIC to identifying personalized predictive models for personalized prognosis and diagnosis. We thus show the applicability of the personalized FIC in one area of personalized medicine.
Different from the traditional model selection criteria, FIC does not attempt to assess the overall fit of candidate models but instead focuses attention directly on the parameter of primary interest. Generally speaking, in the model selection procedure, including unnecessary covariates may lead to estimates with small bias but high variance, while excluding necessary covariates typically yields large bias though small variance. FIC balances the goals of small bias and small variance, aiming for a small mean square error of the estimates.
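Schematically, this is the familiar decomposition that FIC estimates term by term for the focus parameter under each candidate model S; the display below is only a shorthand for the construction in Claeskens and Hjort (2003), whose exact bias and variance estimators we do not restate here:

```latex
\operatorname{mse}\bigl(\widehat{\mu}_S\bigr)
  = \operatorname{bias}^2\bigl(\widehat{\mu}_S\bigr)
  + \operatorname{var}\bigl(\widehat{\mu}_S\bigr),
\qquad
\operatorname{FIC}_S
  = \widehat{\operatorname{bias}^2}\bigl(\widehat{\mu}_S\bigr)
  + \widehat{\operatorname{var}}\bigl(\widehat{\mu}_S\bigr).
```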
With the information from all the patients in a study, the traditional procedure makes statistical inference and prediction for any targeted patient with the "overall fitting" model selected by a traditional model selection criterion. Due to the patients' heterogeneity, the model with the best overall properties may not be the best for the targeted patient. By using the individual level information from the targeted patient, the personalized FIC focuses on individual prediction and aims to find his or her own best model, in the sense of the minimum mean square error estimate of his or her own prediction. Leave-one-out cross-validation experiments and group-specific analyses were performed for all three case studies. The smaller prediction error rates attained by using the different personalized predictive models, compared to the single "overall best" model, show the superiority of our perspective.
In this chapter, we apply FIC's individualized perspective only to personalized prognosis and diagnosis. More research on applications of the personalized FIC needs to be invested in the field of personalized medicine, such as personalized therapy selection and monitoring.
5 Discussion and Future Work
Quasi-likelihood based model selection and model averaging procedures incorporating the GEE approach, proposed in this thesis as ∆AIC, QFIC and QFMA, were originally designed for analyzing regular (fully observed) longitudinal and correlated data only. Nevertheless, missing data arise quite often in longitudinal studies. Following Little and Rubin (1987) and Little (1995), missing mechanisms fall mainly into three categories: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR).
For longitudinal data with missing outcomes, the GEE approach attempts to be robust by relaxing assumptions on the model, but at the price of imposing a relatively strong missing mechanism assumption, MCAR. Even if the working correlation matrix is correctly specified, failing to meet the MCAR assumption can result in severely biased estimation. The weighted generalized estimating equations (WGEE) approach was therefore proposed by Robins et al. (1994). It aims to avoid the bias by ignoring the unavailable observations and placing more weight on the remainder. One direction of future research is to extend the current model selection and model averaging procedures incorporating the GEE approach to procedures incorporating the WGEE approach, so as to deal with missing outcomes in longitudinal studies.
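The weighting idea behind WGEE can be sketched in a few lines. This illustrates only the inverse-probability weighting, with hypothetical names, not the estimating-equation machinery itself; in practice the observation probabilities are modeled, for example by logistic regression on the observed history:

```python
def ipw_weights(observed, p_obs):
    """Inverse-probability weights used by the WGEE idea.

    observed : 1 if the outcome at a visit is available, 0 if missing
    p_obs    : modeled probability that the outcome is observed
    A missing observation gets weight 0; an available one is up-weighted
    by 1 / p_obs, so that it also stands in for comparable patients
    whose outcomes are missing.
    """
    return [1.0 / p if o == 1 else 0.0 for o, p in zip(observed, p_obs)]

weights = ipw_weights([1, 0, 1], [0.5, 0.8, 0.25])  # [2.0, 0.0, 4.0]
```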
One advantage of the GEE estimates is their consistency even if the working correlation matrix is misspecified. Nevertheless, choosing a working correlation matrix close to the true correlation structure can increase the estimates' efficiency. There are, therefore, two issues involved in model selection and averaging for longitudinal data incorporating the GEE approach: variable selection and working correlation matrix selection.
In this thesis, we consider only the model selection and averaging procedures regarding the potential explanatory covariates. For longitudinal data incorporating the GEE approach, more research needs to be conducted on the selection and averaging of the working correlation matrix. We can build a similar local misspecification framework, as mentioned in Subsection 2.2.1, where the independence structure IN can be viewed as the narrow model, with the diagonal parameters of the matrix as the certain parameters. The remaining parameters in the exchangeable (EX), autoregressive (AR) or even unstructured (UN) matrices can all be treated as the uncertain parameters.
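The structures mentioned here are easy to write down explicitly. A minimal Python sketch of the three common working correlation matrices (IN, EX and AR(1)); the unstructured (UN) case has no closed form beyond its free off-diagonal entries:

```python
def working_correlation(m, structure, alpha=0.0):
    """Working correlation matrix of size m x m for one cluster.

    structure : "IN" (independence, the narrow model in the framework
                sketched above), "EX" (exchangeable) or "AR" (AR(1))
    alpha     : the correlation parameter for EX and AR
    """
    if structure == "IN":
        return [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    if structure == "EX":
        return [[1.0 if i == j else alpha for j in range(m)] for i in range(m)]
    if structure == "AR":
        return [[alpha ** abs(i - j) for j in range(m)] for i in range(m)]
    raise ValueError("unknown structure: " + structure)
```

For example, `working_correlation(3, "AR", 0.5)` has 0.25 in its corners, reflecting the correlation decaying with the lag between visits.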
Another direction of future research is to extend the model selection and averaging procedures to mixed effects models and to develop a strategy for choosing the optimal weights, from both methodological and practical perspectives. Because statistical inference proceeds at two stages, the population level and the individual level, as mentioned in Chapter 1, the development for mixed effects models can be very challenging. Beyond the selection of the fixed effects, the random effects and their covariance matrix structures have to be chosen as well, especially for individual level inference.
In the process of theoretical derivation, there are more scenarios that have to be considered. For longitudinal data collected over space, the number of sites or clusters may be limited in a study, so a small sample size in terms of the number of clusters is involved:
\[
n = O(1) \quad \text{and} \quad m_i \to \infty, \qquad i = 1, \dots, n.
\]
If the repeated measurements are collected over time, the number of visits per individual may instead be limited for certain studies:
\[
m_i = O(1), \quad i = 1, \dots, n, \quad \text{and} \quad n \to \infty.
\]
In addition, as mentioned in Chapter 1, there is no closed form for the estimates of the fixed and random effects in generalized linear mixed effects models, so certain approximation methods have to be used as well.
Bibliography
E.P. Acosta, H. Wu, S.M. Hammer, S. Yu, D.R. Kuritzkes, A. Walawander, J.J. Eron,
C.J. Fichtenbaum, C. Pettinelli, D. Neath, E. Ferguson, A.J. Saah, and J.G. Gerber,
“Comparison of two indinavir/ritonavir regimens in the treatment of HIV-infected
individuals.,” J Acquir Immune Defic Syndr, 37:1358–66, 2004.
H. Akaike, “Maximum Likelihood Identification of Gaussian Autoregressive Moving
Average Models,” Biometrika, 60:255–265, 1973.
Huiman X. Barnhart and John M. Williamson, “Goodness-of-Fit Tests for GEE Mod-
eling with Binary Responses,” Biometrics, 54:720–729, 1998.
Steve Bennett, “Analysis of survival data by the proportional odds model,” Statistics in
Medicine, 2(2):273–277, 1983.
Marco Bonetti and Richard D. Gelber, “A graphical method to assess treatment-covariate interactions using the Cox model on subsets of the data,” Statistics in Medicine, 19(19):2595–2609, 2000.
Marco Bonetti and Richard D. Gelber, “Patterns of treatment effects in subsets of
patients in clinical trials,” Biostatistics, 5(3):465–481, 2004.
N.E. Breslow and D. G. Clayton, “Approximate inference in generalized linear mixed
models,” Journal of the American Statistical Association, 88:9–25, 1993.
G. W. Brier, “Verification of Forecasts Expressed in Terms of Probability,” Monthly
Weather Review, 78:1, 1950.
Jason Brinkley, Anastasios Tsiatis, and Kevin J. Anstrom, “A Generalized Estimator of
the Attributable Benefit of an Optimal Treatment Regime,” Biometrics, 66(2):512–
522, 2010.
S.T. Buckland, K.P. Burnham, and N.H. Augustin, “Model Selection: An Integral Part
of Inference,” Biometrics, 53:603–618, 1997.
Kenneth P. Burnham and David R. Anderson, Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, Springer, 2nd edition, 2002.
Tianxi Cai, Lu Tian, Peggy H. Wong, and L. J. Wei, “Analysis of randomized compara-
tive clinical trial data for personalized treatment selections,” Biostatistics, 12(2):270–
282, 2011.
Barbara G. Campling, Wei-Ting Hwang, Jiameng Zhang, Stephanie Thompson,
Leslie A. Litzky, Anil Vachani, Ilene M. Rosen, and Kenneth M. Algazy, “A
population-based study of lung carcinoma in Pennsylvania,” Cancer, 104(4):833–
840, 2005.
Eva Cantoni, Joanna Mills Flemming, and Elvezio Ronchetti, “Variable selection for
marginal longitudinal generalized linear models,” Biometrics, 61(2):507–514, 2005.
G. Claeskens and R. J. Carroll, “An asymptotic theory for model selection inference in
general semiparametric problems,” Biometrika, 94:249–265, 2007.
G. Claeskens, C. Croux, and J. van Kerckhoven, “Variable selection for logistic re-
gression using a prediction-focused information criterion,” Biometrics, 62:972–979,
2006.
G. Claeskens and N.L. Hjort, “The focused information criterion,” Journal of the
American Statistical Association, 98:900–916, 2003.
Gerda. Claeskens and N. L. Hjort, Model Selection and Model Averaging, Cambridge
University Press, Cambridge, 2008.
Valerie Crooks, Susan Waller, Tom Smith, and Theodore J. Hahn, “The Use of the
Karnofsky Performance Scale in Determining Outcomes and Risk in Geriatric Out-
patients,” Journal of Gerontology, 46(4):M139–M144, 1991.
Dmitry Danilov and Jan R. Magnus, “On the Harm That Ignoring Pretesting Can
Cause,” Journal of Econometrics, 122:27–46, 2004.
A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incom-
plete data via the EM algorithm,” Journal of the Royal Statistical Society Series
B-Methodological, 39(1):1–38, 1977.
P.J. Diggle, P. Heagerty, K.Y. Liang, and S.L. Zeger, Analysis of Longitudinal Data,
Oxford University Press, Oxford, 2 edition, 2002.
D. Draper, “Assessment and propagation of model uncertainty,” Journal of the Royal
Statistical Society, Series B, 57:45–70, 1995.
E. Todd Dumas, Roy L. Hawke, and Craig R. Lee, “Warfarin Dosing and the Promise of Pharmacogenomics,” Current Clinical Pharmacology, 2(1):11–21, January 2007.
John J. Dziak and Runze Li, An Overview on Variable Selection for Longitudinal Data,
World Sciences Publisher, Singapore, 2007.
Garrett M. Fitzmaurice, Marie Davidian, Geert Verbeke, and Geert Molenberghs,
Longitudinal Data Analysis, Wiley Series in Probability and Statistics. Wiley-
Interscience [John Wiley & Sons], Hoboken, NJ, 2009.
Wenjiang J. Fu, “Penalized estimating equations,” Biometrics, 59(1):126–132, 2003.
Edward I. George, “The variable selection problem,” Journal of the American Statisti-
cal Association, 95(452):1304–1308, 2000.
Thomas A. Gerds and Martin Schumacher, “Consistent Estimation of the Expected
Brier Score in General Survival Models with Right-Censored Event Times,” Biomet-
rical Journal, 48(6):1029–1040, 2006.
Erika Graf, Claudia Schmoor, Willi Sauerbrei, and Martin Schumacher, “Assessment
and comparison of prognostic classification schemes for survival data,” Statistics in
Medicine, 18(17-18):2529–2545, 1999.
L. Gunter, J. Zhu, and S. A. Murphy, “Variable Selection for Qualitative Interactions,” Statistical Methodology, 8(1):42–55, 2011.
D. J. Hand and V. Vinciotti, “Local versus global models for classification problems:
Fitting models where it matters,” AMST, 57:124–131, 2003.
B. E. Hansen, “Challenges for econometric model selection,” Econometric Theory,
21:60–68, 2005.
Randall E. Harris, James R. Hebert, and Ernst L. Wynder, “Cancer risk in male veterans
utilizing the veterans administration medical system,” Cancer, 64(5):1160–1168,
1989.
David A Harville, “Maximum Likelihood Approaches to Variance Component Esti-
mation and to Related Problems,” Journal of the American Statistical Association,
72(358):320–338, 1977.
R. Henderson and N. Keiding, “Individual survival time prediction using statistical
models,” Journal of Medical Ethics, 31(12):703–706, December 2005.
Nils Lid Hjort and Gerda Claeskens, “Focused information criteria and model aver-
aging for the Cox hazard regression model,” Journal of the American Statistical
Association, 101:1449–1464, 2006.
N.L. Hjort and G. Claeskens, “Frequentist model average estimators,” Journal of the
American Statistical Association, 98:879–899, 2003.
Søren Højsgaard, Ulrich Halekoh, and Jun Yan, “The R Package geepack for Generalized Estimating Equations,” Journal of Statistical Software, 15(2):1–11, 2006.
D. W. Hosmer and S. Lemeshow, Applied Logistic Regression, John Wiley & Sons,
New York, 1989.
Y.X. Huang, H. Liang, and H. L. Wu, “Identifying predictors for anti-HIV treatment
response: mechanism-based differential equation models versus empirical semipara-
metric regression models,” Statistics in Medicine, 27:4722–4739, 2008.
J. D. Kalbfleisch and R. L. Prentice, The Statistical Analysis of Failure Time Data,
Wiley, New York, 2002.
Ronald Klein, Barbara E. K. Klein, Scot E. Moss, Matthew D. Davis, and David L.
DeMets, “The Wisconsin Epidemiologic Study of Diabetic Retinopathy: II. Preva-
lence and Risk of Diabetic Retinopathy When Age at Diagnosis Is Less Than 30
Years,” Arch Ophthalmol, 102(4):520–526, 1984.
Edward L. Korn and Richard Simon, “Explained Residual Variation, Explained Risk, and Goodness of Fit,” The American Statistician, 45(3):201–206, 1991.
S. Kullback and R. A. Leibler, “On information and sufficiency,” Annals of Mathemat-
ical Statistics, 22:49–86, 1951.
N. M. Laird and J. H. Ware, “Random-effects Models for Longitudinal Data,” Biomet-
rics, 38:963–974, 1982.
Hannes Leeb and Benedikt M. Potscher, “Can one estimate the conditional distribution
of post-model-selection estimators?,” The Annals of Statistics, 34:2554–2591, 2006.
Chuan-Yun Li, Xizeng Mao, and Liping Wei, “Genes and (Common) Pathways Under-
lying Drug Addiction,” PLoS Comput Biol, 4(1):e2+, 2008.
K. Y. Liang and S. L. Zeger, “Longitudinal Data Analysis Using Generalized Linear
Models,” Biometrika, 73:13–22, 1986.
Stuart Lipsitz and Garrett Fitzmaurice, “Generalized estimating equations for longitudi-
nal data analysis,” In Longitudinal data analysis, Chapman & Hall/CRC Handbooks
of Modern Statistical Methods, pages 43–78. 2009.
Roderick J. A. Little, “Modeling the drop-out mechanism in repeated-measures stud-
ies,” Journal of the American Statistical Association, 90:1112–1121, 1995.
Roderick J. A. Little and Donald B. Rubin, Statistical Analysis With Missing Data,
Wiley Series in Probability and Mathematical Statistics: Applied Probability and
Statistics. John Wiley & Sons Inc., New York, 1987.
Honghu Liu, Robert E. Weiss, Robert I. Jennrich, and Neil S. Wenger, “PRESS model
selection in repeated measures data,” Computational Statistics & Data Analysis,
30(2):169–184, 1999.
C. L. Mallows, “Some comments on Cp,” Technometrics, 15:661–675, 1973.
P. McCullagh, “Quasi-Likelihood Functions,” The Annals of Statistics, 11:59–67, 1983.
P. McCullagh and J. A. Nelder, Generalized linear models, Chapman and Hall, London
New York, 2nd edition, 1989.
A. J. Miller, Subset Selection in Regression, Chapman and Hall, London, 2 edition,
2002.
Erica E. M. Moodie, Thomas S. Richardson, and David A. Stephens, “Demystifying
Optimal Dynamic Treatment Regimes,” Biometrics, 63(2):447–455, 2007.
S. A. Murphy, “Optimal Dynamic Treatment Regimes,” Journal of the Royal Statistical
Society, Series B, 65:331–366, 2002.
Wei Pan, “Akaike’s information criterion in generalized estimating equations,” Bio-
metrics, 57(1):120–125, 2001.
Wei Pan, “Model selection in estimating equations,” Biometrics, 57(2):529–534, 2001.
A. N. Pettitt, “Proportional Odds Models for Survival Data and Estimates Using Ranks,” Journal of the Royal Statistical Society, Series C (Applied Statistics), 33(2):169–175, 1984.
Marc A Pfeffer and John A Jarcho, “The charisma of subgroups and the subgroups of
CHARISMA.,” N Engl J Med, 354(16):1744–6, 2006.
R. L. Prentice, “Exponential survivals with censoring and explanatory variables,”
Biometrika, 60(2):279–288, 1973.
Ross L. Prentice, “Correlated binary regression with covariates specific to each binary
observation,” Biometrics, 44:1033–1048, 1988.
Min Qian and Susan A. Murphy, “Performance guarantees for individualized treatment
rules,” Annals of statistics, 39(2):1180–1210, 2011.
Annie Qu, Bruce G. Lindsay, and Bing Li, “Improving generalised estimating equations
using quadratic inference functions,” Biometrika, 87(4):823–836, 2000.
J. Robins, L. Orellana, and A. Rotnitzky, “Estimation and extrapolation of optimal
treatment and testing strategies.,” Stat Med, 27(23):4678–721, 2008.
James M. Robins, “Optimal Structural Nested Models for Optimal Sequential Decisions,” In Proceedings of the Second Seattle Symposium in Biostatistics, Springer, 2004.
J.M. Robins, A. Rotnitzky, and L.P. Zhao, “Estimation of regression coefficients when
some regressors are not always observed,” Journal of the American Statistical Asso-
ciation, 89:846–866, 1994.
G. K. Robinson, “That BLUP is a Good Thing: The Estimation of Random Effects,”
Statistical Science, 6(1):15–32, 1991.
R. Schall, “Estimation in generalized linear models with random effects,” Biometrika,
78:717–727, 1991.
Martin Schumacher, Harald Binder, and Thomas Gerds, “Assessment of survival pre-
diction models based on microarray data,” Bioinformatics, 23:1768–1774, 2007.
G. Schwarz, “Estimating the dimension of a model,” The Annals of Statistics, 6:461–
464, 1978.
J. Shao, “An Asymptotic Theory for Linear Model Selection,” Statistica Sinica, 7:221–
264, 1997.
X. Shen, H-C. Huang, and J. Ye, “Adaptive model selection and assessment for expo-
nential family models,” Technometrics, 46:306–317, 2004.
R. Simon, “Roadmap for developing and validating therapeutically relevant genomic
classifiers,” J Clin Oncol, 23:7332–7341, 2005.
Richard Simon, “Clinical trials for predictive medicine,” Statistics in Medicine,
31(25):3031–3040, 2012.
R.M. Simon, Genomic Clinical Trials and Predictive Medicine, Practical Guides to
Biostatistics and Epidemiology. Cambridge University Press, 2013.
X. Song and M.S. Pepe, “Evaluating markers for selecting a patient’s treatment.,”
Biometrics, 60(4):874–83, 2004.
Robert Stiratelli, Nan Laird, and James H. Ware, “Random-Effects Models for Serial Observations with Binary Response,” Biometrics, 40(4):961–971, 1984.
Florin Vaida and Suzette Blanchard, “Conditional Akaike information for mixed-effects
models,” Biometrika, 92:351–370, 2005.
Mark J. van der Laan and James M. Robins, Unified methods for censored longitudinal
data and causality, Springer, 2003.
Lan Wang and Annie Qu, “Consistent model selection and data-driven smooth tests
for longitudinal data in the estimating equations approach,” Journal of the Royal
Statistical Society, Series B, 71(1):177–190, 2009.
R. Wang, S. W. Lagakos, J. H. Ware, D. J. Hunter, and Jm Drazen, “Statistics in
medicine–reporting of subgroup analyses in clinical trials,” New England Journal of
Medicine, 357(21):2189–94+, 2007.
R. W. M. Wedderburn, “Quasi-likelihood functions, generalized linear models, and the
Gauss-Newton method,” Biometrika, 61:439–447, 1974.
Halbert White, “A heteroskedasticity-consistent covariance matrix estimator and a di-
rect test for heteroskedasticity,” Econometrica, 48:817–838, 1980.
H. Wu, Y. Huang, E.P. Acosta, S.L. Rosenkranz, D.R. Kuritzkes, J.J. Eron, A.S. Perel-
son, and J.G. Gerber, “Modeling long-term HIV dynamics and antiretroviral re-
sponse: effects of drug potency, pharmacokinetics, adherence, and drug resistance.,”
J Acquir Immune Defic Syndr, 39:272–83, 2005.
Jun Yan, “Enjoy the Joy of Copulas: With a Package copula,” Journal of Statistical
Software, 21(4):1–21, 2007.
Y. H. Yang, “Can the Strengths of AIC and BIC be Shared?- A Conflict Between Model
Identification and Regression Estimation,” Biometrika, 92:937–950, 2005.
S.L. Zeger and M.R. Karim, “Generalized linear models with random effects: a Gibbs
sampling approach,” Journal of the American Statistical Association, 86:79–86,
1991.
Baqun Zhang, Anastasios A. Tsiatis, Eric B. Laber, and Marie Davidian, “A Robust
Method for Estimating Optimal Treatment Regimes,” Biometrics, 68(4):1010–1018,
2012.
X. Y. Zhang and H. Liang, “Focused information criterion and model averaging for
generalized additive partial linear models,” The Annals of Statistics, 39:174–200,
2011.
Lue Ping Zhao and Ross L. Prentice, “Correlated binary regression using a quadratic
exponential model,” Biometrika, 77:642–648, 1990.
Appendix
Let \(f(y; \theta, \gamma)\) be the density function. Denote the corresponding score function, evaluated at \((\theta_0, 0)\), by:
\[
T = \begin{bmatrix} T_1 \\ T_2 \end{bmatrix}
  = \begin{bmatrix} \partial \log f(y;\theta,\gamma)/\partial\theta \\ \partial \log f(y;\theta,\gamma)/\partial\gamma \end{bmatrix}_{\theta=\theta_0,\,\gamma=0}.
\]
Denote the corresponding quasi-score function, evaluated at \((\theta_0, 0)\), by:
\[
U = \begin{bmatrix} U_1 \\ U_2 \end{bmatrix}
  = \begin{bmatrix} \partial Q(\theta,\gamma; y)/\partial\theta \\ \partial Q(\theta,\gamma; y)/\partial\gamma \end{bmatrix}_{\theta=\theta_0,\,\gamma=0}.
\]
The corresponding second derivatives are denoted by:
\[
H = \begin{bmatrix}
  \partial^2 Q/\partial\theta\partial\theta^\top & \partial^2 Q/\partial\theta\partial\gamma^\top \\
  \partial^2 Q/\partial\gamma\partial\theta^\top & \partial^2 Q/\partial\gamma\partial\gamma^\top
\end{bmatrix}_{\theta=\theta_0,\,\gamma=0}.
\]
To study the large sample properties of the proposed model selection criterion ∆AIC,
we need some regularity conditions.
A.1 Regularity Assumptions
(C.1): The log density function \(\log f(y;\theta,\gamma)\) has three continuous partial derivatives with respect to \((\theta,\gamma)\) in a neighborhood of \((\theta_0, 0)\), which are dominated by functions with finite means under \(f_N(y) = f(y;\theta_0, 0)\). The true density \(f_0(y) = f(y;\theta_0, \delta n^{-1/2})\) can be represented in terms of \(f_N(y)\) as:
\[
f_0(y) = f_N(y)\bigl\{1 + T_2^\top(y)\,\delta n^{-1/2} + r(y, \delta n^{-1/2})\bigr\},
\]
where \(r(y, t)\) is small enough that \(f_N(y)\,r(y, t)\) is of order \(o(\|t\|^2)\) uniformly in \(y\).
(C.2): The log quasi-likelihood function \(Q(\theta,\gamma; y)\) has three continuous derivatives with respect to \((\theta,\gamma)\) in a neighborhood of \((\theta_0, 0)\), which are dominated by functions with finite means under \(f_N(y)\). The quasi-information matrix \(\Sigma\) (defined below) exists and is non-singular under \(f_N(y)\):
\[
\Sigma = E_N(-H) = \operatorname{var}_N(U)
       = \begin{bmatrix} \Sigma_{00} & \Sigma_{01} \\ \Sigma_{10} & \Sigma_{11} \end{bmatrix}
\quad \text{and} \quad
\Sigma^{-1} = \begin{bmatrix} \Sigma^{00} & \Sigma^{01} \\ \Sigma^{10} & \Sigma^{11} \end{bmatrix}.
\]
(C.3): The integrals \(\int U(y) f_N(y)\, r(y, t)\,dy\) and \(\int \|U(y)\|^2 f_N(y)\, r(y, t)\,dy\) are of order \(o(\|t\|^2)\).
(C.4): For some \(\xi > 0\), the integrals \(\int \|U(y)\|^{2+\xi} f_N(y)\,dy\) and \(\int \|U(y)\|^{2+\xi} f_N(y)\, r(y, t)\,dy\) are of order \(O(1)\). Also, the variables \(|U_{1k}^{2+\xi}(y)\, T_{2r}(y)|\) and \(|U_{2l}^{2+\xi}(y)\, T_{2r}(y)|\) have finite means under the null density \(f_N(y)\), for \(k \in \{1,\dots,p\}\) and \(r, l \in \{1,\dots,q\}\), with \(U_{1k} = \partial Q/\partial\theta_k\), \(U_{2l} = \partial Q/\partial\gamma_l\) and \(T_{2r} = \partial \log f/\partial\gamma_r\).
These assumptions are customarily made in the literature on quasi-likelihood functions, GEE and the local misspecification framework; see Wedderburn (1974), McCullagh (1983), Liang and Zeger (1986) and Hjort and Claeskens (2003).
A.2 Technical Lemmas
Two lemmas are introduced in this section. Lemma A.1 describes the large sample behavior of the quasi-score. Lemma A.2 establishes the relationship between the GEE estimates and the quasi-score, and hence the large sample behavior of the GEE estimates.
Lemma A.1 Under the misspecification framework and the Regularity Assumptions, we have:
\[
\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}
\xrightarrow{d} N_{p+q}\!\left(\begin{bmatrix} \Sigma_{01} \\ \Sigma_{11} \end{bmatrix}\delta,\ \Sigma\right),
\]
where
\[
R_{1,n} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} U_1(y_i)
\quad \text{and} \quad
R_{2,n} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} U_2(y_i).
\]
In particular, in candidate model \(S\):
\[
\begin{bmatrix} R_{1,n} \\ R_{2,S,n} \end{bmatrix}
\xrightarrow{d} N_{p+q_S}\!\left(\begin{bmatrix} \Sigma_{01} \\ \pi_S\Sigma_{11} \end{bmatrix}\delta,\ \Sigma_S\right).
\]
Here, \(\xrightarrow{d}\) denotes convergence in distribution under the sequence of true densities \(f_0(y)\).
Proof. We shall finish the proof in three steps. In the first two steps, we calculate the
expectation and variance of the quasi-score under f0(y). In the third step, we verify the
requirement for the Lyapounov central limit theorem and complete the proof.
Step 1. Consider \(E_0(U_1)\) first; \(E_0(U_2)\) can be handled by similar arguments. A direct calculation yields:
\[
E_0(U_1) = \int U_1(y) f_N(y)\,dy
 + \int U_1(y) f_N(y)\, T_2^\top(y)\,\delta n^{-1/2}\,dy
 + \int U_1(y) f_N(y)\, r(y, \delta n^{-1/2})\,dy. \tag{A.1}
\]
The first term in equation (A.1) equals zero by the fact that \(U(y) = D^\top V^{-1}(y - \mu)\) with \(\mu = E_N(y)\). Note that:
\[
\begin{aligned}
\int U(y) f_N(y)\, T^\top(y)\,dy
&= \int U(y) f_N(y)\,\bigl[\partial \log f_N(y)/\partial\beta\bigr]^\top dy \\
&= \int D^\top V^{-1}(y-\mu)\,\bigl[\partial f_N(y)/\partial\beta\bigr]^\top dy \\
&= D^\top V^{-1}\int y\,\bigl[\partial f_N(y)/\partial\beta^\top\bigr]\,dy
   - D^\top V^{-1}\mu \int \bigl[\partial f_N(y)/\partial\beta^\top\bigr]\,dy \\
&= D^\top V^{-1}\frac{\partial}{\partial\beta^\top}\int y\, f_N(y)\,dy
   - D^\top V^{-1}\mu\,\frac{\partial}{\partial\beta^\top}\int f_N(y)\,dy \\
&= D^\top V^{-1}\frac{\partial\mu}{\partial\beta^\top} - 0 = D^\top V^{-1} D = \Sigma,
\end{aligned}
\]
where the interchanges of differentiation and integration are justified by assumption (C.1), under which \(|T(y)|\) is dominated by a function with finite mean under \(f_N(y)\), and assumption (C.4), under which \(|U(y)T(y)|\) has finite mean under \(f_N(y)\). Thus, the second term in equation (A.1) is \(\Sigma_{01}\delta n^{-1/2}\). Also, by assumption (C.3), the third term in equation (A.1) is of order \(o(1/n)\).

By similar arguments, \(E_0(U_2) = \Sigma_{11}\delta/\sqrt{n} + o(1/n)\). As a result, the expectation of the quasi-score under \(f_0(y)\) becomes:
\[
E_0\begin{bmatrix} U_1 \\ U_2 \end{bmatrix}
 = \begin{bmatrix} \Sigma_{01} \\ \Sigma_{11} \end{bmatrix}\frac{\delta}{\sqrt{n}} + o(1/n).
\]
Step 2. As in the calculation of the expectation, we first consider \(\operatorname{var}_0(U_1)\); the remaining terms can be handled by similar arguments. Note that:
\[
E_0(U_1U_1^\top) = \int U_1(y)U_1^\top(y) f_N(y)\,dy
 + \int U_1(y)U_1^\top(y) f_N(y)\, T_2^\top(y)\,\delta n^{-1/2}\,dy
 + \int U_1(y)U_1^\top(y) f_N(y)\, r(y,\delta n^{-1/2})\,dy. \tag{A.2}
\]
The first term is \(E_N(U_1U_1^\top)\). By assumption (C.4), we know:
\[
\int U_{1k}^2(y)\, T_{2r}(y)\, f_N(y)\,dy
 \le \int \bigl|U_{1k}^2(y)\, T_{2r}(y)\bigr| f_N(y)\,dy = O(1)
\]
and, for \(k_1, k_2 \in \{1,\dots,p\}\),
\[
\left|\int U_{1k_1}(y)U_{1k_2}(y)\, T_{2r}(y)\, f_N(y)\,dy\right|
 \le \frac{1}{2}\left[\int \bigl|U_{1k_1}^2(y)T_{2r}(y) f_N(y)\bigr|\,dy
 + \int \bigl|U_{1k_2}^2(y)T_{2r}(y) f_N(y)\bigr|\,dy\right] = O(1).
\]
Therefore, \(\int U_1(y)U_1^\top(y) f_N(y)\, T_2^\top(y)\,dy\) is of order \(O(1)\), and it follows that the second term in equation (A.2) is of order \(O(1/\sqrt{n})\). By assumption (C.3), the third term in (A.2) is of order \(o(1/n)\).

A direct simplification yields:
\[
\operatorname{var}_0(U_1) = \operatorname{var}_N(U_1) + O(1/\sqrt{n}) = \Sigma_{00} + O(1/\sqrt{n}).
\]
Going through similar arguments for \(\operatorname{var}_0(U_2)\), \(\operatorname{cov}_0(U_1, U_2^\top)\) and \(\operatorname{cov}_0(U_2, U_1^\top)\), the variance of the quasi-score under \(f_0(y)\) can be expressed as:
\[
\operatorname{var}_0\begin{bmatrix} U_1 \\ U_2 \end{bmatrix}
 = \begin{bmatrix} \Sigma_{00} & \Sigma_{01} \\ \Sigma_{10} & \Sigma_{11} \end{bmatrix} + O(1/\sqrt{n})
 = \Sigma + O(1/\sqrt{n}).
\]
Step 3. Because the \(y_i\) are independent, the corresponding quasi-scores, denoted by \(U_{F,i} = U(y_i)\), are independent too. By assumption (C.4), for some \(\xi > 0\):
\[
E_0\bigl(\|U(y)\|^{2+\xi}\bigr)
 = \int \|U(y)\|^{2+\xi} f_N(y)\,dy
 + \int \|U(y)\|^{2+\xi} f_N(y)\, T_2^\top(y)\,\delta n^{-1/2}\,dy
 + \int \|U(y)\|^{2+\xi} f_N(y)\, r(y,\delta n^{-1/2})\,dy = O(1).
\]
Therefore \(\|U_{F,i}\|^{2+\xi}\) has finite mean under the true density \(f_0(y)\), and so does \(\|U_{F,i} - E_0(U_{F,i})\|^{2+\xi}\). Denote the true distribution of \(y_i\) by \(F_{0,i}(y)\). It follows that:
\[
\lim_{n\to\infty} n^{-(1+\xi/2)}\sum_{i=1}^{n}\int \|U - E_0(U_{F,i})\|^{2+\xi}\,dF_{0,i}(y) = 0,
\]
so the Lyapounov condition holds. Applying the Lyapounov central limit theorem to the quasi-scores \(U_{F,i}\) gives:
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl\{U_{F,i} - E_0(U_{F,i})\bigr\}
 \xrightarrow{d} N_{p+q}(0, \Sigma).
\]
Therefore,
\[
\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}
 \xrightarrow{d} N_{p+q}\!\left(\begin{bmatrix} \Sigma_{01} \\ \Sigma_{11} \end{bmatrix}\delta,\ \Sigma\right).
\]
Q.E.D.
Lemma A.2 Under the misspecification framework and the Regularity Assumptions, the GEE estimates satisfy the following distributional equivalence:
\[
\sqrt{n}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma \end{bmatrix}
 = \Sigma^{-1}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1).
\]
In particular, under candidate model \(S\):
\[
\sqrt{n}\begin{bmatrix} \hat\theta_S - \theta_0 \\ \hat\gamma_S \end{bmatrix}
 = \Sigma_S^{-1}\begin{bmatrix} R_{1,n} \\ \pi_S R_{2,n} \end{bmatrix} + o_p(1).
\]
Proof. Consider a Taylor series expansion of the quasi-score around \((\theta_0, 0)\):
\[
\begin{aligned}
\begin{bmatrix} R_{1,n}(\hat\theta,\hat\gamma) \\ R_{2,n}(\hat\theta,\hat\gamma) \end{bmatrix}
&= \begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}
 + \begin{bmatrix}
     \partial R_{1,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{1,n}(\theta,\gamma)/\partial\gamma^\top \\
     \partial R_{2,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{2,n}(\theta,\gamma)/\partial\gamma^\top
   \end{bmatrix}_{\theta=\theta_0,\gamma=0}
 \begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix} \\
&\quad + \frac{1}{2}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix}^\top
 \begin{bmatrix}
     \partial^2 R_{1,n}(\theta,\gamma)/\partial\theta^\top\partial\theta & \partial^2 R_{1,n}(\theta,\gamma)/\partial\gamma^\top\partial\theta \\
     \partial^2 R_{2,n}(\theta,\gamma)/\partial\theta^\top\partial\gamma & \partial^2 R_{2,n}(\theta,\gamma)/\partial\gamma^\top\partial\gamma
   \end{bmatrix}_{\theta=\theta^*,\gamma=\gamma^*}
 \begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix},
\end{aligned} \tag{A.3}
\]
with $\theta^*$ between $\theta_0$ and $\hat\theta$, and $\gamma^*$ between $0$ and $\hat\gamma$. Recalling the consistency of the GEE estimates, $\theta^* = \theta_0 + o_p(1)$ and $\gamma^* = o_p(1)$. Moreover, assumption (C.1) implies that the matrix of second derivatives in the third term of equation (A.3) is bounded, so the third term is of order $o_p(1)$. Equation (A.3) then becomes
\[
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
= \begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}
+ \begin{bmatrix} \partial R_{1,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{1,n}(\theta,\gamma)/\partial\gamma^\top \\
\partial R_{2,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{2,n}(\theta,\gamma)/\partial\gamma^\top \end{bmatrix}_{\theta=\theta_0,\gamma=0}
\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix} + o_p(1).
\]
Therefore,
\[
\sqrt{n}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix}
= -\sqrt{n}\begin{bmatrix} \partial R_{1,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{1,n}(\theta,\gamma)/\partial\gamma^\top \\
\partial R_{2,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{2,n}(\theta,\gamma)/\partial\gamma^\top \end{bmatrix}^{-1}_{\theta=\theta_0,\gamma=0}
\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1).
\]
Again, assumption (C.1) and the law of large numbers yield
\[
\frac{1}{\sqrt{n}}\begin{bmatrix} \partial R_{1,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{1,n}(\theta,\gamma)/\partial\gamma^\top \\
\partial R_{2,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{2,n}(\theta,\gamma)/\partial\gamma^\top \end{bmatrix}_{\theta=\theta_0,\gamma=0}
= -\Sigma + o_p(1)
\]
and
\[
\sqrt{n}\begin{bmatrix} \partial R_{1,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{1,n}(\theta,\gamma)/\partial\gamma^\top \\
\partial R_{2,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{2,n}(\theta,\gamma)/\partial\gamma^\top \end{bmatrix}^{-1}_{\theta=\theta_0,\gamma=0}
= -\Sigma^{-1} + o_p(1).
\]
Therefore,
\[
\sqrt{n}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix}
= -\bigl\{-\Sigma^{-1} + o_p(1)\bigr\}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1)
= \Sigma^{-1}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1).
\]
This completes the proof. Q.E.D.
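Lemma A.2 is the familiar one-step linearization of an M-estimator. As a hedged illustration (a toy one-parameter logistic model with simulated `x`, `y`, and `beta0`, not the GEE setting of the lemma), solving the estimating equation exactly and taking a single Newton step from the null value give nearly identical answers:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit

rng = np.random.default_rng(1)
n, beta0 = 5000, 0.7
x = rng.normal(size=n)
y = rng.binomial(1, expit(beta0 * x))

def score(b):
    # Estimating equation of a one-parameter logistic model
    return np.sum(x * (y - expit(b * x)))

# Exact root of the estimating equation (score is decreasing in b)
beta_hat = brentq(score, -5.0, 5.0)

# Linearized (one-step) version: beta0 + J^{-1} * score(beta0),
# mirroring sqrt(n)(theta_hat - theta_0) = Sigma^{-1} R + o_p(1)
p0 = expit(beta0 * x)
J = np.sum(x ** 2 * p0 * (1 - p0))   # minus the score derivative at beta0
beta_onestep = beta0 + score(beta0) / J

print(abs(beta_hat - beta_onestep))  # second-order small
```

The gap between the exact root and the one-step estimate is of smaller order than the sampling error itself, which is the content of the $o_p(1)$ remainder in the lemma.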
A.3 Proof of Theorem 2.1
Based on Lemma A.2, the estimate of the uncertain parameter under the full model becomes
\[
\sqrt{n}\hat\gamma = \Sigma^{10} R_{1,n} + \Sigma^{11} R_{2,n} + o_p(1)
= \Sigma^{11}\bigl(R_{2,n} - \Sigma_{10}\Sigma_{00}^{-1} R_{1,n}\bigr) + o_p(1).
\]
The estimates of the uncertain parameters under candidate model $S$ can be written as
\[
\sqrt{n}\hat\gamma_S = \Sigma_S^{11}\bigl(\pi_S R_{2,n} - \Sigma_{10,S}\Sigma_{00}^{-1} R_{1,n}\bigr) + o_p(1)
= \Sigma_S^{11}\pi_S\bigl(R_{2,n} - \Sigma_{10}\Sigma_{00}^{-1} R_{1,n}\bigr) + o_p(1).
\tag{A.4}
\]
A direct calculation gives the following relationship between $\hat\gamma_S$ and $\hat\gamma$:
\[
\sqrt{n}\hat\gamma_S = \sqrt{n}\,\Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\hat\gamma + o_p(1).
\tag{A.5}
\]
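The passage from (A.4) to (A.5) rests on the Schur-complement identity $\pi_S(\Sigma^{11})^{-1}\pi_S^\top = (\Sigma_S^{11})^{-1}$. A small numerical sanity check of that identity (random $\Sigma$ and hypothetical dimensions, not thesis code):

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 3, 4
S = [0, 2]                                   # indices of uncertain parameters kept in model S

# Random symmetric positive-definite (p+q) x (p+q) matrix Sigma
A = rng.normal(size=(p + q, p + q))
Sigma = A @ A.T + (p + q) * np.eye(p + q)
S00, S01 = Sigma[:p, :p], Sigma[:p, p:]
S10, S11 = Sigma[p:, :p], Sigma[p:, p:]

pi_S = np.zeros((len(S), q))
pi_S[np.arange(len(S)), S] = 1.0             # projection matrix pi_S

# Sigma^{11}: inverse of the Schur complement of Sigma00 in Sigma
Sup11 = np.linalg.inv(S11 - S10 @ np.linalg.inv(S00) @ S01)
# Same construction for the submodel matrix Sigma_S
Sup11_S = np.linalg.inv(pi_S @ S11 @ pi_S.T
                        - pi_S @ S10 @ np.linalg.inv(S00) @ S01 @ pi_S.T)

# Key identity behind (A.5): pi_S (Sigma^{11})^{-1} pi_S^T = (Sigma_S^{11})^{-1}
lhs = pi_S @ np.linalg.inv(Sup11) @ pi_S.T
rhs = np.linalg.inv(Sup11_S)
print(np.allclose(lhs, rhs))
```

The same identity is what later makes the quadratic form in $\Delta\mathrm{AIC}_{n,S}$ idempotent, so the chi-squared degrees of freedom come out right.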
The large-sample behavior of the GEE estimates can also be derived from Lemma A.1 and Lemma A.2:
\[
\sqrt{n}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma \end{bmatrix}
\xrightarrow{d} N_{p+q}\Bigl(\Sigma^{-1}\begin{bmatrix} \Sigma_{01} \\ \Sigma_{11} \end{bmatrix}\delta,\ \Sigma^{-1}\Bigr).
\tag{A.6}
\]
We are now ready to prove the main theorem. To derive the specific form of $\Delta\mathrm{AIC}$, consider a Taylor series expansion of the log quasi-likelihood around $(\theta_0, 0)$:
\[
\begin{aligned}
Q(\hat\theta, \hat\gamma; D) &= Q(\theta_0, 0; D)
+ \sqrt{n}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}^\top
\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix} \\
&\quad + \frac{\sqrt{n}}{2}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix}^\top
\begin{bmatrix} \partial R_{1,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{1,n}(\theta,\gamma)/\partial\gamma^\top \\
\partial R_{2,n}(\theta,\gamma)/\partial\theta^\top & \partial R_{2,n}(\theta,\gamma)/\partial\gamma^\top \end{bmatrix}_{\theta=\theta^*,\gamma=\gamma^*}
\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix},
\end{aligned}
\]
where $\theta^*$ is between $\theta_0$ and $\hat\theta$, and $\gamma^*$ is between $0$ and $\hat\gamma$. It follows that
\[
\begin{aligned}
Q(\hat\theta, \hat\gamma; D) - Q(\theta_0, 0; D)
&= \begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}^\top
\sqrt{n}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix}
+ \frac{\sqrt{n}}{2}\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix}^\top
\bigl\{-\sqrt{n}\,\Sigma + o_p(\sqrt{n})\bigr\}
\begin{bmatrix} \hat\theta - \theta_0 \\ \hat\gamma - 0 \end{bmatrix} \\
&= \begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}^\top
\biggl\{\Sigma^{-1}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1)\biggr\}
- \frac{1}{2}\biggl\{\Sigma^{-1}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1)\biggr\}^\top
\bigl\{\Sigma + o_p(1)\bigr\}
\biggl\{\Sigma^{-1}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1)\biggr\} \\
&= \frac{1}{2}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix}^\top
\Sigma^{-1}\begin{bmatrix} R_{1,n} \\ R_{2,n} \end{bmatrix} + o_p(1).
\end{aligned}
\]
The second equality follows from Lemma A.1 and Lemma A.2. In particular,
\[
Q(\hat\theta, \hat\gamma_S; D) - Q(\theta_0, 0; D)
= \frac{1}{2}\begin{bmatrix} R_{1,n} \\ \pi_S R_{2,n} \end{bmatrix}^\top
\Sigma_S^{-1}\begin{bmatrix} R_{1,n} \\ \pi_S R_{2,n} \end{bmatrix} + o_p(1).
\tag{A.7}
\]
For the narrow model, this becomes
\[
Q(\hat\theta_N, 0; D) - Q(\theta_0, 0; D)
= \frac{1}{2} R_{1,n}^\top \Sigma_{00}^{-1} R_{1,n} + o_p(1).
\tag{A.8}
\]
Recall the definition of $\Delta\mathrm{AIC}_{n,S}$:
\[
\Delta\mathrm{AIC}_{n,S}
= -2\sum_{i=1}^n Q(\hat\theta, \hat\gamma_S; y_i) + 2\sum_{i=1}^n Q(\hat\theta_N, 0; y_i) + 2|S/N|
= -2\bigl[Q(\hat\theta, \hat\gamma_S; D) - Q(\hat\theta_N, 0; D)\bigr] + 2|S/N|.
\]
Equations (A.7) and (A.8) then give
\[
\begin{aligned}
\Delta\mathrm{AIC}_{n,S}
&= -2\bigl[Q(\hat\theta, \hat\gamma_S; D) - Q(\theta_0, 0; D)\bigr]
+ 2\bigl[Q(\hat\theta_N, 0; D) - Q(\theta_0, 0; D)\bigr] + 2|S/N| \\
&= -\begin{bmatrix} R_{1,n} \\ \pi_S R_{2,n} \end{bmatrix}^\top
\Sigma_S^{-1}\begin{bmatrix} R_{1,n} \\ \pi_S R_{2,n} \end{bmatrix}
+ R_{1,n}^\top \Sigma_{00}^{-1} R_{1,n} + 2|S/N| + o_p(1).
\end{aligned}
\]
Using the expressions given in equations (A.4) and (A.5), $\Delta\mathrm{AIC}_{n,S}$ can be further expressed as
\[
\begin{aligned}
\Delta\mathrm{AIC}_{n,S}
&= -\bigl(\pi_S R_{2,n} - \pi_S \Sigma_{10}\Sigma_{00}^{-1} R_{1,n}\bigr)^\top
\Sigma_S^{11}
\bigl(\pi_S R_{2,n} - \pi_S \Sigma_{10}\Sigma_{00}^{-1} R_{1,n}\bigr) + 2|S/N| + o_p(1) \\
&= -\sqrt{n}\,\hat\gamma_S^\top \bigl(\Sigma_S^{11}\bigr)^{-1} \sqrt{n}\,\hat\gamma_S + 2|S/N| + o_p(1) \\
&= -n\,\hat\gamma^\top (\Sigma^{11})^{-1}\pi_S^\top \Sigma_S^{11} \pi_S (\Sigma^{11})^{-1}\hat\gamma + 2|S/N| + o_p(1).
\end{aligned}
\]
Recalling equation (A.6), we have shown that $\sqrt{n}\hat\gamma \xrightarrow{d} N_q(\delta, \Sigma^{11})$; therefore the main component of $\Delta\mathrm{AIC}_{n,S}$ converges to a non-central chi-squared distribution and
\[
\Delta\mathrm{AIC}_{n,S} \xrightarrow{d} -\chi^2_{|S/N|}(\lambda_S) + 2|S/N|,
\]
with $\lambda_S = n\gamma_0^\top (\Sigma^{11})^{-1}\pi_S^\top \Sigma_S^{11} \pi_S (\Sigma^{11})^{-1}\gamma_0$, where $\gamma_0 = \delta/\sqrt{n}$. We thus complete the proof. Q.E.D.
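The limiting law $-\chi^2_{|S/N|}(\lambda_S) + 2|S/N|$ can be simulated directly. In the sketch below, `k` and `lam` are arbitrary illustrative values for $|S/N|$ and $\lambda_S$ (not from the thesis); since a noncentral $\chi^2_k(\lambda)$ variable has mean $k + \lambda$, the limit of $\Delta\mathrm{AIC}_{n,S}$ has mean $k - \lambda$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
k, lam, reps = 2, 1.5, 200_000       # k = |S/N|, lam = lambda_S (illustrative)

z = stats.ncx2.rvs(df=k, nc=lam, size=reps, random_state=rng)
delta_aic = -z + 2 * k               # draws from the limit law of Delta-AIC_{n,S}

# E[chi2_k(lam)] = k + lam, so the limiting mean of Delta-AIC is k - lam
print(round(delta_aic.mean(), 2), k - lam)
```

This makes the bias-penalty trade-off visible: larger noncentrality $\lambda_S$ (bigger neglected signal) pushes $\Delta\mathrm{AIC}_{n,S}$ down, favoring the richer model.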
A.4 Proof of Theorem 3.1
Combining Lemma A.1 and Lemma A.2, the GEE estimates under candidate model $S$ converge in distribution as follows:
\[
\sqrt{n}\begin{bmatrix} \hat\theta_S - \theta_0 \\ \hat\gamma_S \end{bmatrix}
\xrightarrow{d}
\begin{bmatrix}
(\Sigma_S^{00}\Sigma_{01} + \Sigma_S^{01}\pi_S\Sigma_{11})\delta + \Sigma_S^{00} M_1 + \Sigma_S^{01}\pi_S M_2 \\
(\Sigma_S^{10}\Sigma_{01} + \Sigma_S^{11}\pi_S\Sigma_{11})\delta + \Sigma_S^{10} M_1 + \Sigma_S^{11}\pi_S M_2
\end{bmatrix}
=
\begin{bmatrix}
\Sigma_{00}^{-1}\Sigma_{01}\delta + \Sigma_{00}^{-1} M_1
 - \Sigma_{00}^{-1}\Sigma_{01}\pi_S^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta \\
\Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta
\end{bmatrix}.
\]
Since $\zeta$ is a function of $(\theta, \gamma)$, $\sqrt{n}\bigl(\hat\zeta_S - \zeta_0\bigr)$ can be expanded via a Taylor expansion and the delta method:
\[
\begin{aligned}
\sqrt{n}(\hat\zeta_S - \zeta_0)
&= \sqrt{n}\bigl\{\zeta(\hat\theta_S, \hat\gamma_S) - \zeta(\theta_0, \delta/\sqrt{n})\bigr\} \\
&\xrightarrow{d}
\Bigl(\frac{\partial\zeta}{\partial\theta}\Bigr)^\top \sqrt{n}(\hat\theta_S - \theta_0)
+ \Bigl(\frac{\partial\zeta}{\partial\gamma_S}\Bigr)^\top \sqrt{n}(\hat\gamma_S - \gamma_0)
- \Bigl(\frac{\partial\zeta}{\partial\gamma}\Bigr)^\top \delta \\
&\xrightarrow{d}
\Bigl(\frac{\partial\zeta}{\partial\theta}\Bigr)^\top
\bigl\{\Sigma_{00}^{-1}\Sigma_{01}\delta + \Sigma_{00}^{-1} M_1
- \Sigma_{00}^{-1}\Sigma_{01}\pi_S^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta\bigr\}
+ \Bigl(\frac{\partial\zeta}{\partial\gamma_S}\Bigr)^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta
- \Bigl(\frac{\partial\zeta}{\partial\gamma}\Bigr)^\top \delta \\
&= \Bigl\{\Bigl(\frac{\partial\zeta}{\partial\theta}\Bigr)^\top \Sigma_{00}^{-1}\Sigma_{01}
- \Bigl(\frac{\partial\zeta}{\partial\gamma}\Bigr)^\top\Bigr\}\delta
- \Bigl\{\Bigl(\frac{\partial\zeta}{\partial\theta}\Bigr)^\top \Sigma_{00}^{-1}\Sigma_{01}\pi_S^\top
- \Bigl(\frac{\partial\zeta}{\partial\gamma_S}\Bigr)^\top\Bigr\}
\Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta
+ \Bigl(\frac{\partial\zeta}{\partial\theta}\Bigr)^\top \Sigma_{00}^{-1} M_1 \\
&= \Bigl(\frac{\partial\zeta}{\partial\theta}\Bigr)^\top \Sigma_{00}^{-1} M_1
+ \omega^\top\delta
- \omega^\top \pi_S^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta.
\end{aligned}
\]
Therefore,
\[
\sqrt{n}(\hat\zeta_S - \zeta_0) \xrightarrow{d}
\Omega_S = \Omega_0 + \omega^\top\delta - \omega^\top \pi_S^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta,
\]
where $\Omega_0 \sim N(0, \tau_0^2)$. The limiting variable $\Omega_S$ follows a normal distribution with mean $\omega^\top\delta - \omega^\top \pi_S^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\delta$ and variance $\tau_0^2 + \omega^\top \pi_S^\top \Sigma_S^{11}\pi_S\,\omega$.
Q.E.D.
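The mean and variance formulas for $\Omega_S$ can be checked by Monte Carlo. The sketch below uses arbitrary toy values for $\omega$, $\delta$, $\tau_0$ and a random $\Sigma^{11}$ (all assumed, not from the thesis), and draws $\Omega_0$ independent of $\Delta$, as in the limit experiment:

```python
import numpy as np

rng = np.random.default_rng(4)
q, reps = 3, 200_000
S = [0, 2]
pi_S = np.zeros((len(S), q))
pi_S[np.arange(len(S)), S] = 1.0

# Toy ingredients (assumed values, purely illustrative)
B = rng.normal(size=(q, q))
Sup11 = B @ B.T + q * np.eye(q)                                # Sigma^{11}
Sup11_S = np.linalg.inv(pi_S @ np.linalg.inv(Sup11) @ pi_S.T)  # Sigma_S^{11}
omega = np.array([0.5, -1.0, 0.8])
delta = np.array([1.0, 0.0, -0.5])
tau0 = 0.7

# Linear map applied to Delta in the limit expression for Omega_S
K = omega @ pi_S.T @ Sup11_S @ pi_S @ np.linalg.inv(Sup11)

# Draw Delta ~ N(delta, Sigma^{11}) and independent Omega_0 ~ N(0, tau0^2)
L = np.linalg.cholesky(Sup11)
Delta = delta + rng.normal(size=(reps, q)) @ L.T
Omega0 = tau0 * rng.normal(size=reps)
Omega_S = Omega0 + omega @ delta - Delta @ K

# Moments predicted by Theorem 3.1
mean_theory = omega @ delta - K @ delta
var_theory = tau0 ** 2 + omega @ pi_S.T @ Sup11_S @ pi_S @ omega
print(round(Omega_S.mean() - mean_theory, 2), round(Omega_S.var() - var_theory, 2))
```

The variance check implicitly uses the identity $\pi_S(\Sigma^{11})^{-1}\pi_S^\top = (\Sigma_S^{11})^{-1}$, which collapses $K\,\Sigma^{11}K^\top$ to $\omega^\top\pi_S^\top\Sigma_S^{11}\pi_S\,\omega$.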
A.5 Proof of Theorem 3.2
Since the compromise estimator has the form $\hat\zeta = \sum_S p(S\,|\,\Delta)\,\hat\zeta_S$, it follows that
\[
\sqrt{n}(\hat\zeta - \zeta_0) \xrightarrow{d}
\Omega = \sum_S p(S\,|\,\Delta)\,\Omega_S
= \Omega_0 + \omega^\top\delta
- \omega^\top \sum_S p(S\,|\,\Delta)\,\pi_S^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta.
\]
The limiting variable $\Omega$ has mean $\omega^\top\delta - \omega^\top E\bigl[\hat\delta(\Delta)\bigr]$ and variance $\tau_0^2 + \omega^\top \mathrm{var}\bigl[\hat\delta(\Delta)\bigr]\omega$, where $\hat\delta(\Delta) = \sum_S p(S\,|\,\Delta)\,\pi_S^\top \Sigma_S^{11}\pi_S(\Sigma^{11})^{-1}\Delta$.
Q.E.D.
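Theorem 3.2's moment decomposition can be illustrated in the simplest scalar case ($q = 1$, $\Sigma^{11} = 1$) with two candidate models, narrow and full, and a smooth AIC-type weight. All numbers below (`delta`, `omega`, `tau0`, and the logistic weight function) are hypothetical choices, not the thesis's; the check is that the moments of $\Omega$ decompose as $\omega\delta - \omega\,E[\hat\delta(\Delta)]$ and $\tau_0^2 + \omega^2\,\mathrm{var}[\hat\delta(\Delta)]$:

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(5)
reps = 200_000
delta, omega, tau0 = 1.2, 0.8, 0.5    # toy scalar values (assumed)

# Scalar case with Sigma^{11} = 1: Delta ~ N(delta, 1)
Delta = delta + rng.normal(size=reps)

# Smooth AIC-type weight for the full model; p(N|Delta) = 1 - p(F|Delta).
# The narrow model sets delta_hat = 0; the full model uses Delta itself.
pF = expit((Delta ** 2 - 2.0) / 2.0)
delta_hat = pF * Delta                 # compromise estimator delta_hat(Delta)

# Omega_0 drawn independently of Delta, as in the limit experiment
Omega0 = tau0 * rng.normal(size=reps)
Omega = Omega0 + omega * delta - omega * delta_hat

# Moments predicted by Theorem 3.2 (E and var taken over the same draws)
mean_theory = omega * delta - omega * delta_hat.mean()
var_theory = tau0 ** 2 + omega ** 2 * delta_hat.var()
print(round(Omega.mean() - mean_theory, 3), round(Omega.var() - var_theory, 3))
```

Because $\Omega_0$ and $\Delta$ are drawn independently, the cross term vanishes and the simulated variance of $\Omega$ matches $\tau_0^2 + \omega^2\,\mathrm{var}[\hat\delta(\Delta)]$ up to Monte Carlo error.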