Approximate Local D-optimal Experimental Design for Binary
Technical Report RP-SOR-0501
Hovav A. Dror and David M. Steinberg
Department of Statistics and Operations Research
Raymond and Beverly Sackler Faculty of Exact Sciences
Tel Aviv University
Ramat Aviv 69978
Abstract: A fast and simple method is proposed that produces approximate multivariate local D-optimal
designs of high efficiency for models with binary response. The method assumes availability of a D-optimal
design for a parallel normal response linear problem that has the same linear predictor, with an assumption
of homogeneous variance; the change required to transform the standard design into an efficient one for a
multivariate logit or probit model is to shift any design point whose probability is very low or very high (and
is therefore non-informative) to the nearest feasible point of moderate probability.
KEY WORDS: Generalized Linear Models; Logistic Response; Logit; Probit; Design of Experiments
Construction of an experimental design for a generalized linear model presents a level of complexity greater than
that required for a model with a normally distributed error term of constant variance. Finding an optimal design
for a linear model is already a numerically intensive optimization problem. However, some linear models can be
characterized and have a known trivial optimal design, and designs for more complicated models can be sought
through available software such as "gosset" (Hardin and Sloane 1993), the Statistics Toolbox in MATLAB (The
MathWorks, Inc.), JMP, or the SAS OPTEX procedure. Extension from a linear model to a logistic one does not
retain these qualities. Trivial solutions to linear model design problems are often factorial designs, utilizing the
corners of the design region; for a logistic response, some of these corners have probabilities that are close to
zero or one, so that the responses are almost deterministic and therefore non-informative. Furthermore, the
optimal design for a binary response depends on the unknown coefficients, and hence two experiments having
the same model but different coefficient values will typically have different optimal designs. Common software
packages assume all observations have homogeneous variance and so do not provide a remedy for generalized
linear models.
This paper proposes a simple method for constructing an approximate local D-optimal design for a multi-
variate binary response problem with the logit or probit link.
2. EXPERIMENTAL DESIGN FOR GENERALIZED LINEAR MODELS
Most work on Experimental Design focuses on linear models with a continuous response. A common assumption
is that the error term of the model is normally distributed and that its variance is constant over the design
region. These assumptions are not met (even asymptotically) with the popular logistic model or with the probit
model for binary response data. Standard procedure for the analysis of these models uses iteratively reweighted
least squares, which asymptotically leads to weights that enable solving the problem as a simple linear model
(McCullagh and Nelder 1989). These weights depend on the parameters of the problem and are generally
different for two binary response problems of the same structure but different true (unknown) coefficients.
Frequently, Experimental Design uses optimality criteria based on Fisher's Information Matrix. Of these,
D-optimality is the most intensively studied. For the normal response setting with $y = \eta + \varepsilon$ and $\eta = F\beta$,
D-optimality maximizes the determinant of the information matrix $F^{T}F$ and so minimizes the volume of the
confidence ellipsoid for the unknown coefficients $\beta$. Atkinson & Donev (1992) note that the assumptions of
normality and constancy of variance enter the optimal design through the information matrix. They indicate
that instead of $F^{T}F$, other matrices are appropriate for non-normal distributions, and that once the appropriate
matrix has been defined the principles and practice of optimal experimental design are similar to continuous
response problems. This means that the optimal experimental design is derived from the information matrix
of the parameters, $F^{T}WF$; the matrix of weights, $W$, depends on the unknown parameters and on the design.
This dependence creates a difficulty in constructing an optimal experimental design.
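The dependence of $F^{T}WF$ on the coefficients can be illustrated with a minimal sketch. It assumes the logistic link, for which the weights are $w_i = p_i(1 - p_i)$, and uses a $2^2$ factorial design with an intercept. The function names and the two coefficient vectors are illustrative choices, not taken from the report.

```python
import numpy as np

def logistic_information(F, beta):
    """Return F^T W F for a logistic model, with W = diag(p_i * (1 - p_i))."""
    eta = F @ beta                      # linear predictor at each design point
    p = 1.0 / (1.0 + np.exp(-eta))      # success probability at each design point
    w = p * (1.0 - p)                   # logistic weights
    return F.T @ (w[:, None] * F)

def log_det(M):
    """Log-determinant of M; -inf if M is not positive definite."""
    sign, logdet = np.linalg.slogdet(M)
    return logdet if sign > 0 else -np.inf

# 2^2 factorial design with an intercept column (illustrative).
F = np.array([[1, -1, -1],
              [1,  1, -1],
              [1, -1,  1],
              [1,  1,  1]], dtype=float)

# Two hypothetical sets of coefficient values for the same model structure.
beta_small = np.array([0.0, 0.5, 0.5])
beta_large = np.array([0.0, 3.0, 3.0])

logdet_linear = log_det(F.T @ F)                              # constant-variance criterion
logdet_small = log_det(logistic_information(F, beta_small))
logdet_large = log_det(logistic_information(F, beta_large))

print(logdet_linear, logdet_small, logdet_large)
```

With the larger coefficients, two of the four corners have probabilities very close to zero or one, so their weights nearly vanish and the log-determinant drops sharply, although the design itself has not changed.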
2.1 Local D-Optimal Designs
Because a D-optimal design depends on the coefficients, its derivation requires us to make some assumptions
on the values of the unknowns we want to estimate. When we use an initial estimate (rather than a Bayesian
a priori distribution, for instance) we produce a design that is optimal only locally, for the coefficient values
that were adopted. If the initial estimate is poor, the design's performance may be far from optimal. D-optimal
designs that are based on an initial estimate for the unknown coefficient values are designated as local D-optimal
designs.
2.2 Local D-Optimal Designs for a Logistic Model
Abdelbasit & Plackett (1983) discuss the construction of local D-optimal designs for binary response with one
explanatory variable. They show that for an unconstrained univariate problem with the model $\mathrm{logit}(p) =
\beta_0 + \beta_1 x$ the local D-optimal design places half the points where the estimated probability is 0.824 and the
other half at the location with probability 0.176. Univariate binary response has been further considered by
many authors; see, for example, Atkinson & Donev (1992), Sitter (1992), Hedayat, Yan & Pezzuto (1997), or
Mathew & Sinha (2001).
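The probabilities 0.824 and 0.176 can be checked numerically with a short sketch. It assumes the search is restricted to two equally replicated points placed symmetrically at linear predictor values $+z$ and $-z$, so that $\det(F^{T}WF) = 4 z^2 w(z)^2$ with $w(z) = p(1-p)$. The helper names and the golden-section search are illustrative and are not the derivation used by Abdelbasit & Plackett.

```python
import math

def weight(z):
    """Logistic weight p * (1 - p) at linear predictor value z."""
    p = 1.0 / (1.0 + math.exp(-z))
    return p * (1.0 - p)

def log_det_two_point(z):
    """log det(F^T W F) for one point at eta = +z and one at eta = -z.

    Here F = [[1, z], [1, -z]] and W = weight(z) * I, so the
    determinant equals 4 * z**2 * weight(z)**2.
    """
    return math.log(4.0) + 2.0 * math.log(z) + 2.0 * math.log(weight(z))

def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d   # maximum lies in [a, d]
        else:
            a = c   # maximum lies in [c, b]
    return 0.5 * (a + b)

z_star = golden_section_max(log_det_two_point, 0.1, 5.0)
p_high = 1.0 / (1.0 + math.exp(-z_star))
p_low = 1.0 - p_high

print(round(z_star, 4), round(p_high, 3), round(p_low, 3))
```

Under these assumptions the search settles near $z \approx 1.54$ on the linear predictor scale, which corresponds to probabilities of about 0.824 and 0.176.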
Although the final coordinates creating the optimal design are found numerically, the function to maximize
is derived through analytical procedures. Increasing the number of unknowns complicates both the analytical
route and the numerical process. This is probably the reason that not much has been published on multivariate
design problems. Chipman and Welch (1996) compare D-optimal designs for Generalized Linear Models over
a constrained region to linear regression D-optimal designs, including multifactor problems. Their comparison
was based on computer generated D-optimal designs, using weighted linear models. This way they show the
general phenomenon that points in the linear model optimal design may "move in" from the edges of the design
region for the logistic model. Sitter and Torsney (1995) analyze D- and c-optimal designs for binary response
experiments with two design variables and various link functions; Smith and Ridout (2003) obtain optimal
Bayesian designs for bioassays involving two parallel dose response relationships where the main interest is in
doses of one substance; Atkinson, Demetrio & Zocchi (1995) describe a dose response experiment with two
variables, one continuous and one indicator, considering the special case in which the value of the
indicator variable is unknown during the experiment but is available for the posterior analysis.
Atkinson (2005) gives examples of first-order D-optimal designs with two variables and discusses the use of a
standard factorial design as an approximation, similar to the method proposed here. However, assuming one needs
to search for the approximate design over irregular design regions, Atkinson concludes that "Searching over the
original design space to find optimal design for the generalized linear model would both be easier and lead to a
more efficient design than would trying to find such a regression approximation".
Note that the work described so far is limited to first-order models with two variables. Woods,
Lewis, Eccleston and Russell (2005) offer a method of creating multivariate compromise designs that are robust
to uncertainty in aspects of the model, including uncertainty in the coefficient values, different choices of
link function, and various model forms, including interactions. As can be expected for such complex designs, their
method requires intensive computation.
We now suggest a simple method of constructing approximate local D-optimal designs. This method is
not limited to a small number of covariates, or to first-order designs without interactions. It can produce
approximate designs almost instantaneously, as it does not require intensive computation, and it is straightforward
to implement. The algorithm has two cornerstones. First, one needs an optimal design for an analogous
linear model (that is, the optimal design for a problem with the same formulation and a constant variance).
Second, we compute the probability at each design point according to the specific initial coefficient values.
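The second cornerstone, evaluating the probability at each point of a linear-model design, can be sketched as follows. The example assumes a logistic link and a $2^2$ factorial design with an intercept. The coefficient values and the cut-offs `p_low` and `p_high` are hypothetical placeholders chosen only for illustration; the sketch flags points with extreme probabilities and does not implement the shifting step itself.

```python
import math
from itertools import product

def logistic_probability(row, beta):
    """Success probability for one regression row under the logistic link."""
    eta = sum(f * b for f, b in zip(row, beta))
    return 1.0 / (1.0 + math.exp(-eta))

def flag_extreme_points(design, beta, p_low=0.1, p_high=0.9):
    """Return (row, probability, is_extreme) for every row of the design.

    p_low and p_high are hypothetical cut-offs for "very low" and
    "very high" probability; they are assumptions of this sketch.
    """
    report = []
    for row in design:
        p = logistic_probability(row, beta)
        report.append((row, p, p < p_low or p > p_high))
    return report

# 2^2 factorial design with an intercept column (illustrative).
design = [(1.0, x1, x2) for x1, x2 in product((-1.0, 1.0), repeat=2)]

# Hypothetical initial coefficient values (intercept, x1, x2).
beta0 = (0.0, 2.0, 2.0)

report = flag_extreme_points(design, beta0)
for row, p, is_extreme in report:
    print(row, round(p, 3), "extreme" if is_extreme else "moderate")
```

For these illustrative values the corners $(-1, -1)$ and $(1, 1)$ have linear predictors of $-4$ and $4$ and are flagged, while the other two corners sit at probability 0.5.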
3. REGION OF LINEARITY
We are considering a binary response with the logistic link. That is, $p_i = e^{F_i\beta} / (1 + e^{F_i\beta})$, $F_i$ being the i-th row of the
regression matrix $F$. We attempt to find an approximation for a local optimal design. As explained, the term
"local" implies that the design relates to particular values of the coefficients.
Note that while both $\beta$ and feasible points for the design space may be of any di