A Bayesian criterion for selecting super saturated screening designs

Journal of Statistical Planning and Inference 120 (2004) 137–153

www.elsevier.com/locate/jspi


Daniel Coleman (a), Yousceek Jeong (b), Robert W. Keener (c,∗)

(a) Cytokinetics Inc., 280 East Grand Ave, South San Francisco, CA 94080, USA
(b) Korea Institute for Currency Research, HIT Bldg., 17 Haengdang-Dong, Seoul 133-791, South Korea
(c) Department of Statistics, University of Michigan, Ann Arbor, MI 48103, USA

Received 8 February 2001; accepted 25 October 2002

Abstract

Super-saturated designs, in which the number of factors under investigation exceeds the number of experimental runs, have been suggested for screening experiments initiated to identify important factors for future study. Most of the designs suggested in the literature are based on natural but ad hoc criteria. The "average $s^2$" criterion introduced by Booth and Cox (Technometrics 4 (1962) 489) is a popular choice. Here, a decision theoretic approach is pursued, leading to an optimality criterion based on misclassification probabilities in a Bayesian model. In certain cases, designs optimal under the average $s^2$ criterion are also optimal for the new criterion. Sufficient conditions for this to occur are presented. In addition, the new criterion often provides a strict preference between designs tied under the average $s^2$ criterion, which is advantageous in numerical search as it reduces the number of local minima.

© 2002 Elsevier B.V. All rights reserved.

1. Introduction

1.1. Formulation and main results

At the onset of study of an industrial process, researchers or engineers can often suggest myriad factors that might influence a production process. In such cases, the primary goal of initial experimentation may be to identify important factors that merit further study. To reduce costs associated with this phase of experimentation, super-saturated designs, in which the number of factors exceeds the number of runs, have been suggested and studied by Booth and Cox (1962), Lin (1993), and Wu (1993). For super-saturated designs there is a general presumption that although many factors are under study, only a small proportion of these factors will have a significant effect on response. This has been called "effect sparsity" by Box and Meyer (1985). The new criteria developed here select designs best suited to the task of identifying active factors under the assumption of effect sparsity.

∗ Corresponding author. Tel.: +1-734-763-3519; fax: +1-734-763-4676.
E-mail addresses: [email protected] (D. Coleman), [email protected] (Y. Jeong), [email protected] (R.W. Keener).

0378-3758/$ - see front matter © 2002 Elsevier B.V. All rights reserved.
doi:10.1016/S0378-3758(02)00502-5

In this work, a linear model is postulated in which all two- and higher-way interactions are zero. Thus the expected response for a single run is

$$\mu \pm \beta_1 \pm \cdots \pm \beta_k,$$

where $\mu$ is the overall mean, $\beta_i$ for $i = 1, \dots, k$ are the main effects of the $k$ factors, and the plus/minus signs are "+" if the factor is at one level and "−" if the factor is at the other level. The errors in the linear model are independent and identically distributed from $N(0, \sigma^2)$. In vector form, the response $Y$ from $n$ experimental runs will have a multivariate normal distribution with covariance $\sigma^2 I$ and mean $X\theta$, where $\theta = (\mu, \beta_1, \dots, \beta_k)'$ and the design matrix $X$ has entries $\pm 1$ according to the factor settings for the various runs. Entries in the first column of $X$ are all one.

Ideally, columns of the design matrix $X$ would be selected to be mutually orthogonal.
But for super-saturated designs this is impossible, since $k$ exceeds $n$. So many designs proposed to date are based on measures of the orthogonality of the columns of the design matrix $X$. Writing $X = (\mathbf{1}, x_1, \dots, x_k)$, with $\mathbf{1}$ denoting a column of ones, and restricting attention to "balanced" designs in which $\mathbf{1}'x_i = 0$, $i = 1, \dots, k$, the simplest suggestion, advanced by Booth and Cox (1962), seeks designs which minimize the average $s^2$ criterion

$$\sum_{i<j} s_{ij}^2,$$

where $s_{ij} = x_i'x_j$, the inner product of $x_i$ and $x_j$. Note that $x_i$ and $x_j$ both have squared length $n$, and so $s_{ij}/n$ is the cosine of the angle between $x_i$ and $x_j$.

The average $s^2$ criterion and related measures were developed from considerations related to accurate estimation, but have only indirect relevance when the primary inferential task is identifying important factors. In Sections 2 and 3, more suitable criteria are developed specifically for identification of important factors. The first criterion, derived in Theorem 2.1, is the cosecant criterion,

$$\sum_{i<j} \csc_{ij}, \tag{1}$$

where $\csc_{ij} = 1/(1 - s_{ij}^2/n^2)^{1/2}$, the cosecant of the angle between $x_i$ and $x_j$. This criterion is derived in the context of a Bayesian model, given in detail in Section 2, in which there is a single active factor. Let $\pi_j$ denote the posterior probability that $j$ is the active factor. If $i$ is the active factor, one would certainly hope to have $\pi_i > \pi_j$, and a given design will be effective if $\mathrm{pr}_i(\pi_j > \pi_i)$ is small for all $j \neq i$. A natural Bayesian criterion is

$$\sum_{i \neq j} \mathrm{pr}_i(\pi_j > \pi_i),$$

where $\mathrm{pr}_i$ denotes conditional probability given that $i$ is the active factor. The cosecant criterion arises from this after an approximation. Note that if $i$ is the active factor, $\pi_j > \pi_i$ might be viewed as a "pairwise" error, so the cosecant criterion is related to sums of pairwise error probabilities.
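Both criteria depend on the design only through the inner products $s_{ij}$, so they can be computed directly from the Gram matrix of the factor columns. Below is a minimal sketch. It assumes NumPy and a design supplied as an $n \times k$ array of $\pm 1$ factor columns, without the leading column of ones. The function names are illustrative and do not come from the paper.

```python
import numpy as np


def pair_inner_products(D):
    """Return s_ij = x_i' x_j for all pairs i < j of the factor columns of D."""
    D = np.asarray(D, dtype=float)
    S = D.T @ D
    i, j = np.triu_indices(S.shape[0], k=1)
    return S[i, j]


def average_s2(D):
    """Booth-Cox criterion: sum over i < j of s_ij^2 (smaller is better)."""
    return float(np.sum(pair_inner_products(D) ** 2))


def cosecant_criterion(D):
    """Cosecant criterion (1): sum over i < j of 1 / sqrt(1 - s_ij^2 / n^2)."""
    n = np.asarray(D).shape[0]
    cos2 = pair_inner_products(D) ** 2 / n ** 2
    with np.errstate(divide="ignore"):
        # Collinear columns (s_ij = +/- n) give an infinite value.
        return float(np.sum(1.0 / np.sqrt(1.0 - cos2)))


# Two balanced 8-run columns with s_12 = 4, i.e. cosine 1/2.
D = np.array([[1, 1, 1, 1, -1, -1, -1, -1],
              [1, 1, 1, -1, 1, -1, -1, -1]]).T
print(average_s2(D), cosecant_criterion(D))   # 16.0 and about 1.1547
```

Unlike $s_{ij}^2$, the cosecant term grows without bound as a pair of columns approaches collinearity, so it penalizes nearly aliased pairs much more heavily.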


A similar approach from a frequentist perspective is pursued in Section 3. Competing hypotheses are compared using a generalized likelihood ratio test, which is equivalent to comparison based on mean square error, as suggested by Srivastava (1975). Sums of pairwise error probabilities remain a natural criterion. These error probabilities have been computed by Shirakura et al. (1996), but they depend on the design and on the unknown magnitude of the effect of the active factor in a rather complicated fashion. A local approach based on maximizing the rate of decrease of the pairwise error probabilities when the effect is near zero is suggested. This leads to the sine criterion, in which designs are selected to maximize

$$\sum_{i<j} \sin_{ij}.$$

Other design criteria based on the error probabilities have been proposed by Ghosh and Teschmacher (2002) and Shirakura et al. (1996).

Section 4 considers optimal designs. In many situations where optimal average $s^2$ designs can be obtained analytically, the designs also minimize the cosecant criterion (1). Theorem 4.1 gives sufficient conditions for this to occur. In situations where optimal designs must be found by numerical search, there seem to be definite advantages to using the new criterion. Numerical search routines include the interchange and exchange algorithms considered by Nguyen (1994), Nguyen (1996), Nguyen and Miller (1992), Nguyen and Williams (1993), Booth and Cox (1962), and Lin (1995). Using the new cosecant criterion in the interchange algorithm has led to new designs which improve on the best known designs according to either criterion, average $s^2$ or Bayesian cosecant. This numerical work is detailed in Section 4, and the improved designs are given in Tables 3–6.

1.2. Motivation and a comparison with related approaches

Booth and Cox (1962) have two arguments concerning the relevance of the average $s^2$ criterion. One argument notes that in the sub-model in which there are only two active factors, $i$ and $j$, the variance of the least squares estimator for either main effect is $n\sigma^2/(n^2 - s_{ij}^2)$. If the cosine $s_{ij}/n$ is small (which seems suspect in some cases), then this variance is approximately $\sigma^2(1 + s_{ij}^2/n^2)/n$. Averaging this approximation over $i$ and $j$ gives the average $s^2$ criterion. Their other argument concerns the variance of an estimate for the effect of a single factor $i$, ignoring other factors. If the other main effects have a fixed magnitude and random sign, then $\sum_j s_{ij}^2$ is proportional to the increase in the variance of the estimate for the effect of factor $i$.

Nguyen (1996) notes similarities between the average $s^2$ criterion and the classical notions of A and D optimality. Let $\lambda_1 \ge \cdots \ge \lambda_{k+1}$ be the eigenvalues of $X'X$. When $k < n$, A-optimal designs are chosen to minimize $\mathrm{tr}[(X'X)^{-1}] = \sum_i \lambda_i^{-1}$, and D-optimal designs are chosen to maximize $\det(X'X) = \prod_i \lambda_i$. In super-saturated cases, it seems natural to modify these criteria by summing and multiplying only over the $n$ nonzero eigenvalues. Minimizing the average $s^2$ criterion is equivalent to minimizing $\mathrm{tr}[(X'X)^2] = \sum_i \lambda_i^2$. The eigenvalues must satisfy the constraint $\sum_i \lambda_i = \mathrm{tr}(X'X) = n(k+1)$.
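The equivalence rests on the identity $\mathrm{tr}[(X'X)^2] = (k+1)n^2 + 2\sum_{i<j} s_{ij}^2$. It holds for balanced designs because $\mathbf{1}'x_i = 0$ and every column has squared length $n$. A quick numerical check follows. It assumes NumPy and a randomly generated balanced design; the helper `random_balanced_design` is illustrative.

```python
import numpy as np


def random_balanced_design(n, k, rng):
    """n x k matrix whose columns are random permutations of n/2 (+1)'s and n/2 (-1)'s."""
    base = np.repeat([1, -1], n // 2)
    return np.column_stack([rng.permutation(base) for _ in range(k)])


rng = np.random.default_rng(1)
n, k = 8, 14
D = random_balanced_design(n, k, rng)
X = np.column_stack([np.ones(n), D])          # prepend the column of ones
lam = np.linalg.eigvalsh(X.T @ X)             # k + 1 eigenvalues, at most n of them nonzero

S = D.T @ D
sum_s2 = np.sum(np.triu(S, 1) ** 2)           # sum over i < j of s_ij^2

print(np.isclose(lam.sum(), n * (k + 1)))                          # trace constraint
print(np.isclose(np.sum(lam ** 2), (k + 1) * n**2 + 2 * sum_s2))   # tr[(X'X)^2] identity
print(np.sum(lam > 1e-8) <= n)                                     # rank of X'X is at most n
```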

If the eigenvalues could be chosen freely subject to this constraint, the A, D, and average $s^2$ optimization problems would have the same solution, with all of the nonzero eigenvalues taking the same value. Qualitatively, then, these criteria all tend to favor designs with small variation in the eigenvalues.

These justifications for the average $s^2$ criterion are all related to estimation accuracy. This has some relevance to screening: with enough precision an active factor can certainly be identified. But the link is not direct, and this motivated the approach based on error probabilities pursued here.

The approach presented here is more natural than one based on estimation accuracy, but the setting is somewhat restrictive. In typical applications the assumption that only a single effect is present in the model would be at best a useful fiction. Wu (1993) derives criteria that can be viewed as extensions of the average $s^2$ criterion to situations with a fixed number of active factors. For instance, his criterion A3 (for three active factors)¹ is proportional to

$$\sum_{i<j<l} \mathrm{tr}\begin{pmatrix} n & s_{ij} & s_{il}\\ s_{ij} & n & s_{jl}\\ s_{il} & s_{jl} & n \end{pmatrix}^{-1}.$$

Analogous extensions of the results here, with design criteria based on error probabilities for identifying two (or more) active factors, may be possible, but the necessary calculations are tedious. The average $s^2$ and cosecant criteria lose direct relevance as more complicated models are entertained. Still, both reward designs with small magnitudes for the $s_{ij}$, which must have indirect value in these situations. In this regard, note that the individual summands in Wu's A3 criterion are smallest when $s_{ij} = s_{il} = s_{jl} = 0$. In fact, if we define $\bar s_{ijl}^2 = (s_{ij}^2 + s_{il}^2 + s_{jl}^2)/3$, then the $i, j, l$ summand in A3 is at most

$$\frac{3\,(n^2 - \bar s_{ijl}^2)}{n^3 - 2\bar s_{ijl}^3 - 3n\,\bar s_{ijl}^2},$$

an increasing function of $\bar s_{ijl}^2$ for $\bar s_{ijl} < n/2$.
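Let $M$ be the $3 \times 3$ matrix in the A3 summand. Then $\mathrm{tr}(M^{-1}) = (3n^2 - \sum s^2)/\det M$, with $\det M = n^3 + 2 s_{ij}s_{il}s_{jl} - n\sum s^2$. Because $|s_{ij}s_{il}s_{jl}| \le \bar s_{ijl}^3$, the bound $3(n^2 - \bar s^2)/(n^3 - 2\bar s^3 - 3n\bar s^2)$ follows. It is attained when the product of the three inner products equals $-\bar s^3$. The sketch below checks this numerically, assuming NumPy, $n = 12$ and $s$ values in $\{-4, 0, 4\}$; the function names are illustrative.

```python
import itertools

import numpy as np


def a3_summand(n, s_ij, s_il, s_jl):
    """tr(M^{-1}) for the 3 x 3 matrix with n on the diagonal (Wu's A3 summand)."""
    M = np.array([[n, s_ij, s_il],
                  [s_ij, n, s_jl],
                  [s_il, s_jl, n]], dtype=float)
    return float(np.trace(np.linalg.inv(M)))


def a3_bound(n, s_ij, s_il, s_jl):
    """Bound 3(n^2 - sbar^2) / (n^3 - 2 sbar^3 - 3 n sbar^2), sbar^2 = mean of the s^2."""
    sbar2 = (s_ij**2 + s_il**2 + s_jl**2) / 3.0
    sbar = np.sqrt(sbar2)
    return 3.0 * (n**2 - sbar2) / (n**3 - 2.0 * sbar**3 - 3.0 * n * sbar2)


n = 12
worst_gap = min(
    a3_bound(n, *s) - a3_summand(n, *s)
    for s in itertools.product([-4, 0, 4], repeat=3)
)
print(worst_gap >= -1e-12)                                          # bound never violated
print(np.isclose(a3_summand(n, 4, 4, -4), a3_bound(n, 4, 4, -4)))   # equality case
```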

The importance of discrimination in more complicated models is driven to some extent by the size of the error variance $\sigma^2$. If $\sigma$ is large relative to the anticipated size of effects, there will not be enough information to distinguish between complicated models. In that case, design criteria should focus on how well one will do in simpler situations. But as $\sigma$ decreases there will be a better chance of identifying and fitting complicated models, and design criteria should take increasing account of this possibility. The extreme situation here arises if $\sigma = 0$, considered by Srivastava (1975), Srivastava (1976) and Ghosh and Avila (1985). In these papers, a design is said to have resolving power $r$ if, in the absence of noise, any collection of $r$ significant factors can be identified and their effects evaluated. A design will have resolving power $r$ if any collection of $2r$ columns is linearly independent. If the noise $\sigma$ is sufficiently small, then $r$ significant factors can still be identified, provided the design has resolving power $r$. This follows from an interesting bound by Rosendorn and Rosendorn (1976).

There is no guarantee that designs minimizing the average $s^2$ criterion or the cosecant criterion will have optimal resolving power. However, both criteria favor small inner products $s_{ij}$, and if the $s_{ij}$ are small enough, collections of the columns cannot be linearly dependent. It seems likely that designs selected using these criteria will generally have some resolving power. Numerical examples discussed in Section 4 support this to some extent. New designs given in Tables 3–6 can all resolve at least 4 factors, and the resolving power seems to increase to $k/2$ as $n$ approaches $k$. Resolving powers for these designs are indicated in their tables.

¹ Wu's A2 criterion is proportional to the sum of the variances $n\sigma^2/(n^2 - s_{ij}^2)$ considered by Booth and Cox in their derivation of the average $s^2$ criterion.

There is also a body of literature on screening designs by researchers in information theory. This approach is based on results from large deviation theory, with a focus on small error rates in various asymptotic limits; see for instance Malyutov and Sadaka (1998). In this work, designs are selected with the random balance method, described for screening experiments in Budne (1959) and ascribed to Satterthwaite. In the limits considered, the number of active factors is fixed, but both the number of runs $n$ and the total number of factors $k$ tend to infinity, with $n$ and $\log k$ of comparable magnitude. The main results show that small error rates are possible, which is quite interesting since the limit has $k$ much larger than $n$.

In the random balance method, design matrix entries are selected at random. This may be practical, but is a bit hard to defend. Since error probabilities equal the average of their conditional values given the design, some improvement must be possible using fixed designs. As a practical matter, if performance can be measured by an appropriate criterion, then a researcher would generally want to improve any random balance design with interchange algorithm iterations.

Asymptotic results for random balance designs may have limited relevance for experiments of typical scale in industry. For instance, suppose a balanced design is selected at random for an 8-run experiment studying 14 factors. The expected value of the average $s^2$ criterion $\sum_{i<j} s_{ij}^2$ would then be 832, nearly twice the minimal value of 448. But as sample sizes and numbers of factors increase, this approach may be worth considering, especially given the numerical challenge of minimizing any criterion over a large class of designs.
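The value 832 can be reproduced exactly. Let $x$ be a fixed balanced column and $y$ a uniformly random balanced $\pm 1$ vector. Then $E[y_r y_t] = -1/(n-1)$ for $r \ne t$, which gives $E[(x'y)^2] = n + n/(n-1) = n^2/(n-1)$. Summing over the $\binom{k}{2}$ pairs gives $91 \cdot 64/7 = 832$ for $n = 8$, $k = 14$. The sketch below computes this exactly and confirms it by simulation. It assumes NumPy, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb

import numpy as np


def expected_sum_s2_random_balance(n, k):
    """Exact E[sum_{i<j} s_ij^2] when columns are independent uniform balanced +/-1 vectors."""
    return comb(k, 2) * Fraction(n * n, n - 1)


def simulated_sum_s2(n, k, reps, seed=0):
    """Monte Carlo estimate of the same expectation."""
    rng = np.random.default_rng(seed)
    base = np.tile(np.repeat([1, -1], n // 2), (reps, k, 1))   # shape (reps, k, n)
    cols = rng.permuted(base, axis=2)                           # shuffle each column independently
    S = np.einsum("rin,rjn->rij", cols, cols)                   # s_ij for every replicate
    iu = np.triu_indices(k, 1)
    return float(np.mean(np.sum(S[:, iu[0], iu[1]] ** 2, axis=1)))


print(expected_sum_s2_random_balance(8, 14))   # 832
print(simulated_sum_s2(8, 14, reps=5000))      # close to 832
```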

2. Bayesian models

A Bayesian model incorporating effect sparsity can be specified in the following hierarchical fashion. The set $\mathcal{I}$ of important factors will be viewed as an unknown parameter. Given $\mathcal{I}$, main effects for factors not in $\mathcal{I}$ will be zero, and main effects for factors in $\mathcal{I}$ will have some specified joint distribution. Formulae will be most tractable if this specified distribution is multivariate normal. As an extension, the prior distribution could allow a small amount of variation for factors not in $\mathcal{I}$. Chipman (1996) uses priors of this form in Bayesian variable selection.

The design criterion (1) is developed for the simplest interesting situation, in which there is only one active factor, equally likely to be any of the $k$ factors under consideration. In the derivation, the error variance $\sigma^2$ is taken as known. But the same answer is anticipated for unknown $\sigma^2$, modeled as a random variable in a Bayesian model with its own prior, since $\sigma^2$ does not appear in the cosecant criterion (1).

Although simple, the assumptions here do capture key features that should be considered in designing a study. Designs based on the resulting cosecant criterion (1) should be useful in many cases where the underlying assumptions are suspect. In these cases, data analysis after experimentation can and should be based on more realistic and elaborate assumptions. Ideally, designs would also be based on more elaborate and realistic models. However, the necessary analysis seems difficult, and the resulting criteria would most likely yield the same designs when the number of factors and runs is moderate.

Let $H_i$ denote the hypothesis that factor $i$ is the single active factor. In the hierarchical model, given $H_i$, $\mu$ and $\beta_i$ are a priori independent with $\mu \sim N(0, \sigma_\mu^2)$ and $\beta_i \sim N(0, \sigma_\beta^2)$. Here $\sigma_\mu^2$ and $\sigma_\beta^2$ are known constants. So, given $H_i$,

$$Y = \mu\mathbf{1} + \beta_i x_i + \epsilon \sim N_n\big(0,\ \sigma_\mu^2\mathbf{1}\mathbf{1}' + \sigma_\beta^2 x_i x_i' + \sigma^2 I\big).$$

After running the experiment, posterior probabilities $\pi_i$ that $i$ is the single active factor can be obtained in a routine fashion using Bayes' formula. In particular, $\pi_i$ will exceed $\pi_j$ if and only if the $H_i/H_j$ likelihood ratio exceeds 1. Ideally, if $i$ is the active factor and $j \neq i$, $\pi_i$ will end up larger than $\pi_j$. So for a good design, the error probability $\mathrm{pr}_i(\pi_j > \pi_i)$ should be small. Here $\mathrm{pr}_i$ denotes conditional probability in the Bayesian model, given $H_i$. The Bayesian criterion function derived below is proportional to the sum of all these error probabilities,

$$\sum_{i \neq j} \mathrm{pr}_i(\pi_j > \pi_i). \tag{2}$$

A formula and cosecant approximation for the probabilities in this sum are given in Theorem 2.1.

It is possible to view the Bayesian criterion (2) as a decision theoretic risk. The action after experimentation will be to provide a list ranking the factors according to perceived importance. The head of the list will be the factor most likely to be the single active factor, followed by the factor next most likely to be the active factor, and so on. The loss function is simply the number of factors in this list ahead of the true active factor. The optimal Bayes procedure then lists factors according to their posterior probabilities, and (2) is $k$ times the Bayes risk of this optimal procedure.
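Under this model the posterior probabilities have a simple closed form. The proof of Theorem 2.1 below shows that $\det(\Sigma_m)$ does not depend on $m$. It also shows that the only $m$-dependent part of $Y'\Sigma_m^{-1}Y$ is $-(x_m'Y)^2(\sigma_\beta^2/\sigma^2)/(\sigma^2 + n\sigma_\beta^2)$. So, with a uniform prior, $\pi_m \propto \exp\{c\,W_m^2\}$, where $W_m = x_m'Y$ and $c = \sigma_\beta^2/(2\sigma^2(\sigma^2 + n\sigma_\beta^2))$. The following minimal sketch assumes NumPy and known $\sigma^2$ and $\sigma_\beta^2$; the function names and simulation settings are illustrative only. It computes the posterior and the ranking loss described above.

```python
import numpy as np


def posterior_probs(D, Y, sigma2, sigma2_beta):
    """pi_m = P(factor m is the single active factor | Y), uniform prior over factors.

    D is n x k with balanced +/-1 columns (no column of ones). The prior
    variance of mu drops out because 1'x_m = 0 for every m.
    """
    n = D.shape[0]
    W = D.T @ Y
    c = sigma2_beta / (2.0 * sigma2 * (sigma2 + n * sigma2_beta))
    logits = c * W**2
    logits -= logits.max()            # guard against overflow before exponentiating
    p = np.exp(logits)
    return p / p.sum()


def ranking_loss(post, active):
    """Loss from the text: number of factors ranked ahead of the true active factor."""
    return int(np.sum(post > post[active]))


rng = np.random.default_rng(7)
n, k = 8, 14
D = np.column_stack([rng.permutation(np.repeat([1, -1], n // 2)) for _ in range(k)])
sigma2, sigma2_mu, sigma2_beta = 1.0, 4.0, 1.0
active = 3
Y = (rng.normal(0, np.sqrt(sigma2_mu))
     + rng.normal(0, np.sqrt(sigma2_beta)) * D[:, active]
     + rng.normal(0, np.sqrt(sigma2), n))
post = posterior_probs(D, Y, sigma2, sigma2_beta)
print(post.round(3), ranking_loss(post, active))
```

Because $\pi_m$ is increasing in $W_m^2$, the Bayes ranking simply orders the factors by $|x_m'Y|$.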

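Theorem 2.1. Fix $i$ and $j$ and define

$$\tau_1^2 = \frac{n^2\sigma_\beta^2\sin_{ij}^2}{2(1+\gamma^2)^2\cos_{ij}^2}\Big[\sin_{ij}^2(1 - 3\gamma^2 - 4\gamma^4) - \sin_{ij}^4 + 4\gamma^2(1+\gamma^2)^2 + (\cos_{ij}^2 - \gamma^2 - 2\gamma^4)\sin_{ij}(\sin_{ij}^2 + 4\gamma^2 + 4\gamma^4)^{1/2}\Big]$$

and

$$\tau_2^2 = \frac{n^2\sigma_\beta^2\sin_{ij}^2}{2(1+\gamma^2)\cos_{ij}^2}\Big[\sin_{ij}^2 + 4\gamma^2 + 4\gamma^4 - (1 + 2\gamma^2)\sin_{ij}(\sin_{ij}^2 + 4\gamma^2 + 4\gamma^4)^{1/2}\Big],$$

where $\cos_{ij} = s_{ij}/n$ is the cosine of the angle between $x_i$ and $x_j$, $\sin_{ij} = \sqrt{1 - \cos_{ij}^2}$, and $\gamma^2 = \sigma^2/(n\sigma_\beta^2)$. Then

$$\mathrm{pr}_i(\pi_j > \pi_i) = \frac{2\arctan(\tau_2/\tau_1)}{\pi}.$$

As $\gamma \to 0$, $\tau_1/\tau_2 = \sin_{ij}/\gamma + O(\gamma)$. If $x_i$ and $x_j$ are not collinear,

$$\mathrm{pr}_i(\pi_j > \pi_i) = \frac{2\gamma}{\pi\sin_{ij}} + O(\gamma^3)$$

as $\gamma \to 0$. The approximation here is proportional to $\csc_{ij}$, justifying the cosecant criterion (1).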

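Proof. Let $f_m$ denote the zero-mean multivariate normal density with covariance

$$\Sigma_m = \sigma_\mu^2\mathbf{1}\mathbf{1}' + \sigma_\beta^2 x_m x_m' + \sigma^2 I.$$

Under $H_m$, $Y$ has density $f_m$, and so $\pi_m = f_m(Y)/\sum_l f_l(Y)$, by Bayes' theorem. From this, $\pi_j > \pi_i$ if and only if $f_j(Y) > f_i(Y)$. Eigenvectors of $\Sigma_m$ are $\mathbf{1}$, $x_m$, and vectors orthogonal to these, with corresponding eigenvalues $\sigma^2 + n\sigma_\mu^2$, $\sigma^2 + n\sigma_\beta^2$, and $\sigma^2$. So $\det(\Sigma_m) = \sigma^{2n}(1 + n\sigma_\mu^2/\sigma^2)(1 + n\sigma_\beta^2/\sigma^2)$, which is constant as $m$ varies. Also,

$$\Sigma_m^{-1} = \sigma^{-2} I - \frac{\mathbf{1}\mathbf{1}'\,\sigma_\mu^2/\sigma^2}{\sigma^2 + n\sigma_\mu^2} - \frac{x_m x_m'\,\sigma_\beta^2/\sigma^2}{\sigma^2 + n\sigma_\beta^2},$$

and so

$$Y'\Sigma_m^{-1}Y = \sigma^{-2}\|Y\|^2 - \frac{n^2\bar Y^2\,\sigma_\mu^2/\sigma^2}{\sigma^2 + n\sigma_\mu^2} - \frac{(x_m'Y)^2\,\sigma_\beta^2/\sigma^2}{\sigma^2 + n\sigma_\beta^2}.$$

Using this, $f_j(Y) > f_i(Y)$ if and only if $W_j^2 > W_i^2$, where $W_j = x_j'Y$ and $W_i = x_i'Y$. Given $H_i$,

$$\begin{pmatrix} W_i \\ W_j \end{pmatrix} \sim N_2(0, \Omega) \quad\text{with}\quad \Omega = n^2\sigma_\beta^2\begin{pmatrix} 1 + \gamma^2 & \cos_{ij}(1+\gamma^2) \\ \cos_{ij}(1+\gamma^2) & \cos_{ij}^2 + \gamma^2 \end{pmatrix}.$$

To finish the calculation, introduce

$$V = \begin{pmatrix} 1 & b \\ b & 1 \end{pmatrix}\begin{pmatrix} W_i \\ W_j \end{pmatrix}$$

and note that $V_1^2 - V_2^2 = (1 - b^2)(W_i^2 - W_j^2)$. So if $|b| < 1$, then $W_j^2 > W_i^2$ if and only if $V_2^2 > V_1^2$. If we take

$$b = \frac{\sin_{ij}^2 - 2(1+\gamma^2) + \sin_{ij}(\sin_{ij}^2 + 4\gamma^2 + 4\gamma^4)^{1/2}}{2\cos_{ij}(1+\gamma^2)},$$

then $V_1$ and $V_2$ are independent under $H_i$, with variances $\tau_1^2 \ge \tau_2^2$ stated in the theorem. Since the ratio of two independent standard normal variables has a Cauchy distribution,

$$\mathrm{pr}_i(\pi_j > \pi_i) = \mathrm{pr}_i(|V_2| > |V_1|) = \mathrm{pr}_i\left(\left|\frac{V_2/\tau_2}{V_1/\tau_1}\right| > \frac{\tau_1}{\tau_2}\right) = \frac{2\arctan(\tau_2/\tau_1)}{\pi},$$

which is the first result in the theorem. The other assertions then follow by Taylor expansion.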
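The exact probability in Theorem 2.1 and its small-$\gamma$ approximation are easy to evaluate. The minimal sketch below uses plain Python. The common factor $n^2\sigma_\beta^2$ cancels in $\tau_2/\tau_1$, so it is dropped. The orthogonal case $\cos_{ij} = 0$ is handled separately, since the displayed formulas become $0/0$ there. In that case $W_i$ and $W_j$ are already independent, with $\tau_1^2 \propto 1 + \gamma^2$ and $\tau_2^2 \propto \gamma^2$. Function names are illustrative.

```python
import math


def pairwise_error_exact(cos_ij, gamma2):
    """pr_i(pi_j > pi_i) = 2 arctan(tau_2 / tau_1) / pi from Theorem 2.1.

    cos_ij = s_ij / n and gamma2 = sigma^2 / (n sigma_beta^2). Both tau's are
    computed without the common factor n^2 sigma_beta^2.
    """
    c2 = cos_ij ** 2
    if c2 >= 1.0:
        raise ValueError("collinear columns: factors i and j cannot be distinguished")
    g2, g4 = gamma2, gamma2 ** 2
    if c2 < 1e-12:
        t1, t2 = 1.0 + g2, g2            # s_ij = 0: W_i and W_j are independent
    else:
        s2 = 1.0 - c2
        s = math.sqrt(s2)
        root = math.sqrt(s2 + 4 * g2 + 4 * g4)
        t1 = s2 * (s2 * (1 - 3 * g2 - 4 * g4) - s2 ** 2 + 4 * g2 * (1 + g2) ** 2
                   + (c2 - g2 - 2 * g4) * s * root) / (2 * (1 + g2) ** 2 * c2)
        t2 = s2 * (s2 + 4 * g2 + 4 * g4 - (1 + 2 * g2) * s * root) / (2 * (1 + g2) * c2)
        t2 = max(t2, 0.0)                # guard against rounding below zero
    return 2.0 * math.atan(math.sqrt(t2 / t1)) / math.pi


def pairwise_error_approx(cos_ij, gamma2):
    """Small-gamma approximation 2 gamma / (pi sin_ij), proportional to csc_ij."""
    return 2.0 * math.sqrt(gamma2) / (math.pi * math.sqrt(1.0 - cos_ij ** 2))


for gamma2 in (1e-1, 1e-2, 1e-4):
    print(gamma2, pairwise_error_exact(0.5, gamma2), pairwise_error_approx(0.5, gamma2))
```

The ratio of the two values tends to 1 as $\gamma \to 0$, with a relative error of order $\gamma^2$.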


This approximation leads naturally to the cosecant criterion (1). Since $\sin_{ij}$ and $\cos_{ij}^2$ can be computed from $s_{ij}^2$, both $\tau_1$ and $\tau_2$ are functions of $s_{ij}^2$, and the exact Bayesian criterion (2) can be expressed as

$$\sum_{i<j} f(s_{ij}^2) \tag{3}$$

with an increasing convex function $f$.
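A simulation under $H_i$ gives an independent check of two steps in the proof: the reduction to $W_j^2 > W_i^2$ and the Cauchy step. The sketch below assumes NumPy. The design columns, variances and seed are arbitrary illustrative choices. It draws $\mu$, $\beta_i$ and the errors from the hierarchical model and counts how often $W_j^2 > W_i^2$. It then compares that frequency with $2\arctan(\tau_2/\tau_1)/\pi$, where $\tau_1^2$ and $\tau_2^2$ are obtained numerically as the diagonal of $B\Omega B'$ with $B = \begin{pmatrix} 1 & b \\ b & 1 \end{pmatrix}$.

```python
import numpy as np

rng = np.random.default_rng(2004)
n, sigma2, sigma2_mu, sigma2_beta = 8, 1.0, 4.0, 0.25
x_i = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
x_j = np.array([1, 1, 1, -1, 1, -1, -1, -1], dtype=float)
cos_ij = x_i @ x_j / n                        # s_ij / n = 4 / 8
gamma2 = sigma2 / (n * sigma2_beta)           # = 0.5 here

# Monte Carlo under H_i: Y = mu 1 + beta_i x_i + error.
reps = 200_000
mu = rng.normal(0.0, np.sqrt(sigma2_mu), reps)
beta_i = rng.normal(0.0, np.sqrt(sigma2_beta), reps)
errors = rng.normal(0.0, np.sqrt(sigma2), (reps, n))
Y = mu[:, None] + beta_i[:, None] * x_i + errors
W_i, W_j = Y @ x_i, Y @ x_j
mc = np.mean(W_j**2 > W_i**2)                 # estimate of pr_i(pi_j > pi_i)

# Exact value following the proof: decorrelate (W_i, W_j) with b, then use the Cauchy tail.
Omega = n**2 * sigma2_beta * np.array([[1 + gamma2, cos_ij * (1 + gamma2)],
                                       [cos_ij * (1 + gamma2), cos_ij**2 + gamma2]])
sin2 = 1.0 - cos_ij**2
b = ((sin2 - 2 * (1 + gamma2) + np.sqrt(sin2) * np.sqrt(sin2 + 4 * gamma2 + 4 * gamma2**2))
     / (2 * cos_ij * (1 + gamma2)))
B = np.array([[1.0, b], [b, 1.0]])
cov_V = B @ Omega @ B.T                       # diagonal: tau_1^2, tau_2^2; off-diagonal ~ 0
exact = 2 * np.arctan(np.sqrt(cov_V[1, 1] / cov_V[0, 0])) / np.pi

print(round(mc, 4), round(exact, 4))
```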

3. Classical approach

For comparison, in this section we show how design criteria can be developed from error probabilities for pairwise comparisons without a Bayesian model. In this case, the significant factor might be identified using generalized likelihood ratio comparisons. If $\eta$ denotes the mean of $Y$, so that $Y \sim N(\eta, \sigma^2 I)$, then the log likelihood function is

$$\ell = -\frac{1}{2\sigma^2}\|Y - \eta\|^2 - \frac{n}{2}\log(2\pi\sigma^2).$$

Under the hypothesis $H_i$ that $i$ is the active factor, $\eta = \mu\mathbf{1} + \beta_i x_i$. Maximum likelihood estimators of $\mu$ and $\beta_i$ under $H_i$ (still assuming balanced columns, $\mathbf{1}\cdot x_i = 0$) are $\hat\mu = \bar Y = \mathbf{1}\cdot Y/n$ and $\hat\beta_i = x_i\cdot Y/n = W_i/n$. With known $\sigma^2$,

$$\sup_{H_i}\ell = -\frac{1}{2\sigma^2}\|Y - \bar Y\mathbf{1} - \hat\beta_i x_i\|^2 - \frac{n}{2}\log(2\pi\sigma^2) = \frac{1}{2\sigma^2}\left[n\bar Y^2 + n\hat\beta_i^2 - \|Y\|^2\right] - \frac{n}{2}\log(2\pi\sigma^2).$$

So in this case, likelihood ratio tests will select the factor $i$ with the largest value of $\hat\beta_i^2$. This is equivalent to the Bayesian analysis, since $\hat\beta_i \propto W_i$. If $\sigma^2$ is unknown, the maximum likelihood estimators under $H_i$ for $\mu$ and $\beta_i$ remain the same, and the maximum likelihood estimator for $\sigma^2$ is

$$\hat\sigma_i^2 = \frac{1}{n}\|Y - \bar Y\mathbf{1} - \hat\beta_i x_i\|^2 = \frac{1}{n}\left[\|Y\|^2 - n\bar Y^2 - n\hat\beta_i^2\right].$$

So with unknown $\sigma$,

$$\sup_{H_i}\ell = -\frac{n}{2} - \frac{n}{2}\log(2\pi\hat\sigma_i^2).$$

Since $\hat\sigma_i^2$ is a decreasing function of $\hat\beta_i^2$, the factor selected will still be the one that maximizes $\hat\beta_i^2$.

To derive a formula for pairwise errors, fix $i$ and $j$ and assume that $i$ is the active factor. Define

$$V_1 = \frac{(1+\sin_{ij})W_i - \cos_{ij}W_j}{\sigma\sin_{ij}\sqrt{2n(1+\sin_{ij})}} \quad\text{and}\quad V_2 = \frac{(1+\sin_{ij})W_j - \cos_{ij}W_i}{\sigma\sin_{ij}\sqrt{2n(1+\sin_{ij})}},$$

and note that $W_i^2 > W_j^2$ if and only if $V_1^2 > V_2^2$. Under $H_i$, $EW_i = n\beta_i$, $EW_j = n\cos_{ij}\beta_i$, $\mathrm{Var}(W_i) = \mathrm{Var}(W_j) = n\sigma^2$, and $\mathrm{Cov}(W_i, W_j) = n\sigma^2\cos_{ij}$. It follows that

$$EV_1 = \tilde\beta_i\frac{1+\sin_{ij}}{\sqrt{2(1+\sin_{ij})}} \quad\text{and}\quad EV_2 = \tilde\beta_i\frac{\cos_{ij}}{\sqrt{2(1+\sin_{ij})}},$$

where $\tilde\beta_i = \sqrt{n}\,\beta_i/\sigma$, and $\mathrm{Cov}(V) = I$. If $Z_1$ and $Z_2$ are independent standard normal variables, we can write the pairwise error probability as

$$P(W_j^2 > W_i^2) = P(V_2^2 > V_1^2) = P\big((Z_2 + EV_2)^2 > (Z_1 + EV_1)^2\big) = E\exp\{Z_2 EV_2 + Z_1 EV_1 - \tilde\beta_i^2/2\}\,I\{Z_2^2 > Z_1^2\},$$

where the last equality is a change of measure using $(EV_1)^2 + (EV_2)^2 = \tilde\beta_i^2$. This form will be sufficient for our needs, but an exact result is possible; it is given in Shirakura et al. (1996).

The error probability derived is a complicated function of $\tilde\beta_i$ and the angle between $x_i$ and $x_j$. Asymptotic analysis for large $\tilde\beta_i$ should be similar to the large deviations approach pursued in Malyutov and Sadaka (1998), and integration over $\tilde\beta_i$ should give the cosecant criterion. A local approach seeking to maximize the rate at which the error decreases near $\tilde\beta_i = 0$ is also feasible. Exploiting natural symmetries of the joint distribution of $Z_1$ and $Z_2$,

$$EZ_1 I\{Z_2^2 > Z_1^2\} = EZ_2 I\{Z_2^2 > Z_1^2\} = EZ_1Z_2 I\{Z_2^2 > Z_1^2\} = 0$$

and

$$E(Z_1^2 + Z_2^2) I\{Z_2^2 > Z_1^2\} = \tfrac{1}{2}E(Z_1^2 + Z_2^2) = 1.$$

Finally,

$$E(Z_2^2 - Z_1^2) I\{Z_2^2 > Z_1^2\} = E|Z_2^2 - Z_1^2|\,I\{Z_2^2 > Z_1^2\} = \tfrac{1}{2}E|Z_2^2 - Z_1^2|.$$

Since $Z_2 - Z_1$ and $Z_2 + Z_1$ are independent, and both are normal with variance 2, $E|Z_2^2 - Z_1^2| = E|Z_2 - Z_1|\,E|Z_2 + Z_1| = 4/\pi$. Solving,

$$EZ_2^2 I\{Z_2^2 > Z_1^2\} = \frac{\pi + 2}{2\pi} \quad\text{and}\quad EZ_1^2 I\{Z_2^2 > Z_1^2\} = \frac{\pi - 2}{2\pi}.$$
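These symmetry and moment identities are easy to confirm by simulation. The sketch below assumes NumPy, with an arbitrary seed and sample size; the comments give the exact values derived above.

```python
import numpy as np

rng = np.random.default_rng(0)
Z1, Z2 = rng.standard_normal((2, 2_000_000))
I = Z2**2 > Z1**2                                        # indicator I{Z2^2 > Z1^2}

est = {
    "E Z1 I": np.mean(Z1 * I),                           # 0
    "E Z1 Z2 I": np.mean(Z1 * Z2 * I),                   # 0
    "E (Z1^2 + Z2^2) I": np.mean((Z1**2 + Z2**2) * I),   # 1
    "E |Z2^2 - Z1^2|": np.mean(np.abs(Z2**2 - Z1**2)),   # 4 / pi
    "E Z2^2 I": np.mean(Z2**2 * I),                      # (pi + 2) / (2 pi)
    "E Z1^2 I": np.mean(Z1**2 * I),                      # (pi - 2) / (2 pi)
}
for name, value in est.items():
    print(f"{name:>20s}: {value:.4f}")
```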

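By Taylor expansion, the pairwise error probability is

$$\begin{aligned}
&\frac{1}{2} + \frac{1}{2}E\big(Z_2 EV_2 + Z_1 EV_1\big)^2 I\{Z_2^2 > Z_1^2\} - \frac{\tilde\beta_i^2}{4} + o(\tilde\beta_i^2)\\
&\quad= \frac{1}{2} + (EV_2)^2\,\frac{\pi + 2}{4\pi} + (EV_1)^2\,\frac{\pi - 2}{4\pi} - \frac{\tilde\beta_i^2}{4} + o(\tilde\beta_i^2)\\
&\quad= \frac{1}{2} - \frac{\sin_{ij}}{2\pi}\,\tilde\beta_i^2 + o(\tilde\beta_i^2)
\end{aligned}$$

as $\tilde\beta_i \to 0$. Since the local rate of decrease is proportional to $\sin_{ij}$, designs might be selected to maximize

$$\sum_{i<j} \sin_{ij}.$$

Since $-\sin_{ij} = -\sqrt{1 - s_{ij}^2/n^2}$, the negative of this criterion (to be minimized) again has form (3) with $f$ an increasing convex function.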
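The local expansion can be checked numerically. Conditioning on $Z_1 = z_1$ reduces the pairwise error probability to a one-dimensional integral. With $a = |z_1 + EV_1|$, the conditional probability is $P(|Z_2 + EV_2| > a) = \Phi(-a - EV_2) + \Phi(EV_2 - a)$. The sketch below evaluates this integral on a fine grid and compares $(P - 1/2)/\tilde\beta_i^2$ with $-\sin_{ij}/(2\pi)$ for a small $\tilde\beta_i$. It assumes NumPy, and the quadrature range and grid size are arbitrary choices.

```python
import math

import numpy as np

_erfc = np.vectorize(math.erfc)


def norm_cdf(x):
    """Standard normal CDF, Phi(x) = erfc(-x / sqrt(2)) / 2."""
    return 0.5 * _erfc(-np.asarray(x) / math.sqrt(2.0))


def classical_pairwise_error(sin_ij, beta_tilde, half_width=10.0, points=20001):
    """P((Z2 + EV2)^2 > (Z1 + EV1)^2) by quadrature over Z1."""
    cos_ij = math.sqrt(1.0 - sin_ij ** 2)
    m1 = beta_tilde * (1 + sin_ij) / math.sqrt(2 * (1 + sin_ij))   # EV1
    m2 = beta_tilde * cos_ij / math.sqrt(2 * (1 + sin_ij))         # EV2
    z1 = np.linspace(-half_width, half_width, points)
    a = np.abs(z1 + m1)
    # Given Z1 = z1: P(|Z2 + m2| > a) = Phi(-a - m2) + Phi(m2 - a).
    tail = norm_cdf(-a - m2) + norm_cdf(m2 - a)
    density = np.exp(-0.5 * z1 ** 2) / math.sqrt(2 * math.pi)
    dz = z1[1] - z1[0]
    return float(np.sum(tail * density) * dz)


sin_ij, bt = 0.6, 0.05
p = classical_pairwise_error(sin_ij, bt)
print((p - 0.5) / bt ** 2, -sin_ij / (2 * math.pi))   # local slope vs. -sin_ij / (2 pi)
```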

4. Optimal designs

When $k = 2(n-1)$, optimal designs for the average $s^2$ criterion can often be found using a construction due to Lin (1993) based on half fractions of Hadamard matrices. This approach is generalized in Nguyen (1996), exploiting relations between super-saturated designs and cyclic incomplete block designs (see John and Williams, 1995). By Theorem 4.1, many of these optimal $s^2$ designs are also optimal for the cosecant criterion (1).

When $k \neq 2(n-1)$, analytically optimal designs are not known, and Nguyen (1996) pursues a numerical approach. Some of the designs he discovered seem to minimize the cosecant criterion (1). In other cases, our numerical search to minimize the cosecant criterion has led to new designs with smaller values for both criteria than the designs given by Nguyen (1996).

4.1. Optimal designs when k = 2(n− 1)

Table 1 lists $|s_{ij}|$ multiplicities for the average $s^2$ optimal designs given in Nguyen (1996) when $k = 2(n-1)$. Rows of the table list the number of runs, the number of factors, and multiplicities for the distinct values of $|s_{ij}|$.

For the optimal $s^2$ designs in Table 1, the number of distinct values for the $|s_{ij}|$ is small, either two or three. The following theorem details relationships between optimal designs for the cosecant and average $s^2$ criteria in such cases.

Theorem 4.1. (a) Suppose a design minimizes the average $s^2$ criterion, and the $|s_{ij}|$ for this design only assume the two smallest values (0 and 4 if $n/2$ is even, 2 and 6 if $n/2$ is odd). Then the design also minimizes the cosecant criterion, or any other criterion of form (3) with $f$ increasing and convex. Furthermore, any design which minimizes the cosecant criterion also minimizes the average $s^2$ criterion. (b) Suppose the $|s_{ij}|$ for an optimal average $s^2$ design assume the three smallest values. If the $|s_{ij}|$ for some optimal Bayesian cosecant design also assume the three smallest values, then this design will minimize the average $s^2$ criterion. Also, if there are several optimal average $s^2$ designs, a design minimizing the cosecant criterion will have the smallest possible multiplicity for the largest of the three values for the $|s_{ij}|$.

Table 1
$|s_{ij}|$ multiplicities

n    k    |s_ij| = 0   |s_ij| = 4   |s_ij| = 8
8    14   63           28           –
12   22   132          99           –
16   30   195          240          –
20   38   399          247          57
24   46   483          460          92
28   54   594          675          162

n    k    |s_ij| = 2   |s_ij| = 6
10   18   144          9
14   26   286          39
18   34   459          102
22   42   651          210
26   50   850          375
30   58   1044         609

From this result, designs minimizing the cosecant criterion quite often minimize the average $s^2$ criterion, and the cosecant criterion "breaks ties" among optimal $s^2$ designs in a natural fashion. In situations governed by part (b) of the theorem, the cosecant criterion refines the average $s^2$ criterion. The theorem follows directly from Lemma A.3 in the appendix, which is developed in a more general setting that allows similar comparisons of other criteria with form (3).

We believe the designs for $n = 20$, 24, or 28 in Table 1 are Bayes optimal, but do not have a rigorous proof. Extensive numerical study has not unearthed any improvements, and cyclic designs of dimension $k = 2(n-1)$ seem to have a great variety of optimal properties.

4.2. Designs of dimension k ≠ 2(n − 1)

At present, theoretical methods do not give optimal $s^2$ designs when $k \neq 2(n-1)$, and a numerical approach is necessary. Unfortunately, there are no known algorithms for optimizing the average $s^2$ or the cosecant criterion with certainty in a reasonable amount of computer time. Nguyen (1996) describes the following interchange algorithm to optimize objective functions over balanced designs:

1. Start with a randomly generated balanced design and enumerate the columns.
2. Starting with the first column, perform all possible interchanges of 1 and −1 within the column. If there is an interchange which reduces the objective function, then make the interchange; if there is more than one, make the one which reduces the objective function the most. Proceed to the next column.
3. Stop when a pass through all the columns fails to make an interchange.
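The three steps translate directly into code. The sketch below is a minimal version that assumes NumPy, uses the cosecant criterion (1) as the objective, and does plain random restarts. The function names, the restart count and the 8-run, 14-factor example are illustrative choices, not the settings used in the paper. Swapping a +1 at row $r$ with a −1 at row $t$ in column $c$ keeps the column balanced. It changes $s_{cj}$ by $2(x_{tj} - x_{rj})$ for every other column $j$. Each candidate interchange can therefore be scored from the current inner products.

```python
import numpy as np


def cosecant_terms(s, n):
    """f(s^2) = 1 / sqrt(1 - s^2 / n^2) for each inner product s; inf if collinear."""
    with np.errstate(divide="ignore"):
        return 1.0 / np.sqrt(1.0 - (np.asarray(s, dtype=float) / n) ** 2)


def random_balanced_design(n, k, rng):
    """Step 1: each column is a random permutation of n/2 (+1)'s and n/2 (-1)'s."""
    base = np.repeat([1, -1], n // 2)
    return np.column_stack([rng.permutation(base) for _ in range(k)])


def interchange(D, f=cosecant_terms):
    """Steps 2-3: best +1/-1 swap in each column until a full pass makes no change."""
    D = D.copy()
    n, k = D.shape
    changed = True
    while changed:
        changed = False
        for c in range(k):
            others = np.arange(k) != c
            s = D[:, c] @ D[:, others]            # s_cj for all j != c
            current = f(s, n).sum()
            plus = np.flatnonzero(D[:, c] == 1)
            minus = np.flatnonzero(D[:, c] == -1)
            best_gain, best_swap = 0.0, None
            for r in plus:
                for t in minus:
                    # Flipping D[r, c] to -1 and D[t, c] to +1 changes s_cj by 2(x_tj - x_rj).
                    s_new = s + 2 * (D[t, others] - D[r, others])
                    gain = current - f(s_new, n).sum()
                    if gain > best_gain + 1e-12:  # nan (inf - inf) never compares true
                        best_gain, best_swap = gain, (r, t)
            if best_swap is not None:
                r, t = best_swap
                D[r, c], D[t, c] = -1, 1
                changed = True
    return D


def best_of_restarts(n, k, starts, seed=0):
    """Run the interchange algorithm from many random starts; keep the best local minimum."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices(k, 1)
    best, best_val = None, np.inf
    for _ in range(starts):
        D = interchange(random_balanced_design(n, k, rng))
        val = cosecant_terms((D.T @ D)[iu], n).sum()
        if val < best_val:
            best, best_val = D, val
    return best, best_val


best, best_val = best_of_restarts(8, 14, starts=20)
S = best.T @ best
print(best_val, np.sum(np.triu(S, 1) ** 2))   # cosecant value and sum of s_ij^2
```

Passing a different `f` (for example $s^2$ itself) turns the same routine into an average $s^2$ search, which makes it easy to compare how often ties stall each objective.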

When the algorithm terminates, the final design matrix is a local minimum in the single exchange neighborhood. The algorithm is run many times, and the design matrix corresponding to the smallest local minimum is taken as the best. Certainly, the longer the algorithm is run, the greater the chance of finding the true optimal design. For larger designs a full investigation of the domain requires substantial computing time.

In our numerical work, 300,000 random starts of the interchange algorithm were made for each design. We consider the 7 cases studied by Nguyen (see Table 2). In four of these cases, superior designs were obtained. These designs are given in Tables 3–6. The minima for designs with $n = 24$, $k = 30$ and $n = 18$, $k = 36$ were obtained just once in 300,000 runs.

Table 2 compares the $|s_{ij}|$ multiplicities of Nguyen's designs with the multiplicities of the new designs.


Table 2
Multiplicity comparison

               Nguyen                          New designs
n    k    |s_ij| = 0   4     8        |s_ij| = 0   4     8
12   16   81           39    –        –
12   18   96           57    –        –
12   24   141          135   –        –
24   30   241          187   7        241          188   6

               Nguyen                          New designs
n    k    |s_ij| = 2   6              |s_ij| = 2   6
18   24   249          27             250          26
18   30   362          73             364          71
18   36   493          137            495          135

Table 3
Design with n = 24, k = 30

+ − − + + + + + − − + − + − − + + − − + − − − − + + + − − +− − + − + + − + − + − − + − − − − + + + − − + + − − + − + +− − + + + − − − + + + + − − + − + − − + − + − + + − + + + −+ − − − + + + − − + − + + + − + + − + − − − − + + − − + + −+ − − − + − + + + − + + − − + − − + + + + − − − − + − + + −+ − − − − − − + − − − − − + − − + − − − + + − − − + + − + +− + − + − + − + + + + − + + − − − + − − − − − + − + − + + −− + + − + − − + + + − + + + − + − − − + + + − − + + − + − +− − − + − − − + − − − − − − − + + + + + + + + + + − − + − −− + + − − − + − + + + − − − − + + + + − + − − − + − + − + +− + − + + − + − + − − + + − − − + − − − + − + + − − − − − +− − − + − − − − − + + + + − + + − − + − − + + − − + − − + +− − + − − + + + + − + + − + + − + − + − − − + + + + + + − ++ + + − + − − − − − + − − + + + − − − + − − + + + + − − + −+ + − − − + + − − + + + − − − − − + − + + + + + + + + + − +− − + − − + + − − − + − + + + − − + − − + + − − + − − − − −+ + − + + + − + + − + + + + + + − + + − + + + + + − + − + ++ − + + − + − − + + − − − − + + − − − − + − − + − − − + − ++ + + − − − − + − − − + + − + + + + − − − − + − − − + + − −+ − + + − − + − + − − + + + − + − + + + − + − + − + + − − −− + − + + + − − − + − + − + + − + + + + + − − − − + + − − −+ + + + − + + + + + − − + − + − + − + + + + + − + + − − + −+ + + + + − + + − + + − − + − − − − + − − + + − − − + + − −− + − − + + + − + − − − − + + + + + − + − + + − − − − + + +(Resolving Power 11)

4.3. Some remarks on computation

Replacing the average $s^2$ objective function $\sum_{i<j} s_{ij}^2/\binom{k}{2}$ with $g(s^2) = \sum_{i<j} f(s_{ij}^2)$ for any increasing convex function $f$ produces a finer distinction between designs



+ + + − + + − − − + − − − − + − − − − + + + + −+ − + − − + + − − + − + + + − + + − − − − − − −+ + + + + − + + + − + − − − − + − − + − + − − −− + + − − + − − + − + − + + + − + + + − + + − −− + − + + + + + − − + + + − − + + + − + − + − −− − − + + − − + + + − + + + + + − − − − + + − −− − − + − + + + − + − − − − − − + − + − + + + +− + − − − + − + + + + + − − − − − − − − − − − ++ + − + − + − + − + − − + + + + − + + + − − + −+ − − − + − + − + + + + − − + + + + + − − + + −+ + − − − − + − + − − + + + − − − − + + + + + ++ + + + − − + + − + + + − + + − + + − + + − − +− − + + − + + − + − + − − + + + − − − + − + + +− − + − + + − + + − − + − + − + + + + + + − + +− − − − + − + − − + + − + − + + − + + + + − − ++ − + − + − − + − − + − + + − − − + − − − + + ++ − − + − − − − + − − − − − − − + + − + − − − −− + + + + − − − − − − + + − + − + − + − − − + +(Resolving Power 9)


− + − + + + − − + − − + − + − − + + − − − + + − − − − + − +− − + − + − − − + − + + − + + + − + + + + + − + + − + + − −+ + + + − + + − − − − + + + + + − − − + − − − + − − − + + −− − − − + + + + − + − + − − + − − + − − − − − + − + + − − −+ + + + + − + − + + + − + − − − + + − + + + − + − + − − − −− + − − − − − − − + − − − − − + + + + + + − + + − + − + + −+ + + + − + − + + + − − − + + − + + + − + − − + + − + − + ++ − + − − − + − − + − − + − − − − − + − − + + + + − + + − +− − + − − + + + + − + − + − + + + + − − − + + + + + − + + ++ − − + + + + − + − − + + − − + − + + + + − + − + + + − + +− + − + + − − + + + − − + − + + − − + − − + − − + − − − + −+ − − + + − − − − − + − − + + − − − − − + − + + + + − − + +− − + + − − − − − + + + + − + + + − − − + − + − − − + − − +− + − − − − + + − − + + + + − − + + + + − − − − + − − − − +− + + − + + − + − + + + + + − − − − − + + + + − + + + + + −+ − + − + + − + − − − − − + − + + − + + − + − − − + − − − ++ + − − − + + + + − + − − − − + − − − − + − − − − − + + − −+ − − + − − + + + + + + − + + − + − + + − + + − − + + + + −(Resolving Power 4)

and allows for tie breaking between optimal $s^2$ designs. Furthermore, using the latter in the interchange algorithm leads to two improvements. First, it allows tie breaking between competing interchanges at stage 2 of the algorithm. In addition, it allows more



++ − − − + − − − − − − − − + + − + + − − + − − + + + − − − − + − − + −++ − − + + − + − − − + + + − − + − + − + + − + − − − − + + + − + − + +−− − − − − + − + − + − + + − − + + − + + + − − + − − − − + − + + − + −++ + + + − + + + − − − + + + + + − − + − + + − + − + − + − − + + + − +−− − + − − + − − + − − − + − + + − − − + + + + + + + + + − + − + + + −−+ + − − + + − − + + + + + + + + − + + + − + − − + + − − + + + − + + +−− + + − − − + + + − + + − + + + + − − − − − + + − − − − − + − − − + +−+ − + + + + − + + − + − + + − − − − + − + − + − + − + − − + + + − − +++ + − − − − + + + + + + − − + − − − + + + + + − + − + + − − + − − + −−− + − + + − + − + − − + − + + − + + + + + − + − − + + − + − − + + − −++ − − + − − − + + + − − − − + + + + + − − − − + + − + + + + − + + − ++− − − + − + + − − + + + + − − − + + + − − + + + + + + − − + − − − − −−+ − + − + + + + − − + − − + − + + + + + − + − − − − + + − − − − + − −+− + − + + − − − + − + − + − − + − − − − − + − + − − + − + − + − + − −−+ + + + − + + − − + − − − − + − − − − − − − − − − + + − + − − − − + ++− + + + + + + + − + + − − + − − + − − + − − + + + + − + + + + + + + −−− − + − − − − − − + − + − + − − − + − − − + + − + − − + + − + + + − ++− + + − + − − + + + − − + − − − + + − + + + − − − + − + − + − − − − +(Resolving Power 4)

interchanges. For example, suppose n1 = 0, n2 = 4, n3 = 8, and designs A and B havemultiplicities +1 = 241, +2 = 188, +3 = 6 and 1 = 238, 2 = 192, 3 = 5. Then theaverage s2 criterion is the same for both designs, but g(s2) is smaller for design B.Running the interchange algorithm, if the current design is A and B can be obtainedfrom A by a single exchange, then an exchange will be made if the objective functionis g(s2), but not if the objective function is average s2. Our success >nding improveddesigns seems due to both using the cosecant criterion and a large number of restartsselected at random from the entire design space.
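The tie in this example can be checked directly from the multiplicities. The Python sketch below treats $n_1, n_2, n_3$ as the possible values of $|s_{ij}|$ and uses the cosecant function $1/(1 - s^2/n^2)^{1/2}$ as a stand-in for $g$; the run size $n = 16$ is an assumption made only so that the values 0, 4, 8 are attainable, and the helper names are hypothetical.

```python
import math

# Possible values n1 < n2 < n3 of |s_ij| in the example, and the
# multiplicities (number of factor pairs taking each value) for designs A, B.
VALUES = (0, 4, 8)
DESIGN_A = (241, 188, 6)
DESIGN_B = (238, 192, 5)


def criterion(mult, f):
    """Sum of f(|s_ij|) over all factor pairs, from value multiplicities."""
    return sum(m * f(v) for m, v in zip(mult, VALUES))


def average_s2(mult):
    """Average s^2 over the k(k-1)/2 factor pairs."""
    return criterion(mult, lambda s: s * s) / sum(mult)


def cosecant(s, n=16):
    # Stand-in for g: the cosecant function 1/(1 - s^2/n^2)^(1/2).
    # n = 16 runs is a hypothetical choice; the text does not give n here.
    return 1.0 / math.sqrt(1.0 - (s / n) ** 2)


if __name__ == "__main__":
    for name, mult in (("A", DESIGN_A), ("B", DESIGN_B)):
        print(name, sum(mult), average_s2(mult), criterion(mult, cosecant))
```

Both designs have 435 pairs (so $k = 30$) and the same total $\sum s^2 = 3392$, so average $s^2$ cannot separate them, while the cosecant sum comes out smaller for B. This is the pattern described by assertion 3 of Lemma A.3 in Appendix A.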

Acknowledgements

Coleman’s research was supported in part by National Science Foundation grant DMS 96-26843 and a grant from the DuPont Corporation. We thank Jee-Weon Park for comments on a draft of this paper, and the referees and editors for their suggestions.

Appendix A. Comparison lemma

Lemma A.3 relates solutions to two constrained minimization problems, A and B. For A, the goal is to minimize $\sum_{l=1}^{m} A(y_l)$, with $A$ an increasing convex function, over a constraint set $C \subset \mathcal{N}^m$, where $\mathcal{N} = \{n_1, \ldots, n_N\}$ with $n_1 < \cdots < n_N$. For B, the goal is to minimize $\sum_l B(y_l)$ over $y \in C$. For the application to design, $y$ would be a vector listing the values $|s_{ij}|$ in some order, and so $m$ would be $k(k-1)/2$; $C$ would be all vectors $y$ that can arise in this fashion from some balanced design; and the function $A$ or $B$ would be $s^2$ for the average $s^2$ criterion, $1/(1 - s^2/n^2)^{1/2}$ for the cosecant criterion (1), or any function $f$ of form (3). In the lemma, $B$ will be more convex than $A$ in the following sense.
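To make the mapping from a design to the vector $y$ concrete, here is a minimal Python sketch. It assumes the design is stored as a list of $\pm 1$ columns, one per factor, with $s_{ij}$ the inner product of columns $i$ and $j$. The helper names, the random balanced design in the demo, and the use of a plain sum over pairs for the cosecant criterion are illustrative assumptions, not the construction or normalization used in the paper.

```python
import itertools
import math
import random


def balanced_design(n_runs, k_factors, seed=0):
    """Hypothetical random balanced design: k columns of +/-1, each with
    n_runs/2 entries of each sign (n_runs is assumed even)."""
    rng = random.Random(seed)
    half = n_runs // 2
    columns = []
    for _ in range(k_factors):
        col = [1] * half + [-1] * half
        rng.shuffle(col)
        columns.append(col)
    return columns


def pair_values(columns):
    """The vector y of |s_ij| over all k(k-1)/2 pairs i < j,
    where s_ij is the inner product of columns i and j."""
    return [abs(sum(a * b for a, b in zip(ci, cj)))
            for ci, cj in itertools.combinations(columns, 2)]


def average_s2(y):
    """Average s^2 criterion: mean of s_ij^2 over all pairs."""
    return sum(s * s for s in y) / len(y)


def cosecant_sum(y, n_runs):
    """Sum of 1/(1 - s^2/n^2)^(1/2) over all pairs; infinite if two
    columns are identical or complementary (|s_ij| = n)."""
    total = 0.0
    for s in y:
        if s >= n_runs:
            return math.inf
        total += 1.0 / math.sqrt(1.0 - (s / n_runs) ** 2)
    return total


if __name__ == "__main__":
    n, k = 12, 20
    y = pair_values(balanced_design(n, k))
    print(len(y), average_s2(y), cosecant_sum(y, n))
```

For balanced columns $s_{ij} \equiv n \pmod 4$, so with $n = 12$ every entry of $y$ lies in $\{0, 4, 8, 12\}$, which plays the role of the finite value set $\mathcal{N}$.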

Definition A.1. Let $A$ and $B$ be strictly increasing convex functions on a common domain $\mathcal{N}$. Function $B$ is more convex than $A$ if

$$\frac{A(y_3) - A(y_2)}{A(y_2) - A(y_1)} < \frac{B(y_3) - B(y_2)}{B(y_2) - B(y_1)} \tag{A.1}$$

whenever $y_1 < y_2 < y_3$.
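On a finite domain, Definition A.1 can be checked by enumerating all triples. The Python sketch below does this; `more_convex`, `square` and `csc16` are hypothetical helper names, and the run size $n = 16$ in the cosecant function is an assumption chosen for illustration. The same check can also be used to spot-check Remark A.2.

```python
import itertools
import math


def more_convex(A, B, domain):
    """Check Definition A.1 on a finite domain: True if (A.1) holds strictly
    for every triple y1 < y2 < y3. Assumes A and B are strictly increasing
    on the domain, so every denominator is positive."""
    points = sorted(domain)
    for y1, y2, y3 in itertools.combinations(points, 3):
        lhs = (A(y3) - A(y2)) / (A(y2) - A(y1))
        rhs = (B(y3) - B(y2)) / (B(y2) - B(y1))
        if not lhs < rhs:
            return False
    return True


def square(s):
    return s * s


def csc16(s):
    # Cosecant function with a hypothetical run size n = 16.
    return 1.0 / math.sqrt(1.0 - (s / 16) ** 2)


if __name__ == "__main__":
    N = [0, 4, 8, 12]
    print(more_convex(square, csc16, N), more_convex(csc16, square, N))
```

Because `itertools.combinations` preserves the order of the sorted input, each triple already satisfies $y_1 < y_2 < y_3$.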

Remark A.2. If $A$ is linear and $B$ is strictly convex, then $B$ is more convex than $A$. In addition, the definition is invariant under affine transformations: if $B$ is more convex than $A$ and $c_1, \ldots, c_4$ are arbitrary constants with $c_1 > 0$ and $c_3 > 0$, then $c_1 B + c_2$ is more convex than $c_3 A + c_4$.

Lemma A.3. Let $A$ and $B$ be increasing convex functions with $B$ more convex than $A$ on $\mathcal{N}$. Let $a$ and $b$ be arbitrary solutions of problems A and B, so that

$$\sum_l A(a_l) = \inf_{y \in C} \sum_l A(y_l) \quad \text{and} \quad \sum_l B(b_l) = \inf_{y \in C} \sum_l B(y_l).$$

Define multiplicity vectors $\alpha$ and $\beta$ of $a$ and $b$ by $\alpha_i = \#\{l : a_l = n_i\}$ and $\beta_i = \#\{l : b_l = n_i\}$, $1 \le i \le N$.

1. Suppose $\alpha_i = 0$ for $i \ge 3$. Then $a$ is a solution of B. Furthermore, every solution of B is a solution of A.

2. Suppose $\alpha_i = 0$ for $i \ge 4$. Then

$$\beta_1 \le \alpha_1, \qquad \beta_2 \ge \alpha_2 \qquad \text{and} \qquad \beta_3 \le \alpha_3. \tag{A.2}$$

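3. Suppose $a$ and $a^*$ are solutions of A with distinct multiplicities $\alpha$ and $\alpha^*$, and that $\alpha_i = \alpha^*_i = 0$ for $i \ge 4$. Then $\alpha^*_2 \ne \alpha_2$. Without loss of generality assume $\alpha^*_2 > \alpha_2$. Then $\delta_1 = \alpha_1 - \alpha^*_1 > 0$, $\delta_2 = \alpha_3 - \alpha^*_3 > 0$, and

$$\delta_1 = \frac{A(n_3) - A(n_2)}{A(n_2) - A(n_1)}\, \delta_2. \tag{A.3}$$

Furthermore,

$$\sum_{l=1}^{m} B(a_l) > \sum_{l=1}^{m} B(a^*_l).$$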

Remark A.4. The constraint set $C$ in this lemma is arbitrary. Convex combinations of points in $C$ need not be in $C$, even if they happen to lie in $\mathcal{N}^m$.
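Because the lemma places no restriction on $C$, its first two assertions can be spot-checked by brute force on small random constraint sets. The Python sketch below does this. The value set $\{0, 4, 8, 12\}$, the run size $n = 16$ inside the cosecant function, and the sizes $m = 6$ and $|C| = 15$ are arbitrary choices for illustration, and a passing run is evidence, not a proof.

```python
import math
import random
from collections import Counter

# Hypothetical value set n1 < n2 < n3 < n4 (possible |s_ij| for n = 16 runs).
N = (0, 4, 8, 12)


def A(s):
    # Average s^2 criterion (up to the constant factor 1/m).
    return s * s


def B(s):
    # Cosecant criterion, assuming n = 16 runs; more convex than A on N.
    return 1.0 / math.sqrt(1.0 - (s / 16) ** 2)


def multiplicity(y):
    """alpha_i = #{l : y_l = n_i}, i = 1, ..., len(N)."""
    counts = Counter(y)
    return tuple(counts[v] for v in N)


def total(f, mult):
    # sum_l f(y_l), computed from multiplicities so that vectors that are
    # permutations of each other get exactly the same value.
    return sum(c * f(v) for c, v in zip(mult, N))


def solutions(f, C, tol=1e-9):
    """Multiplicity vectors of all minimizers of sum_l f(y_l) over C."""
    mults = [multiplicity(y) for y in C]
    scores = [total(f, a) for a in mults]
    best = min(scores)
    return [a for a, s in zip(mults, scores) if s <= best + tol]


def check_lemma(C):
    """Assert assertions 1 and 2 of Lemma A.3 on a finite constraint set C."""
    sol_A, sol_B = solutions(A, C), solutions(B, C)
    for a in sol_A:
        if a[2:] == (0, 0):          # support in {n1, n2}: assertion 1
            assert a in sol_B
            assert all(b in sol_A for b in sol_B)
        if a[3:] == (0,):            # support in {n1, n2, n3}: assertion 2
            for b in sol_B:
                assert b[0] <= a[0] and b[1] >= a[1] and b[2] <= a[2]


def random_constraint_set(rng, m=6, size=15):
    # An arbitrary C in N^m (Remark A.4): no convexity or design structure.
    return [tuple(rng.choice(N) for _ in range(m)) for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(2000):
        check_lemma(random_constraint_set(rng))
    print("no violation found in 2000 random constraint sets")
```

Working with multiplicities rather than raw vectors mirrors the lemma, which is stated entirely in terms of $\alpha$ and $\beta$, and avoids spurious floating-point differences between permuted vectors.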

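Proof (First Assertion): By Remark A.2, there is no loss of generality in transforming $A$ and $B$ so that $A(n_1) = B(n_1)$ and $A(n_2) = B(n_2)$. Then $B \ge A$ on $\mathcal{N}$: for $j \ge 3$, taking $(y_1, y_2, y_3) = (n_1, n_2, n_j)$ in (A.1), the denominators agree, so $A(n_j) < B(n_j)$. Using this, since $a$ solves A and has support $\{n_1, n_2\}$,

$$\sum_{l=1}^{m} B(b_l) \ge \sum_{l=1}^{m} A(b_l) \ge \sum_{l=1}^{m} A(a_l) = \sum_{l=1}^{m} B(a_l).$$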


Hence $a$ is a solution of B. To see that $b$ is a solution of A, note that since $B \ge A$ and $a$ has support $\{n_1, n_2\}$,

$$0 \le \sum_{l=1}^{m} A(b_l) - \sum_{l=1}^{m} A(a_l) \le \sum_{l=1}^{m} B(b_l) - \sum_{l=1}^{m} B(a_l),$$

which equals zero since $a$ solves B.
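The normalization used in this proof, an affine change of $A$ so that it agrees with $B$ at $n_1$ and $n_2$, can be illustrated numerically. In the Python sketch below, `match_at` is a hypothetical helper implementing the transformation permitted by Remark A.2, and the cosecant function with $n = 16$ is again an illustrative assumption.

```python
import math


def match_at(A, B, p, q):
    """Return the affine transform c3*A + c4 (with c3 > 0) that agrees with B
    at the points p < q, as permitted by Remark A.2. Assumes A and B are
    strictly increasing, so c3 is positive."""
    c3 = (B(q) - B(p)) / (A(q) - A(p))
    c4 = B(p) - c3 * A(p)
    return lambda y: c3 * A(y) + c4


def square(s):
    return s * s


def csc16(s):
    # Cosecant function with a hypothetical run size n = 16.
    return 1.0 / math.sqrt(1.0 - (s / 16) ** 2)


if __name__ == "__main__":
    N = [0, 4, 8, 12]
    A_t = match_at(square, csc16, N[0], N[1])
    # Gap B - A on N: zero (up to rounding) at n1 and n2, positive beyond,
    # as (A.1) predicts.
    print([csc16(y) - A_t(y) for y in N])
```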

Proof (Second Assertion): Suppose (A.2) fails. It is again convenient, using Remark A.2, to transform $A(\cdot)$, but now the appropriate transformation will depend on the way in which (A.2) fails. There are three cases:

1. If $\beta_1 > \alpha_1$, transform $A$ so that $A(n_2) = B(n_2)$ and $A(n_3) = B(n_3)$. Then by (A.1), $A(n_1) < B(n_1)$.

2. If $\beta_2 < \alpha_2$, transform $A$ so that $A(n_1) = B(n_1)$ and $A(n_3) = B(n_3)$. Then by (A.1), $A(n_2) > B(n_2)$.

3. If $\beta_3 > \alpha_3$, transform $A$ so that $A(n_1) = B(n_1)$ and $A(n_2) = B(n_2)$. Then by (A.1), $A(n_3) < B(n_3)$.

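These cases are not mutually exclusive, but this is not a problem since all of them lead to a contradiction. In all of these cases,

$$
\begin{aligned}
0 &\ge \sum_{l=1}^{m} B(b_l) - \sum_{l=1}^{m} B(a_l) = \sum_{l=1}^{N} (\beta_l - \alpha_l) B(n_l) \\
&= (\beta_1 - \alpha_1) B(n_1) + (\beta_2 - \alpha_2) B(n_2) + (\beta_3 - \alpha_3) B(n_3) + \sum_{l=4}^{N} \beta_l B(n_l) \\
&> (\beta_1 - \alpha_1) A(n_1) + (\beta_2 - \alpha_2) A(n_2) + (\beta_3 - \alpha_3) A(n_3) + \sum_{l=4}^{N} \beta_l A(n_l) \\
&= \sum_{l=1}^{m} A(b_l) - \sum_{l=1}^{m} A(a_l) \ge 0,
\end{aligned}
$$

a contradiction. The strict inequality holds because, in each case, the transformed $A$ agrees with $B$ at two of $n_1, n_2, n_3$, the term for the remaining point is strictly larger with $B$, and (A.1) gives $B(n_l) > A(n_l)$ for $l \ge 4$. Hence $\beta_1 \le \alpha_1$, $\beta_2 \ge \alpha_2$, and $\beta_3 \le \alpha_3$.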

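Proof (Third Assertion): Suppose $\alpha_2 = \alpha^*_2$. Since $a$ and $a^*$ both have $m$ entries and both attain the minimum of $\sum_l A(y_l)$, it follows that $\alpha_1 + \alpha_3 = \alpha^*_1 + \alpha^*_3$ and $\alpha_1 A(n_1) + \alpha_3 A(n_3) = \alpha^*_1 A(n_1) + \alpha^*_3 A(n_3)$. Since $A(n_1) \ne A(n_3)$, solving these linear equations gives $\alpha_1 = \alpha^*_1$ and $\alpha_3 = \alpha^*_3$. Thus $\alpha = \alpha^*$, a contradiction. Next, (A.3) follows directly from the identity

$$\sum_{i=1}^{3} \alpha_i A(n_i) - A(n_2) \sum_{i=1}^{3} \alpha_i = \sum_{i=1}^{3} \alpha^*_i A(n_i) - A(n_2) \sum_{i=1}^{3} \alpha^*_i,$$

whose $i = 2$ terms vanish, leaving $\delta_1 \bigl(A(n_2) - A(n_1)\bigr) = \delta_2 \bigl(A(n_3) - A(n_2)\bigr)$. Since $A$ is strictly increasing, $\delta_1$ and $\delta_2$ must have the same sign, and both cannot be negative if $\alpha^*_2 > \alpha_2$, because $\delta_1 + \delta_2 = \alpha^*_2 - \alpha_2 > 0$.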
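As a worked check of (A.3), take $A(y) = y^2$, $(n_1, n_2, n_3) = (0, 4, 8)$, and the multiplicities from the tie-breaking example in the main text, with $\alpha = (241, 188, 6)$ from design A and $\alpha^* = (238, 192, 5)$ from design B, so that $\alpha^*_2 > \alpha_2$. Both designs have 435 pairs and $\sum_l A(y_l) = 3392$, which is all the identity above uses; whether the two designs actually solve problem A does not matter for this arithmetic.

$$
\delta_1 = \alpha_1 - \alpha^*_1 = 241 - 238 = 3, \qquad \delta_2 = \alpha_3 - \alpha^*_3 = 6 - 5 = 1,
$$

$$
\frac{A(n_3) - A(n_2)}{A(n_2) - A(n_1)}\,\delta_2 = \frac{64 - 16}{16 - 0} \cdot 1 = 3 = \delta_1,
\qquad
\delta_1 + \delta_2 = 4 = 192 - 188 = \alpha^*_2 - \alpha_2 .
$$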


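For the last part, transform $A$ and $B$ such that $A(n_1) = B(n_1)$ and $A(n_2) = B(n_2)$; by (A.1), $A(n_3) < B(n_3)$. Hence

$$
\begin{aligned}
\sum_{l=1}^{m} B(a_l) - \sum_{l=1}^{m} B(a^*_l)
&= \sum_{i=1}^{3} (\alpha_i - \alpha^*_i) B(n_i) \\
&= \sum_{i=1}^{3} (\alpha_i - \alpha^*_i) A(n_i) + (\alpha_3 - \alpha^*_3)\bigl(B(n_3) - A(n_3)\bigr) > 0,
\end{aligned}
$$

since the first sum on the right vanishes ($a$ and $a^*$ both solve A) and $\alpha_3 - \alpha^*_3 = \delta_2 > 0$.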

References

Booth, K.H.V., Cox, D.R., 1962. Some systematic supersaturated designs. Technometrics 4, 489–495.

Box, G.E.P., Meyer, R.D., 1985. Some new ideas in the analysis of screening designs. Res. Nat. Bur. Standards 90, 495–502.

Budne, T., 1959. Application of random balance designs. Technometrics 2, 139–155 (with discussion).

Chipman, H., 1996. Bayesian variable selection with related predictors. Canad. J. Statist. 24 (1), 17–36.

Ghosh, S., Avila, D., 1985. Some new factor screening designs using the search linear model. J. Statist. Plann. Inference 11, 259–266.

Ghosh, S., Teschmacher, L., 2002. Comparisons of search designs using search probabilities. J. Statist. Plann. Inference 104, 439–458.

John, J.A., Williams, E.R., 1995. Cyclic and Computer Generated Designs, 2nd Edition. Chapman & Hall, New York.

Lin, D.K.J., 1993. A new class of supersaturated designs. Technometrics 35, 28–31.

Lin, D.K.J., 1995. Generating systematic supersaturated designs. Technometrics 37, 213–225.

Malyutov, M., Sadaka, H., 1998. Jaynes principle in testing significant variables of linear model. Random Operators and Stochastic Equations 4, 311–330.

Nguyen, N.-K., 1994. Construction of optimal incomplete block designs by computer. Technometrics 36, 300–307.

Nguyen, N.-K., 1996. An algorithmic approach to constructing supersaturated designs. Technometrics 38, 69–73.

Nguyen, N.-K., Miller, A.J., 1992. A review of some exchange algorithms for constructing discrete D-optimal designs. Comput. Statist. Data Anal. 14, 489–498.

Nguyen, N.-K., Williams, E.R., 1993. An algorithm for constructing optimal resolvable row-column designs. Austral. J. Statist. 35, 363–370.

Rosendorn, N.N., Rosendorn, E.R., 1976. An algebraic substantiation of applicability of supersaturated search designs in the presence of noise. Zavodskaja Laboratorija (10), 1219–1223.

Shirakura, T., Takahashi, T., Srivastava, J.N., 1996. Searching probabilities for nonzero effects in search designs for the noisy case. Ann. Statist. 24 (6), 2560–2568.

Srivastava, J.N., 1975. Designs for searching non-negligible effects. In: Srivastava, J. (Ed.), A Survey of Statistical Designs and Linear Models. North-Holland, Amsterdam, pp. 507–519.

Srivastava, J.N., 1976. Some further theory of search linear models. In: Contributions to Applied Statistics, Experientia Supplementum, Vol. 22. Birkhäuser, Basel, pp. 249–256.

Wu, C.F.J., 1993. Construction of supersaturated designs through partially aliased interactions. Biometrika 80, 661–669.
