12
Distribution of Rankings for Groups Exhibiting Heteroscedasticity and Correlation Author(s): Scott Gilbert Source: Journal of the American Statistical Association, Vol. 98, No. 461 (Mar., 2003), pp. 147- 157 Published by: American Statistical Association Stable URL: http://www.jstor.org/stable/30045202 . Accessed: 15/06/2014 11:39 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp . JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. . American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journal of the American Statistical Association. http://www.jstor.org This content downloaded from 185.2.32.121 on Sun, 15 Jun 2014 11:39:24 AM All use subject to JSTOR Terms and Conditions

Distribution of Rankings for Groups Exhibiting Heteroscedasticity and Correlation

Embed Size (px)

Citation preview

Distribution of Rankings for Groups Exhibiting Heteroscedasticity and CorrelationAuthor(s): Scott GilbertSource: Journal of the American Statistical Association, Vol. 98, No. 461 (Mar., 2003), pp. 147-157Published by: American Statistical AssociationStable URL: http://www.jstor.org/stable/30045202 .

Accessed: 15/06/2014 11:39

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at .http://www.jstor.org/page/info/about/policies/terms.jsp

.JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range ofcontent in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new formsof scholarship. For more information about JSTOR, please contact [email protected].

.

American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journalof the American Statistical Association.

http://www.jstor.org

This content downloaded from 185.2.32.121 on Sun, 15 Jun 2014 11:39:24 AMAll use subject to JSTOR Terms and Conditions

Distribution of Rankings for Groups Exhibiting Heteroscedasticity and Correlation

Scott GILBERT

In one-way analysis of variance, a main interest is in differences among the groups that comprise the population. For a given parameter, such as a mean value, data yield parameter estimates for each group, as well as group rankings based on these statistics. Here ranking probabilities are studied under the assumption that parameter estimates are well approximated by a normal distribution, either in finite

samples or asymptotically, with possible intergroup heteroscedasticity and correlation. Particular interest lies in the ranking distribution as a descriptor of experiments on some future dataset. Examples include contract bidding, global economic competition, and rival contests in mating. Ranking distributions are analytically complicated, yet some interesting properties can be derived for them via the

symmetry and elliptical geometry of the normal distribution. Some relationships between ranking probabilities and group parameters are described, with attention given to the role of between-group heterogeneity and correlation. For the ranking probabilities, estimators and

asymptotically valid standard errors formulas and hypothesis tests are proposed. Simulation is used to describe the sample size needed for accurate asymptotic approximation, and the methods are illustrated with an economic example.

KEY WORDS: Asymptotic; One-way analysis of variance; Ranking distribution; Simulation.

1. INTRODUCTION

Consider a population consisting of p groups, for which 0 is a p vector of constants with ,i describing the ith group, and suppose that there is interest in the relative magnitudes of

01 .....

O. This is a situation arising frequently in the analy- sis of two-way data. The constants are presumed to be unob- servable and are estimated by some 0, ... 6,, which have statistical rankings r, . . . , rp, defined (in terms of ascending order) as the smallest numbers in the set { 1,.... , p} such that

ri < rj

if Oi

< Oi, r- = rj if

0i - 0j, for i,j-= 1, ... p.

The probability distribution of rankings, given by P(ri = j), i, j = 1, .. p, is useful in describing the success likeli- hood of competitors in racing events, and variations on this basic theme are common in many fields, including economics [e.g., bidding on contracts (see, e.g., Engel, Fischer, and Gale- tovic 2001) and global economic competition (Amato and Amato 2001)], and biology [rival contests in mating (Radcliffe and Rass 1998)]. Of particular interest is the ranking probabil- ities that describe the likelihood of outcomes in some future experiment for groups that may exhibit heterogeneity both in the parameters 6i of interest and in other parameters (vari- ances) and may exhibit mutual dependence.

Ranking probabilities are described for the case in which where the estimator 0 is well approximated by a normal dis- tribution, in a finite sample or asymptotically, with attention given to intergroup heteroscedasticity and correlation. Hence interested is not only in the group constants 01,.... O,p, but also in the joint contribution of these and other parameters (variances and covariances) to the ranking distribution. In comparison, a great deal of previous work has focused on the problem of multiple inferences and simultaneous confi- dence intervals for 0,,... , O8 (Hochberg and Tamhane 1987; Toothaker 1993; Hsu 1996). Other work on the selection prob- lem has developed and evaluated methods for picking the best group (in population terms) using sample statistics (Bechhofer, Elmaghraby, and Morse 1959; Gibbons, Olkin, and Sobel

1977; Gupta and Panchapeakesan 1979; Lam 1989; Hoppe 1993; Mukhopadhyay and Solanky; 1994 Bechofer, Santner, and Goldsman 1995). The selection literature largely deals with the case in which group statistics 01, ..., OP

are inde- pendent and differ in distribution only in 01,... , Op,

with the focus on 0 and decision procedures related to 0. Recently, Gupta and Miescke (1988), Nelson and Matejcik (1995), Nel- son, Swann, Goldsman, and Song (2001) and Kim and Nelson (2001) have developed selection methods that allow for het- eroscedastic and correlated groups.

The range of possible ranking distributions is quite diverse under normality and is analytically complex. By exploiting symmetries and the elliptical geometry of the normal distri- bution, the range of feasible ranking probabilities can be par- tially characterized under the hypothesis of equal population rankings (01 = 02 = ... = O) and under the alternative. The presence of intergroup heteroscedasticity and/or correlation greatly increases the range of feasible ranking probabilities under each hypothesis. A full characterization of the possible means and covariances consistent with a given ranking distri- bution is beyond the scope of this article.

Estimating the ranking distribution requires an historical sample that is suitably large relative to the planned sample. In addition to nonparametric methods based on historical event frequencies, a class of parametric estimators is proposed that includes a "plug-in" method, in which historical moment esti- mates are substituted for population moments in generating normal ranking probabilities, as well as methods that incorpo- rate Bayesian uncertainty about the moment values. The para- metric estimators are computed via numerical methods, and estimator consistency is verified in planned samples of fixed size and those of size increasing in magnitude but declining sufficiently rapidly relative to the historical sample size.

In addition, asymptotically valid confidence intervals and tests of restrictions on the ranking probabilities are proposed,

Scott Gilbert is Assistant Professor, Department of Economics, Southern Illinois University, Carbondale, IL 62901 (E-mail: [email protected]). The author thanks the editor, an associate editor, and two referees for very help- ful suggestions. This project was supported in part by U.S. Office of Naval Research Grant N0014-99-0722. A previous version of this article was pre- sented at the Fall 2000 meeting of the Midwest Econometrics Group.

r 2003 American Statistical Association Journal of the American Statistical Association

March 2003, Vol. 98, No. 461, Theory and Methods DOI 10.1198/016214503388619175

147

This content downloaded from 185.2.32.121 on Sun, 15 Jun 2014 11:39:24 AMAll use subject to JSTOR Terms and Conditions

148 Journal of the American Statistical Association, March 2003

including uniformity (e.g., the restriction of equality among the probabilities of all ranks, for a given group). Because

groups are allowed to exhibit heteroscedasticity and correla- tion, such hypotheses need not reduce to restrictions on 0 alone. Wald tests are proposed that use the (nonparametric or plug-in normal) estimated ranking probabilities, with an

asymptotically chi-squared distribution under the null hypoth- esis when the planned sample is of fixed size and the historical

sample grows. No corresponding result is possible when the

planned statistic 0 is normal only asymptotically, due in part to the fact the hypotheses to be tested are explicit restrictions on

probabilities and hence are sensitive to nonnormal probabil- ity. Because the proposed parametric plug-in inference method is based on asymptotic (delta) methods, simulation is used to examine finite-sample performance. The method works well even in samples as small as 30.

The remainder of the article is organized as follows. Section 2 discusses an example from economics, and Section 3 describes ranking distributions for multivariate normal statis- tics. Section 4 proposes estimators of ranking probabilities, and Section 5 develops tools of inference for these probabili- ties. Section 6 continues the economic example, and Section 7 contains concluding remarks. An Appendix contains mathe- matical proofs.

2. EXAMPLE

This section briefly mentions an economic example. This

example is an instance of fixed-effects modeling, in which the effects of interest are measured in terms of mean values. Other

group-specific quantities of interest include regression inter-

cepts and slopes. In each case, statistics yield group rankings and their distribution that are described in Section 3.

Consider international rankings of annual economic growth rates, for example, the year-to-year percentage changes in real

gross national product per capita. Such rankings are of great importance to economic science, and studies of such rankings have a long history, dating back at least to Adam Smith's 1776 treatise Wealth of Nations. Data on annual economic growth rates are currently available for many countries, with the years of coverage ranging from a few years to a few centuries. Over time, the amount and quality of data have improved, and the

quantitative study and debate about international economic

rankings has recently intensified (see, e.g., Barro 1991, 1996; Barro and Sala-I-Martin 1995; Landes 1999; Forbes 2000; Galbraith 2002; Farmer and Lahiri 2002).

Annual economic growth rates (series NY.GNP.PCAP. KD.ZG) for Canada, Mexico, and the United States for the years 1966-1999 were obtained from the World Bank's World Development Indicators CD-ROM (2001 edition). Let the parameter Oi represent the mean annual economic growth rate for the ith country, and let 6 consist of average growth rates over some years later than the end-of-sample 1999, say the year 2002 or the years 2002-2006. (Note: Data for years 2000 and 2001 for Mexico are not yet available from the World Bank.) Interest lies in the distribution of rankings r among the economic growth rates 01,, 02, and 03, as well as deter- minants of this distribution, including means, variances, and covariances among annual growth rates.

Economic theory is rather mixed regarding the likelihood of outcomes for international rankings of economic growth rates. With Canada and the United States as developed coun- tries and Mexico a developing country, if ultimately Mexico would catch up to the living standard of Canada and the United States, then its economic growth rate must eventually exceed that of the others. Catching up is predicted by the Solow- Swann neoclassical theory of economic growth (see Solow 1956; Swann 1956; Farmer and Lahiri 2002). On the other hand, if invention and human capital innovation are unlim- ited resources, then all three countries may share a common

long-run economic growth rate. This situation is predicted by the recent endogenous growth theory (see Lucas 1988; Romer 1990; Farmer and Lahiri 2002). Consistent with this situation is the prediction that each of the three countries has a one-third probability of achieving a given rank in terms of economic

growth rates in the near future. This possibility is investigated in Section 6.

3. RANKING DISTRIBUTION

To explore the determinants of ranking probabilities in situ- ations like the one in Section 2, normality of planned statistics is assumed.

Assumption 1: 0 is distributed multivariate normal

N(0, fl), for some p x p invertible variance-covariance matrix

We Assumption I can be considered an exact property in finite

samples or a large-sample approximation, and an implicit role for conditioning variables is available for the distribution of estimate 0 and rankings r via a conditional mean and condi- tional covariance matrix.

Normality allows considerable variety of behavior in rank-

ing probabilities Qij = P(ri = j), for i, j = 1, ....

p. Some

important generic properties are

p p

0<Qq<l1 and EQik=:Qkj =1, for i,j=l ,..., p. k=1 k=l

(1)

The set of matrices Q satisfying (1) comprises the inte- rior of the set of doubly stochastic matrices. The equalities

EP=l Qik = 1, i = 1 ....

p, hold for any ranking distribution Q, and because the distribution of 0 admits a probability den-

sity function, there are no ties in rank [P(ri f rj, Vi 1j) = 1], and E=,

Qk- = 1 is obtained. Moreover, because the density

function for 0 has full support on RP, the bounds 0 < Qij < 1, i, j = 1,...., p are obtained.

Some, but not all, doubly stochastic matrices Q are con- sistent with normality. In the case of p = 2 groups, (1) is

equivalent to 0 < Q,, = Q22 < 1 and Q21 = Q12 = 1 -Q1I, and each such Q can be generated under normality N(O, I), with I the identity matrix, by setting 0, - 02 to achieve the desired value of Q,,. For general p, consider the p2 x 1 vector vec (Q) of stacked columns of Q. The linear equations in (1) can be expressed as a'vec(Q) = 0, i = 1, ... 2p - 1, for linearly independent p2 x 1 vectors a . . . , a2p1. The solutions con- stitute a linear subspace L of Rp2 of dimension p2 - (2p - 1). Because L is a linear subspace intersecting the interior of the

This content downloaded from 185.2.32.121 on Sun, 15 Jun 2014 11:39:24 AMAll use subject to JSTOR Terms and Conditions

Gilbert: Rankings for Groups with Heteroscedasticity and Correlation 149

unit hypercube in RP2 (including the vector (1/p ....,

1/p)'), its intersection M with the unit-hypercube also has dimension

p2 - (2p - 1). Consequently, vec(Q) (and Q itself) spans a space of dimension p2 - 2p + 1. In contrast, under normality, Q is determined by (01 - Oi, i = 2,... , p) and fl, the former having p - 1 elements and the latter having p2 elements, and because 1f is symmetric, its effective number of parameters is p(p + 1)/2. Hence the space spanned by Q under normal-

ity has dimension at most p2/2 + 3p/2 - 1, which is less than

p2 -2p + 1 when p> 7. Under normality, Q is a function g(8,1f) of mean

and variance-covariance parameters, specifically g(0, fl) fxeA, f(x; 6, 12)dx, where f is the normal density for 0 and x Aij represents the event ri = j. Some basic properties of

g, used later, are as follows: g(ae, fl) = g(0, 1f) for all scalar a and with e the p vector (1,...., 1)'; g(0, afl) = g(0, 9f) for all a > 0; each partial derivative of f with respect to (0, fl) is integrable in x (see Patel and Read 1996, chap. 2), and so

g is infinitely differentiable; and limno,0 g (0, 2f) is not gener- ally well defined, but limao g(0, aB) is well defined for each

0 and positive definite symmetric matrix B (see Lemma A.3 in the Appendix).

The function g is, by construction, expressible in terms of

integrals involving a standard normal density f* and/or (cumu- lative) distribution function F*, either univariate or multivari- ate. When p = 2, Q1 = F*((02 - 01)/ofl11l + 22 - 2 2 12), and so on. For p = 3, results can be obtained with the help of existing tables of normal integrals, as when, say, 0, = c for some c, 02 = 3 = 0, and 12 = I, in which case Q1l = P(w, <

min(w2, w3)), with w1, w2, and w3 independent normal vari- ables with unit variance and mean values c, 0, and 0. The distribution of min(w2, w3) is then Ft(z) = 1 - (1- F*(Z))2 in which case Q11 =f P(w < z)dFt(z) = 2f F*(z-c)(1 - F*(z))f*(z) dz, and standard tables of normal integrals (Owen 1980, p. 406, eq. 11,010) can be used to conclude

Q,,= 3/2F*(-c/V2) + 2f*(-c//2) fo f*(-cz//2)/(1 + z )dz. In general, the form of g in terms of normal integrals is interesting but exceeds the coverage of existing tables. Nev- ertheless, some features of g can be described by exploiting symmetries of the normal distribution.

The following theorem describes ranking probabilities when groups are homogeneous in 6.

Theorem 1. If normality (assumption 1) of 8 holds and

0, = 82 . O p then the probability that group i takes on

the jth rank is the same as the probability that it takes on the (p + 1- j)th rank, for i, j = 1,..., p.

Hence, under 6 homogeneity and normality, the probabil- ity matrix Q acquires a form of column(ar) symmetry Qi =

Qi,p+1-j. The corresponding notion of row symmetry does

not apply. If, in addition, groups are uncorrelated (1i- = 0

for i i j) and if variances are homogeneous between groups

(ii

= 1ji),

then 8, ... , ,p

are independent and identically distributed (iid). Hence Q is uniform, (e.g., Qij = 1/p for all i, j), but this uniformity can otherwise fail (e.g.,

- = 0, i1

diagonal; ,11 1 = 1f33; f22 = 2, in which case Q21 23 > Q22, etc.). The conclusions of Theorem 1 are not unique to the normal distribution, being valid for the more general class

of elliptical distributions (Muirhead 1982, p. 34), and all oth- ers for which the distribution of differences ij = 60 - Bi is invariant to the transformation f -> -0 under 0 uniformity (see the Appendix). These issues are explored further using Proposition 1 later in this section.

An attempt can be made to find additional restrictions on Q imposed by normality and homogeneity of 0. However, when the column symmetry of Q is combined with conditions (1), for even p the result entails p + (p - 1) + (p/2)(p - 1) inde-

pendent restrictions on Q, and the remaining degrees of free- dom in Q are p2/2 - 3p/2 + 1. In comparison, under normal- ity and 0 homogeneity, g is invariant to 0 and invariant to

(positive) scalar multiples of fl, and hence the space spanned by Q = g(O, fl) has dimension at most p(p + 1)/2 - 1, this

being greater than p2/2 - 3p/2 + 1 for p = 2, 4, .... Conse- quently, aside from column symmetry, any restrictions on Q arising from normality and 0 homogeneity are rather subtle.

Under group heterogeneity, the following theorem applies.

Theorem 2. If normality (assumption 1) of 0 holds and 01 < 02 < .

< Op then the probability that group 1 takes on the first rank is greater than the probability that it takes on the pth rank, and the probability that group p takes on the pth rank is greater than the probability that it takes on the first rank.

Hence, under normality and the strict ordering 01 <

02 < < Op, the matrix Q acquires a degree of asymmetry

Q11 > Q1p and Qp, > Qp,, and for any other strict ordering analogous results are obtained by permuting the elements of

0. As with Theorem 1, the conclusions of Theorem 2 are not unique to the normal distribution for 6. They are valid for elliptical distributions and more generally for those for which 0 has a density function f(z; 8) with full support; 0 is a vec- tor of mean parameters and hence f (z; 8) = f(z - 0; 0) for all z, 0; and f(z; 0) = f(-z; 0) for all z (see the Appendix).

Theorem 2 describes a rather limited form of asymmetry in Q that arises when groups differ according to the values

O1, ... ,OP. Some stronger forms of asymmetry, relevant to

0 with distinct elements 0, < 02 <'.

< O,, can be easily defined:

(a) Qii

> Q for all j : i and all i (b)

Qii > Qi,p+1-i for all i.

But neither of these forms of asymmetry is valid gener- ally. If groups have equal variances and zero covariances, then, under normality, 6i = 6i + zi with z, ... z, iid, and if 01 < 2 <... < p then (a) holds, as does the somewhat weaker (b). Yet restriction (a) easily fails, for example, with p = 3, = e (1, 2, 3)', f diagonal, 11 = 12,, = 1, 22 = 2, and e > 0 sufficiently small, because then Q22 < 21 Q23. Restriction (b) fails in the case where p = 4, group i= 2, 0 = (-1, 0, 0.1,0.2)', a,11

= 22 = = 1, 13

" = 14 = 0,

fl12 = 1223 - 24 = .45, and f34 = -.45. Here 01 < 02 <3 64, and yet Q22

<-23, and the idea is that for the difference

vector z = (6, - 02, 63- 62, 64 - 62), the mean vector /

of z has 1, < 0 and

4.2 and pC3 close to 0, whereas the covariance

between z2 and z3 is negative and that between the remaining z's is close to 0. As a consequence, z is rather likely to lie in the region z, < 0 and is more likely to lie in the regions

This content downloaded from 185.2.32.121 on Sun, 15 Jun 2014 11:39:24 AMAll use subject to JSTOR Terms and Conditions

150 Journal of the American Statistical Association, March 2003

(zl < 0, Z2 < 0, z3 > 0) and (zI < 0, Z2 > 0, z3 <0) than in the region (zI < 0, z:2 > 0, z3 > 0), giving rise to Q23 > Q22. Using the method described in Section 4, the approximation Q22 = .30, Q23 = .52 is obtained.

The results of Theorems 1 and 2 have been interpreted as

consequences of symmetries in the normal distribution. This geometric interpretation can be further developed via the fol-

lowing result, which is of independent interest.

Proposition 1. Let z be a d x 1 random vector, for d > 2, and suppose that z is multivariate normal with density f (z; t, V), for some mean vector /L and some invertible variance-covariance matrix V. Let v* = sup, f(z; At, V), and let A be a measurable subset of Rd. Then

P(z e A) = f A(Cc(v)n A)/vc(v) dv, (2)

where Ec(v) is the level set {z: f(z; A, V)= v}, a hyper- ellipse of the form {z: (z - I)'V-I(z - l) = c(v)}, where

c(v) = -21n(v) + (d/2)(ln(2T) )+ln(I VI)) and A(Ec(v)n A) is the surface area of 6c(v) n A.

Surface area, as referred to in Proposition 1, is a concept originally defined for objects like the sphere residing in d = 3 dimensions. However, it is standard to extend this definition to

any d > 2 dimensions (see Price 1982 and the Appendix), and for d = 2, surface area coincides with the arc length concept.

Choosing z in Proposition 1 to be the (p - 1) x 1 vec- tor (Oi -Ok, k= 1 ..., i- 1, i 1 ..., p)', for each i and j,

QJ = P(z E A) for A some orthant of RP-', and Q can be

interpreted geometrically in terms of orthants, hyperellipses, and the surface area of their intersection. To reaffirm The- orem 1 in the nontrivial cases p > 3, observe that under 0 homogeneity, the vector z is normal with - = 0, and for each orthant A of Rp-1, A(8p nA) = A(v, n-A) due to the symme- try of the hyperellipse about its center. Applying Proposition 1 leads to Theorem 1 for p > 3. For Theorem 2, with p > 3 and strict ordering 0, < 02

<-" < Op, for i= 1, t<0 and

A(e n A+) < A(8, n A-), where A+ and A- are the positive and negative orthants. Hence

Qlp < Q11. Qp1

< Qpp is deduced

similarly reaffirming Theorem 2 for p > 3.

4. ESTIMATION

To estimate the (objective, frequentist) ranking probabil- ity matrix Q, it is convenient to suppose that the historical and planned datasets are both balanced samples (from the same underlying population), with the historical sample hav-

ing n observations per group and the planned sample having q observations per group. With each probability Qij describ-

ing the likelihood of a particular ranking among statistics

019 ....

f,p computed on the future sample, a bootstrap esti- mation method is to use empirical frequencies of the various

rankings over the size q subsamples of the historical sample. This approach imposes no distributional assumptions on 0 and thus is nonparametric and consistent under general conditions, whereas in Section 3 the normal distribution was considered

specifically. If normality is a reasonable assumption for the planned vec-

tor 0 of statistics, then parametric estimators of Q can be

usefully considered. With a parametric approach, greater effi- ciency may be achieved; moreover, in Section 3 the connection between Q and the population means and (co)variances under normality was described at some length. For an estimator Q that itself is obtained from an estimated normal distribution for 6, the preceding discussion can be applied to Q and the estimated moments. Hence such estimators Q can be usefully interpreted in terms of moment estimates.

With the parametric approach, a role for Bayesian uncer- tainty about population moments can also be provided. To proceed, it is convenient to define W

- = qRf and =

(0, ... Op, (1ij, i = j .... p), j = 1 .. p)'. Then 0 is a

(p +p(p + 1)/2) x 1 vector, and given 4 and q, 0 and fl can be computed. Define m(.) : (0, f) = m(o) (with dependency of m on q surpressed), in which case m(.) is differentiable. Then a mixture-of-normals class of estimators,

=f g(m(4))dF(4O), (3)

is specified with g as previously defined and with F a cumu- lative distribution function that represents the investigator's beliefs about the true value 0* of 4), given available data. Sup- pose that F converges to point mass at 4)*, as n -+ ,oo

F(O) - 1{to.< 0, (4)

in probability, for all 4, with 1ly.

the indicator function and where for any two vectors v and w of the same dimensions, v < w if and only if vk I< wk for each index k. To form such beliefs, one typically has a consistent estimator 4 of 4p, such as a maximum likelihood estimator, based on historical data. Two examples of parameter beliefs F follow.

Plug-In F. Let F(O) = 1l{,<y.

Then F is a discrete dis- tribution reflecting empirical information (in 4), but no prior information about 0*. The resulting Q = g(m(O)) is a plug- in estimator that can be interpreted in terms of moment esti- mates 4 via the preceding discussion (and see Sec. 6 for an application).

To illustrate the plug-in estimator, suppose that serially independent normal populations are ranked via future sam- ple averages 6, and that 0 consists of historical sample aver- ages and

f- = f-t/q, with fOt the historical sample's variance-

covariance matrix. Then, with p = 2 groups, a plug-in normal

estimator Pll = P(1 < 02) = F*( (62

-- 01)/

I Ii 2)/q)+

Sl2 1- Q1, and so on, is obtained. Because the ratio

(02- 61)/V (gl + f2)/q actually has a student t distribu-

tion (centered at the population value), rather than a normal distribution, the use of normal F* somewhat understates the

uncertainty in this ratio. The resulting biases in estimating Q diminish as the historical sample size n increases. If the pop- ulations are serially dependent and covariance stationary, then f1 is specific to q.

Truncated Normal F. Let F(O) be continuous with a

probability density function

Sf( ; ), )1 ts h(o; ,V) =

f f ( ; , 1n e) I dE

This content downloaded from 185.2.32.121 on Sun, 15 Jun 2014 11:39:24 AMAll use subject to JSTOR Terms and Conditions

Gilbert: Rankings for Groups with Heteroscedasticity and Correlation 151

where f(4; 4, V/n) is multivariate normal, S is the set of p x p symmetric positive-definite matrices, and

V -A AB (5)

for suitable matrices A and B, which vary according to the application.

To illustrate, once again let 0 consist of population means and 0 consist of sample averages over a future sample of size q. Let xi, i=

1,.... p, t = 1 .... , n be a random historical

sample, with 6 = X the historical sample mean vector, and fl = Oft/q. Then xi, i=

1,...., p are each data series of length

n, and Au

can be specified as o(xi, (- k)(Xm )) for (k, m) such that = ~ km, and with

c(, .) sample covari- ance. Likewise, Bi can be specified as 5((xa - a)(Xb - Xb), (Xk- k)(X- Xm)) for (a, b) such that 4q = lab and (k, m) such that 4)j = lm. If the data x are thought to be

normally distributed, then greater efficiency arises with A = 0 and B specified in terms of '5 (xa, xb) for various a and b via standard formulas (e.g., Muirhead 1982, p. 90).

For large historical sample sizes n, in standard asymptotic theory the parameter estimator 4) converges to normal. Hence the truncated-normal distribution can be considered a consis- tent approximation to the distribution of 4. The distribution F for 4 arises if, by analogy, the asymptotic distribution of 4 is conferred to that of 4. The truncated-normal can also be interpreted as a large-n approximation to the normal-Wishart distribution, in which case the truncated-normal is approxi- mately a Bayesian posterior distribution for 4 with a diffuse conjugate prior (as in Broemling 1985, eqs. 8.10-8.11, with inverse-diffuseness matrices each set to 0), and Q is the result- ing Bayesian predictive distribution for 0. An exact normal- Wishart distribution can be used for 4 (see Gupta and Miescke 1996; Chick and Inoue 2001 for related work on Bayesian ranking selection problems), but this typically requires nor- mality of the data x.

With truncated normal parameter beliefs F and a condition- ally (on 4)) normal distribution for the planned statistic 0, the unconditional distribution for 0 is nonnormal. Consequently, our discussion from Section 3 cannot be directly applied to further interpret the estimator Q obtained from truncated nor- mal F. In this sense, the plug-in estimator has an advantage over the truncated normal for the purposes here, and it is easier to compute. In contrast, a truncated normal or normal-Wishart F may produce an estimator Q with less finite-sample bias than the plug-in.

Having described two examples of the mixture-of-normals estimator Q defined by (3)-(4), the discussion now turns to large-sample consistency of this estimator. If the planned statistic 6 is to be computed using a fixed number q of obser- vations per group, and if the number n of observations used to compute historical statistics 4 increases without bound, then, under belief convergence (4), Q consistently estimates Q. To show this, it suffices to note that g(.) is bounded and differ- entiable (established earlier) and m(.) is also differentiable, in which case a standard limit theorem (such as theorem 29.1 of Billingsley 1995) can be applied to the sequence of integrals f g(m(4))dF,(4), n = 1, 2,...

If the normal distribution for 0 is appropriate only as an asymptotic (large q) approximation, then for consistent (large n) estimation of Q, both n and q must increase appropriately. Under belief convergence of the form (4), convergence of the estimator Q is then generally complicated by changes in both the relevant integrand in formula (3) and the measure with which integration takes place, thwarting standard limit the- orems. In the case of the plug-in estimator, the situation is somewhat simpler.

Theorem 3. For the plug-in estimator Q, let the planned sample size q = n' for some ar in (0, 1/2), let

/-(6 - 0*) con-

verge in distribution to N(0, Wit) as q -+ oo for some invert- ible it, let fi = f1t/q, with it a positive definite symmetric matrix, and let 1_n() - 4*) be bounded in probability. Then, as the historical sample size n -+ 0o, Q and Q both converge to a common limit Q*.

Hence, for consistency of the plug-in estimator, it suffices that the planned sample size q increase with the historical sample size n, but not "too fast". Also, practical considera- tions may also affect choice of q. For economic growth rates (Sec. 2), predictions of future growth rates over horizons q of more than 5 years may be considered quite unreliable (for relevant discussion, see Clements and Hendry 1999).

Theorem 3 requires that the ratio q/n of planned and his- torical sample sizes approach 0. To illustrate why, consider the scenario used earlier to illustrate the plug-in estimator. If

0* = 0* (and hence Ql,

= 1/2), and if q -+ -c and q = cn for some c > 0, then Q,, converges in distribution to F*(f/fw), with w a standard normal variable. Consequently, Q fails to converge in probability to the constant matrix Q when q = c n, whereas if 60* # 0 , then Q does converge to Q.

Mixture-of-normals estimates Q can be computed through existing formulas in the case where p = 2 (as earlier) for var- ious choices of F, but not so in general, for the same rea- sons that Q itself is not generally computable in closed form. Instead, either numerical integration (quadrature) or simula- tion methods can be applied. For a given amount of com- puter time, the former method typically yields more accurate results for small p (<3, say), whereas for larger p, simulation or a mix of quadrature and simulation may be preferred (see Dai 1996; Dai, Cassandras, and Panayiotou 2000; Robert and Casella 1999, chap. 3).

To compute parametric Q via numerical integration, first note that with Q1, the probability that 6i takes rank j,

Q, = U P(Ok, < Ok2 <

Okp)' (6)

k:kj=i

where k is any p x 1 vector with distinct entries each taking a value in { , . ..., p}. It is useful to reexpress Q as

Qij =U P(nm -, - o > On nm {e - , > 0}1), (7) A

where A is any partition {fA, B} of the numbers 1, 2,...., i- 1, i + 1

...., p into two sets, A and B (one of which may be

the empty set 0), such that A contains j - 1 elements and B contains p - j elements.

This content downloaded from 185.2.32.121 on Sun, 15 Jun 2014 11:39:24 AMAll use subject to JSTOR Terms and Conditions

152 Journal of the American Statistical Association, March 2003

For the plug-in estimator Q based on plug-in F and normal

0,

Qi j Jf (z; m,W, A) dz, (8) A o1>

where z is a (p - 1) x 1 vector and f(z; A, WA) is the mul- tivariate normal density with mean vector mA and variance- covariance matrix WA given by

mA = CA O, WA = CA f CI, (9)

by setting 0 = 0 and f-

= fl, with (p - 1) x p matrices CA defined as follows. Let KA,,, ... KA,j-1 be the elements of A arranged in increasing order, and let K,,.., K,,,_j be the elements of B arranged in increasing order. Define, for l= 1,..., p-1 and m= 1,...,p,

1 for 1 < j and m= i -1 for 1 > j and m = i

CA1m = -1 for l<j and m = KAl

1 for I j and m = K,1-j+1l 0 otherwise.

The proposed computational method would appear to involve many steps, but in practice one need not compute all p2 ele- ments of the (bistochastic) matrix Q, but instead compute just p2 - 2p + 1 of them and obtain the rest via simple arithmetic.

To illustrate, in the case where p = 3, each matrix CA is a 2 x 3 matrix, and for group i and rank j = 1 there is a unique partition A, with A = 0, B = {2, 31, KA = 0, K, = (2, 3) and

C{0,{2,3}} = 2 0 1

It suffices to compute four probabilities, say Q11, Q13, Q31 and

Q33, and infer the rest, Q12 = 1 - Q - Q13, and so on. For the first four probabilities, the foregoing computation method is used, which requires integration of a normal bivariate den- sity over an unbounded region. Available quadrature/numerical integration routines can yield quite accurate results (see Genz 1988; Evans and Swartz 1995). In particular, the author use, and can recommend the Mathematica (version 4.0 or higher) NIntegrate routine (with default settings that use an adap- tive algorithm, recursively dividing the integration region as needed), but other routines (available in Fortran, Mathcad, etc.) are also fully capable, as is the recent (and free) soft- ware developed by Alan Genz at Washington State University. The author has written Mathematica programs, available on request, which work for data with p = 3 and p = 4 groups.

For the Bayesian estimator Q based on truncated-normal F and normal 0,

-1 = f f(z; mA,

WA) h(aA; aA, DA) daAdz, (10)

A o

where aA = (m', vech(WA)')' and DA = E(V/n)E', with EA solving aA = EA 4. Specifically,

E A

FoI

with FA the (p - l)p/2 x p(p + 1)/2 matrix given by

FA = Gp-, (,'-1 CA) (CA C IPg HP,,

where for any k > 1 and square k x k matrix M, Gk and Hk are defined as the solutions to vech(M) = Gkvec(M) and vec(M) = Hkvech(M), with vech(M) the stacked lower- triangular portion of M. The computations required for the Bayesian Q are obviously greater than for the plug-in, but Q is once again bistochastic, and hence only p2 - 2p + 1 of its elements must be computed. In the case where p = 3,

1 0 0 0 G2 = 0 1 0 0 and

0 0 0 1

1 0 0 0 0 0

H3 0 0 0 1 0 0, 0 0 0 0 1 0 0 0 1 0 0 0 O 0 0 0 1 0 0 0 0 0 0 1

and computing (10) involves performing seven-dimensional numerical integration over an unbounded domain. In this prob- lem, nonconvergence of simple adaptive quadrature routines is commonly encountered; thus quadrature based on Monte Carlo or quasi-Monte Carlo sampled points for domain sub- division is recommended instead.

5. INFERENCE

In addition to point estimates of the ranking probabil- ity matrix Q, standard errors as well as confidence intervals and hypothesis tests can be obtained for (the elements) of Q. Hypotheses specific to the ranking probability matrix Q include

HilQij= 1/p for j= 1,..., p,

which for group i is the hypothesis of uniformity in the prob- ability of ranks j = 1,

.... p. In the example from Section 2,

Hi* asserts that for each of the ranks 1, 2, and 3 there is equal probability that the ith country's economic growth rate takes on that rank, consistent with modern endogenous growth the- ory. Among the alternatives to Hi, in the case where i = 2 (Mexico, the developing country) is the (neoclassical) alterna- tive that Mexico is "catching up" to Canada and the United States, causing it to grow faster, with Q23 > Q22 > Q21. Alter- natively, Mexico may experience more volatile growth rates that fluctuate about the same long-run mean as that of the developed countries, causing Q21 > Q22 and Q23 > Q22.

Linear restrictions, such as uniformity, can be expressed as Avec(Q) = 0 for some c x p2 rank c matrix A and some number c of restrictions. For uniformity hypothesis Hi*, c =

p- 1, Aki- 1, and Aki+kp = -1 for k = 1... r, and Ak, = 0 for all other (k, m). A Wald statistic for hypothesis testing is then

W = vec(Q)'A'(ALA')-' A vec(Q), (11)

This content downloaded from 185.2.32.121 on Sun, 15 Jun 2014 11:39:24 AMAll use subject to JSTOR Terms and Conditions

Gilbert: Rankings for Groups with Heteroscedasticity and Correlation 153

with L a covariance estimate (described further later in this section) for vec(Q), such that W is chi-squared in large sam- ples under the null hypothesis.

Inference for Q, including standard errors, confidence inter- vals, and hypothesis tests, can be carried using either the non- parametric estimator or one of the mixture-of-normals estima- tors Q discussed in Section 4. For the latter case, specifically the plug-in normal estimator is treated. The corresponding but more complicated theory for truncated-normal and normal- Wishart estimators, and so forth, is deferred to future work.

For the nonparametric estimator Q, obtained by counting frequencies of bootstrapped rankings from the historical sam- ple, under standard conditions ,/Ivec(Q - Q) is asymptoti- cally normal N(0, L) for some L, and standard errors, con- fidence intervals, and hypothesis tests for Q can be obtained via a consistent estimator L of L. To this end the bootstrap can be applied resampling from the data to construct statis- tics and their rankings, and then obtaining L via the variance- covariance matrix of the simulated rankings' indicator series. If the data series are considered to be serially independent, then simple resampling (without replacement) can be used; otherwise block-wise resampling can be used.

Under normality (assumption 1), in the case where p = 2, the hypotheses H* and H2 are equivalent to the restriction

08 = 02, but for p > 3, uniformity H/* for a specific group i is not equivalent to some restriction on 0 alone, but instead is a rather complex restriction on both 0 and fl. For instance, if p = 3 and 08 = 0, 62 = 1, and 83 = 2, then HP can be achieved with fli 1, = f22 = c, 33 - 1, and lij = 0 for all i j, for some c > 1. In principle, to test such hypotheses, 0 and fl can be estimated under the hypothesized restrictions via maximum likelihood and compared to unrestricted estimates via a likelihood ratio test; however, computing the restricted set of 0 and f1 is itself a nontrivial task.

For inference based on the plug-in normal estima- tor Q, define the matrix of partial derivatives F =

d vec (g(m( )))4=_.. Theorem 4. Let normality (assumption 1) of 0 hold for a

planned sample of fixed size q, and let /-i(4 - P) converge in distribution to N(0, V) as the historical sample size n -- 00,

for some V. Then vec -n(Q - Q) converges in distribution to N(0, L) as n --+ oc, with L the matrix FVF'.

Note that Theorem 4 does not assert that L is invertible, and this cannot be verified analytically, yet there is no reason to doubt invertibility, nor have problems of invertibility been encountered in carrying out the plug-in Wald test. As a practi- cal matter, the practitioner can examine the invertibility issue empirically (see, e.g., Gill and Lewbel 1992).

In comparison with the consistency results of Theorem 3, the normality results for the vector V-vec(Q - Q) in Theo- rem 4 are more specialized, because they require that 0 be normal for fixed q, rather than just asymptotically normal (q increasing). Moreover, the nature of the testing problem thwarts attempts to obtain normality of Vnvec(Q - Q) when q increases with n. One reason for this failure is that as q- 00, the elements of Q can approach 0 or 1 (e.g., the boundaries of the relevant parameter space), in which case asymptotic normality of Q breaks down. There is a second

Table 1. Simulated Rejection Rates, Tests of Ranking Probabilities

Nominal test size Horizon Sample q n 10 5 1

1 30 .10 .07 .03 40 .08 .07 .02 50 .07 .05 .01

100 .08 .04 .01 5 30 .10 .08 .05

40 .10 .08 .04 50 .10 .08 .04

100 .06 .05 .02

potential problem regarding asymptotic normal approximation to the distribution of J/(8 - 6). The exact error involved in such approximations is typically unknown, but if a rate

1/1/- is assumed for convergence of vec(Q - g(O, flt/q)) to 0 (in a suitable norm), then, because I-nvec (Q - Q) =

,/-nvec (Q - g(O, ft/q) + g(6, 1ft/q) - Q), if q -+ 00 and q/n -- 0 as in Theorem 3, and if ,/nvec (Q - g(O, fl/q)) converges to normal, then f-n(Q - Q) diverges. Consequently, nonnormality of 0 can create special problems for asymptotic normality of -/nvec(Q - Q).

To use Theorem 4 for constructing standard errors, confi- dence intervals, and so on, estimates of matrices F and V are needed. The estimators V were discussed in Section 4; esti- mating F uses the plug-in estimate F = vec (g(m( )))= , which is consistent because 4 is assumed to be consistent and both m and g are differentiable functions. Computing F uses numerical methods based on differences Ck= g(m(0 + 5k))) - g(m(h)), k = 1 . . . , d, where d = p + p(p + 1)/2 and 5k is the p vector with Skk = ak and kj = 0 otherwise, for suitably small numbers ak (which in the computations herein is specified as ak =

-k/Vl ). Then F is approximated

by (vec (C1)/a, . .., vec (C)/ad); computing this involves using g(m(4)) = Q obtained earlier by numerical integra- tion (Sec. 4), and g(m(4))) is obtained similarly for = 0+0k, k- 1 ...1 d.

A simulation study was conducted to gauge the finite- sample accuracy of the asymptotically valid inferences pro- vided by the plug-in parametric tests. For p = 3 groups and prediction horizons q = 1 and q = 5, pseudorandom data xit, i = 1, ... p, t = 1 ... , n, iid standard normal, for n = 30, 40, 50, and 100, were generated. For each sample design, 500 simulation rounds were conducted, and results were recorded. The uniformity hypotheses Hi7, i = 1, 2, 3 were tested; Table 1 reports the frequency of test rejections (pooled across the three null hypotheses) at the 10%, 5%, and 1% nominal (chi-squared test, 2 degrees of freedom) significance levels. As indicated, the finite-sample size of the tests is pretty close to the theoretical size for all samples considered.

6. EXAMPLE, CONTINUED

Figure 1 plots the historical time series of annual growth rates. The objectives are to gauge the probability of coun- try rankings for economic growth rates in the future and also to address the reasonableness of particular theories such as equiprobable rankings for a given country. The sample stan- dard deviation of economic growth is much higher for Mexico

This content downloaded from 185.2.32.121 on Sun, 15 Jun 2014 11:39:24 AMAll use subject to JSTOR Terms and Conditions

154 Journal of the American Statistical Association, March 2003

** * *

2.5 *

2-.5

A \ * a *

1I70 o1975 1 0 19 8 5: 1995

-2.5

-7.5

Figure 1. Economic Growth Rates, Annual. Depicted are annual growth rates for Canada (4), Mexico (*), and United States (N).

than for the other countries (see also Fig. 1), and there is high correlation for Canada and the United States; hence the data here are consistent with the intergroup heteroscedasticity and correlation features discussed earlier.

Because interest lies in the distribution of future economic growth rates, there is in principle a role for conditional means

0 and variance/covariances fl for some conditioning variables observable historically and useful in predicting the planned statistics 0. However, autocorrelations for annual economic growth are not generally significant at any lag (with p values for the F statistic of the fifth-order autoregression of .88 for Canada, .86 for Mexico, and .21 for the United States), and hence conditioning on recent years' growth rates is of limited use, and instead conditioning proceeds under the assumption that the data are iid over time.

Table 2 reports estimated ranking probabilities, using the historical sample (n = 34), three estimation methods, horizons q = 1 and q = 5 years, and statistics 6 and f1i consisting of sample means and covariances. Both horizons are of economic interest, and for the parametric estimation methods, the longer horizon introduces to the planned statistic 0 a degree of con- vergence toward the assumed normal distribution. For the data herein, Table 2 indicates that each of the 3 x 3 probability

Table 2. Estimated Ranking Probabilities

Estimation Parameter Horizon method beliefs q Rank Canada Mexico U.S.

Parametric Plug-in 1 1 .26 .46 .28 2 .45 .12 .43 3 .29 .42 .29

5 1 .25 .48 .26 2 .45 .12 .43 3 .30 .39 .31

Parametric Truncated- 1 1 .29 .40 .31 normal 2 .45 .23 .32

3 .26 .37 .37 5 1 .24 .54 .22

2 .45 .18 .37 3 .31 .28 .41

Nonpara- 1 1 .29 .50 .21 metric 2 .41 .03 .56

3 .30 .47 .23 5 1 .26 .48 .26

2 .46 .12 .42 3 .28 .41 .32

matrices Q satisfies (1), that is, lies in the interior of the set of doubly stochastic matrices. The plug-in normal paramet- ric estimate is numerically quite similar to the nonparametric bootstrap estimate. The mixed-normal estimate Q (based on truncated normal beliefs about 4*) is roughly similar to the plug-in estimate but tends to be closer to uniform (1/3 proba- bility for each country-rank pair), consistent with the idea that the mixed-normal estimation method puts more importance on sampling error in the historical statistics 06, 62, and 83. Over- all, a common finding is that for Mexico, each method puts higher probability on ranks 1 and 3 than on rank 2.

For the plug-in normal estimate of ranking probabilities, the discussion of normal rankings from Section 3 (includ- ing Theorems 1 and 2) can be drawn on to further interpret the results in terms of historical means and covariances. The three growth rate series have distinct historical means (with Mexico < United States < Canada), and if they also had exactly equal sample variances and exactly zero covariances, then the estimate Q would show that for Canada, rank = 3 (highest growth rate) gets the highest probability; for Mex- ico, rank = 1 gets the lowest probability; and for the United States, rank = 2 gets the highest probability. However, the groups have substantial heteroscedasticity and correlation, and for each horizon q, both Canada and the United States achieve highest probability at rank = 2! This is consistent with the tendency of Mexico's economic growth rate to exceed or fall below both of the other countries' rates. If the sample mean values were all identical regardless of the sample covariances, then from Theorem 1 it could be concluded that each country has an equal estimated likelihood of rankings 1 and 3. This

prescription is rather close to the observed plug-in estimates, consistent with limited between-group variation in 0. Given the nonzero observed between-group differences in mean, it can be concluded correctly from Theorem 2 that the slowest- growing country (Mexico) has the highest probability (.46 at q = 1, .49 at q = 5) of future slowest growth rate (rank = 1), whereas the fastest-growing country (Canada) has the highest probability (.29 at q = 1, .30 at q = 5) of future fastest growth rate.

Table 3 reports the results of Wald tests for the hypothesis of equiprobable rankings for each country. These results sup- port the impression made earlier that future economic rank- ings (1, 2, and 3) are not reasonably equiprobable for Mexico, while deviations from the equi-probable ranking hypothesis are milder for Canada and the United States.

Table 3. Tests of Ranking Uniformity, By Group

Estimation method Horizon q Canada Mexico U.S.

Parametric 1 Wald 1.63 65.72 .62 (plug-in) p value .44 0 .73

5 Wald .60 27.07 .06 p value .74 0 .97

Nonparametric 1 Wald .84 106.79 6.95 p value .66 0 .03

5 Wald 2.14 14.74 1.30 p value .34 0 .52

This content downloaded from 185.2.32.121 on Sun, 15 Jun 2014 11:39:24 AMAll use subject to JSTOR Terms and Conditions

Gilbert: Rankings for Groups with Heteroscedasticity and Correlation 155

7. CONCLUDING REMARKS

This article has investigated the decision problem of how to form probability assessments for future group rankings, given that groups may be rather complex in their specification. As a formal framework, it has been assumed that rankings are made via sample statistics that have a joint normal distribu- tion, and the impact on the ranking probabilities of values taken by the normal distribution parameters, including means, variances, and covariances, has been examined. Methods for

estimating the true (objective, frequentist) ranking probability distribution given historical data and for developing inferences about the ranking probabilities have been developed. An eco- nomic example demonstrates the feasibility of the proposed methods.

The proposed methods rely on normality of sample statis- tics, either in finite samples or asymptotically. It would be

interesting to obtain results for different statistics (e.g., bino- mial, Student t, chi-squared). Particularly interesting to fur- ther understanding of the general decision problem of ranking probability assessment, are results that are robust to a vari- ety of distributional specifications. Also, although alternative

Bayesian approaches to the ranking problem have been dis- cussed in some detail, much more can be done.

APPENDIX: MATHEMATICAL PROOFS

Proof of Theorem 1

To proceed, define the p x p matrix ij = 6, - Oj, i, j = 1 . p, and statistics Oij = i - 6, i, j 1. . ., p. To establish column sym- metry Qij = Qi, p+-j of Q, use the following fact.

Lemma A.]. If normality (assumption 1) holds and 8, = 02 O=

.= Op, then the distribution of q1 is the same as that of -f.

Proof. Under normality, vec(o) is distributed multivariate nor- mal, N(a, V), with a = vec(f) and variance-covariance matrix V

consisting of elements cov(0i - 0, k - 0m), for i, j, k, m = 1,

.... p.

Also, observe that vec(-/) is multivariate normal, N(-a, V). Under the restriction 0, = 02 = = Op,

0 = 0 and so a = 0, in which case the distributions of vec(q) and vec(-f) are the same.

Note that the conclusions of Lemma A.1 are not limited to the normal distribution, but are valid as long as vec(q) has a (cumulative) distribution D(.; a) and vec(-f) has a distribution D(.; -a), for some family D(., a) of distributions parameterized by mean vector a.

To apply Lemma A.1, first note that because Qij is the probabil- ity that 0i has rank j among 61., . p, Qij

= ZRj P(maxkR, 0k

<

6i < minkeRx

Ok), where (Rj, R ) is any partition of (1 ..., j- 1, j+ 1,..., p} for which

R9 contains j- 1 elements and Rj contains

the remaining p - j elements. Therefore, Qi = >ij P(maxkERj fki < 0 < minkeRY ~ki). From Lemma A.1,

/ and -, have the same dis-

tribution, and hence (maxkCR fki, minkCR* ki) has the same distri- bution as (maxkeR -ki, minkeR* -fki), for each i, j, Rj. Moreover,

maxkeR fki < 0 if and only if maxkeR. --fki

> 0, and minkeR* fki > 0

if and only if minkeR-*

ki < 0, for each i, j, R9.

Consequently,

Qij= -Rj

P(maxkeR, ki > 0 > minkeRF ki)= Qi,p+-j.

Proof of Theorem 2

This builds on the arguments used to obtain Theorem 1.

Lemma A.2. Let z be a d x 1 random vector with mean vector

/i. For each bt, let f(z; /t) be a probability density function for z such that f(z; /Ct) > 0 for all z and /t, and f(z; 0) = f(-z; 0) for all z. If /t > 0, then P(z > 0) > P(z < 0), where for any vector v, the inequality v > 0 means that vi > 0 for each element vi of v.

Proof. If At = 0, then the distribution of z is that of -z. Hence, with P* denoting probability under the f(z, 0) density, P*(z > 0) = P*(z < 0) = c for some c that is > 0 because the density is strictly positive. For /t > 0, P(z > 0) = P*(z > -/-) > P*(z > 0) = c, whereas P(z < 0) = P*(z < -A) < P*(z < 0) = c, and hence P(z > 0) > P(z < 0).

Observe that with the choice z = vec(f), the assumptions of Lemma A.2 suffice for those of Lemma A. 1 but are stronger, because they assume existence of a density with full support.

The claims in Theorem 2 pertain only to the first and last columns of Q. To proceed, first suppose that z = (02 - 6,..., - 01)'. With

01 <... < Op,, z is multivariate normal N(jt, V) with L > 0. More-

over, because fl is considered invertible, V is also invertible, and hence z has a density function f(z; IL) satisfying the assumptions of Lemma A.2. Conclude that P(z > 0) > P(z < 0) or, equivalently, Q11 >

Qp. To show that Q,, > Q,,, define z = (01 - Op .... Op-1

-

Op)', and proceed analogously.

Proof of Proposition 1

To begin, specify the surface area for subsets of hyperellipses. Recall that if a d x 1 random vector is normal

N(/L, V) (or, more generally, elliptical), then the level sets of the corresponding prob- ability density function are hyperellipses 8 = {z : (z - /t)'V-'(z -

/t) = c} for some c > 0. For each point z on 8, let z = h(S) =

/t + V1/2/c(s( ) ....,

Sd-I(S)) for some " = (k, . .. ~ d-_)

E [0, 2T]d-1, with s1, ... ,sd-1 the spherical coordinates for the unit sphere in Rd (Price 1982, p. 461). Then dh/dy =

V'/2/.fds/d , and for each Lebesgue-measurable set S C 8, let As be the set of " for which z = h(a) for some z E S. Then define the surface area (Price 1982, chap. 45) A(S)=

~-Sas ||M[|d{, with M = Vl/2ds/di and

I M| =l'jkMi2k. With level sets R, = {z : f(z; t, V) = a} for the normal density,

c(a) solves Ra = {z : (z - u)'V-I(z - A) = c(a)}, in which case c(a) = -21n(a)+(d/2)(ln(27)r+ln(|VI)), and c'(a) = -2/a. Also, with fsf(z; /t, V)dz = ff(z; At, V)l1,fsldz, the Lebesgue integral of the nonnegative Lebesgue-measurable function f(z; It, V)llzEsl, fs f(z; /t, V)dz =

suphuH S h(z)dz, where H is the set of functions

h for which h(z) < f(z)l{z sI (for all z) and h(z) = E-, ak (lEAk for some L, with A,,...,AL subsets of Rd that are Lebesgue measurable and satisfy nr Ak = 0 and U_, Ak, = Rd. Let ak kv*/L and Ak = E8F nS, k = 1,

.... L, where e8 is the "shell"

of points in Rd with outer boundary (8c(aI) and inner bound-

ary 8c(ak), with a0 - 0. Then fs f(z)dz = limL E:- =, aL (k S), where

A(.) is Lebesgue measure. Moreover, for each e >

0 and each y E (0, 1) and each Lebesgue-measurable set S, s(8f n S) = A(k n S)(c(a,)- c(a,,))+ E and Ic(ak -

/c(ak-l)- (1/(2c(am))c(aok)v*/L) < e for each k > yL and all L sufficiently large. Consequently,

limLj Z>~yL ak(8( n S)=

limLoe Z,_zL (1/Vc(ak>)) (8c(at nA)

=a A(8~.(v) 0nS)/vc(v)dv. Moreover, because for each 6 > O, y > 0 can be chosen for which

Is f(z; L, V)dz - limLoo

t_~yL a/,/(8 n S) < 8, conclude that

s f(z; At, V)dz = fo* A(8,c(v) n S)/c(v)dv.

Proof of Theorem 3

First, some results for g(6, f) when f approaches the origin are established.

Lemma A.3. lima{og(, aB) exists for all symmetric positive definite matrices B.

This content downloaded from 185.2.32.121 on Sun, 15 Jun 2014 11:39:24 AMAll use subject to JSTOR Terms and Conditions

156 Journal of the American Statistical Association, March 2003

Proof. Suppose first that 01 = 02 = ... = p. Then g(O, aB) =

g(0, aB) = g(0, B), and hence lim,1o g(0, aB) = g(0, B). Next, sup- pose that 0, < 02 < ... < Op. Then, clearly, limao g(8, aB) = I. The

remaining case is that some elements of 0 have values that are dis- tinct from that of all other elements, and some elements have values in common with those of other elements. An example is 0 = (K', A')' with K ak x 1 vector such that K1 = ... = Kk, and A a (p - k) xl vector such that A1 <dA2 < ... < p-k, in which case

limg(0, aB)= A 0

aO 0 Ip_k

with Ip-k the (p - k) x (p - k) identity matrix and A =

limaO g(K, aV,), where VK is the upper-left k x k submatrix of V.

For other examples, arguments are analogous but may involve an ini- tial permutation of 0 elements, as well as a limit formula involving a matrix partition into more than four blocks.

To apply Lemma A.3, note that because /-(0

- 0) converges in distribution to multivariate normal N(0, Wt) as q

-- oo,

Q*- limq-o Q = liml/q,0o g(, (l/q)ft), which by Lemma A.3 is a well- defined limit.

To show that Q - Q converges weakly to 0, recall that Q has

typical element Qij of the form g(0, f1) = fEA f (x; IA,

fl)dz, with

f(x; /u, f) the normal density and with x e Aij representing the event

ri = j, as in the text. Moreover, the derivatives of g(O, fl) exist and

are of the form fxA Df(x; 0, f)dx with Df(x; 0, n) a derivative of

f with respect to an element of 0 or ni. Denoting M = 7-1 leads to

f(x; /x, f) = (27r)-P/2/det(M) exp((-1/2)(x - O)'M(x - 0)) and

d/d0g(8, M-') = -ME(l{xEAjj}(x - 0)), and with

(E(l{xEAi}l(xk -

0)))2 <P(x E Aijf)kk, Id/d0g(O, M-') I < Zi M'ibi ii, with

the 12 norm. Next,

aM g(0, M-1)

I a

( I

)det)(M) 2 1 tM P(x E Aij) - E( xeAij(Xi -

Oi)(xi -

Oj) ) --2 de t(M){x ij'

and because E(l{xCAij}(x -

Oi)(x -

Oj)) IP(x E

Aij)E((x, -

0i)2(x j -

O)2), the bound

g det(M)|

g(, M-) det(M) +E((xi-i)2 (xj 2) a ij

- det(M)

results. Also, E((xi - 0i)2(x- 0j)2)

< E(xi - )4E(xj -Oj)4 where E(xi -

i)4E(xj -

Oj)4= 3fiiIjj, because x is normal.

Hence

Smaxkm -vM det(M)|

-Mi

g(O, M-1) <

B(M) det(M)

+ -maxf kk"

dMij det(M) k

Let C(M) be the quantity max(B(M), /Iij MjEilsii).

Then,

using the bounds on the derivatives of g, the fundamental theorem of calculus leads to

Ilvec(Q - Q) l I - * max C(wM +(1 - w)M), U)w[O, 1]

with 4* the true value of ) (as in the text). With fl = ft/q, maxWE[0,1] C(wM + (1 - w)M) is of order O,(q), whereas I11 - 4* | is of order

Op,(l/VI ). Hence, because q = na with a E (0, 1/2), it

is concluded that Q - Q converges to 0 in probability.

Proof of Theorem 4

Under assumption 1, have Q = g(m(o)). With 1/n(

- 4)) 4 N(0, V), the Delta method (e.g., van der Vaart 1998, thm. 3.1 is

applied), omitting details because they are routine, to conclude that

_V-n(g(m()) - g(m(o))) is asymptotically N(0, FVF'), from which

the normality result for vec(Q) follows.

[Received January 2002. Revised September 2002.]

REFERENCES

Amato, L. H., and Amato, C. H. (2001), "The Effects of Global Competition on Total Factor Productivity in U.S. Manufacturing," Review of Industrial Organization, 19, 407-423.

Barro, R. J. (1991), "Economic Growth in a Cross-Section of Countries," Quarterly Journal of Economics, 106, 407-443.

(1996), "Determinants of Economic Growth: A Cross-Country Empir- ical Study," Working Paper 5698, National Bureau of Economic Research.

Barro, R. J., and Sala-I-Martin, X. (1995), Economic Growth, New York: McGraw-Hill.

Bechhofer, R. E., Elmaghraby, S., and Morse, N. (1959), "A Single-Sample Multiple-Decision Procedure for Selecting the Multinomial Event Which Has the Highest Probability," Annals of Mathematical Statistics, 30, 102- 119.

Bechhofer, R. E., Santner, T. J., and Goldsman, D. M. (1995), Design and Analysis of Experiments for Statistical Selection, Screening, and Multiple Comparisons, New York: Wiley.

Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley. Broemeling, L. D. (1985), Bayesian Analysis of Linear Models, New York:

Marcel Dekker. Chick, S. E., and Inoue, K. (2001), "New Procedures to Select the Best Sim-

ulated System Using Common Random Numbers," Management Science, 47, 1133-1149.

Clements, M. P., and Hendry, D. F. (1999), Forecasting Non-Stationary Eco- nomic Time Series, Cambridge, MA: MIT Press.

Dai, L. (1996), "Convergence Properties of Ordinal Comparison in the Simu- lation of Discrete Event Dynamic Systems," Journal of Optimization The- ory and Applications, 91, 363-388.

Dai, L., Cassandras, C. G., and Panayiotou, C. G. (2000), "On the Con- verge Rate of Ordinal Optimization for a Class of Stochastic Discrete Resource Allocation Problems," IEEE Transactions in Automatic Control, 45, 588-591.

Engel, E. M., Fischer, R. D., and Galetovic, A. (2001), "Least-Present-Value- of-Revenue Auctions and Highway Franchising," Journal of Political Econ- omy, 109, 993-1020.

Evans, M., and Swartz, T. (1995), "Methods for Approximating Integrals in Statistics With Special Emphasis on Bayesian Integration Problems," Sta- tistical Science, 10, 254-272.

Farmer, R. E. A., and Lahiri, A. (2002), "Economic Growth in an Interdepen- dent World Economy," Discussion Paper 3250, Center for Economic Policy Research London.

Forbes, K. (2000), "A Reassessment of the Relationship Between Inequality and Growth," American Economic Review, 90, 869-887.

Galbraith, J. K. (2002), "A Perfect Crime: Global Inequality," Daedalus, 131, 11-25.

Genz, A. (1998), "Stochastic Methods for Multiple Integrals Over Unbounded Regions," Mathematical Computer Simulation, 47, 287-298.

Gibbons, J. D., Olkin, I., and Sobel, M. (1977), Selecting and Ordering Pop- ulations: A New Statistical Methodology, New York: Wiley.

Gill, L., and Lewbel, A. (1992), "Testing the Rank and Definiteness of Esti- mated Matrices With Applications to Factor, State Space, and ARMA Mod- els," Journal of the American Statistical Association, 87, 766-776.

Gupta, S., and Miescke, K. J. (1988), "On the Problem of Finding the Largest Normal Mean Under Heteroscedasticity," Statistical Decision Theory and Related Topics IV, 2, 37-49.

(1996), "Bayesian Look-Ahead One-Stage Sampling Allocations for Selection of the Best Population," Journal of Statistical Planning and Infer- ence, 54, 229-244.

Gupta, S., and Panchapeakesan, S. (1979), Multiple Decision Procedures, New York: Wiley.

Hochberg, Y., and Tamhane, A. C. (1987), Multiple Comparison Procedures, New York: Wiley.

Hoppe, E M. (ed.)(1993), Multiple Comparisons, Selection, and Applications to Biometry, New York: Marcel Dekker.

Hsu, J. C. (1996), Multiple Comparisons, London: Chapman and Hall.

This content downloaded from 185.2.32.121 on Sun, 15 Jun 2014 11:39:24 AMAll use subject to JSTOR Terms and Conditions

Gilbert: Rankings for Groups with Heteroscedasticity and Correlation 157

Kim, S., and Nelson, B. L. (2001), "A Fully Sequential Selection Procedure for Indifference-Zone Selection in Simulation," Transactions on Modeling and Computer Simulation, 11, 251-273.

Lam, K. (1989), "The Multiple Comparison of Ranked Parameters," Commu- nications in Statistics, Part A-Theory and Methods, 18, 1217-1237.

Landes, D. S. (1999), The Wealth and Poverty of Nations, New York: Norton. Lucas, R. E. (1988), "On the Mechanics of Economic Development," Journal

of Monetary Economics, 22, 3-42. Muirhead, R. J. (1982), Aspects of Multivariate Statistical Theory, New York:

Wiley. Mukhopadhyay, N., and Solanky, T. K. (1994), Multistage Selection and Rank-

ing Procedures: Second-Order Asymptotics, New York: Marcel Dekker. Nelson, B. L., and Matejcik, F. J. (1995), "Using Common Random Numbers

for Indifference-Zone Selection and Multiple Comparisons in Simulation," Management Science, 41, 1935-1945.

Nelson, B. L., Swann, J., Goldsman, D., and Song, W. (2001), "Simple Pro- cedures for Selecting the Best Simulated System When the Number of Alternatives is Large," Operations Research, 49, 950-963.

Owen, D. B. (1980), "A Table of Normal Integrals," Communications in Statistics, Part B-Simulation and Computation, 9, 389-419.

Patel, J. K., and Read, C. B. (1996), Handbook of the Normal Distribution (3rd ed.), New York: Marcel Dekker.

Price, G. B. (1982), Multivariable Analysis, New York: Springer-Verlag. Radcliffe, J., and Rass, L. (1998), "Spatial Mendelian Games," Mathematical

Biosciences, 151, 199-218. Robert, C. P., and Casella, G. (1999), Monte Carlo Statistical Methods, New

York: Springer. Romer, P. M. (1990), "Endogenous Technical Change," Journal of Political

Economy, 98, S71-S102. Solow, R. (1956), "A Contribution to the Theory of Economic Growth," Quar-

terly Journal of Economics, 770, 65-94. Swann, T. (1956), "Economic Growth and Capital Accumulation," Economic

Record, 32, 334-361. Toothaker, L. E. (1993), Multiple Comparison Procedures, London: Sage. van der Vaart, A. W. (1998), Asymptotic Statistics, Cambridge, U.K.: Cam-

bridge University Press.

This content downloaded from 185.2.32.121 on Sun, 15 Jun 2014 11:39:24 AMAll use subject to JSTOR Terms and Conditions