
Chessboard Distributions and Random Vectors with Specified Marginals

and Covariance Matrix

Soumyadip Ghosh and Shane G. Henderson

School of Operations Research and Industrial Engineering

Cornell University

Ithaca, NY 14853, U.S.A.

July 6, 2001

Abstract

There is a growing need for the ability to specify and generate correlated random variables as primitive

inputs to stochastic models. Motivated by this need, several authors have explored the generation of

random vectors with specified marginals, together with a specified covariance matrix, through the use

of a transformation of a multivariate normal random vector (the NORTA method).

A covariance matrix is said to be feasible for a given set of marginal distributions if a random vector

exists with these characteristics. We develop a computational approach for establishing whether a given

covariance matrix is feasible for a given set of marginals. The approach is used to rigorously establish

that there are sets of marginals with feasible covariance matrix that the NORTA method cannot match.

In such cases, we show how to modify the initialization phase of NORTA so that it will exactly match

the marginals, and approximately match the desired covariance matrix.

An important feature of our analysis is that we show that for almost any covariance matrix (in a

certain precise sense), our computational procedure either explicitly provides a construction of a random

vector with the required properties, or establishes that no such random vector exists.


Introduction

There is a growing need for the ability to specify and generate random vectors consisting of correlated

observations as primitive inputs to stochastic models. For example, in a manufacturing setting, the

processing times of a single job at different stations may be correlated due to characteristics of the

job such as size. In determining reservoir release rules, the inflows of water to different reservoirs are

invariably correlated. In generating random test problems for a given algorithm, it is advantageous to

ensure that some elements of the problem are correlated (Hill and Reilly 1994, 2000, Hodgson et al.

2000). Further applications have recently been reported in cost analysis (Lurie and Goldberg 1998), and

in decision and risk analysis (Clemen and Reilly 1999).

Perhaps the “ideal” approach is to specify the full joint distribution of the random vector. This

approach is typically limited to situations where the marginal distributions are all from the same para-

metric family. For methods of this type see, for example, Devroye (1986) and Johnson (1987). But

the case where the marginals are not all from the same parametric family affords far greater modeling

generality, and is perhaps the case of more interest from a practical standpoint.

The primary difficulty in this case is that a tremendous amount of information is typically required

to specify (and fit) such a joint distribution. Furthermore, special methods must be devised to generate

random vectors with the given joint distribution, and this can be a practically insurmountable problem

for a model of even moderate complexity (Law and Kelton 2000, p. 479).

A practical alternative is to only specify the marginal distributions of the random variables, together

with the correlation matrix or covariance matrix. (Note that this information does not necessarily

uniquely specify the distribution.) The covariance measure could be Spearman’s rank covariance, Pear-

son’s product-moment covariance, Kendall’s τ , or any other convenient covariance measure. In this

paper, we will focus on Pearson’s product-moment covariance and Spearman’s rank covariance because

of their wide use and acceptance in application settings.

It is important to note that we are restricting attention to the generation of finite-dimensional random

vectors. As such, we are not attempting to generate a time series with given correlation properties. For

such studies, see for example Cario and Nelson (1996), Melamed et al. (1992), and Lewis et al. (1989).

Hill and Reilly (1994) describe a method for generating random vectors with specified marginals

and covariances through mixtures of extreme correlation distributions. The approach is very effective

for random vectors of low dimension (d ≤ 3 say), but the computational requirements quickly become

excessive for higher dimensional random vectors. There is another difficulty with this approach. We say


that a covariance matrix is feasible for a given set of marginal distributions if a random vector exists with

the prescribed marginals and covariance matrix. We show (Section 1) that there are sets of marginals

with feasible covariance matrix that cannot be matched using the technique developed by Hill and Reilly

(see Section 1 below).

Cario and Nelson (1997) described the “NORmal To Anything” (NORTA) method for generating

random vectors with prescribed covariance matrix. The NORTA method basically involves a component-

wise transformation of a multivariate normal random vector, and capitalizes on the fact that multivariate

normal random vectors are easily generated; see e.g., Law and Kelton 2000, p. 480. Cario and Nelson

traced the roots of the method back to Mardia (1970) who looked at bivariate distributions, and to Li

and Hammond (1975) who concentrated on the case where all of the marginals have densities (with re-

spect to Lebesgue measure). Iman and Conover (1982) implemented the same transformation procedure

to induce a given rank correlation in the output. Their method is only approximate, in that the output

will have only very approximately the desired rank correlation. Clemen and Reilly (1999) described

how to use the NORTA procedure to induce a desired rank correlation in the context of decision and

risk analysis. Lurie and Goldberg (1998) implemented a variant of the NORTA method for generating

samples of a predetermined size.

It is natural to ask whether the NORTA procedure can match any feasible covariance matrix for a

given set of marginals. Both Li and Hammond (1975) and Lurie and Goldberg (1998) give examples

where this does not appear to be the case. However, the random vectors that are proposed in these

papers as counterexamples are not proved to exist, and so the question has not yet been completely

settled.

For 2-dimensional random vectors, the NORTA method can match any feasible covariance matrix.

This follows immediately from the characterizations in Whitt (1976). However, for dimensions 3 and

greater, little is known.

In this paper, we prove that there are feasible covariance matrices for a given set of marginals that

the NORTA method cannot match. To establish this result we derive a computational procedure based

on linear programming for establishing whether or not a given covariance matrix is feasible for a given

set of marginals, and if so, explicitly providing a joint distribution with the required properties. We

call the constructed distributions “chessboard” distributions because of their structure; see Section 2.

It is worth noting that, at least in this paper, we are not advocating generating random vectors from

chessboard distributions. The approach is developed primarily to rigorously establish that NORTA can

fail. However, the idea of using chessboard distributions to generate random vectors with a set of desired


properties is a topic of current research.

Other methods for tackling the problem of generating random vectors with specified marginals and

rank covariance matrix have been developed. Chessboard distributions are perhaps closest in nature to

the “piecewise-uniform copulae” developed in Mackenzie (1994). (A copula is the distribution function

of a random vector with uniform marginals. The term was coined in Sklar 1959, and Nelsen 1999 is a

useful recent reference). Mackenzie (1994) attempts to identify a piecewise-uniform copula that matches

a given set of rank covariances. He assumes that such copulae exist, and then selects the one with

maximum entropy. In contrast, we do not assume this feasibility, develop the theoretical properties

of the approach, and apply it in the context of the NORTA method. Meeuwissen and Cooke (1994)

describe “tree-dependent” random vectors that can be rapidly generated, but cannot necessarily match

all feasible covariance matrices. Cooke (1997) introduces a generalization of tree-dependent random

vectors that is based on a “vine” representation of a joint distribution. Such random vectors can be

rapidly generated, but it is not yet clear whether they can be used to model any feasible covariance

matrix.

It is well-known that the set of feasible covariance matrices for a given set of marginals forms a convex

set (in a certain Euclidean space; see Section 2). Covariance matrices on the boundary of this set cannot

be matched using chessboard distributions, although chessboard distributions can get arbitrarily close

to such matrices; see Section 2.

Remark 1 In the case where all of the marginal distributions have densities with respect to Lebesgue

measure, the chessboard distribution we construct has a joint density with respect to d-dimensional

Lebesgue measure. In this case, we can, and do, refer to a chessboard density.

The philosophy of specifying marginals and correlations to model dependent random variates is clearly

an approximate one, since the joint distribution is not completely specified. Therefore, one should be

willing to live with reasonable (this is, of course, a relative term) discrepancies in the covariance matrix

from that desired. In cases where NORTA cannot precisely match a feasible covariance matrix, it is

still possible to use NORTA to obtain the desired marginals exactly, and the desired covariance matrix

approximately. Lurie and Goldberg (1998) gave an alternative approach to this problem, but we believe

that our solution has properties that make it more desirable; see Section 4.

If one is not willing to live with reasonable discrepancies from the desired covariance matrix, then

perhaps a more careful approach to specifying the dependence structure is warranted.

We view the primary contributions of this paper as follows.


1. We provide a computational procedure for determining whether a given covariance matrix is feasible

or not for a copula, i.e., for a random vector with uniform(0, 1] marginals. (In this case the rank

covariance and product-moment covariance are identical.) If the covariance matrix is feasible, then

an explicit construction of a joint density with these properties, that we call a chessboard density,

is provided. The method works for all covariance matrices that do not lie on the boundary of the

set of feasible covariance matrices; see Section 2. To the best of our knowledge, this is the first

example of such a procedure. This case is important because it is central to the analysis and use

of rank covariance for continuous marginals; see Section 1.

2. We provide a computational procedure for determining whether or not a given Pearson product-

moment covariance matrix is feasible for a given set of more general marginal distributions. If the

covariance matrix is feasible for the given set of marginals, we provide an explicit construction of

a chessboard distribution with the desired properties. Again, this procedure works for covariance

matrices that do not lie on the boundary of the set of feasible covariance matrices, and we believe

that this is the first example of such a procedure.

3. We rigorously establish that there are feasible covariance matrices that cannot be matched using

the NORTA method.

4. We provide a simple modification to the initialization phase of the NORTA method that enables one

to use the NORTA method to closely approximate the desired covariance matrix. The modification

involves the solution of a semidefinite program, and works in both the rank correlation and Pearson

product-moment correlation cases without any specialization. Based on a small computational

study, it appears that when one cannot exactly match a desired covariance, the discrepancy between

the desired and realized covariance matrices is quite small, at least for 3-dimensional random

vectors.

The remainder of this paper is organized as follows. In Section 1 we review the NORTA method,

and describe how it may be used to match a given Pearson product-moment covariance matrix, or a

given rank covariance matrix. In Section 2 we develop the theory of chessboard distributions in the

case where all of the marginal distributions are uniform(0, 1]. The chessboard distribution concept is

extended to more general marginals in Section 3. Next, in Section 4 we present a small computational

study that sheds light on when we might expect the NORTA method to be unable to match a feasible

covariance matrix, and provide several examples where this occurs. Our numerical results suggest that

as the covariance matrix gets close to the boundary, the linear program (LP) that needs to be solved


increases in size. In this section, we also present our modification of the NORTA method that involves

semidefinite programming. Finally, in Section 5 we discuss conclusions and future research.

1 The NORTA method

Cario and Nelson (1997) described the “NORmal To Anything” (NORTA) method for generating i.i.d.

replicates of a random vector X∗ say with prescribed marginal distributions and covariance structure.

In this method, one starts by generating a random vector Z with a multivariate normal distribution and

transforms Z to obtain a random vector X = (X1, . . . , Xd). Let Fi be the desired marginal distribution

function of X∗i , for i = 1, . . . , d.

The NORTA method generates i.i.d. replicates of X by the following procedure.

1. Generate an IRd valued standard normal random vector Z = (Z1, . . . , Zd) with mean vector 0 and

covariance matrix ΣZ = (ΣZ(i, j) : 1 ≤ i, j ≤ d), where ΣZ(i, i) = 1 for i = 1, . . . , d.

2. Compute the vector X = (X1, . . . , Xd) via

Xi = F−1i(Φ(Zi)), (1)

for i = 1, . . . , d, where Φ is the distribution function of a standard normal random variable, and

F−1i(u) = inf{x : Fi(x) ≥ u}. (2)

The vector X generated by this procedure will have the prescribed marginal distributions. To see

this, note that each Zi has a standard normal distribution, so that Φ(Zi) is uniformly distributed on

(0, 1), and so F−1i (Φ(Zi)) will have the required marginal distribution.
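The two steps above can be sketched in a few lines of code. The following is a minimal illustration, not the authors' implementation: the function name norta_sample, the use of Python's standard-library NormalDist for Φ, and the textbook Cholesky factorization are all our own choices, and we assume ΣZ is a valid (positive definite) correlation matrix.

```python
import math
import random
from statistics import NormalDist

def norta_sample(sigma_z, inv_cdfs, rng):
    """One NORTA draw: correlated normals -> uniforms -> desired marginals."""
    d = len(inv_cdfs)
    # Textbook Cholesky factorisation of sigma_z (assumed positive definite).
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(sigma_z[i][i] - s) if i == j else (sigma_z[i][j] - s) / L[j][j]
    # Step 1: a standard normal vector Z with covariance matrix sigma_z.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    z = [sum(L[i][k] * g[k] for k in range(i + 1)) for i in range(d)]
    # Step 2: X_i = F_i^{-1}(Phi(Z_i)), equation (1).
    phi = NormalDist().cdf
    return [inv_cdfs[i](phi(z[i])) for i in range(d)]
```

For example, norta_sample([[1, 0.5], [0.5, 1]], [lambda u: u, lambda u: -math.log(1 - u)], random.Random(7)) produces a pair with uniform(0, 1) and exponential(1) marginals.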

The covariance matrix ΣZ should be chosen so that it induces the required correlation structure on

X. There are many measures of correlation between two random variables, but perhaps the two most

popular are Pearson’s product-moment correlation, and Spearman’s rank correlation.

1.1 Pearson’s Product-Moment Correlation

Suppose that we wish X∗ to have Pearson product-moment covariance matrix Σ, where

Σ(i, j) = E[X∗i X∗j] − E[X∗i] E[X∗j]

for 1 ≤ i, j ≤ d. This is the case that Cario and Nelson (1997) examined. Note that this is equivalent

to prespecifying the correlation matrix, since the marginal distributions are also prespecified. To ensure


that the required correlations are defined, we make the assumption that E[(X∗i )2] < ∞ for i = 1, . . . , d.

It turns out that choosing ΣZ to arrive at the correct covariance matrix Σ is a nontrivial problem.

Let X be the random vector generated from (1) above and ΣX denote its covariance matrix. As

noted in Li and Hammond (1975) and Cario and Nelson (1997), each term ΣX(i, j) = cov(Xi, Xj) is a

function of only cov(Zi, Zj). To see this, note that when cor(Zi, Zj) ≠ ±1,

cov(Xi, Xj) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} F−1i(Φ(zi)) F−1j(Φ(zj)) ϕij(zi, zj) dzi dzj − EXi EXj, (3)

where ϕij is the joint density of (Zi, Zj). The expression (3) depends only on the marginal distributions

Fi and Fj , and the density ϕij . The density ϕij depends only on the covariance between Zi and Zj .

When cov(Zi, Zj) = ±1, the joint density ϕij degenerates and the integral representation (3) is no longer

valid. However, in this degenerate case the covariance between Xi and Xj is still a function only of the

covariance between Zi and Zj . Hence, the relation (3) defines a function cij : [−1, 1] → IR mapping

cov(Zi, Zj) to cov(Xi, Xj), where Xi and Xj are defined via (1).

So the problem of matching a desired covariance matrix reduces to d(d− 1)/2 separate root-finding

problems of selecting cov(Zi, Zj) to match cov(Xi, Xj) to Σ(i, j). Unfortunately, there is no general

analytical expression for the function cij , and so we cannot determine the exact ΣZ that is to be used

analytically.

Cario and Nelson (1997) established that under very mild conditions, the function cij is a continuous

non-decreasing function of ΣZ(i, j). This result allows us to perform an efficient numerical search for

values ΛZ(i, j) that yield

cij(ΛZ(i, j)) = Σ(i, j) for i < j. (4)

Remark 2 Under more restrictive assumptions on the marginal distributions than Cario and Nelson

(1997) impose, Henderson, Chiera and Cooke (2000) show that (4) possesses a unique solution.

We take ΛZ(i, i) = 1 for i = 1, . . . , d, and for i > j, set ΛZ(i, j) = ΛZ(j, i) to ensure that ΛZ is

symmetric. Alternatives to the numerical search suggested by Cario and Nelson (1997) include the use of

a stochastic root-finding algorithm (Chen 2001), or polynomial expansions (van der Geest 1998). Unless

otherwise stated, we henceforth assume that a solution to (4) exists.
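As a sketch of such a numerical search, the code below applies bisection to cij in the special case of uniform(0, 1] marginals, where cij(ρ) = arcsin(ρ/2)/(2π) happens to be known in closed form (this is the inverse of the relation (5) given later). The function names are ours, and the closed form is used only so the recovered root can be checked analytically; for general marginals one would evaluate cij by numerical integration of (3) instead.

```python
import math

def c_uniform(rho):
    # cov(X_i, X_j) for uniform(0, 1] marginals as a function of the latent
    # normal correlation rho; closed form due to Kruskal (1958).
    return math.asin(rho / 2.0) / (2.0 * math.pi)

def solve_lambda(target_cov, tol=1e-12):
    # Bisection works because c_ij is continuous and nondecreasing
    # (Cario and Nelson 1997); this solves equation (4) for one entry.
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if c_uniform(mid) < target_cov:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The recovered roots agree with the analytical solution ΛZ(i, j) = 2 sin[2πΣ(i, j)] of (5).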

One might hope that if the matrix ΛZ satisfies (4), then ΛZ could be used in the NORTA method to

generate i.i.d. replicates of X. Unfortunately, the results of this paper prove that this is not always the

case. In fact, there exists a feasible covariance matrix for a 3-dimensional random vector with uniform

marginals on (0, 1] that cannot be generated with the NORTA procedure. The problem arises when the


matrix ΛZ as determined from (4) is not positive semidefinite, in which case it is not a valid covariance

matrix.

Li and Hammond (1975) suggested the following example to illustrate this important fact. Let

X∗1, X∗2 and X∗3 be three uniformly distributed random variables on (0, 1] with covariance matrix

Σ = (1/12) ×
[  1     −0.4    0.2 ]
[ −0.4    1      0.8 ]
[  0.2    0.8    1   ].

In the special case when X∗ has uniform marginals, the equations (4) can be solved analytically. In

particular, Kruskal (1958) showed that the (unique) solution to (4) is given by

ΛZ(i, j) = 2 sin[2πΣ(i, j)]. (5)

For the Li and Hammond example, the (unique) matrix ΛZ found from (5) is not positive semidefinite.

It is important to observe though, that this is a counterexample only if the postulated random vector

exists. Li and Hammond did not show this.
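The failure of positive semidefiniteness here is easy to verify numerically. The sketch below is our own check, not taken from the paper: it applies (5) entrywise to the off-diagonal covariances above and evaluates the determinant of the resulting 3 × 3 matrix. Since the 1 × 1 and 2 × 2 leading principal minors are positive, a negative determinant shows ΛZ is indefinite.

```python
import math

# Off-diagonal entries of the Li and Hammond covariance matrix, Sigma = (1/12) R.
sigma = {"12": -0.4 / 12, "13": 0.2 / 12, "23": 0.8 / 12}

# Apply (5) entrywise to get the candidate latent correlations Lambda_Z(i, j).
lam = {p: 2.0 * math.sin(2.0 * math.pi * s) for p, s in sigma.items()}
b, c, f = lam["12"], lam["13"], lam["23"]

# Determinant of the symmetric matrix [[1, b, c], [b, 1, f], [c, f, 1]].
det = (1.0 - f * f) - b * (b - f * c) + c * (b * f - c)
# The leading minors 1 and 1 - b^2 are positive, so det < 0 means Lambda_Z
# is indefinite: it is not a valid correlation matrix.
```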

Remark 3 It is straightforward to show that the Li and Hammond example cannot be generated using the

extremal distributions method of Hill and Reilly (1994). One simply attempts to solve the LP suggested

by Hill and Reilly (1994), which turns out to be infeasible. Therefore, if the Li and Hammond example

exists, it shows that there are feasible covariance matrices that cannot be matched using the extremal

distributions technique.

Lurie and Goldberg (1998) gave an example with nonuniform marginals and positive definite covari-

ance matrix for which the solution to (4) is also not positive semidefinite. They did not establish that

the postulated random vector exists.

When all of the marginals have continuous distribution functions, a natural alternative to the nu-

merical search procedure mentioned earlier is to “work in Gaussian space”. In other words, given a

set of data with known (or fitted) marginals with continuous distribution functions, we first transform

the data set into normal random variates using the inverse of the transformation (1). We can then

compute an empirical covariance matrix ΣZ and use this covariance matrix in the NORTA procedure.

(If the distribution function F of a random variable X is not continuous, then F (X) does not have a

uniform distribution on (0, 1), and so one will not obtain a normally distributed random variable using

Φ−1(F (X)). Therefore, the continuity of the marginal distribution functions is needed.)
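A minimal sketch of this "work in Gaussian space" step, assuming continuous marginals: the function names to_gaussian_scores and empirical_corr are ours, and statistics.NormalDist from the Python standard library supplies Φ−1.

```python
from statistics import NormalDist

def to_gaussian_scores(data, cdf):
    """Map observations x through Phi^{-1}(F(x)); F must be continuous."""
    inv = NormalDist().inv_cdf
    return [inv(cdf(x)) for x in data]

def empirical_corr(xs, ys):
    """Plain product-moment correlation of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```

Pairwise empirical_corr values over the transformed columns would supply the matrix ΣZ fed to NORTA; as discussed next, however, the scores need not be jointly normal, so the output covariances are not guaranteed.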


This approach is certainly simpler than a numerical search procedure, but it has two important

drawbacks. First, it requires a set of input data, which may not be available in general. But second,

and perhaps more importantly, this procedure does not necessarily ensure that the resulting X variates

will have the required covariance structure. To see why, observe that the transformed normal random

variables mentioned above are unlikely to have a joint normal distribution. Therefore, the correlations of

the jointly normal random variables used in the NORTA method using ΣZ will be unlikely to transform

through the NORTA procedure to yield the desired covariance matrix for X, as one might otherwise

expect. This is a subtle point, but one that is worth bearing in mind.

1.2 Spearman’s Rank Correlation

Suppose now that we wish X∗ to have Spearman’s rank covariance matrix Σ, where

Σ(i, j) = rcov(X∗i, X∗j) = E[Fi(X∗i) Fj(X∗j)] − E[Fi(X∗i)] E[Fj(X∗j)]

for 1 ≤ i, j ≤ d. This is the case treated by Clemen and Reilly (1999). In contrast to product-moment

covariance, the rank covariance is always defined, because Fi(X∗i ) is a bounded random variable. In

fact, if Fi is continuous, then Fi(X∗i ) is uniformly distributed on (0, 1). An important property of

Spearman’s rank covariance is that unlike Pearson’s product-moment covariance, it is preserved under

strictly increasing transformations of the random variables.

If all of the marginal distribution functions Fi are continuous, then the NORTA transformation (1)

is strictly increasing. In this case, the rank covariance is preserved by the NORTA transformation, and

so if X is the NORTA generated random vector, then

rcov(Xi, Xj) = cov(Φ(Zi), Φ(Zj)). (6)

But (6) is precisely the quantity Σ(i, j) in (5). Therefore, given a desired rank covariance matrix Σ, we

simply compute ΣZ = ΛZ via (5) and use this within the NORTA procedure.

Observe that if the random vector in the Li and Hammond example (given above) exists, then it is

again an example showing that there are feasible rank covariance matrices for a given set of marginals

that cannot be matched using a NORTA procedure.

In the case where Fi (say) is not continuous, (6) no longer holds. Therefore, the analytical expression

(5) cannot be used. However, one could use a numerical search procedure as in Cario and Nelson (1997)

to identify the covariance ΣZ(i, j) that yields the required rank covariance rcov(Xi, Xj). This follows

since the rank covariance between Xi and Xj is a nondecreasing continuous function of the covariance


between Zi and Zj . The nondecreasing property follows immediately from the proof of Theorem 1 in

Cario and Nelson (1997), and the fact that the function Fi(F−1i (Φ(·))) is nondecreasing. Continuity

follows from Theorem 2 of Cario and Nelson.

In this section we reviewed the NORTA method and discussed the significance of the Li and Hammond

example. We now turn to the question of whether the Li and Hammond counterexample exists or not.

2 Copulas

Recall that a copula is the joint distribution function of a random vector with uniform marginals on (0, 1].

In this section, we develop a method for either constructing the joint distribution of a 3-dimensional

copula with prescribed covariance matrix, or establishing that such a joint distribution does not exist.

(Note that Pearson’s product-moment covariance and Spearman’s rank covariance coincide in the copula

case, so the problem is well-defined.) The approach is easily carried over to the general case of a d-

dimensional copula with prescribed covariance matrix.

We will let X = (X1, X2, X3) denote a random vector with such a distribution and let Σ = (Σij :

1 ≤ i, j ≤ 3) be the desired covariance matrix. We first construct the probability mass function (pmf)

of a random vector Y = (Y1, Y2, Y3) whose marginals are discretized versions of the marginals of X.

The pmf will be constructed to try to ensure that Y has covariance matrix Σ (except for the diagonal

entries). From this pmf, we then construct the density of X in such a way that the off-diagonal entries

in the covariance matrix are maintained. (The diagonal elements are determined by the marginals of

X.) This then yields the required construction.

Our notation will appear partly redundant at times, but this is done to ensure consistency with

Section 3 where we will extend these ideas to more general marginal distributions.

Let n ≥ 1 be an integral parameter that determines the level of discretization that will be performed.

Let yi,k = k/n, k = 0, . . . , n, be the set of points that divide the range (0, 1] of the ith variable into n equal-length sub-intervals. For k = 1, . . . , n and i = 1, 2 and 3, let

Yi,k = E[Xi | Xi ∈ (yi,k−1, yi,k]] = (2k − 1)/(2n) (7)

be the conditional mean of Xi given that it lies in the kth sub-interval.

The support of the random vector Y is the mesh of points

{(Y1,i, Y2,j , Y3,k) : 1 ≤ i, j, k ≤ n}.


Let

q(i, j, k) = P (Y1 = Y1,i, Y2 = Y2,j , Y3 = Y3,k)

be the probability that Y equals the (i, j, k)th point in the support of Y , so that q represents the pmf

of the random vector Y . (Note that it is not the pmf itself, since the function q is defined on integers,

while the domain of the pmf is contained in the unit cube.)

Consistent with the notion that Y is a discretized version of X, we also have that

q(i, j, k) = P (X ∈ C(i, j, k)),

where the cell C(i, j, k) represents the cube of points surrounding the (i, j, k)th point in the support of

Y . More precisely,

C(i, j, k) = {(x1, x2, x3) : y1,i−1 < x1 ≤ y1,i, y2,j−1 < x2 ≤ y2,j , y3,k−1 < x3 ≤ y3,k}.

We then see that

∑_{j,k=1}^{n} q(i, j, k) = P(Y1 = Y1,i) = P(X1 ∈ (y1,i−1, y1,i]) = 1/n, ∀i = 1, . . . , n, (8)

∑_{i,k=1}^{n} q(i, j, k) = P(Y2 = Y2,j) = P(X2 ∈ (y2,j−1, y2,j ]) = 1/n, ∀j = 1, . . . , n, (9)

∑_{i,j=1}^{n} q(i, j, k) = P(Y3 = Y3,k) = P(X3 ∈ (y3,k−1, y3,k]) = 1/n, ∀k = 1, . . . , n, (10)

q(i, j, k) ≥ 0, ∀i, j, k = 1, . . . , n. (11)

With these constraints satisfied, we then have that EYi = 1/2 = EXi for i = 1, . . . , 3. To see this,

note that for Y1, we have that

EY1 = ∑_{i,j,k=1}^{n} Y1,i q(i, j, k)

    = ∑_{i,j,k=1}^{n} E[X1 | X1 ∈ (y1,i−1, y1,i]] P(X ∈ C(i, j, k))

    = ∑_{i=1}^{n} E[X1 | X1 ∈ (y1,i−1, y1,i]] P(X1 ∈ (y1,i−1, y1,i])

    = EX1.

Recall that our intermediate goal is to match the covariance matrix of Y to that of X (with the

exception of the diagonal elements). We do this using an LP. If Cij = cov(Yi, Yj), then we want to

minimize

|C12 − Σ12| + |C13 − Σ13| + |C23 − Σ23|. (12)


Now

C12 = ∑_{i,j,k=1}^{n} Y1,i Y2,j q(i, j, k) − EY1 EY2,

which is a linear function of the q(i, j, k)’s, with similar linear expressions for C13 and C23. Furthermore,

the matrix Σ is simply a parameter and so, using a standard trick in linear programming, we can represent

|C12 − Σ12| in a linear fashion, and similarly for the other terms in (12) as follows.

Define Z+ij and Z−ij to be the positive and negative parts of the difference Cij − Σij , i.e.,

Z+ij = (Cij − Σij)+ = max{Cij − Σij , 0}, and Z−ij = (Cij − Σij)− = −min{Cij − Σij , 0}.

We can now attempt to match the covariances of Y to those of X using the LP

min ∑_{i=1}^{2} ∑_{j=i+1}^{3} (Z+ij + Z−ij)

subject to Cij − Σij = Z+ij − Z−ij , i = 1 to 2 and j = i + 1 to 3,

Z+ij ≥ 0, Z−ij ≥ 0, together with constraints (8), (9), (10) and (11).

This LP is always feasible since a product copula where the Yi’s are independent can be easily

constructed by setting all q(i, j, k) = n⁻³. Also, the objective function of the LP is bounded below by

0, so an optimal solution exists.

If the optimal objective value for the LP is 0, then we have constructed a joint probability mass

function for Y that has the desired covariance structure, i.e., cov(Yi, Yj) = Σij .
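For concreteness, the LP above can be assembled and solved with an off-the-shelf solver. The sketch below is our own illustration rather than the authors' code: it uses scipy.optimize.linprog, orders the variables as the n³ cell masses q(i, j, k) followed by the three (Z+, Z−) pairs, and exploits EYa EYb = 1/4 for uniform marginals.

```python
from scipy.optimize import linprog

def chessboard_lp(n, s12, s13, s23):
    """Build and solve the chessboard LP for three uniform(0, 1] marginals.
    Returns (optimal objective value, flat vector of cell masses q)."""
    y = [(2 * i + 1) / (2.0 * n) for i in range(n)]   # conditional means Y_{i,k}, eq. (7)
    nq = n ** 3
    nv = nq + 6                                       # q variables plus three (Z+, Z-) pairs
    def idx(i, j, k):
        return (i * n + j) * n + k
    A_eq, b_eq = [], []
    # Marginal constraints (8)-(10): each coordinate slice carries mass 1/n.
    for axis in range(3):
        for v in range(n):
            row = [0.0] * nv
            for i in range(n):
                for j in range(n):
                    for k in range(n):
                        if (i, j, k)[axis] == v:
                            row[idx(i, j, k)] = 1.0
            A_eq.append(row)
            b_eq.append(1.0 / n)
    # Covariance constraints: C_ab - Sigma_ab = Z+ - Z-, with E[Y_a]E[Y_b] = 1/4.
    for p, ((a, b), s) in enumerate([((0, 1), s12), ((0, 2), s13), ((1, 2), s23)]):
        row = [0.0] * nv
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    t = (i, j, k)
                    row[idx(i, j, k)] = y[t[a]] * y[t[b]]
        row[nq + 2 * p] = -1.0       # Z+_ab
        row[nq + 2 * p + 1] = 1.0    # Z-_ab
        A_eq.append(row)
        b_eq.append(s + 0.25)
    cost = [0.0] * nq + [1.0] * 6    # minimise the objective (12)
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun, res.x[:nq]
```

For instance, chessboard_lp(2, 0.0, 0.0, 0.0) attains objective value 0 via the product copula; an optimal value of 0 certifies that the target covariances are matched.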

The discretized random vector Y does not possess continuous uniform (0,1] marginals. However, we can construct a random vector X with continuous uniform marginals from Y in such a way that cov(Yi, Yj) = cov(Xi, Xj) for i ≠ j, i.e., the covariances are preserved. Assuming the optimal objective value of the LP is 0, this then yields an explicit construction of a random vector with the desired marginals and covariance matrix.

By conditioning on the cell containing X, we see that the requirement that cov(Y1, Y2) = cov(X1, X2) is equivalent to

∑_{i,j,k=1}^n q(i, j, k) Y1,i Y2,j − EY1EY2 = ∑_{i,j,k=1}^n E[X1X2 | X ∈ C(i, j, k)] P(X ∈ C(i, j, k)) − EX1EX2. (13)

But EY1 = EX1 and EY2 = EX2, and so (13) can be re-expressed as

∑_{i,j,k=1}^n q(i, j, k) (E[X1 | X ∈ C(i, j, k)] · E[X2 | X ∈ C(i, j, k)] − E[X1X2 | X ∈ C(i, j, k)]) = 0. (14)

Equation (14) could be satisfied in many ways, but perhaps the simplest is to note that (14) will hold

if, conditional on X lying in C(i, j, k), X1, X2 and X3 are independent. In that case, each term in the


sum (14) is 0. One can ensure that this conditional independence holds, while simultaneously ensuring

that X has the correct marginal distributions, by setting the density of X within the cell C(i, j, k) to

that of independent, uniformly distributed random variables, scaled so that the total mass in the cell is

q(i, j, k). To be precise, if f is the density of X, then for any x ∈ C(i, j, k), we set

f(x) = n³ q(i, j, k). (15)

In a sense, we are “smearing” the mass q(i, j, k) uniformly over the cell C(i, j, k).
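Once a q with optimal objective value 0 is in hand, sampling from the smeared density (15) is straightforward: pick a cell with probability q(i, j, k), then draw the coordinates independently and uniformly within it. A minimal sketch follows; the function name and the dict representation of q are ours, not the paper's.

```python
import random

def sample_chessboard(q, n):
    """Draw one sample from the chessboard density (15).

    q maps 1-based cell indices (i, j, k) to cell masses summing to 1.
    Within the chosen cell the coordinates are conditionally independent
    and uniform, which is exactly the "smearing" construction.
    """
    cells = list(q)
    (i, j, k) = random.choices(cells, weights=[q[c] for c in cells])[0]
    # Cell index c covers the interval ((c - 1)/n, c/n].
    return tuple((c - 1 + random.random()) / n for c in (i, j, k))
```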

Theorem 1 below proves that if the optimal objective value of the LP is 0, then the density f so

constructed has the desired marginals and covariance matrix.

Theorem 1 If the optimal objective value of the LP is 0, then the density f defined via (15) has uniform

(0,1] marginals and covariance matrix Σ.

Proof: Clearly, f is nonnegative and integrates to 1. Next, we need to show that the marginals, fj say, of f are uniform. For any x ∈ (y1,i−1, y1,i), we have that

f1(x) dx = ∑_{j,k=1}^n P(X1 ∈ [x, x + dx) | X ∈ C(i, j, k)) P(X ∈ C(i, j, k))
         = ∑_{j,k=1}^n P(X1 ∈ [x, x + dx) | X1 ∈ (y1,i−1, y1,i]) q(i, j, k)
         = ∑_{j,k=1}^n (dx / ∫_{y1,i−1}^{y1,i} 1 dy) q(i, j, k)
         = n ∑_{j,k=1}^n q(i, j, k) dx = 1 dx.

The first equation follows by conditioning on the cell in which the random vector lies, and the second

by the conditional independence of X1, X2 and X3 given that X lies in C(i, j, k). The third follows from

the assumption of uniform “smearing” of q(i, j, k) on the cell C(i, j, k). A similar result holds for the

marginals of X2 and X3, and so the joint density f has the right marginals.

Next we need to show that the obtained covariances are indeed the desired ones. Take the case of cov(X1, X2). Starting with its definition, we have

cov(X1, X2) = EX1X2 − EY1EY2
            = EY1Y2 − EY1EY2
            = Σ12.

The first equality follows from the fact that EY1 = EX1 and EY2 = EX2, and the second is just a restatement of (14). The final equation follows from the fact that the optimal objective value is 0. The same follows for cov(X2, X3) and cov(X1, X3). Hence, f has the covariances as desired and this completes the proof. □

Remark 4 The name “chessboard” distribution is motivated by the form of (15) in a 2-dimensional problem. In this case, the unit square is broken down into n² squares, and the density f is constant on each square, with value n² q(i, j).

Remark 5 There is no need for the cells used in the above construction to be of equal size. Indeed,

Theorem 1 remains true for more general discretizations; see Theorem 10 in Section 3.

The feasible region of the LP can be reduced through the inclusion of constraints on the Z+ijs and

Z−ijs. These constraints provide us with a new feasibility criterion to test for the existence of a random

vector with the given covariance matrix.

The constraints are developed by assuming that a random vector X with uniform marginals and covariance matrix Σ exists, discretizing X to obtain a new random vector X̃ say, and then bounding the change in the covariances resulting from the discretization.

So suppose that we discretize X to obtain X̃. Let

q̃(i, j, k) = P(X̃ = (Y1,i, Y2,j, Y3,k)),

and observe that q̃ provides a feasible solution to the above LP. We now wish to bound the change in the covariance resulting from this discretization. Observe that

cov(X̃1, X̃2) − Σ12 = EX̃1X̃2 − EX1X2
                   = ∑_{i,j,k=1}^n (Y1,iY2,j − E[X1X2 | X ∈ C(i, j, k)]) q̃(i, j, k). (16)

But

y1,i−1 y2,j−1 ≤ E[X1X2 | X ∈ C(i, j, k)] ≤ y1,i y2,j. (17)

Combining (16) with (17), we see that

cov(X̃1, X̃2) − Σ12 ≤ ∑_{i,j,k=1}^n q̃(i, j, k)(Y1,iY2,j − y1,i−1y2,j−1), and (18)

cov(X̃1, X̃2) − Σ12 ≥ ∑_{i,j,k=1}^n q̃(i, j, k)(Y1,iY2,j − y1,iy2,j). (19)


Equation (18) gives an upper bound on Z+12, and (19) gives an upper bound on Z−12. Similar bounds may be obtained for the other covariances. After substituting in the explicit expressions for yi,k and Yi,k, these bounds simplify to

Z+ij ≤ 1/(2n) − 1/(4n²) and Z−ij ≤ 1/(2n) + 1/(4n²), for 1 ≤ i < j ≤ 3. (20)
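Notice that the right-hand side of (18) takes the same value for every feasible q: the per-cell term is additively separable in i and j, and the marginal constraints force each index to be uniformly distributed on {1, . . . , n}. The Z+ bound in (20) can therefore be checked numerically by averaging over (i, j); the code below is an illustrative check, not from the paper.

```python
def zplus_bound(n):
    """Average of Y_{1,i} Y_{2,j} - y_{1,i-1} y_{2,j-1} over (i, j).

    With each index uniform on {1, ..., n}, this equals the right-hand
    side of (18) for any feasible q, i.e. the Z+ bound 1/(2n) - 1/(4n^2).
    """
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            Y1 = (2 * i - 1) / (2.0 * n)   # cell conditional mean Y_{1,i}
            Y2 = (2 * j - 1) / (2.0 * n)   # cell conditional mean Y_{2,j}
            total += (Y1 * Y2 - (i - 1) * (j - 1) / float(n * n)) / (n * n)
    return total
```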

Once the LP is augmented with the bounds (20), it is no longer guaranteed to be feasible. In fact,

Theorem 2 below establishes that if the augmented LP is infeasible for any value of n ≥ 1, then the

covariance matrix Σ is not feasible for uniform marginals. The proof is basically a summary of the above

discussion, and is given to help clarify these ideas.

Theorem 2 If the augmented LP is infeasible for some n ≥ 1, then there cannot exist a random vector

X with uniform marginals and the desired covariance matrix Σ.

Proof: Suppose there exists a random vector X with uniform marginals and covariance matrix Σ. Then,

as above, we can construct a solution q by discretizing X that satisfies all of the constraints, including

the bounds (20). Thus the augmented LP is feasible, which is a contradiction. □

In fact, one can prove a converse to Theorem 2.

Theorem 3 If the covariance matrix Σ is not feasible for uniform (0,1] marginals, then there exists an

n ≥ 1 such that the augmented LP is infeasible.

Proof: On the contrary, suppose that the augmented LP is feasible for all n ≥ 1. Let qn denote an

optimal solution to the nth augmented LP, and let µn denote the probability measure corresponding to

the density resulting from the smearing operation (15) applied to qn. Then each µn is the distribution

of a random vector with support contained in (0, 1]3 with uniform(0, 1] marginals. Hence, the sequence

(µn : n ≥ 1) is tight, and by Theorem 29.3 on p. 392 of Billingsley (1986), it possesses a weakly

convergent subsequence (µn(k) : k ≥ 1), converging to µ say.

Now, µ has uniform (0, 1] marginals. This follows from Theorem 29.2, p. 391 of Billingsley (1986)

since each µn(k) has uniform(0, 1] marginals, µn(k) ⇒ µ as k → ∞, and the projection map πj : ℝ³ → ℝ that returns the jth coordinate of a vector in ℝ³ is continuous.

Now, if Cn is the covariance matrix of the distribution qn, then

∑_{i=1}^{2} ∑_{j=i+1}^{3} |Cnij − Σij| ≤ 3/(2n) + 3/(4n²) → 0


as n → ∞. This follows from the bounds (20), and the fact that in any optimal solution, it is not the

case that both Z+ij and Z−ij are strictly positive.

Finally, if X^{n(k)} has distribution µn(k), then (X^{n(k)}_i X^{n(k)}_j : k ≥ 1) is a uniformly bounded sequence of random variables, and therefore uniformly integrable. It immediately follows that the covariance matrix Λ of µ is given by

Λ = lim_{k→∞} C^{n(k)} = Σ.

Thus, µ has the required marginals and covariance matrix, which is a contradiction, and the result is proved. □

Combining Theorems 2 and 3, we see that a covariance matrix is infeasible for uniform marginals if, and only if, the augmented LP is infeasible for some n ≥ 1.

Given this very sharp characterization of infeasible covariance matrices, it is natural to ask whether

a similar result holds for feasible covariance matrices. We would then have the result that a covariance

matrix is feasible for a given set of marginals if and only if there is some finite n such that the optimal

objective value of the augmented LP is zero. Unfortunately, this conjecture is false.

Suppose that X1 = X2 and hence cov(X1, X2) = var(X1) = 1/12. For given n, the covariance

between Y1 and Y2 is maximized by concentrating all mass on the cells (i, i), and so q(i, i) = n⁻¹ for 1 ≤ i ≤ n. In that case, we have that

cov(Y1, Y2) = ∑_{i=1}^n ((2i − 1)/(2n))² (1/n) − (1/2)²
            = 1/12 − 1/(12n²).

Therefore, cov(Y1, Y2) < 1/12 for all finite n, and so the conjecture is false.
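The closed form above is easy to confirm numerically; the small check below is illustrative, not from the paper.

```python
def diag_cov(n):
    """cov(Y1, Y2) when all mass sits on the diagonal cells.

    With q(i, i) = 1/n and Y_{1,i} = Y_{2,i} = (2i - 1)/(2n), the
    covariance is the second moment of the cell means minus (1/2)^2.
    """
    second_moment = sum(((2 * i - 1) / (2.0 * n)) ** 2
                        for i in range(1, n + 1)) / n
    return second_moment - 0.25    # subtract EY1 * EY2 = 1/4
```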

Notice that the covariance matrix in this example is singular. This example is a special case of the

following result.

Theorem 4 All chessboard densities have nonsingular covariance matrices.

Proof: On the contrary, suppose that f is a chessboard density with singular covariance matrix Σ, and

let X have density f . Since Σ is singular, there exists a nonzero vector α such that Σα = 0. Hence,

var(α′X) = α′Σα = 0, and so α′X = α′EX a.s. Since α is nonzero, we may, by relabelling variables if

necessary, write X1 as a linear function of X2, X3, say X1 = β0 + β2X2 + β3X3. This equality must also

hold conditional on X ∈ C(i, j, k). But the components of X are conditionally independent given that X ∈ C(i, j, k) because f is a chessboard density, which is the required contradiction. □


The importance of Theorem 4 is that if Σ is feasible for the given marginals and singular, then no

matter how large n may be, the optimal objective value of the LP will always be > 0, i.e., we cannot

exactly match the covariance matrix Σ. However, we can come arbitrarily close, as the following result

shows.

Theorem 5 Suppose that the covariance matrix Σ is feasible for uniform (0, 1] marginals. Then for

all n ≥ 1, the augmented LP is feasible, and if z(n) is the optimal objective value of the nth LP, then

z(n) → 0 as n →∞.

Proof: Since Σ is feasible for uniform marginals, the augmented LP is feasible for all n ≥ 1. (This is

just the contrapositive of Theorem 2.) Let qn denote an optimal solution to the nth LP, and let fn be

the corresponding smeared density. If Cn is the covariance matrix corresponding to fn, then the bounds

(20) imply that

z(n) = ∑_{i=1}^{2} ∑_{j=i+1}^{3} |Cnij − Σij| ≤ 3/(2n) + 3/(4n²) → 0

as n → ∞. □

Therefore, chessboard densities can come arbitrarily close to any required Σ that is feasible for

uniform marginals. In fact, one can prove that chessboard densities can exactly match a (very) slightly

restricted class of feasible covariance matrices. To state this result we need some notation.

We state and prove Proposition 6 for a general dimension d (i.e., not just d = 3), since doing so presents no notational difficulty. Any covariance matrix Σ of a d-dimensional random vector with

uniform(0, 1] marginals can be characterized by d(d − 1)/2 covariances, since the diagonal entries are

determined by the marginals, and the matrix is symmetric. Hence we can, with an abuse of notation,

think of Σ as a d(d− 1)/2 dimensional vector in some contexts, and as a d× d matrix in others.

Let Ω ⊂ [−1/12, 1/12]^{d(d−1)/2} denote the space of feasible covariance matrices, so that Σ ∈ Ω implies

that there exists a random vector with uniform(0, 1] marginals, and covariance matrix Σ. We will show

below that Ω is nonempty and convex (this is well-known), but also closed and full-dimensional (this

appears to be new). In particular then, any covariance matrix on the boundary of Ω is feasible. We will

also show that Σ is contained in the interior of Ω if, and only if, there is some finite n for which the

augmented LP has objective value 0. The collective implications of this and our previous results will be

discussed after the statement and proof of these results.

Proposition 6 The set Ω is nonempty, convex, closed and full-dimensional.


Proof: If the components of X are independent, then the covariance matrix Σ is diagonal, and so Ω

contains the zero vector, and is therefore nonempty.

It is well-known that Ω is convex. For if Σ1, Σ2 ∈ Ω, then there exist random vectors X, Y with

uniform(0, 1] marginals, and covariance matrices Σ1 and Σ2 respectively. For λ ∈ (0, 1), let Z be given

by X with probability λ, and Y with probability 1−λ. Then Z has covariance matrix λΣ1 + (1−λ)Σ2.

The proof that Ω is closed is virtually identical to that of Theorem 3 and is omitted.

We use the NORTA method to prove that Ω is full-dimensional. We will show that each of the

vectors ±ek/12 are contained in Ω, where ek is the vector whose components are all 0 except for a 1 in

the kth position, for k = 1, . . . , d(d− 1)/2. The convexity of Ω then ensures that Ω is full-dimensional.

Let Z be a multivariate normal random vector with mean 0 and covariance matrix consisting of 1’s

on the diagonal, and also in the (i, j)th and (j, i)th position (i ≠ j), with the remaining components being 0. That is, Z consists of 2 perfectly correlated standard normal random variables Zi and Zj , and

d − 2 independent standard normal random variables. Now let U be the random vector with uniform

(0, 1) marginals obtained by setting Um = Φ(Zm) for m = 1, . . . , d. Then Ui and Uj are perfectly

correlated, and independent of all of the remaining components of U . Thus, U has covariance matrix

whose components are all 0 except for the diagonal elements, and the (i, j), and (j, i)th elements, which

are equal to 1/12. Thus, ek/12 lies in Ω, where k corresponds to the position (i, j). A similar argument

with perfectly negatively correlated Zi and Zj shows that −ek/12 ∈ Ω. Since i ≠ j were arbitrary, the proof is complete. □

In Theorem 4 we showed that all chessboard densities have nonsingular covariance matrices. This is

almost sufficient to establish that all boundary points of Ω do not have chessboard densities. However,

it is certainly conceivable that the boundary of Ω contains nonsingular, as well as singular, covariance

matrices. So we strengthen Theorem 4 with the following result.

Theorem 7 If fn is a chessboard density with covariance matrix Σ, then Σ is contained in the interior

of Ω.

Proof: Let X have density fn. We will show that we can both increase, and decrease, the covariance

between X1 and X2. Symmetry then allows us to conclude that the same result holds for Xi and Xj with i ≠ j. The convexity of Ω then completes the proof.

Let q be the discretization of fn into its n³ cells, and let C(i, j, k) be a cell with q(i, j, k) > 0. Divide the cell C(i, j, k) into 4 (equal size) subcells,

Cab(i, j, k) = {(x, y, z) ∈ C(i, j, k) : (2i − (3 − a))/(2n) < x ≤ (2i − (2 − a))/(2n),
               (2j − (3 − b))/(2n) < y ≤ (2j − (2 − b))/(2n)},

for 1 ≤ a, b ≤ 2.

Generate a new density g by the usual smearing (15) in all cells except C(i, j, k). Within the cell C(i, j, k), assign a mass of q(i, j, k)/2 to each of the subcells C11(i, j, k) and C22(i, j, k), and then uniformly smear within these subcells. In other words, for (x, y, z) contained in these two subcells, set g(x, y, z) = 2n³q(i, j, k), and set g to be 0 in the subcells Cab(i, j, k) for a ≠ b. Then it is straightforward to show that g has uniform marginals, that the (1, 2)th covariance is strictly increased, and that the other covariances remain unchanged.

A similar argument placing the mass in the subcells Cab(i, j, k) with a ≠ b shows that the covariance can be strictly decreased, and so the proof is complete. □

We have thus far shown that if a covariance matrix Σ is not in Ω, then the augmented LP will be

infeasible for some n ≥ 1, and if Σ is on the boundary of Ω, then the LP approach will yield distributions

with covariance matrices that arbitrarily closely approximate Σ, but never actually achieve it. Our final

result shows that if Σ is contained in the interior of Ω, then there is some n ≥ 1 for which the optimal

objective value of the augmented LP is 0, and so one can exactly match Σ using a chessboard density.

Before proving this result, we need the following lemma. This lemma basically states that given a fixed

vector x, we can choose certain other vectors arbitrarily close to x, so that x is a convex combination of

these “close” vectors, and if we perturb the close vectors slightly, then x is still a convex combination of

the perturbed vectors.

For x ∈ ℝ^m and ε > 0, let B(x, ε) denote the (open) set of vectors {y ∈ ℝ^m : ρ(x, y) < ε}, where ρ is the L1 distance

ρ(x, y) = ∑_{i=1}^m |xi − yi|.

The proof of the following lemma may be found in Appendix A.

Lemma 8 Let x ∈ IRm, and let ε > 0 be arbitrary. There exist m + 1 points x1, . . . , xm+1 ∈ B(x, ε),

and a δ > 0 such that if

ρ(xi, x′i) < δ ∀i = 1, . . . , m + 1,

then x may be written as a convex combination of x′1, . . . , x′m+1.


We are now ready to state the final result of this section. As in Proposition 6, there is no loss of

clarity if we state this result for a general dimension d rather than just d = 3.

Theorem 9 If Σ is contained in the interior of Ω, then there exists an n ≥ 1 such that the optimal

objective value of the augmented LP is 0.

Proof: Let m = d(d−1)/2, and for now, consider Σ as an m-vector. Let ε > 0 be such that B(Σ, ε) ⊆ Ω,

and choose Σ1, Σ2, . . . , Σm+1 ∈ B(Σ, ε) and δ as in Lemma 8.

Since Σi ∈ Ω, from Theorem 5 there exists an n(i) such that the augmented LP with target

covariance matrix Σi has optimal objective value smaller than δ, for each i = 1, . . . , m + 1. Let

n = n(1)n(2) · · ·n(m + 1), and let qi denote a solution to the augmented LP with target matrix Σi

and discretization level n for i = 1, . . . ,m + 1. Then the optimal objective value corresponding to qi is

also less than δ. (Note that if k, n ≥ 1 are integers, then the optimal objective values z(n) and z(kn)

satisfy the relationship z(kn) ≤ z(n), since the chessboard density obtained from the solution to the nth

LP can also be obtained from the (kn)th LP.)

Let Σ′i denote the covariance matrix corresponding to the chessboard density f i for the solution qi,

for i = 1, . . . ,m + 1. Then, by Lemma 8, there exist nonnegative multipliers λ1, λ2, . . . , λm+1 summing

to 1 such that

Σ = ∑_{i=1}^{m+1} λi Σ′i. (21)

If we set

f = ∑_{i=1}^{m+1} λi f^i,

then f is also a chessboard density with discretization level n, and from (21), its covariance matrix is exactly Σ. □

In summary then, we have shown that if Σ is infeasible for uniform marginals, then the augmented

LP will be infeasible for some n ≥ 1. This includes the case where Σ is singular and infeasible for uniform

marginals. Furthermore, we have shown that if Σ is contained in the interior of Ω, then the augmented

LP will have optimal objective value 0 for some n ≥ 1, and so one can construct a chessboard density

from the solution to the augmented LP with the required marginals and covariance matrix. So if Σ is

not contained in the boundary of Ω, then we have an algorithm for determining, in finite time, whether

Σ is feasible for the given marginals or not. One simply solves the augmented LP for n = 1, 2, 3, . . . until

the augmented LP is either infeasible, or has an optimal objective value of 0. In the latter case, we can

deliver an explicit construction of the desired distribution.
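The decision procedure just described can be phrased as a simple loop. The sketch below assumes a hypothetical solver interface solve_augmented_lp(n) returning a feasibility flag and the optimal objective value of the level-n augmented LP; the cap n_max is only a practical safeguard, since the loop is guaranteed to terminate only when Σ does not lie on the boundary of Ω.

```python
def classify_covariance(solve_augmented_lp, n_max=100, tol=1e-9):
    """Finite decision procedure for a target covariance matrix Sigma.

    solve_augmented_lp(n) -> (feasible, objective) is an assumed
    interface to the level-n augmented LP. Returns ("infeasible", n),
    ("feasible", n), or ("undecided", n_max).
    """
    for n in range(1, n_max + 1):
        feasible, obj = solve_augmented_lp(n)
        if not feasible:
            return ("infeasible", n)   # by Theorem 2, Sigma is infeasible
        if obj <= tol:
            return ("feasible", n)     # a chessboard density matches Sigma
    return ("undecided", n_max)        # Sigma may lie on the boundary of Omega
```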


The case where Σ lies on the boundary of Ω is more problematic. We have shown that in this case,

Σ is feasible for uniform marginals, but that a chessboard density cannot be constructed with uniform

marginals and covariance matrix Σ. Therefore, for such matrices, the algorithm outlined above will not

terminate in finite time. However, a chessboard distribution can come arbitrarily close to the required

covariance matrix.

3 More general marginals

The LP method used in Section 2 to evaluate the existence of a random vector with uniform marginals

and given covariance matrix can be adapted to investigate the existence of random vectors having

arbitrary marginal distributions and given Pearson product-moment covariance matrix. We will stick to

the case of a 3-dimensional random vector for notational simplicity but note that the approach is easily

extended to the general d-dimensional case.

Let X = (X1, X2, X3) represent the random vector that is to be constructed, and let Σ be the

desired covariance matrix. Let Fi(·) denote the distribution function of Xi, for i = 1, 2, 3. For ease of

exposition we assume that each of the Fi’s has a density fi with respect to Lebesgue measure, although

the approach applies more generally, and in particular, can be applied when some or all of the Xi’s have

discrete distributions.

In the spirit of the method developed in Section 2, we will first construct the probability mass function

of a discretized random vector Y = (Y1, Y2, Y3) with a covariance structure as close to the desired one

as possible, and then derive a joint distribution for X.

Let n1, n2 and n3 represent the levels of discretization of the random variables X1, X2 and X3

respectively, and hence the number of points that form the support of Y1, Y2 and Y3. Let the range of

the variable Xi be divided into ni subintervals (which may, or may not, be equal in length) by the set

of points yi,0, yi,1, . . . , yi,ni, with

−∞ ≤ yi,0 < yi,1 < · · · < yi,ni ≤ ∞.

Note that we explicitly allow yi,0 and yi,ni to be infinite and the spacing between the yi,k’s to be arbitrary.

Let Yi,k denote the conditional mean of Xi, given that it lies in the subinterval (yi,k−1, yi,k]. In other words, we set

Yi,k = E[Xi | Xi ∈ (yi,k−1, yi,k]] = ∫_{yi,k−1}^{yi,k} x (fi(x)/Pi(k)) dx,


where Pi(k) = Fi(yi,k) − Fi(yi,k−1) represents the probability that Xi lies in the kth subinterval. The support for the random vector Y is then {(Y1,i, Y2,j, Y3,k) : 1 ≤ i ≤ n1, 1 ≤ j ≤ n2, 1 ≤ k ≤ n3}.

Let q(i, j, k) = P(Y = (Y1,i, Y2,j, Y3,k)) = P(X ∈ C(i, j, k)), where C(i, j, k) is defined as in Section 2 to be the cell corresponding to q(i, j, k). We now give constraints on the q(i, j, k)s analogous to (8) through (11). Specifically, we have that

∑_{j=1}^{n2} ∑_{k=1}^{n3} q(i, j, k) = P(Y1 = Y1,i) = P(X1 ∈ (y1,i−1, y1,i]) = P1(i), ∀i = 1, . . . , n1,

∑_{i=1}^{n1} ∑_{k=1}^{n3} q(i, j, k) = P(Y2 = Y2,j) = P(X2 ∈ (y2,j−1, y2,j]) = P2(j), ∀j = 1, . . . , n2,

∑_{i=1}^{n1} ∑_{j=1}^{n2} q(i, j, k) = P(Y3 = Y3,k) = P(X3 ∈ (y3,k−1, y3,k]) = P3(k), ∀k = 1, . . . , n3,

q(i, j, k) ≥ 0, ∀i = 1, . . . , n1, j = 1, . . . , n2, k = 1, . . . , n3.

When these constraints are satisfied, EYi = EXi for each i, just as in the case of uniform marginals.

We can now formulate an LP along the lines of that given in Section 2 to attempt to match the covariances

of Y to those required of X. We omit the details.

The LP is always feasible, since we can take the Yi’s to be independent, and the objective value is

bounded below by 0. Hence, an optimal solution exists for every discretization.

If the optimal objective value is 0, then we have been able to construct a probability mass function

for Y with the required covariance structure. Constructing a joint distribution function for X from this

pmf for Y is similar to the method of uniform “smearing” used in Section 2.

Specifically, the “smearing” process should be able to satisfy (14). Again, the easiest method for

doing so is perhaps to ensure that the variables are conditionally independent given that X lies within

the cell. To ensure that this conditional independence holds while simultaneously ensuring that X has

the right marginals, we set the density of X within the cell C(i, j, k) to be that of independent random

variables with the right marginals, scaled so that the total mass in the cell is q(i, j, k). To be precise, if

f is the density of X, then for any x = (x1, x2, x3) ∈ C(i, j, k), we set

f(x) = (f1(x1)/P1(i)) · (f2(x2)/P2(j)) · (f3(x3)/P3(k)) · q(i, j, k). (22)
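As a quick sanity check of (22), the density inside a cell can be evaluated directly from the marginal densities and distribution functions; for uniform(0,1] marginals on a regular grid it reduces to the n³q(i, j, k) of (15). The helper below and its argument names are ours, not the paper's.

```python
def cell_density(x, cell, breaks, pdfs, cdfs, q):
    """Evaluate the chessboard density (22) at a point x inside a cell.

    breaks[i] lists the breakpoints y_{i,0} < ... < y_{i,n_i} of
    coordinate i; cell holds the 1-based subinterval indices; q is the
    mass assigned to the cell.
    """
    val = q
    for i, (xi, ci) in enumerate(zip(x, cell)):
        lo, hi = breaks[i][ci - 1], breaks[i][ci]
        assert lo < xi <= hi, "x must lie in the stated cell"
        Pi = cdfs[i](hi) - cdfs[i](lo)     # P_i(.) in the text
        val *= pdfs[i](xi) / Pi            # factor f_i(x_i) / P_i(.)
    return val
```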

We can now provide analogous results to those in Section 2. We omit the proofs since they are similar

to those already presented.

Theorem 10 If the optimal objective value of the LP is 0, then the density f defined via (22) has the

required marginals and covariance matrix Σ.


As in Section 2, let Ω ⊂ ℝ^{d(d−1)/2} denote the set of feasible covariance matrices for the given

marginals. As before, we think of a given covariance matrix as a vector in d(d− 1)/2 dimensional space

in some contexts, and as a d× d matrix in others.

Proposition 11 The set Ω is nonempty, convex and full-dimensional.

We also have the following analogue of Theorems 4 and 7.

Theorem 12 Any chessboard density has a nonsingular covariance matrix. Furthermore, if fn is a

chessboard density with covariance matrix Σ, then Σ is contained in the interior of Ω.

To extend the other results of the previous section, we assume that yi,0 and yi,ni are finite for all i,

i.e., that all of the distribution functions Fi have bounded support. We further assume that all of the

ni’s are equal to n say, and that all subintervals are of equal length. Thus, we will discretize on a regular

grid containing n3 cells.

Suppose that a random vector X with the desired marginals and covariance matrix exists, and let X̃ denote its discretization. Let q̃(i, j, k) be the probability that X̃ lies in the cell C(i, j, k). We can now bound the difference between the covariances of X and X̃. But first it is convenient to let ai = yi,0 and ∆i = yi,1 − yi,0, so that ∆i is the width of the cells in the ith coordinate direction, for i = 1, 2, 3. With this notation, we have that

cov(X̃1, X̃2) − Σ12
  = ∑_{i,j,k=1}^n q̃(i, j, k)[Y1,iY2,j − E[X1X2 | X ∈ C(i, j, k)]]
  ≤ ∑_{i,j,k=1}^n q̃(i, j, k)[y1,iy2,j − y1,i−1y2,j−1]
  = ∑_{i,j,k=1}^n q̃(i, j, k)[(a1 + i∆1)(a2 + j∆2) − (a1 + (i − 1)∆1)(a2 + (j − 1)∆2)]
  = ∑_{i,j,k=1}^n q̃(i, j, k)[∆1(a2 + (j − 1)∆2) + ∆2(a1 + (i − 1)∆1) + ∆1∆2]
  ≤ ∆1EX2 + ∆2EX1 + ∆1∆2.

A similar lower bound can be derived, and so the LP can be augmented by the bounds

Z+ij, Z−ij ≤ ∆iEXj + ∆jEXi + ∆i∆j,

for 1 ≤ i < j ≤ 3.


Observe that as n →∞, these bounds converge to 0. This is the final ingredient required to strengthen

the other results of the previous section to the more general case of distributions with bounded support

and densities. In particular, we now have the following results, which we state without proof because

the proofs are similar to the case of uniform marginals.

Remark 6 The above bounds were derived assuming a regularly spaced grid of cells, so that the cells

were all of identical size. However, similar bounds can be expected to hold when the cells are not of equal

size. Indeed, one should be able to obtain bounds on Z+ij and Z−ij which converge to 0 as long as the

maximum sidelength of the cells converges to 0.

Proposition 13 Suppose that Σ is feasible for the given marginals. If all of the densities fi have

bounded support, then as n → ∞, the optimal objective value of the LP converges to 0. Furthermore,

the set Ω is closed.

Theorem 14 Suppose that all of the densities fi have bounded support. Then ∃n ≥ 1 such that the nth

LP is infeasible if and only if the matrix Σ is infeasible for the given marginals.

Theorem 15 Suppose that all of the densities fi have bounded support. Then ∃n ≥ 1 such that the optimal objective value of the nth LP is 0 if and only if the matrix Σ is contained in the interior of Ω.

So, assuming that all of the densities fi have bounded support and that Σ does not lie on the boundary of Ω, we have a finite algorithm for determining whether Σ is feasible or not, and if feasible, supplying

an explicit joint density with the required properties. The algorithm is simply to solve a sequence of

LPs for n = 1, 2, . . . until either the LP is infeasible, or has an optimal objective value of 0. If Σ lies

on the boundary of Ω, then we know that it is feasible, and we can approach it arbitrarily closely with

chessboard distributions, but never exactly reach it.

4 Application to NORTA

We now apply the theory developed in Section 2 to explore the performance of the NORTA method in

matching the covariance matrix Σ for a 3-dimensional random vector X = (X1, X2, X3) with uniform

marginals. We will see that the Li and Hammond counterexample exists, obtain some insight into the

class of covariance matrices that cannot be matched using NORTA, and develop a remedy based on

semidefinite programming for such cases.

For notational convenience, we will use the correlation matrix R = 12Σ = (ρij : 1 ≤ i, j ≤ 3)

instead of Σ. The matrix R is determined by ρ12, ρ13 and ρ23. The set Θ of all possible values of


ρ = (ρ12, ρ13, ρ23) that constitute feasible correlation matrices is just a rescaling of the set Ω in Section

2, and is a proper subset of the cube [−1, 1]3 because a correlation matrix R is constrained to be positive

semidefinite. We examined all symmetric positive semidefinite matrices with off-diagonal components

in the set {−1.0, −0.9, . . . , −0.1, 0, 0.1, . . . , 0.9, 1.0}. There are 4897 such matrices.

These 4897 matrices were further tested to see whether they were NORTA feasible. We define the

matrix R to be NORTA feasible if the covariance matrix Λ found via (5) is positive semidefinite. In

this case, a multivariate normal random vector with covariance matrix Λ will be transformed via the

NORTA method to a multivariate uniform random vector with the required correlation matrix R.
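For uniform(0,1] marginals, the correlation relation (5) has a well-known closed form: if λ is the correlation of the underlying normal pair, the resulting uniform correlation is ρ = (6/π) arcsin(λ/2), so the candidate normal correlation is λ = 2 sin(πρ/6). Assuming that form, the NORTA feasibility test can be sketched as follows (illustrative code, not the authors' implementation).

```python
import numpy as np

def norta_feasible(R, tol=1e-9):
    """Check NORTA feasibility of a correlation matrix R for uniform marginals.

    Inverts the uniform-marginal correlation relation entrywise
    (lambda = 2*sin(pi*rho/6)) and tests whether the resulting normal
    correlation matrix Lambda is positive semidefinite.
    """
    Lam = 2.0 * np.sin(np.pi * np.asarray(R, dtype=float) / 6.0)
    # R is NORTA feasible iff Lambda has no (numerically) negative eigenvalue.
    return bool(np.linalg.eigvalsh(Lam).min() >= -tol)
```

Note that the transformation maps the unit diagonal to itself, since 2 sin(π/6) = 1.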

A total of 160 sample matrices were identified as NORTA defective. Note that since X1, X2 and X3 are identically distributed, many different ρ’s form the same effective correlation matrix for X. For example, ρ = (0.5, −0.5, 0.5), (−0.5, 0.5, 0.5) and (0.5, 0.5, −0.5) constitute the same joint distribution for X up to a symmetry. If we eliminate such multiple occurrences, the number of NORTA defective matrices reduces to 31 cases.

The question that remains to be answered is whether these NORTA defective matrices are feasible for

uniform marginals. We applied our LP method to each NORTA defective case iteratively for increasing

values of n, the level of discretization, to determine whether a chessboard density can be constructed.

The results may be found in Table 1 of Appendix B. For 25 of the 31 cases, chessboard distributions

that exactly match R were constructed with a discretization level n ≤ 18. Larger values of n appeared

to be needed for matrices that were “near singular”, in the sense that their smallest eigenvalue was close

to 0. Chessboard distributions could not exactly match R in the remaining 6 cases, but this is to be

expected from Theorem 4 since in all of these cases, R was singular. However, the optimal objective

value in these 6 cases was approximately 2 × 10⁻⁵ (with n = 80), so that chessboard distributions came

very close.

The Li and Hammond covariance matrix is among those that we were able to exactly match using

chessboard distributions. So we have rigorously established that there are feasible covariance matrices

for a given set of marginals that cannot be matched via the NORTA method.

These results seem to suggest that NORTA defective R matrices are those that are near-singular,

and perhaps are then relatively rare. However, Lurie and Goldberg (1998) believe that singular and

near-singular correlation matrices actually represent a common situation in cost analysis for example.

This is because correlations between cost elements are typically estimated from unbalanced data sets.

This is likely to lead to indefinite target correlation matrices, so that any least adjustment to them is

almost certainly going to result in an adjusted target matrix that is singular, or very nearly so.

It is natural to ask whether the NORTA method can be modified to generate random vectors with

the desired marginals and approximately the right covariance matrix.

Lurie and Goldberg (1998) described a method for identifying a positive semidefinite covariance

matrix ΣZ for use within the NORTA method that yields approximately the desired product-moment

covariance matrix Σ. Their approach involves a complicated nonlinear optimization, and must be spe-

cialized for approximating the rank correlation or product-moment correlation, depending on the case

desired. Furthermore, although they report that their optimization procedure always converges in prac-

tice, they do not have a proof of this result. Finally, their approach appears to be limited to fixed

sample sizes. We present an alternative method based on semidefinite programming that does not

share these limitations. (See Vandenberghe and Boyd 1996 for an accessible introduction to semidefinite

programming.)

Let ΛZ be the symmetric matrix that we wish to use in the NORTA procedure. We do not distinguish

between the cases where ΛZ is chosen to induce a given rank, product-moment, or other correlation in

the output random vector X. If ΛZ is indefinite, then we use a semidefinite program (SDP) to find a

matrix ΣZ that is “close” to ΛZ and is positive semidefinite. The matrix ΣZ is then used within the

NORTA method.

Why is this approach reasonable? In Theorem 2 of Cario and Nelson (1997), it is shown that under a

certain moment condition, the output covariance matrix is a continuous function of the input covariance

matrix ΣZ used in the NORTA procedure. So if ΣZ is “close” to ΛZ , then we can expect the covariance

matrix of the NORTA generated random vectors to be close to the desired matrix Σ. The moment

condition always holds when we are attempting to match rank covariances, and we can expect it to hold

almost invariably when matching product-moment correlations. Therefore, it is eminently reasonable to

try to minimize some measure of distance, say d(ΛZ, ΣZ), between ΛZ and ΣZ.

The SDP falls under the broad class of matrix completion problems; see Alfakih and Wolkowicz

(2000), or Johnson (1990). Given ΛZ as data, and assuming that we are operating in dimension d = 3,

we wish to choose a symmetric matrix ΣZ to

minimize |ΣZ(1, 2) − ΛZ(1, 2)| + |ΣZ(1, 3) − ΛZ(1, 3)| + |ΣZ(2, 3) − ΛZ(2, 3)|

subject to ΣZ ⪰ 0, ΣZ(i, i) = 1, i = 1, 2, 3,

where the matrix inequality A ⪰ 0 signifies a constraint that the matrix A be positive semidefinite. This

problem is easily formulated as an SDP.
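An SDP solver is needed to reproduce the formulation above exactly. As a rough, solver-free stand-in, one can instead compute the Frobenius-norm nearest correlation matrix by alternating projections with a Dykstra correction (Higham's method); this is not the authors' SDP, but it illustrates the same repair step:

```python
import numpy as np

def nearest_correlation(lam, iters=500):
    """Project an indefinite symmetric matrix onto the set of correlation
    matrices (PSD, unit diagonal) by Dykstra-corrected alternating
    projections, approximately minimizing the Frobenius-norm change."""
    y = np.array(lam, dtype=float)
    ds = np.zeros_like(y)                      # Dykstra correction term
    for _ in range(iters):
        r = y - ds
        w, v = np.linalg.eigh((r + r.T) / 2.0)
        x = (v * np.clip(w, 0.0, None)) @ v.T  # project onto the PSD cone
        ds = x - r
        y = x.copy()
        np.fill_diagonal(y, 1.0)               # project onto unit diagonal
    return y
```

For ΛZ with every off-diagonal entry −0.9 (indefinite, since 1 + 2(−0.9) < 0), the result has off-diagonals ≈ −0.5 and so lies on the boundary of the PSD set, as Remark 7 below anticipates.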

The SDP framework allows us to include preferences on how the search for ΣZ is performed. For

example, we can require that ΣZ(i, j) ≥ ΛZ(i, j), or that the value ΛZ(i, j) change by at most δ > 0. Effi-

cient algorithms are available for solving semidefinite problems; see Wolkowicz, Saigal and Vandenberghe

(2000).

We solved SDPs using public domain codes for all of the 31 cases of NORTA defective correlation

matrices identified earlier. In each case, we also computed d(Σ, ΣX), where Σ is the desired covariance

matrix, and ΣX is the covariance matrix that NORTA delivers using the solution from the SDP for ΣZ .

The results may be found in Table 2 of Appendix B. The maximum distance d(Σ, ΣX) observed was

approximately 0.05 in a case where Σ was singular, and when Σ was nonsingular, the maximum distance

was less than 0.02. We conclude that the SDP approach above is very effective, at least in 3 dimensions.
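The reported magnitude of about 0.05 can be reproduced on the singular grid point ρ = (−0.5, −0.5, −0.5), assuming (as the grid suggests) that it is one of the six singular defective cases. Here the repair is plain eigenvalue clipping with diagonal rescaling rather than the SDP, and the map back to uniform correlations uses ρX = (6/π) arcsin(ρZ/2):

```python
import numpy as np

rho = -0.5                                  # target correlation for all three pairs
lam = 2.0 * np.sin(np.pi * rho / 6.0)       # implied Gaussian correlation, ~ -0.518
L = np.full((3, 3), lam)
np.fill_diagonal(L, 1.0)                    # L is slightly indefinite

# Repair: clip negative eigenvalues, then rescale to a unit diagonal.
w, v = np.linalg.eigh(L)
P = (v * np.clip(w, 0.0, None)) @ v.T
d = np.sqrt(np.diag(P))
S = P / np.outer(d, d)                      # a valid (singular) Gaussian correlation

# Correlation NORTA then induces on the uniform marginals, and the total
# absolute deviation from the target over the three off-diagonal pairs.
rho_out = (6.0 / np.pi) * np.arcsin(S[0, 1] / 2.0)
dist = 3.0 * abs(rho_out - rho)
```

The total deviation works out to roughly 0.05, the same magnitude as the maximum reported for the singular cases.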

Remark 7 When ΛZ is indefinite, the optimal ΣZ lies on the boundary of the set of symmetric pos-

itive semidefinite matrices ΩZ with diagonal elements equal to 1. Therefore, ΣZ is singular. It does

not immediately follow though, that the induced covariance matrix ΣX is singular, since the NORTA

transformation alters covariances in a nonlinear fashion.

It is worth noting that ΣX may not be the closest NORTA feasible covariance matrix to Σ, because

the optimization was performed “in Gaussian space”. This is in contrast to the Lurie and Goldberg

(1998) procedure. But the values computed in our experiments seem to suggest that the difference

d(Σ, ΣX) is usually very small.

5 Conclusions and Future Research

In Section 2 we developed a method for determining whether a copula with given covariance matrix

exists or not. The method works for all covariance matrices except those lying on the boundary of the

set of feasible covariance matrices. It would be interesting to see if one can extend these methods to

match boundary covariance matrices as well.

In Section 3, we extended the results of Section 2 to more general marginals. Some of the results

from Section 2 extend immediately, while others appear to require that the support of the marginal dis-

tributions be bounded. We are investigating whether this bounded support hypothesis can be removed.

Using these methods, we have shown that the Li and Hammond example exists, so that the NORTA

method cannot necessarily match all feasible covariance matrices for a given set of marginals. For

such cases we have suggested a modification of the NORTA procedure that enables one to at least

approximately match the desired covariance matrix. The modified procedure is as follows.

1. Identify ΛZ to match some aspect of the desired covariance structure of X in any fashion. If ΛZ

is positive semidefinite, then one can proceed directly with the NORTA procedure.

2. If not, solve an SDP as outlined in Section 4 to identify a matrix ΣZ that is “close” to ΛZ , and

use ΣZ in the NORTA procedure.
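For uniform(0, 1) marginals the modified procedure can be sketched end to end. The eigenvalue-clipping repair here is a simple stand-in for the SDP of Section 4, and the function name and structure are ours:

```python
import math
import numpy as np

def modified_norta_uniform(R, n, rng=None):
    """Generate n samples of a 3-vector with uniform(0,1) marginals and
    correlation matrix approximately R (steps 1 and 2 of the procedure)."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: Gaussian correlations implied by the target (Kruskal's identity).
    lam = 2.0 * np.sin(np.pi * np.asarray(R, dtype=float) / 6.0)
    np.fill_diagonal(lam, 1.0)
    w, v = np.linalg.eigh(lam)
    if w.min() < 0.0:
        # Step 2: Lambda_Z is indefinite; repair it (clipping stands in
        # for the SDP) and rescale to restore a unit diagonal.
        p = (v * np.clip(w, 0.0, None)) @ v.T
        d = np.sqrt(np.diag(p))
        lam = p / np.outer(d, d)
        w, v = np.linalg.eigh(lam)
    # NORTA proper: correlated normals pushed through the normal CDF.
    c = v * np.sqrt(np.clip(w, 0.0, None))     # factor with lam = c @ c.T
    z = rng.standard_normal((n, 3)) @ c.T
    phi = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))
    return phi(z)
```

On the defective target with all correlations −0.5, the generated vectors have exactly uniform marginals and sample correlations near −0.48, roughly the closest NORTA can come.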

The additional work involved in this modification only shows up in the initialization phase of the

NORTA method, and so there is no additional computational overhead while the method is being used

to generate replicates of X. Furthermore, public-domain algorithms are available for solving SDPs, and

these algorithms can handle very large dimensional problems with relative ease. Finally, the method does

not require tailoring to different correlation measures. The only step that depends on the correlation

measure being used is the first step when ΛZ is identified.

Returning to chessboard distributions, our numerical results suggest that the discretization level

required to match a covariance matrix increases as the matrix moves towards the boundary of the

feasible set of covariance matrices. This result is perhaps to be expected, since the optimal objective

values of the LP decrease as the discretization is made finer, so long as the discretizations are “nested”

(see the proof of Theorem 9). However, it remains to be seen whether the optimal objective values of

the LP decrease as one moves from discretization level n to n + 1 for n ≥ 2. (Note that the cells of the

(n + 1)th LP are not nested within those of the nth LP as long as n ≥ 2.)

A subject of active research is whether chessboard distributions can be made into a practical method

for random vector generation. The primary bottleneck appears to be in the setup phase when we need

to solve a potentially large LP.

A Proof of Lemma 8

Basically, one chooses the xi’s to be the vertices of a simplex centered at x. To be precise, let r > 0 be

a parameter, and set

x1 = ( −a1 −a2 · · · −am−1 −am )′ + x

x2 = ( a1 −a2 · · · −am−1 −am )′ + x

x3 = ( 0 2a2 · · · −am−1 −am )′ + x

⋮

xm = ( 0 0 · · · (m − 1)am−1 −am )′ + x

xm+1 = ( 0 0 · · · 0 mam )′ + x,

where

ai = r √(m/(m + 1)) √(1/(i(i + 1))).

Then, (Dantzig 1991), the xi’s define the vertices of an equilateral simplex whose center is x, and whose

vertices are a (Euclidean) distance rm/(m + 1) from x. Choose r so that xi ∈ B(x, ε) for all i.
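The construction can be verified numerically; with a hypothetical helper name, the vertices below average to x, lie at distance rm/(m + 1) from it, and are pairwise equidistant:

```python
import numpy as np

def simplex_vertices(x, r):
    """Vertices x_1, ..., x_{m+1} of an equilateral simplex centered at x,
    each a Euclidean distance r*m/(m+1) from x, following the displayed
    construction with a_i = r sqrt(m/(m+1)) sqrt(1/(i(i+1)))."""
    x = np.asarray(x, dtype=float)
    m = x.size
    i = np.arange(1, m + 1)
    a = r * np.sqrt(m / (m + 1.0)) * np.sqrt(1.0 / (i * (i + 1.0)))
    verts = np.tile(-a, (m + 1, 1))      # row k is x_{k+1} - x; start at (-a_1, ..., -a_m)
    for k in range(1, m + 1):
        verts[k, :k - 1] = 0.0           # leading zeros
        verts[k, k - 1] = k * a[k - 1]   # k-th coordinate becomes k * a_k
    return verts + x
```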

Observe that the average of the xi's is x. In fact, it is easy to show that the (m + 1) × (m + 1) matrix

B consisting of the xi's in columns, supplemented with a row of 1's, is nonsingular, and so

y = B⁻¹(x′, 1)′ = (m + 1)⁻¹(1, 1, . . . , 1)′.

Now, observe that B⁻¹ is a continuous function of B, at least in a neighbourhood of B, and so y

is locally a continuous function of x1, . . . , xm+1. Hence, there is a δ > 0 such that if ρ(xi, x′i) < δ for

all i = 1, . . . , m + 1, and D consists of the vectors x′i in columns, supplemented with a row of 1's, then

y = D⁻¹(x′, 1)′ has all positive components, and the elements of y sum to 1. □

B Computational Results

The computational results can be found at the Operations Research Home Page <http://or.pubs.informs.org>

in the Online Collection.

Acknowledgments

We would like to thank Marina Epelman for a discussion that was helpful in proving Lemma 8, and

the referees for suggestions that improved the presentation. This work was partially supported by NSF

grant DMI-9984717.

References

Alfakih, A. and H. Wolkowicz. 2000. Matrix completion problems. In Handbook of Semidefinite Pro-

gramming: Theory, Algorithms and Applications. H. Wolkowicz, R. Saigal, L. Vandenberghe, eds,

Kluwer, Boston, 533–545.

Billingsley, P. 1986. Probability and Measure. Wiley, New York.

Cario, M. C. and B. L. Nelson. 1996. Autoregressive to anything: time-series input processes for

simulation. Operations Research Letters 19:51–58.

Cario, M. C. and B. L. Nelson. 1997. Modeling and generating random vectors with arbitrary marginal

distributions and correlation matrix. Technical Report, Department of Industrial Engineering and

Management Sciences, Northwestern University, Evanston, Illinois.

Chen, H. 2001. Initialization for NORTA: generation of random vectors with specified marginals and

correlations. INFORMS Journal on Computing. To appear.

Clemen, R. T., and T. Reilly. 1999. Correlations and copulas for decision and risk analysis. Management

Science. 45:208–224.

Cooke, R. M. 1997. Markov and entropy properties of tree- and vine-dependent variables. Proceedings

of the ASA Section on Bayesian Statistical Science. Alexandria, VA.

Dantzig, G. B. 1991. Converting a converging algorithm into a polynomially bounded algorithm. Techni-

cal report 91-5, Systems Optimization Laboratory, Dept of Operations Research, Stanford University,

Stanford, California.

Devroye, L. 1986. Non-Uniform Random Variate Generation. Springer-Verlag, New York.

Henderson, S. G., B. A. Chiera, and R. M. Cooke. 2000. Generating “dependent” quasi-random numbers.

Proceedings of the 2000 Winter Simulation Conference. J. A. Joines, R. R. Barton, K. Kang, P. A.

Fishwick, eds. IEEE, Piscataway New Jersey. 527–536.

Hill, R. R., and C. H. Reilly. 1994. Composition for multivariate random vectors. In Proceedings of the

1994 Winter Simulation Conference, J. D. Tew, S. Manivannan, D. A. Sadowsky, A. F. Seila, eds.

IEEE, Piscataway New Jersey, 332 – 339.

Hill, R. R., and C. H. Reilly. 2000. The effects of coefficient correlation structure in two-dimensional

knapsack problems on solution procedure performance. Management Science, 46: 302–317.

Hodgson, T. J., J. A. Joines, S. D. Roberts, K. A. Thoney, J. R. Wilson. 2000. Satisfying due-dates

in large job shops: Characteristics of "real" problems. Technical Report. Department of Industrial

Engineering, North Carolina State University, Raleigh, North Carolina.

Iman, R. and W. Conover. 1982. A distribution-free approach to inducing rank correlation among input

variables. Communications in Statistics: Simulation and Computation 11:311–334.

Johnson, C. R. 1990. Matrix completion problems: a survey. Proceedings of Symposia in Applied

Mathematics 40:171–198.

Johnson, M. E. 1987. Multivariate Statistical Simulation. Wiley, New York.

Kruskal, W. 1958. Ordinal measures of association. J. Amer. Statist. Assoc. 53:814–861.

Law, A. M. and W. D. Kelton. 2000. Simulation Modeling and Analysis, 3rd ed. McGraw Hill, Boston.

Lewis, P. A. W., E. McKenzie, and D. K. Hugus. 1989. Gamma processes. Communications in Statistics:

Stochastic Models 5:1–30.

Li, S. T., and J. L. Hammond. 1975. Generation of pseudorandom numbers with specified univariate

distributions and correlation coefficients. IEEE Transactions on Systems, Man, and Cybernetics.

5:557–561.

Lurie, P. M., and M. S. Goldberg. 1998. An approximate method for sampling correlated random

variables from partially-specified distributions. Management Science. 44:203–218.

Mackenzie, G. R. 1994. Approximately Maximum-Entropy Multivariate Distributions with Specified

Marginals and Pairwise Correlations. Ph.D. thesis. Department of Decision Sciences, University

of Oregon, Eugene OR.

Mardia, K. V. 1970. A translation family of bivariate distributions and Frechet’s bounds. Sankhya.

A32:119–122.

Meeuwissen, A. M. H., and R. M. Cooke. 1994. Tree dependent random variables. Technical report

94-28, Department of Mathematics, Delft University of Technology, Delft, The Netherlands.

Melamed, B., J. R. Hill, and D. Goldsman. 1992. The TES methodology: modeling empirical stationary

time series. In Proceedings of the 1992 Winter Simulation Conference IEEE, Piscataway, New Jersey,

135–144.

Nelsen, R. B. 1999. An Introduction to Copulas. Lecture Notes in Statistics, 139. Springer-Verlag, New

York.

Sklar, A. 1959. Fonctions de Repartition a n Dimensions et Leurs Marges. Publications de l’Institut

Statistique de l’Universite de Paris 8:229–231.

Vandenberghe, L., S. Boyd. 1996. Semidefinite Programming. SIAM Review 38:49–95.

van der Geest, P. A. G. 1998. An algorithm to generate samples of multi-variate distributions with

correlated marginals. Computational Statistics and Data Analysis 27:271–289.

Whitt, W. 1976. Bivariate distributions with given marginals. The Annals of Statistics. 4:1280–1289.

Wolkowicz, H., R. Saigal, L. Vandenberghe, eds. 2000. Handbook of Semidefinite Programming: Theory,

Algorithms and Applications. Kluwer, Boston.
