Chessboard Distributions and Random Vectors with Specified Marginals
and Covariance Matrix
Soumyadip Ghosh and Shane G. Henderson
School of Operations Research and Industrial Engineering
Cornell University
Ithaca, NY 14853, U.S.A.
July 6, 2001
Abstract
There is a growing need for the ability to specify and generate correlated random variables as primitive
inputs to stochastic models. Motivated by this need, several authors have explored the generation of
random vectors with specified marginals, together with a specified covariance matrix, through the use
of a transformation of a multivariate normal random vector (the NORTA method).
A covariance matrix is said to be feasible for a given set of marginal distributions if a random vector
exists with these characteristics. We develop a computational approach for establishing whether a given
covariance matrix is feasible for a given set of marginals. The approach is used to rigorously establish
that there are sets of marginals with feasible covariance matrix that the NORTA method cannot match.
In such cases, we show how to modify the initialization phase of NORTA so that it will exactly match
the marginals, and approximately match the desired covariance matrix.
An important feature of our analysis is that we show that for almost any covariance matrix (in a
certain precise sense), our computational procedure either explicitly provides a construction of a random
vector with the required properties, or establishes that no such random vector exists.
Introduction
There is a growing need for the ability to specify and generate random vectors consisting of correlated
observations as primitive inputs to stochastic models. For example, in a manufacturing setting, the
processing times of a single job at different stations may be correlated due to characteristics of the
job such as size. In determining reservoir release rules, the inflows of water to different reservoirs are
invariably correlated. In generating random test problems for a given algorithm, it is advantageous to
ensure that some elements of the problem are correlated (Hill and Reilly 1994, 2000, Hodgson et al.
2000). Further applications have recently been reported in cost analysis (Lurie and Goldberg 1998), and
in decision and risk analysis (Clemen and Reilly 1999).
Perhaps the “ideal” approach is to specify the full joint distribution of the random vector. This
approach is typically limited to situations where the marginal distributions are all from the same para-
metric family. For methods of this type see, for example, Devroye (1986) and Johnson (1987). But
the case where the marginals are not all from the same parametric family affords far greater modeling
generality, and is perhaps the case of more interest from a practical standpoint.
The primary difficulty in this case is that a tremendous amount of information is typically required
to specify (and fit) such a joint distribution. Furthermore, special methods must be devised to generate
random vectors with the given joint distribution, and this can be a practically insurmountable problem
for a model of even moderate complexity (Law and Kelton 2000, p. 479).
A practical alternative is to only specify the marginal distributions of the random variables, together
with the correlation matrix or covariance matrix. (Note that this information does not necessarily
uniquely specify the distribution.) The covariance measure could be Spearman’s rank covariance, Pear-
son’s product-moment covariance, Kendall’s τ , or any other convenient covariance measure. In this
paper, we will focus on Pearson’s product-moment covariance and Spearman’s rank covariance because
of their wide use and acceptance in application settings.
It is important to note that we are restricting attention to the generation of finite-dimensional random
vectors. As such, we are not attempting to generate a time series with given correlation properties. For
such studies, see for example Cario and Nelson (1996), Melamed et al. (1992), and Lewis et al. (1989).
Hill and Reilly (1994) describe a method for generating random vectors with specified marginals
and covariances through mixtures of extreme correlation distributions. The approach is very effective
for random vectors of low dimension (d ≤ 3 say), but the computational requirements quickly become
excessive for higher dimensional random vectors. There is another difficulty with this approach. We say
that a covariance matrix is feasible for a given set of marginal distributions if a random vector exists with
the prescribed marginals and covariance matrix. We show (see Section 1 below) that there are sets of
marginals with feasible covariance matrix that cannot be matched using the technique developed by
Hill and Reilly.
Cario and Nelson (1997) described the “NORmal To Anything” (NORTA) method for generating
random vectors with prescribed covariance matrix. The NORTA method basically involves a component-
wise transformation of a multivariate normal random vector, and capitalizes on the fact that multivariate
normal random vectors are easily generated; see e.g., Law and Kelton 2000, p. 480. Cario and Nelson
traced the roots of the method back to Mardia (1970) who looked at bivariate distributions, and to Li
and Hammond (1975) who concentrated on the case where all of the marginals have densities (with re-
spect to Lebesgue measure). Iman and Conover (1982) implemented the same transformation procedure
to induce a given rank correlation in the output. Their method is only approximate, in that the output
will have only approximately the desired rank correlation. Clemen and Reilly (1999) described
how to use the NORTA procedure to induce a desired rank correlation in the context of decision and
risk analysis. Lurie and Goldberg (1998) implemented a variant of the NORTA method for generating
samples of a predetermined size.
It is natural to ask whether the NORTA procedure can match any feasible covariance matrix for a
given set of marginals. Both Li and Hammond (1975) and Lurie and Goldberg (1998) give examples
where this does not appear to be the case. However, the random vectors that are proposed in these
papers as counterexamples are not proved to exist, and so the question has not yet been completely
settled.
For 2-dimensional random vectors, the NORTA method can match any feasible covariance matrix.
This follows immediately from the characterizations in Whitt (1976). However, for dimensions 3 and
greater, little is known.
In this paper, we prove that there are feasible covariance matrices for a given set of marginals that
the NORTA method cannot match. To establish this result we derive a computational procedure based
on linear programming for establishing whether or not a given covariance matrix is feasible for a given
set of marginals, and if so, explicitly providing a joint distribution with the required properties. We
call the constructed distributions “chessboard” distributions because of their structure; see Section 2.
It is worth noting that, at least in this paper, we are not advocating generating random vectors from
chessboard distributions. The approach is developed primarily to rigorously establish that NORTA can
fail. However, the idea of using chessboard distributions to generate random vectors with a set of desired
properties is a topic of current research.
Other methods for tackling the problem of generating random vectors with specified marginals and
rank covariance matrix have been developed. Chessboard distributions are perhaps closest in nature to
the “piecewise-uniform copulae” developed in Mackenzie (1994). (A copula is the distribution function
of a random vector with uniform marginals. The term was coined in Sklar 1959, and Nelsen 1999 is a
useful recent reference). Mackenzie (1994) attempts to identify a piecewise-uniform copula that matches
a given set of rank covariances. He assumes that such copulae exist, and then selects the one with
maximum entropy. In contrast, we do not assume this feasibility; rather, we develop the theoretical
properties of the approach, and apply it in the context of the NORTA method. Meeuwissen and Cooke (1994)
describe “tree-dependent” random vectors that can be rapidly generated, but cannot necessarily match
all feasible covariance matrices. Cooke (1997) introduces a generalization of tree-dependent random
vectors that is based on a “vine” representation of a joint distribution. Such random vectors can be
rapidly generated, but it is not yet clear whether they can be used to model any feasible covariance
matrix.
It is well-known that the set of feasible covariance matrices for a given set of marginals forms a convex
set (in a certain Euclidean space; see Section 2). Covariance matrices on the boundary of this set cannot
be matched using chessboard distributions, although chessboard distributions can get arbitrarily close
to such matrices; see Section 2.
Remark 1 In the case where all of the marginal distributions have densities with respect to Lebesgue
measure, the chessboard distribution we construct has a joint density with respect to d-dimensional
Lebesgue measure. In this case, we can, and do, refer to a chessboard density.
The philosophy of specifying marginals and correlations to model dependent random variates is clearly
an approximate one, since the joint distribution is not completely specified. Therefore, one should be
willing to live with reasonable (this is, of course, a relative term) discrepancies in the covariance matrix
from that desired. In cases where NORTA cannot precisely match a feasible covariance matrix, it is
still possible to use NORTA to obtain the desired marginals exactly, and the desired covariance matrix
approximately. Lurie and Goldberg (1998) gave an alternative approach to this problem, but we believe
that our solution has properties that make it more desirable; see Section 4.
If one is not willing to live with reasonable discrepancies from the desired covariance matrix, then
perhaps a more careful approach to specifying the dependence structure is warranted.
We view the primary contributions of this paper as follows.
1. We provide a computational procedure for determining whether a given covariance matrix is feasible
or not for a copula, i.e., for a random vector with uniform(0, 1] marginals. (In this case the rank
covariance and product-moment covariance are identical.) If the covariance matrix is feasible, then
an explicit construction of a joint density with these properties, that we call a chessboard density,
is provided. The method works for all covariance matrices that do not lie on the boundary of the
set of feasible covariance matrices; see Section 2. To the best of our knowledge, this is the first
example of such a procedure. This case is important because it is central to the analysis and use
of rank covariance for continuous marginals; see Section 1.
2. We provide a computational procedure for determining whether or not a given Pearson product-
moment covariance matrix is feasible for a given set of more general marginal distributions. If the
covariance matrix is feasible for the given set of marginals, we provide an explicit construction of
a chessboard distribution with the desired properties. Again, this procedure works for covariance
matrices that do not lie on the boundary of the set of feasible covariance matrices, and we believe
that this is the first example of such a procedure.
3. We rigorously establish that there are feasible covariance matrices that cannot be matched using
the NORTA method.
4. We provide a simple modification to the initialization phase of the NORTA method that enables one
to use the NORTA method to closely approximate the desired covariance matrix. The modification
involves the solution of a semidefinite program, and works in both the rank correlation and Pearson
product-moment correlation cases without any specialization. Based on a small computational
study, it appears that when one cannot exactly match a desired covariance, the discrepancy between
the desired and realized covariance matrices is quite small, at least for 3-dimensional random
vectors.
The remainder of this paper is organized as follows. In Section 1 we review the NORTA method,
and describe how it may be used to match a given Pearson product-moment covariance matrix, or a
given rank covariance matrix. In Section 2 we develop the theory of chessboard distributions in the
case where all of the marginal distributions are uniform(0, 1]. The chessboard distribution concept is
extended to more general marginals in Section 3. Next, in Section 4 we present a small computational
study that sheds light on when we might expect the NORTA method to be unable to match a feasible
covariance matrix, and provide several examples where this occurs. Our numerical results suggest that
as the covariance matrix gets close to the boundary, the linear program (LP) that needs to be solved
increases in size. In this section, we also present our modification of the NORTA method that involves
semidefinite programming. Finally, in Section 5 we discuss conclusions and future research.
1 The NORTA method
Cario and Nelson (1997) described the “NORmal To Anything” (NORTA) method for generating i.i.d.
replicates of a random vector X∗ say with prescribed marginal distributions and covariance structure.
In this method, one starts by generating a random vector Z with a multivariate normal distribution and
transforms Z to obtain a random vector X = (X1, . . . , Xd). Let Fi be the desired marginal distribution
function of X∗i , for i = 1, . . . , d.
The NORTA method generates i.i.d. replicates of X by the following procedure.
1. Generate an IR^d-valued standard normal random vector Z = (Z1, . . . , Zd) with mean vector 0 and
covariance matrix ΣZ = (ΣZ(i, j) : 1 ≤ i, j ≤ d), where ΣZ(i, i) = 1 for i = 1, . . . , d.
2. Compute the vector X = (X1, . . . , Xd) via

   Xi = Fi⁻¹(Φ(Zi)),   (1)

   for i = 1, . . . , d, where Φ is the distribution function of a standard normal random variable, and

   Fi⁻¹(u) = inf{x : Fi(x) ≥ u}.   (2)
The vector X generated by this procedure will have the prescribed marginal distributions. To see
this, note that each Zi has a standard normal distribution, so that Φ(Zi) is uniformly distributed on
(0, 1), and so F−1i (Φ(Zi)) will have the required marginal distribution.
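To make the two steps concrete, here is a minimal Python sketch of the procedure (our own illustration, not the implementation of Cario and Nelson; the exponential and uniform marginals below are hypothetical choices):

```python
import numpy as np
from scipy.stats import norm, expon

def norta_sample(sigma_z, inv_cdfs, size, rng):
    """Generate NORTA replicates: multivariate normal Z, then X_i = F_i^{-1}(Phi(Z_i))."""
    d = sigma_z.shape[0]
    # Step 1: multivariate normal with covariance Sigma_Z (unit diagonal).
    z = rng.multivariate_normal(np.zeros(d), sigma_z, size=size)
    # Step 2: component-wise transform through Phi and the inverse marginal cdfs.
    u = norm.cdf(z)  # each column is uniform on (0, 1)
    return np.column_stack([inv_cdfs[i](u[:, i]) for i in range(d)])

rng = np.random.default_rng(1)
sigma_z = np.array([[1.0, 0.5], [0.5, 1.0]])
# Hypothetical marginals: exponential with mean 1, and uniform on (0, 1].
x = norta_sample(sigma_z, [expon.ppf, lambda u: u], size=100_000, rng=rng)
```

By construction each column has the prescribed marginal distribution regardless of ΣZ; only the dependence between columns is controlled by ΣZ.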
The covariance matrix ΣZ should be chosen so that it induces the required correlation structure on
X. There are many measures of correlation between two random variables, but perhaps the two most
popular are Pearson’s product-moment correlation, and Spearman’s rank correlation.
1.1 Pearson’s Product-Moment Correlation
Suppose that we wish X∗ to have Pearson product-moment covariance matrix Σ, where
   Σ(i, j) = E[X∗i X∗j] − E[X∗i] E[X∗j]
for 1 ≤ i, j ≤ d. This is the case that Cario and Nelson (1997) examined. Note that this is equivalent
to prespecifying the correlation matrix, since the marginal distributions are also prespecified. To ensure
that the required correlations are defined, we make the assumption that E[(X∗i )2] < ∞ for i = 1, . . . , d.
It turns out that choosing ΣZ to arrive at the correct covariance matrix Σ is a nontrivial problem.
Let X be the random vector generated from (1) above and ΣX denote its covariance matrix. As
noted in Li and Hammond (1975) and Cario and Nelson (1997), each term ΣX(i, j) = cov(Xi, Xj) is a
function of cov(Zi, Zj) only. To see this, note that when corr(Zi, Zj) ≠ ±1,

   cov(Xi, Xj) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} Fi⁻¹(Φ(zi)) Fj⁻¹(Φ(zj)) ϕij(zi, zj) dzi dzj − EXi EXj,   (3)
where ϕij is the joint density of (Zi, Zj). The expression (3) depends only on the marginal distributions
Fi and Fj , and the density ϕij . The density ϕij depends only on the covariance between Zi and Zj .
When cov(Zi, Zj) = ±1, the joint density ϕij degenerates and the integral representation (3) is no longer
valid. However, in this degenerate case the covariance between Xi and Xj is still a function only of the
covariance between Zi and Zj . Hence, the relation (3) defines a function cij : [−1, 1] → IR mapping
cov(Zi, Zj) to cov(Xi, Xj), where Xi and Xj are defined via (1).
So the problem of matching a desired covariance matrix reduces to d(d− 1)/2 separate root-finding
problems of selecting cov(Zi, Zj) to match cov(Xi, Xj) to Σ(i, j). Unfortunately, there is no general
analytical expression for the function cij, and so the exact ΣZ to be used cannot be determined
analytically.
Cario and Nelson (1997) established that under very mild conditions, the function cij is a continuous
non-decreasing function of ΣZ(i, j). This result allows us to perform an efficient numerical search for
values ΛZ(i, j) that yield
cij(ΛZ(i, j)) = Σ(i, j) for i < j. (4)
Remark 2 Under more restrictive assumptions on the marginal distributions than Cario and Nelson
(1997) impose, Henderson, Chiera and Cooke (2000) show that (4) possesses a unique solution.
We take ΛZ(i, i) = 1 for i = 1, . . . , d, and for i > j, set ΛZ(i, j) = ΛZ(j, i) to ensure that ΛZ is
symmetric. Alternatives to the numerical search suggested by Cario and Nelson (1997) include the use of
a stochastic root-finding algorithm (Chen 2001), or polynomial expansions (van der Geest 1998). Unless
otherwise stated, we henceforth assume that a solution to (4) exists.
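As an illustration of solving (4) numerically, the sketch below (our own illustration, not the authors' search procedure) approximates cij for uniform(0, 1] marginals by Gauss-Hermite quadrature and then brackets the root. For uniform marginals the result can be checked against Kruskal's closed form (5) below:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def c_uniform(lam, deg=80):
    """cov(Phi(Z_i), Phi(Z_j)) for standard bivariate normal (Z_i, Z_j) with correlation lam."""
    # Gauss-Hermite nodes/weights for the weight exp(-x^2/2); normalize to N(0,1).
    x, w = np.polynomial.hermite_e.hermegauss(deg)
    w = w / np.sqrt(2.0 * np.pi)
    z1 = x[:, None]
    z2 = lam * z1 + np.sqrt(1.0 - lam**2) * x[None, :]  # Z_j = lam Z_i + sqrt(1-lam^2) W
    e_uv = np.sum(w[:, None] * w[None, :] * norm.cdf(z1) * norm.cdf(z2))
    return e_uv - 0.25  # subtract E[Phi(Z_i)] E[Phi(Z_j)] = 1/4

target = 0.5 / 12.0  # desired covariance of the uniforms (correlation 0.5)
lam_star = brentq(lambda lam: c_uniform(lam) - target, -0.99, 0.99)
# Kruskal's closed form: Lambda_Z = 2 sin(2 pi Sigma) = 2 sin(pi rho / 6).
print(lam_star, 2.0 * np.sin(np.pi * 0.5 / 6.0))
```

The same bracketing works for any marginals for which cij can be evaluated numerically, since cij is continuous and nondecreasing.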
One might hope that if the matrix ΛZ satisfies (4), then ΛZ could be used in the NORTA method to
generate i.i.d. replicates of X. Unfortunately, the results of this paper prove that this is not always the
case. In fact, there exists a feasible covariance matrix for a 3-dimensional random vector with uniform
marginals on (0, 1] that cannot be generated with the NORTA procedure. The problem arises when the
matrix ΛZ as determined from (4) is not positive semidefinite, in which case it is not a valid covariance
matrix.
Li and Hammond (1975) suggested the following example to illustrate this important fact. Let
X∗1, X∗2 and X∗3 be 3 uniformly distributed random variables on (0, 1] with covariance matrix

            ⎡  1   −0.4   0.2 ⎤
   Σ = (1/12) ⎢ −0.4    1    0.8 ⎥ .
            ⎣  0.2   0.8    1  ⎦
In the special case when X∗ has uniform marginals, the equations (4) can be solved analytically. In
particular, Kruskal (1958) showed that the (unique) solution to (4) is given by
ΛZ(i, j) = 2 sin[2πΣ(i, j)]. (5)
For the Li and Hammond example, the (unique) matrix ΛZ found from (5) is not positive semidefinite.
It is important to observe though, that this is a counterexample only if the postulated random vector
exists. Li and Hammond did not show this.
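Whether or not the postulated random vector exists, the non-positive-semidefiniteness itself is easy to verify numerically. Applying (5) entrywise (with Σ(i, j) = ρij/12, since uniform(0, 1] variances are 1/12) and checking the eigenvalues:

```python
import numpy as np

# Correlations of the Li and Hammond example; Sigma = rho / 12 for uniform marginals.
rho = np.array([[ 1.0, -0.4, 0.2],
                [-0.4,  1.0, 0.8],
                [ 0.2,  0.8, 1.0]])
sigma = rho / 12.0
# Kruskal's solution (5) to the matching equations (4), applied entrywise.
lam_z = 2.0 * np.sin(2.0 * np.pi * sigma)
eigenvalues = np.linalg.eigvalsh(lam_z)
print(eigenvalues)  # the smallest eigenvalue is negative: Lambda_Z is not PSD
```

The diagonal of ΛZ is 1, as it must be, but ΛZ has a negative eigenvalue and so is not a valid covariance matrix for the normal vector Z.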
Remark 3 It is straightforward to show that the Li and Hammond example cannot be generated using the
extremal distributions method of Hill and Reilly (1994). One simply attempts to solve the LP suggested
by Hill and Reilly (1994), which turns out to be infeasible. Therefore, if the Li and Hammond example
exists, it shows that there are feasible covariance matrices that cannot be matched using the extremal
distributions technique.
Lurie and Goldberg (1998) gave an example with nonuniform marginals and positive definite covari-
ance matrix for which the solution to (4) is also not positive semidefinite. They did not establish that
the postulated random vector exists.
When all of the marginals have continuous distribution functions, a natural alternative to the nu-
merical search procedure mentioned earlier is to “work in Gaussian space”. In other words, given a
set of data with known (or fitted) marginals with continuous distribution functions, we first transform
the data set into normal random variates using the inverse of the transformation (1). We can then
compute an empirical covariance matrix ΣZ and use this covariance matrix in the NORTA procedure.
(If the distribution function F of a random variable X is not continuous, then F (X) does not have a
uniform distribution on (0, 1), and so one will not obtain a normally distributed random variable using
Φ−1(F (X)). Therefore, the continuity of the marginal distribution functions is needed.)
This approach is certainly simpler than a numerical search procedure, but it has two important
drawbacks. First, it requires a set of input data, which may not be available in general. But second,
and perhaps more importantly, this procedure does not necessarily ensure that the resulting X variates
will have the required covariance structure. To see why, observe that the transformed normal random
variables mentioned above are unlikely to have a joint normal distribution. Therefore, the correlations of
the jointly normal random variables used in the NORTA method using ΣZ will be unlikely to transform
through the NORTA procedure to yield the desired covariance matrix for X, as one might otherwise
expect. This is a subtle point, but one that is worth bearing in mind.
1.2 Spearman’s Rank Correlation
Suppose now that we wish X∗ to have Spearman's rank covariance matrix Σ, where

   Σ(i, j) = rcov(X∗i, X∗j) = E[Fi(X∗i) Fj(X∗j)] − E[Fi(X∗i)] E[Fj(X∗j)]
for 1 ≤ i, j ≤ d. This is the case treated by Clemen and Reilly (1999). In contrast to product-moment
covariance, the rank covariance is always defined, because Fi(X∗i ) is a bounded random variable. In
fact, if Fi is continuous, then Fi(X∗i ) is uniformly distributed on (0, 1). An important property of
Spearman’s rank covariance is that unlike Pearson’s product-moment covariance, it is preserved under
strictly increasing transformations of the random variables.
If all of the marginal distribution functions Fi are continuous, then the NORTA transformation (1)
is strictly increasing. In this case, the rank covariance is preserved by the NORTA transformation, and
so if X is the NORTA generated random vector, then
rcov(Xi, Xj) = cov(Φ(Zi), Φ(Zj)). (6)
But (6) is precisely the quantity Σ(i, j) in (5). Therefore, given a desired rank covariance matrix Σ, we
simply compute ΣZ = ΛZ via (5) and use this within the NORTA procedure.
Observe that if the random vector in the Li and Hammond example (given above) exists, then it is
again an example showing that there are feasible rank covariance matrices for a given set of marginals
that cannot be matched using a NORTA procedure.
In the case where Fi (say) is not continuous, (6) no longer holds. Therefore, the analytical expression
(5) cannot be used. However, one could use a numerical search procedure as in Cario and Nelson (1997)
to identify the covariance ΣZ(i, j) that yields the required rank covariance rcov(Xi, Xj). This follows
since the rank covariance between Xi and Xj is a nondecreasing continuous function of the covariance
between Zi and Zj . The nondecreasing property follows immediately from the proof of Theorem 1 in
Cario and Nelson (1997), and the fact that the function Fi(Fi⁻¹(Φ(·))) is nondecreasing. Continuity
follows from Theorem 2 of Cario and Nelson.
In this section we reviewed the NORTA method and discussed the significance of the Li and Hammond
example. We now turn to the question of whether the Li and Hammond counterexample exists or not.
2 Copulas
Recall that a copula is the joint distribution function of a random vector with uniform marginals on (0, 1].
In this section, we develop a method for either constructing the joint distribution of a 3-dimensional
copula with prescribed covariance matrix, or establishing that such a joint distribution does not exist.
(Note that Pearson’s product-moment covariance and Spearman’s rank covariance coincide in the copula
case, so the problem is well-defined.) The approach is easily carried over to the general case of a d-
dimensional copula with prescribed covariance matrix.
We will let X = (X1, X2, X3) denote a random vector with such a distribution and let Σ = (Σij :
1 ≤ i, j ≤ 3) be the desired covariance matrix. We first construct the probability mass function (pmf)
of a random vector Y = (Y1, Y2, Y3) whose marginals are discretized versions of the marginals of X.
The pmf will be constructed to try to ensure that Y has covariance matrix Σ (except for the diagonal
entries). From this pmf, we then construct the density of X in such a way that the off-diagonal entries
in the covariance matrix are maintained. (The diagonal elements are determined by the marginals of
X.) This then yields the required construction.
Our notation will appear partly redundant at times, but this is done to ensure consistency with
Section 3 where we will extend these ideas to more general marginal distributions.
Let n ≥ 1 be an integral parameter that determines the level of discretization that will be performed.
Let yi,k = k/n, k = 0, . . . , n, be the set of points that divide the range (0, 1] of the ith variable into n
equal-length sub-intervals. For k = 1, . . . , n and i = 1, 2, 3, let

   Yi,k = E[Xi | Xi ∈ (yi,k−1, yi,k]] = (2k − 1)/(2n)   (7)
be the conditional mean of Xi given that it lies in the kth sub-interval.
The support of the random vector Y is the mesh of points
   {(Y1,i, Y2,j, Y3,k) : 1 ≤ i, j, k ≤ n}.
Let
   q(i, j, k) = P(Y1 = Y1,i, Y2 = Y2,j, Y3 = Y3,k)
be the probability that Y equals the (i, j, k)th point in the support of Y , so that q represents the pmf
of the random vector Y . (Note that it is not the pmf itself, since the function q is defined on integers,
while the domain of the pmf is contained in the unit cube.)
Consistent with the notion that Y is a discretized version of X, we also have that
q(i, j, k) = P (X ∈ C(i, j, k)),
where the cell C(i, j, k) represents the cube of points surrounding the (i, j, k)th point in the support of
Y . More precisely,
   C(i, j, k) = {(x1, x2, x3) : y1,i−1 < x1 ≤ y1,i, y2,j−1 < x2 ≤ y2,j, y3,k−1 < x3 ≤ y3,k}.
We then see that

   ∑_{j,k=1}^{n} q(i, j, k) = P(Y1 = Y1,i) = P(X1 ∈ (y1,i−1, y1,i]) = 1/n,  ∀ i = 1, . . . , n,   (8)

   ∑_{i,k=1}^{n} q(i, j, k) = P(Y2 = Y2,j) = P(X2 ∈ (y2,j−1, y2,j]) = 1/n,  ∀ j = 1, . . . , n,   (9)

   ∑_{i,j=1}^{n} q(i, j, k) = P(Y3 = Y3,k) = P(X3 ∈ (y3,k−1, y3,k]) = 1/n,  ∀ k = 1, . . . , n,   (10)

   q(i, j, k) ≥ 0,  ∀ i, j, k = 1, . . . , n.   (11)
With these constraints satisfied, we then have that EYi = 1/2 = EXi for i = 1, 2, 3. To see this,
note that for Y1, we have that

   EY1 = ∑_{i,j,k=1}^{n} Y1,i q(i, j, k)
       = ∑_{i,j,k=1}^{n} E[X1 | X1 ∈ (y1,i−1, y1,i]] P(X ∈ C(i, j, k))
       = ∑_{i=1}^{n} E[X1 | X1 ∈ (y1,i−1, y1,i]] P(X1 ∈ (y1,i−1, y1,i])
       = EX1.
Recall that our intermediate goal is to match the covariance matrix of Y to that of X (with the
exception of the diagonal elements). We do this using an LP. If Cij = cov(Yi, Yj), then we want to
minimize

   |C12 − Σ12| + |C13 − Σ13| + |C23 − Σ23|.   (12)
Now

   C12 = ∑_{i,j,k=1}^{n} Y1,i Y2,j q(i, j, k) − EY1 EY2,

which is a linear function of the q(i, j, k)'s, with similar linear expressions for C13 and C23. Furthermore,
the matrix Σ is simply a parameter, and so, using a standard trick in linear programming, we can represent
|C12 − Σ12| in a linear fashion, and similarly for the other terms in (12), as follows.
Define Z+ij and Z−ij to be the positive and negative parts of the difference Cij − Σij, i.e.,

   Z+ij = (Cij − Σij)+ = max{Cij − Σij, 0},  and  Z−ij = (Cij − Σij)− = −min{Cij − Σij, 0}.
We can now attempt to match the covariances of Y to those of X using the LP

   min  ∑_{i=1}^{2} ∑_{j=i+1}^{3} (Z+ij + Z−ij)

   subject to  Cij − Σij = Z+ij − Z−ij,  i = 1 to 2 and j = i + 1 to 3,
               Z+ij ≥ 0,  Z−ij ≥ 0,

together with constraints (8), (9), (10) and (11).
This LP is always feasible, since a product copula where the Yi's are independent can easily be
constructed by setting all q(i, j, k) = n⁻³. Also, the objective function of the LP is bounded below by
0, so an optimal solution exists.
If the optimal objective value for the LP is 0, then we have constructed a joint probability mass
function for Y that has the desired covariance structure, i.e., cov(Yi, Yj) = Σij .
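For modest n the LP is small enough to set up and solve directly. The sketch below does so with scipy's linprog; the target covariance (uniform correlation 0.5 between the first two components, zero elsewhere) is an illustrative feasible choice, not an example from the paper:

```python
import numpy as np
from scipy.optimize import linprog

n = 4
mid = (2 * np.arange(1, n + 1) - 1) / (2.0 * n)  # cell midpoints Y_{i,k} from (7)
# Illustrative target: correlation 0.5 between components 1 and 2, 0 elsewhere.
Sigma = {(0, 1): 0.5 / 12.0, (0, 2): 0.0, (1, 2): 0.0}
pairs = [(0, 1), (0, 2), (1, 2)]

nq = n ** 3                       # one variable q(i, j, k) per cell ...
nv = nq + 2 * len(pairs)          # ... plus Z+ and Z- for each pair
c = np.zeros(nv); c[nq:] = 1.0    # objective: sum of Z+ and Z-

A_eq, b_eq = [], []
I, J, K = np.meshgrid(range(n), range(n), range(n), indexing="ij")
axes = [I.ravel(), J.ravel(), K.ravel()]
# Marginal constraints (8)-(10): each slice of q sums to 1/n.
for ax in range(3):
    for lvl in range(n):
        row = np.zeros(nv)
        row[:nq] = (axes[ax] == lvl).astype(float)
        A_eq.append(row); b_eq.append(1.0 / n)
# Covariance constraints: C_ab - Sigma_ab = Z+ - Z-, with C_ab linear in q.
for p, (a, b) in enumerate(pairs):
    row = np.zeros(nv)
    row[:nq] = mid[axes[a]] * mid[axes[b]]               # E[Y_a Y_b] as a function of q
    row[nq + 2 * p] = -1.0                               # -Z+
    row[nq + 2 * p + 1] = 1.0                            # +Z-
    A_eq.append(row); b_eq.append(Sigma[(a, b)] + 0.25)  # move E[Y_a]E[Y_b] = 1/4 across
A_eq, b_eq = np.array(A_eq), np.array(b_eq)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(res.status, res.fun)  # expect status 0; objective near 0 means Sigma is matched
```

For this target the optimal objective is 0 already at n = 4, so the recovered q defines a discretized distribution with the desired covariances.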
The discretized random vector Y does not possess continuous uniform (0,1] marginals. However,
we can construct a random vector X with continuous uniform marginals from Y in such a way that
cov(Yi, Yj) = cov(Xi, Xj) for i ≠ j, i.e., the covariances are preserved. Assuming the optimal objective
value of the LP is 0, this then yields an explicit construction of a random vector with the desired
marginals and covariance matrix.
By conditioning on the cell containing X, we see that the requirement that cov(Y1, Y2) = cov(X1, X2)
is equivalent to

   ∑_{i,j,k=1}^{n} q(i, j, k) Y1,i Y2,j − EY1 EY2 = ∑_{i,j,k=1}^{n} E[X1X2 | X ∈ C(i, j, k)] P(X ∈ C(i, j, k)) − EX1 EX2.   (13)

But EY1 = EX1 and EY2 = EX2, and so (13) can be re-expressed as

   ∑_{i,j,k=1}^{n} q(i, j, k) ( E[X1 | X ∈ C(i, j, k)] E[X2 | X ∈ C(i, j, k)] − E[X1X2 | X ∈ C(i, j, k)] ) = 0.   (14)
Equation (14) could be satisfied in many ways, but perhaps the simplest is to note that (14) will hold
if, conditional on X lying in C(i, j, k), X1, X2 and X3 are independent. In that case, each term in the
sum (14) is 0. One can ensure that this conditional independence holds, while simultaneously ensuring
that X has the correct marginal distributions, by setting the density of X within the cell C(i, j, k) to
that of independent, uniformly distributed random variables, scaled so that the total mass in the cell is
q(i, j, k). To be precise, if f is the density of X, then for any x ∈ C(i, j, k), we set
   f(x) = n³ q(i, j, k).   (15)
In a sense, we are “smearing” the mass q(i, j, k) uniformly over the cell C(i, j, k).
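Sampling from the resulting density is straightforward: choose a cell according to q, then draw the three coordinates independently and uniformly within it. A minimal sketch (q is assumed given as an n × n × n array of cell probabilities; the independence copula is used as a placeholder):

```python
import numpy as np

def chessboard_sample(q, size, rng):
    """Draw from the chessboard density (15): choose a cell by q, then smear uniformly."""
    n = q.shape[0]
    flat = q.ravel()
    cells = rng.choice(flat.size, size=size, p=flat)         # cell index for each draw
    ijk = np.column_stack(np.unravel_index(cells, q.shape))  # (i, j, k) per draw
    # Uniform within the chosen cell: lower corner ijk/n plus an offset of width 1/n.
    return (ijk + rng.random((size, 3))) / n

rng = np.random.default_rng(0)
n = 4
q = np.full((n, n, n), 1.0 / n**3)  # independence copula as a simple example
x = chessboard_sample(q, 50_000, rng)
```

The smearing step is exactly what makes the conditional independence within each cell hold, so the covariances of the sampled X agree with those of the discretized Y.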
Theorem 1 below proves that if the optimal objective value of the LP is 0, then the density f so
constructed has the desired marginals and covariance matrix.
Theorem 1 If the optimal objective value of the LP is 0, then the density f defined via (15) has uniform
(0,1] marginals and covariance matrix Σ.
Proof: Clearly, f is nonnegative and integrates to 1. Next, we need to show that the marginals, fj say,
of f are uniform. For any x ∈ (y1,i−1, y1,i), we have that

   f1(x) dx = ∑_{j,k=1}^{n} P(X1 ∈ [x, x + dx) | X ∈ C(i, j, k)) P(X ∈ C(i, j, k))
            = ∑_{j,k=1}^{n} P(X1 ∈ [x, x + dx) | X1 ∈ (y1,i−1, y1,i]) q(i, j, k)
            = ∑_{j,k=1}^{n} ( dx / ∫_{y1,i−1}^{y1,i} 1 dy ) q(i, j, k)
            = n ∑_{j,k=1}^{n} q(i, j, k) dx = 1 dx.
The first equation follows by conditioning on the cell in which the random vector lies, and the second
by the conditional independence of X1, X2 and X3 given that X lies in C(i, j, k). The third follows from
the assumption of uniform “smearing” of q(i, j, k) on the cell C(i, j, k). A similar result holds for the
marginals of X2 and X3, and so the joint density f has the right marginals.
Next we need to show that the obtained covariances are indeed the desired ones. Take the case of
cov(X1, X2). Starting with its definition, we have

   cov(X1, X2) = EX1X2 − EY1 EY2
               = EY1Y2 − EY1 EY2
               = Σ12.
The first equality follows from the fact that EY1 = EX1 and EY2 = EX2, and the second is just a
restatement of (14). The final equation follows from the fact that the optimal objective value is 0.
The same follows for cov(X2, X3) and cov(X1, X3). Hence, f has the covariances as desired and this
completes the proof. □
Remark 4 The name “chessboard” distribution is motivated by the form of (15) in a 2-dimensional
problem. In this case, the unit square is broken into n² squares, and the density f is constant on
each square, with value n²q(i, j).
Remark 5 There is no need for the cells used in the above construction to be of equal size. Indeed,
Theorem 1 remains true for more general discretizations; see Theorem 10 in Section 3.
The feasible region of the LP can be reduced through the inclusion of constraints on the Z+ij's and
Z−ij's. These constraints provide us with a new feasibility criterion to test for the existence of a random
vector with the given covariance matrix.
The constraints are developed by assuming that a random vector X with uniform marginals and
covariance matrix Σ exists, discretizing X to obtain a new random vector X̄ say, and then bounding the
change in the covariances resulting from the discretization.
So suppose that we discretize X to obtain X̄. Let

   q̄(i, j, k) = P(X̄ = (Y1,i, Y2,j, Y3,k)),
and observe that q provides a feasible solution to the above LP. We now wish to bound the change in
the covariance resulting from this discretization. Observe that
cov(X̄1, X̄2) − Σ12 = EX̄1X̄2 − EX1X2
= ∑_{i,j,k=1}^n (Y1,iY2,j − E[X1X2 | X ∈ C(i, j, k)]) q(i, j, k). (16)
But
y1,i−1 y2,j−1 ≤ E[X1X2 | X ∈ C(i, j, k)] ≤ y1,i y2,j. (17)
Combining (16) with (17) we see that
cov(X̄1, X̄2) − Σ12 ≤ ∑_{i,j,k=1}^n q(i, j, k)(Y1,iY2,j − y1,i−1 y2,j−1) and (18)

cov(X̄1, X̄2) − Σ12 ≥ ∑_{i,j,k=1}^n q(i, j, k)(Y1,iY2,j − y1,i y2,j). (19)
Equation (18) gives an upper bound on Z+12, and (19) gives an upper bound on Z−12. Similar bounds
may be obtained for the other covariances. After substituting in the explicit expressions for yi,k and
Yi,k, these bounds simplify to
Z+_ij ≤ 1/(2n) − 1/(4n^2) and Z−_ij ≤ 1/(2n) + 1/(4n^2), 1 ≤ i < j ≤ 3. (20)
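The simplification behind (20) can be checked numerically. With uniform marginals, Y1,i = (2i − 1)/(2n) and y1,i = i/n, and each cell term in (18) is linear in i and j, so the right-hand side depends on q only through its (fixed) marginals; any feasible q gives the same value. A sketch using the independent masses q(i, j, k) = 1/n^3:

```python
def bound_rhs_18(n):
    # Right-hand side of (18) with the independent masses q(i,j,k) = 1/n^3,
    # support points Y_{1,i} = (2i - 1)/(2n) and cut points y_{1,i} = i/n.
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            term = (2*i - 1) * (2*j - 1) / (4 * n**2) - (i - 1) * (j - 1) / n**2
            # summing q(i,j,k) = 1/n^3 over k leaves weight 1/n^2 per (i, j)
            total += term / n**2
    return total
```

For every n this evaluates to exactly 1/(2n) − 1/(4n^2), the bound on Z+_ij in (20).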
Once the LP is augmented with the bounds (20), it is no longer guaranteed to be feasible. In fact,
Theorem 2 below establishes that if the augmented LP is infeasible for any value of n ≥ 1, then the
covariance matrix Σ is not feasible for uniform marginals. The proof is basically a summary of the above
discussion, and is given to help clarify these ideas.
Theorem 2 If the augmented LP is infeasible for some n ≥ 1, then there cannot exist a random vector
X with uniform marginals and the desired covariance matrix Σ.
Proof: Suppose there exists a random vector X with uniform marginals and covariance matrix Σ. Then,
as above, we can construct a solution q by discretizing X that satisfies all of the constraints, including
the bounds (20). Thus the augmented LP is feasible, which is a contradiction. □
In fact, one can prove a converse to Theorem 2.
Theorem 3 If the covariance matrix Σ is not feasible for uniform (0,1] marginals, then there exists an
n ≥ 1 such that the augmented LP is infeasible.
Proof: On the contrary, suppose that the augmented LP is feasible for all n ≥ 1. Let qn denote an
optimal solution to the nth augmented LP, and let µn denote the probability measure corresponding to
the density resulting from the smearing operation (15) applied to qn. Then each µn is the distribution
of a random vector with support contained in (0, 1]3 with uniform(0, 1] marginals. Hence, the sequence
(µn : n ≥ 1) is tight, and by Theorem 29.3 on p. 392 of Billingsley (1986), it possesses a weakly
convergent subsequence (µn(k) : k ≥ 1), converging to µ say.
Now, µ has uniform (0, 1] marginals. This follows from Theorem 29.2, p. 391 of Billingsley (1986)
since each µn(k) has uniform(0, 1] marginals, µn(k) ⇒ µ as k →∞, and the projection map πj : IR3 → IR
that returns the jth coordinate of a vector in IR3 is continuous.
Now, if C^n is the covariance matrix of the distribution qn, then

∑_{i=1}^{2} ∑_{j=i+1}^{3} |C^n_ij − Σij| ≤ 3/(2n) + 3/(4n^2) → 0
as n → ∞. This follows from the bounds (20), and the fact that in any optimal solution, it is not the
case that both Z+ij and Z−ij are strictly positive.
Finally, if X^{n(k)} has distribution µn(k), then (X^{n(k)}_i X^{n(k)}_j : k ≥ 1) is a uniformly bounded sequence of
random variables, and therefore uniformly integrable. It immediately follows that the covariance matrix
Λ of µ is given by

Λ = lim_{k→∞} C^{n(k)} = Σ.
Thus, µ has the required marginals and covariance matrix, which is a contradiction, and the result is
proved. □
Combining Theorems 2 and 3, we see that a covariance matrix is infeasible for uniform
marginals if, and only if, the augmented LP is infeasible for some n ≥ 1.
Given this very sharp characterization of infeasible covariance matrices, it is natural to ask whether
a similar result holds for feasible covariance matrices. We would then have the result that a covariance
matrix is feasible for a given set of marginals if and only if there is some finite n such that the optimal
objective value of the augmented LP is zero. Unfortunately, this conjecture is false.
Suppose that X1 = X2, and hence cov(X1, X2) = var(X1) = 1/12. For given n, the covariance
between Y1 and Y2 is maximized by concentrating all mass on the diagonal cells (i, i), and so q(i, i) = 1/n for
1 ≤ i ≤ n. In that case, we have that

cov(Y1, Y2) = ∑_{i=1}^n ((2i − 1)/(2n))^2 (1/n) − (1/2)^2 = 1/12 − 1/(12n^2).

Therefore, cov(Y1, Y2) < 1/12 for all finite n, and so the conjecture is false.
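The closed form above is easy to verify numerically (a sketch of the comonotone example):

```python
def cov_diagonal(n):
    # All mass on the diagonal cells: q(i, i) = 1/n, with support points
    # Y_{1,i} = Y_{2,i} = (2i - 1)/(2n); each marginal mean is 1/2.
    second_moment = sum(((2*i - 1) / (2*n))**2 for i in range(1, n + 1)) / n
    return second_moment - 0.25  # E[Y1 Y2] - (E Y1)(E Y2)
```

The gap to var(X1) = 1/12 is 1/(12 n^2), strictly positive for every finite n.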
Notice that the covariance matrix in this example is singular. This example is a special case of the
following result.
Theorem 4 All chessboard densities have nonsingular covariance matrices.
Proof: On the contrary, suppose that f is a chessboard density with singular covariance matrix Σ, and
let X have density f . Since Σ is singular, there exists a nonzero vector α such that Σα = 0. Hence,
var(α′X) = α′Σα = 0, and so α′X = α′EX a.s. Since α is nonzero, we may, by relabelling variables if
necessary, write X1 as a linear function of X2, X3, say X1 = β0 + β2X2 + β3X3. This equality must also
hold conditional on X ∈ C(i, j, k). But the components of X are conditionally independent given that
X ∈ C(i, j, k) because f is a chessboard density, which is the required contradiction. □
The importance of Theorem 4 is that if Σ is feasible for the given marginals and singular, then no
matter how large n may be, the optimal objective value of the LP will always be > 0, i.e., we cannot
exactly match the covariance matrix Σ. However, we can come arbitrarily close, as the following result
shows.
Theorem 5 Suppose that the covariance matrix Σ is feasible for uniform (0, 1] marginals. Then for
all n ≥ 1, the augmented LP is feasible, and if z(n) is the optimal objective value of the nth LP, then
z(n) → 0 as n →∞.
Proof: Since Σ is feasible for uniform marginals, the augmented LP is feasible for all n ≥ 1. (This is
just the contrapositive of Theorem 2.) Let qn denote an optimal solution to the nth LP, and let fn be
the corresponding smeared density. If Cn is the covariance matrix corresponding to fn, then the bounds
(20) imply that
z(n) = ∑_{i=1}^{2} ∑_{j=i+1}^{3} |C^n_ij − Σij| ≤ 3/(2n) + 3/(4n^2) → 0
as n → ∞. □
Therefore, chessboard densities can come arbitrarily close to any required Σ that is feasible for
uniform marginals. In fact, one can prove that chessboard densities can exactly match a (very) slightly
restricted class of feasible covariance matrices. To state this result we need some notation.
Proposition 6 can be stated and proved for a general dimension d (i.e., not just d = 3)
without any notational difficulty. Any covariance matrix Σ of a d dimensional random vector with
uniform(0, 1] marginals can be characterized by d(d − 1)/2 covariances, since the diagonal entries are
determined by the marginals, and the matrix is symmetric. Hence we can, with an abuse of notation,
think of Σ as a d(d− 1)/2 dimensional vector in some contexts, and as a d× d matrix in others.
Let Ω ⊂ [−1/12, 1/12]^{d(d−1)/2} denote the space of feasible covariance matrices, so that Σ ∈ Ω implies
that there exists a random vector with uniform(0, 1] marginals, and covariance matrix Σ. We will show
below that Ω is nonempty and convex (this is well-known), but also closed and full-dimensional (this
appears to be new). In particular then, any covariance matrix on the boundary of Ω is feasible. We will
also show that Σ is contained in the interior of Ω if, and only if, there is some finite n for which the
augmented LP has objective value 0. The collective implications of this and our previous results will be
discussed after the statement and proof of these results.
Proposition 6 The set Ω is nonempty, convex, closed and full-dimensional.
Proof: If the components of X are independent, then the covariance matrix Σ is diagonal, and so Ω
contains the zero vector, and is therefore nonempty.
It is well-known that Ω is convex. For if Σ1, Σ2 ∈ Ω, then there exist random vectors X, Y with
uniform(0, 1] marginals, and covariance matrices Σ1 and Σ2 respectively. For λ ∈ (0, 1), let Z be given
by X with probability λ, and Y with probability 1−λ. Then Z has covariance matrix λΣ1 + (1−λ)Σ2.
The proof that Ω is closed is virtually identical to that of Theorem 3 and is omitted.
We use the NORTA method to prove that Ω is full-dimensional. We will show that each of the
vectors ±ek/12 are contained in Ω, where ek is the vector whose components are all 0 except for a 1 in
the kth position, for k = 1, . . . , d(d− 1)/2. The convexity of Ω then ensures that Ω is full-dimensional.
Let Z be a multivariate normal random vector with mean 0 and covariance matrix consisting of 1’s
on the diagonal, and also in the (i, j)th and (j, i)th position (i ≠ j), with the remaining components
being 0. That is, Z consists of 2 perfectly correlated standard normal random variables Zi and Zj , and
d − 2 independent standard normal random variables. Now let U be the random vector with uniform
(0, 1) marginals obtained by setting Um = Φ(Zm) for m = 1, . . . , d. Then Ui and Uj are perfectly
correlated, and independent of all of the remaining components of U . Thus, U has covariance matrix
whose components are all 0 except for the diagonal elements, and the (i, j), and (j, i)th elements, which
are equal to 1/12. Thus, ek/12 lies in Ω, where k corresponds to the position (i, j). A similar argument
with perfectly negatively correlated Zi and Zj shows that −ek/12 ∈ Ω. Since i ≠ j were arbitrary, the
proof is complete. □
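The value 1/12 appearing in this proof can be checked numerically: if Zi = Zj then Ui = Uj = Φ(Zi), so cov(Ui, Uj) = var(Φ(Z)) = E[Φ(Z)^2] − 1/4, and since Φ(Z) is uniform on (0, 1) this equals 1/3 − 1/4 = 1/12. A midpoint-rule sketch:

```python
import math

def phi(z):
    # standard normal density
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    # standard normal cdf, via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# With Z_i = Z_j we have U_i = U_j = Phi(Z_i), so
# cov(U_i, U_j) = var(Phi(Z)) = E[Phi(Z)^2] - (1/2)^2.
lo, hi, h = -8.0, 8.0, 1e-3
steps = int(round((hi - lo) / h))
second_moment = sum(Phi(lo + (s + 0.5) * h) ** 2 * phi(lo + (s + 0.5) * h)
                    for s in range(steps)) * h
cov = second_moment - 0.25
# Phi(Z) is uniform on (0, 1), so this recovers var(uniform) = 1/12.
```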
In Theorem 4 we showed that all chessboard densities have nonsingular covariance matrices. This is
almost sufficient to establish that no boundary point of Ω is the covariance matrix of a chessboard density. However,
it is certainly conceivable that the boundary of Ω contains nonsingular, as well as singular, covariance
matrices. So we strengthen Theorem 4 with the following result.
Theorem 7 If fn is a chessboard density with covariance matrix Σ, then Σ is contained in the interior
of Ω.
Proof: Let X have density fn. We will show that we can both increase, and decrease, the covariance
between X1 and X2. Symmetry then allows us to conclude that the same result holds for Xi and Xj
with i ≠ j. The convexity of Ω then completes the proof.
Let q be the discretization of fn into its n3 cells, and let C(i, j, k) be a cell with q(i, j, k) > 0. Divide
the cell C(i, j, k) into 4 (equal size) subcells

Cab(i, j, k) = {(x, y, z) ∈ C(i, j, k) : (2i − (3 − a))/(2n) < x ≤ (2i − (2 − a))/(2n),
(2j − (3 − b))/(2n) < y ≤ (2j − (2 − b))/(2n)},

for 1 ≤ a, b ≤ 2.
Generate a new density g by the usual smearing (15) in all cells except C(i, j, k). Within the cell
C(i, j, k), assign a mass of q(i, j, k)/2 to each of the cells C11(i, j, k), and C22(i, j, k), and then uniformly
smear within these cells. In other words, for (x, y, z) contained in these two cells, set g(x, y, z) =
2n^3 q(i, j, k), and set g to be 0 in the cells Cab(i, j, k) for a ≠ b. Then it is straightforward to show that g
has uniform marginals, that the (1, 2)th covariance is strictly increased, and that the other covariances
remain unchanged.
A similar argument placing the mass in the cells Cab(i, j, k) with a ≠ b shows that the covariance
can be strictly decreased, and so the proof is complete. □
We have thus far shown that if a covariance matrix Σ is not in Ω, then the augmented LP will be
infeasible for some n ≥ 1, and if Σ is on the boundary of Ω, then the LP approach will yield distributions
with covariance matrices that arbitrarily closely approximate Σ, but never actually achieve it. Our final
result shows that if Σ is contained in the interior of Ω, then there is some n ≥ 1 for which the optimal
objective value of the augmented LP is 0, and so one can exactly match Σ using a chessboard density.
Before proving this result, we need the following lemma. This lemma basically states that given a fixed
vector x, we can choose certain other vectors arbitrarily close to x, so that x is a convex combination of
these “close” vectors, and if we perturb the close vectors slightly, then x is still a convex combination of
the perturbed vectors.
For x ∈ IR^m and ε > 0, let B(x, ε) denote the (open) set of vectors {y ∈ IR^m : ρ(x, y) < ε}, where ρ
is the L1 distance

ρ(x, y) = ∑_{i=1}^m |xi − yi|.
The proof of the following lemma may be found in Appendix A.
Lemma 8 Let x ∈ IRm, and let ε > 0 be arbitrary. There exist m + 1 points x1, . . . , xm+1 ∈ B(x, ε),
and a δ > 0 such that if
ρ(xi, x′i) < δ ∀i = 1, . . . , m + 1,
then x may be written as a convex combination of x′1, . . . , x′m+1.
We are now ready to state the final result of this section. As in Proposition 6, there is no loss of
clarity if we state this result for a general dimension d rather than just d = 3.
Theorem 9 If Σ is contained in the interior of Ω, then there exists an n ≥ 1 such that the optimal
objective value of the augmented LP is 0.
Proof: Let m = d(d−1)/2, and for now, consider Σ as an m-vector. Let ε > 0 be such that B(Σ, ε) ⊆ Ω,
and choose Σ1, Σ2, . . . , Σm+1 ∈ B(Σ, ε) and δ as in Lemma 8.
Since Σi ∈ Ω, from Theorem 5 there exists an n(i) such that the augmented LP with target
covariance matrix Σi has optimal objective value smaller than δ, for each i = 1, . . . , m + 1. Let
n = n(1)n(2) · · ·n(m + 1), and let qi denote a solution to the augmented LP with target matrix Σi
and discretization level n for i = 1, . . . ,m + 1. Then the optimal objective value corresponding to qi is
also less than δ. (Note that if k, n ≥ 1 are integers, then the optimal objective values z(n) and z(kn)
satisfy the relationship z(kn) ≤ z(n), since the chessboard density obtained from the solution to the nth
LP can also be obtained from the (kn)th LP.)
Let Σ′i denote the covariance matrix corresponding to the chessboard density f i for the solution qi,
for i = 1, . . . ,m + 1. Then, by Lemma 8, there exist nonnegative multipliers λ1, λ2, . . . , λm+1 summing
to 1 such that
Σ = ∑_{i=1}^{m+1} λi Σ′i. (21)
If we set
f = ∑_{i=1}^{m+1} λi f^i,
then f is also a chessboard density with discretization level n, and from (21), its covariance matrix is
exactly Σ. □
In summary then, we have shown that if Σ is infeasible for uniform marginals, then the augmented
LP will be infeasible for some n ≥ 1. This includes the case where Σ is singular and infeasible for uniform
marginals. Furthermore, we have shown that if Σ is contained in the interior of Ω, then the augmented
LP will have optimal objective value 0 for some n ≥ 1, and so one can construct a chessboard density
from the solution to the augmented LP with the required marginals and covariance matrix. So if Σ does
not lie on the boundary of Ω, then we have an algorithm for determining, in finite time, whether
Σ is feasible for the given marginals or not. One simply solves the augmented LP for n = 1, 2, 3, . . . until
the augmented LP is either infeasible, or has an optimal objective value of 0. In the latter case, we can
deliver an explicit construction of the desired distribution.
The case where Σ lies on the boundary of Ω is more problematic. We have shown that in this case,
Σ is feasible for uniform marginals, but that a chessboard density cannot be constructed with uniform
marginals and covariance matrix Σ. Therefore, for such matrices, the algorithm outlined above will not
terminate in finite time. However, a chessboard distribution can come arbitrarily close to the required
covariance matrix.
3 More general marginals
The LP method used in Section 2 to evaluate the existence of a random vector with uniform marginals
and given covariance matrix can be adapted to investigate the existence of random vectors having
arbitrary marginal distributions and given Pearson product-moment covariance matrix. We will stick to
the case of a 3-dimensional random vector for notational simplicity but note that the approach is easily
extended to the general d-dimensional case.
Let X = (X1, X2, X3) represent the random vector that is to be constructed, and let Σ be the
desired covariance matrix. Let Fi(·) denote the distribution function of Xi, for i = 1, 2, 3. For ease of
exposition we assume that each of the Fi’s has a density fi with respect to Lebesgue measure, although
the approach applies more generally, and in particular, can be applied when some or all of the Xi’s have
discrete distributions.
In the spirit of the method developed in Section 2, we will first construct the probability mass function
of a discretized random vector Y = (Y1, Y2, Y3) with a covariance structure as close to the desired one
as possible, and then derive a joint distribution for X.
Let n1, n2 and n3 represent the levels of discretization of the random variables X1, X2 and X3
respectively, and hence the number of points that form the support of Y1, Y2 and Y3. Let the range of
the variable Xi be divided into ni subintervals (which may, or may not, be equal in length) by the set
of points yi,0, yi,1, . . . , yi,ni, with
−∞ ≤ yi,0 < yi,1 < · · · < yi,ni ≤ ∞.
Note that we explicitly allow yi,0 and yi,ni to be infinite and the spacing between the yi,ks to be arbitrary.
Let Yi,k denote the conditional mean of Xi, given that it lies in the subinterval (yi,k−1, yi,k]. In other
words, we set

Yi,k = E[Xi | Xi ∈ (yi,k−1, yi,k]] = ∫_{yi,k−1}^{yi,k} x fi(x)/Pi(k) dx,
where Pi(k) = Fi(yi,k) − Fi(yi,k−1) represents the probability that Xi lies in the kth subinterval. The
support for the random vector Y is then {(Y1,i, Y2,j, Y3,k) : 1 ≤ i ≤ n1, 1 ≤ j ≤ n2, 1 ≤ k ≤ n3}.
Let q(i, j, k) = P(Y = (Y1,i, Y2,j, Y3,k)) = P(X ∈ C(i, j, k)), where C(i, j, k) is defined as in Section
2 to be the cell corresponding to q(i, j, k). We now give constraints on the q(i, j, k)s analogous to (8)
through (11). Specifically, we have that
∑_{j=1}^{n2} ∑_{k=1}^{n3} q(i, j, k) = P(Y1 = Y1,i) = P(X1 ∈ (y1,i−1, y1,i]) = P1(i), ∀i = 1, . . . , n1,

∑_{i=1}^{n1} ∑_{k=1}^{n3} q(i, j, k) = P(Y2 = Y2,j) = P(X2 ∈ (y2,j−1, y2,j]) = P2(j), ∀j = 1, . . . , n2,

∑_{i=1}^{n1} ∑_{j=1}^{n2} q(i, j, k) = P(Y3 = Y3,k) = P(X3 ∈ (y3,k−1, y3,k]) = P3(k), ∀k = 1, . . . , n3,

q(i, j, k) ≥ 0 ∀i = 1, . . . , n1, j = 1, . . . , n2, k = 1, . . . , n3.
When these constraints are satisfied, EYi = EXi for each i, just as in the case of uniform marginals.
We can now formulate an LP along the lines of that given in Section 2 to attempt to match the covariances
of Y to those required of X. We omit the details.
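To make this setup concrete, the sketch below computes Pi(k) and Yi,k in closed form for an illustrative non-uniform marginal density f(x) = 2x on (0, 1] (our own choice, not one from the paper), and checks that the independent choice q(i, j, k) = P1(i)P2(j)P3(k) satisfies all of the marginal constraints above:

```python
n = 5  # common discretization level; equally spaced cut points y_k = k/n

# For the marginal density f(x) = 2x on (0, 1] (so F(y) = y^2):
# P(k) = F(y_k) - F(y_{k-1}) = (2k - 1)/n^2, and the conditional mean is
# Y_k = (2/3)(y_k^3 - y_{k-1}^3)/(y_k^2 - y_{k-1}^2).
P = [(2*k - 1) / n**2 for k in range(1, n + 1)]
Y = [(2/3) * ((k/n)**3 - ((k - 1)/n)**3) / ((k/n)**2 - ((k - 1)/n)**2)
     for k in range(1, n + 1)]

# Independent cell masses q(i,j,k) = P1(i) P2(j) P3(k); with all three
# marginals identical here, this q satisfies every marginal constraint.
q = [[[P[i] * P[j] * P[k] for k in range(n)] for j in range(n)] for i in range(n)]

def marginal_1(i):
    # left-hand side of the first constraint: sum q over j and k
    return sum(q[i][j][k] for j in range(n) for k in range(n))
```

Each conditional mean Yi,k lies strictly inside its subinterval, and summing q over the other two indices recovers the Pi(·) probabilities exactly.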
The LP is always feasible, since we can take the Yi’s to be independent, and the objective value is
bounded below by 0. Hence, an optimal solution exists for every discretization.
If the optimal objective value is 0, then we have been able to construct a probability mass function
for Y with the required covariance structure. Constructing a joint distribution function for X from this
pmf for Y is similar to the method of uniform “smearing” used in Section 2.
Specifically, the “smearing” process should be able to satisfy (14). Again, the easiest method for
doing so is perhaps to ensure that the variables are conditionally independent given that X lies within
the cell. To ensure that this conditional independence holds while simultaneously ensuring that X has
the right marginals, we set the density of X within the cell C(i, j, k) to be that of independent random
variables with the right marginals, scaled so that the total mass in the cell is q(i, j, k). To be precise, if
f is the density of X, then for any x = (x1, x2, x3) ∈ C(i, j, k), we set
f(x) = (f1(x1)/P1(i)) (f2(x2)/P2(j)) (f3(x3)/P3(k)) q(i, j, k). (22)
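A quick sanity check of (22): since each factor fi(xi)/Pi(·) integrates to 1 over its subinterval, the mass that f assigns to the cell C(i, j, k) is exactly q(i, j, k). The sketch below verifies this by a midpoint rule, again using the illustrative marginal f(x) = 2x on (0, 1] (an assumption of ours, not a density from the paper):

```python
n, m = 2, 20  # n cells per axis; m midpoint nodes per axis within each cell

def f1(x):
    # illustrative marginal density, F(y) = y^2 on (0, 1]
    return 2 * x

P = [(2*k - 1) / n**2 for k in range(1, n + 1)]   # P(k) = F(k/n) - F((k-1)/n)
q = [[[P[i] * P[j] * P[k] for k in range(n)] for j in range(n)] for i in range(n)]

def cell_mass(i, j, k):
    # integrate f over C(i, j, k), with f given by (22), via midpoint rule
    h = 1 / (n * m)
    total = 0.0
    for a in range(m):
        x1 = i/n + (a + 0.5) * h
        for b in range(m):
            x2 = j/n + (b + 0.5) * h
            for c in range(m):
                x3 = k/n + (c + 0.5) * h
                total += (f1(x1)/P[i]) * (f1(x2)/P[j]) * (f1(x3)/P[k]) \
                         * q[i][j][k] * h**3
    return total
```

Because the integrand is linear in each coordinate, the midpoint rule is exact here, and the cell mass matches q(i, j, k) to floating-point accuracy.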
We can now provide analogous results to those in Section 2. We omit the proofs since they are similar
to those already presented.
Theorem 10 If the optimal objective value of the LP is 0, then the density f defined via (22) has the
required marginals and covariance matrix Σ.
As in Section 2, let Ω ⊂ IR^{d(d−1)/2} denote the set of feasible covariance matrices for the given
marginals. As before, we think of a given covariance matrix as a vector in d(d− 1)/2 dimensional space
in some contexts, and as a d× d matrix in others.
Proposition 11 The set Ω is nonempty, convex and full-dimensional.
We also have the following analogue of Theorems 4 and 7.
Theorem 12 Any chessboard density has a nonsingular covariance matrix. Furthermore, if fn is a
chessboard density with covariance matrix Σ, then Σ is contained in the interior of Ω.
To extend the other results of the previous section, we assume that yi,0 and yi,ni are finite for all i,
i.e., that all of the distribution functions Fi have bounded support. We further assume that all of the
ni’s are equal to n say, and that all subintervals are of equal length. Thus, we will discretize on a regular
grid containing n3 cells.
Suppose that a random vector X with the desired marginals and covariance matrix exists, and let
X̄ denote its discretization. Let q(i, j, k) be the probability that X̄ lies in the cell C(i, j, k). We can
now bound the change in the covariances of X and X̄. But first it is convenient to let ai = yi,0 and
∆i = yi,1 − yi,0, so that ∆i is the width of the cells in the ith coordinate direction, for i = 1, 2, 3. With
this notation, we have that
cov(X̄1, X̄2) − Σ12 = ∑_{i,j,k=1}^n q(i, j, k)[Y1,iY2,j − E[X1X2 | X ∈ C(i, j, k)]]
≤ ∑_{i,j,k=1}^n q(i, j, k)[y1,i y2,j − y1,i−1 y2,j−1]
= ∑_{i,j,k=1}^n q(i, j, k)[(a1 + i∆1)(a2 + j∆2) − (a1 + (i − 1)∆1)(a2 + (j − 1)∆2)]
= ∑_{i,j,k=1}^n q(i, j, k)[∆1(a2 + (j − 1)∆2) + ∆2(a1 + (i − 1)∆1) + ∆1∆2]
≤ ∆1 EX2 + ∆2 EX1 + ∆1∆2.
A similar lower bound can be derived, and so the LP can be augmented by the bounds
Z+_ij, Z−_ij ≤ ∆i EXj + ∆j EXi + ∆i∆j,
for 1 ≤ i < j ≤ 3.
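Specializing these bounds to uniform(0, 1] marginals (ai = 0, ∆i = 1/n, EXi = 1/2) gives 1/n + 1/n^2, which is looser than the bound (20) of Section 2 but of the same order in n. A quick check:

```python
def general_bound(delta_i, delta_j, ex_i, ex_j):
    # Bound on Z+_ij and Z-_ij from the discretization argument above.
    return delta_i * ex_j + delta_j * ex_i + delta_i * delta_j
```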
Observe that as n →∞, these bounds converge to 0. This is the final ingredient required to strengthen
the other results of the previous section to the more general case of distributions with bounded support
and densities. In particular, we now have the following results, which we state without proof because
the proofs are similar to the case of uniform marginals.
Remark 6 The above bounds were derived assuming a regularly spaced grid of cells, so that the cells
were all of identical size. However, similar bounds can be expected to hold when the cells are not of equal
size. Indeed, one should be able to obtain bounds on Z+ij and Z−ij which converge to 0 as long as the
maximum sidelength of the cells converges to 0.
Proposition 13 Suppose that Σ is feasible for the given marginals. If all of the densities fi have
bounded support, then as n → ∞, the optimal objective value of the LP converges to 0. Furthermore,
the set Ω is closed.
Theorem 14 Suppose that all of the densities fi have bounded support. Then ∃n ≥ 1 such that the nth
LP is infeasible if and only if the matrix Σ is infeasible for the given marginals.
Theorem 15 Suppose that all of the densities fi have bounded support. Then ∃n ≥ 1 such that the
optimal objective value of the nth LP is 0 if and only if the matrix Σ is contained in the interior of Ω.
So if all of the densities fi have bounded support and Σ does not lie on the boundary of
Ω, then we have a finite algorithm for determining whether Σ is feasible or not, and, if it is feasible, supplying
an explicit joint density with the required properties. The algorithm is simply to solve a sequence of
LPs for n = 1, 2, . . . until either the LP is infeasible, or has an optimal objective value of 0. If Σ lies
on the boundary of Ω, then we know that it is feasible, and we can approach it arbitrarily closely with
chessboard distributions, but never exactly reach it.
4 Application to NORTA
We now apply the theory developed in Section 2 to explore the performance of the NORTA method in
matching the covariance matrix Σ for a 3-dimensional random vector X = (X1, X2, X3) with uniform
marginals. We will see that the Li and Hammond counterexample exists, obtain some insight into the
class of covariance matrices that cannot be matched using NORTA, and develop a remedy based on
semidefinite programming for such cases.
For notational convenience, we will use the correlation matrix R = 12Σ = (ρij : 1 ≤ i, j ≤ 3)
instead of Σ. The matrix R is determined by ρ12, ρ13 and ρ23. The set Θ of all possible values of
ρ = (ρ12, ρ13, ρ23) that constitute feasible correlation matrices is just a rescaling of the set Ω in Section
2, and is a proper subset of the cube [−1, 1]^3 because a correlation matrix R is constrained to be positive
semidefinite. We examined all symmetric positive semidefinite matrices with off-diagonal components
in the set {−1.0, −0.9, . . . , −0.1, 0, 0.1, . . . , 0.9, 1.0}. There are 4897 such matrices.
These 4897 matrices were further tested to see whether they were NORTA feasible. We define the
matrix R to be NORTA feasible if the covariance matrix Λ found via (5) is positive semidefinite. In
this case, a multivariate normal random vector with covariance matrix Λ will be transformed via the
NORTA method to a multivariate uniform random vector with the required correlation matrix R.
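For uniform marginals the relation (5) between the input normal correlation ρZ and the output correlation ρX has the known closed form ρX = (6/π) arcsin(ρZ/2), so the feasibility test can be sketched directly. The defective instance below is our own illustration (a singular R), not one of the cases reported next:

```python
import math

def norta_input_corr(rho_x):
    # For uniform(0,1] marginals: rho_X = (6/pi) * arcsin(rho_Z / 2),
    # inverted to give the normal correlation NORTA must use.
    return 2 * math.sin(math.pi * rho_x / 6)

def is_psd_3x3(a, b, c, tol=1e-9):
    # A symmetric 3x3 matrix with unit diagonal and off-diagonals a, b, c is
    # positive semidefinite iff every principal minor is nonnegative.
    det = 1 + 2*a*b*c - a*a - b*b - c*c
    return all(m >= -tol for m in (1 - a*a, 1 - b*b, 1 - c*c, det))

def norta_feasible(r12, r13, r23):
    # R is NORTA feasible iff the transformed matrix Lambda is PSD.
    return is_psd_3x3(norta_input_corr(r12),
                      norta_input_corr(r13),
                      norta_input_corr(r23))
```

For example, R with ρ = (0.7, 0.7, −0.02) is positive semidefinite (in fact singular), yet its NORTA counterpart Λ has a negative determinant, so this R is not NORTA feasible.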
A total of 160 sample matrices were identified to be NORTA defective. Note that since X1, X2 and
X3 are identically distributed, many different ρ’s form the same effective correlation matrix for X. For
example, ρ = (0.5, −0.5, 0.5), (−0.5, 0.5, 0.5) and (0.5, 0.5, −0.5) constitute the same joint distribution for X
up to a symmetry. If we eliminate such multiple occurrences, the number of NORTA defective matrices
reduces to 31 cases.
The question that remains to be answered is whether these NORTA defective matrices are feasible for
uniform marginals. We applied our LP method to each NORTA defective case iteratively for increasing
values of n, the level of discretization, to determine whether a chessboard density can be constructed.
The results may be found in Table 1 of Appendix B. For 25 of the 31 cases, chessboard distributions
that exactly match R were constructed with a discretization level n ≤ 18. Larger values of n appeared
to be needed for matrices that were “near singular”, in the sense that their smallest eigenvalue was close
to 0. Chessboard distributions could not exactly match R in the remaining 6 cases, but this is to be
expected from Theorem 4 since in all of these cases, R was singular. However, the optimal objective
value in these 6 cases was approximately 2 × 10^−5 (with n = 80), so that chessboard distributions came
very close.
The Li and Hammond covariance matrix is among those that we were able to exactly match using
chessboard distributions. So we have rigorously established that there are feasible covariance matrices
for a given set of marginals that cannot be matched via the NORTA method.
These results seem to suggest that NORTA defective R matrices are those that are near-singular,
and perhaps are then relatively rare. However, Lurie and Goldberg (1998) believe that singular and
near-singular correlation matrices actually represent a common situation in cost analysis for example.
This is because correlations between cost elements are typically estimated from unbalanced data sets.
This is likely to lead to indefinite target correlation matrices, so that any least adjustment to them is
almost certainly going to result in an adjusted target matrix that is singular, or very nearly so.
It is natural to ask whether the NORTA method can be modified to generate random vectors with
the desired marginals and approximately the right covariance matrix.
Lurie and Goldberg (1998) described a method for identifying a positive semidefinite covariance
matrix ΣZ for use within the NORTA method that yields approximately the desired product-moment
covariance matrix Σ. Their approach involves a complicated nonlinear optimization, and must be
specialized for approximating the rank correlation or product-moment correlation, depending on the case
desired. Furthermore, although they report that their optimization procedure always converges in
practice, they do not have a proof of this result. Finally, their approach appears to be limited to fixed
sample sizes. We present an alternative method based on semidefinite programming that does not
share these limitations. (See Vandenberghe and Boyd 1996 for an accessible introduction to semidefinite
programming.)
Let ΛZ be the symmetric matrix that we wish to use in the NORTA procedure. We do not distinguish
between the cases where ΛZ is chosen to induce a given rank, product-moment, or other correlation in
the output random vector X. If ΛZ is indefinite, then we use a semidefinite program (SDP) to find a
matrix ΣZ that is “close” to ΛZ and is positive semidefinite. The matrix ΣZ is then used within the
NORTA method.
Why is this approach reasonable? In Theorem 2 of Cario and Nelson (1997), it is shown that under a
certain moment condition, the output covariance matrix is a continuous function of the input covariance
matrix ΣZ used in the NORTA procedure. So if ΣZ is “close” to ΛZ , then we can expect the covariance
matrix of the NORTA generated random vectors to be close to the desired matrix Σ. The moment
condition always holds when we are attempting to match rank covariances, and we can expect it to hold
almost invariably when matching product-moment correlations. Therefore, it is eminently reasonable to
try and minimize some measure of distance d(ΛZ , ΣZ) say, between ΛZ and ΣZ .
The SDP falls under the broad class of matrix completion problems; see Alfakih and Wolkowicz
(2000), or Johnson (1990). Given ΛZ as data, and assuming that we are operating in dimension d = 3,
we wish to choose a symmetric matrix ΣZ to
minimize |ΣZ(1, 2) − ΛZ(1, 2)| + |ΣZ(1, 3) − ΛZ(1, 3)| + |ΣZ(2, 3) − ΛZ(2, 3)|

subject to ΣZ ⪰ 0,
ΣZ(i, i) = 1, i = 1, 2, 3,

where the matrix inequality A ⪰ 0 signifies a constraint that the matrix A be positive semidefinite. This
problem is easily formulated as an SDP.
The SDP framework allows us to include preferences on how the search for ΣZ is performed. For
example, we can require that ΣZ(i, j) ≥ ΛZ(i, j), or that the value ΛZ(i, j) change by at most δ > 0.
Efficient algorithms are available for solving semidefinite programs; see Wolkowicz, Saigal and Vandenberghe
(2000).
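The SDP itself requires a solver, but the repair idea can be illustrated self-containedly (this is a heuristic sketch of ours, not the authors' SDP): alternate projections between the positive semidefinite cone and the set of unit-diagonal matrices. This converges to a nearby feasible correlation matrix, though unlike the SDP it does not minimize the distance to ΛZ:

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def jacobi_eig(A, tol=1e-12, max_rot=200):
    # Eigen-decomposition of a small symmetric matrix by Jacobi rotations.
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_rot):
        mx, p, q = max((abs(A[i][j]), i, j)
                       for i in range(n) for j in range(i + 1, n))
        if mx < tol:
            break
        theta = 0.5 * math.atan2(2 * A[p][q], A[p][p] - A[q][q])
        G = [[float(i == j) for j in range(n)] for i in range(n)]
        G[p][p] = G[q][q] = math.cos(theta)
        G[p][q], G[q][p] = -math.sin(theta), math.sin(theta)
        A = matmul(transpose(G), matmul(A, G))
        V = matmul(V, G)
    return [A[i][i] for i in range(n)], V

def project_psd(A):
    # Clip negative eigenvalues to zero: nearest PSD matrix in Frobenius norm.
    evals, V = jacobi_eig(A)
    n = len(A)
    D = [[max(evals[i], 0.0) if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    return matmul(V, matmul(D, transpose(V)))

def repair_correlation(A, iters=200):
    # Alternate between the PSD cone and the unit-diagonal set.
    X = [row[:] for row in A]
    for _ in range(iters):
        X = project_psd(X)
        for i in range(len(X)):
            X[i][i] = 1.0
    return project_psd(X)  # finish on the PSD cone
```

Higham's alternating-projections algorithm adds a Dykstra correction to recover the true nearest correlation matrix; the plain version above only guarantees a feasible nearby matrix.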
We solved SDPs using public domain codes for all of the 31 cases of NORTA defective correlation
matrices identified earlier. In each case, we also computed d(Σ, ΣX), where Σ is the desired covariance
matrix, and ΣX is the covariance matrix that NORTA delivers using the solution from the SDP for ΣZ .
The results may be found in Table 2 of Appendix B. The maximum distance d(Σ, ΣX) observed was
approximately 0.05 in a case where Σ was singular, and when Σ was nonsingular, the maximum distance
was less than 0.02. We conclude that the SDP approach above is very effective, at least in 3 dimensions.
Remark 7 When ΛZ is indefinite, the optimal ΣZ lies on the boundary of the set of symmetric
positive semidefinite matrices ΩZ with diagonal elements equal to 1. Therefore, ΣZ is singular. It does
not immediately follow, though, that the induced covariance matrix ΣX is singular, since the NORTA
transformation alters covariances in a nonlinear fashion.
It is worth noting that ΣX may not be the closest NORTA feasible covariance matrix to Σ, because
the optimization was performed “in Gaussian space”. This is in contrast to the Lurie and Goldberg
(1998) procedure. But the values computed in our experiments seem to suggest that the difference
d(Σ, ΣX) is usually very small.
5 Conclusions and Future Research
In Section 2 we developed a method for determining whether a copula with given covariance matrix
exists or not. The method works for all covariance matrices except those lying on the boundary of the
set of feasible covariance matrices. It would be interesting to see if one can extend these methods to
match boundary covariance matrices as well.
In Section 3, we extended the results of Section 2 to more general marginals. Some of the results
from Section 2 extend immediately, while others appear to require that the support of the marginal
distributions be bounded. We are investigating whether this bounded support hypothesis can be removed.
Using these methods, we have shown that the Li and Hammond example exists, so that the NORTA
method cannot necessarily match all feasible covariance matrices for a given set of marginals. For
such cases we have suggested a modification of the NORTA procedure that enables one to at least
approximately match the desired covariance matrix. The modified procedure is as follows.
1. Identify ΛZ to match some aspect of the desired covariance structure of X in any fashion. If ΛZ
is positive semidefinite, then one can proceed directly with the NORTA procedure.
2. If not, solve an SDP as outlined in Section 4 to identify a matrix ΣZ that is “close” to ΛZ , and
use ΣZ in the NORTA procedure.
The additional work involved in this modification only shows up in the initialization phase of the
NORTA method, and so there is no additional computational overhead while the method is being used
to generate replicates of X. Furthermore, public-domain algorithms are available for solving SDPs, and
these algorithms can handle very large dimensional problems with relative ease. Finally, the method does
not require tailoring to different correlation measures. The only step that depends on the correlation
measure being used is the first step when ΛZ is identified.
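As a sketch, the two-step initialization above might look like the following in Python. The function names are ours, and the eigenvalue-clipping projection in step 2 is only an illustrative stand-in for the semidefinite program of Section 4; it finds a nearby positive semidefinite correlation matrix, not necessarily the SDP-optimal one.

```python
import numpy as np
from scipy.stats import norm

def modified_norta_setup(lam_z, tol=1e-8):
    """Steps 1-2 of the modified NORTA initialization.

    lam_z: candidate correlation matrix for the Gaussian vector Z,
    identified from the desired covariance structure of X.
    Returns a positive semidefinite correlation matrix sigma_z for use
    in the NORTA transformation.

    Stand-in for the SDP of Section 4: project lam_z by clipping
    negative eigenvalues, then rescale to restore a unit diagonal.
    """
    eigvals, eigvecs = np.linalg.eigh(lam_z)
    # Step 1: if lam_z is already positive semidefinite, use it directly.
    if eigvals.min() >= -tol:
        return lam_z
    # Step 2 (illustrative projection, not the paper's SDP):
    clipped = np.clip(eigvals, 0.0, None)
    sigma = eigvecs @ np.diag(clipped) @ eigvecs.T
    d = np.sqrt(np.diag(sigma))
    sigma = sigma / np.outer(d, d)
    np.fill_diagonal(sigma, 1.0)
    return sigma

def norta_sample(sigma_z, inv_cdfs, n, seed=None):
    """Generate n replicates of X by the NORTA transformation:
    Z ~ N(0, sigma_z), X_i = F_i^{-1}(Phi(Z_i))."""
    rng = np.random.default_rng(seed)
    m = len(sigma_z)
    # Small jitter so Cholesky succeeds when sigma_z is singular.
    L = np.linalg.cholesky(sigma_z + 1e-8 * np.eye(m))
    z = rng.standard_normal((n, m)) @ L.T
    u = norm.cdf(z)
    return np.column_stack([f(u[:, i]) for i, f in enumerate(inv_cdfs)])
```

Note that the setup cost is paid once; generating replicates of X afterwards involves only a matrix multiply and the marginal inverse transforms, which is the point made above.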
Returning to chessboard distributions, our numerical results suggest that the discretization level
required to match a covariance matrix increases as the matrix moves towards the boundary of the
feasible set of covariance matrices. This result is perhaps to be expected, since the optimal objective
values of the LP decrease as the discretization is made finer, so long as the discretizations are “nested”
(see the proof of Theorem 9). However, it remains to be seen whether the optimal objective values of
the LP decrease as one moves from discretization level n to n + 1 for n ≥ 2. (Note that the cells of the
(n + 1)th LP are not nested within those of the nth LP as long as n ≥ 2.)
A subject of active research is whether chessboard distributions can be made into a practical method
for random vector generation. The primary bottleneck appears to be in the setup phase when we need
to solve a potentially large LP.
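To illustrate the kind of LP involved, here is a minimal bivariate sketch with uniform(0,1) marginals on an n × n grid; it only checks feasibility of a target product-moment correlation, rather than solving the paper's general chessboard LP, and the function name is ours. Because mass is spread uniformly within each cell, X and Y are conditionally independent given the cell, so E[XY] reduces to a linear function of the cell probabilities.

```python
import numpy as np
from scipy.optimize import linprog

def chessboard_feasible(rho, n):
    """Can a bivariate chessboard distribution with uniform(0,1)
    marginals on an n x n grid attain product-moment correlation rho?

    Cell (i, j) gets probability q_ij, spread uniformly over the cell,
    so E[XY] = sum_ij q_ij * c_i * c_j for cell centers c_i = (i-1/2)/n.
    With Var = 1/12 and mean 1/2, rho corresponds to
    E[XY] = rho/12 + 1/4.
    """
    c = (np.arange(n) + 0.5) / n  # cell centers
    A_eq, b_eq = [], []
    # Row marginals: sum_j q_ij = 1/n for each i.
    for i in range(n):
        row = np.zeros(n * n)
        row[i * n:(i + 1) * n] = 1.0
        A_eq.append(row); b_eq.append(1.0 / n)
    # Column marginals: sum_i q_ij = 1/n for each j.
    for j in range(n):
        col = np.zeros(n * n)
        col[j::n] = 1.0
        A_eq.append(col); b_eq.append(1.0 / n)
    # Correlation constraint.
    A_eq.append(np.outer(c, c).ravel())
    b_eq.append(rho / 12.0 + 0.25)
    res = linprog(np.zeros(n * n), A_eq=np.array(A_eq),
                  b_eq=np.array(b_eq), bounds=[(0, None)] * (n * n))
    return res.status == 0  # 0 = feasible solution found
```

Consistent with the discussion above, in this bivariate uniform case the largest attainable correlation at level n is 1 − 1/n², so rho = 0.99 is infeasible at n = 4 but feasible at n = 16: a finer discretization is needed as the target approaches the boundary.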
A Proof of Lemma 8
Basically, one chooses the xi’s to be the vertices of a simplex centered at x. To be precise, let r > 0 be
a parameter, and set
x1 = ( −a1  −a2  · · ·  −am−1  −am )′ + x
x2 = (  a1  −a2  · · ·  −am−1  −am )′ + x
x3 = (  0   2a2  · · ·  −am−1  −am )′ + x
        ...
xm = (  0    0   · · ·  (m − 1)am−1  −am )′ + x
xm+1 = ( 0   0   · · ·  0  m·am )′ + x,
where
ai = r · √(m/(m + 1)) · √(1/(i(i + 1))),   i = 1, . . . , m.
Then, (Dantzig 1991), the xi’s define the vertices of an equilateral simplex whose center is x, and whose
vertices are a (Euclidean) distance rm/(m + 1) from x. Choose r so that xi ∈ B(x, ε) for all i.
Observe that the average of the xi's is x. In fact, it is easy to show that the (m + 1) × (m + 1) matrix
B consisting of the xi's in columns, supplemented with a row of 1's, is nonsingular. Writing x̄ = (x′, 1)′
for x augmented with a final 1, we then have
y = B^{-1}x̄ = (m + 1)^{-1}(1, 1, . . . , 1)′.
Now observe that B^{-1} is a continuous function of B, at least in a neighbourhood of B, and so y = B^{-1}x̄
is locally a continuous function of x1, . . . , xm+1. Hence, there is a δ > 0 such that if ρ(xi, x′i) < δ for
all i = 1, . . . , m + 1, and D consists of the vectors x′i in columns supplemented with a row of 1's, then
y = D^{-1}x̄ consists of all positive components, and the elements of y sum to 1. □
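The construction can be checked numerically. The sketch below (our own, using the formula for ai above) builds the vertices, and verifies that they form an equilateral simplex centered at x whose vertices lie at distance rm/(m + 1) from x, and that solving By = (x′, 1)′ gives y = (m + 1)^{-1}(1, . . . , 1)′.

```python
import numpy as np

def simplex_vertices(x, r):
    """Vertices of a regular (equilateral) simplex centered at x,
    following the construction in Appendix A (after Dantzig 1991).
    Returns an (m+1) x m array whose rows are x_1, ..., x_{m+1}."""
    m = len(x)
    i = np.arange(1, m + 1)
    a = r * np.sqrt(m / (m + 1)) / np.sqrt(i * (i + 1))
    V = np.zeros((m + 1, m))
    for k in range(m + 1):
        V[k, k:] = -a[k:]               # trailing entries -a_{k+1}, ..., -a_m
        if k > 0:
            V[k, k - 1] = k * a[k - 1]  # entry k * a_k in position k - 1
    return V + x
```

For instance, with m = 4 and r = 0.5 the five vertices average exactly to x, all lie at distance rm/(m + 1) = 0.4 from x, and all pairwise distances agree.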
B Computational Results
The computational results can be found at the Operations Research Home Page <http://or.pubs.informs.org>
in the Online Collection.
Acknowledgments
We would like to thank Marina Epelman for a discussion that was helpful in proving Lemma 8, and
the referees for suggestions that improved the presentation. This work was partially supported by NSF
grant DMI-9984717.
References
Alfakih, A. and H. Wolkowicz. 2000. Matrix completion problems. In Handbook of Semidefinite Pro-
gramming: Theory, Algorithms and Applications. H. Wolkowicz, R. Saigal, L. Vandenberghe, eds,
Kluwer, Boston, 533–545.
Billingsley, P. 1986. Probability and Measure. Wiley, New York.
Cario, M. C. and B. L. Nelson. 1996. Autoregressive to anything: time-series input processes for
simulation. Operations Research Letters 19:51–58.
Cario, M. C. and B. L. Nelson. 1997. Modeling and generating random vectors with arbitrary marginal
distributions and correlation matrix. Technical Report, Department of Industrial Engineering and
Management Sciences, Northwestern University, Evanston, Illinois.
Chen, H. 2001. Initialization for NORTA: generation of random vectors with specified marginals and
correlations. INFORMS Journal on Computing. To appear.
Clemen, R. T., and T. Reilly. 1999. Correlations and copulas for decision and risk analysis. Management
Science. 45:208–224.
Cooke, R. M. 1997. Markov and entropy properties of tree- and vine-dependent variables. Proceedings
of the ASA Section on Bayesian Statistical Science. Alexandria, VA.
Dantzig, G. B. 1991. Converting a converging algorithm into a polynomially bounded algorithm. Techni-
cal report 91-5, Systems Optimization Laboratory, Dept of Operations Research, Stanford University,
Stanford, California.
Devroye, L. 1986. Non-Uniform Random Variate Generation. Springer-Verlag, New York.
Henderson, S. G., B. A. Chiera, and R. M. Cooke. 2000. Generating “dependent” quasi-random numbers.
Proceedings of the 2000 Winter Simulation Conference. J. A. Joines, R. R. Barton, K. Kang, P. A.
Fishwick, eds. IEEE, Piscataway New Jersey. 527–536.
Hill, R. R., and C. H. Reilly. 1994. Composition for multivariate random vectors. In Proceedings of the
1994 Winter Simulation Conference, J. D. Tew, S. Manivannan, D. A. Sadowsky, A. F. Seila, eds.
IEEE, Piscataway New Jersey, 332–339.
Hill, R. R., and C. H. Reilly. 2000. The effects of coefficient correlation structure in two-dimensional
knapsack problems on solution procedure performance. Management Science, 46: 302–317.
Hodgson, T. J., J. A. Joines, S. D. Roberts, K. A. Thoney, J. R. Wilson. 2000. Satisfying due-dates
in large job shops: Characteristics of “real” problems. Technical Report, Department of Industrial
Engineering, North Carolina State University, Raleigh, North Carolina.
Iman, R. and W. Conover. 1982. A distribution-free approach to inducing rank correlation among input
variables. Communications in Statistics: Simulation and Computation 11:311–334.
Johnson, C. R. 1990. Matrix completion problems: a survey. Proceedings of Symposia in Applied
Mathematics 40:171–198.
Johnson, M. E. 1987. Multivariate Statistical Simulation. Wiley, New York.
Kruskal, W. 1958. Ordinal measures of association. Journal of the American Statistical Association 53:814–861.
Law, A. M. and W. D. Kelton. 2000. Simulation Modeling and Analysis, 3rd ed. McGraw Hill, Boston.
Lewis, P. A. W., E. McKenzie, and D. K. Hugus. 1989. Gamma processes. Communications in Statistics:
Stochastic Models 5:1–30.
Li, S. T., and J. L. Hammond. 1975. Generation of pseudorandom numbers with specified univariate
distributions and correlation coefficients. IEEE Transactions on Systems, Man, and Cybernetics.
5:557–561.
Lurie, P. M., and M. S. Goldberg. 1998. An approximate method for sampling correlated random
variables from partially-specified distributions. Management Science. 44:203–218.
Mackenzie, G. R. 1994. Approximately Maximum-Entropy Multivariate Distributions with Specified
Marginals and Pairwise Correlations. Ph.D. thesis. Department of Decision Sciences, University
of Oregon, Eugene OR.
Mardia, K. V. 1970. A translation family of bivariate distributions and Fréchet's bounds. Sankhyā
A32:119–122.
Meeuwissen, A. M. H., and R. M. Cooke. 1994. Tree dependent random variables. Technical report
94-28, Department of Mathematics, Delft University of Technology, Delft, The Netherlands.
Melamed, B., J. R. Hill, and D. Goldsman. 1992. The TES methodology: modeling empirical stationary
time series. In Proceedings of the 1992 Winter Simulation Conference IEEE, Piscataway, New Jersey,
135–144.
Nelsen, R. B. 1999. An Introduction to Copulas. Lecture Notes in Statistics, 139. Springer-Verlag, New
York.
Sklar, A. 1959. Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de
Statistique de l'Université de Paris 8:229–231.
Vandenberghe, L., and S. Boyd. 1996. Semidefinite programming. SIAM Review 38:49–95.
van der Geest, P. A. G. 1998. An algorithm to generate samples of multi-variate distributions with
correlated marginals. Computational Statistics and Data Analysis 27:271–289.
Whitt, W. 1976. Bivariate distributions with given marginals. The Annals of Statistics 4:1280–1289.
Wolkowicz, H., R. Saigal, L. Vandenberghe, eds. 2000. Handbook of Semidefinite Programming: Theory,
Algorithms and Applications. Kluwer, Boston.