
Complex-Valued Random Vectors and Channels: Entropy, Divergence, and Capacity

Georg Tauböck, Member, IEEE

Abstract—Recent research has demonstrated significant achievable performance gains by exploiting circularity/noncircularity or properness/improperness of complex-valued signals. In this paper, we investigate the influence of these properties on important information theoretic quantities such as entropy, divergence, and capacity. We prove two maximum entropy theorems that strengthen previously known results. The proof of the first maximum entropy theorem is based on the so-called circular analog of a given complex-valued random vector. The introduction of the circular analog is additionally supported by a characterization theorem that employs a minimum Kullback–Leibler divergence criterion. In the proof of the second maximum entropy theorem, results about the second-order structure of complex-valued random vectors are exploited. Furthermore, we address the capacity of multiple-input multiple-output (MIMO) channels. Regardless of the specific distribution of the channel parameters (noise vector and channel matrix, if modeled as random), we show that the capacity-achieving input vector is circular for a broad range of MIMO channels (including coherent and noncoherent scenarios). Finally, we investigate the situation of an improper and Gaussian distributed noise vector. We compute both capacity and capacity-achieving input vector and show that improperness increases capacity, provided that the complementary covariance matrix is exploited. Otherwise, a capacity loss occurs, for which we derive an explicit expression.

Index Terms—Capacity, circular, circular analog, differential entropy, improper, Kullback–Leibler divergence, multiple-input multiple-output (MIMO), mutual information, noncircular, proper.

I. INTRODUCTION

COMPLEX-VALUED signals are central in many scientific fields including communications, array processing, acoustics and optics, oceanography and geophysics, machine learning, and biomedicine. In recent research—for a comprehensive overview see [4]—it has been shown that exploiting circularity/properness of complex-valued signals or lack of it (noncircularity/improperness) can significantly enhance

Manuscript received August 09, 2006; revised October 31, 2010; accepted May 25, 2011. Date of publication January 31, 2012; date of current version April 17, 2012. This work was supported by WWTF Grant MOHAWI (MA 44) and WWTF Grant SPORTS (MA 07-004) and by FWF Grant "Statistical Inference" (S10603-N13) within the National Research Network SISE. The material in this paper was presented in part at the 23rd Symposium on Information Theory in the Benelux, Louvain-la-Neuve, Belgium, May 2002; at the 2004 IEEE International Symposium on Information Theory; and at the 2011 IEEE Information Theory Workshop [1]–[3].

The author is with the Institute of Telecommunications, Vienna University of Technology, Gusshausstrasse 25/389, 1040 Vienna, Austria (e-mail: [email protected]).

Communicated by K. Kobayashi, Associate Editor for Shannon Theory.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIT.2012.2184638

the performance of signal processing techniques. More specifically, in the field of communications, it has been observed that important digital modulation schemes including binary phase-shift keying, pulse amplitude modulation, Gaussian minimum shift keying, offset quaternary phase-shift keying, and baseband (but not passband) orthogonal frequency division multiplexing—commonly called discrete multitone (DMT)—produce noncircular/improper complex baseband signals under various circumstances (see, e.g., [5]–[10]). Noncircular/improper baseband communication signals can also arise due to imbalance between their in-phase and quadrature (I/Q) components, and several techniques for compensating for I/Q imbalance have been proposed [11]–[13].

Information theory, on the other hand, addresses fundamental performance limits of communication systems and also has a large impact on many other scientific areas where stochastic models are used. Clearly, it has the potential to study performance limits of signal processing algorithms and communication systems that exploit circularity/properness or noncircularity/improperness. However, there exist only a few information theoretic results addressing these questions [4], [14], and further advancements are desirable. A significant disadvantage of available results is that they often rely on a Gaussian assumption, which does not always hold in practice.

Apparently, the first information theoretic work that addresses complex-valued random vectors analyzes their differential entropy [14]. More specifically, it is shown that the differential entropy of a zero-mean complex-valued random vector with given covariance matrix is upper bounded by the differential entropy of a circular (and, consequently, zero-mean and proper) Gaussian distributed complex-valued random vector with the same covariance matrix. Using this result, capacity results for complex-valued multiple-input multiple-output (MIMO) channels with additive circular/proper Gaussian noise have been derived [15]. In particular, it has been shown that the capacity-achieving input vector is Gaussian distributed and circular/proper.

Let us suppose that we are dealing with a complex-valued random vector which is known to be non-Gaussian. In this situation, the differential entropy upper bound developed in [14] turns out not to be tight. The same holds if the complex-valued random vector is known to be improper. Hence, there are two sources that decrease the differential entropy of a complex-valued random vector: non-Gaussianity and improperness. One goal of this paper is to derive improved/tighter maximum entropy theorems for both cases.

The maximum entropy theorem in [14] associates a circular random vector (i.e., the Gaussian distributed one) to a given


complex-valued random vector. As pointed out, this choice does not always lead to the smallest change in differential entropy. This raises the question of how to associate a circular random vector to an (in general) noncircular complex-valued random vector in a canonical way, without forcing it to be Gaussian distributed. The choice we propose is intuitive and is, furthermore, supported by a characterization theorem that is based on a minimum Kullback–Leibler divergence criterion. It also leads to the desired improved entropy upper bound for the case in which the random vector is known to be non-Gaussian. A study of further properties complements its analysis.

As already mentioned, the maximum entropy theorem in [14] does not yield a tight upper bound if the random vector is known to be improper. Extending our work of [1] and [2], we derive an improved maximum entropy theorem which addresses this situation. As a by-product, we obtain a criterion for a matrix to be a valid complementary covariance matrix (also termed pseudo-covariance matrix [14]). We note that after our initial work [1], [2], the obtained characterization of complementary covariance matrices has been extended in [16] and [17]. Meanwhile, expressions for the differential entropy of improper Gaussian random vectors have appeared in the literature as well [4], [18].

Finally, we apply the obtained maximum entropy theorems to derive novel capacity results for complex-valued MIMO channels with additive noise vectors. Without making use of any Gaussian assumption (in contrast to [15]), we show that capacity is achieved by circular random vectors for a broad range of channels. These results include both the case of a deterministic channel matrix and the case of a random channel matrix, whose realizations are assumed to be either known to the receiver (coherent capacity) or unknown (noncoherent capacity). On the other hand, we investigate the capacity of channels whose noise is noncircular/improper and Gaussian distributed. Such channels have been shown to occur if, e.g., DMT is used as modulation scheme [9], [10]. Note that DMT is currently employed in several xDSL standards [19]. We derive capacity expressions for two cases: 1) we assume that the knowledge of the complementary covariance matrix is taken into account (both at transmitter and receiver); and 2) we assume that it is erroneously believed—i.e., that the transceiver is designed assuming—that the noise has a vanishing complementary covariance matrix, so that the information contained in the complementary covariance matrix is ignored. This results in a decreased capacity, and we calculate the occurring capacity loss.

Notation: The $N \times N$ identity matrix is denoted by $I_N$. We use the superscript $^T$ for transposition and the superscript $^H$ for Hermitian transposition, where the superscript $^*$ stands for complex conjugation. $j$ denotes the imaginary unit, $\Re\{\cdot\}$ and $\Im\{\cdot\}$ are real and imaginary parts, $E\{\cdot\}$ refers to usual expectation, and $\det(\cdot)$ and $\operatorname{tr}(\cdot)$ express determinant and trace of a matrix, respectively. Throughout the paper, $\log(\cdot)$ denotes the logarithm taken with respect to an arbitrary but fixed base. Therefore, all results are valid regardless of the chosen unit for differential entropy (nats or bits).

Outline: The remainder of this paper is organized as follows. In Section II, we introduce our framework and present initial results about the distribution and second-order properties of complex-valued random vectors. Section III deals with the question of how to circularize complex-valued random vectors and analyzes the proposed method. The differential entropy of complex-valued random vectors is addressed in Section IV, and two improved maximum entropy theorems are proved. Finally, in Section V, we present various capacity results for complex-valued MIMO channels.

II. FRAMEWORK AND PRELIMINARY RESULTS

We consider complex-valued random vectors $x \in \mathbb{C}^N$. We assume that $\hat{x} \triangleq [\Re\{x\}^T\ \Im\{x\}^T]^T \in \mathbb{R}^{2N}$, which is defined by stacking of real and imaginary part of $x$, is distributed according to a joint multivariate $2N$-dimensional probability density function (pdf) $f_{\hat{x}}(\hat{\xi})$. More precisely, it is assumed that the measure1 defining the distribution of $\hat{x}$ is absolutely continuous with respect to $\mu^{2N}$, where $\mu^{2N}$ denotes the $2N$-dimensional Lebesgue measure [20]. Accordingly, whenever an integral appears in this paper, integration is meant with respect to the Lebesgue measure of appropriate dimension. Note that when we refer to the distribution of $x$, we mean the distribution of $\hat{x}$ defined by the pdf $f_{\hat{x}}(\hat{\xi})$. Hence, a complex-valued random vector will be called Gaussian distributed if $\hat{x}$ is (multivariate) Gaussian distributed.

Definition 2.1: A complex-valued random vector $x \in \mathbb{C}^N$ is said to be circular, if $e^{j\alpha}x$ has the same distribution as $x$ for all $\alpha \in [0, 2\pi)$; otherwise it is said to be noncircular. The set of all circular complex-valued random vectors $x \in \mathbb{C}^N$, whose distribution is absolutely continuous with respect to $\mu^{2N}$, is denoted by $\mathcal{C}$.

It is well known (see, e.g., [1], [4], [14], [21]) that for a complete second-order characterization of a complex-valued random vector $x \in \mathbb{C}^N$ not only mean vector $m_x \triangleq E\{x\}$ and covariance matrix $C_x \triangleq E\{(x - m_x)(x - m_x)^H\}$ but also the complementary covariance matrix $\tilde{C}_x \triangleq E\{(x - m_x)(x - m_x)^T\}$ is required. Note that both mean vector and complementary covariance matrix of a circular complex-valued random vector are vanishing, provided that its first- and second-order moments exist [4].
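To make the two quantities concrete, the following numpy sketch (an illustration added here, not taken from the paper) estimates covariance and complementary covariance from samples of a deliberately improper vector; the mixing model and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw samples of an improper complex random vector x = A u + B conj(u),
# where u is circular Gaussian; A, B are arbitrary mixing matrices.
N, num_samples = 3, 200_000
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
B = 0.3 * (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
u = (rng.standard_normal((N, num_samples))
     + 1j * rng.standard_normal((N, num_samples))) / np.sqrt(2)
x = A @ u + B @ u.conj()

# Sample estimates of mean, covariance, and complementary covariance
m = x.mean(axis=1, keepdims=True)
xc = x - m
C = (xc @ xc.conj().T) / num_samples   # C_x  = E{(x-m)(x-m)^H}
Ct = (xc @ xc.T) / num_samples         # C~_x = E{(x-m)(x-m)^T}

print(np.allclose(C, C.conj().T, atol=1e-2))   # Hermitian
print(np.allclose(Ct, Ct.T, atol=1e-2))        # symmetric
print(np.linalg.norm(Ct) > 1e-2)               # improper, since B != 0
```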

Definition 2.2: A complex-valued random vector $x \in \mathbb{C}^N$ is said to be proper, if its complementary covariance matrix vanishes, i.e., $\tilde{C}_x = 0$; otherwise, it is said to be improper.

Hence, circularity implies properness (under the assumption of existing first- and second-order moments). Note that a zero-mean and proper Gaussian random vector is circular.

A. Polar and Sheared-Polar Representation

Here, we present some auxiliary results about the distribution of complex-valued random vectors. Let us denote by $g$ the mapping

$g: \mathbb{R}^{2N} \to (\mathbb{R}_{\geq 0})^N \times [0, 2\pi)^N, \quad g(\hat{\xi}) \triangleq (\rho_1, \ldots, \rho_N, \varphi_1, \ldots, \varphi_N) \text{ with } \rho_k e^{j\varphi_k} = \hat{\xi}_k + j\hat{\xi}_{N+k}$

1Here, we refer to the measure defined on the Borel $\sigma$-field on $\mathbb{R}^{2N}$ induced by the measurable function defining the random vector.


where $\mathbb{R}_{\geq 0}$ denotes the set of nonnegative reals. There exists the inverse $g^{-1}$, provided that we set $\varphi_k \triangleq 0$ for $\rho_k = 0$, $k = 1, \ldots, N$. Note that the set by which the domain of $g$ is reduced according to this convention has measure zero with respect to $\mu^{2N}$. In the following, $\hat{x}$ will be called real representation of $x$, whereas $(r, \phi) \triangleq g(\hat{x})$ will be denoted as polar representation of $x$.

Lemma 2.3: Suppose $x \in \mathbb{C}^N$ is a complex-valued random vector, which is distributed according to the pdf $f_{\hat{x}}(\hat{\xi})$. Then, the pdf of its polar representation is given by

$f_{(r,\phi)}(\rho, \varphi) = f_{\hat{x}}(g^{-1}(\rho, \varphi)) \prod_{k=1}^{N} \rho_k$

almost everywhere with respect to $\mu^{2N}$ [20].
Proof: Follows from

$\Pr\{(r,\phi) \in B\} = \int_{g^{-1}(B)} f_{\hat{x}}(\hat{\xi})\, d\hat{\xi} = \int_{B} f_{\hat{x}}(g^{-1}(\rho, \varphi)) \prod_{k=1}^{N} \rho_k \, d(\rho, \varphi)$

for all Lebesgue measurable sets $B$, where the Jacobian determinant of $g^{-1}$ is easily computed as $\prod_{k=1}^{N} \rho_k$ [22].

Observing that the pdf of the polar representation of the random vector $e^{j\alpha}x$ (with $\alpha \in [0, 2\pi)$ being deterministic) satisfies

$f_{(r,\phi)_{e^{j\alpha}x}}(\rho, \varphi) = f_{(r,\phi)}(\rho, (\varphi - \alpha \mathbf{1}) \bmod 2\pi)$

where the notation $(\cdot) \bmod 2\pi$ is shorthand for modulo with respect to the interval $[0, 2\pi)$ and $\mathbf{1} \triangleq (1, \ldots, 1)^T$, we obtain the following corollary.

Corollary 2.4: A complex-valued random vector $x \in \mathbb{C}^N$ is circular if and only if the pdf of its polar representation satisfies

$f_{(r,\phi)}(\rho, \varphi) = f_{(r,\phi)}(\rho, (\varphi - \alpha \mathbf{1}) \bmod 2\pi)$ for all $\alpha \in [0, 2\pi).$

Let us denote by $t$ the mapping

$t(\rho, \varphi) \triangleq (\rho, \vartheta)$ with $\vartheta_k \triangleq (\varphi_k - \varphi_N) \bmod 2\pi$, $k = 1, \ldots, N-1$, and $\vartheta_N \triangleq \varphi_N$

which is one-to-one with inverse $t^{-1}(\rho, \vartheta) = (\rho, \varphi)$ given by $\varphi_k = (\vartheta_k + \vartheta_N) \bmod 2\pi$, $k = 1, \ldots, N-1$, and $\varphi_N = \vartheta_N$. This follows immediately from the identity

$((\beta \bmod 2\pi) + \gamma) \bmod 2\pi = (\beta + \gamma) \bmod 2\pi$  (1)

where $\beta, \gamma \in \mathbb{R}$. In the following, $(r, \theta) \triangleq t(g(\hat{x}))$ will be called sheared-polar representation of $x$.

Lemma 2.5: Suppose $x \in \mathbb{C}^N$ is a complex-valued random vector. Then, the pdfs of its polar representation and its sheared-polar representation are related according to

$f_{(r,\theta)}(\rho, \vartheta) = f_{(r,\phi)}(t^{-1}(\rho, \vartheta)).$

Proof: Observe that the measure defining the distribution of $(r, \theta)$ is absolutely continuous with respect to $\mu^{2N}$, since $\mu^{2N}(B) = 0$ implies $\mu^{2N}(t^{-1}(B)) = 0$ for all Lebesgue measurable sets $B$, as can be seen by distinction of cases according to the modulo-$2\pi$ operation. Since $t^{-1}$ is not continuous on its whole domain, we define the auxiliary mapping

$\tilde{t}^{-1}(\rho, \vartheta) \triangleq (\rho, \vartheta_1 + \vartheta_N, \ldots, \vartheta_{N-1} + \vartheta_N, \vartheta_N)$

which is one-to-one with inverse $\tilde{t}$ and in which the phase sums are not reduced modulo $2\pi$. Its Jacobian determinant is identically $1$. Suppose $B$ is any set of the form $B = B_\rho \times B_\vartheta$. From (1), it follows that there exists a finite partition of $B$, i.e., $B = \bigcup_i B_i$ and $B_i \cap B_{i'} = \emptyset$ for $i \neq i'$, such that, on each $B_i$, $t^{-1}$ coincides with $\tilde{t}^{-1}$ up to a translation of the phase variables by multiples of $2\pi$. Note that the partition is caused by the modulo-$2\pi$ operation used in the


definition of $t^{-1}$ and corresponds to the required distinction of cases when investigating $t^{-1}(B)$. Therefore

$\Pr\{(r,\theta) \in B\} = \sum_i \Pr\{(r,\phi) \in t^{-1}(B_i)\} \overset{(a)}{=} \sum_i \int_{B_i} f_{(r,\phi)}(t^{-1}(\rho, \vartheta))\, d(\rho, \vartheta) = \int_B f_{(r,\phi)}(t^{-1}(\rho, \vartheta))\, d(\rho, \vartheta)$

where $(a)$ follows from (1) and the fact that the Jacobian determinant of $\tilde{t}^{-1}$ is identically $1$. This implies the statement (see, e.g., [20]).

Combining Corollary 2.4 and Lemma 2.5, while applying (1), yields the following important corollary.

Corollary 2.6: A complex-valued random vector $x \in \mathbb{C}^N$ is circular if and only if the pdf of its sheared-polar representation does not depend on $\vartheta_N$, i.e., if and only if

$f_{(r,\theta)}(\rho, \vartheta) = \frac{1}{2\pi} \int_0^{2\pi} f_{(r,\theta)}(\rho, \vartheta_1, \ldots, \vartheta_{N-1}, \alpha)\, d\alpha.$

B. Second-Order Properties

In the following, we establish some results about covariance and complementary covariance matrices of complex-valued random vectors. For a given complex-valued matrix $M \in \mathbb{C}^{N \times N}$, let us denote by $\mathcal{R}(M)$ and $\mathcal{S}(M)$ the real-valued matrices

$\mathcal{R}(M) \triangleq \begin{pmatrix} \Re\{M\} & -\Im\{M\} \\ \Im\{M\} & \Re\{M\} \end{pmatrix}$  (2a)

$\mathcal{S}(M) \triangleq \begin{pmatrix} \Re\{M\} & \Im\{M\} \\ \Im\{M\} & -\Re\{M\} \end{pmatrix}$  (2b)

respectively. This notation allows a simple expression of the covariance matrix of the real representation $\hat{x}$ of a complex-valued random vector $x$ in terms of covariance matrix $C_x$ and complementary covariance matrix $\tilde{C}_x$ as [1], [4], [14], [21]

$C_{\hat{x}} = \frac{1}{2}\left(\mathcal{R}(C_x) + \mathcal{S}(\tilde{C}_x)\right).$  (3)
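Relation (3) is easy to verify numerically; the following sketch (illustrative, using the conventions of (2) as reconstructed above) builds the real-representation covariance from $C_x$ and $\tilde{C}_x$ and checks it against sample moments, for which the identity holds exactly.

```python
import numpy as np

def real_rep_cov(C, Ct):
    """Covariance of the real representation [Re(x); Im(x)] built from
    covariance C and complementary covariance Ct, cf. (2) and (3)."""
    R = np.block([[C.real, -C.imag], [C.imag, C.real]])      # mapping (2a)
    S = np.block([[Ct.real, Ct.imag], [Ct.imag, -Ct.real]])  # mapping (2b)
    return 0.5 * (R + S)

# self-check against sample second moments of an improper vector
rng = np.random.default_rng(1)
n, m = 2, 10_000
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
u = (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)
x = A @ u + 0.5 * (A @ u).conj()
xh = np.vstack([x.real, x.imag])
C, Ct = x @ x.conj().T / m, x @ x.T / m
print(np.allclose(xh @ xh.T / m, real_rep_cov(C, Ct)))  # True
```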

Furthermore, $\mathcal{R}(\cdot)$, $\mathcal{S}(\cdot)$, and their interplay satisfy remarkable algebraic properties, as stated by the next lemma.

Lemma 2.7: For $M, M_1, M_2 \in \mathbb{C}^{N \times N}$ and unitary $U \in \mathbb{C}^{N \times N}$,

$\mathcal{R}(M_1 M_2) = \mathcal{R}(M_1)\, \mathcal{R}(M_2)$  (4a)

$\mathcal{R}(M_1)\, \mathcal{S}(M_2) = \mathcal{S}(M_1 M_2) = \mathcal{S}(M_1)\, \mathcal{R}(M_2^*)$  (4b)

$\mathcal{R}(M)^T = \mathcal{R}(M^H), \quad \mathcal{S}(M)^T = \mathcal{S}(M^T)$  (4c)

$\mathcal{R}(U) \text{ is orthonormal}$  (4d)

$\det \mathcal{R}(M) = |\det M|^2.$  (4e)

Proof: For some of the statements, see also [15]. Direct calculations yield (4a) and (4b). Equation (4c) follows from the definition of $\mathcal{R}(\cdot)$ and $\mathcal{S}(\cdot)$. A combination of (4a) and (4c), while observing that $U^H U = I_N$, yields (4d). Finally, for (4e)

$\det \mathcal{R}(M) = \det \begin{pmatrix} M & 0 \\ \Im\{M\} & M^* \end{pmatrix} = \det(M)\, \det(M^*) = |\det M|^2$

where the first identity is obtained by a similarity transformation that leaves the determinant unchanged.

We are especially interested in the eigenvalues of $\mathcal{S}(\tilde{C}_x)$. We will show that they are essentially given by the singular values of $\tilde{C}_x$. Note that the singular value decomposition (SVD) [23] of a matrix $M$ factorizes $M$ into three matrices, i.e., $M = U \Sigma V^H$. It is well defined for all rectangular complex matrices and yields unitary matrices $U$ and $V$ and a diagonal matrix $\Sigma$, i.e., with nonnegative entries on its main diagonal—the singular values. $U$ and $V$ can be chosen such that the singular values are ordered in decreasing order. In case the matrix $M$ is symmetric (not Hermitian), i.e., $M = M^T$, there is a special SVD known as Takagi factorization [24]. It is given by the factorization

$M = Q \Sigma Q^T$  (5)

where the columns of the unitary matrix $Q$ are the orthonormal eigenvectors of $M M^*$ and the diagonal matrix $\Sigma$ has the singular values of $M$ on its main diagonal.

Proposition 2.8: Suppose $x \in \mathbb{C}^N$ is a complex-valued random vector with complementary covariance matrix $\tilde{C}_x$. Then, there exist a unitary matrix $Q$ and a diagonal matrix $\Sigma$ with nonnegative entries, such that

$\mathcal{S}(\tilde{C}_x) = \mathcal{R}(Q) \begin{pmatrix} \Sigma & 0 \\ 0 & -\Sigma \end{pmatrix} \mathcal{R}(Q)^T$

represents the eigenvalue decomposition of $\mathcal{S}(\tilde{C}_x)$. The diagonal entries of $\Sigma$ are the singular values of $\tilde{C}_x$. In particular, the eigenvalues of $\mathcal{S}(\tilde{C}_x)$ are the singular values of $\tilde{C}_x$ together with their negatives.

Proof: Consider the Takagi factorization (5) of the symmetric $\tilde{C}_x = Q \Sigma Q^T$ and apply Lemma 2.7, i.e.,

$\mathcal{S}(\tilde{C}_x) = \mathcal{R}(Q)\, \mathcal{S}(\Sigma Q^T) = \mathcal{R}(Q)\, \mathcal{S}(\Sigma)\, \mathcal{R}(Q)^T = \mathcal{R}(Q) \begin{pmatrix} \Sigma & 0 \\ 0 & -\Sigma \end{pmatrix} \mathcal{R}(Q)^T$


which represents the eigenvalue decomposition of $\mathcal{S}(\tilde{C}_x)$, since $\mathcal{R}(Q)$ is orthonormal according to (4d).
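Proposition 2.8 can be checked numerically. The sketch below computes a Takagi factorization from an ordinary SVD—one standard construction, not the paper's—assuming distinct singular values, and verifies that the eigenvalues of the matrix in (2b) are the singular values together with their negatives.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 3
M = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Ct = M + M.T   # a complex symmetric matrix (e.g., a complementary covariance)

# Takagi factorization Ct = Q diag(s) Q^T via the SVD: if Ct = U S V^H is
# symmetric with distinct singular values, D = U^H conj(V) is diagonal
# unitary and Q = U D^{1/2} works.
U, s, Vh = np.linalg.svd(Ct)
D = U.conj().T @ Vh.T
Q = U @ np.diag(np.exp(0.5j * np.angle(np.diag(D))))
print(np.allclose(Q @ np.diag(s) @ Q.T, Ct))           # Takagi holds

# Eigenvalues of the real-valued matrix S(Ct) from (2b) are +/- s_i
S = np.block([[Ct.real, Ct.imag], [Ct.imag, -Ct.real]])
eig = np.sort(np.linalg.eigvalsh(S))
print(np.allclose(eig, np.sort(np.concatenate([-s, s]))))
```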

An interesting question that is directly related to complex-valued random vectors is the characterization of the set of complementary covariance matrices. For covariance matrices such a characterization is well known, i.e., a matrix is a valid covariance matrix, which means that there exists a random vector with this covariance matrix, if and only if it is Hermitian and nonnegative definite. In order to obtain an analogous result for complementary covariance matrices, we introduce the following notion.2

Definition 2.9: A matrix $L \in \mathbb{C}^{N \times N}$ is said to be generalized Cholesky factor of a positive definite Hermitian matrix $C \in \mathbb{C}^{N \times N}$, if it satisfies $C = L L^H$.

Since $|\det L|^2 = \det C \neq 0$, a generalized Cholesky factor is always a nonsingular matrix. Note that the conventional Cholesky decomposition (cf., [23]), $C = L L^H$, where $L$ is lower-triangular, yields a generalized Cholesky factor $L$. But there are also other ways of constructing a generalized Cholesky factor. Let $C = U \Lambda U^H$ be the eigenvalue decomposition of $C$. For any matrix $\Lambda^{1/2}$, which satisfies $\Lambda^{1/2}(\Lambda^{1/2})^H = \Lambda$, $U \Lambda^{1/2}$ is a generalized Cholesky factor. Hence, a generalized Cholesky factor is not uniquely defined. However, we have the following characterization.

Proposition 2.10: Suppose $L$ is a generalized Cholesky factor of $C$. Then, for any unitary matrix $V$, $LV$ is also a generalized Cholesky factor. Conversely, if $L_1$ and $L_2$ are generalized Cholesky factors, then, there exists a unitary matrix $V$, such that $L_2 = L_1 V$.

Proof: For nonsingular $L_1$ and $L_2$, we have

$(L_1^{-1} L_2)(L_1^{-1} L_2)^H = L_1^{-1} (L_2 L_2^H) L_1^{-H} = L_1^{-1} C L_1^{-H} = I_N$

which implies both statements.

The next theorem presents the promised criterion for a matrix to be a complementary covariance matrix. More precisely, it is a criterion in terms of both covariance matrix and complementary covariance matrix. We will call $(C, \tilde{C})$ a valid pair of covariance matrix and complementary covariance matrix, if there exists a complex-valued random vector with covariance matrix $C$ and complementary covariance matrix $\tilde{C}$.

Theorem 2.11: Suppose $C \in \mathbb{C}^{N \times N}$ is nonsingular and $\tilde{C} \in \mathbb{C}^{N \times N}$. Then, $(C, \tilde{C})$ is a valid pair of covariance matrix and complementary covariance matrix if and only if $C$ is Hermitian and nonnegative definite, $\tilde{C}$ is symmetric, and the singular values of $L^{-1} \tilde{C} L^{-T}$ are smaller or equal to 1, where $L$ denotes an arbitrary generalized Cholesky factor of $C$.

2Cf. the relation to the Karhunen–Loève transform [25], [26], also known as Hotelling transform [27], and the Mahalanobis transform, e.g., [28] and references therein.

Proof: The requirements that $C$ is Hermitian and nonnegative definite as well as that $\tilde{C}$ is symmetric are obvious. Furthermore, observe that, by Proposition 2.10, the singular values of $L^{-1} \tilde{C} L^{-T}$ do not depend on the choice of the generalized Cholesky factor $L$.

Suppose we are given a complex-valued random vector $x$ with covariance matrix $C_x = C$ and complementary covariance matrix $\tilde{C}_x = \tilde{C}$. Consider the random vector $w \triangleq L^{-1} x$. Clearly, $C_w = I_N$ and $\tilde{C}_w = L^{-1} \tilde{C} L^{-T}$. From (3), i.e., $C_{\hat{w}} = \frac{1}{2}(I_{2N} + \mathcal{S}(\tilde{C}_w))$, and the fact that $C_{\hat{w}}$ is nonnegative definite, we conclude with Proposition 2.8, that the singular values of $\tilde{C}_w$ are smaller or equal to 1.

Conversely, consider a complex-valued random vector $w$, e.g., a Gaussian distributed one, defined by the covariance matrix of its real representation as $C_{\hat{w}} \triangleq \frac{1}{2}(I_{2N} + \mathcal{S}(L^{-1} \tilde{C} L^{-T}))$. According to Proposition 2.8, such a random vector exists, since $C_{\hat{w}}$ is Hermitian and nonnegative definite provided that the singular values of $L^{-1} \tilde{C} L^{-T}$ are smaller or equal to 1. It has covariance matrix $C_w = I_N$ and complementary covariance matrix $\tilde{C}_w = L^{-1} \tilde{C} L^{-T}$, cf., (3). Then, the random vector $x \triangleq L w$ has covariance matrix $C$ and complementary covariance matrix $\tilde{C}$.
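Theorem 2.11 translates directly into a numerical validity test; a minimal sketch (illustrative names, assuming $C$ is positive definite so that the Cholesky factorization exists):

```python
import numpy as np

def is_valid_pair(C, Ct, tol=1e-10):
    """Check Theorem 2.11: C Hermitian nonnegative definite, Ct symmetric,
    and the singular values of L^{-1} Ct L^{-T} at most 1, with C = L L^H."""
    if not np.allclose(C, C.conj().T, atol=tol):
        return False
    if np.min(np.linalg.eigvalsh(C)) < -tol:
        return False
    if not np.allclose(Ct, Ct.T, atol=tol):
        return False
    L = np.linalg.cholesky(C)            # generalized Cholesky factor
    Linv = np.linalg.inv(L)
    K = Linv @ Ct @ Linv.T
    return np.linalg.svd(K, compute_uv=False).max() <= 1 + tol

# scaling Ct up eventually violates the singular-value condition
C = np.array([[2.0, 0.5], [0.5, 1.0]], dtype=complex)
Ct = np.array([[0.4, 0.1], [0.1, 0.2]], dtype=complex)
print(is_valid_pair(C, Ct), is_valid_pair(C, 10 * Ct))   # True False
```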

Remarks: Apparently, the importance of the singular values of $L^{-1} \tilde{C}_x L^{-T}$ in the context of complex-valued random vectors was first observed in [1] (for the above criterion and a generalized maximum entropy theorem) and independently in [29], where they were introduced as canonical coordinates [17], [27], [30]–[32] between a complex-valued random vector and its complex conjugate. Note that in [29], the matrix $L^{-1} \tilde{C}_x L^{-T}$ is called coherence matrix between a random vector and its complex conjugate. Interestingly, the approach of Schreier and Scharf [29] differs from the approach taken here (and taken in [1]) in that [29] employs a complex-valued augmented algebra to study second-order properties of complex-valued random vectors, whereas (2) introduces a real-valued representation into real and imaginary parts. Later, in [18], the singular values were also termed circularity coefficients, and the whole set of singular values was referred to as circularity spectrum. We also note that the condition of Theorem 2.11 on the singular values can be equivalently expressed in terms of the Euclidean operator norm as $\|L^{-1} \tilde{C} L^{-T}\|_2 \leq 1$.

III. CIRCULAR ANALOG OF A COMPLEX-VALUED RANDOM VECTOR

In this section, we consider the following problem: suppose we are given a complex-valued random vector, which is noncircular. Can we find a random vector, which is as "similar" as possible to the original random vector but circular instead? Obviously, this depends on what is meant by "similar" and is, therefore, mainly a matter of definition. However, if we can show useful properties and/or theorems with this circularized random vector, its introduction is reasonable. Our approach for associating a circular random vector to a (possibly) noncircular one is motivated by the well-known method used for stationarizing a cyclostationary random process [33].

Definition 3.1: Suppose $x \in \mathbb{C}^N$ is a complex-valued random vector. Then, the random vector $x_c \triangleq e^{j\Theta} x$, where $\Theta \in [0, 2\pi)$ is a uniformly distributed random variable independent of $x$, is said to be circular analog of $x$.
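Definition 3.1 suggests a direct sampling procedure. The sketch below (illustrative) also previews Theorem 3.5: the covariance is preserved, while the complementary covariance is averaged out.

```python
import numpy as np

rng = np.random.default_rng(3)
N, m = 2, 200_000

# improper input: x = u + 0.6 conj(u) with u circular Gaussian
u = (rng.standard_normal((N, m)) + 1j * rng.standard_normal((N, m))) / np.sqrt(2)
x = u + 0.6 * u.conj()

# circular analog: x_c = exp(j Theta) x, Theta ~ Uniform[0, 2pi) indep. of x
theta = rng.uniform(0.0, 2.0 * np.pi, size=m)
x_c = np.exp(1j * theta) * x

C_x   = x @ x.conj().T / m
C_xc  = x_c @ x_c.conj().T / m
Ct_xc = x_c @ x_c.T / m

print(np.allclose(C_x, C_xc, atol=1e-2))   # covariance preserved
print(np.linalg.norm(Ct_xc) < 1e-2)        # complementary covariance vanishes
```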

In the following, we will show that the circular analog is indeed a circular random vector. The next proposition expresses the distribution of $x_c$ in terms of the distribution of $x$ (for both polar and sheared-polar representations).

Proposition 3.2: Suppose $x \in \mathbb{C}^N$ is a complex-valued random vector. Then, the pdfs of the polar representations and sheared-polar representations of $x$ and its circular analog $x_c$ are related according to

$f_{(r,\phi)_{x_c}}(\rho, \varphi) = \frac{1}{2\pi} \int_0^{2\pi} f_{(r,\phi)}(\rho, (\varphi - \alpha \mathbf{1}) \bmod 2\pi)\, d\alpha$  (6a)

$f_{(r,\theta)_{x_c}}(\rho, \vartheta) = \frac{1}{2\pi} \int_0^{2\pi} f_{(r,\theta)}(\rho, \vartheta_1, \ldots, \vartheta_{N-1}, \alpha)\, d\alpha$  (6b)

respectively.

Proof: For (6a), consider the joint pdf of $(r, \phi)_{x_c}$ and $\Theta$, i.e.,

$f_{(r,\phi)_{x_c}, \Theta}(\rho, \varphi, \alpha) = \frac{1}{2\pi} f_{(r,\phi)}(\rho, (\varphi - \alpha \mathbf{1}) \bmod 2\pi)$

and marginalize with respect to $\alpha$. Equation (6b) follows from (6a) using Lemma 2.5 and identity (1).

Observe that (6b) does not depend on $\vartheta_N$, so that Corollary 2.6 implies circularity of $x_c$.

A. Divergence Characterization

Here, we present a (nontrivial) characterization of the circular analog of a complex-valued random vector that further supports the chosen definition. It is based on the Kullback–Leibler divergence (or relative entropy) [34], [35], which can be regarded as a distance measure between two probability measures. For complex-valued random vectors, whose real representations are distributed according to multivariate pdfs, the Kullback–Leibler divergence between $x_1 \in \mathbb{C}^N$ and $x_2 \in \mathbb{C}^N$ is defined as

$D(x_1 \| x_2) \triangleq \int_{\mathbb{R}^{2N}} f_{\hat{x}_1}(\hat{\xi}) \log \frac{f_{\hat{x}_1}(\hat{\xi})}{f_{\hat{x}_2}(\hat{\xi})}\, d\hat{\xi}$

where we set $0 \log 0 \triangleq 0$ and $0 \log \frac{0}{0} \triangleq 0$ (motivated by continuity). Here, $D(x_1 \| x_2)$ is finite only if the support set of $f_{\hat{x}_1}$ is contained in the support set of $f_{\hat{x}_2}$. Note that $D(x_1 \| x_2) \geq 0$ with $D(x_1 \| x_2) = 0$ if and only if $f_{\hat{x}_1} = f_{\hat{x}_2}$ almost everywhere [34]. The next lemma shows that $D(x_1 \| x_2)$ can be equivalently expressed in terms of polar and sheared-polar representations.

Lemma 3.3: Suppose $x_1 \in \mathbb{C}^N$ and $x_2 \in \mathbb{C}^N$ are complex-valued random vectors. Then, the Kullback–Leibler divergence can be computed from the respective polar and sheared-polar representations of $x_1$ and $x_2$ according to

$D(x_1 \| x_2) = \int f_{(r,\phi)_1}(\rho, \varphi) \log \frac{f_{(r,\phi)_1}(\rho, \varphi)}{f_{(r,\phi)_2}(\rho, \varphi)}\, d(\rho, \varphi) = \int f_{(r,\theta)_1}(\rho, \vartheta) \log \frac{f_{(r,\theta)_1}(\rho, \vartheta)}{f_{(r,\theta)_2}(\rho, \vartheta)}\, d(\rho, \vartheta).$

Proof: With the change of variables $\hat{\xi} = g^{-1}(\rho, \varphi)$, the Jacobian factors $\prod_k \rho_k$ from Lemma 2.3 cancel inside the logarithm, which yields the first identity. Furthermore, using the mapping $\tilde{t}^{-1}$ and the appropriate partition of the integration domain (cf., the proof of Lemma 2.5), the second identity follows, where Lemma 2.5 has been used.
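For Gaussian real representations, the divergence just defined has the familiar closed form; a small sketch (assuming zero means and nonsingular covariances):

```python
import numpy as np

def kl_gauss(C1, C2):
    """KL divergence D(N(0, C1) || N(0, C2)) for real covariance matrices,
    e.g., the 2N x 2N covariances of two real representations; in nats."""
    d = C1.shape[0]
    C2_inv_C1 = np.linalg.solve(C2, C1)
    return 0.5 * (np.trace(C2_inv_C1) - d
                  + np.log(np.linalg.det(C2) / np.linalg.det(C1)))

# divergence of an improper Gaussian from its proper counterpart
C1 = 0.5 * np.array([[1.0, 0.6], [0.6, 1.0]])   # correlated Re/Im (improper)
C2 = 0.5 * np.eye(2)                            # circular counterpart
print(kl_gauss(C1, C2))
```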

We intend to derive a theorem, which states that the circular analog has a smaller "distance" from the given complex-valued random vector than any other circular random vector. To that end, consider the sheared-polar representation of $x$, i.e., $(r, \theta)$, and form the "reduced" vector $u$ by only taking the first $2N - 1$ elements of $(r, \theta)$. Clearly, its pdf is given by marginalization, i.e.,

$f_u(\rho, \vartheta_1, \ldots, \vartheta_{N-1}) = \int_0^{2\pi} f_{(r,\theta)}(\rho, \vartheta)\, d\vartheta_N.$

Furthermore, let $\mathcal{U}$ denote the support set of $f_u$.


Note that $f_u(\rho, \vartheta_1, \ldots, \vartheta_{N-1}) > 0$ is equivalent to $(\rho, \vartheta_1, \ldots, \vartheta_{N-1}) \in \mathcal{U}$ (for fixed $\vartheta_N$). We have

$f_{(r,\theta)}(\rho, \vartheta) = f_u(\rho, \vartheta_1, \ldots, \vartheta_{N-1})\, f_{v|u}(\vartheta_N \,|\, \rho, \vartheta_1, \ldots, \vartheta_{N-1})$  (7)

where $v \triangleq \theta_N$ denotes the last element of $(r, \theta)$.

Theorem 3.4: Suppose $x \in \mathbb{C}^N$ is a complex-valued random vector. Then, a circular random vector $x' \in \mathcal{C}$ is the circular analog of $x$, i.e., $x' = x_c$ (in distribution), if and only if it minimizes the Kullback–Leibler divergence to $x$ within the whole set of circular random vectors, i.e., if and only if

$D(x \| x') = \inf_{x'' \in \mathcal{C}} D(x \| x'').$

Furthermore

$\inf_{x'' \in \mathcal{C}} D(x \| x'') = \log(2\pi) - h(v | u)$

where $h(v|u)$ denotes the conditional differential entropy of $v$ given $u$, cf., [34] and Definition 4.2, with $u$ and $v$ according to (7).

Proof: Suppose $x'' \in \mathcal{C}$ and consider its sheared-polar representation. Due to the circularity of $x''$, $f_{(r,\theta)''}(\rho, \vartheta) = \frac{1}{2\pi} f_{u''}(\rho, \vartheta_1, \ldots, \vartheta_{N-1})$, and, according to Lemma 3.3

$D(x \| x'') \overset{(a)}{=} \log(2\pi) - h(v|u) + D(u \| u'')$

where $u''$ is the corresponding "reduced" vector of $(r, \theta)''$. For the validity of $(a)$, we also refer to [35, Th. D.13]. It follows that

$D(x \| x'') \geq \log(2\pi) - h(v|u)$

and the infimum is achieved for $f_{u''} = f_u$. Since $f_{u_c} = f_u$ for the circular analog $x_c$ of $x$, cf., (6b), and since $D(u \| u'') = 0$ if and only if $f_{u''} = f_u$ almost everywhere, the infimum is achieved if and only if $f_{(r,\theta)''} = f_{(r,\theta)_{x_c}}$, i.e., $x'' = x_c$ (in distribution).

B. Complex-Valued Random Vectors With Finite Second-Order Moments

In this section, we establish important properties of the circular analog $x_c$ of a complex-valued random vector $x$, whose second-order moments exist. Clearly, both mean vector and complementary covariance matrix of $x_c$ are vanishing. For the covariance matrix, we have the following result.

Theorem 3.5: Suppose $x \in \mathbb{C}^N$ is a zero-mean complex-valued random vector with finite second-order moments. Then, the covariance matrix of the circular analog $x_c$ equals the covariance matrix of $x$, i.e., $C_{x_c} = C_x$.

Proof: For the correlation between the $k$th and $l$th entry of $x_c$

$E\{x_{c,k}\, x_{c,l}^*\} = E\{e^{j\Theta} x_k\, e^{-j\Theta} x_l^*\} = E\{x_k x_l^*\}$

where $\Theta$ denotes the uniformly distributed random variable used for the definition of $x_c$ (see Definition 3.1) and the interchange of expectations follows from Fubini's Theorem [22].

The following theorem states that the circular analog of an improper Gaussian distributed random vector is non-Gaussian.

Theorem 3.6: Suppose $x \in \mathbb{C}^N$ is a zero-mean, complex-valued, and Gaussian distributed random vector with $\sigma_{\max}(L^{-1} \tilde{C}_x L^{-T}) < 1$ such that its circular analog $x_c$ is Gaussian distributed. Here, $L$ denotes a generalized Cholesky factor of the nonsingular covariance matrix $C_x$ of $x$ and $\tilde{C}_x$ denotes the complementary covariance matrix of $x$. Then, $x$ is proper.

Proof: We first prove the theorem for the special case $C_x = I_N$ and $\tilde{C}_x = K$, where $K$ denotes a diagonal matrix with nonnegative diagonal entries $k_i$. For fixed (deterministic) $\alpha \in [0, 2\pi)$, consider the random vector $z \triangleq e^{j\alpha} x$, which has covariance matrix $C_z = I_N$ and complementary covariance matrix $\tilde{C}_z = e^{j2\alpha} K$. According to (3), the covariance matrix of its real representation is given by

$C_{\hat{z}} = \frac{1}{2}\left(I_{2N} + \mathcal{S}(e^{j2\alpha} K)\right)$


whose determinant is easily computed as

$\det C_{\hat{z}} = \frac{1}{4^N} \prod_{i=1}^{N} (1 - k_i^2).$

Furthermore, its inverse is calculated as

$C_{\hat{z}}^{-1} = 2\left(\mathcal{R}\left((I_N - K^2)^{-1}\right) - \mathcal{S}\left(e^{j2\alpha} \bar{K}\right)\right)$

where $\bar{K} \triangleq K (I_N - K^2)^{-1}$. Therefore, the pdf of $\hat{z}$ is given by

$f_{\hat{z}}(\hat{\xi}) = \frac{1}{\pi^N \prod_{i=1}^N (1 - k_i^2)^{1/2}} \exp\left(-\hat{\xi}^T \left(\mathcal{R}\left((I_N - K^2)^{-1}\right) - \mathcal{S}\left(e^{j2\alpha} \bar{K}\right)\right) \hat{\xi}\right).$

Since $f_{\hat{x}_c}(\hat{\xi}) = \frac{1}{2\pi} \int_0^{2\pi} f_{\hat{z}}(\hat{\xi})\, d\alpha$, we obtain

$f_{\hat{x}_c}(\hat{\xi}) = \frac{1}{\pi^N \prod_{i=1}^N (1 - k_i^2)^{1/2}} \exp\left(-\hat{\xi}^T \mathcal{R}\left((I_N - K^2)^{-1}\right) \hat{\xi}\right) I_0\left(\left|\sum_{i=1}^N \bar{k}_i \rho_i^2 e^{j2\varphi_i}\right|\right)$  (8)

where $I_0(z) \triangleq \frac{1}{2\pi} \int_0^{2\pi} e^{z \cos \alpha}\, d\alpha$ is the modified Bessel function of the first kind of order zero [36]. Here, we have used the identity

$\hat{\xi}^T \mathcal{S}\left(e^{j2\alpha} \bar{K}\right) \hat{\xi} = \sum_{i=1}^N \bar{k}_i \rho_i^2 \cos(2\alpha - 2\varphi_i)$

where $(\rho_i, \varphi_i)$ denotes the polar coordinates of the complex number $\hat{\xi}_i + j\hat{\xi}_{N+i}$. According to the assumptions of the theorem and to Theorem 3.5, $x_c$ is Gaussian distributed with covariance matrix $I_N$ and vanishing mean vector and complementary covariance matrix. Hence, (8) implies $K = 0$ and, therefore, the statement.

For the general case, apply the Takagi factorization (5) to $L^{-1} \tilde{C}_x L^{-T}$, i.e., $L^{-1} \tilde{C}_x L^{-T} = Q \Sigma Q^T$, and consider the random vector $w \triangleq Q^H L^{-1} x$. Clearly, $C_w = I_N$ and $\tilde{C}_w = \Sigma$, and both $w$ and $w_c$ are Gaussian distributed. From the special case, $\Sigma = 0$, and, in turn, $\tilde{C}_x = 0$.

IV. DIFFERENTIAL ENTROPY OF COMPLEX-VALUED RANDOM VECTORS

As outlined in Section I, we are interested in bounds on the differential entropy of complex-valued random vectors. We start with a series of definitions, which are required for the further development of the paper. Again, we make use of the conventions $0 \log 0 \triangleq 0$ and $0 \log \frac{0}{0} \triangleq 0$.

Definition 4.1: The differential entropy $h(x)$ of a complex-valued random vector $x \in \mathbb{C}^N$ is defined as the differential entropy of its real representation $\hat{x}$, i.e.,

$h(x) \triangleq h(\hat{x}) = -\int_{\mathbb{R}^{2N}} f_{\hat{x}}(\hat{\xi}) \log f_{\hat{x}}(\hat{\xi})\, d\hat{\xi}$

provided that the integrand is integrable [20].

Definition 4.2: The conditional differential entropy $h(x|y)$ of a complex-valued random vector $x \in \mathbb{C}^N$ given a complex-valued random vector $y \in \mathbb{C}^M$ is defined as the conditional differential entropy of the real representation $\hat{x}$ given the real representation $\hat{y}$, i.e.,

$h(x|y) \triangleq h(\hat{x}|\hat{y}) = -\int f_{\hat{x},\hat{y}}(\hat{\xi}, \hat{\eta}) \log \frac{f_{\hat{x},\hat{y}}(\hat{\xi}, \hat{\eta})}{f_{\hat{y}}(\hat{\eta})}\, d\hat{\xi}\, d\hat{\eta}$

provided that the integrand is integrable. Here, $f_{\hat{x},\hat{y}}(\hat{\xi}, \hat{\eta})$ denotes the joint pdf of $\hat{x}$ and $\hat{y}$, whereas $f_{\hat{y}}(\hat{\eta})$ denotes the marginal pdf of $\hat{y}$.

Definition 4.3: The mutual information $I(x; y)$ between the complex-valued random vectors $x \in \mathbb{C}^N$ and $y \in \mathbb{C}^M$ is defined as the mutual information between their real representations $\hat{x}$ and $\hat{y}$, i.e.,

$I(x; y) \triangleq I(\hat{x}; \hat{y}) = \int f_{\hat{x},\hat{y}}(\hat{\xi}, \hat{\eta}) \log \frac{f_{\hat{x},\hat{y}}(\hat{\xi}, \hat{\eta})}{f_{\hat{x}}(\hat{\xi})\, f_{\hat{y}}(\hat{\eta})}\, d\hat{\xi}\, d\hat{\eta}$

where $f_{\hat{x},\hat{y}}(\hat{\xi}, \hat{\eta})$ denotes the joint pdf of $\hat{x}$ and $\hat{y}$, and $f_{\hat{x}}(\hat{\xi})$ and $f_{\hat{y}}(\hat{\eta})$ are the marginal pdfs of $\hat{x}$ and $\hat{y}$, respectively. It is well known that these quantities satisfy the following relations:

$I(x; y) = h(x) - h(x|y)$  (9a)

$h(x|y) \leq h(x)$  (9b)

with equality in (9b) if and only if $x$ and $y$ are statistically independent. Furthermore, according to the next theorem, Gaussian distributed circular/proper random vectors are known to be entropy maximizers.


Theorem 4.4 [14]: Suppose $x \in \mathbb{C}^N$ is a zero-mean complex-valued random vector with nonsingular covariance matrix $C_x$. Then, the differential entropy of $x$ satisfies

$h(x) \leq \log\left((\pi e)^N \det C_x\right)$  (10)

with equality if and only if $x$ is Gaussian distributed and circular/proper.

Proof: See, e.g., [14] or [15].
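In nats, bound (10) reads $h(x) \leq \ln((\pi e)^N \det C_x)$; as a small illustration (not from the paper):

```python
import numpy as np

def max_entropy_proper(C):
    """Upper bound of Theorem 4.4 in nats: ln((pi*e)^N det(C))."""
    N = C.shape[0]
    return N * np.log(np.pi * np.e) + np.log(np.linalg.det(C).real)

print(max_entropy_proper(np.diag([1.0, 2.0]).astype(complex)))
```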

Remarks: Let us assume, for the moment, that $x$ is known to be non-Gaussian. Clearly, inequality (10) is strict in this case and $\log((\pi e)^N \det C_x)$ is not a tight upper bound for the differential entropy $h(x)$. Similarly, if $x$ is known to be improper, the differential entropy is strictly smaller than $\log((\pi e)^N \det C_x)$. Loosely speaking, there are two sources that decrease the differential entropy of a complex-valued random vector: non-Gaussianity and improperness. In the following, we will derive improved maximum entropy theorems that take this observation into account. While their application is not limited to the non-Gaussian and improper cases, the obtained upper bounds are in general tighter for these two scenarios than the upper bound given by Theorem 4.4.

A. Maximum Entropy Theorem I

We first prove a maximum entropy theorem that is especially suited to the non-Gaussian case. However, also for Gaussian distributed random vectors, the obtained upper bound will turn out to be tighter than the one of Theorem 4.4. It associates a specific circular random vector to a given random vector and upper bounds the differential entropy of the given random vector by the differential entropy of the associated circular random vector.

Theorem 4.5 (Maximum Entropy Theorem I): Suppose $x \in \mathbb{C}^N$ is a complex-valued random vector. Then, the differential entropies of $x$ and its circular analog $x_c$ satisfy

$h(x) \leq h(x_c)$

with equality if and only if $x$ is circular.

Proof: Since $x_c = e^{j\Theta} x$ with $\Theta$ independent of $x$, we have for fixed (deterministic) $\alpha \in [0, 2\pi)$

$h(e^{j\alpha} x) = h(x)$

and, furthermore, by applying Fubini's theorem to Definition 4.2

$h(x_c | \Theta) = h(x).$

Therefore

$h(x_c) \geq h(x_c | \Theta) = h(x)$  (11)

where we have used (9a) and (9b). $x_c$ and $\Theta$ are independent, i.e., equality holds in (11), if and only if $(r, \theta)_{x_c}$ and $\Theta$ are independent. To investigate this independence,3 consider the joint pdf of $(r, \theta)_{x_c}$ and $\Theta$, i.e.,

$f_{(r,\theta)_{x_c}, \Theta}(\rho, \vartheta, \alpha) = \frac{1}{2\pi} f_{(r,\theta)}(\rho, \vartheta_1, \ldots, \vartheta_{N-1}, (\vartheta_N - \alpha) \bmod 2\pi)$

where $(\rho, \vartheta_1, \ldots, \vartheta_{N-1}) \in \mathcal{U}$, $\mathcal{U}$ being the support set of $f_u$, cf., (7). Since $f_{(r,\theta)_{x_c}, \Theta} = 0$ outside of $\mathcal{U} \times [0, 2\pi) \times [0, 2\pi)$ and $f_{(r,\theta)_{x_c}}\, f_\Theta = 0$ outside of $\mathcal{U} \times [0, 2\pi) \times [0, 2\pi)$, independence of $(r, \theta)_{x_c}$ and $\Theta$ is equivalent to

$f_{(r,\theta)}(\rho, \vartheta_1, \ldots, \vartheta_{N-1}, (\vartheta_N - \alpha) \bmod 2\pi) = f_{(r,\theta)_{x_c}}(\rho, \vartheta)$  (12)

as function of $\alpha \in [0, 2\pi)$. Transforming both sides of this equation according to $\vartheta_N \mapsto (\vartheta_N + \alpha) \bmod 2\pi$ and applying (1), a similar partitioning argument as in the proof of Lemma 2.5 shows that (12) is equivalent to

$f_{(r,\theta)}(\rho, \vartheta_1, \ldots, \vartheta_{N-1}, \alpha) = f_{(r,\theta)_{x_c}}(\rho, \vartheta)$  (13)

as function of $\alpha \in [0, 2\pi)$. Marginalization of both sides of (13) with respect to $\vartheta_N$ yields

$f_{(r,\theta)}(\rho, \vartheta_1, \ldots, \vartheta_{N-1}, \alpha) = \frac{1}{2\pi} f_u(\rho, \vartheta_1, \ldots, \vartheta_{N-1})$

as function of $\alpha \in [0, 2\pi)$, so that—according to Corollary 2.6 and (7)—equality in (11) implies circularity of $x$. The converse statement follows from Theorem 3.4.

Remarks: Since $x_c$ is non-Gaussian in general,4 the upper bound in Theorem 4.5 is typically tighter than the upper bound in Theorem 4.4. Furthermore, Theorem 4.5 does not need the requirement of finite second-order moments. The next corollary states that for improper Gaussian distributed random vectors the upper bound in Theorem 4.5 is strictly smaller than the upper bound in Theorem 4.4.

Corollary 4.6: Suppose $x \in \mathbb{C}^N$ is a zero-mean, complex-valued, and Gaussian distributed random vector with nonsingular covariance matrix $C_x$ and $\sigma_{\max}(L^{-1} \tilde{C}_x L^{-T}) < 1$, where $L$ denotes a generalized Cholesky factor of $C_x$ and $\tilde{C}_x$ denotes the complementary covariance matrix of $x$. Then, the differential entropy of its circular analog $x_c$ satisfies

$h(x_c) \leq \log\left((\pi e)^N \det C_x\right)$

with equality if and only if $x$ is proper.

Proof: Since $x_c$ is zero-mean with covariance matrix $C_{x_c} = C_x$, cf., Theorem 3.5, the inequality follows from

3The following technical derivation is required in order to show identical distributions for all $\alpha \in [0, 2\pi)$ and not only for almost all $\alpha \in [0, 2\pi)$.

4Note that it is possible to define an improper (non-Gaussian, cf., Theorem 3.6) random vector, such that its circular analog is Gaussian distributed. In this case, Theorem 4.5 does not yield an improvement over Theorem 4.4.


Theorem 4.4. Furthermore, equality $h(x_c) = \log((\pi e)^N \det C_x)$ implies Gaussianity of $x_c$, and, according to Theorem 3.6, properness of $x$.

B. Maximum Entropy Theorem II

Here, we prove a maximum entropy theorem that is especially suited to the improper case. The derivation is based on a maximum entropy theorem for real-valued random vectors.

Theorem 4.7: Suppose $y \in \mathbb{R}^M$ is a real-valued random vector with nonsingular covariance matrix $C_y$. Then, the differential entropy of $y$ satisfies

$h(y) \leq \frac{1}{2} \log\left((2\pi e)^M \det C_y\right)$

with equality if and only if $y$ is Gaussian distributed.

Proof: For the proof of this theorem for $y$ being zero-mean see, e.g., [34]. The general case, where $y$ has a nonvanishing mean vector, follows immediately since both differential entropy and covariance matrix are invariant with respect to translations.

We are now able to state the main theorem of Section IV-B.

Theorem 4.8 (Maximum Entropy Theorem II): Suppose $x \in \mathbb{C}^N$ is a complex-valued random vector with nonsingular covariance matrix $C_x$ and $\sigma_{\max}(L^{-1} \tilde{C}_x L^{-T}) < 1$, where $L$ denotes a generalized Cholesky factor of $C_x$ and $\tilde{C}_x$ denotes the complementary covariance matrix of $x$. Then, the differential entropy of $x$ satisfies

$h(x) \leq \log\left((\pi e)^N \det C_x\right) + \frac{1}{2} \sum_{i=1}^{N} \log\left(1 - k_i^2\right)$

where $k_i$ are the singular values of $L^{-1} \tilde{C}_x L^{-T}$, with equality if and only if $x$ is Gaussian distributed.

Proof: According to Theorem 4.7

$h(x) = h(\hat{x}) \leq \frac{1}{2} \log\left((2\pi e)^{2N} \det C_{\hat{x}}\right) = \log\left((\pi e)^N \det C_x\right) + \frac{1}{2} \sum_{i=1}^{N} \log(1 - k_i^2)$

where the last identity follows from

$\det C_{\hat{x}} = \det\left(\frac{1}{2}\, \mathcal{R}(L)\left(I_{2N} + \mathcal{S}(L^{-1} \tilde{C}_x L^{-T})\right) \mathcal{R}(L)^T\right) = \frac{1}{4^N} (\det C_x)^2 \prod_{i=1}^{N} (1 - k_i^2)$

which in turn follows from Lemma 2.7 and Proposition 2.8 applied to the random vector $w \triangleq L^{-1} x$. Note that $\tilde{C}_w = L^{-1} \tilde{C}_x L^{-T}$. We also conclude from the last expression that the nonsingularity of $C_{\hat{x}}$, which is required for the application of Theorem 4.7, is a direct consequence of the assumption $\sigma_{\max}(L^{-1} \tilde{C}_x L^{-T}) < 1$, since then $k_i < 1$ for all $i$. The equality criterion is obvious.
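The improved bound is straightforward to evaluate from $C_x$ and $\tilde{C}_x$; a sketch in nats (illustrative, assuming $C_x$ positive definite):

```python
import numpy as np

def max_entropy_improper(C, Ct):
    """Upper bound of Theorem 4.8 in nats:
    ln((pi*e)^N det(C)) + 0.5 * sum_i ln(1 - k_i^2),
    where k_i are the singular values of L^{-1} Ct L^{-T} and C = L L^H."""
    N = C.shape[0]
    L = np.linalg.cholesky(C)
    Linv = np.linalg.inv(L)
    k = np.linalg.svd(Linv @ Ct @ Linv.T, compute_uv=False)
    return (N * np.log(np.pi * np.e) + np.log(np.linalg.det(C).real)
            + 0.5 * np.sum(np.log1p(-k**2)))

# improperness strictly lowers the proper bound of Theorem 4.4
C = np.eye(2, dtype=complex)
Ct = 0.5 * np.eye(2, dtype=complex)
print(max_entropy_improper(C, Ct), 2 * np.log(np.pi * np.e))
```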

Remarks: Note that $\frac{1}{2}\sum_{i=1}^{N} \log(1 - k_i^2) \leq 0$ with equality if and only if $x$ is proper, so that Theorem 4.8 implies Theorem 4.4. The upper bound in Theorem 4.8 is the differential entropy of a Gaussian distributed but in general noncircular/improper random vector with same covariance matrix and complementary covariance matrix as $x$, whereas the upper bound in Theorem 4.5 is the differential entropy of a circular but in general non-Gaussian random vector with same covariance matrix as $x$. Which of the two bounds is tighter depends on the situation, i.e., on the degree of improperness and non-Gaussianity; a general statement is not possible. However, for an improper Gaussian distributed random vector the bound of Theorem 4.8 is attained and hence tighter, whereas for a circular non-Gaussian random vector the bound of Theorem 4.5 is attained and hence tighter.

V. CAPACITY OF COMPLEX-VALUED CHANNELS

In this section, we study the influence of circularity/properness—noncircularity/improperness on channel capacity. In particular, we investigate MIMO channels with complex-valued input and complex-valued output. For simplicity, we only consider linear channels with additive noise, i.e., channels of the form

$y = Hx + n$  (14)

where $x \in \mathbb{C}^N$, $y \in \mathbb{C}^M$, and $n \in \mathbb{C}^M$ denote transmit, receive, and noise vector, respectively, and $H \in \mathbb{C}^{M \times N}$ is the channel matrix. Both5 $x$ and $n$ are modeled as independent identically distributed (i.i.d.) (only with respect to channel uses; within the random vectors the i.i.d. assumption is not made) vector-valued random processes, whereas $H$ is either assumed to be deterministic or is modeled as an i.i.d. (again, only with respect to channel uses) matrix-valued random process. Furthermore, $x$, $n$, and $H$ (if applicable) are assumed to be statistically independent. Note that the assumption of a Gaussian distributed noise vector is only made for the special case investigated in Section V-C but not in general. The channel is characterized by the conditional distribution of $y$ given $x$ via the conditional pdf $f_{\hat{y}|\hat{x}}(\hat{\eta}|\hat{\xi})$ of their real representations $\hat{y}$ given $\hat{x}$, as well as by a set $\mathcal{P}$ of admissible input distributions. We write $x \in \mathcal{P}$, if the distribution of $x$ defined by the pdf $f_{\hat{x}}$ is in $\mathcal{P}$. Then, the capacity/noncoherent capacity of (14) is given by the supremum of the mutual information over the set of admissible input distributions [37], i.e., by

$C \triangleq \sup_{x \in \mathcal{P}} I(x; y).$

5Without loss of generality, since an i.i.d. (with respect to channel uses) $x$ is capacity achieving if $n$ and $H$ (if applicable) are i.i.d.


If, for the case of a random channel matrix, it is additionally assumed that the channel realizations are known to the receiver (but not to the transmitter), the channel output of (14) is the pair

$(y, H)$  (15)

so that the channel law of (15) is governed by the conditional pdf $f_{\hat{y},\hat{H}|\hat{x}}(\hat{\eta}, \hat{\zeta}|\hat{\xi})$, where $\hat{H}$ is defined by an appropriate stacking of real and imaginary part of $H$. Therefore, the coherent capacity of (15) is given by

$C_{\mathrm{coh}} \triangleq \sup_{x \in \mathcal{P}} I(x; (y, H)) = \sup_{x \in \mathcal{P}} \int I(x; Hx + n \,|\, H = H_0)\, f_{\hat{H}}(\hat{H}_0)\, d\hat{H}_0$

where $f_{\hat{H}}$ denotes the pdf of $\hat{H}$ and Fubini's Theorem has been used. A random vector $x$ is said to be capacity-achieving for (14) or (15), if $I(x; y) = C$ or $I(x; (y, H)) = C_{\mathrm{coh}}$, respectively. In what follows (see Sections V-A and V-B), the existence of a (not necessarily circular) capacity-achieving random vector is presupposed.

A. Circular Noise Vector

Here, we assume that the noise vector $n$ is circular and that $\mathcal{P}$ is closed under the operation of forming the circular analog, i.e., that $x \in \mathcal{P}$ implies $x_c \in \mathcal{P}$—in the following shortly termed circular-closed. Note that this closedness assumption is a natural assumption, since the operation of forming the circular analog preserves both peak and average power constraints, cf., Theorem 3.5, which are the most common constraints for defining $\mathcal{P}$. If $n$ is Gaussian distributed, it has been shown in [15] that capacity (for deterministic $H$) and coherent capacity (for random $H$) are achieved by circular (Gaussian distributed) random vectors, respectively. The proofs are based on Theorem 4.4. The following theorems extend these results to the non-Gaussian case.

Theorem 5.1: Suppose for (14) a deterministic channel matrix $H$, a circular noise vector $n$, and a circular-closed set $\mathcal{P}$ of admissible input distributions. Then, there exists a circular random vector $x_c \in \mathcal{P}$ that achieves the capacity of (14).

Proof: Let us denote by $x$ a—not necessarily circular—capacity-achieving random vector. According to (9a), its circular analog $x_c = e^{j\Theta} x$, where $\Theta$ is uniformly distributed and assumed to be independent of $x$ and $n$, satisfies

$I(x_c; Hx_c + n) = h(Hx_c + n) - h(n).$

Note that $x_c \in \mathcal{P}$ (since $\mathcal{P}$ is circular-closed) and that $n$ is independent of $x_c$. Since $n$ is circular and independent of $(x, \Theta)$, the vector $Hx_c + n$ has the same distribution as the circular analog $(Hx + n)_c$ of $y = Hx + n$. Therefore

$I(x_c; Hx_c + n) = h((Hx + n)_c) - h(n) \overset{(a)}{\geq} h(Hx + n) - h(n) = I(x; y)$

where $(a)$ follows from Theorem 4.5. Hence, the circular $x_c$ is capacity-achieving.

Theorem 5.2: Suppose for (14) a random channel matrix $H$, a circular noise vector $n$, and a circular-closed set $\mathcal{P}$ of admissible input distributions. Then, there exists a circular random vector $x_c \in \mathcal{P}$ that achieves the noncoherent capacity of (14).

Proof: Let us denote by $x$ a—not necessarily circular—capacity-achieving random vector and let $z \triangleq e^{j\alpha} x$ (with $\alpha \in [0, 2\pi)$ being deterministic). With $w \triangleq Hz + n$, we obtain, for every realization $H_0$ of $H$,

$h(H_0 z + n) = h\left(e^{j\alpha}(H_0 x + e^{-j\alpha} n)\right) = h(H_0 x + e^{-j\alpha} n) = h(H_0 x + n)$

where the second identity holds since the differential entropy of a complex-valued random vector is invariant with respect to a multiplication with $e^{j\alpha}$ (the corresponding transform of the real representation is orthonormal, cf., (4d)), and the last identity follows from the circularity of $n$. Averaging over the realizations of $H$ by Fubini's Theorem yields $I(z; w) = I(x; y)$, i.e., $z$ is capacity-achieving. It is well known that the mutual information is a concave function with respect to the input distribution for fixed channel law [37]. Therefore, by Jensen's inequality [38], the random vector $x_c$ with distribution defined according to

$f_{\hat{x}_c}(\hat{\xi}) = \frac{1}{2\pi} \int_0^{2\pi} f_{\widehat{e^{j\alpha} x}}(\hat{\xi})\, d\alpha$

satisfies

$I(x_c; Hx_c + n) \geq \frac{1}{2\pi} \int_0^{2\pi} I(e^{j\alpha} x; He^{j\alpha} x + n)\, d\alpha = I(x; y)$

so that $x_c$ achieves the noncoherent capacity of (14). But $x_c = e^{j\Theta} x$, where $\Theta$ denotes the uniformly distributed random variable used for defining the circular analog (see Definition 3.1), and, therefore, $x_c$ is circular and $x_c \in \mathcal{P}$.

Theorem 5.3: Suppose for (14) a random channel matrix $H$, a circular noise vector $n$, and a circular-closed set $\mathcal{P}$ of admissible input distributions. Then, there exists a circular random vector $x_c \in \mathcal{P}$ that achieves the coherent capacity of (15).

Proof: Let us denote by $x$ a—not necessarily circular—random vector that achieves the coherent capacity of (15). Using the same line of arguments as in the proof of Theorem 5.1, its circular analog $x_c = e^{j\Theta} x$, where $\Theta$ is uniformly distributed and assumed to be independent of $x$, $n$, and $H$, can be shown to satisfy

$I(x_c; Hx_c + n \,|\, H = H_0) \geq I(x; Hx + n \,|\, H = H_0)$

for every realization $H_0$ of $H$, where $x_c \in \mathcal{P}$ and $n$ is independent of $(x_c, H)$. It follows that

$I(x_c; (Hx_c + n, H)) = \int I(x_c; Hx_c + n \,|\, H = H_0)\, f_{\hat{H}}(\hat{H}_0)\, d\hat{H}_0 \geq \int I(x; Hx + n \,|\, H = H_0)\, f_{\hat{H}}(\hat{H}_0)\, d\hat{H}_0 = I(x; (y, H))$

i.e., $x_c$ achieves the coherent capacity of (15).

B. Circular Channel Matrix

Here, we assume that the channel matrix $H \in \mathbb{C}^{M \times N}$ is random, and—additionally—that an arbitrary stacking of the elements of $H$ into an $MN$-dimensional vector yields a circular random vector. The noise vector $n$ is not required to be circular. Note that this is the opposite situation compared with Section V-A, where $n$ is circular but $H$ is arbitrary. Again, it is assumed that the set of admissible input distributions is circular-closed. For the input distributions that achieve the noncoherent capacity of (14) and the coherent capacity of (15), respectively, we have the following results.

Theorem 5.4: Suppose for (14) a random channel matrix $H$, such that the random vector, which is obtained from an arbitrary stacking of the elements of $H$ into an $MN$-dimensional vector, is circular, and a circular-closed set $\mathcal{P}$ of admissible input distributions. Then, there exists a circular random vector $x_c \in \mathcal{P}$ that achieves the noncoherent capacity of (14).

Proof: Let us denote by $x$ a—not necessarily circular—random vector that achieves the noncoherent capacity of (14), and let $x_c = e^{j\Theta} x$, where $\Theta$ is uniformly distributed and assumed to be independent of $x$, $n$, and $H$, be its circular analog. Due to the circularity of the stacked version of $H$, $e^{j\alpha} H$ has the same distribution as $H$ for every deterministic $\alpha \in [0, 2\pi)$. Since $H(e^{j\alpha} x) = (e^{j\alpha} H) x$ and $H$ is independent of $(x, n)$, Fubini's Theorem yields

$I(e^{j\alpha} x; He^{j\alpha} x + n) = I(x; (e^{j\alpha} H) x + n) = I(x; Hx + n) = I(x; y)$

so that $e^{j\alpha} x$ is capacity-achieving for every $\alpha$. By the concavity of the mutual information in the input distribution and Jensen's inequality (cf., the proof of Theorem 5.2), the random vector $x_c$, whose distribution is the uniform mixture of the distributions of $e^{j\alpha} x$, $\alpha \in [0, 2\pi)$, satisfies

$I(x_c; Hx_c + n) \geq \frac{1}{2\pi} \int_0^{2\pi} I(e^{j\alpha} x; He^{j\alpha} x + n)\, d\alpha = I(x; y).$

Hence, the circular $x_c$ achieves the noncoherent capacity of (14).

Theorem 5.5: Suppose for (14) a random channel matrix $H$, such that the random vector, which is obtained from an arbitrary stacking of the elements of $H$ into an $MN$-dimensional vector, is circular, and a circular-closed set $\mathcal{P}$ of admissible input distributions. Then, there exists a circular random vector $x_c \in \mathcal{P}$ that achieves the coherent capacity of (15).

Proof: Let us denote by $x$ a—not necessarily circular—capacity-achieving random vector and let $z \triangleq e^{j\alpha} x$ (with $\alpha \in [0, 2\pi)$ being deterministic). With $H(e^{j\alpha} x) = (e^{j\alpha} H) x$, we obtain

$I(z; (Hz + n, H)) = I(x; ((e^{j\alpha} H) x + n, e^{j\alpha} H))$

where Fubini's Theorem has been used, and, furthermore, due to the circularity of the stacked version of $H$

$I(x; ((e^{j\alpha} H) x + n, e^{j\alpha} H)) = I(x; (Hx + n, H))$

since $e^{j\alpha} H$ has the same distribution as $H$. Hence, $z$ is capacity-achieving. Therefore, by applying Jensen's inequality to the concave mutual information function (with respect to the input distribution, cf.,


the proof of Theorem 5.2), the random vector $x_c$ with distribution defined according to

$f_{\hat{x}_c}(\hat{\xi}) = \frac{1}{2\pi} \int_0^{2\pi} f_{\widehat{e^{j\alpha} x}}(\hat{\xi})\, d\alpha$

satisfies

$I(x_c; (Hx_c + n, H)) \geq \frac{1}{2\pi} \int_0^{2\pi} I(e^{j\alpha} x; (He^{j\alpha} x + n, H))\, d\alpha = I(x; (y, H))$

so that $x_c$ achieves the coherent capacity of (15). But $x_c = e^{j\Theta} x$, where $\Theta$ denotes the uniformly distributed random variable used for defining the circular analog (see Definition 3.1), and, therefore, $x_c$ is circular.

C. Deterministic Channel Matrix and Improper Gaussian Noise Vector

Here, we investigate the case that the channel matrix $H \in \mathbb{C}^{N \times N}$ is deterministic and that the noise vector $n \in \mathbb{C}^N$ is Gaussian distributed. We impose an average power constraint, i.e., we define the set of admissible input distributions as

$\mathcal{P} \triangleq \left\{ \text{distributions of } x : E\{x^H x\} \leq P \right\}.$  (16)

For $n$ proper, both capacity and capacity-achieving input vector are well known [15]. Therefore, in the following, we consider a more general situation without the assumption of $n$ being proper. However, we introduce additional technical assumptions, which make the derivation less complicated and lead to simpler results. Note that most of these assumptions could be significantly relaxed or even omitted, but for the price of more involved theorems and proofs. We assume,

$H$ is square and nonsingular  (17a)

$P$ is sufficiently large for the water filling solution to assign positive power to all eigenchannels, cf., (20)  (17b)

$C_n$ is nonsingular  (17c)

$\sigma_{\max}(L^{-1} \tilde{C}_n L^{-T}) < 1$  (17d)

where $C_n$ and $\tilde{C}_n$ denote covariance matrix and complementary covariance matrix of $n$, respectively, and $L$ is a generalized Cholesky factor of $C_n$. We have the following capacity result.

Theorem 5.6: Suppose for (14) that assumptions (17) hold and that the set of admissible input distributions is defined according to (16). Then, the capacity of (14) is given by

$C = \sum_{i=1}^{N} \log \frac{\nu}{\lambda_i} + \frac{1}{2} \sum_{i=1}^{N} \log \frac{1}{1 - k_i^2}$

where $\lambda_i$ are the eigenvalues of $H^{-1} C_n H^{-H}$, $\nu \triangleq \frac{1}{N}\left(P + \sum_{i=1}^{N} \lambda_i\right)$ is the water level, and $k_i$ are the singular values of $L^{-1} \tilde{C}_n L^{-T}$. Furthermore, the zero-mean and Gaussian distributed random vector $x$ with covariance matrix and complementary covariance matrix given by

$C_x = W (\nu I_N - \Lambda) W^H$  (18a)

$\tilde{C}_x = -H^{-1} \tilde{C}_n H^{-T}$  (18b)

respectively, is capacity-achieving. Here, $H^{-1} C_n H^{-H} = W \Lambda W^H$ denotes the eigenvalue decomposition of $H^{-1} C_n H^{-H}$.

Proof: Since $I(x; y) = h(Hx + n) - h(n)$ and since the power constraint (16) is only relaxed by setting $m_x = 0$, where $m_x$ denotes the mean vector of $x$, Theorem 4.4 implies that the supremum of

$I(x; y) = h(Hx + n) - h(n)$

over $\mathcal{P}$ is achieved by a zero-mean and Gaussian distributed complex-valued random vector $x$ with covariance matrix $C_x$ that maximizes the function $\log \det(H C_x H^H + C_n)$ over the set of covariance matrices with $\operatorname{tr}(C_x) \leq P$, and with complementary covariance matrix $\tilde{C}_x$ that satisfies $H \tilde{C}_x H^T + \tilde{C}_n = 0$ (so that the output is circular/proper), provided that such a random vector exists. Using the eigenvalue decomposition $H^{-1} C_n H^{-H} = W \Lambda W^H$ we obtain,

$\log \det(H C_x H^H + C_n) = \log\left(|\det H|^2 \det(W^H C_x W + \Lambda)\right)$

so that its maximum is achieved for $C_x = W(\nu I_N - \Lambda) W^H$, where $\nu$ is chosen such that

$\operatorname{tr}(C_x) = \sum_{i=1}^{N} (\nu - \lambda_i) = P$  (19)

is satisfied. Note that this is the well-known water filling solution [15], [34], with the additional simplification that $C_x$ is nonsingular,6 since, according to (17b)

$\nu = \frac{1}{N}\left(P + \sum_{i=1}^{N} \lambda_i\right) > \lambda_{\max}$  (20)

where $\lambda_{\max}$ is the largest entry (eigenvalue) of $\Lambda$. This yields (18a). Clearly, the choice (18b) satisfies $H \tilde{C}_x H^T + \tilde{C}_n = 0$. It remains to show that $(C_x, \tilde{C}_x)$ is a valid pair of covariance matrix and complementary covariance matrix. To that end, note that $C_x = \nu I_N - H^{-1} C_n H^{-H}$ and consider the random vector $m \triangleq H^{-1} n$, for which

$C_m = H^{-1} C_n H^{-H}, \qquad \tilde{C}_m = H^{-1} \tilde{C}_n H^{-T}$  (21a)

$L_m^{-1} \tilde{C}_m L_m^{-T} = L^{-1} \tilde{C}_n L^{-T}, \qquad L_m \triangleq H^{-1} L$  (21b)

where (21b) shows that $m$ has the same circularity coefficients as $n$, which are smaller than 1 by (17d) and Theorem 2.11, and note that $\nu > \lambda_{\max}$, cf., (19) and (20). This implies, together with (17b), that $C_x$ is positive definite and that

$C_{\hat{x}} = \frac{1}{2}\left(\mathcal{R}(C_x) + \mathcal{S}(\tilde{C}_x)\right) = \frac{\nu}{2} I_{2N} - C_{\hat{m}}$

is nonnegative definite, so that Theorem 2.11 shows that (18) defines a valid pair of covariance matrix and complementary covariance matrix. Finally, the capacity of (14) is obtained as

$C = h(y) - h(n) = \log\left((\pi e)^N |\det H|^2 \nu^N\right) - \log\left((\pi e)^N \det C_n\right) - \frac{1}{2} \sum_{i=1}^{N} \log(1 - k_i^2) = \sum_{i=1}^{N} \log \frac{\nu}{\lambda_i} + \frac{1}{2} \sum_{i=1}^{N} \log \frac{1}{1 - k_i^2}$

6The water level $\nu$ is larger than the noise power for all (parallel) eigenchannels.


where Theorem 4.8 has been used.
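Under assumptions (17), the capacity expression reconstructed above can be evaluated directly; the following sketch (illustrative, and tied to the reconstruction of Theorem 5.6 rather than to the paper's exact notation) performs the water filling and adds the improperness gain.

```python
import numpy as np

def capacity_improper_noise(H, Cn, Ctn, P):
    """Capacity sketch for y = Hx + n with improper Gaussian noise, cf.
    Theorem 5.6: water filling on the eigenvalues of H^{-1} Cn H^{-H},
    plus the improperness gain 0.5 * sum ln(1/(1 - k_i^2)). In nats."""
    Hinv = np.linalg.inv(H)
    lam = np.linalg.eigvalsh(Hinv @ Cn @ Hinv.conj().T)
    nu = (P + lam.sum()) / len(lam)          # water level, assuming (17b)
    assert nu > lam.max(), "power too small for all-channels-active case"
    L = np.linalg.cholesky(Cn)
    Linv = np.linalg.inv(L)
    k = np.linalg.svd(Linv @ Ctn @ Linv.T, compute_uv=False)
    return np.sum(np.log(nu / lam)) - 0.5 * np.sum(np.log1p(-k**2))

H = np.array([[1.0, 0.2], [0.0, 1.0]], dtype=complex)
Cn = np.eye(2, dtype=complex)
Ctn = 0.4 * np.eye(2, dtype=complex)
print(capacity_improper_noise(H, Cn, Ctn, P=10.0))
```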

Remarks: Whereas in many real-world scenarios the noise vector happens to be circular, so that the results of Section V-A apply, there are also practically relevant scenarios where the noise vector is known to be improper. More specifically, DMT modulation, which is widely used in xDSL applications [19], yields an equivalent system channel that exactly matches the situation considered here [9], [10]. We also note that capacity results for improper Gaussian distributed noise vectors could be alternatively derived by making use of an equivalent real-valued channel of dimension $2N \times 2N$ that is obtained by appropriate stacking of real and imaginary parts. The advantage of the approach presented here is that it yields expressions that are explicit in covariance matrix and complementary covariance matrix. In the following, we exploit this (desired) separation.

Observe that improper noise is beneficial since—due to $\frac{1}{2}\sum_{i=1}^{N} \log \frac{1}{1 - k_i^2} \geq 0$—it increases capacity. However, this presupposes a suitably designed transmission scheme. If it is erroneously believed that $\tilde{C}_n = 0$, it will be erroneously believed as well (see Theorem 5.6) that the zero-mean and Gaussian distributed random vector with covariance matrix $C_x$ as in (18a) but with $\tilde{C}_x = 0$ is capacity-achieving. It follows that

$C' = C - \Delta C$

where $C'$ denotes the rate achieved by this mismatched design and $\Delta C \geq 0$ denotes the resulting capacity loss. This capacity loss is quantified by the next theorem.

Theorem 5.7: Suppose for (14) that assumptions (17) hold and that the set of admissible input distributions is defined according to (16). Then, the capacity loss $\Delta C$ that occurs if it is erroneously believed that $\tilde{C}_n = 0$ is given by

$\Delta C = \frac{1}{2} \sum_{i=1}^{N} \log \frac{1}{1 - \tilde{k}_i^2}$

where $\tilde{k}_i$ are the singular values of $L_y^{-1} \tilde{C}_n L_y^{-T}$, with $L_y$ denoting a generalized Cholesky factor of the output covariance matrix $C_y = H C_x H^H + C_n$. In particular

$0 \leq \Delta C \leq \frac{1}{2} \sum_{i=1}^{N} \log \frac{1}{1 - k_i^2}$

i.e., the capacity loss never exceeds the capacity gain obtained by exploiting the improperness of the noise.

Proof: We intend to apply Theorem 4.8 to the randomvector . In order to meet the assumptionof Theorem 4.8, we have to show ,where denotes a generalized Cholesky factor of .Note that the nonsingularity of follows from the non-singularity of . Clearly,

and , so that

and, furthermore

(22)

The capacity loss is then given by

where follows from Theorem 4.8, since and . For the bound, note that
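The whitening step used in this proof can be mirrored numerically. The sketch below (all matrices are illustrative, not the paper's) whitens an improper noise vector with a Cholesky factor of its covariance matrix and confirms that whitening normalizes the covariance, while the complementary covariance transforms congruently, so the impropriety remains visible after whitening.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 2, 200_000

# Hypothetical improper Gaussian noise: widely linear transform of a
# circular seed w (E[w w^H] = I, E[w w^T] = 0).
w = (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))) / np.sqrt(2)
M = np.array([[1.5, 0.3], [0.0, 1.0]])
n = M @ w + 0.4 * np.conj(w)

C_n     = n @ n.conj().T / T   # covariance (nonsingular here)
C_til_n = n @ n.T / T          # complementary covariance (nonzero)

# Whitening with a Cholesky factor L, C_n = L L^H: the whitened noise
# L^{-1} n has (approximately) identity covariance, while its
# complementary covariance becomes L^{-1} C_til_n L^{-T}.
L  = np.linalg.cholesky(C_n)
nw = np.linalg.solve(L, n)
Li = np.linalg.inv(L)
assert np.allclose(nw @ nw.conj().T / T, np.eye(N), atol=2e-2)
assert np.allclose(nw @ nw.T / T, Li @ C_til_n @ Li.T, atol=2e-2)
```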

Example: Let us consider the special case of the complex scalar channel with noise covariance and complementary noise covariance , where . According to (3)

which is illustrated in Fig. 1(a). It is seen that the noise power differs between the real and imaginary part. If it is erroneously believed that the noise is proper, the same power will be assigned to the real and imaginary part of the input vector, as shown in Fig. 1(b). However, the optimum power distribution that maximizes the mutual information is different; it is depicted in Fig. 1(c). Note that this capacity-achieving power distribution is obtained by water filling on the level of real and imaginary parts. The difference between the mutual informations of solutions (b) and (c) is expressed by the capacity loss, as the sketch below illustrates.
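The scalar example can be reproduced numerically. The following Python sketch (the values of c, c_til, and P are illustrative) uses the standard fact that a scalar complex noise with covariance c and real complementary covariance c_til has real- and imaginary-part powers (c + c_til)/2 and (c - c_til)/2, and compares the water-filling allocation of Fig. 1(c) with the equal split of Fig. 1(b); their rate difference is the capacity loss.

```python
import numpy as np

def water_fill_2(n1, n2, P):
    """Water filling over two parallel real channels."""
    mu = (P + n1 + n2) / 2            # water level if both channels active
    if mu < max(n1, n2):              # one channel off: all power to the other
        return (P, 0.0) if n1 < n2 else (0.0, P)
    return mu - n1, mu - n2

def rate(p1, p2, n1, n2):
    """Sum rate (bits) of two real Gaussian channels."""
    return 0.5 * np.log2(1 + p1 / n1) + 0.5 * np.log2(1 + p2 / n2)

c, c_til, P = 1.0, 0.6, 2.0                    # illustrative parameters
n_re, n_im = (c + c_til) / 2, (c - c_til) / 2  # per-part noise powers

r_opt  = rate(*water_fill_2(n_re, n_im, P), n_re, n_im)  # Fig. 1(c)
r_prop = rate(P / 2, P / 2, n_re, n_im)                  # Fig. 1(b)
print(f"capacity loss ~ {r_opt - r_prop:.4f} bit")       # ~ 0.029 bit
```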

VI. CONCLUSION

We studied the influence of circularity/noncircularity and properness/improperness on important information theoretic quantities such as entropy, divergence, and capacity. Our motivating starting point was a theorem by Neeser and



Fig. 1. Water filling strategies illustrating the capacity loss.

Massey [14], which states that the entropy of a zero-mean complex-valued random vector is upper-bounded by the entropy of a circular/proper Gaussian distributed random vector with the same covariance matrix. We strengthened this theorem in two different directions: 1) we dropped the Gaussian assumption and 2) we dropped the properness assumption. In both cases, the resulting upper bound turned out to be tighter than the one previously known. A key ingredient for the proof in case 1 was the introduction of the circular analog of a given complex-valued random vector. Whereas its definition was based on intuitive arguments to obtain a circular random vector which is “close” to the (potentially) noncircular given one, we rigorously proved that it equals the unique circular random vector with minimum Kullback–Leibler divergence. On the other hand, for case 2, we exploited results about the second-order structure of complex-valued random vectors that were obtained without making use of the augmented covariance matrix (in contrast to related work). Additionally, we presented a criterion for a matrix to be a valid complementary covariance matrix. Furthermore, we addressed the capacity of MIMO channels. Regardless of the specific distribution of the channel parameters (noise vector and channel matrix, if modeled as random), we showed that the capacity-achieving input vector is circular for a broad range of MIMO channels (including coherent and noncoherent scenarios). This extends known results that make use of a Gaussian assumption. Finally, we investigated the situation of an improper and Gaussian distributed noise vector. We computed both capacity and capacity-achieving input vector and showed that improperness increases capacity, provided that the complementary covariance matrix is exploited. Otherwise, a capacity loss occurs, for which we derived an explicit expression.

ACKNOWLEDGMENT

The author would like to thank J. Huber, J. Sayir, and J. Weinrichter for helpful hints and comments. He is also grateful to the anonymous reviewers for their constructive comments that have resulted in a major improvement of this paper.

REFERENCES

[1] G. Tauböck, “Rotationally variant complex channels,” in Proc. 23rd Symp. Inf. Theory Benelux, Louvain-la-Neuve, Belgium, May 2002, pp. 261–268.

[2] G. Tauböck, “On the maximum entropy theorem for complex random vectors,” in Proc. IEEE Int. Symp. Inf. Theory, Chicago, IL, Jun./Jul. 2004, p. 41.

[3] G. Tauböck, “A maximum entropy theorem for complex-valued random vectors, with implications on capacity,” in Proc. IEEE Inf. Theory Workshop, Paraty, Brazil, Oct. 2011, pp. 375–379.

[4] P. J. Schreier and L. L. Scharf, Statistical Signal Processing of Complex-Valued Data: The Theory of Improper and Noncircular Signals. Cambridge, U.K.: Cambridge Univ. Press, 2010.

[5] Y. C. Yoon and H. Leib, “Maximizing SNR in improper complex noise and applications to CDMA,” IEEE Commun. Lett., vol. 1, no. 1, pp. 5–8, Jan. 1997.

[6] G. Gelli, L. Paura, and A. R. P. Ragozini, “Blind widely linear multiuser detection,” IEEE Commun. Lett., vol. 4, no. 6, pp. 187–189, Jun. 2000.

[7] A. Lampe, R. Schober, W. Gerstacker, and J. Huber, “A novel iterative multiuser detector for complex modulation schemes,” IEEE J. Sel. Areas Commun., vol. 20, no. 2, pp. 339–350, Feb. 2002.

[8] W. H. Gerstacker, R. Schober, and A. Lampe, “Receivers with widely linear processing for frequency-selective channels,” IEEE Trans. Commun., vol. 51, no. 9, pp. 1512–1523, Sep. 2003.

[9] G. Tauböck, “Noise analysis of DMT,” in Proc. IEEE Global Commun. Conf., San Francisco, CA, Dec. 2003, vol. 4, pp. 2136–2140.

[10] G. Tauböck, “Complex noise analysis of DMT,” IEEE Trans. Signal Process., vol. 55, no. 12, pp. 5739–5754, Dec. 2007.

[11] L. Anttila, M. Valkama, and M. Renfors, “Circularity-based I/Q imbalance compensation in wideband direct-conversion receivers,” IEEE Trans. Veh. Technol., vol. 57, no. 4, pp. 2099–2113, Jul. 2008.

[12] P. Rykaczewski, M. Valkama, and M. Renfors, “On the connection of I/Q imbalance and channel equalization in direct-conversion transceivers,” IEEE Trans. Veh. Technol., vol. 57, no. 3, pp. 1630–1636, May 2008.

[13] Y. Zou, M. Valkama, and M. Renfors, “Digital compensation of I/Q imbalance effects in space-time coded transmit diversity systems,” IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2496–2508, Jun. 2008.

[14] F. D. Neeser and J. L. Massey, “Proper complex random processes with applications to information theory,” IEEE Trans. Inf. Theory, vol. 39, no. 4, pp. 1293–1302, Jul. 1993.

[15] I. E. Telatar, “Capacity of multi-antenna Gaussian channels,” Eur. Trans. Telecommun., vol. 10, pp. 585–595, Nov./Dec. 1999.

[16] P. J. Schreier, L. L. Scharf, and C. T. Mullis, “Detection and estimation of improper complex random signals,” IEEE Trans. Inf. Theory, vol. 51, no. 1, pp. 306–312, Jan. 2005.

[17] P. J. Schreier, L. L. Scharf, and A. Hanssen, “A generalized likelihood ratio test for impropriety of complex signals,” IEEE Signal Process. Lett., vol. 13, no. 7, pp. 433–436, Jul. 2006.

[18] J. Eriksson and V. Koivunen, “Complex random vectors and ICA models: Identifiability, uniqueness, and separability,” IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 1017–1029, Mar. 2006.

[19] M. Gagnaire, “An overview of broad-band access technologies,” Proc. IEEE, vol. 85, no. 12, pp. 1958–1972, Dec. 1997.

[20] P. R. Halmos, Measure Theory. New York: Springer-Verlag, 1974.

[21] G. Tauböck, “Wireline multiple-input/multiple-output systems,” Ph.D. dissertation, Inst. Commun. Radio-Frequency Eng., Vienna Univ. Technology, Vienna, Austria, 2005.

[22] W. Rudin, Real and Complex Analysis, 3rd ed. New York: McGraw-Hill, 1987.

[23] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD: The Johns Hopkins Univ. Press, 1996.

[24] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1990.

[25] K. Karhunen, “Über lineare Methoden in der Wahrscheinlichkeitsrechnung,” Ann. Acad. Sci. Fennicae, ser. A (I), vol. 37, no. 18, pp. 3–79, 1947.

[26] M. Loève, “Fonctions aléatoires du second ordre,” in Processus stochastiques et mouvements Browniens, P. Lévy, Ed. Paris, France: Gauthier-Villars, 1948.

[27] H. Hotelling, “Analysis of a complex pair of statistical variables into principal components,” J. Educ. Psychol., vol. 24, pp. 417–441, 1933.

[28] Y. Eldar and A. Oppenheim, “MMSE whitening and subspace whitening,” IEEE Trans. Inf. Theory, vol. 49, no. 7, pp. 1846–1851, Jul. 2003.


[29] P. J. Schreier and L. L. Scharf, “Canonical coordinates for reduced-rank estimation of improper complex random vectors,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., Orlando, FL, May 2002, pp. 1153–1156.

[30] H. Hotelling, “Relations between two sets of variates,” Biometrika, vol. 28, no. 3–4, pp. 321–377, 1936.

[31] L. L. Scharf and J. K. Thomas, “Wiener filters in canonical coordinates for transform coding, filtering, and quantizing,” IEEE Trans. Signal Process., vol. 46, no. 3, pp. 647–654, Mar. 1998.

[32] L. L. Scharf and C. T. Mullis, “Canonical coordinates and the geometry of inference, rate, and capacity,” IEEE Trans. Signal Process., vol. 48, no. 3, pp. 824–831, Mar. 2000.

[33] A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed. New York: McGraw-Hill, 1991.

[34] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.

[35] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications (corrected printing of the 1998 edition). Berlin, Germany: Springer-Verlag, 2010.

[36] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions. New York: Dover, 1965.

[37] R. M. Gray, Entropy and Information Theory, 2nd ed. New York: Springer-Verlag, 2010.

[38] R. B. Ash, Probability and Measure Theory, 2nd ed. San Diego, CA: Academic, 2000.

Georg Tauböck (S’01–M’07) received the Dipl.-Ing. degree and the Dr. techn. degree (with highest honors) in electrical engineering and the Dipl.-Ing. degree in mathematics (with highest honors) from Vienna University of Technology, Vienna, Austria, in 1999, 2005, and 2008, respectively. He also received the diploma in violoncello from the Conservatory of Vienna, Vienna, Austria, in 2000.

From 1999 to 2005, he was with the FTW Telecommunications Research Center Vienna, Vienna, Austria, and since 2005, he has been with the Institute of Telecommunications, Vienna University of Technology, Vienna, Austria. From February 2010 to August 2010, he was a visiting researcher with the Communication Technology Laboratory (Communication Theory Group) at ETH Zurich, Zurich, Switzerland.

His research interests include wireline and wireless communications, compressed sensing, signal processing, and information theory.