Capacity of Block Rayleigh Fading Channels Without CSIweb.stanford.edu/~mainakch/papers/block_fading_capacity.pdfMainak Chowdhury and Andrea Goldsmith, Fellow, IEEEy Department of

Capacity of Block Rayleigh Fading Channels Without CSI

Mainak Chowdhury∗ and Andrea Goldsmith, Fellow, IEEE†

Department of Electrical Engineering, Stanford University, USA
Email: ∗[email protected], †[email protected]

Abstract—A system with a single antenna at the transmitter and receiver and no channel state information at either is considered. The channel experiences block Rayleigh fading with a coherence time of $T_0$ symbol times, and the fading statistics are assumed to be known perfectly. The system operates with a finite average transmit power. It is shown that the capacity-optimal input distribution in the $T_0$-dimensional space is the product of the distribution of an isotropically distributed unit vector and a distribution on the 2-norm in the $T_0$-dimensional space which is discrete and has a finite number of points in its support. Numerical evaluations of this distribution and the associated capacity for a channel with fading and Gaussian noise for a coherence time $T_0 = 2$ are presented for representative SNRs. It is also shown numerically that the capacity-achieving scheme implicitly performs channel estimation.

Index Terms—Block fading channels, no CSI, capacity-achieving input distribution, noncoherent communications

I. INTRODUCTION

Channel estimation and the subsequent use of the channel estimates for data transmission lie at the heart of many wireless communication systems in use today. In this work, we explore an alternative paradigm where channel estimation and data transmission are not performed one after the other but rather jointly, with the end goal of maximizing the data rate. This maximum data rate equals the channel's Shannon capacity under the assumption that only the channel statistics are known at the transmitter and the receiver. Capacity results in this setting are few and far between. However, there is a rich history of work investigating many special cases.

One such example is the finite-state Markov channel, whose capacity under no CSI was studied in [1], [2]. In these works, the Markov property of the channel was used both to compute good bounds for the capacity and to compute exact capacities for some special classes of channels. The capacity of i.i.d. as well as block fading channels with no CSI (but with perfect knowledge of the channel statistics) has also been extensively investigated. In particular, the capacity-achieving distribution for i.i.d. Rayleigh and Ricean fading channels without CSI was derived in [3], [4]. Based on a characterization of the Karush-Kuhn-Tucker (KKT) conditions associated with the convex optimization problem of maximizing the mutual information of these channels, the authors established that the capacity-achieving input distribution is discrete with a finite number of mass points in the norm.

A series of fundamental contributions were made starting in the early 2000s on the capacity of block fading channels without CSI at the transmitter or the receiver, also called noncoherent channels. One such contribution is the notion of unitarily invariant codes proposed in [5] and [6] for noncoherent MIMO channels. In the asymptotically large SNR regime, the capacity-achieving schemes depend only on the fading distribution and perform space-time coding over the Grassmann manifold associated with the channel matrix. Multiuser counterparts of these ideas can be found in [7], [8]. Results about optimal random codes in general block fading channels for low-to-moderate SNR regimes are harder to come by, since the codes depend not only on the fading distribution but also on the noise distribution. One example of such a work is [9]. In this work, the authors established that the probability distribution of the error-exponent-optimal random block code for a SISO channel is supported on a finite number of discrete mass points in the norm of the block code.

In our work we investigate the capacity and capacity-achieving distribution of a block fading model with a coherence time $T_0 > 1$ and any SNR. Our analysis determines that, similar to known results for $T_0 = 1$ and for the error-exponent-optimal distribution for $T_0 > 1$, the capacity for $T_0 > 1$ is achieved by a distribution on the norm $\|x\|$ which is supported on a finite number of mass points. Based on this observation, we present numerical results for the capacity and the capacity-achieving distributions for channels with fading and noise and a coherence time of $T_0 \le 2$. We find, based on the capacity-achieving distribution for $T_0 = 2$, that sequential channel estimation using pilot symbols and subsequent data transmission achieves strictly lower data rates than the capacity.

We also examine the mutual information between the channel output and the channel state under the capacity-achieving distribution. We show that this mutual information is nonzero and increases with SNR, which indicates that some form of implicit channel estimation is inherent in optimal decoding. These results are relevant to signal design in many existing or emerging wireless systems where, on the one hand, the effects of an imprecise channel estimate on achievable data rates are poorly understood and, on the other hand, precise channel state information may be expensive to acquire. In such cases, joint channel estimation and data transmission, or just noncoherent transmission, may be better than separate channel estimation and transmission.

A line of work exploring the cost of separate channel estimation is [10]. Specifically, this work explores the utility of channel state information under schemes which involve separate channel estimation and transmission (henceforth referred to as partially coherent schemes). This work assumes a certain channel estimation overhead and makes precise various aspects of the optimal learning overhead needed to achieve good performance. A surprising outcome from this line of work is that the overhead needed to achieve good rates (measured by the capacity) is often not that large. Our results suggest in addition that, even when the coherence times are small, the channel output when data symbols are transmitted already contains information about the channel state. This suggests that a form of joint channel estimation and data transmission might achieve better performance in practice than the commonly used pilot-based channel estimation.

The rest of the paper is organized as follows. We present the system model in Section II, describe some properties of the output distribution in Section III and characterize the structure of the capacity-achieving input distribution in Section IV. Based on a numerical optimization of these expressions for our channel model, we present the capacity and the capacity-achieving distribution as a function of the SNR in Section V-A. We discuss the implications of our results relative to pilot-based channel estimation in Section V-B and finally present our concluding thoughts in Section VI.

II. SYSTEM MODEL

We consider one single-antenna transmitter and one single-antenna receiver. The system across a single block of $T_0$ symbol times may be represented as

$$y = hx + \nu \tag{1}$$

with $y, \nu \in \mathbb{R}^{T_0}$, $h \in \mathbb{R}$, $x \in \mathbb{R}^{T_0}$. Each $\nu_i \sim \mathcal{N}(0, \sigma^2)$, and $h \sim \mathcal{N}(0, 1)$. We restrict attention to real-valued channel coefficients for simplicity of the exposition and the numerical optimization. Extensions to complex domains follow very similar lines and are presented in the extended version of this work [11]. We use capital letters to refer to a random variable and lowercase letters to refer to a realization. We use $p_Y(\cdot)$ to refer to the density function of the continuous random variable $Y$, and $\mu_X(\cdot)$ to refer to the probability measure on random variable $X$.

We assume a block fading model with a coherence time $T_0$. We assume no instantaneous CSI at the transmitter or the receiver, an average transmit power of 1, and that the receiver does not know the channel realization at the beginning of each new coherence block. We consider coding across blocks and seek to understand the optimal signaling strategies to achieve capacity. Note that since this channel can be thought of as a memoryless system $p_{Y|x}(\cdot)$, the fundamental limit on the achievable rates of this system is achieved by a distribution on the $T_0$-dimensional space of all possible inputs over $T_0$ time slots (i.e., a space-time random code).

Fig. 1: The system model (input $x \in \mathbb{R}^{T_0}$, channel, output $y \in \mathbb{R}^{T_0}$).

The channel in Fig. 1 is completely specified by the conditional density $p_{Y|x}(\cdot)$, which in turn is specified as follows: if $y \in \mathbb{R}^{T_0}$ refers to the $T_0$-dimensional output of the channel, then, given $x$, $y$ is distributed as

$$y \sim \mathcal{N}(0, \Sigma_x),$$

where the $(q, r)$th entry of the matrix $\Sigma_x$ is specified by

$$\Sigma_{x,q,r} = x_r x_q + \sigma^2 I(q = r),$$

with the indicator function $I$ equal to 1 if the condition is satisfied and zero otherwise.
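As an illustration of this covariance structure, the following minimal sketch simulates the model y = hx + ν for one fixed input block and compares the empirical second moments of the output with x_q x_r + σ² I(q = r). The input block, the noise variance, the sample size and all function names are illustrative assumptions, not values or code from the paper.

```python
import random


def covariance(x, sigma2):
    """Exact covariance: entry (q, r) equals x_q * x_r + sigma2 * 1{q == r}."""
    n = len(x)
    return [[x[q] * x[r] + (sigma2 if q == r else 0.0) for r in range(n)]
            for q in range(n)]


def simulate_block(x, sigma2, rng):
    """One output block y = h * x + nu, with h ~ N(0, 1) and nu_i ~ N(0, sigma2)."""
    h = rng.gauss(0.0, 1.0)
    sd = sigma2 ** 0.5
    return [h * xi + rng.gauss(0.0, sd) for xi in x]


def sample_covariance(x, sigma2, num_blocks=100_000, seed=0):
    """Empirical E[y y^T]; y has zero mean, so no centering is needed."""
    rng = random.Random(seed)
    n = len(x)
    acc = [[0.0] * n for _ in range(n)]
    for _ in range(num_blocks):
        y = simulate_block(x, sigma2, rng)
        for q in range(n):
            for r in range(n):
                acc[q][r] += y[q] * y[r]
    return [[v / num_blocks for v in row] for row in acc]


x = [1.0, 0.5]   # hypothetical input block for T0 = 2
sigma2 = 0.5     # hypothetical noise variance
exact = covariance(x, sigma2)
empirical = sample_covariance(x, sigma2)
```

The empirical matrix should agree with the exact one up to Monte Carlo error, which shrinks as the number of simulated blocks grows.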

III. PROPERTIES OF THE COVARIANCE MATRIX Σx

In this section we point out some properties of the covariance matrix $\Sigma_x$ in addition to the ones listed in Sections II-C and IV in [12]. These properties are useful in understanding the nature of the optimal input distributions and are also used in establishing the results in Lemma 1. More specifically, the positive definiteness of the matrix $\Sigma_x$ at all points in the domain is used to establish the existence of the linear transformation used to establish a contradiction. Proofs of the identities listed below have been included in the extended version of this work [11].

(a) $\Sigma_x$ has $T_0 - 1$ eigenvalues with value $\sigma^2$ and a single eigenvalue with value $\|x\|^2 + \sigma^2$.

(b) The (unnormalized) $i$th eigenvector corresponding to the first $T_0 - 1$ eigenvalues of $\sigma^2$ is along $(-x_{i+1}/x_1, e_i)$, where $e_i$ is the unit row vector with only one nonzero entry (unity) at position $i$. The $T_0$th eigenvector is along $(x_1/x_{T_0}, \ldots, x_{T_0-1}/x_{T_0}, 1)$.

(c) $\Sigma_x$ is positive definite.

(d) The determinant of $\Sigma_x$ is a function of $\|x\|$.
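Properties (a), (b) and (d) can be checked numerically for a small example. The sketch below does so for T0 = 3 by testing the eigenvector equation directly; the input vector, the noise variance and the helper names are illustrative assumptions, and the check requires x_1 ≠ 0, as property (b) implicitly does.

```python
def covariance(x, sigma2):
    """Entry (q, r) equals x_q * x_r + sigma2 * 1{q == r}."""
    n = len(x)
    return [[x[q] * x[r] + (sigma2 if q == r else 0.0) for r in range(n)]
            for q in range(n)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def is_eigenpair(m, v, lam, tol=1e-9):
    """True if m v = lam v holds entrywise up to tol."""
    return all(abs(a - lam * b) <= tol for a, b in zip(matvec(m, v), v))


def noise_eigenvectors(x):
    """Vectors with -x_{i+1}/x_1 in the first entry and a single 1 elsewhere."""
    vecs = []
    for i in range(1, len(x)):
        v = [0.0] * len(x)
        v[0] = -x[i] / x[0]
        v[i] = 1.0
        vecs.append(v)
    return vecs


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


x = [1.0, 2.0, -1.0]   # hypothetical input block, T0 = 3
sigma2 = 0.5           # hypothetical noise variance
sigma_x = covariance(x, sigma2)
norm_sq = sum(xi * xi for xi in x)

# (a), (b): T0 - 1 eigenvectors with eigenvalue sigma2, one along x with ||x||^2 + sigma2.
noise_ok = all(is_eigenpair(sigma_x, v, sigma2) for v in noise_eigenvectors(x))
signal_ok = is_eigenpair(sigma_x, x, norm_sq + sigma2)

# (d): the determinant is the product of the eigenvalues, a function of ||x|| only.
det_expected = sigma2 ** (len(x) - 1) * (norm_sq + sigma2)
det_actual = det3(sigma_x)
```

Since all eigenvalues are strictly positive whenever σ² > 0, the same check also illustrates property (c).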

IV. CHARACTERIZING THE CAPACITY-ACHIEVING DISTRIBUTION

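The problem of maximizing mutual information for the channel described in Fig. 1 can be written as

$$\sup_{\mu_X(\cdot)} \; I(Y;X) \quad \text{subject to} \quad \frac{1}{T_0} \int \|x\|^2 \, d\mu_X(x) \le 1, \tag{2}$$

or equivalently as

$$\inf_{\mu_X(\cdot)} \; -I(Y;X) \quad \text{subject to} \quad \frac{1}{T_0} \int \|x\|^2 \, d\mu_X(x) \le 1. \tag{3}$$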

The above optimization is performed over all distributions $\mu_X(\cdot)$. $I(Y;X)$ is the mutual information between $Y$ and $X$ and can be expressed as $I(Y;X) = E_Y[-\log(p_Y(\cdot))] - E_X[h(Y|X = x)]$, where $h(Y|X = x)$ is the differential entropy of $Y$ given a fixed value $x$. The first expectation is performed with respect to the distribution induced on $Y$ by the distribution $\mu_X(\cdot)$, i.e., $p_Y(\cdot) = \int p_{Y|x}(\cdot) \, d\mu_X(x)$.
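To make this expression concrete, the sketch below evaluates I(Y;X) for the simplest case T0 = 1 with a discrete input, where the output density is a finite mixture of zero-mean Gaussians. The mass points, the grid parameters and the function names are illustrative assumptions; the integral for h(Y) is approximated by a Riemann sum, so the result is only a numerical estimate.

```python
import math


def gaussian_pdf(y, var):
    return math.exp(-0.5 * y * y / var) / math.sqrt(2.0 * math.pi * var)


def mutual_information_bits(radii, probs, sigma2, grid_points=4001, span=10.0):
    """Estimate I(Y;X) in bits for T0 = 1 and a discrete distribution on |x|.

    Given |x| = r, the output is N(0, r^2 + sigma2), so p_Y is a Gaussian mixture.
    """
    variances = [r * r + sigma2 for r in radii]
    limit = span * math.sqrt(max(variances))
    dy = 2.0 * limit / (grid_points - 1)

    # h(Y) = -integral of p_Y log p_Y, approximated on a uniform grid.
    h_y = 0.0
    for k in range(grid_points):
        y = -limit + k * dy
        p_y = sum(p * gaussian_pdf(y, v) for p, v in zip(probs, variances))
        if p_y > 0.0:
            h_y -= p_y * math.log(p_y) * dy

    # E_X[h(Y | X = x)], with h(Y | X = x) = 0.5 log(2 pi e (x^2 + sigma2)).
    h_y_given_x = sum(p * 0.5 * math.log(2.0 * math.pi * math.e * v)
                      for p, v in zip(probs, variances))
    return (h_y - h_y_given_x) / math.log(2.0)


# A single mass point carries no information; an on-off input does.
mi_single = mutual_information_bits([1.0], [1.0], sigma2=1.0)
mi_on_off = mutual_information_bits([0.0, 2.0], [0.75, 0.25], sigma2=1.0)
```

The on-off example meets the unit average power constraint with equality (0.25 × 2² = 1), and its mutual information cannot exceed the entropy of the two-point input.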

Many structural properties of the capacity-achieving distribution have been derived in [12]. According to this work, the capacity-achieving distribution is the product of the distribution of an isotropically distributed unit vector in the $T_0$-dimensional space and a distribution on the norm $r = \|x\|$. The rest of this section focuses on finding the optimal distribution associated with the norm $r = \|x\|$.

We observe that the objective function in (3) is convex in $\mu_X(\cdot)$. It can also be shown that the limit point of any sequence $\{\mu_X^{(n)}(\cdot)\}_{n=-\infty}^{\infty}$ of measures lying in $S = \{\mu(\cdot) : \int \|x\|^2 \, d\mu(x) \le T_0\}$ also lies in $S$. Thus the infimum in (3) is attained by an optimal $\mu_X^*(\cdot)$. Necessary and sufficient conditions for the optimality of the solution $\mu_X^*(\cdot)$ can be obtained by writing down the KKT conditions. In particular, the Lagrangian $L(\mu_X(\cdot), \lambda_1, \lambda_2)$ of the above optimization problem can be expressed as

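$$\begin{aligned}
L(\mu_X(\cdot), \lambda_1, \lambda_2) ={}& \int_y \left( \int p_{Y|x}(y) \, d\mu_X(x) \right) \log \left( \int p_{Y|x}(y) \, d\mu_X(x) \right) dy \\
&+ \int_x 0.5 \log\left( (2\pi e)^{T_0} |\Sigma_x| \right) d\mu_X(x) \\
&+ \lambda_1 \left( \int \|x\|^2 \, d\mu_X(x) - T_0 \right) + \lambda_2 \left( \int d\mu_X(x) - 1 \right),
\end{aligned} \tag{4}$$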

where $\lambda_1 \in \mathbb{R}_+$ and $\lambda_2 \in \mathbb{R}$. In the above we used the fact that $h(Y|X = x) = 0.5 \log\left( (2\pi e)^{T_0} |\Sigma_x| \right)$.
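As a worked step, the conditional entropy can be written in closed form. Property (a) of Section III gives the eigenvalues of the covariance matrix, and the determinant is their product, so the expression below follows directly; it is a restatement under those properties rather than a new result.

```latex
|\Sigma_x| = \sigma^{2(T_0 - 1)} \left( \|x\|^2 + \sigma^2 \right)
\quad \Longrightarrow \quad
h(Y \mid X = x) = \frac{T_0}{2} \log(2\pi e)
  + \frac{T_0 - 1}{2} \log \sigma^2
  + \frac{1}{2} \log\left( \|x\|^2 + \sigma^2 \right).
```

In particular, the second integrand of the Lagrangian depends on x only through the norm ‖x‖ and grows logarithmically in it.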

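The first-order necessary condition for the optimal $\mu_X^*(\cdot)$ states that whenever $\mu_X^*(\cdot)$ assigns positive measure to a neighborhood around $x$ (i.e., $\mu_X^*(B_\delta^x) > 0$ for all $\delta < \delta_0$, where $\delta_0 > 0$ and $B_\delta^x \triangleq \{z : \|z - x\| < \delta\}$), the following must hold:

$$\int_y \left( 1 + \log(p_Y^*(y)) \right) p_{Y|x}(y) \, dy + 0.5 \log\left( (2\pi e)^{T_0} |\Sigma_x| \right) + \lambda_1 \|x\|^2 + \lambda_2 \triangleq g(p_Y^*(\cdot), x) = 0, \tag{5}$$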

where $p_Y^*$ is the distribution on $y$ induced by $\mu_X^*(\cdot)$, and $g(\cdot, \cdot)$ is appropriately defined for the above relation to hold. We now state some properties of $g(\cdot, \cdot)$. These may be proved by observing that $\mu_X(\cdot)$ is only a function of $r = \|x\|$ (referred to as $\mu_R(\cdot)$ afterwards), which in turn follows from the results in [5].

Lemma 1. The following hold:

(a) If there exists an $x$ such that for any neighborhood around $x$ (i.e., $B_\delta^x \triangleq \{z : \|z - x\|_2 \le \delta\}$ for any positive $\delta$), $g(p_Y^*(\cdot), x)$ is zero at some point inside the neighborhood, then $p_Y^*(\cdot)$ cannot be a valid probability distribution.

(b) There exists an $R < \infty$ such that $g(p_Y(\cdot), x) > 0$ for all $x$ such that $\|x\|_\infty > R$.

(c) The optimal distribution $\mu_X^*(\cdot)$ assigns a nonzero measure to 0.

Proof Sketch. We present a brief sketch of the proofs below. Proof details are presented in the extended version of this manuscript [11].

(a) This follows from the fact that if such a case exists then, in particular, there exists a linear transformation for which $g(\cdot, x)$ is zero in an interval around $x$ along a transformed coordinate. Thus, by the Identity Theorem from complex analysis [13], $g(\cdot, \cdot)$ is identically zero along that coordinate. Then one can use methods very similar to those used in [3] to argue that the probability density function $p_Y^*(\cdot)$ is nonintegrable (i.e., observing that the relation defines a Laplace transform and that the relation can be inverted uniquely to a nonintegrable distribution, as described in Section IV-A in [3]).

(b) We first observe that, under a power constraint, $\int \log(p_Y^*(y)) \, p_{Y|x}(y) \, dy$ is bounded by a term logarithmic in the norm of $x$ and that $\lambda_2$ is fixed regardless of $x$. The result follows by noting that, since $\lambda_1 > 0$, as $\|x\|_\infty \to \infty$, $\lambda_1 \|x\|^2 - c \log(\|x\|^2 + \sigma^2)$ is unbounded for any finite constant $c$ and hence cannot be equal to zero.

(c) This may be established by contradiction. If all points in the support of the optimal distribution have a norm greater than zero, then, by arguments similar to those in [3], the mutual information is increased by bringing any coordinate closer to zero, while meeting the power constraint.

The following corollary results from using this lemma together with results from real and complex analysis:

Corollary 1. The support of $\mu_R^*(\cdot)$ corresponding to the optimal $\mu_X^*(\cdot)$ is bounded and finite in $r = \|x\|$.

Proof. The invariance under unitary transformations follows from [12]. The support is discrete in $\|x\|$ because otherwise (a) of Lemma 1 would imply a contradiction. The support of $\mu_X(\cdot)$ and $\mu_R(\cdot)$ is bounded by (b) of Lemma 1. The number of points with a nonzero probability mass in $\|x\|$ is finite because otherwise, by the Bolzano-Weierstrass theorem, we would have a limit point and (a) of Lemma 1 would again imply a contradiction.

Note that the arguments used to establish this result are very similar to the arguments presented in [3]. The only difference in the analysis is due to the fact that $T_0 > 1$. To establish the result in this case, we apply a linear transformation (at a limit point of a sequence of points with a nonzero probability measure under $\mu_X^*(\cdot)$) to reduce it to the case considered in [3] and hence establish a contradiction.

Fig. 2: Capacity (in bits per symbol time) vs. SNR for $T_0 = 1$ (dotted) and $T_0 = 2$ (solid).

Based on Corollary 1, we now proceed to compute the capacity and capacity-achieving distribution for our channel model. In Section V-B we point out connections of these results with channel estimation.

V. NUMERICAL RESULTS

In this section we consider an average transmit power of 1 unit and the SNR to be completely specified by the noise variance $\sigma^2$, i.e., $\text{SNR} = 1/\sigma^2$. We study the effects of SNR on the capacity of the block fading channel as well as on the information that can be extracted about the channel for the capacity-achieving distribution. These distributions are specified by a finite number of support points in $\|x\|$ and their corresponding probability masses. To obtain these distributions, the number of points in the capacity-achieving distribution was increased until, within the tolerances, the mutual information did not increase further and the dual variables $\lambda_1$ and $\lambda_2$ stayed the same. The optimizations were performed using the fmin_slsqp routine in SciPy [14]. Multiple random starting points were used to test the numerical stability of the optimization problem and the optimization routine; the capacity was found to be the same regardless of the starting point whenever the optimization completed successfully.
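The procedure described above can be sketched as follows for the simpler case T0 = 1. This is a hedged illustration rather than the code used for the reported results: it calls `scipy.optimize.minimize` with `method="SLSQP"` (the newer interface to the same solver as fmin_slsqp), and the radius cap `R_MAX`, the integration grid, the starting point, the tolerance and all function names are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize

SIGMA2 = 1.0   # noise variance; with unit transmit power, SNR = 1 / SIGMA2
R_MAX = 6.0    # assumed cap on mass-point radii, keeps the grid finite
LIMIT = 10.0 * np.sqrt(R_MAX**2 + SIGMA2)
GRID = np.linspace(-LIMIT, LIMIT, 8001)
DY = GRID[1] - GRID[0]


def mutual_information_bits(radii, probs):
    """Estimate I(Y;X) in bits for T0 = 1 and a discrete distribution on |x|."""
    var = radii**2 + SIGMA2
    pdf = np.exp(-0.5 * GRID[None, :]**2 / var[:, None]) / np.sqrt(2.0 * np.pi * var[:, None])
    p_y = probs @ pdf
    h_y = -np.sum(p_y * np.log(np.maximum(p_y, 1e-300))) * DY
    h_y_given_x = np.sum(probs * 0.5 * np.log(2.0 * np.pi * np.e * var))
    return (h_y - h_y_given_x) / np.log(2.0)


def optimize_input(num_points, seed=0):
    """Maximize the mutual information over num_points radii and their masses."""
    k = num_points
    rng = np.random.default_rng(seed)
    radii0 = np.sort(rng.uniform(0.0, 1.0, k))   # radii <= 1, so power <= 1
    z0 = np.concatenate([radii0, np.full(k, 1.0 / k)])
    constraints = [
        {"type": "eq", "fun": lambda z: np.sum(z[k:]) - 1.0},                # masses sum to 1
        {"type": "ineq", "fun": lambda z: 1.0 - np.sum(z[k:] * z[:k]**2)},   # average power <= 1
    ]
    bounds = [(0.0, R_MAX)] * k + [(0.0, 1.0)] * k
    res = minimize(lambda z: -mutual_information_bits(z[:k], z[k:]), z0,
                   method="SLSQP", bounds=bounds, constraints=constraints)
    # Clean up small constraint violations that the solver may leave behind.
    radii, probs = res.x[:k], np.clip(res.x[k:], 0.0, None)
    probs = probs / probs.sum()
    power = np.sum(probs * radii**2)
    if power > 1.0:
        radii = radii / np.sqrt(power)
    return radii, probs, mutual_information_bits(radii, probs)


def increase_support(max_points=4, tol=1e-4):
    """Add mass points until the mutual information stops improving by more than tol."""
    best = optimize_input(2)
    for k in range(3, max_points + 1):
        candidate = optimize_input(k)
        if candidate[2] <= best[2] + tol:
            break
        best = candidate
    return best


radii, probs, capacity_bits = increase_support()
```

A fuller version would also restart from several random seeds and compare the resulting values, as the text describes, since SLSQP is a local method and the problem is not convex in the mass-point locations.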

We present capacity results in Section V-A and discuss the implications for channel estimation in Section V-B.

A. Capacity results

We first show in Fig. 2 how the capacity of the channel per symbol time changes with increasing SNR. We note that, as expected, coding across time improves performance, with a coding gain on the order of a few dB.

We next present a visualization of the optimal $\mu_X^*$ in the 2D space of all $x$ in Figs. 3 and 4. Note that these figures specify $\mu_X^*$ for both $T_0 = 1$ (i.i.d.) and $T_0 = 2$ (block fading). We observe that there is a significant mass point at $x = 0$ for both $T_0 = 1$ and $T_0 = 2$. We also observe that the capacity-achieving distribution for $T_0 = 1$ is not optimal for $T_0 = 2$. Moreover, in the domain of all $x \in \mathbb{R}^2$, a pilot-based scheme performing channel estimation only would correspond to just a single point with a probability mass of 1, whereas a scheme corresponding to pilot-based channel estimation and subsequent use of Gaussian codebooks would correspond to a distribution supported on a one-dimensional line in the 2D space. The observation that the capacity-achieving scheme in Figs. 3-4 is neither of these demonstrates the suboptimality (from a capacity point of view) of pilot-based channel estimation for maximizing the achievable rates of the block fading channel.

Fig. 3: $\mu_X^*(\cdot)$ for SNR = 0.5 (horizontal axes: $x_0$ and $x_1$; vertical axis: radial probability mass). Blue cylinders represent the product distribution over two time slots based on the $p_x^*$ for $T_0 = 1$, whereas the red cylinders represent the optimal distribution for $T_0 = 2$. The height of the cylinders is proportional to the mass at a particular radius ($T_0 = 2$) or at a particular point ($T_0 = 1$). The blue cylinders are staggered slightly for visibility.

B. Information about channel state

Many existing communication systems separately estimate the channel using pilot symbols and then use the estimate for subsequent data transmission, either assuming that the estimate is perfect or modeling the channel estimation error. In this section, we discuss how the capacity-achieving distribution can inform channel estimation.

Fig. 4: $\mu_X^*(\cdot)$ for SNR = 1 (horizontal axes: $x_0$ and $x_1$; vertical axis: radial probability mass). Same comments as those in the caption of Fig. 3.

Fig. 5: Mutual information $I(H;Y)$ between the channel and the channel output vs. SNR, with pilot symbols (dashed line) and with $\mu_X^*$ (solid line), for $T_0 = 2$.

In Fig. 5 we plot $I(H;Y)$ under the optimal signaling distribution $\mu_X^*$ for different SNRs and compare it with the mutual information $I(H;Y)$ computed under the distribution corresponding to using pilots for channel estimation with $T_0$ symbol times, namely, the distribution that places probability 1 on $x = x_0$, where $x_0$ is a vector whose norm satisfies the power constraints. The latter is just the AWGN capacity expression $0.5 \log_2(1 + T_0 \times \text{SNR})$.
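The pilot expression can be recovered in a few lines. The derivation below assumes the pilot vector meets the power constraint with equality, and it uses the determinant of the covariance matrix implied by property (a) of Section III.

```latex
\begin{aligned}
I(H;Y) &= h(Y) - h(Y \mid H)
        = \tfrac{1}{2} \log_2\!\left( (2\pi e)^{T_0} |\Sigma_{x_0}| \right)
        - \tfrac{1}{2} \log_2\!\left( (2\pi e)^{T_0} \sigma^{2 T_0} \right) \\
       &= \tfrac{1}{2} \log_2 \frac{\sigma^{2(T_0 - 1)} \left( \|x_0\|^2 + \sigma^2 \right)}{\sigma^{2 T_0}}
        = \tfrac{1}{2} \log_2\!\left( 1 + \frac{\|x_0\|^2}{\sigma^2} \right)
        = \tfrac{1}{2} \log_2\!\left( 1 + T_0 \times \mathrm{SNR} \right),
\end{aligned}
\qquad \text{with } \|x_0\|^2 = T_0, \ \mathrm{SNR} = 1/\sigma^2 .
```

Here the first entropy uses the fact that the output is zero-mean Gaussian with covariance $\Sigma_{x_0}$ when the input is fixed, and the second uses the fact that, given the channel, only the noise remains random.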

We observe from the figure that even with the capacity-achieving distribution $\mu_X^*$, the information content about the channel $h$ in the output $y$, measured by the mutual information $I(H;Y)$, is nonzero. This suggests that the capacity-achieving input distribution also allows information about the channel state to be obtained at the decoder even without any pilot symbols. This has implications for both the theory and practice of joint channel estimation and data transmission. The figures show conclusively that, ignoring computational complexity constraints, data transmission at channel capacity does not preclude channel estimation.

VI. CONCLUSIONS

We consider the capacity of a block Rayleigh fading channel without instantaneous channel state information at either the transmitter or the receiver, but with knowledge of the fading statistics. We establish that, similar to known results for the capacity of the Rayleigh and Ricean i.i.d. fading channel, the capacity of the block Rayleigh fading channel is also achieved by an input distribution $\mu_X(\cdot)$ which is only a function of $r = \|x\|$ and in which the measure $\mu_R(\cdot)$ on the norm $r$ is finite and discrete in $r$. We use this result to present numerical estimates of the capacity and the corresponding capacity-achieving distributions and demonstrate numerically that pilot-based channel estimation has strictly lower rates than capacity. In addition, our numerical results show that under the capacity-achieving distribution, the mutual information between the channel state and output is nonzero, thereby suggesting that channel estimation is implicitly performed by the capacity-optimal decoder. Further investigation of this phenomenon and its implications for practical system design are topics for future work.

REFERENCES

[1] M. Mushkin and I. Bar-David, “Capacity and coding for the Gilbert-Elliott channels,” IEEE Trans. Inf. Theory, vol. 35, no. 6, pp. 1277–1290, 1989.

[2] A. J. Goldsmith and P. P. Varaiya, “Capacity, mutual information, and coding for finite-state Markov channels,” IEEE Trans. Inf. Theory, vol. 42, no. 3, pp. 868–886, 1996.

[3] I. C. Abou-Faycal et al., “The capacity of discrete-time memoryless Rayleigh-fading channels,” IEEE Trans. Inf. Theory, vol. 47, no. 4, pp. 1290–1301, 2001.

[4] M. C. Gursoy et al., “The noncoherent Rician fading channel-part I: Structure of the capacity-achieving input,” IEEE Trans. Wireless Commun., vol. 4, no. 5, pp. 2193–2206, 2005.

[5] B. M. Hochwald and T. L. Marzetta, “Unitary space-time modulation for multiple-antenna communications in Rayleigh flat fading,” IEEE Trans. Inf. Theory, vol. 46, no. 2, pp. 543–564, 2000.

[6] L. Zheng and D. N. C. Tse, “Communication on the Grassmann manifold: A geometric approach to the noncoherent multiple-antenna channel,” IEEE Trans. Inf. Theory, vol. 48, no. 2, pp. 359–383, 2002.

[7] S. Shamai and T. L. Marzetta, “Multiuser capacity in block fading with no channel state information,” IEEE Trans. Inf. Theory, vol. 48, no. 4, pp. 938–942, 2002.

[8] S. Murugesan et al., “Optimization of training and scheduling in the non-coherent SIMO multiple access channel,” IEEE Journal on Selected Areas in Communications, vol. 25, no. 7, pp. 1446–1456, 2007.

[9] I. Abou-Faycal and B. M. Hochwald, “Coding requirements for multiple-antenna channels with unknown Rayleigh fading,” Bell Labs Technical Memo, 1999.

[10] N. Jindal and A. Lozano, “Optimum pilot overhead in wireless communication: A unified treatment of continuous and block-fading channels,” arXiv preprint arXiv:0903.1379, 2009.

[11] M. Chowdhury and A. Goldsmith, Capacity of block fading SIMO channels without CSI, To be submitted.

[12] T. L. Marzetta and B. M. Hochwald, “Capacity of a mobile multiple-antenna communication link in Rayleigh flat fading,” IEEE Trans. Inf. Theory, vol. 45, no. 1, pp. 139–157, 1999.

[13] W. Rudin, Real and complex analysis. Tata McGraw-Hill Education, 1987.

[14] E. Jones et al., SciPy: Open source scientific tools for Python, [Online; accessed 2016-05-06], 2001–. [Online]. Available: http://www.scipy.org/.
