
Error Exponent for Multiple-Access Channels: Upper Bounds

Ali Nazari, S. Sandeep Pradhan and Achilleas Anastasopoulos∗,

Department of Electrical Engineering and Computer Science,

University of Michigan, Ann Arbor, MI 48109, USA

email: [email protected], [email protected], [email protected]

November 4, 2011

Abstract

The problem of bounding the reliability function of a multiple-access channel (MAC) is studied. Two new upper bounds on the error exponent of a two-user discrete memoryless (DM) MAC are derived. The first bound (sphere packing) is an upper bound on the average error exponent and is the first bound of this type that explicitly imposes independence of the users' input distributions (conditioned on the time-sharing auxiliary variable); it thus results in a tighter sphere-packing exponent than the tightest known exponent, derived by Haroutunian. The second bound (minimum distance) is an upper bound on the maximal error exponent, not the average. To obtain this bound, an upper bound on the minimum Bhattacharyya distance between codeword pairs is first derived. For a certain large class of two-user DM-MACs, an upper bound on the maximal error exponent is derived as a consequence of the upper bound on the Bhattacharyya distance. Using a conjecture about the structure of the multi-user code, a tighter minimum distance bound on the maximal error exponent is derived and is shown to be tight at zero rates. Finally, the relationship between the average and maximal error probabilities for a two-user DM-MAC is studied. As a result, a method to derive new bounds on the average/maximal error exponent from known bounds on the maximal/average one is obtained.

1 Introduction

In this paper we consider the problem of reliable communication over a multiple-access channel in the discrete memoryless setting. We consider the problem of characterizing the error exponents of this channel. In particular, we provide single-letter information-theoretic characterizations of upper bounds on the error exponents for this class of channels. A schematic of the multiple-access channel is depicted in Figure 1. In this model, two transmitters wish to reliably communicate two independent messages to a single decoder by using the channel n times. The received signal is corrupted both by noise and by mutual interference between the transmitters. Here, M_X and M_Y denote the two transmitted messages, M̂_X and M̂_Y denote the decoded messages, and X^n, Y^n and Z^n denote the transmitted and received sequences, respectively. The capacity region of this channel has been characterized information-theoretically by Ahlswede and Liao [1, 23].

[Figure 1: A schematic of the two-user multiple-access channel.]

∗ This work was supported by NSF grants (CAREER) CCF-0448115 and CCF-0915619. The material in this paper was presented in part at the IEEE International Symposium on Information Theory (ISIT) 2008, Toronto, Canada, and at IEEE ISIT 2009, Seoul, South Korea.

For a point-to-point discrete memoryless channel (DMC), it is well known that the optimal error exponent E(R) at a fixed transmission rate R (also known as the channel reliability function) gives the exponential rate of decay of the decoding error probability as a function of blocklength for the best sequence of codes. Error exponents for point-to-point DMCs have been studied extensively in the literature [4, 11, 15-17, 26, 29, 30]. Lower and upper bounds on the channel reliability function for the DMC are known. A lower bound, known as the random coding exponent E_r(R), was developed by Fano [16] by upper-bounding the average error probability over an ensemble of codes. This bound is loose at low rates. Gallager [18] partly closed this gap from below by introducing a technique to purge poor codewords from a random code. This resulted in a new lower bound, the expurgated bound, which is an improvement over the random coding bound at low rates [6, 10, 20]. Two upper bounds, known as the sphere packing exponent E_sp(R) and the minimum distance exponent E_md(R), were developed by Shannon, Gallager, and Berlekamp [29, 30] and by Blahut [5], respectively. The minimum distance bound is tighter than the sphere packing bound at lower rates. The random coding bound and the sphere packing bound turn out to be equal for code rates greater than a certain value R_c, known as the critical rate, but differ at lower rates. The expurgated bound, E_ex(R), coincides with the minimum distance bound, E_md(R), at R = 0 [10, pg. 189]. In [30], the gap was further narrowed from above by showing that any straight line joining the sphere packing bound and any upper bound on the error exponent is also a valid upper bound. Figure 2 shows all the upper and lower bounds on the reliability function for a DMC. As we can see in this figure, the error exponent lies inside the shaded region for all transmission rates below the critical rate.

[Figure 2: Upper and lower bounds on the error exponent for a DMC: E_r is the random coding bound, E_md is the minimum distance bound, E_ex is the expurgated bound, E_st is the straight-line bound, E_tp is the typical random coding bound, and E_sp is the sphere packing bound.]

Regarding discrete memoryless multiple-access channels (DM-MACs), stronger versions of Ahlswede and Liao's coding theorem [1, 23], giving exponential upper and lower bounds on the error probability, were derived by several authors. Slepian and Wolf [31], Dyachkov [14], Gallager [19], Pokorny [28], Liu [24], Nazari [26] and Haim [21] studied lower bounds on the error exponents. The only known upper bound on the error exponents is the one given by Haroutunian [22], which is rather weak. This bound is positive even in some region outside the capacity region of the channel, and it does not take into account the fact that the two transmitters do not communicate with each other. In [21], a tight characterization of error exponents has been obtained for certain symmetric channels using structured codes. In our earlier work [26], we obtained new lower bounds on the reliability function for multiple-access channels: the typical random coding bound and the partial expurgated bound. These bounds were characterized in the form of max-min optimizations of information divergence functions over the set of probability distributions.

In the present paper, we develop two new upper bounds on the reliability function of DM-MACs: analogs of the sphere packing bound and the minimum distance bound. Towards this goal, we first revisit the point-to-point case and examine the techniques used for obtaining upper bounds on the optimal error exponent.

Sphere Packing Bound: The techniques employed to obtain the sphere packing bound can be broadly classified into three categories. The first is known as the Shannon technique (see [29, 30] and [6, Chapter 10]). Although this yields expressions for the bounds on error exponents that are computationally easier to evaluate than others, the expressions themselves are much more difficult to interpret. The second may be called the Csiszar technique, introduced by Csiszar [8]. This technique uses more intuitive expressions for the error exponents, in terms of the optimization of an objective function involving information quantities over the set of probability distributions. This technique results in a sphere packing bound for the average probability of error. The third is the strong converse technique, introduced by Csiszar and Korner [10]. This technique results in an expression identical to that of the Csiszar technique. The only difference between the two is that the third technique results in a sphere packing bound for the maximal probability of error, not the average. Due to this, the result of the third technique is weaker than that of the second. The third technique is conceptually simpler than the second and is more amenable to extension to multi-user channels. In the point-to-point scenario, by throwing away the worst half of the codewords in any codebook, it can easily be shown that the average and maximal error probability performance are the same at any transmission rate. This is, however, not true for a general multi-user transmission system. Specifically, Dueck [13] showed that the maximal-error capacity regions are in general smaller than the average-error capacity regions for multi-user channels.

In developing our sphere packing bound for multiple-access channels, we first develop a new technique for deriving the sphere packing exponent for the point-to-point channel, based on a new strong converse theorem for codes with a specified dominant composition. Using this converse theorem, we directly derive the well-known sphere packing bound for the average probability of error without the elimination of the worse half of the codewords as the final step. Then we extend this approach to multiple-access channels. The conventional strong converse theorem says that if the rate of the code is high, then the average probability of error cannot be too small, but it says nothing about the size of the subset of the code that contains all good codewords. The new strong converse theorem (Theorem 1) of this paper says that if the rate of the code is high, then the size of the subset of the code containing all good codewords cannot be too large. The resulting new sphere packing bound has the form of a max-min optimization of an information divergence function over the set of probability distributions. It is characterized using an auxiliary random variable along with a Markov chain. The resulting bound explicitly imposes independence of the users' input distributions (conditioned on the time-sharing auxiliary variable), and results in a tighter sphere-packing exponent in comparison to Haroutunian's [22].

Minimum Distance Bound: We also derive an upper bound on the maximal error exponent for the MAC by studying the Bhattacharyya distance distribution of any multi-user code. We use a two-step approach. In the first step, we derive an upper bound on the error exponent by establishing a link between the minimum Bhattacharyya distance and the maximal probability of decoding error; the upper bound on the Bhattacharyya distance can then be used to infer a lower bound on the probability of decoding error. This approach can be thought of as the straightforward extension of that of [6] from point-to-point channels to multiple-access channels. It results in a characterization in the form of an optimization problem where the objective function is the average Bhattacharyya distance between the channel input letters. In the second step, we provide a new multi-user tightening of this bound using an auxiliary random variable along with a Markov chain. Moreover, we express the bound in the form of a max-min optimization of an information divergence function over the set of probability distributions.

At the zero rate pair, this upper bound has a structure similar to that of the partial expurgated bound studied in [26]. However, the two bounds are not necessarily equal. By using a conjecture about the structure of the multi-user code, we derive a tighter minimum distance bound on the maximal error exponent. Finally, in this paper, we study the relationship between the average and maximal error probabilities for a two-user DM-MAC and develop a method to obtain new bounds on the average/maximal error exponent by using known bounds on the maximal/average one. It is observed that at the zero rate pair, the bounds on the average error exponent are indeed valid bounds on the maximal error exponent and vice versa. As a result, the comparison between the conjectured minimum distance bound and the expurgated bound [26] is indeed a valid comparison. By comparing these bounds, it is shown that the expurgated and the conjectured minimum distance bounds are equal at rate zero.

The paper is organized as follows. Preliminaries and definitions are introduced in Section 2. The sphere packing bound on the average probability of error for the point-to-point DMC and the DM-MAC is studied in Section 3. In Section 4, we study the minimum distance bound on the maximal error exponent. In Section 4.2, we study some properties of the dominant type of an arbitrary code; in particular, we give a condition for a joint type to be a dominant type of an arbitrary code at a certain rate. The other main result of the paper, a minimum distance bound on the maximal error exponent for the MAC, is obtained in Section 4.3. In Section 4.4, by using a conjecture about the structure of the multi-user code, a tighter minimum distance bound is derived and is shown to be tight at zero rate. In Section 5, by using a known upper bound on the maximal error exponent function, we derive an upper bound on the average error exponent function and vice versa. The proofs of some of these results are given in the Appendix.

2 Preliminaries

We will follow the notation of [10]. For any finite alphabet X, let P(X) denote the set of all probability distributions on X. For any sequence x = (x_1, ..., x_n) ∈ X^n, let P_x denote its type. Let T_P denote the type class of type P. Let P_n(X) denote the set of all types on X. Let T_V denote a V-shell, and let D(V‖W|P) denote the conditional information divergence. We consider discrete memoryless channels without feedback.
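For concreteness, the following minimal Python sketch (an illustration of ours; the toy sequence and alphabet are assumptions) computes the type of a sequence and the standard polynomial bound |P_n(X)| ≤ (n+1)^{|X|} on the number of types.

```python
from collections import Counter

def type_of(x, alphabet):
    """Empirical distribution (type) P_x of the sequence x."""
    n = len(x)
    counts = Counter(x)
    return {a: counts[a] / n for a in alphabet}

def num_types_bound(n, alphabet_size):
    """Polynomial bound |P_n(X)| <= (n+1)^{|X|} on the number of types."""
    return (n + 1) ** alphabet_size

x = [0, 1, 1, 0, 1, 1, 0, 1]               # toy sequence over X = {0, 1}
print(type_of(x, alphabet=[0, 1]))         # {0: 0.375, 1: 0.625}
print(num_types_bound(len(x), 2))          # 81
```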

Definition 1. A discrete memoryless channel (DMC) is a triple {X, Y, W} consisting of a finite input alphabet X, a finite output alphabet Y and a stochastic matrix W : X → Y. The channel transition probability for n-sequences is given by

W^n(y|x) ≜ ∏_{i=1}^{n} W(y_i|x_i),

where x ≜ (x_1, ..., x_n) ∈ X^n and y ≜ (y_1, ..., y_n) ∈ Y^n. We will identify a DMC with its stochastic matrix W when X and Y are clear from the context. Typically, we will fix the input and output alphabets for the entire discussion. Let W(Y|X) denote the set of all DMCs with input alphabet X and output alphabet Y. An (n, M) code is a set C = {(x_i, D_i) : 1 ≤ i ≤ M} with (a) x_i ∈ X^n, D_i ⊂ Y^n and (b) D_i ∩ D_{i'} = ∅ for i ≠ i'.

The transmission rate R for this code is defined as R = (1/n) log M. When this code C is used on a DMC W, and the i-th codeword is transmitted, the conditional probability of error is given by

e_i(C, W) ≜ W^n(D_i^c | x_i).   (1)

The average probability of error for this code under W is defined as

e(C, W) ≜ (1/M) ∑_{i=1}^{M} e_i(C, W),   (2)

and the maximal probability of error is defined as

e_m(C, W) ≜ max_i W^n(D_i^c | x_i).   (3)

An (n, M, λ) code for a DMC W : X → Y is an (n, M) code C with e_m(C, W) ≤ λ.
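Definitions (1)-(3) can be evaluated by brute force for small examples. The following Python sketch uses an assumed toy setup of our choosing (a binary symmetric channel with crossover probability 0.1, a length-3 repetition code, and majority-vote decision regions), so the numbers are illustrative only.

```python
import itertools
import numpy as np

W = np.array([[0.9, 0.1],
              [0.1, 0.9]])                  # W[x, y] = W(y|x), a BSC(0.1)

def Wn(y, x):
    """Memoryless extension W^n(y|x) = prod_i W(y_i|x_i)."""
    return float(np.prod([W[xi, yi] for xi, yi in zip(x, y)]))

codewords = [(0, 0, 0), (1, 1, 1)]          # an (n, M) = (3, 2) repetition code

def decode(y):
    """Majority vote; D_i is the set of outputs decoded to message i."""
    return 0 if sum(y) <= 1 else 1

# e_i(C, W) = W^n(D_i^c | x_i): mass of outputs decoded to another message.
errs = [sum(Wn(y, x) for y in itertools.product(range(2), repeat=3)
            if decode(y) != i)
        for i, x in enumerate(codewords)]
print(sum(errs) / len(errs), max(errs))     # e(C, W) and e_m(C, W): both 0.028
```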

For the DMC W : X → Y, the average and maximal error exponents at rate R are defined as

E*_av(R) ≜ lim sup_{n→∞} max_{C∈C(n,R)} −(1/n) log e(C, W),   (4)

E*_m(R) ≜ lim sup_{n→∞} max_{C∈C(n,R)} −(1/n) log e_m(C, W),   (5)

where C(n, R) is the set of all codes of length n and rate R.

Definition 2. A two-user DM-MAC is a quadruple (X, Y, Z, W) consisting of two finite input alphabets X and Y, a finite output alphabet Z, and a stochastic matrix W : X × Y → Z. The channel transition probability for n-sequences is given by

W^n(z|x, y) ≜ ∏_{i=1}^{n} W(z_i|x_i, y_i),   (6)

where x ≜ (x_1, ..., x_n) ∈ X^n, y ≜ (y_1, ..., y_n) ∈ Y^n, and z ≜ (z_1, ..., z_n) ∈ Z^n. As in the point-to-point case, here too we will identify the MAC with its stochastic matrix when the alphabets are clear from the context. Let W(Z|X, Y) denote the set of all MACs with input alphabets X, Y and output alphabet Z.

An (n, M_X, M_Y) multi-user code is a set C = {(x_i, y_j, D_ij) : 1 ≤ i ≤ M_X, 1 ≤ j ≤ M_Y} with (a) x_i ∈ X^n, y_j ∈ Y^n, D_ij ⊂ Z^n and (b) D_ij ∩ D_{i'j'} = ∅ for (i, j) ≠ (i', j').

For this code, the transmission rate pair (R_X, R_Y) is defined as (R_X, R_Y) ≜ ((1/n) log M_X, (1/n) log M_Y). When this code is used over a MAC W, and the codeword pair (x_i, y_j) is transmitted, the conditional probability of error of the two-user code C is given by

e_ij(C, W) ≜ W^n(D_ij^c | x_i, y_j).   (7)

The average and maximal probabilities of error for the two-user code C under W are defined as

e(C, W) ≜ (1/(M_X M_Y)) ∑_{i=1}^{M_X} ∑_{j=1}^{M_Y} e_ij(C, W),   (8)

e_m(C, W) ≜ max_{i,j} e_ij(C, W).   (9)

An (n, M_X, M_Y, λ) code C for the MAC W is an (n, M_X, M_Y) code with

e_m(C, W) ≤ λ.   (10)

Finally, the average and maximal error exponents at rate pair (R_X, R_Y) are defined as

E*_av(R_X, R_Y) ≜ lim sup_{n→∞} max_{C∈C(n,R_X,R_Y)} −(1/n) log e(C, W),   (11)

E*_m(R_X, R_Y) ≜ lim sup_{n→∞} max_{C∈C(n,R_X,R_Y)} −(1/n) log e_m(C, W),   (12)

where C(n, R_X, R_Y) is the set of all codes of length n and rate pair (R_X, R_Y).

3 A Sphere Packing Bound on the Average Error Exponent

Toward obtaining a new sphere packing bound for the MAC, we will go back to the point-to-point DMC and obtain a new strong converse theorem.

3.1 Point-to-point channel

We will contrast our approach with the Csiszar-Korner technique mentioned in the introduction. In the latter, the following arguments are used. First, a strong converse theorem for the capacity of the DMC with average probability of error is obtained. Fix a DMC given by W and a small constant δ > 0. Consider an arbitrary constant composition code with composition P, large blocklength n and rate R. For any DMC V with I(P, V) ≤ R, by the strong converse theorem there must exist a codeword x, with corresponding decoder decision region S, such that the probability of decoding error under V given that this codeword is transmitted cannot be exponentially small, i.e., V^n(S^c|x) ≥ 1 − δ. This then implies that under the actual channel W the probability of decoding error for that codeword cannot be too small either, i.e., W^n(S^c|x) ≥ ((1+2δ)/2) exp{−n D(V‖W|P)}. This gives a lower bound on the maximal probability of error of the code over the channel W. An extension of this to the MAC, along with a strong converse, can be used to provide a sphere packing bound on the maximal error exponent.

Instead, we will develop a new strong converse theorem and a new approach to derive the sphere packing bound on the average error exponent for a DMC. Fix a small constant λ > 0 and the channel W. Intuitively, we will show that for any code, if a constant composition subset (say with composition P) containing good codewords, i.e., those having probability of error at most (1+λ)/2 under a given DMC V, is a large fraction of the code, then the rate of the code cannot be larger than the corresponding mutual information, i.e., R ≤ I(P, V). This can also be interpreted as follows. If the rate of the code is larger than I(P, V) for some composition P, then the size of the corresponding constant composition subset of good codewords must be small. This provides a lower bound on the size of the subset containing bad codewords, which can in turn be used to provide a lower bound on the average probability of error. These ideas are formalized in the following. Fix an input alphabet X and output alphabet Y.

Definition 3. For any discrete memoryless channel V ∈ W(Y|X), any distribution P ∈ P(X), any 0 ≤ λ < 1, and any (n, M) code C, define the set of "good" codewords as

E_V(C, P, λ) ≜ {x_i ∈ C : V^n(D_i|x_i) ≥ (1−λ)/2, x_i ∈ T_P}.   (13)

In the following we give our strong converse theorem.

Theorem 1. Consider an arbitrary (n, M) code C. For every P* ∈ P_n(X), every 0 ≤ λ < 1, and every DMC V ∈ W(Y|X), the condition |E_V(C, P*, λ)| ≥ (1/(n+1)^{|X|}) (1 − 2λ/(1+λ)) M implies

(1/n) log M ≤ I(P*, V) + ε_n(λ, |X|),   (14)

where ε_n > 0 and ε_n → 0 as n → ∞.

Proof. The proof is provided in the Appendix.

Using this strong converse theorem, we derive the sphere packing bound on the average error exponent for a DMC as follows.

Fact 1. (Sphere Packing Bound) For any R ≥ 0 and δ > 0 sufficiently small, and any discrete memoryless channel W : X → Z, every (n, M) code C with n sufficiently large and with

(1/n) log M ≥ R + δ   (15)

has average probability of error

e(C, W) ≥ e^{−n[E_sp(R,W)(1+δ)+δ]},   (16)

where

E_sp(R, W) ≜ max_{P∈P(X)} min_{V : I(P,V) ≤ R} D(V‖W|P).   (17)
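The max-min in (17) lends itself to direct numerical exploration. The following Python sketch is a coarse brute-force evaluation under assumptions that are ours, not the paper's: binary alphabets, W taken to be a BSC(0.1), and grid searches in place of exact optimization.

```python
import itertools
import numpy as np

def mutual_info(P, V):
    """I(P, V) in nats for input distribution P and channel matrix V[x, y]."""
    Q = P @ V                                   # induced output distribution
    return sum(P[x] * V[x, y] * np.log(V[x, y] / Q[y])
               for x in range(len(P)) for y in range(V.shape[1])
               if P[x] > 0 and V[x, y] > 0)

def cond_div(V, W, P):
    """Conditional divergence D(V||W|P) = sum_x P(x) D(V(.|x)||W(.|x))."""
    return sum(P[x] * V[x, y] * np.log(V[x, y] / W[x, y])
               for x in range(len(P)) for y in range(V.shape[1])
               if V[x, y] > 0)

def Esp(R, W, grid=40):
    """Brute-force E_sp(R, W) = max_P min_{V : I(P,V) <= R} D(V||W|P)."""
    ps = np.linspace(1e-6, 1 - 1e-6, grid)
    best = 0.0
    for p in ps:                                # input distributions P = (p, 1-p)
        P = np.array([p, 1 - p])
        inner = np.inf
        for a, b in itertools.product(ps, ps):  # candidate binary channels V
            V = np.array([[1 - a, a], [b, 1 - b]])
            if mutual_info(P, V) <= R:
                inner = min(inner, cond_div(V, W, P))
        best = max(best, inner)
    return best

W = np.array([[0.9, 0.1], [0.1, 0.9]])          # assumed BSC(0.1)
print(Esp(0.2, W))                               # exponent at R = 0.2 nats
```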

Proof. Consider an (n, M) code C. Since the size of the code is an exponential function of the blocklength n, and the number of types is a polynomial function of it, there must exist a type P* ∈ P_n(X) such that

|C ∩ T_{P*}| ≥ (1/(n+1)^{|X|}) M.   (18)

Consider an arbitrary discrete memoryless channel V : X → Z such that this channel cannot support the transmission rate R, i.e., R ≥ I(P*, V). Choose λ < 1 satisfying

(1+λ)/2 > 1 − δ/2.   (19)

Since (1/n) log M ≥ R + δ and R ≥ I(P*, V), for all sufficiently large n we have

(1/n) log M ≥ I(P*, V) + δ > I(P*, V) + ε_n,

and hence it can be concluded from Theorem 1 (contrapositive version) that the size of the subset containing good codewords (with respect to the channel V) cannot be too large, i.e.,

|E_V(C, P*, λ)| < (1/(n+1)^{|X|}) (1 − 2λ/(1+λ)) M.   (20)

By combining (18) and (20), for all sufficiently large n, the size of the subset containing bad codewords (those with probability of decoding error at least (1+λ)/2 on the channel V) cannot be too small. In other words,

|D_V(C, P*, λ)| ≥ (1/(n+1)^{|X|}) (2λ/(1+λ)) M,   (21)

where D_V(C, P*, λ) is defined as

D_V(C, P*, λ) ≜ (C ∩ T_{P*}) \ E_V(C, P*, λ) = {x_i ∈ C ∩ T_{P*} : V^n(D_i^c|x_i) > (1+λ)/2}.   (22)

Next we relate the probabilities of decoding error of these codewords when used on V and when used on W. In other words, if the probability of decoding error with respect to V is not too small, then that with respect to W cannot be too small either. By combining (19) and (22), and using the same method as Csiszar in [10, pp. 167], we have, for the probability of decoding error on the given DMC W,

W^n(D_i^c|x_i) ≥ exp{−n [D(V‖W|P*) + h(1 − δ/2)] / (1 − δ/2)} ≥ (1/2) exp{−n D(V‖W|P*)(1+δ)}  for all x_i ∈ D_V(C, P*, λ),   (23)

for small enough δ satisfying h(1 − δ/2) < 1 − δ/2, where h(·) denotes the binary entropy function. Now we are ready to provide a lower bound on the average probability of decoding error when the code C is used over the channel W. The average error probability of the code C over the channel W can be written as

e(C, W) = (1/M) ∑_{i=1}^{M} W^n(D_i^c|x_i)
 ≥ (1/M) ∑_{x_i ∈ D_V(C,P*,λ)} (1/2) exp{−n D(V‖W|P*)(1+δ)}
 ≥ (1/(n+1)^{|X|}) (λ/(1+λ)) exp{−n D(V‖W|P*)(1+δ)}.   (24)

Since inequality (24) holds for all V : X → Z satisfying I(P*, V) ≤ R, it can be concluded that

e(C, W) ≥ max_{V : I(P*,V) ≤ R} exp{−n [D(V‖W|P*)(1+δ) + δ]}
 = exp{−n [min_{V : I(P*,V) ≤ R} D(V‖W|P*)(1+δ) + δ]},   (25)

for sufficiently large n. As mentioned earlier, P* is a type of the code. We can provide a further lower bound on the average error probability as follows:

e(C, W) ≥ min_{P* ∈ P_n(X)} exp{−n [min_{V : I(P*,V) ≤ R} D(V‖W|P*)(1+δ) + δ]}
 ≥ exp{−n [max_{P ∈ P(X)} min_{V : I(P,V) ≤ R} D(V‖W|P)(1+δ) + δ]}.   (26)

This completes the proof.

In the following we extend this idea to the case of multiple-access channels.

3.2 Multiple-access channels

The main result of this section is a lower (sphere packing) bound on the average error probability for a DM-MAC. This bound is characterized using an auxiliary random variable. The key objective is to look at an arbitrary code and distill from it an auxiliary random variable. To state the new bound, we need an intermediate result that has the form of a strong converse for the MAC. We state this result here and relegate the proof to the appendix. The proof uses the wringing technique due to Dueck [13] and Ahlswede [3]. Fix the input alphabets X, Y and output alphabet Z.

Definition 4. For any DM-MAC V ∈ W(Z|X, Y), any joint distribution P ∈ P(X × Y), any 0 ≤ λ < 1, and any (n, M_X, M_Y) code C, define the set of "good" codeword pairs as

E_V(C, P, λ) ≜ {(x_i, y_j) ∈ C : V^n(D_ij|x_i, y_j) ≥ (1−λ)/2, (x_i, y_j) ∈ T_P}.   (27)

The following theorem says that if the subset of C that contains good codewords when used over a channel V is a large fraction of the size of C, then the rates of the code must belong to the corresponding constrained capacity region, i.e., the rates cannot be too large. This is a strong converse theorem for multiple-access channels.

Theorem 2. Consider an arbitrary (n, M_X, M_Y) code C. For every P*_n ∈ P_n(X × Y), every 0 ≤ λ < 1, and every DM-MAC V ∈ W(Z|X, Y), the condition |E_V(C, P*_n, λ)| ≥ (1/(n+1)^{|X||Y|}) (1 − 2λ/(1+λ)) M_X M_Y implies

((1/n) log M_X, (1/n) log M_Y) ∈ C_n(P*_n, V),   (28)

where C_n(P, V) is defined as the closure of the set

⋃_{P_UXY ∈ S_n} {(R_X, R_Y) : R_X < I(X ∧ Z|YU) + ε_n, R_Y < I(Y ∧ Z|XU) + ε_n, R_X + R_Y < I(XY ∧ Z|U) + ε_n},   (29)

where S_n is the set of all distributions defined on U × X × Y such that (a) U is a finite set and the random variables (U, X, Y, Z) have the joint distribution P_UXY(u, x, y) V(z|x, y) for all (u, x, y, z) ∈ U × X × Y × Z, (b) I(X ∧ Y|U) ≤ ε_n, and (c) ∑_u P_UXY(u, x, y) = P(x, y) for all (x, y) ∈ X × Y. Here ε_n → 0 as n → ∞.

Proof. The proof is provided in the Appendix.

We further define C(P, V) as the limiting version of the sets C_n(P, V), obtained by replacing ε_n with 0. Note that C_n(P, V) is closed and convex for every n, and so is C(P, V). Since the information quantities are continuous functions of the distributions, it follows that (see [13, p. 193])

C(P, V) = ⋂_n C_n(P, V).

Note that C(P, V) is the input-distribution-constrained capacity region of the MAC V, and is characterized using an auxiliary random variable along with a Markov chain. The Markov chain captures the structure of the communication problem at hand: the encoders do not communicate with each other. Due to the convexity of C(P, V), we can restrict the size of the alphabet of the auxiliary random variable, |U|, to 4. Let C̄(P, V) denote the closure of the complement of C(P, V). Using the above theorem, we obtain the following sphere packing bound on the average error exponent for multiple-access channels.

Theorem 3. (Sphere Packing Bound) For any R_X, R_Y ≥ 0 and δ > 0 sufficiently small, and any DM-MAC W : X × Y → Z, every (n, M_X, M_Y) code C with

(1/n) log M_X ≥ R_X + δ,   (30a)
(1/n) log M_Y ≥ R_Y + δ,   (30b)

has average probability of error

e(C, W) ≥ e^{−n[E_sp(R_X,R_Y,W)(1+δ)+δ]},   (31)

where

E_sp(R_X, R_Y, W) ≜ max_{P_XY ∈ P(X×Y)} min_{V : (R_X,R_Y) ∈ C̄(P_XY,V)} D(V‖W|P_XY).   (32)

Note that the sphere packing bound has the form of a max-min optimization problem. The maximization is over all distributions on the channel input, and the minimization is over all channels that cannot support the rate pair (R_X, R_Y).

Proof. Since C is an (n, M_X, M_Y) multi-user code, there must exist a joint type P* ∈ P_n(X × Y) such that

|C ∩ T_{P*}| ≥ (1/(n+1)^{|X||Y|}) M_X M_Y.   (33)

Consider an arbitrary DM-MAC V such that (R_X, R_Y) ∈ C̄(P*, V), i.e., either (R_X, R_Y) ∉ C(P*, V) or (R_X, R_Y) belongs to the boundary of C(P*, V). Since (1/n) log M_X ≥ R_X + δ and (1/n) log M_Y ≥ R_Y + δ, we have, for all sufficiently large n,

((1/n) log M_X, (1/n) log M_Y) ∉ C_n(P*, V).   (34)

Choose 0 ≤ λ < 1 such that

(1+λ)/2 > 1 − δ/2.   (35)

Hence, using (34) and Theorem 2 (contrapositive version), it can be concluded that the size of the subset that contains good codewords cannot be too large, i.e.,

|E_V(C, P*, λ)| < (1/(n+1)^{|X||Y|}) (1 − 2λ/(1+λ)) M_X M_Y.   (36)

By combining (33) and (36), it can be concluded that the size of the subset that contains bad codewords cannot be too small, i.e.,

|D_V(C, P*, λ)| ≥ (1/(n+1)^{|X||Y|}) (2λ/(1+λ)) M_X M_Y,   (37)

where D_V(C, P*, λ) is defined as

D_V(C, P*, λ) ≜ (C ∩ T_{P*}) \ E_V(C, P*, λ) = {(x_i, y_j) ∈ C ∩ T_{P*} : V^n(D_ij^c|x_i, y_j) > (1+λ)/2}.   (38)

By combining (35) and (38), and using the same method as Csiszar in [10, pp. 167], we have

W^n(D_ij^c|x_i, y_j) ≥ exp{−n [D(V‖W|P*) + h(1 − δ/2)] / (1 − δ/2)} ≥ (1/2) exp{−n D(V‖W|P*)(1+δ)}  for all (x_i, y_j) ∈ D_V(C, P*, λ),   (39)

for small enough δ satisfying h(1 − δ/2) < 1 − δ/2. The average error probability of the code C over the channel W can be written as

e(C, W) = (1/(M_X M_Y)) ∑_{i=1}^{M_X} ∑_{j=1}^{M_Y} W^n(D_ij^c|x_i, y_j)
 ≥ (1/(M_X M_Y)) ∑_{(x_i,y_j) ∈ D_V(C,P*,λ)} (1/2) exp{−n D(V‖W|P*)(1+δ)}
 ≥ (1/(n+1)^{|X||Y|}) (λ/(1+λ)) exp{−n D(V‖W|P*)(1+δ)}.   (40)

Since inequality (40) holds for all V : X × Y → Z satisfying (R_X, R_Y) ∈ C̄(P*, V), it can be concluded that

e(C, W) ≥ max_{V : (R_X,R_Y) ∈ C̄(P*,V)} exp{−n [D(V‖W|P*)(1+δ) + δ]}
 = exp{−n [min_{V : (R_X,R_Y) ∈ C̄(P*,V)} D(V‖W|P*)(1+δ) + δ]}
 ≥ min_{P_XY ∈ P(X×Y)} exp{−n [min_{V : (R_X,R_Y) ∈ C̄(P_XY,V)} D(V‖W|P_XY)(1+δ) + δ]},   (41)

for sufficiently large n, which completes the proof.

4 Minimum Distance Bound on the Maximal Error Exponent for MAC

In this section we consider a new upper bound on the error exponent for the MAC that is tighter than the sphere packing bound at low rates. This bound is valid only for the maximal error exponent. For this we will follow the approach of Blahut [5, 6]. As in the previous section, here too our objective is to look at an arbitrary code and distill from it an auxiliary random variable. First we need some definitions about distance functions.

4.1 Preliminaries

Definition 5. For a specified channel W : X × Y → Z, the Bhattacharyya distance between the channel input letter pairs (x, y) and (x̄, ȳ) is defined by

d_B((x, y), (x̄, ȳ)) ≜ −log( ∑_{z∈Z} √(W(z|x, y) W(z|x̄, ȳ)) ).   (42)

In this paper, we assume d_B((x, y), (x̄, ȳ)) ≠ ∞ for all (x, y) and (x̄, ȳ). A channel with this property is called an indivisible channel. An indivisible channel for which the matrix [A_{(i,j),(k,l)}] = [2^{−s d_B((i,j),(k,l))}] is nonnegative definite for all s > 0 is called a nonnegative-definite channel.
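Both channel classes are easy to probe numerically for a concrete W. The following Python sketch, with an assumed toy MAC of our choosing, checks indivisibility and spot-checks nonnegative definiteness of [2^{−s d_B}] at a few values of s (a sanity check, not a proof over all s > 0).

```python
import itertools
import numpy as np

# Assumed toy MAC on binary alphabets: W[x, y, z] = W(z|x, y).
W = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.4, 0.6], [0.1, 0.9]]])

pairs = list(itertools.product(range(2), range(2)))   # input letter pairs

def dB(p, q):
    """Letter-pair Bhattacharyya distance, as in (42)."""
    return -np.log(np.sum(np.sqrt(W[p] * W[q])))

D = np.array([[dB(p, q) for q in pairs] for p in pairs])
print("indivisible:", bool(np.all(np.isfinite(D))))   # finite d_B everywhere

# Spot-check nonnegative definiteness of [2^{-s d_B}] at a few s > 0.
for s in (0.1, 0.5, 1.0, 2.0):
    A = 2.0 ** (-s * D)
    print(s, bool(np.min(np.linalg.eigvalsh(A)) >= -1e-9))
```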

For a block channel W^n, the normalized Bhattacharyya distance between two channel input block pairs (x, y) and (x̄, ȳ) is given by

d_B((x, y), (x̄, ȳ)) = −(1/n) log( ∑_{z∈Z^n} √(W^n(z|x, y) W^n(z|x̄, ȳ)) ).   (43)

If W^n is a memoryless channel, it can easily be shown that the Bhattacharyya distance between two pairs of codewords (x, y) and (x̄, ȳ), with joint empirical distribution P_{XYX̄Ȳ}, is

d_B((x, y), (x̄, ȳ)) = ∑_{x,x̄∈X} ∑_{y,ȳ∈Y} P_{XYX̄Ȳ}(x, y, x̄, ȳ) d_B((x, y), (x̄, ȳ)).   (44)

As can be seen from (44), for a fixed channel the Bhattacharyya distance between two pairs of codewords depends only on their joint composition. The minimum Bhattacharyya distance of a code C is defined as

d_B(C) ≜ min_{(x,y),(x̄,ȳ)∈C, (x,y)≠(x̄,ȳ)} d_B((x, y), (x̄, ȳ)).   (45)
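As a concrete illustration of (42)-(44), the following sketch computes the per-letter and normalized block Bhattacharyya distances for an assumed toy MAC (binary inputs, output X ⊕ Y observed through a BSC(0.1)); the channel and codewords are our assumptions, not the paper's.

```python
import numpy as np

# Assumed toy MAC: output is X XOR Y observed through a BSC(0.1).
W = np.zeros((2, 2, 2))                     # W[x, y, z] = W(z|x, y)
for x in range(2):
    for y in range(2):
        W[x, y, x ^ y] = 0.9
        W[x, y, 1 - (x ^ y)] = 0.1

def dB_letter(p, q):
    """Per-letter Bhattacharyya distance (42) between input pairs p, q."""
    return -np.log(np.sum(np.sqrt(W[p] * W[q])))

def dB_block(x, y, xb, yb):
    """Normalized block distance via (44): average of letter distances."""
    n = len(x)
    return sum(dB_letter((x[t], y[t]), (xb[t], yb[t])) for t in range(n)) / n

x,  y  = (0, 0, 1), (0, 1, 1)
xb, yb = (1, 0, 1), (0, 0, 1)
print(dB_block(x, y, xb, yb))               # about 0.34 nats per symbol
```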

For a fixed rate pair (R_X, R_Y) and blocklength n, the best possible minimum distance is defined as

d*_B(R_X, R_Y, n) ≜ max_C d_B(C),   (46)

where the maximum is over all codes with parameters (n, 2^{nR_X}, 2^{nR_Y}). Finally, we define

d*_B(R_X, R_Y) ≜ lim sup_{n→∞} d*_B(R_X, R_Y, n).   (47)

Given a DM-MAC W, define the function α : P(X × Y × Z) → R_+ as

α(P_XY V_{Z|XY}) = D(V_{Z|XY}‖W|P_XY) + H(V_{Z|XY}|P_XY).

Such a function was introduced in [9], and was used to express the expurgated bound in the form of a max-min optimization problem involving information divergence functions.

4.2 Dominant Type of a Code

Consider an arbitrary code. We partition the codewords into classes according to their composition. The error event is then partitioned into its intersections with these type classes, and the error probability is obtained by adding up the probabilities of these intersections. Since the number of type classes grows polynomially in the blocklength, the error probability has the same exponential asymptotics as the largest among the probabilities of these intersections. In other words, one of the types plays a crucial role in determining the performance of the code. To obtain an upper bound on the reliability function of the channel, we need to study the performance of the best code. It is observed that for the best code, the composition which dominates the error event must be a dominant type of the code. Otherwise, we could eliminate all codewords with this particular composition, and the resulting code, which has essentially the same transmission rate, would outperform the best code, contradicting the assumption about the best code. Therefore, to obtain an upper bound on the reliability function of a DMC, we need to study the compositions that can be the dominant type of the best code. In particular, we need to answer the following question: "At a fixed transmission rate R, which composition is the dominant type of the best code?", or, more generally, "At a fixed transmission rate R, which compositions can potentially be the dominant type of an arbitrary code?". For a single-user code the answer is straightforward: as long as the number of sequences of type P is at least the number of codewords in the code, P can be a dominant type of the code. Therefore, P can be a dominant type of a code of rate R if and only if H(P) ≥ R.
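The single-user criterion H(P) ≥ R is straightforward to check; the sketch below (with assumed toy numbers) compares the entropy of a type with the code rate and, for intuition, the exact type-class size with the codebook size.

```python
from math import comb, log2

def can_dominate(counts, R):
    """Type P (given as counts summing to n) can be a dominant type of a
    rate-R code (bits/symbol) iff H(P) >= R."""
    n = sum(counts)
    H = -sum((k / n) * log2(k / n) for k in counts if k > 0)
    return H >= R, H

counts = [3, 5]                             # type P = (3/8, 5/8), n = 8
print(can_dominate(counts, R=0.5))          # (True, 0.954...)
print(can_dominate(counts, R=1.0))          # (False, 0.954...)
# For intuition: exact |T_P| vs codebook size 2^{nR} at R = 0.5.
print(comb(8, 3), 2 ** 4)                   # 56 vs 16
```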

Now consider an (n, M_X, M_Y) code C. Suppose all message pairs are equiprobable and the sources send data independently. Then, for at least one joint type, the number of pairs of sequences in the multi-user code sharing that particular type must be an exponential function of n, with exponent almost equal to the rate of the multi-user code C. We call the subcode consisting of these pairs of sequences a dominant subcode of C. Hence, for any good multi-user code, there must exist at least one joint type which dominates the codebook. Therefore, to obtain an upper bound on the reliability function of a DM-MAC, we need to characterize the possible dominant joint compositions of multi-user codes at a given transmission rate pair. We have the following corollary to Theorem 2.

Corollary 1. Consider an arbitrary (n, M_X, M_Y) code C. For every P*_n ∈ P_n(X × Y), the condition |C ∩ T_{P*_n}| ≥ (1/(n+1)^{|X||Y|}) M_X M_Y implies

(1/n) log M_X ≤ H(X|U) + ε'_n,  (1/n) log M_Y ≤ H(Y|U) + ε'_n,

for some P_UXY defined on U × X × Y such that (a) U is a finite set, (b) I(X ∧ Y|U) ≤ ε'_n and (c) ∑_u P_UXY(u, x, y) = P*_n(x, y) for all (x, y) ∈ X × Y, where ε'_n → 0 as n → ∞.

Proof. The proof follows from Theorem 2 by choosing λ = 0 and the noiseless orthogonal multiple-access channel with Z = X × Y and Z = (X, Y).

Now consider a particular joint type P*_n. By the corollary, if there does not exist any P_{U|XY} satisfying the constraints above, then the joint type P*_n cannot be a dominant type of a good multi-user code with rate pair (R_X, R_Y).

4.3 Main Result

In this section, we present an upper (minimum distance) bound on the maximal error exponent for a DM-MAC. The idea behind the derivation of this bound is the connection between the minimum distance of the code and the maximal probability of decoding error. Intuitively, the closer the codewords are, the more confusion occurs in decoding. We will follow a two-stage procedure.

Stage 1: We extend the technique used in [5] for point-to-point channels to multiple-access channels. To derive an upper bound on the error exponent at rate pair (R_X, R_Y), we need to show that for any code with rate pair (R_X, R_Y) there exist at least two pairs of codewords which are very close to each other in terms of Bhattacharyya distance. In other words, we need to find an upper bound on the minimum distance of codes with parameters (n, 2^{nR_X}, 2^{nR_Y}). Consider an arbitrary multi-user code C with parameters (n, 2^{nR_X}, 2^{nR_Y}) and with dominant joint type P_XY, i.e., |C ∩ T_{P_XY}| ≥ (1/(n+1)^{|X||Y|}) M_X M_Y. We concentrate on the dominant subset corresponding to P_XY, i.e., all codeword pairs sharing P_XY as their joint type. We study the minimum distance of this subset, and in particular we prove that there exist at least two pairs of codewords within a certain Bhattacharyya distance of each other. As a result, we find an upper bound on the minimum distance of this subset of the code. Clearly, this bound is still a valid upper bound on the minimum distance of the original multi-user code.

To obtain this upper bound, we show that there exists a spherical collection about a pair of channel input words, not necessarily a codeword pair, with exponentially many codeword pairs on it. Intuitively, since exponentially many codeword pairs are located on this spherical collection, all of these pairs cannot be too far away from each other. We study the distance structure of this collection and find the average distance within this subset. It can be concluded that there must exist at least two pairs of codewords with distance at most equal to this average. Then, by relating the maximal error probability of the code to its minimum distance, we derive a lower bound on the maximal error probability of any multi-user code satisfying certain rate constraints.

Stage 2: Here we will use Corollary 1 to restrict the joint types of the codes that can enter into the max-min optimization problem that characterizes the bound. Further, we will express the bound in the form of a max-min optimization problem involving information divergence functions. The main result of this section is given by the following theorem.

Theorem 4. For any indivisible nonnegative-definite channel W, the maximal error reliability function E*_m(R_X, R_Y) satisfies

E*_m(R_X, R_Y) ≤ E_U(R_X, R_Y, W),   (48)

where E_U(R_X, R_Y, W) is defined as

E_U(R_X, R_Y, W) ≜ max_{P_UXY} min_{β=X,Y,XY} E^β_U(R_X, R_Y, W, P_XYU).   (49)

The maximum is taken over all P_UXY ∈ P(U × X × Y) such that X − U − Y, R_X ≤ H(X|U) and R_Y ≤ H(Y|U). The functions E^β_U(R_X, R_Y, W, P_XYU) are defined as follows:

E^X_U(R_X, R_Y, W, P_XYU) ≜ min_{V_{XX̄X̃YZ} ∈ V^U_X} D(V_{Z|XY}‖W|P_XY) + I(X̄ ∧ Z|XY),
E^Y_U(R_X, R_Y, W, P_XYU) ≜ min_{V_{XYȲỸZ} ∈ V^U_Y} D(V_{Z|XY}‖W|P_XY) + I(Ȳ ∧ Z|XY),
E^XY_U(R_X, R_Y, W, P_XYU) ≜ min_{V_{XYX̄ȲX̃ỸZ} ∈ V^U_XY} D(V_{Z|XY}‖W|P_XY) + I(X̄Ȳ ∧ Z|XY),   (50)

where

V^U_X ≜ {V_{XX̄X̃YZ} : V_{XY} = V_{X̄Y} = V_{X̃Y} = P_XY, X̃ − X̄Y − X, V_{X̃|X̄Y} = V_{X|X̄Y}, I(X ∧ X̄|Y) ≤ R_X, α(V_{X̄YZ}) < α(V_{XYZ})},   (51)

V^U_Y ≜ {V_{XYȲỸZ} : V_{XY} = V_{XȲ} = V_{XỸ} = P_XY, Ỹ − XȲ − Y, V_{Ỹ|XȲ} = V_{Y|XȲ}, I(Y ∧ Ȳ|X) ≤ R_Y, α(V_{XȲZ}) < α(V_{XYZ})},   (52)

V^U_XY ≜ {V_{XYX̄ȲX̃ỸZ} : V_{XY} = V_{X̄Ȳ} = V_{X̃Ỹ} = P_XY, X̃Ỹ − X̄Ȳ − XY, V_{X̃Ỹ|X̄Ȳ} = V_{XY|X̄Ȳ}, I(XY ∧ X̄Ȳ) ≤ R_X + R_Y, α(V_{X̄ȲZ}) < α(V_{XYZ})}.   (53)

Proof. The proof is provided in the Appendix.

4.4 A Conjectured Tighter Upper Bound

Conjecture 1. For all sequences of nearly complete subgraphs of a particular type graph T_{P_XY}, the rates (R_X, R_Y) of the subgraph satisfy

R_X ≤ H(X|U),  R_Y ≤ H(Y|U)   (54)

for some P_{U|XY} such that X − U − Y. Moreover, there exists u ∈ T_{P_U} such that the intersection of the fully connected subgraph with T_{P_{XY|U}}(u) has rate pair (R_X, R_Y).

Based on this conjecture, and by following an argument similar to the proof of Theorem 4, we can conclude the following result:

Theorem 5. For any indivisible nonnegative-definite channel W, the maximal error reliability function E*_m(R_X, R_Y) satisfies

E*_m(R_X, R_Y) ≤ E_C(R_X, R_Y, W),   (55)

where E_C(R_X, R_Y, W) is defined as

max_{P_UXY} min_{β=X,Y,XY} E^β_C(R_X, R_Y, W, P_XYU).   (56)

The maximum is taken over all P_UXY ∈ P(U × X × Y) such that X − U − Y, R_X ≤ H(X|U) and R_Y ≤ H(Y|U). The functions E^β_C(R_X, R_Y, W, P_XYU) are defined as follows:

E^X_C(R_X, R_Y, W, P_XYU) ≜ min_{V_{UXX̄X̃YZ} ∈ V^C_X} D(V_{Z|UXY}‖W|V_{UXY}) + I(X̄ ∧ Z|UXY),
E^Y_C(R_X, R_Y, W, P_XYU) ≜ min_{V_{UXYȲỸZ} ∈ V^C_Y} D(V_{Z|UXY}‖W|V_{UXY}) + I(Ȳ ∧ Z|UXY),
E^XY_C(R_X, R_Y, W, P_XYU) ≜ min_{V_{UXYX̄ȲX̃ỸZ} ∈ V^C_XY} D(V_{Z|UXY}‖W|V_{UXY}) + I(X̄Ȳ ∧ Z|UXY),   (57)

where

V^C_X ≜ {V_{UXX̄X̃YZ} : V_{UXY} = V_{UX̄Y} = V_{UX̃Y} = P_UXY, X̃ − UX̄Y − X, V_{X̃|X̄YU} = V_{X|X̄YU}, I(X ∧ X̄|YU) = I(X ∧ X̃|YU) ≤ R_X, α(V_{UX̄YZ}) < α(V_{UXYZ})},   (58)

V^C_Y ≜ {V_{UXYȲỸZ} : V_{UXY} = V_{UXȲ} = V_{UXỸ} = P_UXY, Ỹ − UXȲ − Y, V_{Ỹ|XȲU} = V_{Y|XȲU}, I(Y ∧ Ȳ|UX) = I(Y ∧ Ỹ|UX) ≤ R_Y, α(V_{UXȲZ}) < α(V_{UXYZ})},   (59)

V^C_XY ≜ {V_{UXYX̄ȲX̃ỸZ} : V_{UXY} = V_{UX̄Ȳ} = V_{UX̃Ỹ} = P_UXY, X̃Ỹ − UX̄Ȳ − XY, V_{X̃Ỹ|UX̄Ȳ} = V_{XY|UX̄Ȳ}, I(XY ∧ X̄Ȳ|U) = I(XY ∧ X̃Ỹ|U) ≤ R_X + R_Y, α(V_{UX̄ȲZ}) < α(V_{UXYZ})}.   (60)

Let us focus on the case where both codebooks have rate zero, R_X = R_Y = 0. Any V_{UXX̄X̃Y} ∈ V^C_X satisfies the following:

X̄ − UY − X,  X̃ − UY − X;   (61)

therefore, any V_{UXX̄X̃YZ} ∈ V^C_X can be written as

P_{X|U} P_{X̄|U} P_{X̃|U} P_{Y|U} P_U V_{Z|UXYX̄X̃}.   (62)

Similarly, any V_{UXYȲỸZ} ∈ V^C_Y can be written as

P_{X|U} P_{Y|U} P_{Ȳ|U} P_{Ỹ|U} P_U V_{Z|UXYȲỸ},   (63)

and any V_{UXYX̄ȲX̃ỸZ} ∈ V^C_XY can be written as

P_{X|U} P_{Y|U} P_{X̄|U} P_{Ȳ|U} P_{X̃|U} P_{Ỹ|U} P_U V_{Z|UXYX̄ȲX̃Ỹ}.   (64)

Therefore, E^X_C, E^Y_C, and E^XY_C are equal to

E^X_C(0, 0, P_XYU) = min_{V_{Z|UXYX̄} P_{X|U} P_{X̄|U} P_{Y|U} P_U : α(V_{UX̄YZ}) < α(V_{UXYZ})} D(V_{Z|UXY}‖W|V_{UXY}) + I(X̄ ∧ Z|UXY),   (65)

E^Y_C(0, 0, P_XYU) = min_{V_{Z|UXYȲ} P_{X|U} P_{Y|U} P_{Ȳ|U} P_U : α(V_{UXȲZ}) < α(V_{UXYZ})} D(V_{Z|UXY}‖W|V_{UXY}) + I(Ȳ ∧ Z|UXY),   (66)

E^XY_C(0, 0, P_XYU) = min_{V_{Z|UXYX̄Ȳ} P_{X|U} P_{Y|U} P_{X̄|U} P_{Ȳ|U} P_U : α(V_{UX̄ȲZ}) < α(V_{UXYZ})} D(V_{Z|UXY}‖W|V_{UXY}) + I(X̄Ȳ ∧ Z|UXY).   (67)

Corollary 2. At rate pair R_X = R_Y = 0,

E_C(0, 0, P_XYU) = E_L(0, 0, P_XYU),   (68)

where E_L(R_X, R_Y, P_XYU) is the lower bound on the error exponent derived in [25, 26].

5 The Maximal Error Exponent vs. The Average Error Exponent

In point-to-point communication systems, one can show that a lower/upper bound on the maximal error probability of the best code is also a lower/upper bound on the average error probability of such a code. However, in multi-user communications this is not the case. It has been shown that for multi-user channels, in general, the maximal-error capacity region is smaller than the average-error capacity region [12]. The minimum distance bound we obtained in the previous section is a valid bound only for the maximal error exponent and not the average. On the other hand, all the known lower bounds [24, 26, 27, 28] are valid only for the average error exponent and not the maximal. As a result, unlike in the point-to-point case, comparing these upper and lower bounds does not give us any information about how good these bounds are. In the following, we show an approach to derive a lower/upper bound on the average/maximal error exponent by using a known lower/upper bound on the maximal/average error exponent.

Theorem 6. Fix any DM-MAC W : X × Y → Z and R_X ≥ 0, R_Y ≥ 0. The following inequalities hold:

E*_av(R_X, R_Y) − R ≤ E*_m(R_X, R_Y) ≤ E*_av(R_X, R_Y) ≤ E*_m(R_X, R_Y) + R,   (69)

where R = min{R_X, R_Y}.

Proof. The proof is provided in the Appendix.

Corollary 3. If min{R_X, R_Y} = 0, i.e., R_X = 0 or R_Y = 0, then

E*_m(R_X, R_Y) = E*_av(R_X, R_Y).   (70)

Corollary 4. Fix any DM-MAC W : X × Y → Z and R_X ≥ 0, R_Y ≥ 0. Assume that the maximal reliability function is bounded as follows:

E^L_m(R_X, R_Y) ≤ E*_m(R_X, R_Y) ≤ E^U_m(R_X, R_Y).   (71)

Then the average reliability function can be bounded by

E^L_m(R_X, R_Y) ≤ E*_av(R_X, R_Y) ≤ E^U_m(R_X, R_Y) + R,   (72)

where R = min{R_X, R_Y}. Similarly, if the average reliability function is bounded as follows:

E^L_av(R_X, R_Y) ≤ E*_av(R_X, R_Y) ≤ E^U_av(R_X, R_Y),   (73)

then the maximal reliability function satisfies

E^L_av(R_X, R_Y) − R ≤ E*_m(R_X, R_Y) ≤ E^U_av(R_X, R_Y).   (74)

6 Conclusions

We have obtained a new strong converse theorem for the capacity of a DMC. This new theorem enables a concise and intuitive derivation of the sphere packing bound on the average error exponent of the DMC. Moreover, the approach is very general and can be applied to the DM-MAC, resulting in a sphere packing bound on the average error exponent that truly captures the structure of the problem, namely that the two transmitters do not communicate with each other. The sphere packing bound is characterized using an auxiliary random variable. We then provide a low-rate improvement of this bound for the maximal error exponent by deriving an upper bound on the minimum distance between any pair of codewords. This is a form of minimum distance bound.

7 Appendix

7.1 Proof of Theorem 1

Our approach makes use of Augustin's [4] strong converse theorem for one-way channels, which is stated in the following:

Lemma 1. [4] For an (n, M, λ) code {(x_i, D_i) : 1 ≤ i ≤ M} used on an arbitrary collection of DMCs (W_t)_{t=1}^{n}, we have

log M < ∑_{t=1}^{n} I(X_t ∧ Z_t) + (3/(1−λ)) |X| n^{1/2},   (75)

where the distribution of the random variables is determined by the Fano distribution on the codewords.

Proof. The proof is given in [4].

Consider any type P* ∈ P_n(X), channel V ∈ W(Y|X), and 0 ≤ λ < 1. Suppose that |E_V(C, P*, λ)| ≥ (1/(n+1)^{|X|}) (1 − 2λ/(1+λ)) M. The code {(x_i, D_i) : x_i ∈ E_V(C, P*, λ)} is an (n, |E_V(C, P*, λ)|, (1+λ)/2) code. Let us define λ' ≜ (1+λ)/2. Therefore, by the result of Lemma 1, we conclude that

log|E_V(C, P*, λ)| < ∑_{t=1}^{n} I(X_t ∧ Z_t) + (3/(1−λ')) |X| √n,   (76)

where the distribution of the random variables is determined by the Fano distribution on the codewords. By using the lower bound on the size of E_V(C, P*, λ), it can be concluded that

(1/n) log M ≤ (1/n) ∑_{t=1}^{n} I(X_t ∧ Z_t) + (3|X|)/((1−λ')√n) + (|X| log(n+1))/n + (1/n) log((1+λ)/(1−λ)).   (77)

The last three terms on the right-hand side of (77) approach zero for sufficiently large n. Let us focus on the first term. In the following, we prove that

(1/n) ∑_{t=1}^{n} I(X_t ∧ Z_t) ≤ I(P*, V).   (78)

First note that

(1/n) ∑_{t=1}^{n} I(X_t ∧ Z_t)
 = (1/n) ∑_{t=1}^{n} ∑_{x∈X} ∑_{z∈Z} P(X_t = x) P(Z_t = z|X_t = x) log( P(Z_t = z|X_t = x) / P(Z_t = z) )
 = (1/n) ∑_{t=1}^{n} ∑_{x∈X} ∑_{z∈Z} P(X_t = x) V(z|x) log( V(z|x) / P(Z_t = z) )
 = ∑_{x∈X} ∑_{z∈Z} V(z|x) log(V(z|x)) (1/n) ∑_{t=1}^{n} P(X_t = x) − (1/n) ∑_{t=1}^{n} ∑_{x∈X} ∑_{z∈Z} P(X_t = x) V(z|x) log(P(Z_t = z))
 = ∑_{x∈X} ∑_{z∈Z} P*(x) V(z|x) log(V(z|x)) − (1/n) ∑_{t=1}^{n} ∑_{x∈X} ∑_{z∈Z} P(X_t = x) V(z|x) log(P(Z_t = z)).   (79)

The last equality holds because E_V(C, P*, λ) is a constant composition code with composition P*. The second term in (79) can be written as

(1/n) ∑_{t=1}^{n} ∑_{x∈X} ∑_{z∈Z} P(X_t = x) V(z|x) log(P(Z_t = z))
 = ∑_{z∈Z} (1/n) ∑_{t=1}^{n} [ log( ∑_{x'∈X} P(X_t = x') V(z|x') ) ∑_{x∈X} P(X_t = x) V(z|x) ].   (80)

On the right-hand side of (80), the summands are of the form u log(u), which is a convex function of u. Thus,

(1/n) ∑_{t=1}^{n} ∑_{x∈X} ∑_{z∈Z} P(X_t = x) V(z|x) log(P(Z_t = z))
 ≥ ∑_{z∈Z} [ log( (1/n) ∑_{t=1}^{n} ∑_{x'∈X} P(X_t = x') V(z|x') ) (1/n) ∑_{t=1}^{n} ∑_{x∈X} P(X_t = x) V(z|x) ]
 = ∑_{z∈Z} [ log( ∑_{x'∈X} (1/n) ∑_{t=1}^{n} P(X_t = x') V(z|x') ) ∑_{x∈X} (1/n) ∑_{t=1}^{n} P(X_t = x) V(z|x) ]
 = ∑_{z∈Z} [ log( ∑_{x'∈X} P*(x') V(z|x') ) ∑_{x∈X} P*(x) V(z|x) ].   (81)

Finally, by combining (79) and (81), it can be concluded that

(1/n) ∑_{t=1}^{n} I(X_t ∧ Z_t) ≤ ∑_{x∈X} ∑_{z∈Z} P*(x) V(z|x) log(V(z|x)) − ∑_{x∈X} ∑_{z∈Z} P*(x) V(z|x) log( ∑_{x'∈X} P*(x') V(z|x') ) = I(P*, V).   (82)

Hence

(1/n) log M ≤ I(P*, V) + (6|X|)/((1−λ)√n) + (|X|/n) log(n+1) + (1/n) log((1+λ)/(1−λ)),

which completes the proof.

7.2 Proof of Theorem 2

The basic idea of the proof is the wringing technique, which was used for the first time by Dueck [13] and Ahlswede [2]. Consider any P*_n ∈ P_n(X × Y), 0 ≤ λ < 1 and V ∈ W(Z|X, Y). Suppose that |E_V(C, P*_n, λ)| ≥ (1/(n+1)^{|X||Y|}) (1 − 2λ/(1+λ)) M_X M_Y. Let us define A ≜ {(i, j) : V^n(D_ij|x_i, y_j) ≥ (1−λ)/2, (x_i, y_j) ∈ T_{P*_n}}. Note that

|A| ≥ (1/(n+1)^{|X||Y|}) (1 − 2λ/(1+λ)) M_X M_Y.   (83)

Define

C(i) ≜ {(i, j) : (i, j) ∈ A, 1 ≤ j ≤ M_Y},   (84a)
B(j) ≜ {(i, j) : (i, j) ∈ A, 1 ≤ i ≤ M_X}.   (84b)

Consider the subcode {(x_i, y_j, D_ij) : (i, j) ∈ A} and define random variables X^n, Y^n by

P((X^n, Y^n) = (x_i, y_j)) = 1/|A|  if (i, j) ∈ A.   (85)

Lemma 2. For the random variables X^n, Y^n defined in (85), the mutual information satisfies the following inequality:

I(X^n ∧ Y^n) ≤ −log(1 − 2λ/(1+λ)) + |X||Y| log(n+1).   (86)

Proof. This is a generalization of the proof by Dueck in [13]. Note that

H(Y^n|X^n) = H(X^n, Y^n) − H(X^n) = log|A| − H(X^n) ≥ log|A| − log(M_X).   (87)

By (83), we conclude that

H(Y^n|X^n) ≥ log M_Y + log(1 − 2λ/(1+λ)) − |X||Y| log(n+1).   (88)

Finally,

I(X^n ∧ Y^n) = H(Y^n) − H(Y^n|X^n) ≤ log M_Y − H(Y^n|X^n) ≤ −log(1 − 2λ/(1+λ)) + |X||Y| log(n+1),   (89)

which concludes the proof.

Lemma 3. [3] Let X^n, Y^n be random variables with values in X^n, Y^n respectively, and assume that

I(X^n ∧ Y^n) ≤ σ.   (90)

Then, for any 0 < δ < σ, there exist t_1, t_2, ..., t_k ∈ {1, ..., n}, where 0 ≤ k < 2σ/δ, such that for some x̄_{t_1}, ȳ_{t_1}, x̄_{t_2}, ȳ_{t_2}, ..., x̄_{t_k}, ȳ_{t_k}:

I(X_t ∧ Y_t | X_{t_1} = x̄_{t_1}, Y_{t_1} = ȳ_{t_1}, ..., X_{t_k} = x̄_{t_k}, Y_{t_k} = ȳ_{t_k}) ≤ δ  for t = 1, 2, ..., n,   (91)

and

P(X_{t_1} = x̄_{t_1}, Y_{t_1} = ȳ_{t_1}, ..., X_{t_k} = x̄_{t_k}, Y_{t_k} = ȳ_{t_k}) ≥ ( δ / (|X||Y|(2σ − δ)) )^k.   (92)

Proof. The proof is provided in [3].

Let us apply this lemma to the pair (X^n, Y^n), with

σ = |X||Y| log(n+1) + log((1+λ)/(1−λ)),  and  δ = n^{−1/2}.   (93)

This implies that there exist t_1, t_2, ..., t_k ∈ {1, ..., n}, where 0 ≤ k < 2σ/δ, and x̄_{t_1}, ȳ_{t_1}, x̄_{t_2}, ȳ_{t_2}, ..., x̄_{t_k}, ȳ_{t_k} that satisfy the two properties (91) and (92) in the statement of Lemma 3. Now consider the subcode {(x_i, y_j, D_ij) : (i, j) ∈ Ā}, where

Ā ≜ {(i, j) ∈ A : (x_i)_{t_l} = x̄_{t_l}, (y_j)_{t_l} = ȳ_{t_l}, 1 ≤ l ≤ k},   (94)

and define

C̄(i) ≜ {(i, j) : (i, j) ∈ Ā, 1 ≤ j ≤ M_Y},   (95a)
B̄(j) ≜ {(i, j) : (i, j) ∈ Ā, 1 ≤ i ≤ M_X}.   (95b)

Lemma 4. The subcode {(x_i, y_j, D_ij) : (i, j) ∈ Ā} has maximal error probability at most (1+λ)/2 and, for σ and δ given in (93), satisfies

|Ā| ≥ ( δ / (|X||Y|(2σ − δ)) )^k |A|   (96)

and

I_Ā(X_t ∧ Y_t) ≤ δ  for 1 ≤ t ≤ n,   (97)

where X^n = (X_1, ..., X_n), Y^n = (Y_1, ..., Y_n) are distributed according to the Fano distribution of the subcode {(x_i, y_j, D_ij) : (i, j) ∈ Ā}.

Proof. Since Ā ⊂ A, the maximal probability of error for the corresponding code is at most (1+λ)/2. The second part of Lemma 3 immediately yields (96). On the other hand,

P_A(X_t = x, Y_t = y | x̄_{t_1}, ȳ_{t_1}, x̄_{t_2}, ȳ_{t_2}, ..., x̄_{t_k}, ȳ_{t_k})
 = P_A(X_t = x, Y_t = y, x̄_{t_1}, ȳ_{t_1}, ..., x̄_{t_k}, ȳ_{t_k}) / P_A(x̄_{t_1}, ȳ_{t_1}, ..., x̄_{t_k}, ȳ_{t_k})
 = N_A(X_t = x, Y_t = y, x̄_{t_1}, ȳ_{t_1}, ..., x̄_{t_k}, ȳ_{t_k}) / N_A(x̄_{t_1}, ȳ_{t_1}, ..., x̄_{t_k}, ȳ_{t_k})
 = N_Ā(X_t = x, Y_t = y) / |Ā|
 = P_Ā(X_t = x, Y_t = y),   (98)

where N denotes the number of index pairs in the indicated set with the indicated coordinate values. Therefore, by the first part of Lemma 3, we conclude that

I_Ā(X_t ∧ Y_t) ≤ δ  for 1 ≤ t ≤ n.   (99)

Now consider the following arguments. Recall that $\bar A$ is a constant composition subcode of $C$, and
\[
\mathbb P[(X^n,Y^n)=(\mathbf{x},\mathbf{y})]=
\begin{cases}
\frac{1}{|\bar A|} & \text{if }(\mathbf{x},\mathbf{y})\in\bar A\\
0 & \text{otherwise.}
\end{cases} \tag{100}
\]

For any fixed $j$, consider the $(n,|\bar B(j)|)$ code $\{(\mathbf{x}_i,D_{ij}) : (i,j)\in\bar B(j)\}$. For channel $V$, any codeword in this code together with $\mathbf{y}_j$ has probability of error at most $\frac{1+\lambda}{2}$. Let us define $\lambda'\triangleq\frac{1+\lambda}{2}$. It follows from Lemma 1 that
\[
\log|\bar B(j)|\le\sum_{t=1}^n I(X_t\wedge Z_t\mid Y_t=(\mathbf{y}_j)_t)+\frac{3}{1-\lambda'}|\mathcal X|n^{1/2}. \tag{101}
\]
Similarly, it can be shown that
\[
\log|\bar C(i)|\le\sum_{t=1}^n I(Y_t\wedge Z_t\mid X_t=(\mathbf{x}_i)_t)+\frac{3}{1-\lambda'}|\mathcal Y|n^{1/2}, \tag{102}
\]
\[
\log|\bar A|\le\sum_{t=1}^n I(X_tY_t\wedge Z_t)+\frac{3}{1-\lambda'}|\mathcal X||\mathcal Y|n^{1/2}. \tag{103}
\]

Since $P(Y_t=y)=\frac{1}{|\bar A|}\sum_{(i,j)\in\bar A}1\{(\mathbf{y}_j)_t=y\}$, we have
\[
\frac{1}{|\bar A|}\sum_{(i,j)\in\bar A}\log|\bar B(j)|
\le\frac{1}{|\bar A|}\sum_{(i,j)\in\bar A}\sum_{t=1}^n I(X_t\wedge Z_t\mid Y_t=(\mathbf{y}_j)_t)+\frac{3}{1-\lambda'}|\mathcal X|n^{1/2}
\]
\[
=\sum_{t=1}^n\sum_y I(X_t\wedge Z_t\mid Y_t=y)\,P(Y_t=y)+\frac{3}{1-\lambda'}|\mathcal X|n^{1/2}
=\sum_{t=1}^n I(X_t\wedge Z_t\mid Y_t)+\frac{3}{1-\lambda'}|\mathcal X|n^{1/2}. \tag{104}
\]

The left-hand side of (104) can be bounded from below as follows:
\[
\frac{1}{|\bar A|}\sum_{(i,j)\in\bar A}\log|\bar B(j)|
=\frac{1}{|\bar A|}\sum_j|\bar B(j)|\log|\bar B(j)|
\ge\frac{1}{|\bar A|}\sum_{j:|\bar B(j)|\ge B^*}|\bar B(j)|\log|\bar B(j)|
\ge\frac{\log(B^*)}{|\bar A|}\sum_{j:|\bar B(j)|\ge B^*}|\bar B(j)|
\ge\frac{|\bar A|-M_YB^*}{|\bar A|}\log(B^*), \tag{105}
\]
where $\lambda^*$ and $B^*$ are defined as follows:
\[
\lambda^*\triangleq\frac{2\lambda}{1+\lambda}, \tag{106}
\]
\[
B^*\triangleq\frac{1-\lambda^*}{n}\,\frac{M_X}{(n+1)^{|\mathcal X||\mathcal Y|}}\Bigl(\frac{\delta}{|\mathcal X||\mathcal Y|(2\sigma-\delta)}\Bigr)^k. \tag{107}
\]


Moreover, by using (83) and the result of Lemma 4, it can be concluded that
\[
M_YB^*\le\frac{1}{n}|\bar A|. \tag{108}
\]

Therefore,
\[
\frac{1}{|\bar A|}\sum_{(i,j)\in\bar A}\log|\bar B(j)|
\ge\frac{|\bar A|-\frac1n|\bar A|}{|\bar A|}\log(B^*)
=\Bigl(1-\frac1n\Bigr)\log\biggl(\frac{1-\lambda^*}{n}\,\frac{M_X}{(n+1)^{|\mathcal X||\mathcal Y|}}\Bigl(\frac{\delta}{|\mathcal X||\mathcal Y|(2\sigma-\delta)}\Bigr)^k\biggr). \tag{109}
\]

By combining (104) and (109), we get
\[
\log M_X\le\Bigl(1+\frac2n\Bigr)\Bigl(\sum_{t=1}^n I(X_t\wedge Z_t\mid Y_t)+\frac{3}{1-\lambda'}|\mathcal X|n^{1/2}\Bigr)+\log\Bigl(\frac{n}{1-\lambda^*}\Bigr)+|\mathcal X||\mathcal Y|\log(n+1)+k\log\Bigl(\frac{|\mathcal X||\mathcal Y|2\sigma}{\delta}\Bigr).
\]
Noting that $\lambda'=\frac{1+\lambda}{2}$, $\lambda^*=\frac{2\lambda}{1+\lambda}$, and the values of $\sigma$ and $\delta$ given in (93), we have the following relation:
\[
\frac1n\log M_X\le\frac1n\sum_{t=1}^n I(X_t\wedge Z_t\mid Y_t)+\varepsilon_n, \tag{110}
\]

where
\[
\varepsilon_n=\frac{2|\mathcal Z|}{n}+\frac{12|\mathcal X||\mathcal Y|\,n^{-\frac12}}{1-\lambda}+\frac1n\log\Bigl(\frac{n(1+\lambda)}{1-\lambda}\Bigr)+\frac{|\mathcal X||\mathcal Y|\log(n+1)}{n}+\frac{2\sigma}{n\delta}\log\Bigl(\frac{|\mathcal X||\mathcal Y|2\sigma}{\delta}\Bigr), \tag{111}
\]
and
\[
\frac{2\sigma}{\delta}=2\sqrt n\Bigl(-\log\Bigl(1-\frac{2\lambda}{1+\lambda}\Bigr)+|\mathcal X||\mathcal Y|\log(n+1)\Bigr). \tag{112}
\]

Analogously,
\[
\frac1n\log M_Y\le\frac1n\sum_{t=1}^n I(Y_t\wedge Z_t\mid X_t)+\varepsilon_n. \tag{113}
\]
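Every term of $\varepsilon_n$ in (111) vanishes as $n\to\infty$; the dominant contribution is $\frac{2\sigma}{n\delta}\log(\frac{|\mathcal X||\mathcal Y|2\sigma}{\delta})$, which by (112) is of order $n^{-1/2}\log^2 n$. The following quick numerical check is our own sketch; the alphabet sizes and $\lambda$ are arbitrary choices.

    import numpy as np

    # eps_n of (111) for |X| = |Y| = 2, |Z| = 3, lambda = 0.5 (arbitrary)
    aX, aY, aZ, lam = 2, 2, 3, 0.5

    def eps_n(n):
        sigma = aX * aY * np.log(n + 1) + np.log((1 + lam) / (1 - lam))
        delta = n ** -0.5
        return (2 * aZ / n
                + 12 * aX * aY / ((1 - lam) * np.sqrt(n))
                + np.log(n * (1 + lam) / (1 - lam)) / n
                + aX * aY * np.log(n + 1) / n
                + (2 * sigma / (n * delta)) * np.log(aX * aY * 2 * sigma / delta))

    for n in [10 ** 2, 10 ** 4, 10 ** 6, 10 ** 8]:
        print(n, eps_n(n))       # decays roughly like log(n)**2 / sqrt(n)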

To find an upper bound on $\log(M_XM_Y)$, we first find a lower bound on $\log|\bar A|$. By Lemma 4,
\[
\log|\bar A|\ge\log|A|+k\log\Bigl(\frac{\delta}{|\mathcal X||\mathcal Y|(2\sigma-\delta)}\Bigr)
\ge\log|A|+k\log\Bigl(\frac{\delta}{|\mathcal X||\mathcal Y|2\sigma}\Bigr)
\]
\[
\ge\log(M_XM_Y)-|\mathcal X||\mathcal Y|\log(n+1)+\log\Bigl(1-\frac{2\lambda}{1+\lambda}\Bigr)+k\log\Bigl(\frac{\delta}{|\mathcal X||\mathcal Y|2\sigma}\Bigr). \tag{114}
\]
Therefore,
\[
\log(M_XM_Y)\le\log|\bar A|+\log\Bigl(\frac{1+\lambda}{1-\lambda}\Bigr)+|\mathcal X||\mathcal Y|\log(n+1)+\frac{2\sigma}{\delta}\log\Bigl(\frac{|\mathcal X||\mathcal Y|2\sigma}{\delta}\Bigr). \tag{115}
\]

Using (103), we have
\[
\log(M_XM_Y)\le\sum_{t=1}^n I(X_tY_t\wedge Z_t)+\frac{6|\mathcal X||\mathcal Y|n^{\frac12}}{1-\lambda}+\log\Bigl(\frac{1+\lambda}{1-\lambda}\Bigr)+|\mathcal X||\mathcal Y|\log(n+1)+\frac{2\sigma}{\delta}\log\Bigl(\frac{|\mathcal X||\mathcal Y|2\sigma}{\delta}\Bigr). \tag{116}
\]


Hence
\[
\frac1n\log(M_XM_Y)\le\frac1n\sum_{t=1}^n I(X_tY_t\wedge Z_t)+\varepsilon_n. \tag{117}
\]

Collecting (110), (113), (117) and (100), (97), (111), and using the fact that $\delta=n^{-\frac12}$, we have
\[
\frac1n\log M_X\le\frac1n\sum_{t=1}^n I(X_t\wedge Z_t\mid Y_t)+\varepsilon_n, \tag{118}
\]
\[
\frac1n\log M_Y\le\frac1n\sum_{t=1}^n I(Y_t\wedge Z_t\mid X_t)+\varepsilon_n, \tag{119}
\]
\[
\frac1n\log(M_XM_Y)\le\frac1n\sum_{t=1}^n I(X_tY_t\wedge Z_t)+\varepsilon_n, \tag{120}
\]
\[
\frac1n\sum_{t=1}^n I(X_t\wedge Y_t)\le\varepsilon_n, \tag{121}
\]
where for all $(x,y)\in\mathcal X\times\mathcal Y$,
\[
\frac1n\sum_{t=1}^n\mathbb P(X_t=x,Y_t=y)=P^*_n(x,y). \tag{122}
\]

These expressions are averages of mutual informations calculated at the empirical distributions in the columns $t$ of the above subcode. We can rewrite these inequalities with a new random variable $U$ that is uniformly distributed over $\{1,2,\ldots,n\}$. Using the same method as Cover [7, pp. 402], we obtain the following result:
\[
\frac1n\log M_X\le I(X\wedge Z\mid Y,U)+\varepsilon_n, \tag{123a}
\]
\[
\frac1n\log M_Y\le I(Y\wedge Z\mid X,U)+\varepsilon_n, \tag{123b}
\]
\[
\frac1n\log(M_XM_Y)\le I(XY\wedge Z\mid U)+\varepsilon_n, \tag{123c}
\]
and $I(X\wedge Y\mid U)\le\varepsilon_n$, with $\sum_u\mathbb P(x,y,u)=P^*_n(x,y)$ for all $(x,y)\in\mathcal X\times\mathcal Y$; here we have defined new random variables $X\triangleq X_U$, $Y\triangleq Y_U$ and $Z\triangleq Z_U$, whose distributions depend on $U$ in the same way as the distributions of $X_t$, $Y_t$ and $Z_t$ depend on $t$. This completes the proof.
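The single-letterization behind (123) is the identity $I(X\wedge Z\mid Y,U)=\frac1n\sum_{t=1}^n I(X_t\wedge Z_t\mid Y_t)$ for a uniform time-sharing variable $U$. The following sketch of our own verifies it numerically by building the full joint law $\mathbb P(u,x,y,z)=\frac1n p_u(x,y,z)$ and treating the pair $(Y,U)$ as a single conditioning variable.

    import numpy as np

    rng = np.random.default_rng(2)
    n, aX, aY, aZ = 5, 2, 2, 3

    # random per-letter joint pmfs p_t(x, y, z), t = 1..n
    p = rng.random((n, aX, aY, aZ))
    p /= p.sum(axis=(1, 2, 3), keepdims=True)

    def cond_mi(pabc):
        """I(A; B | C) in nats for a joint pmf array of shape (A, B, C)."""
        total = 0.0
        for c in range(pabc.shape[2]):
            pc = pabc[:, :, c].sum()
            if pc == 0:
                continue
            q = pabc[:, :, c] / pc
            qa, qb = q.sum(1, keepdims=True), q.sum(0, keepdims=True)
            m = q > 0
            total += pc * (q[m] * np.log(q[m] / (qa @ qb)[m])).sum()
        return total

    # (1/n) * sum_t I(X_t ; Z_t | Y_t)
    avg = np.mean([cond_mi(p[t].transpose(0, 2, 1)) for t in range(n)])

    # I(X ; Z | Y, U) with P(u,x,y,z) = p_u(x,y,z)/n; flatten the pair
    # (y, u) into one conditioning coordinate
    joint = p / n
    pxz_yu = joint.transpose(1, 3, 2, 0).reshape(aX, aZ, aY * n)
    print(avg, cond_mi(pxz_yu))     # the two values coincide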

7.3 Proof of Theorem 4

We will give the proof in five steps. Fix $\varepsilon>0$ sufficiently small, and consider an arbitrary $(n,2^{nR_X},2^{nR_Y})$ code $C$ for $n$ sufficiently large. Let $P_{XY}$ be a type such that
\[
|C\cap T_{P_{XY}}|\ge\frac{1}{(n+1)^{|\mathcal X||\mathcal Y|}}2^{nR_X}2^{nR_Y}.
\]

Step 1: Average Bhattacharyya distance between codeword pairs:

Fix an arbitrary joint type $V_{XY\bar X\bar Y}\in\mathcal P_n(\mathcal X\times\mathcal Y\times\mathcal X\times\mathcal Y)$ such that (a) its $XY$- and $\bar X\bar Y$-marginals equal $P_{XY}$, i.e., $V_{XY}=V_{\bar X\bar Y}=P_{XY}$, and (b) $I(XY\wedge\bar X\bar Y)\le R_X+R_Y-\varepsilon$.

[Figure 3: A schematic of the spherical shell $A_{\bar X\bar Y}=\{(\bar{\mathbf x},\bar{\mathbf y})\in C : (\mathbf x,\mathbf y,\bar{\mathbf x},\bar{\mathbf y})\in T_{V_{XY\bar X\bar Y}}\}$ around a pair of words $(\mathbf x,\mathbf y)\in T_{P_{XY}}$.]

In the following, we find a bound on the average Bhattacharyya distance between codeword pairs in the code $C$. Consider an arbitrary pair of words $(\mathbf x,\mathbf y)\in T_{P_{XY}}$, and the spherical collection about $(\mathbf x,\mathbf y)$, defined as the set of all codeword pairs that have joint type $V_{XY\bar X\bar Y}$ with $(\mathbf x,\mathbf y)$:
\[
A_{\bar X\bar Y}(\mathbf x,\mathbf y)=\bigl\{(\bar{\mathbf x},\bar{\mathbf y})\in C : (\mathbf x,\mathbf y,\bar{\mathbf x},\bar{\mathbf y})\in T_{V_{XY\bar X\bar Y}}\bigr\}.
\]
Let $T_{\bar X\bar Y}(\mathbf x,\mathbf y)=|A_{\bar X\bar Y}(\mathbf x,\mathbf y)|$. For concise notation, we sometimes suppress the dependence of $T_{\bar X\bar Y}$ and $A_{\bar X\bar Y}$ on the sequence pair. From this point on, we study the distance structure of the pairs of codewords that lie in $A_{\bar X\bar Y}$. A schematic of the spherical collection is shown in Figure 3. Since this spherical collection contains many codewords, they cannot all be far from one another. First, we calculate the average distance between any two pairs in this spherical collection. The average distance is given by

\[
d^{\bar X\bar Y}_{av}=\frac{1}{T_{\bar X\bar Y}(T_{\bar X\bar Y}-1)}\,d^{\bar X\bar Y}_{tot},
\]
where $d^{\bar X\bar Y}_{tot}$ is obtained by adding up all distances between any two, not necessarily distinct, pairs of codewords in $A_{\bar X\bar Y}$. In other words,
\[
d^{\bar X\bar Y}_{tot}=\sum_{(\bar{\mathbf x},\bar{\mathbf y})\in A_{\bar X\bar Y}}\ \sum_{(\tilde{\mathbf x},\tilde{\mathbf y})\in A_{\bar X\bar Y}}d_B\bigl((\bar{\mathbf x},\bar{\mathbf y}),(\tilde{\mathbf x},\tilde{\mathbf y})\bigr).
\]
Therefore,
\[
d^{\bar X\bar Y}_{av}=\frac{1}{T_{\bar X\bar Y}(T_{\bar X\bar Y}-1)}\sum_{(\bar{\mathbf x},\bar{\mathbf y}),(\tilde{\mathbf x},\tilde{\mathbf y})\in A_{\bar X\bar Y}}\ \sum_{\substack{i,k\in\mathcal X\\ j,l\in\mathcal Y}}P_{\bar{\mathbf x}\bar{\mathbf y}\tilde{\mathbf x}\tilde{\mathbf y}}(i,j,k,l)\,d_B\bigl((i,j),(k,l)\bigr),
\]
where $P_{\bar{\mathbf x}\bar{\mathbf y}\tilde{\mathbf x}\tilde{\mathbf y}}$ is the joint composition of the two codeword pairs $(\bar{\mathbf x},\bar{\mathbf y})$ and $(\tilde{\mathbf x},\tilde{\mathbf y})$. Furthermore, define
\[
P_{\bar{\mathbf x}\bar{\mathbf y}\tilde{\mathbf x}\tilde{\mathbf y}}(i,j,k,l\mid p)=
\begin{cases}
\frac1n & \text{if }(\bar{\mathbf x})_p=i,\ (\bar{\mathbf y})_p=j,\ (\tilde{\mathbf x})_p=k,\ (\tilde{\mathbf y})_p=l\\
0 & \text{otherwise.}
\end{cases}
\]

Hence, the average distance can be written as
\[
d^{\bar X\bar Y}_{av}=\frac{1}{T_{\bar X\bar Y}(T_{\bar X\bar Y}-1)}\sum_p\ \sum_{(\bar{\mathbf x},\bar{\mathbf y}),(\tilde{\mathbf x},\tilde{\mathbf y})\in A_{\bar X\bar Y}}\ \sum_{\substack{i,k\in\mathcal X\\ j,l\in\mathcal Y}}P_{\bar{\mathbf x}\bar{\mathbf y}\tilde{\mathbf x}\tilde{\mathbf y}}(i,j,k,l\mid p)\,d_B\bigl((i,j),(k,l)\bigr).
\]


Let $T_{(i,j)|p}$ be the number of $(\bar{\mathbf x},\bar{\mathbf y})\in A_{\bar X\bar Y}$ with $(\bar{\mathbf x})_p=i$ and $(\bar{\mathbf y})_p=j$. We can now express $d^{\bar X\bar Y}_{av}$ in terms of $T_{(i,j)|p}$ as
\[
d^{\bar X\bar Y}_{av}=\frac1n\,\frac{T_{\bar X\bar Y}}{T_{\bar X\bar Y}-1}\sum_p\sum_{\substack{i,k\in\mathcal X\\ j,l\in\mathcal Y}}\frac{T_{(i,j)|p}T_{(k,l)|p}}{T^2_{\bar X\bar Y}}\,d_B\bigl((i,j),(k,l)\bigr).
\]

Moreover, let us define $\lambda_{(i,j)|p}$ as the fraction of the pairs in $A_{\bar X\bar Y}$ with an $(i,j)$ in their $p$-th component, i.e.,
\[
\lambda_{(i,j)|p}\triangleq\frac{T_{(i,j)|p}}{T_{\bar X\bar Y}}. \tag{124}
\]
Therefore, $d^{\bar X\bar Y}_{av}$ can be written as
\[
d^{\bar X\bar Y}_{av}=\frac1n\,\frac{T_{\bar X\bar Y}}{T_{\bar X\bar Y}-1}\sum_p\sum_{\substack{i,k\in\mathcal X\\ j,l\in\mathcal Y}}\lambda_{(i,j)|p}\lambda_{(k,l)|p}\,d_B\bigl((i,j),(k,l)\bigr). \tag{125}
\]
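As a concrete check of (124)-(125), the following sketch of our own computes the average distance of a toy collection twice: directly from the per-letter distances, and via the column fractions $\lambda_{(i,j)|p}$. Here $d_B((i,j),(k,l))=-\log\sum_z\sqrt{W(z|i,j)W(z|k,l)}$ is the single-letter Bhattacharyya distance for a randomly generated toy MAC $W$, and the collection is random rather than a true constant-composition shell; the identity holds regardless.

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(3)
    aX, aY, aZ, n, T = 2, 2, 3, 6, 5

    W = rng.random((aX, aY, aZ))
    W /= W.sum(-1, keepdims=True)                  # toy MAC W(z | x, y)

    def dB(u, v):                                  # u, v are letter pairs (x, y)
        return -np.log(np.sum(np.sqrt(W[u] * W[v])))

    A = [(tuple(rng.integers(0, aX, n)), tuple(rng.integers(0, aY, n)))
         for _ in range(T)]                        # T sequence pairs, length n

    # direct: average per-letter-normalized distance over ordered pairs
    direct = sum(sum(dB((a[0][p], a[1][p]), (b[0][p], b[1][p]))
                     for p in range(n))
                 for a in A for b in A) / (n * T * (T - 1))

    # via the column fractions lambda_{(i,j)|p} of (124)
    lam = np.zeros((n, aX, aY))
    for xs, ys in A:
        for p in range(n):
            lam[p, xs[p], ys[p]] += 1.0 / T
    via_lam = (T / (T - 1) / n) * sum(
        lam[p, i, j] * lam[p, k, l] * dB((i, j), (k, l))
        for p in range(n)
        for i, j, k, l in product(range(aX), range(aY), range(aX), range(aY)))
    print(direct, via_lam)                         # agree up to rounding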

In general, $\lambda$ is an unknown function; however, it must satisfy the following two constraints. The first one is
\[
\sum_{i\in\mathcal X,\,j\in\mathcal Y}\lambda_{(i,j)|p}=1\quad\text{for all }p. \tag{126}
\]

For the center of the sphere, $(\mathbf x,\mathbf y)$, we define $\gamma_{(i,j)|p}$ as
\[
\gamma_{(i,j)|p}=
\begin{cases}
1 & \text{if }(\mathbf x)_p=i,\ (\mathbf y)_p=j\\
0 & \text{otherwise.}
\end{cases}
\]

The second constraint a valid $\lambda$ must satisfy is the following:
\[
\sum_p\lambda_{(i,j)|p}\,\gamma_{(k,l)|p}=n\,V_{XY\bar X\bar Y}(k,l,i,j) \tag{127}
\]
for all $i,k\in\mathcal X$ and all $j,l\in\mathcal Y$. Therefore, we can upper-bound $d^{\bar X\bar Y}_{av}$ by
\[
d^{\bar X\bar Y}_{av}\le\frac1n\,\frac{T_{\bar X\bar Y}}{T_{\bar X\bar Y}-1}\,\max_\lambda\sum_p\sum_{\substack{i,k\in\mathcal X\\ j,l\in\mathcal Y}}\lambda_{(i,j)|p}\lambda_{(k,l)|p}\,d_B\bigl((i,j),(k,l)\bigr), \tag{128}
\]
where the maximization is taken over all $\lambda$ satisfying (126) and (127).

Step 2: Finding the maximum: The following lemma evaluates the maximum in (128).

Lemma 5. Suppose that $W$ is a nonnegative-definite channel. The average distance between the $T_{\bar X\bar Y}$ pairs of codewords in the spherical collection defined by the joint composition $V_{XY\bar X\bar Y}$ satisfies
\[
d^{\bar X\bar Y}_{av}\le\frac{T_{\bar X\bar Y}}{T_{\bar X\bar Y}-1}\sum_{\substack{i,k\in\mathcal X\\ j,l\in\mathcal Y}}\ \sum_{r\in\mathcal X,\,s\in\mathcal Y}\frac{V_{XY\bar X\bar Y}(r,s,i,j)\,V_{XY\bar X\bar Y}(r,s,k,l)}{P_{XY}(r,s)}\,d_B\bigl((i,j),(k,l)\bigr). \tag{129}
\]

Proof. Let
\[
\lambda^*_{(i,j)|p}=\sum_{k\in\mathcal X,\,l\in\mathcal Y}\frac{V_{XY\bar X\bar Y}(k,l,i,j)}{P_{XY}(k,l)}\,\gamma_{(k,l)|p}. \tag{130}
\]

[Figure 4: A schematic of the relation among $(X,Y)$, $(\bar X,\bar Y)$ and $(\tilde X,\tilde Y)$.]

We are going to prove that $\lambda^*$ achieves the maximum. It is easy to verify that $\lambda^*$ satisfies (126) and (127). Moreover, for all $\lambda$ satisfying (126) and (127),
\[
\sum_p\lambda^*_{(i,j)|p}\lambda_{(k,l)|p}=\sum_{r\in\mathcal X,\,s\in\mathcal Y}\frac{V_{XY\bar X\bar Y}(r,s,i,j)}{P_{XY}(r,s)}\sum_p\gamma_{(r,s)|p}\lambda_{(k,l)|p}
=\sum_{r\in\mathcal X,\,s\in\mathcal Y}\frac{V_{XY\bar X\bar Y}(r,s,i,j)\,V_{XY\bar X\bar Y}(r,s,k,l)}{P_{XY}(r,s)}. \tag{131}
\]
By assuming that the channel is nonnegative definite, and by using a similar argument as in [5, Lemma 6], we can show that $\lambda^*$ achieves the maximum. Substituting this value for $\lambda$ completes the proof.

The bound on $d^{\bar X\bar Y}_{av}$ can be interpreted as follows. It is expressed in terms of the expectation of the Bhattacharyya distance between two pairs of channel inputs $(\bar X,\bar Y)$ and $(\tilde X,\tilde Y)$ with the joint input distribution $V_{\bar X\bar Y\tilde X\tilde Y}$ given by
\[
V_{\bar X\bar Y\tilde X\tilde Y}(i,j,k,l)=\sum_{r\in\mathcal X,\,s\in\mathcal Y}V_{\bar X\bar Y|XY}(i,j\mid r,s)\,V_{\tilde X\tilde Y|XY}(k,l\mid r,s)\,P_{XY}(r,s).
\]
In other words, we have three pairs of random variables $(X,Y)$, $(\bar X,\bar Y)$ and $(\tilde X,\tilde Y)$ which are related to each other via a Markov chain $(\bar X,\bar Y)-(X,Y)-(\tilde X,\tilde Y)$, with $V_{\bar X\bar Y|XY}=V_{\tilde X\tilde Y|XY}$. A schematic of this relation is depicted in Figure 4.
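The displayed formula is just the two-step transition law of the Markov chain in Figure 4. A short sketch of our own, with arbitrary randomly generated distributions, confirms that it coincides with the kernel appearing in (129):

    import numpy as np

    rng = np.random.default_rng(4)
    aX, aY = 2, 3

    P = rng.random((aX, aY)); P /= P.sum()        # center law P_XY
    Vc = rng.random((aX, aY, aX, aY))             # branch V(i, j | r, s)
    Vc /= Vc.sum(axis=(2, 3), keepdims=True)

    # joint law of center and one satellite: V(r,s,i,j) = P(r,s) V(i,j|r,s)
    Vjoint = P[:, :, None, None] * Vc

    # satellite-satellite law via the Markov chain (two identical branches)
    Vsat = np.einsum('rsij,rskl,rs->ijkl', Vc, Vc, P)

    # the kernel of (129): V(r,s,i,j) V(r,s,k,l) / P(r,s), summed over (r,s)
    kernel = np.einsum('rsij,rskl->ijkl',
                       Vjoint, Vjoint / P[:, :, None, None])
    print(np.allclose(Vsat, kernel))              # True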

Step 3: Conditions on $V_{XY\bar X\bar Y}$ such that $T_{\bar X\bar Y}$ is large:

For a fixed $V_{XY\bar X\bar Y}$, let us study the spherical collection consisting of all pairs of codewords sharing composition $V_{XY\bar X\bar Y}$ with some arbitrary pair of sequences in $T_{P_{XY}}$, and consider such a spherical collection for every pair of sequences. Since each codeword pair shares joint composition $V_{XY\bar X\bar Y}$ with roughly $\exp\{nH(XY|\bar X\bar Y)\}$ pairs of sequences, it must belong to roughly $\exp\{nH(XY|\bar X\bar Y)\}$ different spherical collections. Therefore,
\[
\sum_{(\mathbf x,\mathbf y)\in T_{P_{XY}}}|A_{\bar X\bar Y}(\mathbf x,\mathbf y)|\ge\frac{1}{(n+1)^{|\mathcal X||\mathcal Y|}}\exp\{n[R_X+R_Y+H(XY|\bar X\bar Y)]\}.
\]
Hence, dividing both sides of the previous inequality by $|T_{P_{XY}}|$, we conclude that
\[
\frac{1}{|T_{P_{XY}}|}\sum_{(\mathbf x,\mathbf y)\in T_{P_{XY}}}|A_{\bar X\bar Y}(\mathbf x,\mathbf y)|\ge\frac{1}{(n+1)^{2|\mathcal X||\mathcal Y|}}2^{n[R_X+R_Y-I(XY\wedge\bar X\bar Y)]}.
\]


Thus, there must exist a pair of sequences $(\mathbf x,\mathbf y)\in T_{P_{XY}}$ with
\[
|A_{\bar X\bar Y}(\mathbf x,\mathbf y)|\ge\frac{1}{(n+1)^{2|\mathcal X||\mathcal Y|}}\exp\{n[R_X+R_Y-I(XY\wedge\bar X\bar Y)]\}. \tag{132}
\]
Since $V_{XY\bar X\bar Y}$ satisfies $I(XY\wedge\bar X\bar Y)\le R_X+R_Y-\varepsilon$, we conclude that there must exist a pair of sequences $(\mathbf x,\mathbf y)\in T_{P_{XY}}$ such that $T_{\bar X\bar Y}$ grows unbounded with $n$. Hence we have

\[
d^{\bar X\bar Y}_{av}\le2\sum_{\substack{i,k\in\mathcal X\\ j,l\in\mathcal Y}}\ \sum_{r\in\mathcal X,\,s\in\mathcal Y}\frac{V_{XY\bar X\bar Y}(r,s,i,j)\,V_{XY\bar X\bar Y}(r,s,k,l)}{P_{XY}(r,s)}\,d_B\bigl((i,j),(k,l)\bigr) \tag{133}
\]
\[
=2\sum_{\substack{i,k\in\mathcal X\\ j,l\in\mathcal Y}}\ \sum_{r\in\mathcal X,\,s\in\mathcal Y}V_{\bar X\bar Y|XY}(i,j\mid r,s)\,V_{\tilde X\tilde Y|XY}(k,l\mid r,s)\,P_{XY}(r,s)\,d_B\bigl((i,j),(k,l)\bigr) \tag{134}
\]
\[
=2\,\mathbb E\,d_W\bigl((\bar X,\bar Y),(\tilde X,\tilde Y)\bigr), \tag{135}
\]

where the expectation is with respect to the distribution $V_{XY\bar X\bar Y\tilde X\tilde Y}$ such that (a) $V_{XY}=V_{\bar X\bar Y}=V_{\tilde X\tilde Y}=P_{XY}$, (b) $V_{XY\bar X\bar Y}=V_{XY\tilde X\tilde Y}$, (c) $(\bar X\bar Y)-(XY)-(\tilde X\tilde Y)$, and (d) $I(XY\wedge\bar X\bar Y)\le R_X+R_Y-\varepsilon$.

This implies that there must exist at least one pair of codewords in $C$ whose distance is no greater than the right-hand side of (135). Thus, using the continuity of the average Bhattacharyya distance as a function of probability distributions, we obtain the following upper bound on the minimum distance $d_B(C)$ of the code:

\[
d_B(C)\le E^{XY}_M(R_X,R_Y,W,P_{XY})\triangleq\min_{V_{XY\bar X\bar Y\tilde X\tilde Y}\in\mathcal V^M_{XY}}\mathbb E\,d_W\bigl((\bar X,\bar Y),(\tilde X,\tilde Y)\bigr), \tag{136}
\]
where
\[
\mathcal V^M_{XY}\triangleq\bigl\{V_{XY\bar X\bar Y\tilde X\tilde Y} : V_{XY}=V_{\bar X\bar Y}=V_{\tilde X\tilde Y}=P_{XY},\ \bar X\bar Y-XY-\tilde X\tilde Y,\ V_{\bar X\bar Y|XY}=V_{\tilde X\tilde Y|XY},\ I(XY\wedge\bar X\bar Y)=I(XY\wedge\tilde X\tilde Y)\le R_X+R_Y\bigr\}. \tag{137}
\]

Step 4: Conditional arguments and the distance of the code: We will tighten this bound by conditioning on one of the codewords. Now, let us fix a joint type $V_{XY\bar X}\in\mathcal P_n(\mathcal X\times\mathcal Y\times\mathcal X)$ with the following properties: (a) $V_{XY}=V_{\bar XY}=P_{XY}$ and (b) $I(\bar X\wedge X|Y)\le R_X-\varepsilon$. Choose any $\mathbf y\in C_Y\cap T_{P_Y}$. Consider a word $\mathbf x\in T_{P_X}$ such that $(\mathbf x,\mathbf y)\in T_{P_{XY}}$, and the spherical collection about $(\mathbf x,\mathbf y)$ defined by $V_{XY\bar X}$, i.e., the set of codewords in the $X$-codebook that have joint type $V_{XY\bar X}$ with $(\mathbf x,\mathbf y)$. Call this sphere $A_Y$, and assume that $|A_Y|=T_Y$. We denote by $d^Y_{av}$ the average distance between any two codewords belonging to this spherical collection. By an argument similar to that of Lemma 5, it can be shown that $d^Y_{av}$ is bounded from above by

\[
d^Y_{av}\le\frac{T_Y}{T_Y-1}\sum_{i,k\in\mathcal X}\ \sum_{r\in\mathcal X,\,s\in\mathcal Y}\frac{V_{XY\bar X}(r,s,i)\,V_{XY\bar X}(r,s,k)}{P_{XY}(r,s)}\,d_B\bigl((i,s),(k,s)\bigr). \tag{138}
\]
Proceeding further by using the properties of $V_{XY\bar X}$, we get
\[
d^Y_{av}\le2\,\mathbb E\,d_B\bigl((\bar X,Y),(\tilde X,Y)\bigr),
\]


where the expectation is with respect to the distribution $V_{XY\bar X\tilde X}$ such that (a) $V_{XY}=V_{\bar XY}=V_{\tilde XY}=P_{XY}$, (b) $V_{\bar X|XY}=V_{\tilde X|XY}$, (c) $\bar X-XY-\tilde X$, and (d) $I_V(\bar X\wedge X|Y)\le R_X-\varepsilon$.

Using the above arguments with the roles of $X$ and $Y$ interchanged, and using Corollary 1, we conclude Step 4 with the following intermediate result, in which the Bhattacharyya distance rate function $d^*_B(R_X,R_Y)$ is bounded from above using the single-letter average Bhattacharyya distance between channel input pairs. Observe that the bound given in Theorem 4 is expressed in terms of I-divergence and mutual information.

Lemma 6. For any nonnegative-definite channel $W$, we have
\[
d^*_B(R_X,R_Y)\le E_M(R_X,R_Y,W), \tag{139}
\]
where $E_M(R_X,R_Y,W)$ is defined as
\[
E_M(R_X,R_Y,W)\triangleq\max_{P_{UXY}}\ \min_{\beta=X,Y,XY}E^\beta_M(R_X,R_Y,W,P_{XYU}). \tag{140}
\]
The maximum is taken over all $P_{UXY}\in\mathcal P(\mathcal U\times\mathcal X\times\mathcal Y)$ such that $X-U-Y$, $R_X\le H(X|U)$ and $R_Y\le H(Y|U)$. The functions $E^\beta_M(R_X,R_Y,W,P_{XYU})$ are defined as follows:
\[
E^X_M(R_X,R_Y,W,P_{XYU})\triangleq\min_{V_{X\bar X\tilde XY}\in\mathcal V^M_X}\mathbb E\,d_W\bigl((\bar X,Y),(\tilde X,Y)\bigr),
\]
\[
E^Y_M(R_X,R_Y,W,P_{XYU})\triangleq\min_{V_{XY\bar Y\tilde Y}\in\mathcal V^M_Y}\mathbb E\,d_W\bigl((X,\bar Y),(X,\tilde Y)\bigr),
\]
\[
E^{XY}_M(R_X,R_Y,W,P_{XYU})\triangleq\min_{V_{XY\bar X\bar Y\tilde X\tilde Y}\in\mathcal V^M_{XY}}\mathbb E\,d_W\bigl((\bar X,\bar Y),(\tilde X,\tilde Y)\bigr), \tag{141}
\]

where
\[
\mathcal V^M_X\triangleq\bigl\{V_{X\bar X\tilde XY} : V_{XY}=V_{\bar XY}=V_{\tilde XY}=P_{XY},\ \bar X-XY-\tilde X,\ V_{\bar X|XY}=V_{\tilde X|XY},\ I(\bar X\wedge X|Y)=I(\tilde X\wedge X|Y)\le R_X\bigr\}, \tag{142}
\]
\[
\mathcal V^M_Y\triangleq\bigl\{V_{XY\bar Y\tilde Y} : V_{XY}=V_{X\bar Y}=V_{X\tilde Y}=P_{XY},\ \bar Y-XY-\tilde Y,\ V_{\bar Y|XY}=V_{\tilde Y|XY},\ I(\bar Y\wedge Y|X)=I(\tilde Y\wedge Y|X)\le R_Y\bigr\}, \tag{143}
\]
\[
\mathcal V^M_{XY}\triangleq\bigl\{V_{XY\bar X\bar Y\tilde X\tilde Y} : V_{XY}=V_{\bar X\bar Y}=V_{\tilde X\tilde Y}=P_{XY},\ \bar X\bar Y-XY-\tilde X\tilde Y,\ V_{\bar X\bar Y|XY}=V_{\tilde X\tilde Y|XY},\ I(XY\wedge\bar X\bar Y)=I(XY\wedge\tilde X\tilde Y)\le R_X+R_Y\bigr\}. \tag{144}
\]

Step 5: Relating the distance of the code to the error exponent:

Using the standard technique, we can relate the average Bhattacharyya distance to the maximal probability of decoding error; this results in the following statement.

Lemma 7. For any indivisible channel,
\[
E^*_m(R_X,R_Y)\le d^*_B(R_X,R_Y), \tag{145}
\]
where $E^*_m(R_X,R_Y)$ is the maximal-error reliability function at rate pair $(R_X,R_Y)$; consequently, $E^*_m(R_X,R_Y,W)\le E_M(R_X,R_Y,W)$.


Proof. The proof of the first part is very similar to [6, Sec. 10.6]. The second part follows from Theorem 6.

Next we will express the bound on the error exponent in terms of I-divergence and mutual information.

Lemma 8. For $\beta=X,Y,XY$, the following quantities are equal:
\[
E^\beta_M(R_X,R_Y,W,P_{XYU})=E^\beta_U(R_X,R_Y,W,P_{XYU}), \tag{146}
\]
where the $E^\beta_U$ and the $E^\beta_M$ are defined in (50) and (141), respectively.

Proof. A similar result was given in [9] for the expurgated lower bound on the error exponent for point-to-point channels. In the present case, we have to extend that result to the upper bound. In the following we give an outline of the proof in the point-to-point case; the extension to the DM-MAC is straightforward. We prove that for any DMC $W$ with input alphabet $\mathcal X$ and output alphabet $\mathcal Z$, any input distribution $P$, and any rate $R>0$, we have $E_{md}(P,R)=E_U(P,R)$, where
\[
E_{md}(P,R)=\min_{V_{X\bar X\tilde X}\in\mathcal A}\mathbb E\,d_B(\bar X,\tilde X),
\]
\[
E_U(P,R)=\min_{V_{X\bar X\tilde XZ}\in\mathcal B}D(V_{Z|\bar X}\|W|P)+I(\tilde X\wedge Z|\bar X),
\]
\[
\mathcal A=\bigl\{V_{X\bar X\tilde X} : \bar X-X-\tilde X,\ V_X=V_{\bar X}=V_{\tilde X}=P,\ V_{\bar X|X}=V_{\tilde X|X},\ I(X\wedge\bar X)\le R\bigr\},
\]
\[
\mathcal B=\bigl\{V_{X\bar X\tilde XZ} : V_{X\bar X\tilde X}\in\mathcal A,\ \alpha(V_{\tilde XZ})\le\alpha(V_{\bar XZ})\bigr\}.
\]

To show $E_U(P,R)\le E_{md}(P,R)$: By using the correspondence between $(\bar X,\tilde X,Z)$ here and the variables of [9, p. 9], and using the arguments given there, we have
\[
D(V_{Z|\bar X}\|W|P)+I(\tilde X\wedge Z|\bar X)=\sum_{\bar x,\tilde x,z}V_{\bar x,\tilde x,z}\log\frac{V_{z|\bar x,\tilde x}}{W(z|\bar x)}. \tag{147}
\]

Now note that we can make the constraint set $\mathcal B$ smaller and thus obtain an upper bound on $E_U$ as follows. Let $\mathcal B'$ be defined like $\mathcal B$, except that the condition $\alpha(V_{\tilde XZ})\le\alpha(V_{\bar XZ})$ is replaced by $V_{\bar XZ}=V_{\tilde XZ}$. Hence
\[
E_U(P,R)\le\min_{V_{X\bar X\tilde XZ}\in\mathcal B'}D(V_{Z|\bar X}\|W|P)+I(\tilde X\wedge Z|\bar X) \tag{148}
\]

\[
=\min_{V_{X\bar X\tilde XZ}\in\mathcal B'}\frac12\Bigl[\sum_{\bar x,\tilde x,z}V_{\bar x,\tilde x,z}\log\frac{V_{z|\bar x,\tilde x}}{W(z|\bar x)}+\sum_{\bar x,\tilde x,z}V_{\bar x,\tilde x,z}\log\frac{V_{z|\bar x,\tilde x}}{W(z|\tilde x)}\Bigr] \tag{149}
\]
\[
=\min_{V_{X\bar X\tilde XZ}\in\mathcal B'}\sum_{\bar x,\tilde x,z}V_{\bar x,\tilde x,z}\log\frac{V_{z|\bar x,\tilde x}}{\sqrt{W(z|\bar x)W(z|\tilde x)}} \tag{150}
\]
\[
=\min_{V_{X\bar X\tilde XZ}\in\mathcal B'}\mathbb E\,d_B(\bar X,\tilde X)+D(V_{Z|\bar X\tilde X}\|V^*_{Z|\bar X\tilde X}|V_{\bar X\tilde X}) \tag{151}
\]
\[
=\min_{V_{X\bar X\tilde XZ}\in\mathcal B'}\mathbb E\,d_B(\bar X,\tilde X)=\min_{V_{X\bar X\tilde X}\in\mathcal A}\mathbb E\,d_B(\bar X,\tilde X)=E_{md}(P,R), \tag{152}
\]
where the last-but-one equality follows by defining
\[
V^*_{Z|\bar X\tilde X}(z|\bar x,\tilde x)=\frac{\sqrt{W(z|\bar x)W(z|\tilde x)}}{\sum_z\sqrt{W(z|\bar x)W(z|\tilde x)}}.
\]
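The step from (150) to (151) is an exact decomposition: the single-letter objective splits into the expected Bhattacharyya distance plus a conditional divergence with respect to the tilted channel $V^*_{Z|\bar X\tilde X}$. The following sketch of our own verifies the identity numerically for randomly generated $W$, $V_{\bar X\tilde X}$ and $V_{Z|\bar X\tilde X}$.

    import numpy as np

    rng = np.random.default_rng(5)
    aX, aZ = 3, 4

    W = rng.random((aX, aZ)); W /= W.sum(1, keepdims=True)         # W(z|x)
    Vxx = rng.random((aX, aX)); Vxx /= Vxx.sum()                   # V(x', x'')
    Vz = rng.random((aX, aX, aZ)); Vz /= Vz.sum(2, keepdims=True)  # V(z|x',x'')

    G = np.sqrt(W[:, None, :] * W[None, :, :])    # sqrt(W(z|x') W(z|x''))
    Vstar = G / G.sum(2, keepdims=True)           # tilted channel V*
    dB = -np.log(G.sum(2))                        # Bhattacharyya distance

    lhs = (Vxx[:, :, None] * Vz * np.log(Vz / G)).sum()
    rhs = (Vxx * dB).sum() + (Vxx[:, :, None] * Vz * np.log(Vz / Vstar)).sum()
    print(lhs, rhs)                               # equal up to rounding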


To show $E_{md}(P,R)\le E_U(P,R)$: Note that $\alpha(V_{\tilde XZ})\le\alpha(V_{\bar XZ})$ implies
\[
D(V_{Z|\bar X}\|W|P)+I(\tilde X\wedge Z|\bar X)\ge\sum_{\bar x,\tilde x,z}V_{\bar x,\tilde x,z}\log\frac{V_{z|\bar x,\tilde x}}{W(z|\tilde x)},
\]
and hence, using (147), we have the following statements:
\[
E_U(P,R)\ge\min_{V_{X\bar X\tilde XZ}\in\mathcal B}\frac12\Bigl[\sum_{\bar x,\tilde x,z}V_{\bar x,\tilde x,z}\log\frac{V_{z|\bar x,\tilde x}}{W(z|\bar x)}+\sum_{\bar x,\tilde x,z}V_{\bar x,\tilde x,z}\log\frac{V_{z|\bar x,\tilde x}}{W(z|\tilde x)}\Bigr] \tag{153}
\]
\[
=\min_{V_{X\bar X\tilde XZ}\in\mathcal B}\sum_{\bar x,\tilde x,z}V_{\bar x,\tilde x,z}\log\frac{V_{z|\bar x,\tilde x}}{\sqrt{W(z|\bar x)W(z|\tilde x)}} \tag{154}
\]
\[
\ge\min_{V_{X\bar X\tilde XZ}:\,V_{X\bar X\tilde X}\in\mathcal A}\sum_{\bar x,\tilde x,z}V_{\bar x,\tilde x,z}\log\frac{V_{z|\bar x,\tilde x}}{\sqrt{W(z|\bar x)W(z|\tilde x)}} \tag{155}
\]
\[
=\min_{V_{X\bar X\tilde XZ}:\,V_{X\bar X\tilde X}\in\mathcal A}\mathbb E\,d_B(\bar X,\tilde X)+D(V_{Z|\bar X\tilde X}\|V^*_{Z|\bar X\tilde X}|V_{\bar X\tilde X}) \tag{156}
\]
\[
=\min_{V_{X\bar X\tilde X}\in\mathcal A}\mathbb E\,d_B(\bar X,\tilde X)=E_{md}(P,R). \tag{157}
\]

Therefore, by combining the results of Lemma 7 and Lemma 8, the statement of Theorem 4 follows.

7.4 Proof of Theorem 6

Without loss of generality, let us assume $R_X\le R_Y$. The average error probability of any code is always less than or equal to its maximal probability of error. As a result,
\[
E^*_m(R_X,R_Y)\le E^*_{av}(R_X,R_Y). \tag{158}
\]
On the other hand, for any $\delta>0$ and sufficiently large $n$, there exists an $(n,R_X,R_Y)$ code $C=C_X\times C_Y$ satisfying the following inequality:
\[
e(C,W)\le2^{-n(E^*_{av}(R_X,R_Y)-\delta)}, \tag{159}
\]
which can be written as
\[
\frac{1}{M_Y}\sum_{j=1}^{M_Y}\Bigl\{\frac{1}{M_X}\sum_{i=1}^{M_X}e_{ij}(C,W)\Bigr\}\le2^{-n(E^*_{av}(R_X,R_Y)-\delta)}. \tag{160}
\]

By Markov's inequality, it can be concluded that for at least $M^*_Y\ge\frac{M_Y}{2}$ codewords in $C_Y$, the following holds:
\[
\frac{1}{M_X}\sum_{i=1}^{M_X}e_{ij}(C,W)\le2\times2^{-n(E^*_{av}(R_X,R_Y)-\delta)},\quad\text{for all }j=1,2,\ldots,M^*_Y. \tag{161}
\]
Here, without loss of generality, we assumed that these codewords are the first $M^*_Y$ codewords in $C_Y$. Since each term of the nonnegative sum in (161) is at most $M_X=2^{nR_X}$ times its average, it can be concluded from (161) that
\[
e_{ij}(C,W)\le2\times2^{-n(E^*_{av}(R_X,R_Y)-R_X-\delta)},\quad\text{for all }j=1,2,\ldots,M^*_Y,\ i=1,2,\ldots,M_X, \tag{162}
\]


and therefore
\[
e_m(C^*,W)\le2\times2^{-n(E^*_{av}(R_X,R_Y)-R_X-\delta)}, \tag{163}
\]
where
\[
C^*\triangleq\{(\mathbf{x}_i,\mathbf{y}_j) : i=1,2,\ldots,M_X,\ j=1,2,\ldots,M^*_Y\}. \tag{164}
\]
Note that
\[
e_m(C^*,W)\ge2^{-n(E^*_m(R_X,R_Y-\delta)+\delta)}\ge2^{-n(E^*_m(R_X,R_Y)+2\delta)}. \tag{165}
\]
By combining (163) and (165), we conclude that
\[
E^*_m(R_X,R_Y)\ge E^*_{av}(R_X,R_Y)-R_X. \tag{166}
\]
Similarly, it can be shown that
\[
E^*_{av}(R_X,R_Y)\le E^*_m(R_X,R_Y)+R_X. \tag{167}
\]
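The expurgation step from (160) to (162) is elementary but easy to misread; the following sketch of our own illustrates it on a randomly generated stand-in error matrix: Markov's inequality keeps at least half of the $Y$-codewords, and within each kept column every entry is at most $M_X$ times the column average.

    import numpy as np

    rng = np.random.default_rng(6)
    Mx, My = 64, 64
    e = rng.exponential(scale=1e-3, size=(Mx, My))   # stand-in error matrix

    eps = e.mean()                                   # grand average, as in (160)
    col_avg = e.mean(axis=0)                         # (1/Mx) sum_i e[i, j]
    good = np.flatnonzero(col_avg <= 2 * eps)        # columns kept in C*
    assert len(good) >= My // 2                      # Markov: < half are bad

    # the (161) -> (162) step: each entry of a kept column is at most
    # Mx (= 2^{n R_X}) times the column average
    assert np.all(e[:, good] <= Mx * col_avg[good] + 1e-12)
    print(len(good))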

References

[1] R. Ahlswede. Multi-way communication channels. In Proc. International Symposium on Information

Theory, 1971.

[2] R. Ahlswede. On two-way communication channels and a problem by Zarankiewicz. In Prague Conf. on Inf. Theory, Sept. 1971.

[3] R. Ahlswede. An elementary proof of the strong converse theorem for the multiple access channel. Journal of Combinatorics, Information and System Sciences, 7(3):216–230, 1982.

[4] U. Augustin. Gedächtnisfreie Kanäle für diskrete Zeit (Memoryless channels for discrete time). Z. Wahrscheinlichkeitstheorie und verw. Gebiete, (6):10–61, 1966.

[5] R. E. Blahut. Composition bounds for channel block codes. IEEE Trans. Information Theory, 23(6):656–

674, Nov. 1977.

[6] R. E. Blahut. Principles and Practice of Information Theory. Addison-Wesley, New York, 1987.

[7] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, New York, 1991.

[8] I. Csiszar. The method of types. IEEE Trans. Information Theory, 44(6):2505–2523, Oct. 1998.

[9] I. Csiszar and J. Korner. Graph decomposition: A new key to coding theorems. IEEE Trans. Information Theory, 27(1):5–12, Jan. 1981.

[10] I. Csiszar and J. Korner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, New York, 1981.


[11] R. L. Dobrushin. Asymptotic bounds of the probability of error for the transmission of messages over a discrete memoryless channel with a symmetric transition probability matrix. Theory of Probability and its Applications, 7:283–311, 1962.

[12] G. Dueck. Maximal error capacity regions are smaller than average error capacity regions for multi-user

channels. Prague Conf. on Inf. Theory, 7:11–19, 1978.

[13] G. Dueck. The strong converse of the coding theorem for the multi-user channels. Journal of Combinatorics, Information and System Sciences, 6:187–196, 1981.

[14] A. G. Dyachkov. Random constant composition codes for multiple-access channels. Prague Conf. on Inf.

Theory, 31(6):357–369, 1984.

[15] P. Elias. Coding for noisy channels. IRE Convention Record, 4:37–46, 1955.

[16] R. M. Fano. Transmission of Information: A Statistical Theory of Communication. MIT Press, Cambridge, MA, 1961.

[17] A. Feinstein. Error bounds in noisy channels without memory. IEEE Trans. Information Theory, 1(2):13–14, Sept. 1955.

[18] R. Gallager. A simple derivation of the coding theorem and some applications. IEEE Trans. Information

Theory, 11(1):3–18, Jan. 1965.

[19] R. Gallager. A perspective on multi-access channels. IEEE Trans. Information Theory, 31(2):124–142,

Mar. 1985.

[20] R. G. Gallager. Information Theory and Reliable Communication. John Wiley & Sons, New York, 1968.

[21] E. Haim, Y. Kochman, and U. Erez. Improving the MAC error exponent using distributed structure. In

Proc. International Symposium on Information Theory, 2011.

[22] E. A. Haroutunian. Lower bound for the error probability of multiple-access channels. Problemy Peredachi

Informatsii, 11:23–36, June 1975.

[23] H. Liao. A coding theorem for multiple-access communications. In Proc. International Symposium on

Information Theory, June 1972.

[24] Y. Liu and B. L. Hughes. A new universal random coding bound for the multiple-access channels. IEEE

Trans. Information Theory, 42(2):376–386, Mar. 1996.

[25] A. Nazari, A. Anastasopoulos, and S. S. Pradhan. A new universal random coding bound for the average

error exponent for discrete memoryless multiple-access channels. In Conference on Information Sciences

and Systems, Mar 2009.

[26] A. Nazari, A. Anastasopoulos, and S. S. Pradhan. Error exponent for multiple-access channels: Lower

bounds. Submitted to IEEE Trans. Information Theory, July 2010. http://arxiv.org/abs/1010.1303.

[27] A. Nazari, S. S. Pradhan, and A. Anastasopoulos. New bounds on the maximal error exponent for multiple-

access channels. In Proc. IEEE Int. Symp. Inf. Theory, July 2009.


[28] J. Pokorney and H. S. Wallmeier. Random coding bounds and codes produced by permutations for the

multiple-access channels. IEEE Trans. Information Theory, 31(6):741–750, Nov. 1985.

[29] C. E. Shannon, R. Gallager, and E. Berlekamp. Lower bounds to error probability for coding on discrete memoryless channels (Part I). Information and Control, 10:65–103, 1967.

[30] C. E. Shannon, R. Gallager, and E. Berlekamp. Lower bounds to error probability for coding on discrete memoryless channels (Part II). Information and Control, 10:522–552, 1967.

[31] D. Slepian and J. K. Wolf. A coding theorem for multiple access channels with correlated sources. Bell

System Tech. J., 52:1037–1076, 1973.
