IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. , NO. , MONTH YEAR

Polar Codes for the m-User Multiple Access Channel

Emmanuel Abbe and Emre Telatar

Abstract—In this paper, polar codes for the m-user multiple access channel (MAC) with binary inputs are constructed. It is shown that Arıkan's polarization technique applied individually to each user transforms independent uses of an m-user binary input MAC into successive uses of extremal MACs. This transformation has a number of desirable properties: (i) the 'uniform sum rate' of the original MAC is preserved, (ii) the extremal MACs have uniform rate regions that are not only polymatroids but matroids, and thus (iii) their uniform sum rate can be reached by each user transmitting either uncoded or fixed bits; in this sense they are easy to communicate over. A polar code can then be constructed with an encoding and decoding complexity of O(n log n) (where n is the block length), a block error probability of o(exp(−n^{1/2−ε})), and capable of achieving the uniform sum rate of any binary input MAC with arbitrarily many users. Applications of this polar code construction to channels with a finite field input alphabet and to the AWGN channel are also discussed.

Index Terms—Polar codes, polarization, multiple access channel, multi-user communication, matroid.

I. INTRODUCTION

The polarization technique, introduced by Arıkan in [3], transforms n independent uses of a noisy binary input channel into single uses of n synthetic binary input channels. The key property of this transformation is that almost all of these synthetic channels are polarized, in the sense that they are either very noisy or almost noiseless (i.e., they have a mutual information either close to 0 or close to 1). Moreover, this technique preserves the 'uniform mutual information' (the mutual information of the channel with the uniform input distribution); that is, the proportion of synthesized channels that are almost noiseless tends to the uniform mutual information. As the very noisy or almost noiseless channels are channels for which it is easy to code, this transformation leads to the following coding scheme: uncoded information bits are sent on the polarized channels that have uniform mutual informations close to 1, and on the remaining channels, bits frozen to predetermined values are transmitted.

In addition to bringing a new perspective on coding, polar codes can be implemented with low computational effort. More precisely, the encoding and decoding complexity of a polar code is O(n log n). By definition of the uniform mutual information, these codes achieve the capacity of any channel whose capacity achieving input distribution is uniform. The

E. Abbe is with the School of Communication and Computer Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, 1015-Switzerland, e-mail: emmanuel.abbe@epfl.ch.

E. Telatar is with the School of Communication and Computer Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, 1015-Switzerland, e-mail: emre.telatar@epfl.ch.

Manuscript received July 18, 2010; revised November 2, 2011.

original polar code construction was generalized in [18] from channels with binary input alphabets to channels with alphabets of arbitrary prime cardinality, allowing polar codes to approach the capacity of any discrete memoryless channel.

In this paper, we show how the polarization technique can be extended to a multi-user problem, namely, the multiple access channel (MAC). One interesting aspect of this extension is that, as opposed to the single-user setting where a single mutual information characterizes an achievable rate, in a MAC setting there is a collection of mutual informations that characterizes an achievable rate region. Hence, the terminology "polarized" may need to be revised in a MAC setting, as there may be more than two "polarized MACs". Indeed, for a 2-user binary input MAC, by applying Arıkan's construction to each user's input separately, [19] shows that n independent uses of a 2-user MAC are converted into n successive uses of five possible "extremal 2-user MACs". These 2-user extremal MACs are: (i) each user sees a pure noise channel, (ii) one of the users sees a pure noise channel and the other sees a noiseless channel (two such MACs, one per user), (iii) both users see a noiseless channel, (iv) a pure contention channel: a channel whose uniform rate region is the triangle with vertices (0,0), (0,1), (1,0). Note that for this channel, if either of the two users communicates at zero rate, the other user sees a noiseless channel. Moreover, [19] shows that the 'uniform sum rate'¹ of the original MAC is preserved during the polarization process, and that the polarization to the extremal MACs occurs fast enough to ensure the construction of a polar code with vanishing block error probability, achieving the uniform sum rate on binary input 2-user MACs.

In contrast to [19], here we investigate the polarization of the MAC for an arbitrary number of users. In the two-user case, the extremal MACs are not just MACs for which each user sees either a noiseless or pure noise channel, as there is also the pure contention channel. However, the uniform rate regions of the 2-user extremal MACs are all polyhedrons with integer valued constraints. So, the first interesting aspect of the polarization of the MAC with arbitrarily many users is to understand what pattern the extremal MACs follow. We will see that the 2-user and 3-user cases can be handled in a similar manner, whereas a new phenomenon appears when the number of users reaches 4, and the extremal MACs are no longer in a one-to-one correspondence with the polyhedrons having integer valued constraints. To characterize the extremal MACs, we first show that the mutual information function used to

¹In this paper, all mutual informations are computed when the inputs of a MAC are distributed independently and uniformly. The resulting rate regions, sum rates, etc., are prefixed by 'uniform' to distinguish them from the capacity region, sum capacity, etc.


provide bounds on the communication rates for the extremal MACs corresponds to a specific type of function, called a rank function in matroid theory. This connection is used to show that the extremal MACs are in a one-to-one correspondence with a family of matroids called binary matroids, and are "equivalent" (in a sense which will be defined later) to linear deterministic MACs. This is then used to conclude the construction of a polar code ensuring reliable communication on binary input MACs for arbitrary values of m (the number of users).

Finally, we discuss two applications resulting from the MAC polar code construction with arbitrarily many users described in this paper. The first one is motivated by the idea of proposing a new coding scheme for the additive white Gaussian noise channel. By transmitting the standardized average of m binary inputs which are uniformly distributed (taking into account the power constraint), we transmit a random input which is approximately Gaussian distributed when m is large (using the central limit theorem). This is important to achieve the highest rate on the AWGN channel, since the Gaussian input distribution maximizes the mutual information for this channel. We can then use the polar code construction for a MAC developed in this paper to propose a new coding scheme for the AWGN channel. In the second application, we construct polar codes achieving the uniform sum rate of MACs with q-ary inputs, where q is a power of 2, using the polar code construction for MACs with binary inputs and a large enough number of users. We also show how, with this extension, the sum capacity of any m-user MAC can be achieved.

II. POLARIZATION PROCESS FOR MACS

We consider an m-user multiple access channel with binary input alphabets (BMAC) and arbitrary output alphabet Y. The channel is specified by the conditional probabilities

P(y|x), for all y ∈ Y and x = (x[1], . . . , x[m]) ∈ F_2^m.

Let Em := {1, . . . , m} and let X[1], . . . , X[m] be mutually independent and uniformly distributed binary random variables. Let X := (X[1], . . . , X[m]). We denote by Y the output of the MAC P when the input is X. For J ⊆ Em, we define

X[J] := {X[i] : i ∈ J},
I[J](P) := I(X[J]; Y X[J^c]),

where J^c denotes the complement set of J in Em, and

I(P) : 2^{Em} → R, J ↦ I[J](P)   (1)

where 2^{Em} denotes the power set of Em and where I[∅](P) = 0. Note that

I(P) := {(R1, . . . , Rm) : 0 ≤ Σ_{i∈J} Ri ≤ I[J](P), ∀J ⊆ Em}

is included in the capacity region of the MAC P. We refer to I(P) as the uniform rate region and to I[Em](P) as the uniform sum rate.
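As an aside for readers who want to experiment, the following minimal Python sketch (ours, not from the paper) computes I[J](P) with i.i.d. uniform inputs for a small MAC given as an explicit transition table; the function name mutual_info_J and the dictionary-based channel representation are illustrative choices, and users are 0-indexed. It uses I[J](P) = |J| − H(X[J] | Y, X[J^c]).

```python
import itertools
import math

def mutual_info_J(P, m, J):
    """I[J](P) = |J| - H(X[J] | Y, X[J^c]) for i.i.d. uniform binary inputs.

    P: dict mapping an input tuple x of m bits to a dict {y: P(y|x)}.
    J: list of 0-indexed user indices.
    """
    Jc = [i for i in range(m) if i not in J]
    # Joint p(x, y) = 2^{-m} P(y|x), grouped by the conditioning pair (y, x[Jc]).
    cells = {}
    for x, out in P.items():
        kc, kJ = tuple(x[i] for i in Jc), tuple(x[i] for i in J)
        for y, p in out.items():
            cells.setdefault((y, kc), {})
            cells[(y, kc)][kJ] = cells[(y, kc)].get(kJ, 0.0) + p / 2**m
    H_cond = 0.0  # H(X[J] | Y, X[J^c])
    for dist in cells.values():
        tot = sum(dist.values())
        H_cond -= sum(p * math.log2(p / tot) for p in dist.values() if p > 0)
    return len(J) - H_cond

# Toy 2-user adder MAC, Y = X[1] + X[2] over the integers:
P = {x: {sum(x): 1.0} for x in itertools.product((0, 1), repeat=2)}
for J in ([0], [1], [0, 1]):
    print(J, mutual_info_J(P, 2, J))  # 1.0, 1.0, 1.5
```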

We now consider two independent uses of such a MAC. We define

X1 := (X1[1], . . . , X1[m]), X2 := (X2[1], . . . , X2[m]),

where X1[i], X2[i], with i ∈ Em, are mutually independent and uniformly distributed binary random variables. We denote by Y1 and Y2 the respective outputs of independent uses of the MAC P when the inputs are X1 and X2:

X1 →(P) Y1,   X2 →(P) Y2.   (2)

We define two additional binary random vectors

U1 := (U1[1], . . . , U1[m]), U2 := (U2[1], . . . , U2[m])

with mutually independent and uniformly distributed components, and we put X1 and X2 in the following one-to-one correspondence with U1 and U2:

X1 = U1 + U2, X2 = U2,

where the addition in the above is the modulo 2 componentwise addition.

Definition 1. Let P : F_2^m → Y be an m-user BMAC. We define two new m-user BMACs, P^- : F_2^m → Y² and P^+ : F_2^m → Y² × F_2^m, by

P^-(y1, y2 | u1) := Σ_{u2 ∈ F_2^m} (1/2^m) P(y1 | u1 + u2) P(y2 | u2),
P^+(y1, y2, u1 | u2) := (1/2^m) P(y1 | u1 + u2) P(y2 | u2),

for all ui ∈ F_2^m, yi ∈ Y, i = 1, 2.
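The two synthesized channels of Definition 1 can be computed mechanically from a channel table. The sketch below (ours, under the same table representation as the earlier snippet) returns P^- and P^+; combined with mutual_info_J it can be used to check numerically that I[J](P^-) ≤ I[J](P) ≤ I[J](P^+).

```python
import itertools

def polar_transform(P, m):
    """One polarization step for an m-user BMAC (cf. Definition 1).

    P: dict mapping an input tuple of m bits to a dict {y: prob}.
    Returns (P_minus, P_plus); outputs are (y1, y2) and (y1, y2, u1).
    """
    inputs = list(itertools.product((0, 1), repeat=m))
    xor = lambda a, b: tuple((ai + bi) % 2 for ai, bi in zip(a, b))
    P_minus = {u: {} for u in inputs}
    P_plus = {u: {} for u in inputs}
    for u1 in inputs:
        for u2 in inputs:
            for y1, p1 in P[xor(u1, u2)].items():
                for y2, p2 in P[u2].items():
                    w = p1 * p2 / 2**m
                    # P^-(y1, y2 | u1): average over the unknown u2.
                    P_minus[u1][(y1, y2)] = P_minus[u1].get((y1, y2), 0.0) + w
                    # P^+(y1, y2, u1 | u2): u1 is revealed at the output.
                    P_plus[u2][(y1, y2, u1)] = w
    return P_minus, P_plus
```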

That is, we now have two new m-user BMACs with extended output alphabets:

U1 →(P^-) (Y1, Y2),   U2 →(P^+) (Y1, Y2, U1)   (3)

which also defines I[J](P^-) and I[J](P^+), ∀J ⊆ Em.
This construction is the natural extension of the construction for m = 1, 2 in [3], [19]. Here again, we are comparing two independent uses of the same channel P (cf. (2)) with two successive uses of the channels P^- and P^+ (cf. (3)). Note that

I[J](P^-) ≤ I[J](P) ≤ I[J](P^+), ∀J ⊆ Em.

Definition 2. Let {Bn}_{n≥1} be i.i.d. uniform random variables valued in {−, +}. Let the BMAC-valued random process {Pn, n ≥ 0} be defined by

P0 := P,
Pn := P_{n−1}^{Bn}, ∀n ≥ 1.   (4)
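A direct (if computationally naive) way to look at the process {Pn} of Definition 2 is to sample a uniformly random sign sequence and apply the transform along it. A short sketch (ours, reusing the polar_transform sketch above); note that the output alphabet squares at every step, so only very small depths are feasible with this brute-force approach.

```python
import random

def sample_process(P, m, depth, seed=0):
    """Sample P_n along one random path B_1, ..., B_n of Definition 2."""
    rnd = random.Random(seed)
    for _ in range(depth):
        P_minus, P_plus = polar_transform(P, m)
        P = P_plus if rnd.random() < 0.5 else P_minus
    return P
```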

A. Discussion

When m = 1, we have 2I(P) = I(P^-) + I(P^+), which implies that I(Pn) (which in this case denotes a sequence of scalar random variables and not of functions) is a martingale. This allows one to show that I(Pn) tends to either 0 or 1, and that the extremal channels of the single-user polarization scheme are either pure noise or noiseless channels. Moreover, in the polarization of the single-user channel, the extremal channels are synthesized by using a genie-aided decoder. The genie helps the decoder by providing the correct values of the previous decisions when decoding the current channel's input.


In the polar code construction, the genie is simulated by a decoder which decodes the bits successively on the synthetic channels and uses its previous decisions, assuming they are correct. As the block error probabilities of the genie-aided and the standalone decoders are exactly the same, it is sufficient to study the block error probability of the genie-aided decoder. These facts then facilitate the design of a code: bits are frozen on the very noisy channels and uncoded information bits are sent on the almost noiseless channels, recovered then by using a successive decision decoder at the receiver. To show that the block error probability of this coding scheme is small, i.e., that the error caused by the successive decision decoder does not propagate, it is shown that the convergence to the "good" extremal channels is fast enough. When m ≥ 2, several new points need to be investigated. In particular, one needs to check whether I[J](Pn) still has a martingale property for different J's. Then, if the convergence of each I[J](Pn) can be proved, one has to examine whether the obtained limiting MACs are also extremal MACs, in the spirit of creating trivial channels to communicate over, as in the single-user polarization. Finally, one needs to ensure that the convergence of these mutual informations takes place fast enough, so as to ensure a block error probability that tends to zero when successive decision decoding is used.

III. PRELIMINARY RESULTS

Summary: In Section III-A, we show that I(Pn) tends a.s. to a matroid rank function (cf. Definition 5). We then see in Section III-B that the extreme points of a uniform rate region with matroidal constraints can be achieved by each user sending uncoded or frozen bits; in particular, the uniform sum rate can be achieved by such strategies. We then show in Section IV that, for arbitrary m, I(Pn) tends not to an arbitrary matroid rank function but to the rank function of a binary matroid (cf. Definition 6). This is used to show that the convergence to the extremal MACs happens fast enough, which then leads to the main result of this paper, Theorem 7 in Section IV. This theorem states that applying Arıkan's polar transform separately to each user, and using a successive decision decoder, can achieve sum rates arbitrarily close to the uniform sum rate of a MAC, ensure a block error probability that decays roughly like 2^{−√n} with the block length, and operate with computational complexity O(n log n).

A. The extremal MACs

Lemma 1. {I[J](Pn), n ≥ 0} is a bounded supermartingale when J ⊊ Em and a bounded martingale when J = Em.

Proof: For any J ⊆ Em, I[J](Pn) ≤ m and

2 I[J](P) = I(X1[J] X2[J]; Y1 Y2 X1[J^c] X2[J^c])
          = I(U1[J] U2[J]; Y1 Y2 U1[J^c] U2[J^c])
          = I(U1[J]; Y1 Y2 U1[J^c] U2[J^c]) + I(U2[J]; Y1 Y2 U1[J^c] U2[J^c] U1[J])
          ≥ I(U1[J]; Y1 Y2 U1[J^c]) + I(U2[J]; Y1 Y2 U1 U2[J^c])
          = I[J](P^-) + I[J](P^+).   (5)

Fig. 1. The middle polyhedron (plain line) represents the uniform rate region of a MAC P. The polyhedron on the right, respectively on the left, represents the uniform rate region of P^+, respectively P^-. The middle polyhedron is included in, respectively contains, the polyhedron on the right, respectively on the left. The average of the left and right polyhedrons is given by the dashed polyhedron, which is included in the middle polyhedron. The containment may be strict or not, but at least one point on the dominant face of these polyhedrons must be in common (since the uniform sum rate is preserved).

If J = Em, the inequality above is an equality.
Note that the inequality in the above is only due to the bound on the mutual information of the P^- channel. Because of the equality when J = Em, our construction preserves the uniform sum rate. An illustration of the uniform rate regions of P, P^- and P^+ is given in Figure 1.

As a corollary of the previous Lemma, we have the following result.

Lemma 2. The process {I[J](Pn), J ⊆ Em} converges a.s., i.e., for each J ⊆ Em, lim_{n→∞} I[J](Pn) exists a.s.

Note that for a fixed n, {I[J](Pn), J ⊆ Em} denotes the collection of the 2^m random variables I[J](Pn), for J ⊆ Em. When the convergence takes place (this is an a.s. event), let us define

I∞[J] := lim_{n→∞} I[J](Pn)

and I∞ to be the function J ↦ I∞[J].
From the previous lemma, I∞[J] is a random variable valued in [0, |J|]. We will now further characterize these random variables.

Lemma 3. For any ε > 0, there exists δ > 0 such that I(A2; B1 B2 A1) − I(A2; B2) < δ implies

I(A2; B2) ∈ [0, ε) ∪ (1 − ε, 1],

whenever (A1, A2, B1, B2) are random variables valued in F2 × F2 × B × B, with B any set, and

P_{A1 A2 B1 B2}(a1, a2, b1, b2) = (1/4) Q(b1 | a1 + a2) Q(b2 | a2),

for any ai ∈ F2, bi ∈ B, i = 1, 2, where Q is a binary input B-output channel.

Proof of Lemma 3: First note that, by the chain rule and the one-to-one correspondence (A1, A2) ↔ (A1 + A2, A2),

I(A1; B1 B2) + I(A2; B1 B2 A1) = I(A1 A2; B1 B2) = 2 I(A2; B2),

so that

I(A2; B1 B2 A1) − I(A2; B2) = I(A2; B2) − I(A1; B1 B2)
                            = H(A1 | B1, B2) − H(A2 | B2).


Moreover, since A1 = (A1 + A2) + A2 and since, given (B1, B2), the posteriors of A1 + A2 and A2 are independent,

H(A1 | B1, B2) = Σ_{b1,b2 ∈ B} H(p_{A1+A2|B1=b1} ∗ p_{A2|B2=b2}) p_{B1}(b1) p_{B2}(b2)
               = Σ_{b1,b2 ∈ B} h2( h2^{-1}(H(A1+A2|B1 = b1)) ∗ h2^{-1}(H(A2|B2 = b2)) ) p_{B1}(b1) p_{B2}(b2),

where h2 is the binary entropy function, h2^{-1} is its inverse (valued in [0, 1/2]) and a ∗ b = a(1−b) + (1−a)b. From the so-called Mrs. Gerber's Lemma [21], the function x ↦ h2(h2^{-1}(x) ∗ p0) is strictly convex in x for any p0 ∈ (0, 1/2). Hence, defining α such that h2(α) = H(A1+A2|B1) = H(A2|B2) and using this Lemma together with Jensen's inequality twice, we obtain

H(A1 | B1, B2) − H(A2 | B2) ≥ h2(α ∗ α) − h2(α).   (6)

Finally, it is straightforward to check that the function [0, 1/2] ∋ α ↦ h2(α ∗ α) − h2(α) vanishes only when α ∈ {0, 1/2}, i.e., only when H(A2|B2) ∈ {0, 1}. Since this function is continuous, the desired result follows.

This Lemma is used to prove the following.

Lemma 4. For any ε > 0 and any m-user BMAC P, there exists δ > 0 such that for any J ⊆ Em, if I[J](P^+) − I[J](P) < δ, we have

I[J](P) − I[J \ i](P) ∈ [0, ε) ∪ (1 − ε, 1], ∀i ∈ J,

where I[∅](P) = 0.

Proof: Let i ∈ J. Note that

I[J](P^+) − I[J](P)
  = I(U2[J]; Y1 Y2 U1 U2[J^c]) − I(U2[J]; Y2 U2[J^c])
  = I(U2[J]; Y1 U1 | Y2 U2[J^c])
  ≥ I(U2[i]; Y1 U1[i] U1[J^c] | Y2 U2[J^c])
  = I(U2[i]; Y1 U1[J^c] Y2 U2[J^c] U1[i]) − I(U2[i]; Y2 U2[J^c])
  = I(U2[i]; Y1 X1[J^c] Y2 X2[J^c] U1[i]) − I(U2[i]; Y2 X2[J^c]).

Using Lemma 3 with Ak = Uk[i], Bk = Yk Xk[J^c], k = 1, 2, and

Q(y, x[J^c] | x[i]) = (1/2^{m−1}) Σ_{x[j] ∈ F2, j ∉ J^c ∪ {i}} P(y|x),

we conclude that we can take δ small enough so that I[J](P^+) − I[J](P) < δ implies I(U2[i]; Y2 X2[J^c]) ∈ [0, ε) ∪ (1 − ε, 1]. Moreover, we have

I[J](P) = I[J \ i](P) + I(U2[i]; Y2 X2[J^c]).

Lemma 5. With probability one, I∞[J] − I∞[J \ i] ∈ {0, 1}, ∀J ⊆ Em, i ∈ J, where I∞[∅] := 0.

Proof: From Lemma 2, we have that I[J](Pn) converges a.s., hence lim_{n→∞} |I[J](Pn+1) − I[J](Pn)| = 0 a.s. Moreover, by definition of Pn, |I[J](Pn+1) − I[J](Pn)| is equal to I[J](Pn^+) − I[J](Pn) w.p. half and I[J](Pn) − I[J](Pn^-) w.p. half. Hence, from (5), E|I[J](Pn+1) − I[J](Pn)| ≥ (1/2)(I[J](Pn^+) − I[J](Pn)). But |I[J](Pn+1) − I[J](Pn)| is bounded by m, hence lim_{n→∞} E|I[J](Pn+1) − I[J](Pn)| = 0 and lim_{n→∞} I[J](Pn^+) − I[J](Pn) = 0 almost surely. Finally, we conclude using Lemma 4.

Note that Lemma 5 implies in particular that {I∞[J], J ⊆ Em} is a.s. a discrete random vector.

Definition 3. We denote by Am the support of {I∞[J], J ⊆ Em} (when the convergence of Lemma 5 takes place, i.e., a.s.). This is a subset of {0, . . . , m}^{2^m}.

We have already seen that not every element in {0, . . . , m}^{2^m} can belong to Am. We will now further characterize the set Am.

Definition 4. A polymatroid is a set Em, called a ground set, equipped with a function f : 2^{Em} → R (where 2^{Em} denotes the power set of Em), called a rank function, which satisfies

f(∅) = 0,
f(J) ≤ f(K), ∀J ⊆ K ⊆ Em,
f(J ∪ K) + f(J ∩ K) ≤ f(J) + f(K), ∀J, K ⊆ Em.

The following result is provided in [9] as Lemma 3.1 (page 42).

Lemma 6. For any MAC and any product distribution on the inputs X[Em], we have that ρ(S) = I(X[S]; Y X[S^c]) is a rank function on Em, where we denote by Y the output of the MAC when the input is X[Em]. Hence, (Em, ρ) is a polymatroid.

Therefore, any realization of I(Pn) is a rank function, and the elements of Am are images of polymatroid rank functions.

Definition 5. A matroid is a polymatroid whose rank function is integer valued and satisfies f(J) ≤ |J|, ∀J ⊆ Em. We denote by MATm the set of all matroids with ground set Em. We use the notation rB to refer to the rank function of a matroid B. We will sometimes identify a matroid with its rank function image, in which case we consider an element of MATm as a 2^m-dimensional integer valued vector. We also define the bases of a matroid as the maximal subsets J of Em for which f(J) = |J|.

Using Lemma 5 and the definition of a matroid, we havethe following result.

Theorem 1. For every m ≥ 1, Am ⊆ MATm, i.e., I∞ is a matroid rank function.

We will see that the inclusion is strict for m ≥ 4.

B. Communicating on MACs with matroidal regions

We have shown that, when n tends to infinity, the MACs that we create with the polarization construction of Section II are particular MACs: the mutual informations I∞[J] are a.s. integer valued (and satisfy the other matroid properties). As stated in [5], Theorem 22, the vertices (corner points) of


a polyhedron defined by a rank function f are the vectors of the following form:

x_{j(1)} = f(A1),
x_{j(i)} = f(Ai) − f(Ai−1), ∀i ∈ {2, . . . , k},
x_{j(i)} = 0, ∀i ∈ {k + 1, . . . , m},

for some k ≤ m, j(1), j(2), . . . , j(m) distinct in Em and Ai = {j(1), j(2), . . . , j(i)}, where the vertices strictly in the positive orthant are given for k = m.

Therefore, we have the following corollary.

Corollary 1. The uniform rate region defined by an element of Am has vertices on the hypercube {0, 1}^m. In particular, to communicate at a rate m-tuple which is a vertex of such a MAC uniform rate region, each user individually communicates on either a noiseless (uniform mutual information of 1) or pure noise (uniform mutual information of 0) channel.
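Concretely, the vertices can be generated by the greedy procedure implicit in the formulas above: order the users and give each new user the marginal increase of f. A small sketch (ours, restricted to the full orderings, i.e., k = m, with 0-indexed users), illustrated on the rank function of the pure contention region:

```python
from itertools import permutations

def vertices(rank, m):
    """Greedy corner points of {R >= 0 : sum_{i in J} R_i <= f(J)}.

    rank: function frozenset -> number. Only full orderings (k = m)
    are enumerated here.
    """
    pts = set()
    for order in permutations(range(m)):
        x, A = [0] * m, frozenset()
        for j in order:
            x[j] = rank(A | {j}) - rank(A)  # marginal increase of f
            A = A | {j}
        pts.add(tuple(x))
    return pts

# Pure contention channel (m = 2): f(J) = min(|J|, 1).
print(vertices(lambda J: min(len(J), 1), 2))  # {(1, 0), (0, 1)}
```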

C. Convergence Speed and Representation of Matroids

Convention: for a given m, we write the collection {I∞[J], J ⊆ Em} by skipping the empty set (since I∞[∅] = 0) as follows: when m = 2, we order the sequence as (I∞[1], I∞[2], I∞[1,2]), and when m = 3, as (I∞[1], I∞[2], I∞[3], I∞[1,2], I∞[1,3], I∞[2,3], I∞[1,2,3]), etc.

In this section, we show that there is a correspondence between the extremal MACs and the linear deterministic MACs, i.e., MACs whose outputs are linear forms of the inputs. This correspondence has been used in [19] to establish that the convergence to the extremal MACs for the 2-user case is fast, namely o(2^{−n^β}) for any β < 1/2, which allows one to conclude that the block error probability of the code described in [19] is small. We hence follow the same approach as in [19] to treat the case where the number of users is arbitrary, and proceed here to establish this correspondence. We will see that while the case m = 3 is similar to the case m = 2, a new difficulty is encountered for m ≥ 4. How to use this correspondence to show that the convergence to the extremal MACs for the m-user case is fast is explained in Section IV.

Note that, for m = 2, a property of the matroids {(0,0,0), (0,1,1), (1,0,1), (1,1,1), (1,1,2)} is that we can express any of them as the uniform rate region of a linear deterministic MAC: (1,0,1) is in particular the uniform rate region of the MAC whose output is Y = X[1], (0,1,1) corresponds to Y = X[2], (1,1,1) to Y = X[1] + X[2] and (1,1,2) to Y = (X[1], X[2]). Indeed, this is related to the fact that any matroid with a two element ground set can be represented over the binary field. Let us introduce the definition of binary matroids.

Definition 6. Linear matroids: let A be a k × m matrix over a field. Let Em be the index set of the columns of A. The rank of J ⊆ Em is defined as the rank of the sub-matrix with columns indexed by J.
Binary matroids: a matroid is binary if it is a linear matroid over the binary field. We denote by BMATm the set of binary matroids with m elements.
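For a binary matroid given by a representing matrix, the rank function of Definition 6 is just GF(2) column rank, which the following sketch (ours) computes by Gaussian elimination on bit-packed columns; the example matrix is the one derived at the end of the Appendix.

```python
def binary_matroid_rank(A, J):
    """Rank over GF(2) of the columns of A indexed by J (cf. Definition 6).

    A: list of rows of bits; J: iterable of 0-indexed column indices.
    """
    cols = [sum(A[i][j] << i for i in range(len(A))) for j in J]
    pivots = {}  # highest set bit -> reduced vector with that pivot
    rank = 0
    for v in cols:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                rank += 1
                break
            v ^= pivots[h]  # eliminate the leading bit and continue
    return rank

A = [[1, 0, 0], [0, 1, 1]]  # the matrix A of the Appendix example
print(binary_matroid_rank(A, [1, 2]))     # f({2, 3}) = 1
print(binary_matroid_rank(A, [0, 1, 2]))  # f({1, 2, 3}) = 2
```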

[Figure: The Extremal 3-MACs.]

Fig. 2. These polyhedrons represent the matroids for m = 3, without taking into account the labeling of the ground set. If the labeling is taken into account, one obtains the 16 matroids for m = 3. Note that each polyhedron corresponds to the uniform rate region of a linear deterministic MAC with 3 users and binary inputs. Denoting by X1, X2, X3 i.i.d. uniform binary random variables, the first polyhedron is for example the uniform rate region of the MAC whose output is (X1, X2, X3), whereas the second polyhedron is obtained for the MAC whose output is (X1 + X2, X1 + X3).

1) The Case m = 3: MAT3 is given by 16 matroids (taking into account the labeling of the ground set). These matroids are represented in Figure 2. Moreover, they are all binary representable (there are 16 binary matroids). For example, it is clear that the deterministic MAC whose output is X[1] + X[2] + X[3] has a uniform rate region given by (1,1,1,1,1,1,1). Similarly, all matroids for m = 3 correspond to the rate region of a linear deterministic MAC. However, one can also show that any 3-user binary MAC with uniform rate region given by a matroid is equivalent to a linear deterministic MAC in the following sense. A MAC with output Y and uniform rate region given by (1,1,1,1,1,1,1) must satisfy I(X[1] + X[2] + X[3]; Y) = 1, and similarly for other matroids (with m = 3), where the linear forms of the inputs which can be recovered from the output are dictated by the binary representation of the matroid. However, the above claim is not quite sufficient to show that, if {I[J](Pn), J ⊆ Em} tends to (1,1,1,1,1,1,1), we have along this path that I((P^{[1,2,3]})_n) tends to 1, where P^{[1,2,3]} is the channel with input X[1] + X[2] + X[3] and output Y. For this, one can show a stronger version of the claim which says that if a MAC has a uniform rate region "close to" (1,1,1,1,1,1,1), it must be that I(X[1] + X[2] + X[3]; Y) is close to 1. In any case, a similar technique as for the m = 2 case lets one show that the convergence to the matroids in A3 must take place fast enough.

2) The Case m = 4: We have that MAT4 contains 68 matroids. However, there are only 67 binary matroids with a ground set of 4 elements. Hence, there must be a matroid which does not have a binary representation. This matroid is given by (1,1,1,1,2,2,2,2,2,2,2,2,2,2,2) (one can easily check that this is not a binary matroid). It is denoted U2,4 and is called the uniform matroid of rank 2 with 4 elements (for which any 2-element set is a basis). Of course, the fact that this matroid is not binary does not imply that a hypothetical convergence to it must be slow. It means that we will not be able to use the technique employed for the cases m = 2, 3.

Luckily, one can show that there is no MAC leading to U2,4, and the following holds.


Lemma 7. A4 ⊂ BMAT4 ⊊ MAT4.

Hence, the m = 4 case can be treated in a similar manner as the previous cases. We conclude this section by proving the following result, which implies Lemma 7.

Lemma 8. U2,4 cannot be the uniform rate region of any MAC with four users and binary inputs.

Proof: Assume that U2,4 is the uniform rate region of a MAC. We then have

I(X[i, j]; Y) = 0,   (7)
I(X[i, j]; Y X[k, l]) = 2,   (8)

for all i, j, k, l distinct in {1, 2, 3, 4}.
Let y0 be in the support of Y. For x ∈ F_2^4, define P(x|y0) = W(y0|x) / Σ_{z ∈ F_2^4} W(y0|z). Assume w.l.o.g. that p0 := P(0,0,0,0|y0) > 0. From (8), a realization of X[k, l] which has non-zero probability conditioned on y0 must determine the value of X[i, j] (for i, j, k, l distinct). Since p0 := P(0,0,0,0|y0) > 0, we must have in particular that P(0,0,∗,∗|y0) = 0 for any choice of ∗,∗ which is not 0,0, and P(0,1,∗,∗|y0) = 0 for any choice of ∗,∗ which is not 1,1 (since each of these assignments has at least two zeros). On the other hand, from (7), each time we freeze two components in x and sum up P(x|y0) over the other components, we must obtain the same sum, since X[i, j] is uniform given y0. Hence, P(0,1,1,1|y0) must be equal to p0. However, we have from (8) that P(1,0,∗,∗|y0) = 0 for any choice of ∗,∗ (even for 1,1, since we now have P(0,1,1,1|y0) > 0). At the same time, this implies that the sum of P(1,0,∗,∗|y0) over ∗,∗ is zero. This brings a contradiction, since from (7), this sum must equal p0.

Moreover, a similar argument can be used to prove a stronger version of Lemma 8, showing that no sequence of MACs can have a uniform rate region that converges to U2,4.

3) Arbitrary values of m: We have seen in the previous section that for m = 2, 3, 4, the extremal MACs have uniform rate regions that are not arbitrary matroids but binary matroids. This fact can be used to show that for m = 2, 3, 4, {I[J](Pn), J ⊆ Em} must tend fast enough to {I∞[J], J ⊆ Em}. The details of this proof are provided in Section IV; in words, by working with the linear deterministic representation of the MACs, the problem of showing that the convergence speed is fast in the MAC setting becomes a consequence of a result shown in [4] for the single-user setting. We now show that the correspondence between extremal MACs and linear deterministic MACs holds for any value of m.

Definition 7. A matroid is informatic if its rank function can be expressed as r(J) = I(X[J]; Y X[J^c]), J ⊆ Em, where X[Em] has independent and binary uniformly distributed components, and Y is the output of a binary input MAC with input X[Em].

Theorem 2. A matroid is informatic if and only if it is binary.

The converse of this theorem is easily proved, and a proof of the direct part using the following theorem of Tutte [16] can be found in [1]. We show a stronger result in Theorem 4 below.

Theorem 3 ([16]). A matroid is binary if and only if it has no minor that is U2,4.

A minor of a matroid is a matroid which is either a restriction or a contraction of the original matroid to a subset of the ground set. A contraction can be defined as a restriction on the dual matroid, which is another matroid whose bases are the complement sets of the bases of the original matroid. We refer to [14] for formal definitions. The characterization by a finite number of excluded minors of matroids representable over a finite field is also known for GF(3) [15], [20] and GF(4) [8].

In the following theorem, we connect extremal MACs to linear deterministic MACs.

Theorem 4. Let X[Em] have independent and binary uniformly distributed components. Let Y be the output of a MAC with input X[Em] and for which f(J) = I(X[J]; Y X[J^c]) is integer valued for any J ⊆ Em. Then, there exists a binary matrix A such that

I(A X[Em]; Y) = rank A = f(Em).

Note that the first equality in the theorem, namely I(A X[Em]; Y) = rank(A), is equivalent to

H(A X[Em] | Y) = 0,

which means that Y determines the linear function of the inputs given by A X[Em]. Moreover, from the equality I(A X[Em]; Y) = I(X[Em]; Y), we have that

I(X; Y | A X[Em]) = 0,

which means that once A X[Em] is known, no other information about X can be deduced from Y. Hence, these two facts can be summarized by saying that Y is "equivalent" to a linear form of the inputs (in terms of information).

This theorem was originally proved using matroid theory notation, and we refer to [1] for this proof and other investigations regarding the connection between matroid theory and extremal MACs. We provide an alternate proof of this theorem in the Appendix. Note that Theorem 2 follows from Theorem 4. One can also show a stronger version of this theorem for MACs having a uniform rate region which is close to a matroid; this is provided in Theorem 5 below, whose proof is also given in the Appendix.

Theorem 5. Let X[Em] have independent and binary uniformly distributed components. For any ε > 0, there exists γ(ε) with the following properties:
(i) γ(ε) → 0 as ε → 0,
(ii) whenever Y is the output of a MAC with input X[Em] and for which f : 2^{Em} ∋ J ↦ I(X[J]; Y X[J^c]) satisfies max_{J ∈ 2^{Em}} d(f(J), Z) < ε, there exists a binary matrix A such that

|I(A X[Em]; Y) − f(Em)| < γ(ε).

Theorem 2 says that an extremal MAC must have (with probability one) the same uniform rate region as that of a linear deterministic MAC, i.e., a MAC whose output is a collection of linear forms of the inputs. However, Theorem 4


says something stronger, namely, that from the output of an extremal MAC, one can recover a collection of linear forms of the inputs and essentially nothing else. In that sense, extremal MACs are equivalent to linear deterministic MACs. This also suggests that we could have started from the beginning by working with the quantities I(P^{[J]}) := I(Σ_{i∈J} Xi; Y) instead of I[J](P) = I(X[J]; Y X[J^c]) to analyze the polarization of a MAC. The second measure is the natural one to study a MAC, since it characterizes the rate region. However, we have just shown that it is sufficient to work with the first measure to characterize the uniform rate regions of the polarized MACs. Indeed, one can show that I((P^{[J]})_n) tends either to 0 or 1, and Eren Şaşoğlu [17] has provided a direct argument showing that these measures fully characterize the uniform rate region of the extremal MACs. We use a similar argument for the proof of Theorem 4 given in the Appendix.

D. Comment: Relationship between information and matroid theories

The process of identifying which matroids can have a rank function derived from an information theoretic measure, such as the entropy, has been investigated in different works, cf. [22] and references therein. In particular, the problem of characterizing the entropic matroids has consequent applications in network information theory and network coding problems, as described in [11].

Entropic matroids are defined as follows. Let E be a finite set and X[E] = {Xi}_{i∈E} be a random vector with each component valued in a finite alphabet. Let h(I) := H(X[I]) denote the entropy of X[I], for I ⊆ E.

Theorem 6. h(·) is a rank function. Hence, (E, h) is a polymatroid.

A (poly)matroid is then called entropic if its rank function can be expressed as the entropy of a certain random vector, as above. A proof of the previous theorem is available in [6], [12]. The work of Han, Fujishige, Zhang and Yeung [10], [6], [22] has resulted in the complete characterization of entropic matroids for |E| = 2, 3. However, the problem is open when |E| ≥ 4. Note that in our case, where we have been interested in characterizing informatic matroids instead of entropic matroids, we have also faced a different phenomenon when |E| ≥ 4. Other similar problems have been studied in [13].

IV. MAIN RESULT: POLAR CODES FOR MACS

In this section, we describe our polar code construction forthe MAC and prove the main theorem of the paper.

Let n = 2^l for some l ∈ Z+ and let

Gn = [1 0; 1 1]^{⊗l}

denote the l-th Kronecker power of the given matrix. Let U[k]^n := (U1[k], . . . , Un[k]) and

X[k]^n = U[k]^n Gn, k ∈ Em.
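For concreteness, the per-user transform can be written in a few lines; the sketch below (ours, assuming numpy) builds Gn by repeated Kronecker products and encodes one user's block. In the MAC construction, each of the m users applies the same map to its own block U[k]^n.

```python
import numpy as np

def polar_generator(l):
    """G_n = [[1, 0], [1, 1]] raised to the l-th Kronecker power, n = 2^l."""
    G = np.array([[1]], dtype=int)
    F = np.array([[1, 0], [1, 1]], dtype=int)
    for _ in range(l):
        G = np.kron(G, F)
    return G

def encode(u):
    """x = u G_n over F_2 for one user's block of length n = 2^l."""
    G = polar_generator(int(np.log2(len(u))))
    return (np.asarray(u, dtype=int) @ G) % 2

print(encode([1, 0, 1, 1]))  # [1 1 0 1]
```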

When X[Em]^n is transmitted over n independent uses of P to receive Y^n, define for any i ∈ {1, . . . , n} the channel

P_(i) : F_2^m → Y^n × F_2^{m(i−1)}   (9)

to be the channel whose inputs and outputs are Ui[Em] → Y^n U^{i−1}[Em].

Let n ≥ 1 and εn > 0. Classify each P_(i) as either 'polarized' or 'not polarized' according to whether the function I(P_(i)) is valued within εn of Z or not. (We will choose an appropriate sequence {εn} below. For the moment, note only that by Theorem 1, if εn were any fixed constant, the channels P_(i) would be in the 'polarized' category except for a vanishing fraction of indices i.) For i for which P_(i) is in the polarized category, set ri to be the integer within εn of I(P_(i))[Em]. Theorem 5 lets us conclude the existence of an ri × m matrix Ai for which H(Ai Ui[Em] | Y^n U^{i−1}[Em]) < γ(εn), that is to say, the output of channel P_(i) determines Ai Ui[Em] with high probability².

We now describe what we refer to as the polar encoder and decoder for the MAC. The encoder will be specified via the sets Bi ⊆ Em, the set of users sending data on P_(i). These are chosen as follows: if P_(i) is not polarized, Bi is empty. Otherwise, select ri linearly independent columns of the matrix Ai, and put k in Bi if and only if the k-th column is selected. For a user k, let G[k] be the set of i for which k ∈ Bi. For each user k and i ∉ G[k], choose Ui[k] independently and uniformly at random, and reveal all these 'frozen' choices to user k and also to the decoder. The encoder for user k will transmit uncoded bits on the channels included in G[k]; on the other channels it will transmit the frozen values.
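Selecting ri linearly independent columns of Ai is a small GF(2) computation; a greedy sketch (ours, bit-packing the columns as in the earlier matroid-rank snippet):

```python
def select_users(A_i):
    """Greedily pick users whose columns of A_i are linearly independent
    over GF(2); returns the 0-indexed list B_i."""
    pivots, B = {}, []
    for k in range(len(A_i[0])):
        v = sum(A_i[r][k] << r for r in range(len(A_i)))  # pack column k
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h], v = v, 0  # new pivot: column k is independent
                B.append(k)
            else:
                v ^= pivots[h]  # reduce and retry; v == 0 means dependent
    return B

print(select_users([[1, 0, 0], [0, 1, 1]]))  # [0, 1]
```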

The decoder operates by successively decoding U1[Em], U2[Em], . . . , Un[Em]. At stage i, having already decoded U^{i−1}[Em] (assume correctly, for the moment), it is in possession of (Y^n, U^{i−1}[Em]), the output of P_(i). It can thus determine Ai Ui[Em] with high probability. Since it knows Ui[Bi^c], it can determine Σ_{k∈Bi} Ai[k] Ui[k], and as {Ai[k] : k ∈ Bi} are linearly independent, it can determine Ui[Bi].

Observe that for this decoder to operate as described above, it needs the aid of a genie which provides it with U^{i−1}[Em] at stage i of the decoding. Let Ûi[Em] = φi(Y^n, U^{i−1}[Em]) denote the decoding function of such a decoder. Observe now that if we construct an unaided decoder via Ũi[Em] = φi(Y^n, Ũ^{i−1}[Em]) using the same decoding function as the genie-aided decoder, the block error event for this unaided decoder, Ũ^n[Em] ≠ U^n[Em], is the same as the block error event Û^n[Em] ≠ U^n[Em] of the genie-aided decoder. Thus, the block error probability Pe(n) of the unaided decoder is equal to the block error probability of the genie-aided decoder and so can be upper bounded as

Pe(n) ≤ Σ_i Pe(P_(i), Ai Ui),

where Pe(P_(i), Ai Ui) is the probability of error in determining Ai Ui from the output of the channel P_(i). Note now that we have to be careful in our choice of εn: we need to take εn small enough to ensure that n Pe(P_(i), Ai Ui) is small. We will see in the proof of Theorem 7 that channel polarization happens so rapidly that even with such a more stringent choice of εn, the fraction of non-polarized channels vanishes with increasing n. (Indeed, for any β < 1/2, one can choose εn = 2^{−n^β} and still ensure polarization.)

²Indeed, by Problem 4.7 in [7], with probability at least 1 − γ(εn).


Since I[Em](P) is preserved through the polarization process (cf. the equality in (5)), we guarantee that, with δn denoting the fraction of unpolarized channels,

Sum-Rate(n) := (1/n) Σ_{k=1}^m |G[k]| > I[Em](P) − δn − εn.

Thus, if εn → 0 is chosen so that δn → 0, the communication system described above achieves the uniform sum rate of the underlying channel. The question as to whether εn can be chosen so that both δn → 0 and the block error probability decays to zero is answered in the affirmative by the theorem below.

Theorem 7. For any m ≥ 1, any binary input MAC P with m users, and any β < 1/2, there exists an integer n0 and a sequence of codes with the polar encoders and decoders described above such that the probability of error for block length n satisfies

Pe(n) ≤ 2^{−n^β}, ∀n ≥ n0,

and

lim inf_{n→∞} Sum-Rate(n) ≥ I[Em](P).

As for the polar code in the single-user setting [3], the encoding and decoding complexity of this code is O(n log n).

Proof of Theorem 7: Fix α ∈ (β, 1/2), ε ∈ (0, 1/2) and εn = 2^{−n^α}. Let int(x) denote the closest integer to x and define

Dn := {i ∈ {1, . . . , n} : I(P_(i))[J] ∈ Z ± ε for any J, ∃Ai ∈ F_2^{ri×m} with ri = int(I(P_(i))[Em]) and I(Ai Ui[Em]; Y^n U^{i−1}[Em]) > ri − εn}.

For i ∈ Dn, we have H(Ai Ui[Em] | Y^n U^{i−1}[Em]) < εn, and the output of channel P_(i) determines Ai Ui[Em] with high probability, namely

Pe(P_(i), Ai Ui) ≤ εn.   (10)

Therefore,

Pe(n) ≤ Σ_{i∈Dn} Pe(P_(i), Ai Ui)   (11)
      ≤ n εn = o(2^{−n^β}).   (12)

Hence, such a choice of εn guarantees the first claim of the Theorem. We now show that such an εn is still large enough to maintain most of the polarized MACs active, causing no loss in the sum rate, as stated in the second claim of the Theorem. To this end, we need the following definition and result.

Definition 8. For an m-user BMAC P with output alphabet Y and for S ⊆ Em, we define P^{[S]} to be the single-user binary input channel with output alphabet Y, obtained from P by

P^{[S]}(y|s) = (1/2^{m−1}) Σ_{x[Em] ∈ F_2^m : Σ_{i∈S} x_i = s} P(y | x[Em])

for all y ∈ Y, s ∈ F2. Schematically, if P : X[Em] → Y, we have P^{[S]} : Σ_{i∈S} Xi → Y.

Lemma 9. Let P_(i) be the channel defined in (9) and let (P_(i))^{[S]} be the corresponding single-user channel (cf. Definition 8). We have, for any ε ∈ (0, 1), α < 1/2 and S ⊆ Em,

lim_{l→∞} (1/n) |{i ∈ {1, . . . , n} : I((P_(i))^{[S]}) > 1 − ε, I((P_(i))^{[S]}) < 1 − 2^{−n^α}}| = 0.

The proof of this lemma is given below. Let

Dn[S] := {i ∈ {1, . . . , n} : I((P_(i))^{[S]}) > 1 − εn},   (13)
D̄n[S] := {i ∈ {1, . . . , n} : I((P_(i))^{[S]}) > 1 − ε}.   (14)

From Lemma 9, we have that

max_{S ∈ 2^{Em}} (1/n) |D̄n[S] \ Dn[S]| → 0.   (15)

This implies that

(1/n) |D̄n \ Dn| → 0   (16)

where

D̄n := {i ∈ {1, . . . , n} : I(P_(i))[J] ∈ Z ± ε for any J, ∃Ai ∈ F_2^{ri×m} with ri = int(I(P_(i))[Em]) and I(Ai Ui[Em]; Y^n U^{i−1}[Em]) > ri − γ(ε)}

where γ(ε) is as in Theorem 5. (The only difference between Dn and D̄n is in the γ(ε) and εn in the last line.)

Since, from Theorem 5,

lim_{l→∞} (1/n) |D̄n| = 1,

we also have from (16)

lim_{l→∞} (1/n) |Dn| = 1.

Finally, since the polarization process preserves the sum rate, we conclude the proof of the Theorem.

Proof of Lemma 9: Note that

(P^{[S]})^- ≡ (P^-)^{[S]},
(P^{[S]})^+ ⪯ (P^+)^{[S]},

where ≡ means that the two transition probability distributions are the same, and where P1 ⪯ P2 means that P1 is degraded with respect to P2, in the sense that P1(y'|x) = Σ_{y : φ(y)=y'} P2(y|x) for some function φ. Hence, defining the Bhattacharyya parameter of a single-user channel Q with binary input and output alphabet Y by

Z(Q) = Σ_{y∈Y} √(Q(y|0) Q(y|1)),

we have

Z[(P^-)^{[S]}] = Z[(P^{[S]})^-] ≤ 2 Z[P^{[S]}],
Z[(P^+)^{[S]}] ≤ Z[(P^{[S]})^+] = Z[P^{[S]}]²,


and the random process Z_l := Z[(P_l)^{[S]}] satisfies

Z_{l+1} ≤ Z_l²  if B_{l+1} = +,   (17)
Z_{l+1} ≤ 2 Z_l  if B_{l+1} = −.   (18)

We then conclude by using Theorem 3 of [4], which shows that a random process which satisfies³ (17) and (18) satisfies, for any β < 1/2,

lim inf_{l→∞} P(Z_l ≤ 2^{−2^{βl}}) ≥ P(Z∞ = 0).

Let us define p := P{Z∞ = 1}. By the definition of {Z∞ = 1}, for any δ ∈ (0, 1) and η > 0 there exists an integer k such that for any l ≥ k,

P{Z_l ≥ δ} ≥ p − η,

hence

lim inf_{l→∞} P{Z_l ≥ δ} ≥ P{Z∞ = 1}.

Therefore,

P{2^{−2^{βl}} < Z_l < δ}   (19)
  = 1 − P{Z_l ≤ 2^{−2^{βl}}} − P{Z_l ≥ δ}   (20)
  → 0,   (21)

and we have proved that

lim_{l→∞} (1/2^l) |{i ∈ {1, . . . , 2^l} : Z((P_(i))^{[S]}) < δ, Z((P_(i))^{[S]}) > 2^{−2^{lβ}}}| = 0.

To conclude the proof of the lemma, we use the fact that for any binary input discrete memoryless channel Q, we have I(Q) + Z(Q) ≥ 1 (this is shown in Proposition 11 of [3]), and the fact that I(Q)² + Z(Q)² ≤ 1 (this is shown in Proposition 1 of [3]).
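The Bhattacharyya parameter itself is a one-liner to evaluate on a channel table; a sketch (ours), checked against the closed form Z = 2√(p(1−p)) for a BSC(p):

```python
import math

def bhattacharyya(Q):
    """Z(Q) = sum_y sqrt(Q(y|0) Q(y|1)) for a binary input channel.

    Q: dict {0: {y: prob}, 1: {y: prob}}, e.g. as returned by marginal_channel.
    """
    ys = set(Q[0]) | set(Q[1])
    return sum(math.sqrt(Q[0].get(y, 0.0) * Q[1].get(y, 0.0)) for y in ys)

p = 0.1
bsc = {0: {0: 1 - p, 1: p}, 1: {0: p, 1: 1 - p}}
print(bhattacharyya(bsc), 2 * math.sqrt(p * (1 - p)))  # both 0.6
```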

V. EXTENSIONS

A. Power of prime cardinality inputs

The coding scheme developed in this paper can also be used to construct a polar coding scheme (with a deterministic encoding matrix) over single-user channels with input cardinality equal to a power of 2 (or a power of a prime). Specifically, let W be a DMC with input alphabet cardinality 2^m and arbitrary output alphabet. Consider the virtual MAC W̄ obtained by mapping m independent uniform bits onto a uniform symbol of F_{2^m} used as the input of the channel W. Then W̄ is an m-user binary input MAC and the uniform sum rate of W̄ is the uniform mutual information of W. The MAC polar code construction of this paper then defines a polar code construction for W, where the transformation of the inputs over F_{2^m} is done by using the field addition of F_2^m. More precisely, from U^n i.i.d. uniform over F_{2^m}, one constructs X^n = U^n Gn, where Gn is as defined previously (and as in [3]), but where the addition operation used in the multiplication of U^n and Gn is the one of F_2^m (which corresponds to the componentwise addition of the representation of Ui in F_2^m).

³The conditions required in Theorem 3 of [4] are indeed weaker than what we have here.

Note that this provides a deterministic construction and an alternative to the scheme proposed in [3] for power of prime cardinality alphabets, which uses a randomized construction. Using the result in this paper, one then obtains that the virtual channels obtained with this polarization construction do not have a uniform mutual information which is close to either 0 or m, but one which is close to an integer in {0, 1, . . . , m}. This allows one to define a polar coding scheme where partial information about the inputs Ui (corresponding to subsets of the components) can be recovered on each extremal channel. Since the uniform sum rate of W̄ is preserved through the transformation, the uniform mutual information of W is also preserved, and the resulting polar coding scheme achieves the uniform mutual information of W. Since m can be arbitrarily large, this provides an alternative way to construct capacity achieving polar coding schemes on arbitrary DMCs (using a large enough input cardinality to approximate a non-uniform input distribution).

Similarly, for a MAC with m users and q-ary input alphabets, where q = 2^k, we can split each user into k virtual users with binary inputs and use the polar code construction of this paper to achieve the uniform sum rate. Furthermore, if an m-user q-ary input MAC requires a certain distribution to achieve the (true) sum rate, then we can split each user into multiple virtual users with binary inputs to approximate the given distribution and thus achieve the sum capacity of an arbitrary MAC.

This can be further generalized to arbitrary prime powers, i.e., to arbitrary finite fields for the input alphabet, although the power of 2 case may be particularly interesting for complexity considerations. Indeed, in that case, one may use FFT-like algorithms to reduce the decoding complexity from O(q² · n log n) (general estimate) to O(q log q · n log n), where q = 2^m is the input cardinality.

B. Polar coding for the AWGN channel

One can use the results of Section IV to construct capacity-achieving codes for the AWGN channel using a quantization scheme for the input distribution. For example, by transmitting the standardized average of i.i.d. binary random variables, scaled to satisfy the power constraint, the receiver observes

Y = (2√p / √m) Σ_{i=1}^m (Xi − 1/2) + Z,

where Z is Gaussian distributed. We can view this channel as being an m-user BMAC, (X1, . . . , Xm) → Y, and the polar code constructed in this paper can be used to communicate over this channel. From the central limit theorem, by taking m arbitrarily large, the input distribution of the previous scheme becomes arbitrarily close to a Gaussian distribution, and hence this coding scheme can achieve rates arbitrarily close to the AWGN capacity. To ensure that this scheme provides a 'low encoding and decoding complexity code' for the AWGN channel, one has to make further complexity considerations when taking m arbitrarily large. First, the decoder must recover an m-dimensional binary vector over each extremal MAC, and the total (maximal) number of hypotheses is 2^m. For this, the


decoder can proceed with each of the m users individually (reducing the problem to m successive hypothesis tests), by using the marginalized single-user channel between one user and the output, which is an extremal channel in the single-user sense. Also, one maximal independent set of users needs to be identified for each extremal MAC, to know where the information bits should be sent. There is no need to check exponentially many sets for this purpose, since this is achieved in at most m steps by using a greedy algorithm that checks the independence of a given set and increases the set by one element at each step (starting with the empty set). Polar coding schemes for the AWGN channel have been further investigated in [2].
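The input quantization itself is easy to simulate; the sketch below (ours, assuming numpy, with illustrative parameter names p for the power and sigma for the noise level) draws the channel input S and checks that it has mean 0 and variance p, approaching a Gaussian as m grows.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p, sigma = 16, 100_000, 1.0, 0.5

# m i.i.d. uniform bits per channel use, mapped to the standardized average.
X = rng.integers(0, 2, size=(n, m))
S = 2 * np.sqrt(p) / np.sqrt(m) * (X - 0.5).sum(axis=1)
Y = S + sigma * rng.normal(size=n)  # the AWGN output

print(round(S.mean(), 3), round(S.var(), 3))  # ~0.0 and ~p
```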

VI. DISCUSSION

We have constructed a polar code for the MAC with arbitrarily many users, which preserves the properties (complexity, error probability decay) of the polar code constructions in [3], [19]. The polarization technique brings an interesting perspective on the MAC problem: by polarizing the MACs for each user separately, we create a collection of extremal MACs which are "trivial" to communicate over, both regarding how to handle noise (noiseless or pure noise) and regarding how to handle interference (which is, modulo synchronization in the code, removed). We have also shown that the extremal MACs are in a one-to-one correspondence with the linear deterministic MACs, i.e., MACs whose outputs are linear forms of the inputs. The polar code constructed in this paper is shown to achieve only a portion of the dominant face of the MAC region, which is however sufficient to achieve the uniform sum rate on any binary input MAC. There are examples of non-extremal MACs where the polar code described in this paper can achieve rates in the entire uniform rate region; for example, this is the case for a 2-user MAC whose output is X1 + X2 with probability half and (X1, X2) with probability half. In general, this may not be the case.

APPENDIX

In this section, we prove Theorem 4 and Theorem 5. We first need an auxiliary lemma.

Lemma 10. Let W be a binary MAC with 2 users. Let X[E2] have i.i.d. uniform binary components and let Y be the output of W when X[E2] is sent. If I(X[1]; Y X[2]), I(X[2]; Y X[1]) and I(X[1] X[2]; Y) have specified integer values, then I(X[1]; Y), I(X[2]; Y) and I(X[1] + X[2]; Y) have specified values in {0, 1}.

Proof: Let

I := [I(X[1]; Y X[2]), I(X[2]; Y X[1]), I(X[1] X[2]; Y)],
J := [I(X[1]; Y), I(X[2]; Y), I(X[1] + X[2]; Y)].

Note that by the polymatroid property of the mutual information, we have

I ∈ {[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1], [1, 1, 2]}.   (22)

Let y ∈ Supp(Y) and for any x ∈ F_2^2 define P(x|y) = W(y|x) / Σ_{z∈F_2^2} W(y|z) (recall that W is the MAC with inputs X[1], X[2] and output Y). Assume w.l.o.g. that p0 := P(0, 0|y) > 0.
• If I = [0, 0, 0], we clearly must have J = [0, 0, 0].
• If I = [?, 1, 1], we have I(X[2]; Y X[1]) = 1 and we can

inputs X[1], X[2] and output Y ). Assume w.l.o.g. that p0 :=P(0, 0|y) > 0.• If I = [0, 0, 0] we clearly must have J = [0, 0, 0].• If I = [?, 1, 1], we have I(X[2];Y X[1]) = 1 and we can

determine X[2] by observing X[1] and Y , which implies

P(01|y) = 0.

Moreover, since I(X[1];Y ) = I(X[1]X[2];Y ) −I(X[2];Y X[1]) = 0, i.e., X[1] is independent of Y , wemust have that

∑x[2] P(x[1]x[2]|y) is uniform, and hence,

P(00|y) = 1/2, P(10|y) + P(11|y) = 1/2.

Now, if ? = 1, by a symmetric argument as before,we must have P(11|y) = 1/2 and hence the inputpairs 00 and 11 have each probability half (a similarsituation occurs when assuming that P(x|y) > 0 forx 6= (0, 0)), and we can only recover X[1] + X[2] fromY , i.e., J = [0, 0, 1]. If instead ? = 0, we then haveI(X[2];Y ) = I(X[1]X[2];Y ) − I(X[1];Y X[2]) = 1and from a realization of Y we can determine X[2], i.e.,P(10) = 1/2 and J = [0, 1, 0].

• If I = [1, 0, 1], by symmetry with the previous case, we have J = [1, 0, 0].
• If I = [1, 1, 2], we can recover all inputs from Y, hence J = [1, 1, 1].

Proof of Theorem 4: Let I[S](W) be assigned an integer for any S ⊆ Em. By the chain rule of mutual information,

I(X[Em]; Y) = I(X[S]; Y) + I(X[S^c]; Y X[S]),

and we can determine I(X[S]; Y) for any S. Since for any T ⊆ S,

I(X[S]; Y) = I(X[T]; Y) + I(X[S − T]; Y X[T]),

we can also determine I(X[S]; Y X[T]) for any S, T ⊆ Em with S ∩ T = ∅. Hence, we can determine

I(X[1], X[2]; Y X[S]),
I(X[1]; Y X[S] X[2]),
I(X[2]; Y X[S] X[1]),

and thus, using Lemma 10, we can determine

I(X[1] + X[2]; Y X[S])

for any S ⊆ Em with S ∩ {1, 2} = ∅, hence

I(X[i] + X[j]; Y)

for any i, j ∈ Em.
Assume now that we have determined I(Σ_{i∈T} X[i]; Y X[S]) for any T with |T| ≤ k and S ⊆ Em − T. Let T = {1, . . . , k} and let S ⊆ {k + 2, . . . , m}.

I(Σ_{i∈T} X[i], X[k + 1]; Y X[S])
  = I(X[k + 1]; Y X[S]) + I(Σ_{i∈T} X[i]; Y X[S] X[k + 1]),


in particular, we can determine

I(X[k + 1]; Y ∑_T X[i], X[S]) = I(∑_T X[i], X[k + 1]; Y X[S]) − I(∑_T X[i]; Y X[S])

and hence we know the three quantities

I(∑_T X[i], X[k + 1]; Y X[S]),
I(∑_T X[i]; Y X[S] X[k + 1]),
I(X[k + 1]; Y ∑_T X[i], X[S]),

and using Lemma 10 (applied to the 2-user MAC with inputs ∑_T X[i], X[k + 1] and output (Y, X[S])), we can determine

I(∑_T X[i] + X[k + 1]; Y X[S]),

hence

I(∑_T X[i]; Y)

for any T ⊆ Em with |T| = k + 1. Hence, iterating this argument, we can determine I(∑_T X[i]; Y) for any T ⊆ Em.

Note that the values of these mutual informations must be consistent; for example, if I(X[1] + X[2]; Y) = 1 and I(X[1] + X[3]; Y) = 1, we must have I(X[2] + X[3]; Y) = 1. In general, if I(∑_{T1} X[i]; Y) = 1 and I(∑_{T2} X[i]; Y) = 1, we must have I(∑_{T3} X[i]; Y) = 1 for T3 = T1 ⊕ T2 (the symmetric difference of the two sets). Hence, a given MAC induces a given function f(·), which in turn induces a given collection of subsets T for which I(∑_T X[i]; Y) = 1.

These subsets are closed under ⊕ and can be generated by a minimal collection of subsets, which form the rows of a matrix A satisfying I(AX[Em]; Y) = rank(A). Note that there are several possible minimal collections of subsets and hence several possible matrices A (all obtained from any one of them by elementary row operations). Moreover, f(J) = rank(A[J]), where A[J] denotes the columns of A indexed by J. Hence, rank(A) = I(X[Em]; Y). For example, if m = 3, f(1) = f(2) = f(3) = 1, f(1, 2) = f(1, 3) = 2, f(2, 3) = 1 and f(1, 2, 3) = 2, then g(S) = I(∑_S X[i]; Y) is determined by f, and it is easy to check that in this case g(S) = 1 if and only if S ∈ {{1}, {2, 3}, {1, 2, 3}}; hence {{1}, {2, 3}} is a minimal collection, which leads to

A = ( 1 0 0
      0 1 1 ).
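This example can be verified mechanically. The following sketch (an illustration of ours; the helper names are assumptions) computes rank(A[J]) over F_2 for every J ⊆ {1, 2, 3}, checks it against the stated values of f, and lists the subsets S with g(S) = 1 as the supports of the nonzero row-space vectors of A:

from itertools import combinations

A = [(1, 0, 0), (0, 1, 1)]  # rows of the matrix A from the example

def gf2_rank(rows):
    """Rank over F_2 of a matrix given as a list of integer bit-masks."""
    rank = 0
    rows = list(rows)
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def submatrix_rank(J):
    """rank(A[J]): keep only the columns of A indexed by J (1-based)."""
    masks = []
    for row in A:
        mask = 0
        for i, j in enumerate(J):
            mask |= row[j - 1] << i
        masks.append(mask)
    return gf2_rank(masks)

# Values of f from the example.
f = {(1,): 1, (2,): 1, (3,): 1,
     (1, 2): 2, (1, 3): 2, (2, 3): 1, (1, 2, 3): 2}
for k in (1, 2, 3):
    for J in combinations((1, 2, 3), k):
        assert submatrix_rank(J) == f[J]
print("f(J) = rank(A[J]) for every J; rank(A) =", submatrix_rank((1, 2, 3)))

# Subsets S with g(S) = 1 <-> nonzero row-space vectors of A.
rowspace = {0}
for r in (0b001, 0b110):  # rows of A as bit-masks, bit i-1 <-> column i
    rowspace |= {v ^ r for v in rowspace}
subsets = [tuple(i + 1 for i in range(3) if v >> i & 1)
           for v in sorted(rowspace) if v]
print(subsets)  # [(1,), (2, 3), (1, 2, 3)]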

In order to prove the “approximate” version of Theorem 4, i.e., Theorem 5, we need the following lemma, which is a corollary of Lemma 33 in [19] (Lemma 33 treats only one of the cases assumed in Lemma 11, but it is the only non-trivial case, as in the proof of Lemma 10). The proof of Theorem 5 then follows from Lemma 11 and the proof of Theorem 4.

Lemma 11. Let W be a binary MAC with 2 users. Let X[E2] have i.i.d. uniform binary components and let Y be the output of W when X[E2] is sent. If I(X[1]; Y X[2]), I(X[2]; Y X[1]) and I(X[1]X[2]; Y) are each within ε of specified integer values, then I(X[1]; Y), I(X[2]; Y) and I(X[1] + X[2]; Y) have specified values outside (γ(ε), 1 − γ(ε)), with γ(ε) → 0 as ε → 0.
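As a numerical illustration of Lemma 11 (a sketch of ours; the test channel and helper names are assumptions, not from [19]), consider the pure contention MAC observed through noise: Y = X[1] + X[2] + N over F_2 with P(N = 1) = δ. For small δ, the I-vector is within ε = h(δ) of [1, 1, 1], and the J-vector is correspondingly close to [0, 0, 1]:

from collections import defaultdict
from math import log2

def mi(joint):
    """Mutual information I(A; B) in bits, from a dict {(a, b): prob}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

delta = 0.01  # crossover probability of the added noise
joint = defaultdict(float)  # law of (X[1], X[2], Y), uniform inputs
for x1 in (0, 1):
    for x2 in (0, 1):
        for n, pn in ((0, 1 - delta), (1, delta)):
            joint[(x1, x2, (x1 + x2 + n) % 2)] += pn / 4

def pair_law(f):
    out = defaultdict(float)
    for (x1, x2, y), p in joint.items():
        out[f(x1, x2, y)] += p
    return out

I_vec = [mi(pair_law(lambda a, b, y: (a, (y, b)))),
         mi(pair_law(lambda a, b, y: (b, (y, a)))),
         mi(pair_law(lambda a, b, y: ((a, b), y)))]
J_vec = [mi(pair_law(lambda a, b, y: (a, y))),
         mi(pair_law(lambda a, b, y: (b, y))),
         mi(pair_law(lambda a, b, y: ((a + b) % 2, y)))]
print(I_vec)  # ~[0.919, 0.919, 0.919]: within eps = h(0.01) of [1, 1, 1]
print(J_vec)  # ~[0.0, 0.0, 0.919]: entries outside (gamma, 1 - gamma)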

REFERENCES

[1] E. Abbe, Mutual Information, Matroids and Extremal Channels, available on arXiv:1012.4755v1, 2010.
[2] E. Abbe and A. Barron, Achieving the capacity of the AWGN channel with a polar coding scheme, International Symposium on Information Theory (ISIT), Saint Petersburg, July 2011.
[3] E. Arıkan, Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels, IEEE Trans. Inform. Theory, vol. IT-55, pp. 3051–3073, July 2009.
[4] E. Arıkan and E. Telatar, On the rate of channel polarization, in Proc. 2009 IEEE Int. Symp. Inform. Theory, Seoul, pp. 1493–1495, 2009.
[5] J. Edmonds, Submodular functions, matroids and certain polyhedra, Lecture Notes in Computer Science, vol. 2570, pp. 11–26, Springer, 2003.
[6] S. Fujishige, Polymatroidal dependence structure of a set of random variables, Information and Control, vol. 39, pp. 55–72, 1978.
[7] R. Gallager, Information Theory and Reliable Communication, John Wiley & Sons, 1968.
[8] J. F. Geelen, A. M. H. Gerards, and A. Kapoor, The excluded minors for GF(4)-representable matroids, J. Combin. Theory Ser. B, vol. 79, no. 2, pp. 247–299, 2000.
[9] T. S. Han, The capacity region of general multiple-access channel with certain correlated sources, Information and Control, vol. 40, pp. 37–60, 1979.
[10] T. S. Han, A uniqueness of Shannon’s information distance and related nonnegativity problems, J. Comb., Inform. Syst. Sci., vol. 6, no. 4, pp. 320–331, 1981.
[11] B. Hassibi and S. Shadbakht, A construction of entropic vectors, ITA Workshop at UCSD, San Diego, February 2007.
[12] L. Lovász, Submodular functions and convexity, in Mathematical Programming – The State of the Art, A. Bachem, M. Grötschel, and B. Korte, Eds. Berlin: Springer-Verlag, 1982, pp. 234–257.
[13] F. Matúš, Probabilistic conditional independence structures and matroid theory: background, Int. J. of General Systems, vol. 22, pp. 185–196.
[14] J. Oxley, Matroid Theory, Oxford Science Publications, New York, 1992.
[15] R. E. Bixby, On Reid’s characterization of the ternary matroids, J. Combin. Theory Ser. B, vol. 26, no. 2, pp. 174–204, 1979.
[16] W. T. Tutte, A Homotopy Theorem for Matroids, Transactions of the American Mathematical Society, vol. 88, no. 1, pp. 144–160, 1958.
[17] E. Şaşoğlu, private communication.
[18] E. Şaşoğlu, E. Telatar, and E. Arıkan, Polarization for arbitrary discrete memoryless channels, arXiv:0908.0302v1 [cs.IT], August 2009.
[19] E. Şaşoğlu, E. Telatar, and E. Yeh, Quasi-polarization for the two user binary input multiple access channel, IEEE Information Theory Workshop, Cairo, January 2010.
[20] P. D. Seymour, Matroid representation over GF(3), J. Combin. Theory Ser. B, vol. 26, no. 2, pp. 159–173, 1979.
[21] A. D. Wyner and J. Ziv, A theorem on the entropy of certain binary sequences and applications (Part I), IEEE Trans. Inform. Theory, vol. 19, pp. 769–777, 1973.
[22] Z. Zhang and R. Yeung, On characterization of entropy function via information inequalities, IEEE Trans. Inform. Theory, vol. 44, no. 4, pp. 1440–1452, 1998.

Emmanuel Abbe received his M.S. degree in 2003 from the Mathematics Department, Ecole Polytechnique Federale de Lausanne, Switzerland, and his Ph.D. degree in 2008 from the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge. Since 2008, he has been working as a postdoctoral fellow and lecturer in the School of Communication and Computer Sciences, Ecole Polytechnique Federale de Lausanne, and as a research affiliate at MIT. His research interests include information and coding theory, communications, and combinatorial probability. He received the CVCI award for his master thesis in Mathematics at the Ecole Polytechnique Federale de Lausanne in 2003, and the Foundation Latsis International Prize for his postdoctoral research at EPFL in 2011.


Emre Telatar (S’88 – M’91 – SM’11 – F’12) received the B.Sc. degree in electrical engineering from the Middle East Technical University, Ankara, Turkey, in 1986 and the S.M. and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology, Cambridge, in 1988 and 1992, respectively.

From 1992 to 1999, he was with the Mathematical Sciences Research Center, AT&T Bell Laboratories, Murray Hill, NJ. Since 1999, he has been a Professor at the Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland. His research interests are in communication and information theories.

Dr. Telatar was the recipient of the IEEE Information Theory Society Paper Award in 2001. He was a Program Co-Chair for the IEEE International Symposium on Information Theory in 2002, and an Associate Editor for Shannon Theory for the IEEE Transactions on Information Theory from 2001 to 2004.