
Low-Complexity Probabilistic Shaping for Coded Modulation



Vincent Corlay and Nicolas Gresset, Mitsubishi Electric R&D Centre Europe, Rennes, France. E-mail: [email protected].

Abstract—We introduce a new method for probabilistic shaping of coded modulation. We show that, for a large range of information rates, most of the shaping gain can be obtained via the use of only two non-uniform binary sources. This result is achieved by considering non-Maxwell-Boltzmann distributions of the symbols.

Index Terms—Shaping, low complexity, coded modulation.

I. INTRODUCTION

The channel capacity characterizes the highest information rate that can be achieved for a fixed average transmit power while maintaining a small error probability. For the Gaussian channel, the channel capacity cannot be reached if each symbol of commonly used constellations, such as the amplitude-shift keying (ASK) constellation, is transmitted with equal probability. As a result, the transmitter should process the data such that the symbols of the constellation are transmitted with probabilities which enable the capacity to be approached. This operation is called shaping. More precisely, it is called probabilistic shaping, as opposed to geometric shaping1 [6].

We enumerate several existing methods to realize probabilistic shaping. Note that, for the Gaussian channel, it is well known that a Maxwell-Boltzmann (MB) distribution of the symbols yields quasi-optimal performance.

Trellis shaping consists in encoding the most significant bit used for the labelling of the ASK constellation with a convolutional code [8]. The shaping code and shaping decoder are designed such that the average energy of the constellation is minimized (i.e., they approximate a MB distribution). Trellis shaping is applied in the famous paper [13].

Sign-bit shaping is a special case of trellis shaping, also described in [8]. It was recently studied in [3], where the authors considered polar codes instead of convolutional codes for the shaping operation.

Lattice-based shaping is a similar method where a lattice is used for shaping [7]. The transmitted codewords are selected within the Voronoi region of the shaping lattice. If the Voronoi region is almost spherical, then the induced marginal distribution of the symbols on each dimension is close to a MB distribution.

The paper [5] proposes a bit-wise multi-level distribution matching scheme. The bit levels are shaped successively depending on the output of the previous distribution matcher (DM). Both Gray labelling and natural labelling are considered. The authors focus on fitting a MB distribution. They state that “All bit-levels need to be shaped if natural labelling is employed”.

1The main idea of geometric shaping is to change the positions of the symbols in the constellation without changing the probability distribution.


Fig. 1. Performance of the proposed shaping method, implemented via the system depicted in Figure 4, combined with multi-level polar coding. The scheme is evaluated for a block error rate of 10^-2 via Monte Carlo simulations. The block length is n = 1024 symbols. We also show the performance of the coded system without shaping for comparison. P = 2 means that the number of distinct non-equiprobable binary sources is limited to 2. Moreover, the parameter p_i of these two Bernoulli(p_i) sources is fixed, namely p_1 = 0.1 and p_2 = 0.3.

The paper [9] discusses how to realize the DM of a MB distribution with several binary DMs. The authors propose an algorithm to find the marginal distribution of each bit level such that the joint distribution, after mapping, is a MB distribution. Similarly, [11] introduces “product DM”, where each bit level is shaped independently such that the output symbol has a MB distribution.

In [2], the authors combine the probabilistic amplitude shaping scheme, which concatenates a DM with a systematic encoder for an error-correction code, with constant composition distribution matching (CCDM) [10] used as a symbol-wise DM. The CCDM enables fitting the positive side of a MB distribution.

As shown by this short review of existing works, most papers on probabilistic shaping focus on fitting a MB distribution. Nevertheless, other distributions also enable approaching the capacity while being easier to implement. Indeed, in this letter we propose a new shaping method that does not rely on the MB distribution. We consider distributions that are convenient to implement while maintaining quasi-optimal performance. The block error rate of a multi-level coded scheme with the proposed method is shown in Figure 1 (see also Section IV-B).

Note that this new scheme is not a method to build a new DM (such as CCDM [10] or polar code-based DM [4]) but an efficient manner of combining binary DMs, or more specifically binary sources with different probabilities, in order to realize the shaping operation.

arXiv:2109.15135v1 [cs.IT] 30 Sep 2021


II. PRELIMINARIES

The Gaussian channel is expressed as

Y = X + Z, (1)

where X is the channel input with alphabet X and Z is the noise with distribution N(0, σ^2). The signal-to-noise ratio (SNR) is defined as SNR = E[|X|^2]/E[|Z|^2]. The alphabet X considered in this paper is an M-ASK constellation. The symbols of this constellation, where M = 2^m, are

X = {−(2^m − 1), ..., −3, −1, +1, +3, ..., +(2^m − 1)}. (2)

Consequently, each symbol can be represented by a sequence of m = log2 M bits. Note that an M^2-QAM can be obtained as the Cartesian product of two M-ASK constellations.
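As a small illustration (our own sketch, not part of the paper), the alphabet of (2) and the natural labelling of Figure 2 (less significant bits on the left) can be written as follows; `ask_alphabet` and `natural_map` are hypothetical helper names:

```python
def ask_alphabet(m):
    # Symbols of the 2^m-ASK constellation, eq. (2): {-(2^m-1), ..., -1, +1, ..., +(2^m-1)}.
    M = 2 ** m
    return [2 * i + 1 - M for i in range(M)]

def natural_map(bits):
    # Natural labelling as on Figure 2: bits = (b_1, ..., b_m), less significant
    # bit first, mapped through the decimal value D(b_1, ..., b_m) = sum_i 2^(i-1) b_i.
    d = sum(b << i for i, b in enumerate(bits))
    return ask_alphabet(len(bits))[d]
```

With m = 3 this reproduces the labelling row of Figure 2, e.g. the label 110 maps to −1 and 001 to +1.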

Let p∗(x) be the distribution of the input X that maximizes the mutual information (MI) for the ASK constellation

p∗(x) = argmax_{p(x), E[X^2] ≤ P} I(X;Y), (3)

where P is the maximum average power. We also define p∗(x) as the set of quasi-optimal distributions, i.e.,

p(x) ∈ p∗(x) if I(X;Y) ≥ max_{p(x), E[X^2] ≤ P} I(X;Y) − ε, (4)

where ε is a quantity whose magnitude depends on the target system. The aim of shaping is to process the input such that its probability distribution maximizes, or almost maximizes, the MI I(X;Y). In other words, the distribution of the input should be in p∗(x).

For the M-ASK constellation, p(x) is often chosen as the MB distribution (e.g., in [13][5]). Indeed, the performance obtained is close to that obtained with p∗(x), i.e., p_MB(x) ∈ p∗(x) for ε small. However, implementing this MB distribution is not trivial and may involve a high complexity (see the references mentioned in Section I). Moreover, as shown below, other distributions also yield quasi-optimal performance.
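For concreteness, here is a minimal sketch (ours, not from the paper) of the MB family on the ASK alphabet, p(x) proportional to exp(−λx^2); the operating point (power constraint) fixes the free parameter λ:

```python
import math

def mb_distribution(m, lam):
    # Maxwell-Boltzmann distribution on the 2^m-ASK alphabet: p(x) ∝ exp(-lam * x^2).
    # lam is a hypothetical free parameter, to be tuned to meet the power constraint.
    x = list(range(-(2 ** m - 1), 2 ** m, 2))
    w = [math.exp(-lam * s * s) for s in x]
    z = sum(w)
    return x, [wi / z for wi in w]
```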

III. PROPOSED METHOD

A. Description

Given an M-ASK constellation, we propose to limit the set of possible distributions to the ones that can be expressed as follows:

∀ 1 ≤ i ≤ M/2, p(x_{i+M/2})/α = 1 − p(x_i)/α, (5)

where x_i ∈ X and α = 1/(M/2) is a scaling constant to have the sum of the probabilities equal to 1. Let us denote the variable p(x_i)/α by p_i. Since p∗(x) is symmetric, we set

p_i = 1 − p_{M/2−i+1}, 1 ≤ i ≤ M/4. (6)

An illustration is provided with the 8-ASK in Figure 2.

Fig. 2. Illustration of the proposed distribution with the 8-ASK. Natural labelling, with the less significant bits on the left, is also shown: the symbols −7, −5, −3, −1, +1, +3, +5, +7 carry the labels 000, 100, 010, 110, 001, 101, 011, 111, respectively.

Consequently, the MI is now optimized with respect to the set {p_i}, 1 ≤ i ≤ M/4:

{p_i}∗ = argmax_{{p_i}, E[X^2] ≤ P} I(X;Y). (7)

Of course, the greater the number of distinct p_i, the greater the implementation complexity: As we shall see in Section III-B, each distinct p_i requires a distinct binary source. Consequently, we can force adjacent symbols to have the same probability. We let P be the number of distinct p_i. As an example, Figure 3 shows how P changes the optimized distribution (according to (7)) for the 32-ASK.
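The optimization (7) can be explored numerically. Below is a sketch (our own code, not the authors') that evaluates I(X;Y) in bits for the distribution defined by (5)-(6), given the first M/4 parameters p_i, by numerical integration over y; the power constraint of (7) is not enforced here:

```python
import numpy as np

def mi_ask(p_half, sigma):
    # I(X;Y) for a 2^m-ASK whose distribution follows (5)-(6).
    # p_half = (p_1, ..., p_{M/4}); the remaining p_i follow from the symmetry (6).
    p_first = np.concatenate([p_half, 1.0 - p_half[::-1]])   # p_1, ..., p_{M/2}
    M = 4 * len(p_half)
    x = np.arange(-(M - 1), M, 2, dtype=float)               # ASK symbols, eq. (2)
    alpha = 1.0 / (M // 2)
    p = alpha * np.concatenate([p_first, 1.0 - p_first])     # p(x_i), per (5)
    y = np.linspace(x.min() - 8 * sigma, x.max() + 8 * sigma, 4000)
    lik = np.exp(-(y[None, :] - x[:, None]) ** 2 / (2 * sigma ** 2)) \
        / (np.sqrt(2 * np.pi) * sigma)                       # p(y | x_i)
    py = p @ lik                                             # p(y)
    ratio = np.log2(np.maximum(lik, 1e-300) / np.maximum(py, 1e-300))
    integrand = p @ (lik * ratio)        # sum_x p(x) p(y|x) log2(p(y|x)/p(y))
    return float(np.sum(integrand) * (y[1] - y[0]))
```

A grid search or any off-the-shelf optimizer over {p_i} can then approximate (7) for a given noise level.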

Fig. 3. Optimized distribution at 24 dB for a 32-ASK with P = 1, P = 2, and P = 4, respectively.

In Section IV, we show that even for an M-ASK with large M, p(x) ∈ p∗(x) if P is small.

B. Implementation of the proposed method

Let us first divide the M-ASK constellation into M/2 sets of two elements containing the i-th and (i + M/2)-th symbols, 1 ≤ i ≤ M/2. E.g., with the example of Figure 2, we have a first set with the symbols {−7, 1}, a second set with the symbols {−5, 3}, etc. The proposed shaping method requires two steps:

• Step 1: One of the M/2 sets is randomly selected.
• Step 2: The first symbol of the set is then transmitted with probability p_i and the second one with probability 1 − p_i.

1) First implementation: Let us define the function D(b_1, b_2, ..., b_m) = Σ_{i=1}^{m} 2^{i−1} · b_i, which returns the decimal value of a binary sequence. We assume that several binary sources S_0, S_1, S_2, ... are available, where S_0 generates i.i.d. bits with Bernoulli(1/2) probabilities, and S_i i.i.d. bits with Bernoulli(p_i) probabilities.

If natural labelling of the M-ASK is used (as in Figure 2), the shaping method can be implemented as illustrated in Figure 4: First, the binary source S_0 generates m − 1 bits b_1, b_2, ..., b_{m−1}. If 1 ≤ D(b_1, b_2, ..., b_{m−1}) + 1 ≤ M/4, a switch selects the source S_i of index i = D(b_1, b_2, ..., b_{m−1}) + 1. One bit b_m is then obtained from this source (and not flipped). If M/4 + 1 ≤ D(b_1, b_2, ..., b_{m−1}) + 1 ≤ M/2, a switch selects the source S_i of index i = M/2 − D(b_1, b_2, ..., b_{m−1}) and the bit b_m obtained is then flipped2.

2Consequently, at the receiver, the decoded shaping bit is flipped if M/4 + 1 ≤ D(b_1, b_2, ..., b_{m−1}) + 1 ≤ M/2, where b_1, b_2, ..., b_{m−1} represent the values of the decoded less significant bits.


The flipping trick, which exploits the symmetry described by (6), enables halving the number of distinct binary sources (i.e., we generate a Bernoulli(1 − p_i) source from a Bernoulli(p_i) source). This selection of the non-equiprobable binary source represents Step 1 above: Each of the M/2 sets of two symbols corresponds to a specific sequence b_1, ..., b_{m−1}. Then, a symbol mapper generates the symbol to be transmitted based on the values of the m bits. I.e., the first symbol of the set selected by the first m − 1 bits is transmitted if b_m = 0 and the second is transmitted if b_m = 1. This corresponds to Step 2 above.

If several adjacent symbols are constrained to have the same probability, i.e., P < M/4 and groups of (M/4)/P symbols have the same probability, then the switch selects the first source for 1 ≤ D(b_1, ..., b_{m−1}) + 1 ≤ (M/4)/P, the second source for 1 + (M/4)/P ≤ D(b_1, ..., b_{m−1}) + 1 ≤ 2 · (M/4)/P, etc.

Fig. 4. Shaping encoder with P = 2. The source S_0 (p = 1/2) generates m − 1 bits per symbol (the less significant bits, possibly coded), which drive the switch between S_1 (parameter p_1) and S_2 (parameter p_2); the selected bit b_m, possibly flipped, and the bits b_1, ..., b_{m−1} feed the symbol mapper with natural labelling.

2) Second implementation: If only one binary source is available and the bits should be processed packet-wise, the encoder can be adapted as follows. We consider the case P = 2. The extension to any P is straightforward.

Let H(p_i) denote the binary entropy with parameter p_i. First, (m − 1)k + (n/2)(H(p_1) + H(p_2)) bits are generated by the source S_0. (m − 1)k bits are encoded via an error-correcting code to yield (m − 1)n bits. In parallel, (n/2) · H(p_1) bits and (n/2) · H(p_2) bits are processed by two binary DMs, which output a sequence of n/2 bits where each bit is equal to 0 with probability p_1 and p_2, respectively. Each of the n sets of m − 1 bits, at the output of the error-correcting code, controls the switch which selects one bit at the output of the corresponding DM. The process is illustrated by Figure 5.

Fig. 5. Packet-wise shaping encoder with P = 2. The source S_0 (p = 1/2) generates (m − 1)k + (n/2)(H(p_1) + H(p_2)) bits; (m − 1)k bits are channel-encoded into (m − 1)n bits, while (n/2)H(p_1) and (n/2)H(p_2) bits feed binary DM 1 and binary DM 2, which each output n/2 bits. The switch, the flip, and the symbol mapper with natural labelling then produce the n symbols from the n shaping bits and the (m − 1)n coded bits.

Of course, it is possible that the switch requests n/2 + α bits from the first DM and n/2 − α from the second DM, where α is a random quantity. To address this issue, we simply add the rule that if one source has no more bits available, the switch selects a bit from the other source. In Appendix VI, we show that the impact of this rule on the performance is negligible.
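The impact of this rule can be checked empirically. The sketch below (ours; it assumes the switch selects each DM with probability 1/2, which holds for P = 2 with the flipping trick) measures the fraction of shaping bits that must be served by the other source:

```python
import random

def fallback_fraction(n, rng):
    # Packet of n symbols, P = 2: each symbol's uniform bits pick DM 1 or DM 2
    # with probability 1/2; each DM holds exactly n/2 bits. Returns the fraction
    # of shaping bits served by the "wrong" DM because of the fallback rule.
    budget = [n // 2, n // 2]
    wrong = 0
    for _ in range(n):
        pick = rng.randrange(2)
        if budget[pick] == 0:
            pick = 1 - pick          # fallback: the other source still has bits
            wrong += 1
        budget[pick] -= 1
    return wrong / n
```

Averaged over many packets with n = 1024, this fraction stays below a percent or so, consistent with the small energy losses reported in the Appendix.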

IV. PERFORMANCE

A. Mutual information

We investigate the impact of P on the performance for the 32-ASK constellation. Figure 6 reports the MI obtained with a 32-ASK when optimizing the distribution with P = 1, P = 2, and P = 4. We see that with P = 2 the loss is less than 0.1 dB for information rates R < 3 bits per channel use (bpcu) and less than 0.2 dB for R < 4 bpcu. Note that for low information rates even P = 1 yields satisfactory performance.

Moreover, for R < 4 bpcu we can take fixed values of the p_i (for any SNR) without significant performance loss. E.g., for P = 2, p_1 = 0.1 and p_2 = 0.3.

For the 8-ASK and 16-ASK constellations (see Figure 1), the loss with P = 2 is even smaller. For the 64-ASK, P = 2 yields a loss of approximately 0.2 dB at 5 bpcu.


Fig. 6. Impact of P on the MI with a 32-ASK. The curve “equiprobable inputs” is computed for a constellation having a large number of symbols (≫ 32).

B. Block error rate

Figure 1 reports the performance of the encoder of Figure 4, in terms of block error rate, where the less significant bits are encoded via multi-level polar coding with a block length n = 1024 (see Appendix VI-B for more details on this coding scheme). Both the achievable rates without shaping and with shaping are shown. The results are obtained via Monte Carlo simulations. We observe that the difference between the error-rate curves is similar to the difference between the MI curves with and without shaping. In other words, the shaping operation does not impact the performance of the error-correcting code.

V. CONCLUSIONS

In this paper, we introduced a new approach for probabilistic shaping of coded modulations. We considered non-MB distributions. The advantage of these distributions is that they are convenient to implement. Simulation results, both in terms of MI and block error rate, indicate that most of the shaping gain can be obtained using only two distinct non-uniform binary sources.


VI. APPENDIX

A. Impact of the switch

The impact of the switch can be computed as follows. The real probabilities of the binary sources (with the switch) are

p′_1 = [(n/2 − ε) · p_1 + ε · p_2] / (n/2) = p_1 + (2ε/n)(p_2 − p_1),

p′_2 = [(n/2 − ε) · p_2 + ε · p_1] / (n/2) = p_2 + (2ε/n)(p_1 − p_2), (8)

where ε is a constant obtained as follows. Let X be a binomial random variable which gives the probability to have k successes within n Bernoulli(1/2) trials, i.e.,

P(X = k) = (n choose k) · (1/2)^n. (9)

The parameter ε is computed as ε = E[k′], where k′ = 0 if k ≤ n/2 and k′ = k − n/2 if k ≥ n/2 + 1 (where k is the outcome of X), i.e.,

ε = Σ_{k=n/2+1}^{n} (k − n/2) · (n choose k) · (1/2)^n. (10)

Let E′_s be the energy of the constellation with the probabilities p′_i and E_s with the probabilities p_i. The energy loss due to the switch can be quantified as ∆ = 10 log10(E′_s/E_s). For the 8-ASK with the parameters p_1 = 0.1 and p_2 = 0.3, we find ∆ = 0.0099 dB for n = 2^12, ∆ = 0.0198 dB for n = 2^10, and ∆ = 0.0395 dB for n = 2^8.
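The computation above can be reproduced with a short script (our own sketch for the 8-ASK with P = 2; it evaluates ε exactly via (10), so the resulting ∆ should be of the same order as the values reported, though conventions may differ slightly):

```python
from math import comb, log10

def switch_energy_loss(p1, p2, n):
    # Energy loss Delta (dB) due to the switch, 8-ASK with P = 2.
    # eps follows (10); the perturbed source parameters follow (8).
    eps = sum((k - n // 2) * comb(n, k) for k in range(n // 2 + 1, n + 1)) / 2 ** n

    def energy(a, b):
        # Constellation energy with sets {x_i, x_{i+4}} of the 8-ASK and
        # (p_1, p_2, p_3, p_4) = (a, b, 1 - b, 1 - a) by the symmetry (6).
        x = [-7, -5, -3, -1, 1, 3, 5, 7]
        probs = [a, b, 1 - b, 1 - a]
        return sum(0.25 * (pi * x[i] ** 2 + (1 - pi) * x[i + 4] ** 2)
                   for i, pi in enumerate(probs))

    q1 = p1 + 2 * eps / n * (p2 - p1)
    q2 = p2 + 2 * eps / n * (p1 - p2)
    return 10 * log10(energy(q1, q2) / energy(p1, p2))
```

The loss decreases with the block length, as expected from the concentration of the binomial distribution around n/2.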

B. Multi-level polar coding

1) Multi-level coding: Using the chain rule, the MI between the input X and the output Y can be expressed as

I(X;Y) = I(B_1, B_2, ..., B_m; Y) = Σ_{i=1}^{m} I(B_i; Y | B_1, ..., B_{i−1}), (11)

where B_i denotes the random variable corresponding to the i-th bit of the labelling considered. One bit level refers to the channel described by I(B_i; Y | B_1, ..., B_{i−1}). When a binary code is used to transmit information over this i-th level, the coding rate should be chosen to match3 I(B_i; Y | B_1, ..., B_{i−1}). Consequently, the value of k in Figure 5 should be chosen as k = n/(m−1) · Σ_{i=1}^{m−1} I(B_i; Y | B_1, ..., B_{i−1}).

The value I(B_i; Y | B_1, ..., B_{i−1}) can be computed as follows. Let S be the set of symbols of the constellation obtained with B_i = 0 and S′ with B_i = 1, given that B_1 = b_1, ..., B_{i−1} = b_{i−1}. Given a received symbol y, the log-likelihood ratio (LLR) of the channel is given by

L(y) = log [ p(y | x ∈ S) / p(y | x ∈ S′) ] = log [ Σ_{x′∈S} p(y|x′)p(x′) / Σ_{x′′∈S′} p(y|x′′)p(x′′) ]. (12)

3In practice, a back-off which depends on the code used is applied.

Since the bits on the first levels remain equiprobable, we have p(B_i = 1 | B_1 = b_1, ..., B_{i−1} = b_{i−1}) = p(B_i = 0 | B_1 = b_1, ..., B_{i−1} = b_{i−1}) = 0.5 and the MI is computed as

I(Y; B_i | B_1, ..., B_{i−1}) = 1 − E_{B_1,...,B_{i−1}} E_{Y|B_i=0} [ log2(1 + e^{−L(y)}) ]. (13)
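As a sketch of how (12) and (13) can be evaluated in practice (our own Monte Carlo code, for the first bit level of the 8-ASK with equiprobable priors, so that the p(x′) factors cancel in (12)):

```python
import math, random

def mi_level1_8ask(sigma, trials, rng):
    # Natural labelling of Figure 2: b_1 = 0 labels {-7, -3, +1, +5},
    # b_1 = 1 labels {-5, -1, +3, +7}.
    S0, S1 = [-7, -3, 1, 5], [-5, -1, 3, 7]
    g = lambda y, x: math.exp(-(y - x) ** 2 / (2 * sigma ** 2))  # ∝ p(y|x)
    acc = 0.0
    for _ in range(trials):
        x = rng.choice(S0)                      # condition on B_1 = 0, as in (13)
        y = x + rng.gauss(0.0, sigma)
        L = math.log(sum(g(y, s) for s in S0) / sum(g(y, s) for s in S1))  # (12)
        acc += math.log2(1.0 + math.exp(-L))
    return 1.0 - acc / trials                   # (13)
```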

Regarding the last bit level used for shaping, we have p(B_m = 1 | B_1 = b_1, ..., B_{m−1} = b_{m−1}) = p_i and p(B_m = 0 | B_1 = b_1, ..., B_{m−1} = b_{m−1}) = 1 − p_i, and the MI is computed as

I(Y; B_i | B_1, ..., B_{i−1}) = E_{B_1,...,B_{i−1}} [ H(B_i | B_1 = b_1, ..., B_{i−1} = b_{i−1}) − E_{Y|B_i=0} [ p · log2(1 + e^{−L(y)}(1 − p)/p) + (1 − p) · log2(1 + e^{−L(y)} · p/(1 − p)) ] ]. (14)

For the range of SNR considered, the shaping level does not need to be coded: The mutual information equals the entropy, i.e., the channel is “clean”.

2) Polar coding: Multi-level polar coding consists in coding each bit level with a polar code of rate I(Y; B_i | B_1, ..., B_{i−1}) [1]. List decoding as presented in [12] is considered to decode the polar codes. Note that in our implementation the path metrics are passed between the layers and only one CRC, added to the last level, is used.

REFERENCES

[1] E. Arıkan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Information Theory, vol. 55, no. 7, pp. 3051-3073, Jul. 2009.

[2] G. Bocherer, F. Steiner, and P. Schulte, “Bandwidth Efficient and Rate-Matched Low-Density Parity-Check Coded Modulation,” IEEE Trans. Communications, vol. 63, no. 12, pp. 4651-4665, Dec. 2015.

[3] R. Bohnke, O. Iscan, and W. Xu, “Sign-Bit Shaping Using Polar Codes,” Trans. Emerging Tel. Tech., vol. 31, no. 10, pp. 31-36, 2020.

[4] R. Bohnke, O. Iscan, and W. Xu, “Polar coded distribution matching,” Electronics Letters, vol. 55, no. 9, pp. 537-539, May 2019.

[5] R. Bohnke, O. Iscan, and W. Xu, “Multi-Level Distribution Matching,” IEEE Communications Letters, vol. 24, no. 9, pp. 2015-2019, Sept. 2020.

[6] J. J. Boutros, U. Erez, J. Van Wonterghem, G. I. Shamir, and G. Zemor, “Geometric shaping: low-density coding of Gaussian-like constellations,” in Proc. Information Theory Workshop, pp. 1-5, Nov. 2018.

[7] D. Forney, “Multidimensional Constellations-Part II: Voronoi Constellations,” IEEE Journal on Selected Areas in Communications, vol. 7, no. 6, pp. 941-958, Aug. 1989.

[8] D. Forney, “Trellis Shaping,” IEEE Trans. Information Theory, vol. 38,no. 2, pp. 281-300, March 1992.

[9] M. Pikus and W. Xu, “Bit-level Probabilistically Shaped Coded Modulation,” IEEE Communications Letters, vol. 21, no. 9, pp. 1929-1932, Sept. 2017.

[10] P. Schulte and G. Bocherer, “Constant Composition Distribution Matching,” IEEE Trans. Information Theory, vol. 62, no. 1, pp. 430-434, Jan. 2016.

[11] F. Steiner, P. Schulte, and G. Bocherer, “Approaching waterfilling capacity of parallel channels by higher order modulation and probabilistic amplitude shaping,” in Proc. 52nd Annu. Conf. Inf. Sci. Syst., pp. 1-6, Mar. 2018.

[12] I. Tal and A. Vardy, “List Decoding of Polar Codes,” IEEE Trans. Information Theory, vol. 61, no. 5, pp. 2213-2226, May 2015.

[13] U. Wachsmann, R. Fischer, and J. Huber, “Multilevel Codes: TheoreticalConcepts and Practical Design Rules,” IEEE Trans. Information Theory,vol. 45, no. 5, pp. 1361-1391, July 1999.