8/13/2019 Springer - Turbo-Like Codes
Turbo-like Codes
Aliazam Abbasfar
Turbo-like Codes
Design for High Speed Decoding
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN-10 9781402063903
ISBN-13 9781402063909
Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands.
www.springeronline.com
Printed on acid-free paper
All Rights Reserved
© 2007 Springer
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Dedicated to my wife
Contents
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xv
Abstract. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1
1.1 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Turbo Concept. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 Turbo Codes and Turbo-like Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Turbo Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.2 Repeat-Accumulate Codes . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.3 Product Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Iterative Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Probability Propagation Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Message-passing Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.5 Graphs with Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6 Codes on Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.6.1 Parity-check Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.6.2 Convolutional Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6.3 Turbo Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 High-speed Turbo Decoders. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 BCJR Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 Turbo Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4 Pipelined Turbo Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.5 Parallel Turbo Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
8/13/2019 Springer - Turbo-Like Codes
7/94
3.6 Speed Gain and Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.6.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.6.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.7 Interleaver Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.7.1 Low Latency Interleaver Structure . . . . . . . . . . . . . . . . . . . . . . 31
3.7.2 Interleaver Design Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.7.3 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.8 Hardware Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4 Very Simple Turbo-like Codes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.1.1 Bounds on the ML Decoding Performance of Block Codes . . . . 40
4.1.2 Density Evolution Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.2 RA Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2.1 ML Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2.2 DE Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 RA Codes with Puncturing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3.1 ML Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3.2 Performance of Punctured RA Codes with ML Decoding . . . . . . 53
4.3.3 DE Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.4 ARA Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.4.1 ML Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.4.2 Performance of ARA Codes with ML Decoding . . . . . . . . . . . . . 58
4.4.3 DE Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.5 Other Precoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.5.1 Accumulator with Puncturing . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.6 Hardware Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5 High Speed Turbo-like Decoders. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2 Parallel ARA Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3 Speed Gain and Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.4 Interleaver Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.5 Projected Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.5.1 Parallel Turbo Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.5.2 Other Known Turbo-like Codes . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.5.3 Parallel LDPC Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.5.4 More Accumulate-Repeat-Accumulate Codes . . . . . . . . . . . . . 74
5.6 General Hardware Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
List of Figures
27 Partitioned graph of a simple PCCC 25
28 Parallel turbo decoder with shared processors for two constituent codes 26
29 Performances of parallel decoder 28
30 Efficiency and speed gain 28
31 Efficiency vs. signal to noise ratio 29
32 (a) Bit sequence in matrix form (b) after row interleaver
(c) A conflict-free interleaver (d) Bit sequence in sequential order
(e) The conflict-free interleaved sequence 30
33 Data and extrinsic sequences in two consecutive iterations for turbo
decoder with reverse interleaver 31
34 Sequences in two consecutive iterations for parallel turbo decoder
with reverse interleaver 32
35 Scheduling diagram of the parallel decoder 32
36 The flowchart of the algorithm 34
37 Performance comparison for B = 1,024 36
38 Performance comparison for B = 4,096 37
39 (a) alpha recursion (b) beta recursion (c) Extrinsic computation 37
40 Probability density function of messages in different iterations 42
41 Constituent code model for density evolution 42
42 Constituent code model for density evolution 43
43 SNR improvement in iterative decoding 44
44 Repeat-Accumulate code block diagram 44
45 Density evolution for RA codes (q = 3) 46
46 Accumulator with puncturing and its equivalent for p = 3 47
47 Block diagram of accumulator with puncturing 47
48 Block diagram of check_4 code and its equivalents 51
49 Normalized distance spectrum of RA codes with puncturing 54
50 Density evolution for RA codes with puncturing (q = 4, p = 2) 56
51 The block diagram of the precoder 57
52 ARA(3,3) BER performance bound 58
53 ARA(4,4) BER performance bound 59
54 Normalized distance spectrum of ARA codes with puncturing 60
55 Density evolution for ARA codes with puncturing (q = 4,p = 2) 61
56 Performance of ARA codes using iterative decoding 62
57 The block diagram of the new precoder 63
58 Tanner graph for new ARA code 63
59 Performance of the new ARA code 64
60 The partitioned graph of ARA code 68
61 Parallel turbo decoder structure 68
62 Projected graph 69
63 Projected graph with conflict-free interleaver 70
64 A PCCC projected graph with conflict-free interleaver 71
65 (a) PCCC with 3 component codes (b) SCCC (c) RA(3) (d) IRA(2,3) 72
66 A parallel LDPC projected graph 73
67 Simple graphical representation of a LDPC projected graph 74
68 ARA code without puncturing 75
69 (a) Rate 1/3 ARA code (b) rate 1/2 ARA code 75
70 (a) Rate 1/2 ARA code (b) New rate 1/3 ARA code (c) New rate 1/4
ARA code 76
71 Improved rate 1/2 ARA codes 77
72 Irregular rate 1/2 ARA codes 77
73 Irregular ARA code family for rate > 1/2 78
74 Parallel decoder hardware architecture 79
75 Window processor hardware architecture 79
List of Tables
I Probability Definitions 9
II State Constraint 16
III The Decoder Parameters 26
IV Characteristic Factors for the Parallel Decoder at SNR = 0.7 dB (BER = 10^-8) 29
V An Example of the Interleaver 33
VI Cut-off Thresholds for RA Codes with Puncturing 54
VII Cut-off Threshold for Rate 1/2 ARA Codes 61
VIII Cutoff Threshold for ARA Codes with Rate …

[Figure 36 residue: flowchart of the interleaver design algorithm. Decision nodes: "i > M?", "m < 0?", "Qn and Qk satisfy the constraint?"; updates: m = m+1; k = N*m + j; Qk = N*Pm,j + Nj; m = i-1; m = m-1; exit node: Done.]
Fig. 36 The flowchart of the algorithm
bit. Otherwise, we check the remaining elements in the column in place of this bit. If one of them satisfies the constraint, we exchange the indices in the column and go to the next bit. If not, we try exchanging this bit with the previously designed bits in the column; in this case the constraint must be satisfied for both exchanged bits. If none of the previously designed bits can be exchanged with this bit, the algorithm fails. There are two options when the algorithm fails: one is to make the constraint milder, and the other is to start over with a new random matrix.
We have observed that this algorithm is very fast for S-random interleaver design and that it does not fail when the spread is less than sqrt(B/2). The maximum spread achievable for the structured interleaver is slightly smaller than that of the ordinary one, so some degradation in performance is to be expected.
More elaborate constraints can be used in the algorithm to improve the code; such constraints usually depend strongly on the code.
3.7.3 Simulation Results
For the simulations, two PCCCs with block sizes of 1,024 and 4,096 are chosen. The first constituent code is a rate-1/2 systematic code and the second is a rate-one nonsystematic recursive code. The feed-forward and feedback polynomials, the same for both codes, are 1 + D + D^3 and 1 + D^2 + D^3, respectively. The overall code rate is thus 1/3. The simulated channel is an AWGN channel.
Two interleavers with block lengths of 1,024 (M = 32, N = 32) and 4,096 (M = 128, N = 32) have been designed with the proposed algorithm. The BER performance of the decoders has been simulated and compared with that of the serial decoder with an S-random interleaver. The maximum number of iterations in each case is 10.
The performance comparison for the 1,024 case is illustrated in Figure 37. The proposed two-dimensional S-random interleaver is called S2-random. As the figure shows, the performances are almost the same; the S-random interleaver has a slightly better error floor.
The performance for the 4,096 case is shown in Figure 38. The difference between the error floors is more noticeable. However, the codes have equal thresholds in both cases. The error floor can be reduced with a more powerful constraint in the interleaver design algorithm.
3.8 Hardware Complexity
Having discussed the speed gain and efficiency, we now investigate the hardware complexity of the parallel turbo decoder. The turbo decoder hardware consists of two major parts: logic and memory.
Fig. 37 Performance comparison for B = 1,024
The memory requirement for the parallel decoder consists of the following:
Observations: These are the observation values received from the channel, usually the log-likelihood ratios (LLRs) of the code word bits, which make the message computations easier. The size of this memory is the number of bits in a code word; for a rate-1/3 turbo code with block size B, it is 3B. The width of this memory is usually 4-5 bits. It is read-only during decoding (all iterations), and it is the same as for the serial decoder.
Extrinsics: The extrinsics are stored in a memory to be used later by the other constituent codes. The size of this memory depends on the number of connections between the constituent codes; for a simple PCCC, only B extrinsics are needed. This is a read/write memory with a width of 6-9 bits. For conflict-free interleavers, it is divided into M sub-blocks, each accessed independently. The total memory, however, is the same as for the serial decoder.
Beta variables: These variables, produced by the backward recursion, must be kept in memory until they are used to compute the extrinsics. Each beta variable is actually an array of size 2^(K-1), where K is the constraint length of the convolutional code, i.e. one entry per possible value of the state variable. For an 8-state convolutional code, the memory size for beta variables is 8B. Each array element usually has 8-12 bits. This memory is a big portion of the memory requirements for the decoder.
All of the above operations are done in one clock cycle. In the parallel decoder we have M SISO blocks; therefore, compared to the serial decoder, the above computational logic is increased M times.
Contrary to the pipelined turbo decoder, however, the complexity is not increased in proportion to the speed gain.
3.9 Conclusion
We have proposed an efficient architecture for the parallel implementation of turbo decoders. The advantage of this architecture is that the increase in processing load due to parallelization is minimal. Simulation results demonstrate that this structure not only achieves orders of magnitude of speed gain, but also maintains processing efficiency. We have also shown that the efficiency and the speed gain of this architecture are almost independent of the SNR.
We have also proposed a novel interleaver structure for the parallel turbo decoder. The advantages of this structure are low latency, high speed, and feasibility of implementation. Simulation results show that very good BER performance can be achieved with this architecture as well. We also presented a fast algorithm to design such an interleaver, which can be used for designing S-random and other interleavers by simply changing the constraint.
The regularity of the proposed parallel decoder architecture and the advantages of the proposed interleaver make it the architecture of choice for VLSI implementation of high-speed turbo decoders.
Chapter 4
Very Simple Turbo-like Codes
4.1 Introduction
In searching for simple turbo-like codes, RA codes are very inspiring. They are perhaps the simplest turbo-like codes, and surprisingly they achieve good performance too. The simplicity of these codes lends itself to a more comprehensive analysis of their performance. Divsalar et al. have analyzed the performance of these codes with ML decoding and shown that they can achieve near-Shannon-limit performance [12]. Moreover, they have proved that RA codes achieve the Shannon limit as the rate goes to zero.
However, RA codes cannot compete with turbo codes or well-designed LDPC codes as far as performance is concerned. To improve the performance of RA codes, Jin proposed Irregular Repeat-Accumulate (IRA) codes [16, 17]. He also presented a method for designing very good IRA codes for binary erasure and additive white Gaussian noise channels, and showed that they outperform turbo codes at very large block sizes. However, IRA codes give up both regularity and simplicity in exchange for performance.
In this chapter we show that with some simple modifications RA codes can be transformed into very powerful codes while maintaining their simplicity. The modifications are simple puncturing and precoding. First, RA codes with regular puncturing are analyzed under iterative decoding as well as ML decoding. The ML decoding performance is shown by a tight bound using the weight distribution of RA codes with puncturing. In fact, when both the repetition and the puncturing are increased, the code rate remains the same whereas the performance gets better.
Then we present ARA codes. These codes are not only very simple but also achieve excellent performance. The performance of these codes with ML decoding is illustrated by very tight bounds and compared to random codes. It is shown that there are some simple codes that perform extremely close to the Shannon limit with ML decoding. The performance of ARA codes under iterative decoding is also investigated and compared to ML decoding later on.
RA and ARA codes can be classified as LDPC codes. Although LDPC codes in general may have a computationally involved encoder, RA and ARA codes have a simple encoder structure. ARA codes, especially, allow us to generate a wide range of LDPC codes with various degree distributions, including variable
nodes with degree one (RA and IRA code structures do not allow degree-one variable nodes). They can generate LDPC codes with various code rates and data frame sizes, with performance close to the Shannon capacity limit. The proposed coding structure also allows constructing very high-speed iterative decoders using the belief propagation (message-passing) algorithm.
First we briefly describe some of the tools that we use for analyzing the performance of a turbo-like code.
4.1.1 Bounds on the ML Decoding Performance of Block Codes
Since no practical ML decoding algorithm is available for block codes with large block sizes, we use performance bounds to obtain some insight into code behavior. Using the classic union bound, the frame (word) error rate (FER) and BER of an (N, K) linear block code under ML decoding over an AWGN channel are upper-bounded by
FER ≤ Σ_{d=dmin}^{N} A_d Q( sqrt(2 d r Eb/N0) )    (11)

BER ≤ Σ_{d=dmin}^{N} Σ_{w=1}^{K} (w/K) A_{w,d} Q( sqrt(2 d r Eb/N0) )    (12)
where r denotes the code rate, Eb/N0 is the signal-to-noise ratio, d is the Hamming distance between code words, dmin is the minimum distance between code words, w̄_d is the average input error weight, A_d is the number of code words at distance d, A_{w,d} is the number of code words with input weight w and output weight d, K is the information block length, N is the code word length, and Q denotes the Gaussian Q-function, defined as

Q(x) = (1/sqrt(2π)) ∫_x^∞ e^(-u^2/2) du    (13)
However, this bound is not very tight at low signal-to-noise ratios. There are tighter bounds such as the Viterbi-Viterbi [33], Poltyrev [26], and Divsalar [10] bounds. The Divsalar bound is very attractive since it provides tight bounds with closed-form expressions for the bit-error and word-error probabilities. Here we describe this bound by defining some new variables:

δ = d/N,  a(δ) = ln(A_d)/N,  c = r Eb/N0    (14)
where δ, a(δ), and c are the normalized weight, the normalized weight distribution, and the SNR, respectively. Then we define the following:

c0(δ) = (1 - e^(-2a(δ))) (1 - δ)/(2δ),  f(c, δ) = sqrt( c/c0(δ) + 2c + c^2 ) - c - 1    (15)

Then we define the exponent:

E(c, δ) = (1/2) ln[1 - 2 c0(δ) f(c, δ)] + c f(c, δ)/(1 + f(c, δ)),  c ≥ c0(δ)    (16)
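Assuming the reading c0(δ) = (1 - e^(-2a(δ)))(1 - δ)/(2δ) and f(c, δ) = sqrt(c/c0(δ) + 2c + c^2) - c - 1 for Eq. (15) (a reconstruction of the garbled expressions, not guaranteed to match the original), the exponent can be evaluated numerically. The sketch below only checks internal consistency: f vanishes at c = c0(δ), where the exponent is zero, and the exponent grows with c beyond that point. The values of a(δ) and δ are arbitrary.

```python
import math

def c0(a, delta):
    # threshold SNR c0(delta), Eq. (15) as reconstructed above
    return (1 - math.exp(-2 * a)) * (1 - delta) / (2 * delta)

def f(c, a, delta):
    # optimizing parameter f(c, delta), Eq. (15) as reconstructed above
    return math.sqrt(c / c0(a, delta) + 2 * c + c * c) - c - 1

def E(c, a, delta):
    # error exponent, Eq. (16); valid for c >= c0(delta)
    ff = f(c, a, delta)
    return 0.5 * math.log(1 - 2 * c0(a, delta) * ff) + c * ff / (1 + ff)
```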
4.1.2 Density Evolution Method

Fig. 40 Probability density function of messages in different iterations
To explain the DE phenomenon, we assume that all the code word bits are zero, i.e. the value +1 is sent by the transmitter. We sketch the probability density functions of the OMs for one constituent code in different iterations. Examples of such curves are shown in Figure 40.
As we see, the density functions evolve towards higher means. We can approximate the density functions by a Gaussian, as Wiberg did in his dissertation [34]. The bit decisions are made based on the messages, so the bit-error probability depends on the SNR of the messages. Therefore, what really matters is that the SNR of the density functions should increase from iteration to iteration in order to obtain better performance.
The messages are passed between the constituent codes. Hence, each constituent code receives the evolved messages as input and generates new messages at the output. So what we need to know is the SNR transfer function of each constituent code. Using the transfer functions we can track the behavior of the SNR of the messages

Fig. 41 Constituent code model for density evolution (block G with input SNRin, output SNRout, and Eb/No as a bias input)
Fig. 42 Constituent code model for density evolution (blocks G1 and G2 exchanging SNRin/SNRout, with Eb/No as a bias input)
as they pass between the constituent codes. Therefore, we sketch a general model for each constituent code as in Figure 41. In this model there is one more parameter, the operating Eb/N0. This is the Eb/N0 of the observations (the leaf-node messages) that are fed to the constituent code, and it acts like a bias for the constituent code.
The SNR transfer function is denoted by G, which gives the following relationship:

SNRout = G(SNRin)    (20)

It should be noted that the transfer function implicitly depends on the operating Eb/N0. The transfer function is usually derived by Monte Carlo simulation using the Gaussian approximation or the true density function.
For turbo-like codes with two constituent codes, the overall block diagram of the iterative decoding is shown in Figure 42. We have:

SNRout = G1(SNRin)    (21)
SNRin = G2(SNRout)    (22)
Suppose we start with the first constituent code. There are no prior messages at this time, i.e. SNRin = 0. The observations help to generate new messages with some SNRout = G1(SNRin) > 0. These messages are passed to the second constituent code, whose output messages have SNR = G2(G1(SNRin)); this is the SNRin of the first constituent code for the next iteration. In order to obtain a better SNR at each iteration we should have the following:

G2(G1(SNRin)) > SNRin, for any SNRin    (23)
Since G2 is a strictly increasing function, it is invertible and we can write an equivalent relation:

G1(SNRin) > G2^(-1)(SNRin), for any SNRin    (24)

If the above constraint holds, then iterative decoding will converge to the correct information bits (the SNR goes to infinity). The minimum operating Eb/N0 for which this constraint holds is denoted as the capacity of the code under iterative decoding.
We usually draw G1(SNR) and G2^(-1)(SNR) in one figure. These curves also give some sense of the convergence speed. An example of such curves and the
Fig. 43 SNR improvement in iterative decoding
way the SNR improves over iterations is shown in Figure 43. As we see, the speed of SNR improvement depends on the slopes of G1 and G2.
We use the transfer function curves to analyze the performance of turbo-like codes with iterative decoding.
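The fixed-point condition (23) can be illustrated with made-up transfer curves. G1 and G2 below are not the transfer functions of any real constituent code; they are merely monotone lines chosen so that G2(G1(x)) > x everywhere, mimicking a code above threshold.

```python
def G1(snr):
    # constituent code with channel observations: nonzero output at snr = 0
    return 0.5 + 1.2 * snr

def G2(snr):
    # repetition-like code: straight line through the origin
    return 1.5 * snr

snr = 0.0
history = []
for _ in range(30):
    snr = G2(G1(snr))          # one pass through both constituent codes
    history.append(snr)
```

Because condition (23) holds for these curves, the message SNR grows without bound, i.e. the decoder "converges".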
4.2 RA Codes
RA codes are the simplest among turbo-like codes, which makes them very attractive for analysis. The general block diagram of this code is drawn in Figure 44. An information block of length N is repeated q times and interleaved to make a block of size qN, which is then fed to an accumulator.
Fig. 44 Repeat-Accumulate code block diagram: an information block u of length N enters rep(q), the qN-bit result passes through the interleaver I and then through the accumulator ACC (output length qN)
The accumulator can be viewed as a truncated rate-one recursive convolutional code with transfer function 1/(1 + D), but it is sometimes better to think of it as a block code whose input block [x1, x2, ..., xn] and output block [y1, y2, ..., yn] are related by the following:

y1 = x1
y2 = x1 + x2
y3 = x1 + x2 + x3
...
yn = x1 + x2 + x3 + ... + xn    (25)
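The block-code view of Eq. (25) is just a running XOR, as the sketch below shows:

```python
def accumulate(x):
    """Accumulator of Eq. (25): y_k = x_1 + x_2 + ... + x_k over GF(2),
    i.e. the truncated 1/(1 + D) recursive encoder."""
    y, acc = [], 0
    for bit in x:
        acc ^= bit
        y.append(acc)
    return y

accumulate([1, 0, 1, 1])   # -> [1, 1, 0, 1]
```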
4.2.1 ML Analysis
For ML analysis we need the weight distribution of the code. We use the concept of the uniform interleaver [8] to compute the overall input-output weight enumerator (IOWE). Therefore, we need the IOWEs of both the repetition code and the accumulator. For the repetition code it is simply the following:
A^rep(q)_{w,d} = C(N, w) if d = qw, and 0 otherwise    (26)

It can be expressed as

A^rep(q)_{w,d} = C(N, w) δ(d - qw)    (27)
where δ(·) is the Kronecker delta function and C(N, w) denotes the binomial coefficient. The IOWE of the accumulator is:
A^acc_{w,d} = C(N - d, ⌊w/2⌋) C(d - 1, ⌈w/2⌉ - 1)    (28)

where ⌊x⌋ and ⌈x⌉ denote the largest integer not greater than x and the smallest integer not less than x, respectively.
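Formula (28) is easy to confirm for a small block by exhaustive enumeration; the sketch below (illustrative, not from the text) checks it for N = 6, with the degenerate w = 0 and d = 0 cases handled explicitly.

```python
from itertools import product
from math import comb

def A_acc(N, w, d):
    """Accumulator IOWE of Eq. (28), with degenerate cases handled."""
    if d == 0:
        return 1 if w == 0 else 0
    if w == 0:
        return 0
    return comb(N - d, w // 2) * comb(d - 1, (w + 1) // 2 - 1)

# brute force: run every length-N input through the running-XOR accumulator
N = 6
counts = {}
for x in product([0, 1], repeat=N):
    acc, y = 0, []
    for b in x:
        acc ^= b
        y.append(acc)
    key = (sum(x), sum(y))
    counts[key] = counts.get(key, 0) + 1

assert all(counts.get((w, d), 0) == A_acc(N, w, d)
           for w in range(N + 1) for d in range(N + 1))
```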
Having the IOWEs of the repeat and accumulate codes, we can compute the IOWE of the RA code using the uniform interleaver:

A^RA(q)_{w,d} = Σ_{h=0}^{qN} A^rep(q)_{w,h} A^acc_{h,d} / C(qN, h)
             = C(N, w) C(qN - d, ⌊qw/2⌋) C(d - 1, ⌈qw/2⌉ - 1) / C(qN, qw)    (29)
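Equation (29) can be sanity-checked by summing the enumerator over all (w, d): the total must equal the 2^N codewords. The sketch below does this for q = 3, N = 4 (illustrative values).

```python
from math import comb

def A_RA(q, N, w, d):
    """Average IOWE of the RA(q) code, Eq. (29) (uniform interleaver)."""
    h = q * w                  # the repetition code forces interleaver weight qw
    if d == 0:
        return 1 if w == 0 else 0
    if h == 0:
        return 0
    acc = comb(q * N - d, h // 2) * comb(d - 1, (h + 1) // 2 - 1)
    return comb(N, w) * acc / comb(q * N, h)

# summing over all input/output weights must recover 2^N codewords
total = sum(A_RA(3, 4, w, d) for w in range(5) for d in range(13))
```

Because the accumulator is invertible, summing over d for a fixed w must likewise give C(N, w), which the test below also checks.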
4.2.2 DE Analysis
RA codes consist of two component codes: the repeat code and the accumulate code. Hence, the messages are exchanged between these two codes. We use the Gaussian approximation to obtain the SNR transfer functions of the constituent codes, which are shown in Figure 45.
For the accumulator we have a nonzero output SNR even with zero-SNR input messages from the repetition code, because the channel observations help to generate some nonzero messages. Therefore, the accumulator is able to jump-start the iterative decoding. The repetition code, on the other hand, has a straight-line transfer function

Fig. 45 Density evolution for RA codes (q = 3)
with a slope of 2; the inverse function is shown in Figure 45. SNRout is zero when SNRin is zero, which is justified since no channel observation is available for this code. The curves almost touch; hence, the threshold of the rate-1/3 RA code is almost 0.5 dB.

Fig. 46 Accumulator with puncturing and its equivalent for p = 3
4.3 RA Codes with Puncturing
4.3.1 ML Analysis
To compute the IOWE of RA codes with puncturing, we use the equivalent encoder depicted in Figure 46 instead of the accumulator with puncturing. As the figure shows, the equivalent encoder is the concatenation of a regular check code and an accumulator, shown in Figure 47.
Since the check code is regular and memoryless, the presence of any interleaver between the two codes does not change the IOWE of the overall code. In order to compute the IOWE for this code we insert a uniform interleaver between the two codes.
The next step is to compute the IOWE of the check code. The IOWE can be expressed in a simple closed-form formula if we use the two-dimensional Z-transform; the inverse Z-transform then yields A^c_{w,d}. We start with N = 1, i.e. only one parity check. We have

A^c(W, D) = E_p(W) + O_p(W) D    (30)
Fig. 47 Block diagram of the accumulator with puncturing: a Check(p) stage (pN bits in, N bits out) followed by an accumulator Acc of length N
where

E_p(W) = Even[(1 + W)^p]    (31)

and

O_p(W) = Odd[(1 + W)^p]    (32)
Since there are N independent check nodes in the code, the IOWE can be written in the Z-domain as:

A^c(W, D) = (E_p(W) + O_p(W) D)^N = Σ_{d=0}^{N} C(N, d) E_p(W)^(N-d) O_p(W)^d D^d    (33)
The IOWE is obtained by taking the inverse Z-transform. The closed-form expression of A_{w,d} for arbitrary p is very complicated; instead we derive the IOWE for p = 2, 3, and 4, which are practically more useful.
4.3.1.1 Case p = 2
Using the general formula in the Z-domain we have:

A^c(2)(W, D) = (1 + W^2 + 2WD)^N    (34)

It can be expanded as follows:

Σ_{d=0}^{N} C(N, d) (1 + W^2)^(N-d) (2W)^d D^d = Σ_{d=0}^{N} C(N, d) [ Σ_{j=0}^{N-d} C(N-d, j) W^(2j) ] (2W)^d D^d    (35)
Therefore the IOWE can be expressed as

A^c(2)_{w,d} = C(N, d) C(N-d, j) 2^d,  w = d + 2j for j = 0, ..., N-d; and 0 otherwise    (36)
It can be expressed concisely as

A^c(2)_{w,d} = C(N, d) Σ_{j=0}^{N-d} C(N-d, j) 2^d δ(w - d - 2j)    (37)
where δ(x) is the Kronecker delta function.

Example: N = 3

A =
[ 1   0   0   0
  0   6   0   0
  3   0  12   0
  0  12   0   8
  3   0  12   0
  0   6   0   0
  1   0   0   0 ]
=
[ 1  0  0  0
  0  2  0  0
  3  0  4  0
  0  4  0  8
  3  0  4  0
  0  2  0  0
  1  0  0  0 ]
·
[ 1  0  0  0
  0  3  0  0
  0  0  3  0
  0  0  0  1 ]    (38)

(rows are indexed by input weight w = 0, ..., 6 and columns by output weight d = 0, ..., 3). The second matrix is canceled out when we concatenate this code with other codes through a uniform interleaver.
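Formula (37) reproduces the N = 3 example directly, as the sketch below checks:

```python
from math import comb

def A_c2(N, w, d):
    """IOWE of N independent p = 2 parity checks, Eq. (37)."""
    j, rem = divmod(w - d, 2)
    if rem or j < 0 or j > N - d:
        return 0
    return comb(N, d) * comb(N - d, j) * 2 ** d

# the matrix of Eq. (38), rows w = 0..6, columns d = 0..3
A_38 = [
    [1, 0, 0, 0], [0, 6, 0, 0], [3, 0, 12, 0], [0, 12, 0, 8],
    [3, 0, 12, 0], [0, 6, 0, 0], [1, 0, 0, 0],
]
assert [[A_c2(3, w, d) for d in range(4)] for w in range(7)] == A_38
```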
4.3.1.2 Case p = 3
Starting from the general formula in the Z-domain we have:

A^c(3)(W, D) = (1 + 3W^2 + (3W + W^3) D)^N    (39)

It can be expanded as follows:

A^c(3)(W, D) = Σ_{d=0}^{N} C(N, d) (1 + 3W^2)^(N-d) (3W + W^3)^d D^d    (40)

A^c(3)(W, D) = Σ_{d=0}^{N} C(N, d) [ Σ_{i=0}^{N-d} C(N-d, i) 3^i W^(2i) ] [ Σ_{i=0}^{d} C(d, i) 3^i (W^2)^(d-i) W^d ] D^d    (41)
Relabeling the indices (i'' in the first inner sum, and i → d - i in the second), it can be written as:

A^c(3)(W, D) = Σ_{d=0}^{N} C(N, d) [ Σ_{i''=0}^{N-d} C(N-d, i'') 3^(i'') W^(2i'') ] [ Σ_{i=0}^{d} C(d, i) 3^(d-i) (W^2)^i W^d ] D^d    (42)
Then we have

A^c(3)(W, D) = Σ_{d=0}^{N} C(N, d) Σ_{i=0}^{d} Σ_{i''=0}^{N-d} C(d, i) C(N-d, i'') 3^(i'' + d - i) W^(d + 2i + 2i'') D^d    (43)
If we let j = i + i'', we have

A^c(3)(W, D) = Σ_{d=0}^{N} C(N, d) Σ_{j=0}^{N} Σ_{i=max(0, j-N+d)}^{min(j, d)} C(d, i) C(N-d, j-i) 3^(d+j-2i) W^(d+2j) D^d    (44)
Therefore, it is easy to show that the IOWE becomes:

A^c(3)_{w,d} = C(N, d) Σ_{i=max(0, j-N+d)}^{min(j, d)} C(d, i) C(N-d, j-i) 3^(d+j-2i),  w = d + 2j for j = 0, ..., N; and 0 otherwise    (45)
It can be written as

A^c(3)_{w,d} = C(N, d) Σ_{j=0}^{N} Σ_{i=max(0, j-N+d)}^{min(j, d)} C(d, i) C(N-d, j-i) 3^(d+j-2i) δ(w - d - 2j)    (46)

where δ(·) is the Kronecker delta function. Meanwhile we have the following property:
A^c(3)_{w,d} = A^c(3)_{3N-w, N-d}    (47)

This property is proven easily by taking the complements of the three input bits of each check; the output of the check is then inverted as well. If the numbers of nonzero input and output bits are w and d, respectively, then the complemented version has 3N - w and N - d nonzero bits, which proves the property. The property helps to save some computations.
Example: N = 3

$$A = \begin{bmatrix}1&0&0&0\\0&9&0&0\\9&0&27&0\\0&57&0&27\\27&0&99&0\\0&99&0&27\\27&0&57&0\\0&27&0&9\\0&0&9&0\\0&0&0&1\end{bmatrix}
= \begin{bmatrix}1&0&0&0\\0&3&0&0\\9&0&9&0\\0&19&0&27\\27&0&33&0\\0&33&0&27\\27&0&19&0\\0&9&0&9\\0&0&3&0\\0&0&0&1\end{bmatrix}
\begin{bmatrix}1&0&0&0\\0&3&0&0\\0&0&3&0\\0&0&0&1\end{bmatrix} \tag{48}$$
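The same kind of check applies here. As a sketch (illustrative code, my own function names), the program below enumerates all 2^(3N) inputs for N = 3, verifies the matrix of Eq. (46) against brute force, and confirms the complement symmetry of Eq. (47).

```python
from itertools import product
from math import comb

def iowe_check3(N):
    """IOWE of N parallel 3-input single-parity checks, per Eq. (46)."""
    A = [[0] * (N + 1) for _ in range(3 * N + 1)]
    for d in range(N + 1):
        for j in range(N + 1):
            for i in range(max(0, j - N + d), min(j, d) + 1):
                A[d + 2 * j][d] += (comb(N, d) * comb(d, i)
                                    * comb(N - d, j - i) * 3 ** (d + j - 2 * i))
    return A

def iowe_check3_brute(N):
    """Brute force: output bit i is the parity of input bits (3i, 3i+1, 3i+2)."""
    A = [[0] * (N + 1) for _ in range(3 * N + 1)]
    for bits in product((0, 1), repeat=3 * N):
        w = sum(bits)
        d = sum(bits[3 * i] ^ bits[3 * i + 1] ^ bits[3 * i + 2] for i in range(N))
        A[w][d] += 1
    return A

N = 3
A = iowe_check3(N)
assert A == iowe_check3_brute(N)
# Symmetry of Eq. (47): complementing all 3N inputs flips every parity bit.
for w in range(3 * N + 1):
    for d in range(N + 1):
        assert A[w][d] == A[3 * N - w][N - d]
```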
Fig. 48 Block diagram of the check(4) code (4N → N) and its equivalents: two concatenated check(2) codes (4N → 2N → N)
4.3.1.3 Case p = 4
The code for this case can be viewed as a concatenated code as shown in Figure 48. Because the check code is regular and memoryless, we can put any interleaver between the codes without changing the IOWE of the overall code. By using a uniform interleaver and the results found for case p = 2, the IOWE can be written as:

$$A^{c(4)}_{w,d} = \sum_{h=0}^{2N}\frac{A^{c(2)}_{w,h}\,A^{c(2)}_{h,d}}{\dbinom{2N}{h}} \tag{49}$$
Using the result for case p = 2, we obtain

$$A^{c(4)}_{w,d} = \binom{N}{d}\sum_{j=0}^{N-d}\sum_{i=0}^{2N-d-2j}\binom{N-d}{j}\binom{2N-d-2j}{i}\,2^{2d+2j}\,\delta(w-d-2i-2j) \tag{50}$$
Example: N = 3

$$A^{c(4)} = \begin{bmatrix}1&0&0&0\\0&12&0&0\\18&0&48&0\\0&156&0&64\\111&0&384&0\\0&600&0&192\\252&0&672&0\\0&600&0&192\\111&0&384&0\\0&156&0&64\\18&0&48&0\\0&12&0&0\\1&0&0&0\end{bmatrix}
= \begin{bmatrix}1&0&0&0\\0&4&0&0\\18&0&16&0\\0&52&0&64\\111&0&128&0\\0&200&0&192\\252&0&224&0\\0&200&0&192\\111&0&128&0\\0&52&0&64\\18&0&16&0\\0&4&0&0\\1&0&0&0\end{bmatrix}
\begin{bmatrix}1&0&0&0\\0&3&0&0\\0&0&3&0\\0&0&0&1\end{bmatrix} \tag{51}$$
This method can be applied for any p that can be decomposed into two smaller
numbers.
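A minimal sketch of this decomposition idea (illustrative code, my own function names): build A^{c(4)} by the uniform-interleaver concatenation of Eq. (49) and compare it with the closed form of Eq. (50).

```python
from math import comb

def iowe_check2(M):
    """IOWE of M parallel 2-input checks (2M inputs, M outputs), Eq. (37)."""
    A = [[0] * (M + 1) for _ in range(2 * M + 1)]
    for d in range(M + 1):
        for j in range(M - d + 1):
            A[d + 2 * j][d] += comb(M, d) * comb(M - d, j) * 2 ** d
    return A

def iowe_check4(N):
    """check(4) as check(2) [4N->2N] + uniform interleaver + check(2) [2N->N],
    per Eq. (49). Each term is divisible by C(2N, h), so // is exact."""
    outer, inner = iowe_check2(2 * N), iowe_check2(N)
    return [[sum(outer[w][h] * inner[h][d] // comb(2 * N, h)
                 for h in range(2 * N + 1))
             for d in range(N + 1)] for w in range(4 * N + 1)]

def iowe_check4_direct(N):
    """Closed form of Eq. (50)."""
    A = [[0] * (N + 1) for _ in range(4 * N + 1)]
    for d in range(N + 1):
        for j in range(N - d + 1):
            for i in range(2 * N - d - 2 * j + 1):
                A[d + 2 * i + 2 * j][d] += (comb(N, d) * comb(N - d, j)
                    * comb(2 * N - d - 2 * j, i) * 2 ** (2 * d + 2 * j))
    return A

assert iowe_check4(3) == iowe_check4_direct(3)
```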
Having computed the IOWE of the check code, we can use the uniform interleaver formula to come up with the IOWE of the accumulator with puncturing. We have:

$$A^{acc(p)}_{w,d} = \sum_{h=0}^{N}\frac{A^{c(p)}_{w,h}\,A^{acc}_{h,d}}{\dbinom{N}{h}} \tag{52}$$
The simplified expressions for cases p = 2, 3, and 4 are as follows:

$$A^{acc(2)}_{w,d} = \sum_{h=0}^{N}\sum_{j=0}^{N-h}\binom{N-h}{j}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\,2^{h}\,\delta(w-h-2j) \tag{53}$$

$$A^{acc(3)}_{w,d} = \sum_{h=0}^{N}\sum_{j=0}^{N}\sum_{i=\max(0,\,j-N+h)}^{\min(j,\,h)}\binom{h}{i}\binom{N-h}{j-i}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\,3^{\,h+j-2i}\,\delta(w-h-2j) \tag{54}$$

$$A^{acc(4)}_{w,d} = \sum_{h=0}^{N}\sum_{j=0}^{N-h}\sum_{i=0}^{2N-h-2j}\binom{N-h}{j}\binom{2N-h-2j}{i}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\,2^{2h+2j}\,\delta(w-h-2i-2j) \tag{55}$$
It should be noted that, despite using a uniform interleaver to obtain the IOWE, we arrive at the exact IOWE of the accumulator with puncturing.
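This exactness claim can be verified directly for small N. The sketch below (illustrative code; it assumes the standard accumulator IOWE that appears in Eq. (53)) composes check(2) with an accumulator through a uniform interleaver, per Eq. (52), and matches a brute-force enumeration of the punctured accumulator.

```python
from itertools import product
from math import comb

def iowe_check2(N):
    """IOWE of N parallel 2-input checks, Eq. (37)."""
    A = [[0] * (N + 1) for _ in range(2 * N + 1)]
    for d in range(N + 1):
        for j in range(N - d + 1):
            A[d + 2 * j][d] += comb(N, d) * comb(N - d, j) * 2 ** d
    return A

def iowe_acc(N):
    """Length-N accumulator: A[h][d] = C(N-d, floor(h/2)) C(d-1, ceil(h/2)-1)."""
    A = [[0] * (N + 1) for _ in range(N + 1)]
    A[0][0] = 1
    for h in range(1, N + 1):
        for d in range(1, N + 1):
            A[h][d] = comb(N - d, h // 2) * comb(d - 1, (h + 1) // 2 - 1)
    return A

def iowe_acc2(N):
    """Punctured accumulator (p = 2) as check(2) + uniform interleaver +
    accumulator, per Eq. (52); each term is divisible by C(N, h)."""
    chk, acc = iowe_check2(N), iowe_acc(N)
    return [[sum(chk[w][h] * acc[h][d] // comb(N, h) for h in range(N + 1))
             for d in range(N + 1)] for w in range(2 * N + 1)]

def iowe_acc2_brute(N):
    """Brute force: accumulate 2N bits, keep every second output bit."""
    A = [[0] * (N + 1) for _ in range(2 * N + 1)]
    for bits in product((0, 1), repeat=2 * N):
        run, kept = 0, 0
        for k, b in enumerate(bits):
            run ^= b
            kept += run if k % 2 == 1 else 0
        A[sum(bits)][kept] += 1
    return A

assert iowe_acc2(3) == iowe_acc2_brute(3)
```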
The next step is to find the IOWE of the RA code with puncturing, which is derived using a uniform interleaver placed after the repetition.
$$A^{rep(q)\text{-}acc(p)}_{w,d} = \sum_{h=0}^{qN}\frac{A^{rep(q)}_{w,h}\,A^{acc(p)}_{h,d}}{\dbinom{qN}{h}} \tag{56}$$
Therefore, the closed-form expressions for the IOWE of RA(p = 2, q = 2), RA(p = 3, q = 3), and RA(p = 4, q = 4) will be the following:

$$A^{rep(2)\text{-}acc(2)}_{w,d} = \frac{\dbinom{N}{w}}{\dbinom{2N}{2w}}\sum_{h=0}^{N}\sum_{j=0}^{N-h}\binom{N-h}{j}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\,2^{h}\,\delta(2w-h-2j) \tag{57}$$
$$A^{rep(3)\text{-}acc(3)}_{w,d} = \frac{\dbinom{N}{w}}{\dbinom{3N}{3w}}\sum_{h=0}^{N}\sum_{j=0}^{N}\sum_{i=\max(0,\,j-N+h)}^{\min(j,\,h)}\binom{h}{i}\binom{N-h}{j-i}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\,3^{\,h+j-2i}\,\delta(3w-h-2j) \tag{58}$$
$$A^{rep(4)\text{-}acc(4)}_{w,d} = \frac{\dbinom{N}{w}}{\dbinom{4N}{4w}}\sum_{h=0}^{N}\sum_{j=0}^{N-h}\sum_{i=0}^{2N-h-2j}\binom{N-h}{j}\binom{2N-h-2j}{i}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\,2^{2h+2j}\,\delta(4w-h-2i-2j) \tag{59}$$

The above expressions are the IOWE of the nonsystematic RA codes with regular puncturing. However, in most cases we need systematic codes. It is very easy to compute the IOWE of a systematic code from that of its nonsystematic counterpart. The following formula shows the conversion.
$$A^{sys\text{-}rep(q)\text{-}acc(p)}_{w,d} = A^{rep(q)\text{-}acc(p)}_{w,\;d-w} \tag{60}$$
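Unlike the punctured accumulator, the repetition-accumulator concatenation of Eq. (56) is a genuine ensemble average, so the IOWE entries are rational numbers. A sketch using exact rational arithmetic (illustrative code, my own function name) evaluates Eq. (57) and checks the sanity condition that each row sums to C(N, w), the number of weight-w inputs.

```python
from fractions import Fraction
from math import comb

def iowe_ra22(N):
    """Ensemble-average IOWE of the nonsystematic punctured RA(q=2, p=2)
    code, per Eq. (57); entries are exact rationals."""
    A = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    for w in range(N + 1):
        norm = Fraction(comb(N, w), comb(2 * N, 2 * w))
        for h in range(N + 1):
            for j in range(N - h + 1):
                if h + 2 * j != 2 * w:       # the delta(2w - h - 2j) constraint
                    continue
                for d in range(N + 1):
                    if h == 0:
                        a = 1 if d == 0 else 0   # zero input -> zero output
                    elif d == 0:
                        a = 0
                    else:
                        a = (comb(N - d, h // 2)
                             * comb(d - 1, (h + 1) // 2 - 1))
                    A[w][d] += norm * comb(N - h, j) * 2 ** h * a
    return A

A = iowe_ra22(4)
# Every weight-w input maps to exactly one codeword, so each row of the
# average IOWE must sum to C(N, w).
for w in range(5):
    assert sum(A[w]) == comb(4, w)
```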
4.3.2 Performance of Punctured RA Codes with ML Decoding
RA codes are usually nonsystematic codes, i.e. the information block is not sent along with the output of the accumulator. However, RA codes with puncturing should be systematic in order to be decodable by iterative decoding. This constraint exists because otherwise the messages passed towards the information variables are always zero and hence never improve.
The normalized distance spectra of some rate 1/2 codes for a block size of 4,000 are illustrated in Figure 49. These codes are the RA code (q = 2), systematic RA codes with puncturing (q = 3, p = 3) and (q = 4, p = 4), and a random code. The IOWE and distance spectrum of an (n, k) random code are

$$A^{random}_{w,d} = \binom{k}{w}\binom{n}{d}\,2^{-n} \tag{61}$$

$$A^{random}_{d} = \binom{n}{d}\,2^{-(n-k)} \tag{62}$$
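As a quick numerical illustration (not from the text), the finite-n spectrum of Eq. (62) converges to the entropy-based asymptotic expression of Eq. (63); the sketch below evaluates ln(A_d)/n via log-gamma functions for a rate 1/2 code at normalized distance δ = 0.3.

```python
import math

def r_random_exact(n, k, d):
    """ln(A_d)/n for the (n, k) random-code spectrum of Eq. (62)."""
    ln_binom = (math.lgamma(n + 1) - math.lgamma(d + 1)
                - math.lgamma(n - d + 1))
    return (ln_binom - (n - k) * math.log(2)) / n

def r_random_asym(delta, rate):
    """Asymptotic limit H(delta) + (rate - 1) ln 2, entropy in nats."""
    H = -delta * math.log(delta) - (1 - delta) * math.log(1 - delta)
    return H + (rate - 1) * math.log(2)

# The finite-n normalized spectrum approaches the asymptotic shape:
for n in (100, 1000, 10000):
    print(n, round(r_random_exact(n, n // 2, int(0.3 * n)), 4))
print("asymptote", round(r_random_asym(0.3, 0.5), 4))  # approx 0.2643
```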
Fig. 49 Normalized distance spectrum of RA codes with puncturing

To compute the cutoff thresholds of these codes using the Divsalar bound, the normalized distance spectrum is computed as N goes to infinity, i.e. the asymptotic expression of r(δ). For random codes with code rate $R_c$ it is

$$r(\delta) = H(\delta) + (R_c - 1)\ln 2 \tag{63}$$
where H(·) is the binary entropy function (in nats). The asymptotic expression of r(δ) for an RA code with repetition q can be obtained as:
$$r(\delta) = \max_{0 < u \le \min(2\delta,\,2(1-\delta))}\left[-\frac{q-1}{q}H(u) + (1-\delta)\,H\!\left(\frac{u}{2(1-\delta)}\right) + \delta\,H\!\left(\frac{u}{2\delta}\right)\right] \tag{64}$$
For the systematic RA(q = 3, p = 3) code, let γ = h/2N, δ1 = i/2N, and δ2 = j/2N for 0 < δ2 < 1/2. Also (2δ2 + γ)/3 < min(0.5, δ). Then

$$r(\delta) = \max_{\gamma,\,\delta_1,\,\delta_2}\Bigg[-H\!\left(\frac{2\gamma+4\delta_2}{3}\right) + \gamma H\!\left(\frac{\delta_1}{\gamma}\right) + \left(\frac{1}{2}-\gamma\right)H\!\left(\frac{\delta_2-\delta_1}{1/2-\gamma}\right) + (\gamma+\delta_2-2\delta_1)\ln 3 + \left(\frac{1}{2}-\delta+\frac{2\delta_2+\gamma}{3}\right)H\!\left(\frac{\gamma/2}{1/2-\delta+(2\delta_2+\gamma)/3}\right) + \left(\delta-\frac{2\delta_2+\gamma}{3}\right)H\!\left(\frac{\gamma/2}{\delta-(2\delta_2+\gamma)/3}\right)\Bigg] \tag{65}$$
To derive the asymptotic expression of r(δ) for RA(q = 4, p = 4), we let δ = d/2N for 0 < δ < 1 and γ = h/2N for 0 < γ < 1/2.
4.4 ARA Codes
Fig. 51 The block diagram of the precoder
M bits are passed through without any change and the rest (N − M bits) go through an accumulator. M is considered a parameter in the code design. The effect of this parameter is studied under ML and iterative decoding, and it is then optimized to achieve the best performance. The block diagram of the precoder is shown in Figure 51.
4.4.1 ML Analysis
In order to find the performance of the code we need to compute the IOWE of the precoder. It is easily computed using the IOWE of the accumulator code (here of length N − M) as follows:

$$A^{pre}_{w,d} = \sum_{m=0}^{M}\binom{M}{m}A^{acc}_{w-m,\;d-m} \tag{67}$$
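Eq. (67) can again be checked against direct enumeration. In the sketch below (illustrative code, my own function names), the first M bits pass through and the remaining N − M are accumulated; the result matches the brute-force count for N = 5, M = 2.

```python
from itertools import product
from math import comb

def iowe_pre(N, M):
    """Precoder IOWE per Eq. (67): M bits pass through, N-M bits accumulate."""
    L = N - M
    A = [[0] * (N + 1) for _ in range(N + 1)]
    for m in range(M + 1):
        for h in range(L + 1):          # weight into the accumulator
            for d in range(L + 1):      # accumulator output weight
                if h == 0:
                    a = 1 if d == 0 else 0
                elif d == 0:
                    a = 0
                else:
                    a = comb(L - d, h // 2) * comb(d - 1, (h + 1) // 2 - 1)
                A[m + h][m + d] += comb(M, m) * a
    return A

def iowe_pre_brute(N, M):
    """Brute force: pass the first M bits, accumulate the remaining N-M."""
    A = [[0] * (N + 1) for _ in range(N + 1)]
    for bits in product((0, 1), repeat=N):
        run, out = 0, list(bits[:M])
        for b in bits[M:]:
            run ^= b
            out.append(run)
        A[sum(bits)][sum(out)] += 1
    return A

assert iowe_pre(5, 2) == iowe_pre_brute(5, 2)
```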
Therefore the IOWE of the overall code can be written as:

$$A^{pre\text{-}rep(q)\text{-}acc(p)}_{w,d} = \sum_{h=0}^{N}\frac{A^{pre}_{w,h}\,A^{rep(q)\text{-}acc(p)}_{h,d}}{\dbinom{N}{h}} \tag{68}$$
For the systematic ARA code (p = 3, q = 3), we have

$$A^{pre\text{-}rep(3)\text{-}acc(3)}_{w,d} = \sum_{m=0}^{M}\sum_{k=0}^{N}\frac{\dbinom{M}{m}}{\dbinom{3N}{3k}}\sum_{h=0}^{N}\sum_{j=0}^{N}\sum_{i=\max(0,\,j-N+h)}^{\min(j,\,h)}\binom{h}{i}\binom{N-h}{j-i}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\binom{N-M-k+m}{\lfloor (w-m)/2\rfloor}\binom{k-m-1}{\lceil (w-m)/2\rceil-1}\,3^{\,h+j-2i}\,\delta(3k-h-2j) \tag{69}$$
$$A^{pre\text{-}rep(4)\text{-}acc(4)}_{w,d} = \sum_{m=0}^{M}\sum_{k=0}^{N}\frac{\dbinom{M}{m}}{\dbinom{4N}{4k}}\sum_{h=0}^{N}\sum_{j=0}^{N-h}\sum_{i=0}^{2N-h-2j}\binom{N-h}{j}\binom{2N-h-2j}{i}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\binom{N-M-k+m}{\lfloor (w-m)/2\rfloor}\binom{k-m-1}{\lceil (w-m)/2\rceil-1}\,2^{2h+2j}\,\delta(4k-h-2i-2j) \tag{70}$$
Fig. 52 ARA(3,3) BER performance bound
The above expressions are the IOWE of the nonsystematic ARA codes with regular puncturing. The IOWE of the corresponding systematic codes are derived by the following conversion.

$$A^{sys\text{-}ARA(q,p)}_{w,d} = A^{ARA(q,p)}_{w,\;d-w} \tag{71}$$
4.4.2 Performance of ARA Codes with ML Decoding
The Divsalar BER performance bounds of the ARA(3,3) and ARA(4,4) codes for different values of M are shown in Figures 52 and 53. It is observed that the more bits that are accumulated in the precoder, the lower the code threshold becomes. However, the improvement stops at a certain point, which is M = N/5 for ARA(3,3) and M = 2N/5 for ARA(4,4). It is obvious that when M = N the codes turn into RA codes with puncturing. It is very encouraging that the performance of ARA(4,4) approaches very closely that of random codes of the same block size in the low Eb/N0 region.

It is very instructive to observe the distance spectrum of these codes (for the optimum M). As we see in Figure 54, the only difference between the distance spectrum of these codes and that of a random code is in the low-distance region, which causes the error floor.
Fig. 54 Normalized distance spectrum of ARA codes with puncturing
To derive the asymptotic expression of r(δ) for ARA(q = 4, p = 4), we let μ = M/2N for 0 < μ < 1/2, δ1 = m/2N for 0 < δ1 < μ, δ2 = (w − m)/2N for 0 < δ2 < 1/2 − μ, and δ = d/2N for 0 < δ < 1.
Table VII Cut-off Threshold for Rate 1/2 ARA Codes

                     ARA punc.     ARA punc.     Random     Shannon
                     (q=3, p=3)    (q=4, p=4)    code       limit
Cutoff threshold,
rate 1/2             0.509 dB      0.310 dB      0.308 dB   0.184 dB

Table VII tabulates the Divsalar cutoff thresholds for the same codes as in Figure 54. As we expected based on the BER performance bound, the cutoff threshold of ARA(4,4) is extremely close to the cutoff threshold of the random code.
4.4.3 DE Analysis
Unfortunately, iterative decoding cannot decode as well as ML decoding. Moreover, the difference between the performance of two codes under iterative decoding cannot be predicted from their ML decoding performance.
Fig. 55 Density evolution for ARA codes with puncturing (q = 4, p = 2)

The effect of the precoder in iterative decoding is very clear in Figure 55. The accumulator and the repetition are regarded as one constituent code. The SNR transfer functions of this code and of the simple repetition code are shown for comparison. The noticeable difference is a shift in the curve, which improves the threshold by almost 0.5 dB.

Fig. 56 Performance of ARA codes using iterative decoding
We have used the DE method to optimize the ARA codes for iterative decoding. ARA(4,4) and ARA(3,3) achieve the best performance with M = 0.7N and M = 0.5N, respectively. The performance of these codes is illustrated in Figure 56.
4.5 Other Precoders
Although the ML decoding performance of ARA codes with the simple accumulator precoder is very close to that of random codes, there is no known practical method to actually perform this decoding. Therefore, we are looking for codes that have good performance with iterative decoding. We have observed that very good codes can be obtained with different precoders. In this section we introduce some of these precoders.
Fig. 59 Performance of the new ARA code
4.6 Hardware Complexity
Since the building blocks of the decoder are repetition and check nodes, as in LDPC codes, and the message-passing rules for these blocks are very simple, the hardware complexity is very low as far as the logic is concerned.

The memory requirement depends on the number of edges in the graph. ARA codes have quite a small number of edges, which results in a low memory requirement.
4.7 Conclusion
This study proposes a novel coding structure which is not only very simple, but also achieves performance comparable to or better than the best practical turbo codes and LDPC codes. The ML analysis showed that in some cases they are extremely close to random codes, which achieve the Shannon limit.
The proposed coding scheme generates a family of LDPC codes for various code rates and data frame sizes, with a performance close to the Shannon capacity limit. Unlike general LDPC codes, they also have very simple encoding. The main innovation is the inclusion of a very simple precoder, constructed from parallel punctured accumulators. Such a precoder improves the performance.
The regularity and simplicity of the proposed coding structure also allow the construction of very high-speed iterative decoders using the message-passing algorithm.
Chapter 5
High Speed Turbo-like Decoders
5.1 Introduction
This chapter presents the architecture for high-speed decoding of ARA codes, where the message-passing algorithm enables us to achieve parallelism. Simulations have shown that efficiency is not compromised in order to obtain speed gains.
As in the parallel turbo decoder, memory access poses a practical problem. We extend the concept of the conflict-free interleaver to address this problem. This leads to the introduction of a new class of turbo-like codes that can be decoded very fast. It is shown that the proposed high-speed turbo and ARA decoders are among the codes in this class. The general architecture for decoding this class is presented.
5.2 Parallel ARA Decoder
To build high-speed decoders for ARA codes we follow the approach used for high-speed turbo decoders. The basic idea is to partition the graph into several subgraphs and let them work in parallel. For hardware regularity it is desirable that the subgraphs are identical or have minimal variety. As an example, the ARA code shown in Figure 58 is considered. The partitioned graph is drawn in Figure 60.
Each subgraph is decoded using the message-passing algorithm. Since the subgraphs have a tree structure, efficient scheduling provides the fastest decoding method. Usually the decoding for each subgraph is done serially, which lowers the complexity. The hardware entity that performs the decoding for one subgraph is called a subgraph processor or window processor, since each subgraph corresponds to a window of the codeword.
There are three types of messages that are communicated within or between subgraphs. Internal messages are those that correspond to edges within one subgraph. Border messages are those related to edges connecting two adjacent subgraphs. External messages are passed between subgraphs through the interleaver; in other words, they correspond to edges with global span. External messages are called extrinsics after their counterparts in turbo codes.
Aliazam Abbasfar, Turbo-Like Codes, 67–80.
© Springer 2007
Fig. 60 The partitioned graph of the ARA code
We need memory for storing all the messages. The memory for internal messages is local to the subgraph processors, whereas the external messages reside in a global memory that all subgraph processors can access. Border messages are usually stored in registers that are part of the subgraph processors; they are exchanged with the neighboring subgraphs at the end of each iteration.
Therefore the architecture of the decoder is as in Figure 61, in which a and b are border messages and x and y are extrinsics. Internal messages are not shown. This architecture is very similar to the parallel turbo decoder; the only difference is that there are two different window processors here, denoted W and W'.

Fig. 61 Parallel turbo decoder structure
5.3 Speed Gain and Efficiency
Unlike turbo decoders, parallel processing does not cost much additional processing, because LDPC codes are inherently parallel. What we are doing here is making the parallelization practically feasible.
5.4 Interleaver Design
Although the message-passing algorithm allows us to parallelize the decoding process, accessing so many extrinsics at the same time poses a practical problem. Since M window processors are running at the same time, M extrinsics are used simultaneously. The extrinsic memory is organized in M banks in order to facilitate this simultaneous access; i.e. M locations are accessed simultaneously.
As we discussed in the case of the parallel turbo decoder, the interleaver should be such that the window processors get the extrinsics from different banks of memory in interleaved order as well. This forces us to use the conflict-free interleaver presented for the parallel turbo decoder. However, in this section we look at this problem from a graphical point of view. The parallel decoder comprises M identical processors running in parallel. We put the partitions in parallel planes and then look at the projected graph. The projected graph for the ARA code shown in Figure 60 is shown in Figure 62.
Fig. 62 Projected graph

The projected graph can be viewed as the vectorized version of the actual graph. In other words, there is a message vector associated with every edge in the projected graph. The structure of the message memories is such that only one message vector is accessible at a time. The interleaver should preserve the message vectors in their entirety, but permutation is allowed within a vector. The permutation within a vector is a permutation among the window processors, or among the different planes in the overall graph. This permutation does not change the projected graph.

Fig. 63 Projected graph with conflict-free interleaver
Therefore the interleaver consists of several independent permutations within the message vectors. The way vectors are connected between two constituent codes is another flexibility in the interleaver design. An example of a projected graph with a conflict-free interleaver is shown in Figure 63. The dashed edges indicate that permutation is allowed within a vector.

The above connections not only guarantee the conflict-free structure, but also ensure that messages are propagated throughout the graph. Therefore the projected graph provides a very useful approach for designing turbo-like codes for high-speed decoding.
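The conflict-free structure can be stated operationally: with M banks of extrinsic memory (one per window processor) and message vectors of length M, an interleaver is conflict-free if every simultaneous access touches M distinct banks at a common vector slot. The sketch below (sizes and names are hypothetical, not from the text) builds such an interleaver from one global slot permutation plus independent per-slot bank permutations, which is exactly the "permutation within a vector" freedom described above, and then checks the property.

```python
import random

M, V = 4, 6   # M window processors, V message-vector slots; length M*V = 24

def is_conflict_free(perm, M, V):
    """perm maps global extrinsic index -> global index, with index
    i = bank*V + slot. It is conflict-free if, for every slot, the M images
    {perm[b*V + slot]} hit M distinct banks at one common slot."""
    for slot in range(V):
        images = [perm[b * V + slot] for b in range(M)]
        banks = {i // V for i in images}
        slots = {i % V for i in images}
        if len(banks) != M or len(slots) != 1:
            return False
    return True

# Build a conflict-free interleaver: permute slots globally, banks per slot.
rng = random.Random(1)
slot_perm = rng.sample(range(V), V)
perm = [0] * (M * V)
for slot in range(V):
    bank_perm = rng.sample(range(M), M)       # within-vector permutation
    for b in range(M):
        perm[b * V + slot] = bank_perm[b] * V + slot_perm[slot]

assert is_conflict_free(perm, M, V)
assert sorted(perm) == list(range(M * V))      # a valid permutation
```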
5.5 Projected Graph
In this section, a design methodology for turbo-like codes based on the projected graph is presented. There are two approaches that can be pursued in order to design codes based on projected graphs.

The first approach is the one used so far to parallelize the decoder. It is based on partitioning an existing code graph into subgraphs, and it works on any regular or semiregular graph. The projected graph includes one partition of each component code, which is called the component graph. The component graphs are connected with conflict-free interleavers. This method was used for the ARA code whose projected graph is shown in Figure 63.

Fig. 64 A PCCC projected graph with conflict-free interleaver

It is shown that the parallel turbo decoder is a member of this class. Later on, LDPC codes based on the projected graph are also introduced.
5.5.1 Parallel Turbo Decoder
The parallel turbo decoder proposed in Chapter 3 is one example of codes based on the projected graph. The projected graph of such a code is illustrated in Figure 64.

It is very instructive to note that the reverse interleaver, used for decreasing the latency, is clearly shown connecting the edges of the two component graphs in reverse order. From the projected-graph viewpoint, we can see that the interleaver structure is just several independent permutations. The number of permutations needed is the window size; here it is 4.
5.5.2 Other Known Turbo-like Codes
The class of codes based on the projected graph covers a wide range of turbo-like codes. This section presents some known turbo and turbo-like codes with their projected graphs. Figure 65 illustrates the projected graphs for a parallel turbo code with three constituent codes, a serial turbo code, an RA code, and an IRA code.
Fig. 65 (a) PCCC with three component codes; (b) SCCC; (c) RA(3); (d) IRA(2,3)
The second approach is to design the code by designing its projected graph. In this method we design some component graphs and the connections between them in order to obtain good performance. In other words, the partitions are designed first and then put together to create the constituent codes.

This approach is very appealing because the resulting code is guaranteed to be parallelizable, and the performance of the code can be analyzed very efficiently through its component graphs. The first example of this approach is parallel LDPC codes, which are explained in the following section.
5.5.3 Parallel LDPC Codes
This section explains how to design LDPC codes with parallel decoding capability. This class of LDPC codes was independently discovered by Richardson et al. [29], who call them vector LDPC codes. Thorpe [32] introduced LDPC codes based on protographs, which is basically the same concept. In this section we present this class as codes with a projected graph.

There are two component graphs in these codes: one contains only single parity-check codes (of variable degree) and the other only repetition codes (of variable degree). One example of such a code is shown in Figure 66.
There are some noticeable facts about this projected graph. All variable nodes are in one component graph, which means that all the observations are stored and processed in one kind of window processor. Variable and check nodes can have different degrees; therefore this structure is capable of implementing regular and irregular LDPC codes. The degree distributions of the variable and check nodes are known from the projected graph. There are no local messages and no border messages passed between adjacent subgraphs; in other words, we only have external edges. Therefore, the projected graph can be represented graphically as a simple Tanner graph. The graphical representation for the above example is shown in Figure 67. The number of interleavers needed for this code is equal to the number of edges of the projected graph.

The only disadvantage of this method of code design is that it does not provide an efficient encoder. Sometimes simple encoding is not possible for codes designed this way.
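The construction can be sketched as a lifting: each edge of the projected graph (protograph) becomes an M x M permutation block (the per-edge interleaver) in the lifted parity-check matrix. The code below is a minimal illustration with a made-up base graph, not a construction from the text.

```python
import random

def lift_protograph(base, M, seed=0):
    """Lift a protograph (base biadjacency matrix, entries 0/1) into an
    M-fold LDPC parity-check matrix: each base edge becomes an M x M
    permutation block, i.e. one independent interleaver per edge."""
    rng = random.Random(seed)
    rows, cols = len(base), len(base[0])
    H = [[0] * (cols * M) for _ in range(rows * M)]
    for r in range(rows):
        for c in range(cols):
            if base[r][c]:
                pi = rng.sample(range(M), M)   # the per-edge permutation
                for i in range(M):
                    H[r * M + i][c * M + pi[i]] = 1
    return H

# A small hypothetical projected graph: 2 checks, 4 variables.
base = [[1, 1, 1, 0],
        [0, 1, 1, 1]]
H = lift_protograph(base, M=5)
# Row and column degrees of the lifted code match the projected graph.
assert all(sum(row) == 3 for row in H)
assert sum(H[i][5] for i in range(10)) == 2   # variable 1 has degree 2
```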
Fig. 66 A parallel LDPC projected graph
Fig. 67 Simple graphical representation of an LDPC projected graph
5.5.4 More Accumulate-Repeat-Accumulate Codes
In this section some improved ARA codes are introduced that were designed based on projected graphs. These are ARA codes with different code rates. The thresholds of these codes are also compared with the channel capacity threshold.
5.5.4.1 Code Rates
Fig. 68 ARA code without puncturing
Fig. 69 (a) Rate 1/3 ARA code; (b) rate 1/2 ARA code
Table VIII Cutoff Threshold for ARA Codes with Rate
Table IX Cutoff Threshold for Improved ARA Codes with Rate
Table X Cutoff Threshold for ARA Codes with Rate > 1/2

Rate             4/7     5/8     2/3     7/10    8/11    3/4     10/13
Threshold (dB)   0.700   1.006   1.272   1.506   1.710   1.894   2.057
Shannon limit    0.530   0.815   1.059   1.272   1.459   1.626   1.777
Difference       0.170   0.191   0.213   0.234   0.251   0.268   0.280
5.5.4.3 Code Rates >1/2
In this section we present a family of ARA codes derived from the rate 1/2 irregular ARA code in Figure 72. The projected graph of these codes is shown in Figure 73, and the performance of this family is listed for different rates in Table X, which also shows how close each threshold is to the Shannon limit.
5.6 General Hardware Architecture
In this section we present a general hardware architecture for implementing parallel turbo-like decoders. Without any loss of generality we focus on turbo-like codes with two constituent codes; this can easily be extended to codes with several constituent codes by grouping them into two combined constituent codes. The general hardware architecture is shown in Figure 74, where EXTn denotes the external memory for the nth window processor.
Since the processors are identical and run in parallel, the scheduling is the same for all of them. Therefore, only one scheduling controller is needed for each constituent code. The scheduling controller determines which message vector is accessed and what permutation is used. The permutor is a memoryless block that permutes the message vectors on the fly. Since the message vectors are permuted differently, the permutor should be programmable. If M, the number of window processors, is large, the permutor can become the bottleneck of the hardware design.

Fig. 73 Irregular ARA code family for rate > 1/2

Fig. 74 Parallel decoder hardware architecture

The architecture of one window processor is depicted in Figure 75.
AM and BM denote the registers that contain the border messages. The observation memory is loaded at the beginning of the decoding and remains intact until the end. This memory is not necessary for all window processors.

Fig. 75 Window processor hardware architecture
5.7 Conclusion
In this chapter, an architecture for high-speed decoding of ARA codes was presented. Two major issues in high-speed decoding were addressed: parallel processing and the memory access problem. This led to the introduction of a new class of turbo-like codes that can be decoded very fast: the codes with a projected graph. This classification provides an alternative method for designing turbo-like codes for high-speed decoding. It was shown that the proposed high-speed turbo and ARA decoders are among the codes in this class. The general architecture for decoding this class of codes was also presented.

The generalized coding structure that was developed during this research is a powerful approach toward designing turbo-like codes that are suitable for high-speed decoding. However, some areas are not yet covered or could complement this research; they are described as follows.
First, in designing ARA codes the focus was on improving the threshold. However, another important aspect of the performance is usually ignored: the error floor. Two important factors affect the error floor of a code: the code structure and the interleaver design. The code structure is selected to obtain a certain threshold; therefore the interleaver design is used to improve the error floor. ARA codes with pseudo-random interleavers usually have high error floors. We have been able to improve the error floor by orders of magnitude through manual changes in the interleavers. It is very important to find a systematic way to design or modify interleavers to obtain a low error floor. The design of algorithmic interleavers is a more challenging topic, which is of more practical interest.
Second, the search for good codes based on their projected graphs is very rewarding. Since the structure of such a code guarantees the high-speed decoding capability, the only concern is the performance of the code. On the other hand, the projected graph is desired to be very simple, which makes the search easier. One simple way of approaching this problem is to start with known projected graphs and make some changes; analysis of the resulting code determines whether a change is good or not. We have pursued this approach and some preliminary results show its effectiveness.
References
1. A. Abbasfar and K. Yao, "An efficient and practical architecture for high speed turbo decoders," Proceedings of VTC, Vol. 1, October 2003, pp. 337–341.
2. A. Abbasfar and K. Yao, "Interleaver design for high speed turbo decoders," Proceedings of SPIE, Vol. 5205, August 2003, pp. 282–290.
3. A. Abbasfar and K. Yao, "An efficient architecture for high-speed turbo decoders," Proceedings of ICASSP 2003, April 2003, pp. IV-521–IV-524.
4. S. Aji and R.J. McEliece, "The generalized distributive law," IEEE Trans. Inform. Theory, March 2000, 46(2), 325–343.
5. L.R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimum symbol error rate," IEEE Trans. Inform. Theory, March 1974, 284–287.
6. S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Soft-input soft-output APP module for iterative decoding of concatenated codes," IEEE Commun. Lett., January 1997, 22–24.
7. S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Serial concatenation of interleaved codes: performance analysis, design, and iterative decoding," IEEE Trans. Inform. Theory, May 1998, 44(3), 909–926.
8. S. Benedetto and G. Montorsi, "Unveiling turbo codes: some results on parallel concatenated codes," IEEE Trans. Inform. Theory, March 1996, 42(2), 409–428.
9. C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error correcting coding and decoding: turbo codes," Proceedings of the 1993 IEEE International Conference on Communications, Geneva, Switzerland, May 1993, pp. 1064–1070.
10. D. Divsalar, "A simple tight bound on error probability of block codes with application to turbo codes," JPL TMO Progress Report 42-139, November 1999, pp. 1–35.
11. D. Divsalar, S. Dolinar, and F. Pollara, "Iterative turbo decoder analysis based on Gaussian density evolution," IEEE J. Select. Areas Commun., May 2001, 19(5), 891–907.
12. D. Divsalar, H. Jin, and R.J. McEliece, "Coding theorems for turbo-like codes," Proceedings of the 36th Allerton Conference on Communication, Control and Computing, September 1998, Allerton House, Monticello, IL, pp. 201–210.
13. B.J. Frey, F.R. Kschischang, and P.G. Gulak, "Concurrent turbo-decoding," Proceedings of the IEEE International Symposium on Information Theory, July 1997, Ulm, Germany, p. 431.
14. R. Gallager, Low-Density Parity-Check Codes, MIT Press, Cambridge, MA, 1963.
15. J. Hsu and C.H. Wang, "A parallel decoding scheme for turbo codes," IEEE Symposium on Circuits and Systems, Vol. 4, Monterey, June 1998, pp. 445–448.
16. H. Jin, Analysis and Design of Turbo-like Codes, Ph.D. thesis, California Institute of Technology, Pasadena, 2001.
17. H. Jin, A. Khandekar, and R. McEliece, "Irregular repeat-accumulate codes," Proceedings of the 2nd International Symposium on Turbo Codes, Brest, France, 2000, pp. 1–8.
18. F.R. Kschischang and B.J. Frey, "Iterative decoding of compound codes by probability propagation in graphical models," IEEE J. Select. Areas Commun., February 1998, 16(2), 219–230.
19. S.L. Lauritzen and D.J. Spiegelhalter, "Local computations with probabilities on graphical structures and their applications in expert systems," J. R. Stat. Soc. B, 1988, 50, 157–224.
20. M. Luby, M. Mitzenmacher, M.A. Shokrollahi, D.A. Spielman, and V. Stemann, "Practical loss-resilient codes," Proceedings of the 29th Symposium on Theory of Computing, 1997, pp. 150–157.
21. M. Luby, M. Mitzenmacher, M.A. Shokrollahi, and D.A. Spielman, "Improved low-density parity-check codes using irregular graphs," IEEE Trans. Inform. Theory, 2001, 47, 585–598.
22. D.J.C. MacKay and R.M. Neal, "Good codes based on very sparse matrices," in: C. Boyd (ed.), Cryptography and Coding, 5th IMA Conference, No. 1025 in Lecture Notes in Computer Science, Springer, Berlin, 1995, pp. 100–111.
23. D.J.C. MacKay, "Good error correcting codes based on very sparse matrices," IEEE Trans. Inform. Theory, 1999, 45(2), 399–431.
24. R.J. McEliece, D.J.C. MacKay, and J.F. Cheng, "Turbo decoding as an instance of Pearl's belief propagation algorithm," IEEE J. Select. Areas Commun., February 1998, 16(2), 140–152.
25. J. Pearl, "Fusion, propagation, and structuring in belief networks," Artif. Intell., 1986, 29, 241–288.
26. G. Poltyrev, "Bounds on the decoding error probability of binary linear codes via their spectra," IEEE Trans. Inform. Theory, 40(10), 1261–1271.
27. T. Richardson and R. Urbanke, "The capacity of low density parity check codes under message passing decoding," IEEE Trans. Inform. Theory, February 2001, 47(2), 599–618.
28. T. Richardson, M.A. Shokrollahi, and R. Urbanke, "Design of capacity-approaching irregular low-density parity-check codes," IEEE Trans. Inform. Theory, February 2001, 47(2), 619–637.
29. T. Richardson et al., "Methods and apparatus for decoding LDPC codes," United States Patent 6,633,856, October 14, 2003.
30. C.E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., 1948, 27, 379–423.
31. R.M. Tanner, "A recursive approach to low complexity codes," IEEE Trans. Inform. Theory, 1981, IT-27, 533–547.
32. J. Thorpe, "Low Density Parity Check (LDPC) Codes Constructed from Protographs," JPL IPN Progress Report 42-154, August 15, 2003.
33. A.J. Viterbi and A.M. Viterbi, "An improved union bound for binary input linear codes on the AWGN channel, with applications to turbo decoding," Proceedings of the IEEE Information Theory Workshop, February 1998.
34. N. Wiberg, Codes and Decoding on General Graphs, Linköping Studies in Science and Technology, Dissertation No. 440, Linköping University, Linköping, Sweden, 1996.
Index
A
Accumulate–Repeat–Accumulate, v, xiv, 4, 56, 102
ARA code, 4
ARA codes, iv, v, viii, ix, xiv, 4, 56, 80, 82,
84, 87, 88, 90, 92, 102, 105, 106, 109, 110
ARA decoder, v, xiv, 4, 92, 109
B
Backward Recursion, 25
BCJR algorithm, iii, 8, 24, 26, 27, 31, 33, 34
belief propagation algorithm, 2, 9, 112
bipartite graph, 9
block codes, iv, 6, 7, 18, 56
C
Codes on graph, iii, 18
Conflict-free interleaver, 3
conflict-free interleavers, 42, 52,
97
Constituent code, viii, 61
constituent codes, vi, vii, 2, 5, 6, 7, 8, 26, 27,
28, 29, 31, 33, 34, 35, 40, 52, 59, 60, 61,
65, 79, 96, 98, 100, 107, 108
convolutional code, vi, 5, 8, 20, 21, 23, 24,
26, 52, 64
convolutional codes, 5, 6, 20, 21, 22, 27
Convolutional codes, iii, vi, 20, 21
D
density evolution, viii, 4, 28, 59, 61, 111
Density evolution, iv, 59
E
efficiency, iv, v, xiii, 1, 3, 35, 36, 37, 38, 40, 53, 92, 94
Efficient schedule, 16
extrinsic information, 8, 25, 26, 28, 33,
44
extrinsics, 11, 26, 27, 28, 29, 30, 40, 41, 42,
44, 52, 93, 94, 95
F
Flooding schedule, 16
Forward recursion, 24
G
graph representation, 18, 20, 22, 24
graphs with cycles, 3, 11
Graphs with cycles, iii, 17
H
hardware architecture, v, ix, 107, 108, 109
Hardware complexity, iv, v, 3, 51, 90
high-speed decoding, xiii, 2, 4, 92, 109, 110
I
interleaver, iv, vii, ix, x, xiii, 3, 7, 32, 34, 41,
42, 43, 44, 45, 46, 47, 49, 50, 53, 54, 64,
65, 67, 69, 72, 73, 74, 92, 93, 95, 96, 97,
98, 101, 110
Interleaver, iv, v, 40, 45, 95, 111
IOWE, 64, 65, 67, 68, 69, 70, 72, 73, 74, 75, 81, 82
IRA codes, 55
Irregular Repeat–Accumulate, 55
iterative decoding, vii, viii, xiii, 2, 3, 4, 7, 8,
9, 24, 28, 29, 30, 55, 56, 61, 63, 66, 75,
80, 86, 87, 88, 102, 111
L
latency, iv, xiii, 3, 18, 30, 35, 42, 43, 53, 98
LDPC codes, v, 1, 9, 19, 55, 56, 59, 90, 91,
94, 97, 100, 101, 113
Low-density parity-check, 1, 19, 112
M
MAP decoding, 8
Memory access, xiii, 3
message passing, 15, 18, 20, 24, 27, 56, 80,
90, 91, 92
message-passing algorithm, xiii, 3, 17, 24,
32, 41, 95
ML decoding, iv, v, 4, 55, 56, 75, 80, 82,
86, 88
P
Parallel concatenated convolutional code, 2,
5
parallelization, xiii, 3, 4, 35, 36, 45, 53, 80,
94
Parity-check codes, iii, 18
pipelining, 43
precoder, viii, 80, 81, 82, 86, 88, 89,
105
precoding, 55
probabilistic graph, 10
probability propagation algorithm, 3, 11
processing load, 35, 36, 53
projected graph, ix, 4, 95, 96, 97, 98, 100,
101, 102, 104, 106, 109, 110
protograph, 100
puncturing, iv, v, viii, ix, 55, 67, 73, 74, 75,
76, 78, 79, 82, 84, 87, 88, 102
R
RA codes, iii, iv, viii, x, 1, 2, 6, 55, 63, 65,
66, 67, 75, 76, 78, 79, 80
Repeat–Accumulate codes, 63
repetition codes, 100
S
scheduling, 15, 16, 18, 27, 32, 45, 93, 108
Serial concatenated convolutional codes, 2, 5
serial decoder, 32, 37, 42, 49, 52, 53
Shannon limit, xiv, 4, 19, 55, 56, 78, 91, 105,
111
SISO, vi, 8, 26, 27, 28, 29, 30, 34, 35, 41,
45, 46, 47, 53
sparse parity-check matrix, 19
speed gain, vii, 3, 30, 35, 36, 38, 39, 40, 43,
51, 53
Speed gain, iv, v, 35, 36, 94
speed gain and efficiency, 3, 51
S-random interleavers, 47
state constraint, 21, 35
state variables, vi, 20, 21, 25
systematic code, 5, 6, 36, 49, 75, 79
T
Tanner graph, vi, viii, 9, 18, 19, 20, 21, 89
turbo codes, vi, xiii, 1, 2, 6, 7, 11, 17, 22, 27,
28, 29, 30, 44, 55, 80, 93, 111, 112
Turbo codes, iii, 5, 22, 28
turbo decoding, vi, 9, 11, 24, 29, 30
turbo encoder, 5
turbo-like code, 2, 4, 56
turbo-like codes, iii, iv, v, xiii, xiv, 2, 3, 4, 5,
6, 17, 55, 59, 61, 63, 80, 92, 97, 98, 107,
109
W
window processor, 31, 34, 93, 101, 107, 108