8/13/2019 Springer - Turbo-Like Codes
Turbo-like Codes
Aliazam Abbasfar
Turbo-like Codes
Design for High Speed Decoding
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN-10 9781402063903
ISBN-13 9781402063909
Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands.
www.springeronline.com
Printed on acid-free paper
All Rights Reserved
© 2007 Springer
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Dedicated to my wife
Contents
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xv
Abstract. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1
1.1 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Turbo Concept. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 Turbo Codes and Turbo-like Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Turbo Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.2 Repeat-Accumulate Codes . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.3 Product Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Iterative Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Probability Propagation Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Message-passing Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.5 Graphs with Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6 Codes on Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.6.1 Parity-check Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.6.2 Convolutional Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6.3 Turbo Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 High-speed Turbo Decoders. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 BCJR Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 Turbo Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4 Pipelined Turbo Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.5 Parallel Turbo Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
8/13/2019 Springer - Turbo-Like Codes
7/94
3.6 Speed Gain and Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.6.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.6.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.7 Interleaver Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.7.1 Low Latency Interleaver Structure . . . . . . . . . . . . . . . . . . . . . . 31
3.7.2 Interleaver Design Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.7.3 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.8 Hardware Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4 Very Simple Turbo-like Codes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.1.1 Bounds on the ML Decoding Performance of Block Codes . . . . 40
4.1.2 Density Evolution Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.2 RA Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2.1 ML Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2.2 DE Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 RA Codes with Puncturing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3.1 ML Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3.2 Performance of Punctured RA Codes with ML Decoding . . . . . . 53
4.3.3 DE Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.4 ARA Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.4.1 ML Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.4.2 Performance of ARA Codes with ML Decoding . . . . . . . . . . . . . 58
4.4.3 DE Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.5 Other Precoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.5.1 Accumulator with Puncturing . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.6 Hardware Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5 High Speed Turbo-like Decoders. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2 Parallel ARA Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3 Speed Gain and Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.4 Interleaver Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.5 Projected Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.5.1 Parallel Turbo Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.5.2 Other Known Turbo-like Codes . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.5.3 Parallel LDPC Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.5.4 More Accumulate-Repeat-Accumulate Codes . . . . . . . . . . . . . 74
5.6 General Hardware Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
List of Figures
27 Partitioned graph of a simple PCCC 25
28 Parallel turbo decoder with shared processors for two constituent codes 26
29 Performances of parallel decoder 28
30 Efficiency and speed gain 28
31 Efficiency vs. signal to noise ratio 29
32 (a) Bit sequence in matrix form (b) after row interleaver
(c) A conflict-free interleaver (d) Bit sequence in sequential order
(e) The conflict-free interleaved sequence 30
33 Data and extrinsic sequences in two consecutive iterations for turbo
decoder with reverse interleaver 31
34 Sequences in two consecutive iterations for parallel turbo decoder
with reverse interleaver 32
35 Scheduling diagram of the parallel decoder 32
36 The flowchart of the algorithm 34
37 Performance comparison for B = 1,024 36
38 Performance comparison for B = 4,096 37
39 (a) alpha recursion (b) beta recursion (c) Extrinsic computation 37
40 Probability density function of messages in different iterations 42
41 Constituent code model for density evolution 42
42 Constituent code model for density evolution 43
43 SNR improvement in iterative decoding 44
44 Repeat-Accumulate code block diagram 44
45 Density evolution for RA codes (q = 3) 46
46 Accumulator with puncturing and its equivalent for p = 3 47
47 Block diagram of accumulator with puncturing 47
48 Block diagram of check_4 code and its equivalents 51
49 Normalized distance spectrum of RA codes with puncturing 54
50 Density evolution for RA codes with puncturing (q = 4, p = 2) 56
51 The block diagram of the precoder 57
52 ARA(3,3) BER performance bound 58
53 ARA(4,4) BER performance bound 59
54 Normalized distance spectrum of ARA codes with puncturing 60
55 Density evolution for ARA codes with puncturing (q = 4,p = 2) 61
56 Performance of ARA codes using iterative decoding 62
57 The block diagram of the new precoder 63
58 Tanner graph for new ARA code 63
59 Performance of the new ARA code 64
60 The partitioned graph of ARA code 68
61 Parallel turbo decoder structure 68
62 Projected graph 69
63 Projected graph with conflict-free interleaver 70
64 A PCCC projected graph with conflict-free interleaver 71
65 (a) PCCC with 3 component codes (b) SCCC (c) RA(3) (d) IRA(2,3) 72
66 A parallel LDPC projected graph 73
67 Simple graphical representation of a LDPC projected graph 74
68 ARA code without puncturing 75
69 (a) Rate 1/3 ARA code (b) rate 1/2 ARA code 75
70 (a) Rate 1/2 ARA code (b) New rate 1/3 ARA code (c) New rate 1/4
ARA code 76
71 Improved rate 1/2 ARA codes 77
72 Irregular rate 1/2 ARA codes 77
73 Irregular ARA code family for rate > 1/2 78
74 Parallel decoder hardware architecture 79
75 Window processor hardware architecture 79
List of Tables
I Probability Definitions 9
II State Constraint 16
III The Decoder Parameters 26
IV Characteristic Factors for the Parallel Decoder at SNR = 0.7 dB (BER = 10^-8) 29
V An Example of the Interleaver 33
VI Cut-off Thresholds for RA Codes with Puncturing 54
VII Cut-off Threshold for Rate 1/2 ARA Codes 61
VIII Cutoff Threshold for ARA Codes with Rate …

[Figure 36 residue: flowchart of the interleaver design algorithm. Decision nodes: "i > M?", "m < 0?", "Qn and Qk satisfy the constraint?"; updates: m = m+1; k = N*m + j; Qk = N*Pm,j + Nj; m = i-1; m = m-1; exit node: Done.]
Fig. 36 The flowchart of the algorithm
bit. Otherwise, we check the remaining elements in the column in place of this bit. If one of them satisfies the constraint, we exchange the indices in the column and go to the next bit. If not, we try exchanging this bit with the previously designed bits in the column; in this case the constraint must be satisfied for both exchanged bits. If none of the previously designed bits can be exchanged with this bit, the algorithm fails. There are two options when the algorithm fails: one is to make the constraint milder, and the other is to start over with a new random matrix.
We have observed that this algorithm is very fast for S-random interleaver design and that it does not fail when the spread is less than sqrt(B/2). The maximum spread achievable for the structured interleaver is slightly smaller than that of the ordinary one, so some degradation in performance is to be expected.
More elaborate constraints can be used in the algorithm to improve the code; such constraints usually depend strongly on the code.
3.7.3 Simulation Results
For the simulations, two PCCCs with block sizes of 1,024 and 4,096 are chosen. The first constituent code is a rate-1/2 systematic code and the second is a rate-one nonsystematic recursive code. The feed-forward and feedback polynomials, the same for both codes, are 1 + D + D^3 and 1 + D^2 + D^3, respectively. The overall code rate is thus 1/3. The simulated channel is an AWGN channel.
Two interleavers with block lengths of 1,024 (M = 32, N = 32) and 4,096 (M = 128, N = 32) have been designed with the proposed algorithm. The BER performance of the decoders has been simulated and compared with that of the serial decoder with an S-random interleaver. The maximum number of iterations in each case is 10.
The performance comparison for the 1,024 case is illustrated in Figure 37. The proposed two-dimensional S-random interleaver is called S2-random. As the figure shows, the performances are almost the same; the S-random interleaver has a slightly better error floor.
The performance for the 4,096 case is shown in Figure 38. The difference between the error floors is more noticeable. However, the codes have equal thresholds in both cases. The error floor can be reduced with a more powerful constraint in the interleaver design algorithm.
3.8 Hardware Complexity
Having discussed the speed gain and efficiency, we now investigate the hardware complexity of the parallel turbo decoder. The turbo decoder hardware consists of two major parts: logic and memory.
Fig. 37 Performance comparison for B = 1,024
The memory requirement for the parallel decoder consists of the following:
Observations: These are the observation values received from the channel, usually the log-likelihood ratios (LLRs) of the code word bits, which make the message computations easier. The size of this memory is the number of bits in a code word; for a rate-1/3 turbo code with block size B, it is 3B. The width of this memory is usually 4-5 bits. It is read-only during decoding (all iterations), and it is the same as for the serial decoder.
Extrinsics: The extrinsics are stored in a memory to be used later by the other constituent codes. The size of this memory depends on the number of connections between the constituent codes; for a simple PCCC, only B extrinsics are needed. This is a read/write memory with a width of 6-9 bits. For conflict-free interleavers, it is divided into M sub-blocks, each accessed independently. The total memory, however, is the same as for the serial decoder.
Beta variables: These variables, produced by the backward recursion, must be kept in memory until they are used to compute the extrinsics. Each beta variable is actually an array of size 2^(K-1), where K is the constraint length of the convolutional code, i.e. one entry per possible value of the state variable. For an 8-state convolutional code, the memory size for beta variables is 8B. Each array element usually has 8-12 bits. This memory is a big portion of the memory requirements for the decoder.
All of the above operations are done in one clock cycle. In the parallel decoder we have M SISO blocks; therefore, compared to the serial decoder, the above computational logic is increased M times.
Contrary to the pipelined turbo decoder, however, the complexity is not increased in proportion to the speed gain.
3.9 Conclusion
We have proposed an efficient architecture for the parallel implementation of turbo decoders. The advantage of this architecture is that the increase in processing load due to parallelization is minimal. Simulation results demonstrate that this structure not only achieves orders of magnitude of speed gain, but also maintains processing efficiency. We have also shown that the efficiency and the speed gain of this architecture are almost independent of the SNR.
We have also proposed a novel interleaver structure for the parallel turbo decoder. The advantages of this structure are low latency, high speed, and feasibility of implementation. Simulation results show that very good BER performance can be achieved with this architecture as well. We also presented a fast algorithm to design such an interleaver, which can be used for designing S-random and other interleavers by simply changing the constraint.
The regularity of the proposed parallel decoder architecture and the advantages of the proposed interleaver make it the architecture of choice for VLSI implementation of high-speed turbo decoders.
Chapter 4
Very Simple Turbo-like Codes
4.1 Introduction
In searching for simple turbo-like codes, RA codes are very inspiring. They are perhaps the simplest turbo-like codes, and surprisingly they achieve good performance too. The simplicity of these codes lends itself to a more comprehensive analysis of their performance. Divsalar et al. have analyzed the performance of these codes with ML decoding and shown that they can achieve near-Shannon-limit performance [12]. Moreover, they have proved that RA codes achieve the Shannon limit as the rate goes to zero.
However, RA codes cannot compete with turbo codes or well-designed LDPC codes as far as performance is concerned. To improve the performance of RA codes, Jin proposed Irregular Repeat-Accumulate (IRA) codes [16, 17]. He also presented a method for designing very good IRA codes for binary erasure and additive white Gaussian noise channels, and showed that they outperform turbo codes at very large block sizes. However, IRA codes give up both regularity and simplicity in exchange for performance.
In this chapter we show that with some simple modifications RA codes can be transformed into very powerful codes while maintaining their simplicity. The modifications are simple puncturing and precoding. First, RA codes with regular puncturing are analyzed under iterative decoding as well as ML decoding. The ML decoding performance is shown by a tight bound using the weight distribution of RA codes with puncturing. In fact, when both the repetition and the puncturing are increased, the code rate remains the same whereas the performance gets better.
Then we present ARA codes. These codes are not only very simple but also achieve excellent performance. The performance of these codes with ML decoding is illustrated by very tight bounds and compared to random codes. It is shown that there are some simple codes that perform extremely close to the Shannon limit with ML decoding. The performance of ARA codes under iterative decoding is also investigated and compared to ML decoding later on.
RA and ARA codes can be classified as LDPC codes. Although LDPC codes in general may have a computationally involved encoder, RA and ARA codes have a simple encoder structure. ARA codes, especially, allow us to generate a wide range of LDPC codes with various degree distributions, including variable
nodes with degree one (RA and IRA code structures do not allow degree-one variable nodes). They can generate LDPC codes with various code rates and data frame sizes, with performance close to the Shannon capacity limit. The proposed coding structure also allows constructing very high-speed iterative decoders using the belief propagation (message-passing) algorithm.
First we briefly describe some of the tools that we use for analyzing the performance of a turbo-like code.
4.1.1 Bounds on the ML Decoding Performance of Block Codes
Since no practical ML decoding algorithm is available for block codes with large block sizes, we use performance bounds to obtain some insight into code behavior. Using the classic union bound, the frame (word) error rate (FER) and BER of an (N, K) linear block code under ML decoding over an AWGN channel are upper-bounded by
FER ≤ Σ_{d=dmin}^{N} A_d Q( sqrt(2 d r Eb/N0) )    (11)

BER ≤ Σ_{d=dmin}^{N} Σ_{w=1}^{K} (w/K) A_{w,d} Q( sqrt(2 d r Eb/N0) )    (12)
where r denotes the code rate, Eb/N0 is the signal-to-noise ratio, d is the Hamming distance between code words, dmin is the minimum distance between code words, w̄_d is the average input error weight, A_d is the number of code words at distance d, A_{w,d} is the number of code words with input weight w and output weight d, K is the information block length, N is the code word length, and Q denotes the Gaussian Q-function, defined as

Q(x) = (1/sqrt(2π)) ∫_x^∞ e^(-u^2/2) du    (13)
However, this bound is not very tight at low signal-to-noise ratios. There are tighter bounds such as the Viterbi-Viterbi [33], Poltyrev [26], and Divsalar [10] bounds. The Divsalar bound is very attractive since it provides tight bounds with closed-form expressions for the bit-error and word-error probabilities. Here we describe this bound by defining some new variables:

δ = d/N,  a(δ) = ln(A_d)/N,  c = r Eb/N0    (14)
where δ, a(δ), and c are the normalized weight, the normalized weight distribution, and the SNR, respectively. Then we define the following:

c0(δ) = (1 - e^(-2a(δ))) (1 - δ)/(2δ),  f(c, δ) = sqrt( c/c0(δ) + 2c + c^2 ) - c - 1    (15)

Then we define the exponent:

E(c, δ) = (1/2) ln[1 - 2 c0(δ) f(c, δ)] + c f(c, δ)/(1 + f(c, δ)),  c ≥ c0(δ)    (16)
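Assuming the reading c0(δ) = (1 - e^(-2a(δ)))(1 - δ)/(2δ) and f(c, δ) = sqrt(c/c0(δ) + 2c + c^2) - c - 1 for Eq. (15) (a reconstruction of the garbled expressions, not guaranteed to match the original), the exponent can be evaluated numerically. The sketch below only checks internal consistency: f vanishes at c = c0(δ), where the exponent is zero, and the exponent grows with c beyond that point. The values of a(δ) and δ are arbitrary.

```python
import math

def c0(a, delta):
    # threshold SNR c0(delta), Eq. (15) as reconstructed above
    return (1 - math.exp(-2 * a)) * (1 - delta) / (2 * delta)

def f(c, a, delta):
    # optimizing parameter f(c, delta), Eq. (15) as reconstructed above
    return math.sqrt(c / c0(a, delta) + 2 * c + c * c) - c - 1

def E(c, a, delta):
    # error exponent, Eq. (16); valid for c >= c0(delta)
    ff = f(c, a, delta)
    return 0.5 * math.log(1 - 2 * c0(a, delta) * ff) + c * ff / (1 + ff)
```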
4.1.2 Density Evolution Method

Fig. 40 Probability density function of messages in different iterations
To explain the DE phenomenon, we assume that all the code word bits are zero, i.e. the value +1 is sent by the transmitter. We sketch the probability density functions of the OMs for one constituent code in different iterations. Examples of such curves are shown in Figure 40.
As we see, the density functions evolve towards higher means. We can approximate the density functions by a Gaussian, as Wiberg did in his dissertation [34]. The bit decisions are made based on the messages, so the bit-error probability depends on the SNR of the messages. Therefore, what really matters is that the SNR of the density functions should increase from iteration to iteration in order to obtain better performance.
The messages are passed between the constituent codes. Hence, each constituent code receives the evolved messages as input and generates new messages at the output. So what we need to know is the SNR transfer function of each constituent code. Using the transfer functions we can track the behavior of the SNR of the messages

Fig. 41 Constituent code model for density evolution (block G with input SNRin, output SNRout, and Eb/No as a bias input)
Fig. 42 Constituent code model for density evolution (blocks G1 and G2 exchanging SNRin/SNRout, with Eb/No as a bias input)
as they pass between the constituent codes. Therefore, we sketch a general model for each constituent code as in Figure 41. In this model there is one more parameter, the operating Eb/N0. This is the Eb/N0 of the observations (the leaf-node messages) that are fed to the constituent code, and it acts like a bias for the constituent code.
The SNR transfer function is denoted by G, which gives the following relationship:

SNRout = G(SNRin)    (20)

It should be noted that the transfer function implicitly depends on the operating Eb/N0. The transfer function is usually derived by Monte Carlo simulation using the Gaussian approximation or the true density function.
For turbo-like codes with two constituent codes, the overall block diagram of the iterative decoding is shown in Figure 42. We have:

SNRout = G1(SNRin)    (21)
SNRin = G2(SNRout)    (22)
Suppose we start with the first constituent code. There are no prior messages at this time, i.e. SNRin = 0. The observations help to generate new messages with some SNRout = G1(SNRin) > 0. These messages are passed to the second constituent code, whose output messages have SNR = G2(G1(SNRin)); this is the SNRin of the first constituent code for the next iteration. In order to obtain a better SNR at each iteration we should have the following:

G2(G1(SNRin)) > SNRin, for any SNRin    (23)
Since G2 is a strictly increasing function, it is invertible and we can write an equivalent relation:

G1(SNRin) > G2^(-1)(SNRin), for any SNRin    (24)

If the above constraint holds, then iterative decoding will converge to the correct information bits (the SNR goes to infinity). The minimum operating Eb/N0 for which this constraint holds is denoted as the capacity of the code under iterative decoding.
We usually draw G1(SNR) and G2^(-1)(SNR) in one figure. These curves also give some sense of the convergence speed. An example of such curves and the
Fig. 43 SNR improvement in iterative decoding
way the SNR improves over iterations is shown in Figure 43. As we see, the speed of SNR improvement depends on the slopes of G1 and G2.
We use the transfer function curves to analyze the performance of turbo-like codes with iterative decoding.
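The fixed-point condition (23) can be illustrated with made-up transfer curves. G1 and G2 below are not the transfer functions of any real constituent code; they are merely monotone lines chosen so that G2(G1(x)) > x everywhere, mimicking a code above threshold.

```python
def G1(snr):
    # constituent code with channel observations: nonzero output at snr = 0
    return 0.5 + 1.2 * snr

def G2(snr):
    # repetition-like code: straight line through the origin
    return 1.5 * snr

snr = 0.0
history = []
for _ in range(30):
    snr = G2(G1(snr))          # one pass through both constituent codes
    history.append(snr)
```

Because condition (23) holds for these curves, the message SNR grows without bound, i.e. the decoder "converges".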
4.2 RA Codes
RA codes are the simplest among turbo-like codes, which makes them very attractive for analysis. The general block diagram of this code is drawn in Figure 44. An information block of length N is repeated q times and interleaved to make a block of size qN, which is then fed to an accumulator.
Fig. 44 Repeat-Accumulate code block diagram: an information block u of length N enters rep(q), the qN-bit result passes through the interleaver I and then through the accumulator ACC (output length qN)
The accumulator can be viewed as a truncated rate-one recursive convolutional code with transfer function 1/(1 + D), but it is sometimes better to think of it as a block code whose input block [x1, x2, ..., xn] and output block [y1, y2, ..., yn] are related by the following:

y1 = x1
y2 = x1 + x2
y3 = x1 + x2 + x3
...
yn = x1 + x2 + x3 + ... + xn    (25)
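The block-code view of Eq. (25) is just a running XOR, as the sketch below shows:

```python
def accumulate(x):
    """Accumulator of Eq. (25): y_k = x_1 + x_2 + ... + x_k over GF(2),
    i.e. the truncated 1/(1 + D) recursive encoder."""
    y, acc = [], 0
    for bit in x:
        acc ^= bit
        y.append(acc)
    return y

accumulate([1, 0, 1, 1])   # -> [1, 1, 0, 1]
```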
4.2.1 ML Analysis
For ML analysis we need the weight distribution of the code. We use the concept of the uniform interleaver [8] to compute the overall input-output weight enumerator (IOWE). Therefore, we need the IOWEs of both the repetition code and the accumulator. For the repetition code it is simply the following:
A^rep(q)_{w,d} = C(N, w) if d = qw, and 0 otherwise    (26)

It can be expressed as

A^rep(q)_{w,d} = C(N, w) δ(d - qw)    (27)
where δ(·) is the Kronecker delta function and C(N, w) denotes the binomial coefficient. The IOWE of the accumulator is:
A^acc_{w,d} = C(N - d, ⌊w/2⌋) C(d - 1, ⌈w/2⌉ - 1)    (28)

where ⌊x⌋ and ⌈x⌉ denote the largest integer not greater than x and the smallest integer not less than x, respectively.
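Formula (28) is easy to confirm for a small block by exhaustive enumeration; the sketch below (illustrative, not from the text) checks it for N = 6, with the degenerate w = 0 and d = 0 cases handled explicitly.

```python
from itertools import product
from math import comb

def A_acc(N, w, d):
    """Accumulator IOWE of Eq. (28), with degenerate cases handled."""
    if d == 0:
        return 1 if w == 0 else 0
    if w == 0:
        return 0
    return comb(N - d, w // 2) * comb(d - 1, (w + 1) // 2 - 1)

# brute force: run every length-N input through the running-XOR accumulator
N = 6
counts = {}
for x in product([0, 1], repeat=N):
    acc, y = 0, []
    for b in x:
        acc ^= b
        y.append(acc)
    key = (sum(x), sum(y))
    counts[key] = counts.get(key, 0) + 1

assert all(counts.get((w, d), 0) == A_acc(N, w, d)
           for w in range(N + 1) for d in range(N + 1))
```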
Having the IOWEs of the repeat and accumulate codes, we can compute the IOWE of the RA code using the uniform interleaver:

A^RA(q)_{w,d} = Σ_{h=0}^{qN} A^rep(q)_{w,h} A^acc_{h,d} / C(qN, h)
             = C(N, w) C(qN - d, ⌊qw/2⌋) C(d - 1, ⌈qw/2⌉ - 1) / C(qN, qw)    (29)
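Equation (29) can be sanity-checked by summing the enumerator over all (w, d): the total must equal the 2^N codewords. The sketch below does this for q = 3, N = 4 (illustrative values).

```python
from math import comb

def A_RA(q, N, w, d):
    """Average IOWE of the RA(q) code, Eq. (29) (uniform interleaver)."""
    h = q * w                  # the repetition code forces interleaver weight qw
    if d == 0:
        return 1 if w == 0 else 0
    if h == 0:
        return 0
    acc = comb(q * N - d, h // 2) * comb(d - 1, (h + 1) // 2 - 1)
    return comb(N, w) * acc / comb(q * N, h)

# summing over all input/output weights must recover 2^N codewords
total = sum(A_RA(3, 4, w, d) for w in range(5) for d in range(13))
```

Because the accumulator is invertible, summing over d for a fixed w must likewise give C(N, w), which the test below also checks.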
4.2.2 DE Analysis
RA codes consist of two component codes: the repeat code and the accumulate code. Hence, the messages are exchanged between these two codes. We use the Gaussian approximation to obtain the SNR transfer functions of the constituent codes, which are shown in Figure 45.
For the accumulator we have a nonzero output SNR even with zero-SNR input messages from the repetition code, because the channel observations help to generate some nonzero messages. Therefore, the accumulator is able to jump-start the iterative decoding. The repetition code, on the other hand, has a straight-line transfer function

Fig. 45 Density evolution for RA codes (q = 3)
with a slope of 2; the inverse function is shown in Figure 45. SNRout is zero when SNRin is zero, which is justified since no channel observation is available for this code. The curves almost touch; hence, the threshold of the rate-1/3 RA code is almost 0.5 dB.

Fig. 46 Accumulator with puncturing and its equivalent for p = 3
4.3 RA Codes with Puncturing
4.3.1 ML Analysis
To compute the IOWE of RA codes with puncturing, we use the equivalent encoder depicted in Figure 46 instead of the accumulator with puncturing. As the figure shows, the equivalent encoder is the concatenation of a regular check code and an accumulator, shown in Figure 47.
Since the check code is regular and memoryless, the presence of any interleaver between the two codes does not change the IOWE of the overall code. In order to compute the IOWE for this code we insert a uniform interleaver between the two codes.
The next step is to compute the IOWE of the check code. The IOWE can be expressed in a simple closed-form formula if we use the two-dimensional Z-transform; the inverse Z-transform then yields A^c_{w,d}. We start with N = 1, i.e. only one parity check. We have

A^c(W, D) = E_p(W) + O_p(W) D    (30)
Fig. 47 Block diagram of the accumulator with puncturing: a Check(p) stage (pN bits in, N bits out) followed by an accumulator Acc of length N
where

E_p(W) = Even[(1 + W)^p]    (31)

and

O_p(W) = Odd[(1 + W)^p]    (32)
Since there are N independent check nodes in the code, the IOWE can be written in the Z-domain as:

A^c(W, D) = (E_p(W) + O_p(W) D)^N = Σ_{d=0}^{N} C(N, d) E_p(W)^(N-d) O_p(W)^d D^d    (33)
The IOWE is obtained by taking the inverse Z-transform. The closed-form expression of A_{w,d} for arbitrary p is very complicated; instead we derive the IOWE for p = 2, 3, and 4, which are practically more useful.
4.3.1.1 Case p = 2
Using the general formula in the Z-domain we have:

A^c(2)(W, D) = (1 + W^2 + 2WD)^N    (34)

It can be expanded as follows:

Σ_{d=0}^{N} C(N, d) (1 + W^2)^(N-d) (2W)^d D^d = Σ_{d=0}^{N} C(N, d) [ Σ_{j=0}^{N-d} C(N-d, j) W^(2j) ] (2W)^d D^d    (35)
Therefore the IOWE can be expressed as

A^c(2)_{w,d} = C(N, d) C(N-d, j) 2^d,  w = d + 2j for j = 0, ..., N-d; and 0 otherwise    (36)
It can be expressed concisely as

A^c(2)_{w,d} = C(N, d) Σ_{j=0}^{N-d} C(N-d, j) 2^d δ(w - d - 2j)    (37)
where δ(x) is the Kronecker delta function.

Example: N = 3

A =
[ 1   0   0   0
  0   6   0   0
  3   0  12   0
  0  12   0   8
  3   0  12   0
  0   6   0   0
  1   0   0   0 ]
=
[ 1  0  0  0
  0  2  0  0
  3  0  4  0
  0  4  0  8
  3  0  4  0
  0  2  0  0
  1  0  0  0 ]
·
[ 1  0  0  0
  0  3  0  0
  0  0  3  0
  0  0  0  1 ]    (38)

(rows are indexed by input weight w = 0, ..., 6 and columns by output weight d = 0, ..., 3). The second matrix is canceled out when we concatenate this code with other codes through a uniform interleaver.
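Formula (37) reproduces the N = 3 example directly, as the sketch below checks:

```python
from math import comb

def A_c2(N, w, d):
    """IOWE of N independent p = 2 parity checks, Eq. (37)."""
    j, rem = divmod(w - d, 2)
    if rem or j < 0 or j > N - d:
        return 0
    return comb(N, d) * comb(N - d, j) * 2 ** d

# the matrix of Eq. (38), rows w = 0..6, columns d = 0..3
A_38 = [
    [1, 0, 0, 0], [0, 6, 0, 0], [3, 0, 12, 0], [0, 12, 0, 8],
    [3, 0, 12, 0], [0, 6, 0, 0], [1, 0, 0, 0],
]
assert [[A_c2(3, w, d) for d in range(4)] for w in range(7)] == A_38
```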
4.3.1.2 Case p = 3
Starting from the general formula in the Z-domain we have:

A^c(3)(W, D) = (1 + 3W^2 + (3W + W^3) D)^N    (39)

It can be expanded as follows:

A^c(3)(W, D) = Σ_{d=0}^{N} C(N, d) (1 + 3W^2)^(N-d) (3W + W^3)^d D^d    (40)

A^c(3)(W, D) = Σ_{d=0}^{N} C(N, d) [ Σ_{i=0}^{N-d} C(N-d, i) 3^i W^(2i) ] [ Σ_{i=0}^{d} C(d, i) 3^i (W^2)^(d-i) W^d ] D^d    (41)
Relabeling the indices (i'' in the first inner sum, and i → d - i in the second), it can be written as:

A^c(3)(W, D) = Σ_{d=0}^{N} C(N, d) [ Σ_{i''=0}^{N-d} C(N-d, i'') 3^(i'') W^(2i'') ] [ Σ_{i=0}^{d} C(d, i) 3^(d-i) (W^2)^i W^d ] D^d    (42)
Then we have

A^c(3)(W, D) = Σ_{d=0}^{N} C(N, d) Σ_{i=0}^{d} Σ_{i''=0}^{N-d} C(d, i) C(N-d, i'') 3^(i'' + d - i) W^(d + 2i + 2i'') D^d    (43)
If we let j = i + i'', we have

A^c(3)(W, D) = Σ_{d=0}^{N} C(N, d) Σ_{j=0}^{N} Σ_{i=max(0, j-N+d)}^{min(j, d)} C(d, i) C(N-d, j-i) 3^(d+j-2i) W^(d+2j) D^d    (44)
Therefore, it is easy to show that the IOWE becomes:

A^c(3)_{w,d} = C(N, d) Σ_{i=max(0, j-N+d)}^{min(j, d)} C(d, i) C(N-d, j-i) 3^(d+j-2i),  w = d + 2j for j = 0, ..., N; and 0 otherwise    (45)
It can be written as

A^c(3)_{w,d} = C(N, d) Σ_{j=0}^{N} Σ_{i=max(0, j-N+d)}^{min(j, d)} C(d, i) C(N-d, j-i) 3^(d+j-2i) δ(w - d - 2j)    (46)

where δ(·) is the Kronecker delta function. Meanwhile we have the following property:
A^c(3)_{w,d} = A^c(3)_{3N-w, N-d}    (47)

This property is proven easily by taking the complements of the three input bits of each check; the output of the check is then inverted as well. If the numbers of nonzero input and output bits are w and d, respectively, then the complemented version has 3N - w and N - d nonzero bits, which proves the property. The property helps to save some computations.
Example: N = 3

$$A = \begin{bmatrix}1&0&0&0\\0&9&0&0\\9&0&27&0\\0&57&0&27\\27&0&99&0\\0&99&0&27\\27&0&57&0\\0&27&0&9\\0&0&9&0\\0&0&0&1\end{bmatrix}
= \begin{bmatrix}1&0&0&0\\0&3&0&0\\9&0&9&0\\0&19&0&27\\27&0&33&0\\0&33&0&27\\27&0&19&0\\0&9&0&9\\0&0&3&0\\0&0&0&1\end{bmatrix}
\begin{bmatrix}1&0&0&0\\0&3&0&0\\0&0&3&0\\0&0&0&1\end{bmatrix} \tag{48}$$
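The same kind of check applies here. As a sketch (illustrative code, my own function names), the program below enumerates all 2^(3N) inputs for N = 3, verifies the matrix of Eq. (46) against brute force, and confirms the complement symmetry of Eq. (47).

```python
from itertools import product
from math import comb

def iowe_check3(N):
    """IOWE of N parallel 3-input single-parity checks, per Eq. (46)."""
    A = [[0] * (N + 1) for _ in range(3 * N + 1)]
    for d in range(N + 1):
        for j in range(N + 1):
            for i in range(max(0, j - N + d), min(j, d) + 1):
                A[d + 2 * j][d] += (comb(N, d) * comb(d, i)
                                    * comb(N - d, j - i) * 3 ** (d + j - 2 * i))
    return A

def iowe_check3_brute(N):
    """Brute force: output bit i is the parity of input bits (3i, 3i+1, 3i+2)."""
    A = [[0] * (N + 1) for _ in range(3 * N + 1)]
    for bits in product((0, 1), repeat=3 * N):
        w = sum(bits)
        d = sum(bits[3 * i] ^ bits[3 * i + 1] ^ bits[3 * i + 2] for i in range(N))
        A[w][d] += 1
    return A

N = 3
A = iowe_check3(N)
assert A == iowe_check3_brute(N)
# Symmetry of Eq. (47): complementing all 3N inputs flips every parity bit.
for w in range(3 * N + 1):
    for d in range(N + 1):
        assert A[w][d] == A[3 * N - w][N - d]
```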
Fig. 48 Block diagram of the check(4) code (4N → N) and its equivalents: two concatenated check(2) codes (4N → 2N → N)
4.3.1.3 Case p = 4
The code for this case can be viewed as a concatenated code as shown in Figure 48. Because the check code is regular and memoryless, we can put any interleaver between the codes without changing the IOWE of the overall code. By using a uniform interleaver and the results found for case p = 2, the IOWE can be written as:

$$A^{c(4)}_{w,d} = \sum_{h=0}^{2N}\frac{A^{c(2)}_{w,h}\,A^{c(2)}_{h,d}}{\dbinom{2N}{h}} \tag{49}$$
Using the result for case p = 2, we obtain

$$A^{c(4)}_{w,d} = \binom{N}{d}\sum_{j=0}^{N-d}\sum_{i=0}^{2N-d-2j}\binom{N-d}{j}\binom{2N-d-2j}{i}\,2^{2d+2j}\,\delta(w-d-2i-2j) \tag{50}$$
Example: N = 3

$$A^{c(4)} = \begin{bmatrix}1&0&0&0\\0&12&0&0\\18&0&48&0\\0&156&0&64\\111&0&384&0\\0&600&0&192\\252&0&672&0\\0&600&0&192\\111&0&384&0\\0&156&0&64\\18&0&48&0\\0&12&0&0\\1&0&0&0\end{bmatrix}
= \begin{bmatrix}1&0&0&0\\0&4&0&0\\18&0&16&0\\0&52&0&64\\111&0&128&0\\0&200&0&192\\252&0&224&0\\0&200&0&192\\111&0&128&0\\0&52&0&64\\18&0&16&0\\0&4&0&0\\1&0&0&0\end{bmatrix}
\begin{bmatrix}1&0&0&0\\0&3&0&0\\0&0&3&0\\0&0&0&1\end{bmatrix} \tag{51}$$
This method can be applied for any p that can be decomposed into two smaller
numbers.
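A minimal sketch of this decomposition idea (illustrative code, my own function names): build A^{c(4)} by the uniform-interleaver concatenation of Eq. (49) and compare it with the closed form of Eq. (50).

```python
from math import comb

def iowe_check2(M):
    """IOWE of M parallel 2-input checks (2M inputs, M outputs), Eq. (37)."""
    A = [[0] * (M + 1) for _ in range(2 * M + 1)]
    for d in range(M + 1):
        for j in range(M - d + 1):
            A[d + 2 * j][d] += comb(M, d) * comb(M - d, j) * 2 ** d
    return A

def iowe_check4(N):
    """check(4) as check(2) [4N->2N] + uniform interleaver + check(2) [2N->N],
    per Eq. (49). Each term is divisible by C(2N, h), so // is exact."""
    outer, inner = iowe_check2(2 * N), iowe_check2(N)
    return [[sum(outer[w][h] * inner[h][d] // comb(2 * N, h)
                 for h in range(2 * N + 1))
             for d in range(N + 1)] for w in range(4 * N + 1)]

def iowe_check4_direct(N):
    """Closed form of Eq. (50)."""
    A = [[0] * (N + 1) for _ in range(4 * N + 1)]
    for d in range(N + 1):
        for j in range(N - d + 1):
            for i in range(2 * N - d - 2 * j + 1):
                A[d + 2 * i + 2 * j][d] += (comb(N, d) * comb(N - d, j)
                    * comb(2 * N - d - 2 * j, i) * 2 ** (2 * d + 2 * j))
    return A

assert iowe_check4(3) == iowe_check4_direct(3)
```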
Having computed the IOWE of the check code, we can use the uniform interleaver formula to come up with the IOWE of the accumulator with puncturing. We have:

$$A^{acc(p)}_{w,d} = \sum_{h=0}^{N}\frac{A^{c(p)}_{w,h}\,A^{acc}_{h,d}}{\dbinom{N}{h}} \tag{52}$$
The simplified expressions for cases p = 2, 3, and 4 are as follows:

$$A^{acc(2)}_{w,d} = \sum_{h=0}^{N}\sum_{j=0}^{N-h}\binom{N-h}{j}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\,2^{h}\,\delta(w-h-2j) \tag{53}$$

$$A^{acc(3)}_{w,d} = \sum_{h=0}^{N}\sum_{j=0}^{N}\sum_{i=\max(0,\,j-N+h)}^{\min(j,\,h)}\binom{h}{i}\binom{N-h}{j-i}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\,3^{\,h+j-2i}\,\delta(w-h-2j) \tag{54}$$

$$A^{acc(4)}_{w,d} = \sum_{h=0}^{N}\sum_{j=0}^{N-h}\sum_{i=0}^{2N-h-2j}\binom{N-h}{j}\binom{2N-h-2j}{i}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\,2^{2h+2j}\,\delta(w-h-2i-2j) \tag{55}$$
It should be noted that, despite using a uniform interleaver to obtain the IOWE, we arrive at the exact IOWE of the accumulator with puncturing.
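This exactness claim can be verified directly for small N. The sketch below (illustrative code; it assumes the standard accumulator IOWE that appears in Eq. (53)) composes check(2) with an accumulator through a uniform interleaver, per Eq. (52), and matches a brute-force enumeration of the punctured accumulator.

```python
from itertools import product
from math import comb

def iowe_check2(N):
    """IOWE of N parallel 2-input checks, Eq. (37)."""
    A = [[0] * (N + 1) for _ in range(2 * N + 1)]
    for d in range(N + 1):
        for j in range(N - d + 1):
            A[d + 2 * j][d] += comb(N, d) * comb(N - d, j) * 2 ** d
    return A

def iowe_acc(N):
    """Length-N accumulator: A[h][d] = C(N-d, floor(h/2)) C(d-1, ceil(h/2)-1)."""
    A = [[0] * (N + 1) for _ in range(N + 1)]
    A[0][0] = 1
    for h in range(1, N + 1):
        for d in range(1, N + 1):
            A[h][d] = comb(N - d, h // 2) * comb(d - 1, (h + 1) // 2 - 1)
    return A

def iowe_acc2(N):
    """Punctured accumulator (p = 2) as check(2) + uniform interleaver +
    accumulator, per Eq. (52); each term is divisible by C(N, h)."""
    chk, acc = iowe_check2(N), iowe_acc(N)
    return [[sum(chk[w][h] * acc[h][d] // comb(N, h) for h in range(N + 1))
             for d in range(N + 1)] for w in range(2 * N + 1)]

def iowe_acc2_brute(N):
    """Brute force: accumulate 2N bits, keep every second output bit."""
    A = [[0] * (N + 1) for _ in range(2 * N + 1)]
    for bits in product((0, 1), repeat=2 * N):
        run, kept = 0, 0
        for k, b in enumerate(bits):
            run ^= b
            kept += run if k % 2 == 1 else 0
        A[sum(bits)][kept] += 1
    return A

assert iowe_acc2(3) == iowe_acc2_brute(3)
```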
The next step is to find the IOWE of the RA code with puncturing, which is derived using a uniform interleaver placed after the repetition.
$$A^{rep(q)\text{-}acc(p)}_{w,d} = \sum_{h=0}^{qN}\frac{A^{rep(q)}_{w,h}\,A^{acc(p)}_{h,d}}{\dbinom{qN}{h}} \tag{56}$$
Therefore, the closed-form expressions for the IOWE of RA(p = 2, q = 2), RA(p = 3, q = 3), and RA(p = 4, q = 4) will be the following:

$$A^{rep(2)\text{-}acc(2)}_{w,d} = \frac{\dbinom{N}{w}}{\dbinom{2N}{2w}}\sum_{h=0}^{N}\sum_{j=0}^{N-h}\binom{N-h}{j}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\,2^{h}\,\delta(2w-h-2j) \tag{57}$$
$$A^{rep(3)\text{-}acc(3)}_{w,d} = \frac{\dbinom{N}{w}}{\dbinom{3N}{3w}}\sum_{h=0}^{N}\sum_{j=0}^{N}\sum_{i=\max(0,\,j-N+h)}^{\min(j,\,h)}\binom{h}{i}\binom{N-h}{j-i}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\,3^{\,h+j-2i}\,\delta(3w-h-2j) \tag{58}$$
$$A^{rep(4)\text{-}acc(4)}_{w,d} = \frac{\dbinom{N}{w}}{\dbinom{4N}{4w}}\sum_{h=0}^{N}\sum_{j=0}^{N-h}\sum_{i=0}^{2N-h-2j}\binom{N-h}{j}\binom{2N-h-2j}{i}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\,2^{2h+2j}\,\delta(4w-h-2i-2j) \tag{59}$$

The above expressions are the IOWE of the nonsystematic RA codes with regular puncturing. However, in most cases we need systematic codes. It is very easy to compute the IOWE of a systematic code from that of its nonsystematic counterpart. The following formula shows the conversion.
$$A^{sys\text{-}rep(q)\text{-}acc(p)}_{w,d} = A^{rep(q)\text{-}acc(p)}_{w,\;d-w} \tag{60}$$
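Unlike the punctured accumulator, the repetition-accumulator concatenation of Eq. (56) is a genuine ensemble average, so the IOWE entries are rational numbers. A sketch using exact rational arithmetic (illustrative code, my own function name) evaluates Eq. (57) and checks the sanity condition that each row sums to C(N, w), the number of weight-w inputs.

```python
from fractions import Fraction
from math import comb

def iowe_ra22(N):
    """Ensemble-average IOWE of the nonsystematic punctured RA(q=2, p=2)
    code, per Eq. (57); entries are exact rationals."""
    A = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    for w in range(N + 1):
        norm = Fraction(comb(N, w), comb(2 * N, 2 * w))
        for h in range(N + 1):
            for j in range(N - h + 1):
                if h + 2 * j != 2 * w:       # the delta(2w - h - 2j) constraint
                    continue
                for d in range(N + 1):
                    if h == 0:
                        a = 1 if d == 0 else 0   # zero input -> zero output
                    elif d == 0:
                        a = 0
                    else:
                        a = (comb(N - d, h // 2)
                             * comb(d - 1, (h + 1) // 2 - 1))
                    A[w][d] += norm * comb(N - h, j) * 2 ** h * a
    return A

A = iowe_ra22(4)
# Every weight-w input maps to exactly one codeword, so each row of the
# average IOWE must sum to C(N, w).
for w in range(5):
    assert sum(A[w]) == comb(4, w)
```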
4.3.2 Performance of Punctured RA Codes with ML Decoding
RA codes are usually nonsystematic codes, i.e. the information block is not sent along with the output of the accumulator. However, RA codes with puncturing should be systematic in order to be decodable by iterative decoding. This constraint exists because otherwise the messages passed towards the information variables are always zero and hence never improve.
The normalized distance spectra of some rate 1/2 codes for a block size of 4,000 are illustrated in Figure 49. These codes are the RA code (q = 2), systematic RA codes with puncturing (q = 3, p = 3) and (q = 4, p = 4), and a random code. The IOWE and distance spectrum of an (n, k) random code are

$$A^{random}_{w,d} = \binom{k}{w}\binom{n}{d}\,2^{-n} \tag{61}$$

$$A^{random}_{d} = \binom{n}{d}\,2^{-(n-k)} \tag{62}$$
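As a quick numerical illustration (not from the text), the finite-n spectrum of Eq. (62) converges to the entropy-based asymptotic expression of Eq. (63); the sketch below evaluates ln(A_d)/n via log-gamma functions for a rate 1/2 code at normalized distance δ = 0.3.

```python
import math

def r_random_exact(n, k, d):
    """ln(A_d)/n for the (n, k) random-code spectrum of Eq. (62)."""
    ln_binom = (math.lgamma(n + 1) - math.lgamma(d + 1)
                - math.lgamma(n - d + 1))
    return (ln_binom - (n - k) * math.log(2)) / n

def r_random_asym(delta, rate):
    """Asymptotic limit H(delta) + (rate - 1) ln 2, entropy in nats."""
    H = -delta * math.log(delta) - (1 - delta) * math.log(1 - delta)
    return H + (rate - 1) * math.log(2)

# The finite-n normalized spectrum approaches the asymptotic shape:
for n in (100, 1000, 10000):
    print(n, round(r_random_exact(n, n // 2, int(0.3 * n)), 4))
print("asymptote", round(r_random_asym(0.3, 0.5), 4))  # approx 0.2643
```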
Fig. 49 Normalized distance spectrum of RA codes with puncturing

To compute the cutoff thresholds of these codes using the Divsalar bound, the normalized distance spectrum is computed as N goes to infinity, i.e. the asymptotic expression of r(δ). For random codes with code rate $R_c$ it is

$$r(\delta) = H(\delta) + (R_c - 1)\ln 2 \tag{63}$$
where H(·) is the binary entropy function (in nats). The asymptotic expression of r(δ) for an RA code with repetition q can be obtained as:
$$r(\delta) = \max_{0 < u \le \min(2\delta,\,2(1-\delta))}\left[-\frac{q-1}{q}H(u) + (1-\delta)\,H\!\left(\frac{u}{2(1-\delta)}\right) + \delta\,H\!\left(\frac{u}{2\delta}\right)\right] \tag{64}$$
For the systematic RA(q = 3, p = 3) code, let γ = h/2N, δ1 = i/2N, and δ2 = j/2N for 0 < δ2 < 1/2. Also (2δ2 + γ)/3 < min(0.5, δ). Then

$$r(\delta) = \max_{\gamma,\,\delta_1,\,\delta_2}\Bigg[-H\!\left(\frac{2\gamma+4\delta_2}{3}\right) + \gamma H\!\left(\frac{\delta_1}{\gamma}\right) + \left(\frac{1}{2}-\gamma\right)H\!\left(\frac{\delta_2-\delta_1}{1/2-\gamma}\right) + (\gamma+\delta_2-2\delta_1)\ln 3 + \left(\frac{1}{2}-\delta+\frac{2\delta_2+\gamma}{3}\right)H\!\left(\frac{\gamma/2}{1/2-\delta+(2\delta_2+\gamma)/3}\right) + \left(\delta-\frac{2\delta_2+\gamma}{3}\right)H\!\left(\frac{\gamma/2}{\delta-(2\delta_2+\gamma)/3}\right)\Bigg] \tag{65}$$
To derive the asymptotic expression of r(δ) for RA(q = 4, p = 4), we let δ = d/2N for 0 < δ < 1 and γ = h/2N for 0 < γ < 1/2.
4.4 ARA Codes
Fig. 51 The block diagram of the precoder
M bits are passed through without any change and the rest (N − M bits) go through an accumulator. M is considered a parameter in the code design. The effect of this parameter is studied under ML and iterative decoding, and it is then optimized to achieve the best performance. The block diagram of the precoder is shown in Figure 51.
4.4.1 ML Analysis
In order to find the performance of the code we need to compute the IOWE of the precoder. It is easily computed using the IOWE of the accumulator code (here of length N − M) as follows:

$$A^{pre}_{w,d} = \sum_{m=0}^{M}\binom{M}{m}A^{acc}_{w-m,\;d-m} \tag{67}$$
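Eq. (67) can again be checked against direct enumeration. In the sketch below (illustrative code, my own function names), the first M bits pass through and the remaining N − M are accumulated; the result matches the brute-force count for N = 5, M = 2.

```python
from itertools import product
from math import comb

def iowe_pre(N, M):
    """Precoder IOWE per Eq. (67): M bits pass through, N-M bits accumulate."""
    L = N - M
    A = [[0] * (N + 1) for _ in range(N + 1)]
    for m in range(M + 1):
        for h in range(L + 1):          # weight into the accumulator
            for d in range(L + 1):      # accumulator output weight
                if h == 0:
                    a = 1 if d == 0 else 0
                elif d == 0:
                    a = 0
                else:
                    a = comb(L - d, h // 2) * comb(d - 1, (h + 1) // 2 - 1)
                A[m + h][m + d] += comb(M, m) * a
    return A

def iowe_pre_brute(N, M):
    """Brute force: pass the first M bits, accumulate the remaining N-M."""
    A = [[0] * (N + 1) for _ in range(N + 1)]
    for bits in product((0, 1), repeat=N):
        run, out = 0, list(bits[:M])
        for b in bits[M:]:
            run ^= b
            out.append(run)
        A[sum(bits)][sum(out)] += 1
    return A

assert iowe_pre(5, 2) == iowe_pre_brute(5, 2)
```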
Therefore the IOWE of the overall code can be written as:

$$A^{pre\text{-}rep(q)\text{-}acc(p)}_{w,d} = \sum_{h=0}^{N}\frac{A^{pre}_{w,h}\,A^{rep(q)\text{-}acc(p)}_{h,d}}{\dbinom{N}{h}} \tag{68}$$
For the systematic ARA code (p = 3, q = 3), we have

$$A^{pre\text{-}rep(3)\text{-}acc(3)}_{w,d} = \sum_{m=0}^{M}\sum_{k=0}^{N}\frac{\dbinom{M}{m}}{\dbinom{3N}{3k}}\sum_{h=0}^{N}\sum_{j=0}^{N}\sum_{i=\max(0,\,j-N+h)}^{\min(j,\,h)}\binom{h}{i}\binom{N-h}{j-i}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\binom{N-M-k+m}{\lfloor (w-m)/2\rfloor}\binom{k-m-1}{\lceil (w-m)/2\rceil-1}\,3^{\,h+j-2i}\,\delta(3k-h-2j) \tag{69}$$
$$A^{pre\text{-}rep(4)\text{-}acc(4)}_{w,d} = \sum_{m=0}^{M}\sum_{k=0}^{N}\frac{\dbinom{M}{m}}{\dbinom{4N}{4k}}\sum_{h=0}^{N}\sum_{j=0}^{N-h}\sum_{i=0}^{2N-h-2j}\binom{N-h}{j}\binom{2N-h-2j}{i}\binom{N-d}{\lfloor h/2\rfloor}\binom{d-1}{\lceil h/2\rceil-1}\binom{N-M-k+m}{\lfloor (w-m)/2\rfloor}\binom{k-m-1}{\lceil (w-m)/2\rceil-1}\,2^{2h+2j}\,\delta(4k-h-2i-2j) \tag{70}$$
Fig. 52 ARA(3,3) BER performance bound
The above expressions are the IOWE of the nonsystematic ARA codes with regular puncturing. The IOWE of the corresponding systematic codes are derived by the following conversion.

$$A^{sys\text{-}ARA(q,p)}_{w,d} = A^{ARA(q,p)}_{w,\;d-w} \tag{71}$$
4.4.2 Performance of ARA Codes with ML Decoding
The Divsalar BER performance bounds of the ARA(3,3) and ARA(4,4) codes for different values of M are shown in Figures 52 and 53. It is observed that the more bits that are accumulated in the precoder, the lower the code threshold becomes. However, the improvement stops at a certain point, which is M = N/5 for ARA(3,3) and M = 2N/5 for ARA(4,4). It is obvious that when M = N the codes turn into RA codes with puncturing. It is very encouraging that the performance of ARA(4,4) approaches very closely that of random codes of the same block size in the low Eb/N0 region.

It is very instructive to observe the distance spectrum of these codes (for the optimum M). As we see in Figure 54, the only difference between the distance spectrum of these codes and that of a random code is in the low-distance region, which causes the error floor.
Fig. 54 Normalized distance spectrum of ARA codes with puncturing
To derive the asymptotic expression of r(δ) for ARA(q = 4, p = 4), we let μ = M/2N for 0 < μ < 1/2, δ1 = m/2N for 0 < δ1 < μ, δ2 = (w − m)/2N for 0 < δ2 < 1/2 − μ, and δ = d/2N for 0 < δ < 1.
Table VII Cut-off Threshold for Rate 1/2 ARA Codes

                     ARA punc.     ARA punc.     Random     Shannon
                     (q=3, p=3)    (q=4, p=4)    code       limit
Cutoff threshold,
rate 1/2             0.509 dB      0.310 dB      0.308 dB   0.184 dB

Table VII tabulates the Divsalar cutoff thresholds for the same codes as in Figure 54. As we expected based on the BER performance bound, the cutoff threshold of ARA(4,4) is extremely close to the cutoff threshold of the random code.
4.4.3 DE Analysis
Unfortunately, iterative decoding cannot decode as well as ML decoding. Moreover, the difference between the performance of two codes under iterative decoding cannot be predicted from their ML decoding performance.
Fig. 55 Density evolution for ARA codes with puncturing (q = 4, p = 2)

The effect of the precoder in iterative decoding is very clear in Figure 55. The accumulator and the repetition are regarded as one constituent code. The SNR transfer functions of this code and of the simple repetition code are shown for comparison. The noticeable difference is a shift in the curve, which improves the threshold by almost 0.5 dB.

Fig. 56 Performance of ARA codes using iterative decoding
We have used the DE method to optimize the ARA codes for iterative decoding. ARA(4,4) and ARA(3,3) achieve the best performance with M = 0.7N and M = 0.5N, respectively. The performance of these codes is illustrated in Figure 56.
4.5 Other Precoders
Although the ML decoding performance of ARA codes with the simple accumulator precoder is very close to that of random codes, there is no known practical method to actually perform this decoding. Therefore, we are looking for codes that have good performance with iterative decoding. We have observed that very good codes can be obtained with different precoders. In this section we introduce some of these precoders.
Fig. 59 Performance of the new ARA code
4.6 Hardware Complexity
Since the building blocks of the decoder are repetition and check nodes, as in LDPC codes, and the message-passing rules for these blocks are very simple, the hardware complexity is very low as far as the logic is concerned.

The memory requirement depends on the number of edges in the graph. ARA codes have quite a small number of edges, which results in a low memory requirement.
4.7 Conclusion
This study proposes a novel coding structure which is not only very simple, but also achieves performance comparable to or better than the best practical turbo codes and LDPC codes. The ML analysis showed that in some cases they are extremely close to random codes, which achieve the Shannon limit.
The proposed coding scheme generates a family of LDPC codes for various code rates and data frame sizes, with a performance close to the Shannon capacity limit. Unlike general LDPC codes, they also have very simple encoding. The main innovation is the inclusion of a very simple precoder, constructed from parallel punctured accumulators. Such a precoder improves the performance.
The regularity and simplicity of the proposed coding structure also allow the construction of very high-speed iterative decoders using the message-passing algorithm.
Chapter 5
High Speed Turbo-like Decoders
5.1 Introduction
This chapter presents the architecture for high-speed decoding of ARA codes, where the message-passing algorithm enables us to achieve parallelism. Simulations have shown that efficiency is not compromised in order to obtain speed gains.
As in the parallel turbo decoder, memory access poses a practical problem. We extend the concept of the conflict-free interleaver to address this problem. This leads to the introduction of a new class of turbo-like codes that can be decoded very fast. It is shown that the proposed high-speed turbo and ARA decoders are among the codes in this class. The general architecture for decoding this class is presented.
5.2 Parallel ARA Decoder
To build high-speed decoders for ARA codes we follow the approach used for high-speed turbo decoders. The basic idea is to partition the graph into several subgraphs and let them work in parallel. For hardware regularity it is desirable that the subgraphs are identical or have minimal variety. As an example, the ARA code shown in Figure 58 is considered. The partitioned graph is drawn in Figure 60.
Each subgraph is decoded using the message-passing algorithm. Since the subgraphs have a tree structure, efficient scheduling provides the fastest decoding method. Usually the decoding for each subgraph is done serially, which lowers the complexity. The hardware entity that performs the decoding for one subgraph is called a subgraph processor or window processor, since each subgraph corresponds to a window of the codeword.
There are three types of messages that are communicated within or between subgraphs. Internal messages are those that correspond to edges within one subgraph. Border messages are those related to edges connecting two adjacent subgraphs. External messages are passed between subgraphs through the interleaver; in other words, they correspond to edges with global span. External messages are called extrinsics after their counterparts in turbo codes.
Aliazam Abbasfar, Turbo-Like Codes, 67–80.
© Springer 2007
Fig. 60 The partitioned graph of the ARA code
We need memory for storing all the messages. The memory for internal messages is local to the subgraph processors, whereas the external messages reside in a global memory that all subgraph processors can access. Border messages are usually stored in registers that are part of the subgraph processors; they are exchanged with the neighboring subgraphs at the end of each iteration.
Therefore the architecture of the decoder is as in Figure 61, in which a and b are border messages and x and y are extrinsics. Internal messages are not shown. This architecture is very similar to the parallel turbo decoder; the only difference is that there are two different window processors here, denoted W and W'.

Fig. 61 Parallel turbo decoder structure
5.3 Speed Gain and Efficiency
Unlike turbo decoders, parallel processing does not cost much additional processing, because LDPC codes are inherently parallel. What we are doing here is making the parallelization practically feasible.
5.4 Interleaver Design
Although the message-passing algorithm allows us to parallelize the decoding process, accessing so many extrinsics at the same time poses a practical problem. Since M window processors are running at the same time, M extrinsics are used simultaneously. The extrinsic memory is organized in M banks in order to facilitate this simultaneous access; i.e. M locations are accessed simultaneously.
As we discussed in the case of the parallel turbo decoder, the interleaver should be such that the window processors get the extrinsics from different banks of memory in interleaved order as well. This forces us to use the conflict-free interleaver presented for the parallel turbo decoder. However, in this section we look at this problem from a graphical point of view. The parallel decoder comprises M identical processors running in parallel. We put the partitions in parallel planes and then look at the projected graph. The projected graph for the ARA code shown in Figure 60 is shown in Figure 62.
Fig. 62 Projected graph

The projected graph can be viewed as the vectorized version of the actual graph. In other words, there is a message vector associated with every edge in the projected graph. The structure of the message memories is such that only one message vector is accessible at a time. The interleaver should preserve the message vectors in their entirety, but permutation is allowed within a vector. The permutation within a vector is a permutation among the window processors, or among the different planes in the overall graph. This permutation does not change the projected graph.

Fig. 63 Projected graph with conflict-free interleaver
Therefore the interleaver consists of several independent permutations within the message vectors. The way vectors are connected between two constituent codes is another flexibility in the interleaver design. An example of a projected graph with a conflict-free interleaver is shown in Figure 63. The dashed edges indicate that permutation is allowed within a vector.

The above connections not only guarantee the conflict-free structure, but also ensure that messages are propagated throughout the graph. Therefore the projected graph provides a very useful approach for designing turbo-like codes for high-speed decoding.
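The conflict-free structure can be stated operationally: with M banks of extrinsic memory (one per window processor) and message vectors of length M, an interleaver is conflict-free if every simultaneous access touches M distinct banks at a common vector slot. The sketch below (sizes and names are hypothetical, not from the text) builds such an interleaver from one global slot permutation plus independent per-slot bank permutations, which is exactly the "permutation within a vector" freedom described above, and then checks the property.

```python
import random

M, V = 4, 6   # M window processors, V message-vector slots; length M*V = 24

def is_conflict_free(perm, M, V):
    """perm maps global extrinsic index -> global index, with index
    i = bank*V + slot. It is conflict-free if, for every slot, the M images
    {perm[b*V + slot]} hit M distinct banks at one common slot."""
    for slot in range(V):
        images = [perm[b * V + slot] for b in range(M)]
        banks = {i // V for i in images}
        slots = {i % V for i in images}
        if len(banks) != M or len(slots) != 1:
            return False
    return True

# Build a conflict-free interleaver: permute slots globally, banks per slot.
rng = random.Random(1)
slot_perm = rng.sample(range(V), V)
perm = [0] * (M * V)
for slot in range(V):
    bank_perm = rng.sample(range(M), M)       # within-vector permutation
    for b in range(M):
        perm[b * V + slot] = bank_perm[b] * V + slot_perm[slot]

assert is_conflict_free(perm, M, V)
assert sorted(perm) == list(range(M * V))      # a valid permutation
```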
5.5 Projected Graph
In this section, a design methodology for turbo-like codes based on the projected graph is presented. There are two approaches that can be pursued in order to design codes based on projected graphs.

The first approach is the one used so far to parallelize the decoder. It is based on partitioning an existing code graph into subgraphs, and it works on any regular or semiregular graph. The projected graph includes one partition of each component code, which is called the component graph. The component graphs are connected with conflict-free interleavers. This method was used for the ARA code whose projected graph is shown in Figure 63.

Fig. 64 A PCCC projected graph with conflict-free interleaver

It is shown that the parallel turbo decoder is a member of this class. Later on, LDPC codes based on the projected graph are also introduced.
5.5.1 Parallel Turbo Decoder
The parallel turbo decoder proposed in Chapter 3 is one example of codes based on the projected graph. The projected graph of such a code is illustrated in Figure 64.

It is very instructive to note that the reverse interleaver, used for decreasing the latency, is clearly shown connecting the edges of the two component graphs in reverse order. From the projected-graph viewpoint, we can see that the interleaver structure is just several independent permutations. The number of permutations needed is the window size; here it is 4.
5.5.2 Other Known Turbo-like Codes
The class of codes based on the projected graph covers a wide range of turbo-like codes. This section presents some known turbo and turbo-like codes with their projected graphs. Figure 65 illustrates the projected graphs for a parallel turbo code with three constituent codes, a serial turbo code, an RA code, and an IRA code.
Fig. 65 (a) PCCC with three component codes; (b) SCCC; (c) RA(3); (d) IRA(2,3)
The second approach is to design the code by designing its projected graph. In this method we design some component graphs and the connections between them in order to obtain good performance. In other words, the partitions are designed first and then put together to create the constituent codes.

This approach is very appealing because the resulting code is guaranteed to be parallelizable, and the performance of the code can be analyzed very efficiently through its component graphs. The first example of this approach is parallel LDPC codes, which are explained in the following section.
5.5.3 Parallel LDPC Codes
This section explains how to design LDPC codes with parallel decoding capability. This class of LDPC codes was independently discovered by Richardson et al. [29], who call them vector LDPC codes. Thorpe [32] introduced LDPC codes based on protographs, which is basically the same concept. In this section we present this class as codes with a projected graph.

There are two component graphs in these codes: one contains only single parity-check codes (of variable degree) and the other only repetition codes (of variable degree). One example of such a code is shown in Figure 66.
There are some noticeable facts about this projected graph. All variable nodes are in one component graph, which means that all the observations are stored and processed in one kind of window processor. Variable and check nodes can have different degrees; therefore this structure is capable of implementing regular and irregular LDPC codes. The degree distributions of the variable and check nodes are known from the projected graph. There are no local messages and no border messages passed between adjacent subgraphs; in other words, we only have external edges. Therefore, the projected graph can be represented graphically as a simple Tanner graph. The graphical representation for the above example is shown in Figure 67. The number of interleavers needed for this code is equal to the number of edges of the projected graph.

The only disadvantage of this method of code design is that it does not provide an efficient encoder. Sometimes simple encoding is not possible for codes designed this way.
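The construction can be sketched as a lifting: each edge of the projected graph (protograph) becomes an M x M permutation block (the per-edge interleaver) in the lifted parity-check matrix. The code below is a minimal illustration with a made-up base graph, not a construction from the text.

```python
import random

def lift_protograph(base, M, seed=0):
    """Lift a protograph (base biadjacency matrix, entries 0/1) into an
    M-fold LDPC parity-check matrix: each base edge becomes an M x M
    permutation block, i.e. one independent interleaver per edge."""
    rng = random.Random(seed)
    rows, cols = len(base), len(base[0])
    H = [[0] * (cols * M) for _ in range(rows * M)]
    for r in range(rows):
        for c in range(cols):
            if base[r][c]:
                pi = rng.sample(range(M), M)   # the per-edge permutation
                for i in range(M):
                    H[r * M + i][c * M + pi[i]] = 1
    return H

# A small hypothetical projected graph: 2 checks, 4 variables.
base = [[1, 1, 1, 0],
        [0, 1, 1, 1]]
H = lift_protograph(base, M=5)
# Row and column degrees of the lifted code match the projected graph.
assert all(sum(row) == 3 for row in H)
assert sum(H[i][5] for i in range(10)) == 2   # variable 1 has degree 2
```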
Fig. 66 A parallel LDPC projected graph
Fig. 67 Simple graphical representation of an LDPC projected graph
5.5.4 More Accumulate-Repeat-Accumulate Codes
In this section some improved ARA codes are introduced that were designed based on projected graphs. These are ARA codes with different code rates. The thresholds of these codes are also compared with the channel capacity threshold.
5.5.4.1 Code Rates
Fig. 68 ARA code without puncturing
Fig. 69 (a) Rate 1/3 ARA code; (b) rate 1/2 ARA code
Table VIII Cutoff Threshold for ARA Codes with Rate
Table IX Cutoff Threshold for Improved ARA Codes with Rate
Table X Cutoff Threshold for ARA Codes with Rate > 1/2

Rate             4/7     5/8     2/3     7/10    8/11    3/4     10/13
Threshold (dB)   0.700   1.006   1.272   1.506   1.710   1.894   2.057
Shannon limit    0.530   0.815   1.059   1.272   1.459   1.626   1.777
Difference       0.170   0.191   0.213   0.234   0.251   0.268   0.280
5.5.4.3 Code Rates >1/2
In this section we present a family of ARA codes derived from the rate 1/2 irregular ARA code in Figure 72. The projected graph of these codes is shown in Figure 73, and the performance of this family is listed for different rates in Table X, which also shows how close each threshold is to the Shannon limit.
5.6 General Hardware Architecture
In this section we present a general hardware architecture for implementing parallel turbo-like decoders. Without any loss of generality we focus on turbo-like codes with two constituent codes; this can easily be extended to codes with several constituent codes by grouping them into two combined constituent codes. The general hardware architecture is shown in Figure 74, where EXTn denotes the external memory for the nth window processor.
Since the processors are identical and run in parallel, the scheduling is the same for all of them. Therefore, only one scheduling controller is needed for each constituent code. The scheduling controller determines which message vector is accessed and what permutation is used. The permutor is a memoryless block that permutes the message vectors on the fly. Since the message vectors are permuted differently, the permutor should be programmable. If M, the number of window processors, is large, the permutor can become the bottleneck of the hardware design.

Fig. 73 Irregular ARA code family for rate > 1/2

Fig. 74 Parallel decoder hardware architecture

The architecture of one window processor is depicted in Figure 75.
AM and BM denote the registers that contain the border messages. The observation memory is loaded at the beginning of the decoding and remains intact until the end. This memory is not necessary for all window processors.

Fig. 75 Window processor hardware architecture
5.7 Conclusion
In this chapter, an architecture for high-speed decoding of ARA codes was presented. Two major issues in high-speed decoding were addressed: parallel processing and the memory access problem. This led to the introduction of a new class of turbo-like codes that can be decoded very fast: the codes with a projected graph. This classification provides an alternative method for designing turbo-like codes for high-speed decoding. It was shown that the proposed high-speed turbo and ARA decoders are among the codes in this class. The general architecture for decoding this class of codes was also presented.

The generalized coding structure that was developed during this research is a powerful approach toward designing turbo-like codes that are suitable for high-speed decoding. However, some areas are not yet covered or could complement this research; they are described as follows.
First, in designing ARA codes the focus was on improving the threshold. However, another important aspect of the performance is usually ignored: the error floor. Two important factors affect the error floor of a code: the code structure and the interleaver design. The code structure is selected to obtain a certain threshold; therefore the interleaver design is used to improve the error floor. ARA codes with pseudo-random interleavers usually have high error floors. We have been able to improve the error floor by orders of magnitude through manual changes in the interleavers. It is very important to find a systematic way to design or modify interleavers to obtain a low error floor. The design of algorithmic interleavers is a more challenging topic, which is of more practical interest.
Second, the search for good codes based on their projected graphs is very rewarding. Since the structure of such a code guarantees the high-speed decoding capability, the only concern is the performance of the code. On the other hand, the projected graph is desired to be very simple, which makes the search easier. One simple way of approaching this problem is to start with known projected graphs and make some changes; analysis of the resulting code determines whether a change is good or not. We have pursued this approach and some preliminary results show its effectiveness.
References
1. A. Abbasfar and K. Yao, "An efficient and practical architecture for high speed turbo decoders," Proceedings of VTC, Vol. 1, October 2003, pp. 337–341.
2. A. Abbasfar and K. Yao, "Interleaver design for high speed turbo decoders," Proceedings of SPIE, Vol. 5205, August 2003, pp. 282–290.
3. A. Abbasfar and K. Yao, "An efficient architecture for high-speed turbo decoders," Proceedings of ICASSP 2003, April 2003, pp. IV-521–IV-524.
4. S. Aji and R.J. McEliece, "The generalized distributive law," IEEE Trans. Inform. Theory, March 2000, 46(2), 325–343.
5. L.R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimum symbol error rate," IEEE Trans. Inform. Theory, March 1974, 284–287.
6. S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Soft-input soft-output APP module for iterative decoding of concatenated codes," IEEE Commun. Lett., January 1997, 22–24.
7. S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Serial concatenation of interleaved codes: performance analysis, design, and iterative decoding," IEEE Trans. Inform. Theory, May 1998, 44(3), 909–926.
8. S. Benedetto and G. Montorsi, "Unveiling turbo codes: some results on parallel concatenated codes," IEEE Trans. Inform. Theory, March 1996, 42(2), 409–428.
9. C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error correcting coding and decoding: turbo codes," Proceedings of the 1993 IEEE International Conference on Communications, Geneva, Switzerland, May 1993, pp. 1064–1070.
10. D. Divsalar, "A simple tight bound on error probability of block codes with application to turbo codes," JPL TMO Progress Report 42-139, November 1999, pp. 1–35.
11. D. Divsalar, S. Dolinar, and F. Pollara, "Iterative turbo decoder analysis based on Gaussian density evolution," IEEE J. Select. Areas Commun., May 2001, 19(5), 891–907.
12. D. Divsalar, H. Jin, and R.J. McEliece, "Coding theorems for turbo-like codes," Proceedings of the 36th Allerton Conference on Communication, Control and Computing, September 1998, Allerton House, Monticello, IL, pp. 201–210.
13. B.J. Frey, F.R. Kschischang, and P.G. Gulak, "Concurrent turbo-decoding," Proceedings of the IEEE International Symposium on Information Theory, July 1997, Ulm, Germany, p. 431.
14. R. Gallager, Low-Density Parity-Check Codes, MIT Press, Cambridge, MA, 1963.
15. J. Hsu and C.H. Wang, "A parallel decoding scheme for turbo codes," IEEE Symposium on Circuits and Systems, Vol. 4, Monterey, June 1998, pp. 445–448.
16. H. Jin, Analysis and Design of Turbo-like Codes, Ph.D. thesis, California Institute of Technology, Pasadena, 2001.
17. H. Jin, A. Khandekar, and R. McEliece, "Irregular repeat-accumulate codes," Proceedings of the 2nd International Symposium on Turbo Codes, Brest, France, 2000, pp. 1–8.
18. F.R. Kschischang and B.J. Frey, "Iterative decoding of compound codes by probability propagation in graphical models," IEEE J. Select. Areas Commun., February 1998, 16(2), 219–230.
19. S.L. Lauritzen and D.J. Spiegelhalter, "Local computations with probabilities on graphical structures and their applications in expert systems," J. R. Stat. Soc. B, 1988, 50, 157–224.
20. M. Luby, M. Mitzenmacher, M.A. Shokrollahi, D.A. Spielman, and V. Stemann, "Practical loss-resilient codes," Proceedings of the 29th Symposium on Theory of Computing, 1997, pp. 150–157.
21. M. Luby, M. Mitzenmacher, M.A. Shokrollahi, and D.A. Spielman, "Improved low-density parity-check codes using irregular graphs," IEEE Trans. Inform. Theory, 2001, 47, 585–598.
22. D.J.C. MacKay and R.M. Neal, "Good codes based on very sparse matrices," in: C. Boyd (ed.), Cryptography and Coding, 5th IMA Conference, No. 1025 in Lecture Notes in Computer Science, Springer, Berlin, 1995, pp. 100–111.
23. D.J.C. MacKay, "Good error correcting codes based on very sparse matrices," IEEE Trans. Inform. Theory, 1999, 45(2), 399–431.
24. R.J. McEliece, D.J.C. MacKay, and J.F. Cheng, "Turbo decoding as an instance of Pearl's belief propagation algorithm," IEEE J. Select. Areas Commun., February 1998, 16(2), 140–152.
25. J. Pearl, "Fusion, propagation, and structuring in belief networks," Artif. Intell., 1986, 29, 241–288.
26. G. Poltyrev, "Bounds on the decoding error probability of binary linear codes via their spectra," IEEE Trans. Inform. Theory, 40(10), 1261–1271.
27. T. Richardson and R. Urbanke, "The capacity of low density parity check codes under message passing decoding," IEEE Trans. Inform. Theory, February 2001, 47(2), 599–618.
28. T. Richardson, M.A. Shokrollahi, and R. Urbanke, "Design of capacity-approaching irregular low-density parity-check codes," IEEE Trans. Inform. Theory, February 2001, 47(2), 619–637.
29. T. Richardson et al., "Methods and apparatus for decoding LDPC codes," United States Patent 6,633,856, October 14, 2003.
30. C.E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., 1948, 27, 379–423.
31. R.M. Tanner, "A recursive approach to low complexity codes," IEEE Trans. Inform. Theory, 1981, IT-27, 533–547.
32. J. Thorpe, "Low Density Parity Check (LDPC) Codes Constructed from Protographs," JPL IPN Progress Report 42-154, August 15, 2003.
33. A.J. Viterbi and A.M. Viterbi, "An improved union bound for binary input linear codes on the AWGN channel, with applications to turbo decoding," Proceedings of the IEEE Information Theory Workshop, February 1998.
34. N. Wiberg, Codes and Decoding on General Graphs, Linköping Studies in Science and Technology, Dissertation No. 440, Linköping University, Linköping, Sweden, 1996.
Index
A
Accumulate–Repeat–Accumulate, v, xiv, 4, 56, 102
ARA code, 4
ARA codes, iv, v, viii, ix, xiv, 4, 56, 80, 82,
84, 87, 88, 90, 92, 102, 105, 106, 109, 110
ARA decoder, v, xiv, 4, 92, 109
B
Backward Recursion, 25
BCJR algorithm, iii, 8, 24, 26, 27, 31, 33, 34
belief propagation algorithm, 2, 9, 112
bipartite graph, 9
block codes, iv, 6, 7, 18, 56
C
Codes on graph, iii, 18
Conflict-free interleaver, 3
conflict-free interleavers, 42, 52,
97
Constituent code, viii, 61
constituent codes, vi, vii, 2, 5, 6, 7, 8, 26, 27,
28, 29, 31, 33, 34, 35, 40, 52, 59, 60, 61,
65, 79, 96, 98, 100, 107, 108
convolutional code, vi, 5, 8, 20, 21, 23, 24,
26, 52, 64
convolutional codes, 5, 6, 20, 21, 22, 27
Convolutional codes, iii, vi, 20, 21
D
density evolution, viii, 4, 28, 59, 61, 111
Density evolution, iv, 59
E
efficiency, iv, v, xiii, 1, 3, 35, 36, 37, 38, 40, 53, 92, 94
Efficient schedule, 16
extrinsic information, 8, 25, 26, 28, 33,
44
extrinsics, 11, 26, 27, 28, 29, 30, 40, 41, 42,
44, 52, 93, 94, 95
F
Flooding schedule, 16
Forward recursion, 24
G
graph representation, 18, 20, 22, 24
graphs with cycles, 3, 11
Graphs with cycles, iii, 17
H
hardware architecture, v, ix, 107, 108, 109
Hardware complexity, iv, v, 3, 51, 90
high-speed decoding, xiii, 2, 4, 92, 109, 110
I
interleaver, iv, vii, ix, x, xiii, 3, 7, 32, 34, 41,
42, 43, 44, 45, 46, 47, 49, 50, 53, 54, 64,
65, 67, 69, 72, 73, 74, 92, 93, 95, 96, 97,
98, 101, 110
Interleaver, iv, v, 40, 45, 95, 111
IOWE, 64, 65, 67, 68, 69, 70, 72, 73, 74, 75, 81, 82
IRA codes, 55
Irregular Repeat–Accumulate, 55
iterative decoding, vii, viii, xiii, 2, 3, 4, 7, 8,
9, 24, 28, 29, 30, 55, 56, 61, 63, 66, 75,
80, 86, 87, 88, 102, 111
L
latency, iv, xiii, 3, 18, 30, 35, 42, 43, 53, 98
LDPC codes, v, 1, 9, 19, 55, 56, 59, 90, 91,
94, 97, 100, 101, 113
Low-density parity-check, 1, 19, 112
M
MAP decoding, 8
Memory access, xiii, 3
message passing, 15, 18, 20, 24, 27, 56, 80,
90, 91, 92
message-passing algorithm, xiii, 3, 17, 24,
32, 41, 95
ML decoding, iv, v, 4, 55, 56, 75, 80, 82,
86, 88
P
Parallel concatenated convolutional code, 2,
5
parallelization, xiii, 3, 4, 35, 36, 45, 53, 80,
94
Parity-check codes, iii, 18
pipelining, 43
precoder, viii, 80, 81, 82, 86, 88, 89,
105
precoding, 55
probabilistic graph, 10
probability propagation algorithm, 3, 11
processing load, 35, 36, 53
projected graph, ix, 4, 95, 96, 97, 98, 100,
101, 102, 104, 106, 109, 110
protograph, 100
puncturing, iv, v, viii, ix, 55, 67, 73, 74, 75,
76, 78, 79, 82, 84, 87, 88, 102
R
RA codes, iii, iv, viii, x, 1, 2, 6, 55, 63, 65,
66, 67, 75, 76, 78, 79, 80
Repeat–Accumulate codes, 63
repetition codes, 100
S
scheduling, 15, 16, 18, 27, 32, 45, 93, 108
Serial concatenated convolutional codes, 2, 5
serial decoder, 32, 37, 42, 49, 52, 53
Shannon limit, xiv, 4, 19, 55, 56, 78, 91, 105,
111
SISO, vi, 8, 26, 27, 28, 29, 30, 34, 35, 41,
45, 46, 47, 53
sparse parity-check matrix, 19
speed gain, vii, 3, 30, 35, 36, 38, 39, 40, 43,
51, 53
Speed gain, iv, v, 35, 36, 94
speed gain and efficiency, 3, 51
S-random interleavers, 47
state constraint, 21, 35
state variables, vi, 20, 21, 25
systematic code, 5, 6, 36, 49, 75, 79
T
Tanner graph, vi, viii, 9, 18, 19, 20, 21, 89
turbo codes, vi, xiii, 1, 2, 6, 7, 11, 17, 22, 27,
28, 29, 30, 44, 55, 80, 93, 111, 112
Turbo codes, iii, 5, 22, 28
turbo decoding, vi, 9, 11, 24, 29, 30
turbo encoder, 5
turbo-like code, 2, 4, 56
turbo-like codes, iii, iv, v, xiii, xiv, 2, 3, 4, 5,
6, 17, 55, 59, 61, 63, 80, 92, 97, 98, 107,
109
W
window processor, 31, 34, 93, 101, 107, 108