LDPC Options for Next Generation Wireless Systems


<ul><li><p>LDPC Options for Next Generation Wireless Systems </p><p>T. Lestable* and E. Zimmermann# </p><p>*Advanced Technology Group, Samsung Electronics Research Institute, UK #Technische Universität Dresden, Vodafone Chair Mobile Communications Systems </p><p>Abstract: Low-Density Parity-Check (LDPC) codes have recently drawn much attention due to their near-capacity error correction performance, and are currently the focus of many standardization activities, e.g., IEEE 802.11n, IEEE 802.16e, and ETSI DVB-S2. In this contribution, we discuss several aspects related to the practical application of such codes to wireless communications systems. We consider flexibility, memory requirements, encoding and decoding complexity, and different variants of decoding algorithms for LDPC codes that make it possible to trade error correction performance for implementation simplicity. We conclude that many of what have been considered significant disadvantages of LDPC codes (inflexibility, high encoding complexity, etc.) can be overcome by appropriate use of recently developed algorithms and strategies, making LDPC codes a highly attractive option for forward error correction in B3G/4G systems. </p><p>Index Terms: LDPC, Belief Propagation, Bit Flipping, Scheduling, Complexity, TGnSync, 4G. </p><p>Introduction </p><p>Recently, many standards proposals, namely TGnSync [15][27] and WWise [28] for IEEE 802.11n, together with IEEE 802.16e [14], have considered LDPC coding schemes as a key component of their system features. The adoption by such standards activities proves the increasing maturity of LDPC-related technology, especially the affordable joint complexity of encoder and decoder implementation. 
From sub-optimal lower-complexity decoding algorithms [16] to completely flexible architecture designs [6][7][8], pragmatic and realistic implementation solutions make LDPC codes more and more attractive as an enhancement of current (B3G) or next generation (4G) wireless systems [29]. </p><p>The aim of this paper is thus to present and evaluate a non-exhaustive set of solutions that decrease the overall complexity of encoding and decoding. The first part presents basic properties of LDPC codes, together with the message-passing principle. We then tackle the encoder complexity issue, where hardware (HW) requirements in terms of dimensioning are assessed, relying on the Block-LDPC approach. The decoder side is kept for the final part, as it is the more demanding element within the joint design. We review and evaluate the performance/complexity trade-off of sub-optimal low-complexity decoding algorithms, and highlight common trends in the architecture of such LDPC codes by evaluating their HW requirements. </p><p>LDPC Codes: Fundamentals </p><p>In this part, we briefly introduce the basic principles and notation of LDPC codes. For further reading see [1][16][17]. LDPC codes are linear block codes whose parity-check matrix H has the favourable property of being sparse, i.e., it contains only a small number of non-zero elements. Tanner graphs of such codes are bipartite graphs containing two different kinds of nodes: code (bit, or variable) nodes and check nodes. </p></li><li><p>An (n, k) LDPC code is thus represented by an m x n parity-check matrix H, where m=n-k is the number of redundancy (parity) bits of the coding scheme. We can then distinguish regular from irregular LDPC codes, depending on the degree distribution of the code nodes (column weights) and check nodes (row weights). In a regular code these weights are constant across columns and rows, and the code is usually denoted by (dv, dc). 
For such a code, the number of non-zero elements is given either by n*dv or by m*dc; equating the two gives m/n=dv/dc and hence the code rate relation Rc=1-(m/n)=1-(dv/dc). The decoding of LDPC codes relies on the Belief-Propagation Algorithm (BPA) framework extensively discussed in the literature [19]. It involves two major steps, the check node update and the bit node update (Fig. 1): intrinsic values from the channel first feed the bit nodes (parents); extrinsic information is then processed and forwarded to the check nodes (children), which in turn produce new extrinsic information based on the parity-check constraints and feed it back to their connected bit nodes. </p><p> Figure 1: Message-Passing Illustration </p><p>Check Node Update followed by Bit Node Update </p><p>The way of switching between bit and check node updates is referred to as scheduling, and will be discussed later on, as it can impact the decoder complexity. </p><p>Joint Design Methodology </p><p>With parallelization holding the promise of keeping delays low while continuously increasing data rates, the major attraction of LDPC codes from a design point of view is that they are inherently suited to fully parallel processing. </p><p>Nevertheless, a fully parallel implementation is prohibitive for large block lengths. Consequently, there is currently a strong trend towards semi-parallel architectures [6][8], with Block-LDPC being the centrepiece of this approach. Another important practical issue when dealing with coding schemes for adaptive air interfaces is flexibility in terms of block sizes and code rates. When designing LDPC codes we therefore have to keep in mind the direct and strong relation between the structure of the parity check matrix and the total encoding, decoding and implementation complexity. 
Indeed, a completely random LDPC code might achieve better performance, but at the expense of a very complex interconnection (shuffle) network that might be prohibitive for large block lengths in terms of HW wiring. It also potentially leads to high-complexity encoding, a low achievable parallelization level and, most importantly, low flexibility in terms of block sizes and code rates. The sequel therefore intends to highlight and assess the most relevant trade-offs between performance and HW requirements. </p><p>Random-Like LDPC </p><p>One typical way of constructing good LDPC codes is to take a degree distribution that promises good error correction performance [17] (e.g. obtained by EXIT chart curve matching of the variable and check node decoder transfer curves) as a starting point, and then to use e.g. progressive edge growth (PEG) [18] algorithms to ensure a good distance spectrum, i.e., low error floors. Codes that are constructed following this framework usually come very close to the bounds of what is achievable in terms of error correction performance [17]. They are hence considered as the baseline comparison case for performance assessment. The disadvantage of this approach for practical implementation is that a new code needs to be designed for each block length and code rate, leading to the above-mentioned low flexibility. The obtained codes are often non-systematic, thus requiring appropriate preprocessing to enable near linear-time encoding [2]. </p></li><li><p>Structured LDPC (Block-LDPC) </p><p>Structured (Block-)LDPC codes, on the other hand, such as Pi-Construction Codes [15] or LDPC Array Codes [14] proposed in the framework of IEEE 802, have been shown to combine good performance with high flexibility in terms of code rates and block sizes. The parity-check matrix H of such codes can be seen as an array of square sub-block matrices. These sub-block matrices are obtained by circular shift and/or rotation of the identity matrix, or are all-zero matrices. 
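</p><p>As a minimal sketch of the block structure just described, the following Python snippet expands a small base matrix of shift values into a binary parity-check matrix built from p x p sub-blocks. The base matrix BASE is a hypothetical toy example (it is not the TGnSync matrix), and both the use of -1 to mark an all-zero block and the direction of the cyclic shift are assumptions made for illustration only. The helper names circulant and expand are likewise not taken from any standard. </p>

```python
# Hypothetical toy base matrix (Mb = 2, Nb = 4), NOT the TGnSync matrix.
# Assumed convention: -1 marks an all-zero block, s >= 0 a cyclic shift by s.
BASE = [
    [0, 1, -1, 2],
    [2, -1, 0, 1],
]


def circulant(p, shift):
    """Return a p x p block: all-zero for shift == -1, otherwise the
    identity matrix with the 1 of row r moved to column (r + shift) mod p."""
    if shift == -1:
        return [[0] * p for _ in range(p)]
    return [[1 if c == (r + shift) % p else 0 for c in range(p)]
            for r in range(p)]


def expand(base, p):
    """Expand an Mb x Nb base matrix into an (Mb*p) x (Nb*p) binary matrix."""
    h = []
    for base_row in base:
        blocks = [circulant(p, s) for s in base_row]
        for r in range(p):
            # Concatenate row r of every block in this block-row.
            h.append([bit for block in blocks for bit in block[r]])
    return h


H = expand(BASE, p=3)
print(len(H), "x", len(H[0]))  # dimensions are Mb*p by Nb*p
```

<p>Because every non-zero block is a permutation of the identity, each row of the expanded matrix has as many ones as its base row has non-zero blocks, whatever the value of p. </p><p>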
The parity check matrix is hence fully determined by these circular shifts and by the dimension p of the square sub-block matrices. LDPC codes defined by such standards rely on the concept of a base model matrix introduced by Zhong and Zhang [7], which contains the circular shift values. Figure 2 shows the (Mb,Nb)=(12, 24) base model matrix Hb for the Rc=1/2 LDPC code defined in TGnSync [27]: </p><p> Figure 2: Base model matrix for TGnSync Rc=1/2 [27] </p><p>Adaptation to different block lengths can e.g. be done by expanding the elements of the base matrix (e.g. by replacing each 1 in the parity check matrix by an identity matrix and each 0 by an all-zeros matrix). Different code rates are obtained by appending more elements to the matrix in only one dimension (i.e., adding more variable nodes, but no check nodes). Note that the decoder must be flexible enough to support such changes of the code structure. Using such base matrices hence adds flexibility in terms of packet length while maintaining the degree distributions of H. Indeed, for a given block length N, the expansion factor p (denoted Zf in standards) is obtained through p=N/Nb. As N must therefore be a multiple of Nb, the achievable granularity is bounded by the size of the base model matrix. In the case of the TGnSync LDPC code, the block length is hence scalable in steps of 24 bits; it is unlikely that a finer granularity would be required. Another interesting aspect is that the whole expansion process is independent of the circular shift values, leading to many different possible LDPC code designs [10][11][12] with different performance, but capable of being mapped onto the same semi-parallel decoding architecture. </p><p>Encoding Complexity </p><p>One considerable challenge for the application of LDPC codes in practical wireless systems has long been encoding complexity (it is in fact still a quite widespread misconception that this remains an open issue). 
If the parity check matrix is in non-systematic form, straightforward encoding methods destroy (or do not exploit) the sparse nature of the matrix, leading to an encoding complexity that is quadratic in the block length. However, in his famous paper [2], Richardson presented several pre-processing techniques to transform H into an approximate lower triangular (and thus approximately systematic) form, so that the encoding complexity is quadratic in only a small fraction of the block length n. The resulting H format is given below (Fig. 3) for the Block-LDPC after expansion by the factor p (=Zf): </p><p>\[ H = \begin{pmatrix} A & B & T \\ C & D & E \end{pmatrix}, \qquad N = p \cdot n, \quad M = p \cdot m, \quad g = p \] </p><p>The column blocks have widths N-M, g and M-g, and the row blocks have heights M-g and g. </p><p> Figure 3: Approximated Lower Triangular H to facilitate near linear-time encoding </p></li><li><p>The Block-LDPC codes considered in this paper [27] have exactly the required format, which makes it possible first to estimate the encoding complexity accurately, and then to take advantage of the pipelined processing described hereafter (Fig. 4): </p><p>The total number of logical (NAND) gates is given in Fig. 5 as a function of both the block length (indexed by its expansion factor Zf) and the code rate Rc. Dimensioning therefore requires around 11K gates for the worst case, here a rate Rc=1/2 code of codeword length 2304 bits (Zf=96). </p><p>Figure 4: Pipelined encoder structure, with processing stages (1), (2) and (3) </p><p>The remaining complexity comes from the inversion of two matrices. Fortunately, due to the triangular nature of blocks (1) and (3), this can be solved by back-substitution, which considerably decreases the number of operations. Block (2) then involves only matrix-vector multiplications, enabling the use of dedicated techniques (cf. 
the colouring problem described in [7]) for proposing efficient architectures. Nevertheless, the complexity of (2) is still O(g^2), where g is proportional to the expansion factor. It is thus recommended to fix g=p=Zf. Alternatively, one may directly construct the parity check matrix in systematic form [15] and achieve fully linear-time encoding. </p><p>Encoder HW requirements </p><p>Taking into account all these HW requirements, we follow the estimations given in [7] and apply them to the TGnSync LDPC proposal. </p><p>Figure 5: Number of Logical Gates for TGnSync Encoder w.r.t. Block length (Zf) and Code Rate (Rc) </p><p>Figure 6: ROM width (bits) for TGnSync Encoder w.r.t. Block length (Zf) and Code Rate (Rc) </p><p>Figure 6 depicts the memory requirements (ROM, in bits) for storing the counter initialization values. This amount is directly related to the number of non-zero blocks in H, underlining the sparsity differences among the available coding schemes. Here the case Rc=3/4 is the worst case. </p><p>Figure 7: Register width (bits) for TGnSync Encoder w.r.t. Block length (Zf) and Code Rate (Rc) </p><p>The register storage is related to the pipelined structure (Fig. 4) used for encoding. </p></li><li><p>The code rate ranking is respected here (Fig. 7), and Rc=1/2 is once again the worst case, requiring around 15.6Kbits of memory (1Kbits=1024 bits). As a result, implementing a fully pipelined encoder that works with the whole range of coding schemes available in TGnSync requires 11K gates, together with around 16.2Kbits of memory. Note that for a random-like LDPC code the required amount of memory will be significantly higher, even if the sparsity of the check matrix is retained for encoding, as will be highlighted later when discussing interleaver complexity. 
</p><p>Decoding Complexity </p><p>On the receiver side, there are mainly three topics to investigate: how to decrease the decoder complexity, how to increase efficiency by means of generic architectures, and/or how to achieve high throughput. We start by considering decoding complexity. As is obvious from the structure of the message passing process, the average complexity of the LDPC decoding process is the product of three factors: </p><ul><li>the node complexity,</li><li>the average number of iterations, and</li><li>the number of nodes in each iteration.</li></ul><p>In the following, we discuss how these three factors can be minimized using state-of-the-art algorithms. </p><p>Node Complexity: Sub-Optimal Decoding </p><p>The standard algorithm for decoding LDPC codes is the so-called belief propagation algorithm (BPA), also known as the sum-product algorithm [1][19]. For implementation simplicity, it is convenient to execute the algorithm in the log-domain, which turns the typically required multiplications into simple additions (e.g. at the bit nodes). Following this path, however, has a significant disadvantage: calculating the check node messages then requires the non-linear box-plus operation. This drawback can be overcome by applying the maxLog approximation known from Turbo decoding, which results in the well-known MinSum algorithm [16]. Unfortunately, introducing this approximation results in a typical performance loss of 0.5-1dB. Several proposals aim at reducing this offset by introducing correction terms in the calculation of the check node messages [20][21]. However, the most efficient method by far is a simple scaling of the check node messages [16], which recovers close-to-optimal error correction performance. We will refer to this algorithm as the corrected MinSum algorithm. Calculating variable and check node messages then only involves simple sum and minimum operations, respectively, plus a scaling of the check node messages. 
It is conjectured that no further substantial reduction in the node complexity is possible without accepting quite significant performance losses. In this context, Bit-Flipping Algorithms can be considered as a viable soluti...</p></li></ul>
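<p>The corrected MinSum node updates described above can be sketched as follows. This is a minimal Python illustration for a single node, not the authors' implementation: messages are assumed to be log-likelihood ratios held in plain lists, the function names are hypothetical, and the default scaling factor of 0.8 is an arbitrary placeholder, since the text does not state a value. </p>

```python
import math


def check_node_update(incoming, scale=0.8):
    """Scaled MinSum check node update.

    For each edge, the outgoing message is the product of the signs of all
    OTHER incoming messages times their minimum magnitude, multiplied by
    `scale` (assumed value). Requires at least two incoming messages.
    """
    out = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = math.prod(math.copysign(1.0, v) for v in others)
        out.append(scale * sign * min(abs(v) for v in others))
    return out


def bit_node_update(channel_llr, incoming):
    """Bit node update: channel value plus all OTHER incoming check messages."""
    total = channel_llr + sum(incoming)
    return [total - v for v in incoming]


# Example: one check node of degree 3, followed by one bit node of degree 2.
print(check_node_update([2.0, -0.5, 1.5]))
print(bit_node_update(1.0, [0.5, -0.25]))
```

<p>Setting scale=1.0 gives the plain MinSum update, so the only extra cost of the correction in this sketch is one multiplication per outgoing check node message. </p>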

