Tinoosh Mohsenin, Dean Truong and Bevan M. Baas
VLSI Computation Lab, ECE Department, University of California, Davis
Multi-Split-Row Threshold Decoding Implementations for
LDPC Codes
Outline
- Introduction to LDPC Decoding
- Goals and Key Features
- Split-Row Threshold Decoding Method
- Multi-Split-Row Threshold Decoder Implementations and Results
- Conclusion
LDPC Decoding
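[Figure: decoding loop. Channel information λ enters variable processing (error correction); variable processing sends β messages to check processing (parity check), and check processing returns α messages.]

Example parity-check matrix H (rows: check nodes C1 to C6; columns: variable nodes V1 to V12):

H =
1 0 0 0 0 1 0 1 0 0 0 1
0 1 0 1 0 0 0 0 1 0 1 0
0 0 1 0 0 1 1 0 0 0 1 0
0 0 1 1 0 0 0 1 0 1 0 0
1 0 0 0 1 0 0 0 1 0 0 1
0 1 0 0 1 0 1 0 0 1 0 0

[Figure: Tanner graph connecting check nodes C1 to C6 with variable nodes V1 to V12.]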
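The parity check that drives decoding can be sketched in a few lines of Python. This is a minimal illustration rather than the decoder's implementation: the 6 x 12 matrix is an assumed example with row weight 4 and column weight 2, and the function name `syndrome` is hypothetical. A word is a valid codeword when every entry of H times the word (mod 2) is zero.

```python
# Assumed 6 x 12 example parity-check matrix (illustrative entries).
# Rows are check nodes C1..C6, columns are variable nodes V1..V12.
H = [
    [1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
]

def syndrome(H, word):
    """Return H * word (mod 2): one bit per check node, 0 means satisfied."""
    return [sum(h * c for h, c in zip(row, word)) % 2 for row in H]

all_zero = [0] * 12
print(syndrome(H, all_zero))    # [0, 0, 0, 0, 0, 0]

one_error = all_zero.copy()
one_error[0] = 1                # flip the bit at V1
print(syndrome(H, one_error))   # [1, 0, 0, 0, 1, 0]
```

Flipping V1 makes exactly the two checks connected to V1 (C1 and C5 in this assumed matrix) fail, which is the information the message-passing iterations use to locate errors.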
LDPC decoding challenges
- High interconnect complexity for a large number of processing nodes
- Large delay, area, and power dissipation caused by long, global wires
Message passing decoding
LDPC Decoder Design Goals and Features
Key goals
- Very high throughput and high energy efficiency
- Area efficient (small circuit area)
- Well suited for long-length and large row weight LDPC codes
- Easy implementation with automatic CAD tools
- Good error performance

Split-Row decoding key features
- Reduced interconnect complexity
- Reduced processor complexity

T. Mohsenin and B. Baas, “Split-row: A reduced complexity, high throughput LDPC decoder architecture,” in ICCD, 2006.
T. Mohsenin and B. Baas, “High-throughput LDPC decoders using a multiple Split-Row method,” in ICASSP, 2007.
Standard MinSum vs. Split-Row Decoding
[Figure: Standard MinSum decoding. Check node C1 takes inputs from V3, V5, V8, and V10 over the full matrix H. Decoding flow: Initialization, Check proc, Variable proc, Syndrome check.]

[Figure: Split-Row decoding. H is split into Hsplit-sp0 and Hsplit-sp1, and C1 becomes C1sp0 and C1sp1. Each partition has its own Check proc and Variable proc (Sp0, Sp1), and the partitions exchange sign signals (Sign Sp0, Sign Sp1). Decoding flow per partition: Initialization, Check proc, Variable proc, Syndrome check.]

Result of splitting:
- Reduction of input wires to the check processor
- Reduction of check processor area
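The difference between the two check-node updates can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' hardware description: the function names are hypothetical, the partitions are assumed to be equal-width with at least two inputs each, and the normalization (scaling) factor of MinSum Normalized is omitted. In standard MinSum the sign product and the minimum magnitude cover all other inputs of the row; in Split-Row the sign still covers the whole row (via the Sign signals) but the minimum is local to the partition.

```python
def sign(x):
    """Sign of a message as +1 or -1 (zero treated as positive)."""
    return -1 if x < 0 else 1

def row_sign_excluding(betas, i):
    """Product of the signs of every input in the row except input i."""
    s = 1
    for j, b in enumerate(betas):
        if j != i:
            s *= sign(b)
    return s

def minsum_check(betas):
    """Standard MinSum: minimum magnitude over all other inputs of the row."""
    out = []
    for i in range(len(betas)):
        mags = [abs(b) for j, b in enumerate(betas) if j != i]
        out.append(row_sign_excluding(betas, i) * min(mags))
    return out

def split_row_check(betas, n_parts):
    """Split-Row sketch: row-wide sign, but minimum over the local partition only."""
    width = len(betas) // n_parts   # assumes equal partitions, >= 2 inputs each
    out = []
    for i in range(len(betas)):
        start = (i // width) * width
        local = [abs(betas[j]) for j in range(start, start + width) if j != i]
        out.append(row_sign_excluding(betas, i) * min(local))
    return out

betas = [1.5, 2.0, -0.3, 0.8]        # Sp0 holds the first two inputs, Sp1 the last two
print(minsum_check(betas))           # [-0.3, -0.3, 0.8, -0.3]
print(split_row_check(betas, 2))     # [-2.0, -1.5, 0.8, -0.3]
```

In this made-up example the Sp0 outputs keep the correct sign but overestimate the magnitude (2.0 and 1.5 instead of 0.3), because Sp0 never sees the small input that lives in Sp1.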
Problem with Original Split-Row Algorithm
- The original Split-Row algorithm loses 0.5 to 0.7 dB of error performance relative to MinSum Normalized and SPA.
- In the original MinSum Split-Row, each partition has no information about the minimum value of the other partition.

[Figure: numerical example comparing MinSum, which uses the global Min, with MinSum Split-Row, where partitions Sp0 and Sp1 use only their local minima (Min sp0, Min sp1).]
MinSum Split-Row Threshold Algorithm
- Each partition passes out a signal (Threshold_en) that indicates whether that partition has a minimum less than a given threshold (T).
- Check nodes now take as their minimum either their own local Min or T.
- The optimum threshold value (T) is obtained by empirical simulations.

[Figure: example with T = 0.5 and partitions Sp0 and Sp1, in which Threshold_en Sp0 = 0 and Threshold_en Sp1 = 1.]

Mohsenin et al., "An Improved Split-Row Thresholding Decoding Algorithm for LDPC Codes," to appear in IEEE International Conference on Communications (ICC'09).
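One possible reading of the thresholding rule is sketched below. It is an interpretation of the slide, not the published algorithm: the function name is hypothetical, partitions are assumed equal-width with at least two inputs each, and the sketch assumes that a Threshold_en raised by any other partition reaches the current one. A partition whose local minimum is below T raises Threshold_en; a check node that receives Threshold_en uses the smaller of its local minimum and T.

```python
def sign(x):
    """Sign of a message as +1 or -1 (zero treated as positive)."""
    return -1 if x < 0 else 1

def split_row_threshold_check(betas, n_parts, T):
    """Split-Row Threshold sketch: local minimum, capped at T when another
    partition signals that it holds a magnitude below T."""
    width = len(betas) // n_parts   # assumes equal partitions, >= 2 inputs each
    parts = [betas[p * width:(p + 1) * width] for p in range(n_parts)]
    # Threshold_en per partition: True if its local minimum magnitude is below T.
    threshold_en = [min(abs(b) for b in part) < T for part in parts]
    out = []
    for i in range(len(betas)):
        p = i // width
        s = 1
        for j, b in enumerate(betas):
            if j != i:
                s *= sign(b)        # sign product over the whole row
        local = min(abs(betas[j]) for j in range(p * width, (p + 1) * width) if j != i)
        # Assumed: Threshold_en from any other partition reaches this one.
        en_in = any(en for q, en in enumerate(threshold_en) if q != p)
        mag = min(local, T) if en_in else local
        out.append(s * mag)
    return out

betas = [1.5, 2.0, -0.3, 0.8]       # Sp0 holds the first two inputs, Sp1 the last two
print(split_row_threshold_check(betas, 2, T=0.5))   # [-0.5, -0.5, 0.8, -0.3]
```

With these made-up inputs, Sp1 raises Threshold_en (its minimum 0.3 is below 0.5) and Sp0 does not. The Sp0 outputs are therefore capped at 0.5, much closer to the 0.3 that a full-row MinSum update would produce than the uncapped local values 2.0 and 1.5.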
(2048,1723) (6,32) 10GBASE-T code
- Code length = 2048
- Information length = 1723
- Number of rows (parity checks) = 384
- Row weight (Wr) = 32
- Column weight (Wc) = 6

[Figure: sparse parity-check matrix H split into Spn submatrices HSplit-Sp0, HSplit-Sp1, ..., HSplit-Spn-1, each with row weight 32/Spn. Check node C1 is divided into one check node per partition (C1sp0, C1sp1, ...), each connected to 32/Spn variable nodes.]
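The effect of the split on each check processor follows directly from the 32/Spn relation. The short sketch below tabulates it for the split levels used later in the deck; the function name and the assumption that the 2048 columns are divided into equal-width partitions are illustrative, not taken from the slides.

```python
CODE_LENGTH = 2048   # columns of H for the (2048,1723) code
ROW_WEIGHT = 32      # Wr

def per_partition(spn):
    """Check-processor inputs and matrix columns seen by one partition.
    Assumes the columns are divided evenly among the Spn partitions."""
    return {"check_inputs": ROW_WEIGHT // spn, "columns": CODE_LENGTH // spn}

for spn in (2, 4, 8, 16):
    print(f"Split-{spn}: {per_partition(spn)}")
```

At Split-16 each check processor partition handles only 2 inputs instead of 32, which is the source of the reduced interconnect and processor complexity.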
Error Performance for (2048,1723) 10GBASE-T Code
- MS Split-Row-16 Threshold is 0.22 dB away from MS and 0.12 dB better than Split-Row-2 Original.
- Threshold (T) = 0.2
- In the plot: BPSK modulation, AWGN channel, maximum 15 iterations, based on 80 error blocks

[Figure: bit error probability (10^-10 to 10^-2) versus SNR (3 to 5.5 dB) for SPA, MS Normalized, MS Split-Row-2 Threshold, MS Split-Row-4 Threshold, MS Split-Row-8 Threshold, MS Split-Row-16 Threshold, and MS Split-Row-2 Original, with the 0.22 dB and 0.12 dB gaps marked.]
Delay Analysis for Decoders
- Path 1: propagation of Threshold_en through Spn-2 partitions
- Path 2: delay path through the check and variable processors
- For small Spn, the interconnect delay is dominant because of wire interconnect complexity.
- As the number of partitions increases, the Path 1 delay increases.

[Figure: block diagram of partitions Sp0 through Spn-1, each with a Check proc and a Variable proc, chained through Threshold_en_in and Threshold_en_out ports, with Path 1 and Path 2 marked.]

[Figure: critical path delay (ns, 0 to 70), divided into interconnect delay and gate delay, for MinSum, Split-2, Split-4, Split-8, and Split-16.]
Area Analysis for Decoders
- In MinSum, the synthesis area deviates significantly from the layout area due to low utilization.
- 75% of the MinSum decoder is empty space for wiring.

[Figure: decoder area (mm2, 0 to 20), layout versus synthesis, for MinSum, Split-2, Split-4, Split-8, and Split-16.]

[Figure: area breakdown per sub-block for MinSum and Split-16 Threshold, with legend Check Proc, Var Proc, Clk tree + Regs, Wire (empty space). MinSum shares: 75%, 11%, 10%, 4%. Split-16 Threshold shares: 17%, 38%, 43%, 2%.]
Comparison of Decoders
(6,32) (2048,1723) 10GBASE-T code with 15 decoding iterations; 65 nm, 7 M, 1.3 V.

                              MinSum    Split-2    Split-4    Split-8    Split-16   Split-16
                              standard  Threshold  Threshold  Threshold  Threshold  vs. MinSum
Area Utilization              25%       40%        85%        95%        98%        3.9x
Area (mm2)                    18.2      8.9        5.0        4.5        3.8        4.8x
Speed (MHz)                   17        40         53         112        101        5.9x
Throughput @ 15 iter (Gbps)   2.3       5.5        7.2        15.2       13.8       6x
CAD Tool CPU Time (hour)      >78       36         18         10         5          >15.6x

[Figure: panels labeled MinSum standard, Split-2 Threshold, Split-4 Threshold, Split-8 Threshold, and Split-16 Threshold.]
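The Speed and Throughput rows of the comparison can be related with a short calculation. The sketch below assumes that one decoding iteration completes per clock cycle, so throughput is code length times clock frequency divided by the iteration count; that relation is an assumption inferred from the reported numbers, not something the slides state, and the function name is hypothetical.

```python
CODE_LENGTH = 2048   # bits per block
ITERATIONS = 15      # decoding iterations per block

# Clock speed (MHz) and reported throughput (Gbps), taken from the comparison.
decoders = {
    "MinSum standard":    (17, 2.3),
    "Split-2 Threshold":  (40, 5.5),
    "Split-4 Threshold":  (53, 7.2),
    "Split-8 Threshold":  (112, 15.2),
    "Split-16 Threshold": (101, 13.8),
}

def throughput_gbps(clock_mhz, code_length=CODE_LENGTH, iterations=ITERATIONS):
    """Throughput in Gbps, assuming one iteration per clock cycle."""
    return code_length * clock_mhz * 1e6 / iterations / 1e9

for name, (mhz, reported) in decoders.items():
    print(f"{name}: computed {throughput_gbps(mhz):.2f} Gbps, reported {reported} Gbps")
```

Under that assumption the computed values agree with the reported throughputs to within about 0.1 Gbps, which suggests the reported figures follow directly from the clock rates.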
Conclusion
- The Split-Row Threshold algorithm improves error performance compared with the original Split-Row.
- Split-Row Threshold allows a high level of partitioning without losing significant error performance.
- A higher level of partitioning reduces the number of connections between check and variable processors, which results in higher logic utilization and a smaller circuit.
- We can meet the demands of high-speed applications while using much less area than standard decoding.
Acknowledgements
Support
- ST Microelectronics
- NSF Grant 430090 and CAREER award 546907
- Intel
- SRC GRC Grant 1598 and CSR Grant 1659
- Intellasys
- UC Micro
- SEM

Special thanks
- Professor Shu Lin