31
GPU A l tdD di fHi h GPU Accelerated Decoding of High Performance Error Correcting Codes Andrew D. Copeland, Nicholas B. Chang, d St h L and Stephen Leung This work is sponsored by the Department of the Air Force under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government. HPEC_GPU_DECODE-1 ADC 10/1/2009 MIT Lincoln Laboratory conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

GPU A l t d D di f Hi hGPU Accelerated Decoding of High Performance Error Correcting Codes

Andrew D. Copeland, Nicholas B. Chang,d St h Land Stephen Leung

This work is sponsored by the Department of the Air Force under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

HPEC_GPU_DECODE-1ADC 10/1/2009

MIT Lincoln Laboratory

conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Page 2: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Outline

• IntroductionWi l C i ti– Wireless Communications

– Performance vs. Complexity– Candidate Implementationsp

• Low Density Parity Check (LDPC) Codes• NVIDIA GPU Architecture• GPU Implementation• Results

S• Summary

MIT Lincoln LaboratoryHPEC_GPU_DECODE-2

ADC 10/1/2009

Page 3: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Wireless Communications

Coding Modulation Demodulation Decoding

Generic Communication System

Coding Modulation Demodulation Decoding

... ...

Channel•MultipathNoiseg g

Data Datag g•Noise

•Interference

MIT Lincoln LaboratoryHPEC_GPU_DECODE-3

ADC 10/1/2009

Page 4: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Wireless Communications

O FOur Focus

Coding Modulation Demodulation Decoding

Generic Communication System

Coding Modulation Demodulation Decoding

... ...

Channel•MultipathNoiseg g

Data Datag g•Noise

•Interference

MIT Lincoln LaboratoryHPEC_GPU_DECODE-4

ADC 10/1/2009

Page 5: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Wireless Communications

O FOur Focus

Coding Modulation Demodulation Decoding

Generic Communication System

Coding Modulation Demodulation Decoding

... ...

Channel•MultipathNoiseg g

Data Datag g

Simple Error Correcting Code

•Noise•Interference

01011 Coding

010110101101011

MIT Lincoln LaboratoryHPEC_GPU_DECODE-5

ADC 10/1/2009

Page 6: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Wireless Communications

O FOur Focus

Coding Modulation Demodulation Decoding

Generic Communication System

Coding Modulation Demodulation Decoding

... ...

Channel•MultipathNoiseg g

Data Datag g

Simple Error Correcting Code

•Noise•Interference

01011 Coding

010110101101011

• Allow for larger link distance d/ hi h d t t

High Performance Codes

and/or higher data rate• Superior performance comes

with significant increase in complexity

MIT Lincoln LaboratoryHPEC_GPU_DECODE-6

ADC 10/1/2009

complexity

Page 7: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Performance vs. Complexity

• Increased performance req ires largerrequires larger computational cost

• Off the shelf systems require 5dB more power to

4-State STTC

8-State STTC16-State STTC

32 St t STTC

Commercial Interests

Custom Designs

require 5dB more power to close link than custom system

• Best custom design only

32-State STTC

GF(2) LDPC BICM-ID

64-State STTC

Best custom design only require 0.5dB more power than theoretical minimum

GF(2) LDPC BICM ID

Direct GF(256) LDPC

MIT Lincoln LaboratoryHPEC_GPU_DECODE-7

ADC 10/1/2009

Page 8: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Performance vs. Complexity

• Increased performance req ires larger 4-State STTC

8-State STTC16-State STTC

32 St t STTC

Commercial Interests

Custom Designs

requires larger computational cost

• Off the shelf systems require 5dB more power to 32-State STTC

GF(2) LDPC BICM-ID

64-State STTCrequire 5dB more power to close link than custom system

• Best custom design only GF(2) LDPC BICM ID

Direct GF(256) LDPC

Best custom design only require 0.5dB more power than theoretical minimum�

Our Choice

MIT Lincoln LaboratoryHPEC_GPU_DECODE-8

ADC 10/1/2009

Page 9: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Candidate Implementations

• Half second burst of data– Decode complexity 37 TFLOPS– Memory transferred during decode 44 TB

PC 9x FPGA ASIC 4x GPU

ImplementationImplementation Time (beyond development)Performance

(decode time)(decode time)

Hardware Cost

MIT Lincoln LaboratoryHPEC_GPU_DECODE-9

ADC 10/1/2009

Page 10: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Candidate Implementations

• Half second burst of data– Decode complexity 37 TFLOPS– Memory transferred during decode 44 TB

PC 9x FPGA ASIC 4x GPU

Implementation -Implementation Time (beyond development)Performance

(decode time)9 days

(decode time)

Hardware Cost ≈ $2,000

MIT Lincoln LaboratoryHPEC_GPU_DECODE-10

ADC 10/1/2009

Page 11: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Candidate Implementations

• Half second burst of data– Decode complexity 37 TFLOPS– Memory transferred during decode 44 TB

PC 9x FPGA ASIC 4x GPU

Implementation - 1 personImplementation Time (beyond development)

1 person year (est.)

Performance(decode time)

9 days 5.1 minutes(decode time)

(est.)

Hardware Cost ≈ $2,000 ≈ $90,000

MIT Lincoln LaboratoryHPEC_GPU_DECODE-11

ADC 10/1/2009

Page 12: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Candidate Implementations

• Half second burst of data– Decode complexity 37 TFLOPS– Memory transferred during decode 44 TB

PC 9x FPGA ASIC 4x GPU

Implementation - 1 person 1 5 personImplementation Time (beyond development)

1 person year (est.)

1.5 person years (est.)

+1 year to Fab.Performance

(decode time)9 days 5.1 minutes 0.5s

(if ibl )(decode time)(est.) (if possible)

Hardware Cost ≈ $2,000 ≈ $90,000 ≈ $1 million

MIT Lincoln LaboratoryHPEC_GPU_DECODE-12

ADC 10/1/2009

Page 13: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Candidate Implementations

• Half second burst of data– Decode complexity 37 TFLOPS– Memory transferred during decode 44 TB

PC 9x FPGA ASIC 4x GPU

Implementation - 1 person 1 5 person ?Implementation Time (beyond development)

1 person year (est.)

1.5 person years (est.)

+1 year to Fab.

?

Performance(decode time)

9 days 5.1 minutes 0.5s(if ibl )

?(decode time)

(est.) (if possible)

Hardware Cost ≈ $2,000 ≈ $90,000 ≈ $1 million ≈ $5,000 (includes PC

cost)

MIT Lincoln LaboratoryHPEC_GPU_DECODE-13

ADC 10/1/2009

cost)

Page 14: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Outline

• Introduction• Low Density Parity Check (LDPC) Codes

– Decoding LDPC GF(256) CodesAl ith i D d f LDPC D d– Algorithmic Demands of LDPC Decoder

• NVIDIA GPU Architecture• GPU Implementation• GPU Implementation• Results• Summary• Summary

MIT Lincoln LaboratoryHPEC_GPU_DECODE-14

ADC 10/1/2009

Page 15: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Low Density Parity Check (LDPC) Codes

LDPC GF(256) Parity-Check Matrix

⎤⎡ 000000000000 5017452 ⎤⎡c

NChecks×NSymbols

NChecks

NSymbols• Proposed by Gallager (1962) codes were binary

LDPC Codes

⎥⎥⎥⎥⎥⎥⎥⎥⎤

⎢⎢⎢⎢⎢⎢⎢⎢⎡

000000000000000000000000000000000000

000000000000000000000000000000000000

39621678214172

17412910721878119

9721345017452

⎥⎥⎥

⎢⎢⎢

⎡=

⎥⎥⎥⎥⎥⎤

⎢⎢⎢⎢⎢⎡

0

01

c

c

n

NChecks– codes were binary• Not used until mid 1990s• GF(256) Davey and McKay

(1998)

⎥⎥⎥⎥⎥⎥⎥

⎦⎢⎢⎢⎢⎢⎢⎢

⎣ 000000000000000000000000

000000000000000000000000000000000000

17811023822097134

15218113949114

396216

191

⎥⎦⎢⎣⎥⎥⎦⎢

⎢⎣ 15c

C d dSyndrome

(1998)– symbols in [0, 1, 2, …, 255]

• Parity check matrix defines graph Codewordgraph

– Symbols connected to checks nodes

• Has error state feedback SY C

H

Tanner Graph

– Syndrome = 0 (no errors)– Potential to re-transmit

YMBOL

HECKS

MIT Lincoln LaboratoryHPEC_GPU_DECODE-15

ADC 10/1/2009

LS S

Page 16: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Decoding LDPC GF(256) Codes

• Solved using LDPC Sum-Product Decoding Algorithm– A belief propagation algorithm

Tanner Graph

p p g g• Iterative Algorithm

– Number of iterations depends on SNR

SYM

CH

Tanner Graph

SymbolUpdate

Check Update

SymbolUpdate

Check Update

HardDecision

HardDecision M

BOLS

HECKS

Update UpdateUpdate Update DecisionDecision

S S

ML Probabilities

CodewordSyndrome

decoded symbols

MIT Lincoln LaboratoryHPEC_GPU_DECODE-16

ADC 10/1/2009

y

Page 17: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Algorithmic Demands of LDPC Decoder

• Decoding of a entire sequence 37 TFLOPSComputational Complexity

– Single frame 922 MFLOPS– Check update 95% of total

• Data moved between device memory and multiprocessors 44.2 TB

Memory Requirements

p– Single codeword 1.1 GB

• Data moved within Walsh Transform and permutations (part of check update) 177 TB

– Single codeword 4.4 GB

Decoder requires architectures with high

MIT Lincoln LaboratoryHPEC_GPU_DECODE-17

ADC 10/1/2009

computational and memory bandwidths

Page 18: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Outline

• Introduction• Low Density Parity Check (LDPC) Codes• NVIDIA GPU Architecture

– Dispatching Parallel Jobs • GPU Implementation

R l• Results• Summary

MIT Lincoln LaboratoryHPEC_GPU_DECODE-18

ADC 10/1/2009

Page 19: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

NVIDIA GPU Architecture

GPU • GPU contains N multiprocessors– Each Performs Single

Instruction Multiple Thread (SIMT) operations

GPU

(SIMT) operations• Constant and Texture caches

speed up certain memory access patterns

• NVIDIA GTX 295 (2 GPUs)– N = 30 multiprocessors– M = 8 scalar processors

Shared memory 16KB– Shared memory 16KB– Device memory 896MB– Up to 512 threads on each

multiprocessor– Capable of 930 GFLOPs – Capable of 112 GB/sec

MIT Lincoln LaboratoryHPEC_GPU_DECODE-19

ADC 10/1/2009

NVIDIA CUDA Programming Guide Version 2.1

Page 20: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Dispatching Parallel Jobs

• Program decomposed into blocks of threads– Thread blocks are executed on individual multiprocessors– Threads within blocks coordinate through shared memory– Threads optimized for coalesced memory access

GridBlock 1 Block 2 Block 3 Block M

…• Calling parallel jobs (kernels) within C function is simple

grid.x = M; threads.x = N;k l ll id h d ( i bl )

Calling parallel jobs (kernels) within C function is simple• Scheduling is transparent and managed by GPU and API

kernel_call<<<grid, threads>>>(variables);

Thread blocks execute independently of one anotherThreads allow faster computation and better use of memory bandwidth

MIT Lincoln LaboratoryHPEC_GPU_DECODE-20

ADC 10/1/2009

Threads allow faster computation and better use of memory bandwidthThreads work together using shared memory

Page 21: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Outline

• Introduction• Low Density Parity Check (LDPC) Codes• NVIDIA GPU Architecture• GPU Implementation

– Check Update Implementation R l• Results

• Summary

MIT Lincoln LaboratoryHPEC_GPU_DECODE-21

ADC 10/1/2009

Page 22: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

GPU Implementation of LDPC Decoder

• Quad GPU Workstation– 2 NVIDIA GTX 295s

3 20GHz Intel Core i7 965– 3.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory– Costs $5000

• Software written using NVIDIA’s Compute Unified Device Architecture (CUDA)

• Replaced Matlab functions with CUDA MEX functions• Replaced Matlab functions with CUDA MEX functions• Utilized static variables

– GPU Memory allocated first time function is calledConstants used to decode many codewords only loaded once– Constants used to decode many codewords only loaded once

• Codewords distributed across the 4 GPUs

MIT Lincoln LaboratoryHPEC_GPU_DECODE-22

ADC 10/1/2009

Page 23: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Check Update Implementation

Permutation WalshTransform

Inverse Walsh Trans. Permutation

X

.

.

.

X

Permutation Walsh Transform

Inverse Walsh Trans. PermutationX

.

.

.

• Th d bl k i d t h h kSYM

CHE

• Thread blocks assigned to each check independently of all other checks

• and are 256 element vectorsEach thread is assigned to element of GF(256) M

BOLS

ECKS

– Each thread is assigned to element of GF(256)– Coalesced memory reads– Threads coordinate using shared memory

grid.x = num_checks;threads.x = 256;check_update<<<grid, threads>>>(Q,R);

MIT Lincoln LaboratoryHPEC_GPU_DECODE-23

ADC 10/1/2009

Structure of algorithm exploitable on GPU

Page 24: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Outline

• Introduction• Low Density Parity Check (LDPC) Codes• NVIDIA GPU Architecture• GPU Implementation• Results• Summary

MIT Lincoln LaboratoryHPEC_GPU_DECODE-24

ADC 10/1/2009

Page 25: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Candidate Implementations

PC 9x FPGA ASIC 4x GPU

Implementation Time (beyond development)

- 1 person year (est.)

1.5 person years (est.)

+1 year to Fab.

?

Performance(time to decode half

second of data)

9 days 5.1 minutes 0.5s(if possible)

?

H d C t $2 000 $90 000 $1 illi $5 000Hardware Cost ≈ $2,000 ≈ $90,000 ≈ $1 million ≈ $5,000 (includes PC

cost)

MIT Lincoln LaboratoryHPEC_GPU_DECODE-25

ADC 10/1/2009

Page 26: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Candidate Implementations

PC 9x FPGA ASIC 4x GPU

Implementation Time (beyond development)

- 1 person year (est.)

1.5 person years (est.)

+1 year to Fab.

6 person weeks

Performance(time to decode half

second of data)

9 days 5.1 minutes 0.5s(if possible)

5.6 minutes

H d C t $2 000 $90 000 $1 illi $5 000Hardware Cost ≈ $2,000 ≈ $90,000 ≈ $1 million ≈ $5,000 (includes PC

cost)

MIT Lincoln LaboratoryHPEC_GPU_DECODE-26

ADC 10/1/2009

Page 27: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Implementation Results

• Decoded test data in 5.6 minutes instead of 9 days 104

Decode Time

y– A 2336x speedup

15 iterations103

10

(Min

utes

)

15 IterationsGPUMatlab15 iterations

102

code

Tim

e 15 Iterations Matlab

0 5 0 0 5 1 1 5 2 2 5 3 3 5 4100

101D

ec

-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4SNR (dB)

GPU implementation supports analysis,

MIT Lincoln LaboratoryHPEC_GPU_DECODE-27

ADC 10/1/2009

p pp y ,testing, and demonstration activities

Page 28: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Implementation Results

• Decoded test data in 5.6 minutes instead of 9 days 104

Decode Time

y– A 2336x speedup

• Hard coded version runs fixed number of 15 iterations

103

10

(Min

utes

)

15 IterationsGPUMatlabiterations

– Likely used on FPGA or ASIC versions

15 iterations102

code

Tim

e 15 Iterations MatlabHard Coded

0 5 0 0 5 1 1 5 2 2 5 3 3 5 4100

101D

ec

-0.5 0 0.5 1 1.5 2 2.5 3 3.5 4SNR (dB)

GPU implementation supports analysis,

MIT Lincoln LaboratoryHPEC_GPU_DECODE-28

ADC 10/1/2009

p pp y ,testing, and demonstration activities

Page 29: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Summary

• GPU architecture can provide significant performance improvement with limited hardware cost andimprovement with limited hardware cost and implementation effort

• GPU implementation is a good fit for algorithms withGPU implementation is a good fit for algorithms with parallel steps that require systems with large computational and memory bandwidths

– LDPC GF(256) decoding is a well-suited algorithm

• Implementation allows reasonable algorithm development cycle

– New algorithms can be tested and demonstrated in minutes instead of days

MIT Lincoln LaboratoryHPEC_GPU_DECODE-29

ADC 10/1/2009

Page 30: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Backup

MIT Lincoln LaboratoryHPEC_GPU_DECODE-30

ADC 10/1/2009

Page 31: GPU A l t d D di f Hi hGPU Accelerated Decoding of High ... · – 2 NVIDIA GTX 295s – 3 20GHz Intel Core i7 9653.20GHz Intel Core i7 965 – 12GB of 1330MHz DDR3 memory – Costs

Experimental Setup

• Data rate 1.3 Gb/s • Half second burst

Test System Parameters

• Half second burst – 40,000 codewords (6000 symbols)

• Parity check matrix 4000 checks × 6000 symbolssymbols

– Corresponds to rate 1/3 code• Performance numbers based on 15 iterations

of decoding algorithmg g

MIT Lincoln LaboratoryHPEC_GPU_DECODE-31

ADC 10/1/2009