16
Fully LTE-A compliant Turbo Decoder for Base Station Applications Dipl. Ing. Stefan Weithoffer [email protected] 4.11.14, Brest

Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

Fully LTE-A compliant Turbo Decoder for Base Station Applications

Dipl Ing Stefan Weithoffer weithoffereituni-klde

41114 Brest

Evolution

2

hellip

Demand for high throughput

How to Increase Throughput

Use multiple slow decoders

Use monolithic high speed decoder

Dec

Dec

Dec

Dec

Dec

PRO

Easy to implement

CON

Low efficiency large memory

Large latency

PRO

Higher efficiency

Lower latency

CON

Challenging architecture due to iterative decoding

State-of-the-Art Turbo-Code Decoders

P MAP decoder in parallel

LTE conflict-free interleaver up to parallelism of 64

Subblock size BP

Windowing inside MAP to reduce memory (sliding window of size WL)

MAP1

Subblock 1

Interleaver Deinterleaver

Network

MAP2

Subblock 2

MAPP

Subblock P

write

read

LLLL

fn)LPB(

BTP

ACQWLpipelineMAP

iter_halfMAP

MAP

B

BP

WL

Acquisition necessary

Softdecoder inherent serial serial fwdbwd recursion on complete block

challenge parallelize MAP decoding

Turbo-Code Decoder

High Throughput (high code rates) large P

Communications performance decreases for high code rates with small LACQ

Increase LACQ to counterbalance communications performance decrease

LMAP dominates saturation in throughput

Smaller P Radix 2 Radix 4 only P2 for same throughput

Smaller LACQ Next iteration initialization (NII) LACQ =0

Smaller LWL no windowing inside MAP LWL=0

Improves communications performance

But second LLR unit mandatory and increase in memory

Re-computation only every nth metric is stored Additional state metric unit re-

calculates the other n-1 metrics Optimum n= 1198612119875

Eg LTE reduces memory storage from B=6144 to 768 state metrics

___ Simplified Compression of Redundancy Free Trellis Sections in Turbo Decoder ndash E Boutillon J-L Sanchez-Rojas C Marchand Communications Letters IEEE vol18 no6 pp941944 June 2014

ACQWLpipelineMAP

iter_halfMAP

MAP LLLL fn)LPB(

BTP

Turbo-Code Decoder

High Throughput (high code rates) large P

Communications performance decreases for high code rates with small LACQ

Increase LACQ to counterbalance communications performance decrease

LMAP dominates saturation in throughput

Smaller P Radix 2 Radix 4 only P2 for same throughput

Smaller LACQ Next iteration initialization (NII) LACQ =0

Smaller LWL no windowing inside MAP LWL=0

Improves communications performance

But second LLR unit mandatory and increase in memory

Re-computation only every nth metric is stored Additional state metric unit re-

calculates the other n-1 metrics Optimum n= 1198612119875

Eg LTE reduces memory storage from B=6144 to 768 state metrics

___ Simplified Compression of Redundancy Free Trellis Sections in Turbo Decoder ndash E Boutillon J-L Sanchez-Rojas C Marchand Communications Letters IEEE vol18 no6 pp941944 June 2014

ACQWLpipelineMAP

iter_halfMAP

MAP LLLL fn)LPB(

BTP

Throughput dependent on Parallelism

LTE Turbo-Code Decoder

High Throughput iteration control

Various iteration control schemes known

LTE comes with CRC attached to code blocks

If CRC on the decoded code block succeeds stop decoding

For high code rates decoding process can oscillate

CRC after every half iteration mandatory

___ A 150Mbits 3GPP LTE Turbo Code Decoder - M May T Ilnseher N Wehn W Raab In Proc IEEE Conference Design Automation and Test in Europe (DATE) pages 1420-1425 March 2010 Dresden Germany

ACQWLpipelineMAP

iterhalfMAP

MAP LLLLfnLPB

BTP

)( _

Cyclic Redundancy Check

Example 8 bit CRC (Polynomial x8+x2+x+1)

The CRC of a code block can be very efficiently evaluated if the data bits are in linear order

Ok for non-interleaved half iterations

BUT

What about interleaved half iterations

9

CRC OK

Iteration Control CRC timing

10

Additional latency

Additional memory

Two CRC units necessary

LTE Turbo-Code Decoder

Iteration control based on code block CRC

Significant additional latency for interleaved half iterations

Additional memory (B-bit)

Second CRC unit mandatory

On-the-fly CRC calculation

Same (small) LCRC for non-interleaved and interleaved half iterations

No additional memory needed

Easier IO control

No second CRC unit needed

ACQWLpipelineMAP

CRCiterhalfMAP

MAP LLLLfLnLPB

BTP

))(( _

1 Gbits LTE-A TC Decoder28nm

bull Support for all LTE code block sizes

bull On-the-fly CRC calculation

bull Transport Block (TB) CRC calculation for

up to 4 TBs

bull Softout calculation

___

wo memories

12

Architecture Parameters

Radix Parallelism 4 16

Max Window Length 192

Max Iter 7

Acquisition NII

Input quantization 8

Technology 28 nm

Area (Post Synthesis) ~065 mm2

Throughput (Mbps) 1000

Frequency (MHz) 700

Thank you for attention

For more information please visit

httpemseituni-klde

215 Gbits LTE TC Decoder65nm

Generic Decoder Structure

Parallelize block processing

Turbo-Code Decoder softdecoder inherent serial

LDPC Decoder inherent parallel since node processing independent

Network (interleaver tanner graph) no locality

Routing congestion access conflicts

Impact on communication standards (UMTSHSPA versus LTE)

Softdecoder 1 (MAP)

Check_n1

Check_nN

Softdecoder 2 (MAP)

Variable_n1

Variable_nN

Most advanced channel codes Turbo-Codes (HSPA LTE) LDPC (DVB-S2)

Iterative decoding algorithm performed on complete block with interleaved data exchange between processing blocks

Processing Block 1 Processing Block 2 Network

Turbo Decoder Design Space

Page 2: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

Evolution

2

hellip

Demand for high throughput

How to Increase Throughput

Use multiple slow decoders

Use monolithic high speed decoder

Dec

Dec

Dec

Dec

Dec

PRO

Easy to implement

CON

Low efficiency large memory

Large latency

PRO

Higher efficiency

Lower latency

CON

Challenging architecture due to iterative decoding

State-of-the-Art Turbo-Code Decoders

P MAP decoder in parallel

LTE conflict-free interleaver up to parallelism of 64

Subblock size BP

Windowing inside MAP to reduce memory (sliding window of size WL)

MAP1

Subblock 1

Interleaver Deinterleaver

Network

MAP2

Subblock 2

MAPP

Subblock P

write

read

LLLL

fn)LPB(

BTP

ACQWLpipelineMAP

iter_halfMAP

MAP

B

BP

WL

Acquisition necessary

Softdecoder inherent serial serial fwdbwd recursion on complete block

challenge parallelize MAP decoding

Turbo-Code Decoder

High Throughput (high code rates) large P

Communications performance decreases for high code rates with small LACQ

Increase LACQ to counterbalance communications performance decrease

LMAP dominates saturation in throughput

Smaller P Radix 2 Radix 4 only P2 for same throughput

Smaller LACQ Next iteration initialization (NII) LACQ =0

Smaller LWL no windowing inside MAP LWL=0

Improves communications performance

But second LLR unit mandatory and increase in memory

Re-computation only every nth metric is stored Additional state metric unit re-

calculates the other n-1 metrics Optimum n= 1198612119875

Eg LTE reduces memory storage from B=6144 to 768 state metrics

___ Simplified Compression of Redundancy Free Trellis Sections in Turbo Decoder ndash E Boutillon J-L Sanchez-Rojas C Marchand Communications Letters IEEE vol18 no6 pp941944 June 2014

ACQWLpipelineMAP

iter_halfMAP

MAP LLLL fn)LPB(

BTP

Turbo-Code Decoder

High Throughput (high code rates) large P

Communications performance decreases for high code rates with small LACQ

Increase LACQ to counterbalance communications performance decrease

LMAP dominates saturation in throughput

Smaller P Radix 2 Radix 4 only P2 for same throughput

Smaller LACQ Next iteration initialization (NII) LACQ =0

Smaller LWL no windowing inside MAP LWL=0

Improves communications performance

But second LLR unit mandatory and increase in memory

Re-computation only every nth metric is stored Additional state metric unit re-

calculates the other n-1 metrics Optimum n= 1198612119875

Eg LTE reduces memory storage from B=6144 to 768 state metrics

___ Simplified Compression of Redundancy Free Trellis Sections in Turbo Decoder ndash E Boutillon J-L Sanchez-Rojas C Marchand Communications Letters IEEE vol18 no6 pp941944 June 2014

ACQWLpipelineMAP

iter_halfMAP

MAP LLLL fn)LPB(

BTP

Throughput dependent on Parallelism

LTE Turbo-Code Decoder

High Throughput iteration control

Various iteration control schemes known

LTE comes with CRC attached to code blocks

If CRC on the decoded code block succeeds stop decoding

For high code rates decoding process can oscillate

CRC after every half iteration mandatory

___ A 150Mbits 3GPP LTE Turbo Code Decoder - M May T Ilnseher N Wehn W Raab In Proc IEEE Conference Design Automation and Test in Europe (DATE) pages 1420-1425 March 2010 Dresden Germany

ACQWLpipelineMAP

iterhalfMAP

MAP LLLLfnLPB

BTP

)( _

Cyclic Redundancy Check

Example 8 bit CRC (Polynomial x8+x2+x+1)

The CRC of a code block can be very efficiently evaluated if the data bits are in linear order

Ok for non-interleaved half iterations

BUT

What about interleaved half iterations

9

CRC OK

Iteration Control CRC timing

10

Additional latency

Additional memory

Two CRC units necessary

LTE Turbo-Code Decoder

Iteration control based on code block CRC

Significant additional latency for interleaved half iterations

Additional memory (B-bit)

Second CRC unit mandatory

On-the-fly CRC calculation

Same (small) LCRC for non-interleaved and interleaved half iterations

No additional memory needed

Easier IO control

No second CRC unit needed

ACQWLpipelineMAP

CRCiterhalfMAP

MAP LLLLfLnLPB

BTP

))(( _

1 Gbits LTE-A TC Decoder28nm

bull Support for all LTE code block sizes

bull On-the-fly CRC calculation

bull Transport Block (TB) CRC calculation for

up to 4 TBs

bull Softout calculation

___

wo memories

12

Architecture Parameters

Radix Parallelism 4 16

Max Window Length 192

Max Iter 7

Acquisition NII

Input quantization 8

Technology 28 nm

Area (Post Synthesis) ~065 mm2

Throughput (Mbps) 1000

Frequency (MHz) 700

Thank you for attention

For more information please visit

httpemseituni-klde

215 Gbits LTE TC Decoder65nm

Generic Decoder Structure

Parallelize block processing

Turbo-Code Decoder softdecoder inherent serial

LDPC Decoder inherent parallel since node processing independent

Network (interleaver tanner graph) no locality

Routing congestion access conflicts

Impact on communication standards (UMTSHSPA versus LTE)

Softdecoder 1 (MAP)

Check_n1

Check_nN

Softdecoder 2 (MAP)

Variable_n1

Variable_nN

Most advanced channel codes Turbo-Codes (HSPA LTE) LDPC (DVB-S2)

Iterative decoding algorithm performed on complete block with interleaved data exchange between processing blocks

Processing Block 1 Processing Block 2 Network

Turbo Decoder Design Space

Page 3: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

How to Increase Throughput

Use multiple slow decoders

Use monolithic high speed decoder

Dec

Dec

Dec

Dec

Dec

PRO

Easy to implement

CON

Low efficiency large memory

Large latency

PRO

Higher efficiency

Lower latency

CON

Challenging architecture due to iterative decoding

State-of-the-Art Turbo-Code Decoders

P MAP decoder in parallel

LTE conflict-free interleaver up to parallelism of 64

Subblock size BP

Windowing inside MAP to reduce memory (sliding window of size WL)

MAP1

Subblock 1

Interleaver Deinterleaver

Network

MAP2

Subblock 2

MAPP

Subblock P

write

read

LLLL

fn)LPB(

BTP

ACQWLpipelineMAP

iter_halfMAP

MAP

B

BP

WL

Acquisition necessary

Softdecoder inherent serial serial fwdbwd recursion on complete block

challenge parallelize MAP decoding

Turbo-Code Decoder

High Throughput (high code rates) large P

Communications performance decreases for high code rates with small LACQ

Increase LACQ to counterbalance communications performance decrease

LMAP dominates saturation in throughput

Smaller P Radix 2 Radix 4 only P2 for same throughput

Smaller LACQ Next iteration initialization (NII) LACQ =0

Smaller LWL no windowing inside MAP LWL=0

Improves communications performance

But second LLR unit mandatory and increase in memory

Re-computation only every nth metric is stored Additional state metric unit re-

calculates the other n-1 metrics Optimum n= 1198612119875

Eg LTE reduces memory storage from B=6144 to 768 state metrics

___ Simplified Compression of Redundancy Free Trellis Sections in Turbo Decoder ndash E Boutillon J-L Sanchez-Rojas C Marchand Communications Letters IEEE vol18 no6 pp941944 June 2014

ACQWLpipelineMAP

iter_halfMAP

MAP LLLL fn)LPB(

BTP

Turbo-Code Decoder

High Throughput (high code rates) large P

Communications performance decreases for high code rates with small LACQ

Increase LACQ to counterbalance communications performance decrease

LMAP dominates saturation in throughput

Smaller P Radix 2 Radix 4 only P2 for same throughput

Smaller LACQ Next iteration initialization (NII) LACQ =0

Smaller LWL no windowing inside MAP LWL=0

Improves communications performance

But second LLR unit mandatory and increase in memory

Re-computation only every nth metric is stored Additional state metric unit re-

calculates the other n-1 metrics Optimum n= 1198612119875

Eg LTE reduces memory storage from B=6144 to 768 state metrics

___ Simplified Compression of Redundancy Free Trellis Sections in Turbo Decoder ndash E Boutillon J-L Sanchez-Rojas C Marchand Communications Letters IEEE vol18 no6 pp941944 June 2014

ACQWLpipelineMAP

iter_halfMAP

MAP LLLL fn)LPB(

BTP

Throughput dependent on Parallelism

LTE Turbo-Code Decoder

High Throughput iteration control

Various iteration control schemes known

LTE comes with CRC attached to code blocks

If CRC on the decoded code block succeeds stop decoding

For high code rates decoding process can oscillate

CRC after every half iteration mandatory

___ A 150Mbits 3GPP LTE Turbo Code Decoder - M May T Ilnseher N Wehn W Raab In Proc IEEE Conference Design Automation and Test in Europe (DATE) pages 1420-1425 March 2010 Dresden Germany

ACQWLpipelineMAP

iterhalfMAP

MAP LLLLfnLPB

BTP

)( _

Cyclic Redundancy Check

Example 8 bit CRC (Polynomial x8+x2+x+1)

The CRC of a code block can be very efficiently evaluated if the data bits are in linear order

Ok for non-interleaved half iterations

BUT

What about interleaved half iterations

9

CRC OK

Iteration Control CRC timing

10

Additional latency

Additional memory

Two CRC units necessary

LTE Turbo-Code Decoder

Iteration control based on code block CRC

Significant additional latency for interleaved half iterations

Additional memory (B-bit)

Second CRC unit mandatory

On-the-fly CRC calculation

Same (small) LCRC for non-interleaved and interleaved half iterations

No additional memory needed

Easier IO control

No second CRC unit needed

ACQWLpipelineMAP

CRCiterhalfMAP

MAP LLLLfLnLPB

BTP

))(( _

1 Gbits LTE-A TC Decoder28nm

bull Support for all LTE code block sizes

bull On-the-fly CRC calculation

bull Transport Block (TB) CRC calculation for

up to 4 TBs

bull Softout calculation

___

wo memories

12

Architecture Parameters

Radix Parallelism 4 16

Max Window Length 192

Max Iter 7

Acquisition NII

Input quantization 8

Technology 28 nm

Area (Post Synthesis) ~065 mm2

Throughput (Mbps) 1000

Frequency (MHz) 700

Thank you for attention

For more information please visit

httpemseituni-klde

215 Gbits LTE TC Decoder65nm

Generic Decoder Structure

Parallelize block processing

Turbo-Code Decoder softdecoder inherent serial

LDPC Decoder inherent parallel since node processing independent

Network (interleaver tanner graph) no locality

Routing congestion access conflicts

Impact on communication standards (UMTSHSPA versus LTE)

Softdecoder 1 (MAP)

Check_n1

Check_nN

Softdecoder 2 (MAP)

Variable_n1

Variable_nN

Most advanced channel codes Turbo-Codes (HSPA LTE) LDPC (DVB-S2)

Iterative decoding algorithm performed on complete block with interleaved data exchange between processing blocks

Processing Block 1 Processing Block 2 Network

Turbo Decoder Design Space

Page 4: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

State-of-the-Art Turbo-Code Decoders

P MAP decoder in parallel

LTE conflict-free interleaver up to parallelism of 64

Subblock size BP

Windowing inside MAP to reduce memory (sliding window of size WL)

MAP1

Subblock 1

Interleaver Deinterleaver

Network

MAP2

Subblock 2

MAPP

Subblock P

write

read

LLLL

fn)LPB(

BTP

ACQWLpipelineMAP

iter_halfMAP

MAP

B

BP

WL

Acquisition necessary

Softdecoder inherent serial serial fwdbwd recursion on complete block

challenge parallelize MAP decoding

Turbo-Code Decoder

High Throughput (high code rates) large P

Communications performance decreases for high code rates with small LACQ

Increase LACQ to counterbalance communications performance decrease

LMAP dominates saturation in throughput

Smaller P Radix 2 Radix 4 only P2 for same throughput

Smaller LACQ Next iteration initialization (NII) LACQ =0

Smaller LWL no windowing inside MAP LWL=0

Improves communications performance

But second LLR unit mandatory and increase in memory

Re-computation only every nth metric is stored Additional state metric unit re-

calculates the other n-1 metrics Optimum n= 1198612119875

Eg LTE reduces memory storage from B=6144 to 768 state metrics

___ Simplified Compression of Redundancy Free Trellis Sections in Turbo Decoder ndash E Boutillon J-L Sanchez-Rojas C Marchand Communications Letters IEEE vol18 no6 pp941944 June 2014

ACQWLpipelineMAP

iter_halfMAP

MAP LLLL fn)LPB(

BTP

Turbo-Code Decoder

High Throughput (high code rates) large P

Communications performance decreases for high code rates with small LACQ

Increase LACQ to counterbalance communications performance decrease

LMAP dominates saturation in throughput

Smaller P Radix 2 Radix 4 only P2 for same throughput

Smaller LACQ Next iteration initialization (NII) LACQ =0

Smaller LWL no windowing inside MAP LWL=0

Improves communications performance

But second LLR unit mandatory and increase in memory

Re-computation only every nth metric is stored Additional state metric unit re-

calculates the other n-1 metrics Optimum n= 1198612119875

Eg LTE reduces memory storage from B=6144 to 768 state metrics

___ Simplified Compression of Redundancy Free Trellis Sections in Turbo Decoder ndash E Boutillon J-L Sanchez-Rojas C Marchand Communications Letters IEEE vol18 no6 pp941944 June 2014

ACQWLpipelineMAP

iter_halfMAP

MAP LLLL fn)LPB(

BTP

Throughput dependent on Parallelism

LTE Turbo-Code Decoder

High Throughput iteration control

Various iteration control schemes known

LTE comes with CRC attached to code blocks

If CRC on the decoded code block succeeds stop decoding

For high code rates decoding process can oscillate

CRC after every half iteration mandatory

___ A 150Mbits 3GPP LTE Turbo Code Decoder - M May T Ilnseher N Wehn W Raab In Proc IEEE Conference Design Automation and Test in Europe (DATE) pages 1420-1425 March 2010 Dresden Germany

ACQWLpipelineMAP

iterhalfMAP

MAP LLLLfnLPB

BTP

)( _

Cyclic Redundancy Check

Example 8 bit CRC (Polynomial x8+x2+x+1)

The CRC of a code block can be very efficiently evaluated if the data bits are in linear order

Ok for non-interleaved half iterations

BUT

What about interleaved half iterations

9

CRC OK

Iteration Control CRC timing

10

Additional latency

Additional memory

Two CRC units necessary

LTE Turbo-Code Decoder

Iteration control based on code block CRC

Significant additional latency for interleaved half iterations

Additional memory (B-bit)

Second CRC unit mandatory

On-the-fly CRC calculation

Same (small) LCRC for non-interleaved and interleaved half iterations

No additional memory needed

Easier IO control

No second CRC unit needed

ACQWLpipelineMAP

CRCiterhalfMAP

MAP LLLLfLnLPB

BTP

))(( _

1 Gbits LTE-A TC Decoder28nm

bull Support for all LTE code block sizes

bull On-the-fly CRC calculation

bull Transport Block (TB) CRC calculation for

up to 4 TBs

bull Softout calculation

___

wo memories

12

Architecture Parameters

Radix Parallelism 4 16

Max Window Length 192

Max Iter 7

Acquisition NII

Input quantization 8

Technology 28 nm

Area (Post Synthesis) ~065 mm2

Throughput (Mbps) 1000

Frequency (MHz) 700

Thank you for attention

For more information please visit

httpemseituni-klde

215 Gbits LTE TC Decoder65nm

Generic Decoder Structure

Parallelize block processing

Turbo-Code Decoder softdecoder inherent serial

LDPC Decoder inherent parallel since node processing independent

Network (interleaver tanner graph) no locality

Routing congestion access conflicts

Impact on communication standards (UMTSHSPA versus LTE)

Softdecoder 1 (MAP)

Check_n1

Check_nN

Softdecoder 2 (MAP)

Variable_n1

Variable_nN

Most advanced channel codes Turbo-Codes (HSPA LTE) LDPC (DVB-S2)

Iterative decoding algorithm performed on complete block with interleaved data exchange between processing blocks

Processing Block 1 Processing Block 2 Network

Turbo Decoder Design Space

Page 5: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

Turbo-Code Decoder

High Throughput (high code rates) large P

Communications performance decreases for high code rates with small LACQ

Increase LACQ to counterbalance communications performance decrease

LMAP dominates saturation in throughput

Smaller P Radix 2 Radix 4 only P2 for same throughput

Smaller LACQ Next iteration initialization (NII) LACQ =0

Smaller LWL no windowing inside MAP LWL=0

Improves communications performance

But second LLR unit mandatory and increase in memory

Re-computation only every nth metric is stored Additional state metric unit re-

calculates the other n-1 metrics Optimum n= 1198612119875

Eg LTE reduces memory storage from B=6144 to 768 state metrics

___ Simplified Compression of Redundancy Free Trellis Sections in Turbo Decoder ndash E Boutillon J-L Sanchez-Rojas C Marchand Communications Letters IEEE vol18 no6 pp941944 June 2014

ACQWLpipelineMAP

iter_halfMAP

MAP LLLL fn)LPB(

BTP

Turbo-Code Decoder

High Throughput (high code rates) large P

Communications performance decreases for high code rates with small LACQ

Increase LACQ to counterbalance communications performance decrease

LMAP dominates saturation in throughput

Smaller P Radix 2 Radix 4 only P2 for same throughput

Smaller LACQ Next iteration initialization (NII) LACQ =0

Smaller LWL no windowing inside MAP LWL=0

Improves communications performance

But second LLR unit mandatory and increase in memory

Re-computation only every nth metric is stored Additional state metric unit re-

calculates the other n-1 metrics Optimum n= 1198612119875

Eg LTE reduces memory storage from B=6144 to 768 state metrics

___ Simplified Compression of Redundancy Free Trellis Sections in Turbo Decoder ndash E Boutillon J-L Sanchez-Rojas C Marchand Communications Letters IEEE vol18 no6 pp941944 June 2014

ACQWLpipelineMAP

iter_halfMAP

MAP LLLL fn)LPB(

BTP

Throughput dependent on Parallelism

LTE Turbo-Code Decoder

High Throughput iteration control

Various iteration control schemes known

LTE comes with CRC attached to code blocks

If CRC on the decoded code block succeeds stop decoding

For high code rates decoding process can oscillate

CRC after every half iteration mandatory

___ A 150Mbits 3GPP LTE Turbo Code Decoder - M May T Ilnseher N Wehn W Raab In Proc IEEE Conference Design Automation and Test in Europe (DATE) pages 1420-1425 March 2010 Dresden Germany

ACQWLpipelineMAP

iterhalfMAP

MAP LLLLfnLPB

BTP

)( _

Cyclic Redundancy Check

Example 8 bit CRC (Polynomial x8+x2+x+1)

The CRC of a code block can be very efficiently evaluated if the data bits are in linear order

Ok for non-interleaved half iterations

BUT

What about interleaved half iterations

9

CRC OK

Iteration Control CRC timing

10

Additional latency

Additional memory

Two CRC units necessary

LTE Turbo-Code Decoder

Iteration control based on code block CRC

Significant additional latency for interleaved half iterations

Additional memory (B-bit)

Second CRC unit mandatory

On-the-fly CRC calculation

Same (small) LCRC for non-interleaved and interleaved half iterations

No additional memory needed

Easier IO control

No second CRC unit needed

ACQWLpipelineMAP

CRCiterhalfMAP

MAP LLLLfLnLPB

BTP

))(( _

1 Gbits LTE-A TC Decoder28nm

bull Support for all LTE code block sizes

bull On-the-fly CRC calculation

bull Transport Block (TB) CRC calculation for

up to 4 TBs

bull Softout calculation

___

wo memories

12

Architecture Parameters

Radix Parallelism 4 16

Max Window Length 192

Max Iter 7

Acquisition NII

Input quantization 8

Technology 28 nm

Area (Post Synthesis) ~065 mm2

Throughput (Mbps) 1000

Frequency (MHz) 700

Thank you for attention

For more information please visit

httpemseituni-klde

215 Gbits LTE TC Decoder65nm

Generic Decoder Structure

Parallelize block processing

Turbo-Code Decoder softdecoder inherent serial

LDPC Decoder inherent parallel since node processing independent

Network (interleaver tanner graph) no locality

Routing congestion access conflicts

Impact on communication standards (UMTSHSPA versus LTE)

Softdecoder 1 (MAP)

Check_n1

Check_nN

Softdecoder 2 (MAP)

Variable_n1

Variable_nN

Most advanced channel codes Turbo-Codes (HSPA LTE) LDPC (DVB-S2)

Iterative decoding algorithm performed on complete block with interleaved data exchange between processing blocks

Processing Block 1 Processing Block 2 Network

Turbo Decoder Design Space

Page 6: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

Turbo-Code Decoder

High Throughput (high code rates) large P

Communications performance decreases for high code rates with small LACQ

Increase LACQ to counterbalance communications performance decrease

LMAP dominates saturation in throughput

Smaller P Radix 2 Radix 4 only P2 for same throughput

Smaller LACQ Next iteration initialization (NII) LACQ =0

Smaller LWL no windowing inside MAP LWL=0

Improves communications performance

But second LLR unit mandatory and increase in memory

Re-computation only every nth metric is stored Additional state metric unit re-

calculates the other n-1 metrics Optimum n= 1198612119875

Eg LTE reduces memory storage from B=6144 to 768 state metrics

___ Simplified Compression of Redundancy Free Trellis Sections in Turbo Decoder ndash E Boutillon J-L Sanchez-Rojas C Marchand Communications Letters IEEE vol18 no6 pp941944 June 2014

ACQWLpipelineMAP

iter_halfMAP

MAP LLLL fn)LPB(

BTP

Throughput dependent on Parallelism

LTE Turbo-Code Decoder

High Throughput iteration control

Various iteration control schemes known

LTE comes with CRC attached to code blocks

If CRC on the decoded code block succeeds stop decoding

For high code rates decoding process can oscillate

CRC after every half iteration mandatory

___ A 150Mbits 3GPP LTE Turbo Code Decoder - M May T Ilnseher N Wehn W Raab In Proc IEEE Conference Design Automation and Test in Europe (DATE) pages 1420-1425 March 2010 Dresden Germany

ACQWLpipelineMAP

iterhalfMAP

MAP LLLLfnLPB

BTP

)( _

Cyclic Redundancy Check

Example 8 bit CRC (Polynomial x8+x2+x+1)

The CRC of a code block can be very efficiently evaluated if the data bits are in linear order

Ok for non-interleaved half iterations

BUT

What about interleaved half iterations

9

CRC OK

Iteration Control CRC timing

10

Additional latency

Additional memory

Two CRC units necessary

LTE Turbo-Code Decoder

Iteration control based on code block CRC

Significant additional latency for interleaved half iterations

Additional memory (B-bit)

Second CRC unit mandatory

On-the-fly CRC calculation

Same (small) LCRC for non-interleaved and interleaved half iterations

No additional memory needed

Easier IO control

No second CRC unit needed

ACQWLpipelineMAP

CRCiterhalfMAP

MAP LLLLfLnLPB

BTP

))(( _

1 Gbits LTE-A TC Decoder28nm

bull Support for all LTE code block sizes

bull On-the-fly CRC calculation

bull Transport Block (TB) CRC calculation for

up to 4 TBs

bull Softout calculation

___

wo memories

12

Architecture Parameters

Radix Parallelism 4 16

Max Window Length 192

Max Iter 7

Acquisition NII

Input quantization 8

Technology 28 nm

Area (Post Synthesis) ~065 mm2

Throughput (Mbps) 1000

Frequency (MHz) 700

Thank you for attention

For more information please visit

httpemseituni-klde

215 Gbits LTE TC Decoder65nm

Generic Decoder Structure

Parallelize block processing

Turbo-Code Decoder softdecoder inherent serial

LDPC Decoder inherent parallel since node processing independent

Network (interleaver tanner graph) no locality

Routing congestion access conflicts

Impact on communication standards (UMTSHSPA versus LTE)

Softdecoder 1 (MAP)

Check_n1

Check_nN

Softdecoder 2 (MAP)

Variable_n1

Variable_nN

Most advanced channel codes Turbo-Codes (HSPA LTE) LDPC (DVB-S2)

Iterative decoding algorithm performed on complete block with interleaved data exchange between processing blocks

Processing Block 1 Processing Block 2 Network

Turbo Decoder Design Space

Page 7: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

Throughput dependent on Parallelism

LTE Turbo-Code Decoder

High Throughput iteration control

Various iteration control schemes known

LTE comes with CRC attached to code blocks

If CRC on the decoded code block succeeds stop decoding

For high code rates decoding process can oscillate

CRC after every half iteration mandatory

___ A 150Mbits 3GPP LTE Turbo Code Decoder - M May T Ilnseher N Wehn W Raab In Proc IEEE Conference Design Automation and Test in Europe (DATE) pages 1420-1425 March 2010 Dresden Germany

ACQWLpipelineMAP

iterhalfMAP

MAP LLLLfnLPB

BTP

)( _

Cyclic Redundancy Check

Example 8 bit CRC (Polynomial x8+x2+x+1)

The CRC of a code block can be very efficiently evaluated if the data bits are in linear order

Ok for non-interleaved half iterations

BUT

What about interleaved half iterations

9

CRC OK

Iteration Control CRC timing

10

Additional latency

Additional memory

Two CRC units necessary

LTE Turbo-Code Decoder

Iteration control based on code block CRC

Significant additional latency for interleaved half iterations

Additional memory (B-bit)

Second CRC unit mandatory

On-the-fly CRC calculation

Same (small) LCRC for non-interleaved and interleaved half iterations

No additional memory needed

Easier IO control

No second CRC unit needed

ACQWLpipelineMAP

CRCiterhalfMAP

MAP LLLLfLnLPB

BTP

))(( _

1 Gbits LTE-A TC Decoder28nm

bull Support for all LTE code block sizes

bull On-the-fly CRC calculation

bull Transport Block (TB) CRC calculation for

up to 4 TBs

bull Softout calculation

___

wo memories

12

Architecture Parameters

Radix Parallelism 4 16

Max Window Length 192

Max Iter 7

Acquisition NII

Input quantization 8

Technology 28 nm

Area (Post Synthesis) ~065 mm2

Throughput (Mbps) 1000

Frequency (MHz) 700

Thank you for attention

For more information please visit

httpemseituni-klde

215 Gbits LTE TC Decoder65nm

Generic Decoder Structure

Parallelize block processing

Turbo-Code Decoder softdecoder inherent serial

LDPC Decoder inherent parallel since node processing independent

Network (interleaver tanner graph) no locality

Routing congestion access conflicts

Impact on communication standards (UMTSHSPA versus LTE)

Softdecoder 1 (MAP)

Check_n1

Check_nN

Softdecoder 2 (MAP)

Variable_n1

Variable_nN

Most advanced channel codes Turbo-Codes (HSPA LTE) LDPC (DVB-S2)

Iterative decoding algorithm performed on complete block with interleaved data exchange between processing blocks

Processing Block 1 Processing Block 2 Network

Turbo Decoder Design Space

Page 8: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

LTE Turbo-Code Decoder

High Throughput iteration control

Various iteration control schemes known

LTE comes with CRC attached to code blocks

If CRC on the decoded code block succeeds stop decoding

For high code rates decoding process can oscillate

CRC after every half iteration mandatory

___ A 150Mbits 3GPP LTE Turbo Code Decoder - M May T Ilnseher N Wehn W Raab In Proc IEEE Conference Design Automation and Test in Europe (DATE) pages 1420-1425 March 2010 Dresden Germany

ACQWLpipelineMAP

iterhalfMAP

MAP LLLLfnLPB

BTP

)( _

Cyclic Redundancy Check

Example 8 bit CRC (Polynomial x8+x2+x+1)

The CRC of a code block can be very efficiently evaluated if the data bits are in linear order

Ok for non-interleaved half iterations

BUT

What about interleaved half iterations

9

CRC OK

Iteration Control CRC timing

10

Additional latency

Additional memory

Two CRC units necessary

LTE Turbo-Code Decoder

Iteration control based on code block CRC

Significant additional latency for interleaved half iterations

Additional memory (B-bit)

Second CRC unit mandatory

On-the-fly CRC calculation

Same (small) LCRC for non-interleaved and interleaved half iterations

No additional memory needed

Easier IO control

No second CRC unit needed

ACQWLpipelineMAP

CRCiterhalfMAP

MAP LLLLfLnLPB

BTP

))(( _

1 Gbits LTE-A TC Decoder28nm

bull Support for all LTE code block sizes

bull On-the-fly CRC calculation

bull Transport Block (TB) CRC calculation for

up to 4 TBs

bull Softout calculation

___

wo memories

12

Architecture Parameters

Radix Parallelism 4 16

Max Window Length 192

Max Iter 7

Acquisition NII

Input quantization 8

Technology 28 nm

Area (Post Synthesis) ~065 mm2

Throughput (Mbps) 1000

Frequency (MHz) 700

Thank you for attention

For more information please visit

httpemseituni-klde

215 Gbits LTE TC Decoder65nm

Generic Decoder Structure

Parallelize block processing

Turbo-Code Decoder softdecoder inherent serial

LDPC Decoder inherent parallel since node processing independent

Network (interleaver tanner graph) no locality

Routing congestion access conflicts

Impact on communication standards (UMTSHSPA versus LTE)

Softdecoder 1 (MAP)

Check_n1

Check_nN

Softdecoder 2 (MAP)

Variable_n1

Variable_nN

Most advanced channel codes Turbo-Codes (HSPA LTE) LDPC (DVB-S2)

Iterative decoding algorithm performed on complete block with interleaved data exchange between processing blocks

Processing Block 1 Processing Block 2 Network

Turbo Decoder Design Space

Page 9: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

Cyclic Redundancy Check

Example 8 bit CRC (Polynomial x8+x2+x+1)

The CRC of a code block can be very efficiently evaluated if the data bits are in linear order

Ok for non-interleaved half iterations

BUT

What about interleaved half iterations

9

CRC OK

Iteration Control CRC timing

10

Additional latency

Additional memory

Two CRC units necessary

LTE Turbo-Code Decoder

Iteration control based on code block CRC

Significant additional latency for interleaved half iterations

Additional memory (B-bit)

Second CRC unit mandatory

On-the-fly CRC calculation

Same (small) LCRC for non-interleaved and interleaved half iterations

No additional memory needed

Easier IO control

No second CRC unit needed

ACQWLpipelineMAP

CRCiterhalfMAP

MAP LLLLfLnLPB

BTP

))(( _

1 Gbits LTE-A TC Decoder28nm

bull Support for all LTE code block sizes

bull On-the-fly CRC calculation

bull Transport Block (TB) CRC calculation for

up to 4 TBs

bull Softout calculation

___

wo memories

12

Architecture Parameters

Radix Parallelism 4 16

Max Window Length 192

Max Iter 7

Acquisition NII

Input quantization 8

Technology 28 nm

Area (Post Synthesis) ~065 mm2

Throughput (Mbps) 1000

Frequency (MHz) 700

Thank you for attention

For more information please visit

httpemseituni-klde

215 Gbits LTE TC Decoder65nm

Generic Decoder Structure

Parallelize block processing

Turbo-Code Decoder softdecoder inherent serial

LDPC Decoder inherent parallel since node processing independent

Network (interleaver tanner graph) no locality

Routing congestion access conflicts

Impact on communication standards (UMTSHSPA versus LTE)

Softdecoder 1 (MAP)

Check_n1

Check_nN

Softdecoder 2 (MAP)

Variable_n1

Variable_nN

Most advanced channel codes Turbo-Codes (HSPA LTE) LDPC (DVB-S2)

Iterative decoding algorithm performed on complete block with interleaved data exchange between processing blocks

Processing Block 1 Processing Block 2 Network

Turbo Decoder Design Space

Page 10: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

Iteration Control CRC timing

10

Additional latency

Additional memory

Two CRC units necessary

LTE Turbo-Code Decoder

Iteration control based on code block CRC

Significant additional latency for interleaved half iterations

Additional memory (B-bit)

Second CRC unit mandatory

On-the-fly CRC calculation

Same (small) LCRC for non-interleaved and interleaved half iterations

No additional memory needed

Easier IO control

No second CRC unit needed

ACQWLpipelineMAP

CRCiterhalfMAP

MAP LLLLfLnLPB

BTP

))(( _

1 Gbits LTE-A TC Decoder28nm

bull Support for all LTE code block sizes

bull On-the-fly CRC calculation

bull Transport Block (TB) CRC calculation for

up to 4 TBs

bull Softout calculation

___

wo memories

12

Architecture Parameters

Radix Parallelism 4 16

Max Window Length 192

Max Iter 7

Acquisition NII

Input quantization 8

Technology 28 nm

Area (Post Synthesis) ~065 mm2

Throughput (Mbps) 1000

Frequency (MHz) 700

Thank you for attention

For more information please visit

httpemseituni-klde

215 Gbits LTE TC Decoder65nm

Generic Decoder Structure

Parallelize block processing

Turbo-Code Decoder softdecoder inherent serial

LDPC Decoder inherent parallel since node processing independent

Network (interleaver tanner graph) no locality

Routing congestion access conflicts

Impact on communication standards (UMTSHSPA versus LTE)

Softdecoder 1 (MAP)

Check_n1

Check_nN

Softdecoder 2 (MAP)

Variable_n1

Variable_nN

Most advanced channel codes Turbo-Codes (HSPA LTE) LDPC (DVB-S2)

Iterative decoding algorithm performed on complete block with interleaved data exchange between processing blocks

Processing Block 1 Processing Block 2 Network

Turbo Decoder Design Space

Page 11: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

LTE Turbo-Code Decoder

Iteration control based on code block CRC

Significant additional latency for interleaved half iterations

Additional memory (B-bit)

Second CRC unit mandatory

On-the-fly CRC calculation

Same (small) LCRC for non-interleaved and interleaved half iterations

No additional memory needed

Easier IO control

No second CRC unit needed

ACQWLpipelineMAP

CRCiterhalfMAP

MAP LLLLfLnLPB

BTP

))(( _

1 Gbits LTE-A TC Decoder28nm

bull Support for all LTE code block sizes

bull On-the-fly CRC calculation

bull Transport Block (TB) CRC calculation for

up to 4 TBs

bull Softout calculation

___

wo memories

12

Architecture Parameters

Radix Parallelism 4 16

Max Window Length 192

Max Iter 7

Acquisition NII

Input quantization 8

Technology 28 nm

Area (Post Synthesis) ~065 mm2

Throughput (Mbps) 1000

Frequency (MHz) 700

Thank you for attention

For more information please visit

httpemseituni-klde

215 Gbits LTE TC Decoder65nm

Generic Decoder Structure

Parallelize block processing

Turbo-Code Decoder softdecoder inherent serial

LDPC Decoder inherent parallel since node processing independent

Network (interleaver tanner graph) no locality

Routing congestion access conflicts

Impact on communication standards (UMTSHSPA versus LTE)

Softdecoder 1 (MAP)

Check_n1

Check_nN

Softdecoder 2 (MAP)

Variable_n1

Variable_nN

Most advanced channel codes Turbo-Codes (HSPA LTE) LDPC (DVB-S2)

Iterative decoding algorithm performed on complete block with interleaved data exchange between processing blocks

Processing Block 1 Processing Block 2 Network

Turbo Decoder Design Space

Page 12: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

1 Gbits LTE-A TC Decoder28nm

bull Support for all LTE code block sizes

bull On-the-fly CRC calculation

bull Transport Block (TB) CRC calculation for

up to 4 TBs

bull Softout calculation

___

wo memories

12

Architecture Parameters

Radix Parallelism 4 16

Max Window Length 192

Max Iter 7

Acquisition NII

Input quantization 8

Technology 28 nm

Area (Post Synthesis) ~065 mm2

Throughput (Mbps) 1000

Frequency (MHz) 700

Thank you for attention

For more information please visit

httpemseituni-klde

215 Gbits LTE TC Decoder65nm

Generic Decoder Structure

Parallelize block processing

Turbo-Code Decoder softdecoder inherent serial

LDPC Decoder inherent parallel since node processing independent

Network (interleaver tanner graph) no locality

Routing congestion access conflicts

Impact on communication standards (UMTSHSPA versus LTE)

Softdecoder 1 (MAP)

Check_n1

Check_nN

Softdecoder 2 (MAP)

Variable_n1

Variable_nN

Most advanced channel codes Turbo-Codes (HSPA LTE) LDPC (DVB-S2)

Iterative decoding algorithm performed on complete block with interleaved data exchange between processing blocks

Processing Block 1 Processing Block 2 Network

Turbo Decoder Design Space

Page 13: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

Thank you for attention

For more information please visit

httpemseituni-klde

215 Gbits LTE TC Decoder65nm

Generic Decoder Structure

Parallelize block processing

Turbo-Code Decoder softdecoder inherent serial

LDPC Decoder inherent parallel since node processing independent

Network (interleaver tanner graph) no locality

Routing congestion access conflicts

Impact on communication standards (UMTSHSPA versus LTE)

Softdecoder 1 (MAP)

Check_n1

Check_nN

Softdecoder 2 (MAP)

Variable_n1

Variable_nN

Most advanced channel codes Turbo-Codes (HSPA LTE) LDPC (DVB-S2)

Iterative decoding algorithm performed on complete block with interleaved data exchange between processing blocks

Processing Block 1 Processing Block 2 Network

Turbo Decoder Design Space

Page 14: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

215 Gbits LTE TC Decoder65nm

Generic Decoder Structure

Parallelize block processing

Turbo-Code Decoder softdecoder inherent serial

LDPC Decoder inherent parallel since node processing independent

Network (interleaver tanner graph) no locality

Routing congestion access conflicts

Impact on communication standards (UMTSHSPA versus LTE)

Softdecoder 1 (MAP)

Check_n1

Check_nN

Softdecoder 2 (MAP)

Variable_n1

Variable_nN

Most advanced channel codes Turbo-Codes (HSPA LTE) LDPC (DVB-S2)

Iterative decoding algorithm performed on complete block with interleaved data exchange between processing blocks

Processing Block 1 Processing Block 2 Network

Turbo Decoder Design Space

Page 15: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

Generic Decoder Structure

Parallelize block processing

Turbo-Code Decoder softdecoder inherent serial

LDPC Decoder inherent parallel since node processing independent

Network (interleaver tanner graph) no locality

Routing congestion access conflicts

Impact on communication standards (UMTSHSPA versus LTE)

Softdecoder 1 (MAP)

Check_n1

Check_nN

Softdecoder 2 (MAP)

Variable_n1

Variable_nN

Most advanced channel codes Turbo-Codes (HSPA LTE) LDPC (DVB-S2)

Iterative decoding algorithm performed on complete block with interleaved data exchange between processing blocks

Processing Block 1 Processing Block 2 Network

Turbo Decoder Design Space

Page 16: Fully LTE-A compliant Turbo Decoder for Base Station ......1 Gbit/s LTE-A TC Decoder@28nm • Support for all LTE code block sizes • On-the-fly CRC calculation • Transport Block

Turbo Decoder Design Space