20
An Efficient, Portable and Generic Library for Successive Cancellation Decoding of Polar Codes Adrien Cassagne 1,2 Bertrand Le Gal 1 Camille Leroux 1 Olivier Aumage 2 Denis Barthou 2 1 IMS, Univ. Bordeaux, INP, France 2 Inria / Labri, Univ. Bordeaux, INP, France LCPC, September 2015 A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INP P-EDGE: Polar ECC Decoder Generation Environment 1 / 20

An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

An Efficient, Portable and Generic Libraryfor Successive Cancellation Decoding of Polar Codes

Adrien Cassagne1,2 Bertrand Le Gal1 Camille Leroux1

Olivier Aumage2 Denis Barthou2

1IMS, Univ. Bordeaux, INP, France

2Inria / Labri, Univ. Bordeaux, INP, France

LCPC, September 2015

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 1 / 20

Page 2: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

Context: Error Correction Codes (ECC)

Algorithm that enable reliable delivery of digital dataRedundancy for error correction

Usually implemented in hardwareGrowing interest for software implementation

End-user utilization (low power consumption processors)Less expensive production than dedicated hardware chipsAlgorithms validation (typically Monte-Carlo HPC simulations)

Require performanceFocus on the decoder (most time consuming part)

Comm. Chan.Transmitter Receiver

ChannelEncoderSource Decoder Sink

The communication chain

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 2 / 20

Page 3: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

Polar Codes as a New Class of ECC

Explored for upcoming 5G Mobile Phones (Huawei1)Redundancy: adding bits at fixed positions (value is always 0)

0 0 0 01 0 1 0

N = 8

Frozen bits Information bits

1 0 1 0Information bits

Enc. process

K = 4

Example of Polar Code (number of info. bits K = 4, frame size N = 8)

Rate R = N/K : frame size / information bits ratio

1http://www.huawei.com/minisite/has2015/img/5g_radio_whitepaper.pdf

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 3 / 20

Page 4: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

The Successive Cancellation (SC) Algorithm

Depth-first binary tree traversal/search algorithm3 key functions:{

λc = f (λa, λb) = sign(λa.λb).min(|λa|, |λb|)λc = g(λa, λb, s) = (1− 2s)λa + λb(sc , sd) = h(sa, sb) = (sa ⊕ sb, sb).

1 3

�d

�d+1 �d+1

f

g

h

2

Per-node downward and upwardcomputations

4

3

2

1

0

Depth

Data layout representation

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 4 / 20

Page 5: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

Polar Decoding Tree

?

?

? ? ? ?

?

f() g()

f() f()

f()f()

g()g()

f() f() g()g()g()g()

h() h() h() h()

h()h()

h()

Position of the frozen and information bits in the tree

Same specialized tree for each frameFrames are independent

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 5 / 20

Page 6: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

A Wide Optimization Space

Simplification of the computations (tree pruning, rewriting rules)Vectorization of the node functions (f , g , h)Optimization on the decoder binary sizeImplementation of low level kernels: various instruction sets (SSE,AVX, NEON, etc.)

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 6 / 20

Page 7: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

Example: Application of the Rewriting Rules

��������

������

������

������

������

����������������

������������

��������������������

? ?

gr()

xor()

f()

xor()

gr()f()

SPCRep

Rate 0 Rate 1

Rep SPC

cut branch

Rewriting rules applied to a N = 8 and K = 4 frame

Rewriting rules are applied recursivelyRepeated application of this rules lead to a simplified tree

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 7 / 20

Page 8: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

Rewriting Rules and Tree Pruning

���������

���������

���������

���������

������������

������������ ����

������������

���������

��������� ��

������

������

������ ����

������������

��������

��������

����������������

?

?

? ?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

????

?level >= 2

???

?

f() g()

xor()

f() gr()

xor()

f()

xor()

g1()g0()

xor0()

spc()rep()h()

spc()rep()h()

Rate 1

RepRepetition

SPCSingle Parity Check

Rate 0

Any node

LeafRate 1, left onlyRate 0, left only

Rate 0 Rate 1 Repetition SPC

Level 2+ SPCRep., leaf childrenRate 1, leaf childrensRate 0, leaf childrens

Repetition, left only Standard Case

Sub-tree rewriting rules and tree pruning for processing specialization

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 8 / 20

Page 9: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

Vectorization

Intra frame SIMD strategy:n data

f f f f

Inter frame SIMD strategy:

frame 0

frame 1

frame 2

frame 3

frame 0

frame 1

frame 2

frame 3

n data

n fra

mes

f

f

f

f

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 9 / 20

Page 10: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

P-EDGE: a Dedicated Framework for Polar CodesFeatures

1 Code generationFlattening recursive callsRewriting rulesGeneration of templated C++

2 C++ specializationLoop unrollingData typesSIMD

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 10 / 20

Page 11: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

P-EDGE: Code Generation Example

��������

������

������

������

������

����������������

������������

��������������������

? ?

gr()

xor()

f()

xor()

gr()f()

SPCRep

Rate 0 Rate 1

Rep SPC

cut branch

Generated code for N = 8 and K = 4

1 void Generated_SC_decoder_N8_K4 :: decode ()2 {3 // ------------ template args --------------- -std args -4 // -types - --funcs -- ----offsets ---- -size - --buffs ---5 f < R , F , FI , 0 , 4 , 8 , 4 > :: apply ( l );6 rep < B , R , H , HI , 8 , 0 , 4 > :: apply ( l , s );7 gr < B , R , G , GI , 0 , 4 , 0 , 8 , 4 > :: apply ( l , s );8 spc < B , R , H , HI , 8 , 4 , 4 > :: apply ( l , s );9 xo < B , X , XI , 0 , 4 , 0 , 4 > :: apply ( s );

10 }

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 11 / 20

Page 12: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

Reducing the L1I Cache Occupancy

Flattening may generate large binariesBinary size grows with the frame sizePerformance slowdown when the binary exceeds the L1I cacheMoving offsets from template to function arguments

Help the compiler to factorize many function calls

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 12 / 20

Page 13: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

Sub-tree Folding Technique

Legend

Default

Rate 0 left

Rate 0

Rate 1

Rep left

Rep

SPC 2

N0 xo()

N1

f()

N22

g()

xo()

N2

f()

N11

g()

xo0()

N3 N4

g0()

xo0()

N5 N6

g0()

xo0()

N7 N8

g0()

xo0()

N9 N10

g0()

h()

xo()

N12

f()

N17

g()

xo0()

N13 N14

g0()

xo()

N15

f()

N16

gr()

rep() spc()

xo()

N18

f()

N21

g()

xo()

N19

f()

N20

gr()

rep() spc()

spc()

xo()

N23

f()

N34

g()

xo()

N24

f()

N29

g()

xo()

N25

f()

N26

gr()

rep() xo()

N27

f()

N28

gr()

rep() spc()

xo()

N30

f()

N33

g()

xo()

N31

f()

N32

gr()

rep() spc()

h()

xo()

N35

f()

N42

g()

xo()

N36

f()

N41

g()

xo()

N37

f()

N40

g()

xo0()

N38 N39

g0()

h()

h()

h()

h()

Full decoding tree representation (N = 128, K = 64).

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 13 / 20

Page 14: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

Sub-tree Folding TechniqueEnabling compression

Legend

Default

Rate 0 left

Rate 0

Rate 1

Rep left

Rep

SPC 2

N0

N1 N22

N2 N11

N3 N4

N5N6

N7 N8

N9 N10

N12 N17

N14

N15N16

N21

N23 N34

N24 N29

N25 N33

N35 N42

N36

N40

A single occurrence of a given sub-tree traversal is generated, andreused wherever neededCompression ratio on the example shown: 1.48

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 14 / 20

Page 15: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

Binary Sizes Comparison

P-EDGE generated decoder binary sizes depending on the frame size (R=1/2)

2

4

8

16

32

64

128

256

512

1024

2048

4096

6 7 8 9 10 11 12 13 14 15 16

Dec

oder

bin

ary

size

(KB)

Codewords size (n = log2(N))

Without compression

32-bit Inter-SIMD32-bit Intra-SIMD8-bit Intra-SIMD

L1I size

0.5

1

2

4

8

16

32

64

6 7 8 9 10 11 12 13 14 15 16

Codewords size (n = log2(N))

With compression

Enable sub-tree folding

De-templetization: 10-fold binary reductionTree folding: 5-fold binary reduction (for N = 216)

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 15 / 20

Page 16: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

Performance results

P-Edge Intel-based P-Edge ARM-based McGill impl.2CPU Intel Xeon E31225 ARM Cortex-A15 Intel Core i7-2600

3.10Ghz MPCore 2.32GHz 3.40GHzCache L1I/L1D 32KB L1I/L1D 32KB L1I/L1D 32KB

L2 256KB L2 1024KB L2 256KBL3 6MB No L3 L3 8MB

Compiler GNU g++ 4.8 GNU g++ 4.8 GNU g++ 4.8

Performance evaluation platforms.

Compiler flags: -std=c++11 -Ofast -funroll-loops

2G. Sarkis, P. Giard, C. Thibeault, and W.J. Gross. Autogenerating software polardecoders. In Signal and Information Processing (GlobalSIP), 2014 IEEE GlobalConference on, pages 6–10, Dec 2014.

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 16 / 20

Page 17: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

Intra-SIMD Comparison with State of Art

Intra frame vectorization (32-bit, float)

250

300

350

400

450

500

550

6 7 8 9 10 11 12 13 14 15 16

Cod

ed th

roug

hput

(Mbi

t/s)

Frame size (n = log2(N))

Intel Xeon CPU E31225 (AVX1 SIMD)

P-Edge, rate 5/6McGill, rate 5/6

50

60

70

80

90

100

110

120

6 7 8 9 10 11 12 13 14 15 16

Frame size (n = log2(N))

Nvidia Jetson TK1 CPU A15 (NEON SIMD)

P-Edge, rate 5/6

Cross marks: Sarkis et al.P-Edge up to 25% better

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 17 / 20

Page 18: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

Inter-SIMD Comparison with State of Art

Inter frame vectorization (8-bit, char)

800

1000

1200

1400

1600

1800

2000

2200

2400

2600

6 7 8 9 10 11 12 13 14 15 16

Cod

ed th

roug

hput

(Mbi

t/s)

Frame size (n = log2(N))

Intel Xeon CPU E31225 (SSE4.1 SIMD)

P-Edge, rate 5/6Handw., rate 5/6

100

150

200

250

300

350

400

450

500

6 7 8 9 10 11 12 13 14 15 16

Frame size (n = log2(N))

Nvidia Jetson TK1 CPU A15 (NEON SIMD)

Triangles marks: former “handwritten” implementation3

P-Edge up to 25% better3B. Le Gal, C. Leroux, and C. Jego. Multi-gb/s software decoding of polar codes.

IEEE Transactions on Signal Processing, 63(2):349–359, Jan 2015.A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 18 / 20

Page 19: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

Exploring Optimization Impacts

Impact of the different optimizations on the throughput

0

100

200

300

400

500

1/5 1/2 5/6 9/10

Cod

ed th

roug

hput

(Mbi

t/s)

Rate (R = K / N)

32-bit intra frame SIMD (AVX1)

412

304

380

511 521492

0

500

1000

1500

2000

1/5 1/2 5/6 9/10

Rate (R = K / N)

8-bit inter frame SIMD (SSE4.1)

spc2spc4spc6

repcut1cut0

ref

1861

1377

1540

1808 1804 1770

Throughput depending on the different optimizations for N = 2048

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 19 / 20

Page 20: An Efficient, Portable and Generic Library for Successive ...aff3ct.github.io/publications/Cassagne2015c - An Efficient, Portable... · P-EDGE: Polar ECC Decoder Generation Environment

Conclusion and Future Works

Conclusion:P-EDGE: a Polar ECC decoder exploration frameworkClear separation of concerns

Rewriting rules engineLow level building blocks (f , g , h, rep, spc, ...)

Large optimization explorationOutperform state of art decodersPerformance Portability

Future works:In-depth performance analysis

Performance model (Roof-line, Execution-Cache-Memory (ECM))Reduce memory footprintExplore other Polar code decoder variants

A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INPP-EDGE: Polar ECC Decoder Generation Environment 20 / 20