35
C-to-Verilog.com: High-Level Synthesis Using LLVM Nadav Rotem, Haifa University

C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

  • Upload
    others

  • View
    18

  • Download
    0

Embed Size (px)

Citation preview

Page 1: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

C-to-Verilog.com: High-Level Synthesis Using LLVM

Nadav Rotem, Haifa University

Page 2: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Computing tradeoffs

• Different kinds of computational problems

• Different kind of architecture solutions

GSM communication

Camera Image De-noise

Phonebook Manager

Render Graphics

Page 3: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

CPU GPU FPGA ASIC

Python, Java

DSP

Verilog, VHDLC/C++, OpenCL

Multi

Core

Easy to program.

Flexible. Slow.

Performance, power

efficient. Difficult to

program.

Page 4: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Introduction to High-Level Synthesis

Page 5: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Hardware description languages• Complex digital systems are made of basic

logic elements (AND, NOT, FF, etc.)

• Designers use Hardware Description Languages to describe logic blocks

• Use C-like syntax to express hardware module toplevel(clock,reset,out);

input clock;input reset;output reg out;reg flop1;reg flop2;always @ (posedge reset or posedge clock)If (reset) begin

flop1 <= 0;flop2 <= 1;

end else beginflop1 <= flop2;flop2 <= flop1; out <= flop2 ^ flop1;

endendmodule

Page 6: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

The problem with HDL

• Need to 'think' hardware• All parts of the circuit operate at the same time

• Explicit notion of time (clock, synchronization)

• Explicit notion of space (size and connectivity of components)

• Extremely long compilation cycle

• Difficult to develop and verify

• Details, Details, Details …

Page 7: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

High-level synthesis

• HLS: Compilation of high-level languages, such as C, to Hardware Description Languages.

• Easier to write and test code in C

• Use a subset of C

– No IO, recursion, jump by value, etc.

– Standard hardware/software interface

while(true) {out = val1 ^ val2;...

}

module toplevel(clock,reset,out);input clock;input reset;output reg out;reg flop1;reg flop2;always @ (posedge reset or

posedge clock)If (reset) beginflop1 <= 0; flop2 <= 1;

end else beginflop1 <= flop2; flop2 <= flop1; out <= flop2 ^ flop1;

endendmodule

Page 8: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

C-to-Verilog.com

• LLVM-based high-level synthesis system

• Developed as a graduate research project

• Website is web-interface for the synthesis system

• Free, Open, etc.

Page 9: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

High-Level Synthesis using LLVM

Page 10: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

HLS using LLVM

– Use Clang and LLVM to parse and optimize the code

– Special LLVM passes optimize the code at IR level

– HLS backend synthesize the code to Verilog

Frontend(clang)

LLVM(opt)

VerilogBackend

Hardware optimization

passes

.c Verilog

Page 11: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

HLS backend for LLVM

Page 12: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Simple High-Level Synthesis

– It is trivial to compile sequential C-like code to HDL

– A state-machine can represent the original code

– We can create a state for each 'opcode‘

– Example:case (state)

ST0: beginA <= B + 5;state <= ST1;

endST1: begin

C <= (A == 0)state <= ST2;

endST2: begin

if (C)state <= ST9;

elsestate <= ST0;

end...

endcase

entry:%A = add i32 %B, 5%C = icmp eq i32 %A, 0 %br i1 %C, label %next, label %entry…

Page 13: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

High-Level synthesis challenges

• The simple state-machine translation is inefficient

• We want to optimize:– Fast designs (few clock cycles to complete)

– High-frequency (low clock latency)

– Size and resource efficient (few gates, memory ports)

– Low-power

Page 14: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Scheduling pipelined resources

– Generally, in HLS resources can be synthesized

• Unlimited registers, arithmetic ops, etc.

– Some resources are limited, and need to be shared.

• External memory ports

• ASIC Multipliers (for FPGA synthesis)

– Often, hardware resources are 'pipelined', to gain high frequencies.

• Multiplier – 5 stages, Memory – 2 stages, etc.

Page 15: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

List Scheduling

– Schedule a single basic block

– Convert a DDG into a [Time x Resource] table

– Requirements:• Preserve DDG dependencies

• Expose parallelism

• Conserve resources, use pipelined resources

– After scheduling, HDL syntax generation is simple

Page 16: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Example (bad)

• Multiplier – 3 cycles

• Load/Store – 2 cycles

• Other – 1 cycle

LoadA

LoadB

LoadC

Mult Mult

Mult Add

Xor

StoreD

+2

+2

+2

+3

+3

+3

+1

+1

+2

+2

21 cycles

StoreE

Page 17: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Time

Mult Mem Other

LoadA

LoadB

LoadC

Mult Mult

Mult Add

Xor

StoreD

13 cycles

StoreE

Page 18: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

IR Optimizations for hardware

Page 19: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Reduce-bitwidth-opt

• CPUs have fixed execution units

• In hardware synthesis, arithmetic operations are synthesized into circuits

• Fewer bit width arithmetic operations translate to smaller circuits which operate at higher frequencies

Page 20: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Reduce-bitwidth-opt

• Reduce bit width in several cases:

1. Detect local bit reducing patterns (masks)

2. Reduce constant integers to lowest bit-width

3. Use smallest possible arithmetic operation based on input width

• Simple LLVM Pass

Y = X & 0xFF i32 0x5i8 + i8 = i9i4 * i4 = i8

1 2 3

Page 21: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Arithmetic tree height reduction

*

* *

*

*

* *

*

*

*

*

*

*

*Short dependency chain,

High parallelism

Long dependency chain,

Low parallelism

Page 22: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Arithmetic tree height reduction

• Simple LLVM pass

• Collect long chains of arithmetic operations and balance them

• Only for commutative arithmetic and logic operations

• Not suitable for software where number of registers is unlimited

Page 23: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

HLS flow example

Page 24: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Pop-count example

// count the number of 1’s in a wordunsigned popCnt(unsigned input) {

unsigned sum = 0; for (unsigned i = 0; i < 32; i++) {

sum += (input) & 1; input = input/2;

} return sum;

}

Page 25: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Pop-count

• Runtime: – The program has a loop, which executes 32 times.

• Size:– Has several 32-bit registers.

– Has control-flow logic.

• Frequency:– Has 32bit-adders, long carry-chains.

• IR-level optimizations can be very beneficial

Page 26: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Pop - count

• First, we let LLVM unroll and optimize the loop

// count the number of 1’s in a wordunsigned popCnt(unsigned input) {

unsigned sum = 0;sum += (input>>0) & 1; sum += (input>>1) & 1; sum += (input>>2) & 1; ……return sum;

}

Page 27: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Pop - count

• Next, we balance the long-chain of adders to become a tree of 31-additions

+

+ +

+

+

+ ++

+

+

+

+

+

+

Balance

Page 28: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Pop - count

// count the number of 1’s in a wordunsigned popCnt(unsigned input) {

unsigned sum0,sum1, sum2, sum3 … t0,… t0 = (input>>0) & 1; t1 = (input>>1) & 1; t2 = (input>>2) & 1; ……sum0 = t0 + t1;sum1 = t2 + t3;…sum30 = sum29 + sum 28; return sum;

}

Page 29: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Pop - count

• Finally, we’ll reduce the bit-width of each operation

+

+ +

+

+

+ +

& & & & & && & &1 results in 1-bit value

Addition of 1 bit inputs ≤ 2-bit

Addition of 2 bit inputs ≤ 3-bit

Addition of 5 bit inputs ≤ 6-bit

Page 30: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Pop - count

// count the number of 1’s in a wordunsigned popCnt(unsigned input) {

uint1_t t0, t1 … uint2_t sum0,sum1, sum2, sum3 … uint3_t sum17,sum18, sum19, sum20 … …uint6_t sum31;…t0 = (input>>0) & 1; …sum0 = t0 + t1;…sum31 = sum29 + sum 30; return sum;

}

Page 31: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Pop – count

• Finally, we pass the hw-optimized IR to the backend for scheduling and syntax generation

+

+ +

+

+

+ +

& & & & & && &Cycle 0

Cycle 1

Cycle 2

Small operations can

be scheduled into a

single clock cycle

Only a wire

Page 32: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Pop - count

• IR-level optimizations are very beneficial

– Size: fewer and smaller registers, no control flow

– Frequency: smaller arithmetic ops (32bit -> 6 bits)

– Cycles: Fewer cycles (32 -> 4)

Page 33: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Conclusion

• High-level synthesis automates circuit design

• LLVM is an invaluable tool when developing a HLS compiler

• HLS compiler is made of IR-level optimization passes and a scheduling backend

Page 34: C-to-Verilog.com: High-Level Synthesis Using LLVM · 2019-10-30 · High-Level Synthesis using LLVM. HLS using LLVM –Use Clang and LLVM to parse and optimize the code –Special

Questions ?