JIT FPGA Ideas

Contributing Ph.D. Students
Roman Lysecky (Ph.D. 2005, now Asst. Prof. at Univ. of Arizona)
Greg Stitt (Ph.D. 2007, now Asst. Prof. at Univ. of Florida, Gainesville)
Scotty Sirowy (current)
David Sheldon (current)
Chen Huang (current)

This research was supported in part by the National Science Foundation, the Semiconductor Research Corporation, Intel, Freescale, IBM, and Xilinx.

Frank Vahid
Dept. of CS&E, University of California, Riverside
Associate Director, Center for Embedded Computer Systems, UC Irvine




SystemC Bytecode for FPGAs

Demo


FPGA Common Presence

Caches, FPUs, GPUs, FPGAs

App developers may expect FPGA presence

How to create and distribute apps that make good use of an FPGA if present?

[Figure: a binary targeting a µP accompanied by cache, FPU, GPU, and FPGA.]


“Spatial” Algorithms for FPGAs

Example: count patterns

Sequential algorithm
Hash table
10s of cycles per pattern

    int patterns[1000];
    int counts[1000];
    while (1) {
        WaitForPattern();
        CurrPattern = X;
        hash = HashFct(CurrPattern);                 /* hash the new pattern */
        item = Find(patterns, CurrPattern, hash);    /* hash-table lookup */
        if (item) {
            counts[item]++;
        }
    }

[Figure: CurrPattern arrives over a bus and passes through logic stages Level 1, Level 2, ..., Level m, with pattern and count registers.]

Spatial algorithm
Pipelined stages
Essence is the connectivity of components, not the sequencing of instructions
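As a rough software illustration of the spatial idea, the sketch below models a chain of stages in Python. Each stage is wired to one pattern and keeps its own count, and one input enters the chain per clock cycle. The names `Stage`, `clock`, and `run` are hypothetical, and the one-comparator-per-stage layout is an assumption made for illustration, not the circuit shown on the slide.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Stage:
    """One pipeline stage (assumed layout: one comparator per stored pattern)."""
    pattern: int                 # pattern this stage is wired to recognize
    count: int = 0               # number of matches seen so far
    held: Optional[int] = None   # pipeline register: input currently in this stage


def clock(stages: List[Stage], new_input: Optional[int]) -> None:
    """Advance the whole pipeline by one clock cycle."""
    # All stages compare at once in hardware; the loop only models that.
    for s in stages:
        if s.held is not None and s.held == s.pattern:
            s.count += 1
    # Shift every held input one stage to the right, then accept the new input.
    for i in range(len(stages) - 1, 0, -1):
        stages[i].held = stages[i - 1].held
    stages[0].held = new_input


def run(patterns: List[int], stream: List[int]) -> Tuple[List[int], int]:
    """Feed one input per cycle, then drain the pipeline. Returns (counts, cycles)."""
    stages = [Stage(p) for p in patterns]
    cycles = 0
    for x in stream:
        clock(stages, x)
        cycles += 1
    for _ in range(len(stages)):   # let the last inputs reach the last stage
        clock(stages, None)
        cycles += 1
    return [s.count for s in stages], cycles


counts, cycles = run(patterns=[3, 7, 9], stream=[7, 3, 7, 9])
```

In this model the cycle count is the stream length plus the pipeline depth, so a new input is accepted every cycle regardless of how many patterns are stored. What matters is how the stages are connected, not the order of instructions.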


Bytecode

Modern portability approach
Java, C#

[Figure: a compiler produces bytecode, which runs on a VM on each of Pentium, Atom, and Opteron.]

Virtual Machine (VM): program that executes bytecode
May JIT compile to native architecture


SystemC Bytecode?

[Figure: a compiler turns SystemC into SystemC bytecode, which runs on a VM on each target: Pentium, FPGA, and Opteron + FPGA.]


UCR SystemC Bytecode and Compiler

SystemC:

    class EDGE_DETECTOR : public sc_module {
        // signal declarations
        …
        EDGE_DETECTOR() {
            SC_METHOD(mainComp);
            sensitive << dataReady;

            SC_METHOD(getPixel);
            sensitive << clock.pos();
        }

        void getPixel() {
            …
            dataReady.write(1);
        }

        void mainComp() {
            int i, j;
            for (i = 0; i < 3; i++) {
                for (j = 0; j < 3; j++) {
                    sumX = sumX + mem.read() * GX[i][j];
                }
            }
            …
            edge.write(sumX + sumY);
        }
    };

UCR's SystemC-to-bytecode compiler translates this into UCR's SystemC bytecode:

    --header
    signal clock : 1
    signal reset : 1
    signal memory_in : 32
    signal fb_data : 32
    signal leds : 4

    process(clock)
    READ $1 memory_in
    ADD $2 $0 3
    ADD $3 $2 $1
    WRITE $3 s1
    ADDI $1 $0 1
    WRITE $1 dataReady
    END

    process(dataReady)
    READ $5 val6
    SW $5 24($0)
    READ $5 val7
    …
    ADDI $10 $0 0
    ADDI $7 $0 0
    ADDI $13 $0 8
    …
    END

The bytecode combines spatial constructs with MIPS-like sequential instructions.
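To make the MIPS-like part of the bytecode concrete, here is a minimal Python sketch that interprets the `process(clock)` excerpt from the slide. The instruction semantics are assumptions inferred from the mnemonics and are not UCR's specification: `READ` loads a signal into a register, `ADD`/`ADDI` add two operands (register or literal), `WRITE` drives a signal from a register, and `$0` always reads as zero. The function name `run_process` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# The process(clock) body exactly as it appears on the slide.
PROCESS_CLOCK = [
    "READ $1 memory_in",
    "ADD $2 $0 3",
    "ADD $3 $2 $1",
    "WRITE $3 s1",
    "ADDI $1 $0 1",
    "WRITE $1 dataReady",
    "END",
]


def operand(regs: Dict[str, int], token: str) -> int:
    """Resolve an operand: '$n' is a register, anything else a literal (assumed)."""
    return regs[token] if token.startswith("$") else int(token)


def run_process(lines: List[str], signals: Dict[str, int]) -> Dict[str, int]:
    """Execute one process body once; return the signals it wrote."""
    regs: Dict[str, int] = defaultdict(int)   # all registers start at 0
    written: Dict[str, int] = {}
    for line in lines:
        op, *args = line.split()
        if op == "END":
            break
        if op == "READ":                       # READ $dst signal
            regs[args[0]] = signals[args[1]]
        elif op in ("ADD", "ADDI"):            # ADD $dst a b
            regs[args[0]] = operand(regs, args[1]) + operand(regs, args[2])
        elif op == "WRITE":                    # WRITE $src signal
            written[args[1]] = regs[args[0]]
        else:
            raise ValueError(f"unsupported instruction: {op}")
        regs["$0"] = 0                         # assumed MIPS convention: $0 is hard-wired zero
    return written


outputs = run_process(PROCESS_CLOCK, {"memory_in": 10})
```

With `memory_in` set to 10, the process writes 13 to `s1` and 1 to `dataReady`. In the bytecode, `dataReady` is the signal that the second process is sensitive to.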


SystemC Bytecode Emulator

[Figure: emulator on the FPGA, executing SystemC bytecode. Blocks: main processor, instruction memory, read signal memory, write signal memory, input memory, output memory, UART, buttons, LEDs, and USB interface.]

Bytecode uploadable via USB drive
Accelerators speed up emulation


SystemC Bytecode Accelerators

[Figure: the same emulator blocks, plus Accelerator 1, Accelerator 2, and Accelerator 3 on the FPGA.]

Implementation
MIPS-like multicycle RISC datapath
100 MHz clock
~33 million instr/sec
Communicates with core emulator through memory-mapped registers
Area: ~5000 slices
# of accelerators limited to # of masters allowed on bus
~1200 lines of VHDL

[Figure: accelerator internals: RISC datapath, register file, local memory, and bus/start/load logic.]
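The clock and throughput figures above imply an average instruction latency, which the short Python calculation below works out. It uses only the two numbers from the slide; the function name is hypothetical.

```python
def cycles_per_instruction(clock_hz: float, instr_per_sec: float) -> float:
    """Average clock cycles spent per instruction."""
    return clock_hz / instr_per_sec


# Slide figures: 100 MHz clock, ~33 million instructions per second.
cpi = cycles_per_instruction(100e6, 33e6)
```

The result is about 3 cycles per instruction, which is consistent with a multicycle (rather than pipelined) datapath.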


Dynamic SystemC Accelerator Management

[Figure: emulator with Accelerators 1, 2, and 3 on the FPGA, executing SystemC bytecode.]

Only a limited number of SystemC accelerators can fit on an FPGA fabric

Dynamically map processes to accelerators based on process usage

Involves online algorithms

[Chart: Image Filter Example. Time in ms (axis 0 to 5) for Random, Biased, and Periodic sequences under six schemes: Virtual machine, Big FPGA/no com, Big FPGA, Static preloaded, Greedy, and AG. Off-scale bars are labeled 42, 44, 43 and 11, 12, 10.]
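One simple way to picture online mapping of processes to a limited number of accelerators is the greedy policy sketched below in Python. It is an illustrative assumption, not the Greedy or AG algorithm measured in the chart. A process that is not resident runs in software, and it replaces the least-used resident process once its own usage count is higher. The function name and the cost parameters `t_soft`, `t_accel`, and `t_load` are hypothetical placeholders, not measured values.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def simulate_greedy(
    sequence: Iterable[str],
    slots: int,
    t_soft: float = 10.0,    # assumed cost of one execution on the emulator
    t_accel: float = 1.0,    # assumed cost of one execution on an accelerator
    t_load: float = 5.0,     # assumed cost of loading a process onto an accelerator
) -> Tuple[float, Set[str]]:
    """Return (total time, processes resident at the end) for a usage-count greedy policy."""
    usage: Counter = Counter()
    resident: Set[str] = set()
    total = 0.0
    for proc in sequence:
        usage[proc] += 1
        if proc in resident:
            total += t_accel
            continue
        total += t_soft                      # not resident: run in software this time
        if len(resident) < slots:
            resident.add(proc)               # a free accelerator exists: load it
            total += t_load
        else:
            # Evict the least-used resident only if the newcomer is used more.
            victim = min(resident, key=lambda p: (usage[p], p))
            if usage[proc] > usage[victim]:
                resident.remove(victim)
                resident.add(proc)
                total += t_load
    return total, resident


total, resident = simulate_greedy(
    ["blur", "blur", "sobel", "blur", "emboss", "blur"], slots=1
)
```

With one accelerator slot and these placeholder costs, the frequently used process stays resident and the occasional ones run in software. The policy is online in the sense that it decides using only the usage seen so far.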


Just-in-Time Synthesis

[Figure: emulator with Accelerators 1, 2, and 3 on the FPGA, executing SystemC bytecode.]

Send SystemC bytecode to synthesis server
FPGA-specific bitstream
Dynamically reconfigure some or all of the FPGA
Possible to even perform synthesis on-chip: “warp processing” (previous UCR work)


Transmuting Coprocessors

Demo


FPGA is a Size-Limited Coprocessing Resource

[Figure: a user device (µP, FPGA, DMA, bus, I/O, memory), whose FPGA implements coprocessors, connects over the Internet to a server with a CP library that performs CP selection and CP placement. The device sends user app profile info (e.g., DOOM: 23 sec, Blowfish: 6 sec); the server returns a new FPGA binary. Labels: "A software update", "A coprocessor update".]

Upload app profile info
Select coproc. set, generate new FPGA bitstream
Send back new bitstream, re-program FPGA
Speedup with previous apps

App executions change. Must decide which coprocessors should be FPGA-resident at a given time: transmuting coprocessors.


Transmuting Coprocessor Demo

Three image filters:
Blur filter (S/L): blur the image
Sobel filter (S/L): find the edges of the image
Emboss filter (S/L): emboss the image

Platform:
Virtex 2P (XC2VP30): PPC + coprocessors
PPC frequency: 100 MHz
Coproc. frequency: 50 MHz

[Chart: time (axis 0 to 120) for MP, Small CP, and Large CP, annotated 30x and 120x.]

Size (slices)   Small   Large
Blur            30      120
Sobel           228     912
Emboss          81      324


Demo architecture

[Figure: block diagram with PPC, peripherals (UART, push button), instruction BRAM, PLB, a block labeled "Interface to external", image BRAM, coprocessor, display BRAM, VGA control, and VGA display. Parts are labeled EDK and ISE.]

Image (128 x 128 pixels, 24-bit color): 24 BRAMs
Soft version: Read (Image BRAM) -> Execution (PPC) -> Write (Display BRAM)
Coprocessor version: Read (Image BRAM) -> Execution (Coproc) -> Write (Display BRAM)
Dock: send the profile information through UART.


Coprocessor configurations

Microprocessor only
Small blur + small sobel
Small blur + small emboss
Small sobel + small emboss
Large blur
Large sobel
Large emboss

Choose the configuration according to app profile info.

[Figure: Virtex2P with PPC, peripherals, memory, and a coprocessor region that holds one of: Blur (S) + Sobel (S), Blur (S) + Emboss (S), Sobel (S) + Emboss (S), Blur (L), Sobel (L), or Emboss (L).]
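Choosing a configuration from profile info can be sketched as a small search over the seven configurations listed above. The Python below is a minimal illustration under stated assumptions: the profile is taken to be software execution time per filter, and small and large coprocessors are assumed to give 30x and 120x speedups, based on the chart annotations from the demo slide. All names (`CONFIGS`, `SPEEDUP`, `estimated_time`, `choose_configuration`) are hypothetical, and the slides do not state the actual selection method.

```python
from typing import Dict

# The seven configurations from the slide: filter -> coprocessor size.
CONFIGS: Dict[str, Dict[str, str]] = {
    "microprocessor only": {},
    "small blur + small sobel": {"blur": "S", "sobel": "S"},
    "small blur + small emboss": {"blur": "S", "emboss": "S"},
    "small sobel + small emboss": {"sobel": "S", "emboss": "S"},
    "large blur": {"blur": "L"},
    "large sobel": {"sobel": "L"},
    "large emboss": {"emboss": "L"},
}

# Assumed speedup per coprocessor size (reading the chart's 30x / 120x labels this way).
SPEEDUP = {"S": 30.0, "L": 120.0}


def estimated_time(profile: Dict[str, float], config: Dict[str, str]) -> float:
    """Total time if filters in `config` run on coprocessors and the rest in software."""
    return sum(
        t / SPEEDUP[config[f]] if f in config else t
        for f, t in profile.items()
    )


def choose_configuration(profile: Dict[str, float]) -> str:
    """Pick the configuration with the lowest estimated total time."""
    return min(CONFIGS, key=lambda name: estimated_time(profile, CONFIGS[name]))


# Hypothetical profile: seconds spent in each filter when run in software.
best = choose_configuration({"blur": 60.0, "sobel": 30.0, "emboss": 0.0})
```

A profile split between two filters favors a pair of small coprocessors, while a profile dominated by one filter favors a single large coprocessor. This is the trade-off the coprocessor region forces.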