JIT FPGA Ideas
Frank Vahid
Dept. of CS&E, University of California, Riverside
Associate Director, Center for Embedded Computer Systems, UC Irvine
Contributing Ph.D. Students
  Roman Lysecky (Ph.D. 2005, now Asst. Prof. at Univ. of Arizona)
  Greg Stitt (Ph.D. 2007, now Asst. Prof. at Univ. of Florida, Gainesville)
  Scotty Sirowy (current)
  David Sheldon (current)
  Chen Huang (current)
This research was supported in part by the National Science Foundation, the Semiconductor Research Corporation, Intel, Freescale, IBM, and Xilinx.
2Frank Vahid, UC Riverside
SystemC Bytecode for FPGAs
Demo
FPGA Common Presence
Caches, FPUs, GPUs, FPGAs
App developers may expect FPGA presence
How can apps be created and distributed so that they make good use of an FPGA if one is present?
[Figure: a binary targets a µP; platform components shown are cache, FPU, GPU, and FPGA alongside the µP]
“Spatial” Algorithms for FPGAs
Example: count patterns
Sequential algorithm
  Hash table
  Tens of cycles per pattern
int patterns[1000];
int counts[1000];
while (1) {
    WaitForPattern();                          /* block until a new pattern arrives */
    CurrPattern = X;
    hash = HashFct(CurrPattern);
    item = Find(patterns, CurrPattern, hash);  /* hash-table lookup */
    if (item) {
        counts[item]++;
    }
}
[Figure: CurrPattern arrives over a bus and passes through Level 1 logic, Level 2 logic, ..., Level m logic; each level holds a pattern and its count]
Spatial algorithm
  Pipelined stages
  Essence is the connectivity of components, not the sequencing of instructions
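The contrast with the hash-table loop can be sketched in software. The following is a minimal cycle-level model, not the hardware from the slide: the names `Stage`, `PatternPipeline`, `Tick`, and `Drain` are hypothetical, and the one-comparison-per-stage behaviour is an assumption about what each "level" does. In this model a new pattern can enter on every tick, and the result comes from how the stages are connected rather than from a sequence of lookup instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One pipeline stage (hypothetical model of one "level" on the slide).
struct Stage {
    int pattern;  // pattern this stage watches for
    int count;    // matches seen so far
    bool valid;   // does the input register hold a value this cycle?
    int value;    // value currently in the input register
};

class PatternPipeline {
public:
    explicit PatternPipeline(const std::vector<int>& patterns) {
        assert(!patterns.empty());
        for (std::size_t i = 0; i < patterns.size(); ++i) {
            Stage s = {patterns[i], 0, false, 0};
            stages_.push_back(s);
        }
    }

    // One clock cycle: every stage compares its registered value with its
    // own pattern, then all values shift one stage down the pipeline.
    // has_input == false models a cycle in which no new pattern arrives.
    void Tick(bool has_input, int input) {
        for (std::size_t i = 0; i < stages_.size(); ++i) {
            if (stages_[i].valid && stages_[i].value == stages_[i].pattern) {
                ++stages_[i].count;
            }
        }
        // Shift from the last stage backwards so nothing is overwritten early.
        for (std::size_t i = stages_.size() - 1; i > 0; --i) {
            stages_[i].valid = stages_[i - 1].valid;
            stages_[i].value = stages_[i - 1].value;
        }
        stages_[0].valid = has_input;
        stages_[0].value = input;
    }

    // Feed empty cycles until every in-flight value has passed all stages.
    void Drain() {
        for (std::size_t i = 0; i <= stages_.size(); ++i) Tick(false, 0);
    }

    int CountOf(std::size_t stage) const { return stages_.at(stage).count; }

private:
    std::vector<Stage> stages_;
};
```

Calling `Tick(true, v)` once per incoming pattern and `Drain()` at the end leaves each stage's count equal to the number of times its pattern appeared in the stream.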
Bytecode
Modern portability approach
  Java, C#
[Figure: a compiler produces bytecode, which runs on a VM on each of Pentium, Atom, and Opteron]
Virtual Machine (VM): Program that executes bytecode
May JIT compile to native architecture
SystemC Bytecode?
[Figure: a compiler translates SystemC into SystemC bytecode, which runs on a VM on each of Pentium, FPGA, and Opteron + FPGA]
UCR SystemC Bytecode and Compiler
SystemC source:

class EDGE_DETECTOR : public sc_module {
    // signal declarations
    …
    SC_CTOR(EDGE_DETECTOR) {
        SC_METHOD(mainComp);
        sensitive << dataReady;
        SC_METHOD(getPixel);
        sensitive << clock.pos();
    }

    void getPixel() {
        …
        dataReady.write(1);
    }

    void mainComp() {
        int i, j;
        for (i = 0; i < 3; i++) {
            for (j = 0; j < 3; j++) {
                sumX = sumX + mem.read() * GX[i][j];
            }
        }
        …
        edge.write(sumX + sumY);
    }
};
UCR’s SystemC-to-bytecode compiler produces UCR’s SystemC bytecode:

--header
signal clock : 1
signal reset : 1
signal memory_in : 32
signal fb_data : 32
signal leds : 4

process(clock)
READ $1 memory_in
ADD $2 $0 3
ADD $3 $2 $1
WRITE $3 s1
ADDI $1 $0 1
WRITE $1 dataReady
END

process(dataReady)
READ $5 val6
SW $5 24($0)
READ $5 val7
…
ADDI $10 $0 0
ADDI $7 $0 0
ADDI $13 $0 8
…
END

The bytecode combines MIPS-like sequential instructions (inside each process) with spatial constructs (signals and signal-triggered processes).
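To make the split between sequential instructions and spatial constructs concrete, here is a minimal interpreter sketch. It is not UCR's emulator: the instruction encoding, the `Fire` scheduling loop, the rule that unwritten signals read as 0, and the hardwired-zero `$0` are all assumptions, and only READ, WRITE, ADD, ADDI, and END are modelled. The slide does not say how `ADD $2 $0 3` treats its literal operand, so the sketch keeps register adds (ADD) and immediate adds (ADDI) separate.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical opcodes named after mnemonics that appear on the slide.
enum Op { READ, WRITE, ADD, ADDI, END };

struct Instr {
    Op op;
    int a;               // READ/ADD/ADDI: destination register; WRITE: source register
    int b;               // ADD/ADDI: first source register
    int c;               // ADD: second source register; ADDI: immediate
    std::string signal;  // READ/WRITE: signal name
};

struct Process {
    std::string sensitive_to;  // signal whose change triggers this process
    std::vector<Instr> code;
};

typedef std::map<std::string, int> Signals;

// Signals that were never written read as 0 (an assumption).
int Get(const Signals& s, const std::string& name) {
    Signals::const_iterator it = s.find(name);
    return it == s.end() ? 0 : it->second;
}

// Execute one process body sequentially until END.
void RunProcess(const Process& p, Signals& signals) {
    int reg[32] = {0};
    for (std::size_t pc = 0; pc < p.code.size(); ++pc) {
        const Instr& in = p.code[pc];
        assert(in.a >= 0 && in.a < 32 && in.b >= 0 && in.b < 32);
        switch (in.op) {
            case READ:  reg[in.a] = Get(signals, in.signal); break;
            case WRITE: signals[in.signal] = reg[in.a]; break;
            case ADD:   assert(in.c >= 0 && in.c < 32);
                        reg[in.a] = reg[in.b] + reg[in.c]; break;
            case ADDI:  reg[in.a] = reg[in.b] + in.c; break;
            case END:   return;
        }
        reg[0] = 0;  // assumption: $0 always reads as zero, as in MIPS
    }
}

// Announce a change on `first`, run every process sensitive to it, and keep
// following signals whose values change. max_steps guards against processes
// that retrigger each other forever.
void Fire(const std::vector<Process>& procs, Signals& signals,
          const std::string& first, int max_steps = 1000) {
    std::vector<std::string> pending(1, first);
    while (!pending.empty() && max_steps-- > 0) {
        const std::string sig = pending.back();
        pending.pop_back();
        for (std::size_t i = 0; i < procs.size(); ++i) {
            if (procs[i].sensitive_to != sig) continue;
            const Signals before = signals;
            RunProcess(procs[i], signals);
            for (Signals::const_iterator it = signals.begin();
                 it != signals.end(); ++it) {
                if (Get(before, it->first) != it->second) {
                    pending.push_back(it->first);
                }
            }
        }
    }
}
```

Inside `RunProcess` execution is purely sequential. The process-to-process connectivity lives entirely in the `sensitive_to` fields, which is the part an FPGA implementation can map to independent hardware blocks.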
SystemC Bytecode Emulator
[Figure: emulator on the FPGA, taking SystemC bytecode as input. Blocks: main processor, instruction memory, input memory, output memory, read signal memory, write signal memory, UART, buttons, LEDs, USB interface]
Bytecode uploadable via USB drive
Accelerators speed up emulation
SystemC Bytecode Accelerators
[Figure: emulator on the FPGA, taking SystemC bytecode as input. Blocks: main processor, instruction memory, input memory, output memory, read signal memory, write signal memory, UART, buttons, LEDs, USB interface, plus Accelerator 1, Accelerator 2, and Accelerator 3]
Implementation
  MIPS-like multicycle RISC datapath
  100 MHz clock
  ~33 million instr/sec
  Communicates with the core emulator through memory-mapped registers
  Area: ~5000 slices
  Number of accelerators limited to the number of masters allowed on the bus
  ~1200 lines of VHDL
[Figure: accelerator internals: RISC datapath, register file, local memory, and bus, start, and load logic]
Dynamic SystemC Accelerator Management
[Figure: emulator on the FPGA, taking SystemC bytecode as input. Blocks: main processor, instruction memory, input memory, output memory, read signal memory, write signal memory, UART, buttons, LEDs, USB interface, plus Accelerator 1, Accelerator 2, and Accelerator 3]
Only a limited number of SystemC accelerators can fit on an FPGA fabric
Dynamically map processes to accelerators based on process usage
Involves online algorithms
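One simple online policy can be sketched as follows. This is an illustrative usage-count heuristic, not necessarily the "Greedy" or "AG" algorithm evaluated in the deck. The class `GreedyMapper`, its methods, and the swap rule (replace the least-used resident process once a newcomer has been used more often) are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical online mapper: keeps at most `slots` processes resident on
// accelerators and decides after each execution, using only past usage.
class GreedyMapper {
public:
    explicit GreedyMapper(std::size_t slots) : slots_(slots), loads_(0) {
        assert(slots > 0);
    }

    // Record one execution of process `pid`. Returns true if the process was
    // already on an accelerator, false if it had to run in the emulator.
    bool OnRun(int pid) {
        ++usage_[pid];
        if (IsResident(pid)) return true;
        if (resident_.size() < slots_) {  // free accelerator: just load it
            resident_.push_back(pid);
            ++loads_;
            return false;
        }
        // Find the least-used resident process.
        std::size_t victim = 0;
        for (std::size_t i = 1; i < resident_.size(); ++i) {
            if (usage_[resident_[i]] < usage_[resident_[victim]]) victim = i;
        }
        // Swap only when the newcomer has been used strictly more often.
        if (usage_[pid] > usage_[resident_[victim]]) {
            resident_[victim] = pid;
            ++loads_;
        }
        return false;
    }

    bool IsResident(int pid) const {
        for (std::size_t i = 0; i < resident_.size(); ++i) {
            if (resident_[i] == pid) return true;
        }
        return false;
    }

    // Number of times a process was loaded onto an accelerator.
    int loads() const { return loads_; }

private:
    std::size_t slots_;
    int loads_;
    std::vector<int> resident_;
    std::map<int, int> usage_;
};
```

Each load stands for reprogramming an accelerator, so a real policy would weigh that cost against the expected speedup. The sketch only counts loads so that different access sequences can be compared.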
[Figure: Image Filter Example. Bar chart of time (ms, axis 0 to 5) for Random, Biased, and Periodic sequences under six approaches: Virtual machine, Big FPGA/no com, Big FPGA, Static preloaded, Greedy, AG. Value labels printed on the chart: 42, 44, 43, 11, 12, 10]
Just-in-Time Synthesis
[Figure: emulator on the FPGA, taking SystemC bytecode as input. Blocks: main processor, instruction memory, input memory, output memory, read signal memory, write signal memory, UART, buttons, LEDs, plus Accelerator 1, Accelerator 2, and Accelerator 3]
Send SystemC bytecode to a synthesis server
Server returns an FPGA-specific bitstream
Dynamically reconfigure some or all of the FPGA
It is even possible to perform synthesis on-chip: “warp processing” (previous UCR work)
Transmuting Coprocessors
Demo
FPGA is a Size-Limited Coprocessing Resource
[Figure: a user device (µP, FPGA, DMA, bus, I/O, memory) connects over the Internet to a server that holds a CP library and performs CP selection and CP placement to produce a new FPGA binary. User app profile info shown: DOOM: 23 sec, Blowfish: 6 sec]
FPGA implements coprocessors
Upload app profile info
Select coproc. set, generate new FPGA bitstream
Send back new bitstream, re-program FPGA
Speedup with previous apps
A software update
A coprocessor update
App executions change over time, so the system must decide which coprocessors should be FPGA-resident at a given time: transmuting coprocessors.
Transmuting Coprocessor Demo
Three image filters:
  Blur filter (S/L): blur the image
  Sobel filter (S/L): find the edges in the image
  Emboss filter (S/L): emboss the image
Platform:
  Virtex 2P (XC2VP30): PPC + coprocessors
  PPC frequency: 100 MHz
  Coproc. frequency: 50 MHz
[Figure: bar chart of time (axis 0 to 120) for MP, Small CP, and Large CP; labels on the chart: 30x, 120x]

Size (slices)   Small   Large
Blur            30      120
Sobel           228     912
Emboss          81      324
Demo architecture
[Figure: demo architecture. Labels: PPC, peripherals (UART, push button), instruction BRAM, PLB, interface to external, image BRAM, display BRAM, coproc, VGA control, VGA display; tool labels: EDK, ISE]
Image (128*128 pixels, 24-bit color): 24 BRAMs
Soft version: Read (Image BRAM) -> Execution (PPC) -> Write (Display BRAM)
Coprocessor version: Read (Image BRAM) -> Execution (Coproc) -> Write (Display BRAM)
Dock: send the profile information through UART.
Coprocessor configurations
Microprocessor only
Small blur + small sobel
Small blur + small emboss
Small sobel + small emboss
Large blur
Large sobel
Large emboss
Choose the configuration according to app profile info.
[Figure: Virtex2P with PPC, peripherals, memory, and a coprocessor region holding one of: Blur (S) + Sobel (S); Blur (S) + Emboss (S); Sobel (S) + Emboss (S); Blur (L); Sobel (L); Emboss (L)]
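Choosing among the seven configurations from profile information can be sketched as a small search. This is a hypothetical cost model, not the selection algorithm used in the demo. It assumes the profile gives the software-only time per filter, that a small or large coprocessor divides that time by a fixed speedup, and that the configuration with the lowest estimated total time wins. The names `Config`, `EstimatedTime`, and `ChooseConfig` are illustrative.

```cpp
#include <cassert>
#include <cstddef>

enum Filter { BLUR = 0, SOBEL = 1, EMBOSS = 2, NUM_FILTERS = 3 };
enum Impl { SOFT, SMALL, LARGE };  // software, small CP, large CP

struct Config {
    const char* name;
    Impl impl[NUM_FILTERS];  // how each filter runs in this configuration
};

// The seven configurations listed on the slide, in the same order.
static const Config kConfigs[] = {
    {"Microprocessor only",        {SOFT,  SOFT,  SOFT}},
    {"Small blur + small sobel",   {SMALL, SMALL, SOFT}},
    {"Small blur + small emboss",  {SMALL, SOFT,  SMALL}},
    {"Small sobel + small emboss", {SOFT,  SMALL, SMALL}},
    {"Large blur",                 {LARGE, SOFT,  SOFT}},
    {"Large sobel",                {SOFT,  LARGE, SOFT}},
    {"Large emboss",               {SOFT,  SOFT,  LARGE}},
};
static const std::size_t kNumConfigs = sizeof(kConfigs) / sizeof(kConfigs[0]);

// Estimated total time if the profiled workload ran under configuration c.
// sw_time[f] is the profiled software-only time of filter f.
double EstimatedTime(const Config& c, const double sw_time[NUM_FILTERS],
                     double small_speedup, double large_speedup) {
    double total = 0.0;
    for (int f = 0; f < NUM_FILTERS; ++f) {
        double speedup = 1.0;  // SOFT: runs on the microprocessor
        if (c.impl[f] == SMALL) speedup = small_speedup;
        if (c.impl[f] == LARGE) speedup = large_speedup;
        total += sw_time[f] / speedup;
    }
    return total;
}

// Index into kConfigs of the configuration with the lowest estimated time.
// Ties keep the earlier entry.
std::size_t ChooseConfig(const double sw_time[NUM_FILTERS],
                         double small_speedup, double large_speedup) {
    assert(small_speedup > 0.0 && large_speedup > 0.0);
    std::size_t best = 0;
    for (std::size_t i = 1; i < kNumConfigs; ++i) {
        if (EstimatedTime(kConfigs[i], sw_time, small_speedup, large_speedup) <
            EstimatedTime(kConfigs[best], sw_time, small_speedup, large_speedup)) {
            best = i;
        }
    }
    return best;
}
```

With assumed speedups of 30 and 120, a profile dominated by Sobel selects "Large sobel", while a profile split evenly between blur and Sobel selects "Small blur + small sobel". The speedups are parameters because the deck does not give per-filter values.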