SSS 4/9/99 CMU Reconfigurable Computing 1

The CMU Reconfigurable Computing Project

April 9, 1999

Mihai Budiu

[email protected]

SSS 4/9/99 CMU Reconfigurable Computing 2

Current Project Members

ECE Department

Herman Schmit, Srihari Cadambi, Matt Moe, Robert Taylor, Ronald Laufer

CS Department

Seth Copen Goldstein, Mihai Budiu

SSS 4/9/99 CMU Reconfigurable Computing 3

Why Study Reconfigurable Hardware?

It is a nice computation paradigm (wire your own computer)

SSS 4/9/99 CMU Reconfigurable Computing 4

Why Study Reconfigurable Hardware

Algorithm            Year  System         Versus               Speedup (x)
DNA matching         1992  SPLASH 2       SPARC 10             4300
FIR Filter           1998  PipeRench      UltraSparc 300 MHz   90
IDEA Encryption      1998  PipeRench      UltraSparc 300 MHz   61
SAT solver           1997  Pamette        SPARC 5 110 MHz      17--1100
Ray Casting          1995  RIPP-10        Pentium 75 MHz       33.8
Hidden Markov Model  1996  1 Xilinx FPGA  SPARC 10             24.4
DES Encryption       1996  GARP           UltraSparc 170 MHz   24
SPEC92               1994  MIPS+RC        MIPS                 1.22

SSS 4/9/99 CMU Reconfigurable Computing 5

Commercial Players

Source: In-stat April 1998  *Does not include software, hardwire or support EPROMs

SSS 4/9/99 CMU Reconfigurable Computing 6

What Is “Reconfigurable Hardware?”

Universal gates and/or storage elements

Interconnection network

Switches

SSS 4/9/99 CMU Reconfigurable Computing 7

Basic Ingredient: RAM cell

Universal gate = RAM

[Figure: a RAM cell with address inputs a0 and a1, stored contents 0001, and a data output labeled "a1 & a2"]
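The "universal gate = RAM" idea can be sketched in a few lines of Python. This is only an illustration: the helper name `make_lut` and the reading of 0001 as the bits stored at addresses 0 to 3 are assumptions, not something the slides specify.

```python
def make_lut(contents):
    """Build a gate from RAM contents; contents[i] is the bit stored at address i."""
    def gate(*inputs):
        # The input bits form the RAM address, most significant bit first.
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return contents[address]
    return gate


# Contents 0, 0, 0, 1 (addresses 0..3) behave as a 2-input AND.
and_gate = make_lut([0, 0, 0, 1])
# Writing different contents into the same RAM yields a different gate.
xor_gate = make_lut([0, 1, 1, 0])

truth_table = [(a0, a1, and_gate(a0, a1)) for a0 in (0, 1) for a1 in (0, 1)]
```

Changing the gate means changing four stored bits, which is what makes the hardware reconfigurable.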

SSS 4/9/99 CMU Reconfigurable Computing 8

Basic Ingredients (ctd)

A switch is controlled by a 1-bit RAM cell

[Figure: switches, each set by a RAM cell holding 0 or 1]

SSS 4/9/99 CMU Reconfigurable Computing 9

Outline

• What is reconfigurable hardware

• RH vs other computation paradigms

• Challenges in RH research

• PipeRench: the CMU project:
  – the hardware
  – the software

• Conclusions

SSS 4/9/99 CMU Reconfigurable Computing 10

RH vs ASICs

• Generally, Application-Specific Integrated Circuits will be faster than RH:
  – RH wires are slow & big
  – RH bit-slices are costly to interconnect
  – RH devices must store configuration on the chip

but

• RH can be reprogrammed
  – new algorithms
  – to fix bugs
• RH cheaper in small production
• RH tolerates faults better
• RH sometimes faster with staged computation

SSS 4/9/99 CMU Reconfigurable Computing 11

RH vs Microprocessors

• RH less flexible (like a VLIW with fixed instructions)

but

• RH provides more (customized) computation elements
• RH can decrease memory traffic
• RH can be tailored for specific algorithms and data types

RH will not replace µPs, but complement them

SSS 4/9/99 CMU Reconfigurable Computing 12

Types of RH

• FPGAs: bit-level logic functionality
  (the basic processing elements compute on 1 bit)

• word-based architectures: PipeRench (CMU)
  (basic PE operates on 8 bits)
  (basic PE is a small ALU)

• coarse architectures: RAW (MIT)
  (basic PE is a MIPS 2000 core)

SSS 4/9/99 CMU Reconfigurable Computing 13

RH In A System

[Figure: coupling]

SSS 4/9/99 CMU Reconfigurable Computing 14

Challenges In RC

• Software tools:
  – Programming RC like software development
  – Automatic compilation from HLL
  – Automatic program partitioning
• Mapping algorithms efficiently (no ISA)
• System issues
  – interfaces
  – find “ideal” RC fabric

SSS 4/9/99 CMU Reconfigurable Computing 15

The CMU Reconfigurable Computing Project

SSS 4/9/99 CMU Reconfigurable Computing 16

Hardware Goals

• To build a complete reconfigurable hardware device

• To build the system integration hardware

• To host the device in a PC

SSS 4/9/99 CMU Reconfigurable Computing 17

Our Device:

• Word processing elements

• Pipelined architecture

• Virtualized hardware

• Local interconnection network

• Wide pipelined bus

SSS 4/9/99 CMU Reconfigurable Computing 18

[Figure labels: configuration memory; stripes; data & config controller; processing elements]

SSS 4/9/99 CMU Reconfigurable Computing 19

Hardware Virtualization

[Figure labels: program; instructions currently in hardware; instructions paged out; actual available hardware]

SSS 4/9/99 CMU Reconfigurable Computing 20

Hardware Virtualization (2)

[Figure labels: program in configuration memory; hardware; page in; configure; compute; page out]

Overlap configuration with computation.
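One way to picture paging a long program through a smaller fabric is a round-robin schedule. The sketch below is a simplified model under stated assumptions (exactly one stripe is configured per cycle while the others keep computing, and the oldest stripe is the one overwritten); the function name `page_in_schedule` is hypothetical and this is not a description of the actual PipeRench controller.

```python
def page_in_schedule(virtual_stripes, physical_stripes, cycles):
    """Return, for each cycle, which virtual stripe occupies each physical stripe.

    Assumed model: every cycle one physical stripe is reconfigured with the next
    virtual stripe of the program, while the remaining stripes keep computing.
    """
    fabric = [None] * physical_stripes  # None marks a stripe not yet configured
    timeline = []
    for t in range(cycles):
        slot = t % physical_stripes          # the stripe being configured now
        fabric[slot] = t % virtual_stripes   # page in; the old occupant is paged out
        timeline.append(list(fabric))
    return timeline


# A 5-stripe program running on 3 physical stripes.
for cycle, state in enumerate(page_in_schedule(5, 3, 6)):
    print(cycle, state)
```

In this model the program sees five stripes even though only three exist, at the cost of reconfiguring one stripe every cycle.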

SSS 4/9/99 CMU Reconfigurable Computing 21

Processing Elements

• Look-up table
• Any 3-to-1 function

[Figure labels: inputs a, b, Cin; output out; PE0, PE1, PE2]
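As a rough illustration of what "any 3-to-1 function" allows, the sketch below programs two 8-entry tables to act as a one-bit full adder and chains the slices through the carry. The table encoding, the helper names `lut3` and `add_bits`, and the use of a second table for the carry are assumptions made for the example, not details taken from the slides.

```python
def lut3(table, a, b, cin):
    """Evaluate a 3-input look-up table; table has 8 entries indexed by (a, b, cin)."""
    return table[(a << 2) | (b << 1) | cin]


# Two 3-to-1 functions that together form a one-bit full adder.
SUM_TABLE = [0, 1, 1, 0, 1, 0, 0, 1]    # a XOR b XOR cin
CARRY_TABLE = [0, 0, 0, 1, 0, 1, 1, 1]  # majority(a, b, cin)


def add_bits(x, y, width):
    """Add two unsigned integers by chaining one-bit slices through the carry."""
    carry, result = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        result |= lut3(SUM_TABLE, a, b, carry) << i
        carry = lut3(CARRY_TABLE, a, b, carry)
    return result, carry


total, carry_out = add_bits(100, 55, 8)
```

Swapping the table contents turns the same slice into a different bitwise operation without changing its structure.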

SSS 4/9/99 CMU Reconfigurable Computing 22

The Interconnection Network

[Figure labels: word-level cross-bar; pass registers; PE 1 ... PE N; widths of B bits, P*B bits, and P*B*N bits]

SSS 4/9/99 CMU Reconfigurable Computing 23

The PCI Board

[Figure: chip.eps]

SSS 4/9/99 CMU Reconfigurable Computing 25

Software Goal

To program reconfigurable devices using the standard software development processes:

– Compile C or Java
– Do it quickly

[Figure labels: Java; Partitioner; DIL (Data-flow Intermediate Language); Configuration; Reconfigurable HW; CPU; Built]

SSS 4/9/99 CMU Reconfigurable Computing 26

Building Circuits From DIL

a = b + c * d;

e = c - d;

• variables → wires
• operators → gates

[Figure: data-flow graph in which * and + compute a from b, c, d, and - computes e from c, d]
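The variables-to-wires, operators-to-gates mapping can be sketched as follows. Python's `ast` module stands in for a DIL parser here, which is purely an assumption for illustration; the function name `build_circuit` and the temporary wire names `t1`, `t2`, ... are hypothetical and unrelated to the real DIL compiler.

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def build_circuit(source):
    """Turn assignment statements into gates: (operator, input, input, output wire)."""
    gates = []
    counter = 0

    def wire_for(node, out=None):
        nonlocal counter
        if isinstance(node, ast.Name):
            return node.id  # a variable is simply a named wire
        if isinstance(node, ast.BinOp):
            left = wire_for(node.left)
            right = wire_for(node.right)
            if out is None:  # intermediate results get a fresh temporary wire
                counter += 1
                out = f"t{counter}"
            gates.append((OPS[type(node.op)], left, right, out))
            return out
        raise ValueError("unsupported expression")

    for stmt in ast.parse(source).body:
        wire_for(stmt.value, out=stmt.targets[0].id)
    return gates


circuit = build_circuit("a = b + c * d\ne = c - d")
for gate in circuit:
    print(gate)
```

Each tuple is one gate; a wire name that appears as an output of one gate and an input of another is a connection between them.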

SSS 4/9/99 CMU Reconfigurable Computing 27

Mapping Circuits To

[Figure: several arrangements of a circuit with - and + operators over inputs a, b, c]

SSS 4/9/99 CMU Reconfigurable Computing 28

The DIL Compiler Front-End

[Figure labels: DIL input file; Parser; Evaluator; Loader (twice); circuit component library; component circuits; Backend]

SSS 4/9/99 CMU Reconfigurable Computing 29

The DIL Compiler Backend

[Figure labels: Front-end; Circuit (expanded); Optimizer; Placer-Router; Circuit; Circuit (placed); Code generator; Asm; C++; xfig]

The whole compilation process is very fast (compared to classical CAD tools).

We can compile two orders of magnitude faster.

SSS 4/9/99 CMU Reconfigurable Computing 30

Processing Element Size Tradeoffs

Small                    Big
Efficient usage          Wasteful
Slower                   Faster bit-slice
Flexible interconnect    Coarse routing
Bigger configuration     Fewer configuration bits
Place and route easier   Constrains the compiler

SSS 4/9/99 CMU Reconfigurable Computing 31

Stripe Width Tradeoffs

Wider             Narrower
Fewer stripes     More will fit
Virtualize more   Fewer page-ins
Bandwidth waste   Less bandwidth available
Placer freedom    Placement constrained

SSS 4/9/99 CMU Reconfigurable Computing 32

Bus Width Tradeoffs

Wider            Narrower
More area        Less area
High bandwidth   Time-mux bus

SSS 4/9/99 CMU Reconfigurable Computing 33

Clock Speed Tradeoffs (run-time)

Faster                   Slower
Short critical path      Big chains
Long pipeline built      Compact circuits
Decomposition overhead   Little decomposition
Virtualized more         Less virtualized

[Figure: adders annotated with 24-bit and 8-bit operand widths]
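The decomposition behind the "faster clock" column can be sketched as below: a wide addition is split into narrow additions linked by a carry, so each stage has a short critical path but more stages are needed. The function name `add_wide`, the 24-bit and 8-bit defaults, and the stage count as a stand-in for pipeline depth are assumptions for illustration only.

```python
def add_wide(x, y, total_bits=24, pe_bits=8):
    """Add two unsigned values using only pe_bits-wide additions linked by a carry.

    Returns the sum truncated to total_bits and the number of narrow stages used.
    """
    mask = (1 << pe_bits) - 1
    carry, result, stages = 0, 0, 0
    for shift in range(0, total_bits, pe_bits):
        partial = ((x >> shift) & mask) + ((y >> shift) & mask) + carry
        result |= (partial & mask) << shift
        carry = partial >> pe_bits  # carry into the next, more significant stage
        stages += 1
    return result, stages


# Three 8-bit stages versus one 24-bit stage for the same sum.
decomposed = add_wide(0x00FFFF, 0x000001)
monolithic = add_wide(0x00FFFF, 0x000001, pe_bits=24)
```

Both calls produce the same sum; the decomposed version trades one long carry chain for three short ones, which is the overhead the table refers to.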

SSS 4/9/99 CMU Reconfigurable Computing 34

Configuration Bits per Stripe

[Figure: configuration bits (axis 0 to 1600) versus stripe width (64 to 144), one series per PE bit width (2, 4, 8, 16, 32)]

SSS 4/9/99 CMU Reconfigurable Computing 35

[Figure: fir-throughput.eps]

SSS 4/9/99 CMU Reconfigurable Computing 36

Project Status

• Operational:
  – Behavioral and structural models of PipeRench in Verilog
  – Assembler, simulator
  – Tools for visualization and debugging
  – One tile fabricated and tested
  – Very fast compiler from intermediate language
• In progress:
  – Prototype PipeRench to be taped out this summer
  – PCI board to host PipeRench in a PC

SSS 4/9/99 CMU Reconfigurable Computing 37

Simulated Speed-up vs. UltraSparc @ 300 MHz

[Figure: bar chart on a log scale from 1.0 to 1000.0; kernels: ATR, Cordic, DCT, FIR, IDEA, Nqueens, Over; bar labels in extraction order: 328.8, 29.0, 20.6, 90.9, 61.8, 26.0, 76.1]

SSS 4/9/99 CMU Reconfigurable Computing 38

Future Work

• Build the PCI board

• Build the OS device drivers

• Start investigating HLL issues:
  – automatic partitioning
  – translation to DIL
  – special code transformations

SSS 4/9/99 CMU Reconfigurable Computing 39

Conclusions

• A set of important applications can benefit from RC devices

• RC offers the potential for substantial performance improvement at a low cost

• RC devices will soon be mainstream in the embedded computing world; perhaps in the future they will also permeate the desktop Pentium V
