38
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications I

CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

Embed Size (px)

Citation preview

Page 1: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE / ComS 583Reconfigurable Computing

Prof. Joseph ZambrenoDepartment of Electrical and Computer EngineeringIowa State University

Lecture #7 – Applications I

Page 2: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.2

Recap – Fixed-Point Arithmetic• Addition, subtraction the same (Q4.4 example):

• Multiplication requires realignment:

3.6250 0011.1010+ 2.8125 0010.1101 6.4375 0110.0111

3.6250 0011.1010 x 2.8125 0010.1101 00111010 00111010 00111010 00111010 10.1953125 1010.00110010

Page 3: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.3

Recap – Capacity Trends

Year1985

Xil

inx

Dev

ice

Com

ple

xity

XC200050 MHz1K gates

XC4000100 MHz

250K gates

Virtex200 MHz1M gates

Virtex-II 450 MHz8M gates

Spartan80 MHz

40K gates

Spartan-II200 MHz

200K gates

Spartan-3326 MHz5M gates

19911987

XC300085 MHz

7.5K gates

Virtex-E240 MHz4M gates

XC520050 MHz

23K gates

1995 1998 1999 2000 2002 2003

Virtex-II Pro450 MHz8M gates*

2004 2006

Virtex-4500 MHz

16M gates*

Virtex-5550 MHz

24M gates*

Page 4: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.4

Recap – FPGA Memory Resources

Distributed LUT-based RAM

Block Select+RAM

Page 5: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.5

Recap – Block Multipliers

• Block multipliers (18b x 18b) arranged in columns near RAM

Page 6: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.6

Recap – Block Multipliers (cont.)

• Synthesis tools can take larger multipliers and break them down into 18x18 multipliers

Page 7: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.7

Recap – FPGA CPU Resources

• Built-in PowerPC 405 processor blocks

Page 8: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.8

Recap – Xilinx Virtex-4 FPGAs

• Comes in three varieties:• Virtex-4 LX: most amount of LUTs• Virtex-4 FX: has PowerPCs like V2P• Virtex-4 SX: contains most amount of XtremeDSP slices

• CLB structure similar to Virtex-II• Largest LX device – 89,088 slices = 178,176 4-LUTs!• FX devices limited to 2 PPC 405s like Virtex-II Pro

• XTremeDSP Slices:• Same 18x18 block multiplier, now with optional

pipelining• Includes built-in 48-bit accumulator for MAC operations

Page 9: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.9

Recap – Xilinx Virtex-5 FPGAs

• CLB slices uses 6-input LUTs

• Block RAMs now 36Kbits per block

• DSP slices now support 25x18 MAC

• Diagonal routing

• LX, LXT, SXT, FXT varieties

Page 10: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.10

Outline

• Recap• Splash / Splash 2 System Architecture• Splash 2 Programming Models• Splash 2 Applications

• Text searching• Genetic pattern matching• Image processing

Page 11: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.11

Overview

• An early well-known reconfigurable computer was Splash / Splash 2

• Implemented as linear, systolic array• Developed at Supercomputing Research

Center (1990-1994)• Memory tightly coupled with each FPGA• Multiple Splash boards can be combined

to form larger system

Page 12: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.12

Splash 1

• Born to solve DNA string matching• Operational in 1989• Features

• 32 Xilinx XC3090 FPGAs (420 Mλ2)• 32 128KB SRAMs (600 Mλ2)• VMEbus interface (FIFO 1 MHz clock)

• 33 Gλ2 total (not counting interconnect)• Linear interconnect only

Page 13: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.13

Splash 1 Architecture

F3

M3

VME Bus

Interface

VSB Bus

Control

InterfaceFIFO IN

FIFO OUT

M4

F4

F11

M11

M12

F12

F0

M0

M7

F7

F8

M8

M15

F15

F1

M1

M6

F6

F9

M9

M14

F14

F2

M2

M5

F5

F10

M10

M13

F13

F31

M31

M24

F24

F23

M23

M16

F16

F28

M28

M27

F27

F20

M20

M19

F19

F29

M29

M26

F26

F21

M21

M18

F18

F30

M30

M25

F25

F22

M22

M17

F17

Page 14: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.14

Splash 2

• Operational in 1992• Attached processor system• Features

• 17 Xilinx XC4010 FPGAs (500 Mλ2)• 17 512KB SRAMs (2 Gλ2)• 9 TI SN74ACT8841s (16 port, 4b crossbar)• Sbus interface (< 100MB/s)

• 43 Gλ2 total (not counting interconnect)• Supported 2 programming models:

• Systolic• SIMD

Page 15: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.15

Splash 2 Architecture

M1 M4M3M2 M5 M8M7

F1 F4F3F2 F5 F8F7F6

M6

F0

M0

M16

F16

M13

F13

M14

F14

M15

F15

M12

F12

M9

F9

M10

F10

M11

F11

Crossbar Switch

From PrevBoard

To NextBoard

Page 16: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.16

Splash 2 Architecture (cont.)

Sparc-basedHost

InterfaceBoard

S-Bus

External Input

External Output

Array BoardsR-Bus

SIMD S-Bus

Page 17: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.17

Comparing Splash 1 and 2

Year

Splash 1 Splash 2

FPGA4-LUTs / BoardMax Input Rate

1989 1992XC3090 XC401010,240 13,6001 MB/s 100 MB/s

Interconnect Linear ArrayLinear Array

Memory / FPGA 128 KB

Broadcast

Area 33 Gλ2

Crossbar512 KB43 Gλ2

Page 18: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.18

Splash 2 Programming Models• Linear (systolic) array

• All near-neighbor communication, pipelined• Very fast (at the time) speed of 20-30MHz achieved• All FPGAs run the same program

• SIMD array• Instructions fanned out to all processing elements• Data across all elements collected at the end

.

.

F1 F16F3F2 …

F1 F16F3F2 …

F0

F3

F3

outin

outin

Page 19: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.19

Programming Splash 2

• Languages• VHDL• dbC – a C-like language capable of SIMD instructions

• How many programs?• Host• Interface board (XL and XR)• Splash array board

• X0• X1 – X16• TI crossbar chips

• Five programs!

• Requires intimate knowledge of the architecture

Page 20: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.20

Application #1 – Dictionary Search

• Search through dictionary of words for data hit

• Applicable to internet search engines / databases

• Opportunities for search parallelism• Splash implementation uses systolic

communication

Page 21: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.21

Dictionary Search (cont.)

• Each FPGA used to look into local memory• Longer data words “hashed” into 18 bit address• Valid bit in memory indicates if data value is

currently stored• Could be stored in several locations

Fi

Mi

Fi-1

Mi-1

Fi+1

Mi+1

Page 22: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.22

Example Hash Function

• XOR two character value with temp result and hash function• Rotate result• Different hash function for each FPGA

Shift amount: 7 bitsHash function: 1100 1000 1010 0011

00 0000 0000 0000 0000 0000 Clear hash register01 1010 0001 1101 00 Input the letters “th”---------------------------------10 1000 0011 0101 1100 0000 Temporary Result

10 0000 0101 0000 0110 1011 Result for “th”00 0000 0001 1001 01 Input for letters “e_”-----------------------------------------01 0010 0110 0001 1110 1011 Temporary result

10 0101 1010 0100 1100 0011 Result for “the_”

Page 23: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.23

Dictionary Search (cont.)

• Distribute dictionary in parallel to all memories• Collect word values in FIFOs• Distribute words two characters at a time across all devices• Perform local hashing and lookup in parallel• Collect “hit” result at end

• Splash 2 implementation results• 25 MHz• Three phases needed

• Fetch 2 bit-sliced characters• Perform hash• Table look-up

• Takes advantage of both systolic and SIMD modes

Page 24: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.24

Genetic Pattern Matching

• Comparing strings by edit distance• Motivation: The Human Genome Project

• Do two genetic strings match?• How are they related?

• When biologists characterize a new sequence, they want to compare it to the (growing) database of known sequences

• Abstraction:• What is the cost of transforming s into t • Given – costs for insertion, deletion, substitution

Page 25: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.25

Alphabet and Costs

• Alphabet• Letters in the string. For DNA, there are four:

• A (Adenine)• C (Cytosine)• T (Thymine)• G (Guanine)

• Transformation Costs• Insert:1, Delete:1, Substitute:2, match:0

• Type of comparison• One target to many sources• One target to one source

Page 26: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.26

Substitution Example

baboon

CostMoveWord

bab|on

bobon

boubon

bourbon

Delete ‘o’

Substitute ‘o’

Insert ‘u’

Insert ‘r’

Match?

Total cost:

1

2

1

1

0

5bourbon

Page 27: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.27

Dynamic Programming Solution

• Source sequence: s1, s2, … sm

• Target sequence: t1, t2, … tn

• di,j = distance between subsequence s1, s2, …si and subsequence t1, t2, …tj, where

d0,0 = 0

di,0 = di-1,0 + Delete(si)

d0,j = d0,j-1 + Insert(tj)

di-1,j + Delete(si)

di,j = min di,j-1 + Insert(tj)

di-1,j-1 + Substitute(si, tj)• Distance(Source, Target) = dm,n

Page 28: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.28

Dynamic Programming Example

baboon

0123456

b1012234

o2122223

u3233334

r4344345

b5344455

o6455456

n7566565

Page 29: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.29

Parallelism on the Anti-Diagonal

baboon

0123456

b1012234

o2122223

u3233334

r4344345

b5344455

o6455456

n7566565 di,j

di-1,j

di,j-1

di-1,j-1

Page 30: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.30

SCin

SDin

TCout

TDout

If (SCin != 0) and (TCin != 0)

PEDist + Substitute(SCin, TCin)

PEDist min TDin + Delete(SCin)

SDin + Insert(TCin)

else-if (SCin != 0)

PEDist SDin

else-if (TCin != 0)

PEDist TDin

endif

SCout SCin

TCout TCin

SDout PEDist

TDout PEDist

Bidirectional PE Example

BidirPE

PEDist

BidirPE

PEDist

BidirPE

PEDist

SCout

SDout

TCin

TDin

Page 31: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.31

Bidirectional Summary

• 16 CLBs/PE• 384 PEs/Board• 2,100 Million Cells/sec• Requires 2*(m+n) PEs• Uses only half the processors at any one time• Must stream both source and target for each

comparison• Makes comparison against large DB impractical

Page 32: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.32

Genetic Search Performance

• Nearly linear scaling in cell updates per second (CUPS)

• Need to reuse array for large patterns

Hardware CUPS

Splash 2 x16 43,000M

Splash 2 3,000M

Splash 1 370M

P-NAC (34) 500M

λ

0.60μ

0.60μ

0.60μ

2.0μ

CM-2 (64K) 150M

CM-5 (32) 33M

SPARC 10 1.2M

SPARC 1 0.87M

?

?

0.40μ

0.75μ

Area

500Mλ2x17x16

500Mλ2x16

420Mλ2x32

7.8Mλ2x34

1.6GMλ2

273Mλ2

CUP/λ2s

0.32

0.38

0.028

1.9

0.00075

0.0032

Page 33: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.33

Application #3 – Image Processing

• Reconfigurable computers well suited to image processing due to high parallelism and specialization (filtering)

• Algorithms change sufficiently fast such that ASIC implementations become outdated

• Examine two issues with Splash • Image compression• Image error estimation

• Parallelize across array in SIMD and systolic fashion

Page 34: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.34

Gaussian Pyramid Operations

• Gaussian Pyramid• Down sample image to compress image size for

communication

• Average over a set of points to create new point

• Laplacian Pyramid• Determine error found from Gaussian Pyramid• Expand contracted picture and compare with original

2

2

2

21 )2,2(),(),(

m nkk njmignmwjig

Page 35: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.35

Gaussian Pyramids

• Systolic array in which each device performs a separate function

• Limited by clock rate of slowest device

Page 36: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.36

Pyramid Flow

• Generates both Gaussian and Laplacian pyramid for 512 x 480 image in 22.7 ms at 15.7 MHz

Page 37: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.37

Other Image Processing

• Target recognition• Break image into “chips”• Each chip passed through linear array in

attempt to match with stored image• Images can be rotated, mirrored• Zoom in if suspicious object found

Page 38: CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #7 – Applications

CprE 583 – Reconfigurable ComputingSeptember 11, 2007 Lect-07.38

Summary

• Splash 2 effective due to scalability and programming model

• Parameterizable applications benefit that are regular and distributed

• High bandwidth effective for searching/signal processing

• Challenges remain in software development