Reconfigurable HPC, part 3: Architectural Resources. Reiner Hartenstein, TU Kaiserslautern. May 14, 2004, TU Tallinn, Estonia


Page 1: Title

Reconfigurable HPC, part 3: Architectural Resources

Reiner Hartenstein

TU Kaiserslautern

May 14, 2004, TU Tallinn, Estonia

Page 2: Converging Design Flows

© 2004, [email protected] http://hartenstein.de

The same synthesis method may be used for mapping an algorithm onto both an rDPA [Kress, 1995] and a DPA [Broderson, 2000]. This synthesis method is a generalization of systolic array synthesis: super-systolic synthesis.

Terms:
DPU: datapath unit
DPA: datapath array
rDPU: reconfigurable DPU
rDPA: reconfigurable DPA

Page 3: >> Time to space migration <<

• Time to space migration

• Flowware languages

• Data Sequencers

• Sequencing through 2-D memory

• MoM architecture

• Acceleration mechanisms

http://www.uni-kl.de

Page 4: Problems in time to space migration of algorithms

Time to space migration of algorithms:

Some algorithms have moderate interconnect requirements

Many DSP algorithms require just a pipeline

Some algorithms require excessive interconnect, for example the Viterbi algorithm

A comprehensive taxonomy of algorithms is missing
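The difference between "just a pipeline" and "excessive interconnect" can be made concrete by measuring wire lengths. This minimal sketch places each unit of an algorithm on one rDPU of a linear array and reports the longest connection. It is an illustration under stated assumptions, not a mapping tool:

- the FIR filter is modelled as a plain pipeline;
- the Viterbi trellis uses the common shift-register state numbering (next state = 2·state + bit, modulo the number of states);
- the placement "unit k on rDPU k" is hypothetical, and so are all function names.

```python
def fir_links(taps):
    """Pipeline interconnect: tap k feeds only tap k + 1."""
    return [(k, k + 1) for k in range(taps - 1)]


def viterbi_links(states):
    """One trellis stage, assuming next_state = (2 * state + bit) mod states
    (a de Bruijn / perfect-shuffle connection pattern)."""
    return [(s, (2 * s + b) % states) for s in range(states) for b in (0, 1)]


def max_span(links):
    """Longest connection, counted in rDPU positions, when unit k is placed
    on rDPU k of a linear array (hypothetical placement)."""
    return max(abs(dst - src) for src, dst in links)


for n in (8, 16, 64):
    print(n, max_span(fir_links(n)), max_span(viterbi_links(n)))
# prints:
# 8 1 4
# 16 1 8
# 64 1 32
```

The FIR pipeline needs only nearest-neighbour links. The longest trellis link grows to half the number of states, so in this placement nearest-neighbour interconnect alone cannot carry it.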

Page 5: IC interconnect: metal layers

[Figure (Intel): IC metal layers]

Foundries offer up to 9 metal layers and up to 3 poly layers.

Reconfigurable interconnect fabric laid out over the rDPU cell

Page 6: KressArray Family generic Fabrics: a few examples

• Select Function Repertory
• Select Nearest Neighbour (NN) Interconnect (an example is shown)
• Select mode, number and width of NN ports: route-through and function, or route-through only; more NN ports give rich routing resources
• Examples of 2nd level interconnect: laid out over the rDPU cell, no separate routing areas!

[Figure: rDPU variants with different NN port configurations, labelled 2, 4, 8, 16, 24 and 32]

http://kressarray.de

Page 7: KressArray DPSS

[Figure: KressArray Xplorer (Platform Design Space Explorer) built around the DPSS, published at ASP-DAC 1995. Blocks and data: Application Set, ALE-X Code, ALE-X Compiler, expression tree, intermediate form 2, Mapper, Design Rules, intermediate form 3, Scheduler, data stream Schedule, Analyzer, Architecture Estimator, Delay Estimator, statistical data, Power Estimator, Power Data, HDL Generator (VHDL, Verilog), Simulator, Datapath Generator, Kress rDPU Layout, Architecture Editor, Mapping Editor, Improvement Proposal Generator, Inference Engine (FOX), Suggestion Selection User Interface, KressArray family parameters, User; DPSS core: Compiler, Mapper, Scheduler.]

Page 8: Xplorer GUI

Page 9: SNN filter KressArray Mapping Example

[Figure: mapping of an SNN filter onto a KressArray of 10 x 16 = 160 rDPUs. Legend: rDPU not used; used for routing only (route-through only); operator and routing; port location marker; backbus connect.]

http://kressarray.de

Page 10: Xplorer Plot: SNN Filter Example

[Figure: Xplorer plot of the SNN filter mapping; rDPUs with 3 vertical and 2 horizontal NN ports, 32 bit each. Labels: operator (e.g. +[13]), operand, operand, result, route-thru-only rDPU, route thru, backbus connect.]

http://kressarray.de

Page 11: Communication resource editor panel of the Xplorer user interface

Page 12: Elements of the Xplorer mapping editor: a) Routing editor panel

Page 13: Elements of the Xplorer mapping editor: b) Input port editor panel

Page 14: Xplorer: Improvement Proposal Generator

Page 15: Xplorer: conditional swap operator

Page 16: Xplorer: Macro cells

Page 17: FPGA-Style Mapping for coarse grain reconfigurable arrays

[Table: mapping in Kress DPSS, CHESS, RaPiD and Colt. Placement methods: simulated annealing, genetic algorithm. Routing methods: simulated annealing, Pathfinder, greedy algorithm.]

KressArray DPSS (Datapath Synthesis System): Compiler, Mapper and Scheduler; the Scheduler specifies and assembles the data streams from/to the array.

Page 18: Ulrich Nageldinger

Dissertation Ulrich Nageldinger:
• ... on mapping applications onto KressArrays
• ... simultaneous routing and placement by simulated annealing
• Supporting a huge family of KressArrays
• Fuzzy logic improvement proposal generator
• Profiling
• Design space exploration

Infineon Technologies, Munich

http://hartenstein.de/Ph-D-Theses.html
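The core of placement by simulated annealing can be sketched in a few lines. This is a generic textbook version written for illustration, not the DPSS code. Its simplifications:

- it only places operators on a rows x cols rDPU array;
- it uses total Manhattan distance as a stand-in routing cost instead of routing simultaneously.

The names `anneal` and `wirelength` and the cooling parameters are assumptions.

```python
import math
import random


def wirelength(pos, edges):
    """Sum of Manhattan distances over all edges: a stand-in routing cost."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1]) for a, b in edges)


def anneal(nodes, edges, rows, cols, t0=5.0, cooling=0.95, t_min=0.01,
           moves_per_t=200, seed=1):
    """Place `nodes` onto a rows x cols array of rDPUs by simulated annealing.
    Returns ({node: (row, col)}, cost)."""
    if len(nodes) > rows * cols:
        raise ValueError("array too small")
    rng = random.Random(seed)
    slots = list(nodes) + [None] * (rows * cols - len(nodes))  # None = unused rDPU
    rng.shuffle(slots)                                          # random initial placement

    def positions():
        return {n: divmod(i, cols) for i, n in enumerate(slots) if n is not None}

    cost = wirelength(positions(), edges)
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            i, j = rng.randrange(len(slots)), rng.randrange(len(slots))
            slots[i], slots[j] = slots[j], slots[i]             # move: swap two cells
            new_cost = wirelength(positions(), edges)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost                                  # accept the move
            else:
                slots[i], slots[j] = slots[j], slots[i]         # reject: undo the swap
        t *= cooling                                             # geometric cooling schedule
    return positions(), cost


if __name__ == "__main__":
    ops = [f"op{k}" for k in range(8)]
    chain = list(zip(ops, ops[1:]))          # a simple pipeline data-flow graph
    placement, cost = anneal(ops, chain, 3, 3)
    print(cost, placement)
```

Accepting some cost increases at high temperature lets the search escape local minima. The cooling factor and the number of moves per temperature are arbitrary here and would have to be tuned to the array size.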

Page 19: >> Flowware languages <<


Page 20: Similar Programming Language Paradigms

Language category: Computer Languages vs. Xputer Languages
• Both: deterministic procedural sequencing: traceable, checkpointable
• Computer languages, sequencing driven by: read next instruction, goto (instruction addr.), jump (to instruction addr.), instruction loop, instruction loop nesting, no parallel loops, instruction loop escapes, instruction stream branching
• Xputer languages, sequencing driven by: read next data object, goto (data addr.), jump (to data addr.), data loop, data loop nesting, parallel data loops, data loop escapes, data stream branching
• Very easy to learn

Page 21: JPEG zigzag scan pattern

Flowware language example (MoPL): the same language principles

[Figure: zigzag path over the pixel map (x, y axes) traced by the data counter; the two HalfZigZag parts are marked]

    *> Declarations
    EastScan is step by [1,0] end EastScan;
    SouthWestScan is loop 8 times until [1,*] step by [-1,1] endloop end SouthWestScan;
    SouthScan is step by [0,1] end SouthScan;
    NorthEastScan is loop 8 times until [*,1] step by [1,-1] endloop end NorthEastScan;
    HalfZigZag is
        EastScan
        loop 3 times
            SouthWestScan
            SouthScan
            NorthEastScan
            EastScan
        endloop
    end HalfZigZag;

    goto PixMap[1,1]
    HalfZigZag;
    SouthWestScan
    uturn (HalfZigZag)
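The MoPL scan pattern can be cross-checked by executing it with a tiny software data counter. The sketch mirrors each declaration one to one and rests on these assumptions:

- `DataCounter` and all function names are hypothetical;
- escape conditions are tested before each step;
- `uturn` replays HalfZigZag's step vectors in reverse order. This is our interpretation, and it amounts to point-reflecting the first half through the block centre and running it backwards.

```python
N = 8  # 8x8 JPEG block; 1-based [x, y] coordinates as in the MoPL listing


class DataCounter:
    """Stand-in for a data counter: current position plus the visited trace."""

    def __init__(self, x, y):                 # goto PixMap[x, y]
        self.trace = [(x, y)]

    def pos(self):
        return self.trace[-1]

    def step(self, dx, dy):                   # step by [dx, dy]
        x, y = self.pos()
        self.trace.append((x + dx, y + dy))

    def scan(self, dx, dy, until):            # loop N times until <cond> step by [dx, dy]
        for _ in range(N):
            if until(*self.pos()):
                break
            self.step(dx, dy)


def east_scan(dc):
    dc.step(1, 0)


def south_scan(dc):
    dc.step(0, 1)


def south_west_scan(dc):
    dc.scan(-1, 1, lambda x, y: x == 1)       # until [1,*]


def north_east_scan(dc):
    dc.scan(1, -1, lambda x, y: y == 1)       # until [*,1]


def half_zigzag(dc):
    east_scan(dc)
    for _ in range(3):                        # loop 3 times
        south_west_scan(dc)
        south_scan(dc)
        north_east_scan(dc)
        east_scan(dc)


def jpeg_zigzag():
    dc = DataCounter(1, 1)
    half_zigzag(dc)
    half = list(dc.trace)                     # trace of HalfZigZag alone
    south_west_scan(dc)
    # uturn (HalfZigZag), assumed: replay HalfZigZag's step vectors in reverse order
    steps = [(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(half, half[1:])]
    for dx, dy in reversed(steps):
        dc.step(dx, dy)
    return dc.trace


if __name__ == "__main__":
    print(jpeg_zigzag())
```

`jpeg_zigzag()` returns 64 positions beginning (1,1), (2,1), (1,2), (1,3), (2,2), (3,1), which is the familiar JPEG zigzag order. No address arithmetic is needed beyond the step vectors themselves.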

Page 22: MoPL-3 Grammar

The MoPL-3 grammar is the grammar of the Map-oriented Programming Language version 3 (MoPL-3), a data-procedural programming language
• to specify functions and operators to be mapped onto a DataPath Array (DPA) or another pipe network (hardwired as well as reconfigurable),
• and to procedurally program the data streams associated with these functions or operators.

Page 23: MoPL grammar 1 (14): 1. Program Definition, 2. Boundary Declarations

[Syntax diagrams]
1. Program Definition: a MoPL subroutine consists of a Declaration Part and a Scan Statement Part.
Declaration Part: Boundary Declarations (2), SW Declarations (3), rALU Set-up (4), Scan Pattern Declarations (5).
2. Boundary Declarations: Array Declaration (elements: array, Ident, Decl-Size, of, Data Type, ';') and Boundary Declaration (elements: boundary, Ident, Decl-Size, ';').

rALU = rDPU

Page 24: MoPL grammar 2 (14): 3. Scan Window Declarations

[Syntax diagrams]
Compound Window Declaration (elements: window, SW Group Name, Window Spec, is / are, ';')
Window Spec (elements: Window Names, Window Size, handle, Point, of, Data Type)
Window Names: Name-List
SW Group Name: Ident
Window Size: Decl-Size

SW = Scan Window

Page 25: MoPL grammar 3 (14): 4. rALU Set-up Declarations

[Syntax diagrams]
rALU Config (elements: rALU subnet, rALU Name, is, of, SW Group Name, resident, Structural Part, ';')
rALU Name: Ident
Structural Part: Top Structure ';'
Top Structure: Do Structure, Sub Structure or While Structure
Do Structure (elements: do, Sub Structure, while, Condition, ';')
While Structure (elements: while, Condition, Sub Structure, ';')

Page 26: MoPL grammar 4

[Syntax diagrams]
Set Structure (elements: set, localBranchFlag, ';')
Local Branch Flag: FourBitVector
If Structure (elements: if, Condition, then, Sub Structure, else, Sub Structure, ';')
Sub Structure: Assignment, If Structure, Set Structure, or begin Sub Structure List end
Sub Structure List (elements: Sub Structure, ';')
Condition: '(' Expression ')'

Page 27: MoPL grammar 5

(for missing production rules see the Ph.D. thesis by Jürgen Becker)

rALU Activation: (activate | passivate | remove) rALU Subnet Name ';'
rALU Subnet Name: Ident

http://hartenstein.de/Ph-D-Theses.html

Page 28: MoPL grammar 6 (14): 5. Scan Pattern Declarations

[Syntax diagrams]
Scan Pattern Declarations (elements: scanPattern, Simple Pattern Decl, ';')
Simple Pattern Decl: Pattern Name is Scan Action
Pattern Name: Ident
Compound Scan Pattern Decl (elements: dependent on, rALU subnet Flag, localBranchFlag)
Local Branch Flag: FourBitVector

Page 29: MoPL grammar 7: 6. Scan Statement Declarations

[Syntax diagrams]
Scan Statement Part (elements: begin, Scan Statement Block, end, ';')
Scan Statement Block (elements: with, SW Group Name, do, begin, Scan Statement, end, ';')
Scan_Pattern_Name: Ident
Scan_Window_Name: Ident

Page 30: MoPL grammar 8 (14)

[Syntax diagrams]
Array Name: Ident
Scan Statement (alternatives: move Scan Window Name to Array Name Point; Scan Pattern Call; rALU Activation; ended by ';')
Scan Pattern Call (elements: Scan Pattern Name, '[', Scan Window Name, ',', ']', parbegin, Scan Pattern Call, '(', ')', parend, ';')

Page 31: MoPL grammar 9 (14): 7. Scan Action Declarations

[Syntax diagrams]
Scan Action: Scan Pattern Sequence, Simple Scan, Library Scan or Pattern Spec
Scan Pattern Sequence (elements: begin, Scan Action, ';', end)
Pattern Spec (elements: Scan Ident, Scan Name, Escape Clause with until or while, begin Scan Action end blocks)

Page 32: MoPL grammar 10 (14)

[Syntax diagrams]
Transformation: rotl, rotr, rotu, mirx, miry, halfrotl, halfrotr, reverse, Stretching (t.b.d.), Shearing (t.b.d.)
Shortest Step: nne, ese, ssw, wnw
Simple Scan (elements: Shortest Step, '[' Number ',' Number ']', Number steps, 1 step)

Page 33: MoPL grammar 11 (14)

[Syntax diagrams]
Scan Name: Ident, or Transformation '(' Scan Name ')'
Scan Ident: Ident
Lib Scan Name: Ident
Library Scan (elements: external, LibScanName, '(', SizeXY, StepWidthXY, ';', ')')
SizeXY: Number ',' Number
StepWidthXY: Number ',' Number
Escape Clause: Condition Clause
Condition Clause (elements: '@', '[', ',', ']', Rel Op, Number)

Page 34: MoPL grammar 12 (14): 8. Expression Declarations

Assignment: Ident '=' Expression ';'
Expression: Simple Expression [Rel Op Simple Expression]
Simple Expression: Term { ('+' | '-' | or | xor) Term }
Term: Factor { ('*' | '/' | mod | and) Factor }
Factor: '(' Expression ')' | Sign Factor | not Factor | Unsigned Real | Number | SW Variable
Rel Op: '<' | '<=' | '>' | '>=' | '==' | '<>'
Sign: '+' | '-'
SW Variable: Ident Point
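The expression rules have the familiar Pascal-like shape: relational operators bind loosest, then adding operators, then multiplying operators. A recursive-descent evaluator can therefore follow them almost line by line. The sketch is an illustration, not the MoPL compiler, and makes these assumptions:

- an SW Variable such as `w[0,1]` denotes the value at window position [0,1] of scan window `w`, looked up in a dictionary;
- `and`, `or`, `xor` and `not` act on truth values (nonzero = true) and return 1 or 0;
- `/` is real division.

All names are hypothetical.

```python
import operator
import re

TOKEN = re.compile(
    r"\s*(?:(\d+\.\d+(?:[eE][+-]?\d+)?|\d+)"   # Unsigned Real or Number
    r"|([A-Za-z][A-Za-z0-9_]*)"                # Ident or word operator
    r"|(<=|>=|==|<>|[-+*/()<>\[\],]))"         # symbols
)

REL_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
           ">=": operator.ge, "==": operator.eq, "<>": operator.ne}
ADD_OPS = {"+": operator.add, "-": operator.sub,
           "or": lambda a, b: int(bool(a) or bool(b)),
           "xor": lambda a, b: int(bool(a) != bool(b))}
MUL_OPS = {"*": operator.mul, "/": operator.truediv, "mod": operator.mod,
           "and": lambda a, b: int(bool(a) and bool(b))}


def tokenize(text):
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at {pos}: {text[pos:]!r}")
        tokens.append(m.group(m.lastindex))   # the one alternative that matched
        pos = m.end()
    return tokens


class Evaluator:
    """Recursive descent over the Expression rules; one method per rule."""

    def __init__(self, text, window):
        self.tokens, self.i, self.window = tokenize(text), 0, window

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def expression(self):   # Simple Expression [Rel Op Simple Expression]
        left = self.simple()
        if self.peek() in REL_OPS:
            op = self.take()
            left = int(REL_OPS[op](left, self.simple()))
        return left

    def simple(self):       # Term { (+ | - | or | xor) Term }
        value = self.term()
        while self.peek() in ADD_OPS:
            value = ADD_OPS[self.take()](value, self.term())
        return value

    def term(self):         # Factor { (* | / | mod | and) Factor }
        value = self.factor()
        while self.peek() in MUL_OPS:
            value = MUL_OPS[self.take()](value, self.factor())
        return value

    def factor(self):
        tok = self.peek()
        if tok == "(":                              # ( Expression )
            self.take("(")
            value = self.expression()
            self.take(")")
            return value
        if tok in ("+", "-"):                       # Sign Factor
            self.take()
            value = self.factor()
            return -value if tok == "-" else value
        if tok == "not":                            # not Factor
            self.take()
            return int(not self.factor())
        if tok is not None and tok[0].isdigit():    # Number | Unsigned Real
            self.take()
            return float(tok) if "." in tok else int(tok)
        name = self.take()                          # SW Variable: Ident Point
        self.take("[")
        x = int(self.take())
        self.take(",")
        y = int(self.take())
        self.take("]")
        return self.window[(name, x, y)]


def evaluate(text, window):
    ev = Evaluator(text, window)
    value = ev.expression()
    if ev.peek() is not None:
        raise SyntaxError(f"unexpected token {ev.peek()!r}")
    return value
```

`and` is a multiplying operator. As a result, `w[1,0] > w[0,0] and 0` parses as `w[1,0] > (w[0,0] and 0)`, so parentheses are needed when a comparison is meant as an operand of `and`.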

Page 35: MoPL grammar 13: 9. Lexical Declarations

Ident: Letter { Letter | Digit | Underscore }
Letter: 'A' ... 'Z' | 'a' ... 'z'
Digit: '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
Underscore: '_'
Number: Digit { Digit }
Unsigned Real: Number '.' Number [Scale Factor]
Scale Factor: ('E' | 'e') [Sign] Number
Point: '[' Number ',' Number ']'
FourBitVector: four binary digits ('0' | '1')

Page 36: MoPL grammar 14 (14): 10. Common Production Rules

Name-List: Ident { ',' Ident }
Range: Number ':' Number
Decl-Size: '[' Range ',' Range ']'
Data Type: char, short, int, long (each optionally unsigned), or float

Page 37: >> Data Sequencers <<


Page 38: Application-specific distributed memory*

• Application-specific memory: rapidly growing markets:
  – IP cores
  – Module generators
  – EDA environments
• Optimization of memory bandwidth for application-specific distributed memory
• Power and area optimization as a further benefit
• Key issues of address generators will be discussed

*) see books by Francky Catthoor et al.

Page 39: Significance of Address Generators

• Address generators have the potential to reduce computation time significantly.

• In a grid-based design rule check, a speed-up of more than 2000 has been achieved compared to a VAX-11/750.

• Dedicated address generators contributed a factor of 10 to this speed-up by avoiding the memory cycles otherwise spent on address computation.

Page 40: Smart Address Generators

1983 The Structured Memory Access (SMA) Machine

1984 The GAG (generic address generator)

1989 Application-specific Address Generator (ASAG)

1990 The slider method: GAG of the MoM-2 machine

1991 The AGU

1994 The GAG of the MoM-3 machine

1997 The Texas Instruments TMS320C54x DSP

1997 Intersil HSP45240 Address Sequencer

1999 Adopt (IMEC)

Page 41: Adopt (from IMEC)

• cMMU synthesis environment:
  – application-specific ACUs for array index reference
  – ACU as a counter modified by multi-level logic filter
  – ACU with ASUs from a Cathedral-3 library
  – distributed ACU alleviates interconnect overhead (delay, power, area)
  – nested loop minimization by algebraic transformations
  – AE splitting/clustering
  – AE multiplexing to obtain interleaved ASs
  – other features

Abbreviations: customized MMU (cMMU), address expression (AE), Address Sequence (AS), Address Calculation Unit (ACU), Application-Specific Unit (ASU)

For more details on Adopt see the paper in the proceedings CD-ROM.

Page 42: Distributed Memory

Data address generators: 20 years of research

Just in time: a new research area: application-specific distributed memory (e.g. book by F. Catthoor et al. ...)

SA: scrambling and descrambling the data?

Page 43: >> Sequencing through 2-D memory <<


Page 44: MoM anti machine

Speedup by MoM

[Figure: a distributed memory of asM blocks, each with a memory bank and a data counter, connected to (r)DPUs through a smart memory interface]

MoM architecture: 2-D memory space, adjustable scan window (example: 4x4 scan window)

Grid-based design rule check example:
• speed-up: >1000
• complex boolean expressions in 1 clock cycle
• address computation overhead: 94 %

Page 45: Xputer Lab at Kaiserslautern: MoM I and II

Page 46: Antimachine: MoM architecture

[Figure: 2-D memory space (x, y) with example handle positions and a scan window. The scan pattern performs high level sequencing of the handle positions; the intra scan window accesses perform low level sequencing. Blocks: Handle Position Generator (y-GAG, x-GAG), Scan Window Generator, memory banks 0, 1, ..., n; signals: handle position, memory accesses.]
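The two sequencing levels can be separated in software. The scan pattern supplies handle positions, and the scan window turns each handle position into a set of memory locations. The sketch also estimates how many memory cycles one window fill needs under different bank interleavings. The cost model is ours: one access per bank per cycle, with all banks working in parallel. The three bank mappings and all names are hypothetical.

```python
from collections import Counter


def scan_window(width, height):
    """Window offsets (dx, dy) relative to the handle position."""
    return [(dx, dy) for dy in range(height) for dx in range(width)]


def expand(handles, window):
    """Low-level sequencing: map each handle position delivered by the
    high-level scan pattern to the absolute (x, y) locations of the window."""
    for hx, hy in handles:
        yield (hx, hy), [(hx + dx, hy + dy) for dx, dy in window]


def cycles(locations, bank_of):
    """Cycles to access all locations once, assuming each bank serves one
    access per cycle and all banks operate in parallel."""
    load = Counter(bank_of(x, y) for x, y in locations)
    return max(load.values())


def single_bank(x, y):
    return 0                          # one conventional memory


def column_banks(x, y):
    return x % 4                      # 4 banks, interleaved by x only


def grid_banks(x, y):
    return (x % 4) + 4 * (y % 4)      # 16 banks, 4 x 4 interleaving


if __name__ == "__main__":
    handles = [(0, 0), (4, 0), (0, 4)]    # a few handle positions of a scan pattern
    for handle, locations in expand(handles, scan_window(4, 4)):
        print(handle, [cycles(locations, f) for f in (single_bank, column_banks, grid_banks)])
```

A 4x4 window costs 16 cycles with one bank, 4 with column interleaving, and a single cycle with 4 x 4 interleaving over 16 banks. That last case is the kind of parallelism a multi-bank memory behind the scan window aims at.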

Page 47: Vary-size scan windows

• Size adjustable at run time
• Square or rectangular shape
• Individual access mode per location: R, W, R/W, no-op
• Any wild window shape by no-op placements
• Avoid multiple reads/multiple writes for overlapping successive scan window positions
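Avoiding multiple reads for overlapping window positions can be quantified by counting. The sketch makes these assumptions:

- a win x win window moves by a video scan with step [1, 0];
- reuse is modelled as window registers that keep every location still covered after a step, so only the new column is read;
- the registers are cleared at each new scan line, and writes are ignored.

`count_reads` is a hypothetical name.

```python
def count_reads(width, height, win, reuse):
    """Count memory reads for a win x win scan window moved over a
    width x height image by a video scan (step [1, 0], then next line)."""
    reads = 0
    for y in range(height - win + 1):
        held = set()                   # window registers, cleared at each new scan line
        for x in range(width - win + 1):
            window = {(x + dx, y + dy) for dx in range(win) for dy in range(win)}
            reads += len(window - held) if reuse else len(window)
            held = window              # contents kept for the next window position
    return reads


if __name__ == "__main__":
    for reuse in (False, True):
        print("reuse" if reuse else "no reuse", count_reads(22, 11, 3, reuse))
```

Take a 3 x 3 window over an image of 22 x 11 pixels (the size used in the deck's linear filter example). This gives 1620 reads without reuse and 594 with reuse.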

Page 48: 2-D Generic Data Sequence Examples

[Figure: examples a) to g)]

Page 49: GAG Slider Model

[Figure: scan pattern example for illustration of the slider model, with panels a) total address, b) x address and c) y address over scan line numbers 1 to 3; the sliders are marked. GAG (Generic Address Generator) block: Base Slider (B0), Limit Slider (L0), Address Stepper (A).]

Page 50: Generic GAU (generic address unit) Scheme

GAG = Generic Address Generator

[Figure: GAU scheme with Base Slider (B0), Limit Slider (L0) and Address Stepper (A); labels: limit]

All 3 are copies of the same BSU (stepper circuit). Published in 1990.

Page 51: GAG: Address Stepper

GAG = Generic Address Generator

[Figure: Address Stepper block diagram. Blocks and signals: + / – unit, Step Counter with zero detect (=0), End Detect, Escape Clause, init, tag, address A, endExec, maxStepCount, Base, Limit, stepVector.]

Page 52: Generic Sequence Examples

[Figure: panels a) to g) of generic sequences produced by the GAU (Base Slider, Limit Slider, Address Stepper; published in 1990): atomic scan, linear scan, video scan, -90º rotated video scan, -45º rotated (mirx (v scan)), sheared video scan, non-rectangular video scan, zigzag video scan, spiral scan, perfect shuffle, and feed-back-driven scans (until).]
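This is a software sketch of the slider model under our own simplifying assumptions:

- each slider is a register that moves by a fixed step once per scan line;
- the address stepper walks from the current base value to the current limit value, in whichever direction the limit lies.

The names `Slider` and `gag_scan` are hypothetical. Features of the real GAG, such as escape clauses, maxStepCount and feedback-driven scans, are omitted.

```python
from dataclasses import dataclass


@dataclass
class Slider:
    """A slider: a register that moves by `step` once per scan line."""
    start: int
    step: int = 0

    def at(self, line: int) -> int:
        return self.start + line * self.step


def gag_scan(base, limit, y, lines, stride=1):
    """Yield (x, y) for `lines` scan lines. Per line, the address stepper walks
    x from the base slider's value to the limit slider's value (inclusive);
    then the base, limit and y sliders all move by their step."""
    for line in range(lines):
        b, lim = base.at(line), limit.at(line)
        step = stride if lim >= b else -stride
        x = b
        while (step > 0 and x <= lim) or (step < 0 and x >= lim):  # end detect
            yield x, y.at(line)
            x += step


if __name__ == "__main__":
    print(list(gag_scan(Slider(0), Slider(2), Slider(0, 1), 3)))        # video scan
    print(list(gag_scan(Slider(0, 1), Slider(2, 1), Slider(0, 1), 3)))  # sheared scan
    print(list(gag_scan(Slider(0), Slider(0, 1), Slider(0, 1), 3)))     # non-rectangular
```

Only the slider parameters change between these scans:

- video scan: both sliders fixed;
- sheared scan: both sliders move by 1 per line;
- non-rectangular scan: only the limit slider moves.

This illustrates how one generic circuit can cover a family of sequences.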

Page 53: GAG Slider Model

[Figure: two GAGs (Generic Address Generators), each built from a Base Stepper (B0), a Limit Stepper (L0) and an Address Stepper (A) acting as sliders; labels: floor, ceiling]

Page 54: GAG Slider Operation Demo Example

[Figure: demo in x and y of the address moving between the floor slider and the ceiling slider; labels: A, B, B0, L, L0, C, F]

Page 55: GAG Complex Sequencer Implementation

[Figure: complex sequencer built from several GAGs/GAUs (Generic Addressing Units, each with Base Slider, Limit Slider and Address Stepper), an SDS and a VLIW stack]

All have been published in 1990.

Page 56: XMDS Scan Pattern Editor GUI

Page 57: >> Acceleration mechanisms <<


Page 58: Linear Filter Application

[Figure b): scan window access modes (r, r/w, w/r, w) in Bank a and Bank b over one scan step]

Page 59: Scanline unrolling

[Figure: unrolled scan window with access modes r and r/w]

Page 60: 90° Rotation of Scan Pattern

[Figure: scan window access modes (r, w, r/w) in Bank a and Bank b for the original and the 90° rotated scan pattern; the scan window overlap area is marked]

Page 61: Linear Filter Application

Parallelized merged buffer linear filter application with an example image of x = 22 by y = 11 pixels

[Figure: design steps: initial design, after scan line unrolling, after inner scan line loop unrolling, hardware level access optimization, final design]

Page 62: Storage scheme optimization: scanline unrolling

[Figure: scan window access modes (r, r/w, w/r) in Bank a and Bank b for the design steps initial design, after scan line unrolling, after inner scan line loop unrolling, hardware level access optimization and final design. Inset: MoM anti machine architecture with x and y axes, handle positions, scan window, scan pattern (high level sequencing) and intra scan window accesses (low level sequencing).]

Page 63: MoM anti machine, an Xputer architecture

Speedup by MoM

[Figure: a distributed memory of asM blocks, each with a memory bank and a data counter, connected to rDPUs through a smart memory interface]

Multiple scan windows (example: 4x4 scan windows)

Page 64: 16 point CGFFT: mapped onto 2-D memory space
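A radix-2 constant-geometry FFT shows why a CGFFT suits fixed scan patterns. The sketch uses a standard Pease-style decimation-in-frequency formulation, chosen by us for illustration rather than taken from the slides. In every stage, butterfly j reads x[j] and x[j + n/2] and writes out[2j] and out[2j + 1]. All stages therefore share one access geometry, and only the twiddle factors change. For n = 16 it performs (n/2)·log2 n = 32 butterflies. The function names are hypothetical, and the direct DFT is included only as a check.

```python
import cmath


def cg_fft(x):
    """Radix-2 constant-geometry DIF FFT (Pease-style).

    In every stage, butterfly j reads x[j] and x[j + n/2] and writes
    out[2j] and out[2j + 1]: the same access pattern in all stages.
    Returns the spectrum in bit-reversed order and the butterfly count."""
    n = len(x)
    m = n.bit_length() - 1
    if n < 2 or n != 1 << m:
        raise ValueError("length must be a power of two >= 2")
    half = n // 2
    x = [complex(v) for v in x]
    butterflies = 0
    for s in range(m):
        out = [0j] * n
        for j in range(half):
            a, b = x[j], x[j + half]
            w = cmath.exp(-2j * cmath.pi * ((j >> s) << s) / n)   # twiddle factor
            out[2 * j] = a + b
            out[2 * j + 1] = (a - b) * w
            butterflies += 1
        x = out
    return x, butterflies


def bit_reverse(i, bits):
    return int(format(i, f"0{bits}b")[::-1], 2)


def unscramble(y):
    """Reorder a bit-reversed result into natural order."""
    bits = len(y).bit_length() - 1
    out = [0j] * len(y)
    for p, v in enumerate(y):
        out[bit_reverse(p, bits)] = v
    return out


def dft(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * f / n) for k in range(n))
            for f in range(n)]


if __name__ == "__main__":
    signal = [float(k % 5) for k in range(16)]
    y, count = cg_fft(signal)
    print(count, max(abs(a - b) for a, b in zip(unscramble(y), dft(signal))))
```

The output comes out in bit-reversed order, which `unscramble()` undoes.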

Page 65: CGFFT: Nested and Parallel Scan Pattern

[Figure: 2-D memory regions input, temp (three), output and coeff. (four); a scan window with in_i, in_i+1, coeff. and an empty location feeds a MAC unit]

Page 66: CGFFT: Parallel Scan Pattern Animation

[Figure: one MAC unit; labels: in_i, in_i+1, coeff., empty, out_j, out_k; 32 steps]

Page 67: CGFFT: Parallel Scan Pattern Animation

[Figure: MAC units with in_i, in_i+1, in_i+2, in_i+3, coeff., empty, out_j, out_j+1, out_k, out_k+1; 4 MAC units in parallel, 8 MAC units in parallel; 16 steps, 8 steps, 4 steps]

Page 68: CGFFT: Nested and Parallel Scan Pattern

Outer loop:
    scan pattern HLScan is 3 steps [2, 0]
Inner loop, compound scan patterns (3 in parallel):
    SP1 is 7 steps [0, 2]
    SP23 is 7 steps [0, 1]

[Figure label: goto]

Page 69: >> Acceleration mechanisms <<


Page 70: Speed-up Enablers

(A list here)
• DRC: 4 orders of magnitude
• Address computation overhead
• Translate into super-systolic rather than into instruction streams
• Determine interconnect fabrics by compilation, not before fabrication
• Determine memory architecture by compilation, not before fabrication

Page 71: Acceleration Mechanisms

• parallelism by multi bank memory architecture
• auxiliary hardware for address calculation
• address calculation before run time
• avoiding multiple accesses to the same data
• avoiding memory cycles for address computation
• improve parallelism by storage scheme transformations
• improve parallelism by memory architecture transformations
• alleviate interconnect overhead (delay, power and area)

Page 72: END