34
[email protected] 21 May 2013 IEEE Computer Society EAB November 2005 at Philadelphia 1 Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de Twin-Paradigm Programmer Education: proposed New Text Book (and project) Reiner Hartenstein talk for Jan 2010 at © 2010, [email protected] http://hartenstein.de TU Kaiserslautern 1981: visiting professor at UC Berkeley IEEE fellow, SDPS fellow, FPL fellow, other awards Professor (ordinarius emeritus), TU Kaiserslautern my CV (1) Founder and co-founder of several international annual conference series All academic degrees from EE department at Karlsruhe Institute of Technology (KIT) Karl Steinbuch 2 © 2010, [email protected] http://hartenstein.de TU Kaiserslautern Viktor Prasanna called him „The father of Reconfigurable Computing“ (also pre-FPGA era) -- („Gerald Estrin: grandfather of RC“) 1983: founder of German Mead-&-Conway VLSI design scene: the multi university „E.I.S. project“ (grant: 35 million DM) my CV (2) Author & compiler writer of KARL[1]: in the 80ies the most successful HDL and trailblazer before VHDL [1] R. Hartenstein: Fundamentals of Structured Hardware Design, 1977, North Holland / American Elsevier -- bestseller 3 © 2010, [email protected] http://hartenstein.de TU Kaiserslautern Complete EDA framework[2] around KARL and ABL: 85 mio ECU grant by ESPRIT programme of the EU my CV (3) format-checking functional floorplan graphic editor [2] R. Hartenstein: The History of KARL and ABL; in: J. Mermet (editor): Fundamentals and Standards in Hardware Description Languages; 1993. also see: http://xputers.informatik.uni-kl.de/karl/karl_history_fbi.html calculus-based term rewriting floorplan generator, automatic test generation, testability analysis, embedded router, structured logic synthesis, simulator … 4 © 2010, [email protected] http://hartenstein.de TU Kaiserslautern my CV (4) consulting Hans-D. Genscher, Foreign Secretary & Vice-Chancellor lobbying & consulting Heinz Riesenhuber, Helmut Kohl‘s Tech & Research Secretary Konrad Schwaiger, Commission of the European Union popular science book writing for newspapers local global global http://hartenstein.de/Pressespiegel.html 5 some of my hobbies © 2010, [email protected] http://hartenstein.de TU Kaiserslautern 6 Book Outline Introduction Instruction Stream Machines Data Stream Machines Time to Space Mapping Hetero Systems Applications Outlook

Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 1

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

Twin-Paradigm Programmer Education:

proposed New Text Book (and project)

Reiner Hartenstein

talk for Jan 2010 at

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

1981: visiting professor at UC Berkeley

IEEE fellow, SDPS fellow, FPL fellow, other awards

Professor (ordinarius emeritus), TU Kaiserslautern

my CV (1)

Founder and co-founder of several international annual conference series

All academic degrees from EE department at Karlsruhe Institute of Technology (KIT)

Karl Steinbuch

2

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Viktor Prasanna called him „The father of Reconfigurable Computing“ (also pre-FPGA era) -- („Gerald Estrin: grandfather of RC“)

1983: founder of German Mead-&-Conway VLSI design scene: the multi university „E.I.S. project“ (grant: 35 million DM)

my CV (2)

Author & compiler writer of KARL[1]: in the 80ies the most successful HDL and trailblazer before VHDL

[1] R. Hartenstein: Fundamentals of Structured Hardware Design,

1977, North Holland / American Elsevier -- bestseller

3 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Complete EDA framework[2] around KARL and ABL: 85 mio ECU grant by ESPRIT programme of the EU

my CV (3)

format-checking functional floorplan graphic editor

[2] R. Hartenstein: The History of KARL and ABL; in: J. Mermet (editor): Fundamentals and Standards in Hardware Description Languages; 1993.

also see: http://xputers.informatik.uni-kl.de/karl/karl_history_fbi.html

calculus-based term rewriting floorplan generator,

automatic test generation,

testability analysis,

embedded router,

structured logic synthesis,

simulator …

4

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

my CV (4)

consulting

Hans-D. Genscher, Foreign Secretary & Vice-Chancellor

lobbying & consulting

Heinz Riesenhuber, Helmut Kohl‘s Tech & Research Secretary

Konrad Schwaiger, Commission of the

European Union

popular science book

writing for newspapers

local

global

global

http://hartenstein.de/Pressespiegel.html

5

some of my hobbies

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

6

Book Outline

• Introduction

• Instruction Stream Machines

• Data Stream Machines

• Time to Space Mapping

• Hetero Systems

• Applications

• Outlook

Page 2: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 2

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

7

How Societies Chose

to Fail or Succeed

Collapse of our computing ecosystem ?

Unaffordability of von-Neumann-centric computing could jeopardize

all facets of our global economy.

Many-core: failure could jeopardize both, IT industry & sections of the economy depending on rapid

improvement of IT. [Dave Patterson]

Several recent outages of cloud computing services.

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

8

Motivations for a Game Change

Energy consumption of all computers world-wide may become unaffordable

To enable mastering the impact of manycore the programmer population is not yet existing

Fundamentally new types of skills for programmers promise to cool down the chronic “software crisis”

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Energy-efficient Computing

• Low power circuit design*, SSD, & „green computers“**:

• The Twin Paradigm approach:

It‘s important: all sections should be supported

Promising results being better by orders of magnitude

technology established, but massive commitment required

2 different approaches:

*) e. g. PATMOS annual conference series

9 **) remember the talk by Rich Goldman on Tuesday ! © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

10

Re-thinking basic assumptions

Instead of physical limits, fundamental misconceptions of algorithmic complexity theory

limit the progress and will necessitate new breakthroughs.

Not processing is costly, but moving data and messages

We’ve to re-think basic assumptions behind computing

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Put old ideas into practice

(POIIP)

11

... „The biggest payoff will come from putting old ideas into practice and teaching people how to apply them properly.“ [David Parnas]

“We need a complete re-definition of CS” [Burton Smith and other celebrities]

Wrong! I do not agree, [Reiner Hartenstein]

finding out, that ...

“We need a complete re-definition of curriculum recommendations - missing several key issues.”

[Reiner Hartenstein]

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Educational Deficits

Transdisciplinary fragmentation: each application domain uses its own trick boxes

Too many sophisticated very clever architectures

We need a fundamental model with a methodology which all application domains have in common

Educational deficits have stalled Reconfigurable Computing (RC) as well as classical supercomputing

Transdisciplinary education & basic research needed

12

Page 3: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 3

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern it’s old stuff !

• Most of the enabling technologies of Reconfigurable Computing have been published in the 70ies and 80ies:

13

• This is mainly ignored by the CS community by the tunnel view of a reductionist mind set.

• We need to think out of the box: R&D and education need a twin paradigm approach

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

14

Book Outline

• Introduction

• Instruction Stream Machines • Data Stream Machines • Hetero Systems • Time to Space Mapping • Application Examples • Outlook

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Power Consumption of Computers

Energy cost may overtake

IT equipment cost in the near future

but „we may ultimately need revolutionary new solutions“ [Horst Simon, LBNL, Berkeley]

... has become an industry-wide issue: incremental improvements are on track,

[Albert Zomaya]

Current trends will lead to unaffordable future operation cost of our cyber infrastructure

Power consumption by internet: x30 til 2030 if trends continue G. Fettweis, E. Zimmermann: ICT Energy Consumption - Trends and Challenges; WPMC'08, Lapland, Finland, 8 –11 Sep 2008

15 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

50 years Software Crisis

Nathan’s Law: Software is a gas. It expands to fill its container ...

Nathan Myhrvold

… until being limited by Moore’s Law [& Kryder’s Law]

Wirth‘s Law

“software is slowing faster than hardware is accelerating“

Oct 1957 The Economist: Nov 19th 1955

In 1955, Parkinson could not have foreseen the impact of software. formula: bureaucracy growth independent of actual work to be done

[Niklaus Wirth]

[Cyril Northcote Parkinson]

Software critics is not new:

F. L. Bauer 1968, coined the term „Software Crisis“

N. N. 1995: THE STANDISH GROUP REPORT

Robert N. Charette 2005: Why Software Fails; IEEE Spectrum, Sep 2005

Anthony Berglas 2008: Why it is Important that Software Projects Fail L. Savain 2006: Why Software is bad

Peter G. Neumann 1985-2003:

216x “Inside Risks“(18 years inside back cover of Comm_ACM)

term by F. L. Bauer [1968]

16

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

“Software”

overhead piles up to code sizes of astronomic dimensions

The von Neumann

Syndrome:

stands for extremely memory-cycle-hungry instruction streams

stolen from Bob Colwell

memory wall,

caches, ...

from earlier talks: from earlier talks:

datastream parallelism

instruction stream parallelism

data meet PU

C.V. Ramamoorthy

“The Memory Wall”

coined by Sally McKee (& co-author)

Patterson’s Law:

Dave Patterson

bandwidth gap grows 50% / year

has reached >1000x

17 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Program Program Performance

„Multicore computers shift the burden of software performance from chip designers to programmers.“

we anyway need a Software Education Revolution ... Since People have to write code differently,

[J. Larus: Spending Moore's Dividend; C_ACM, May 2009] ... performance drops & other problems

in moving single-core to multicore ...

... the chance to move RC* from niche to mainstream

a scenario like before the Mead-&-Conway revolution Missing programmer population, tools and methodology:

*) RC = Reconfigurable Computing

Embedded syst. & hdw scene have the right background to reform the parallelism education of programmers

program program

Programming Programming

18

Page 4: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 4

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

A Heliocentric CS Model needed

FE

Flowware

Engineering

PE

Program Engineering

The Generalization of Software Engineering —

A Twin Paradigm Dual Dichotomy Approach.

time to space mapping

issue

SE

Software

Engineering

RPU RPU

special

*) do not confuse with „dataflow“!

19 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Following the exemplar: our ideal:

educational framework to create a designer and researchers population

20

The most influential research project

in modern computer history.

The brain child of:

Carver Mead Lynn Conway

(~1980-…) The VLSI Revolution

Solving the VLSI design crisis: missing reply to Moore‘s Law

missing designer population & tools

Incubator of EDA* industry, workstations…

*) Electronic Design Automation

text book [1980] ( Bestseller ! )

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

VLSI design education spreading rapidly

1981 - 1983

A few examples:

E.I.S. Projekt (DE): 20 universities [1983...]

EUROCHIP (EU) [1984...]

21

world-wide

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

The most influential research project

in modern computer history.

The VLSI Revolution

educational framework to create designer & researcher population

22

Carver Mead Lynn Conway

Solving the VLSI design crisis: missing reply to Moore‘s Law

missing designer population & tools

Incubator EDA industry, workstations…

software software

vN syndrome vN syndrome

programmer programmer

PE* PE*

programmer programmer

PSCs** PSCs**

programming programming

following the exemplar:

The Next Text Book Bestseller !

Next most influential R&D project Next most influential R&D project

*) Program Engineering

**) personal SuperComputers

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Teaching properly

„The biggest payoff will come from Putting Old ideas into Practice

and teaching people how to apply them properly.“

Dav

id P

arna

s

(POIIP) Put Old ideas into Practice

Lean extension of SE education: avoiding to scare away students

23 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

M-&-C-like textbook needed

avoiding specialization overload

24

Breadth of specialization massively reduced

Clean-up &

intuitive models

Fixing the

Education

Dilemma

Silicon Foundry

(external technology)

Gate level

Switching level

Circuit level

RT level

Anwendung

Layout level

Technology

on site

submit reject

submit reject

submit reject

submit reject

submit reject

conventional design flow:

fra

gm

entation

ta

ll th

in

m

an

application

The new M-&-C

organization:

Carver Mead Lynn Conway

stressing “Systems”

coherence

The text book [1980] ( Bestseller ! )

Breadth of specialization

„fa

b-less“

Page 5: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 5

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

25

Book Outline

• Introduction • Instruction Stream Machines

• Data Stream Machines • Hetero Systems • Time to Space Mapping • Application Examples • Outlook

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

The first Reconfigurable Computer

•prototyped 1884 by Herman Hollerith

•a century before FPGA introduction

•data-stream-based •data-stream-based

•60 years later the von Neumann (vN) model took over

• instruction-stream-based • instruction-stream-based

26

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Early LUT

60 years later: RAM available f. configuration

non-volatile configuration “memory”

field-programmable:

•manually

•or, by swapping pre-wired plug boards

But only used for von Neumann machine (at that time)

27 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

28

The History of Mainframes

• Reconfigurable was first

• Von Neumann had the strongest lobby

• Von-Neumanized hardware design

• Problems with hardware description languages

But curriculum planners insisted in going through all abstraction levels

„I understand everything, although I do not know any electronics“

The software faculty did not support this mind set

Every student had to learn how to design a CPU

M&C misunderstood: only for chip designers & toolmakers

(the tall thin man)

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Teaching for Change: an early martyr

„Turing is irrelevant“

The von Neumann model is the emulation of a tape machine

http://www.sigsoft.org/SEN/parnas.html

D. L. Parnas (keynote):

"Teaching for Change“;

10th Conf. Softw. Engineering Education

and Training (CSEET '97)

„The von Neumann syndrome“: coined ~ a decade later

Prof. C.V. Ramamoorthy, (UC Berkeley),

SDPS 2006, San Diego, CA

Brad Cox 1990: Planning the Software Industrial Revolution

Dijkstra 1968: The Goto considered harmful

R. Hartenstein, G. Koch 1975: The universal Bus considered harmful Backus 1978: Can programming be liberated from the von Neumann style?

Arvind et al., 1983: A critique of Multiprocessing the von Neumann Style L. Savain 2006:

Why Software is bad …

Peter G. Neumann 1985-2003: 216x “Inside Risks“ (18 years inside back cover

of Comm_ACM)

Critique of von Neumann is not new:

punished for blasphemy?

(mimicking tape on RAM)

Peter G. Neumann

Teaching for Change

29 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

30

The von Neumann Paradigm Trap

•Antimachine:

•data counter(s) at memory unit(s)

•[Burks, Goldstein, von Neumann; 1946]

•Program counter at Datapath Unit

CS education got stuck in this paradigm trap

which stems from technology of the 1940s

We need a twin- paradigm approach

CS education’s right eye is blind, and

its left eye suffers from tunnel view

Page 6: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 6

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

31

Instruction Stream Machines

• a very simple von Neumann machine explained at bit level program sequencer explained, opcode, control structures, loops, simulator, simple example exercisesd

• imperative language, compilation, execution on von Neumann simulator, example exercises

• parallel processes, Hoare, MPI, etc -- extremely simple !!

• a supercomputing example (differential equations ?)

• The von Neumann Syndrome

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

32

... (old stuff)

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

33

Instruction Stream Machines

• a very simple von Neumann machine explained at bit level program sequencer explained, opcode, control structures, loops, simulator, eimple example exercisesd

• imperative language, compilation, execution on von Neumann simulator, example exercises

• parallel processes, Hoare, MPI, etc -- extremely simple !!

• a supercomputing example (differential equations)

• The von Neumann Syndrome

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

34

Compilation: Software

source program

software compiler

software code

Software Engineering Software

Engineering

instruction schedule

(Befehls-Fahrplan)

sequential

(von Neumann model)

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

35

imperative Software Language (control part)

language category Software Languages (instruction streams)

deterministic procedural sequencing:

traceable, checkpointable

operation sequence driven by:

read next instruction, goto (instr. addr.),

jump (to instr. addr.), instr. loop, loop nesting

no parallel loops, escapes, instruction stream branching

state register program counter

address computation

massive memory cycle overhead

Instruction fetch memory cycle overhead

parallel memory bank access interleaving only

co-located with datapath

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

36

Instruction Stream Machines

• a very simple von Neumann machine explained at bit level program sequencer explained, opcode, control structures, loops, simulator, eimple example exercisesd

• imperative language, compilation, execution on von Neumann simulator, example exercises

• parallel processes, Hoare, MPI, etc -- extremely simple !!

• a supercomputing example (differential equations)

• The von Neumann Syndrome

Page 7: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 7

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

37

... (old stuff)

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

38

Instruction Stream Machines

• a very simple von Neumann machine explained at bit level program sequencer explained, opcode, control structures, loops, simulator, eimple example exercisesd

• imperative language, compilation, execution on von Neumann simulator, example exercises

• parallel processes, Hoare, MPI, etc -- extremely simple !!

• a supercomputing example (differential equations)

• The von Neumann Syndrome

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

39

... (old stuff)

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

40

Instruction Stream Machines

• a very simple von Neumann machine explained at bit level program sequencer explained, opcode, control structures, loops, simulator, eimple example exercisesd

• imperative language, compilation, execution on von Neumann simulator, example exercises

• parallel processes, Hoare, MPI, etc -- extremely simple !!

• a supercomputing example (differential equations)

• The von Neumann Syndrome

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

von Neumann overhead vs. Reconfigurable Computing

overhead von Neumann

machine hardwired

anti machine reconfigurable anti machine

instruction fetch instruction stream none*

state address computation instruction stream none*

data address computation instruction stream none*

data meet PU + other overh. instruction stream none*

i / o to / from off-chip RAM instruction stream none*

Inter PU communication instruction stream none*

message passing overhead instruction stream none*

*) configured before run time

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPA: reconfigurable datapath array rDPA: reconfigurable datapath array

(coa

rse-

grai

ned

rec.

) (c

oars

e-gr

aine

d re

c.)

41 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

overhead von Neumann

machine hardwired

anti machine reconfigurable anti machine

instruction fetch instruction stream none*

state address computation instruction stream none*

data address computation instruction stream none*

data meet P + other overh. instruction stream none*

i / o to / from off-chip RAM instruction stream none*

Inter PU communication instruction stream none*

message passing overhead instruction stream none*

**) just by reconfigurable address generator

von Neumann overhead vs. Reconfigurable Computing

*) configured before run time

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPA: reconfigurable datapath array rDPA: reconfigurable datapath array

(coa

rse-

grai

ned

rec.

) (c

oars

e-gr

aine

d re

c.)

42

Page 8: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 8

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Data meeting the Processing Unit (PU)

by Software

by Configware

routing the data by memory-cycle-hungry instruction streams thru shared memory

data-stream-based: placement* of the execution locality ...

We have 2 choices

pipe network generated by configware compilation

... explaining the RC advantage

*) before run time

(data)

(PU)

43 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern The von Neumann Syndrome

The data-stream-based anti machine approach:

The instruction-stream-based von Neumann approach:

has no von Neumann bottle-necks

has no von Neumann bottle-necks

the watering pot model [Hartenstein]

has several

von Neumann overhead

phenomena

has several

von Neumann overhead

phenomena

per CPU!

44

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Reconfigurable Computing means …

• Reconfigurable Computing means moving overhead from run time to compile time**

• For HPC run time is more precious than compiletime

• Reconfigurable Computing replaces “looping” at run time* …

http://www.tnt-factory.de/videos_hamster_im_laufrad.htm

… by configuration before run time

*) e. g. complex address computation **) or, loading time

45 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

46

Book Outline

• Introduction • Instruction Stream Machines • Data Stream Machines

• Hetero Systems • Time to Space Mapping • Application Examples • Outlook

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

47

We need a new machine paradigm

a programmer does not understand function evaluation without machine mechanisms - without a pogram counter …

data-stream-based mind set

we urgently need a

datastream based

machine paradigm

we urgently need a

datastream based

machine paradigm it was pepared almost 30 years ago

x x x

x x x

x x x

|

| |

x x x

x

x x

x x

x

- -

-

x x x x

x x

x x x

- - -

- - -

- - -

- - -

x x x

x x x

x x x

|

|

|

|

|

|

|

|

|

|

|

| data streams

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Compilation: Software vs. Configware

source program

software compiler

software code

Software Engineering Software

Engineering Configware Engineering Configware Engineering

placement & routing

scheduler

flowware code

data

C, FORTRAN MATHLAB, …

instruction streams data streams configuration

configware code

mapper

configware compiler

source „program“

48

Page 9: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 9

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

49

data counter

GAG RAM

ASM

data counter

GAG RAM

ASM

data counter

GAG RAM

ASM

Configware Compilation

configware code

flowware code

mapper

configware compiler

scheduler

source „program“

Configware Engineering Configware Engineering

placement & routing

data

C, FORTRAN MATHLAB

programming the data counters

configware compilation fundamentally different from software compilation

configware compilation fundamentally different from software compilation

x x x

x x x

x x x

|

| |

x x x

x

x x

x x

x

- -

-

x x x x

x x

x x x

- - -

- - -

- - -

- - -

x x x

x x x

x x x

|

|

|

|

|

|

|

|

|

|

|

| data streams

rDPA

pipe network

data counter

GAG RAM

ASM: Auto-Sequencing Memories ASM

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

50

Data Stream Machines

• Systolic array, def of datastream -simple !! -- (e. g. Matrix compute) examples intro, simulator, exercises

• DMA, auto-sequencing memory, data stream machine exercises programming data streams e.g.of prev. section

• introducing a flowware language, implement a compiler example exercises running on datastream machine

• introductory examples illustrate datastream machine use

• introducing the datastream machine

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

51

Having introduced Data streams

x x x

x x x

x x x

|

| |

x x

x

x

x

x

x x

x

- -

-

input data stream

x x

x

x

x

x

x x

x

- -

-

-

-

-

-

-

-

-

-

-

x x x

x x x

x x x

|

|

|

|

|

|

|

|

|

|

|

| output data streams

time

port #

time

time

port # time

port #

systolic array research throughout the 80ies:

mainly algebra experts

~1980

DPA (pipe network)

execution: transport-triggered

no memory wall

H. T. Kung

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

DPA

x x x

x x x

x x x

|

| |

x x

x

x

x

x

x x

x

- -

-

input data streams

x x

x

x

x

x

x x

x

- -

-

-

-

-

-

-

-

-

-

-

x x x

x x x

x x x

|

|

|

|

|

|

|

|

|

|

|

| output data streams

DPU operation is transport-triggered

no instruction streams

no message passing

nor thru common memory

where were the supercomputing people ?

massively reducing memory cycles

The right road map to HPC: there ignored for decades

52

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

x x x

x x x

x x x

|

| |

x x

x

x

x

x

x x

x

- -

-

input data stream

x x

x

x

x

x

x x

x

- -

-

-

-

-

-

-

-

-

-

-

x x x

x x x

x x x

|

|

|

|

|

|

|

|

|

|

|

| output data streams

time

port #

time

time

port # time

port #

defines: ... which data item at which time at which port

(H. T. Kung paradigm)

Algebra experts‘ hobby, early 80ies

DPA* (pipe network) *) DataPath Array

(array of DPUs)

The Systolic Array

DataPath Unit has no program counter!

it’s no CPU!

nice time/space notation -

53 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Yielding only linear unifornm pipes

pipeline properties array applications

shape resources

mapping scheduling

(data stream formation)

systolic array

regular data

dependencies only

linear only

uniform only

linear projection or algebraic synthesis

super-systolic rDPA

no restrictions simulated

annealing or P&R algorithm

(e.g. force-directed) scheduling algorithm

*

54

Page 10: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 10

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

of course algebraic (linear projection)

only for applications with regular data dependencies

Algebra experts caught by their own paradigm trap

Rainer Kress discarded their algebraic synthesis methods and replaced it by simulated annealing: rDPA

1995

Synthesis Method?

55 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Super Pipe Networks

pipeline properties array applications

shape resources

mapping scheduling

(data stream formation)

systolic array

regular data

dependencies only

linear only

uniform only

linear projection or algebraic synthesis

super-systolic rDPA

no restrictions simulated

annealing or P&R algorithm

(e.g. force-directed) scheduling algorithm

*

*) KressArray [1995]

56

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

pipe network, organized at compile time

rDPA = rDPU array, i. e. coarse-grained

rDPU = reconf. datapath unit (no program counter)

What pipe network ?

rDPA rDPA

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

Generalization*

of

the systolic array

array port receiving or sending a data stream

rDPA rDPA

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU [R. Kress, 1995]

*) supporting non-linear pipes on free form hetero arrays

57 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

another free form pipe network example:

honeycomb architecture

rDPA = rDPU array

rDPU = reconfigurable

datapath unit

more generalization

Generalization*

of

the systolic array

array port receiving or sending a data stream

[R. Kress, 1995]

*) supporting non-linear pipes on free form hetero arrays – also zigzag, spiral, fork and join, and even more crazy routing patterns tolerated

rDPA rDPA

rDPU rDPU rDPU rDPU rDPU rDPU

rDPU rDPU rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU rDPU rDPU rDPU rDPU rDPU rDPU

58

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern super-systolic array

59

(recall this example !)

rDPU not used used for routing only operator and routing port location markerLegend: backbus connect

rout thru only

not used backbus connect

supporting any complex free form pipe networks

far beyond just uniform linear pipes

CoDe-X inside [Jürgen Becker]

by KressArray Xplorer [Ulrich Nageldinger]

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

60

Data Stream Machines

• Systolic array, def of datastream -simple !! -- (e. g. Matrix compute) examples intro, simulator, exercises

• DMA, auto-sequencing memory, data stream machine exercises programming data streams e.g.of prev. section

• introducing a flowware language, implement a compiler example exercises running on datastream machine

• introductory examples illustrate datastream machine use

• introducing the datastream machine

Page 11: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 11

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern “It’s not our job”

x x x

x x x

x x x

| | |

x x x x

x x

x x x

- - -

x x x x

x x

x x x

- - -

- - -

- - -

- - -

x x x

x x x

x x x

| | |

| | |

| | |

| |

|

resources

sequencer

Machine:

61 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

data counter

GAG RAM

ASM: Auto-Sequencing

Memory rDPA rDPA

ASM ASM ASM ASM ASM ASM

ASM ASM

ASM ASM

ASM ASM

Migration benefit by on-chip RAM

so that the drastic code size reduction by software to configware migration can beat the memory wall

Some RC chips have hundreds of on-chip RAM blocks, orders of magnitude faster than off-chip RAM

multiple on-chip RAM blocks are the enabling technology for ultra-fast anti machine solutions

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

ASM ASM

ASM ASM

ASM ASM

ASM ASM ASM ASM ASM ASM

rDPA = rDPU array, i. e. coarse-grained

rDPU = reconf. datapath unit (no program counter)

GAGs inside ASMs generate the data streams

GAG = generic address generator

62

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

63

The counterpart of the von Neumann machine

x x x

x x x

x x x

|

| |

x x

x

x

x

x

x x

x

- -

-

x x

x

x

x

x

x x

x

- -

-

-

-

-

-

-

-

-

-

-

x x x

x x x

x x x

|

|

|

|

|

|

|

|

|

|

|

|

(r)DPA

ASM

ASM

ASM

ASM

ASM

ASM

AS

M

AS

M

AS

M

AS

M

AS

M

AS

M

data counter

GAG RAM

ASM: Auto-Sequencing

Memory

data counters instead of a program counter

data counters: located at memory (not at data path)

coarse-grained

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Counters: the same micro architecture ?

data stream machine (anti machine)

data

counter

memory bank

asM

progra

m

counter

DPU CPU

instruction stream machine: (von Neumann etc.)

yes, is possible, but for data counters ...

*) for history of AGUs see Herz et al.: Proc. ICECS 2002, Dubrovnik, Croatia

... a much better AGU methodology is available*

AGU: address generator unit

64

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

von Neumann vs. anti machine

progra

m

counter

DPU CPU

RAM memory

von Neumann bottleneck

(r)DPA

without sequencer

asMA: auto-sequencing Memory Array

........

asM

asM

asM

asM

asM

(r)DPA

........

data stream machine (anti machine)

data

counter

memory bank

asM

asM: auto-sequencing Memory

instruction stream machine (von Neumann etc.)

65 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Behavior of the Counter

data

counter

memory bank

asM

asM

asM

asM

asM

asM

........

progra

m

counter

DPU CPU

(r)DPA

Page 12: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 12

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

anti machine

Configware Compilation

data

counter

memory bank

asM

asM

asM

asM

asM

asM

........

(r)DPA configware compiler

source „program“

mapper

Placement & routing

data scheduler

flowware code

67 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

68

Data Stream Machines

• Systolic array, def of datastream -simple !! -- (e. g. Matrix compute) examples intro, simulator, exercises

• DMA, auto-sequencing memory, data stream machine exercises programming data streams e.g.of prev. section

• introducing a flowware language, implement a compiler example exercises running on datastream machine

• introductory examples illustrate datastream machine use

• introducing the datastream machine

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

data counter

GAG RAM

ASM: Auto-Sequencing

Memory

ASM: Auto-Sequencing

Memory

use data counters, no program counter

rDPA rDPA

x x

x

x

x

x

x x

x

- -

-

x x

x

x

x

x

x x

x

- -

-

-

-

-

-

-

-

-

-

-

ASM Data stream generators

x x x

x x x

x x x

|

| |

x x x

x x x

x x x

|

|

|

|

|

|

|

|

|

|

|

|

ASM

ASM

ASM

ASM

ASM

ASM

AS

M

AS

M

AS

M

AS

M

AS

M

AS

M

implemented

by distributed

on-chip memory

implemented

by distributed

on-chip memory

reconfigurable reconfigurable

non-von-Neumann machine paradigm

systolic array super systolic pipe network

[1995]

69 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

(anti-von-Neumann machine paradigm)

Data Counter instead of Program Counter

Generalization of the DMA

data counter

GAG RAM

ASM: Auto-Sequencing Memory ASM

GAG & enabling technology: published 1989 [by TU-KL], Survey paper: [M. Herz et al.*: IEEE ICECS 2003, Dubrovnik] *) IMEC & TU-KL

**) -- patented by TI** 1995

Storge Scheme optimization methodology, etc.

70

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

71

GAG: 2-D Generic Data Sequence Examples

a) b)

c)

d) e) f) g)

Limit Slider

Base Slider

GAG

Address Stepper

B0 DA L0

A

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

72

GAG Slider Operation Demo Example

yx

ceiling

C

address

DLDB

L0B 0 DA

F

floor

DLDB

Page 13: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 13

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

GAG Slider Model

LimitStepper

BaseStepper

AddressStepper

B0DAL0

A

LimitStepper

BaseStepper

AddressStepper

B0DAL0

A

sliders

B 0 B

[

0 L

]

0 L 0

B 0 B

[

0 A D

A D

L

]

0 L 0

GAG

Generic Address

Generator

floor ceiling

73 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

74

MoM Scan window (MoMSW) Illustration

• Multiple* vari-size reconfigurable MoMSW scan windows

• MoMSW controlled by reconfigurable GAG (generic address generators)

• 2-dimensional (data) memory address space

MoM architectural primary features:

*) typically 3

ASM: Auto-

Sequencing

Memory

ASM: Auto-

Sequencing

Memory

ASM: Auto-

Sequencing

Memory

ASM: Auto-

Sequencing

Memory

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

75

CGFFT: Parallel Scan Pattern Animation MoM-3 with 3 varisize scan windows

Datapath Datapath ASM: Auto-

Sequencing

Memory

ASM: Auto-

Sequencing

Memory

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

76

Reconfigurable Generic Address Generator GAG

Generalization of the DMA

data counter

GAG

GAG & enabling technology published 1989, survey: [M. Herz et al.: IEEE ICECS 2003, Dubrovnik]

patented by TI 1995

• storge scheme optimization methodology, etc.

Acceleration factors by:

• address computation without memory cycles

• supporting scratch optimization strategies (smart d-caching)

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

77

Significance of MoMSW Reconfigurable Scan Windows

• MoMSW Scan windows have the potential to drastically reduce traffic to/from slow off-chip memory.

• No instruction streams needed to implement scratch pad optimization strategies using fast on-chip memory

• MoMSW Scan windows may contribute to speed-up by a factor of 10 and sometimes even much more

• MoMSW Scan windows are the deterministic alternative („d-caching“) to (indeterministic and speculative) classical cache usage: performance can be well predicted

• For data-stream-based computing scan windows are highly effective, whereas classical caches are entirely useless

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

78

Acceleration Mechanisms by ASM-based MoMSW

•parallelism by multi bank memory architecture •reconfigurable address compuattion – before run time

•avoiding multiple accesses to the same data.

•avoiding memory cycles for address computation

• improve parallelism by storage scheme transformations

•minimize data movement across chip boundaries

Page 14: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 14

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Xputer Lab at Kaiserslautern: MoM I and II 1986:

79 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

80

Data Stream Machines

• Systolic array, def of datastream -simple !! -- (e. g. Matrix compute) examples intro, simulator, exercises

• DMA, auto-sequencing memory, data stream machine exercises programming data streams e.g.of prev. section

• introducing a flowware language, implement a compiler example exercises running on datastream machine

• introductory examples illustrate datastream machine use

• introducing the datastream machine

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Similar Programming Language Paradigms

language category Computer Languages Xputer Languages

both deterministic procedural sequencing: traceable, checkpointable

sequencingdriven by:

read next instruction, goto (instruction addr.), jump (to instruction addr.), instruction loop, instruction loop nesting no parallel loops, instruction loop escapes, instruction stream branching

read next data object, goto (data addr.), jump (to data addr.), data loop, data loop nesting, parallel data loops, data loop escapes, data stream branching

81 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

82

Programming Language Paradigms

language category Computer Languages Xputer Languages

both deterministic procedural sequencing: traceable, checkpointable

operationsequencedriven by:

read next instruction, goto (instr. addr.),

jump (to instr. addr.), instr. loop, loop nesting

no parallel loops, escapes,instruction stream branching

read next data item, goto (data addr.),

jump (to data addr.),data loop, loop nesting,parallel loops, escapes,data stream branching

state register program counter data counter(s)

addresscomputation

massive memorycycle overhead overhead avoided

Instruction fetch memory cycle overhead overhead avoided

parallel memorybank access interleaving only no restrictions

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern Why these paradigms are twins

instruction stream languages

read next instruction

goto (instruction address)

jump to (instruction address)

instruction loop

instruction loop nesting

instruction loop escape

instruction stream branching

no internally parallel loops

data stream languages

read next data item

goto (data address)

jump to (data address)

data loop

data loop nesting

data loop escape

data stream branching

yes: internally parallel loops

83 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Procedural Languages Twins

systolic Flowware Languages

read next data item

goto (data address)

jump to (data address)

data loop

data loop nesting

data loop escape

data stream branching

yes: internally parallel loops

84

imperative Software Languages

read next instruction

goto (instruction address)

jump to (instruction address)

instruction loop

instruction loop nesting

instruction loop escape

instruction stream branching

no: no internally parallel loops

But there is the Asymmetry But there is the Asymmetry

program counter data counter(s)

for data parallelism for data parallelism

super

Page 15: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 15

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

The twin paradigms: different moving of data

# moving data between data transport execution

triggered by strategy

1 von Neumann CPU cores

via common memory

instruction stream

moving data at run time

2

(r)DPU cores within (r)DPA

(coarse-grained array)

piped thru directly from

(r)DPU to (r)DPU

arrival of data

(transport-triggered)

moving at compile time the

locality of execution

Who moves operand to operator if not an instruction?

85 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

86

Von Neumann vs. anti machine

# feature von Neumann

machine

hardwired

anti machine

reconfigurable

anti machine

1 m’ code schedules: instruction stream data streams

2 # prog’ sources 1 2

3 source 1 none configware

4 source 2 software flowware

5 sequenced by: program counter data counters

6 counter co-located with: PU (data path): CPU memory block: ASM

9 inter PU communication: common memory piped through

10 data meeting PU: move data at run time move locality of execution at compile rime

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Overhead avoided by Anti Machine

# feature von Neumann

machine

hardwired

anti machine

reconfigurable

anti machine

11 state address computation

overhead at run time instruction stream

overhead:

none

12 data address computation

overhead at run time instruction stream none

13 Inter PU communication

overhead at run time instruction stream none

14 instruction fetch at run time instruction stream none

15 data meet PU at run time instruction stream none

16 other overhead instruction stream none

87 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

88

Which language ?

Computer scientists haven’t been interested in programming clusters. If putting the cluster on a chip is what excites them, fine.

It will still have to run Fortran!

*) like CoDe-X

Scientists do not change their direction

Newton’s 1st Law à la Gordon Bell:

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

New Languages for Parallelism ?

Efforts to extend standards-based, serial programming languages with features to describe parallel constructs are likely to fail.

Nick Tredennick:

Term Rewriting Systems may raise the abstraction level up to math formulae

Mauricio Ayala-Rincón:

What’s more likely to succeed are languages that raise the level of abstraction in algorithm description

89 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

90

Which Language ?

*) like CoDe-X

Support tools have been demonstrated by academia

Classical programming languages, but with a slightly different semantics (data-procedural) are good candidates for parallel programming.

Page 16: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 16

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

91

JPEG zigzag scan pattern

x

y

EastScan is step by [1,0] end EastScan;

SouthScan is step by [0,1] endSouthScan;

*> Declarations

NorthEastScan is loop 8 times until [*,1] step by [1,-1] endloop end NorthEastScan;

SouthWestScan is loop 8 times until [1,*] step by [-1,1] endloop end SouthWestScan;

HalfZigZag is EastScan loop 3 times SouthWestScan SouthScan NorthEastScan EastScan endloop end HalfZigZag;

goto PixMap[1,1]

HalfZigZag; SouthWestScan uturn (reverse (HalfZigZag))

reverse (HalfZigZag)

data counter data counter

data counter data counter

2

1

3

4

HalfZigZag

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

92

What is the reason of the paradox ?

Result from decades of tunnel view in CS R&D and education

basic mind set completely wrong

the von Neumann Syndrome

“CPU: most flexible platform” ?

>1000 CPUs running in parallel: the most inflexible platform

However, FPGA & rDPA are very flexible

The Law of More: drastically declining programmer productivity

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

93

Dual paradigm: an old hat (3)

“procedure call” or function call

call Module-name (parameters);

Hardware Description Languages;

Hardware description: space domain

Software: time domain

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern Software / Configware Co-Compilation

Analyzer / Profiler

SW code

SW compiler

para d igm “vN" machine

CW Code

CW compiler

anti machine paradigm

Partitioner

C language source

FW Code

Juergen Becker 1996

But we need a dual paradigm approach: to run legacy software together w. configware

Reconfigurable Computing: Technology is Ready, Users are Not

apropos compilation:

94

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Why a new machine paradigm ???

The anti machine as the 2nd paradigm is the key to curricular innovation

rDPA rDPA µprocessor µprocessor

... a Troyan horse to introduce data-stream-based issues to the classical mind set of programmers

Programming by flowware instead of software is very easy to learn

Flowware education: no fully fledged hardware expert needed to program configware

(... same language primitives)

95 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

96

Data Stream Machines

• Systolic array, def of datastream -simple !! -- (e. g. Matrix compute) examples intro, simulator, exercises

• DMA, auto-sequencing memory, data stream machine exercises programming data streams e.g.of prev. section

• introducing a flowware language, implement a compiler example exercises running on datastream machine

• introductory examples illustrate datastream machine use

• introducing the datastream machine

Page 17: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 17

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

97

... (old stuff)

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

98

Book Outline

• Introduction • Instruction Stream Machines • Data Stream Machines • Hetero Systems

• Time to Space Mapping • Application Examples • Outlook

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

99

Hetero Systems

• Hetero systems examples. co-compilation, co-design, interfacing (very short intro).

• Softw 2 configw-&-hardw migration:

• Software/configware/hardware partitioning

• The MORPHEUS project

• Mastering the Hardware / Software Chasm

• The Twin Paradigm Approach

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Our Contemporary Computer Machine Model

Machine model

resources sequencer

property programming

source property programming source state register

ASIC accelerator hardwired - hardwired -

CPU hardwired - programmable Software (instruction streams)

program counter

RPU accelerator programmable

Configware (configuration

code) programmable

Flowware (data

streams)

data counters

twin Paradigm Dichotomy

in CPU

in RAM

data counters of reconfigurable address generators in asM (auto-sequencing) data memory blocks

the same language primitives!

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

The clash of Paradigms (a simplified view)

Computation in time (procedural-only mindset)

Computing in space (& time since unavoidable)

CS&E: Co-design mindset (mainly embedded systems) (mostly hardware people with some programming skills)

you can teach programming to a hardware guy you can teach programming to a hardware guy

you can’t teach hardware to a programmer

you can’t teach hardware to a programmer

This is the fault of our CS curricula This is the fault of our CS curricula

CS: Software-only mindset: von-Neumann-based

101 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

102

von Neumann machine

resources

sequencer

Machine: resources

sequencer

memory

algorithms

software program counter

von Neumann machine:

Page 18: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 18

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

103

Anti machine

resources

sequencer

memory

algorithms

flowware

data counters

hardwired anti machine:

resources

sequencer

memory

algorithms

flowware

data counters

reconfigurable anti machine:

configware

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

104

Dual paradigm: an old hat

Mapped into a Hardware mind set: action box = Flipflop, decision box = (de)multiplexer

Software mind set: instruction-stream-based: flow chart -> control instructions (FSM: state transition)

-> Register Transfer Modules (DEC: mid 1970ies); similar concept: Case Western Reserve Univ. ;

FF

token bit

evoke

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

105

Dual paradigm: an old hat (2)

“It is so simple!

why did it take 25 years to find out ?”

Hardware Description Language scene ~1970:

Because of the reductionists’ tunnel view

Because of a lack of transdisciplinary thinking

FF

token bit

evoke

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

decision box turns into demultiplexer

106

[1967] PvOIIP

0 1 B0

B1

CO

ND

ITIO

N

ENABLE

demultiplexer:

B0

B1

CONDITION

ENABLE

decision box:

RTM as DEC product available: 1973

[~1971] (introducing HDLs):

„That‘ so simple! Why did it take 30 years to find out ?“

C. G. Bell et al: IEEE Trans-C21/5, May 1972

W. A. Clark: 1967 SJCC, AFIPS Conf. Proc.

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

manycore programming* requires mapping from computing in time, to computing in space (& time)

Programming in CS education is based on fine-grained strictly sequential thinking

Decades old methods and simple models are stubbornly ignored**, even by engineers

Resulting in embarrassing situations (e. g. in talking to your CEO)

Education: a Key Issue

*) and Reconfigurable Computing **) except most hardware people

decision box turns into multiplexer

decision box turns into multiplexer “That’s so simple!

why did it take 30 years to find out?”

in 1971:

C. G. Bell et al: IEEE Trans-C21/5, May 1972

W. A. Clark: 1967 SJCC, AFIPS Conf. Proc.

task P

task Q task R

C 0 1 flowchart

flipflop P

flipflop Q flipflop R

C condition

evoke P

evoke Q evoke R

0 1

token

(token) (token)

RTM

107 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

108

rDPU not used used for routing only operator and routing port location markerLegend: backbus connect

array size: 10 x 16 = 160 rDPUs

rout thru only

not used backbus connect

SNN filter on (supersystolic) KressArray (mainly a pipe network)

reconfigurable Data Path Unit, e. g. 32 bits wide

reconfigurable Data Path Unit, e. g. 32 bits wide

no CPU

rDPU rDPU

note: software perspective without instruction streams

Symptom of the von Neumann Syndrome

Page 19: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 19

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

109

rDPU not used used for routing only operator and routing port location markerLegend: backbus connect

array size: 10 x 16 rDPUs

Coarse-grained Reconfigurable Array

rout thru only

not used backbus connect

SNN filter on (supersystolic) KressArray (mainly a pipe network)

reconfigurable Data Path Unit, 32 bits wide

reconfigurable Data Path Unit, 32 bits wide

no CPU

rDPU rDPU

note: software perspective without instruction streams: pipelining

note: software perspective without instruction streams: pipelining

compiled by Nageldinger‘s KressArray Xplorer with Juergen Becker‘s CoDe-X inside

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

KressArray Example

Y

X

rout thru only rout thru and function (multiplexer)

X

swap

0

1

0

1

>

Y

if X > Y then swap;

Conditional Swap

swap turned into a wiring pattern

110

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern Really so simple ?

111

(recall this example !)

rDPU not used used for routing only operator and routing port location markerLegend: backbus connect

rout thru only

not used backbus connect

embarrassing reaction to Ulrich Nageldinger‘s talk at RAW 1996

CoDe-X inside [Jürgen Becker]

by KressArray Xplorer [Ulrich Nageldinger]

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Brick Wall in the Brain

112

immediately* a VIP jumps up: „But you can‘t implement decisions!“

Embarrassing: a top level R&D manager of a global IT corp. group

*) discussion after the talk: RAW at Orlando, FLA

completely missing

sense of Dichotomies

structural procedural

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

„but you can‘t implement decisions!“

S = R + (if C then A else B endif);

=1

+

A B R C

section of a very large pipe network: not knowing this solution

symptom of the hardware / software chasm

and the configware / software chasm

113 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

„But you can‘t implement decisions!“

114

S = R + (if C then A else B endif);

=1

+

A B R C

section of a very large pipe network: Software to

Configware

Migration:

it‘s criminal, that typical CS

graduates don‘t know this!

illustrating, that mono-rail education is fatal

Page 20: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 20

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

115

Hetero Systems

• Hetero systems examples. Co-compilation, co-design, interfacing (very short intro).

• Softw 2 configw-&-hardw migration:

• Software/configware/hardware partitioning

• The MORPHEUS project

• Mastering the Hardware / Software Chasm

• The Twin Paradigm Approach

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern Domains & what we need

116

term source for programming … domain

software instruction streams time (procedural)

configware ressources (structures) space (structural)

flowware data streams time (procedural)

we need data parallelism we need data parallelism

we need paradigm twins we need paradigm twins

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

117 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern Terminology

term program counter

execution triggered

by paradigm

CPU

yes instruction fetch

instruction-stream-based

DPU** no data arrival*

data-stream-based

DPU

progra

m

counter

DPU CPU CPU

*) “transport-triggered” **) does not have a program counter

118

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

The twin paradigm approach

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU CPU CPU

CPU CPU

CPU CPU

CPU CPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

inefficient von Neumann

paradigm

high performance antimachine paradigm

heterogenous manycore system

no instruction streams at run time:

only intermediate data to be stored: at distributed fast on-chip memory

for legacy software and supervisory tasks

Software/ Configware

Co-compilation

119 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern Why a Dichotomy ?

120

Dichotomy = mutual allocation to two opposed domains such, that a third domain is excluded.

The dichotomy model as an educational orientation guide for dual rail education to overcome the software/configware chasm

Dichotomy of machine paradigms (von Neumann /Anti machine): the „Twin Paradigm“ model

Relativity Dichotomy: time domain / space domain – helps parallelization by time to space mapping

Page 21: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 21

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Teaching properly

121

Dichotomy model for dual rail education to overcome the software/configware chasm & the software/hardware chasm

1) Machine Paradigm Dichotomy (von Neumann / Datastream machine): the Twin Paradigm model

2) Relativity Dichotomy: time domain / space domain – helps parallelization by time to space mapping

„The biggest payoff will come from Putting Old ideas into Practice and teaching people how to apply them properly.“

Dav

id P

arna

s

(POIIP) von Neumann reconfigurable

keep in mind ~hardware

Put Old ideas into Practice

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

1 ) Paradigm Dichotomy

122

instruction domain

(procedural dichotomy)

imperative Language (instruction-procedural) (data-procedural)

datastream domain

The Twin Paradigm Approach (TTPA)

program

counter

CPU CPU data

counter

(r)DPA (r)DPA

+ - - +

Systolic Flowware Language super-

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

1 ) Paradigm Dichotomy

123

instruction domain

(procedural dichotomy)

imperative Language (instruction-procedural) (data-procedural)

datastream domain

The Twin Paradigm Approach (TTPA)

program

counter

CPU CPU data

counter

(r)DPA (r)DPA

+ - - +

+

+ data

parallelism

data

parallelism

we need we need

Systolic Flowware Language super-

s s

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

The anti universe

•Paul Dirac predicted a complete anti universe consisting of antimatter

•“There are regions in the universe, which consist of antimatter .....

•We are not aware, that there is a new area in computing sciences , which consists of antimatter of computing

• .... But there are asymmetries”

•Reconfigurable Computing is made from this antimatter: data-stream-based computing

•when a particle hits its antiparticle, both are converted into energy: Annihilation

124

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

anti particles

• 1956: anti neutron created on Bevatron

• 1928: Paul Dirac: „there should be an anti electron having positive charge“ (Nobel price 1933)

• 1932: Carl David Anderson detected this „positron“ in cosmic radiation (Nobel price 1936)

• 1955 Owen Chamberlain et al. create anti proton on Bevatron

• 1954: new accelerators: cyclotron, like Berkeley‘s Bevatron

• 1965: creation of a deuterium anti nucleus at CERN

hydrogen anti hydrogen

• 1995: hydrogen anti atom created at CERN – by forcing positron and anti proton to merge by very low energy.

125 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Matter & Antimatter: Atom and Anti Atom

The World of Matter -

machine paradigm: the Atom

Anti Matter -

machine paradigm: Anti Atom

+ + -

- - +

126

Page 22: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 22

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Matter & Antimatter of Informatics : Machine and Anti Machine

+

CPU

- 1936 1st electronic computer (Konrad Zuse)

Machine paradigm: „von Neumann“

1946 v. N. machine paradigm

1971 1st microprocessor (Ted Hoff)

1979 „data streams“ (systolic array: Kung / Leiserson)

- DPU

+

Anti Machine paradigm

1990 anti machine paradigm published

1995 rDPA / DPSS (supersystolic: Rainer Kress)

novel

compilation

techniques

127 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

- DPU

(r)DPU

+

CPU

instruction stream

Matter vs. antimatter: CPU vs. DPU

-

progra

m

counter

DPU CPU

without sequencer

+

dat

a st

ream

(r)DPA

dat

a st

ream

s

+

+

128

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

heavy anti atoms: DPA = DPU array

- DPA

- DPU

- DPU

- DPU

- DPU

- DPU

- DPU

- DPU

- DPU

- DPU -

DPA

+

+

+

+

+

+

+

+

+

coher

ent

dat

a st

ream

s sp

inni

ng a

roun

d

129 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Parallelism by Concurrency

+ -

+

- -

+

- +

+

-

- +

- +

independent instruction streams difficult ...

130

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

131

Hetero Systems

• Hetero systems examples. co-compilation, co-design, interfacing (very short intro).

• Softw 2 configw-&-hardw migration:

• Software/configware/hardware partitioning

• The MORPHEUS project

• Mastering the Hardware / Software Chasm

• The Twin Paradigm Approach

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

132

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

CPU CPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

CPU CPU

Compilation for Dual Paradigm Multicore

SW compiler

CW compiler

C language source

Partitioner

Juergen Becker’s

CoDe-X, 1996

compile to hybrid multicore compile to hybrid multicore

placement and routing placement and routing

automatic parallelization by loop transformations automatic parallelization by loop transformations

generating a pipe network generating a pipe network

Page 23: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 23

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

133

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

134

Jürgen Becker’s CoDE-X -2 Co-Compiler

Analyzer / Profiler

GNU C compiler

para d igm Computer machine

DPSS

X-C compiler

Anti machine paradigm

Partitioner

X-C is C language extended by MoPL X-C

rDPU rDPU rDPU rDPU rDPU rDPU rDPU rDPU

rDPU rDPU rDPU rDPU rDPU rDPU rDPU rDPU

rDPU rDPU rDPU rDPU rDPU rDPU rDPU rDPU

rDPU rDPU rDPU rDPU rDPU rDPU rDPU rDPU CPU

Resource Parameters

supporting KressArray

family

Pipelining:

A Shorter Leap

Pipelining:

A Shorter Leap

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Configware resources: variable

The RAM-based domains

Flowware algorithm: variable

Configware domain Configware domain

algorithm: variable

resources: fixed

Software CPU

135

Software domain Software domain (von Neumann Machine)

Anti Machine:

Nic

k T

redenn

ick’s

Pe

rspect

ive

space domain space domain

(structure)

time domain time domain

(data streams)

time domain time domain

(instruction streams)

1 Program Source needed

2 Program Sources needed

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

136

Hetero Systems

• Hetero systems examples. co-compilation, co-design, interfacing (very short intro).

• Softw 2 configw-&-hardw migration:

• Software/configware/hardware partitioning

• The MORPHEUS project

• Mastering the Hardware / Software Chasm

• The Twin Paradigm Approach

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Loop Transformation Examples

loop 1-8 body body endloop

loop 1-8 body endloop

loop 9-16 body endloop

fork

join

strip mining

loop 1-4 trigger endloop

loop 1-2 trigger endloop

loop 1-8 trigger endloop

reconf.array: host: loop 1-16 body endloop

sequential processes: resource parameter driven Co-Compilation

loop unrolling

137 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

138

Anti-Matter (vs. Matter)

Paul Dirac (1928, Nobel prize 1933):

But in CS it’s really existing, it’s Reconfigurable Computing

only accelerator artefacts* (1955 – 1995)

Dichotomy Example:

Dirac: “But there are Asymmetries” but in CS: only one asymmetry !

hydrogen

(CERN 1995) anti hydrogen

*) except positron , found from cosmic radiation

*) except positron , found from cosmic radiation

Page 24: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 24

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern Double Dichotomy

1) Paradigm Dichotomy

2) Relativity Dichotomy

Procedure time

(Software-Domain)

Structure space

(Configware-Domain)

instruction stream von Neumann

(Software-Domain)

data stream Anti Machine

(Flowware-Domain)

139 139 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern Double Dichotomy (2)

1) Paradigm Dichotomy

2) Relativity Dichotomy

-Procedure time:

(Software-Domain)

-Structure space:

(Configware-Domain)

instruction stream von Neumann

(Software-Domain)

data stream Anti Machine

(Flowware-Domain)

140

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern Double Dichotomy (3)

2) Relativity Dichotomy

-Procedure time:

(Software-Domain)

-Structure space:

(Configware-Domain)

1) Paradigm Dichotomy

instruction stream von Neumann Machine

(Software-Domain) data stream Datastream Machine

(Flowware-Domain)

141 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

2) Relativity Dichotomy

time domain: space domain:

procedure domain structure domain

2 phases: 1) programming

instruction streams 2) run time

3 phases: 1) reconfiguration

of structures

time space

2) programming data streams

3) run time

142 von Neumann Machine Anti Machine

(time time/space)

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern Relativity Dichotomy (2)

time domain: space domain:

procedure domain structure domain

2 phases: 1) programming

instruction streams 2) run time

3 phases: 1) reconfiguration

of structures

time space

2) programming data streams

3) run time

143

von Neumann Machine Datastream Machine

(time time/space)

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

time-iterative to space-iterative

144

a time to

space/time

mapping

loop transformation methodogy: 70ies and later

n*k time steps, 1 CPU

n time steps, k DPUs

Often the space dimension is limited (e.g. because of the chip size) n time steps,

1 CPU

1 time step, n DPUs

a time to

space

mapping

e. g. example: bubble sort migration Strip mining

[D. Loveman, J-ACM, 1977]

Page 25: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 25

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

145

Hetero Systems

• Hetero systems examples. co-compilation, co-design, interfacing (very short intro).

• Softw 2 configw-&-hardw migration:

• Software/configware/hardware partitioning

• The MORPHEUS project

• Mastering the Hardware / Software Chasm

• The Twin Paradigm Approach

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

146

... (old stuff)

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

147

Hetero Systems

• Hetero systems examples. co-compilation, co-design, interfacing (very short intro).

• Softw 2 configw-&-hardw migration:

• Software/configware/hardware partitioning

• The MORPHEUS project

• Mastering the Hardware / Software Chasm

• The Twin Paradigm Approach

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Architektur statt Synchro

148

„Shuffle Sort“

conditional swap

conditional swap

conditional swap

conditional swap

Modifikation:

mit Shuffle-

Funktion

conditional swap

conditional swap

conditional swap

conditional swap

conditional swap

conditional swap

swap

conditional swap

conditional

Direkte Zeit-

Raum-Abbildung

Zugriffs-Konflikte

Bessere Architektur statt aufwendiger Synchronisation: Halbierung der Anzahl Blöcke + Auf und Ab der Daten (Shuffle Funktion) – kein von-Neumann-Syndrom !

Beispiel

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Zeit zu Raum Abbildung

Zeit-Domäne: Raum-Domäne:

Prozedur-Domäne Struktur-Domäne

149

Programmschleife n Zeitschritte, 1 CPU

Pipeline 1 Taktschritt, n DPUs

Bubble Sort n x k Zeitschritte,

1 „conditional swap“ unit

Shuffle Sort k Taktschritte, n „conditional swap“ units

Zeit-Algorithmus Raum-Algorithmus

conditional swap

x

y

conditional swap

conditional swap

conditional swap

conditional swap

Zeit-Algorithmus Raum- / Zeit-Algorithmus

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Transformationen seit den 70ern

Zeit-Domäne: Raum-Domäne:

Prozedur-Domäne Struktur-Domäne

150

Programmschleife n x k Zeitschritte,

1 CPU

Pipeline k Taktschritte, n DPUs

Zeit-Algorithmus Raum- / Zeit-Algorithmus

Strip Mining

Transformation

Schleifentransformationen: reichhaltige Methodik publiziert Hier ein Beispiel: [Übersicht: Diss.

Karin Schmidt, 1994, Shaker Verlag]

Page 26: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 26

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

time-iterative to space-iterative

151

a time to

space/time

mapping

loop transformation methodogy: 70ies and later

n*k time steps, 1 CPU

n time steps, k DPUs

Often the space dimension is limited n time steps, 1 CPU

1 time step, n DPUs

a time to

space

mapping

e. g. example: bubble sort migration Strip mining

[D. Loveman, J-ACM, 1977]

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

We need to POIIP for:

152

2 simple key rules of thumb:

a) loop turns into pipeline

Software to Hardware Migration

Software to Configware Migration and

b) decision box turns

into demultiplexer

rDPU rDPU loop body

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU loop body

Put Old ideas into Practice

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

153

Hetero Systems

• Hetero systems examples. co-compilation, co-design, interfacing (very short intro).

• Softw 2 configw-&-hardw migration:

• Software/configware/hardware partitioning

• The MORPHEUS project

• Mastering the Hardware / Software Chasm

• The Twin Paradigm Approach

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

154

... (may be)

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

155

Book Outline

• Introduction • Instruction Stream Machines • Data Stream Machines • Hetero Systems • Time to Space Mapping

• Application Examples • Outlook

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Dual-rail education

156

von Neumann

machine

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

CPU CPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

rDPU rDPU

datastream

machine

Software domain Configware domain

Conventional

Computing

Reconfigurable

Computing

Time domain Space domain 1)

2)

program

counter

CPU CPU data

counter

(r)DPA (r)DPA s s

Dic

hoto

my

#

migration, interfacing, partitioning

„The importance of not separating in your mind the notions of hardware and software“

Yale Patt:

Page 27: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 27

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

157

Book Outline

• Introduction • Instruction Stream Machines • Data Stream Machines • Hetero Systems • Time to Space Mapping • Application Examples

• Outlook

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

158

Application Exampless

• survey on speed-up-factor publications automotive, genome sequencing, image processing, what else ?

• elevator example or traffic light example

• cryptography • automotive

• bioinformatics • image processing

• genome sequencing • digital signal processing

• information science • … your proposals …..

• bubble sort to shuffle sort migration example

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

159

... (old stuff)

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

160

Book Outline

• Introduction • Instruction Stream Machines • Data Stream Machines • Hetero Systems • Time to Space Mapping • Application Examples • Outlook

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

161

thank you for your patience

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

162

END

Page 28: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 28

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

extra pages

for discussion:

163 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

164

FPGAs in Supercomputing

• Synergisms: coarse-grained parallelism through conventional parallel processing,

reconfigurable logic box: 1 Bit

• and: fine-grained parallelism through direct configware execution on the FPGAs

DPU program counter

DPU CPU CPU

DataPath Units 32 Bit, 64 Bit

(millions of rLBs embedded in a reconfigurable interconnect fabrics)

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

165

Have to re-think basic assumptions

Instead of physical limits, fundamental misconceptions of algorithmic complexity theory limit the progress and will necessitate new breakthroughs.

Not processing is costly, but moving data and messages

We’ve to re-think basic assumptions behind computing

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

166

„It is feared that domain scientists will have to learn how to design hardware. Can we avoid the need for hardware design skills and understanding?“

Avoiding the paradigm shift?

Tarek El-Ghazawi, panelist at SuperComputing 2006

SuperComputing, Nov 11-17, 2006, Tampa, Florida, over 7000 registered attendees, and 274 exhibitors

We need a bridge strategy by developing new text books and advanced tools for training the software community

We need a bridge strategy by developing advanced tools for training the software community to think in fine grained parallelism and pipelining techniques.

166

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

167

PISA-MoM

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

168

Processing 4-by-4 Reference Patterns

Mead-&-Conway nMOS Design Rules: 256 4-by-4 reference patterns

Mead-&-Conway CMOS Design Rules: >800 4-by-4 reference patterns

MoM: all reference patterns matched in a single clock cycle

vN Software: some reference patterns can be skipped, depending on earlier patterns

DPLA: fabricated by the

E.I.S. Multi University Project:

PISA DRC accelerator [ICCAD 1984]

1984: 1 DPLA replaces 256 FPGAs

Reference patterns automatically generated from Design Rules

PISA: a forerunner

of the MoM

accelerator reconfigurable

Page 29: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 29

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

169

Acceleration Mechanisms by ASM-based MoMSW

•parallelism by multi bank memory architecture •reconfigurable address compuattion – before run time

•avoiding multiple accesses to the same data.

•avoiding memory cycles for address computation

• improve parallelism by storage scheme transformations

•minimize data movement across chip boundaries

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

170

ASM

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

171

171 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

172

Morphware: old stuff

structural programming (non-v-N)

1971 PROMs for small logic

1984 first Xilinx FPGA

1975 PLA

1978 PAL with PALASM tool

meanwhile mainstream …

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern We need to POIIP for:

173

2 key rules of thumb - terrifically simple:

1) loop turns into pipeline [1979]

Software to Hardware Migration:

Software to Configware Migration: and

2) decision box turns into demultiplexer

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern (2) We need to POIIP for:

174

2 key rules of thumb - terrifically simple:

1) loop turns into pipeline [1979]

2) decision box turns into demultiplexer

[1968]: PvOIIP

Software to Hardware Migration:

Software to Configware Migration: and

Page 30: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 30

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

POIIP: Loop turns into pipeline

175

[1979]

(reconfigurable)

DataPath Unit:

rDPU rDPU loop body

rDPU rDPU

rDPU rDPU

rDPU rDPU

Pipeline:

rDPU rDPU loop body

loop:

complex loop body

nested loops

complex rDPU or pipe network inside rDPU

complex pipe network

CPU CPU

Memory Memory

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Brick Wall in the Brain

176

immediately* a VIP jumps up: „But you can‘t implement decisions!“

Embarrassing: a top level R&D manager of a global IT corp. group

*) discussion after the talk: RAW at Orlando, FLA

completely missing

sense of Dichotomies

structural procedural

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Computational Density vs. Code Size SPECfp2000/Billion-Transistors PowerPC 1998: 72 2003: 5 SUN: 1985: 105 2000: 30 1990: 17 alpha: 1992: 550 1999: 10

http://www.nytimes.com/2006/03/27/technology/27soft.html

1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000

109

108

107

106

105

104

103

102

101

100

2002

7750

4004/50 single-board computer

8080 6000

360

360 360

7090

7070

704

650

650

Windows lines of code Windows 95: 15 M Windows 98: 18 M Windows XP (’01): 35 M Windows Vista (’07): 50 M

2005 2010

MOPS/MHz/Mio Transistors Pentium: 1996: 0,43 2000: 0,06

Computational density SPECfp2000/BillionTransistors

MOPS/MHz/ Mio Transistors

Pentium 4 125,000,000

other intel µP

[BWRC, UC Berkeley, 2004]

177 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

ICT is at an inflection point

Senior Counselor to the U.S. Trade Representative (USTR) on strategy and negotiations.

Cheap Revolution:

„Broadband is significant at the inflection point, prompting major market governance changes“

massive funding needed Cowhey‘s & Aronson‘s Law:

affordable broadband software performance

„Future prosperity depends on network capacity, ..., efficient pricing, and flexible platforms“

e.g., the living room commercially more important than the comparatively small PC market.

massive

funding

required

178

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Coarse-grained vs. fine-grained

device granularity path width eff’ve density flexibility

FPGA fine-grained ~ 1 bit very low general purpose

DPA coarse-grained multi bit:

e.g. 32 bits very high

specialized

rDPA coarse-grained domain-

specific platform

FPGA

fine-grained &

embedded hdw. mixed high

179 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

180

MoM

Page 31: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 31

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

MoM anti machine an Xputer architecture

Multiple Scan Windows

data

counter

memory bank

asM

asM

asM

asM

asM

asM

......

asM

A d

istr

ibut

ed m

em

ory

rDPU smart

memory interface

example: 4x4 scan windows

.....

181 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

16 point CGFFT: mapped onto 2-D memory space

182

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

outp

ut

temp

temp

temp

coeff

.

coeff

.

coeff

.

CGFFT: Nested and Parallel Scan Pattern

inpu

t

coeff

.

ini

ini+1

coeff. empty

MAC

183 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

CGFFT: Parallel Scan Pattern Animation

ini

ini+1

coeff. empty

outk

MAC

outj

184

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

CGFFT: Parallel Scan Pattern Animation

MAC

outj

outj+1

outk

outk+1

ini

ini+1

coeff. empty

Ini+2

ini+3

coeff. empty

MAC

4 MAC units

in parallel

8 MAC units

in parallel

185 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern CGFFT: Nested and Parallel Scan Pattern

scanouter loop

patternHLScan is 3 steps [2, 0]

SP1 is 7 steps [0, 2]

SP23 is 7 steps [0, 1]

inner loopcompound

scanpatterns

3 in parallel

goto

186

Page 32: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 32

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

>>> distributed memory

(application-specific)

distributed memory

The new discipline of

[] Herz et al.: proc. IEEE ICECS 2002

187 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

188

GAG

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

GAG generic address generator Scheme

Base Slider

B0

Limit Slider

L0

0 B

[

Address Stepper

DA

A

D A

| | | |

L

]

limit

all 3 are copies of the same BSU

stepper circuit GAG

189 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

GAG Slider Model

LimitStepper

BaseStepper

AddressStepper

B0DAL0

A

LimitStepper

BaseStepper

AddressStepper

B0DAL0

A

sliders

B 0 B

[

0 L

]

0 L 0

B 0 B

[

0 A D

A D

L

]

0 L 0

GAG

Generic Address

Generator

floor ceiling

190

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern GAG: Address Stepper

GAG =

Address

Generator

Generic

+ / –

Escape

Clause End

Detect

Step Counter

=o

L A D A

init tag

A

Address endExec

maxStepCount

0 B Limit Base stepVector

[ ] | |

D A L B 0

[ ] | | | |

limit

GAG: Address Stepper

191 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

192

GAG Complex Sequencer Implementation

Limit Slider

Base Slider

GAU

Address Stepper

B0 DA L0

A

all `been published

in 1990

Limit Slider

Base Slider

GAU

Address Stepper

B0 DA L0

A

Limit Slider

Base Slider

GAU

Address Stepper

B0 DA L0

A

GAU GAU

GAG

Generic Address Generator

SDS

GAG

VLIW

stack

Page 33: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 33

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

193

Generic Address Generator GAG

Generalization of the DMA

data counter

GAG

GAG & enabling technology published 1989, survey: [M. Herz et al.: IEEE ICECS 2003, Dubrovnik]

patented by TI 1995

• storge scheme optimization methodology, etc.

Acceleration factors by:

• address computation without memory cycles

*) Software to Xputer migration

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

194

miscellaneous

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Dave Patterson‘s Curve .....

1

10

100

1000 Performance

1980 1990 2000

µProc:

60%/yr..

DRAM

7%/yr..

Processor-Memory

Performance Gap:

(grows 50% / year)

DRAM

CPU

Other problems:

• routing congestion,

•memory cycle overhead

... is not the only problem

195 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

196

Future Trends in Microelectronics

• This especially holds for research funding and politics in Europe [R. H.][R. H.]

• Organizations usually behave poorer than anyone can predict [Gordon Bell][Gordon Bell]

•• Predictions require some history Predictions require some history [Gordon Bell][Gordon Bell]

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

197

The new paradigm: how the data are traveling

An old hat: transport-triggered

pipeline, or chaining

super systolic array

better not by instruction execution better not by instruction execution

DPU DPU DPU

vN Move Processor instruction-driven

+ instruction-driven

[Jack Lipovski, EUROMiCRO, Nice, 1975]

P&R: move locality of operation, not data !

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

198

Xputer Principles

addr. generators reconfigurable

Data Path reconfigurable

Xputer

CPU We used the VAX-11/750 of my group

DPLA

rALU

contemporary ?

1984: first FPGAs: very tiny & very expensive

ASM ASM

Page 34: Reconfigurable Supercomputing means to brave the paradigm ... · in modern computer history. The VLSI Revolution educational framework to create designer & researcher population 22

[email protected]

21 May 2013

IEEE Computer Society EAB November 2005 at Philadelphia 34

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

HPC experts coming ...

Simulation of Star Clusters: x10 speed-up by supercomputer-to-morphware migration

(also molecular biology et al.)

Rainer Spurzem, University of Heidelberg

Reinhard Maenner, University of Mannheim HPC pioneer since 1976 (Physics Dept Heidelberg)

Configware by

Astrophysics by

ARI, Astrononisches Rechen-Institut, founded 1700 in Berlin, moved 1945 to Heidelberg by August Kopff

Gottfried Kirch

August Kopff

example: N-body problem going configware [FPL 1999]

199 © 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Curriculum recommendations

Real-Time Systems (Sweden)

Recommendations to design new ICT curricula

ACM/AIS/IEEE Computing Curricula recommendations

Chess - Center for Hybrid and Embedded Software Systems

(courses in embedded systems)

(EU) Graduate Curriculum on Embedded Software and Systems

Advanced Real Time Systems

Workshop on Embedded Systems Education

WESE

a variety of conference series

CSEE&T Conference on Software Engineering Education and Training

Int‘l Conf. on Microelectronics System Education

Workshop on Computer Architecture Education

200

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

• abstracting away the space domain

201

reductionist approach: reductionist approach:

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

202

Conclusions (1)

We massively need programmable accelerator co-processors

Established technologies are available and we can still use standard software and their tools

Configware skills and basic hardware knowledge

are essential qualifications for programmers.

We need a massive Migration of Software to Configware.

The implementation wall: the programmer population‘s unsustainable skills mismatches

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

203

Conclusions (2)

CS education is a monster !

Yaw-dropping sclerosis of curriculum taskforces

We need a complete re-definition of CS education

We urgently need a fundamental Research and

Education Revolution with connected thinking

Fully wrong educational mainstream approaches

We need „une' Levée en Masses“

© 2010, [email protected] http://hartenstein.de

TU Kaiserslautern

Applications TOPICS repository

Software Engineering: Component Technologies; Testing; Empirical Studies; Design Recovery and Documentation; Formal Methods;

Software Design; Software Process; Program Analysis; Software Architecture; Software Understanding; Consistency Management and

Quality Assurance; Case Studies; Automotive Software Engineering; Process Analysis and Improvement; Process and Tools; Testing and

Fault Correction; Extreme Programming; Undergraduate Education; Course Delivery and Evaluation; Process and Methodology

Parallel and Distributed Systems' Architectures: design, analysis, and implementation of multiple-processor systems (including multi-

processors, multicomputers, and networks); impact of VLSI on system design; interprocessor communications;

Parallel and Distributed Systems' Software: parallel languages and compilers; scheduling and task partitioning; databases, operating

systems, and programming environments for multiple-processor systems;

Parallel and Distributed Systems' Algorithms and applications: models of computation; analysis and design of parallel/distributed

algorithms; application studies resulting in better multiple-processor systems;

Other issues on Parallel and Distributed Systems'' and Software Engineering: performance measurements, evaluation, modeling and

simulation of multiple-processor systems; real-time, reliability and fault-tolerance issues; sequential-to-parallel forms conversion of software.

Algorithms: Data Base and Data Mining; Wireless Networks; Parallel and Network Computing; Routing, Sorting and Clustering;

Interconnection Networks; Routing and Scheduling; Compilers; Architecture and Systems; Routing Algorithms; Sorting and Routing

The American Recovery and Reinvestment Act of 2009 (ARRA, or “the stimulus”) totaled approximately $787 billion. Of this, approximately $21.5 billion (2.7 percent) was for the support of R&D—$18 billion for the conduct of research and $3.5 billion for facilities and equipment.