An Automated Development Framework for a RISC Processor with Reconfigurable Instruction Set Extensions Nikolaos Vassiliadis, George Theodoridis and Spiridon

An Automated Development Framework for a RISC Processor with An Automated Development Framework for a RISC Processor with Reconfigurable Instruction Set ExtensionsReconfigurable Instruction Set Extensions

Nikolaos Vassiliadis, George Theodoridis and Spiridon NikolaidisNikolaos Vassiliadis, George Theodoridis and Spiridon NikolaidisSection of Electronics and Computers, Department of Physics,Section of Electronics and Computers, Department of Physics,

Aristotle University of Thessaloniki, GreeceAristotle University of Thessaloniki, GreeceE-mail: [email protected]: [email protected] AUTH

IntroductionIntroduction

Broad diversity of algorithms Rapid evolution of standards High-performance demands

Characteristics of modern applications

Requirements to amortize cost over high production volumes High levels of flexibility => fast time-to-market Adaptability => increased reusability

An appealing option: Reconfigurable Instruction Set Processor (RISP) Couple a standard processor with Reconfigurable Hardware (RH) Processor = bulk of the flexibility RH = adaptation to the targeted application through dynamic

Instruction Set Extensions (ISEs)

MotivationMotivationNew features in order to program such hybrid architectures Partition the application between the RH and the processor Configure the RH and feed the compiler with RH related

information (e.g. execution and reconfiguration latencies) Introduce extra code to pass data and control directives to and from

the RH

Transparent incorporation of these new features is a must In order to preserve the time-to-market close to that of the

traditional software design flow and continue to target software-oriented groups of users with small

or no experience in hardware design

ObjectiveObjectiveDesign a development framework for a dynamic Reconfigurable Instruction Set Processor (RISP) Fully automated

Hides all RH related issues requiring limited interaction with the user

Appropriate for exploration and fine-tune of the architecture towards a target applicationAllow different values for various architectural parametersRetargetable to different instances of the architecture

CONTROL LOGIC(Core &

Coupling)

REG

ISTER FILE

ALU

MU

X

PIPELINE R

EGISTER

PIPELINE R

EGISTER

PIPELINE R

EGISTER

PIPELINE R

EGISTER

MUL

SHIFT DATA MEMORY

CORE / RFU INTERFACE

PROCESSING & INTERCONNECT LAYERSCONFIGURATION LAYER

WRITE BACK DATA

CONTROL SIGNALS

I_WORD

OPERANDS

1ST STAGE RESULT

2ND STAGE RESULT

R OpCode

STATUS SIGNALS

CONFIGURATION BITS

CORERFU

IF Stage ID Stage EX Stage MEM Stage WB Stage

Target ArchitectureTarget ArchitectureCore processor Single issue, 5 Pipeline stages, 32-bit RISC Can issue one reconfigurable instruction per

execution cycle Reconfigurable instructions explicitly

encoded in the instruction word

Interface Provides control and data communication

Reconfigurable Functional Unit (RFU) Tightly coupled to the core’s pipeline Provides reconfigurable ISE ISE = MISOs of the cores primitive

instructionsMISO: Multiple-Input-Single-

Output

Multi-context config. mem

Can provide a different config. bitstream per execution cycle

1-D array of coarse-grain PEs

“Floating” between two concurrent pipeline stages

Exploit spatial/temporal computation

ConclusionsConclusions An automated development framework for a target RISP has been introduced The framework can be used both for fine-tune an instance of the RISP at design

time but also to program it after fabrication The framework hides all reconfigurable hardware related issues from the user The target RISP architecture can be used by any software-oriented user with no

grip of hardware design Support of a new architecture instance by the framework is possible without any

modification

AcknowledgementAcknowledgementThis work was supported by the General Secretariat of Research and Technology of Greece and the European Union.

Framework usage demonstration - Experimental ResultsFramework usage demonstration - Experimental Results

Speedup vs. number of PEsSpeedup vs. number of PEs Speedup vs. PEs/MISO inputsSpeedup vs. PEs/MISO inputs Speedup vs. number of Speedup vs. number of reconfigurable instructionsreconfigurable instructions

Code size and instruction Code size and instruction fetches reductionfetches reduction

16 words of 134 bitsConfig. Memory Size

ALU, Shift, MultiplyPEs Functionality8Num of PEs

32-bitsGranularityValueConfiguration

Exploration for fine-tuning of critical architecture parameters

A set of benchmarks derived from different benchmarking suites like MiBench and Powerstone were considered The possibilities of using the framework for fine-tune critical architectural parameters and/or to program/evaluate a

specific instance of the architecture are demonstrated Comparisons results were performed considering the RISC processor of the architecture with and without support

of the RFU unit

Configuration of an evaluation Configuration of an evaluation instanceinstance Evaluation of the derived

instance

0

20

40

60

80

100Code Size Instruction Fetches

0

1

2

3

4

58 Instr 16 Instr 32 Instr 64 Instr All Instr

0

1

2

3

4

5Max PEs 10 PEs 8 PEs 6 PEs 4PEs

0

1

2

3

4

5Max PEs-Max Inputs Max PEs-4 Inputs 8 PEs-4Inputs

x2.91 avg. speedup 38% avg. code size

reduction => less memory requirements

62% avg. instruction memory accesses reduction => significant power savings

Development Framework FlowDevelopment Framework FlowC/C++

Front-EndMachSUIF

Optimized IR in CDFG form

Basic Block Profiling Results

Instruction Selection

Back-EndStatistics

Executable Code

Profiling

Instrumentation

m2cInstr.Gen.

Pattern Gen.

Mapping

Instr. Extens.

User Defined Parameters

Instrument the CDFG with profiling annotations

Convert CDFG to equivalent C code

Compile and execute

Pattern generation to identify MISO cluster of operations

MappingAssign operations PEs / Route the 1-D arrayAnalyze data paths to minimize delayReport candidate instruction semantics

Maximum number of inputs in the pattern (i.e. # of reg.file read ports)

Permitted types of operations in the pattern (i.e. PE functionality)

Maximum number of operations in the patterns (i.e. # of PEs)

Number of different ISEs

Graph isomorphism to discover identical instructions

Rank instructions based on the estimated offered speedup

Select the best ISEs

Documents

An Automated Development Framework for a RISC Processor with Reconfigurable Instruction Set Extensions Nikolaos Vassiliadis, George Theodoridis and Spiridon