1
An Automated Development Framework for a RISC Processor An Automated Development Framework for a RISC Processor with Reconfigurable Instruction Set Extensions with Reconfigurable Instruction Set Extensions Nikolaos Vassiliadis, George Theodoridis and Spiridon Nikolaos Vassiliadis, George Theodoridis and Spiridon Nikolaidis Nikolaidis Section of Electronics and Computers, Department of Physics, Section of Electronics and Computers, Department of Physics, Aristotle University of Thessaloniki, Greece Aristotle University of Thessaloniki, Greece E-mail: [email protected] E-mail: [email protected] AUTH Introduction Introduction Broad diversity of algorithms Rapid evolution of standards High-performance demands Characteristics of modern applications Requirements to amortize cost over high production volumes High levels of flexibility => fast time-to-market Adaptability => increased reusability An appealing option: Reconfigurable Instruction Set Processor (RISP) Couple a standard processor with Reconfigurable Hardware (RH) Processor = bulk of the flexibility RH = adaptation to the targeted application through dynamic Instruction Set Extensions (ISEs) Motivation Motivation New features in order to program such hybrid architectures Partition the application between the RH and the processor Configure the RH and feed the compiler with RH related information (e.g. execution and reconfiguration latencies) Introduce extra code to pass data and control directives to and from the RH Transparent incorporation of these new features is a must In order to preserve the time-to-market close to that of the traditional software design flow and continue to target software-oriented groups of users with small or no experience in hardware design Objective Objective Design a development framework for a dynamic Reconfigurable Instruction Set Processor (RISP) Fully automated Hides all RH related issues requiring limited interaction with the user Appropriate for exploration and fine-tune of the architecture towards a target application Allow different values for various architectural parameters Retargetable to different instances of the architecture CONTROL LOGIC (C ore & C oupling) REG ISTER FILE ALU MUX PIPELIN E R EG ISTER PIPELIN E R EG ISTER PIPELIN E R EG ISTER PIPELIN E R EG ISTER MUL SH IFT DATA MEMORY CO RE /RFU INTERFACE PRO CESSING & IN TER C O NNECT LAYERS CO NFIG URATIO N LAYER WRITE BACK DATA CONTROL SIG N A LS I_W ORD OPERANDS 1 ST STAGE RESULT 2 ND STAGE RESULT R O pCode STATUS SIG N A LS CO NFIG URATIO N BITS CORE RFU IF Stage ID Stage EX Stage M EM Stage W B Stage Target Architecture Target Architecture Core processor Single issue, 5 Pipeline stages, 32-bit RISC Can issue one reconfigurable instruction per execution cycle Reconfigurable instructions explicitly encoded in the instruction word Interface Provides control and data communication Reconfigurable Functional Unit (RFU) Tightly coupled to the core’s pipeline Provides reconfigurable ISE ISE = MISOs of the cores primitive instructions MISO: Multiple-Input- Single-Output Multi- context config. mem Can provide a different config. bitstream per execution 1-D array of coarse-grain PEs “Floating” between two concurrent pipeline stages Exploit spatial/temporal computation Conclusions Conclusions An automated development framework for a target RISP has been introduced The framework can be used both for fine-tune an instance of the RISP at design time but also to program it after fabrication The framework hides all reconfigurable hardware related issues from the user The target RISP architecture can be used by any software-oriented user with no grip of hardware design Support of a new architecture instance by the framework is possible without any modification Acknowledgement Acknowledgement This work was supported by the General Secretariat of Research and Technology of Greece and the European Union. Framework usage demonstration - Experimental Framework usage demonstration - Experimental Results Results Speedup vs. number of Speedup vs. number of PEs PEs Speedup vs. PEs/MISO Speedup vs. PEs/MISO inputs inputs Speedup vs. number of Speedup vs. number of reconfigurable reconfigurable instructions instructions Code size and Code size and instruction fetches instruction fetches reduction reduction 16 words of 134 bits Config. Memory Size ALU, Shift, Multiply PEs Functionality 8 Num of PEs 32-bits Granularity Value Configuration Exploration for fine-tuning of critical architecture parameters A set of benchmarks derived from different benchmarking suites like MiBench and Powerstone were considered The possibilities of using the framework for fine-tune critical architectural parameters and/or to program/evaluate a specific instance of the architecture are demonstrated Comparisons results were performed considering the RISC processor of the architecture with and without support of the RFU unit Configuration of an evaluation Configuration of an evaluation instance instance Evaluation of the derived instance 0 20 40 60 80 100 C ode S ize Instruction Fetches 0 1 2 3 4 5 8 Instr 16 Instr 32 Instr 64 Instr A llInstr 0 1 2 3 4 5 M ax PEs 10 P E s 8 P E s 6 P E s 4PEs 0 1 2 3 4 5 M ax P E s-M ax Inputs M ax P E s-4 Inputs 8 P E s-4Inputs x2.91 avg. speedup 38% avg. code size reduction => less memory requirements 62% avg. instruction memory accesses reduction => significant power savings Development Framework Flow Development Framework Flow C/C++ Front-End M achSU IF O ptim ized IR in C D FG form Basic B lock Profiling R esults Instruction Selection B ack-End Statistics Executable C ode Profiling Instrum entation m 2c Instr.G en. Pattern G en. M apping Instr. Extens. U serD efined Param eters Instrument the CDFG with profiling annotations Convert CDFG to equivalent C code Compile and execute Pattern generation to identify MISO cluster of operations Mapping Assign operations PEs / Route the 1-D array Analyze data paths to minimize delay Report candidate instruction semantics Maximum number of inputs in the pattern (i.e. # of reg.file read ports) Permitted types of operations in the pattern (i.e. PE functionality) Maximum number of operations in the patterns (i.e. # of PEs) Number of different ISEs Graph isomorphism to discover identical instructions Rank instructions based on the estimated offered speedup Select the best ISEs

An Automated Development Framework for a RISC Processor with Reconfigurable Instruction Set Extensions Nikolaos Vassiliadis, George Theodoridis and Spiridon

Embed Size (px)

Citation preview

Page 1: An Automated Development Framework for a RISC Processor with Reconfigurable Instruction Set Extensions Nikolaos Vassiliadis, George Theodoridis and Spiridon

An Automated Development Framework for a RISC Processor with An Automated Development Framework for a RISC Processor with Reconfigurable Instruction Set ExtensionsReconfigurable Instruction Set Extensions

Nikolaos Vassiliadis, George Theodoridis and Spiridon NikolaidisNikolaos Vassiliadis, George Theodoridis and Spiridon NikolaidisSection of Electronics and Computers, Department of Physics,Section of Electronics and Computers, Department of Physics,

Aristotle University of Thessaloniki, GreeceAristotle University of Thessaloniki, GreeceE-mail: [email protected]: [email protected] AUTH

IntroductionIntroduction

Broad diversity of algorithms Rapid evolution of standards High-performance demands 

Characteristics of modern applications

Requirements to amortize cost over high production volumes High levels of flexibility => fast time-to-market Adaptability => increased reusability

An appealing option: Reconfigurable Instruction Set Processor (RISP) Couple a standard processor with Reconfigurable Hardware (RH) Processor = bulk of the flexibility RH = adaptation to the targeted application through dynamic

Instruction Set Extensions (ISEs)

MotivationMotivationNew features in order to program such hybrid architectures Partition the application between the RH and the processor Configure the RH and feed the compiler with RH related

information (e.g. execution and reconfiguration latencies) Introduce extra code to pass data and control directives to and from

the RH

Transparent incorporation of these new features is a must In order to preserve the time-to-market close to that of the

traditional software design flow and continue to target software-oriented groups of users with small

or no experience in hardware design

ObjectiveObjectiveDesign a development framework for a dynamic Reconfigurable Instruction Set Processor (RISP) Fully automated

Hides all RH related issues requiring limited interaction with the user

Appropriate for exploration and fine-tune of the architecture towards a target applicationAllow different values for various architectural parametersRetargetable to different instances of the architecture

CONTROL LOGIC(Core &

Coupling)

REG

ISTER FILE

ALU

MU

X

PIPELINE R

EGISTER

PIPELINE R

EGISTER

PIPELINE R

EGISTER

PIPELINE R

EGISTER

MUL

SHIFT DATA MEMORY

CORE / RFU INTERFACE

PROCESSING & INTERCONNECT LAYERSCONFIGURATION LAYER

WRITE BACK DATA

CONTROL SIGNALS

I_WORD

OPERANDS

1ST STAGE RESULT

2ND STAGE RESULT

R OpCode

STATUS SIGNALS

CONFIGURATION BITS

CORERFU

IF Stage ID Stage EX Stage MEM Stage WB Stage

Target ArchitectureTarget ArchitectureCore processor Single issue, 5 Pipeline stages, 32-bit RISC Can issue one reconfigurable instruction per

execution cycle Reconfigurable instructions explicitly

encoded in the instruction word

Interface Provides control and data communication 

Reconfigurable Functional Unit (RFU) Tightly coupled to the core’s pipeline Provides reconfigurable ISE ISE = MISOs of the cores primitive

instructionsMISO: Multiple-Input-Single-

Output

Multi-context config. mem

Can provide a different config. bitstream per execution cycle

1-D array of coarse-grain PEs

“Floating” between two concurrent pipeline stages

Exploit spatial/temporal computation

ConclusionsConclusions An automated development framework for a target RISP has been introduced The framework can be used both for fine-tune an instance of the RISP at design

time but also to program it after fabrication The framework hides all reconfigurable hardware related issues from the user The target RISP architecture can be used by any software-oriented user with no

grip of hardware design Support of a new architecture instance by the framework is possible without any

modification

AcknowledgementAcknowledgementThis work was supported by the General Secretariat of Research and Technology of Greece and the European Union.

Framework usage demonstration - Experimental ResultsFramework usage demonstration - Experimental Results

Speedup vs. number of PEsSpeedup vs. number of PEs Speedup vs. PEs/MISO inputsSpeedup vs. PEs/MISO inputs Speedup vs. number of Speedup vs. number of reconfigurable instructionsreconfigurable instructions

Code size and instruction Code size and instruction fetches reductionfetches reduction

16 words of 134 bitsConfig. Memory Size

ALU, Shift, MultiplyPEs Functionality8Num of PEs

32-bitsGranularityValueConfiguration

Exploration for fine-tuning of critical architecture parameters

A set of benchmarks derived from different benchmarking suites like MiBench and Powerstone were considered The possibilities of using the framework for fine-tune critical architectural parameters and/or to program/evaluate a

specific instance of the architecture are demonstrated Comparisons results were performed considering the RISC processor of the architecture with and without support

of the RFU unit

Configuration of an evaluation Configuration of an evaluation instanceinstance Evaluation of the derived

instance

0

20

40

60

80

100Code Size Instruction Fetches

0

1

2

3

4

58 Instr 16 Instr 32 Instr 64 Instr All Instr

0

1

2

3

4

5Max PEs 10 PEs 8 PEs 6 PEs 4PEs

0

1

2

3

4

5Max PEs-Max Inputs Max PEs-4 Inputs 8 PEs-4Inputs

x2.91 avg. speedup 38% avg. code size

reduction => less memory requirements

62% avg. instruction memory accesses reduction => significant power savings

Development Framework FlowDevelopment Framework FlowC/C++

Front-EndMachSUIF

Optimized IR in CDFG form

Basic Block Profiling Results

Instruction Selection

Back-EndStatistics

Executable Code

Profiling

Instrumentation

m2cInstr.Gen.

Pattern Gen.

Mapping

Instr. Extens.

User Defined Parameters

Instrument the CDFG with profiling annotations

Convert CDFG to equivalent C code

Compile and execute 

Pattern generation to identify MISO cluster of operations

MappingAssign operations PEs / Route the 1-D arrayAnalyze data paths to minimize delayReport candidate instruction semantics

Maximum number of inputs in the pattern (i.e. # of reg.file read ports)

Permitted types of operations in the pattern (i.e. PE functionality)

Maximum number of operations in the patterns (i.e. # of PEs)

Number of different ISEs

Graph isomorphism to discover identical instructions

Rank instructions based on the estimated offered speedup

Select the best ISEs