Upload
chloe-mcgee
View
222
Download
0
Embed Size (px)
Citation preview
An Automated Development Framework for a RISC Processor with An Automated Development Framework for a RISC Processor with Reconfigurable Instruction Set ExtensionsReconfigurable Instruction Set Extensions
Nikolaos Vassiliadis, George Theodoridis and Spiridon NikolaidisNikolaos Vassiliadis, George Theodoridis and Spiridon NikolaidisSection of Electronics and Computers, Department of Physics,Section of Electronics and Computers, Department of Physics,
Aristotle University of Thessaloniki, GreeceAristotle University of Thessaloniki, GreeceE-mail: [email protected]: [email protected] AUTH
IntroductionIntroduction
Broad diversity of algorithms Rapid evolution of standards High-performance demands
Characteristics of modern applications
Requirements to amortize cost over high production volumes High levels of flexibility => fast time-to-market Adaptability => increased reusability
An appealing option: Reconfigurable Instruction Set Processor (RISP) Couple a standard processor with Reconfigurable Hardware (RH) Processor = bulk of the flexibility RH = adaptation to the targeted application through dynamic
Instruction Set Extensions (ISEs)
MotivationMotivationNew features in order to program such hybrid architectures Partition the application between the RH and the processor Configure the RH and feed the compiler with RH related
information (e.g. execution and reconfiguration latencies) Introduce extra code to pass data and control directives to and from
the RH
Transparent incorporation of these new features is a must In order to preserve the time-to-market close to that of the
traditional software design flow and continue to target software-oriented groups of users with small
or no experience in hardware design
ObjectiveObjectiveDesign a development framework for a dynamic Reconfigurable Instruction Set Processor (RISP) Fully automated
Hides all RH related issues requiring limited interaction with the user
Appropriate for exploration and fine-tune of the architecture towards a target applicationAllow different values for various architectural parametersRetargetable to different instances of the architecture
CONTROL LOGIC(Core &
Coupling)
REG
ISTER FILE
ALU
MU
X
PIPELINE R
EGISTER
PIPELINE R
EGISTER
PIPELINE R
EGISTER
PIPELINE R
EGISTER
MUL
SHIFT DATA MEMORY
CORE / RFU INTERFACE
PROCESSING & INTERCONNECT LAYERSCONFIGURATION LAYER
WRITE BACK DATA
CONTROL SIGNALS
I_WORD
OPERANDS
1ST STAGE RESULT
2ND STAGE RESULT
R OpCode
STATUS SIGNALS
CONFIGURATION BITS
CORERFU
IF Stage ID Stage EX Stage MEM Stage WB Stage
Target ArchitectureTarget ArchitectureCore processor Single issue, 5 Pipeline stages, 32-bit RISC Can issue one reconfigurable instruction per
execution cycle Reconfigurable instructions explicitly
encoded in the instruction word
Interface Provides control and data communication
Reconfigurable Functional Unit (RFU) Tightly coupled to the core’s pipeline Provides reconfigurable ISE ISE = MISOs of the cores primitive
instructionsMISO: Multiple-Input-Single-
Output
Multi-context config. mem
Can provide a different config. bitstream per execution cycle
1-D array of coarse-grain PEs
“Floating” between two concurrent pipeline stages
Exploit spatial/temporal computation
ConclusionsConclusions An automated development framework for a target RISP has been introduced The framework can be used both for fine-tune an instance of the RISP at design
time but also to program it after fabrication The framework hides all reconfigurable hardware related issues from the user The target RISP architecture can be used by any software-oriented user with no
grip of hardware design Support of a new architecture instance by the framework is possible without any
modification
AcknowledgementAcknowledgementThis work was supported by the General Secretariat of Research and Technology of Greece and the European Union.
Framework usage demonstration - Experimental ResultsFramework usage demonstration - Experimental Results
Speedup vs. number of PEsSpeedup vs. number of PEs Speedup vs. PEs/MISO inputsSpeedup vs. PEs/MISO inputs Speedup vs. number of Speedup vs. number of reconfigurable instructionsreconfigurable instructions
Code size and instruction Code size and instruction fetches reductionfetches reduction
16 words of 134 bitsConfig. Memory Size
ALU, Shift, MultiplyPEs Functionality8Num of PEs
32-bitsGranularityValueConfiguration
Exploration for fine-tuning of critical architecture parameters
A set of benchmarks derived from different benchmarking suites like MiBench and Powerstone were considered The possibilities of using the framework for fine-tune critical architectural parameters and/or to program/evaluate a
specific instance of the architecture are demonstrated Comparisons results were performed considering the RISC processor of the architecture with and without support
of the RFU unit
Configuration of an evaluation Configuration of an evaluation instanceinstance Evaluation of the derived
instance
0
20
40
60
80
100Code Size Instruction Fetches
0
1
2
3
4
58 Instr 16 Instr 32 Instr 64 Instr All Instr
0
1
2
3
4
5Max PEs 10 PEs 8 PEs 6 PEs 4PEs
0
1
2
3
4
5Max PEs-Max Inputs Max PEs-4 Inputs 8 PEs-4Inputs
x2.91 avg. speedup 38% avg. code size
reduction => less memory requirements
62% avg. instruction memory accesses reduction => significant power savings
Development Framework FlowDevelopment Framework FlowC/C++
Front-EndMachSUIF
Optimized IR in CDFG form
Basic Block Profiling Results
Instruction Selection
Back-EndStatistics
Executable Code
Profiling
Instrumentation
m2cInstr.Gen.
Pattern Gen.
Mapping
Instr. Extens.
User Defined Parameters
Instrument the CDFG with profiling annotations
Convert CDFG to equivalent C code
Compile and execute
Pattern generation to identify MISO cluster of operations
MappingAssign operations PEs / Route the 1-D arrayAnalyze data paths to minimize delayReport candidate instruction semantics
Maximum number of inputs in the pattern (i.e. # of reg.file read ports)
Permitted types of operations in the pattern (i.e. PE functionality)
Maximum number of operations in the patterns (i.e. # of PEs)
Number of different ISEs
Graph isomorphism to discover identical instructions
Rank instructions based on the estimated offered speedup
Select the best ISEs