Riccardo Cattaneo, Christian Pilato, Gianluca C. Durelli, Marco D. Santambrogio and Donatella Sciuto, Politecnico di Milano, Italy. IEEE International Symposium on Rapid System Prototyping – Montreal, Canada – October 4, 2013.
SMASH: A Heuristic Methodology for Designing Partially Reconfigurable MPSoCs
Christian Pilato, Politecnico di Milano
What is an FPGA?
• Hardware device that can be customized after fabrication to execute a specific functionality
  – Distinct hardware blocks run "intrinsically" in parallel on the device
• Heterogeneous grid of interconnected components
  – Look-up tables (LUTs), block RAMs (BRAMs), digital signal processors (DSPs), switch matrices, input/output blocks (IOBs), etc.
• Resources can be reused by reconfiguring part of the logic at run time (partial reconfiguration)
Heterogeneous SoCs with FPGAs
• Highly coupled heterogeneous systems
  – Zynq platform: dual-core ARM Cortex-A9 tightly coupled with a Xilinx Artix-7 FPGA
  – High-speed, low-latency reconfigurable interconnect
[Figure: Avnet ZedBoard (Zynq-7000-based development board)]
[Figure: Coarse-grain overview of the Zynq-7000 All Programmable SoC]
Design Challenges and Motivation
• The hardware engineer needs to:
  – partition the application into blocks (partitioning)
  – determine which parts are better executed in hardware (mapping and scheduling)
  – generate the system (architecture refinement)
• Partial reconfiguration allows reusing the same logic across different tasks
  – More tasks can be ported to hardware
  – Significant overhead must be taken into account
The steps are strictly interdependent!
SMASH: Proposed Methodology
• Design Space Exploration
  – determines the proper mapping and scheduling
• Architecture Refinement
  – customizes the architectural template to derive the corresponding platform
[Figure: SMASH flow. Inputs: DAG, architecture template, implementations. Design space exploration (mapping and scheduling heuristic, fast solution evaluation) → solution → architecture refinement → architecture solution]
Mapping and Scheduling
Input:
• Task graph (DAG)
• Architectural template
  – Identifies resource constraints
• Implementations
  – List of different trade-offs in terms of performance and resources
Output:
• Implementation and component for each task
• Order of execution
[Figure: architectural template. Labels: FPGA, interconnection channel, CPU0, CPU1, RR0, RR1, RR2, IP0, ICAP, shared memory, I/O interface]
Implementation vs. Component
• Each task can have multiple alternative implementations on the same component
  – Faster implementations usually require more resources
• Some tasks can share implementations to execute the same functionality multiple times
  – Hardware reuse: no reconfiguration is required
• Implementation relates to functionality and resources
• Component relates to where the task is actually executed
  – Processor or hardware module
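
The distinction can be made concrete with a small data model. The Python sketch below is illustrative only: the class names, the `latency` and `area` fields, the component name `rr0` and all values are assumptions made for this example, not taken from the SMASH tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    """What is executed: a functionality and its cost (hypothetical fields)."""
    name: str
    latency: int  # estimated execution time, arbitrary units
    area: int     # resources required, arbitrary units


@dataclass(frozen=True)
class Decision:
    """One mapping decision: a task placed on a component with an implementation."""
    task: str
    component: str  # where the task runs: processor or hardware module
    implementation: Implementation


def can_reuse(prev: Decision, nxt: Decision) -> bool:
    """Hardware reuse: same component and same implementation, so the
    module already in place can serve the next task as it is."""
    return (prev.component == nxt.component
            and prev.implementation == nxt.implementation)


# Two alternative implementations of the same functionality:
# the faster one needs more resources.
fir_fast = Implementation("fir_fast", latency=10, area=400)
fir_small = Implementation("fir_small", latency=25, area=150)

a = Decision("A", "rr0", fir_fast)
b = Decision("B", "rr0", fir_fast)   # shares the implementation with A
c = Decision("C", "rr0", fir_small)  # same component, different implementation

reuse_ab = can_reuse(a, b)
reuse_bc = can_reuse(b, c)
```

Here `reuse_ab` is true (B can run on the module left by A) while `reuse_bc` is false, which is the case where the component would have to change its implementation.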
SMASH: Execution Overview
• Simultaneous MApping and Scheduling Heuristic (SMASH)
[Figure: SMASH iteration flowchart. Generate trace → schedule trace → evaluate metrics → store solution → termination? No: start a new iteration. Yes: return best solution]
Exploring Mapping and Scheduling
• Exploration based on the Serial Generation Scheme (SGS)
  – Constructive approach to better handle design constraints
    • A decision is not taken if it would lead to a constraint violation
• Different combinations of mapping and scheduling
  – Each decision represents a mapping of a task to an implementation and a processing element
  – The order of selection gives the priority values for resolving scheduling conflicts on the resources
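
A constructive scheme of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the SMASH implementation: candidates are assumed to be tasks whose predecessors are already in the trace, the only constraint is a single hypothetical area budget, area is charged per decision (module reuse is ignored), and the default `choose` policy simply takes the first admissible candidate. All names and numbers are invented for the example.

```python
def admissible(options, used_area, area_budget):
    """Keep only the (component, implementation, area) options that still fit."""
    return [o for o in options if used_area + o[2] <= area_budget]


def build_trace(preds, options, area_budget, choose=lambda cands: cands[0]):
    """Build a trace one decision at a time. A decision that would violate
    the area constraint is never offered to `choose`."""
    trace, done, used = [], set(), 0
    while len(done) < len(preds):
        # Tasks whose predecessors have all been decided already.
        ready = sorted(t for t in preds if t not in done and preds[t] <= done)
        cands = [(t, o) for t in ready
                 for o in admissible(options[t], used, area_budget)]
        if not cands:
            raise ValueError("no admissible decision left")
        task, (comp, impl, area) = choose(cands)
        trace.append((task, comp, impl))
        done.add(task)
        used += area
    return trace


# Hypothetical task graph: A precedes B and C.
preds = {"A": set(), "B": {"A"}, "C": {"A"}}
options = {
    "A": [("hw0", "a_hw", 60), ("cpu0", "a_sw", 0)],
    "B": [("hw1", "b_hw", 60), ("cpu0", "b_sw", 0)],
    "C": [("hw2", "c_hw", 30), ("cpu1", "c_sw", 0)],
}
trace = build_trace(preds, options, area_budget=100)
```

With a budget of 100, the hardware option for B (60 more units after A has taken 60) is filtered out before selection, so B can only go to `cpu0`. The position of each decision in `trace` is what would later act as its scheduling priority.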
Ant Colony Optimization
• Our approach is based on Ant Colony Optimization (ACO) to limit infeasible solutions
  – Cooperative behavior of the ants while searching
  – Each ant has different possibilities at each step and takes stochastic decisions, composing a trace
• Stochastic principles guarantee exploration (a probability is generated for each admissible decision at each step)
• Feedback guarantees the exploitation of good parts of the solutions
Algorithm Overview
• Pseudo-code of the proposed ACO-based exploration
[Figure: pseudo-code listing, annotated with three parts: "Exploration: generating trace", "Mapping decision", "Exploitation: updating global information"]
Stochastic Selection Process
• At each decision point d, the probability of assigning a candidate j (task/communication) to an implementation point i (implementation + processing element) is:

  $$p_{d,i,j} = \frac{[G_{d,i,j}]\,[L_{d,i,j}]}{\sum_{k,n} [G_{d,k,n}]\,[L_{d,k,n}]}$$

  where G is the global heuristic, L is the local heuristic, and the sum ranges over the admissible combinations (k, n) at decision point d
• Global information G: feedback information
  – Probability that the decision leads to a good solution
• Local heuristic L: problem-specific hint
  – "Adjusted" by the global heuristic if wrong
• Roulette wheel and extraction of a combination (i, j)
  – A probability is generated iff the resources required by the resulting PEs can be satisfied by the architecture
  – There is always the possibility of adding a new PE or reusing an existing one (platform customization)
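
The selection step can be illustrated with a short Python sketch. It assumes the probability of a combination is proportional to the product of its global and local values, normalized over the admissible combinations only; the dictionaries `G` and `L`, their keys and their values are hypothetical.

```python
import random


def selection_probabilities(G, L):
    """p[c] = G[c] * L[c] / sum of G * L over all admissible combinations.
    Keys are (implementation point i, candidate j) pairs."""
    weights = {c: G[c] * L[c] for c in G}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def roulette_wheel(probs, rng):
    """Draw one combination; each one owns a slice of [0, 1) as wide as
    its probability."""
    r = rng.random()
    acc = 0.0
    for cand, p in probs.items():
        acc += p
        if r < acc:
            return cand
    return cand  # guard against floating-point round-off in the last slice


# Hypothetical admissible combinations at one decision point.
G = {("hw0", "A"): 2.0, ("cpu0", "A"): 1.0, ("cpu0", "B"): 1.0}
L = {("hw0", "A"): 1.0, ("cpu0", "A"): 1.0, ("cpu0", "B"): 2.0}

probs = selection_probabilities(G, L)
picked = roulette_wheel(probs, random.Random(0))
```

Combinations that do not fit the architecture are simply absent from `G` and `L`, so they receive no probability at all. In this example the weights are 2, 1 and 2, giving probabilities 0.4, 0.2 and 0.4.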
More about SMASH
• Simultaneous MApping and Scheduling Heuristic
[Figure: SMASH iteration flowchart. Generate trace → schedule trace → evaluate metrics → store solution → termination? No: start a new iteration. Yes: return best solution]
Trace Generation and Evaluation
• Evaluation is performed only on the complete trace
  – Updated version of the original task graph (TG), augmented with communications and reconfigurations
    • Reconfiguration is taken into account from the early stages of the design process
• Possibility to include different evaluation methods
  – Analytical estimations vs. TLM simulations
• Decisions composing the best solution are reinforced
  – As time goes on, the best trace is identified
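
One common way to implement such a reinforcement in ACO is to let all global values decay slightly and then reward the decisions of the best trace. The sketch below follows that generic scheme; the evaporation rate, the reward and the function name are assumptions, since the slides do not state the update rule used by SMASH.

```python
def reinforce(G, best_trace, evaporation=0.1, reward=1.0):
    """Return updated global information: every entry decays by the
    evaporation rate, and decisions in the best trace get a reward."""
    best = set(best_trace)
    return {
        decision: (1.0 - evaporation) * value
        + (reward if decision in best else 0.0)
        for decision, value in G.items()
    }


# Hypothetical global information for two alternative decisions on task A.
G = {("A", "p1", "impl_0"): 1.0, ("A", "p2", "impl_1"): 1.0}
best_trace = [("A", "p1", "impl_0")]

G_next = reinforce(G, best_trace)
```

After the update the rewarded decision has a higher value than its alternative, so later iterations are more likely to select it, while the decay keeps older feedback from dominating indefinitely.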
Scheduling Definition
Input
• Task graph (DAG)
• Trace: ordered list of mapping decisions (task-component-implementation)
Output
• Start/end time estimations for each task
Goal
• Reduce total execution time
Task Component Implementation
A p1 impl_0
B p2 impl_1
C p1 impl_2
D p3 impl_3
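
To illustrate how start and end times can be derived from such a trace, the sketch below schedules the four decisions in the table. The task graph edges and the durations are hypothetical (the slides do not give them), the trace is assumed to list tasks in an order compatible with the dependencies, and communications and reconfigurations are left out.

```python
def schedule(trace, preds, duration):
    """Assign (start, end) to each task: a task starts once all its
    predecessors have ended and its component is free. Conflicts on a
    component are resolved by the order of the trace."""
    end, free_at, times = {}, {}, {}
    for task, comp, impl in trace:
        ready = max((end[p] for p in preds.get(task, ())), default=0)
        start = max(ready, free_at.get(comp, 0))
        end[task] = start + duration[impl]
        free_at[comp] = end[task]
        times[task] = (start, end[task])
    return times


trace = [
    ("A", "p1", "impl_0"),
    ("B", "p2", "impl_1"),
    ("C", "p1", "impl_2"),
    ("D", "p3", "impl_3"),
]
# Hypothetical dependencies: A precedes B and C, which both precede D.
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
# Hypothetical durations per implementation, arbitrary time units.
duration = {"impl_0": 4, "impl_1": 3, "impl_2": 5, "impl_3": 2}

times = schedule(trace, preds, duration)
makespan = max(end for _start, end in times.values())
```

With these assumed numbers, B and C both start when A ends, D waits for the slower of the two, and `makespan` is the total execution time that the exploration tries to reduce.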
Scheduling: Methodology Overview
[Figure: SMASH scheduler. Task graph and trace → create extended task graph → extended task graph → actual scheduling (assign times) → evaluate metrics → metrics]
Extended TG: Communications
• Explicit tasks are added based on the communication topology
Extended TG: Reconfigurations
• A reconfiguration task is introduced iff:
  – Two processing tasks are mapped on the same component, and
  – Their implementations are different, i.e., the module cannot be reused
• Insertion of a reconfiguration task:
  – New edges are introduced from all WRITEs exiting the source processing task to the reconfiguration
  – New edges are introduced from the reconfiguration to all the READs entering the target processing task
Extended TG: Reconfigurations
Task Component Implementation
A p1 impl_0
B p2 impl_1
C p1 impl_2
D p3 impl_3
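
Applying the insertion rule to this trace can be sketched in a few lines of Python. The sketch is a simplification: it links each reconfiguration directly to its source and target processing tasks, whereas the extended task graph connects it to the WRITE and READ communication tasks. The function name and the `reconf_<source>_<target>` naming are invented for the example.

```python
def insert_reconfigurations(trace):
    """Return (reconfiguration, source task, target task) triples: one for
    each pair of consecutive tasks on the same component whose
    implementations differ."""
    last = {}      # component -> (task, implementation) mapped most recently
    reconfs = []
    for task, comp, impl in trace:
        if comp in last and last[comp][1] != impl:
            src = last[comp][0]
            reconfs.append((f"reconf_{src}_{task}", src, task))
        last[comp] = (task, impl)
    return reconfs


trace = [
    ("A", "p1", "impl_0"),
    ("B", "p2", "impl_1"),
    ("C", "p1", "impl_2"),
    ("D", "p3", "impl_3"),
]
reconfs = insert_reconfigurations(trace)
```

In this trace only A and C share a component (`p1`) and their implementations differ, so a single reconfiguration is placed between them; B and D are alone on their components and need none.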
Trace Evaluation
• Different policies can be integrated to generate the corresponding schedule
Architecture Refinement
• The actual platform instance is derived from the resulting decisions
  – Hardware modules with only one task assigned are converted into static IP blocks
  – Hardware modules with more than one task assigned are represented as reconfigurable regions
• Integration with the generation of the run-time manager to manage reconfigurations
  – Still work in progress and manually performed
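
The classification rule for hardware modules can be sketched as below. It follows the rule as stated on the slide (the number of tasks assigned to a module decides its kind); the component names, the `hardware_modules` set and the returned labels are assumptions made for illustration.

```python
from collections import defaultdict


def refine(trace, hardware_modules):
    """Classify each hardware module used in the trace: one task assigned
    gives a static IP block, more than one gives a reconfigurable region."""
    tasks_on = defaultdict(list)
    for task, comp, _impl in trace:
        if comp in hardware_modules:  # processors are left untouched
            tasks_on[comp].append(task)
    return {
        comp: "static IP" if len(tasks) == 1 else "reconfigurable region"
        for comp, tasks in tasks_on.items()
    }


# Hypothetical mapping result: hw0 hosts two tasks, hw1 one, cpu0 is a processor.
trace = [
    ("A", "hw0", "impl_0"),
    ("B", "hw1", "impl_1"),
    ("C", "hw0", "impl_2"),
    ("D", "cpu0", "impl_3"),
]
platform = refine(trace, hardware_modules={"hw0", "hw1"})
```

Here `hw0` becomes a reconfigurable region and `hw1` a static IP block, while `cpu0` does not appear in the result because it is not a hardware module.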
Experimental Evaluation
• Synthetic benchmarks (TGFF)
  – Focus on the scalability of the approach
  – Possibility to evaluate different task graph patterns
• Resulting systems (platform instance and extended task graph with mapping/scheduling decisions) are converted into virtual platforms
  – Validation of the different solutions, assuming correctness of the execution
• Simulations performed with Synopsys Platform Architect
  – VPU performance annotations extracted from the tasks' implementations
Experimental Setup
• Three different classes of experiments:
  – Static: FPGA area is divided into a set of up to KS static IP cores (no partial reconfiguration)
  – Mixed: both IP cores and reconfigurable regions can be used, with an upper bound of KM IPs and RM reconfigurable regions
  – Reconfigurable: architectures with no more than KR regions
• Reconfigurable regions can also be deployed as static cores in the final architecture if only one task is assigned to them
Experimental Results
       | static                      | mixed                       | reconfigurable
#Tasks | IPs  RRs  HW tasks  #Reconf | IPs  RRs  HW tasks  #Reconf | IPs  RRs  HW tasks  #Reconf
12     |   7    0         7        0 |   7    0         7        0 |   6    0         6        0
20     |  20    0        20        0 |  18    1        20        1 |  17    1        19        1
31     |  30    0        30        0 |  20    4        31        7 |  16    7        30        7
41     |  30    0        30        0 |  18    8        40       14 |  12   12        40       16
52     |  30    0        30        0 |  17    9        51       25 |   8   17        51       26
60     |  30    0        30        0 |  15   10        53       28 |  10   14        51       27
70     |  30    0        30        0 |  17    9        55       28 |   9   16        58       33
83     |  30    0        30        0 |  15   11        80       54 |   6   19        81       56
90     |  30    0        30        0 |  23    3        31        5 |   9   12        39       18
100    |  30    0        30        0 |  16    7        46       23 |   3   17        53       33
[Figure: speedup (y-axis, 0 to 3) versus number of tasks (x-axis: 12, 20, 31, 41, 52, 60, 70, 83, 90, 100) for the static, mixed and reconfigurable experiments]
• Small task graphs cannot benefit from reconfiguration
• Large task graphs are affected by communication overhead
Conclusions and Future Work
• SMASH is an automated methodology to design reconfigurable systems
  – It determines the mapping and scheduling of the different tasks
  – It allows customizing the architectural template
• Future work
  – Integration of floorplanning procedures to compute and validate the physical constraints of the blocks
  – Automatic generation of the platform specification
End…
http://www.fp7-faster.eu/