Reconfigurable Computing, S. Reda, Brown University
Reconfigurable Computing (EN2911X, Fall07)
Lecture 11: RC Principles: Software (4/4)
Prof. Sherief Reda, Division of Engineering, Brown University
http://ic.engin.brown.edu
Summary of the last 3 lectures
[Figure: design flow. The system specification is partitioned into HW and SW. HW path: Verilog, synthesis, mapping & packing, place & route, configuration file, download to board. SW path: compile, link, executable image. Annotations on the flow: compiling, previous lectures, this lecture, traditional compiler class.]
Embedding a digital circuit to FPGA fabric
[Figure: FPGA fabric showing the programmable interconnect, the programmable logic blocks, and a programmable logic element] [Maxfield’04]
1. Mapping decomposes the circuit into logic sections and flip-flops such that each section fits into a K-LUT LE.
2. Packing groups LEs into clusters so that each cluster fits into a LAB.
3. Placement determines the position of each cluster among the LABs of the island-style FPGA.
4. Routing determines the exact routes between the communicating LEs/LABs.
What are the objectives/metrics that these algorithms should pursue?
1. Mapping finds a covering of the given circuit using K-input LUTs (K-LUTs)
Each covered section is mapped to a LUT in a logic block (LB)
[Figure from Cong, FPGA’01]
A covering example
[From Ling et al. DAC’05]
There can be many possible coverings. Which one should be picked?
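As a minimal sketch of what "fits into a K-LUT" means, the snippet below counts the signals entering a candidate section (a cone of gates) from outside it and compares that count with K. The netlist, the gate names, and the helper functions are hypothetical and chosen only for illustration.

```python
def cone_inputs(fanins, cone):
    """Return the signals that enter `cone` from outside it."""
    inputs = set()
    for node in cone:
        for src in fanins.get(node, ()):
            if src not in cone:
                inputs.add(src)
    return inputs


def is_k_feasible(fanins, cone, k):
    """A cone can be implemented by one K-LUT if it has at most k inputs."""
    return len(cone_inputs(fanins, cone)) <= k


# Hypothetical netlist: f = (a AND b) OR (c AND d)
fanins = {"g1": ("a", "b"), "g2": ("c", "d"), "f": ("g1", "g2")}

whole = {"f", "g1", "g2"}
print(sorted(cone_inputs(fanins, whole)))         # ['a', 'b', 'c', 'd']
print(is_k_feasible(fanins, whole, 4))            # True: one 4-LUT covers everything
print(is_k_feasible(fanins, whole, 3))            # False: too many inputs for a 3-LUT
print(is_k_feasible(fanins, {"f", "g2"}, 3))      # True: inputs are g1, c, d
```

With K = 3 this small circuit already admits more than one covering, for example {g1} plus {f, g2} (two LUTs) or one LUT per gate (three LUTs), which is why a selection criterion is needed.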
2. Packing
How can we decide which LEs should go together in the same logic cluster?
Possible method (VPACK): construct each cluster sequentially
• Start by choosing a seed LE for the cluster
• Then greedily select the LE that shares the most inputs and outputs with the cluster being constructed
• Repeat greedily until the cluster is full or the number of cluster inputs would exceed the limit I
• Can the addition of an LE to a cluster reduce the number of distinct inputs?
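The greedy packing idea can be sketched as follows. This is a simplified illustration in the spirit of VPACK, not the tool's actual implementation: the LE table, the seed rule (the unpacked LE with the most inputs), and the function names are assumptions made for the example.

```python
def cluster_inputs(les, cluster):
    """Signals a cluster must receive from outside (not produced inside it)."""
    produced = {les[name][1] for name in cluster}
    used = set()
    for name in cluster:
        used |= les[name][0]
    return used - produced


def attraction(les, cluster, cand):
    """Number of nets (inputs and outputs) the candidate shares with the cluster."""
    cluster_nets = set()
    for name in cluster:
        cluster_nets |= les[name][0] | {les[name][1]}
    return len(cluster_nets & (les[cand][0] | {les[cand][1]}))


def pack(les, size, max_inputs):
    """Greedily build clusters of at most `size` LEs and `max_inputs` inputs."""
    unpacked = sorted(les)  # sorted for a deterministic result
    clusters = []
    while unpacked:
        # Assumed seed rule: the unpacked LE with the most inputs.
        seed_le = max(unpacked, key=lambda n: len(les[n][0]))
        unpacked.remove(seed_le)
        cluster = [seed_le]
        while len(cluster) < size:
            feasible = [c for c in unpacked
                        if len(cluster_inputs(les, cluster + [c])) <= max_inputs]
            if not feasible:
                break
            best = max(feasible, key=lambda c: attraction(les, cluster, c))
            cluster.append(best)
            unpacked.remove(best)
        clusters.append(cluster)
    return clusters


# Hypothetical LEs: name -> (set of input signals, output signal)
les = {
    "x": ({"a", "b", "y_out"}, "x_out"),
    "y": ({"a", "b"}, "y_out"),
    "z": ({"p", "q"}, "z_out"),
    "w": ({"p", "z_out"}, "w_out"),
}
print(len(cluster_inputs(les, ["x"])))       # 3
print(len(cluster_inputs(les, ["x", "y"])))  # 2
print(pack(les, size=2, max_inputs=4))       # [['x', 'y'], ['w', 'z']]
```

The first two prints answer the last question above for this example: adding y to the cluster {x} lowers the input count from 3 to 2, because y's output becomes internal to the cluster and y's own inputs are already cluster inputs.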
3. Placement
• Placement assigns an exact position (LAB) to each cluster in the input netlist
• Suppose you start with a random placement. How can you improve it?
Possible algorithm:
- Pick a pair of cells and swap their locations if the swap reduces the wirelength (WL)
What’s wrong with this greedy algorithm? It can simply get stuck in a locally optimal result.
[Figure: WL results plotted over the possible placements, marking a local optimum and the global optimum]
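A minimal sketch of the greedy pairwise-swap idea is given below. It assumes half-perimeter wirelength (HPWL) as the WL estimate, which the slides do not specify, and it uses a hypothetical four-cell netlist placed on a single row.

```python
import itertools


def hpwl(nets, pos):
    """Total half-perimeter wirelength: bounding-box width + height per net."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def greedy_swap(nets, pos):
    """Keep swapping pairs of cells while some swap strictly reduces WL."""
    pos = dict(pos)
    improved = True
    while improved:
        improved = False
        for a, b in itertools.combinations(sorted(pos), 2):
            before = hpwl(nets, pos)
            pos[a], pos[b] = pos[b], pos[a]
            if hpwl(nets, pos) < before:
                improved = True                   # keep the swap
            else:
                pos[a], pos[b] = pos[b], pos[a]   # undo the swap
    return pos


# Hypothetical example: a chain a-b-c-d placed out of order on one row.
nets = [("a", "b"), ("b", "c"), ("c", "d")]
start = {"a": (0, 0), "c": (1, 0), "b": (2, 0), "d": (3, 0)}
final = greedy_swap(nets, start)
print(hpwl(nets, start), hpwl(nets, final))  # 5 3
```

The loop only ever accepts improving swaps, so it stops at the first placement that no single swap can improve, whether or not that placement is globally optimal.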
Simulated annealing allows us to avoid getting trapped in a local minimum
Modified algorithm
• Generate a random move (say, a swap of two cells)
– calculate the change in WL ($\Delta L$) due to the move
– if the move results in a reduction ($\Delta L < 0$) then accept
– else reject with probability $1 - e^{-\Delta L / T}$
• T (temperature) controls the rejection probability
• Initially, T is high (thus avoiding getting trapped early in a local minimum); then the temperature cools down in a scheduled manner; at the end, the rejection probability approaches 1
• With the right “slow-enough” cooling schedule, simulated annealing is guaranteed, in theory, to converge to the global optimum
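The acceptance rule above can be sketched as follows. Several details are assumptions made for the example rather than anything stated in the lecture: HPWL as the WL estimate, a geometric cooling schedule (T is multiplied by `alpha` after each batch of moves), the parameter values, and the bookkeeping that remembers the best placement seen.

```python
import math
import random


def hpwl(nets, pos):
    """Total half-perimeter wirelength: bounding-box width + height per net."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal(nets, pos, t_start=5.0, t_end=0.01, alpha=0.9,
           moves_per_t=20, seed=0):
    """Swap-based simulated annealing; returns (best placement, best WL)."""
    rng = random.Random(seed)
    pos = dict(pos)
    cells = sorted(pos)
    cost = hpwl(nets, pos)
    best_pos, best_cost = dict(pos), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            a, b = rng.sample(cells, 2)          # random move: swap two cells
            pos[a], pos[b] = pos[b], pos[a]
            delta = hpwl(nets, pos) - cost       # change in WL
            # Accept improvements; accept a worse move with prob. e^(-delta/T),
            # i.e. reject it with probability 1 - e^(-delta/T).
            if delta < 0 or rng.random() < math.exp(-delta / t):
                cost += delta
                if cost < best_cost:
                    best_pos, best_cost = dict(pos), cost
            else:
                pos[a], pos[b] = pos[b], pos[a]  # rejected: undo the swap
        t *= alpha                               # assumed geometric cooling
    return best_pos, best_cost


# Hypothetical example: a chain a-b-c-d placed out of order on one row.
nets = [("a", "b"), ("b", "c"), ("c", "d")]
start = {"a": (0, 0), "c": (1, 0), "b": (2, 0), "d": (3, 0)}
best_pos, best_cost = anneal(nets, start)
print(hpwl(nets, start), best_cost)
```

At high T the exponential is close to 1, so most worsening moves are accepted; as T shrinks it approaches 0 and the procedure behaves like the greedy swap algorithm.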
[Figure: WL results plotted over the possible placements, marking a local optimum and the global optimum]
What do the cooling schedule and the corresponding cost function look like?
[source: I. Markov]
Placement before & after simulated annealing
[using VPR tool]
4. Routing
Assign an exact route in the FPGA fabric to each wire of the given circuit such that no two wires overlap
General idea:
• Order the wires according to some criteria
• Sequentially route each wire using a shortest-path algorithm (after removing the resources consumed by previously routed wires)
Maze routing
Problem: Find the shortest path for a 2-pin wire from s to t
[Figure: wave expansion on the routing grid. Starting from s, each reachable grid cell is labeled with its distance from s (1, 2, 3, ...) until t is reached. Legend: grid cell capacity is full; grid cell still has available tracks.]
Speed-ups are possible using A* search and other AI search techniques
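The wave expansion can be sketched as a breadth-first search that labels each free cell with its distance from s and then traces back from t. The grid size, the blocked cells, and the function name are hypothetical; a cell is treated as either completely full or free, which simplifies the per-cell track capacity mentioned in the figure legend.

```python
from collections import deque


def maze_route(width, height, blocked, s, t):
    """Return a shortest list of grid cells from s to t, or None if unroutable."""
    dist = {s: 0}
    queue = deque([s])
    while queue:                       # wave expansion from s
        cell = queue.popleft()
        if cell == t:
            break
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in blocked and nxt not in dist):
                dist[nxt] = dist[cell] + 1
                queue.append(nxt)
    if t not in dist:
        return None
    path = [t]                         # backtrace: follow decreasing labels
    while path[-1] != s:
        x, y = path[-1]
        for prev in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if dist.get(prev) == dist[path[-1]] - 1:
                path.append(prev)
                break
    path.reverse()
    return path


# Hypothetical 5x5 grid with a wall in column 2 that leaves a gap at (2, 4).
wall = {(2, 0), (2, 1), (2, 2), (2, 3)}
path = maze_route(5, 5, wall, (0, 0), (4, 0))
print(len(path) - 1)                                           # 12
print(maze_route(5, 5, wall | {(2, 4)}, (0, 0), (4, 0)))       # None
```

Because the search expands uniformly in all directions, it visits many cells that are not on the final path, which is where a goal-directed search such as A* can save time.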
Impact of Net Ordering
A bad net ordering may unnecessarily increase the total wirelength or even render the chip unroutable!
• Example: Two nets A and B
[Figure: the same two nets routed in both orders. B first then A (good order); A first then B (bad order).]
Ordering criteria: length in placement, timing criticality
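The effect of net ordering can be reproduced with a small sequential router. The sketch below is an illustration under assumptions: the 4x3 grid and the two net positions are hypothetical (they are not the layout from the slide's figure), each grid cell holds at most one wire, and every net is routed with a plain breadth-first shortest-path search.

```python
from collections import deque


def shortest_path(width, height, blocked, s, t):
    """BFS shortest path avoiding blocked cells; None if no path exists."""
    prev = {s: None}
    queue = deque([s])
    while queue:
        cell = queue.popleft()
        if cell == t:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in blocked and nxt not in prev):
                prev[nxt] = cell
                queue.append(nxt)
    return None


def route_in_order(width, height, nets, order):
    """Route nets one by one; each routed net blocks the cells it uses."""
    used = set()
    lengths = {}
    for name in order:
        s, t = nets[name]
        path = shortest_path(width, height, used, s, t)
        if path is None:
            lengths[name] = None          # unroutable in this order
        else:
            used.update(path)
            lengths[name] = len(path) - 1
    return lengths


# Hypothetical 4x3 grid: A is a vertical net, B a horizontal net crossing it.
nets = {"A": ((1, 0), (1, 2)), "B": ((0, 1), (2, 1))}
print(route_in_order(4, 3, nets, ["B", "A"]))  # {'B': 2, 'A': 6}
print(route_in_order(4, 3, nets, ["A", "B"]))  # {'A': 2, 'B': None}
```

Routing B first leaves a detour for A around the right edge of the grid, whereas routing A first occupies a whole column and cuts B's two pins off from each other.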
When a route for a net can’t be found, rip up and re-route
[Figure: three nets A, B, and C, shown in three steps. Step 1: with A and B already routed, C cannot be routed. Step 2: so rip up B and route C first. Step 3: finally, route B.]
[Example from Prof. D. Pan’s lecture]
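A minimal sketch of rip-up and re-route is shown below. It is one simple policy among many and is an assumption rather than the method in the cited example: when a net fails, previously routed nets are ripped up one at a time, the failed net is routed, and the ripped-up net is then re-routed; if that does not work the original route is restored. The grid and nets are hypothetical.

```python
from collections import deque


def shortest_path(width, height, blocked, s, t):
    """BFS shortest path avoiding blocked cells; None if no path exists."""
    prev = {s: None}
    queue = deque([s])
    while queue:
        cell = queue.popleft()
        if cell == t:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in blocked and nxt not in prev):
                prev[nxt] = cell
                queue.append(nxt)
    return None


def occupied(routes):
    """All grid cells used by the current routes."""
    return {cell for path in routes.values() for cell in path}


def route_with_ripup(width, height, nets, order):
    routes = {}
    for name in order:
        path = shortest_path(width, height, occupied(routes), *nets[name])
        if path is not None:
            routes[name] = path
            continue
        for victim in list(routes):           # try ripping up one routed net
            ripped = routes.pop(victim)
            base = occupied(routes)
            path = shortest_path(width, height, base, *nets[name])
            retry = None
            if path is not None:              # failed net fits; re-route victim
                retry = shortest_path(width, height, base | set(path),
                                      *nets[victim])
            if retry is not None:
                routes[name], routes[victim] = path, retry
                break
            routes[victim] = ripped           # no luck: restore the old route
    return routes


# Hypothetical 4x3 grid; routing A first blocks B until A is ripped up.
nets = {"A": ((1, 0), (1, 2)), "B": ((0, 1), (2, 1))}
routes = route_with_ripup(4, 3, nets, ["A", "B"])
print({name: len(path) - 1 for name, path in routes.items()})  # {'B': 2, 'A': 6}
```

Here A is routed first, B then fails, A is ripped up so that B can take its direct route, and A is finally re-routed around B.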
VPR: after routing
[Figure: two VPR layouts, one after placement and one after placement and routing]
You have probably seen similar layouts in the Quartus II tool
Finally, programming the FPGA
[Figure: FPGA with its configuration cells, with ports labeled "Configuration data in" and "Configuration data out". Legend: I/O pin/pad; SRAM cell.]
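One way to picture the configuration step is to treat the SRAM cells as a single long shift register fed from the "configuration data in" pin, with the last cell driving "configuration data out". The sketch below models only that assumed view; the bit values and the function name are hypothetical, and real devices add framing, addressing, and error checking that are not modeled here.

```python
def load_configuration(bitstream, num_cells):
    """Shift bits serially through a chain of SRAM cells, one bit per clock."""
    chain = [0] * num_cells
    shifted_out = []
    for bit in bitstream:
        shifted_out.append(chain[-1])   # last cell drives "configuration data out"
        chain = [bit] + chain[:-1]      # each cell takes its predecessor's value
    return chain, shifted_out


chain, out = load_configuration([1, 0, 1, 1], 4)
print(chain)  # [1, 1, 0, 1]
print(out)    # [0, 0, 0, 0]
```

In this model the first bit shifted in ends up in the cell farthest from the input, so the bitstream has to be ordered accordingly.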
Summary
• Done with the software part of reconfigurable computing
• Next lecture: project overview
• The one after is the midterm
• Afterwards, we will start looking at SystemC as a higher-level method to synthesize systems