Temporal Logic Replication for Dynamically Reconfigurable FPGA Partitioning
Wai-Kei Mak
Dept. of Computer Science and Engineering
University of South Florida
Evangeline F.Y. Young
Dept. of Computer Science and Engineering
The Chinese University of Hong Kong
Outline
I. Dynamically reconfigurable FPGA
II. Temporal partitioning = Conventional partitioning?
III. Temporal logic replication: What? Why? How?
IV. Experimental results
V. Conclusions
Dynamically Reconfigurable FPGA
Store multiple contexts on chip.
Reuse logic blocks and wire segments dynamically.
The contexts stored can correspond to the multiple stages of a large circuit.
Temporal Circuit Partitioning
Temporal partitioning: multiple stages execute sequentially.
Spatial partitioning: multiple components execute concurrently.
Temporal Logic Replication
Can reduce buffering requirement.
Effectively utilize available slack logic capacity.
Temporal Constraints
For a net n = (v1, {v2, …, vp}), where s(v) denotes the stage assigned to node v,
require s(v1) ≤ s(vj), j = 2, …, p, if v1 is a combinational node
Temporal Constraints (Cont’d)
require s(vj) ≤ s(v1), j = 2, …, p, if v1 is a flip-flop node
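The two constraints can be checked mechanically for a candidate stage assignment. The following is a minimal sketch, not the authors' code: it assumes each net is given as a (driver, sinks) pair, that `stage` maps every node to its stage number, and that the names `violated_pairs`, `stage`, and `flip_flops` are purely illustrative.

```python
from typing import Dict, List, Set, Tuple

Net = Tuple[str, List[str]]  # (v1, [v2, ..., vp]); this representation is an assumption


def violated_pairs(nets: List[Net], stage: Dict[str, int],
                   flip_flops: Set[str]) -> List[Tuple[str, str]]:
    """Return every (v1, vj) pair that breaks a temporal constraint."""
    bad = []
    for v1, sinks in nets:
        for vj in sinks:
            if v1 in flip_flops:
                ok = stage[vj] <= stage[v1]   # flip-flop node: s(vj) <= s(v1)
            else:
                ok = stage[v1] <= stage[vj]   # combinational node: s(v1) <= s(vj)
            if not ok:
                bad.append((v1, vj))
    return bad


# Hypothetical three-node circuit: combinational a drives b and flip-flop ff,
# and ff feeds back into a.
nets = [("a", ["b", "ff"]), ("ff", ["a"])]
flip_flops = {"ff"}
print(violated_pairs(nets, {"a": 1, "b": 2, "ff": 2}, flip_flops))  # no violations
print(violated_pairs(nets, {"a": 2, "b": 1, "ff": 2}, flip_flops))  # a is placed after b
```

The second call reports the pair (a, b), because the combinational node a is scheduled after a node that it drives.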
Temporal Partitioning with Replication
Problem: Partition a given circuit into a predefined number of stages such that all temporal constraints are satisfied.
Objective: Minimize buffers required between stages.
Proposal: Utilize available slack logic capacity to reduce signal buffering.
Solution: An effective 2-step approach.
2-Step Approach
Step 1: Compute a temporal partition w/o replication.
Step 2: Repeatedly identify the bottleneck stage and apply replication for that stage.
Advantages of 2-Step Approach
Will not replicate unnecessarily.
All temporal constraints are already satisfied when replicating.
Min-Area Min-Cut Replication
Let stage i be the bottleneck stage.
Min-Cut Replication: Compute a subset of nodes Ri in stage i for replication into stage i+1 to maximally reduce the communication cost at stage i.
Min-Area Min-Cut Replication: Compute a minimum subset of nodes Ri in stage i for replication into stage i+1 to maximally reduce the communication cost at stage i.
Optimal Solution for Min-Area Min-Cut Replication
Let Vi = set of nodes in stage i.
Observation 1: The min-cut replication problem can be solved by computing a minimum cut (Vi-Ri, Ri) in stage i.
Observation 2: The min-area min-cut replication problem can be solved by computing a minimum cut (Vi-Ri, Ri) in stage i s.t. |Ri| is minimized.
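A standard way to obtain a minimum cut whose sink side is as small as possible is to run max-flow and then collect the nodes that can still reach the sink in the residual graph. The sketch below illustrates that general technique on a plain directed graph; it is not the slides' network model. It assumes that Ri corresponds to the sink side of the cut and that every node has unit area, and the graph, capacities, and function names are hypothetical.

```python
from collections import deque


def max_flow_residual(cap, s, t):
    """Edmonds-Karp max-flow; returns (flow value, residual capacities)."""
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)  # add reverse edges
    res.setdefault(s, {})
    res.setdefault(t, {})
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, res
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:  # push flow along the path
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck


def smallest_sink_side(res, t):
    """Nodes that can still reach t in the residual graph."""
    side, queue = {t}, deque([t])
    while queue:
        v = queue.popleft()
        for u in res:
            if u not in side and res[u].get(v, 0) > 0:
                side.add(u)
                queue.append(u)
    return side


# Hypothetical network with several minimum cuts of value 2.
cap = {"s": {"a": 2}, "a": {"b": 1, "c": 1}, "b": {"t": 1}, "c": {"t": 5}}
flow, res = max_flow_residual(cap, "s", "t")
print(flow, sorted(smallest_sink_side(res, "t") - {"t"}))  # 2 ['c']
```

In this example three cuts have value 2, with sink sides {a, b, c, t}, {b, c, t}, and {c, t}; the residual-graph search returns the smallest of them.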
Example
A pre-partition:
Computing a minimum cut in stage 2:
Example (Cont’d)
Computed R2 = {j}
Network Modeling
Need to ensure that cut size = buffer requirement.
For a net (v1, {v2, …, vp}),
The Case of Limited Slack Logic Capacity
The solution of min-area min-cut replication suffices if slack logic capacity is sufficiently large.
Otherwise, if |Ri| exceeds the slack, use a heuristic to reduce Ri.
Use a repeated max-flow min-cut heuristic to gradually reduce Ri (so cut size is only increased gradually).
H. Yang, D.F. Wong, “Efficient Network Flow based Min-Cut Balanced Partitioning”, ICCAD’94.
Algorithm
Input: Stage area bound A.
1. Network modeling for bottleneck stage i.
2. Compute min-cut (Vi-Ri, Ri) s.t. |Ri| is minimized.
3. If |Vi+1| + |Ri| ≤ A, stop and return Ri.
4. Collapse a node in Ri with all nodes in Vi-Ri, go to 2.
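The control flow of steps 2 to 4 can be sketched on a toy instance. This is only an illustration under several assumptions: a brute-force enumeration stands in for the max-flow min-cut computation, nodes have unit area, Ri is taken to be the sink side of the cut, "collapsing" a node is modeled by forcing it onto the source side, and the node to collapse is picked arbitrarily (the slides do not say which one to choose). All names and the example graph are hypothetical.

```python
from itertools import combinations


def min_area_min_cut(cap, nodes, s, t, forced_source):
    """Brute-force stand-in for step 2: smallest cut, ties broken by |R|."""
    free = sorted(n for n in nodes
                  if n not in forced_source and n not in (s, t))
    best = None
    for k in range(len(free) + 1):
        for combo in combinations(free, k):
            sink_side = set(combo) | {t}
            # Cut size = capacity of edges going from source side to sink side.
            cut = sum(c for (u, v), c in cap.items()
                      if u not in sink_side and v in sink_side)
            if best is None or (cut, k) < best[:2]:
                best = (cut, k, set(combo))
    return best[0], best[2]


def replicate_within_slack(cap, nodes, s, t, next_stage_size, area_bound):
    """Steps 2-4: shrink R until |V_{i+1}| + |R| <= A (unit areas assumed)."""
    forced = set()
    while True:
        cut, r = min_area_min_cut(cap, nodes, s, t, forced)
        if next_stage_size + len(r) <= area_bound or not r:
            return cut, r
        forced.add(min(r))  # arbitrary choice of the node to collapse


# Hypothetical bottleneck-stage network.
cap = {("s", "a"): 3, ("a", "b"): 1, ("a", "c"): 1, ("b", "t"): 2, ("c", "t"): 2}
nodes = {"s", "a", "b", "c", "t"}
print(replicate_within_slack(cap, nodes, "s", "t", 4, 6))  # enough slack
print(replicate_within_slack(cap, nodes, "s", "t", 4, 5))  # one collapse needed
```

With A = 6 the unconstrained optimum R = {b, c} (cut size 2) fits. With A = 5 one node is collapsed into the source side and the loop settles on R = {c} with cut size 3, so the cut grows by one unit in exchange for the smaller replication set.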
Experimental Results
Circuit   #buf w/o rep.   #buf w/ rep.   Imprv. %   Rep. %
C3540               198            194       2.02     0.48
C5315               140            129       7.86     0.67
C6288                83             63      24.10     4.41
C7552               210            176      16.19     3.12
S13207              688            669       2.76     2.54
S15850              761            699       8.15     3.59
S35932             2729           2636       3.41     2.48
S38417             2194           2104       4.10     0.63
S38584             2280           2137       6.27     0.98
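The "Imprv. %" column appears to be the relative reduction in buffer count, i.e. 100 × (without − with) / without. The slides do not state this formula; it is inferred here because it reproduces every row of the table. A short check, with buffer counts copied from the table and an illustrative function name:

```python
# (buffers without replication, buffers with replication), from the table.
results = {
    "C3540": (198, 194), "C5315": (140, 129), "C6288": (83, 63),
    "C7552": (210, 176), "S13207": (688, 669), "S15850": (761, 699),
    "S35932": (2729, 2636), "S38417": (2194, 2104), "S38584": (2280, 2137),
}


def improvement_pct(without_rep: int, with_rep: int) -> float:
    """Relative buffer reduction in percent (inferred formula)."""
    return 100.0 * (without_rep - with_rep) / without_rep


for circuit, (without_rep, with_rep) in results.items():
    print(f"{circuit:8s} {improvement_pct(without_rep, with_rep):6.2f}")
```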
Conclusions
Proposed temporal logic replication to reduce buffering requirement in DRFPGA partitioning.
Presented an effective 2-step approach.
Formulated and optimally solved the min-area min-cut replication problem.
Extended to the case of limited slack logic capacity.
In the paper, a new timing-driven temporal partitioning algorithm was introduced to compute the pre-partition.