University of British Columbia, Dept. of Electrical and Computer Engineering
November 30, 2007
A Combined Clustering and Placement Algorithm for FPGAs
Mark Yamashita
Contributions
• New algorithm that performs clustering and placement together
• Novel approach for controlling duplication by trading off logic depth
• Timing model and placement information incorporated into clustering
• Critical-path delay improves by an average of 11%
• Controllable trade-off between area overhead and delay improvement
• Plan to submit to FPL ’08
Motivation
• FPGAs need to be faster
  • 4x slower than ASICs
• Limitations of existing clustering approaches:
  • No depth control during clustering, often greedy
  • Provide no means for duplication, or
  • Use duplication in excess
  • Inaccurate timing models
Motivation
• Goal:
  • Improve critical-path delay by improving clustering
• Approach:
  • Use placement information to form an accurate timing model
  • Make better clustering decisions
  • Use duplication to reduce depth
  • Take advantage of otherwise unused logic in the FPGA
  • Control the amount of duplication by relaxing depth
Algorithm Overview
Phase 1: Microcluster Formation
Phase 1: Example
Phase 1: Lawler-Levitt-Turner Algorithm
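A minimal sketch of the labeling step behind Lawler-Levitt-Turner clustering, assuming a simplified delay model (zero delay inside a cluster, one unit for every hop between clusters) and a size constraint of M gates per cluster. Each gate's cluster is the gate plus the highest-labelled gates in its transitive fanin; because those fanin gates can also belong to other clusters, this is where duplication comes from. The function `lawler_labels` and its parameters are hypothetical illustrations, not the author's implementation.

```python
from graphlib import TopologicalSorter


def lawler_labels(fanins, primary_inputs, max_size):
    """Simplified Lawler-style depth labeling (hypothetical sketch).

    Assumed delay model: zero delay inside a cluster and unit delay for
    every hop between clusters, so a label counts cluster levels.
    fanins: dict mapping each gate to the list of nodes that drive it.
    primary_inputs: set of primary-input names (label 0, never clustered).
    max_size: size constraint M, the most gates one cluster may hold.
    """
    label, cluster_of, tfi = {}, {}, {}
    for v in TopologicalSorter(fanins).static_order():
        if v in primary_inputs:
            label[v] = 0
            continue
        # Transitive fanin of v, restricted to gates.
        cone = set()
        for u in fanins[v]:
            if u not in primary_inputs:
                cone.add(u)
                cone |= tfi[u]
        tfi[v] = cone
        # Rank fanin gates by label (name breaks ties deterministically)
        # and keep the M - 1 highest, so cluster(v) holds at most M gates.
        ranked = sorted(cone, key=lambda u: (label[u], u), reverse=True)
        members = {v, *ranked[: max_size - 1]}
        cluster_of[v] = members
        # Signals entering the cluster from outside set its level.
        external = {u for w in members for u in fanins[w] if u not in members}
        label[v] = max((label[u] + 1 for u in external), default=1)
    return label, cluster_of


# Usage: three primary inputs, four gates, at most three gates per cluster.
fanins = {"g1": ["a", "b"], "g2": ["b", "c"], "g3": ["g1", "g2"], "g4": ["g3", "c"]}
labels, clusters = lawler_labels(fanins, {"a", "b", "c"}, max_size=3)
```

In this example g2 belongs to both cluster(g3) and cluster(g4), so realizing both clusters would duplicate it. cluster(g4) has no room for g1, so g4 lands one level deeper (label 2); raising `max_size` to 4 brings it back to label 1.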
Phase 1
Phase 1: Node Duplication Reduction
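One way to measure the duplication that this step reduces is to instantiate clusters starting from the primary outputs, then recursively instantiate the cluster rooted at every gate that feeds an instantiated cluster from outside, and count gates that appear in more than one cluster. A minimal sketch, assuming the per-gate cluster map is already known (hard-coded here); `form_clusters` and `duplication_count` are hypothetical names, and this is not presented as the author's reduction method.

```python
def form_clusters(fanins, primary_outputs, cluster_of):
    """Instantiate clusters from the outputs backwards (hypothetical sketch).

    cluster_of: dict mapping each gate to the member set of the cluster
    rooted at it (for example, produced by a labeling pass).
    Returns a dict from root gate to the members of its instantiated cluster.
    """
    instantiated = {}
    stack = list(primary_outputs)
    while stack:
        root = stack.pop()
        if root in instantiated:
            continue
        members = cluster_of[root]
        instantiated[root] = members
        # Every gate driving this cluster from outside needs its own cluster.
        for w in members:
            for u in fanins.get(w, []):
                if u not in members and u in cluster_of:
                    stack.append(u)
    return instantiated


def duplication_count(instantiated):
    """Extra gate copies: total cluster members minus distinct gates."""
    total = sum(len(m) for m in instantiated.values())
    distinct = len(set().union(*instantiated.values()))
    return total - distinct


# Usage: g3 and g4 are both primary outputs.
fanins = {"g1": ["a", "b"], "g2": ["b", "c"], "g3": ["g1", "g2"], "g4": ["g3", "c"]}
cluster_of = {
    "g1": {"g1"},
    "g2": {"g2"},
    "g3": {"g1", "g2", "g3"},
    "g4": {"g2", "g3", "g4"},
}
clusters = form_clusters(fanins, ["g3", "g4"], cluster_of)
```

Here the clusters rooted at g4, g1, and g3 are instantiated; g1, g2, and g3 each appear twice, giving three extra gate copies.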
Phase 1: Block Usage Results
[Chart: total blocks (0 to 20,000) per MCNC circuit for TVPack, Lawler, and Reduced clustering. Circuits: tseng, ex5p, apex4, dsip, misex3, diffeq, alu4, des, bigkey, seq, apex2, s298, frisc, elliptic, spla, pdc, ex1010, s38417, s38584.1, clma]
Phase 1: Additional Duplication Reduction Through Depth Relaxation
[Chart: Tcrit (ns, 11.5 to 13.1) and CLB count (200 to 500) by clustering method: Lawler's single pass; settings of 70%, 50%, 30%, 20%, 10%, and 5%; TVPack]
Algorithm Overview
Phase 2: Microcluster Compaction with Orchestrator
• Iteratively move microclusters to improve timing
• Can fit multiple microclusters to the same CLB position, provided the aggregate of all microclusters meets CLB constraints
• If an area constraint is given, remove duplication and fragmentation until the constraint is met
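The co-location rule in the second bullet amounts to a feasibility check on the combined contents of one CLB. A minimal sketch, assuming a CLB is limited only by its number of BLEs (N) and distinct external input pins (I), that each gate occupies one BLE, and that copies of the same gate placed in one CLB merge into a single BLE; `ClbArch` and `fits_in_clb` are hypothetical names, not part of T-VPack or VPR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClbArch:
    """Hypothetical CLB limits, loosely modelled on a VPR-style cluster."""
    bles: int    # N: basic logic elements per CLB
    inputs: int  # I: distinct external input pins per CLB


def fits_in_clb(microclusters, fanins, arch):
    """Check whether microclusters sharing one CLB meet its limits.

    Assumes duplicated copies of the same gate in one CLB merge into one BLE.
    microclusters: iterable of sets of gate names.
    fanins: dict mapping each gate to the list of nodes that drive it.
    """
    nodes = set().union(*microclusters)
    if len(nodes) > arch.bles:
        return False
    # External inputs: signals used inside the CLB but produced outside it.
    external = {u for w in nodes for u in fanins.get(w, ()) if u not in nodes}
    return len(external) <= arch.inputs


# Usage: two overlapping microclusters share g2 and g3.
fanins = {"g1": ["a", "b"], "g2": ["b", "c"], "g3": ["g1", "g2"], "g4": ["g3", "c"]}
pair = [{"g1", "g2", "g3"}, {"g2", "g3", "g4"}]
ok = fits_in_clb(pair, fanins, ClbArch(bles=4, inputs=3))
```

Because the shared gates merge, the pair needs only four BLEs and three inputs (a, b, c), so it fits a 4-BLE, 3-input CLB but not a 3-BLE one.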
Phase 2: Orchestrator Example
Results: Timing
[Chart: Tcrit (ns, 0 to 25) per MCNC benchmark for T-VPack and Orchestrator. Benchmarks: dsip, bigkey, des, misex3, seq, apex4, alu4, ex5p, s38584.1, apex2, diffeq, tseng, spla, ex1010, pdc, s38417, elliptic, s298, clma, frisc]
Results: Area
[Chart: CLBs used (0 to 1,400) and Tcrit (ns, 0 to 25) per MCNC benchmark for T-VPack and Orchestrator]
Results: Timing vs. Area
[Chart: Tcrit (ns, 11.5 to 14) and CLB count (200 to 400) by clustering setting: Unlimited, Min +3, Min +2, Min +1, Minimum, and TVPack]
Results: Timing vs. Depth
[Chart: timing improvement (-5% to 25%) versus depth improvement (0% to 60%)]
Conclusions
• Reducing logic depth contributes to lower critical-path delay
• Node duplication, when used effectively, reduces critical-path delay
• Duplication can give the designer a controllable performance-area trade-off
Future Work
• Promising Post-Placement Optimizations:
  • Retiming
    • Leverage a more significant depth reduction
  • Logic reintroduction
    • Create duplication to increase performance