Upload
others
View
20
Download
1
Embed Size (px)
Citation preview
Clustering of Flip-Flops for Useful-Skew Clock Tree Synthesis
Chuan Yean Tan • Purdue University Rickard Ewetz • University of Central Florida Cheng-Kok Koh • Purdue University
1
Outline • MBFF Introduction • Previous MBFF/Clustering Solutions • Bounded Arrival Time Useful Skew Tree • Clustering of Flip-Flops Mechanics• Proposed Methodology • Evaluation & Experimental Results• Summary
2
Outline • MBFF Introduction • Previous MBFF/Clustering Solutions • Bounded Arrival Time Useful Skew Tree • Clustering of Flip-Flops Mechanics• Proposed Methodology • Evaluation & Experimental Results• Summary
3
Multi-Bit Flip Flop (MBFF)
D0 Q0
CLK
D0 Q0
CLK
D1 Q1D0 Q0
CLK
D1 Q1
D2 Q2D3 Q3
1-Bit Flip-Flop 2-Bit Flip-Flop 4-Bit Flip-Flop
4
Benefits of utilizing MBFF
• Reduce power consumption• Reduce routing complexity and clock tree complexity,
less elements to connect leads to less routing complexity
5[3] L.Chen et al. Using MBFF for clock power saving by design compiler. – Proceedings of Synopsys User Group, 2010
MBFF Power And Area Savings[3]
Outline • MBFF Introduction • Previous MBFF/Clustering Solutions • Bounded Arrival Time Useful Skew Tree • Clustering of Flip-Flops Mechanics• Proposed Methodology • Evaluation & Experimental Results• Summary
6
Previous Works Summary
7
Key Methodologies of Previous Works:• Timing-Driven Intersecting of Feasible Regions[14] • Maximum Independent Set based on heuristic approach [2]• x- and y- interval graphs to determine maximum clique [9]• K-means Algorithm [15,10]
[2] Y.T.Chang et al. Post-placement power optimization with multi-bit flip-flops - ICCAD 2010[9] I.H.R. Jiang et al. INTEGRA: Fast multi-bit flip-flop cluster for clock power saving. - IEEE Transactions CAD of I.C.S. 2015[10] A.B.Khang et al. Improve flop tray-based design implementation for power reduction. - ICCAD 2016[14] S.H.Wang et al. Power-driver flip-flop merging and relocation. - IEEE Transactions CAD of I.C.S 2012[15] G.Wu et al. Flip-Flop clustering by weighted k-means algorithm. - DAC 2016
D Q
CLK
fanini
fanouti
8
Fanout’s slack
Timing-Driven Clustering of Flip-Flops
Feasible Region
D Q
CLK
fanini
fanouti
9
Feasible Region
Flip-Flop is allowed to moved with the region
Analysis of Previous Works
• Previous work cluster flip-flop without consideration of clock tree synthesis (CTS)
• Timing slacks are obtained pre-CTS, timing does not correlate as accurately after CTS
• Most modern designs adopt useful-skew tree to reduce area and power consumption
• However, once the UST has been built, updating or displacing FF’s placement is expensive!
10
Outline • MBFF Introduction • Previous MBFF/Clustering Solutions • Bounded Arrival Time Useful Skew Tree • Clustering of Flip-Flops Mechanics• Proposed Methodology • Evaluation & Experimental Results• Summary
11
D Q
CLK
D Q
CLK
Launch FFi Capture FFj
Combinational Logic
tiCQ tij
max tjS
tjHtij
mintiCQ
Setup Constraints: ti + tiCQ + tij
max + tjS ≤ tj + T
Hold Constraints: ti + tiCQ + tij
min ≥ tj + tjH
12
Setup Constraints: ti + tiCQ + tij
max + tjS ≤ tj + T
Hold Constraints: ti + tiCQ + tij
min ≥ tj + tjH
Constraints Formulation: uij = T – ti
CQ – tijmax – tj
S
lij = tjH – ti
CQ – tijmin
Skew Constraints: lij ≤ ti – tj ≤ uijlij ≤ skewij ≤ uij
13
Skew Constraint Graph (SCG)
14
FF1 FF4
Skew Constraint Graph
FF3
FF2l12 ≤ t1 – t2 ≤ u12 l24 ≤ t2 – t4 ≤ u24
l13 ≤ t1 – t3 ≤ u13 l34 ≤ t3 – t4 ≤ u34
Weighted Edges: w12 = u12w21 = -l12
From the skew constraints: t1 – t2 ≤ u12t2 – t1 ≤ -l12
Generalize weighted constraint: ti – tj ≤ wij
Useful-Skew Tree based on Arrival Time Constraints [5]
FF1 FF4
Skew Constraint Graph (SCG)
FF3
FF2l12 ≤ t1 – t2 ≤ u12 l24 ≤ t2 – t4 ≤ u24
l13 ≤ t1 – t3 ≤ u13 l34 ≤ t3 – t4 ≤ u34
ff4ub
ff4lb
ff2ub
ff2lb
ff3ub
ff3lb
Arrival Time Ranges
15
ff1ub
ff1lb
[5] R. Ewetz et al. Clock Tree Construction based on arrival time constraints – ACM 2017
LP Formulation
No negative cyclefrom SCG
[5] R. Ewetz et al. Clock Tree Construction based on arrival time constraints – ACM 2017
D Q
CLK
D Q
CLK
D Q
CLK
D Q
CLK
Arrival Time Ranges Clock Tree Construction
16
ff4ub
ff4lb
ff2ub
ff2lb
ff3ub
ff3lb
ff1ub
ff1lb
Useful-Skew Tree based on Arrival Time Constraints [5]
Clock Tree Synthesis
Outline • MBFF Introduction • Previous MBFF/Clustering Solutions • Bounded Arrival Time Useful Skew Tree • Clustering of Flip-Flops Mechanics• Proposed Methodology • Evaluation & Experimental Results• Summary
17
Clustering of FFs Mechanics
D Q
CLK
D Q
CLK
FFiMerge FF Candidate 1
FFjMerge FF Candidate 2
fanini
fanouti
fanoutj
faninj
18
Displacing FFs
D Q
CLK
D Q
CLK
FFiMerge FF Candidate 1
FFjMerge FF Candidate 2
MBFF Merged Location
ΔkiΔkj
Worst Case: FF’s displacement introduces increase in wirelength 19
Displacing FFs
D Q
CLK
D Q
CLK
FFiMerge FF Candidate 1
FFjMerge FF Candidate 2
update1
Moving each FF will require update to SCG to avoid “out-dated” skew constraints!!
D0 Q0
CLK
D1 Q1
20
MBFFij
update2
Displacing FFs
D Q
CLK
D Q
CLK
FFiMerge FF Candidate 1
FFjMerge FF Candidate 2
MBFFij
update1
Furthermore, displacing FFi does not guarantee that all skew constraints will be met after update1
D0 Q0
CLK
D1 Q1update2
21
Timing Delays
D Q
CLK
D Q
CLK
FFiMerge FF Candidate 1
FFjMerge FF Candidate 2
Merge Location
ΔkiΔkj
Let Δk = timing delay for increase in wirelength 22
Timing Delays
fanini
fanouti
fanoutj
faninj
D0 Q0
CLK
D1 Q1
MBFFij
Δk1
Δk4
Δk3
Δk2
ΣΔki = total timing delay increased from displacing FFs23
Comparing FF’s Arrival Time Ranges
FF1ub
FF1lb
FF2ub
FF3ub
FF4ub
FF5ub
FF2lb
FF3lb
FF4lb
FF4lb
Example of arrival time ranges derived from skew constraints24
Merging Flip-Flop Candidates
FF1ub
FF1lb
FF2ub
FF3ub
FF4ub
FF5ub
FF2lb
FF3lb
FF4lb
FF4lb
Merge Candidate
25
Overlapping Arrival Time Ranges Between FFs
FF1ub
FF1lb
FF2ub
FF3ub
FF4ub
FF5ub
FF2lb
FF3lb
FF4lb
FF4lb
Overlapping Region
26
Updating New Arrival Time Range
FF1ub
FF1lb
FF2ub
MBFF1ub
FF5ub
FF2lb
MBFF1lb
FF4lb
Resulting Merge FF Arrival Time Range
Fast computation to update Arrival Time Ranges after merging FFs 27
Outline • MBFF Introduction • Bounded Arrival Time Useful Skew Tree • Previous MBFF/Clustering Solutions • Clustering of Flip-Flops Mechanics• Proposed Methodology • Evaluation & Experimental Results• Summary
28
Proposed Methodology
Design Netlist
Specify Arrival Time Range
Select a FF Candidate Randomly
Cluster Flip-Flops Algorithm
Clock Tree Synthesis
if not all FF evaluated
29
Cluster Flip-Flops Algorithm
Find Flip-Flop Nearest Neighbor
Arrival Time Ranges Overlaps?
Compute Cluster Location
Scaled Arrival Time Ranages Overlaps?
Cluster Flip-Flops
New FF Candidate
YES
NO
NO
YES
30
Cluster Flip-Flops Algorithm
Find Flip-Flop Nearest Neighbor
Arrival Time Ranges Overlaps?
Compute Cluster Location
Scaled Arrival Time Ranages Overlaps?
Cluster Flip-Flops
New FF Candidate
YES
NO
NO
YES
31
Nearest Neighbor FF Selection HeuristicD Q
CLK
D Q
CLK
D Q
CLK
D Q
CLK
D Q
CLK
D Q
CLK
D Q
CLK
Cluster Pair 1
Cluster Pair 2
Cluster Pair 3
32
Cluster Flip-Flops Algorithm
Find Flip-Flop Nearest Neighbor
Arrival Time Ranges Overlaps?
Compute Cluster Location
Scaled Arrival Time Ranages Overlaps?
Cluster Flip-Flops
New FF Candidate
YES
NO
NO
YES
33
Compute Clustering Location
Cluster Location Algorithm:Weighted FF candidate based on FF’s fanout Higher fanout Higher capacitive load
D Q
CLK
D Q
CLK
Merge FF Candidate 1, FFiNo. of Fanout = 5
Merge FF Candidate 2, FFjNo. of Fanout = 2
Cluster Location
34
Compute Clustering Location
D Q
CLK
D Q
CLK
Merge FF Candidate 1, FFiNo. of Fanout = 5
Merge FF Candidate 2, FFjNo. of Fanout = 2
Cluster Location
Displacing FF with higher capacitive load will incur higher delay change
35
Cluster Flip-Flops Algorithm
Find Flip-Flop Nearest Neighbor
Arrival Time Ranges Overlaps?
Compute Cluster Location
Scaled Arrival Time Ranges Overlaps?
Cluster Flip-Flops
New FF Candidate
YES
NO
NO
YES
36
Arrival Time Range Overlaps
FF1ub
FF1lb
FF2ub
FF3ub
FF4ub
FF5ub
FF2lb
FF3lb
FF4lb
FF4lb
Overlapping Region
37
Updating Scaled Arrival Time Ranges
FF1ub
FF1lb
FF2ub
MBFF1ub
FF5ub
FF2lb
MBFF1lb
FF4lb
38
Scaled Arrival Time Range update after FF has been moved
Scaled from FF’s displacement
Cluster Flip-Flops Algorithm
Find Flip-Flop Nearest Neighbor
Arrival Time Ranges Overlaps?
Compute Cluster Location
Scaled Arrival Time Ranages Overlaps?
Cluster Flip-Flops
New FF Candidate
YES
NO
NO
YES
39
Outline • MBFF Introduction • Previous MBFF/Clustering Solutions • Bounded Arrival Time Useful Skew Tree • Clustering of Flip-Flops Mechanics• Proposed Methodology • Evaluation & Experimental Results• Summary
40
Evaluation Framework
41
Monte Carlo Framework: • Process Variation (10%)• Voltage Variation (15%) • Temperature Variation (30%)
Tree Structure: • Zero-Skew Tree (ZST) • Useful-Skew Tree (UST) [5]
Cluster FF
CTS[5]
CTO[16,17,18]
Quality of Results (QOR)
[5] R. Ewetz et al. Clock Tree Construction based on arrival time constraints – ACM 2017[16] R. Ewetz et al. MCMM clock tree optimization based on slack redistribution using a reduced slack graph. ASP-DAC ’16[17] V. Ramachandran. Construction of minimal functional skew clock trees. ISPD’12[18] S. Roy, et al. Clock tree resynthesis for multi-corner multi-mode timing closure. ISPD’14
Clustering FF Results
42
8 – 14% decrease in total capacitance
Benchmark from IWLS 2005 [8]
[8] International Workshop for Logic Synthesis – http://irwls.org/iwls2005/benchmarks.html - 2005
Clustering FF + MBFF Transformation Results
43
20% – 34% decrease in total capacitance
Data & Clock Path Cap Reduction
44
Assuming worst casedatapath wirelength increase
Clock Path WireCap Reduction >> Data Path Wire Cap Reduction
Outline • MBFF Introduction • Bounded Arrival Time Useful Skew Tree • Previous MBFF/Clustering Solutions • Clustering of Flip-Flops Mechanics• Proposed Methodology • Evaluation & Experimental Results• Summary
45
Summary
46
• Displacing FFs requires update to skew constraints to ensure timing is met
• Clustering of Flip-Flops based on Bounded Arrival Time Range Clock Tree Synthesis provides fast update to the skew constraints while ensuring timing correctness
• Clustering of FFs reduces routing complexity • MBFF transformation further reduces power consumption
47
Q&A
Backup Slides
48
Bounded Arrival Time Ranges
FF1ub
FF1lb
FF2ub
MBFF1ub
FF5ub
FF2lb
MBFF1lb
FF4lb
Resulting Merge FF Arrival Time Range
49
Skew Constraint Graph (SCG)
FF1 FF4
Skew Constraint Graph
FF3
FF2l12 ≤ t1 – t2 ≤ u12 l24 ≤ t2 – t4 ≤ u24
l13 ≤ t1 – t3 ≤ u13 l34 ≤ t3 – t4 ≤ u34
50