Power-Optimal Pipelining in Deep Submicron Technology
Seongmoo Heo and Krste Asanović
Computer Architecture Group, MIT CSAIL
ISLPED 2004, 8/10/2004
Traditional Pipelining
• Goal: Maximum performance
[Figure: pipeline stage timing between clock edges. Labels: Clk, Clk-Q, Propagation Delay, Setup, Vdd]
Pipelining as a Low-Power Tool
• Goal: Low-Power, Fixed Throughput
• Time slack: traded for power (supply voltage scaling)
[Figure: pipeline stage timing between clock edges. Labels: Clk, Clk-Q, Propagation Delay, Setup, Time Slack, Vdd]
[Figure: power vs. delay, clock frequency fixed. Pipelining creates time slack at the cost of flip-flop power overhead; supply voltage scaling then turns the slack into a power saving]
Power-Optimal Pipelining
• Power reduction from pipelining is limited by the power overhead of the increased number of flip-flops → Power-Optimal Pipelining
[Figure: power vs. delay. Regions: too shallow pipelining, optimal pipelining (optimal power saving), too deep pipelining]
Contribution
• Pipelining is an old idea.
• Research focus has been on the performance impact of pipelining.
• The idea of using pipelining to lower power [Chandrakasan ’92] has not been fully explored in deep submicron technology.
• Analysis and circuit-level simulation of Power-Optimal Pipelining for different regimes of Vth, activity factor, and clock gating
Bottom-to-Top Approach
1. Impact of pipelining on each power component
2. Impact of pipelining on total power (with/without clock-gating)
[Figure: power vs. time over active, inactive, and active periods, drawn for total power (clock-gated) and total power (not clock-gated). Components: Switching Power, Leakage Power, Idle Power]
* Idle power = power consumed when circuit is idle and not clock-gated
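The composition of total power from its components can be sketched in a few lines. This is a minimal illustration, not the authors' model: it assumes the activity factor is the fraction of time the circuit is active, that an inactive clock-gated circuit consumes only leakage power, and that an inactive circuit without clock-gating consumes idle power. The function and parameter names are hypothetical.

```python
def total_power(activity_factor, active_power, leakage_power, idle_power, clock_gated):
    """Time-averaged power over active and inactive periods.

    Assumption of this sketch: activity_factor is the fraction of time
    the circuit is active. While inactive, a clock-gated circuit only
    leaks; a circuit that is not clock-gated consumes idle power.
    """
    inactive_power = leakage_power if clock_gated else idle_power
    return activity_factor * active_power + (1.0 - activity_factor) * inactive_power


# Hypothetical component values in arbitrary units, not simulation results.
gated = total_power(0.25, active_power=10.0, leakage_power=1.0, idle_power=4.0, clock_gated=True)
ungated = total_power(0.25, active_power=10.0, leakage_power=1.0, idle_power=4.0, clock_gated=False)
print(gated, ungated)
```

With these placeholder numbers the clock-gated total is lower, because idle power includes flip-flop switching on top of leakage.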
Methodology
• Target digital system: Fixed throughput, Highly parallel computation, Logic-dominant
• Test bench
– BPTM (Berkeley Predictive Technology Model) 70nm process: LVT(0.17/-0.2), MVT(0.19/-0.22), HVT(0.21/-0.24)
– Hspice simulation at 100°C, Clock = 2 GHz
[Figure: one pipeline stage. Baseline: N FO4 inverters (N = 2 ~ 24) between TG flip-flops]
Pipelining and Switching Power: Analytical Trend
[Figure: switching power vs. number of FO4 per stage, N. Flip-flop overhead follows O(1/N); the quadratic reduction of logic switching power follows O(N^2); the optimal FO4 and optimal saving are marked at the minimum]
• Logic switching power ∝ Vdd^2 ∝ N^2
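The two trends can be combined into a toy curve to see where the minimum falls. This is a minimal sketch, assuming relative switching power is the sum of an N^2 logic term and a 1/N flip-flop term. The coefficients are hypothetical and were picked only so that the minimum lands at N = 6; they are not fitted to the simulation data.

```python
def switching_power(n, logic_coeff=1.0, ff_coeff=432.0):
    """Toy model: logic switching ~ N^2, flip-flop overhead ~ 1/N.

    logic_coeff and ff_coeff are illustrative placeholders.
    """
    return logic_coeff * n ** 2 + ff_coeff / n


depths = range(2, 25)  # N = 2 ~ 24 FO4 per stage
n_star = min(depths, key=switching_power)

# Saving relative to the shallowest pipelining in the sweep (N = 24).
saving = 1.0 - switching_power(n_star) / switching_power(24)
print(n_star, round(saving, 3))
```

Changing `ff_coeff` moves the minimum: a larger flip-flop overhead pushes the optimum toward more FO4 per stage and shrinks the saving.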
Pipelining and Leakage Power: Analytical Trend
[Figure: leakage power vs. number of FO4 per stage, N. Flip-flop overhead follows O(1/N); the superlinear reduction of logic leakage power follows O(N^α) (1 < α < 2); the optimal FO4 and optimal saving are marked at the minimum]
• Logic leakage power ∝ Vdd * e^(ηVdd) ∝ N^α (DIBL effect)
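The location of the minimum between an O(N^α) trend and an O(1/N) trend follows from a one-line derivative. The derivation below is a sketch under the assumption that the power component is the sum of the two terms, with hypothetical positive coefficients a (logic) and b (flip-flop overhead) that do not appear on the slides.

```latex
\begin{align*}
P(N) &= a N^{\alpha} + \frac{b}{N}, \qquad a, b > 0,\ \alpha \ge 1 \\
\frac{dP}{dN} &= \alpha a N^{\alpha - 1} - \frac{b}{N^{2}} = 0 \\
N^{*} &= \left( \frac{b}{\alpha a} \right)^{\frac{1}{\alpha + 1}}
\end{align*}
```

For α = 2 (the switching power trend) this gives N* = (b / 2a)^(1/3), and for α = 1 (the flip-flop switching trend) it gives N* = (b / a)^(1/2). The second derivative is positive for N > 0, so the stationary point is a minimum.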
Pipelining and Idle Power: Analytical Trend
• Clock-gating is not always possible
– Increased control complexity
– Insufficient setup time of clock enable signal
• Idle power = Leakage Power + Flip-flop Switching Power
– Scales between leakage power scaling and flip-flop switching power scaling, depending on leakage level
[Figure: relative idle power vs. number of FO4 per stage, N, in two panels with optimal FO4 and optimal saving marked. Leakage power scale: O(1/N) and O(N^α) (1 < α < 2). Flip-flop switching power scale: O(1/N) and O(N), the linear reduction of flip-flop switching power]
• Flip-flop switching power ∝ 1/N * Vdd^2 ∝ N
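How idle power moves between the two scales can be illustrated by mixing two normalized toy curves. This is a minimal sketch with several assumptions that are not from the slides: each component is modeled as N^p + c/N, α is set to 1.5, the overhead constant c is 36, and N = 24 is taken as the reference depth. `leak_share` is a hypothetical parameter for the fraction of idle power that is leakage at the reference depth.

```python
REFERENCE_DEPTH = 24  # assumed reference point for normalization


def trend(n, exponent, overhead=36.0):
    """Toy curve: logic-side term ~ N^exponent plus flip-flop count term ~ 1/N."""
    return n ** exponent + overhead / n


def relative_idle_power(n, leak_share, alpha=1.5):
    """Idle power at depth n, relative to idle power at REFERENCE_DEPTH.

    leak_share is the leakage fraction of idle power at the reference
    depth; the remainder is flip-flop switching power.
    """
    leak = trend(n, alpha) / trend(REFERENCE_DEPTH, alpha)
    ff_switching = trend(n, 1.0) / trend(REFERENCE_DEPTH, 1.0)
    return leak_share * leak + (1.0 - leak_share) * ff_switching


def best_saving(leak_share, depths=range(2, 25)):
    """Largest relative saving over the swept depths."""
    return 1.0 - min(relative_idle_power(n, leak_share) for n in depths)


for share in (0.0, 0.5, 1.0):
    print(share, round(best_saving(share), 3))
```

In this toy model the saving grows with the leakage share, since the superlinear leakage curve falls faster than the linear flip-flop switching curve.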
Simulation Results: Power Components
Fixed Throughput @ 2 GHz

Power Component    Right-hand-side curve          Saving*                 N*
Switching Power    O(N^2)                         79 (HVT) ~ 82 (LVT)%    6
Idle Power         O(N) or O(N^α) (1 < α < 2)     55 (HVT) ~ 70 (LVT)%    6
Leakage Power      O(N^α) (1 < α < 2)             70 (LVT) ~ 75 (HVT)%    8

N = Number of FO4 inverters per stage (not including flip-flop delay)
N* = Optimal N
Saving* = Optimal power saving by pipelining
Optimal Power Saving
[Figure: relative power vs. activity factor at 2 GHz, two panels, with the LVT case highlighted. Clock gating (switching power and leakage power): optimal FO4 = 6. No clock gating (switching power and idle power): optimal FO4 = 6~8]
* Flip-flop delay not included in optimal FO4
Discussion
• LVT can be fast and power-efficient
– enables lower Vdd
• Flip-flop delay is more important than flip-flop power for power-optimal pipelining
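Limitation of This Work
Effects not modeled, with their effect on optimal power saving and on optimal logic depth:
• Super-linear growth of flip-flops: optimal power saving ↓, optimal logic depth ↑
• Additional memory: optimal power saving ↓, optimal logic depth ↑
• Parasitic wire capacitance: ↓ ↑
• Reduced glitches: ↑ ↓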
Conclusion
• Pipelining is an effective low-power tool when used to support voltage scaling in digital systems implementing highly parallel computation.
• Optimal Logic Depth: 6-8 FO4
– ~ 8-10 FO4 including flip-flop delay
• Optimal Power Saving: 55 – 80%
– It depends on Vth, AF, Clock-Gating
• Insights:
– Pipelining is more effective with high AF
  • Pipelining is most effective at saving switching power
– Pipelining is more effective with lower Vth
  • Except when leakage power is dominant
– Pipelining is more effective with clock-gating
  • Reduced flip-flop overhead
Acknowledgments
• Thanks to SCALE group members and anonymous reviewers
• Funded by NSF CAREER award CCR-0093354, NSF ITR award CCR-0219545, and a donation from Intel Corporation.
BACKUP SLIDES