Agenda
• Introduction
• LP techniques in detail
• Challenges to low power techniques
• Guidelines for choosing various techniques
Why is Power an Issue?
[Figure: active power and leakage power across process technology nodes (180, 130, 90, 65 nm); performance = mW/MHz. Callouts: complex system, power hungry process. Source: Intel, 2004]
Sluggish Battery Life Improvement
[Figure: battery life improvement by year, 2000 to 2004. Source: EETimes, 2004]
Approaches To Power Management
Design and System Level Optimization
• System architecture (multi-core)
• Software/hardware power management system
– ARM IEM
• Voltage scaling / frequency scaling
• Multiple voltage islands
Implementation (we will discuss this in detail)
• Clock gating, logic structuring
• Multi-Vth cell selection to reduce leakage
• Support for multi-voltage islands (aka "multi-vdd" aka "MSV") implementation
• Signoff accurate analysis
Process Level Optimization
• SOI
• High-K, gate stack, power gating, etc.
• LLD
Controlling Power in Implementation
Dynamic power (≈ k • C_L • V_DD^2 • f_CLK)
• Clock gating (including de-clone)
• Area optimization
• Static voltage scaling (MSV)*
• Dynamic voltage frequency scaling (DVFS)*
• Adaptive voltage scaling (AVS)*
Leakage power (≈ V_DD • I_leakage)
• Multi-Vt cell optimization
• Substrate biasing (VTCMOS)
• Power shut-off (PSO), aka MTCMOS, including state retention
– Fine grain control
– Coarse grain control
* Techniques that affect both dynamic and leakage power
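As a rough illustration of the two expressions on this slide, the Python sketch below evaluates dynamic and leakage power at a few supply voltages. The activity factor, switched capacitance, clock rate and leakage current are hypothetical values chosen only to make the arithmetic visible; they are not figures from the presentation. Leakage current is also held constant here for simplicity, which a real process would not do.

```python
def dynamic_power(k, c_load, vdd, f_clk):
    """Dynamic power ~ k * C_L * VDD^2 * f_CLK, in watts."""
    return k * c_load * vdd ** 2 * f_clk


def leakage_power(vdd, i_leakage):
    """Leakage power ~ VDD * I_leakage, in watts."""
    return vdd * i_leakage


# Hypothetical block: activity 0.2, 1 nF switched capacitance,
# 200 MHz clock, 1 mA leakage current (all assumed values).
K, C_LOAD, F_CLK, I_LEAK = 0.2, 1e-9, 200e6, 1e-3

for vdd in (1.2, 1.0, 0.8):
    p_dyn = dynamic_power(K, C_LOAD, vdd, F_CLK)
    p_leak = leakage_power(vdd, I_LEAK)
    print(f"VDD={vdd:.1f} V  dynamic={p_dyn * 1e3:.1f} mW  leakage={p_leak * 1e3:.2f} mW")
```

Because dynamic power depends on the square of the supply, moving from 1.2 V to 0.8 V at the same clock rate scales it by (0.8/1.2)^2, roughly 0.44, while the leakage term in this simplified model only scales linearly.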
Techniques and Trade-offs
Power reduction technique     Leakage power  Dynamic power  Timing penalty  Area penalty  Methodology impact
Dynamic power optimization    10%            10%            0%              -10%          None
Multi-Vt optimization         6X             0%             0%              0%            Low
Clock gating                  0%             20%            0%              <2%           Low
Voltage islands               2X             40-50%         0%              <10%          Medium
Power shut-off (PSO)          10-50X         0%             4-8%            5-15%         Medium-high
DVFS and AVS                  2-3X           40-70%         0%              <10%          High
Substrate biasing             10X            -              10%             <10%          High
Source: customer interviews, conference papers (ISSCC), magazine articles
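One way to put the trade-off table to work is to encode it and filter by what a project can afford. The sketch below does this in Python; the `Technique` record, the numeric ranking of methodology impact and the use of the upper bound of each area-penalty range are assumptions made for illustration, while the table values themselves are copied from the slide.

```python
from dataclasses import dataclass

# Assumed ordering of the "methodology impact" column, lowest to highest.
IMPACT_RANK = {"None": 0, "Low": 1, "Medium": 2, "Medium-high": 3, "High": 4}


@dataclass(frozen=True)
class Technique:
    name: str
    leakage_saving: str   # as quoted in the table
    dynamic_saving: str   # as quoted in the table
    max_area_pct: float   # upper bound of the area penalty column
    impact: str           # methodology impact column


TECHNIQUES = [
    Technique("Dynamic power optimization", "10%", "10%", -10, "None"),
    Technique("Multi-Vt optimization", "6X", "0%", 0, "Low"),
    Technique("Clock gating", "0%", "20%", 2, "Low"),
    Technique("Voltage islands", "2X", "40-50%", 10, "Medium"),
    Technique("Power shut-off (PSO)", "10-50X", "0%", 15, "Medium-high"),
    Technique("DVFS and AVS", "2-3X", "40-70%", 10, "High"),
    Technique("Substrate biasing", "10X", "-", 10, "High"),
]


def candidates(max_area_pct, max_impact):
    """Return techniques within an area budget and a methodology-impact ceiling."""
    limit = IMPACT_RANK[max_impact]
    return [t.name for t in TECHNIQUES
            if t.max_area_pct <= max_area_pct and IMPACT_RANK[t.impact] <= limit]


print(candidates(max_area_pct=5, max_impact="Low"))
```

With a 5% area budget and a "Low" impact ceiling, only dynamic power optimization, multi-Vt optimization and clock gating remain; relaxing the limits to 10% and "Medium" also admits voltage islands.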
Dynamic Power Optimization (No V)
Pin swapping: pair low-capacitance (C) pins with high-frequency (F) nets
Gate sizing: CMOS power usage is related to cell size
Buffer removal: remove unnecessary buffers
Clock Gating
• Relies on clock gate control signal in RTL or netlist
RTL:
    always @(posedge clk)
        if (en) out <= in;
Maps to either:
1. User-defined gating module
2. Clock-gating integrated cell from library
3. Gating function built from standard logic
[Figure: a control signal gates clk to Block A; Block B receives clk directly]
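To see why gating the clock saves dynamic power without changing behaviour, the Python sketch below models the `if (en) out <= in;` register cycle by cycle and counts how many clock edges actually reach the flip-flop. It is a behavioural illustration only, with a made-up input and enable pattern, and does not describe any particular library cell.

```python
def run_register(inputs, enables, gated):
    """Simulate `if (en) out <= in;` over a sequence of clock cycles.

    Returns the final register value and the number of clock edges that
    reach the flip-flop. With gating, the edge is suppressed when en is 0;
    without gating, the flop is clocked every cycle and simply holds its
    old value when en is 0.
    """
    out = 0
    clock_edges = 0
    for data, en in zip(inputs, enables):
        if gated and not en:
            continue          # gated clock: no edge reaches the flop
        clock_edges += 1
        if en:
            out = data        # enabled: capture new data
    return out, clock_edges


inputs = [3, 7, 1, 9, 4, 6, 2, 8]
enables = [1, 0, 0, 1, 0, 0, 0, 0]   # hypothetical: enabled 2 cycles out of 8

plain = run_register(inputs, enables, gated=False)
gated = run_register(inputs, enables, gated=True)
print(plain, gated)
```

Both runs end with the same register value, but the gated version clocks the flop on 2 of the 8 cycles instead of all 8, which is where the reduction in clock-pin switching comes from.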
Designing with Voltage Islands
[Figure: a chip floorplan with a 1.0 V power domain, a 1.2 V power domain containing memory, and Power Domain 3 (0.8 V). Cells are low Vt (high speed), normal Vt, or high Vt (low leakage, lower speed). Voltage level shifters and clamps sit at the domain boundaries.]
Power Switch-Off (PSO) Methodologies
Fine Grain Power Switches
[Figure: a standard cell (A to Z) between VDD and real VSS with the power switch built in, controlled by SLEEP; virtual VSS has no pin]
Logical representation: no change except for SLEEP
Coarse Grain Power Switches
[Figure: standard cells between VDD and a shared virtual VSS, connected to real VSS through a separate switch module controlled by SLEEP]
Logical representation: logic needs to be power aware!
Coarse Grain PSO Methodologies
[Figure: two floorplans, Cluster Switches and Segmented Switches. Each shows an always-on (default) domain and an on/off domain, with global Vdd, power switching cells, switched VDD or separated area VDD, and a common GND.]
Dynamic Voltage Frequency Scaling
• Hardware that scales supply voltage and clock frequency in response to software demands
– 16 levels of VDD (use 5 to 7 in practice) from 1.1 V to 0.6 V
– Clock frequency from 200 MHz to 700 MHz in increments of 33 MHz
• Triggered when the load changes (detected by CPU software or HW); load means the number of functions to be executed
– Heavier load → ramp up supply voltage; when stable, scale up clock frequency
– Lighter load → scale down clock frequency; when the PLL locks onto the new rate, ramp down supply voltage
• Must keep clock frequency within the limits required by the supply voltage to avoid clock skew problems and timing violations
– Worst-case scenario of a full swing from 0.6 V to 1.1 V and from 200 MHz to 700 MHz could take about 280 microseconds
[Figure: Energy Characteristics of a Processor. Power and energy plotted against operating points: 300 MHz at 0.80 V, 433 MHz at 0.875 V, 533 MHz at 0.95 V, 667 MHz at 1.05 V, 800 MHz at 1.15 V, 900 MHz at 1.25 V, 1000 MHz at 1.3 V. Source: magazine article]
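The ordering rule on this slide (voltage first when speeding up, frequency first when slowing down) can be sketched in a few lines of Python. The operating points are the ones listed on the slide's chart; the function name and the action strings are hypothetical, and a real controller would talk to a regulator and PLL rather than return text.

```python
# (frequency in MHz, voltage in V), from the operating-point chart on this slide
OPERATING_POINTS = [
    (300, 0.80), (433, 0.875), (533, 0.95), (667, 1.05),
    (800, 1.15), (900, 1.25), (1000, 1.30),
]


def transition_steps(current, target):
    """Return the ordered actions for moving between two operating-point indices."""
    f_now, v_now = OPERATING_POINTS[current]
    f_new, v_new = OPERATING_POINTS[target]
    if target > current:
        # Heavier load: raise the supply first, then the clock.
        return [f"ramp VDD {v_now} V -> {v_new} V",
                "wait for supply to stabilise",
                f"set clock {f_now} MHz -> {f_new} MHz"]
    if target < current:
        # Lighter load: lower the clock first, then the supply.
        return [f"set clock {f_now} MHz -> {f_new} MHz",
                "wait for PLL lock",
                f"ramp VDD {v_now} V -> {v_new} V"]
    return []  # already at the requested point


for step in transition_steps(0, 3):
    print(step)
```

Ordering the steps this way means the clock never runs faster than the present supply voltage supports, which is the constraint the slide calls out.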
Dynamic Voltage Frequency Scaling
Mode       Core domain       Sleep domain      Slow domain
Baseline   1.08 V, 125 MHz   1.08 V, 125 MHz   1.08 V, 125 MHz
Slow       1.08 V, 125 MHz   1.08 V, 125 MHz   0.9 V, 66 MHz
Standby    0.0 V             1.08 V, 125 MHz   0.0 V
• Multiple modes need to be analyzed/optimized for multiple corners
– Setup analysis for (WC, 1,125C) corner
• Multiple constraints (.sdc)
– Example: baseline.sdc, ios.sdc, slow.sdc, sleep.sdc
• Libraries
– stdcell_1.08sl.lib, stdcell_0.9sl.lib, stdcell_1.08fs.lib, stdcell_0.9fs.lib
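The bookkeeping behind multi-mode, multi-corner analysis can be illustrated with a small Python sketch that derives, for each mode in the table on this slide, which library files would be needed. The domain voltages and file names come from the slide; the assumption that each file name encodes a supply voltage followed by a corner suffix (`sl`, `fs`), and that every powered voltage needs both, is mine.

```python
# Supply voltage (V) per domain for each mode, from the mode table on this slide.
MODES = {
    "baseline": {"core": 1.08, "sleep": 1.08, "slow": 1.08},
    "slow":     {"core": 1.08, "sleep": 1.08, "slow": 0.9},
    "standby":  {"core": 0.0,  "sleep": 1.08, "slow": 0.0},
}
CORNER_SUFFIXES = ("sl", "fs")  # suffixes as they appear in the library names


def libraries_for(mode):
    """List the library files needed to analyse one mode at every corner."""
    # Domains at 0.0 V are switched off and need no timing library.
    voltages = sorted({v for v in MODES[mode].values() if v > 0}, reverse=True)
    return [f"stdcell_{v}{suffix}.lib" for v in voltages for suffix in CORNER_SUFFIXES]


for mode in MODES:
    print(mode, libraries_for(mode))
```

Under these assumptions only the slow mode pulls in the 0.9 V libraries, which shows how quickly the number of mode and library combinations grows as voltage domains are added.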
Adaptive Voltage Scaling
[Figure: closed loop control. Performance monitors (PM) inside the CPU/SoC report performance parameters to the power management unit, which sets the operating voltage.]
Substrate Bias Control
[Figure: transistor pair between Vdd and Vss with body bias terminals Vbp and Vbn, and a plot of Vth (V) against Vsb (V). Plotted points (Vsb, Vth): (-2.5, 0.84), (-2, 0.775), (-1.5, 0.7), (-1, 0.625), (-0.5, 0.54), (0, 0.45)]
• For an n-channel device, the substrate is normally tied to ground (Vsb = 0)
• A negative bias on Vsb causes Vth to increase
• Substrate biasing can be done during packaging (VTCMOS) or during operation (ABB)
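The data points on the Vth plot are enough to estimate the threshold voltage at intermediate bias values. The Python sketch below does a simple linear interpolation between them; the point values are read from the slide, while the choice of piecewise-linear interpolation is an assumption made for illustration rather than a device model.

```python
# (Vsb in V, Vth in V) as read from the plot on this slide
VTH_CURVE = [(-2.5, 0.84), (-2.0, 0.775), (-1.5, 0.7),
             (-1.0, 0.625), (-0.5, 0.54), (0.0, 0.45)]


def vth_at(vsb):
    """Linearly interpolate Vth for a substrate bias within the plotted range."""
    if not VTH_CURVE[0][0] <= vsb <= VTH_CURVE[-1][0]:
        raise ValueError("Vsb outside the plotted range of -2.5 V to 0 V")
    for (x0, y0), (x1, y1) in zip(VTH_CURVE, VTH_CURVE[1:]):
        if x0 <= vsb <= x1:
            return y0 + (y1 - y0) * (vsb - x0) / (x1 - x0)


for bias in (0.0, -0.75, -1.25, -2.5):
    print(f"Vsb={bias:+.2f} V  Vth~{vth_at(bias):.3f} V")
```

Reading the curve this way makes the second bullet concrete: each additional half volt of negative bias raises Vth by somewhere between 65 and 90 mV over the plotted range.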
Dynamic Power Optimization (No V)
• Toggle reduction
– Efficient synthesis
• Capacitance reduction
– Placement
– Physical synthesis
• Toggle based capacitance reduction
– Pin swapping
– Area compaction
– Wire length minimization (high-toggle, fanout)
• Useful skew
MVT Optimization
• Library characterization
– Identical footprint
– Footprint independent
• Implementation
– Efficiently replacing lower Vt cells with higher Vt cells
• Analysis
– How/when to measure leakage power?
– Signal integrity analysis
– Lowest leakage state
Clock Gating
• Identifying gating conditions
• Testability requirements
• Physical effects of clock gating
• Timing effects of clock gating
[Figure: clock gating cell styles latch_posedge, latch_posedge_precontrol and latch_posedge_precontrol_obs, with pins ck_in, enable, test, obs and ck_out]
Observability Logic
• Specify max #flops observable per observability flop (default=36)
[Figure: observability flop with scan pins SI, SO and SE]
Low Power Clock Tree Synthesis – De-Cloning
[Figure: before and after de-cloning. CLK drives clock gates (CG, with Enable), which drive flip-flops. Callouts: congestion, skew, dynamic power.]
Voltage Islands
• Which logic modules are suitable for voltage scaling?
• What should be the scaled voltage value for these blocks?
• Library characterization
– Multiple voltages / multiple conditions
– Additional components: voltage level shifters
• Implementation
– Physical shape of the voltage islands
– Level shifter insertion in the netlist
– Placement of level shifters
– Routing to a level shifter
– Power connection of a level shifter
• Analysis
– Timing analysis of islands
– Optimization including level shifters
– Signal integrity analysis
– IR drop and how it affects timing
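Level shifter insertion starts with finding the nets that cross between islands at different voltages. The Python sketch below shows that check on a toy netlist. The domain names and the net tuples are hypothetical, the voltages are the ones shown on the earlier voltage island figure, and the rule "any voltage difference needs a shifter" is a simplification assumed for this sketch.

```python
# Hypothetical domain names; voltages taken from the voltage island figure.
DOMAIN_VOLTAGE = {"PD1": 1.0, "PD2": 1.2, "PD3": 0.8}


def needs_level_shifter(driver_domain, load_domain):
    """A net needs a shifter when its driver and load sit at different voltages."""
    return DOMAIN_VOLTAGE[driver_domain] != DOMAIN_VOLTAGE[load_domain]


def crossings(nets):
    """Return the names of nets that cross between domains of different voltage.

    `nets` is a list of (net_name, driver_domain, load_domain) tuples.
    """
    return [name for name, drv, load in nets if needs_level_shifter(drv, load)]


# Toy netlist, made up for illustration.
nets = [
    ("n1", "PD1", "PD1"),   # stays inside one island
    ("n2", "PD1", "PD2"),   # 1.0 V driving 1.2 V
    ("n3", "PD3", "PD2"),   # 0.8 V driving 1.2 V
]
print(crossings(nets))
```

Each net reported here then feeds the remaining implementation items on the slide: the shifter has to be placed, routed to, and connected to both supplies.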
Power Switch-Off
• Library characterization
– Additional parameters: leakage power, max. current through the cells (Id), max. voltage drop
– Additional cells: switches, isolation cells, state retention cells
• Implementation
– Logic level switch insertion/simulation/verification
– Switch placement schemes: ring/column/distributed
– Switch enable distribution: high fan-out net
– Power planning/routing: fine grain, coarse grain
– SRPG control signals
• Analysis
– Transient analysis
– On/off analysis
– Functional verification
– Sneak path analysis
DVFS/AVS
• Library characterization
– Advanced modeling (ECSM, CCS)
• Implementation
– Clock synchronization
– Use of level shifters in the clock design
• Analysis
– Multi-mode/multi-corner analysis/optimization
– Functional verification (huge for AVS)
Substrate Bias
• Timing analysis
– Characterization for VTCMOS
– Custom analysis for ABB
• Optimization
– Must be aware of body bias
• Well separation
– Between the regions that are subjected to control and those that are not
• Planning/routing additional power signals
– Congestion
– EM
– Cell design
– Functional verification/validation
Variability and Low Power
[Figure: test chip timing path slack distribution, -100 ps to +200 ps. x-axis: slack (ps), in steps of 20 ps; y-axis: % of paths, 0% to 16%; series: notime, timed, MVT, MSV.]
Functional Checks Need to be Done @ Transistor Level
[Figure: a power control FSM drives PwrEn1 and PwrEn2 for blocks A (supply V1) and B (supply V2), with isolation (ISO) between them. A level shifting isolation cell (inputs A and Iso, output Y, supplies Vs and Vc) sits in the source domain, which will be shut off.]
State Retention Register Checks
[Figure: a power controller drives PwrEn1, RTCLK and RET to an SRPG register (pins D, Q, RET; supplies VDD, switched on/off, and VRET). Waveforms show RTCLK, RET, D and Q across sleep and wake; D and Q are X (don't care) while the domain is off.]
• Structural check
– Checks that the RET signal comes from an always-on power domain; VRET tied to continuous power
– Checks that VDD and D pins connect to the same power domain
• Functional checks
– assert (RET || RET ) (RTCLK off)
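The structural checks listed on this slide lend themselves to a simple rule check over connectivity data. The Python sketch below is one possible shape for it; the `RetentionRegister` record, the domain name `AON` and the supply name `VDD_AON` are hypothetical and stand in for whatever a real power-aware netlist database would provide.

```python
from dataclasses import dataclass


@dataclass
class RetentionRegister:
    name: str
    ret_source_domain: str   # power domain that drives the RET pin
    vret_supply: str         # supply net tied to the VRET pin
    vdd_domain: str          # power domain supplying the VDD pin
    d_domain: str            # power domain driving the D pin


# Hypothetical names for the always-on domain and its continuous supply.
ALWAYS_ON_DOMAINS = {"AON"}
CONTINUOUS_SUPPLIES = {"VDD_AON"}


def structural_violations(reg):
    """Return a list of structural rule violations for one retention register."""
    problems = []
    if reg.ret_source_domain not in ALWAYS_ON_DOMAINS:
        problems.append("RET not driven from an always-on domain")
    if reg.vret_supply not in CONTINUOUS_SUPPLIES:
        problems.append("VRET not tied to continuous power")
    if reg.vdd_domain != reg.d_domain:
        problems.append("VDD and D pins are in different power domains")
    return problems


good = RetentionRegister("u_reg0", "AON", "VDD_AON", "CORE", "CORE")
bad = RetentionRegister("u_reg1", "CORE", "VDD_CORE", "CORE", "SLOW")
print(structural_violations(good))
print(structural_violations(bad))
```

A register that passes returns an empty list, while the second example trips all three rules at once.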
Sneak Path Detection
• A floating node when X is switched off can cause additional leakage
• Common in mux logic
[Figure: Block X between VDD and VSS, with enables EN and ENB, input A and output Y]
How To Choose Between Various LP Techniques
• Understand the application/technology need for power reduction
• Choose the techniques based on the power reduction requirement and not vice versa
• Understand the trade-offs, esp. methodology implications
High-level Guidelines for Power Reduction in Design
• Power is a performance parameter similar to area and timing
– Optimize and analyze timing, power and area concurrently
• Choose the LP techniques early in the implementation
– Helps to get max. power reduction
– Architecture/process selection must be driven by power need
• Use of voltage scaling techniques (e.g. MSV, DVFS) leads to quadratic reduction in power
• When not in use, shut it off!
• Verify, verify, verify!
Steps for Successful LP Design Tapeout!
• LP implementation is complex and requires more time (2X) than normal. Plan ahead!
• Library characterization can be time consuming, as new cells need to be designed and the existing cells characterized under new conditions.
• Choose a comprehensive implementation tool to address not only a range of techniques, but also trade-offs between power, area and timing.
• LP techniques force you to change the existing methodology, adding new tools and steps. In order to be successful, consider partnering with an EDA vendor (Cadence!)
• Verification is key to successful implementation. Make sure the verification tool can understand low power techniques.