Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
© 2011 IBM Corporation
Active Management of Timing Guardband to Save Energy in POWER7
Charles Lefurgy, Alan Drake, Michael Floyd, Malcolm Allen-Ware, Bishop Brock, Jose Tierno, and John Carter
MICRO-44 5 December 2011
© 2011 IBM Corporation2
Excess guardband
� The voltage used on a microprocessor is conservative to provide a safe timing margin under worst-case conditions
– workload-induced voltage droops (dI/dt or load line)– high temperature
� Concern: Energy-efficiency is reduced to guarantee reliability.
� Opportunity: Worst-case conditions rarely occur. Can actual timing margin be controlled?
Voltage
Maximum
frequencyNominal guardband
reliable for worst-case load
Microprocessor operating points
FA
IL
Reduced guardbandwith active management
maxmin
© 2011 IBM Corporation
Our solution
� New capability to keep timing margin nearly constant– Convert excess timing margin into a voltage reduction– Reduce traditional voltage margin when conditions are not worst-case
(Some voltage margin is retained for aging, calibration inaccuracy, etc.)
1. Measure excess operational margin with timing margin sensor
– Difference from a calibrated reference point
2. Protect timing margin against voltage droop by adjusting frequency– Hardware-based timing margin controller
3. Save energy by converting excess timing margin into voltage reduction– Software-based performance controller
AdjustclockAdjustclock
Criticalpath
monitor
Criticalpath
monitor
AdjustvoltageAdjustvoltage
Frequency Target
Voltage
Performancecontroller
Timing margin
controller
Measuredfrequency
Timing margin differential
© 2011 IBM Corporation
Measure timing margin
� Use Critical Path Monitor (CPM) circuit. Mimics behavior of real critical path.
� Each cycle: generate pulse, traverse synthesized critical path and calibrated delay, capture in edge detector
� Edge detector 12-bit output: (bit 0 = less margin, bit 11 = more margin)
Edge Detector
Critical Path Monitor
© 2011 IBM Corporation5
Critical path monitor
� 5 Critical Path Monitors per core in POWER7 (8 core chip)
� Middle bits of edge detector are forwarded to DPLL
L 3
L 2
VSU&
FPU
ISU
IFU
LSU
FX
U
NCU
CORE
D
FU
5
5 DPLL
CPM
5-bit output
per CPM
POWER7 core chiplet
CPM output
“11111” = large margin“11110” = some margin
“11100” = ideal margin
“11000” = margin too small
“10000” = not enough margin
© 2011 IBM Corporation
Example of critical path monitor output
� Inject 60 mV droop into Power 755 Express Server (with no load-line)– Instruction fetch throttling
� Critical path sensor follows on-chip voltage reduction
Injected Instantaneous voltage droop
Measured CPM response (controllers disabled)
-70-50-30-101030
1500 2000 2500 3000Time (ns)
Voltage
difference
from
nominal
(mV)
345678
1500 2000 2500 3000
Time (ns)
Edge
Detector
Position
© 2011 IBM Corporation
Protect timing margin
� Timing margin controller responds to changing operating conditions by adjusting frequency to maintain timing margin target.
– Implemented in hardware of POWER7.– Can reduce frequency by -7% in about 5 ns to handle fast voltage droop.
CPM
DPLL
+/- freq
Workload, temperature,
voltage, and frequency
influence CPM output
L 3
L 2
VSU&
FPU
ISU
IFU
LSU
FX
U
NCU
CORE
DFU
5
5 DPLL
CPM
5-bit outputper CPM
POWER7 core chiplet POWER7
© 2011 IBM Corporation
Calibration of critical path
� Teach the chip the desired timing margin to use during field operation
� Done once during manufacturing of chip
� Run chip at desired timing margin– Set voltage, frequency, and temperature
– Run stressful workload
� Find delay setting that places timing edge on position 6 in edge detector– Position 6 is the setpoint for the timing margin controller
� Calibrations used in our study“100% guardband” – original product“50% guardband” – removes roughly 50% of the guardband“0% guardband” – removes nearly all guardband
© 2011 IBM Corporation
3050
3150
3250
3350
3450
3550
3650
0 500 1000 1500 2000 2500 3000 3500Time (nanoseconds)
Frequency
(MHz)
-60
-40
-20
0
20
Deviation
from Nominal
(mV)
9
Timing margin controller response time
� Quick enough to follow voltage droops
345678
0 500 1000 1500 2000 2500 3000 3500
Time (Nanoseconds)
Edge
Detector
Position
CPM response to droop event (timing margin control enabled)
Frequency response to droop event (timing margin control enabled)
© 2011 IBM Corporation
Save energy
� Performance controller adjusts voltage to meet desired clock frequency target.– Implemented in firmware of on-board microcontroller– Frequency is capped at target + 28 MHz (clock resolution)
• Prevent energy waste• Allow for detection of excess timing margin for voltage reduction
CPM
DPLL
+/- freq -
Real frequency
Adjust
voltageVoltage
regulator
Workload, temperature,
voltage, and frequency influence CPM output Core target frequency
POWER7 Microcontroller
© 2011 IBM Corporation11
Demonstration
© 2011 IBM Corporation
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Traditional
voltage
(CPM off)
50% 0%
FanProcessor + memory bufferDIMMOther
12
Results
� Implement in Power 750 Express Server(4 POWER7 processors, 64 GB)
– Run SPEC CPU 2006 workloads– Frequency target = 3864 MHz (Turbo)
� Guardband reduced to 50%– 20% chip power reduction
Voltage reduced 113 mV – 140 mV – 18% system power reduction
– 50% fan power reduction– No change in performance
� Upper bound on power reduction (0% guardband)
– 24% chip power reduction – 21% system power reduction
Normalized System
DC Power
Guardband
© 2011 IBM Corporation
Related work
� Razor [Ernst et al., MICRO-36, 2003]– Instrument actual critical paths
• Compute both speculative and known-good values– Rollback and replay instructions when timing error detected
• Allows safe removal of all timing margin
Up to 24%33-50%
[Das, 2009]
Power reduction
Less (localized)More (pervasive)Design and Verification Effort
No change1-3% overhead from
roll-back recovery
[Ernst, MICRO 2003]
Performance
0.2% of coreLess than 3% of chip
[Das, 2009]
Area
Active Guardband
Management in
POWER7
Razor
© 2011 IBM Corporation
Conclusions
� Demonstration of a new capability to keep timing margin nearly constant
� Architecture combines two feedback controllers– Hardware-based timing margin controller (safety)– Software-based performance controller (undervolting)
� Reduced average chip power by 20% for SPEC CPU2006
� Working prototype in POWER7 server
� Commercially viable
© 2011 IBM Corporation15
Acknowledgements
� Philip RestleAlexander RylyakovDaniel FriedmanDaniel Beece
(IBM Watson Research Lab)
Groundwork for the CPM-clock feedback control loop and extensive modeling to validate the CPM-DPLL feedback implemented in the POWER7 chip.
� Richard Willaman (IBM Austin POWER Systems Lab)
Collected the instantaneous droop scope data.
� This work was supported in part by the Defense Advanced Research Projects Agency under contract #HR0011-07-9-0002.
© 2011 IBM Corporation