16
© 2011 IBM Corporation Active Management of Timing Guardband to Save Energy in POWER7 Charles Lefurgy , Alan Drake, Michael Floyd, Malcolm Allen-Ware, Bishop Brock, Jose Tierno, and John Carter MICRO-44 5 December 2011

Active Management of Timing Guardband to Save Energy in ......© 2011 IBM Corporation Active Management of Timing Guardband to Save Energy in POWER7 Charles Lefurgy , Alan Drake, Michael

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

  • © 2011 IBM Corporation

    Active Management of Timing Guardband to Save Energy in POWER7

    Charles Lefurgy, Alan Drake, Michael Floyd, Malcolm Allen-Ware, Bishop Brock, Jose Tierno, and John Carter

    MICRO-44 5 December 2011

  • © 2011 IBM Corporation2

    Excess guardband

    � The voltage used on a microprocessor is conservative to provide a safe timing margin under worst-case conditions

    – workload-induced voltage droops (dI/dt or load line)– high temperature

    � Concern: Energy-efficiency is reduced to guarantee reliability.

    � Opportunity: Worst-case conditions rarely occur. Can actual timing margin be controlled?

    Voltage

    Maximum

    frequencyNominal guardband

    reliable for worst-case load

    Microprocessor operating points

    FA

    IL

    Reduced guardbandwith active management

    maxmin

  • © 2011 IBM Corporation

    Our solution

    � New capability to keep timing margin nearly constant– Convert excess timing margin into a voltage reduction– Reduce traditional voltage margin when conditions are not worst-case

    (Some voltage margin is retained for aging, calibration inaccuracy, etc.)

    1. Measure excess operational margin with timing margin sensor

    – Difference from a calibrated reference point

    2. Protect timing margin against voltage droop by adjusting frequency– Hardware-based timing margin controller

    3. Save energy by converting excess timing margin into voltage reduction– Software-based performance controller

    AdjustclockAdjustclock

    Criticalpath

    monitor

    Criticalpath

    monitor

    AdjustvoltageAdjustvoltage

    Frequency Target

    Voltage

    Performancecontroller

    Timing margin

    controller

    Measuredfrequency

    Timing margin differential

  • © 2011 IBM Corporation

    Measure timing margin

    � Use Critical Path Monitor (CPM) circuit. Mimics behavior of real critical path.

    � Each cycle: generate pulse, traverse synthesized critical path and calibrated delay, capture in edge detector

    � Edge detector 12-bit output: (bit 0 = less margin, bit 11 = more margin)

    Edge Detector

    Critical Path Monitor

  • © 2011 IBM Corporation5

    Critical path monitor

    � 5 Critical Path Monitors per core in POWER7 (8 core chip)

    � Middle bits of edge detector are forwarded to DPLL

    L 3

    L 2

    VSU&

    FPU

    ISU

    IFU

    LSU

    FX

    U

    NCU

    CORE

    D

    FU

    5

    5 DPLL

    CPM

    5-bit output

    per CPM

    POWER7 core chiplet

    CPM output

    “11111” = large margin“11110” = some margin

    “11100” = ideal margin

    “11000” = margin too small

    “10000” = not enough margin

  • © 2011 IBM Corporation

    Example of critical path monitor output

    � Inject 60 mV droop into Power 755 Express Server (with no load-line)– Instruction fetch throttling

    � Critical path sensor follows on-chip voltage reduction

    Injected Instantaneous voltage droop

    Measured CPM response (controllers disabled)

    -70-50-30-101030

    1500 2000 2500 3000Time (ns)

    Voltage

    difference

    from

    nominal

    (mV)

    345678

    1500 2000 2500 3000

    Time (ns)

    Edge

    Detector

    Position

  • © 2011 IBM Corporation

    Protect timing margin

    � Timing margin controller responds to changing operating conditions by adjusting frequency to maintain timing margin target.

    – Implemented in hardware of POWER7.– Can reduce frequency by -7% in about 5 ns to handle fast voltage droop.

    CPM

    DPLL

    +/- freq

    Workload, temperature,

    voltage, and frequency

    influence CPM output

    L 3

    L 2

    VSU&

    FPU

    ISU

    IFU

    LSU

    FX

    U

    NCU

    CORE

    DFU

    5

    5 DPLL

    CPM

    5-bit outputper CPM

    POWER7 core chiplet POWER7

  • © 2011 IBM Corporation

    Calibration of critical path

    � Teach the chip the desired timing margin to use during field operation

    � Done once during manufacturing of chip

    � Run chip at desired timing margin– Set voltage, frequency, and temperature

    – Run stressful workload

    � Find delay setting that places timing edge on position 6 in edge detector– Position 6 is the setpoint for the timing margin controller

    � Calibrations used in our study“100% guardband” – original product“50% guardband” – removes roughly 50% of the guardband“0% guardband” – removes nearly all guardband

  • © 2011 IBM Corporation

    3050

    3150

    3250

    3350

    3450

    3550

    3650

    0 500 1000 1500 2000 2500 3000 3500Time (nanoseconds)

    Frequency

    (MHz)

    -60

    -40

    -20

    0

    20

    Deviation

    from Nominal

    (mV)

    9

    Timing margin controller response time

    � Quick enough to follow voltage droops

    345678

    0 500 1000 1500 2000 2500 3000 3500

    Time (Nanoseconds)

    Edge

    Detector

    Position

    CPM response to droop event (timing margin control enabled)

    Frequency response to droop event (timing margin control enabled)

  • © 2011 IBM Corporation

    Save energy

    � Performance controller adjusts voltage to meet desired clock frequency target.– Implemented in firmware of on-board microcontroller– Frequency is capped at target + 28 MHz (clock resolution)

    • Prevent energy waste• Allow for detection of excess timing margin for voltage reduction

    CPM

    DPLL

    +/- freq -

    Real frequency

    Adjust

    voltageVoltage

    regulator

    Workload, temperature,

    voltage, and frequency influence CPM output Core target frequency

    POWER7 Microcontroller

  • © 2011 IBM Corporation11

    Demonstration

  • © 2011 IBM Corporation

    0%

    10%

    20%

    30%

    40%

    50%

    60%

    70%

    80%

    90%

    100%

    Traditional

    voltage

    (CPM off)

    50% 0%

    FanProcessor + memory bufferDIMMOther

    12

    Results

    � Implement in Power 750 Express Server(4 POWER7 processors, 64 GB)

    – Run SPEC CPU 2006 workloads– Frequency target = 3864 MHz (Turbo)

    � Guardband reduced to 50%– 20% chip power reduction

    Voltage reduced 113 mV – 140 mV – 18% system power reduction

    – 50% fan power reduction– No change in performance

    � Upper bound on power reduction (0% guardband)

    – 24% chip power reduction – 21% system power reduction

    Normalized System

    DC Power

    Guardband

  • © 2011 IBM Corporation

    Related work

    � Razor [Ernst et al., MICRO-36, 2003]– Instrument actual critical paths

    • Compute both speculative and known-good values– Rollback and replay instructions when timing error detected

    • Allows safe removal of all timing margin

    Up to 24%33-50%

    [Das, 2009]

    Power reduction

    Less (localized)More (pervasive)Design and Verification Effort

    No change1-3% overhead from

    roll-back recovery

    [Ernst, MICRO 2003]

    Performance

    0.2% of coreLess than 3% of chip

    [Das, 2009]

    Area

    Active Guardband

    Management in

    POWER7

    Razor

  • © 2011 IBM Corporation

    Conclusions

    � Demonstration of a new capability to keep timing margin nearly constant

    � Architecture combines two feedback controllers– Hardware-based timing margin controller (safety)– Software-based performance controller (undervolting)

    � Reduced average chip power by 20% for SPEC CPU2006

    � Working prototype in POWER7 server

    � Commercially viable

  • © 2011 IBM Corporation15

    Acknowledgements

    � Philip RestleAlexander RylyakovDaniel FriedmanDaniel Beece

    (IBM Watson Research Lab)

    Groundwork for the CPM-clock feedback control loop and extensive modeling to validate the CPM-DPLL feedback implemented in the POWER7 chip.

    � Richard Willaman (IBM Austin POWER Systems Lab)

    Collected the instantaneous droop scope data.

    � This work was supported in part by the Defense Advanced Research Projects Agency under contract #HR0011-07-9-0002.

  • © 2011 IBM Corporation