Power-Optimal Pipelining in Deep Submicron Technology
Seongmoo Heo and Krste Asanović
Computer Architecture Group, MIT CSAIL
ISLPED 2004, 8/10/2004

scale.eecs.berkeley.edu/papers/pipe-islped2004-slides.pdf


Traditional Pipelining

• Goal: Maximum performance

[Figure: pipeline stage between clocked flip-flops at supply Vdd, with a timing diagram showing Clk-Q delay, propagation delay, and setup time within one clock period.]

Pipelining as a Low-Power Tool

• Goal: Low-Power, Fixed Throughput

[Figure: pipelined stages between clocked flip-flops at supply Vdd. Each clock period now holds Clk-Q delay, propagation delay, setup time, and time slack. The time slack is traded for power through supply voltage scaling.]
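The timing relationship on this slide can be sketched in a few lines of Python. This is only an illustration: the function name and the individual delay values are hypothetical, and only the 2 GHz clock (a 500 ps period) is taken from the test bench described in the slides.

```python
def time_slack(clock_period_ps, clk_q_ps, propagation_ps, setup_ps):
    """Return the unused part of the clock period for one pipeline stage."""
    return clock_period_ps - (clk_q_ps + propagation_ps + setup_ps)


# 2 GHz clock, as in the slides' test bench.
PERIOD_PS = 500.0

# Hypothetical flip-flop and logic delays, chosen only for illustration.
shallow_slack = time_slack(PERIOD_PS, clk_q_ps=60.0, propagation_ps=400.0, setup_ps=40.0)
deep_slack = time_slack(PERIOD_PS, clk_q_ps=60.0, propagation_ps=200.0, setup_ps=40.0)

print(f"slack with long logic path:  {shallow_slack} ps")
print(f"slack with short logic path: {deep_slack} ps")
```

Halving the logic per stage leaves part of the fixed clock period unused; that unused time is the slack that supply voltage scaling can spend.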

Pipelining as a Low-Power Tool

[Figure: power versus delay at fixed clock frequency. Pipelining creates time slack at the cost of flip-flop power overhead; supply voltage scaling then converts the slack into a power saving.]

Power-Optimal Pipelining

• Power reduction from pipelining is limited by the power overhead of the increased number of flip-flops → Power-Optimal Pipelining

[Figure: power versus delay, marking too-shallow pipelining, too-deep pipelining, and the optimal pipelining point with its optimal power saving.]

Contribution

• Pipelining is an old idea.
• Research focus has been on the performance impact of pipelining.
• The idea of using pipelining to lower power [Chandrakasan ’92] has not been fully explored in deep submicron technology.
• Analysis and circuit-level simulation of Power-Optimal Pipelining for different regimes of Vth, activity factor, and clock gating

Bottom-to-Top Approach

1. Impact of pipelining on each power component
2. Impact of pipelining on total power (with/without clock-gating)

[Figure: power versus time over active, inactive, and active periods. Total power is built up from a switching power component, a leakage power component, and an idle power component, shown once for the clock-gated case and once for the not clock-gated case.]

* Idle power = power consumed when the circuit is idle and not clock-gated
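A minimal sketch of how the three components might combine into total power, under assumptions that are not stated in the slides: the activity factor is treated as the fraction of time the block is active, leakage is drawn at all times in the clock-gated case, and idle power (which already includes leakage) replaces the active power during inactive periods when there is no clock gating. The function and parameter names are hypothetical.

```python
def total_power(activity_factor, p_switching, p_leakage, p_idle, clock_gated):
    """Average power of a block from its three components (toy model).

    Assumption: activity_factor is the fraction of time the block is active.
    Assumption: p_idle already contains the leakage drawn while idle.
    """
    if not 0.0 <= activity_factor <= 1.0:
        raise ValueError("activity_factor must lie in [0, 1]")
    if clock_gated:
        # Switching only while active; leakage all the time.
        return activity_factor * p_switching + p_leakage
    # Not clock-gated: idle power is burned whenever the block is inactive.
    active = activity_factor * (p_switching + p_leakage)
    inactive = (1.0 - activity_factor) * p_idle
    return active + inactive


# Hypothetical component values in arbitrary units.
gated = total_power(0.5, p_switching=10.0, p_leakage=2.0, p_idle=4.0, clock_gated=True)
ungated = total_power(0.5, p_switching=10.0, p_leakage=2.0, p_idle=4.0, clock_gated=False)
print(gated, ungated)
```

In this toy model the two cases differ only during inactive periods, which is why the idle power component matters only without clock gating.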

Methodology

• Target digital system: fixed throughput, highly parallel computation, logic-dominant
• Test bench
  – BPTM (Berkeley Predictive Technology Model) 70nm process: LVT (0.17/-0.2), MVT (0.19/-0.22), HVT (0.21/-0.24)
  – Hspice simulation at 100°C, Clock = 2 GHz

[Figure: baseline of one pipeline stage, N FO4 inverters (N = 2 ~ 24) between TG flip-flops.]

Pipelining and Switching Power: Analytical Trend

• Quadratic reduction of logic switching power: $\propto V_{dd}^{2} \propto N^{2}$, giving an $O(N^{2})$ curve
• Flip-flop overhead: $O(1/N)$

[Figure: switching power versus number of FO4 per stage, N, marking the optimal FO4 and the optimal saving.]
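As a worked illustration of where the optimum on this slide comes from, assume the switching power of a stage is simply the sum of the two trends, with hypothetical positive constants $a$ (logic) and $b$ (flip-flop overhead) that the slides do not give:

```latex
\begin{aligned}
P_{sw}(N) &= a N^{2} + \frac{b}{N} \\
\frac{dP_{sw}}{dN} &= 2 a N - \frac{b}{N^{2}} = 0
\quad\Longrightarrow\quad
N^{*} = \left( \frac{b}{2a} \right)^{1/3}
\end{aligned}
```

Under this simplified model a larger flip-flop overhead $b$ pushes the optimal logic depth $N^{*}$ up, but only with a cube-root dependence.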

Pipelining and Leakage Power: Analytical Trend

• Superlinear reduction of logic leakage power: $\propto V_{dd} \, e^{\eta V_{dd}} \propto N^{\alpha}$ with $1 < \alpha < 2$ (DIBL effect)
• Flip-flop overhead: $O(1/N)$

[Figure: leakage power versus number of FO4 per stage, N, marking the optimal FO4 and the optimal saving.]
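One way to see why the exponent sits between 1 and 2 is to look at the local log-log slope of the leakage expression with respect to the supply voltage. This is a sketch, not the authors' derivation, and it assumes $V_{dd}$ scales roughly in proportion to $N$, as the switching-power trend ($V_{dd}^{2} \propto N^{2}$) implies:

```latex
\alpha_{\text{local}}
= \frac{d \ln\!\left( V_{dd}\, e^{\eta V_{dd}} \right)}{d \ln V_{dd}}
= \frac{d \left( \ln V_{dd} + \eta V_{dd} \right)}{d \ln V_{dd}}
= 1 + \eta V_{dd}
```

The slope is therefore above 1 for any positive $\eta$, and it stays below 2 as long as $\eta V_{dd} < 1$. That bound on $\eta V_{dd}$ is an assumption here, chosen to be consistent with the range stated on the slide.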

Pipelining and Idle Power: Analytical Trend

• Clock-gating is not always possible
  – Increased control complexity
  – Insufficient setup time of clock enable signal
• Idle power = Leakage Power + Flip-flop Switching Power
  – Its scaling falls between leakage power scaling and flip-flop switching power scaling, depending on the leakage level

Pipelining and Idle Power: Analytical Trend

• Leakage Power Scale: curves $O(N^{\alpha})$ ($1 < \alpha < 2$) and $O(1/N)$
• Flip-flop Switching Power Scale: linear reduction of flip-flop switching power, $\propto \frac{1}{N} V_{dd}^{2} \propto N$; curves $O(N)$ and $O(1/N)$

[Figure: two panels of relative idle power versus number of FO4 per stage, N, one per scale, each marking the optimal FO4 and the optimal saving.]
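The "between the two scales" behaviour of idle power can be illustrated with a small numerical sweep. This is a toy model under loud assumptions: the blend weight `leak_share`, the exponent `alpha = 1.5`, and the overhead constant are all hypothetical, and only the sweep range N = 2 to 24 is borrowed from the methodology slide.

```python
def idle_power(n, leak_share, alpha=1.5, overhead=50.0):
    """Toy idle power for n FO4 per stage (arbitrary units).

    leak_share = 1.0 gives the pure leakage scale, O(n**alpha);
    leak_share = 0.0 gives the pure flip-flop switching scale, O(n).
    The overhead / n term stands in for the O(1/n) curve.
    """
    rising = leak_share * n ** alpha + (1.0 - leak_share) * n
    return rising + overhead / n


def best_depth(leak_share, depths=range(2, 25)):
    """Return the FO4 count in `depths` that minimizes the toy idle power."""
    return min(depths, key=lambda n: idle_power(n, leak_share))


for share in (0.0, 0.5, 1.0):
    print(f"leak_share={share}: best N = {best_depth(share)}")
```

With these made-up constants, the steeper leakage scale favours a smaller N than the linear flip-flop switching scale, and intermediate leakage levels land in between.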

Simulation Results: Power Components

Power Components    Right hand side curve                               Saving*
Switching Power     $O(N^{2})$                                          79 (HVT) ~ 82 (LVT) %
Leakage Power       $O(N^{\alpha})$ ($1 < \alpha < 2$)                  70 (LVT) ~ 75 (HVT) %
Idle Power          $O(N)$ or $O(N^{\alpha})$ ($1 < \alpha < 2$)        55 (HVT) ~ 70 (LVT) %

N* column values: 8, 6, 6 (row assignment not preserved in the extracted text)

N = Number of FO4 inverters per stage (not including flip-flop delay)
N* = Optimal N
Saving* = Optimal power saving by pipelining

Fixed Throughput @ 2 GHz

Optimal Power Saving

[Figure: two panels of relative power versus activity factor at 2 GHz. Clock Gating panel: Optimal FO4 = 6. No Clock Gating panel: Optimal FO4 = 6~8. Annotations label idle power, switching power, leakage power, and LVT.]

* Flip-flop delay not included in optimal FO4

Discussion

• LVT can be fast and power-efficient
  – Enables lower Vdd
• Flip-flop delay is more important than flip-flop power for power-optimal pipelining

Limitation of This Work

                                     Effect on optimal logic depth    Effect on optimal power saving
Reduced glitches                     ↓                                ↑
Parasitic wire capacitance           ↑                                ↓
Additional memory                    ↑                                ↓
Super-linear growth of flip-flops    ↑                                ↓

Conclusion

• Pipelining is an effective low-power tool when used to support voltage scaling in digital systems implementing highly parallel computation.
• Optimal Logic Depth: 6-8 FO4
  – ~ 8-10 FO4 including flip-flop delay
• Optimal Power Saving: 55 – 80%
  – It depends on Vth, AF, Clock-Gating
• Insights:
  – Pipelining is more effective with high AF
    • Pipelining is most effective at saving switching power
  – Pipelining is more effective with lower Vth
    • Except when leakage power is dominant
  – Pipelining is more effective with clock-gating
    • Reduced flip-flop overhead

Acknowledgments

• Thanks to SCALE group members and anonymous reviewers
• Funded by NSF CAREER award CCR-0093354, NSF ITR award CCR-0219545, and a donation from Intel Corporation.


BACKUP SLIDES