Energy Recovery from High-frequency Clocks using DC-DC Converters

Mehdi Alimadadi, Samad Sheikhaei, Guy Lemieux, Shahriar Mirabbasi, William Dunford (University of British Columbia, Canada)

Patrick Palmer (University of Cambridge, UK)



2

Problem

Clock power in high-performance CPUs

CPU                      Year  Clock    Power    % Power for Clock  Clock Power
Intel McKinley (180nm)   2002  1 GHz    130W     33%                43W
Intel Montecito (90nm)   2005  2.5 GHz  85W      30%                25W
IBM Power 6 (65nm)       2007  5 GHz    > 100W   22%                > 22W

• Cause

– Charge big clock capacitor Cclk with energy

– Discharge Cclk energy to GND (WASTE IT!!)

– Repeat every clock cycle
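
The charge-and-dump cycle above can be put in rough numbers with the standard dynamic-power relation P = Cclk · Vdd² · f: every cycle draws Cclk · Vdd² from the supply, and all of it ends up as heat once Cclk is discharged to GND. The sketch below is only an order-of-magnitude illustration. The values of Cclk, Vdd and f are assumptions chosen for the example, not measurements from the slides.

```python
def clock_power_watts(c_clk_farads, vdd_volts, f_hz):
    """Power drawn by a full-swing clock node that is charged to Vdd and
    fully discharged to GND once per cycle: P = C * Vdd^2 * f."""
    return c_clk_farads * vdd_volts ** 2 * f_hz


# Hypothetical operating point (assumed for illustration only).
C_CLK = 20e-12   # 20 pF clock load
VDD = 1.0        # 1 V supply
F_CLK = 3e9      # 3 GHz clock

p_clk = clock_power_watts(C_CLK, VDD, F_CLK)
print(f"Clock power: {p_clk * 1e3:.0f} mW")
```

Because the power scales linearly with f and with Cclk, the same relation explains why clock power reaches tens of watts on a full CPU.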

3

Primary Contribution of This Work

• Primary contribution

– Discharge Cclk using DC-DC converter instead of GND

  • Use converter to power useful load (Rload)

  • Integrated clock drivers with DC-DC converters

  • Net savings in power

[Figure: clock driver discharging Cclk through a DC-DC converter into a useful load, with voltage feedback for regulation]

4

Summary Results

• Explore 3 main DC-DC power converter topologies

– Buck converter: our previous work [ ISSCC 2007 ]

– Boost converter: this paper [ ISVLSI 2008 ]

– Buck-boost converter: this paper [ ISVLSI 2008 ]

• 90nm layouts, 3GHz operation, < 0.3mm²

Converter                     Clock-only power  Extra power to operate  Converter     % clock energy
                              (input)           converter (input)       output power  recovered
Buck converter [ ISSCC2007 ]  40mW              16mW                    26mW          50%
Boost converter               100mW             25mW                    28mW          20%
Buck-boost converter          100mW             72mW                    48mW          30%

Background

6

Background – Typical Clocking Architecture

[Figure: clock distribution network, from the clock source through the Level 1 & Level 2 H-tree and the Level 3 gaters & final drivers to the final H-tree and bottom mesh]

7

Background – Typical Clocking Architecture

• Clock distribution

– Majority of energy used by final drivers

– Levels 1, 2

  • H-trees

  • Tunable delays (CVDs) to eliminate skew

  • Low-swing, differential signals: low power, noise immunity

  • ~5W of power

– Level 3

  • Gaters reduce clock activity 50-85% (Power6)

    – Can’t eliminate all activity: still need a clock to compute

  • Final clock drivers

    – Full-rail swing tapered inverters drive hundreds of latches: high power

    – H-tree with ends shorted by mesh: low skew, high power

  • ~15W to 40W of power

8

Background – Reducing Clock Power

• Clock distribution

– Low-swing (differential) signals

  • Final drivers need full-rail

– Resonant clocking (saves 80%)

  • Final drivers need square clock

• Final clock drivers

– Adiabatic switching

  • Low-performance, < 100MHz

– Double-edge clocking

  • Feasible, but complex flip-flops, larger loads

  • Compatible with energy recovery in this paper

9

Background – Switch Mode Power Supplies

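• Basic DC-DC converter topologies

– Buck

  • Step down

  • 0 ≤ Vout ≤ VDD

– Boost

  • Step up

  • VDD ≤ Vout

– Buck-boost

  • Negative step up/down

  • Vout ≤ 0

[Figure: buck, boost, and buck-boost converter schematics, each with switch S, diode D, inductor LF, filter capacitor CF, and load RL]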

10

Background – Switch Mode Power Supplies

• DC-DC buck converter

– CMOS inverter as power switches

• Implementation of zero-voltage switching (ZVS)

– Turn on NMOS when Vinv = 0

– Turn on PMOS when Vinv = Vdd

[Figure: buck converter drawn with switch S and diode D, and its CMOS-inverter implementation; labels Vin, Vdd, Vgate, Vinv, IL, L, C, R, Vout]

Background

ISSCC 2007 Design

• ZVS delay circuit

• Integrated clock driver / power converter

12

Integration of Clock and SMPS

• CPU clock: 3GHz clock and large Cclk

• SMPS: large Mp, Mn drive chain

13

Integration of Clock and SMPS

• Combine the driver circuits

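[Figure: combined clock driver and SMPS: CLK in drives a shared chain and the power transistors Mp and Mn; Vclk node with Cclk; output filter Lf and Cf; load Rload; output Vout]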

14

Key Concept: Energy Recycling

• Benefits

– Shared driver chain

– Cclk added to SMPS

• Red path

– NMOS drains Cclk: wastes charge!

• Blue path

– Delay NMOS turn-on: recovers clock charge!

– ZVS (zero voltage switching) in power electronics

[Figure: integrated clock driver and converter with red and blue current paths; labels CLK in, Vclk, Cclk, Lf, Cf, Rload, Vout]

15

ZVS Detailed Operation

• ZVS delay circuit

– Delay only rising edge of Vn

– Implemented inside the clock chain

[Figure: converter circuit with Mp, Mn, gate signals Vp and Vn, Vdd, GND, Vclk node with Cclk, Lf, Cf, Rload, Vout]

16

ZVS Detailed Operation (Mode 1)

• Mode 1 (0 < t < DTsw)

– Mp is ON

– Current builds up in the inductor

– Cclk charges up

[Figure: converter circuit in Mode 1: Mp, Mn, gate signals Vp and Vn, Vdd, GND, Vclk node with Cclk, Lf, Cf, Rload, Vout]

D = Duty cycle

Tsw = Switching period

17

ZVS Detailed Operation (Mode 2)

• Mode 2 (DTsw < t < DTsw+Tzvs)

– Both power transistors are OFF

– Inductor current discharges Cclk

– Cclk charge is recycled to output load

[Figure: converter circuit in Mode 2: Mp, Mn, gate signals Vp and Vn, Vdd, GND, Vclk node with Cclk, Lf, Cf, Rload, Vout]

D = Duty cycle

Tsw = Period

Tzvs = ZVS delay
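
For a rough feel of Mode 2: if the inductor current is treated as constant while both transistors are off (an idealization that the slides do not state), Cclk falls from Vdd to 0 in about t = Cclk · Vdd / IL. The sketch below evaluates that estimate. The function name and the example values of Cclk, Vdd and IL are hypothetical, chosen only for illustration.

```python
def zvs_discharge_time(c_clk_farads, vdd_volts, i_l_amps):
    """Time for a constant current i_l to discharge c_clk from vdd to 0 V.

    Follows from I = C * dV/dt with I held constant (an idealization).
    """
    if i_l_amps <= 0:
        raise ValueError("inductor current must be positive to discharge Cclk")
    return c_clk_farads * vdd_volts / i_l_amps


# Hypothetical values: 21 pF clock load, 1 V swing, 100 mA inductor current.
t_zvs = zvs_discharge_time(21e-12, 1.0, 0.1)
print(f"Estimated ZVS delay: {t_zvs * 1e12:.0f} ps")
```

A larger inductor current shortens the delay, so the best Tzvs depends on the load; a fixed delay circuit can only be exact at one operating point.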

18

ZVS Detailed Operation (Mode 3)

• Mode 3 (DTsw+Tzvs < t < Tsw)

– Mn turns ON when Vclk ≈ 0

  • ZVS for Mn

– Inductor current decreases linearly

[Figure: converter circuit in Mode 3: Mp, Mn, gate signals Vp and Vn, Vdd, GND, Vclk node with Cclk, Lf, Cf, Rload, Vout]

D = Duty cycle

Tsw = Period

Tzvs = ZVS delay
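
The three modes partition one switching period by the boundaries DTsw and DTsw+Tzvs. A minimal sketch of that partition follows. The function name `zvs_mode` and the 40 ps ZVS delay are assumptions made for the example; only the 3GHz switching frequency comes from the slides.

```python
def zvs_mode(t, duty, t_sw, t_zvs):
    """Return the operating mode (1, 2 or 3) at time t within a switching period.

    Mode 1: 0 <= t < D*Tsw              (Mp ON, Cclk charges)
    Mode 2: D*Tsw <= t < D*Tsw + Tzvs   (both OFF, inductor discharges Cclk)
    Mode 3: D*Tsw + Tzvs <= t < Tsw     (Mn ON)
    """
    if not 0.0 < duty < 1.0:
        raise ValueError("duty cycle must be between 0 and 1")
    if duty * t_sw + t_zvs >= t_sw:
        raise ValueError("ZVS delay leaves no time for Mode 3")
    t = t % t_sw  # fold any time into a single period
    if t < duty * t_sw:
        return 1
    if t < duty * t_sw + t_zvs:
        return 2
    return 3


T_SW = 1 / 3e9     # 3 GHz switching frequency, as in the slides
T_ZVS = 40e-12     # hypothetical 40 ps ZVS delay
modes = [zvs_mode(frac * T_SW, 0.5, T_SW, T_ZVS) for frac in (0.1, 0.55, 0.9)]
print(modes)
```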

19

Detailed Operation

• ZVS delay circuit for Mn

– Delay rising edge of Vn

[Figure: ZVS delay circuit (M1 to M4, internal node Vm) generating Vn, shown with Mp, Mn, Vp, Vdd, GND, Vclk node with Cclk, Lf, Cf, Rload, Vout; numbered markers 1 to 4]

20

Detailed Operation

• ZVS delay circuit for Mn

– Falling edges of Vp and Vn are synchronized

[Figure: ZVS delay circuit (M1 to M4, internal node Vm) generating Vn, shown with Mp, Mn, Vp, Vdd, GND, Vclk node with Cclk, Lf, Cf, Rload, Vout]

21

Simulation Voltages

[Figure: simulated voltage (V) vs. time (ns), 0 to 1 ns, for Vclk, Vclk-ref, and Vload, shown next to the ZVS delay circuit schematic]

22

Simulation Currents

[Figure: simulated voltage (V) vs. time (ns), 0 to 1 ns, for Vclk, Vclk-ref, and Vload, shown next to the ZVS delay circuit schematic]

[Figure: simulated current (axis labeled mA) vs. time (ns), 0 to 1 ns, for Lf, Mn, and Mp]

23

Effective Efficiency

• How to measure power efficiency after clock drivers are integrated with DC-DC converters?

– Converter gets “free energy” from clock

– Effective efficiency: how efficient a regular (standalone) power converter must be to equal the efficiency of integrated clock/power converter

Raw efficiency: $\eta_{raw} = \dfrac{P_{out}}{P_{in1}} \times 100$

Effective efficiency: $\eta_{effective} = \dfrac{P_{out}}{P_{in1} - P_{in2}} \times 100$

[Figure: Raw efficiency: Pin1 feeds the integrated clock driver and power converter (or a stand-alone power converter), which delivers Pout. Effective efficiency: Pin2 feeds a dummy; of Pin1, the clock driver portion accounts for Pin2 and the power converter portion for Pin1 – Pin2, which delivers Pout. Recycled energy is not counted as input power.]
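
A minimal sketch of the two efficiency definitions, applied to the figures on the Summary Results slide. It assumes that "Clock-only power (input)" plays the role of Pin2 and "Extra power to operate converter (input)" is Pin1 – Pin2. That mapping is a reading of the slides rather than something they state explicitly, and the function names are hypothetical.

```python
def raw_efficiency(p_out, p_in1):
    """Raw efficiency in percent: output power over total input power."""
    return 100.0 * p_out / p_in1


def effective_efficiency(p_out, p_in1, p_in2):
    """Effective efficiency in percent.

    p_in1: input power of the integrated clock driver + converter
    p_in2: input power of the clock driver alone (clock-only reference)
    Only the extra power (p_in1 - p_in2) is charged to the converter.
    """
    extra = p_in1 - p_in2
    if extra <= 0:
        raise ValueError("integrated circuit must draw more than the clock alone")
    return 100.0 * p_out / extra


# (clock-only, extra, output) in mW, from the Summary Results slide.
summary_mw = {
    "buck": (40.0, 16.0, 26.0),
    "boost": (100.0, 25.0, 28.0),
    "buck-boost": (100.0, 72.0, 48.0),
}
effective = {
    name: effective_efficiency(p_out, clock_only + extra, clock_only)
    for name, (clock_only, extra, p_out) in summary_mw.items()
}
raw_buck = raw_efficiency(26.0, 40.0 + 16.0)

for name, eta in effective.items():
    print(f"{name}: effective efficiency {eta:.1f}%")
print(f"buck: raw efficiency {raw_buck:.1f}%")
```

Under this reading the buck case evaluates to 162.5%, above 100%, which is the sense in which the converter gets "free energy" from the clock.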

24

Buck Converter – Simulation Results

[Figure: effective efficiency (%) vs. Iout (mA), 40 to 100 mA, for D = 30%, 40%, 50%, 60%, 70%]

[Figure: Vout (V) vs. duty ratio (%), 10% to 80%, for Iout = 30, 50, 70, 100]

• Open loop converter (no regulation)

– Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk

25

ISSCC 2007

• 90nm test chip 1mm², buck converter 0.27mm²

26

Buck Converter – Chip Measurement vs. Simulation Results

[Figure: chip measurement vs. simulation (3GHz): effective efficiency (%) vs. Iout (mA), 40 to 100 mA, for D = 30%, 40%, 50%, 60%, 70%]

[Figure: Fsw sweep (D=50%): effective efficiency (%) vs. Iout (mA), 30 to 110 mA, at 3.5GHz, 3GHz, 2.5GHz, 2GHz]

ISVLSI 2008

New Design 1

Boost Converter

28

Boost Converter

• Basic operation

– Vclk provides power & timing

• 0th order result… Vout = D/(1-D)*Vdd
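
A minimal sketch of the 0th-order relation above, evaluated at a few duty cycles. Vdd = 1 V is an assumed value and the function name is hypothetical. The ideal formula ignores losses and the fixed energy budget of Cclk, so real outputs would be lower.

```python
def boost_vout_ideal(duty, vdd):
    """0th-order boost output from the slide: Vout = D / (1 - D) * Vdd."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty cycle must satisfy 0 <= D < 1")
    return duty / (1.0 - duty) * vdd


VDD = 1.0  # assumed supply voltage in volts
for d in (0.4, 0.5, 0.6):
    print(f"D={d:.0%}: Vout = {boost_vout_ideal(d, VDD):.2f} V")
```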

29

Boost Converter

[Figure: boost converter schematic. Labeled values: Cclk = Cshift = 21pF (clock load capacitance), CF = 378pF, LF = 310pH, 2.2pF capacitor, 1kΩ resistor. Devices Mp1 to Mp3 and Mn1 to Mn3 with a tapered driver chain (W/L from 16/0.1 up to 4096/0.1) and 0.75-length devices (216/0.75, 2016/0.75, 36720/0.75). Nodes: Vpulse, Vclk, Vclk_scaled, Cclk_scaled, Vshift, Dshift, ILf, Vout, VDD]

30

Boost Converter – Simulation Results

• Open loop converter (no regulation)

– Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk

[Figure: Vout (V) vs. duty ratio (%), 30% to 80%, for Iout = 10mA, 30mA, 50mA, 70mA, 100mA]

[Figure: effective efficiency (%) vs. Iout (mA), 0 to 100 mA, for D = 40%, 50%, 60%, 70%, 80%]

ISVLSI 2008

New Design 2

Buck-boost Converter

32

Buck-boost Converter

• Basic operation

– Vclk provides power & timing

• 0th order result… Vout = -D²/(1-D)*Vdd
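
A minimal sketch of the 0th-order buck-boost relation, assuming the duty-cycle term in the numerator is D squared and Vdd = 1 V (both assumptions; the function name is hypothetical). Under that reading the output magnitude equals Vdd where D² = 1 – D, at D = (√5 – 1)/2 ≈ 61.8%, which separates the step-down and step-up regions of the ideal curve.

```python
import math


def buck_boost_vout_ideal(duty, vdd):
    """0th-order buck-boost output, assuming Vout = -D^2 / (1 - D) * Vdd.

    The result is negative because the topology inverts the polarity.
    """
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty cycle must satisfy 0 <= D < 1")
    return -(duty ** 2) / (1.0 - duty) * vdd


VDD = 1.0  # assumed supply voltage in volts

# Duty cycle at which |Vout| equals Vdd: the positive root of D^2 + D - 1 = 0.
crossover = (math.sqrt(5.0) - 1.0) / 2.0

for d in (0.3, 0.5, crossover, 0.7):
    print(f"D={d:.1%}: Vout = {buck_boost_vout_ideal(d, VDD):.2f} V")
```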

33

Buck-boost Converter

[Figure: buck-boost converter schematic. Labeled values: Cclk = 21pF (clock load capacitance), Cshift = 21pF, CF = 356pF, LF = 310pH, Cbias, 10.4kΩ and 1kΩ resistors, three diodes in series (each 128/0.1), deep N-well structures. Devices Mp1 to Mp5 and Mn1 to Mn3 with a tapered driver chain (W/L from 16/0.1 up to 4096/0.1) and 0.75-length devices (2016/0.75, 34560/0.75). Nodes: Vpulse, Vclk, Vinv, Vshift, Dshift, Vbias, ILf, Vout, VDD]

34

Buck-boost Converter

[Figure: Vout (V), -2 to 0 V, vs. duty ratio (%), 10% to 70%, for Iout = 10mA, 30mA, 50mA, 70mA, 90mA]

• Open loop converter (no regulation)

– Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk

[Figure: effective efficiency (%) vs. Iout (mA), 0 to 100 mA, for D = 20%, 30%, 40%, 50%, 60%, 70%]

Results and Comparisons

36

Summary Results

• 90nm layouts, 3GHz operation, < 0.3mm²

Converter                     Clock-only power  Extra power to operate  Converter     % clock energy
                              (input)           converter (input)       output power  recovered
Buck converter [ ISSCC2007 ]  40mW              16mW                    26mW          50%
Boost converter               100mW             25mW                    28mW          20%
Buck-boost converter          100mW             72mW                    48mW          30%

37

Comparative Results

• IBM Power6: 100W @ 1V, 341mm², giving Cclk = 13pF/mm²

• Other work: fully on-chip DC-DC buck converter

– S. Abedinpour, B. Bakkaloglu, and S. Kiaei, "A Multi-Stage Interleaved Synchronous Buck Converter with Integrated Output Filter in a 0.18µm SiGe Process," ISSCC 2006, pp. 356–357

– 27mm², 45MHz

– 65% power efficiency

• This work

– 0.27, 0.26, 0.20 mm², including 0.1mm² inductor area, 3GHz

  • Cclk ≈ 20pF, equiv to 1.6mm² of Power6 area

  • DC-DC converter adds 12.5% area overhead

– LC filter: 310pH inductor, 350pF capacitor

  • L and C similar and dominate layout area: can stack to cut area in half

– Buck: 75 – 185% effective power efficiency (50% recovered)

– Boost: 25 – 110% effective power efficiency (20% recovered)

– Buck-boost: 20 – 66% effective power efficiency (30% recovered)
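
The slides do not show how the Power6 capacitance density and the area overhead were obtained. One plausible reconstruction, sketched below, uses P = C · Vdd² · f with the Power6 clock figures from the Problem slide (22% of 100W, 5GHz) and compares the smallest listed converter area (0.20mm²) against the quoted 1.6mm². The function name and this choice of inputs are assumptions.

```python
def clock_cap_density_pf_per_mm2(p_clk_w, vdd_v, f_hz, area_mm2):
    """Clock capacitance per unit area implied by P = C * Vdd^2 * f."""
    c_total_f = p_clk_w / (vdd_v ** 2 * f_hz)
    return c_total_f * 1e12 / area_mm2


# Power6 figures from the slides: 22 W of clock power (22% of 100 W),
# 1 V supply, 5 GHz clock, 341 mm2 die.
density = clock_cap_density_pf_per_mm2(22.0, 1.0, 5e9, 341.0)

# Power6 area that would carry a 20 pF clock load at that density.
equiv_area_mm2 = 20.0 / density

# Overhead of a 0.20 mm2 converter against the slide's 1.6 mm2 figure.
overhead = 0.20 / 1.6

print(f"density: {density:.1f} pF/mm2")
print(f"equivalent area: {equiv_area_mm2:.2f} mm2")
print(f"area overhead: {overhead:.1%}")
```

Under these assumptions the density comes out close to the 13pF/mm² on the slide, and 0.20mm² over 1.6mm² gives the quoted 12.5%.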

38

Conclusion

• Key concepts

– High switching frequency saves area

– Combined drivers save area and switching loss

– Recycled charge: converter load discharges Cclk

– ZVS delay circuit: lower power loss

• Limitations

– Regulation needs variable duty cycle clock

  • May introduce additional clock jitter

  • Mostly suitable for edge-triggered blocks (no latches)

• Future work

– Lots of improvements to make!

Thank you!

Questions ?