Lecture 21: Packaging, Power, & Clock


CMOS VLSI Design, 4th Ed., 21: Package, Power, and Clock

Outline
– Packaging
– Power Distribution
– Clock Distribution

Packages
Package functions
– Electrical connection of signals and power from chip to board
– Little delay or distortion
– Mechanical connection of chip to board
– Removes heat produced on chip
– Protects chip from mechanical damage
– Compatible with thermal expansion
– Inexpensive to manufacture and test

Package Types
Through-hole vs. surface mount

Chip-to-Package Bonding
Traditionally, chip is surrounded by pad frame
– Metal pads on 100–200 µm pitch
– Gold bond wires attach pads to package
– Lead frame distributes signals in package
– Metal heat spreader helps with cooling

Advanced Packages
Bond wires contribute parasitic inductance
Fancy packages have many signal, power layers
– Like tiny printed circuit boards
Flip-chip places connections across surface of die rather than around periphery
– Top level metal pads covered with solder balls
– Chip flips upside down
– Carefully aligned to package (done blind!)
– Heated to melt balls
– Also called C4 (Controlled Collapse Chip Connection)

LGA Package 1

1366 gold-plated pads

Package Parasitics

[Figure: package parasitic model. Labels: chip, signal pads, signal pins, chip VDD, chip GND, board VDD, board GND, bond wire, lead frame, package, package capacitor]

Use many VDD, GND in parallel
– Inductance, IDD

Heat Dissipation
60 W light bulb has surface area of 120 cm²
Itanium 2 die dissipates 130 W over 4 cm²
– Chips have enormous power densities
– Cooling is a serious challenge
Package spreads heat to larger surface area
– Heat sinks may increase surface area further
– Fans increase airflow rate over surface area
– Liquid cooling used in extreme cases ($$$)
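To put numbers on the power densities quoted above, here is a small Python sketch that divides power by surface area for the two cases. The function name `power_density` is made up for this illustration; the input values are the ones on the slide.

```python
def power_density(power_w: float, area_cm2: float) -> float:
    """Return power density in W/cm^2."""
    return power_w / area_cm2


# Values from the slide: 60 W bulb over 120 cm^2, 130 W die over 4 cm^2.
bulb = power_density(60.0, 120.0)
itanium2 = power_density(130.0, 4.0)

print(f"Light bulb: {bulb:.2f} W/cm^2")
print(f"Itanium 2:  {itanium2:.2f} W/cm^2")
print(f"Ratio:      {itanium2 / bulb:.0f}x")
```

The die runs at roughly 65 times the power density of the bulb, which is why the package has to spread the heat over a larger area.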

Thermal Resistance
$\Delta T = \theta_{ja} P$
– $\Delta T$: temperature rise on chip
– $\theta_{ja}$: thermal resistance of chip junction to ambient
– $P$: power dissipation on chip
Thermal resistances combine like resistors
– Series and parallel
$\theta_{ja} = \theta_{jp} + \theta_{pa}$
– Series combination

Example
Your chip has a heat sink with a thermal resistance to the package of 4.0 °C/W.
The resistance from chip to package is 1 °C/W.
The system box ambient temperature may reach 55 °C.
The chip temperature must not exceed 100 °C.
What is the maximum chip power dissipation?

(100 °C − 55 °C) / (4 °C/W + 1 °C/W) = 9 W
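The same calculation can be sketched in Python, assuming the thermal resistances are simply in series as stated earlier. The helper name `max_power_w` is hypothetical and only serves this example.

```python
def max_power_w(t_max_c: float, t_ambient_c: float, *thetas_c_per_w: float) -> float:
    """Maximum power for a given temperature budget.

    Series thermal resistances add, so P_max = (T_max - T_ambient) / sum(theta).
    """
    theta_ja = sum(thetas_c_per_w)
    return (t_max_c - t_ambient_c) / theta_ja


# Values from the example: 100 C limit, 55 C ambient,
# 4 C/W heat sink plus 1 C/W chip-to-package.
p_max = max_power_w(100.0, 55.0, 4.0, 1.0)
print(f"Maximum power: {p_max:.1f} W")
```

Swapping in a better heat sink (a smaller first resistance) shows directly how much extra power the same temperature limit allows.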

Temperature Sensor
Monitor die temperature and throttle performance if it gets too hot
Use a pair of pnp bipolar transistors
– Vertical pnp available in CMOS
Voltage difference is proportional to absolute temp
– Measure with on-chip A/D converter

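$$I_c = I_s e^{qV_{BE}/kT} \;\Rightarrow\; V_{BE} = \frac{kT}{q}\ln\frac{I_c}{I_s}$$

$$\Delta V_{BE} = V_{BE1} - V_{BE2} = \frac{kT}{q}\left(\ln\frac{I_{c1}}{I_s} - \ln\frac{I_{c2}}{I_s}\right) = \frac{kT}{q}\ln\frac{I_{c1}}{I_{c2}} = \frac{kT}{q}\ln m$$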
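As a minimal sketch of how the sensor reading maps to temperature, the Python below evaluates ΔV_BE = (kT/q) ln m and inverts it. The function names and the current ratio m = 8 are assumptions chosen for illustration; the slide does not give a ratio.

```python
import math

K_BOLTZMANN = 1.380649e-23      # J/K
Q_ELECTRON = 1.602176634e-19    # C


def delta_vbe(temp_k: float, m: float) -> float:
    """Base-emitter voltage difference (V) for collector current ratio m."""
    return (K_BOLTZMANN * temp_k / Q_ELECTRON) * math.log(m)


def temperature_from_delta_vbe(dv: float, m: float) -> float:
    """Invert delta_vbe: absolute temperature (K) from the measured difference."""
    return Q_ELECTRON * dv / (K_BOLTZMANN * math.log(m))


# Hypothetical current ratio; the voltage is what the on-chip A/D would see.
m = 8.0
dv_300k = delta_vbe(300.0, m)
print(f"dV_BE at 300 K: {dv_300k * 1e3:.2f} mV")
print(f"Recovered T:    {temperature_from_delta_vbe(dv_300k, m):.1f} K")
```

Because the saturation current cancels in the difference, only the current ratio and physical constants remain, which is what makes the reading proportional to absolute temperature.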

Power Distribution
Power Distribution Network functions
– Carry current from pads to transistors on chip
– Maintain stable voltage with low noise
– Provide average and peak power demands
– Provide current return paths for signals
– Avoid electromigration & self-heating wearout
– Consume little chip area and wire
– Easy to lay out

Power Requirements
VDD = VDDnominal – Vdroop
Want Vdroop < ±10% of VDD
Sources of Vdroop
– IR drops
– L di/dt noise
IDD changes on many time scales

[Figure: power vs. time, marking max, average, and min levels; annotated "clock gating"]

IR Drop
A chip draws 24 W from a 1.2 V supply. The power supply impedance is 5 mΩ. What is the IR drop?

IDD = 24 W / 1.2 V = 20 A

IR drop = (20 A)(5 mΩ) = 100 mV
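The arithmetic above can be reproduced with a short Python sketch. The helper names are hypothetical; the inputs are the values from the example.

```python
def supply_current(power_w: float, vdd_v: float) -> float:
    """Average supply current I = P / V."""
    return power_w / vdd_v


def ir_drop(current_a: float, resistance_ohm: float) -> float:
    """Voltage lost across the supply resistance, V = I * R."""
    return current_a * resistance_ohm


vdd = 1.2                          # V
idd = supply_current(24.0, vdd)    # A
drop = ir_drop(idd, 5e-3)          # 5 milliohm supply impedance

print(f"IDD     = {idd:.1f} A")
print(f"IR drop = {drop * 1e3:.0f} mV ({100 * drop / vdd:.1f}% of VDD)")
```

The drop comes to about 8% of VDD, so IR drop alone uses most of a 10% droop budget.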

IR Introduced Noise

Power Distribution

Power Distribution
Low level distribution is in metal 1.
Power has to be strapped in higher layers of metal.
The spacing is set by IR drop, electromigration, and inductive effects.
Always use multiple contacts on straps.

Power and Ground Distribution

3 Metal Layers (EV4)

4 Metal Layers (EV5)

6 Metal Layers (EV6)

Power Supply Droop

L di/dt Noise

L di/dt Noise
A 1.2 V chip switches from an idle mode consuming 5 W to a full-power mode consuming 53 W. The transition takes 10 clock cycles at 1 GHz. The supply inductance is 0.1 nH. What is the L di/dt droop?

ΔI = (53 W – 5 W) / (1.2 V) = 40 A
Δt = 10 cycles × (1 ns / cycle) = 10 ns
L di/dt droop = (0.1 nH) × (40 A / 10 ns) = 0.4 V
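A minimal Python sketch of the same L di/dt estimate follows, assuming the current ramps linearly over the transition. The function name `ldidt_droop` is made up for this example.

```python
def ldidt_droop(inductance_h: float, delta_i_a: float, delta_t_s: float) -> float:
    """Inductive droop V = L * (delta I / delta t), assuming a linear current ramp."""
    return inductance_h * delta_i_a / delta_t_s


vdd = 1.2                           # V
delta_i = (53.0 - 5.0) / vdd        # A, change in supply current
delta_t = 10 / 1e9                  # s, 10 cycles at 1 GHz
droop = ldidt_droop(0.1e-9, delta_i, delta_t)

print(f"delta I = {delta_i:.0f} A over {delta_t * 1e9:.0f} ns")
print(f"L di/dt droop = {droop:.2f} V ({100 * droop / vdd:.0f}% of VDD)")
```

The droop is about a third of the supply, which is why such a transition cannot be supported through the supply inductance alone.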

Dealing with L di/dt
Separate power pins for I/O pads and chip core.
Multiple power and ground pins.
Careful selection of positions of power and ground pins on package.
Increase rise and fall times as much as possible.
Schedule current consuming transitions.
Use advanced packaging technologies.
Use decoupling capacitances on the board.
Use decoupling capacitances on chip.

Choosing the Right Pin

Decoupling Capacitance

Bypass Capacitors
Need low supply impedance at all frequencies
Ideal capacitors have impedance decreasing with frequency
Real capacitors have parasitic R and L
– Leads to resonant frequency of capacitor

[Figure: impedance magnitude vs. frequency (Hz) on log axes, 10^4 to 10^10 Hz and 10^-2 to 10^2, for a 1 µF capacitor with 0.03 Ω series resistance and 0.25 nH series inductance]
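The behavior of a real bypass capacitor can be sketched in Python by modeling it as a series R-L-C branch. This assumes the figure's component values are 1 µF, 0.03 Ω, and 0.25 nH; the function names are hypothetical.

```python
import math


def capacitor_impedance(freq_hz: float, c_f: float, esr_ohm: float, esl_h: float) -> float:
    """Magnitude of Z = R + j(wL - 1/(wC)) for a series R-L-C capacitor model."""
    w = 2 * math.pi * freq_hz
    return abs(complex(esr_ohm, w * esl_h - 1.0 / (w * c_f)))


def self_resonant_frequency(c_f: float, esl_h: float) -> float:
    """Frequency where the inductive and capacitive reactances cancel."""
    return 1.0 / (2 * math.pi * math.sqrt(esl_h * c_f))


C, ESR, ESL = 1e-6, 0.03, 0.25e-9   # assumed values read from the figure
f0 = self_resonant_frequency(C, ESL)
print(f"Self-resonant frequency: {f0 / 1e6:.2f} MHz")
for f in (1e4, 1e6, f0, 1e8, 1e10):
    print(f"{f:10.3g} Hz  |Z| = {capacitor_impedance(f, C, ESR, ESL):.4g} ohm")
```

Below resonance the impedance falls like an ideal capacitor, at resonance it bottoms out at the series resistance, and above resonance the parasitic inductance makes it rise again.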

De-coupling Capacitor Ratios
EV4
– Total effective switching capacitance = 12.5 nF
– 128 nF of de-coupling capacitance
– De-coupling/switching capacitance ~ 10x
EV5
– 13.9 nF of switching capacitance
– 160 nF of de-coupling capacitance
EV6
– 34 nF of effective switching capacitance
– 320 nF of de-coupling capacitance -- not enough!
Source: B. Herrick (Compaq)
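For comparison across the three generations, this short Python sketch computes the de-coupling to switching capacitance ratio from the figures above. The dictionary layout is just a convenience for the example.

```python
# (switching capacitance in nF, de-coupling capacitance in nF), from the slide
chips = {
    "EV4": (12.5, 128.0),
    "EV5": (13.9, 160.0),
    "EV6": (34.0, 320.0),
}

ratios = {name: decap / switching for name, (switching, decap) in chips.items()}

for name, ratio in ratios.items():
    print(f"{name}: de-coupling/switching = {ratio:.1f}x")
```

EV6 has the lowest ratio of the three despite having the most on-chip de-coupling capacitance.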

EV6 De-coupling Capacitance
Design for Idd = 25 A @ Vdd = 2.2 V, f = 600 MHz
– 0.32 µF of on-chip de-coupling capacitance was added
• Under major busses and around major gridded clock drivers
• Occupies 15-20% of die area
– 1 µF, 2 cm² Wirebond Attached Chip Capacitor (WACC) significantly increases "Near-Chip" de-coupling
• 160 Vdd/Vss bondwire pairs on the WACC minimize inductance
Source: B. Herrick (Compaq)

EV6 WACC

[Figure: EV6 package with WACC. Labels: 587 IPGA; microprocessor; WACC; heat slug; 389 signal and 198 VDD/VSS pins; 389 signal bondwires; 395 VDD/VSS bondwires; 320 VDD/VSS bondwires]

Source: B. Herrick (Compaq)

Power System Model
Power comes from regulator on system board
– Board and package add parasitic R and L
– Bypass capacitors help stabilize supply voltage
– But capacitors also have parasitic R and L
Simulate system for time and frequency responses

[Figure: power system model. Board: voltage regulator, bulk capacitor, ceramic capacitor, printed circuit board planes. Package: package and pins, package capacitor, solder bumps. Chip: on-chip capacitor, on-chip current demand, VDD]

Frequency Response
Multiple capacitors in parallel
– Large capacitor near regulator has low impedance at low frequencies
– But also has a low self-resonant frequency
– Small capacitors near chip and on chip have low impedance at high frequencies
Choose caps to get low impedance at all frequencies

[Figure: impedance vs. frequency (Hz)]

Example: Pentium 4
Power supply impedance for Pentium 4
– Spike near 100 MHz caused by package L
Step response to sudden supply current change
– 1st droop: on-chip bypass caps
– 2nd droop: package capacitance
– 3rd droop: board capacitance
[Xu08] [Wong06]

Distributed Model

Charge Pumps
Sometimes a different supply voltage is needed but little current is required
– 20 V for Flash memory programming
– Negative body bias for leakage control during sleep
Generate the voltage on-chip with a charge pump

Energy Scavenging
Ultra-low power systems can scavenge their energy from the environment rather than needing batteries
– Solar calculator (solar cells)
– RFID tags (antenna)
– Tire pressure monitors powered by vibrational energy of tires (piezoelectric generator)
Thin film microbatteries deposited on the chip can store energy for times of peak demand

Capacitive Cross Talk

Capacitive Cross Talk: Dynamic Node
3 x 1 µm overlap: 0.19 V disturbance

[Figure: dynamic gate with pull-down network (PDN) and coupled neighbor wire. Labels: VDD, CLK, In1, In2, In3, PDN, Y, CY, CXY, X, 2.5 V, 0 V]

Capacitive Cross Talk: Driven Node
$\tau_{XY} = R_Y (C_{XY} + C_Y)$
Keep time-constant smaller than rise time

[Figure: voltage V (V), 0 to 0.5, vs. t (ns), 0 to 1, for increasing rise time tr; circuit labels: X, VX, CXY, Y, RY, CY]

Dealing with Capacitive Cross Talk
Avoid floating nodes
Protect sensitive nodes
Make rise and fall times as large as possible
Differential signaling
Do not run wires together for a long distance
Use shielding wires
Use shielding layers

Shielding

[Figure: shielding. Labels: shielding wire, shielding layer, GND, VDD, substrate (GND)]

Cross Talk and Performance
Delay depends on activity in neighboring wires
– When neighboring lines switch in opposite direction of victim line, delay increases

Miller Effect
– Both terminals of the coupling capacitor Cc are switched in opposite directions (0 → Vdd, Vdd → 0)
– Effective voltage is doubled and additional charge is needed (from Q = CV)
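One common first-order way to capture this is to scale the coupling capacitance by a switching-dependent factor. The Python sketch below assumes factors of 0, 1, and 2 for a neighbor switching in the same direction, staying quiet, and switching in the opposite direction; only the doubling case is stated on the slide, and the capacitance values are hypothetical.

```python
# Assumed first-order coupling factors; only the "opposite" case (2x)
# is taken from the slide, the other two are the usual companions.
COUPLING_FACTOR = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}


def effective_capacitance(c_gnd_f: float, c_coupling_f: float, neighbor: str) -> float:
    """Capacitance the victim driver effectively has to charge."""
    return c_gnd_f + COUPLING_FACTOR[neighbor] * c_coupling_f


# Hypothetical wire: 50 fF to ground, 50 fF to one neighbor.
c_gnd, c_c = 50e-15, 50e-15
for activity in ("same", "quiet", "opposite"):
    c_eff = effective_capacitance(c_gnd, c_c, activity)
    print(f"neighbor {activity:8s}: C_eff = {c_eff * 1e15:.0f} fF")
```

With equal ground and coupling capacitance, the effective load varies by a factor of three depending on what the neighbor does, which is the activity-dependent delay the slide describes.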

Impact of Cross Talk on Delay
r is ratio between capacitance to GND and to neighbor

Dealing with Cross-Talk
Evaluate and improve
Constructive layout generation
Predictable structures
Avoid worst case patterns

Structured Predictable Interconnect

[Figure: dense wire fabric with signal (S) wires interleaved with VDD (V) and GND (G) wires]

Example: Dense Wire Fabric ([Sunil Kathri])
Trade-off:
• Cross-coupling capacitance 40x lower, 2% delay variation
• Increase in area and overall capacitance
Also: FPGAs, VPGAs

Clock Distribution
On a small chip, the clock distribution network is just a wire
– And possibly an inverter for clkb
On practical chips, the RC delay of the wire resistance and gate load is very long
– Variations in this delay cause clock to get to different elements at different times
– This is called clock skew
Most chips use repeaters to buffer the clock and equalize the delay
– Reduces but doesn't eliminate skew

Example
Skew comes from differences in gate and wire delay
– With right buffer sizing, clk1 and clk2 could ideally arrive at the same time.
– But power supply noise changes buffer delays
– clk2 and clk3 will always see RC skew

[Figure: gclk buffered and routed to clk1, clk2, and clk3; labels: 3 mm, 3.1 mm, 0.5 mm, 1.3 pF, 0.4 pF, 0.4 pF]

Clock Uncertainties

Clock Nonidealities
Clock skew
– Spatial variation in temporally equivalent clock edges; deterministic + random, tSK
Clock jitter
– Temporal variations in consecutive edges of the clock signal; modulation + random noise
– Cycle-to-cycle (short-term) tJS
– Long term tJL
Variation of the pulse width
– Important for level sensitive clocking

Review: Skew Impact
Ideally full cycle is available for work
Skew adds sequencing overhead
Increases hold time too

$$t_{pd} \le T_c - \underbrace{\left(t_{pcq} + t_{setup} + t_{skew}\right)}_{\text{sequencing overhead}}$$

$$t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$$

[Figure: flip-flops F1 and F2 clocked by clk with combinational logic (CL) between Q1 and D2; timing diagrams label Tc, tpcq, tpdq, tsetup, tskew, tccq, tcd, thold]
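The effect of skew on the flip-flop timing budget can be sketched in Python using the two constraints t_pd ≤ T_c − (t_pcq + t_setup + t_skew) and t_cd ≥ t_hold − t_ccq + t_skew. The function names and all timing values below are hypothetical, chosen only to show the trend.

```python
def max_logic_delay(t_c: int, t_pcq: int, t_setup: int, t_skew: int) -> int:
    """Setup constraint: largest allowed propagation delay t_pd (ps)."""
    return t_c - (t_pcq + t_setup + t_skew)


def min_contamination_delay(t_hold: int, t_ccq: int, t_skew: int) -> int:
    """Hold constraint: smallest allowed contamination delay t_cd (ps)."""
    return t_hold - t_ccq + t_skew


# Hypothetical flip-flop parameters in picoseconds.
T_C, T_PCQ, T_SETUP, T_HOLD, T_CCQ = 1000, 80, 50, 20, 40

for skew in (0, 50, 150):
    t_pd = max_logic_delay(T_C, T_PCQ, T_SETUP, skew)
    t_cd = min_contamination_delay(T_HOLD, T_CCQ, skew)
    print(f"skew {skew:3d} ps: t_pd <= {t_pd} ps, t_cd >= {t_cd} ps")
```

Each picosecond of skew is taken from the time available for logic and added to the minimum delay that every path must provide. A negative bound on t_cd simply means the hold constraint is met by any path.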

Solutions
Reduce clock skew
– Careful clock distribution network design
– Plenty of metal wiring resources
Analyze clock skew
– Only budget actual, not worst case skews
– Local vs. global skew budgets
Tolerate clock skew
– Choose circuit structures insensitive to skew

Clock Dist. Networks
Ad hoc
Grids
H-tree
Hybrid

H-Trees
Fractal structure
– Gets clock arbitrarily close to any point
– Matched delay along all paths
Delay variations cause skew
A and B might see big skew

[Figure: H-tree with points A and B marked]

More realistic H-tree

[Restle98]

Itanium 2 H-Tree
Four levels of buffering:
– Primary driver
– Repeater
– Second-level clock buffer
– Gater
Route around obstructions

[Figure: Itanium 2 clock H-tree; labels: primary buffer, repeaters, typical SLCB locations]

Itanium 2 Repeaters

Spines

Pentium IV Clock Spines

Clock Grids
Use grid on two or more levels to carry clock
Make wires wide to reduce RC delay
Ensures low skew between nearby points
But possibly large skew across die

The Grid System

[Figure: clock grid with four drivers distributing GCLK]

• No RC-matching
• Large power

Alpha Clock Grids

[Figure: gclk grid and PLL for Alpha 21064, Alpha 21164, and Alpha 21264]

Example: DEC Alpha 21164
Clock Frequency: 300 MHz - 9.3 Million Transistors
Total Clock Load: 3.75 nF
Power in Clock Distribution network: 20 W (out of 50)
Uses Two Level Clock Distribution:
• Single 6-stage driver at center of chip
• Secondary buffers drive left and right side clock grid in Metal3 and Metal4
Total driver size: 58 cm!

21164 Clocking
2 phase single wire clock, distributed globally
2 distributed driver channels
– Reduced RC delay/skew
– Improved thermal distribution
– 3.75 nF clock load
– 58 cm final driver width
Local inverters for latching
Conditional clocks in caches to reduce power
More complex race checking
Device variation

Clock waveform: tcycle = 3.3 ns, trise = 0.35 ns, tskew = 150 ps

[Figure: location of clock driver on die, showing pre-driver and final drivers]

Clock Drivers

Clock Skew in Alpha Processor

EV6 (Alpha 21264) Clocking
600 MHz – 0.35 micron CMOS
2 Phase, with multiple conditional buffered clocks
– 2.8 nF clock load
– 40 cm final driver width
Local clocks can be gated "off" to save power
Reduced load/skew
Reduced thermal issues
Multiple clocks complicate race checking

Global clock waveform: tcycle = 1.67 ns, trise = 0.35 ns, tskew = 50 ps

[Figure label: PLL]

21264 Clocking

EV6 Clock Results

[Figure: GCLK skew at Vdd/2 crossings, scale 5 to 50 ps; GCLK rise times (20% to 80%, extrapolated to 0% to 100%), scale 300 to 345 ps]

EV7 Clock Hierarchy
Active Skew Management and Multiple Clock Domains
+ widely dispersed drivers
+ DLLs compensate static and low-frequency variation
+ divides design and verification effort
+ tailored clocks
- DLL design and verification is added work

[Figure: EV7 clock domains: GCLK (CPU core), L2L_CLK (L2 cache), L2R_CLK (L2 cache), NCLK (memory controller); labels: SYSCLK, PLL, three DLLs]

Hybrid Networks
Use H-tree to distribute clock to many points
Tie these points together with a grid
Ex: IBM Power4, PowerPC
– H-tree drives 16-64 sector buffers
– Buffers drive total of 1024 points
– All points shorted together with grid

Clock Gaters

Adaptive Deskewing

Self-timed and Asynchronous Design
Functions of clock in synchronous design
1) Acts as completion signal
2) Ensures the correct ordering of events
Truly asynchronous design
1) Completion is ensured by careful timing analysis
2) Ordering of events is implicit in logic
Self-timed design
1) Completion ensured by completion signal
2) Ordering imposed by handshaking protocol

Self-Timed Pipelined Datapath

[Figure: three-stage self-timed pipeline from In to Out. Registers R1, R2, R3 feed logic blocks F1, F2, F3 with delays tpF1, tpF2, tpF3; each block has Start and Done signals, and handshake (HS) modules exchange Req and Ack between stages]

Completion Signal Generation
Using Delay Element (e.g. in memories)

[Figure: logic network from In to Out alongside a delay module from Start to Done]

Completion Signal Generation
Using Redundant Signal Encoding
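As an illustration of completion detection with a redundant code, the Python sketch below assumes dual-rail encoding, where each bit uses two wires and the all-zero pair means "not ready yet". The slide does not say which encoding is used, so the encoding and every name here are assumptions.

```python
from typing import List, Tuple

Rail = Tuple[int, int]  # (true_rail, false_rail) for one bit


def encode_dual_rail(bits: List[int]) -> List[Rail]:
    """Encode each bit as (1, 0) for logic 1 or (0, 1) for logic 0."""
    return [(1, 0) if b else (0, 1) for b in bits]


def spacer(width: int) -> List[Rail]:
    """All rails low: the word carries no data yet."""
    return [(0, 0)] * width


def is_done(word: List[Rail]) -> bool:
    """Done is asserted only when every bit has exactly one rail high.

    The illegal pair (1, 1) is treated as not valid.
    """
    return all(t != f for t, f in word)


print(is_done(spacer(3)))                    # no bit has resolved
print(is_done([(1, 0), (0, 0), (0, 1)]))     # one bit still pending
print(is_done(encode_dual_rail([1, 0, 1])))  # all bits valid
```

The Done signal depends only on the data wires themselves, so it tracks the actual computation time rather than a worst-case delay.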

Completion Signal in DCVSL

Self-Timed Adder

[Figure: (a) differential carry generation, with Start-controlled chains using P, G, and K signals to produce carries C0 to C4 and their complements; (b) completion signal Done generated from Start and the carries C1 to C4 and their complements]

Completion Signal Using Current Sensing