29
Introduction to CMOS VLSI CMOS VLSI Design Lecture 19: Design for Skew

Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Introduction toCMOS VLSICMOS VLSI

Design

Lecture 19: Design for Skew

Page 2: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

OutlineOutline Clock Distribution Clock Distribution Clock Skew Sk T l t St ti Ci it Skew-Tolerant Static Circuits Traditional Domino Circuits Sk T l t D i Ci it Skew-Tolerant Domino Circuits

CMOS VLSI Design19: Design for Skew Slide 2

Page 3: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

ClockingClocking Synchronous systems use a clock to keep Synchronous systems use a clock to keep

operations in sequenceDistinguish this from previous or next– Distinguish this from previous or next

– Determine speed at which machine operates Clock must be distributed to all the sequencing Clock must be distributed to all the sequencing

elements– Flip-flops and latchesFlip-flops and latches

Also distribute clock to other elements – Domino circuits and memories– Domino circuits and memories

CMOS VLSI Design19: Design for Skew Slide 3

Page 4: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Clock DistributionClock Distribution On a small chip the clock distribution network is just On a small chip, the clock distribution network is just

a wireAnd possibly an inverter for clkb– And possibly an inverter for clkb

On practical chips, the RC delay of the wire resistance and gate load is very longresistance and gate load is very long– Variations in this delay cause clock to get to

different elements at different times– This is called clock skew

Most chips use repeaters to buffer the clock and Most chips use repeaters to buffer the clock and equalize the delay– Reduces but doesn’t eliminate skew

CMOS VLSI Design19: Design for Skew Slide 4

Page 5: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

ExampleExample Skew comes from differences in gate and wire delay Skew comes from differences in gate and wire delay

– With right buffer sizing, clk1 and clk2 could ideally arrive at the same timearrive at the same time.

– But power supply noise changes buffer delaysclk and clk will always see RC skew– clk2 and clk3 will always see RC skew

gclk3 mm 3.1 mm

clk1

0.5 mm

clk2clk3

1.3 pF0.4 pF 0.4 pF

CMOS VLSI Design19: Design for Skew Slide 5

Page 6: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Review: Skew ImpactReview: Skew Impactclk clk

Ideally full cycle is

F1 F2

clk

Combinational Logic

Tc

Q1 D2 Ideally full cycle isavailable for work

Sk dd iQ1

D2

tskew

tsetup

tpcq

tpdq

Skew adds sequencingoverhead

I h ld ti tCLF1

clk

Q1

kdt T t t t

Increases hold time too

F2

clk

D2

tskew

setup skew

sequencing overhead

hold skew

pd c pcq

cd ccq

t T t t t

t t t t

Q1

D2

clk

tcd

thold

tccq

hold skewcd ccq

CMOS VLSI Design19: Design for Skew Slide 6

Page 7: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Cycle Time TrendsCycle Time Trends Much of CPU performance comes from higher f Much of CPU performance comes from higher f

– f is improving faster than simple process shrinksS i h d i bi t f l– Sequencing overhead is bigger part of cycle

10

100

5 100

1000

Hz

0.01

0.1

1

8038680486PentiumPentium II / III

Spec

Int9

5

1985 1988 1991 1994 1997 200010

100

8038680486PentiumPentium II / III

MH

1988 1991 1994 1997 20001985

200

500VDD = 5 VDD = 3.3

VDD = 2.5

Inve

rter D

elay

(ps)

100

dela

ys /

cycl

e50

1.2 0.8 0.6 0.35 2.0

Process

100

50Fano

ut-o

f-4 (F

O4)

0.25 10

1985 1988 1991 1994 1997

8038680486PentiumPentium II / IIIFO

4 in

verte

r d

20

2000

CMOS VLSI Design19: Design for Skew Slide 7

Process

Page 8: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

SolutionsSolutions Reduce clock skew Reduce clock skew

– Careful clock distribution network designPl t f t l i i– Plenty of metal wiring resources

Analyze clock skewO l b d t t l t t k– Only budget actual, not worst case skews

– Local vs. global skew budgets T l l k k Tolerate clock skew

– Choose circuit structures insensitive to skew

CMOS VLSI Design19: Design for Skew Slide 8

Page 9: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Clock Dist NetworksClock Dist. Networks Ad hoc Ad hoc Grids H t H-tree Hybrid

CMOS VLSI Design19: Design for Skew Slide 9

Page 10: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Clock GridsClock Grids Use grid on two or more levels to carry clock Use grid on two or more levels to carry clock Make wires wide to reduce RC delay E l k b t b i t Ensures low skew between nearby points But possibly large skew across die

CMOS VLSI Design19: Design for Skew Slide 10

Page 11: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Alpha Clock GridsAlpha Clock GridsAlpha 21064 Alpha 21164 Alpha 21264

PLL

lk idlk id gclk gridgclk grid

Alpha 21064 Alpha 21164 Alpha 21264

CMOS VLSI Design19: Design for Skew Slide 11

Alpha 21064 Alpha 21164 Alpha 21264

Page 12: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

H TreesH-Trees Fractal structure Fractal structure

– Gets clock arbitrarily close to any pointM t h d d l l ll th– Matched delay along all paths

Delay variations cause skew A d B i ht bi k A and B might see big skew A B

CMOS VLSI Design19: Design for Skew Slide 12

Page 13: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Itanium 2 H TreeItanium 2 H-Tree Four levels of buffering: Four levels of buffering:

– Primary driverR t– Repeater

– Second-level l k b ff

Repeaters

clock buffer– Gater

R d Route aroundobstructions

Typical SLCBLocations

Primary Buffer

CMOS VLSI Design19: Design for Skew Slide 13

Page 14: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Hybrid NetworksHybrid Networks Use H tree to distribute clock to many points Use H-tree to distribute clock to many points Tie these points together with a grid

Ex: IBM Power4, PowerPCH t d i 16 64 t b ff– H-tree drives 16-64 sector buffers

– Buffers drive total of 1024 pointsAll i h d h i h id– All points shorted together with grid

CMOS VLSI Design19: Design for Skew Slide 14

Page 15: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Skew ToleranceSkew Tolerance Flip flops are sensitive to skew because of hard edges Flip-flops are sensitive to skew because of hard edges

– Data launches at latest rising edge of clockM t t b f li t t i i d f l k– Must setup before earliest next rising edge of clock

– Overhead would shrink if we can soften edge L t h t l t d t t f k Latches tolerate moderate amounts of skew

– Data can arrive anytime latch is transparent

CMOS VLSI Design19: Design for Skew Slide 15

Page 16: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Skew: LatchesSkew: Latches2 Phase Latches

Q1

L1 L2 L3

1 12

CombinationalLogic 1

CombinationalLogic 2

Q2 Q3D1 D2 D3

2pd c pdqt T t

2-Phase Latches

1

2

sequencing overhead

1 2 hold nonoverlap skew,cd cd ccqt t t t t t

borrow setup nonoverlap skew2cTt t t t

Pulsed Latches setup skew

sequencing overhead

max ,pd c pdq pcq pwt T t t t t t

Pulsed Latches

hold skew

borrow setup skew

cd pw ccq

pw

t t t t t

t t t t

CMOS VLSI Design19: Design for Skew Slide 16

pp

Page 17: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Dynamic Circuit ReviewDynamic Circuit Review Static circuits are slow because fat pMOS load input Static circuits are slow because fat pMOS load input Dynamic gates use precharge to remove pMOS

transistors from the inputstransistors from the inputs– Precharge: = 0 output forced high

Evaluate: = 1 output may pull low– Evaluate: = 1 output may pull low

A

A B

Y

C DY

B

C

D A B C DY

A B C D

D

CMOS VLSI Design19: Design for Skew Slide 17

Page 18: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Domino CircuitsDomino Circuits Dynamic inputs must monotonically rise during Dynamic inputs must monotonically rise during

evaluationPlace inverting stage between each dynamic gate– Place inverting stage between each dynamic gate

– Dynamic / static pair called domino gate Domino gates can be safely cascaded Domino gates can be safely cascaded

domino AND

A

W

B

X

dynamicNAND

staticinverter

CMOS VLSI Design19: Design for Skew Slide 18

Page 19: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Domino TimingDomino Timing Domino gates are 1 5 2x faster than static CMOS Domino gates are 1.5 – 2x faster than static CMOS

– Lower logical effort because of reduced Cin

Ch ll i t k h ff iti l th Challenge is to keep precharge off critical path Look at clocking schemes for precharge and eval

T diti l h h h d– Traditional schemes have severe overhead– Skew-tolerant domino hides this overhead

CMOS VLSI Design19: Design for Skew Slide 19

Page 20: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Traditional Domino CktsTraditional Domino Ckts Hide precharge time by ping ponging between half Hide precharge time by ping-ponging between half-

cyclesOne evaluates while other precharges– One evaluates while other precharges

– Latches hold results during prechargeTc

clk

c

clkc

clk

c

clk

c

clk clk

c c c c

clk clk clk clk clk

clk

2pd c pdqt T t

Sta

tic

Dyn

amic

Latc

h

Sta

tic

Dyn

amic

Sta

tic

Dyn

amic

Dyn

amic

Sta

tic

Dyn

amic

Latc

h

Sta

tic

Dyn

amic

Sta

tic

Dyn

amic

Dyn

amic

tpdq tpdq

pd c pdq

CMOS VLSI Design19: Design for Skew Slide 20

Page 21: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Clock SkewClock Skew Skew increases sequencing overhead Skew increases sequencing overhead

– Traditional domino has hard edgesE l t t l t t i i d– Evaluate at latest rising edge

– Setup at latch by earliest falling edge

clk

c

clkc

clk

c

clk clk

c c c

clk clk clk clk

clk

setup skew2 2pd ct T t t S

tatic

Dyn

amic

Latc

h

Sta

tic

Dyn

amic

Dyn

amic

Sta

tic

Dyn

amic

Latc

h

Sta

tic

Dyn

amic

Dyn

amic

tskewtsetup

CMOS VLSI Design19: Design for Skew Slide 21

Page 22: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Time BorrowingTime Borrowing Logic may not exactly fit half cycle Logic may not exactly fit half-cycle

– No flexibility to borrow time to balance logic between half cyclesbetween half cycles

Traditional domino sequencing overhead is about 25% of cycle time in fast systems!25% of cycle time in fast systems!

clk

c

clkc

clk clk

c c

clk clk clk

clkS

tatic

Dyn

amic

Latc

h

Sta

tic

Dyn

amic

Sta

tic

Dyn

amic

Latc

h

Sta

tic

Dyn

amic

tskewtsetup

CMOS VLSI Design19: Design for Skew Slide 22

Page 23: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Relaxing the TimingRelaxing the Timing Sequencing overhead caused by hard edges Sequencing overhead caused by hard edges

– Data departs dynamic gate on late rising edgeM t t t l t h l f lli d– Must setup at latch on early falling edge

Latch functionsP t lit h i t f d i t– Prevent glitches on inputs of domino gates

– Holds results during precharge I h l h ll ? Is the latch really necessary?

– No glitches if inputs come from other domino– Can we hold the results in another way?

CMOS VLSI Design19: Design for Skew Slide 23

Page 24: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Skew Tolerant DominoSkew-Tolerant Domino Use overlapping clocks to eliminate latches at phase Use overlapping clocks to eliminate latches at phase

boundaries.Second phase evaluates using results of first– Second phase evaluates using results of first

c

1

c

2

No latch atphase boundary

a

Sta

tic

Dyn

ami

Sta

tic

Dyn

ami

b c d

1 1

a

2

a

2

b

c

b

c

CMOS VLSI Design19: Design for Skew Slide 24

Page 25: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Full KeeperFull Keeper After second phase evaluates first phase precharges After second phase evaluates, first phase precharges Input to second phase falls

Vi l t t i it ?– Violates monotonicity? But we no longer need the value N th d t h fl ti t t Now the second gate has a floating output

– Need full keeper to hold it either high or low

weak fullX

H

weak fullkeepertransistors

f

CMOS VLSI Design19: Design for Skew Slide 25

Page 26: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Time BorrowingTime Borrowing Overlap can be used to Overlap can be used to

– Tolerate clock skewP it ti b i– Permit time borrowing

No sequencing overheadtoverlap

tskew

1

p

tborrow

2

1 1 1 1 1 2 2 2

pd ct T

Sta

tic

Dyn

amic

Sta

tic

Dyn

amic

Sta

tic

Dyn

amic

Dyn

amic

Sta

tic

Dyn

amic

Sta

tic

Dyn

amic

Sta

tic

Dyn

amic

Dyn

amic

Sta

tic

Sta

tic

Phase 1 Phase 2

CMOS VLSI Design19: Design for Skew Slide 26

ase ase

Page 27: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Multiple PhasesMultiple Phases With more clock phases each phase overlaps more With more clock phases, each phase overlaps more

– Permits more skew tolerance and time borrowing

1

2

3

2

ic ic ic ic ic ic ic ic

4

1 1 2 2 3 3 4 4

Sta

tic

Dyn

am

Sta

tic

Dyn

am

Sta

tic

Dyn

am

Dyn

am

Sta

tic

Dyn

am

Sta

tic

Dyn

am

Sta

tic

Dyn

am

Dyn

am

Sta

tic

Sta

tic

Phase 1 Phase 2 Phase 3 Phase 4

CMOS VLSI Design19: Design for Skew Slide 27

Page 28: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

Clock GenerationClock Generation

clken

1

2

3

4

CMOS VLSI Design19: Design for Skew Slide 28

Page 29: Lecture 19: Design for Skewvlsi.hongik.ac.kr/lecture/이전 강의 자료/이전... · 2010. 2. 25. · cd ccq Q1 D2 clk t cd t hold t ccq 19: Design for Skew Slide 6CMOS VLSI Design

SummarySummary Clock skew effectively increases setup and hold Clock skew effectively increases setup and hold

times in systems with hard edges Managing skew Managing skew

– Reduce: good clock distribution networkAnalyze: local vs global skew– Analyze: local vs. global skew

– Tolerate: use systems with soft edges Flip flops and traditional domino are costly Flip-flops and traditional domino are costly Latches and skew-tolerant domino perform at full

speed even with moderate clock skewsspeed even with moderate clock skews.

CMOS VLSI Design19: Design for Skew Slide 29