Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Introduction toCMOS VLSICMOS VLSI
Design
Lecture 19: Design for Skew
OutlineOutline Clock Distribution Clock Distribution Clock Skew Sk T l t St ti Ci it Skew-Tolerant Static Circuits Traditional Domino Circuits Sk T l t D i Ci it Skew-Tolerant Domino Circuits
CMOS VLSI Design19: Design for Skew Slide 2
ClockingClocking Synchronous systems use a clock to keep Synchronous systems use a clock to keep
operations in sequenceDistinguish this from previous or next– Distinguish this from previous or next
– Determine speed at which machine operates Clock must be distributed to all the sequencing Clock must be distributed to all the sequencing
elements– Flip-flops and latchesFlip-flops and latches
Also distribute clock to other elements – Domino circuits and memories– Domino circuits and memories
CMOS VLSI Design19: Design for Skew Slide 3
Clock DistributionClock Distribution On a small chip the clock distribution network is just On a small chip, the clock distribution network is just
a wireAnd possibly an inverter for clkb– And possibly an inverter for clkb
On practical chips, the RC delay of the wire resistance and gate load is very longresistance and gate load is very long– Variations in this delay cause clock to get to
different elements at different times– This is called clock skew
Most chips use repeaters to buffer the clock and Most chips use repeaters to buffer the clock and equalize the delay– Reduces but doesn’t eliminate skew
CMOS VLSI Design19: Design for Skew Slide 4
ExampleExample Skew comes from differences in gate and wire delay Skew comes from differences in gate and wire delay
– With right buffer sizing, clk1 and clk2 could ideally arrive at the same timearrive at the same time.
– But power supply noise changes buffer delaysclk and clk will always see RC skew– clk2 and clk3 will always see RC skew
gclk3 mm 3.1 mm
clk1
0.5 mm
clk2clk3
1.3 pF0.4 pF 0.4 pF
CMOS VLSI Design19: Design for Skew Slide 5
Review: Skew ImpactReview: Skew Impactclk clk
Ideally full cycle is
F1 F2
clk
Combinational Logic
Tc
Q1 D2 Ideally full cycle isavailable for work
Sk dd iQ1
D2
tskew
tsetup
tpcq
tpdq
Skew adds sequencingoverhead
I h ld ti tCLF1
clk
Q1
kdt T t t t
Increases hold time too
F2
clk
D2
tskew
setup skew
sequencing overhead
hold skew
pd c pcq
cd ccq
t T t t t
t t t t
Q1
D2
clk
tcd
thold
tccq
hold skewcd ccq
CMOS VLSI Design19: Design for Skew Slide 6
Cycle Time TrendsCycle Time Trends Much of CPU performance comes from higher f Much of CPU performance comes from higher f
– f is improving faster than simple process shrinksS i h d i bi t f l– Sequencing overhead is bigger part of cycle
10
100
5 100
1000
Hz
0.01
0.1
1
8038680486PentiumPentium II / III
Spec
Int9
5
1985 1988 1991 1994 1997 200010
100
8038680486PentiumPentium II / III
MH
1988 1991 1994 1997 20001985
200
500VDD = 5 VDD = 3.3
VDD = 2.5
Inve
rter D
elay
(ps)
100
dela
ys /
cycl
e50
1.2 0.8 0.6 0.35 2.0
Process
100
50Fano
ut-o
f-4 (F
O4)
0.25 10
1985 1988 1991 1994 1997
8038680486PentiumPentium II / IIIFO
4 in
verte
r d
20
2000
CMOS VLSI Design19: Design for Skew Slide 7
Process
SolutionsSolutions Reduce clock skew Reduce clock skew
– Careful clock distribution network designPl t f t l i i– Plenty of metal wiring resources
Analyze clock skewO l b d t t l t t k– Only budget actual, not worst case skews
– Local vs. global skew budgets T l l k k Tolerate clock skew
– Choose circuit structures insensitive to skew
CMOS VLSI Design19: Design for Skew Slide 8
Clock Dist NetworksClock Dist. Networks Ad hoc Ad hoc Grids H t H-tree Hybrid
CMOS VLSI Design19: Design for Skew Slide 9
Clock GridsClock Grids Use grid on two or more levels to carry clock Use grid on two or more levels to carry clock Make wires wide to reduce RC delay E l k b t b i t Ensures low skew between nearby points But possibly large skew across die
CMOS VLSI Design19: Design for Skew Slide 10
Alpha Clock GridsAlpha Clock GridsAlpha 21064 Alpha 21164 Alpha 21264
PLL
lk idlk id gclk gridgclk grid
Alpha 21064 Alpha 21164 Alpha 21264
CMOS VLSI Design19: Design for Skew Slide 11
Alpha 21064 Alpha 21164 Alpha 21264
H TreesH-Trees Fractal structure Fractal structure
– Gets clock arbitrarily close to any pointM t h d d l l ll th– Matched delay along all paths
Delay variations cause skew A d B i ht bi k A and B might see big skew A B
CMOS VLSI Design19: Design for Skew Slide 12
Itanium 2 H TreeItanium 2 H-Tree Four levels of buffering: Four levels of buffering:
– Primary driverR t– Repeater
– Second-level l k b ff
Repeaters
clock buffer– Gater
R d Route aroundobstructions
Typical SLCBLocations
Primary Buffer
CMOS VLSI Design19: Design for Skew Slide 13
Hybrid NetworksHybrid Networks Use H tree to distribute clock to many points Use H-tree to distribute clock to many points Tie these points together with a grid
Ex: IBM Power4, PowerPCH t d i 16 64 t b ff– H-tree drives 16-64 sector buffers
– Buffers drive total of 1024 pointsAll i h d h i h id– All points shorted together with grid
CMOS VLSI Design19: Design for Skew Slide 14
Skew ToleranceSkew Tolerance Flip flops are sensitive to skew because of hard edges Flip-flops are sensitive to skew because of hard edges
– Data launches at latest rising edge of clockM t t b f li t t i i d f l k– Must setup before earliest next rising edge of clock
– Overhead would shrink if we can soften edge L t h t l t d t t f k Latches tolerate moderate amounts of skew
– Data can arrive anytime latch is transparent
CMOS VLSI Design19: Design for Skew Slide 15
Skew: LatchesSkew: Latches2 Phase Latches
Q1
L1 L2 L3
1 12
CombinationalLogic 1
CombinationalLogic 2
Q2 Q3D1 D2 D3
2pd c pdqt T t
2-Phase Latches
1
2
sequencing overhead
1 2 hold nonoverlap skew,cd cd ccqt t t t t t
borrow setup nonoverlap skew2cTt t t t
Pulsed Latches setup skew
sequencing overhead
max ,pd c pdq pcq pwt T t t t t t
Pulsed Latches
hold skew
borrow setup skew
cd pw ccq
pw
t t t t t
t t t t
CMOS VLSI Design19: Design for Skew Slide 16
pp
Dynamic Circuit ReviewDynamic Circuit Review Static circuits are slow because fat pMOS load input Static circuits are slow because fat pMOS load input Dynamic gates use precharge to remove pMOS
transistors from the inputstransistors from the inputs– Precharge: = 0 output forced high
Evaluate: = 1 output may pull low– Evaluate: = 1 output may pull low
A
A B
Y
C DY
B
C
D A B C DY
A B C D
D
CMOS VLSI Design19: Design for Skew Slide 17
Domino CircuitsDomino Circuits Dynamic inputs must monotonically rise during Dynamic inputs must monotonically rise during
evaluationPlace inverting stage between each dynamic gate– Place inverting stage between each dynamic gate
– Dynamic / static pair called domino gate Domino gates can be safely cascaded Domino gates can be safely cascaded
domino AND
A
W
B
X
dynamicNAND
staticinverter
CMOS VLSI Design19: Design for Skew Slide 18
Domino TimingDomino Timing Domino gates are 1 5 2x faster than static CMOS Domino gates are 1.5 – 2x faster than static CMOS
– Lower logical effort because of reduced Cin
Ch ll i t k h ff iti l th Challenge is to keep precharge off critical path Look at clocking schemes for precharge and eval
T diti l h h h d– Traditional schemes have severe overhead– Skew-tolerant domino hides this overhead
CMOS VLSI Design19: Design for Skew Slide 19
Traditional Domino CktsTraditional Domino Ckts Hide precharge time by ping ponging between half Hide precharge time by ping-ponging between half-
cyclesOne evaluates while other precharges– One evaluates while other precharges
– Latches hold results during prechargeTc
clk
c
clkc
clk
c
clk
c
clk clk
c c c c
clk clk clk clk clk
clk
2pd c pdqt T t
Sta
tic
Dyn
amic
Latc
h
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Dyn
amic
Sta
tic
Dyn
amic
Latc
h
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Dyn
amic
tpdq tpdq
pd c pdq
CMOS VLSI Design19: Design for Skew Slide 20
Clock SkewClock Skew Skew increases sequencing overhead Skew increases sequencing overhead
– Traditional domino has hard edgesE l t t l t t i i d– Evaluate at latest rising edge
– Setup at latch by earliest falling edge
clk
c
clkc
clk
c
clk clk
c c c
clk clk clk clk
clk
setup skew2 2pd ct T t t S
tatic
Dyn
amic
Latc
h
Sta
tic
Dyn
amic
Dyn
amic
Sta
tic
Dyn
amic
Latc
h
Sta
tic
Dyn
amic
Dyn
amic
tskewtsetup
CMOS VLSI Design19: Design for Skew Slide 21
Time BorrowingTime Borrowing Logic may not exactly fit half cycle Logic may not exactly fit half-cycle
– No flexibility to borrow time to balance logic between half cyclesbetween half cycles
Traditional domino sequencing overhead is about 25% of cycle time in fast systems!25% of cycle time in fast systems!
clk
c
clkc
clk clk
c c
clk clk clk
clkS
tatic
Dyn
amic
Latc
h
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Latc
h
Sta
tic
Dyn
amic
tskewtsetup
CMOS VLSI Design19: Design for Skew Slide 22
Relaxing the TimingRelaxing the Timing Sequencing overhead caused by hard edges Sequencing overhead caused by hard edges
– Data departs dynamic gate on late rising edgeM t t t l t h l f lli d– Must setup at latch on early falling edge
Latch functionsP t lit h i t f d i t– Prevent glitches on inputs of domino gates
– Holds results during precharge I h l h ll ? Is the latch really necessary?
– No glitches if inputs come from other domino– Can we hold the results in another way?
CMOS VLSI Design19: Design for Skew Slide 23
Skew Tolerant DominoSkew-Tolerant Domino Use overlapping clocks to eliminate latches at phase Use overlapping clocks to eliminate latches at phase
boundaries.Second phase evaluates using results of first– Second phase evaluates using results of first
c
1
c
2
No latch atphase boundary
a
Sta
tic
Dyn
ami
Sta
tic
Dyn
ami
b c d
1 1
a
2
a
2
b
c
b
c
CMOS VLSI Design19: Design for Skew Slide 24
Full KeeperFull Keeper After second phase evaluates first phase precharges After second phase evaluates, first phase precharges Input to second phase falls
Vi l t t i it ?– Violates monotonicity? But we no longer need the value N th d t h fl ti t t Now the second gate has a floating output
– Need full keeper to hold it either high or low
weak fullX
H
weak fullkeepertransistors
f
CMOS VLSI Design19: Design for Skew Slide 25
Time BorrowingTime Borrowing Overlap can be used to Overlap can be used to
– Tolerate clock skewP it ti b i– Permit time borrowing
No sequencing overheadtoverlap
tskew
1
p
tborrow
2
1 1 1 1 1 2 2 2
pd ct T
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Dyn
amic
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Dyn
amic
Sta
tic
Sta
tic
Phase 1 Phase 2
CMOS VLSI Design19: Design for Skew Slide 26
ase ase
Multiple PhasesMultiple Phases With more clock phases each phase overlaps more With more clock phases, each phase overlaps more
– Permits more skew tolerance and time borrowing
1
2
3
2
ic ic ic ic ic ic ic ic
4
1 1 2 2 3 3 4 4
Sta
tic
Dyn
am
Sta
tic
Dyn
am
Sta
tic
Dyn
am
Dyn
am
Sta
tic
Dyn
am
Sta
tic
Dyn
am
Sta
tic
Dyn
am
Dyn
am
Sta
tic
Sta
tic
Phase 1 Phase 2 Phase 3 Phase 4
CMOS VLSI Design19: Design for Skew Slide 27
Clock GenerationClock Generation
clken
1
2
3
4
CMOS VLSI Design19: Design for Skew Slide 28
SummarySummary Clock skew effectively increases setup and hold Clock skew effectively increases setup and hold
times in systems with hard edges Managing skew Managing skew
– Reduce: good clock distribution networkAnalyze: local vs global skew– Analyze: local vs. global skew
– Tolerate: use systems with soft edges Flip flops and traditional domino are costly Flip-flops and traditional domino are costly Latches and skew-tolerant domino perform at full
speed even with moderate clock skewsspeed even with moderate clock skews.
CMOS VLSI Design19: Design for Skew Slide 29