CSE241 L3 ASICs.1 Kahng & Cichy, UCSD ©2003 CSE241 VLSI Digital Circuits Winter 2003 Lecture 07: Timing II

CSE241 L3 ASICs.1 Kahng & Cichy, UCSD ©2003

CSE241 VLSI Digital Circuits

Winter 2003

Lecture 07: Timing II

Delay Calculation

Lookup tables (rows: load capacitance Cap in pF; columns: input transition Tr in ns; entries in ns)

Cell Fall
Cap\Tr   0.05   0.2    0.5
0.01     0.02   0.16   0.30
0.5      0.04   0.32   0.60
2.0      0.08   0.64   1.20

Cell Rise
Cap\Tr   0.05   0.2    0.5
0.01     0.03   0.18   0.33
0.5      0.06   0.36   0.66
2.0      0.09   0.72   1.32

Fall Transition
Cap\Tr   0.05   0.2    0.5
0.01     0.01   0.09   0.15
0.5      0.03   0.27   0.45
2.0      0.06   0.54   0.90

Example: 1.0 pF load; input transitions of 0.1 ns and 0.12 ns

Fall delay = 0.178 ns
Rise delay = 0.261 ns
Fall transition = 0.147 ns
Rise transition = …
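The example values can be reproduced by bilinear interpolation in the lookup tables. The sketch below is a minimal illustration, not the delay calculator of any particular tool; the names `lookup`, `CAPS`, and `TRS` are made up for this example. It assumes a 1.0 pF load, with a 0.1 ns input transition for the fall tables and 0.12 ns for the rise table, which is the pairing that matches the slide's numbers.

```python
from bisect import bisect_right

CAPS = [0.01, 0.5, 2.0]  # table rows: load capacitance (pF)
TRS = [0.05, 0.2, 0.5]   # table columns: input transition (ns)

CELL_FALL = [[0.02, 0.16, 0.30], [0.04, 0.32, 0.60], [0.08, 0.64, 1.20]]
CELL_RISE = [[0.03, 0.18, 0.33], [0.06, 0.36, 0.66], [0.09, 0.72, 1.32]]
FALL_TRANSITION = [[0.01, 0.09, 0.15], [0.03, 0.27, 0.45], [0.06, 0.54, 0.90]]


def _bracket(axis, x):
    """Index i of the table interval [axis[i], axis[i+1]] used for x.

    The index is clamped, so points outside the table are extrapolated
    linearly from the nearest interval.
    """
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup(table, cap, tr):
    """Bilinear interpolation of a 2-D table at (cap, tr)."""
    i = _bracket(CAPS, cap)
    j = _bracket(TRS, tr)
    u = (cap - CAPS[i]) / (CAPS[i + 1] - CAPS[i])
    v = (tr - TRS[j]) / (TRS[j + 1] - TRS[j])
    # Interpolate along the transition axis on both bracketing rows,
    # then along the capacitance axis between those two results.
    low_row = table[i][j] + v * (table[i][j + 1] - table[i][j])
    high_row = table[i + 1][j] + v * (table[i + 1][j + 1] - table[i + 1][j])
    return low_row + u * (high_row - low_row)


print(round(lookup(CELL_FALL, 1.0, 0.10), 3))        # fall delay
print(round(lookup(CELL_RISE, 1.0, 0.12), 3))        # rise delay
print(round(lookup(FALL_TRANSITION, 1.0, 0.10), 3))  # fall transition
```

The rise transition is left out because the slide does not show a rise-transition table.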

PVT (Process, Voltage, Temperature) Derating

Actual cell delay = Original delay x KPVT

PVT Derating: Example + Min/Typ/Max Triples

Proc_var (0.5 : 1.0 : 1.3)
Voltage (5.5 : 5.0 : 4.5)
Temperature (0 : 20 : 50)

KP = 0.80 : 1.00 : 1.30
KV = 0.93 : 1.00 : 1.08
KT = 0.80 : 1.07 : 1.35

KPVT = 0.60 : 1.07 : 1.90

Cell delay = 0.261 ns
Derated delay = 0.157 : 0.279 : 0.496 ns {min : typical : max}
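The derating arithmetic can be checked with a few lines of code. This is a sketch only; the function name `derate` is hypothetical. It assumes KPVT is the product KP x KV x KT for each corner, and that the product is rounded to two decimals before the delay is scaled, which is what reproduces the slide's numbers.

```python
def derate(delay_ns, kp, kv, kt):
    """Return (KPVT triple, derated-delay triple) for min : typ : max corners."""
    # Per-corner product of the process, voltage, and temperature factors,
    # rounded to two decimals as on the slide.
    k_pvt = tuple(round(p * v * t, 2) for p, v, t in zip(kp, kv, kt))
    derated = tuple(round(delay_ns * k, 3) for k in k_pvt)
    return k_pvt, derated


k_pvt, derated = derate(
    0.261,
    kp=(0.80, 1.00, 1.30),
    kv=(0.93, 1.00, 1.08),
    kt=(0.80, 1.07, 1.35),
)
print(k_pvt)    # KPVT min : typ : max
print(derated)  # derated delay (ns) min : typ : max
```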

Conservatism of Gate Delay Modeling

True gate delay depends on input arrival time patterns

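STA will assume that only 1 input is switching

Will use worst slope among several inputs

[Figure: waveforms vs. time of A, B, F with delay tpd (two inputs switching) and of A, F with delay tpd (one input switching); gate schematics with Vdd, inputs A and B, output F, and load CL]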

This Class + Logistics

Reading
- Smith, Chapters 15, 16
- http://vlsicad.ucsd.edu/Presentations/ICCAD00TUTORIAL/
- Possibly: Sarrafzadeh/Wong, Chapters 2 (placement), 3 (routing), (4: performance modeling)

Schedule
- MT will be take-home (and easy), BUT you lose 5% if you don’t show up on Thursday (attendance will be taken by Ben)

- Thursday: Surprise guest lecturer on floorplan / placement

HW #12: Suppose that you want to work on timing edges that are most critical according to some F(slack of the edge, #paths through the edge). How would you modify the STA calculation (longest path in a DAG) so that it also calculates the number of paths through each edge?
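One possible way to approach the path-counting part, offered as a sketch rather than the intended solution: the number of source-to-sink paths through an edge (u, v) is (number of paths from any source to u) x (number of paths from v to any sink), and both counts can be accumulated in the same forward and backward topological sweeps that compute arrival and required times. The function name `analyze_dag` and the tiny example graph are made up for illustration; the required time at every sink is assumed to equal the longest path delay.

```python
from collections import defaultdict, deque


def analyze_dag(edge_delay):
    """edge_delay maps (u, v) -> delay. Returns {(u, v): (slack, path_count)}."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edge_delay:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    # Topological order (Kahn's algorithm).
    indeg = {n: len(pred[n]) for n in nodes}
    ready = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):
        raise ValueError("timing graph has a cycle")

    # Forward sweep: latest arrival time and number of paths from sources.
    arrival = {n: 0.0 for n in nodes}
    paths_in = {n: (0 if pred[n] else 1) for n in nodes}
    for u in order:
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + edge_delay[(u, v)])
            paths_in[v] += paths_in[u]

    # Backward sweep: required time and number of paths to sinks.
    t_required = max(arrival.values())  # assumption: sinks required at longest delay
    required = {n: t_required for n in nodes}
    paths_out = {n: (0 if succ[n] else 1) for n in nodes}
    for u in reversed(order):
        for v in succ[u]:
            required[u] = min(required[u], required[v] - edge_delay[(u, v)])
            paths_out[u] += paths_out[v]

    return {
        (u, v): (required[v] - arrival[u] - d, paths_in[u] * paths_out[v])
        for (u, v), d in edge_delay.items()
    }


example = {("a", "c"): 2, ("b", "c"): 3, ("c", "d"): 1, ("c", "e"): 4}
result = analyze_dag(example)
for edge in sorted(result):
    print(edge, result[edge])
```

Each edge ends up with the pair (slack, #paths), which is the input that a criticality function F would need. The extra bookkeeping keeps the whole analysis linear in the size of the graph.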

Slide courtesy of S. P. Levitan, U. Pittsburgh

Buffer Clustering

Sylvester / Shepard, 2001

Hierarchical clustering connecting clock source (= root) to clock sinks (= leaves) of clustering tree

Fanout at each level between 5 and 200 (depends on buffer library)

Often specify a clock topology in the tool as, e.g., (1)-6-8-5: root has 6 children, each of which has 8 children, each of which has 5 (leaf) children, for 6 × 8 × 5 = 240 clock sinks

Big question: how to perform the hierarchical buffer clustering? What makes a “good” cluster?
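As a small illustration of the topology notation, the sketch below expands a string such as (1)-6-8-5 into the number of nodes at each level of the clustering tree. The helper name `nodes_per_level` is hypothetical, and the parsing assumes the string always starts with the root written as "(1)".

```python
def nodes_per_level(topology):
    """Expand '(1)-6-8-5' into node counts per tree level, root first."""
    parts = topology.split("-")
    if parts[0] != "(1)":
        raise ValueError("expected the topology to start with '(1)'")
    counts = [1]
    for fanout in parts[1:]:
        # Every node at the previous level drives `fanout` children.
        counts.append(counts[-1] * int(fanout))
    return counts


levels = nodes_per_level("(1)-6-8-5")
print(levels)      # root, first-level buffers, second-level buffers, sinks
print(levels[-1])  # number of clock sinks
```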

Buffer Clustering by Space Partitioning

Sylvester / Shepard, 2001

Example: Cadence CT-Gen

Pick fanout (e.g., 6-4)

Pick “long axis” of bounding box of sinks

Place buffers at medians (essentially) of chunks of sinks identified by space-partitioning

Why is this good? Uses (or assumes) min wire; easily routed (Steiner routing); robust to ECOs; …

Why is it bad? Oversizes drivers; commits to skew which could be avoided
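The space-partitioning idea can be sketched in a few lines. This is a simplified illustration of the steps listed above and not CT-Gen's actual algorithm; all function names are hypothetical. Sinks are sorted along the long axis of their bounding box, cut into equal-count chunks according to the fanout, and a buffer is placed at the coordinate-wise median of each chunk. Wire length, driver sizing, and skew are ignored.

```python
from statistics import median


def long_axis(points):
    """0 if the bounding box is at least as wide as it is tall, else 1."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1


def median_point(points):
    """Coordinate-wise median, used here as the buffer location."""
    return (median(p[0] for p in points), median(p[1] for p in points))


def split_chunks(points, k):
    """Cut the sinks into up to k equal-count chunks along the long axis."""
    axis = long_axis(points)
    ordered = sorted(points, key=lambda p: p[axis])
    n = len(ordered)
    bounds = [i * n // k for i in range(k + 1)]
    return [ordered[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def cluster(sinks, fanouts):
    """Build a buffer tree; fanouts=[6, 4] means root -> 6 buffers -> 4 sinks each."""
    node = {"buffer": median_point(sinks)}
    if len(fanouts) <= 1:
        node["children"] = list(sinks)  # last level drives the sinks directly
    else:
        node["children"] = [
            cluster(chunk, fanouts[1:]) for chunk in split_chunks(sinks, fanouts[0])
        ]
    return node


# Hypothetical example: 24 sinks on a 6 x 4 grid, clustered with fanout 6-4.
sinks = [(x, y) for x in range(6) for y in range(4)]
tree = cluster(sinks, [6, 4])
print(tree["buffer"])
print([child["buffer"] for child in tree["children"]])
```

Because every cut depends only on sink coordinates, the result is simple and predictable, which fits the "easily routed" argument above. It also shows why the method commits to skew: nothing in the partitioning looks at delay.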

Buffer Clustering by Traditional Clustering

Sylvester / Shepard, 2001

Example: SPC, old Cell3 CTS

Pick fanout (e.g., 6)

Find clusters of size 6

Place buffers at centers or centroids or … of clusters

Recurse

Why is this good? Can get near-zero skew trees?

Why is this bad? ECOs; hard to route; more wire(?); difficult algorithms!

HW #13: Propose a hierarchical clustering strategy for buffered clock trees, and explain its pros and cons

Outline

Clocking

Storage elements

Clocking metrics and methodology

Clock distribution

Package and useful-skew degrees of freedom

Clock power issues

Gate timing models

Skew Reduction Using Package

• Most clock network latency occurs at global level (largest distances spanned)

• Latency ~ Skew (skew tends to scale with latency)

• With reverse scaling, routing low-RC signals at global level becomes more difficult & area-consuming

Sylvester / Shepard, 2001

Skew Reduction Using Package

[Figure labels: system clock, P/ASIC, solder bump, substrate]

Incorporate global clock distribution into the package

Flip-chip packaging allows for high density, low parasitic access from substrate to IC

• RC of package-level wiring up to 4 orders of magnitude smaller than on-chip wiring

• Global skew reduced

• Lower capacitance → lower power

• Opens up global routing tracks

• Results not yet conclusive

Sylvester / Shepard, 2001

Useful Skew (= cycle-stealing)

[Figure: timing slacks (hold, setup) for a chain FF → fast path → FF → slow path → FF, shown for zero skew and for useful skew]

Zero skew

• Global skew constraint

• All skew is bad

Useful skew

• Local skew constraints

• Shift slack to critical paths

W. Dai, UC Santa Cruz

Skew = Local Constraint

D: longest path
d: shortest path

[Figure: FF → FF pair with a skew axis divided into three regions: race condition, safe (permissible range), cycle time violation]

-d + thold < Skew < Tperiod - D - tsetup

Timing is correct as long as the signal arrives in the permissible skew range
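A minimal sketch of the permissible-range check follows. The slide does not state the sign convention; the code assumes skew = (clock arrival at the launching FF) - (clock arrival at the capturing FF), which makes the inequality above agree with the usual hold and setup conditions. The function names and the numeric values in the example are hypothetical.

```python
def permissible_skew_range(d_short, d_long, t_period, t_setup=0.0, t_hold=0.0):
    """Return (lower, upper) bounds on skew for one FF-to-FF pair."""
    lower = -d_short + t_hold             # below this: race condition (hold)
    upper = t_period - d_long - t_setup   # above this: cycle time violation (setup)
    return lower, upper


def classify(skew, lower, upper):
    """Name the region of the skew axis; boundaries are treated as safe here."""
    if skew < lower:
        return "race condition"
    if skew > upper:
        return "cycle time violation"
    return "safe"


# Hypothetical numbers (ns): d = 1, D = 4, Tperiod = 6, tsetup = 0.2, thold = 0.1
lower, upper = permissible_skew_range(1.0, 4.0, 6.0, t_setup=0.2, t_hold=0.1)
print(lower, upper)
for skew in (-1.5, 0.0, 2.5):
    print(skew, classify(skew, lower, upper))
```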

W. Dai, UC Santa Cruz

Skew Scheduling for Design Robustness

[Figure: three flip-flops in series, FF → FF → FF, with 2 ns and 6 ns of logic between them; T = 6 ns]

“0 0 0”: at verge of violation

“2 0 2”: more safety margin

Design will be more robust if clock signal arrival time is in the middle of permissible skew range, rather than on edge

Can solve a linear program to maximize robustness = determine prescribed sink skews
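The two schedules in the example can be compared numerically. The sketch below is an evaluation of given schedules, not the linear program itself. It reads "0 0 0" and "2 0 2" as clock arrival times (ns) at the three flip-flops, and it assumes zero setup and hold times and a single delay per stage (longest = shortest path), since the slide gives only one number per stage. The function name `safety_margins` is hypothetical.

```python
def safety_margins(clock_arrivals, stage_delays, t_period, t_setup=0.0, t_hold=0.0):
    """Distance (ns) from each stage's skew to the nearer edge of its permissible range."""
    margins = []
    for i, delay in enumerate(stage_delays):
        skew = clock_arrivals[i] - clock_arrivals[i + 1]  # launch minus capture
        lower = -delay + t_hold
        upper = t_period - delay - t_setup
        # skew - lower is the hold margin, upper - skew is the setup margin.
        margins.append(min(skew - lower, upper - skew))
    return margins


stage_delays = [2.0, 6.0]  # ns, from the example
zero_skew = safety_margins([0.0, 0.0, 0.0], stage_delays, t_period=6.0)
scheduled = safety_margins([2.0, 0.0, 2.0], stage_delays, t_period=6.0)
print(zero_skew, min(zero_skew))
print(scheduled, min(scheduled))
```

Under these assumptions the zero-skew schedule has no margin on the 6 ns stage, while "2 0 2" leaves 2 ns on both stages. A linear program would treat the clock arrival times as variables and maximize the smallest margin.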

W. Dai, UC Santa Cruz

Potential Advantages of Useful Skew

[Figure: CLK with 0-skew vs. CLK with U-skew]

Reduce peak current consumption by distributing the FF switch point in the range of permissible skew

Affords extra margin to increase clock frequency or reduce sizing (= power)

W. Dai, UC Santa Cruz

Conventional Zero-Skew Flow

[Flow diagram boxes:]

Placement

Synthesis

Extraction & Delay Calculation

Static Timing Analysis

0-Skew Clock Synthesis

Clock Routing

Signal Routing

W. Dai, UC Santa Cruz

Useful-Skew Flow

[Flow diagram boxes:]

Existing Placement

Extraction & Delay Calculation

Static Timing Analysis

U-Skew Clock Synthesis

Clock Routing

Signal Routing

Permissible range generation

Initial skew scheduling

Clock tree topology synthesis

Clock net routing

Clock timing verification

W. Dai, UC Santa Cruz

Outline

Clocking

Storage elements

Clocking metrics and methodology

Clock distribution

Package and useful-skew degrees of freedom

Clock power issues

Gate timing models

Clock Power

Power consumption in clocks due to:
• Clock drivers
• Long interconnections
• Large clock loads: all clocked elements (latches, FF’s) are driven

Different components dominate
• Depending on type of clock network used
• Ex. Grid: huge pre-drivers & wire cap. drown out load cap.

Sylvester / Shepard, 2001

Clock Power Is LARGE

Not only is the clock capacitance large, it switches every cycle!

P = C · Vdd² · f
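For a feel of the magnitudes, the formula can be evaluated directly. The numbers below (1 nF of clock capacitance, 1.8 V supply, 500 MHz) are purely illustrative and are not taken from the lecture; the function name and the `activity` parameter are also assumptions, with activity = 1 modeling a net that charges and discharges once per cycle, as the clock does.

```python
def dynamic_power_watts(c_farads, vdd_volts, f_hz, activity=1.0):
    """Switching power P = activity * C * Vdd^2 * f."""
    return activity * c_farads * vdd_volts ** 2 * f_hz


# Illustrative values only: 1 nF clock network, 1.8 V, 500 MHz.
p_clock = dynamic_power_watts(1e-9, 1.8, 500e6)
# The same capacitance on data nets that switch in only 10% of cycles.
p_data = dynamic_power_watts(1e-9, 1.8, 500e6, activity=0.1)
print(round(p_clock, 3), round(p_data, 3))
```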

Sylvester / Shepard, 2001

Low-Power Clocking

Gated clocks
• Prevent switching in areas of chip not being used
• Easier in static designs

Edge-triggered flops in ARM rather than transparent latches in Alpha
• Reduced load on clock for each latch/flop
• Eliminated spurious power-consuming transitions during latch flow-through (transparency)

Sylvester / Shepard, 2001

Clock Area

Clock networks consume silicon area (clock drivers, PLL, etc.) and routing area

Routing area is most vital

Top-level metals are used to reduce RC delays
• These levels are precious resources (unscaled)
• Power routing, clock routing, key global signals

Reducing area also reduces wiring capacitance and power

Typical #’s: Intel Itanium – 4% of M4/5 used in clock routing

Sylvester / Shepard, 2001

Clock Slew Rates

To maintain signal integrity and latch performance, minimum slew rates are required

Too slow – clock is more susceptible to noise, latches are slowed down, setup times eat into timing budget [Tsetup = 200 + 0.33 * Tslew (ps)], more short-circuit power for large clock drivers

Too fast – burns too much power, overdesigned network, enhanced ground bounce

Rule-of-thumb: Trise and Tfall of clock are each between 10-20% of clock period (10% - aggressive target)

1 GHz clock; Trise = Tfall = 100-200ps
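The rule of thumb and the setup-time expression above are easy to tabulate. The sketch below uses only the two relations quoted on the slide: a 10-20% window of the clock period, and Tsetup = 200 + 0.33 * Tslew in ps. The function names are hypothetical.

```python
def slew_window_ps(f_clock_hz, low_fraction=0.10, high_fraction=0.20):
    """Target range for clock rise/fall time (ps) as a fraction of the period."""
    period_ps = 1e12 / f_clock_hz
    return low_fraction * period_ps, high_fraction * period_ps


def setup_time_ps(t_slew_ps):
    """Setup time model quoted on the slide: 200 + 0.33 * Tslew (ps)."""
    return 200 + 0.33 * t_slew_ps


fast, slow = slew_window_ps(1e9)  # 1 GHz clock
print(fast, slow)
print(setup_time_ps(fast), setup_time_ps(slow))
```

At 1 GHz the model gives a setup time of about 233 ps at the aggressive 10% target and about 266 ps at 20%, which shows how slow clock edges eat into the timing budget.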

Sylvester / Shepard, 2001

Example: Alpha 21264

Grid + H-tree approach

Power = 32% of total

Wire usage = 3% of metals 3 & 4

4 major clock quadrants, each with a large driver connected to local grid structures

Sylvester / Shepard, 2001

Alpha 21264 Skew Map

Ref: Compaq, ASP-DAC00

Sylvester / Shepard, 2001

Power vs. Skew

Fundamental design decision
• Meeting skew requirements is easy with unlimited power budget
• Wide wires reduce RC product but increase total C
• Driver upsizing reduces latency (→ reduces skew as well) but increases buffer cap
• SOC context: plastic package power limit is 2-3 W

Sylvester / Shepard, 2001

Clock Distribution Trends

Timing
• Clock period dropping fast, skew must follow
• Slew rates must also scale with cycle time
• Jitter: PLL’s get better with CMOS scaling but other sources of noise increase
  - Power supply noise more important
  - Switching-dependent temperature gradients

Materials
• Cu reduces RC slew degradation, potential skew
• Low-k decreases power, improves latency, skew, slews

Power
• Complexity, dynamic logic, pipelining → more clock sinks
• Larger chips → bigger clock networks

Sylvester / Shepard, 2001