ECE260B – CSE241A Winter 2005: Timing Analysis and Correction

Website: http://vlsicad.ucsd.edu/courses/ece260b-w05

Page 1: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

ECE 260B – CSE 241A Timing Analysis & Correction 1 http://vlsicad.ucsd.edu

ECE260B – CSE241A Winter 2005

Timing Analysis and Correction

Website: http://vlsicad.ucsd.edu/courses/ece260b-w05

Page 2: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Timing Analysis
• Testing
• Simulation
  - Device modeling (BSIM)
  - Transistor-level time domain analysis (SPICE)
  - Frequency domain interconnect analysis (AWE, PRIMA)
• Static timing analysis
  - Transistor-level (PathMill)
  - Gate-level (PrimeTime)

Page 3: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Sequential Machine
• State is stored in registers (flip-flops or latches)
• Combinational logic computes next-state and outputs from present-state and inputs

[Figure: combinational logic blocks separated by registers clocked by clk]

Courtesy K. Keutzer et al. UCB

Page 4: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Why Clocks?
• Clocks provide the means to synchronize
  - By allowing events to happen at known timing boundaries, we can sequence these events
• Greatly simplifies building of state machines
• No need to worry about variable delay through combinational logic (CL)
  - All signals are delayed until the clock edge (clock imposes the worst-case delay)

[Figure: combinational logic and registers arranged as a dataflow pipeline and as an FSM]

Courtesy K. Yang, UCLA

Page 5: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Clock Cycle Time
• Cycle time is determined by the delay through the CL
  - Signal must arrive before the latching edge
  - If too late, it waits until the next cycle
    - Synchronization and sequential order become incorrect
• Constraint: Tcycle > Tprop_delay_through_CL + Toverhead
• Example: 3.0 GHz Pentium-4, Tcycle = 333 ps
• Can change circuit architecture to obtain smaller Tcycle

Courtesy K. Yang, UCLA

Page 6: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Pipelining
• For dataflow:
  - Instead of a long critical path, split the critical path into chunks
  - Insert registers to store intermediate results
  - This allows 2 waves of data to coexist within the CL
• Can we extend this ad infinitum?
  - Overhead eventually limits the pipelining
    - E.g., 1.5 to 2 gate delays for latch or FF
  - Granularity limits as well
    - Minimum time quantum: delay of a gate

[Figure: unpipelined CL block (A+B) between registers, with delay tpd; pipelined version with CL blocks A and B separated by a register, with delays tpd1 and tpd2]

Unpipelined: Tcycle > tpd + Toverhead
Pipelined: Tcycle > max(tpd1, tpd2) + Toverhead

Courtesy K. Yang, UCLA
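The two cycle-time bounds on this slide can be compared numerically. The sketch below is only an illustration: the function names and the delay values (in gate delays) are assumptions, not part of the original slides.

```python
def min_cycle_time(stage_delays, overhead):
    """Lower bound on Tcycle: slowest stage plus register overhead."""
    return max(stage_delays) + overhead


def pipelining_speedup(stage_delays, overhead):
    """Ratio of unpipelined to pipelined cycle-time bounds."""
    unpipelined = sum(stage_delays) + overhead  # one long path, tpd = tpd1 + tpd2 + ...
    pipelined = min_cycle_time(stage_delays, overhead)
    return unpipelined / pipelined


# Hypothetical example: a 20-gate-delay path split into 2, then 10 equal stages,
# with 2 gate delays of latch/FF overhead per stage.
two_stage = pipelining_speedup([10, 10], overhead=2)
ten_stage = pipelining_speedup([2] * 10, overhead=2)
```

With these assumed numbers the ten-stage split gives far less than a 10x gain, because the fixed overhead becomes as large as the logic in each stage.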

Page 7: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Intel MPU FO4 INV Delays Per Clock Period

[Figure: number of FO4 inverter delays per clock period (y-axis, 0 to 120) versus year (x-axis, 1982 to 2004) for Intel microprocessors: 386, 486, DX2, DX4, Pentium, Pentium MMX, Pentium Pro, Pentium II, Celeron, Pentium III, Pentium 4]

FO4 INV = inverter driving 4 identical inverters (no interconnect)

Half of frequency improvement has been from reduced logic stages, i.e., pipelining

Page 8: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Let’s Revisit Cycle Time and Path Delay
• Cycle time (T) cannot be smaller than longest path delay (Tmax)
• Longest (critical) path delay is a function of:
  - Total gate and wire delays
  - Logic levels

[Figure: registers Q1 and Q2 clocked by Tclock1 and Tclock2, connected by a critical path of ~5 logic levels; waveform of Tclock1 and data showing the cycle time]

$T \ge T_{max}$

Courtesy K. Keutzer et al. UCB

Page 9: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Cycle Time - Setup Time
• For FFs to correctly capture data, data must be stable for:
  - Setup time (Tsetup) before clock arrives

[Figure: registers Q1 and Q2 clocked by Tclock1 and Tclock2, connected by a critical path of ~5 logic levels; waveform of Tclock1 and data showing the setup time]

$T \ge T_{max} + T_{setup}$

Courtesy K. Keutzer et al. UCB

Page 10: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Cycle Time – Clock Skew
• If the clock network has unbalanced delay: clock skew
• Cycle time is also a function of clock skew (Tskew)

[Figure: registers Q1 and Q2 clocked by Tclock1 and Tclock2, connected by a critical path of ~5 logic levels; waveforms of Tclock1, Tclock2, and data showing the clock skew at Q2]

$T \ge T_{max} + T_{setup} + T_{skew}$

Courtesy K. Keutzer et al. UCB

Page 11: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Cycle Time – Flip-Flop Delay (Clock to Q)
• Cycle time is also a function of the propagation delay of the FF (Tclk-to-Q or Tc2q)
• Tc2q: time from arrival of the clock signal until the change at the FF output

[Figure: registers Q1 and Q2 clocked by Tclock1 and Tclock2, connected by a critical path of ~5 logic levels; waveforms of Tclock1, Tclock2, and data showing the clock-to-Q delay at Q2]

$T \ge T_{max} + T_{setup} + T_{skew} + T_{\text{clk-to-Q}}$

Courtesy K. Keutzer et al. UCB

Page 12: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Min Path Delay - Hold Time
• For FFs to correctly latch data, data must be stable during:
  - Hold time (Thold) after clock arrives
• Determined by delay of shortest path in circuit (Tmin) and clock skew (Tskew)

[Figure: registers Q1 and Q2 clocked by Tclock1 and Tclock2, connected by a short path of ~3 logic levels; waveform of Tclock1 and data showing the hold time]

$T_{min} \ge T_{hold} + T_{skew}$

Courtesy K. Keutzer et al. UCB

Page 13: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Setup, Hold, Cycle Times

[Figure: example of a single-phase clock, annotated with the cycle time, the set-up time (D stable before clock), the hold time (D stable after clock), and the window when the signal may change]

Courtesy K. Keutzer et al. UCB

Page 14: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Timing Constraints for Edge-Triggered FFs
• Max(Tpd) < Tcycle - Tsetup - Tc2q - Tskew
  - If violated, delay is too long for data to be captured
• Min(Tpd) > Thold - Tc2q + Tskew
  - If violated, delay is too short and data can race through, skipping a state

[Figure: flip-flops separated by combinational logic, with Tcycle marked]

Courtesy K. Yang, UCLA
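The two inequalities on this slide can be turned into slack checks. This is a minimal sketch; the function names and the picosecond values are illustrative assumptions.

```python
def setup_slack(t_cycle, tpd_max, t_setup, t_c2q, t_skew):
    """Margin on Max(Tpd) < Tcycle - Tsetup - Tc2q - Tskew (positive = met)."""
    return (t_cycle - t_setup - t_c2q - t_skew) - tpd_max


def hold_slack(tpd_min, t_hold, t_c2q, t_skew):
    """Margin on Min(Tpd) > Thold - Tc2q + Tskew (positive = met)."""
    return tpd_min - (t_hold - t_c2q + t_skew)


# Hypothetical numbers in picoseconds
s = setup_slack(t_cycle=1000, tpd_max=700, t_setup=50, t_c2q=100, t_skew=50)
h = hold_slack(tpd_min=30, t_hold=40, t_c2q=100, t_skew=50)
```

A negative setup slack can be recovered by lengthening the cycle; a negative hold slack cannot, since Tcycle does not appear in the hold inequality.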

Page 15: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Example of Tpdmax Violation
• Suppose there is skew between the registers in a dataflow (regA after regB)
• “i” gets its input values from regA at the transition in Ck’
• CL output “o” arrives after the Ck transition due to skew
• To correct this problem, can increase cycle time

[Figure: regA (clocked by Ck’) drives CL input i; CL output o feeds regB (clocked by Ck); waveforms of Ck, Ck’, i, and o with Tskew and Tpdmax marked; o arrives too late]

Courtesy K. Yang, UCLA

Page 16: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Example of Tpdmin Violation: Race Through
• Suppose clock skew causes regA to be clocked before regB
• “i” passes through the CL with little delay (tpdmin)
• “o” arrives before the rising Ck’ causes the data to be latched
• Cannot be fixed by changing frequency, so you have a rock instead of a chip

[Figure: regA (clocked by Ck) drives CL input i; CL output o feeds regB (clocked by Ck’); waveforms of Ck, Ck’, i, and o with Tskew and Tpdmin marked; o arrives too early]

Courtesy K. Yang, UCLA

Page 17: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Summary: Timing Constraints
• Synchronous design = combinational logic + sequential elements
• For each flip-flop:
  - Tmax + Tsetup < Tcycle - Tskew
  - Tmin > Thold + Tskew
• Tmax: longest data propagation path delay
• Tmin: shortest data propagation path delay

[Figure: two FFs (D, Q) on a common CLK with combinational logic between them; CLK and DATA waveforms with Thold, Tsetup, and Tcycle marked]

Page 18: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Clock Identification
• Partition the design
  - Clock network
    - Clock definition
    - Derived clock
    - Clock groups
    - Clock delay (skew) calculation
    - Timing constraints exist between clocks with a common divisor frequency
  - Data paths with timing constraints

[Figure: two FFs with combinational logic between them, clocks CLK1, CLK2, CLK3, CLK4, and a /8 divider generating a derived clock]

Page 19: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Timing Graph
• Data paths with timing constraints
  - Starting from primary inputs/FF outputs
  - Ending at primary outputs/FF inputs
• Represented by a labeled directed graph G = <V,E>
  - Timing node V ~ pin/primary input/output
  - Timing edge E ~ gate/wire delay (timing arc ~ gate delay)

[Figure: example gate-level circuit with inputs A, B, C, internal nets X, Y, Z, U, V, and output F, together with its timing graph annotated with gate and wire delays]

Courtesy K. Keutzer et al. UCB

Page 20: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Characterization
• Static analysis = vector-less worst-case analysis
  - Graph-based path propagation
  - Logic function is not considered
• Pre-characterized look-up tables for gate delays
  - Min/max/rise/fall
• Characterized interconnect delays
  - On-the-fly delay calculation
  - SDF (Standard Delay Format) annotation

[Figure: gate with inputs X, Y and output Z, and its timing arcs with delay 2]

Page 21: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Compute Longest Path

Compute longest path in a DAG G = <V,E,delay,Origin>

// delay is the set of vertex and edge labels; Origin is the super-source of the DAG
// Final-delay(Origin) = delay(Origin); Final-delay is minus infinity elsewhere

Forward-prop(W) {
  for each vertex v in W
    for each edge <v,w> from v
      Final-delay(w) = max(Final-delay(w), Final-delay(v) + delay(<v,w>) + delay(w))
      if all incoming edges of w have been traversed, add w to W
}

Longest-path(G) {
  Forward-prop({Origin})
}

(Kirkpatrick 1966, IBM JRD)

Courtesy K. Keutzer et al. UCB

[Figure: example timing graph with inputs A, B, C, internal nodes X, Y, Z, U, V, and output F, annotated with gate and wire delays]
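The forward propagation above can be sketched in Python as follows. This is an illustrative version, not the course's code: the dictionary-based graph encoding and the tiny example graph are assumptions, and every node is assumed reachable from the super-source.

```python
from collections import defaultdict, deque


def longest_path(edges, node_delay, origin):
    """Return the latest arrival time at every node of a DAG.

    edges: dict mapping (v, w) -> edge (wire) delay
    node_delay: dict mapping node -> node (gate) delay, 0 if missing
    origin: super-source from which every node is reachable
    """
    successors = defaultdict(list)
    pending = defaultdict(int)  # incoming edges not yet traversed
    for (v, w), d in edges.items():
        successors[v].append((w, d))
        pending[w] += 1

    final = {origin: node_delay.get(origin, 0)}
    ready = deque([origin])
    while ready:
        v = ready.popleft()
        for w, d in successors[v]:
            candidate = final[v] + d + node_delay.get(w, 0)
            if candidate > final.get(w, float("-inf")):
                final[w] = candidate
            pending[w] -= 1
            if pending[w] == 0:  # all incoming edges traversed
                ready.append(w)
    return final


# Hypothetical graph: O is the super-source, f is the output
example_edges = {("O", "a"): 0, ("O", "b"): 0,
                 ("a", "x"): 1, ("b", "x"): 2, ("x", "f"): 1}
example_delays = {"x": 2, "f": 1}
arrival = longest_path(example_edges, example_delays, "O")
```

Each edge is visited once, so the run time is linear in the size of the graph.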

Page 22: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Compute Longest Path (continued)

Dynamic programming

How to exclude a set of paths?

Page 23: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Timing Analysis Terminology
• Actual arrival time (AAT): forward propagation
• Required arrival time (RAT): backward propagation
• Slack = RAT - AAT
  - A measure of how much timing margin exists at each node
  - Slack < 0: timing violation
  - Can optimize a particular branch
  - Can trade slack for power, area, robustness
• Critical path

[Figure: clocked circuit]
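A small sketch of how AAT, RAT, and slack relate, assuming the nodes are already given in topological order. The function name, the data layout, and the example numbers are illustrative assumptions.

```python
def compute_slack(order, edges, input_arrival, output_required):
    """Return slack = RAT - AAT per node.

    order: nodes in topological order
    edges: dict mapping (v, w) -> delay
    input_arrival: dict of AAT at the start points
    output_required: dict of RAT at the end points
    """
    aat = dict(input_arrival)
    for v in order:  # forward propagation: keep the latest arrival
        for (a, b), d in edges.items():
            if a == v and v in aat:
                aat[b] = max(aat.get(b, float("-inf")), aat[v] + d)

    rat = dict(output_required)
    for v in reversed(order):  # backward propagation: keep the earliest requirement
        for (a, b), d in edges.items():
            if b == v and v in rat:
                rat[a] = min(rat.get(a, float("inf")), rat[v] - d)

    return {v: rat[v] - aat[v] for v in order if v in aat and v in rat}


# Hypothetical example: two inputs converge on x, which drives output f
example_order = ["a", "b", "x", "f"]
example_edges = {("a", "x"): 2, ("b", "x"): 3, ("x", "f"): 1}
slack = compute_slack(example_order, example_edges, {"a": 0, "b": 0}, {"f": 5})
```

In this example the path through b has the smallest slack, so it is the critical path; a has spare slack that could be traded for power or area.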

Page 24: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Static Timing Analysis Flow
• Read in
  - Design (LEF/DEF)
  - Timing library (.lib)
  - Timing constraints (GCF)
  - Delay annotation (SDF)
• Set up constraints
  - Annotated delays
  - IO path constraints
  - Single cycle setup/hold checks
  - Timing exceptions
    - False paths
    - Multi-cycle paths
    - Max delay constraints
    - Min delay constraints
• Construct timing graph
  - Partition clock domain (form path groups)
  - Ideal/propagated clock
  - Case analysis
• AAT propagation
  - Levelization
• Timing report
  - End points with violations
  - Path enumeration

Page 25: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Timing Exceptions
• False paths: topologically connected but logically impossible to enable
• To enable a path
  - Logically: non-controlling values (e.g., 0 for OR gates, 1 for AND gates) at side inputs
  - Temporally: earlier signal transitions at side inputs

[Figure: clocked circuit]

Page 26: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

False Path Representation
• Abstracted graph
• set_false_path -from {…} -through {…} … -through {…} -to {…}

[Figure: abstracted graph with a "from" node, two "through" nodes, and a "to" node]

Page 27: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

False Path Identification
• Tagged timing analysis
  - Arrival times with the same tag are compared to find the worst case
  - False path filtered

[Figure: abstracted false-path graph (from, through, through, to) and a clocked circuit with nodes a, b, c, d; b carries tag 2 and c carries tag 3; example arrival/tag pairs: (arr 3, tag 3), (arr 2, tag 2), (arr 1, tag 0)]

Page 28: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Handling Latch-Based Designs
• Latch: level-enabled sequential element
  - Transparent signal propagation
• Time borrowing
  - Path delay of previous stage - Tborrow
  - Path delay of current stage + Tborrow

[Figure: combinational logic, latch (D, Q), combinational logic on CLK; CLK and DATA waveforms showing the transparent phase and Tborrow]

Page 29: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Counting Process Variation
• Off-chip variation: two paths on a chip cannot use two different operating conditions (i.e., corners) at the same time for setup or hold analysis
  - Setup: launchclock_latepath (max) + data_latepath (max) < captureclock_earlypath (max) + clock_period - setup
  - Hold: launchclock_earlypath (min) + data_earlypath (min) > captureclock_latepath (min) + hold
• On-chip variation: the software calculates the delay for one path based on the maximum operating condition while calculating the delay for another path based on the minimum operating condition for setup or hold checks
• Statistical static timing analysis (SSTA)
  - Continuous pdfs (probability density functions)
  - Or discrete corners

[Figure: pdf]
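The difference between the single-corner checks and on-chip variation can be illustrated with a short sketch. The assignment of corners under on-chip variation (capture clock at min for setup, at max for hold) is the usual convention and is an assumption here, as are all names and delay values.

```python
def setup_slack(launch_late, data_late, capture_early, period, setup):
    """Positive slack means the setup check passes."""
    return (capture_early + period - setup) - (launch_late + data_late)


def hold_slack(launch_early, data_early, capture_late, hold):
    """Positive slack means the hold check passes."""
    return (launch_early + data_early) - (capture_late + hold)


# Hypothetical path delays in picoseconds at the min and max corners
launch = {"min": 90, "max": 110}
data = {"min": 400, "max": 500}
capture = {"min": 95, "max": 115}
PERIOD, SETUP, HOLD = 600, 50, 30

# Single corner: every path uses the same operating condition
single_setup = setup_slack(launch["max"], data["max"], capture["max"], PERIOD, SETUP)
single_hold = hold_slack(launch["min"], data["min"], capture["min"], HOLD)

# On-chip variation: the competing paths use opposite conditions
ocv_setup = setup_slack(launch["max"], data["max"], capture["min"], PERIOD, SETUP)
ocv_hold = hold_slack(launch["min"], data["min"], capture["max"], HOLD)
```

Both on-chip variation slacks come out smaller than their single-corner counterparts, which is the added pessimism this mode is meant to capture.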

Page 30: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Clock Re-convergence Pessimism Removal
• Common part of two clock propagation paths cannot have two different path delays at the same time
• Need to compute clock propagation delay from the branch point

[Figure: two FFs with combinational logic between them; the launch and capture clock paths from CLK are labeled max and min and share a common part before the branch point]
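One common way to express this correction is as a credit equal to the max/min delay difference on the shared clock segment. The sketch below assumes that formulation; the function names and numbers are hypothetical.

```python
def crpr_credit(common_min, common_max):
    """Pessimism introduced by timing the shared clock segment at both corners."""
    return common_max - common_min


def slack_with_crpr(raw_slack, common_min, common_max):
    """Add the credit back to a slack computed with min/max clock paths."""
    return raw_slack + crpr_credit(common_min, common_max)


# Hypothetical: shared clock segment is 60 ps at min and 80 ps at max
corrected = slack_with_crpr(raw_slack=-5, common_min=60, common_max=80)
```

Here a path that appears to fail by 5 ps passes once the shared segment is counted with a single delay.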

Page 31: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Outline
• Timing Analysis
  - Timing Requirements
  - Static Timing Analysis
• Timing Correction

Page 32: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Timing Correction
• Driven by STA
  - “Incremental performance analysis backplane”
• Two goals
  - Fix logic design rule violations
  - Fix timing problems

DAC-2002, Physical Chip Implementation

Page 33: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Logic Design Rules
• Constraints of
  - Fanout
  - Slew rate
  - Load cap
• Reduce timing look-up table extrapolation error
• Control signal integrity
  - Transition degradation
  - Crosstalk noise
  - Supply voltage drop
  - Device reliability
• Approaches
  - Resizing
  - Buffering
  - Cloning (copying cells)

Page 34: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Timing Correction Approaches
• Re-synthesis
  - Local synthesis transforms
• Timing-driven placement
  - Critical net weighting
• Timing-driven routing
  - Net ordering
  - Buffering
  - Topology optimization
• Post-route optimization (IPO)
  - Re-routing
  - Re-timing and useful clock skew
  - Sizing
  - Buffering

DAC-2002, Physical Chip Implementation

Page 35: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Local Synthesis Transforms

Resize cells

Buffer or clone to reduce load on critical nets

Decompose large cells

Swap connections on commutative pins or among equivalent nets

Move critical signals forward

Pad early paths

Area recovery

DAC-2002, Physical Chip Implementation

Page 36: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Transform Example

[Figure: double inverter removal; path delay drops from 4 to 2]

DAC-2002, Physical Chip Implementation

Page 37: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Resizing

[Figure: delay d (0 to 0.05) versus load (0 to 1) for cell sizes A, B, C. A gate with inputs a, b drives sinks d, e, f with loads 0.2, 0.2, 0.3. Which size? With size A the delay is 0.035; with size C it is 0.026]

DAC-2002, Physical Chip Implementation

Page 38: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Cloning

[Figure: delay d (0 to 0.05) versus load (0 to 1) for cell sizes A, B, C. A gate with inputs a, b drives sinks d, e, f, g, h, each with load 0.2. After cloning, two copies of the gate (sizes A and B) share the sinks]

DAC-2002, Physical Chip Implementation

Page 39: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Buffering

[Figure: delay d (0 to 0.05) versus load (0 to 1) for cell sizes A, B, C. A gate with inputs a, b drives sinks d, e, f, g, h, each with load 0.2. After buffering, size-B buffers are inserted between the gate and some of the sinks]

DAC-2002, Physical Chip Implementation

Page 40: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Redesign Fan-in Tree

[Figure: fan-in tree of unit-delay gates with input arrival times Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0. Original tree: Arr(e)=6. Redesigned tree: Arr(e)=5]

DAC-2002, Physical Chip Implementation
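One simple way to rebuild such a fan-in tree is a Huffman-style greedy that always combines the two earliest-arriving signals. The sketch below assumes 2-input gates with a fixed delay; it is an illustration, not necessarily the method used in the original tutorial.

```python
import heapq


def fanin_tree_arrival(arrivals, gate_delay=1):
    """Output arrival time of a tree of 2-input gates built greedily.

    arrivals: non-empty list of input arrival times
    """
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)   # earliest signal
        second = heapq.heappop(heap)  # next earliest signal
        heapq.heappush(heap, max(first, second) + gate_delay)
    return heap[0]


# Arrival times from the slide: Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0
arr_e = fanin_tree_arrival([4, 3, 1, 0])
```

For the slide's arrival times this reproduces Arr(e)=5: c and d are combined first, then b, and the late signal a enters at the last gate.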

Page 41: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Redesign Fan-out Tree

[Figure: fan-out tree before and after redesign, with per-stage delays marked. Before: longest path = 5. After: longest path = 4, with slowdown of the buffer due to load]

DAC-2002, Physical Chip Implementation

Page 42: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Decomposition

DAC-2002, Physical Chip Implementation

Page 43: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Swap Commutative Pins

[Figure: gate with commutative input pins of different delays, driven by signals a, b, c with different arrival times; swapping the connections reduces the output arrival time from 5 to 3]

Simple sorting on arrival times and delay works

DAC-2002, Physical Chip Implementation
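The sorting idea can be sketched as follows: connect the latest-arriving signal to the fastest pin. The function name, pin names, and numbers are illustrative assumptions.

```python
def assign_pins(arrivals, pin_delays):
    """Pair signals with commutative pins to reduce the output arrival time.

    arrivals: dict signal -> arrival time
    pin_delays: dict pin -> pin-to-output delay (same number of entries)
    """
    signals = sorted(arrivals, key=arrivals.get, reverse=True)  # latest first
    pins = sorted(pin_delays, key=pin_delays.get)               # fastest first
    assignment = dict(zip(signals, pins))
    output_arrival = max(arrivals[s] + pin_delays[p] for s, p in assignment.items())
    return assignment, output_arrival


# Hypothetical 3-input gate: pin p3 is slower than p1 and p2
assignment, out = assign_pins({"a": 2, "b": 1, "c": 0},
                              {"p1": 1, "p2": 1, "p3": 2})
```

Here the earliest signal c absorbs the slow pin, so the output arrives at time 3; connecting a to p3 instead would give 4.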

Page 44: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Logic Restructuring 1
• Nodes in the critical section that fan out outside of the critical section are duplicated

[Figure: network with inputs a, b, c, d, e and outputs f, h, where some inputs are late signals; the critical section is collapsed into a single node for f, and the shared node feeding h is duplicated]

Slides courtesy of Keutzer

Page 45: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Logic Restructuring 2
• Place timing-critical nodes closer to output
  - Make them pass through fewer gates
• Collapse critical section, then re-extract factor k
• After collapse, a divisor is selected such that substituting k into f places critical signals c and d closer to output

[Figure: collapsed node f with inputs a, b, c, d, e; after re-extracting divisor k, inputs c and d are close to the output of f]

Slides courtesy of Keutzer

Page 46: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Summary of Local Synthesis Transforms
• Variety of methods for delay optimization
  - No single technique dominates
• The one with more tricks wins? No!
  - Technology dependent (for gate delay)
    - Differ with cell libraries
  - Methodology dependent (for wire delay)
    - Need to predict placement and routing result
    - Uncertainty!
• Pros: large potential improvement
• Cons: less predictable, more expensive

Page 47: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Summary of Local Synthesis Transforms
• Work smoothly in a physical synthesis flow
  - Tight integration with placement and routing
• Need a good framework for evaluating and processing different transforms
  - Accurate, fast timing engine with incremental analysis capability
    - Don’t want to re-analyze timing of the whole design for each local transform
  - Simultaneous min and max delay analysis
    - How does fixing the setup violation affect the existing hold checks?

Page 48: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Timing Correction Approaches
• Re-Synthesis
  - Local Transformation
• Timing-Driven Placement
• Timing-Driven Routing
• Post-Route Optimization (IPO)
  - Re-Routing
  - Re-Timing and Useful Clock Skew
  - Sizing
  - Buffering

Page 49: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Reducing Crosstalk Effect
• Shielding
  - Effective for short-range capacitive coupling
  - Not for long-range inductive coupling
• Net ordering (wire swizzling)

Page 50: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Reducing Crosstalk Effect
• Shielding
• Net ordering
• Gate sizing
  - A strong driver is less sensitive to crosstalk
  - But more likely to project crosstalk to its neighbors

Page 51: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Reducing Crosstalk Effect
• Shielding
• Net ordering
• Gate sizing
• Buffering
  - Partition interconnects
  - Mutual canceling

Page 52: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Timing Correction Approaches
• Re-Synthesis
  - Local Transformation
• Timing-Driven Placement
• Timing-Driven Routing
• Post-Route Optimization (IPO)
  - Re-Routing
  - Re-Timing and Useful Clock Skew
  - Sizing
  - Buffering

Page 53: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Re-Timing
• How would you meet the 10ns clock cycle time?

[Figure: three FFs (D, Q) on a common clock with combinational blocks of delay 6, 4, 2, 4, 4 between them; Cycle = 10]

Page 54: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Re-Timing
• Re-order sequential elements and combinational logic
• Did you see a problem here?

[Figure: the circuit before and after re-timing; in both, three FFs (D, Q) on a common clock with combinational blocks of delay 6, 4, 2, 4, 4; Cycle = 10]

Page 55: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Re-Timing
• Re-order sequential elements and combinational logic
• Need to predict placement and routing

[Figure: the circuit before and after re-timing; in both, three FFs (D, Q) on a common clock with combinational blocks of delay 6, 4, 2, 4, 4; Cycle = 10]

Page 56: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Useful Clock Skew
• Equivalent to re-timing
• Clock tree re-construction
  - Insert delay cells
  - Snaking
  - Add dummy capacitive load

[Figure: three FFs (D, Q) with combinational blocks of delay 6, 4, 2, 4, 4; Cycle = 10; one FF clock is delayed by +2]

Page 57: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Timing Correction Approaches
• Re-Synthesis
  - Local Transformation
• Timing-Driven Placement
• Timing-Driven Routing
• Post-Route Optimization (IPO)
  - Re-Routing
  - Re-Timing and Useful Clock Skew
  - Sizing
  - Buffering

Page 58: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Driving Large Capacitances: Inverter As Buffer

[Figure: inverter of size 1 (area A, input In, input capacitance Cin) driving a buffer of size U (area U*A), which drives the load CL = X * Cin]

• Total propagation delay = tp(inv) + tp(buffer)
• tp0 = delay of min-size inverter with single min-size inverter as fanout load
• Minimize tp = U * tp0 + X/U * tp0
  - Uopt = sqrt(X); tp,opt = 2 tp0 * sqrt(X)
• Use only if combined delay is less than unbuffered case

Slide courtesy of Mary Jane Irwin, PSU
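A quick numeric sketch of the sizing rule above. The unbuffered delay is taken as X * tp0 under the same linear delay model, which is an assumption; the function names are illustrative.

```python
import math


def buffered_delay(u, x, tp0=1.0):
    """tp = U * tp0 + (X / U) * tp0 for a buffer of relative size U."""
    return (u + x / u) * tp0


def optimal_buffer(x, tp0=1.0):
    """Return (Uopt, tp_opt) with Uopt = sqrt(X)."""
    u_opt = math.sqrt(x)
    return u_opt, buffered_delay(u_opt, x, tp0)


def buffer_helps(x, tp0=1.0):
    """Compare against the assumed unbuffered delay X * tp0."""
    return optimal_buffer(x, tp0)[1] < x * tp0


u_opt, tp_opt = optimal_buffer(100)
```

Setting d(tp)/dU = tp0 * (1 - X/U^2) = 0 gives Uopt = sqrt(X). Under this model the buffer only pays off when 2 * sqrt(X) < X, that is, for X > 4.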

Page 59: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Delay Reduction With Cascaded Buffers

[Figure: chain of N buffers from in to out, with sizes 1, u, u^2, ..., u^(N-1) and node capacitances Cin, C1, C2, ..., CL, where CL = x * Cin = u^N * Cin]

• Cascade of buffers with increasing sizes (u = tapering factor) can reduce delay
• If load is driven by a large transistor (which is driven by a smaller transistor) then its turn-on time dominates overall delay
• Each buffer charges the input capacitance of the next buffer in the chain and speeds up charging, reducing total delay
• Cascaded buffers are useful when Rint < Rtr

Slide courtesy of Mary Jane Irwin, PSU

Page 60: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

tp as Function of U and X

[Figure: u/ln(u) scaled delay (y-axis, 0 to 60) versus tapering factor u (x-axis, 1.0 to 7.0), with one curve each for x = 10, 100, 1000, 10,000]

• Total line delay as function of driver size, load capacitance
• Question: Derive the optimum (min-delay) value of U.

Slide courtesy of Mary Jane Irwin, PSU
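A numeric sanity check for the question above, under a simplified model that is an assumption here: N = ln(x)/ln(u) stages, each with delay u * tp0, and no self-loading. The total delay is then tp0 * ln(x) * u/ln(u), which is minimized at u = e regardless of x.

```python
import math


def cascade_delay(u, x, tp0=1.0):
    """Delay of a tapered chain: N = ln(x)/ln(u) stages of delay u * tp0 each."""
    n_stages = math.log(x) / math.log(u)
    return n_stages * u * tp0


def best_taper(x, candidates):
    """Pick the candidate tapering factor with the smallest delay."""
    return min(candidates, key=lambda u: cascade_delay(u, x))


best = best_taper(1000.0, [2.0, math.e, 3.0, 4.0])
```

The curve is flat near the minimum, so tapering factors of 3 to 4 cost little extra delay while using fewer stages.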

Page 61: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Reducing RC Delay With Repeaters
• RC delay is quadratic in length, so must reduce length
  - Observation: 2^2 = 4 and 1 + 1 = 2, but 1^2 + 1^2 = 2

[Figure: driver and receiver connected by a line of length L = 2 units, without and with a repeater at the midpoint]

• Repeater = strong driver (usually inverter or pair of inverters for non-inversion) that is placed along a long RC line to “break up” the line and reduce delay
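The observation can be checked with a toy model in which a wire segment of length l has delay proportional to l^2 and each repeater adds a fixed delay. Both the model and the parameter names are simplifying assumptions.

```python
def wire_delay(length, segments, rc_per_unit_sq=1.0, repeater_delay=0.0):
    """Delay of a wire split into equal segments by (segments - 1) repeaters."""
    seg_length = length / segments
    wire_part = segments * rc_per_unit_sq * seg_length ** 2
    repeater_part = (segments - 1) * repeater_delay
    return wire_part + repeater_part


unbroken = wire_delay(2, segments=1)   # 2^2
one_repeater = wire_delay(2, segments=2)  # 1^2 + 1^2
```

With a non-zero repeater delay the gain shrinks, so there is an optimal number of repeaters rather than "as many as possible".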

Page 62: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Repeaters vs. Cascaded Buffers
• Repeaters are used to drive long RC lines
  - Breaking up the quadratic dependence of delay on line length is the goal
  - Typically sized identically
• Cascaded buffers are used to drive large capacitive loads, where there is no parasitic resistance
  - We put all buffers at the beginning of the load
  - This would be pointless for a long RC wire since the wire RC delay would be unaffected and would dominate the total delay
• Optimum buffering for a uniform long interconnect
  - Cascaded buffers at source and sink
  - Identically sized and spaced repeaters in between

Page 63: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Buffering a Tree for Timing Optimization
• Van Ginneken’s dynamic programming
  - Bottom-up traversal
  - Evaluate each sub-tree by a triple <delay, cap, cost>
  - Filter out sub-optimal solutions
• Limitations
  - Buffer insertion locations (explored by edge segmenting)
  - Buffer insertion constraints (e.g., legal buffer locations)
  - Routing detour
  - Delay calculation accuracy (wire delay, slew rate, etc.)

[Figure: routing tree whose sub-trees are each labeled with a <delay, cap, cost> triple]
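The "filter out sub-optimal solutions" step can be sketched as a dominance check on the triples. This covers only the pruning step, not the full dynamic program, and it assumes that smaller is better in all three components.

```python
def prune(candidates):
    """Keep only <delay, cap, cost> triples not dominated by another triple.

    A triple dominates another if it is no worse in every component.
    """
    kept = []
    # In lexicographic order, any dominating triple comes before the one it dominates
    for cand in sorted(set(candidates)):
        dominated = any(all(k[i] <= cand[i] for i in range(3)) for k in kept)
        if not dominated:
            kept.append(cand)
    return kept


# Hypothetical candidate solutions for one sub-tree
survivors = prune([(5, 2, 1), (4, 3, 1), (6, 2, 1), (5, 2, 2), (4, 3, 1)])
```

Pruning at every node keeps the candidate lists short, which is what makes the bottom-up traversal tractable.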

Page 64: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Buffering a Tree for Load Cap Constraints
• Greedy for a single line
  - Bottom-up traversal
  - Insert a buffer when load cap reaches limit
• Greedy for a fixed routing tree
  - Bottom-up traversal
  - For each edge, greedy insertion
  - For each node, buffer the branch with the largest cap
• NP-hard for simultaneous buffering and routing construction

[Figure: node with four branches of capacitance C1, C2, C3, C4, where C1 < U, C2 < U, C3 < U, C4 < U, and C1 + C2 + C3 + C4 > U]
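The single-line greedy can be sketched as a walk from the sink towards the driver. The segment model, the parameter names, and the example values are assumptions; each segment plus one buffer input is assumed to fit within the limit.

```python
def greedy_buffer_line(segment_caps, sink_cap, buffer_in_cap, cap_limit):
    """Insert buffers on a line so that no stage drives more than cap_limit.

    segment_caps: wire capacitance of each segment, listed from sink to driver
    Returns (indices of segments with a buffer at their sink-side end,
             load seen by the driver).
    """
    load = sink_cap
    buffer_positions = []
    for index, seg_cap in enumerate(segment_caps):
        if load + seg_cap > cap_limit:
            # Adding this segment would exceed the limit: buffer before it
            buffer_positions.append(index)
            load = buffer_in_cap
        load += seg_cap
    return buffer_positions, load


positions, driver_load = greedy_buffer_line([3, 3, 3, 3], sink_cap=2,
                                            buffer_in_cap=1, cap_limit=8)
```

Inserting as late as possible on the way up keeps the buffer count low for a single line; on a tree the same idea needs the extra per-node rule listed on the slide.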

Page 65: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Timing-Driven Routing Tree Construction
• Minimum wirelength (Steiner Minimum Tree)
  - Given a set of terminals S
  - Find an additional set of points A such that a spanning tree T over S ∪ A has minimum wirelength
• May not be timing optimum
  - Some sinks are more timing critical than others
  - Some sinks have larger capacitive load
  - Buffers?

[Figure: terminal set S and routing tree T]

Page 66: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Timing-Driven Routing Tree Construction
• Minimum wirelength (Steiner Minimum Tree)
• Shortest Path Tree
• AHHK Tree
  - Cost(q) = k * path_length(p) + edge_length(p, q)
  - k = 0: minimum wirelength
  - k = 1: shortest path
• Heuristics with sink timing criticality weights

[Figure: terminal set S and routing tree T]
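The AHHK cost function can be sketched as a Prim-style growth from the source. This version builds a spanning tree over the terminals only (no Steiner points) with Manhattan distances; the function name and the coordinates are illustrative assumptions.

```python
def ahhk_tree(source, sinks, k):
    """Grow a tree from source, adding the sink q that minimizes
    k * path_length(p) + edge_length(p, q) over tree nodes p.

    Points are (x, y) tuples. Returns (parent map, source-to-node path lengths).
    """
    def dist(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    path_length = {source: 0}
    parent = {}
    remaining = set(sinks)
    while remaining:
        _, p, q = min(
            (k * path_length[p] + dist(p, q), p, q)
            for p in path_length
            for q in remaining
        )
        parent[q] = p
        path_length[q] = path_length[p] + dist(p, q)
        remaining.remove(q)
    return parent, path_length


src = (0, 0)
terminals = [(3, 0), (2, 2)]
mst_parent, mst_len = ahhk_tree(src, terminals, k=0)   # Prim: minimum wirelength
spt_parent, spt_len = ahhk_tree(src, terminals, k=1)   # Dijkstra: shortest paths
```

In this example k = 0 reaches (2, 2) through (3, 0) with total wirelength 6 but path length 6, while k = 1 connects it directly with path length 4 at the cost of wirelength 7. Intermediate k values trade between the two.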

Page 67: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Timing-Driven Routing Tree Construction
• Simultaneous routing tree construction and buffer insertion
  - Buffer station (legal buffer locations)
  - Routing blockage
• Dynamic programming
  - P-Tree
• Clustering (C-Tree)
  - Timing criticality
  - Geometric distance
  - Signal polarity
  - Try AHHK with different k

Page 68: ECE260B – CSE241A Winter 2005 Timing Analysis and Correction

Timing-Driven Routing Tree Topology Optimization
• Chicken-egg dilemma (delay vs. routing)
• Iterative greedy improvement (Q-Tree)

[Figure: terminal set S and routing tree T, annotated with delta Elmore delay and buffer location]