44
CSE241 L4 System.1 Kahng & Cichy, UCSD ©2003 CSE241A VLSI Digital Circuits Winter 2003 Lecture 04: Static Timing Analysis

CSE241 L4 System.1Kahng & Cichy, UCSD ©2003 CSE241A VLSI Digital Circuits Winter 2003 Lecture 04: Static Timing Analysis

  • View
    228

  • Download
    0

Embed Size (px)

Citation preview

CSE241 L4 System.1 Kahng & Cichy, UCSD ©2003

CSE241AVLSI Digital Circuits

Winter 2003

Lecture 04: Static Timing Analysis

CSE241 L4 System.2 Kahng & Cichy, UCSD ©2003

All emails and announcements are archived on the class web page

Remember: tomorrow (Friday, January 17) noon-1:20pm (EBUI 3327) prepares you for the synthesis lecture on Tuesday (see syllabus on web for expected content)

HW questions 4-6 due todayLab #1 report due Monday; turn-in instructions and

script announced last nightThis class: introduction to static timing analysis

Goals:How does STA work?What are use models for STA?(What kinds of things are done with STA results?)

Logistics

CSE241 L4 System.3 Kahng & Cichy, UCSD ©2003

Determine fastest permissible clock speed (e.g. 100MHz) by determining delay (including set-up and hold time) of longest path from register to register (e.g. 10ns)

Largely eliminates need for gate-level simulation to verify delay of the circuit

Goal of Static Timing Verification

clk

Combinationallogic

clk

Combinationallogic

clk

Combinationallogic

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.4 Kahng & Cichy, UCSD ©2003

Cycle Time - Critical Path Delay

Cycle time (T) cannot be smaller than longest path delay (Tmax)

Longest (critical) path delay is a function of:

Total gate, wire delays logic levels

clock

Q1 Q2

Tclock1 Tclock2

critical path, ~5 logic levels

Tclock1

data

cycle time

maxT T

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.5 Kahng & Cichy, UCSD ©2003

Cycle Time - Setup Time

For FFs to correctly capture data, must be stable for:

Setup time (Tsetup) before clock arrives

clock

Q1 Q2

Tclock1 Tclock2

critical path, ~5 logic levels

Tclock1

data

setup time

max setupT T T

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.6 Kahng & Cichy, UCSD ©2003

Cycle Time – Clock Skew

clock

Q1 Q2

Tclock1 Tclock2

Tclock1

Tclock2

Q2

data

clock skewQ2

6

If clock network has unbalanced delay – clock skew

Cycle time is also a function of clock skew (Tskew) max setup skewT T T T

critical path, ~5 logic levels

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.7 Kahng & Cichy, UCSD ©2003

Cycle Time – Flip-Flop Delay (Clock to Q)

Cycle time is also a function of propagation delay of FF (Tclk-to-Q)

Tclk-to-Q : time from arrival of clock signal till change at FF output)

clock

Q1 Q2

Tclock1 Tclock2

Tclock1

Tclock2

Q2clock-to-Q

data

Q2

max setup skew clk to QT T T T T

critical path, ~5 logic levels

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.8 Kahng & Cichy, UCSD ©2003

Min Path Delay - Hold Time

For FFs to correctly latch data, data must be stable during:

Hold time (Thold) after clock arrives

Determined by delay of shortest path in circuit (Tmin) and clock skew (Tskew)

clock

Q1 Q2

Tclock1 Tclock2

short path, ~3 logic levels

Tclock1

data

hold time

min hold skewT T T

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.9 Kahng & Cichy, UCSD ©2003

Setup, Hold, Cycle Times

set-up time – D stablebefore clock

cycle time

Example of a single phase clock

hold time –D stable after clock

When signalmay change

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.10 Kahng & Cichy, UCSD ©2003

Approach – Reduce to Combinational

clk

Combinationallogic

clk

Combinationallogic

clk

Combinationallogic

original circuit

extracted block

Combinationallogic

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.11 Kahng & Cichy, UCSD ©2003

Combinational Block

Arrival time in green A

C

B

f

2

2

2

1

0

1

0

.20

.20

.20

.10

X

YZ

W

.15

.05

.05

.05

Interconnect delay in red

Gate delay in blue

What’s the right mathematical object to use to represent this physical object?

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.12 Kahng & Cichy, UCSD ©2003

Problem Formulation - 1

C

B

f

X

Y

W

0

.05.1

1

.2

0

0

1

A

.15.20

.20

A

C

B

f

2

2

2

1

0

1

0

.20

.20

.20

.10

X

YZ

W

.15

.05

.05

.05

12

2

2

Z

Use a labeled directed graph

G = <V,E>

Vertices represent gates, primary inputs and primary outputs

Edges represent wires

Labels represent delays

HW #7: In this graph model, both nodes and edges have delays. How would you modify this model so that only edges have delay, yet the same timing analysis remains possible?

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.13 Kahng & Cichy, UCSD ©2003

Problem Formulation – Actual Arrival Time

Actual arrival time A(v) for a node v is latest time that signal can arrive at node v

uu

A( ) max (A(u) d )

FI( )

X

Y

A(Z)

Z

x zd

Y zd

A(X)

A(Y)

where d is delay from to andu u, {X,Y}, {Z}. FI(υ) = =

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.14 Kahng & Cichy, UCSD ©2003

Problem Formulation - 2

C

B

f

X

Y

W

0

.5.1

1

.2

0

0

1

A

.15.20

.20

A

C

B

f

2

2

2

1

0

1

0

.20

.20

.20

.10

X

YZ

W

.15

.5

.05

.05

12

2

2

Z

Use a labeled directed graph

G = <V,E>

Enumerate all paths - choose the longest?

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.15 Kahng & Cichy, UCSD ©2003

Problem Formulation - 3

C

B

f

X

Y

W

0

.5.1

1

.2

0

0

.1

A

.15.20

.20

12

2

2

Z

Compute longest path in a graph G = <V,E,delay,Origin>

// delay is set of labels, Origin is the super-source of the DAG

Forward-prop(W){

for each vertex v in W

for each edge <v,w> from v

Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))

if all incoming edges of w have been traversed, add w to W

}

Longest path(G){

Forward_prop(Origin) }

0

0

0

0

0

Origin

(Kirkpatrick 1966, IBM JRD)

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.16 Kahng & Cichy, UCSD ©2003

Algorithm Execution

C

B

f

X

Y

W

0

.5.1

1

.2

0

0

.1

A

.15.20

.20

12

2

2

Z

0

0

0

0

0

Origin

0

Compute longest path in a graph G = <V,E,delay,Origin>

// delay is set of labels, Origin is the super-source of the DAG

Forward-prop(W){

for each vertex v in W

for each edge <v,w> from v

Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))

if all incoming edges of w have been traversed, add w to W

}

Longest path(G){

Forward_prop(Origin) }

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.17 Kahng & Cichy, UCSD ©2003

Algorithm Execution

C

B

f

X

Y

W

0

.5.1

1

.2

0

0

.1

A

.15.20

.20

12

2

2

Z

0

0

000

Origin 0

0

Compute longest path in a graph G = <V,E,delay,Origin>

// delay is set of labels, Origin is the super-source of the DAG

Forward-prop(W){

for each vertex v in W

for each edge <v,w> from v

Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))

if all incoming edges of w have been traversed, add w to W

}

Longest path(G){

Forward_prop(Origin) }

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.18 Kahng & Cichy, UCSD ©2003

Algorithm Execution

C

B

f

X

Y

W

0

.5.1

1

.2

0

0

.1

A

.15.20

.20

12

2

2

Z

0

0

0

.1

0

Origin

0

0

0

Compute longest path in a graph G = <V,E,delay,Origin>

// delay is set of labels, Origin is the super-source of the DAG

Forward-prop(W){

for each vertex v in W

for each edge <v,w> from v

Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))

if all incoming edges of w have been traversed, add w to W

}

Longest path(G){

Forward_prop(Origin) }

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.19 Kahng & Cichy, UCSD ©2003

Algorithm Execution

C

B

f

X

Y

W

0

.5.1

1

.2

0

0

.1

A

.15.20

.20

1

2

2

2

Z

0

0

0

.1

0

Origin

0

0

0

2.0

Compute longest path in a graph G = <V,E,delay,Origin>

// delay is set of labels, Origin is the super-source of the DAG

Forward-prop(W){

for each vertex v in W

for each edge <v,w> from v

Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))

if all incoming edges of w have been traversed, add w to W

}

Longest path(G){

Forward_prop(Origin) }

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.20 Kahng & Cichy, UCSD ©2003

Algorithm Execution

C

B

f

X

Y

W

0

.5.1

1

.2

0

0

.1

A

.15.20

.20

12

2

2

Z

0

0

0

.1

0

Origin

1.1

Compute longest path in a graph G = <V,E,delay,Origin>

// delay is set of labels, Origin is the super-source of the DAG

Forward-prop(W){

for each vertex v in W

for each edge <v,w> from v

Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))

if all incoming edges of w have been traversed, add w to W

}

Longest path(G){

Forward_prop(Origin) }

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.21 Kahng & Cichy, UCSD ©2003

Algorithm Execution

C

B

f

X

Y

W

0

.5.1

1

.2

0

0

.1

A

.15.20

.20

12

2

2

Z

0

0

0

.1

0

Origin

1.1

2.2

Compute longest path in a graph G = <V,E,delay,Origin>

// delay is set of labels, Origin is the super-source of the DAG

Forward-prop(W){

for each vertex v in W

for each edge <v,w> from v

Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))

if all incoming edges of w have been traversed, add w to W

}

Longest path(G){

Forward_prop(Origin) }

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.22 Kahng & Cichy, UCSD ©2003

Algorithm Execution

C

B

f

X

Y

W

0

.5.1

1

.2

0

0

.1

A

.15.20

.20

12

2

Z

0

0

0

.1

0

Origin

1.1

2

3

Compute longest path in a graph G = <V,E,delay,Origin>

// delay is set of labels, Origin is the super-source of the DAG

Forward-prop(W){

for each vertex v in W

for each edge <v,w> from v

Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))

if all incoming edges of w have been traversed, add w to W

}

Longest path(G){

Forward_prop(Origin) }

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.23 Kahng & Cichy, UCSD ©2003

Algorithm Execution

C

B

f

X

Y

W

0

.5.1

1

.2

0

0

1

A

.15.20

.20

1

23.6

2

Z

0

0

0

0

0

Origin

1.1

2

3

Compute longest path in a graph G = <V,E,delay,Origin>

// delay is set of labels, Origin is the super-source of the DAG

Forward-prop(W){

for each vertex v in W

for each edge <v,w> from v

Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))

if all incoming edges of w have been traversed, add w to W

}

Longest path(G){

Forward_prop(Origin) }

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.24 Kahng & Cichy, UCSD ©2003

Algorithm Execution

C

B

f

X

Y

W

0

.5.1

1

.2

0

0

1

A

.15.20

.20

1

23.6

2

Z

0

0

0

0

0

Origin

1.1

2

5.8

3

Compute longest path in a graph G = <V,E,delay,Origin>

// delay is set of labels, Origin is the super-source of the DAG

Forward-prop(W){

for each vertex v in W

for each edge <v,w> from v

Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))

if all incoming edges of w have been traversed, add w to W

}

Longest path(G){

Forward_prop(Origin) }

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.25 Kahng & Cichy, UCSD ©2003

Algorithm Execution

C

B

f

X

Y

W

0

.5.1

1

.2

0

0

1

A

.15.20

.20

1

23.6

2

Z

0

0

0

0

0

Origin

1.1

2

5.8

3

Compute longest path in a graph G = <V,E,delay,Origin>

// delay is set of labels, Origin is the super-source of the DAG

Forward-prop(W){

for each vertex v in W

for each edge <v,w> from v

Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))

if all incoming edges of w have been traversed, add w to W

}

Longest path(G){

Forward_prop(Origin) }

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.26 Kahng & Cichy, UCSD ©2003

Algorithm Execution

C

B

f

X

Y

W

0

.5.1

1

.2

0

0

1

A

.15.20

.20

1

23.6

2

Z0

0

0

0

0

Origin

1.1

2

5.8

3

5.95

Compute longest path in a graph G = <V,E,delay,Origin>

// delay is set of labels, Origin is the super-source of the DAG

Forward-prop(W){

for each vertex v in W

for each edge <v,w> from v

Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))

if all incoming edges of w have been traversed, add w to W

}

Longest path(G){

Forward_prop(Origin) }

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.27 Kahng & Cichy, UCSD ©2003

Critical Path (sub-graph)

C

B

f

X

Y

W

0

.5.1

1

.2

0

0

1

A

.15.20

.20

1

23.6

2

Z0

0

0

0

0

Origin

1.1

2

5.8

3

5.95

Compute longest path in a graph G = <V,E,delay,Origin>

// delay is set of labels, Origin is the super-source of the DAG

Forward-prop(W){

for each vertex v in W

for each edge <v,w> from v

Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))

if all incoming edges of w have been traversed, add w to W

}

Longest path(G){

Forward_prop(Origin) }

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.28 Kahng & Cichy, UCSD ©2003

Timing for Optimization: Extra Requirements

Longest-path algorithm computes arrival times at each node

If we have constraints, need to propagate slack to each node

A measure of how much timing margin exists at each node Can optimize a particular branch

Can trade slack for power, area, robustness

clockCourtesy K. Keutzer et al. UCB

CSE241 L4 System.29 Kahng & Cichy, UCSD ©2003

Required Arrival Time

Required arrival time R(v) is the time before which a signal must arrive to avoid a timing violation

Then recursively

X

Y

R(Z)Z

Yxd

X zd R(X)

R(Y)

uu

R( ) min (R(u) d )

FO( )

where {Y,Z} and {X} FO( ) = =

Required time is user defined at output:

setupR(v) = T - T

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.30 Kahng & Cichy, UCSD ©2003

Required Time Propagation: Example

C

B

f

X

Y

W

0

.5.1

1

.2

0

0

1

A

.15.20

.20

1

23.6

2

Z

Assume required time at output R(f) = 5.80

Propagate required times backwards

0

O

0

0

0

Origin

1.1

2

5.8

3

5.95

5.65

3.45

3.45

0.95

0.45

-0.15

1.45

5.80

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.31 Kahng & Cichy, UCSD ©2003

Timing Slack From arrival and required time can compute slack. For

each node v:

Slack reflects criticality of a node

Positive slack Node is not on critical path. Timing constraints met.

Zero slack Node is on critical path. Timing constraints are barely met.

Negative slack There is a timing violation

Slack distribution is key for timing optimization!

S( ) R( ) A( )

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.32 Kahng & Cichy, UCSD ©2003

Timing Slack Computation: Example

Compute slack at each node

C

B

f

X

Y

W

A

ZOrigin

-0.15

-0.15

0.45

-0.15

0.45

-0.15

1.45

-0.15

S( ) R( ) A( )

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.33 Kahng & Cichy, UCSD ©2003

clk

Combinationallogic

clk

Combinationallogic

clk

Combinationallogic

Computing longest path delay• Full path enumeration - potentially exponential• Longest path algorithm on DAG (Kirkpatrick 1966, IBM JRD)

(O(v+e) or O(g + p))Currently applied to even the largest (>10M gate) circuitsTwo challenges

Asynchronous subcircuitsFalse paths

Approach of Static Timing Verification

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.34 Kahng & Cichy, UCSD ©2003

Enhancements of STA

Incremental timing analysis

Nanometer-scale process effects – variation ( probabilistic timing analysis)

Interference – crosstalk

Multiple inputs switching

Conservatism of delay propagation

HW #8: Suppose you change the size of one (combinational) gate in your design, thus invalidating the previous timing analysis. How much work must be done to regain a correct timing analysis?

Courtesy K. Keutzer et al. UCB

CSE241 L4 System.35 Kahng & Cichy, UCSD ©2003

Timing Correction

Driven by STA “Incremental performance analysis backplane”

Fix electrical violations Resize cells Buffer nets Copy (clone) cells

Fix timing problems Local transforms (bag of tricks) Path-based transforms

DAC-2002, Physical Chip Implementation

CSE241 L4 System.36 Kahng & Cichy, UCSD ©2003

Local Synthesis Transforms

Resize cells

Buffer or clone to reduce load on critical nets

Decompose large cells

Swap connections on commutative pins or among equivalent nets

Move critical signals forward

Pad early paths

Area recovery

DAC-2002, Physical Chip Implementation

CSE241 L4 System.37 Kahng & Cichy, UCSD ©2003

Transform Example

Delay = 4

…..

Double Inverter

Removal

…..

…..

Delay = 2

DAC-2002, Physical Chip Implementation

CSE241 L4 System.38 Kahng & Cichy, UCSD ©2003

Resizing

00.010.020.030.040.05

0 0.2 0.4 0.6 0.8 1load

d

A B C

b

ad

e

f

0.2

0.2

0.3

?

b

aA

0.035

b

aC

0.026

DAC-2002, Physical Chip Implementation

CSE241 L4 System.39 Kahng & Cichy, UCSD ©2003

Cloning

00.010.020.030.040.05

0 0.2 0.4 0.6 0.8 1load

d

A B C

b

a

d

e

f

g

h

0.2

0.2

0.2

0.2

0.2

?

b

a

d

e

f

g

h

A

B

DAC-2002, Physical Chip Implementation

CSE241 L4 System.40 Kahng & Cichy, UCSD ©2003

Buffering

00.010.020.030.040.05

0 0.2 0.4 0.6 0.8 1load

d

A B C

b

a

d

e

f

g

h

0.2

0.2

0.2

0.2

0.2

? b

a

d

e

f

g

h0.1

0.2

0.2

0.2

0.2

B

B

0.2

DAC-2002, Physical Chip Implementation

CSE241 L4 System.41 Kahng & Cichy, UCSD ©2003

Redesign Fan-in Tree

a

c

d

b eArr(b)=3

Arr(c)=1

Arr(d)=0

Arr(a)=4

Arr(e)=61

1

1

c

d

e

Arr(e)=5

1

1b1

a

DAC-2002, Physical Chip Implementation

CSE241 L4 System.42 Kahng & Cichy, UCSD ©2003

Redesign Fan-out Tree

1

1

1

3

1

1

1

Longest Path = 5

1

1

1

3

1

2

Longest Path = 4Slowdown of buffer due to load

DAC-2002, Physical Chip Implementation

CSE241 L4 System.43 Kahng & Cichy, UCSD ©2003

Decomposition

DAC-2002, Physical Chip Implementation

CSE241 L4 System.44 Kahng & Cichy, UCSD ©2003

Swap Commutative Pins

2

c

ab

2

1

0 1

1

1

3

a

cb

2

1

0

1

1

2

1 5

Simple sorting on arrival times and delay works

DAC-2002, Physical Chip Implementation