Day 7: Clock Tree Synthesis

Session Speaker: Ajaya Kumar S.

© M.S.Ramaiah School Of Advanced Studies

PEMP VSD531

Session objective

After completing this session, students will be able to:

• Explain general clock tree concepts
• Explain the impact of clock skew
• Identify the types of clock skew
• Describe CTS in the design flow and its basic steps
• Describe CTS in a real P&R flow
• Set up the design for clock tree synthesis
• Perform clock tree synthesis
• Perform post-CTS optimizations
• Analyze timing and clock specifications post-CTS


Session Topics

• Clock Tree Synthesis (CTS) goals

• Clock tree attributes

• Clock Distribution schemes

• Clock Skew

• Clock Tree Optimization Techniques

• Effect of clock tree synthesis

• Identify settings of key timing parameters for pre-CTS and post-CTS stages


Design Status, Start of CTS Phase

• Placement: completed
• Power and ground nets: prerouted
• Estimated congestion: acceptable
• Estimated timing: acceptable (~0 ns slack)
• Estimated max cap/transition: no violations
• High-fanout nets:
  – Reset and Scan Enable are synthesized with buffers
  – Clocks are still not buffered

Why are there no buffers on clock nets?

Before CTS

[Figure: an array of flip-flops (FF) whose clock pins all connect directly to the clock source, Clock.]

All clock pins are driven by a single clock source.


CTS Goals

Meet logical Design Rule Constraints (DRC):
• Maximum transition delay
• Maximum load capacitance
• Maximum fanout
• Maximum buffer levels

Meet the clock tree targets:
• Maximum skew
• Min/Max insertion delay

Constraints are upper-bound goals. If constraints are not met, violations will be reported.

Targets are "nice to have" goals. If targets are not met, no violations will be reported.
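The difference between constraints and targets can be sketched in a few lines of Python. This is only an illustration: the `ClockTreeReport` fields, the limit names, and all numbers are hypothetical and do not come from any tool report.

```python
from dataclasses import dataclass


@dataclass
class ClockTreeReport:
    # Hypothetical summary of one synthesized clock tree.
    max_transition: float   # worst transition time on the tree (ns)
    max_capacitance: float  # worst load capacitance on the tree (pF)
    skew: float             # largest difference between sink arrivals (ns)
    insertion_delay: float  # longest source-to-sink delay (ns)


def check_cts_goals(report, drc_limits, targets):
    """Return (violations, missed_targets) for one clock tree.

    DRC limits are upper bounds: exceeding one is a reported violation.
    Targets are "nice to have": missing one is noted but is not a violation.
    """
    violations = [name for name, limit in drc_limits.items()
                  if getattr(report, name) > limit]
    missed = []
    if report.skew > targets["max_skew"]:
        missed.append("max_skew")
    low, high = targets["insertion_delay_range"]
    if not low <= report.insertion_delay <= high:
        missed.append("insertion_delay_range")
    return violations, missed


# Hypothetical numbers: all DRCs pass, but the skew target is missed.
report = ClockTreeReport(max_transition=0.45, max_capacitance=0.08,
                         skew=0.12, insertion_delay=0.9)
violations, missed = check_cts_goals(
    report,
    drc_limits={"max_transition": 0.5, "max_capacitance": 0.1},
    targets={"max_skew": 0.1, "insertion_delay_range": (0.5, 1.0)},
)
```

Here the tree would report no violations even though the skew target was not reached, which is the behavior the slide describes.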


Clock Tree Synthesis (CTS) (1/2)

A buffer tree is built to balance the loads and minimize the skew.

[Figure: the clock source, Clock, drives the array of flip-flops (FF) through a tree of buffers.]


Clock Tree Synthesis (CTS) (2/2)

A "delay line" is added to meet the minimum insertion delay.

[Figure: the buffered clock tree driving the array of flip-flops (FF), with an added delay line.]

Where does the Clock Tree Begin and End?

The clock tree begins at the SDC-defined clock source:
create_clock

The clock tree ends at "sinks". Clock sinks are:
• Stop / Float pins
• Exclude pins (aka ignore pins)

[Figure: a clock path from the Start point at CLOCK, through a gated branch (GATED), to the CLK pins of three flip-flops marked STOP.]


Define Clock Root Attributes (1/2)

When the clock root is a primary port of a block:
• Ensure that an appropriate driving cell is defined: set_driving_cell
• The synthesis constraints may include a weak driving cell for all inputs, including the clock port
• Because the clock is ideal during synthesis, the weak driver has no effect on design QoR
• But a weak driver on the clock port affects clock tree QoR during CTS

[Figure: clock root defined on the primary clock port CLK, with an external driving cell specified for the clock port.]


Define Clock Root Attributes (2/2)

When the clock root is a primary port at the chip level, connected through an IO pad:
• Ensure that an appropriate input transition is defined: set_input_transition

[Figure: clock root defined on the primary clock port CLK, which connects to IO_PAD; the input transition is specified on the port.]

Stop, Float and Exclude Pins

STOP pins: CTS optimizes for DRC and clock tree targets (skew, insertion delay).

FLOAT pins: like stop pins, but with delays on the clock pin.

EXCLUDE pins: CTS optimizes for DRC only (ignores clock tree targets).

[Figure: a clock network from CLOCK to flip-flop CLK pins, a gated branch (GATED), an IP clock pin (IP_CLK), and an output port (CLK_OUT). Labels: "Implicit STOP or FLOAT pins: skew and insertion delay are optimized"; "Implicit EXCLUDE pins: skew and insertion delay are ignored"; "Exceptions".]

Generated and Gated Clocks

Skew is balanced "globally" within each clock domain, across all clock pins of both the master and the generated clock.

[Figure: CLOCK (create_clock) drives flip-flops FF1 to FF5, a gated branch (GATED), and a divider flip-flop FFD whose QN output carries a generated clock (create_generated_clock). All insertion delays are matched: 0.64, 0.65, 0.63.]

User-defined or Explicit Stop Pins

Scenario: If the clock pin inside a macro cell is correctly defined, CTS will treat that pin as an implicit stop pin. In this example the clock pin is not defined. What is the problem here?

The macro's clock pin is marked as an implicit exclude pin, so there is no skew optimization!

[Figure: CLOCK drives flip-flop FF3 and the pin IP_CLK of an IP block (FRAM) that contains a flip-flop clocked by CLKn. IP_CLK has no clock pin definition, so it is an implicit exclude pin and its skew and insertion delay are ignored.]

Defining an Explicit Stop Pin

Defining an explicit stop pin allows CTS to optimize for skew and insertion delay targets.

set_clock_tree_exceptions -stop_pins [get_pins IP/IP_CLK]

CTS has no knowledge of the IP-internal clock delay; it can only "see" up to the stop pin!

[Figure: CLOCK drives a flip-flop CLK pin and the pin IP_CLK of an IP block containing a flip-flop clocked by CLKn. With the explicit stop pin defined, skew and insertion delay are now optimized. Delay labels in the figure: 0.42, 0.43, and 0.17 (the last one next to the IP-internal CLKn).]

Defining an Explicit Float Pin

Defining an explicit float pin allows CTS to adjust the insertion delays based on the specification.

set_clock_tree_exceptions \
    -float_pins IP/IP_CLK \
    -float_pin_max_delay_rise 0.15

[Figure: CLOCK drives a flip-flop CLK pin and the pin IP_CLK of an IP block containing flip-flops clocked by CLKn. With the explicit float pin defined, skew and insertion delay are now optimized. Delay labels in the figure: 0.42, 0.27, and 0.15 (the last one next to the IP-internal CLKn).]
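The arithmetic behind stop and float pins can be checked with a short Python sketch. It assumes one reading of the figure labels, which the extracted slides do not state explicitly: 0.42 is the delay to the ordinary flip-flop, 0.43 and 0.27 are the delays CTS builds to IP_CLK in the stop-pin and float-pin cases, and 0.17 and 0.15 are the IP-internal delays. The helper names are made up for illustration.

```python
def sink_arrival(delay_to_pin, internal_delay=0.0):
    """Clock arrival at the real sink: delay to the pin plus any delay behind it."""
    return delay_to_pin + internal_delay


def float_pin_target(balance_point, internal_delay):
    """Delay to build up to a float pin so the internal sink lines up
    with the other sinks at balance_point."""
    return balance_point - internal_delay


ff_delay = 0.42  # delay to the ordinary flip-flop CLK pin (assumed reading)

# Stop pin: CTS balances IP_CLK itself against the flip-flop (0.43 vs 0.42)
# and cannot see the 0.17 behind it, so the real sinks end up misaligned.
stop_skew_at_pins = 0.43 - ff_delay
stop_skew_at_sinks = sink_arrival(0.43, internal_delay=0.17) - ff_delay

# Float pin: the 0.15 internal delay is declared, so CTS aims for a shorter
# delay to IP_CLK and the real sinks line up.
float_delay = float_pin_target(ff_delay, internal_delay=0.15)
float_skew_at_sinks = sink_arrival(float_delay, internal_delay=0.15) - ff_delay
```

Under these assumptions the stop-pin tree looks balanced at the pins (about 0.01) but is about 0.18 off at the real sinks, while the float-pin tree delivers 0.27 to IP_CLK and aligns the internal sink with the flip-flop.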

Clock Tree Optimization

[Flowchart: a placed design with a synthesized clock tree that meets setup/hold and tran/cap goes to Clock Analysis. If the clock spec is not met, CTO is run and the clock is analyzed again; if it is met, the flow continues to Routing.]

Perform additional Clock Tree Optimization (CTO) as necessary to further improve clock skew.

CT optimization is run inside clock_opt, and can be run independently as well:

optimize_clock_tree

Clock Distribution Systems

Distribution styles:

• Unconstrained tree: automated buffer placement with unconstrained trees
• Balanced tree: multiple levels of balanced tree segments; the H-tree is most common
• Central spine: central clock driver
• Spines with matched branches: multiple central structures with length-matched (or delay-matched) branches
• Grid: interconnected (shorted) clock structure
• Hybrid distribution: combination of multiple techniques; the common theme is tree + grid or spine + grid

Unconstrained tree

An unconstrained tree is commonly used in automatic synthesis flows. Its buffers are usually placed with little or no restriction on the number of buffer stages, and with little or no explicit matching between interconnect delays and buffer delays.

Balanced Tree

• Uses a recursive H structure to distribute the clock signal

• At each point of a new H in the tree, the resistance is halved and the capacitance is doubled

• Larger line width is used for the main H structure to minimize resistance

• Narrower line width is used at the branching points along the tree to minimize capacitance

Balanced Tree (cont.)

The balanced H-tree clock topology is based on structural symmetry: a balanced tree exhibits identical nominal delay and identical buffer and interconnect segments from the root of the distribution to all branches.

Full balanced tree topologies are designed to span the entire die in both the horizontal and vertical dimensions. They are capable of delivering the clock to all regions of the die.

[Figure: H structure. The clock driver feeds the main H through a larger line width (to reduce resistance); the branches use narrow lines (to reduce capacitance).]
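The structural symmetry of the H-tree can be checked with a small Python sketch. It is a geometric illustration only (no buffers, no wire widths); the function name and the choice of halving the span at each level are assumptions made for the example.

```python
def h_tree_sinks(levels, center=(0.0, 0.0), half_span=8.0):
    """Return a list of ((x, y), path_length) leaves of a recursive H-tree.

    At every level each endpoint becomes the center of a new H: the wire runs
    half_span horizontally, then half_span vertically, to four new endpoints.
    The span is halved at each level (an assumption of this sketch).
    """
    leaves = [(center, 0.0)]
    span = half_span
    for _ in range(levels):
        next_leaves = []
        for (x, y), length in leaves:
            for dx in (-span, span):        # left and right arms of the H bar
                for dy in (-span, span):    # up and down legs of the H
                    next_leaves.append(((x + dx, y + dy), length + 2 * span))
        leaves = next_leaves
        span /= 2.0
    return leaves


sinks = h_tree_sinks(levels=3)
path_lengths = {length for _, length in sinks}
positions = {xy for xy, _ in sinks}
```

With three levels the sketch produces 64 distinct endpoints spread over the region, and every root-to-endpoint wire has the same length, which is why the nominal delay is identical on all branches.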

Grid Network

• Gridded clock distribution was common on earlier DEC Alpha microprocessors.

• Advantages:
  – Clock signals are available everywhere
  – Tolerant to process variations
  – Usually yields extremely low skew values

• Disadvantages:
  – Huge amounts of wiring and power
  – Large routing area
  – Large wire capacitance

Central Spine

A central spine clock distribution is a specific implementation of a binary tree. The figure shows an idealized central spine implementation with the final branches serving all parts of the die. The binary tree is shown with embedded shorting at all distribution levels and unconstrained routing to the local loads at the final branches.

In this configuration, the clock can be transported in a balanced fashion across one dimension of the die with low structural skew. The unconstrained branches are simple to implement, although there will be residual skew due to asymmetry.

Hybrid Distribution

A hybrid clock distribution combines the topologies described earlier. Common configurations are spine-grid distribution and tree-grid distribution. The tree-grid form employs a multilevel H-tree driving a common grid: the multilevel H-tree delivers the clock from the clock generator to the various regions of the die.

Regional buffers (labeled as level 4 buffers in the figure) at the end of the multilevel H-tree drive a common grid that includes all local loads.

Hybrid Distribution (cont.)

[Figure: Pentium 4 processor clock distribution using centralized spines with delay-matched final branches.]

Clock distribution characteristics of commercial processors

Name                 Frequency (MHz)  Skew (ps)  Technology (nm)  Clock dist. style
Itanium 2 processor  1,500            24         130              Asymmetric tree
Pentium 4 processor  3,600            16         180              Spine/Grid
Xeon processor       3,400            11         65               Tree/Grid
Power6               5,000            8          65               Sym. H-Tree/Grid
Merom                3,000            18         65               Tree/Grid


Clock Distribution Network

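Path Length and its Delay Model

[Figure, left: an equal-path-length clock tree. Source S0 drives node N0 through a wire of length L0 and width W0; N0 branches to sinks S1 and S2 through wires (L1, W1) and (L2, W2). Figure, right: the delay model, in which each wire segment i is a resistance Ri with capacitance Ci/2 at each end, and the sinks S1 and S2 carry load capacitances CL1 and CL2.]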

Path Length and its Delay Model (cont.)

The skew between S1 and S2 is given by:

$$t_s = \frac{r l_1}{w_1} C_{L1} - \frac{r l_2}{w_2} C_{L2}$$

The skew variation in terms of wire width variation:

$$\Delta t_s = \frac{\partial t_s}{\partial w_1} \Delta w_1 + \frac{\partial t_s}{\partial w_2} \Delta w_2 = -\frac{r l_1 C_{L1}}{w_1^2} \Delta w_1 + \frac{r l_2 C_{L2}}{w_2^2} \Delta w_2$$

If $\Delta w = \pm 0.15\,w$, the worst-case additional skew is

$$\Delta t_s = 0.15 \left( \frac{r l_1 C_{L1}}{w_1} + \frac{r l_2 C_{L2}}{w_2} \right)$$
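As a numeric check of the width-variation result, the following Python sketch compares the first-order worst-case formula with a direct recalculation of the skew at the extreme widths. The branch delay is modeled only as r·l·C_L/w, as in the skew expression; the parameter values are hypothetical and in arbitrary consistent units.

```python
def branch_delay(r, length, width, c_load):
    """Delay term of one branch: wire resistance (r * length / width) times sink load."""
    return r * length / width * c_load


def skew(r, l1, w1, cl1, l2, w2, cl2):
    """Skew between sinks S1 and S2."""
    return branch_delay(r, l1, w1, cl1) - branch_delay(r, l2, w2, cl2)


def worst_case_extra_skew(r, l1, w1, cl1, l2, w2, cl2, tol=0.15):
    """First-order worst case for a width variation of +/- tol * w."""
    return tol * (branch_delay(r, l1, w1, cl1) + branch_delay(r, l2, w2, cl2))


# Hypothetical, perfectly matched branches.
params = dict(r=0.1, l1=1000.0, w1=1.0, cl1=0.05, l2=1000.0, w2=1.0, cl2=0.05)

nominal = skew(**params)
first_order = worst_case_extra_skew(**params)

# Exact worst case: branch 1 is 15% narrower, branch 2 is 15% wider.
exact = skew(**{**params, "w1": 0.85, "w2": 1.15})
```

For these values the nominal skew is zero, the first-order estimate is 1.5, and the exact recalculation is slightly larger (about 1.53), because the delay is not linear in the width.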

Buffer Insertion in Clock Tree

Different buffer delays cause phase delay variations on different source-to-sink paths. The given tolerable skew of a buffered clock tree, $t_s$, is therefore split into two components:

$$t_s = t_s^b + t_s^w$$

$t_s^b$ = tolerable skew for buffer delays

$t_s^w$ = tolerable skew for wire width variation after buffer insertion

The buffer insertion problem is to find the locations on the clock tree at which to insert intermediate buffers. These locations are called buffer insertion points (BIPs).


Clock Skew

• Clock skew is the maximum difference in the arrival time of a clock signal at two different components.

• Clock skew forces designers to use a large time period between clock pulses. This makes the system slower.

• So, in addition to other objectives, clock skew should be minimized during clock routing.

Local Skew

Skew is minimized only between related flip-flops.

[Figure: CLOCK reaches FF1, FF2, and FF3 with clock delays T1 (0.2 ns), T2 (0.2 ns), and T3 (0.4 ns). Other labels: DIN, logic blocks A and B, outputs A_OUT and B_OUT.]

Local skew optimization has a longer runtime.

Global Skew

All clock delays are matched as closely as possible.

[Figure: the same circuit with clock delays T1 (0.37 ns), T2 (0.38 ns), and T3 (0.38 ns).]

Global skew is recommended because it is the fastest option.

Useful Skew

Clock delay is added to FF2 to help setup time.

[Figure: the same circuit with clock delays T1 (0.11 ns), T2 (0.35 ns), and T3 (0.22 ns).]

Use useful skew to fix small violations where local or global skew optimization failed.
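The three skew styles can be compared numerically using the clock delays from the figures. The Python sketch below assumes that FF1 launches data captured by FF2 (through logic B) and that FF3 is unrelated to them; the extracted slides do not state this pairing, so treat it as an assumption.

```python
def global_skew(arrivals):
    """Largest difference between any two clock arrival times."""
    return max(arrivals.values()) - min(arrivals.values())


def local_skew(arrivals, related_pairs):
    """Largest difference between arrivals of flip-flops that share a timing path."""
    return max(abs(arrivals[a] - arrivals[b]) for a, b in related_pairs)


# Assumed launch/capture pair: FF1 -> FF2.
related = [("FF1", "FF2")]

# Clock delays in ns, taken from the three figures.
local_mode = {"FF1": 0.20, "FF2": 0.20, "FF3": 0.40}
global_mode = {"FF1": 0.37, "FF2": 0.38, "FF3": 0.38}
useful_mode = {"FF1": 0.11, "FF2": 0.35, "FF3": 0.22}

# Local mode: the related pair is matched, while the unrelated FF3 is left alone.
local_mode_local = local_skew(local_mode, related)
local_mode_global = global_skew(local_mode)

# Global mode: every sink is matched to within about 0.01 ns.
global_mode_global = global_skew(global_mode)

# Useful skew: the capture clock at FF2 arrives later than the launch clock
# at FF1, which gives the FF1 -> FF2 path that much extra setup margin.
extra_setup_margin = useful_mode["FF2"] - useful_mode["FF1"]
```

Under the assumed pairing, local mode gives zero skew on the related pair but 0.2 ns across all sinks, global mode gives about 0.01 ns everywhere, and the useful-skew delays give the FF1 to FF2 path about 0.24 ns of additional setup margin.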

Clock Tree Optimization (1/2)

[Figure: before-and-after views of clock tree optimization steps on buffers with drive strengths from 2X to 5X driving groups of flip-flops (F).]

Clock Tree Optimization (2/2)

[Figure: before-and-after views of two techniques, Level Adjustment and Reconfiguration, applied to groups of flip-flops (F).]

Clock Design Problem

• What are the main concerns for clock design?
• Skew
  – No. 1 concern for clock networks
  – For increased clock frequency, skew may contribute over 10% of the system cycle time
• Power
  – Very important, as the clock is a major power consumer
  – It switches at every clock cycle
• Noise
  – The clock is often a very strong aggressor
  – May need shielding
• Delay
  – Not really important
  – But slew rate is important (sharp transition)
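Why the clock is a major power consumer can be illustrated with the standard CMOS dynamic power relation, P = α·C·V²·f, which is not on the slide and is brought in here as background. The capacitance, voltage, frequency, and the 10% data activity below are hypothetical values chosen only for the comparison.

```python
def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """Standard CMOS dynamic power estimate: activity * C * Vdd^2 * f (watts)."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


# Hypothetical numbers: the same 100 pF of switched capacitance at 1.0 V, 1 GHz.
# The clock net charges once every cycle (activity 1.0); a data net that
# toggles far less often is modeled with activity 0.1.
clock_power = dynamic_power(1.0, 100e-12, 1.0, 1e9)
data_power = dynamic_power(0.1, 100e-12, 1.0, 1e9)
```

With equal capacitance, the clock net in this sketch dissipates ten times the power of the data net, simply because it switches at every cycle.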

Effects of Clock Tree Synthesis

Clock buffers are added

Inserting clock trees can introduce new timing and max tran/cap violations

Congestion may increase

Non-clock cells may have been moved to less ideal locations

Timing-Driven P&R

Timing-driven P&R optimizes the logic gates, then places and routes them to meet all timing constraints.

Timing Constraints == Speed Goals

Timing Constraints

• Arrival time of inputs
• Clock period
• Required arrival time at outputs

Timing is Based on Cell and Net Delays

IC Compiler calculates the delay for every cell and every net. To calculate delays, it needs to know each net's parasitic Rs and Cs.

• Cell Delay = ƒ(Input Transition Time, Cnet + Cpin)

• Net Delay = ƒ(Rnet, Cnet + Cpin)

[Figure: a cell with a 0.5 ns input transition drives a net (Rnet, Cnet) into a load pin (Cpin).]

TLU

• The TLU model comes from the vendor and is contained in the "tech" file
• It contains capacitance look-up tables only
• Resistance is calculated from the net geometry and a resistance/length (unit resistance) value from the tech file

Layer "METAL1" {
    …
    unitNomResistance = 6.4e-5
    …
}

CapTable "metal1_C_LATERAL_14MIN" {
    wireWidthSize = 5
    wireSpacingSize = 16
    wireWidth = (0.16, 0.32, 0.48, 0.64, 0.8)
    wireSpacing = (0.18, 0.36, 0.54, 0.72, …, 2.88)
    capValue = (0.000183764, 9.85682e-05, 6.5029e-05, …
    )
}

CapModel "metal1Config4" {
    refLayer = "METAL1"
    lateralCapType = "Table"
    lateralCapDataMin = "metal1_C_LATERAL_14MIN"
    …
}

TLU+

TLU+ models:
• Model UDSM process effects
• Contain C and R look-up tables
• If TLU+ models are available, use them!

UDSM process effects:
• Conformal dielectric
• Metal fill
• Shallow trench isolation
• Copper dishing:
  – Density analysis
  – Width/spacing
• Trapezoid conductor

[Figure: a single process file (ITF) yields both the TLU+ models used by Astro and the nxtgrd file used by Star-RCXT.]

Mapping file

The mapping file maps the .tf layer/via names to the Star-RCXT .itf layer/via names.

cb13.tf:

Layer "METAL" {
    layerNumber = 14
    maskName = "metal1"

cb13.itf:

DIELECTRIC cm_extra3 { THICKNESS=0.06 ER=4.2 }
CONDUCTOR cm { THICKNESS=0.26 WMIN=0.16 …}
DIELECTRIC diel1d { THICKNESS=0.435 ER=4.2 }

cb13.map:

conducting_layers
poly poly
metal1 cm
metal2 cm2

Calculating Cell and Net Delay

• Now that R and C are known from TLU/TLU+, the delays can be calculated
• For cell delays, only Ceff is needed
• Net delay is calculated using delay calculation algorithms: Elmore, AWE, Arnoldi

[Figure: cell U1 drives cell U2 through an RC network with resistors R1 to R3 and capacitors C1 to C4.]

PreRoute Delay Calculation Algorithm: Elmore

• After placement, but prior to routing, net geometry is estimated based on a virtual route
• Since virtual routing is only an estimate, Elmore should be used for all steps up to and including routing

[Figure: pin-to-pin timing along a virtual route.]
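For reference, the Elmore delay of a simple RC ladder can be computed in a few lines of Python. This is a textbook sketch of the Elmore model for an unbranched net, not the tool's implementation; the segment values are hypothetical.

```python
def elmore_delay_chain(resistances, capacitances):
    """Elmore delay from the driver to the far end of an unbranched RC ladder.

    resistances[i] is the series resistance of segment i and capacitances[i]
    is the capacitance at the node after it. Each resistor contributes its
    resistance times all the capacitance downstream of it.
    """
    delay = 0.0
    downstream = sum(capacitances)
    for r, c in zip(resistances, capacitances):
        delay += r * downstream
        downstream -= c  # this node's capacitance is upstream of the next resistor
    return delay


# Hypothetical net: three 100-ohm segments, 10 fF at each node.
# Ohms times femtofarads gives femtoseconds.
delay_fs = elmore_delay_chain([100, 100, 100], [10, 10, 10])
```

For this ladder the result is 100·30 + 100·20 + 100·10 = 6000 fs, or 6 ps. The model needs only one pass over the net, which suits estimated (virtual route) parasitics.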

PostRoute Delay Calculation Algorithms (cont.)

• After routing, detailed nets are available and extraction will be more accurate
• Use AWE or Arnoldi for postroute optimizations
• Arnoldi is preferred when comparing to PrimeTime

[Figure: detailed route.]

Timing Setup Parameters

[a] Ignore Interconnect (OFF by default):

When turned "ON":

• Ignores any parasitic capacitance and resistance effects of the interconnect nets (i.e. Rnet = 0; Cnet = 0) during optimization and timing reporting

• The only time this is recommended is when performing a "timing sanity check" on the starting netlist, before beginning placement. A "timing sanity check" is performed by running a timing report with all the timing panel settings in pre-CTS mode, but with "Ignore Interconnect" turned "ON".

Timing Setup Parameters (cont.)

[b] Enable Preset/Clear Arcs (OFF by default):

By default, asynchronous preset and clear timing arcs are not analyzed for timing. Depending on your design, you may have to enable this setting after CTS. For example, if your design contains a reset network that is asserted asynchronously, the tool will not analyze for preset/clear violations on the flip-flops unless this setting is enabled.

Prior to Clock Tree Synthesis (1/3)

• The clock tree is not built yet
• Clock signal ports are connected directly to all FF clock ports without a buffer tree

[Figure: Clock drives the clock pins of many flip-flops directly; the clock skew is unknown.]

The SDC should use the set_clock_uncertainty command to model an estimate of the clock skew that will appear once the clock tree is synthesized.

Prior to Clock Tree Synthesis (2/3)

• The load on the clock port/driver pre-CTS is tremendous
• CTS will buffer the load

[Figure: Clock1 and Clock2 each drive the clock pins of many flip-flops directly.]

Pre-CTS, the delay to the FFs is "ideal", i.e. the delay is zero, unless commands are used to "model" the clock insertion delay. Example: set_clock_latency

Prior to Clock Tree Synthesis (3/3)

• The clock transition at FF clk ports is unknown until CTS has been performed
• You need to provide an estimate for pre-CTS timing analysis (default is 0)

[Figure: Clock1 and Clock2 drive flip-flop clock pins; the transition at each clock pin is marked "?".]

Pre-CTS, use the SDC command set_clock_transition to apply a transition to all FF clock pins.

Setup Timing Check

A setup timing check verifies the timing relationship between the clock and the data pin of a flip-flop so that the setup requirement is met. In other words, the setup check ensures that the data is available at the input of the flip-flop before it is clocked in the flip-flop. The data should be stable for a certain amount of time, namely the setup time of the flip-flop, before the active edge of the clock arrives at the flip-flop.


Hold Timing Check

A hold timing check ensures that a flip-flop output value that is changing does not pass through to a capture flip-flop and overwrite its output before the flip-flop has had a chance to capture its original value. This check is based on the hold requirement of a flip-flop. The hold specification of a flip-flop requires that the data being latched should be held stable for a specified amount of time after the active edge of the clock.
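The setup and hold checks can be written as two slack calculations. The Python sketch below uses the usual single-cycle formulation; the function names, the way clock uncertainty is subtracted or added, and all delay values (in ns) are assumptions for illustration.

```python
def setup_slack(period, launch_clk, capture_clk, clk_to_q, data_max,
                t_setup, uncertainty=0.0):
    """Setup slack: data must arrive before the next capture edge minus setup time."""
    arrival = launch_clk + clk_to_q + data_max
    required = period + capture_clk - t_setup - uncertainty
    return required - arrival


def hold_slack(launch_clk, capture_clk, clk_to_q, data_min,
               t_hold, uncertainty=0.0):
    """Hold slack: data must not change until hold time after the same capture edge."""
    arrival = launch_clk + clk_to_q + data_min
    required = capture_clk + t_hold + uncertainty
    return arrival - required


# Hypothetical path with equal clock latency (0.4 ns) at launch and capture.
s_slack = setup_slack(period=2.0, launch_clk=0.4, capture_clk=0.4,
                      clk_to_q=0.1, data_max=1.3, t_setup=0.1, uncertainty=0.1)
h_slack = hold_slack(launch_clk=0.4, capture_clk=0.4,
                     clk_to_q=0.1, data_min=0.05, t_hold=0.05, uncertainty=0.05)
```

In this example both checks pass: the setup slack is about 0.4 ns and the hold slack about 0.05 ns. A later capture clock raises setup slack and lowers hold slack by the same amount, which is the trade-off behind useful skew.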

Removal Timing Check

A removal timing check ensures that there is adequate time between an active clock edge and the release of an asynchronous control signal. The check ensures that the active clock edge has no effect because the asynchronous control signal remains active until removal time after the active clock edge. In other words, the asynchronous control signal is released (becomes inactive) well after the active clock edge so that the clock edge can have no effect.


Recovery Timing Check

A recovery timing check ensures that there is a minimum amount of time between the asynchronous signal becoming inactive and the next active clock edge. In other words, this check ensures that after the asynchronous signal becomes inactive, there is adequate time to recover so that the next active clock edge can be effective.

On-Chip Variations

Due to process variations, identical MOS transistors in different portions of the die may not have similar characteristics. These differences are due to process variations within the die. Note that the process parameter variations across multiple manufactured lots can cover the entire span of process models from slow to fast.

These differences can arise due to many factors, including:

i. IR drop variation along the die area affecting the local power supply.

ii. Threshold voltage variation of the PMOS or the NMOS device.

iii. Channel length variation of the PMOS or the NMOS device.

iv. Temperature variations due to local hot spots.

v. Interconnect metal etch or thickness variations impacting the interconnect resistance or capacitance.


Derating setup timing check for OCV.


Derating Hold timing check for OCV.
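A minimal sketch of how OCV derating changes the two checks, assuming the common convention that the setup check slows the launch path and speeds up the capture clock, while the hold check does the opposite. The derate factors (1.1 late, 0.9 early) and path delays (in ns) are hypothetical, and the sketch ignores any delay shared by the launch and capture clock paths.

```python
def derated_setup_slack(period, launch_clk, data, capture_clk, t_setup,
                        late=1.0, early=1.0):
    """Setup check under OCV: launch clock and data are late, capture clock is early."""
    arrival = (launch_clk + data) * late
    required = period + capture_clk * early - t_setup
    return required - arrival


def derated_hold_slack(launch_clk, data, capture_clk, t_hold,
                       late=1.0, early=1.0):
    """Hold check under OCV: launch clock and data are early, capture clock is late."""
    arrival = (launch_clk + data) * early
    required = capture_clk * late + t_hold
    return arrival - required


# Hypothetical path: 1.0 ns clock latency on both sides, 3.0 ns period.
setup_nominal = derated_setup_slack(3.0, 1.0, 1.5, 1.0, 0.1)
setup_ocv = derated_setup_slack(3.0, 1.0, 1.5, 1.0, 0.1, late=1.1, early=0.9)

hold_nominal = derated_hold_slack(1.0, 0.2, 1.0, 0.05)
hold_ocv = derated_hold_slack(1.0, 0.2, 1.0, 0.05, late=1.1, early=0.9)
```

With these numbers the setup slack drops from about 1.4 ns to about 1.05 ns, and the hold slack goes from about +0.15 ns to about -0.07 ns, so a path that passes hold nominally fails once derating is applied.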


Data to Data Checks

One important distinction with respect to the setup check of a flip-flop is that the data to data setup check is performed on the same edge as the launch edge (unlike a normal setup check of a flip-flop, where the capture clock edge is normally one cycle away from the launch clock edge). Thus, the data to data setup checks are also referred to as zero-cycle checks or same-cycle checks.


Clock Gating Checks

A clock gating check occurs when a gating signal can control the path of a clock signal at a logic cell. An example is shown in Figure. The pin of the logic cell connected to the clock is called the clock pin and the pin where the gating signal is connected to is the gating pin. The logic cell where the clock gating occurs is also referred to as the gating cell.


Power Gating

Power gating involves gating off the power supply so that the power to the inactive blocks can be turned off. This procedure is illustrated in Figure, where a footer (or a header) MOS device is added in series with the power supply. The control signal SLEEP is configured so that the footer (or header) MOS device is on during normal operation of the block. Since the power gating MOS device (footer or header) is on during normal operation, the block is powered and it operates in normal functional mode.


Session Summary

Clock tree synthesis is one of the most important steps of IC design and can have a significant impact on timing, power, area, etc.

Clock tree synthesis and optimization are iterative processes and can require re-placement and rerouting several times in order to optimize clock tree parameters.

CTS becomes more important at 90 nm and below, especially when low-power design techniques are applied, because these significantly change the ratio of gate delay to interconnect delay as well as the way clock trees are built, depending on their multi-level structures.

Differentiating between TLU and TLU+ models with respect to foundry process rules.