synchronus

8/20/2019 synchronus


TIMING AND CLOCKING ISSUES IN DIGITAL CIRCUITS

CHAPTER 1

INTRODUCTION

All sequential circuits have one property in common: a well-defined ordering of the switching events must be imposed if the circuit is to operate correctly. If this were not the case, wrong data might be written into the memory elements, resulting in a functional failure. The synchronous system approach, in which all memory elements in the system are simultaneously updated using a globally distributed periodic synchronization signal (that is, a global clock signal), represents an effective and popular way to enforce this ordering. Functionality is ensured by imposing strict constraints on the generation of the clock signals and on their distribution to the memory elements spread over the chip; non-compliance often leads to malfunction.

In a synchronous digital system, the clock signal is used to define a time reference for the movement of data within that system. Since this function is vital to the operation of a synchronous system, much attention has been given to the characteristics of these clock signals and the networks used in their distribution. Clock signals are often regarded as simple control signals; however, these signals have some very special characteristics and attributes. Clock signals are typically loaded with the greatest fan-out, travel over the longest distances, and operate at the highest speeds of any signal, either control or data, within the entire system. Since the data signals are provided with a temporal reference by the clock signals, the clock waveforms must be particularly clean and sharp. Furthermore, these clock signals are particularly affected by technology scaling, in that long global interconnect lines become much more highly resistive as line dimensions are decreased. This increased line resistance is one of the primary reasons for the growing importance of clock distribution on synchronous performance. Finally, any uncontrolled differences in the delay of the clock signals can severely limit the maximum performance of the entire system as well as create catastrophic race conditions in which an incorrect data signal may latch within a register.

1.1 Synchronous Digital Systems

Most synchronous digital systems consist of cascaded banks of sequential registers with combinatorial logic between each set of registers. The functional requirements of the digital system are satisfied by the logic stages. The global performance and local timing requirements are satisfied by the careful insertion of pipeline registers into equally spaced time windows to satisfy critical worst-case timing constraints. The proper design of the clock distribution network further ensures that these critical timing requirements are satisfied and that no race conditions exist. With the careful design of the clock distribution network, system-level synchronous performance can actually increase, surpassing the performance advantages of asynchronous systems by permitting synchronous performance to be based on average path delays rather than worst-case path delays, without incurring the handshaking protocol delay penalties required in most asynchronous systems.

Department of Electronics & Communication Engineering

In a synchronous system, each data signal is typically stored in a latched state within a bistable register awaiting the incoming clock signal, which determines when the data signal leaves the register. Once the enabling clock signal reaches the register, the data signal leaves the bistable register and propagates through the combinatorial network and, for a properly working system, enters the next register and is fully latched into that register before the next clock signal appears. Thus, the delay components that make up a general synchronous system are composed of the following three individual subsystems:

1) Memory storage elements

2) Logic elements

3) Clocking circuitry and distribution network.

1.2 Importance of Clock

Clock signals are typically loaded with the greatest fanout and operate at the highest speeds of any signal within the synchronous system. Since the data signals are provided with a temporal reference by the clock signals, the clock waveforms must be particularly clean and sharp. Furthermore, these clock signals are particularly affected by technology scaling (see Moore's law), in that long global interconnect lines become significantly more resistive as line dimensions are decreased. This increased line resistance is one of the primary reasons for the increasing significance of clock distribution on synchronous performance. Finally, any uncontrolled differences and uncertainty in the arrival times of the clock signals can severely limit the maximum performance of the entire system and create catastrophic race conditions in which an incorrect data signal may latch within a register.






Most synchronous digital systems consist of cascaded banks of sequential registers with combinational logic between each set of registers. The functional requirements of the digital system are satisfied by the logic stages. Each logic stage introduces delay that affects timing performance, and the timing performance of the digital design can be evaluated relative to the timing requirements by a timing analysis. Often special consideration must be made to meet the timing requirements. For example, the global performance and local timing requirements may be satisfied by the careful insertion of pipeline registers into equally spaced time windows to satisfy critical worst-case timing constraints. The proper design of the clock distribution network helps ensure that critical timing requirements are satisfied and that no race conditions exist (see also clock skew).

The delay components that make up a general synchronous system are composed of the following three individual subsystems: the memory storage elements, the logic elements, and the clocking circuitry and distribution network.

Novel structures are currently under development to ameliorate these issues and provide effective solutions. Important areas of research include resonant clocking techniques, on-chip optical interconnect, and local synchronization methodologies.

1.3 Organization of the Report

This report starts with an overview of the different timing methodologies. We analyze the impact of spatial variations of the clock signal, called clock skew, and temporal variations of the clock signal, called clock jitter, and introduce techniques to cope with them within the synchronous approach. These variations fundamentally limit the performance that can be achieved using a conventional design methodology. The report is organized as follows:

Chapter 1: Introduction - This chapter briefly gives an overview of the report.

Chapter 2: Timing Methodologies - This chapter describes the different timing methodologies with respect to the clock system.

Chapter 3: Synchronous Timing Basics - This chapter discusses the timing parameters of combinational and sequential logic circuits. It also analyzes synchronous sequential circuits with respect to these timing parameters.






Chapter 4: Clock Skew and Clock Jitter - This chapter explains the sources of clock skew and clock jitter.

Chapter 5: Clock Distribution Techniques - This chapter describes the different clock distribution networks used to minimize the clock skew.

Chapter 6: Conclusions - This chapter summarizes the major accomplishments of this report.

CHAPTER 2

TIMING METHODOLOGIES

In digital systems, signals can be classified depending on how they are related to a local clock. Signals that transition only at predetermined periods in time can be classified as synchronous, mesochronous, or plesiochronous with respect to a system clock. A signal that can transition at arbitrary times is considered asynchronous.

2.1 Synchronous Methodology

A synchronous signal is one that has exactly the same frequency as the local clock and a known, fixed phase offset with respect to it. In such a timing methodology, the signal is "synchronized" with the clock, and the data can be sampled directly without any uncertainty. In digital logic design, synchronous systems are the most straightforward type of interconnect, where the flow of data in a circuit proceeds in lockstep with the system clock as shown below.

Figure 2.1 Synchronous interconnect methodology

Here, the input data signal In is sampled with register R1 to give signal Cin, which is synchronous with the system clock and is then passed along to the combinational logic block. After a suitable settling period, the output Cout becomes valid and can be sampled by R2, which synchronizes the output with the clock. In a sense, the "certainty period" of signal Cout, or the period where data is valid, is synchronized with the system clock, which allows register R2 to sample the data with complete confidence. The length of the "uncertainty period," or the period where data is not valid, places an upper bound on how fast a synchronous interconnect system can be clocked.

2.2 Mesochronous Methodology

A mesochronous signal is one that has the same frequency but an unknown phase offset with respect to the local clock ("meso" is Greek for "middle"). For example, if data is being passed between two different clock domains, then the data signal transmitted from the first module can have an unknown phase relationship to the clock of the receiving module. In such a system, it is not possible to directly sample the output at the receiving module because of the uncertainty in the phase offset. A (mesochronous) synchronizer can be used to synchronize the data signal with the receiving clock as shown below. The synchronizer serves to adjust the phase of the received signal to ensure proper sampling.

Figure 2.2 Mesochronous approach using variable delay line.

In Figure 2.2, signal D1 is synchronous with respect to ClkA. However, D1 and D2 are mesochronous with ClkB because of the unknown phase difference between ClkA and ClkB and the unknown interconnect delay in the path between Block A and Block B. The role of the synchronizer is to adjust the variable delay line such that the data signal D3 (a delayed version of D2) is aligned properly with the system clock of Block B. In this example, the variable delay element is adjusted by measuring the phase difference between the received signal and the local clock. After register R2 samples the incoming data during the certainty period, signal D4 becomes synchronous with ClkB.

2.3 Plesiochronous Methodology

A plesiochronous signal is one that has nominally the same, but slightly different, frequency as the local clock ("plesio" is Greek for "near"). In effect, the phase difference drifts in time. This scenario can easily arise when two interacting modules have independent clocks generated from separate crystal oscillators. Since the transmitted signal can arrive at the receiving module at a different rate than the local clock, one needs to utilize a buffering scheme to ensure all data is received. Typically, plesiochronous interconnect only occurs in distributed systems like long-distance communications, since chip or even board level circuits typically utilize a common oscillator to derive local clocks. A possible framework for plesiochronous interconnect is shown in Figure 2.3.

Figure 2.3 Plesiochronous communication using FIFO.

In this digital communications framework, the originating module issues data at some unknown rate characterized by C1, which is plesiochronous with respect to C2. The timing recovery unit is responsible for deriving clock C3 from the data sequence, and buffering the data in a FIFO. As a result, C3 will be synchronous with the data at the input of the FIFO and will be mesochronous with C1. Since the clock frequencies from the originating and receiving modules are mismatched, data might have to be dropped if the transmit frequency is faster, and data can be duplicated if the transmit frequency is slower than the receive frequency. However, by making the FIFO large enough, and periodically resetting the system whenever an overflow condition occurs, robust communication can be achieved.
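How quickly a frequency mismatch consumes the FIFO margin can be estimated with a short calculation. The following Python sketch is illustrative only: the function name `time_to_slip`, the 100 MHz nominal clock, the 100 ppm mismatch, and the 16-word margin are assumed values, not taken from this report.

```python
def time_to_slip(f_tx_hz, f_rx_hz, margin_words):
    """Seconds until a FIFO with `margin_words` of slack overflows or underflows.

    Assumes one word is written per transmit cycle and one word is read per
    receive cycle, so the fill level drifts at |f_tx - f_rx| words per second.
    """
    drift = abs(f_tx_hz - f_rx_hz)
    if drift == 0:
        return float("inf")  # perfectly matched clocks never slip
    return margin_words / drift


# Hypothetical numbers: 100 MHz nominal receive clock, transmitter fast by
# 100 ppm, FIFO with 16 words of margin before it overflows.
f_rx = 100e6
f_tx = f_rx * (1 + 100e-6)
t = time_to_slip(f_tx, f_rx, margin_words=16)
print(f"slip after about {t * 1e3:.2f} ms")
```

Under these assumed values the margin is used up after roughly 1.6 ms, which illustrates why a larger FIFO or periodic resynchronization is needed.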

In telecommunications, a plesiochronous system is one where different parts of the system are almost, but not quite, perfectly synchronized. According to ITU-T standards, a pair of signals is plesiochronous if their significant instants occur at nominally the same rate, with any variation in rate being constrained within specified limits. A sender and receiver operate plesiochronously if they operate at the same nominal frequency but may have a slight frequency mismatch, which leads to a drifting phase. The mismatch between the two systems' clocks is known as the plesiochronous difference.






In general, plesiochronous systems behave similarly to synchronous systems, except that they must employ some means to cope with sync slips, which will happen at intervals due to the plesiochronous nature of the system. The most common example of a plesiochronous system design is the plesiochronous digital hierarchy networking standard.

The asynchronous serial communication protocol is asynchronous on the byte level, but plesiochronous on the bit level. The receiver detects the start of a byte by detecting a transition that may occur at a random time after the preceding byte. The indefinite wait and lack of external synchronization signals make byte detection asynchronous. Then the receiver samples at predefined intervals to determine the values of the bits in the byte; this is plesiochronous since it depends on the transmitter to transmit at roughly the same rate the receiver expects, without coordination of the rate while the bits are being transmitted.
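The bit-level behaviour described above can be sketched numerically. The model below is a simplification under stated assumptions: a 10-bit frame (start bit, 8 data bits, stop bit), ideal mid-bit sampling, and no edge-detection delay or jitter. The function name `frame_sampled_correctly` is hypothetical.

```python
def frame_sampled_correctly(rate_error, bits_per_frame=10):
    """Check mid-bit sampling of one frame when the clocks differ by `rate_error`.

    rate_error is (receiver bit time / transmitter bit time) - 1. The receiver
    re-synchronizes on the start edge (t = 0) and then samples bit k at
    (k + 0.5) receiver bit times; bit k is on the line from k to k + 1
    transmitter bit times.
    """
    t_tx = 1.0                         # transmitter bit time (normalized)
    t_rx = t_tx * (1.0 + rate_error)   # receiver bit time
    for k in range(bits_per_frame):
        sample = (k + 0.5) * t_rx
        if not (k * t_tx < sample < (k + 1) * t_tx):
            return False               # sample fell into a neighbouring bit
    return True


for err in (0.0, 0.02, -0.02, 0.06, -0.06):
    print(f"{err:+.0%} mismatch -> ok: {frame_sampled_correctly(err)}")
```

In this idealized model the last bit is sampled 9.5 receiver bit times after the start edge, so the frame is read correctly only while the accumulated error stays below half a bit, that is, for a mismatch of about 5% or less. Because the receiver re-synchronizes on every start edge, the error does not accumulate across bytes.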

2.4 Asynchronous Methodology

Asynchronous signals can transition at any arbitrary time, and are not slaved to any local clock. As a result, it is not straightforward to map these arbitrary transitions into a synchronized data stream. Although it is possible to synchronize asynchronous signals by detecting events and introducing latencies into a data stream synchronized to a local clock, a more natural way to handle asynchronous signals is to simply eliminate the use of local clocks and utilize a self-timed asynchronous design approach. In such an approach, communication between modules is controlled through a handshaking protocol to perform the proper ordering of commands.

Figure 2.4 Asynchronous methodology for simple pipeline interconnects.






When a logic block completes an operation, it will generate a completion signal DV to indicate that output data is valid. The handshaking signals then initiate a data transfer to the next block, which latches in the new data and begins a new computation by asserting the initialization signal I. Asynchronous designs are advantageous because computations are performed at the native speed of the logic, where block computations occur whenever data becomes available. There is no need to manage clock skew, and the design methodology leads to a very modular approach where interaction between blocks simply occurs through a handshaking procedure. However, these handshaking protocols result in increased complexity and overhead in communication that can reduce performance.

CHAPTER 3

SYNCHRONOUS TIMING BASICS

3.1 Synchronous Sequential Systems

A digital synchronous circuit is composed of a network of functional logic elements and globally clocked registers. For an arbitrary ordered pair of registers (R1, R2), one of the following two situations can be observed: either the input of R2 cannot be reached from the output of R1 by propagating through a sequence of logical elements only, or there exists at least one sequence of logic blocks that connects the output of R1 to the input of R2. In the former case, switching events at the output of the register R1 do not affect the input of the register R2 during the same clock period. In the latter case, denoted by R1 → R2, signal switching at the output of R1 will propagate to the input of R2. In this case, (R1, R2) is called a sequentially-adjacent pair of registers, which make up a local data path (see Fig. 3.1).





Figure 3.1 Local data path.

Delay Components of Data Path: The minimum allowable clock period T_CP(min) between any two registers in a sequential data path is given by

\[ \frac{1}{f_{clkMAX}} = T_{CP(min)} \geq T_{PD(max)} + T_{Skew_{ij}} \tag{3.1} \]

where

\[ T_{PD(max)} = T_{C-Q} + T_{Logic} + T_{Int} + T_{Set-up} = D(i,f) \tag{3.2} \]

and the total path delay of a data path T_PD(max) is the sum of the maximum time required for the data to leave the initial register once the clock signal C_i arrives, T_C-Q, the time necessary to propagate through the logic and interconnect, T_Logic + T_Int, and the time required to successfully propagate to and latch within the final register of the data path, T_Set-up. Observe that the latest arrival time is given by T_Logic(max) and the earliest arrival time is given by T_Logic(min), since data is latched into each register within the same clock period.

The sum of the delay components in (3.2) must satisfy the timing constraint of (3.1) in order to support the clock period T_CP(min), which is the inverse of the maximum possible clock frequency, f_clkMAX. Note that the clock skew T_Skew_ij can be positive or negative depending on whether C_f leads or lags C_i, respectively. The clock period is chosen such that the latest data signal generated by the initial register is latched in the final register by the next clock edge after the clock edge that activated the initial register. Furthermore, in order to avoid race conditions, the local path delay must be chosen such that, for any two sequentially-adjacent registers in a multistage data path, the latest data signal must arrive and be latched within the final register before the earliest data signal generated with the next clock pulse arrives. The waveforms depicted in Fig. 3.2 show the timing requirement of (3.1) being barely satisfied (i.e., the data signal arrives at R_f just before the clock signal arrives at R_f).
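As a minimal numerical sketch of the clock-period constraint stated above (minimum period at least the maximum path delay plus the skew, with the path delay being the sum of clock-to-Q, logic, interconnect, and set-up delays), the following Python function adds up the components. The function name and all delay values are hypothetical, chosen only for illustration, and the skew follows the sign convention of this section (positive when C_f leads C_i).

```python
def min_clock_period(t_cq, t_logic_max, t_int, t_setup, t_skew=0.0):
    """Return T_CP(min) = T_PD(max) + T_Skew, all values in the same time unit.

    T_PD(max) is the sum of the clock-to-Q, logic, interconnect and
    set-up delays of the local data path.
    """
    t_pd_max = t_cq + t_logic_max + t_int + t_setup
    return t_pd_max + t_skew


# Hypothetical delays in nanoseconds.
t_cp = min_clock_period(t_cq=0.5, t_logic_max=3.0, t_int=0.7, t_setup=0.3,
                        t_skew=0.2)
f_max_mhz = 1e3 / t_cp  # a period in ns corresponds to 1000 / period MHz
print(f"T_CP(min) = {t_cp:.2f} ns, f_clkMAX = {f_max_mhz:.1f} MHz")
```

A negative skew value in this convention shortens the required period by the same amount.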





Figure 3.2 Timing diagram of clocked data path.

An example of a local data path R_i → R_f is shown in Fig. 3.1. The clock signals C_i and C_f synchronize the sequentially-adjacent pair of registers R_i and R_f, respectively. Signal switching at the output of R_i is triggered by the arrival of the clock signal C_i. After propagating through the logic block L_if, this signal will appear at the input of R_f. Therefore, a nonzero amount of time elapses between the triggering event and the signal switching at the input of R_f. The minimum and maximum values of this delay are called the short and long delays, respectively, and are denoted by d(i,f) and D(i,f), respectively. Note that both d(i,f) and D(i,f) are due to the cumulative effects of three sources of delay. These sources are the clock-to-output delay of the register R_i, a delay introduced by the signal propagating through L_if, and an interconnect delay due to the presence of wires on the signal path R_i → R_f.

3.2 Timing Parameters for Combinational Logic

Physically implemented combinational circuits (NAND or NOR gates, for example) exhibit certain timing characteristics.

A "0" or "1" applied at the input to a combinational circuit does not result in an instantaneous change at the output because of various electrical constraints. Input-to-output delay in combinational circuits can be expressed with two parameters, propagation delay, t_pd, and contamination delay, t_cd. Both delays are important circuit characteristics: the propagation delay bounds the maximum clock rate, and the contamination delay determines whether hold-time requirements can be met.

3.2.1 Propagation delay (t_pd)

The amount of time needed for a change in a logic input to result in a permanent change at an output; that is, the combinational logic will not show any further output changes in response to an input change after t_pd time units.

3.2.2 Contamination delay (t_cd)

The amount of time needed for a change in a logic input to result in an initial change at an output; that is, the combinational logic is guaranteed not to show any output change in response to an input change before t_cd time units have passed.

Figure 3.3 Propagation and Contamination Delays in the Combinational Logic

Combinational propagation delays are additive, so the propagation delay of a larger combinational circuit can be determined by adding the propagation delays of each of the circuit components along the longest path. In contrast, finding the contamination delay of the circuit requires identifying the shortest path of contamination delays from input to output and adding the delay values along this path.
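The longest-path and shortest-path rules can be sketched as a small recursive traversal of a gate network. The netlist below, the gate names, and the delay values are all hypothetical; the sketch assumes an acyclic combinational circuit and treats every primary input as changing at time 0.

```python
from functools import lru_cache

# Hypothetical netlist: gate name -> (t_cd, t_pd, list of fan-in names).
# Names that are not keys of GATES are primary inputs.
GATES = {
    "g1": (0.2, 0.5, ["a", "b"]),
    "g2": (0.3, 0.8, ["b", "c"]),
    "g3": (0.2, 0.6, ["g1", "g2"]),
    "out": (0.1, 0.4, ["g3", "a"]),
}


@lru_cache(maxsize=None)
def arrival(node):
    """Return (earliest, latest) time at which `node` can change after an input change."""
    if node not in GATES:
        return (0.0, 0.0)                 # primary input
    t_cd, t_pd, fanin = GATES[node]
    times = [arrival(src) for src in fanin]
    earliest = t_cd + min(t[0] for t in times)   # shortest contamination path
    latest = t_pd + max(t[1] for t in times)     # longest propagation path
    return (earliest, latest)


t_cd_total, t_pd_total = arrival("out")
print(f"circuit t_cd = {t_cd_total:.1f}, circuit t_pd = {t_pd_total:.1f}")
```

For this assumed netlist the shortest path runs directly from input a through the output gate, while the longest path runs through g2, g3, and the output gate, so the two delays are determined by different paths.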

3.3 Timing Parameters for Sequential Logic

Sequential circuits can contain both combinational logic and edge-triggered flip-flops. A synchronous circuit is a digital circuit in which the parts are synchronized by a clock signal. In an ideal synchronous circuit, every change in the logical levels of its storage components is simultaneous. These transitions follow the level change of a special signal called the clock. Ideally, the input to each storage element has reached its final value before the next clock occurs, so the behaviour of the whole circuit can be predicted exactly. Practically, some delay is required for each logical operation, resulting in a maximum speed at which each synchronous system can run.

To make these circuits work correctly, a great deal of care is needed in the design of the clock distribution networks. Static timing analysis is often used to determine the maximum safe operating speed. When sequential circuits are physically implemented, they exhibit certain timing characteristics that, unlike those of combinational circuits, are specified in relation to the clock input. A flip-flop is edge triggered: it stores its input when the clock rises and, unlike a latch, is never transparent. Since flip-flops only change value in response to a change in the clock value, timing parameters can be specified in relation to the rising (for positive edge-triggered) or falling (for negative edge-triggered) clock edge. The following parameters specify sequential circuit behavior. Note that these are all for positive edge-triggered flip-flops unless otherwise specified, but are easily applied to negative edge-triggered flip-flops as well.

3.3.1 Propagation delay (t_clk-q)

The amount of time needed for a change in the flip-flop clock input to result in a change at the flip-flop output Q. When the clock edge arrives, the D input value is transferred to output Q. After time t_clk-q the output is guaranteed not to change value again until another clock edge trigger arrives.

3.3.2 Contamination delay (t_cd)

This value indicates the amount of time needed for a change in the flip-flop clock input to result in the initial change at the flip-flop output Q. The output of the flip-flop maintains its initial value until time t_cd has passed and is guaranteed not to show any output change in response to an input change until after t_cd has passed.

Note: delays can be different for rising and falling transitions.

3.3.3 Setup time (t_su)

The amount of time that data input D must be stable before the rising clock edge arrives.

3.3.4 Hold time (t_hold)

This indicates the amount of time after the clock edge arrives that data input D must be held stable in order for the flip-flop to latch the correct value. Hold time is always measured from the rising clock edge (for positive edge-triggered flip-flops) to a point after the clock edge.





Figure 3.4 Timing Parameters of Sequential Circuit

Setup and hold times are restrictions that a flip-flop places on the combinational or sequential circuitry that drives its D input. The circuit has to be designed so that the D input signal arrives at least t_su time units before the clock edge and does not change until at least t_hold time units after the clock edge. If either of these restrictions is violated for any of the flip-flops in the circuit, the circuit is not guaranteed to operate correctly. The setup restriction limits the maximum clock frequency at which the circuit can operate.

3.4 Determination of the Maximum Clock Frequency

Most digital circuits contain both combinational components (gates, multiplexers, adders, etc.) and sequential components (flip-flops). These components can be combined to form sequential circuits that perform computation and store results. By using combinational and sequential component parameters, it is possible to determine the maximum clock frequency at which a circuit will operate and generate correct results. This analysis is best illustrated with an example.

Figure 3.5 Sequential Circuit with Corresponding Propagation Delays

The first step is to model the given circuit so that the data paths between flip-flops are characterized by the longest and shortest combinational delays between the flip-flops.





Figure 3.6 Sequential Circuit characterized by the longest and shortest delays.

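Let T_clk be the clock period. Then,

\[ T_{clk} \geq t_{clk-q}(\text{register}) + t_{pd,logic}(\text{longest}) + t_{su}(\text{destination register}) \tag{3.3} \]

\[ T_{clk} \geq 0.6\,\text{ns} + 8\,\text{ns} + 0.4\,\text{ns} \]

\[ T_{clk} \geq 9\,\text{ns} \]

Therefore the minimum clock period is 9 ns.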

The hold time of the destination register must be shorter than the minimum (contamination) delay through the source register and the logic network,

\[ t_{hold} < t_{c-q,cd} + t_{logic,cd} \tag{3.4} \]
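The worked example can be reproduced in a few lines of Python. The setup-side values (0.6 ns, 8 ns, 0.4 ns) are the ones used in the example above; the hold-side values are hypothetical, since the text does not give them, and the function names are chosen for illustration.

```python
def min_period_ns(t_clk_q, t_pd_logic, t_su):
    """Minimum clock period: clock-to-Q + longest logic delay + setup time."""
    return t_clk_q + t_pd_logic + t_su


def hold_ok(t_hold, t_cq_cd, t_logic_cd):
    """Hold check: hold time must be shorter than the minimum register + logic delay."""
    return t_hold < t_cq_cd + t_logic_cd


t_min = min_period_ns(t_clk_q=0.6, t_pd_logic=8.0, t_su=0.4)  # values from the example
f_max_mhz = 1e3 / t_min                                        # period in ns -> MHz
print(f"T_clk >= {t_min:.1f} ns, f_max = {f_max_mhz:.1f} MHz")

# Hypothetical contamination delays and hold time (ns), for illustration only.
print("hold satisfied:", hold_ok(t_hold=0.2, t_cq_cd=0.3, t_logic_cd=1.0))
```

A 9 ns minimum period corresponds to a maximum clock frequency of about 111 MHz. Note that the hold check does not involve the clock period at all, so a hold violation cannot be fixed by slowing the clock.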

The above analysis is simplistic since the clock is never ideal. As a result of process and environmental variations, the clock signal can have spatial and temporal variations.





CHAPTER 4

CLOCK SKEW AND CLOCK JITTER

4.1 Clock Skew

The spatial variation in arrival time of a clock transition on an integrated circuit is commonly referred to as clock skew. The clock skew between two points i and j on an IC is given by δ(i, j) = t_i − t_j, where t_i and t_j are the positions of the rising edge of the clock with respect to a reference. The clock skew can be positive or negative depending upon the routing direction and position of the clock source. The timing diagram for the case with positive skew is shown in Figure 4.1. As the figure illustrates, the rising clock edge is delayed by a positive δ at the second register.

Figure 4.1 Timing diagram of positive clock skew.

Clock skew is caused by static path-length mismatches in the clock load and, by definition, skew is constant from cycle to cycle. That is, if in one cycle CLK2 lagged CLK1 by δ, then on the next cycle it will lag it by the same amount. It is important to note that clock skew does not result in clock period variation, but rather in a phase shift.

Figure 4.2 Timing diagram of negative clock skew.





Skew has strong implications for the performance and functionality of a sequential system. First consider the impact of clock skew on performance. From Figure 4.1, a new input In sampled by R1 at edge 1 will propagate through the combinational logic and be sampled by R2 on edge 4. If the clock skew is positive, the time available for the signal to propagate from R1 to R2 is increased by the skew δ. The output of the combinational logic must be valid one set-up time before the rising edge of CLK2 (point 4). The constraint on the minimum clock period can then be derived as:

\[ T + \delta \geq t_{c-q} + t_{logic} + t_{su} \quad \text{or} \quad T \geq t_{c-q} + t_{logic} + t_{su} - \delta \tag{4.1} \]

The above equation suggests that clock skew actually has the potential to improve the performance of the circuit. That is, the minimum clock period required to operate the circuit reliably reduces with increasing clock skew! This is indeed correct, but unfortunately, increasing skew makes the circuit more susceptible to race conditions and may harm the correct operation of sequential systems.

As above, assume that input In is sampled on the rising edge of CLK1 at edge 1 into R1. The new values at the output of R1 propagate through the combinational logic and should be valid before edge 4 at CLK2. However, if the minimum delay of the combinational logic block is small, the inputs to R2 may change before the clock edge 2, resulting in incorrect evaluation. To avoid races, we must ensure that the minimum propagation delay through the register and logic is long enough that the inputs to R2 are valid for a hold time after edge 2. The constraint can be formally stated as

\[ \delta + t_{hold} < t_{c-q,cd} + t_{logic,cd} \quad \text{or} \quad \delta < t_{c-q,cd} + t_{logic,cd} - t_{hold} \tag{4.2} \]

Figure 4.2 shows the timing diagram for the case when δ < 0. For this case, the rising edge of CLK2 happens before the rising edge of CLK1. On the rising edge of CLK1, a new input is sampled by R1. The new sampled data propagates through the combinational logic and is sampled by R2 on the rising edge of CLK2, which corresponds to edge 4. As can be seen from Figure 4.2 and Eq. (4.1), a negative skew directly impacts the performance of a sequential system. However, a negative skew implies that the system never fails because of a race, since edge 2 happens before edge 1. This can also be seen from Eq. (4.2), which is always satisfied for δ < 0, provided the hold time itself does not exceed the minimum register and logic delay.
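The opposing effects of skew on the clock period and on race immunity can be checked numerically. This Python sketch uses the sign convention of this section (positive skew means the receiving register's clock edge arrives later); the function names and all delay values are hypothetical and serve only as an illustration.

```python
def min_period_with_skew(t_cq, t_logic, t_su, skew):
    """Smallest clock period allowed by T >= t_cq + t_logic + t_su - skew."""
    return t_cq + t_logic + t_su - skew


def race_free(skew, t_hold, t_cq_cd, t_logic_cd):
    """True if skew + t_hold < t_cq_cd + t_logic_cd (no hold violation)."""
    return skew + t_hold < t_cq_cd + t_logic_cd


# Hypothetical delays in ns, chosen only for illustration.
T_CQ, T_LOGIC, T_SU = 0.6, 8.0, 0.4
T_HOLD, T_CQ_CD, T_LOGIC_CD = 0.2, 0.3, 0.5

for skew in (-0.5, 0.0, 0.5, 1.0):
    period = min_period_with_skew(T_CQ, T_LOGIC, T_SU, skew)
    ok = race_free(skew, T_HOLD, T_CQ_CD, T_LOGIC_CD)
    print(f"skew {skew:+.1f} ns: T_min = {period:.1f} ns, race-free = {ok}")
```

With these assumed numbers, each additional 0.5 ns of positive skew shortens the minimum period by 0.5 ns, but a skew of 1.0 ns exceeds the race bound and the circuit would fail regardless of the clock period.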

Example scenarios for positive and negative clock skew are shown in Figure 4.3.





Figure 4.3 Positive and negative clock skew.

4.1.1 Positive Clock Skew

δ > 0: This corresponds to a clock routed in the same direction as the flow of the data through the pipeline (Figure 4.3a). In this case, the skew has to be strictly controlled and satisfy Eq. (4.2). If this constraint is not met, the circuit malfunctions independent of the clock period. Reducing the clock frequency of an edge-triggered circuit does not help get around skew problems. On the other hand, positive skew increases the throughput of the circuit as expressed by Eq. (4.1), because the clock period can be shortened by δ. The extent of this improvement is limited, as large values of δ soon provoke violations of Eq. (4.2).

4.1.2 Negative Clock Skew

δ < 0: When the clock is routed in the opposite direction of the data (Figure 4.3b), the skew is negative and condition (4.2) is unconditionally met. The circuit operates correctly independent of the skew. The skew reduces the time available for actual computation so that the clock period has to be increased by |δ|. In summary, routing the clock in the opposite direction of the data avoids disasters but hampers the circuit performance.

Unfortunately, since a general logic circuit can have data flowing in both directions (for example, circuits with feedback), this solution to eliminate races will not always work (Figure 4.4). The skew can assume both positive and negative values depending on the direction of the data transfer. Under these circumstances, the designer has to account for the worst-case skew condition. In general, routing the clock so that only negative skew occurs is not feasible.

Therefore, the design of a low-skew clock network is essential.
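
The two skew constraints discussed in this section can be checked numerically. The following is a minimal Python sketch rather than part of the original analysis. It assumes the setup constraint has the form T + δ ≥ t_c-q + t_logic + t_su (the relation referred to as Eq. (4.1)) and uses the hold constraint of Eq. (4.2). The function names and all delay values (in picoseconds) are hypothetical and chosen only for illustration.

```python
def min_clock_period(t_cq, t_logic, t_su, skew):
    """Smallest period T that satisfies T + skew >= t_cq + t_logic + t_su."""
    return t_cq + t_logic + t_su - skew


def max_safe_skew(t_cq_cd, t_logic_cd, t_hold):
    """Bound from the hold constraint; the skew must stay strictly below it."""
    return t_cq_cd + t_logic_cd - t_hold


def has_race(t_cq_cd, t_logic_cd, t_hold, skew):
    """True when the hold constraint skew + t_hold < t_cq_cd + t_logic_cd fails."""
    return not (skew + t_hold < t_cq_cd + t_logic_cd)


# Hypothetical delays in picoseconds (illustrative only).
T_CQ, T_LOGIC, T_SU = 100, 800, 50          # maximum (propagation) delays
T_CQ_CD, T_LOGIC_CD, T_HOLD = 60, 40, 30    # minimum (contamination) delays, hold time

for skew in (50, 80, -50):
    period = min_clock_period(T_CQ, T_LOGIC, T_SU, skew)
    race = has_race(T_CQ_CD, T_LOGIC_CD, T_HOLD, skew)
    print(f"skew={skew:+d} ps  min period={period} ps  race={race}")
```

With these assumed numbers, a skew of +50 ps shortens the minimum period from 950 ps to 900 ps, a skew of +80 ps exceeds the 70 ps hold bound and races regardless of the clock period, and a skew of -50 ps is race-free but lengthens the minimum period to 1000 ps.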





Figure 4.4 Datapath structure with feedback.

4.2 Clock Jitter

Jitter is the undesired deviation from true periodicity of an assumed periodic signal in electronics and telecommunications, often in relation to a reference clock source. Jitter may be observed in characteristics such as the frequency of successive pulses, the signal amplitude, or the phase of periodic signals. Jitter is a significant, and usually undesired, factor in the design of almost all communications links (e.g., USB, PCI-e, SATA, OC-48). In clock recovery applications it is called timing jitter.

Jitter can be quantified in the same terms as all time-varying signals, e.g., RMS or peak-to-peak displacement. Also like other time-varying signals, jitter can be expressed in terms of spectral density (frequency content).

Jitter period is the interval between two times of maximum effect (or minimum effect) of a signal characteristic that varies regularly with time. Jitter frequency, the more commonly quoted figure, is its inverse. ITU-T G.810 classifies jitter frequencies below 10 Hz as wander and frequencies at or above 10 Hz as jitter.

Jitter may be caused by electromagnetic interference (EMI) and crosstalk with carriers of other signals. Jitter can cause a display monitor to flicker, affect the performance of processors in personal computers, introduce clicks or other undesired effects in audio signals, and cause loss of transmitted data between network devices. The amount of tolerable jitter depends on the affected application.

Clock jitter refers to the temporal variation of the clock period at a given point; that is, the clock period can reduce or expand on a cycle-by-cycle basis. It is strictly a temporal uncertainty measure and is often specified at a given point on the chip. Jitter can be measured and cited in one of many ways. Cycle-to-cycle jitter refers to the time-varying deviation of a single clock period and, for a given spatial location i, is given as

$$T_{jitter,i}(n) = T_{i,n+1} - T_{i,n} - T_{CLK}$$

where T_i,n is the clock period for period n, T_i,n+1 is the clock period for period n+1, and T_CLK is the nominal clock period.

Figure 4.5 Circuit for studying the impact of jitter on performance.

4.2.1 Jitter metrics

For clock jitter, there are three commonly used metrics: absolute jitter, period jitter, and cycle-to-cycle jitter.

Absolute jitter is the absolute difference in the position of a clock's edge from where it would ideally be.

Period jitter (also known as cycle jitter) is the difference between any one clock period and the ideal/average clock period. Accordingly, it can be thought of as the discrete-time derivative of absolute jitter. Period jitter tends to be important in synchronous circuitry like digital state machines, where the error-free operation of the circuitry is limited by the shortest possible clock period and the performance of the circuitry is limited by the average clock period. Hence, synchronous circuitry benefits from minimizing period jitter, so that the shortest clock period approaches the average clock period.

Cycle-to-cycle jitter is the difference in length/duration of any two adjacent clock periods. Accordingly, it can be thought of as the discrete-time derivative of period jitter. It can be important for some types of clock generation circuitry used in microprocessors and RAM interfaces.







Since they have different generation mechanisms, different circuit effects, and different measurement methodology, it is useful to quantify them separately.
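
The relationship among the three metrics can be illustrated with a short Python sketch. It is a hedged illustration, not a measurement procedure from the original text. It assumes the ideal edge positions form a grid anchored at the first measured edge with a known nominal period, and it keeps signed deviations so that the discrete-derivative relationships described above are visible. The function name and the edge timestamps (in picoseconds) are hypothetical.

```python
def jitter_metrics(edges, nominal_period):
    """Return (absolute, period, cycle_to_cycle) jitter sequences.

    edges: measured rising-edge times, in increasing order.
    The ideal position of edge k is assumed to be edges[0] + k * nominal_period.
    """
    # Signed deviation of each edge from its assumed ideal position.
    absolute = [t - (edges[0] + k * nominal_period) for k, t in enumerate(edges)]
    # Measured periods between consecutive edges.
    periods = [b - a for a, b in zip(edges, edges[1:])]
    # Deviation of each period from the nominal period.
    period_jitter = [p - nominal_period for p in periods]
    # Difference between adjacent periods.
    cycle_to_cycle = [b - a for a, b in zip(periods, periods[1:])]
    return absolute, period_jitter, cycle_to_cycle


# Hypothetical edge timestamps in picoseconds, nominal period 1000 ps.
absolute, period_jitter, c2c = jitter_metrics([0, 1000, 2010, 2990, 4000], 1000)
print(absolute)       # [0, 0, 10, -10, 0]
print(period_jitter)  # [0, 10, -20, 10]
print(c2c)            # [10, -30, 30]
```

In this example the period jitter sequence equals the successive differences of the absolute jitter, and the cycle-to-cycle sequence equals the successive differences of the period jitter.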

4.2.2 Types of Jitter

Three types of jitter are distinguished: random jitter, deterministic jitter, and total jitter. Each is discussed below.

Random Jitter

Random jitter, also called Gaussian jitter, is unpredictable electronic timing noise. Random jitter typically follows a Gaussian (normal) distribution. It is believed to follow this pattern because most noise or jitter in an electrical circuit is caused by thermal noise, which has a Gaussian distribution. Another reason is the central limit theorem, which states that the composite effect of many uncorrelated noise sources, regardless of their individual distributions, approaches a Gaussian distribution. One of the main differences between random and deterministic jitter is that deterministic jitter is bounded and random jitter is unbounded.

Deterministic Jitter

Deterministic jitter is a type of clock timing jitter or data signal jitter that is predictable and reproducible. The peak-to-peak value of this jitter is bounded, and the bounds can easily be observed and predicted. Deterministic jitter can either be correlated to the data stream (data-dependent jitter) or uncorrelated to the data stream (bounded uncorrelated jitter). Examples of data-dependent jitter are duty-cycle dependent jitter (also known as duty-cycle distortion) and intersymbol interference. Deterministic jitter (or DJ) has a known non-Gaussian probability distribution.

Total Jitter

Total jitter (T) is the combination of random jitter (R) and deterministic jitter (D):

$$T = D_{peak\text{-}to\text{-}peak} + 2 \times n \times R_{rms}$$

in which the value of n is based on the bit error rate (BER) required of the link.
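
As a worked illustration of the total jitter relation, the following Python sketch evaluates T = D_peak-to-peak + 2 × n × R_rms for a chosen BER. It is only a sketch: the function and dictionary names are hypothetical, the n values are the ones tabulated in this section for BERs of 10^-10 through 10^-14, and the jitter figures in the example are invented for illustration.

```python
# n for a required BER of 10**exponent (values as tabulated in this section).
N_FOR_BER_EXPONENT = {-10: 6.4, -11: 6.7, -12: 7.0, -13: 7.3, -14: 7.6}


def total_jitter(dj_pp, rj_rms, ber_exponent=-12):
    """Total jitter T = D_pp + 2 * n * R_rms, in the units of the inputs."""
    n = N_FOR_BER_EXPONENT[ber_exponent]
    return dj_pp + 2 * n * rj_rms


# Hypothetical link: 20 ps peak-to-peak deterministic jitter, 2 ps RMS random jitter.
print(total_jitter(20.0, 2.0))        # BER 10^-12
print(total_jitter(20.0, 2.0, -14))   # a stricter BER gives a larger total
```

For the assumed 20 ps and 2 ps figures, the BER of 10^-12 gives 20 + 2 × 7 × 2 = 48 ps. Tightening the BER requirement increases n and therefore the total jitter budget.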







A common bit error rate used in communication standards such as Ethernet is 10^-12. The relationship between n and bit error rate is given in the table below.

Table 4.1 Relationship between n and BER

n       BER
6.4     10^-10
6.7     10^-11
7       10^-12
7.3     10^-13
7.6     10^-14

4.3 Sources of Skew and Jitter

A perfect clock is defined as a perfectly periodic signal that is simultaneously triggered at various memory elements on the chip. However, due to a variety of process and environmental variations, clocks are not ideal. To illustrate the sources of skew and jitter, consider the simplistic view of clock generation and distribution shown in Figure 4.6. Typically, a high-frequency clock is either provided from off chip or generated on-chip. From a central point, the clock is distributed using multiple matched paths to low-level memory elements (registers). In this picture, two paths are shown. The clock paths include wiring and the associated distributed buffers required to drive interconnects and loads. A key point to realize in clock distribution is that the absolute delay through a clock distribution path is not important; what matters is the relative arrival time between the output of each path at the register points (i.e., it is perfectly acceptable for the clock signal to take multiple cycles to get from a central distribution point to a low-level register as long as all clocks arrive at the same time at the different registers on the chip).






Figure 4.6 Skew and jitter sources in synchronous clock distribution.

There are many reasons why the two parallel paths don't result in exactly the same delay. The sources of clock uncertainty can be classified in several ways. First, errors can be divided into systematic or random. Systematic errors are nominally identical from chip to chip, and are typically predictable (e.g., variation in total load capacitance of each clock path). In principle, such errors can be modeled and corrected at design time given sufficiently good models and simulators. Failing that, systematic errors can be deduced from measurements over a set of chips, and the design adjusted to compensate. Random errors are due to manufacturing variations (e.g., dopant fluctuations that result in threshold variations) that are difficult to model and eliminate.

Mismatch may also be characterized as static or time-varying. In practice, there is a continuum between changes that are slower than the time constant of interest and those that are faster. For example, temperature variations on a chip vary on a millisecond time scale. A clock network tuned by a one-time calibration or trimming would be vulnerable to time-varying mismatch due to varying thermal gradients. On the other hand, to a feedback network with a bandwidth of several megahertz, thermal changes appear essentially static. As another example, the clock net is usually by far the largest single net on the chip, and simultaneous transitions on the clock drivers induce noise on the power supply. However, this high-speed effect does not contribute to time-varying mismatch because it is the same on every clock cycle, affecting each rising clock edge the same way. Of course, this power supply glitch may still cause static mismatch if it is not the same throughout the chip. Below, the various sources of skew and jitter, introduced in Figure 4.6, are described in detail.

4.3.1 Clock-Signal Generation

The generation of the clock signal itself causes jitter. A typical on-chip clock generator, as described at the end of this chapter, takes a low-frequency reference clock signal and produces a high-frequency global reference for the processor. The core of such a generator is a Voltage-Controlled Oscillator (VCO). This is an analog circuit, sensitive to intrinsic device noise and power supply variations. A major problem is the coupling from the surrounding noisy digital circuitry through the substrate. This is particularly a problem in modern fabrication processes that combine a lightly-doped epitaxial layer and a heavily doped substrate (to combat latch-up). This causes substrate noise to travel over large distances on the chip. These noise sources cause temporal variations of the clock signal that propagate unfiltered through the clock drivers to the flip-flops, and result in cycle-to-cycle clock-period variations. This jitter causes performance degradation.

4.3.2 Manufacturing Device Variations

Distributed buffers are integral components of the clock distribution networks, as they are required to drive both the register loads and the global and local interconnects. The matching of devices in the buffers along multiple clock paths is critical to minimizing timing uncertainty. Unfortunately, as a result of process variations, device parameters in the buffers vary along different paths, resulting in static skew. There are many sources of variations, including oxide variations (that affect the gain and threshold), dopant variations, and lateral dimension (width and length) variations. The doping variations can affect the depth of junction and dopant profiles and cause variations in electrical parameters such as device threshold and parasitic capacitances. The orientation of polysilicon can also have a big impact on the device parameters. Keeping the orientation the same across the chip for the clock drivers is critical.

Variation in the polysilicon critical dimension is particularly important as it translates directly into MOS transistor channel length variation and resulting variations in the drive current and switching characteristics. Spatial variation usually consists of wafer-level (or within-wafer) variation and die-level (or within-die) variation. At least part of the variation is systematic and can be modeled and compensated for. The random variations, however, ultimately limit the matching and skew that can be achieved.

4.3.3 Interconnect Variations

Vertical and lateral dimension variations cause the interconnect capacitance and resistance to vary across a chip. Since this variation is static, it causes skew between different paths. One important source of interconnect variation is the Inter-level Dielectric (ILD) thickness variation. In the formation of aluminum interconnect, layers of silicon dioxide are interposed between layers of patterned metallization. The oxide layer is deposited over a layer of patterned metal features, generally resulting in some remaining step height or surface topography. Chemical-mechanical polishing (CMP) is used to "planarize" the surface and remove topography resulting from deposition and etch, as shown in Figure 4.7a. While at the feature scale (over an individual metal line) CMP can achieve excellent planarity, there are limitations on the planarization that can be achieved over a global range. This is primarily caused by variations in polish rate, which is a function of the circuit layout density and pattern effects. Figure 4.7b shows this effect, where the polish rate is higher for the lower spatial density region, resulting in smaller dielectric thickness and higher capacitance.

The assessment and control of variation is of critical importance in semiconductor process development and manufacturing. Significant advances have been made to develop analytical models for estimating the ILD thickness variations based on spatial density. Since this component is often predictable from the layout, it is possible to actually correct for the systematic component at design time (e.g., by adding appropriate delays or making the density uniform by adding "dummy fills"). Figure 4.8 shows the spatial pattern density and ILD thickness for a high performance microprocessor. The graphs show that there is clear correlation between the density and the thickness of the dielectric. So clock distribution networks must exploit such information to reduce clock skew.

Figure 4.7 Inter-level Dielectric (ILD) thickness variation due to density.

Other interconnect variations include deviation in the width of the wires and line spacing. This results from photolithography and etch dependencies. At the lower levels of metallization, lithographic effects are important, while at higher levels etch effects that depend on width and layout are important. The width is a critical parameter as it directly impacts the resistance of the line, and the wire spacing affects the wire-to-wire capacitance, which is the dominant component of capacitance.

Figure 4.8 Pattern density and ILD thickness variation for a high performance microprocessor.

4.3.4 Environmental Variations

Environmental variations are probably the most significant and primarily contribute to skew and jitter. The two major sources of environmental variations are temperature and power supply. Temperature gradients across the chip are a result of variations in power dissipation across the die. This has particularly become an issue with clock gating, where some parts of the chip may be idle while other parts of the chip might be fully active. This results in large temperature variations. Since the device parameters (such as threshold, mobility, etc.) depend strongly on temperature, buffer delay for a clock distribution network along one path can differ drastically from that along another path. More importantly, this component is time-varying since the temperature changes as the logic activity of the circuit varies. As a result, it is not sufficient to simulate the clock networks at worst-case corners of temperature; instead, the worst-case variation in temperature must be simulated. An interesting question is whether temperature variation contributes to skew or to jitter. Clearly the variation in temperature is time-varying, but the changes are relatively slow (typical time constants for temperature are on the order of milliseconds). Therefore it is usually considered as a skew component and the worst-case conditions are used. Fortunately, using feedback, it is possible to calibrate the temperature and compensate.

Power supply variations, on the other hand, are the major source of jitter in clock distribution networks. The delay through buffers is a very strong function of power supply, as it directly affects the drive of the transistors. As with temperature, the power supply voltage is a strong function of the switching activity. Therefore, the buffer delay along one path can be very different from the buffer delay along another path. Power supply variations can be classified into static (or slow) and high-frequency variations. Static power supply variations may result from fixed currents drawn from various modules, while high-frequency variations result from instantaneous IR drops along the power grid due to fluctuations in switching activity. Inductive issues on the power supply are also a major concern since they cause voltage fluctuations. This has particularly become a concern with clock gating, as the load current can vary dramatically as the logic transitions back and forth between the idle and active states. Since the power supply can change rapidly, the period of the clock signal is modulated on a cycle-by-cycle basis, resulting in jitter. The jitter on two different clock points may be correlated or uncorrelated depending on how the power network is configured and the profile of switching patterns. Unfortunately, high-frequency power supply changes are difficult to compensate even with feedback techniques. As a result, power supply noise fundamentally limits the performance of clock networks.

4.3.5 Capacitive Coupling

The variation in capacitive load also contributes to timing uncertainty. There are two major sources of capacitive load variations: coupling between the clock lines and adjacent signal wires, and variation in gate capacitance. The clock network includes both the interconnect and the gate capacitance of latches and registers. Any coupling between the clock wire and an adjacent signal results in timing uncertainty. Since the adjacent signal can transition in arbitrary directions and at arbitrary times, the exact coupling to the clock network is not fixed from cycle to cycle. This results in clock jitter. Another major source of clock uncertainty is variation in the gate capacitance related to the sequential elements. The load capacitance is highly non-linear and depends on the applied voltage. In many latches and registers this translates to the clock load being a function of the current state of the latch/register (that is, the values stored on the internal nodes of the circuit), as well as the next state. This causes the delay through the clock buffers to vary from cycle to cycle, causing jitter.

4.4 Impact of Skew and Jitter on Performance of Sequential Circuits

Consider the sequential circuit shown in Figure 4.9. Assume that nominally ideal clocks are distributed to both registers (the clock period is identical every cycle and the skew is 0). In reality, there is static skew δ between the two clock signals (assume that δ > 0). Assume that CLK1 has a jitter of t_jitter1 and CLK2 has a jitter of t_jitter2. To determine the constraint on the minimum clock period, we must look at the minimum available time to perform the required computation. The worst case happens when the leading edge of the current clock period on CLK1 happens late (edge 3) and the leading edge of the next cycle of CLK2 happens early (edge 10). This results in the following constraint:

$$T_{CLK} + \delta - t_{jitter1} - t_{jitter2} \ge t_{c-q} + t_{logic} + t_{su} \quad \text{or} \quad T \ge t_{c-q} + t_{logic} + t_{su} - \delta + t_{jitter1} + t_{jitter2} \qquad (4.3)$$

Figure 4.9 The impact of positive skew and jitter on edge-triggered systems.

As the above equation illustrates, while positive skew can provide a potential performance advantage, jitter has a negative impact on the minimum clock period. To formulate the minimum delay constraint, consider the case when the leading edge of the CLK1 cycle arrives early (edge 1) and the leading edge of the current cycle of CLK2 arrives late (edge 6). The separation between edges 1 and 6 should be smaller than the minimum delay through the network. This results in

$$\delta + t_{hold} + t_{jitter1} + t_{jitter2} < t_{(c-q,cd)} + t_{(logic,cd)} \quad \text{or} \quad \delta < t_{(c-q,cd)} + t_{(logic,cd)} - t_{hold} - t_{jitter1} - t_{jitter2} \qquad (4.4)$$

The above relation indicates that the acceptable skew is reduced by the jitter of the two signals.
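
The effect of jitter on both bounds can be made concrete with a small Python sketch of the two relations above. This is an illustrative sketch only: the function names are hypothetical, and the delay, skew, and jitter values (in picoseconds) are assumed numbers rather than data from any real design.

```python
def min_period_with_jitter(t_cq, t_logic, t_su, skew, t_jitter1, t_jitter2):
    """Minimum period: T >= t_cq + t_logic + t_su - skew + t_jitter1 + t_jitter2."""
    return t_cq + t_logic + t_su - skew + t_jitter1 + t_jitter2


def max_skew_with_jitter(t_cq_cd, t_logic_cd, t_hold, t_jitter1, t_jitter2):
    """Hold bound: the skew must stay strictly below the returned value."""
    return t_cq_cd + t_logic_cd - t_hold - t_jitter1 - t_jitter2


# Hypothetical values in picoseconds.
period = min_period_with_jitter(100, 800, 50, skew=50, t_jitter1=20, t_jitter2=30)
skew_bound = max_skew_with_jitter(60, 40, 30, t_jitter1=20, t_jitter2=30)
print(period)      # 950: the 50 ps gained from skew is consumed by 50 ps of jitter
print(skew_bound)  # 20: without jitter the bound would be 70
```

With these assumed numbers, a 50 ps skew that would satisfy the jitter-free hold bound of 70 ps now violates the 20 ps bound, which shows how jitter reduces the acceptable skew.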





Now consider the case when the skew is negative (δ < 0), as shown in Figure 4.10. For the timing shown, |δ| > t_jitter2. It can be easily verified that the worst-case timing is exactly the same as in the previous analysis, with δ taking a negative value. That is, negative skew reduces performance.

Figure 4.10 The impact of negative skew and jitter on edge-triggered systems.

4.5 Timing Constraints due to Clock Skew

The magnitude and polarity of the clock skew have a two-sided effect on system performance and reliability. Depending upon whether C_i leads or lags C_f and upon the magnitude of T_Skew with respect to T_PD, system performance and reliability can either be degraded or enhanced. These cases are discussed below.

Figure 4.11 Timing diagram of clocked local data path.

4.5.1 Maximum Data Path/Clock Skew Constraint Relationship

For a design to meet its specified timing requirements, the greatest propagation delay of any data path between a pair of data registers, R_i and R_f, being synchronized by a clock distribution network must be less than the minimum clock period (the inverse of the maximum clock frequency) of the circuit, as shown in (4.1). If the time of arrival of the clock signal at the final register of a data path, T_Cf, leads the time of arrival of the clock signal at the initial register of the same sequential data path, T_Ci [see Fig. 4.12(A)], the clock skew is referred to as positive clock skew and, under this condition, the maximum attainable operating frequency is decreased. Positive clock skew is the additional amount of time which must be added to the minimum clock period to reliably apply a new clock signal at the final register, where reliable operation implies that the system will function correctly at low as well as at high frequencies (assuming fully static logic). It should be noted that positive clock skew only affects the maximum frequency of a system and cannot create race conditions.

Figure 4.12 Clock timing diagrams.

In the positive clock skew case, the clock signal arrives at R_f before it reaches R_i. The maximum permissible positive clock skew can be expressed as

$$T_{Skew} \le T_{CP} - T_{PD(max)} = T_{CP} - \left(T_{C-Q} + T_{Logic(max)} + T_{Int} + T_{Set-up}\right) \quad \text{for } T_{Ci} > T_{Cf} \qquad (4.5)$$

where T_PD(max) is the maximum path delay between two sequentially-adjacent registers. This situation is the typical critical path timing analysis requirement commonly seen in most high-performance synchronous digital systems. If (4.5) is not satisfied, the system will not operate correctly at that specific clock period (or clock frequency). Therefore, T_CP must be increased for the circuit to operate correctly, thereby decreasing the system performance. In circuits where the tolerance for positive clock skew is small [T_Skew in (4.5) is small], the clock and data signals should be run in the same direction, thereby forcing C_f to lag C_i and making the clock skew negative.

4.5.2 Minimum Data Path/Clock Skew Constraint Relationship

If the clock signal arrives at R_i before it reaches R_f [see Fig. 4.12(B)], the clock skew is defined as being negative. Negative clock skew can be used to improve the maximum performance of a synchronous system by decreasing the delay of a critical path; however, a potential minimum constraint can occur, creating a race condition. In this case, when C_f lags C_i, the clock skew must be less than the time required for the data signal to leave the initial register, propagate through the interconnect and combinatorial logic, and set up in the final register (see Fig. 4.11). If this condition is not met, the data stored in register R_f is overwritten by the data that had been stored in register R_i and has propagated through the combinatorial logic. Furthermore, a circuit operating close to this condition might pass system diagnostics but malfunction at unpredictable times due to fluctuations in ambient temperature or power supply voltage. Correct operation requires that R_f latches data which correspond to the data R_i latched during the previous clock period. This constraint on clock skew is

$$|T_{Skew}| \le T_{PD(min)} = T_{C-Q} + T_{Logic(min)} + T_{Int} + T_{Hold} \quad \text{for } T_{Cf} > T_{Ci} \qquad (4.6)$$

Figure 4.13 K-bit shift register with positive clock skew.

where T_PD(min) is the minimum path delay between two sequentially-adjacent registers and T_Hold is the amount of time the input data signal must be stable once the clock signal changes state. An important example in which this minimum constraint can occur is in those designs which use cascaded registers, such as a serial shift register or a k-bit counter, as shown in Fig. 4.13 (note that a distributed RC impedance is between C_i and C_f). In cascaded register circuits, T_Logic(min) is zero and T_Int approaches zero (since cascaded registers are typically designed, at the geometric level, to abut). If T_Cf > T_Ci (i.e., negative clock skew), then the minimum constraint becomes

$$|T_{Skew}| \le T_{C-Q} + T_{Hold} \quad \text{for } T_{Cf} > T_{Ci} \qquad (4.7)$$

and all that is necessary for the system to malfunction is a poor relative placement of the flip-flops or a highly resistive connection between C_i and C_f. In a circuit configuration such as a shift register or counter, where negative clock skew is a more serious problem than positive clock skew, provisions should be made to force C_f to lead C_i, as shown in Fig. 4.13.
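
The difference between a general local data path and a cascaded-register path can be seen by evaluating the minimum constraint for both. The Python sketch below is an assumption-laden illustration: the function names are hypothetical, the bound is taken as T_C-Q + T_Logic(min) + T_Int + T_Hold, and the delay values (in picoseconds) are invented.

```python
def max_skew_magnitude(t_cq, t_hold, t_logic_min=0, t_int=0):
    """Upper bound on |T_Skew| when the final register is clocked later.

    With the defaults t_logic_min = 0 and t_int = 0 this reduces to the
    cascaded-register (shift register) case, T_C-Q + T_Hold.
    """
    return t_cq + t_logic_min + t_int + t_hold


def is_race_free(skew_magnitude, t_cq, t_hold, t_logic_min=0, t_int=0):
    """True when |T_Skew| does not exceed the minimum path delay bound."""
    return skew_magnitude <= max_skew_magnitude(t_cq, t_hold, t_logic_min, t_int)


# Hypothetical values in picoseconds.
print(max_skew_magnitude(100, 30, t_logic_min=150, t_int=50))  # general path: 330
print(max_skew_magnitude(100, 30))                             # shift register: 130
print(is_race_free(200, 100, 30, t_logic_min=150, t_int=50))   # True
print(is_race_free(200, 100, 30))                              # False
```

Under these assumed values, the same 200 ps of skew is harmless on a path with logic and interconnect delay but causes a race between abutted registers, which is why shift registers and counters need special care in clock routing.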

As higher levels of integration are achieved in high-complexity VLSI circuits, on-chip testability becomes necessary. Data registers, configured in the form of serial set/scan chains when operating in the test mode, are a common example of a design for testability (DFT) technique. The placement of these circuits is typically optimized around the functional flow of the data. When the system is reconfigured to use the registers in the role of the set/scan function, different local path delays are possible. In particular, the clock skew of the reconfigured local data path can be negative and greater in magnitude than the local register delays. Therefore, with increased negative clock skew, (4.7) may no longer be satisfied and incorrect data may latch into the final register of the reconfigured local data path. It is thus imperative that attention be placed on the clock distribution of those paths that have nonstandard modes of operation.

In ideal scaling of MOS devices, all linear dimensions and voltages are multiplied by the factor 1/S, where S > 1. Device-dependent delays, such as T_C-Q, T_Set-up, and T_Logic, scale as 1/S, while interconnect-dominated delays such as T_Skew remain constant to first order and, if fringing capacitance and electromigration are considered, actually increase with decreasing dimensions. Therefore, when examining the effects of dimensional scaling on system reliability, (4.6) and (4.7) should be considered carefully. One straightforward method to avoid the effect of technology scaling on those data paths particularly susceptible to negative clock skew is to not scale the clock distribution lines. Svensson and Afghahi show that by using coarser than ordinary lines for the global clock distribution, 20-mm-wide chip sizes with CMOS circuits scaled to 0.3 µm polysilicon lines would have comparable logic and cross-chip interconnect delays (on the order of 0.5 ns), making possible synchronous clock frequencies of up to 1 GHz. Therefore, the scaling of device technologies can severely affect the design and operation of clock distribution networks, necessitating specialized strategies and compensation techniques.
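
The scaling argument can be illustrated numerically. The Python sketch below is a first-order illustration under stated assumptions: the device-dependent delays scale as 1/S, the skew stays constant, the hold time is assumed to scale like the other device delays (the text does not say so explicitly), and the constraint examined is the cascaded-register bound |T_Skew| ≤ T_C-Q + T_Hold. Function names and delay values (in picoseconds) are hypothetical.

```python
def hold_margin_after_scaling(t_cq, t_hold, skew_magnitude, s):
    """Margin (T_C-Q + T_Hold) / S - |T_Skew|; negative means a race.

    Assumes device delays scale as 1/S while the skew remains constant.
    """
    if s < 1:
        raise ValueError("scaling factor S must be at least 1")
    return (t_cq + t_hold) / s - skew_magnitude


def critical_scaling_factor(t_cq, t_hold, skew_magnitude):
    """Value of S at which the margin reaches zero."""
    return (t_cq + t_hold) / skew_magnitude


# Hypothetical unscaled values in picoseconds.
print(hold_margin_after_scaling(100, 30, 80, s=1))  # 50.0: constraint met
print(hold_margin_after_scaling(100, 30, 80, s=2))  # -15.0: constraint violated
print(critical_scaling_factor(100, 30, 80))         # 1.625
```

In this assumed case a path that is safe in the unscaled technology fails once the devices are scaled by more than a factor of about 1.6, even though nothing about the clock wiring has changed.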

4.5.3 Enhancement of Synchronous Performance

Localized clock skew can be used to improve synchronous performance by providing more time for the critical worst-case data paths. By forcing C_i to lead C_f at each critical local data path, excess time is shifted from the neighboring less critical local data paths to the critical local data paths. This negative clock skew represents the additional amount of time that the data signal at R_i has to propagate through the logic stages and interconnect sections and into the final register. Negative clock skew subtracts from the logic path delay, thereby decreasing the minimum clock period. Thus, applying negative clock skew, in effect, increases the total time that a given critical data path has to accomplish its functional requirements by giving the data signal released from R_i more time to propagate through the logic and interconnect stages and latch into R_f. Thus, the differences in delay between each local data path are minimized, thereby compensating for any inefficient partitioning of the global data path into local data paths that may have occurred, a common situation in many practical systems.

The maximum permissible negative clock skew of a data path, however, is dependent upon the clock period itself as well as the time delay of the previous data paths. This results from the structure of the serially cascaded local data paths making up the global data path. Since a particular clock signal synchronizes a register which functions in a dual role, as the initial register of the next local data path and as the final register of the previous data path, the earlier Ci is for a given data path, the earlier that same clock signal, now Cf, is for the previous data path. Thus, the use of negative clock skew in the ith path results in a positive clock skew for the preceding path, which may then establish the new upper limit for the system clock frequency.
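The coupling between neighboring paths can be sketched numerically. The following is a minimal sketch, assuming each local path must satisfy "clock period >= path delay + (arrival at initial register - arrival at final register)". The function name, the three-register pipeline, and all delay values are hypothetical, and hold-time (race) constraints are deliberately not modeled.

```python
def min_clock_period(path_delays, clock_arrivals):
    """Smallest clock period that satisfies every local data path.

    path_delays[k]    : total delay of the local path from register k to k+1
    clock_arrivals[k] : clock arrival time at register k (same time unit)
    Returns (period, per_path_requirements).
    """
    if len(clock_arrivals) != len(path_delays) + 1:
        raise ValueError("need exactly one more register than local data paths")
    required = []
    for k, delay in enumerate(path_delays):
        # Skew of path k: negative when the initial register is clocked first.
        skew = clock_arrivals[k] - clock_arrivals[k + 1]
        required.append(delay + skew)
    return max(required), required


# Hypothetical pipeline R0 -> R1 -> R2 with path delays in ns.
delays = [4.0, 8.0]

# Zero skew: the 8 ns path sets the clock period.
zero_skew_period, zero_skew_paths = min_clock_period(delays, [0.0, 0.0, 0.0])

# Clock R1 2 ns early: negative skew on R1->R2, positive skew on R0->R1.
skewed_period, skewed_paths = min_clock_period(delays, [0.0, -2.0, 0.0])

# Clock R1 3 ns early: the preceding path R0->R1 now limits the period.
over_skewed_period, over_skewed_paths = min_clock_period(delays, [0.0, -3.0, 0.0])

print(zero_skew_period, zero_skew_paths)
print(skewed_period, skewed_paths)
print(over_skewed_period, over_skewed_paths)
```

In this toy case the best result is reached when both paths need the same period. Advancing the shared clock any further only moves the bottleneck to the preceding path.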









CHAPTER 5

CLOCK DISTRIBUTION TECHNIQUES

The most effective way to get the clock signal to every part of a chip that needs it, with the lowest skew, is a metal grid. In a large microprocessor, the power used to drive the clock signal can be over 30% of the total power used by the entire chip. The whole structure, with the gates at the ends and all the amplifiers in between, has to be loaded and unloaded every cycle. To save energy, clock gating temporarily shuts off part of the tree.
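A rough feel for why gating helps can be had from the usual dynamic power estimate for a net that is charged and discharged once per cycle, P = C * V^2 * f. The sketch below assumes that model. The function name, the `active_fraction` parameter, and the numbers (1 nF of clock capacitance, 1.0 V supply, 1 GHz) are illustrative assumptions, not figures from this report.

```python
def clock_dynamic_power(capacitance_f, vdd_v, freq_hz, active_fraction=1.0):
    """Dynamic power (W) of a clock network modeled as one switched capacitance.

    active_fraction is the share of the clock capacitance that still toggles;
    1.0 means no gating, 0.6 means 40% of the tree is gated off.
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must lie between 0 and 1")
    return active_fraction * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical clock network: 1 nF total, 1.0 V supply, 1 GHz clock.
full = clock_dynamic_power(1e-9, 1.0, 1e9)
gated = clock_dynamic_power(1e-9, 1.0, 1e9, active_fraction=0.6)

print(f"ungated: {full:.2f} W, 40% gated: {gated:.2f} W")
```

The model ignores short-circuit and leakage power and the overhead of the gating logic itself, so it only indicates the trend.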

The clock distribution network (or clock tree, when this network forms a tree) distributes the clock signal(s) from a common point to all the elements that need it. Since this function is vital to the operation of a synchronous system, much attention has been given to the characteristics of these clock signals and the electrical networks used in their distribution. Clock signals are often regarded as simple control signals; however, these signals have some very special characteristics and attributes.

5.1 Clock Distribution Strategies

Many different approaches, from ad hoc to algorithmic, have been developed for designing clock distribution networks in synchronous digital integrated circuits. The requirement of distributing a tightly controlled clock signal to each synchronous register on a large nonredundant hierarchically structured integrated circuit (an example floorplan is shown in Fig. 5.1) within specific temporal bounds is difficult and problematic. Furthermore, the tradeoffs that exist among system speed, physical die area, and power dissipation are greatly affected by the clock distribution network. The design methodology and structural topology of the clock distribution network should be considered in the development of a system for distributing the clock signals.

Therefore, various clock distribution strategies have been developed. The most common and general approach to equipotential clock distribution is the use of buffered trees. In contrast to these asymmetric structures, symmetric trees, such as H-trees, are used to distribute high-speed clock signals. In developing structured custom integrated circuits, such as is illustrated by the floorplan pictured in Fig. 5.1, specific circuit design techniques are used to control the signal properties within the clock distribution network. Low-power design techniques are an area of significant currency and importance. Several strategies exist for reducing the power dissipated within the clock distribution network, clock gating being one example.

Figure 5.1 Floorplan of structured custom VLSI circuit using synchronous clock distribution.

5.2 Buffered Clock Distribution Trees

The most common strategy for distributing clock signals in VLSI-based systems is to insert buffers at the clock source and/or along a clock path, forming a tree structure. Thus, the unique clock source is frequently described as the root of the tree, the initial portion of the tree as the trunk, individual paths driving each register as the branches, and the registers being driven as the leaves. This metaphor for describing a clock distribution network is commonly accepted and used throughout the literature and is illustrated in Fig. 5.2.

Figure 5.2 Tree structure of clock distribution network.





Occasionally, a mesh version of the clock tree structure is used in which shunt paths further down the clock distribution network are placed to minimize the interconnect resistance within the clock tree. This mesh structure effectively places the branch resistances in parallel, minimizing the clock skew.
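The effect of the shunt paths can be sketched with the ordinary parallel-resistance formula. This is an idealized illustration that treats the shunted branches as resistors sharing both end nodes. The function name and the 40 ohm branch value are hypothetical.

```python
def parallel_resistance(resistances):
    """Equivalent resistance of branches that share both end nodes."""
    resistances = list(resistances)
    if not resistances or any(r <= 0 for r in resistances):
        raise ValueError("need at least one positive resistance")
    return 1.0 / sum(1.0 / r for r in resistances)


# One 40 ohm branch on its own versus four such branches shunted together.
single_branch = parallel_resistance([40.0])
meshed = parallel_resistance([40.0] * 4)

print(f"single branch: {single_branch:.1f} ohm, meshed: {meshed:.1f} ohm")
```

A lower effective resistance shrinks the RC delay of each branch, and with it the delay differences between branches that show up as skew.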

An example of this mesh structure is described and illustrated in Section IX-B. The mesh version of the clock tree is considered in this report as an extended version of the standard, more commonly used clock tree depicted in Fig. 5.2. The clock distribution network is therefore typically organized as a rooted tree structure, as illustrated in Fig. 5.2. Various forms of a clock distribution network, including trunk, tree, mesh, and H-tree structures, are illustrated in Fig. 5.3. If the resistance of the interconnect driven by the buffer at the clock source is small compared to the buffer output resistance, a single buffer is often used to drive the entire clock distribution network. This strategy may be appropriate if the clock is distributed entirely on metal, making load balancing of the network less critical. The primary requirement of a single buffer system is that the buffer should provide sufficient current to drive the network capacitance (both interconnect and fanout) while maintaining high-quality waveform shapes (i.e., short transition times) and minimizing the effects of the interconnect resistance by ensuring that the output resistance of the buffer is much greater than the resistance of the interconnect section being driven.

Figure 5.3 Structures of clock distribution networks including a trunk, tree, mesh, and H-tree.





An alternative approach to using only a single buffer at the clock source is to distribute buffers throughout the clock distribution network, as shown in Fig. 5.2. This approach requires additional area but greatly improves the precision and control of the clock signal waveforms and is necessary if the resistance of the interconnect lines is nonnegligible. The distributed buffers serve the double function of amplifying the clock signals degraded by the distributed interconnect impedances and isolating the local clock nets from upstream load impedances. A three-level buffer clock distribution network utilizing this strategy is shown in Fig. 5.4. In this approach a single buffer drives multiple clock paths (and buffers). The number of buffer stages between the clock source and each clocked register depends upon the total capacitive loading, in the form of registers and interconnect, and the permissible clock skews. It is worth noting that the buffers are a primary source of the total clock skew within a well-balanced clock distribution network since the active device characteristics vary much more greatly than the passive device characteristics. The maximum number of buffers driven by a single buffer is determined by the current drive of the source buffer and the capacitive load (assuming an MOS technology) of the destination buffers. The final buffer along each clock path provides the control signal of the driven register.
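The sizing argument above can be sketched as two small calculations: how many destination buffers one source buffer can drive, and how many buffer levels are then needed to reach every register. This is a simplified sketch. The "drive capacitance limit" stands in for the current drive of the source buffer at an acceptable transition time, and all names and capacitance values (in fF) are hypothetical.

```python
def max_fanout(drive_capacitance_limit, buffer_input_capacitance,
               wire_capacitance_per_branch=0):
    """Largest number of destination buffers one source buffer can drive."""
    per_branch = buffer_input_capacitance + wire_capacitance_per_branch
    if per_branch <= 0:
        raise ValueError("per-branch load must be positive")
    return int(drive_capacitance_limit // per_branch)


def buffer_levels(num_registers, fanout):
    """Fewest buffer levels such that fanout ** levels >= num_registers."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2 to build a tree")
    levels = 1
    while fanout ** levels < num_registers:
        levels += 1
    return levels


# Hypothetical numbers: a buffer that can drive 200 fF, destination buffers
# with 20 fF of input capacitance, and 5 fF of wire per branch.
fanout = max_fanout(200, 20, 5)

# Levels needed to reach 500 clocked registers with that fanout.
levels = buffer_levels(500, fanout)

print(fanout, levels)
```

With these assumed values the calculation lands on a three-level buffer tree. A real design would also check the skew contributed by each added buffer stage.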

Figure 5.4 Three-level buffer clock distribution network.

Historically, the primary design goal in clock distribution networks has been to ensure that a clock signal arrives at every register within the entire synchronous system at precisely the same time. This concept of zero clock skew design has been extended to provide either a positive or a negative clock skew at a magnitude depending upon the temporal characteristics of each local data path in order to improve system performance and enhance system reliability.





5.3 Symmetric H-Tree Clock Distribution Networks

Another approach for distributing clock signals, a subset of the distributed buffer approach depicted in Fig. 5.2, utilizes a hierarchy of planar symmetric H-tree or X-tree structures (see Fig. 5.5) to ensure zero clock skew by maintaining the distributed interconnect and buffers to be identical from the clock signal source to the clocked register of each clock path.

Figure 5.5 Symmetric H-tree and X-tree clock distribution networks.

In this approach, the primary clock driver is connected to the center of the main "H" structure. The clock signal is transmitted to the four corners of the main "H." These four close to identical clock signals provide the inputs to the next level of the H-tree hierarchy, represented by the four smaller "H" structures. The distribution process continues through several levels of progressively smaller "H" structures. The final destination points of the H-tree are used to drive the local registers or are amplified by local buffers which drive the local registers. Thus, each clock path from the clock source to a clocked register has practically the same delay. The primary delay difference between the clock signal paths is due to variations in process parameters that affect the interconnect impedance and, in particular, any active distributed buffer amplifiers. The amount of clock skew within an H-tree structured clock distribution network is strongly dependent upon the physical size, the control of the semiconductor process, and the degree to which active buffers and clocked latches are distributed within the H-tree structure.
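The geometric reason for the equal delays can be checked with a few lines of code. The sketch below builds an idealized H-tree in which each "H" has its corners at (±h, ±h) around its center and the next level uses half that size. It records the wire length from the clock source to every leaf. The function name and the unit dimensions are assumptions made for illustration, and process variation and buffers are ignored.

```python
def h_tree_leaves(levels, half_width=1.0):
    """Return (x, y, path_length) for every leaf of an idealized H-tree.

    At each level the wire runs a distance h horizontally from the centre of
    the current "H", then h vertically to one of its four corners. Each corner
    becomes the centre of an "H" of half the size at the next level.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    points = [(0.0, 0.0, 0.0)]  # the clock driver sits at the centre
    h = half_width
    for _ in range(levels):
        next_points = []
        for x, y, length in points:
            for dx in (-h, h):
                for dy in (-h, h):
                    next_points.append((x + dx, y + dy, length + 2 * h))
        points = next_points
        h /= 2.0
    return points


leaves = h_tree_leaves(levels=2)
path_lengths = {length for _, _, length in leaves}

print(len(leaves), path_lengths)
```

Every leaf of the two-level tree ends up at a distinct point on a regular 4 x 4 grid, and all source-to-leaf wire lengths are identical, which is the property the H-tree relies on for nominally zero skew.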

The conductor widths in H-tree structures are designed to progressively decrease as the signal propagates to lower levels of the hierarchy. This strategy minimizes reflections of the high-speed clock signals at the branching points. Specifically, the impedance of the conductor leaving each branch point, $Z_{k+1}$, must be twice the impedance of the conductor providing the signal to the branch point, $Z_k$, for an H-tree structure and four times the impedance for an X-tree structure. This tapered H-tree structure is illustrated in Fig. 5.6:

$$Z_k = \frac{Z_{k+1}}{2} \quad \text{for an H-tree structure} \qquad (5.1)$$

Figure 5.6 Tapered H-tree clock distribution network.
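The factor of two (or four) follows from treating the outgoing conductors at a branch point as a parallel load on the incoming conductor. The sketch below assumes lossless lines whose branches are long enough to present their characteristic impedance. The function name and the 50 ohm starting value are illustrative assumptions.

```python
def branch_reflection(z_parent, z_child, branches):
    """Reflection coefficient seen by a wave arriving at a branch point.

    The identical outgoing conductors appear in parallel, so the load is
    z_child / branches; the result is (Z_load - Z_0) / (Z_load + Z_0).
    """
    if z_parent <= 0 or z_child <= 0 or branches < 1:
        raise ValueError("impedances must be positive and branches >= 1")
    z_load = z_child / branches
    return (z_load - z_parent) / (z_load + z_parent)


# H-tree: two outgoing conductors, each with twice the parent impedance.
h_tree_matched = branch_reflection(50.0, 100.0, branches=2)

# X-tree: four outgoing conductors, each with four times the parent impedance.
x_tree_matched = branch_reflection(50.0, 200.0, branches=4)

# Untapered H-tree: same conductor width before and after the branch.
h_tree_untapered = branch_reflection(50.0, 50.0, branches=2)

# Impedance per level of a tapered H-tree that starts from 50 ohm.
tapered_levels = [50.0 * 2 ** k for k in range(4)]

print(h_tree_matched, x_tree_matched, round(h_tree_untapered, 3), tapered_levels)
```

With tapering the branch point looks matched and nothing is reflected. Without it, roughly a third of the incident amplitude is reflected with inverted sign at every branch.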

The planar H-tree structure places constraints on the physical layout of the clock distribution network as well as on the design methodology used in the development of the VLSI system. For example, in an H-tree network, clock lines must be routed in both the vertical and horizontal directions. For a standard two-level metal CMOS process, this Manhattan structure creates added difficulty in routing the clock lines without using either resistive interconnect or multiple high resistance vias between the two metal lines. This aspect is a primary reason for the development of three or more layers of metal in logic-based CMOS processes. Furthermore, the interconnect capacitance (and therefore the power dissipation) is much greater for the H-tree as compared with the standard clock tree since the total wire length tends to be much greater. This increased capacitance of the H-tree structure exemplifies an important tradeoff between clock delay and clock skew in the design of high-speed clock distribution networks. Symmetric structures are used to minimize clock skew; however, an increase in clock signal delay is incurred. Therefore, the increased clock delay must be considered when choosing between buffered tree and H-tree clock distribution networks. Also, since clock skew only affects sequentially-adjacent registers, the obvious advantages to using highly symmetric structures to distribute clock signals are significantly degraded. There may, however, be certain sequentially-adjacent registers distributed across the integrated circuit. For this situation, a symmetric H-tree structure may be appropriate, particularly to distribute the global portion of the clock network.

CHAPTER 6

CONCLUSIONS

An in-depth analysis of synchronous digital circuits and clocking approaches was presented. Clock skew and jitter have a major impact on the functionality and performance of a system. Important parameters are the clocking scheme used and the nature of the clock-generation and distribution network. Alternative timing approaches, such as self-timed design, are becoming attractive to deal with clock distribution problems. Self-timed design uses completion signals and handshaking logic to isolate physical timing constraints from event ordering.

The connection of synchronous and asynchronous components introduces the risk of synchronization failure. The introduction of synchronizers helps to reduce that risk, but can never eliminate it. The key message is that synchronization and timing are among the most intriguing challenges facing the digital designer of the next decade.

The different timing methodologies were discussed according to how digital systems relate to the clock signal. The timing parameters of sequential circuits relative to the clock signal were described, and the sequential circuits were analyzed from the point of view of propagation delay, contamination delay, setup time and hold time. The spatial variation in arrival time of a clock transition on an integrated circuit (i.e., clock skew) and the temporal variation of the clock period at a given point (i.e., clock jitter) were explained in detail. Various clock distribution networks were presented. Of these, the H-tree clock distribution network is used to minimize the clock skew.

It is the intention of this report to integrate these various topics and to provide some sense of cohesiveness to the field of timing, clocking and clock distribution networks.





