A NOVEL HIGH RESOLUTION DELAY LOCKED LOOP
by
ARDESHIR SAGHAFI
B.Sc, The University of Science and Technology
Tehran, Iran, 1989
A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF APPLIED SCIENCE
in
THE FACULTY OF GRADUATE STUDIES
(Electrical & Computer Engineering)
THE UNIVERSITY OF BRITISH COLUMBIA
July 2005
© Ardeshir Saghafi, 2005
Abstract
With the rapid advances in semiconductor technology, modern digital systems operating at
GHz frequencies have been successfully developed for many years. As chip size gets
progressively bigger, and the number of logic gates and chip operating frequencies
increase, clock skew becomes increasingly important in ensuring the proper
functioning of VLSI chips. With a synchronous methodology, it is impossible to increase
the clock speed further without reducing the clock skew on the chip.
Phase Locked Loops (PLLs) and Delay Locked Loops (DLLs) have been widely
adopted to solve the clock skew problem. In recent years, DLLs
have been widely used for clock alignment due to their lower phase-error accumulation
and faster locking time. In this thesis, a novel high-resolution DLL with a resolution of less
than 10 ps is proposed, which combines the coarse and fine delay lines into an efficient
hybrid delay line. Consequently, it saves power and area.
Table of Contents
Abstract
Table of Contents
List of Figures
Acknowledgment
CHAPTER 1 Introduction
1.1 Clock skew
1.2 Delay Locked Loop
1.3 DLL vs. PLL
1.4 Applications
1.4.1 Clock distribution
1.4.2 SDRAM
1.4.3 Time-to-Digital converter (TDC)
1.4.4 Automatic Test Equipment (ATE)
1.4.5 Clock synthesis
1.4.6 Clock and data recovery (CDR)
CHAPTER 2 Background
2.1 Analog DLL
2.2 Digital DLL
2.3 Double loop DLL
2.4 Synchronous Mirror Delay (SMD)
2.5 Register controlled DLL (RDLL)
2.6 Vernier Delay Locked Loop (VDLL)
CHAPTER 3 Design of proposed DLL
3.1 Block diagram
3.2 DLL modules description
3.2.1 Vernier delay line
3.2.2 Vernier delay line controller
3.2.3 High resolution phase detector
3.2.4 Lock detector
CHAPTER 4 Analysis of proposed DLL
4.1 Testbench
4.2 Initial lock
4.3 Lock re-entry
4.3.1 Lock re-entry (case 1)
4.3.2 Lock re-entry (case 2)
4.4 Gate count of vernier unit delay
4.5 Resolution of the proposed DLL
4.6 Limitations of the proposed DLL
CHAPTER 5 Conclusion
Bibliography
Appendix A Design VHDL code
Appendix B Synthesis result
List of Figures
Figure 1.1 Possible hold violation due to clock skew
Figure 1.2 Possible setup violation due to clock skew
Figure 1.3 Typical DLL block diagram
Figure 1.4 Typical PLL block diagram
Figure 1.5 SDRAM output timing with and without a DLL
Figure 1.6 Block diagram of the laser range finder [101]
Figure 2.1 Conventional analog DLL
Figure 2.2 Analog DLL with duty-cycle correction
Figure 2.3 Analog multiphase DLL
Figure 2.4 Digital DLL block diagram
Figure 2.5 Dual loop DLL
Figure 2.6 Conventional SMD
Figure 2.7 Timing diagram of a conventional SMD
Figure 2.8 Block diagram of direct SMD
Figure 2.9 Register Controlled DLL (RDLL)
Figure 2.10 Core circuit in RDLL
Figure 2.11 Core circuit in a RSDLL
Figure 2.12 Block diagram of a vernier delay line [73]
Figure 2.13 Schematic of vernier delay line [73]
Figure 3.1 Block diagram of proposed DLL
Figure 3.2 Circuit and timing diagram of a conventional unit delay
Figure 3.3 Circuit and timing diagram of a symmetrical unit delay
Figure 3.4 CMOS NAND gate
Figure 3.5 Phase Detector block
Figure 3.6 Lock Detector block
Figure 3.7 Controller block
Figure 3.8 Vernier delay line block
Figure 3.9 SAR DLL block diagram [33]
Figure 3.10 Flowchart for weighing sequence
Figure 3.11 Proposed unit delay circuit
Figure 3.12 State diagram of controller block
Figure 3.13 Shift registers in controller block
Figure 3.14 Phase detector in [50]
Figure 3.15 Proposed high resolution phase detector
Figure 3.16 Phase detector waveforms
Figure 3.17 Lock detector circuit
Figure 4.1 Initial lock mode waveform for a leading input clock
Figure 4.2 Initial lock mode waveform for a leading input clock (zoomed in)
Figure 4.3 Initial lock mode waveform for a leading output clock
Figure 4.4 Initial lock mode waveform for a leading output clock (zoomed in)
Figure 4.5 Lock re-entry mode waveform for small phase error
Figure 4.6 Lock re-entry mode waveform for a leading input clock
Figure 4.7 Lock re-entry mode waveform for a leading input clock (zoomed in)
Figure 4.8 Introduced glitch waveform for a leading input clock
Figure 4.9 Lock re-entry mode waveform for a leading output clock
Figure 4.10 Lock re-entry mode waveform for a leading output clock (zoomed in)
Figure 4.11 Introduced glitch waveform for a leading output clock
Acknowledgments
I would like to express my deepest gratitude to my academic and research advisor
Dr. Andre Ivanov for his guidance and constant support in helping me to conduct and
complete this work.
Also my wife has been supportive, not just tolerant, of my return to graduate school. She
is as pleased as I am that my dissertation is finished. She knows that I am grateful to her
for continuous support, but I take this opportunity for a public acknowledgment of my
debt to her.
Chapter 1
Introduction
This chapter introduces the research topic of this thesis. A quick review of the DLL circuit
and a comparison with the Phase Locked Loop are also included. The chapter
also describes the different applications in which the DLL is used.
1.1 Clock skew
As silicon fabrication technology develops, more logic can be packed on a die and as a
result the chip size gets progressively bigger. The number of logic gates and chip operating
frequencies increase, and clock skew becomes increasingly important in
ensuring the proper functioning of VLSI chips. With a synchronous communication protocol
on and off the chip, it is impractical to increase the communication clock speed further
without reducing the clock skew on the chip. In a synchronous design, the clock period
determines the available time for any operation between two flip-flops. Any uncertainty
such as skew or jitter reduces this period.
Clock skew is caused by different RC delays of clock interconnections along different
clock signal paths, different delays of clock buffers due to process and temperature variations
on the same chip, and power supply differences caused by power rail voltage drop.
The clock skew problem can also exist in other situations. For example, the input clock
driver in any chip introduces an uncertain time delay between the internal and external
clocks. As a result, internal clocks in a multi-chip system become asynchronous and
problems occur when data is transferred between chips.
Clock skew can lead to both setup and hold time violations. Consider the circuit in Figure
1.1(a), where the clock is routed in the direction of the data path. Delays in the
clock path lead to skewed versions of the system clock arriving at the two flip-flops. If δ2
is greater than the sum of the clock-to-Q delay of FF1, the logic delay, and the setup time
of FF2, then a hold time violation will occur. As shown in Figure 1.1(b), FF2 samples the
wrong data. This can be prevented by adding delay to the data path from FF1 to FF2
(which increases the cycle time and is not preferred) or by reducing the clock skew.
Figure 1.1 Possible hold violation due to clock skew
If the clock signal is routed in the opposite direction to the data flow, as shown in Figure
1.2(a), then clock skew will not cause a hold time violation. However, a setup time violation
can occur since clk2 might arrive earlier than clk1, as shown in Figure 1.2(b). The
clock cycle has to be increased in order to prevent this violation, which also harms system
performance.
Figure 1.2 Possible setup violation due to clock skew
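The two routing cases described above can be checked numerically. The Python sketch below is only an illustration: the function names and all parameter values are assumptions, `skew` is taken as the amount by which clk2 lags clk1 (first function) or leads it (second function), and the intermediate case in which the data changes inside the sampling window of FF2 is the usual textbook refinement rather than something stated in the text.

```python
def classify_forward_skew(skew, t_cq, t_logic, t_setup, t_hold):
    """Clock routed with the data: clk2 lags clk1 by `skew`.

    All times use one unit (e.g. ps). New data reaches FF2 at
    t_cq + t_logic after the clk1 edge.
    """
    arrival = t_cq + t_logic
    if skew > arrival + t_setup:
        # Condition quoted in the text: new data is already stable,
        # so FF2 cleanly samples the wrong (next) value.
        return "hold violation: wrong data sampled"
    if skew > arrival - t_hold:
        # Data toggles inside FF2's setup/hold window (assumed refinement).
        return "hold violation: data changes in sampling window"
    return "ok"


def min_period_reverse_skew(skew, t_cq, t_logic, t_setup):
    """Clock routed against the data: clk2 leads clk1 by `skew`.

    The early capture edge shortens the usable cycle, so the clock
    period must grow by the skew to keep the setup time satisfied.
    """
    return t_cq + t_logic + t_setup + skew
```

With hypothetical values of 100 ps clock-to-Q delay, 200 ps logic delay, 30 ps setup time and 20 ps hold time, a forward skew of 50 ps is harmless, while a reverse skew of 50 ps raises the minimum clock period from 330 ps to 380 ps.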
1.2 Delay Locked Loop
To reduce clock skew, the clock distribution network should be designed with care. In
addition, circuits such as Phase Locked Loops (PLLs) and Delay Locked Loops (DLLs)
may be necessary to reduce the total clock skew by employing them in several critical
places of the clock distribution structure.
A basic DLL consists of a phase detector (PD) or phase comparator (PC) block, a variable
delay line, and a controller that converts the PD's output to a control signal for the delay line,
as shown in Figure 1.3(a). A basic DLL detects the phase error between the input clock
and its output clock and adjusts the total delay of the variable delay line to a multiple of the
input clock period. It introduces enough delay (Td) so that the rising edge of the output
clock coincides with the next rising edge of the input clock, as shown in Figure 1.3(b).
Figure 1.3 Typical DLL block diagram
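As a rough illustration of the locking target, the sketch below computes the delay Td that the variable delay line would have to provide so that the total loop delay is a whole number of clock periods. The fixed delay `t_fixed` (for example, a clock buffer in the loop), the minimum achievable delay `d_min`, and the function name are assumptions made for this example, not quantities defined in the text.

```python
import math


def required_dll_delay(t_fixed, period, d_min=0):
    """Smallest Td >= d_min with t_fixed + Td equal to n * period, n >= 1.

    All arguments use the same time unit (e.g. ps).
    """
    n = math.ceil((t_fixed + d_min) / period)
    n = max(n, 1)  # the output edge aligns with the *next* input edge at the earliest
    return n * period - t_fixed
```

For a hypothetical 2500 ps clock period and a 1200 ps buffer delay, the delay line must supply 1300 ps. If the buffer delay is 2600 ps, the loop has to lock two periods later and the delay line supplies 2400 ps.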
The correct timing of a synchronous circuit relies on clock edges and is affected by
clock skew and jitter, so the introduced delay of one input clock period does not have any
negative impact on the functionality of systems that use Delay Locked Loop circuits.
The output clock frequency of a standard Delay Locked Loop circuit is the same as that
of the input clock, so DLLs are generally not used for clock synthesis. PLLs are used
widely for synthesis and clock multiplication. Some applications do
use DLLs for clock synthesis, but this is not common [45], [53] and [107].
The DLL and PLL circuits are feedback circuits. They generally require several
clock cycles to achieve lock, so they cannot be turned off in standby mode, which results
in a large standby power consumption. These circuits therefore cannot be used in clock
deskewing applications that require low standby power consumption.
The Synchronous Mirror Delay (SMD) and Clock Synchronized Delay (CSD) circuits
were developed for applications requiring low standby currents [52], [88] and [94]. These
circuits have no feedback so their lock-in time is significantly less than that of DLLs or
PLLs. During the standby mode, it is possible to switch them off. When power is resumed,
it only takes two or three clock cycles for them to lock in, which is negligible for most
applications.
1.3 DLL vs. PLL
When it comes to choosing between a PLL and a DLL for a particular application, the
differences in their architectures need to be understood. The oscillator used in the PLL inherently
introduces instability and accumulation of phase errors (Figure 1.4). This in turn degrades
the performance of the PLL when compensating for the delay of the clock distribution
network. On the other hand, the unconditionally stable DLL architecture does not accumulate
phase errors [20], [100], [103]. For this reason, the DLL architecture is widely used for
delay compensation and clock conditioning.
Figure 1.4 Typical PLL block diagram
The DLL's closed loop transfer function has only one pole (a first order system) [56] and
[57]. Therefore, it is naturally a stable system. On the other hand, a PLL's closed loop
transfer function has two or three poles. Therefore, stability is a major issue and needs to
be addressed during design. Normally, one needs to add zeroes to a PLL's transfer function
in order to stabilize the PLL circuit [103].
The input clock's jitter propagates through a DLL circuit (first order system) and can
affect the performance of the system. A PLL filters out the jitter, so it is the better choice for
applications with a high-jitter input. In a clock distribution system, the main clock is
generated by a quartz crystal oscillator, which does not introduce a significant amount of jitter.
Therefore, the DLL circuit is generally used for de-skewing purposes [33], [66], [68]
and [98].
The main disadvantage of a conventional DLL compared to a PLL is its limited phase
capture range [37]. At a given operating clock frequency, a DLL can delay its input clock
by an amount bounded by a minimum and a maximum delay. As a consequence, the designer
must take extra care to prevent the loop from trying to lock to a delay outside these
limits. To extend the operating range, the number of delay cells or the gain of the delay
line (analog DLL) should be increased. This not only consumes additional power, but also
increases supply-induced jitter.
1.4 Applications
DLLs are used in many different applications as described in the following subsections.
1.4.1 Clock distribution
As previously mentioned, a DLL is mainly used in clock distribution circuits that do
not require clock synthesis or multiplication. Because these systems operate at a fixed
clock frequency, a DLL's narrow capture range is not an issue [35], [82], [33], [66], [68],
[98] and [102].
1.4.2 SDRAM
In synchronous DRAM, the output data strobe (DQS) should be locked to the data outputs
(DQ outputs) for high-speed performance. The clock-access and output-hold times of
conventional DRAM designs are determined by the delay time of internal circuits such as
clock input and output buffers. Variations in temperature and process change access times
and reduce the size of the valid data window. Several publications describe how a DLL
can optimize and stabilize clock-access and output-hold times [26], [47], [65], [70], [72],
[73], [74], [77], [79], [85], [87], [90], [93], [94] and [104]. An internal DLL can be used to
adjust the time difference between the output and input clock signals in SDRAMs (Figure
1.5).
Figure 1.5 SDRAM output timing with and without a DLL
In Double Data Rate synchronous DRAM [1], [17], [21], [32], [44], [46], [50], [51] and
[75], where read/write accesses can occur on both rising and falling edges of the clock,
clock synchronization is critical and is required for both clock edges. A symmetrical DLL is
used for this application. The term "symmetrical" means that the delay line used in the
DLL has the same delay for a high-to-low and a low-to-high logic transition.
1.4.3 Time-to-Digital converter (TDC)
High-resolution time-to-digital converters (TDCs) are used in a number of
measurement systems such as time-of-flight (TOF) particle detectors, laser range finders
(Figure 1.6), and logic analyzers. Laser range-finding is used in many industrial
applications, for example measuring the dimensions of ship blocks in shipyards, inspecting the oil
level in large tanks, and robot vision [4], [22], [23], [36], [54], [62], [69], [81], [83], [95],
[96], [97], [101] and [106].
Figure 1.6 Block diagram of the laser range finder [101].
Modern TOF systems used in particle physics experiments require TDCs to have a
resolution below 1 ns. A distance measurement accuracy of 2-3 cm corresponds to 100-200 ps of
measurement time. A high-resolution measurement can be obtained by using a logic
buffer delay as the time unit, with a DLL stabilizing the value of the buffer delay against
process variations and temperature and power supply changes. The delay line is used in a
closed loop controlled by the DLL. The time resolution is limited to the delay of each unit
cell in the delay line.
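The relation between time resolution and distance resolution quoted above can be reproduced with a short calculation. The sketch below assumes a round-trip measurement (the laser pulse travels to the target and back, so distance is c·t/2) and propagation at the vacuum speed of light; the function name is illustrative.

```python
C_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def distance_resolution_m(time_resolution_s):
    """Distance uncertainty of a round-trip time-of-flight measurement."""
    return C_LIGHT_M_PER_S * time_resolution_s / 2.0
```

Under these assumptions, 100 ps corresponds to about 1.5 cm and 200 ps to about 3 cm, which is the same order as the figures given in the text.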
1.4.4 Automatic Test Equipment (ATE)
General purpose Automatic Test Equipment (ATE) requires fast devices, high tester
bandwidth, high data rates, and high timing accuracy. At the heart of ATE is the timing event
generation circuitry, which generates control signals for different parts of the ATE [99]. DLLs
have been used widely in ATE to achieve the required precision and to compensate for the
process, supply voltage, and temperature (PVT) variations that affect the time base
generator.
1.4.5 Clock synthesis
PLLs have been used successfully in creating tapped ring oscillators for clock synthesis. A
PLL's delay elements have two dependent variables controlled by the feedback system, the
frequency and the phase. A DLL, however, has only a single dependent variable controlled
by the feedback loop, the phase. The PLL will integrate the error of all its noise
sources, but a DLL will only integrate the noise sources that cause jitter, such as power
supply noise or thermal noise. This only happens over one delay period, so a DLL does not
accumulate noise because it is a first order system. This is a desirable characteristic for
every high performance clock generator [2], [3], [6], [7], [8], [9], [11], [12], [15], [18],
[27], [28], [30], [39], [40], [41], [42], [45], [48], [64], [67], [80], [88], [103] and [105].
1.4.6 Clock and data recovery (CDR)
Clock and data recovery is a mechanism that allows a receiver to extract the clock from an
incoming data stream, which can then be used to recover the incoming data. The receiver
also uses the clock embedded in the data stream to transmit data back to the source.
Both Delay Locked Loops (DLLs) and Phase Locked Loops (PLLs) can be used in clock
and data recovery circuits, although DLLs are rarely used in CDR circuits [14], [19], [25], [55], [61]
and [91].
Chapter 2
Background
In this chapter, we provide an overview of different DLL types. The advantages and
disadvantages of each DLL type are discussed. An extensive literature overview of
different types of DLLs is included, which covers papers from 1993 to 2005.
2.1 Analog DLL
Analog DLLs were first used in clock distribution applications [10] and [13]. A
conventional analog DLL consists of four main blocks: a voltage controlled delay line (VCDL), a
charge pump, a low pass filter, and a phase detector, as shown in Figure 2.1.
Figure 2.1 Conventional analog DLL
The input reference clock drives the delay line, which is composed of cascaded variable
delay buffers. The output clock drives the loop phase detector. The output of the phase
detector is integrated by the charge pump and the loop filter capacitor to generate a loop
control voltage. The loop's negative feedback drives the control voltage to a value that
ideally forces a zero phase error between the output clock and the reference clock.
The simple design of the DLL offers many advantages when compared to Voltage
Controlled Oscillator (VCO) based PLLs. Due to frequency acquisition constraints, a PLL
usually uses a specific type of phase detector, the state-machine based phase frequency
detector (PFD). In contrast, a DLL's phase detector can be easily implemented by using
bang-bang control [109]. This means that the control signal of the loop can simply be a
binary up or down signal rather than being proportional to the phase error magnitude.
Additionally, since DLLs do not use a VCO, phase errors induced by supply or substrate
noise do not accumulate over many clock cycles [108]. This improved noise immunity is
the main reason for the increased usage of DLLs in applications that do not require clock
synthesis [16], [19], [34] and [105].
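The bang-bang behaviour described above can be illustrated with a small behavioural model. This is a minimal sketch under stated assumptions, not the circuit from [109]: the phase error is a plain number (positive when the output clock lags), the detector reports only its sign, and the delay changes by one fixed `step` per cycle. The function name and the values are hypothetical.

```python
def bang_bang_lock(initial_error, step, cycles):
    """Return the phase error after each cycle of a bang-bang loop.

    The correction depends only on the sign of the error, never on
    its magnitude, so the loop ends up dithering around zero.
    """
    error = initial_error
    history = [error]
    for _ in range(cycles):
        # Binary up/down decision: reduce delay if lagging, else add delay.
        error += -step if error > 0 else step
        history.append(error)
    return history
```

Starting from an error of 95 with a step of 10, the model walks down to 5 in nine cycles and then alternates between -5 and 5. The residual error stays within one step, which is the price of using only a binary control signal.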
An analog DLL is a relatively complex analog circuit requiring a process-specific
implementation. It is difficult to reuse the same design for a different technology, making the analog
DLL a non-portable architecture. For example, if an analog DLL is designed for 0.35 µm
CMOS technology, then it is not practical to upgrade it to 0.18 µm technology, as major
changes in the layout of the design are required.
The output clock's duty cycle changes as it passes through many delay cells. The reason is
that the propagation delay of each unit cell in the delay line is not the same for low-to-high
and high-to-low inputs, so even if the duty cycle of a reference clock is 50% at the input,
the output duty cycle may be significantly different. A conventional solution to this is
attaching duty-cycle correction circuits to all clock output drivers, which also adds to the
area and increases jitter.
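The duty-cycle drift described above can be estimated with a simple model. The sketch assumes non-inverting unit cells in which every rising edge is delayed by `t_plh` and every falling edge by `t_phl`, so each cell stretches the high time by their difference; these names and the numbers are assumptions chosen for illustration.

```python
def output_duty_cycle(duty_in, period, n_cells, t_plh, t_phl):
    """Duty cycle after n_cells cells with unequal rise/fall delays.

    period, t_plh and t_phl share one time unit (e.g. ps).
    """
    high_in = duty_in * period
    # The falling edge arrives n*t_phl late, the rising edge n*t_plh late.
    high_out = high_in + n_cells * (t_phl - t_plh)
    return high_out / period
```

For example, a 50% input clock with a 2000 ps period passing through 20 cells with a 5 ps mismatch per cell leaves the line with a 55% duty cycle.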
An all-analog multiphase DLL is proposed in [34]. It achieves both wide range operation
and low jitter performance. The proposed DLL has the same benefits as a conventional
analog DLL, such as jitter cancelling and multiphase clock generation. It also uses a dual
controlled delay cell to correct the duty-cycle problem, as shown in Figure 2.2.
Figure 2.2 Analog DLL with duty-cycle correction
A second phase detector compares the inverted clock input with the inverted clock output
and generates a control signal Vduty, as shown in Figure 2.2. It fine-tunes the cell current
ratio and therefore aligns the falling edges of the reference clock and the output clock. In this
way, it maintains the reference clock's duty cycle.
A quadrature phase mixing DLL was proposed in [104] and [105], which eliminates
the limited capture range deficiency of conventional analog DLLs (Figure 2.3). This
approach is based on the fact that quadrature clocks (90 degree phase shifted clocks) can
be generated for a given clock frequency. The quadrature clocks are input to a phase
mixer, which can produce a clock whose phase can span the complete 0-360 degree phase
interval. This approach overcomes the limited phase range problem of the conventional DLL.
Figure 2.3 Analog multiphase DLL
2.2 Digital DLL
Both analog and digital DLLs have been used for clock alignment applications [35], [82],
[33], [66], [68], [98] and [102]. An analog DLL generally provides better jitter
performance at the expense of greater complexity. Although the digital DLL uses more area and
power than the analog DLL, its greater simplicity and lower minimum required power
supply voltage make it very attractive for many clock alignment applications.
Digital DLLs are characterized by their use of digital delay lines. They are typically made
from simple digital circuit elements (Figure 2.4). This simplicity helps in designing a portable
digital DLL that can be easily adapted to different technologies. Additionally, because
phase information in a digital DLL is stored as a digital state, digital DLLs can provide
very fast timing recovery after being placed in standby mode. However, conventional
digital DLLs provide only moderate phase resolution and jitter performance [1], [21], [32],
[48], [49], [71], [74], [76], [78], [92] and [94].
Figure 2.4 Digital DLL block diagram
Another benefit of digital DLLs is their ability to operate at lower voltages than analog
DLLs. Because analog DLLs require the use of saturated current sources, they experience
minimum voltage problems as the supply voltage decreases. Digital DLLs, on the other hand,
only require enough voltage to ensure the proper operation of their digital gate elements.
Digital DLLs also exploit the power saving benefits of power supply scaling better than
analog DLLs. The power consumption of an analog DLL is the sum of the static power consumed
by the constant current sources in the circuit and the dynamic power $CV^2f$ (where C is
capacitance, V is the supply voltage, and f is frequency). The power consumption of a digital
DLL, on the other hand, is determined primarily by the $CV^2f$ power, which decreases
quadratically with supply voltage.
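The quadratic dependence on supply voltage can be made concrete with a short calculation. The capacitance, voltages, and frequency in the usage note are hypothetical values chosen for illustration, and the function names are not from the thesis.

```python
def dynamic_power_w(c_farads, v_volts, f_hertz):
    """Dynamic switching power C * V^2 * f in watts."""
    return c_farads * v_volts ** 2 * f_hertz


def scaling_ratio(v_new, v_old):
    """Factor by which dynamic power changes when only the supply is scaled."""
    return (v_new / v_old) ** 2
```

For instance, 1 pF switched at 1 GHz from a 1.8 V supply dissipates about 3.24 mW, and lowering the supply from 3.3 V to 1.8 V at fixed capacitance and frequency reduces dynamic power to roughly 30% of its original value.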
The delay elements can be implemented with almost any circuit block, but because the
phase resolution of the delay line is determined by the propagation delay of each unit cell, delay
elements that provide minimal delay are generally preferred. The delay line of a
conventional digital DLL uses inverters, since they provide the shortest delay of any CMOS
digital gate. Because of the inverting characteristic of an inverter gate, the delay line is
tapped only at every other inverter (two inverters in series form a unit cell) to ensure that
output taps are not inverted and only shifted by the total propagation delay of the two
inverters.
Although conventional delay lines are attractive for their simplicity, DLLs based on such
conventional delay elements suffer from several significant limitations. First, the delay
line provides fairly coarse resolution. For example, a delay line with inverters as unit
cells provides a minimum phase step corresponding to two inverter delays. Such coarse
phase resolution is not enough for many clock alignment applications.
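To see how coarse a two-inverter step is, the step can be expressed as a fraction of the clock period. The sketch below is illustrative only; the inverter delay and clock period in the usage note are hypothetical values, not measurements from the thesis.

```python
def phase_step_degrees(t_inverter, period, inverters_per_cell=2):
    """Minimum phase step of a tapped inverter delay line, in degrees.

    t_inverter and period use the same time unit (e.g. ps).
    """
    return 360.0 * inverters_per_cell * t_inverter / period
```

With an assumed 50 ps inverter delay, the minimum step is 100 ps, which is 18 degrees of a 2000 ps (500 MHz) clock period.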
Second, conventional delay lines deliver only a limited phase range. In order to cover at
least one full cycle of phase, the delay line length and unit cell delays are adjusted to
provide at least 360 degrees of phase under the fastest process, voltage, and temperature
(PVT) conditions and at the minimum operating frequency. Consequently, to cover this range, a
long delay line which occupies more silicon area and dissipates additional power is
required. Additionally, because inverters offer a poor power supply rejection ratio
(PSRR), jitter induced by power supply noise accumulates as the signal propagates
through the delay line. This causes the signals from the later taps in the delay line to
exhibit more jitter than those from earlier taps.
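The sizing rule in this paragraph amounts to a ceiling division: the line needs enough cells that, even at its fastest (shortest) cell delay, the total delay spans the longest clock period. The sketch below uses integer picoseconds to avoid rounding surprises; the function name and the numbers are assumptions.

```python
def cells_for_full_cycle(period_max_ps, t_cell_fast_ps):
    """Unit cells needed to cover 360 degrees in the worst case.

    period_max_ps: clock period at the minimum operating frequency.
    t_cell_fast_ps: unit-cell delay at the fastest PVT corner.
    """
    if period_max_ps <= 0 or t_cell_fast_ps <= 0:
        raise ValueError("period and cell delay must be positive")
    # Integer ceiling division.
    return (period_max_ps + t_cell_fast_ps - 1) // t_cell_fast_ps
```

For a 100 MHz minimum frequency (10000 ps period) and an 80 ps fast-corner cell delay, 125 cells are required. A slower 120 ps cell would need only 84, which illustrates why the fast corner sets the line length.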
2.3 Double loop DLL
The key parameters in DLL design are locking time, power consumption, jitter, and
phase error, which depend on the choice of proper delay elements and loop control
methods. The phase adjustment is done through a variable delay line or a tapped delay line. The
tapped delay line is used for digital control, where the locking characteristics are less
sensitive to switching noise and cross talk. On the other hand, the variable delay line is used
for reducing the static phase error, where the delay changes gradually. Therefore, the
logical approach to obtaining a DLL with fast locking and a low phase error is to combine
these two methods. This is called a dual loop DLL, sometimes referred to as a semi-digital
DLL [26], [29], [31], [38], [45], [58], [60], [63], [84] and [89]. The locking procedure is
done in two steps, coarse tuning and fine tuning. Coarse tuning and fine tuning are
performed in the digital and analog domains, respectively (Figure 2.5). The dual loop DLL
can be used in low power stand-by mode applications. The recovery from stand-by
mode to regular operational mode is then almost immediate because digital information is kept
in stand-by mode and the position of the output tap in the delay line is known at
startup.
Figure 2.5 Dual loop DLL
After powering up the system, the coarse tuning mechanism starts. Normally the middle
tap in the delay line is selected and the output clock is compared to the input reference clock
by a digital phase detector. Depending on which clock is leading and which is lagging, the
output of the phase detector shifts the selected tap right or left. Finally, the tap with the
minimum delay difference to the reference clock is selected. At that point, the coarse tuning
phase is complete.
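The coarse tuning procedure can be sketched as a walk along the taps. This is a behavioural illustration under assumptions, not the controller of any cited design: `tap_delays` is an ascending list of tap delays, `target` is the delay that would align the clocks, shifting right is assumed to select a longer delay, and the walk stops when the next tap would not reduce the error.

```python
def coarse_tune(tap_delays, target, start=None):
    """Return (selected_tap_index, number_of_shifts)."""
    i = len(tap_delays) // 2 if start is None else start  # begin at the middle tap
    shifts = 0
    while True:
        error = tap_delays[i] - target
        # Output lags (too much delay): shift left. Output leads: shift right.
        j = i - 1 if error > 0 else i + 1
        if error == 0 or not 0 <= j < len(tap_delays):
            return i, shifts
        if abs(tap_delays[j] - target) >= abs(error):
            return i, shifts  # the neighbouring tap is no better: coarse lock
        i, shifts = j, shifts + 1
```

With sixteen taps spaced 100 ps apart (100 ps to 1600 ps) and a target of 1230 ps, the walk starts at the 900 ps tap and settles on the 1200 ps tap after three shifts, leaving a residual error of 30 ps for the fine loop.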
To avoid unwanted phase jitter, the digital block is disabled and the shift registers in the
controller block hold their positions. The analog control part is enabled to reduce the phase
error. This function is performed by a lock window mechanism. If the internal clock is
outside the window, the digital block is enabled. Once the internal clock enters the lock
window range, the analog block is enabled and the digital block is disabled. The range of
the analog part must be large enough to cover the lock-detecting window. The analog
control block consists of a PFD, a charge pump, and a loop filter. The operation of the analog
loop is the same as that of the conventional analog DLL.
2.4 Synchronous Mirror Delay (SMD)
The conventional PLL and DLL circuits are feedback systems, requiring many
clock cycles to achieve lock. Therefore, they cannot be turned off and are not used in
clock-skew suppression applications requiring low standby currents, for example in a cell
phone. On the other hand, Synchronous Mirror Delay (SMD) and Clock Synchronized
Delay (CSD) circuits are non-feedback systems which can achieve lock in only
two clock cycles [52], [88] and [94]. Therefore, in standby mode these circuits can be
disabled, and they can lock to the reference clock in just two clock cycles when
operation is resumed.
A conventional SMD circuit, as shown in Figure 2.6, consists of an input buffer with delay
$d_1$, a clock driver with delay $d_2$, a replica delay line (a dummy input buffer plus a
dummy clock driver with total delay $t_{replica} = d_1 + d_2$), and two delay lines (a
delay-measurement line and a variable-delay line arranged in parallel). When the circuit is activated,
the first clock signal propagates through the input buffer, the replica delay line, and the
delay-measurement line with delay $[t_{CK} - t_{replica}]$ until the second signal comes out of the
input buffer. The delay time $[t_{CK} - t_{replica}]$ determines the length of the variable-delay line. The
second signal propagates through the variable-delay line and comes out of the clock driver.
The resulting total delay time is
$d_1 + t_{replica} + [t_{CK} - t_{replica}] + [t_{CK} - t_{replica}] + d_2 = 2t_{CK}$
(Figure 2.7). In this manner, no feedback circuitry is used and clock skew is eliminated within
two clock cycles. The simple structure of the SMD circuit also reduces design effort [52],
[88] and [94].
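The delay budget above can be verified by adding the individual segments. The sketch below follows the path described in the text; the function name and the example values are assumptions, and the check that the replica delay is shorter than the clock period is an added sanity condition.

```python
def smd_total_delay(t_ck, d1, d2):
    """Sum the SMD path: input buffer, replica, measurement line,
    variable line, clock driver. Returns the external-to-internal delay."""
    t_replica = d1 + d2
    if t_replica >= t_ck:
        raise ValueError("replica delay must be shorter than the clock period")
    t_measure = t_ck - t_replica   # delay-measurement line
    t_variable = t_measure         # the variable line mirrors the measured delay
    return d1 + t_replica + t_measure + t_variable + d2
```

For a 5000 ps clock with an assumed 300 ps input buffer and 700 ps clock driver, the segments are 300 + 1000 + 4000 + 4000 + 700 = 10000 ps, exactly two clock periods, independent of the particular values of d1 and d2.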
Figure 2.6 Conventional SMD
Figure 2.7 Timing diagram of a conventional SMD
Despite their advantages, SMD circuits are not widely used because they need a dummy
clock driver circuit that replicates the real clock driver circuit, whose delay is known only
after the placement and routing phases.
Therefore, they are used for devices in which the clock driver circuits can be fixed during
the circuit design stage, e.g., memory elements [94].
Furthermore, a difference between the original clock driver circuit and the dummy clock
driver circuit exists due to process, power supply voltage, and temperature (PVT)
variations. This delay difference increases the phase error, which cannot be compensated for
during operation because an SMD circuit has no feedback mechanism.
A direct-skew-detect synchronous mirror delay (direct SMD) achieves clock-skew
suppression in only two clock cycles [43] and [52]. It can be used for application-specific
integrated circuits (ASICs) with undefined clock paths, as shown in Figure 2.8. The direct
SMD circuit detects both clock skew and clock cycle by using a direct-skew detector and
clock suppression circuitry. The direct SMD circuit does not use a dummy clock driver
circuit. Therefore, it does not experience the problems mentioned above for a
conventional SMD circuit.
Figure 2.8 Block diagram of direct SMD (blocks: input buffer, dummy input buffer, skew detector, measurement delay line, variable delay line, switch, clock driver, internal clock line)
2.5 Register controlled DLL (RDLL)
The RDLL belongs to the digital DLL family and is widely used in high-speed synchronous
DRAM (SDRAM) applications [17], [51], [85] and [90]. In an SDRAM, the output
data strobe (DQS) should be locked to the data outputs. To optimize and stabilize clock-access
and output times, an internal RDLL is used in an SDRAM memory chip, which
adjusts the time difference between the output and input clock signals.
The RDLL consists of a tapped delay line, a shift register, a phase detector, and a replica
input buffer dummy [85]. The replica input buffer dummy is used in the feedback path to
match the delay of the input clock buffer. The phase detector (PD) is used to compare the
relative timing of the edges of the input clock and the feedback clock signal, which comes
through the tapped delay line. The shift register controls the point of entry in the delay line
for the incoming external clock, as shown in Figure 2.9.
Figure 2.9 Register Controlled DLL (RDLL) (blocks: external clock, clock buffer, dummy clock buffer, phase comparator, delay line, shift register, output clock)
The outputs of the phase detector, shift-right and shift-left, are used to control the shift
register. In the conventional RDLL, only one bit of the shift register output is high, while
the other bits are zero. The single bit is used to select a point of entry for CLKIn in the
delay line. When the rising edge of the input clock is within the resolution of the output
clock, both outputs of the PD, shift-right and shift-left, are low and the loop is locked, as
shown in Figure 2.10.
Figure 2.10 Core circuit in RDLL (signals: CLKIn, CLKOut, shift register outputs)
The resolution of the RDLL is determined by the size of the unit delay used in the delay line.
The locking range is determined by the number of delay stages used in the delay line.
Since the DLL circuit inserts a delay time between CLKIn and CLKOut, making the output
clock change simultaneously with the next rising edge of the input clock, the minimum
operating frequency to which the RDLL can lock is the reciprocal of the product of the
number of stages in the delay line and the delay per stage ($F_{min} = 1/(T_d \cdot N)$, where $T_d$ is
the delay of one unit delay and $N$ is the number of unit delays in the delay line). Adding
more delay stages will increase the locking range of the RDLL at the cost of increased
chip area and power consumption [17], [51], [85] and [90].
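The relation between stage count, unit delay, and minimum lock frequency can be written as two small helpers. This is only a sketch: the function names are hypothetical, and the 50 ps unit delay and 200 stages are example values chosen for round numbers.

```python
def min_lock_frequency_mhz(unit_delay_ps, num_stages):
    """F_min = 1 / (T_d * N), returned in MHz for T_d given in picoseconds."""
    if unit_delay_ps <= 0 or num_stages <= 0:
        raise ValueError("unit delay and stage count must be positive")
    return 1e6 / (unit_delay_ps * num_stages)


def stages_needed(max_period_ps, unit_delay_ps):
    """Smallest N whose total delay N * T_d covers the longest clock period."""
    if max_period_ps <= 0 or unit_delay_ps <= 0:
        raise ValueError("period and unit delay must be positive")
    return -(-max_period_ps // unit_delay_ps)  # ceiling division on integers


print(min_lock_frequency_mhz(50, 200))  # 100.0
print(stages_needed(10_000, 50))        # 200
```

Halving the unit delay improves the resolution but doubles the number of stages needed for the same minimum frequency, which is the area and power trade-off described above.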
The conventional RDLL uses an AND gate (NAND + inverter) as the unit-delay stage.
The problem created by using a NAND + inverter as the basic delay element is that the
propagation delay through the unit delay for a high-to-low transition is not equal to the
delay of a low-to-high transition, i.e., $t_{PHL}$ is not equal to $t_{PLH}$. If the difference between
$t_{PHL}$ and $t_{PLH}$ is 20 ps, for example, then the total skew of the falling edge through 50
stages is 1 ns. Because of this skew, the input clock's duty cycle will not be preserved
when the clock propagates through the delay line.
A Register-Controlled Symmetrical DLL (RSDLL) is proposed in [51], which can be used
for duty-cycle-sensitive applications. For example, it meets the requirements of
double-data-rate (DDR) SDRAM, in which read/write accesses occur on both rising and falling edges
of the clock. In the RSDLL, a modified symmetrical delay element is used, with a NAND
gate instead of an inverter (two NAND gates per delay stage).
Figure 2.11 Core circuit in an RSDLL (input, NAND-based delay stages, shift register outputs Q)
This symmetrical unit delay guarantees that $t_{PHL} = t_{PLH}$ independently of process variations,
since when one NAND switches from HIGH to LOW, the other switches from LOW
to HIGH. The schematic for a symmetrical DLL is shown in Figure 2.11.
2.6 Vernier Delay Locked Loop (VDLL)
The Vernier principle is based on the Vernier caliper [83]. The tool measures the length of
an object placed between its two jaws. On the sides of the jaws, an indicator mark shows
the distance between the jaws on a scale. Since the indicator usually falls between two tick
marks, additional accuracy is obtained by dividing the distance between tick marks.
An additional scale is included next to the indicator, which has ten divisions in a distance
equal to nine divisions on the scale. Because of this mismatch it is possible to measure a
subdivision of the primary scale ten times smaller than the distance between tick marks.
Based on this concept, a delay line with $N = 10$ delay elements can be designed to have a
total delay of $H = 9$ clock periods. The minimum achievable time step is $T/N = D/H$,
where $T$ is the period of the input clock and $D$ is the delay of each delay element.
This technique was introduced and implemented for a time-to-digital converter (TDC)
[36], [70], [83] and [99]. A TDC digitizes time intervals and has many potential
applications in high-energy and nuclear physics experiments.
In a conventional digital DLL, the quantization error is equal to the propagation delay of
each unit in the delay line. In a 0.35 µm CMOS technology, the propagation delay of an
inverter gate is about 40 ps. Thus, a unit delay consisting of two inverters in series has a
delay of 80 ps. For a 1 GHz operating frequency, the 80 ps quantization error accounts for
8% of the 1 ns clock period, an error that affects the functionality of a synchronous system.
The Vernier technique is implemented to reduce this error in [5], [24], and [86].
A modified version of the Register-Controlled DLL (RDLL) that relies on the Vernier
concept is proposed in [73]. It consists of two RDLLs in series. The first RDLL performs
the coarse delay adjustment, with a 200 ps quantization error. The second RDLL, with a
40 ps quantization error, performs fine-tuning.
The coarse RDLL uses the conventional delay line, where each unit delay consists of a
NAND gate and an inverter in a series configuration. The fine RDLL uses a different
configuration, composed of two delay elements that have delay times of $t_d$ and $1.2\,t_d$, where $t_d$
is the unit delay time of the conventional delay element, as shown in Figure 2.12.
The delay lines are arranged as parallel main and sub-delay lines and are serially connected
by switches SW0 to SW4. In Figure 2.12, only one of the switches can be closed at
any time. For example, if SW0 is closed, the delay line generates $5\,t_d$. Similarly, if SW1 is
closed, the delay line generates $5.2\,t_d$. Thus, this delay arrangement can generate a $0.2\,t_d$
delay step, which is considerably smaller than that of the conventional delay line.
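The switch-selected delay can be tabulated with a short sketch. It assumes, based on the two values stated above and on Figure 2.12, that closing SWk routes the clock through k sub-delay elements of 1.2 t_d and then through the remaining 5 - k main elements of t_d. The function name is hypothetical, and delays are expressed in units of t_d.

```python
from fractions import Fraction

SUB_RATIO = Fraction(6, 5)  # sub-delay element = 1.2 * t_d


def vernier_line_delay(closed_switch, num_switches=5):
    """Delay in units of t_d when only SW<closed_switch> is closed."""
    if not 0 <= closed_switch < num_switches:
        raise ValueError("switch index out of range")
    sub_elements = closed_switch                  # 1.2*t_d elements before the switch
    main_elements = num_switches - closed_switch  # t_d elements after the switch
    return sub_elements * SUB_RATIO + main_elements


delays = [vernier_line_delay(k) for k in range(5)]
print([float(x) for x in delays])  # [5.0, 5.2, 5.4, 5.6, 5.8]
```

Exact fractions are used so that the 0.2 t_d step between neighbouring switch settings is not blurred by floating-point rounding.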
Figure 2.12 Block diagram of a Vernier delay line [73] (sub-delay line of 1.2 td elements, main delay line of td elements, switches SW0 to SW4 between IN and OUT)
In Figure 2.13, the main and sub-delay lines are connected with switches SW0 to SW4.
The fan-out of the main delay line is one, while that of the sub-delay line is two. Hence,
the delay of the sub-delay line exceeds that of the main delay line. This delay difference
becomes the unit delay time of the delay line, which is equal to the quantization error, as
shown in Figure 2.13.
Figure 2.13 Schematic of Vernier delay line [73] (sub-delay line: delay td + Δ, fan-out = 2; main delay line: delay td, fan-out = 1; switches SW(n-1) and SW(n))
Chapter 3
Design of proposed DLL
This chapter covers the block diagram of the proposed circuit and detailed circuit explanations
of each module in the block diagram. The logic design is described thoroughly. The
simulation results are covered in the next chapter. The design goal is to increase the
resolution of the DLL to less than 10 ps, as well as to reduce the area (gate count) of the Vernier
delay line in the DLL by a minimum of 10%. The power consumption is also reduced as a
result of the gate reduction in the Vernier delay line. A resolution of less than 10 ps, an area
reduction of 15%, and an operating frequency of up to 200 MHz are achieved in this design.
3.1 Block diagram
The block diagram consists of four modules: the phase detector, the lock detector, the Vernier
delay line, and the controller, as shown in Figure 3.1.
Figure 3.1 Block diagram of proposed DLL (blocks: Vernier delay line, phase detector, controller, lock detector; signals: input clock, output, error signal, lock indicator)
The input clock is connected to two modules, the phase detector and the Vernier delay
line. The Vernier delay line propagates the input clock and provides N output taps, where N
is the number of unit delays in the delay line.
In order to lock the output clock to the input clock for all input frequencies, the delay of
the delay line should be greater than the period of the minimum operating frequency. For
example, if a DLL's locking range is from 100 MHz to 200 MHz, then the delay line
must be able to delay the input clock by 10 ns. Therefore, the input clock is delayed by 10
ns when it exits from the last output tap. If the delay of each unit is, for example, 50 ps,
then the delay line needs 200 unit delays. Therefore, to reduce the minimum operating
frequency, more unit delays are required, which leads to more area and power consumption.
The delay of each unit depends on the number of cascaded gates in each unit and the technology
in which the circuit is implemented. The conventional unit delay consists of one
NAND gate and one inverter in series, which in 0.18 µm technology generates a delay of
approximately 70 ps. The same unit cell implemented in 0.35 µm technology generates
approximately 100 ps. The delay estimates are based on commercial libraries.
The conventional unit delay has a drawback: its propagation delay is not symmetrical,
so the total delay for the rising edge of the input signal is not the same as for the falling
edge. Therefore, an input clock with a 50% duty cycle can result in a square wave
that no longer has a 50% duty cycle, as shown in Figure 3.2. This asymmetry
can cause problems in Double-Data-Rate DRAMs, where read/write access can
occur on both rising and falling edges of the clock [1], [17], [21], [32], [44], [46], [50] and
[51].
Figure 3.2 Circuit and timing diagram of a conventional unit delay ($t_1 \neq t_2$)
The proposed DLL utilizes a unit delay consisting of two basic NAND gates in series.
This configuration eliminates the non-symmetrical characteristic of a conventional unit
delay. The total propagation delay $t_1 = T_{PHL} + T_{PLH}$ for the input rising edge is equal to
$t_2 = T_{PLH} + T_{PHL}$ for the falling edge of the same input clock. $T_{PHL}$ and $T_{PLH}$ are the
high-to-low and low-to-high delays of the NAND gate, respectively, as shown in Figure 3.3.
Therefore, the duty cycle of the input clock is preserved throughout the delay line.
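The effect of gate asymmetry on duty cycle can be sketched with a simple edge-delay model. Everything here is an illustrative assumption rather than data from the thesis: each gate is reduced to a pair of delays (t_phl, t_plh), both gates in a unit invert, and the picosecond values are made up so that the conventional unit shows a 20 ps difference per stage.

```python
def unit_delay_edges(gate1, gate2):
    """Each gate is (t_phl, t_plh) in ps; both gates invert.

    Returns (rising_edge_delay, falling_edge_delay) through the two-gate unit.
    """
    phl1, plh1 = gate1
    phl2, plh2 = gate2
    rising = phl1 + plh2   # input rises: gate 1 output falls, gate 2 output rises
    falling = plh1 + phl2  # input falls: gate 1 output rises, gate 2 output falls
    return rising, falling


def high_time_after_chain(high_time_ps, stages, gate1, gate2):
    """High time of the clock pulse after passing through `stages` unit delays."""
    rising, falling = unit_delay_edges(gate1, gate2)
    return high_time_ps + stages * (falling - rising)


NAND = (45, 55)      # hypothetical (t_phl, t_plh) in ps
INVERTER = (25, 15)  # hypothetical (t_phl, t_plh) in ps

print(high_time_after_chain(5000, 50, NAND, INVERTER))  # 6000
print(high_time_after_chain(5000, 50, NAND, NAND))      # 5000
```

With two identical gates, each edge sees one high-to-low and one low-to-high transition, so the difference cancels regardless of the actual delay values. With two different gates the 20 ps mismatch accumulates to 1 ns over 50 stages.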
Figure 3.3 Circuit and timing diagram of a symmetrical unit delay ($t_1 = t_2$)
The Vernier delay line is controlled by a finite state machine, or simply a controller. There
are two modes of operation, coarse and fine. A system reset signal initiates the coarse tuning
mode. In this mode, the phase detector compares the output clock signal from the center
tap with the reference input clock. If the positive edge of the input reference clock is
leading, then the controller shifts the output tap to the left and the total delay decreases.
On the other hand, if the positive edge of the input reference clock is lagging, then the
controller shifts the output tap to the right and the total effective delay increases.
The controller enters the fine tuning mode when the positive edge of the input reference
clock and that of the delay line's output tap are less than a unit delay apart. Therefore, the
delay of each Vernier unit determines the resolution of the coarse tuning mode. In the fine tuning
mode, each shift to the left or right changes the delay by a fraction of a coarse tuning step. This
enhanced resolution determines the final resolution of the system and sets the maximum
phase jitter.
The phase detector compares the input clock reference with the output tap signal of the
delay line. The resolution of the DLL depends not only on the fine resolution of each Vernier
unit delay but also on the resolution of the phase detector. In this design, the phase
detector's resolution is determined by the differential delay of a two-input NAND gate.
Generally, in CMOS gates the propagation delays from the input ports to the output port are not the
same. For example, in a NAND gate, input A, which is connected to NMOS transistor
T1, has a smaller propagation delay than input B, which is connected to NMOS transistor
T2, because the capacitive load on the drain of T2 is larger than that of T1, as shown in
Figure 3.4. This difference for a two-input NAND gate in 0.18 µm CMOS technology is
less than 10 ps and varies with load and input signal transition time (slew rate).
Figure 3.4 CMOS NAND gate
The phase detector block has three outputs: increase_delay, decrease_delay, and
controller_clk, as shown in Figure 3.5. At any time during the coarse and fine tuning modes,
one of the increase_delay or decrease_delay outputs is active, and controller_clk is used to
synchronize the controller with the phase detector, so any shift to the right or left is performed
on the positive edge of the controller_clk output.
Figure 3.5 Phase Detector block (inputs: dll_clk_input, dll_clk_output, reset; outputs: increase_delay, register_clk, decrease_delay)
When the interval between the positive edge of the output clock and the input reference
clock is within the resolution of the DLL, the DLL is in lock mode. All of the phase
detector's outputs are disabled and the controller stays in standby mode. The lock detector
block indicates when the DLL is in the lock mode, and its output goes high when the DLL
is locked, as shown in Figure 3.6.
Figure 3.6 Lock Detector block (inputs: increase_delay, decrease_delay, dll_clock_input, reset; output: lock indicator)
The controller block is a finite state machine (FSM) controlling the delay line, as shown in
Figures 3.7 and 3.8. It controls the coarse and fine tuning modes. It also provides the
mechanism to resume the lock mode when the input clock's frequency or phase changes
rapidly. The system reset pulse initializes the DLL, and the controller block goes into reset
mode when the system is powered up. The detailed flowchart is shown in Figure 3.12.
Figure 3.7 Controller block (inputs: reset, increase_delay, decrease_delay, register_clk; outputs: fine_control, fine_control_inv, coarse_control)
Figure 3.8 Vernier delay line block (inputs: delay_line_input, fine_control, fine_control_inv, coarse_control; output: delay_line_output)
The delay line has 128 output taps controlled by the controller block. During initialization
the center tap is selected as the output tap. The register-input bus is hardwired to a hex
value of "00000000000000008000000000000000", which means that all the register-input bits
except bit 63 are tied to logic zero. During system power-up, the input load signal is asserted
to logic one. Consequently, this value is loaded into a 128-bit shift register. After reset, the
center tap, corresponding to output control bit 63 of the controller's shift register, is selected
as the delay line output tap.
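The one-hot reset word can be reproduced in a few lines. This is a sketch with hypothetical helper names. It only shows how a 128-bit register with bit 63 set appears as 32 hexadecimal digits and how the selected tap index can be recovered from such a word.

```python
NUM_TAPS = 128
CENTER_BIT = 63


def one_hot(bit, width=NUM_TAPS):
    """Register value with exactly one bit set, selecting tap `bit`."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    return 1 << bit


def selected_tap(register_value, width=NUM_TAPS):
    """Index of the single active bit; rejects words that are not one-hot."""
    if register_value <= 0 or register_value >= (1 << width):
        raise ValueError("value does not fit the register")
    if register_value & (register_value - 1):
        raise ValueError("more than one bit is set")
    return register_value.bit_length() - 1


reset_word = one_hot(CENTER_BIT)
print(format(reset_word, "032x"))  # 00000000000000008000000000000000
print(selected_tap(reset_word))    # 63
```

Starting at bit 63 leaves 63 taps on one side and 64 on the other, which is why the center tap leaves nearly equal room for shifts in either direction.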
It is possible to load the shift register with any other number, so any output tap in the delay
line can be selected. The center tap is, however, the best choice because it gives the maximum
dynamic range for both right and left shifts, so the lock mode can be achieved in the
shortest time. In addition to speed, choosing the center tap as the initial output tap leaves a
maximum number of unit delays in both directions. Therefore, the controller output selector
does not reach the boundary taps before entering the lock mode.
At any time, only one bit of the shift register is active, selecting an output tap of the delay
line. In this design all the unit delays are the same and exhibit the same amount of delay. A
linear approach has been selected to achieve the lock mode in the design of DLL in this
thesis. Therefore, the controller linearly shifts the output tap to the right or left one step at
a time so the skew between the output clock and input reference clock is gradually
reduced to the minimum, which is less than the resolution of fine tuning delay units.
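The linear approach can be sketched as a loop that moves the selected tap one position per step. This is a simplified illustration under stated assumptions, not the thesis's controller. It models only a single uniform unit delay, takes the delay at tap k to be (k + 1) unit delays, and declares lock once the remaining error is at most half a unit delay. The function name and numbers are hypothetical.

```python
def linear_lock(unit_delay_ps, num_taps, target_delay_ps, start_tap):
    """Shift the output tap one step at a time; return (locked tap, steps taken)."""
    tap = start_tap
    steps = 0
    while True:
        error_ps = target_delay_ps - (tap + 1) * unit_delay_ps
        if 2 * abs(error_ps) <= unit_delay_ps:
            return tap, steps  # within half a unit delay: locked
        tap += 1 if error_ps > 0 else -1  # too little delay: shift right
        if not 0 <= tap < num_taps:
            raise RuntimeError("reached a boundary tap before locking")
        steps += 1


# Hypothetical case: 100 ps units, 128 taps, 10 ns target, start at the center tap.
print(linear_lock(100, 128, 10_000, 63))  # (99, 36)
```

The number of steps grows linearly with the initial tap distance, which is the cost that the binary search described next tries to avoid.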
A Successive Approximation Register Delay Locked Loop (SARDLL) is proposed in
[33], which uses a counter instead of a shift register. Also, its delay line is designed in a
binary-weighted manner and no longer consists of delay units with equal delay time. The
N-bit control word from the up/down counter determines whether the input clock goes
through each delay stage or bypasses it, as shown in Figure 3.9.
Figure 3.9 SARDLL block diagram [33] (binary-weighted delay line with stages 1, 2, 4, ..., 2^(N-1); phase comparator; N-bit up/down counter producing the N-bit control word; signals: input clock, feedback clock, output)
For faster lock time, the binary search algorithm is incorporated into the SARDLL. This
algorithm reduces the searching effort and speeds up the locking process. The flowchart in
Figure 3.10 demonstrates how this algorithm works for a three-bit control word. In the
beginning, the most significant bit (MSB) of the controller output is set to one, and all the
other bits are set to zero. A phase comparator examines whether the output clock leads the
input clock or not. If it does, the MSB remains high. If not, it is set to low and held
constant. In this way the MSB is determined and the process is repeated for each following bit
until the least significant bit (LSB) is determined. In this way, the DLL can be locked
quickly.
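The bit-by-bit procedure can be sketched as a successive-approximation loop. This is an illustrative model rather than the circuit in [33]. It assumes the binary-weighted line produces a delay of word * unit_delay, and that "the output clock leads" corresponds to the trial delay not exceeding the target. The function name and numbers are hypothetical.

```python
def sar_search(num_bits, unit_delay_ps, target_delay_ps):
    """Determine the control word from the MSB down to the LSB."""
    word = 0
    for bit in reversed(range(num_bits)):
        trial = word | (1 << bit)  # tentatively set this bit
        if trial * unit_delay_ps <= target_delay_ps:
            word = trial  # output clock still leads (or aligns): keep the bit
        # otherwise the bit stays low and is held constant
    return word


print(sar_search(3, 100, 530))     # 5  (binary 101)
print(sar_search(7, 100, 10_000))  # 100
```

An N-bit word is settled in N comparisons, whereas a linear search over the same range may need up to 2^N - 1 single-step shifts.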
Figure 3.10 Flowchart for weighing sequence
A conventional linear approach has been implemented in this thesis. Devising the best
algorithm to speed up the lock time period is an independent topic which can be explored
in future research projects.
3.2 DLL modules description
In this section, all the modules of the proposed DLL are explained in detail. First, the circuit
and all of its components are described. Then, the functionality and operation of each
block are investigated in detail.
3.2.1 Vernier delay line
The delay line consists of N unit delays in a chain configuration. In this design N = 128,
which establishes an approximate minimum operating frequency of 100 MHz based on the
target specification. More unit delays are needed to lower the minimum operating frequency. Each
unit delay consists of five dual-input NAND gates. Therefore, a total of 640 NAND gates
are used for this Vernier delay line.
In clock distribution applications, the clock frequency is fixed, so the minimum value of N
is calculated for the frequency, automatically leading to minimal power and area consumption.
In the clock recovery application, the DLL operates in a range of frequencies,
so the value of N is determined by the lowest frequency component in the incoming data.
The output ports of all 128 delay units comprising the Vernier delay line are connected to a
single-bit bus. This single-bit bus is the output of the DLL and is fed back to the phase
detector block for phase comparison. If none of the tri-state output buffers in the Vernier
delay line is enabled, then the DLL output floats, i.e., it is at neither a low nor a high value.
In order to prevent the DLL output from floating, a small tri-state buffer is connected to the DLL
output. The input and enable ports of this buffer are tied to logic high, so its output holds
the DLL output at a weak high '1' value. Due to the weak drive capability of this small
buffer, a low output at any one of the 128 buffers overrides this weak high value and the
DLL output is pulled down to the '0' logic value.
Each unit delay consists of five NAND gates and one tri-state buffer. U1 and U2 form the
fine unit delay, and U3 and U4 form the coarse unit delay. U5 acts as a switch controlled
by the fine_control input. The coarse_control input is connected to the enable port of the
buffer gate (U6) and determines whether the unit_delay_out port is connected to the output of
U4 or is in a high-impedance state, as shown in Figure 3.11.
Figure 3.11 Proposed unit delay circuit (ports: fine_input, coarse_input, vernier_input, fine_control, fine_control_inv, clk_output, fine_output, coarse_output, vernier_output)
The clk_output ports of all N unit delays are tied together and form a one-bit tri-state bus. A
single tri-state buffer with weak output drive holds this bus at a weak high level, which guarantees
that this single-bit bus never floats.
The fine and coarse delay units are each constructed from two NAND gates in series, forming a
symmetrical delay line. The propagation delay is the same for both rising and falling
edges, so the duty cycle is preserved along the line.
Each output port of U2 and U4 is connected to two other inputs, so the fan-out is two, and
both U2 and U4 use the A1 port for the delay input. As a result, both U2 and U4 introduce
the same amount of propagation delay. The difference between the fine and coarse unit delays
is that the fine_input is connected to port A2 of U1, but the coarse_input is connected to
port A1 of U3. In a NAND gate, the propagation delays from the A1 and A2 ports to output Z
are not the same. The Vernier technique is based on this inherent characteristic of the
NAND gate and uses this differential delay between the two inputs to achieve a fine step
resolution.
In the DLLs proposed in [47], [59], [51], [85] and [90], the input clock is connected to all the
unit delays in the delay chain, so there are N taps, where N is the number of unit delays in
the delay line. This large fan-out requires a clock driver, which is large in area and consumes
extra power. It also introduces an extra delay that has to be compensated for with
another dummy clock driver in the feedback path.
In this design, the input clock is connected to only two NAND gates in the first unit delay,
so there is no need for the clock driver. This eliminates the phase shift between the input
reference clock and output clock due to delay mismatch between the clock driver and the dummy
clock driver.
There are a total of five dual-input NAND gates and one tri-state gate in each unit delay, which is
fewer than the six dual-input NAND gates and six inverters used in the previously described digital
Vernier DLL circuit [73], shown in Figure 2.13.
The coarse_input and fine_input ports of the first unit delay are tied to the input reference
clk port. This is the entry port for both fine and coarse chains, and from this point the reference
clock propagates through two separate fine and coarse delay chains. The
fine_output, coarse_output and vernier_output ports in the last unit delay of the delay chain
are not connected to any net.
3.2.2 Vernier delay line controller
The controller block consists of a finite state machine (FSM) and two shift registers that
control the DLL operation. All the timing control for the delay line originates in the
controller block. It determines which output tap in the delay line is connected to the DLL
output and whether the DLL is in coarse or fine mode.
Figure 3.12 State diagram of controller block (transition labels include Reset, decrease_delay & fine_control(N-1), and increase_delay & fine_control(0))
The finite state machine has four states: IDLE, INCREMENT, DECREMENT, and FINE,
as shown in Figure 3.12. The FSM remains in the IDLE state while reset is asserted. The
initial coarse_load_data value is loaded into the coarse shift register when reset is asserted.
This value determines which output tap is selected as the output of the Vernier delay line.
The default value of "00000000000000008000000000000000" selects the center tap.
The register_clock, increase_delay and decrease_delay signals are generated by the phase
detector block. The input register_clock signal is used to clock the shift register. The
increase_delay and decrease_delay signals determine whether the shift is to the right or to the
left, as shown in Figure 3.13.
Figure 3.13 Shift registers in controller block (coarse shift register: loaded with coarse_load_data, enabled when STATE == INCREMENT or STATE == DECREMENT, output coarse_control; fine shift register: loaded with fine_load_data, enabled when STATE == FINE, outputs fine_control and fine_control_inv; both clocked by register_clk with increase_delay and decrease_delay selecting right or left shift)
Depending on whether the increase_delay or decrease_delay signal is asserted, the state
machine moves to the INCREMENT or DECREMENT state, respectively. The state machine
stays in the DECREMENT state as long as decrease_delay is asserted and moves to the
FINE state when increase_delay is asserted for the first time. The state machine stays in the
INCREMENT state as long as increase_delay is asserted and moves to the DECREMENT
state when decrease_delay is asserted for the first time. Subsequently, it moves to the
FINE state in the next clock when increase_delay is asserted.
Therefore, regardless of whether it is in the DECREMENT or INCREMENT state, the
state machine ends up in the FINE state, where the coarse shift register is disabled and the
fine shift register's output determines the amount of incremental fine delay needed for
the DLL to lock its output clock to the input reference clock. The DLL stays in the lock
mode for as long as the input clock phase is steady and the phase difference between the
output and input reference clock is within the resolution of the phase detector.
If the input clock's frequency or phase changes at any time, then the DLL exits the lock
mode. If the output clock's rising edge leads the input clock's rising edge, then
increase_delay is asserted. On the other hand, if the input clock's rising edge leads the output
clock's rising edge, then decrease_delay is asserted. In either case, the register_clock
is enabled. The state machine stays in the FINE state and the fine shift register shifts left or
right depending on whether decrease_delay or increase_delay is asserted.
For example, suppose the resolution of the Vernier delay line is 10 ps and the fine shift register holds
the hex value of "00000000010000000000000000000000" when the DLL is in lock mode.
The fine shift register can be shifted to the left until its most significant bit becomes "1",
which requires 39 clock cycles. The delay of the delay line is then decreased by 390 ps. On
the other hand, the shift register can be shifted to the right until its least significant bit becomes
"1", which requires 88 clock cycles, and the delay of the delay line is increased by 880 ps.
Therefore, if the phase error between the output clock's rising edge and the input clock's rising
edge is within this window, then the state machine stays in the FINE state and lock
mode is achieved.
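The window in this example follows from the position of the single active bit. The sketch below, with a hypothetical function name, assumes a one-hot 128-bit fine register in which a left shift moves the active bit toward the MSB and each shift changes the delay by one resolution step. The hex word in the example corresponds to bit 88.

```python
def fine_tuning_window(register_value, width=128, resolution_ps=10):
    """Return (max delay decrease, max delay increase) in ps for a one-hot register."""
    if register_value <= 0 or register_value & (register_value - 1):
        raise ValueError("register must hold exactly one active bit")
    active_bit = register_value.bit_length() - 1
    if active_bit >= width:
        raise ValueError("value does not fit the register")
    left_shifts = (width - 1) - active_bit  # shifts until the MSB becomes '1'
    right_shifts = active_bit               # shifts until the LSB becomes '1'
    return left_shifts * resolution_ps, right_shifts * resolution_ps


print(fine_tuning_window(1 << 88))  # (390, 880)
```

The two sides of the window always add up to 127 steps, so the position of the active bit at lock only decides how that range is split between decreasing and increasing the delay.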
If the phase error is not within this window, then the state machine moves to either the INCREMENT
or the DECREMENT state, depending on whether an increase or decrease in the delay line
is required. At this point, the fine shift register resets to "0" and is disabled. The coarse
shift register, which controls the coarse delay line, is enabled and each shift to the right or
left increases or decreases the delay by an amount equal to the coarse unit delay
(the delay of two NAND gates in a row). The state machine finally moves into the FINE state
when the phase error is less than the coarse unit delay, and then the fine incremental delay
can reduce the phase error to less than the Vernier resolution.
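The state transitions described in this section can be summarized as a next-state function. This is a behavioral sketch, not the thesis's RTL. The exit conditions from FINE are an assumption taken from the labels visible in Figure 3.12 (decrease_delay with fine_control(N-1), and increase_delay with fine_control(0)), and the flags `fine_at_msb` and `fine_at_lsb` are hypothetical names for those two conditions.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    INCREMENT = "INCREMENT"
    DECREMENT = "DECREMENT"
    FINE = "FINE"


def next_state(state, reset, increase_delay, decrease_delay,
               fine_at_msb=False, fine_at_lsb=False):
    """One register_clk step of the controller FSM as described in the text."""
    if reset:
        return State.IDLE
    if state is State.IDLE:
        if increase_delay:
            return State.INCREMENT
        if decrease_delay:
            return State.DECREMENT
        return State.IDLE
    if state is State.INCREMENT:
        return State.DECREMENT if decrease_delay else State.INCREMENT
    if state is State.DECREMENT:
        return State.FINE if increase_delay else State.DECREMENT
    # FINE: leave only when the fine shift register has run out of range
    # (assumed from the Figure 3.12 labels fine_control(N-1) and fine_control(0)).
    if decrease_delay and fine_at_msb:
        return State.DECREMENT
    if increase_delay and fine_at_lsb:
        return State.INCREMENT
    return State.FINE


state = State.IDLE
trace = []
for inc, dec in [(1, 0), (1, 0), (0, 1), (1, 0), (0, 0)]:
    state = next_state(state, False, bool(inc), bool(dec))
    trace.append(state.value)
print(trace)  # ['INCREMENT', 'INCREMENT', 'DECREMENT', 'FINE', 'FINE']
```

In this model the coarse search overshoots once before entering FINE, so the fine register only has to cover the last coarse step.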
In order to lower the power consumption in this DLL, only register_clock is used as the
clock to the controller module. Therefore, while the DLL is in lock mode, both increase_delay
and decrease_delay are deasserted and register_clock is not enabled. The controller module
has 128 flip-flops in each of the coarse and fine shift registers, so turning off the clock to the
shift registers when both are disabled lowers the power consumption. A flip-flop consumes
power whenever it is clocked, regardless of whether its D input changes. Disabling a clock when it is
not required saves power in digital circuits.
The Vernier delay line consists of 128 unit delays. Therefore, there are 128 flip-flops in
each of the coarse and fine shift registers. The finite state machine has four states.
Two flip-flops are required to encode the two bits representing these four states. In total,
there are 258 flip-flops in the controller module, so clock gating (disabling a clock when it
is not required) saves power when the DLL is in the lock mode.
3.2.3 High-resolution phase detector
The phase detector in the DLL detects the phase error between the output and input reference
clocks. The resolution of a Vernier DLL depends not only on the Vernier concept utilized
in the delay line, but also on how its phase detector is designed. The minimum phase error
that can be detected by the phase detector is defined as the phase detector's resolution. The
resolution of a phase detector depends on many factors, including the design methodology
and the CMOS technology used in chip fabrication.
A high-resolution phase detector is proposed in [50], where the delay of a buffer determines
the resolution. A resolution of 70 ps is achieved when it is implemented in 0.18 µm technology.
The phase detector has three outputs: Shift_Left, Shift_Right, and Clk, as shown in
Figure 3.14. When the rising edge of the input clock is within one unit delay (the delay of
U4) of the rising edge of the output clock, both outputs of the phase detector, Shift_Right
and Shift_Left, go low and Clk is turned off.
A divide-by-two is included in the phase detector, so the phase detector is made to wait at
least two clock cycles before making another decision, generating a high on either
Shift_Right or Shift_Left. This provides enough time for the shift register in the design
proposed in [50] to operate and for its output waveform to stabilize. On the other hand, it increases
the lock time, because a decision is now made only once every two input clock cycles.
Figure 3.14 Phase detector in [50].
A modified version of the high-resolution phase detector of [50] is proposed in this thesis,
which can significantly improve resolution. The Vernier methodology is implemented in
this design, which effectively reduces the amount of delay between the D inputs of U1 and
U2. As explained previously, the delays from the two inputs to the output of the AND
gate are not the same.
The delay difference is exploited in the Vernier delay line to achieve a very small fine
incremental unit delay. The same concept is used in the high-resolution phase
detector proposed in this thesis. The schematic of this phase detector is shown in Figure 3.15.
U7 and U8 introduce the same delay because both gates are connected through pin A1 of
the AND gate. The U3 gate introduces slightly more delay, because the A2 pin is used as its
input. The 0.18 µm technology library used for simulation and synthesis introduces less
than 10 ps of delay difference between the two input-to-output paths of an AND gate.
Figure 3.15 Proposed high resolution phase detector
The decrease_delay and increase_delay signals are ORed to generate register_clk. The resolution
of a phase detector is defined as the minimum detectable phase error between its two
inputs. If the phase error is within the resolution of the phase detector, then decrease_delay,
increase_delay and register_clk stay low.
The OR gate (U6) also delays register_clk relative to increase_delay and decrease_delay,
which guarantees the required setup time for the flip-flops in the controller that are clocked
by register_clk. In a flip-flop, the data must not change within the setup and hold window
around the clock edge; otherwise the output is unpredictable and can enter a metastable
(unstable) condition.
If the output clock leads the input clock by a margin greater than the resolution, a delay
difference is created between the A1 and A2 input pins and the output pin of the AND gate.
The Q pins of U1 and U2 then go high, resulting in a high on the increase_delay output.
On the other hand, if the input clock leads the output clock by a margin greater than the
resolution, the Q pins of U1 and U2 go low (/Q goes high for both U1 and U2), resulting
in a high on the decrease_delay output. In either case, register_clk goes high and generates
the required clock edge for the logic in the controller module, as shown in Figure 3.16.
Figure 3.16 Phase detector waveforms
If neither of the two cases applies, the input and output clocks are within the resolution of
the phase detector. In this case, the Q pin of U1 goes high and the Q pin of U2 goes low,
resulting in a low on both the increase_delay and decrease_delay outputs. This happens
when the output locks to the input and the DLL is in lock mode.
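The three cases can be summarized in a small behavioral model. This is a minimal sketch that abstracts away the flip-flops and gates: the sign convention for `phase_error_ps` and the function name are assumptions, and the default resolution of 4 ps is the value reported for the simulation.

```python
def phase_detector(phase_error_ps: float, resolution_ps: float = 4.0):
    """Behavioral model of the phase detector outputs.

    phase_error_ps > 0 means the output clock leads the input clock;
    phase_error_ps < 0 means the input clock leads the output clock.
    (This sign convention is an assumption made for the sketch.)
    """
    increase_delay = phase_error_ps > resolution_ps
    decrease_delay = phase_error_ps < -resolution_ps
    # register_clk is the OR of the two outputs (gate U6 in the text).
    register_clk = increase_delay or decrease_delay
    return increase_delay, decrease_delay, register_clk


print(phase_detector(10.0))   # output leads: (True, False, True)
print(phase_detector(-10.0))  # input leads: (False, True, True)
print(phase_detector(2.0))    # within resolution: (False, False, False)
```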
The divide-by-two logic (U3 and U7 in Figure 3.13) is not used in the proposed
high-resolution phase detector. The delay of the Vernier delay line increases or decreases by
a small differential amount equal to the resolution of the delay line. Therefore, the delay
line can stabilize before the next decision is taken on the next edge of the input clock, and
there is no need to wait for every other clock. This reduces the time required for the DLL to
achieve the lock mode. The lock mode is detected by the lock detector module, which is
described in the next section.
3.2.4 Lock detector
The lock detector is a very simple circuit that outputs a high when the DLL is in the lock
mode, as shown in Figure 3.17. If both increase_delay and decrease_delay are low on the
falling edge of the DLL input clock, then the output lock_indicator goes high to indicate
that the DLL is now in lock mode. The DLL input clock is used instead of register_clk
because, when the DLL goes into lock mode, register_clk is off and cannot clock in the low
values on decrease_delay and increase_delay.
Figure 3.17 Lock detector circuit
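The sampling rule of the lock detector can be sketched as follows. This is an illustrative model only: the class and method names are assumptions, and it collapses the circuit in Figure 3.17 into a single register sampled on the falling edge of the input clock.

```python
class LockDetector:
    """Samples the phase detector outputs on each falling edge of the input clock."""

    def __init__(self):
        self.prev_clk = 0
        self.lock_indicator = 0

    def tick(self, clk: int, increase_delay: int, decrease_delay: int) -> int:
        # A falling edge is a 1 -> 0 transition of the DLL input clock.
        if self.prev_clk == 1 and clk == 0:
            # Lock is indicated only when both phase detector outputs are low.
            self.lock_indicator = int(not (increase_delay or decrease_delay))
        self.prev_clk = clk
        return self.lock_indicator


detector = LockDetector()
detector.tick(1, 1, 0)          # clock high, increase_delay still active
print(detector.tick(0, 1, 0))   # falling edge, not locked: 0
detector.tick(1, 0, 0)          # clock high, both outputs low
print(detector.tick(0, 0, 0))   # falling edge, locked: 1
```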
Chapter 4
Analysis of proposed DLL
This chapter analyzes the simulation results, describes the testbench, and demonstrates
how the DLL achieves the lock mode. The coarse and fine phases of the locking process
are investigated and illustrated in the captured waveforms.
4.1 Testbench
A simple testbench instantiates the DLL design, the clock, and the reset generator. It also
introduces a glitch on the clock in order to examine how the DLL re-enters the lock mode
when its input clock phase changes abruptly. The lock_indicator signal is monitored;
whenever it goes high, the DLL has entered the lock mode. The target resolution is less
than 10 ps over the operating frequency range of 100 MHz to 200 MHz.
In order to verify that the DLL can recover from an abrupt input phase change, a glitch is
imposed on the input clock source after a set period of time. This drives the DLL out of
lock, and the DLL mechanism guarantees recovery: after some time, the DLL locks to the
input signal again. The time it takes for the DLL to lock depends on the input fluctuations,
the DLL architecture, the length of the delay line, and the algorithm used in the controller
module; the worst-case time is defined as the lock recovery period.
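As an illustration of the stimulus described here, the following sketch generates the rising-edge times of a clock whose phase jumps once. It is not the thesis's testbench: the function name and parameters are assumptions, and the 100 MHz clock and 500 ps jump are example values taken from the operating range and from one of the re-entry cases.

```python
def clock_edges_ps(period_ps: int, n_cycles: int, glitch_at_cycle: int, glitch_ps: int):
    """Return rising-edge times (ps) of a clock whose phase jumps once.

    From cycle `glitch_at_cycle` onward every edge is shifted by `glitch_ps`,
    modelling an abrupt input phase change. A positive value delays the edges.
    """
    edges = []
    for n in range(n_cycles):
        shift = glitch_ps if n >= glitch_at_cycle else 0
        edges.append(n * period_ps + shift)
    return edges


# 100 MHz clock (10 ns period) with a 500 ps phase jump at cycle 3.
edges = clock_edges_ps(period_ps=10_000, n_cycles=5, glitch_at_cycle=3, glitch_ps=500)
print(edges)  # [0, 10000, 20000, 30500, 40500]
```

Only one period is stretched by the glitch; every later period returns to its nominal length, which is why the DLL sees a step in phase rather than a change in frequency.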
The DLL described in this thesis is in lock mode when the controller's state machine is in
the FINE state and both the increase_delay and decrease_delay signals are inactive.
Depending on the imposed glitch, the lock mode can be regained within the FINE state,
provided that the glitch is smaller than one unit delay. Any variation larger than one unit
delay forces the state machine to enter the INCREMENT or DECREMENT state; it later
re-enters the FINE state and the DLL regains lock.
The testbench is configured for six different cases and exhaustively covers all the
operational modes of the DLL. The first two cases verify the general locking process after
power-up and reset, with the input clock either leading or lagging the output clock. The
other four cases verify the lock re-entry process when a glitch is applied to the input clock.
The four cases are distinguished by the size of the glitch and by the relative position of the
input and output clocks (leading or lagging). The following sections detail all the cases. All
the waveforms are included, and a description of the phase detector and controller operation
for every case clarifies the DLL's operating mechanism.
4.2 Initial lock
After power-up and reset, either the phase detector's increase_delay or its decrease_delay
output goes high, depending on the polarity of the phase error. If the input clock leads the
output clock, then decrease_delay is enabled. On the other hand, if the output clock leads
the input clock, then increase_delay is enabled. If the output clock is in phase with the
input clock, then both the increase_delay and decrease_delay signals (the phase detector
outputs to the controller module) are disabled.
If decrease_delay is enabled, then the state machine transits to the DECREMENT state. In
this state, at every clock the coarse shift register shifts one unit to the left, which decreases
the total delay by one unit. At some point the output clock starts leading the input clock,
which means that the coarse action is complete, and the state machine transits to the FINE
state. In this state, the fine shift register shifts to the right and at every cycle the total delay
of the delay line increases by an incremental value. As described in the previous chapter,
the incremental value is very small: 4 ps for the NAND gate used in this design. Finally,
the output clock is within the DLL resolution (4 ps) of the input clock, and increase_delay
is disabled. The lock_indicator signal goes high, which indicates that the DLL is locked.
The captured waveforms are shown in Figures 4.1 and 4.2. For clarity, only the relevant
signals are captured. The phase error between the output and input clocks is 2 ps after the
DLL locks, where the LOCK_INDICATOR signal is high, as shown in Figure 4.2.
Figure 4.1 Initial lock mode waveform for a leading input clock
Figure 4.2 Initial lock mode waveform for a leading input clock (zoomed in)
On the other hand, if increase_delay is enabled, then the state machine transits to the
INCREMENT state. In this state, at every clock edge the coarse shift register shifts one
unit to the right, which increases the total delay by one unit delay. At some point, the input
clock starts leading the output clock and decrease_delay is asserted, which means that the
coarse action is complete. The state machine then moves to the DECREMENT state and
after one clock cycle enters the FINE state, as shown in Figure 4.3. The reason for this
sequence is that the fine delay line output tap is initially set to the first tap, the leftmost
tap position of the chain, so the fine delay can only be increased. Therefore, by going to the
DECREMENT state the output clock leads the input clock again, but this time the phase
error is less than one unit delay. In the FINE state the delay then increases incrementally
until the phase error falls within the resolution and the lock state is achieved. The phase
error between the output and input clocks is 2 ps after the DLL locks, as shown in Figure 4.4.
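The two initial-lock sequences can be reproduced with a simplified controller model. This is a sketch under stated assumptions, not the thesis's controller: the function name, the starting tap values, and the `required_delay_ps` abstraction are hypothetical, the phase detector is reduced to a threshold comparison, and the tap limits of the fine delay line are omitted. The 78 ps coarse unit and 4 ps fine step are the values reported for the simulation.

```python
COARSE_PS = 78  # coarse unit delay reported for the simulation
FINE_PS = 4     # fine step, also used here as the phase detector resolution


def simulate_initial_lock(required_delay_ps, coarse_taps, max_cycles=1000):
    """Return (state trace, residual phase error in ps) once lock is reached.

    required_delay_ps is the delay that would align the output and input
    clocks. The fine tap starts at 0 (leftmost), so it can only be increased.
    """
    fine_taps = 0
    state = "IDLE"
    trace = [state]
    for _ in range(max_cycles):
        delay = coarse_taps * COARSE_PS + fine_taps * FINE_PS
        error = required_delay_ps - delay   # > 0: output clock leads
        increase = error > FINE_PS
        decrease = error < -FINE_PS
        if state == "FINE" and not (increase or decrease):
            return trace, error             # locked
        if state == "IDLE":
            state = "INCREMENT" if increase else "DECREMENT" if decrease else "FINE"
        elif state == "INCREMENT":
            if increase:
                coarse_taps += 1            # coarse register shifts right
            else:
                state = "DECREMENT"
        elif state == "DECREMENT":
            if decrease:
                coarse_taps -= 1            # coarse register shifts left
            else:
                state = "FINE"
        else:  # FINE (tap limits omitted in this sketch)
            fine_taps += 1 if increase else -1
        if trace[-1] != state:
            trace.append(state)
    raise RuntimeError("no lock within max_cycles")


print(simulate_initial_lock(400, coarse_taps=10))  # input clock leads at start
print(simulate_initial_lock(400, coarse_taps=0))   # output clock leads at start
```

With these example numbers the first call passes through IDLE, DECREMENT and FINE, and the second through IDLE, INCREMENT, DECREMENT and FINE; both end with a residual error of 2 ps, inside the 4 ps resolution.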
Figure 4.3 Initial lock mode waveform for a leading output clock
Figure 4.4 Initial lock mode waveform for a leading output clock (zoomed in)
4.3 Lock re-entry
Phase variations on the input clock due to jitter and glitches introduce phase error, which in
turn causes the DLL to exit the lock mode. This initiates a re-entry process, after which
the DLL resumes its lock status. Depending on the amount of phase error, the state
machine can stay in the FINE state or move to the INCREMENT or DECREMENT state.
The following sections explain these two cases in detail.
4.3.1 Lock re-entry (case 1)
If the phase error is within the dynamic range of the fine delay line, then the DLL re-enters
the lock mode and the state machine stays in the FINE state. The dynamic range of the fine
delay line is the range over which its delay can be increased or decreased without reaching
the limit in either direction. The total delay of the fine delay line is (N × Tf), where N is the
number of fine delay units in the chain and Tf is the delay of each fine unit. In this design
N is 128 and the delay of each fine unit is 4 ps. The 4 ps value is the difference between the
two input-to-output delays of a 2-input NAND gate in the library.
For example, if in lock mode the fine delay line's output is the middle tap of the chain, then
the fine delay can be increased or decreased by half of the total delay of the fine delay line,
or 256 ps. Any input phase error of less than 256 ps is therefore compensated and lock
mode is resumed while the state machine remains in the FINE state. The simulation result
is shown in Figure 4.5. The INCREASE_DELAY signal goes high for one clock, which
increases the total delay by one fine unit delay (4 ps) and compensates for the added 6 ps
input phase error. The remaining phase error is within the 4 ps resolution of the phase
detector, and the DLL is locked.
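The dynamic range figures follow directly from N and Tf. The short sketch below works them out; the function name and the tap indexing (tap 0 is the leftmost tap) are assumptions made for illustration.

```python
N_FINE = 128    # fine delay units in the chain
T_FINE_PS = 4   # delay of each fine unit, in ps


def fine_range_ps(tap: int):
    """Return (max decrease, max increase) in ps available from a given output tap."""
    return tap * T_FINE_PS, (N_FINE - tap) * T_FINE_PS


total_ps = N_FINE * T_FINE_PS
down_ps, up_ps = fine_range_ps(N_FINE // 2)
print(total_ps, down_ps, up_ps)  # 512 256 256
```

From the middle tap the fine delay line can absorb up to 256 ps in either direction; from the first tap it can only add delay, up to the full 512 ps.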
Figure 4.5 Lock re-entry mode waveform for small phase error
4.3.2 Lock re-entry (case 2)
The phase error cannot be corrected by fine action if the amount of error is larger than the
dynamic range of the fine delay line. For example, if in the lock mode the fine delay's
output tap is in the center of the fine delay line, then any phase error greater than half of the
fine delay line, or 256 ps, cannot be corrected while the state machine is in the FINE state.
A phase error is generated if the input clock leads the output clock. The decrease_delay
signal is enabled, and the fine delay line output tap shifts to the left until it reaches the first
tap of the fine delay line. At this point the state machine moves to the DECREMENT state
and coarse action is enabled. At every clock, the total delay of the DLL is decremented by
one unit delay, or 78 ps in the simulation. At a certain point, decrease_delay is deasserted
and increase_delay is enabled. The state machine then moves to the FINE state and finally
achieves the lock mode.
Figure 4.6 shows that the DLL originally locks at 220 ns. A 500 ps glitch is applied at
240 ns, and the DLL locks again at 550 ns. The final phase when the DLL locks again is
shown in Figure 4.7. The 500 ps glitch is large enough to take the DLL out of lock mode
and beyond the dynamic range of the fine delay line described in lock re-entry (case 1).
The introduced glitch is shown in Figure 4.8.
Figure 4.6 Lock re-entry mode waveform for a leading input clock
Figure 4.7 Lock re-entry mode waveform for a leading input clock (zoomed in)
Figure 4.8 Introduced glitch waveform for a leading input clock
A phase error is generated if the output clock leads the input clock. The increase_delay
signal is enabled, and the fine delay line output tap shifts to the right until it reaches the
last tap of the fine delay line. At this point, the state machine moves to the INCREMENT
state and coarse action is enabled. At every clock, the total delay of the DLL is incremented
by one unit delay, or 78 ps in the simulation. At a certain point increase_delay is deasserted
and decrease_delay is enabled. The state machine then moves to the FINE state and finally
achieves the lock mode.
Figure 4.9 shows that the DLL originally locks at 250 ns. A 2 ns glitch is applied at
300 ns, and the DLL locks again at 1315 ns. The final phase when the DLL locks again is
shown in Figure 4.10. The 2 ns input phase shift is introduced as a glitch, which causes the
DLL to exit the lock mode and the LOCK_INDICATOR signal to go low, as shown in
Figure 4.11.
Figure 4.9 Lock re-entry mode waveform for a leading output clock
Figure 4.10 Lock re-entry mode waveform for a leading output clock (zoomed in)
Figure 4.11 Introduced glitch waveform for a leading output clock
4.4 Gate count of the vernier unit delay
The proposed vernier unit delay was mapped to a commercial 0.18 µm library. The
total cell area is about 96 basic cells. The previously published unit delay [73] was also
mapped to the same library, and its total cell area is about 122 basic cells. Therefore, the
proposed unit delay saves about 20% of the gate count when implemented in the same
library. The gate count reduction is significant considering that hundreds of unit delay
blocks are needed in a typical delay line.
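The saving can be checked from the two cell counts quoted above. The variable names below are illustrative.

```python
proposed_cells = 96    # cell area of the proposed unit delay, in basic cells
previous_cells = 122   # cell area of the unit delay from [73], same library

saving = (previous_cells - proposed_cells) / previous_cells
print(f"{saving:.1%}")  # 21.3%
```

The computed value of roughly 21% is consistent with the "about 20%" figure used in the text.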
The static power consumption of a circuit is due to the leakage current and is proportional
to the gate count. Therefore, the static power consumption of the delay line is reduced by
about 20%. The dynamic power consumption of the circuit depends not only on the gate
size but also on the rate at which each gate toggles. The toggle rate is a function of the
logic and the operating frequency. The dynamic power consumption can be measured using
the dynamic test vectors that are generated during functional simulation.
Foundries provide practical formulas for estimating the dynamic power consumption. The
general guideline is that dynamic power consumption increases in proportion to the gate
count. Based on this rule of thumb, the proposed delay line realizes a dynamic power saving
of about 20%.
4.5 Resolution of the proposed DLL
The proposed vernier unit delay is based on the difference between the two input-to-output
delays of a dual-input NAND gate. For a NAND gate in the commercial 0.18 µm library,
this difference is less than 10 ps; it measures 4 ps in the functional simulation. The
previously published unit delay [73] is based on the delay difference of a NAND gate with
different fanout loads. Its resolution was one fifth of the delay of each unit block, i.e.,
about 20 ps if implemented in the same 0.18 µm library, assuming that the delay of
each unit delay block is 100 ps.
Therefore, the proposed design offers at least a 100% improvement in the resolution of the
delay line (a step of less than 10 ps compared with about 20 ps). The higher resolution
reduces the phase error between the output and input clocks of a DLL. At the same time,
the cycle-to-cycle jitter is also reduced, because the output clock is adjusted by a smaller
unit between two consecutive clock edges.
4.6 Limitations of the proposed DLL
The main limitation of the proposed DLL is that, depending on the phase error between
the input and output clocks, it can take up to 128 clock cycles for the DLL to lock, which is
considered relatively slow. For example, if the first output tap of the fine delay line is
selected while the DLL is locked, then a 512 ps glitch at the input that causes the output
clock to lead the input clock requires 128 input clock cycles before the DLL locks again.
The resolution of the fine delay line is 4 ps, so at every clock cycle the delay of the whole
delay line is increased by 4 ps; therefore 128 input clock cycles are required to lock.
This example is the worst case; normally the DLL locks in a shorter time. This thesis
concentrates mainly on improving a DLL's resolution. Further research can be done to
improve the lock time, for example by devising efficient algorithms that shorten the
lock time [33].
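The worst-case figure can be worked through as follows. This is a simplified estimate that assumes one fine step per input clock cycle and ignores the phase detector resolution; the function names are illustrative, and the two clock frequencies are the ends of the stated operating range.

```python
def relock_cycles(phase_error_ps: int, fine_step_ps: int = 4) -> int:
    """Clock cycles needed to absorb a phase error with one fine step per cycle."""
    return -(-phase_error_ps // fine_step_ps)  # ceiling division


def relock_time_ns(phase_error_ps: int, clock_mhz: float) -> float:
    """Relock time in ns for a given input clock frequency."""
    period_ns = 1000.0 / clock_mhz
    return relock_cycles(phase_error_ps) * period_ns


print(relock_cycles(512))          # 128
print(relock_time_ns(512, 100.0))  # 1280.0
print(relock_time_ns(512, 200.0))  # 640.0
```

Under these assumptions the 128-cycle worst case corresponds to about 1.28 µs at 100 MHz and 0.64 µs at 200 MHz.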
Chapter 5
Conclusion
Phase-locked loops (PLLs) and delay-locked loops (DLLs) have been widely adopted
to solve the clock skew problem. In recent years, DLLs have been widely used for clock
alignment due to their lower phase-error accumulation and faster locking time [35], [82].
DLLs are also used in many other applications, such as clock synthesis [2], [3], [6], clock
recovery [14], [19], [25], SDRAM controllers [26], [47], [65], Automatic Test Equipment
(ATE) [99], and Time-to-Digital Converters (TDC) [4], [22], [23].
The first DLLs were analog and were mainly used for clock distribution applications [10],
[13]. A conventional analog DLL consists of four main blocks: a voltage-controlled delay
line (VCDL), a charge pump, a low-pass filter, and a phase detector. The simple design of
the DLL offers many advantages when compared to VCO-based PLLs. It is nevertheless a
relatively complex analog circuit that requires process-specific implementation, which
makes it very difficult to reuse the same design in a different technology. In effect, an
analog DLL is a non-portable architecture, because major changes in the layout are required
to port a design from one technology to another.
Digital DLLs are characterized by their use of digital delay lines. They are typically made
from simple digital circuit elements. This simplicity helps in designing a portable digital
DLL that can be easily adapted to different technologies. Although the digital DLL uses
more area and power than the analog DLL, its greater simplicity and lower minimum
required power supply voltage make it very attractive for many applications.
The Register-controlled Delay Locked Loop (RDLL) belongs to the digital DLL family and
is widely used in high-speed synchronous DRAM (SDRAM) applications [17], [51], [85],
[90]. The RDLL consists of a tapped delay line, a shift register, a phase detector, and a
replica input buffer dummy [85].
The Synchronous Mirror Delay (SMD) and Clock Synchronized Delay (CSD) circuits are
non-feedback systems that can achieve lock in only two clock cycles [52], [88], [94].
Therefore, these circuits can be disabled in standby mode, and they can lock to the
reference clock in just two clock cycles when operation is resumed.
The latest DLLs use the Vernier principle, based on the Vernier caliper tool [83]. The
Vernier technique implemented in the proposed design is based on the characteristics of a
NAND gate and uses the difference between the two input-to-output delays of a dual-input
NAND gate to achieve a fine step resolution. The previous technique [73] was based on the
delay difference of a NAND gate with different fanout loads. The analysis in the previous
chapter shows that the resolution of the DLL is doubled by the new technique implemented
in the proposed design.
This thesis introduced a novel architecture for a high-resolution Vernier DLL with a
resolution of less than 10 ps. It combines the coarse and fine unit delay blocks into one
unit delay block in a way that effectively reduces the area of the delay line. This reduction
is significant when taking into account the number of unit delay blocks required in a
typical delay line. The combination of a smaller delay line and the integration of the fine
and coarse controllers reduces the DLL power consumption. The analysis in the previous
chapter shows that a gate count reduction of about 20% in the delay line is achieved by
using the proposed unit delay block. It also shows that the total power consumed by the
delay line is reduced by approximately 20%.
A testbench was written that exhaustively covers all the different operational modes of the
DLL. The first two cases verify the general locking process after power-up and reset, with
the input clock either leading or lagging the output clock. The other four cases verify the
lock re-entry process when a glitch is applied to the input clock.
A linear control algorithm is used in this thesis to achieve lock mode. The controller
linearly increases or decreases the total delay of the delay line. For a faster lock time, the
SARDLL [33] incorporates a binary search algorithm, which reduces the searching effort
and speeds up the locking process. Various lock mechanisms can be explored in order to
shorten the lock time of the DLL; this can be considered a topic for future research.
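The difference between linear and binary search can be illustrated over a 128-tap delay line. The sketch below is a generic binary search over tap indices, not the SAR controller of [33]; the function names are assumptions, and the comparison against `target_tap` stands in for the phase detector decision.

```python
def linear_search_steps(target_tap: int) -> int:
    """Steps taken when the controller moves one tap per clock from tap 0."""
    return target_tap


def binary_search_taps(target_tap: int, n_taps: int):
    """Return (tap found, number of decisions) for a binary search over taps."""
    lo, hi = 0, n_taps - 1
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if target_tap > mid:   # phase detector would report "increase delay"
            lo = mid + 1
        else:                  # phase detector would report "decrease delay" or lock
            hi = mid
    return lo, steps


print(linear_search_steps(127))      # 127
print(binary_search_taps(127, 128))  # (127, 7)
```

For 128 taps, the binary search needs 7 decisions regardless of the target, whereas the linear search needs up to 127 single-tap steps.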
Bibliography
[1] T.Hamamoto, K.Furutani, T.Kubo, S.Kawasaki, H.Iga, T.Kono, Y.Konishi, T.Yoshihara, "A 667-Mb/s Operating Digital DLL Architecture for 512-Mb DDR SDRAM," IEEE J. Solid-State Circuits, vol. 39, NO.1, pp. 194-206, Jan 2004.
[2] C.C.Chung, C.Y.Lee, "A New DLL-Based Approach for All-Digital Multiphase Clock Generation," IEEE J. Solid-State Circuits, vol. 39, NO.3, pp. 469-471, Mar 2004.
[3] R.F.Rad, A.Nguyen, J.M.Tran, T.Greer, J.Poulton, W.J.Dally, J.H.Edmondson, R.Senthinathan, R.Rathi, M.E.Lee, H.T.Ng, "A 33-mw 8-Gb/s CMOS Clock Multiplier and CDR for Highly Integrated I/Os," IEEE J. Solid-State Circuits, vol. 39, NO.9, pp. 1553-1561, Sept 2004.
[4] C.S.Hwang, P.Chen, H.W.Tsao, "A High-Precision Time-to-Digital Converter Using a Two-Level Conversion Scheme," IEEE Transactions on Nuclear Science, vol. 51, NO.4, pp. 1349-1352, Aug 2004.
[5] A.H.Chan, G.W.Roberts, "A Jitter characterization system using a component-invariant Vernier delay line," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 12, NO.1, pp. 79-95, Jan 2004.
[6] C.S.Hwang, P.Chen, H.W.Tsao, "A wide-range and fast-locking clock synthesizer IP based on delay-locked-loop," ISCAS 2004, Proceedings of the 2004 International Symposium on, Vol.1, May 2004, pp. 352-361.
[7] K.Kim, N.Park, T.Kim, "An unlimited lock range DLL for clock generator," ISCAS 2004, Proceedings of the 2004 International Symposium on, Vol.1, May 2004, pp. 352-361.
[8] K.Cheng, Y.Lo, W.Fang, S.Hung, "A mixed-mode delay-locked loop for wide-range operation and multiphase clock generation," System-on-chip for Real-Time Applications, 2003 Proceedings, Jul 2003, pp. 90-93.
[9] A.Suzuki, S.Kawahito, D.Miyazaki, M.Furuta, "A digitally skew correctable multi-phase clock generator using a master-slave DLL," ISCAS 03, Proceedings of the 2003 International Symposium on, Vol.1, May 2003, pp. 105-108.
[10] K.Taesung, K.Beomsup, "Phase interpolator using delay locked loop," Mixed-Signal Design, 2003, Southwest Symposium on, Feb 2003, pp. 76-80.
[11] Z.Jingcheng, D.Qingjin, T.Kawasniewski, "A -107dBc, 10KHz Carrier offset 2-GHz DLL-based frequency synthesizer," Custom Integrated Circuits Conference, 2003, Proceedings of the IEEE 2003, Sept 2003, pp. 301-304.
[12] G.Manganaro, S.Kwak, S.Bugeja, "A dual 10b 200MSPS pipeline D/A converter with DLL-based clock synthesizer," Custom Integrated Circuits Conference, 2003, Proceedings of the IEEE 2003, Sept 2003, pp. 301-304.
[13] H.Chang, C.Sun, S.Liu, "A low-jitter and precise multiphase delay-locked loop using shifted averaging VCDL," in ISSCC 2003 Dig. Tech. Papers, Vol.1, 2003, pp. 434-505.
[14] W.Rhee, H.Ainspan, S.Rylov, A.Rylyakov, M.Beakes, D.Friedman, S.Gowada, M.Soyuer, "A 10-Gb/s CMOS clock and data recovery circuit using a secondary delay-locked-loop," Custom Integrated Circuits Conference, 2003, Proceedings of the IEEE 2003, Sept 2003, pp. 81-84.
[15] G.Wei, J.Stonick, D.Weinlader, J.Sonntag, S.Searles, "A 500MHz MP/DLL Clock Generator for a 5Gb/s Backplane Transceiver in 0.25 CMOS," in ISSCC 2003 Dig. Tech. Papers, Vol.1, 2003, pp. 464-465.
[16] S.J.Kim, S.H.Hong, J.K.Wee, J.H.Ahn, J.Y.Chung, "A low Jitter, fast recoverable, fully analog DLL using tracking ADC for high speed and low stand-by power DDR I/O interface," VLSI Circuits, 2003, Digest of Technical Papers, 2003 Symposium on, June 2003, pp. 285-286.
[17] J.T.Kwak, C.K.Kwon, K.W.Kim, S.H.Lee, J.S.Kih, "A low cost high performance register-controlled digital DLL for 1 Gbps ×32 DDR SDRAM," VLSI Circuits, 2003, Digest of Technical Papers, 2003 Symposium on, June 2003, pp. 283-284.
[18] K.H.Cheng, Y.L.Lo, W.F.Yu, S.Y.Hung, "A mixed-mode delay-locked loop for wide-range operation and multiphase clock generation," System-on-Chip for Real-Time Applications, 2003, Proceedings, The 3rd IEEE International Workshop on, Jul 2003, pp. 90-93.
[19] Z.Mao, T.H.Szymansli, "A 4Gb/s CMOS fully-differential analog dual delay-locked loop clock/data recovery circuit," Electronics, Circuit and Systems, 2003, ICECS 2003, Proceedings of the 2003 10th IEEE International Conference on, Vol.2, Dec 2003, pp. 559-562.
[20] M.E.Lee, W.J.Dally, T.Greer, H.T.Ng, R.F.Rad, J.Poulton, R.Senthinathan, "Jitter Transfer Characteristics of Delay-Locked Loops, Theories and Design Techniques," IEEE J. Solid-State Circuits, vol. 38, NO.4, pp. 614-621, Apr 2003.
[21] T.Matano, Y.Takai, T.Takahashi, Y.Sakito, I.Fujii, Y.Takaishi, H.Fujisawa, S.Kubouchi, S.Narui, K.Arai, M.Morino, M.Nakamura, S.Miyatake, T.Sekiguchi, K.Koyama, "A 1-Gb/s/pin 512-Mb DDRII SDRAM Using a Digital DLL and a Slew-Rate-Controlled Output Buffer," IEEE J. Solid-State Circuits, vol. 38, NO.5, pp. 762-768, May 2003.
[22] S.Tabatabaei, A.Ivanov, "Embedded Timing Analysis: A SOC Infrastructure," IEEE Design & Test Of Computers, vol. 19, NO.3, pp. 24-36, June 2002.
[23] S.Tabatabaei, A.Ivanov, "An embedded core for Sub-Picosecond timing measurements," Test Conference, 2002, Proceedings of ITC International, pp. 129-137, Oct 2002.
[24] A.H.Chan, G.W.Roberts, "A deep sub-micron timing measurement circuit using a single-stage Vernier delay line," Custom Integrated Circuits Conference, 2002, Proceedings of the IEEE 2002, May 2002, pp. 77-80.
[25] X.Millard, F.Devisch, M.Kuijk, "A 900-Mb/s CMOS Data Recovery DLL using Half-Frequency Clock," IEEE J. Solid-State Circuits, vol. 37, NO.6, pp. 711-715, June 2002.
[26] S.J.Kim, S.H.Hong, J.K.Wee, J.H.Cho, P.S.Lee, J.H.Ahn, J.Y.Chung, "A Low-Jitter Wide-Range Slew-Calibrated Dual-Loop DLL Using Antifuse Circuitry for High-Speed DRAM," IEEE J. Solid-State Circuits, vol. 37, NO.6, pp. 726-734, June 2002.
[27] R.F.Rad, W.Dally, H.T.Ng, R.Senthinathan, M.E.Lee, R.Rathi, J.Poulton, "A Low-Power Multiplying DLL for Low-Jitter Multigigahertz Clock Generation in Highly Integrated Digital Chips," IEEE J. Solid-State Circuits, vol. 37, NO.12, pp. 1804-1812, Dec 2002.
[28] C.Kim, I.C.Hwang, S.M.Kang, "A Low-Power Small-Area +/-7.28-ps-Jitter 1-GHz DLL-Based Clock Generator," IEEE J. Solid-State Circuits, vol. 37, NO.11, pp. 1414-1420, Nov 2002.
[29] Y.J.Jung, S.W.Lee, D.Shim, W.Kim, C.Kim, S.I.Cho, "A Dual-Loop Delay-Locked Loop Using Multiple Voltage-Controlled Delay Lines," IEEE J. Solid-State Circuits, vol. 36, NO.5, pp. 784-791, May 2001.
70
D.J.Foley, M.Flynn, "CMOS DLL-Based 2-V 3.2-ps Jitter 1-GHz Clock Synthesizer and Temperature-Compensated Tunable Oscillator," IEEE J. Solid-State Circuits, vol. 36, NO.3, pp. 417-423, Mar 2001.
G.K.Dehng, J.W.Lyn, S.I.Liu, "A Fast-Lock Mixed-Mode DLL Using a 2-b SAR algorithm," IEEE J. Solid-State Circuits, vol. 36, NO.10, pp. 1464-1471, Oct 2001.
J.B.Lee, K.H.Kim, C.Yoo, S.Lee, O.G.Na, C.Y.Lee, H.Y.Song, J.S.Lee, Z.H.Lee, K.W.Yeom, H.J.Chung, I.W.Seo, M.S.Chae, Y.H.Choi, S.I.Cho, "Digitally-Controlled DLL and I/O Circuits for 500Mb/S/Pin x16 DDR SDRAM," ISSCC Dig. Tech. Papers, Feb 2001, pp.68-70.
G.K.Dehng, J.M.Hsu, C.Y.Yang, S.I.Liu, "Clock-Deskew Buffer Using a SAR-Controlled Delay-Locked Loop," IEEE J. Solid-State Circuits, vol. 35, pp. 1128-1136, Aug 2000.
Y.Moon, J.Choi, K.Lee, D.K.Jeong, and M.K.Kim, "An All-Analog Multiphase Delay-Locked Loop Using a Replica Delay Line for Wide-Range Operation and Low-Jitter Performance," IEEE J. Solid-State Circuits, vol.35, pp. 377-384, Mar 2000.
H.Lee, H.Q.Nguyen, D.W.Potter, "Design Self-Synchronized Clock Distribution Networks In An SOC ASIC Using DLL With Remote Clock Feedback," ASIC/SOC Conference, 2000, Proceedings, 13th Annual IEEE International, Sept 2000, pp.248-252.
P.Dudek, S.Szczepanski, J.V.Hatfield, "A High-Resolution CMOS Time-to-Digital Converter Utilizing a Vernier Delay Line," Solid-State Circuits, IEEE Transactions on, vol 35, NO.2, pp. 240-247, Feb 2000.
K.Minami, M.Mizuno, H.Yamaguchi, T.Nakano, Y.Matsushima, Y.Sumi, T.Sato, H.Yamashida, M.Yamashina, "A 1GHz Portable Digital Delay-Locked Loop with infinite Phase Capture Ranges," ISSCC Dig. Tech. Papers, Feb 2000, pp.350-351.
Y.J.Jung, S.W.Lee, D.Shim, W.Kim, C.H.Kim, S.I.Cho, "A low Jitter Dual Loop DLL using Multiple VCDLs with a Duty Cycle Corrector," VLSI Circuits, 2000, Digest of Technical Papers, 2000 Symposium on, pp. 50-51.
D.J.Foley, M.P.Flynn, "CMOS DLL Based 2V, 3.2ps Jitter, 1GHz Clock Synthesizer and Temperature Compensated Tunable Oscillator," Custom Integrated Circuits Conference, 2000, Proceedings of the IEEE 2000, May 2000, pp.371-374.
S.S.Hwang, K.M.Joo, H.J.Park, J.W.Kim, P.Chung, "A DLL based 10-320 MHz Clock Synchronizer," ISCAS 2000, Proceedings of the 2000 International Symposium on, Vol.1, May 2000, pp.265-268.
D.J.Foley, M.P.Flynn, "A 3.3V, 1.6GHz, Low-Jitter, Self-Correcting DLL Based Clock Synthesizer in 0.5µm CMOS," ISCAS 2000, Proceedings of the 2000 International Symposium on, Vol.1, May 2000, pp.249-252.
G.Chien, P.R.Gray, "A 900-MHz Local Oscillator Using a DLL-Based Frequency Multiplier Technique for PCS applications," IEEE J. Solid-State Circuits, vol.35, NO.12, pp. 1996-1999, Dec 2000.
J.H.Lee, S.H.Han, H.J.Yoo, "A 330MHz Low-Jitter and Fast-Locking Direct Skew Compensation DLL," ISSCC Dig. Tech. Papers, Feb 2000, pp.352-353.
S.Kuge, T.Kato, K.Furutani, S.Kikuda, K.Mitsui, T.Hamamoto, J.Setogawa, K.Hamade, Y.Komiya, S.Kawasaki, T.Kono, T.Amano, T.Kubo, M.Haraguchi, Y.Nakaoka, M.Akiyama, Y.Konishi, H.Ozaki, T.Yoshihara, "A 0.18-µm 256-Mb DDR-SDRAM with Low-Cost Post-Mold Tuning Method for DLL Replica," IEEE J. Solid-State Circuits, vol.35, NO.11, pp. 1680-1689, Nov 2000.
S.S.Hwang, "Dual-Loop DLL-based clock synthesizer," Electronics Letters, vol 36, NO. 14, pp. 1173-1174, Jul 2000.
T.Hamamoto, S.Kawasaki, K.Furutani, K.Yasuda, Y.Konishi, "A skew and jitter suppressed DLL architecture for high frequency DDR SDRAMs," VLSI Circuits, 2000, Digest of Technical Papers, 2000 Symposium on, Mar 2000, pp. 76-77.
J.J.Kim, S.B.Lee, T.S.Jung, C.H.Kim, S.I.Cho, B.Kim, "A Low-Jitter Mixed-Mode DLL for High-Speed DRAM Applications," IEEE J. Solid-State Circuits, vol.35, NO.10, pp. 1430-1436, Oct 2000.
C.S.Hwang, W.C.Chung, C.Y.Wang, H.W.Tsao, S.I.Liu, "A 2V Clock Synthesizer using Digital Delay-Locked Loop," ASIC, 2000, Proceedings of the 2nd IEEE Asia-Pacific Conference on, Aug 2000, pp.91-94.
S.Eto, H.Akita, K.Isobe, K.Tsuchida, H.Toda, T.Seki, "A 333MHz, 20mW, 18ps Resolution Digital DLL using Current-Controlled Delay with Parallel Variable Resistor DAC (PVR-DAC)," ASIC, 2000, Proceedings of the 2nd IEEE Asia-Pacific Conference on, Aug 2000, pp.349-350.
H.Yoon, G.Cha, C.Yoo, N.J.Kim, K.Y.Kim, C.H.Lee, K.N.Lim, K.Lee, J.Y.Jeon, T.S.Jung, H.Jeong, T.Y.Chung, K.Kim and S.I.Cho, "A 2.5-V, 333-Mb/s/pin, 1-Gbit, Double-Data-Rate Synchronous DRAM," IEEE J. Solid-State Circuits, vol. 34, NO.11, pp. 1589-1599, Nov 1999.
F.Lin, J.Miller, A.Schoenfeld, M.Ma, and R.J.Baker, "A Register-Controlled Symmetrical DLL for Double-Data-Rate DRAM," IEEE J. Solid-State Circuits, vol. 34, pp. 565-568, Apr 1999.
T.Saeki, K.Minami, H.Yoshida, H.Suzuki, "A Direct-Skew-Detect Synchronous Mirror Delay for Application-Specific Integrated Circuits," IEEE J. Solid-State Circuits, vol. 34, pp. 372-379, Mar 1999.
W.Rhee, A.Ali, "An On-Chip Phase compensation technique in fractional-N frequency synthesis," ISCAS 1999, Proceedings of the 1999 International Symposium on, Vol.3, June 1999, pp.363-366.
A.Mantyniemi, T.Rahkonen, J.Kostamovaara, "A High Resolution digital CMOS Time-To-Digital converter based on nested Delay Locked Loops," ISCAS 1999, Proceedings of the 1999 International Symposium on, Vol.2, June 1999, pp.537-540.
S.Nagavarapu, J.Yan, E.K.F.Lee, R.L.Geiger, "An asynchronous data recovery/retransmission technique with foreground DLL calibration," ISCAS 1999, Proceedings of the 1999 International Symposium on, Vol.6, June 1999, pp.354-357.
R.L.Aguiar, D.M.Santos, "Simulation and modeling of digital Delay Locked Loops," ISCAS 1999, 42nd Midwest Symposium On, Vol.2, Aug 1999, pp.843-846.
R.L.Aguiar, D.M.Santos, "Modeling Charge-Pump Delay Locked Loops," ICECS 1999, The 6th IEEE International Conference On, Vol.2, Sept 1999, pp.823-826.
S.H.Han, J.H.Lee, H.J.Yoo, "A fast lock-on time Mixed Mode DLL with 10ps jitter," VLSI and CAD, 1999, ICVC 1999, The 6th IEEE International Conference On, Oct 1999, pp.564-565.
M.Miyazaki, K.Ishibashi, "A 3-Cycle lock time Delay-Locked Loop with a parallel phase detector for low power mobile systems," ASICs, 1999, AP-ASIC 1999, The First IEEE Asia Pacific Conference On, Aug 1999, pp.396-399.
Y.S.Song, J.K.Kang, "A Delay-Locked Loop circuit with Mixed-Mode tuning," ASICs, 1999, AP-ASIC 1999, The First IEEE Asia Pacific Conference On, Aug 1999, pp.347-350.
P.D.Capofreddi, C.D.Baringer, J.F.Jenson, M.J.W.Rodwell, W.P.Posey, M.W.Yung, Y.M.Xie, "A Clock and Data recovery IC for communications and radar applications," Design Of Mixed-Mode Integrated Circuits and Applications, 1999, Third International Workshop On, Jul 1999, pp.88-90.
T.Toifi, R.Vari, P.Moreira, A.Marchioro, "4-Channel Rad-Hard Delay Generation ASIC with 1ns Timing Resolution for LHC," Nuclear Science, IEEE Transactions On, Vol.46, NO.3, June 1999, pp.423-427.
J.Park, Y.Koo, W.Kim, "A Semi-Digital Delay-Locked Loop for clock skew minimization," VLSI Design, 1999, Proceedings of 12th International Conference On, Jan 1999, pp.584-588.
A.Balatsos, D.Lewis, "Low-Skew clock generator with dynamic impedance and delay matching," ISSCC Dig. Tech. Papers, Feb 1999, pp.182-183.
L.Paris, J.Benzreba, P.Demone, M.Dunn, L.Falkenhagen, P.Gillingham, I.Harrison, W.He, D.Macdonald, M.Macintosh, B.Millar, K.Wu, H.J.Oh, J.Stender, V.Chen, J.Wu, "A 800MB/s 72Mb SLDRAM with digitally calibrated DLL," ISSCC Dig. Tech. Papers, Feb 1999, pp.414-415.
Y.Moon, D.K.Jeong, "A 1Gbps transceiver with Receiver-End deskewing capability using Non-Uniform Tracked Oversampling and a 250-750 MHz Four-Phase DLL," 1999 Symposium On VLSI Circuits, Dig. Tech. Papers, pp.47-48.
F.Mu, A.Edman, C.Sevenson, "Digital Multiphase Clock/Pattern Generator," IEEE J. Solid-State Circuits, vol.34, NO.2, pp. 182-191, Feb 1999.
S.I.Liu, J.H.Lee, H.W.Tsao, "Low-Power Clock-Deskew Buffer for High-Speed Digital Circuits," IEEE J. Solid-State Circuits, vol.34, NO.4, pp. 554-558, Apr 1999.
M.Mota, J.Christiansen, "A High-Resolution Time Interpolator Based on a Delay Locked Loop and an RC Delay Line," IEEE J. Solid-State Circuits, vol.34, NO.10, pp. 1360-1366, Oct 1999.
Y.Nakase, Y.Morooka, D.J.Perlman, D.J.Kolar, J.M.Choi, H.J.Shin, T.Yoshimura, N.Watanabe, Y.Matsuda, M.Kumanoya, M.Yamada, "Source-Synchronization and Timing Vernier Techniques for 1.2-GB/s SLDRAM interface," IEEE J. Solid-State Circuits, vol.34, NO.4, pp. 494-501, Apr 1999.
B.W.Garlepp, K.S.Donnelly, J.Kim, P.S.Chau, J.L.Zerbe, C.Huang, C.V.Tran, C.L.Portmann, D.Stark, Y.F.Chan, T.H.Lee, M.A.Horowitz, "A Portable Digital DLL for High-Speed CMOS Interface Circuits," IEEE J. Solid-State Circuits, vol.34, NO.5, pp. 632-644, May 1999.
C.Kim, H.K.Kyung, W.P.Jeong, J.S.Kim, B.S.Moon, J.W.Chai, S.M.Yim, J.H.Choi, K.H.Han, C.J.Park, H.S.Hwang, H.Choi, S.B.Cho, C.L.Portmann, S.I.Cho, "A 2.5-V, 72-Mbit, 2.0-GByte/s Packet-Based DRAM with a 1.0-Gbps/pin Interface," IEEE J. Solid-State Circuits, vol.34, NO.5, pp. 645-652, May 1999.
S.Eto, M.Matsumiya, M.Takita, Y.Ishii, T.Nakamurra, K.Kawabata, H.Kano, A.Kitamoto, T.Ikeda, T.Koga, M.Higashiro, Y.Serizawa, K.Itabashi, O.Tsuboi, Y.Yokoyama, and M.Taguchi, "A 1 Gb SDRAM with ground level precharged bit-line and non-boosted 2.1V word line," IEEE J. Solid-State Circuits, vol. 33, NO.11, pp. 1697-1702, Nov 1998.
M.Hasegawa, M.Nakamura, S.Narui, S.Ohkuma, Y.Kawase, H.Endoh, S.Miyatake, T.Akiba, K.Kawakita, M.Yoshida, S.Yamada, T.Sekiguchi, I.Asano, Y.Tadaki, R.Nagai, S.Miyako, K.Kajigaya, M.Horiguchi, and Y.Nakagome, "A 256 Mb SDRAM with subthreshold leakage current suppression," in ISSCC 1998 Dig. Tech. Papers, Feb 1998, pp. 80-81.
C.H.Kim, J.H.Lee, J.B.Lee, B.S.Kim, C.S.Park, S.B.Lee, S.Y.Lee, C.W.Park, J.G.Roh, H.S.Nam, D.G.Kim, D.Y.Lee, T.S.Jung, H.Yoon, S.I.Cho, "A 64-Mbit, 640-MByte/s bidirectional data strobed, Double-Data-Rate SDRAM with a 40-mW DLL for a 256-MByte memory system," IEEE J. Solid-State Circuits, vol.33, NO.11, pp. 1703-1710, Nov 1998.
B.S.Kim, L.S.Kim, "100 MHz all-digital Delay-Locked Loop for low power application," Electronics Letters, vol 34, NO.18, pp. 1739-1740, Sept 1998.
S.J.Jang, S.H.Han, C.S.Kim, Y.H.Jun, H.J.Yoo, "A compact ring delay line for high speed synchronous DRAM," VLSI Circuits, 1998, Digest of Technical Papers, 1998 Symposium on, pp. 60-61.
B.W.Garlepp, K.S.Donnelly, J.Kim, P.S.Chau, J.L.Zerbe, C.Huang, C.V.Tran, C.L.Portmann, D.Stark, Y.F.Chan, T.H.Lee, M.A.Horowitz, "A portable digital DLL architecture for CMOS interface circuits," VLSI Circuits, 1998, Digest of Technical Papers, 1998 Symposium on, pp. 214-215.
T.Yoshimura, Y.Nakase, N.Watanabe, Y.Morooka, Y.Matsuda, M.Kumanoya, H.Hamano, "A Delay-Locked Loop and 90-degree phase shifter for 800Mbps Double Data Rate memories," VLSI Circuits, 1998, Digest of Technical Papers, 1998 Symposium on, pp. 66-67.
D.Birru, "A novel Delay-Locked Loop based CMOS clock multiplier," Consumer Electronics, IEEE Transactions On, vol.44, NO.4, pp. 1319-1322, Nov 1998.
M.Mota, J.Christiansen, "A four channel, self-calibrating, high resolution, Time To Digital Converter," Electronics, Circuits and Systems, 1998 IEEE International Conference On, vol.1, pp. 409-412, Sept 1998.
R.L.Aguiar, D.M.Santos, "Wide-Area clock distribution using controlled delay lines," Electronics, Circuits and Systems, 1998 IEEE International Conference On, vol.2, pp. 63-66, Sept 1998.
M.S.Gorbics, J.Kelly, K.M.Roberts and R.L.Sumner, "A High Resolution Multihit Time to Digital Converter Integrated Circuit," IEEE Transactions on Nuclear Science, vol 44, pp. 379-384, June 1997.
S.Sidiropoulos, M.Horowitz, "A Semidigital Dual Delay-Locked Loop," IEEE J. Solid-State Circuits, vol. 32, pp. 1683-1692, Nov 1997.
A.Hatakeyama, H.Mochizuki, T.Aikawa, M.Takita, Y.Ishii, H.Tsuboi, S.Y.Fujioka, S.Yamaguchi, M.Koga, Y.Serizawa, K.Nishimura, K.Kawabata, Y.Okajima, M.Kawano, H.Kojima, K.Mizutani, T.Anezaki, M.Hasegawa, and M.Taguchi, "A 256-Mb SDRAM using a register-controlled digital DLL," IEEE J. Solid-State Circuits, vol. 32, pp. 1728-1732, Nov 1997.
G.C.Moyer, M.Clements, W.Liu, T.Schaffer, R.K.Cavin, "The Delay Vernier pattern generation technique," IEEE J. Solid-State Circuits, vol.32, NO.4, pp. 551-562, Apr 1997.
K.Gotoh, S.Wakayama, M.Saito, J.Ogawa, H.Tamura, Y.Okajima, M.Taguchi, "All-Digital Multi-Phase Delay Locked Loop for internal timing generation in embedded and/or high speed DRAMs," VLSI Circuits, 1997, Digest of Technical Papers, 1997 Symposium on, pp. 107-108.
T.Saeki, H.Nakamura, J.Shimizu, "A 10ps jitter 2 clock cycle lock time CMOS digital clock generator based on an interleaved synchronous mirror delay scheme," VLSI Circuits, 1997, Digest of Technical Papers, 1997 Symposium on, pp. 109-110.
[89] S.Sidiropoulos, M.Horowitz, "A Semi-Digital DLL with unlimited phase shift capability and 0.08-400MHz operating range," ISSCC Dig. Tech. Papers, Feb 1997, pp.332-333.
[90] A.Hatakeyama, H.Mochizuki, T.Aikawa, M.Takita, Y.Ishii, H.Tsuboi, S.Fujioka, S.Yamaguchi, M.Koga, Y.Serizawa, K.Nishimura, K.Kawabata, Y.Okajima, M.Kawano, H.Kojima, K.Mizutani, T.Anezaki, M.Hasegawa, M.Taguchi, "A 256Mb SDRAM using a Register-Controlled Digital DLL," ISSCC Dig. Tech. Papers, Feb 1997, pp.72-73.
[91] S.Gogaert, M.Steyaert, "A skew tolerant CMOS level-based ATM data-recovery system without PLL topology," Custom Integrated Circuits Conference, 1997, Proceedings of the IEEE 1997, Sept 1997, pp.453-456.
[92] B.S.Kim, L.S.Kim, "A low power 100MHz All Digital Delay-Locked Loop," ISCAS 1997, Proceedings of the 1997 International Symposium on, Vol.1, May 1997, pp. 1820-1823.
[93] V.Lines, M.A.Scido, C.Mar, A.Achyuthan, "High speed circuit techniques in a 150MHz 64M SDRAM," Memory Technology, Design and Testing, 1997, Proceedings International Workshop On, Aug 1997, pp.8-11.
[94] T.Saeki, Y.Nakaoka, M.Fujita, A.Tanaka, K.Nagata, K.Sakakibara, T.Matano, Y.Hoshino, K.Miyano, S.Isa, E.Kakehashi, J.Drynan, M.Komuro, T.Fukase, H.Iwasaki, J.Sekine, M.Igeta, N.Nakanishi, T.Itani, K.Yoshida, H.Yoshina, S.Hashimoto, T.Yshii, M.Ichinose, T.Imura, M.Uziie, K.Koyama, Y.Fukuzo, and T.Okuda, "A 2.5 ns clock access 250 MHz 256 Mb SDRAM with synchronous mirror delay," ISSCC 1996 Dig. Tech. Papers, Feb 1996, pp. 374-375.
[95] A.Chau, D.Deusschere, S.Dow, J.Flasck, M.E.Levi, F.Kristen, E.Su, "A Multi-Channel Time-to-Digital converter chip for drift chamber readout," Nuclear Science, IEEE Transactions On, Vol.43, NO.3, June 1996, pp. 1720-1724.
[96] D.M.Santos, S.F.Dow, M.E.Levi, "A CMOS Delay-Locked Loop and Sub-Nanosecond Time-to-Digital converter chip," Nuclear Science, IEEE Transactions On, Vol.43, NO.3, June 1996, pp.1717-1719.
[97] J.Christiansen, "An Integrated High Resolution CMOS Timing Generator Based on an Array of Delay Locked Loops," IEEE J. Solid-State Circuits, vol.31, NO.7, pp. 952-957, Jul 1996.
[98] S.Tanoi, T.Tanabe, K.Takahashi, S.Miyamoto, M.Uesugi, "A 250-622 MHz Deskew and Jitter-Suppressed Clock Buffer Using Two-Loop Architecture," IEEE J. Solid-State Circuits, vol.31, NO.4, pp. 487-493, Apr 1996.
[99] J.Chapman, J.Currin, S.Payne, "A Low-Cost High-Performance CMOS Timing Vernier for ATE," Test Conference, 1995, Proceedings International, pp. 459-468, Oct 1995.
[100] R.F.Ormondroyd, "The acquisition performance of Delay-Locked Loops in noise," Radio Receivers and Associated Systems, Sept 1995, pp.192-197.
[101] E.R.Ruotsalainen, T.Rahkonen, J.Kostamovaara, "A Low-Power CMOS Time-to-Digital converter," IEEE J. Solid-State Circuits, vol.30, NO.9, pp. 984-990, Sept 1995.
[102] H.Sutoh, K.Yamakoshi, M.Ino, "Circuit technique for Skew-Free Clock distribution," Custom Integrated Circuits Conference, 1995, Proceedings of the IEEE 1995, Sept 1995, pp.163-166.
[103] B.Kim, T.C.Weigandt, P.R.Gray, "PLL/DLL system noise analysis for Low-Jitter Clock synthesizer design," ISCAS 1994, Proceedings of the 1994 International Symposium on, Vol.4, June 1994, pp.31-34.
[104] T.Lee, "A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500 MB/s DRAM," IEEE J. Solid-State Circuits, vol. 29, pp. 1491-1496, Dec 1994.
[105] M.Izzard, "Analog versus digital control of a clock synchronizer for a 3 Gb/s data with 3.0 V differential ECL," in Dig. Tech. Papers 1994 Symp. VLSI Circuits, June 1994, pp. 39-40.
[106] C.Ljuslin, J.Christiansen, A.Marchioro, O.Klingsheim, "An integrated 16 channel CMOS Time-to-Digital converter," Nuclear Science, IEEE Transactions On, Vol.41, NO.4, Aug 1994, pp.1104-1108.
[107] A.Waizman, "A Delay Line Loop for frequency synthesis of De-Skewed Clock," ISSCC Dig. Tech. Papers, Feb 1994, pp.298-299.
[108] T.Kuroda, T.Fujita, S.Mita, T.Mori, K.Matsuo, M.Kakumu, T.Sakurai, "Substrate noise influence on circuit performance in variable threshold-voltage scheme," IEEE J. Solid-State Circuits, vol.29, NO.3, pp. 309-312, Mar 1994.
[109] M.Ramezani, C.A.T.Salama, "An improved Bang-Bang phase detector for clock and data recovery applications," ISCAS, vol.1, NO.3, pp. 715-718, 1994.
Appendix A
Design VHDL code
library ieee;
use ieee.std_logic_1164.all;
-- library vst_n18_sc_tsm_c4_wc;
-- use vst_n18_sc_tsm_c4_wc.components.all;
-- library tpz973gtc;
-- use tpz973gtc.components.all;

entity vernier_unit_delay is
  port (
    coarse_control   : in  std_logic := '0';  -- control line for the coarse chain
    fine_control     : in  std_logic := '0';  -- control line for the fine chain
    fine_control_inv : in  std_logic := '0';  -- inverted version of fine_control
    coarse_input     : in  std_logic := '0';  -- input to coarse chain
    fine_input       : in  std_logic := '0';  -- input to fine chain
    vernier_input    : in  std_logic := '0';  -- input from previous stage
    coarse_output    : out std_logic := '0';  -- output of coarse chain
    fine_output      : out std_logic := '0';  -- output of fine chain
    vernier_output   : out std_logic := '0';  -- output to next stage
    clk_output       : out std_logic := '0'); -- output of vernier unit
end vernier_unit_delay;

architecture structural of vernier_unit_delay is

  signal A, B, coarse_output_int, fine_output_int : std_logic := '0';
  signal logic_one : std_logic := '1';

  component NAN2D0
    port (
      Z  : out STD_LOGIC;
      A1 : in  STD_LOGIC;
      A2 : in  STD_LOGIC);
  end component;

  component BUFTD1
    port (
      Z   : out STD_LOGIC;
      A   : in  STD_LOGIC;
      ENB : in  STD_LOGIC);
  end component;

  component NAN2M1D1
    port (
      Z  : out STD_LOGIC;
      A1 : in  STD_LOGIC;
      A2 : in  STD_LOGIC);
  end component;

begin  -- structural

  U1 : NAN2D0   port map (A, logic_one, fine_input);
  U2 : NAN2D0   port map (fine_output_int, A, logic_one);
  U3 : NAN2D0   port map (B, coarse_input, vernier_input);
  U4 : NAN2D0   port map (coarse_output_int, B, fine_control_inv);
  U5 : NAN2M1D1 port map (vernier_output, fine_output_int, fine_control);
  U6 : BUFTD1   port map (clk_output, coarse_output_int, coarse_control);

  fine_output   <= fine_output_int;
  coarse_output <= coarse_output_int;
  logic_one     <= '1';

end structural;

library ieee;
use ieee.std_logic_1164.all;

entity vernier_delay_line is
  generic (
    N : integer := 128);  -- number of delay elements
  port (
    delay_line_output : out std_logic := '0';
    delay_line_input  : in  std_logic := '0';
    fine_control      : in  std_logic_vector(N-1 downto 0) := (others => '0');
    fine_control_inv  : in  std_logic_vector(N-1 downto 0) := (others => '0');
    coarse_control    : in  std_logic_vector(N-1 downto 0) := (others => '0'));
end vernier_delay_line;

architecture structural of vernier_delay_line is

  signal fine, coarse, vernier : std_logic_vector(N-1 downto 1) := (others => '0');
  signal logic_one : std_logic := '1';

  component vernier_unit_delay
    port (
      coarse_control   : in  std_logic;  -- control line for the coarse chain
      fine_control     : in  std_logic;  -- control line for the fine chain
      fine_control_inv : in  std_logic;  -- inverted version of fine_control
      coarse_input     : in  std_logic;  -- input to coarse chain
      fine_input       : in  std_logic;  -- input to fine chain
      vernier_input    : in  std_logic;  -- input from previous stage
      coarse_output    : out std_logic;  -- output of coarse chain
      fine_output      : out std_logic;  -- output of fine chain
      vernier_output   : out std_logic;  -- output to next stage
      clk_output       : out std_logic); -- output of vernier unit
  end component;

begin  -- structural

  chain : for i in 0 to N-1 generate

    last_unit : if (i = 0) generate
      D1 : vernier_unit_delay
        port map (coarse_control(0), fine_control(0), fine_control_inv(0),
                  coarse(1), fine(1), vernier(1),
                  open, open, open, delay_line_output);
    end generate last_unit;

    middle_units : if (i > 0 and i < N-1) generate
      D1 : vernier_unit_delay
        port map (coarse_control(i), fine_control(i), fine_control_inv(i),
                  coarse(i+1), fine(i+1), vernier(i+1),
                  coarse(i), fine(i), vernier(i), delay_line_output);
    end generate middle_units;

    first_unit : if (i = N-1) generate
      D1 : vernier_unit_delay
        port map (coarse_control(N-1), fine_control(N-1), fine_control_inv(N-1),
                  delay_line_input, delay_line_input, logic_one,
                  coarse(N-1), fine(N-1), vernier(N-1), delay_line_output);
    end generate first_unit;

  end generate chain;

  logic_one <= '1';
  delay_line_output <= 'H';  -- Should be commented out for synthesis

end structural;

library ieee;
use ieee.std_logic_1164.all;

entity vernier_controller is
  generic (
    N : integer := 128);  -- number of delay elements
  port (
    reset            : in  std_logic := '0';
    register_clock   : in  std_logic := '0';
    increase_delay   : in  std_logic := '0';
    decrease_delay   : in  std_logic := '0';
    coarse_control   : out std_logic_vector(N-1 downto 0) := (others => '0');
    fine_control     : out std_logic_vector(N-1 downto 0) := (others => '0');
    fine_control_inv : out std_logic_vector(N-1 downto 0) := (others => '0'));
end vernier_controller;

architecture behavior of vernier_controller is

  signal coarse_load_data   : std_logic_vector(N-1 downto 0) := (others => '0');
  signal fine_load_data     : std_logic_vector(N-1 downto 0) := (others => '0');
  signal fine_control_int   : std_logic_vector(N-1 downto 0) := (others => '0');
  signal coarse_control_int : std_logic_vector(N-1 downto 0) := (others => '0');
  signal coarse_enable      : std_logic := '0';
  signal fine_enable        : std_logic := '0';
  signal logic_zero         : std_logic := '0';
  signal logic_one          : std_logic := '1';

  type state_type is (IDLE, INCREMENT, DECREMENT, FINE);
  signal nextstate : state_type;
  signal state     : state_type;

begin

  -- Next-state logic (combinational); sensitive to the current state and inputs
  process (state, increase_delay, decrease_delay, fine_control_int)
  begin
    case state is
      when IDLE =>
        if increase_delay = '1' then
          nextstate <= INCREMENT;
        elsif decrease_delay = '1' then
          nextstate <= DECREMENT;
        else
          nextstate <= IDLE;
        end if;
      when INCREMENT =>
        if decrease_delay = '1' then
          nextstate <= DECREMENT;
        else
          nextstate <= INCREMENT;
        end if;
      when DECREMENT =>
        if increase_delay = '1' then
          nextstate <= FINE;
        else
          nextstate <= DECREMENT;
        end if;
      when FINE =>
        if increase_delay = '1' and fine_control_int(0) = '1' then
          nextstate <= INCREMENT;
        elsif decrease_delay = '1' and fine_control_int(N-2) = '1' then
          nextstate <= DECREMENT;
        else
          nextstate <= FINE;
        end if;
    end case;
  end process;

  -- State register (active-low asynchronous reset)
  process (reset, register_clock)
  begin
    if reset = '0' then
      state <= IDLE;
    elsif (register_clock'event and register_clock = '1') then
      state <= nextstate;
    end if;
  end process;

  -- Coarse control shift register
  process (reset, register_clock)
  begin
    if reset = '0' then
      coarse_control_int <= coarse_load_data;
    elsif (register_clock'event and register_clock = '1') then
      if increase_delay = '1' and coarse_enable = '1' then
        rightshift : for i in 0 to N-2 loop
          coarse_control_int(i) <= coarse_control_int(i+1);
        end loop;
        coarse_control_int(N-1) <= logic_one;
      elsif decrease_delay = '1' and coarse_enable = '1' then
        leftshift : for i in N-1 downto 2 loop
          coarse_control_int(i) <= coarse_control_int(i-1);
        end loop;
        coarse_control_int(0) <= logic_one;
      end if;
    end if;
  end process;

  -- Fine control shift register
  process (reset, register_clock)
  begin
    if reset = '0' then
      fine_control_int <= fine_load_data;
    elsif (register_clock'event and register_clock = '1') then
      if increase_delay = '1' and fine_enable = '1' then
        rightshift : for i in 0 to N-2 loop
          fine_control_int(i) <= fine_control_int(i+1);
        end loop;
        fine_control_int(N-1) <= logic_zero;
      elsif decrease_delay = '1' and fine_enable = '1' then
        leftshift : for i in N-1 downto 2 loop
          fine_control_int(i) <= fine_control_int(i-1);
        end loop;
        fine_control_int(0) <= logic_zero;
      end if;
    end if;
  end process;

  logic_zero       <= '0';
  logic_one        <= '1';
  coarse_load_data <= x"FFFFFFFFFFFFFFFF7FFFFFFFFFFFFFFF";
  fine_load_data   <= x"80000000000000000000000000000000";
  coarse_enable    <= '1' when ((state = INCREMENT or state = DECREMENT) and (nextstate /= FINE)) else '0';
  fine_enable      <= '1' when (state = FINE) else '0';
  fine_control_inv <= not fine_control_int;
  fine_control     <= fine_control_int;
  coarse_control   <= coarse_control_int;

end behavior;

library ieee;
use ieee.std_logic_1164.all;

entity high_resoloution_phase_detector is
  port (
    dll_clock_output : in  std_logic := '0';  -- DLL's output clock
    dll_clock_input  : in  std_logic := '0';  -- Input clock to DLL
    reset            : in  std_logic := '0';  -- reset input
    register_clock   : out std_logic := '0';  -- Clock for shift register
    decrease_delay   : out std_logic := '0';  -- shift-left output
    increase_delay   : out std_logic := '0'); -- shift-right output
end high_resoloution_phase_detector;

architecture structural of high_resoloution_phase_detector is

  signal A, B, C, D, E, decrease_delay_int, increase_delay_int : std_logic := '0';
  signal F, G, H, I, dll_clock_input_int, reg_clk_1, reg_clk_2, reg_clk_3 : std_logic := '0';
  signal logic_one : std_logic := '1';

  component BUFBD4
    port (
      Z : out STD_LOGIC;
      A : in  STD_LOGIC);
  end component;

  component BUFBD16
    port (
      Z : out STD_LOGIC;
      A : in  STD_LOGIC);
  end component;

  component BUFBD32
    port (
      Z : out STD_LOGIC;
      A : in  STD_LOGIC);
  end component;

  component DFFRPB1
    port (
      Q  : out STD_LOGIC;
      QB : out STD_LOGIC;
      CK : in  STD_LOGIC;
      D  : in  STD_LOGIC;
      RB : in  STD_LOGIC);
  end component;

  component AND3D1
    port (
      Z  : out STD_LOGIC;
      A1 : in  STD_LOGIC;
      A2 : in  STD_LOGIC;
      A3 : in  STD_LOGIC);
  end component;

  component AND2D1
    port (
      Z  : out STD_LOGIC;
      A1 : in  STD_LOGIC;
      A2 : in  STD_LOGIC);
  end component;

  component OR2D1
    port (
      Z  : out STD_LOGIC;
      A1 : in  STD_LOGIC;
      A2 : in  STD_LOGIC);
  end component;

begin  -- structural

  U1  : DFFRPB1 port map (E, F, dll_clock_input_int, B, reset);
  U2  : DFFRPB1 port map (G, H, dll_clock_input_int, A, reset);
  U3  : AND2D1  port map (A, logic_one, dll_clock_output);
  U4  : AND3D1  port map (increase_delay_int, E, G, dll_clock_input_int);
  U5  : AND3D1  port map (decrease_delay_int, F, H, dll_clock_input_int);
  U6  : OR2D1   port map (reg_clk_1, increase_delay_int, decrease_delay_int);
  U7  : AND2D1  port map (B, dll_clock_output, logic_one);
  U8  : AND2D1  port map (dll_clock_input_int, dll_clock_input, logic_one);
  U9  : BUFBD4  port map (reg_clk_2, reg_clk_1);
  U10 : BUFBD16 port map (reg_clk_3, reg_clk_2);
  U11 : BUFBD32 port map (register_clock, reg_clk_3);

  logic_one      <= '1';
  decrease_delay <= decrease_delay_int;
  increase_delay <= increase_delay_int;

end structural;

library ieee;
use ieee.std_logic_1164.all;

entity lock_detector is
  port (
    lock_indicator  : out std_logic := '0';  -- high when DLL is locked
    dll_clock_input : in  std_logic := '0';  -- Input clock to DLL
    reset           : in  std_logic := '0';  -- reset input
    decrease_delay  : in  std_logic := '0';  -- shift-left output
    increase_delay  : in  std_logic := '0'); -- shift-right output
end lock_detector;

architecture behavior of lock_detector is
begin

  process (reset, dll_clock_input)
  begin
    if reset = '0' then
      lock_indicator <= '0';
    elsif dll_clock_input'event and dll_clock_input = '0' then
      lock_indicator <= not (increase_delay or decrease_delay);
    end if;
  end process;

end behavior;

library ieee;
use ieee.std_logic_1164.all;

entity vernier_dll is
  generic (
    N : integer := 128);
  port (
    dll_clock_input  : in  std_logic := '0';
    dll_clock_output : out std_logic := '0';
    lock_indicator   : out std_logic := '0';
    reset            : in  std_logic := '0');
end vernier_dll;

architecture structural of vernier_dll is

  signal register_clock       : std_logic := '0';
  signal fine_control         : std_logic_vector(N-1 downto 0) := (others => '0');
  signal fine_control_inv     : std_logic_vector(N-1 downto 0) := (others => '0');
  signal coarse_control       : std_logic_vector(N-1 downto 0) := (others => '0');
  signal decrease_delay       : std_logic := '0';
  signal increase_delay       : std_logic := '0';
  signal dll_clock_output_int : std_logic := '0';
  signal dll_clock_input_int  : std_logic := '0';
  signal delay_line_output    : std_logic := '0';
  signal logic_zero           : std_logic := '0';
  signal reset_int            : std_logic := '0';
  signal lock_indicator_int   : std_logic := '0';
  signal dll_clock_output_pad : std_logic := '0';

  component PDCH3DGZ
    port (
      CLK : in  std_logic;
      CP  : out std_logic);
  end component;

  component PDD24DGZ
    port (
      I   : in    std_logic;
      OEN : in    std_logic;
      PAD : inout std_logic;
      C   : out   std_logic);
  end component;

  component PDIDGZ
    port (
      PAD : in  std_logic;
      C   : out std_logic);
  end component;

  component PDO02CDG
    port (
      I   : in  std_logic;
      PAD : out std_logic);
  end component;

  component vernier_delay_line
    generic (
      N : integer);  -- number of delay elements
    port (
      delay_line_output : out std_logic;
      delay_line_input  : in  std_logic;
      fine_control      : in  std_logic_vector(N-1 downto 0);
      fine_control_inv  : in  std_logic_vector(N-1 downto 0);
      coarse_control    : in  std_logic_vector(N-1 downto 0));
  end component;

  component high_resoloution_phase_detector
    port (
      dll_clock_output : in  std_logic;  -- DLL's output clock
      dll_clock_input  : in  std_logic;  -- Input clock to DLL
      reset            : in  std_logic;  -- reset input
      register_clock   : out std_logic;  -- Clock for shift register
      decrease_delay   : out std_logic;  -- shift-left output
      increase_delay   : out std_logic); -- shift-right output
  end component;

  component vernier_controller is
    generic (
      N : integer);  -- number of delay elements
    port (
      reset            : in  std_logic;
      register_clock   : in  std_logic;
      increase_delay   : in  std_logic;
      decrease_delay   : in  std_logic;
      coarse_control   : out std_logic_vector(N-1 downto 0);
      fine_control     : out std_logic_vector(N-1 downto 0);
      fine_control_inv : out std_logic_vector(N-1 downto 0));
  end component;

  component lock_detector is
    port (
      lock_indicator  : out std_logic;
      dll_clock_input : in  std_logic;
      reset           : in  std_logic;
      decrease_delay  : in  std_logic;
      increase_delay  : in  std_logic);
  end component;

begin  -- structural

  u1 : vernier_delay_line
    generic map (N => 128)
    port map (delay_line_output, dll_clock_input_int, fine_control,
              fine_control_inv, coarse_control);

  u2 : high_resoloution_phase_detector
    port map (dll_clock_output_int, dll_clock_input_int, reset_int,
              register_clock, decrease_delay, increase_delay);

  u3 : vernier_controller
    generic map (N => 128)
    port map (reset_int, register_clock, increase_delay, decrease_delay,
              coarse_control, fine_control, fine_control_inv);

  u4 : lock_detector
    port map (lock_indicator_int, dll_clock_input_int, reset_int,
              decrease_delay, increase_delay);

  u5 : PDD24DGZ
    port map (delay_line_output, logic_zero, dll_clock_output_pad, dll_clock_output_int);

  u6 : PDO02CDG
    port map (lock_indicator_int, lock_indicator);

  u7 : PDIDGZ
    port map (reset, reset_int);

  u8 : PDCH3DGZ
    port map (dll_clock_input, dll_clock_input_int);

  logic_zero       <= '0';
  dll_clock_output <= dll_clock_output_pad;

end structural;

library ieee;
use ieee.std_logic_1164.all;
library vst_n18_sc_tsm_c4_typ;
use vst_n18_sc_tsm_c4_typ.components.all;
library tpz973gtc;
use tpz973gtc.components.all;

entity vernier_testbench is
  generic (
    N : integer := 128);
end vernier_testbench;

architecture behavior of vernier_testbench is

  signal jitterl          : std_logic := '1';
  signal jitterh          : std_logic := '0';
  signal clock1_input     : std_logic := '0';
  signal clock2_input     : std_logic := '0';
  signal clock_enable     : std_logic := '1';
  signal dll_clock_input  : std_logic := '0';
  signal dll_clock_output : std_logic := '0';
  signal lock_indicator   : std_logic := '0';
  signal reset            : std_logic := '0';

  component vernier_dll
    generic (
      N : integer);
    port (
      dll_clock_input  : in  std_logic;
      dll_clock_output : out std_logic;
      lock_indicator   : out std_logic;
      reset            : in  std_logic);
  end component;

begin

  U1 : vernier_dll
    generic map (N => 128)
    port map (dll_clock_input, dll_clock_output, lock_indicator, reset);

  -- First input clock: 8200 ps period
  process
  begin
    clock1_input <= '0'; wait for 4100 ps;
    clock1_input <= '1'; wait for 4100 ps;
  end process;

  -- Second input clock: 7800 ps period
  process
  begin
    clock2_input <= '1'; wait for 3900 ps;
    clock2_input <= '0'; wait for 3900 ps;
  end process;

  -- Selects clock1 (with jitter) while high, clock2 while low
  process
  begin
    clock_enable <= '1'; wait for 291200 ps;
    clock_enable <= '0'; wait for 2910000 ps;
  end process;

  -- Low-going 200 ps jitter pulses
  process
  begin
    jitterl <= '1'; wait for 291100 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1';
  end process;

  -- High-going 200 ps jitter pulses
  process
  begin
    jitterh <= '0'; wait for 295200 ps;
    jitterh <= '1'; wait for 200 ps; jitterh <= '0'; wait for 8000 ps;
    jitterh <= '1'; wait for 200 ps; jitterh <= '0'; wait for 8000 ps;
    jitterh <= '1'; wait for 200 ps; jitterh <= '0'; wait for 8000 ps;
    jitterh <= '1'; wait for 200 ps; jitterh <= '0'; wait for 8000 ps;
    jitterh <= '1'; wait for 200 ps; jitterh <= '0'; wait for 8000 ps;
    jitterh <= '1'; wait for 200 ps; jitterh <= '0'; wait for 8000 ps;
    jitterh <= '1'; wait for 200 ps; jitterh <= '0'; wait for 8000 ps;
    jitterh <= '1'; wait for 200 ps; jitterh <= '0'; wait for 8000 ps;
    jitterh <= '1'; wait for 200 ps; jitterh <= '0'; wait for 8000 ps;
    jitterh <= '1'; wait for 200 ps; jitterh <= '0'; wait for 8000 ps;
    jitterh <= '1'; wait for 200 ps; jitterh <= '0'; wait for 8000 ps;
    jitterh <= '1'; wait for 200 ps; jitterh <= '0'; wait for 8000 ps;
    jitterh <= '1'; wait for 200 ps; jitterh <= '0';
  end process;

  dll_clock_input <= (((clock1_input and jitterl) or jitterh) and clock_enable)
                     or (clock2_input and (not clock_enable));

  -- Active-low reset, released after 10 ns
  process
  begin
    reset <= '0'; wait for 10000 ps;
    reset <= '1'; wait;
  end process;

end behavior;

configuration vernier_rtl of vernier_testbench is
  for behavior
    for u1 : vernier_dll use entity work.vernier_dll(structural);
      for structural
        for u1 : vernier_delay_line use entity work.vernier_delay_line(structural);
        end for;
        for u2 : high_resoloution_phase_detector use entity work.high_resoloution_phase_detector(structural);
        end for;
        for u3 : vernier_controller use entity work.vernier_controller(behavior);
        end for;
        for u4 : lock_detector use entity work.lock_detector(behavior);
        end for;
      end for;
    end for;
  end for;
end vernier_rtl;
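
The jitterl stimulus in the testbench is a fixed pattern: the signal idles high for 291100 ps and then produces thirteen 200 ps low pulses separated by 8000 ps. As a more compact alternative, the same waveform could be written with a loop. The following is only a sketch and not the code used for the results in this thesis; the entity name jitter_stimulus_sketch is hypothetical and exists only to make the fragment self-contained.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-alone wrapper; not part of the thesis design.
entity jitter_stimulus_sketch is
end jitter_stimulus_sketch;

architecture behavior of jitter_stimulus_sketch is
  signal jitterl : std_logic := '1';
begin
  process
  begin
    -- Idle high before the first pulse.
    jitterl <= '1';
    wait for 291100 ps;
    -- Thirteen 200 ps low pulses separated by 8000 ps high intervals.
    for i in 1 to 13 loop
      jitterl <= '0';
      wait for 200 ps;
      jitterl <= '1';
      -- No wait after the last pulse, as in the unrolled listing,
      -- so the process restarts immediately with the idle interval.
      if i < 13 then
        wait for 8000 ps;
      end if;
    end loop;
  end process;
end behavior;
```

The jitterh stimulus could be expressed the same way by swapping the '0' and '1' levels and using an initial interval of 295200 ps.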
Appendix B
Synthesis result
****************************************
Report : cell
Design : vernier_unit_delay
Version: V-2004.06-SP1
Date   : Mon Jun 27 12:29:56 2005
****************************************

Attributes:
    b - black box (unknown)
    h - hierarchical
    n - noncombinational
    r - removable
    u - contains unmapped logic

Cell    Reference    Library                  Area         Attributes
--------------------------------------------------------------------
U1      NAN2D0       vst_n18_sc_tsm_c4_wc     12.197000
U2      NAN2D0       vst_n18_sc_tsm_c4_wc     12.197000
U3      NAN2D0       vst_n18_sc_tsm_c4_wc     12.197000
U4      NAN2D0       vst_n18_sc_tsm_c4_wc     12.197000
U5      NAN2M1D1     vst_n18_sc_tsm_c4_wc     16.261999
U6      BUFTD1       vst_n18_sc_tsm_c4_wc     28.459000    n
--------------------------------------------------------------------
Total 6 cells                                 93.508995

****************************************
Report : area
Design : vernier_unit_delay
Version: V-2004.06-SP1
Date   : Thu Jun 23 14:31:56 2005
****************************************

Library(s) Used:
    vst_n18_sc_tsm_c4_wc (File: /CMC/kits/cmosp18/synopsys/2004/syn/vst_n18_sc_tsm_c4_wc.db)

Number of ports:           10
Number of nets:            13
Number of cells:            6
Number of references:       3

Combinational area:        93.508995
Noncombinational area:      0.000000
Net Interconnect area:     undefined (Wire load has zero net area)

Total cell area:           93.508995
Total area:                undefined

****************************************
Report : area
Design : vernier_delay_line
Version: V-2004.06-SP1
Date   : Thu Jun 23 14:34:45 2005
****************************************

Library(s) Used:
    vst_n18_sc_tsm_c4_wc (File: /CMC/kits/cmosp18/synopsys/2004/syn/vst_n18_sc_tsm_c4_wc.db)

Number of ports:          386
Number of nets:           767
Number of cells:          128
Number of references:       1

Combinational area:        11969.153320
Noncombinational area:         0.000000
Net Interconnect area:     undefined (Wire load has zero net area)

Total cell area:           11969.151367
Total area:                undefined

****************************************
Report : area
Design : high_resoloution_phase_detector
Version: V-2004.06-SP1
Date   : Thu Jun 23 14:39:20 2005
****************************************

Library(s) Used:
    vst_n18_sc_tsm_c4_wc (File: /CMC/kits/cmosp18/synopsys/2004/syn/vst_n18_sc_tsm_c4_wc.db)

Number of ports:            6
Number of nets:            17
Number of cells:           11
Number of references:       7

Combinational area:        760.265991
Noncombinational area:     154.492004
Net Interconnect area:     undefined (Wire load has zero net area)

Total cell area:           914.757996
Total area:                undefined

****************************************
Report : area
Design : vernier_controller
Version: V-2004.06-SP1
Date   : Thu Jun 23 14:50:12 2005
****************************************

Library(s) Used:
    vst_n18_sc_tsm_c4_wc (File: /CMC/kits/cmosp18/synopsys/2004/syn/vst_n18_sc_tsm_c4_wc.db)

Number of ports:          388
Number of nets:           825
Number of cells:          567
Number of references:      19

Combinational area:         6244.804199
Noncombinational area:     25637.822266
Net Interconnect area:     undefined (Wire load has zero net area)

Total cell area:           31882.552734
Total area:                undefined

****************************************
Report : area
Design : lock_detector
Version: V-2004.06-SP1
Date   : Thu Jun 23 14:42:05 2005
****************************************

Library(s) Used:
    vst_n18_sc_tsm_c4_wc (File: /CMC/kits/cmosp18/synopsys/2004/syn/vst_n18_sc_tsm_c4_wc.db)

Number of ports:            5
Number of nets:             6
Number of cells:            2
Number of references:       2

Combinational area:        12.197000
Noncombinational area:     77.246002
Net Interconnect area:     undefined (Wire load has zero net area)

Total cell area:           89.443001
Total area:                undefined

****************************************
Report : area
Design : vernier_dll
Version: V-2004.06-SP1
Date   : Thu Jun 23 15:09:01 2005
****************************************

Library(s) Used:
    vst_n18_sc_tsm_c4_wc (File: /CMC/kits/cmosp18/synopsys/2004/syn/vst_n18_sc_tsm_c4_wc.db)
    tpz973gwc (File: /CMC/kits/cmosp18/synopsys/2004/syn/tpz973gwc.db)

Number of ports:            4
Number of nets:           397
Number of cells:            8
Number of references:       8

Combinational area:        65985.875000
Noncombinational area:     25869.562500
Net Interconnect area:     undefined (Wire load has zero net area)

Total cell area:           91855.906250
Total area:                undefined

****************************************
Report : cell
Design : vernier_dll
Version: V-2004.06-SP1
Date   : Thu Jun 23 15:10:29 2005
****************************************

Attributes:
    b - black box (unknown)
    h - hierarchical
    n - noncombinational
    p - parameterized
    r - removable
    u - contains unmapped logic

Cell    Reference                          Library      Area            Attributes
---------------------------------------------------------------------------------
u1      vernier_delay_line                              11969.151367    h, n, p
u2      high_resoloution_phase_detector                   914.757996    h, n
u3      vernier_controller                              31882.552734    h, n, p
u4      lock_detector                                      89.443001    h, n
u5      PDD24DGZ                           tpz973gwc     9400.000000    n
u6      PDO02CDG                           tpz973gwc     9400.000000
u7      PDIDGZ                             tpz973gwc     9400.000000
u8      PDCH3DGZ                           tpz973gwc    18800.000000
---------------------------------------------------------------------------------
Total 8 cells                                           91855.906250

HDL Parameter Information:
    u1 - N => 128
    u3 - N => 128