Low-Power, Gigahertz Clock Generation and Distribution using Injection-Locked Oscillators by Lin Zhang Submitted in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy Supervised by Professor Hui Wu Department of Electrical and Computer Engineering Arts, Sciences and Engineering Edmund A. Hajim School of Engineering and Applied Sciences University of Rochester Rochester, New York 2010


Low-Power, Gigahertz Clock Generation and

Distribution using Injection-Locked Oscillators

by

Lin Zhang

Submitted in Partial Fulfillment

of the

Requirements for the Degree

Doctor of Philosophy

Supervised by

Professor Hui Wu

Department of Electrical and Computer Engineering

Arts, Sciences and Engineering

Edmund A. Hajim School of Engineering and Applied Sciences

University of Rochester

Rochester, New York

2010


Curriculum Vitae

Lin Zhang was born in Luzhou, Sichuan Province, China on Feb. 26, 1982. He attended Tsinghua University, Beijing, China from 2000 to 2004, and graduated with a Bachelor of Science degree in 2004. He came to the University of Rochester in the Fall of 2004 and began graduate studies in Electrical Engineering. He pursued his research in RF and analog integrated circuits under the direction of Professor Hui Wu and received the Master of Science degree from the University of Rochester in 2006. He received the Zeng Xianzhang Scholarship at Tsinghua University in 2001 and the Frank J. Horton Research Fellowship at the University of Rochester in 2006.


Acknowledgment

I would like to thank a number of people and organizations that have helped me through my research at the University of Rochester and in my life to date. First and foremost, I am heartily thankful to my advisor, Prof. Hui Wu, whose encouragement, guidance and support from the initial to the final level enabled me to develop an understanding of the subject and made this thesis possible. His knowledge and technical excellence are beyond any doubt. What is more important to me is his patience and professionalism in teaching me every technical detail of the Ph.D. research. I feel very honored and privileged to have worked with him.

I would like to extend my sincere gratitude to Prof. Michael Huang. Without his help and guidance on the ILC project, this thesis would not have been possible. I would also like to thank Prof. Eby Friedman for serving on my academic committee, inspiring my research direction, and organizing the excellent IC design research meeting, which greatly improved my presentation skills. I would also like to thank Prof. Sandhya Dwarkadas, Prof. Wendi Heinzelman, Prof. Robert Waag, Prof. Philippe Fauchet and Prof. Thomas B. Jones for their valuable help with my classes, lab work and oral exams.

I am deeply indebted to all of my groupmates in Prof. Wu’s Laics group: Dr. Yunliang Zhu, Berkehan Ciftcioglu, Jianyun Hu, David Karasiewicz, Shang Wang, Jian Zhang and Jie Xu. Without their help, discussions and friendship, my research work would have been much more difficult than it was. I would also like to thank Alok Garg and Aaron Carpenter in Prof. Huang’s group for their cooperation on the ILC project. I would like to thank all the friends I met in Rochester, Fu Bo, Yu Chao, Li Xin, Yu Qiaoyan, Zhang Xiaohua, Sun Qiang... Their friendship has made my life colorful, even in Rochester’s long white winter.

I would also like to acknowledge those I worked with during my internship at Rambus in the summer of 2008: my manager Nhat Nguyen, my mentor Yohan Frans, and my colleagues Ting Wu, Brian Leibowitz, Marko Aleksic and Fred Lee.

I would like to thank the Laboratory for Laser Energetics for its support through the Frank J. Horton Research Fellowship. I would like to thank Bijoy Chatterjee, Ahmad Bahai, Peter Holloway, Mounir Bohsali, Johnny Yu, Anish Shah, Virginia Abellera, Peter Misich, and Jun Wan of National Semiconductor for their help and support in chip fabrication.

Finally, I would like to thank my wife, my parents and my sister for their unconditional love, support and encouragement in my life. No matter where I was, they were always with me, so this thesis is dedicated to my family.


Abstract

The generation and distribution of high-speed, high-quality clock signals have become increasingly important in high-performance microprocessors, wireline communications and wireless communications. In the multi-gigahertz frequency range, conventional clocking techniques face several design challenges in terms of power consumption, skew and jitter. Injection locking is a promising technique to address these design challenges for gigahertz clocking. This dissertation presents our studies of gigahertz, high-performance, low-power clock generation and distribution using injection-locked frequency dividers (ILFDs), injection-locked frequency multipliers (ILFMs) and injection-locked clock distribution networks (ILCs). Chip prototypes in 0.18 µm standard digital CMOS technologies are demonstrated for the following gigahertz clocking circuits.

For gigahertz clock generation, we introduced a phase tuning scheme for an ILFD-based dual-phase signal generator. The phase tuning capability in this scheme comes from the tunable phase transfer characteristics of injection-locked frequency dividers. Implemented with a frequency-tunable, double-balanced, divide-by-two injection-locked frequency divider, the dual-phase signal generator prototype achieves a 100° differential phase tuning range around quadrature at a generated signal frequency of 5 GHz.

For gigahertz frequency division, we introduced a divide-by-odd-number injection-locked frequency divider to address the division ratio limitation of conventional injection-locked frequency dividers. With differential injection and harmonic filtering, this new ILFD topology maintains the fully differential nature of the output signal while achieving effective mixing between the injected odd harmonics and the output oscillation. The circuit prototype of this topology, working at an input frequency of 16-18 GHz, achieves a 5% locking range without frequency tuning.

For gigahertz frequency multiplication, we introduced an injection-locked oscillator that works as a high-gain, high-Q harmonic filter for conventional harmonic-generation-and-filtering frequency multipliers. This new approach achieves significantly better suppression of undesired harmonics for frequency multipliers built in lossy digital CMOS processes. The frequency tunability of injection-locked oscillators also enables multi-mode operation for such injection-locked frequency multipliers. The circuit prototype of such a frequency multiplier achieves dual-mode multiply-by-2 and multiply-by-3 operation, with undesired harmonic suppression better than 30 dB in both modes.

For gigahertz clock distribution, we proposed an injection-locked clocking scheme using injection-locked oscillators (ILOs) as the local clock regenerators. Because an ILO can be locked by a small input signal, this new approach greatly reduces the number of clock buffers in global clock distribution. This not only reduces the power consumption, but also reduces the skew and jitter contributed by these clock buffers. The phase tunability of ILOs can also be used to deskew different clock domains. Three circuit prototypes of ILCs working at several gigahertz have been built. They demonstrated better power and jitter performance, together with the built-in deskew capability of ILCs.


Contents

Curriculum Vitae
Acknowledgment
Abstract
List of Tables
List of Figures

1 Introduction
1.1 Clocking in Communications and Computing Systems
1.2 Challenges for High Performance Clocking
1.2.1 Skew
1.2.2 Jitter
1.2.3 Phase Noise
1.2.4 Power Consumption
1.3 Dissertation Organization

2 High Performance Clocking
2.1 Clock Generation
2.1.1 Oscillators
2.1.2 Phase-Locked Loop (PLL)
2.1.3 Injection-Locked Oscillators (ILOs)
2.1.4 High Speed Frequency Dividers and Multipliers
2.2 Clock Distribution
2.2.1 Conventional Monolithic Clock Distributions
2.2.2 Emerging Gigahertz Clock Distribution Schemes
2.3 Contributions of This Dissertation

3 Injection-Locked Oscillators for Clock Generation
3.1 Analysis of Injection-Locked Oscillators
3.1.1 “Harmonic Balance” Analysis of Oscillators
3.1.2 “Harmonic Balance” Analysis of Injection-Locked Oscillators
3.1.3 Common-Mode and Differential Injection
3.1.4 Differential Injection for Odd-Harmonic and Fundamental Injection Locking
3.2 Injection-Locked Frequency Dividers (ILFDs)
3.2.1 Divide-by-Two ILFD for Dual-phase Signal Generation
3.2.2 Divide-by-Odd-Number ILFD
3.3 Injection-Locked Frequency Multipliers (ILFMs)
3.3.1 Dual Modulus ILFM with Good Harmonic Suppression

4 Injection-Locked Clock Distribution
4.1 Injection-Locked Clocking
4.1.1 Power Savings
4.1.2 Skew Reduction and Deskew Capability
4.1.3 Jitter Reduction and Suppression
4.1.4 Potential Applications
4.2 Architecture Level Evaluation of ILC Power Impact
4.3 ILC Circuit Prototypes
4.3.1 Prototype I: ILC with Divide-by-two ILOs
4.3.2 Prototype II: ILC with Non-division ILOs
4.3.3 Prototype III: ILC with Active Deskewing

5 Future Work and Conclusions
5.1 Future Work
5.1.1 Injection-Locked Clock and Data Recovery
5.1.2 Injection-Locked Free-Space Optoelectronic Oscillators
5.2 Conclusions

Bibliography


List of Tables

3.1 Performance Comparison with Other Works


List of Figures

1.1 Speed of wireline communication standards increases with time. For the long distance wireline communication standards, synchronous optical networking (SONET) and synchronous digital hierarchy (SDH), the speed has increased from 51.84 Mb/s in 1988 to the current large scale deployment of 10 Gb/s and 40 Gb/s systems. The speed of the short distance wireline communication standard, Ethernet, increased from only 3 Mb/s in 1975 to the current speed of 10 Gb/s.

1.2 Commercial wireless systems are expanding from the sub- and low-gigahertz range to the multi-gigahertz, even millimeter-wave range in order to support higher data rates.


1.3 The role of the clock signal in a typical wireline receiver. The highlighted portion in (a) is the clocking circuit. The clock signal is first frequency synchronized with the incoming data stream by a reference clock fref, which is shared by both transmitter and receiver. The frequency synchronized clock is then phase synchronized with the data stream, and used to sample the data. (b) Examples of good and bad sampling clocks. The good sampling clock, clock A, has a sampling transition edge at the center of the data stream eye diagram. The bad sampling clock, clock B, has a sampling edge at the edge of the data stream eye diagram. A sampling edge away from the eye diagram center increases the chance of error of the system.

1.4 The role of the clock signal in wireless communications systems. (a) shows a typical radio transceiver for wireless communications. The highlighted portion is the clock generator for wireless systems, the frequency synthesizer. The output of the frequency synthesizer up-converts the transmitted signal from low frequency to radio frequency, and down-converts the received signal from radio frequency to low frequency. (b) If there is any frequency error in the synthesizer output, the up-converted signal may overlap with adjacent channels in radio frequency and cause errors for both the transmission channel and adjacent channels.

1.5 On-chip and off-chip clock frequencies, IO bandwidth and number of cores for processors in the near future, as predicted by the International Technology Roadmap for Semiconductors (ITRS) 2007. According to the prediction, the on-chip clock frequency will reach nearly 9 GHz in 2015. The off-chip clock frequency will reach 30 GHz and the IO bandwidth will reach 80 Tb/s. The number of cores in 2015 will be around 10.


1.6 Clock generation and distribution in a processor chip. The clock generation PLL generates a gigahertz, frequency-tunable clock signal for the entire VLSI chip. After the clock generator, a clock distribution network delivers the clock signal to every logic gate across the chip.

1.7 Effect of skew on (a) long path error and (b) short path error.

1.8 Illustration of clock jitter as clock period variations. The period of a clock with jitter follows a Gaussian distribution with a mean value equal to the ideal clock period Tavg.

1.9 (a) Typical phase noise profile of an oscillator versus offset from carrier. The corner frequencies $\Delta\omega_{1/f^3}$ and $\Delta\omega_{1/f^2}$ represent the boundaries between the 30 dB/dec, 20 dB/dec and flat regions of the phase noise profile. (b) Illustration of the process of reciprocal mixing.

1.10 (a) A typical LC oscillator, and (b) the model for phase noise analysis. The active circuit part provides a negative resistance to compensate for the loss of the LC tank.

1.11 (a) The linearized noise model for a typical type II PLL, and (b) a typical PLL phase noise profile.

2.1 Direct frequency generation using the mix-and-divide principle. It requires excessive filtering.

2.2 Typical integrated oscillators. (a) a ring oscillator and (b) an LC oscillator.

2.3 Barkhausen oscillation criteria for LC oscillators. (a) an LC differential oscillator composed of an LC tank, represented by H(jω), and an active circuit, represented by f(Vo); (b) feedback loop model for the LC oscillator; (c) amplitude and phase response of the LC tank H(jω).


2.4 A basic phase-locked loop, where PFD is the phase and frequency detector, CP is the charge pump, LPF is the low pass filter and VCO is the voltage controlled oscillator.

2.5 A typical PLL-based integer-N frequency synthesizer. With the divide-by-N frequency divider in the loop, the output clock frequency fout equals N × fin. The modulus selection can change the value of the division ratio N. Thus the output frequency can be changed with a step size of the reference frequency fin.

2.6 (a) Beat and injection locking phenomena when an oscillator is driven by a single-frequency input signal. (b) Locking range.

2.7 (a) A generic model of an injection-locked oscillator (ILO). (b) A divide-by-two ILO based on a common differential LC oscillator. The input signal is injected into the oscillator core through the tail transistor Mtail. This topology exhibits good injection locking efficiency because of the built-in single-balanced mixer structure.

2.8 Digital divide-by-two circuit and the implementation of the latch.

2.9 Dynamic CMOS dividers using (a) inverters and (b) true single-phase clock (TSPC) logic.

2.10 Analog frequency dividers: (a) Miller frequency divider and (b) parametric frequency divider.

2.11 Injection-locked frequency divider (ILFD) implementations: (a) ring oscillator ILFD, (b) Colpitts LC oscillator ILFD, (c) direct injection into the tank of a differential LC oscillator, (d) injection through the tail of the differential LC oscillator.

2.12 Frequency synthesis with LO and frequency multiplier in an RF transmitter.


2.13 Frequency multipliers by (a) harmonic generation and filtering, (b) regenerative harmonic doubling and (c) injection locking.

2.14 Injection-locked clocking with active deskew based on ILO delay tuning.

3.1 Loop analysis of injection-locked oscillators. (a) loop model for an ILO, where the LC tank is represented by H(jω), and the active circuit is represented by f(Vi, Vo); (b) phasor representation of the phase shift introduced by the active circuit; (c) amplitude and phase response of the LC tank H(jω), showing the new oscillation frequency at ωi instead of ω0.

3.2 One-port model for a resonator-based oscillator with the active circuit represented by a linear admittance.

3.3 One-port model for an injection-locked oscillator with the active circuit described in the time domain.

3.4 An injection-locked oscillator based on a cross-coupled LC differential oscillator.

3.5 Locking range simulation of a divide-by-two ILFD compared with the harmonic perturbation and hard switching models.

3.6 Realizations of divide-by-three ILFD by differential cascode injection.

3.7 Phase tuning characteristics for the divide-by-two ILO in Fig. 2.7(b). η ≡ Iinj/Ibias is the injection ratio, ω0 is the free-running oscillation frequency, ∆ω ≡ ω − ω0 is the frequency shift, and Q is the LC tank quality factor.

3.8 (a) Schematic, (b) equivalent circuit model and (c) behavior model of a divide-by-two ILFD based on a differential LC oscillator. The nonlinearity in the behavior model comes from the switching of the cross pair.


3.9 Dual-phase generation by injection-locking two ILFDs. The injected differential signals result in a quadrature phase difference at the ILFD outputs when ω01 = ω02. The output phases φ1 and φ2 are explicitly expressed as the sum of the quadrature phases and the phase shift parts ϕ1 and ϕ2 so that Eqn. 3.28 can be directly applied.

3.10 Phase tuning of two ILFDs: (a) quadrature; (b)(c) single-ended tuning; (d) differential tuning.

3.11 Schematic of the prototype double-balanced ILFD for tunable dual-phase signal generation. A first-stage ILFD works as an active balun to convert a single-ended input to differential signals, which are then fed into the input of the double-balanced ILFD stage.

3.12 Chip micrograph of the prototype ILFD. The chip occupies an area of 1.0 × 1.1 mm².

3.13 Frequency tuning range of the free-running ILFD core.

3.14 Locking range and bounds in the middle of the tuning range. Note that these are the input signal frequencies, which are 4 times that of the outputs. A maximum locking range of 17% was achieved.

3.15 Phase tuning: (a) keep Vt1 constant and tune Vt2; (b) keep Vt2 constant and tune Vt1; (c) differential tuning of Vt1 and Vt2 at different injection frequencies.

3.16 Phase noise within the locking range, compared with that of the input signal. Measured at 5.3 GHz output. The 12-dB phase noise reduction comes from the divide-by-four operation.

3.17 Circuit evolution of the ILFD from divide-by-two to divide-by-three. The most important change is from common-mode injection and differential mixing to differential injection and single-ended mixing.


3.18 Circuit model for the divide-by-three ILFD, with differential injection at the sources of the cross pair.

3.19 Loop model for the divide-by-three ILFD.

3.20 Circuit implementation of the divide-by-three ILFD. An input balun is used for single-ended to differential conversion.

3.21 Output spectrum of the divide-by-three ILFD with 23 dB and 21 dB of second and third harmonic suppression. The zoomed-in view of the fundamental output shows a clean spectrum.

3.22 Locking range vs. injection power for the divide-by-three ILFD. A maximum locking range of 1 GHz was achieved.

3.23 Locking range vs. injection voltage. The injection voltage is calculated from the incident power reading and the s11 at the input port, with cable and connector loss calibrated out.

3.24 Extended working frequency range which combines the frequency tuning range and the locking range.

3.25 Phase noise performance vs. injection power. The 9-dB phase noise reduction is due to the divide-by-three operation.

3.26 Chip micrograph of the divide-by-three ILFD, with a chip size of 0.9 mm × 0.9 mm.

3.27 Schematic of the dual-modulus injection-locked frequency multiplier.

3.28 (a) Harmonic current generation vs. input voltage for a zero-biased NMOS transistor harmonic generator. (b) Harmonic current ratio to fundamental vs. input voltage for the same harmonic generator.

3.29 Model of the filter between the harmonic generator and the ILO. The transformer model neglects the parasitic inductance.

3.30 Die photo of the dual-modulus frequency multiplier. Osc is the oscillator core, T1 is the transformer and M1 is the harmonic generator.


3.31 Locking ranges of doubler and tripler modes vs. input levels, which determine the output frequency ranges of the frequency multiplier.

3.32 Harmonic suppression of doubler and tripler modes vs. input levels.

3.33 Output spectra of (a) doubler and (b) tripler modes, showing the harmonic suppression at the input levels that give a 5% locking range.

3.34 Power and locking range trade-offs for doubler and tripler modes.

3.35 Output phase noise of the doubler and tripler modes compared with free-running conditions.

3.36 Simulated transient of the modulus change from tripler to doubler for the dual-modulus ILFM, which shows a transient time in the ns range. A limiting amplifier is added at the output to balance the amplitudes of the two modes.

4.1 Injection-locked clocking (ILC).

4.2 Voltage gain of an inverter and an injection-locked oscillator at different input signal levels.

4.3 Power savings of ILC relative to conventional clock distribution.

4.4 Skew introduced by resonant frequency error vs. quality factor of the resonator in a resonant-based clock distribution.

4.5 Illustration of ILC jitter suppression in comparison with conventional clocking.

4.6 Illustration of the three different configurations (a-c) of global clock distribution, and a possible floorplan (d) for the ILC-based global clock distribution in Alpha 21264. Each configuration is designated according to its clocking network: XGM, IGM, and IM′.

4.7 Circuit-level jitter simulation setup.


4.8 Breakdown of processor power consumption with different clock distribution methods.

4.9 Schematic of (a) the test chip and (b) the divide-by-two ILO used.

4.10 Chip micrograph of the test chip. The whole chip size is 1.5 mm × 1.3 mm, and each ILO occupies 0.25 mm × 0.22 mm. The H-tree sections measure 500 µm, 280 µm, and 290 µm, respectively, from root to leaves.

4.11 Spectrum of the generated local clock signal from ILO1, identical to that from the other ILOs on chip.

4.12 Locking range of ILO1, identical to that of the other ILOs on chip.

4.13 Phase noise of the reference clock and 4 output clocks at different positions on chip.

4.14 Deskew capability of ILC. (a) deskewing when tuning ILO1 and/or ILO2; (b) deskewing when tuning ILO1 and ILO2 differentially. The skew is measured between the two output clock signals of ILO1 and ILO2. Note that there is some imbalance between ILO1 and ILO2 caused by mismatch in the clock distribution tree and measurement system.

4.15 Two possible circuit implementations of a non-division injection-locked oscillator. (a) requires a differential input while (b) can take both single-ended and differential inputs.

4.16 Non-division ILO analysis: (a) circuit model; (b) loop behavior model. Similar to the divide-by-3 topology, differential injection is required for non-division operation.

4.17 Phase shifting characteristics of the non-division ILO at different injection ratios when Q = 4.

4.18 Non-division ILO with binary-weighted switched-capacitor tuning.

4.19 (a) Schematic and (b) chip micrograph of the test chip.


4.20 Measured spectrum of the output clock.

4.21 Measured locking range of an ILO in ILC.

4.22 5-bit digital phase shift tuning for two ILOs’ outputs.

4.23 Phase noise comparison of the input clock from the signal source and the output clock from the ILO.

4.24 Jitter measurement of the signal source and ILC output at 4 GHz. From extrapolation, the cycle-to-cycle jitter of the signal source and ILC output are 0.11 ps and 0.14 ps, respectively.

4.25 ILC with active deskewing.

4.26 An injection-locked clocking system with ILO-based active deskew. Four ILOs are driven by the input clock through an H-tree. Each ILO is buffered by Buf1 to drive 2 pF of on-chip load capacitor (CL), which also converts the ILO differential output to a single-ended signal. Output buffers Buf2 drive the test ports (TPx).

4.27 (a) Deskew logic algorithm, and (b) an example of the deskew sequence which shows the design for ringing prevention.

4.28 Test chip die photo.

4.29 Measured locking range for the ILC network.

4.30 (a) Measured free-running frequency tuning, and (b) delay tuning under locked state by the 5-bit switched capacitor array.

4.31 Deskew dynamics of the deskew loop.

4.32 (a) Phase noise of the ILC output in comparison with the input clock and free-running ILO. (b) Cycle-to-cycle jitter test for the ILC output and input clock. The degradation is only 0.04 ps.


5.1 (a) Conventional dual-loop CDR, with loop I for frequency acquisition and loop II for phase acquisition. (b) Injection-locked CDR with frequency acquisition achieved by injection locking.

5.2 Optoelectronic oscillator with optical fiber as the resonator.

5.3 Injection-locked optoelectronic oscillator (OEO) with free space resonator.


Chapter 1

Introduction

The development of information technology has greatly increased our capabilities

of producing, manipulating, storing, communicating, and disseminating information.

Communications and computers are the two major foundations of information tech-

nology. In recent years, many high speed communications, wired and wireless, have

been introduced to provide fast data transfers at different distances and through dif-

ferent transmission media. At the same time, advances in microelectronic processing

techniques and ever-increasing market demand have enabled continuous increases in

computer performance. For both communication systems and computers, there is

one important common signal, the clock signal. The speed of the clock signal is usu-

ally the most important indicator of system performance for communication systems

and computers. The development of communications and computer technologies has pushed the clock frequency of both systems into the multi-gigahertz range. At such high speeds, new challenges and considerations are emerging for conventional clock design methods.


1.1 Clocking in Communications and Computing Systems

Communications systems are the foundation for information transfers. According

to different transmission media, communication systems are generally categorized

into wired and wireless communications. Wired communication is also called wireline

communication. It is now the dominant method for long distance telecommunications.

It provides the backbone for the information highway. At the same time, wireline

communication is also the main technology for fast local area networks (LANs), which

connect fixed servers and workstations inside office and residential buildings.

Currently, the dominant long distance wireline communication standards are synchronous optical networking (SONET) and synchronous digital hierarchy (SDH) [1]. They are standardized multiplexing protocols that transfer multiple digital bit streams over optical fiber using lasers or light-emitting diodes (LEDs). Designed for high speed data transfer from the beginning, SONET/SDH has also evolved with time to support higher data rates with newer generations. Since the standardization of these two technologies in 1988, the transfer speed of SONET/SDH has increased from the original optical carrier 1 (OC-1) at 51.84Mb/s to the currently widely implemented optical carrier 192 (OC-192) and optical carrier 768 (OC-768) at speeds of 10Gb/s and 40Gb/s, as shown in Fig. 1.1. A higher speed version, optical carrier 3072 (OC-3072), supporting a speed of 160Gb/s, is also planned for the near future.

For local area networks (LANs), Ethernet is the most widely used wireline communication standard [2]. Its speed has also increased with time (Fig. 1.1). The first Ethernet, introduced by Xerox PARC in 1973-1975, ran at only 3Mb/s. 10base-X Ethernet, which started around 1980, increased the supported data rate to 10Mb/s. Fast Ethernet, introduced in 1995, achieves a data rate of 100Mb/s. Right now, Gigabit Ethernet

(GbE), supporting a data rate of 1Gb/s, has become the de facto protocol for data


Figure 1.1: Speed of wireline communication standards increases with time. For long distance wireline communication standards, synchronous optical networking (SONET) and synchronous digital hierarchy (SDH), the speed has increased from 51.84Mb/s in 1988 to the current large scale deployment of 10Gb/s and 40Gb/s systems. Speed for short distance wireline communication standard Ethernet increased from only 3Mbps in 1975 to current speed of 10Gb/s.

transmission for LANs. Most personal computers (PCs) now ship with a GbE networking port as standard. At the same time, faster versions of Ethernet are also commercially available. In 2007, more than 1 million 10-Gb/s Ethernet (10-GbE) ports were shipped. 10-GbE supports a data rate of 10Gb/s; it is becoming widely used in the fiber-optic core of telecommunication networks and is even available in backplanes for ultrafast commercial equipment enclosures. For the future, even faster 40- and 100-GbE standards are being standardized by the IEEE 802.3 Higher Speed Study Group (HSSG) under the IEEE 802.3ba designation.

The speed for wireline communications has been increasing rapidly in recent years.

Wireless communications, at the same time, are also moving from crowded sub and


Figure 1.2: Commercial wireless systems are expanding from sub- and low-gigahertz range to multi-gigahertz, even millimeter-wave range in order to support higher data rates.

low-gigahertz range to multi-gigahertz, or even millimeter-wave range for higher data rates. Cellular phone systems used to be the only major commercial wireless application, with frequencies in the crowded sub-2GHz range, as shown in Fig. 1.2. With the introduction of the wireless local area network (WLAN) and the wireless personal area network (WPAN), wireless communications quickly occupied the multi-gigahertz frequency range [3, 4]. Recent developments in wireless digital network interfaces for streaming high definition video over the air have further pushed commercial wireless communications to the millimeter-wave, or tens-of-gigahertz, range [5].

The clock signal is critical to both wireline and wireless communication systems. In wireline communications, the receiver relies on the clock signal to determine the time to sample the data it receives from the transmitter [6], as demonstrated in Fig. 1.3. A good sampling clock in a wireline receiver should have its sampling transition edge


Figure 1.3: The role of clock signal in a typical wireline receiver. The highlighted portion in (a) is the clocking circuit. The clock signal is first frequency synchronized with that of the incoming data stream by a reference clock fref, which is shared by both transmitter and receiver. The frequency synchronized clock is then phase synchronized with the data stream, and used to sample the data. (b) The example of good and bad sampling clocks. The good sampling clock, clock A, has a sampling transition edge at the center of the data stream eye diagram. The bad sampling clock, clock B, has a sampling edge at the edge of the data stream eye diagram. A sampling edge away from the eye diagram center increases the chance of error of the system.

right in the center of the data stream eye diagram. If this clock condition is not

satisfied, the error rate of the communication system will increase. In order to control

the error rate, both the frequency and phase stability of the clock signal in wireline

communication systems should be tightly controlled.

In wireless communications, the clock signal, or RF carrier, determines the RF transmission frequency. Due to interference concerns, accurate RF carrier frequencies are required for different wireless communications terminals to work concurrently within the same location [7]. Fig. 1.4 (a) shows the role of the RF clock signal in a typical radio transceiver. It is a bridge between the low frequency and the radio frequency signals. If this clock signal is not accurate enough, the up-converted signal may overlap with adjacent channels or other adjacent interferers and degrade the performance of the wireless communications system, as illustrated in Fig. 1.4 (b).


Figure 1.4: The role of clock signal in wireless communications systems. (a) shows a typical radio transceiver for wireless communications. The highlighted portion is the clock generator for wireless systems, the frequency synthesizer. The output of frequency synthesizer up converts the transmitted signal from low frequency to radio frequency, and down converts the received signal from radio frequency to low frequency. (b) If there is any frequency error for the synthesizer output, the up-converted signal may overlap with adjacent channels in radio frequency and cause errors for both the transmission channel and adjacent channels.

High performance computing systems are the engine for information technology.

Historically, the performance advances of computing systems were enabled by two

coexisting forces. One is the advances in microelectronics fabrication technology,

and the other is the fast-increasing market demands for computing power. Until the

beginning of the 21st century, faster clock speed, larger logic integration scale and

smaller feature size of CMOS process have always been the natural ways to increase

the processor performance. Intel’s 4004 processor, shipped in 1971, worked at a maximum speed of only 740 kHz. The clock speed of its 80286, shipped in the early 1980s, increased to 8 MHz. In 1993, Intel introduced the fifth generation of x86 micro-architecture processor, Pentium, working at a speed of 66 MHz. Pentium 4, introduced in 2000, worked at a speed exceeding 1 GHz, reaching 1.3 GHz. Even though the heat dissipation bottleneck has slowed down the trend of processor clock speed increase, by 2007 the clock speed of Intel’s processors still reached 3.8GHz. Considering the


influence of the power dissipation plateau, the ITRS 2007 predicted that the processor clock speed will keep increasing at a rate of 1.25 times per technology node, reaching nearly 9 GHz in 2015 [8]. At the same time, the off-chip clock for processors will increase at a faster rate and reach 30 GHz in 2015. The IO bandwidth of the processor and the number of cores will also increase at the same time, as shown in Fig. 1.5 [8]. Based on the data from processors released between 2007 and 2010, even though the heat dissipation challenge and the multi-core technique have slowed down the processor main clock speed increase predicted by ITRS 2007, the off-chip clock speed for processors is increasing steadily along the trend.

The commercial developments of new computer interconnect technologies are also bearing out the predictions by ITRS. For the communication between the processor and the peripheral components, PCI Express, with clock frequencies up to 4GHz, is quickly replacing the conventional PCI bus, as shown in Fig. 1.5 [9]. Serial ATA (SATA), with clock speeds up to 6GHz, is replacing ATA for the computer hard drive data path [10]. Memory interface technologies from Rambus and other companies are targeting 1TB/s speed at clock frequencies of more than 10 GHz [11].

In computing systems, clocking is one of the most critical functions because it

is involved in almost all aspects of data processing and communications. For syn-

chronous digital logic, which is the most popular logic form for data processing, the

clock defines the start and end time for the movement of logic signals between differ-

ent registers in the data path [12, 13]. One clock cycle is the given time for all the

logic circuitry to complete their actions and get ready for the next step.

The typical structure of high speed clocking in a VLSI system is shown in Fig. 1.6

[14]. Usually an off chip crystal oscillator works as the reference for an on chip

clock generation phase-locked loop (PLL) [6]. The clock generation PLL generates

a gigahertz frequency tunable clock signal for the entire VLSI chip. After the clock

generator, a clock distribution network delivers the clock signal to every logic gate


Figure 1.5: On-chip and off-chip clock frequencies, IO bandwidth and number of cores for the processors in the near future predicted by International Technology Roadmap for Semiconductors (ITRS) 2007. According to the prediction, the on-chip clock frequency will reach near 9 GHz in 2015. The off-chip clock frequency will reach 30 GHz and IO bandwidth will reach 80 Tb/s. Number of cores in 2015 will be around 10.


Figure 1.6: Clock generation and distribution in a processor chip. The clock generation PLL generates a gigahertz, frequency tunable clock signal for the entire VLSI chip. After the clock generator, a clock distribution network delivers the clock signal to every logic gate across the chip.

across the chip. Controlling the spatial and temporal variation of the clock transition edges is the design target for clock distribution.

1.2 Challenges for High Performance Clocking

The challenges for high speed clocking come from the high susceptibility to clock

timing errors. The same amount of absolute timing error corresponds to a larger per-

centage of cycle time for a higher speed clock. Less usable cycle time translates to less

immunity to noise and errors in the data processing and communication processes. In

order to maintain acceptable error rates, absolute timing error therefore needs to be

scaled down with the clock cycle time. For wireless communication systems, a higher carrier frequency is harder to generate while maintaining good frequency stability [15]. An inaccurate carrier frequency broadens the signal bandwidth and generates interference between different wireless systems, and between different channels within the same


standard [7].

In different systems, there are different ways of describing the clock timing errors. For computing and wireline communication systems, the clock timing errors are usually characterized by skew and jitter [12]. Both skew and jitter are time domain descriptions of the clock timing error, with the former describing the spatial variations of clock transition edges, and the latter describing the temporal variations of the clock transition edge. For wireless communication systems, the clock timing errors are usually characterized by the phase noise of the local oscillation signal [15, 7]. It is a frequency domain description of the phase variation of the local oscillation, or the clock signal.

1.2.1 Skew

Skew is defined as the difference in clock arrival times at different positions [14]. For synchronous logic, different positions means different flip-flops. Clock skew in synchronous logic is generally caused by mismatches in the clock distribution network. These mismatches can arise from the lengths and electrical properties of the clock interconnect, the sizes and device parameters of the clock buffers, and the clock load itself.

The effect of clock skew on the performance of synchronous logic systems can be demonstrated by the following two examples. Synchronous logic is usually composed of two sequential registers with combinational logic inserted in between. Suppose we have such a structure as shown in Fig. 1.7 (a), where Ci and Cj are the clock signals for the near-end register Ri and the far-end register Rj. When clock Ci arrives at the near-end register, it starts the movement of the signal from Ri to the combinational logic after a delay Tcq. Signals take different paths through the combinational logic before they reach the far-end register. We assume the longest path has a delay of Tijmax. Upon arriving at Rj, a setup time, Tsetup, for the register is required. Usually the sum of the three delay elements, Tcq + Tijmax + Tsetup, in such a data path sets the minimum allowable clock cycle time Tcycle. However, if there is a clock


Figure 1.7: Effect of skew on (a) long path error and (b) short path error.

skew such that Cj arrives earlier than Ci, we will have a problem if we still run the logic at the same clock frequency. Because of the early skew, Tskew, at Rj, clock Cj arrives before the signal at the Rj input has been stable for the required setup time. This may cause a functional error in the logic and is usually referred to as a setup violation. Because this type of error is usually caused by the longest delay path in the logic, it is also called a long path error. One ready remedy for a long path error is to increase the clock cycle time, so that

$$ T_{cycle} \ge T_{cq} + T_{ijmax} + T_{setup} + T_{skew} \tag{1.1} $$

can be satisfied. Apparently the existence of skew increases the minimum allowable

clock cycle time, and thus limits the maximum speed of the logic.

Another effect of skew on synchronous logic is shown in Fig. 1.7 (b), where the clock, Cj, at the far-end register, Rj, arrives later than the near-end one, Ci, with a skew of Tskew. This time the signal goes through the shortest path in the combinational logic and has the smallest delay, Tijmin. The signal arrives at the input of Rj after a total delay of Tcq + Tijmin. This delay is smaller than the sum of clock skew Tskew and the hold


Figure 1.8: Illustration of clock jitter as the clock period variations. The period of clock with jitter is a Gaussian distribution with the mean value equal to the ideal clock period Tavg.

time Thold required by the register Rj for the data of the previous cycle. The early arrival of data of this cycle may corrupt the data of the previous one, causing a logic error. This error is usually referred to as a hold violation, or short path error, because it is usually caused by the shortest path in the combinational logic. A hold violation cannot be corrected by increasing the clock cycle time. Instead, delay elements need to be added in the shortest logic path. However, this must be done carefully so that the longest path delay is not increased.
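As a minimal sketch of the two timing checks above, the following Python fragment evaluates Eqn. 1.1 and the hold condition Tcq + Tijmin < Tskew + Thold for a single register-to-register path. The class name, function names and all delay values are hypothetical, chosen only for illustration; they are not taken from any measured design.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Timing parameters of one register-to-register path, in seconds."""
    t_cq: float      # clock-to-Q delay of the launching register Ri
    t_ij_max: float  # longest combinational delay between Ri and Rj
    t_ij_min: float  # shortest combinational delay between Ri and Rj
    t_setup: float   # setup time of the capturing register Rj
    t_hold: float    # hold time of the capturing register Rj


def min_cycle_time(p: PathTiming, early_skew: float) -> float:
    """Eqn. 1.1: minimum cycle time when Cj arrives early by early_skew."""
    return p.t_cq + p.t_ij_max + p.t_setup + early_skew


def has_hold_violation(p: PathTiming, late_skew: float) -> bool:
    """Short path error: new data reaches Rj before late_skew + t_hold."""
    return p.t_cq + p.t_ij_min < late_skew + p.t_hold


# Hypothetical delays, for illustration only.
path = PathTiming(t_cq=50e-12, t_ij_max=600e-12, t_ij_min=30e-12,
                  t_setup=40e-12, t_hold=60e-12)

t_min = min_cycle_time(path, early_skew=30e-12)
print(f"minimum cycle time: {t_min * 1e12:.0f} ps")
print("hold violation with 10 ps late skew:", has_hold_violation(path, 10e-12))
print("hold violation with 30 ps late skew:", has_hold_violation(path, 30e-12))
```

With these assumed numbers, the early skew adds directly to the minimum cycle time, while the hold check does not depend on the cycle time at all, which is why slowing the clock cannot repair a short path error.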

1.2.2 Jitter

Jitter is defined in ITU (International Telecommunication Union) specifications as "short term variations of a digital signal's significant instants from their ideal position in time." If we observe a signal with jitter on an oscilloscope, the jitter shows up as blurred rising and falling edges (Fig. 1.8). Jitter adds to the clock uncertainty on top of skew. It further reduces the timing margin in a clock cycle and hence reduces the maximum speed of a system.


A practical clock signal can be written as

$$ V_{clock}(t) = A(t)\, f[\omega_0 t + \phi(t)] \tag{1.2} $$

where the function f is periodic in 2π, and A(t) and φ(t) model fluctuations in amplitude and phase. The amplitude fluctuations can be significantly attenuated by the amplitude limiting mechanism, which is readily present in clock generation and distribution systems. The phase fluctuation φ(t) therefore becomes the main cause of clock timing error. With the existence of φ(t), the clock period varies from cycle to cycle, and usually follows a Gaussian distribution, as shown in Fig. 1.8. We refer to Tn as the period of cycle n, and the mean value of the Tn distribution is defined as the average period of the clock, which is denoted as Tavg. The standard deviation of this clock period distribution is defined as the cycle-to-cycle jitter σc. The equation for σc can be expressed as

$$ \sigma_c = \sqrt{\lim_{N\to\infty}\left(\frac{1}{N}\sum_{n=1}^{N}\left(T_n - T_{avg}\right)^2\right)}. \tag{1.3} $$

The cycle-to-cycle jitter describes the magnitude of the clock period fluctuations in the short term, but contains no information on the long-term jitter accumulation. The long-term jitter accumulation can be characterized by the absolute jitter, or long-term jitter, σabs(t). It is defined as the sum of the period variations during an observation time t.

$$ \sigma_{abs}(t = N T_{avg}) = \sum_{n=1}^{N}\left(T_n - T_{avg}\right) \tag{1.4} $$

The long-term jitter is a function of the observation time relative to the triggering event. For free-running oscillators, it grows without bound as the observation time increases, since the period variation of one cycle carries no information about the variation


of previous cycles. This measure is more often used with PLLs, because a PLL has a phase reference source that resets the jitter, which a free-running oscillator does not have. From the operation analysis of the synchronous digital logic system, the jitter that matters in its clock signal is the period variation between two sequential cycles, which is the cycle-to-cycle jitter σc.
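The difference between Eqn. 1.3 and Eqn. 1.4 can be illustrated numerically. The sketch below assumes a free-running oscillator whose period deviations are independent Gaussian samples. That model, the 1 GHz period, the 1 ps deviation and all function names are assumptions made for this example only. It estimates σc from the generated periods and compares the RMS value of the accumulated jitter after 100 and 400 cycles.

```python
import math
import random

T_AVG = 1e-9     # hypothetical ideal period (1 GHz clock)
SIGMA = 1e-12    # hypothetical per-cycle period deviation (1 ps)


def cycle_to_cycle_jitter(periods):
    """Eqn. 1.3 over a finite record: RMS deviation of Tn from the mean."""
    t_avg = sum(periods) / len(periods)
    return math.sqrt(sum((t - t_avg) ** 2 for t in periods) / len(periods))


def accumulated_jitter(periods, n, t_avg=T_AVG):
    """Eqn. 1.4: sum of the first n period deviations from t_avg."""
    return sum(t - t_avg for t in periods[:n])


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


random.seed(1)  # fixed seed so that repeated runs give the same numbers
# 2000 independent records of 400 cycles each (assumed white period noise).
runs = [[random.gauss(T_AVG, SIGMA) for _ in range(400)] for _ in range(2000)]

sigma_c = cycle_to_cycle_jitter([t for run in runs for t in run])
rms_100 = rms([accumulated_jitter(run, 100) for run in runs])
rms_400 = rms([accumulated_jitter(run, 400) for run in runs])

print(f"sigma_c             = {sigma_c * 1e12:.3f} ps")
print(f"RMS accumulated, 100 = {rms_100 * 1e12:.2f} ps")
print(f"RMS accumulated, 400 = {rms_400 * 1e12:.2f} ps")
```

Under this independence assumption the accumulated jitter grows roughly as the square root of N, so the RMS value at N = 400 comes out close to twice that at N = 100, while σc stays near the 1 ps set in the model.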

For baseband digital communication, the jitter definition has broader meanings,

where the random part of clock time variations is defined as random jitter, and the

deterministic part is defined as deterministic jitter. Deterministic jitter is mainly

caused by duty cycle distortion and data dependent jitter. Deterministic jitter is

not random, so it is usually described by peak-to-peak magnitude. Useful jitter

characteristics for baseband digital communication receivers include jitter generation,

jitter transfer and jitter tolerance. Jitter generation is the process whereby jitter

appears at the output in the absence of applied input jitter. It is a measure of the jitter produced by the receiver itself. Jitter transfer is characterized by

the relationship between the applied input jitter and the resulting output jitter as a

function of frequency. For receivers in which a linear process describes the transfer of

jitter from the input to output port, the jitter transfer function is the ratio of output

jitter spectrum to the applied input jitter spectrum. Jitter tolerance is defined as the

peak-to-peak amplitude of sinusoidal jitter applied on the input signal that causes a

1-dB power penalty. This is a stress test intended to ensure that the receiver works

properly under required jitter conditions.

1.2.3 Phase Noise

For wireless communication systems, it is more convenient to study the jitter in the frequency domain, where it is called phase noise [15, 7]. Phase noise is usually characterized in terms of the single sideband noise spectral density. It has units of


decibels below the carrier per hertz (dBc/Hz) and is defined as

$$ L(\Delta\omega) = 10\log\left[\frac{P_{sideband}(\omega_0 + \Delta\omega,\ 1\,\mathrm{Hz})}{P_{carrier}}\right] \tag{1.5} $$

where Psideband(ω0 + ∆ω, 1Hz) represents the single sideband power at a frequency

offset of ∆ω from the carrier with a measurement bandwidth of 1 Hz. A typical phase

noise profile for an oscillator is shown in Fig. 1.9 (a). The phase noise usually drops

with the increase of offset frequency from the carrier. According to the slope of this drop, the phase noise profile can be divided into three regions. The region closest to the carrier has a slope of 30dB/dec, the middle region has a slope of 20dB/dec, and at large offset frequencies, the profile becomes flat.
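Eqn. 1.5 is a power ratio expressed in decibels. A minimal sketch follows, in which the function name and the power levels are hypothetical and serve only to show the unit conversion:

```python
import math


def phase_noise_dbc_per_hz(p_sideband_w, p_carrier_w):
    """Eqn. 1.5: single sideband noise power in 1 Hz relative to the carrier."""
    return 10.0 * math.log10(p_sideband_w / p_carrier_w)


# Hypothetical example: 1 mW carrier and 10 fW of noise power measured in a
# 1 Hz bandwidth at the offset of interest.
l_offset = phase_noise_dbc_per_hz(1e-14, 1e-3)
print(f"L = {l_offset:.1f} dBc/Hz")
```

With these assumed powers the ratio is 1e-11, which corresponds to about -110 dBc/Hz.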

The phase noise is especially important for the LO signal in wireless systems. This

is because when an LO down converts the desired signal, adjacent interferers also get

convolved with the LO signal and down-converted near the desired signal. The phase

noise sideband of the LO will be transferred to both the down-converted desired signal

and interferers. This may result in overlapping spectra between the desired signal and

interferers, with the tails of interferers corrupting the desired information, as shown

in Fig. 1.9 (b). This effect is called reciprocal mixing.

There are many sources in an oscillator that contribute to the generation of phase

noise. The conversion mechanisms from different sources to phase noise are usually

different and complicated. In order to gain an intuitive understanding of these pro-

cesses, we will assume a linear time-invariant model for a typical LC oscillator, and

study the conversion process of device thermal noise in the LC tank to the oscillator

output phase noise (Fig. 1.10) [16]. The thermal noise of a device in the tank can

be represented by a white noise current source across the tank with a mean-square

spectral density of
$$ \frac{\overline{i_n^2}}{\Delta f} = \frac{4kT}{R}. \tag{1.6} $$


Figure 1.9: (a) Typical phase noise profile of an oscillator versus offset from carrier. The corner frequencies $\Delta\omega_{1/f^3}$ and $\Delta\omega_{1/f^2}$ represent the boundaries between 30dB/dec, 20dB/dec and flat regions of the phase noise profile. (b) Illustration of the process of reciprocal mixing.

This noise current will only see the impedance of a perfectly lossless LC network

because the loss of the tank is canceled by the active energy restoration element. For

a relatively small offset frequency ∆ω from the center frequency ω0, the impedance

of a lossless LC network may be approximated by

$$ Z(\omega_0 + \Delta\omega) \approx -j\,\frac{\omega_0 L}{2(\Delta\omega/\omega_0)}. \tag{1.7} $$

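Expressing ω0L in terms of the tank quality factor Q and the effective parallel resistance R through
$$ \omega_0 L = \frac{R}{Q}, \tag{1.8} $$
we can rewrite the tank impedance as
$$ Z(\omega_0 + \Delta\omega) \approx -j\,\frac{R\,\omega_0}{2Q\,\Delta\omega}. \tag{1.9} $$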


Figure 1.10: (a) A typical LC oscillator, and (b) the model for phase noise analysis. The active circuit part provides a negative resistance to compensate the loss of the LC tank.

Multiplying the squared magnitude of this tank impedance by the spectral density of the mean-square noise current, we obtain the spectral density of the mean-square noise voltage:
$$ \frac{\overline{v_n^2}}{\Delta f} = \frac{\overline{i_n^2}}{\Delta f}\,|Z|^2 = 4kTR\left(\frac{\omega_0}{2Q\,\Delta\omega}\right)^2. \tag{1.10} $$

The power spectral density of the output noise is frequency-dependent, and is

proportional to the inverse-square of the offset frequency. This is because of the

filtering function of the LC tank. When flicker noise at small offset frequencies is taken into account, a $1/\Delta\omega^3$ region appears on top of Eqn. 1.10, because the flicker noise has a power spectral density inversely proportional to frequency. As

mentioned earlier, the above approach has assumed a linear time-invariant model

for the oscillator, which is not accurate enough for practical oscillators. With the

consideration of the nonlinear and time-variant natures of practical oscillators, [16]

gave a complete description of phase noise generation process, which involves the

conversion from device noise to the phase fluctuation φ(t) as in Eqn. 1.2, and a phase

modulation from φ(t) to oscillator output phase noise.
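The linear time-invariant result can be checked numerically. The sketch below compares the approximate tank impedance of Eqn. 1.9 with the exact impedance of an ideal parallel LC network at a small offset, and confirms that the noise voltage density of Eqn. 1.10 falls by 20 dB per decade of offset. The tank values (1 nH, 1 pF, Q = 10), the 300 K temperature, the 1 MHz offset and the function names are assumptions chosen for illustration, not values from this work.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K

# Hypothetical tank values, chosen only for illustration.
L_TANK = 1e-9      # 1 nH
C_TANK = 1e-12     # 1 pF
Q_TANK = 10.0
OMEGA0 = 1.0 / math.sqrt(L_TANK * C_TANK)   # resonance frequency, rad/s
R_TANK = Q_TANK * OMEGA0 * L_TANK           # parallel resistance, Eqn. 1.8


def exact_lossless_impedance(omega):
    """Impedance of an ideal (lossless) parallel LC network."""
    return 1j * omega * L_TANK / (1.0 - omega ** 2 * L_TANK * C_TANK)


def approx_impedance(d_omega):
    """Eqn. 1.9: -j R omega0 / (2 Q d_omega)."""
    return -1j * R_TANK * OMEGA0 / (2.0 * Q_TANK * d_omega)


def noise_voltage_psd(d_omega, temperature=300.0):
    """Eqn. 1.10: 4kTR (omega0 / (2 Q d_omega))^2, in V^2/Hz."""
    return (4.0 * K_BOLTZMANN * temperature * R_TANK
            * (OMEGA0 / (2.0 * Q_TANK * d_omega)) ** 2)


d_omega = 2.0 * math.pi * 1e6    # 1 MHz offset from the carrier
z_exact = exact_lossless_impedance(OMEGA0 + d_omega)
z_approx = approx_impedance(d_omega)
rel_error = abs(z_exact - z_approx) / abs(z_exact)

# Change of the noise density when the offset is increased tenfold.
slope_db = 10.0 * math.log10(noise_voltage_psd(10.0 * d_omega)
                             / noise_voltage_psd(d_omega))

print(f"relative error of Eqn. 1.9: {rel_error:.2e}")
print(f"PSD change over one decade of offset: {slope_db:.1f} dB")
```

For offsets that are small compared with the carrier frequency, the approximation error is on the order of ∆ω/ω0, which is why the simple inverse-square dependence is adequate in the 20dB/dec region.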

While jitter can be viewed as a statistical behavior for the random process of


Figure 1.11: (a) The linearized noise model for a typical type II PLL, and (b) a typical PLL phase noise profile.

phase fluctuation φ(t), the phase noise is the frequency domain representation for the

same process. These two can be related by [17]

$$ \sigma_{abs}^2 = 4\int_{-\infty}^{+\infty} S_\phi(f)\,\sin^2(\pi f t)\,df. \tag{1.11} $$

Note that the jitter in this relationship is absolute jitter, or long-term jitter.

It is worth mentioning here the effect of frequency division and multiplication on phase noise. Since frequency and phase are related by a linear operation, a division or multiplication of frequency is identical to a division or multiplication of phase by the same factor. For the clock signal in Eqn. 1.2, a frequency division by M will generate
$$ V_{clock/M}(t) = A(t)\, f\left[\frac{\omega_0}{M}t + \frac{\phi(t)}{M}\right]. \tag{1.12} $$


The magnitude of the phase fluctuation is also divided by M, and from the narrow band phase modulation approximation, the phase noise power is divided by $M^2$. If we

assume the carrier amplitude to be the same, a frequency division by M will generate

a phase noise reduction of 20 log M dB. On the other hand, a frequency multiplication

by M will increase the phase noise by 20 log M dB.
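The 20 log M rule can be written as a pair of small helper functions. This is a minimal sketch under the same assumption as the text (unchanged carrier amplitude); the function names and the -100 dBc/Hz starting value are hypothetical.

```python
import math


def after_division(l_dbc_hz, m):
    """Phase noise after frequency division by m: lowered by 20 log10(m) dB."""
    return l_dbc_hz - 20.0 * math.log10(m)


def after_multiplication(l_dbc_hz, m):
    """Phase noise after frequency multiplication by m: raised by 20 log10(m) dB."""
    return l_dbc_hz + 20.0 * math.log10(m)


l_in = -100.0   # hypothetical phase noise at some offset, in dBc/Hz
for m in (2, 3, 8):
    print(f"M = {m}: divide -> {after_division(l_in, m):.2f} dBc/Hz, "
          f"multiply -> {after_multiplication(l_in, m):.2f} dBc/Hz")
```

A divide-by-2 stage therefore improves the phase noise by about 6 dB, and a multiply-by-2 stage degrades it by the same amount.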

Phase noise of a phase-locked loop (PLL) is also of great interest for the study of

high speed clocking. For a typical type II PLL, the linearized noise model is shown

in Fig. 1.11(a). This noise model indicates that, the PLL output phase noise Φout is

contributed by both the input phase noise Φin and the phase noise generated by the

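VCO, ΦVCO. The noise transfer functions of the loop from both the input and the VCO are shown below:
$$ \frac{\Phi_{out}(s)}{\Phi_{in}(s)} = \frac{\omega_n^2 + 2\zeta\omega_n s}{s^2 + 2\zeta\omega_n s + \omega_n^2}, \tag{1.13} $$
$$ \frac{\Phi_{out}(s)}{\Phi_{VCO}(s)} = \frac{s^2}{s^2 + 2\zeta\omega_n s + \omega_n^2}, \tag{1.14} $$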

where ωn is the natural frequency of the loop and ζ is the damping factor. From the noise transfer functions, we can see that noise from the input is low pass filtered by the loop, while noise generated by the VCO is high pass filtered by the loop. The typical phase noise profile for such a PLL is shown in Fig. 1.11(b), where the output phase noise Lout(∆ω) is the superposition of the low pass filtered portion of the input noise Lin(∆ω), and the high pass filtered portion of the VCO noise LVCO(∆ω).
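The low pass and high pass behavior of Eqn. 1.13 and Eqn. 1.14 can be seen by evaluating both transfer functions along the jω axis. In the sketch below, the 1 MHz natural frequency, the damping factor of 0.707 and the function names are assumed values for illustration, not parameters of any loop described in this work.

```python
import math

OMEGA_N = 2.0 * math.pi * 1e6   # hypothetical natural frequency (1 MHz)
ZETA = 0.707                    # hypothetical damping factor


def h_input(s):
    """Eqn. 1.13: transfer function from input phase noise to the output."""
    den = s ** 2 + 2.0 * ZETA * OMEGA_N * s + OMEGA_N ** 2
    return (OMEGA_N ** 2 + 2.0 * ZETA * OMEGA_N * s) / den


def h_vco(s):
    """Eqn. 1.14: transfer function from VCO phase noise to the output."""
    den = s ** 2 + 2.0 * ZETA * OMEGA_N * s + OMEGA_N ** 2
    return s ** 2 / den


def to_db(h):
    """Magnitude of a complex transfer function value in dB."""
    return 20.0 * math.log10(abs(h))


# Offsets well below, at, and well above the natural frequency.
for offset_hz in (1e4, 1e6, 1e8):
    s = 1j * 2.0 * math.pi * offset_hz
    print(f"{offset_hz:9.0e} Hz: input {to_db(h_input(s)):7.2f} dB, "
          f"VCO {to_db(h_vco(s)):7.2f} dB")
```

Well inside the loop bandwidth the input noise passes almost unchanged while the VCO noise is strongly suppressed, and the opposite holds far outside it. The two transfer functions also sum to one at every frequency, since their numerators add up to the common denominator.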

Skew, jitter and phase noise characterize the major timing uncertainties for high-speed clocking systems. Many techniques at both the circuit and system levels have been developed to control these timing uncertainties. At the same time, power consumption is becoming an increasingly important consideration when dealing with such problems, as many applications are moving to battery powered portable devices. The following part of this dissertation will introduce some new techniques, based on injection-locked oscillators, to handle the problems of skew, jitter and phase noise for high speed


clocking applications with good power efficiency.

1.2.4 Power Consumption

Energy efficiency is a major design consideration for battery-powered portable communication devices because it directly determines the battery life of such devices. Inside the transceiver of these portable communication devices, the frequency synthesizer consumes a considerable portion of the total radio power. As the carrier frequency moves higher, the power consumption of the frequency synthesis circuit rises further. Reducing power consumption is thus a priority, alongside performance, in designing high frequency synthesizers.

Inside a frequency synthesizer, the frequency divider of the phase-locked loop (PLL) is one of the power hungry blocks. For conventional digital implementations of frequency dividers, the power consumption increases linearly with frequency. This increase of power becomes unacceptable for systems working in the millimeter-wave range. New frequency dividers with lower power consumption in the multi-gigahertz and millimeter-wave range are highly desired for wireless systems working in these frequency ranges.

Power dissipation in computing systems is a major limitation at many levels. This limitation is especially prominent for high performance processor chips. The International Technology Roadmap for Semiconductors (ITRS) states that the amount of heat that can be removed from a chip in a cost-effective way is about to reach a plateau, saturating at about 200 W.

There are many sources of power consumption in high performance processors. The clock distribution circuit is one of the major contributors to the total power consumption. In large scale processors, the total power dissipation in the clock distribution can be significant. It may be as much as 30-40% of the total power consumption of the whole system. The dominant component of clock power consumption is due


to the dynamic switching:

$$ P = C V_{DD}^2 f_{clk}, \tag{1.15} $$

where C is the loading capacitance of the whole clock network, VDD is the supply voltage and fclk is the clock frequency. The capacitance of the clock load can be very high in large scale digital systems, possibly in the nanofarad range. The capacitance

of the clock arises from many sources. First, the interconnect capacitance of the

metal lines is a major source of capacitance. Second, there are large buffers used

in the clock distribution network that give rise to large fanout and self-capacitance

terms. Third, there are capacitances associated with the inputs of the flip-flops

driven by the clock. When designing the clock distribution network, it is important

to minimize the capacitance of the clocking network in order to reduce the switching

power consumption.
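To give a feel for the magnitudes in Eqn. 1.15, the sketch below evaluates the dynamic clock power for one operating point and for the same clock with a smaller load. The 1 nF load, 1.0 V supply, 3 GHz clock and 20% capacitance reduction are hypothetical numbers chosen for illustration.

```python
def clock_dynamic_power(c_load, v_dd, f_clk):
    """Eqn. 1.15: dynamic switching power of the clock network, in watts."""
    return c_load * v_dd ** 2 * f_clk


# Hypothetical operating point: 1 nF clock load, 1.0 V supply, 3 GHz clock.
p_base = clock_dynamic_power(1e-9, 1.0, 3e9)
# Same clock with 20% less capacitance on the clock network.
p_reduced = clock_dynamic_power(0.8e-9, 1.0, 3e9)

print(f"baseline: {p_base:.2f} W, reduced load: {p_reduced:.2f} W")
```

Because the power is linear in C, any reduction of the clock network capacitance translates into the same relative saving in switching power at a fixed supply voltage and frequency.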

1.3 Dissertation Organization

The remainder of this dissertation is organized as follows. Chapter 2 will introduce the background of conventional clock generation and distribution methods. Some recent proposals for solving some of the high-speed clock design challenges are also discussed. Chapter 3 will give a detailed background introduction to the injection locking phenomenon, and will present three circuit innovations of injection-locked frequency converters for communication systems. These three circuits are a tunable dual-phase signal generator based on a double balanced divide-by-2 injection-locked frequency divider, a divide-by-3 injection-locked frequency divider based on differential injection and harmonic engineering, and a multiply-by-2/3 dual-mode injection-locked frequency multiplier which is capable of good harmonic suppression in digital CMOS processes. After that, Chapter 4 will present our new injection-locked oscillator design for clock distribution, and injection-locked

clocking, a new clock distribution scheme for multi-gigahertz clocks

in the multi-core era. Chapter 5 will conclude this dissertation and point to

some future research directions.

Chapter 2

High Performance Clocking

2.1 Clock Generation

Clock generation, or frequency synthesis in wireless communications terminology, is the

process of generating one or many clock frequencies from one or a few reference

sources. A clock generator, or frequency synthesizer, is an important building block in

both processors and communications systems.

In early wireless systems, a frequency synthesizer was a crystal-controlled

oscillator with a bank of crystals switched in manually. The frequency accuracy and

stability of such clock generators were mainly determined by the accuracy

and stability of the crystal.

Direct frequency synthesis, or incoherent synthesis, is the second generation of frequency

synthesis for wireless systems. This approach uses relatively few crystals

to generate many frequencies by means of frequency mixing, division and multiplication.

Fig. 2.1 is an example of direct frequency generation by frequency mixing and

division, where a frequency of $\frac{2}{3}f_0$ is generated from an $f_0$ source.

Figure 2.1: Direct frequency generation using the mix-and-divide principle. It requires excessive filtering.

Clock generation by direct frequency synthesis can produce fast frequency switching,

almost arbitrarily fine frequency resolution, low phase noise, and the highest-frequency

operation of any of the methods. However, direct frequency synthesis

requires considerably more hardware (oscillators, mixers, and bandpass filters). The

hardware requirements make direct synthesizers larger and more expensive.

Another disadvantage of the direct synthesis technique is that unwanted frequen-

cies can appear at the output. The wider the frequency range, the more likely that

spurious components will appear in the output.

In modern communications systems, a frequency synthesizer generates an output

frequency given by $f_{out} = f_0 + k f_{ch}$, where $f_0$ is the lower end of the range, $k$ is an

integer varying from 0 to the maximum number of channels and $f_{ch}$ is the frequency

step, or channel spacing. In the receive band of IS-54 cellular systems, for example,

$f_0 = 869$ MHz, $k = 0, \ldots, 833$, and $f_{ch} = 30$ kHz. The very high accuracy required in

the specifications of $f_0$ and $f_{ch}$ often requires the use of a phase-locked loop (PLL).
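The channel plan described above can be sketched in a few lines of Python. The function name is hypothetical; the numeric values are the IS-54 receive-band figures quoted in the text, expressed in integer hertz to avoid rounding.

```python
def channel_frequency_hz(f0_hz, f_ch_hz, k):
    """Output frequency f_out = f0 + k * f_ch, using integer hertz."""
    if k < 0:
        raise ValueError("channel index k must be non-negative")
    return f0_hz + k * f_ch_hz


F0_HZ = 869_000_000   # lower end of the range quoted in the text
F_CH_HZ = 30_000      # channel spacing quoted in the text

first = channel_frequency_hz(F0_HZ, F_CH_HZ, 0)
last = channel_frequency_hz(F0_HZ, F_CH_HZ, 833)
print(f"k = 0:   {first / 1e6:.3f} MHz")
print(f"k = 833: {last / 1e6:.3f} MHz")
```

The step between adjacent channels is about 35 parts per million of the carrier, which gives a feel for the accuracy the synthesizer has to deliver.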

In early processors where the clock speed was low, a simple form of clock generation

was just a ring oscillator. A ring oscillator has an odd number of inverter stages and

produces an oscillating signal at each node. The period of oscillation is determined by

the delay of each stage and the number of stages. This type of clock generation circuit

is only suitable for low-speed clock generation, because the generated clock signal

is quite process dependent and unstable. In order to achieve better clock frequency

Figure 2.2: Typical integrated oscillators. (a) a ring oscillator and (b) an LC oscillator.

stability, on-chip phase-locked loops (PLLs) with crystal oscillator references

are again used for processor clock generation.
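The dependence of the ring oscillator period on stage delay and stage count can be made concrete with the standard first-order relation $f = 1/(2 N t_d)$, which assumes $N$ identical inverter stages, each with delay $t_d$. The sketch below is illustrative only; the function name, the 5-stage ring and the 20 ps delay are assumptions.

```python
def ring_oscillator_frequency_hz(n_stages, stage_delay_s):
    """First-order ring oscillator frequency f = 1 / (2 * N * t_d)."""
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("a single-ended ring needs an odd number of stages (>= 3)")
    if stage_delay_s <= 0:
        raise ValueError("stage delay must be positive")
    return 1.0 / (2 * n_stages * stage_delay_s)


# Assumed example: 5 inverter stages with a nominal 20 ps delay each.
nominal = ring_oscillator_frequency_hz(5, 20e-12)
slow = ring_oscillator_frequency_hz(5, 22e-12)   # stage delay 10% larger
print(f"nominal: {nominal / 1e9:.2f} GHz, slow corner: {slow / 1e9:.2f} GHz")
```

Since the stage delay moves with process, voltage and temperature, the frequency moves with it, which is the instability the text refers to.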

2.1.1 Oscillators

An oscillator is one of the most basic and essential components in a clocking system.

It converts dc power to RF power and produces a steady-state sinusoidal signal.

This steady-state periodic signal is usually frequency divided, multiplied or mixed

with another periodic signal to generate the clock signal or RF carrier for a communication

or computer system. The frequency stability, or spectral purity, of the

generated periodic signal is usually the most important performance figure for an oscillator.

Other factors like power consumption and cost are also considered in many

applications when choosing an oscillator topology. These factors usually trade off

against frequency stability.

According to their frequency determination mechanisms, oscillators can be categorized

into ring oscillators and resonator-based oscillators. A ring oscillator (Fig. 2.2-a) has

its oscillation frequency determined by the delay of a positive feedback loop. Usu-

ally several identical delay elements form a ring structure to construct an oscillator,

which gives this type of oscillator its name. At the frequency of oscillation, the

ring structure has a steady-state closed loop gain of one and loop phase shift of 2π,

which forms a positive feedback and enables the oscillation signal to add in phase

as it propagates along the ring structure. Ring oscillators usually have smaller area

and larger tuning range compared with resonator-based oscillators. However, they

usually have inferior frequency stability and consume more power. In applications

where performance requirements are relaxed and cost is a major

consideration, ring oscillators are a very good candidate for generating the clock

source.

A resonator-based oscillator has its oscillation frequency determined by the res-

onator. For integrated circuits, such resonators are usually built with LC (inductor

and capacitor) tanks, so they are also called LC oscillators (Fig. 2.2-b). An LC

differential oscillator can be analyzed with Barkhausen’s criteria in a feedback loop as

shown in Fig. 2.3. In Fig. 2.3-a, an LC differential oscillator is decomposed into

two parts, the LC tank $H(j\omega)$ and the active circuit $f(V_0)$. These two parts form a

feedback system as demonstrated in Fig. 2.3-b. According to Barkhausen’s oscillation

criteria, at steady-state oscillation, the following relationships should hold:

$$|f(V_0) H(j\omega_0)| = 1 \tag{2.1}$$

$$\angle [f(V_0) H(j\omega_0)] = 2\pi \tag{2.2}$$

where ω0 is the frequency of oscillation. Because the active circuit in Fig. 2.3-a

introduces zero phase shift, the oscillation frequency of this oscillator can only be at

the resonant frequency of the LC tank. This is because, as shown in Fig. 2.3-c, the

phase shift of the LC tank is zero only at its resonant frequency, which equals $1/\sqrt{LC}$.
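The zero-phase condition at resonance can be checked numerically. The sketch below models the tank as a parallel RLC network, which is an assumption (the text only specifies an LC tank $H(j\omega)$), and the component values are illustrative. Note that $1/\sqrt{LC}$ is an angular frequency in rad/s; dividing by $2\pi$ gives hertz.

```python
import cmath
import math


def resonant_frequency_hz(l_henry, c_farad):
    """Resonant frequency in hertz: f0 = 1 / (2 * pi * sqrt(L * C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def tank_impedance(omega, l_henry, c_farad, r_parallel):
    """Impedance of a parallel RLC tank at angular frequency omega (rad/s)."""
    admittance = (1.0 / r_parallel
                  + 1j * omega * c_farad
                  + 1.0 / (1j * omega * l_henry))
    return 1.0 / admittance


# Assumed example values: 1 nH inductor, 1 pF capacitor, 500 ohm loss resistance.
L, C, RP = 1e-9, 1e-12, 500.0
omega0 = 1.0 / math.sqrt(L * C)

print(f"f0 = {resonant_frequency_hz(L, C) / 1e9:.3f} GHz")
for scale in (0.9, 1.0, 1.1):
    z = tank_impedance(scale * omega0, L, C, RP)
    print(f"{scale:.1f} * w0: |Z| = {abs(z):7.1f} ohm, "
          f"phase = {math.degrees(cmath.phase(z)):6.1f} deg")
```

Below resonance the tank looks inductive (positive phase), above resonance it looks capacitive (negative phase), and only at $\omega_0$ does the phase cross zero, so only there can the loop phase condition be met by an active circuit that adds no phase shift of its own.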

In such LC oscillators, active circuits only provide the negative resistance nec-

essary to cancel the loss of the LC tank. Because the energy in LC oscillators is

Figure 2.3: Barkhausen oscillation criteria for LC oscillators. (a) An LC differential oscillator composed of an LC tank, represented by H(jω), and an active circuit, represented by f(Vo); (b) feedback loop model for the LC oscillator; (c) amplitude and phase response of the LC tank H(jω).

recycled between the inductance and capacitance of the resonant tank, LC oscillators

usually consume less power than ring oscillators. Also, because

of the bandpass filtering of the LC tank, LC oscillators can have much better

spectral purity than ring oscillators. The only trade-off for these benefits of LC

oscillators is their larger area, which comes from the passive inductor of the

LC tank. LC oscillators find applications in high-performance systems where a

clock source with high frequency stability is required.

2.1.2 Phase-Locked Loop (PLL)

A phase-locked loop (PLL) is a feedback system that operates on the excess phase

of nominally periodic signals. Shown in Fig. 2.4 is a sample charge pump PLL,

consisting of a phase and frequency detector (PFD), a charge pump (CP), a low pass

filter (LPF) and a voltage-controlled oscillator (VCO). The phase difference between

the input x(t) and the output y(t) is sensed by the PFD and converted to a voltage

signal by the charge pump. The low pass filter removes the high frequency component

in the charge pump output and feeds the dc component to the control port of the VCO,

which changes the oscillation frequency of the VCO and corrects the frequency and phase error

between input x(t) and output y(t).

In the locked condition, all the signals in the loop have reached a steady state and

the PLL operates as follows. The phase detector and charge pump produce an output

whose dc value is proportional to the phase difference between the reference clock and

the feedback clock from the voltage-controlled oscillator (VCO). The low pass filter

(LPF) suppresses high-frequency components in the PD/CP output, allowing the dc

value to control the VCO frequency. The VCO then oscillates at a frequency equal

to the input frequency.

From the basic operation of a PLL, we find that in the locked condition, the input

and output frequencies are exactly equal, regardless of the magnitude of the loop gain.

Figure 2.4: A basic phase-locked loop, where PFD is the phase and frequency detector, CP is charge pump, LPF is low pass filter and VCO is voltage controlled oscillator.

This frequency synchronization function is an extremely important property because

frequency synthesizers are intolerant of even small differences between the input and

output frequencies.

Further quantitative analysis on the loop transfer function of the PLL also reveals

that for a charge pump PLL in locked state, the phase error between the input and

output clocks is also eliminated if there is no mismatch in the PFD and charge

pump circuits. This phase synchronization function of the charge pump PLL finds many

applications, including wireline communications receivers and deskew circuits for

clock distribution.

If we insert a frequency divider in the feedback path from the VCO clock output,

the frequency synchronization property forces the divided clock frequency to equal

the input reference frequency. This means a PLL can be used for frequency multiplication.

This frequency multiplication function is the basis of PLL-based frequency

synthesis. PLL-based frequency synthesis is a coherent frequency synthesis method,

because it generates one or several clock frequencies from a single reference source.

This single reference source is usually an external crystal oscillator. A typical PLL-

based integer-N frequency synthesizer is shown in Fig. 2.5, where a voltage-controlled

oscillator (VCO) is corrected periodically by the phase comparison results between

the crystal reference and the frequency divided output [15]. The frequency divider is

built with variable division ratio, controlled by the modulus selection signal. Such a

Figure 2.5: A typical PLL-based integer-N frequency synthesizer. With the divide-by-N frequency divider in the loop, the output clock frequency $f_{out}$ equals $N \times f_{in}$. The modulus selection changes the division ratio $N$. Thus the output frequency can be changed in steps of the reference frequency $f_{in}$.

topology produces $f_{out} = N f_{in}$, where $N$ varies in unity steps from $M_L$ to $M_H$. If

$N f_{in}$ is to be equal to $f_0 + k f_{ch}$, then for the first channel ($k = 0$), we have $M_L f_{in} = f_0$.

Furthermore, for the second channel, $(M_L + 1) f_{in} = f_0 + f_{ch}$, implying that $f_{ch} = f_{in}$.

Thus, $f_{out} = M_L f_{in} + k f_{in}$. The simplicity of the integer-N architecture has made it a

popular choice for wireless frequency synthesis and processor clock generation for many

years. However, because the input reference frequency must be equal to the channel

spacing in the integer-N frequency synthesizer architecture, it suffers from several drawbacks.

These drawbacks include reference spurs and loop bandwidth limitations. In

order to overcome these drawbacks, the fractional-N architecture was introduced

in recent years. Fractional-N PLLs allow the channel spacing to be much smaller than

the input reference frequency, thus removing the bandwidth limitation of integer-N PLLs.
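The integer-N constraint derived above (reference equal to the channel spacing, lowest modulus equal to $f_0 / f_{in}$) can be sketched as follows. The function name and the example plan (890 MHz lower edge, 200 kHz spacing) are hypothetical values chosen so that the lower edge is an exact multiple of the spacing; they are not taken from the text.

```python
def integer_n_plan(f0_hz, f_ch_hz, k):
    """Return (f_in, N, f_out) in integer hertz for channel k of an integer-N synthesizer."""
    if k < 0:
        raise ValueError("channel index k must be non-negative")
    f_in_hz = f_ch_hz                       # reference must equal the channel spacing
    m_low, remainder = divmod(f0_hz, f_in_hz)
    if remainder != 0:
        raise ValueError("f0 must be an integer multiple of the channel spacing")
    n = m_low + k                           # division ratio for channel k
    return f_in_hz, n, n * f_in_hz


# Hypothetical plan: 890 MHz lower band edge, 200 kHz channel spacing, channel 3.
f_in, n, f_out = integer_n_plan(f0_hz=890_000_000, f_ch_hz=200_000, k=3)
print(f"f_in = {f_in / 1e3:.0f} kHz, N = {n}, f_out = {f_out / 1e6:.1f} MHz")
```

The large division ratio that results from a small reference frequency is what ties the loop bandwidth and reference spur behavior to the channel spacing in this architecture.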

Phase-locked loops (PLLs) are widely used in clocking circuits because of their versatile

functions, as discussed above. However, the circuit realization of a PLL is not as simple

as it seems from block diagrams. Every building block in the PLL involves many design

considerations and challenges. The loop as a whole also requires careful design. For

the phase and frequency detector (PFD), the dead zone and the mismatch between

the UP and DOWN paths are the major design concerns. For the charge pump, the charge

sharing problem and the mismatch between the UP and DOWN current sources require a

lot of design effort to remove. Design issues of voltage-controlled oscillators (VCOs)

include phase noise, power consumption, and frequency tuning range. Frequency dividers

in the PLL feedback path usually consume a large portion of the total power

budget. Large division ratios are usually realized by cascading several dividers.

The first frequency divider is usually called the prescaler and draws most of the

attention. This is because the prescaler needs to handle the highest speed among all

the frequency dividers and, at the same time, tends to consume most of the power.

The loop design involves several important loop parameters, including

the loop bandwidth, the damping factor, the phase margin and the lock time. Because

of all these design challenges, the implementation of a PLL is usually time and

power consuming. For some critical applications, the use of a PLL is justifiable. For

other applications, a simpler circuit than a PLL is desired in order to reduce the

design effort and power consumption.

2.1.3 Injection-Locked Oscillators (ILOs)

Injection locking [18, 19] is a special type of forced oscillation in nonlinear dynamic

systems (also known as synchronization). Suppose a signal of frequency ωi is injected

into an oscillator (Fig. 2.6-a), which has a self-oscillation (free-running) frequency

ω0. When ωi is quite different from ω0, “beats” of the two frequencies are observed.

As ωi approaches ω0, the beat frequency (|ωi − ω0|) decreases. When ωi enters some

neighborhood very close to ω0, the beats suddenly disappear, and the oscillator starts

to oscillate at ωi. The frequency range in which injection locking happens is called

the locking range (Fig. 2.6-b). The locking range determines the operation bandwidth

of an ILO, and needs to be maximized. Generally speaking, it is proportional to the

injection strength. Injection locking also happens when $\omega_i$ is close to a harmonic

or subharmonic of $\omega_0$, i.e. $n\omega_0$ or $\frac{1}{n}\omega_0$. The former case can be used for frequency

Figure 2.6: (a) Beat and injection locking phenomenon when an oscillator is driven by a single-frequency input signal. (b) Locking range.

division, and the latter for frequency multiplication.
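The proportionality between locking range and injection strength can be illustrated with Adler's classical weak-injection approximation for fundamental locking of an LC oscillator, $\omega_L \approx \frac{\omega_0}{2Q}\frac{I_{inj}}{I_{osc}}$. This formula is not derived in this chapter and is brought in here as an assumption; the function names and the numbers (5 GHz, $Q = 10$, 10% injection ratio) are illustrative only.

```python
def adler_locking_range_hz(f0_hz, q_tank, injection_ratio):
    """One-sided locking range (Hz) from Adler's weak-injection approximation.

    injection_ratio is I_inj / I_osc and is assumed to be much smaller than 1.
    """
    if q_tank <= 0:
        raise ValueError("tank Q must be positive")
    if not 0 < injection_ratio < 1:
        raise ValueError("injection ratio must be between 0 and 1")
    return f0_hz / (2.0 * q_tank) * injection_ratio


def is_locked(f_inj_hz, f0_hz, q_tank, injection_ratio):
    """True if the injected frequency falls inside the estimated locking range."""
    return abs(f_inj_hz - f0_hz) <= adler_locking_range_hz(f0_hz, q_tank, injection_ratio)


# Assumed example: 5 GHz free-running frequency, tank Q of 10, 10% injection ratio.
lock_range = adler_locking_range_hz(5e9, 10, 0.1)
print(f"one-sided locking range: {lock_range / 1e6:.1f} MHz")
print("5.02 GHz locks:", is_locked(5.02e9, 5e9, 10, 0.1))
print("5.05 GHz locks:", is_locked(5.05e9, 5e9, 10, 0.1))
```

In this estimate, doubling the injection ratio doubles the locking range, while a higher tank Q narrows it.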

An injection-locked oscillator (ILO) can be considered as a simple first-order PLL

(Fig. 2.7-a), in which nonlinearity of the oscillator core functions as a phase detector.

For example, in a typical divide-by-two ILO (Fig. 2.7-b) [20], the oscillator core

(consisting of M1, M2 and Mtail) also serves as a single-balanced mixer for phase

detection. Because of the simple structure, ILOs consume much less power than a

full-fledged PLL, and can operate at frequencies as high as tens of gigahertz [21].

The fact that the built-in “phase detectors” are mixer-based explains why ILOs can

operate at the harmonic and subharmonic frequencies of the input signal. Harmonic

and subharmonic frequency injection locking make ILOs ideal for frequency division

and multiplication purposes.

Because of its compact structure and easy implementation, the divide-by-two ILFD

shown in Fig. 2.7-b has been the most widely reported ILFD [22, 23, 24]. Reference [22] first analyzed

the phase-limited and amplitude-limited locking range, and the noise transfer

characteristics of such a differential divide-by-two ILFD. The circuit implementation

achieved a 12.3% locking range at 3 GHz with a power consumption of 1.2 mW.

Figure 2.7: (a) A generic model of an injection-locked oscillator (ILO). (b) A divide-by-two ILO based on a common differential LC oscillator. The input signal is injected into the oscillator core through the tail transistor Mtail. This topology exhibits good injection locking efficiency because of the built-in single-balanced mixer structure.

A frequency synthesizer using such a divide-by-two ILFD as the PLL prescaler was

reported in [20]. Reference [23] proposed a low-power quadrature generation circuit that injects

a pair of differential signals into two identical injection-locked frequency dividers. A

hard-switching model was used for the analysis of the divide-by-two ILFD, which

gives a simple expression for the locking range and output amplitude. Reference [24] introduced

a unified model to analyze general injection-locked frequency dividers. The proposed

method uses a two-dimensional Taylor expansion to treat the nonlinearity of the ILFD

with respect to the injection and oscillation signals. Specifically for the divide-by-two

ILFD, a piecewise nonlinearity model was used for the cross-pair in the ILFD,

which is more accurate than the hard-switching model in [23]. Reference [25] introduced an

easy-to-understand graphical method to illustrate the operation of injection-locked oscillators.

However, this method is not valid for certain topologies with nonlinear mixing

between the injection and oscillation signals.

Ring oscillator based injection-locked oscillators are also reported in literature.

Compared with LC-based ILOs, ring-based ILOs usually generate stronger harmonic

components, burn more power and have inferior phase noise.

Reference [26] introduced two ILFDs based on ring oscillators for divide-by-three and

divide-by-five operations. Both ring-ILFDs use a single injection applied to a common

ground node of all the delay cells. Such a single injection scheme is not effective

for multi-stage ring oscillators because multiple nodes with different phases tend to

cancel the effect of the single phase injection signal. Because of this disadvantage, [27]

proposed a multiple-input scheme for injection-locked ring oscillators. This scheme

can achieve the best locking range and uniformity of output phase spacing between

different nodes. The disadvantage of this approach, however, is the need for multiple

inputs with equally spaced phase differences, which are generally hard to generate.

2.1.4 High Speed Frequency Dividers and Multipliers

Clock dividers, or frequency dividers are essential building blocks in frequency

synthesis, quadrature signal generation, MUX/DEMUX, and radar systems. As dis-

cussed in the phase-locked loop section, the frequency divider is used in the feedback

path so that the PLL can achieve frequency multiplication. The first frequency divider

after the VCO in the PLL is called the prescaler. It operates at the highest speed

and consumes most of the power. In the design of high-speed frequency dividers,

speed, power dissipation and phase noise are the most important specifications. As

the clock speeds of communications and computing systems move into the multi-gigahertz

range, frequency dividers in these systems must also run at such high speeds. Such

high-speed frequency dividers are under even more stringent power, speed and phase

noise trade-offs.

Currently, static or dynamic “digital” dividers [28, 29, 30] are most common in

RF systems because it is widely believed that they have simpler structure, larger

bandwidth, and better robustness over process variations than conventional analog

Figure 2.8: Digital divide-by-two circuit and the implementation of the latch.

dividers. Digital frequency dividers can be implemented by two latches connected in a

negative feedback loop, as shown in Fig. 2.8. The implementation of latches depends

on the available type of transistors, but a current-steering topology consisting of

a differential pair and a regenerative pair achieves high speed in both bipolar and

CMOS technologies, as shown in the expanded view in Fig. 2.8. Such a latch is a

static latch. High-speed divide-by-two digital divider circuits can also incorporate

dynamic latches. Fig. 2.9 shows two examples of such dynamic dividers. In Fig. 2.9

(a), the first two CMOS inverters operate as dynamic latches controlled by $CK$ and

$\overline{CK}$, and the third inverter provides the overall inversion required in the negative

feedback loop. In Fig. 2.9 (b), the divider is based on true single-phase clock (TSPC)

scheme, achieving a high speed. The drawback of both these dynamic dividers is the

lack of precise complementary or quadrature outputs. For both static and dynamic

digital dividers, as the operation frequencies increase, the trade-off between the speed

and power dissipation becomes more critical, especially in mobile applications. Due

to large power dissipation, high speed digital dividers can also introduce considerable

noise degradation.
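The two-latch negative-feedback divider described above can be modelled behaviorally. The sketch below is an idealized logic-level model with no delays or analog effects; which latch is transparent on which clock phase is an assumption, and the function names are hypothetical.

```python
def divide_by_two(clock_samples):
    """Idealized master-slave latch pair with inverting feedback.

    The master latch is transparent while the clock is high and samples the
    inverted slave output; the slave latch is transparent while the clock is low.
    """
    master = 0
    slave = 0
    output = []
    for ck in clock_samples:
        if ck:
            master = 1 - slave   # negative feedback: master input is NOT(slave)
        else:
            slave = master       # slave passes the master state on to the output
        output.append(slave)
    return output


def count_rising_edges(samples):
    """Count 0 -> 1 transitions in a list of logic samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev == 0 and cur == 1)


clock = [0, 1] * 8                  # eight clock cycles, one sample per phase
divided = divide_by_two(clock)
print("clock edges:  ", count_rising_edges(clock))
print("divided edges:", count_rising_edges(divided))
```

The output changes state once per input clock cycle, so it completes one cycle for every two input cycles.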

Another type of frequency divider is the regenerative frequency divider [31, 32].

Shown in Fig. 2.10 (a) is a high-speed divide-by-two method originally proposed by

Figure 2.9: Dynamic CMOS dividers using (a) inverters (b) true single-phase clock (TSPC) logic.

Figure 2.10: Analog frequency dividers: (a) Miller frequency divider and (b) parametric frequency divider.

Miller [31], so it is also called the Miller divider. Employing a mixer and a low-pass filter

in a feedback loop, the circuit operates as follows. Upon multiplication of the input

and output signals, the mixer generates components at $f_{in} + f_{out}$ and $f_{in} - f_{out}$. If

the former is suppressed by the LPF but the latter is not, then $f_{in} - f_{out} = f_{out}$, and

hence $f_{out} = f_{in}/2$. Miller dividers can operate at speeds exceeding half of the

$f_T$ of their constituent devices. Their drawbacks, however, are substantial phase noise,

design complexity and power consumption.
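The mixing step can be written out explicitly. Assuming an ideal multiplier and unit-amplitude sinusoids (an idealization that ignores amplitudes and the phase shift of the filter), the product-to-sum identity gives the two components named above, and the self-consistency condition on the surviving term gives the division ratio:

```latex
\begin{align}
\cos(2\pi f_{in} t)\,\cos(2\pi f_{out} t)
  &= \tfrac{1}{2}\cos\!\big(2\pi (f_{in} - f_{out})\, t\big)
   + \tfrac{1}{2}\cos\!\big(2\pi (f_{in} + f_{out})\, t\big), \\
f_{in} - f_{out} = f_{out}
  &\;\Longrightarrow\; f_{out} = \frac{f_{in}}{2}.
\end{align}
```

The second line holds only if the low-pass filter removes the sum term at $f_{in} + f_{out} = \tfrac{3}{2} f_{in}$ while passing the difference term at $\tfrac{1}{2} f_{in}$.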

Parametric frequency dividers [33, 34, 35] are another type of analog frequency

divider, as shown in Fig. 2.10 (b). The frequency division principle of a parametric

frequency divider relies on exciting a varactor at frequency $f_{in}$ and realizing a negative

resistance that sustains a loop gain of unity at $f_{in}/2$. High quality factor varactors

and inductors are key elements in parametric frequency dividers. Since in CMOS

processes, high quality factor passive devices are not available, parametric frequency

dividers are not suitable for CMOS integration.

An injection-locked oscillator (ILO) that is injection-locked at a harmonic of its oscillation frequency can be used

as a frequency divider, namely, an injection-locked frequency divider (ILFD) [20].

According to the type of oscillator used to build the frequency divider, ILFDs

can be categorized into different topologies. Fig. 2.11 shows several common ILFD

topologies seen in the literature. Fig. 2.11a shows an ILFD implemented with a single-ended

ring oscillator with the superharmonic injection current injected into the common

source of the CMOS inverters [36]. An ILFD can also be built from a ring oscillator with

differential ring stages. Generally speaking, ring-oscillator-based ILFDs have a large

locking range and can support both even-harmonic and odd-harmonic injection locking

with a proper injection topology. However, as ring oscillators do not provide the filtering

of a resonant oscillator, they tend to have large unwanted harmonic components,

particularly at the injected signal frequency. Their phase noise performance is also

inferior to that of resonant-based ILOs.

A resonant-based ILFD has inherent advantages in both speed and power dissipation.

Such an ILFD is fundamentally a resonant oscillator at the subharmonic

frequency of the input signal, which effectively lowers the speed requirement for the

process technology by n-fold. Because it is a resonant circuit, only a fraction of the stored

energy is dissipated in every cycle, which is determined by the quality factor $Q$ of the

resonator. This means that a resonant-based ILFD can have lower power consumption

than both a digital divider and a ring-oscillator-based ILFD. At the same

Figure 2.11: Injection-locked frequency divider (ILFD) implementations: (a) ring oscillator ILFD, (b) Colpitts LC oscillator ILFD, (c) direct injection into the tank of an LC differential oscillator, (d) injection through the tail of the LC differential oscillator.

time, a resonant-based ILFD also has the advantages of a simpler circuit structure than

regenerative frequency dividers and better tolerance for low-Q devices than

parametric frequency dividers. Because of these advantages, resonant-oscillator-based

ILFDs are gaining popularity in recent high-frequency CMOS designs [20, 37, 38].
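The link between the quality factor and the energy lost per cycle can be made explicit with the standard definition $Q = 2\pi \times (\text{energy stored}) / (\text{energy dissipated per cycle})$. The sketch below simply rearranges that definition; the function name and the Q values are illustrative assumptions.

```python
import math


def energy_fraction_lost_per_cycle(q_factor):
    """Energy dissipated per cycle relative to the stored energy: 2 * pi / Q."""
    if q_factor <= 0:
        raise ValueError("Q must be positive")
    return 2.0 * math.pi / q_factor


# Assumed example tank quality factors.
for q in (10, 20):
    fraction = energy_fraction_lost_per_cycle(q)
    print(f"Q = {q}: {fraction:.3f} of the stored energy is lost per cycle")
```

Only the lost energy has to be replenished by the active devices each cycle, which is why a higher-Q tank lowers the power needed to sustain oscillation.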

Resonant-based ILFDs, or LC-oscillator-based ILFDs, also come in single-ended

and differential versions. An example of a single-ended LC ILFD is shown in Fig. 2.11b,

which is based on a Colpitts oscillator with the superharmonic injection applied to

the gate of the active transistor. This superharmonic injection mixes with the fundamental

oscillation through the nonlinearity of the transistor. Single-ended LC ILFDs do

not provide a differential output, and are susceptible to common-mode noise. This is why

differential LC ILFDs, built with differential LC oscillators, as shown in Fig. 2.11c

and Fig. 2.11d, are favored over their single-ended counterparts [20, 23]. Depending on

where the injection signal is applied, LC differential ILFDs can be further divided into

the direct injection topology and the injection-through-the-tail topology. The direct injection

topology, as shown in Fig. 2.11c, has an injection device connected across the LC

tank, and the superharmonic signal is directly injected into the tank through the injection

device. The injection device is usually a single transistor with its source/drain terminals

connected to the two differential oscillation nodes. The superharmonic injection applied

to the gate of this transistor mixes with the fundamental oscillation and generates

an injection current at the fundamental frequency, which locks the output frequency of the

oscillator. The other injection topology, which injects the superharmonic through the

tail transistor, as shown in Fig. 2.11d, has no extra devices compared with a standard

LC differential oscillator. The superharmonic injection voltage applied to the

gate of the tail transistor is converted to a current by the tail transconductance and

fed into the common source of the cross pair. This injection current is steered by

the fundamental oscillation voltage as in a single balanced mixer. Thus the mixing

between the injection and the oscillation is also similar to the situation in the single

balanced mixer, and the generated fundamental current locks the output frequency of

the oscillator. The difference between the direct injection topology and the injection-through-the-tail

topology lies in their mixing mechanisms.

The LC differential ILFD topologies discussed above are all for even-harmonic

locking, i.e., divide-by-two or divide-by-four. Odd division ratios are

not supported by these topologies. In this dissertation, we propose a new LC differential

ILFD topology for divide-by-3 harmonic locking [39], which is the first

divide-by-odd-number ILFD based on a fully differential LC oscillator. The key difference

between the proposed ILFD topology and conventional LC differential ILFDs is

the injection method. From the previous discussion, we find that all the injections

in conventional topologies are applied in a common-mode fashion. In our proposed

topology, however, the superharmonic input is injected differentially into the oscillator. More

details of the new divide-by-odd-number ILFD will be discussed in Chapter 3.

Frequency multiplication is an important function in frequency synthesis, clock

distribution, and a wide range of RF and microwave applications. The combination

of a VCO with a frequency multiplier can enable the VCO to work at a lower frequency

[40, 41, 42], where it can be designed with better spectral purity. This arrangement can

also avoid the VCO pulling problem and allow a lower frequency synthesizer division

ratio, as shown in Fig. 2.12. In a clock distribution system [43], using a frequency

multiplier as the local clock generator can enable the global clock to run at a lower

speed, which can both lower the power consumption and reduce the design difficulty

of the distribution network. A variable-modulus frequency multiplier can be a low-cost

solution for a multi-band system to switch the frequency between different bands. For

example, the 2.3 GHz and 3.5 GHz bands of WiMAX are roughly twice and

three times a fundamental frequency of 1.15 GHz, making it possible to generate

the frequencies of the two bands with a 1.15 GHz VCO and a dual-modulus frequency

doubler/tripler.
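The WiMAX example can be checked with a short calculation. The function name is hypothetical, the band labels are the nominal figures quoted in the text, and frequencies are kept in integer hertz.

```python
def multiplier_outputs_hz(f_vco_hz, ratios=(2, 3)):
    """Output frequency for each multiplication ratio of a dual-modulus multiplier."""
    return {n: n * f_vco_hz for n in ratios}


F_VCO_HZ = 1_150_000_000                                  # VCO frequency from the text
NOMINAL_BANDS_HZ = {2: 2_300_000_000, 3: 3_500_000_000}   # band labels from the text

outputs = multiplier_outputs_hz(F_VCO_HZ)
for n, f_out in outputs.items():
    offset_mhz = (f_out - NOMINAL_BANDS_HZ[n]) / 1e6
    print(f"x{n}: {f_out / 1e9:.2f} GHz ({offset_mhz:+.0f} MHz from the nominal band label)")
```

The tripled output lands slightly below the 3.5 GHz label, which matches the "roughly" in the text; covering the whole band would rely on the VCO tuning range.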

Figure 2.12: Frequency synthesis with LO and frequency multiplier in an RF transmitter.

In frequency multiplier design, the main performance metrics are phase noise, output

power, suppression of undesired harmonics, and power consumption. Phase noise

is of great concern as it directly influences the timing accuracy. An ideal frequency

multiplier will introduce no additional phase noise and the output will only show a

phase noise degradation of $20\log_{10}(N)$ dB compared with the input because of the frequency

multiplication, where $N$ is the multiplication ratio. For a practical frequency

multiplier, the design goal is to minimize the additional phase noise introduced by the

multiplier itself. Output power is also important, as the frequency multiplier needs

to drive its load at a certain power level. In a frequency synthesizer, the

load is the mixer of the RF transceiver, and in a clocking network, the load is

the local clock distribution. Undesired harmonics at the output of the frequency

multiplier will generate interference in other bands through mixing and modulation,

and thus need to be suppressed as well. Power consumption directly relates to the

battery life of a battery-powered device and needs to be minimized.
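The ideal $20\log_{10}(N)$ phase noise penalty mentioned above can be tabulated for the doubler and tripler cases. The function name and the input phase noise level of -120 dBc/Hz are assumptions chosen only for illustration.

```python
import math


def ideal_multiplied_phase_noise_dbc_hz(input_dbc_hz, ratio):
    """Output phase noise of an ideal (noiseless) multiply-by-N stage."""
    if ratio < 1:
        raise ValueError("multiplication ratio must be at least 1")
    return input_dbc_hz + 20.0 * math.log10(ratio)


INPUT_PHASE_NOISE = -120.0   # assumed input phase noise in dBc/Hz at some offset

for n in (2, 3):
    out = ideal_multiplied_phase_noise_dbc_hz(INPUT_PHASE_NOISE, n)
    print(f"x{n}: {out:.2f} dBc/Hz ({out - INPUT_PHASE_NOISE:.2f} dB worse than the input)")
```

Any phase noise measured beyond these values would be the multiplier's own contribution, which is the quantity a practical design tries to minimize.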

A common implementation of a frequency multiplier is to generate harmonics of an

Figure 2.13: Frequency multipliers by (a) harmonic generation and filtering, (b) regenerative harmonic doubling and (c) injection-locking.

input signal using a nonlinear device, and then select the desired harmonic com-

ponent with a filter network, as shown in Fig. 2.13(a). The nonlinear device can be

a diode or a transistor biased at a small conduction angle [44, 45, 46, 47], and the

filter network is usually built by passive LC circuits. Such a harmonic generation and

filtering approach is effective if the filter network has a large quality factor, which

is available for high-resistivity-substrate processes like GaAs and SiGe. However, as

cost becomes the main driver for a system-on-a-chip (SoC) solution for wireless sys-

tems, RF circuits are migrating to digital CMOS technologies for single-chip radio

solutions. This creates a challenge for the conventional frequency multiplier design

as a high-Q filter is very difficult to construct using the lossy on-chip inductors in

a digital CMOS process. Therefore, integrated CMOS frequency multipliers usually

have poor harmonic suppression [46].

A phase-locked loop with a frequency divider in the feedback path can also work as a

frequency multiplier. PLL-based frequency multiplication, however, suffers from

drawbacks such as circuit complexity and power consumption.

The common-mode node of a differential oscillator offers another way to extract

the doubled frequency component, as shown in Fig. 2.13(b). Such a frequency dou-

bling approach is called push-pull frequency doubling in [38, 48], or regenerative

frequency doubling in [49, 50]. During operation, each transistor of the differential

pair conducts a current pulse for a fraction of the period, and together they generate

the second-order harmonic current, which multiplies with the impedance of the

common-source node at the second harmonic frequency to generate the desired output

voltage. Higher-order harmonic currents are also generated; however, the capacitive

nature of the common-source node impedance filters out most of the harmonic power

at higher orders. If we consider the cross-coupled pair as a generalized harmonic

generator and the RC network at the common-source node as the filter network, the

regenerative frequency doubler falls into the category of the harmonic generation and

filtering approach. The regenerative frequency doubler has a fundamental limitation:

the voltage swing at the common-source node is limited. Further amplification and

buffering are needed before the frequency doubler can drive any meaningful load. In

[38, 48], the doubled frequency is extracted directly from the VCOs, which are sup-

posed to be working inside the frequency synthesis PLL. In [49, 50], the doubled

frequency is extracted from oscillators that are injection locked by an external input.

In the former case, the phase noise performance of the doubled frequency is deter-

mined by the frequency synthesis PLL, while in the latter case, the phase noise is

determined by the injection source at low offset frequencies.

Apart from being utilized in regenerative frequency doublers as in [49, 50], injec-

tion locking can also be utilized as a direct frequency multiplication technique and is

capable of introducing valuable extra harmonic suppression. This is because an oscil-

lator can be locked to an input which is much smaller in amplitude. This in effect

is a large amplification for the desired harmonic. At the same time, other

harmonics see no such amplification. A conventional frequency mul-

tiplier followed by an injection-locked oscillator is thus a promising candidate for

frequency multipliers with high harmonic suppression in lossy digital CMOS processes. In such

an injection-locked frequency multiplier, the harmonic suppression can be expressed

as

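$$HS_{m,n,\mathrm{inj}} = \left|\frac{I_{m\omega_0}}{I_{n\omega_0}}\right| \left|\frac{Z_{\mathrm{osc}}(m\omega_0)}{Z_{\mathrm{osc}}(n\omega_0)}\right| \frac{1}{\eta} \tag{2.3}$$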

where η is the injection ratio, which is defined as the amplitude ratio between the

injection signal and the output. The filter network this time is the oscillator itself,

so the subscript osc is added for distinction. The extra term of 1/η in Eqn. 2.3 can

introduce a substantial increase in the harmonic suppression, as the injection ratio can

have a value much smaller than one.
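To make the role of the 1/η term concrete, the Python sketch below evaluates Eqn. 2.3 in decibels for a hypothetical tripler. It assumes a parallel RLC tank as the model for Z_osc, and all component values, the harmonic current ratio, and the function names are illustrative assumptions rather than values from the referenced designs.

```python
import math


def tank_impedance(freq_hz: float, r_p: float, l_h: float, c_f: float) -> complex:
    """Impedance of a parallel RLC tank, used here as an assumed model for Z_osc."""
    w = 2.0 * math.pi * freq_hz
    admittance = complex(1.0 / r_p, w * c_f - 1.0 / (w * l_h))
    return 1.0 / admittance


def harmonic_suppression_db(i_m, i_n, f0_hz, m, n, r_p, l_h, c_f, eta):
    """Eqn. 2.3 expressed in dB: desired harmonic m against undesired harmonic n."""
    current_ratio = abs(i_m) / abs(i_n)
    z_ratio = abs(tank_impedance(m * f0_hz, r_p, l_h, c_f)) / abs(
        tank_impedance(n * f0_hz, r_p, l_h, c_f)
    )
    return 20.0 * math.log10(current_ratio * z_ratio / eta)


# Hypothetical tripler: 1.15 GHz input, tank tuned to the third harmonic.
F0 = 1.15e9
L_TANK = 1e-9                                            # 1 nH (assumed)
C_TANK = 1.0 / ((2.0 * math.pi * 3 * F0) ** 2 * L_TANK)  # resonance at 3 * F0
R_P = 200.0                                              # ohms (assumed)

hs_filter_only = harmonic_suppression_db(0.3, 1.0, F0, 3, 1, R_P, L_TANK, C_TANK, eta=1.0)
hs_locked = harmonic_suppression_db(0.3, 1.0, F0, 3, 1, R_P, L_TANK, C_TANK, eta=0.1)
print(f"filtering only: {hs_filter_only:.1f} dB, with eta = 0.1: {hs_locked:.1f} dB")
```

Whatever tank values are chosen, lowering η from 1 to 0.1 adds 20 dB to the result, which is the extra suppression contributed by injection locking in this expression.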

In [51, 52], this injection-locked frequency multiplication idea is applied to

frequency triplers, as shown in Fig. 2.13(c). In these implementations, an

input differential pair works as the harmonic generator, which generates a strong third-

order harmonic current through the current-steering effect of the differential pair. This

third-harmonic-rich current then injection locks the oscillator core, whose free-running

frequency is near the third-order harmonic of the injection. One problem with this

topology is that the fundamental current generated by the input differential pair is even

stronger than the desired third harmonic. The only mechanism for suppressing this

undesired fundamental is the band-pass filtering of the LC tank. The harmonic suppression

performance is not reported in [52], but given the low quality factor of

the oscillator tank reported in [52], the suppression of this undesired fundamental

component would not be satisfactory.

In this dissertation, we propose a new injection-locked frequency multiplier topol-

ogy to address this problem with two-stage filtering in a compact structure. It also

supports dual-modulus operation in both doubler and tripler modes. Fabricated in

a 0.18 µm digital CMOS process with a lossy substrate, this new topology achieves very

good suppression of undesired harmonics in both doubler and tripler modes [53].

Details of the new injection-locked frequency multiplier topology will be discussed in

Chapter 3.

2.2 Clock Distribution

Clock distribution is increasingly one of the most challenging tasks in micro-

processors and other high-speed VLSI circuits. The 2007 ITRS roadmap projects that the

on-chip clock speed will continue to rise to near 9 GHz in 2015 [54]. Even though

the device feature size will shrink, the chip size will remain constant (about 16.7 mm

from edge to edge [54]) as more functions are added. If current clocking schemes

continue to be used, it is expected that skew and jitter will consume an increasingly

large portion of each clock cycle, and hence the time available for the critical path will

eventually be less than the technology-allowed minimum delay beyond the 32 nm

node in 2013. This will largely defeat the purpose of any further clock speed in-

crease. In the meantime, the power consumption in clock distribution networks has

also become a serious problem. Currently, about 40% of the total power consumption

of a high-performance microprocessor is used by the clocking circuitry [55]. As both

clock speed and transistor count increase, the projected power consumption of a high-

performance microprocessor will exceed the power density limit set by packaging [54].

Therefore, we need a new clocking solution that can achieve better skew and jitter

performance while consuming less power.

2.2.1 Conventional Monolithic Clock Distributions

Fig. 1.6 in Chapter 1 shows a conventional clock distribution scheme [14]. The

global clock is generated by an on-chip phase-locked loop (PLL) from an off-chip

reference clock, usually a crystal oscillator at tens of MHz. The global clock is dis-

tributed using a global clock distribution network, typically in an H-tree topology,

which consists of interconnect transmission lines and clock buffers, and then further

distributed by local clock distribution networks. Local clock distribution networks

can be another level of H-tree, as shown in the upper right corner in Fig. 1.6, or a

metal grid, as shown in the upper left corner. Both H-tree and metal grid are metic-

ulously designed to balance the clock arrival time at different positions in the clock

distribution network.

Conventional distribution schemes are more or less monolithic in that a sin-

gle clock source is fed through hierarchies of clock buffers to eventually drive almost

the entire chip. This raises a number of challenges. First, due to irregular logic,

the load of the clock network is non-uniform, and the increasing process and device

variations in deep sub-micron semiconductor technologies further add to the spatial

timing uncertainties known as clock skews. Second, the load of the entire chip is

substantial, and sending a high quality clock signal to every corner of the chip nec-

essarily requires driving the clock distribution network “hard”, usually in full swing

of the power supply voltage. Not only does this mean high power expenditure, but it

also requires a chain of clock buffers to deliver the ultimate driving capability. These

active elements are subject to power supply noise, and add delay uncertainty (jitter),

which also eats into the usable clock cycle. Jitter and skew combined represent about

18% of cycle time currently [56], and that results in indirect energy waste as well.

For a fixed cycle time budget, any increase in jitter and skew reduces the time left for

the logic. To compensate and make the circuitry faster, the supply voltage is raised,

therefore increasing energy consumption. Conversely, any improvement in jitter and

skew generates timing slack that can be used to allow the logic circuit to operate

more energy-efficiently.

In order to minimize the global clock skew, the global clock-distribution network

has to be balanced by meticulous design of the transmission lines and buffers. This

practice puts a very demanding constraint on the physical design of the chip. Even so,

the ever-increasing process variations with each technology generation still result in

greater challenges in maintaining a small skew budget. Another current practice is to

use a grid instead of a tree for clock distribution, as shown in the upper-left local clock

region in Fig. 1.6. A grid has a lower resistance than a tree between two end nodes,

and hence can reduce the skew. At the same time, a grid usually has a much larger

parasitic capacitance (larger metal area) than an equivalent tree, and therefore takes

more power to drive. Passive and active deskew methods [57, 58, 59, 60] have also

been employed to compensate for skew after chip fabrication. Evidently, this approach

increases the chip complexity, manufacturing cost, and in the case of active deskew,

power consumption and jitter.

Jitter poses an even larger threat to microprocessor performance and power con-

sumption. The global-clock PLL and clock-distribution network generate noise, and

hence contribute to global clock jitter. But the main culprit is usually the noise cou-

pled from other circuits, such as power supply noise, substrate noise, and crosstalk.

Short-term jitter (cycle-to-cycle jitter) can only be accounted for by adding timing

margin to the clock cycle, and hence degrades performance. Unlike skew, jitter is

very difficult to compensate due to its random nature. In order to reduce jitter, the

interconnect wires in the global clock distribution network need to be well shielded

from other noise sources, usually by sandwiching them between Vdd/ground wires

and layers. Shielding inevitably increases the parasitic capacitance of the clocking

network, which means more and larger clock buffers, and hence larger power dissi-

pation to drive them. In turn, having more buffer stages introduces another source

of jitter, and the situation deteriorates quickly with faster clock speeds. It is evident

that current skew and jitter reduction techniques almost always result in higher power

consumption. A better clocking scheme with less jitter and skew directly translates

into power savings for a given performance target.

2.2.2 Emerging Gigahertz Clock Distribution Schemes

There have been intensive research efforts in recent years to address the chal-

lenges in high-speed clocking from different disciplines, including clockless design

(asynchronous circuits), optical interconnect, and resonant clocking, to name a few.

Each of these alternative solutions has its own technological issues to be addressed.

Optical interconnect potentially offers smaller delays and lower power consump-

tion than electrical ones, and is promising for the global clock distribution network

[61, 62, 56]. However, there are still great challenges in its silicon implementation,

particularly for on-chip electrical-optical modulators [63]. Wireless clock distribution

proposed in [64, 65] suffers from substantial overhead in chip area and power consumption

due to on-chip clock transceivers.

Among the proposed electrical solutions, a family of synchronized clocking tech-

niques, such as distributed PLLs [66, 67], synchronous distributed oscillators [68, 69],

rotary clocking [70], coupled standing-wave oscillators [71], and resonant clocking [72]

have recently been proposed to improve the performance of global clock distribution.

In [72, 73], on-chip inductors are added to all the local nodes of the global clock

distribution tree, which turns it into a single large resonator. Resonance improves

power efficiency. Therefore, this technique reduces dc power dissipation and lowers

jitter in the global clock distribution network. It is a good step in the right direction.

However, it does not provide deskew capabilities as injection-locked clocking does. The

more stringent layout constraints due to on-chip inductors could even aggravate the

problem of skew.

In [66, 67], an array of PLLs is constructed using a voltage-controlled oscillator

(VCO) and loop filter at each node, and a phase detector between adjacent nodes.

Each PLL generates the local clock in the particular clock domain, which is synchro-

nized with others through the aforementioned phase detectors at the clock domain

boundaries. The global clock used in conventional clocking is removed in this scheme, and

hence it promises lower jitter. The drawbacks are that a) the global skew is still a

problem since deskewing only happens locally, and b) the sensitive analog circuits in

a PLL (phase detectors, loop filters, ring oscillators) are vulnerable to noise in the

hostile environment of digital circuits.

In [68, 69, 70, 71], an array of oscillators is connected to the global clock dis-

tribution network, and the oscillators are thus synchronized by coupling. The resulting

oscillator array becomes a distributed oscillator. The difference is that in [70] the oscillator

array is a one-dimensional loop, and the phases of the oscillators change linearly along

the array, similar to a distributed VCO [74], which was based on traveling-wave

amplification [75]. In [71], the oscillator array generates a standing-wave pattern on

the network, i.e., each oscillator has the same phase. Essentially, all these techniques

use a distributed oscillator with interconnects as its resonator. A distributed os-

cillator suffers from phase uncertainty due to mode locking [66, 67, 69].

This is evident in that similar topologies can be used for either traveling-wave [70]

or standing-wave oscillation [71]. Another problem is that jitter tends to be worse

than in conventional clocking since the global clock is now generated on chip using lossy

passive components, without the clean reference clock from the off-chip crystal oscil-

lator. It is noteworthy that [73] unintentionally adds injection locking to distributed

oscillator clocking and demonstrates good jitter performance.

Overall, all these promising technologies face significant technical difficulties and

require dramatic changes in process technologies, design methodologies, or testing

methods, and hence will face significant resistance in adoption.

Figure 2.14: Injection-locked clocking with active deskew based on the ILO delay tuning.

In this dissertation, we propose a new clocking scheme, injection-locked clocking (ILC), as shown in Figure 2.14.

Similar to conventional clocking, the global clock is generated by an on-chip PLL, and

distributed by a global tree. The difference is that we use injection-locked oscillators

(ILOs) to regenerate local clocks, which are synchronized to the global clock through

injection locking. Another difference is that most global clock buffers in conventional

clocking are removed because the sensitivity of ILOs is much greater than that of digital

buffers. Essentially, we use ILOs as local clock receivers, similar to the idea of clock

recovery in communication systems. Note that this is different from resonant clocking

[73], where all the oscillators are coupled together. By utilizing their phase tunability,

ILOs in injection-locked clocking also serve as deskew circuits. They work with phase

detectors (PDs) and deskew logic blocks (DSKs) to form deskew loops, which reduce the

skew between different local clock distributions. Further, ILOs can be constructed

as frequency multipliers [76] or dividers [20, 39], and hence this scheme enables local

clock domains to have a higher (n × f0) or lower (f0/m) clock speed than the global

clock (f0). Such a global-local clocking scheme with multiple-speed local clocks offers

significant improvements over the conventional single-speed clocking scheme in terms of

power consumption, skew, and jitter. More details on ILC will be presented in Chapter

4.

2.3 Contributions of This Dissertation

Injection-locked oscillators have several useful properties which make them ideal

for high speed clock applications, including clock generation and clock distribution.

Firstly, because an ILO can be locked by its super-harmonics and sub-harmonics,

it is very convenient to build frequency dividers and multipliers based on an ILO.

Because of their resonant nature, such injection-locked frequency dividers (ILFDs) and

injection-locked frequency multipliers (ILFMs) can generally work at higher speeds

than conventional frequency dividers and multipliers in the same technology. Their

power consumption is also lower because the resonator recycles energy.

The trade-off is usually a smaller bandwidth. ILFDs and ILFMs are therefore naturally

suited to high-speed, low-power, narrow-band clocking applications.

Secondly, an ILO introduces phase shift between the injection signal and its out-

put. This phase shift is determined by the frequency offset between the injection

signal and the resonant frequency of its tank. So if we can control the resonant fre-

quency of the tank, we can tune the phase shift introduced by the ILO. This phase

tunability of ILOs makes them suitable for high-speed clocking applications where ac-

curate phase relationship control is required. Examples of such applications include

quadrature generation for wireless transceivers, multiple phase generation for phased-

array systems and active deskew for clock distributions.
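The dependence of the phase on tank tuning can be sketched numerically. The Python example below evaluates only the phase of a parallel RLC tank at a fixed injection frequency as the tank resonance is moved; under the loop model, this is the phase the tank has to contribute once the oscillator is locked. How it maps to the output phase depends on the particular circuit, and the component values and function names here are assumptions chosen for illustration.

```python
import cmath
import math


def tank_phase_deg(f_inj_hz: float, r_p: float, l_h: float, c_f: float) -> float:
    """Phase (degrees) of a parallel RLC tank impedance at the injection frequency."""
    w = 2.0 * math.pi * f_inj_hz
    admittance = complex(1.0 / r_p, w * c_f - 1.0 / (w * l_h))
    return math.degrees(cmath.phase(1.0 / admittance))


def capacitance_for_resonance(f_res_hz: float, l_h: float) -> float:
    """Tank capacitance that places the resonance at f_res_hz."""
    return 1.0 / ((2.0 * math.pi * f_res_hz) ** 2 * l_h)


F_INJ = 5.0e9    # injection frequency in Hz (assumed)
L_TANK = 1.0e-9  # tank inductance in H (assumed)
R_P = 300.0      # parallel tank resistance in ohms (assumed)

for f_res in (4.8e9, 5.0e9, 5.2e9):
    c_tank = capacitance_for_resonance(f_res, L_TANK)
    phase = tank_phase_deg(F_INJ, R_P, L_TANK, c_tank)
    print(f"f_res = {f_res / 1e9:.1f} GHz -> tank phase = {phase:+.1f} deg")
```

In this sketch the tank phase is zero when the resonance coincides with the injection frequency and changes sign as the resonance is tuned to either side, which is the mechanism that a varactor-tuned ILO can exploit for phase control.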

The third useful property of an ILO is its capability to be locked by an injection

signal much smaller than its output signal strength. This effectively makes an ILO a

high-gain amplifier. The high-gain nature of an ILO enables it to work as a local clock

regenerator in a clock distribution network where the requirement for its input clock

strength is much lower than in conventional buffer-chain-based clock distributions.

This high-gain property of an ILO can also be used together with its narrow-band

nature to function as a high-Q bandpass filter. Such a high-Q bandpass filter can be

used in frequency multipliers to suppress undesired harmonics where other bandpass

filter structures are not effective.

This dissertation presents our studies of gigahertz, high performance, low power

clock generation and distribution using injection-locked oscillators. For gigahertz

clock generation, we introduced a phase tuning scheme for injection-locked frequency

divider based dual-phase signal generators. The phase tuning capability in this

scheme comes from the tunable phase transfer characteristics of injection-locked

frequency dividers. Implemented with a frequency-tunable double-balanced divide-

by-two injection-locked frequency divider, the dual-phase signal generator prototype

achieves a 100° differential phase tuning range around quadrature with a generated signal

frequency of 5 GHz.

For gigahertz frequency division, we introduced a divide-by-odd-number injection-

locked frequency divider to address the division ratio limitation of conventional injection-

locked frequency dividers. With differential injection and harmonic filtering, this new

ILFD topology maintains the fully differential nature of the output signal while

achieving effective mixing between the injected odd harmonics and the out-

put oscillation. A 5% locking range without frequency tuning is achieved by the circuit

prototype of this topology working at an input frequency of 16-18 GHz.

For gigahertz frequency multiplication, we introduced an injection-locked oscillator

to work as a high-gain, high-Q harmonic filter for conventional harmonic-generation-

and-filtering frequency multipliers. This new approach achieves significantly better un-

desired harmonic suppression for frequency multipliers built with lossy digital CMOS

processes. Frequency tunability of injection-locked oscillators also enables multi-mode

operations for such injection-locked frequency multipliers. The circuit prototype of

such a frequency multiplier achieves dual-mode multiply-by-2 and multiply-by-3 operation, with

undesired harmonic suppression better than 30 dB in both modes.

For gigahertz clock distribution, we proposed injection-locked clocking using injection-

locked oscillators as the local clock regenerators. Because of an ILO’s capability to be

locked by a small input signal, this new approach removes a large number of clock

buffers in global clock distribution. This not only reduces the power consumption, but

also reduces the skew and jitter which come from these clock buffers. The phase tun-

ability of ILOs can also be utilized to achieve the deskew function between different

clock domains. Three ILC circuit prototypes working at several gigahertz demon-

strated better power and jitter performance and the built-in deskew capability of

ILC.

Chapter 3

Injection-Locked Oscillators for

Clock Generation

3.1 Analysis of Injection-Locked Oscillators

An injection-locked oscillator can be analyzed with a loop model similar to that of the free-

running oscillator discussed in Chapter 2. As shown in Fig. 3.1-a, the loop model for

an ILO is also composed of a resonant tank H(jω) and the active circuit f(Vi, Vo).

Unlike the free-running oscillator case shown in Fig. 2.3-b, the active circuit

of an ILO loop model now has two inputs instead of one. The extra input Vi, which

is the injection signal, introduces a phase shift between Vo and the output current of the

active circuit, f(Vi, Vo) (Fig. 3.1-b). According to Barkhausen’s oscillation

criterion, this phase shift needs to be compensated by the resonant tank, as shown in

Fig. 3.1-c. Because of this phase shift, the oscillation frequency shifts away from the

resonant frequency of the tank. Instead, the oscillator runs at exactly the same frequency as the

injection signal if the injection frequency is within a particular frequency range.

This loop model qualitatively demonstrates how an ILO works. However, the

illustration of the phase shift introduced by Vi in Fig. 3.1-b is based on a linear model

Figure 3.1: Loop analysis of injection-locked oscillators. (a) Loop model for an ILO, where the LC tank is represented by H(jω), and the active circuit is represented by f(Vi, Vo); (b) phasor representation of the phase shift introduced by the active circuit; (c) amplitude and phase response of the LC tank H(jω), showing the new oscillation frequency at ωi, instead of ω0.

[18], which is not valid in many ILO implementations. The introduction of nonlinear

models for the active circuit in an ILO has led to the various ILO models reported in

the literature. In [23], a hard switching model was used to model a cross-coupled pair in

the LC differential ILO. This model successfully predicts the locking range and output

amplitude of this ILO topology at divide-by-two operation with reasonable accuracy.

However, the hard switching model is an over-simplification of the behavior of a

cross-coupled pair, and loses accuracy at small oscillation amplitudes. At the same

time, it only illustrates the divide-by-even-number operation of this ILO topology.

In [24], a unified nonlinear model was introduced to analyze general injection-locked

frequency dividers. The proposed method uses a two-dimensional Taylor expansion

to model the nonlinearity of the active circuit. The two-dimensional Taylor expansion

is carried out on the two inputs of the active cross-pair. One is the injection signal

and the other is the oscillation signal. Specifically for the divide-by-two ILFD, a

piecewise nonlinearity model was used for the cross-pair in the ILFD, which is more

accurate than the hard switching model in [23]. However, this proposed unified model

has not been used to analyze non-even-number harmonic injection locking. At the

same time, because the mathematical treatment of the nonlinearity is too general, the

model results offer little design insight.

The growing use of injection locking in high-performance clocking applications makes a

new ILO modeling method necessary. This new modeling method should, on the one hand,

be general enough to analyze both even-harmonic and odd-harmonic injection locking and,

on the other hand, give enough design insight for circuit designers. Because of the

large-signal and nonlinear nature of ILO operation, frequency-domain harmonic

analysis is a natural fit for ILO analysis. Numerical harmonic balance

is available in both Cadence and Advanced Design System (ADS) tools for

analyzing oscillators, both forced and free-running. These tools are effective for computer

simulations of injection locking. However, such purely numerical methods lack

design insight into, or a physical picture of, the circuit being analyzed. Instead, an

analytical method based on the harmonic balance concept should be employed if one

intends to reveal the underlying working principles of an injection-locked oscillator.

3.1.1 “Harmonic Balance” Analysis of Oscillators

A resonator-based oscillator can be analyzed using a one-port model as shown in

Fig. 3.2. The resonator is typically a linear passive network, and can be represented

by an admittance YR. The active circuit, on the other hand, is nonlinear, and needs

linearization in analysis. For a free-running oscillator with no external excitation, the

nonlinear active circuit can be linearized using the describing function method [15],

which represents the active circuit with its describing function, i.e., the fundamental

frequency component of the Fourier series for the nonlinearity under periodic excita-

tion of the oscillation signal. The describing function is shown as a linear admittance

YA here. Then the oscillation condition is formulated as:

YA + YR = 0 (3.1)

Figure 3.2: One-port model for a resonator-based oscillator with the active circuit represented by a linear admittance.

Eqn. 3.1 can be used to analyze the oscillation frequency and amplitude. For

example, for an LC differential oscillator (Fig. 2.2-b), the admittances YA and YR can

be expressed as

YA = −GA (3.2)

and

$$Y_R = G_p + jG_p\,2Q\left(\frac{\omega}{\omega_0} - \frac{\omega_0}{\omega}\right) \tag{3.3}$$

where −GA is the negative conductance of the active circuit, Gp is the effective

parallel conductance, Q is the quality factor, and ω0 = 1/√(LC) is the natural frequency

of the LC resonator. Therefore,

$$-G_A + G_p + jG_p\,2Q\left(\frac{\omega}{\omega_0} - \frac{\omega_0}{\omega}\right) = 0 \tag{3.4}$$

The imaginary part of the equation shows that the oscillation frequency is equal to

ω0, and the real part means that the active circuit compensates for the loss of the LC

tank, from which the oscillation amplitude can be derived.
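The two conditions can be worked through with a small Python sketch. It writes the tank susceptance directly as ωC − 1/(ωL), so the imaginary part vanishes at ω0 = 1/√(LC), and it assumes a cubic nonlinearity i = −g1·v + g3·v³ for the active circuit, whose describing function gives G_A(A) = g1 − (3/4)·g3·A². This nonlinearity, the parameter values, and the function names are assumptions for illustration only; the actual describing function depends on the circuit.

```python
import math


def oscillation_frequency_hz(l_h: float, c_f: float) -> float:
    """Imaginary part of Y_A + Y_R = 0: omega*C - 1/(omega*L) = 0."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))


def describing_conductance(amplitude: float, g1: float, g3: float) -> float:
    """G_A(A) for the assumed cubic nonlinearity i = -g1*v + g3*v**3."""
    return g1 - 0.75 * g3 * amplitude ** 2


def oscillation_amplitude(g_p: float, g1: float, g3: float) -> float:
    """Real part of Y_A + Y_R = 0: solve G_A(A) = G_p for the amplitude A."""
    if g1 <= g_p:
        return 0.0  # small-signal negative conductance cannot overcome the loss
    return math.sqrt(4.0 * (g1 - g_p) / (3.0 * g3))


f_osc = oscillation_frequency_hz(1e-9, 1e-12)          # 1 nH, 1 pF (assumed)
amp = oscillation_amplitude(g_p=1e-3, g1=3e-3, g3=1e-3)  # siemens, A/V^3 (assumed)
print(f"f_osc = {f_osc / 1e9:.3f} GHz, amplitude = {amp:.3f} V")
```

With these assumed values the amplitude settles where the amplitude-dependent negative conductance has dropped to exactly G_p, which is the balance expressed by the real part of Eqn. 3.4.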

3.1.2 “Harmonic Balance” Analysis of Injection-Locked Oscillators

Figure 3.3: One-port model for an injection-locked oscillator with the active circuit described in time domain.

For an injection-locked oscillator, the active circuit cannot be easily linearized

using the describing function method since there are two excitations to the active

circuit, the oscillation signal vosc(t) and the injection signal iinj(t), which might be at

different harmonic frequencies. Instead, a full-blown harmonic balance analysis [77]

is needed for this scenario, as shown in Fig. 3.3. A time-domain nonlinear function

f(iinj(t), vosc(t)) relates the injection and oscillation signals to the active circuit out-

put iA(t). The resonator can be described by a linear transfer function in frequency

domain,

$$\vec{I}_R = [Y_R] \cdot \vec{V}_{osc} \tag{3.5}$$

where vectors $\vec{I}_R$ and $\vec{V}_{osc}$ are the resonator current and oscillation voltage phasors

at all the harmonics, and hence the resonator transfer function [YR] becomes a trans-

fer matrix. The time-domain signals and frequency-domain harmonics are related

through Fourier transformations. The harmonic balance equation for the oscillator is

$$\vec{I}_A + \vec{I}_R = 0 \tag{3.6}$$

where $\vec{I}_A$ represents the harmonic phasors of the active circuit output iA(t) = f[iinj(t), vosc(t)].

Assuming the injection signal ii is small enough, we can treat it as a small pertur-

bation to the oscillation signal in harmonic balance. First, the nonlinear function f

is expanded into a Fourier series under large-signal periodic excitation of vo(t). Then

the Fourier coefficients are linearized at the dc value of ii(t), Idc. This process can be

formulated as follows:

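$$f = \sum_{h=-\infty}^{\infty} A_h(i_{inj})\, e^{jh\omega t} \approx \sum_{h=-\infty}^{\infty} A_h(I_{dc})\, e^{jh\omega t} + i_i \sum_{h=-\infty}^{\infty} A'_h(I_{dc})\, e^{jh\omega t} \tag{3.7}$$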

Note that since f is a real function

$$A_h(I_{dc}) = A^{*}_{-h}(I_{dc}) \tag{3.8}$$

It is worth noting that the Fourier series expansion is valid as long as the oscillator

is locked. Linearization for ii is justified when the injection signal is small compared

to Idc.

3.1.3 Common-Mode and Differential Injection

Figure 3.4: An injection-locked oscillator based on a cross-coupled LC differential oscillator.

A common implementation of the ILO model shown in Fig. 3.3-b is the differential

LC oscillator as shown in Fig. 3.4. Such an ILO has a fully differential output and is

robust to common-mode noise. Injection currents ii1 and ii2 are fed into the sources

of the cross-coupled transistors M1 and M2. For each cross-coupled transistor, its

drain current (id1 or id2) is a nonlinear function of its source current (ii1 or ii2) and

gate-drain voltage¹, and can be separately approximated using Eqn. 3.7. For M1,

$$i_{d1} = f(i_{i1}, v_{gd1}) = f(i_{i1}, v_o) \approx \sum_{h=-\infty}^{\infty} A_h\, e^{jh\omega t} + i_{i1} \sum_{h=-\infty}^{\infty} A'_h\, e^{jh\omega t} \tag{3.9}$$

where $A_h$ and $A'_h$ are evaluated at Idc. Because of the circuit symmetry,

$$i_{d2} \approx \sum_{h=-\infty}^{\infty} A_h\, e^{jh(\omega t+\pi)} + i_{i2} \sum_{h=-\infty}^{\infty} A'_h\, e^{jh(\omega t+\pi)} \tag{3.10}$$

Note that because vgd2 = −vosc, there is a time delay of π/ω, which translates into

extra phase shift of hπ in each harmonic.

The differential output signal of the active circuit now is

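$$i_{Ad} = i_{d1} - i_{d2} = \sum_{k=-\infty}^{\infty} 2A_{2k-1}\, e^{j(2k-1)\omega t} + (i_{i1} + i_{i2}) \sum_{k=-\infty}^{\infty} A'_{2k-1}\, e^{j(2k-1)\omega t} + (i_{i1} - i_{i2}) \sum_{k=-\infty}^{\infty} A'_{2k}\, e^{j2k\omega t} \qquad (3.11)$$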

¹ This is because the drain and source currents (id and ii) are both single-valued functions of vgs and vgd. Once ii and vgd are known, vgs and hence id are determined as well.

Define the common-mode and differential injection signals as

$$i_{ic} = \tfrac{1}{2}(i_{i1} + i_{i2}), \qquad i_{id} = i_{i1} - i_{i2} \qquad (3.12)$$

Then

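$$i_{Ad} = \sum_{k=-\infty}^{\infty} 2A_{2k-1}\, e^{j(2k-1)\omega t} + 2 i_{ic} \sum_{k=-\infty}^{\infty} A'_{2k-1}\, e^{j(2k-1)\omega t} + i_{id} \sum_{k=-\infty}^{\infty} A'_{2k}\, e^{j2k\omega t} \qquad (3.13)$$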

There are three mechanisms contributing to iAd: (a) the odd harmonics of the oscillation signal, (b) the mixing product of the common-mode injection signal and the

odd harmonics of oscillation signal, and (c) the mixing products of the differential

injection signal and the even harmonics of the oscillation signal. We can draw sev-

eral conclusions: First, when the oscillator is free running, the differential oscillation

signal only consists of the fundamental frequency and its odd harmonics, which is

not surprising given the symmetry. Second, common-mode injection topologies can

only support divide-by-even-number injection locking. This is because only even-

order-harmonic injection can mix with odd harmonics of the output to generate the

fundamental current.
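The symmetry argument above can be checked numerically. The following is a minimal sketch, not the thesis simulation: it assumes a hypothetical smooth switching characteristic (`drain_current`, a tanh of the oscillation voltage with an arbitrary steepness `k`) for each cross-coupled transistor, applies the half-period delay used in Eqn. 3.10, and inspects the harmonics of the free-running differential current.

```python
import numpy as np

N = 1024                                  # samples per oscillation period
wt = 2 * np.pi * np.arange(N) / N         # omega*t over one period

def drain_current(phase, i_dc=1.0, k=4.0):
    # Hypothetical switching model (assumption): the transistor steers i_dc
    # according to a tanh of the normalized oscillation voltage cos(phase).
    return 0.5 * i_dc * (1.0 + np.tanh(k * np.cos(phase)))

id1 = drain_current(wt)                   # M1 sees +v_o
id2 = drain_current(wt + np.pi)           # M2 sees -v_o: a delay of pi/omega
i_ad = id1 - id2                          # differential active-circuit current

harmonics = np.abs(np.fft.fft(i_ad)) / N  # |A_h| for h = 0, 1, 2, ...
even = harmonics[0:10:2]                  # h = 0, 2, 4, 6, 8
odd = harmonics[1:10:2]                   # h = 1, 3, 5, 7, 9

print("even harmonics:", even)
print("odd harmonics: ", odd)
```

With no injection applied, the even-indexed entries come out at numerical-noise level while the odd-indexed ones do not, which is the free-running observation made above.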

As a special case, we can study the divide-by-two ILO using the generic current expansion of Eqn. 3.11. For divide-by-two ILOs, the small-signal differential injection is zero,

$$i_{id} = 0 \qquad (3.14)$$

and the common-mode small-signal injection is

$$i_{ic} = I_{inj}\tfrac{1}{2}\left(e^{j(2\omega t+\phi)} + e^{-j(2\omega t+\phi)}\right) \qquad (3.15)$$

The fundamental of Eqn. 3.11 can be expressed as

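$$I_o e^{j(\omega t+\gamma)} = 2A_1(I_{dc})\, e^{j\omega t} + I_{inj}\tfrac{1}{2} e^{j(2\omega t+\phi)} A'_{-1}(I_{dc})\, e^{-j\omega t} + I_{inj}\tfrac{1}{2} e^{-j(2\omega t+\phi)} A'_{3}(I_{dc})\, e^{j3\omega t} \qquad (3.16)$$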

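Substituting this fundamental current into the harmonic balance equation of Eqn. 3.6, we can calculate the frequency range over which the balance holds. This range is called the locking range, and the half-side locking range can be expressed as

$$|\Delta\omega| \le \frac{\omega_0}{2Q}\, \frac{\left|\tfrac{1}{2} I_{inj}\left[A'_{-1}(I_{dc}) - A'_{3}(I_{dc})\right]\right|}{\sqrt{\left[2A_1(I_{dc})\right]^2 - \left(\tfrac{1}{2} I_{inj}\right)^2 \left[A'_{-1}(I_{dc}) + A'_{3}(I_{dc})\right]^2}} \qquad (3.17)$$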

It is worth noting that if we assume the hard-switching model of [23] for the cross pair, the Fourier coefficients and their derivatives can be calculated directly from the sign-function nonlinearity of the cross pair. Those appearing in Eqn. 3.17 are $A_1(I_{dc}) = \frac{2}{\pi} I_{dc}$, $A'_{-1}(I_{dc}) = A'_1(I_{dc}) = \frac{2}{\pi}$, and $A'_3(I_{dc}) = -\frac{2}{3\pi}$. Substituting these coefficients into Eqn. 3.17, we can write the half-side locking range as

$$|\Delta\omega| \le \frac{\omega_0}{Q}\, \frac{1}{\sqrt{(3/\eta)^2 - 1}} \qquad (3.18)$$

where η is defined as Iinj/Idc. This is the same as the result calculated in [23], which indicates that Eqn. 3.17 is a more general expression for the locking range of a divide-by-two ILFD, and that the hard-switching result of Eqn. 3.18 can be viewed as a specialized and simplified case of Eqn. 3.17.
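To get a feel for the hard-switching result, Eqn. 3.18 can be evaluated directly. The sketch below is illustrative only: the function name `half_locking_range` and the numerical values of ω0 and Q are assumptions, not values taken from a specific design in this work.

```python
import math

def half_locking_range(eta, omega0, q):
    """Half-side locking range |delta omega| of Eqn. 3.18 (hard-switching model)."""
    if not 0.0 < eta < 3.0:
        raise ValueError("Eqn. 3.18 is real-valued only for 0 < eta < 3")
    return (omega0 / q) / math.sqrt((3.0 / eta) ** 2 - 1.0)

# Illustrative numbers only (assumed for this example).
omega0 = 2 * math.pi * 5e9    # free-running frequency in rad/s
q = 6.0                       # tank quality factor
for eta in (0.05, 0.1, 0.3, 0.5):
    dw = half_locking_range(eta, omega0, q)
    print(f"eta = {eta:4.2f}: half-side range = {100 * dw / omega0:5.2f} % of omega0")
```

For small η the expression approaches ω0η/(3Q), so under this model the locking range grows roughly linearly with the injection ratio and shrinks as Q increases.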

Figure 3.5: Locking range simulation of a divide-by-two ILFD, compared with the harmonic perturbation and hard-switching models.

A divide-by-two injection-locked frequency divider with an ideal parallel RLC tank is simulated to verify the aforementioned locking range derivations. For the harmonic

perturbation method, the Fourier coefficients in Eqn. 3.17 are directly calculated

from the drain current waveforms of the cross-pair transistors in the free running

oscillator. A small perturbation in the dc bias current is then applied to the same

oscillator to get the derivatives of these Fourier coefficients. These Fourier coefficients

and their derivatives are important in determining the locking range of an ILO based

on this oscillator. They can serve as initial design guides before the real simulations

on the locking range, which are usually time consuming. The locking range calculated from Eqn. 3.17 is compared with the real circuit simulation and the simplified hard-switching model of Eqn. 3.18. As shown in Fig. 3.5, the locking range derived from the harmonic perturbation model matches the real circuit simulation much better than the simplified hard-switching model does, especially at small injection ratios.
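The coefficient-extraction procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation flow used for Fig. 3.5: `drain_current` is a hypothetical tanh-based stand-in for a simulated drain-current waveform, the Fourier coefficients are taken from an FFT over one period, and their derivatives come from a small symmetric perturbation of the dc current. The function `half_locking_range` then evaluates Eqn. 3.17 as written.

```python
import numpy as np

N = 2048                                   # samples per oscillation period
WT = 2 * np.pi * np.arange(N) / N          # omega*t over one period

def drain_current(i_dc, k=3.0):
    # Hypothetical stand-in for a simulated waveform (assumption): one
    # transistor of the cross pair steering i_dc with a tanh characteristic.
    return 0.5 * i_dc * (1.0 + np.tanh(k * np.cos(WT)))

def fourier_coeff(waveform, h):
    # Complex coefficient A_h of exp(j*h*omega*t); negative h wraps around.
    return np.fft.fft(waveform)[h % N] / N

def coeff_and_derivative(h, i_dc, rel_step=1e-3):
    # A_h at i_dc, and dA_h/dI_dc from a small symmetric dc perturbation.
    step = rel_step * i_dc
    a = fourier_coeff(drain_current(i_dc), h)
    da = (fourier_coeff(drain_current(i_dc + step), h)
          - fourier_coeff(drain_current(i_dc - step), h)) / (2 * step)
    return a, da

def half_locking_range(i_dc, i_inj, omega0, q):
    # Direct evaluation of Eqn. 3.17 with the extracted coefficients.
    a1, _ = coeff_and_derivative(1, i_dc)
    _, da_m1 = coeff_and_derivative(-1, i_dc)
    _, da_3 = coeff_and_derivative(3, i_dc)
    num = abs(0.5 * i_inj * (da_m1 - da_3))
    den = np.sqrt(abs(2 * a1) ** 2 - (0.5 * i_inj) ** 2 * abs(da_m1 + da_3) ** 2)
    return omega0 / (2 * q) * num / den

# Assumed example values: 2 mA bias, 0.2 mA injection, normalized omega0, Q = 5.
print(half_locking_range(2e-3, 0.2e-3, 1.0, 5.0))
```

In this toy model the current scales linearly with the dc current, so each derivative simply equals A_h/I_dc. With waveforms from a transistor-level simulation that is generally not the case, which is why the perturbation step is needed.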

3.1.4 Differential Injection for Odd-Harmonic and Fundamental Injection Locking

The differential current expression in Eqn. 3.11 not only explains the reason why

common-mode injection in differential LC ILO topology can only work with even-

order harmonics, it also points to the solution for odd-order-harmonic injection lock-

ing: differential injection. Suppose we have differential injection at the third har-

monic, which is

$$i_{id} = I_{inj}\tfrac{1}{2}\left(e^{j(3\omega t+\phi)} + e^{-j(3\omega t+\phi)}\right) \qquad (3.19)$$

and common-mode small injection as zero,

iic = 0. (3.20)

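From Eqn. 3.11, the fundamental differential current from the differential pair is

$$I_o e^{j(\omega t+\gamma)} = 2A_1(I_{dc})\, e^{j\omega t} + I_{inj}\tfrac{1}{2} e^{j(3\omega t+\phi)} A'_{-2}(I_{dc})\, e^{-j2\omega t} + I_{inj}\tfrac{1}{2} e^{-j(3\omega t+\phi)} A'_{4}(I_{dc})\, e^{j4\omega t} \qquad (3.21)$$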

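Following a procedure similar to that of the divide-by-two case, the locking range of the divide-by-three ILFD can be expressed as

$$|\Delta\omega| \le \frac{\omega_0}{2Q}\, \frac{\left|\tfrac{1}{2} I_{inj}\left[A'_{-2}(I_{dc}) - A'_{4}(I_{dc})\right]\right|}{\sqrt{\left[2A_1(I_{dc})\right]^2 - \left(\tfrac{1}{2} I_{inj}\right)^2 \left[A'_{-2}(I_{dc}) + A'_{4}(I_{dc})\right]^2}} \qquad (3.22)$$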

Figure 3.6: Realizations of divide-by-three ILFD by differential cascode injection.

For fundamental injection locking, or non-division ILOs, we have

$$i_{id} = I_{inj}\tfrac{1}{2}\left(e^{j(\omega t+\phi)} + e^{-j(\omega t+\phi)}\right) \qquad (3.23)$$

and

$$i_{ic} = 0 \qquad (3.24)$$

The fundamental of Eqn. 3.11 can be expressed as

$$I_o e^{j(\omega t+\gamma)} = 2A_1(I_{dc})\, e^{j\omega t} + I_{inj}\tfrac{1}{2} e^{j(\omega t+\phi)} A'_{0}(I_{dc}) + I_{inj}\tfrac{1}{2} e^{-j(\omega t+\phi)} A'_{2}(I_{dc})\, e^{j2\omega t} \qquad (3.25)$$

Locking range can be calculated as

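$$|\Delta\omega| \le \frac{\omega_0}{2Q}\, \frac{\left|\tfrac{1}{2} I_{inj}\left[A'_{0}(I_{dc}) - A'_{2}(I_{dc})\right]\right|}{\sqrt{\left[2A_1(I_{dc})\right]^2 - \left(\tfrac{1}{2} I_{inj}\right)^2 \left[A'_{0}(I_{dc}) + A'_{2}(I_{dc})\right]^2}} \qquad (3.26)$$

where $A'_0(I_{dc})$ and $2A_1(I_{dc})$ can be calculated from the current waveform or by simulation.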

Differential injection for divide-by-odd-number ILFDs can be realized by a differ-

ential pair connected to the separated sources of the cross pair in a cascode fashion,

as shown in Fig. 3.6-a. Using divide-by-three as an example, Fig. 3.6-a shows the signal flows of the different harmonics, where the third harmonic is injected differentially

by the input differential pair, and the fundamental current goes through a dedicated

bandstop filter added in order to sustain the oscillation. The bandstop frequency

of the filter is at the third harmonic in order to prevent a short current path for

the injection. Fig. 3.6-b shows another realization for the differential injection-locked

oscillators, where the differential injection is provided by a transformer, instead of a

differential pair, while the path for fundamental current is provided by the parasitics

of the transformer. Compared with differential-pair injection, this topology reduces the number of stacked transistors and can therefore be used with a lower supply voltage. It also requires only single-ended injection.

3.2 Injection-Locked Frequency Dividers (ILFDs)

An injection-locked oscillator locked to a super-harmonic input can be used as a frequency divider. Such a divider is called an injection-locked frequency divider (ILFD). An ILFD has an inherent advantage in both speed and power dissipation

compared to a digital divider. It is fundamentally an oscillator at the subharmonic

frequency of the input signal, which effectively lowers the speed requirement for the

process technology by n-fold. As a resonant circuit, only a fraction of the stored

energy is dissipated in every cycle, which is determined by the quality factor Q of

the resonator. This means that an ILFD can have lower power consumption than a

digital divider. At the same time, an ILFD also has the advantages of simpler circuit

structure than regenerative frequency dividers and better tolerance for low-Q devices

compared with parametric frequency dividers.

Once locked to the input signal, the output of ILOs will maintain a determined

phase relative to the input signal (Fig. 3.7). The phase difference from the input

signal to the output is determined by the injection signal strength, the frequency

shift from its free-running oscillation frequency, and the frequency characteristics of

Figure 3.7: Phase tuning characteristics for a divide-by-two ILO in Fig. 2.7-b. η ≡ Iinj/Ibias is the injection ratio, ω0 is the free-running oscillation frequency, ∆ω ≡ ω − ω0 is the frequency shift, and Q is the LC tank quality factor.

the oscillator resonator. As shown in Fig. 3.7, the phase shift ϕ is a monotonic

function of the frequency shift ∆ω, and the function is quite linear within the locking

range except when close to the edges. By tuning the free-running frequency of the

oscillator, we can tune the phase of the output signal [?].

The phase noise behavior of an injection-locked oscillator also resembles that of a

first order PLL. The phase noise at ILO output is determined by that of input at lower

frequency offset, and by the oscillator itself at higher frequency offset. The corner frequency that divides these two regions, which is similar to the loop bandwidth of a PLL, is determined by the ratio of injection strength to oscillation amplitude.

Limited locking range is a constraint on the application of injection-locked frequency dividers. A model by [23], similar to the one in Fig. 2.7-a, makes a hard-switching assumption for the cross pair and gives the locking range of a divide-by-two ILFD as

$$\text{Locking Range} \cong \frac{4\omega_0}{3Q}\,\eta, \qquad (3.27)$$

where ω0 is the free-running oscillation frequency of the ILFD, Q is the quality factor of the tank, and η is the injection ratio, defined as the injection strength over the oscillation strength.

To increase the locking range, the most intuitive approach is to increase the effective injection strength. Since the available injection signal strength is limited by the output of the preceding stage, reducing the loss in the injection path is an effective way to increase the actual injection strength. The injection node at the source of the cross pair is the node of most interest in this approach, because the large parasitic capacitance at this node provides a leakage path to ground for the injection signal. Adding an inductor in parallel with the parasitic capacitance to form a resonance at the injection frequency is an effective way to reduce this leakage and increase the locking range [21].

Notice that the locking range equation presented above is based on a hard-switching assumption for the cross pair in the ILFD. Hard switching gives the best mixing efficiency obtainable from a single-balanced mixer. If the oscillation amplitude is not large enough to support this assumption, the mixing efficiency, and with it the locking range of the ILFD, will drop. To maintain hard switching and thus obtain a large locking range, it is necessary to increase the oscillation amplitude of the ILFD. For the same power consumption, increasing the quality factor or the inductance-to-capacitance ratio of the tank increases the oscillation amplitude. However, according to the locking range equation, an increase in quality factor decreases the locking range. Increasing the inductance-to-capacitance ratio is therefore the only way to increase the locking range while ensuring hard switching. This approach also carries a trade-off, namely a reduced load capability.

The most common injection-locked frequency divider introduced so far is the divide-by-two topology shown in Fig. 2.7-b. It is a differential LC oscillator with a source-coupled cross pair, in which an injection signal at twice the oscillation frequency (2ω0) is applied at the common source node. This topology is suitable for divide-by-two operation because of the inherent second-harmonic content at the injection node. Two divide-by-two ILFDs driven by differential injection can be used to generate quadrature signals at high speed and with low power consumption, as proposed in [23]. However, because of mismatch between the two ILFDs, several degrees of quadrature error exist at the quadrature outputs. The phase tunability of an ILO can be utilized to compensate for such quadrature mismatch. Such an application is shown in the subsection on the dual-phase signal generator.

In circuit implementations of divide-by-two ILFDs, the injection signal is usually

applied to the gate of the bias transistor, which converts the injection signal from

voltage to current. The injection current at 2ω0 then mixes with the oscillation voltage

to generate the desired frequency component at the fundamental ω0. The mixing process is similar to that in a single-balanced mixer, where only the odd harmonics (ω0, 3ω0, ...) of the oscillation voltage mix with the injection. Such a topology therefore works for all even-harmonic injection, because each even harmonic can mix with a corresponding odd harmonic to generate the fundamental component. However, as the input harmonic index increases, the mixing efficiency drops because of the smaller weight of the corresponding

odd harmonics generated by the current steering function of the cross pair. Odd-

harmonics injection is not supported in this topology. We will address this problem

with a differential injection divide-by-odd-number injection-locked frequency divider

topology.

3.2.1 Divide-by-Two ILFD for Dual-phase Signal Generation

In modern digital communication systems, it is increasingly important to generate

accurate multi-phase signals. For example, in-phase and quadrature LO signals are

required for quadrature modulation, quadrature down-conversion, and Weaver image

rejection. Passive phase shift circuits such as poly-phase filters [78] are commonly

used in low-GHz applications for this purpose. Their disadvantages are limited band-

width per stage, large signal attenuation, and noise degradation. Ring and coupled

oscillators can also be used to generate accurate multi-phase signals, but they suffer

from inferior phase noise performance, especially at high frequencies. Toggle-flip-

flop digital frequency dividers [28] are widely used to generate quadrature signals.

However, their phase accuracy depends on the input signal duty-cycle, and the large

power consumption is also a concern at high frequencies. Injection-locked frequency

dividers (ILFDs) [22] have been demonstrated for quadrature generation with good

phase accuracy and substantially lower power consumption [79, 23, 80, 81]. They

are particularly suitable for microwave and millimeter-wave applications where the

trade-off between speed and power consumption is more challenging [21, 39].

The application of ILFDs in signal generation is so far limited to quadrature

cases. Our work presents a study on generating signals with arbitrary and tunable

phase difference by utilizing the phase shift characteristics of ILFDs (more generally,

injection-locked oscillators). This is very attractive in applications that require tunable phases with fine phase resolution, e.g., phased-array systems [82]. It can also be

used to improve the phase accuracy of quadrature generation.

An ILFD can be treated as a simplified regenerative divider with a built-in mixer

and filter [24, 25, 23]. For example, a divide-by-2 ILFD based on differential LC

oscillator (Fig. 3.8-a) can be modeled as a regenerative divider with a single-balanced

mixer and an LC tank filter (Fig. 3.8-b,c) . At large oscillation amplitude, assuming

ideal switching for the differential pair (M1 and M2), the output signal phase shift ϕ

Figure 3.8: (a) Schematic, (b) equivalent circuit model, and (c) behavior model of a divide-by-two ILFD based on a differential LC oscillator. The nonlinearity in the behavior model comes from the switching of the cross pair.

Figure 3.9: Dual-phase generation by injection-locking two ILFDs. The injected differential signals result in a quadrature phase difference at the ILFD outputs when ω01 = ω02. The output phases φ1 and φ2 are explicitly expressed as the sum of the quadrature phases and the phase shift parts ϕ1 and ϕ2 so that Eqn. 3.28 can be directly applied.

can be found [23]

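$$\varphi = \frac{1}{2}\left[\sin^{-1}\frac{3/\eta}{\sqrt{1 + \left(\frac{\omega_0}{Q\,\Delta\omega}\right)^2}} + \sin^{-1}\frac{1}{\sqrt{1 + \left(\frac{\omega_0}{Q\,\Delta\omega}\right)^2}}\right] \qquad (3.28)$$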

where η ≡ Iinj/Ibias is the injection ratio, ω0 is the free-running oscillation frequency,

∆ω ≡ ω − ω0 is the frequency shift, and Q is the LC tank quality factor. This phase shift characteristic is shown in Fig. 3.7, from which we can see that the phase shift

ϕ is a monotonic function of the frequency shift ∆ω, and the function is quite linear

within the locking range except close to the edges.
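The phase characteristic can be tabulated with a short script. This is a sketch under two assumptions that should be checked against [23]: Eqn. 3.28 is read with a square root in each denominator, sqrt(1 + (ω0/(Q∆ω))²), which makes the first arcsine reach 90° exactly at the locking-range edge of Eqn. 3.18, and the sign of ∆ω is carried over to ϕ so that the curve is odd-symmetric. The function names and example values are illustrative.

```python
import math

def half_locking_range(eta, omega0, q):
    # Edge of the locking range from Eqn. 3.18 (hard-switching model).
    return (omega0 / q) / math.sqrt((3.0 / eta) ** 2 - 1.0)

def phase_shift(delta_omega, eta, omega0, q):
    """Output phase shift in radians, following the reading of Eqn. 3.28 above."""
    if delta_omega == 0.0:
        return 0.0
    root = math.sqrt(1.0 + (omega0 / (q * delta_omega)) ** 2)
    arg = (3.0 / eta) / root
    if arg > 1.0:
        raise ValueError("frequency shift is outside the locking range")
    magnitude = 0.5 * (math.asin(arg) + math.asin(1.0 / root))
    # Assumption: the phase shift takes the sign of the frequency shift.
    return math.copysign(magnitude, delta_omega)

# Illustrative values only.
eta, omega0, q = 0.3, 1.0, 6.0
edge = half_locking_range(eta, omega0, q)
for frac in (-0.99, -0.5, 0.0, 0.5, 0.99):
    phi = phase_shift(frac * edge, eta, omega0, q)
    print(f"dw = {frac:+.2f} x edge: phi = {math.degrees(phi):+7.2f} deg")
```

Under these assumptions the phase shift passes through zero at the free-running frequency and steepens close to the edges of the locking range.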

When the injected signal changes phase by 180°, the phase of the ILFD output changes by 90°. Therefore, when a differential signal is injected into two identical ILFDs (Fig. 3.9) with the same free-running oscillation frequency (ω01 = ω02), the two differential output signals are exactly in quadrature, i.e., ∆φ = 90° (Fig. 3.10-a).

The quadrature accuracy is determined by the mismatch between the two ILFDs, and

also affected by the injection ratio η and Q of the LC tank.

Figure 3.10: Phase tuning of two ILFDs: (a) quadrature; (b), (c) single-ended tuning; (d) differential tuning.

Figure 3.11: Schematic of the prototype double-balanced ILFD for tunable dual-phase signal generation. A first-stage ILFD works as an active balun to convert a single-ended input to differential signals, which are then fed into the input of the double-balanced ILFD stage.

When the two ILFD cores have different free-running oscillation frequencies (ω01 ≠ ω02), φ1 and φ2 are no longer in quadrature but have a different phase difference.

Therefore, if we frequency-tune ILFD1 or ILFD2, their phase difference ∆φ will change

accordingly. Fig. 3.10 shows some possible ways of phase tuning: we can fix ω02 (and hence φ2) while tuning ω01 to change φ1; we can also tune ω01 and ω02 (and hence φ1 and φ2) differentially to achieve a larger phase tuning range. If the ILFD cores are designed to center their frequency tuning range around half the input frequency, ω, the phase tuning range will be centered around quadrature and reaches its maximum when tuning differentially. If the desired phase tuning range is around ∆φ = 0, we can simply injection-lock both ILFD cores with the same single-ended signal.

Figure 3.12: Chip micrograph of the prototype ILFD. The chip occupies an area of 1.0 × 1.1 mm².

Notice that the signal amplitude is related to ϕ as [23]

$$V_0 = \frac{4}{\pi} R I_{bias} \left(1 + \frac{\eta}{3}\cos 2\varphi\right) \qquad (3.29)$$

where R is the equivalent tank resistance. Therefore, to maintain equal signal amplitudes at the two outputs, it is also better to tune ω01 and ω02 differentially around ω, in which case ϕ1 ≈ −ϕ2, and hence cos 2ϕ1 = cos 2ϕ2.
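The amplitude argument can be illustrated with a few lines of code. This sketch reads Eqn. 3.29 as V0 = (4/π)·R·Ibias·(1 + (η/3)·cos 2ϕ). The function name and the tank resistance, bias current, and phase values are assumed for illustration and are not taken from the prototype.

```python
import math

def output_amplitude(phi, r_tank, i_bias, eta):
    """Output amplitude from Eqn. 3.29, with phi in radians."""
    return (4.0 / math.pi) * r_tank * i_bias * (1.0 + (eta / 3.0) * math.cos(2.0 * phi))

# Assumed example values: 200-ohm tank resistance, 4 mA bias, eta = 0.3.
r_tank, i_bias, eta = 200.0, 4e-3, 0.3
phi = math.radians(20.0)

v_diff_1 = output_amplitude(+phi, r_tank, i_bias, eta)   # differential tuning, core 1
v_diff_2 = output_amplitude(-phi, r_tank, i_bias, eta)   # differential tuning, core 2
v_fixed = output_amplitude(0.0, r_tank, i_bias, eta)     # single-ended tuning, fixed core
v_tuned = output_amplitude(2 * phi, r_tank, i_bias, eta) # single-ended tuning, tuned core

print("differential tuning:", v_diff_1, v_diff_2)
print("single-ended tuning:", v_fixed, v_tuned)
```

Because the cosine is even, opposite phase shifts give identical amplitudes, whereas shifting only one core by the full phase difference leaves the two outputs with unequal amplitudes.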

From Eqn. 3.28 and 3.29, it can be seen that both phase shift and output am-

plitude strongly depend on the injection ratio η, which in turn depends on both the

injection current Iinj and bias current Ibias. In a simple differential LC ILFD, Iinj is

generated by a transconductor, usually made of the tail transistor. Any variation in

transistor size or bias voltage would translate into change in Iinj, and hence affects

the phase accuracy and amplitude equality. To address this problem, we introduce

a double-balanced structure similar to a Gilbert cell (Fig. 3.11). In such a double-

balanced ILFD, the input transconductor is replaced by a differential pair (M5 and

M6) operating in strong switching mode. Therefore, the injection ratio η is deter-

mined only by the Fourier series coefficients of an ideal sign function, and hence is

largely immune from variations in transistor size or dc bias, given the input voltage

Figure 3.13: Frequency tuning range of the free-running ILFD core.

signal is sufficiently large. Note that the injection current now consists of multiple

harmonics of 2ω.

In the prototype, NMOS inversion-mode varactors (Ct1 to Ct4) are used in the

LC tanks to tune the free-running oscillation frequency (Fig. 3.11). Another ILFD is

added to serve as an on-chip active balun in order to convert the single-ended signal

from a signal source to the differential injection signal with good phase noise. It is a

regular differential LC divide-by-two ILFD like the one in Fig. 1-a. Varactor tuning

is also included in the balun ILFD to cover the locking range of the main ILFD. Since

there is no stringent input bias requirement on the double-balanced ILFD, they are

directly dc coupled.

The circuit is fabricated in a standard 0.18 µm digital CMOS technology with a low-resistivity substrate. Spiral inductors are constructed using the 0.9 µm-thick top metal layer. Due to the thin metal and lossy substrate, the Q of the inductors is about 6 at 5 GHz. Two open-drain differential buffers are used at the output ports. The main ILFD consumes 8 mA from a 1.8 V power supply. The balun ILFD and the open-drain buffers consume 4 mA and 18 mA from 1.4 V and 1.8 V supplies, respectively. The

Figure 3.14: Locking range and bounds in the middle of the tuning range. Note that these are the input signal frequencies, which are 4 times those of the outputs. A maximum locking range of 17% was achieved.

die photo is shown in Fig. 3.12, and the chip size is 1.0 mm × 1.1 mm.

The circuit prototype is measured using a probe station. First, we measured the

stand-alone main ILFD cores (without the balun ILFD) implemented in a companion

test chip. Their tuning range when free-running is from 4.96 GHz to 6.16 GHz

(Fig. 3.13), and their locking range without tuning is found to be 17%. Then the

locking range of the prototype (with the balun ILFD) was measured at different

tuning points, and is found to be over 15% across the tuning range (Fig. 3.14). Notice

that the locking range extends symmetrically around the free-running frequency as

the injected power increases. Taking into account both the tuning and locking range,

the total operation frequency range then extends to 22%, from 4.78 GHz to 5.95 GHz

at the outputs.
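The quoted total operating range can be cross-checked with simple arithmetic. The sketch below assumes that the percentage is defined relative to the center of the quoted output frequency span, which the text does not state explicitly.

```python
# Output frequency span quoted in the text, in GHz.
f_low_ghz, f_high_ghz = 4.78, 5.95

# Assumption: the fractional range is referred to the center frequency.
center_ghz = 0.5 * (f_low_ghz + f_high_ghz)
fractional_range = (f_high_ghz - f_low_ghz) / center_ghz

print(f"center = {center_ghz:.3f} GHz, range = {100 * fractional_range:.1f} %")
```

Under that assumption the span works out to just under 22% of the center frequency, consistent with the figure quoted above.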

The phase difference between the two output signals is measured using a sampling

oscilloscope. Cables and probes are calibrated to remove the phase mismatch intro-

duced by the measurement setup. Fig. 3.15-a shows the case of tuning the first core

Figure 3.15: Phase tuning: (a) keep Vt1 constant and tune Vt2; (b) keep Vt2 constant and tune Vt1; (c) differential tuning of Vt1 and Vt2 at different injection frequencies.

Figure 3.16: Phase noise within the locking range, compared with that of the input signal. Measured at 5.3 GHz output. The 12-dB phase noise reduction comes from the divide-by-four operation.

ILFD1 while keeping ILFD2 at the middle of its tuning range. The phase difference

can be varied by 55° around quadrature before the ILFD loses lock. Fig. 3.15-b shows the opposite case of tuning ILFD2 only. A similar 50° phase tuning around quadrature is achieved. When ILFD1 and ILFD2 are tuned differentially, about 100° (40° to 140°) of phase tuning is achieved for different input frequencies (Fig. 3.15-c). Compared

to single-ended phase tuning, differential tuning shows much better linearity in the

tuning characteristics.

The phase noise across the locking range is also measured, together with that of

the input signal, as shown in Fig. 3.16. The phase noise of the output is about 10

to 11 dB lower than that of the 21-GHz input. The phase noise suppression is quite

close to the theoretical value of 12 dB for divide-by-four operation.
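The theoretical figure follows from the standard result that ideal frequency division by N scales phase deviations by 1/N, which lowers phase noise by 20·log10(N) dB. A minimal sketch, with a hypothetical helper name:

```python
import math

def ideal_phase_noise_reduction_db(division_ratio):
    """Phase noise reduction of an ideal divide-by-N stage, in dB."""
    return 20.0 * math.log10(division_ratio)

for n in (2, 3, 4):
    print(f"divide-by-{n}: {ideal_phase_noise_reduction_db(n):.2f} dB")
```

For N = 4 this gives about 12 dB, so the measured 10 to 11 dB falls 1 to 2 dB short of the ideal value.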

Figure 3.17: Circuit evolution of the ILFD from divide-by-two to divide-by-three. The most important change is from common-mode injection and differential mixing to differential injection and single-ended mixing.

3.2.2 Divide-by-Odd-Number ILFD

To achieve divide-by-odd-number operation in ILFDs based on differential LC oscillators, a different mixing scheme must be introduced to bypass the even-harmonic suppression of the balanced mixer. At the same time, careful harmonic design is necessary to maintain the differential topology at the outputs. We introduce a new topology based on differential injection and a single-ended mixing scheme to achieve divide-by-odd-number operation, while the differential LC oscillator topology is maintained through dedicated harmonic engineering.

The evolution from a divide-by-two ILFD to a divide-by-odd-number ILFD is shown in Fig. 3.17. In the divide-by-two ILFD, the input at the second harmonic is injected at the gate of the tail transistor. The transconductance of the tail transistor converts the injection voltage to a current and feeds it into the common source of the cross-coupled transistors. The output at the fundamental frequency steers this injection current, and the

Figure 3.18: Circuit model for the divide-by-three ILFD, with differential injection at the sources of the cross pair.

mixing component at the fundamental is fed into the LC tank, which translates it back to a voltage at the output. The phase and voltage balance of this feedback system shifts the oscillation frequency from the tank resonant frequency to exactly half of the injection frequency inside the locking range. This topology is efficient for divide-by-two operation and can be used for higher-order divide-by-even-number operation, with efficiency decreasing as the division ratio increases. It has very poor efficiency for divide-by-odd-number operation, however, because the differential topology largely suppresses the even harmonics of the feedback voltage. These even harmonics are essential for divide-by-odd-number operation, since only even harmonics can mix with an odd-harmonic input to generate the fundamental frequency. In the divide-by-three topology, by contrast, the differential injection currents are each fed into only one of the cross-coupled transistors and mix with the feedback voltage in a single-ended fashion. This topology does not suppress the even harmonics during mixing and thus supports odd division ratios. To maintain the signal path for the fundamental current, a filtering structure is introduced to provide a short between the sources of the cross-coupled pair, while appearing as an open circuit for the injected odd

Figure 3.19: Loop model for the divide-by-three ILFD.

harmonics.

Fig. 3.18 and Fig. 3.19 show the modeling of the divide-by-three topology. Injec-

tion at third harmonic of the output is modeled by two differential voltages at the

sources of the cross pair (Fig. 3.18). Output is taken differentially from the drains

where it also mixes with the injection voltages through M1 and M2. The differential current id = i2 − i1, which contains the mixing product of the injection and the output, is filtered by the tank of the LC oscillator, which retains only the fundamental frequency and removes the unwanted harmonics. The transfer function of the LC tank is

$$F(j\omega) = |Z(j\omega)|\, e^{j\beta(\omega)}, \qquad (3.30)$$

with the phase shift

$$\beta(\omega) = -\arctan\left(\frac{2Q\,\Delta\omega}{\omega_0}\right), \qquad (3.31)$$

where ω0 is the resonance frequency of the tank, and ∆ω = ω − ω0 is the difference

between the 3rd subharmonic of the input frequency and the resonance frequency. In

the free running case, the oscillator is working at its resonance frequency ω0, so the

phase shift by the tank is zero. The feedback loop of the oscillator can maintain a balance in which the active circuit only provides the negative resistance needed to compensate for the loss in the tank. In the injection-locked case, the oscillator should

be able to work in a frequency band around its resonance frequency. This means

the phase shift of the tank will no longer be zero. At the same time, oscillation

condition requires the loop phase shift to be 0 or a multiple of 2π. The phase shift of the tank must therefore be compensated by the active circuit in an injection-locked oscillator. In the non-division case, this phase shift is introduced by adding an injection component at the same frequency to the output [25]. In the division case, the phase shift is introduced by mixing [24].
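The phase that the active circuit has to supply can be read off Eqn. 3.31 directly. The following sketch uses hypothetical function names and example values, and relies on the same narrow-band tank approximation as Eqn. 3.31.

```python
import math

def tank_phase(omega, omega0, q):
    """Tank phase shift beta(omega) of Eqn. 3.31, in radians."""
    return -math.atan(2.0 * q * (omega - omega0) / omega0)

def required_active_phase(omega, omega0, q):
    """Phase the active circuit must add so that the loop phase is zero."""
    return -tank_phase(omega, omega0, q)

# Illustrative values: Q = 10 and offsets of 0 %, 1 %, 2 % and 5 % from resonance.
omega0, q = 1.0, 10.0
for offset in (0.0, 0.01, 0.02, 0.05):
    comp = required_active_phase(omega0 * (1.0 + offset), omega0, q)
    print(f"offset = {100 * offset:4.1f} %: active circuit supplies {math.degrees(comp):6.2f} deg")
```

At an offset of ω0/(2Q) the tank contributes −45°, so the mixing or summation in the active circuit has to contribute +45° there. A higher Q demands more phase compensation for the same frequency offset.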

For the divide-by-three topology in Fig. 3.19, the phase shift is provided by the mixing between the third-harmonic injection and the fundamental output within each single transistor. Assuming a third-order nonlinearity for transistors M1 and M2,

$$f(v_i, v_o) = a_0 + a_1(v_i + v_o) + a_2(v_i + v_o)^2 + a_3(v_i + v_o)^3 \qquad (3.32)$$

the fundamental component that feeds into the tank Z(ω) is

$$i_d = i_2 - i_1 = 2a_1 V_{osc}\cos(\omega t) - 3a_3 V_{osc}^2 V_{inj}\cos(\omega t + \phi) = |i_d|\cos(\omega t + \varphi)$$

where

$$\varphi = -\arctan\left[\frac{3a_3 V_{osc}^2 V_{inj}\sin\phi}{2a_1 V_{osc} - 3a_3 V_{osc}^2 V_{inj}\cos\phi}\right] \qquad (3.33)$$

The above equation shows the importance of differential injection: if the injection were common-mode, the fundamental currents generated by mixing would still cancel in differential-mode operation, even though the mixing is performed by a single transistor.

Figure 3.20: Circuit implementation of the divide-by-three ILFD. An input balun is used for single-ended to differential conversion.

From the phase balance between the cross-coupled pair and the LC tank, the half-side locking range as a fraction of ω0 can be expressed as

$$\frac{\Delta\omega}{\omega_0} \le \frac{3a_3 V_{osc} V_{inj}}{\sqrt{(2a_1)^2 - (3a_3 V_{osc} V_{inj})^2}} \cdot \frac{1}{2Q} = \frac{1}{\sqrt{\left(\frac{2a_1}{3a_3 V_{osc} V_{inj}}\right)^2 - 1}} \cdot \frac{1}{2Q}$$

which is inversely proportional to Q, the quality factor of the tank, and increases with the third-order nonlinearity coefficient a3 of the cross-pair transistors, the injection level Vinj, and the oscillation amplitude Vosc.
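The closed-form bound can be cross-checked against the phase-balance condition it comes from. The sketch below assumes the forms ϕ = −arctan[b·sin φ / (a − b·cos φ)] with a = 2·a1·Vosc and b = 3·a3·Vosc²·Vinj (Eqn. 3.33), a tank phase of −arctan(2Q∆ω/ω0) (Eqn. 3.31), and the closed form x / (2Q·sqrt((2·a1)² − x²)) with x = 3·a3·Vosc·Vinj. All coefficient and amplitude values are hypothetical.

```python
import math

def active_phase(phi_inj, a1, a3, v_osc, v_inj):
    """Phase of the fundamental current relative to v_osc, as in Eqn. 3.33."""
    a = 2.0 * a1 * v_osc
    b = 3.0 * a3 * v_osc ** 2 * v_inj
    return -math.atan(b * math.sin(phi_inj) / (a - b * math.cos(phi_inj)))

def half_locking_fraction(a1, a3, v_osc, v_inj, q):
    """Closed-form half-side locking range as a fraction of omega0."""
    x = 3.0 * a3 * v_osc * v_inj
    return x / math.sqrt((2.0 * a1) ** 2 - x ** 2) / (2.0 * q)

def swept_half_locking_fraction(a1, a3, v_osc, v_inj, q, steps=20001):
    # Phase balance: the tank phase must cancel the active phase, so
    # 2Q*|dw|/w0 = |tan(active phase)|; sweep the injection phase for the maximum.
    best = 0.0
    for n in range(steps):
        phi = 2.0 * math.pi * n / (steps - 1)
        best = max(best, abs(math.tan(active_phase(phi, a1, a3, v_osc, v_inj))))
    return best / (2.0 * q)

# Hypothetical coefficients and amplitudes, chosen only for illustration.
params = dict(a1=5e-3, a3=2e-3, v_osc=0.8, v_inj=0.3, q=5.0)
print("closed form:", half_locking_fraction(**params))
print("phase sweep:", swept_half_locking_fraction(**params))
```

The two numbers agree to within the resolution of the sweep, which indicates that the bound is the largest tank phase the mixing term can compensate over all injection phases.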

For circuit implementation of the proposed divide-by-odd-number injection-locked

frequency divider, we construct a differential cascode topology by adding another

differential pair of M3 and M4 (right in Fig. 3.20). A shunt-peaking inductor L0 is also introduced to resonate with the parasitic capacitances at the third harmonic [21].

Figure 3.21: Output spectrum of the divide-by-three ILFD, with 23 dB and 21 dB of second- and third-harmonic suppression. The zoom-in of the fundamental output shows a clean spectrum.

It also provides a short-circuit current path for the fundamental frequency component.
Therefore, the upper half circuit (M1, M2, L0 and the LC tank) works as a differential
LC oscillator at the fundamental frequency and the lower half as a tuned differential
amplifier, while mixing is accomplished in single-ended fashion within the left and
right half circuits. Overall, we confine signals at different harmonics locally, by
circuit topology and filtering, so that they can coexist. A balun T1 is used to convert
a single-ended input from a signal source to differential signals. It also helps to match
the input impedance of M3 and M4 to 50 Ω. The input can be connected directly
when the ILFD is integrated with an on-chip differential source such as a VCO.

A prototype ILFD with an input frequency of 18 GHz using the new topology has been
designed and fabricated in National Semiconductor’s 0.18-µm CMOS technology
with a low-resistivity epi silicon substrate. Spiral inductors are constructed using the
2-µm-thick top metal layer. Due to the lossy substrate, the Q of the inductors is about
3 at 6 GHz and 7 to 8 at 18 GHz. An open-drain differential buffer is used at the output

Figure 3.22: Locking range vs. injection power for the divide-by-three ILFD. A maximum locking range of 1 GHz was achieved.

Figure 3.23: Locking range vs. injection voltage. The injection voltage is calculated from the incident power reading and the S11 at the input port, with cable and connector loss calibrated out.

Figure 3.24: Extended working frequency range, which combines the frequency tuning range and the locking range.

Figure 3.25: Phase noise performance vs. injection power. The 9-dB phase noise reduction results from the divide-by-three operation.

port. The ILFD core and output buffer consume 2.55 mA and 22.6 mA from a 1.8 V

power supply, respectively.

The prototype ILFD chip is measured by on-wafer probing. The loss from the

Figure 3.26: Chip micrograph of the divide-by-three ILFD, with a chip size of 0.9 mm × 0.9 mm.

cables, adapters and probes between the signal source and the input port is
characterized to calibrate the injection signal power. S11 at the input port is measured
and used to calculate the gate voltage on M3 and M4. S11 is below -8 dB across
the frequency tuning and locking range, and the injection power is adjusted from
the incident power accordingly. The output signal spectrum in the locked condition is
shown in Fig. 3.21. The 2nd and 3rd harmonics are 23 dB and 21 dB below the
fundamental, respectively, and a large part of them is contributed by the open-drain
buffer at the output (single-ended measurement). The locking range increases from
0.3 GHz at an injection power of -14 dBm to 1 GHz at 4 dBm with little change in the
center frequency (Fig. 3.22). The corresponding input port voltage is calculated using
S11 and shown in Fig. 3.23. Note that this is the single-ended voltage (amplitude)
at the primary of the balun, which has a 1:1 transformation ratio. Considering impedance
matching, the differential voltage on the gates of M3 and M4 is about 0.5 times
that, which is clearly compatible with on-chip VCOs. The ILFD can also be tuned
by the varactors Ct1 and Ct2, with the free-running frequency ranging from 5.37 GHz
to 6.1 GHz. The locking range shifts with the free-running frequency and remains almost
constant across the tuning range. Fig. 3.25 shows the phase noise performance of the
ILFD at different injection power levels. The phase noise of the free-running ILFD
(no injection) and that of the signal source are also shown for comparison. Due to the
low Q of the inductors, the free-running phase noise is poor. When the ILFD is
locked, the phase noise follows that of the signal source with a 9-10-dB
reduction at large injection power (-3 dBm and 3.7 dBm), which matches well with
the theoretical value of 9.5 dB. For small injection power (-8 dBm), the phase noise
degrades only at large offset frequencies. Fig. 3.24 shows the phase noise performance
across the locking range at an injection power of 3.7 dBm. At small offset
frequencies (<200 kHz), the phase noise degradation at the edges of the locking range is
almost negligible, while it deviates more at larger offsets.
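The quoted theoretical value of 9.5 dB is consistent with the standard 20·log10(N) scaling of phase noise under ideal frequency division or multiplication. The short sketch below applies that general relation; the thesis does not spell out the formula at this point, so treating it as the source of the 9.5-dB figure is an assumption, and the function name is illustrative.

```python
import math


def phase_noise_change_db(frequency_ratio):
    """Ideal phase-noise change (dB) when frequency is scaled by frequency_ratio.

    frequency_ratio = f_out / f_in: 1/3 for a divide-by-three, 2 for a doubler.
    """
    if frequency_ratio <= 0:
        raise ValueError("frequency_ratio must be positive")
    return 20.0 * math.log10(frequency_ratio)


print(f"divide-by-3: {phase_noise_change_db(1 / 3):+.2f} dB")   # -9.54 dB
print(f"doubler:     {phase_noise_change_db(2):+.2f} dB")       # +6.02 dB
print(f"tripler:     {phase_noise_change_db(3):+.2f} dB")       # +9.54 dB
```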

The die photo is shown in Fig. 3.26. The chip size is 0.9 mm × 0.9 mm.

3.3 Injection-Locked Frequency Multipliers (ILFMs)

3.3.1 Dual Modulus ILFM with Good Harmonic Suppression

As discussed in Chapter 2, conventional implementations of frequency multipliers,
with a nonlinear element and a filtering network, cannot provide good harmonic
suppression in CMOS processes, as there are no passive elements with good quality factors.
We propose a new frequency multiplier topology which cascades an injection-locked
oscillator (ILO) after the harmonic generator to perform the frequency selection.
Because an ILO can be locked by a signal much smaller than its output, the harmonic
generator can operate at a small output level while the multiplier as a whole
maintains a relatively large output strength. This decouples two factors that are
closely related in the conventional approach: the output power level and the harmonic
generator power. A smaller harmonic generator power also means smaller undesired
harmonics, which are easier to filter out. The ILO after the harmonic generator acts
as a high-gain bandpass amplifier which amplifies the desired harmonic component to
the required signal strength while effectively suppressing the undesired ones. This
new frequency multiplier topology can also be easily configured to work at different
multiplication ratios, for example as a frequency doubler or tripler, depending on the
relation between the input frequency and the ILO oscillation frequency. If the ILO
frequency is twice the input frequency, the multiplier works as a doubler; if the ILO
frequency is three times the input frequency, the multiplier is a tripler. A mode
control can be used to switch the circuit between the two operation modes to achieve
a dual-modulus frequency multiplier.

The proposed injection-locked frequency multiplier (ILFM) integrates an
injection-locked oscillator (ILO) [20] with a harmonic generator to perform the frequency
selection, instead of relying solely on the output filter as in the conventional frequency
multiplier. The architecture of such a frequency multiplier is shown in Fig. 3.27.
The harmonic generator generates a harmonic-rich current, which is filtered by a filter
network composed of a tunable switched-capacitor tank and an on-chip transformer.
The transformer also serves as the input device of the last-stage ILO [?]. The ILO
has a natural oscillation frequency at or near the desired output frequency, tunable
by a switched capacitor, and is locked by the desired harmonic generated
by the harmonic generator.

There are several benefits of the new topology over the conventional approach. The

first one is better harmonic suppression in a lossy process. Because an ILO can be

locked by a very small input signal, the harmonic generator in this topology can run at

a relatively low power level. This means that there are only insignificant undesired


Figure 3.27: Schematic of the dual-modulus injection-locked frequency multiplier.

harmonics at the harmonic generator output, which will be largely suppressed by
the two-stage filtering in the frequency multiplier. No further filtering is necessary
since the harmonic suppression requirement is already satisfied at the multiplier output.
Since an ILO can be built with low-Q passive devices, the proposed topology can provide
large harmonic suppression in a digital CMOS technology. Compared with the
injection-locked frequency multipliers proposed in [51, 52], the harmonic-rich current is
bandpass filtered first before it feeds into the ILO, so the new topology can achieve
superior harmonic suppression within a compact structure. The harmonic suppression
for such a topology can be expressed as

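\[
HS_{m,n,inj} = \left| \frac{I_{m\omega_0}}{I_{n\omega_0}} \right|
\left| \frac{Z_{in}(m\omega_0)}{Z_{in}(n\omega_0)} \right|
\left| \frac{Z_{osc}(m\omega_0)}{Z_{osc}(n\omega_0)} \right| \frac{1}{\eta} \tag{3.34}
\]

in which the extra term $\left| Z_{in}(m\omega_0) / Z_{in}(n\omega_0) \right|$ can significantly
improve the harmonic suppression performance.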
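To get a feel for the size of one impedance-ratio term in (3.34), the sketch below evaluates it for an ideal parallel RLC tank tuned to the desired harmonic. This is a simplified model and not the thesis's own calculation: it assumes a single-resonance tank, uses the symmetric-inductor Q values reported for the prototype (2.9 at 3.2 GHz and 3.9 at 4.8 GHz) as the loaded tank Q, and the function name is illustrative.

```python
import math


def rlc_impedance_ratio_db(q, f_desired, f_undesired):
    """|Z(f_desired) / Z(f_undesired)| in dB for a parallel RLC tank tuned to f_desired."""
    detune = f_undesired / f_desired - f_desired / f_undesired
    # |Z| = R / sqrt(1 + (q * detune)^2), so the ratio is sqrt(1 + (q * detune)^2)
    return 10.0 * math.log10(1.0 + (q * detune) ** 2)


F_IN = 1.6e9  # input frequency of the prototype (Hz)

# Assumption: the reported inductor Q is used directly as the tank Q.
doubler_db = rlc_impedance_ratio_db(2.9, 2 * F_IN, F_IN)
tripler_db = rlc_impedance_ratio_db(3.9, 3 * F_IN, F_IN)

print(f"doubler, fundamental rejection per tank: {doubler_db:.1f} dB")
print(f"tripler, fundamental rejection per tank: {tripler_db:.1f} dB")
```

Under these assumptions each tank contributes on the order of 13 dB (doubler) or 20 dB (tripler) of fundamental rejection, which illustrates why the additional input-filter term matters even with low-Q passives.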

The second benefit of the new topology is that it decouples the output power of

the frequency multiplier from that of the harmonic generator. These two are directly

Figure 3.28: (a) Harmonic current generation vs. input voltage for a zero-biased NMOS transistor harmonic generator. (b) Ratio of harmonic current to fundamental vs. input voltage for the same harmonic generator.

correlated in the conventional design. In order to increase the output power level, the

conventional design would have to drive the harmonic generator with larger power,

but this also increases the power of the undesired harmonics. In the new topology,

the output power is determined by the bias of the ILO, and hence can be tuned

independently from the power level of the harmonic generator.

Another benefit of the new ILFM topology is that it can be easily implemented for
variable multiplication ratios. For example, if the ILO natural frequency is designed
to be twice the input frequency, the multiplier works as a doubler; if it is three times
the input frequency, it works as a tripler. Dual-modulus operation can be achieved by
switching the ILO natural oscillation frequency between these two frequencies.

As shown in Fig. 3.27, we choose an NMOS transistor with zero bias voltage
as the harmonic generator. This harmonic generator always works in class-C mode,
as the conduction angle is always smaller than π. As the input voltage increases,
the generated harmonic currents also increase, but at the cost of higher dc current
consumption and higher fundamental current. This is illustrated by the simulated
current components of such a zero-biased NMOS transistor harmonic generator,
shown in Fig. 3.28.
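The trend described above can be reproduced qualitatively with an idealized device model. The sketch below is not the simulation behind Fig. 3.28: it assumes a hypothetical square-law NMOS with an arbitrary transconductance parameter k and threshold voltage v_th, driven by a sinusoid at zero gate bias, and computes the dc and harmonic current amplitudes by direct Fourier summation.

```python
import math


def drain_current(v_gate, k=1e-3, v_th=0.5):
    """Hypothetical square-law NMOS: i = k * (v - v_th)^2 above threshold, else 0."""
    over = v_gate - v_th
    return k * over * over if over > 0.0 else 0.0


def harmonic_amplitudes(v_in, max_harmonic=3, samples=4096):
    """DC and harmonic amplitudes of the drain current for v_gate = v_in * cos(theta)."""
    thetas = [2.0 * math.pi * i / samples for i in range(samples)]
    currents = [drain_current(v_in * math.cos(t)) for t in thetas]
    dc = sum(currents) / samples
    harmonics = []
    for n in range(1, max_harmonic + 1):
        # The waveform is even in theta, so only cosine terms are non-zero.
        a_n = 2.0 / samples * sum(i * math.cos(n * t) for i, t in zip(currents, thetas))
        harmonics.append(a_n)
    return dc, harmonics


def conduction_angle(v_in, v_th=0.5):
    """Conduction angle (rad) at zero gate bias; 0 if the device never turns on."""
    return 2.0 * math.acos(v_th / v_in) if v_in > v_th else 0.0


for v_in in (0.6, 0.8, 1.0):
    dc, (i1, i2, i3) = harmonic_amplitudes(v_in)
    print(
        f"Vin={v_in:.1f} V  angle={math.degrees(conduction_angle(v_in)):.0f} deg  "
        f"dc={dc * 1e6:.1f} uA  I1={abs(i1) * 1e6:.1f} uA  "
        f"I2={abs(i2) * 1e6:.1f} uA  I3={abs(i3) * 1e6:.1f} uA"
    )
```

In this model the conduction angle stays below 180 degrees for any input amplitude, and both the dc and the fundamental current grow with the input level, in line with the trade-off described in the text.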

Figure 3.29: Model of the filter between the harmonic generator and the ILO. The transformer model neglects the parasitic inductance.

The filter between the harmonic generator and the ILO can be modeled as in Fig. 3.29,
where the leakage inductance of the transformer has been neglected. After transferring
the load impedance ZL to the primary, the filter structure becomes a parallel RLC
network driven by the current source of the harmonic generator. In order to maximize
the voltage swing of the desired harmonic component on the load ZL while
suppressing the undesired harmonics, the tuning capacitance Ct must be tuned
so that it forms a parallel resonance at the desired harmonic. This
tuning capacitance includes the parasitic capacitance of the harmonic generator and
transformer, and can be made tunable to enable modulus change.
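The resonance condition fixes the total tuning capacitance for each modulus. The sketch below uses the standard LC resonance formula with a hypothetical primary inductance of 2 nH (the transformer inductance is not given in this section), so the absolute capacitance values are placeholders; only the ratio between the two modes is independent of that assumption.

```python
import math


def resonant_capacitance(inductance_h, frequency_hz):
    """Total capacitance (F) that resonates with inductance_h at frequency_hz."""
    omega = 2.0 * math.pi * frequency_hz
    return 1.0 / (omega**2 * inductance_h)


L_PRIMARY = 2e-9   # hypothetical primary inductance (H); not given in the text
F_IN = 1.6e9       # input frequency of the prototype (Hz)

c_doubler = resonant_capacitance(L_PRIMARY, 2 * F_IN)
c_tripler = resonant_capacitance(L_PRIMARY, 3 * F_IN)

print(f"doubler (3.2 GHz): {c_doubler * 1e12:.2f} pF")
print(f"tripler (4.8 GHz): {c_tripler * 1e12:.2f} pF")
print(f"capacitance ratio: {c_doubler / c_tripler:.2f}")   # (3/2)^2 = 2.25
```

Whatever the actual inductance, switching from tripler to doubler mode requires the total capacitance, parasitics included, to increase by a factor of 2.25.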

The final stage ILO is in a transformer-direct-injection topology [?]. The filtered

harmonic components are applied to the sources of the cross pair in the final stage

injection-locked oscillator by the transformer. These injections are amplified in a

common gate configuration by the cross pair before feeding into the LC tank of the

oscillator. This arrangement reuses the gain from the cross pair without extra power

consumption.

The dual-modulus ILFM prototype is designed and fabricated to verify the benefits

of the new technique. The schematic of the prototype dual-modulus ILFM is shown

in Fig. 3.27. It is designed with input frequency of 1.6 GHz and output frequencies

Figure 3.30: Die photo of the dual-modulus frequency multiplier. Osc is the oscillator core, T1 is the transformer and M1 is the harmonic generator.

of 3.2 GHz and 4.8 GHz in doubler mode and tripler mode, respectively.

The harmonic generator is a transistor in common-source configuration with its
gate tied to ground through a large resistor R1. Such a design makes the transistor
always operate in the class-C region at different input levels. The ILO is built as a
differential LC oscillator. A switched capacitor array Cs1 is used in the tank to switch

its natural oscillation frequency between 3.2 GHz and 4.8 GHz. The transformer-

direct-injection topology injects the input signal differentially to the ILO tank and

is effective for locking to odd harmonics, including fundamental, of its natural oscil-

lation frequency. In an ILFM, the ILO natural oscillation frequency is designed at

the desired harmonics of the input frequency, which is generated by the harmonic

generator. Since the input of this ILO topology is inductive, a capacitor Ct1 is added

in parallel to form a resonant filter, which adds another stage of frequency selection.

Another switched capacitor Cs2 is added in parallel to switch the resonant frequency

between 3.2 GHz and 4.8 GHz. An open drain differential buffer is used to facilitate

measurements in a 50-ohm measurement system.

Figure 3.31: Locking ranges of doubler and tripler modes vs. input levels, which determine the output frequency ranges of the frequency multiplier.

The circuit prototype was fabricated in a 0.18-µm standard digital CMOS process
with a low-resistivity substrate. The transformer and the symmetric inductor are both
implemented with the 0.35-µm-thick top metal layer. The transformer has a k factor of
0.77 at both 3.2 GHz and 4.8 GHz. The symmetric inductor has an inductance of 4 nH
and quality factors of 2.9 and 3.9 at 3.2 GHz and 4.8 GHz, respectively. The die photo
of the circuit prototype is shown in Fig. 3.30; the effective circuit size without
pads is 0.4 mm by 0.1 mm, and the whole test chip with pads measures 0.8 mm by 0.6 mm.

The dual-modulus injection-locked frequency multiplier is measured on a probe

station. The input signal is from a continuous-wave signal generator. The differential

output from the open drain buffer is measured by an SGS probe and only one branch

is measured in single-ended fashion. Locking ranges, which determine the output

frequency ranges, and harmonic suppressions of both doubler and tripler modes are

measured at different input amplitudes. Input amplitude is calculated based on the


Figure 3.32: Harmonic suppressions of doubler and tripler modes vs. input levels.

power reading of the signal generator and the S11 of the injection port, with cable and
connector losses calibrated out.
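One common way to turn an incident power reading and a reflection coefficient into a port voltage is the transmission-line relation V = V_inc·|1 + S11|. The sketch below applies that relation; the thesis does not state the exact formula it used, so this is an assumption, and the function name and parameters are illustrative. Note that the relation needs the complex S11, not only its magnitude in dB.

```python
import math


def port_voltage_amplitude(p_inc_dbm, s11, loss_db=0.0, z0=50.0):
    """Peak voltage at a port from incident power and complex reflection coefficient.

    p_inc_dbm: source power reading (dBm)
    s11:       complex reflection coefficient at the port (linear, not dB)
    loss_db:   cable/connector/probe loss between source and port (dB)
    z0:        reference impedance (ohms)
    """
    p_inc_w = 10.0 ** ((p_inc_dbm - loss_db - 30.0) / 10.0)
    v_incident = math.sqrt(2.0 * z0 * p_inc_w)   # peak amplitude of the incident wave
    return v_incident * abs(1.0 + s11)            # total voltage = incident + reflected


# 0 dBm into a perfectly matched port: sqrt(2 * 50 * 1 mW) = 0.316 V peak.
print(f"{port_voltage_amplitude(0.0, 0j):.3f} V")
# Same power with 2 dB of cable loss and a hypothetical S11 of 0.3 at 45 degrees.
gamma = 0.3 * complex(math.cos(math.pi / 4), math.sin(math.pi / 4))
print(f"{port_voltage_amplitude(0.0, gamma, loss_db=2.0):.3f} V")
```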

As shown in Fig. 3.31, the locking ranges increase with input amplitude, reaching
35.0% and 13.5% for doubler and tripler mode, respectively, both at
1-V input. Harmonic suppression, on the other hand, decreases with increasing input
amplitude, as shown in Fig. 3.32. At the minimum input level, the doubler mode shows a
fundamental suppression of 49 dB, and the tripler mode shows fundamental and
second-harmonic suppression of 62 dB and 54 dB, respectively. In order to compensate
for PVT (process, voltage and temperature) variations, a 5% locking range is usually
required. To achieve a 5% locking range in doubler mode, the input level is
0.48 V and the fundamental suppression is 42 dB; in tripler mode, the input level
is 0.64 V and the fundamental and second-harmonic suppression is 40 dB and 32
dB, respectively (Fig. 3.33). The trade-offs between power consumption and locking
range are shown in Fig. 3.34. A larger input level gives a larger locking range, but at
the cost of more power consumption. At 0.48-V input in doubler mode, the core circuit
without the buffer consumes 2.2 mW of dc power; at 0.64-V input in tripler mode, it
consumes 3.7 mW.

The phase noise of both the doubler and tripler modes is shown in Fig. 3.35, in
comparison with their free-running cases and the measured input from the signal generator.

Figure 3.33: Output spectra of doubler (a) and tripler (b) modes showing the harmonic suppression at the input levels for 5% locking range.

Figure 3.34: Power and locking range trade-offs for doubler and tripler modes.

As the phase noise of the signal generator is much lower than the noise floor of the

spectrum analyzer, the measured input phase noise and output phase noise already

hit the noise floor. Thus, we cannot see the theoretical 6-dB and 9.5-dB phase noise

degradation from input to output for doubler and tripler, respectively. On the other
hand, compared with the free-running phase noise, phase noise suppression as large as
60 dB can be observed at an offset frequency of 10 kHz.

Due to the difficulty of capturing the dynamics of modulus change in measurement,
we simulate the multiplication modulus change in ADS with a control ramp time
of 5 ns. The simulated dynamics are shown in Fig. 3.36. The nanosecond-scale
transition time shows that an ILFM can be used in applications where fast modulus
switching is critical, such as clocking.

Table 3.1 compares the performance of the ILFM with some recently published
frequency multipliers. It can be seen that the proposed ILFM achieves
superior suppression of the undesired harmonics even when compared with devices
fabricated in advanced technologies such as SOI CMOS, SiGe, InGaAs, InGaP and GaAs.
Additionally, because it uses low-Q inductors, which tend to occupy less area than

Figure 3.35: Output phase noise of the doubler and tripler in comparison with free-running conditions.

Figure 3.36: Simulated transient of the modulus change from tripler to doubler for the dual-modulus ILFM, showing a transition time in the ns range. A limiting amplifier is added at the output to balance the amplitudes of the two modes.

Table 3.1: Performance Comparison with Other Works

| Process | Multiplication ratio | Output frequency (GHz) | Pdc (mW) | Fundamental suppression (dB) | 2nd harmonic suppression (dB) | Chip size without pads (mm × mm) | Reference |
|---|---|---|---|---|---|---|---|
| CMOS | Doubler | 3.2 | 2.2 | 42 | NA | 0.1 × 0.4 | This work |
| CMOS | Tripler | 4.8 | 3.7 | 40 | 32 | 0.1 × 0.4 | This work |
| CMOS | Doubler | 5 | 12.6 | 20 | NA | 0.6 × 0.6 | [46] |
| SOI CMOS | Doubler | 5.2 | 10 | 11 | NA | 0.29 × 0.13 | [47] |
| SiGe | Tripler | 60 | 54 | 25 | NA | NA | [40] |
| SiGe HBT | Doubler | 16 | 22 | 25 | NA | 0.7 × 0.35 | [44] |
| SiGe HBT | Doubler | 36 | 95 | 35 | NA | 0.7 × 0.5 | [44] |
| InGaAs PHEMT | Tripler | 36 | 18.9 | 21.4 | 22.3 | 2 × 2.5 | [45] |
| InGaP HBT | Doubler | 16 | 200 | 25 | NA | 0.7 × 0.4 | [83] |
| SiGe HBT | Doubler | 30 | 185 | 22 | NA | 0.45 × 0.55 | [84] |
| GaAs PHEMT | Doubler | 56 | 70 | 29 | NA | 1.4 × 0.64 | [85] |
| GaAs HEMT | Doubler | 56 | 275 | 23 | NA | 1.6 × 1.2 | [86] |

high-Q inductors, the circuit area of the ILFM is smaller than that of the other works.


Chapter 4

Injection-Locked Clock Distribution

4.1 Injection-Locked Clocking

Figure 4.1: Injection-locked clocking (ILC).

We propose a new clocking scheme based on injection locking as shown in Fig. 4.1.

Similar to conventional clocking, the global clock is generated by an on-chip PLL and

distributed by a global tree. The difference is that we use injection-locked oscilla-

tors (ILOs) to regenerate local clocks, which are synchronized to the global clock


through injection locking. Another difference is that most global clock buffers in
conventional clocking are removed because the sensitivity of ILOs is much greater
than that of digital buffers (see detailed discussion below). Essentially, we use ILOs as local

clock receivers, similar to the idea of clock recovery in communication systems. Note

that this is different from resonant clocking [73], where all the oscillators are coupled

together.

In addition to acting as clock receivers, ILOs can be constructed as frequency
multipliers [76] or dividers [22, 39], and hence this scheme enables local clock
domains to have higher (nf0) or lower (f0/m) clock speeds than the global clock (f0).
Such a global-local clocking scheme with multiple-speed local clocks offers
significant improvements over the conventional single-speed clocking scheme in terms of
power consumption, skew, and jitter.

4.1.1 Power Savings

Injection-locked clocking (ILC) can lead to significant power savings in high-

performance microprocessors. The benefits come from several sources. First, the

possible combination of a low-speed global clock and high-speed local clocks can

reduce the power consumption in the global clock distribution network. In the con-

ventional approach, this would require multiple power-hungry PLLs for frequency

multiplication. An ILO consumes much less power than a PLL because of its
circuit simplicity [21]. This will become more evident in multi-core processors.

Second, ILOs have higher sensitivity than clock buffers (inverters). As a synchro-

nized oscillator, an ILO effectively has very large voltage gain when the injection

signal amplitude is small, while the gain of an inverter is much smaller (Fig. 4.2).

Figure 4.2: Voltage gain of an inverter and an injection-locked oscillator at different input signal levels.

This can be easily understood if we realize that synchronization in an ILO is usually
achieved over tens to hundreds of clock cycles, and hence only a small amount of
injection locking force is needed in each clock cycle, whereas an inverter needs to
change its state twice in every clock cycle. As a result of this difference, the signal
amplitude of

the global clock can be much smaller in the injection-locked clocking scheme, which

results in less power loss on the parasitic capacitance and resistance of the global-

clock distribution network. This will be increasingly attractive as the interconnect

loss becomes a dominant factor as the process technology scales further.

Further, the number of clock buffers in the global clock distribution can be re-

duced. In conventional clocking, in order to minimize jitter generated by clock buffers,

the global clock signal needs to be driven from rail to rail throughout the whole net-

work, and in turn many clock buffers are inserted. In injection-locked clocking, ILOs

can achieve good jitter performance with small input signal amplitude. Therefore,

the global clock signal amplitude no longer needs to be full swing, and few (if any)
clock buffers are needed on the global tree. The reduced number of clock buffers
directly translates into lower power consumption (Fig. 4.3).

More importantly, because injection-locked clocking significantly lowers skew and

jitter in the global clock, the timing margin originally allocated can be recovered, and

used for circuit operation. This can enable a faster clock speed. Alternatively, we can
trade it for a lower power supply voltage (Vdd) and reduce power dissipation not only in
the clock distribution network, but in all the logic gates on the chip.


Figure 4.3: Power savings of ILC relative to conventional clock distribution.

4.1.2 Skew Reduction and Deskew Capability

Skew in conventional buffered-tree-based clock distribution is introduced mainly
by the clock buffer mismatch between different clock branches. In resonant clocking,
even though there are fewer clock buffers, and thus less skew induced by buffer mismatch,
the resonators themselves can still generate significant skew due to the resonant
frequency mismatch between different resonators. This is illustrated in Fig. 4.4,
where the skew introduced by such frequency mismatch between resonators is
plotted versus the quality factor of the resonators. The plot shows that a significant
portion of a clock cycle can be consumed by the skew introduced by resonant frequency
mismatch.
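The dependence of skew on resonator Q can be sketched with the phase response of an ideal parallel RLC tank driven slightly off resonance. This is a textbook second-order model offered as an assumption; it is not necessarily the model used to generate Fig. 4.4, and the function name and the 2% frequency error are illustrative.

```python
import math


def skew_fraction_of_cycle(q, freq_error):
    """Skew, as a fraction of the clock period, caused by a detuned resonator.

    q:          quality factor of the resonator
    freq_error: (f_res - f_clk) / f_clk, fractional resonant-frequency error
    """
    ratio = 1.0 / (1.0 + freq_error)               # f_clk / f_res
    phase = -math.atan(q * (ratio - 1.0 / ratio))  # tank impedance phase (rad)
    return phase / (2.0 * math.pi)


FREQ_ERROR = 0.02  # hypothetical 2% resonant-frequency mismatch

for q in (2, 5, 10, 20):
    skew = skew_fraction_of_cycle(q, FREQ_ERROR)
    print(f"Q = {q:2d}: skew = {100 * skew:.1f} % of a clock cycle")
```

In this model a higher Q makes the phase slope near resonance steeper, so the same frequency mismatch produces more skew, up to a quarter of a cycle in the limit.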

Figure 4.4: Skew introduced by resonant frequency error vs. quality factor of the resonator in resonance-based clock distribution.

Conventional active deskew methods [58, 87] compensate the skew by adding tun-

able delays to different clock paths. They are designed to reduce the clock skew

after the chip fabrication, and capable of tracking the skew variations dynamically.

The tunable delay is typically implemented by active delay lines which are loaded

with switched-capacitor arrays [58], or built with current-starved buffers [87]. These

approaches proved effective in conventional clocking and have been applied to res-

onant clocking [88]. However, adding active delay lines has several disadvantages.

First, it consumes extra power; second, it increases the clock latency substantially

due to the delay tuning requirement; most importantly, the extra active delay line

tends to degrade the clock jitter significantly. This is because power supply noise

coupled through clock buffers is the main contributor to jitter accumulation in the

conventional clock distribution [60], and adding active delay lines for deskew further

increases the length of the buffer chain in the clock signal path.

Injection-locked clocking can have better skew performance than conventional
buffered-tree-based clock distribution for two reasons. First, because the
number of buffers is reduced in the new clocking scheme, skew due to mismatch
of clock buffers is reduced compared to conventional clocking. More importantly,

injection-locked clocking provides a built-in mechanism for deskew. The phase dif-

ference between the input and output signals of an ILO can be tuned by adjusting

the center frequency of an ILO. This phase tuning capability enables ILOs to serve

as built-in “deskew buffers”. In turn, removing dedicated deskew buffers not only

saves power, but also reduces their vulnerability to power supply noise. Similar to ac-

tive deskewing in conventional clocking, phase detectors can be placed between some

local clock domains to check skew and then tune corresponding ILOs. Note that

this is different from distributed PLL approach [66, 67], where phase detectors have

to be added between all adjacent clock domains for frequency synchronization, and

then possibly for deskew. In injection-locked clocking, frequency synchronization is

achieved by injection locking, and the phase detection is used for deskew only. In other

words, injection-locked clocking with deskew tuning is a dual-loop feedback system,
and therefore provides both good tuning speed and small phase error (residual skew).
Because of the excellent built-in deskew capability of ILOs, it can be expected that an
injection-locked clock tree has much more freedom in its physical design (layout).
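The deskew mechanism described above, tuning the ILO center frequency to shift its output phase, can be illustrated with Adler's steady-state relation for a weakly injected LC oscillator. This is a sketch under stated assumptions: weak fundamental injection, a hypothetical 5-GHz clock, tank Q of 5 and injection ratio of 0.2, none of which come from the thesis. The sign of the phase depends on the convention chosen.

```python
import math


def ilo_phase_shift(f_inj, f_free, q, injection_ratio):
    """Steady-state input-to-output phase (rad) of a weakly injected LC ILO.

    Uses Adler's relation sin(theta) = (f_free - f_inj) / f_lock, where the
    one-sided lock range is f_lock = f_free / (2 * q) * injection_ratio and
    injection_ratio is the injected-to-oscillator current ratio.
    """
    f_lock = f_free / (2.0 * q) * injection_ratio
    x = (f_free - f_inj) / f_lock
    if abs(x) > 1.0:
        raise ValueError("injection frequency is outside the locking range")
    return math.asin(x)


def deskew_time(f_inj, f_free, q, injection_ratio):
    """Time shift (s) of the local clock corresponding to the phase shift."""
    return ilo_phase_shift(f_inj, f_free, q, injection_ratio) / (2.0 * math.pi * f_inj)


F_CLK = 5e9            # hypothetical global clock frequency (Hz)
Q_TANK, RATIO = 5.0, 0.2   # hypothetical tank Q and injection ratio

for f_free in (4.95e9, 4.98e9, 5.00e9, 5.02e9, 5.05e9):
    dt = deskew_time(F_CLK, f_free, Q_TANK, RATIO)
    print(f"f_free = {f_free / 1e9:.2f} GHz -> time shift = {dt * 1e12:+.1f} ps")
```

The phase is zero when the free-running frequency equals the injected frequency and approaches plus or minus 90 degrees at the edges of the locking range, which bounds the available deskew range per ILO.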

4.1.3 Jitter Reduction and Suppression

Injection-locked clocking can significantly reduce jitter in global clock distribution

networks. First, the reduced number of global clock buffers also means less pick-up of

power supply and substrate noise, and hence less jitter generation and accumulation,

as shown in Fig. 4.5a. Second, because of the design freedom in layout, clock inter-

connect can be placed where there is minimal noise coupling from adjacent circuits

and interconnects. In addition, similar to a PLL, an ILO can suppress both its inter-

nal noise through high-pass filtering and input noise through low-pass filtering, and

hence can possibly lower jitter at its output [17, 21]. Using a differential structure,

an ILO can be less sensitive to common-mode power supply and substrate noise
than an inverter by design. Therefore, injection-locked clocking is likely to achieve
better jitter performance than conventional clocking.
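The filtering behavior mentioned above, low-pass for input noise and high-pass for the oscillator's own noise, is often approximated by a first-order model with a single corner at the locking bandwidth. The sketch below implements that approximation; it is a simplified model rather than the analysis in [17, 21], and the PSD values and the 100-MHz locking bandwidth are hypothetical.

```python
import math


def ilo_output_phase_noise(f_offset, s_in, s_free, f_lock):
    """First-order model of ILO output phase-noise PSD (linear units, e.g. rad^2/Hz).

    s_in:   phase-noise PSD of the injected clock at f_offset
    s_free: phase-noise PSD of the free-running oscillator at f_offset
    f_lock: locking bandwidth (Hz), used as the corner of both filters
    """
    x2 = (f_offset / f_lock) ** 2
    low_pass = 1.0 / (1.0 + x2)    # applied to the input noise
    high_pass = x2 / (1.0 + x2)    # applied to the oscillator's own noise
    return s_in * low_pass + s_free * high_pass


S_IN, S_FREE = 1e-14, 1e-8   # hypothetical flat PSDs (rad^2/Hz)
F_LOCK = 1e8                 # hypothetical locking bandwidth (Hz)

for f_offset in (1e4, 1e6, 1e8, 1e10):
    s_out = ilo_output_phase_noise(f_offset, S_IN, S_FREE, F_LOCK)
    print(f"offset {f_offset:.0e} Hz: {10 * math.log10(s_out):.1f} dB(rad^2/Hz)")
```

Well inside the locking bandwidth the output tracks the injected clock, and well outside it the output reverts to the free-running noise, which is why a wider locking range helps suppress the ILO's own noise.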

Figure 4.5: Illustration of ILC jitter suppression in comparison with conventional clocking.

Compared to other resonance-based clocking schemes proposed recently [71, 89],

ILC’s jitter performance is not limited by the quality factor Q of on-chip resonator,

which explains why injection locking has recently been adopted for resonant clocking

[90, 91, 88].

4.1.4 Potential Applications

With these technical advantages, ILC can be used to improve high-end

microprocessors and the design process in many ways:

First, ILC reduces jitter and skew compared to a conventional clocking network.

This reduces cycle time and therefore allows a faster clock speed. As technology

scaling improves transistor performance but does not reduce jitter and skew (which

actually increase), the improvement in clock speed will be more pronounced over time.

Although further increasing whole-chip clock speed finds limited practical appeal in
today’s setting, it may still be effective in certain specialized engines inside a
general-purpose architecture.

Second, using ILC, clock distribution for a multi-core system is a natural exten-

sion from a single-core system. A conventional clocking scheme would require adding

chip-level PLLs. PLLs are bulky and particularly vulnerable to noise and hence
usually placed at the very edge of a chip. In future multi-core systems, placing PLLs
and routing high-speed clock signals to the destination cores represents a significant
challenge. In contrast, in ILC, a single medium-speed global clock signal can

be distributed throughout the chip and locally each core can multiply the frequency

according to its need.

Third, even in a single-core architecture, different macroblocks can run at different

frequencies. This is referred to as the multiple clock domain (MCD) approach [92, 93].

Using ILC, we can locally multiply (or divide) the frequency of the single global clock.

One significant advantage of using ILC to enable multiple clock domains is that the

local clocks have a well-defined relationship as they are all synchronized to the global

clock. As a result, cross-domain communication can still be handled by synchronous

logic without relying on asynchronous circuits. Note that although ILOs are not as

flexible as PLLs in frequency multiplication, they are sufficient for MCD processors

as only a few frequency gears are needed for practical use [94].

4.2 Architecture Level Evaluation of ILC Power Impact

We quantitatively demonstrate some benefits of ILC in a most straightforward

setting, a single-core processor running at a single clock frequency. We focus on

the energy benefits in this case study and compare processors that only differ in the

global clock distribution. Due to the limited availability of detailed characterization of

clocking network in the literature, our choice of the clocking network closely resembles

that of the baseline processor. Note that this is far from the optimal ILC design for

the given processor, but demonstrates significant benefits of ILC nonetheless.

Our baseline processor is Alpha 21264, which has the most details in public domain

on its clock distribution network [95, 96]. In this processor, an on-chip PLL drives an

X-tree, which in turn drives a two-level clocking grid containing a global clock grid

and several major clock grids. The major clock grids cover about 50% of the chip

area and drive local clock chains in those portions. The remaining part of the chip is

directly clocked by the global clock grid. The densities of the two levels of grids are

different. This configuration is illustrated in Fig. 4.6-a. The three planes X, G, and

M represent the three layers of clock distribution networks: the X-tree, the global

clock grid, and the major clock grids, respectively.

In the first ILC configuration (Fig. 4.6-b), we only replace the very top level of

the clock network (X). We remove all buffers in the X-tree trunk and replace the final

level of buffers (a total of 4) with ILOs. The rest of the hierarchy remains unchanged.

Note that in contrast to the Alpha implementation, we send low-swing signals on

the X-tree, which reduces the energy consumption of the top level clock network.

Furthermore, as discussed before, clock jitter and skew will also reduce. We convert

this timing advantage into energy reduction by slightly reducing the supply voltage.

While such a simple approach of using ILC as a drop-in replacement already

reduces energy consumption, it is not fully exploiting the benefits of ILC. As discussed

before, numerous ILOs can be distributed around the chip to clock logic macro-blocks.

Thanks to the built-in deskew capability, we can avoid using power-hungry clock grids

altogether. However, to faithfully model and compare different approaches, we need

parameters (i.e., capacitance load of individual logic macroblocks) for circuit-level

simulation which we could not find in the literature. As a compromise, in the second

Figure 4.6: Illustration of the three different configurations (a-c) of global clock distribution, and a possible floorplan (d) for the ILC-based global clock distribution in Alpha 21264. Each configuration is designated according to its clocking network: XGM, IGM, and IM′.

ILC configuration, we still use grids, but use only a single level of grids, which consist

of all the major clock grids and the portion of the global grid that directly feeds logic

circuit (Fig. 4.6-c). With this configuration, the load of the clock network can be

derived based on results reported in [95, 96] and technology files. Finally, thanks to

the deskew capability of ILOs, there is no need to use a balanced global clock tree. In

Figure 4.6-d, we show an example clock tree design. In this example, each macroblock

in the floorplan is driven by an ILO which is at the leaf of the global clock tree.

To evaluate the benefits of injection-locked clocking, we perform both circuit- and

architecture-level simulations (in collaboration with Prof. Huang’s group) on the

baseline processor with each clock distribution configuration in Fig. 4.6. In order to

reflect the state of the art, we scale the global clock speed from 600 MHz to 3 GHz,

and correspondingly the process technology from 0.35 µm to 0.13 µm. The validity

of scaling is verified using Pentium 4 Northwood 3.0 GHz processor as the reference.

At the circuit level, we use a commercial circuit simulator, Advanced Design System (ADS), to evaluate the power consumption and jitter performance of the clock distribution network in each configuration. The simulations are based on extracted models of the clock distribution networks, including the buffer sizes, interconnect capacitance, and local clock load capacitance. The distribution network model is then combined in the circuit simulation with ILOs and clock buffers constructed from SPICE transistor models.

Figure 4.7: Circuit-level jitter simulation setup. [Block diagram labels: clock source with jitter, noisy power supply (vdd), clock distribution, power meter, clock jitter (clock period distribution, T).]

Since jitter is largely introduced by power supply and substrate noise through

clock buffers, a noise voltage source with a Gaussian distribution is inserted at the

power supply node, as shown in Fig. 4.7. Transient simulation is used to calculate the

voltage and current waveforms along the clock distribution. The output clock waveform is

analyzed statistically to get the distribution of the clock period. Jitter at the output

is then calculated based on this distribution. Jitter is first measured in the baseline

conventional clocking configuration, and the noise source amplitude is determined by

matching the measured jitter to the value reported in [60], 35 ps. The same noise voltage

source is then used in the subsequent jitter simulation for the ILC configurations, and

the results are compared to the baseline configuration. We believe this approach is

actually pessimistic, considering that the target jitter number (35 ps) is among the lowest

reported for conventional clocking [56]. The source jitter from the on-chip PLL is represented

using a built-in ADS model of a clock with jitter, and the clock jitter is chosen to be 5

ps, which is consistent with published jitter values for on-chip PLLs.
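The statistical post-processing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge times are generated synthetically by a hypothetical helper (`synthetic_edges`) rather than taken from an ADS transient run, and jitter is taken to be the RMS deviation of the clock period, which the text does not state explicitly.

```python
import math
import random

def period_jitter(edge_times):
    """Return (mean period, RMS period jitter) from successive rising-edge times."""
    periods = [t1 - t0 for t0, t1 in zip(edge_times, edge_times[1:])]
    mean = sum(periods) / len(periods)
    variance = sum((p - mean) ** 2 for p in periods) / len(periods)
    return mean, math.sqrt(variance)

def synthetic_edges(n_edges, period, sigma, seed=0):
    """Hypothetical stand-in for simulator output: each period is Gaussian."""
    rng = random.Random(seed)
    t, edges = 0.0, []
    for _ in range(n_edges):
        edges.append(t)
        t += rng.gauss(period, sigma)
    return edges

# 3 GHz clock, with the 35 ps baseline jitter target used as the Gaussian sigma.
mean_period, rms_jitter = period_jitter(synthetic_edges(20000, 333.3e-12, 35e-12))
print(f"mean period = {mean_period * 1e12:.1f} ps, RMS jitter = {rms_jitter * 1e12:.1f} ps")
```

In an actual flow, the list of edge times would come from threshold crossings of the simulated output clock waveform.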

For architectural simulations, we use a heavily modified version of SimpleScalar

toolset extended with Wattch for the dynamic energy component, and HotSpot and

BSIM3 models for temperature-dependent leakage modeling in 0.13µm technology

with a vdd of 1.5 V. For brevity, the detailed parameters of the simulation are left in

the technical report [97].

In the circuit simulation, the PLL source jitter is set to 5 ps, and the value of

the added power supply noise source is chosen so that the output clock jitter for

the baseline processor (Fig. 4.6-a) is 35 ps [60]. Apparently, there is 30 ps jitter

added along the clock distribution, which comes from the power supply noise coupled

through the buffers. For the clock speed of 3 GHz, the overall jitter in the baseline

processor therefore corresponds to 10.5% of the clock cycle. In the case of ILC with

IGM configuration (Fig. 4.6-b), under the same power supply noise and source jitter,

the output clock jitter is lowered to 15 ps – a 57% reduction. This translates into

recovering 6% of a clock cycle at 3 GHz, a significant performance improvement.

The jitter reduction can be attributed to the reduced number of clock buffers and

good noise rejection of ILOs. When ILOs are used to directly drive the local clock

grids without the global grid as in IM′ configuration (Fig. 4.6-c), thanks to the further

reduction in the buffer stages, jitter is lowered to 12 ps, or 66% lower than the baseline.

These results clearly demonstrate that ILC can achieve better jitter performance than

conventional clocking.
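The percentages quoted above follow directly from the simulated jitter values and the 3 GHz clock period. The short Python check below reproduces them; the dictionary keys and function names are illustrative only.

```python
CLOCK_HZ = 3e9
PERIOD_PS = 1e12 / CLOCK_HZ  # about 333.3 ps

# Output clock jitter (ps) reported in the text for each configuration.
JITTER_PS = {"XGM": 35.0, "IGM": 15.0, "IM'": 12.0}

def fraction_of_cycle(jitter_ps):
    """Jitter expressed as a fraction of the clock period."""
    return jitter_ps / PERIOD_PS

def reduction_vs_baseline(jitter_ps):
    """Relative jitter reduction with respect to the XGM baseline."""
    return 1.0 - jitter_ps / JITTER_PS["XGM"]

for name, jitter in JITTER_PS.items():
    print(f"{name}: {100 * fraction_of_cycle(jitter):.1f}% of cycle, "
          f"{100 * reduction_vs_baseline(jitter):.0f}% below baseline")
```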

In the current study, it is assumed that built-in deskew capability of ILOs can

reduce the skew to below 15 ps, resulting in 10 ps savings in timing margin compared

to the baseline processor (without any deskew). This estimate is consistent with the

results using existing deskew schemes [56], and hence quite reasonable. In fact, we

believe ILC should lead to even lower skew, which can be supported by a test chip

measurement shown below.

The results of using different clocking structures are summarized in Fig. 4.8. In

this comparison, all configurations achieve the same cycle time. The density of the

grids and the driving capabilities are determined using circuit simulation. We choose

the design point where energy is minimized.

Simulations show that the power consumption of the baseline processor ranges

from 30.4 W to 50.4 W with an average of 40.7 W. The power can be divided into

three categories: global clock distribution power, leakage, and the dynamic power of

the rest of the circuit. The breakdown of the power is shown in Fig. 4.8. The global

clock is unconditional and consumes 9.2 W (23%).

Now we analyze the power savings of ILC. For IGM (Fig. 4.6-b), power savings

come from two factors. First, the power consumed in the top-level X-tree is reduced

from 1.72 W to 1.56 W because of the reduction of the total levels of buffers used

and the lowered voltage swing on the X-tree. Second, as explained above, jitter and

skew both improve when using ILC: a 20 ps reduction in jitter and 10 ps in skew are

achieved. These savings increase the available cycle time for logic from 273 ps to 303

ps. This, in turn, allows a reduction in Vdd without affecting the clock speed. We

use the following voltage-delay equation from [98] to calculate the new Vdd, which is

1.415 V.

$$t = \frac{C}{k'(W/L)(V_{dd} - V_t)} \left[ \frac{2V_t}{V_{dd} - V_t} + \ln\left( \frac{3V_{dd} - 4V_t}{V_{dd}} \right) \right]$$

The power reduction for the tested applications ranges from 3 W to 5.2 W with an

average of 4.1 W, or 10.1%. The reduction is mainly due to the lowering of supply

voltage.
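The voltage-delay relation above can be inverted numerically to find the scaled supply. The following Python sketch does this by bisection. The threshold voltage of 0.4 V is an assumption made purely for illustration (the text does not state the value used), and the prefactor C/(k′W/L) cancels because only the ratio of delays matters. With this assumed threshold, the solver lands close to the 1.415 V quoted above.

```python
import math

def gate_delay(vdd, vt, scale=1.0):
    """Delay from the voltage-delay equation; scale stands for C / (k' W/L)."""
    head = vdd - vt
    return scale / head * (2 * vt / head + math.log((3 * vdd - 4 * vt) / vdd))

def scaled_vdd(vdd_nom, vt, t_old, t_new, tol=1e-6):
    """Find the supply at which the delay grows by the factor t_new / t_old."""
    target = gate_delay(vdd_nom, vt) * t_new / t_old
    lo, hi = 2 * vt, vdd_nom  # search interval; the lower bound is an assumption
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gate_delay(mid, vt) > target:
            lo = mid  # too slow at this supply: raise the voltage
        else:
            hi = mid
    return 0.5 * (lo + hi)

VT_ASSUMED = 0.4  # volts, hypothetical; not taken from the text
new_vdd = scaled_vdd(1.5, VT_ASSUMED, t_old=273.0, t_new=303.0)
print(f"scaled Vdd = {new_vdd:.3f} V")
```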

The second ILC configuration, IM′ (Fig. 4.6-c), further reduces clock distribution

power by reducing the size of the grid. For IM′, the global clock power is reduced to

5.9 W (from 9.2 W in XGM) and the combined jitter and skew reduction is 33 ps,

which allows us to scale Vdd to 1.41 V. The overall effect is an average of 6.8 W (17%)

total power reduction. Compared to IGM, IM′ further reduces power by 2.7 W, or

7%.

Power (W)        XG′    XGM    IGM    IM′
Leakage power    5.9    5.7    5.6    5.5
Circuit power   27.2   25.9   23.0   22.5
Clock power     15.6    9.2    8.0    5.9

Figure 4.8: Breakdown of processor power consumption with different clock distribution methods.

For reference, we also show the result of replacing the two levels of grids by a

single grid in the conventional configuration. Note that this grid is different from

the M′ grid as it needs higher density and larger buffers to achieve the same overall

cycle time target. We designate this grid G′, and the configuration XG′. We use

the same methodology to compute its jitter performance, clocking load, and power

consumption. From the results, it is clear that ILC significantly improves power

consumption. It is also clear that using a single-level grid per se is not the source

of energy savings for IM′: using a single grid in the conventional design leads to a

significant 7.9 W of extra power consumption.

Overall, we see that ILC can be introduced into a processor with varying degrees of

design intrusion. With minimum intrusion, when only the very top level of the clock tree

is modified to use injection locking, energy reduction is already significant (10%),

thanks to the lowered jitter and skew. When we further optimize the clocking grid,

the power savings become more pronounced (17%). All these are achieved without

affecting performance or the design methodology of the processor.
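As a cross-check, the totals and savings quoted in this section can be recomputed from the bar labels of Fig. 4.8. The Python sketch below assumes that the three rows of labels correspond to leakage, circuit, and clock power in that order, which matches the 9.2 W and 5.9 W clock power figures given in the text. The results agree with the quoted 4.1 W, 6.8 W, and 7.9 W to within rounding of the per-bar values.

```python
# Power breakdown (W) read from the bar labels of Fig. 4.8.
# The row-to-category assignment is an assumption, as noted above.
POWER_W = {
    "XG'": {"leakage": 5.9, "circuit": 27.2, "clock": 15.6},
    "XGM": {"leakage": 5.7, "circuit": 25.9, "clock": 9.2},
    "IGM": {"leakage": 5.6, "circuit": 23.0, "clock": 8.0},
    "IM'": {"leakage": 5.5, "circuit": 22.5, "clock": 5.9},
}

totals = {name: sum(parts.values()) for name, parts in POWER_W.items()}
savings = {name: totals["XGM"] - total for name, total in totals.items()}

for name in POWER_W:
    print(f"{name}: total {totals[name]:.1f} W, "
          f"saving vs. XGM {savings[name]:+.1f} W "
          f"({100 * savings[name] / totals['XGM']:+.1f}%)")
```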

4.3 ILC Circuit Prototypes

4.3.1 Prototype I: ILC with Divide-by-two ILOs

The first test chip is designed with a divide-by-two injection-locked frequency

divider as the local clock regenerator. It is designed and implemented in a standard

0.18 µm digital CMOS technology with low-resistivity substrate (Fig. 4.9-(a)). A

3-section H-tree mimics the global clock distribution network in real microprocessors.

The root of the H-tree is directly connected to a ground-signal-ground (GSG) pad to

facilitate testing (Fig. 4.10). The leaves of the H-tree are four divide-by-two ILOs,

which divide the input 10-GHz clock signal into 5-GHz local clocks. The differential

outputs of ILOs then drive four open-drain differential amplifiers, which are directly

connected to output RF pads.

The differential divide-by-two ILO is shown in Fig. 4.9-(b). This is essentially a

differential LC oscillator, with the input signal injected into the gate of the tail tran-

sistor. We chose this ILO for the test chip because of its well-understood operation

and good performance. Spiral inductors are made on metal 5 with a quality factor

of about 4 at 5 GHz. Such low Q is not a problem for ILO operation and actually helps

increase the locking range. If better metal is available, the power efficiency can be

further improved. NMOS transistors biased in the inversion region are used as varactors

to tune the ILO center frequency, which in turn changes the phase of the local clocks

for deskewing purpose.

The H-tree is constructed using coplanar-waveguide (CPW) transmission lines.

Figure 4.9: Schematic of (a) the test chip and (b) a divide-by-two ILO used.

Figure 4.10: Chip micrograph of the test chip. The whole chip size is 1.5 mm × 1.3 mm, and each ILO occupies 0.25 mm × 0.22 mm. The H-tree sections measure 500 µm, 280 µm, and 290 µm, respectively, from root to leaves.

Figure 4.11: Spectrum of the generated local clock signal from ILO1, identical to that from other ILOs on-chip.

Bottom shield is used to reduce substrate coupling in a real microprocessor environ-

ment. This limits the maximum characteristic impedance of the transmission line to

be just over 40Ω in this technology. So the transmission lines from the H-tree leaves

to the root are designed to be 40Ω, 20Ω and 10Ω, respectively, in order to achieve

impedance matching at all junctions. Width of signal and ground lines, spacing be-

tween them, and choice of metal layers are also optimized for minimizing the clock

propagation loss.
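The 40Ω, 20Ω, 10Ω progression follows from the fact that each H-tree junction loads the parent line with two child lines in parallel. The Python sketch below illustrates this under idealized assumptions (lossless lines, each child line presenting its characteristic impedance at the junction); the function names are illustrative.

```python
def htree_impedances(leaf_z0, sections):
    """Characteristic impedance of each section, from leaves to root."""
    impedances = [leaf_z0]
    for _ in range(sections - 1):
        # Two equal child lines in parallel present half their impedance.
        impedances.append(impedances[-1] / 2.0)
    return impedances

def reflection_coefficient(z_line, z_load):
    """Reflection seen by a line of impedance z_line terminated in z_load."""
    return (z_load - z_line) / (z_load + z_line)

z_sections = htree_impedances(40.0, 3)
print(z_sections)

# At each junction the parent line sees its two children in parallel.
for child, parent in zip(z_sections, z_sections[1:]):
    print(parent, reflection_coefficient(parent, child / 2.0))
```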

The test chip is measured using an RF probe station. The input is a sinusoidal

signal from a continuous-wave (CW) signal generator. The power supply voltage is

1.4 V. The spectra of the local clock signals generated by the four ILOs are almost

identical, and one of them is shown in Fig. 4.11.

The locking range of ILOs on the test chip is found to be identical, and that of

ILO1 is shown in Fig. 4.12. The injection signal amplitude is calculated from the

measured incident power and reflection coefficient (S11) at the root of the H-tree.

Figure 4.12: Locking range of ILO1, identical to that of other ILOs on-chip.

It can be seen that when the input signal has rail-to-rail swing (1.4 V), the locking

range is about 17%, which is sufficient for both accommodating process/temperature

variation and deskew tuning (see below).

The phase noise of both the input and output clock signals is shown in Fig. 4.13. The

6-dB reduction (up to about 500-kHz offset) because of the divide-by-two operation

is evident, which shows that the internal ILO noise is suppressed by injection locking.
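The 6-dB figure is the ideal phase noise change for frequency division by two: dividing the frequency by N scales the phase deviation by 1/N, which is 20 log10(N) in decibels. A one-function Python check (function name illustrative):

```python
import math

def division_phase_noise_change_db(n):
    """Ideal phase noise change (dB) when a clock is divided by n."""
    return -20.0 * math.log10(n)

print(f"{division_phase_noise_change_db(2):.2f} dB")
```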

The deskew capability of injection-locked clocking is demonstrated in Fig. 4.14.

Fig. 4.14(a) shows the whole deskew surface when tuning ILO1 by Vtune1, and/or

ILO2 by Vtune2. One particular tuning example is shown in Fig. 4.14(b), where Vtune1

and Vtune2 are tuned differentially, and the deskew range is up to 80 ps. Because

of the large deskew range, small imbalance in the global clock tree can be easily

compensated, which greatly relaxes the requirement on the design and layout of the

clock distribution network.

The test chip consumes a total power of 52.8 mW, where 45.3 mW comes from

the 1.8 V-supplied open-drain buffers. The ILO core circuitry working under 1.4-V

Figure 4.13: Phase noise of reference clock and 4 output clocks at different positions on chip. [Plot: phase noise (dBc) versus offset frequency (Hz); traces: reference clock and local clocks 1-4.]

Figure 4.14: Deskew capability of ILC. (a) deskewing when tuning ILO1 and/or ILO2; (b) deskewing when tuning ILO1 and ILO2 differentially. The skew is measured between the two output clock signals of ILO1 and ILO2. Note that there is some imbalance between ILO1 and ILO2 caused by mismatch in the clock distribution tree and measurement system.

vdd only consumes 7.3 mW when biased low and injection signal is 6 dBm. The bias

circuitry consumes 0.2 mW.

4.3.2 Prototype II: ILC with Non-division ILOs

Because the first prototype uses divide-by-two injection-locked oscillators, the global clock must run at twice the frequency of the local clock. In addition, the analog tuning of the ILO delay is not robust to noise at the control node. To address these issues, a second ILC prototype was designed with non-division ILOs as the local clock regenerators. The non-division ILOs are as described in Chapter II and use transformer injection, because the single-ended global clock distribution requires a single-ended-to-differential conversion. For delay tuning, we replace the frequency-tuning varactor C1 in the LC tank with an array of more linear MIM switched capacitors (Fig. 4.18), whose values are binary weighted and can be controlled digitally. Digital tuning enables fast nonlinear deskew algorithms and is thus very useful for runtime deskew. It also avoids noise pick-up from the sensitive analog tuning voltage node.

As a special case of divide-by-odd-number operation, divide-by-one, or non-division, injection-locked oscillators also find applications in high-speed clocking systems, e.g., clock distribution and serial-link clock recovery, because of their phase tunability and noise filtering properties. The most intuitive way to construct a non-division ILO from a differential LC oscillator is to add a differential pair for direct injection into the tank, as shown in Fig. 4.15-a. This topology requires a differential input, which is not always available. For applications where only a single-ended input is available, another topology with a transformer input is proposed in Fig. 4.15-b. In this topology, the transformer converts the single-ended input to differential currents, which are injected into the sources of the cross-coupled pair. The tank can be biased by tapping a current source to the center of the transformer secondary.

In non-division ILOs, there is a direct summation of the injected signal and the

Figure 4.15: Two possible circuit implementations of the non-division injection-locked oscillator. (a) requires a differential input, while (b) can take both single-ended and differential inputs.

Figure 4.16: Non-division ILO analysis: (a) circuit model; (b) loop behavior model. Similar to the divide-by-3 topology, differential injection is required for non-division operation.

output voltage by the cross-coupled pair M1 and M2 (Fig. 4.16-a). Thus it can be

analyzed by the model [23] shown in Fig. 4.16-b, where the output phase ϕ is the

phase shift introduced by the ILO and Gm is transconductance from transistors M1

and M2.

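The closed loop equation for the model is

$$G_m |Z(j\omega)| e^{j\beta(\omega)} \left( V_i e^{j\omega t} + V_o e^{j(\omega t + \phi)} \right) = V_o e^{j(\omega t + \phi)} \qquad (4.1)$$

where $\beta(\omega)$ is the phase part of $Z(j\omega)$. Assuming large Q, we can write

$$\beta(\omega) = -\arctan\left( \frac{2Q\Delta\omega}{\omega_0} \right)$$

where $\Delta\omega = \omega - \omega_0$. Dividing both sides of Eqn. 4.1 by $V_o e^{j(\omega t + \phi)}$ gives

$$G_m |Z(j\omega)| e^{j\beta(\omega)} \left( \eta e^{-j\phi} + 1 \right) = 1 \qquad (4.2)$$

where $\eta = V_i / V_o$ is defined as the injection ratio of the ILO. The phase condition of the loop equation is

$$-\arctan\left( \frac{2Q\Delta\omega}{\omega_0} \right) - \arctan\left( \frac{\eta \sin\phi}{1 + \eta \cos\phi} \right) = 0 \qquad (4.3)$$

from which we can derive the locking range of the new topology ILO as

$$LR = \frac{\omega_0}{Q} \frac{\eta}{\sqrt{1 - \eta^2}} \qquad (4.4)$$

which is a function of the oscillation frequency, the quality factor of the tank, and the injection ratio.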

With the phase condition Eqn. 4.3, we can derive the phase shifting of the ILO as:

$$\phi = -\arcsin\left( \frac{1}{\sqrt{1 + \left( \frac{\omega_0}{2Q\Delta\omega} \right)^2}} \right) - \arcsin\left( \frac{1}{\eta} \frac{1}{\sqrt{1 + \left( \frac{\omega_0}{2Q\Delta\omega} \right)^2}} \right) \qquad (4.5)$$

Phase shifting versus normalized frequency offset Q∆ω/ω0 from this equation for different injection ratios is plotted in Fig. 4.17. From the plot, we can see three properties: (a) the phase shifting is monotonic in the frequency offset and is quite linear except at the edges of the locking range, which shows that the ILO can provide the desired linear phase shifting for delay tuning; (b) the phase tuning range can be as large as 180 degrees, which shows that the delay tuning range can be as large as half the clock cycle; and (c) the phase shifting is inversely proportional to the injection ratio at a particular offset frequency, which allows the delay step and delay tuning range to be programmed by changing the injection ratio under a fixed physical design.

A 4 GHz test chip is designed and implemented in a standard 0.18 µm digital CMOS technology with low-resistivity substrate (Fig. 4.19-a). The input transformer and symmetric inductor of the ILO are implemented on Metal 5 (Fig. 4.19-b). The transformer has a k factor of 0.6 and the inductor has a quality factor of 4, both at 4 GHz. The switched capacitor array was designed with a capacitance ratio of 1:2:4:8:16 to enable 5-bit binary tuning. Large capacitors are realized by combining multiple minimum-sized unit capacitors to ensure linearity.

Four ILOs are placed as 4 local clock regenerators at the leaves of a 3-section

H-tree, which mimics the global clock distribution network in real microprocessors.

H-tree dimensions are 400 µm, 100 µm and 250 µm for 3 sections, respectively. The

root of the H-tree is directly connected to a ground-signal-ground (GSG) pad to

facilitate testing (Fig. 4.19-b). The H-tree is constructed using coplanar waveguide

Figure 4.17: Phase shifting characteristics of non-division ILO at different injection ratios when Q = 4.

Figure 4.18: Non-division ILO with binary weighted switch capacitor tuning.

Figure 4.19: (a) Schematic and (b) chip micrograph of the test chip.

Figure 4.20: Measured spectrum for output clock.

(CPW) transmission lines. Bottom shield is used to reduce substrate coupling in a

real microprocessor environment. A differential open-drain buffer is used at each ILO

output to drive the 50Ω test port.

The whole test chip occupies an area of 1.5 × 1.8 mm². Each ILO uses only

0.37 × 0.1 mm².

In the measurement, the input is a sinusoidal signal from a continuous-wave (CW)

signal generator. Output clock spectra are measured for each ILO and one of them

is shown in Fig. 4.20. The clean spectrum and low harmonic content show that

injection locking is quite efficient. Locking ranges are measured with different input

signal amplitudes. Up to 12% locking range is achieved for all the ILOs when the input

amplitude is about 0.8V at the root of the H-tree (Fig. 4.21). According to the ILO

locking range equation (Eqn. 4.4 ), this locking range corresponds to an injection

ratio η of about 0.43.
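The injection ratio quoted above can be recovered by inverting the locking range expression. The Python sketch below assumes Eqn. 4.4 in the form LR/ω0 = (1/Q)·η/√(1 − η²) and a tank quality factor of 4; the function names are illustrative.

```python
import math

def locking_range_fraction(eta, q):
    """Eqn. 4.4 normalized to the center frequency: LR / w0."""
    return eta / (q * math.sqrt(1.0 - eta ** 2))

def injection_ratio(lr_fraction, q):
    """Invert Eqn. 4.4: solve r = eta / sqrt(1 - eta^2) with r = Q * LR / w0."""
    r = lr_fraction * q
    return r / math.sqrt(1.0 + r * r)

eta = injection_ratio(0.12, 4.0)
print(f"eta = {eta:.3f}")
```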

In order to characterize the phase tuning of the ILO, free-running frequency tuning

using the switched capacitor array was first measured. A tuning range of 0.22 GHz

was achieved, which corresponds to 5.5% at a center frequency of 4 GHz. With

Figure 4.21: Measured locking range of an ILO in ILC. [Plot: upper and lower bounds of the locking range (GHz) and the locking range (GHz) versus voltage swing at the injection port (V).]

Figure 4.22: 5-bit digital phase shift tunings for two ILOs’ outputs.

Figure 4.23: Phase noise comparison of the input clock from signal source and output clock from the ILO.

the tank quality factor of 4 and the injection ratio of 0.43 derived from locking range

above, the theoretical phase tuning range is found to be 84° according to Eqn. 4.5.

Then phase tuning of the ILO at injection was directly measured from the output

waveform on a digital oscilloscope. Output waveforms for different tuning conditions

were recorded and their zero crossing indicated their phase shift information. Phase

shift tuning of two ILOs in the ILC are measured at the same time for comparison

purpose. After calibrating out their cable mismatch and referring the phase to that

of the center tuning point of one of the ILOs, the phase tuning curves are plotted in

Fig. 4.22. The result shows that a phase tuning range up to 80° is achieved, which

corresponds to 55 ps delay tuning range with a step size of 1.8 ps in time domain. Also

the measured 80° phase shift tuning range shows good agreement with the calculated

theoretical value of 84°.
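The theoretical tuning range and the time-domain figures above can be reproduced numerically. The Python sketch below solves the phase condition (Eqn. 4.3) for the phase shift, which is equivalent to Eqn. 4.5 for positive frequency offsets. It assumes that the 0.22 GHz free-running tuning range is centered on the injected 4 GHz clock (offsets of ±0.11 GHz) and that the 5-bit code gives 31 delay steps; both are assumptions made for illustration.

```python
import math

def phase_shift_deg(delta_f, f0, q, eta):
    """Phase shift of the ILO (degrees) from the phase condition, Eqn. 4.3."""
    theta = math.atan(2.0 * q * delta_f / f0)
    ratio = math.sin(theta) / eta
    if abs(ratio) > 1.0:
        raise ValueError("frequency offset is outside the locking range")
    return -math.degrees(theta + math.asin(ratio))

def delay_ps(phase_deg, f0):
    """Convert a phase shift at clock frequency f0 to a delay in picoseconds."""
    return phase_deg / 360.0 / f0 * 1e12

F0, Q, ETA = 4e9, 4.0, 0.43
theory_range = phase_shift_deg(-0.11e9, F0, Q, ETA) - phase_shift_deg(0.11e9, F0, Q, ETA)
measured_delay = delay_ps(80.0, F0)
step = measured_delay / (2 ** 5 - 1)
print(f"theoretical range = {theory_range:.1f} deg")
print(f"80 deg at 4 GHz = {measured_delay:.1f} ps, step = {step:.2f} ps")
```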

The phase noise of both the signal generator and the ILC output is also measured and

compared in Fig. 4.23. From the comparison, we can see that the ILO output phase noise

exactly tracks the input phase noise up to 600 kHz. A self-triggered jitter measurement

[17] was made for both the signal source and ILC output to characterize their jitter

profiles. After removing the jitter introduced by the triggering circuit by

$$\delta_{\Delta T,\mathrm{eff}} = \sqrt{\delta_{\Delta T,\mathrm{meas}}^2 - \delta_{\Delta T,\mathrm{min}}^2} \qquad (4.6)$$

the effective rms jitters versus measurement time for both signal source and ILC

output are plotted in Fig. 4.24. From the comparison, we can see only 0.03 ps of

cycle-to-cycle jitter is added by the ILC network, which corresponds to 0.012% of

the clock cycle, thanks to the noise filtering effect of the ILO.
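The correction of Eqn. 4.6 and the quoted fraction of the clock cycle can be checked with a few lines of Python. The 0.5 ps and 0.3 ps values passed to `effective_jitter` are hypothetical inputs chosen only to exercise the function, while the 0.11 ps and 0.14 ps values are the extrapolated cycle-to-cycle jitters given in the caption of Fig. 4.24.

```python
import math

def effective_jitter(measured, trigger_floor):
    """Eqn. 4.6: remove the triggering-circuit jitter in quadrature."""
    if measured < trigger_floor:
        raise ValueError("measured jitter is below the trigger floor")
    return math.sqrt(measured ** 2 - trigger_floor ** 2)

# Hypothetical example values (ps), not measured data.
example = effective_jitter(0.5, 0.3)

# Jitter added by the ILC network at 4 GHz, using the Fig. 4.24 values (ps).
PERIOD_PS = 1e12 / 4e9
added_ps = 0.14 - 0.11
added_fraction = added_ps / PERIOD_PS
print(f"example = {example:.2f} ps, added = {added_ps:.2f} ps "
      f"({100 * added_fraction:.3f}% of the cycle)")
```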

The power consumption for each ILO is only 0.95 mW under 1-V power supply

voltage. Each open drain buffer burns 4.2 mW from 1.5-V vdd.

4.3.3 Prototype III: ILC with Active Deskewing

In the first two ILC prototypes, the deskew operation is performed by external

controls, either using analog voltage or digital codes. Such external controls facilitated

the initial verification and characterization of the deskew capability of injection-locked

clocking. In the third ILC circuit prototype, we implemented on-chip deskew loop for

the ILC, which can be more useful for practical applications [99].

In real systems, there are usually two types of active deskew architectures. The

first type distributes a reference clock, which drives no real load, to every clock domain.

The delays to all clock domains are carefully designed and compensated so that this

reference clock can act as a time standard for the real clocks in each domain. For

such an architecture with N clock domains, the extra cost of active deskew is

N phase detectors and a reference clock distribution. The second type of active

Figure 4.24: Jitter measurement of the signal source and ILC output at 4 GHz. From extrapolation, the cycle-to-cycle jitter of signal source and ILC output are 0.11 ps and 0.14 ps, respectively. [Plots: RMS jitter of the source (ps) and of the ILC output (ps) versus measurement time relative to the sampling edge (s).]

Figure 4.25: ILC with active deskewing.

deskew architecture does not distribute a separate reference clock; instead, skews

are compared and compensated between adjacent clock domains. For a clock

distribution with N clock domains, a complete deskew of the whole distribution

requires N − 1 phase comparison and compensation processes. Thus, the cost of the

second approach is only N − 1 phase detectors and no extra reference distribution.

We choose the second approach as the deskew architecture for our ILC prototype.

A self-controlled active deskew scheme based on injection-locked clocking is shown in Fig. 4.25. In this scheme, clock skews between different clock domains are measured by phase detectors (PDs). The skew information is fed back to the microprocessor core, which then controls the operation of the deskew logic (DSK) associated with every ILO. The delay of each ILO is controlled by its deskew logic, so that after the deskew process, the skew at the inputs of the clock domains is minimized. Depending on the system requirements, such a deskew process can run once after power-up, or it can be enabled periodically for dynamic control of the skew as the microprocessor conditions change with different workloads and environments. Because digitally controlled deskew schemes have better noise rejection than analog controlled ones, digital control is retained for the new ILC prototype.

Fig. 4.26 shows the schematic of the test chip to demonstrate the proposed ILO-

based active deskew in ILC. The input clock signal is distributed by a passive H-tree to

each clock domain, and injection-locked to an ILO. Each ILO drives a 2-pF clock load,

which models the local clock load in real processors, through a differential to single-

ended buffer (Buf1). The ILOs use a newly developed transformer-direct-injection

topology [100], as shown in the blow-up of Fig. 4.26. A 5-bit binary-weighted switched

capacitor array is implemented in the LC resonator of each ILO for phase tuning. The

5-bit binary coded digital control is generated from a 5-bit bi-directional counter built

inside the deskew control logic (DSK). The DSK algorithm and an example of the

deskew sequence are shown in Fig. 4.27-a and Fig. 4.27-b, respectively. When the counter counts UP,

Figure 4.26: An injection-locked clocking system with ILO-based active deskew. Four ILOs are driven by the input clock through an H-tree. Each ILO is buffered by Buf1 to drive 2 pF of on-chip load capacitor (CL), which also converts the ILO differential output to a single-ended signal. Output buffers Buf2 drive the test ports (TPx).

Figure 4.27: (a) Deskew logic algorithm, and (b) an example of the deskew sequence which shows the design for ringing prevention.

Figure 4.28: Test chip die photo.

the ILO free-running frequency decreases and the ILO phase tuning increases, and

vice versa. The counter value can also be preset externally, which enables manual

adjustment of the ILO delays for test purposes. The clock phases from adjacent clock

domains are compared by a digital phase detector, and the resulting 1-bit skew information,

denoted Dn in Fig. 4.27, is fed into the deskew control logic. The skew information

from the two previous cycles is also stored in two registers, R1 and R2, to implement the

ringing detection and prevention algorithm in the deskew control logic. Once ringing

is detected, the DSK forces the counter into a stop state until an external start signal

restarts the deskew control logic.
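The behavior of the deskew control logic described above can be sketched in software. The following is a minimal Python model and not the circuit of Fig. 4.27: the class name `DeskewLogic`, the polarity (Dn = 1 counts up), the mid-scale default preset, and the ringing rule (the skew bit alternating across Dn, R1 and R2) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

COUNTER_BITS = 5
COUNTER_MAX = (1 << COUNTER_BITS) - 1  # 31 for a 5-bit counter


@dataclass
class DeskewLogic:
    """Hypothetical model of one DSK: 5-bit up/down counter with a ringing stop."""
    count: int = 16               # assumed mid-scale preset
    r1: Optional[int] = None      # skew bit from one cycle ago (register R1)
    r2: Optional[int] = None      # skew bit from two cycles ago (register R2)
    stopped: bool = False

    def preset(self, value: int) -> None:
        """External preset, e.g. for manual delay adjustment during test."""
        if not 0 <= value <= COUNTER_MAX:
            raise ValueError("preset must fit in the 5-bit counter")
        self.count = value

    def start(self) -> None:
        """External start: clear the history and leave the stop state."""
        self.r1 = None
        self.r2 = None
        self.stopped = False

    def step(self, dn: int) -> int:
        """Consume one phase-detector bit Dn and return the counter value."""
        if self.stopped:
            return self.count
        history_full = self.r1 is not None and self.r2 is not None
        # Assumed ringing rule: the bit alternates over three cycles (1,0,1 or 0,1,0).
        if history_full and dn != self.r1 and self.r1 != self.r2:
            self.stopped = True
            return self.count
        # Assumed polarity: Dn = 1 counts up, Dn = 0 counts down, with saturation.
        if dn:
            self.count = min(self.count + 1, COUNTER_MAX)
        else:
            self.count = max(self.count - 1, 0)
        self.r2, self.r1 = self.r1, dn
        return self.count


if __name__ == "__main__":
    dsk = DeskewLogic(count=10)
    for bit in [1, 1, 1, 0, 1]:
        print(bit, dsk.step(bit), dsk.stopped)
```

In this sketch the counter freezes as soon as the skew bit starts to alternate, which corresponds to the loop dithering around the zero-skew point; the actual stop condition used on the test chip may differ.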

The test chip was fabricated in a standard 0.18 µm digital CMOS technology. The

clock frequency is set at 3.5 GHz, representing the state-of-the-art processor speed.

Each clock domain has a capacitive load of 2 pF to model the local clock load in

real processors. The transformers and inductors in the ILOs are all built with the

0.35-µm-thick top metal layer. The transformers have a k factor of 0.77 at 3.5 GHz. The

symmetric spiral inductors have an inductance of 2.8 nH and a quality factor of

Figure 4.29: Measured locking range for the ILC network. The plot shows the bounds of the locking range (GHz) and the locking range (%) versus input amplitude (V).

4.1 at 3.5 GHz. The die photo of the test chip, which measures

2 mm by 2 mm, is shown in Fig. 4.28.

The test chip is measured on a probe station. The locking range of the ILC network

reaches 6.5% at an input amplitude of 0.6 V (Fig. 4.29). Time-domain clock

waveforms from each clock domain are measured on a 50-GHz sampling oscilloscope

to study the clock timing. When comparing the timing of different clock domains,

the connector and cable delay mismatch is first characterized and then used to calibrate

the measured results.

The free-running frequency tuning and locked-state phase tuning of the ILOs are first

characterized to show the deskew capability of the ILC network, as shown in Fig. 4.30.

The measured ILO free-running frequency tuning is approximately linear, with a step size of

2.5 MHz. This linear frequency tuning produces linear phase tuning of the ILOs in

the locked state, with a range of 40 ps and a step size of 1.25 ps. The dynamics

of the deskew loop are then measured, as shown in Fig. 4.31. An initial skew of -16 ps is

preset between the two clock domains before the deskew loop starts. The deskew loop

reduces the skew to a final residual value of 2 ps within 15 cycles of the deskew clock

Figure 4.30: (a) Measured free-running frequency tuning, and (b) delay tuning under locked state by 5-bit switched capacitor array.

Figure 4.31: Deskew dynamics of the deskew loop.

with a little overshoot. The residual skew can be attributed to the deskew step size

limitation and phase detector offset.
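As a rough consistency check on the tuning figures quoted above, the short Python sketch below works through the arithmetic for a 5-bit array. Only the step sizes, the clock frequency and the initial skew come from the text; the assumption that all 32 codes are equally spaced, and the variable names, are illustrative.

```python
import math

BITS = 5                  # switched-capacitor array width
FREQ_STEP_MHZ = 2.5       # measured free-running frequency step
DELAY_STEP_PS = 1.25      # measured locked-state delay step
CLOCK_GHZ = 3.5           # clock frequency of the test chip
INITIAL_SKEW_PS = 16.0    # magnitude of the preset skew in the loop test

codes = 2 ** BITS                          # number of counter codes
steps = codes - 1                          # number of steps end to end
freq_span_mhz = steps * FREQ_STEP_MHZ      # total free-running tuning span
delay_span_ps = steps * DELAY_STEP_PS      # total delay tuning span
period_ps = 1e3 / CLOCK_GHZ                # clock period in picoseconds
step_deg = 360.0 * DELAY_STEP_PS / period_ps   # one delay step in degrees of phase
min_steps = math.ceil(INITIAL_SKEW_PS / DELAY_STEP_PS)  # steps to cancel the skew

if __name__ == "__main__":
    print(f"codes: {codes}, delay span: {delay_span_ps} ps")
    print(f"frequency span: {freq_span_mhz} MHz, step: {step_deg:.3f} deg")
    print(f"minimum steps to cancel {INITIAL_SKEW_PS} ps: {min_steps}")
```

Under these assumptions the end-to-end delay span is 38.75 ps, in line with the quoted 40-ps range, and at least 13 single-step corrections are needed to cancel a 16-ps skew, which is compatible with the loop settling within 15 deskew cycles with a small overshoot.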

The phase noise of the ILC output clock is measured and compared with that of the

input clock source and the free-running ILO (Fig. 4.32a). The ILC output tracks the

phase noise of the source clock up to 10 MHz, and shows up to 60-dB improvement

over the free-running case near an offset of 10 kHz. The cycle-to-cycle jitter of both the ILC

output and the input clock is measured with a self-triggered method, and compared in

Fig. 4.32b. The jitter added by the ILC is negligible (0.04 ps).

Each ILO in the test chip consumes 12 mA from a 1-V supply, and each Buf1

consumes 40 mA from a 1.8-V supply. Each PD and DSK consumes 6.8 mA from a 1.8-V

supply.
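The supply currents above translate into power as current times supply voltage. The Python sketch below works this out per block; the `Block` helper is hypothetical, and treating "each PD and DSK" as one PD/DSK pair drawing 6.8 mA is an assumption, since the text does not say how many pairs are on the chip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical record for one circuit block and its supply."""
    name: str
    current_ma: float
    supply_v: float

    @property
    def power_mw(self) -> float:
        # mA times V gives mW directly.
        return self.current_ma * self.supply_v


ilo = Block("ILO", 12.0, 1.0)
buf1 = Block("Buf1", 40.0, 1.8)
pd_dsk = Block("PD + DSK (assumed pair)", 6.8, 1.8)

# Power of one clock domain, counting only its ILO and its Buf1.
per_domain_mw = ilo.power_mw + buf1.power_mw

if __name__ == "__main__":
    for block in (ilo, buf1, pd_dsk):
        print(f"{block.name}: {block.power_mw:.2f} mW")
    print(f"ILO + Buf1 per clock domain: {per_domain_mw:.2f} mW")
```

With these numbers the ILO itself accounts for a small share of the per-domain power; most of it is spent in the buffer that drives the 2-pF load.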

Figure 4.32: (a) Phase noise of the ILC output in comparison with input clock and free-running ILO. (b) Cycle-to-cycle jitter test for ILC output and input clock, plotted as RMS jitter (ps) versus measurement time relative to sampling edge (s) for the signal source and the ILO output; the values annotated at 1 T are 0.11 ps and 0.15 ps. The degradation is only 0.04 ps.

Chapter 5

Future Work and Conclusions

5.1 Future Work

5.1.1 Injection-Locked Clock and Data Recovery

Although most clock and data recovery circuits (CDRs) today are

designed around a dual-loop phase-locked loop (PLL), such PLL-based CDRs are

difficult to design, costly in terms of power and area, and suffer from several other

limitations [6]. For example, in designing a PLL-based CDR, the designer must com-

promise between the ability to track the data signal and noise suppression of the PLL.

Additionally, the dynamics of PLL-based CDRs are dependent on the contents of the

data signal, and PLL-based CDRs can have a long locking time since they must lock

to both the frequency and phase of the data signal. PLL-based CDRs also suffer from

analog offsets and device mismatches which can cause the receiver circuitry to sense

the data signal at shifted, sub-optimal sampling points. Lastly, for chips receiving

multiple data signals, a dedicated PLL-based CDR must be provided for each data

signal. This is a costly requirement since these PLLs typically require relatively large

silicon area (e.g. for large filter capacitors) and dissipate relatively large amounts of

Figure 5.1: (a) Conventional dual-loop CDR, with loop I for frequency acquisition and loop II for phase acquisition. (b) Injection-locked CDR with frequency acquisition achieved by injection locking.

power (e.g. for various high speed PLL components).

At the same time, most optical standards require a very narrow loop bandwidth

for CDR circuits, typically less than 1% of the operating speed. CDR circuits designed

under such a specification show a very small frequency acquisition range and a long

lock time. To solve this problem, a dual-loop structure with an additional frequency

acquisition loop has been proposed to extend the frequency acquisition range and increase

the lock speed, as shown in Fig. 5.1-(a). There are two disadvantages, however, to

such an approach. First, in order to achieve frequency acquisition, the extra PLL

loop requires a reference frequency, usually from a crystal oscillator, which increases

the cost and integration difficulty. Second, the two-loop structure requires

a control mechanism to deactivate one loop while the other is in operation, so that the

two do not conflict with each other. This further complicates the design.

We propose an injection-locked clock and data recovery circuit as shown in Fig. 5.1-

(b), where the frequency acquisition is achieved by the injection locking of the ILO,

while a separate phase acquisition loop adjusts the phase of the recovered clock. Because the

data stream contains a frequency component only at half of the clock frequency,

a frequency doubler is required between the input data and the ILO. The relatively

short lock time of an ILO can increase the frequency acquisition speed of such an

injection-locked CDR. At the same time, no extra crystal oscillator reference is

needed, which greatly simplifies the structure and reduces the cost.
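The need for a frequency doubler can be illustrated numerically. The Python sketch below is an idealized example only: it uses an alternating 1010 NRZ pattern, and it models the doubler as a delay-and-multiply edge detector with a half-bit delay, which is one common choice and an assumption here, since the doubler circuit is not specified in the text. The function names are hypothetical.

```python
import cmath

SAMPLES_PER_BIT = 8
NUM_BITS = 16


def nrz_alternating(num_bits, samples_per_bit):
    """Return a +1/-1 NRZ waveform for the bit pattern 1010..."""
    return [1.0 if bit % 2 == 0 else -1.0
            for bit in range(num_bits)
            for _ in range(samples_per_bit)]


def delay_and_multiply(signal, delay):
    """Multiply the signal by a circularly delayed copy (assumed doubler model)."""
    n = len(signal)
    return [signal[i] * signal[(i - delay) % n] for i in range(n)]


def dft_magnitude(signal, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    n = len(signal)
    return abs(sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
                   for i, v in enumerate(signal)))


data = nrz_alternating(NUM_BITS, SAMPLES_PER_BIT)
doubled = delay_and_multiply(data, SAMPLES_PER_BIT // 2)

BIN_CLOCK = NUM_BITS        # one cycle per bit: the clock (bit-rate) frequency
BIN_HALF = NUM_BITS // 2    # one cycle per two bits: half the clock frequency

if __name__ == "__main__":
    for name, sig in (("data", data), ("doubled", doubled)):
        print(name,
              "half-rate:", round(dft_magnitude(sig, BIN_HALF), 3),
              "clock-rate:", round(dft_magnitude(sig, BIN_CLOCK), 3))
```

In this idealized case the raw data has energy at half the clock frequency and none at the clock frequency, while the doubled signal has the opposite property, so it can serve as the injection signal for an ILO running at the clock frequency.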

5.1.2 Injection-Locked Free-Space Optoelectronic Oscillators

Optoelectronic oscillators (OEOs) [101, 102], which consist of a laser, an optical

fiber cavity, a photo detector and an electrical amplifier, have received a lot of at-

tention as they can generate a low jitter optical pulse train and high-purity electrical

clock signal at the same time. Fig. 5.2 shows a typical optoelectronic oscillator topol-

ogy, where light from one of the output ports of the E/O modulator is detected by

Figure 5.2: Optoelectronic oscillator with optical fiber as the resonator.

the photodetector and then is amplified, filtered, and fed back to the electrical-input

port of the modulator. If the modulator is properly chosen, self-electrooptic oscilla-

tion will be sustained. Because both optical and electrical processes are involved in

the oscillation, both the optical subcarrier and the electrical signal will be generated

simultaneously. The oscillation frequency fosc of the optoelectronic oscillator shown

in Fig. 5.2 can be expressed as

\[ f_{osc} = \frac{k}{\tau} \tag{5.1} \]

where k is an integer, representing different possible oscillating modes and τ is the

total group delay of the loop, including the physical-length delay of the loop and

the group delay resulting from dispersive components in the loop. Because of such

a relationship, a hybrid opto-electronic oscillator using a long optical fiber as the

frequency selective element can permit high tunability and almost no limitation on

the range of possible oscillation frequency, due to the high mode density that can

be generated. The loop-delay time τ not only determines the mode spacing of the

oscillation frequency, but also has a strong impact on the phase noise of the oscillator:

the phase noise decreases quadratically with τ, so the larger the τ, the smaller the

phase noise.
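To make the role of τ concrete, the Python sketch below evaluates Eq. (5.1) for two illustrative resonators. The fiber length, the group index of 1.47, the free-space path length, and the neglect of any electrical delay in the loop are all assumptions chosen for illustration and are not values from this work; the function names are hypothetical.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def loop_delay_s(length_m, group_index):
    """Optical delay of a path, ignoring any electrical delay in the loop."""
    return group_index * length_m / C_M_PER_S


def mode_spacing_hz(tau_s):
    """Spacing between adjacent modes of f_osc = k / tau."""
    return 1.0 / tau_s


def mode_number(f_target_hz, tau_s):
    """Integer k whose mode k / tau is closest to the target frequency."""
    return round(f_target_hz * tau_s)


def phase_noise_change_db(tau_new_s, tau_ref_s):
    """Change in phase noise for a quadratic (1 / tau^2) dependence."""
    return -20.0 * math.log10(tau_new_s / tau_ref_s)


# Assumed examples: 1 km of fiber versus a 0.3 m free-space link.
tau_fiber = loop_delay_s(1000.0, 1.47)
tau_free = loop_delay_s(0.3, 1.0)

if __name__ == "__main__":
    for name, tau in (("fiber", tau_fiber), ("free space", tau_free)):
        print(f"{name}: tau = {tau:.3e} s, "
              f"mode spacing = {mode_spacing_hz(tau):.3e} Hz, "
              f"k at 3.5 GHz = {mode_number(3.5e9, tau)}")
    print(f"phase noise change, free space vs fiber: "
          f"{phase_noise_change_db(tau_free, tau_fiber):.1f} dB")
```

The short free-space path gives widely spaced modes, which eases mode selection by the electrical filter, but under the quadratic dependence it also implies a much higher phase noise than the long fiber loop.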

Figure 5.3: Injection-locked optoelectronic oscillator (OEO) with free space resonator.

We propose an injection-locked optoelectronic oscillator (OEO) with a free-space

resonator, as shown in Fig. 5.3. Here we use a vertical-cavity surface-emitting

laser (VCSEL) [102] as both the laser source and the modulator, in contrast to

the original OEO, where an external laser source and an E/O modulator are used.

We also use a free-space optical link as the resonator instead of a long fiber. This

optical link includes two optical lenses, which focus the emission from the VCSEL

onto the active area of the photo detector. The photo detector used in the experiment

is a PIN diode fabricated in a standard digital CMOS process. A microwave amplifier

and an electrical filter amplify and select the desired frequency of oscillation. An RF

coupler serves as both the electrical injection and output device. Since the free-space

resonator delay cannot be as long as that of the fiber version of the OEO, the phase noise of

the free-space OEO tends to be inferior to that of the fiber-based version. However, external

injection can be used to improve the spectral purity of the OEO oscillation. Such a free-space OEO

topology with external injection can be used as a method of generating clocks for

free-space optical communication links.

5.2 Conclusions

This dissertation presents our study of injection-locking based high-speed low-

power clock generation and distribution techniques. Specifically, several circuit tech-

niques for the design of injection-locked clock dividers, injection-locked clock multi-

pliers and injection-locked clock distribution are introduced with silicon circuit pro-

totypes and measurement results.

In chapter 3, three circuit innovations for injection-locking based clock generation

are demonstrated. These three circuit innovations include: divide-by-odd-number

injection-locked frequency divider using differential injection and harmonic engineer-

ing; double balanced divide-by-two injection-locked frequency dividers for tunable

dual-phase signal generation using the phase tunability of injection-locked oscillators;

a dual-mode injection-locked frequency doubler and tripler with good harmonic

suppression in a lossy digital CMOS process.

In chapter 4, injection-locked clock distribution is presented with discussions of its

benefits over conventional buffered-tree-based clock distribution. Owing to the resonant

nature, high effective amplitude gain, and phase tunability of injection-locked

oscillators, injection-locked clock distribution can achieve better power efficiency and

better jitter performance, while at the same time providing built-in deskew capability. Three

fabricated circuit prototypes, together with an architecture evaluation simulation,

are used to verify these benefits of injection-locked clock distribution.

In chapter 5, some future work is discussed. This includes injection-locked clock

and data recovery (CDR) for digital baseband communication and injection-locked

free-space optoelectronic oscillators (OEOs).

In summary, injection locking is a special type of forced oscillation. Due to its

resonant nature, PLL-like noise transfer nature and output phase tunability, it has

great potential in gigahertz, low-power clock generation and distribution. We have

demonstrated several such applications of injection-locked clocking circuits and

discussed their benefits through analysis and experiments based on real silicon test

chips. Some further applications of injection locking for high speed clocking are also

proposed.

Bibliography

[1] http://www.sonet.com/.

[2] http://www.ieee802.org/3/.

[3] http://www.wi-fi.org/.

[4] http://www.bluetooth.com/bluetooth/.

[5] http://www.wirelesshd.org/.

[6] B. Razavi. Monolithic Phase-Locked Loops And Clock Recovery Circuits: Theory And Design. IEEE Press, 1996.

[7] B. Razavi. RF Microelectronics. Prentice-Hall, 1998.

[8] International Technology Roadmap for Semiconductors 2007 Edition. http://public.itrs.net.

[9] http://www.pcisig.com/specifications/pciexpress/.

[10] http://www.serialata.org/.

[11] http://www.rambus.com/.

[12] D.A. Hodges, H.G. Jackson, and R.A. Saleh. Analysis and Design of Digital Integrated Circuits. McGraw-Hill, 2000.

[13] Eby G. Friedman. Clock Distribution Networks in VLSI Circuits and Systems. IEEE Press, 1995.

[14] E.G. Friedman. Clock Distribution Networks in Synchronous Digital Integrated Circuits. Proc. IEEE, 89(5):665–692, May 2001.

[15] T.H. Lee. The Design of CMOS Radio-Frequency Integrated Circuits. Cambridge University Press, Cambridge, U.K., 1998.

[16] A. Hajimiri and T.H. Lee. A general theory of phase noise in electrical oscillators. IEEE J. Solid-State Circuits, 33(2):179–194, Feb. 1998.

[17] A. Hajimiri, S. Limotyrakis, and T.H. Lee. Jitter and Phase Noise of Ring Oscillators. IEEE J. Solid-State Circuits, 34(6):896–909, June 1999.

[18] R. Adler. A Study of Locking Phenomena in Oscillators. Proc. IRE, 34:351–357, June 1946.

[19] K. Kurokawa. Injection Locking of Microwave Solid-State Oscillators. Proc. IEEE, 61(10):1386–1410, Oct. 1973.

[20] H. Rategh, H. Samavati, and T.H. Lee. A CMOS Frequency Synthesizer with An Injection-Locked Frequency Divider for a 5-GHz Wireless LAN Receiver. IEEE J. Solid-State Circuits, 35(5):780–787, May 2000.

[21] H. Wu and A. Hajimiri. A 19 GHz, 0.5 mW, 0.35 µm CMOS frequency divider with shunt-peaking locking-range enhancement. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 412–3, 2001.

[22] H. Rategh and T.H. Lee. Superharmonic Injection-Locked Frequency Dividers. IEEE J. Solid-State Circuits, 34(6):813–821, June 1999.

[23] A. Mazzanti, P. Uggetti, and F. Svelto. Analysis and Design of Injection-Locked LC Dividers for Quadrature Generation. IEEE J. Solid-State Circuits, 39(9):1425–1433, Sept. 2004.

[24] S. Verma, H. Rategh, and T. Lee. A Unified Model for Injection-Locked Frequency Dividers. IEEE J. Solid-State Circuits, 38(6):1015–1027, Jun. 2003.

[25] B. Razavi. A Study of Injection Locking and Pulling in Oscillators. IEEE J. Solid-State Circuits, 39(9):1415–1424, Sept. 2004.

[26] W.Z. Chen and C.L. Kuo. 18 GHz and 7 GHz Superharmonic Injection-Locked Dividers in 0.25um CMOS Technology. In Proceedings of 2002 European Solid-State Circuits Conference (ESSCIRC), pages 89–92, 2002.

[27] J.C. Chien and L.H. Lu. Analysis and Design of Wideband Injection-Locked Ring Oscillators with Multiple-Input Injection. IEEE J. Solid-State Circuits, 42(9):1906–1915, Sept. 2007.

[28] B. Razavi, K.F. Lee, and R.-H. Yan. A 13.4 GHz CMOS frequency divider. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 176–177, 1994.

[29] H. Wang, A. Akbar, and B. Song. A 1.8V 3mW 16.8GHz frequency divider in 0.25 µm CMOS. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 196–197, 2000.

[30] H. Knapp, H.-D. Wohlmuth, M. Wurzer, and M. Rest. 25GHz static frequency divider and 25Gb/s multiplexer in 0.12 µm CMOS. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 302–303, 2002.

[31] R.L. Miller. Fractional-frequency generators utilizing regenerative modulation. Proc. IRE, pages 446–457, July 1939.

[32] R.G. Harrison. Theory of regenerative frequency dividers using double-balanced mixers. In IEEE Int. Microwave Symposium, pages 459–462, 1989.

[33] R.G. Harrison. A broad-band frequency divider using microwave varactors. IEEE Trans. Microwave Theory Tech., 25(12):1055–1059, Dec. 1977.

[34] R.G. Harrison. Theory of the varactor frequency halver. In IEEE Int. Microwave Symposium, pages 203–205, 1983.

[35] G.R. Sloan. The modeling, analysis, and design of filter-based parametric frequency dividers. IEEE Trans. Microwave Theory Tech., 41(2):224–228, Feb. 1993.

[36] W.-Z Chen and C.-L. Kuo. 18GHz and 7GHz Superharmonic Injection-Locked Dividers in 0.25um CMOS Technology. In Proc. ESSCIRC, pages 89–92, 2002.

[37] A. Natarajan, A. Komijani, X. Guan, A. Babakhani, and A. Hajimiri. A 77-GHz Phased-Array Transceiver With On-Chip Antennas in Silicon: Transmitter and Local LO-Path Phase Shifting. IEEE J. Solid-State Circuits, 41(12):2807–2819, Dec. 2006.

[38] C. Cao and K. K.O. A 50-GHz Phase-Locked Loop in 130-nm CMOS. In IEEE Custom Integrated Circuits Conf. Dig. Tech. Papers, pages 21–24, May 2006.

[39] H. Wu and L. Zhang. A 16-to-18GHz 0.18µm Epi-CMOS Divide-by-3 Injection-Locked Frequency Divider. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 602–3, 2006.

[40] S. Reynolds, B. Floyd, U. Pfeiffer, and T. Zwick. 60 GHz Transceiver Circuits in SiGe Bipolar Technology. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 442–443, 2004.

[41] H. Zirath, T. Masuda, R. Kozhuharov, and M. Ferndahl. Development of 60-GHz Front-End Circuits for High-Data-Rate Communication System. IEEE J. Solid-State Circuits, 39(10):1640–1649, Oct. 2004.

[42] Y. Deval, J-B. Begueret, A. Spataro, P. Fouillat, D. Belot, and F. Badets. HiperLAN 5.4-GHz Low-Power CMOS Synchronous Oscillator. IEEE Trans. Microwave Theory Tech., 49(9):1525–1530, Sept. 2001.

[43] L. Zhang, B. Ciftcioglu, M. Huang, and H. Wu. Injection-Locked Clocking: A New GHz Clock Distribution Scheme. In IEEE Custom Integrated Circuits Conf. Dig. Tech. Papers, pages 785–788, 2006.

[44] J.J. Hung, T.M. Hancock, and G.M. Rebeiz. High-Power High-Efficiency SiGe Ku- and Ka-Band Balanced Frequency Doubler. IEEE Transactions on Microwave Theory and Techniques, vol. 53, No. 2, pp.754-761, 2005.

[45] J-C Chiu, C-P Chang, M-P Houng, and Y-H Wang. A 12-36 GHz PHEMT MMIC Balanced Frequency Tripler. IEEE Microwave and Wireless Component Letters, vol. 16, No. 2, pp.19-21, 2006.

[46] K. Yamamoto. A 1.8V Operation 5-GHz-Band CMOS Frequency Doubler Using Current-Reuse Circuit Design Technique. IEEE J. Solid-State Circuits, 40(6):1288–1295, June 2005.

[47] F. Ellinger and H. Jackel. Ultracompact SOI CMOS Frequency Doubler for Low Power Applications at 26.5-28.5 GHz. IEEE Microwave and Wireless Components Letters, vol. 14, No. 2, pp.53-55, 2004.

[48] C. Cao and E. Seok and K.K. O. 192-GHz push-pull VCO in 0.13um CMOS. Electron Lett., 42(4):208–209, Feb. 2006.

[49] J.P. Maligeorgos and J.R. Long. A Low-Voltage 5.1-5.8-GHz Image-Reject Receiver with Wide Dynamic Range. IEEE J. Solid-State Circuits, 35(12):1917–1926, Dec. 2000.

[50] J. Wong and H. Luong. A 1.5-V 4-GHz Dynamic-Loading Regenerative Frequency Doubler in a 0.35-um CMOS Process. IEEE Transactions on Circuits and Systems, 50(8):450–455, Aug. 2003.

[51] D.K. Ma and J.R. Long. A Subharmonically Injected LC Delay Line Oscillator for 17-GHz Quadrature LO Generation. IEEE J. Solid-State Circuits, 39(9):1434–1445, Sept. 2004.

[52] W.L. Chan and J.R. Long. A 56-65 GHz Injection-Locked Frequency Tripler with Quadrature Outputs in 90-nm CMOS. IEEE J. Solid-State Circuits, 43(12):2739–2746, Dec. 2008.

[53] L. Zhang, D. Karasiewicz, B. Ciftcioglu, and H. Wu. A 1.6-to-3.2/4.8 GHz Dual-Modulus Injection-Locked Frequency Multiplier in 0.18um Digital CMOS. In IEEE RFIC Symp. Dig. Papers, pages 427–430, 2008.

[54] International technology roadmap of semiconductor. www.itrs.org, 2005.

[55] V. Tiwari et al. Reducing Power in High-performance Microprocessors. In Design Automation Conference (DAC), pages 732–737, 1998.

[56] A.V. Mule et al. Electrical and Optical Clock Distribution Networks For Gigascale Microprocessors. Transactions on VLSI, pages 582–594, Oct. 2002.

[57] G. Geannopoulos and X. Dai. An adaptive Digital Deskewing Circuit for Clock Distribution Networks. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 400–401, 1998.

[58] S. Tam, S. Rusu, U.N. Desai, R. Kim, J. Zhang, and I. Young. Clock Generation and Distribution for the First IA-64 Microprocessor. IEEE J. Solid-State Circuits, pages 1545–1552, Nov. 2000.

[59] P.J. Restle et al. A Clock Distribution Network for Microprocessors. IEEE J. Solid-State Circuits, 36(5):792–799, May 2001.

[60] N.A. Kurd, J.S. Barkatullah, R.O. Dizon, T.D. Fletcher, and P.D. Madland. A Multigigahertz Clocking Scheme for the Pentium 4 Microprocessor. IEEE J. Solid-State Circuits, 36(11):1647–1653, Nov. 2001.

[61] J.W. Goodman, F.J. Leonberger, et al. Optical Interconnections for VLSI Systems. Proc. IEEE, 72:850–866, July 1984.

[62] E. Kaimiley, P. Marchand, et al. Performance Comparison between Optoelectronic and VLSI Multistage Interconnect Networks. J. Lightwave Technol., 9:1674–1692, 1991.

[63] K.C. Cadien et al. Challenges for On-Chip Optical Interconnects. Proc. SPIE, 5730:133–143, Nov. 2005.

[64] R. Li, X.L. Guo, D.J. Yang, and K. K.O. Initialization of a Wireless Clock Distribution System Using an External Antenna. In IEEE Custom Integrated Circuits Conf. Dig. Tech. Papers, pages 105–108, 2005.

[65] X. Guo, D.J. Yang, R. Li, and K. K.O. A Receiver with Start-up Initialization and Programmable Delays for Wireless Clock Distribution. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 386–387, 2006.

[66] G.A. Pratt and J. Nguyen. Distributed Synchronous Clocking. IEEE Trans. Parallel Distributed Systems, 6(3):314–328, March 1995.

[67] V. Gutnik and A.P. Chandrakasan. Active GHz Clock Network Using Distributed PLLs. IEEE J. Solid-State Circuits, pages 1553–1560, Nov. 2000.

[68] H. Mizuno and K. Ishibashi. A Noise-Immune GHz-Clock Distribution Scheme using Synchronous Distributed Oscillators. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 404–405, 1998.

[69] H.-A. Tanaka, A. Hasegawa, H. Mizuno, and T. Endo. Synchronizability of Distributed Clock Oscillators. IEEE Trans. Circuits Syst. I, 49(9):1271–1278, Sep. 2002.

[70] J. Wood, C. Edwards, and S. Lipa. Rotary Traveling-Wave Oscillator Arrays: a New Clock Technology. IEEE J. Solid-State Circuits, 36(11):1654–1665, Nov. 2001.

[71] F. O’Mahony, C.P. Yue, M.A. Horowitz, and S.S. Wong. A 10-GHz Global Clock Distribution Using Coupled Standing-Wave Oscillators. IEEE J. Solid-State Circuits, 38(11):1813–1820, Nov. 2003.

[72] S.C Chan, K.L. Shepard, and P.J. Restle. Uniform-Phase Uniform Amplitude Resonant-Load Global Clock Distributions. IEEE J. Solid-State Circuits, pages 102–109, March 2005.

[73] S.C. Chang, K.L. Shepard, and P.J. Restle. 1.1 to 1.6GHz Distributed Differential Oscillator Global Clock Network. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 518–519, 2005.

[74] H. Wu and A. Hajimiri. Silicon-Based Distributed Voltage Controlled Oscillators. IEEE J. Solid-State Circuits, 36(3):493–502, Mar. 2001.

[75] E.L. Ginzton, W.R. Hewlett, J.H. Jasberg, and J.D. Noe. Distributed amplification. Proc. IRE, 36:956–969, Aug. 1948.

[76] K. Kamogawa, T. Tokumitsu, and M. Aikawa. Injection-Locked Oscillator Chain: A Possible Solution to Millimeter-Wave MMIC Synthesizers. IEEE Trans. Microwave Theory Tech., 45(9):1578–1584, Sept. 1997.

[77] K. Kundert, J. White, and A. Sangiovanni-Vincentelli. Steady-State Methods for Simulating Analog and Microwave Circuits. Springer, 1990.

[78] M.J. Gingell. Single Sideband Modulation Using Sequence Asymmetric Polyphase Networks. Electrical Communication, 48(1-2):21–25, 1973.

[79] A. Ravi, K. Soumyanath, L.R. Carley, and R. Bishop. An Integrated 10/5GHz Injection-Locked Quadrature LC VCO in a 0.18µm Digital CMOS Process. In Proceedings of the 28th European Solid-State Circuits Conf., pages 543–6, 2002.

[80] S. Gierkink, S. Levantino, R. Frye, C. Samori, and V. Boccuzzi. A Low-Phase-Noise 5-GHz CMOS Quadrature VCO Using Superharmonic Coupling. IEEE J. Solid-State Circuits, 38(7):1148–1154, Jul. 2003.

[81] P. Kinget, R. Melville, D. Long, and V. Gopinathan. An Injection-Locking Scheme for Precise Quadrature Generation. IEEE J. Solid-State Circuits, 37(7):845–851, Jul. 2002.

[82] X. Guan, H. Hashemi, and A. Hajimiri. A Fully Integrated 24-GHz Eight-Element Phased-Array Receiver in Silicon. IEEE J. Solid-State Circuits, 39(12):2311–2320, Dec. 2004.

[83] D.W. Kang, D.H. Baek, S.H. Jeon, J.W. Park, and S.C. Hong. A Miniaturized K-band Balanced Frequency Doubler Using InGaP HBT Technology. In IEEE MTT-S Int. Microwave Symp. Dig., pages 107–110, 2003.

[84] S. Hackl and J. Bock. 42 GHz Active Frequency Doubler in SiGe Bipolar Technology. In Proc. 3rd Int. Microwave and Millimeter Wave Tech. Conf., pages 54–57, 2002.

[85] T. Masuda et al. Low Power Single-ended Active Frequency Doubler for a 60GHz-band Application. In Proc. GaAs, 2002.

[86] C. Fager, L. Landen, and H. Zirath. High Output Power, Broadband 28-56 GHz MMIC Frequency Doubler. In IEEE MTT-S Int. Microwave Symp. Dig., pages 1589–1591, 2002.

[87] S. Tam, R.D. Limaye, and U.N. Desai. Clock Generation and Distribution for the 130-nm Itanium 2 Processor With 6-MB On-Die L3 Cache. IEEE J. Solid-State Circuits, 39(4):636–642, April 2004.

[88] Z. Xu and K. Shepard. Low-Jitter Active Deskewing Through Injection-Locked Resonant Clocking. In IEEE Custom Integrated Circuits Conf. Dig. Tech. Papers, pages 9–12, 2007.

[89] S.C. Chang, P.J. Restle, K.L. Shepard, N.K. James, and R.L. Franch. A 4.6GHz Resonant Global Clock Distribution Network. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 342–343, 2004.

[90] B. Mesgarzadeh, M. Hansson, and A. Alvandpour. Jitter Characteristic in Resonant Clock Distribution. In Solid-State Circuits Conference, ESSCIRC 2006. Proceedings of the 32nd European, pages 464–467, 2006.

[91] L. Lee and C.K. Yang. An Adaptive Low-Jitter LC-Based Clock Distribution. In IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pages 182–183, 2007.

[92] G. Semeraro et al. Dynamic Frequency and Voltage Control for a Multiple Clock Domain Microarchitecture. In International Symposium on Microarchitecture, pages 356–367, Istanbul, Turkey, November 2002.

[93] A. Iyer and D. Marculescu. Power-Performance Evaluation of Globally Asynchronous, Locally Synchronous Processors. In International Symposium on Computer Architecture, pages 158–168, Anchorage, Alaska, May 2002.

[94] Y. Zhu, D. Albonesi, and A. Buyuktosunoglu. A High Performance, Energy Efficient, GALS Processor Microarchitecture with Reduced Implementation Complexity. In International Symposium on Performance Analysis of Systems and Software, pages 42–53, Austin, Texas, March 2005.

[95] W. J. Bowhill et al. Circuit Implementation of a 300-MHz 64-bit Second-generation CMOS Alpha CPU. Digital Technology Journal, 7(1):100–118, 1995.

[96] D. W. Bailey and B. J. Benschneider. Clocking Design and Analysis for a 600-MHz Alpha Microprocessor. IEEE Journal of Solid-State Circuits, 33(11):1627–1633, November 1998.

[97] L. Zhang et al. Injection-Locked Clocking: A Lower-Power Clock Distribution Scheme for High-End Microprocessors. Technical report, Dept. Electrical & Computer Engineering, Univ. of Rochester, September 2007.

[98] A. S. Sedra and K. C. Smith. Microelectronic Circuits. Oxford University Press, 2004.

[99] L. Zhang, B. Ciftcioglu, and H. Wu. Active Deskew in Injection-Locked Clocking. In IEEE Custom Integrated Circuits Conference (CICC), 2008.

[100] L. Zhang, Berkehan Ciftcioglu, and H. Wu. A 1V, 1mW, 4GHz Injection-Locked Oscillator for High Performance Clocking. In IEEE Custom Integrated Circuits Conf. Dig. Tech. Papers, 2007.

[101] X. S. Yao and L. Maleki. Optoelectronic Oscillator for Photonic Systems. IEEE Journal of Quantum Electronics, 32(7):1141–1149, July 1996.

[102] P. Devgan and D. Serkland and G. Keeler and K. Geib and P. Kumar. An Optoelectronic Oscillator Using an 850-nm VCSEL for Generating Low Jitter Optical Pulses. IEEE Photonic Technology Letters, 18(5):685–687, March 2006.