[IEEE 2012 IEEE Custom Integrated Circuits Conference - CICC 2012 - San Jose, CA, USA (2012.09.9-2012.09.12)] Proceedings of the IEEE 2012 Custom Integrated Circuits Conference - A

A 10Gb/s 10mW 2-Tap Reconfigurable Pre-Emphasis Transmitter in 65nm LP CMOS

Yue Lu1, Kwangmo Jung1, Yasuo Hidaka2, and Elad Alon1

1University of California, Berkeley, 2Fujitsu Laboratories of America

Abstract – A low-power pre-emphasis voltage mode transmitter architecture with output swing control, pre-emphasis coefficient control, and online impedance calibration is proposed and demonstrated. A 65nm LP CMOS implementation of this architecture dissipates only ~10mW from a 1.2V supply when transmitting 10Gb/s 400mV differential peak-to-peak data with 2-tap pre-emphasis, achieving 1pJ/bit energy efficiency.

I. INTRODUCTION

The pursuit of energy-efficient high-speed links has popularized the use of voltage-mode (VM) transmitters due to their ideally 4x lower signaling power (=Vsw/2RT) compared to current mode logic (CML) transmitters [1]. These transmitters must typically support impedance matching and pre-emphasis (for signal integrity) as well as swing control to enable reduced power on clean channels.

While each of these individual functions has been supported by previous VM designs [1-4], combining all of these functions in an energy efficient manner remains challenging. Specifically, for VM TXs with pre-emphasis, the most straightforward approach to including pre-emphasis (as described in [1]) causes the signaling power to go up as the level of pre-emphasis is increased. Significant follow-up work has therefore been done to reduce the signaling power consumption of such pre-emphasis voltage-mode (PEVM) transmitters [2, 3]. However, both of these approaches require a large number of unit cells due to the non-linear mapping between output swing and driver impedance (as in [3]) or the need for simultaneous impedance calibration (as in [2]). Due to the constraint of transistor minimum size in a given technology, the large number of unit cells and associated decoding circuitry results in excess switching capacitance on the high-speed data path.

At low data-rates where the signaling power consumption is relatively large, the power overhead from extra digital gates on the data-path is mild. However, as the data-rate is increased and the digital power rises along with it, the power consumed by the complex pre-driver logic and unit cells necessary to support these PEVM TXs can eliminate any benefit from reduced signaling power.

In order to achieve a highly reconfigurable VM TX with swing control, impedance control, and pre-emphasis, and to reduce signaling power without sacrificing pre-driver power, this paper presents a new PEVM TX architecture that combines a shunt device for pre-emphasis (similar to [2]) with substantially simpler decoding logic. To clarify the motivations behind this architecture, Section II first describes the digital pre-driver and signaling power characteristics of current PEVM transmitters, and then describes how the proposed architecture improves upon these tradeoffs. Section III describes the proposed PEVM circuit implementation details, including the swing and impedance control circuitry. Section IV presents measured results from a 65nm LP prototype, and Section V concludes.

II. PEVM TX POWER CONSUMPTION

As previously discussed, much of the earlier work in PEVM designs has focused on reducing the signaling power overheads due to pre-emphasis. For the sake of comparison, the equivalent circuits of the TX designs from [1] and [3] and their total signaling currents are shown in Fig. 1. The design from [1] lowers the output voltage level (implementing pre de-emphasis) by creating an extra crowbar path (Gkill) from Vdrv to ground in parallel with the main signal path (Gsig). In contrast, [3] modulates the impedance of the transmitter in order to lower its output voltage. This provides excellent signaling power efficiency since the signaling current scales proportionally with the output swing.

[Fig. 1(b) plot: signaling current (normalized to Vdrv/Rterm) vs. Vsw/Vdrv, both axes spanning 0 to 0.5, with curves for TX [1], TX [3], and the proposed TX.]

Fig. 1 (a) Equivalent circuits for the transmitters from [1], [3], and the proposed design. (b) Signaling current vs. output voltage for the three TXs.

Although reducing signaling power along with output swing is an attractive benefit, this approach comes at the expense of sacrificing output termination and hence system linearity (and thus perhaps signal integrity). More importantly, the nonlinear mapping between the output swing and the required impedance would require a larger number of unit cells than [1] for the same pre-emphasis coefficient resolution. This can be seen by comparing the transmitters’ Vsw vs. Gsig equations for [3] (Eq. (1)) and [1] (Eq. (2)):


where GT and Gsig (≤GT) are the conductances of the transmission line and the driver, respectively, Gkill is the conductance of the shunt device creating the crowbar current path, and Vdrv is the driver’s supply voltage. In contrast to the design in [1] (Eq. (2)), where Vsw is linearly proportional to Gsig, Vsw in the design from [3] (Eq. (1)) rises steeply at low Gsig and saturates at high Gsig. Therefore, for the same Vsw resolution, the TX design from [3] requires roughly twice as many unit cells in order to more finely partition Gsig. As discussed in both [4] and [5], a larger number of unit cells leads to increased pre-driver power.

Since the PEVM architecture in [1] has better digital power efficiency, a more energy-efficient PEVM can be realized if its signaling power can be improved at the same time. The key to achieving this is modifying the means by which pre-emphasis is implemented. Specifically, in [1], signaling power is wasted in the crowbar current paths created from Vdrv to ground when reduced swing (due to pre-emphasis) is used. Instead, similar to the technique used in [2], the transmitter’s output voltage can be reduced by using a shunting path that is in parallel with the channel, as shown in Fig. 1(a). The Vsw vs. Gsig relationship for this driver is:

Comparing (3) with (2), the additional benefit of this configuration is that the Gsig range for Vsw>0 in (3) is twice that in (2), enabling one additional bit of pre-emphasis resolution for the same number of unit cells. Finally, the total signaling current can be expressed as:

As shown in Fig. 1(b), the proposed architecture allows the signaling current to drop along with the transmitted swing (albeit not quite as proportionally as in [3]) while retaining a smaller number of unit cells and hence lower pre-driver power.
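The unit-cell count argument above can be illustrated numerically. The sketch below is only an illustration under an assumed model: it treats the impedance-modulated driver as a simple resistive divider, Vsw = Vdrv·Gsig/(Gsig+GT), which is a stand-in and not necessarily the exact form of Eq. (1). It then counts how many uniform Gsig steps are needed before the worst-case Vsw step matches that of an ideal linear mapping over the same 0 to Vdrv/2 range with 15 cells. All function names are hypothetical.

```python
def divider_swing(g_sig, g_t=1.0, v_drv=1.0):
    """Assumed divider model (illustrative only): Vsw = Vdrv * Gsig / (Gsig + GT)."""
    return v_drv * g_sig / (g_sig + g_t)


def worst_step_divider(n_cells):
    """Largest Vsw step when Gsig is stepped uniformly from 0 to GT in n_cells steps."""
    levels = [divider_swing(k / n_cells) for k in range(n_cells + 1)]
    return max(hi - lo for lo, hi in zip(levels, levels[1:]))


def worst_step_linear(n_cells, v_max=0.5):
    """Step size of an ideal linear mapping covering 0..v_max with n_cells steps."""
    return v_max / n_cells


def cells_needed_divider(target_step):
    """Smallest cell count whose worst-case divider step does not exceed target_step."""
    n = 1
    # Small relative tolerance guards against floating-point rounding at equality.
    while worst_step_divider(n) > target_step * (1 + 1e-9):
        n += 1
    return n


target = worst_step_linear(15)        # resolution of a 15-cell linear mapping
n_divider = cells_needed_divider(target)
print(f"linear cells: 15, divider cells for same worst-case step: {n_divider}")
```

Under this assumed model the divider needs close to twice as many cells, because its steepest region (low Gsig) sets the worst-case step.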

Another design choice made in some previous work [2, 4] is to incorporate both impedance calibration and pre-emphasis control into the unit cell design. Although this may simplify the overall implementation, even with non-uniform unit cells [4], this choice tends to compromise the pre-emphasis resolution for a reasonable total number of unit cells. For example, the design in [4] used 24 unit cells to achieve 4.6-bit dynamic range in impedance but only ~2-bit dynamic range in pre-emphasis. To decouple impedance calibration and pre-emphasis control and hence increase the pre-emphasis dynamic range, as in [1], we propose to use separate regulator-based loops to control the impedance of our pull, push, and shunt devices, leaving unit cells only for pre-emphasis coefficient control.

Fig. 2 Overall transmitter architecture.

III. TX ARCHITECTURE AND CIRCUIT IMPLEMENTATION

In this section, we describe the circuit details of a 2-tap implementation of the proposed PEVM TX architecture. Specifically, this section describes how the pre-emphasis decoder, driver unit cell, swing control loop, and online impedance control loop are implemented.

A. Overall Structure

The 2-tap transmitter architecture is shown in Fig. 2. Since a large number of TX pre-emphasis taps quickly leads to excessive pre-driver power due to complex unit cell logic, in this design we have chosen to implement only 2 taps of pre-emphasis. The pre-emphasis tap can be assigned to be either the post-cursor or the pre-cursor. As will be described in further detail shortly, following 16:1 serialization, before entering the final driver, the digital data stream is pre-decoded in order to simplify the logic required within each driver unit cell.

In this design, the final driver comprises 15 unit cells, which corresponds to the maximum achievable resolution before the pre-driver logic transistors hit the minimum size constraint. This results in a 4-bit tunable pre-emphasis weight from 0 to 100% with a step of ~6.7%.
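The resolution figures quoted above follow directly from the cell count. The short sketch below assumes that each unit cell switched into pre-emphasis mode contributes an equal 1/15 share of the weight; the variable names are illustrative.

```python
N_CELLS = 15  # unit cells in the final driver (from the text)

# Assumption: each cell configured for pre-emphasis adds 1/N_CELLS of weight.
weights = [code / N_CELLS for code in range(N_CELLS + 1)]

step_percent = 100.0 / N_CELLS                  # weight step per cell
bits_needed = (len(weights) - 1).bit_length()   # bits to address codes 0..15

print(f"{len(weights)} settings, {bits_needed} bits, step = {step_percent:.1f}%")
```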

B. Pre-Emphasis Decoder

Knowing that the target TX FIR filter will be high-pass in nature, the pre-emphasis decoder simply looks for differences/similarities between the current bit and the next or previous bit (using XORs and ANDs) in order to provide appropriate signals for the pull-up, pull-down, and shunt devices. The principal motivation for performing this pre-decoding is that these logic gates can be shared amongst all final driver unit cells. Each individual unit cell then only needs to include MUXes to decide whether to use the pre-decoder outputs or simply the raw data bit. In comparison to a design that effectively requires multiple complex logic gates within each unit cell ([3]), this design reduces the number of minimum-sized transistors within each cell, and hence results in significantly reduced digital power overhead.
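One plausible reading of this pre-decoding is sketched below for a single output leg. The exact gate-level signals are not given in the text, so the mapping here is an assumption: when the current bit differs from the neighboring (previous or next) bit, the leg drives the current bit at full strength; when the two bits match, the shunt device is enabled instead. The function name `predecode` is hypothetical.

```python
def predecode(cur, other):
    """Return (pull_up, pull_down, shunt) enables for one leg.

    cur   -- current data bit (0 or 1)
    other -- neighboring bit used for the tap (previous or next), 0 or 1
    Assumed behavior: drive on a transition, shunt on a repeated bit.
    """
    differs = cur ^ other              # XOR detects a transition
    pull_up = cur & differs            # AND gates the drive signals
    pull_down = (1 - cur) & differs
    shunt = 1 - differs                # bits match: de-emphasize via the shunt
    return pull_up, pull_down, shunt


for cur in (0, 1):
    for other in (0, 1):
        print(cur, other, predecode(cur, other))
```

Because these three signals depend only on the data stream, they can be generated once and distributed to every unit cell, which is the sharing benefit described above.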

C. Driver Unit Cell

The driver unit cell is the interface between the full-swing digital data and the transmitted low-swing analog signal. Since the target maximum swing of this particular design is low (<250mV differential amplitude), each unit cell utilizes an N-over-N voltage mode driver. Similarly, the shunt devices are also implemented with NMOS transistors due to the low output common-mode voltage. The supply voltage of the N-over-N leg is connected to Vdrv, which is generated by the swing control regulator. The gates of the pull-up, pull-down, and shunt devices are driven by three buffers whose supplies are Vtp, Vbt, and Vmd, respectively, which are generated by the impedance control regulators.

As mentioned earlier, since we have decoupled the impedance control from the pre-emphasis control, MUXes can simply be added to the inputs of the gate buffers of the driver devices to achieve coefficient adjustment. If the unit cell is enabled for pre-emphasis, the MUX within the unit cell will connect the pull-up, pull-down, and shunt device gates to the outputs of the pre-decoder, which compares the current bit with the next or previous bit. If the unit cell is not configured for pre-emphasis, the shunt device gate will be grounded and the pull-up and pull-down devices will be driven purely by the current bit. With this simplified unit cell logic consisting only of 2-input MUXes and level-shifting inverters, the power overhead from high-speed digital gates is significantly reduced.
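A behavioral sketch of the unit cell array may help connect the MUX configuration to the resulting 2-tap filter. It assumes ideal, equally weighted cells with outputs normalized to ±1, a post-cursor tap, and that a pre-emphasis cell contributes nothing when it shunts; these are modeling assumptions, and `cell_output`, `tx_levels`, and `fir_levels` are hypothetical names. Under these assumptions, a code of k pre-emphasis cells is equivalent to the filter ((15 - k/2)·x[n] - (k/2)·x[n-1])/15, which for k = 10 gives the (10x[n]-5x[n-1])/15 setting used in the measurements.

```python
def cell_output(cur, prev, pe_enabled):
    """Normalized contribution of one unit cell: -1, 0, or +1."""
    level = 1 if cur else -1
    if not pe_enabled:
        return level                      # driven purely by the current bit
    return level if cur != prev else 0    # assumed: shunt when the bits match


def tx_levels(bits, pe_code, n_cells=15, first_prev=0):
    """Sum the cells for each bit; the first pe_code cells are in pre-emphasis mode."""
    levels, prev = [], first_prev
    for cur in bits:
        total = sum(cell_output(cur, prev, cell < pe_code) for cell in range(n_cells))
        levels.append(total / n_cells)
        prev = cur
    return levels


def fir_levels(bits, pe_code, n_cells=15, first_prev=0):
    """Equivalent 2-tap FIR on +/-1 symbols, for comparison with tx_levels."""
    def sym(b):
        return 1 if b else -1
    main, post = n_cells - pe_code / 2, pe_code / 2
    levels, prev = [], first_prev
    for cur in bits:
        levels.append((main * sym(cur) - post * sym(prev)) / n_cells)
        prev = cur
    return levels


bits = [1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
print(tx_levels(bits, pe_code=10))
print(fir_levels(bits, pe_code=10))
```

In this model the output is at full swing on every transition and drops to (15 - k)/15 of full swing on repeated bits, consistent with a de-emphasis weight that steps by 1/15 per cell.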

D. Swing and Impedance Control Loops

As shown in Fig. 3, swing control is achieved through a voltage regulator that sets Vdrv based on a reference voltage generated from a voltage DAC. The regulator utilizes a comparator-based architecture with a source-follower output stage in order to reduce the power consumed by its feedback loop. Given the broadband low output impedance and gmro intrinsic PSRR of the NMOS source follower power device [6], the regulator’s feedback control loop need not achieve high bandwidth. Hence, this design utilizes a switched-capacitor resistor (SCR) at the gate of the power device to implement a low-pass filter and attenuate the output voltage ripple due to the loop’s dither to less than 1mV (the capacitance at the gate of the power device is ~1000X larger than that of the SCR).
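The behavior of such a comparator-based loop can be sketched with a simple discrete-time model. This is an illustrative sketch and not the implemented circuit: it assumes an ideal comparator, a fixed gate-to-source drop for the source follower (`v_gs`, a made-up value), and a switched capacitor that shares charge between a rail and the gate once per cycle with the ~1000X capacitance ratio quoted above.

```python
def simulate_vdrv(v_ref, cycles=20000, vdd=1.2, v_gs=0.4, cap_ratio=1000.0):
    """Bang-bang regulator model; returns the Vdrv value at every cycle.

    Assumptions (not from the paper): ideal comparator, constant v_gs for the
    NMOS source follower, and the switched cap toggling between vdd and ground.
    """
    v_gate = 0.0
    trace = []
    for _ in range(cycles):
        v_drv = max(v_gate - v_gs, 0.0)          # source-follower output
        rail = vdd if v_drv < v_ref else 0.0     # comparator decision: up or down
        # Charge sharing between the small switched cap and the large gate cap.
        v_gate += (rail - v_gate) / (cap_ratio + 1.0)
        trace.append(v_drv)
    return trace


trace = simulate_vdrv(v_ref=0.5)
tail = trace[-2000:]
print(f"mean Vdrv = {sum(tail) / len(tail):.4f} V, "
      f"dither p2p = {(max(tail) - min(tail)) * 1e3:.2f} mV")
```

In this model, each decision moves the gate by at most about one thousandth of the rail-to-gate difference, so the steady-state dither stays on the order of a millivolt, which is why a large gate capacitance relative to the switched capacitor is attractive.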

The error signal between Vdrv and the reference voltage is generated by a comparator. Given the relatively low target swing and hence low common-mode at the comparator’s input, the feedback comparator is implemented with a PMOS-input StrongArm latch. Minimizing the power dissipation of this comparator leads to nearly minimum sized devices. Thus, if left uncompensated, the comparator (and hence the overall regulator) could exhibit >100mV of offset. An automatic digital loop was therefore utilized to continuously cancel the comparator’s offset.

As indicated by the schematic and example waveforms in Fig. 3, the offset cancellation and regulator feedback processes operate in an interleaved fashion. The comparator’s offset is cancelled by a 5-bit capacitor-DAC (CDAC) that is connected to the drains of the input devices. In steady-state, the output of the offset accumulator will dither around a certain value. In order to limit this offset dither, the LSB of the CDAC is chosen to be 4mV.
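The steady-state dither of the offset accumulator can be illustrated with a minimal up/down model. The sketch assumes a signed 5-bit code with a 4mV LSB (so a correction range of roughly ±64mV, which is an assumption about how the CDAC range is centered) and an ideal comparator evaluated with its inputs equalized during calibration; `calibrate_offset` is a hypothetical name.

```python
def calibrate_offset(true_offset_mv, lsb_mv=4.0, bits=5, cycles=64):
    """Up/down accumulator driving an offset-correction DAC.

    Returns the code history. The signed code range is an assumption.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1   # -16 .. +15
    code, history = 0, []
    for _ in range(cycles):
        residual = true_offset_mv - code * lsb_mv   # what the comparator still sees
        code += 1 if residual > 0 else -1           # comparator sign steers the code
        code = max(lo, min(hi, code))               # saturate at the DAC limits
        history.append(code)
    return history


history = calibrate_offset(true_offset_mv=37.0)
print("last codes:", history[-6:])
print("residuals (mV):", [37.0 - c * 4.0 for c in history[-6:]])
```

Once converged, the code toggles between the two values that bracket the true offset, so the residual offset in this model is bounded by one LSB.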

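[Fig. 3 schematic and waveforms: voltage DAC, offset-free comparator (latch with CapDAC, clocked by clkcal), switched-capacitor (SW-CAP) resistor, and source-follower output from Vdd to Vdrv; example waveforms show the accumulator output and SW-CAP output, with the points at which the offset is cancelled and Vdrv has settled.]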

Fig. 3 Swing control with comparator offset cancellation.

As shown in Fig. 4, the impedance control loops operate in a manner very similar to the Vdrv regulator. Since Vtp and Vbt as well as perhaps Vmd may need to be very close to Vdd, and since the regulator’s power device is an NMOS source-follower, the gate voltage of the power device may need to be higher than Vdd. The regulator feedback control loop therefore uses a higher voltage (but very low current) supply VddH (e.g. 1.6V) along with a level shifter to enable this higher gate voltage. The devices in the switched-cap resistor are hence all thick-oxide to ensure reliability.

Fig. 4 Impedance control loop.

IV. MEASUREMENT RESULTS

In order to experimentally verify the proposed TX architecture and circuits, a test-chip including this design was fabricated in a 65nm LP CMOS process. The die photo along with the TX layout is shown in Fig. 5, highlighting that the TX occupies an area of ~300μm×200μm.

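[Fig. 5 floor plan labels: Vtp Gen, Vbt Gen, Vdrv Gen, Vmd Gen, Voltage DAC, 16:1 Serializer, 2-Tap Pre-emphasis Driver; overall dimensions 300um x 200um, with block dimensions of 90um, 150um, 50um, and 50um annotated.]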

Fig. 5 Die photo and TX floor plan

Figure 6(a) shows the TX eye with 2^23-1 PRBS data and Vsw = 250mV before and after a 10” FR4 PCB trace. Since the total output pad capacitance including ESD is about 1.3pF and there is ~2” of FR4 PCB trace between the test-chip and the connector, the intrinsic output bandwidth of the TX even without the additional 10” trace is limited to ~4GHz. This inherent bandwidth limitation is the reason for the ISI apparent in Fig. 6(a). After the 10” trace, the eye is completely closed. However, as shown in Fig. 6(b), after configuring the TX to apply a pre-emphasis filter of (10x[n]-5x[n-1])/15, the eye is opened with 3.19ps/22.22ps RMS/P2P jitter.

Fig. 6 2^23-1 PRBS eye before and after 10” trace with post-tap pre-emphasis (a) turned off and (b) turned on (100mV/div vertical, 20ps/div horizontal).

To verify the impedance and swing control loops, Fig. 7 shows the measured characteristics of the transmitter. Across output swings and pre-emphasis settings (PE code), the output impedance remains nearly flat, indicating the effectiveness of the impedance loop. Note that the deviation from the ideal 50Ω value is due to mismatches between the different reference resistor strings within each regulator.

[Fig. 7 plots: (left) output swing (mV) vs. swing control code; (right) output resistance (Ω) vs. output swing (mV) for No PE, PE=4, and PE=8.]

Fig. 7 TX output swing and impedance characteristics.

[Fig. 8 plot: signaling power at 10Gbps (mW) vs. pre-emphasis control code, measurement and calculation.]

Fig. 8 Signaling power with 250mV ampl. swing vs. pre-emphasis setting.

Finally, the power consumption of the TX was characterized. Figure 8 shows the signaling power vs. pre-emphasis code at a nominal output swing of 250mV. The measured signaling power drops with lower swing and tracks the analytical predictions almost perfectly. The total power vs. output swing is plotted in Fig. 9(a), and Fig. 9(b) shows the total power vs. data-rate. The total power consumption is ~10mW for a 10Gb/s 200mV amplitude PRBS sequence. Based on these trends, we can also extract that the analog power is ~5mW, with 2.4mW due to the signaling power and the remaining 2.6mW from other biasing and leakage paths. Similarly, the total digital power is ~5mW.
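The 2.4mW signaling figure can be cross-checked with a back-of-the-envelope calculation. The sketch assumes no pre-emphasis, RT = 50Ω, a signaling current of Vsw/(2RT) as stated in the introduction, and that this current is drawn from the 1.2V supply through the source-follower regulator (so the supply power is Vdd times the signaling current). The function names are illustrative.

```python
def signaling_current(v_sw, r_t=50.0):
    """Ideal VM signaling current for differential amplitude v_sw (volts)."""
    return v_sw / (2.0 * r_t)


def signaling_power(v_sw, vdd=1.2, r_t=50.0):
    """Supply power, assuming the current is drawn from vdd via a linear regulator."""
    return vdd * signaling_current(v_sw, r_t)


for v_sw in (0.20, 0.25):
    print(f"Vsw = {v_sw * 1e3:.0f} mV -> "
          f"I = {signaling_current(v_sw) * 1e3:.2f} mA, "
          f"P = {signaling_power(v_sw) * 1e3:.2f} mW")
```

Under these assumptions, a 200mV amplitude corresponds to 2mA and 2.4mW, in line with the extracted signaling power.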

[Fig. 9 plots: (a) total power at 10Gbps (mW) vs. output swing (mV); (b) total power at 200mV output swing (mW) vs. frequency (GHz).]

Fig. 9 TX power vs. output swing and data-rate.

Table I summarizes the results of this design in comparison with previous PEVM designs with the same number of equalization taps and similar TX swings. In addition to supporting output swing scaling and impedance control, the proposed PEVM works at the highest data rate while achieving ~2x improved energy-efficiency, largely due to the reduced digital power.

Table I. PEVM comparison

                   [1]         [2]         [3]               This work
Technology         0.18um      45nm        90nm              65nm
Data rate          3.6Gb/s     7.4Gb/s     4Gb/s             10Gb/s
Output Swing       250mV       400mV       0-500mV           80-250mV
# of Taps          2           2           2                 2
Power              10mW        32mW        5-17mW            8-11mW
Efficiency         2.8pJ/bit   4.3pJ/bit   1.25-4.25pJ/bit   0.8-1.1pJ/bit
Digital Overhead   N/A         N/A         1.25pJ/bit (a)    0.5pJ/bit (b)

(a) Decoder power only. (b) Includes 16:1 serializer and clock distribution power.
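The efficiency row of Table I follows from dividing power by data rate, since 1mW per 1Gb/s equals 1pJ/bit. The short sketch below recomputes it from the power and data-rate rows; the dictionary layout is only for illustration.

```python
# (min power in mW, max power in mW), data rate in Gb/s -- values from Table I
designs = {
    "[1]":       ((10.0, 10.0), 3.6),
    "[2]":       ((32.0, 32.0), 7.4),
    "[3]":       ((5.0, 17.0), 4.0),
    "This work": ((8.0, 11.0), 10.0),
}


def pj_per_bit(power_mw, rate_gbps):
    """Energy per bit in pJ: mW / (Gb/s) = pJ/bit."""
    return power_mw / rate_gbps


for name, ((p_min, p_max), rate) in designs.items():
    lo, hi = pj_per_bit(p_min, rate), pj_per_bit(p_max, rate)
    print(f"{name:10s} {lo:.2f} to {hi:.2f} pJ/bit")
```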

V. CONCLUSION

This work shows that with an optimized pre-emphasis implementation and circuitry, a voltage-mode transmitter can support impedance and swing control, programmable pre-emphasis, and output voltage-dependent signaling power, all while retaining energy-efficiency. Specifically, by separating the impedance control loop in order to minimize the number of final driver unit cells required for pre-emphasis control, using a shared pre-emphasis decoder with simple driver unit cells, and employing a shunt path for signal amplitude de-emphasis, the design maintains both low signaling power and low digital pre-driver power.

ACKNOWLEDGMENTS

The authors would like to thank the Berkeley Wireless Research Center, NSF Infrastructure Grant No. 0403427, IMPACT, and UC Discovery for support, Fujitsu Semiconductor Ltd. for device fabrication, Berkeley Design Automation, and H. Hwang of Samsung Electronics.

REFERENCES

[1] H. Hatamkhani, K-L. J. Wong, R. Drost, and C-K. K. Yang, “A 10mW 3.6Gbps I/O Transmitter,” Symp. VLSI Circuits, Jun. 2003.

[2] W. Dettloff et al., “A 32mW 7.4Gb/s Protocol-Agile Source-Series Terminated Transmitter in 45nm CMOS SOI,” IEEE ISSCC, Feb. 2010.

[3] R. Sredojevic and V. Stojanovic, “Digital Link Pre-emphasis with Dynamic Driver Impedance Modulation Links,” IEEE CICC, Sept. 2010.

[4] A. Amirkhany et al., “A 12.8-Gb/s/link Tri-Modal Single-Ended Memory Interface,” IEEE JSSC, Apr. 2012.

[5] K. Jung, Y. Lu, and E. Alon, “Power Analysis and Optimization for High-Speed I/O Transceivers,” IEEE MWSCAS, Aug. 2011.

[6] E. Alon and M. Horowitz, “Integrated Regulation for Energy-Efficient Digital Circuits,” IEEE JSSC, Aug. 2008.
