Integrated Systems Group, Massachusetts Institute of Technology

Design of High-Speed Links: A look at Modern VLSI Design

Vladimir Stojanović

Integrated Systems Group 2

Chip design is changing

Best systems trade-off circuits, architecture and system issues

Becoming constrained by power, not so much by area/density

Pentium 4: 125M transistors, 850 mW/mm2, 90nm tech, 103W, 3.4GHz

Pentium: 3M transistors, 30 mW/mm2, 0.6um tech, 4W, 0.1GHz

Integrated Systems Group 3

Power-performance system optimization

Complex, many levels of hierarchy and variables

Integrated Systems Group 4

Power-performance system optimization

Complex, many levels of hierarchy and variables

V. Stojanović, V.G. Oklobdzija, "Comparative Analysis of MS Latches and Flip-Flops for High-Performance and Low-Power Systems," IEEE Journal of Solid-State Circuits, April 1999.

Individual components: flops and latches (power and timing critical)

[Figure: logic block between two clocked flip-flops (D, Q, Clk)]

Integrated Systems Group 5

Power-performance system optimization

Complex, many levels of hierarchy and variables

Individual components: flops and latches (power and timing critical)

[Figure: logic block between two clocked flip-flops (D, Q, Clk)]

V. Stojanovic, D. Markovic, B. Nikolic, M. A. Horowitz and R. W. Brodersen, "Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization," European Solid-State Circuits Conference, September 2002.

System level, VLSI blocks and circuits
- Physical (Vdd, Vth, sizing)
- Logic
- uArchitecture (parallelism, pipelining)

[Figure: blocks with individual supply and threshold voltages (Vdd1, Vth1 through Vdd5, Vth5); Logic A and Logic B between clocked flip-flops, shown in pipelined and parallel arrangements]

Integrated Systems Group 6

Look at system-level problem: links

Seems pretty simple: Transmitter, Channel, Receiver

Challenging multi-disciplinary area: circuits, communications, optimization

Integrated Systems Group 7

What makes it challenging

Now, the bandwidth limit is in wires

High speed link chip

> 2 GHz signals

Integrated Systems Group 8

New link design

Dealing with bandwidth limited channels

This is an old research area
  Textbooks on digital communications
  Think modems, DSL

But can’t directly apply their solutions
  Standard approach requires high-speed A/Ds and digital signal processing
  20 GS/s A/Ds are expensive

(Un)fortunately need to rethink issues

Integrated Systems Group 9

Outline

Show system level optimization for links
Create a framework to evaluate trade-offs

Background on high-speed links

High-speed link modeling

System level optimization

Practical implementation issues

Current / future work

Integrated Systems Group 10

Backplane environment

Line attenuation
Reflections from stubs (vias)

[Figure: backplane signal path: on-chip parasitic (termination resistance and device loading capacitance), package, package via, line card trace, line card via, back plane connector, back plane trace, backplane via]

Integrated Systems Group 11

Backplane channel

Loss is variable
  Same backplane
  Different lengths
  Different stubs (top vs. bottom)

Attenuation is large
  >30dB @ 3GHz
  But is that bad?
  Required signal amplitude set by noise

[Figure: attenuation [dB] (0 to -60) vs. frequency [GHz] (0 to 10) for 9" FR4, 26" FR4, 9" FR4 with via stub, and 26" FR4 with via stub]

Integrated Systems Group 12

Inter-symbol interference (ISI)

Channel is low pass
  Our nice short pulse gets spread out

Dispersion: short latency (skin-effect, dielectric loss)
Reflections: long latency (impedance mismatches: connectors, via stubs, device parasitics, package)

[Figure: pulse response (0 to 1) vs. time [ns] (0 to 3), Tsymbol = 160ps]

Integrated Systems Group 13

ISI

Middle sample is corrupted by 0.2 trailing ISI (from the previous symbol) and 0.1 leading ISI (from the next symbol), resulting in 0.3 total ISI
As a result, the middle symbol is detected in error

[Figure: amplitude (0 to 1) vs. symbol time (0 to 18), with the erroneous sample marked "Error!"]
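The ISI arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration: the main-cursor value of 0.25, the +1/-1 symbol alphabet, and the function name `sampled_values` are assumptions that do not come from the slides; they are chosen so that 0.3 of total ISI exceeds the main cursor.

```python
import numpy as np

def sampled_values(symbols, pulse, main_index):
    """Receiver samples for a symbol sequence sent through a sampled pulse response."""
    y = np.convolve(symbols, pulse)
    # Align so that output n is the sample taken at the main cursor of symbol n.
    return y[main_index:main_index + len(symbols)]

# [leading ISI, main cursor (assumed value), trailing ISI]
pulse = np.array([0.1, 0.25, 0.2])
symbols = np.array([+1, -1, +1])          # the middle symbol is -1

samples = sampled_values(symbols, pulse, main_index=1)
middle = samples[1]                        # 0.1*(+1) - 0.25 + 0.2*(+1)
detected = 1 if middle > 0 else -1         # slicer with threshold at 0
```

With these assumed numbers the middle sample lands on the wrong side of the threshold, so the transmitted -1 is detected as +1.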

Integrated Systems Group 14

Prior state of high-speed links

[Figure: link block diagram: dataIn, serializer, driver/equalizer (clocked by a PLL from ref Clk), channel, data slicer, deserializer, dataOut, with clock and data recovery at the receiver]

Link components well developed
  Fast multiplexed transmitters and receivers
  Precise timing generation and data recovery

Starting to use equalization (1 to 2 taps)
  Few taps set manually at the transmitter

Integrated Systems Group 15

Barriers to improving link performance

No good link system and noise models
  Cannot predict the “right” architecture for a given set of channels
  Need to make performance/power tradeoff

Maximum achievable data rates unknown
  Limited link communication system design

Peak power constraint in the transmitter
  No solution for optimal transmit equalization
  No solution for automatic equalization

Integrated Systems Group 16

Previous system models

Mostly non-existent

Borrowed from computer systems
  Worst case analysis
  Can be too pessimistic in links

Borrowed from data communications
  Gaussian distributions
  Works well near mean
  Often way off at tails
  ISI distribution is bounded

Need accurate models
  To relate the power/complexity to performance

Integrated Systems Group 17

How bad is Gaussian model?

[Figure, left panel: cumulative ISI distribution, log10 probability [cdf] (0 to -10) vs. residual ISI [mV] (0 to 100); annotation: 40mV error @ 10^-10, 25% of eye height]

[Figure, right panel: impact on CDR phase, log10 steady-state phase probability (0 to -10) vs. phase count (80 to 180); annotations: 4% Tsymbol, 9% Tsymbol, error @ 10^-10]

Gaussian model only good down to 10^-3 probability
  Way pessimistic for much lower probabilities
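A small numerical sketch shows why a Gaussian fit overestimates the tails of a bounded ISI distribution. The residual cursor values below are hypothetical, symbols are assumed to be equiprobable +1/-1, and the function names are illustrative only; the exact tail is found by brute-force enumeration, which is practical only for a handful of cursors.

```python
import itertools
import math

def exact_tail(cursors, v):
    """Exact P(ISI > v) for ISI = sum(c_k * a_k) with a_k = +/-1 equiprobable."""
    n = len(cursors)
    hits = sum(
        1
        for signs in itertools.product((-1, 1), repeat=n)
        if sum(s * c for s, c in zip(signs, cursors)) > v
    )
    return hits / 2 ** n

def gaussian_tail(cursors, v):
    """P(ISI > v) for a zero-mean Gaussian with the same variance, sum(c_k^2)."""
    sigma = math.sqrt(sum(c * c for c in cursors))
    return 0.5 * math.erfc(v / (sigma * math.sqrt(2)))

cursors = [0.05, 0.03, 0.02, 0.01]      # hypothetical residual ISI cursors [V]
peak = sum(abs(c) for c in cursors)     # the ISI can never exceed this bound

beyond_exact = exact_tail(cursors, 0.12)     # past the bound: exactly zero
beyond_gauss = gaussian_tail(cursors, 0.12)  # the Gaussian still assigns probability here
```

Near the mean the two models agree reasonably well; past the peak-distortion bound the exact probability is zero while the Gaussian model keeps predicting errors.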

Integrated Systems Group 18

A new model

Use direct noise and interference statistics

Main system impairments
  Interference
  Voltage noise (thermal, supply, offsets, quantization)
  Timing noise: always looked at separately

Key to integrate with voltage noise sources
  Need to map from time to voltage

Integrated Systems Group 19

Effect of timing noise

[Figure: ideal sampling vs. jittered sampling, showing the voltage noise that appears when the receiver clock is off]

The effect depends on the size of the jitter, the input sequence, and the channel
  Need effective voltage noise distribution

Integrated Systems Group 20

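Example: Effect of transmitter jitter

Decompose output into ideal and noise
  Noise is a pair of pulses at the front and end of the symbol
  Width of each pulse is equal to the jitter
  Approximate with deltas on bandlimited channels

[Figure: symbol b_k transmitted between jittered edges kT + ε_k^TX and (k+1)T + ε_{k+1}^TX, split into an ideal pulse from kT to (k+1)T plus two narrow noise pulses]

$$\text{noise}_k(t) \approx -b_k\,\varepsilon_k^{TX}\,\delta(t - kT) + b_k\,\varepsilon_{k+1}^{TX}\,\delta\big(t - (k+1)T\big)$$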

V. Stojanović, M. Horowitz, “Modeling and Analysis of High-Speed Links,”IEEE Custom Integrated Circuits Conference, September 2003. (invited)

Integrated Systems Group 21

Jitter effect on voltage noise

Transmitter jitter
  High frequency (cycle-to-cycle) jitter is bad
    Changes the energy (area) of the symbol
    No correlation of noise sources that sum
  Low frequency jitter is less bad
    Effectively shifts waveform
    Correlated noise gives partial cancellation

Receive jitter
  Modeled by shift of transmit sequence
  Same as low frequency transmitter jitter

Bandwidth of the jitter is critical
  It sets the magnitude of the noise created

[Figure: receive jitter ε_k^RX shown as an equivalent shift of the transmit sequence]
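The difference between high-frequency and low-frequency transmitter jitter can be illustrated with the delta approximation of the noise pulses. The sketch below is a simplified illustration under stated assumptions: symbols are +1/-1, `eps[k]` is the timing error of the edge at kT, the jitter values are hypothetical, and the function names are not from the slides. It tracks only the areas of the two noise pulses of each symbol, not their filtering by the channel.

```python
import numpy as np

def jitter_noise_areas(bits, eps):
    """Areas of the front and back noise pulses of each symbol.

    bits: symbol values b_k; eps: edge timing errors, len(bits) + 1 entries.
    A late front edge removes area b_k*eps[k]; a late back edge adds b_k*eps[k+1].
    """
    bits = np.asarray(bits, dtype=float)
    eps = np.asarray(eps, dtype=float)
    front = -bits * eps[:-1]
    back = bits * eps[1:]
    return front, back

def net_area_error(bits, eps):
    """Net change in each symbol's area (energy) caused by jitter."""
    front, back = jitter_noise_areas(bits, eps)
    return front + back

bits = np.array([1, -1, -1, 1, 1, -1])
slow = np.full(len(bits) + 1, 5e-12)                # constant 5 ps shift
fast = 5e-12 * np.array([1, -1, 1, -1, 1, -1, 1])   # flips sign every cycle

slow_err = net_area_error(bits, slow)   # front and back pulses cancel
fast_err = net_area_error(bits, fast)   # front and back pulses add up
```

With the slow jitter the two pulses of every symbol cancel in area, which corresponds to a pure shift of the waveform; with the cycle-to-cycle jitter each symbol gains or loses 10 ps of area.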

Integrated Systems Group 22

Jitter source from PLL clocks

Noise sources
  Reference clock phase noise
  VCO supply noise
  Clock buffer supply noise

M. Mansuri, C-K.K. Yang, "Jitter optimization based on phase-locked loop design parameters," IEEE Journal of Solid-State Circuits, Nov. 2002.

[Figure: PLL block diagram: ref Clk, phase detector (Kpd), charge pump (Icp), loop filter (R, C), VCO (Kvco/s), clock buffer]

[Figure: noise transfer functions [dB] (-30 to 10) vs. frequency [Hz] (10^5 to 10^10), from input clock, from VCO supply, and from clock buffer supply]

E. Alon, V. Stojanovic, M. Horowitz “Circuits and Techniques for High-Resolution Measurement of On-Chip Power Supply Noise,” IEEE Journal Solid-State Circuits, April 2005.

Integrated Systems Group 23

2x Oversampled bang-bang CDR

[Figure: receiver block diagram: slicer, deserializer, dataOut; phase detector (PD) and phase control drive phase mixers that generate data Clk and edge Clk from the PLL (ref Clk)]

Generate early/late from dn, dn-1, en
  Simple 1st order loop, cancels receiver setup time

Now need jitter on data Clk, not PLL output
  Base linear PLL jitter
  Add non-linear phase selector noise from CDR

[Figure: data samples dn-1 and dn with edge sample en, annotated "en (late)"]
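The early/late decision of a 2x oversampled bang-bang phase detector can be sketched as follows. The convention is an assumption, not taken from the slides: the edge sample `e[n]` is taken between `d[n-1]` and `d[n]`, an edge sample that already equals the new bit means the clock is late (+1), and one that still equals the old bit means it is early (-1). The function names are illustrative.

```python
def early_late(d_prev, d_cur, edge):
    """Return +1 (late), -1 (early), or 0 (no transition, no information)."""
    if d_prev == d_cur:
        return 0
    return +1 if edge == d_cur else -1

def phase_vote(d, e):
    """Sum of early/late decisions over a block; e[n] sits between d[n-1] and d[n]."""
    return sum(early_late(d[n - 1], d[n], e[n]) for n in range(1, len(d)))

# Edge samples already show the new bit at both transitions: the clock is late.
vote = phase_vote([0, 1, 1, 0], [0, 1, 1, 0])
```

A first-order loop would move the sampling phase one step against the sign of the vote, so only the direction of the timing error is used, never its size.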

Integrated Systems Group 24

Bang-bang CDR model

Model CDR loop as a state machine: Markov chain

Gives the probability distribution of phase
  Which is the CDR jitter distribution

[Figure: Markov chain over phase states φ_{i-1}, φ_i, φ_{i+1} with transition probabilities p_{up,i}, p_{dn,i}, p_{hold,i}]

[Figure: log10 steady-state probability (0 to -15) vs. phase count (0 to 250)]

A.E. Payzin, "Analysis of a Digital Bit Synchronizer," IEEE Transactions on Communications, April 1983.
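A minimal sketch of the Markov-chain view: each phase position is a state, and the loop moves up, down, or holds with state-dependent probabilities. The probabilities below are hypothetical (in a real model they would follow from the data, noise, and jitter statistics at each phase), the phase is assumed to wrap around, and `steady_state_phase` is an illustrative name.

```python
import numpy as np

def steady_state_phase(p_up, p_dn):
    """Stationary distribution of a phase-stepping chain with wrap-around.

    p_up[i], p_dn[i]: probabilities of stepping to phase i+1 / i-1 from phase i;
    the remaining probability is the hold probability.
    """
    p_up = np.asarray(p_up, dtype=float)
    p_dn = np.asarray(p_dn, dtype=float)
    n = len(p_up)
    P = np.zeros((n, n))
    for i in range(n):
        P[i, (i + 1) % n] += p_up[i]
        P[i, (i - 1) % n] += p_dn[i]
        P[i, i] += 1.0 - p_up[i] - p_dn[i]
    # Solve pi = pi P together with sum(pi) = 1 as a least-squares system.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical loop: phases below 8 mostly step up, phases from 8 on mostly step down.
n = 16
p_up = np.where(np.arange(n) < 8, 0.4, 0.1)
p_dn = np.where(np.arange(n) < 8, 0.1, 0.4)
pi = steady_state_phase(p_up, p_dn)
```

The resulting distribution peaks at the lock point and falls off geometrically away from it; plotting `np.log10(pi)` against the phase index gives a curve of the kind shown on this slide.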

Integrated Systems Group 25

Outline

Show system level optimization for links
Create a framework to evaluate trade-offs

Background on high-speed links
High-speed link modeling
System level optimization
  Limits: What is the capacity of these links?
  Improving today’s baseband signaling

Practical implementation issues

Current / future work

Integrated Systems Group 26

Baseline channels

Legacy (FR4): lots of reflections
Microwave engineered (NELCO)

[Figure: attenuation [dB] (0 to -100) vs. frequency [GHz] (0 to 20) for 26" FR4 with via stub and 26" NELCO with no stub]

Integrated Systems Group 27

Capacity with link-specific noise

Effective noise from phase noise
  Proportional to signal energy
  Decreases expected gains

Still, capacity much higher than data rates in today’s links

[Figure: two panels (NELCO, FR4): capacity [Gb/s] (0 to 140) vs. log10(clipping probability) (-25 to 0), for thermal noise, thermal noise and LC PLL phase noise, and thermal noise and ring PLL phase noise]

Integrated Systems Group 28

Removing ISI

Transmit and receive equalization
  Changes signal to correct for ISI
  Often easier to work at transmitter
  DACs easier than ADCs

Linear transmit equalizer
Decision-feedback equalizer

[Figure: transmit equalizer (TxData, causal taps, anticausal taps) driving the channel; receiver with sampled data, deadband, feedback taps, and tap select logic]

J. Zerbe et al., "Design, Equalization and Clock Recovery for a 2.5-10Gb/s 2-PAM/4-PAM Backplane Transceiver Cell," IEEE Journal of Solid-State Circuits, Dec. 2003.

[Figure: differential output driver: equalizer tap current (I_eq0) steered by d to outP/outN, each terminated with 50Ω]

Integrated Systems Group 29

Transmit equalization – headroom constraint

Transmit DAC has limited voltage headroom (peak power constraint)

Unknown target signal levels
  Hard to formulate error or objective function

Need to tune the equalizer and receive comparator levels

Amplitude of equalized signal depends on the channel

[Figure: transmit equalizer (TxData, causal taps, anticausal taps) driving the channel under a peak power constraint]

[Figure: attenuation [dB] (0 to -25) vs. frequency [GHz] (0 to 2.5), equalized and unequalized]

Integrated Systems Group 30

Optimization example: Power constrained linear precoding

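Add variable gain to amplify to known target level
  Formulate the objective function from error

SINR is not concave in w in general
  Change objective to quasiconcave

[Figure: symbols a_k pass through the precoder w (under the power constraint), the channel pulse response P, additive noise, and the gain g; the error e_k is formed against a_k]

$$MSE(w, g) = E_a\left(1 - 2 g\, w^T P \mathbf{1}_\Delta + g^2\, w^T P P^T w\right) + g^2 \sigma^2$$

$$SINR_{unbiased}(w) = \frac{E_a \left(w^T P \mathbf{1}_\Delta\right)^2}{E_a\, w^T P \left(I - \mathbf{1}_\Delta \mathbf{1}_\Delta^T\right)\left(I - \mathbf{1}_\Delta \mathbf{1}_\Delta^T\right)^T P^T w + \sigma^2}$$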

V. Stojanović, A. Amirkhany, M. Horowitz, “Optimal Linear Precoding with Theoretical and Practical Data Rates in High-Speed Serial-Link Backplane Communication,”IEEE International Conference on Communications, June 2004
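The trade-off behind power-constrained precoding can be illustrated numerically. The sketch below is not the optimization from the cited paper: it only evaluates a signal-to-interference-and-noise ratio for a given tap vector, reading w^T P as the equalized pulse response, the cursor at delay Δ as the signal, and every other cursor as ISI. The peak constraint is assumed to be sum(|w|) <= 1, and the pulse, taps, and noise variance are hypothetical.

```python
import numpy as np

def unbiased_sinr(w, pulse, delta, e_a=1.0, sigma2=0.0):
    """SINR of the cursor at index `delta` of the equalized pulse response."""
    eq = np.convolve(w, pulse)                   # equalized pulse response
    signal = e_a * eq[delta] ** 2                # main cursor energy
    isi = e_a * (np.sum(eq ** 2) - eq[delta] ** 2)
    return signal / (isi + sigma2)

def peak_normalize(w):
    """Scale taps so the transmitter never exceeds its headroom: sum(|w|) = 1."""
    w = np.asarray(w, dtype=float)
    return w / np.sum(np.abs(w))

pulse = np.array([1.0, 0.5])                     # hypothetical channel pulse response
w_none = np.array([1.0])                         # no equalization
w_eq = peak_normalize([1.0, -0.5])               # cancels the first post-cursor

sinr_none = unbiased_sinr(w_none, pulse, delta=0, sigma2=0.01)
sinr_eq = unbiased_sinr(w_eq, pulse, delta=0, sigma2=0.01)
```

Equalization removes most of the ISI but the peak constraint shrinks the main cursor to 2/3, so the gain depends on the noise level: with a much larger `sigma2` the equalized link can come out worse than the unequalized one.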

Integrated Systems Group 31

Optimal linear precoding

Minimize BER
  Residual dispersion into peak distortion
  Reflections into mean distortion

Includes all link-specific noise sources

[Equation: maximize over w the ratio γ of the worst-case margin (built from 0.5 d_min, the peak distortion V_peak, and the offset) to the noise σ, subject to the constraint on w]

$$\sigma^2 = w^T S_0^{TX} w + w^T S_0^{RX} w + \sigma^2_{thermal}$$

Still, does this objective really relate to link performance?

Need to look at noise and interference distributions

Integrated Systems Group 32

Including feedback equalization

Feedback equalization (DFE)
  Subtracts error from input
  No attenuation

Problem with DFE
  Need to know interfering bits
  ISI must be causal

Problem: latency in the decision circuit
  Receive latency + DAC settling < bit time

Can increase allowable time by loop unrolling
  Receive next bit before the previous is resolved

[Figure: amplitude (0 to 1) vs. symbol time (0 to 18), with the trailing ISI removed by feedback equalization]

Integrated Systems Group 33

One-tap DFE with loop unrolling

[Figure: pulse response with main cursor 1 and post-cursor α; received levels +1, 0, -1]

Integrated Systems Group 34

One-tap DFE with loop unrolling

[Figure: pulse response with main cursor 1 and post-cursor α; received levels +1+α and -1+α added around +1, 0, -1]

Integrated Systems Group 35

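One-tap DFE with loop unrolling

[Figure: pulse response with main cursor 1 and post-cursor α; received levels +1+α, +1-α, -1+α, -1-α, with +α marked, around +1, 0, -1]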

Integrated Systems Group 36

One-tap DFE with loop unrolling

Instead of subtracting the error
  Move the slicer level to include the noise
  Slice for each possible level, since previous value unknown

[Figure: received levels +1+α, +1-α, -1+α, -1-α, with +α marked; two slicers on input x_n clocked by dClk produce d_n | d_{n-1} = 1 and d_n | d_{n-1} = 0, and a flip-flop holding d_{n-1} selects between them]

K.K. Parhi, "High-Speed architectures for algorithms with quantizer loops," IEEE International Symposium on Circuits and Systems, May 1990.
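A behavioural sketch of the loop-unrolled one-tap DFE: rather than subtracting α·d_{n-1} before slicing, two slicers compare the input against +α and -α, and the previous decision picks which result to keep. Symbols are taken as +1/-1 here (the slide labels them 1/0), the channel is assumed noiseless with a single post-cursor α, and the function name is illustrative.

```python
def unrolled_dfe(samples, alpha, d_prev=1):
    """Detect +/-1 symbols from samples x_n = d_n + alpha * d_{n-1}."""
    decisions = []
    for x in samples:
        if_prev_pos = 1 if x > +alpha else -1   # slicer assuming d_{n-1} = +1
        if_prev_neg = 1 if x > -alpha else -1   # slicer assuming d_{n-1} = -1
        # The multiplexer select is the only signal that closes the feedback loop.
        d_prev = if_prev_pos if d_prev == 1 else if_prev_neg
        decisions.append(d_prev)
    return decisions

alpha = 0.6
data = [1, -1, -1, 1, -1, 1, 1, -1]
previous = [1] + data[:-1]                      # assume the symbol before the block was +1
samples = [d + alpha * p for d, p in zip(data, previous)]
recovered = unrolled_dfe(samples, alpha, d_prev=1)
```

Both comparisons run in parallel on every sample, so the critical timing path shrinks to the multiplexer selection instead of a full slice-and-subtract loop.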

Integrated Systems Group 37

BER contours

Voltage margin
  Min. distance between the receiver threshold and contours with same BER

[Figure: two panels (5 tap Tx Eq; 5 tap Tx Eq + 1 tap DFE): BER contours (log10 BER from -5 to -30) of margin [mV] (-150 to 150) vs. time [ps] (0 to 160)]

Integrated Systems Group 38

Pulse amplitude modulation

Binary (NRZ)
  1 bit / symbol
  Symbol rate = bit rate

PAM4
  2 bits / symbol
  Symbol rate = bit rate/2

[Figure: eye diagrams: binary levels 1 and 0; PAM4 levels, top to bottom, 10, 11, 01, 00]
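The PAM4 mapping can be sketched as a lookup from bit pairs to levels. The Gray-coded ordering (10, 11, 01, 00 from top to bottom) follows the labels on this slide; the numeric levels +3, +1, -1, -3 and the function names are assumptions made for illustration.

```python
# Bit pair -> level; adjacent levels differ in exactly one bit (Gray coding).
GRAY_LEVELS = {(1, 0): +3, (1, 1): +1, (0, 1): -1, (0, 0): -3}

def pam4_encode(bits):
    """Map a bit sequence to PAM4 symbols, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def symbol_rate(bit_rate, bits_per_symbol):
    """Symbol rate needed to carry bit_rate at the given bits per symbol."""
    return bit_rate / bits_per_symbol

symbols = pam4_encode([1, 0, 1, 1, 0, 1, 0, 0])
```

For the same bit rate PAM4 halves the symbol rate, for example `symbol_rate(10e9, 2)` gives 5e9 symbols per second, at the cost of three times smaller spacing between levels for the same peak swing.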

Integrated Systems Group 39

Multi-level: Offset and jitter are crucial

To make better use of available bandwidth, need better circuits
  PAM2/PAM4 robust candidate for next generation links

[Figure: three panels of data rate [Gb/s] vs. symbol rate [Gs/s] (0 to 20) for PAM2, PAM4, PAM8, and PAM16, with thermal noise, thermal noise + offset, and thermal noise + offset + jitter]

Integrated Systems Group 40

Full ISI compensation too costly

[Figure: three panels of data rate [Gb/s] (0 to 20) vs. symbol rate [Gs/s] (0 to 16) for PAM2, PAM4, PAM8, and PAM16, with thermal noise, thermal noise + offset, and thermal noise + offset + jitter]

Today’s links cannot afford to compensate all ISI
  Limits today’s maximum achievable data rates

Integrated Systems Group 41

Outline

Show system level optimization for links
Create a framework to evaluate trade-offs

Background on high-speed links

High-speed link modeling

System level optimization

Practical implementation issues
  Low-cost adaptation
  Dual-mode link (hardware re-use)

Current / future work

Integrated Systems Group 42

Fully adaptive dual-mode link

Reconfigurable dual-mode PAM2/PAM4 link
  Adaptive equalization
  Transmit and receive equalization
  DFE with loop unrolling

PAM2/PAM4, 2-10Gb/s, 0.13µm, 40mW/Gb/s

V. Stojanović et al., “Autonomous Dual-Mode (PAM2/4) Serial Link Transceiver with Adaptive Equalization and Data Recovery,” IEEE Journal of Solid-State Circuits, April 2005.

[Figure: transceiver blocks: config registers, phase mixers, CDR logic, PLL, transmitter, reflection canceller, receiver, backchannel RX, backchannel TX]

Integrated Systems Group 43

Adaptation with minimum overhead

Adaptive sampler
  Generates the error signal at reference level

Monitors the link
  Adjustable voltage and time reference
  On-chip sampling scope

Can replace any other sampler: calibration

[Figure: Tx data through the channel into the receiver; data, edge (CDR), and adaptive samplers clocked by dClk, eClk, and aClk; the adaptive sampler compares against dLev to produce the error, and the adaptive macro generates tap updates and thresholds]

Integrated Systems Group 44

Equalizer loop

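Scale the equalizer: output Tx constraint

Dual-loop adaptive algorithm
  Data level reference loop

$$w_{n+1} = w_n + step_w \cdot sign(e_n)\, sign(\hat{x}_n)$$

$$dLev_{n+1} = dLev_n - step_{dataLev} \cdot sign(e_n), \quad \hat{x}_n > 0$$

[Figure: eye diagrams at three stages (initial eye, mid-way equalized, equalized) with reference levels dLev_init, dLev_mid, dLev_end and the initial peak-to-peak error; the updates use Sign(x̂_n) and Sign(e_n)]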
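The data-level reference loop can be sketched on its own. The error polarity is an assumption: e_n is taken as the reference minus the received sample (dLev_n - x_n), which makes the "minus step times sign(e_n)" update converge; the sample values, step size, and the name `adapt_dlev` are hypothetical. The equalizer tap loop is not modelled here.

```python
def adapt_dlev(samples, decisions, step, dlev=0.0):
    """Sign-based tracking of the positive data level.

    Only samples decided as positive symbols update dLev, and only the sign
    of the error (as produced by the adaptive sampler) is used.
    """
    for x, d in zip(samples, decisions):
        if d > 0:
            sign_e = 1 if dlev > x else -1   # sign(e_n), with e_n = dLev_n - x_n (assumed)
            dlev -= step * sign_e
    return dlev

# Hypothetical link whose positive symbols all arrive at 80 mV.
samples = [0.08] * 500
decisions = [1] * 500
dlev = adapt_dlev(samples, decisions, step=0.001)
```

Because only the sign of the error is used, dLev ends up dithering within about one step of the level it tracks; with noisy samples it settles near the median of the positive-symbol samples.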

Integrated Systems Group 45

Dual loop convergence – 4 tap example

Hard to estimate analytically

Experimental results show
  Both loops are stable within a wide range (0.1 to 10x) of relative speeds

[Figure: tap weight [mV] (-400 to 1000) vs. number of updates (0 to 200) for main tap, pre1, post1, and post2; PAM2, 5Gb/s, 4 taps Tx equalization]

[Figure: dLev [mV] (0 to 100) vs. number of updates (0 to 200)]

Integrated Systems Group 46

Hardware re-use: Dual-mode receiver

[Figure: PAM4 eye with thresholds thresh(+), 0, thresh(-); dual-mode receiver schematic: slicers on the input at thresh(+), 0, and thresh(-), clocked by dClk, produce lsb(+), msb, and lsb(-); multiplexers controlled by prDFE enable select the outputs]

Integrated Systems Group 47

Hardware re-use: Dual-mode receiver

[Figure: dual-mode receiver schematic in PAM4 mode with thresholds thresh(+), 0, thresh(-); detail of the pre-amp with offset comparator: differential inputs inP/inN, threshold-setting offset currents, clocked stage with outputs outP/outN]

Integrated Systems Group 48

Hardware re-use: Dual-mode receiver

[Figure: dual-mode receiver schematic in PAM2 mode, slicing at threshold 0]

Integrated Systems Group 49

Hardware re-use: Dual-mode receiver

PAM2 with loop-unrolled DFE tap

[Figure: dual-mode receiver schematic with prDFE enable selecting the loop-unrolled path]

Integrated Systems Group 50

Hardware re-use: Dual-mode receiver

PAM2 with loop-unrolled DFE tap
  Leverage multi-level properties of signals in loop-unrolling
  Re-use PAM4 receiver hardware (slicers and CDR)

[Figure: dual-mode receiver schematic with thresh(+) and thresh(-) used as the loop-unrolled slicer levels]

Integrated Systems Group 51

Improvements with loop-unrolling

Signal as seen by the receiver (on-chip scope)

[Figure: eye diagram, voltage [mV] (-100 to 200) vs. time [ps] (0 to 200), shaded by log10(voltage probability distribution) from -5 to -3]

[Figure: pulse responses vs. time [ps] (0 to 4000): unequalized (0 to 0.4 V); fully transmit equalized and transmit equalized with one tap DFE (0 to 0.25 V)]

Integrated Systems Group 52

Model and measurements

[Figure: log10(BER) (0 to -14) vs. voltage margin [mV] (-80 to 80); PAM4, 3 taps of transmit equalization, 5Gb/s, 26” FR4 channel]

Integrated Systems Group 53

Outline

Show system level optimization for links
Create a framework to evaluate trade-offs

Background on high-speed links

High-speed link modeling

System level optimization

Practical implementation issues

Current / future work
  Bridging the gap to link capacity
  Other similar system optimizations

Integrated Systems Group 54

Bridging the gap: Multi-tone link

[Figure: multi-tone data rates with thermal noise, #bits/Hz (0 to 10) vs. frequency [GHz] (0 to 14); Nelco 64Gb/s, FR4 38Gb/s]

A. Amirkhany, V. Stojanovic, M.A. Horowitz, “Multi-tone Signaling for High-speed Backplane Electrical Links,” IEEE Global Telecommunications Conference, November 2004.

Integrated Systems Group 55

Bridging the gap: Multi-tone link

Challenge: balancing the inter-symbol and inter-channel interference
  Microwave filter techniques
  Custom signal processing

[Figure: multi-tone link: data0 through dataN, each filtered (LPF or BPF) and mixed with carriers e^{jw1t} through e^{jwNt} at the transmitter, then mixed down and low-pass filtered at the receiver; inset shows # levels per tone vs. f]

[Figure: multi-tone data rates with thermal noise, #bits/Hz (0 to 10) vs. frequency [GHz] (0 to 14); Nelco 64Gb/s, FR4 38Gb/s]

Integrated Systems Group 56

Conclusions

Interfaces are challenging system designs
  Good space to explore system level optimization

Optimization leads to novel approaches
  Baseband links
    PAM4 and simple DFE reduce effect of ISI
    Low cost adaptive, self calibrating link

Still, far from the capacity of these links
  Looking into multi-tone to bridge the gap

This cross-layer approach can/should be applied to many different systems