Integrated Systems Group Massachusetts Institute of Technology High-speed links: A new field in high-throughput, energy-efficient communications? Vladimir Stojanović

Page 1: Vladimir Stojanović

Integrated Systems Group, Massachusetts Institute of Technology

High-speed links: A new field in high-throughput, energy-efficient communications?

Vladimir Stojanović

Page 2: Vladimir Stojanović

High-speed links are everywhere

Backbone Router Rack

PC or Console

Page 3: Vladimir Stojanović

Serial link signaling over backplanes - past

Designs limited by transmitter & receiver speed

Clever circuit design

No communications/SI background needed

Channel was not an issue up to 2-3Gb/s

[Figure: serdes, linecard, backplane, linecard, serdes link, with the signal at Tx and the signal at Rx; channel response from 0 to 1 GHz, labeled "2Gb/s view of the channel"]

Page 4: Vladimir Stojanović

Serial Link Signaling Over Backplanes - Present

Now that we’ve made the fastest Tx & Rx

Look what happens with the eye

Channel seems to be the problem

[Figure: serdes, linecard, backplane, linecard, serdes link, with the signal at Tx and the signal at Rx; channel response from 0 to 5 GHz on a log scale from 1.00 down to 0.00, labeled "10Gb/s view of the channel"]

Page 5: Vladimir Stojanović

New link design

Dealing with bandwidth limited channels

This is an old research area

Textbooks on digital communications

Think modems, DSL

But can’t directly apply their solutions

Standard approach requires high-speed A/Ds and digital signal processing

20GS/s A/Ds are expensive

(Un)fortunately need to rethink issues

Page 6: Vladimir Stojanović

Energy-Efficiency of communication

Standard approach is not energy-efficient

Can’t apply to dense interconnects

Links are 50x more energy-efficient

[Figure: bar chart of energy cost per bit in mW/(Gb/s), log scale from 1 to 1000000, for a 56Kb/s V.92 modem, a 12x12Mb/s ADSL modem, Gigabit Ethernet, and a 10Gb/s high-speed link]

Page 7: Vladimir Stojanović

Backbone router – lots of high-speed links

State-of-the-art up to 1 Tb/s throughput

Lots of linecards – power constrained system

What matters is energy cost per bit

source: Juniper Networks

source: Alcatel, Tyco

Page 8: Vladimir Stojanović

Electrical I/O Challenges

Scaling the throughput to 100 Tb/s

100 Tb/s I/O throughput

With 10Gb/s per link

10000 transceivers

20000 high-speed I/O pairs

10000 mm² in 0.13 µm technology

Power 4kW

40 mW/Gb/s – energy cost per bit
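
The figures on this slide follow from simple arithmetic. The sketch below reproduces them; the variable names are illustrative, and the factor of two for I/O pairs assumes one transmit pair and one receive pair per transceiver, which the slide does not state.

```python
# Back-of-the-envelope check of the 100 Tb/s scaling numbers.
TOTAL_THROUGHPUT_GBPS = 100_000   # 100 Tb/s expressed in Gb/s
LINK_RATE_GBPS = 10               # data rate per link
ENERGY_MW_PER_GBPS = 40           # energy cost per bit, mW/(Gb/s)
TOTAL_AREA_MM2 = 10_000           # quoted transceiver area in 0.13 um

transceivers = TOTAL_THROUGHPUT_GBPS // LINK_RATE_GBPS
io_pairs = 2 * transceivers       # assumption: one Tx pair + one Rx pair each
power_w = TOTAL_THROUGHPUT_GBPS * ENERGY_MW_PER_GBPS / 1000
area_per_transceiver_mm2 = TOTAL_AREA_MM2 / transceivers

print(transceivers, io_pairs, power_w, area_per_transceiver_mm2)
```

Under these assumptions each transceiver occupies about 1 mm² and the 4 kW total is just throughput times energy per bit.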

Page 9: Vladimir Stojanović

Scaling the throughput to 100 Tb/s

Density issues

Connectors

50 diff pairs/inch

400” long connector

Trace routing

50mils pitch

250” wide 4-signal layer line-card

Backplane less critical

Package

Package/Chip ball pitch (1mm / 200um)

4000 mm² / 160mm²

source: Teradyne, Rambus
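
The connector and routing figures can be checked the same way. This is a sketch that assumes the 50 mil pitch applies per differential pair and that pairs are spread evenly over the four signal layers; neither assumption is spelled out on the slide.

```python
# Density check for 20000 high-speed I/O pairs.
IO_PAIRS = 20_000
CONNECTOR_PAIRS_PER_INCH = 50
TRACE_PITCH_MILS = 50     # assumed to be the pitch per differential pair
SIGNAL_LAYERS = 4

connector_length_in = IO_PAIRS / CONNECTOR_PAIRS_PER_INCH
# 1000 mils per inch; pairs assumed evenly divided across the signal layers
board_width_in = IO_PAIRS * TRACE_PITCH_MILS / 1000 / SIGNAL_LAYERS

print(connector_length_in, board_width_in)
```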

Page 10: Vladimir Stojanović

Design challenge

Need

Power

Reduce energy/bit to 1mW/Gb/s

Density

Increase data rate per link by 10-15x

Goal

Fit 100 Tb/s on a 100 W crossbar chip

Reasonable system/rack size

Page 11: Vladimir Stojanović

Outline

Explore high-throughput, energy-efficient links

Look at different aspects of system design

High-speed link environment

System modeling

Communication techniques

Page 12: Vladimir Stojanović

Backplane environment

Line attenuation

Reflections from stubs (vias)

Page 13: Vladimir Stojanović

Backplane channel

Loss is variable

Same backplane

Different lengths

Different stubs

Top vs. Bot

Attenuation is large

>30dB @ 3GHz

But is that bad?

Required signal amplitude set by noise

[Figure: attenuation [dB] (0 to -60) vs. frequency [GHz] (0 to 10) for four channels: 9" FR4; 9" FR4, via stub; 26" FR4; 26" FR4, via stub]
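
To put the attenuation figure in voltage terms, the sketch below converts dB of loss to an amplitude ratio. The 500 mV transmit swing is a hypothetical value chosen for illustration and does not come from the talk.

```python
def db_to_voltage_ratio(attenuation_db):
    """Convert attenuation in dB to a voltage (amplitude) ratio."""
    return 10 ** (-attenuation_db / 20)

TX_SWING_MV = 500.0   # hypothetical transmit amplitude
rx_amplitude_mv = TX_SWING_MV * db_to_voltage_ratio(30)

print(round(rx_amplitude_mv, 2))   # about 15.81 mV left after 30 dB of loss
```

Whether such an amplitude is acceptable depends on the receiver noise, which is the point the slide makes.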

Page 14: Vladimir Stojanović

Channel variations: Geometry, manufacturing, environment

Channel variations come from multiple sources

Trace routing

Manufacturing

Temperature & Humidity

[Figure: attenuation [dB] (0 to -60) vs. frequency [GHz] (0 to 10) for four channels: 9" FR4; 9" FR4, via stub; 26" FR4; 26" FR4, via stub]

Page 15: Vladimir Stojanović

Interference

Inter-symbol interference

Dispersion (skin-effect, dielectric loss) - short latency

Reflections (impedance mismatches – connectors, via stubs, device parasitics, package) – long latency

Co-channel interference (Far-End & Near-End Crosstalk)

[Figure, left: attenuation [dB] (0 to -60) vs. frequency [GHz] (0 to 10) for the THROUGH, FEXT and NEXT channels. Right: pulse response (0 to 1) vs. time (0 to 3 ns), Tsymbol=160ps]
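
Inter-symbol interference can be read off a pulse response sampled once per symbol: every sample other than the cursor adds to or subtracts from the wanted signal. The sketch below computes the worst-case (peak distortion) eye opening for binary signaling. The sample values are hypothetical; only the 160 ps symbol time comes from the slide.

```python
def worst_case_eye(pulse_samples, cursor_index):
    """Return (eye opening, worst-case ISI) for +/-1 signaling.

    pulse_samples: pulse response sampled at symbol spacing.
    cursor_index:  index of the main (cursor) sample.
    """
    cursor = pulse_samples[cursor_index]
    isi = sum(abs(p) for i, p in enumerate(pulse_samples) if i != cursor_index)
    return cursor - isi, isi

T_SYMBOL_PS = 160
symbol_rate_gsps = 1000 / T_SYMBOL_PS    # 6.25 Gs/s

samples = [0.05, 0.60, 0.25, 0.10, 0.05]  # hypothetical sampled pulse response
opening, isi = worst_case_eye(samples, cursor_index=1)
print(symbol_rate_gsps, round(opening, 3), round(isi, 3))
```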

Page 16: Vladimir Stojanović

Reflections and crosstalk

Don’t just receive the signal you want

Get versions of signals “close” to you

Vertical connections have worst coupling

“Close” in these vertical connection regions

[Figure: signal paths through the vertical connection regions: desired signal, far-end XTALK (FEXT), near-end XTALK (NEXT), reflections. Source: Sercu, DesignCon03]

Page 17: Vladimir Stojanović

A complex system

[Figure: three panels: PCB only; PCB + Connectors; PCB, Connectors, Via stubs & Devices]

Page 18: Vladimir Stojanović

Outline

Explore high-throughput, energy-efficient links

Look at all aspects of system design

High-speed link environment

System modeling

Interference and noise

Communication techniques

Page 19: Vladimir Stojanović

Previous system models

Borrowed from computer systems

Worst case analysis

Can be too pessimistic in links

Borrowed from data communications

Gaussian distributions

Works well near mean

Often way off at tails

ISI distribution is bounded

Need accurate models

To relate the power/complexity to performance

Page 20: Vladimir Stojanović

How bad is Gaussian interference model?

Gaussian model only good down to 10^-3 probability

Way pessimistic for much lower probabilities

Link target BER ~ 10^-15

[Figure, left: cumulative ISI distribution, log10 probability [cdf] (0 to -10) vs. residual ISI [mV] (0 to 100); annotation: 40mV error @ 10^-10, 25% of eye height. Right: impact on CDR phase, log10 steady-state phase probability (0 to -10) vs. phase count (80 to 180); annotations: error @ 10^-10, 4% Tsymbol, 9% Tsymbol]
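
The gap between a Gaussian model and the true, bounded ISI distribution can be illustrated on a toy example. The sketch below enumerates every ±1 symbol pattern for a handful of residual ISI taps and compares the exact tail probability with a Gaussian of the same variance. The tap values are hypothetical, and brute-force enumeration is used only because the example is small.

```python
import math
from itertools import product

def exact_tail(taps, threshold):
    """P(ISI > threshold) for independent, equiprobable +/-1 symbols."""
    hits = 0
    for signs in product((-1, 1), repeat=len(taps)):
        if sum(s * t for s, t in zip(signs, taps)) > threshold:
            hits += 1
    return hits / 2 ** len(taps)

def gaussian_tail(taps, threshold):
    """Same tail under a zero-mean Gaussian with matching variance."""
    sigma = math.sqrt(sum(t * t for t in taps))
    return 0.5 * math.erfc(threshold / (sigma * math.sqrt(2)))

taps = [0.10, 0.08, 0.05, 0.04, 0.03, 0.02, 0.02, 0.01]  # hypothetical residual ISI
peak = sum(abs(t) for t in taps)   # ISI can never exceed this bound

print(exact_tail(taps, peak), gaussian_tail(taps, peak))
```

The exact tail is zero beyond the peak, while the Gaussian still assigns it nonzero probability, which is the pessimism the slide refers to.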

Page 21: Vladimir Stojanović

Link modeling issues

No good link system and noise models

Hard to make performance/power tradeoff

Cannot predict the “right” architecture

Maximum achievable data rates – unknown

Limited link communication system design

Peak power constraint in the transmitter

No solution for optimal transmit equalization

No solution for automatic equalization

Page 22: Vladimir Stojanović

A new model

Use direct noise and interference statistics

Main system impairments

Interference

Voltage noise (thermal, supply, offsets, quantization)

Timing noise – always looked at separately

Key to integrate with voltage noise sources

Need to map from time to voltage

Page 23: Vladimir Stojanović

Effect of timing noise

[Figure: ideal sampling vs. jittered sampling; voltage noise when receiver clock is off]

The effect depends on the size of the jitter, the input sequence, and the channel

Need effective voltage noise distribution

Page 24: Vladimir Stojanović

Example: Effect of transmitter jitter

Decompose output into ideal and noise

Noise is a pair of pulses at the front and end of the symbol

Width of pulse is equal to jitter

Approximate with deltas on bandlimited channels

[Figure: symbol b_k, nominally from kT to (k+1)T, with edges displaced by ε_k^TX and ε_{k+1}^TX, decomposed into the ideal pulse plus noise pulses, approximated by impulses of area -b_k ε_k^TX at kT and b_k ε_{k+1}^TX at (k+1)T]

V. Stojanović, M. Horowitz, “Modeling and Analysis of High-Speed Links,” IEEE Custom Integrated Circuits Conference, September 2003. (invited)
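
The decomposition can be sanity-checked by comparing areas: the jittered rectangle minus the ideal rectangle should equal the sum of the two noise impulses. The sketch below does this for one symbol. The function names and numbers are illustrative, and positive jitter is assumed to mean an edge that arrives late.

```python
def noise_impulse_areas(b_k, eps_k, eps_k1):
    """Areas of the noise impulses at the front and end of symbol b_k."""
    return (-b_k * eps_k, b_k * eps_k1)

def jittered_area(b_k, t_symbol, eps_k, eps_k1):
    """Area of a rectangle whose edges sit at eps_k and t_symbol + eps_k1."""
    return b_k * ((t_symbol + eps_k1) - eps_k)

b_k, t_symbol = 1.0, 160.0      # amplitude and symbol time in ps (illustrative)
eps_k, eps_k1 = 2.0, -3.0       # hypothetical edge jitter in ps

front, end = noise_impulse_areas(b_k, eps_k, eps_k1)
difference = jittered_area(b_k, t_symbol, eps_k, eps_k1) - b_k * t_symbol
print(front, end, difference)
```

With uncorrelated jitter on the two edges the areas do not cancel, so the symbol energy changes from cycle to cycle.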

Page 25: Vladimir Stojanović

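Jitter propagation model

[Figure: block diagram. Ideal path: symbols a_k through the precoder w^TX give b_k, which passes through the pulse response p(jT) to produce x_k^ISI. Noise path: transmit jitter ε_k^TX, ε_{k+1}^TX (PLL, supply noise n_vdd through H_jit(s), input noise n_in) acts on b_k through the impulse response h(jT + T/2) to produce x_k^jit. The two sum at the RX, which samples with jitter ε_k^RX]

$$x^{ISI}(kT + \phi_i + \varepsilon_k^{RX}) = \sum_{j=-sbS}^{sbE} b_{k-j}\, p(jT + \phi_i)$$

$$x^{jitter}(kT + \phi_i + \varepsilon_k^{RX}) = \sum_{j=-sbS}^{sbE} b_{k-j}\left[ h\!\left(jT + \tfrac{T}{2} + \phi_i\right)\left(\varepsilon_k^{RX} - \varepsilon_{k-j}^{TX}\right) - h\!\left(jT - \tfrac{T}{2} + \phi_i\right)\left(\varepsilon_k^{RX} - \varepsilon_{k-j+1}^{TX}\right) \right]$$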

Page 26: Vladimir Stojanović

Jitter effect on voltage noise

Transmitter jitter

High frequency (cycle-cycle) jitter is bad

Changes the energy (area) of the symbol

No correlation of noise sources that sum

Low frequency jitter is less bad

Effectively shifts waveform

Correlated noise give partial cancellation

Receive jitter

Modeled by shift of transmit sequence

Same as low frequency transmitter jitter

Bandwidth of the jitter is critical

It sets the magnitude of the noise created

[Figure: receive jitter ε_k^RX is equivalent to shifting the transmit sequence by ε_k^RX]

Page 27: Vladimir Stojanović

Outline

Explore high-throughput, energy-efficient links

Look at all aspects of system design

High-speed link environment

System modeling

Communication techniques

Exploring the limits – Capacity and Baseband

Page 28: Vladimir Stojanović

Baseline channels

Legacy FR4 BP, 26”, via stub

N6K BP, 26”

Short ATCA BP, 3”

Legacy (FR4) - lots of reflections

Microwave engineered (N6K)

Emerging standards (IEEE 802.3ap, ATCA)

Page 29: Vladimir Stojanović

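Capacity calculation

Concave problem

Modified waterfilling

Add phase noise

$$b = \lim_{N\to\infty}\; \max_{E_n}\; \sum_{n=1}^{N} \frac{1}{2}\log_2\left(1 + \frac{E_n |H_n|^2}{\Gamma\left(\sigma^2_{thermal} + E_n |H_n|^2 \sigma^2_{\theta}\right)}\right)$$

$$\text{s.t.}\quad \sum_{n=1}^{N} E_n = N E_{avg} = \frac{N E_{peak}}{PAR}, \qquad E_n \ge 0,\; n = 1,\dots,N$$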
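
For orientation, the sketch below implements classic water-filling over a few sub-channels with thermal noise only. It is not the modified procedure from the talk: the phase-noise term and the peak-power constraint are left out, and the gains, noise variance and energy budget are hypothetical.

```python
import math

def waterfill(gains, noise_var, total_energy, iters=100):
    """Classic water-filling: E_n = max(0, mu - noise_var / g_n).

    gains: |H_n|^2 per sub-channel. The water level mu is found by bisection
    so that the allocated energies sum to total_energy.
    """
    floors = [noise_var / g for g in gains]
    lo, hi = 0.0, max(floors) + total_energy
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        used = sum(max(0.0, mu - f) for f in floors)
        if used > total_energy:
            hi = mu
        else:
            lo = mu
    return [max(0.0, lo - f) for f in floors]

def bits_per_use(gains, noise_var, energies, gap=1.0):
    """Sum of 0.5*log2(1 + SNR_n / gap) over the sub-channels."""
    return sum(0.5 * math.log2(1 + e * g / (gap * noise_var))
               for e, g in zip(energies, gains))

gains = [1.0, 0.5, 0.1, 0.01]          # hypothetical |H_n|^2
energies = waterfill(gains, noise_var=0.1, total_energy=1.0)
print([round(e, 3) for e in energies], round(bits_per_use(gains, 0.1, energies), 3))
```

In this example the two weakest sub-channels receive no energy, which is the usual water-filling behavior on heavily attenuated tones.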

Page 30: Vladimir Stojanović

Channel capacity – the impact of noise

Capacity

Much higher than data rates in today’s links

Nominal noise

Thermal – 50 Ohm termination

Phase noise – best LC PLL (0.14%UI rms)

[Figure: capacity with thermal and phase noise for the Legacy FR4 BP, N6K BP and Short ATCA BP channels]

Page 31: Vladimir Stojanović

Removing ISI – baseband link

Transmit and Receive Equalization

Changes signal to correct for ISI

Often easier to work at transmitter

DACs easier than ADCs

[Figure: linear transmit equalizer (TxData, causal taps, anticausal taps, output driver with 50Ω terminations) driving the channel into a decision-feedback equalizer (sampled data, deadband, feedback taps, tap select logic)]

J. Zerbe et al, "Design, Equalization and Clock Recovery for a 2.5-10Gb/s 2-PAM/4-PAM Backplane Transceiver Cell," IEEE Journal Solid-State Circuits, Dec. 2003.

Page 32: Vladimir Stojanović

Transmit equalization – headroom constraint

Transmit DAC has limited voltage headroom

Unknown target signal levels

Hard to formulate error or objective function

Need to tune the equalizer and receive comparator levels

Amplitude of equalized signal depends on the channel

[Figure: TxData through causal and anticausal taps into the channel, with a peak power constraint at the transmitter]
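
The headroom constraint can be made concrete with a two-tap example. Because the transmit DAC output is bounded, the tap magnitudes must sum to at most the full swing, so equalizing costs cursor amplitude. The sketch below uses a hypothetical two-sample pulse response and a pre-emphasis setting chosen to cancel its post-cursor.

```python
def convolve(a, b):
    """Discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def peak_normalize(taps):
    """Scale taps so that sum(|tap|) == 1, the peak output constraint."""
    scale = sum(abs(t) for t in taps)
    return [t / scale for t in taps]

channel = [1.0, 0.5]                  # hypothetical pulse response
taps = peak_normalize([1.0, -0.5])    # two-tap transmit pre-emphasis
equalized = convolve(taps, channel)

print([round(v, 4) for v in equalized])
```

The post-cursor is removed, but the cursor drops from 1.0 to about 0.67, so the receive comparator levels have to follow the channel-dependent amplitude.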

Page 33: Vladimir Stojanović

Power constrained equalizer optimization

Add variable gain to amplify to known target level

Formulate the objective function from error

SINR is not concave in w in general

Change objective to quasiconcave

[Figure: symbols a_k pass through the precoder w (power constraint), the channel pulse response P, additive noise and the gain g to give the estimate of a_k; e_k is the error]

$$MSE(w,g) = E_a\left(1 - 2g\,w^T P 1_\Delta + g^2 w^T P P^T w\right) + g^2\sigma^2$$

$$SINR_{unbiased}(w) = \frac{E_a\left(w^T P 1_\Delta\right)^2}{E_a\, w^T P \left(I - 1_\Delta 1_\Delta^T\right)\left(I - 1_\Delta 1_\Delta^T\right)^T P^T w + \sigma^2}$$
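
The unbiased SINR is the cursor energy of the equalized pulse divided by the residual ISI energy plus noise. The sketch below evaluates it directly from the equalized pulse response, which plays the role of w^T P. The channel, taps and noise variance are hypothetical, and the noise is treated as independent of w for simplicity.

```python
def convolve(a, b):
    """Discrete convolution; convolve(w, pulse) plays the role of w^T P."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def unbiased_sinr(w, pulse, delta, e_a, sigma2):
    """Cursor energy over residual ISI energy plus noise variance."""
    q = convolve(w, pulse)
    signal = e_a * q[delta] ** 2
    isi = e_a * sum(v * v for k, v in enumerate(q) if k != delta)
    return signal / (isi + sigma2)

pulse = [1.0, 0.5]                      # hypothetical pulse response
raw = unbiased_sinr([1.0], pulse, delta=0, e_a=1.0, sigma2=0.01)
eq = unbiased_sinr([2 / 3, -1 / 3], pulse, delta=0, e_a=1.0, sigma2=0.01)
print(round(raw, 2), round(eq, 2))
```

Here the peak-constrained two-tap precoder improves the SINR even though it lowers the cursor amplitude.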

Page 34: Vladimir Stojanović

Optimal equalization

Minimize BER

Residual dispersion into peak distortion

Reflections into mean distortion

Includes all link-specific noise sources

Expand to DFE by puncturing the P matrix

[Equation: maximize over w the margin γ, the ratio of the signal margin (0.5 d_min w^T P 1_Δ, less the peak distortion V_peak and the offset) to the rms sum of mean distortion and noise σ, subject to the peak constraint on w]

$$\sigma^2 = w^T S_0^{TX} w + w^T S_0^{RX} w + \sigma^2_{thermal}$$

Still, does this objective really relate to link performance?

Need to look at noise and interference distributions

V. Stojanović, A. Amirkhany, M. Horowitz, “Optimal Linear Precoding with Theoretical and Practical Data Rates in High-Speed Serial-Link Backplane Communication,” ICC’04

Page 35: Vladimir Stojanović

Pulse amplitude modulation

Binary (NRZ)

1 bit / symbol

Symbol rate = bit rate

PAM4

2 bits / symbol

Symbol rate = bit rate/2

[Figure: binary eye with levels 1 and 0; PAM4 eye with levels 10, 11, 01, 00]
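
A PAM4 transmitter maps each pair of bits to one of four levels, which halves the symbol rate for a given bit rate. The sketch below uses the Gray-coded order 10, 11, 01, 00 from the slide; assigning those labels to the levels +3, +1, -1, -3 is an assumption made for illustration.

```python
# Gray-coded PAM4 map; the numeric level assignment is assumed, not from the slide.
PAM4_LEVELS = {"10": 3, "11": 1, "01": -1, "00": -3}

def pam4_encode(bits):
    """Map a bit string to PAM4 symbols, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_LEVELS[bits[i:i + 2]] for i in range(0, len(bits), 2)]

symbols = pam4_encode("00011110")
print(symbols)   # 8 bits become 4 symbols
```

With Gray coding, adjacent levels differ in one bit, so a decision error to a neighboring level costs a single bit error.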

Page 36: Vladimir Stojanović

Multi-level: offset and jitter are crucial

To make better use of available bandwidth, need better circuits

PAM2/PAM4 robust candidate for next generation links

[Figure: three panels of data rate [Gb/s] vs. symbol rate [Gs/s] (0 to 20) for PAM2, PAM4, PAM8 and PAM16, under thermal noise; thermal noise + offset; and thermal noise + offset + jitter]

Page 37: Vladimir Stojanović

Full ISI compensation too costly

Today’s links cannot afford to compensate all ISI

Too much power

Limits today’s maximum achievable data rates

[Figure: three panels of data rate [Gb/s] (0 to 20) vs. symbol rate [Gs/s] (0 to 16) for PAM2, PAM4, PAM8 and PAM16, under thermal noise; thermal noise + offset; and thermal noise + offset + jitter]

Page 38: Vladimir Stojanović

Outline

Explore high-throughput, energy-efficient links

Look at all aspects of system design

High-speed link environment

System modeling

Communication techniques

Exploring the limits – Capacity and Baseband

What next?

Page 39: Vladimir Stojanović

What next?

Very long filters too costly

High-speed circuits have more noise/errors

Channel-and-circuit-aware coding

Code against reflections and xtalk, ckt noise

Page 40: Vladimir Stojanović

Energy-efficient Channel Coding for Links

Channel-and-circuit-aware code:

Operate at BER = 10^-5

Decoded BER = 10^-15

Lowers energy allocated to timing and equalization

Reuses existing P/S infrastructure.

[Figure: high-speed I/O link block diagram with Enc, P/S, Tx Eq, Tx and PLL on the transmit side, Rx, Rx Eq, CDR, S/P and Dec on the receive side, and an Adapt block]
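
How far a code can pull the error rate down from a raw BER of 10^-5 can be estimated with a binomial tail. The sketch below computes the probability that a block of n bits contains more than t errors, assuming independent bit errors. That assumption is a simplification, since errors in real links are correlated, and the block length and t values are hypothetical.

```python
import math

def block_failure_prob(n, t, p):
    """P(more than t bit errors in an n-bit block), independent errors."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(t + 1, n + 1))

RAW_BER = 1e-5
for t in (0, 1, 2, 3):
    print(t, block_failure_prob(255, t, RAW_BER))
```

Each extra corrected error buys a few orders of magnitude at this raw BER; the example does not attempt to reproduce the 10^-15 figure on the slide.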

Page 41: Vladimir Stojanović

But, need to be careful

Always know what you’re optimizing

Powerful encoders/decoders often costly

Example - fastest RS (255,239) implementation

10 – 40 Gb/s throughput

Energy cost - 12mW/Gb/s

50x area of the high-speed link (extensive parallelism)

Need to include the energy cost per bit in the code design spec

L. Song, M-L Yu, M.S. Shaffer, “10- and 40-Gb/s Forward Error Correction Devices for Optical Communications,” IEEE Journal of Solid-State Circuits, vol. 37, no. 11, Nov. 2002.

Page 42: Vladimir Stojanović

Energy efficiency of link components

Large chunk of energy on timing sub-system (PLL, CDR)

Different scaling for PAM2/PAM4

Energy scales linearly with technology

Likely to remain a key constraint

Can’t count on very complex filters – for reflections and xtalk

[Figure, left: energy cost per bit [mW/Gb/s] (0 to 140) vs. data rate [Gb/s] (0 to 20) for PAM2 Tx5 Rx20, PAM2 Tx5 Rx1+20, PAM2 Tx50 Rx80, PAM4 Tx5 Rx20 and PAM4 Tx50 Rx80. Right: bar chart of energy cost per bit [mW/Gb/s] (0 to 18) by component (TxTap, RxTap, RxSamp, PLL, CDR) for PAM4 and PAM2; bar labels 1, 2.2, 8, 11, 1.5, 5.9, 4, 5.5, 0.3, 0.45]

Page 43: Vladimir Stojanović

Potential savings from relaxed timing

With proper coding

Increase data rate

Relax PLL jitter spec (6-12 dB) – save power

Original jitter – rms = 1.4%UI (ring oscillator based PLL)

[Figure: Legacy FR4 BP panel with curves for 6Gb/s, 8Gb/s, 10Gb/s and 12Gb/s; Short ATCA BP panel with curves for 10Gb/s, 15Gb/s, 20Gb/s and 25Gb/s]

Page 44: Vladimir Stojanović

BER vs. hardware complexity

Partially eliminate ISI (leave most of the reflections)

Let simple code take care of the rest

Can recover from raw BER of 10^-5

And save up to 50 feedback taps - up to 15mW/Gb/s in 0.13µm

[Figure: Legacy FR4 BP panel with curves for 6Gb/s, 8Gb/s, 10Gb/s and 12Gb/s; Short ATCA BP panel with curves for 15Gb/s, 20Gb/s and 25Gb/s]

Page 45: Vladimir Stojanović

Experimental Testbed - Results

Evaluate impact of codes on actual link

- Impact of correlated noise and ISI

- Early estimation of energy-efficiency

Trade-off between error correction and detection

- Example: 2-error correcting/detecting schemes at 6.25 Gb/s

• Code : Hamming for detection only

• Protocol : ARQ

• Code : Single-Error-Correcting, Double-Error Detecting (SEC-DED)

• Protocol : Hybrid ARQ-FEC

• Code : t=2 BCH

• Protocol : Forward error correction

[Figure: BER (uncoded and coded) and rate vs. block length for the three schemes, from detection only to correction only; annotations ~5, ~2.5, ~1.5 and "Increasing Code Rate, Decreasing BER = Improving Performance"]

Page 46: Vladimir Stojanović

Channel-Aware Codes

Systematic pattern elimination codes

Trivially simple decoding.

[Figure: CMF of some codes with r overhead; cumulative probability vs. RX voltage]

Blitvic et al., submitted to ICC 2007

Page 47: Vladimir Stojanović

What next?

Very long filters too costly

High-speed circuits have more noise/errors

Channel-and-circuit-aware coding

Code against reflections and xtalk, ckt noise

Multi-tone signalling

Parallel links

Less noise, higher spectral efficiency

More energy-efficient for given data rate

Page 48: Vladimir Stojanović

Uncoded multi-tone – the impact of noise

Uncoded multi-tone data rates

Half the capacity

BER target of 10^-15

Peak-power constraint

Even simple coding can help – since Gap is huge

[Figure: uncoded multi-tone data rates with thermal and phase noise for the Legacy FR4 BP, N6K BP and Short ATCA BP channels]

Page 49: Vladimir Stojanović

Capacity – bit loading

Bandwidth is limited by attenuation and noise

Can’t just keep increasing the signaling frequency

Need to focus on available bandwidth (at most 10-20GHz)

Need circuits that can create/sense 4-8 bits/dim

[Figure: bit loading for the Legacy FR4 BP, N6K BP and Short ATCA BP channels, with excess noise factor 0dB and excess noise factor 20dB]

Page 50: Vladimir Stojanović

Uncoded multi-tone – bit loading

Modified Levin-Campello algorithm (includes phase noise)

Bandwidth not affected much (still 10-20GHz)

In high-noise case - less advantage over baseband

With coding can improve by up to 2x – closer to capacity

[Figure: uncoded multi-tone bit loading for the Legacy FR4 BP, N6K BP and Short ATCA BP channels, with excess noise factor 0dB and excess noise factor 20dB]
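
As a rough illustration of discrete bit loading, the sketch below implements a plain greedy allocation in the Levin-Campello spirit: each step gives one more bit to the tone where it costs the least extra energy. It is not the modified algorithm from the talk, since phase noise is ignored, and the gains, noise variance, gap and energy budget are hypothetical.

```python
import heapq

def greedy_bit_loading(gains, noise_var, gap, energy_budget, max_bits=8):
    """Greedy bit loading: add bits where the incremental energy is smallest.

    Energy to carry b bits per dimension on tone n is taken as
    gap * noise_var * (2**(2*b) - 1) / gains[n].
    """
    def energy(n, b):
        return gap * noise_var * (2 ** (2 * b) - 1) / gains[n]

    bits = [0] * len(gains)
    used = 0.0
    heap = [(energy(n, 1), n) for n in range(len(gains))]
    heapq.heapify(heap)
    while heap:
        cost, n = heapq.heappop(heap)
        if used + cost > energy_budget:
            break                      # cheapest next bit no longer fits
        used += cost
        bits[n] += 1
        if bits[n] < max_bits:
            step = energy(n, bits[n] + 1) - energy(n, bits[n])
            heapq.heappush(heap, (step, n))
    return bits, used

bits, used = greedy_bit_loading([1.0, 0.5, 0.1], noise_var=0.01, gap=1.0,
                                energy_budget=1.0)
print(bits, round(used, 3))
```

Stronger tones end up carrying more bits, and a larger gap (for example, from a tight BER target without coding) shrinks the loading on every tone.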

Page 51: Vladimir Stojanović

Analog multi-tone link

Challenge – balancing the inter-symbol and inter-channel interference

Microwave filter techniques

Wideband RF

Mixed-signal matrix signal processing

[Figure: multi-tone spectrum (# levels vs. f) and block diagram with data0, data1, ..., dataN, mixers e^{jw1t} ... e^{jwNt}, and LPF/BPF filters at the transmitter and receiver]

A. Amirkhany, V. Stojanovic, M.A. Horowitz, “Multi-tone Signaling for High-speed Backplane Electrical Links,” GLOBECOM’04.

Page 52: Vladimir Stojanović

A more digital implementation

24 Gb/s with 4 channels on a backplane link (roughly 2x better than equalization)

Amirkhany et al, GLOBECOM 2006, VLSI Symposium 2007

Page 53: Vladimir Stojanović

Conclusions

Interfaces becoming complex comm. systems

Challenges in modeling, comm. techniques and system design

State-of-the-art baseband links (chips)

Far from utilizing the capacity of the channels

10-20x difference in data rates

Useful channel bandwidth 10-20 GHz

Need lower-speed, precision circuits for higher order constellations

Research trends

Custom Coding

If careful, can lower the energy cost per bit for the whole system

Problem formulation different in so many ways

Analog Multi-tone

More energy-efficient ISI reduction

Significant challenges in front-end precision

Page 54: Vladimir Stojanović

Acknowledgments

MARCO Interconnect Focus Center

Jared Zerbe and Ravi Kollipara - Rambus

IEEE 802.3ap, ATCA forum