Integrated Systems Group, Massachusetts Institute of Technology
High-speed links: A new field in high-throughput, energy-efficient communications?
Vladimir Stojanović
High-speed links are everywhere
Backbone Router Rack
PC or Console
Serial link signaling over backplanes - past
Designs limited by transmitter & receiver speed
Clever circuit design
No communications/SI background needed
[Figure: serdes, linecard, backplane, linecard, serdes; signal at Tx vs. signal at Rx; channel response vs. frequency, 0 to 1 GHz]
Channel was not an issue up to 2-3Gb/s
2Gb/s view of the channel
Serial Link Signaling Over Backplanes - Present
Now that we’ve made the fastest Tx & Rx
Look what happens with the eye
Channel seems to be the problem
[Figure: serdes, linecard, backplane, linecard, serdes; signal at Tx vs. signal at Rx; channel response vs. frequency, 0 to 5 GHz]
10Gb/s view of the channel
New link design
Dealing with bandwidth limited channels
This is an old research area
Textbooks on digital communications
Think modems, DSL
But can’t directly apply their solutions
Standard approach requires high-speed A/Ds and digital signal processing
20 GS/s A/Ds are expensive
(Un)fortunately need to rethink issues
Energy-Efficiency of communication
Standard approach is not energy-efficient
Can’t apply to dense interconnects
Links are 50x more energy-efficient
[Figure: energy cost per bit [mW/(Gb/s)], log scale from 1 to 1000000, for a 56 Kb/s V.92 modem, a 12x12 Mb/s ADSL modem, Gigabit Ethernet and a 10 Gb/s high-speed link]
Backbone router – lots of high-speed links
State-of-the-art up to 1 Tb/s throughput
Lots of linecards – power constrained system
What matters is energy cost per bit
source: Juniper Networks
source: Alcatel, Tyco
Electrical I/O Challenges
100 Tb/s I/O throughput
With 10 Gb/s per link
10000 transceivers
20000 high-speed I/O pairs
10000 mm2 in 0.13 µm technology
Power 4 kW
40 mW/Gb/s – energy cost per bit
Scaling the throughput to 100 Tb/s
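The arithmetic behind these numbers can be checked in a few lines of Python. This is only a sketch: the variable names are illustrative, and the factor of two differential I/O pairs per transceiver is an assumption inferred from the 10000 and 20000 figures on the slide.

```python
# Scaling arithmetic for 100 Tb/s of electrical I/O.
# Names are illustrative; pairs_per_transceiver is an assumption.
total_throughput_gbps = 100_000   # 100 Tb/s expressed in Gb/s
link_rate_gbps = 10               # 10 Gb/s per link
energy_mw_per_gbps = 40           # 40 mW/(Gb/s), numerically equal to pJ/bit
pairs_per_transceiver = 2         # assumed: inferred from 10000 -> 20000
total_area_mm2 = 10_000           # silicon area in 0.13 um technology

transceivers = total_throughput_gbps // link_rate_gbps
io_pairs = transceivers * pairs_per_transceiver
power_kw = total_throughput_gbps * energy_mw_per_gbps / 1e6   # mW -> kW
area_per_transceiver_mm2 = total_area_mm2 / transceivers

print(transceivers, io_pairs, power_kw, area_per_transceiver_mm2)
```

The same arithmetic gives the target on the design-challenge slide: 100 Tb/s at 1 mW/Gb/s is 100 W.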
Density issues
Connectors
50 diff pairs/inch
400” long connector
Trace routing
50 mils pitch
250” wide 4-signal-layer line-card
Backplane less critical
Package
Package/chip ball pitch (1 mm / 200 µm)
4000 mm2 / 160 mm2
Scaling the throughput to 100 Tb/s
source: Teradyne, Rambus
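The connector and routing figures follow from the 20000 I/O pairs by simple division. The sketch below assumes that each differential pair occupies one 50 mil routing pitch and that pairs are spread evenly over the four signal layers; the names are illustrative.

```python
# Density arithmetic for 20000 differential I/O pairs (illustrative names).
io_pairs = 20_000
connector_pairs_per_inch = 50
trace_pitch_inch = 0.050      # 50 mils per pair (assumed: one pitch per pair)
signal_layers = 4             # pairs assumed evenly spread over the layers

connector_length_inch = io_pairs / connector_pairs_per_inch
linecard_width_inch = io_pairs * trace_pitch_inch / signal_layers

print(connector_length_inch, linecard_width_inch)
```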
Design challenge
Need
Power
Reduce energy/bit to 1 mW/Gb/s
Density
Increase data rate per link by 10-15x
Goal
Fit 100 Tb/s on a 100 W crossbar chip
Reasonable system/rack size
Outline
Explore high-throughput, energy-efficient links
Look at different aspects of system design
High-speed link environment
System modeling
Communication techniques
Backplane environment
Line attenuation
Reflections from stubs (vias)
Backplane channel
Loss is variable
Same backplane
Different lengths
Different stubs
Top vs. Bot
Attenuation is large
>30 dB @ 3 GHz
But is that bad?
Required signal amplitude set by noise
[Figure: attenuation [dB] vs. frequency [GHz], 0 to 10 GHz, for 9" FR4, 9" FR4 with via stub, 26" FR4 and 26" FR4 with via stub]
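To put the 30 dB figure in voltage terms, the short sketch below converts attenuation in dB to an amplitude ratio. The 1 V transmit swing is a hypothetical value chosen only for illustration; whether the resulting amplitude is sufficient depends on the receiver noise, as the slide notes.

```python
def db_to_amplitude_ratio(attenuation_db):
    """Voltage ratio corresponding to an attenuation given in dB."""
    return 10 ** (-attenuation_db / 20)

tx_swing_v = 1.0   # hypothetical transmit swing, not a value from the talk
rx_amplitude_v = tx_swing_v * db_to_amplitude_ratio(30)

print(f"{rx_amplitude_v * 1e3:.1f} mV at the receiver")
```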
Channel variations: geometry, manufacturing, environment
Channel variations come from multiple sources
Trace routing
Manufacturing
Temperature & humidity
[Figure: attenuation [dB] vs. frequency [GHz] for 9" FR4, 9" FR4 with via stub, 26" FR4 and 26" FR4 with via stub, with an inset labeled in GHz and dB]
Interference
[Figure: attenuation [dB] vs. frequency [GHz] for the THROUGH, NEXT and FEXT channels]
[Figure: pulse response vs. time [ns], Tsymbol = 160 ps]
Inter-symbol interference
Dispersion (skin-effect, dielectric loss) - short latency
Reflections (impedance mismatches – connectors, via stubs, device parasitics, package) – long latency
Co-channel interference (Far-End & Near-End Crosstalk)
Reflections and crosstalk
Don’t just receive the signal you want
Get versions of signals “close” to you
Vertical connections have worst coupling
“Close” in these vertical connection regions
[Figure: desired signal, far-end XTALK (FEXT), near-end XTALK (NEXT) and reflections]
Sercu, DesignCon03
A complex system
PCB only
PCB + Connectors
PCB, Connectors, Via stubs & Devices
Outline
Explore high-throughput, energy-efficient links
Look at all aspects of system design
High-speed link environment
System modeling
Interference and noise
Communication techniques
Previous system models
Borrowed from computer systems
Worst case analysis
Can be too pessimistic in links
Borrowed from data communications
Gaussian distributions
Works well near mean
Often way off at tails
ISI distribution is bounded
Need accurate models
To relate the power/complexity to performance
How bad is Gaussian interference model?
Gaussian model only good down to 10^-3 probability
Way pessimistic for much lower probabilities
Link target BER ~ 10^-15
[Figure: cumulative ISI distribution, log10 probability (cdf) vs. residual ISI [mV]; annotation: 40 mV error @ 10^-10, 25% of eye height]
[Figure: impact on CDR phase, log10 steady-state phase probability vs. phase count; annotations: 4% Tsymbol, error @ 10^-10, 9% Tsymbol]
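The gap between a Gaussian model and the true, bounded ISI distribution can be illustrated with a small numerical sketch. The residual ISI taps below are hypothetical values in mV, the data are assumed to be independent ±1 symbols, and the function name is illustrative. The exact distribution is built by convolving one two-point distribution per tap.

```python
import numpy as np
from statistics import NormalDist

def isi_pmf(taps_mv):
    """Exact PMF of sum_j b_j * c_j for independent b_j = +/-1, on a 1 mV grid."""
    peak = int(sum(abs(c) for c in taps_mv))
    pmf = np.zeros(2 * peak + 1)
    pmf[peak] = 1.0                       # index `peak` corresponds to 0 mV
    for c in taps_mv:
        c = abs(int(c))
        shifted = np.zeros_like(pmf)
        shifted[c:] += 0.5 * pmf[:len(pmf) - c]    # symbol b = +1
        shifted[:len(pmf) - c] += 0.5 * pmf[c:]    # symbol b = -1
        pmf = shifted
    return np.arange(-peak, peak + 1), pmf

taps_mv = [8, 5, 4, 3, 2, 2, 1, 1]        # hypothetical residual ISI taps
volts, pmf = isi_pmf(taps_mv)

worst_case_mv = volts[-1]                 # hard bound: all taps add up
sigma_mv = float(np.sqrt(np.sum(np.square(taps_mv))))
# ISI level that a Gaussian with the same variance predicts at 1e-10
gauss_mv_at_1e10 = -NormalDist(0.0, sigma_mv).inv_cdf(1e-10)

print(worst_case_mv, round(gauss_mv_at_1e10, 1), pmf[-1])
```

For these taps the Gaussian model predicts an ISI level at 10^-10 probability that exceeds the worst case the channel can physically produce, which is the sense in which it is too pessimistic at the tails.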
Link modeling issues
No good link system and noise models
Hard to make performance/power tradeoff
Cannot predict the “right” architecture
Maximum achievable data rates – unknown
Limited link communication system design
Peak power constraint in the transmitter
No solution for optimal transmit equalization
No solution for automatic equalization
A new model
Use direct noise and interference statistics
Main system impairments
Interference
Voltage noise (thermal, supply, offsets, quantization)
Timing noise – always looked at separately
Key to integrate with voltage noise sources
Need to map from time to voltage
Effect of timing noise
Ideal sampling
Jittered sampling
Voltage noise
Voltage noise when receiver clock is off
The effect depends on the size of the jitter, the input sequence, and the channel
Need effective voltage noise distribution
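[Figure: a jittered transmit symbol $b_k$ spanning $kT + \varepsilon_k^{TX}$ to $(k+1)T + \varepsilon_{k+1}^{TX}$, decomposed into the ideal symbol plus noise pulses of area approximately $-b_k \varepsilon_k^{TX}$ and $b_k \varepsilon_{k+1}^{TX}$ at its two edges]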
Example: Effect of transmitter jitter
Decompose output into ideal and noise
Noise is pulses at the front and end of the symbol
Width of each pulse is equal to the jitter
Approximate with deltas on bandlimited channels
V. Stojanović, M. Horowitz, “Modeling and Analysis of High-Speed Links,” IEEE Custom Integrated Circuits Conference, September 2003. (invited)
Jitter propagation model
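[Figure: jitter propagation block diagram with an ideal path (precoder $w$, pulse response $p(jT)$, output $x_k^{ISI}$) and a noise path (impulse response $h(jT + T/2)$, output $x_k^{jit}$), driven by the PLL jitter $\varepsilon^{TX}_{k,k+1}$ and $\varepsilon^{RX}_k$ through $H_{jit}(s)$, with noise inputs $n^{vdd}$ and $n^{in}$]
$$x^{ISI}(kT + \phi_i + \varepsilon_k^{RX}) = \sum_{j=-sbS}^{sbE} b_{k-j}\, p(jT + \phi_i)$$
$$x^{jitter}(kT + \phi_i + \varepsilon_k^{RX}) = \sum_{j=-sbS}^{sbE} b_{k-j} \left[ h\left(jT + \tfrac{T}{2} + \phi_i\right)\left(\varepsilon_k^{RX} - \varepsilon_{k-j}^{TX}\right) - h\left(jT - \tfrac{T}{2} + \phi_i\right)\left(\varepsilon_k^{RX} - \varepsilon_{k-j+1}^{TX}\right) \right]$$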
Jitter effect on voltage noise
Transmitter jitter
High frequency (cycle-cycle) jitter is bad
Changes the energy (area) of the symbol
No correlation of noise sources that sum
Low frequency jitter is less bad
Effectively shifts waveform
Correlated noise gives partial cancellation
Receive jitter
Modeled by shift of transmit sequence
Same as low frequency transmitter jitter
Bandwidth of the jitter is critical
It sets the magnitude of the noise created
[Figure label: $\varepsilon_k^{RX}$]
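The difference between high-frequency and low-frequency transmitter jitter can be made concrete with a small sketch. It uses the delta approximation, in which each symbol contributes noise of area $-b_k \varepsilon_k^{TX}$ at its leading edge and $b_k \varepsilon_{k+1}^{TX}$ at its trailing edge. Grouping terms by edge then gives each edge a weight of $(b_{m-1} - b_m)$ times the impulse response sampled at the edge. Everything else is an assumption made for illustration: a single-pole channel $h(t) = e^{-t/\tau}/\tau$, independent ±1 data, no receive jitter, jitter expressed as a fraction of $T$, and hypothetical function names.

```python
import numpy as np

def edge_samples(tau_over_T=1.0, n_taps=40):
    """h(mT + T/2), m = 0..n_taps-1, for an assumed single-pole channel
    h(t) = exp(-t/tau)/tau with the symbol time T normalized to 1."""
    m = np.arange(n_taps)
    return np.exp(-(m + 0.5) / tau_over_T) / tau_over_T

def jitter_noise_variance(h_edge, sigma_jitter):
    """Voltage-noise variance caused by transmitter jitter for +/-1 data.

    white: jitter independent from edge to edge (cycle-to-cycle jitter).
    slow:  jitter identical on all edges seen by the channel (low frequency).
    Uses E[(b[m-1]-b[m])^2] = 2 and E[(b[m-1]-b[m])(b[m]-b[m+1])] = -1.
    """
    same = np.sum(h_edge ** 2)
    neighbour = np.sum(h_edge[:-1] * h_edge[1:])
    white = 2.0 * sigma_jitter ** 2 * same
    slow = 2.0 * sigma_jitter ** 2 * (same - neighbour)
    return white, slow

white, slow = jitter_noise_variance(edge_samples(1.0), sigma_jitter=0.014)
print(f"slow/white noise power ratio: {slow / white:.3f}")
```

With the same rms jitter, the slow case produces less voltage noise because the contributions of neighbouring edges partially cancel, while in the white case they add in power.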
Outline
Explore high-throughput, energy-efficient links
Look at all aspects of system design
High-speed link environment
System modeling
Communication techniques
Exploring the limits – Capacity and Baseband
Baseline channels
Legacy FR4 BP, 26”, via stub
N6K BP, 26”
Short ATCA BP, 3”
Legacy (FR4) - lots of reflections
Microwave engineered (N6K)
Emerging standards (IEEE 802.3ap, ATCA)
Capacity calculation
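Concave problem
$$\lim_{N\to\infty}\ \text{maximize}\quad b = \sum_{n=1}^{N} \frac{1}{2}\log_2\left(1 + \frac{E_n |H_n|^2}{\Gamma\left(\sigma_{thermal}^2 + E_n |H_n|^2 \sigma_\theta^2\right)}\right)$$
$$\text{s.t.}\quad \sum_{n=1}^{N} E_n = N E_{avg} = \frac{N E_{peak}}{PAR}, \qquad E_n \ge 0,\ n = 1, \ldots, N$$
Modified waterfilling
Add phase noise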
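One way to solve an allocation problem of this shape numerically is sketched below. It is not the procedure used in the talk: it is a generic nested bisection on the Lagrange multiplier, which works because the per-tone rate is concave in $E_n$. The per-tone SINR is taken as $E_n |H_n|^2 / (\Gamma(\sigma^2_{thermal} + E_n |H_n|^2 \sigma_\theta^2))$; the function names, gains and noise values are hypothetical.

```python
import numpy as np

def marginal_rate(E, g, sigma2, sigma_theta2, gamma):
    """d(rate)/dE per tone (up to a constant factor); decreasing in E."""
    noise = sigma2 + E * g * sigma_theta2
    sinr = E * g / (gamma * noise)
    dsinr_dE = g * sigma2 / (gamma * noise ** 2)
    return 0.5 * dsinr_dE / (1.0 + sinr)

def energy_at_price(lam, g, sigma2, sigma_theta2, gamma, e_max, iters=60):
    """Per-tone energy at which the marginal rate drops to lam (0 if never above)."""
    lo = np.zeros_like(g)
    hi = np.full_like(g, e_max)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        wants_more = marginal_rate(mid, g, sigma2, sigma_theta2, gamma) > lam
        lo = np.where(wants_more, mid, lo)
        hi = np.where(wants_more, hi, mid)
    return lo

def modified_waterfill(g, e_total, sigma2, sigma_theta2=0.0, gamma=1.0, iters=60):
    """Energy per tone maximizing the total rate for a fixed total energy."""
    g = np.asarray(g, dtype=float)
    lam_lo, lam_hi = 0.0, 0.5 * g.max() / (gamma * sigma2)
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        E = energy_at_price(lam, g, sigma2, sigma_theta2, gamma, e_total)
        if E.sum() > e_total:
            lam_lo = lam      # too much energy requested: raise the price
        else:
            lam_hi = lam
    return energy_at_price(lam_hi, g, sigma2, sigma_theta2, gamma, e_total)

def total_bits(E, g, sigma2, sigma_theta2=0.0, gamma=1.0):
    """Sum over tones of 0.5*log2(1 + SINR)."""
    g = np.asarray(g, dtype=float)
    sinr = E * g / (gamma * (sigma2 + E * g * sigma_theta2))
    return float(np.sum(0.5 * np.log2(1.0 + sinr)))

gains = [1.0, 0.5, 0.01]                       # hypothetical |H_n|^2
E_thermal = modified_waterfill(gains, e_total=1.0, sigma2=0.1)
E_phase = modified_waterfill(gains, e_total=1.0, sigma2=0.1, sigma_theta2=0.05)
print(E_thermal.round(3), E_phase.round(3))
```

With the phase-noise term set to zero the result reduces to classic waterfilling, which gives a simple sanity check. With phase noise the SINR of each tone saturates at $1/(\Gamma \sigma_\theta^2)$, so adding energy to strong tones pays off less.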
Channel capacity – the impact of noise
Capacity
Much higher than data rates in today’s links
Nominal noise
Thermal – 50 Ohm termination
Phase noise – best LC PLL (0.14% UI rms)
[Figure: capacity with thermal and phase noise for the legacy FR4, N6K and short ATCA backplanes]
Removing ISI – baseband link
Transmit and receive equalization
Changes signal to correct for ISI
Often easier to work at transmitter
DACs easier than ADCs
[Figure: linear transmit equalizer (Tx data, causal taps, anticausal taps, channel) and decision-feedback equalizer (sampled data, deadband, feedback taps, tap select logic)]
J. Zerbe et al, "Design, Equalization and Clock Recovery for a 2.5-10Gb/s 2-PAM/4-PAM Backplane Transceiver Cell," IEEE Journal Solid-State Circuits, Dec. 2003.
Transmit equalization – headroom constraint
Transmit DAC has limited voltage headroom
Unknown target signal levels
Hard to formulate error or objective function
Need to tune the equalizer and receive comparator levels
Amplitude of equalized signal depends on the channel
[Figure: Tx data, causal taps, anticausal taps, channel; peak power constraint]
Power constrained equalizer optimization
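Add variable gain to amplify to known target level
Formulate the objective function from error
SINR is not concave in w in general
Change objective to quasiconcave
[Figure: data $a_k$ passes through the precoder $w$ (power constraint), the channel pulse response $P$, additive noise and the gain $g$; the recovered $a_k$ is compared with the transmitted $a_k$ to form the error $e_k$]
$$MSE(w, g) = E_a\left(1 - 2g\, w^T P 1_\Delta + g^2 w^T P P^T w\right) + g^2 \sigma^2$$
$$SINR_{unbiased}(w) = \frac{E_a \left(w^T P 1_\Delta\right)^2}{E_a\, w^T P \left(I - 1_\Delta 1_\Delta^T\right)\left(I - 1_\Delta 1_\Delta^T\right)^T P^T w + \sigma^2}$$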
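The quantities in these expressions can be evaluated numerically with a short sketch. It does not solve the quasiconcave problem from the talk. Instead it takes a plain least-squares precoder, scales it so that the sum of absolute tap values equals one (an assumed form of the peak constraint for ±1 data), and evaluates the unbiased SINR. The pulse response, noise level and function names are hypothetical.

```python
import numpy as np

def conv_matrix(p, n_taps):
    """Matrix P whose rows are shifted copies of the pulse response p,
    so that w @ P is the equalized pulse response."""
    p = np.asarray(p, dtype=float)
    P = np.zeros((n_taps, n_taps + len(p) - 1))
    for i in range(n_taps):
        P[i, i:i + len(p)] = p
    return P

def ls_precoder(p, n_taps, delta):
    """Least-squares taps toward a unit pulse at position delta,
    rescaled to meet the assumed peak constraint sum(|w|) = 1."""
    P = conv_matrix(p, n_taps)
    one_delta = np.zeros(P.shape[1])
    one_delta[delta] = 1.0
    w = np.linalg.solve(P @ P.T, P @ one_delta)
    w /= np.sum(np.abs(w))
    return w, P, one_delta

def unbiased_sinr(w, P, one_delta, E_a, sigma2):
    """E_a (w^T P 1)^2 / (E_a * residual ISI energy + sigma^2)."""
    eq = w @ P
    cursor = eq @ one_delta
    isi = eq - cursor * one_delta
    return E_a * cursor ** 2 / (E_a * (isi @ isi) + sigma2)

p = [1.0, 0.5]                                   # hypothetical pulse response
w, P, one_delta = ls_precoder(p, n_taps=5, delta=0)
sinr_eq = unbiased_sinr(w, P, one_delta, E_a=1.0, sigma2=1e-4)
sinr_raw = unbiased_sinr(np.array([1.0]), conv_matrix(p, 1),
                         np.array([1.0, 0.0]), E_a=1.0, sigma2=1e-4)
print(w.round(3), round(sinr_raw, 1), round(sinr_eq, 1))
```

The equalized cursor is smaller than the unequalized one because the taps share the transmitter headroom, which is the reason the receive levels have to be tuned to the channel.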
Optimal equalization
Minimize BER
Residual dispersion into peak distortion
Reflections into mean distortion
Includes all link-specific noise sources
Expand to DFE by puncturing the P matrix
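[Equation, partly unrecoverable from the extraction: maximize over $w$ a minimum-margin objective $\gamma$ built from $0.5\, d_{min}\, w^T P 1_\Delta$, a peak-distortion term in $V_{peak}$ and $P_{PD}$, the offset, and the rms noise and mean-distortion term, subject to $\|w\|_1 \le 1$]
$$\sigma^2 = w^T S_0^{TX} w + w^T S_0^{RX} w + \sigma^2_{thermal}$$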
Still, does this objective really relate to link performance? Need to look at noise and interference distributions
V. Stojanović, A. Amirkhany, M. Horowitz, “Optimal Linear Precoding with Theoretical and Practical Data Rates in High-Speed Serial-Link Backplane Communication,” ICC’04
Pulse amplitude modulation
Binary (NRZ)
1 bit / symbol
Symbol rate = bit rate
PAM4
2 bits / symbol
Symbol rate = bit rate/2
[Figure: PAM4 eye with levels labeled 10, 11, 01, 00; binary eye with levels labeled 1, 0]
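A minimal sketch of the two mappings follows. The Gray-coded level order 10, 11, 01, 00 is taken from the figure labels; the normalized amplitudes ±1 and ±3 and the function names are assumptions made for illustration.

```python
# Gray-coded PAM4 levels in the order shown on the slide (10, 11, 01, 00);
# the numeric amplitudes +3, +1, -1, -3 are an assumed normalization.
GRAY_PAM4 = {(1, 0): 3, (1, 1): 1, (0, 1): -1, (0, 0): -3}

def pam2_symbols(bits):
    """One symbol per bit: symbol rate equals bit rate."""
    return [1 if b else -1 for b in bits]

def pam4_symbols(bits):
    """One symbol per two bits: symbol rate is half the bit rate."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

bits = [1, 0, 1, 1, 0, 1, 0, 0]
print(pam2_symbols(bits))
print(pam4_symbols(bits))
```

With Gray coding, mistaking a level for its neighbour corrupts only one of the two bits.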
Multi-level: offset and jitter are crucial
To make better use of available bandwidth, need better circuits
PAM2/PAM4 robust candidate for next generation links
[Figure: data rate [Gb/s] vs. symbol rate [Gs/s] for PAM2, PAM4, PAM8 and PAM16, in three panels: thermal noise; thermal noise + offset; thermal noise + offset + jitter]
Full ISI compensation too costly
[Figure: data rate [Gb/s] vs. symbol rate [Gs/s] for PAM2, PAM4, PAM8 and PAM16, in three panels: thermal noise; thermal noise + offset; thermal noise + offset + jitter]
Today’s links cannot afford to compensate all ISI
Too much power
Limits today’s maximum achievable data rates
Outline
Explore high-throughput, energy-efficient links
Look at all aspects of system design
High-speed link environment
System modeling
Communication techniques
Exploring the limits – Capacity and Baseband
What next?
What next?
Very long filters too costly
High-speed circuits have more noise/errors
Channel-and-circuit-aware coding
Code against reflections and xtalk, ckt noise
Energy-efficient Channel Coding for Links
Channel-and-circuit-aware code:
Operate at BER = 10^-5
Decoded BER = 10^-15
Reuses existing P/S infrastructure.
Lowers energy allocated to
Timing
Equalization
[Figure: high-speed I/O link with Enc, P/S, Tx Eq, Tx, PLL, Rx, Rx Eq, CDR, Adapt, S/P and Dec blocks]
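How much a simple code can lower the error rate can be estimated with a binomial tail. The sketch below computes the probability that a block of n bits contains more than t errors. It assumes independent bit errors, which the correlated noise and ISI on a real link violate, and it reports a block failure probability rather than a decoded BER. The block length and function name are hypothetical.

```python
from math import comb

def block_failure_probability(n, t, raw_ber):
    """P(more than t bit errors in a block of n bits), independent errors assumed.
    The tail is summed directly to avoid cancellation at tiny probabilities."""
    return sum(comb(n, i) * raw_ber ** i * (1.0 - raw_ber) ** (n - i)
               for i in range(t + 1, n + 1))

# Hypothetical 255-bit block at the raw BER of 1e-5 quoted on the slide.
for t in range(1, 5):
    print(t, block_failure_probability(255, t, 1e-5))
```

Each additional correctable error lowers the failure probability by several orders of magnitude at this raw BER, which is why even short codes are attractive if their energy cost is small.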
But, need to be careful
Always know what you’re optimizing
Powerful coders/encoders often costly
Example - fastest RS (255,239) implementation
10 – 40 Gb/s throughput
Energy cost - 12 mW/Gb/s
50x area of the high-speed link (extensive parallelism)
Need to include the energy cost per bit in the code design spec
L. Song, M-L Yu, M.S. Shaffer, “10- and 40-Gb/s Forward Error Correction Devices for Optical Communications,” IEEE Journal of Solid-State Circuits, vol. 37, no. 11, Nov. 2002.
Energy efficiency of link components
Large chunk of energy on timing sub-system (PLL, CDR)
Different scaling for PAM2/PAM4
Energy scales linearly with technology
Likely to remain a key constraint
Can’t count on very complex filters – for reflections and xtalk
[Figure: energy cost per bit [mW/Gb/s] vs. data rate [Gb/s] for PAM2 Tx5 Rx20, PAM2 Tx5 Rx1+20, PAM2 Tx50 Rx80, PAM4 Tx5 Rx20 and PAM4 Tx50 Rx80]
[Figure: energy cost per bit [mW/Gb/s] of TxTap, RxTap, RxSamp, PLL and CDR for PAM4 and PAM2; bar labels 1, 2.2, 8, 11, 1.5, 5.9, 4, 5.5, 0.3, 0.45]
Potential savings from relaxed timing
With proper coding
Increase data rate
Relax PLL jitter spec (6-12 dB) – save power
Original jitter – rms = 1.4% UI (ring oscillator based PLL)
[Figure: two panels, legacy FR4 BP and short ATCA BP, with curves at data rates from 6 to 25 Gb/s]
BER vs. hardware complexity
Partially eliminate ISI (leave most of the reflections)
Let simple code take care of the rest
Can recover from raw BER of 10^-5
And save up to 50 feedback taps - up to 15 mW/Gb/s in 0.13 µm
[Figure: two panels, legacy FR4 BP and short ATCA BP, with curves at data rates from 6 to 25 Gb/s]
Experimental Testbed - Results
Evaluate impact of codes on actual link
- Impact of correlated noise and ISI
- Early estimation of energy-efficiency
Trade-off between error correction and detection
- Example: 2-error correcting/detecting schemes at 6.25 Gb/s
• Code: Hamming for detection only
• Protocol: ARQ
• Code: Single-Error-Correcting, Double-Error-Detecting (SEC-DED)
• Protocol: Hybrid ARQ-FEC
• Code: t=2 BCH
• Protocol: Forward error correction
Increasing Code Rate, Decreasing BER = Improving Performance
[Figure: BER (uncoded) and BER (coded) vs. rate and block length, ranging from correction only to detection only; annotations ~5, ~2.5, ~1.5]
Channel-Aware Codes
Systematic pattern elimination codes
Trivially simple decoding.
[Figure: CMF of some codes with r overhead; cumulative probability vs. RX voltage]
Blitvic et al., submitted to ICC 2007
What next?
Very long filters too costly
High-speed circuits have more noise/errors
Channel-and-circuit-aware coding
Code against reflections and xtalk, ckt noise
Multi-tone signalling
Parallel links
Less noise, higher spectral efficiency
More energy-efficient for given data rate
Uncoded multi-tone – the impact of noise
Uncoded multi-tone data rates
Half the capacity
BER target of 10^-15
Peak-power constraint
Even simple coding can help – since Gap is huge
[Figure: uncoded MT data rates with thermal and phase noise for the legacy FR4, N6K and short ATCA backplanes]
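The size of the gap at such a low BER target can be estimated with the standard gap approximation for uncoded PAM, $\Gamma \approx [Q^{-1}(P_e/2)]^2 / 3$. The formula is a textbook approximation brought in here for illustration, not a result from the talk, and the function name is hypothetical.

```python
from math import log10
from statistics import NormalDist

def pam_gap(target_error_rate):
    """Approximate SNR gap of uncoded PAM: (Qinv(Pe/2))^2 / 3."""
    q_inv = -NormalDist().inv_cdf(target_error_rate / 2)
    return q_inv ** 2 / 3

gap_db = 10 * log10(pam_gap(1e-15))
print(f"gap at 1e-15: {gap_db:.1f} dB")
```

A gap of this size means an uncoded link needs roughly twenty times the SNR that capacity requires for the same number of bits per dimension, so even modest coding gain is valuable.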
Capacity – bit loading
Bandwidth is limited by attenuation and noise
Can’t just keep increasing the signaling frequency
Need to focus on available bandwidth (at most 10-20 GHz)
Need circuits that can create/sense 4-8 bits/dim
[Figure: bit loading at capacity for the legacy FR4, N6K and short ATCA backplanes; panels for excess noise factor 0 dB and 20 dB]
Uncoded multi-tone – bit loading
Modified Levin-Campello algorithm (includes phase noise)
Bandwidth not affected much (still 10-20 GHz)
In high-noise case - less advantage over baseband
With coding can improve by up to 2x – closer to capacity
[Figure: uncoded multi-tone bit loading for the legacy FR4, N6K and short ATCA backplanes; panels for excess noise factor 0 dB and 20 dB]
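The greedy idea behind Levin-Campello bit loading can be sketched as follows. This is the plain textbook form with thermal noise only; the phase-noise modification mentioned on the slide is not included. The tone gains, noise level, gap, energy budget and function names are hypothetical, and the energy needed for b bits per dimension on a tone is taken as $\Gamma \sigma^2 (2^{2b} - 1)/|H_n|^2$.

```python
import heapq

def energy_for_bits(b, gain, noise_var, gap):
    """Energy needed to carry b bits per dimension on one tone."""
    return gap * noise_var / gain * (2 ** (2 * b) - 1)

def greedy_bit_loading(gains, noise_var, gap, energy_budget, max_bits=8):
    """Add one bit at a time to whichever tone needs the least extra energy."""
    bits = [0] * len(gains)
    used = 0.0
    heap = [(energy_for_bits(1, g, noise_var, gap), n) for n, g in enumerate(gains)]
    heapq.heapify(heap)
    while heap:
        cost, n = heapq.heappop(heap)
        if used + cost > energy_budget:
            break                      # cheapest next bit no longer fits
        bits[n] += 1
        used += cost
        if bits[n] < max_bits:
            b = bits[n]
            step = (energy_for_bits(b + 1, gains[n], noise_var, gap)
                    - energy_for_bits(b, gains[n], noise_var, gap))
            heapq.heappush(heap, (step, n))
    return bits, used

bits, used = greedy_bit_loading([1.0, 0.25], noise_var=1.0, gap=1.0,
                                energy_budget=20.0)
print(bits, used)
```

Because the incremental energy of every tone grows with each added bit, stopping when the cheapest next bit no longer fits leaves no cheaper bit unused.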
Analog multi-tone link
Challenge – balancing the inter-symbol and inter-channel interference
Microwave filter techniques
Wideband RF
Mixed-signal matrix signal processing
[Figure: analog multi-tone transmitter and receiver built from LPF and BPF blocks and mixers e^{jw1t} ... e^{jwNt} for data0, data1, ..., dataN; inset: # levels vs. f]
A. Amirkhany, V. Stojanovic, M.A. Horowitz, “Multi-tone Signaling for High-speed Backplane Electrical Links,” GLOBECOM’04.
A more digital implementation
24 Gb/s with 4 channels on a backplane link (roughly 2x better than equalization)
Amirkhany et al., GLOBECOM 2006, VLSI Symposium 2007
Conclusions
Interfaces becoming complex comm. systems
Challenges in modeling, comm. techniques and system design
State-of-the-art baseband links (chips)
Far from utilizing the capacity of the channels
10-20x difference in data rates
Useful channel bandwidth 10-20 GHz
Need lower-speed, precision circuits for higher order constellations
Research trends
Custom Coding
If careful, can lower the energy cost per bit for the whole system
Problem formulation different in so many ways
Analog Multi-tone
More energy-efficient ISI reduction
Significant challenges in front-end precision
Acknowledgments
MARCO Interconnect Focus Center
Jared Zerbe and Ravi Kollipara - Rambus
IEEE 802.3ap, ATCA forum