Exploiting Crosstalk to Speed up On-chip Buses
Chunjie Duan, Ericsson Wireless, Boulder
Sunil P Khatri, University of Colorado, Boulder
Outline
Introduction
Classification of Cross-talk types
The Story so far..
  Eliminating 3C and 4C sequences
  Eliminating 4C sequences
Eliminating 2C sequences
Eliminating 1C sequences
Experimental Results
Conclusions
Introduction
Verified cross-talk trends
  Accurate 3-D capacitance extraction
  Delay variation 2.47:1 (200 μm wires, 10X drivers, 0.1 μm technology)
Deep sub-micron process
[Figure: bus wire cross-sections with dimensions s, t and w, showing aggressor (a) and victim (v) wires, inter-wire capacitance CI and load capacitance CL]
Cross-talk vs Bus Data Pattern
When λ ~ 0.1μm, r = CI/CL ~ 10 (metal 4)
Effective total capacitance depends on bus data sequence:
  Best case: 0 x CI
  Worst case: 4 x CI
[Figure: example transitions annotated with per-neighbor coupling of 0·CI or 2·CI, giving Ctotal = 0·CI and Ctotal = 4·CI]
Classification of Cross-talk
4·C sequence:
3·C sequence:
2·C sequence:
1·C sequence:
0·C sequence:
Forbidden patterns (“010” and “101”)
Maximum bus data rate depends on total capacitance seen by any bit
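The classification can be sketched in code. This is a minimal illustration, not the authors' tool: it assumes the coupling seen by a switching wire is CI times the sum of |Δvictim − Δneighbor| over its neighbors, that edge wires have a single neighbor, and that quiet wires are ignored. The function name `crosstalk_class` is hypothetical.

```python
def crosstalk_class(before, after):
    """Worst per-wire coupling, in multiples of CI, for one bus transition."""
    # +1 = rising wire, -1 = falling wire, 0 = quiet wire
    delta = [int(b) - int(a) for a, b in zip(before, after)]
    worst = 0
    for j, d in enumerate(delta):
        if d == 0:
            continue  # assumption: a quiet wire has no transition to delay
        coupling = 0
        for k in (j - 1, j + 1):
            if 0 <= k < len(delta):  # assumption: edge wires have one neighbor
                coupling += abs(d - delta[k])
        worst = max(worst, coupling)
    return worst


examples = [("000", "111"), ("000", "011"), ("000", "010"),
            ("010", "100"), ("010", "101")]
for before, after in examples:
    print(before, "->", after, ":", f"{crosstalk_class(before, after)}C")
```

Under these assumptions the transition between the two forbidden patterns, 010 -> 101, is the 4C case, and the bus class is set by the worst wire, in line with the remark that the data rate depends on the total capacitance seen by any bit.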
Previous work - Eliminating 3C & 4C Sequences
Simple approach: shielding
  No 3C/4C sequences; bus-width is doubled
Theorem: If no forbidden patterns are allowed on the bus, $\max(C_{total}) \le 2 \cdot C_I$
  Proof: see “Analysis and Avoidance of Cross-talk in Buses”, Duan, Tirumala, Khatri (Hot Interconnects, August 2001).
So we simply encode the data on the bus to get rid of the forbidden patterns
  Recurrence equation for asymptotic bus overhead
  CODEC implementation to demonstrate practicality
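The theorem and the overhead figure can be checked by brute force on small buses. The sketch below reuses an assumed coupling model (CI times the sum of |Δvictim − Δneighbor| over the neighbors of a switching wire). The slides do not show the recurrence; the Fibonacci-like growth and the overhead estimate `1 / log2(phi) - 1` are observations from this sketch. All function names are hypothetical.

```python
import math
from itertools import product


def crosstalk_class(before, after):
    """Worst per-wire coupling in multiples of CI (assumed model)."""
    delta = [int(b) - int(a) for a, b in zip(before, after)]
    worst = 0
    for j, d in enumerate(delta):
        if d == 0:
            continue
        coupling = sum(abs(d - delta[k]) for k in (j - 1, j + 1)
                       if 0 <= k < len(delta))
        worst = max(worst, coupling)
    return worst


def is_fp_free(word):
    """True if the word contains neither forbidden pattern."""
    return "010" not in word and "101" not in word


def fp_free_words(n):
    """All n-bit words without a forbidden pattern."""
    words = ("".join(bits) for bits in product("01", repeat=n))
    return [w for w in words if is_fp_free(w)]


def worst_class(words):
    """Worst crosstalk class over every ordered pair of words."""
    return max(crosstalk_class(u, v) for u in words for v in words)


counts = [len(fp_free_words(n)) for n in range(1, 9)]
print("codewords per bus width:", counts)
print("worst class on a 6-bit bus:", worst_class(fp_free_words(6)))

# If the counts keep growing by the golden ratio per added wire, a k-bit bus
# carries about k * log2(phi) data bits.
phi = (1 + math.sqrt(5)) / 2
overhead = 1 / math.log2(phi) - 1
print(f"estimated asymptotic overhead: {overhead:.1%}")
```

In this sketch each count is the sum of the two before it, which is where the golden-ratio growth estimate comes from.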
Eliminating 3C & 4C sequences
44% asymptotic overhead
Look-Up Table: straightforward, can achieve minimum overhead (44%), but not practical
Our implementation
  62.5% overhead (higher than minimum)
  Modular and straightforward
  Break bus into 4-bit groups
  Encode each group independently (4 bit -> 5 bit)
  Additional logic to handle across-group forbidden patterns
[Figure: plot of overhead percentage, y-axis from 0 to 0.5, x-axis from 0 to 100]
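The 4 bit -> 5 bit group encoding is possible because there are enough 5-bit words free of forbidden patterns. A minimal sketch follows. The slides do not give the actual mapping, so the ascending-order assignment and the names `encode_group` and `decode_group` are assumptions for illustration only.

```python
from itertools import product


def is_fp_free(word):
    """True if the word contains neither forbidden pattern."""
    return "010" not in word and "101" not in word


# All 5-bit words without a forbidden pattern, in ascending order.
CODEWORDS = [w for w in ("".join(b) for b in product("01", repeat=5))
             if is_fp_free(w)]

# Hypothetical mapping: the i-th 4-bit value gets the i-th codeword.
ENCODE = {format(i, "04b"): cw for i, cw in enumerate(CODEWORDS)}
DECODE = {cw: data for data, cw in ENCODE.items()}


def encode_group(nibble):
    """Memoryless table lookup for one 4-bit group."""
    return ENCODE[nibble]


def decode_group(codeword):
    """Inverse lookup for one 5-bit codeword."""
    return DECODE[codeword]


print(len(CODEWORDS), "codewords available for 16 data values")

# Groups encoded independently can still meet badly at the boundary.
joined = encode_group("0001") + encode_group("0000")
print(joined, "forbidden-pattern free:", is_fp_free(joined))
```

The last two lines show why the slide lists additional logic for across-group forbidden patterns: two valid codewords placed side by side can form a forbidden pattern at the seam.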
Previous Work - Eliminating 4C sequences
Less aggressive: eliminating 4C sequences only
Less overhead (33%)
Simpler algorithm:
  Divide the bus into 3-bit groups
  When a 4C sequence occurs, complement the group data
  Insert a group complement indicator
  Special handling for across-group 4C sequences (see paper for details)
Example:
  data:    101 001 -> 010 010
  encoded: 1010 0010 -> 1011 0100
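The group-complement idea can be sketched as follows. This is an assumption-laden illustration rather than the paper's CODEC: it compares each new 3-bit group with the group last driven on the wires, appends the indicator as a fourth bit, and ignores the across-group handling that the slide defers to the paper. The names `encode_group` and `decode_group` are hypothetical.

```python
# Inside a 3-bit group the only 4C transitions are between 010 and 101.
FOUR_C = {("010", "101"), ("101", "010")}


def complement(bits):
    """Invert every bit of a bit string."""
    return "".join("1" if b == "0" else "0" for b in bits)


def encode_group(prev_sent, data):
    """Return the 4-bit word (3 group bits + indicator) to drive next.

    prev_sent is the 4-bit word currently on the wires (assumed available).
    """
    if (prev_sent[:3], data) in FOUR_C:
        return complement(data) + "1"  # send the complement, flag it
    return data + "0"


def decode_group(word):
    """Undo the complement when the indicator bit is set."""
    group, indicator = word[:3], word[3]
    return complement(group) if indicator == "1" else group


# The two groups from the slide example.
print(encode_group("1010", "010"))  # first group: 101 -> 010 would be 4C
print(encode_group("0010", "010"))  # second group: 001 -> 010 is not 4C
```

With these assumptions the sketch reproduces the encoded words in the slide example, and the decoder needs only the indicator bit.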
CODEC Results
Compare waveform with and without coding
Random input sequence
[Figure: test setup, random sequence -> encoder -> driver -> receiver -> decoder -> recovered sequence]
Encoder/decoder delay ~250ps (memoryless)
Max data rate more than 2X compared to scheme with no encoding
Speedup is data pattern independent
CODEC Results … 2
Bus length 5mm, 10mm or 20mm
Driver strength 30X, 60X and 120X of minimum
[Figure: DELAY comparison (1mm trace), waveforms for 0C, 1C, 2C, 3C and 4C sequences]
[Figure: DELAY comparison (2mm trace), waveforms for 0C, 1C, 2C, 3C and 4C sequences]
Delay (ps) by crosstalk class:
Trc_len  Buf_size    0C    1C    2C    3C    4C
5mm      30x         83   121   241   516   665
5mm      60x        108   131   213   399   402
5mm      120x        96   117   136   196   279
10mm     30x        102   153   437   912  1026
10mm     60x        131   164   413   722   919
10mm     120x       114   137   270   379   548
20mm     30x        153   203   793  1068  1586
20mm     60x        164   206   691  1161  1561
20mm     120x       134   177   580   969  1365
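The table can be turned into delay ratios with a few lines of arithmetic. The sketch below copies the delays from the table and compares the worst class allowed before and after a given elimination step. It ignores CODEC delay, and the helper name `speedup` is hypothetical.

```python
# Delays in ps from the table: (trace length, driver size) -> [0C, 1C, 2C, 3C, 4C]
DELAY_PS = {
    ("5mm", "30x"): [83, 121, 241, 516, 665],
    ("5mm", "60x"): [108, 131, 213, 399, 402],
    ("5mm", "120x"): [96, 117, 136, 196, 279],
    ("10mm", "30x"): [102, 153, 437, 912, 1026],
    ("10mm", "60x"): [131, 164, 413, 722, 919],
    ("10mm", "120x"): [114, 137, 270, 379, 548],
    ("20mm", "30x"): [153, 203, 793, 1068, 1586],
    ("20mm", "60x"): [164, 206, 691, 1161, 1561],
    ("20mm", "120x"): [134, 177, 580, 969, 1365],
}


def speedup(config, slow_class, fast_class):
    """Ratio of the delay of the slower class to that of the faster class."""
    delays = DELAY_PS[config]
    return delays[slow_class] / delays[fast_class]


for config in DELAY_PS:
    no_3c_4c = speedup(config, 4, 2)  # uncoded worst case (4C) vs 2C
    no_2c = speedup(config, 2, 1)     # 2C worst case vs 1C worst case
    print(config, f"4C/2C = {no_3c_4c:.2f}", f"2C/1C = {no_2c:.2f}")
```

The ratios are wire-delay ratios only. Any encoder and decoder delay would reduce the net gain.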
Further Speedup Possible?
Can we exploit crosstalk to further speed up the bus?
  Eliminate 2C sequences
  Eliminate 1C sequences
Simulation shows that eliminating 2C sequences results in a speedup of 2X – 4X over eliminating 3C/4C sequences
Note that we seek memory-less CODEC based techniques
Let’s look at eliminating 2C and 1C sequences next…
Eliminating 2C sequences
How to guarantee a 2C free sequence?
  Find a vector clique such that any pair of elements in this clique exhibits at most 1C transitions between them
  For an n bit bus, we need a k bit encoded bus (k > n) such that the new bus has a 2C free clique of cardinality greater than or equal to 2^n
Solution is memoryless (no need to “remember” the last transmitted word)
  Fast and simple CODEC implementation
We have an inductive method to construct 2C free cliques
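Constructing 2C free Cliques
Inductive method, extends a known clique $C_n = \{v\}$
Let $v' = v \cdot v_n$ (the vector $v$ with its last bit duplicated)
First set $C_{n+1} = \{\}$, and $C_{n+1} \Leftarrow C_{n+1} \cup v'$
Definition: the 0-extended subset of $C_{n+1}$ is: $S^0_{n+1} = \{ v \in C_{n+1} \mid v_{n+1} = 0 \}$
Definition: the 1-extended subset of $C_{n+1}$ is: $S^1_{n+1} = \{ v \in C_{n+1} \mid v_{n+1} = 1 \}$
Constructing $C^{S_0}_{n+1}$:
  Create a new vector $v^{new}$ from each $v \in S^0_{n+1}$, with $v^{new}_i = v_i$ for $1 \le i \le n$ and $v^{new}_{n+1} = \overline{v_{n+1}}$
  Add the vector unless there exists a vector $w$ in $S^1_{n+1}$ such that: $v^{new}_{n-1} = w_{n-1}$ and $v^{new}_n \ne w_n$
Constructing $C^{S_1}_{n+1}$: similar to $C^{S_0}_{n+1}$
Finally $C_{n+1} \Leftarrow C_{n+1} \cup C^{S_p}_{n+1}$ where $p = \arg\max_i |C^{S_i}_{n+1}|$
Theorem: Both sets of the previous step are 2C free cliques.
Proof - see paper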
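The inductive step can be sketched with an explicit pairwise test. This is an illustration under assumptions, not the authors' code. The coupling model (CI times the sum of |Δvictim − Δneighbor| over the neighbors of a switching wire) is assumed. Each candidate is tested against every vector kept so far instead of only the opposite subset, and ties between the two subsets go to the 0-extended one. The names `crosstalk_class` and `extend_clique` are hypothetical.

```python
def crosstalk_class(before, after):
    """Worst per-wire coupling in multiples of CI (assumed model)."""
    delta = [int(b) - int(a) for a, b in zip(before, after)]
    worst = 0
    for j, d in enumerate(delta):
        if d == 0:
            continue
        coupling = sum(abs(d - delta[k]) for k in (j - 1, j + 1)
                       if 0 <= k < len(delta))
        worst = max(worst, coupling)
    return worst


def extend_clique(clique):
    """One inductive step: n-wire 2C free clique -> (n+1)-wire clique."""
    base = [v + v[-1] for v in clique]  # duplicate the last bit
    best = []
    for subset_bit in "01":  # 0-extended subset first, then 1-extended
        flipped = "1" if subset_bit == "0" else "0"
        accepted = []
        for v in base:
            if v[-1] != subset_bit:
                continue
            candidate = v[:-1] + flipped
            # quadratic step: compare against every vector kept so far
            if all(crosstalk_class(candidate, w) <= 1
                   for w in base + accepted):
                accepted.append(candidate)
        if len(accepted) > len(best):
            best = accepted
    return base + best


clique = ["0", "1"]
sizes = {}
while len(clique[0]) < 6:
    clique = extend_clique(clique)
    sizes[len(clique[0])] = len(clique)
print(sizes)
```

Starting from the 1-bit clique {0, 1}, the sketch grows cliques up to 6 wires. The exact vectors depend on the tie-breaking choice, so they may be mirror images of the ones shown on the slides.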
Constructing 2C free Cliques … 2
Some observations about the construction
  (a) Vectors ending with ’01’ and ’10’ cannot co-exist in Cn
  (b) The first n bits of any vector of Cn+1 are the same as some vector of Cn and the last two bits are “00” or “11”. In other words, Cn+1 is at least as large as Cn
  (c) Because of (a), we know that vectors ending in “011” and “100” will not both appear in the same clique Cn+1
  (d) So we can construct vectors of Cn+1 ending in “001” or “110” by adding ‘1’ to vectors ending with “00” or adding ‘0’ to vectors ending with “11”. However, we cannot have both
Constructing 2C free Cliques … 3
Consider the construction of C4 from C3:
  Cn (C3): 000, 100, 001, 111
  Last bit duplicated: 0000, 1000, 0011, 1111
  Candidates for $C^{S_0}_{n+1}$: 0001, 1001
  Candidates for $C^{S_1}_{n+1}$: 0010, 1110
  Cn+1 (C4): 0000, 1000, 0011, 1110, 1111
Quadratic number of tests required as described above. We can do better…
Constructing Cn+1 from Cn using the 0-extended subset
Similar algorithm when we use the 1-extended subset
Clique Extension Algorithm
  append ‘0’ to n-bit vectors ending with ‘0’
  append ‘1’ to n-bit vectors ending with ‘1’
  since we use the 0-extended subset of Cn+1:
    If there is no n-bit vector ending with ’01’, append ‘1’ to vectors ending with ’00’
    If there is no n-bit vector ending with ’11’, append ‘1’ to vectors ending with ‘10’
  The new clique has no vectors ending with ’10’
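Here $T_n^{xy}$ is the set of n-bit clique vectors ending with ‘xy’:
  $T_{n+1}^{00} = T_n^{00} \cdot 0 \cup T_n^{10} \cdot 0$
  $T_{n+1}^{11} = T_n^{01} \cdot 1 \cup T_n^{11} \cdot 1$
  $T_{n+1}^{10} = \emptyset$
  if $T_n^{01} = \emptyset$ then
    $T_{n+1}^{01} = T_n^{00} \cdot 1$
  end if
  if $T_n^{11} = \emptyset$ then
    $T_{n+1}^{01} = T_n^{10} \cdot 1$
  end if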
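The suffix rules avoid the pairwise tests altogether. A minimal sketch follows. The rules for the 1-extended subset are written as the mirror image of the 0-extended ones, which is an assumption since the slides only call that case "similar". The function name `extend_by_suffix` is hypothetical.

```python
def extend_by_suffix(clique, use_zero_extended):
    """Extend an n-bit 2C free clique to n+1 bits using 2-bit endings only."""
    endings = {v[-2:] for v in clique}
    # every vector is kept with its last bit duplicated
    extended = [v + v[-1] for v in clique]
    if use_zero_extended:
        # new vectors end in '01'; the result has no vector ending in '10'
        if "01" not in endings:
            extended += [v + "1" for v in clique if v.endswith("00")]
        if "11" not in endings:
            extended += [v + "1" for v in clique if v.endswith("10")]
    else:
        # assumed mirror image: new vectors end in '10'
        if "10" not in endings:
            extended += [v + "0" for v in clique if v.endswith("11")]
        if "00" not in endings:
            extended += [v + "0" for v in clique if v.endswith("01")]
    return extended


c3 = ["000", "100", "001", "111"]
with_zero = extend_by_suffix(c3, use_zero_extended=True)
with_one = extend_by_suffix(c3, use_zero_extended=False)
print(len(with_zero), sorted(with_zero))
print(len(with_one), sorted(with_one))
```

For the C3 of the earlier example, only the 1-extended version adds a vector. Each version needs a single pass over the clique, in contrast to the quadratic number of tests of the direct construction.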
Clique Extension Algorithm … 2
Simply perform both versions of the clique extension algorithm
Select the result according to the rule: $C_{n+1} \Leftarrow C_{n+1} \cup C^{S_p}_{n+1}$ where $p = \arg\max_i |C^{S_i}_{n+1}|$
Some values of clique sizes:
N Clique size
3 4
4 5
5 7
6 9
7 10
Area Overhead Trends
Asymptotic overhead is 146%
Lower for smaller bus sizes. Suggests partitioning of bus into smaller sections
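The overhead trend can be explored numerically. The sketch below rests on an assumption: with the suffix-based extension rules, only the number of clique vectors per 2-bit ending matters, so clique growth can be tracked with four counters. It starts from the 3-wire clique {000, 100, 001, 111}. The sizes it produces for wider buses are outputs of this sketch rather than figures taken from the slides, and the function names are hypothetical.

```python
import math


def clique_sizes(max_wires):
    """Clique size per wire count, tracked as counts per 2-bit ending."""
    # vectors ending in 00, 01, 10, 11 for the clique {000, 100, 001, 111}
    a, b, c, d = 2, 1, 0, 1
    sizes = {3: a + b + c + d}
    for k in range(4, max_wires + 1):
        dup_a, dup_d = a + c, b + d  # duplicated last bit: endings 00 or 11
        # 0-extended version adds vectors ending in 01
        gain0 = (a if b == 0 else 0) + (c if d == 0 else 0)
        # 1-extended version (assumed mirror image) adds vectors ending in 10
        gain1 = (d if c == 0 else 0) + (b if a == 0 else 0)
        if gain0 >= gain1:
            a, b, c, d = dup_a, gain0, 0, dup_d
        else:
            a, b, c, d = dup_a, 0, gain1, dup_d
        sizes[k] = a + b + c + d
    return sizes


def wires_needed(n_bits, sizes):
    """Smallest wire count whose clique holds 2**n_bits codewords."""
    return min(k for k, s in sizes.items() if s >= 2 ** n_bits)


sizes = clique_sizes(120)
for n in (2, 3, 4, 8, 16, 32):
    k = wires_needed(n, sizes)
    print(f"{n} data bits -> {k} wires, overhead {k / n - 1:.0%}")

# Growth per added wire, and the overhead it implies for very wide buses.
growth = sizes[120] / sizes[119]
asymptotic = 1 / math.log2(growth) - 1
print(f"estimated asymptotic overhead: {asymptotic:.1%}")
```

Under these assumptions the overhead is smaller for narrow buses and approaches the asymptotic figure as the bus widens, which matches the suggestion to partition the bus into smaller sections.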
1C free Configurations
1C free sequences have least delay (typically 50% of 2C free sequences)
Just send any data bit multiple times (3/5…)
No encoder/decoder needed (no extra codec delay)
Simulation shows it’s the fastest compared to any other techniques with similar area overhead:
  3x (or 5x) separation between wires
  Widening the trace (3x): small R, bigger C
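Why repeating a bit removes crosstalk on the wire that matters can be seen in a small sketch. It assumes the coupling model CI times the sum of |Δvictim − Δneighbor|, and that the receiver samples only the middle wire of each group while the outer copies act as shields that switch with it. All names are hypothetical.

```python
from itertools import product


def coupling_per_wire(before, after):
    """Coupling seen by each wire in multiples of CI (None if quiet)."""
    delta = [int(b) - int(a) for a, b in zip(before, after)]
    result = []
    for j, d in enumerate(delta):
        if d == 0:
            result.append(None)
            continue
        result.append(sum(abs(d - delta[k]) for k in (j - 1, j + 1)
                          if 0 <= k < len(delta)))
    return result


def replicate(data, copies=3):
    """Drive every data bit on `copies` adjacent wires."""
    return "".join(bit * copies for bit in data)


def worst_center_coupling(n_bits, copies=3):
    """Worst coupling on the middle wire of any group, over all transitions."""
    words = ["".join(b) for b in product("01", repeat=n_bits)]
    center = copies // 2
    worst = 0
    for u in words:
        for v in words:
            per_wire = coupling_per_wire(replicate(u, copies),
                                         replicate(v, copies))
            for group in range(n_bits):
                c = per_wire[group * copies + center]
                if c is not None:
                    worst = max(worst, c)
    return worst


print("worst coupling on middle wires:", worst_center_coupling(4))
print("per-wire coupling for 01 -> 10:",
      coupling_per_wire(replicate("01"), replicate("10")))
```

In this model the middle wire always switches together with its two neighbors, so it sees no coupling, while the outer wires of a group can still see coupling from the adjacent group.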
Bus configurations for 1C delay
We simulated the delay of several different bus configurations
Different configurations yield different delay and area trade-offs
[Figure: wire layouts for configurations A to D, with wire width and spacing w and the variable dimensions marked]
A: 3-wire group, fixed spacing within group, variable spacing between groups.
B: similar to A but with a ground shielding wire between groups.
C: no shielding wires, vary wire sizes and spacing.
D: 5-wire group, fixed spacing within group, variable spacing between groups. Largest overhead.
1C free Configurations
Circuit parameters are extracted using SPACE3D
Bus simulations
  CODEC was not modeled
  Spice3f5, 0.1μm BPTM model
  Transmission line with inter-wire coupling
  Quantify actual delay of 1C free bus vector sequences for the 4 configurations described
  20mm wire, 30X driver (IDEAL 1C free delay 153ps, 3C free delay 793ps)
Delays for 1C free Configurations
Configuration C has significantly larger delay than the others (3X) since it’s essentially a 3C free configuration (has no shielding)
All other configurations show up to 2.5X speed up over a 3C free bus.
For all configurations, the actual delays are larger than the IDEAL 0C delay
  This is caused by skew on the outer shielding wires
  Transitions of the dynamic shields of any wire are slightly misaligned
  Verified by intentionally skewing the delay on signals
Conclusions
Inter-wire capacitance increasingly significant for DSM VLSI bus delays
We have developed an array of CODECs to trade off bus area overhead with delay
  4C free = 33%
  3C free = 62%
  2C free = 146% (asymptotic), up to 4X to 6X faster
Inductive algorithm for 2C free clique construction
Simulated several 1C free configurations for area overhead and delays (no CODECs)
  1C free techniques not as fast as expected
Thank You!