
Exploiting Crosstalk to Speed up On-chip Buses Chunjie Duan Ericsson Wireless, Boulder Sunil P Khatri University of Colorado, Boulder


Outline
  Introduction
  Classification of Cross-talk types
  The Story so far..
    Eliminating 3C and 4C sequences
    Eliminating 4C sequences
  Eliminating 2C sequences
  Eliminating 1C sequences
  Experimental Results
  Conclusions

Introduction
  Verified cross-talk trends
  Accurate 3-D capacitance extraction
  Delay variation 2.47:1 (200 μm wires, 10X drivers, 0.1 μm technology)
  Deep sub-micron process
  [Figure: wire cross-sections with wires labeled a and v and capacitances labeled CI and CL]

Cross-talk vs Bus Data Pattern
  When λ ~ 0.1μm, r = CI/CL ~ 10 (metal 4)
  Effective total capacitance depends on the bus data sequence:
    Best case: Ctotal = 0·CI
    Worst case: Ctotal = 4·CI
  [Figure: example transitions annotated 0·CI and 2·CI, giving Ctotal = 0·CI and Ctotal = 4·CI]

Classification of Cross-talk
  Sequence classes (each illustrated by a figure on the slide): 4·C, 3·C, 2·C, 1·C and 0·C
  Forbidden patterns: "010" and "101"
  Maximum bus data rate depends on the total capacitance seen by any bit
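The classification above can be sketched in code. This is a minimal illustration, not taken from the slides: it assumes the commonly used model in which the coupling seen by a switching wire is CI·|2·d_mid − d_left − d_right|, where each d is +1 for a rising wire, −1 for a falling wire and 0 for a quiet wire, and it ignores CL. The function name `crosstalk_class` is hypothetical.

```python
def crosstalk_class(before: str, after: str) -> int:
    """Return k for the kC class seen by the middle wire of a 3-wire window.

    Assumed model (not stated on the slides): k = |2*d_mid - d_left - d_right|.
    """
    if len(before) != 3 or len(after) != 3:
        raise ValueError("expected two 3-bit strings")
    # Transition of each wire: +1 rising, -1 falling, 0 unchanged.
    left, mid, right = (int(a) - int(b) for b, a in zip(before, after))
    if mid == 0:
        return 0  # a wire that does not switch has no switching delay
    return abs(2 * mid - left - right)


for before, after in [("000", "111"), ("000", "110"), ("000", "010"),
                      ("001", "010"), ("010", "101")]:
    print(before, "->", after, f"{crosstalk_class(before, after)}C")
```

Under this model a 4C transition on the middle wire requires both neighbors to switch in the opposite direction, which is only possible between the forbidden patterns "010" and "101".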

Previous work - Eliminating 3C & 4C Sequences
  Simple approach: shielding
    No 3C/4C sequences; bus width is doubled
  Theorem: If no forbidden patterns are allowed on the bus, $C_{total,max} \le 2 \cdot C$
    Proof: see "Analysis and Avoidance of Cross-talk in Buses", Duan, Tirumala, Khatri (Hot Interconnects, August 2001)
  So we simply encode the data on the bus to get rid of the forbidden patterns
    Recurrence equation for asymptotic bus overhead
    CODEC implementation to demonstrate practicality

Eliminating 3C & 4C sequences
  44% asymptotic overhead
  Look-up table: straightforward, can achieve minimum overhead (44%), but not practical
  Our implementation
    62.5% overhead (higher than minimum)
    Modular and straightforward
    Break bus into 4-bit groups
    Encode each group independently (4 bit -> 5 bit)
    Additional logic to handle across-group forbidden patterns
  [Figure: "overhead percentage", vertical axis 0 to 0.5, horizontal axis 0 to 100]
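The 44% figure and the 4-bit to 5-bit group size can be checked with a short brute-force count. This is a sketch, not the authors' derivation: it counts the words of a given width that contain neither "010" nor "101", and then assumes that this count keeps growing like a Fibonacci sequence (checked here only for small widths) to estimate the asymptotic overhead. All function names are hypothetical.

```python
import math
from itertools import product


def is_fp_free(word: str) -> bool:
    """True if the word contains neither forbidden pattern."""
    return "010" not in word and "101" not in word


def count_fp_free(width: int) -> int:
    """Brute-force count of forbidden-pattern-free words of the given width."""
    return sum(is_fp_free("".join(bits)) for bits in product("01", repeat=width))


def min_encoded_width(data_bits: int) -> int:
    """Smallest bus width with at least 2**data_bits forbidden-pattern-free words."""
    width = data_bits
    while count_fp_free(width) < 2 ** data_bits:
        width += 1
    return width


counts = [count_fp_free(w) for w in range(1, 11)]
print("code words per width:", counts)
print("4 data bits need", min_encoded_width(4), "wires")

# Assumption: Fibonacci-like growth, so each extra wire multiplies the
# number of code words by the golden ratio in the limit.
golden = (1 + math.sqrt(5)) / 2
asymptotic_overhead = 1 / math.log2(golden) - 1
print(f"asymptotic overhead: {asymptotic_overhead:.1%}")
```

The count only covers a single isolated group; the extra logic for across-group forbidden patterns mentioned on the slide is not modeled here.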

Previous Work - Eliminating 4C sequences
  Less aggressive: eliminating 4C sequences only
  Less overhead (33%)
  Simpler algorithm:
    Divide the bus into 3-bit groups
    When a 4C sequence occurs, complement the group data
    Insert a group complement indicator
    Special handling for across-group 4C sequences (see paper for details)
  Example:
    without coding: 101 001 -> 010 010
    with coding:    1010 0010 -> 1011 0100
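The within-group step can be sketched as follows. This is only an illustration under stated assumptions: the encoder is assumed to compare the new data with the group value currently on the wires, the indicator bit is assumed to be appended after the group, and the across-group handling described in the paper is omitted. The helper names are hypothetical.

```python
def complement(bits: str) -> str:
    """Bitwise complement of a bit string."""
    return "".join("1" if b == "0" else "0" for b in bits)


def is_4c(on_wires: str, new: str) -> bool:
    """True if the middle wire of a 3-bit group would see a 4C transition."""
    return {on_wires, new} == {"010", "101"}


def encode_group(on_wires: str, data: str) -> str:
    """Return 3 group bits plus 1 indicator bit (assumed layout)."""
    if is_4c(on_wires, data):
        return complement(data) + "1"  # send the complement, flag it
    return data + "0"


def decode_group(code: str) -> str:
    """Undo the complement when the indicator bit is set."""
    group, flag = code[:3], code[3]
    return complement(group) if flag == "1" else group


# The two groups of the slide example: 101 -> 010 and 001 -> 010.
for on_wires, data in [("101", "010"), ("001", "010")]:
    code = encode_group(on_wires, data)
    print(on_wires, "->", data, "sent as", code, "decoded as", decode_group(code))
```

With these assumptions the sketch reproduces the encoded words 1011 and 0100 of the example.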

CODEC Results
  Compare waveform with and without coding
  Random input sequence
  [Figure: block diagram with random sequence, encoder, driver, receiver, decoder and recovered sequence]
  Encoder/decoder delay ~250ps (memoryless)
  Max data rate more than 2X compared to scheme with no encoding
  Speedup is data pattern independent

CODEC Results … 2
  Bus length 5mm, 10mm or 20mm
  Driver strength 30X, 60X and 120X of minimum
  [Figure: "DELAY comparison (1mm trace)", curves for 0C, 1C, 2C, 3C and 4C]
  [Figure: "DELAY comparison (2mm trace)", curves for 0C, 1C, 2C, 3C and 4C]

  Delay (ps) by trace length, buffer size and sequence class:

  Trc_len  Buf_size    0C    1C    2C    3C    4C
  5mm      30x         83   121   241   516   665
  5mm      60x        108   131   213   399   402
  5mm      120x        96   117   136   196   279
  10mm     30x        102   153   437   912  1026
  10mm     60x        131   164   413   722   919
  10mm     120x       114   137   270   379   548
  20mm     30x        153   203   793  1068  1586
  20mm     60x        164   206   691  1161  1561
  20mm     120x       134   177   580   969  1365
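The delay table can be turned into per-row ratios with a few lines of code. This is a sketch for exploring the numbers, under two assumptions: the delays are in picoseconds, and the worst-case delay of a bus free of (k+1)C sequences is the kC column (so a 3C free bus is limited by the 2C delay). The dictionary name `DELAYS` is hypothetical.

```python
# (trace length, buffer size) -> delays for 0C, 1C, 2C, 3C, 4C (assumed ps)
DELAYS = {
    ("5mm", "30x"): (83, 121, 241, 516, 665),
    ("5mm", "60x"): (108, 131, 213, 399, 402),
    ("5mm", "120x"): (96, 117, 136, 196, 279),
    ("10mm", "30x"): (102, 153, 437, 912, 1026),
    ("10mm", "60x"): (131, 164, 413, 722, 919),
    ("10mm", "120x"): (114, 137, 270, 379, 548),
    ("20mm", "30x"): (153, 203, 793, 1068, 1586),
    ("20mm", "60x"): (164, 206, 691, 1161, 1561),
    ("20mm", "120x"): (134, 177, 580, 969, 1365),
}


def delay_ratio(row, slow_class: int, fast_class: int) -> float:
    """Ratio of the delay of one sequence class to another within a row."""
    return row[slow_class] / row[fast_class]


for (length, driver), row in DELAYS.items():
    print(f"{length:>5} {driver:>5}  "
          f"4C/2C = {delay_ratio(row, 4, 2):.2f}  "
          f"2C/1C = {delay_ratio(row, 2, 1):.2f}")
```

The 4C/2C column compares an unencoded bus with a 3C free bus, and the 2C/1C column compares a 3C free bus with a 2C free bus, in both cases before any CODEC delay is added.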

Further Speedup Possible?
  Can we exploit crosstalk to further speed up the bus?
    Eliminate 2C sequences
    Eliminate 1C sequences
  Simulation shows that eliminating 2C sequences results in a speedup of 2X to 4X over eliminating 3C/4C sequences
  Note that we seek memoryless CODEC based techniques
  Let's look at eliminating 2C and 1C sequences next…

Eliminating 2C sequences
  How to guarantee a 2C free sequence?
    Find a vector clique such that any pair of elements in this clique exhibits only 1C transitions between them
    For an n-bit bus, we need a k-bit encoded bus (k > n) such that the new bus has a 2C free clique of cardinality greater than or equal to 2^n
  Solution is memoryless (no need to "remember" the last transmitted word)
    Fast and simple CODEC implementation
  We have an inductive method to construct 2C free cliques
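The clique condition can be checked mechanically. The sketch below is an illustration under assumptions that are not spelled out on the slides: a switching wire sees |d − d_left| + |d − d_right| units of coupling, a wire at the edge of the bus has no outer neighbor, and a set is accepted when every pair of vectors produces at most a 1C transition. The function names are hypothetical; the two example sets are the 3-bit and 4-bit cliques that appear later in this deck.

```python
from itertools import combinations


def max_crosstalk(u: str, v: str) -> int:
    """Largest kC class seen by any wire when the bus switches from u to v."""
    delta = [int(b) - int(a) for a, b in zip(u, v)]
    worst = 0
    for j, d in enumerate(delta):
        if d == 0:
            continue  # quiet wires do not limit the data rate
        # Assumption: a missing neighbor at the bus edge contributes nothing.
        left = delta[j - 1] if j > 0 else d
        right = delta[j + 1] if j + 1 < len(delta) else d
        worst = max(worst, abs(d - left) + abs(d - right))
    return worst


def is_2c_free_clique(vectors) -> bool:
    """True if every pair of vectors switches with at most 1C crosstalk."""
    return all(max_crosstalk(u, v) <= 1 for u, v in combinations(vectors, 2))


C3 = ["000", "100", "001", "111"]
C4 = ["0000", "1000", "0011", "1110", "1111"]
print(is_2c_free_clique(C3), is_2c_free_clique(C4))
print(is_2c_free_clique(C3 + ["010"]))  # 000 <-> 010 is a 2C transition
```

Because the check only looks at pairs of code words, a CODEC built on such a set does not need to know the previously transmitted word.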

Constructing 2C free Cliques
  Inductive method, extends a known clique $C_n = \{v\}$
    Let $v' = v \cdot v_n$
    First set $C_{n+1} = \{\}$, and $C_{n+1} \Leftarrow C_{n+1} \cup v'$
  Definition: the 0-extended subset of $C_{n+1}$ is $S_0 = \{ v \in C_{n+1} \mid v_{n+1} = 0 \}$
  Definition: the 1-extended subset of $C_{n+1}$ is $S_1 = \{ v \in C_{n+1} \mid v_{n+1} = 1 \}$
  Constructing $C_{n+1}^{S_0}$:
    Create a new vector $v^{new}$ [defining equations not recoverable]
    Add the vector unless there exists a vector $w$ in $S_1$ such that [two conditions relating components of $v^{new}$ and $w$, not recoverable]
  Constructing $C_{n+1}^{S_1}$: similar to $C_{n+1}^{S_0}$
  Finally $C_{n+1} \Leftarrow C_{n+1} \cup C_{n+1}^{S_p}$ where $p = \arg\max_i |C_{n+1}^{S_i}|$
  Theorem: Both sets of the previous step are 2C free cliques.
    Proof: see paper

Constructing 2C free Cliques … 2
  Some observations about the construction
    (a) Vectors ending with '01' and '10' cannot co-exist in Cn
    (b) The first n bits of any vector of Cn+1 are the same as some vector of Cn, and the last two bits are "00" or "11". In other words, Cn+1 is at least as large as Cn
    (c) Because of (a), we know there will be no "011" or "100" in the same clique Cn+1
    (d) So we can construct vectors of Cn+1 ending in "001" or "110" by adding '1' to vectors ending with "00" or adding '0' to vectors ending with "11". However, we cannot have both

Constructing 2C free Cliques … 3
  Consider the construction of C4 from C3:
    $C_n$ (C3):                          000, 100, 001, 111
    $C_{n+1}$ (initial):                 0000, 1000, 0011, 1111
    new vectors for $C_{n+1}^{S_0}$:     0001, 1001
    new vectors for $C_{n+1}^{S_1}$:     0010, 1110
    $C_{n+1}$ (final, C4):               0000, 1000, 0011, 1110, 1111
  Quadratic number of tests required as described above. We can do better…
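A brute-force version of this extension step, with its quadratic number of pairwise tests, might look like the sketch below. It is not the authors' implementation: each candidate is simply tested against every vector already accepted, using an assumed crosstalk model in which a switching wire sees |d − d_left| + |d − d_right| units of coupling and edge wires have no outer neighbor. The names `max_crosstalk` and `extend_clique` are hypothetical.

```python
def max_crosstalk(u: str, v: str) -> int:
    """Largest kC class seen by any wire when the bus switches from u to v."""
    delta = [int(b) - int(a) for a, b in zip(u, v)]
    worst = 0
    for j, d in enumerate(delta):
        if d == 0:
            continue
        # Assumption: a missing neighbor at the bus edge contributes nothing.
        left = delta[j - 1] if j > 0 else d
        right = delta[j + 1] if j + 1 < len(delta) else d
        worst = max(worst, abs(d - left) + abs(d - right))
    return worst


def extend_clique(clique):
    """Extend an n-bit 2C free clique to n+1 bits by exhaustive pairwise tests."""
    # Start by repeating the last bit of every vector.
    base = [v + v[-1] for v in clique]
    best = []
    for last_bit in "01":  # try the 0-extended subset, then the 1-extended one
        accepted = []
        for v in base:
            if v[-1] != last_bit:
                continue
            # Candidate: same vector with its final bit flipped.
            candidate = v[:-1] + ("1" if last_bit == "0" else "0")
            if all(max_crosstalk(candidate, w) <= 1 for w in base + accepted):
                accepted.append(candidate)
        if len(accepted) > len(best):
            best = accepted
    return base + best


print(sorted(extend_clique(["000", "100", "001", "111"])))
```

Starting from the 3-bit clique of the example, only the candidate 1110 survives the pairwise tests, which gives the five-vector set shown on the slide.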

Clique Extension Algorithm
  Constructing Cn+1 from Cn using the 0-extended subset
  (similar algorithm when we use the 1-extended subset)
    Append '0' to n-bit vectors ending with '0'; append '1' to n-bit vectors ending with '1'
    Since we use the 0-extended subset of Cn+1:
      If there is no n-bit vector ending with '01', append '1' to vectors ending with '00'
      If there is no n-bit vector ending with '11', append '1' to vectors ending with '10'
    The new clique has no vectors ending with '10'
  [Pseudocode in set notation (two "if ... then ... end if" blocks over the sets T) not recoverable]

Clique Extension Algorithm … 2
  Simply perform both versions of the clique extension algorithm
  Select the result according to the rule: $C_{n+1} \Leftarrow C_{n+1} \cup C_{n+1}^{S_p}$ where $p = \arg\max_i |C_{n+1}^{S_i}|$
  Some values of clique sizes:

    N   Clique size
    3   4
    4   5
    5   7
    6   9
    7   10
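The suffix rules and the selection step can be sketched together. This is an illustration that follows the wording of the slides rather than the paper's full algorithm: the rules for the 1-extended subset are assumed to be the mirror image of the 0-extended rules, and the function names are hypothetical. Only the steps from n = 3 to n = 6 are exercised here.

```python
def extend_with_suffix_rules(clique, subset: str):
    """Apply the suffix rules for the 0-extended ('0') or 1-extended ('1') subset.

    The '1' case mirrors the rules stated for the '0' case (assumption).
    """
    keep = subset
    flip = "1" if subset == "0" else "0"
    endings = {v[-2:] for v in clique}
    # Every vector is first extended by repeating its last bit.
    extended = [v + v[-1] for v in clique]
    # For subset '0': no vector ending '01' -> append '1' to vectors ending '00'.
    if keep + flip not in endings:
        extended += [v + flip for v in clique if v.endswith(keep + keep)]
    # For subset '0': no vector ending '11' -> append '1' to vectors ending '10'.
    if flip + flip not in endings:
        extended += [v + flip for v in clique if v.endswith(flip + keep)]
    return extended


def extend(clique):
    """Run both versions and keep the larger result."""
    return max((extend_with_suffix_rules(clique, s) for s in "01"), key=len)


clique = ["000", "100", "001", "111"]
sizes = [len(clique)]
for _ in range(3):
    clique = extend(clique)
    sizes.append(len(clique))
print(sizes)  # sizes for N = 3, 4, 5, 6
```

Each extension step only inspects the two-bit endings of the current vectors, so it avoids the pairwise tests of the brute-force construction.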

Area Overhead Trends
  Asymptotic overhead is 146%
  Lower for smaller bus sizes. Suggests partitioning of bus into smaller sections

1C free Configurations
  1C free sequences have the least delay (typically 50% of 2C free sequences)
  Just send any data bit multiple times (3/5…)
  No encoder/decoder needed (no extra codec delay)
  Simulation shows it's the fastest compared to any other techniques with similar area overhead:
    3x (or 5x) separation between wires
    Widening the trace (3x): small R, bigger C
  [Figure: wires labeled A, B, C]
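The effect of sending each bit on several adjacent wires can be illustrated with a small sketch. It assumes three copies per bit, takes the middle wire of each group as the signal wire (with the outer copies acting as shields that switch with it), and uses an assumed crosstalk model in which a switching wire sees |d − d_left| + |d − d_right| units of coupling. The function names are hypothetical.

```python
def replicate(word: str, copies: int = 3) -> str:
    """Send every data bit on `copies` adjacent wires."""
    return "".join(bit * copies for bit in word)


def wire_classes(before: str, after: str) -> list:
    """kC class seen by each wire for one bus transition (assumed model)."""
    delta = [int(b) - int(a) for a, b in zip(before, after)]
    classes = []
    for j, d in enumerate(delta):
        if d == 0:
            classes.append(0)
            continue
        # Assumption: a missing neighbor at the bus edge contributes nothing.
        left = delta[j - 1] if j > 0 else d
        right = delta[j + 1] if j + 1 < len(delta) else d
        classes.append(abs(d - left) + abs(d - right))
    return classes


before, after = replicate("0101"), replicate("1010")
classes = wire_classes(before, after)
middle = classes[1::3]  # the middle wire of every 3-wire group
print("all wires:   ", classes)
print("middle wires:", middle)
```

In this model the middle wire never sees coupling, because both of its neighbors carry the same bit, while the outer wires of a group can still see crosstalk from the adjacent group.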

Bus configurations for 1C delay
  We simulated the delay of several different bus configurations
  Different configurations yield different delay and area trade-offs
    A: 3-wire group, fixed spacing within group, variable spacing between groups
    B: similar to A but with a ground shield between groups
    C: no shielding wires, vary wire sizes and spacing
    D: 5-wire group, fixed spacing within group, variable spacing between groups; largest overhead
  [Figures: wire layouts for each configuration, with wire width w and a variable spacing]

1C free Configurations
  Circuit parameters are extracted using SPACE3D
  Bus simulations
    CODEC was not modeled
    Spice3f5, 0.1μm BPTM model
    Transmission line with inter-wire coupling
  Quantify actual delay of 1C free bus vector sequences for the 4 configurations described
    20mm wire, 30X driver (IDEAL 1C free delay 153ps, 3C free delay 793ps)

Delays for 1C free Configurations
  Configuration C has a significantly larger delay than the others (3X) since it is essentially a 3C free configuration (it has no shielding)
  All other configurations show up to 2.5X speedup over a 3C free bus
  For all configurations, the actual delays are larger than the IDEAL 0C delay
    This is caused by skew on the outer shielding wires
    Transitions of the dynamic shields of any wire are slightly misaligned
    Verified by intentionally skewing the delay on signals

Conclusions
  Inter-wire capacitance is increasingly significant for DSM VLSI bus delays
  We have developed an array of CODECs to trade off bus area overhead with delay
    4C free = 33%
    3C free = 62%
    2C free = 146% (asymptotic), up to 4X to 6X faster
  Inductive algorithm for 2C free clique construction
  Simulated several 1C free configurations for area overhead and delays (no CODECs)
    1C free techniques not as fast as expected

Thank You!