Energy Efficient and High Speed On-Chip Ternary Bus

Chunjie Duan, Mitsubishi Electric Research Labs, Cambridge, MA, USA

Sunil P. Khatri, Texas A&M University, College Station, TX, USA




03/13/2008

Motivation

• Trends in VLSI design
  – Shrinking feature size
    • Deep SubMicron (DSM) and Very Deep SubMicron (VDSM) processes
  – Scaling down supply voltage
  – Increasing die size (e.g. SoC, NoC, CMP)

• Impacts
  ✓ Smaller gate delay (high-speed logic)
  ✓ Lower switching power per gate
  ✓ High complexity (> 1 billion gates)
  ✗ Increasing power consumption
  ✗ Higher leakage current (standby power)
  ✗ Reduced noise margin
  ✗ Increasing interconnect delay
    • Interconnect delay >> gate delay
    • Global interconnect becomes the performance bottleneck


On-chip Bus Interconnects

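• The impact of DSM / VDSM:
  – Wire width W↓, pitch P↓
  – Wire length L↑, thickness T↑
    • to avoid a quadratic increase in the resistance of the wire: $R = \rho L / (W T)$

• Inter-wire capacitance $C_I$ is much greater than substrate capacitance $C_L$ → crosstalk becomes dominant
  – $\lambda = C_I / C_L > 10$ for metal 4 in a … µm CMOS process

[Figure: wire cross-sections in an earlier process and in a DSM process, labelled with width W, thickness T, pitch P, inter-wire capacitance C_I and substrate capacitance C_L]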
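The resistance argument can be illustrated with a few lines of Python. This is only a sketch under the standard rectangular-wire formula R = ρL/(WT); the function name `wire_resistance` and the unit-less values are hypothetical and chosen purely so the ratios are easy to read.

```python
def wire_resistance(rho, length, width, thickness):
    """Resistance of a rectangular wire: R = rho * L / (W * T)."""
    return rho * length / (width * thickness)


base = wire_resistance(rho=1.0, length=1.0, width=1.0, thickness=1.0)

# Halve the width only (thickness kept): resistance grows linearly.
width_only = wire_resistance(rho=1.0, length=1.0, width=0.5, thickness=1.0)

# Halve width and thickness together: resistance grows quadratically.
both = wire_resistance(rho=1.0, length=1.0, width=0.5, thickness=0.5)

print(width_only / base, both / base)  # 2.0 4.0
```

Keeping the wires tall while narrowing them limits the resistance penalty, but it also enlarges the side-wall area between neighbours, which is why the inter-wire capacitance ends up dominating.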


Ternary Bus and Mapping

• Advantage of a ternary bus
  – Low voltage step: Vdd/2 instead of Vdd

• We propose a bit-to-bit binary-ternary mapping scheme
  – Each binary bit is mapped directly to a line on the ternary bus.
  – A binary 0 is mapped to the middle value on the ternary bus, i.e. 0b → 0t.
  – A binary 1 is mapped to either the high or the low value on the ternary bus, i.e. 1b → +t or 1b → -t.

• Disadvantage: lower bit density (1 bit/line vs. 1.58 bit/line for a true ternary bus)

• Advantages: direct mapping and flexible polarity
  – True ternary-to-binary conversion is very slow and complex; direct mapping avoids it
  – Flexible polarity results in low crosstalk, e.g. the ternary vectors +0+, -0-, +0- and -0+ all represent the same binary value 101.

• Each ternary value is represented by the polarity Pj and the magnitude Dj

Ternary driver truth table

Dj  Pj  Tj  Vout
0   X   0   V0
1   0   -   V-
1   1   +   V+
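The mapping and the driver truth table can be sketched in Python as follows. The helper names `drive` and `decode` and the encoding of the symbols -, 0, + as -1, 0, +1 are assumptions made for illustration; they are not taken from the slides.

```python
import math

# Ternary symbols as signed levels: '-' -> -1, '0' -> 0, '+' -> +1.
LEVEL = {"-": -1, "0": 0, "+": 1}


def drive(d, p):
    """Ternary driver truth table: magnitude d and polarity p give symbol Tj."""
    if d == 0:
        return "0"  # polarity is a don't-care when the magnitude is 0
    return "+" if p == 1 else "-"


def decode(ternary_word):
    """Recover the binary word: any non-zero level is a binary 1."""
    return "".join("1" if LEVEL[t] != 0 else "0" for t in ternary_word)


words = ["+0+", "-0-", "+0-", "-0+"]
print([decode(w) for w in words])  # ['101', '101', '101', '101']
print(round(math.log2(3), 2))      # 1.58 bits per line for a true ternary bus
```

Because decoding only looks at the magnitude, the encoder is free to pick whichever polarity causes the least crosstalk.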


Crosstalk in a Multi-valued Bus

• Define the effective crosstalk on line j as

    $X_{eff,j} = \left| 2\delta_j - \delta_{j-1} - \delta_{j+1} \right|$

  – where $\delta_j = \Delta V_j / V_{step}$ is the normalized (signed) voltage change on line j, and $V_{step} = V_{dd} / (NOL - 1)$. NOL is the number of logic levels

• Delay can be approximated as

    $\tau_j \approx k \cdot \lambda \cdot C_L \cdot V_{step} \cdot X_{eff,j}$

  – for $\lambda \gg 1$

• Energy consumption is
  – when $\lambda \gg 1$,

    $E_{total} \approx \lambda \cdot C_L \cdot V_{step}^2 \cdot \sum_{j=1}^{n} X_{eff,j}$

• For a ternary bus, $V_{step} = V_{dd}/2$, and we know
  – max(Xeff,j) = 8
  – min(Xeff,j) = 0

• Bus speed/power is highly data pattern dependent!

Table 1. Examples of Total Crosstalk

Vt-1  Vt   Xeff
000   +++  0
000   0++  1
000   0+-  5
+0+   0+0  4
+0+   0-0  0
-+0   +-0  6
+-+   -+-  8
+++   ---  0
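A minimal Python sketch of the effective-crosstalk computation, assuming the per-line formula X_eff,j = |2δ_j - δ_{j-1} - δ_{j+1}| with δ expressed in units of V_step. The function names, the encoding of -, 0, + as -1, 0, +1, and the treatment of edge lines (only the one existing neighbour couples) are assumptions for illustration. Under this reading the 000 → 0+- row of Table 1 evaluates to 3 rather than the listed 5, so the sketch should be treated as an approximation of the authors' definition.

```python
LEVEL = {"-": -1, "0": 0, "+": 1}


def deltas(prev, curr):
    """Normalized voltage change per line, in units of V_step."""
    return [LEVEL[c] - LEVEL[p] for p, c in zip(prev, curr)]


def x_eff(prev, curr):
    """Effective crosstalk per line for the transition prev -> curr.

    Interior lines: |2*d[j] - d[j-1] - d[j+1]|.
    Edge lines (assumption): only the one existing neighbour couples.
    """
    d = deltas(prev, curr)
    result = []
    for j, dj in enumerate(d):
        neighbours = [d[k] for k in (j - 1, j + 1) if 0 <= k < len(d)]
        result.append(abs(sum(dj - dk for dk in neighbours)))
    return result


# Middle line of a 3-wire bus, for several rows of Table 1.
examples = [("000", "0++"), ("+0+", "0+0"), ("-+0", "+-0"),
            ("+-+", "-+-"), ("+++", "---")]
for prev, curr in examples:
    print(prev, curr, x_eff(prev, curr)[1])
```

The last two examples show the data dependence: the same full-swing transition costs 8 when the neighbours move in the opposite direction and 0 when they move with the victim.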


A Low Power, High Speed 4X Ternary Bus

• Using direct bit-to-bit mapping

• Coding rules:
  – Rule #1: A direct - ↔ + transition is prohibited.
  – Rule #2: A 1b → 0b transition is mapped as -t → 0t or +t → 0t, depending only on the current polarity of the 1b.
  – Rule #3: For a 0b → 1b transition on bj, if bj-1 is transitioning, Pj is coded so that both lines transition in the same direction.
  – Rule #4: For a 0b → 1b transition on bj, if bj-1 is not transitioning and bj+1 is transitioning from 1 to 0, Pj is coded so that the jth and (j+1)th lines transition in the same direction.
  – Rule #5: For a 0b → 1b transition on bj, if there is no transition on either neighbor, Pj is coded so that {Pj = Pj-1 or Pj = Pj+1}, with Pj = Pj-1 having the higher priority.

• The 1st rule guarantees max(Xeff,j) = 4, therefore a 2X speed-up over a conventional (uncoded) ternary bus, where max(Xeff,j) = 8

• The other rules are designed to lower the probability of high-value Xeff,j's occurring on the bus

• Identical encoder/decoder logic for each bit

An example of 4X ternary sequences

Binary    Ternary   Xeff
11110111  ++-000-+
00110101  00--0+0+  01100121
11100011  ++-000-+  01220111
01010100  0+0+0+00  10112122
10101110  -0-0-+-0  00001021
01110001  0+-+000-  01212200
00000011  000000--  13431121
00011110  000+++-0  00110121
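The bound claimed for Rule #1 can be checked by brute force. The sketch below assumes the effective crosstalk of the middle line in a 3-wire window is |2δ_1 - δ_0 - δ_2|, with ternary levels encoded as -1, 0, +1; the function names are hypothetical. It does not implement Rules #2 to #5, which only shift the distribution of X_eff and do not change the worst case.

```python
from itertools import product

LEVELS = (-1, 0, 1)  # ternary levels -, 0, +


def x_eff_middle(prev, curr):
    """|2*d1 - d0 - d2| for the middle line of a 3-wire window."""
    d = [c - p for p, c in zip(prev, curr)]
    return abs(2 * d[1] - d[0] - d[2])


def obeys_rule_1(prev, curr):
    """Rule #1: no line may swing directly between - and +."""
    return all(abs(c - p) <= 1 for p, c in zip(prev, curr))


states = list(product(LEVELS, repeat=3))
transitions = list(product(states, repeat=2))

worst_any = max(x_eff_middle(p, c) for p, c in transitions)
worst_rule_1 = max(x_eff_middle(p, c) for p, c in transitions
                   if obeys_rule_1(p, c))
print(worst_any, worst_rule_1)  # 8 4
```

Halving the worst-case X_eff from 8 to 4 is what gives the 2X speed-up, since the delay is proportional to X_eff.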


An Even Faster 3X Ternary Bus

• Partition the bus into 5-bit groups
• Insert a shield wire between groups
• Apply the same rules as for the 4X bus

• It can be proven that such a configuration guarantees max(Xeff) = 3
  – Additional 33% speed-up over the 4X ternary bus

• At the cost of 20% additional wires
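The 33% and 20% figures follow from simple arithmetic, sketched here in Python. The helper `shield_wires` is hypothetical and assumes exactly one shield between each pair of adjacent 5-bit groups and none at the outer edges of the bus.

```python
import math


def shield_wires(bus_width, group_size=5):
    """Shields needed when one shield separates each pair of adjacent groups."""
    groups = math.ceil(bus_width / group_size)
    return groups - 1


# Delay scales with the worst-case X_eff, so 4 -> 3 gives a 4/3 speed ratio.
speed_gain = 4 / 3 - 1

# One shield per 5 signal wires in the limit of a wide bus.
asymptotic_overhead = 1 / 5

print(f"{speed_gain:.0%}", f"{asymptotic_overhead:.0%}")  # 33% 20%
print(shield_wires(32), shield_wires(32) / 32)            # 6 0.1875
```

For a finite bus the overhead is slightly below 20% under this assumption, because the last group needs no trailing shield.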

[Figure: 4X bus encoder and driver circuit; 3X bus encoder and driver circuit. In both, each binary input Bj feeds an encoder (Enc) that produces Dj and Pj for a ternary driver with output Tj.]


Circuit Implementations

• Encoder implemented based on the 5 rules

• Decoder is extremely simple (implemented with two 2-input gates)

• Ternary driver and receiver can be implemented in current or voltage mode
  – Current mode is more power hungry (static current)
  – Voltage mode requires a low impedance Vdd/2 supply

[Figure: (A) current mode: I-driver and I-receiver; (B) voltage mode: V-driver and V-receiver with shared V-ref]


Experimental Results

Crosstalk distribution and normalized energy consumption comparison (coded ternary vs. half-swing binary)

Bus size  Bus  0X     1X     2X     3X     4X    EF (x10^4)  Saving (%)
5         B    52821  81837  46056  20289  3792  25.0        34.5
          T    74712  99228  28101  2754   0     16.3
8         B    16924  26509  14432  6123   1540  7.99        28.2
          T    21792  31373  11104  1259   0     5.73
16        B    15541  25637  15437  7264   1641  8.49        27.2
          T    19843  31302  12685  1690   2     6.17
32        B    14852  25109  15949  7771   1823  8.76        27.5
          T    18976  31285  13550  1691   2     6.35

• The power saving comes from the redistribution of Xeff
  – More transitions are pushed towards lower Xeff

• The average power saving is ~27%

4X: ternary bus using 4X code; HB: half-swing binary bus; RP: ternary bus with random polarity; TT: true ternary bus
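The EF column appears to be the crosstalk histogram weighted by class index, which matches an energy proportional to the sum of X_eff over all transitions. The Python sketch below rests on that assumption; the dictionary layout and the function name `energy` are hypothetical, and the counts are copied from the table. Computed this way, the values agree with the tabulated EF and saving columns to within the last printed digit.

```python
# Transition counts per crosstalk class 0X..4X, copied from the table.
HISTOGRAMS = {
    (5, "B"): [52821, 81837, 46056, 20289, 3792],
    (5, "T"): [74712, 99228, 28101, 2754, 0],
    (8, "B"): [16924, 26509, 14432, 6123, 1540],
    (8, "T"): [21792, 31373, 11104, 1259, 0],
    (16, "B"): [15541, 25637, 15437, 7264, 1641],
    (16, "T"): [19843, 31302, 12685, 1690, 2],
    (32, "B"): [14852, 25109, 15949, 7771, 1823],
    (32, "T"): [18976, 31285, 13550, 1691, 2],
}


def energy(counts):
    """Class index times count, i.e. the sum of X_eff over all transitions."""
    return sum(k * n for k, n in enumerate(counts))


for size in (5, 8, 16, 32):
    e_bin = energy(HISTOGRAMS[(size, "B")])
    e_ter = energy(HISTOGRAMS[(size, "T")])
    print(size, e_bin, e_ter, f"{1 - e_ter / e_bin:.1%}")
```

The coded ternary bus has more transitions in the 0X and 1X classes and far fewer in 3X and 4X, which is where the saving comes from.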


Experimental Results

• The proposed 4X and 3X busses are advantageous over other bus coding schemes.

• EF: Normalized total energy

• PDP: power delay product

Bus type        4XT   3XT   SB    HB    RP     TT
EF (x10^4)      6.13  6.67  19.7  8.38  12.1   7.55
Delay           4x    3x    4x    4x    8x     8x
PDP (x10^5)     2.45  2.00  7.88  3.35  9.68   6.04
Pwr saving (%)  68.9  66.1  0     57.5  38.6   61.7
PDP gain (%)    68.9  74.6  0     57.5  -22.8  23.4
Bus area        1     1.2   1.97  1     1      0.68

4XT: ternary bus using 4X code; 3XT: ternary bus with 3X code; SB: binary bus with shielding; HB: half-swing binary bus; RP: ternary bus with random polarity; TT: true ternary bus

Bus performance comparison
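The derived rows of the comparison can be reproduced from the EF and delay rows. This Python sketch assumes PDP is EF multiplied by the delay factor and that both percentage rows are measured against the shielded binary bus (SB); the names `BUSES`, `pdp` and `gain_percent` are hypothetical.

```python
# name: (EF in units of 1e4, delay as a multiple of the unit delay)
BUSES = {
    "4XT": (6.13, 4),
    "3XT": (6.67, 3),
    "SB": (19.7, 4),
    "HB": (8.38, 4),
    "RP": (12.1, 8),
    "TT": (7.55, 8),
}


def pdp(ef, delay):
    """Power-delay product in units of 1e5 (EF is given in units of 1e4)."""
    return ef * delay / 10


def gain_percent(value, baseline):
    """Relative improvement over the baseline, in percent."""
    return 100 * (1 - value / baseline)


base_ef, base_delay = BUSES["SB"]
for name, (ef, delay) in BUSES.items():
    print(name,
          round(pdp(ef, delay), 2),
          round(gain_percent(ef, base_ef), 1),
          round(gain_percent(pdp(ef, delay), pdp(base_ef, base_delay)), 1))
```

The random-polarity bus (RP) saves energy but has a negative PDP gain, because its 8x delay outweighs the saving.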


Experimental Results

Eye diagrams for uncoded and coded busses (10 mm)


Summary

• Crosstalk classification was extended to multi-valued buses
• We proposed a direct bit-to-bit binary-ternary mapping scheme which results in a simple CODEC design.
• We proposed a 4X coding scheme that allows us to double the speed of a conventional ternary bus and save energy.
• We proposed a coding scheme (3X coding) to attain an additional 33% speed gain at the cost of 20% area overhead.
• We designed and implemented the CODEC and ternary driver/receiver.
• Our experimental results show significant power saving (27%) and speed gain (2X or more) over other schemes