Interconnect Centric VLSI Design Automation

Preview:

DESCRIPTION

Interconnect Centric VLSI Design Automation. Chung-Kuan Cheng CSE Department UC San Diego La Jolla, CA 92093-0404 ckcheng@ucsd.edu. Interconnect Dominance. Goals: Speed, Power, Cost Constraints: Area, Current, Skew Challenges: PVT Variations, Signal Integrity. Outlines. - PowerPoint PPT Presentation

Citation preview

1

Interconnect Centric VLSI Design Automation

Chung-Kuan ChengCSE DepartmentUC San DiegoLa Jolla, CA 92093-0404ckcheng@ucsd.edu

2

Interconnect Dominance

Goals: Speed, Power, Cost

Constraints: Area, Current, Skew

Challenges: PVT Variations, Signal Integrity

0

40

80

120

160

200

180 150 130 100 90 80 70 65 57 50Process Technology Node (nm)

Dela

y (p

s)

1mm Global I nterconnect with Scattering(source: I TRS Roadmap 2004)

FO4 I nverter Delay (Estimated by0.36*Ldraw)

1mm distortionless Transmission Line (Speedof Light)

3

Outlines• Interconnect Technologies

• Buffers, Pitches, Circuit Styles• Geometric Planning

• Wire Orientations, Chip Shapes• Interconnect Networks

• Topologies, Wire Styles• Power and Ground Distributions• Clock Networks• Functional Modules

• Adders, Shifters • Packaging• Conclusion

4

Interconnect Technologies

• RC Wires• Wire Pitch, Width, Separation• Buffer Size, Buffer Interval

• Transmission Line• RLCG

5

Interconnect Technologies

Year (On-Chip) 2005 2010 2015

rncn (ps)T40 0.870 0.400 0.180

rwcw (ps/mm)T80

metal 1/global

440

110

1792

523

5951

1601

interval (mm) 0.136

0.272

0.0457

0.0846

0.0168

0.0324

delay (ps/mm) 120

60

164

89

200

104

,)1(2

ww

gn

cr

crfInt

SRC Roadmap 2005

wgwntrtr ccrrfllDelay ))1(22(/)(

6

Interconnect Tech.: Transmission Line• Speed-of-the-light on-chip communication

• < 1/5 Delay of Traditional Wires

• Low Power Consumption• < 1/5 Power Consumption

• Robust against process variations• Short Latency• Insensitive to Feature Size

7

Interconnect Tech.: Transmission Line

RΔl LΔlGΔl CΔl

RΔl LΔl

GΔl CΔl …

i(z,t)

RΔl LΔl RΔl LΔl

Differential Transmission Line Surfliner

RΔl LΔlCΔl

RΔl LΔl

CΔl …

i(z,t)

RΔl LΔl RΔl LΔl

Serial resistance causes voltage loss.

Speed and attenuation are frequency dependent.

Shunt conductance compensates voltage loss: R/G = L/C.

Flat from DC Mode to Giga Hz

Telegraph Cable: O. Heaviside in 1887.

8

Theory (Telegrapher’s Equation)• Telegrapher’s equation:

),(),(),(

),(),(

),(

tzGVdt

tzdVC

dz

tzdIdt

tzdILtzRI

dz

tzdV

• Propagation Constant:

jCjGLjR ))((

• Wave Propagation: zjzeVzV 0)(

• Alpha and Beta corresponds to speed and phase velocity. Both are frequency dependant

9

Theory (Distortionless Line)

• Set G=RC/L• Frequency Independent speed and attenuation:

LCCLR ,//

• Characteristic impedance: (pure resistive)

CLZ /0 • Phase Velocity (Speed of light in the media)

cLCv /1• Attenuation:

zZ

R

ezA 0)(

10

Digital Signal Response

11

Interconnect Tech.: Transmission Line• Add shunt conductance between differential

wires

• Resistors realized by serpentine unsilicided poly, diffusion resistors, or high resistive metal

12

Geometrical Planning

• Wire Orientations• Manhattan, Hexagonal, Octagonal, Euclidean

• Die Shapes• Rectangle, Diamond, Hexagon, Octagon, Circle

13

Average Radius of Unit-Circle Area

lambda geo.

Shape

Man. Y-Arch X-Arch Euclid.

Square 1.329 1.122 1.070 1.017

Diamond 1.253 1.121 1.070 1.017

Hexagon 1.276 1.100 1.058 1.003

Octagon 1.272 1.104 1.054 1.001

Circle 1.273 1.103 1.055 1.000

14

Throughput : concurrent flow demand

lambda geo.

Shape

Man. Y-Arch X-Arch*

M: Square 1.000 1.225 1.346

M: Diamond 1.195

Y: Hexagon 1.315

X: Octagon* 1.420

*ratio of 0-90 planes and 45-135 planes is not fixed

15

Flow congestion map for uniform 90 Degree meshes

16

12 by 12 13 by 13

Congestion map of square chip using X-architecture

17

12 by 12 13 by 13

Y-architecture + Square Chip

18

Y Architecture + Hexagonal Chip

19

X-Architecture + Octagonal Chip

20

Manhattan Architecture + Diamond Chip

21

Routing Grids

(http://www.xinitiative.org/img/062102forum.pdf)

X-Architecture Y-Architecture

22

Interconnect Networks

• Optimized Interconnect Architecture• Data Bus, Control Signals

• Shared Interconnect• Packet Switching• Circuit Switching• RTL Level Partition

23

Interconnect Networks

Physical Implementation

On-chip transmission line for long distance communication

Well spaced RC wire with buffer insertion for local connections

24

Interconnect Networks

• Obj: Power, Latency

• Constraints:• Routing Area, Bandwidth

• Design Space:• Topology• Wire Styles, Switches

• Model:• Traffic Demand• Data Bus, Control Signals

25

Interconnect Networks: Design Flow

Power & Delay Lib

Topology Lib

Multi-commodity network flow (MCF)

formulation

Power Evaluation(MCF solver)

Latency Aware low power NoC topology

with wire style optimization

26

Power and Latency Tradeoffs for Optimal 8x8 Topologies

48

53

58

63

68

73

2. 3 2. 5 2. 7 2. 9 3. 1 3. 3Average Latency (ns)

Powe

r Co

nsum

ptio

n (W

)

area=3000um area=4000um area=5000umarea=6000um area=7000um area=8000umarea=9000um area=10000um area=11000um

27

Topology Selection (Latency, Power, BW)

48

58

68

78

88

98

2. 3 2. 8 3. 3 3. 8 4. 3

Average Latency (ns)

Powe

r Co

nsum

ptio

n (W

)

topo=opt i mal topo=mesh topo=torus topo=hypercube

28

(a) Optimal topology when area = 3000um

(b) Optimal topology when area = 7000um

(c) Optimal topology when area = 11000um

Topology Selection (Latency, Power, BW)

Optimal 8-node topologies vs area resources

29

Power Ground Networks• Power network on chip and package• Decoupling capacitor • Conducting transistors connected to the P/G

grids

30

Power Ground Analysis• Constraints: Voltage Drops and Current Densit

y• IR Drop: Static Analysis

• Resistive Networks• Static Current Sources

• dI/dt: Dynamic Analysis• RLC Networks• Power on and off• Sleep mode on and off• Gated clocks• Various operation modes

31

Power/Ground Networks:Natural Frequency

• RLC Network Characteristics: Natural frequencies (Quality Factor)

• Operation Modes: Excite the resonance

• Decoupling Capacitance: Shift the natural frequencies.

32H. Chen, IBM

33

Clock Distributions

34

Clock: Linear Variations Model

• Process variation model• Transistor length• Wire width• Linear variation model

• Power variation model• Supply voltage varies randomly (10%)

ykxkdd yx 0

35

Clock: RC Model

Input: an n level meshes and h-treesConstraint: routing area, parameter variationsObjective: skew

36

Simplified Circuit Model

Vs1Rs

C

Vs2Rs

C

R

1

2

u(t)

u(t-T)

37

Skew Expression

110 2ln CRt

)(2

1.

2

21.

1

21

V

VV

V

VVT

Assumptions:

1. T<<RsC

2. Rs /R <<RsC/T

Using first order

Taylor expansion ex=1+x,

)2ln2exp(

:function Skew

R

RTT s

38

Optimal Routing Resources Allocation

level-1

level-2

level-3

level-4

total area

39

Inductance Diminishes Shunt Effects

Vs1 Rs

C

Vs2 Rs

C

R

1

2

u(t)

u(t-T)

L

f(GHz) 0.5 1 1.5 2 3 3.5 4 5

skew(ps) 3.9 4.2 5.8 7.5 9.9 13 17 26

• 0.5um wide 1.2 cm long copper wire

• Input skew 20ps

40

Clock Distributions

Surfliner

41

Functional Modules (Data Path)

• Arithmetic

• Algorithms

• Logic

• Logic Styles

• Placement

42

Cyclic Shifter

S2

S1

S0

D7 D6 D5 D4 D3 D2 D1 D0

Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

(7,1) (6,1) (5,1) (4,1) (3,1) (2,1) (1,1) (0,1)

(7,2) (6,2) (5,2) (4,2) (3,2) (2,2) (1,2) (0,2)

(7,3) (6,3) (5,3) (4,3) (3,3) (2,3) (1,3) (0,3)

(7,0) (6,0) (5,0) (4,0) (3,0) (2,0) (1,0) (0,0)

Delay: nlogn, Power: ¼ n2

43

Fan-out Splitting

Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0

D7 D6 D5 D4 D3 D2 D1 D0

S2

S1

S0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

(7,1) (6,1) (5,1) (4,1) (3,1) (2,1) (1,1) (0,1)

(7,2) (6,2) (5,2) (4,2) (3,2) (2,2) (1,2) (0,2)

(7,3) (6,3) (5,3) (4,3) (3,3) (2,3) (1,3) (0,3)

(7,0) (6,0) (5,0) (4,0) (3,0) (2,0) (1,0) (0,0)

Delay Comparison

50.7

102.7

218.7

484

45.376

128

224

0

100

200

300

400

500

600

8-bit 16-bit 32-bit 64-bit

Del

ay

MuxShifterDemuxShifter

τ

Impr. Ratio8-bit 10.7%16-bit 26.0%32-bit 41.5%64-bit 53.7%

Power Comparison

150.7484

1561.3

5220

146.9453.7

1397.6

4458.9

0

1000

2000

3000

4000

5000

6000

8-bit 16-bit 32-bit 64-bit

Tot

al S

witc

hed

Cap

acita

nce

MuxShifterDemuxShifter

Cg

Impr. Ratio8-bit 2.5%16-bit 6.3%32-bit 10.5%64-bit 14.6%

Delay: nPower: 3/16 n2

44

Cell Permutation• Fix the input/output stage, permute cells of intermediate

stages to further improve delay.• Formulate as an ILP problem and solve by CPLEX.

• Optimal solution in terms of delay• Delay/Power tradeoff

Additional Delay Reductionby Cell Permutation

13

29

61

125

818

38

78

0

20

40

60

80

100

120

140

8-bit 16-bit 32-bit 64-bit

Del

ay (

in u

nit

of c

olum

ns s

pann

ed)

w/o permuatation

w/ permutation

7 6 5 4 3 2 1 0

6 5 4 3 7 2 1 0

3 4 2 6 7 5 1 0

7 6 5 4 3 2 1 0

>> 1-bit

>> 2-bit

>> 4-bit

0 0 00 0 0 0 0

1 1 01 1 4 0 0 11130000

11141111

4 2 2 2 4 1 4 5 4 3 5 5 1 4 1 3

4 2 4 5 4 5 4 3

8 4 7 5 8 8 4 7 8 4 7 7 4 6 3 8

8 7 8 7 8 7 6 8

Optimal Timing solution for 8-bit

45

Packaging: Pin Breakaway

• Row by Row Escape: Escape interconnect row by row from outside toward

inside.

46

Packaging: Escape Sequence Strategies

• Parallel triangular sequence: This method divides the objects into groups

and escape each group with a triangular outline.

47

Packaging: Escape Sequence Strategies• Central triangular sequence: Escape objects from the center of the outside row and

expand the indent with a single triangular outline. In this method, the outline capacity is increased continuously layer by layer while the first several layers is small.

48

Packaging: Escape Sequence Strategies• Two-sided sequence: Escape objects from the inside as well as from the outside.

The outline shrinks slowly and also follows zigzag shape.

49

Packaing: Escape Sequence Strategies

50

Packaging: Experimental Results

Layer Row by rowParallel

triangularCentral

triangularTwo sided

1 304 276 100 312

2 272 340 156 328

3 240 292 228 308

4 208 240 300 324

5 176 164 372 328

6 144 124 444

7 112 164

8 80

9 48

10 16

40 x 40

51

V.3. Experimental Results

Layer Row by rowParallel

triangularCentral

triangularTwo sided

1 144 132 92 140

2 112 144 116 160

3 80 96 140 100

4 48 28 52

5 16

20 x 20

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

Conclusion

• Interconnect Technologies

• Geometric Planning

• Interconnect Planning

• Power and Ground Distribution

• Clock Networks

• Data Path

• Packaging

69

Interconnect Technologies

2.5 0.31 0.08 1.31 1.5 1.5 0.13t h h ta

c s s s sC t

e e e es

0.2

0.35 0.651.53 0.98 0.01w sfr

c h hC h

e es

asC w

h

0.05 0.25

1.21.05 0.63 0.0632

t sfrs s h

C s te e

s h h

Coupling capacitance:

Self capacitance:

a fr a frc c s s

w

C C C CC

Total wire capacitance:

t

h

h

w

s

d

p/g plane

wire

w: wire widtht: wire thickness

s: wire spacingh: distance to p/g plane

d: wire pitch

Cc

Cs

Cc: coupling capacitance Cs: self capacitance

Cs

70

Interconnect Technologies

ndelay bandwidth /bandwidth powerDesign metrics:

ndelay n ndelay power 2n ndelay powerObjective

functions:

For each pitch, For each objective function

Find wire width w, buffer size sinv and buffer interval linv

Calculate the metrics

71

Experimental Results – normalized delay

50100

150200

0

0.5

1

1.5x 10

-6

0

0.5

1

1.5

2x 10

-7

de

lay n(s

/m)

min-d

min-ddp

min-dp

1 2 3 4 5 6 7 8 9 10

x 10-7

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8x 10

-7

pitch(m)

dela

y n(s/m

)

min-dmin-ddpmin-dp180nm130nm100nm70nm

0 0.2 0.4 0.6 0.8 1 1.2

x 10-6

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2x 10

-7

pitch(m)

dela

y n(s/m

)

min-dmin-ddpmin-dp

Top left: OverviewTop right: at different pitches (solid lines: min-pitch; dash lines: saturating pitch)

Bottom right: at 70nm technology

72

Interconnect Tech. -- bandwidth

50100

150200

0

0.51

1.5x 10-6

1

2

3

4

5x 10

13

band

wid

th(b

its/s

)

min-d

min-ddp

min-dp

1.5 2 2.5 3 3.5 4 4.5

x 10-7

2.5

3

3.5

4

4.5

5x 10

13

pitch(m)

band

wid

th(b

its/s

)

min-dmin-ddpmin-dp180nm130nm100nm70nm

0 0.2 0.4 0.6 0.8 1 1.2

x 10-6

1

1.5

2

2.5

3

3.5

4

4.5

5x 10

13

pitch(m)

band

wid

th(b

its/s

)min-dmin-ddpmin-dp

Top left: OverviewTop right: at min-pitchesBottom right: at 70nm technology

73

Interconnect Tech. – bandwidth/power

50

100

150

200

00.5

11.5x 10

-6

0

1

2

3

4

5x 10

23

band

wid

th/p

ower

(m/J

s)

min-d

min-ddp

min-dp

1 2 3 4 5 6 7 8 9

x 10-7

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5x 10

23

pitch(m)

ba

nd

wid

th/p

ow

er(

bits

*m/J

s)

min-dmin-ddpmin-dp180nm130nm100nm70nm

0 0.2 0.4 0.6 0.8 1 1.2

x 10-6

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5x 10

23

pitch(m)

band

wid

th/p

ower

(bits

*m/J

s)min-dmin-ddpmin-dp

Top left: OverviewTop right: at different pitches (solid lines: min-pitch; dash lines: optimal pitch)

Bottom right: at 70nm technology

74

Interconnect Technologies

mcr

crfl

ww

gn 242)1(2

41gw

wn

cr

crs

Example: w= 85nm, t= 145nm

Optimal interval

mmpsmfsccrrfllDelay wgwntrtr /194/194))1(22(/)(

Optimal buffer size

Optimal delay

rn= 10Kohm,cn=0.25fF,cg=2.34xcn=0.585fFrw=2ohm/um, cw=0.2fF/um

75

Experiments—Optimized Skew

s-mesh(s) m-mesh(s) ratio0.00 2.92E-11 2.92E-11 100.0%0.25 2.79E-11 2.60E-11 93.2%0.40 2.71E-11 2.45E-11 90.4%1.00 2.42E-11 1.98E-11 81.8%3.00 1.70E-11 1.24E-11 73.2%5.00 1.24E-11 8.72E-12 70.5%

skewtotal area

76

Robustness Against Supply Voltage Variations

ave worst ave worst0.00 2.10E-11 2.91E-11 2.10E-11 2.91E-111.00 8.38E-12 1.14E-11 8.26E-12 1.43E-112.00 2.71E-12 4.42E-12 6.18E-12 1.11E-113.00 1.89E-12 3.33E-12 4.83E-12 8.73E-124.00 1.45E-12 2.48E-12 3.88E-12 6.96E-125.00 1.16E-12 2.02E-12 3.18E-12 5.64E-12

total areamutli-level mesh single-level mesh

Recommended