76
1 Interconnect Centric VLSI Design Automation Chung-Kuan Cheng CSE Department UC San Diego La Jolla, CA 92093-0404 [email protected]

Interconnect Centric VLSI Design Automation

  • Upload
    herve

  • View
    37

  • Download
    1

Embed Size (px)

DESCRIPTION

Interconnect Centric VLSI Design Automation. Chung-Kuan Cheng CSE Department UC San Diego La Jolla, CA 92093-0404 [email protected]. Interconnect Dominance. Goals: Speed, Power, Cost Constraints: Area, Current, Skew Challenges: PVT Variations, Signal Integrity. Outlines. - PowerPoint PPT Presentation

Citation preview

Page 1: Interconnect Centric VLSI Design Automation

1

Interconnect Centric VLSI Design Automation

Chung-Kuan ChengCSE DepartmentUC San DiegoLa Jolla, CA [email protected]

Page 2: Interconnect Centric VLSI Design Automation

2

Interconnect Dominance

Goals: Speed, Power, Cost

Constraints: Area, Current, Skew

Challenges: PVT Variations, Signal Integrity

0

40

80

120

160

200

180 150 130 100 90 80 70 65 57 50Process Technology Node (nm)

Dela

y (p

s)

1mm Global I nterconnect with Scattering(source: I TRS Roadmap 2004)

FO4 I nverter Delay (Estimated by0.36*Ldraw)

1mm distortionless Transmission Line (Speedof Light)

Page 3: Interconnect Centric VLSI Design Automation

3

Outlines• Interconnect Technologies

• Buffers, Pitches, Circuit Styles• Geometric Planning

• Wire Orientations, Chip Shapes• Interconnect Networks

• Topologies, Wire Styles• Power and Ground Distributions• Clock Networks• Functional Modules

• Adders, Shifters • Packaging• Conclusion

Page 4: Interconnect Centric VLSI Design Automation

4

Interconnect Technologies

• RC Wires• Wire Pitch, Width, Separation• Buffer Size, Buffer Interval

• Transmission Line• RLCG

Page 5: Interconnect Centric VLSI Design Automation

5

Interconnect Technologies

Year (On-Chip) 2005 2010 2015

rncn (ps)T40 0.870 0.400 0.180

rwcw (ps/mm)T80

metal 1/global

440

110

1792

523

5951

1601

interval (mm) 0.136

0.272

0.0457

0.0846

0.0168

0.0324

delay (ps/mm) 120

60

164

89

200

104

,)1(2

ww

gn

cr

crfInt

SRC Roadmap 2005

wgwntrtr ccrrfllDelay ))1(22(/)(

Page 6: Interconnect Centric VLSI Design Automation

6

Interconnect Tech.: Transmission Line• Speed-of-the-light on-chip communication

• < 1/5 Delay of Traditional Wires

• Low Power Consumption• < 1/5 Power Consumption

• Robust against process variations• Short Latency• Insensitive to Feature Size

Page 7: Interconnect Centric VLSI Design Automation

7

Interconnect Tech.: Transmission Line

RΔl LΔlGΔl CΔl

RΔl LΔl

GΔl CΔl …

i(z,t)

RΔl LΔl RΔl LΔl

Differential Transmission Line Surfliner

RΔl LΔlCΔl

RΔl LΔl

CΔl …

i(z,t)

RΔl LΔl RΔl LΔl

Serial resistance causes voltage loss.

Speed and attenuation are frequency dependent.

Shunt conductance compensates voltage loss: R/G = L/C.

Flat from DC Mode to Giga Hz

Telegraph Cable: O. Heaviside in 1887.

Page 8: Interconnect Centric VLSI Design Automation

8

Theory (Telegrapher’s Equation)• Telegrapher’s equation:

),(),(),(

),(),(

),(

tzGVdt

tzdVC

dz

tzdIdt

tzdILtzRI

dz

tzdV

• Propagation Constant:

jCjGLjR ))((

• Wave Propagation: zjzeVzV 0)(

• Alpha and Beta corresponds to speed and phase velocity. Both are frequency dependant

Page 9: Interconnect Centric VLSI Design Automation

9

Theory (Distortionless Line)

• Set G=RC/L• Frequency Independent speed and attenuation:

LCCLR ,//

• Characteristic impedance: (pure resistive)

CLZ /0 • Phase Velocity (Speed of light in the media)

cLCv /1• Attenuation:

zZ

R

ezA 0)(

Page 10: Interconnect Centric VLSI Design Automation

10

Digital Signal Response

Page 11: Interconnect Centric VLSI Design Automation

11

Interconnect Tech.: Transmission Line• Add shunt conductance between differential

wires

• Resistors realized by serpentine unsilicided poly, diffusion resistors, or high resistive metal

Page 12: Interconnect Centric VLSI Design Automation

12

Geometrical Planning

• Wire Orientations• Manhattan, Hexagonal, Octagonal, Euclidean

• Die Shapes• Rectangle, Diamond, Hexagon, Octagon, Circle

Page 13: Interconnect Centric VLSI Design Automation

13

Average Radius of Unit-Circle Area

lambda geo.

Shape

Man. Y-Arch X-Arch Euclid.

Square 1.329 1.122 1.070 1.017

Diamond 1.253 1.121 1.070 1.017

Hexagon 1.276 1.100 1.058 1.003

Octagon 1.272 1.104 1.054 1.001

Circle 1.273 1.103 1.055 1.000

Page 14: Interconnect Centric VLSI Design Automation

14

Throughput : concurrent flow demand

lambda geo.

Shape

Man. Y-Arch X-Arch*

M: Square 1.000 1.225 1.346

M: Diamond 1.195

Y: Hexagon 1.315

X: Octagon* 1.420

*ratio of 0-90 planes and 45-135 planes is not fixed

Page 15: Interconnect Centric VLSI Design Automation

15

Flow congestion map for uniform 90 Degree meshes

Page 16: Interconnect Centric VLSI Design Automation

16

12 by 12 13 by 13

Congestion map of square chip using X-architecture

Page 17: Interconnect Centric VLSI Design Automation

17

12 by 12 13 by 13

Y-architecture + Square Chip

Page 18: Interconnect Centric VLSI Design Automation

18

Y Architecture + Hexagonal Chip

Page 19: Interconnect Centric VLSI Design Automation

19

X-Architecture + Octagonal Chip

Page 20: Interconnect Centric VLSI Design Automation

20

Manhattan Architecture + Diamond Chip

Page 21: Interconnect Centric VLSI Design Automation

21

Routing Grids

(http://www.xinitiative.org/img/062102forum.pdf)

X-Architecture Y-Architecture

Page 22: Interconnect Centric VLSI Design Automation

22

Interconnect Networks

• Optimized Interconnect Architecture• Data Bus, Control Signals

• Shared Interconnect• Packet Switching• Circuit Switching• RTL Level Partition

Page 23: Interconnect Centric VLSI Design Automation

23

Interconnect Networks

Physical Implementation

On-chip transmission line for long distance communication

Well spaced RC wire with buffer insertion for local connections

Page 24: Interconnect Centric VLSI Design Automation

24

Interconnect Networks

• Obj: Power, Latency

• Constraints:• Routing Area, Bandwidth

• Design Space:• Topology• Wire Styles, Switches

• Model:• Traffic Demand• Data Bus, Control Signals

Page 25: Interconnect Centric VLSI Design Automation

25

Interconnect Networks: Design Flow

Power & Delay Lib

Topology Lib

Multi-commodity network flow (MCF)

formulation

Power Evaluation(MCF solver)

Latency Aware low power NoC topology

with wire style optimization

Page 26: Interconnect Centric VLSI Design Automation

26

Power and Latency Tradeoffs for Optimal 8x8 Topologies

48

53

58

63

68

73

2. 3 2. 5 2. 7 2. 9 3. 1 3. 3Average Latency (ns)

Powe

r Co

nsum

ptio

n (W

)

area=3000um area=4000um area=5000umarea=6000um area=7000um area=8000umarea=9000um area=10000um area=11000um

Page 27: Interconnect Centric VLSI Design Automation

27

Topology Selection (Latency, Power, BW)

48

58

68

78

88

98

2. 3 2. 8 3. 3 3. 8 4. 3

Average Latency (ns)

Powe

r Co

nsum

ptio

n (W

)

topo=opt i mal topo=mesh topo=torus topo=hypercube

Page 28: Interconnect Centric VLSI Design Automation

28

(a) Optimal topology when area = 3000um

(b) Optimal topology when area = 7000um

(c) Optimal topology when area = 11000um

Topology Selection (Latency, Power, BW)

Optimal 8-node topologies vs area resources

Page 29: Interconnect Centric VLSI Design Automation

29

Power Ground Networks• Power network on chip and package• Decoupling capacitor • Conducting transistors connected to the P/G

grids

Page 30: Interconnect Centric VLSI Design Automation

30

Power Ground Analysis• Constraints: Voltage Drops and Current Densit

y• IR Drop: Static Analysis

• Resistive Networks• Static Current Sources

• dI/dt: Dynamic Analysis• RLC Networks• Power on and off• Sleep mode on and off• Gated clocks• Various operation modes

Page 31: Interconnect Centric VLSI Design Automation

31

Power/Ground Networks:Natural Frequency

• RLC Network Characteristics: Natural frequencies (Quality Factor)

• Operation Modes: Excite the resonance

• Decoupling Capacitance: Shift the natural frequencies.

Page 32: Interconnect Centric VLSI Design Automation

32H. Chen, IBM

Page 33: Interconnect Centric VLSI Design Automation

33

Clock Distributions

Page 34: Interconnect Centric VLSI Design Automation

34

Clock: Linear Variations Model

• Process variation model• Transistor length• Wire width• Linear variation model

• Power variation model• Supply voltage varies randomly (10%)

ykxkdd yx 0

Page 35: Interconnect Centric VLSI Design Automation

35

Clock: RC Model

Input: an n level meshes and h-treesConstraint: routing area, parameter variationsObjective: skew

Page 36: Interconnect Centric VLSI Design Automation

36

Simplified Circuit Model

Vs1Rs

C

Vs2Rs

C

R

1

2

u(t)

u(t-T)

Page 37: Interconnect Centric VLSI Design Automation

37

Skew Expression

110 2ln CRt

)(2

1.

2

21.

1

21

V

VV

V

VVT

Assumptions:

1. T<<RsC

2. Rs /R <<RsC/T

Using first order

Taylor expansion ex=1+x,

)2ln2exp(

:function Skew

R

RTT s

Page 38: Interconnect Centric VLSI Design Automation

38

Optimal Routing Resources Allocation

level-1

level-2

level-3

level-4

total area

Page 39: Interconnect Centric VLSI Design Automation

39

Inductance Diminishes Shunt Effects

Vs1 Rs

C

Vs2 Rs

C

R

1

2

u(t)

u(t-T)

L

f(GHz) 0.5 1 1.5 2 3 3.5 4 5

skew(ps) 3.9 4.2 5.8 7.5 9.9 13 17 26

• 0.5um wide 1.2 cm long copper wire

• Input skew 20ps

Page 40: Interconnect Centric VLSI Design Automation

40

Clock Distributions

Surfliner

Page 41: Interconnect Centric VLSI Design Automation

41

Functional Modules (Data Path)

• Arithmetic

• Algorithms

• Logic

• Logic Styles

• Placement

Page 42: Interconnect Centric VLSI Design Automation

42

Cyclic Shifter

S2

S1

S0

D7 D6 D5 D4 D3 D2 D1 D0

Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

(7,1) (6,1) (5,1) (4,1) (3,1) (2,1) (1,1) (0,1)

(7,2) (6,2) (5,2) (4,2) (3,2) (2,2) (1,2) (0,2)

(7,3) (6,3) (5,3) (4,3) (3,3) (2,3) (1,3) (0,3)

(7,0) (6,0) (5,0) (4,0) (3,0) (2,0) (1,0) (0,0)

Delay: nlogn, Power: ¼ n2

Page 43: Interconnect Centric VLSI Design Automation

43

Fan-out Splitting

Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0

D7 D6 D5 D4 D3 D2 D1 D0

S2

S1

S0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

(7,1) (6,1) (5,1) (4,1) (3,1) (2,1) (1,1) (0,1)

(7,2) (6,2) (5,2) (4,2) (3,2) (2,2) (1,2) (0,2)

(7,3) (6,3) (5,3) (4,3) (3,3) (2,3) (1,3) (0,3)

(7,0) (6,0) (5,0) (4,0) (3,0) (2,0) (1,0) (0,0)

Delay Comparison

50.7

102.7

218.7

484

45.376

128

224

0

100

200

300

400

500

600

8-bit 16-bit 32-bit 64-bit

Del

ay

MuxShifterDemuxShifter

τ

Impr. Ratio8-bit 10.7%16-bit 26.0%32-bit 41.5%64-bit 53.7%

Power Comparison

150.7484

1561.3

5220

146.9453.7

1397.6

4458.9

0

1000

2000

3000

4000

5000

6000

8-bit 16-bit 32-bit 64-bit

Tot

al S

witc

hed

Cap

acita

nce

MuxShifterDemuxShifter

Cg

Impr. Ratio8-bit 2.5%16-bit 6.3%32-bit 10.5%64-bit 14.6%

Delay: nPower: 3/16 n2

Page 44: Interconnect Centric VLSI Design Automation

44

Cell Permutation• Fix the input/output stage, permute cells of intermediate

stages to further improve delay.• Formulate as an ILP problem and solve by CPLEX.

• Optimal solution in terms of delay• Delay/Power tradeoff

Additional Delay Reductionby Cell Permutation

13

29

61

125

818

38

78

0

20

40

60

80

100

120

140

8-bit 16-bit 32-bit 64-bit

Del

ay (

in u

nit

of c

olum

ns s

pann

ed)

w/o permuatation

w/ permutation

7 6 5 4 3 2 1 0

6 5 4 3 7 2 1 0

3 4 2 6 7 5 1 0

7 6 5 4 3 2 1 0

>> 1-bit

>> 2-bit

>> 4-bit

0 0 00 0 0 0 0

1 1 01 1 4 0 0 11130000

11141111

4 2 2 2 4 1 4 5 4 3 5 5 1 4 1 3

4 2 4 5 4 5 4 3

8 4 7 5 8 8 4 7 8 4 7 7 4 6 3 8

8 7 8 7 8 7 6 8

Optimal Timing solution for 8-bit

Page 45: Interconnect Centric VLSI Design Automation

45

Packaging: Pin Breakaway

• Row by Row Escape: Escape interconnect row by row from outside toward

inside.

Page 46: Interconnect Centric VLSI Design Automation

46

Packaging: Escape Sequence Strategies

• Parallel triangular sequence: This method divides the objects into groups

and escape each group with a triangular outline.

Page 47: Interconnect Centric VLSI Design Automation

47

Packaging: Escape Sequence Strategies• Central triangular sequence: Escape objects from the center of the outside row and

expand the indent with a single triangular outline. In this method, the outline capacity is increased continuously layer by layer while the first several layers is small.

Page 48: Interconnect Centric VLSI Design Automation

48

Packaging: Escape Sequence Strategies• Two-sided sequence: Escape objects from the inside as well as from the outside.

The outline shrinks slowly and also follows zigzag shape.

Page 49: Interconnect Centric VLSI Design Automation

49

Packaing: Escape Sequence Strategies

Page 50: Interconnect Centric VLSI Design Automation

50

Packaging: Experimental Results

Layer Row by rowParallel

triangularCentral

triangularTwo sided

1 304 276 100 312

2 272 340 156 328

3 240 292 228 308

4 208 240 300 324

5 176 164 372 328

6 144 124 444

7 112 164

8 80

9 48

10 16

40 x 40

Page 51: Interconnect Centric VLSI Design Automation

51

V.3. Experimental Results

Layer Row by rowParallel

triangularCentral

triangularTwo sided

1 144 132 92 140

2 112 144 116 160

3 80 96 140 100

4 48 28 52

5 16

20 x 20

Page 52: Interconnect Centric VLSI Design Automation

52

Page 53: Interconnect Centric VLSI Design Automation

53

Page 54: Interconnect Centric VLSI Design Automation

54

Page 55: Interconnect Centric VLSI Design Automation

55

Page 56: Interconnect Centric VLSI Design Automation

56

Page 57: Interconnect Centric VLSI Design Automation

57

Page 58: Interconnect Centric VLSI Design Automation

58

Page 59: Interconnect Centric VLSI Design Automation

59

Page 60: Interconnect Centric VLSI Design Automation

60

Page 61: Interconnect Centric VLSI Design Automation

61

Page 62: Interconnect Centric VLSI Design Automation

62

Page 63: Interconnect Centric VLSI Design Automation

63

Page 64: Interconnect Centric VLSI Design Automation

64

Page 65: Interconnect Centric VLSI Design Automation

65

Page 66: Interconnect Centric VLSI Design Automation

66

Page 67: Interconnect Centric VLSI Design Automation

67

Page 68: Interconnect Centric VLSI Design Automation

68

Conclusion

• Interconnect Technologies

• Geometric Planning

• Interconnect Planning

• Power and Ground Distribution

• Clock Networks

• Data Path

• Packaging

Page 69: Interconnect Centric VLSI Design Automation

69

Interconnect Technologies

2.5 0.31 0.08 1.31 1.5 1.5 0.13t h h ta

c s s s sC t

e e e es

0.2

0.35 0.651.53 0.98 0.01w sfr

c h hC h

e es

asC w

h

0.05 0.25

1.21.05 0.63 0.0632

t sfrs s h

C s te e

s h h

Coupling capacitance:

Self capacitance:

a fr a frc c s s

w

C C C CC

Total wire capacitance:

t

h

h

w

s

d

p/g plane

wire

w: wire widtht: wire thickness

s: wire spacingh: distance to p/g plane

d: wire pitch

Cc

Cs

Cc: coupling capacitance Cs: self capacitance

Cs

Page 70: Interconnect Centric VLSI Design Automation

70

Interconnect Technologies

ndelay bandwidth /bandwidth powerDesign metrics:

ndelay n ndelay power 2n ndelay powerObjective

functions:

For each pitch, For each objective function

Find wire width w, buffer size sinv and buffer interval linv

Calculate the metrics

Page 71: Interconnect Centric VLSI Design Automation

71

Experimental Results – normalized delay

50100

150200

0

0.5

1

1.5x 10

-6

0

0.5

1

1.5

2x 10

-7

de

lay n(s

/m)

min-d

min-ddp

min-dp

1 2 3 4 5 6 7 8 9 10

x 10-7

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8x 10

-7

pitch(m)

dela

y n(s/m

)

min-dmin-ddpmin-dp180nm130nm100nm70nm

0 0.2 0.4 0.6 0.8 1 1.2

x 10-6

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2x 10

-7

pitch(m)

dela

y n(s/m

)

min-dmin-ddpmin-dp

Top left: OverviewTop right: at different pitches (solid lines: min-pitch; dash lines: saturating pitch)

Bottom right: at 70nm technology

Page 72: Interconnect Centric VLSI Design Automation

72

Interconnect Tech. -- bandwidth

50100

150200

0

0.51

1.5x 10-6

1

2

3

4

5x 10

13

band

wid

th(b

its/s

)

min-d

min-ddp

min-dp

1.5 2 2.5 3 3.5 4 4.5

x 10-7

2.5

3

3.5

4

4.5

5x 10

13

pitch(m)

band

wid

th(b

its/s

)

min-dmin-ddpmin-dp180nm130nm100nm70nm

0 0.2 0.4 0.6 0.8 1 1.2

x 10-6

1

1.5

2

2.5

3

3.5

4

4.5

5x 10

13

pitch(m)

band

wid

th(b

its/s

)min-dmin-ddpmin-dp

Top left: OverviewTop right: at min-pitchesBottom right: at 70nm technology

Page 73: Interconnect Centric VLSI Design Automation

73

Interconnect Tech. – bandwidth/power

50

100

150

200

00.5

11.5x 10

-6

0

1

2

3

4

5x 10

23

band

wid

th/p

ower

(m/J

s)

min-d

min-ddp

min-dp

1 2 3 4 5 6 7 8 9

x 10-7

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5x 10

23

pitch(m)

ba

nd

wid

th/p

ow

er(

bits

*m/J

s)

min-dmin-ddpmin-dp180nm130nm100nm70nm

0 0.2 0.4 0.6 0.8 1 1.2

x 10-6

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5x 10

23

pitch(m)

band

wid

th/p

ower

(bits

*m/J

s)min-dmin-ddpmin-dp

Top left: OverviewTop right: at different pitches (solid lines: min-pitch; dash lines: optimal pitch)

Bottom right: at 70nm technology

Page 74: Interconnect Centric VLSI Design Automation

74

Interconnect Technologies

mcr

crfl

ww

gn 242)1(2

41gw

wn

cr

crs

Example: w= 85nm, t= 145nm

Optimal interval

mmpsmfsccrrfllDelay wgwntrtr /194/194))1(22(/)(

Optimal buffer size

Optimal delay

rn= 10Kohm,cn=0.25fF,cg=2.34xcn=0.585fFrw=2ohm/um, cw=0.2fF/um

Page 75: Interconnect Centric VLSI Design Automation

75

Experiments—Optimized Skew

s-mesh(s) m-mesh(s) ratio0.00 2.92E-11 2.92E-11 100.0%0.25 2.79E-11 2.60E-11 93.2%0.40 2.71E-11 2.45E-11 90.4%1.00 2.42E-11 1.98E-11 81.8%3.00 1.70E-11 1.24E-11 73.2%5.00 1.24E-11 8.72E-12 70.5%

skewtotal area

Page 76: Interconnect Centric VLSI Design Automation

76

Robustness Against Supply Voltage Variations

ave worst ave worst0.00 2.10E-11 2.91E-11 2.10E-11 2.91E-111.00 8.38E-12 1.14E-11 8.26E-12 1.43E-112.00 2.71E-12 4.42E-12 6.18E-12 1.11E-113.00 1.89E-12 3.33E-12 4.83E-12 8.73E-124.00 1.45E-12 2.48E-12 3.88E-12 6.96E-125.00 1.16E-12 2.02E-12 3.18E-12 5.64E-12

total areamutli-level mesh single-level mesh