31
High-Radix Routers 1 ISCA’05 Microarchitecture of a High-Radix Router John Kim, William J. Dally, Brian Towles 1 , and Amit K. Gupta Concurrent VLSI Architecture 1 D.E. Shaw Stanford University Research & Development

Microarchitecture of a High-Radix Router

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Microarchitecture of a High-Radix Router

High-Radix Routers 1ISCA’05

Microarchitecture of a High-Radix Router

John Kim, William J. Dally, Brian Towles1, and Amit K. Gupta

Concurrent VLSI Architecture 1D.E. Shaw Stanford University Research & Development

Page 2: Microarchitecture of a High-Radix Router

High-Radix Routers 2ISCA’05

Interconnection Network

����������������

�����������������

��������������

��������

�������� ��� � ��

�����!

�����" ���������� � ���

#����$%�

Page 3: Microarchitecture of a High-Radix Router

High-Radix Routers 3ISCA’05

Bandwidth Trend

0.1

1

10

100

1000

10000

1985 1990 1995 2000 2005 2010

year

ban

dwid

th p

er r

out

er n

od

e (G

b/s

)

Torus Routing ChipIntel iPSC/2J-MachineCM-5Intel Paragon XPCray T3DMIT AlewifeIBM VulcanCray T3ESGI Origin 2000AlphaServer GS320IBM SP Switch2Quadrics QsNetCray X1Velio 3003IBM HPSSGI Altix 3000

Bandwidth growth result of

1.

2. Increase in number of signals

Increase in signaling rate

Page 4: Microarchitecture of a High-Radix Router

High-Radix Routers 4ISCA’05

Bandwidth Trend

0.1

1

10

100

1000

10000

1985 1990 1995 2000 2005 2010

year

ban

dwid

th p

er r

out

er n

od

e (G

b/s

)

Torus Routing ChipIntel iPSC/2J-MachineCM-5Intel Paragon XPCray T3DMIT AlewifeIBM VulcanCray T3ESGI Origin 2000AlphaServer GS320IBM SP Switch2Quadrics QsNetCray X1Velio 3003IBM HPSSGI Altix 3000

Approximately an order of magnitude

increase in bandwidth every 5

years

How to effectively utilize the increased bandwidth?

Page 5: Microarchitecture of a High-Radix Router

High-Radix Routers 5ISCA’05

Current Interconnection Networks

• Earlier works [Dally ’90, Agarwal ’91] suggested “lower-radix” networks– Examples: Torus Routing Chip, Cray T3E, SGI Altix 3000

• Current Routers– Myrinet : radix-32– Quadrics : radix-8– IBM SP2 Switch : radix-8

• Technology trends– Increasing Bandwidth– Optical signaling

High-Radix Routers can take better advantage of these technology trends.

Page 6: Microarchitecture of a High-Radix Router

High-Radix Routers 6ISCA’05

Outline

� Technology Trends for High- Radix Router� Motivation for High- Radix Router� High- Radix Router Architectures

� Baseline Architecture� Fully Buffered Crossbar� Hierarchical Crossbar

� Conclusion & Future Work

Page 7: Microarchitecture of a High-Radix Router

High-Radix Routers 7ISCA’05

High-Radix Router

Router

Router

Page 8: Microarchitecture of a High-Radix Router

High-Radix Routers 8ISCA’05

High-Radix Router

Router

Router

Low-radix (small number of fat ports) High-radix (large number of skinny ports)

RouterRouter

Page 9: Microarchitecture of a High-Radix Router

High-Radix Routers 9ISCA’05

Latency in a Network

Latency = Header Latency + Serialization Latency

= H tr + L / b

= 2trlogkN + 2kL / B

= H tr + L / b

where k = radixB = total BandwidthN = # of nodesL = message size

Page 10: Microarchitecture of a High-Radix Router

High-Radix Routers 10ISCA’05

Latency vs. Radix

0

50

100

150

200

250

300

0 50 100 150 200 250radix

late

ncy

(nse

c)

2003 technology 2010 technology

�������������� �

�����������������

Serialization latency increasesHeader latency decreases

Page 11: Microarchitecture of a High-Radix Router

High-Radix Routers 11ISCA’05

Determining Optimal Radix

Latency = Header Latency + Serialization Latency

= H tr + L / b= 2trlogkN + 2kL / B

Optimal radix� k log2 k = (B tr log N) / L

= Aspect Ratio

where k = radixB = total BandwidthN = # of nodesL = message size

Page 12: Microarchitecture of a High-Radix Router

High-Radix Routers 12ISCA’05

Higher Aspect Ratio, Higher Optimal Radix

1996

2003

2010

1991

1

10

100

1000

10 100 1000 10000

Aspect Ratio

Opt

imal

Rad

ix (k

)

Page 13: Microarchitecture of a High-Radix Router

High-Radix Routers 13ISCA’05

Higher Radix, Lower Cost

0

1

2

3

4

5

6

7

8

0 50 100 150 200 250radix

cost

( #

of 1

000

chan

nels

)

2003 technology 2010 technology

Page 14: Microarchitecture of a High-Radix Router

High-Radix Routers 14ISCA’05

Outline

� Technology Trend for High- Radix Router� Motivation for High- Radix Router� High- Radix Router Architectures

� Baseline Architecture� Fully Buffered Crossbar� Hierarchical Crossbar

� Conclusion & Future Work

Page 15: Microarchitecture of a High-Radix Router

High-Radix Routers 15ISCA’05

Virtual Channel Router Architecture

Page 16: Microarchitecture of a High-Radix Router

High-Radix Routers 16ISCA’05

High-Radix Switch Architectures (I)

output 1

output 2

output k

input 1

input k

���

input 2

� � �

(a) Baseline design

Page 17: Microarchitecture of a High-Radix Router

High-Radix Routers 17ISCA’05

Simulation Methodology

• Open-loop simulation done with steady-state measurement• Output switch arbitration required two cycles• Wire delay included in the pipeline• Uniform random traffic pattern• Each packet was assumed to be a single flit• Radix=64 router evaluated with 4 VCs per input

• Metrics – latency & throughput• Cost – area

Page 18: Microarchitecture of a High-Radix Router

High-Radix Routers 18ISCA’05

Baseline Performance Evaluation

0

10

20

30

40

50

0 0.2 0.4 0.6 0.8 1

offered load

late

ncy

(cyc

les)

low-radix

Page 19: Microarchitecture of a High-Radix Router

High-Radix Routers 19ISCA’05

Baseline Performance Evaluation

0

10

20

30

40

50

0 0.2 0.4 0.6 0.8 1

offered load

late

ncy

(cyc

les)

low-radix

baseline(high-radix)

Low radix

better

Page 20: Microarchitecture of a High-Radix Router

High-Radix Routers 20ISCA’05

High-Radix Switch Architectures (II)

output 1

output 2

output k

input 1

input k

���

input 2

� � �

(a) Baseline design (b) Fully buffered crossbar

output 1

output 2

output k

input 1

input k

���

input 2

� � � output 1

output 2

output k

input 1

input k

���

input 2

� � �

Fully buffered crossbar with shared buffers

Page 21: Microarchitecture of a High-Radix Router

High-Radix Routers 21ISCA’05

Fully Buffered Crossbar Provides High Performance but �.

0

10

20

30

40

50

0 0.2 0.4 0.6 0.8 1offered load

late

ncy

(cyc

les)

low-radixbaselinefully-buffered

Page 22: Microarchitecture of a High-Radix Router

High-Radix Routers 22ISCA’05

� Becomes Costly to build

0

50

100

150

200

0 50 100 150 200 250radix

area

(m

m2)

wire area buffer area

Buffer area dominates at radix 50

Other issues:

1.

2.

Credit Information

Adaptive Routing

Page 23: Microarchitecture of a High-Radix Router

High-Radix Routers 23ISCA’05

(b) Fully buffered crossbar

output 1

output 2

output k

input 1

input k

���

input 2

� � �

High-Radix Switch Architectures (III)

output 1

output 2

output k

input 1

input k

���

input 2

� � �

(a) Baseline design

subswitch

(c) Hierarchical crossbar

Page 24: Microarchitecture of a High-Radix Router

High-Radix Routers 24ISCA’05

Hierarchical Crossbar Performance on Uniform Random Traffic

0

10

20

30

40

50

0 0.2 0.4 0.6 0.8 1offered load

late

ncy

(cyc

les)

baselinesubswitch 32subswitch 16subswitch 8subswitch 4fully-buffered

subswitch

Page 25: Microarchitecture of a High-Radix Router

High-Radix Routers 25ISCA’05

Worst-case Traffic Pattern

0

10

20

30

40

50

0 0.2 0.4 0.6 0.8 1offered load

late

ncy

(cyc

les)

baselinesubswitch 32subswitch 16subswitch 8subswitch 4fully-buffered

subswitch

Page 26: Microarchitecture of a High-Radix Router

High-Radix Routers 26ISCA’05

Worst Case Performance Comparison

0

10

20

30

40

50

0 0.2 0.4 0.6 0.8 1offered load

late

ncy

(cyc

les)

baselinesubswitch 32subswitch 16subswitch 8subswitch 4fully-buffered

Page 27: Microarchitecture of a High-Radix Router

High-Radix Routers 27ISCA’05

4k Node Network Performance

0

20

40

60

80

100

0 0.2 0.4 0.6 0.8 1offered load

late

ncy

(cyc

les)

low-radix router high-radix router

High-Radix leads to lower zero–load latency

Page 28: Microarchitecture of a High-Radix Router

High-Radix Routers 28ISCA’05

High-Radix Router Microarchitecture

� In building high- radix router, there are scaling issues as the number of ports increase

� Baseline design provides poor performance� Fully buffered crossbar leads to high performance but

a costly design� Hierarchical crossbar provides a feasible design with

minimal loss in performance

Page 29: Microarchitecture of a High-Radix Router

High-Radix Routers 29ISCA’05

Conclusion

� Performance of digital systems is often limited by the interconnection network

� Increasing on- chip bandwidth can be more efficiently used by a high-radix routers.

� High- radix lead to lower cost and lower latency networks

� A high- radix router requires a different architecture� A hierarchical organization makes high- radix routers

feasible

Page 30: Microarchitecture of a High-Radix Router

High-Radix Routers 30ISCA’05

Future Work

• Optimal topology for high- radix network– Clos topology suited for high- radix routers

• Routing - How to adaptively route in a high- radix router?

• Flow- control with high- radix router

• Simulating a larger network

• Microarchitecture - Alternative to crossbars : multistage switch organizations, network- on- chip (torus, mesh, etc.)

Page 31: Microarchitecture of a High-Radix Router

High-Radix Routers 31ISCA’05

Thank you

Questions?