25
Layout-conscious Random Topologies for HPC Off-chip Interconnects Michihiro Koibuchi Ikki Fujiwara Hiroki Matsutani Henri Casanova National Institute of Informatics, JP National Institute of Informatics, JP Keio University, JP University of Hawaii at Manoa, US The 19th International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2013

Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Embed Size (px)

Citation preview

Page 1: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Layout-conscious Random Topologies for

HPC Off-chip Interconnects

Michihiro Koibuchi

Ikki Fujiwara

Hiroki Matsutani

Henri Casanova

National Institute of Informatics, JP

National Institute of Informatics, JP

Keio University, JP

University of Hawaii at Manoa, US

The 19th International Symposium on High-Performance Computer Architecture (HPCA), Feb. 2013

Page 2: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Outline

• Introduction

• Motivation

• Layout-conscious random topologies

– (method A) link permutation

– (method B) constrained random shortcutting

• Comparison of both methods

• Conclusions

2

Page 3: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

• Goal: to make a low-latency topology for HPC networks

– low diameter and low average path hops

• Random topology is best [Koibuchi et al, ISCA2012]

100 times

improvement (a) Non-random Shortcuts

(b) Random Shortcuts

1,024-node network

Avg. short

est path

length

[hops]

Good Point of Random Topology

3 Switch degree ≈ Number of shortcuts

Page 4: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

• Random topology leads to lowest latency [Koibuchi et al, ISCA2012]

but causes longer cables

• Idea: to design a layout-friendly quasi-random topologies

– shorter cable length (near to tori & hypercube)

– low path hops (near to fully random)

Bad Point of Random Topology

4

Up to 200% increased cable length

Page 5: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Outline

• Introduction

• Motivation

• Layout-conscious random topologies

– (method A) link permutation

– (method B) constrained random shortcutting

• Comparison of both methods

• Conclusions

5

Page 6: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Why Should We Care about Topology Now?

6

[Tomkins, 2008] 1μs system-across latency is desired [Henmmert, 2008]

Switch delay>100ns, Link delay=5ns/m

New low-diameter/hop topology is needed

Page 7: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Randomness Makes Graphs Smaller

• Small-world

phenomenon

– Social network

– P2P network

– Airline network

• Its use for HPC

interconnects

– Relatively high radix

– More uniform degree

– Considering rack layout

7

WS model

[Watts, 1998]

Vertex = Person/Computer/Airport Vertex = Switch

Edge = Links

Page 8: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Random Topology Minimizes Latency

8

Non-random topology (tori, fat-tree, etc.)

in current HPC & DC networks

Random shortcut topology

(ring + random shortcuts)

Toward low-latency

network era

[Koibuchi et al, ISCA 2012]

Reduced latency

Page 9: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

But, Cabling is Enormous

• Even for non-random topology…

9

K Computer (6-D mesh/torus) Earth Simulator, 1st gen. (crossbar)

(c) kan-haru (c) Riken

83,200 cables

2,400 km

140 tons

200,000 cables

1,000 km

Our layout-conscious random topology

provides Shorter Cable Length in

addition to lower latency

Page 10: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Outline

• Motivation

• Layout-conscious random topologies

– (method A) link permutation

– (method B) constrained random shortcutting

• Comparison of both methods

• Conclusions

10

Page 11: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Two Approaches to Quasi-randomness

• Method A makes a non-random topology random

• Method B makes a random topology layout-friendly

11

Low High

(not random) (fully random)

Randomness

Method A Method B start start

Quasi-random topologies

Page 12: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Outline

• Motivation

• Layout-conscious random topologies

– (method A) link permutation

– (method B) constrained random shortcutting

• Comparison of both methods

• Conclusions

12

Page 13: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Method A: Link Permutation

• Typical non-random topologies have great layout

• Randomly swap the ends of cables on its layout

– Maintaining the great layout and the lengths of cables

13

Endpoints of intra - cabinet links are swapped Endpoints of inter - cabinet links are swapped

Cabinet 0 Cabinet 1 Cabinet 2

0

01

10

11

20

21

02

03

12

13

22

23

04

05

14

15

24

25

Cabinet 0 Cabinet 1 Cabinet 2

00 10 20

05 15 25

Map

to

cab

inets

Sw

ap

th

e c

ab

les

Given a non-random

topology

A non-random topology

laid out with cabinets

Resulting permuted

topology

Page 14: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

14

Path Hops vs. Network Size

Permuted random topology provides better path hops

- Permuted_tori, and Permuted_Hypercube also do 14

Ideal

Baseline non-

random Up to 25%

reduction

Avg. short

est path

length

[hops]

Page 15: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Network Simulation

15

Comm. Latency and Throughput Switch structure

How many cycles ?

Cycle-accurate network simulation

XBAR

FIFO

FIFO

Packet length 33 flits (1 flit = 256 bit)

Switching technique Virtual-cut through

Traffic Pattern Uni, Matrix-t, Bit-rev

Number of VCs 2

Switch delay >100 ns

Link delay 20 ns

Switch & network parameters

Mesh/Hypercube Duato

Torus DOR

Ring + Random Irregular

Topology & Routing

Applying Deadlock-

free Routing theory

Adaptive channel

Escape channel

Page 16: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Latency vs. Throughput

• Up to 20% reduced latency: close to full-random

• Similar results found with other traffic patterns and

with other baseline topologies (e.g. fat tree) 16

Reduced

latency

512-switch networks, uniform traffic

Page 17: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Outline

• Motivation

• Layout-conscious random topologies

– (method A) link permutation

– (method B) constrained random shortcutting

• Comparison of both methods

• Conclusions

17

Page 18: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

• Optimize layout after randomizing topology

• Constrain the number of bypassed nodes when

each random link is added to a ring

– Virtually maintaining the diameter and the average

shortest path hops of a fully random topology

Method B: Constrained Shortcutting

18

cabinet layout

■◆

▼●

switch topology cabinet topology

▼ ●

[Fujiwara et al, PDCAT 2012]

Clu

steri

ng

Map

pin

g

Page 19: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Way to Constrain the Shortcuts

• N nodes, m degree, distribution θ

• Add m−2 shortcuts randomly to a ring (m=2)

• So that each shortcut bypasses up to θ/2n nodes

along the ring

– If θ = 100% then it is identical to a fully random

topology

19

m = 6

θ = 40%

Low path hops close to θ = 100% (ideal)

m = 6

θ = 50% m = 6

θ = 60% m = 6

θ = 70%

Page 20: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Cable Length vs. Network Size

• Constrained shortcutting successfully reduced the

cable length by up to 26%

20

Reduced

cable

Floorplan follows [ANSI/TIA/EIA-942] and [J.Kim, ISCA2007]

Page 21: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Outline

• Motivation

• Layout-conscious random topologies

– (method A) link permutation

– (method B) constrained random shortcutting

• Comparison of both methods

• Conclusions

21

Page 22: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Permutation vs. Constrained Shortcuts

• Link permutation is usually recommended

• Constrained shortcutting (θ=50%) is better in

low-radix networks

22

Latency Throughput

Degree = 4

Degree = 8

Degree = 8

Degree = 4

in 256-switch networks

Average Cable Length(m) Average Cable Length(m)

Page 23: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Conclusions

• Two randomizing methods for practical quasi-

random topology in off-chip networks

– Link permutation randomizes links after layout is fixed

– Constrained shortcutting optimizes layout after

randomizing topology

23

A resulting quasi-random topology has

- Shorter cable length

(≒tori/hypercube)

- Low path hops

(≒ fully random ) Switch

Link permutation Constrained shortcutting

Page 24: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

Thank you!

24

Page 25: Layout-conscious Random Topologies for HPC Off-chip ...research.nii.ac.jp/~koibuchi/pdf/hpca2013_slide.pdf · Layout-conscious Random Topologies for HPC Off-chip Interconnects

25

Routing Computation Cost

• Address and routing-table size at switch

• InfiniBand LID: 48k

• General issue regardless of topology

• Computational cost of path search

• Topology-agnostic deadlock-free routing [Flich,TPDS2012]

• O ((N+E) logN) priority-queue Dijkstra algo.

N = 8, E=11

Switch

Link

82 sec for 16k nodes