ISPA 2016, August 23 - 26, Tianjin, China
Hosei University ISPA 2016 – 1 / 29
http://cis.k.hosei.ac.jp/∼yamin/
Question about k-Ary n-Cube
If we have 27 nodes, we can build a 3-ary 3-cube
[Figure: 3-ary 3-cube; nodes are labeled (x, y, z) with each coordinate in {0, 1, 2}, from (0,0,0) to (2,2,2)]
In the 27-node example, k = 3, n = 3, and $N = k^n = 27$
Suppose we have 10,000,000 nodes: which k and n give the system high performance at low cost?
Bidirectional 4-Ary 2-Cube (p = 1)
[Figure: bidirectional 4-ary 2-cube with nodes (0,0) to (3,3); each node consists of a router and a compute node]
External ports: 4
Internal ports: 1
p: the number of compute nodes in a node
Unidirectional 4-Ary 2-Cube (p = 1)
[Figure: unidirectional 4-ary 2-cube with nodes (0,0) to (3,3); each node consists of a router and a compute node]
External ports: 2
Internal ports: 1
p: the number of compute nodes in a node
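The external port counts on the two 4-ary 2-cube slides follow from the wraparound links in each dimension. A minimal Python sketch, assuming nodes are addressed by coordinate tuples and that unidirectional links point in the +1 direction (the function name `neighbors` and that direction convention are illustrative assumptions, not taken from the slides):

```python
from typing import List, Tuple

Node = Tuple[int, ...]


def neighbors(node: Node, k: int, bidirectional: bool = True) -> List[Node]:
    """Nodes reachable over one outgoing link in a k-ary n-cube (torus)."""
    # Assumption: a unidirectional torus only has links in the +1 direction.
    steps = (1, -1) if bidirectional else (1,)
    result: List[Node] = []
    for dim in range(len(node)):
        for step in steps:
            coords = list(node)
            coords[dim] = (coords[dim] + step) % k  # wraparound link
            neighbor = tuple(coords)
            # For k = 2 both directions reach the same node; count it once.
            if neighbor != node and neighbor not in result:
                result.append(neighbor)
    return result


print(neighbors((0, 0), k=4))                       # bidirectional 4-ary 2-cube
print(neighbors((0, 0), k=4, bidirectional=False))  # unidirectional 4-ary 2-cube
```

With p = 1, the number of neighbors returned matches the external port count on the slides: 4 in the bidirectional case and 2 in the unidirectional case.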
Bidirectional or Unidirectional 3-Ary 3-Cube
[Figure: 3-ary 3-cube with nodes labeled from (0,0,0) to (2,2,2)]
Interconnection Network
[Figure: nodes (CPU/memory boards), each with a switch (router) and communication ports, joined by links (cables) into an interconnection network]
Ports are connected by links based on a certain topology
Used for designing large distributed-memory parallel systems
Router with Four External Ports (p = 1)
[Figure: two router designs, each with a compute node, a controller, and multiplexers (mux)]
(a) 5 × 5 crossbar (b) 5 × 5 input-buffered crossbar
Cross-Point Buffered Router (p = 1)
[Figure: 5 × 5 cross-point buffered crossbar with a crossbar controller; demultiplexers (demux) and multiplexers (mux) carry flits, and one port serves the compute node (processing element core or CPU/memory board)]
Diameter Comparison (Bidirectional Torus)
[Figure: diameter (0 to 500) versus number of nodes in the system ($2^4$ to $2^{32}$) for the k-ary 2-cube, 3-cube, 4-cube, 5-cube, 6-cube, and the n-cube]
For a fixed number of nodes, as n becomes larger, the diameter becomes smaller but the degree gets larger
Topological Properties
Network                              # of nodes   Degree   Diameter                 Bisection
n-cube                               $2^n$        $n$      $n$                      $2^{n-1}$
k-ary n-cube (mesh)                  $k^n$        $2n$     $n(k-1)$                 $k^{n-1}$
Bidirectional k-ary n-cube (torus)   $k^n$        $2n$     $n\lfloor k/2 \rfloor$   $2k^{n-1}$
Unidirectional k-ary n-cube (torus)  $k^n$        $n$      $n(k-1)$                 $k^{n-1}$
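The entries of the table above can be evaluated directly for concrete k and n. The following Python sketch transcribes the four rows; the `Properties` class and the function names are hypothetical helpers introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Properties:
    nodes: int
    degree: int
    diameter: int
    bisection: int


def n_cube(n: int) -> Properties:
    """Binary hypercube row of the table."""
    return Properties(2 ** n, n, n, 2 ** (n - 1))


def mesh(k: int, n: int) -> Properties:
    """k-ary n-cube without wraparound links."""
    return Properties(k ** n, 2 * n, n * (k - 1), k ** (n - 1))


def bidirectional_torus(k: int, n: int) -> Properties:
    """k-ary n-cube with wraparound links usable in both directions."""
    return Properties(k ** n, 2 * n, n * (k // 2), 2 * k ** (n - 1))


def unidirectional_torus(k: int, n: int) -> Properties:
    """k-ary n-cube with wraparound links usable in one direction only."""
    return Properties(k ** n, n, n * (k - 1), k ** (n - 1))


print(bidirectional_torus(k=11, n=2))
print(unidirectional_torus(k=14, n=2))
```

For example, the bidirectional 11-ary 2-cube has 121 nodes, degree 4, and diameter 10, and the unidirectional 14-ary 2-cube has 196 nodes, degree 2, and diameter 26.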
Number of Compute Nodes in a Node
[Figure: nodes made of one router and p compute nodes. (a) p = 1 (b) p = 2 (c) p = 3 (d) p = 4]
RCP — Relative Cost Performance
$$\mathrm{RCP} = \frac{(d + p)^{\lambda} D}{(\log_2 N + p)^{\lambda} \log_2 N}$$
d: node degree
p: the number of compute nodes in a node
λ: the router complexity (1.0 ≤ λ ≤ 2.0)
D: diameter
N: the number of nodes in the system
The model takes p and λ into consideration
The smaller the RCP, the lower the cost and the higher the performance
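As a quick illustration, the RCP definition can be evaluated numerically. This is a minimal Python sketch of the formula on this slide; the function and parameter names are chosen here for readability and are not part of the original material.

```python
import math


def rcp(degree: int, diameter: int, nodes: int, p: int = 1, lam: float = 1.0) -> float:
    """Relative cost performance of a network compared with a hypercube.

    degree   -- node degree d
    diameter -- network diameter D
    nodes    -- number of nodes N in the system
    p        -- number of compute nodes in a node
    lam      -- router complexity lambda (1.0 <= lam <= 2.0)
    """
    log_n = math.log2(nodes)
    return ((degree + p) ** lam * diameter) / ((log_n + p) ** lam * log_n)


# Bidirectional 11-ary 2-cube: N = 121, d = 4, D = 10, lambda = 2.0
print(round(rcp(degree=4, diameter=10, nodes=121, p=1, lam=2.0), 3))  # 0.576

# A hypercube has d = D = log2(N), so numerator and denominator coincide
print(rcp(degree=10, diameter=10, nodes=2 ** 10, p=1, lam=1.5))  # 1.0
```

The first value agrees with the 121-node row of the recommended bidirectional tori table.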
RCP of Hypercube
$$\mathrm{RCP} = \frac{(d + p)^{\lambda} D}{(\log_2 N + p)^{\lambda} \log_2 N} = \frac{(n + p)^{\lambda} n}{(n + p)^{\lambda} n} \equiv 1$$
n-cube:
d = n (node degree)
D = n (diameter)
$N = 2^n$ (the number of nodes in the system)
Irrespective of λ, p, and N
Derivative of RCP
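Let $x = \log_2 N$, then $N = 2^x = k^n$, or $k = 2^{x/n}$; therefore we have $D = kn/2 = 2^{x/n} n/2$
Let $g(x) = (2n + p)^{\lambda} 2^{x/n} n/2$ and $f(x) = (x + p)^{\lambda} x$
Then $\mathrm{RCP}' = (g(x)/f(x))' = \frac{g'(x) f(x) - g(x) f'(x)}{f^2(x)}$
where $g'(x) = (2n + p)^{\lambda} 2^{x/n} \ln 2 / 2$
and $f'(x) = ((x + p)^{\lambda})' x + (x + p)^{\lambda} x' = \lambda (x + p)^{\lambda - 1} x + (x + p)^{\lambda}$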
Derivative of RCP
Let $\mathrm{RCP}' = 0$, i.e., $g'(x) f(x) = g(x) f'(x)$
The positive root x can be calculated from the equation
$\ln 2 \, (x + p) x = n((\lambda + 1) x + p)$
Then we can determine an odd k from $k = \lfloor 2^{x/n} \rfloor$ or $k = \lceil 2^{x/n} \rceil$
If both are even, $k = 2^{x/n} + 1$
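The stationarity condition is a quadratic in x, $\ln 2 \cdot x^2 + (p \ln 2 - n(\lambda + 1)) x - np = 0$, so its positive root has a closed form. The Python sketch below solves it and then applies the odd-k rule from this slide; the function names `stationary_x` and `odd_k` are hypothetical, and the result relies on the continuous approximation D = kn/2 used in the derivation.

```python
import math


def stationary_x(n: int, p: int = 1, lam: float = 1.0) -> float:
    """Positive root of ln2*(x+p)*x = n*((lam+1)*x + p), where x = log2(N)."""
    a = math.log(2)
    b = p * math.log(2) - n * (lam + 1)
    c = -n * p
    # a > 0 and c < 0, so the discriminant is positive and exactly one root is positive.
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


def odd_k(x: float, n: int) -> int:
    """Pick an odd k next to 2^(x/n), following the floor/ceiling rule on the slide."""
    root = 2 ** (x / n)
    low, high = math.floor(root), math.ceil(root)
    if low % 2 == 1:
        return low
    if high % 2 == 1:
        return high
    return low + 1  # floor and ceiling are both even


x = stationary_x(n=2, p=1, lam=2.0)
print(round(x, 3), odd_k(x, n=2))
```

For n = 2, p = 1, and λ = 2.0 the root is x ≈ 8.016, which gives $2^{x/n} \approx 16.09$ and therefore the odd choice k = 17.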
RCP Comparison (p = 1, λ = 1.0)
[Figure: relative cost performance (0.5 to 1.5) versus number of nodes in the system ($2^4$ to $2^{32}$) for the k-ary 2-cube, 3-cube, 4-cube, 5-cube, 6-cube, and the n-cube]
RCP Comparison (p = 1, λ = 1.5)
[Figure: relative cost performance (0.5 to 1.5) versus number of nodes in the system ($2^4$ to $2^{32}$) for the k-ary 2-cube, 3-cube, 4-cube, 5-cube, 6-cube, and the n-cube]
RCP Comparison (p = 1, λ = 2.0)
[Figure: relative cost performance (0.5 to 1.5) versus number of nodes in the system ($2^4$ to $2^{32}$) for the k-ary 2-cube, 3-cube, 4-cube, 5-cube, 6-cube, and the n-cube]
RCP Comparison on λ (p = 1)
[Figure: relative cost performance (0.5 to 2.0) versus λ (1.0 to 2.0) for $N = 2^{11}$ with the k-ary 2-cube, 3-cube, 4-cube, 5-cube, 6-cube, and for the n-cube]
RCP Comparison on p
[Figure: RCP relative to that with p = 1 (0.9 to 1.7) versus number of compute nodes in the system ($2^4$ to $2^{32}$); curves for λ = 1.5 with (n = 3, p = 4), (n = 2, p = 4), (n = 3, p = 2), (n = 2, p = 2), and the p = 1 baseline]
RCP Comparison on n (p = 1)
[Figure: relative cost performance (0.5 to 2.0) versus n (2 to 14) for $N = 2^{10}$ and $N = 2^{20}$, each with λ = 1.0, 1.5, and 2.0, and for the n-cube]
RCP Comparison for Bidirectional Torus
[Figure: relative cost performance (0.50 to 1.00) versus number of nodes in the system ($2^4$ to $2^{32}$) for bidirectional tori with λ = 1.0, 1.5, and 2.0, and for the n-cube; curves are annotated for n = 4 and n = 5 with even k and odd k]
Recommended Bidirectional Tori with p = 1
N n k d D RCP λ
121 2 11 4 10 0.576 2.0
256 2 16 4 16 0.617 2.0
343 3 7 6 9 0.794 1.0
1,000 3 10 6 15 0.768 1.5
3,375 3 15 6 21 0.543 2.0
4,913 3 17 6 24 0.545 2.0
14,641 4 11 8 20 0.683 1.5
16,807 5 7 10 15 0.782 1.0
50,625 4 15 8 28 0.525 2.0
117,649 6 7 12 18 0.779 1.0
161,051 5 11 10 25 0.674 1.5
248,832 5 12 10 30 0.742 1.5
759,375 5 15 10 35 0.514 2.0
1,771,561 6 11 12 30 0.668 1.5
2,476,099 5 19 10 45 0.518 2.0
11,390,625 6 15 12 42 0.507 2.0
47,045,881 6 19 12 54 0.728 2.0
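The opening question (which k and n to use for about 10,000,000 nodes) can be explored with a small search over n. This Python sketch is one possible approach under stated assumptions, not the procedure used to build the table above: it takes, for each n from 2 to 6, the smallest k with $k^n$ at least the requested node count, and keeps the bidirectional torus with the lowest RCP. The function names and the default λ = 2.0 are illustrative choices.

```python
import math
from typing import Iterable, Tuple


def torus_rcp(k: int, n: int, p: int = 1, lam: float = 1.0) -> float:
    """RCP of a bidirectional k-ary n-cube (d = 2n, D = n*floor(k/2), N = k^n)."""
    log_n = math.log2(k ** n)
    return ((2 * n + p) ** lam * n * (k // 2)) / ((log_n + p) ** lam * log_n)


def smallest_k(min_nodes: int, n: int) -> int:
    """Smallest k >= 3 with k^n >= min_nodes (integer arithmetic, no rounding issues)."""
    k = 3  # assumption: k >= 3 so that the degree of the torus is really 2n
    while k ** n < min_nodes:
        k += 1
    return k


def best_torus(min_nodes: int, dims: Iterable[int] = range(2, 7),
               p: int = 1, lam: float = 2.0) -> Tuple[int, int, float]:
    """Return (n, k, RCP) with the lowest RCP among the candidate dimensions."""
    candidates = []
    for n in dims:
        k = smallest_k(min_nodes, n)
        candidates.append((torus_rcp(k, n, p, lam), n, k))
    value, n, k = min(candidates)
    return n, k, value


n, k, value = best_torus(10_000_000)
print(n, k, k ** n, round(value, 3))  # 6 15 11390625 0.507
```

Under these assumptions the search lands on the 15-ary 6-cube with 11,390,625 nodes, which is also a row of the table above.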
RCP of Mesh (p = 1, λ = 1.5)
[Figure: relative cost performance of the mesh (0.8 to 2.8) versus number of nodes in the system ($2^4$ to $2^{32}$) for the k-ary 2-cube, 3-cube, 4-cube, 5-cube, 6-cube, and the n-cube]
Dividing Mesh RCP by Torus RCP
[Figure: mesh RCP relative to torus RCP (1.0 to 2.1) versus number of nodes in the system ($2^4$ to $2^{32}$) for the k-ary 2-cube, 3-cube, 4-cube, 5-cube, and 6-cube]
The cost performance of the mesh is worse than that of the torus
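Because the mesh and the bidirectional torus have the same degree (2n) and the same number of nodes, the ratio of their RCPs reduces to the ratio of their diameters, $(k - 1)/\lfloor k/2 \rfloor$, independent of n, p, and λ. A short Python sketch that checks this numerically (helper names are illustrative assumptions):

```python
import math


def rcp(degree: int, diameter: int, nodes: int, p: int = 1, lam: float = 1.0) -> float:
    """RCP = (d + p)^lam * D / ((log2 N + p)^lam * log2 N)."""
    log_n = math.log2(nodes)
    return ((degree + p) ** lam * diameter) / ((log_n + p) ** lam * log_n)


def mesh_over_torus(k: int, n: int, p: int = 1, lam: float = 1.5) -> float:
    """RCP of the k-ary n-dimensional mesh divided by that of the bidirectional torus."""
    mesh_rcp = rcp(2 * n, n * (k - 1), k ** n, p, lam)
    torus_rcp = rcp(2 * n, n * (k // 2), k ** n, p, lam)
    return mesh_rcp / torus_rcp


for k in (3, 4, 15, 16):
    # Second column is the closed form (k - 1) / floor(k / 2)
    print(k, round(mesh_over_torus(k, n=3), 3), round((k - 1) / (k // 2), 3))
```

The ratio is exactly 2 for odd k and 2(k − 1)/k for even k, so it stays between 1 and 2 for k ≥ 2.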
Improvement of Unidirectional Torus
[Figure: bidirectional torus RCP divided by unidirectional torus RCP (1.0 to 3.5) versus number of nodes in the system ($2^4$ to $2^{32}$) for the k-ary 2-cube, 3-cube, 4-cube, 5-cube, and 6-cube]
The unidirectional torus has better cost performance than the bidirectional torus
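The comparison can be reproduced for a single configuration from the degree and diameter formulas of the two tori. The Python sketch below (function names are assumptions made for illustration) evaluates both RCPs for the 15-ary 5-cube with p = 1 and λ = 2.0, a configuration that appears in both recommendation tables of this talk.

```python
import math


def rcp(degree: int, diameter: int, nodes: int, p: int = 1, lam: float = 1.0) -> float:
    """RCP = (d + p)^lam * D / ((log2 N + p)^lam * log2 N)."""
    log_n = math.log2(nodes)
    return ((degree + p) ** lam * diameter) / ((log_n + p) ** lam * log_n)


def bidirectional_rcp(k: int, n: int, p: int = 1, lam: float = 1.0) -> float:
    """Bidirectional torus: d = 2n, D = n*floor(k/2)."""
    return rcp(2 * n, n * (k // 2), k ** n, p, lam)


def unidirectional_rcp(k: int, n: int, p: int = 1, lam: float = 1.0) -> float:
    """Unidirectional torus: d = n, D = n*(k - 1)."""
    return rcp(n, n * (k - 1), k ** n, p, lam)


# 15-ary 5-cube (N = 759,375), p = 1, lambda = 2.0
bi = bidirectional_rcp(15, 5, lam=2.0)
uni = unidirectional_rcp(15, 5, lam=2.0)
print(round(bi, 3), round(uni, 3), round(bi / uni, 2))  # 0.514 0.306 1.68
```

Since N is the same for both networks, the ratio simplifies to $((2n + p)/(n + p))^{\lambda} \cdot \lfloor k/2 \rfloor / (k - 1)$, so its size depends on λ as well as on n and k.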
RCP Comparison for Unidirectional Torus
[Figure: relative cost performance (0.25 to 1.00) versus number of nodes in the system ($2^4$ to $2^{32}$) for unidirectional tori with λ = 1.0, 1.5, and 2.0, and for the n-cube; curves are annotated with n = 2, 4, 6, and 8]
Recommended Unidirectional Tori with p = 1
N n k d D RCP λ
196 2 14 2 26 0.414 2.0
256 4 4 4 12 0.833 1.0
512 3 8 3 21 0.590 1.5
1,024 5 4 5 15 0.818 1.0
2,744 3 14 3 39 0.354 2.0
3,375 3 15 3 42 0.354 2.0
4,096 4 8 4 28 0.557 1.5
15,625 6 5 6 24 0.808 1.0
32,768 5 8 5 35 0.536 1.5
50,625 4 15 4 56 0.324 2.0
59,049 5 9 5 40 0.536 1.5
262,144 6 8 6 42 0.522 1.5
531,441 6 9 6 48 0.522 1.5
759,375 5 15 5 70 0.306 2.0
1,048,576 5 16 5 75 0.306 2.0
7,529,536 6 14 6 78 0.294 2.0
24,137,569 6 17 6 96 0.294 2.0
Summary
The k-ary n-cube has been deeply investigated and widely
adopted in real supercomputer designs
We proposed an analytical model for evaluating the cost performance relative to the hypercube
$\mathrm{RCP} = \frac{(d + p)^{\lambda} D}{(\log_2 N + p)^{\lambda} \log_2 N}$
By using this model, we can configure the k-ary n-cube to achieve high performance at low cost
We also investigated k-ary n-dimensional mesh and
unidirectional k-ary n-dimensional torus
The unidirectional k-ary n-dimensional torus has better cost performance than
the bidirectional k-ary n-dimensional torus