NC論2 2
Contents
• Learn about interconnection networks in distributed and parallel computers, and the message routing techniques used on them.
• Materials: http://comp.is.uec.ac.jp/yoshinagalab/yoshinaga/dp2.html
• http://ceng.usc.edu/smart/presentations/archives/AppendixE.ppt (253 slides, 13 MB)
• http://booksite.mkp.com/9780123838728/references/appendix_f.pdf (118 pages, 2 MB)
• TA: 重信 裕政 [email protected]
References
• T. M. Pinkston and J. Duato: Interconnection Networks, Appendix E in Computer Architecture: A Quantitative Approach, 4th Edition, Morgan Kaufmann Publishers (2006).
• The same chapter appears as Appendix F in the 5th Edition, Morgan Kaufmann Publishers (2011).
• J. Duato, S. Yalamanchili, L. Ni: Interconnection Networks: An Engineering Approach, 2nd Edition, Morgan Kaufmann Publishers (2003).
• S. Tomita: Parallel Computers (並列コンピュータ), Shokodo (1996).
• W. D. Dally, B. Towles: Principles and Practices of Interconnection Networks, Morgan Kaufmann Publishers (2003).
What is an Interconnection Network?
• It is a programmable system that transports data between terminals, such as processors and memory.
• It is programmable in the sense that it makes different connections at different points in time.
• It is a system because it is composed of many components: buffers, channels, switches, and controls that work together to deliver data.
Interconnection Network (1/2)
[Figure: multicomputer; nodes, each with a processor (P) and a memory (M), connected by an interconnection network]
Interconnection Network (2/2)
[Figure: UMA-type shared-memory multiprocessor; processors (P) and memory modules (M) connected by an interconnection network]
It is also called a dance-hall architecture.
Trend
• Interconnection network performance is increasing along with processor performance, at a rate of 50% per year.
• Communication is a limiting factor in the performance of many modern systems.
• Buses have been unable to keep up with the bandwidth demand, and point-to-point interconnection networks are rapidly taking over.
Share of the TOP500 by computer classification (%), June 2011 to June 2013 (http://www.top500.org/)
Classification  2013/06  2012/06  2011/06
MPP             16.6     18.6     17.4
Cluster         83.4     81.4     82.2
Others           0.0      0.0      0.4
Examples of clusters
• Tianhe-2 (天河2号), China, 2013
– Processors: Intel Xeon E5-2692 12C 2.2 GHz ×2 ×16K
– Accelerator: Xeon Phi 31S1P (57 cores) ×3 ×16K
– Interconnect: TH Express-2 (proprietary), fat tree
• Tsubame 2.5, Tokyo Tech., 2013
– Processors: Xeon X5670 2.93 GHz ×2 ×1,408
– Accelerator: NVIDIA Kepler K20x ×3 ×1,408
– Interconnect: InfiniBand QDR (40 Gbps) ×2, fat tree
Examples of MPPs
• K computer @ RIKEN, Fujitsu, 2011
– Node: SPARC64 VIIIfx 2 GHz (16 GFlops × 8 cores)
– Topology: 6D mesh/3D torus, Tofu interconnect
– #cores: 80K nodes × 8 cores = 640K cores
– Rmax: 10.51 PFlops, 7,890 kW
• Titan @ ORNL, Cray XK7, 2012
– Node: AMD Opteron 16C 2.2 GHz + NVIDIA K20x
– Topology: 3D torus, Gemini interconnect
– #cores: 18,688 nodes (200 cabinets)
– Rmax: 27.11 PFlops, 8,209 kW
Other Networks of Supercomputers
• Sequoia (2011): 5D torus, proprietary IBM interconnect (Blue Gene/Q)
• Pleiades / NASA (2011): partial 11D hypercube topology with IB QDR/DDR
• Red Sky / Sandia National Lab. (2010): 3D torus (12 bristled nodes) with IB QDR switches
• IBM Roadrunner (2009): fat tree with IB DDR
• Earth Simulator 2 / NEC SX-9E (2009): fat tree (64 GB/s per CPU, 8 CPUs per node, 160 nodes)
• IBM Blue Gene/L (2004): proprietary 3D torus (64 x 32 x 32 = 64K nodes)
Architecture vs. software
Architecture  Memory                    Programming
UMA (SMP)     shared                    OpenMP
NUMA (MPP)    distributed (not shared)  MPI (Message Passing Interface)
Network Design (1/3)
• Performance: latency and throughput (bandwidth)
• Scalability: #processors vs. network, memory, I/O bandwidth
• Incremental expandability: small to maximum size
• Partitionability: the network may be partitioned among several users
Network Design (2/3)
• Simplicity: simple design, higher clock frequency, easy to use
• Distance span: a smaller physical span is preferred, to limit noise, cable delay, etc.
• Physical constraints: packaging (pin count), wiring (wire length), and maintenance (power consumption) must stay within physical limits.
Network Design (3/3)
• Reliability: fault tolerance, reliable communication, hot swap
• Expected workload: robust performance over a wide range of traffic conditions.
• Cost: trade-offs between cost and performance.
Classification of Interconnection Networks
• Shared-Medium Networks
– Local area networks (Ethernet, token ring)
– Backplane bus (e.g. SUN Gigaplane)
• Direct Networks (router-based)
– mesh, torus, hypercube, tree, etc.
• Indirect Networks (switch-based)
• Hybrid Networks
Shared-Medium Networks (LAN)
• Arbitration is needed to decide which node becomes the master of the shared medium and so resolve conflicting network accesses.
• The best-known protocol is carrier-sense multiple access with collision detection (CSMA/CD).
• Token bus and token ring pass a token from node to node; only the token owner has the right to access the bus/ring, which removes the nondeterministic waiting time.
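The bounded waiting time of token passing can be illustrated with a minimal sketch. The function name `token_ring_schedule` and the policy of at most one frame per token visit are assumptions made for illustration; they are not taken from any particular token ring standard.

```python
from collections import deque


def token_ring_schedule(queues, max_rounds=10):
    """Simulate token passing on a ring (illustrative model only).

    queues[i] lists the frames station i wants to send.
    Assumed policy: a station sends at most one frame per token visit.
    Returns the transmissions in order as (station, frame) pairs.
    """
    pending = [deque(q) for q in queues]
    order = []
    token = 0  # index of the station currently holding the token
    for _ in range(max_rounds * len(pending)):
        if not any(pending):  # every queue is empty
            break
        if pending[token]:
            order.append((token, pending[token].popleft()))
        token = (token + 1) % len(pending)  # pass the token to the next station
    return order


schedule = token_ring_schedule([["a1", "a2"], [], ["c1"]])
print(schedule)
```

Under this assumed policy a station waits for at most one full token rotation before it may send, whereas with CSMA/CD the wait after a collision depends on random backoff.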
Shared-Medium Networks (Backplane bus)
• It is commonly used to interconnect processor(s) and memory modules to provide an SMP (symmetric multiprocessor) architecture.
• It is realized by printed lines on a circuit board or by discrete wiring.
• Gigaplane in the SUN Enterprise x000 servers (1996): 2.6 GB/s, 256-bit data, 42-bit address, 83.8 MHz clock.
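The Gigaplane figures can be cross-checked with a short calculation. This is a sketch that assumes one full-width data transfer per clock cycle; the helper name `peak_bus_bandwidth` is hypothetical.

```python
def peak_bus_bandwidth(data_bits, clock_hz, transfers_per_clock=1):
    """Peak bandwidth in bytes per second of a parallel bus (assumed model)."""
    return data_bits / 8 * clock_hz * transfers_per_clock


# 256 data bits = 32 bytes per transfer, at 83.8 MHz
gigaplane = peak_bus_bandwidth(data_bits=256, clock_hz=83.8e6)
print(f"{gigaplane / 1e9:.2f} GB/s")  # prints "2.68 GB/s"
```

The result, roughly 2.68 GB/s, is in line with the 2.6 GB/s quoted for the bus.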
Direct (static) Networks
• Consists of a set of nodes.
• Each node is directly connected to a subset of other nodes in the network.
• Examples:
– 2D mesh (Intel Paragon), 3D mesh (MIT J-Machine)
– 2D torus (Fujitsu AP3000), 3D torus (Cray T3D, T3E)
– Hypercube (CM1, CM2, nCUBE)
Mesh topology
[Figure: 2D mesh and 3D mesh; each vertex is a node]
Torus topology
[Figure: 2D torus (4-ary 2-cube) and 3D torus (3-ary 3-cube)]
Hypercube (binary n-cube)
[Figure: 4D hypercube (2-ary 4-cube)]
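A binary n-cube is easy to express with bit operations if each node is labelled with an n-bit integer, which is an assumption of this sketch: two nodes are neighbors when their labels differ in exactly one bit. The function names are hypothetical.

```python
def hypercube_neighbors(node, n):
    """Neighbors of `node` in a binary n-cube: flip each of the n address bits."""
    return [node ^ (1 << bit) for bit in range(n)]


def hypercube_distance(a, b):
    """Minimum hop count between two nodes: the number of differing address bits."""
    return bin(a ^ b).count("1")


print(hypercube_neighbors(0b0000, 4))       # [1, 2, 4, 8]
print(hypercube_distance(0b0000, 0b1111))   # 4
```

In the 4D case every node has 4 neighbors, and the farthest pair of nodes is 4 hops apart.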
Tree
[Figure: binary tree, fat tree, and X-tree]
Hierarchical topology (1/2)
[Figure: pyramid (hierarchical 2D mesh) and hierarchical ring]
Hierarchical topology (2/2)
[Figure: cube-connected cycles (CCC) and RDT (Recursive Diagonal Torus)]
Hypermesh (spanning-bus hypercube)
Single or multiple buses
Base-m n-cube (hyper-crossbar)
Base-8 3-cube (Toshiba Prodigy)
[Figure: base-8 3-cube; node addresses run from 000 to 777 (octal), and the nodes along each dimension are connected by an 8x8 crossbar]
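In a base-m n-cube, routing can be pictured as correcting one address digit per crossbar traversal. The sketch below assumes that model and counts the differing base-m digits of two node addresses; the helper names `digits` and `crossbar_hops` are hypothetical.

```python
def digits(address, m, n):
    """Split an integer address into n base-m digits, most significant first."""
    out = []
    for _ in range(n):
        address, d = divmod(address, m)
        out.append(d)
    return out[::-1]


def crossbar_hops(src, dst, m, n):
    """Crossbar traversals needed, assuming each one corrects one address digit."""
    return sum(1 for a, b in zip(digits(src, m, n), digits(dst, m, n)) if a != b)


print(crossbar_hops(0o000, 0o777, 8, 3))  # 3: all three digits differ
print(crossbar_hops(0o070, 0o077, 8, 3))  # 1: only the last digit differs
```

Under this assumption the worst case is n traversals, one per digit, regardless of m.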
Diameter and degrees (1/2)
           2D mesh  2D torus  3D torus  binary n-cube
#nodes     N        N         N         N = 2^n
Diameter   2√N      √N        ∛N        log₂ N
Degree     4        4         6         log₂ N
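The diameter and degree entries can be checked on small instances with a breadth-first search. This is a minimal sketch: `grid_graph`, `diameter`, and `max_degree` are hypothetical helper names, and the 16-node sizes are chosen only to keep the run short.

```python
from collections import deque
from itertools import product


def grid_graph(k, n, wrap):
    """Adjacency sets of a k-ary n-dimensional mesh (wrap=False) or torus (wrap=True)."""
    adj = {}
    for node in product(range(k), repeat=n):
        nbrs = set()
        for dim in range(n):
            for step in (-1, 1):
                c = node[dim] + step
                if wrap:
                    c %= k  # wraparound link of the torus
                elif not 0 <= c < k:
                    continue  # a mesh has no link past its edge
                nbrs.add(node[:dim] + (c,) + node[dim + 1:])
        nbrs.discard(node)
        adj[node] = nbrs
    return adj


def diameter(adj):
    """Longest shortest path, found by a breadth-first search from every node."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def max_degree(adj):
    return max(len(nbrs) for nbrs in adj.values())


mesh = grid_graph(4, 2, wrap=False)   # 2D mesh, N = 16
torus = grid_graph(4, 2, wrap=True)   # 2D torus, N = 16
cube = grid_graph(2, 4, wrap=True)    # binary 4-cube, N = 16
for name, g in [("2D mesh", mesh), ("2D torus", torus), ("binary 4-cube", cube)]:
    print(name, diameter(g), max_degree(g))
```

For N = 16 this gives a diameter of 6 for the 2D mesh, 4 for the 2D torus, and 4 for the binary 4-cube, each with maximum degree 4. The exact mesh diameter is 2(√N − 1), of which 2√N is the leading term.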
Diameter and degrees (2/2)
           Base-m n-cube  CCC         Binary tree  Ring
#nodes     N = m^n        N = n·2^n   N            N
Diameter   log_m N        3n/2        2 log₂ N     N/2
Degree     log_m N        3           3            2
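The ring and binary tree diameters can be checked from closed-form distance functions. The sketch assumes a bidirectional ring with nodes numbered 0 to N−1 and a complete binary tree whose nodes are numbered heap-style from 1, so that the parent of node i is i // 2; both numbering schemes are assumptions of the example.

```python
def ring_distance(a, b, n):
    """Hops between nodes a and b on a bidirectional ring of n nodes."""
    d = abs(a - b) % n
    return min(d, n - d)


def tree_distance(a, b):
    """Hops between nodes of a complete binary tree numbered heap-style from 1."""
    hops = 0
    while a != b:
        # The larger index is never shallower, so move it up to its parent.
        if a > b:
            a //= 2
        else:
            b //= 2
        hops += 1
    return hops


n = 16
ring_diam = max(ring_distance(0, b, n) for b in range(n))

levels = 4
tree_nodes = 2 ** levels - 1  # 15 nodes
tree_diam = max(
    tree_distance(a, b)
    for a in range(1, tree_nodes + 1)
    for b in range(1, tree_nodes + 1)
)
print(ring_diam, tree_diam)  # 8 6
```

The ring result matches N/2. For the tree the exact value is 2(log₂(N + 1) − 1), a leaf-to-leaf path through the root, which 2 log₂ N approximates for large N.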