CS575 Parallel Processing
Lecture 3: Interconnection Networks
Wim Bohm, CSU
Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 license.
CS575 lecture 3 2
Interconnection networks
- Connect processors, memories, I/O devices
- Dynamic interconnection networks
  - Connect any to any using switches or busses
  - Two types of switches
    - On / off: 1 input, 1 output
    - Pass through / cross over: 2 inputs, 2 outputs
- Static interconnection networks
  - Connect point to point using "wires"
Dynamic Interconnection Network: Crossbar
- Connects e.g. p processors to b memories
- p x b matrix
  - p horizontal lines, b vertical lines
  - Cross points: on/off switches
  - Only one switch on per (row, column) pair
- Non-blocking: Pi to Mj does not block Pl to Mk
- Very costly, does not scale well
  - p x b switches, complex timing and checking
Dynamic Interconnection Network: Bus
- Connects processors, memories, I/O devices
  - Master: can issue a request to get the bus
  - Slave: can respond to a request once the bus is granted
  - If there are multiple masters, we need an arbiter
- Sequential: only one communication at a time
  - Bottleneck, but simple and cheap
Crossbar vs. bus
- Crossbar
  - Scalable in performance
  - Not scalable in hardware complexity
- Bus
  - Not scalable in performance
  - Scalable in hardware complexity
- Compromise: multistage network
Multi-stage network
- Connects n components to each other
- Usually built from O(n log n) 2x2 switches
  - Cheaper than crossbar
  - Faster than bus
- Many topologies
  - e.g. Omega (book fig 2.12), Butterfly, ...
Static Interconnection Networks
- Fixed wires (channels) between devices
- Many topologies
  - Completely connected: n(n-1)/2 channels
    - Static counterpart of crossbar
  - Star: one central PE for message passing
    - Static counterpart of bus
  - Multistage network with a PE at each switch
More topologies
- Necklace or ring
- Mesh / torus: 2D, 3D
- Trees, fat tree
- Hypercube
  - 2^n nodes in an n-D hypercube
  - n links per node in an n-D hypercube
  - Addressing: 1 bit per dimension
Hypercube
- Two connected nodes differ in one bit
- An n-D hypercube can be divided into
  - 2 (n-1)-D cubes, in n ways
  - 4 (n-2)-D cubes
  - 8 (n-3)-D cubes
- To get from node s to node t
  - Follow the path determined by the differing bits
  - E.g. 01100 → 11000: 01100 → 11100 → 11000
- Question: how many (simple) paths from one node to another?
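The differing-bits walk can be sketched in Python (function and variable names are mine, not from the slides). Flipping the highest differing bit first reproduces the example path; any of the d! orderings of the d differing bits gives a distinct shortest path, which answers the question for *shortest* paths (simple non-minimal paths are far more numerous).

```python
from math import factorial

def diff_bits(s, t):
    """Positions (LSB = 0) where the addresses s and t differ."""
    return [b for b in range(max(s, t).bit_length()) if (s ^ t) >> b & 1]

def route(s, t):
    """One shortest path from s to t: flip differing bits, highest first."""
    path = [s]
    for b in reversed(diff_bits(s, t)):
        s ^= 1 << b
        path.append(s)
    return path

# The slide's example: 01100 -> 11100 -> 11000
print([format(v, "05b") for v in route(0b01100, 0b11000)])
# Number of shortest paths = (number of differing bits)!
print(factorial(len(diff_bits(0b01100, 0b11000))))
```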
Measures of static networks
- Diameter
  - Maximal shortest path between two nodes
  - Ring: ⌊p/2⌋, hypercube: log(p), 2D wraparound mesh: 2⌊sqrt(p)/2⌋
- Connectivity
  - Measure of multiplicity of paths between nodes
- Arc connectivity
  - Minimum #arcs to be removed to create two disconnected networks
  - Ring: 2, hypercube: log(p), mesh: 2, wraparound mesh: 4
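The diameter figures can be checked by brute force with breadth-first search; a small sketch under my own naming, not part of the original slides:

```python
from collections import deque

def bfs_ecc(adj, s):
    """Eccentricity of s: the longest shortest path starting at s."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

def diameter(adj):
    return max(bfs_ecc(adj, s) for s in adj)

def ring(p):
    """Ring of p nodes as an adjacency dict."""
    return {i: [(i - 1) % p, (i + 1) % p] for i in range(p)}

def hypercube(n):
    """n-D hypercube: neighbors differ in exactly one address bit."""
    return {i: [i ^ (1 << b) for b in range(n)] for i in range(1 << n)}

print(diameter(ring(8)))       # floor(8/2) = 4
print(diameter(hypercube(3)))  # log2(8) = 3
```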
More measures
- Bisection width
  - Minimal #arcs to be removed to partition the network in two (off by at most one node) equal halves
  - Ring: 2, complete binary tree: 1, 2D mesh: sqrt(p)
  - Question: bisection width of a hypercube?
- Channel width
  - #bits communicated simultaneously over a channel
- Channel rate / bandwidth
  - Peak communication rate (#bits/second)
- Bisection bandwidth
  - Bisection width * channel bandwidth
Summary of measures: p nodes

Network                Diameter         Bisection width  Arc connectivity  #links
Completely-connected   1                p^2/4            p-1               p(p-1)/2
Star                   2                ⌊p/2⌋ *          1                 p-1
Ring                   ⌊p/2⌋            2                2                 p
Complete binary tree   2 log((p+1)/2)   1                1                 p-1
Hypercube              log(p)           p/2              log(p)            p log(p)/2

* The textbook gives the bisection width of a star as 1, but the only way to split
  a star into (almost) equal halves is by cutting half of its links.
Meshes and Hypercubes
- Mesh
  - Buildable, scalable, cheaper than hypercubes
  - Many (e.g. grid) applications map naturally
  - Cut-through routing works well in meshes
  - Commercial systems are based on it
- Hypercube
  - Recursive structure nice for algorithm design
  - Often same O complexity as PRAMs
  - Often a hypercube algorithm is also good for other topologies, so a good starting point
Embedding
- Relationship between two networks
  - Studied by mapping one into the other
  - Why?
- G(V,E) → G'(V',E')
  - Graphs G, G'; vertices V, V'; edges E, E'
  - Map E → E', V → V'
- Congestion k: k (>1) edges e map to one edge e'
- Dilation k: one edge e maps to k edges e'
- Expansion: |V'| / |V|
- Often we want congestion = dilation = expansion = 1
Ring into hypercube
- Number the nodes of the ring s.t. the Hamming distance between two adjacent nodes is 1
- Gray code provides such a numbering
  - Can be built recursively: binary reflected Gray code
  - 2 nodes: 0 1 (OK)
  - 2^k nodes:
    - Take the Gray code for 2^(k-1) nodes
    - Concatenate it with the reflected Gray code for 2^(k-1) nodes
    - Put 0 in front of the first batch, 1 in front of the second
- Mesh can be embedded into a hypercube
  - (Toroidal) mesh = rings of rings
Ring to hypercube cont'

Binary reflected Gray codes for 1, 2, and 3 bits (ring node i → G(i, dim)):

  0    00    000
  1    01    001
       11    011
       10    010
             110
             111
             101
             100

Recursive definition:

  G(0,1) = 0
  G(1,1) = 1
  G(i, x+1) = 0 || G(i, x)                if i < 2^x
            = 1 || G(2^(x+1) - i - 1, x)  if i >= 2^x
  (|| is concatenation)
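The reflect-and-prefix construction translates directly into Python; a sketch with names of my choosing:

```python
def gray(k):
    """Binary reflected Gray code for 2**k values, as k-bit strings."""
    if k == 1:
        return ["0", "1"]
    prev = gray(k - 1)
    # First batch: 0 prepended; second batch: 1 prepended to the reflection.
    return ["0" + c for c in prev] + ["1" + c for c in reversed(prev)]

print(gray(3))
```

Consecutive codewords (including the wraparound from the last back to the first) differ in exactly one bit, which is what makes the ring-to-hypercube embedding work.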
2D Mesh into hypercube
- Note, in a 2D mesh:
  - Rows are rings
  - Columns are rings
- 2^r x 2^s wraparound mesh into a 2^(r+s)-node cube
  - Map node (i,j) onto node G(i,r) || G(j,s)
  - Each row coincides with a subcube
  - Each column coincides with a subcube
  - S.t. if adjacent in the mesh then adjacent in the cube
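A sketch of the G(i,r) || G(j,s) mapping, using the standard bit trick `i ^ (i >> 1)` for the i-th binary reflected Gray codeword (naming is mine):

```python
def G(i, k):
    """i-th k-bit binary reflected Gray codeword, as a bit string."""
    return format(i ^ (i >> 1), "0{}b".format(k))

def mesh_to_cube(i, j, r, s):
    """Map node (i,j) of a 2^r x 2^s wraparound mesh to an (r+s)-cube address."""
    return G(i, r) + G(j, s)

# Neighbors in the mesh land on neighbors in the cube:
print(mesh_to_cube(1, 2, 2, 2), mesh_to_cube(1, 3, 2, 2))
```

Only one coordinate changes between mesh neighbors, and the Gray code guarantees that coordinate's code changes in a single bit, so mesh adjacency implies cube adjacency (wraparound edges included).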
Complete binary tree into hypercube
- Map the tree root to any cube node
- Left child: map to the same node as its parent
- Right child at level j: invert bit j of the parent's node

Example in a 3-cube, level by level (root at top):

                  000
           000          001
        000   010    001   011
    000 100 010 110 001 101 011 111
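The rule above can be sketched recursively; here I count levels from 0 at the root and bits from the least significant end, which reproduces the example (the function name is mine):

```python
def embed_tree(node, level, depth, out):
    """Map a complete binary tree of `depth` levels into a `depth`-cube.
    `node` is the cube address assigned to the current tree node."""
    out.append((level, format(node, "0{}b".format(depth))))
    if level < depth:
        embed_tree(node, level + 1, depth, out)               # left child: same node
        embed_tree(node ^ (1 << level), level + 1, depth, out)  # right child: flip bit `level`
    return out

mapping = embed_tree(0, 0, 3, [])
print(mapping[:3])
```

Note that the 2^depth leaves hit every cube address exactly once, since each leaf flips a distinct subset of the bits.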
Routing Mechanisms
- Determine all source → destination paths
- Minimal: a shortest path
- Deterministic: one path per (src, dst) pair
  - Mesh: dimension-ordered (XY) routing
  - Cube: E-routing
    - Send along the least significant 1 bit in src XOR dst
- Adaptive: many paths per (src, dst) pair
  - Minimal: only shortest paths
- Why adaptive? Discuss.
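E-routing for the cube is a few lines of Python; a sketch under my own naming:

```python
def e_route(src, dst):
    """E-cube routing: repeatedly flip the least significant differing bit."""
    path = [src]
    while src != dst:
        diff = src ^ dst
        src ^= diff & -diff   # isolate and flip the lowest set bit of the XOR
        path.append(src)
    return path

# Same endpoints as the earlier slide, but the deterministic E-route
# corrects the low dimension first: 01100 -> 01000 -> 11000
print([format(v, "05b") for v in e_route(0b01100, 0b11000)])
```

Each step removes one differing bit, so the route is minimal as well as deterministic.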
Routing (communication) Costs
- Three factors
- Start-up time at the source (ts)
  - OS, buffers, error correction info, routing algorithm
- Hop time (th)
  - The time it takes to get from one PE to the next
  - Also called node latency
- Word transfer time (tw)
  - Inverse of channel bandwidth
Two rout(switch)ing techniques
- Store and forward: O(m·l)
  - Strict: the whole message travels from PE to PE
  - m words, l links: tcomm = ts + (m·tw + th)·l
  - Often th is much less than m·tw, so tcomm ≈ ts + m·l·tw
- Cut-through: O(m + l)
  - Non-strict: the message is broken into flits (packets)
  - Flits are pipelined through the network: tcomm = ts + l·th + m·tw
  - A circular path plus finite flit buffers can give rise to deadlock
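The two cost models are easy to compare numerically; a sketch with parameter values chosen only for illustration:

```python
def store_and_forward(ts, th, tw, m, l):
    """Whole m-word message forwarded over each of l links in turn."""
    return ts + (m * tw + th) * l

def cut_through(ts, th, tw, m, l):
    """Flits pipelined: per-hop latency paid once per link, word time once."""
    return ts + l * th + m * tw

# Illustrative values: ts=100, th=1, tw=2, m=1000 words, l=10 links
print(store_and_forward(100, 1, 2, 1000, 10))  # 100 + (2000 + 1) * 10 = 20110
print(cut_through(100, 1, 2, 1000, 10))        # 100 + 10 + 2000      = 2110
```

For long messages over many links the pipelining pays off: cut-through grows as O(m + l) while store-and-forward grows as O(m·l).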