CS 258 Parallel Computer Architecture, Lecture 6: Router Design, Fault Tolerance. February 8, 2002. Prof John D. Kubiatowicz. http://www.cs.berkeley.edu/~kubitron/cs258


CS 258 Parallel Computer Architecture

Lecture 6

Router Design
Fault Tolerance

February 8, 2002
Prof John D. Kubiatowicz

http://www.cs.berkeley.edu/~kubitron/cs258


2/8/02 John Kubiatowicz

Slide 6.2

Recall: Deadlock Freedom

• How can it arise?
  – necessary conditions:
    » shared resource
    » incrementally allocated
    » non-preemptible
  – think of a channel as a shared resource that is acquired incrementally
    » source buffer then dest. buffer
    » channels along a route
• How do you avoid it?
  – constrain how channel resources are allocated
  – ex: dimension order
• How do you prove that a routing algorithm is deadlock free?
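As an illustration of the dimension-order example, here is a minimal Python sketch. It assumes a 2D mesh with nodes addressed as (x, y) tuples; the function name `dimension_order_route` is made up for this note and is not from the lecture. The X offset is corrected completely before any Y hop, so a packet never acquires an X channel while holding a Y channel.

```python
def dimension_order_route(src, dst):
    """Return the nodes visited from src to dst on a 2D mesh.

    Assumption: nodes are (x, y) tuples and every neighbouring pair is linked.
    The X offset is fully corrected before the first Y hop.
    """
    x, y = src
    dest_x, dest_y = dst
    path = [(x, y)]
    while x != dest_x:                 # X dimension first
        x += 1 if dest_x > x else -1
        path.append((x, y))
    while y != dest_y:                 # then Y; the route never returns to X
        y += 1 if dest_y > y else -1
        path.append((x, y))
    return path


example_path = dimension_order_route((0, 0), (2, 1))
```

Here `example_path` visits (0, 0), (1, 0), (2, 0), (2, 1): two X hops followed by one Y hop.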


Slide 6.3

Recall: Channel Dependence Graph

[Figure: a 4 x 4 mesh of nodes 00 through 33 with numbered unidirectional channels, and the channel dependence graph derived from it]
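One way to make the channel dependence graph concrete is to build it by brute force for a small network and test it for cycles. The sketch below is only an illustration under stated assumptions: a 4 x 4 mesh with X-then-Y routing and a 4-node clockwise-only ring, with helper names (`xy_route`, `ring_route`, `channel_dependence_graph`, `has_cycle`) invented for this note. A channel is a directed link (u, v); channel A depends on channel B when some route uses B immediately after A.

```python
from itertools import product


def xy_route(src, dst):
    """Dimension-order path on a 2D mesh: X first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def ring_route(src, dst, n=4):
    """Clockwise-only path on an n-node unidirectional ring."""
    path = [src]
    while path[-1] != dst:
        path.append((path[-1] + 1) % n)
    return path


def channel_dependence_graph(nodes, route):
    """Map each channel (u, v) to the channels a packet holding it may request."""
    deps = {}
    for src, dst in product(nodes, repeat=2):
        path = route(src, dst)
        channels = list(zip(path, path[1:]))
        for channel in channels:
            deps.setdefault(channel, set())
        for held, wanted in zip(channels, channels[1:]):
            deps[held].add(wanted)
    return deps


def has_cycle(deps):
    """Depth-first search; reaching a node still on the stack means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {c: WHITE for c in deps}

    def visit(c):
        colour[c] = GREY
        for nxt in deps[c]:
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[c] = BLACK
        return False

    return any(colour[c] == WHITE and visit(c) for c in deps)


mesh_nodes = list(product(range(4), repeat=2))
mesh_acyclic = not has_cycle(channel_dependence_graph(mesh_nodes, xy_route))
ring_acyclic = not has_cycle(channel_dependence_graph(range(4), ring_route))
```

Under these assumptions the mesh graph comes out acyclic while the ring graph contains a cycle, which is why a unidirectional ring needs extra channel resources to break the dependence.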


Slide 6.4

Turn Restrictions in X,Y

• XY routing forbids 4 of 8 turns and leaves no room for adaptive routing

• Can you allow more turns and still be deadlock free?

[Figure: the eight possible turns among the +X, -X, +Y, and -Y directions]


Slide 6.5

Minimal turn restrictions in 2D

[Figure: turn diagrams for three restriction schemes: west-first, north-last, and negative-first; axes -x, +x, +y, -y]


Slide 6.6

Example legal west-first routes

• Can route around failures or congestion
• Can combine turn restrictions with virtual channels
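A west-first route can be checked mechanically. The sketch below assumes a route is written as a string of hop directions ('W', 'E', 'N', 'S'); the function name `is_west_first` is hypothetical. The rule it encodes is that any westward hops must come first, so no turn into the west direction is ever taken, while all other turns remain available for routing around obstacles.

```python
def is_west_first(moves):
    """True if every 'W' hop comes before all 'E', 'N', and 'S' hops.

    Assumption: moves is an iterable of single-letter hop directions.
    """
    seen_non_west = False
    for move in moves:
        if move == "W":
            if seen_non_west:
                return False      # a turn into west after another direction
        else:
            seen_non_west = True
    return True


direct = is_west_first("WWNN")       # west hops first, then north
detour = is_west_first("NNEES")      # adaptive detour with no west hops
illegal = is_west_first("NW")        # north then west is forbidden
```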


Slide 6.7

Adaptive Routing

• R: C x N x Σ -> C
• Essential for fault tolerance
  – at least multipath
• Can improve utilization of the network
• Simple deterministic algorithms easily run into bad permutations
• fully/partially adaptive, minimal/non-minimal
• can introduce complexity or anomalies
• a little adaptation goes a long way!
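To show what "adaptive" adds over a deterministic router, here is a small sketch of a minimal adaptive choice on a 2D mesh. It is a simplification made for this note: the function takes the current node rather than the input channel, the switch state is reduced to a set of busy output directions, and the names `productive_directions` and `adaptive_route` are hypothetical. It does not address deadlock by itself; turn restrictions or virtual channels would still be needed.

```python
def productive_directions(cur, dst):
    """Output directions that move a packet closer to dst on a 2D mesh."""
    dirs = []
    if dst[0] > cur[0]:
        dirs.append("+X")
    if dst[0] < cur[0]:
        dirs.append("-X")
    if dst[1] > cur[1]:
        dirs.append("+Y")
    if dst[1] < cur[1]:
        dirs.append("-Y")
    return dirs


def adaptive_route(cur, dst, busy):
    """Pick the first productive output that is not busy.

    Returns None when the packet has arrived or every productive output is
    busy (the caller must tell these two cases apart).
    """
    for direction in productive_directions(cur, dst):
        if direction not in busy:
            return direction
    return None


free_choice = adaptive_route((0, 0), (2, 2), busy=set())
around_congestion = adaptive_route((0, 0), (2, 2), busy={"+X"})
```

With nothing busy the sketch behaves like dimension order and picks "+X"; when "+X" is busy it uses "+Y" instead of waiting.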


Slide 6.8

Recall: Virtual channels

Packet switches from lo to hi channel


Slide 6.9

Switch Design

• Flow control?
• Routing control?
• Fault tolerance?
• Retransmission?

[Figure: switch block diagram: input ports with receivers and input buffers, a cross-bar, output buffers with transmitters and output ports, and control logic (routing, scheduling)]


Slide 6.10

Other Adaptive Algorithms

• Adaptive in all dimensions
  – Linder and Harden paper
  – Requires 2^(d-1) virtual channels per physical link
• Planar adaptive routing (Chien and Kim)
  – Take 2 dimensions at a time and adapt within them:
    » (X,Y), then (Y,Z), then (Z,W), etc.
  – Many of the advantages of fully adaptive routing, with much less hardware!


Slide 6.11

How do you build a crossbar

[Figure: a crossbar drawn as a grid of inputs I0 to I3 against outputs O0 to O3, and an alternative built from a RAM (labels: phase, addr, Din, Dout)]


Slide 6.12

Input buffered switch

• Independent routing logic per input
  – FSM
• Scheduler logic arbitrates each output
  – priority, FIFO, random
• Head-of-line blocking problem

[Figure: input-buffered switch: input ports with per-input routing logic R0 to R3, cross-bar, output ports, scheduling]
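Head-of-line blocking is easy to reproduce in a toy model. The sketch below assumes each input holds a FIFO of requested output-port numbers and that arbitration uses fixed priority (lowest input index wins); the function name `run_cycle` and the two-input example are made up for illustration.

```python
from collections import deque


def run_cycle(queues):
    """Advance an input-buffered switch by one cycle.

    Each queue holds requested output ports in FIFO order, and only the head
    of each queue may compete. Returns {output port: winning input port}.
    """
    granted = {}
    for inp, queue in enumerate(queues):       # fixed priority: lowest input wins
        if queue and queue[0] not in granted:
            granted[queue[0]] = inp
    for inp in granted.values():
        queues[inp].popleft()                  # winners send their head packet
    return granted


# Input 0 wants output 0. Input 1 wants output 0, then output 1.
queues = [deque([0]), deque([0, 1])]
first_cycle = run_cycle(queues)
```

In the first cycle only output 0 is used. The packet for output 1 sits behind a blocked head at input 1, so output 1 stays idle even though a packet for it is waiting.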


Slide 6.13

Output Buffered Switch

• How would you build a shared pool?

[Figure: output-buffered switch: input ports with routing logic R0 to R3 feeding a buffer at each output port, with control]


Slide 6.14

Example: IBM SP vulcan switch

• Many gigabit ethernet switches use similar design without the cut-through

[Figure: IBM SP Vulcan switch block diagram. Each input port: deserializer (8 to 64 bits), FIFO, CRC check, route control, flow control. Each output port: FIFO, CRC gen, serializer (64 to 8 bits), flow control, XBar arb. The ports are connected by an 8 x 8 crossbar and by a central queue (RAM 64x128) with In Arb and Out Arb.]


Slide 6.15

Output scheduling

• n independent arbitration problems?
  – static priority, random, round-robin
• simplifications due to routing algorithm?
• general case is max bipartite matching

[Figure: input buffers with routing logic R0 to R3 requesting cross-bar outputs O0 to O2]
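The bipartite-matching view can be sketched with the textbook augmenting-path method. This is an illustration only, not a description of any particular switch: it assumes each input may name several acceptable outputs (for example under adaptive routing), and the function name `max_matching` is hypothetical. Real schedulers typically use cheaper approximations.

```python
def max_matching(requests):
    """Maximum matching of inputs to outputs by augmenting paths.

    requests[i] lists the outputs input i could use this cycle.
    Returns {output: input}.
    """
    owner = {}                                   # output -> input holding it

    def try_assign(inp, visited):
        for out in requests[inp]:
            if out in visited:
                continue
            visited.add(out)
            # Take a free output, or move its current owner elsewhere.
            if out not in owner or try_assign(owner[out], visited):
                owner[out] = inp
                return True
        return False

    for inp in range(len(requests)):
        try_assign(inp, set())
    return owner


# Input 0 can use outputs 0 or 1, input 1 only output 0, input 2 outputs 1 or 2.
schedule = max_matching([[0, 1], [0], [1, 2]])
```

In this example a first-come assignment that gives output 0 to input 0 and output 1 to input 2 would leave input 1 idle. The matching instead moves input 0 to output 1 and input 2 to output 2, so all three inputs send.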


Slide 6.16

Stacked Dimension Switches

• Dimension order on 3D cube?
• Cube connected cycles?

[Figure: three stacked 2x2 switches, one per dimension, connecting Host In, Xin, Yin, Zin to Xout, Yout, Zout, Host Out]


Slide 6.17

Flow Control

• What do you do when push comes to shove?
  – ethernet: collision detection and retry after delay
  – FDDI, token ring: arbitration token
  – TCP/WAN: buffer, drop, adjust rate
  – any solution must adjust to output rate
• Link-level flow control

[Figure: a link carrying Data in one direction and a Ready signal in the other]


Slide 6.18

Examples

• Short links
• Long links
  – several flits on the wire

[Figure: source and destination joined by a link with Data, Req, and Ready/Ack signals, and an F/E flag at each end]


Slide 6.19

Smoothing the flow

• How much slack do you need to maximize bandwidth?

[Figure: a buffer between incoming phits and outgoing phits with levels Empty, Low Mark, High Mark, and Full; flow-control symbols Stop and Go are sent back to the sender]
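A rough way to approach the slack question is to count what can still arrive after the receiver sends Stop. The sketch below is a back-of-the-envelope model under stated assumptions, not a figure from the lecture: the Stop symbol takes `link_delay_cycles` to reach the sender, phits already on the wire take the same time to drain, and the sender reacts instantly. The function name `min_slack_phits` is hypothetical.

```python
import math


def min_slack_phits(link_delay_cycles, phits_per_cycle=1):
    """Buffer space needed above the high mark to avoid overflow.

    Assumption: one link delay for Stop to reach the sender plus one link
    delay of phits already in flight, with no sender reaction time.
    """
    return math.ceil(2 * link_delay_cycles * phits_per_cycle)


short_link = min_slack_phits(1)      # a short link, one cycle each way
long_link = min_slack_phits(4)       # a long link with several phits on the wire
```

By the same reasoning, the low mark should leave about one round trip of phits in the buffer so the output stays busy while Go travels back and new data arrives.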


Slide 6.20

Link vs global flow control

• Hot spots
• Global communication operations
• Natural parallel program dependences


Slide 6.21

Reliable Router

• Bidirectional signaling (conserves pins)
  – Current-mode drive
  – Transmitter subtracts its injection of current to compute receiver’s value
• Clock synchronization: plesiochronous
  – Clocks “almost” same frequency at all nodes
  – By sending occasional dummy cells, transmit rate < receive rate
• Deadlock: 5 virtual channels
  – 2 minimal-adaptive (always route toward destination)
  – 2 dimension ordered (i.e. 2 priorities of channel)
  – 1 non-minimal turn-restricted channel
• Error handling
  – “Unique Token protocol”
  – Every flit in two places at a time
  – Flow control maintains this invariant


Slide 6.22

Example: T3D

• 3D bidirectional torus, dimension order (NIC selected), virtual cut-through, packet sw.

• 16 bit x 150 MHz, short, wide, synch.
• rotating priority per output
• logically separate request/response
• 3 independent, stacked switches
• 8 16-bit flits on each of 4 VC in each direction

[Figure: T3D packet formats. Every packet begins with Route Tag, Dest PE, Command; the remaining flits depend on the packet type.]

  Read Req (no cache, cache, prefetch, fetch&inc): Addr 0, Addr 1, Src PE
  Read Resp: Word 0
  Read Resp (cached): Word 0, Word 1, Word 2, Word 3
  Write Req (proc, BLT 1, fetch&inc): Addr 0, Addr 1, Src PE, Word 0
  Write Req (proc 4, BLT 4): Addr 0, Addr 1, Src PE, Word 0, Word 1, Word 2, Word 3
  Write Resp: no further flits
  BLT Read Req: Addr 0, Addr 1, Src PE, Addr 0, Addr 1

  Fields: packet type (3 bits), req/resp (1 bit), command (8 bits)


Slide 6.23

Example: SP

• 8-port switch, 40 MB/s per link, 8-bit phit, 16-bit flit, single 40 MHz clock

• packet sw, cut-through, no virtual channel, source-based routing

• variable packet <= 255 bytes, 31 byte fifo per input, 7 bytes per output, 16 phit links

• 128 8-byte ‘chunks’ in central queue, LRU per output
• run in shadow mode

[Figure: switch board for a 16-node rack, with intra-rack host ports P0 to P15 and inter-rack external switch ports E0 to E15; multi-rack configuration]
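The SP link and queue figures listed above can be cross-checked with a little arithmetic. The sketch assumes one phit moves per clock and that 1 MB means 10**6 bytes; the function name `link_bandwidth_mb_per_s` is made up for this note.

```python
def link_bandwidth_mb_per_s(phit_bits, clock_mhz):
    """Peak link bandwidth in MB/s, assuming one phit per clock cycle."""
    return phit_bits / 8 * clock_mhz


sp_link = link_bandwidth_mb_per_s(phit_bits=8, clock_mhz=40)   # 8-bit phit at 40 MHz
central_queue_bytes = 128 * 8                                  # 128 chunks of 8 bytes
```

An 8-bit phit at 40 MHz gives 40 MB/s, which agrees with the per-link rate quoted on the slide, and the central queue holds 1024 bytes. The same formula applied to a 16-bit link at 150 MHz gives 300 MB/s.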


Slide 6.24

Summary

• Routing algorithms restrict the set of routes within the topology
  – simple mechanism selects turn at each hop
  – arithmetic, selection, lookup
• Deadlock-free if channel dependence graph is acyclic
  – limit turns to eliminate dependences
  – add separate channel resources to break dependences
  – combination of topology, algorithm, and switch design
• Deterministic vs adaptive routing
• Switch design issues
  – input/output/pooled buffering, routing logic, selection logic
• Flow control
• Real networks are a ‘package’ of design choices