Packet Switches

Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute

Packet switches

- In a circuit switch, the path of a sample is determined at connection-establishment time
  - No need for a sample header: the position in the frame is used
- In a packet switch, packets carry a destination field or label
  - Need to look up the destination port on the fly
- Datagram switches
  - Lookup based on the entire destination address (longest-prefix match)
- Cell or label switches
  - Lookup based on VCI or labels
- L2 switches, L3 switches, L4-L7 switches
  - Key difference is in the lookup function (i.e., filtering), not in switching (i.e., not in forwarding)
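
The two lookup styles can be contrasted with a minimal sketch. The table contents, port numbers, and function names below are hypothetical, chosen only for illustration; a real switch would use a trie or TCAM rather than a linear scan.

```python
import ipaddress

def longest_prefix_match(table, dst):
    """Datagram-style lookup: return the port of the most specific prefix containing dst."""
    addr = ipaddress.ip_address(dst)
    best = None  # (prefix length, port) of the best match so far
    for prefix, port in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, port)
    return None if best is None else best[1]

def label_lookup(table, label):
    """Cell/label-style lookup: exact match on a short label (e.g. a VCI)."""
    return table.get(label)

# Hypothetical forwarding state.
routes = {"0.0.0.0/0": 0, "10.0.0.0/8": 1, "10.1.0.0/16": 2}
labels = {17: (3, 42)}  # incoming label -> (output port, outgoing label)

port_a = longest_prefix_match(routes, "10.1.2.3")     # matches the /16
port_b = longest_prefix_match(routes, "192.168.1.1")  # falls back to the default route
```

Both functions return an output port; once that is known, the switching step is the same in either kind of switch.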

Shared Memory Switches

- Dual-ported RAM
- Incoming cells converted from serial to parallel
- Elegant, but memory speeds and port counts don't scale
- Output buffering
  - 100% throughput under heavy load
  - Minimizes buffers
- E.g.: CNET Prelude, Hitachi shared buffer s/w, AT&T GCNS-2000

Shared memory fabrics: more…

- Memory interface hardware is expensive => many "ports" share fewer memory interfaces
  - E.g.: dual-ported memory
- Separate low-speed bus lines for the controller

Shared Medium Switches

- Share a medium (i.e., bus, ring, etc.) instead of memory
- Medium has to be N times as fast
- Address filters and output buffers must run at the medium speed also!
- TDM + round robin
- E.g.: IBM PARIS & plaNET s/w, Fore Forerunner ASX-100, NEC ATOM

Fully Interconnected Switches

- Full interconnections
- Broadcast + address filters
- Multicasting is natural
- Output queuing
- All hardware runs at the same speed => scalable
- Quadratic growth of buffers/filters
- Knockout switch (AT&T) reduced the number of buffers: fixed L (=8) buffers per output + a tournament method to eliminate packets
  - Small residual packet loss rate (1 in a million)
- E.g.: Fujitsu bus matrix, GTE SPANet
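
The knockout loss figure can be checked with a short calculation. This is a sketch under an assumed traffic model that the slide does not state: every input independently sends a cell with probability `load` per slot, and destinations are uniform, so the number of cells aimed at one output is binomial. Cells beyond the first L arriving in a slot are counted as lost.

```python
from math import comb

def knockout_loss(n_ports, load, l_buffers):
    """Fraction of cells lost when each output accepts at most l_buffers cells per slot."""
    p = load / n_ports  # probability that a given input targets this output in a slot
    expected_lost = sum(
        (k - l_buffers) * comb(n_ports, k) * p**k * (1 - p) ** (n_ports - k)
        for k in range(l_buffers + 1, n_ports + 1)
    )
    return expected_lost / load  # normalise by the mean number of cells offered per slot

loss_l8 = knockout_loss(64, 0.9, 8)  # L = 8 as on the slide
loss_l4 = knockout_loss(64, 0.9, 4)  # a smaller L for comparison
```

Under these assumptions, L = 8 keeps the loss below one in a million at 90% load, while a smaller L loses noticeably more.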

Crossbar: "Switched" interconnections

- 2N media (i.e., buses), BUT…
- Use "switches" between each input and output bus instead of broadcasting
- Total number of "paths" required = N + M
- Number of switching points = N x M
- Arbitration/scheduling needed to deal with port contention

Multi-Stage Fabrics

- Compromise between pure time-division and pure space-division
- Attempt to combine advantages of each
  - Lower cost from time-division
  - Higher performance from space-division
- Technique: limited sharing
  - E.g.: Banyan switch
- Features
  - Scalable
  - Self-routing, i.e., no central controller
  - Packet queues allowed, but not required
- Note: multi-stage switches share the "crosspoints", which have now become "expensive" resources…

Multi-stage switches: fewer crosspoints

Issue: output & internal blocking…
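
A rough count shows how much a multi-stage fabric saves. The sketch below assumes an N x N banyan built from log2 N stages of N/2 two-by-two elements, with each element counted as 4 crosspoints; that accounting convention is an assumption, not something stated on the slide.

```python
def crossbar_crosspoints(n):
    """A square crossbar needs one crosspoint per (input, output) pair."""
    return n * n

def banyan_crosspoints(n):
    """log2(n) stages of n/2 two-by-two elements, 4 crosspoints each (n a power of two)."""
    stages = n.bit_length() - 1
    return stages * (n // 2) * 4

small = (crossbar_crosspoints(8), banyan_crosspoints(8))
large = (crossbar_crosspoints(1024), banyan_crosspoints(1024))
```

The crossbar grows as N^2 while the banyan grows as N log N, which is why the saving matters mostly for large N and why blocking appears as the price.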

Banyan Switch Fabric (Contd)

- Basic building block = 2x2 switch, labelled by 0/1
- Can be synchronous or asynchronous
  - Asynchronous => packets can arrive at arbitrary times
  - Synchronous banyan offers TWICE the effective throughput!
- Worst case when all inputs receive packets with the same label

Switch fabric element

- Goal: "self-routing" fabrics
  - Build complicated fabrics from simple elements
- Routing rule: if 0, send packet to upper output, else to lower output
  - If both packets go to the same output, buffer or drop
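
The routing rule for a single element can be sketched as follows. The function name and the choice to drop (rather than buffer) the losing packet are assumptions made for this example.

```python
def route_2x2(packets):
    """Route up to two (bit, payload) packets through one 2x2 element.

    Bit 0 selects the upper output and bit 1 the lower output. If both
    packets want the same output, the first wins and the second is
    dropped (a bufferless element is assumed here).
    """
    outputs = {0: None, 1: None}  # 0 = upper output, 1 = lower output
    dropped = []
    for bit, payload in packets:
        if outputs[bit] is None:
            outputs[bit] = payload
        else:
            dropped.append(payload)
    return outputs[0], outputs[1], dropped

no_conflict = route_2x2([(0, "a"), (1, "b")])
conflict = route_2x2([(1, "a"), (1, "b")])
```

A buffered element would append the losing packet to a queue instead of the `dropped` list.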

Multi-stage Interconnects (MINs): Banyan

- Key: reduce the number of crosspoints in a crossbar
- 8x8 banyan: recursive design
  - Use the first bit to route the cell through the first stage, either to the upper or lower 4x4 network
  - Use the last 2 bits to route the cell through the 4x4 network to the appropriate output port
- Self-routing: output address completely specifies the route through the network (aka digit-controlled routing)
- Simple elements, scalable, parallel routing, elements at same speed
- E.g.: Bellcore Sunshine, Alcatel DN 1100
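
Digit-controlled routing amounts to reading the output address one bit per stage, most significant bit first. A minimal sketch (the function name is illustrative):

```python
def banyan_route(dest, n_stages):
    """Return the per-stage decisions ('upper' or 'lower') for output port dest."""
    decisions = []
    for stage in range(n_stages):
        bit = (dest >> (n_stages - 1 - stage)) & 1  # most significant bit first
        decisions.append("upper" if bit == 0 else "lower")
    return decisions

# 8x8 banyan, output 5 = binary 101: the first bit picks the lower 4x4 network,
# and the last two bits route within that 4x4 network.
path_to_5 = banyan_route(5, 3)
```

No central controller is involved: each element only inspects the bit for its own stage.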

Banyan Fabric: another view…

Banyan

- Simplest self-routing recursive fabric
- Two packets want to go to the same output => output blocking
- Banyan: packets may block even if they want to go to different outputs => internal blocking!
  - Unlike crossbar: because it has fewer crosspoints
- However, feasible non-blocking schedules exist => pre-sort & shuffle packets to get to such non-blocking schedules
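
Internal blocking can be demonstrated on a small model. The sketch below uses an omega (shuffle-exchange) network as a stand-in for the banyan; that choice of topology is an assumption for illustration, since the slides do not fix a wiring. Each stage applies a perfect shuffle to the link number and then lets a 2x2 element overwrite the lowest bit with the next destination bit.

```python
def omega_paths(assignments, n_bits):
    """Map each source port to the list of links it occupies after every stage."""
    n = 1 << n_bits
    paths = {}
    for src, dst in assignments.items():
        pos, links = src, []
        for stage in range(n_bits):
            bit = (dst >> (n_bits - 1 - stage)) & 1
            pos = ((pos << 1) | (pos >> (n_bits - 1))) & (n - 1)  # perfect shuffle
            pos = (pos & ~1) | bit                                 # 2x2 element decision
            links.append(pos)
        paths[src] = links
    return paths

def internal_conflicts(assignments, n_bits):
    """List (src_a, src_b, stage) for packets with different outputs that share a link."""
    paths = omega_paths(assignments, n_bits)
    srcs = sorted(assignments)
    conflicts = []
    for i, a in enumerate(srcs):
        for b in srcs[i + 1:]:
            if assignments[a] == assignments[b]:
                continue  # same destination: that is output blocking, not internal
            for stage in range(n_bits):
                if paths[a][stage] == paths[b][stage]:
                    conflicts.append((a, b, stage))
                    break
    return conflicts

blocked = internal_conflicts({0: 0, 4: 1}, 3)             # different outputs, same first link
clean = internal_conflicts({0: 1, 1: 3, 2: 4, 3: 7}, 3)   # sorted and gap-free inputs
```

In this model the packets from inputs 0 and 4 collide in the first stage even though they head for outputs 0 and 1, while the sorted, gap-free pattern passes without any shared link.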

Non-Blocking Batcher-Banyan

[Figure: a Batcher sorter followed by a self-routing network; example cells with destinations 3, 7, 5, 2, 6, 0, 1, 4 enter at the inputs and are delivered to outputs labelled 000 to 111]

- Fabric can be used as scheduler.
- Batcher-Banyan network is blocking for multicast.

Blocking in Banyan S/ws: Sorting

- Can avoid blocking by choosing the order in which packets appear at input ports
- There is no internal blocking if we can
  - present packets at inputs sorted by output
  - "trap" duplicates (i.e., going to the same o/p port)
  - remove gaps
  - precede banyan with a perfect shuffle stage
- For example: [X, 011, 010, X, 011, X, X, X]
  - Sort => [010, 011, 011, X, X, X, X, X]
  - Trap duplicates => [010, 011, X, X, X, X, X, X]
  - Shuffle => [010, X, 011, X, X, X, X, X]
- Need sort, shuffle, and trap networks
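
The sort, trap, and shuffle steps can be traced in software. This is only a functional sketch: a real fabric does each step with a hardware network, and the function name and the use of `None` for an idle input (X) are choices made for the example.

```python
IDLE = None  # stands for an idle input, written X in the slide

def sort_trap_shuffle(cells):
    """Apply sort, trap-duplicates, remove-gaps, and perfect shuffle to a list of cells."""
    n = len(cells)
    active = sorted(c for c in cells if c is not IDLE)  # sort by output address
    trapped = []
    for c in active:
        if not trapped or trapped[-1] != c:
            trapped.append(c)  # keep one cell per output; later duplicates are trapped
    concentrated = trapped + [IDLE] * (n - len(trapped))  # survivors packed at the top
    half = n // 2
    shuffled = []
    for i in range(half):
        # Perfect shuffle: interleave the first half with the second half.
        shuffled += [concentrated[i], concentrated[half + i]]
    return shuffled

result = sort_trap_shuffle([IDLE, "011", "010", IDLE, "011", IDLE, IDLE, IDLE])
```

With two cells for 011 and one for 010 among eight inputs, the result places 010 on input 0 and 011 on input 2, with every other input idle.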

Sorting using Merging

- Build sorters from merge networks
- Assume we can merge two sorted lists
- Sort pairwise, merge, recurse
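
One way to realise "sort halves, merge, recurse" with fixed compare-exchange elements is Batcher's bitonic sorter. The slide does not say which merge network is meant, so treating it as the bitonic variant is an assumption; the sketch also requires the input length to be a power of two.

```python
def bitonic_merge(a, ascending):
    """Merge a bitonic sequence into sorted order using compare-exchange steps."""
    n = len(a)
    if n <= 1:
        return list(a)
    a = list(a)
    half = n // 2
    for i in range(half):
        # Compare-exchange between element i and its partner half a list away.
        if (a[i] > a[i + half]) == ascending:
            a[i], a[i + half] = a[i + half], a[i]
    return bitonic_merge(a[:half], ascending) + bitonic_merge(a[half:], ascending)

def bitonic_sort(a, ascending=True):
    """Sort by sorting the two halves in opposite directions, then merging."""
    n = len(a)
    if n <= 1:
        return list(a)
    half = n // 2
    first = bitonic_sort(a[:half], True)
    second = bitonic_sort(a[half:], False)
    return bitonic_merge(first + second, ascending)

sorted_cells = bitonic_sort([3, 7, 5, 2, 6, 0, 1, 4])
```

Because the comparison pattern does not depend on the data, the same structure maps directly onto a fixed network of 2x2 comparators.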

Putting together: Batcher-Banyan

Scaling Banyan Networks: Challenges

1. Batcher-banyan networks of significant size are physically limited by the possible circuit density and the number of input/output pins of the integrated circuit. When several boards must be interconnected, interconnection complexity and power dissipation constrain the number of boards that can be connected.

2. The entire set of N cells must be synchronized at every stage.

3. Large sizes make reliability and repairability more difficult.

4. All modifications to maximize the throughput of space-division networks increase the implementation complexity.

Other Non-Blocking Fabrics: Clos Network

- Expansion factor required = 2 - 1/N (but still blocking for multicast)
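
The expansion factor follows from the classical condition for a strictly non-blocking three-stage Clos network. The sketch assumes that N on the slide denotes the number of inputs per first-stage switch (written `n` below) and that there are `r` first-stage and `r` third-stage switches; both readings are assumptions.

```python
def clos_min_middle_switches(n):
    """Middle-stage switches needed for a strictly non-blocking Clos network."""
    return 2 * n - 1

def clos_crosspoints(n, r):
    """Crosspoints with r first-stage (n x m), m middle (r x r), r last-stage (m x n) switches."""
    m = clos_min_middle_switches(n)
    return 2 * r * n * m + m * r * r

expansion = clos_min_middle_switches(4) / 4      # equals 2 - 1/4
clos_64 = clos_crosspoints(8, 8)                 # 64 x 64 switch built as a Clos network
crossbar_64 = 64 * 64
```

For 64 ports the Clos arrangement already needs fewer crosspoints than a single crossbar, and the gap widens as the port count grows.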

Blocking and Buffering

Blocking in packet switches

- Can have both internal and output blocking
  - Internal: no path to output
  - Output: trunk unavailable
- Unlike a circuit switch, cannot predict if packets will block (why?)
- If packet is blocked => must either buffer or drop

Dealing with blocking in packet switches

- Over-provisioning
  - internal links much faster than inputs
- Buffers
  - at input or output
- Backpressure
  - if switch fabric doesn't have buffers, prevent packet from entering until path is available
- Parallel switch fabrics
  - increases effective switching capacity

Blocking in Banyan Fabric

Buffering: where?

- Input
- Output
- Internal
- Re-circulating

Queuing: input, output buffers

Switch Fabrics: Buffered crossbar

- What happens if packets at two inputs both want to go to same output?
- Can defer one at an input buffer
- Or, buffer cross-points: complex arbiter

Queuing: Two basic practical techniques

- Input queueing: usually a non-blocking switch fabric (e.g. crossbar)
- Output queueing: usually a fast bus

Queuing: Output Queueing

- Individual output queues (ports 1, 2, …, N): memory b/w = (N+1).R
- Centralized shared memory (ports 1, 2, …, N): memory b/w = 2N.R
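
The two bandwidth figures come from counting memory accesses per cell time: each individual output queue may receive a cell from all N inputs and sends one, while a centralized memory takes N writes and N reads. A small worked example, with the port count and line rate chosen arbitrarily for illustration:

```python
def individual_output_queue_bw(n_ports, line_rate):
    """Per-queue memory bandwidth: up to N writes plus 1 read per cell time."""
    return (n_ports + 1) * line_rate

def shared_memory_bw(n_ports, line_rate):
    """Shared-memory bandwidth: N writes plus N reads per cell time."""
    return 2 * n_ports * line_rate

# Hypothetical switch: 16 ports at R = 10 Gb/s.
per_queue = individual_output_queue_bw(16, 10)  # in Gb/s
shared = shared_memory_bw(16, 10)               # in Gb/s
```

Either way the required memory speed grows linearly with N, which is the scaling limit of output queueing.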

Output Queuing

Input Queuing

Input Queueing: Head of Line Blocking

[Figure: delay versus load; with head-of-line blocking the delay curve saturates at 58.6% load, short of 100%]
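
The 58.6% limit (2 - sqrt(2) for a large switch) can be approximated by simulation. The sketch assumes saturated FIFO inputs, uniformly random destinations, and a random winner among contending head-of-line cells; the function name and parameters are illustrative.

```python
import random

def hol_saturation_throughput(n_ports, n_slots, seed=0):
    """Estimate per-port throughput of a FIFO input-queued switch under saturation."""
    rng = random.Random(seed)
    # Every input always has a head-of-line cell; store only its destination.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    served = 0
    for _ in range(n_slots):
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)          # each output serves one contender
            hol[winner] = rng.randrange(n_ports)  # the next cell moves to the head
            served += 1
        # Losing inputs keep the same head-of-line cell: that is HOL blocking.
    return served / (n_ports * n_slots)

estimate = hol_saturation_throughput(32, 2000)
limit = 2 - 2 ** 0.5
```

For a finite switch the estimate sits slightly above the asymptotic limit and approaches it as the port count grows.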

Solution: Input Queueing w/Virtual output queues (VOQ)

Head-of-Line (HOL) in Input Queuing

Input Queues: Virtual Output Queues

[Figure: delay versus load; with virtual output queues the switch can be loaded up to 100%]

Input Queueing

- Memory b/w = 2R
- Scheduler: can be quite complex!

Input Queueing: Scheduling

[Figure: inputs 1 to m with arrival processes A1(t) to Am(t), each input holding virtual output queues Q(i,1) to Q(i,n); outputs 1 to n with departure processes D1(t) to Dn(t); in each slot the scheduler must choose a matching M between inputs and outputs]

Input Queueing: Scheduling Example

[Figure: a weighted request graph between inputs 1-4 and outputs 1-4, and a bipartite matching selected from it (weight = 18)]

Input Queueing: Longest Queue First or Oldest Cell First

- Weight = { queue length (LQF), waiting time (OCF) }
- Maximum weight matching => 100% throughput

[Figure: a 4x4 request graph with edge weights of 10 and 1, and the maximum weight matching chosen from it]

Input Queueing: Scheduling

- Maximum size
  - Maximizes instantaneous throughput
  - Does it maximize long-term throughput?
- Maximum weight
  - Can clear most backlogged queues
  - But does it sacrifice long-term throughput?
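
The difference between the two objectives shows up even on a 2x2 switch. The brute-force search below is only a sketch for tiny switches (it tries every permutation); the queue-length matrix is a made-up example, with `q[i][j]` the occupancy of the VOQ at input i for output j.

```python
from itertools import permutations

def max_weight_matching(q):
    """Return (weight, pairs) of a maximum weight matching for a square VOQ matrix."""
    n = len(q)
    best_weight, best_pairs = -1, []
    for perm in permutations(range(n)):
        # Only non-empty VOQs can actually be served.
        pairs = [(i, perm[i]) for i in range(n) if q[i][perm[i]] > 0]
        weight = sum(q[i][j] for i, j in pairs)
        if weight > best_weight:
            best_weight, best_pairs = weight, pairs
    return best_weight, best_pairs

def max_size_matching(q):
    """Maximum size matching: the same search with every non-empty VOQ weighted 1."""
    ones = [[1 if x > 0 else 0 for x in row] for row in q]
    return max_weight_matching(ones)

queues = [[10, 1],
          [1, 0]]
by_weight = max_weight_matching(queues)
by_size = max_size_matching(queues)
```

Here the maximum weight matching serves only the long queue (input 0 to output 0), while the maximum size matching serves two short queues and leaves the backlog of 10 untouched.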

Input Queuing: Why is serving long/old queues better than serving the maximum number of queues?

- When traffic is uniformly distributed, servicing the maximum number of queues leads to 100% throughput.
- When traffic is non-uniform, some queues become longer than others.
- A good algorithm keeps the queue lengths matched, and services a large number of queues.

[Figure: average occupancy per VOQ, shown for uniform traffic and for non-uniform traffic]

Input Queueing: Practical Algorithms

- Maximal size algorithms
  - Wave Front Arbiter (WFA)
  - Parallel Iterative Matching (PIM)
  - iSLIP
- Maximal weight algorithms
  - Fair Access Round Robin (FARR)
  - Longest Port First (LPF)

iSLIP

[Figure: two iterations (#1 and #2) of iSLIP on a 4x4 switch, each showing the Requests, Grant, and Accept/Match steps; grants and accepts are chosen by round-robin selection]

iSLIP: Properties

- Random under low load
- TDM under high load
- Lowest priority to MRU
- 1 iteration: fair to outputs
- Converges in at most N iterations; on average <= log2 N
- Implementation: N priority encoders
- Up to 100% throughput for uniform traffic
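
A single request-grant-accept iteration can be sketched as below. This is a simplified illustration of the round-robin idea, not a full iSLIP implementation: it runs one iteration per slot, and the data layout (`requests[i]` as the set of outputs input i has cells for) is an assumption.

```python
def islip_iteration(requests, grant_ptr, accept_ptr):
    """Run one request-grant-accept iteration and return {input: output}.

    grant_ptr[j] and accept_ptr[i] are the round-robin pointers of output j
    and input i; they are advanced only when a grant is accepted.
    """
    n = len(requests)
    # Grant: each output grants the first requesting input at or after its pointer.
    grants = {}
    for out in range(n):
        for k in range(n):
            inp = (grant_ptr[out] + k) % n
            if out in requests[inp]:
                grants.setdefault(inp, []).append(out)
                break
    # Accept: each input accepts the first granting output at or after its pointer.
    match = {}
    for inp, outs in grants.items():
        for k in range(n):
            out = (accept_ptr[inp] + k) % n
            if out in outs:
                match[inp] = out
                grant_ptr[out] = (inp + 1) % n   # the matched input gets lowest priority next
                accept_ptr[inp] = (out + 1) % n
                break
    return match

requests = [{0, 1}, {0}]          # input 0 wants outputs 0 and 1; input 1 wants output 0
grant_ptr, accept_ptr = [0, 0], [0, 0]
slot_1 = islip_iteration(requests, grant_ptr, accept_ptr)
slot_2 = islip_iteration(requests, grant_ptr, accept_ptr)
```

In the first slot both outputs grant to input 0, so only one match is made; the pointer update then desynchronises the outputs, and the second slot matches both inputs. This is the mechanism behind the TDM-like behaviour under high load.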

iSLIP: Implementation

[Figure: N grant arbiters (1, 2, …, N) feeding N accept arbiters (1, 2, …, N); each arbiter is a programmable priority encoder with an N-bit input, stored state, and a log2 N-bit decision output]

Throughput results

Theory:
- Input queueing (IQ): 58% [Karol, 1987]
- IQ + VOQ, maximum weight matching: 100% [M et al., 1995]
- Different weight functions, incomplete information, pipelining: 100% [Various]
- Randomized algorithms: 100% [Tassiulas, 1998]
- IQ + VOQ, maximal size matching, speedup of two: 100% [Dai & Prabhakar, 2000]

Practice:
- Input queueing (IQ)
- IQ + VOQ, sub-maximal size matching, e.g. PIM, iSLIP
- Various heuristics, distributed algorithms, and amounts of speedup

Speedup: Context

[Figure: a generic switch, with memory on both the input side and the output side of the fabric]

- The placement of memory gives
  - Output-queued switches
  - Input-queued switches
  - Combined input- and output-queued switches

Output-queued switches

- Best delay and throughput performance
  - Possible to erect "bandwidth firewalls" between sessions
- Main problem
  - Requires high fabric speedup (S = N)
- Unsuitable for high-speed switching

Input-queued switches

- Big advantage
  - Speedup of one is sufficient
- Main problem
  - Can't guarantee delay due to input contention
- Overcoming input contention: use higher speedup

The Speedup Problem

- Find a compromise: 1 < Speedup << N
  - to get the performance of an OQ switch
  - close to the cost of an IQ switch
- Essential for high-speed QoS switching

Intuition

Bernoulli IID inputs:
- Speedup = 1: fabric throughput = 0.58
- Speedup = 2: fabric throughput = 1.16; input efficiency = 1/1.16; average input queue = 6.25

Intuition (continued)

Bernoulli IID inputs:
- Speedup = 3: fabric throughput = 1.74; input efficiency = 1/1.74; average input queue = 1.35
- Speedup = 4: fabric throughput = 2.32; input efficiency = 1/2.32; average input queue = 0.75
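
The fabric throughput and input efficiency figures follow a simple pattern: the fabric can carry the head-of-line saturation throughput (0.58) multiplied by the speedup, and the input efficiency is the reciprocal. A small sketch of that arithmetic, with hypothetical function names (the average queue lengths quoted in the slides are not derived here):

```python
HOL_SATURATION = 0.58  # saturation throughput of a FIFO input-queued fabric at speedup 1

def fabric_throughput(speedup):
    """Cells per output per external cell time that the fabric can carry."""
    return speedup * HOL_SATURATION

def input_efficiency(speedup):
    """Reciprocal of the fabric throughput, as quoted in the slides."""
    return 1 / fabric_throughput(speedup)

table = {s: round(fabric_throughput(s), 2) for s in (1, 2, 3, 4)}
```

Once the product exceeds 1, the fabric is no longer the bottleneck and cells queue mainly at the outputs, which is why a modest speedup moves an input-queued switch close to output-queued behaviour.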