IP routers with memory that runs slower than the line rate
Nick McKeown, Assistant Professor of Electrical Engineering and Computer Science, Stanford University
[email protected] http://www.stanford.edu/~nickm


Page 1: IP routers with memory that runs slower than the line rate

High Performance Switching and Routing
Telecom Center Workshop: Sept 4, 1997.

IP routers with memory that runs slower than the line rate

Nick McKeown
Assistant Professor of Electrical Engineering and Computer Science, Stanford University

[email protected]
http://www.stanford.edu/~nickm

Page 2: IP routers with memory that runs slower than the line rate

Outline

• Trends in packet switch design
• Additional problem: “Data rates may soon exceed memory bandwidth”
• The Fork-Join Router & Parallel Packet Switches

Page 3: IP routers with memory that runs slower than the line rate

First Packet Switches: Shared Memory

[Figure: Inputs 1 to N and Outputs 1 to N attached to a single shared memory.]

Large, single dynamically allocated memory buffer:
– N writes per “cell” time
– N reads per “cell” time

Limited by memory bandwidth.

A large body of work has proven and made possible:
– Fairness
– Delay Guarantees
– Delay Variation Control
– Loss Guarantees
– Statistical Guarantees

Page 4: IP routers with memory that runs slower than the line rate

Later Packet Switches: Single-stage crossbar with CIOQ and VOQs

[Figure: Linecards, each with Lookup & Drop Policy, Virtual Output Queues, and Output Scheduling, connected through a bufferless switch core containing the Switch Fabric and Switch Arbitration.]

– 1 write per “cell” time
– 1 read per “cell” time
– Rate of writes/reads determined by switch fabric speedup

Page 5: IP routers with memory that runs slower than the line rate

Myths about CIOQ-based crossbar switches

1. “Input-queued crossbars have low throughput”
– An input-queued crossbar can have as high throughput as any switch.

2. “Crossbars don’t support multicast traffic well”
– A crossbar inherently supports multicast efficiently.

3. “Crossbars don’t scale well”
– Today, it is the number of chip I/Os, not the number of crosspoints, that limits the size of a switch fabric. Expect 5Tb/s crossbar switches.

Page 6: IP routers with memory that runs slower than the line rate

Myths about CIOQ-based crossbar switches (2)

4. “Crossbar switches can’t support delay/QoS guarantees”
– With an internal speedup of 2, a CIOQ switch can (in theory) precisely emulate a shared memory switch for all traffic.

Page 7: IP routers with memory that runs slower than the line rate

What makes sense today?

                  Shared Memory   Input Queued   CIOQ    Multistage
Blocking          No              No             No      Yes
Speedup           High            High           Small   High
Emulation of SM   Yes             No             Yes     No
Multicast         Good            Good           Good    Poor
Resequencing      No              No             No      Yes
Power             Low             OK             OK      High
Packaging         -               OK             OK      Complex

Page 8: IP routers with memory that runs slower than the line rate

Summary of trend

[Figure: Inputs 1 to N and Outputs 1 to N; a Switch Fabric with Switch Arbitration; trend steps numbered 1 and 2.]

– Limited by: memory bandwidth, ~50Gb/s
– Limited by: per-cell arbitration, power, ~5Tb/s
– Higher capacity: multistage (Clos, Banyan, Toroidal, …)
– Less frequent arbitration

Page 9: IP routers with memory that runs slower than the line rate

Buffer Memory: How Fast Can I Make a Packet Buffer?

[Figure: Buffer memory (10ns on-chip DRAM) between an external line (e.g. OC768c) and the switch fabric, connected on each side by a 64-byte wide bus.]

Rough Estimate:
– 10ns per memory operation.
– Two memory operations per packet.
– Therefore, maximum ~26Gb/s.
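The rough estimate can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming each of the two memory operations (one write, one read) moves a single 64-byte bus-wide word; the function name and parameters are illustrative and do not come from the talk.

```python
def max_buffer_rate_gbps(bus_width_bytes=64, op_time_ns=10.0, ops_per_packet=2):
    """Upper bound on packet-buffer throughput, in Gb/s.

    Assumption: every packet costs `ops_per_packet` memory operations
    (a write and a read), each moving one bus-wide word.
    """
    bits_per_transfer = bus_width_bytes * 8
    time_per_packet_ns = op_time_ns * ops_per_packet
    # Bits per nanosecond is numerically the same as Gb/s.
    return bits_per_transfer / time_per_packet_ns


print(max_buffer_rate_gbps())  # about 25.6, matching the ~26Gb/s on the slide
```

Under these assumptions the bound scales linearly with bus width and inversely with memory access time, so a 40Gb/s or 160Gb/s line cannot be buffered by one such memory.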

Page 10: IP routers with memory that runs slower than the line rate

How can we make routers with 40Gb/s, 160Gb/s, … interfaces?

Page 11: IP routers with memory that runs slower than the line rate

Higher capacity and higher linerates

[Figure: Inputs 1 to N and Outputs 1 to N; a Switch Fabric with Switch Arbitration; axes “Higher capacity” and “Higher linerates”; trend steps numbered 1, 2, and 3.]

– Limited by: memory bandwidth, ~50Gb/s
– Limited by: per-cell arbitration, power, ~5Tb/s
– Multistage
– Less frequent arbitration
– More parallelism: Fork-Join Router

Page 12: IP routers with memory that runs slower than the line rate

Fork-Join Router

How can we:
– Increase capacity.
– Reduce power per subsystem.

While at the same time…
– Keep the system simple.
– Support line rates faster than memory bandwidth.
– Provide delay guarantees.

Increase parallelism.
Multiple racks.
Single-stage buffering.
Pkt-by-pkt load balancing.
Hmmm….?

Page 13: IP routers with memory that runs slower than the line rate

The Fork-Join Router

[Figure: N inputs, each at rate R, fork across k routers (1, 2, …, k) and join onto N outputs, each at rate R. Labels: “Router”, “Bufferless”.]

Page 14: IP routers with memory that runs slower than the line rate

The Fork-Join Router

• Advantages
– Single stage of buffering
– k× lower power per subsystem
– k× lower memory bandwidth
– k× lower forwarding table lookup rate

Page 15: IP routers with memory that runs slower than the line rate

The Fork-Join Router

• Questions
– Switching: What is the performance?
– Forwarding Lookups: How do they work?

Page 16: IP routers with memory that runs slower than the line rate

A Parallel Packet Switch

[Figure: N inputs, each at rate R, spread across k Output Queued Switches (1, 2, …, k) and recombined onto N outputs, each at rate R.]

Arriving packet tagged with egress port

Page 17: IP routers with memory that runs slower than the line rate

Performance Questions

1. Can it be work-conserving?
2. Can it emulate a single big output queued switch?
3. Can it support delay guarantees, strict-priorities, WFQ, …?

Page 18: IP routers with memory that runs slower than the line rate

Work Conservation

[Figure: Input 1 and Output 1, each at rate R, connected to k Output Queued Switches (1, 2, …, k) over links of rate R/k. The input-side links are labeled “Input Link Constraint” and the output-side links “Output Link Constraint”.]

Page 19: IP routers with memory that runs slower than the line rate

Work Conservation

[Figure: Input 1 and Output 1, each at rate R, connected to k layers (1, 2, …, k) over links of rate R/k; numbered cells (1 to 5) illustrate the Output Link Constraint.]

Page 20: IP routers with memory that runs slower than the line rate

Work Conservation

[Figure: N inputs and N outputs, each at rate R, connected to k Output Queued Switches (1, 2, …, k) over internal links of rate S(R/k).]
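The link constraint in these figures follows from the link rates: a cell that occupies one cell time on an external line at rate R occupies k/S cell times on an internal link at rate S(R/k). The sketch below is illustrative only, not the algorithm from the talk; the class and method names are hypothetical, and rounding up to whole cell times is an assumption.

```python
import math


def link_busy_slots(k, speedup=1.0):
    """Cell times one internal link stays busy carrying a single cell.

    Assumption: the internal link runs at speedup * (R / k), so a cell
    needs k / speedup cell times, rounded up to a whole cell time.
    """
    return math.ceil(k / speedup)


class InputDemultiplexer:
    """Tracks which of the k layers one input may send to in a cell time."""

    def __init__(self, k, speedup=1.0):
        self.busy_slots = link_busy_slots(k, speedup)
        self.free_at = [0] * k  # first cell time at which each link is idle

    def available_layers(self, slot):
        """Layers whose link from this input is idle at cell time `slot`."""
        return [layer for layer, t in enumerate(self.free_at) if t <= slot]

    def send(self, slot, layer):
        """Start sending a cell to `layer`; its link is then busy for a while."""
        if self.free_at[layer] > slot:
            raise ValueError(f"link to layer {layer} is busy at slot {slot}")
        self.free_at[layer] = slot + self.busy_slots


demux = InputDemultiplexer(k=3)
demux.send(0, 0)
print(demux.available_layers(1))  # layers 1 and 2 only
print(demux.available_layers(3))  # all three layers again
```

With S = 1 a cell sent to one layer rules that layer out for the next k - 1 cell times, so the input has little freedom in choosing layers; raising S shortens the busy period and widens the choice.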

Page 21: IP routers with memory that runs slower than the line rate

Precise Emulation of an Output Queued Switch

[Figure: An N × N Output Queued Switch compared with (“= ?”) an N × N Parallel Packet Switch.]

Page 22: IP routers with memory that runs slower than the line rate

Parallel Packet Switch Theorems

1. If S > 2k/(k+2) ≈ 2 then a parallel packet switch can be work-conserving for all traffic.

2. If S > 2k/(k+2) ≈ 2 then a parallel packet switch can precisely emulate a FCFS output-queued switch for all traffic.

Page 23: IP routers with memory that runs slower than the line rate

Parallel Packet Switch Theorems

3. If S > 3k/(k+3) ≈ 3 then a parallel packet switch can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.
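To see how the speedup thresholds in theorems 1 to 3 behave as the number of layers k grows, the expressions can be tabulated directly. This is a small sketch; the function names are hypothetical, and the internal link rate S(R/k) is taken from the Work Conservation figure.

```python
from fractions import Fraction


def fcfs_speedup_threshold(k):
    """Threshold 2k/(k+2) from theorems 1 and 2 (S must exceed it)."""
    return Fraction(2 * k, k + 2)


def qos_speedup_threshold(k):
    """Threshold 3k/(k+3) from theorem 3 (S must exceed it)."""
    return Fraction(3 * k, k + 3)


def internal_link_rate(line_rate, k, speedup):
    """Rate S * (R / k) of one internal link, in the units of `line_rate`."""
    return speedup * Fraction(line_rate) / k


for k in (2, 4, 10, 100):
    print(k, float(fcfs_speedup_threshold(k)), float(qos_speedup_threshold(k)))
```

Both thresholds stay below their limits of 2 and 3 for every finite k and approach them as k grows. For example, with k = 10 the FCFS threshold is 5/3, so internal links on a 160Gb/s line would have to run faster than (5/3)(160/10) ≈ 26.7Gb/s.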

Page 24: IP routers with memory that runs slower than the line rate

Parallel Packet Switch Theorems

4. If S ≥ 1 then a parallel packet switch with a small co-ordination buffer at rate R can precisely emulate a FCFS switch for all traffic.

Page 25: IP routers with memory that runs slower than the line rate

Co-ordination buffers

[Figure: Inputs and outputs at rate R, with co-ordination buffers of size Nk on the input side and on the output side, connected to k Output Queued Switches (1, 2, …, k) over links of rate R/k.]

Page 26: IP routers with memory that runs slower than the line rate

Parallel Packet Switch Theorems

5. If S > 2 then a parallel packet switch with a small co-ordination buffer at rate R can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.

Page 27: IP routers with memory that runs slower than the line rate

The Fork-Join Router

• Questions
– Switching: What is the performance?
– Forwarding Lookups: How do they work?

Page 28: IP routers with memory that runs slower than the line rate

The Fork-Join Router: Lookahead Forwarding Table Lookups

– Packet tagged with egress port at next router
– Lookup performed in parallel at rate R/k

Page 29: IP routers with memory that runs slower than the line rate

The Fork-Join Router

[Figure: N inputs, each at rate R, fork across k routers (1, 2, …, k) and join onto N outputs, each at rate R.]

• Possibly >100Tb/s aggregate capacity
• Linerates in excess of 100Gb/s