High Speed Router Design

Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
[email protected]
http://www.ecse.rpi.edu/Homepages/shivkuma

Many slides thanks to Nick McKeown (Stanford); also based on slides of S. Keshav (Ensim), Douglas Comer (Purdue), Raj Yavatkar (Intel), Cyriel Minkenberg (IBM Zurich), and Sonia Fahmy (Purdue).
Overview
- Introduction
- Evolution of High-Speed Routers
- High Speed Router Components: lookup algorithms, switching, classification, scheduling
- Multi-Tbps Routers: Challenges & Trends
What do switches/routers look like?
- Access routers (e.g. ISDN, ADSL)
- Core routers (e.g. OC48c POS)
- Core ATM switches
Dimensions, Power Consumption
- Cisco GSR 12416: capacity 160 Gb/s, power 4.2 kW, 6 ft tall x 19 in wide x 2 ft deep
- Juniper M160: capacity 80 Gb/s, power 2.6 kW, 3 ft tall x 19 in wide x 2.5 ft deep
Where high performance packet switches are used
- Enterprise WAN access and enterprise campus switches
- Edge routers
- The Internet core: carrier-class core routers, ATM switches, Frame Relay switches
Where are routers? Ans: Points of Presence (POPs)
[Figure: backbone topology in which POPs 1-8 interconnect customer networks A-F]
Why the Need for Big/Fast/Large Routers?
- POPs with many smaller routers vs. POPs with a few large routers
- Interfaces: price > $200k, power > 400 W
- Space, power, and interface-cost economics
- About 50-60% of interfaces are used for interconnection within the POP
- Industry trend is towards a large, single router per POP
Job of router architect
- For a given set of features: maximize capacity C, subject to constraints on power (P kW) and volume (V m³)
Performance metrics
1. Capacity: "maximize C, s.t. volume < 2 m³ and power < 5 kW"
2. Throughput: maximize usage of expensive long-haul links; trivial with work-conserving output-queued routers
3. Controllable delay: some users would like predictable delay; feasible with output queueing plus weighted fair queueing (WFQ)
Relative performance increase, 1996-2002 (log scale)
- DWDM link speed: x2 / 8 months
- Router capacity: x2.2 / 18 months
- Moore's law: x2 / 18 months
- DRAM access rate: x1.1 / 18 months
- Internet traffic: x2 / year
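These growth rates compound quickly. A small sketch (doubling rates taken from the slide above; the six-year horizon, matching the 1996-2002 span of the chart, is chosen only for illustration) shows how far the curves diverge:

```python
# Hypothetical projection of the growth trends listed above. Only the
# doubling rates come from the slide; starting values are normalized to 1.

def growth(factor: float, period_months: float, years: float) -> float:
    """Relative increase after `years`, if the quantity multiplies by
    `factor` every `period_months` months."""
    return factor ** (years * 12 / period_months)

YEARS = 6
trends = {
    "DWDM link speed (x2 / 8 mo)":     growth(2.0, 8, YEARS),
    "Router capacity (x2.2 / 18 mo)":  growth(2.2, 18, YEARS),
    "Moore's law (x2 / 18 mo)":        growth(2.0, 18, YEARS),
    "DRAM access rate (x1.1 / 18 mo)": growth(1.1, 18, YEARS),
    "Internet traffic (x2 / 12 mo)":   growth(2.0, 12, YEARS),
}
for name, rel in trends.items():
    print(f"{name}: x{rel:.1f} over {YEARS} years")
```

Link speed outgrows DRAM access rate by more than two orders of magnitude over the period, which is the crux of the memory problem on the next slides.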
Alt: Memory Bandwidth (Commercial DRAM)
Memory speed is not keeping up with Moore's Law.
[Figure: DRAM access time (ns) vs. year, 1980-2001, log scale]
- DRAM access time: 1.1x / 18 months
- Moore's Law: 2x / 18 months
- Router capacity: 2.2x / 18 months
- Line capacity: 2x / 7 months
An Example: Packet buffers on a 40 Gb/s router linecard
- Write rate R: one 40 B packet every 8 ns
- Read rate R: one 40 B packet every 8 ns
- 10 Gbits of buffer memory, under a buffer manager
- Use SRAM? (+) Fast enough random access time, but (-) too low density to store 10 Gbits of data
- Use DRAM? (+) High density means we can store the data, but (-) can't meet the random access time
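A back-of-the-envelope check of this slide's numbers. The 10 Gbit figure follows from the common buffer rule of thumb B = RTT x R; the 0.25 s RTT used below is an assumption, not stated on the slide:

```python
# Verify the 8 ns per-packet time and the 10 Gbit buffer size.
# rtt_s = 0.25 is an assumed round-trip time (B = RTT x R rule of thumb).

line_rate_bps = 40e9      # 40 Gb/s linecard
packet_bits = 40 * 8      # minimum-size 40-byte packet
rtt_s = 0.25              # assumed round-trip time

packet_time_ns = packet_bits / line_rate_bps * 1e9
buffer_bits = rtt_s * line_rate_bps

print(f"one 40B packet every {packet_time_ns:.0f} ns")
print(f"buffer size: {buffer_bits / 1e9:.0f} Gbit")
```

So the buffer needs DRAM-like density but a new random access every 8 ns, which neither SRAM nor DRAM alone provides.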
Eg: Problems with Output Queueing
- Output-queued switches are impractical: all N inputs, each at line rate R, may send to the same output at once, so each output's DRAM must sustain an aggregate write rate of NR on top of its reads
- Can't I just use N separate memory devices per output?
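One way to see the impracticality in numbers (port count and line rate below are example values, not from the slide; the (N+1)R figure counts N simultaneous writes plus one read per cell time):

```python
# Per-output memory bandwidth required by an output-queued switch: in the
# worst case all N inputs write a cell to one output while one cell is
# read out, so each output memory must run at roughly (N+1)R.

def oq_memory_bw_gbps(n_ports: int, line_rate_gbps: float) -> float:
    """N writes plus one read per cell time, in Gb/s."""
    return (n_ports + 1) * line_rate_gbps

print(oq_memory_bw_gbps(32, 10.0))  # 330.0 Gb/s per output memory
```

Even with N separate memory devices per output, each device must still accept a cell every cell time, so the aggregate pin bandwidth does not go away.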
Packet processing is getting harder
[Figure: CPU instructions per minimum-length packet since 1996 (log scale, 1 to 1000, years 1996-2001)]
First-Generation IP Routers
- Most Ethernet switches and cheap packet routers
- Shared backplane: CPU, buffer memory, and line interfaces (DMA, MAC) hang off one bus
- Bottleneck can be the CPU, host adaptor, or I/O bus
- What is costly? Bus? Memory? Interface? CPU?
First Generation Routers
- Shared backplane connecting the CPU, route table, buffer memory, and line interfaces (MAC)
- Fixed-length "DMA" blocks or cells, reassembled on the egress linecard
- Fixed-length cells or variable-length packets
- Typically < 0.5 Gb/s aggregate capacity
First Generation Routers: Queueing Structure (Shared Memory)
- A large, single, dynamically allocated memory buffer: N writes and N reads per "cell" time
- Limited by memory bandwidth
- A large body of work has proven and made possible: fairness, delay guarantees, delay-variation control, loss guarantees, statistical guarantees
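The shared-memory bandwidth requirement can be sketched directly (port count and line rate are example values; the buffer must absorb N writes and serve N reads every cell time):

```python
# Aggregate bandwidth of the single shared buffer: every cell time, up to
# N cells arrive (writes) and N depart (reads), so the memory runs at 2NR.

def shared_memory_bw_gbps(n_ports: int, line_rate_gbps: float) -> float:
    """N writes plus N reads per cell time, in Gb/s."""
    return 2 * n_ports * line_rate_gbps

print(shared_memory_bw_gbps(16, 10.0))  # 320.0 Gb/s
```

This 2NR scaling is what limits shared-memory designs to modest port counts and line rates.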
Second-Generation IP Routers
- Line cards gain local buffer memory (DMA, MAC), alongside the shared CPU and buffer memory
- Port-mapping intelligence in line cards
- Higher hit rate in the local lookup cache
- What is costly? Bus? Memory? Interface? CPU?
Second Generation Routers
- Route table on the CPU (slow path); forwarding caches on each line card (MAC + buffer memory)
- Drop policy, or backpressure, plus output-link scheduling per line card
- Typically < 5 Gb/s aggregate capacity
Second Generation Routers: As Caching Became Ineffective
- Full forwarding tables, rather than caches, on each line card
- The route-table CPU becomes an exception processor
Second Generation Routers: Queueing Structure: Combined Input and Output Queueing (CIOQ)
- 1 write and 1 read per "cell" time over the bus
- Rate of writes/reads determined by bus speed
Third-Generation Switches/Routers
- Switched backplane: line cards (MAC + local buffer memory) and a CPU card connect through a fabric rather than a bus
- Third generation switch provides parallel paths (fabric)
- What's costly? Bus? Memory? CPU?
Third Generation Routers
- Forwarding tables move onto the line cards; the routing table stays with the CPU
- Typically < 50 Gb/s aggregate capacity
Third Generation Routers: Queueing Structure
- Arbiter plus switch: 1 write and 1 read per "cell" time
- Rate of writes/reads determined by switch fabric speedup
Third Generation Routers: Queueing Structure with VOQs
- Per-flow/class or per-output queues (VOQs) at the inputs
- Per-flow/class or per-input queues at the outputs
- Flow-control backpressure from outputs to inputs
- 1 write and 1 read per "cell" time; rate determined by switch fabric speedup
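A minimal sketch of the VOQ idea (class and method names are illustrative, not from the slides): each input keeps one FIFO per output, so a cell headed for a busy output never blocks cells behind it bound for other outputs, eliminating head-of-line blocking.

```python
from collections import deque

# Toy virtual-output-queue structure for one input port of an N-output
# switch. The arbiter would call nonempty_outputs() to collect requests
# and dequeue() on the granted output.

class InputPort:
    def __init__(self, n_outputs: int):
        self.voqs = [deque() for _ in range(n_outputs)]

    def enqueue(self, cell, output: int):
        self.voqs[output].append(cell)

    def nonempty_outputs(self):
        """Outputs this input would request in the next arbitration round."""
        return [o for o, q in enumerate(self.voqs) if q]

    def dequeue(self, output: int):
        """Called when the arbiter grants this input access to `output`."""
        return self.voqs[output].popleft()

port = InputPort(n_outputs=4)
port.enqueue("cell-A", output=2)
port.enqueue("cell-B", output=0)
print(port.nonempty_outputs())  # [0, 2]
```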
Third Generation Routers: Limits
- Size-constrained: racks are 19" or 23" wide, ~7 ft tall
- Power-constrained: roughly < 8 kW
- Supply: 100 A to 200 A maximum at 48 V
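The supply figures bound the power budget directly: simple Ohm's-law arithmetic shows the 48 V feed brackets the ~8 kW limit quoted above.

```python
# Power available from a 48 V DC feed at the stated current limits.

def feed_power_kw(amps: float, volts: float = 48.0) -> float:
    """Power delivered by a DC feed, in kilowatts."""
    return amps * volts / 1000.0

for amps in (100, 200):
    print(f"{amps} A at 48 V = {feed_power_kw(amps):.1f} kW")
```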
Fourth Generation: Clustering / Multi-stage
- Switch core and linecards in separate racks, joined by optical links hundreds of feet long
Key: Physically Separating Switch Core and Linecards
- Distributes power over multiple racks
- Allows all buffering to be placed on the linecard, which reduces power
- Places complex scheduling, buffer management, drop policy, etc. on the linecard
Fourth Generation Routers/Switches: The LCS Protocol
- Linecards talk to the switch core over optical links, hundreds of feet long, using the LCS protocol
LCS over a Physical Separation
- 1: Request from the linecard to the switch port
- 2: Grant/credit from the switch scheduler back to the linecard (one RTT after the request)
- 3: Data sent into the switch fabric, carrying a sequence number
- Per-queue counters at the switch port
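The request/grant/data exchange can be sketched as a credit protocol (all class and method names here are hypothetical; the real LCS protocol differs in detail): the linecard requests, the switch port grants only while it has free credits, and each grant releases exactly one cell.

```python
from collections import deque

# Toy credit-based request/grant/data exchange between a linecard and a
# switch port that may be a full round-trip away.

class SwitchPort:
    def __init__(self, credits: int):
        self.credits = credits  # free per-queue slots in the switch core

    def grant(self) -> bool:
        """2: Grant/credit -- succeeds only while credits remain."""
        if self.credits > 0:
            self.credits -= 1
            return True
        return False

class Linecard:
    def __init__(self, port: SwitchPort):
        self.port = port
        self.pending = deque()  # cells awaiting a grant
        self.sent = []

    def request(self, cell):
        """1: Req -- queue a cell and ask the scheduler for a slot."""
        self.pending.append(cell)

    def on_grant(self):
        """3: Data -- send one cell per grant received."""
        if self.pending and self.port.grant():
            self.sent.append(self.pending.popleft())

lc = Linecard(SwitchPort(credits=2))
for c in ("c1", "c2", "c3"):
    lc.request(c)
for _ in range(3):
    lc.on_grant()
print(lc.sent)  # ['c1', 'c2'] -- the third cell waits for a new credit
```

Credits are what let the linecard keep the long optical link full without overrunning the small buffers in the switch core.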
LCS over a Physical Separation: Aligning Cells
[Figure: several linecards at different distances from the switch core; LCS aligns their cells as they arrive at the switch fabric and scheduler]
Fourth Generation Routers/Switches: Queueing Structure
- Linecards: lookup & drop policy, virtual output queues, output scheduling
- Switch core: bufferless switch fabric plus central switch arbitration
- 1 write and 1 read per "cell" time; rate determined by switch fabric speedup
- Typically < 5 Tb/s aggregate capacity