Extreme Networking
Achieving Nonstop Network Operation Under Extreme Operating Conditions

Jon Turner
[email protected]
http://www.arl.wustl.edu/arl
2 - Jonathan Turner - January 15, 2002
Project Overview

Motivation
» data networks have become a mission-critical resource
» networks often subject to extreme traffic conditions
» need to design networks for worst-case conditions
» technology advances making extreme defenses practical

Extreme network services
» Lightweight Flow Setup (LFS)
» Network Access Service (NAS)
» Distributed Tree Service (DTS)

Key router technology components
» Super-Scalable Packet Scheduling (SPS)
» Dynamic Queues with Auto-aggregation (DQA)
» Scalable Distributed Queueing (SDQ)
Extreme Router Architecture

(Diagram: input and output port processors connected through a scalable switch fabric, with a control processor handling system management, route table configuration, and signalling. Each input port performs flow/route lookup — looking up the route or state for reserved flows — and runs distributed queue control for traffic isolation and protection of reserved flows.)
Prototype Extreme Router

(Diagram: an ATM switch core connects multiple ports, each with an input port processor (IPP), output port processor (OPP), Field Programmable Port Extender (FPX), Smart Port Card (SPC), and transmission interface (TI), plus a control processor. The FPX carries a network interface device, a reprogrammable application device, 128 MB of SDRAM, and 4 MB of SRAM. The SPC is an embedded processor card with a system FPGA, 64 MB of memory, and a Pentium with cache, north bridge, and APIC.)
Distributed Queueing

(Diagram: each input maintains a queue per output, feeding the switch fabric; each port runs routing and scheduling. Inputs exchange periodic queue length reports, and the scheduler paces each queue according to its backlog share.)
Is Distributed Queueing Necessary?

ATM switches generally do not do it.
» switch is engineered with small speedup (typically 2:1)
» with well-regulated traffic, do not expect >2:1 overload

Overloads are more likely in IP networks.
» limited route diversity makes congested links common
» route selection not guided by session bandwidth needs
» routing changes cause rapid shifts in traffic
» crude, slow congestion control mechanisms
» no protection from malicious users

Challenges
» prevent congestion while avoiding “underflow”
» scalability - target 1000×10 Gb/s systems
» support fair queueing and reserved flow queueing
Basic Distributed Queueing Algorithm

Goal: avoid switch congestion and output queue underflow.

Let hi(i,j) be input i’s share of the input-side backlog to output j.
» can avoid switch congestion by sending from input i to output j at rate at most LS·hi(i,j), where L is the external link rate and S is the switch speedup

Let lo(i,j) be input i’s share of the total backlog for output j.
» can avoid underflow of the queue at output j by sending from input i to output j at rate L·lo(i,j)
» this works if L(lo(i,1)+···+lo(i,n)) ≤ LS for all i

Let wt(i,j) be the ratio of lo(i,j) to lo(i,1)+···+lo(i,n).
Let rate(i,j) = LS·min{wt(i,j), hi(i,j)}.

Note: the algorithm avoids congestion and, for large enough S, avoids underflow.
» what is the smallest value of S for which underflow cannot occur?
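The rule rate(i,j) = LS·min{wt(i,j), hi(i,j)} can be sketched in plain Python. This is a hypothetical illustration, not the router's actual implementation: the matrix layout, the function name `basic_rates`, and the use of `B[i][j]` for input-side backlogs and `OB[j]` for output-side backlogs are all assumptions made for the example.

```python
def basic_rates(B, OB, L, S):
    """Sketch of the basic distributed queueing allocation.

    B[i][j]  -- input-side backlog at input i destined for output j (assumed layout)
    OB[j]    -- output-side backlog at output j (assumed)
    L        -- external link rate; S -- switch speedup
    Returns rates[i][j] = L*S*min{wt(i,j), hi(i,j)}.
    """
    n = len(B)
    # Input-side backlog to each output j: B(1,j)+...+B(n,j)
    col = [sum(B[i][j] for i in range(n)) for j in range(n)]
    # hi(i,j): input i's share of the input-side backlog to output j
    hi = [[B[i][j] / col[j] if col[j] else 0.0 for j in range(n)] for i in range(n)]
    # lo(i,j): input i's share of the total (input + output side) backlog for j
    lo = [[B[i][j] / (col[j] + OB[j]) if col[j] + OB[j] else 0.0
           for j in range(n)] for i in range(n)]
    rates = [[0.0] * n for _ in range(n)]
    for i in range(n):
        tot = sum(lo[i])  # lo(i,1)+...+lo(i,n)
        for j in range(n):
            wt = lo[i][j] / tot if tot else 0.0  # wt(i,j)
            rates[i][j] = L * S * min(wt, hi[i][j])
    return rates
```

With two inputs each backlogged to a distinct output and no output-side backlog, each input is allocated the full LS toward its output, so no input ever exceeds its speedup budget.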
Stress Test

(Diagram: stress-test traffic pattern; can vary the number of inputs and outputs used, and the length of the “phases”.)
Stress Test Simulation - Min Rates

(Plot: min rate sums lo(1,1), +lo(1,2), +lo(1,3), +lo(1,4), +lo(1,5) versus time, 0–8,000, for speedup = 1.5; annotations mark the critical rate and the first and second phases.)
Stress Test - Actual Rates

(Plot: allocated rate sums rate(1,1), +rate(1,2), +rate(1,3), +rate(1,4), +rate(1,5) versus time, 0–8,000, for speedup = 1.5; annotations mark the critical rate, the first and second phases, and a period of under-use of input bandwidth.)
Stress Test - Input Queue Lengths

(Plot: input queue lengths B(1,1) through B(1,5) versus time, 0–8,000, for speedup = 1.5; input-side backlog for the final output implies underflow.)
Stress Test - Output Queue Lengths

(Plot: output queue lengths B(1) through B(5) versus time, 0–8,000, for speedup = 1.5; a persistent output-side backlog is caused by an earlier dip in the forwarding rate.)
Improving Basic Algorithm

Basic algorithm does not always make full use of available input bandwidth.
» does not reallocate bandwidth that is “sacrificed” by queues that are “output limited”
» extend algorithm to reallocate

Revised rate allocation at input i:
    R = SL
    repeat n times
        let j be the unassigned queue with the smallest ratio hi(i,j)/lo(i,j)
        let wt(i,j) = lo(i,j)/(sum of lo(i,q) for unassigned queues q)
        rate(i,j) = min{R·wt(i,j), SL·hi(i,j)}
        R = R - rate(i,j)

Plus other refinements.
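The revised allocation loop can be sketched for a single input as follows. It is an illustrative sketch under stated assumptions: `hi_i` and `lo_i` hold the hi(i,·) and lo(i,·) vectors for one input, and the tie-breaking and zero-`lo` handling are choices made for the example, not specified by the slides.

```python
def revised_rates_for_input(hi_i, lo_i, L, S):
    """Revised rate allocation at one input i.

    Visits outputs in increasing order of hi(i,j)/lo(i,j), so bandwidth that
    an output-limited queue cannot use (its rate is capped at SL*hi) is
    returned to the pool R and reallocated to later queues.
    """
    n = len(hi_i)
    rate = [0.0] * n
    unassigned = set(range(n))
    R = S * L  # remaining input bandwidth to hand out
    for _ in range(n):
        # Unassigned queue with the smallest hi/lo ratio (zero lo goes last).
        j = min(unassigned,
                key=lambda q: hi_i[q] / lo_i[q] if lo_i[q] else float('inf'))
        tot = sum(lo_i[q] for q in unassigned)
        wt = lo_i[j] / tot if tot else 0.0
        rate[j] = min(R * wt, S * L * hi_i[j])
        R -= rate[j]
        unassigned.remove(j)
    return rate
```

For example, with hi = [0.2, 0.8] and lo = [0.5, 0.5] at speedup 1.5 on a unit link, queue 0 is output-limited at 0.3, and the leftover 1.2 all goes to queue 1 — the full SL = 1.5 is used, where the basic rule would have allocated only 0.3 + 0.75.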
Performance Gain - Allocated Rates

(Plot: allocated rate sums rate(1,1), +rate(1,2), +rate(1,3), +rate(1,4), +rate(1,5) versus time, 0–8,000, for speedup = 1.5; the revised algorithm makes full use of input bandwidth and preallocates bandwidth to idle outputs.)
Performance Gain - Min Rates

(Plot: min rate sums lo(1,1), +lo(1,2), +lo(1,3), +lo(1,4), +lo(1,5) versus time, 0–8,000, for speedup = 1.5; annotation marks the critical rate.)
Worst-Case Min Rate Sums

(Plot: worst-case min rate sums versus number of phases, 2–16, for speedup/input combinations of 1.25 with 2 inputs, 1.5 with 5, 2 with 8, and 2.5 with 10.)
Results for Random Bursty Traffic

(Plot: miss fraction on a log scale, 1e-6 to 1, versus load, 0.50–1.00, for speedups 1.0 through 1.4, with n = 16 and average burst length 10. Lost link capacity is negligible for speedups greater than 1.2.)
Extending for Fair Queueing

Fair queueing gives each flow an equal share of the congested link.
» limits impact of “greedy” users on others
» improves performance of congestion control mechanisms, reducing queueing delays and packet loss

Partial solution
» per-flow queues with a packet scheduler at each output
» provides fairness when there is no significant input-side queueing

Better solution
» per-flow input and output queues
» distributed queueing controls rates of per-output schedulers at the inputs
» bandwidth allocated by number of backlogged queues
Fair Distributed Queueing

Periodic update messages contain information on both backlog and the number of backlogged queues.

(Diagram: each input keeps a separate queue set for each output, feeding the switch fabric; distributed queueing controls the rate of each queue set.)
Fair Distributed Queueing Algorithm

Same objectives as before, plus fairness.
» each backlogged queue gets an equal share of the congested output
» so, allocate bandwidth according to the number of backlogged queues

Let Q(i,j) be the number of backlogged queues at input i for output j.
Let hi(i,j) = Q(i,j)/(Q(1,j)+···+Q(n,j)).
» can avoid switch congestion by ensuring rate(i,j) ≤ LS·hi(i,j)

Let need(j) be the total input-side share of backlog to output j.
Let lo(i,j) = need(j)·Q(i,j)/(Q(1,j)+···+Q(n,j)).
» can avoid underflow by ensuring rate(i,j) ≥ L·lo(i,j)
» this works if L(lo(i,1)+···+lo(i,n)) ≤ LS for all i

Use the same rate allocation as before with the modified lo and hi.
For weighted fair queueing, re-define Q(i,j) to be the total weight of backlogged queues at input i for output j.
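The modified shares follow directly from the backlogged-queue counts. A minimal sketch, assuming a matrix layout `Q[i][j]` and the illustrative name `fair_shares` (neither is from the slides):

```python
def fair_shares(Q, need):
    """Compute the fair-queueing hi(i,j) and lo(i,j) matrices.

    Q[i][j] -- number (or, for weighted fair queueing, total weight) of
               backlogged queues at input i for output j (assumed layout)
    need[j] -- total input-side share of backlog to output j
    """
    n, m = len(Q), len(Q[0])
    # Column sums Q(1,j)+...+Q(n,j)
    colQ = [sum(Q[i][j] for i in range(n)) for j in range(m)]
    # hi(i,j) = Q(i,j) / (Q(1,j)+...+Q(n,j))
    hi = [[Q[i][j] / colQ[j] if colQ[j] else 0.0 for j in range(m)]
          for i in range(n)]
    # lo(i,j) = need(j) * Q(i,j) / (Q(1,j)+...+Q(n,j))
    lo = [[need[j] * hi[i][j] for j in range(m)] for i in range(n)]
    return hi, lo
```

These matrices then feed the same rate allocation as before, so bandwidth is split in proportion to the number of backlogged queues rather than raw backlog.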
Summary

Growing reliance on data networks creates higher expectations - reliability, consistent performance.
» design for worst-case - constructive paranoia
» extreme defenses can be practical

Distributed queueing is a key component of scalable extreme routers.
» with small speedup, prevents congestion (always) and underflow (almost always) while ensuring fairness (mostly)
» increases latency and complexity

Current reconfigurable hardware capabilities
» 67K elementary logic cells (LUT+FF) plus 2.5 Mb of SRAM
» over 1K I/O pads, high speed I/Os (>500 MHz)
» enables experimental implementation of complex features