Network Algorithms: Techniques for Design and Analysis

Nick McKeown, Depts of EE and CS, Stanford University
Balaji Prabhakar, Depts of EE and CS, Stanford University
Ashish Goel, Dept of CS, USC

ACM SIGCOMM 2001, San Diego, CA
2
Overview and Objectives
• Algorithm design is a classical subject
– beginnings in early computations (multiplication, division, etc.)
– and has become more sophisticated and mature in the computer age

• The subject has grown
– when tasks have to be performed under more stringent conditions
– or, when new tasks have to be performed
– (typically, it’s a combination of both reasons)
3
Algorithms for Networks
• Networking provides a rich new context for algorithm design
– algorithms are used everywhere in networks
– at the end-hosts for packet transmission
– in the network: switching, routing, caching, etc.
– many new scenarios, and very stringent constraints
– high speed of operation
– large-sized systems
– cost of implementation
– require new approaches and techniques
4
Methods
• Algorithm analysis
– sheds light on the complexity of an algorithm (time, space, resource)
– uses discrete math and has many standard methods
– implementors care about algorithm complexity
– (needs cautious interpretation: the metrics of the implementor and of the theoretician are not necessarily the same)

• In the networking context
– we also need to understand the “performance” of an algorithm: how well does a network or a component that uses a particular algorithm perform, as perceived by the user?
– performance analysis is concerned with metrics like delay, throughput, loss rates, etc.
– this requires continuous math methods: e.g. queueing theory
5
Recent Algorithm Design Methods
• Motivated by the desire
– for simple implementations
– and for robust performance (because operating conditions change and are unknown, or because of security reasons)

• Several new methods of algorithm design can be used in the networking context
– randomized algorithms
– approximate algorithms
– genetic algorithms
– online algorithms
– combinatorial optimization techniques
6
Performance Analysis Methods
• There are many: some classical, some new
– standard queueing theory (assuming input distributions are known, can we say what delays and buffer occupancies will be like?)
– fluid models (very simple and useful for determining throughput regions)
– adversarial analysis (useful for worst-case analyses: your worst enemy is generating traffic to beat your algorithm)
– competitive analysis (useful for comparing two different algorithms on the same inputs – like a competition)
7
In this tutorial…
• We will consider a number of problems in networking
• Show various methods for algorithm design and for performance analysis
• Nick McKeown– Switch scheduling algorithms
• Balaji Prabhakar– Randomized algorithms
• Ashish Goel– Competitive analysis of approximate algorithms
8
Disclaimers
• This tutorial is idiosyncratic
– we talk about things we know well: the subject is larger
– we also talk about things we’ve worked on (hopefully, these are also things we know well)

• Your participation is essential
– please don’t hesitate to ask for clarifications
– there is a lot of material and some of it is not easy
– there is no such thing as a stupid question

• References
– are included for each topic, but they are not exhaustive (please don’t be upset if we didn’t cite your paper)
– if you need more details, please drop us a note
Switch Scheduling Algorithms
10
Scheduling crossbar switches to achieve 100% throughput
• Background to the problem
• Techniques and algorithms
11
Background to switch scheduling
1. [Karol et al. 1987] Throughput limited to 2 − √2 ≈ 58% by head-of-line (HOL) blocking for Bernoulli IID uniform traffic.
2. [Tamir 1989] Observed that with “Virtual Output Queues” (VOQs), head-of-line blocking is reduced and throughput goes up.
12
History of the theory
3. [Anderson et al. 1993] Observed analogy to maximum size matching in a bipartite graph.
4. [McKeown et al. 1995] (a) A maximum size match cannot guarantee 100% throughput. (b) But a maximum weight match can – O(N³).
5. [Mekkittikul and McKeown 1998] A carefully picked maximum size match – an O(N^2.5) matching – can give 100% throughput.
13
History of the theory (2): Speedup
5. [Chuang, Goel et al. 1997] Precise emulation of a central shared memory switch is possible with a speedup of two and a “stable marriage” scheduling algorithm.
6. [Prabhakar and Dai 2000] 100% throughput possible for maximal matching with a speedup of two.
14
History of the theory (3): Newer approaches
7. [Tassiulas 1998] 100% throughput possible for simple randomized algorithm with memory.
8. [Giaccone et al. 2001] “Apsara” algorithms.
9. [Iyer and McKeown 2000] Parallel switches can achieve 100% throughput and emulate an output queued switch.
10. [Chang et al. 2000] A 2-stage switch with a TDM scheduler can give 100% throughput.
15
Scheduling crossbar switches to achieve 100% throughput
1. Basic switch model.
2. When traffic is uniform. (Many algorithms…)
3. When traffic is non-uniform, but the traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When the traffic matrix is not known.
• Technique: Lyapunov function.
5. When the algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When the algorithm does not complete.
• Technique: Randomized algorithm.
7. When there is speedup.
• Technique: Fluid model.
8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.
16
Basic Switch Model
[Figure: N×N input-queued crossbar. Arrivals A_1(n) … A_N(n) are demultiplexed into VOQs with occupancies L_11(n) … L_NN(n); the crossbar configuration S(n) connects inputs to outputs, producing departures D_1(n) … D_N(n).]
17
Some definitions
1. Traffic matrix: Λ = [λ_ij], where λ_ij := E[A_ij(n)].
   If Σ_i λ_ij < 1 and Σ_j λ_ij < 1, we say the traffic is “admissible”.
2. Service matrix: S(n) = [s_ij(n)], where s_ij ∈ {0, 1} and S is a permutation matrix.
3. Queue occupancies: L_ij(n), i.e. L_11(n), …, L_NN(n).
18
Some possible performance goals
When traffic is admissible:
1. Work conservation.
2. “100% throughput”.
3. L_ij(n) < C, ∀n.
4. E[L_ij(n)] < C.
5. lim_{n→∞} D_ij(n)/n = lim_{n→∞} A_ij(n)/n = λ_ij.
6. Other metrics…?
19
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.
2. When traffic is uniform. (Many algorithms…)
3. When traffic is non-uniform, but the traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When the traffic matrix is not known.
• Technique: Lyapunov function.
5. When the algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When the algorithm does not complete.
• Technique: Randomized algorithm.
7. When there is speedup.
• Technique: Fluid model.
8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.
20
Algorithms that give 100% throughput for uniform traffic
• Quite a few algorithms give 100% throughput when traffic is uniform1
• For example:– Maximum size bipartite match.– TDM and a few variants– Wait-until-full– iSLIP
1. “Uniform”: the destination of each cell is picked independently and uniformly at random (uar) from the set of all outputs.
21
Maximum size bipartite match
• Intuition: maximizes instantaneous throughput
• Gives 100% throughput for uniform traffic.
[Figure: the request graph (edges for VOQs with L_11(n) > 0, …, L_N1(n) > 0) and the corresponding maximum size bipartite match.]
22
Network flows and bipartite matching
Finding a maximum size bipartite matching is equivalent to solving a network flow problem
with capacities and flows of size “1”.
[Figure: the network-flow formulation – source s, inputs A–F, outputs 1–6, sink t; every edge has capacity 1.]
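To make the equivalence concrete, here is a minimal sketch (ours, not from the tutorial) of maximum size bipartite matching via augmenting paths – the unit-capacity special case of the Ford-Fulkerson method. The adjacency-list encoding of the request graph is an illustrative assumption.

def max_size_matching(adj, n_outputs):
    # adj[u] lists the outputs requested by input u
    match_of_output = [None] * n_outputs    # output j -> matched input

    def augment(u, visited):
        # try to match input u, re-matching earlier inputs if needed
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            if match_of_output[v] is None or augment(match_of_output[v], visited):
                match_of_output[v] = u
                return True
        return False

    size = sum(augment(u, set()) for u in range(len(adj)))
    return size, match_of_output

# Example: inputs A..F encoded as 0..5, outputs 1..6 as 0..5
print(max_size_matching([[0, 1], [0, 2], [1], [3, 4], [3], [4, 5]], 6))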
23
Network flows and bipartite matching
Ford-Fulkerson method. [Figure: residual graph for the first three augmenting paths.]
24
Network flows and bipartite matching
[Figure: residual graph for the next two paths.]
25
Network flows and bipartite matching
[Figure: residual graph for the augmenting path.]
26
Network flows and bipartite matching
[Figure: residual graph for the last augmenting path.]
27
Network flows and bipartite matching
[Figure: the maximum flow graph.]
28
Network flows and bipartite matching
[Figure: the resulting maximum size matching.]
29
Aside: Maximal Matching
• A maximal matching is one in which each edge is added one at a time, and is not later removed from the matching.
• i.e. no augmenting paths allowed (they remove edges added earlier).
• No input and output are left unnecessarily idle.
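For contrast with the maximum size match, a minimal sketch (ours) of a greedy maximal matching: one linear scan over the request edges, no augmenting paths.

def maximal_matching(edges):
    used_in, used_out = set(), set()
    match = []
    for i, j in edges:                     # any fixed scan order
        if i not in used_in and j not in used_out:
            match.append((i, j))           # an edge added once is never removed
            used_in.add(i)
            used_out.add(j)
    return match

# Example: the maximum matching here has size 3 ((0,1), (1,0), (2,2)),
# but the greedy scan may return only 2 edges -- at least half, as the
# next aside notes.
print(maximal_matching([(0, 0), (0, 1), (1, 0), (2, 2)]))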
30
Aside: Example of Maximal Size Matching
[Figure: side-by-side example on the same request graph (inputs A–F, outputs 1–6) – a maximal matching on the left, a maximum matching on the right.]
31
Aside: Maximal Matchings
• In general, maximal matching is much simpler to implement, and has a much faster running time.
• A maximal size matching is at least half the size of a maximum size matching.
• A maximal weight matching is defined in the obvious way.
• A maximal weight matching is at least half the weight of a maximum weight matching.

End of aside
32
Algorithms that give 100% throughput for uniform traffic
• Quite a few algorithms give 100% throughput when traffic is uniform1
• For example:– Maximum size bipartite match.– TDM and a few variants– Wait-until-full
1. “Uniform”: the destination of each cell is picked independently and uniformly at random (uar) from the set of all outputs.
33
TDM Scheduling Algorithm
If arriving traffic is i.i.d. with destinations picked uar across outputs, then a “TDM” schedule gives 100% throughput.
[Figure: the N = 4 TDM schedule – three successive permutations, each connecting inputs A–D to outputs 1–4 in a rotating pattern.]
Variation 1: if permutations are picked uar from the set of N! permutations, this too will give 100% throughput.
Variation 2: if permutations are picked uar from the TDM permutations above, this too will give 100% throughput.
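A minimal sketch (ours) of the TDM schedule: in slot n, input i connects to output (i + n) mod N, so every input-output pair is served exactly once every N slots. The particular rotation direction is an arbitrary choice.

def tdm_permutation(n, N):
    # input i -> output (i + n) mod N
    return {i: (i + n) % N for i in range(N)}

for n in range(4):
    print(n, tdm_permutation(n, 4))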
34
A Simple wait-until-full algorithm
The following algorithm is believed to be stable for Bernoulli i.i.d. uniform arrivals:
1. If any VOQ is empty, do nothing (i.e. serve no queues).
2. If no VOQ is empty, pick a permutation uar across either (TDM permutations, or all permutations).
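A minimal sketch (ours) of the wait-until-full rule, drawing from the TDM permutations; voq[i][j] is an assumed occupancy matrix.

import random

def wait_until_full_schedule(voq):
    N = len(voq)
    if any(voq[i][j] == 0 for i in range(N) for j in range(N)):
        return None                        # some VOQ empty: serve no queues
    shift = random.randrange(N)            # pick one of the N TDM permutations uar
    return {i: (i + shift) % N for i in range(N)}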
35
Some simple algorithms that achieve 100% throughput
36
Some observations
• A maximum size match (MSM) maximizes instantaneous throughput.
• But an MSM is complex – O(N^2.5).
• It turns out that there are many simple algorithms that give 100% throughput for uniform traffic.
• So what happens if the traffic is non-uniform?
37
Why doesn’t maximizing instantaneous throughput give 100% throughput for non-uniform traffic?
Three possible matches, S(n): the example has flows with arrival rates λ_11 = λ_12 = λ_21 = 1/2 (each minus a small δ).

Assume that at time n, L_11(n) > 0 and L_12(n) > 0, and that both competing VOQs have arrivals (w.p. (1/2)²); input 1 is then serviced w.p. 1/2 on a tie. Bounding the total rate at which input 1 can be served (the slide’s algebra was lost in extraction) shows it falls short of input 1’s total arrival rate. And so if δ < 0.0358, the switch is not stable (throughput < 100%).
38
Simulation of simple 3x3 example
39
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.2. When traffic is uniform (Many algorithms…)3. When traffic is non-uniform, but traffic matrix is known
• Technique: Birkhoff-von Neumann decomposition.
4. When matrix is not known.• Technique: Lyapunov function.
5. When algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When algorithm does not complete.• Technique: Randomized algorithm.
7. When there is speedup.• Technique: Fluid model.
8. When there is no algorithm.• Technique: 2-stage load-balancing switch.• Technique: Parallel Packet Switch.
40
Example 1: (Trivial) scheduling to achieve 100% throughput
• Assume we know the traffic matrix, and the arrival pattern is deterministic:
• Then we can simply choose:
• Q: What is Lij(n)?
Λ = [1 0 … 0; 0 1 … 0; …; 0 0 … 1]   (one cell per slot from input i to output i),

S(n) = Λ, ∀n.
41
Example 2: With random arrivals, but known traffic matrix
• Assume we know the traffic matrix, and the arrival pattern is random:
• Then we can simply choose:
• Q: Does L_ij(n) = 0 for all n?
• Q: In general, if we know Λ, can we pick a sequence S(n) to achieve 100% throughput?
Λ = [1/2 1/2 0 0; 1/2 1/2 0 0; 0 0 1 0; 0 0 0 1],

S(odd) = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1],   S(even) = [0 1 0 0; 1 0 0 0; 0 0 1 0; 0 0 0 1].
42
Birkhoff - von Neumann Decomposition
Intuitively, we can pick some set of constants (a_1, …, a_r) and a set of service matrices (M_1, …, M_r) such that:

  Λ ≤ Σ_{i=1…r} a_i M_i   (element by element).

Then pick the sequence of service matrices:

  S(n) = (M_1, M_2, M_2, M_3, …, M_1, …),

so that the # of occurrences of M_i in a period T is ⌈T·a_i⌉. So, over the period,

  (1/T) Σ_n S(n) − Λ ≥ 0:

in other words, the departure rate exceeds the arrival rate.

It turns out that any admissible Λ can always be decomposed into a linear (convex) combination of permutation matrices (M_1, …, M_r), by the Birkhoff-von Neumann theorem.
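The decomposition can be computed greedily. A minimal brute-force sketch (ours, workable only for small N) that peels permutations off a doubly stochastic matrix:

from itertools import permutations

def bvn_decompose(L, eps=1e-9):
    L = [row[:] for row in L]              # don't mutate the caller's matrix
    N = len(L)
    terms = []
    while True:
        best = None
        for perm in permutations(range(N)):
            a = min(L[i][perm[i]] for i in range(N))
            if a > eps and (best is None or a > best[0]):
                best = (a, perm)
        if best is None:
            return terms                   # nothing positive left to peel off
        a, perm = best
        terms.append((a, perm))
        for i in range(N):
            L[i][perm[i]] -= a

print(bvn_decompose([[0.75, 0.25], [0.25, 0.75]]))
# -> [(0.75, (0, 1)), (0.25, (1, 0))]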
43
In practice…
• Unfortunately, we usually don’t know the traffic matrix Λ a priori, so we can:
– Measure or estimate Λ, or
– Not use Λ.
• In what follows, we will assume we don’t know or use Λ.
44
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.
2. When traffic is uniform. (Many algorithms…)
3. When traffic is non-uniform, but the traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When the traffic matrix is not known.
• Technique: Lyapunov function.
5. When the algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When the algorithm does not complete.
• Technique: Randomized algorithm.
7. When there is speedup.
• Technique: Fluid model.
8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.
45
When the traffic matrix is not known
1. We will try and find conditions for which, roughly:
   E[L_ij(n+1) − L_ij(n) | L_ij(n)] < 0, whenever L_ij(n) > 0,
   i.e. E[L_ij(n) − S_ij(n) + A_ij(n) | L_ij(n)] < L_ij(n).
2. In other words, there is an expected downward drift in the occupancy of each queue.
3. This is an example of a Lyapunov function.
4. It is known that if
   E[V{L(n+1)} − V{L(n)} | L(n)] ≤ −ε < 0
   for some V{.}, then E[L_ij(n)] < ∞, ∀ i, j.
5. The same result holds if:
   E[V{L(n+1)} − V{L(n)} | L(n)] ≤ c − k‖L(n)‖.
46
Some additional definitions
Arrivals:   A(n) = [A_ij(n)], an N×N matrix with A_ij(n) ∈ {0, 1}.
Arrival rate:   Λ = [λ_ij], with λ_ij = E[A_ij(n)].
Service:   S(n) = [S_ij(n)], s.t. S(n) is a permutation matrix, i.e. S_ij ∈ {0, 1}, Σ_i S_ij = 1, Σ_j S_ij = 1.
Evolution:   L_ij(n+1) = L_ij(n) − S_ij(n) + A_ij(n), if L_ij(n) > 0; L_ij(n+1) = A_ij(n) otherwise.
Approx evolution:   L̃_ij(n+1) = L̃_ij(n) − S_ij(n) + A_ij(n).
47
Some facts that we’ll use
1. Λ is doubly sub-stochastic, i.e. Σ_i λ_ij ≤ 1 and Σ_j λ_ij ≤ 1. Let C be the set of such matrices:
   C = {Λ = [λ_ij] : λ_ij ≥ 0, Σ_i λ_ij ≤ 1, Σ_j λ_ij ≤ 1}.
2. C is a closed (convex) set: e.g. if Λ_1, Λ_2 ∈ C, then aΛ_1 + bΛ_2 ∈ C for a + b = 1, a, b ≥ 0.
3. Extreme points of C: the extreme points are elements of C that are not linear combinations of other elements.
4. Birkhoff’s Theorem: The extreme points of C are the permutation matrices.
48
Some more facts that we’ll use
Consider the following linear programming problem:

  Find:  max L(n)^T · Λ   s.t. Λ ∈ C,

where L(n)^T · Λ denotes Σ_ij L_ij(n) λ_ij. We know that the solution is an extreme point of C, i.e. a permutation matrix:

  max_{Λ ∈ C} (L(n)^T Λ) = max_{S(n)} (L(n)^T S(n)).

Q: So what is max_{S(n)} (L(n)^T S(n))?
49
Maximum weight matching
[Figure: the N×N switch with VOQs L_11(n) … L_NN(n), arrivals A_1(n) … A_N(n) and departures D_1(n) … D_N(n); beside it, the request graph weighted by the L_ij(n) and its maximum weight match S*(n).]

  S*(n) = arg max_{S(n)} (L(n)^T S(n))
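A minimal brute-force sketch (ours) of the maximum weight match; real schedulers use polynomial assignment algorithms, but for illustration an exhaustive search over permutations is enough:

from itertools import permutations

def mwm_schedule(L):
    N = len(L)
    # S*(n) = argmax over permutations of sum_i L[i][perm[i]]
    return max(permutations(range(N)),
               key=lambda perm: sum(L[i][perm[i]] for i in range(N)))

print(mwm_schedule([[3, 0, 1], [0, 2, 0], [1, 0, 4]]))   # -> (0, 1, 2)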
50
Outline of Proof
1. We know that if we pick
   S*(n) = arg max_{S(n)} (L(n)^T S(n)),
   then (L(n)^T λ) − (L(n)^T S*(n)) < 0.
2. Next we use this fact to show that:
   E[L(n+1)^T L(n+1) − L(n)^T L(n) | L(n)] ≤ c − ε‖L(n)‖,
   where V{L(n)} = L(n)^T L(n) is our Lyapunov function.
3. Hence, if ‖L(n)‖ is large enough, there is an expected single-step downward drift in occupancy, and so E[‖L(n)‖] < ∞ and 100% throughput is achieved.

For more details, see the reference.
51
Choosing the weight
Q: Do we need to choose edge weights w_ij(n) = L_ij(n)?

Fact 1: If we choose w_ij(n) = [L_ij(n)]^x (e.g. [L_ij(n)]², [L_ij(n)]³, …), the same Lyapunov method can be used to show that 100% throughput is achieved.
Fact 2: (partially lost in extraction; it relates the weights [L_ij(n)]^x and the expectations E[L_ij(n)].)
Observation: Theory suggests that if w_ij(n) = [L_ij(n)]^x, then the switch becomes “more stable” as we increase x. Simulation suggests that average delay decreases as we increase x.
Fact 3: If w_ij(n) is defined to be the time that the HOL cell has been in queue ij, then 100% throughput is achieved.
Fact 4: If w_ij(n) = Σ_j L_ij(n) + Σ_i L_ij(n) (the total occupancy of input i plus output j), then 100% throughput is achieved. This is called a “Longest Port First (LPF)” match, and (surprisingly) is also a maximum size match.
52
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.
2. When traffic is uniform. (Many algorithms…)
3. When traffic is non-uniform, but the traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When the traffic matrix is not known.
• Technique: Lyapunov function.
5. When the algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When the algorithm does not complete.
• Technique: Randomized algorithm.
7. When there is speedup.
• Technique: Fluid model.
8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.
53
100% throughput with pipelining
In practice, switch schedulers are often pipelined. So what happens if the pipeline uses out-of-date information?

1. Define the out-of-date occupancy at time n:
   L̂_ij(n) = L_ij(n − k),
   where k is how out-of-date the information is.
2. Because |L̂_ij(n) − L_ij(n)| ≤ k, it can be shown that:
   E[L(n+1)^T L(n+1) − L(n)^T L(n) | L(n)] ≤ c − ε‖L(n)‖ + 2Nk   (additional term).
3. As before, if ‖L(n)‖ is large enough, there is an expected single-step downward drift in occupancy, and so E[‖L(n)‖] < ∞ and 100% throughput is achieved.

Q: If we increase k, will average delay increase?
54
100% throughput with incomplete information
In practice, the bandwidth of state information to/from and within a switch scheduler is limited. So what happens if the scheduler uses fewer bits to store the weight information?

1. Define the noisy information at time n:
   L̂_ij(n) = L_ij(n) + e(n),
   where e(n) is an error term.
2. If |e(n)| ≤ C, ∀n, where C is some constant, then 100% throughput is achieved.
55
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.
2. When traffic is uniform. (Many algorithms…)
3. When traffic is non-uniform, but the traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When the traffic matrix is not known.
• Technique: Lyapunov function.
5. When the algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When the algorithm does not complete.
• Technique: Randomized algorithm.
7. When there is speedup.
• Technique: Fluid model.
8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.
56
Achieving 100% when algorithm does not complete
Randomized algorithms:1. Basic idea (Tassiulas)2. Reducing delay (Shah, Giaccone and
Prabhakar)
Note: Balaji Prabhakar will cover randomized scheduling algorithms in detail in the next section of the tutorial.
57
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.
2. When traffic is uniform. (Many algorithms…)
3. When traffic is non-uniform, but the traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When the traffic matrix is not known.
• Technique: Lyapunov function.
5. When the algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When the algorithm does not complete.
• Technique: Randomized algorithm.
7. When there is speedup.
• Technique: Fluid model.
8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.
58
Speedup and Combined Input Output Queueing (CIOQ)
[Figure: the N×N CIOQ switch – VOQs L_11(n) … L_NN(n) at the inputs, crossbar configuration S(n), output queues, and departures D_1(n) … D_N(n).]

• With speedup s, the matching is performed s times per cell time, and up to s cells are removed from each VOQ.
• Therefore, output queues are required.
59
Fluid model
• Fluid models are used to obtain stability regions for discrete-time stochastic networks.
• They apply to any traffic that satisfies a strong law of large numbers, i.e.

  lim_{n→∞} A_ij(n)/n = λ_ij.

• The fluid model “washes out” the packet structure, yet can still prove stability results.
60
Fluid Model
Switch evolution:

  L_ij(n) = L_ij(0) + A_ij(n) − D_ij(n),
  D_ij(n) = Σ_{S^m ∈ S} S^m_ij T_m(n),

where T_m(n) is the cumulative time that permutation S^m has been used by slot n, and Σ_m T_m(n) = n.

Fluid equations in continuous time:

  L̄_ij(t) = L̄_ij(0) + λ_ij·t − D̄_ij(t),
  D̄_ij(t) = Σ_{S^m ∈ S} S^m_ij T̄_m(t),   with Σ_m T̄_m(t) = t.
61
Fluid Model
Sketch of proof (Dai and Prabhakar): If the match S* is a maximal match, then for each phase k ∈ {1, …, s} of time slot n, the following must hold whenever L_ij(n + k/s) > 0:

1. Σ_{j'} S*_{ij'}(n + k/s) = 1,   and/or
2. Σ_{i'} S*_{i'j}(n + k/s) = 1.

In other words, either the input must be served and/or the output must be served. From this, it can be shown that with a speedup of 2 the switch is stable, and 100% throughput is achieved.

For more details, see the reference.
62
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.
2. When traffic is uniform. (Many algorithms…)
3. When traffic is non-uniform, but the traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When the traffic matrix is not known.
• Technique: Lyapunov function.
5. When the algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When the algorithm does not complete.
• Technique: Randomized algorithm.
7. When there is speedup.
• Technique: Fluid model.
8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.
63
2-stage switch and no scheduler
Motivation:
1. If traffic is uniformly distributed, then even a TDM schedule gives 100% throughput.
2. So why not force non-uniform traffic to be uniformly distributed?
64
2-stage switch and no scheduler
[Figure: the 2-stage switch – a bufferless load-balancing stage S1(n) spreads the arrivals A_1(n) … A_N(n) over the internal ports (A′_1(n) … A′_N(n)), followed by a buffered switching stage S2(n) with VOQs L_11(n) … L_NN(n) and departures D_1(n) … D_N(n).]
65
2-stage switch with no scheduler
Main Result [Chang et al.]:

1. Consider a periodic sequence of permutation matrices π(n) = π̂(n mod N), where π̂ is a one-cycle permutation matrix (for example, a TDM sequence).
2. The 1st stage is scheduled by such a sequence of permutation matrices S1(n), with a random starting phase.
3. The 2nd stage is scheduled by such a sequence of permutation matrices S2(n).
4. Then the switch gives 100% throughput for a very broad range of traffic types.

Observation 1: The 1st stage makes non-uniform traffic uniform, and breaks up burstiness. For bursty traffic, delay can be lower than for an output queued switch!
Observation 2: Cells can become mis-sequenced.
66
Parallel Packet Switches
Definition:
A PPS is comprised of multiple identical lower-speed packet-switches operating independently and in parallel. An incoming stream of packets is spread, packet-by-packet, by a demultiplexor across the slower packet-switches, then recombined by a multiplexor at the output.
We call this “parallel packet switching”
67
Architecture of a PPS
[Figure: PPS architecture – N = 4 external ports at rate R on each side; demultiplexors spread arriving packets over k = 3 parallel OQ switches (layers 1, 2, 3) via internal links of rate sR/k; multiplexors recombine the packets at the outputs.]
68
Why a PPS isn’t work-conserving with s = 1

[Figure: worked example on the k = 3, N = 4 PPS with internal links of rate R/3 – a sequence of cell arrivals is spread over the three layers, and an external output is forced to idle while cells destined to it still sit inside the layers.]
69
Why is there no Choice at the Input?

[Figure: worked example (“how we got there on the input side”) – after a burst of back-to-back cells, the slow internal links from an input are still busy, leaving it no choice of layer for the next cell.]
70
Result of no Choice

[Figure: continuation of the example above, showing the consequence of having no choice at the input.]
71
How can we increase choice? Speedup

[Figure: the same example with internal links sped up to 2R/3 – inputs and outputs now have a choice of layers.]
72
Effect of Speedup on Choice
[Figure: a speedup of S = 2 with k = 10 links – each internal link runs at 2R/k, so at most k/S links can be busy carrying cells for one port at any time (layers 1 … 10 shown).]
73
Aside: 3-stage Clos Network
[Figure: 3-stage Clos network – m first-stage switches of size n×k, k center-stage switches of size m×m, and m third-stage switches of size k×n; N = n×m and k ≥ n.]
74
Aside: With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: scheduler chooses to match(1,1), (2,4), (3,3), (4,2)
75
Aside: With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: scheduler chooses to match(1,1), (2,2), (4,4), (5,3), …
76
Aside: With k > n, can a Clos network be non-blocking without rearrangement?
Clos’ Theorem: If k ≥ 2n − 1, then a new connection can always be added without rearrangement.
77
[Figure: the Clos network labeled with first-stage switches I_1 … I_m, center-stage switches M_1 … M_k, and third-stage switches O_1 … O_m (stage sizes n×k, m×m, k×n; N = n×m, k ≥ n).]
Aside: With k > n, can a Clos network be non-blocking without rearrangement?
78
Clos Theorem
[Figure: adding the n-th connection between first-stage switch I_a and third-stage switch O_b; n−1 connections are already in use at the input and n−1 at the output.]

1. Consider adding the n-th connection between 1st-stage I_a and 3rd-stage O_b.
2. We need to ensure that there is always some center-stage M available.
3. Since at most n−1 center stages are already in use at the input and n−1 at the output, if k > (n−1) + (n−1), then there is always an M available, i.e. k ≥ 2n − 1.
End of aside
79
Definitions for PPS
• Available Input Link Set (AIL)
AIL(i, n) is the set of layers to which external input port i can start writing a cell at time slot n.
80
Definition
• Departure Time of a Cell (n’)
The departure time of a cell, n’, is the time it would have departed from an equivalent FIFO OQ switch.
81
Definition
• Available Output Link Set (AOL)
AOL(j,n’) is the set of layers that output j can start reading a cell from, at time slot n’.
82
Main Observation
[Figure: the k = 3, N = 4 PPS with speedup (internal links at 2R/3) – the layers belonging to an input’s AIL and an output’s AOL are highlighted.]

• Inputs can only send to the AIL set.
• Outputs can only read from the AOL set.
83
Lower Bounds on Choice Sets

Minimum size of AIL, AOL:

  |AIL|, |AOL| ≥ (total links) − (maximum number of links which can have cells in progress)
              = k − (⌈k/S⌉ − 1)
84
Assurance of Choice
• A cell must be sent to a link which belongs to both the AIL and the AOL set, i.e. we need AIL ∩ AOL ≠ ∅.
• This holds whenever |AIL| + |AOL| > k:

  (k − ⌈k/S⌉ + 1) + (k − ⌈k/S⌉ + 1) > k,   which is satisfied if S ≥ 2k/(k + 2).

(The sketch below checks the arithmetic.)
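A minimal sketch (ours) of the counting argument:

from math import ceil

def sets_must_intersect(k, S):
    # |AIL|, |AOL| >= k - (ceil(k/S) - 1); the two sets must overlap
    # whenever the two lower bounds together exceed k (pigeonhole)
    lower_bound = k - (ceil(k / S) - 1)
    return 2 * lower_bound > k

print(sets_must_intersect(10, 2))   # True: S = 2 suffices
print(sets_must_intersect(10, 1))   # False: no guarantee without speedup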
85
Parallel Packet SwitchResults
• If S >= 2 then each cell is guaranteed to find a layer that belongs to both the AIL and AOL sets.
• If S >= 2 then a PPS can precisely emulate a FIFO output queued switch for all traffic patterns, and hence achieves 100% throughput.
86
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.2. When traffic is uniform (Many algorithms…)3. When traffic is non-uniform, but traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When matrix is not known.• Technique: Lyapunov function.
5. When algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When algorithm does not complete.• Technique: Randomized algorithm.
7. When there is speedup.• Technique: Fluid model.
8. When there is no algorithm.• Technique: 2-stage load-balancing switch.• Technique: Parallel Packet Switch.
87
References
1. C.-S. Chang, W.-J. Chen, and H.-Y. Huang, "Birkhoff-von Neumann input buffered crossbar switches," in Proceedings of IEEE INFOCOM '00, Tel Aviv, Israel, 2000, pp. 1614 – 1623.
2. N. McKeown, A. Mekkittikul, V. Anantharam, and J. Walrand. Achieving 100% Throughput in an Input-Queued Switch. IEEE Transactions on Communications, 47(8), Aug 1999.
3. A. Mekkittikul and N. W. McKeown, "A practical algorithm to achieve 100% throughput in input-queued switches," in Proceedings of IEEE INFOCOM '98, March 1998.
4. L. Tassiulas, “Linear complexity algorithms for maximum throughput in radio networks and input-queued switches,” in Proc. IEEE INFOCOM ’98, San Francisco, CA, April 1998.
5. D. Shah, P. Giaccone and B. Prabhakar, “An efficient randomized algorithm for input-queued switch scheduling,” in Proc. Hot Interconnects 2001.
6. J. Dai and B. Prabhakar, "The throughput of data switches with and without speedup," in Proceedings of IEEE INFOCOM '00, Tel Aviv, Israel, March 2000, pp. 556 -- 564.
7. C.-S. Chang, D.-S. Lee, and Y.-S. Jou, “Load balanced Birkhoff-von Neumann switches,” in Proceedings of IEEE HPSR ’01, Dallas, Texas, May 2001.
8. S. Iyer and N. McKeown, “Making parallel packet switches practical,” in Proc. IEEE INFOCOM ’01, Alaska, April 2001.
Randomized Algorithms
Balaji Prabhakar
89
Motivation
• Networking problems suffer from the “curse of dimensionality”
– algorithmic solutions do not scale well
• Typical causes
– size: large number of users
– time: very high speeds of operation
• A good deterministic algorithm exists, but …
– it requires too large a data structure
– it needs state information, and “state” is too big
– it “starts from scratch” in each iteration
90
Overview
• In various scenarios, e.g.
– caching
– load balancing
– switch scheduling
– packet dropping (active queue management)
• We will
– consider good (even optimal) exact algorithms
– discuss their complexity
– design approximate algorithms
– and, analyze their performance
91
Some Specifics
• Exact algorithms – in each scenario these are either well-known or easily
determined– when their analysis and optimality properties have been
established in the classical theoretical literature, we will only give some intuition and point to references
– if their development is more recent (e.g. switch scheduling), we will consider them in more detail
thus, the main focus of this segment is the design and analysis of approximate randomized schemes
• Randomized algorithms– are a powerful way of approximating– it is often possible to randomize deterministic
algorithms – this simplifies the implementation while retaining a
(surprisingly) high level of performance
92
Randomization
• The main idea is – to simplify the decision-making process– by basing decisions upon a small, randomly
chosen sample of the state – rather than upon the complete state
93
An Illustrative Example
• Find the largest element of a set S of size 1 billion
• Deterministic algorithm: linear search
– has a complexity of 1 billion
• The randomized version: find the largest of 10 randomly chosen samples
– has a complexity of 10
– (note: this ignores the complexity of choosing 10 random samples)
• Performance
– linear search will find the absolute largest element
– if R is the element found by the randomized algorithm, we can make statements like:
  P(R is at least the 100 millionth largest element) = 1 − (9/10)^10 ≈ 0.65
– thus, we can say that the performance of the randomized algorithm is very good with a high probability
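A minimal sketch (ours) of the example; the set S and the sample count are illustrative assumptions:

import random

def approx_max(S, samples=10):
    # complexity 10 instead of |S|; misses the top 10% of S only if
    # all ten samples do, i.e. with probability 0.9**10
    return max(random.choice(S) for _ in range(samples))

print(1 - 0.9 ** 10)   # ~0.651 = P(result is in the top 10%)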
94
Randomizing Iterative Schemes
• Often, we want to perform some operation iteratively
• Example: find the heaviest matching in a switch in every time slot
• Since, in each time slot– at most one packet can arrive at each input– and, at most one packet can depart from each output the size of the queues, or the “state” of the switch, doesn’t
change by much between successive time slots so, a matching that was heavy at time t will quite likely continue
to be heavy at time t+1
• This suggests that– knowing a heavy matching at time t should help in determining
a heavy matching at time t+1 there is no need to start from scratch in each time slot
95
Summarizing the Philosophy…
• Randomized algorithms can help simplify the implementation– by reducing the amount of work in each iteration
• If the state of the system doesn’t change by much between iterations, then– we can reduce the work even further by carrying
information between iterations
• The big pay-off is that, even though it is an approximation, the performance
of a randomized scheme can be surprisingly good
96
Examples
• We’ll discuss these issues in the following scenarios
– document replacement in web-caches– load balancing– switch scheduling– bandwidth firewalling via packet dropping
A Randomized Web-Cache Replacement Scheme
98
Background
• Tremendous increase in HTTP traffic
• Proxy caches reduce
– network traffic
– download latency
– server load
99
Replacement Policies
• In CPU caches
– the Least Recently Used (LRU) algorithm and variants
• recentness of use exploits temporal correlation
• In Web caches
– more complicated criteria are needed to determine the suitability of a document for eviction
• different document sizes and fetching costs
• recentness and frequency of use exploit correlation and popularity
100
Motivation
• Data structures may get complicated
– priority queues
• Supporting computations can get time-consuming
– cache of size K: O(log K) per access, to prepare for evictions
• Need efficient approximations
101
A Randomized Algorithm
• First cut
– pick N documents at random from cache
– evict the least useful document
• For subsequent iterations…
• Why throw away all previous info?
• The second best (or second least useful) sample is pretty good!
102
The Iterative Version

• First iteration
– pick N documents at random from cache
– evict the least useful document
– retain the M next least useful documents
• Subsequent iterations
– pick N − M documents at random from cache
– evict the least useful document (among the fresh N − M and the M retained)
– retain the M next least useful documents
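A minimal sketch (ours) of the iterative (N, M) scheme; the usefulness function stands in for whatever eviction metric the cache uses (recentness, size, fetch cost, …):

import random

def evict_one(cache, retained, N, M, usefulness):
    # draw fresh samples, avoiding the already-retained documents
    fresh = random.sample(list(cache - set(retained)), N - len(retained))
    candidates = sorted(fresh + retained, key=usefulness)
    victim = candidates[0]                 # evict the least useful
    cache.remove(victim)
    return victim, candidates[1:M + 1]     # retain the next M least useful

cache = set(range(1000))
retained = []
for _ in range(5):
    victim, retained = evict_one(cache, retained, N=8, M=2,
                                 usefulness=lambda d: d)
    print("evicted", victim)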
103
An Example (N=8, M=2)
[Figure: worked example – from the sample {11, 89, 2, 39, 41, 77, 95, 8}, document 95 is evicted and 89 and 77 are retained; the next round draws 6 fresh documents {3, 11, 22, 49, 25, 82}, evicts 89, and retains 82 and 77.]
104
Performance Criterion
• A deterministic algorithm would evict the most useless document
• Goal: the evicted document should be in the most useless n-th percentile
– an error occurs if the goal is not achieved
• The goal is positively correlated with hit rate
105
Memory Improves Performance
• Using memory improves performance significantly, even for small M:

  P_error(M = 0) = (1 − n)^N ≈ e^(−nN)
  P_error(M = 1) ≈ (1 − n)^(2(N−1)) ≈ e^(−2n(N−1))

• It is like choosing the minimum of the minimum
106
Some Analysis

• Compute P_error as a function of the memory M
– X_k: number of useless documents (in the n-th bin) prior to the k-th replacement
– A_k: number of useless documents acquired from resampling
– X_k is a Markov chain
107
The Markov Chain

• P_error = P(X_k = 0)
• The analysis is independent of trace characteristics

[Figure: the chain across the k-th, (k+1)-th, (k+2)-th replacements – X_{k+1} = X_k − 1(X_k > 0) + A_{k+1}.]
108
P_error

[Figure: four plots of P_error vs. M – (N = 12, n = 10%), (N = 30, n = 4%), (N = 60, n = 2%), (N = 80, n = 2%).]
109
The Right Amount of Memory (M)
• From the figures it looks like there is an optimal value for M that minimizes Perror
– that is, if M is too low, we’re not carrying enough information between iterations
– if M is too high, we’re carrying a lot of (stale) information, there isn’t much new
• So there seems to be a right balance between the amount of memory we need and the amount of random sampling
• More precisely…
110
The Right Amount of Memory (M)
• Perror is a complicated function of M
• But, we can still get some info on its dependency on M
• First, we can show that Perror is a convex function of M– to do this we need to show that the discrete
second derivative of Perror (M) is non-negative
– this is done using a “coupling argument”
• As a result there exists an optimal value of M = M*
111
The Optimal Value of M
• There is an approximate closed-form formula for M* (the expression itself was lost in extraction; see the reference)
• It is obtained using an appropriate “exponential martingale” based on the Markov chain X
112
A Comparison
113
Trace-driven Simulation
• Approximate the following web-cache replacement schemes
– LRU
– GD-Hyb (GD-Size and Hybrid)
• recentness • frequency • size • cost to fetch
(from the work of Cao and Irani ’97, and Wooster and Abrams ’97)
114
LRU: Hit Rate

[Figure: hit rate (%) vs. relative cache size (%), weekly NLANR trace – curves: LRU (non-random, black), N=30/M=5 (red), N=8/M=2 (cyan), N=3/M=1 (green), RR (N=1, M=0, blue).]
115
GD-Hyb: Hit Rate

[Figure: hit rate (%) vs. relative cache size (%), weekly NLANR trace – curves: GD-Hyb (non-random), N=30/M=5, N=8/M=2, N=3/M=1, RR (N=1, M=0).]
116
LRU: Latency Reduction

[Figure: reduced latency (%) vs. relative cache size (%), daily NLANR trace – curves: LRU (non-random), N=30/M=5, N=8/M=2, N=3/M=1, RR (N=1, M=0).]
117
GD-Hyb: Latency Reduction

[Figure: reduced latency (%) vs. relative cache size (%), daily NLANR trace – same set of curves.]
118
References
1. P. Cao and S. Irani, “Cost-aware WWW Proxy Caching Algorithms,” Proc. of the USENIX Symposium on Internet Technologies and Systems, Monterey, CA, Dec 1997
2. R. Wooster and M. Abrams, “Proxy Caching that Estimates Edge Load Delays,” 6th International WWW Conference, Santa Clara, April 1997
3. K.Psounis and B. Prabhakar, “A Randomized Web-cache Replacement Scheme,” Proc. INFOCOM 2001
4. T. Lindvall, Lectures on the Coupling Method, Wiley Series in Probability and Mathematical Statistics, Wiley, New York, 1992.
5. R. Durrett, Probability: Theory and Examples, Duxbury Press,
Second Edition, 1996.
Randomized Load Balancing
120
Load Balancing: Static Case
121
Load Balancing: Dynamic Case
122
A Simple (and elegant) Analysis
123
Continuing…
124
• Since the load doesn’t change by much between iterations …– that is, a lightly loaded queue is likely to continue to
be lightly loaded
• It might help– to remember the identity of the least loaded bin in
the current iteration for use in the next iteration – similar idea used in the web-caching problem
Carrying Information Between Iterations
125
• The (d,1) system– d random choices – 1 bin stored in memory
• Question– How well does the (d,1) system perform ?
Load Balancing with Memory
126
• The bin stored in memory– is likely to be very lightly loaded– so we might expect better load balancing
An Illustration
127
• The maximum load achieved in the (d,1) system is less than log log n / log(2d−1) + O(1) with high probability
• This is as if we had a (2d−1, 0) system
– so the bin in memory is worth at least as much as d−1 extra samples
– again, we see the minimum of minimums effect
Theorem (Shah and P)
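A minimal simulation sketch (ours) of the (d, 1) system, to see the theorem’s effect empirically:

import random

def max_load_d1(n_balls, n_bins, d):
    load = [0] * n_bins
    memory = random.randrange(n_bins)          # the one remembered bin
    for _ in range(n_balls):
        candidates = [random.randrange(n_bins) for _ in range(d)] + [memory]
        memory = min(candidates, key=lambda b: load[b])
        load[memory] += 1                      # place in the least loaded
    return max(load)

print(max_load_d1(100000, 100000, d=2))        # typically a very small number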
128
1. Y. Azar et al., “Balanced Allocations,” Proc. of ACM STOC, 1994.
2. M. Mitzenmacher, “The power of two choices in randomized load balancing,” PhD Thesis, UC Berkeley, 1996.
3. N. D. Vvedenskaya, R. Dobrushin and F. Karpelevich, “Queueing system with selection of the shortest of two queues: An asymptotic approach,” Problems of Information Transmission, 1996.
4. B. Vocking, “How Asymmetry Helps Load Balancing,” Proc. Of 40th IEEE-FOCS, 1999.
5. S. Ethier and T. Kurtz, Markov Processes: Characterization and Convergence, John Wiley and Sons, 1986.
References
Switch Scheduling
130
Switch Scheduling and Bipartite Graph Matching
• As we have seen, switch scheduling is essentially finding matchings in weighted bipartite graphs
[Figure: a weighted bipartite request graph; the edge weights are queue sizes.]
131
Scheduling Algorithm
• Ideal policy: maximum weight matching
– weights: queue size, age of packets, etc.
– very complex for high speed networks
• In practice, approximate maximum weight matchings are the best hope
• We will discover good, randomized, approximate matchings in an evolutionary fashion
– story told pictorially using simulations
132
• Switch size: 32 × 32
• Input traffic (shown for a 4 × 4 switch)
– Bernoulli i.i.d. inputs
– diagonal load matrix:

  Λ = [x y 0 0; 0 x y 0; 0 0 x y; y 0 0 x]

• normalized load = x + y < 1
• x = 2y
133
Obvious Randomized Schemes
• Choose a matching at random and use it as the schedule
– doesn’t give 100% throughput
• Choose 2 matchings at random and use the heavier one as the schedule
• Choose N matchings at random and use the heaviest one as the schedule

None of these can give 100% throughput!!
134
[Figure: mean input-queue length vs. normalized load under diagonal traffic (log scale) – curves: MWM, R32, R1.]
135
Bounds on Maximum Throughput
136
Iterative Randomized Scheme(Tassiulas)
• Say M is the matching used at time t
• Let R be a new matching chosen u.a.r.
• At time t+1, use the heavier of M and R
• This gives 100% throughput!
– note: the boost in throughput is due to memory
• But, delays are very large
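A minimal sketch (ours) of one iteration of the scheme; L is the matrix of queue lengths and matchings are permutation lists:

import random

def weight(L, perm):
    return sum(L[i][perm[i]] for i in range(len(L)))

def tassiulas_step(L, prev_match):
    R = list(range(len(L)))
    random.shuffle(R)                      # a new matching chosen u.a.r.
    # keep whichever of the old match and R is heavier right now
    return R if weight(L, R) > weight(L, prev_match) else prev_match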
137
[Figure: mean input-queue length vs. normalized load under diagonal traffic – curves: MWM and Tassiulas.]
138
Observations for Improvement
• Most of the weight of a matching is carried in a small number of edges
• Hence, remember edges not matchings
139
[Figure: mean input-queue length vs. normalized load under diagonal traffic – curves: MWM, R32M32, R1M1, Tassiulas.]
140
Finer Observations
• Let M be schedule used at time t
• Choose a “good’’ random matching R
• M’ = Merge(M,R)
• M’ includes best edges from M and R
• Use M’ as schedule at time t+1
• Above procedure yields algorithm called LAURA
141
Merging Procedure

[Figure: worked example of merging – matchings with weights W(M) = 13, W(X) = 12, W(R) = 10; cycle-by-cycle gains such as 3−1+2−2 = 2 and 2−1+2−4 = −1 determine which edges survive the merge.]
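A minimal sketch (ours) of Merge(M, R): the union of two matchings splits into alternating cycles, and the heavier half of each cycle is kept.

def merge(M, R, L):
    # M, R: permutation lists (input i -> output M[i] / R[i]); L: weights
    N = len(M)
    result, seen = [None] * N, [False] * N
    for start in range(N):
        if seen[start]:
            continue
        cycle, i = [], start
        while not seen[i]:                 # walk the alternating cycle
            seen[i] = True
            cycle.append(i)
            i = R.index(M[i])              # follow M's edge out, return on R's
        wM = sum(L[i][M[i]] for i in cycle)
        wR = sum(L[i][R[i]] for i in cycle)
        better = M if wM >= wR else R      # keep the heavier alternative
        for i in cycle:
            result[i] = better[i]
    return result

By construction, the merged matching is at least as heavy as both M and R, which is what LAURA exploits.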
142
[Figure: mean input-queue length vs. normalized load under diagonal traffic – curves: MWM, M-LAURA, LAURA, iLQF, Tassiulas.]
143
References
1. L. Tassiulas, “Linear complexity algorithms for maximum throughput in radio networks and input-queued switches,” Proc. INFOCOM 1998.
2. D. Shah, P. Giaccone and B. Prabhakar, “An efficient randomized algorithm for input-queued switch scheduling,” Proc. of Hot Interconnects, 2001.
3. R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995.
A Randomized Bandwidth Partitioning Algorithm
145
The Setup
• In a congested network with many users– QoS requirements are different
• Problems:– allocate bandwidth– control queue size and hence delay
146
Approach 1: Network-centric
• Network node: fair queueing
• User traffic: any type

problem: complex implementation
147
Approach 2: User-centric
• Network node: simple FIFO
• User traffic: congestion-aware (e.g. TCP)

problem: requires user cooperation
148
Approach 2: Controlling Delays
• Use RED (Random Early Detection)
– drop incoming packets randomly based on the congestion level
– this signals the onset of congestion to the sources, who will back off (if they are responsive)
• RED is simple
– but can’t prevent unresponsive flows from eating up all the bandwidth
• Goal: find a bandwidth partitioning algorithm that is close to RED in implementational simplicity
149
Preliminary Comments
• Consider a single link shared by 1 unresponsive (red) flow and k responsive (green) flows
• Suppose the buffer gets congested
• Observe: It is likely there are more packets from the red (unresponsive) source
• So if a randomly chosen packet is evicted, it will likely be a red packet
• Therefore, one algorithm could be: When congested evict a random packet
150
Preliminary Comments
• Unfortunately, this doesn’t work because there is a small non-zero chance of evicting a green packet
• Since green sources are responsive, they interpret the packet drop as a congestion signal and back-off
• This only frees up more room for red packets
• Idea: Suppose we choose two packets at random from the queue and compare their ids; then it is quite unlikely that both will be green
• This suggests another algorithm: choose two packets at random and drop them both if their ids agree
• This works: that is, it limits the maximum bandwidth the red source can consume
151
The CHOKe Algorithm
• Builds on the previous observation
• Is a randomized algorithm (like RED, and is embedded in RED)
• Turns out to have an easily analyzable performance via fluid models
• The last point is interesting, since we’ll see how surprisingly accurate fluid models are for modeling TCP- and UDP-type traffic
152
The CHOKe Algorithm
For each arriving packet:
1. If AvgQsize ≤ Minth: admit the new packet.
2. Otherwise, draw a packet at random from the queue. If both packets are from the same flow: drop both packets.
3. Otherwise, if AvgQsize ≤ Maxth: admit the new packet with a probability p.
4. Otherwise: drop the new packet.
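A minimal sketch (ours) of the decision logic; the queue of (flow_id, payload) pairs and the RED average avg with its thresholds and probability p are assumed inputs:

import random

def choke_arrival(queue, pkt, avg, minth, maxth, p):
    if avg <= minth:
        queue.append(pkt)                  # no congestion: admit
        return
    if queue:
        k = random.randrange(len(queue))   # draw a packet at random
        if queue[k][0] == pkt[0]:          # same flow id?
            del queue[k]                   # drop both packets
            return
    if avg <= maxth:
        if random.random() < p:            # admit with a probability p
            queue.append(pkt)
        return
    # avg > maxth: drop the new packet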
153
The CHOKe Algorithm: Multiple Samples
For each arriving packet:
1. If AvgQsize ≤ Minth: admit the new packet.
2. Otherwise, draw m packets at random from the queue. If any of their flow ids match the new packet’s: drop all matched packets (and the new packet).
3. Otherwise, if AvgQsize ≤ Maxth: admit the new packet with a probability p.
4. Otherwise: drop the new packet.
154
Simulation Comparison: The setup
[Figure: dumbbell topology – m TCP sources S(1) … S(m) and n UDP sources S(m+1) … S(m+n) feed router R1 over 10 Mbps links; R1 connects to R2 over a 1 Mbps bottleneck; the TCP sinks D(1) … D(m) and UDP sinks D(m+1) … D(m+n) hang off R2 over 10 Mbps links.]
155
The Specifics
• 32 TCP flows, 1 UDP flow
• All TCPs’ maximum window size = 300
• All links have a propagation delay of 1 ms
• FIFO buffer size = 300 packets
• All packet sizes = 1 KByte
• RED: (minth, maxth) = (100, 200) packets
156
Simulation 1: 1 UDP source

[Figure: throughput (Kbps) vs. time (s), with the UDP arrival rate at 2 Mbps – UDP throughput under DropTail, RED, and CHOKe.]
157
Different UDP Loadings

[Figure: UDP throughput (with the UDP dropping percentage marked at each point, from 23.0% up to 98.3%) and average TCP throughput (Kbps) vs. UDP arrival rate (Kbps, log scale).]
158
5 UDPs and 1 Sample from Queue

[Figure: 32 TCPs, 5 UDPs (with the same arrival rate) – total UDP throughput and total TCP throughput (Kbps) vs. total UDP arrival rate (Kbps, log scale).]
159
5 Samples for 5 UDPs

[Figure: the same experiment with 5 samples drawn per arrival – total UDP and total TCP throughput vs. total UDP arrival rate.]
160
How many samples to take?
• Since we don’t know a priori how many unresponsive flows are passing through the link, take the number of samples depending on the backlog
• As Qavg increases, increase the number of samples

[Figure: the averaged queue between minth and maxth divided into regions R1, R2, …, Rk, one per sample count.]
161
A Fluid Analysis
[Figure: the queue modeled as a permeable tube with leakage – discards occur along the length of the queue.]
162
Some notation
• N: total number of packets in the buffer
• L_i(t): rate at which flow i’s packets cross position t of the buffer
– 0 = entrance and D = exit
• p_i: fraction of flow i’s packets dropped at ingress
= fraction of flow i’s packets dropped in the buffer (since drops occur in pairs)
• λ_i: rate at which flow i’s packets arrive
163
The Equation
• L_i(t)·δt − L_i(t + δt)·δt = λ_i L_i(t)·δt / N

  ⇒ −dL_i(t)/dt = λ_i L_i(t) / N,

  with L_i(0) = λ_i (1 − p_i) and L_i(D) = λ_i (1 − 2p_i).

• This first-order differential equation can be solved explicitly for L_i(t), 0 < t < D (see below).
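As a worked step (our algebra, consistent with the boundary conditions above), the equation integrates to an exponential profile, and matching the exit condition gives an implicit equation for each flow’s drop probability p_i:

\[
  L_i(t) = \lambda_i (1 - p_i)\, e^{-\lambda_i t / N}, \qquad 0 \le t \le D,
\]
\[
  L_i(D) = \lambda_i (1 - 2 p_i) \;\Rightarrow\; 1 - 2 p_i = (1 - p_i)\, e^{-\lambda_i D / N}.
\]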
164
Simulation Comparison: 1 UDP, 32 TCPs

[Figure: throughput vs. arrival rate (log scale) – the fluid model curve against the CHOKe ns simulation.]
165
Fluid Analysis of Multiple Samples
• With M samples:

  L_i(t)·δt − L_i(t + δt)·δt = M·λ_i L_i(t)·δt / N

  ⇒ −dL_i(t)/dt = M·λ_i L_i(t) / N,

  with L_i(0) = λ_i (1 − p_i)^M and L_i(D) = λ_i (1 − p_i)^M − M·λ_i p_i.
166
Comparison: 1 UDP, 2 Samples

[Figure: UDP throughput (Kbps) vs. UDP arrival rate (Mbps) – NS simulation against the fluid model.]
167
References
1. A. Demers, S. Keshav and S. Shenker, “Analysis and simulation of a fair queueing algorithm,” Proc. ACM SIGCOMM 1989.
2. S. Floyd and V. Jacobson, “Link-sharing and Resource Management Models for Packet Networks,” IEEE/ACM Trans. on Networking, 1995.
3. R. Braden et al., “Recommendations on queue management and congestion avoidance in the Internet,” IETF RFC (Informational) 2309, April 1998.
4. R. Pan, B. Prabhakar and K. Psounis, “CHOKe: A stateless active queue management scheme for approximating fair bandwidth allocation,” Proc. INFOCOM 2000.
Competitive Analysis: Theory and Applications in Networking
169
Competitive Analysis in Networking: Outline
• Background
• Incremental construction of Multicast Trees
– The Greedy Strategy
• Routing and Admission Control
– The Exponential Metric
• More Restricted Adversaries
– Adversarial Queueing Theory

Theoretical Analysis; Rules of Thumb; Pragmatic Analysis
170
Decision Making Under Uncertainty:
Online Algorithms and Competitive Analysis
• Online Algorithm:
– Inputs arrive online (one by one)
– Algorithm must process each input as it arrives
– Lack of knowledge of future arrivals results in inefficiency
• Malicious, All-powerful Adversary:
– Omniscient: monitors the algorithm
– Generates “worst-case” inputs
• Competitive Ratio:
– Worst ratio of the “cost” of the online algorithm to the “cost” of the optimum algorithm
171
Warm-up Example: The Unlucky Skier
• Beginning Skier:
– Does not know how many ski trips she will make
– Can rent skis for $40 or buy skis for $400
– Online algorithm: on each successive trip, must decide whether to buy or continue renting
• Adversary: All-powerful
– As long as the skier is renting, the adversary will send her on another trip
– As soon as the skier buys, the adversary will stop her ski trips once and for all
172
The Unlucky Skier [Contd.]
• Buy after K trips
– Cost of the algorithm = K × $40 + $400
– Optimum cost = min{$400, (K+1) × $40}
– Competitive ratio: algorithm’s cost / optimum cost
• Best strategy: rent 9 times, buy on the 10th trip
– Competitive ratio = 760/400 ≈ 2
– The best strategy does not always yield the best solution
• Ski Principle: Buy after paying enough rent
173
Competitive Analysis: Discussion
• Very Harsh Model
– All-powerful adversary
• But…
– Can often still prove good competitive ratios
– Really tough testing-ground for algorithms
– Often leads to good rules of thumb which can be validated by other analyses
– Distribution independent: doesn’t matter whether traffic is heavy-tailed, Poisson, or Bernoulli
174
Competitive Analysis in Networking: Outline
• Background
• Incremental construction of Multicast Trees
– The Greedy Strategy
• Routing and Admission Control
– The Exponential Metric
• More Restricted Adversaries
– Adversarial Queueing Theory
175
Incremental Construction of Multicast Trees
• Fixed multicast source s
– K receivers arrive one by one
– Must adapt the multicast tree to each new arrival without rerouting existing receivers
– A malicious adversary generates bad requests
– Objective: minimize the total size of the multicast tree
– Applications: streaming; cache updates; …

[Figure: lower-bound example with source s, receiver r1, and relay nodes a, b – any online choice can be forced to competitive ratio C ≥ 3/2, and worse sequences can be created.]
176
Two Classes of Algorithms
• Shortest Path Algorithm
– Each receiver connects using the shortest path to the source (or to a core)
• DVMRP [Waitzman, Partridge, Deering ’88]
• CBT [Ballardie, Francis, Crowcroft ’93]
• PIM [Deering et al. ’96]
• Greedy Algorithm [Imase and Waxman ’91]
– Each receiver connects to the closest point on the existing tree
– Independently known to the systems community
• The “naive” algorithm [Doar and Leslie ’92]
• End-system multicasting [Faloutsos, Banerjea, Pankaj ’98; Francis ’99]
177
Shortest Path Algorithm: Example
• Receivers r1, r2, r3, … , rK join in order
[Figure: example network – receivers r1, r2, r3, …, rK (joining in order) sit at distance N from the source s.]
178
Shortest Path Algorithm
• Cost of the shortest path tree ≈ K × N

[Figure: the resulting shortest-path tree – K long paths of length ≈ N, one per receiver.]
179
Shortest Path Algorithm: Competitive Ratio

• Optimum cost ≈ K + N
– Competitive ratio ≈ KN/(K + N)
• If N is large, then the competitive ratio ≈ K

[Figure: the optimum tree.]
180
Greedy Algorithm
• Theorem 1: For the greedy algorithm, competitive ratio = O(log K)
• Theorem 2: No algorithm can achieve a competitive ratio better than log K
[Imase and Waxman ’91]
Greedy algorithm is the optimum strategy
181
Proof of Theorem 1
[Alon and Azar ’93]
• L = size of the optimum multicast tree
• p_i = amount paid by the online algorithm for r_i
– i.e. the increase in size of the greedy multicast tree as a result of adding receiver r_i
• Lemma 1: The greedy algorithm pays 2L/j or more for at most j receivers
– Assume the lemma
– Total cost ≤ 2L (1 + 1/2 + 1/3 + … + 1/K) ≈ 2L log K
182
Proof of Lemma 1

• Suppose there are more than j receivers for which the greedy algorithm paid more than 2L/j
– Let these be r1, r2, …, rm, for m larger than j
– Each of these receivers is at least 2L/j away from each other and from the source
⇒ The shortest tour through all these receivers and the source has length ≥ (2L/j)·m > 2L
⇒ Cost of the multicast tree ≥ ½ (cost of the tour) > L. Contradiction!
183
Tours and Trees

[Figure: a tour through s, r1, …, rm, and the corresponding tree.]

• Each segment ≥ 2L/j ⇒ tour cost ≥ (2L/j)·m > 2L
• A tour can be constructed from the tree by repeating each edge at most twice ⇒ tree cost ≥ ½ (tour cost) > L
184
Greedy Algorithm: Recap
• Add new receiver to closest node on existing tree
• Theorem 1: For the greedy algorithm, competitive ratio = O(log K)
• Theorem 2: No algorithm can achieve a competitive ratio better than log K
• Greedy algorithm is the optimum strategy
• Shortest path algorithm can be pretty bad
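A minimal sketch (ours) of the greedy rule on a weighted graph (a dict-of-dicts adjacency encoding, an illustrative assumption): run Dijkstra from the new receiver until the search touches the existing tree.

import heapq

def attach_receiver(graph, tree_nodes, r):
    dist, prev, heap = {r: 0}, {}, [(0, r)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in tree_nodes:                    # nearest point on the tree
            path = [u]
            while path[-1] != r:               # walk back toward r
                path.append(prev[path[-1]])
            tree_nodes.update(path)
            return d, path
        if d > dist.get(u, float("inf")):
            continue                           # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return None                                # receiver disconnected

graph = {"s": {"a": 1}, "a": {"s": 1, "b": 1}, "b": {"a": 1}}
tree = {"s"}
print(attach_receiver(graph, tree, "b"))       # -> (2, ['s', 'a', 'b'])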
185
Objections to the Greedy Algorithm
• Log K is pretty bad
• We don’t care about performance
– Network bandwidth is cheap
• Shortest path performs well in practice
– The example given earlier is pathological
• Greedy algorithm is impractical
• We don’t trust theoreticians
– Theoreticians always hide something

All valid concerns; must be addressed
186
Log K is Pretty Bad?
• But K is worse !!
187
Network Bandwidth is Cheap?
• Quantitative Analysis helps
– Difference between shortest path algorithm and greedy algorithm is K/log K
– Network bandwidth is not that cheap, especially for bandwidth-intensive multicasts
188
Shortest Path Works Well in Real-life Networks?
• What are “real-life” networks?
– Internet topology is not completely understood
• Must look at interesting special cases; assume receivers are chosen at random:
1. The network looks like a grid
– Shortest path: competitive ratio = √K
– Greedy algorithm: competitive ratio = O(1)
2. The network looks like a random graph
– Shortest path: competitive ratio = O(1)
– Greedy algorithm: competitive ratio = O(1)
[Goel and Munagala ’00]
189
Greedy Algorithm is Impractical?
• Yes, for deployment at lower network layers
• But not if multicast routing occurs at the application layer
• Several systems now implement similar schemes (end-system multicast)– Qosmic [Faloutsos, Banerjea, Pankaj ’98]– Yallcast/YOID [Francis ’99]………
190
Theoreticians Hide Things?
• So what did we hide here?
– The greedy algorithm can result in large latency from source to receivers
– The shortest path algorithm can achieve the best possible latency
• Fix: reroute large-latency receivers and some of their ancestors
– Close to optimum latencies
– Tree size close to the greedy tree
– No receiver rerouted more than once
[Goel and Munagala ’00]
191
Moral
• Rule of thumb for multicast routing:
– Since the future is unknown, be greedy in the present
• Meta-morals:
– Competitive analysis can yield valuable clues about algorithm performance
– Caution: competitive analysis is the beginning, not the end
– Must validate online algorithms in a systems setting
– Must often tweak the algorithms
192
Competitive Analysis in Networking: Outline
• Background
• Incremental construction of Multicast Trees
– The Greedy Strategy
• Routing and Admission Control
– The Exponential Metric
• More Restricted Adversaries
– Adversarial Queueing Theory
193
The Exponential Cost Metric
• Consider a resource with capacity C
• Assume that a fraction ρ of the resource has been consumed
• Exponential cost “rule of thumb”: the cost of the resource is given by a^ρ, for an appropriately chosen base a
• Intuition: the cost increases steeply with ρ
– Bottleneck resources become expensive

[Figure: cost vs. ρ, rising exponentially.]
194
Applications of Exponential Costs
• Exponential cost “rule of thumb” applies to
– Online Routing
– Online Call Admission Control
– Stochastic arrivals
– Stale Information
– Power-aware routing
195
The Online Routing Problem
• Connection establishment requests arrive online in a VPN (Virtual Private Network)
• Must assign a route to each connection and reserve bandwidth along that route
– PVCs in ATM networks
– MPLS + RSVP in IP networks
• Oversubscribing is allowed
– Congestion = the worst oversubscribing on a link
• Goal: assign routes to minimize congestion
• Assume all connections have identical bandwidth requirements, and all links identical capacity
196
Online Routing Problem: Example
[Figure: lower-bound example – a connection from s to r1 can be routed via a or b; the adversary forces congestion C ≥ 2, and worse sequences can be created.]
197
Online Algorithm for Routing
• ρ_L = fraction of the bandwidth of link L that has already been reserved
• a = N, the size of the network
• The Exponential Cost Algorithm:
– Price each link at a^{ρ_L}, and route each incoming connection on the currently cheapest path from src to dst
– Reserve bandwidth along this path
[Aspnes et al. ’93]
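A minimal sketch (ours) of the exponential-cost routing step; the graph encoding and link-load table are illustrative assumptions, and dst is assumed reachable:

import heapq

def route(graph, rho, src, dst, a):
    # graph: node -> list of (neighbor, link_id); rho: link_id -> load in [0, 1]
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                           # stale heap entry
        for v, link in graph.get(u, []):
            nd = d + a ** rho[link]            # exponential link cost
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, (u, link)
                heapq.heappush(heap, (nd, v))
    path, u = [], dst
    while u != src:                            # walk back along prev pointers
        u, link = prev[u]
        path.append(link)
    return path[::-1]                          # list of link ids, src -> dst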
198
Online Algorithm for Routing
• Theorem 1: The exponential cost algorithm achieves a competitive ratio of O(log N) for congestion
• Theorem 2: No algorithm can achieve competitive ratio better than log N in asymmetric networks
This simple strategy is optimum!
Does the idea extend to other problems? To more realistic scenarios?
199
Applications of Exponential Costs
• Exponential cost “rule of thumb” applies to
– Online Routing
– Online Call Admission Control
– Stochastic arrivals
– Stale Information
– Power-aware routing
200
Online Admission Control and Routing
• Connection establishment requests arrive online
• Must assign a route to each connection and reserve bandwidth along that route
• Oversubscribing is not allowed– Must perform admission control
• Goal: Admit and route connections to maximize total number of accepted connections (throughput)
201
Exponential Metric and Admission Control
• When a connection arrives, compute the cheapest path under current exponential costs
• If the cost of this path is less than a given threshold, accept the connection; else reject (sketched below)
[Awerbuch, Azar, Plotkin ’93]
• Theorem: This simple algorithm admits at least an Ω(1/log N) fraction of the calls admitted by the optimum
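A sketch of the same routing loop with the admission test added; the threshold is left as a parameter here, since its precise value is chosen as in the analysis of Awerbuch, Azar, and Plotkin:

    import networkx as nx

    def admit_and_route(G, mu, threshold, src, dst):
        # Recompute exponential link costs from the current reservations.
        for u, v, data in G.edges(data=True):
            lam = data["reserved"] / data["capacity"]
            data["cost"] = mu ** lam
        cost, path = nx.single_source_dijkstra(G, src, dst, weight="cost")
        if cost >= threshold:
            return None  # reject: every route is already too expensive
        for u, v in zip(path, path[1:]):
            G[u][v]["reserved"] += 1
        return path      # accept and reserve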
202
Objections to Exponential Costs
• Log N is too bad
• Requires permanent connections
• Too inefficient
– Frequent “link-state updates”
– Frequent computation of shortest paths
203
Applications of Exponential Costs
• Exponential cost “rule of thumb” applies to
– Online Routing
– Online Call Admission Control
– Stochastic Arrivals
– Stale Information
– Power-Aware Routing
204
Assume Stochastic Arrivals
• Connection arrivals are Poisson, durations are memoryless
• Assume fat links (Capacity >> log N)
• Theorem: The exponential cost algorithm results in
1. Near-optimum congestion for the routing problem
2. Near-optimum throughput for the admission problem
[Kamath, Palmon, Plotkin ’96]
Near-optimum: Competitive ratio = (1+ε) for ε close to 0
205
Versatility of Exponential Costs
• Guarantees of log N for the competitive ratio against a malicious adversary
• Near-optimum for stochastic arrivals
• Near-optimum given a fixed traffic matrix
[Young ’95; Garg and Konemann ’98]
No need to know whether there is an adversary, or what the stochastic distribution is, or what the traffic matrix is!!
206
Objections to Exponential Costs
• Log N is too bad
• Requires permanent connections
• Too inefficient
– Frequent “link-state updates”
– Frequent computation of shortest paths
207
Applications of Exponential Costs
• Exponential cost “rule of thumb” applies to
– Online Routing
– Online Call Admission Control
– Stochastic Arrivals
– Stale Information
– Power-Aware Routing
208
Exponential Metrics and Stale Information
• Exponential metrics continue to work well if
– Link states are a little stale
– Shortest paths are reused over small intervals rather than recomputed for each connection
– There is no centralized agent
[Goel, Meyerson, Plotkin ’01]
• Caveat: Still pretty hard to implement
209
Applications of Exponential Costs
• Exponential cost “rule of thumb” applies to
– Online Routing
– Online Call Admission Control
– Stochastic Arrivals
– Stale Information
– Power-Aware Routing
210
Power Aware Routing
• Consider a group of small mobile nodes, e.g. sensors, which form an ad hoc network
– Bottleneck resource: battery
– Goal: Maximize the time till the network partitions
• Assign each mobile node the cost μ^λ, where λ = fraction of its battery consumed
– Send packets over the cheapest path under this cost measure (sketched below)
• O(log n) competitive against an adversary
– Near-optimum for stochastic/fixed traffic
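The same metric with batteries as the resource; this sketch pushes each node's cost onto its incoming edges of a directed networkx graph (the node-cost-to-edge-cost reduction and all names are illustrative):

    import networkx as nx

    MU = 32  # illustrative base; the rule only asks for a suitably chosen mu

    def power_aware_path(G, src, dst):
        # Each hop pays the exponential battery cost of the node it enters.
        for u, v in G.edges():
            lam = G.nodes[v]["used"] / G.nodes[v]["battery"]
            G[u][v]["cost"] = MU ** lam
        return nx.shortest_path(G, src, dst, weight="cost")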
211
Power Aware Routing: Implementation?
• Hard to implement in general
• Consider the Directed Diffusion model [Intanagonwiwat, Govindan, Estrin ’00]
– Receiver floods the network with interest for desired data
– Interest reaches the source
– Source sends data over multiple paths
– Receiver reinforces the “best” path
• Just send the accumulated sum of exponential costs along with the data
– Receiver reinforces the path with the least cost (see the sketch below)
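The accumulation itself is a one-liner; this hypothetical packet format simply carries a running sum that the receiver compares across paths:

    def forward(packet, node_exp_cost):
        # Each node adds its exponential cost to a field carried in the data,
        # so the receiver can reinforce the path with the smallest total.
        packet["acc_cost"] = packet.get("acc_cost", 0.0) + node_exp_cost
        return packet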
212
Competitive Analysis in Networking: Outline
• Background
• Incremental construction of Multicast Trees
– The Greedy Strategy
• Routing and Admission Control
– The Exponential Metric
• More Restricted Adversaries
– Adversarial Queueing Theory
213
Adversarial Queueing Theory
Motivation
• Malicious, all-knowing adversary
– Injects packets into the network
– Each packet must travel over a specified route
• Suppose the adversary injects 3 packets per second from s to r
– Link capacities are one packet per second
– No matter what we do, we will have unbounded queues and unbounded delays
– Need to temper our definition of adversaries
[Figure: a source s connected to a receiver r by unit-capacity links]
214
Adversarial Queueing Theory
Bounded Adversaries
• Given a window size W, and a rate r < 1
– For any link L, during any interval of duration T > W, the adversary can inject at most rT packets whose routes contain link L (a small checker for this condition follows)
• The adversary can’t set an impossible task!!
– More gentle than competitive analysis
• Will study packet-scheduling strategies
– Which packet should be forwarded when more than one packet is waiting to cross a link?
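A small checker for the (W, r) condition on one link, assuming a sorted list of injection times for the packets whose routes cross that link (an illustration of the definition, not part of any protocol):

    def respects_bound(times, W, r):
        # Every interval of duration T > W must contain at most r*T injections.
        # It suffices to test each run of packets times[i..j], stretched to
        # duration at least W (intervals just above W are the tightest).
        for i in range(len(times)):
            for j in range(i, len(times)):
                T = max(times[j] - times[i], W)
                if (j - i + 1) > r * T:
                    return False
        return True

    print(respects_bound([0, 1, 2, 3, 4, 5], W=4, r=0.5))  # False: too bursty
    print(respects_bound([0, 3, 6, 9], W=4, r=0.5))        # True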
215
Some Interesting Scheduling Policies
• FIFO: First In First Out
• LIFO: Last In First Out
• NTG: Nearest To Go
– Forward the packet that is closest to its destination
• FTG: Furthest To Go
– Forward the packet that is furthest from its destination
• LIS: Longest In System
– Forward the packet that was injected earliest
– Global FIFO (see the sketch below)
• SIS: Shortest In System
– Forward the packet that was injected most recently
– Global LIFO
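A per-link queue implementing LIS as a priority queue keyed by injection time (SIS is the same sketch with the key negated); all names are illustrative:

    import heapq

    class LISQueue:
        # Longest In System: always forward the packet injected earliest.
        def __init__(self):
            self._heap = []
            self._seq = 0           # tie-breaker for equal injection times

        def enqueue(self, packet, injection_time):
            # SIS would push -injection_time as the key instead.
            heapq.heappush(self._heap, (injection_time, self._seq, packet))
            self._seq += 1

        def forward(self):
            return heapq.heappop(self._heap)[2]

    q = LISQueue()
    q.enqueue("p1", injection_time=5)
    q.enqueue("p0", injection_time=2)
    print(q.forward())  # 'p0': it has been in the system longest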
216
Stability in the Adversarial Model
• Consider a scheduling policy (e.g. FIFO, LIFO, etc.)
• The policy is universally stable if, for all networks and all “bounded adversaries”, the packet delays and queue sizes remain bounded
[Borodin et al. ‘96]
• FIFO, LIFO, and NTG are not universally stable
• LIS, SIS, and FTG are universally stable
[Andrews et al. ‘96]
217
Adversarial Queueing Model: Routing
Using the Exponential Cost Metric
• The adversary injects packets into the network but specifies only the src and dst of each packet
– The correct routes are hidden
• Need to compute routes
– Again, use the exponential cost metric
– Reset the costs to zero periodically
– Use any stable scheduling policy
• Theorem: The combined routing and scheduling policy is universally stable
[Andrews et al. ’01]
218
Summary
• Competitive analysis models decision making under uncertainty
– Applicable to a wide range of networking problems
• General rules of thumb
– Greedy algorithm for multicasting
– Exponential cost metric for online routing, admission control, stochastic injections, and power-aware routing
• Adversarial Queueing Theory
– Bounded adversaries
– FIFO unstable; LIS stable
– Exponential metrics result in stable routing
219
References
1. N. Alon and Y. Azar. On-line Steiner trees in the Euclidean plane. Discrete and Computational Geometry, 10(2), 113-121, 1993.
2. M. Andrews, B. Awerbuch, A. Fernandez, J. Kleinberg, T. Leighton, and Z. Liu. Universal stability results for greedy contention-resolution protocols. Proceedings of the 37th IEEE Symposium on Foundations of Computer Science, 1996.
3. M. Andrews, A. Fernandez, A. Goel, and L. Zhang. Source Routing and Scheduling in Packet Networks. To appear in the Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, 2001.
4. J. Aspnes, Y. Azar, A. Fiat, S. Plotkin, and O. Waarts. On-line load balancing with applications to machine scheduling and virtual circuit routing. Proceedings of the 25th ACM Symposium on Theory of Computing, 1993.
5. B. Awerbuch, Y. Azar, and S. Plotkin. Throughput competitive online routing. Proceedings of the 34th IEEE Symposium on Foundations of Computer Science, 1993.
6. A. Ballardie, P. Francis, and J. Crowcroft. Core Based Trees (CBT): An architecture for scalable inter-domain multicast routing. Proceedings of the ACM SIGCOMM, 1993.
220
References [Contd.]
7. A. Borodin, J. Kleinberg, P. Raghavan, M. Sudan, and D. Williamson. Adversarial queueing theory. Proceedings of the 28th ACM Symposium on Theory of Computing, 1996.
8. S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C. Liu, and L. Wei. The PIM architecture for wide-area multicast routing. IEEE/ACM Transactions on Networking, 4(2), 153-162, 1996.
9. M. Doar and I. Leslie. How bad is Naïve Multicast Routing? IEEE INFOCOM, 82-89, 1992.
10. M. Faloutsos, A. Banerjea, and R. Pankaj. QoSMIC: quality of service sensitive multicast Internet protocol. Computer Communication Review, 28(4), 144-153, 1998.
11. P. Francis. Yoid: Extending the Internet Multicast Architecture. Unrefereed report, http://www.isi.edu/div7/yoid/docs/index.html.
12. N. Garg and J. Konemann. Faster and simpler algorithms for multicommodity flow and other fractional packing problems. Proceedings of the 39th IEEE Symposium on Foundations of Computer Science, 1998.
221
References [Contd.]
13. A. Goel, A. Meyerson, and S. Plotkin. Distributed Admission Control, Scheduling, and Routing with Stale Information. Proceedings of the 12th ACM-SIAM Symposium on Discrete Algorithms, 2001.
14. A. Goel and K. Munagala. Extending Greedy Multicast Routing to Delay Sensitive Applications. Short abstract in proceedings of the 11th ACM-SIAM Symposium on Discrete Algorithms, 2000. Long version to appear in Algorithmica.
15. M. Imase and B. Waxman. Dynamic Steiner tree problem. SIAM J. Discrete Math., 4(3), 369-384, 1991.
16. C. Intanagonwiwat, R. Govindan, and D. Estrin. Directed diffusion: A scalable and robust communication paradigm for sensor networks. Proceedings of the 6th Annual International Conference on Mobile Computing and Networking (MobiCOM), 2000.
17. A. Kamath, O. Palmon, and S. Plotkin. Routing and admission control in general topology networks with Poisson arrivals. Proceedings of the 7th ACM-SIAM Symposium on Discrete Algorithms, 1996.
18. D. Waitzman, C. Partridge, and S. Deering. Distance Vector Multicast Routing Protocol. Internet RFC 1075, 1988.
19. N. Young. Randomized rounding without solving the linear program. Proceedings of the 6th ACM-SIAM Symposium on Discrete Algorithms, 1995.