Network Algorithms: Techniques for Design and Analysis

Nick McKeown, Depts of EE and CS, Stanford University
Balaji Prabhakar, Depts of EE and CS, Stanford University
Ashish Goel, Dept of CS, USC

ACM SIGCOMM 2001, San Diego, CA
2
Overview and Objectives
• Algorithm design is a classical subject
– beginnings in early computations (multiplication, division, etc.)
– and has become more sophisticated and mature in the computer age

• The subject has grown
– when tasks have to be performed under more stringent conditions
– or, when new tasks have to be performed
– (typically, it’s a combination of both reasons)
3
Algorithms for Networks
• Networking provides a rich new context for algorithm design
– algorithms are used everywhere in networks
– at the end-hosts for packet transmission
– in the network: switching, routing, caching, etc.
– many new scenarios, and very stringent constraints
– high speed of operation
– large-sized systems
– cost of implementation
– require new approaches and techniques
4
Methods
• Algorithm analysis
– sheds light on the complexity of an algorithm (time, space, resource)
– uses discrete math and has many standard methods
– implementors care about algorithm complexity
– (needs cautious interpretation: the metrics of the implementor and of the theoretician are not necessarily the same)

• In the networking context
– we also need to understand the “performance” of an algorithm: how well does a network or a component that uses a particular algorithm perform, as perceived by the user?
– performance analysis is concerned with metrics like delay, throughput, loss rates, etc.
– this requires continuous math methods: e.g. queueing theory
5
Recent Algorithm Design Methods
• Motivated by the desire
– for simple implementations
– and for robust performance (because operating conditions change and are unknown, or because of security reasons)

• Several new methods of algorithm design can be used in the networking context
– randomized algorithms
– approximate algorithms
– genetic algorithms
– online algorithms
– combinatorial optimization techniques
6
Performance Analysis Methods
• There are many: some classical, some new
– standard queueing theory (assuming input distributions are known, can we say what delays and buffer occupancies will be like?)
– fluid models (very simple and useful for determining throughput regions)
– adversarial analysis (useful for worst-case analyses: your worst enemy is generating traffic to beat your algorithm)
– competitive analysis (useful for comparing two different algorithms on the same inputs – like a competition)
7
In this tutorial…
• We will consider a number of problems in networking
• Show various methods for algorithm design and for performance analysis
• Nick McKeown– Switch scheduling algorithms
• Balaji Prabhakar– Randomized algorithms
• Ashish Goel– Competitive analysis of approximate algorithms
8
Disclaimers
• This tutorial is idiosyncratic
– we talk about things we know well: the subject is larger
– we also talk about things we’ve worked on (hopefully, these are also things we know well)

• Your participation is essential
– please don’t hesitate to ask for clarifications
– there is a lot of material and some of it is not easy
– there is no such thing as a stupid question

• References
– are included for each topic, but they are not exhaustive (please don’t be upset if we didn’t cite your paper)
– if you need more details, please drop us a note
Switch Scheduling Algorithms
10
Scheduling crossbar switches to achieve 100% throughput
• Background to the problem
• Techniques and algorithms
11
Background to switch scheduling
1. [Karol et al. 1987] Throughput limited to 2 − √2 ≈ 58% by head-of-line (HOL) blocking for Bernoulli IID uniform traffic.
2. [Tamir 1989] Observed that with “Virtual Output Queues” (VOQs), head-of-line blocking is reduced and throughput goes up.
12
History of the theory
3. [Anderson et al. 1993] Observed analogy to maximum size matching in a bipartite graph.
4. [McKeown et al. 1995] (a) A maximum size match cannot guarantee 100% throughput. (b) But a maximum weight match can – O(N³).
5. [Mekkittikul and McKeown 1998] A carefully picked maximum size match – an O(N^2.5) matching – can give 100% throughput.
13
History of the theory (2): Speedup
5. [Chuang, Goel et al. 1997] Precise emulation of a central shared memory switch is possible with a speedup of two and a “stable marriage” scheduling algorithm.
6. [Prabhakar and Dai 2000] 100% throughput possible for maximal matching with a speedup of two.
14
History of the theory (3): Newer approaches
7. [Tassiulas 1998] 100% throughput possible for simple randomized algorithm with memory.
8. [Giaccone et al. 2001] “Apsara” algorithms.
9. [Iyer and McKeown 2000] Parallel switches can achieve 100% throughput and emulate an output queued switch.
10. [Chang et al. 2000] A 2-stage switch with a TDM scheduler can give 100% throughput.
15
Scheduling crossbar switches to achieve 100% throughput
1. Basic switch model.
2. When traffic is uniform. (Many algorithms…)
3. When traffic is non-uniform, but the traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When the traffic matrix is not known.
• Technique: Lyapunov function.
5. When the algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When the algorithm does not complete.
• Technique: Randomized algorithm.
7. When there is speedup.
• Technique: Fluid model.
8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.
16
Basic Switch Model
[Figure: N×N input-queued crossbar. Arrivals A_1(n) … A_N(n) are demultiplexed into VOQs with occupancies L_11(n) … L_NN(n); the crossbar configuration S(n) connects inputs to outputs, producing departures D_1(n) … D_N(n).]
17
Some definitions
1. Traffic matrix: Λ = [λ_ij], where λ_ij := E[A_ij(n)].
   If Σ_i λ_ij < 1 and Σ_j λ_ij < 1, we say the traffic is “admissible”.
2. Service matrix: S(n) = [s_ij(n)], where s_ij ∈ {0, 1} and S is a permutation matrix.
3. Queue occupancies: L_ij(n), i.e. L_11(n), …, L_NN(n).
18
Some possible performance goals
When traffic is admissible:
1. Work conservation.
2. “100% throughput”.
3. L_ij(n) < C, ∀n.
4. E[L_ij(n)] < C.
5. lim_{n→∞} D_ij(n)/n = lim_{n→∞} A_ij(n)/n = λ_ij.
6. Other metrics…?
19
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.
2. When traffic is uniform. (Many algorithms…)
3. When traffic is non-uniform, but the traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When the traffic matrix is not known.
• Technique: Lyapunov function.
5. When the algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When the algorithm does not complete.
• Technique: Randomized algorithm.
7. When there is speedup.
• Technique: Fluid model.
8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.
20
Algorithms that give 100% throughput for uniform traffic
• Quite a few algorithms give 100% throughput when traffic is uniform1
• For example:– Maximum size bipartite match.– TDM and a few variants– Wait-until-full– iSLIP
1. “Uniform”: the destination of each cell is picked independently and uniformly at random (uar) from the set of all outputs.
21
Maximum size bipartite match
• Intuition: maximizes instantaneous throughput
• Gives 100% throughput for uniform traffic.
[Figure: the request graph (edges for VOQs with L_11(n) > 0, …, L_N1(n) > 0) and the corresponding maximum size bipartite match.]
22
Network flows and bipartite matching
Finding a maximum size bipartite matching is equivalent to solving a network flow problem
with capacities and flows of size “1”.
[Figure: the network-flow formulation – source s, inputs A–F, outputs 1–6, sink t; every edge has capacity 1.]
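To make the equivalence concrete, here is a minimal sketch (ours, not from the tutorial) of maximum size bipartite matching via augmenting paths – the unit-capacity special case of the Ford-Fulkerson method. The adjacency-list encoding of the request graph is an illustrative assumption.

def max_size_matching(adj, n_outputs):
    # adj[u] lists the outputs requested by input u
    match_of_output = [None] * n_outputs    # output j -> matched input

    def augment(u, visited):
        # try to match input u, re-matching earlier inputs if needed
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            if match_of_output[v] is None or augment(match_of_output[v], visited):
                match_of_output[v] = u
                return True
        return False

    size = sum(augment(u, set()) for u in range(len(adj)))
    return size, match_of_output

# Example: inputs A..F encoded as 0..5, outputs 1..6 as 0..5
print(max_size_matching([[0, 1], [0, 2], [1], [3, 4], [3], [4, 5]], 6))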
23
Network flows and bipartite matching
Ford-Fulkerson method. [Figure: residual graph for the first three augmenting paths.]
24
Network flows and bipartite matching
[Figure: residual graph for the next two paths.]
25
Network flows and bipartite matching
[Figure: residual graph for the augmenting path.]
26
Network flows and bipartite matching
[Figure: residual graph for the last augmenting path.]
27
Network flows and bipartite matching
[Figure: the maximum flow graph.]
28
Network flows and bipartite matching
[Figure: the resulting maximum size matching.]
29
Aside: Maximal Matching
• A maximal matching is one in which each edge is added one at a time, and is not later removed from the matching.
• i.e. no augmenting paths allowed (they remove edges added earlier).
• No input and output are left unnecessarily idle.
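For contrast with the maximum size match, a minimal sketch (ours) of a greedy maximal matching: one linear scan over the request edges, no augmenting paths.

def maximal_matching(edges):
    used_in, used_out = set(), set()
    match = []
    for i, j in edges:                     # any fixed scan order
        if i not in used_in and j not in used_out:
            match.append((i, j))           # an edge added once is never removed
            used_in.add(i)
            used_out.add(j)
    return match

# Example: the maximum matching here has size 3 ((0,1), (1,0), (2,2)),
# but the greedy scan may return only 2 edges -- at least half, as the
# next aside notes.
print(maximal_matching([(0, 0), (0, 1), (1, 0), (2, 2)]))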
30
Aside: Example of Maximal Size Matching
[Figure: side-by-side example on the same request graph (inputs A–F, outputs 1–6) – a maximal matching on the left, a maximum matching on the right.]
31
Aside: Maximal Matchings
• In general, maximal matching is much simpler to implement, and has a much faster running time.
• A maximal size matching is at least half the size of a maximum size matching.
• A maximal weight matching is defined in the obvious way.
• A maximal weight matching is at least half the weight of a maximum weight matching.

End of aside
32
Algorithms that give 100% throughput for uniform traffic
• Quite a few algorithms give 100% throughput when traffic is uniform1
• For example:– Maximum size bipartite match.– TDM and a few variants– Wait-until-full
1. “Uniform”: the destination of each cell is picked independently and uniformly at random (uar) from the set of all outputs.
33
TDM Scheduling Algorithm
If arriving traffic is i.i.d. with destinations picked uar across outputs, then a “TDM” schedule gives 100% throughput.
[Figure: the N = 4 TDM schedule – three successive permutations, each connecting inputs A–D to outputs 1–4 in a rotating pattern.]
Variation 1: if permutations are picked uar from the set of N! permutations, this too will give 100% throughput.
Variation 2: if permutations are picked uar from the TDM permutations above, this too will give 100% throughput.
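A minimal sketch (ours) of the TDM schedule: in slot n, input i connects to output (i + n) mod N, so every input-output pair is served exactly once every N slots. The particular rotation direction is an arbitrary choice.

def tdm_permutation(n, N):
    # input i -> output (i + n) mod N
    return {i: (i + n) % N for i in range(N)}

for n in range(4):
    print(n, tdm_permutation(n, 4))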
34
A Simple wait-until-full algorithm
The following algorithm is believed to be stable for Bernoulli i.i.d. uniform arrivals:
1. If any VOQ is empty, do nothing (i.e. serve no queues).
2. If no VOQ is empty, pick a permutation uar across either (TDM permutations, or all permutations).
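A minimal sketch (ours) of the wait-until-full rule, drawing from the TDM permutations; voq[i][j] is an assumed occupancy matrix.

import random

def wait_until_full_schedule(voq):
    N = len(voq)
    if any(voq[i][j] == 0 for i in range(N) for j in range(N)):
        return None                        # some VOQ empty: serve no queues
    shift = random.randrange(N)            # pick one of the N TDM permutations uar
    return {i: (i + shift) % N for i in range(N)}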
35
Some simple algorithms that achieve 100% throughput
36
Some observations
• A maximum size match (MSM) maximizes instantaneous throughput.
• But an MSM is complex – O(N^2.5).
• It turns out that there are many simple algorithms that give 100% throughput for uniform traffic.
• So what happens if the traffic is non-uniform?
37
Why doesn’t maximizing instantaneous throughput give 100% throughput for non-uniform traffic?
Three possible matches, S(n): the example has flows with arrival rates λ_11 = λ_12 = λ_21 = 1/2 (each minus a small δ).

Assume that at time n, L_11(n) > 0 and L_12(n) > 0, and that both competing VOQs have arrivals (w.p. (1/2)²); input 1 is then serviced w.p. 1/2 on a tie. Bounding the total rate at which input 1 can be served (the slide’s algebra was lost in extraction) shows it falls short of input 1’s total arrival rate. And so if δ < 0.0358, the switch is not stable (throughput < 100%).
38
Simulation of simple 3x3 example
39
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.2. When traffic is uniform (Many algorithms…)3. When traffic is non-uniform, but traffic matrix is known
• Technique: Birkhoff-von Neumann decomposition.
4. When matrix is not known.• Technique: Lyapunov function.
5. When algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When algorithm does not complete.• Technique: Randomized algorithm.
7. When there is speedup.• Technique: Fluid model.
8. When there is no algorithm.• Technique: 2-stage load-balancing switch.• Technique: Parallel Packet Switch.
40
Example 1: (Trivial) scheduling to achieve 100% throughput
• Assume we know the traffic matrix, and the arrival pattern is deterministic:
• Then we can simply choose:
• Q: What is Lij(n)?
Λ = [1 0 … 0; 0 1 … 0; …; 0 0 … 1]   (one cell per slot from input i to output i),

S(n) = Λ, ∀n.
41
Example 2: With random arrivals, but known traffic matrix
• Assume we know the traffic matrix, and the arrival pattern is random:
• Then we can simply choose:
• Q: Does L_ij(n) = 0 for all n?
• Q: In general, if we know Λ, can we pick a sequence S(n) to achieve 100% throughput?
Λ = [1/2 1/2 0 0; 1/2 1/2 0 0; 0 0 1 0; 0 0 0 1],

S(odd) = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1],   S(even) = [0 1 0 0; 1 0 0 0; 0 0 1 0; 0 0 0 1].
42
Birkhoff - von Neumann Decomposition
Intuitively, we can pick some set of constants (a_1, …, a_r) and a set of service matrices (M_1, …, M_r) such that:

  Λ ≤ Σ_{i=1…r} a_i M_i   (element by element).

Then pick the sequence of service matrices:

  S(n) = (M_1, M_2, M_2, M_3, …, M_1, …),

so that the # of occurrences of M_i in a period T is ⌈T·a_i⌉. So, over the period,

  (1/T) Σ_n S(n) − Λ ≥ 0:

in other words, the departure rate exceeds the arrival rate.

It turns out that any admissible Λ can always be decomposed into a linear (convex) combination of permutation matrices (M_1, …, M_r), by the Birkhoff-von Neumann theorem.
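The decomposition can be computed greedily. A minimal brute-force sketch (ours, workable only for small N) that peels permutations off a doubly stochastic matrix:

from itertools import permutations

def bvn_decompose(L, eps=1e-9):
    L = [row[:] for row in L]              # don't mutate the caller's matrix
    N = len(L)
    terms = []
    while True:
        best = None
        for perm in permutations(range(N)):
            a = min(L[i][perm[i]] for i in range(N))
            if a > eps and (best is None or a > best[0]):
                best = (a, perm)
        if best is None:
            return terms                   # nothing positive left to peel off
        a, perm = best
        terms.append((a, perm))
        for i in range(N):
            L[i][perm[i]] -= a

print(bvn_decompose([[0.75, 0.25], [0.25, 0.75]]))
# -> [(0.75, (0, 1)), (0.25, (1, 0))]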
43
In practice…
• Unfortunately, we usually don’t know the traffic matrix Λ a priori, so we can:
– Measure or estimate Λ, or
– Not use Λ.
• In what follows, we will assume we don’t know or use Λ.
44
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.
2. When traffic is uniform. (Many algorithms…)
3. When traffic is non-uniform, but the traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When the traffic matrix is not known.
• Technique: Lyapunov function.
5. When the algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When the algorithm does not complete.
• Technique: Randomized algorithm.
7. When there is speedup.
• Technique: Fluid model.
8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.
45
When the traffic matrix is not known
1. We will try and find conditions for which, roughly:
   E[L_ij(n+1) − L_ij(n) | L_ij(n)] < 0, whenever L_ij(n) > 0,
   i.e. E[L_ij(n) − S_ij(n) + A_ij(n) | L_ij(n)] < L_ij(n).
2. In other words, there is an expected downward drift in the occupancy of each queue.
3. This is an example of a Lyapunov function.
4. It is known that if
   E[V{L(n+1)} − V{L(n)} | L(n)] ≤ −ε < 0
   for some V{.}, then E[L_ij(n)] < ∞, ∀ i, j.
5. The same result holds if:
   E[V{L(n+1)} − V{L(n)} | L(n)] ≤ c − k‖L(n)‖.
46
Some additional definitions
Arrivals:   A(n) = [A_ij(n)], an N×N matrix with A_ij(n) ∈ {0, 1}.
Arrival rate:   Λ = [λ_ij], with λ_ij = E[A_ij(n)].
Service:   S(n) = [S_ij(n)], s.t. S(n) is a permutation matrix, i.e. S_ij ∈ {0, 1}, Σ_i S_ij = 1, Σ_j S_ij = 1.
Evolution:   L_ij(n+1) = L_ij(n) − S_ij(n) + A_ij(n), if L_ij(n) > 0; L_ij(n+1) = A_ij(n) otherwise.
Approx evolution:   L̃_ij(n+1) = L̃_ij(n) − S_ij(n) + A_ij(n).
47
Some facts that we’ll use
1. Λ is doubly sub-stochastic, i.e. Σ_i λ_ij ≤ 1 and Σ_j λ_ij ≤ 1. Let C be the set of such matrices:
   C = {Λ = [λ_ij] : λ_ij ≥ 0, Σ_i λ_ij ≤ 1, Σ_j λ_ij ≤ 1}.
2. C is a closed (convex) set: e.g. if Λ_1, Λ_2 ∈ C, then aΛ_1 + bΛ_2 ∈ C for a + b = 1, a, b ≥ 0.
3. Extreme points of C: the extreme points are elements of C that are not linear combinations of other elements.
4. Birkhoff’s Theorem: The extreme points of C are the permutation matrices.
48
Some more facts that we’ll use
Consider the following linear programming problem:

  Find:  max L(n)^T · Λ   s.t. Λ ∈ C,

where L(n)^T · Λ denotes Σ_ij L_ij(n) λ_ij. We know that the solution is an extreme point of C, i.e. a permutation matrix:

  max_{Λ ∈ C} (L(n)^T Λ) = max_{S(n)} (L(n)^T S(n)).

Q: So what is max_{S(n)} (L(n)^T S(n))?
49
Maximum weight matching
[Figure: the N×N switch with VOQs L_11(n) … L_NN(n), arrivals A_1(n) … A_N(n) and departures D_1(n) … D_N(n); beside it, the request graph weighted by the L_ij(n) and its maximum weight match S*(n).]

  S*(n) = arg max_{S(n)} (L(n)^T S(n))
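A minimal brute-force sketch (ours) of the maximum weight match; real schedulers use polynomial assignment algorithms, but for illustration an exhaustive search over permutations is enough:

from itertools import permutations

def mwm_schedule(L):
    N = len(L)
    # S*(n) = argmax over permutations of sum_i L[i][perm[i]]
    return max(permutations(range(N)),
               key=lambda perm: sum(L[i][perm[i]] for i in range(N)))

print(mwm_schedule([[3, 0, 1], [0, 2, 0], [1, 0, 4]]))   # -> (0, 1, 2)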
50
Outline of Proof
1. We know that if we pick
   S*(n) = arg max_{S(n)} (L(n)^T S(n)),
   then (L(n)^T λ) − (L(n)^T S*(n)) < 0.
2. Next we use this fact to show that:
   E[L(n+1)^T L(n+1) − L(n)^T L(n) | L(n)] ≤ c − ε‖L(n)‖,
   where V{L(n)} = L(n)^T L(n) is our Lyapunov function.
3. Hence, if ‖L(n)‖ is large enough, there is an expected single-step downward drift in occupancy, and so E[‖L(n)‖] < ∞ and 100% throughput is achieved.

For more details, see the reference.
51
Choosing the weight
Q: Do we need to choose edge weights w_ij(n) = L_ij(n)?

Fact 1: If we choose w_ij(n) = [L_ij(n)]^x (e.g. [L_ij(n)]², [L_ij(n)]³, …), the same Lyapunov method can be used to show that 100% throughput is achieved.
Fact 2: (partially lost in extraction; it relates the weights [L_ij(n)]^x and the expectations E[L_ij(n)].)
Observation: Theory suggests that if w_ij(n) = [L_ij(n)]^x, then the switch becomes “more stable” as we increase x. Simulation suggests that average delay decreases as we increase x.
Fact 3: If w_ij(n) is defined to be the time that the HOL cell has been in queue ij, then 100% throughput is achieved.
Fact 4: If w_ij(n) = Σ_j L_ij(n) + Σ_i L_ij(n) (the total occupancy of input i plus output j), then 100% throughput is achieved. This is called a “Longest Port First (LPF)” match, and (surprisingly) is also a maximum size match.
52
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.
2. When traffic is uniform. (Many algorithms…)
3. When traffic is non-uniform, but the traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When the traffic matrix is not known.
• Technique: Lyapunov function.
5. When the algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When the algorithm does not complete.
• Technique: Randomized algorithm.
7. When there is speedup.
• Technique: Fluid model.
8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.
53
100% throughput with pipelining
In practice, switch schedulers are often pipelined. So what happens if the pipeline uses out-of-date information?

1. Define the out-of-date occupancy at time n:
   L̂_ij(n) = L_ij(n − k),
   where k is how out-of-date the information is.
2. Because |L̂_ij(n) − L_ij(n)| ≤ k, it can be shown that:
   E[L(n+1)^T L(n+1) − L(n)^T L(n) | L(n)] ≤ c − ε‖L(n)‖ + 2Nk   (additional term).
3. As before, if ‖L(n)‖ is large enough, there is an expected single-step downward drift in occupancy, and so E[‖L(n)‖] < ∞ and 100% throughput is achieved.

Q: If we increase k, will average delay increase?
54
100% throughput with incomplete information
In practice, the bandwidth of state information to/from and within a switch scheduler is limited. So what happens if the scheduler uses fewer bits to store the weight information?

1. Define the noisy information at time n:
   L̂_ij(n) = L_ij(n) + e(n),
   where e(n) is an error term.
2. If |e(n)| ≤ C, ∀n, where C is some constant, then 100% throughput is achieved.
55
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.
2. When traffic is uniform. (Many algorithms…)
3. When traffic is non-uniform, but the traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When the traffic matrix is not known.
• Technique: Lyapunov function.
5. When the algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When the algorithm does not complete.
• Technique: Randomized algorithm.
7. When there is speedup.
• Technique: Fluid model.
8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.
56
Achieving 100% when algorithm does not complete
Randomized algorithms:1. Basic idea (Tassiulas)2. Reducing delay (Shah, Giaccone and
Prabhakar)
Note: Balaji Prabhakar will cover randomized scheduling algorithms in detail in the next section of the tutorial.
57
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.
2. When traffic is uniform. (Many algorithms…)
3. When traffic is non-uniform, but the traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When the traffic matrix is not known.
• Technique: Lyapunov function.
5. When the algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When the algorithm does not complete.
• Technique: Randomized algorithm.
7. When there is speedup.
• Technique: Fluid model.
8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.
58
Speedup and Combined Input Output Queueing (CIOQ)
[Figure: the N×N CIOQ switch – VOQs L_11(n) … L_NN(n) at the inputs, crossbar configuration S(n), output queues, and departures D_1(n) … D_N(n).]

• With speedup s, the matching is performed s times per cell time, and up to s cells are removed from each VOQ.
• Therefore, output queues are required.
59
Fluid model
• Fluid models are used to obtain stability regions for discrete-time stochastic networks.
• They apply to any traffic that satisfies a strong law of large numbers, i.e.

  lim_{n→∞} A_ij(n)/n = λ_ij.

• The fluid model “washes out” the packet structure, yet can still prove stability results.
60
Fluid Model
Switch evolution:

  L_ij(n) = L_ij(0) + A_ij(n) − D_ij(n),
  D_ij(n) = Σ_{S^m ∈ S} S^m_ij T_m(n),

where T_m(n) is the cumulative time that permutation S^m has been used by slot n, and Σ_m T_m(n) = n.

Fluid equations in continuous time:

  L̄_ij(t) = L̄_ij(0) + λ_ij·t − D̄_ij(t),
  D̄_ij(t) = Σ_{S^m ∈ S} S^m_ij T̄_m(t),   with Σ_m T̄_m(t) = t.
61
Fluid Model
Sketch of proof (Dai and Prabhakar): If the match S* is a maximal match, then for each phase k ∈ {1, …, s} of time slot n, the following must hold whenever L_ij(n + k/s) > 0:

1. Σ_{j'} S*_{ij'}(n + k/s) = 1,   and/or
2. Σ_{i'} S*_{i'j}(n + k/s) = 1.

In other words, either the input must be served and/or the output must be served. From this, it can be shown that with a speedup of 2 the switch is stable, and 100% throughput is achieved.

For more details, see the reference.
62
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.
2. When traffic is uniform. (Many algorithms…)
3. When traffic is non-uniform, but the traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When the traffic matrix is not known.
• Technique: Lyapunov function.
5. When the algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When the algorithm does not complete.
• Technique: Randomized algorithm.
7. When there is speedup.
• Technique: Fluid model.
8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.
63
2-stage switch and no scheduler
Motivation:
1. If traffic is uniformly distributed, then even a TDM schedule gives 100% throughput.
2. So why not force non-uniform traffic to be uniformly distributed?
64
2-stage switch and no scheduler
[Figure: the 2-stage switch – a bufferless load-balancing stage S1(n) spreads the arrivals A_1(n) … A_N(n) over the internal ports (A′_1(n) … A′_N(n)), followed by a buffered switching stage S2(n) with VOQs L_11(n) … L_NN(n) and departures D_1(n) … D_N(n).]
65
2-stage switch with no scheduler
Main Result [Chang et al.]:

1. Consider a periodic sequence of permutation matrices π(n) = π̂(n mod N), where π̂ is a one-cycle permutation matrix (for example, a TDM sequence).
2. The 1st stage is scheduled by such a sequence of permutation matrices S1(n), with a random starting phase.
3. The 2nd stage is scheduled by such a sequence of permutation matrices S2(n).
4. Then the switch gives 100% throughput for a very broad range of traffic types.

Observation 1: The 1st stage makes non-uniform traffic uniform, and breaks up burstiness. For bursty traffic, delay can be lower than for an output queued switch!
Observation 2: Cells can become mis-sequenced.
66
Parallel Packet Switches
Definition:
A PPS is comprised of multiple identical lower-speed packet-switches operating independently and in parallel. An incoming stream of packets is spread, packet-by-packet, by a demultiplexor across the slower packet-switches, then recombined by a multiplexor at the output.
We call this “parallel packet switching”
67
Architecture of a PPS
[Figure: PPS architecture – N = 4 external ports at rate R on each side; demultiplexors spread arriving packets over k = 3 parallel OQ switches (layers 1, 2, 3) via internal links of rate sR/k; multiplexors recombine the packets at the outputs.]
68
Why a PPS isn’t work-conserving with s = 1

[Figure: worked example on the k = 3, N = 4 PPS with internal links of rate R/3 – a sequence of cell arrivals is spread over the three layers, and an external output is forced to idle while cells destined to it still sit inside the layers.]
69
Why is there no Choice at the Input?

[Figure: worked example (“how we got there on the input side”) – after a burst of back-to-back cells, the slow internal links from an input are still busy, leaving it no choice of layer for the next cell.]
70
Result of no Choice

[Figure: continuation of the example above, showing the consequence of having no choice at the input.]
71
How can we increase choice? Speedup

[Figure: the same example with internal links sped up to 2R/3 – inputs and outputs now have a choice of layers.]
72
Effect of Speedup on Choice
[Figure: a speedup of S = 2 with k = 10 links – each internal link runs at 2R/k, so at most k/S links can be busy carrying cells for one port at any time (layers 1 … 10 shown).]
73
Aside: 3-stage Clos Network
[Figure: 3-stage Clos network – m first-stage switches of size n×k, k center-stage switches of size m×m, and m third-stage switches of size k×n; N = n×m and k ≥ n.]
74
Aside: With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: scheduler chooses to match(1,1), (2,4), (3,3), (4,2)
75
Aside: With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: scheduler chooses to match(1,1), (2,2), (4,4), (5,3), …
76
Aside: With k > n, can a Clos network be non-blocking without rearrangement?
Clos’ Theorem: If k ≥ 2n − 1, then a new connection can always be added without rearrangement.
77
[Figure: the Clos network labeled with first-stage switches I_1 … I_m, center-stage switches M_1 … M_k, and third-stage switches O_1 … O_m (stage sizes n×k, m×m, k×n; N = n×m, k ≥ n).]
Aside: With k > n, can a Clos network be non-blocking without rearrangement?
78
Clos Theorem
[Figure: adding the n-th connection between first-stage switch I_a and third-stage switch O_b; n−1 connections are already in use at the input and n−1 at the output.]

1. Consider adding the n-th connection between 1st-stage I_a and 3rd-stage O_b.
2. We need to ensure that there is always some center-stage M available.
3. Since at most n−1 center stages are already in use at the input and n−1 at the output, if k > (n−1) + (n−1), then there is always an M available, i.e. k ≥ 2n − 1.
End of aside
79
Definitions for PPS
• Available Input Link Set (AIL)
AIL(i, n) is the set of layers to which external input port i can start writing a cell at time slot n.
80
Definition
• Departure Time of a Cell (n’)
The departure time of a cell, n’, is the time it would have departed from an equivalent FIFO OQ switch.
81
Definition
• Available Output Link Set (AOL)
AOL(j,n’) is the set of layers that output j can start reading a cell from, at time slot n’.
82
Main Observation
[Figure: the k = 3, N = 4 PPS with speedup (internal links at 2R/3) – the layers belonging to an input’s AIL and an output’s AOL are highlighted.]

• Inputs can only send to the AIL set.
• Outputs can only read from the AOL set.
83
Lower Bounds on Choice Sets

Minimum size of AIL, AOL:

  |AIL|, |AOL| ≥ (total links) − (maximum number of links which can have cells in progress)
              = k − (⌈k/S⌉ − 1)
84
Assurance of Choice
• A cell must be sent to a link which belongs to both the AIL and the AOL set, i.e. we need AIL ∩ AOL ≠ ∅.
• This holds whenever |AIL| + |AOL| > k:

  (k − ⌈k/S⌉ + 1) + (k − ⌈k/S⌉ + 1) > k,   which is satisfied if S ≥ 2k/(k + 2).

(The sketch below checks the arithmetic.)
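A minimal sketch (ours) of the counting argument:

from math import ceil

def sets_must_intersect(k, S):
    # |AIL|, |AOL| >= k - (ceil(k/S) - 1); the two sets must overlap
    # whenever the two lower bounds together exceed k (pigeonhole)
    lower_bound = k - (ceil(k / S) - 1)
    return 2 * lower_bound > k

print(sets_must_intersect(10, 2))   # True: S = 2 suffices
print(sets_must_intersect(10, 1))   # False: no guarantee without speedup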
85
Parallel Packet SwitchResults
• If S >= 2 then each cell is guaranteed to find a layer that belongs to both the AIL and AOL sets.
• If S >= 2 then a PPS can precisely emulate a FIFO output queued switch for all traffic patterns, and hence achieves 100% throughput.
86
Scheduling algorithms to achieve 100% throughput
1. Basic switch model.2. When traffic is uniform (Many algorithms…)3. When traffic is non-uniform, but traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.
4. When matrix is not known.• Technique: Lyapunov function.
5. When algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.
6. When algorithm does not complete.• Technique: Randomized algorithm.
7. When there is speedup.• Technique: Fluid model.
8. When there is no algorithm.• Technique: 2-stage load-balancing switch.• Technique: Parallel Packet Switch.
87
References
1. C.-S. Chang, W.-J. Chen, and H.-Y. Huang, "Birkhoff-von Neumann input buffered crossbar switches," in Proceedings of IEEE INFOCOM '00, Tel Aviv, Israel, 2000, pp. 1614 – 1623.
2. N. McKeown, A. Mekkittikul, V. Anantharam, and J. Walrand. Achieving 100% Throughput in an Input-Queued Switch. IEEE Transactions on Communications, 47(8), Aug 1999.
3. A. Mekkittikul and N. W. McKeown, "A practical algorithm to achieve 100% throughput in input-queued switches," in Proceedings of IEEE INFOCOM '98, March 1998.
4. L. Tassiulas, “Linear complexity algorithms for maximum throughput in radio networks and input-queued switches,” in Proc. IEEE INFOCOM ’98, San Francisco, CA, April 1998.
5. D. Shah, P. Giaccone and B. Prabhakar, “An efficient randomized algorithm for input-queued switch scheduling,” in Proc. Hot Interconnects 2001.
6. J. Dai and B. Prabhakar, "The throughput of data switches with and without speedup," in Proceedings of IEEE INFOCOM '00, Tel Aviv, Israel, March 2000, pp. 556 -- 564.
7. C.-S. Chang, D.-S. Lee, and Y.-S. Jou, “Load balanced Birkhoff-von Neumann switches,” in Proceedings of IEEE HPSR ’01, Dallas, Texas, May 2001.
8. S. Iyer and N. McKeown, “Making parallel packet switches practical,” in Proc. IEEE INFOCOM ’01, Alaska, April 2001.
Randomized Algorithms
Balaji Prabhakar
89
Motivation
• Networking problems suffer from the “curse of dimensionality”
– algorithmic solutions do not scale well
• Typical causes
– size: large number of users
– time: very high speeds of operation
• A good deterministic algorithm exists, but …
– it requires too large a data structure
– it needs state information, and “state” is too big
– it “starts from scratch” in each iteration
90
Overview
• In various scenarios, e.g.
– caching
– load balancing
– switch scheduling
– packet dropping (active queue management)
• We will
– consider good (even optimal) exact algorithms
– discuss their complexity
– design approximate algorithms
– and, analyze their performance
91
Some Specifics
• Exact algorithms – in each scenario these are either well-known or easily
determined– when their analysis and optimality properties have been
established in the classical theoretical literature, we will only give some intuition and point to references
– if their development is more recent (e.g. switch scheduling), we will consider them in more detail
thus, the main focus of this segment is the design and analysis of approximate randomized schemes
• Randomized algorithms– are a powerful way of approximating– it is often possible to randomize deterministic
algorithms – this simplifies the implementation while retaining a
(surprisingly) high level of performance
92
Randomization
• The main idea is – to simplify the decision-making process– by basing decisions upon a small, randomly
chosen sample of the state – rather than upon the complete state
93
An Illustrative Example
• Find the largest element of a set S of size 1 billion
• Deterministic algorithm: linear search
– has a complexity of 1 billion
• The randomized version: find the largest of 10 randomly chosen samples
– has a complexity of 10
– (note: this ignores the complexity of choosing 10 random samples)
• Performance
– linear search will find the absolute largest element
– if R is the element found by the randomized algorithm, we can make statements like:
  P(R is at least the 100 millionth largest element) = 1 − (9/10)^10 ≈ 0.65
– thus, we can say that the performance of the randomized algorithm is very good with a high probability
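A minimal sketch (ours) of the example; the set S and the sample count are illustrative assumptions:

import random

def approx_max(S, samples=10):
    # complexity 10 instead of |S|; misses the top 10% of S only if
    # all ten samples do, i.e. with probability 0.9**10
    return max(random.choice(S) for _ in range(samples))

print(1 - 0.9 ** 10)   # ~0.651 = P(result is in the top 10%)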
94
Randomizing Iterative Schemes
• Often, we want to perform some operation iteratively
• Example: find the heaviest matching in a switch in every time slot
• Since, in each time slot– at most one packet can arrive at each input– and, at most one packet can depart from each output the size of the queues, or the “state” of the switch, doesn’t
change by much between successive time slots so, a matching that was heavy at time t will quite likely continue
to be heavy at time t+1
• This suggests that– knowing a heavy matching at time t should help in determining
a heavy matching at time t+1 there is no need to start from scratch in each time slot
95
Summarizing the Philosophy…
• Randomized algorithms can help simplify the implementation– by reducing the amount of work in each iteration
• If the state of the system doesn’t change by much between iterations, then– we can reduce the work even further by carrying
information between iterations
• The big pay-off is that, even though it is an approximation, the performance
of a randomized scheme can be surprisingly good
96
Examples
• We’ll discuss these issues in the following scenarios
– document replacement in web-caches– load balancing– switch scheduling– bandwidth firewalling via packet dropping
A Randomized Web-Cache Replacement Scheme
98
Background
• Tremendous increase in HTTP traffic
• Proxy caches reduce
– network traffic
– download latency
– server load
99
Replacement Policies
• In CPU caches
– the Least Recently Used (LRU) algorithm and variants
• recentness of use exploits temporal correlation
• In Web caches
– more complicated criteria are needed to determine the suitability of a document for eviction
• different document sizes and fetching costs
• recentness and frequency of use exploit correlation and popularity
100
Motivation
• Data structures may get complicated
– priority queues
• Supporting computations can get time-consuming
– cache of size K: O(log K) per access, to prepare for evictions
• Need efficient approximations
101
A Randomized Algorithm
• First cut
– pick N documents at random from cache
– evict the least useful document
• For subsequent iterations…
• Why throw away all previous info?
• The second best (or second least useful) sample is pretty good!
102
The Iterative Version

• First iteration
– pick N documents at random from cache
– evict the least useful document
– retain the M next least useful documents
• Subsequent iterations
– pick N − M documents at random from cache
– evict the least useful document (among the fresh N − M and the M retained)
– retain the M next least useful documents
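A minimal sketch (ours) of the iterative (N, M) scheme; the usefulness function stands in for whatever eviction metric the cache uses (recentness, size, fetch cost, …):

import random

def evict_one(cache, retained, N, M, usefulness):
    # draw fresh samples, avoiding the already-retained documents
    fresh = random.sample(list(cache - set(retained)), N - len(retained))
    candidates = sorted(fresh + retained, key=usefulness)
    victim = candidates[0]                 # evict the least useful
    cache.remove(victim)
    return victim, candidates[1:M + 1]     # retain the next M least useful

cache = set(range(1000))
retained = []
for _ in range(5):
    victim, retained = evict_one(cache, retained, N=8, M=2,
                                 usefulness=lambda d: d)
    print("evicted", victim)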
103
An Example (N=8, M=2)
[Figure: worked example – from the sample {11, 89, 2, 39, 41, 77, 95, 8}, document 95 is evicted and 89 and 77 are retained; the next round draws 6 fresh documents {3, 11, 22, 49, 25, 82}, evicts 89, and retains 82 and 77.]
104
Performance Criterion
• A deterministic algorithm would evict the most useless document
• Goal: the evicted document should be in the most useless n-th percentile
– an error occurs if the goal is not achieved
• The goal is positively correlated with hit rate
105
Memory Improves Performance
• Using memory improves performance significantly, even for small M:

  P_error(M = 0) = (1 − n)^N ≈ e^(−nN)
  P_error(M = 1) ≈ (1 − n)^(2(N−1)) ≈ e^(−2n(N−1))

• It is like choosing the minimum of the minimum
106
Some Analysis

• Compute P_error as a function of the memory M
– X_k: number of useless documents (in the n-th bin) prior to the k-th replacement
– A_k: number of useless documents acquired from resampling
– X_k is a Markov chain
107
The Markov Chain

• P_error = P(X_k = 0)
• The analysis is independent of trace characteristics

[Figure: the chain across the k-th, (k+1)-th, (k+2)-th replacements – X_{k+1} = X_k − 1(X_k > 0) + A_{k+1}.]
108
P_error

[Figure: four plots of P_error vs. M – (N = 12, n = 10%), (N = 30, n = 4%), (N = 60, n = 2%), (N = 80, n = 2%).]
109
The Right Amount of Memory (M)
• From the figures it looks like there is an optimal value for M that minimizes Perror
– that is, if M is too low, we’re not carrying enough information between iterations
– if M is too high, we’re carrying a lot of (stale) information, there isn’t much new
• So there seems to be a right balance between the amount of memory we need and the amount of random sampling
• More precisely…
110
The Right Amount of Memory (M)
• Perror is a complicated function of M
• But, we can still get some info on its dependency on M
• First, we can show that Perror is a convex function of M– to do this we need to show that the discrete
second derivative of Perror (M) is non-negative
– this is done using a “coupling argument”
• As a result there exists an optimal value of M = M*
111
The Optimal Value of M
• There is an approximate closed-form formula for M* (the expression itself was lost in extraction; see the reference)
• It is obtained using an appropriate “exponential martingale” based on the Markov chain X
112
A Comparison
113
Trace-driven Simulation
• Approximate the following web-cache replacement schemes
– LRU
– GD-Hyb (GD-Size and Hybrid)
• recentness • frequency • size • cost to fetch
(from the work of Cao and Irani ’97, and Wooster and Abrams ’97)
114
LRU: Hit Rate

[Figure: hit rate (%) vs. relative cache size (%), weekly NLANR trace – curves: LRU (non-random, black), N=30/M=5 (red), N=8/M=2 (cyan), N=3/M=1 (green), RR (N=1, M=0, blue).]
115
GD-Hyb: Hit Rate

[Figure: hit rate (%) vs. relative cache size (%), weekly NLANR trace – curves: GD-Hyb (non-random), N=30/M=5, N=8/M=2, N=3/M=1, RR (N=1, M=0).]
116
LRU: Latency Reduction

[Figure: reduced latency (%) vs. relative cache size (%), daily NLANR trace – curves: LRU (non-random), N=30/M=5, N=8/M=2, N=3/M=1, RR (N=1, M=0).]
117
GD-Hyb: Latency Reduction

[Figure: reduced latency (%) vs. relative cache size (%), daily NLANR trace – same set of curves.]
118
References
1. P. Cao and S. Irani, “Cost-aware WWW Proxy Caching Algorithms,” Proc. of the USENIX Symposium on Internet Technologies and Systems, Monterey, CA, Dec 1997
2. R. Wooster and M. Abrams, “Proxy Caching that Estimates Edge Load Delays,” 6th International WWW Conference, Santa Clara, April 1997
3. K.Psounis and B. Prabhakar, “A Randomized Web-cache Replacement Scheme,” Proc. INFOCOM 2001
4. T. Lindvall, Lectures on the Coupling Method, Wiley Series in Probability and Mathematical Statistics, Wiley, New York, 1992.
5. R. Durrett, Probability: Theory and Examples, Duxbury Press,
Second Edition, 1996.
Randomized Load Balancing
120
Load Balancing: Static Case
121
Load Balancing: Dynamic Case
122
A Simple (and elegant) Analysis
123
Continuing…
124
• Since the load doesn’t change by much between iterations …– that is, a lightly loaded queue is likely to continue to
be lightly loaded
• It might help– to remember the identity of the least loaded bin in
the current iteration for use in the next iteration – similar idea used in the web-caching problem
Carrying Information Between Iterations
125
• The (d,1) system– d random choices – 1 bin stored in memory
• Question– How well does the (d,1) system perform ?
Load Balancing with Memory
126
• The bin stored in memory– is likely to be very lightly loaded– so we might expect better load balancing
An Illustration
127
• The maximum load achieved in the (d,1) system is less than log log n / log(2d−1) + O(1) with high probability
• This is as if we had a (2d−1, 0) system
– so the bin in memory is worth at least as much as d−1 extra samples
– again, we see the minimum of minimums effect
Theorem (Shah and P)
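A minimal simulation sketch (ours) of the (d, 1) system, to see the theorem’s effect empirically:

import random

def max_load_d1(n_balls, n_bins, d):
    load = [0] * n_bins
    memory = random.randrange(n_bins)          # the one remembered bin
    for _ in range(n_balls):
        candidates = [random.randrange(n_bins) for _ in range(d)] + [memory]
        memory = min(candidates, key=lambda b: load[b])
        load[memory] += 1                      # place in the least loaded
    return max(load)

print(max_load_d1(100000, 100000, d=2))        # typically a very small number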
128
1. Y. Azar et al., “Balanced Allocations,” Proc. of ACM STOC, 1994.
2. M. Mitzenmacher, “The power of two choices in randomized load balancing,” PhD Thesis, UC Berkeley, 1996.
3. N. D. Vvedenskaya, R. Dobrushin and F. Karpelevich, “Queueing system with selection of the shortest of two queues: An asymptotic approach,” Problems of Information Transmission, 1996.
4. B. Vocking, “How Asymmetry Helps Load Balancing,” Proc. Of 40th IEEE-FOCS, 1999.
5. S. Ethier and T. Kurtz, Markov Processes: Characterization and Convergence, John Wiley and Sons, 1986.
References
Switch Scheduling
130
Switch Scheduling and Bipartite Graph Matching
• As we have seen, switch scheduling is essentially finding matchings in weighted bipartite graphs
[Figure: a weighted bipartite request graph; the edge weights are queue sizes.]
131
Scheduling Algorithm
• Ideal policy: maximum weight matching
– weights: queue size, age of packets, etc.
– very complex for high speed networks
• In practice, approximate maximum weight matchings are the best hope
• We will discover good, randomized, approximate matchings in an evolutionary fashion
– story told pictorially using simulations
132
• Switch size: 32 × 32
• Input traffic (shown for a 4 × 4 switch)
– Bernoulli i.i.d. inputs
– diagonal load matrix:

  Λ = [x y 0 0; 0 x y 0; 0 0 x y; y 0 0 x]

• normalized load = x + y < 1
• x = 2y
133
Obvious Randomized Schemes
• Choose a matching at random and use it as the schedule
– doesn’t give 100% throughput
• Choose 2 matchings at random and use the heavier one as the schedule
• Choose N matchings at random and use the heaviest one as the schedule

None of these can give 100% throughput!!
134
[Figure: mean input-queue length vs. normalized load under diagonal traffic (log scale) – curves: MWM, R32, R1.]
135
Bounds on Maximum Throughput
136
Iterative Randomized Scheme(Tassiulas)
• Say M is the matching used at time t
• Let R be a new matching chosen u.a.r.
• At time t+1, use the heavier of M and R
• This gives 100% throughput!
– note: the boost in throughput is due to memory
• But, delays are very large
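A minimal sketch (ours) of one iteration of the scheme; L is the matrix of queue lengths and matchings are permutation lists:

import random

def weight(L, perm):
    return sum(L[i][perm[i]] for i in range(len(L)))

def tassiulas_step(L, prev_match):
    R = list(range(len(L)))
    random.shuffle(R)                      # a new matching chosen u.a.r.
    # keep whichever of the old match and R is heavier right now
    return R if weight(L, R) > weight(L, prev_match) else prev_match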
137
[Figure: mean input-queue length vs. normalized load under diagonal traffic – curves: MWM and Tassiulas.]
138
Observations for Improvement
• Most of the weight of a matching is carried in a small number of edges
• Hence, remember edges not matchings
139
[Figure: mean input-queue length vs. normalized load under diagonal traffic – curves: MWM, R32M32, R1M1, Tassiulas.]
140
Finer Observations
• Let M be schedule used at time t
• Choose a “good’’ random matching R
• M’ = Merge(M,R)
• M’ includes best edges from M and R
• Use M’ as schedule at time t+1
• Above procedure yields algorithm called LAURA
141
Merging Procedure

[Figure: worked example of merging – matchings with weights W(M) = 13, W(X) = 12, W(R) = 10; cycle-by-cycle gains such as 3−1+2−2 = 2 and 2−1+2−4 = −1 determine which edges survive the merge.]
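A minimal sketch (ours) of Merge(M, R): the union of two matchings splits into alternating cycles, and the heavier half of each cycle is kept.

def merge(M, R, L):
    # M, R: permutation lists (input i -> output M[i] / R[i]); L: weights
    N = len(M)
    result, seen = [None] * N, [False] * N
    for start in range(N):
        if seen[start]:
            continue
        cycle, i = [], start
        while not seen[i]:                 # walk the alternating cycle
            seen[i] = True
            cycle.append(i)
            i = R.index(M[i])              # follow M's edge out, return on R's
        wM = sum(L[i][M[i]] for i in cycle)
        wR = sum(L[i][R[i]] for i in cycle)
        better = M if wM >= wR else R      # keep the heavier alternative
        for i in cycle:
            result[i] = better[i]
    return result

By construction, the merged matching is at least as heavy as both M and R, which is what LAURA exploits.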
142
[Figure: mean input-queue length vs. normalized load under diagonal traffic – curves: MWM, M-LAURA, LAURA, iLQF, Tassiulas.]
143
References
1. L. Tassiulas, “Linear complexity algorithms for maximum throughput in radio networks and input-queued switches,” Proc. INFOCOM 1998.
2. D. Shah, P. Giaccone and B. Prabhakar, “An efficient randomized algorithm for input-queued switch scheduling,” Proc. of Hot Interconnects, 2001.
3. R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995.
A Randomized Bandwidth Partitioning Algorithm
145
The Setup
• In a congested network with many users– QoS requirements are different
• Problems:– allocate bandwidth– control queue size and hence delay
146
Approach 1: Network-centric
• Network node: fair queueing
• User traffic: any type

problem: complex implementation
147
Approach 2: User-centric
• Network node: simple FIFO
• User traffic: congestion-aware (e.g. TCP)

problem: requires user cooperation
148
Approach 2: Controlling Delays
• Use RED (Random Early Detection)
– drop incoming packets randomly based on the congestion level
– this signals the onset of congestion to the sources, who will back off (if they are responsive)
• RED is simple
– but can’t prevent unresponsive flows from eating up all the bandwidth
• Goal: find a bandwidth partitioning algorithm that is close to RED in implementational simplicity
149
Preliminary Comments
• Consider a single link shared by 1 unresponsive (red) flow and k responsive (green) flows
• Suppose the buffer gets congested
• Observe: It is likely there are more packets from the red (unresponsive) source
• So if a randomly chosen packet is evicted, it will likely be a red packet
• Therefore, one algorithm could be: When congested evict a random packet
150
Preliminary Comments
• Unfortunately, this doesn’t work because there is a small non-zero chance of evicting a green packet
• Since green sources are responsive, they interpret the packet drop as a congestion signal and back-off
• This only frees up more room for red packets
• Idea: Suppose we choose two packets at random from the queue and compare their ids; then it is quite unlikely that both will be green
• This suggests another algorithm: choose two packets at random and drop them both if their ids agree
• This works: that is, it limits the maximum bandwidth the red source can consume
151
The CHOKe Algorithm
• Builds on the previous observation
• Is a randomized algorithm (like RED, and is embedded in RED)
• Turns out to have an easily analyzable performance via fluid models
• The last point is interesting, since we’ll see how surprisingly accurate fluid models are for modeling TCP- and UDP-type traffic
152
The CHOKe Algorithm
For each arriving packet:
1. If AvgQsize ≤ Minth: admit the new packet.
2. Otherwise, draw a packet at random from the queue. If both packets are from the same flow: drop both packets.
3. Otherwise, if AvgQsize ≤ Maxth: admit the new packet with a probability p.
4. Otherwise: drop the new packet.
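A minimal sketch (ours) of the decision logic; the queue of (flow_id, payload) pairs and the RED average avg with its thresholds and probability p are assumed inputs:

import random

def choke_arrival(queue, pkt, avg, minth, maxth, p):
    if avg <= minth:
        queue.append(pkt)                  # no congestion: admit
        return
    if queue:
        k = random.randrange(len(queue))   # draw a packet at random
        if queue[k][0] == pkt[0]:          # same flow id?
            del queue[k]                   # drop both packets
            return
    if avg <= maxth:
        if random.random() < p:            # admit with a probability p
            queue.append(pkt)
        return
    # avg > maxth: drop the new packet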
153
The CHOKe Algorithm: Multiple Samples
For each arriving packet:
1. If AvgQsize ≤ Minth: admit the new packet.
2. Otherwise, draw m packets at random from the queue. If any of their flow ids match the new packet’s: drop all matched packets (and the new packet).
3. Otherwise, if AvgQsize ≤ Maxth: admit the new packet with a probability p.
4. Otherwise: drop the new packet.
154
Simulation Comparison: The setup
[Figure: dumbbell topology – m TCP sources S(1) … S(m) and n UDP sources S(m+1) … S(m+n) feed router R1 over 10 Mbps links; R1 connects to R2 over a 1 Mbps bottleneck; the TCP sinks D(1) … D(m) and UDP sinks D(m+1) … D(m+n) hang off R2 over 10 Mbps links.]
155
The Specifics
• 32 TCP flows, 1 UDP flow
• All TCPs’ maximum window size = 300
• All links have a propagation delay of 1 ms
• FIFO buffer size = 300 packets
• All packet sizes = 1 KByte
• RED: (minth, maxth) = (100, 200) packets
156
Simulation 1: 1 UDP source

[Figure: throughput (Kbps) vs. time (s), with the UDP arrival rate at 2 Mbps – UDP throughput under DropTail, RED, and CHOKe.]
157
Different UDP Loadings

[Figure: UDP throughput (with the UDP dropping percentage marked at each point, from 23.0% up to 98.3%) and average TCP throughput (Kbps) vs. UDP arrival rate (Kbps, log scale).]
158
5 UDPs and 1 Sample from Queue

[Figure: 32 TCPs, 5 UDPs (with the same arrival rate) – total UDP throughput and total TCP throughput (Kbps) vs. total UDP arrival rate (Kbps, log scale).]
159
5 Samples for 5 UDPs

[Figure: the same experiment with 5 samples drawn per arrival – total UDP and total TCP throughput vs. total UDP arrival rate.]
160
How many samples to take?
• Since we don’t know a priori how many unresponsive flows are passing through the link, take the number of samples depending on the backlog
• As Qavg increases, increase the number of samples

[Figure: the averaged queue between minth and maxth divided into regions R1, R2, …, Rk, one per sample count.]
161
A Fluid Analysis
[Figure: the queue modeled as a permeable tube with leakage – discards occur along the length of the queue.]
162
Some notation
• N: total number of packets in the buffer
• L_i(t): rate at which flow i’s packets cross position t of the buffer
– 0 = entrance and D = exit
• p_i: fraction of flow i’s packets dropped at ingress
= fraction of flow i’s packets dropped in the buffer (since drops occur in pairs)
• λ_i: rate at which flow i’s packets arrive
163
The Equation
• L_i(t)·δt − L_i(t + δt)·δt = λ_i L_i(t)·δt / N

  ⇒ −dL_i(t)/dt = λ_i L_i(t) / N,

  with L_i(0) = λ_i (1 − p_i) and L_i(D) = λ_i (1 − 2p_i).

• This first-order differential equation can be solved explicitly for L_i(t), 0 < t < D (see below).
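As a worked step (our algebra, consistent with the boundary conditions above), the equation integrates to an exponential profile, and matching the exit condition gives an implicit equation for each flow’s drop probability p_i:

\[
  L_i(t) = \lambda_i (1 - p_i)\, e^{-\lambda_i t / N}, \qquad 0 \le t \le D,
\]
\[
  L_i(D) = \lambda_i (1 - 2 p_i) \;\Rightarrow\; 1 - 2 p_i = (1 - p_i)\, e^{-\lambda_i D / N}.
\]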
164
Simulation Comparison: 1 UDP, 32 TCPs

[Figure: throughput vs. arrival rate (log scale) – the fluid model curve against the CHOKe ns simulation.]
165
Fluid Analysis of Multiple Samples
• With M samples:

  L_i(t)·δt − L_i(t + δt)·δt = M·λ_i L_i(t)·δt / N

  ⇒ −dL_i(t)/dt = M·λ_i L_i(t) / N,

  with L_i(0) = λ_i (1 − p_i)^M and L_i(D) = λ_i (1 − p_i)^M − M·λ_i p_i.
166
Comparison: 1 UDP, 2 Samples

[Figure: UDP throughput (Kbps) vs. UDP arrival rate (Mbps) – NS simulation against the fluid model.]
167
References
1. A. Demers, S. Keshav and S. Shenker, “Analysis and simulation of a fair queueing algorithm,” Proc. ACM SIGCOMM 1989.
2. S. Floyd and V. Jacobson, “Link-sharing and Resource Management Models for Packet Networks,” IEEE/ACM Trans. on Networking, 1995.
3. R. Braden et al., “Recommendations on queue management and congestion avoidance in the Internet,” IETF RFC (Informational) 2309, April 1998.
4. R. Pan, B. Prabhakar and K. Psounis, “CHOKe: A stateless active queue management scheme for approximating fair bandwidth allocation,” Proc. INFOCOM 2000.
Competitive Analysis: Theory and Applications in Networking
169
Competitive Analysis in Networking: Outline
• Background
• Incremental construction of Multicast Trees
– The Greedy Strategy
• Routing and Admission Control
– The Exponential Metric
• More Restricted Adversaries
– Adversarial Queueing Theory

Theoretical Analysis; Rules of Thumb; Pragmatic Analysis
170
Decision Making Under Uncertainty:
Online Algorithms and Competitive Analysis
• Online Algorithm:
– Inputs arrive online (one by one)
– Algorithm must process each input as it arrives
– Lack of knowledge of future arrivals results in inefficiency
• Malicious, All-powerful Adversary:
– Omniscient: monitors the algorithm
– Generates “worst-case” inputs
• Competitive Ratio:
– Worst ratio of the “cost” of the online algorithm to the “cost” of the optimum algorithm
171
Warm-up Example: The Unlucky Skier
• Beginning Skier:
– Does not know how many ski trips she will make
– Can rent skis for $40 or buy skis for $400
– Online algorithm: on each successive trip, must decide whether to buy or continue renting
• Adversary: All-powerful
– As long as the skier is renting, the adversary will send her on another trip
– As soon as the skier buys, the adversary will stop her ski trips once and for all
172
The Unlucky Skier [Contd.]
• Buy after K trips
– Cost of the algorithm = K × $40 + $400
– Optimum cost = min{$400, (K+1) × $40}
– Competitive ratio: algorithm’s cost / optimum cost
• Best strategy: rent 9 times, buy on the 10th trip
– Competitive ratio = 760/400 ≈ 2
– The best strategy does not always yield the best solution
• Ski Principle: Buy after paying enough rent
173
Competitive Analysis: Discussion
• Very Harsh Model
– All-powerful adversary
• But…
– Can often still prove good competitive ratios
– Really tough testing-ground for algorithms
– Often leads to good rules of thumb which can be validated by other analyses
– Distribution independent: doesn’t matter whether traffic is heavy-tailed, Poisson, or Bernoulli
174
Competitive Analysis in Networking: Outline
• Background
• Incremental construction of Multicast Trees
– The Greedy Strategy
• Routing and Admission Control
– The Exponential Metric
• More Restricted Adversaries
– Adversarial Queueing Theory
175
Incremental Construction of Multicast Trees
• Fixed multicast source s
– K receivers arrive one by one
– Must adapt the multicast tree to each new arrival without rerouting existing receivers
– A malicious adversary generates bad requests
– Objective: minimize the total size of the multicast tree
– Applications: streaming; cache updates; …

[Figure: lower-bound example with source s, receiver r1, and relay nodes a, b – any online choice can be forced to competitive ratio C ≥ 3/2, and worse sequences can be created.]
176
Two Classes of Algorithms
• Shortest Path Algorithm
– Each receiver connects using the shortest path to the source (or to a core)
• DVMRP [Waitzman, Partridge, Deering ’88]
• CBT [Ballardie, Francis, Crowcroft ’93]
• PIM [Deering et al. ’96]
• Greedy Algorithm [Imase and Waxman ’91]
– Each receiver connects to the closest point on the existing tree
– Independently known to the systems community
• The “naive” algorithm [Doar and Leslie ’92]
• End-system multicasting [Faloutsos, Banerjea, Pankaj ’98; Francis ’99]
177
Shortest Path Algorithm: Example
• Receivers r1, r2, r3, … , rK join in order
[Figure: example network – receivers r1, r2, r3, …, rK (joining in order) sit at distance N from the source s.]
178
Shortest Path Algorithm
• Cost of the shortest path tree ≈ K × N

[Figure: the resulting shortest-path tree – K long paths of length ≈ N, one per receiver.]
179
Shortest Path Algorithm: Competitive Ratio

• Optimum cost ≈ K + N
– Competitive ratio ≈ KN/(K + N)
• If N is large, then the competitive ratio ≈ K

[Figure: the optimum tree.]
180
Greedy Algorithm
• Theorem 1: For the greedy algorithm, competitive ratio = O(log K)
• Theorem 2: No algorithm can achieve a competitive ratio better than log K
[Imase and Waxman ’91]
Greedy algorithm is the optimum strategy
181
Proof of Theorem 1
[Alon and Azar ’93]
• L = size of the optimum multicast tree
• p_i = amount paid by the online algorithm for r_i
– i.e. the increase in size of the greedy multicast tree as a result of adding receiver r_i
• Lemma 1: The greedy algorithm pays 2L/j or more for at most j receivers
– Assume the lemma
– Total cost ≤ 2L (1 + 1/2 + 1/3 + … + 1/K) ≈ 2L log K
182
Proof of Lemma 1

• Suppose there are more than j receivers for which the greedy algorithm paid more than 2L/j
– Let these be r1, r2, …, rm, for m larger than j
– Each of these receivers is at least 2L/j away from each other and from the source
⇒ The shortest tour through all these receivers and the source has length ≥ (2L/j)·m > 2L
⇒ Cost of the multicast tree ≥ ½ (cost of the tour) > L. Contradiction!
183
Tours and Trees

[Figure: a tour through s, r1, …, rm, and the corresponding tree.]

• Each segment ≥ 2L/j ⇒ tour cost ≥ (2L/j)·m > 2L
• A tour can be constructed from the tree by repeating each edge at most twice ⇒ tree cost ≥ ½ (tour cost) > L
184
Greedy Algorithm: Recap
• Add new receiver to closest node on existing tree
• Theorem 1: For the greedy algorithm, competitive ratio = O(log K)
• Theorem 2: No algorithm can achieve a competitive ratio better than log K
• Greedy algorithm is the optimum strategy
• Shortest path algorithm can be pretty bad
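A minimal sketch (ours) of the greedy rule on a weighted graph (a dict-of-dicts adjacency encoding, an illustrative assumption): run Dijkstra from the new receiver until the search touches the existing tree.

import heapq

def attach_receiver(graph, tree_nodes, r):
    dist, prev, heap = {r: 0}, {}, [(0, r)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in tree_nodes:                    # nearest point on the tree
            path = [u]
            while path[-1] != r:               # walk back toward r
                path.append(prev[path[-1]])
            tree_nodes.update(path)
            return d, path
        if d > dist.get(u, float("inf")):
            continue                           # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return None                                # receiver disconnected

graph = {"s": {"a": 1}, "a": {"s": 1, "b": 1}, "b": {"a": 1}}
tree = {"s"}
print(attach_receiver(graph, tree, "b"))       # -> (2, ['s', 'a', 'b'])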
185
Objections to the Greedy Algorithm
• Log K is pretty bad
• We don’t care about performance
– Network bandwidth is cheap
• Shortest path performs well in practice
– The example given earlier is pathological
• Greedy algorithm is impractical
• We don’t trust theoreticians
– Theoreticians always hide something

All valid concerns; must be addressed
186
Log K is Pretty Bad?
• But K is worse !!
187
Network Bandwidth is Cheap?
• Quantitative Analysis helps
– Difference between shortest path algorithm and greedy algorithm is K/log K
– Network bandwidth is not that cheap, especially for bandwidth-intensive multicasts
188
Shortest Path Works Well in Real-life Networks?
• What are “real-life” networks?
– Internet topology is not completely understood
• Must look at interesting special cases; assume receivers are chosen at random:
1. The network looks like a grid
– Shortest path: competitive ratio = √K
– Greedy algorithm: competitive ratio = O(1)
2. The network looks like a random graph
– Shortest path: competitive ratio = O(1)
– Greedy algorithm: competitive ratio = O(1)
[Goel and Munagala ’00]
189
Greedy Algorithm is Impractical?
• Yes, for deployment at lower network layers
• But not if multicast routing occurs at the application layer
• Several systems now implement similar schemes (end-system multicast)– Qosmic [Faloutsos, Banerjea, Pankaj ’98]– Yallcast/YOID [Francis ’99]………
190
Theoreticians Hide Things?
• So what did we hide here?
– The greedy algorithm can result in large latency from source to receivers
– The shortest path algorithm can achieve the best possible latency
• Fix: reroute large-latency receivers and some of their ancestors
– Close to optimum latencies
– Tree size close to the greedy tree
– No receiver rerouted more than once
[Goel and Munagala ’00]
191
Moral
• Rule of thumb for multicast routing:
– Since the future is unknown, be greedy in the present
• Meta-morals:
– Competitive analysis can yield valuable clues about algorithm performance
– Caution: competitive analysis is the beginning, not the end
– Must validate online algorithms in a systems setting
– Must often tweak the algorithms
192
Competitive Analysis in Networking: Outline
• Background
• Incremental construction of Multicast Trees
– The Greedy Strategy
• Routing and Admission Control
– The Exponential Metric
• More Restricted Adversaries
– Adversarial Queueing Theory
193
The Exponential Cost Metric
• Consider a resource with capacity C
• Assume that a fraction ρ of the resource has been consumed
• Exponential cost “rule of thumb”: the cost of the resource is given by a^ρ, for an appropriately chosen base a
• Intuition: the cost increases steeply with ρ
– Bottleneck resources become expensive

[Figure: cost vs. ρ, rising exponentially.]
194
Applications of Exponential Costs
• Exponential cost “rule of thumb” applies to
– Online Routing
– Online Call Admission Control
– Stochastic arrivals
– Stale Information
– Power-aware routing
195
The Online Routing Problem
• Connection establishment requests arrive online in a VPN (Virtual Private Network)
• Must assign a route to each connection and reserve bandwidth along that route
– PVCs in ATM networks
– MPLS + RSVP in IP networks
• Oversubscribing is allowed
– Congestion = the worst oversubscribing on a link
• Goal: assign routes to minimize congestion
• Assume all connections have identical bandwidth requirements, and all links identical capacity
196
Online Routing Problem: Example
[Figure: lower-bound example – a connection from s to r1 can be routed via a or b; the adversary forces congestion C ≥ 2, and worse sequences can be created.]
197
Online Algorithm for Routing
• ρ_L = fraction of the bandwidth of link L that has already been reserved
• a = N, the size of the network
• The Exponential Cost Algorithm:
– Price each link at a^{ρ_L}, and route each incoming connection on the currently cheapest path from src to dst
– Reserve bandwidth along this path
[Aspnes et al. ’93]
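A minimal sketch (ours) of the exponential-cost routing step; the graph encoding and link-load table are illustrative assumptions, and dst is assumed reachable:

import heapq

def route(graph, rho, src, dst, a):
    # graph: node -> list of (neighbor, link_id); rho: link_id -> load in [0, 1]
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                           # stale heap entry
        for v, link in graph.get(u, []):
            nd = d + a ** rho[link]            # exponential link cost
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, (u, link)
                heapq.heappush(heap, (nd, v))
    path, u = [], dst
    while u != src:                            # walk back along prev pointers
        u, link = prev[u]
        path.append(link)
    return path[::-1]                          # list of link ids, src -> dst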
198
Online Algorithm for Routing
• Theorem 1: The exponential cost algorithm achieves a competitive ratio of O(log N) for congestion
• Theorem 2: No algorithm can achieve competitive ratio better than log N in asymmetric networks
This simple strategy is optimum!
Does the idea extend to other problems? To more realistic scenarios?
199
Applications of Exponential Costs
• Exponential cost “rule of thumb” applies to
– Online Routing
– Online Call Admission Control
– Stochastic arrivals
– Stale Information
– Power-aware routing
200
Online Admission Control and Routing
• Connection establishment requests arrive online
• Must assign a route to each connection and reserve bandwidth along that route
• Oversubscribing is not allowed– Must perform admission control
• Goal: Admit and route connections to maximize total number of accepted connections (throughput)
201
Exponential Metric and Admission Control
• When a connection arrives, compute the cheapest path under current exponential costs
• If the cost of this path is less than a given threshold, accept the connection; else reject (sketched below)
[Awerbuch, Azar, Plotkin ’93]
• Theorem: This simple algorithm admits at least an Ω(1/log N) fraction of the calls admitted by the optimum
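A sketch of the same routing loop with the admission test added; the threshold is left as a parameter here, since its precise value is chosen as in the analysis of Awerbuch, Azar, and Plotkin:

    import networkx as nx

    def admit_and_route(G, mu, threshold, src, dst):
        # Recompute exponential link costs from the current reservations.
        for u, v, data in G.edges(data=True):
            lam = data["reserved"] / data["capacity"]
            data["cost"] = mu ** lam
        cost, path = nx.single_source_dijkstra(G, src, dst, weight="cost")
        if cost >= threshold:
            return None  # reject: every route is already too expensive
        for u, v in zip(path, path[1:]):
            G[u][v]["reserved"] += 1
        return path      # accept and reserve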
202
Objections to Exponential Costs
• Log N is too bad
• Requires permanent connections
• Too inefficient
– Frequent “link-state updates”
– Frequent computation of shortest paths
203
Applications of Exponential Costs
• Exponential cost “rule of thumb” applies to
– Online Routing
– Online Call Admission Control
– Stochastic Arrivals
– Stale Information
– Power-Aware Routing
204
Assume Stochastic Arrivals
• Connection arrivals are Poisson, durations are memoryless
• Assume fat links (Capacity >> log N)
• Theorem: The exponential cost algorithm results in
1. Near-optimum congestion for the routing problem
2. Near-optimum throughput for the admission problem
[Kamath, Palmon, Plotkin ’96]
Near-optimum: Competitive ratio = (1+ε) for ε close to 0
205
Versatility of Exponential Costs
• Guarantees of log N for the competitive ratio against a malicious adversary
• Near-optimum for stochastic arrivals
• Near-optimum given a fixed traffic matrix
[Young ’95; Garg and Konemann ’98]
No need to know whether there is an adversary, or what the stochastic distribution is, or what the traffic matrix is!!
206
Objections to Exponential Costs
• Log N is too bad
• Requires permanent connections
• Too inefficient
– Frequent “link-state updates”
– Frequent computation of shortest paths
207
Applications of Exponential Costs
• Exponential cost “rule of thumb” applies to
– Online Routing
– Online Call Admission Control
– Stochastic Arrivals
– Stale Information
– Power-Aware Routing
208
Exponential Metrics and Stale Information
• Exponential metrics continue to work well if
– Link states are a little stale
– Shortest paths are reused over small intervals rather than recomputed for each connection
– There is no centralized agent
[Goel, Meyerson, Plotkin ’01]
• Caveat: Still pretty hard to implement
209
Applications of Exponential Costs
• Exponential cost “rule of thumb” applies to
– Online Routing
– Online Call Admission Control
– Stochastic Arrivals
– Stale Information
– Power-Aware Routing
210
Power Aware Routing
• Consider a group of small mobile nodes, e.g. sensors, which form an ad hoc network
– Bottleneck resource: battery
– Goal: Maximize the time till the network partitions
• Assign each mobile node the cost μ^λ, where λ = fraction of its battery consumed
– Send packets over the cheapest path under this cost measure (sketched below)
• O(log n) competitive against an adversary
– Near-optimum for stochastic/fixed traffic
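The same metric with batteries as the resource; this sketch pushes each node's cost onto its incoming edges of a directed networkx graph (the node-cost-to-edge-cost reduction and all names are illustrative):

    import networkx as nx

    MU = 32  # illustrative base; the rule only asks for a suitably chosen mu

    def power_aware_path(G, src, dst):
        # Each hop pays the exponential battery cost of the node it enters.
        for u, v in G.edges():
            lam = G.nodes[v]["used"] / G.nodes[v]["battery"]
            G[u][v]["cost"] = MU ** lam
        return nx.shortest_path(G, src, dst, weight="cost")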
211
Power Aware Routing: Implementation?
• Hard to implement in general
• Consider the Directed Diffusion model [Intanagonwiwat, Govindan, Estrin ’00]
– Receiver floods the network with interest for desired data
– Interest reaches the source
– Source sends data over multiple paths
– Receiver reinforces the “best” path
• Just send the accumulated sum of exponential costs along with the data
– Receiver reinforces the path with the least cost (see the sketch below)
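The accumulation itself is a one-liner; this hypothetical packet format simply carries a running sum that the receiver compares across paths:

    def forward(packet, node_exp_cost):
        # Each node adds its exponential cost to a field carried in the data,
        # so the receiver can reinforce the path with the smallest total.
        packet["acc_cost"] = packet.get("acc_cost", 0.0) + node_exp_cost
        return packet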
212
Competitive Analysis in Networking: Outline
• Background
• Incremental construction of Multicast Trees
– The Greedy Strategy
• Routing and Admission Control
– The Exponential Metric
• More Restricted Adversaries
– Adversarial Queueing Theory
213
Adversarial Queueing Theory
Motivation
• Malicious, all-knowing adversary
– Injects packets into the network
– Each packet must travel over a specified route
• Suppose the adversary injects 3 packets per second from s to r
– Link capacities are one packet per second
– No matter what we do, we will have unbounded queues and unbounded delays
– Need to temper our definition of adversaries
[Figure: a source s connected to a receiver r by unit-capacity links]
214
Adversarial Queueing Theory
Bounded Adversaries
• Given a window size W, and a rate r < 1
– For any link L, during any interval of duration T > W, the adversary can inject at most rT packets whose routes contain link L (a small checker for this condition follows)
• The adversary can’t set an impossible task!!
– More gentle than competitive analysis
• Will study packet-scheduling strategies
– Which packet should be forwarded when more than one packet is waiting to cross a link?
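A small checker for the (W, r) condition on one link, assuming a sorted list of injection times for the packets whose routes cross that link (an illustration of the definition, not part of any protocol):

    def respects_bound(times, W, r):
        # Every interval of duration T > W must contain at most r*T injections.
        # It suffices to test each run of packets times[i..j], stretched to
        # duration at least W (intervals just above W are the tightest).
        for i in range(len(times)):
            for j in range(i, len(times)):
                T = max(times[j] - times[i], W)
                if (j - i + 1) > r * T:
                    return False
        return True

    print(respects_bound([0, 1, 2, 3, 4, 5], W=4, r=0.5))  # False: too bursty
    print(respects_bound([0, 3, 6, 9], W=4, r=0.5))        # True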
215
Some Interesting Scheduling Policies
• FIFO: First In First Out
• LIFO: Last In First Out
• NTG: Nearest To Go
– Forward the packet that is closest to its destination
• FTG: Furthest To Go
– Forward the packet that is furthest from its destination
• LIS: Longest In System
– Forward the packet that was injected earliest
– Global FIFO (see the sketch below)
• SIS: Shortest In System
– Forward the packet that was injected most recently
– Global LIFO
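A per-link queue implementing LIS as a priority queue keyed by injection time (SIS is the same sketch with the key negated); all names are illustrative:

    import heapq

    class LISQueue:
        # Longest In System: always forward the packet injected earliest.
        def __init__(self):
            self._heap = []
            self._seq = 0           # tie-breaker for equal injection times

        def enqueue(self, packet, injection_time):
            # SIS would push -injection_time as the key instead.
            heapq.heappush(self._heap, (injection_time, self._seq, packet))
            self._seq += 1

        def forward(self):
            return heapq.heappop(self._heap)[2]

    q = LISQueue()
    q.enqueue("p1", injection_time=5)
    q.enqueue("p0", injection_time=2)
    print(q.forward())  # 'p0': it has been in the system longest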
216
Stability in the Adversarial Model
• Consider a scheduling policy (e.g. FIFO, LIFO, etc.)
• The policy is universally stable if, for all networks and all “bounded adversaries”, the packet delays and queue sizes remain bounded
[Borodin et al. ‘96]
• FIFO, LIFO, and NTG are not universally stable
• LIS, SIS, and FTG are universally stable
[Andrews et al. ‘96]
217
Adversarial Queueing Model: Routing
Using the Exponential Cost Metric
• The adversary injects packets into the network but specifies only the src and dst of each packet
– The correct routes are hidden
• Need to compute routes
– Again, use the exponential cost metric
– Reset the costs to zero periodically
– Use any stable scheduling policy
• Theorem: The combined routing and scheduling policy is universally stable
[Andrews et al. ’01]
218
Summary
• Competitive analysis models decision making under uncertainty
– Applicable to a wide range of networking problems
• General rules of thumb
– Greedy algorithm for multicasting
– Exponential cost metric for online routing, admission control, stochastic injections, and power-aware routing
• Adversarial Queueing Theory
– Bounded adversaries
– FIFO unstable; LIS stable
– Exponential metrics result in stable routing
219
References
1. N. Alon and Y. Azar. On-line Steiner trees in the Euclidean plane. Discrete and Computational Geometry, 10(2), 113-121, 1993.
2. M. Andrews, B. Awerbuch, A. Fernandez, J. Kleinberg, T. Leighton, and Z. Liu. Universal stability results for greedy contention-resolution protocols. Proceedings of the 37th IEEE Symposium on Foundations of Computer Science, 1996.
3. M. Andrews, A. Fernandez, A. Goel, and L. Zhang. Source Routing and Scheduling in Packet Networks. To appear in the Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, 2001.
4. J. Aspnes, Y. Azar, A. Fiat, S. Plotkin, and O. Waarts. On-line load balancing with applications to machine scheduling and virtual circuit routing. Proceedings of the 25th ACM Symposium on Theory of Computing, 1993.
5. B. Awerbuch, Y. Azar, and S. Plotkin. Throughput competitive online routing. Proceedings of the 34th IEEE Symposium on Foundations of Computer Science, 1993.
6. A. Ballardie, P. Francis, and J. Crowcroft. Core Based Trees (CBT): An architecture for scalable inter-domain multicast routing. Proceedings of the ACM SIGCOMM, 1993.
220
References [Contd.]
7. A. Borodin, J. Kleinberg, P. Raghavan, M. Sudan, and D. Williamson. Adversarial queueing theory. Proceedings of the 28th ACM Symposium on Theory of Computing, 1996.
8. S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C. Liu, and L. Wei. The PIM architecture for wide-area multicast routing. IEEE/ACM Transactions on Networking, 4(2), 153-162, 1996.
9. M. Doar and I. Leslie. How bad is Naïve Multicast Routing? IEEE INFOCOM, 82-89, 1992.
10. M. Faloutsos, A. Banerjea, and R. Pankaj. QoSMIC: quality of service sensitive multicast Internet protocol. Computer Communication Review, 28(4), 144-153, 1998.
11. P. Francis. Yoid: Extending the Internet Multicast Architecture. Unrefereed report, http://www.isi.edu/div7/yoid/docs/index.html.
12. N. Garg and J. Konemann. Faster and simpler algorithms for multicommodity flow and other fractional packing problems. Proceedings of the 39th IEEE Symposium on Foundations of Computer Science, 1998.
221
References [Contd.]
13. A. Goel, A. Meyerson, and S. Plotkin. Distributed Admission Control, Scheduling, and Routing with Stale Information. Proceedings of the 12th ACM-SIAM Symposium on Discrete Algorithms, 2001.
14. A. Goel and K. Munagala. Extending Greedy Multicast Routing to Delay Sensitive Applications. Short abstract in proceedings of the 11th ACM-SIAM Symposium on Discrete Algorithms, 2000. Long version to appear in Algorithmica.
15. M. Imase and B. Waxman. Dynamic Steiner tree problem. SIAM J. Discrete Math., 4(3), 369-384, 1991.
16. C. Intanagonwiwat, R. Govindan, and D. Estrin. Directed diffusion: A scalable and robust communication paradigm for sensor networks. Proceedings of the 6th Annual International Conference on Mobile Computing and Networking (MobiCOM), 2000.
17. A. Kamath, O. Palmon, and S. Plotkin. Routing and admission control in general topology networks with Poisson arrivals. Proceedings of the 7th ACM-SIAM Symposium on Discrete Algorithms, 1996.
18. D. Waitzman, C. Partridge, and S. Deering. Distance Vector Multicast Routing Protocol. Internet RFC 1075, 1988.
19. N. Young. Randomized rounding without solving the linear program. Proceedings of the 6th ACM-SIAM Symposium on Discrete Algorithms, 1995.