Ashish Goel Dept of CS USC [email protected]. edu Balaji Prabhakar Network Algorithms: Techniques for Design and Analysis Nick McKeown Depts of EE and CS Stanford University [email protected] Balaji Prabhakar Depts of EE and CS Stanford University [email protected] ACM SIGCOMM 2001 San Diego, CA


Network Algorithms: Techniques for Design and Analysis

Ashish Goel
Dept of CS, USC
[email protected]

Nick McKeown
Depts of EE and CS, Stanford University
[email protected]

Balaji Prabhakar
Depts of EE and CS, Stanford University
[email protected]

ACM SIGCOMM 2001, San Diego, CA


Overview and Objectives

• Algorithm design is a classical subject
  – beginnings in early computations (multiplication, division, etc.)
  – and has become more sophisticated and mature in the computer age

• The subject has grown
  – when tasks have to be performed under more stringent conditions
  – or, when new tasks have to be performed
  – (typically, it’s a combination of both reasons)


Algorithms for Networks

• Networking provides a rich new context for algorithm design
  – algorithms are used everywhere in networks
    – at the end-hosts for packet transmission
    – in the network: switching, routing, caching, etc.
  – many new scenarios, and very stringent constraints
    – high speed of operation
    – large-sized systems
    – cost of implementation
  – require new approaches and techniques


Methods

• Algorithm analysis
  – sheds light on the complexity of an algorithm (time, space, resource)
  – uses discrete math and has many standard methods
  – implementors care about algorithm complexity
  – (needs cautious interpretation: the metrics of the implementor and of the theoretician are not necessarily the same)

• In the networking context
  – we also need to understand the “performance” of an algorithm: how well does a network or a component that uses a particular algorithm perform, as perceived by the user?
  – performance analysis is concerned with metrics like delay, throughput, loss rates, etc.
  – this requires continuous math methods: e.g. queueing theory


Recent Algorithm Design Methods

• Motivated by the desire
  – for simple implementations
  – and for robust performance (because operating conditions change and are unknown, or because of security reasons)

• Several new methods of algorithm design can be used in the networking context
  – randomized algorithms
  – approximate algorithms
  – genetic algorithms
  – online algorithms
  – combinatorial optimization techniques


Performance Analysis Methods

• There are many: some classical, some new
  – standard queueing theory (assuming input distributions are known, can we say what delays and buffer occupancies will be like?)
  – fluid models (very simple and useful for determining throughput regions)
  – adversarial analysis (useful for worst-case analyses: your worst enemy is generating traffic to beat your algorithm)
  – competitive analysis (useful for comparing two different algorithms on the same inputs, like a competition)


In this tutorial…

• We will consider a number of problems in networking

• Show various methods for algorithm design and for performance analysis

• Nick McKeown
  – Switch scheduling algorithms

• Balaji Prabhakar
  – Randomized algorithms

• Ashish Goel
  – Competitive analysis of approximate algorithms


Disclaimers

• This tutorial is idiosyncratic
  – we talk about things we know well: the subject is larger
  – we also talk about things we’ve worked on (hopefully, these are also things we know well)

• Your participation is essential
  – please don’t hesitate to ask for clarifications
  – there is a lot of material and some of it is not easy
  – there is no such thing as a stupid question

• References
  – are included for each topic, but they are not exhaustive (please don’t be upset if we didn’t cite your paper)
  – if you need more details, please drop us a note


Switch Scheduling Algorithms



Scheduling crossbar switches to achieve 100% throughput

• Background to problem
• Techniques and algorithms


Background to switch scheduling

1. [Karol et al. 1987] Throughput limited to $2 - \sqrt{2} \approx 58\%$ by head-of-line blocking for Bernoulli IID uniform traffic.

2. [Tamir 1989] Observed that with “Virtual Output Queues” (VOQs) head-of-line blocking is reduced and throughput goes up.


History of the theory: Matching

3. [Anderson et al. 1993] Observed analogy to maximum size matching in a bipartite graph.

4. [McKeown et al. 1995] (a) Maximum size match cannot guarantee 100% throughput. (b) But maximum weight match can: $O(N^3)$.

5. [Mekkittikul and McKeown 1998] A carefully picked maximum size match can give 100% throughput: $O(N^{2.5})$.


History of the theory: Speedup

5. [Chuang, Goel et al. 1997] Precise emulation of a central shared memory switch is possible with a speedup of two and a “stable marriage” scheduling algorithm.

6. [Prabhakar and Dai 2000] 100% throughput possible for maximal matching with a speedup of two.


History of the theory (3): Newer approaches

7. [Tassiulas 1998] 100% throughput possible for simple randomized algorithm with memory.

8. [Giaccone et al. 2001] “Apsara” algorithms.

9. [Iyer and McKeown 2000] Parallel switches can achieve 100% throughput and emulate an output queued switch.

10. [Chang et al. 2000] A 2-stage switch with a TDM scheduler can give 100% throughput.


Scheduling crossbar switches to achieve 100% throughput

1. Basic switch model.
2. When traffic is uniform (many algorithms…)
3. When traffic is non-uniform, but traffic matrix is known.
   • Technique: Birkhoff-von Neumann decomposition.
4. When matrix is not known.
   • Technique: Lyapunov function.
5. When algorithm is pipelined, or information is incomplete.
   • Technique: Lyapunov function.
6. When algorithm does not complete.
   • Technique: Randomized algorithm.
7. When there is speedup.
   • Technique: Fluid model.
8. When there is no algorithm.
   • Technique: 2-stage load-balancing switch.
   • Technique: Parallel Packet Switch.


Basic Switch Model

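[Figure: N×N input-queued crossbar switch. Arrivals $A_1(n), \ldots, A_N(n)$ at inputs 1 to N are split into per-output streams $A_{11}(n), \ldots, A_{NN}(n)$ and held in virtual output queues with occupancies $L_{11}(n), \ldots, L_{NN}(n)$. The crossbar configuration $S(n)$ connects inputs to outputs, producing departures $D_1(n), \ldots, D_N(n)$ at outputs 1 to N.]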


Some definitions

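1. Traffic matrix: $\Lambda$, where $\lambda_{ij} = E[A_{ij}(n)]$.
   If $\sum_i \lambda_{ij} < 1$ and $\sum_j \lambda_{ij} < 1$, we say the traffic is "admissible".

2. Service matrix: $S = [s_{ij}]$, where $s_{ij} \in \{0, 1\}$ and $S$ is a permutation matrix.

3. Queue occupancies: $L_{11}(n), \ldots, L_{NN}(n)$.

[Figure: occupancy of queues $L_{11}(n)$ through $L_{NN}(n)$.]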


Some possible performance goals

When traffic is admissible:

1. Work conservation.
2. "100% throughput".
3. $L_{ij}(n) \le C, \ \forall n$.
4. $E[L_{ij}(n)] \le C$.
5. $\lim_{n \to \infty} \dfrac{D_{ij}(n)}{n} = \lim_{n \to \infty} \dfrac{A_{ij}(n)}{n}$.
6. Other metrics...?


Algorithms that give 100% throughput for uniform traffic

• Quite a few algorithms give 100% throughput when traffic is uniform [1]

• For example:
  – Maximum size bipartite match.
  – TDM and a few variants
  – Wait-until-full
  – iSLIP

[1] “Uniform”: the destination of each cell is picked independently and uniformly at random (uar) from the set of all outputs.


Maximum size bipartite match

• Intuition: maximizes instantaneous throughput

• Gives 100% throughput for uniform traffic.

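[Figure: the “request” graph has an edge from input i to output j whenever $L_{ij}(n) > 0$ (e.g. $L_{11}(n) > 0$, $L_{N1}(n) > 0$); the bipartite match shown is a maximum size match on that graph.]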


Network flows and bipartite matching

Finding a maximum size bipartite matching is equivalent to solving a network flow problem with capacities and flows of size “1”.

[Figure: flow network with source s, inputs A to F, outputs 1 to 6, and sink t.]


Network flows and bipartite matching

Ford-Fulkerson method.

[Figure: residual graph for first three paths (source s, inputs A to F, outputs 1 to 6, sink t).]


Network flows and bipartite matching

[Figure: residual graph for next two paths (source s, inputs A to F, outputs 1 to 6, sink t).]


Network flows and bipartite matching

[Figure: residual graph for augmenting path (source s, inputs A to F, outputs 1 to 6, sink t).]


Network flows and bipartite matching

[Figure: residual graph for last augmenting path (source s, inputs A to F, outputs 1 to 6, sink t).]


Network flows and bipartite matching

[Figure: maximum flow graph (source s, inputs A to F, outputs 1 to 6, sink t).]


Network flows and bipartite matching

[Figure: the resulting maximum size matching between inputs A to F and outputs 1 to 6.]
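The augmenting-path idea in the preceding slides can be sketched in a few lines. This is only an illustration, not the tutorial's own code: the function name `maximum_size_matching` and the small request graph are hypothetical, and the search is the plain unit-capacity Ford-Fulkerson style (one depth-first augmenting path per input), not the faster $O(N^{2.5})$ algorithm.

```python
def maximum_size_matching(requests):
    """requests[i] lists the outputs j for which input i has a cell (L_ij(n) > 0).
    Returns a dict mapping each matched input to its output."""
    input_of = {}  # output -> input currently matched to it

    def augment(i, seen):
        # Try to match input i, re-routing earlier matches if necessary.
        for j in requests[i]:
            if j in seen:
                continue
            seen.add(j)
            if j not in input_of or augment(input_of[j], seen):
                input_of[j] = i
                return True
        return False

    for i in requests:
        augment(i, set())
    return {i: j for j, i in input_of.items()}


# Hypothetical request graph: B can only use output 1, so A must move to output 2.
requests = {"A": [1, 2], "B": [1], "C": [2, 3]}
match = maximum_size_matching(requests)
print(match)
```

Here the path for B removes the edge A-1 added earlier and replaces it with A-2, which is exactly the edge removal that a maximal matching does not allow.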


Aside: Maximal Matching

• A maximal matching is one in which each edge is added one at a time, and is not later removed from the matching.

• i.e. no augmenting paths allowed (they remove edges added earlier).

• No input and output are left unnecessarily idle.


Aside: Example of Maximal Size Matching

[Figure: two matchings on the same request graph with inputs A to F and outputs 1 to 6. Left: a maximal matching. Right: a maximum matching.]


Aside: Maximal Matchings

• In general, maximal matching is much simpler to implement, and has a much faster running time.

• A maximal size matching is at least half the size of a maximum size matching.

• A maximal weight matching is defined in the obvious way.

• A maximal weight matching is at least half the weight of a maximum weight matching.

End of aside
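A minimal sketch of a maximal size matching, assuming the edges are simply visited in list order (the function name `maximal_matching` and the example edges are hypothetical). Each edge is added if both endpoints are still free and is never removed afterwards.

```python
def maximal_matching(edges):
    """Greedy maximal matching: add an edge whenever its input and output are both free."""
    matched_inputs, matched_outputs, match = set(), set(), []
    for i, j in edges:
        if i not in matched_inputs and j not in matched_outputs:
            match.append((i, j))
            matched_inputs.add(i)
            matched_outputs.add(j)
    return match


# Hypothetical request graph. A maximum size matching here is {A-2, B-1} (size 2).
edges = [("A", 1), ("A", 2), ("B", 1)]
match = maximal_matching(edges)
print(match)  # [('A', 1)]
```

The greedy result has size 1 while the maximum has size 2, which meets the "at least half" bound with equality. Every unmatched edge shares an input or an output with a matched edge, so no input-output pair with a request is left idle unnecessarily.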


TDM Scheduling Algorithm

If arriving traffic is i.i.d with destinations picked uar across outputs, then a “TDM” schedule gives 100% throughput.

[Figure: successive TDM permutations on a 4×4 switch with inputs A to D and outputs 1 to 4.]

Variation 1: if permutations are picked uar from the set of N! permutations, this too will give 100% throughput.

Variation 2: if permutations are picked uar from the TDM permutations above, this too will give 100% throughput.
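A minimal sketch of a TDM schedule, assuming the usual cyclic-shift construction (the name `tdm_permutation` is hypothetical): in slot n, input i is connected to output (i + n) mod N, so over any N consecutive slots every input-output pair is served exactly once.

```python
def tdm_permutation(n_ports, slot):
    """Output connected to each input in the given slot (cyclic shift by the slot number)."""
    return [(i + slot) % n_ports for i in range(n_ports)]


N = 4
schedule = [tdm_permutation(N, slot) for slot in range(N)]
# Each VOQ (i, j) is served in exactly one of the N slots, i.e. at rate 1/N.
served = {(i, perm[i]) for perm in schedule for i in range(N)}
print(schedule[1])  # [1, 2, 3, 0]
```

With uniform traffic each VOQ receives cells at rate below 1/N for any admissible load, which is why the fixed 1/N service rate is enough.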


A Simple wait-until-full algorithm

The following algorithm is believed to be stable for Bernoulli i.i.d. uniform arrivals:

1. If any VOQ is empty, do nothing (i.e. serve no queues).

2. If no VOQ is empty, pick a permutation uar from either the TDM permutations or all permutations.


Some simple algorithms that achieve 100% throughput


Some observations

• A maximum size match (MSM) maximizes instantaneous throughput.

• But an MSM is complex: $O(N^{2.5})$.

• It turns out that there are many simple algorithms that give 100% throughput for uniform traffic.

• So what happens if the traffic is non-uniform?


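Why doesn’t maximizing instantaneous throughput give 100% throughput for non-uniform traffic?

[Figure: 3×3 switch with four non-empty VOQs and arrival rates $\lambda_{11} = \lambda_{12} = \lambda_{21} = \lambda_{32} = 1/2 - \delta$, and the three possible matches, S(n).]

Assume that at time n, $L_{11}(n) > 0$, $L_{12}(n) > 0$, and $Q_{21}$ and $Q_{32}$ both have arrivals (w.p. $(1/2 - \delta)^2$). Input 1 is serviced w.p. 2/3.

The total rate at which input 1 is served is at most

\[ \tfrac{2}{3}\left(\tfrac{1}{2} - \delta\right)^2 + 1 \cdot \left[1 - \left(\tfrac{1}{2} - \delta\right)^2\right] = 1 - \tfrac{1}{3}\left(\tfrac{1}{2} - \delta\right)^2. \]

But $\lambda_{11} + \lambda_{12} = 1 - 2\delta$.

And so if $\delta < 0.0358$, then $1 - 2\delta > 1 - \tfrac{1}{3}\left(\tfrac{1}{2} - \delta\right)^2$ and the switch is not stable (throughput < 100%).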
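The threshold on $\delta$ can be checked numerically. The sketch below assumes the rates $1/2 - \delta$ on the four non-empty VOQs; the function names are hypothetical. The instability condition $1 - 2\delta > 1 - \tfrac{1}{3}(\tfrac{1}{2} - \delta)^2$ rearranges to $\delta^2 - 7\delta + \tfrac{1}{4} > 0$, whose smaller root is $(7 - \sqrt{48})/2$.

```python
import math


def service_bound(delta):
    """Upper bound on the rate at which input 1 is served."""
    p = (0.5 - delta) ** 2          # both Q21 and Q32 have arrivals
    return (2 / 3) * p + (1 - p)    # equals 1 - p/3


def arrival_rate(delta):
    """Total arrival rate at input 1: lambda_11 + lambda_12."""
    return 1 - 2 * delta


threshold = (7 - math.sqrt(48)) / 2
print(round(threshold, 4))
print(arrival_rate(0.03) > service_bound(0.03))  # True: unstable below the threshold
print(arrival_rate(0.04) > service_bound(0.04))  # False: the bound no longer rules out stability
```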


Simulation of simple 3x3 example


Example 1: (Trivial) scheduling to achieve 100% throughput

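• Assume we know the traffic matrix, and the arrival pattern is deterministic:

\[ \Lambda = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

• Then we can simply choose:

\[ S(n) = \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix}, \quad \forall n \]

• Q: What is $L_{ij}(n)$?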


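Example 2: With random arrivals, but known traffic matrix

• Assume we know the traffic matrix, and the arrival pattern is random:

\[ \Lambda = \begin{bmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

• Then we can simply choose:

\[ S(\text{odd}) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad S(\text{even}) = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

• Q: Does $L_{ij}(n) = 0$ for all n?
• Q: In general, if we know $\Lambda$, can we pick a sequence S(n) to achieve 100% throughput?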


Birkhoff - von Neumann Decomposition

Intuitively, we can pick some set of constants $(a_1, \ldots, a_r)$ and set of service matrices $(M_1, \ldots, M_r)$, such that:

\[ \sum_{i=1}^{r} a_i M_i \ge \Lambda \quad \text{(element by element)}. \]

Then pick the sequence of service matrices:

\[ S(n) = (M_1, M_2, M_2, M_3, M_1, \ldots), \]

so that the # of occurrences of $M_i$ in period $T$ is $a_i T$. In other words,

\[ \frac{1}{T} \sum_{n=1}^{T} S(n) - \Lambda \ge 0, \]

and the departure rate exceeds the arrival rate.

It turns out that any admissible $\Lambda$ can always be bounded in this way by a linear (convex) combination of permutation matrices $(M_1, \ldots, M_r)$, by Birkhoff-von Neumann.
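The decomposition can be sketched as a simple greedy procedure. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes the input matrix is exactly doubly stochastic (an admissible $\Lambda$ would first have to be padded up to one), it uses exact `Fraction` arithmetic, and the names `perfect_matching` and `bvn_decompose` are hypothetical. Each round finds a permutation inside the positive entries, subtracts it with the largest possible coefficient, and repeats.

```python
from fractions import Fraction


def perfect_matching(support):
    """support[i] lists the columns usable by row i. Returns column-of-row, or None."""
    n = len(support)
    row_of_col = [-1] * n

    def try_row(i, seen):
        for j in support[i]:
            if j in seen:
                continue
            seen.add(j)
            if row_of_col[j] == -1 or try_row(row_of_col[j], seen):
                row_of_col[j] = i
                return True
        return False

    for i in range(n):
        if not try_row(i, set()):
            return None
    col_of_row = [0] * n
    for j, i in enumerate(row_of_col):
        col_of_row[i] = j
    return col_of_row


def bvn_decompose(matrix):
    """Return [(a_k, perm_k)] with sum_k a_k * M_k == matrix, perm_k[i] = column of row i."""
    n = len(matrix)
    rest = [[Fraction(x) for x in row] for row in matrix]
    terms = []
    while any(x > 0 for row in rest for x in row):
        support = [[j for j in range(n) if rest[i][j] > 0] for i in range(n)]
        perm = perfect_matching(support)
        if perm is None:
            raise ValueError("matrix is not doubly stochastic")
        a = min(rest[i][perm[i]] for i in range(n))  # largest coefficient we can remove
        for i in range(n):
            rest[i][perm[i]] -= a
        terms.append((a, perm))
    return terms


half = Fraction(1, 2)
lam = [[half, half, 0, 0], [half, half, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
terms = bvn_decompose(lam)
for a, perm in terms:
    print(a, perm)
```

For the traffic matrix of Example 2 this yields two permutations with coefficient 1/2 each, matching the alternating schedule chosen there.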


In practice…

• Unfortunately, we usually don’t know the traffic matrix $\Lambda$ a priori, so we can:
  – Measure or estimate $\Lambda$, or
  – Not use $\Lambda$.

• In what follows, we will assume we don’t know or use $\Lambda$.


When the traffic matrix is not known

1. We will try and find conditions for which, roughly:

\[ E[L_{ij}(n+1) - L_{ij}(n) \mid L_{ij}(n)] < 0, \]

i.e.

\[ E[L_{ij}(n) - S_{ij}(n) + A_{ij}(n) - L_{ij}(n) \mid L_{ij}(n)] < 0. \]

2. In other words, there is an expected downward drift in the occupancy of each queue.

3. This is an example of a Lyapunov function.

4. It is known that if

\[ E[V\{L(n+1)\} - V\{L(n)\} \mid L(n)] < 0 \]

for some $V\{\cdot\}$, then:

\[ E[L_{ij}(n)] < \infty, \ \forall i, j. \]

5. The same result holds if:

\[ E[V\{L(n+1)\} - V\{L(n)\} \mid L(n)] \le c - k\,\|L(n)\|. \]


Some additional definitions

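Arrivals:

\[ A(n) = \begin{bmatrix} A_{11}(n) & \cdots & A_{1N}(n) \\ \vdots & & \vdots \\ A_{N1}(n) & \cdots & A_{NN}(n) \end{bmatrix}, \quad A(n) = (A_{11}(n), \ldots, A_{NN}(n))^T. \]

Arrival rate: $\lambda = (\lambda_{11}, \ldots, \lambda_{NN})^T$.

Service: $S(n) = (S_{11}(n), \ldots, S_{NN}(n))^T$, with $S_{ij} \in \{0, 1\}$, $\sum_i S_{ij} = 1$, $\sum_j S_{ij} = 1$, i.e.: $S(n)$ is a permutation matrix.

Evolution:

\[ \tilde{L}_{ij}(n+1) = L_{ij}(n) - S_{ij}(n) + A_{ij}(n), \]

\[ L_{ij}(n+1) = \tilde{L}_{ij}(n+1) + \begin{cases} 1 & \text{if } L_{ij}(n) = 0 \text{ and } S_{ij}(n) = 1, \\ 0 & \text{else.} \end{cases} \]

Approx. evolution:

\[ L_{ij}(n+1) \approx L_{ij}(n) - S_{ij}(n) + A_{ij}(n). \]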


Some facts that we’ll use

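1. $\Lambda$ is doubly sub-stochastic, i.e.

\[ \Lambda = \begin{bmatrix} \lambda_{11} & \cdots & \lambda_{1N} \\ \vdots & & \vdots \\ \lambda_{N1} & \cdots & \lambda_{NN} \end{bmatrix}, \quad \sum_i \lambda_{ij} \le 1, \ \sum_j \lambda_{ij} \le 1. \]

Let C denote the set of such matrices.

2. C is a closed (convex) set. e.g. if

\[ \Lambda_1 = \begin{bmatrix} 0.1 & 0.8 \\ 0.7 & 0.1 \end{bmatrix}, \ \Lambda_2 = \begin{bmatrix} 0.2 & 0.7 \\ 0.4 & 0.3 \end{bmatrix}, \quad \Lambda_1, \Lambda_2 \in C, \]

then $a\Lambda_1 + b\Lambda_2 \in C$ for $a + b = 1$, $a, b \ge 0$.

3. Extreme points of C. The extreme points are elements of C that are not linear combinations of other elements.

4. Birkhoff’s Theorem: The extreme points of C are the permutation matrices.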


Some more facts that we’ll use

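Consider the following linear programming problem:

Find:

\[ \max_{\lambda} \left( L^T(n)\, \lambda \right) \quad \text{s.t.} \quad \sum_{i=1}^{N} \lambda_{ij} \le 1, \ \sum_{j=1}^{N} \lambda_{ij} \le 1, \ \lambda_{ij} \ge 0. \]

We know that the solution is an extreme point of C, i.e.

\[ \max_{\lambda} \left( L^T(n)\, \lambda \right) = \max_{S(n)} \left( L^T(n)\, S(n) \right). \]

Hence, for any admissible $\lambda$:

\[ \left( L^T(n)\, \lambda \right) - \max_{S(n)} \left( L^T(n)\, S(n) \right) \le 0. \]

Q: So what is $\max_{S(n)} \left( L^T(n)\, S(n) \right)$?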


Maximum weight matching

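[Figure: the basic switch model (arrivals $A_1(n), \ldots, A_N(n)$, VOQ occupancies $L_{11}(n), \ldots, L_{NN}(n)$, departures $D_1(n), \ldots, D_N(n)$) next to its “request” graph, whose edges are weighted by $L_{11}(n), \ldots, L_{N1}(n), \ldots$; the bipartite match $S^*(n)$ is a maximum weight match.]

\[ S^*(n) = \arg\max_{S(n)} \left( L^T(n)\, S(n) \right) \]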
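As a small illustration of the arg max above, the sketch below finds a maximum weight match by brute force over all N! permutations. This is only practical for tiny N and is not the $O(N^3)$ algorithm mentioned earlier; the names `match_weight` and `maximum_weight_match`, the occupancies `L`, and the rates `lam` are all hypothetical.

```python
from itertools import permutations


def match_weight(L, perm):
    """L^T(n) S(n) for the permutation that connects input i to output perm[i]."""
    return sum(L[i][perm[i]] for i in range(len(L)))


def maximum_weight_match(L):
    """Brute-force arg max over all permutations of the outputs."""
    return max(permutations(range(len(L))), key=lambda perm: match_weight(L, perm))


# Hypothetical VOQ occupancies L_ij(n) for a 3x3 switch.
L = [[3, 0, 1],
     [2, 0, 5],
     [0, 4, 0]]
best = maximum_weight_match(L)
print(best, match_weight(L, best))  # (0, 2, 1) 12

# Hypothetical admissible rates: every row and column sums to less than 1.
lam = [[0.5, 0.0, 0.4],
       [0.3, 0.0, 0.5],
       [0.0, 0.9, 0.0]]
weighted_rate = sum(L[i][j] * lam[i][j] for i in range(3) for j in range(3))
print(weighted_rate <= match_weight(L, best))  # True
```

The last line checks, for this one example, the inequality $L^T(n)\lambda - \max_{S(n)} L^T(n) S(n) \le 0$ that the proof relies on.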


Outline of Proof

1. We know that if we pick

\[ S^*(n) = \arg\max_{S(n)} \left( L^T(n)\, S(n) \right), \]

then

\[ \left( L^T(n)\, \lambda \right) - \left( L^T(n)\, S^*(n) \right) \le 0. \]

2. Next we use this fact to show that:

\[ E[L^T(n+1) L(n+1) - L^T(n) L(n) \mid L(n)] \le c - \epsilon\, \|L(n)\|, \]

where $V\{L(n)\} = L^T(n) L(n)$ is our Lyapunov function.

3. Hence, if $\|L(n)\|$ is large enough, there is an expected single-step downward drift in occupancy, and so $E[L(n)] < \infty$ and 100% throughput is achieved.

For more details, see reference.


Choosing the weight

Q: Do we need to choose edge weights $w_{ij}(n) = L_{ij}(n)$?

Fact 1: If we choose $w_{ij}(n) = [L_{ij}(n)]^2, [L_{ij}(n)]^3, \ldots$ then the same Lyapunov method can be used to show that 100% throughput is achieved.

Fact 2: If $w_{ij}(n) = [L_{ij}(n)]^x$ then $E\{[L_{ij}(n)]^x\} < \infty$. For example, if $w_{ij}(n) = L_{ij}(n)$, then $E[L_{ij}(n)] < \infty$.

Observation: Theory suggests that if $w_{ij}(n) = [L_{ij}(n)]^x$, then the switch becomes "more stable" as we increase $x$. Simulation suggests that average delay decreases as we increase $x$.

Fact 3: If $w_{ij}(n)$ is defined to be the time that the HOL cell has been in queue $Q_{ij}$, then 100% throughput is achieved.

Fact 4: If $w_{ij}(n) = \sum_{j'} L_{ij'}(n) + \sum_{i'} L_{i'j}(n)$, then 100% throughput is achieved. This is called a "Longest Port First (LPF)" match, and (surprisingly) is also a maximum size match.


100% throughput with pipelining

In practice, switch schedulers are often pipelined. So what happens if the pipeline uses out-of-date information?

1. Define out-of-date occupancy at time $n$:

\[ \hat{L}_{ij}(n) = L_{ij}(n - k), \]

where $k$ is how out-of-date the information is.

2. Because

\[ L_{ij}(n) - k \le \hat{L}_{ij}(n) \le L_{ij}(n) + k, \]

it can be shown that:

\[ E[L^T(n+1) L(n+1) - L^T(n) L(n) \mid L(n)] \le c - \epsilon\, \|L(n)\| + 2Nk, \]

where $2Nk$ is the additional term.

3. As before, if $\|L(n)\|$ is large enough, there is an expected single-step downward drift in occupancy, and so $E[L(n)] < \infty$ and 100% throughput is achieved.

Q: If we increase $k$, will average delay increase?


100% throughput with incomplete information

In practice, the bandwidth of state information to/from and within a switch scheduler is limited. So what happens if the scheduler uses fewer bits to store the weight information?

1. Define noisy information at time $n$:

\[ \hat{L}(n) = L(n) + e(n), \]

where $e(n)$ is an error term.

2. If $\|e(n)\| < C, \ \forall n$, where $C$ is some constant, then 100% throughput is achieved.


Achieving 100% when algorithm does not complete

Randomized algorithms:
1. Basic idea (Tassiulas)
2. Reducing delay (Shah, Giaccone and Prabhakar)

Note: Balaji Prabhakar will cover randomized scheduling algorithms in detail in the next section of the tutorial.


Speedup and Combined Input Output Queueing (CIOQ)

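[Figure: the basic switch model with arrivals $A_1(n), \ldots, A_N(n)$, VOQ occupancies $L_{11}(n), \ldots, L_{NN}(n)$, crossbar configuration $S(n)$ and departures $D_1(n), \ldots, D_N(n)$, now with queues at the outputs as well.]

• With speedup, the matching is performed s times per cell time, and up to s cells are removed from each VOQ.
• Therefore, output queues are required.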


Fluid model

• Fluid models are used to obtain stability regions for discrete time stochastic networks.

• They apply to any traffic that satisfies a strong law of large numbers, i.e.

\[ \lim_{n \to \infty} \frac{A_{ij}(n)}{n} = \lambda_{ij}. \]

• The fluid model “washes” out the packet structure, yet can still prove stability results.


Fluid Model

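Switch evolution:

\[ L_{ij}(n) = L_{ij}(0) + A_{ij}(n) - D_{ij}(n), \]

\[ D_{ij}(n) = \sum_{k=1}^{n} \sum_{m=1}^{s} \sum_{S} S_{ij}\, 1\{L_{ij}(k) > 0\} \left( T_S^m(k) - T_S^m(k-1) \right), \]

where $T_S^m(n)$ is the cumulative time permutation $S$ has been used by slot $n$; and

\[ \sum_{S} T_S^m(n) = n. \]

Fluid equations in continuous time:

\[ L_{ij}(t) = L_{ij}(0) + \lambda_{ij}\, t - D_{ij}(t), \]

where:

\[ D_{ij}(t) = \sum_{m=1}^{s} \sum_{S} S_{ij}\, T_S^m(t), \quad \sum_{S} T_S^m(t) = t. \]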


Fluid Model

Sketch of proof (Dai and Prabhakar):

If the match, $S^*$, is a maximal match, then for each phase, $k$, of time slot $n$, one of the following must hold if $L_{ij}(n + k/s) > 0$:

1. $L_{ij'}(n + k/s)\, S^*_{ij'}(n + k/s) > 0$, and/or
2. $L_{i'j}(n + k/s)\, S^*_{i'j}(n + k/s) > 0$, for some $i', j' \in \{1, \ldots, N\}$.

In other words, either the input must be served, and/or the output must be served. From this, it can be shown that with a speedup of 2, the switch is stable, and 100% throughput is achieved.

For more details, see the reference.


Scheduling algorithms to achieve 100% throughput

1. Basic switch model.
2. When traffic is uniform (Many algorithms…)
3. When traffic is non-uniform, but traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.

4. When matrix is not known.
• Technique: Lyapunov function.

5. When algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.

6. When algorithm does not complete.
• Technique: Randomized algorithm.

7. When there is speedup.
• Technique: Fluid model.

8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.


2-stage switch and no scheduler

Motivation:

1. If traffic is uniformly distributed, then even a TDM schedule gives 100% throughput.

2. So why not force non-uniform traffic to be uniformly distributed?


2-stage switch and no scheduler

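[Figure: two N x N stages in series. Arrivals A1(n) … AN(n) enter a bufferless load-balancing stage with configuration S1(n); its outputs A’1(n) … A’N(n) feed a buffered switching stage with VOQs L11(n) … LNN(n) and configuration S2(n), producing departures D1(n) … DN(n)]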


2-stage switch with no scheduler

Main Result [Chang et al.]:

1. Consider a periodic sequence of permutation matrices: $\pi(n) = \pi^{\hat{n}}$, where $\pi$ is a one-cycle permutation matrix (for example, a TDM sequence), and $\hat{n} = n \bmod N$.

2. If the 1st stage is scheduled by a sequence of permutation matrices: $\pi_1(n) = \pi(n + \phi_1)$, where $\phi_1$ is a random starting phase, and

3. The 2nd stage is scheduled by a sequence of permutation matrices: $\pi_2(n) = \pi(n + \phi_2)$,

4. Then the switch gives 100% throughput for a very broad range of traffic types.

Observation 1: 1st stage makes non-uniform traffic uniform, and breaks up burstiness. For bursty traffic, delay can be lower than for an output queued switch!

Observation 2: Cells can become mis-sequenced.
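The load-spreading effect of the first stage can be sketched in a few lines of Python. This is an illustrative sketch, not the construction from Chang et al.: the cyclic-shift form of the TDM permutation, the function names and the example rate matrix are all assumptions made here. It shows that a cyclic TDM sequence connects each input to every middle port once per period, so in the long run each second-stage VOQ for output j sees a 1/N share of the total traffic destined to j.

```python
from fractions import Fraction

def tdm_permutation(n, N, phase=0):
    """Cyclic-shift permutation for slot n: input i connects to port (i + n + phase) mod N."""
    return [(i + n + phase) % N for i in range(N)]

def second_stage_rates(rates):
    """Long-run arrival rate at each second-stage VOQ (middle port m, output j).

    Assumption: over one period of N slots every input visits every middle
    port exactly once, so each middle port gets 1/N of each input's traffic.
    """
    N = len(rates)
    column_sums = [sum(rates[i][j] for i in range(N)) for j in range(N)]
    return [[Fraction(column_sums[j]) / N for j in range(N)] for _ in range(N)]

# Hypothetical non-uniform but admissible rate matrix (row and column sums <= 1).
rates = [[Fraction(9, 10), 0, 0],
         [0, Fraction(1, 10), Fraction(8, 10)],
         [0, Fraction(8, 10), Fraction(1, 10)]]
spread = second_stage_rates(rates)  # every row is identical: the load is uniform
```

In this example every second-stage VOQ sees rate 3/10, which a fixed TDM schedule serving each VOQ once every 3 slots (rate 1/3) can sustain.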


Parallel Packet Switches

Definition:

A PPS is comprised of multiple identical lower-speed packet-switches operating independently and in parallel. An incoming stream of packets is spread, packet-by-packet, by a demultiplexor across the slower packet-switches, then recombined by a multiplexor at the output.

We call this “parallel packet switching”


Architecture of a PPS

[Figure: PPS with N = 4 external ports at line rate R. Each input has a demultiplexor that spreads cells over k = 3 OQ switches via internal links of rate sR/k; each output has a multiplexor that recombines cells arriving over internal links of rate sR/k]

Why a PPS isn’t work-conserving with s=1

[Figure: PPS with N = 4 ports at rate R and Layers 1 to 3 reached over internal links of rate R/3; numbered cells (1 to 5) are shown at the inputs, in the layers and at the outputs]

Why is there no Choice at the Input ?

[Figure: PPS with N = 4 ports at rate R and Layers 1 to 3; numbered cells and cells destined to output j are shown on the input links over successive steps, captioned “How we got there on the input side”]

Result of no Choice

[Figure: the same PPS (N = 4 ports at rate R, Layers 1 to 3) showing where the numbered cells and the cells for output j end up; panels captioned “Why is there no Choice at the Input ?” and “How we got there on the input side”]


How can we increase choice? Speedup

[Figure: PPS with N = 4 ports at rate R and Layers 1 to 3, with internal links sped up to rate 2R/3; numbered cells and cells for output j are shown spread across the layers]


Effect of Speedup on Choice

A speedup of S = 2, with k = 10 links

[Figure: an external line at rate R connected to Layers 1 to 10 over internal links of rate 2R/k, with k/S of the links marked]


Aside: 3-stage Clos Network

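[Figure: 3-stage Clos network with N = n x m external ports and k >= n. The first stage has m switches of size n x k, the middle stage has k switches of size m x m, and the third stage has m switches of size k x n]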


Aside: With k = n is a Clos network non-blocking like a crossbar?

Consider the example: scheduler chooses to match (1,1), (2,4), (3,3), (4,2)


Aside: With k = n is a Clos network non-blocking like a crossbar?

Consider the example: scheduler chooses to match (1,1), (2,2), (4,4), (5,3), …


Aside: With k > n can a Clos network be non-blocking without rearrangement?

Clos’ Theorem: If k >= 2n – 1, then a new connection can always be added without rearrangement.

Aside: With k > n can a Clos network be non-blocking without rearrangement?

[Figure: 3-stage Clos network with N = n x m ports and k >= n. First-stage switches I1, I2, … Im (n x k), middle-stage switches M1, M2, … Mk (m x m), third-stage switches O1, O2, … Om (k x n)]


Clos Theorem

[Figure: first-stage switch Ia and third-stage switch Ob, each with links 1 … n … k to the center stage; n-1 links are already in use at the input and n-1 at the output]

1. Consider adding the n-th connection between 1st stage Ia and 3rd stage Ob.

2. We need to ensure that there is always some center-stage M available.

3. If k > (n-1) + (n-1), then there is always an M available, i.e. k >= 2n – 1.

End of aside
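The counting argument behind Clos' theorem can be checked with a short Python sketch. The function names and the representation of busy middle switches as sets of indices are assumptions made for illustration. The worst case is when the n-1 connections already at the input switch and the n-1 already at the output switch use disjoint middle switches.

```python
def free_middle_switch(k, busy_at_input, busy_at_output):
    """Return the index of a middle-stage switch free at both ends, or None."""
    for m in range(k):
        if m not in busy_at_input and m not in busy_at_output:
            return m
    return None

def worst_case_has_free_middle(n, k):
    """Worst case: the other n-1 input-side and n-1 output-side connections
    occupy 2(n-1) distinct middle switches."""
    busy_at_input = set(range(n - 1))
    busy_at_output = set(range(n - 1, 2 * (n - 1)))
    return free_middle_switch(k, busy_at_input, busy_at_output) is not None

# With n = 3: k = 5 = 2n - 1 always leaves a free middle switch, k = 4 does not.
print(worst_case_has_free_middle(3, 5), worst_case_has_free_middle(3, 4))
```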


Definitions for PPS

• Available Input Link Set (AIL)

AIL(i,n) is the set of layers to which external input port i can start writing a cell at time slot n.


Definition

• Departure Time of a Cell (n’)

The departure time of a cell, n’, is the time it would have departed from an equivalent FIFO OQ switch.


Definition

• Available Output Link Set (AOL)

AOL(j,n’) is the set of layers that output j can start reading a cell from, at time slot n’.


Main Observation

[Figure: PPS with N = 4 ports at rate R, Layers 1 to 3 and internal links of rate 2R/3, showing the cells in progress on the input and output links]

• Inputs can only send to the AIL set.
• Outputs can only read from the AOL set.


Lower Bounds on Choice Sets

Minimum size of AIL, AOL:

|AIL|, |AOL| >= (total number of links) - (maximum number of links which can have cells in progress)
= k - ( k/S - 1 )


Assurance of Choice

• A cell must be sent to a link which belongs to both the AIL and the AOL set.

$$\mathrm{AIL} \cap \mathrm{AOL} \neq \emptyset$$

$$|\mathrm{AIL}| + |\mathrm{AOL}| > k$$

$$(k - k/S + 1) + (k - k/S + 1) > k$$

$$S > 2k/(k + 2)$$


Parallel Packet Switch Results

• If S >= 2 then each cell is guaranteed to find a layer that belongs to both the AIL and AOL sets.

• If S >= 2 then a PPS can precisely emulate a FIFO output queued switch for all traffic patterns, and hence achieves 100% throughput.
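The choice-set bound can be turned into a small numeric check. This is a sketch under one stated assumption: the slides write the bound as k - (k/S - 1), and the code rounds k/S up with `ceil` so that it also handles values of k that S does not divide. The function names are hypothetical.

```python
from math import ceil

def min_choice_set(k, S):
    """Lower bound on |AIL| or |AOL|: k minus the links that may have cells in progress."""
    return k - (ceil(k / S) - 1)

def choice_guaranteed(k, S):
    """True if the bounds force AIL and AOL to overlap, i.e. |AIL| + |AOL| > k."""
    return 2 * min_choice_set(k, S) > k

# With k = 10 layers: speedup 2 leaves at least 6 links in each set, so they must overlap.
print(min_choice_set(10, 2), choice_guaranteed(10, 2), choice_guaranteed(10, 1))
```

Since 2k/(k + 2) is below 2 for every k, a speedup of 2 is enough for any number of layers, which is the condition used in the results above.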


Scheduling algorithms to achieve 100% throughput

1. Basic switch model.
2. When traffic is uniform (Many algorithms…)
3. When traffic is non-uniform, but traffic matrix is known.
• Technique: Birkhoff-von Neumann decomposition.

4. When matrix is not known.
• Technique: Lyapunov function.

5. When algorithm is pipelined, or information is incomplete.
• Technique: Lyapunov function.

6. When algorithm does not complete.
• Technique: Randomized algorithm.

7. When there is speedup.
• Technique: Fluid model.

8. When there is no algorithm.
• Technique: 2-stage load-balancing switch.
• Technique: Parallel Packet Switch.


References

1. C.-S. Chang, W.-J. Chen, and H.-Y. Huang, "Birkhoff-von Neumann input buffered crossbar switches," in Proceedings of IEEE INFOCOM '00, Tel Aviv, Israel, 2000, pp. 1614 – 1623.

2. N. McKeown, A. Mekkittikul, V. Anantharam, and J. Walrand. Achieving 100% Throughput in an Input-Queued Switch. IEEE Transactions on Communications, 47(8), Aug 1999.

3. A. Mekkittikul and N. W. McKeown, "A practical algorithm to achieve 100% throughput in input-queued switches," in Proceedings of IEEE INFOCOM '98, March 1998.

4. L. Tassiulas, “Linear complexity algorithms for maximum throughput in radio networks and input queued switches,” in Proc. IEEE INFOCOM ‘98, San Francisco CA, April 1998.

5. D. Shah, P. Giaccone and B. Prabhakar, “An efficient randomized algorithm for input-queued switch scheduling,” in Proc. Hot Interconnects 2001.

6. J. Dai and B. Prabhakar, "The throughput of data switches with and without speedup," in Proceedings of IEEE INFOCOM '00, Tel Aviv, Israel, March 2000, pp. 556 -- 564.

7. C.-S. Chang, D.-S. Lee, Y.-S. Jou, “Load balanced Birkhoff-von Neumann switches,” Proceedings of IEEE HPSR ‘01, May 2001, Dallas, Texas.

8. S. Iyer, N. McKeown, "Making parallel packet switches practical," in Proc. IEEE INFOCOM `01, April 2001, Alaska.


Randomized Algorithms


Balaji Prabhakar


Motivation

• Networking problems suffer from the “curse of dimensionality”
– algorithmic solutions do not scale well

• Typical causes
– size: large number of users
– time: very high speeds of operation

• A good deterministic algorithm exists, but …
– it requires too large a data structure
– it needs state information, and “state” is too big
– it “starts from scratch” in each iteration


Overview

• In various scenarios, e.g.
– caching
– load balancing
– switch scheduling
– packet dropping (active queue management)

• We will
– consider good (even optimal) exact algorithms
– discuss their complexity
– design approximate algorithms
– and, analyze their performance


Some Specifics

• Exact algorithms
– in each scenario these are either well-known or easily determined
– when their analysis and optimality properties have been established in the classical theoretical literature, we will only give some intuition and point to references
– if their development is more recent (e.g. switch scheduling), we will consider them in more detail
– thus, the main focus of this segment is the design and analysis of approximate randomized schemes

• Randomized algorithms
– are a powerful way of approximating
– it is often possible to randomize deterministic algorithms
– this simplifies the implementation while retaining a (surprisingly) high level of performance


Randomization

• The main idea is
– to simplify the decision-making process
– by basing decisions upon a small, randomly chosen sample of the state
– rather than upon the complete state


An Illustrative Example

• Find the largest element of a set S of size 1 billion

• Deterministic algorithm: linear search
– has a complexity of 1 billion

• The randomized version: find the largest of 10 randomly chosen samples
– has a complexity of 10
– (note: this ignores complexity of choosing 10 random samples)

• Performance
– linear search will find the absolute largest element
– if R is the element found by randomized algorithm, we can make statements like
P(R is larger than the 100 million smallest elements) = $1 - (1/10)^{10}$
– thus, we can say that the performance of the randomized algorithm is very good with a high probability
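Statements of this kind are easy to evaluate numerically. The sketch below assumes the 10 samples are drawn independently and uniformly (sampling with replacement, a close approximation when drawing 10 elements out of a billion); the function names are chosen here for illustration. It distinguishes two events: the best sample landing in the top fraction q of the set, and the best sample avoiding the bottom fraction q.

```python
def prob_max_in_top_fraction(q, samples):
    """P(the largest of `samples` i.i.d. uniform draws lies in the top fraction q)."""
    return 1 - (1 - q) ** samples

def prob_max_above_bottom_fraction(q, samples):
    """P(the largest of `samples` i.i.d. uniform draws is not in the bottom fraction q)."""
    return 1 - q ** samples

# 100 million out of 1 billion is a fraction q = 0.1.
print(prob_max_in_top_fraction(0.1, 10))        # about 0.651
print(prob_max_above_bottom_fraction(0.1, 10))  # 1 - 10**-10
```

Raising the number of samples from 10 to 44 already pushes the first probability above 0.99, which illustrates how quickly a small sample becomes reliable.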


Randomizing Iterative Schemes

• Often, we want to perform some operation iteratively

• Example: find the heaviest matching in a switch in every time slot

• Since, in each time slot
– at most one packet can arrive at each input
– and, at most one packet can depart from each output
– the size of the queues, or the “state” of the switch, doesn’t change by much between successive time slots
– so, a matching that was heavy at time t will quite likely continue to be heavy at time t+1

• This suggests that
– knowing a heavy matching at time t should help in determining a heavy matching at time t+1
– there is no need to start from scratch in each time slot


Summarizing the Philosophy…

• Randomized algorithms can help simplify the implementation
– by reducing the amount of work in each iteration

• If the state of the system doesn’t change by much between iterations, then
– we can reduce the work even further by carrying information between iterations

• The big pay-off is that, even though it is an approximation, the performance of a randomized scheme can be surprisingly good


Examples

• We’ll discuss these issues in the following scenarios
– document replacement in web-caches
– load balancing
– switch scheduling
– bandwidth firewalling via packet dropping


A Randomized Web-Cache Replacement Scheme


Background

• Tremendous increase in HTTP traffic

• Proxy caches reduce
– network traffic
– download latency
– server load


Replacement Policies

• In CPU caches
– the Least Recently Used (LRU) algorithm and variants
    • recentness of use exploits temporal correlation

• In Web caches
– more complicated criteria
    • different document sizes and fetching costs
    • recentness and frequency of use exploit correlation and popularity
– are needed to determine the suitability of a document for eviction


Motivation

• Data structures may get complicated
– priority queues

• Supporting computations can get time-consuming
– cache of size K, O(log K) per access, to prepare for evictions

• Need efficient approximations


A Randomized Algorithm

• First cut
– pick N documents at random from cache
– evict the least useful document

• For subsequent iterations…

• Why throw away all previous info?

• Second best (or second least useful sample) is pretty good!


The Iterative Version

• First iteration
– pick N documents at random from cache
– evict the least useful document
– retain the M next least useful documents

• Subsequent iterations
– pick N-M documents at random from cache
– evict the least useful document
    • among the fresh N-M and the M retained
– retain the M next least useful documents
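One eviction step of the iterative scheme can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name, the `uselessness` score dictionary (higher means less useful) and the handling of retained documents that have since left the cache are assumptions made here.

```python
import random

def evict_with_memory(cache, uselessness, retained, N, M, rng=random):
    """One eviction: sample, evict the least useful, retain the M next least useful.

    cache: set of document ids; uselessness: dict id -> score (higher = less useful);
    retained: ids carried over from the previous iteration.
    Returns (evicted_id, new_retained).
    """
    carried = [d for d in retained if d in cache]
    pool = [d for d in cache if d not in carried]
    fresh = rng.sample(pool, min(N - len(carried), len(pool)))
    candidates = sorted(carried + fresh, key=lambda d: uselessness[d], reverse=True)
    victim = candidates[0]
    cache.remove(victim)
    return victim, candidates[1:M + 1]

# N = 8, M = 2, with each document's id used as its uselessness score.
score = {d: d for d in [11, 89, 2, 39, 41, 77, 95, 8, 3, 22, 49, 25, 82]}
cache = {11, 89, 2, 39, 41, 77, 95, 8}
victim, kept = evict_with_memory(cache, score, [], 8, 2)   # evicts 95, keeps [89, 77]
```

In the first iteration `retained` is empty, so all N samples are fresh; afterwards only N-M fresh samples are drawn and compared with the M retained ones.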


An Example (N=8, M=2)

11 89 2 39 41 77 95 8

11 89 2 39 41 77 8

89 77 3 11 22 49 25 82

82 77


Performance Criterion

• Deterministic algorithm would evict the most useless document

• Goal: document evicted in most useless nth percentile– error if goal not achieved

• Goal positively correlated with hit rate


Memory Improves Performance

• Using memory improves performance significantly for small M
– $P_{error}(M = 0) = (1 - n)^{N} \approx e^{-nN}$
– $P_{error}(M = 1) \approx (1 - n)^{2N-1} \approx e^{-2nN}$

• It is like choosing the minimum of the minimum


Some Analysis

• Compute Perror as a function of memory M
– Xk: number of useless documents (in nth bin) prior to kth replacement
– Ak: number of useless documents acquired from resampling
– Xk is a Markov Chain


The Markov Chain

• Perror = P(Xk=0)

• Analysis independent of trace-characteristics

[Figure: timeline of the kth, (k+1)th and (k+2)th replacements showing Xk, Xk+1, the resampled Ak+1, and the indicators 1(Xk>0) and 1(Xk+1>0)]
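A simplified version of this chain can be evaluated numerically. The sketch below is an assumption-laden model, not the analysis from the referenced paper: it assumes each fresh sample is useless independently with probability p = n/100, that A_k is therefore Binomial(N-M, p), and that the chain evolves as X_{k+1} = min(M, X_k - 1(X_k > 0)) + A_{k+1}. The stationary value of P(X_k = 0) is obtained by repeatedly applying the transition rule.

```python
from math import comb

def binomial_pmf(trials, p):
    """Probabilities of 0..trials successes in `trials` independent draws."""
    return [comb(trials, a) * p**a * (1 - p)**(trials - a) for a in range(trials + 1)]

def error_probability(N, M, p, iterations=2000):
    """Stationary P(X_k = 0) for X_{k+1} = min(M, X_k - 1(X_k > 0)) + A_{k+1}."""
    fresh = binomial_pmf(N - M, p)      # distribution of A_k (assumed binomial)
    dist = [0.0] * (N + 1)
    dist[0] = 1.0                       # start with no useless documents in hand
    for _ in range(iterations):
        nxt = [0.0] * (N + 1)
        for x, px in enumerate(dist):
            if px == 0.0:
                continue
            carried = min(M, x - 1 if x > 0 else 0)   # useless documents retained
            for a, pa in enumerate(fresh):
                nxt[carried + a] += px * pa
        dist = nxt
    return dist[0]

# Error probability as a function of memory for N = 12 samples and n = 10%.
errors = [error_probability(12, M, 0.1) for M in range(12)]
```

With M = 0 the model reduces to (1 - p)^N, and in this model a small amount of memory lowers the error while very large M raises it again, because few fresh samples are drawn.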

Perror

[Figure: four plots of Perror versus M, for N=12, n=10%; N=30, n=4%; N=60, n=2%; and N=80, n=2%]


The Right Amount of Memory (M)

• From the figures it looks like there is an optimal value for M that minimizes Perror

– that is, if M is too low, we’re not carrying enough information between iterations

– if M is too high, we’re carrying a lot of (stale) information, there isn’t much new

• So there seems to be a right balance between the amount of memory we need and the amount of random sampling

• More precisely…


The Right Amount of Memory (M)

• Perror is a complicated function of M

• But, we can still get some info on its dependency on M

• First, we can show that Perror is a convex function of M
– to do this we need to show that the discrete second derivative of Perror (M) is non-negative
– this is done using a “coupling argument”

• As a result there exists an optimal value of M = M*


The Optimal Value of M

• There is an approximate closed form formula for M*

• This is obtained using an appropriate “exponential martingale” based on the Markov chain X

[Equation: approximate closed form for M* as a function of N and n]


A Comparison


Trace-driven Simulation

• Approximate the following web-cache replacement schemes
– LRU
– GD-Hyb (GD-Size and Hybrid)
    • recentness
    • frequency
    • size
    • cost to fetch

from the work of Cao and Irani, ’97, Wooster and Abrams, ’97

LRU: Hit Rate

[Figure: % hit rate versus % relative cache size, weekly NLANR trace. Legend: LRU (non-random): black; N=30, M=5: red; N=8, M=2: cyan; N=3, M=1: green; RR (N=1, M=0): blue]

GD-Hyb: Hit Rate

[Figure: % hit rate versus % relative cache size, weekly NLANR trace. Legend: GD-Hyb (non-random): black; N=30, M=5: red; N=8, M=2: cyan; N=3, M=1: green; RR (N=1, M=0): blue]

LRU: Latency Reduction

[Figure: % reduced latency versus % relative cache size, daily NLANR trace. Legend: LRU (non-random): black; N=30, M=5: red; N=8, M=2: cyan; N=3, M=1: green; RR (N=1, M=0): blue]

GD-Hyb: Latency Reduction

[Figure: % reduced latency versus % relative cache size, daily NLANR trace. Legend: LRU (non-random): black; N=30, M=5: red; N=8, M=2: cyan; N=3, M=1: green; RR (N=1, M=0): blue]


References

1. P. Cao and S. Irani, “Cost-aware WWW Proxy Caching Algorithms,” Proc. of the USENIX Symposium on Internet Technologies and Systems, Monterey, CA, Dec 1997

2. R. Wooster and M. Abrams, “Proxy Caching that Estimates Edge Load Delays,” 6th International WWW Conference, Santa Clara, April 1997

3. K.Psounis and B. Prabhakar, “A Randomized Web-cache Replacement Scheme,” Proc. INFOCOM 2001

4. T. Lindvall, Lectures on the Coupling Method, Wiley Series in Probability and Mathematical Statistics, Wiley, New York, 1992.

5. R. Durrett, Probability: Theory and Examples, Duxbury Press, Second Edition, 1996.


Randomized Load Balancing


Load Balancing: Static Case


Load Balancing: Dynamic Case


A Simple (and elegant) Analysis


Continuing…


Carrying Information Between Iterations

• Since the load doesn’t change by much between iterations …
– that is, a lightly loaded queue is likely to continue to be lightly loaded

• It might help
– to remember the identity of the least loaded bin in the current iteration for use in the next iteration
– similar idea used in the web-caching problem


Load Balancing with Memory

• The (d,1) system
– d random choices
– 1 bin stored in memory

• Question
– How well does the (d,1) system perform ?


An Illustration

• The bin stored in memory
– is likely to be very lightly loaded
– so we might expect better load balancing


Theorem (Shah and Prabhakar)

• The maximum load achieved in the (d,1) system is less than log log n / log(2d-1) + O(1) with a high probability

• This is as if we had a (2d-1,0) system.
– so the bin in memory is at least as good as (d-1) samples
– again, we see the minimum of minimums effect
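A (d,1) system is simple to simulate. The sketch below is one reading of the scheme and not taken from the slides in detail: it throws n balls into n bins, lets each ball choose the least loaded among d uniformly random bins plus the remembered bin, and then remembers the least loaded of those d+1 bins after the placement. The function name and the seeding are assumptions made for reproducibility.

```python
import random

def max_load_with_memory(n, d, seed=0):
    """Throw n balls into n bins; each ball sees d random bins plus one remembered bin.

    Returns (maximum load, list of bin loads).
    """
    rng = random.Random(seed)
    load = [0] * n
    remembered = rng.randrange(n)
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(d)] + [remembered]
        target = min(candidates, key=lambda b: load[b])
        load[target] += 1
        # Carry the least loaded candidate (after placement) to the next ball.
        remembered = min(candidates, key=lambda b: load[b])
    return max(load), load

for d in (1, 2, 3):
    print(d, max_load_with_memory(10000, d)[0])
```

Running the loop for a few values of d gives a feel for how slowly the maximum load grows once memory is used; the exact numbers depend on the seed.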


References

1. Y. Azar et al., “Balanced Allocations,” Proc. of ACM STOC, 1994.

2. M. Mitzenmacher, “The power of two choices in randomized load balancing,” PhD Thesis, UC Berkeley, 1996.

3. N. D. Vvedenskaya, R. Dobrushin and F. Karpelevich, “Queueing system with selection of the shortest of two queues: An asymptotic approach,” Problems of Information Transmission, 1996.

4. B. Vocking, “How Asymmetry Helps Load Balancing,” Proc. of 40th IEEE-FOCS, 1999.

5. S. Ethier and T. Kurtz, Markov Processes: Characterization and Convergence, John Wiley and Sons, 1986.


Switch Scheduling


Switch Scheduling and Bipartite Graph Matching

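• As we have seen, switch scheduling is essentially finding matchings in weighted bipartite graphs

[Figure: weighted bipartite graph between switch inputs and outputs, with edge weights such as 4, 1, 2, 4 and a selected matching]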


Scheduling Algorithm

• Ideal policy: Maximum weight matching
– weights: queue size, age of packets etc.
– very complex for high speed networks

• In practice, approximate maximum weight matchings are the best hope

• We will discover good, randomized, approximate matchings in an evolutionary fashion
– story told pictorially using simulations
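To make the cost of the ideal policy concrete, here is a brute-force maximum weight matching for a small switch. It is only a reference sketch: it enumerates all N! permutations, which is exactly the kind of work a high-speed scheduler cannot afford, and the function name and example weights are assumptions made here.

```python
from itertools import permutations

def maximum_weight_matching(weights):
    """Exhaustive MWM on an N x N weight matrix.

    Returns (matching, weight) where matching[i] is the output matched to input i.
    """
    N = len(weights)
    best = max(permutations(range(N)),
               key=lambda m: sum(weights[i][m[i]] for i in range(N)))
    return list(best), sum(weights[i][best[i]] for i in range(N))

# Hypothetical VOQ occupancies for a 3 x 3 switch.
queues = [[3, 0, 2],
          [0, 1, 5],
          [4, 0, 0]]
print(maximum_weight_matching(queues))  # ([1, 2, 0], 9)
```

Polynomial-time algorithms for this assignment problem exist, but they are still considered heavy at line rate, which motivates the randomized approximations that follow.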


Simulation Scenario

• Switch Size: 32 x 32

• Input Traffic (shown for a 4 x 4 switch)
– Bernoulli i.i.d. inputs
– diagonal load matrix:

$$\begin{pmatrix} x & y & 0 & 0 \\ 0 & x & y & 0 \\ 0 & 0 & x & y \\ y & 0 & 0 & x \end{pmatrix}$$

    • normalized load = x + y < 1
    • x = 2y


Obvious Randomized Schemes

• Choose a matching at random and use it as the schedule
– doesn’t give 100% throughput

• Choose 2 matchings at random and use the heavier one as the schedule

• Choose N matchings at random and use the heaviest one as the schedule

None of these can give 100% throughput !!

[Figure: mean IQ length (log scale) versus normalized load under diagonal traffic, for MWM, R32 and R1]


Bounds on Maximum Throughput

Iterative Randomized Scheme (Tassiulas)

• Say M is the matching used at time t

• Let R be a new matching chosen uniformly at random (u.a.r.)

• At time t+1, use the heavier of M and R

• This gives 100% throughput!
– note: the boost in throughput is due to memory

• But delays are very large
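The iteration above can be sketched in a few lines. This is an illustrative sketch only: it assumes matchings are stored as permutations, that weights are the current queue lengths, and it keeps the queues fixed across steps, whereas in a real switch they change every time slot. All names are hypothetical.

```python
import random

def matching_weight(queues, perm):
    """Weight of the matching that connects input i to output perm[i]."""
    return sum(queues[i][perm[i]] for i in range(len(perm)))

def tassiulas_step(queues, current, rng):
    """Draw a uniformly random matching and keep the heavier of it and `current`."""
    candidate = list(range(len(current)))
    rng.shuffle(candidate)
    candidate = tuple(candidate)
    if matching_weight(queues, candidate) > matching_weight(queues, current):
        return candidate
    return current

rng = random.Random(0)
queues = [[4, 1, 0],
          [2, 3, 0],
          [0, 2, 5]]
schedule = (1, 2, 0)  # arbitrary starting schedule
for _ in range(20):
    schedule = tassiulas_step(queues, schedule, rng)
print(schedule, matching_weight(queues, schedule))
```

Because the previous schedule is remembered, its weight never decreases from one step to the next for fixed queues, which is the "memory" effect noted on the slide.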

[Figure: mean IQ length (log scale) vs. normalized load, diagonal traffic; curves: MWM, Tassiulas]


Observations for Improvement

• Most of the weight of a matching is carried in a small number of edges

• Hence, remember edges not matchings

[Figure: mean IQ length (log scale) vs. normalized load, diagonal traffic; legend: MWM, R32M32, R1M1, Tassiulas]


Finer Observations

• Let M be the schedule used at time t

• Choose a “good” random matching R

• M’ = Merge(M,R)

• M’ includes best edges from M and R

• Use M’ as schedule at time t+1

• The above procedure yields an algorithm called LAURA

Merging Procedure

[Figure: merging matchings X and R (W(X)=12, W(R)=10) into M (W(M)=13); cycle weight differences: 3-1+2-2=2 and 2-1+2-4=-1]
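One way to realize a merge of this kind, offered as a sketch rather than the exact procedure of the cited paper: the union of two perfect matchings splits into alternating cycles, and on each cycle we keep whichever matching carries more weight there. The permutation representation, the function names, and the example weights are assumptions for illustration.

```python
def matching_weight(weights, perm):
    return sum(weights[i][perm[i]] for i in range(len(perm)))

def merge(weights, x, r):
    """Merge perfect matchings x and r (permutations: input i -> output).

    On every alternating cycle of x and r, keep the heavier side, so the
    result weighs at least as much as both x and r.
    """
    n = len(x)
    r_inv = [0] * n                 # r_inv[j] = input matched to output j in r
    for i, j in enumerate(r):
        r_inv[j] = i
    merged = [None] * n
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        cycle = []                  # inputs on this alternating cycle
        i = start
        while not seen[i]:
            seen[i] = True
            cycle.append(i)
            i = r_inv[x[i]]         # follow the x edge out, then the r edge back
        wx = sum(weights[i][x[i]] for i in cycle)
        wr = sum(weights[i][r[i]] for i in cycle)
        chosen = x if wx >= wr else r
        for i in cycle:
            merged[i] = chosen[i]
    return tuple(merged)

# Hypothetical 4 x 4 weights with two alternating cycles.
w = [[3, 1, 0, 0],
     [2, 2, 0, 0],
     [0, 0, 1, 4],
     [0, 0, 2, 1]]
x, r = (0, 1, 2, 3), (1, 0, 3, 2)
m = merge(w, x, r)
print(m, matching_weight(w, x), matching_weight(w, r), matching_weight(w, m))
```

In this example the merged matching takes the first cycle from `x` and the second from `r`, so it is heavier than either input.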

[Figure: mean IQ length (log scale) vs. normalized load, diagonal traffic; curves: MWM, M-LAURA, LAURA, iLQF, Tassiulas]


References

1. L. Tassiulas, “Linear complexity algorithms for maximum throughput in radio networks and input-queued switches,” Proc. INFOCOM 1998.

2. D. Shah, P. Giaccone and B. Prabhakar, “An efficient randomized algorithm for input-queued switch scheduling,” Proc. of Hot Interconnects, 2001.

3. R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995.


A Randomized Bandwidth Partitioning Algorithm

The Setup

• In a congested network with many users
– QoS requirements are different

• Problems:
– allocate bandwidth
– control queue size and hence delay

Approach 1: Network-centric

• Network node: fair queueing

• User traffic: any type

Problem: complex implementation

Approach 2: User-centric

• Network node: simple FIFO

• User traffic: congestion-aware (e.g. TCP)

Problem: requires user cooperation

Approach 2: Controlling Delays

• Use RED (Random Early Detection)
– drop incoming packets randomly based on congestion level
– this signals the onset of congestion to the sources, which will back off (if they are responsive)

• RED is simple
– but can’t prevent unresponsive flows from eating up all the bandwidth

• Goal: find a bandwidth partitioning algorithm that is close to RED in implementation simplicity


Preliminary Comments

• Consider a single link shared by 1 unresponsive (red) flow and k responsive (green) flows

• Suppose the buffer gets congested

• Observe: It is likely there are more packets from the red (unresponsive) source

• So if a randomly chosen packet is evicted, it will likely be a red packet

• Therefore, one algorithm could be: when congested, evict a random packet


Preliminary Comments

• Unfortunately, this doesn’t work because there is a small non-zero chance of evicting a green packet

• Since green sources are responsive, they interpret the packet drop as a congestion signal and back-off

• This only frees up more room for red packets

• Idea: Suppose we choose two packets at random from the queue and compare their ids; it is then quite unlikely that both will be green

• This suggests another algorithm: Choose two packets at random and drop them both if their ids agree

• This works: that is, it limits the maximum bandwidth the red source can consume


The CHOKe Algorithm

• Builds on the previous observation

• Is a randomized algorithm (like RED, and is embedded in RED)

• Turns out to have an easily analyzable performance via fluid models

• The last point is interesting, since we’ll see how surprisingly accurate fluid models are for modeling TCP- and UDP-type traffic

The CHOKe Algorithm

1. Arriving packet: AvgQsize <= Minth?
– y: Admit new packet; end
– n: Draw a packet at random from queue
2. Both packets from same flow?
– y: Drop both packets; end
– n: go to step 3
3. AvgQsize <= Maxth?
– y: Admit packet with a probability p; end
– n: Drop the new packet; end
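The decision flow of the flowchart can be sketched as a single enqueue function. This is a simplified illustration, not the reference implementation: the average queue size is passed in rather than computed with RED's moving average, `admit_prob` stands for the probability p on the slide, and the `Packet` type and return strings are made up for the example.

```python
import random
from collections import namedtuple

Packet = namedtuple("Packet", ["flow_id", "seq"])

def choke_enqueue(queue, packet, avg_qsize, min_th, max_th, admit_prob, rng):
    """Apply the CHOKe decision to one arriving packet; returns the action taken."""
    if avg_qsize <= min_th:
        queue.append(packet)                 # no congestion: admit
        return "admitted"
    if queue:
        idx = rng.randrange(len(queue))      # draw a packet at random from the queue
        if queue[idx].flow_id == packet.flow_id:
            del queue[idx]                   # same flow: drop both packets
            return "dropped_both"
    if avg_qsize <= max_th:
        if rng.random() < admit_prob:        # admit with probability p
            queue.append(packet)
            return "admitted"
        return "dropped_new"
    return "dropped_new"                     # above max threshold: drop the arrival

rng = random.Random(1)
buffer = [Packet("udp", i) for i in range(5)]
print(choke_enqueue(buffer, Packet("udp", 99), 150, 100, 200, 0.5, rng), len(buffer))
```

When the buffer is dominated by one flow, an arrival from that flow almost always matches the sampled packet, which is why the unresponsive flow bears most of the drops.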

The CHOKe Algorithm: Multiple Samples

1. Arriving packet: AvgQsize <= Minth?
– y: Admit new packet; end
– n: Draw m packets at random from queue
2. Do any of the packet ids match?
– y: Drop all matched packets; end
– n: go to step 3
3. AvgQsize <= Maxth?
– y: Admit packet with a probability p; end
– n: Drop the new packet; end

Simulation Comparison: The setup

[Figure: TCP sources S(1), S(2), …, S(m) and UDP sources S(m+1), …, S(m+n) connect over 10 Mbps links to router R1; R1 connects to router R2 over a 1 Mbps link; R2 connects over 10 Mbps links to TCP sinks D(1), D(2), …, D(m) and UDP sinks D(m+1), …, D(m+n)]

The Specifics

• 32 TCP flows, 1 UDP flow

• All TCPs’ maximum window size = 300

• All links have a propagation delay of 1 ms

• FIFO buffer size = 300 packets

• All packet sizes = 1 KByte

• RED: (minth, maxth) = (100, 200) packets

Simulation 1: 1 UDP source

UDP's arrival rate = 2 Mbps

[Figure: UDP throughput (Kbps) vs. time (seconds, 0 to 100) under DropTail, RED, and CHOKe]

Different UDP Loadings

[Figure: throughput (Kbps) vs. UDP arrival rate (Kbps, log scale from 100 to 10000); curves: UDP throughput, marked with the UDP dropping percentage, and average TCP throughput. Dropping-percentage labels as extracted: 98.3%, 97.5%, 95.0%, 87.0%, 74.1%, 57.3%, 45.8%, 34.8%, 23.0%]

5 UDPs and 1 Sample from Queue

32 TCPs, 5 UDPs (with same arrival rate)

[Figure: throughput (Kbps) vs. total UDP arrival rate (Kbps, log scale from 100 to 10000); curves: total UDP throughput, total TCP throughput]

5 Samples for 5 UDPs

[Figure: throughput (Kbps) vs. total UDP arrival rate (Kbps, log scale from 100 to 10000); curves: total UDP throughput, total TCP throughput]


How many samples to take?

• Since we don’t know a priori how many unresponsive flows are passing through the link, choose the number of samples based on the backlog

• As Qavg increases, increase the number of samples

[Figure: the range of Qavg between minth and maxth divided into regions R1, R2, …, Rk]
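A possible mapping from backlog to sample count is sketched below. It assumes the interval between the two thresholds is split into k equal regions and that region Ri uses i samples; both choices are illustrative assumptions, since the slide only says that the number of samples grows with Qavg.

```python
def num_samples(avg_qsize, min_th, max_th, k):
    """Number of packets to sample, growing with the average queue size.

    Assumption for this sketch: k equal-width regions between min_th and
    max_th, with region i (1-based) using i samples.
    """
    if avg_qsize <= min_th:
        return 0                      # below minth nothing is sampled
    if avg_qsize >= max_th:
        return k
    region_width = (max_th - min_th) / k
    return min(k, int((avg_qsize - min_th) / region_width) + 1)

for q in (90, 110, 125, 199, 250):
    print(q, num_samples(q, 100, 200, 4))
```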


A Fluid Analysis

[Figure: the queue drawn as a permeable tube with leakage; the leakage represents discards from the queue]

Some notation

• N: total number of packets in the buffer

• $\lambda_i$: rate at which flow i packets arrive

• $L_i(t)$: rate at which flow i’s packets cross position t of the buffer
– 0 = entrance and D = exit

• $p_i$: fraction of flow i’s packets dropped at ingress
= fraction of flow i’s packets dropped in the buffer (since drops occur in pairs)

The Equation

• $L_i(t)\,\delta t - L_i(t+\delta t)\,\delta t = \lambda_i\,\delta t \cdot L_i(t)\,\delta t / N$

$\Rightarrow\ -\dfrac{dL_i(t)}{dt} = \dfrac{\lambda_i L_i(t)}{N}$

$L_i(0) = \lambda_i (1 - p_i)$

$L_i(D) = \lambda_i (1 - 2p_i)$

• This first-order differential equation can be solved explicitly for $L_i(t)$, $0 < t < D$
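As a worked sketch of that explicit solution, writing $\lambda_i$ for the arrival rate of flow i: the equation is linear with constant coefficient $\lambda_i / N$, so the solution is an exponential decay from the value at the entrance. Imposing the exit condition then ties the drop fraction $p_i$ to the arrival rate.

```latex
\begin{align*}
L_i(t) &= L_i(0)\, e^{-\lambda_i t / N} = \lambda_i (1 - p_i)\, e^{-\lambda_i t / N},
  \qquad 0 \le t \le D, \\
L_i(D) = \lambda_i (1 - 2 p_i)
  &\;\Longrightarrow\; 1 - 2 p_i = (1 - p_i)\, e^{-\lambda_i D / N} \\
  &\;\Longrightarrow\; p_i = \frac{1 - e^{-\lambda_i D / N}}{2 - e^{-\lambda_i D / N}} .
\end{align*}
```

Under these boundary conditions $p_i$ approaches 1/2 as $\lambda_i D / N$ grows, so the throughput $\lambda_i (1 - 2p_i)$ of a very aggressive flow is driven toward zero.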

Simulation Comparison: 1 UDP, 32 TCPs

[Figure: throughput vs. arrival rate (log scale from 0.1 to 10); curves: fluid model, CHOKe ns simulation]

Fluid Analysis of Multiple Samples

• With M samples

$L_i(t)\,\delta t - L_i(t+\delta t)\,\delta t = M \lambda_i\,\delta t \cdot L_i(t)\,\delta t / N$

$\Rightarrow\ -\dfrac{dL_i(t)}{dt} = \dfrac{M \lambda_i L_i(t)}{N}$

$L_i(0) = \lambda_i (1 - p_i)^M$

$L_i(D) = \lambda_i (1 - p_i)^M - M \lambda_i p_i$

Comparison: 1 UDP, 2 Samples

[Figure: UDP throughput (Kbps) vs. UDP arrival rate (Mbps, 0 to 2); curves: NS simulation, fluid model]


References

1. A. Demers, S. Keshav and S. Shenker, “Analysis and simulation of a fair queueing algorithm,” Proc. ACM SIGCOMM 1989.

2. S. Floyd and V. Jacobson, “Link-sharing and Resource Management Models for Packet Networks,” IEEE/ACM Trans. on Networking, 1995.

3. R. Braden et al., “Recommendations on queue management and congestion avoidance in the Internet,” IETF RFC (Informational) 2309, April 1998.

4. R. Pan, B. Prabhakar and K. Psounis, “CHOKe: A stateless active queue management scheme for approximating fair bandwidth allocation,” Proc. INFOCOM 2000.


Competitive Analysis: Theory and Applications in Networking


Competitive Analysis in Networking: Outline

• Background

• Incremental construction of Multicast Trees
– The Greedy Strategy

• Routing and Admission Control
– The Exponential Metric

• More Restricted Adversaries
– Adversarial Queueing Theory

Theoretical Analysis; Rules of Thumb; Pragmatic Analysis

Decision Making Under Uncertainty:

Online Algorithms and Competitive Analysis

• Online Algorithm:
– Inputs arrive online (one by one)
– Algorithm must process each input as it arrives
– Lack of knowledge of future arrivals results in inefficiency

• Malicious, All-powerful Adversary:
– Omniscient: monitors the algorithm
– Generates “worst-case” inputs

• Competitive Ratio:
– Worst ratio of the “cost” of online algorithm to the “cost” of optimum algorithm

Warm-up Example: The Unlucky Skier

• Beginning Skier:
– Does not know how many ski trips she will make
– Can rent skis for $40 or buy skis for $400
– Online algorithm: on each successive trip, must decide whether to buy or continue renting

• Adversary: All powerful
– As long as the skier is renting, the adversary will send her on another trip
– As soon as the skier buys, the adversary will stop her ski trips once and for all

The Unlucky Skier [Contd.]

• Buy after K trips
– Cost of the algorithm = K × $40 + $400
– Optimum Cost = min{$400, (K+1) × $40}
– Competitive Ratio: Algorithm’s Cost/Optimum Cost

• Best Strategy: Rent 9 times, buy on 10th trip
– Competitive Ratio = 760/400 ≈ 2
– Best strategy does not always yield best solution

• Ski Principle: Buy after paying enough rent
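The arithmetic on this slide can be checked with a short sketch that evaluates the competitive ratio for every "buy after K rentals" strategy, using the slide's prices and its cost formulas. The function names and the search range are illustrative.

```python
RENT, BUY = 40, 400

def online_cost(k):
    """Rent k times, then buy on trip k + 1 (after which the adversary stops)."""
    return k * RENT + BUY

def optimum_cost(k):
    """Best offline cost for the resulting k + 1 trips."""
    return min(BUY, (k + 1) * RENT)

def competitive_ratio(k):
    return online_cost(k) / optimum_cost(k)

best_k = min(range(50), key=competitive_ratio)
print(best_k, competitive_ratio(best_k))   # 9 rentals, ratio 760/400
```

Neighbouring strategies are slightly worse: renting 8 or 10 times before buying both give a ratio of exactly 2.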

Competitive Analysis: Discussion

• Very Harsh Model
– All powerful adversary

• But..
– Can often still prove good competitive ratios
– Really tough Testing-Ground for Algorithms
– Often leads to good rules of thumb which can be validated by other analyses
– Distribution independent: doesn’t matter whether traffic is heavy-tailed or Poisson or Bernoulli


Incremental Construction of Multicast Trees

• Fixed Multicast Source s
– K receivers arrive one by one
– Must adapt multicast tree to each new arrival without rerouting existing receivers
– Malicious adversary generates bad requests
– Objective: Minimize total size of multicast tree
– Applications: Streaming; Cache updates; …

[Figure: example with source s, nodes a and b, and receiver r1, showing competitive ratio C ≥ 3/2]

Can create worse sequences

Two Classes of Algorithms

• Shortest Path Algorithm
– Each receiver connects using shortest path to source (or to a core)
  • DVMRP [Waitzman, Partridge, Deering ’88]
  • CBT [Ballardie, Francis, Crowcroft ’93]
  • PIM [Deering et al. ’96]

• Greedy Algorithm [Imase and Waxman ’91]
– Each receiver connects to the closest point on the existing tree
– Independently known to the Systems community
  • The “naive” algorithm [Doar and Leslie ’92]
  • End-system multicasting [Faloutsos, Banerjea, Pankaj ’98; Francis ’99]
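The greedy rule can be sketched on an unweighted graph: a breadth-first search from the new receiver stops at the first node already on the tree, and the path found is added. This is an illustration under simplifying assumptions (unit edge costs, the tree tracked only as a set of nodes); the graph and names are hypothetical.

```python
from collections import deque

def greedy_join(graph, tree_nodes, receiver):
    """Attach `receiver` to the nearest node of the tree; returns edges added."""
    if receiver in tree_nodes:
        return 0
    parent = {receiver: None}
    frontier = deque([receiver])
    while frontier:
        node = frontier.popleft()
        if node in tree_nodes:
            # Nearest tree node found: walk back to the receiver, adding the path.
            added = 0
            while parent[node] is not None:
                node = parent[node]
                tree_nodes.add(node)
                added += 1
            return added
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    raise ValueError("receiver cannot reach the tree")

graph = {
    "s": ["a"],
    "a": ["s", "b", "r1"],
    "b": ["a", "r2"],
    "r1": ["a"],
    "r2": ["b"],
}
tree = {"s"}                       # the tree starts as the source alone
total = sum(greedy_join(graph, tree, r) for r in ["r1", "r2"])
print(total, sorted(tree))
```

Here the second receiver attaches at node `a`, which is already on the tree, instead of paying for a full path back to the source.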


Shortest Path Algorithm: Example

• Receivers r1, r2, r3, … , rK join in order

[Figure: source s and receivers r1, r2, r3, …, rK, with a distance labeled N]

Shortest Path Algorithm

• Cost of shortest path tree ≈ K × N

[Figure: source s and receivers r1, r2, r3, …, rK, with a distance labeled N]

Shortest Path Algorithm: Competitive Ratio

• Optimum Cost ≈ K + N
– Competitive Ratio ≈ KN/(K+N)

• If N is large, then competitive ratio ≈ K

[Figure: source s and receivers r1, r2, r3, …, rK]


Greedy Algorithm

• Theorem 1: For the greedy algorithm, competitive ratio = O(log K)

• Theorem 2: No algorithm can achieve a competitive ratio better than log K

[Imase and Waxman ’91]

Greedy algorithm is the optimum strategy

Proof of Theorem 1

[Alon and Azar ’93]

• L = Size of the optimum multicast tree

• $p_i$ = amount paid by online algorithm for $r_i$
– i.e. the increase in size of the greedy multicast tree as a result of adding receiver $r_i$

• Lemma 1: The greedy algorithm pays 2L/j or more for at most j receivers
– Assume the lemma
– Total Cost ≤ 2L (1 + 1/2 + 1/3 + … + 1/K) ≈ 2L log K
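One way to spell out the step from the lemma to the bound, as a sketch: sort the payments in decreasing order, $p_{(1)} \ge p_{(2)} \ge \dots \ge p_{(K)}$ (notation introduced here). The lemma as stated implies that the (j+1)-th largest payment is below 2L/j, and every payment is at most L, because each receiver is within distance L of the source along the optimum tree.

```latex
\[
\sum_{i=1}^{K} p_i
  \;=\; p_{(1)} + \sum_{j=1}^{K-1} p_{(j+1)}
  \;\le\; L + \sum_{j=1}^{K-1} \frac{2L}{j}
  \;=\; L + 2L\,H_{K-1}
  \;=\; O(L \log K),
\]
```

where $H_{K-1} = 1 + 1/2 + \dots + 1/(K-1)$ is the harmonic number, which grows like $\ln K$.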

Proof of Lemma 1

• Suppose there are more than j receivers for which the greedy algorithm paid more than 2L/j
– Let these be r1, r2, … , rm, for m larger than j
– Each of these receivers is at least 2L/j away from each other and from the source

⇒ The shortest tour with all these receivers and the source ≥ (2L/j)m > 2L

⇒ Cost of multicast tree ≥ ½ (cost of tour) > L

Contradiction!

Tours and Trees

[Figure: a tour through s, r1, r2, r3, r4, …, rm]

Each segment ≥ 2L/j
⇒ Tour cost ≥ (2L/j)m > 2L

[Figure: a tree spanning s, r1, r2, r3, r4, …, rm]

Can construct tour from tree by repeating edges at most twice
⇒ Tree cost ≥ ½ (Tour cost) > L


Greedy Algorithm: Recap

• Add new receiver to closest node on existing tree

• Theorem 1: For the greedy algorithm, competitive ratio = O(log K)

• Theorem 2: No algorithm can achieve a competitive ratio better than log K

• Greedy algorithm is the optimum strategy

• Shortest path algorithm can be pretty bad

Objections to the Greedy Algorithm

• Log K is pretty bad

• We don’t care about performance
– Network bandwidth is cheap

• Shortest path performs well in practice
– The example given earlier is pathological

• Greedy algorithm is impractical

• We don’t trust theoreticians
– Theoreticians always hide something

All valid concerns; must be addressed


Log K is Pretty Bad?

• But K is worse !!


Network Bandwidth is Cheap?

• Quantitative Analysis helps

– Difference between shortest path algorithm and greedy algorithm is a factor of K/log K

– Network bandwidth is not that cheap, especially for bandwidth-intensive multicasts

Shortest Path Works Well in Real-life Networks?

• What are “Real-life” networks?
– Internet topology is not completely understood

• Must look at interesting special cases
– Assume receivers chosen at random

1. The network looks like a grid
– Shortest path: Competitive ratio = SquareRoot(K)
– Greedy Algorithm: Competitive ratio = O(1)

2. The network looks like a random graph
– Shortest path: Competitive ratio = O(1)
– Greedy Algorithm: Competitive ratio = O(1)

[Goel and Munagala ’00]


Greedy Algorithm is Impractical?

• Yes, for deployment at lower network layers

• But not if multicast routing occurs at the application layer

• Several systems now implement similar schemes (end-system multicast)
– Qosmic [Faloutsos, Banerjea, Pankaj ’98]
– Yallcast/YOID [Francis ’99]
– …

Theoreticians Hide Things?

• So what did we hide here?
– Greedy algorithm can result in large latency from source to receivers
– Shortest path algorithm can achieve the best possible latency

• Fix: Reroute large latency receivers and some of their ancestors
– Close to optimum latencies
– Tree size close to the greedy tree
– No receiver rerouted more than once

[Goel and Munagala ’00]

Moral

• Rule of thumb for multicast routing:
– Since future is unknown, be greedy in the present

• Meta-morals:
– Competitive analysis can yield valuable clues about algorithm performance
– Caution: Competitive analysis is the beginning, not the end
– Must validate online algorithms in systems setting
– Must often tweak the algorithms


The Exponential Cost Metric

• Consider a resource with capacity C

• Assume that a fraction $\lambda$ of the resource has been consumed

• Exponential cost “rule of thumb”: The cost of the resource is given by $\mu^{\lambda}$ for appropriately chosen $\mu$

• Intuition: Cost increases steeply with $\lambda$
– Bottleneck resources become expensive

[Figure: plot of cost]

Applications of Exponential Costs

• Exponential cost “rule of thumb” applies to
– Online Routing
– Online Call Admission Control
– Stochastic arrivals
– Stale Information
– Power aware routing

The Online Routing Problem

• Connection establishment requests arrive online in a VPN (Virtual Private Network)

• Must assign a route to each connection and reserve bandwidth along that route
– PVCs in ATM networks
– MPLS + RSVP in IP networks

• Oversubscribing is allowed
– Congestion = the worst oversubscribing on a link

• Goal: Assign routes to minimize congestion

• Assume all connections have identical b/w requirement, all links have identical capacity

Online Routing Problem: Example

[Figure: example with source s, nodes a and b, and receiver r1, showing competitive ratio C ≥ 2]

Can create worse sequences

Online Algorithm for Routing

• $\lambda_L$ = Fraction of bandwidth of link L that has already been reserved

• $\mu$ = N, the size of the network

• The Exponential Cost Algorithm:
– Route each incoming connection on current cheapest path from src to dst
– Reserve bandwidth along this path

[Aspnes et al. ’93]
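A minimal sketch of this routing rule, assuming each link's cost is `mu ** load`, where `load` is the fraction of that link already reserved and `mu` is set to the network size. It uses Dijkstra's algorithm on a small directed graph; the topology, the demand of 0.5 per connection, and all names are made up for illustration, and the cited paper's exact cost function may differ in detail.

```python
import heapq

def cheapest_path(links, load, mu, src, dst):
    """Dijkstra where link (u, v) costs mu ** load[(u, v)]."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v in links.get(u, ()):
            nd = d + mu ** load[(u, v)]
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in done:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def route(links, load, mu, src, dst, demand):
    """Route one connection on the cheapest path and reserve bandwidth on it."""
    path = cheapest_path(links, load, mu, src, dst)
    if path is not None:
        for u, v in zip(path, path[1:]):
            load[(u, v)] += demand
    return path

links = {"s": ["a", "b"], "a": ["t"], "b": ["t"]}
load = {(u, v): 0.0 for u in links for v in links[u]}
first = route(links, load, 4, "s", "t", 0.5)
second = route(links, load, 4, "s", "t", 0.5)
print(first, second, load)
```

After the first connection loads one path, that path's links cost 2 each instead of 1, so the second connection is steered onto the other path and the load is balanced.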


Online Algorithm for Routing

• Theorem 1: The exponential cost algorithm achieves a competitive ratio of O(log N) for congestion

• Theorem 2: No algorithm can achieve competitive ratio better than log N in asymmetric networks

This simple strategy is optimum!

Does the idea extend to other problems? To more realistic scenarios?



Online Admission Control and Routing

• Connection establishment requests arrive online

• Must assign a route to each connection and reserve bandwidth along that route

• Oversubscribing is not allowed
– Must perform admission control

• Goal: Admit and route connections to maximize total number of accepted connections (throughput)

Exponential Metric and Admission Control

• When a connection arrives, compute the cheapest path under current exponential costs

• If the cost of the path is less than a threshold, then accept the connection; else reject

[Awerbuch, Azar, Plotkin ’93]

• Theorem: This simple algorithm admits at least an Ω(1/log N) fraction of the calls admitted by the optimum

Objections to Exponential Costs

• Log N is too bad

• Requires permanent connections

• Too inefficient
– Frequent “link-state updates”
– Frequent computation of shortest paths


Assume Stochastic Arrivals

• Connection arrivals are Poisson, durations are memoryless

• Assume fat links (Capacity >> log N)

• Theorem: The exponential cost algorithm results in
1. Near-optimum congestion for routing problem
2. Near-optimum throughput for admission problem

[Kamath, Palmon, Plotkin ’96]

Near-optimum: competitive ratio = (1+ε) for ε close to 0


Versatility of Exponential Costs

• Guarantees of log N for competitive ratio against malicious adversary

• Near-optimum for stochastic arrivals

• Near-optimum given fixed traffic matrix [Young ’95; Garg and Konemann ’98]

No need to know whether there is an adversary, or what the stochastic distribution is, or what the traffic matrix is !!


Exponential Metrics and Stale Information

• Exponential metrics continue to work well if
– Link states are a little stale
– Shortest paths are reused over small intervals rather than recomputed for each connection
– No centralized agent

[Goel, Meyerson, Plotkin ’01]

• Caveat: Still pretty hard to implement



Power Aware Routing

• Consider a group of small mobile nodes, e.g. sensors, which form an ad hoc network
– Bottleneck Resource: Battery
– Goal: Maximize the time until the network partitions

• Assign each mobile node a cost that is exponential in the fraction of its battery consumed
– Send packets over the cheapest path under this cost measure

• O(log n) competitive against an adversary
– Near-optimum for stochastic/fixed traffic
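The cheapest-path rule above can be sketched as a shortest-path search over node costs. This is a minimal illustration, not the authors' implementation: the base `mu`, its value, and the names `node_cost` and `cheapest_path` are assumptions, since the slide does not give the exact cost formula.

```python
import heapq

def node_cost(fraction_consumed, mu=64.0):
    """Cost of relaying through a node; mu is an assumed base, not from the slides."""
    return mu ** fraction_consumed

def cheapest_path(neighbors, consumed, src, dst, mu=64.0):
    """Dijkstra where visiting a node costs node_cost(consumed[node])."""
    best = {src: node_cost(consumed[src], mu)}
    prev = {}
    heap = [(best[src], src)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == dst:
            break
        if cost > best[u]:
            continue  # stale heap entry
        for v in neighbors[u]:
            new_cost = cost + node_cost(consumed[v], mu)
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if dst not in best:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], best[dst]

# Hypothetical topology: a short route through a nearly drained node "a",
# and a longer route through fresh nodes "b" and "c".
neighbors = {"s": ["a", "b"], "a": ["t"], "b": ["c"], "c": ["t"], "t": []}
consumed = {"s": 0.0, "a": 0.9, "b": 0.1, "c": 0.1, "t": 0.0}
path, cost = cheapest_path(neighbors, consumed, "s", "t")
```

With these numbers the longer route through `b` and `c` is chosen, because one nearly drained relay costs far more than two fresh ones.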


Power Aware Routing: Implementation?

• Hard to implement in general
• Consider the Directed Diffusion Model [Intanagonwiwat, Govindan, Estrin ’00]
– Receiver floods network with interest for desired data
– Interest reaches the source
– Source sends data over multiple paths
– Receiver reinforces the “best” path

• Just send accumulated sum of exponential costs along with the data
– Receiver reinforces the path with the least cost
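A rough sketch of the last idea: each data packet carries a running sum that every relay increments by its own exponential cost, and the receiver reinforces the path whose copy arrived with the smallest sum. The packet layout, the base `mu`, and all function names here are hypothetical and are not part of the Directed Diffusion protocol itself.

```python
def forward(packet, node, consumed, mu=64.0):
    """A relay adds its own exponential cost (assumed base mu) to the running sum."""
    packet["cost"] += mu ** consumed[node]
    packet["path"].append(node)
    return packet

def send_along(path, consumed, mu=64.0):
    """Simulate one copy of the data travelling along a given path."""
    packet = {"cost": 0.0, "path": []}
    for node in path:
        forward(packet, node, consumed, mu)
    return packet

def path_to_reinforce(received):
    """Receiver picks the copy with the least accumulated cost."""
    return min(received, key=lambda p: p["cost"])["path"]

# Hypothetical battery levels (fraction consumed) and two candidate paths.
consumed = {"s": 0.2, "a": 0.9, "b": 0.1, "c": 0.1, "r": 0.0}
received = [
    send_along(["s", "a", "r"], consumed),
    send_along(["s", "b", "c", "r"], consumed),
]
chosen = path_to_reinforce(received)
```

No node needs a global view: the only extra state is one number per packet.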


Competitive Analysis in Networking: Outline

• Background
• Incremental construction of Multicast Trees
– The Greedy Strategy

• Routing and Admission Control
– The Exponential Metric

• More Restricted Adversaries
– Adversarial Queueing Theory


Adversarial Queueing Theory: Motivation

• Malicious, all-knowing adversary
– Injects packets into the network
– Each packet must travel over a specified route

• Suppose adversary injects 3 packets per second from s to r
– Link capacities are one packet per second
– No matter what we do, we will have unbounded queues and unbounded delays
– Need to temper our definition of adversaries

[Figure: nodes s and r]
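The arithmetic behind the motivating example can be checked with a few lines of simulation. This is only an illustration of the slide's numbers (3 packets per second injected, 1 packet per second served); the function name and the one-second time step are assumptions.

```python
def backlog_after(seconds, inject_rate=3, capacity=1):
    """Queue length at the end of each second on a single s-to-r link."""
    queue = 0
    history = []
    for _ in range(seconds):
        queue += inject_rate           # adversary injects
        queue -= min(queue, capacity)  # link forwards what it can
        history.append(queue)
    return history

history = backlog_after(5)
```

The backlog grows by two packets every second regardless of the scheduling policy, which is why the adversary has to be restricted before stability can be discussed.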


Adversarial Queueing Theory: Bounded Adversaries

• Given a window size W, and a rate r < 1
– For any link L, and during any interval of duration T > W, the adversary can inject at most rT packets which have link L in their route

• Adversary can’t set an impossible task!
– More gentle than competitive analysis

• Will study packet scheduling strategies
– Which packet to forward if more than one packet is waiting to cross a link?
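The bounded-adversary condition can be made concrete by checking a finite injection trace against it. The sketch below assumes discrete time steps and a trace given as `(time, route)` pairs; the function name, the trace format, and the `horizon` parameter are illustrative choices, not part of the model's definition.

```python
from collections import defaultdict

def is_bounded(injections, horizon, window, rate):
    """True if, for every link and every interval of duration T > window
    within [0, horizon), at most rate * T injected packets use that link."""
    per_link = defaultdict(lambda: [0] * horizon)
    for t, route in injections:
        for link in set(route):
            per_link[link][t] += 1
    for counts in per_link.values():
        prefix = [0]
        for c in counts:
            prefix.append(prefix[-1] + c)
        for start in range(horizon):
            # interval [start, end) has duration end - start > window
            for end in range(start + window + 1, horizon + 1):
                if prefix[end] - prefix[start] > rate * (end - start):
                    return False
    return True

# One packet every other step over link "e": admissible for W = 3, r = 0.75.
gentle = [(t, ["e"]) for t in range(0, 10, 2)]
# Three packets per step over link "e": the impossible task from the motivation.
burst = [(t, ["e"]) for t in range(10) for _ in range(3)]

gentle_ok = is_bounded(gentle, horizon=10, window=3, rate=0.75)
burst_ok = is_bounded(burst, horizon=10, window=3, rate=0.75)
```

Short bursts are still allowed, since only intervals longer than W are constrained.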


Some Interesting Scheduling Policies

• FIFO: First In First Out
• LIFO: Last In First Out
• NTG: Nearest To Go
– Forward a packet which is closest to its destination

• FTG: Furthest To Go
– Forward a packet which is furthest from its destination

• LIS: Longest In System
– Forward the packet that was injected earliest
– Global FIFO

• SIS: Shortest In System
– Forward the packet that was injected last
– Global LIFO
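Each of these policies amounts to a different sort key over the packets waiting at a link. The sketch below is one way to express that; the `Packet` fields are hypothetical, and measuring distance to the destination in remaining hops is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    name: str
    injected_at: int     # time the packet entered the network
    queued_at: int       # time it joined this link's queue
    hops_remaining: int  # links left on its route (assumed distance measure)

# The packet with the smallest key is forwarded next.
POLICIES = {
    "FIFO": lambda p: p.queued_at,
    "LIFO": lambda p: -p.queued_at,
    "NTG": lambda p: p.hops_remaining,
    "FTG": lambda p: -p.hops_remaining,
    "LIS": lambda p: p.injected_at,
    "SIS": lambda p: -p.injected_at,
}

def next_packet(queue, policy):
    """Pick which waiting packet crosses the link under the given policy."""
    return min(queue, key=POLICIES[policy])

queue = [
    Packet("a", injected_at=0, queued_at=5, hops_remaining=1),
    Packet("b", injected_at=3, queued_at=2, hops_remaining=4),
    Packet("c", injected_at=1, queued_at=7, hops_remaining=2),
]
choices = {name: next_packet(queue, name).name for name in POLICIES}
```

FIFO and LIFO look only at local arrival time at the queue, while LIS and SIS use the global injection time, which is why the slides call them "Global FIFO" and "Global LIFO".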


Stability in the Adversarial Model

• Consider a scheduling policy (e.g. FIFO, LIFO, etc.)

• The policy is universally stable if, for all networks and all “bounded adversaries”, the packet delays and queue sizes remain bounded [Borodin et al. ’96]

• FIFO, LIFO, NTG are not universally stable

• LIS, SIS, FTG are universally stable [Andrews et al. ’96]


Adversarial Queueing Model: Routing Using the Exponential Cost Metric

• Adversary injects packets into the network but gives only the src, dst
– The correct routes are hidden

• Need to compute routes
– Again, use the exponential cost metric
– Reset the cost periodically to zero
– Use any stable scheduling policy

• Theorem: The combined routing and scheduling policy is universally stable [Andrews et al. ’01]
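A loose sketch of the routing half of this recipe follows. It is not the algorithm of Andrews et al.: the link cost `mu ** (load / capacity) - 1`, the reset after a fixed number of routed packets, and every name and parameter value are assumptions made only to show the shape of "exponential cost plus periodic reset".

```python
import heapq

class ExponentialRouter:
    def __init__(self, links, capacity=4, mu=16.0, reset_every=100):
        self.links = links          # node -> list of next-hop nodes
        self.capacity = capacity    # assumed per-link capacity
        self.mu = mu                # assumed base of the exponential
        self.reset_every = reset_every
        self.load = {}              # (u, v) -> packets routed since last reset
        self.routed = 0

    def cost(self, u, v):
        # Assumed normalization: an idle link costs zero.
        return self.mu ** (self.load.get((u, v), 0) / self.capacity) - 1

    def route(self, src, dst):
        if self.routed and self.routed % self.reset_every == 0:
            self.load.clear()       # periodic reset of all costs to zero
        self.routed += 1
        best, prev, heap = {src: 0.0}, {}, [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == dst:
                break
            if d > best[u]:
                continue
            for v in self.links.get(u, []):
                nd = d + self.cost(u, v)
                if nd < best.get(v, float("inf")):
                    best[v], prev[v] = nd, u
                    heapq.heappush(heap, (nd, v))
        if dst not in best:
            return None
        path = [dst]
        while path[-1] != src:
            path.append(prev[path[-1]])
        path.reverse()
        for u, v in zip(path, path[1:]):
            self.load[(u, v)] = self.load.get((u, v), 0) + 1
        return path

links = {"s": ["a", "b"], "a": ["t"], "b": ["t"]}
router = ExponentialRouter(links)
first, second = router.route("s", "t"), router.route("s", "t")
```

Successive packets spread across the two parallel paths because the path just used has become more expensive. The chosen routes would then be handed to any stable scheduling policy, such as LIS.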


Summary

• Competitive analysis models decision making under uncertainty
– Applicable to a wide range of networking problems

• General rules of thumb
– Greedy algorithm for multicasting
– Exponential cost metric for online routing, admission control, stochastic injections, power-aware routing

• Adversarial Queueing Theory
– Bounded adversaries
– FIFO unstable; LIS stable
– Exponential metrics result in stable routing


References

1. N. Alon and Y. Azar. On-line Steiner trees in the Euclidean plane. Discrete and Computational Geometry, 10(2), 113-121, 1993.

2. M. Andrews, B. Awerbuch, A. Fernandez, J. Kleinberg, T. Leighton, and Z. Liu. Universal stability results for greedy contention-resolution protocols. Proceedings of the 37th IEEE Conference on Foundations of Computer Science, 1996.

3. M. Andrews, A. Fernandez, A. Goel, and L. Zhang. Source Routing and Scheduling in Packet Networks. To appear in the proceedings of the 42nd IEEE Foundations of Computer Science, 2001.

4. J. Aspnes, Y. Azar, A. Fiat, S. Plotkin, and O. Waarts. On-line load balancing with applications to machine scheduling and virtual circuit routing. Proceedings of the 25th ACM Symposium on Theory of Computing, 1993.

5. B. Awerbuch, Y. Azar, and S. Plotkin. Throughput competitive online routing. Proceedings of the 34th IEEE symposium on Foundations of Computer Science, 1993.

6. A. Ballardie, P. Francis, and J. Crowcroft. Core Based Trees (CBT): An architecture for scalable inter-domain multicast routing. Proceedings of the ACM SIGCOMM, 1993.


References [Contd.]

7. A. Borodin, J. Kleinberg, P. Raghavan, M. Sudan, and D. Williamson. Adversarial queueing theory. Proceedings of the 28th ACM Symposium on Theory of Computing, 1996.

8. S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C. Liu, and L. Wei. The PIM architecture for wide-area multicast routing. IEEE/ACM Transactions on Networking, 4(2), 153-162, 1996.

9. M. Doar and I. Leslie. How bad is Naïve Multicast Routing? IEEE INFOCOM, 82-89, 1992.

10. M. Faloutsos, A. Banerjea, and R. Pankaj. QoSMIC: quality of service sensitive multicast Internet protocol. Computer Communication Review, 28(4), 144-53, 1998.

11. P. Francis. Yoid: Extending the Internet Multicast Architecture. Unrefereed report, http://www.isi.edu/div7/yoid/docs/index.html

12. N. Garg and J. Konemann. Faster and simpler algorithms for multicommodity flow and other fractional packing problems. Proceedings of the 39th IEEE Foundations of Computer Science, 1998.


References [Contd.]

13. A. Goel, A. Meyerson, and S. Plotkin. Distributed Admission Control, Scheduling, and Routing with Stale Information. Proceedings of the 12th ACM-SIAM Symposium on Discrete Algorithms, 2001.

14. A. Goel and K. Munagala. Extending Greedy Multicast Routing to Delay Sensitive Applications. Short abstract in proceedings of the 11th ACM-SIAM Symposium on Discrete Algorithms, 2000. Long version to appear in Algorithmica.

15. M. Imase and B. Waxman. Dynamic Steiner tree problem. SIAM J. Discrete Math., 4(3), 369-384, 1991.

16. C. Intanagonwiwat, R. Govindan, and D. Estrin. Directed diffusion: A scalable and robust communication paradigm for sensor networks. Proceedings of the 6th Annual International Conference on Mobile Computing and Networking (MobiCOM), 2000.

17. A. Kamath, O. Palmon, and S. Plotkin. Routing and admission control in general topology networks with Poisson arrivals. Proceedings of the 7th ACM-SIAM Symposium on Discrete Algorithms, 1996.

18. D. Waitzman, C. Partridge, and S. Deering. Distance Vector Multicast Routing Protocol. Internet RFC 1075, 1988.

19. N. Young. Randomized rounding without solving the linear program. Proceedings of the 6th ACM-SIAM Symposium on Discrete Algorithms, 1995.