Max Weight Learning Algorithms with Application to Scheduling in Unknown Environments

Michael J. Neely, University of Southern California
http://www-rcf.usc.edu/~mjneely
Information Theory and Applications Workshop (ITA), UCSD, Feb. 2009
*Sponsored in part by the DARPA IT-MANET Program, NSF OCE-0520324, NSF Career CCF-0747525
Pr(success1, …, successn) = ??
• Slotted system, slots t in {0, 1, 2, …}
• Network queues: Q(t) = (Q1(t), …, QL(t))
• 2-stage control decision every slot t:
  1) Stage 1 decision: k(t) in {1, 2, …, K}.
     Reveals random vector w(t) (i.i.d. given k(t)); w(t) has unknown distribution Fk(w).
  2) Stage 2 decision: I(t) in I (a possibly infinite set).
     Affects queue rates: A(k(t), w(t), I(t)), m(k(t), w(t), I(t)).
     Incurs a "penalty vector" x(t) = x(k(t), w(t), I(t)).
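For concreteness, one slot of the queue dynamics can be sketched as follows. The max[·, 0]-plus-arrivals form is assumed here; it is the standard model in this line of work but is not spelled out on the slide.

```python
# One-slot queue update, assuming the standard dynamics
# Q_l(t+1) = max[Q_l(t) - m_l(t), 0] + A_l(t),
# where A and mu are the arrival and service vectors produced by
# the 2-stage decision (k(t), I(t)) and the revealed w(t).
def queue_update(Q, A, mu):
    return [max(q - m, 0) + a for q, a, m in zip(Q, A, mu)]
```

Arrivals are added after service is taken, so a packet arriving in slot t cannot be served until slot t+1.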
Stage 1: k(t) in {1, …, K}. Reveals random w(t).
Stage 2: I(t) in I. Incurs penalties x(k(t), w(t), I(t)). Also affects queue dynamics A(k(t), w(t), I(t)), m(k(t), w(t), I(t)).
Goal: Choose stage-1 and stage-2 decisions over time so that the time-average penalty vector x̄ solves:

Minimize: f(x̄)
Subject to: hn(x̄) ≤ bn for each n, and all queues stable

where f(x) and the hn(x) are general convex functions of multiple variables.
Motivating Example 1: Min Power Scheduling with Channel Measurement Costs
[Figure: L queues with arrival processes A1(t), …, AL(t) and time-varying channel states S1(t), …, SL(t).]
If channel states are known every slot, we can schedule without knowing channel statistics or arrival rates! (EECA --- Neely 2005, 2006) (Georgiadis, Neely, Tassiulas F&T 2006)

Minimize avg. power subject to queue stability.
Motivating Example 1: Min Power Scheduling with Channel Measurement Costs
If there is a "cost" to measuring, we make a 2-stage decision:
Stage 1: Measure or not? (reveals channels w(t))
Stage 2: Transmit over a known channel? A blind channel?

Existing solutions require a-priori knowledge of the full joint channel-state distribution! (2^L entries for on/off channels; 1024^L for 1024-state channels?)
- Li and Neely (07)
- Gopalan, Caramanis, Shakkottai (07)
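The parenthetical 2L / 1024L remark above is naturally read as 2^L and 1024^L: the number of entries in a joint channel-state distribution over L links. A tiny sketch of that count (the helper name is hypothetical):

```python
# Entries in the joint channel-state distribution over L links,
# each link taking `states_per_link` values (2 for on/off channels).
def joint_state_count(L, states_per_link=2):
    return states_per_link ** L
```

Even for modest L, storing or estimating this table a-priori is the burden the talk's learning approach avoids.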
Minimize avg. power subject to queue stability.
Motivating Example 2: Diversity Backpressure Routing (DIVBAR)
[Figure: a node broadcasts a packet toward neighbors 1, 2, 3; some receptions are in error.]
Networking with lossy channels & multi-receiver diversity (DIVBAR):
Stage 1: Choose a commodity and transmit.
Stage 2: Get success feedback, choose the next hop.

If there is a single commodity (no stage-1 decision), we do not need success probabilities! If there are two or more commodities, we need the full joint success probability distribution over all neighbors!
[Neely, Urgaonkar 2006, 2008]
Stage 1: k(t) in {1, …, K}. Reveals random w(t).
Stage 2: I(t) in I. Incurs penalties x(k(t), w(t), I(t)). Also affects queue dynamics A(k(t), w(t), I(t)), m(k(t), w(t), I(t)).
Goal: Minimize f(x̄), subject to hn(x̄) ≤ bn for each n, and all queues stable.

Equivalent to: Minimize the time average of f(g(t)), subject to: the time average of hn(g(t)) ≤ bn for each n, ḡ = x̄, and all queues stable,

where g(t) is an auxiliary vector that acts as a proxy for x(t).
Stage 1: k(t) in {1, …, K}. Reveals random w(t).
Stage 2: I(t) in I. Incurs penalties x(k(t), w(t), I(t)). Also affects queue dynamics A(k(t), w(t), I(t)), m(k(t), w(t), I(t)).

Equivalent Goal:
Technique: Form a virtual queue for each constraint.

For each constraint on hn(g):  Un(t+1) = max[Un(t) + hn(g(t)) – bn, 0]
For each constraint ḡm = x̄m:  Zm(t+1) = Zm(t) – gm(t) + xm(t)   (possibly negative)
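The two updates above can be sketched directly (a minimal sketch; representing the queue states as lists of floats is an assumption of this illustration):

```python
# Virtual-queue updates from the slide:
#   U_n(t+1) = max[U_n(t) + h_n(g(t)) - b_n, 0]   (enforces avg h_n(g) <= b_n)
#   Z_m(t+1) = Z_m(t) - g_m(t) + x_m(t)           (enforces avg g_m = avg x_m)
def update_virtual_queues(U, Z, h_vals, b, g, x):
    U_next = [max(u + h - bn, 0.0) for u, h, bn in zip(U, h_vals, b)]
    Z_next = [z - gm + xm for z, gm, xm in zip(Z, g, x)]  # may go negative
    return U_next, Z_next
```

Keeping U and Z stable forces the corresponding time-average constraints to hold, which is the point of the virtual-queue technique.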
Use the stochastic Lyapunov optimization technique: [Neely 2003], [Georgiadis, Neely, Tassiulas F&T 2006]

Define: Q(t) = all queue states = [Q(t), Z(t), U(t)]
Define: L(Q(t)) = (1/2) [sum of squared queue sizes]
Define: D(Q(t)) = E{ L(Q(t+1)) – L(Q(t)) | Q(t) }

Schedule using the modified "max-weight" rule: every slot t, observe the queue states and make a 2-stage decision to minimize the "drift plus penalty":

Minimize: D(Q(t)) + V f(g(t))

where V is a constant control parameter that affects proximity to optimality (and a delay tradeoff).
How to (try to) minimize D(Q(t)) + V f(g(t)):

The proxy variables g(t) appear separably, and their terms can be minimized without knowing the system stochastics!

Minimize: V f(g(t)) + Σn Un(t) hn(g(t)) – Σm Zm(t) gm(t)
Subject to: g(t) in the (bounded) set of feasible penalty vectors

[Zm(t) and Un(t) are known queue backlogs for slot t]
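A minimal one-dimensional sketch of the proxy subproblem, assuming a scalar g on an interval; f and the hn stand in for the talk's convex functions, and grid search stands in for whatever convex solver one would actually use:

```python
# Choose g in [g_min, g_max] to minimize
#   V*f(g) + sum_n U[n]*h_n(g) - Z*g,
# the per-slot proxy term from the drift-plus-penalty expression.
# Grid search is only for illustration; any 1-D convex minimizer works.
def choose_proxy(V, f, U, h_list, Z, g_min, g_max, grid=1000):
    best_g, best_val = None, float("inf")
    for i in range(grid + 1):
        g = g_min + (g_max - g_min) * i / grid
        val = V * f(g) + sum(u * h(g) for u, h in zip(U, h_list)) - Z * g
        if val < best_val:
            best_g, best_val = g, val
    return best_g
```

Note the subproblem needs only the current backlogs U and Z, not any distribution Fk(w), which is exactly the "without knowing system stochastics" claim above.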
Minimizing the remaining terms:

Minimize: D(Q(t)) + V f(g(t))

Solution: Define g(mw)(t), I(mw)(t), k(mw)(t) as the ideal max-weight decisions (minimizing the drift expression).

Define ek(t) as the expected stage-2 max-weight value when stage-1 option k is chosen:
ek(t) = E{ min{I in I} Yk(w(t), I, Q(t)) | k(t) = k }

k(mw)(t) = argmin{k in {1, …, K}} ek(t)           (Stage 1)
I(mw)(t) = argmin{I in I} Yk(t)(w(t), I, Q(t))    (Stage 2)
g(mw)(t) = solution to the proxy problem

Then the ideal decisions minimize the drift-plus-penalty bound, but ek(t) depends on the unknown distributions Fk(w), so it must be approximated.
Approximation Theorem (related to Neely 2003, G-N-T F&T 2006):

Suppose the actual decisions approximate the ideal max-weight minimizations to within additive errors eQ and eZ, where emax and s are slackness parameters related to the constraints.
Then:
- All constraints are satisfied.
- Average queue sizes < [B + C + c0V] / min[emax – eQ, s – eZ]
- Penalty satisfies: f(x̄) < f*optimal + O(max[eQ, eZ]) + (B+C)/V
It all hinges on our approximation of ek(t):

Declare a "type k exploration event" independently with probability q > 0 (small). We must use k(t) = k on these slots.

Approach 1: {w1(k)(t), …, wW(k)(t)} = samples of w over the past W type-k exploration events.
It all hinges on our approximation of ek(t):

Declare a "type k exploration event" independently with probability q > 0 (small). We must use k(t) = k on these slots.

Approach 2: {w1(k)(t), …, wW(k)(t)} = samples over the past W type-k exploration events; {Q1(k)(t), …, QW(k)(t)} = queue backlogs at these sample times.
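A sketch of the Approach-2 estimator, under the assumption (consistent with the stored samples above, though not written out on the slide) that ek(t) is approximated by averaging the optimized stage-2 weight over the W stored (sample, backlog) pairs; Y_k and I_options are placeholders:

```python
# Empirical estimate of e_k(t): average min_I Y_k(w_i, I, Q_i) over the
# last W type-k exploration samples w_i and the backlogs Q_i recorded
# at those sample times.
def estimate_ek(samples, backlogs, Y_k, I_options):
    vals = [min(Y_k(w, I, Q) for I in I_options)
            for w, Q in zip(samples, backlogs)]
    return sum(vals) / len(vals)
```

Using backlogs recorded at the sample times (rather than the current Q(t)) is what distinguishes this from a naive plug-in estimate, and is what the delayed-queue analysis below has to handle.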
Analysis (Approach 2):

Subtleties:
1) An "inspection paradox" issue requires using samples taken at exploration events, so that {w1(k)(t), …, wW(k)(t)} are i.i.d.
2) Even so, {w1(k)(t), …, wW(k)(t)} are correlated with the queue backlogs at time t, and so we cannot directly apply the Law of Large Numbers!
Analysis (Approach 2):

Use a "delayed queue" analysis:

[Figure: timeline with samples w1(t), w2(t), w3(t), …, wW(t) taken after time tstart; queue backlogs change by at most a constant over the window, so the LLN can be applied.]
Max-Weight Learning Algorithm (Approach 2):
(No knowledge of probability distributions is required!)

- Have random exploration events (prob. q).
- Choose the stage-1 decision k(t) = argmin{k in {1, …, K}} [ek(t)].
- Use I(mw)(t) for the stage-2 decision: I(mw)(t) = argmin{I in I} Yk(t)(w(t), I, Q(t)).
- Use g(mw)(t) for the proxy variables.
- Update the virtual queues and the moving averages.
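The steps above can be sketched as one slot of control flow; every argument here (the ek estimates, the stage-2 rule, the proxy solver, the update hook) is a placeholder for the corresponding subproblem on the earlier slides, not the talk's actual functions:

```python
import random

# One slot of the Max-Weight Learning loop: with probability q force an
# exploration choice of k(t); otherwise pick the stage-1 option with the
# smallest estimated e_k(t), then apply the stage-2 and proxy rules and
# update all (virtual) queues and moving averages.
def mwl_slot(q, K, ek_estimates, stage2_rule, proxy_rule, update_state):
    if random.random() < q:                      # exploration event
        k = random.randrange(K)                  # forced stage-1 choice
    else:
        k = min(range(K), key=lambda j: ek_estimates[j])
    I = stage2_rule(k)                           # max-weight stage-2 decision
    g = proxy_rule()                             # proxy-variable subproblem
    update_state(k, I, g)                        # queues + moving averages
    return k, I, g
```

With q = 0 the loop reduces to pure exploitation of the current estimates; the small q > 0 keeps the sample windows fresh.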
Theorem (Fixed W, V): With window size W we have:
- All constraints are satisfied.
- Average queue sizes < [B + C + c0V] / min[emax – eQ, s – eZ]
- Penalty satisfies: f(x̄) < f*q + O(1/sqrt{W}) + (B+C)/V
Concluding Theorem (Variable W, V): Let 0 < b1 < b2 < 1.

Define V(t) = (t+1)^b1 and W(t) = (t+1)^b2.

Then under the Max-Weight Learning Algorithm:
- All constraints are satisfied.
- All queues are mean rate stable*.
- The average penalty achieves exact optimality (subject to the random exploration events): f(x̄) = f*q

*Mean rate stability does not imply finite average congestion and delay. In fact, average congestion and delay are necessarily infinite when exact optimality is reached.
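The growing parameter schedules of the concluding theorem, as a tiny sketch (the exponent values are example choices satisfying 0 < b1 < b2 < 1, not values from the talk):

```python
# Time-varying control parameters: V(t) = (t+1)^b1, W(t) = (t+1)^b2.
# Growing V drives the penalty to exact optimality; growing W drives the
# estimation error O(1/sqrt(W)) to zero.
def schedules(t, b1=0.5, b2=0.8):
    return (t + 1) ** b1, (t + 1) ** b2
```

Both parameters diverge, which is why congestion and delay become infinite exactly when optimality is reached.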