1
Designing Autonomic Wireless Multi-hop Networks for Delay-Sensitive Applications
Peter Hsien-Po Shiang
Advisor: Prof. Mihaela van der Schaar
Electrical Engineering, UCLA
2
Delay-sensitive applications are booming!
Examples of delay-sensitive applications
1) Hard delay constraints
2) Prioritized multimedia traffic (graceful degradation desired)
Video telephony, surveillance, live audio, live video, vehicular communications, battlefield sensing, games, video conferencing
3
Overall goal
Building efficient multi-hop networks for delay-sensitive applications
- Autonomic decision making framework
• Gather local information
• Learn
• Make decisions and interact
[Figure: an autonomic node (agent) with an information gathering phase, a learning phase, and a decision making phase. Its inputs are the channel condition, the application layer traffic requirements, and the network environment; it exchanges control info. and data with other nodes.]
4
Autonomic network scenarios
Network scenario: Multimedia transmission over wireless mesh network
  Dynamics: Source traffic, channel condition
  Local information: Source rate, transmission rate, packet error rate
Network scenario: Distributed resource management over cognitive radio networks
  Dynamics: Resource availability (spectrum holes)
  Local information: Primary users' loading, other secondary users' actions
Network scenario: Power control over ad hoc mobile networks
  Dynamics: Interference coupling among transmitter-receiver pairs
  Local information: SINR
[Figures: a wireless mesh network with sources S1, S2, relays r1 to r5, and destinations D1, D2; a cognitive radio network with primary users PU1 to PU4, frequency channels F1 to F4, and secondary users SU1, SU2, ..., SUN; an ad hoc network.]
5
Overview
I. Multimedia transmission over mesh networks
II. Exploiting information over space – information horizon
III. Exploiting information over time – learning
IV. Conclusions
[Figure: agent cycle with 1. information gathering phase (control info., multimedia characteristics), 2. learning phase, and 3. decision phase (data packets).]
6
Focus: multiple multimedia applications over multi-hop wireless networks
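Nodes: $\mathcal{M} = \{m\}$
Edges: $\mathcal{E} = \{l = (m, m')\}$
Applications: $\mathcal{V} = \{V_i\}$
Classes: $V_i = \{C_k, k = 1, \dots, K_i\}$
Actions: $\mathbf{A} = \{A_m, \forall m \in \mathcal{M}\}$
Utility: $U_i(\mathbf{A})$
Goal: Maximizing overall efficiency, $\max_{\mathbf{A}} \sum_{V_i \in \mathcal{V}} U_i(\mathbf{A})$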
7
Limitations of prior work (1/2)
Centralized optimization for multimedia transmission
• Wu and Chou (2005)
• Setton, Yoo, Zhu, Goldsmith, and Girod. (2005)
• Jurca, Frossard (2007)
• Andreopoulos, Mastronarde, and Van der Schaar (2006,2007)
$\max_{\mathbf{A}} \sum_{V_i \in \mathcal{V}} U_i(R_i(\mathbf{A}))$, where the utility is a function of the resource, e.g. throughput
s.t. $\mathbf{R}(\mathbf{A}) \subseteq CR$ (capacity constraint)

                       Traditional sol.   Proposed sol.
Decision maker         Centralized        Distributed
Complexity             High               Low
Adaptation ability     Slow               Fast
Information overhead   High               Low
8
Limitations of prior decentralized work (2/2)
Flow-based optimized routing using queue information feedback
• Awerbuch and Leighton (1994)
• Neely, Modiano, Rohrs (2005)
• Gupta, Javidi (2007)
• Gupta, Lin, Srikant (2007)
Flow-based optimized routing using link state information
• Wei, Zakhor (2002)
• Draves, Padhye, Zill (2004)
                      Traditional sol.   Proposed sol.
Resource allocation   Predetermined      Online adaptation
Application model     Flow-based         Packet-based
Delay constraint      Implicit           Explicit
9
Challenges
Heterogeneous characteristics of delay-sensitive applications: different priorities, hard delay deadlines, and loss tolerance
Time-varying transmission environment: dynamic network conditions
Informationally-decentralized environment: cost of information gathering
Coupling among agents' actions and utilities
10
Required solution features for multimedia transmission
Fully distributed optimization that determines the actions $A_m$ at each node, e.g. how to relay
Dynamic adaptation to the changing network/source conditions at each node and to the coupling between nodes' actions
From rate-constrained flow-based to explicit delay-constrained packet-based optimization
11
Delay-sensitive application quality model
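Quality at the sources:
$Q^{trans} = \sum_{V_i \in \mathcal{V}} \sum_{k=1}^{K_i} \lambda_k R_k$
Quality at the destinations:
$Q^{receive} = \sum_{V_i \in \mathcal{V}} \sum_{k=1}^{K_i} \lambda_k R_k (1 - P_k(\mathbf{A}))$
[Figure: over one GOP transmission time, the multimedia packets of source nodes $V_1$ and $V_2$ are divided into classes $C_1, \dots, C_4$, where class $C_k$ is characterized by $(\lambda_k, D_k, L_k)$; the packets are forwarded by relay nodes $m_1, \dots, m_4$.]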
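The two quality expressions on this slide can be evaluated with a few lines of Python. This is a minimal sketch, not the author's code: it assumes that $\lambda_k$ is a per-class quality weight, $R_k$ the rate of class $k$, and $P_k(\mathbf{A})$ the end-to-end loss probability of class $k$ under the chosen strategy. The class name `VideoClass` and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VideoClass:
    quality_weight: float  # lambda_k (assumed: quality gain per unit of rate)
    rate: float            # R_k (assumed: rate of class k)
    loss_prob: float       # P_k(A) (assumed: loss probability under strategy A)


def transmitted_quality(classes):
    """Q^trans: sum of lambda_k * R_k over all classes of all applications."""
    return sum(c.quality_weight * c.rate for c in classes)


def received_quality(classes):
    """Q^receive: each class contributes only its delivered fraction."""
    return sum(c.quality_weight * c.rate * (1.0 - c.loss_prob) for c in classes)


# Hypothetical numbers, chosen only to illustrate graceful degradation:
# the most important class is protected, the least important one loses half.
example = [
    VideoClass(quality_weight=4.0, rate=100.0, loss_prob=0.0),
    VideoClass(quality_weight=2.0, rate=100.0, loss_prob=0.1),
    VideoClass(quality_weight=1.0, rate=100.0, loss_prob=0.5),
]
```

Because the weights differ per class, losing packets of a low-weight class costs less received quality than losing the same amount of a high-weight class, which is why the later slides schedule by priority.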
12
Cross-layer transmission strategy (action $A_m$)
Application: scheduling
Network: relay selection
Data Link: retransmission limit
Physical: MCS selection
A priority queuing model provides a unified framework to analyze
• Packet scheduling
• Relay selection, which impacts the packet arrival process
• ARQ, which can be modeled as a geometric service time distribution
• MCS, which provides different physical transmission rates and packet error rates
Per-packet decisions with explicit delay constraints
13
Elementary structure (2-hop case)
V users with distinct sources and destinations.
M intermediate nodes.
Information feedback from all the nodes of the next hop – fully distributed.
14
Decisions at the PHY and MAC layer
We show that, in the delay-constrained optimization, it is optimal to transmit the most important packet first with an infinite retransmission limit:
$\gamma^{MAX}_{k,m_h,m_{h+1}} \to \infty$
Knowing the next relay, the optimal modulation and coding scheme is
$\hat{\theta}_{k,m_h,m_{h+1}} = \arg\max_{\theta} T^{eff}_{k,m_h,m_{h+1}} = \arg\max_{\theta} \left(1 - p_{k,m_h,m_{h+1}}(\theta, L_k)\right) T_{m_h,m_{h+1}}(\theta)$
$\gamma^{MAX}_{k,m_h,m_{h+1}} = \frac{T^{eff}_{k,m_h,m_{h+1}} \left(D_k - Delay^{PASS}_{k,m_h}\right)}{L_k} - 1$
Cascade the elementary structure to a multi-hop network
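The PHY/MAC decisions above can be sketched as follows. This is an illustration under stated assumptions, not the method from the slides: the packet error rate is modeled from a hypothetical per-MCS bit error rate with independent bit errors, the MCS table is invented, and the retransmission limit is rounded down to an integer number of attempts that fit in the remaining delay budget, minus one.

```python
import math


def select_mcs(mcs_table, packet_bits):
    """Pick the MCS that maximizes effective rate = rate * (1 - PER).

    mcs_table: list of (name, rate_bps, bit_error_rate) tuples (hypothetical).
    PER is modeled as 1 - (1 - BER)^packet_bits, an assumption of this sketch.
    """
    best = None
    for name, rate_bps, bit_error_rate in mcs_table:
        per = 1.0 - (1.0 - bit_error_rate) ** packet_bits
        eff = rate_bps * (1.0 - per)
        if best is None or eff > best[1]:
            best = (name, eff)
    return best


def max_retransmissions(eff_rate_bps, deadline_s, delay_passed_s, packet_bits):
    """Retransmissions that still fit before the deadline (assumed integer floor)."""
    budget = deadline_s - delay_passed_s
    if budget <= 0:
        return 0  # the deadline has already passed; no attempt is useful
    attempts = math.floor(eff_rate_bps * budget / packet_bits)
    return max(attempts - 1, 0)
```

A faster MCS is only preferred when its higher nominal rate is not cancelled out by a higher packet error rate, which is the trade-off the effective rate captures.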
15
Overlay multi-hop network structure
Directed acyclic multi-hop network
Can be applied to any physical network as an overlay network
[Figure: packets of classes $C_1, \dots, C_K$ are forwarded hop by hop, with information feedback between hops.]
16
Decentralized optimization
Centralized cross-layer optimization:
$\mathbf{A}^{opt} = \arg\max_{\mathbf{A}} \sum_{k=1}^{K} \lambda_k R_k (1 - P_k(\mathbf{A}))$
s.t. $Delay_k(\mathbf{A}) \le D_k, \; k = 1, \dots, K$ (delay constraints)
Decentralized optimization:
$A^{opt}_{k,m_h}(\mathcal{L}_{m_h}) = \arg\min_{A_{k,m_h}} E[Delay_{k,m_h}(A_{k,m_h}, \mathcal{L}_{m_h})]$
s.t. $Delay_{k,m_h}(A_{k,m_h}, \mathcal{L}_{m_h}) \le D_k - Delay^{PASS}_{k,m_h}$
• Advantages:
– No predetermined rate allocation
– Low complexity
– Fully distributed solution
– Fast adaptation to network changes
• Delay evaluation: $E[Delay_{k,m_h}]$
17
Route selection at the network layer
Our method: a generalization of the Bellman-Ford routing algorithm
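Local information: transmission rates, packet error rates, expected delays
$E[Delay_{k,m_h}] = E[W_{k,m_h}] + \sum_{m_{h+1}=1}^{M_{h+1}} \beta_{k,h+1,m_{h+1}} E[Delay_{k,m_{h+1}}]$
Here $E[Delay_{k,m_h}]$ is the information sent to the previous hop, $E[Delay_{k,m_{h+1}}]$ is the information from the next hop, and $\beta_{k,h+1,m_{h+1}}$ is the relay selecting parameter.
Relay selecting parameter: $\beta_{k,h,m_h}$ equals a normalizing coefficient $Coeff_k$ divided by a term that grows with $1 + E[Delay_{k,m_h}]$, shaped by the parameters $\gamma$ and $\kappa$; $Coeff_k$ is the inverse of the sum of the same reciprocal terms over the nodes $m_h$ that send feedback.
Property: automatically avoids the congestion region; multi-path routing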
Proposed solution: self-learning algorithm [H. Shiang JSAC 2007]
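The delay recursion above can be sketched in Python. The exact form of the relay selecting parameter is an assumption here: each next-hop relay is weighted by $1 / (1 + E[Delay])^{\gamma}$ and the weights are normalized to sum to one, and the parameter $\kappa$ from the slide is left out. Function names are hypothetical.

```python
def relay_probabilities(next_hop_delays, gamma=2.0):
    """Assumed form of beta: normalized 1 / (1 + E[Delay])^gamma per relay."""
    weights = [1.0 / (1.0 + d) ** gamma for d in next_hop_delays]
    total = sum(weights)
    return [w / total for w in weights]


def expected_delay(own_wait, next_hop_delays, gamma=2.0):
    """E[Delay] at this node = own queue wait + beta-weighted next-hop delays.

    next_hop_delays holds the E[Delay] values fed back by the next hop;
    an empty list models a node in the last hop.
    """
    betas = relay_probabilities(next_hop_delays, gamma)
    return own_wait + sum(b * d for b, d in zip(betas, next_hop_delays))
```

Because a congested relay reports a larger expected delay, it receives a smaller share of the traffic, which matches the stated property of avoiding the congestion region while still spreading packets over several paths.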
18
Convergence of the route selection
Proposition 1: The self-learning policy over an H-hop overlay network will converge to a steady state
Key idea:
[Figure: loop linking four steps: evaluate $E[Delay_{k,m_h}]$; get information feedback $E[Delay_{k,m_{h+1}}]$; learn the relay selecting prob. $\beta_{k,h+1,m_{h+1}}$; priority queuing analysis for $E[W_{k,m_h}]$.]
$E[Delay_{k,m_h}] = E[W_{k,m_h}] + \sum_{m_{h+1}=1}^{M_{h+1}} \beta_{k,h+1,m_{h+1}} E[Delay_{k,m_{h+1}}]$
19
Advantages of using a queuing model
Advantages:
• Fast adaptation to the dynamics
• Sophisticated models at different layers
[Figure: video stream statistics and relay selection feed the input rate analysis ($\eta_{k,h,m_h}$); MAC layer retransmission and TXOP and PHY modulation feed the service time analysis ($X_{k,h,m_h}$); both feed the priority queuing analysis, which yields delay and packet loss.]
20
Average queue waiting time analysis
Average queue waiting time (M/G/1 preemptive-repeat model):
$E[W_{k,h,m_h}] = \frac{\sum_{i=1}^{k} \eta_{i,h,m_h} E[X^2_{i,h,m_h}]}{2 \left(1 - \sum_{i=1}^{k-1} \eta_{i,h,m_h} E[X_{i,h,m_h}]\right) \left(1 - \sum_{i=1}^{k} \eta_{i,h,m_h} E[X_{i,h,m_h}]\right)}$
Interference affects the average service time.
Approximation of packet loss rate at a relay:
$P_{k,h,m_h} = \mathrm{Prob}\left[W_{k,h,m_h} > D_k - \sum_{j=0}^{h-1} E[W_{k,j}]\right] \approx \left(\sum_{i=1}^{K} \eta_{i,h,m_h} E[X_{i,h,m_h}]\right) \exp\left(-\frac{\left(D_k - \sum_{j=0}^{h-1} E[W_{k,j}]\right) \sum_{i=1}^{K} \eta_{i,h,m_h} E[X_{i,h,m_h}]}{E[W_{k,h,m_h}]}\right)$
where
$E[W_{k,j}] = \sum_{m_j=1}^{M_j} \beta_{k,j,m_j} E[W_{k,j,m_j}]$
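The waiting time formula and the exponential loss approximation can be computed as in the sketch below. It assumes classes are indexed from highest to lowest priority, that `arrival_rates` plays the role of $\eta$ and the two service lists hold $E[X]$ and $E[X^2]$; the load term of the loss approximation is passed in by the caller. Function names are hypothetical.

```python
import math


def mean_waiting_times(arrival_rates, mean_service, second_moment_service):
    """Per-class mean queue wait for a priority queue, highest priority first.

    Class k sees the residual work of classes 1..k in the numerator and the
    loads of classes 1..k-1 and 1..k in the two denominator factors.
    """
    waits = []
    load_above = 0.0  # sum of eta_i * E[X_i] over higher-priority classes
    residual = 0.0    # sum of eta_i * E[X_i^2] up to and including class k
    for eta, ex, ex2 in zip(arrival_rates, mean_service, second_moment_service):
        residual += eta * ex2
        load_upto = load_above + eta * ex
        if load_upto >= 1.0:
            waits.append(math.inf)  # this class is overloaded
        else:
            waits.append(residual / (2.0 * (1.0 - load_above) * (1.0 - load_upto)))
        load_above = load_upto
    return waits


def loss_probability(deadline_left, mean_wait, load):
    """Tail approximation Prob[W > t] ~ load * exp(-t * load / E[W])."""
    if deadline_left <= 0.0 or math.isinf(mean_wait):
        return 1.0
    if mean_wait == 0.0:
        return 0.0
    return min(1.0, load * math.exp(-deadline_left * load / mean_wait))
```

For a single class with exponential service (mean 1, second moment 2) and arrival rate 0.5, `mean_waiting_times` gives a wait of 1.0, which agrees with the textbook M/M/1 queueing delay for that load.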
21
Results of the elementary structure
Tm (Mbps)                           0.3     0.4     0.5     0.6
Analytical PSNR (dB), Coastguard    32.49   33.93   35.34   35.61
Analytical PSNR (dB), Mobile        30.15   30.34   31.74   33.12
Simulation PSNR (dB), Coastguard    32.26   34.29   35.10   35.59
Simulation PSNR (dB), Mobile        29.34   31.41   32.00   33.05
[Figure: elementary 2-hop network with users v1 and v2 and intermediate nodes m1 to m4; link rates range from 3Tm to 5Tm.]
22
Results for a 6-hop network
Tm (Mbps)                           0.3     0.4     0.5     0.6
Analytical PSNR (dB), Coastguard    32.48   33.92   33.93   35.61
Analytical PSNR (dB), Mobile        28.20   30.34   31.74   33.12
Simulation PSNR (dB), Coastguard    31.86   33.56   33.88   35.58
Simulation PSNR (dB), Mobile        28.39   30.21   31.35   32.85
23
Comparisons with state-of-the-art routing solutions
[Figure: network with sources S1, S2, relays r1 to r5, and destinations D1, D2, showing physical connections and overlay connections; link rates range from 2.5Tm to 5.1Tm.]
Simulated Y-PSNR (dB):
Method                      Tm = 0.3 Mbps            Tm = 0.6 Mbps
                            Coastguard   Mobile      Coastguard   Mobile
Self-learning policy        33.27        30.42       35.61        33.10
MDTMR [Wei, Zakhor 2004]    31.86        28.39       35.58        32.85
AODV [Perkins 1999]         30.67        24.98       34.32        31.37
24
Overview
I. Multimedia transmission over mesh networks
II. Exploiting information over space – information horizon
III. Exploiting information over time – learning
IV. Conclusions
The need for information feedback
• Decentralized decision making
• Timely adaptation
• Inter-user collaboration
A similar concept can be found in distance vector routing protocols, e.g. AODV [Perkins 1999], DSDV [Perkins 1994]
• Information horizon = 1 hop?
25
Larger information horizon
[Figure: a three-hop network (hop1, hop2, hop3) with nodes n1 to n7, shown for information horizon $h^{horizon} = 1$ and $h^{horizon} = 2$. Video data of classes $C_1, \dots, C_4$ is sent forward with TX strategies; information feedback comes from multiple agents within the information horizon.]
Advantages:
• More accurate delay estimation
• Faster adaptation
26
Information horizon tradeoff
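$\mathbf{A}^{opt} = \arg\max_{\mathbf{A}} \sum_{k=1}^{K} \lambda_k R_k \left(1 - P_k(\mathbf{A}(h^{horizon}))\right)$
A larger information horizon gives
• better decisions of $\mathbf{A}(h^{horizon})$, so that $P_k(\mathbf{A}(h^{horizon})) \downarrow$
• larger time overhead per packet, $E[X_{k,m_h}(h^{horizon})] \uparrow$, so that $P_k(\mathbf{A}(h^{horizon})) \uparrow$
Example: risk-aware scheduling $\pi_{m_h}$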
27
Risk-aware scheduling – definition of “risk”
Three categories of the queued packets
• “Dropped” packets: $Delay^{PASS}_{k,m_h} > D_k$
• “Almost dropped” packets: $Delay^{PASS}_{k,m_h} + E[Delay_{k,m_h}] > D_k$
• “Seldom dropped” packets: $Delay^{PASS}_{k,m_h} + E[Delay_{k,m_h}] \le D_k$
Definition of risk:
$Risk_{k,m_h}(Time^{I}, h^{horizon}) = \mathrm{Prob}\left(W_{k,m_h} + Time^{I} > E[D^{rem}_{k,m_h}]\right)$ if $E[D^{rem}_{k,m_h}] > 0$ (seldom-dropped packets), and $0$ if $E[D^{rem}_{k,m_h}] \le 0$ (almost-dropped packets)
For $E[D^{rem}_{k,m_h}] > 0$ the probability is approximated as
$Risk_{k,m_h}(Time^{I}, h^{horizon}) \approx \left(\sum_{i=1}^{k} \eta_{i,m_h} E[X_{i,m_h}]\right) \times \exp\left(\frac{\left(\sum_{i=1}^{k} \eta_{i,m_h} E[X_{i,m_h}]\right) \left(Time^{I} - E[D^{rem}_{k,m_h}]\right)}{E[W_{k,m_h}]}\right)$
where
$E[D^{rem}_{k,m_h}] = D_k - Delay^{PASS}_{k,m_h} - E[Delay_{k,m_h}]$
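The risk definition can be turned into a small helper. This is a sketch under assumptions: `load` stands for the sum of $\eta_i E[X_i]$ over classes 1 to $k$, `mean_wait` for $E[W_{k,m_h}]$ and must be positive, and the result is capped at 1 so that it stays a probability. All names and numbers are hypothetical.

```python
import math


def remaining_deadline(deadline, delay_passed, expected_delay_ahead):
    """E[D^rem] = D_k - Delay^PASS - E[Delay] for a queued packet."""
    return deadline - delay_passed - expected_delay_ahead


def risk(time_interval, rem_deadline, load, mean_wait):
    """Approximate Prob(W + Time^I > E[D^rem]); zero when E[D^rem] <= 0."""
    if rem_deadline <= 0.0:
        return 0.0  # almost-dropped packets carry no risk by definition
    if mean_wait <= 0.0:
        raise ValueError("mean_wait must be positive in this sketch")
    exponent = load * (time_interval - rem_deadline) / mean_wait
    return min(1.0, load * math.exp(exponent))
```

A packet with a short remaining deadline gets a larger risk than one with a long remaining deadline at the same load, so a scheduler that weights classes by risk can serve a tight-deadline class ahead of a nominally more important one.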
28
Illustrative example
Class $C_2$ should be sent before class $C_1$ during $Time^{I}$, since it is more “risky”
[Figure: timeline comparing the remaining deadlines $E[D^{rem}_{2,m}]$ and $E[D^{rem}_{1,m}]$.]
User 1: Mobile, Deadline: 500 ms
User 2: Coastguard, Deadline: 300 ms
29
Problem formulation
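Priority-based packet scheduling
$\pi^{PRI}_{m_h} = \arg\max_{\pi_{m_h}} \sum_{k=1}^{K} \lambda_k \times N_{k,m_h}(\pi_{m_h}, Time^{I})$
subject to $\pi_{m_h} = (\pi_1, \dots, \pi_l, \dots, \pi_{L_{m_h}})$, with $\pi_l = drop$ if $l \in C_k$ and $Delay^{PASS}_{l,m_h} \ge D_k$
Here $N_{k,m_h}$ is the number of class $C_k$ packets sent.
Risk-aware packet scheduling
$\pi^{IFDS}_{m_h}(h^{horizon}) = \arg\max_{\pi_{m_h}} \sum_{k=1}^{K} \lambda_k \times Risk_{k,m_h}(Time^{I}, h^{horizon}) \times N_{k,m_h}(\pi_{m_h}, Time^{I})$
subject to $\pi_{m_h} = (\pi_1, \dots, \pi_l, \dots, \pi_{L_{m_h}})$, with $\pi_l = drop$ if $l \in C_k$ and $Delay^{PASS}_{l,m_h} \ge D_k$
The risk-aware formulation weights each class by $\lambda_k \times Risk_{k,m_h}$ instead of using only $\lambda_k$.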
30
Optimal information horizon
[Figure: 6-hop network (Hop1 to Hop6) with nodes n1 to n13 carrying two videos from S1, S2 to D1, D2: Mobile (deadline = 500 ms) and Coastguard (deadline = 300 ms); link rates range from 3Tm to 10Tm.]
Analytical average PSNR (dB) for various information horizons
Tm          Priority, h=1   Risk, h=2   Risk, h=3   Risk, h=4
300 Kbps    29.61           30.1        29.63       29.59
400 Kbps    30.75           30.80       31.1        30.85
500 Kbps    31.50           31.55       32.0        31.75
31
Overview
I. Multimedia transmission over mesh networks
II. Exploiting information over space – information horizon
III. Exploiting information over time – learning
IV. Conclusions
32
Given limited information feedback, can an agent do better?
Remarks:
• Interaction with other agents
• Local information is changing over time
• Current actions may influence future local information
Answer: Yes!!
Solution: Learn the changing environment and make foresighted decisions
[Figure: an agent gathers local information (e.g. input rate, SINR, etc.) from the wireless networks (other agents), evaluates its utility, and determines a transmission action, which has a future influence on the networks.]
33
Foresighted decision making
Key ideas:
• An agent does not have to wait for something to actually happen and then react!!
• The anticipation is not only over space, but also over time
• Markov decision process
[Figure: the agent gathers local information (e.g. input rate, SINR, etc.; symbols $\eta_{k,m_h}$, $x_{m_h,m_{h+1}}$) from the wireless networks (other agents) to form its state. A priority queuing model gives $E[W_{k,m_h}(s_{m_h}, A_{m_h})]$, and the state transition prob. $p(s'_{m_h} | s_{m_h}, A_{m_h})$ supports the future utility evaluation. The agent then determines a transmission action, which has a future influence on the networks.]
34
Markov Decision Process (MDP)
• Tuple: $\langle \mathcal{S}, \mathcal{A}, p, R, \gamma \rangle$
– state: $s \in \mathcal{S}$
– action: $A \in \mathcal{A}$
– transition probability: $p(s' | s, A)$
– immediate reward: $R(s, A)$
– discount factor: $\gamma$, where $\gamma \in (0, 1)$
• Goal: maximize the discounted sum of future rewards
$\sum_{t=0}^{\infty} \gamma^t R_t(s_t, A_t | s_0)$
Why discounted???
35
MDP Solution
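Policy: $\pi: \mathcal{S} \to \mathcal{A}$
Optimal state-value function (Bellman equation):
$V^*(s) = \max_{A \in \mathcal{A}} \left\{ R(s, A) + \gamma \sum_{s' \in \mathcal{S}} p(s' | s, A) V^*(s') \right\}$
The first term is the immediate reward; the second term is the discounted expected future reward.
Optimal policy:
$\pi^*(s) = \arg\max_{A \in \mathcal{A}} \left\{ R(s, A) + \gamma \sum_{s' \in \mathcal{S}} p(s' | s, A) V^*(s') \right\}$
Off-line solution: value-iteration
$V^{t+1}(s) = \max_{A \in \mathcal{A}} \left\{ R(s, A) + \gamma \sum_{s' \in \mathcal{S}} p(s' | s, A) V^{t}(s') \right\}$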
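Value iteration as described on this slide can be sketched in a few lines of Python. The two-state MDP at the bottom is a hypothetical toy example, not taken from the slides; the dictionaries `reward` and `transition` are keyed by (state, action) pairs purely as a convenience of this sketch.

```python
def q_value(s, a, values, reward, transition, gamma):
    """Immediate reward plus discounted expected future value."""
    future = sum(p * values[s2] for s2, p in transition[(s, a)].items())
    return reward[(s, a)] + gamma * future


def value_iteration(states, actions, reward, transition,
                    gamma=0.9, tol=1e-9, max_iter=10000):
    """Iterate V^{t+1}(s) = max_A { R(s,A) + gamma * sum p(s'|s,A) V^t(s') }."""
    values = {s: 0.0 for s in states}
    for _ in range(max_iter):
        updated = {
            s: max(q_value(s, a, values, reward, transition, gamma) for a in actions)
            for s in states
        }
        change = max(abs(updated[s] - values[s]) for s in states)
        values = updated
        if change < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: q_value(s, a, values, reward, transition, gamma))
        for s in states
    }
    return values, policy


# Hypothetical toy MDP: staying in the "good" state earns reward 1.
states = ["good", "bad"]
actions = ["stay", "switch"]
reward = {("good", "stay"): 1.0, ("good", "switch"): 0.0,
          ("bad", "stay"): 0.0, ("bad", "switch"): 0.0}
transition = {("good", "stay"): {"good": 1.0}, ("good", "switch"): {"bad": 1.0},
              ("bad", "stay"): {"bad": 1.0}, ("bad", "switch"): {"good": 1.0}}
values, policy = value_iteration(states, actions, reward, transition, gamma=0.5)
```

With a discount factor of 0.5 the toy example converges to a value of 2 for the good state and 1 for the bad state, and the greedy policy switches out of the bad state, which shows how the discounted future reward drives a foresighted choice even when the immediate reward is zero.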
36
Reinforcement learning
Applied when the dynamics are partially known or unknown
Model-free reinforcement learning, e.g. Q-learning [Watkins 1992], TD-learning [Sutton 1988]
• Cannot take advantage of the queuing model
• Converges slowly
Model-based reinforcement learning
• Priority queuing model (M/G/1 preemptive-repeat model)
• Maximum likelihood state transition probability $p(s' | s, A)$:
$n^t_s(A) = \sum_{s' \in \mathcal{S}} n^t_{ss'}(A)$, $\hat{T}^t_{ss'}(A) = \frac{n^t_{ss'}(A)}{n^t_s(A)}$
$Q(s, A) = R(s, A) + \gamma \sum_{s' \in \mathcal{S}} p(s' | s, A) V^*(s')$
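The maximum likelihood estimate of the state transition probability is a ratio of counts, which the following sketch keeps in nested dictionaries. The class name `TransitionModel` and the state and action labels in the usage note are hypothetical.

```python
from collections import defaultdict


class TransitionModel:
    """Count-based estimate of p(s' | s, A) from observed transitions."""

    def __init__(self):
        # counts[(s, a)][s2] is the number of observed moves s -> s2 under a
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, s, a, s2):
        self.counts[(s, a)][s2] += 1

    def probability(self, s, a, s2):
        row = self.counts.get((s, a), {})
        total = sum(row.values())
        if total == 0:
            return 0.0  # no observation yet for this state-action pair
        return row.get(s2, 0) / total
```

For example, after observing three transitions from a hypothetical state "idle" to "busy" and one from "idle" back to "idle" under the same action, the estimate for "busy" is 0.75. The estimate can then replace $p(s' | s, A)$ in the value update, which is what makes the learning model-based.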
37
Coupling among agents in the multi-hop network
[Figure: agent $m_h$ receives the information feedforward $F^f_{h-1}$ and the information feedback $F^b_{h+1}$.]
Expected delay evaluation: $E[Delay_{k,m_h}] = E[W_{k,m_h}] + E[Delay_{k,m_{h+1}}]$
Condition to drop a certain priority class: $Delay^{PASS}_{k,h-1} + E[W_{k,m_h}] > D_k$
Required local information: $\mathcal{L}_{m_h} = \{s_{m_h}, F^b_{h+1}, F^f_{h-1}\}$
Information feedback: $F^{b,t}_{h+1}(m_{h+1}) = E[Delay^t_{k,m_{h+1}}]$
38
Proposed transmission policy
Transmission policy update:
$\mu^t_{kh}(s_{m_h}) = \arg\min_{A_{m_h}} \left\{ E[W_{k,m_h}(s_{m_h}, A_{m_h})] + F^{b,t}_{h+1}(A_{m_h}) + \gamma \sum_{s'_{m_h}} T^t_{s_{m_h} s'_{m_h}}(A_{m_h}) V^{t-1}(s'_{m_h}, F^{b,t}_{h+1}) \right\}$
Information exchange update:
• Feedback: $F^{b,t+1}_h(m_h)$ is the feedback $F^{b,t}_{h+1}(m_{h+1})$ received from the next hop (future delay) plus the expected queue waiting time $E[W_{k,m_h}]$ under the current policy $\mu^t_{kh}$ (current delay)
• Feedforward: $F^{f,t+1}_h(m_h)$ is the delay already incurred, $Delay^{PASS}_{k,h-1}$, plus the expected queue waiting time $E[W_{k,m_h}]$ under the current policy $\mu^t_{kh}$
39
Proposed distributed MDP
Step 1: Gather local information
Step 2: Evaluate state transition prob. and queuing delays
Step 3: Update transmission policy
Step 4: Exchange information
[Figure: decision process of agents $m_h \in \mathcal{M}_h$. Each agent forms its state $s_{m_h}$ from local information, including the feedforward $F^f_{h-1}$ and the feedback $F^b_{h+1}$, follows a Markovian state transition, evaluates future utility with the distributed MDP, determines the transmission action $\mu_{kh}(s_{m_h})$, and passes on $F^b_h$ and $F^f_h$.]
Feedback-modified Bellman equation:
$\mu^t_{kh}(s_{m_h}) = \arg\min_{A_{m_h}} \left\{ E[W_{k,m_h}(s_{m_h}, A_{m_h})] + F^{b,t}_{h+1}(A_{m_h}) + \gamma \sum_{s'_{m_h}} \hat{T}^t_{s_{m_h} s'_{m_h}}(A_{m_h}) V^{t-1}(s'_{m_h}, F^{b,t}_{h+1}) \right\}$
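One policy update step of the feedback-modified Bellman equation can be sketched as below. Several simplifications are assumptions of this sketch: an action is reduced to the choice of a next-hop relay, `feedback[a]` is the expected delay reported by relay `a`, and the previous value function depends only on the next state. All names are hypothetical.

```python
def policy_update(state, actions, wait, feedback, trans, prev_values, gamma=0.9):
    """Return the action minimizing wait + feedback + discounted future value.

    wait[(state, a)]   : expected queue waiting time for action a (current delay)
    feedback[a]        : delay fed back by the relay chosen by a (future delay)
    trans[(state, a)]  : dict next_state -> estimated transition probability
    prev_values[s2]    : value of next_state from the previous iteration
    """
    def cost(a):
        future = sum(p * prev_values[s2] for s2, p in trans[(state, a)].items())
        return wait[(state, a)] + feedback[a] + gamma * future

    best = min(actions, key=cost)
    return best, cost(best)
```

The only difference from a standard Bellman backup is the extra `feedback` term, which is how the decisions of the downstream agents enter the local optimization without the node having to model the rest of the network.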
40
Convergence of proposed distributed MDP
Proposition 2: The transmission policy $[\mu^t_{kh}, h = 1, \dots, H]$ of the distributed MDP will converge if and only if the priority class $C_k$ is not dropped in the networks
[Figure: the decision processes of agents $m_{h-1} \in \mathcal{M}_{h-1}$ and $m_h \in \mathcal{M}_h$, each with local information, state, Markovian state transition, future utility evaluation, and transmission action, are coupled through the feedback and feedforward information; "converge" is marked at the last hop and at the preceding hop.]
41
Comparison with traditional routing solutions
Existing routing solutions
– Throughput optimal [Tassiulas 1996]
– Flow-based optimized routing using queue size backpressure [Neely and Modiano 2006]
– Throughput and delay optimized opportunistic routing [Gupta and Javidi 2007]
– Low complexity distributed joint scheduling-routing algorithms [Gupta, Lin, Srikant 2007]
– Selfish routing based on congestion information [Roughgarden 2002]
– Network utility maximization framework (NUM) [Kelly 1998][Xu 2008]
Required knowledge
  Traditional routing: A priori known environment (e.g. given capacity region)
  Proposed autonomic multi-hop routing: Online learning based on local information
Decision making
  Traditional routing: Myopic decision making
  Proposed autonomic multi-hop routing: Foresighted decision making
42
Simulation results
[Figure: four panels showing $E[Delay_1]$, $E[Delay_2]$, $E[Delay_3]$, and $E[Delay_4]$ versus time (sec, 0 to 120), comparing model-based learning, self-learning, and Q-learning. The delay deadline is 1 sec; annotations mark packet loss and the class groups $C_1, C_3$ and $C_2, C_4$.]
43
Multi-agent interactive learning solutions - required observations and information
Model-free learning
– Q-learning [Watkins 1992]
– TD-learning [Sutton 1988]
– Reinforcement learning
Model-based learning
– Fictitious play [Brown 1951][Shapley 1996]
– Model-based reinforcement learning [Singh 1995][Ok 1998]
Information overhead:
• Reinforcement learning, model-based reinforcement learning: observation and information about the agent itself
• Fictitious play: observation and information about all the other agents
44
Autonomic network scenarios
Network scenario: Multimedia transmission over wireless mesh network
  Dynamics: Source traffic, channel condition
  Local information: Source rate, transmission rate, packet error rate
  Suitable learning: Model-based reinforcement learning
Network scenario: Distributed resource management over cognitive radio networks
  Dynamics: Resource availability (spectrum holes)
  Local information: Primary users' loading, other secondary users' actions
  Suitable learning: Fictitious play
Network scenario: Power control over ad hoc mobile networks
  Dynamics: Interference coupling among agents
  Local information: SINR
  Suitable learning: Reinforcement learning
45
Conclusions
Decentralized decision making is not enough!
Proposed a new networking paradigm, where autonomic agents can self-configure and optimize the applications' performance by
• adapting their cross-layer transmission strategies
• proactively acquiring information by trading off information overheads vs. performance gains
• interactively learning
• making foresighted decisions (across time and across hops)
46
Broader impact and future direction
Vision: the foresighted decision making and interactive learning approaches
• Managing any decentralized system with both information and delay constraints
• Decentralized management by autonomic “cognitive” agents
Future direction
• Hierarchies of cognitive agents
• Coalition of agents
• Solutions for malicious behavior prevention
47
Summary of main contributions
Cross-layer design for multimedia streaming
• Multi-user video streaming over multi-hop wireless networks [JSAC 2007, Asilomar 2006, IIH-MSP 2006]
• Risk-aware scheduling [TMM 2007, VCIP 2008]
Dynamic resource management in cognitive radio networks
• Queuing-based channel selection for multimedia transmission [TMM 2008, ICIP 2008]
• Joint route/channel selection in multi-hop cognitive radio networks [TVT 2008, DySPAN 2008]
Learning in games
• Adaptive learning in power control game [TVT 2009]
• Predictive channel selection [ICC 2008]
• Learning in conjecture-based channel selection game [TNet submitted, Gamenets 2009]
• Model-based reinforcement learning for distributed MDP [under preparation]
Other
• Routing decision for surveillance network under information constraint [TCSVT submitted]
48
Journal publications
Accepted
• Hsien-Po Shiang, Mihaela van der Schaar, “Multi-user Video Streaming over Multi-hop Wireless Networks: A Distributed, Cross-layer Approach Based on Priority Queuing,” IEEE Journal on Selected Areas in Communications, vol. 25, no. 4, pp. 770-785, May 2007.
• Hsien-Po Shiang , Mihaela van der Schaar, “Informationally Decentralized Video Streaming over Multi-hop Wireless Networks,” IEEE Transactions on Multimedia, vol. 9, no. 6, pp. 1299-1313, Oct 2007.
• Hsien-Po Shiang , Mihaela van der Schaar, “Queuing-Based Dynamic Channel Selection for Heterogeneous Multimedia Applications over Cognitive Radio Networks,” IEEE Transactions on Multimedia, vol. 10, no. 5, pp. 896-909, Aug. 2008.
• Hsien-Po Shiang , Mihaela van der Schaar, “Distributed Resource Management in Multi-hop Cognitive Radio Networks for Delay Sensitive Transmission,” IEEE Transactions on Vehicular Technology, vol. 52, no.2, pp. 941-953, Feb 2009.
• Hsien-Po Shiang , Mihaela van der Schaar, “Feedback-Driven Interactive Learning in Dynamic Wireless Resource Management for Delay Sensitive Users,” IEEE Transactions on Vehicular Technology, accepted, to appear.
Submitted
• Hsien-Po Shiang, Mihaela van der Schaar, “Conjecture-Based Channel Selection for Autonomous Delay-Sensitive Users in Multi-Channel Wireless Networks,” submitted to IEEE Transactions on Networking.
• Hsien-Po Shiang , Mihaela van der Schaar, “Information-Constrained Resource Allocation in Multi-Camera Wireless Surveillance Networks,” submitted to IEEE Transactions on Circuits and Systems for Video Technology.
49
Conference papers
• Hsien-Po Shiang, Mihaela van der Schaar, “Delay-Sensitive Resource Management in Multi-hop Cognitive Radio Networks,” in IEEE Dynamic Spectrum Access Networks (DySPAN 2008), Oct. 2008.
• Hsien-Po Shiang , Mihaela van der Schaar, “Dynamic Channel Selection for Multi-user Video Streaming over Cognitive Radio Networks," in Proc. Int. Conf. On Image Processing. (ICIP 2008) Oct. 2008.
• Hsien-Po Shiang , Wenchi Tu, Mihaela van der Schaar, “Dynamic Resource Allocation of Delay Sensitive Users Using Interactive Learning over Multi-carrier Networks," in Proc. Int. Conf. Commun. (ICC 2008) May 2008.
• Hsien-Po Shiang , Mihaela van der Schaar, “Risk-aware scheduling for multi-user video streaming over wireless multi-hop networks,” in IS&T/SPIE Visual Communications and Image Processing (VCIP 2008), San Jose, Jan 2008.
• Hsien-Po Shiang , Mihaela van der Schaar, “Multi-user Video Streaming over Multi-hop Wireless Networks: A Cross-layer Priority Queuing Approach,” in IEEE Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2006), pp. 255-258, Dec 2006.
• Hsien-Po Shiang , D. Krishnaswamy, and Mihaela van der Schaar, “Quality-aware Video Streaming over Wireless Mesh Networks with Optimal Dynamic Routing and Time Allocation,” in Proceedings of the 40th Asilomar Conference on Signals, Systems, and Computers, Oct 2006.
• D. Krishnaswamy, H.-P. Shiang, J. Vicente, W. S. Conner, S. Rungta, W. Chan and K. Miao, “A Cross-Layer Cross-Overlay Architecture for Proactive Adaptive Processing in Mesh Networks,” in 2nd IEEE Workshop on Wireless Mesh Networks (WiMesh 2006), Sep 2006.
50
References (1/2)
[WCZ05] Y. Wu, P. A. Chou, Q. Zhang, K. Jain, W. Zhu, S.Y. Kung, "Network Planning in Wireless Ad Hoc Networks: A Cross-Layer Approach", IEEE Journal on Selected Areas in Communications, vol. 23, no. 1, pp. 136-150, Jan. 2005.
[SYZ05] E. Setton, T. Yoo, X. Zhu, A. Goldsmith, and B. Girod, “Cross-layer design of Ad hoc Networks for real-time video streaming,” IEEE Wireless Communications Mag., pp. 59-65, Aug 2005.
[JF07] D. Jurca, P. Frossard, “Packet Selection and Scheduling for Multipath video streaming,” IEEE Transactions on Multimedia, vol. 9, no. 2, Apr. 2007.
[AMV06] Y. Andreopoulos, N. Mastronarde, and M. van der Schaar, “Cross-layer Optimized video Streaming over wireless multi-hop Mesh Networks,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 11, Nov 2006, pp. 2104-2115.
[PR99] C. E. Perkins, E. M. Royer, “Ad hoc on-demand distance vector routing,” in Proceedings of the 2nd IEEE Workshop on Mobile Computing Systems and Applications, pp. 90-100, Feb 1999.
[PB94] C. E. Perkins, P. Bhagwat, “Highly Dynamic Destination-Sequenced Distance-Vector Routing (DSDV) for Mobile Computers,” ACM SIGCOMM Computer Communication Review, vol. 24, no. 4, pp. 234-244, Oct. 1994.
[WZ02] W. Wei, and A. Zakhor, “Multipath unicast and multicast video communication over wireless ad hoc networks,” Proc. Int. Conf. Broadband Networks, Broadnets, pp. 496-505, 2002.
[DPZ04] R. Draves, J. Padhye, and B. Zill, “Routing in multi-radio, multi-hop wireless mesh networks,”in Proc. ACM Internat. Conf. on Mob. Computing and Networking (MOBICOM), 2004, pp. 114-128.
[AL94] B. Awerbuch and T. Leighton, “Improved Approximation Algorithms for the Multi-commodity Flow Problem and Local Competitive Routing in Dynamic Networks,” Proc. 26th ACM Symposium on Theory of Computing, May 1994.
[NMR05] M. J. Neely, E. Modiano, and C. E. Rohrs, “Dynamic Power Allocation and Routing for Time-Varying Wireless Networks”, IEEE Journal on Selected Areas in Communications, vol. 23. no1, Jan 2005. pp. 89-103.
[GJ07] P. Gupta and T. Javidi, "Towards Throughput and Delay-Optimal Routing for Wireless Ad-Hoc Networks,'' Asilomar Conference on Signals, Systems and Computers, Nov. 2007.
51
References (2/2)
[WD92] C. J. C. H. Watkins, P. Dayan, “Q-learning”, Machine Learning, vol. 8, no. 3-4, pp. 279-292, May 1992.
[Sut88] R. S. Sutton, ”Learning to predict by the method of temporal differences,” Machine Learning, vol. 3, no. 1, pp. 9-44, Aug. 1988.
[TO98] P. Tadepalli and D. Ok, "Model-based average reward reinforcement learning", Artificial Intelligence, Volume 100, Issues 1-2, January 1998, Pages 177-224.
[BBS95] A. G. Barto, S. J. Bradtke and S. P. Singh, "Learning to act using real-time dynamic programming", Artificial Intelligence, Volume 72, Issues 1-2, January 1995, Pages 81-138.
52
Sub-flows separation of Coastguard and Mobile video sequences
[Figure: the two video sequences separated into sub-flows labeled $\lambda_1$ to $\lambda_8$.]
53
[Figure: an agent takes the local information $\mathcal{L}_{m_h}$, performs delay evaluation, and determines the transmission action $A_{m_h}$.]
54
Delay-Sensitive Multimedia Applications
• Heterogeneous dependencies
– Delay deadlines
– Time-varying complexity
• Loss tolerant / adaptable
[Figure: (a) sequential dependencies, typical hybrid coder dependencies (MPEG-2, H.264/AVC), and scalable coding dependencies [Chou, 2006]; (c) decoding complexity (Silent sequence): complexity profile over time for decoding four layers, Silent.CIF at 1.5 Mb/s, normalized processor ticks versus time (sec).]
55
Multimedia transmission over wireless mesh networks
[Figure: wireless mesh network with sources S1, S2, relays r1 to r5, and destinations D1, D2.]
56
Resource management in cognitive radio networks
[Figure: cognitive radio network with primary users PU1 to PU4, frequency channels F1 to F4, and secondary users SU1, SU2, ..., SUN.]
57
Power control in ad hoc networks
58
Example: distributed channel/route selection in multi-hop cognitive radio networks [TVT Shiang 2008]
[Figure: 15 numbered nodes on a plane with both axes running from 20 to 140, with source and destination nodes $n^s_1$, $n^d_1$ of $V_1$ and $n^s_2$, $n^d_2$ of $V_2$.]
App: 2 video streams
Users: 15 secondary users (nodes)
Actions: channel/route selection, 2 frequency channels
Utilities: reduce packet loss rate of delay-sensitive applications
Transmission range: 40 meters
Primary user around nodes 11, 12
Adopt fictitious play
59
Fictitious Play
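– Goal: learn the other agents' policies
– Count the empirical frequency of the other agents' actions
– Probabilistic behaviors
Empirical frequency of the actions of agent $u$:
$\mathbf{S}^t_u = [S^t_u(A_u), \text{ for } A_u \in \mathcal{A}_u]$, $S^t_u(A_u) = \frac{r^t_u(A_u)}{\sum_{A'_u \in \mathcal{A}_u} r^t_u(A'_u)}$
Belief of agent $v$ about the other agents:
$B^t_{\pi_{-v}}: \; \mathbf{S}^t_{-v} = \{\mathbf{S}^t_u, u \in \Omega_v\}$
Policy of agent $v$:
$\pi^t_v: \; A^t_v = \arg\max_{A_v \in \mathcal{A}_v} E[U^t_v(A_v, B^{t-1}_{\pi_{-v}})]$
[Figure: user $v$ combines its private information $\mathcal{L}^t_v$ with the belief $B^t_{\pi_{-v}}$ obtained by fictitious play about users $-v$, then evaluates and maximizes $E[U^t_v]$ to choose $\pi^t_v$; all users interact through the wireless network environment.]
Should an agent monitor all the other agents??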
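The counting and best-response steps of fictitious play can be sketched for the simplest case of one observed opponent. This is an illustration, not the algorithm from the cited work: the class `FictitiousPlayer`, the channel labels, and the collision-avoidance utility are all hypothetical.

```python
from collections import Counter


def avoid_collision(own_channel, other_channel):
    """Hypothetical utility: 1 if the two users pick different channels."""
    return 1.0 if own_channel != other_channel else 0.0


class FictitiousPlayer:
    def __init__(self, own_actions, utility):
        self.own_actions = list(own_actions)
        self.utility = utility    # utility(own_action, other_action) -> float
        self.counts = Counter()   # how often each opponent action was observed

    def observe(self, other_action):
        self.counts[other_action] += 1

    def belief(self):
        """Empirical frequency of the opponent's actions."""
        total = sum(self.counts.values())
        if total == 0:
            return {}
        return {a: n / total for a, n in self.counts.items()}

    def best_response(self):
        """Action maximizing expected utility under the current belief."""
        belief = self.belief()

        def expected(a):
            return sum(p * self.utility(a, b) for b, p in belief.items())

        return max(self.own_actions, key=expected)
```

Tracking every other agent in this way makes the count table and the message overhead grow with the number of agents, which motivates the question on this slide of whether an agent should monitor all the others.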
60
Information cell
Benefit of acquiring more information
• Build more accurate belief
• Avoid “information mismatch problem”
Cost of gathering information
$d_I(\mathcal{L}_n(h)) = N(h)[(K - 1)(U^{(d)} + U^{(I)}) + U^{(A)}]$
[Figure: (a), (b) node $n_2$ with neighboring nodes $n_1$, $n_3$, $n_4$, $n_5$, $n_6$ and $m_1$, showing the interference range of $n_2$ and its information horizon. In (a) nodes $n_1$, $n_3$, $n_4$, $n_5$, $n_6$ report $\{A_{n_i}, I_k(n_i), E[d_k(n_i)]\}$; in (b) only nodes $n_1$, $n_3$, $n_4$ do. A timeline labeled $t_I(\nu)$ and $d_I(\mathcal{L}_n(h))$ shows decision making and packet transmission.]
61
Adaptive fictitious play in cognitive radio networks
Adaptive fictitious play adapts the information cell that limits the neighbors with which information is exchanged
[Figure: node $n$ runs adaptive fictitious play on the information from the secondary users in its horizon $h$, combines it with the available resource left by the primary users, and performs minimum-delay route/channel selection to obtain $A^{opt}_n$.]
62
Results of the two applications V1 and V2
[Figure: packet loss rate versus average transmission rate T(e,f) (Mbps, 2 to 10) for V1 and V2, comparing AODV, AODV/LB, DCS, AFP horizon 1, and AFP horizon 2, with primary users' loading ~ 0. Annotations: random channel selection, myopic channel selection, learn from fewer neighbors, learn from more neighbors.]
63
Results regarding the impact of primary users
[Figure: packet loss rate versus primary user time fraction ρ (0 to 0.8) for V1 and V2, comparing AFP horizon 1, AFP horizon 2, and AFP horizon 3 (primary users around nodes 11, 12, T = 5 Mbps). Annotation: information cost.]