Upload
others
View
11
Download
0
Embed Size (px)
Citation preview
Stochastic Robotics
Stochastic Robotics:
Complexity, Compositionality, and Scalability
Dr. Theodore (Ted) P. Pavlic
www
Friday, February 22, 2013
Overview
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Introduction
Guarded Command Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
RoboBees
Swarm Assembly
Cooperative Transport
Modeling Ants
SHS Generators
Cooperative Task Processing
Application: Patrol
Network Cooperation Game
Conclusions
Guarded Command Programming with Rates(Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
N. Napp and E. Klavins, “A compositional framework for programming
stochastically interacting robots,” Int. J. Robot. Res. [Special Issue Stochasticity in
Robot. Bio-Systems Part 2], vol. 30, no. 6, pp. 713–729, May 2011.
Application: Factory Floor testbed(Galloway et al. 2010)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Application: Factory Floor testbed(Galloway et al. 2010)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Application: Factory Floor testbed routing(Galloway et al. 2010; Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Application: Factory Floor testbed routing(Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Application: Factory Floor testbed routing(Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Guard︷︸︸︷
1000→
Action︷︸︸︷
0100
1 0 0 0
0 1 0 0
Application: Factory Floor testbed routing(Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Weakly related to GCP by Dijkstra (1975).
Non-deterministic reasoning about programs (predicate transformers).
x ≥ y → m := x� y ≥ x→ m := y
state = 0100 → state = 1000
� state = 0100 → state = 0010
Application: Factory Floor testbed routing(Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Guard︷︸︸︷
1000→
Action︷︸︸︷
0100 (global state)
1 0 0 0
0 1 0 0
Application: Factory Floor testbed routing(Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Guard︷︸︸︷
1000→
Action︷︸︸︷
0100
s1s2→ s1s2
(global state)
(site encoded)
1 0 0 0
0 1 0 0
Application: Factory Floor testbed routing(Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Guard︷︸︸︷
1000→
Action︷︸︸︷
0100
s1s2→ s1s2
sisi+1→ sisi+1
(global state)
(site encoded)
(behavioral ND program)
1 0 0 0
0 1 0 0
Guarded Command Programming with Rates(Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
For compositionality, augment GCP with exponential rates.
sisi+1 → sisi+1 (GCP)
sisi+1k−⇀ sisi+1 (GCPR)
So a GCPR command is now an edge of a Markov chain. Rates are
either chosen programmatically or are used to model error rates.
Guarded Command Programming with Rates(Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
For compositionality, augment GCP with exponential rates.
sisi+1 → sisi+1 (GCP)
sisi+1k−⇀ sisi+1 (GCPR)
So a GCPR command is now an edge of a Markov chain. Rates are
either chosen programmatically or are used to model error rates.
� A GCPR Ψ = {(g1, a1, r1), . . . , (gn, an, rn)} is a set of commands
that are each made up of a guard, action, and rate.
Guarded Command Programming with Rates(Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
For compositionality, augment GCP with exponential rates.
sisi+1 → sisi+1 (GCP)
sisi+1k−⇀ sisi+1 (GCPR)
So a GCPR command is now an edge of a Markov chain. Rates are
either chosen programmatically or are used to model error rates.
� A GCPR Ψ = {(g1, a1, r1), . . . , (gn, an, rn)} is a set of commands
that are each made up of a guard, action, and rate.
� A GCPR Ψ can be scaled by σ ∈ R≥0 such that
σΨ =⋃
(g,a,r)∈Ψ
(g, a, σr).
Guarded Command Programming with Rates(Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
For compositionality, augment GCP with exponential rates.
sisi+1 → sisi+1 (GCP)
sisi+1k−⇀ sisi+1 (GCPR)
So a GCPR command is now an edge of a Markov chain. Rates are
either chosen programmatically or are used to model error rates.
� A GCPR Ψ = {(g1, a1, r1), . . . , (gn, an, rn)} is a set of commands
that are each made up of a guard, action, and rate.
� A GCPR Ψ can be scaled by σ ∈ R≥0 such that
σΨ =⋃
(g,a,r)∈Ψ
(g, a, σr).
� The composition Ψ ∪ Φ of GCPR Ψ and GCPR Φ is the union of the
two programs.
Analysis of GCPR using Master Equation(Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
� Desired and undesired behaviors compose a single Markov process.
Analysis of GCPR using Master Equation(Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
� Desired and undesired behaviors compose a single Markov process.
� Slow, robust behaviors can be mixed with fast, idealized behaviors.
Analysis of GCPR using Master Equation(Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
� Desired and undesired behaviors compose a single Markov process.
� Slow, robust behaviors can be mixed with fast, idealized behaviors.
� Rates and scalars are chosen to ensure adequate performance and
error resilience.
Analysis of GCPR using Master Equation(Napp and Klavins 2011)
Introduction
Guarded Command
Programming with Rates
Application
GCP with Rates
Markov Analysis
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
� Desired and undesired behaviors compose a single Markov process.
� Slow, robust behaviors can be mixed with fast, idealized behaviors.
� Rates and scalars are chosen to ensure adequate performance and
error resilience.
� System described by linear master equation:
~p = ~pQ
where pj is probability of state j and Q is graph Laplacian.
� Correctness: steady-state probability p∗
� Performance: spectrum of Q (i.e., λ2)
Application: Artificial Pollination by RoboBees(Berman et al. 2011a,b)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
RoboBees
Swarm Assembly
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
S. Berman, V. Kumar, and R. Nagpal, “Design of control policies for spatially inhomogeneous robot
swarms with application to commercial pollination,” in Proceedings of the 2011 IEEE International
Conference on Robotics and Automation, Shanghai, China, May 9–13, 2011.
S. Berman, R. Nagpal, and A. Halasz, “Optimization of stochastic strategies for spatially
inhomogeneous robot swarms: a case study in commerical pollination,” in Proceedings of the 2011
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), San Francisco, CA,
USA, September 25–30, 2011, pp. 3923–3930.
http://robobees.seas.harvard.edu/
RoboBees as Chemical Reaction Networks(Berman et al. 2011a,b)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
RoboBees
Swarm Assembly
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
RoboBees as Chemical Reaction Networks(Berman et al. 2011a,b)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
RoboBees
Swarm Assembly
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
� Reactions model behavior switches; choose rates programmatically.
RoboBees as Chemical Reaction Networks(Berman et al. 2011a,b)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
RoboBees
Swarm Assembly
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
� Reactions model behavior switches; choose rates programmatically.
� Motion governed by drift–diffusion process; choose field and diffusion
coefficient programmatically.
~xi(t+δt) = ~xi(t)+~v(~xi, t)δt+√
2Dδt ~Z(t), Zj(t) ∼ N (0, 1)
RoboBees as Chemical Reaction Networks(Berman et al. 2011a,b)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
RoboBees
Swarm Assembly
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
� Reactions model behavior switches; choose rates programmatically.
� Motion governed by drift–diffusion process; choose field and diffusion
coefficient programmatically.
~xi(t+δt) = ~xi(t)+~v(~xi, t)δt+√
2Dδt ~Z(t), Zj(t) ∼ N (0, 1)
� Macroscopic design with advection–diffusion–reaction equations.
RoboBees as Chemical Reaction Networks(Berman et al. 2011a,b)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
RoboBees
Swarm Assembly
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
� Reactions model behavior switches; choose rates programmatically.
� Motion governed by drift–diffusion process; choose field and diffusion
coefficient programmatically.
~xi(t+δt) = ~xi(t)+~v(~xi, t)δt+√
2Dδt ~Z(t), Zj(t) ∼ N (0, 1)
� Macroscopic design with advection–diffusion–reaction equations.
� Goal: Desired flower coverage (1500 robots, 3 minute bouts, wind).
Application: Swarm Robotic Assembly(Matthey et al. 2009)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
RoboBees
Swarm Assembly
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
L. Matthey, S. Berman, and V. Kumar, “Stochastic strategies for a swarm robotic assembly
system,” in Proceedings of the 2009 IEEE International Conference on Robotics and
Automation, Kobe, Japan, May 12–17, 2009, pp. 1953–1958
Application: Swarm Robotic Assembly(Matthey et al. 2009)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
RoboBees
Swarm Assembly
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Swarm Robotic Assembly as Chemical Reaction Network(Matthey et al. 2009)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
RoboBees
Swarm Assembly
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Goal: Equal number of both parts assembled (i.e., xF1 = xF2).
Swarm Robotic Assembly as Chemical Reaction Network(Matthey et al. 2009)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
RoboBees
Swarm Assembly
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Results from CME Design: Micro-/Macro-scopic Fraction of Parts 1 and 2
Cooperative Transport in Ants(Kumar et al. 2013)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Modeling Ants
SHS Generators
Cooperative Task
Processing
Conclusions
Stochastic Robotics
G. P. Kumar, A. Buffin, T. P. Pavlic, S. C. Pratt, and S. M. Berman, “A stochastic
hybrid system model of collective transport in the desert ant Aphaenogaster
cockerelli ,” in Proceedings of the 16th ACM International Conference on Hybrid
Systems: Computation and Control, Philadelphia, PA, April 8–11, 2013.
Cooperative Transport in Ants(Kumar et al. 2013)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Modeling Ants
SHS Generators
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Aphaenogaster cockerelli ants
Stochastic Hybrid System Model for Ants(Kumar et al. 2013)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Modeling Ants
SHS Generators
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Front Back
Detached
kFB
kBF
kFD
kDF kBD k
DB
xL = vL
mLvL = NFK(vdL − vL)︸ ︷︷ ︸
IndividualBehavior
+ µ sgn(vL)(mLg − (NF +NB)FL)︸ ︷︷ ︸
Friction
SHS Extended Generator(Kumar et al. 2013)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Modeling Ants
SHS Generators
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Generalized Lie derivative of moments along trajectories of SHS systems:
� For function ψ,
Lψ(~x) ,∂ψ
∂xLxL +
∂ψ
∂vLvL +
∑
i,j∈{F,B,D}i 6=j
(ψ(φij(~x))− ψ(~x))kijNi
andd
dtE(ψ(~x)) = E(Lψ(~x)).
So arbitrary moment dynamics can be derived from SHS model.
SHS Extended Generator(Kumar et al. 2013)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Modeling Ants
SHS Generators
Cooperative Task
Processing
Conclusions
Stochastic Robotics
Generalized Lie derivative of moments along trajectories of SHS systems:
� For function ψ,
Lψ(~x) ,∂ψ
∂xLxL +
∂ψ
∂vLvL +
∑
i,j∈{F,B,D}i 6=j
(ψ(φij(~x))− ψ(~x))kijNi
andd
dtE(ψ(~x)) = E(Lψ(~x)).
So arbitrary moment dynamics can be derived from SHS model.
Behavioral switching rates and control parameters can be found by fitting
moment dynamics to statistics.
Fitting Results(Kumar et al. 2013)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Modeling Ants
SHS Generators
Cooperative Task
Processing
Conclusions
Stochastic Robotics
0.5
0.4
0.3
0.2
0.1
150150
150150150
100100
100100100
50
50
50
505050
40
30
20
10
101010
888
666
444
222
00
00
00
00
00
NF
NB
ND
xL
(cm
)
vL
(cm
/s)
Time (s)Time (s)
Time (s)Time (s)Time (s)
Fitting Results(Kumar et al. 2013)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Modeling Ants
SHS Generators
Cooperative Task
Processing
Conclusions
Stochastic Robotics
0.4
0.4
0.4
0.3
0.3
0.3
0.2
0.2
0.2
0.1
0.1
0.1
150150150150
150150150150
150150150150
100100100100
100100100100
100100100100
50505050
50505050
50505050
40
40
40
30
30
30
20
20
20
10
1010
10
1010
10
1010
88
88
88
66
66
66
44
44
44
22
22
22
00
00
00
00
00
00
00
00
00
00
00
00
NF
NF
NF
NB
NB
NB
xL
(cm
)xL
(cm
)xL
(cm
)
vL
(cm
/s)
vL
(cm
/s)
vL
(cm
/s)
Time (s)Time (s)Time (s)Time (s)
Time (s)Time (s)Time (s)Time (s)
Time (s)Time (s)Time (s)Time (s)
Bonus – Ant Cognition: Drift–Diffusion Modeling of Quorum Sensing
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Ant Cognition
Cooperative Task
Processing
Conclusions
Stochastic Robotics
T. P. Pavlic, S. C. Pratt, “Speed–accuracy tradeoffs in Temnothorax
rugatulus ants: sequential-sampling models of quorum detection while
house hunting” (IUSSI-NAS 2012, SMB 2013).
Time
Evid
en
ce
a
z
0
Choice A
Choice B
v
“Drift” toward appropriate
answer over time.
Process variability (noise)
causes occasional errors.
b
bb
Cooperative Task Processing(Pavlic and Passino 2011)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
T. P. Pavlic and K. M. Passino, “Cooperative task-processing networks,” in Proceedings of
the Second International Workshop on Networks of Cooperating Objects, CONET 2011,
Chicago, IL, USA, April 11, 2011.
Application: Cooperative patrol(Pavlic and Passino 2011)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
b
b
Example AAV Application:
Three MQ-8 Firescouts
on patrol
Application: Cooperative patrol(Pavlic and Passino 2011)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
b
b
1
2
3
b
b
1
2 3
Application: Cooperative patrol(Pavlic and Passino 2011)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
b
b
×
× ×
×
××
×
×
×
××
×
×
× ×
×
λ
1
2
3
b
b
1
2 3
λ
Application: Cooperative patrol(Pavlic and Passino 2011)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
b
b
×
× ×
×
××
×
×
×
××
×
×
× ×
×
λ
1
2
3
b
b
1
2 3
λ
� Task-processing network (TPN): conveyor 1 and cooperators 2 and 3
Application: Cooperative patrol(Pavlic and Passino 2011)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
b
b
×
× ×
×
××
×
×
×
××
×
×
× ×
×
×
×
×
×
×
×
× ×
×
×
××
××
××
×
×
×
×
××
×
×
×
××
××
××
××
λ1
λ2 λ3
1
2
3
b
b
1
2 3
1
2 3
λ1
λ2 λ3
� Task-processing network (TPN): conveyor 1 and cooperators 2 and 3� Each can be both conveyor and cooperator simultaneously
Application: Cooperative patrol(Pavlic and Passino 2011)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
b
b
×
× ×
×
××
×
×
×
××
×
×
× ×
×
×
×
×
×
×
×
× ×
×
×
××
××
××
×
×
×
×
××
×
×
×
××
××
××
××
λ1
λ2 λ3
1
2
3
b
b
1
2 3
1
2 3
λ1
λ2 λ3
� Task-processing network (TPN): conveyor 1 and cooperators 2 and 3� Each can be both conveyor and cooperator simultaneously
� Policy should be decentralized but still share load
Cooperation game(Pavlic and Passino 2011)
Stochastic Robotics
For cooperator i ∈ C, its local rate of gain
Ui(~γ) ,
Conveyor costs︷ ︸︸ ︷
−ci
(∏
j∈Ci
(1− γj)
)
︸ ︷︷ ︸
Pr(No Ci volunteers|i advertisement)
+ γi∑
j∈Vi
(−
Pr(i awarded task from j|i volunteers)︷ ︸︸ ︷
Ps(j|i)cij)
︸ ︷︷ ︸
Cooperator part:
Costs of local processing on i ∈ V :
ci ,∑
k∈Yi
λki cki
Costs and benefits to i ∈ C for volunteering for
tasks exported from j ∈ Vi:
cij ,∑
k∈Yj
λkj ckij
Cooperation game(Pavlic and Passino 2011)
Stochastic Robotics
For cooperator i ∈ C, its local rate of gain
Ui(~γ) ,
Conveyor costs︷ ︸︸ ︷
−ci
(∏
j∈Ci
(1− γj)
)
︸ ︷︷ ︸
Pr(No Ci volunteers|i advertisement)
+ γi∑
j∈Vi
(pij(Qj)−
Pr(i awarded task from j|i volunteers)︷ ︸︸ ︷
Ps(j|i)cij)
︸ ︷︷ ︸
Cooperator part: γi , Qj vary with γi
Costs of local processing on i ∈ V :
ci ,∑
k∈Yi
λki cki
Costs and benefits to i ∈ C for volunteering for
tasks exported from j ∈ Vi:
cij ,∑
k∈Yj
λkj ckij
pij(Qj) ,∑
k∈Yj
λkj qkijp
kj (Qj)
Payment functions added as stabilizing controls (“quantity” Qi ,∑
j∈Ciγj ).
Cooperation game(Pavlic and Passino 2011)
Stochastic Robotics
For cooperator i ∈ C, its local rate of gain
Ui(~γ) ,
Conveyor costs︷ ︸︸ ︷
−ci
(∏
j∈Ci
(1− γj)
)
︸ ︷︷ ︸
Pr(No Ci volunteers|i advertisement)
+ γi∑
j∈Vi
(pij(Qj)−
Pr(i awarded task from j|i volunteers)︷ ︸︸ ︷
Ps(j|i)cij)
︸ ︷︷ ︸
Cooperator part: γi , Qj vary with γi
Costs of local processing on i ∈ V :
ci ,∑
k∈Yi
λki cki
Costs and benefits to i ∈ C for volunteering for
tasks exported from j ∈ Vi:
cij ,∑
k∈Yj
λkj ckij
pij(Qj) ,∑
k∈Yj
λkj qkijp
kj (Qj)
Payment functions added as stabilizing controls (“quantity” Qi ,∑
j∈Ciγj ).
Cournot oligopolieson a graph(∝ jury volunteering)
Decreasing-cost
externality
Totally asynchronous solver(Pavlic and Passino 2011)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
� Totally asynchronous parallel computation of ~γ∗ by local gradient
ascent:
Totally asynchronous solver(Pavlic and Passino 2011)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
� Totally asynchronous parallel computation of ~γ∗ by local gradient
ascent:
� Agents iterate asynchronously.
� Each agent operates on a possibly outdated copy of ~γ.
Totally asynchronous solver(Pavlic and Passino 2011)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
� Totally asynchronous parallel computation of ~γ∗ by local gradient
ascent:
� Agents iterate asynchronously.
� Each agent operates on a possibly outdated copy of ~γ.
� If synchronous transition mapping is a contraction with respect to
maximum norm
‖~γ‖∞ , maxi∈C
{|γi|},
then a unique Nash equilibrium exists and is asymptotically stable
by totally asynchronous distributed gradient ascent
iterations (Bertsekas and Tsitsiklis 1997).
Totally asynchronous solver(Pavlic and Passino 2011)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
� Totally asynchronous parallel computation of ~γ∗ by local gradient
ascent:
� Agents iterate asynchronously.
� Each agent operates on a possibly outdated copy of ~γ.
� If synchronous transition mapping is a contraction with respect to
maximum norm
‖~γ‖∞ , maxi∈C
{|γi|},
then a unique Nash equilibrium exists and is asymptotically stable
by totally asynchronous distributed gradient ascent
iterations (Bertsekas and Tsitsiklis 1997).
� Constraints on topology and payment functions ensure contraction.
� Diagonal-dominance/convexity argument.
� Network structure ensures dominance.
Totally asynchronous projected gradient ascent
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
Define T : [0, 1]n → [0, 1]n by T (~γ) , (T1(~γ), T2(~γ), . . . , Tn(~γ)) where, for
each i ∈ C,
Ti(~γ) , min{γmaxi ,max{0, γi + σi∇iUi(~γ)}}
Projected gradient ascent
Totally asynchronous projected gradient ascent
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
Define T : [0, 1]n → [0, 1]n by T (~γ) , (T1(~γ), T2(~γ), . . . , Tn(~γ)) where, for
each i ∈ C,
Ti(~γ) , min{γmaxi ,max{0, γi + σi∇iUi(~γ)}},
where1
σi≥ 2|Vi|max
k∈Vi
|p′ik(0)|
for all ~γ ∈ [0, 1]n.
Totally asynchronous projected gradient ascent
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
Define T : [0, 1]n → [0, 1]n by T (~γ) , (T1(~γ), T2(~γ), . . . , Tn(~γ)) where, for
each i ∈ C,
Ti(~γ) , min{γmaxi ,max{0, γi + σi∇iUi(~γ)}},
where1
σi≥ 2|Vi|max
k∈Vi
|p′ik(0)|
for all ~γ ∈ [0, 1]n. If
minj∈Vi
|p′ij (|Cj |) | >
(
|Vi| −1
2
)
maxj∈Vi
|cij | for all i ∈ C,
then the totally asynchronous distributed iteration (TADI) sequence {~γ(t)}generated with mapping T and the outdated estimate sequence {~γi(t)} for all
i ∈ C each converge to the unique Nash equilibrium ~γ∗ of the cooperation game.
Totally asynchronous projected gradient ascent
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
Define T : [0, 1]n → [0, 1]n by T (~γ) , (T1(~γ), T2(~γ), . . . , Tn(~γ)) where, for
each i ∈ C,
Ti(~γ) , min{γmaxi ,max{0, γi + σi∇iUi(~γ)}},
where1
σi≥ 2|Vi|max
k∈Vi
|p′ik(0)|
for all ~γ ∈ [0, 1]n. If
Benefit︷ ︸︸ ︷
minj∈Vi
|p′ij (|Cj |) | >
1/(Relatedness)︷ ︸︸ ︷(
|Vi| −1
2
) Cost︷ ︸︸ ︷
maxj∈Vi
|cij | for all i ∈ C,
∼ Hamilton’s rule
on networks
then the totally asynchronous distributed iteration (TADI) sequence {~γ(t)}generated with mapping T and the outdated estimate sequence {~γi(t)} for all
i ∈ C each converge to the unique Nash equilibrium ~γ∗ of the cooperation game.
Results: Cyclic feedback(Pavlic and Passino 2011)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
b
b
1
2 3
γi
C = V = {1, 2, 3}
Ci = Vi = {1, 2, 3} \ {i}
Yj = {j}
1
2 3
λ11
λ22 λ3
3
λ22 (encounters per second)
Optimal
coop
erationwillingn
ess
γ∗ ifori∈{1,2,3}
A B C
0
1
0 1 2 3 4 5
λ11 = 0.6
λ11
λ33 = 1.7
λ33
ba
b
bbc
bd
γ∗1
γ∗3
γ ∗2
A
λ22 < λ11 < λ33
γ∗2 > γ∗1 > γ∗3
B
λ11 < λ22 < λ33
γ∗1 > γ∗2 > γ∗3
C
λ11 < λ33 < λ22
γ∗1 > γ∗3 > γ∗2
Equilibrium in AAV patrol scenario(simulation results above match analytic predictions)
Results: Cyclic feedback(Pavlic and Passino 2011)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
b
b
1
2 3
γi
C = V = {1, 2, 3}
Ci = Vi = {1, 2, 3} \ {i}
Yj = {j}
1
2 3
λ11
λ22 λ3
3
λ22 (encounters per second)
Optimal
coop
erationwillingn
ess
γ∗ ifori∈{1,2,3}
A B C
0
1
0 1 2 3 4 5
λ11 = 0.6
λ11
λ33 = 1.7
λ33
ba
b
bbc
bd
γ∗1
γ∗3
γ ∗2
A
λ22 < λ11 < λ33
γ∗2 > γ∗1 > γ∗3
B
λ11 < λ22 < λ33
γ∗1 > γ∗2 > γ∗3
C
λ11 < λ33 < λ22
γ∗1 > γ∗3 > γ∗2
Equilibrium in AAV patrol scenario(simulation results above match analytic predictions)
� Increases in one encounter rate (e.g., λ2) cause equilibrium shift so
neighbors (e.g., 1 and 3) help more and agent (e.g., 2) helps less
Results: Cyclic feedback(Pavlic and Passino 2011)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
b
b
1
2 3
γi
C = V = {1, 2, 3}
Ci = Vi = {1, 2, 3} \ {i}
Yj = {j}
1
2 3
λ11
λ22 λ3
3
λ22 (encounters per second)
Optimal
coop
erationwillingn
ess
γ∗ ifori∈{1,2,3}
A B C
0
1
0 1 2 3 4 5
λ11 = 0.6
λ11
λ33 = 1.7
λ33
ba
b
bbc
bd
γ∗1
γ∗3
γ ∗2
A
λ22 < λ11 < λ33
γ∗2 > γ∗1 > γ∗3
B
λ11 < λ22 < λ33
γ∗1 > γ∗2 > γ∗3
C
λ11 < λ33 < λ22
γ∗1 > γ∗3 > γ∗2
Equilibrium in AAV patrol scenario(simulation results above match analytic predictions)
� Increases in one encounter rate (e.g., λ2) cause equilibrium shift so
neighbors (e.g., 1 and 3) help more and agent (e.g., 2) helps less
� Emergence due to market coupling from network cycles
Results: Cyclic feedback(Pavlic and Passino 2011)
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Application: Patrol
Network Cooperation
Game
Conclusions
Stochastic Robotics
b
b
1
2 3
γi
C = V = {1, 2, 3}
Ci = Vi = {1, 2, 3} \ {i}
Yj = {j}
1
2 3
λ11
λ22 λ3
3
λ22 (encounters per second)
Processingrate
Ri(γ∗ i)fori∈{1,2,3}
A B C
0
1
2
3
4
5
6
7
0 1 2 3 4 5
Rmax
Ropt
λ11+λ22+λ33
λ11 = 0.6
λ11
λ33 = 1.7
λ33
R1(γ∗1)
R3(γ∗3)
R2(γ∗2)
A
λ22 < λ11 < λ33
γ∗2 > γ∗1 > γ∗3
B
λ11 < λ22 < λ33
γ∗1 > γ∗2 > γ∗3
C
λ11 < λ33 < λ22
γ∗1 > γ∗3 > γ∗2
Equilibrium in AAV patrol scenario(simulation results above match analytic predictions)
� Increases in one encounter rate (e.g., λ2) cause equilibrium shift so
neighbors (e.g., 1 and 3) help more and agent (e.g., 2) helps less
� Emergence due to market coupling from network cycles
Stochastic Robotics: Conclusions
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
� Stochastic programming can model failures and blend programs for
optimal performance (Napp and Klavins 2011)
Stochastic Robotics: Conclusions
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
� Stochastic programming can model failures and blend programs for
optimal performance (Napp and Klavins 2011)
� Chemical reaction networks provide reduced-order analysis
frameworks for design and control (Berman et al. 2011a,b; Matthey
et al. 2009)
Stochastic Robotics: Conclusions
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
� Stochastic programming can model failures and blend programs for
optimal performance (Napp and Klavins 2011)
� Chemical reaction networks provide reduced-order analysis
frameworks for design and control (Berman et al. 2011a,b; Matthey
et al. 2009)
� Stochastic hybrid system models combine stochastic behavioral
switching with continuous-time dynamics (Kumar et al. 2013)
Stochastic Robotics: Conclusions
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
� Stochastic programming can model failures and blend programs for
optimal performance (Napp and Klavins 2011)
� Chemical reaction networks provide reduced-order analysis
frameworks for design and control (Berman et al. 2011a,b; Matthey
et al. 2009)
� Stochastic hybrid system models combine stochastic behavioral
switching with continuous-time dynamics (Kumar et al. 2013)
� Numerical solvers stabilize adaptive mixed-Nash equilibria on
networks (Pavlic and Passino 2011)
Stochastic Robotics: Conclusions
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
� Stochastic programming can model failures and blend programs for
optimal performance (Napp and Klavins 2011)
� Chemical reaction networks provide reduced-order analysis
frameworks for design and control (Berman et al. 2011a,b; Matthey
et al. 2009)
� Stochastic hybrid system models combine stochastic behavioral
switching with continuous-time dynamics (Kumar et al. 2013)
� Numerical solvers stabilize adaptive mixed-Nash equilibria on
networks (Pavlic and Passino 2011)
� Acknowledgments:
Stochastic Robotics: Conclusions
Introduction
Guarded Command
Programming with Rates
Reaction Networks
Cooperative Transport
Cooperative Task
Processing
Conclusions
Stochastic Robotics
� Stochastic programming can model failures and blend programs for
optimal performance (Napp and Klavins 2011)
� Chemical reaction networks provide reduced-order analysis
frameworks for design and control (Berman et al. 2011a,b; Matthey
et al. 2009)
� Stochastic hybrid system models combine stochastic behavioral
switching with continuous-time dynamics (Kumar et al. 2013)
� Numerical solvers stabilize adaptive mixed-Nash equilibria on
networks (Pavlic and Passino 2011)
� Acknowledgments:
� Questions? Comments?