Exploiting Coordination Locales in DisPOMDPs via Social Model Shaping
Pradeep Varakantham Singapore Management University
Joint work with J.Y. Kwak, M. Taylor, J. Marecki, P. Scerri, M. Tambe
Motivating Domains
Disaster Rescue; Sensor Networks
Characteristics of Domains: uncertainty; coordinating multiple agents; sequential decision making
Meeting the challenges
Problem: Multiple agents coordinating to perform multiple tasks in the presence of uncertainty
Solution: Represent as Distributed POMDPs and solve
NEXP-complete for the optimal solution
Approximate algorithm to dynamically exploit structure in interactions
Result: Vast improvement in performance over existing algorithms
Outline
Illustrative Domain
Model
Approach: Exploit dynamic structure in interactions
Results
Illustrative Domain
Multiple types of robots
Uncertainty in movements
Reward: saving victims; collisions; clearing debris
Maximize expected joint reward
Model
DisPOMDPs with Coordination Locales (DPCL)
Joint model: <S, A, Ω, P, R, O, Ag>
Global state represents completion of tasks
Agents are independent except in coordination locales (CLs)
Two types of CLs:
Same time CL (Ex: Agents colliding with each other)
Future time CL (Ex: Cleaner robot cleaning the debris assists rescue robot in reaching the goal)
Individual observability
Solving DPCLs with TREMOR
TREMOR: Teams REshaping of MOdels for Rapid execution
Two steps:
1. Branch and Bound search, with MDP based heuristics
2. Task Assignment evaluation, by computing policies for every agent; joint policy computation is performed only at CLs
1. Branch and Bound search
2. Task Assignment Evaluation
Until convergence of policies or maximum iterations:
1) Solve individual POMDPs
2) Identify potential coordination locales
3) Based on type and value of coordination, shape P and R of relevant individual agents: capture interactions; encourage/discourage interactions
4) Go to step 1
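The iteration above can be sketched as a small driver loop. This is only an illustrative skeleton, not the authors' implementation: the names `evaluate_assignment`, `solve`, `find_cls`, and `shape` are hypothetical, and the toy stubs stand in for a real single-agent POMDP solver and real model shaping.

```python
def evaluate_assignment(models, solve, find_cls, shape, max_iters=10):
    """Repeat solve -> identify CLs -> shape until policies stop changing.

    models   : one individual model per agent (any representation)
    solve    : hypothetical single-agent solver, model -> policy
    find_cls : hypothetical CL detector, (models, policies) -> list of CLs
    shape    : hypothetical shaping step, (model, agent_index, cls) -> model
    """
    policies = None
    for iteration in range(1, max_iters + 1):
        new_policies = [solve(m) for m in models]           # step 1
        if new_policies == policies:                        # policies converged
            return new_policies, iteration
        policies = new_policies
        cls = find_cls(models, policies)                    # step 2
        models = [shape(m, i, cls) for i, m in enumerate(models)]  # step 3
    return policies, max_iters                              # hit the iteration cap


# Toy stubs (assumptions for illustration): two robots want the same corridor.
def toy_solve(model):
    return "go" if model["collision_penalty"] < 1.0 else "wait"

def toy_find_cls(models, policies):
    return [("corridor", "go")] if all(p == "go" for p in policies) else []

def toy_shape(model, agent_index, cls):
    # Discourage the interaction by penalising only the second robot.
    if cls and agent_index == 1:
        return {"collision_penalty": model["collision_penalty"] + 1.0}
    return dict(model)

start_models = [{"collision_penalty": 0.0}, {"collision_penalty": 0.0}]
final_policies, iterations_used = evaluate_assignment(
    start_models, toy_solve, toy_find_cls, toy_shape
)
print(final_policies, iterations_used)
```

In the toy run, shaping the second robot's model is what breaks the symmetry; the loop stops once a round of solving returns the same policies as the previous round.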
Identifying potential CLs
CL = <State, Action>
Probability of CL occurring at a time step, T
Given starting belief
Standard belief update given policy
Policy over belief states
Probability of observing ω, in belief state “b”
Updating “b”
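The slide's formulas are not preserved in this transcript, so the sketch below uses the textbook POMDP belief update rather than anything specific to TREMOR. The dictionary layout (`P[s][a][s2]`, `O[s2][a][w]`), the function names, and the two-state example are assumptions made for illustration. `cl_probability` shows one way to roll the belief forward under a policy and read off the probability of a CL <state, action> at a given time step.

```python
def observation_probability(b, a, w, P, O):
    """Pr(w | b, a) = sum_{s2} O[s2][a][w] * sum_s P[s][a][s2] * b[s]."""
    return sum(
        O[s2][a][w] * sum(P[s][a][s2] * b[s] for s in b)
        for s2 in b
    )

def update_belief(b, a, w, P, O):
    """Standard Bayesian belief update after taking action a and observing w."""
    norm = observation_probability(b, a, w, P, O)
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {
        s2: O[s2][a][w] * sum(P[s][a][s2] * b[s] for s in b) / norm
        for s2 in b
    }

def cl_probability(b, policy, cl_state, cl_action, t, P, O, observations):
    """Probability of being in cl_state and taking cl_action at step t.

    policy is a callable from a belief (dict) to an action.
    """
    a = policy(b)
    if t == 0:
        return b[cl_state] if a == cl_action else 0.0
    total = 0.0
    for w in observations:
        pr_w = observation_probability(b, a, w, P, O)
        if pr_w == 0.0:
            continue  # unreachable branch
        b_next = update_belief(b, a, w, P, O)
        total += pr_w * cl_probability(
            b_next, policy, cl_state, cl_action, t - 1, P, O, observations
        )
    return total

# Hypothetical two-state example: a robot moving from "left" to "right".
P = {"left": {"move": {"left": 0.2, "right": 0.8}},
     "right": {"move": {"left": 0.0, "right": 1.0}}}
O = {"left": {"move": {"see_left": 0.9, "see_right": 0.1}},
     "right": {"move": {"see_left": 0.1, "see_right": 0.9}}}
b0 = {"left": 1.0, "right": 0.0}

pr_see_right = observation_probability(b0, "move", "see_right", P, O)
b1 = update_belief(b0, "move", "see_right", P, O)
p_cl = cl_probability(b0, lambda b: "move", "right", "move", 1, P, O,
                      ["see_left", "see_right"])
```

With a policy that always moves, the probability of the CL <"right", "move"> at step 1 equals the predicted probability of being in "right" after one move, which is a quick sanity check on the recursion.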
Type of CL
STCL, if there exist “s” and “a” for which the transition/reward function is not decomposable:
$P(s,a,s') \neq \prod_{1 \le i \le N} P((s_g,s_i),a_i,(s_g',s_i'))$ OR $R(s,a,s') \neq \sum_{1 \le i \le N} R((s_g,s_i),a_i,(s_g',s_i'))$
FTCL, if completion of a task (global state) by an agent at t’ affects transitions/rewards of other agents at t
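The STCL test on the transition function can be sketched as a direct comparison between the joint probability and the product of individual probabilities. The example below is a simplification under stated assumptions: the global-state component s_g is dropped, states and actions are tuples with one entry per agent, and the function name `is_stcl` and the corridor example are hypothetical.

```python
import math
from itertools import product

def is_stcl(joint_P, local_Ps, s, a, tol=1e-9):
    """True if, for some s2, P(s, a, s2) differs from the product of local terms.

    joint_P  : {(s, a): {s2: prob}} with s, a, s2 as per-agent tuples
    local_Ps : one {(s_i, a_i): {s2_i: prob}} table per agent
    """
    joint_row = joint_P[(s, a)]
    local_rows = [Pi[(si, ai)] for Pi, si, ai in zip(local_Ps, s, a)]
    # Check every next state that either the joint or the product model allows.
    candidates = set(joint_row) | set(product(*local_rows))
    for s2 in candidates:
        p_product = math.prod(
            row.get(s2i, 0.0) for row, s2i in zip(local_rows, s2)
        )
        if abs(joint_row.get(s2, 0.0) - p_product) > tol:
            return True
    return False

# Hypothetical example: robots at cells "a" and "b" on either side of cell "c".
local_P1 = {("a", "east"): {"c": 1.0}}
local_P2 = {("b", "west"): {"c": 1.0}, ("b", "stay"): {"b": 1.0}}
joint_P = {
    # Both try to enter "c": they collide and stay where they were.
    (("a", "b"), ("east", "west")): {("a", "b"): 1.0},
    # Only robot 1 moves: no interaction.
    (("a", "b"), ("east", "stay")): {("c", "b"): 1.0},
}

collide = is_stcl(joint_P, [local_P1, local_P2], ("a", "b"), ("east", "west"))
independent = is_stcl(joint_P, [local_P1, local_P2], ("a", "b"), ("east", "stay"))
print(collide, independent)
```

The same comparison applies to the reward function, with a sum over agents in place of the product.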
Shaping Model (STCL)
Shaping transition function
Shaping reward function
Joint transition probability when CL occurs
New transition probability for agent “i”
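The shaping formulas themselves are missing from this transcript. One plausible reading of the labels, offered purely as an assumption, is that agent i's new transition probability mixes the joint-derived probability (used when the CL occurs) with its original individual probability, weighted by the probability of the CL occurring. The function name `shape_transition` and the numbers below are hypothetical.

```python
def shape_transition(P_i, P_cl, cl_prob):
    """Mix two next-state distributions for one (state, action) pair of agent i.

    P_i     : original individual distribution {s2: prob}
    P_cl    : distribution derived from the joint model when the CL occurs
    cl_prob : probability that the CL occurs (assumed to be given)
    """
    if not 0.0 <= cl_prob <= 1.0:
        raise ValueError("cl_prob must lie in [0, 1]")
    next_states = set(P_i) | set(P_cl)
    return {
        s2: cl_prob * P_cl.get(s2, 0.0) + (1.0 - cl_prob) * P_i.get(s2, 0.0)
        for s2 in next_states
    }

# Hypothetical numbers: alone, the robot always reaches the corridor; if the
# collision CL occurs (probability 0.3), it is pushed back to its start cell.
shaped = shape_transition({"corridor": 1.0}, {"start": 1.0}, 0.3)
print(shaped)
```

Because the result is a convex combination of two distributions, it is itself a valid distribution, so the shaped individual model can be handed back to a single-agent solver unchanged.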
Results
Benchmark algorithms: Independent POMDPs; Memory Bounded Dynamic Programming (MBDP)
Criteria: decision quality; run-time
Parameters: (i) agents; (ii) CLs; (iii) states; (iv) horizon
State space
Agents
Coordination Locales
Time Horizon
Related work
Existing Research
DEC-MDPs: assuming individual or collective full observability; task allocation and dependencies as input
DEC-POMDPs: JESP; MBDP; exploiting independence in transition/reward/observation
Model Shaping: Guestrin and Gordon, 2002
Conclusion
DPCL, a specialization of Distributed POMDPs
TREMOR exploits the presence of few CLs in domains
TREMOR depends on single agent POMDP solvers
Results: TREMOR outperformed DisPOMDP algorithms, except in tightly coupled small problems
Questions?
Same Time CL (STCL)
There is an STCL, if
Transition function not decomposable: $P(s,a,s') \neq \prod_{1 \le i \le N} P((s_g,s_i),a_i,(s_g',s_i'))$, OR
Observation function not decomposable: $O(s',a,o) \neq \prod_{1 \le i \le N} O(o_i,a_i,(s_g',s_i'))$, OR
Reward function not decomposable: $R(s,a,s') \neq \sum_{1 \le i \le N} R((s_g,s_i),a_i,(s_g',s_i'))$
Ex: Two robots colliding in a narrow corridor
Future Time CL (FTCL)
Actions of one agent at “t’” can affect transitions OR observations OR rewards of other agents at “t”:
$P((s_g^t,s_i^t),a_i^t,({s'}_g^t,{s'}_i^t) \mid a_j^{t'}) \neq P((s_g^t,s_i^t),a_i^t,({s'}_g^t,{s'}_i^t))$, for $t' < t$
$R((s_g^t,s_i^t),a_i^t,({s'}_g^t,{s'}_i^t) \mid a_j^{t'}) \neq R((s_g^t,s_i^t),a_i^t,({s'}_g^t,{s'}_i^t))$, for $t' < t$
$O(\omega_i^t,a_i^t,({s'}_g^t,{s'}_i^t) \mid a_j^{t'}) \neq O(\omega_i^t,a_i^t,({s'}_g^t,{s'}_i^t))$, for $t' < t$
Ex: Clearing of debris assists rescue robots in getting to victims faster