
Page 1: Integrating decision-theoretic planning and programming

Integrating decision-theoretic planning and programming for robot control in highly dynamic domains

Christian Fritz

Thesis, Final Presentation

Integrating decision-theoretic planning and programming for robot control in highly dynamic domains – p.1/32

Page 2: Integrating decision-theoretic planning and programming

Introduction

Goals:

• combine:
  – programming
  – decision-theoretic planning
  – on-line!
• extend planning with options
• evaluate in three diverse example domains:
  – grid world
  – RoboCup Simulation League
  – RoboCup Mid-Size League


Page 3: Integrating decision-theoretic planning and programming

Programming

ICPGOLOG

• based on the situation calculus
• extends basic GOLOG:
  + on-line: incremental, sensing (active and passive)
  + continuous change
  + concurrency
  + progression
  + probabilistic projection
  – nondeterminism
• problems:
  – decision making: explicit, missing utility theory
  – projection comparatively slow

Page 4: Integrating decision-theoretic planning and programming

Decision-Theoretic Planning

Markov Decision Processes (MDPs) are the standard model for decision-theoretic planning problems.

• Formally: M = ⟨S, A, T, R⟩, with
  – S a set of states
  – A a set of actions
  – T : S × A × S → [0, 1] a transition function
  – R : S → ℝ a reward function
• Here: fully observable MDPs
• Planning task: find an optimal policy, maximizing expected reward
• Note: S and A are usually finite!

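Since S and A are finite, an optimal policy can be computed by standard dynamic programming, e.g. value iteration. A minimal sketch in Python; the three-state chain, transition probabilities, and discount factor are made up for illustration and are not part of the thesis:

```python
# Value iteration for a tiny finite MDP M = <S, A, T, R>.
# T maps (s, a) to a list of (s', prob); R maps states to rewards.
S = ["s0", "s1", "goal"]
A = ["left", "right"]
T = {
    ("s0", "right"):   [("s1", 0.9), ("s0", 0.1)],
    ("s0", "left"):    [("s0", 1.0)],
    ("s1", "right"):   [("goal", 0.9), ("s0", 0.1)],
    ("s1", "left"):    [("s0", 1.0)],
    ("goal", "left"):  [("goal", 1.0)],
    ("goal", "right"): [("goal", 1.0)],
}
R = {"s0": 0.0, "s1": 0.0, "goal": 1.0}
gamma = 0.9  # discount factor (hypothetical)

V = {s: 0.0 for s in S}
for _ in range(100):  # synchronous Bellman backups until ~converged
    V = {s: R[s] + gamma * max(sum(p * V[s2] for s2, p in T[(s, a)])
                               for a in A)
         for s in S}

# The optimal policy is greedy with respect to the converged values.
policy = {s: max(A, key=lambda a: sum(p * V[s2] for s2, p in T[(s, a)]))
          for s in S}
```

Both states then choose "right", the action leading toward the rewarding absorbing goal state.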

Page 5: Integrating decision-theoretic planning and programming

Programming & Planning:DTGolog

• New GOLOG derivative DTGOLOG [Boutilier et al.]
• combines explicit agent programming with planning
• uses MDPs to model the planning problem:
  – S = situations
  – A = primitive actions
  – T = for each action a ∈ A, a list of outcomes and their respective probabilities
  – R : situations → ℝ
• applies decision-tree search to solve the MDP up to a given horizon

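The decision-tree search can be sketched as an expectimax recursion: the agent's choice maximizes over actions, nature's choice takes the expectation over an action's outcome distribution. The 1-D domain below is a hypothetical stand-in for a Golog program's primitive actions:

```python
# Decision-tree search to a fixed horizon, in the style of DTGolog.
def best_do(state, horizon):
    """Return (value, best_first_action) for `state` with `horizon` steps left."""
    if horizon == 0:
        return reward(state), None
    scored = []
    for a in actions(state):
        # nature's choice: expectation over the action's outcomes
        exp = sum(p * best_do(s2, horizon - 1)[0]
                  for s2, p in outcomes(state, a))
        scored.append((exp, a))
    exp, best = max(scored)            # agent's choice
    return reward(state) + exp, best

# Hypothetical domain: start at position 0, reward for being at position 3;
# each move succeeds with probability 0.8 and otherwise does nothing.
def actions(s):
    return ["left", "right"]
def outcomes(s, a):
    target = s + 1 if a == "right" else s - 1
    return [(target, 0.8), (s, 0.2)]
def reward(s):
    return 100 if s == 3 else 0

value, first = best_do(0, 3)
```

The recursion is exponential in the horizon, which is exactly why the thesis investigates options as a way to shorten the effective horizon.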

Page 6: Integrating decision-theoretic planning and programming

Programming & Planning:DTGolog

Disadvantages:

• off-line
• situations = states
  – infinite state space
  – inefficient


Page 7: Integrating decision-theoretic planning and programming

READYLOG

Contributions:

• re-added nondeterminism, with a decision-theoretic semantics → on-line decision-theoretic Golog
• added options to speed up MDP solution
• preprocessor to minimize on-line interpretation effort


Page 8: Integrating decision-theoretic planning and programming

Part I

Extending DTGolog with Options


Page 9: Integrating decision-theoretic planning and programming

Options? What are those?


Page 10: Integrating decision-theoretic planning and programming

Options

Idea:

• construct complex actions from primitive ones
• options: solutions to sub-MDPs
• generate models for them:
  – when is the option executable?
  – which outcomes can occur?
  – what are the probabilities of those outcomes?
  – what are the expected rewards and costs? (expected value)
• these models can then be used in planning

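One way to obtain such a model, sketched below, is to simulate the option's policy in its local MDP and record the exit states and accumulated costs. The corridor domain, slip probability, and unit action cost are illustrative assumptions, not the thesis' actual model-generation method:

```python
import random

def simulate_option(start, policy, step, is_exit, max_steps=1000):
    """Run the option once; return (exit_state, accumulated_cost)."""
    s, cost = start, 0.0
    while not is_exit(s) and cost < max_steps:
        s = step(s, policy(s))
        cost += 1.0  # unit action cost (hypothetical)
    return s, cost

def option_model(start, policy, step, is_exit, runs=5000):
    """Estimate the option's outcome distribution and expected cost."""
    counts, total_cost = {}, 0.0
    for _ in range(runs):
        exit_state, cost = simulate_option(start, policy, step, is_exit)
        counts[exit_state] = counts.get(exit_state, 0) + 1
        total_cost += cost
    probs = {s: n / runs for s, n in counts.items()}
    return probs, total_cost / runs

# Hypothetical corridor 0..4 with exits at both ends; the option's policy
# always goes right, but each step slips backward with probability 0.1.
random.seed(0)
def step(s, a):
    return s + 1 if random.random() < 0.9 else max(s - 1, 0)
probs, exp_cost = option_model(2, lambda s: "right", step,
                               is_exit=lambda s: s in (0, 4))
```

The resulting `probs` and `exp_cost` play the role of the generated outcome list and expected cost used by the planner.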

Page 11: Integrating decision-theoretic planning and programming

Integrating Options into Golog

How do we integrate options into DTGolog/ReadyLog?

• avoid the inconvenient identification “situations = states”
• instead, use two mappings:
  – situations → states (when ‘entering’ an option)
  – states → situations (when ‘leaving’ an option)
• options...
  – ...are solutions to local MDPs...
  – ...encapsulated into a stochastic procedure.
• stochastic procedures...
  – ...are procedures with an explicit model (preconditions/effects/costs);
  – ...replace stochastic actions;
  – ...can model options.

Page 12: Integrating decision-theoretic planning and programming

Generating Options

How do we generate options?

• define:
  – φ: a precondition (think: the states where the option is applicable)
  – β : exit states → value: pseudo-rewards for the local MDP
  – θ: an option skeleton, the one-step program to take in each step...
    • ...usually something like nondet([left, right, down, up]);
    • ...can contain ifs;
    • ...can build on options/stochastic procedures
  – and two mappings:
    • Φ : situations → states
    • Σ : states → situations
    • option_mapping(o, σ, Γ, ϕ)


Page 13: Integrating decision-theoretic planning and programming

Examples

• example policy:

proc(room1_2, [exogf_Update,
  while(is_possible(room1_2),
    [if(pos=[0,0], go_right, if(pos=[0,1], go_right,
     if(pos=[0,2], go_up,    if(pos=[1,0], go_right,
     if(pos=[1,1], go_right, if(pos=[1,2], go_right,
     if(pos=[2,0], go_down,  if(pos=[2,1], go_right,
     if(pos=[2,2], go_up, []))))))))),
     exogf_Update])]).

• example model (for state ‘position = (0,0)’):

opt_costs(room1_2, [(pos, [0,0])], 4.51650594972207).
opt_probability_list(room1_2, [(pos, [0,0])],
  [([(pos, [1,3])], 0.00012),
   ([(pos, [3,1])], 0.99987)]).

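During planning, a model like this lets the planner evaluate the whole option as a single step: weight the values of the exit states by the listed outcome probabilities and subtract the expected cost. A sketch using the numbers above; the exit-state values in `V` are hypothetical:

```python
# Expected value of taking option room1_2 in state pos = (0, 0),
# read off the generated model facts above.
outcomes = {(1, 3): 0.00012, (3, 1): 0.99987}   # from opt_probability_list
exp_cost = 4.5165                               # from opt_costs (rounded)

# Hypothetical values the planner has already computed for the exit states:
V = {(1, 3): 2.0, (3, 1): 10.0}

# Probability-weighted value of exit states, minus the option's expected cost.
q_option = sum(p * V[s] for s, p in outcomes.items()) - exp_cost
```

The planner can then compare `q_option` against the values of competing actions or options at the same choice point.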

Page 14: Integrating decision-theoretic planning and programming

Test Setting

[Figure: grid-world test setting with rooms 1–7, start position S, and goals G3, G4, and G11]

Page 15: Integrating decision-theoretic planning and programming

Experimental Results

[Figure: the three compared approaches: (a) full MDP planning (A), (b) heuristics (B), (c) options (C)]

Page 16: Integrating decision-theoretic planning and programming

Experimental Results

[Figure: planning time in seconds vs. Manhattan distance from start to goal (3–11) for approaches A, A’, B, B’, and C]

Page 17: Integrating decision-theoretic planning and programming

Part II

On-line Decision-Theoretic Golog for Unpredictable Domains


Page 18: Integrating decision-theoretic planning and programming

READYLOG: on-line DT planning

on-line:

• incremental
  – solve(plan-skeleton, horizon)
  – execute the returned policy
• sensing / exogenous events
  – problem:
    • dynamic environment (changes while thinking)
    • imperfect models → a policy can become invalid
  ⇒ execution monitoring:
    • program and policy coexistence
    • markers


Page 19: Integrating decision-theoretic planning and programming

Execution Monitoring Semantics

Trans(solve(p, h), s, δ′, s′) ≡
  ∃π, v, pr. BestDo(p, s, h, π, v, pr) ∧ δ′ = applyPol(π) ∧ s′ = s

BestDo(if(ϕ, p1, p2); p, s, h, π, v, pr) ≐
    ϕ[s] ∧ ∃π1. BestDo(p1; p, s, h, π1, v, pr) ∧ π = M(ϕ, true); π1
  ∨ ¬ϕ[s] ∧ ∃π2. BestDo(p2; p, s, h, π2, v, pr) ∧ π = M(ϕ, false); π2

Trans(applyPol(M(ϕ, v); π), s, δ′, s′) ≡ s′ = s ∧
  (  v = true  ∧  ϕ[s] ∧ δ′ = applyPol(π)
   ∨ v = false ∧ ¬ϕ[s] ∧ δ′ = applyPol(π)
   ∨ v = true  ∧ ¬ϕ[s] ∧ δ′ = nil
   ∨ v = false ∧  ϕ[s] ∧ δ′ = nil )

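Operationally, the applyPol semantics says: while executing the policy, every marker M(ϕ, v) re-evaluates ϕ in the current state and aborts the policy (δ′ = nil) if its truth value no longer matches the value v recorded at planning time. A minimal sketch; the soccer-flavored world model and condition are hypothetical:

```python
# Marker-based execution monitoring, following the applyPol semantics:
# a policy interleaves primitive actions with markers (cond, v); a marker
# rechecks cond at execution time and aborts if its truth value changed.
def apply_policy(policy, world):
    for kind, *rest in policy:
        if kind == "marker":
            cond, planned_value = rest
            if cond(world) != planned_value:
                return "invalid"        # corresponds to delta' = nil
        else:                           # kind == "action"
            action, = rest
            action(world)               # execute a primitive action
    return "done"

# Hypothetical scenario: the plan assumed we keep the ball,
# but the world changes mid-policy (imperfect model / dynamic environment).
world = {"have_ball": True}
def opponent_steals(w):
    w["have_ball"] = False
policy = [("marker", lambda w: w["have_ball"], True),
          ("action", opponent_steals),
          ("marker", lambda w: w["have_ball"], True),   # now violated
          ("action", lambda w: None)]
result = apply_policy(policy, world)
```

On "invalid", control falls back from the policy to the surrounding program, which may replan.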

Page 20: Integrating decision-theoretic planning and programming

READYLOG II

• options (...)
• preprocessor:
  – translates READYLOG functions, conditions, definitions, ... to Prolog code
  – creates successor state axioms from effect axioms
  – speed-up of about a factor of 16

[Figure: evaluation time in seconds vs. length of the situation term (200–2000), comparing uncompiled effect axioms with compiled successor state axioms]

Page 21: Integrating decision-theoretic planning and programming

Experimental Results


Page 22: Integrating decision-theoretic planning and programming

Experimental Results: SimLeague

• compared with ICPGOLOG (Norman’s results); planning time in seconds:

                ICPGOLOG   READYLOG
  goal shot       0.35       0.01
  direct pass     0.25       0.01

• speed-up due to the preprocessor

Example where these are combined (demo):

solve(nondet([ goalKick(OwnNumber),
               [pickBest(bestP, [2..11],
                  [directPass(OwnNumber, bestP, pass_NORMAL),
                   goalKick(bestP)])]
             ]), Horizon)


Page 23: Integrating decision-theoretic planning and programming

Experimental Results: MidSize


Page 24: Integrating decision-theoretic planning and programming

Experimental Results: MidSize, Code

solve(
  nondet([ kick(ownNumber, 40),
    dribble_or_move_kick(ownNumber),
    dribble_to_points(ownNumber),
    if(isKickable(ownNumber),
      pickBest(var_turnAngle, [-3.1, -2.3, 2.3, 3.1],
        [turn_relative(ownNumber, var_turnAngle, 2),
         nondet([ [intercept_ball(ownNumber, 1),
                   dribble_or_move_kick(ownNumber)],
                  [intercept_ball(numberByRole(supporter), 1),
                   dribble_or_move_kick(numberByRole(supporter))]
                ]) ]),
      nondet([ [intercept_ball(ownNumber, 1),
                dribble_or_move_kick(ownNumber)],
               intercept_ball(ownNumber, 0.0, 1)]) ) ]), 4)

proc(dribble_or_move_kick(Own),
  nondet([[dribble_to(Own, oppGoalBestCorner, 1)],
          [move_kick(Own, oppGoalBestCorner, 1)]])).

proc(dribble_to_points(Own),
  pickBest(var_pos,
    [[2.5, -1.25], [2.5, -2.5], [2.5, 0.0], [2.5, 2.5], [2.5, 1.25]],
    dribble_to(Own, var_pos, 1))).


Page 25: Integrating decision-theoretic planning and programming

Experimental Results: MidSize, Behavior

[Video stills: ball behavior when turning with the ball; move_kick/dribble; move/dribble/intercept]


Page 26: Integrating decision-theoretic planning and programming

Experimental Results: MidSize, Example: Situation


Page 27: Integrating decision-theoretic planning and programming

Experimental Results: MidSize, Example: Plans

[Figure: example plans, panels (d), (e), (f)]


Page 28: Integrating decision-theoretic planning and programming

Experimental Results: MidSize, Example: Teamplay

[Figure: teamplay example, panels (g), (h), (i)]


Page 29: Integrating decision-theoretic planning and programming

Experimental Results: MidSize, Example: Decision Tree

[Figure: decision tree alternating agent choices (move_kick, kick, turn, then intercept(me)/intercept(TM) followed by move_kick) and nature’s choices with outcome probabilities 0.8/0.2; node values include 10000, 4169, 4557, 4623, and 4776; action costs include −70, −12, and −7]


Page 30: Integrating decision-theoretic planning and programming

Experimental Results: MidSize, Computation

planning time in seconds:

                 examples   min     avg     max
  without ball     698      0.0     0.094   0.450
  with ball        117      0.170   0.536   2.110

• variance due to processor load on the robots
• qualitatively: fast enough for rudimentary soccer play


Page 31: Integrating decision-theoretic planning and programming

Conclusion

• On-line decision-theoretic Golog...
  – ...can be applied to highly dynamic domains with infinite/continuous state spaces,
  – ...can coexist with passive sensing,
  – ...motivates more sophisticated execution monitoring.
• Options...
  – ...can be added to decision-theoretic Golog,
  – ...provide good speed-ups,
  – ...rely on finite state spaces(!).
