
Page 1:

Multiagent Planning with Factored MDPs

Carlos Guestrin

Stanford University

Page 2:

Collaborative Multiagent Planning

Search and rescue, factory management, supply chain, firefighting, network routing, air traffic control

Long-term goals
Multiple agents
Coordinated decisions
→ Collaborative multiagent planning

Page 3:

Exploiting Structure

Real-world problems have:

Hundreds of objects, googols of states

Real-world problems have structure!

Approach: exploit the structured representation to obtain an efficient, approximate solution

Page 4:

[Game screenshot: peasant, footman, building.]

Real-time strategy game:
Peasants collect resources and build
Footmen attack enemies
Buildings train peasants and footmen

Page 5:

Joint Decision Space

Markov Decision Process (MDP) representation:

State space: joint state x of the entire system
Action space: joint action a = {a1, …, an} for all agents
Reward function: total reward R(x,a)
Transition model: dynamics of the entire system, P(x′|x,a)

Page 6:

Policy

Policy: π(x) = a, the joint action a taken by all agents at state x.

π(x0) = both peasants get wood
π(x1) = one peasant gets gold, the other builds the barracks
π(x2) = peasants get gold, footmen attack

Page 7:

Value of Policy

Value: Vπ(x) is the expected long-term reward starting from x.

Starting from x0:

Vπ(x0) = E[ R(x0) + γ R(x1) + γ² R(x2) + γ³ R(x3) + γ⁴ R(x4) + ⋯ ]

Future rewards are discounted by γ ∈ [0,1).

[Diagram: trajectory x0 → x1 → x2 → x3 → x4 under π, with rewards R(xt) collected along the way and branching to alternative successors x1′, x1′′.]

Page 8:

Optimal Long-term Plan

Optimal policy: π*(x)
Optimal value function: V*(x)

Bellman equations:

V*(x) = max_a [ R(x,a) + γ Σ_{x′} P(x′|x,a) V*(x′) ]
Q*(x,a) = R(x,a) + γ Σ_{x′} P(x′|x,a) V*(x′)

Optimal policy: π*(x) = argmax_a Q*(x,a)
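A minimal value-iteration sketch of the Bellman backup above; the tiny MDP here (its rewards and transition probabilities) is invented purely for illustration:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
n_states, n_actions, gamma = 2, 2, 0.9
R = np.array([[0.0, 1.0],     # R[x, a]
              [2.0, 0.0]])
P = np.array([[[0.8, 0.2],    # P[x, a, x'] = P(x' | x, a)
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.9, 0.1]]])

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman backup: Q(x,a) = R(x,a) + gamma * sum_x' P(x'|x,a) V(x')
    Q = R + gamma * (P @ V)
    V_new = Q.max(axis=1)            # V(x) = max_a Q(x,a)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)            # pi*(x) = argmax_a Q(x,a)
print(V, policy)
```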

Page 9:

Solving an MDP

Policy iteration [Howard ’60, Bellman ‘57]

Value iteration [Bellman ‘57]

Linear programming [Manne ’60]

Solve Bellman equation

Optimal value V*(x)

Optimal policy *(x)

Many algorithms solve the Bellman equations:

Page 10:

LP Solution to MDP

Value computed by linear programming [Manne ’60]:

minimize:   Σ_x V(x)
subject to: V(x) ≥ Q(x,a)   for all x, a

One variable V(x) for each state; one constraint for each state x and action a; polynomial-time solution.
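A sketch of this exact LP on the same kind of tiny, made-up MDP, using scipy.optimize.linprog; the constraint V(x) ≥ Q(x,a) is expanded with the Bellman definition of Q from the previous slide and rewritten as an upper-bound constraint:

```python
import numpy as np
from scipy.optimize import linprog

n_states, n_actions, gamma = 2, 2, 0.9
R = np.array([[0.0, 1.0], [2.0, 0.0]])            # R[x, a], illustrative
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.9, 0.1]]])          # P[x, a, x']

# Variables: V(x) for each state. Objective: minimize sum_x V(x).
c = np.ones(n_states)

# Constraint V(x) >= R(x,a) + gamma * sum_x' P(x'|x,a) V(x')
# rewritten as: (gamma * P[x,a,:] - e_x) @ V <= -R(x,a)
A_ub, b_ub = [], []
for x in range(n_states):
    for a in range(n_actions):
        row = gamma * P[x, a, :].copy()
        row[x] -= 1.0
        A_ub.append(row)
        b_ub.append(-R[x, a])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_states)
print(res.x)   # optimal value function V*(x)
```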

Page 11:

Planning under Bellman’s “Curse”

Planning is Polynomial in #states and #actions

#states exponential in number of variables

#actions exponential in number of agents

Efficient approximation by exploiting structure!

Page 12:

Structure in Representation: Factored MDP

State, dynamics, decisions, rewards.

State variables: Peasant, Footman, Enemy, Gold
Decision variables: A_Peasant, A_Build, A_Footman
Dynamics: a dynamic Bayesian network over time slices t and t+1, with next-state variables P′, F′, E′, G′, e.g. P(F′ | F, G, A_Build, A_Footman)
Reward: R

Complexity of representation: exponential in the number of parents (worst case). [Boutilier et al. ’95]
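A minimal sketch of how such a factored transition model can be stored: one conditional table per next-state variable, indexed only by that variable's parents. The variable names loosely follow the slide; all probabilities are invented:

```python
from itertools import product

# Factored transition model: one conditional table per next-state variable,
# indexed only by that variable's parents in the DBN.
# Here: P(F' | F, A_Footman) and P(G' | G, A_Peasant), with binary variables.
footman_cpt = {
    # (F, A_Footman) -> P(F' = 1)
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.9, (1, 1): 0.7,
}
gold_cpt = {
    # (G, A_Peasant) -> P(G' = 1)
    (0, 0): 0.0, (0, 1): 0.5,
    (1, 0): 1.0, (1, 1): 1.0,
}

def joint_transition_prob(x, a, x_next):
    """P(x'|x,a) factorizes as a product of per-variable CPT entries."""
    f, g = x
    af, ap = a
    fn, gn = x_next
    pf = footman_cpt[(f, af)] if fn == 1 else 1.0 - footman_cpt[(f, af)]
    pg = gold_cpt[(g, ap)] if gn == 1 else 1.0 - gold_cpt[(g, ap)]
    return pf * pg

# Sanity check: probabilities over all next states sum to 1.
total = sum(joint_transition_prob((1, 0), (1, 1), xn) for xn in product([0, 1], repeat=2))
print(total)  # 1.0
```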

Page 13:

Structured Value Function?

Does a factored MDP imply structure in V*? Almost!

[DBN unrolled over time slices t, t+1, t+2, t+3 for variables X, Y, Z, with reward R at each slice.]

A structured V yields a good approximate value function.

Page 14:

Structured Value Functions

Linear combination of restricted-domain basis functions [Bellman et al. ’63] [Tsitsiklis & Van Roy ’96] [Koller & Parr ’99,’00] [Guestrin et al. ’01]:

V(x) ≈ Σ_i w_i h_i(x)

Each h_i is the status of a small part of the complex system, e.g.: state of footman and enemy; status of barracks; status of barracks and state of footman.

Structured V yields structured Q: Q ≈ Σ_i Q_i, where each Q_i depends on only a small number of Ai’s and Xj’s.

Must find w giving a good approximate value function.
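A small sketch of such a linear value function, with basis functions that each inspect only a small part of the state; the basis functions and weights below are invented for illustration:

```python
# Linear value function approximation: V(x) = sum_i w_i * h_i(x),
# where each basis function h_i inspects only a small part of the state.
state = {"footman_health": 2, "enemy_health": 1, "barracks_built": 1}

def h_footman_enemy(x):      # looks at footman and enemy only
    return x["footman_health"] - x["enemy_health"]

def h_barracks(x):           # looks at the barracks only
    return float(x["barracks_built"])

basis = [h_footman_enemy, h_barracks]
w = [0.7, 1.3]               # weights to be chosen by the (approximate) LP

def v_approx(x):
    return sum(wi * hi(x) for wi, hi in zip(w, basis))

print(v_approx(state))
```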

Page 15:

Approximate LP Solution

Substitute V(x) → Σ_i w_i h_i(x) and Q(x,a) → Σ_i Q_i(x,a) in the exact LP [Schweitzer and Seidmann ’85]:

minimize:   Σ_x Σ_i w_i h_i(x)
subject to: Σ_i w_i h_i(x) ≥ Σ_i Q_i(x,a)   for all x, a

One variable w_i for each basis function: polynomial number of LP variables.
One constraint for every state and action: exponentially many LP constraints.

Page 16:

Representing Exponentially Many Constraints

subject to: Σ_i w_i h_i(x) ≥ Q(x,a)   for all x, a

Exponentially many linear constraints = one nonlinear constraint:

subject to: 0 ≥ max_{x,a} [ Q(x,a) − Σ_i w_i h_i(x) ]

[Guestrin, Koller, Parr ’01]

But this is a maximization over an exponentially large space.

Page 17:

Variable Elimination

Goal: compute max_{x,a} [ Q(x,a) − Σ_i w_i h_i(x) ].

Use variable elimination to maximize over the state space [Bertele & Brioschi ’72]. Example, with factors f1(A,B), f2(A,C), f3(C,D), f4(B,D):

max_{A,B,C,D} [ f1(A,B) + f2(A,C) + f3(C,D) + f4(B,D) ]
  = max_{A,B,C} [ f1(A,B) + f2(A,C) + max_D ( f3(C,D) + f4(B,D) ) ]
  = max_{A,B,C} [ f1(A,B) + f2(A,C) + g1(B,C) ]

Here we need only 23 instead of 63 sum operations.

The maximization is exponential only in the largest factor; tree-width characterizes the complexity. Tree-width is a graph-theoretic measure of “connectedness” that arises in many settings: integer programming, Bayesian networks, computational geometry, …

This works here because the value function is structured: Q(A1,…,Am, X1,…,Xn) − Σ_i w_i h_i(x) is a sum of terms, each depending on only a small number of Ai’s and Xj’s.
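A compact sketch of max-variable-elimination over additive factors, mirroring the f1, …, f4 example above; the variables are binary and the factor values are invented:

```python
from itertools import product

def table(vars_, fn):
    """Build a factor table {assignment_tuple: value} over binary variables."""
    return {vals: fn(*vals) for vals in product([0, 1], repeat=len(vars_))}

# Additive factors f1..f4 from the example above (the numeric values are invented).
factors = [
    (("A", "B"), table(("A", "B"), lambda a, b: 2 * a + b)),
    (("A", "C"), table(("A", "C"), lambda a, c: a - c)),
    (("C", "D"), table(("C", "D"), lambda c, d: 3 * c * d)),
    (("B", "D"), table(("B", "D"), lambda b, d: b + 2 * d)),
]

def eliminate_max(factors, order):
    """Compute max over all variables of the sum of factors, one variable at a time."""
    for var in order:
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        new_vars = tuple(sorted({v for vs, _ in touching for v in vs} - {var}))
        new_tab = {}
        for vals in product([0, 1], repeat=len(new_vars)):
            assign = dict(zip(new_vars, vals))
            scores = []
            for x in (0, 1):
                assign[var] = x
                scores.append(sum(t[tuple(assign[v] for v in vs)] for vs, t in touching))
            new_tab[vals] = max(scores)      # new factor over the remaining variables
        factors = rest + [(new_vars, new_tab)]
    return sum(t[()] for _, t in factors)    # all variables eliminated: constants remain

# Eliminating D first reproduces the g1(B, C) step shown above.
print(eliminate_max(factors, order=["D", "C", "B", "A"]))
```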

Page 18:

Representing the Constraints

Use variable elimination to represent the constraints:

subject to: 0 ≥ max_{x,a} [ Q(x,a) − Σ_i w_i h_i(x) ]

For the example above this becomes:

0 ≥ max_{A,B,C} [ f1(A,B) + f2(A,C) + max_D ( f3(C,D) + f4(B,D) ) ]

Introduce new LP variables g1(B,C) with constraints g1(B,C) ≥ f3(C,D) + f4(B,D) for all B, C, D, and require

0 ≥ f1(A,B) + f2(A,C) + g1(B,C)   for all A, B, C.

The number of constraints is exponentially smaller!

Page 19:

Understanding Scaling Properties

Explicit LP: 2^n constraints. Factored LP: (n+1−k)·2^k constraints, where k = tree-width.

[Plot: number of constraints vs. number of variables (2–16), comparing the explicit LP with the factored LP for k = 3, 5, 8, 10, 12.]

Page 20:

Network Management Problem

Ring

Star

Ring of Rings

k-grid

Computer status = {good, dead, faulty}

Dead neighbors increase dying probability

Computer runs processes

Reward for successful processes

Each SysAdmin takes local action = {reboot, not reboot }

Problem with n machines: 9^n states, 2^n actions
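A small sketch of the kind of per-machine transition model described above, in which dead neighbors increase the dying probability; only the structure follows the slide, the probabilities themselves are invented:

```python
import random

STATUSES = ("good", "faulty", "dead")

def next_status(status, dead_neighbors, reboot):
    """Sample a machine's next status; dead neighbors increase the dying probability."""
    if reboot:
        return "good"                                     # rebooting restores the machine
    p_die = min(0.05 + 0.3 * dead_neighbors, 0.95)        # illustrative numbers
    p_fault = 0.1
    r = random.random()
    if status == "dead" or r < p_die:
        return "dead"
    if status == "faulty" or r < p_die + p_fault:
        return "faulty"
    return "good"

# One step of a 4-machine ring: each SysAdmin chooses reboot / not reboot locally.
ring = ["good", "faulty", "dead", "good"]
actions = [False, False, True, False]
dead = [int(ring[(i - 1) % 4] == "dead") + int(ring[(i + 1) % 4] == "dead") for i in range(4)]
print([next_status(s, d, a) for s, d, a in zip(ring, dead, actions)])
```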

Page 21:

Running Time

[Plot: running time (s) vs. number of machines (0–12): Ring, exact solution; Ring, single basis (k = 4); Star, single basis (k = 4); 3-grid, single basis (k = 5); Star, pair basis (k = 4); Ring, pair basis (k = 8). k = tree-width.]

Page 22:

Summary of Algorithm

1. Pick local basis functions hi

2. Factored LP computes value function

3. Policy is argmax_a of Q

Page 23:

Large-scale Multiagent Coordination

An efficient algorithm computes V. The action at state x is argmax_a Q(x,a).

But the number of actions is exponential, and this assumes complete observability and full communication.

Page 24:

Distributed Q Function

Q(A1,…,A4, X1,…,X4) = Q1(A1, A4, X1, X4) + Q2(A1, A2, X1, X2) + Q3(A2, A3, X2, X3) + Q4(A3, A4, X3, X4)

Each agent maintains a part of the Q function: a distributed Q function. [Guestrin, Koller, Parr ’02]

Page 25:

Multiagent Action Selection

Given the distributed Q function Q1(A1, A4, X1, X4) + Q2(A1, A2, X1, X2) + Q3(A2, A3, X2, X3) + Q4(A3, A4, X3, X4): instantiate the current state x, then compute the maximal action argmax_a.

Page 26:

Instantiate Current State x

Instantiating the current state x reduces each term:

Q1(A1, A4, X1, X4) → Q1(A1, A4)
Q2(A1, A2, X1, X2) → Q2(A1, A2)
Q3(A2, A3, X2, X3) → Q3(A2, A3)
Q4(A3, A4, X3, X4) → Q4(A3, A4)

Limited observability: agent i only observes the variables in Qi (e.g., agent 2 observes only X1 and X2).

Page 27:

Multiagent Action Selection

With the state instantiated, the agents must jointly compute the maximal action:

argmax_a [ Q1(A1, A4) + Q2(A1, A2) + Q3(A2, A3) + Q4(A3, A4) ]

Page 28:

Coordination Graph

Use variable elimination for the maximization:

max_{A1,A2,A3,A4} [ Q1(A1,A4) + Q2(A1,A2) + Q3(A2,A3) + Q4(A3,A4) ]
  = max_{A1,A2,A4} [ Q1(A1,A4) + Q2(A1,A2) + max_{A3} ( Q3(A2,A3) + Q4(A3,A4) ) ]
  = max_{A1,A2,A4} [ Q1(A1,A4) + Q2(A1,A2) + g1(A2,A4) ]

Value of the optimal A3 action:
A2 = Attack, A4 = Attack → 5
A2 = Attack, A4 = Defend → 6
A2 = Defend, A4 = Attack → 8
A2 = Defend, A4 = Defend → 12

Limited communication suffices for the optimal action choice; the communication bandwidth equals the tree-width of the coordination graph.
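A sketch of this action-selection step: each Qi is a small table over two agents' actions (the table values are invented), and agents are maxed out one at a time exactly as in the elimination above, with a back-substitution pass to recover each agent's choice:

```python
from itertools import product

ACTIONS = ("Attack", "Defend")

def q_table(vals):
    """Table over two agents' actions, in the order (AA, AD, DA, DD)."""
    return dict(zip(product(ACTIONS, repeat=2), vals))

# Local Q functions after instantiating the state x (values are invented).
Q = [
    (("A1", "A4"), q_table([1, 0, 2, 1])),
    (("A1", "A2"), q_table([0, 3, 1, 2])),
    (("A2", "A3"), q_table([2, 1, 0, 4])),
    (("A3", "A4"), q_table([1, 2, 3, 0])),
]

def best_joint_action(factors, order):
    """Max out agents one at a time (variable elimination), then back-substitute argmaxes."""
    choices = []
    for agent in order:
        touching = [f for f in factors if agent in f[0]]
        rest = [f for f in factors if agent not in f[0]]
        cond = tuple(sorted({v for vs, _ in touching for v in vs} - {agent}))
        new_tab, arg_tab = {}, {}
        for vals in product(ACTIONS, repeat=len(cond)):
            assign = dict(zip(cond, vals))
            scored = []
            for act in ACTIONS:
                assign[agent] = act
                scored.append((sum(t[tuple(assign[v] for v in vs)] for vs, t in touching), act))
            new_tab[vals], arg_tab[vals] = max(scored)
        choices.append((agent, cond, arg_tab))
        factors = rest + [(cond, new_tab)]
    # Recover the optimal action of each agent in reverse elimination order.
    joint = {}
    for agent, cond, arg_tab in reversed(choices):
        joint[agent] = arg_tab[tuple(joint[v] for v in cond)]
    return joint

print(best_joint_action(Q, order=["A3", "A1", "A2", "A4"]))
```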

Page 29:

Coordination Graph Example

[Example coordination graph over agents A1 through A11.]

Trees don’t increase communication requirements

Cycles require graph triangulation

Page 30:

Unified View: Function Approximation ⇔ Multiagent Coordination

Pair basis: Q1(A1, A4, X1, X4) + Q2(A1, A2, X1, X2) + Q3(A2, A3, X2, X3) + Q4(A3, A4, X3, X4), giving a coordination graph with edges between neighboring agents.

Single-agent basis: Q1(A1, X1) + Q2(A2, X2) + Q3(A3, X3) + Q4(A4, X4), giving a coordination graph with no edges.

The factored MDP and value function representations induce the communication and coordination structure: a tradeoff between communication and accuracy.

Page 31:

How good are the policies?

SysAdmin problem

Power grid problem [Schneider et al. ‘99]

Page 32:

SysAdmin Ring - Quality of Policies

[Plot: value per machine vs. number of machines (0–10) on the SysAdmin ring: utopic maximum value, exact solution, factored LP with single basis, constraint sampling with single basis, and constraint sampling with pair basis.]

Page 33:

Power Grid – Factored Multiagent

Lower is better!

[Guestrin, Lagoudakis, Parr ‘02]

[Bar chart: cost on power grid configurations A, B, C, D, and Grid for DR [Schneider et al. ’99], DVF [Schneider et al. ’99], Factored Multiagent with no communication, and Factored Multiagent with pairwise communication.]

Page 34:

Summary of Algorithm

1. Pick local basis functions hi

2. Factored LP computes value function

3. Coordination graph computes argmax_a of Q

Page 35:

Planning Complex Environments

When faced with a complex problem, exploit structure: for planning and for action selection.

Given a new problem, replanning from scratch means a different MDP and a new planning problem. Huge problems are intractable, even with the factored LP.

Page 36:

Generalizing to New Problems

Solve Problem 1, Problem 2, …, Problem n, then obtain a good solution to Problem n+1.

The MDPs are different: different sets of states, actions, rewards, transitions, … But many problems are “similar”.

Page 37:

Generalization with Relational MDPs

“Similar” domains have similar “types” of objects. Exploit the similarities by computing generalizable value functions: Relational MDP → generalization.

Avoid the need to replan; tackle larger problems. [Guestrin, Koller, Gearhart, Kanodia ’03]

Page 38:

Relational Models and MDPs

Classes: Peasant, Gold, Wood, Barracks, Footman, Enemy, …
Relations: Collects, Builds, Trains, Attacks, …
Instances: Peasant1, Peasant2, Footman1, Enemy1, …

Page 39:

Relational MDPs

Class-level transition probabilities depend on: attributes, actions, and attributes of related objects. Class-level reward function.

[Schema fragment: Peasant’s P′ depends on P and A_P; Gold’s G′ depends on G through the Collects relation.]

Very compact representation: it does not depend on the number of objects.

Page 40:

Tactical Freecraft: Relational Schema

[Schema: Footman has Health, action A_Footman, and relation my_enemy; Enemy has Health and Count; reward R.]

The Enemy’s health depends on the number of footmen attacking it.
A Footman’s health depends on its enemy’s health.

Page 41:

World is a Large Factored MDP

Instantiation (world): the number of instances of each class and the links between instances give a well-defined factored MDP.

Relational MDP + links between objects + number of objects → factored MDP.

Page 42:

World with 2 Footmen and 2 Enemies

[DBN for the instantiated world over F1.Health, F1.A, F1.H′, E1.Health, E1.H′, F2.Health, F2.A, F2.H′, E2.Health, E2.H′, with rewards R1 and R2, for objects Footman1, Enemy1, Footman2, Enemy2.]

Page 43:

World is a Large Factored MDP

Instantiate the world, obtain a well-defined factored MDP, and use the factored LP for planning. But so far we have gained nothing!

Relational MDP + links between objects + number of objects → factored MDP.

Page 44:

Class-level Value Functions

V(F1.H, E1.H, F2.H, E2.H) = VF1(F1.H, E1.H) + VE1(E1.H) + VF2(F2.H, E2.H) + VE2(E2.H)

Units are interchangeable: VF1 and VF2 become a single class-level VF, and VE1 and VE2 a single class-level VE. At state x, each footman still makes a different contribution to V.

Given the class-level value functions VC, we can instantiate a value function for any world.
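A sketch of instantiating class-level value functions in a new world: one value table per class, summed over all instances of that class. The class value tables here are invented for illustration:

```python
# Class-level value functions, learned once (values invented for illustration).
# V_F depends on the footman's and its enemy's health; V_E on the enemy's health.
V_F = {(f, e): 2.0 * f - 1.0 * e for f in range(3) for e in range(3)}
V_E = {e: -3.0 * e for e in range(3)}

def instantiate_value(world_state):
    """V(x) = sum over footmen of V_F(F.H, F.my_enemy.H) + sum over enemies of V_E(E.H)."""
    total = 0.0
    for footman in world_state["footmen"]:
        total += V_F[(footman["health"], footman["enemy_health"])]
    for enemy in world_state["enemies"]:
        total += V_E[enemy["health"]]
    return total

# Works for any number of objects, e.g. 2 footmen vs. 2 enemies, with no replanning.
world_2v2 = {
    "footmen": [{"health": 2, "enemy_health": 1}, {"health": 1, "enemy_health": 2}],
    "enemies": [{"health": 1}, {"health": 2}],
}
print(instantiate_value(world_2v2))
```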

Page 45:

Computing Class-level VC

For each world ω, write the value and Q functions as sums of class-level components over the objects in that world:

V^ω(x) = Σ_C Σ_{o ∈ C[ω]} V_C(x_o)      Q^ω(x,a) = Σ_C Σ_{o ∈ C[ω]} Q_C(x_o, a_o)

minimize:   Σ_x V^ω(x)
subject to: V^ω(x) ≥ Q^ω(x,a)   for all ω, x, a

The constraints for each world are represented by the factored LP, but the number of worlds is exponential (or infinite).

Page 46:

Sampling Worlds

Many worlds are similar, so sample a set I of worlds. The constraints “for all ω, x, a” become “for all ω ∈ I, x, a”.

Page 47:

Theorem

Exponentially (infinitely) many worlds: do we need exponentially many samples? No!

With a number of sampled worlds that depends on ε, δ, and Rmax (the maximum class reward), but not on the number of worlds, the resulting value function is within ε of the class-level solution optimized for all worlds, with probability at least 1−δ.

Proof method related to [de Farias, Van Roy ’02].

Page 48:

Learning Classes of Objects

[Bar charts: per-object value functions V1 and V2 over computer status {Good, Faulty, Dead}, for different sampled worlds.]

Plan for the sampled worlds separately.
Find regularities between worlds: objects with similar value functions belong to the same class.
Decision tree regression was used in the experiments.

Page 49:

Summary of Algorithm

1. Model the domain as a Relational MDP
2. Sample a set of worlds
3. Factored LP computes a class-level value function for the sampled worlds
4. Reuse the class-level value function in the new world
5. Coordination graph computes argmax_a of Q

Page 50:

Experimental Results

SysAdmin problem

Page 51:

Generalizing to New Problems

[Bar chart: estimated policy value per agent for the Ring, Star, and Three Legs topologies: utopic maximum value, object-based value with complete replanning, and class-based value function with no replanning.]

Page 52:

Learning Classes of Objects

[Bar chart: max-norm error of the value function for the Ring, Star, and Three Legs topologies, with no class learning vs. learnt classes.]

Page 53:

Classes of Objects Discovered

Learned 3 classes: Server, Intermediate, and Leaf.

Page 54:

Strategic

World: 2 peasants, 2 footmen, 1 enemy, gold, wood, barracks; reward for a dead enemy; about 1 million state/action pairs.

Algorithm: solve with the factored LP; coordination graph for action selection.

Page 55:

Strategic

World: 9 peasants, 3 footmen, 1 enemy, gold, wood, barracks; reward for a dead enemy; about 3 trillion state/action pairs.

Algorithm: solve with the factored LP; coordination graph for action selection. But this grows exponentially in the number of agents.

Page 56:

Strategic

World: 9 peasants, 3 footmen, 1 enemy, gold, wood, barracks; reward for a dead enemy; about 3 trillion state/action pairs.

Algorithm: use the generalized class-based value function; coordination graph for action selection. The instantiated Q-functions grow only polynomially in the number of agents.

Page 57:

Tactical

Planned in 3 Footmen versus 3 Enemies

Generalized to 4 Footmen versus 4 Enemies

[Screenshots: 3 vs. 3, generalized to 4 vs. 4.]

Page 58:

Contributions

Efficient planning with LP decomposition

[Guestrin, Koller, Parr ’01]

Multiagent action selection [Guestrin, Koller, Parr ’02]

Generalization to new environments [Guestrin, Koller, Gearhart, Kanodia ’03]

Variable coordination structure [Guestrin, Venkataraman, Koller ’02]

Multiagent reinforcement learning [Guestrin, Lagoudakis, Parr ’02] [Guestrin, Patrascu, Schuurmans ’02]

Hierarchical decomposition [Guestrin, Gordon ’02]

Page 59:

Open Issues

High tree-width problems

Basis function selection

Variable relational structure

Partial observability

Page 60:

Daphne Koller

Committee: Leslie Kaelbling, Yoav Shoham, Claire Tomlin, Ben Van Roy

Co-authors: M.S. Apaydin, D. Brutlag, F. Cozman, C. Gearhart, G. Gordon, D. Hsu, N. Kanodia, D. Koller, E. Krotkov, M. Lagoudakis, J.C. Latombe, D. Ormoneit, R. Parr, R. Patrascu, D. Schuurmans, C. Varma, S. Venkataraman

DAGS members, Kristina and friends, my family

Page 61:

Conclusions

Complex multiagent planning task: exploit structure.
In the planning problem: factored LP
In action selection: coordination graph
Between problems: generalization

A formal framework for multiagent planning that scales to very large problems:
14436596542203275214816766492036822682859734670489954077831385060806196390977769687258235595095458210061891186534272525795367402762022519832080387801477422896484127439040011758861804112894781562309443806156617305408667449050617812548034440554705439703889581746536825491613622083026856377858229022846398307887896918556404084898937609373242171846359938695516765018940588109060426089671438864102814350385648747165832010614366132173102768902855220001
1322070819480806636890455259752
states

Page 62:

Network Management Problem

Ring

Star

Ring of Rings

k-grid

Computer runs processes

Computer status = {good, dead, faulty}

Dead neighbors increase dying probability

Reward for successful processes

Each SysAdmin takes local action = {reboot, not reboot }

Page 63:

Multiagent Policy Quality: comparing to the Distributed Reward and Distributed Value Function algorithms [Schneider et al. ’99]

[Plot: estimated value per agent vs. number of agents (2–16); the utopic maximum value is shown.]

Page 64:

Multiagent Policy Quality: comparing to the Distributed Reward and Distributed Value Function algorithms [Schneider et al. ’99]

[Plot: estimated value per agent vs. number of agents (2–16): utopic maximum value, distributed reward, and distributed value.]

Page 65:

Multiagent Policy Quality: comparing to the Distributed Reward and Distributed Value Function algorithms [Schneider et al. ’99]

[Plot: estimated value per agent vs. number of agents (2–16): utopic maximum value, LP with single basis, LP with pair basis, distributed reward, and distributed value.]

Page 66:

Comparing to Apricodd [Boutilier et al.]

Apricodd exploits context-specific independence (CSI).
The rule-based factored LP exploits CSI and linear independence.

[Plots: running time (seconds) vs. number of variables for Apricodd and the rule-based factored LP, with polynomial trend lines fitted to each curve.]

Page 67:

Apricodd

[Plots for the Ring and Star topologies: running time (minutes) vs. number of machines, and discounted value of the policy (average of 50 runs of 100 steps) vs. number of machines, comparing the rule-based LP and Apricodd.]