The Stochastic Shortest Path Problem: A Polyhedral Perspective

Matthieu Guillot¹, Gautier Stauffer¹

¹ G-SCOP, Univ. Grenoble Alpes, 38000 Grenoble, France

London School of Economics, January 2017

Guillot and Stauffer The Stochastic Shortest Path Problem LSE 2017 1 / 18

Outline of the talk

Infinite horizon total cost MDP

The Stochastic Shortest Path Problem

Contributions

Main proof technique: Generalized flow decomposition theorem

Open Questions

Infinite horizon Markov Decision Process

Inputs:

S, a finite set of states

A = ∪_{s∈S} A(s), a finite set of actions

c : A → ℝ, a cost function on the actions

P(·|a), conditional probabilities over the state space for each action a

s_0, an initial state

[Figure: example MDP with states 0, 1, 2, 3, 4 and actions a to g, annotated with the action costs and the transition probabilities.]

Infinite horizon Markov Decision Process

Dynamics:

In each time period t ≥ 0, the system is in state s_t and we need to decide upon an action a available in A(s_t).

The system then evolves to state s_{t+1} according to P(·|a).
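The dynamics can be sketched as a short simulation. This is only an illustration: the toy MDP below (states `s0`, `s1`, `T`, actions `a`, `b`, `c`, and all numbers) is hypothetical and is not the instance drawn on the slides, and the names `actions`, `cost`, `trans` and `simulate` are introduced here for convenience.

```python
import random

# Hypothetical toy MDP (NOT the instance drawn on the slides).
# actions[s] lists the actions available in state s, i.e. A(s).
actions = {"s0": ["a", "b"], "s1": ["c"], "T": []}
# cost[a] is c(a); trans[a] maps each successor state s to P(s | a).
cost = {"a": 3.0, "b": 10.0, "c": -1.0}
trans = {
    "a": {"s0": 0.3, "s1": 0.7},
    "b": {"T": 1.0},
    "c": {"s0": 0.5, "T": 0.5},
}

def simulate(policy, start, rng, max_steps=1000):
    """Follow `policy` from `start`; return (visited states, accumulated cost)."""
    state, total, path = start, 0.0, [start]
    for _ in range(max_steps):
        if not actions[state]:      # no action available: stop here
            break
        a = policy[state]           # decide upon an action in A(s_t)
        total += cost[a]
        successors = list(trans[a])
        weights = [trans[a][s] for s in successors]
        # draw s_{t+1} according to P(. | a)
        state = rng.choices(successors, weights=weights)[0]
        path.append(state)
    return path, total

path, total = simulate({"s0": "a", "s1": "c"}, "s0", random.Random(0))
```

Each call returns one sample walk; its cost is one realization of the random total cost that the next slide asks to minimize in expectation.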

[Figure: the example MDP with its legend: state nodes s, action nodes a, costs c(a) attached to the actions and probabilities p(s|a) on the arcs from an action to its successor states.]

Infinite horizon (total cost) Markov Decision Process

Goal:

Find a policy π : S → A (it defines a Markov chain with transition matrix P^π)

minimizing $\sum_{k=0}^{+\infty} \mathbf{1}_{s_0}^{\top} (P^{\pi})^{k} c^{\pi}$

[Figure: the example MDP; in later builds only the actions b, d, e and g selected by a policy π are kept.]

NB: we might consider non-stationary and non-deterministic policies, but for most MDPs ‘pure’ policies are optimal.
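For a fixed policy, the objective is a series whose partial sums are easy to compute. The sketch below assumes a hypothetical three-state chain (state 2 is absorbing with zero cost); `P` plays the role of P^π and `c` the role of c^π, and the function name is introduced here for illustration only.

```python
def total_cost_partial_sums(P, c, s0, n_terms):
    """Partial sums of sum_k 1_{s0}^T P^k c for a fixed policy (P = P^pi, c = c^pi)."""
    n = len(c)
    # dist holds the row vector 1_{s0}^T P^k, starting with k = 0
    dist = [1.0 if s == s0 else 0.0 for s in range(n)]
    total, sums = 0.0, []
    for _ in range(n_terms):
        total += sum(dist[s] * c[s] for s in range(n))   # add the k-th term
        dist = [sum(dist[s] * P[s][t] for s in range(n)) for t in range(n)]
        sums.append(total)
    return sums

# Hypothetical chain: 0 -> 1 with probability 1; 1 -> 0 or 1 -> 2 with probability 0.5 each.
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0]]
c = [2.0, 1.0, 0.0]   # cost of the action chosen in each state; 0 in the absorbing state
sums = total_cost_partial_sums(P, c, 0, 200)
```

On this chain the partial sums converge to 6, the solution of V(0) = 2 + V(1), V(1) = 1 + 0.5 V(0).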

Discounted Markov Decision Process

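Issue: $\sum_{k=0}^{+\infty} \mathbf{1}_{s_0}^{\top} (P^{\pi})^{k} c^{\pi}$ is not always defined

[Figure: two states 1 and 2 with actions b and c, transition probabilities 1, and costs 1 and −1.]

Discounted models: $V^{*}(s_0) := \min_{\pi} \sum_{k=0}^{+\infty} \alpha^{k}\, \mathbf{1}_{s_0}^{\top} (P^{\pi})^{k} c^{\pi}$ for some $0 \le \alpha < 1$

Standard methods from the 50’s:

$V^{*}(s) = \min_{a \in A(s)} \left\{ c(a) + \alpha \sum_{s'} P(s'|a)\, V^{*}(s') \right\}$

Value Iteration: Bellman (1957), Dynamic Programming

Policy Iteration: Howard (1960), Block-Pivot Simplex algorithm

Linear Programming: Manne (1960)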
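As an illustration of the first method, here is a minimal value iteration sketch that repeatedly applies the fixed-point equation for V*. The two-state MDP, its costs and the stopping tolerance are assumptions made for this example, not data from the talk.

```python
def value_iteration(actions, cost, trans, alpha, tol=1e-10, max_iter=100_000):
    """Iterate V(s) <- min_{a in A(s)} { c(a) + alpha * sum_{s'} P(s'|a) V(s') }."""
    V = {s: 0.0 for s in actions}
    for _ in range(max_iter):
        new_V = {
            s: min(cost[a] + alpha * sum(p * V[t] for t, p in trans[a].items())
                   for a in actions[s])
            for s in actions
        }
        gap = max(abs(new_V[s] - V[s]) for s in actions)
        V = new_V
        if gap < tol:       # successive iterates are close: stop
            break
    return V

# Hypothetical two-state MDP in which every state has at least one action.
actions = {"u": ["stay", "go"], "v": ["back"]}
cost = {"stay": 1.0, "go": 0.0, "back": 2.0}
trans = {"stay": {"u": 1.0}, "go": {"v": 1.0}, "back": {"u": 1.0}}
V = value_iteration(actions, cost, trans, alpha=0.5)
```

With α = 0.5 the update is a contraction, so the iterates converge here to V(u) = 4/3 and V(v) = 8/3, which satisfy the fixed-point equation with the action `go` chosen in `u`.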

The Stochastic Shortest Path Problem

Extension to undiscounted MDPs, i.e. α = 1 (the discounted case is a special case)

Bertsekas and Tsitsiklis (1991): Value Iteration, Policy Iteration and LP all work

Hypotheses:

there is an identified target state T (from there, no way to escape)

there is a proper policy, i.e. one that leads to T with probability 1

‘looping’ in the system (outside T) is costly: +∞ cost
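The second hypothesis can be checked for a given pure policy by a reachability argument: in a finite chain with absorbing target T, T is reached with probability 1 from a start state exactly when T can be reached from every state reachable from that start. The sketch below implements this check from a single start state; the function name and the toy data are hypothetical.

```python
def is_proper(policy, trans, start, target):
    """True if the chain induced by `policy` reaches `target` from `start` with probability 1."""
    def successors(s):
        if s == target:             # the target is absorbing
            return []
        return [t for t, p in trans[policy[s]].items() if p > 0]

    # Forward pass: states reachable from `start` under the policy.
    reachable, stack = {start}, [start]
    while stack:
        for t in successors(stack.pop()):
            if t not in reachable:
                reachable.add(t)
                stack.append(t)

    # Backward pass: grow the set of states that can reach `target`.
    can_reach = {target}
    changed = True
    while changed:
        changed = False
        for s in reachable - can_reach:
            if any(t in can_reach for t in successors(s)):
                can_reach.add(s)
                changed = True
    return reachable <= can_reach

# Hypothetical data: trans[a] maps successor states to P(s | a).
trans = {
    "a": {"s0": 0.3, "s1": 0.7},
    "c": {"s0": 0.5, "T": 0.5},
    "loop": {"s1": 1.0},
}
proper = is_proper({"s0": "a", "s1": "c"}, trans, "s0", "T")
improper = is_proper({"s0": "a", "s1": "loop"}, trans, "s0", "T")
```

The second policy gets trapped in `s1`, so the check returns `False` for it.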

[Figure: the example MDP with a target state T in place of state 0.]

The Stochastic Shortest Path Problem

Almost an extension of the standard deterministic shortest path:

there is an identified target state T (from there, no way to escape)

there is a proper policy, i.e. one that leads to T with probability 1

‘looping’ in the system (outside T) is costly: +∞ cost
→ this forbids zero cost cycles

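[Figure: a deterministic shortest path instance on nodes 1, 2, 3, 4 and T with arc costs 3, 10, −1, 2, −5, 7 and 2; later builds show it encoded as an SSP in which every transition has probability 1.]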

NB: Bertsekas and Yu (2016) proved that perturbed versions of PI and VI converge in the presence of zero cost cycles.

This is not only a technical problem!

Many applications with zero cost cycles!

Maximizing the probability of reaching a target

Ex: robot motion planning in turbulent water

Our Contribution

A generalization of the framework of Bertsekas and Tsitsiklis that encapsulates the deterministic version (i.e. zero cost cycles)

A proof that we can actually restrict to ‘pure’ policies

Proof of convergence of Value Iteration by a simple analysis: a natural extension of Bellman-Ford

Proof that Policy Iteration converges

A generalization of Dijkstra’s algorithm through primal-dual

→ Simplifies, improves and extends all previous results and analysis for infinite horizon total cost MDPs!
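To make the link between Value Iteration and Bellman-Ford concrete, the sketch below runs the undiscounted update on a deterministic instance, where each sweep is one round of Bellman-Ford relaxations. It is not the authors' analysis: the instance, the arc names and the +∞ initialisation (the usual Bellman-Ford start, used here only for the deterministic case) are assumptions of this example.

```python
import math

def ssp_value_iteration(actions, cost, trans, target, sweeps):
    """Undiscounted update V(s) <- min_a { c(a) + sum_{s'} P(s'|a) V(s') }, with V(target) = 0."""
    V = {s: math.inf for s in actions}   # Bellman-Ford style start (deterministic demo only)
    V[target] = 0.0
    for _ in range(sweeps):
        new_V = dict(V)
        for s, acts in actions.items():
            if s == target:
                continue
            new_V[s] = min(
                cost[a] + sum(p * V[t] for t, p in trans[a].items() if p > 0)
                for a in acts
            )
        V = new_V
    return V

# Hypothetical deterministic instance: every action has a single successor.
actions = {1: ["a12", "a13"], 2: ["a23", "a2T"], 3: ["a3T"], "T": []}
cost = {"a12": 3, "a13": 10, "a23": -5, "a2T": 7, "a3T": 2}
trans = {"a12": {2: 1.0}, "a13": {3: 1.0}, "a23": {3: 1.0},
         "a2T": {"T": 1.0}, "a3T": {"T": 1.0}}
V = ssp_value_iteration(actions, cost, trans, "T", sweeps=3)
```

Three sweeps (number of states minus one) suffice on this instance: the values are the shortest path lengths to T, with the path 1 → 2 → 3 → T of length 0 from node 1.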

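Our technique: polyhedral analysis

Observation that the (dual of the) linear programming formulation for SSP is a natural relaxation of a more general problem

→ The corresponding polyhedron generalizes the network flow polyhedron

$$
\min\ c\,x \quad \text{s.t.} \quad
\sum_{a \in \delta^{+}(v)} x(a) - \sum_{a \in \delta^{-}(v)} x(a) =
\begin{cases} 1 & \text{if } v = s \\ -1 & \text{if } v = t \\ 0 & \text{otherwise} \end{cases}
\quad \forall v \in V, \qquad x \ge 0
$$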

[Figure: a fractional flow on the deterministic example, with arc values 1, 2/3 and 1/3.]

The corresponding formulation for the SSP:

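$$
\min\ c\,x \quad \text{s.t.} \quad
\sum_{a \in A(s)} x(a) - \sum_{a \in A} p(s|a)\, x(a) =
\begin{cases} 1 & \text{if } s = s_0 \\ -1 & \text{if } s = T \\ 0 & \text{otherwise} \end{cases}
\quad \forall s \in S, \qquad x \ge 0
$$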
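A short check of why this system generalizes flow conservation, under the assumption that every action a has a single successor state reached with probability 1. The notation head(a) for that successor is introduced here for illustration; δ⁺(s) and δ⁻(s) then denote the actions leaving and entering s.

```latex
\[
p(s \mid a) =
\begin{cases} 1 & \text{if } s = \mathrm{head}(a) \\ 0 & \text{otherwise} \end{cases}
\quad \Longrightarrow \quad
\sum_{a \in A} p(s \mid a)\, x(a) = \sum_{a \in \delta^{-}(s)} x(a)
\quad \text{and} \quad
\sum_{a \in A(s)} x(a) = \sum_{a \in \delta^{+}(s)} x(a),
\]
\[
\text{so that} \quad
\sum_{a \in A(s)} x(a) - \sum_{a \in A} p(s \mid a)\, x(a)
= \sum_{a \in \delta^{+}(s)} x(a) - \sum_{a \in \delta^{-}(s)} x(a).
\]
```

In that deterministic case the constraints are exactly those of a unit flow from s_0 to T.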

[Figure: a solution x of this system drawn on the example SSP; the values are fractional and some exceed 1 (e.g. 2.5).]

Linear Programming relaxation : proof sketch

A policy π induces a probability distribution over all possible $(s_0, T)$-walks

$y^\pi_k(s)$: probability of being in state s in period k following policy π

$x^\pi_k(a)$: probability of taking action a in period k following policy π

We have for all π and for all k ≥ 0:

$$\sum_{a \in A(s)} x^\pi_k(a) = y^\pi_k(s) \quad \text{and} \quad y^\pi_{k+1}(s) = \sum_{a \in A} p(s|a)\, x^\pi_k(a)$$

It implies

$$\sum_{k} \sum_{a \in A(s)} x^\pi_{k+1}(a) = \sum_{k} \sum_{a \in A} p(s|a)\, x^\pi_k(a)$$

Together with $y^\pi_0(s) = \mathbb{1}_{s_0}(s) = \sum_{a \in A(s)} x^\pi_0(a)$ this yields

$$\sum_{a \in A(s)} x^\pi(a) - \sum_{a \in A} p(s|a)\, x^\pi(a) = \mathbb{1}_{s_0}(s)$$

as long as $x^\pi(a) := \sum_k x^\pi_k(a)$ is well-defined for all a

(this is our new definition of proper)
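The identity above can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the two-state instance, the pure policy (one action per state, so actions are indexed by their state) and the truncation after 500 periods are all hypothetical choices. It pushes the distribution y_k forward, accumulates x(a) as the sum of the x_k(a), and evaluates the left-hand side of the flow constraint.

```python
# Hypothetical instance: states 1 and 2, target "T", start state s0 = 1.
# A pure policy picks one action per state, so x_k(a) = y_k(origin of a).
policy = {
    1: {1: 0.5, 2: 0.5},      # action taken in state 1: p(.|a)
    2: {1: 0.5, "T": 0.5},    # action taken in state 2: p(.|b)
}
s0 = 1

def expected_visits(policy, s0, periods=500):
    """Accumulate x(a) = sum_k x_k(a), truncated after `periods` periods."""
    y = {s: 0.0 for s in policy}
    y[s0] = 1.0                              # y_0 is the indicator of s0
    x = {s: 0.0 for s in policy}
    for _ in range(periods):
        for s in policy:
            x[s] += y[s]                     # x_k(a) = y_k(s) under a pure policy
        nxt = {s: 0.0 for s in policy}
        for s, succ in policy.items():
            for t, p in succ.items():
                if t in nxt:                 # mass reaching T leaves the system
                    nxt[t] += p * y[s]       # y_{k+1}(t) = sum_a p(t|a) x_k(a)
        y = nxt
    return x

x = expected_visits(policy, s0)

def balance(s):
    """Left-hand side of the flow constraint at state s."""
    inflow = sum(succ.get(s, 0.0) * x[o] for o, succ in policy.items())
    return x[s] - inflow
```

On this instance the sums converge (the policy reaches T with probability 1), so `balance(1)` is close to 1 and `balance(2)` is close to 0, matching the right-hand side of the constraint.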


Page 60: The Stochastic Shortest Path Problem: A Polyhedral Perspective

Our technique : polyhedral analysis

Proof that the extreme points of this relaxation are ‘associated’ with ‘pure’ policies (NB: the extreme points are NOT integral)

→ The proof relies on a generalization of the ‘flow’ decomposition theorem
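For reference, the classical flow decomposition theorem that is being generalized can be sketched as follows. This is the standard deterministic procedure, not the authors' generalization, and the function name `decompose` and the example flow are illustrative assumptions. It assumes a nonnegative s-t flow satisfying conservation at every node other than s and t, with no arc entering s or leaving t, and exact (integer or rational) values.

```python
def decompose(flow, s, t):
    """Classical flow decomposition: split an s-t flow into paths and cycles.

    `flow` maps arcs (u, v) to nonnegative values satisfying conservation.
    Returns a list of (kind, node_sequence, value).
    """
    flow = {arc: f for arc, f in flow.items() if f > 0}
    pieces = []
    while flow:
        # Start from s while it still sends flow; afterwards only cycles remain.
        start = s if any(u == s for u, _ in flow) else next(iter(flow))[0]
        walk, seen = [start], {start: 0}
        while True:
            u = walk[-1]
            if u == t:
                kind, nodes = "path", walk
                break
            v = next(w for (a, w) in flow if a == u)   # exists by conservation
            if v in seen:                              # the walk closed a cycle
                kind, nodes = "cycle", walk[seen[v]:] + [v]
                break
            seen[v] = len(walk)
            walk.append(v)
        arcs = list(zip(nodes, nodes[1:]))
        value = min(flow[arc] for arc in arcs)
        for arc in arcs:                               # peel the piece off
            flow[arc] -= value
            if flow[arc] == 0:
                del flow[arc]
        pieces.append((kind, nodes, value))
    return pieces

# Hypothetical flow of value 2 from node 1 to T, plus one unit on a cycle.
example = {(1, 2): 2, (2, 4): 3, (4, 3): 1, (3, 2): 1, (4, "T"): 2}
pieces = decompose(example, s=1, t="T")
```

Each iteration removes at least one arc from the support, so the procedure stops after at most as many pieces as there are arcs with positive flow.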

[Figure, shown over several overlay slides: a flow on the example network with nodes 1, 2, 3, 4 and T is decomposed step by step; arc labels are integer flow values.]

[Figure, shown over several overlay slides: the analogous step-by-step decomposition on the SSP example with states 1, 2, 3, 4, target T and actions a to g; arcs carry numeric labels such as 0.5, 0.75 and 2.5.]

Page 76: The Stochastic Shortest Path Problem: A Polyhedral Perspective

Idea of contributions : framework

Decomposition theorem implies that extreme points are ‘pure’ strategies and extreme rays of the relaxation are ‘transition cycles’

A transition cycle is a solution x ≥ 0 to

$$\sum_{a \in A(s)} x(a) - \sum_{a \in A} p(s|a)\, x(a) = 0 \quad \forall s \in S$$

The optima of the relaxation and of the original problem coincide when there are no transition cycles of negative cost: this is our new framework

Assumptions

There exists a path between every node i and 0 in the support graph

There is no negative cost transition cycle
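As a small illustration of the definition, the sketch below tests whether a given vector x is a transition cycle and computes its cost. The instance, the cost values and the names `is_transition_cycle` and `cost` are hypothetical, and the check excludes the trivial solution x = 0, which is an assumption added here for convenience. It only verifies a candidate; finding a negative cost transition cycle in general would require solving a linear program.

```python
from fractions import Fraction as F

# Hypothetical actions: name -> (origin state, {successor: probability}, cost).
actions = {
    "a": (1, {2: F(1)}, 1),
    "b": (2, {1: F(1, 2), "T": F(1, 2)}, 1),
    "c": (2, {1: F(1)}, -3),
}
states = [1, 2]   # non-target states

def is_transition_cycle(x):
    """True if x >= 0, x != 0 and outflow equals expected inflow at every state."""
    if any(v < 0 for v in x.values()) or not any(x.values()):
        return False
    for s in states:
        out = sum(v for a, v in x.items() if actions[a][0] == s)
        inn = sum(actions[a][1].get(s, 0) * v for a, v in x.items())
        if out != inn:
            return False
    return True

def cost(x):
    """Total cost c^T x of the action vector x."""
    return sum(actions[a][2] * v for a, v in x.items())

cycle = {"a": 1, "c": 1}       # 1 -> 2 -> 1 with probability 1: no mass reaches T
leaky = {"a": 1, "b": 1}       # half of the mass leaves to T: not a transition cycle
```

Here `cycle` is a transition cycle of cost −2, so adding multiples of it to any feasible point would drive the objective of the relaxation to −∞; the second assumption above rules out exactly this situation.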


Page 80: The Stochastic Shortest Path Problem: A Polyhedral Perspective

Idea of contributions : algorithms

Value iteration is very similar to Bellman-Ford: we essentially prove that min lim = lim min

$$\min_{\Pi \in P} \lim_{K \to \infty} \sum_{k=0}^{K} c^T x^\Pi_k = \lim_{K \to \infty} \min_{\Pi \in P_K} \sum_{k=0}^{K} c^T x^\Pi_k$$

($P$ ∼ all proper policies, $P_K$ ∼ all proper policies that terminate in K steps)

Policy iteration is a block-pivot simplex: we prove strict improvement to guarantee finiteness.

We can apply a primal-dual algorithm; the subproblem is a reachability question: Dijkstra-like algorithm (we fall into the same class, which was not the case before because of zero cost cycles!)
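The analogy between value iteration and Bellman-Ford can be seen in a few lines. The following is a generic textbook-style sketch, not the authors' algorithm or analysis: the instance, the fixed number of sweeps and the zero initial vector are assumptions made for illustration. Each sweep relaxes every state by taking the best action given the current estimates, just as Bellman-Ford relaxes every arc.

```python
def value_iteration(states, actions, target, sweeps=200):
    """Repeat V(s) <- min over a in A(s) of c(a) + sum_t p(t|a) V(t), with V(target) = 0."""
    V = {s: 0.0 for s in states}
    V[target] = 0.0
    for _ in range(sweeps):
        new = {target: 0.0}
        for s in states:
            new[s] = min(
                cost + sum(p * V[t] for t, p in succ.items())
                for origin, succ, cost in actions.values()
                if origin == s
            )
        V = new
    return V

# Hypothetical actions: name -> (origin state, {successor: probability}, cost).
actions = {
    "a": (1, {2: 1.0}, 1.0),
    "b": (2, {1: 0.5, "T": 0.5}, 1.0),
    "d": (1, {"T": 1.0}, 5.0),
}
V = value_iteration([1, 2], actions, target="T")
```

On this instance the estimates converge to V(1) = 4 and V(2) = 3: going through state 2 and retrying on failure is cheaper in expectation than the direct action of cost 5. With deterministic transitions the update reduces to the usual Bellman-Ford relaxation.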


Page 83: The Stochastic Shortest Path Problem: A Polyhedral Perspective

Main Open questions

The stochastic shortest path problem is polynomial through LP

Is it strongly polynomial?

Ye (2011): true for discounted MDPs if α is fixed

Is our generalization of Dijkstra’s algorithm strongly polynomial?

Is the reachability subproblem strongly polynomial?
