
1

OR II
GSLM 52800


2

Outline

introduction to discrete-time Markov Chain

problem statement

long-term average cost per unit time

solving the MDP by linear programming


3

States of a Machine

State Condition

0 Good as new

1 Operable – minor deterioration

2 Operable – major deterioration

3 Inoperable – output of unacceptable quality


4

Transition of States

(rows: current state; columns: next state; entries are one-step transition probabilities)

State 0 1 2 3

0 0 7/8 1/16 1/16

1 0 3/4 1/8 1/8

2 0 0 1/2 1/2

3 0 0 0 1


5

Possible Actions

Decision Action Relevant States

1 Do nothing 0, 1, 2

2 Overhaul (return to state 1) 2

3 Replace (return to state 0) 1, 2, 3


6

Problem

adopting different collections of actions leads to different long-term average costs per unit time

problem: find the policy that minimizes the long-term average cost per unit time


7

Costs of the Problem

cost of defective items: state 0: 0; state 1: 1000; state 2: 3000

cost of replacing the machine = 4000

cost of lost production during a machine replacement = 2000 (so a replacement costs 6000 in total)

cost of overhauling (at state 2) = 2000 (together with the lost production, an overhaul costs 4000 in total, the figure used below)


8

Policy R_d: Always Replace When in a State ≠ 0

from state 0 the machine always deteriorates in one step, and from any other state it is replaced and returns to state 0 in one step, so the chain alternates between state 0 and the other states

half of the time the machine is in state 0, with cost 0

half of the time it is in one of the other states, each with cost 6000 because of machine replacement

average cost per unit time = (1/2)(0) + (1/2)(6000) = 3000

[Transition diagram under R_d: state 0 → 1, 2, 3 with probabilities 7/8, 1/16, 1/16; states 1, 2, and 3 → 0 with probability 1.]


9

Long-Term Average Cost of a Positive Recurrent, Irreducible Discrete-Time Markov Chain

for a positive recurrent, irreducible discrete-time Markov chain with M+1 states 0, …, M, the stationary probabilities \pi_i satisfy

balance equations: \pi_j = \sum_{i=0}^{M} \pi_i p_{ij}, \quad j = 0, \ldots, M

normalization equation: \sum_{i=0}^{M} \pi_i = 1

only M of the M+1 balance equations are independent, so solve M of them together with the normalization equation
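As a numerical illustration (not from the original slides), the sketch below solves M of the balance equations together with the normalization equation for a given transition matrix; numpy and the function name stationary_distribution are my own choices. It is shown here on the chain induced by policy R_a from the next slide, and reproduces the stationary probabilities quoted there.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary probabilities of an irreducible discrete-time Markov chain.

    Solves the balance equations pi = pi P, with one of them replaced by
    the normalization equation sum(pi) = 1.
    """
    n = P.shape[0]                 # number of states, M + 1
    A = P.T - np.eye(n)            # balance equations: (P^T - I) pi = 0
    A[-1, :] = 1.0                 # replace the last one by sum(pi) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# transition matrix under policy R_a (replace at failure, otherwise do nothing)
P_Ra = np.array([
    [0.0, 7/8, 1/16, 1/16],
    [0.0, 3/4, 1/8,  1/8 ],
    [0.0, 0.0, 1/2,  1/2 ],
    [1.0, 0.0, 0.0,  0.0 ],
])

print(stationary_distribution(P_Ra))   # [2/13, 7/13, 2/13, 2/13]
```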


10

Policy R_a: Replace at Failure, but Otherwise Do Nothing

[Transition diagram under R_a: state 0 → 1, 2, 3 with probabilities 7/8, 1/16, 1/16; state 1 → 1, 2, 3 with probabilities 3/4, 1/8, 1/8; state 2 → 2, 3 with probabilities 1/2, 1/2; state 3 → 0 with probability 1.]

balance equations:
\pi_0 = \pi_3
\pi_1 = \tfrac{7}{8}\pi_0 + \tfrac{3}{4}\pi_1
\pi_2 = \tfrac{1}{16}\pi_0 + \tfrac{1}{8}\pi_1 + \tfrac{1}{2}\pi_2
\pi_3 = \tfrac{1}{16}\pi_0 + \tfrac{1}{8}\pi_1 + \tfrac{1}{2}\pi_2

normalization: \pi_0 + \pi_1 + \pi_2 + \pi_3 = 1

solution: \pi_0 = \tfrac{2}{13}, \; \pi_1 = \tfrac{7}{13}, \; \pi_2 = \tfrac{2}{13}, \; \pi_3 = \tfrac{2}{13}

long-term average cost per unit time = \pi_0(0) + \pi_1(1000) + \pi_2(3000) + \pi_3(6000) = \tfrac{25000}{13} \approx 1923
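For completeness (the intermediate algebra is not shown on the slide): the second balance equation gives \tfrac{1}{4}\pi_1 = \tfrac{7}{8}\pi_0, so \pi_1 = \tfrac{7}{2}\pi_0; substituting into the third, \tfrac{1}{2}\pi_2 = \tfrac{1}{16}\pi_0 + \tfrac{1}{8}\cdot\tfrac{7}{2}\pi_0 = \tfrac{1}{2}\pi_0, so \pi_2 = \pi_0, and likewise \pi_3 = \pi_0. The normalization equation then gives \pi_0\left(1 + \tfrac{7}{2} + 1 + 1\right) = 1, i.e., \pi_0 = \tfrac{2}{13}.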


11

Policy R_b: Replace in State 3, and Overhaul in State 2

balance equations:
\pi_0 = \pi_3
\pi_1 = \tfrac{7}{8}\pi_0 + \tfrac{3}{4}\pi_1 + \pi_2
\pi_2 = \tfrac{1}{16}\pi_0 + \tfrac{1}{8}\pi_1
\pi_3 = \tfrac{1}{16}\pi_0 + \tfrac{1}{8}\pi_1

normalization: \pi_0 + \pi_1 + \pi_2 + \pi_3 = 1

solution: \pi_0 = \tfrac{2}{21}, \; \pi_1 = \tfrac{5}{7}, \; \pi_2 = \tfrac{2}{21}, \; \pi_3 = \tfrac{2}{21}

long-term average cost per unit time = \pi_0(0) + \pi_1(1000) + \pi_2(4000) + \pi_3(6000) = \tfrac{35000}{21} \approx 1667

[Transition diagram under R_b: state 0 → 1, 2, 3 with probabilities 7/8, 1/16, 1/16; state 1 → 1, 2, 3 with probabilities 3/4, 1/8, 1/8; state 2 → 1 with probability 1 (overhaul); state 3 → 0 with probability 1 (replace).]


12

Policy R_c: Replace in States 2 and 3

balance equations:
\pi_0 = \pi_2 + \pi_3
\pi_1 = \tfrac{7}{8}\pi_0 + \tfrac{3}{4}\pi_1
\pi_2 = \tfrac{1}{16}\pi_0 + \tfrac{1}{8}\pi_1
\pi_3 = \tfrac{1}{16}\pi_0 + \tfrac{1}{8}\pi_1

normalization: \pi_0 + \pi_1 + \pi_2 + \pi_3 = 1

solution: \pi_0 = \tfrac{2}{11}, \; \pi_1 = \tfrac{7}{11}, \; \pi_2 = \tfrac{1}{11}, \; \pi_3 = \tfrac{1}{11}

long-term average cost per unit time = \pi_0(0) + \pi_1(1000) + \pi_2(6000) + \pi_3(6000) = \tfrac{19000}{11} \approx 1727

[Transition diagram under R_c: state 0 → 1, 2, 3 with probabilities 7/8, 1/16, 1/16; state 1 → 1, 2, 3 with probabilities 3/4, 1/8, 1/8; states 2 and 3 → 0 with probability 1 (replace).]
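A quick cross-check of the three averages above (added here; the stationary distributions and per-state costs are taken directly from the slides, and the dictionary layout is my own):

```python
import numpy as np

# (stationary distribution, cost per unit time in each state), from the slides
policies = {
    "R_a (replace at failure)":          (np.array([2/13, 7/13, 2/13, 2/13]),
                                          np.array([0, 1000, 3000, 6000])),
    "R_b (replace in 3, overhaul in 2)": (np.array([2/21, 15/21, 2/21, 2/21]),
                                          np.array([0, 1000, 4000, 6000])),
    "R_c (replace in 2 and 3)":          (np.array([2/11, 7/11, 1/11, 1/11]),
                                          np.array([0, 1000, 6000, 6000])),
}

for name, (pi, cost) in policies.items():
    # long-term average cost per unit time = sum_i pi_i * cost_i
    print(f"{name}: {pi @ cost:.0f}")    # about 1923, 1667, 1727
```

Together with the 3000 per unit time found earlier for R_d, this confirms that R_b is the cheapest of the four policies considered.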


13

Problem

in this case the minimum-cost policy is R_b, i.e., replacing in state 3 and overhauling in state 2

question: is there an efficient way to find the minimum-cost policy when there are many states and many possible actions? Checking all possible policies one by one quickly becomes impractical.


14

Linear Programming Approach for an MDP

let

D_{ik} = the probability of adopting decision k at state i

\pi_i = the stationary probability of state i

y_{ik} = P(state i and decision k)

C_{ik} = the cost of adopting decision k at state i


15

Linear Programming Approach for an MDP

recall the balance and normalization equations:
\pi_j = \sum_{i=0}^{M} \pi_i p_{ij}, \quad j = 0, \ldots, M; \qquad \sum_{i=0}^{M} \pi_i = 1

in terms of the y_{ik}:

y_{ik} = \pi_i D_{ik}, \quad i = 0, \ldots, M

\sum_{i=0}^{M} \sum_{k=1}^{K} y_{ik} = 1

\sum_{k=1}^{K} y_{jk} = \sum_{i=0}^{M} \sum_{k=1}^{K} y_{ik} \, p_{ij}(k), \quad j = 0, 1, \ldots, M

y_{ik} \ge 0, \quad i = 0, 1, \ldots, M; \; k = 1, \ldots, K

E(C) = \sum_{i=0}^{M} \sum_{k=1}^{K} C_{ik} D_{ik} \pi_i = \sum_{i=0}^{M} \sum_{k=1}^{K} C_{ik} y_{ik}


16

Linear Programming Approach for an MDP

\min Z = \sum_{i=0}^{M} \sum_{k=1}^{K} C_{ik} y_{ik}

s.t. \sum_{i=0}^{M} \sum_{k=1}^{K} y_{ik} = 1,

\sum_{k=1}^{K} y_{jk} - \sum_{i=0}^{M} \sum_{k=1}^{K} y_{ik} \, p_{ij}(k) = 0, \quad j = 0, 1, \ldots, M,

y_{ik} \ge 0, \quad i = 0, 1, \ldots, M; \; k = 1, \ldots, K

at an optimal basic solution, each D_{ik} = y_{ik} / \sum_{k} y_{ik} equals 0 or 1, i.e., the optimal policy is deterministic


17

Linear Programming Approach for an MDP

actions that can be adopted at each state:

state 0: do nothing (i.e., k = 1)

state 1: do nothing or replace (i.e., k = 1 or 3)

state 2: do nothing, overhaul, or replace (i.e., k = 1, 2, or 3)

state 3: replace (i.e., k = 3)

variables: y_{01}, y_{11}, y_{13}, y_{21}, y_{22}, y_{23}, and y_{33}


18

Linear Programming Approach for an MDP

\min Z = 1000 y_{11} + 6000 y_{13} + 3000 y_{21} + 4000 y_{22} + 6000 y_{23} + 6000 y_{33}

s.t.

y_{01} + y_{11} + y_{13} + y_{21} + y_{22} + y_{23} + y_{33} = 1

y_{01} - (y_{13} + y_{23} + y_{33}) = 0

y_{11} + y_{13} - (\tfrac{7}{8} y_{01} + \tfrac{3}{4} y_{11} + y_{22}) = 0

y_{21} + y_{22} + y_{23} - (\tfrac{1}{16} y_{01} + \tfrac{1}{8} y_{11} + \tfrac{1}{2} y_{21}) = 0

y_{33} - (\tfrac{1}{16} y_{01} + \tfrac{1}{8} y_{11} + \tfrac{1}{2} y_{21}) = 0

y_{ik} \ge 0 for all i, k

State 0 1 2 3

0 0 7/8 1/16 1/16

1 0 3/4 1/8 1/8

2 0 0 1/2 1/2

3 0 0 0 1
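The LP above is small enough to solve with any solver; the following sketch (not from the original slides) uses scipy.optimize.linprog, with my own variable ordering y01, y11, y13, y21, y22, y23, y33. The equality constraints are the normalization equation and the four balance equations written out above.

```python
import numpy as np
from scipy.optimize import linprog

# variable order: y01, y11, y13, y21, y22, y23, y33
c = np.array([0, 1000, 6000, 3000, 4000, 6000, 6000])   # objective coefficients

A_eq = np.array([
    # normalization: all y_ik sum to 1
    [1,      1,     1,    1,    1,   1,   1],
    # state 0: y01 - (y13 + y23 + y33) = 0
    [1,      0,    -1,    0,    0,  -1,  -1],
    # state 1: y11 + y13 - (7/8 y01 + 3/4 y11 + y22) = 0
    [-7/8,   1/4,   1,    0,   -1,   0,   0],
    # state 2: y21 + y22 + y23 - (1/16 y01 + 1/8 y11 + 1/2 y21) = 0
    [-1/16, -1/8,   0,    1/2,  1,   1,   0],
    # state 3: y33 - (1/16 y01 + 1/8 y11 + 1/2 y21) = 0
    [-1/16, -1/8,   0,   -1/2,  0,   0,   1],
])
b_eq = np.array([1, 0, 0, 0, 0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 7)
print(res.x)     # approximately [2/21, 5/7, 0, 0, 2/21, 0, 2/21]
print(res.fun)   # approximately 1667
```

Reading off which y_{ik} are positive and using D_{ik} = y_{ik} / \sum_{k} y_{ik} then gives the deterministic policy reported on the next slide.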


19

Linear Programming Approach for an MDP

solving the LP gives y_{01} = 2/21, y_{11} = 5/7, y_{13} = 0, y_{21} = 0, y_{22} = 2/21, y_{23} = 0, y_{33} = 2/21

optimal policy at state 0: do nothing

state 1: do nothing

state 2: overhaul

state 3: replace

i.e., the LP recovers policy R_b, consistent with the direct comparison of the policies above