7 OD Dynamic Programming a-2008


DYNAMIC PROGRAMMING

    João Miguel da Costa Sousa / Alexandra Moutinho

    Dynamic Programming

    A useful mathematical technique for making a sequence of interrelated decisions.

    A systematic procedure for determining the optimal combination of decisions.

    There is no standard mathematical formulation of the dynamic programming problem.

    Knowing when to apply dynamic programming depends largely on experience with its general structure.

    Prototype example

    Stagecoach problem

    A fortune seeker wants to go from Missouri to California in the mid-19th century.

    Possible routes are shown in the figure on the next page.

    The trip has 4 stages. A is Missouri and J is California.

    The cost is the life-insurance premium for a specific route; the lowest cost corresponds to the safest trip.


    Stagecoach problem

    Road system and costs.


    Costs

    The cost cij of going from state i to state j:

            B  C  D        E  F  G        H  I        J
        A   2  4  3    B   7  4  6    E   1  4    H   3
                       C   3  2  4    F   6  3    I   4
                       D   4  1  5    G   3  3

    Problem: which route minimizes the total cost of the policy?

    Solving the problem

    Note that a greedy approach does not work.

    The greedy solution A→B→F→I→J has total cost 2 + 4 + 3 + 4 = 13.

    However, A→D→F (cost 3 + 1 = 4) is cheaper than A→B→F (cost 2 + 4 = 6).

    Another possibility is trial and error (exhaustive enumeration), but that takes too much effort even for this simple problem.

    Dynamic programming is much more efficient than exhaustive enumeration, especially for large problems.

    It usually starts from the last stage of the problem and enlarges it one stage at a time.


    Formulation

    The decision variable xn (n = 1, 2, 3, 4) is the immediate destination on stage n.

    The route is A → x1 → x2 → x3 → x4, where x4 = J.

    fn(s, xn) is the total cost of the best overall policy for the remaining stages, given that the fortune seeker is in state s, ready to start stage n, and selects xn as the immediate destination.

    xn* is the value of xn that minimizes fn(s, xn), and fn*(s) is that minimum value:

    $$f_n^*(s) = \min_{x_n} f_n(s, x_n) = f_n(s, x_n^*)$$


    Formulation

    The recursion is

    $$f_n(s, x_n) = \text{immediate cost (stage } n\text{)} + \text{minimum future cost (stages } n+1 \text{ onward)} = c_{s x_n} + f_{n+1}^*(x_n)$$

    where the value of csxn is given by cij with i = s and j = xn.

    Objective: find f1*(A) and the corresponding route.

    Dynamic programming finds successively f4*(s), f3*(s), f2*(s), and finally f1*(A).
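A minimal sketch of this recursion in Python (mine, not part of the slides), with the road-system costs entered as a nested dict; the name `f_star` is a hypothetical helper:

```python
from functools import lru_cache

# Arc costs c[i][j] from the stagecoach road-system table.
COSTS = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}

@lru_cache(maxsize=None)
def f_star(s: str) -> int:
    """Minimum total cost from state s to the destination J."""
    if s == "J":
        return 0
    # f_n(s, x_n) = c_{s x_n} + f*_{n+1}(x_n); minimize over x_n.
    return min(c + f_star(x) for x, c in COSTS[s].items())

print(f_star("A"))  # minimum total cost from A
```

Memoization makes each state's cost-to-go computed only once, which is exactly what the stage-by-stage tables below do by hand.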


    Solution procedure

    When n = 4, the route is determined entirely by the current state s (H or I) and the final destination J.

    Since f4*(s) = f4(s, J) = csJ, the solution for n = 4 is:

        s    f4*(s)   x4*
        H      3       J
        I      4       J


    Stage n = 3

    This stage needs a few calculations. If the fortune seeker is in state F, he can go to either H or I, with costs cF,H = 6 or cF,I = 3.

    Choosing H, the minimum additional cost is f4*(H) = 3, so the total cost is 6 + 3 = 9.

    Choosing I, the total cost is 3 + 4 = 7. This is smaller, so it is the optimal choice for state F.

    [Figure: state F with arcs F→H (cost 6, f4*(H) = 3) and F→I (cost 3, f4*(I) = 4).]

    Stage n = 3

    Similar calculations can be made for the two other possible states, s = E and s = G, resulting in the table for n = 3:

    f3(s, x3) = csx3 + f4*(x3)

        s    x3 = H   x3 = I   f3*(s)   x3*
        E      4        8        4       H
        F      9        7        7       I
        G      6        7        6       H


    Stage n = 2

    In this case, f2(s, x2) = csx2 + f3*(x2).

    Example for node C:

        x2 = E: f2(C, E) = cC,E + f3*(E) = 3 + 4 = 7   ← optimal
        x2 = F: f2(C, F) = cC,F + f3*(F) = 2 + 7 = 9
        x2 = G: f2(C, G) = cC,G + f3*(G) = 4 + 6 = 10

    [Figure: state C with arcs C→E (cost 3, f3*(E) = 4), C→F (cost 2, f3*(F) = 7), and C→G (cost 4, f3*(G) = 6).]


    Stage n = 2

    Similar calculations can be made for the two other possible states, s = B and s = D, resulting in the table for n = 2:

    f2(s, x2) = csx2 + f3*(x2)

        s    x2 = E   x2 = F   x2 = G   f2*(s)   x2*
        B      11       11       12       11     E or F
        C       7        9       10        7     E
        D       8        8       11        8     E or F


    Stage n = 1

    Just one possible starting state: A.

    x1 = B: f1(A, B) = cA,B + f2*(B) = 2 + 11 = 13
    x1 = C: f1(A, C) = cA,C + f2*(C) = 4 + 7 = 11   ← optimal
    x1 = D: f1(A, D) = cA,D + f2*(D) = 3 + 8 = 11   ← optimal

    Results in the table:

    f1(s, x1) = csx1 + f2*(x1)

        s    x1 = B   x1 = C   x1 = D   f1*(s)   x1*
        A      13       11       11       11     C or D


    Optimal solution

    Three optimal solutions, all with f1*(A) = 11: A→C→E→H→J, A→D→E→H→J, and A→D→F→I→J.
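The whole backward procedure can be sketched in a few lines of Python (an illustrative implementation, not code from the slides); ties in the policy are kept so that all optimal routes can be recovered:

```python
# Stagecoach arc costs from the road-system table.
COSTS = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}

STAGES = [["H", "I"], ["E", "F", "G"], ["B", "C", "D"], ["A"]]  # backward order

f = {"J": 0}        # f[s]: minimum cost-to-go from state s
policy = {"J": []}  # policy[s]: all optimal immediate destinations (ties kept)

for states in STAGES:  # stages n = 4, 3, 2, 1
    for s in states:
        best = min(c + f[x] for x, c in COSTS[s].items())
        f[s] = best
        policy[s] = [x for x, c in COSTS[s].items() if c + f[x] == best]

def routes(s):
    """All optimal routes from s, following tied policy decisions."""
    if s == "J":
        return [["J"]]
    return [[s] + r for x in policy[s] for r in routes(x)]

print(f["A"], routes("A"))
```

Running it reproduces the tables' result: minimum cost 11 and the three tied routes listed above.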


    Characteristics of DP

    1. The problem can be divided into stages, with a policy decision required at each stage.

    Example: 4 stages, with a life-insurance policy (route) to choose at each.

    Dynamic programming problems require making a sequence of interrelated decisions.

    2. Each stage has a number of states associated with the beginning of each stage.

    Example: states are the possible territories where the fortune seeker could be located.

    States are possible conditions in which the system might be.


    Characteristics of DP

    3. The policy decision transforms the current state into a state associated with the beginning of the next stage.

    Example: the fortune seeker's decision led him from his current state to the next state on his journey.

    DP problems can be interpreted in terms of networks: each node corresponds to a state.

    The value assigned to each link is the immediate contribution to the objective function from making that policy decision.

    In most cases, the objective corresponds to finding the shortest or the longest path.


    Characteristics of DP

    4. The solution procedure finds an optimal policy for the overall problem: a prescription of the optimal policy decision at each stage for each of the possible states.

    Example: the solution procedure constructed a table for each stage n that prescribed the optimal decision xn* for each possible state s.

    In addition to identifying optimal solutions, DP provides a policy prescription of what to do under every possible circumstance (which is why a decision is called a policy decision). This is useful for sensitivity analysis.


    Characteristics of DP

    5. Given the current state, an optimal policy for the remaining stages is independent of the policy decisions adopted in previous stages.

    The optimal immediate decision depends only on the current state and not on how it was reached: this is the principle of optimality for DP.

    Example: at any state, the optimal remaining route is independent of how the fortune seeker got there.

    Knowledge of the current state conveys all the information necessary for determining the optimal policy henceforth (the Markovian property). Problems lacking this property are not dynamic programming problems.


    Characteristics of DP

    6. The solution procedure begins by finding the optimal policy for the last stage. This solution is usually trivial.

    7. A recursive relationship that identifies the optimal policy for stage n, given the optimal policy for stage n + 1, is available.

    Example: the recursive relationship was

    $$f_n^*(s) = \min_{x_n} \left\{ c_{s x_n} + f_{n+1}^*(x_n) \right\}$$

    The recursive relationship differs somewhat among dynamic programming problems.


    Characteristics of DP

    7. (cont.) Notation:

    N = number of stages.
    n = label for the current stage (n = 1, 2, …, N).
    sn = current state for stage n.
    xn = decision variable for stage n.
    xn* = optimal value of xn (given sn).
    fn(sn, xn) = contribution of stages n, n + 1, …, N to the objective function if the system starts in state sn at stage n, the immediate decision is xn, and optimal decisions are made thereafter.
    fn*(sn) = fn(sn, xn*).


    Characteristics of DP

    7. (cont.) Recursive relationship:

    $$f_n^*(s_n) = \max_{x_n} \left\{ f_n(s_n, x_n) \right\} \quad \text{or} \quad f_n^*(s_n) = \min_{x_n} \left\{ f_n(s_n, x_n) \right\}$$

    where fn(sn, xn) is written in terms of sn, xn, f*n+1(sn+1), and probably some measure of the immediate contribution of xn to the objective function.

    8. Using the recursive relationship, the solution procedure starts at the end and moves backward stage by stage.

    It stops when the optimal policy starting at the initial stage is found. The optimal policy for the entire problem is then found.

    Example: the tables for the stages show this procedure.


    Characteristics of DP

    8. (cont.) For DP problems, a table such as the following would be obtained for each stage (n = N, N − 1, …, 1):

        sn  |  fn(sn, xn) for each xn  |  fn*(sn)  |  xn*
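As an illustration of items 6–8, the backward procedure can be written once as a generic template; everything below (the `solve_backward` name and its parameters) is my own sketch, assuming finite state and decision sets at each stage:

```python
def solve_backward(N, states, decisions, contribution, transition, sense=min):
    """Generic backward induction for a deterministic DP.

    states(n)             -> iterable of states s_n at stage n (1..N)
    decisions(n, s)       -> iterable of feasible decisions x_n in state s
    contribution(n, s, x) -> immediate contribution of decision x
    transition(n, s, x)   -> next state s_{n+1}
    sense                 -> min or max
    Returns the f* and policy tables, keyed by (n, s).
    """
    f, policy = {}, {}
    for n in range(N, 0, -1):                  # last stage first
        for s in states(n):
            best_x, best_v = None, None
            for x in decisions(n, s):
                v = contribution(n, s, x)
                if n < N:                      # add optimal future contribution
                    v += f[(n + 1, transition(n, s, x))]
                if best_v is None or v == sense(v, best_v):
                    best_x, best_v = x, v
            f[(n, s)], policy[(n, s)] = best_v, best_x
    return f, policy
```

For the stagecoach example one would pass N = 4, `transition(n, s, x) = x`, and `contribution(n, s, x) = csx`; the returned tables correspond stage by stage to the tables built by hand above.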

    Deterministic dynamic programming

    Deterministic problems: the state at the next stage is completely determined by the state and policy decision at the current stage.

    Form of the objective function: minimize or maximize the sum, product, etc. of the contributions from the individual stages.

    The set of states may be discrete or continuous, or a state vector. Decision variables can also be discrete or continuous.


    Example: distributing medical teams

    The World Health Council has five medical teams to allocate to three underdeveloped countries.

    Measure of performance: additional person-years of life, i.e., the increased life expectancy (in years) times the country's population.

    Thousands of additional person-years of life

        Medical teams   Country 1   Country 2   Country 3
             0               0           0           0
             1              45          20          50
             2              70          45          70
             3              90          75          80
             4             105         110         100
             5             120         150         130


    Formulation of the problem

    The problem requires three interrelated decisions: how many teams to allocate to each of the three countries (stages).

    xn is the number of teams to allocate to country (stage) n.

    What are the states? What changes from one stage to another?

    sn = number of medical teams still available for the remaining countries (countries n, …, 3).

    Thus: s1 = 5, s2 = s1 − x1 = 5 − x1, s3 = s2 − x2.


    States to be considered

    [Figure: the states sn to be considered at each stage (s1 = 5; s2 = 5 − x1; s3 = s2 − x2), shown alongside the person-years table.]

    Overall problem

    pi(xi): measure of performance from allocating xi medical teams to country i.

    $$\text{Maximize} \sum_{i=1}^{3} p_i(x_i),$$

    $$\text{subject to} \sum_{i=1}^{3} x_i = 5,$$

    and the xi are nonnegative integers.
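Since the problem is tiny, the formulation can be checked by brute force before applying DP; this snippet (mine, not from the slides) enumerates all feasible allocations:

```python
from itertools import product

# P[i][x]: thousands of additional person-years of life from
# allocating x teams to country i+1 (the table above).
P = [
    [0, 45, 70, 90, 105, 120],   # country 1
    [0, 20, 45, 75, 110, 150],   # country 2
    [0, 50, 70, 80, 100, 130],   # country 3
]

# Enumerate every (x1, x2, x3) with x1 + x2 + x3 = 5 and keep the best.
best = max(
    ((x, sum(P[i][x[i]] for i in range(3)))
     for x in product(range(6), repeat=3) if sum(x) == 5),
    key=lambda t: t[1],
)
print(best)  # optimal allocation and its value
```

Exhaustive enumeration is fine here (21 feasible allocations), but the stage-by-stage DP below scales much better as the number of teams or countries grows.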


    Recursive relationship

    The recursive relationship relating these functions:

    $$f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} \left\{ p_n(x_n) + f_{n+1}^*(s_n - x_n) \right\}, \quad \text{for } n = 1, 2$$

    $$f_3^*(s_3) = \max_{x_3 = 0, 1, \ldots, s_3} p_3(x_3)$$


    Solution procedure, stage n = 3

    For the last stage, n = 3, the values of p3(x3) are in the last column of the table. Here x3* = s3 and f3*(s3) = p3(s3).

    n = 3:

        s3   f3*(s3)   x3*
        0       0       0
        1      50       1
        2      70       2
        3      80       3
        4     100       4
        5     130       5



    Stage n = 2

    Here, finding x2* requires calculating f2(s2, x2) for the values x2 = 0, 1, …, s2. Example for s2 = 2:

        x2 = 0: p2(0) + f3*(2) = 0 + 70 = 70
        x2 = 1: p2(1) + f3*(1) = 20 + 50 = 70
        x2 = 2: p2(2) + f3*(0) = 45 + 0 = 45

    [Figure: state s2 = 2 and the three possible allocations.]



    Stage n = 2

    Similar calculations can be made for the other values of s2:

    n = 2:   f2(s2, x2) = p2(x2) + f3*(s2 − x2)

        s2   x2 = 0   x2 = 1   x2 = 2   x2 = 3   x2 = 4   x2 = 5   f2*(s2)   x2*
        0       0                                                      0       0
        1      50       20                                            50       0
        2      70       70       45                                   70     0 or 1
        3      80       90       95       75                          95       2
        4     100      100      115      125      110                125       3
        5     130      120      125      145      160      150       160       4


    Stage n = 1

    The only state is the starting state, s1 = 5:

    [Figure: state s1 = 5 with allocations x1 = 0 (0 + 160 = 160), x1 = 1 (45 + 125 = 170), …, x1 = 5 (120 + 0 = 120).]

    n = 1:   f1(s1, x1) = p1(x1) + f2*(s1 − x1)

        s1   x1 = 0   x1 = 1   x1 = 2   x1 = 3   x1 = 4   x1 = 5   f1*(s1)   x1*
        5     160      170      165      160      155      120       170       1




    Optimal policy decision: allocate 1 team to country 1, 3 teams to country 2, and 1 team to country 3 (x1* = 1, x2* = 3, x3* = 1), for a total of 170 thousand additional person-years of life.
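The three stage tables can be reproduced with a short backward-induction script (an illustrative sketch, not code from the slides):

```python
# P[n][x]: thousands of additional person-years from giving x teams
# to country n+1 (the table above).
P = [
    [0, 45, 70, 90, 105, 120],
    [0, 20, 45, 75, 110, 150],
    [0, 50, 70, 80, 100, 130],
]
TEAMS = 5

# Backward induction over stages n = 3, 2, 1 (0-indexed here).
f = [[0] * (TEAMS + 1) for _ in range(4)]      # f[n][s]: optimal value-to-go
x_opt = [[0] * (TEAMS + 1) for _ in range(4)]  # optimal x_n for each state s

for n in (2, 1, 0):
    for s in range(TEAMS + 1):
        # f_n(s, x) = p_n(x) + f*_{n+1}(s - x), maximized over x = 0..s
        f[n][s], x_opt[n][s] = max(
            (P[n][x] + f[n + 1][s - x], x) for x in range(s + 1)
        )

# Recover the optimal allocation by moving forward from s1 = 5.
s, allocation = TEAMS, []
for n in range(3):
    allocation.append(x_opt[n][s])
    s -= x_opt[n][s]
print(f[0][TEAMS], allocation)
```

The rows of `f` match the f3*, f2*, f1* columns of the tables, and the forward pass recovers the allocation (1, 3, 1) with value 170.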

    Distribution of effort problem

    One kind of resource is allocated to a number of activities. Objective: distribute the effort (resource) among the activities most effectively.

    DP involves only one (or a few) resources, while LP can deal with thousands of resources.

    The LP assumptions of proportionality, divisibility and certainty can be violated by DP. Only additivity (or its analog for products of terms) is necessary, because of the principle of optimality.

    The World Health Council problem violates proportionality and divisibility (only integers are allowed).


    Formulation of distribution of effort

    When the system starts at stage n in state sn, the choice xn results in the next state at stage n + 1 being sn+1 = sn − xn:

    Stage n = activity n (n = 1, 2, …, N).
    xn = amount of resource allocated to activity n.
    State sn = amount of resource still available for allocation to the remaining activities (n, …, N).

    [Diagram: stage n, state sn; decision xn leads to stage n + 1 with state sn − xn.]


    Example

    Distributing scientists to research teams

    Three teams are solving an engineering problem for safely flying people to Mars.

    Two extra scientists are available; allocating them reduces the probability of failure.

    Probability of failure

        New scientists   Team 1   Team 2   Team 3
             0            0.40     0.60     0.80
             1            0.20     0.40     0.50
             2            0.15     0.20     0.30

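The slides stop before solving this example. Assuming the objective is to minimize the product of the three teams' failure probabilities, subject to allocating the 2 new scientists (the standard formulation for this kind of example; the objective is not stated in the excerpt), a brute-force sketch (mine) is:

```python
from itertools import product as cartesian

# P[i][x]: probability that team i+1 fails when given x new scientists
# (the table above).
P = [
    [0.40, 0.20, 0.15],
    [0.60, 0.40, 0.20],
    [0.80, 0.50, 0.30],
]
SCIENTISTS = 2

# Minimize the product of failure probabilities over feasible allocations.
best = min(
    ((x, P[0][x[0]] * P[1][x[1]] * P[2][x[2]])
     for x in cartesian(range(SCIENTISTS + 1), repeat=3)
     if sum(x) == SCIENTISTS),
    key=lambda t: t[1],
)
print(best)
```

This is a distribution-of-effort problem whose stage contributions combine by a product rather than a sum, which the DP recursion handles because multiplicativity plays the role additivity played before.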