DYNAMIC PROGRAMMING
João Miguel da Costa Sousa / Alexandra Moutinho
Dynamic Programming
Dynamic programming is a useful mathematical technique for making a sequence of interrelated decisions.
It provides a systematic procedure for determining the optimal combination of decisions.
There is no standard mathematical formulation of "the" dynamic programming problem.
Knowing when to apply dynamic programming depends largely on experience with its general structure.
Prototype example
Stagecoach problem
A fortune seeker wants to travel from Missouri to California in the mid-19th century.
Possible routes are shown in the figure on the next slide.
The journey has 4 stages; A is Missouri and J is California.
The cost of each leg is its life-insurance premium, so the lowest total cost corresponds to the safest trip.
Stagecoach problem
Road system and costs.
Costs
The cost c_ij of going from state i to state j is:

      B  C  D        E  F  G        H  I        J
  A   2  4  3    B   7  4  6    E   1  4    H   3
                 C   3  2  4    F   6  3    I   4
                 D   4  1  5    G   3  3

Problem: which route minimizes the total cost of the policy?
Solving the problem
Note that a greedy approach does not work.
Choosing the cheapest leg at each stage gives the route A → B → F → I → J, with total cost 2 + 4 + 3 + 4 = 13.
However, e.g., A → D → F (cost 3 + 1 = 4) is cheaper than A → B → F (cost 2 + 4 = 6).
Another possibility is trial-and-error: enumerate all routes. This takes too much effort even for this simple problem.
Dynamic programming is much more efficient than exhaustive enumeration, especially for large problems.
It usually starts from the last stage of the problem and enlarges it one stage at a time.
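The two claims above can be checked directly. The sketch below (illustrative, not part of the slides) encodes the cost table, follows the greedy rule, and then enumerates every route:

```python
# Sketch (not from the slides): encode the cost table, then compare the greedy
# route against exhaustive enumeration of all routes from A to J.
cost = {
    'A': {'B': 2, 'C': 4, 'D': 3},
    'B': {'E': 7, 'F': 4, 'G': 6},
    'C': {'E': 3, 'F': 2, 'G': 4},
    'D': {'E': 4, 'F': 1, 'G': 5},
    'E': {'H': 1, 'I': 4},
    'F': {'H': 6, 'I': 3},
    'G': {'H': 3, 'I': 3},
    'H': {'J': 3},
    'I': {'J': 4},
}

def route_cost(route):
    """Sum the leg costs along a route given as a list of states."""
    return sum(cost[a][b] for a, b in zip(route, route[1:]))

# Greedy: always take the cheapest next leg.
node, greedy = 'A', ['A']
while node != 'J':
    node = min(cost[node], key=cost[node].get)
    greedy.append(node)

# Trial-and-error: enumerate every route recursively.
def all_routes(node):
    if node == 'J':
        return [['J']]
    return [[node] + tail for nxt in cost[node] for tail in all_routes(nxt)]

routes = all_routes('A')
print(greedy, route_cost(greedy))                 # greedy gives cost 13
print(len(routes), min(map(route_cost, routes)))  # 18 routes; the best costs 11
```

Even this small network has 18 routes, which is why exhaustive enumeration does not scale.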
Formulation
Decision variables xn (n = 1, 2, 3, 4) are the immediate destination on stage n.
The route is A → x1 → x2 → x3 → x4, where x4 = J.
fn(s, xn) is the total cost of the best overall policy for the remaining stages, given that the fortune seeker is in state s, ready to start stage n, and selects xn as the immediate destination.
xn* denotes the value of xn that minimizes fn(s, xn), and fn*(s) is this minimum value:

f_n^*(s) = \min_{x_n} f_n(s, x_n) = f_n(s, x_n^*)
Formulation
where

f_n(s, x_n) = immediate cost (stage n) + minimum future cost (stages n + 1 onward) = c_{s x_n} + f_{n+1}^*(x_n)

The value of c_{s x_n} is given by c_ij with i = s and j = x_n.
Objective: find f1*(A) and the corresponding route.
Dynamic programming finds successively f4*(s), f3*(s), f2*(s) and finally f1*(A).
Solution procedure
When n = 4, the route is determined by its current state s (H or I) and its final destination J.
Since x4 = J is the only choice, f4*(s) = f4(s, J) = c_{sJ}, and the solution for n = 4 is:

n = 4:
s    f4*(s)    x4*
H    3         J
I    4         J
Stage n = 3
This stage needs a few calculations. If the fortune seeker is in state F, he can go to either H or I, with cost c_{F,H} = 6 or c_{F,I} = 3.
Choosing H, the minimum additional cost is f4*(H) = 3, so the total cost is 6 + 3 = 9.
Choosing I, the total cost is 3 + 4 = 7. This is smaller, so it is the optimal choice for state F.
Stage n = 3
Similar calculations can be made for the two other possible states, s = E and s = G, resulting in the table for n = 3:

n = 3:           f3(s, x3) = c_{s x3} + f4*(x3)
s     x3 = H    x3 = I    f3*(s)    x3*
E     4         8         4         H
F     9         7         7         I
G     6         7         6         H
Stage n = 2
In this case, f2(s, x2) = c_{s x2} + f3*(x2).
Example for node C:
x2 = E: f2(C, E) = c_{C,E} + f3*(E) = 3 + 4 = 7 ← optimal
x2 = F: f2(C, F) = c_{C,F} + f3*(F) = 2 + 7 = 9
x2 = G: f2(C, G) = c_{C,G} + f3*(G) = 4 + 6 = 10
Stage n = 2
Similar calculations can be made for the two other possible states, s = B and s = D, resulting in the table for n = 2:

n = 2:           f2(s, x2) = c_{s x2} + f3*(x2)
s     x2 = E    x2 = F    x2 = G    f2*(s)    x2*
B     11        11        12        11        E or F
C     7         9         10        7         E
D     8         8         11        8         E or F
Stage n = 1
Just one possible starting state: A.
x1 = B: f1(A, B) = c_{A,B} + f2*(B) = 2 + 11 = 13
x1 = C: f1(A, C) = c_{A,C} + f2*(C) = 4 + 7 = 11 ← optimal
x1 = D: f1(A, D) = c_{A,D} + f2*(D) = 3 + 8 = 11 ← optimal
This results in the table:

n = 1:           f1(s, x1) = c_{s x1} + f2*(x1)
s     x1 = B    x1 = C    x1 = D    f1*(s)    x1*
A     13        11        11        11        C or D
Optimal solution
Three optimal solutions, all with f1*(A) = 11: A → C → E → H → J, A → D → E → H → J, and A → D → F → I → J.
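The backward stage-by-stage procedure of the tables can be sketched in Python. This is a minimal illustration, not part of the slides: the dictionary encodes the cost table, and all function names are our own.

```python
# Minimal sketch of the backward recursion for the stagecoach problem; the
# dictionary encodes the cost table from the slides.
cost = {
    'A': {'B': 2, 'C': 4, 'D': 3},
    'B': {'E': 7, 'F': 4, 'G': 6},
    'C': {'E': 3, 'F': 2, 'G': 4},
    'D': {'E': 4, 'F': 1, 'G': 5},
    'E': {'H': 1, 'I': 4},
    'F': {'H': 6, 'I': 3},
    'G': {'H': 3, 'I': 3},
    'H': {'J': 3},
    'I': {'J': 4},
}

def backward(cost, dest='J'):
    """Return f*(s) and the set of optimal immediate destinations x*(s)."""
    f, best = {dest: 0}, {dest: set()}
    pending = set(cost)
    while pending:
        for s in list(pending):
            # A state can be solved once every successor's f* is known.
            if all(j in f for j in cost[s]):
                vals = {j: cost[s][j] + f[j] for j in cost[s]}
                f[s] = min(vals.values())
                best[s] = {j for j, v in vals.items() if v == f[s]}
                pending.discard(s)
    return f, best

f, best = backward(cost)
print(f['A'], sorted(best['A']))   # 11 ['C', 'D']
```

The ties recorded in best reproduce the "C or D" and "E or F" entries of the stage tables, which is how the three optimal routes arise.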
Characteristics of DP
1. The problem can be divided into stages, with a policy decision required at each stage.
Example: 4 stages, with a life-insurance policy to choose at each one.
Dynamic programming problems require making a sequence of interrelated decisions.
2. Each stage has a number of states associated with its beginning.
Example: the states are the possible territories where the fortune seeker could be located.
States are the possible conditions in which the system might be.
Characteristics of DP
3. The policy decision transforms the current state into a state associated with the beginning of the next stage.
Example: the fortune seeker's decision led him from his current state to the next state on his journey.
DP problems can be interpreted in terms of networks: each node corresponds to a state.
The value assigned to each link is the immediate contribution to the objective function from making that policy decision.
In most cases, the objective corresponds to finding the shortest or the longest path.
Characteristics of DP
4. The solution procedure finds an optimal policy for the overall problem: a prescription of the optimal policy decision at each stage for each of the possible states.
Example: the solution procedure constructed a table for each stage n that prescribed the optimal decision xn* for each possible state s.
In addition to identifying optimal solutions, DP provides a policy prescription of what to do under every possible circumstance (which is why a decision is called a policy decision). This is useful for sensitivity analysis.
Characteristics of DP
5. Given the current state, an optimal policy for the remaining stages is independent of the policy decisions adopted in previous stages.
The optimal immediate decision depends only on the current state, not on how it was reached: this is the principle of optimality for DP.
Example: at any state, the best remaining insurance policy is independent of how the fortune seeker got there.
Knowledge of the current state conveys all the information necessary for determining the optimal policy henceforth (Markovian property). Problems lacking this property are not dynamic programming problems.
Characteristics of DP
6. The solution procedure begins by finding the optimal policy for the last stage. This solution is usually trivial.
7. A recursive relationship that identifies the optimal policy for stage n, given the optimal policy for stage n + 1, is available.
Example: the recursive relationship was

f_n^*(s) = \min_{x_n} \{ c_{s x_n} + f_{n+1}^*(x_n) \}

The recursive relationship differs somewhat among dynamic programming problems.
Characteristics of DP
7. (cont.) Notation:

N = number of stages.
n = label for the current stage (n = 1, 2, …, N).
sn = current state for stage n.
xn = decision variable for stage n.
xn* = optimal value of xn (given sn).
fn(sn, xn) = contribution of stages n, n + 1, …, N to the objective function if the system starts in state sn at stage n, the immediate decision is xn, and optimal decisions are made thereafter.
fn*(sn) = fn(sn, xn*).
Characteristics of DP
7. (cont.) Recursive relationship:

f_n^*(s_n) = \max_{x_n} \{ f_n(s_n, x_n) \}   or   f_n^*(s_n) = \min_{x_n} \{ f_n(s_n, x_n) \}

where fn(sn, xn) is written in terms of sn, xn, f_{n+1}^*(s_{n+1}), and probably some measure of the immediate contribution of xn to the objective function.
8. Using the recursive relationship, the solution procedure starts at the end and moves backward stage by stage.
It stops when the optimal policy starting at the initial stage is found.
Then the optimal policy for the entire problem is found.
Example: the tables for the stages show this procedure.
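Points 6 to 8 can be condensed into a generic backward-recursion skeleton. This is a sketch under the notation above; the caller supplies states[n] (the states of each stage, including stage N + 1), choices(n, s), contrib(n, s, x) and nxt(n, s, x), all of which are illustrative names, not from the slides.

```python
# Generic backward-recursion skeleton for a minimizing deterministic DP,
# following points 6-8. The caller supplies the stage/state structure:
#   states[n]      - iterable of states of stage n (must include stage N + 1)
#   choices(n, s)  - feasible decisions x at stage n in state s
#   contrib(n, s, x) - immediate contribution of decision x
#   nxt(n, s, x)   - resulting state of stage n + 1
# All of these names are illustrative.

def solve_dp(N, states, choices, contrib, nxt):
    f = {N + 1: {s: 0 for s in states[N + 1]}}   # boundary: nothing left to decide
    policy = {}
    for n in range(N, 0, -1):                    # start at the end, move backward
        f[n], policy[n] = {}, {}
        for s in states[n]:
            x = min(choices(n, s),
                    key=lambda x: contrib(n, s, x) + f[n + 1][nxt(n, s, x)])
            policy[n][s] = x
            f[n][s] = contrib(n, s, x) + f[n + 1][nxt(n, s, x)]
    return f, policy
```

For a maximizing problem, replace min by max, matching the two forms of the recursive relationship above.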
Characteristics of DP
8. (cont.) For DP problems, a table such as the following is obtained for each stage (n = N, N − 1, …, 1):

sn    fn(sn, xn)  (one column per value of xn)    fn*(sn)    xn*
Deterministic dynamic programming
Deterministic problems: the state at the next stage is completely determined by the state and policy decision at the current stage.
Form of the objective function: minimize or maximize the sum, product, etc., of the contributions from the individual stages.
Set of states: may be discrete or continuous, or a state vector. Decision variables can also be discrete or continuous.
Example: distributing medical teams
The World Health Council has five medical teams to allocate to three underdeveloped countries.
Measure of performance: additional person-years of life, i.e., increased life expectancy (in years) times the country's population.

Thousands of additional person-years of life
                  Country
Medical teams     1      2      3
0                 0      0      0
1                 45     20     50
2                 70     45     70
3                 90     75     80
4                 105    110    100
5                 120    150    130
Formulation of the problem
The problem requires three interrelated decisions: how many teams to allocate to each of the three countries (the stages).
xn is the number of teams allocated to stage (country) n.
What are the states? What changes from one stage to another?
sn = number of medical teams still available for the remaining countries (n, …, 3).
Thus: s1 = 5, s2 = 5 − x1 = s1 − x1, s3 = s2 − x2.
States to be considered
Overall problem
pi(xi): measure of performance from allocating xi medical teams to country i.

Maximize \sum_{i=1}^{3} p_i(x_i),

subject to

\sum_{i=1}^{3} x_i = 5,

where the x_i are nonnegative integers.
Policy
Recursive relationship relating the fn* functions:

f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} \{ p_n(x_n) + f_{n+1}^*(s_n - x_n) \},  for n = 1, 2

f_3^*(s_3) = \max_{x_3 = 0, 1, \ldots, s_3} p_3(x_3)
Solution procedure, stage n = 3
For the last stage, n = 3, the values of p3(x3) are in the last column of the table. Here x3* = s3 and f3*(s3) = p3(s3).

n = 3:
s3    f3*(s3)    x3*
0     0          0
1     50         1
2     70         2
3     80         3
4     100        4
5     130        5
Stage n = 2
Here, finding x2* requires calculating f2(s2, x2) for the values x2 = 0, 1, …, s2. Example for s2 = 2:
From state s2 = 2: x2 = 0 gives 0 + 70 = 70; x2 = 1 gives 20 + 50 = 70; x2 = 2 gives 45 + 0 = 45.
Stage n = 2
Similar calculations can be made for the other values of s2:

n = 2:           f2(s2, x2) = p2(x2) + f3*(s2 − x2)
s2    x2 = 0   x2 = 1   x2 = 2   x2 = 3   x2 = 4   x2 = 5   f2*(s2)   x2*
0     0                                                      0         0
1     50       20                                            50        0
2     70       70       45                                   70        0 or 1
3     80       90       95       75                          95        2
4     100      100      115      125      110                125       3
5     130      120      125      145      160      150       160       4
Stage n = 1
The only state to consider is the starting state s1 = 5.
From state s1 = 5: x1 = 0 gives 0 + 160 = 160; x1 = 1 gives 45 + 125 = 170; …; x1 = 5 gives 120 + 0 = 120.

n = 1:           f1(s1, x1) = p1(x1) + f2*(s1 − x1)
s1    x1 = 0   x1 = 1   x1 = 2   x1 = 3   x1 = 4   x1 = 5   f1*(s1)   x1*
5     160      170      165      160      155      120      170       1
Optimal policy decision
Reading the tables from the start: x1* = 1 leaves s2 = 4, so x2* = 3, leaving s3 = 1 and x3* = 1. Allocate 1 team to country 1, 3 teams to country 2 and 1 team to country 3, for f1*(5) = 170 thousand person-years.
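The three stage tables can be reproduced with a short backward recursion. This is a sketch using the person-years data from the table; the variable names are our own.

```python
# Sketch reproducing the stage tables for the medical-teams problem; p holds
# the person-years data from the slides (thousands), indexed by country.
p = {
    1: [0, 45, 70, 90, 105, 120],
    2: [0, 20, 45, 75, 110, 150],
    3: [0, 50, 70, 80, 100, 130],
}
N, TOTAL = 3, 5

f = {N + 1: [0] * (TOTAL + 1)}    # no countries left: no further contribution
best = {}
for n in range(N, 0, -1):
    f[n], best[n] = [0] * (TOTAL + 1), [None] * (TOTAL + 1)
    for s in range(TOTAL + 1):     # s teams still available at stage n
        vals = {x: p[n][x] + f[n + 1][s - x] for x in range(s + 1)}
        f[n][s] = max(vals.values())
        best[n][s] = [x for x, v in vals.items() if v == f[n][s]]

# Walk forward through the policy to recover one optimal allocation.
s, plan = TOTAL, []
for n in range(1, N + 1):
    x = best[n][s][0]
    plan.append(x)
    s -= x
print(f[1][TOTAL], plan)   # 170 [1, 3, 1]
```

The rows of f[2] and best[2] match the n = 2 table above, including the tie "0 or 1" at s2 = 2.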
Distribution of effort problem
One kind of resource is allocated to a number of activities. Objective: how to distribute the effort (the resource) among the activities most effectively.
DP involves only one (or a few) resources, while LP can deal with thousands of resources.
The LP assumptions of proportionality, divisibility and certainty can be violated by DP. Only additivity (or its analog for products of terms) is necessary, because of the principle of optimality.
The World Health Council problem violates proportionality and divisibility (only integer numbers of teams are allowed).
Formulation of distribution of effort
When the system starts at stage n in state sn, the choice xn results in the next state at stage n + 1 being s_{n+1} = sn − xn:

Stage n = activity n (n = 1, 2, …, N).
xn = amount of resource allocated to activity n.
State sn = amount of resource still available for allocation to the remaining activities (n, …, N).

Stage n (state sn)  --xn-->  Stage n + 1 (state sn − xn)
Example
Distributing scientists to research teams
Three research teams are working on an engineering problem that must be solved to safely fly people to Mars.
Two extra scientists are available; assigning them reduces the probability of failure.

Probability of failure
                   Team
New scientists     1       2       3
0                  0.40    0.60    0.80
1                  0.20    0.40    0.50
2                  0.15    0.20    0.30
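Here the overall objective multiplies the stage contributions: minimize the product of the three failure probabilities. The same backward recursion applies with the sum replaced by a product. A sketch, assuming the data from the table (variable names are our own):

```python
# Sketch: backward recursion with a multiplicative objective, minimizing the
# product of the three teams' failure probabilities (data from the table).
q = {
    1: [0.40, 0.20, 0.15],
    2: [0.60, 0.40, 0.20],
    3: [0.80, 0.50, 0.30],
}
N, TOTAL = 3, 2

f = {N + 1: [1.0] * (TOTAL + 1)}   # empty product is 1
best = {}
for n in range(N, 0, -1):
    f[n], best[n] = [0.0] * (TOTAL + 1), [0] * (TOTAL + 1)
    for s in range(TOTAL + 1):      # s new scientists still unassigned
        vals = {x: q[n][x] * f[n + 1][s - x] for x in range(s + 1)}
        best[n][s] = min(vals, key=vals.get)
        f[n][s] = vals[best[n][s]]

# Recover the optimal assignment by walking forward.
s, plan = TOTAL, []
for n in range(1, N + 1):
    x = best[n][s]
    plan.append(x)
    s -= x
print(plan, round(f[1][TOTAL], 3))   # [1, 0, 1] 0.06
```

Assigning one scientist each to teams 1 and 3 gives failure probability 0.20 × 0.60 × 0.50 = 0.06, illustrating that only the product structure, not additivity, is needed for the principle of optimality.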