Upload
others
View
8
Download
0
Embed Size (px)
Citation preview
CS 401
Dynamic Programming / Knapsack Problem, RNA Secondary
Structure
Xiaorui Sun1
Dynamic Programming
Principle:• Define a collection of subproblems
• Typically, only a polynomial number of subproblems• Solution to original problem can be computed from subproblems• Natural ordering of subproblems from “smallest” to “largest” that enables
determining a solution to a subproblem from solutions to smaller subproblems
• Parameterization: Describe subproblems by parameters• Memorization: Remember the solution of subproblems
Technique:• Binary choice: weighted interval scheduling. • Multiway choice: segmented least squares.
Question: How to find a solution for dynamic programming?• Back-track from the optimal objective value 2
548
327
236
045
134
013
042
031
00
p(j)!"j
Weighted Interval Scheduling
Label jobs by finishing time: # 1 ≤ ⋯ ≤ # ' .)(+) = largest index . < + such that job . is compatible with +.
Time0 1 2 3 4 5 6 7 8 9 10 11
6
7
8
4
3
1
2
5
3
4
4
6
6
7
7
10
OPT(j)
012 + = 30 if + = 0max !" + 012 ) + , 012 + − 1 o.w.
548
327
236
045
134
013
042
031
00
p(j)!"j
Weighted Interval Scheduling
Label jobs by finishing time: # 1 ≤ ⋯ ≤ # ' .)(+) = largest index . < + such that job . is compatible with +.
Time0 1 2 3 4 5 6 7 8 9 10 11
6
7
8
4
3
1
2
5
3
4
4
6
6
7
7
10
OPT(j)
012 + = 30 if + = 0max !" + 012 ) + , 012 + − 1 o.w.
548
327
236
045
134
013
042
031
00
p(j)!"j
Weighted Interval Scheduling
Label jobs by finishing time: # 1 ≤ ⋯ ≤ # ' .)(+) = largest index . < + such that job . is compatible with +.
Time0 1 2 3 4 5 6 7 8 9 10 11
6
7
8
4
3
1
2
5
3
4
4
6
6
7
7
10
OPT(j)
012 + = 30 if + = 0max !" + 012 ) + , 012 + − 1 o.w.
548
327
236
045
134
013
042
031
00
p(j)!"j
Weighted Interval Scheduling
Label jobs by finishing time: # 1 ≤ ⋯ ≤ # ' .)(+) = largest index . < + such that job . is compatible with +.
Time0 1 2 3 4 5 6 7 8 9 10 11
6
7
8
4
3
1
2
5
3
4
4
6
6
7
7
10
OPT(j)
012 + = 30 if + = 0max !" + 012 ) + , 012 + − 1 o.w.
548
327
236
045
134
013
042
031
00
p(j)!"j
Weighted Interval Scheduling
Label jobs by finishing time: # 1 ≤ ⋯ ≤ # ' .)(+) = largest index . < + such that job . is compatible with +.
Time0 1 2 3 4 5 6 7 8 9 10 11
6
7
8
4
3
1
2
5
3
4
4
6
6
7
7
10
OPT(j)
012 + = 30 if + = 0max !" + 012 ) + , 012 + − 1 o.w.
Knapsack Problem
Knapsack Problem
Given ! objects and a "knapsack.“
Item " weighs #$ > 0 kilograms and has value '$ > 0.
Knapsack has capacity of ( kilograms.
Goal: fill knapsack so as to maximize total value.
Ex: OPT is { 3, 4 } with value 40.
Greedy: repeatedly add item with maximum ratio '$/#$.Ex: { 5, 2, 1 } achieves only value = 35 Þ greedy not optimal.
9
1
Value
182228
1
Weight
56
6 2
7
Item
1
345
2W = 11
First Attempt
Def !"#(%) = max profit subset of items 1,… , %.
Case 1: !"# does not select item %.• !"# selects best of { 1, 2, … , % − 1 }
Case 2: !"# selects item %.• accepting item % does not immediately imply that we will have
to reject other items• without knowing what other items were selected before %, we
don't even know if we have enough room for %
Conclusion: Need more sub-problems!
10
Stronger DP (New Variable)
Let !"#(%, ') = Max value of subsets of items 1,… , % of weight ≤ '
Case 1: !"#(%, ') selects item %• In this case, !"# %, ' = -. + !"#(% − 1,' − '.)
Case 2: !"# %, ' does not select item %• In this case, !"# %, ' = !"#(% − 1,').
Therefore,
11
!"# %, ' = 10!"# % − 1,'max(!"# % − 1,' , -. + !"# % − 1,' − '. )
Take best of the two
If % = 0If '. > 'o.w.,
DP for Knapsack
12
for w = 0 to WM[0, w] = 0
for i = 1 to nfor w = 1 to W
if (wi > w)M[i, w] = M[i-1, w]
elseM[i, w] = max {M[i-1, w], vi + M[i-1, w-wi ]}
return M[n, W]
Compute-OPT(i,w)if M[i,w] == empty
if (i==0)M[i,w]=0
else if (wi > w)M[i,w]=Comp-OPT(i-1,w)
elseM[i,w]= max {Comp-OPT(i-1,w), vi + Comp-OPT(i-1,w-wi)}
return M[i, w]
recursive
Non-recursive
DP for Knapsack
13
n + 1
1
Value
182228
1
Weight
56
6 2
7
Item
1
345
2
f
{ 1, 2 }
{ 1, 2, 3 }
{ 1, 2, 3, 4 }
{ 1 }
{ 1, 2, 3, 4, 5 }
0
0
0
0
0
0
0
1
0
2
0
3
0
4
0
5
0
6
0
7
0
8
0
9
0
10
0
11
0
W + 1
W = 11
if (wi > w)M[i, w] = M[i-1, w]
elseM[i, w] = max {M[i-1, w], vi + M[i-1, w-wi ]}
DP for Knapsack
14
n + 1
1
Value
182228
1
Weight
56
6 2
7
Item
1
345
2
f
{ 1, 2 }
{ 1, 2, 3 }
{ 1, 2, 3, 4 }
{ 1 }
{ 1, 2, 3, 4, 5 }
0
0
0
0
0
0
0
1
0
1
2
0
1
3
0
1
4
0
1
5
0
1
6
0
1
7
0
1
8
0
1
9
0
1
10
0
1
11
0
1
W + 1
W = 11
if (wi > w)M[i, w] = M[i-1, w]
elseM[i, w] = max {M[i-1, w], vi + M[i-1, w-wi ]}
DP for Knapsack
15
n + 1
1
Value
182228
1
Weight
56
6 2
7
Item
1
345
2
f
{ 1, 2 }
{ 1, 2, 3 }
{ 1, 2, 3, 4 }
{ 1 }
{ 1, 2, 3, 4, 5 }
0
0
0
0
0
0
0
1
0
1
1
1
1
1
2
0
6
1
3
0
1
4
0
1
5
0
1
6
0
1
7
0
1
8
0
1
9
0
1
10
0
1
11
0
1
W + 1
W = 11OPT: { 4, 3 }
value = 22 + 18 = 40
if (wi > w)M[i, w] = M[i-1, w]
elseM[i, w] = max {M[i-1, w], vi + M[i-1, w-wi ]}
7
DP for Knapsack
16
n + 1
1
Value
182228
1
Weight
56
6 2
7
Item
1
345
2
W + 1
W = 11OPT: { 4, 3 }
value = 22 + 18 = 40
if (wi > w)M[i, w] = M[i-1, w]
elseM[i, w] = max {M[i-1, w], vi + M[i-1, w-wi ]}
f
{ 1, 2 }
{ 1, 2, 3 }
{ 1, 2, 3, 4 }
{ 1 }
{ 1, 2, 3, 4, 5 }
0
0
0
0
0
0
0
1
0
1
1
1
1
1
2
0
6
6
1
3
0
7
7
1
4
0
7
7
1
5
0
7
18
1
6
0
7
1
7
0
7
1
8
0
7
1
9
0
7
1
10
0
7
1
11
0
7
1
19
DP for Knapsack
17
n + 1
1
Value
182228
1
Weight
56
6 2
7
Item
1
345
2
W + 1
W = 11OPT: { 4, 3 }
value = 22 + 18 = 40
if (wi > w)M[i, w] = M[i-1, w]
elseM[i, w] = max {M[i-1, w], vi + M[i-1, w-wi ]}
f
{ 1, 2 }
{ 1, 2, 3 }
{ 1, 2, 3, 4 }
{ 1 }
{ 1, 2, 3, 4, 5 }
0
0
0
0
0
0
0
1
0
1
1
1
1
1
2
0
6
6
6
1
3
0
7
7
7
1
4
0
7
7
7
1
5
0
7
18
18
1
6
0
7
19
22
1
7
0
7
24
24
1
8
0
7
25
28
1
9
0
7
25
1
10
0
7
25
1
11
0
7
25
1
29
DP for Knapsack
18
n + 1
1
Value
182228
1
Weight
56
6 2
7
Item
1
345
2
W + 1
W = 11OPT: { 4, 3 }
value = 22 + 18 = 40
if (wi > w)M[i, w] = M[i-1, w]
elseM[i, w] = max {M[i-1, w], vi + M[i-1, w-wi ]}
f
{ 1, 2 }
{ 1, 2, 3 }
{ 1, 2, 3, 4 }
{ 1 }
{ 1, 2, 3, 4, 5 }
0
0
0
0
0
0
0
1
0
1
1
1
1
1
2
0
6
6
6
1
6
3
0
7
7
7
1
7
4
0
7
7
7
1
7
5
0
7
18
18
1
18
6
0
7
19
22
1
22
7
0
7
24
24
1
28
8
0
7
25
28
1
29
9
0
7
25
29
1
34
10
0
7
25
29
1
34
11
0
7
25
40
1
40
DP Ideas so far
• You may have to define an ordering to decrease
#subproblems
• You may have to strengthen DP, equivalently the induction,
i.e., you have may have to carry more information to find the
Optimum.
• This means that sometimes we may have to use two
dimensional or three dimensional induction
• This dynamic algorithm is not a polynomial time algorithm • For an input of N bits, W can be as large as 2N
• Knapsack problem is actually NP-complete
19
RNA Secondary Structure
RNA Secondary Structure
RNA: A String B = b1b2…bn over alphabet { A, C, G, U }.
Secondary structure. RNA is single-stranded so it tends to loop back and form base pairs with itself. This structure is essential for understanding behavior of molecule.
21G
U
C
A
GA
A
G
CG
A
UG
A
U
U
A
G
A
C A
A
C
U
G
A
G
U
C
A
U
C
G
G
G
C
C
G
Ex: GUCGAUUGAGCGAAUGUAACAACGUGGCUACGGCGAGA
complementary base pairs: A-U, C-G
RNA Secondary Structure (Formal)
Secondary structure. A set of pairs ! = { $%, $' } that satisfy:
[Watson-Crick.]• ! is a matching and
• each pair in ! is a Watson-Crick pair: * − ,, , − *, - − ., or . − -.[No sharp turns.]: The ends of each pair are separated by at least 4 intervening bases. If $%, $' ∈ !, then 0 < 2 − 4.
[Non-crossing.] If ($%, $') and ($6, $7) are two pairs in !, then we cannot have 0 < 8 < 2 < 9.
22
Secondary Structure (Examples)
23
C
G G
C
A
G
U
U
U A
A U G U G G C C A U
G G
C
A
G
U
U A
A U G G G C A U
C
G G
C
A
U
G
U
U A
A G U U G G C C A U
sharp turn crossingok
G
G£4
base pair
RNA Secondary Structure (Formal)
Secondary structure. A set of pairs ! = { $%, $' } that satisfy:
[Watson-Crick.]• ! is a matching and • each pair in ! is a Watson-Crick pair: * − ,, , − *, - − ., or . − -.[No sharp turns.]: The ends of each pair are separated by at least 4 intervening bases. If $%, $' ∈ !, then 0 < 2 − 4.[Non-crossing.] If ($%, $') and ($6, $7) are two pairs in !, then we cannot have 0 < 8 < 2 < 9.
Free energy: Usual hypothesis is that an RNA molecule will maximize total free energy.
Goal: Given an RNA molecule B = b1b2…bn, find a secondary structure S that maximizes the number of base pairs.
24
approximate by number of base pairs
DP: First Attempt
First attempt. Let !"#(%) = maximum number of base pairs in a
secondary structure of the substring b1b2…bn.
Suppose '( is matched with ') in !"# % .What IH should we use?
Difficulty: This naturally reduces to two subproblems
• Finding secondary structure in '+, … , ').+, i.e., OPT(t-1)
• Finding secondary structure in ')/+, … , '(.+, ???
25
1 t n
match bt and bn
DP: Second Attempt
Definition: !"# $, & = maximum number of base pairs in a secondary structure of the substring (), ()*+, … , (-
Case 1: If & − $ ≤ 4.• !"# $, & = 0 by no-sharp turns condition.
Case 2: Base (- is not involved in a pair.• !"# $, & = !"#($, & − 1)
Case 3: Base (- pairs with (7 for some $ ≤ 8 < & − 4• non-crossing constraint decouples resulting sub-problems• !"# $, & = max)=7>-?@{ 1 + !"#($, 8 − 1) + !"#(8 + 1, & − 1) }
26
The most important part of a correct DP; It fixes IH
Recursive Code
27
Let M[i,j]=empty for all i,j.
Compute-OPT(i,j){if (j-i <= 4)
return 0;if (M[i,j] is empty)
M[i,j]=Compute-OPT(i,j-1)for t=i to j-5 do
if (!", !$ is in {A-U, U-A, C-G, G-C})M[i,j]=max(M[i,j], 1+Compute-OPT(i,t-1) +
Compute-OPT(t+1,j-1))return M[j]
}
Does this code terminate?What are we inducting on?Key question: is there any loop in the recursion?
Formal Induction
Let !"#(%, ') = maximum number of base pairs in a secondary structure of the substring )*, )*+,, … , ).Base Case: !"#(%, ') = 0 for all %, ' where ' − % ≤ 4.IH: For some ℓ ≥ 4, Suppose we have computed !"#(%, ') for all %, 'where % − ' ≤ ℓ.
IS: Goal: We find !"#(%, ') for all %, ' where % − ' = ℓ + 1. Fix %, ' such that % − ' = ℓ + 1.Case 1: Base ). is not involved in a pair.• !"# %, ' = !"#(%, ' − 1) [this we know by IH since % − ' − 1 = ℓ]
Case 2: Base ). pairs with ): for some % ≤ ; < ' − 4• !"# %, ' = max*@:A.BC{ 1 + !"#(%, ; − 1) + !"#(; + 1, ' − 1) }
28We know by IH since difference ≤ ℓ
Bottom-up DP
29
for ℓ = 1, 2, …, n-1for i = 1, 2, …, n-1
j = i + ℓif (ℓ <= 4)
M[i,j]=0;else
M[i,j]=M[i,j-1]for t=i to j-5 do
if ("#, "% is in {A-U, U-A, C-G, G-C})M[i,j]=max(M[i,j], 1+ M[i,t-1] + M[t+1,j-1])
return M[1, n]}
Running Time: &(())
0 0 0
0 0
02
3
4
1
i
6 7 8 9
j
Lesson
We may not always induct on ! or " to get to smaller subproblems.
We may have to induct on |! − %| or ! + % when we are dealing with more complex problems.
30