lecture-25 - cs.uic.eduxiaorui/cs401/slides/lecture-25.pdf8 4 5 7 2 3 6 3 2 5 4 0 4 3 1 3 1 0 2 4 0 1 3 0 0 0 j !" p(j) Weighted Interval Scheduling Label jobs by finishing time: #1≤⋯≤#'.)(+)=largest

CS 401

Dynamic Programming / Knapsack Problem, RNA Secondary

Structure

Xiaorui Sun1

Dynamic Programming

Principle:• Define a collection of subproblems

• Typically, only a polynomial number of subproblems• Solution to original problem can be computed from subproblems• Natural ordering of subproblems from “smallest” to “largest” that enables

determining a solution to a subproblem from solutions to smaller subproblems

• Parameterization: Describe subproblems by parameters• Memorization: Remember the solution of subproblems

Technique:• Binary choice: weighted interval scheduling. • Multiway choice: segmented least squares.

Question: How to find a solution for dynamic programming?• Back-track from the optimal objective value 2

548

327

236

045

134

013

042

031

00

p(j)!"j

Weighted Interval Scheduling

Label jobs by finishing time: # 1 ≤ ⋯ ≤ # ' .)(+) = largest index . < + such that job . is compatible with +.

Time0 1 2 3 4 5 6 7 8 9 10 11

6

7

8

4

3

1

2

5

3

4

4

6

6

7

7

10

OPT(j)

012 + = 30 if + = 0max !" + 012 ) + , 012 + − 1 o.w.

548

327

236

045

134

013

042

031

00

p(j)!"j



Time0 1 2 3 4 5 6 7 8 9 10 11

6

7

8

4

3

1

2

5

3

4

4

6

6

7

7

10

OPT(j)

012 + = 30 if + = 0max !" + 012 ) + , 012 + − 1 o.w.

548

327

236

045

134

013

042

031

00

p(j)!"j



Time0 1 2 3 4 5 6 7 8 9 10 11

6

7

8

4

3

1

2

5

3

4

4

6

6

7

7

10

OPT(j)

012 + = 30 if + = 0max !" + 012 ) + , 012 + − 1 o.w.

548

327

236

045

134

013

042

031

00

p(j)!"j



Time0 1 2 3 4 5 6 7 8 9 10 11

6

7

8

4

3

1

2

5

3

4

4

6

6

7

7

10

OPT(j)

012 + = 30 if + = 0max !" + 012 ) + , 012 + − 1 o.w.

548

327

236

045

134

013

042

031

00

p(j)!"j



Time0 1 2 3 4 5 6 7 8 9 10 11

6

7

8

4

3

1

2

5

3

4

4

6

6

7

7

10

OPT(j)

012 + = 30 if + = 0max !" + 012 ) + , 012 + − 1 o.w.

Knapsack Problem

Knapsack Problem

Given ! objects and a "knapsack.“

Item " weighs #$ > 0 kilograms and has value '$ > 0.

Knapsack has capacity of ( kilograms.

Goal: fill knapsack so as to maximize total value.

Ex: OPT is { 3, 4 } with value 40.

Greedy: repeatedly add item with maximum ratio '$/#$.Ex: { 5, 2, 1 } achieves only value = 35 Þ greedy not optimal.

9

1

Value

182228

1

Weight

56

6 2

7

Item

1

345

2W = 11

First Attempt

Def !"#(%) = max profit subset of items 1,… , %.

Case 1: !"# does not select item %.• !"# selects best of { 1, 2, … , % − 1 }

Case 2: !"# selects item %.• accepting item % does not immediately imply that we will have

to reject other items• without knowing what other items were selected before %, we

don't even know if we have enough room for %

Conclusion: Need more sub-problems!

10

Stronger DP (New Variable)

Let !"#(%, ') = Max value of subsets of items 1,… , % of weight ≤ '

Case 1: !"#(%, ') selects item %• In this case, !"# %, ' = -. + !"#(% − 1,' − '.)

Case 2: !"# %, ' does not select item %• In this case, !"# %, ' = !"#(% − 1,').

Therefore,

11

!"# %, ' = 10!"# % − 1,'max(!"# % − 1,' , -. + !"# % − 1,' − '. )

Take best of the two

If % = 0If '. > 'o.w.,

DP for Knapsack

12

for w = 0 to WM[0, w] = 0

for i = 1 to nfor w = 1 to W

if (wi > w)M[i, w] = M[i-1, w]

elseM[i, w] = max {M[i-1, w], vi + M[i-1, w-wi ]}

return M[n, W]

Compute-OPT(i,w)if M[i,w] == empty

if (i==0)M[i,w]=0

else if (wi > w)M[i,w]=Comp-OPT(i-1,w)

elseM[i,w]= max {Comp-OPT(i-1,w), vi + Comp-OPT(i-1,w-wi)}

return M[i, w]

recursive

Non-recursive

DP for Knapsack

13

n + 1

1

Value

182228

1

Weight

56

6 2

7

Item

1

345

2

f

{ 1, 2 }

{ 1, 2, 3 }

{ 1, 2, 3, 4 }

{ 1 }

{ 1, 2, 3, 4, 5 }

0

0

0

0

0

0

0

1

0

2

0

3

0

4

0

5

0

6

0

7

0

8

0

9

0

10

0

11

0

W + 1

W = 11

if (wi > w)M[i, w] = M[i-1, w]


DP for Knapsack

14

n + 1

1

Value

182228

1

Weight

56

6 2

7

Item

1

345

2

f

{ 1, 2 }

{ 1, 2, 3 }

{ 1, 2, 3, 4 }

{ 1 }

{ 1, 2, 3, 4, 5 }

0

0

0

0

0

0

0

1

0

1

2

0

1

3

0

1

4

0

1

5

0

1

6

0

1

7

0

1

8

0

1

9

0

1

10

0

1

11

0

1

W + 1

W = 11

if (wi > w)M[i, w] = M[i-1, w]


DP for Knapsack

15

n + 1

1

Value

182228

1

Weight

56

6 2

7

Item

1

345

2

f

{ 1, 2 }

{ 1, 2, 3 }

{ 1, 2, 3, 4 }

{ 1 }

{ 1, 2, 3, 4, 5 }

0

0

0

0

0

0

0

1

0

1

1

1

1

1

2

0

6

1

3

0

1

4

0

1

5

0

1

6

0

1

7

0

1

8

0

1

9

0

1

10

0

1

11

0

1

W + 1

W = 11OPT: { 4, 3 }

value = 22 + 18 = 40

if (wi > w)M[i, w] = M[i-1, w]


7

DP for Knapsack

16

n + 1

1

Value

182228

1

Weight

56

6 2

7

Item

1

345

2

W + 1

W = 11OPT: { 4, 3 }

value = 22 + 18 = 40

if (wi > w)M[i, w] = M[i-1, w]


f

{ 1, 2 }

{ 1, 2, 3 }

{ 1, 2, 3, 4 }

{ 1 }

{ 1, 2, 3, 4, 5 }

0

0

0

0

0

0

0

1

0

1

1

1

1

1

2

0

6

6

1

3

0

7

7

1

4

0

7

7

1

5

0

7

18

1

6

0

7

1

7

0

7

1

8

0

7

1

9

0

7

1

10

0

7

1

11

0

7

1

19

DP for Knapsack

17

n + 1

1

Value

182228

1

Weight

56

6 2

7

Item

1

345

2

W + 1

W = 11OPT: { 4, 3 }

value = 22 + 18 = 40

if (wi > w)M[i, w] = M[i-1, w]


f

{ 1, 2 }

{ 1, 2, 3 }

{ 1, 2, 3, 4 }

{ 1 }

{ 1, 2, 3, 4, 5 }

0

0

0

0

0

0

0

1

0

1

1

1

1

1

2

0

6

6

6

1

3

0

7

7

7

1

4

0

7

7

7

1

5

0

7

18

18

1

6

0

7

19

22

1

7

0

7

24

24

1

8

0

7

25

28

1

9

0

7

25

1

10

0

7

25

1

11

0

7

25

1

29

DP for Knapsack

18

n + 1

1

Value

182228

1

Weight

56

6 2

7

Item

1

345

2

W + 1

W = 11OPT: { 4, 3 }

value = 22 + 18 = 40

if (wi > w)M[i, w] = M[i-1, w]


f

{ 1, 2 }

{ 1, 2, 3 }

{ 1, 2, 3, 4 }

{ 1 }

{ 1, 2, 3, 4, 5 }

0

0

0

0

0

0

0

1

0

1

1

1

1

1

2

0

6

6

6

1

6

3

0

7

7

7

1

7

4

0

7

7

7

1

7

5

0

7

18

18

1

18

6

0

7

19

22

1

22

7

0

7

24

24

1

28

8

0

7

25

28

1

29

9

0

7

25

29

1

34

10

0

7

25

29

1

34

11

0

7

25

40

1

40

DP Ideas so far

• You may have to define an ordering to decrease

#subproblems

• You may have to strengthen DP, equivalently the induction,

i.e., you have may have to carry more information to find the

Optimum.

• This means that sometimes we may have to use two

dimensional or three dimensional induction

• This dynamic algorithm is not a polynomial time algorithm • For an input of N bits, W can be as large as 2N

• Knapsack problem is actually NP-complete

19

RNA Secondary Structure

RNA Secondary Structure

RNA: A String B = b1b2…bn over alphabet { A, C, G, U }.

Secondary structure. RNA is single-stranded so it tends to loop back and form base pairs with itself. This structure is essential for understanding behavior of molecule.

21G

U

C

A

GA

A

G

CG

A

UG

A

U

U

A

G

A

C A

A

C

U

G

A

G

U

C

A

U

C

G

G

G

C

C

G

Ex: GUCGAUUGAGCGAAUGUAACAACGUGGCUACGGCGAGA

complementary base pairs: A-U, C-G

RNA Secondary Structure (Formal)

Secondary structure. A set of pairs ! = { $%, $' } that satisfy:

[Watson-Crick.]• ! is a matching and

• each pair in ! is a Watson-Crick pair: * − ,, , − *, - − ., or . − -.[No sharp turns.]: The ends of each pair are separated by at least 4 intervening bases. If $%, $' ∈ !, then 0 < 2 − 4.

[Non-crossing.] If ($%, $') and ($6, $7) are two pairs in !, then we cannot have 0 < 8 < 2 < 9.

22

Secondary Structure (Examples)

23

C

G G

C

A

G

U

U

U A

A U G U G G C C A U

G G

C

A

G

U

U A

A U G G G C A U

C

G G

C

A

U

G

U

U A

A G U U G G C C A U

sharp turn crossingok

G

G£4

base pair

RNA Secondary Structure (Formal)

Secondary structure. A set of pairs ! = { $%, $' } that satisfy:

[Watson-Crick.]• ! is a matching and • each pair in ! is a Watson-Crick pair: * − ,, , − *, - − ., or . − -.[No sharp turns.]: The ends of each pair are separated by at least 4 intervening bases. If $%, $' ∈ !, then 0 < 2 − 4.[Non-crossing.] If ($%, $') and ($6, $7) are two pairs in !, then we cannot have 0 < 8 < 2 < 9.

Free energy: Usual hypothesis is that an RNA molecule will maximize total free energy.

Goal: Given an RNA molecule B = b1b2…bn, find a secondary structure S that maximizes the number of base pairs.

24

approximate by number of base pairs

DP: First Attempt

First attempt. Let !"#(%) = maximum number of base pairs in a

secondary structure of the substring b1b2…bn.

Suppose '( is matched with ') in !"# % .What IH should we use?

Difficulty: This naturally reduces to two subproblems

• Finding secondary structure in '+, … , ').+, i.e., OPT(t-1)

• Finding secondary structure in ')/+, … , '(.+, ???

25

1 t n

match bt and bn

DP: Second Attempt

Definition: !"# $, & = maximum number of base pairs in a secondary structure of the substring (), ()*+, … , (-

Case 1: If & − $ ≤ 4.• !"# $, & = 0 by no-sharp turns condition.

Case 2: Base (- is not involved in a pair.• !"# $, & = !"#($, & − 1)

Case 3: Base (- pairs with (7 for some $ ≤ 8 < & − 4• non-crossing constraint decouples resulting sub-problems• !"# $, & = max)=7>-?@{ 1 + !"#($, 8 − 1) + !"#(8 + 1, & − 1) }

26

The most important part of a correct DP; It fixes IH

Recursive Code

27

Let M[i,j]=empty for all i,j.

Compute-OPT(i,j){if (j-i <= 4)

return 0;if (M[i,j] is empty)

M[i,j]=Compute-OPT(i,j-1)for t=i to j-5 do

if (!", !$ is in {A-U, U-A, C-G, G-C})M[i,j]=max(M[i,j], 1+Compute-OPT(i,t-1) +

Compute-OPT(t+1,j-1))return M[j]

}

Does this code terminate?What are we inducting on?Key question: is there any loop in the recursion?

Formal Induction

Let !"#(%, ') = maximum number of base pairs in a secondary structure of the substring )*, )*+,, … , ).Base Case: !"#(%, ') = 0 for all %, ' where ' − % ≤ 4.IH: For some ℓ ≥ 4, Suppose we have computed !"#(%, ') for all %, 'where % − ' ≤ ℓ.

IS: Goal: We find !"#(%, ') for all %, ' where % − ' = ℓ + 1. Fix %, ' such that % − ' = ℓ + 1.Case 1: Base ). is not involved in a pair.• !"# %, ' = !"#(%, ' − 1) [this we know by IH since % − ' − 1 = ℓ]

Case 2: Base ). pairs with ): for some % ≤ ; < ' − 4• !"# %, ' = max*@:A.BC{ 1 + !"#(%, ; − 1) + !"#(; + 1, ' − 1) }

28We know by IH since difference ≤ ℓ

Bottom-up DP

29

for ℓ = 1, 2, …, n-1for i = 1, 2, …, n-1

j = i + ℓif (ℓ <= 4)

M[i,j]=0;else

M[i,j]=M[i,j-1]for t=i to j-5 do

if ("#, "% is in {A-U, U-A, C-G, G-C})M[i,j]=max(M[i,j], 1+ M[i,t-1] + M[t+1,j-1])

return M[1, n]}

Running Time: &(())

0 0 0

0 0

02

3

4

1

i

6 7 8 9

j

Lesson

We may not always induct on ! or " to get to smaller subproblems.

We may have to induct on |! − %| or ! + % when we are dealing with more complex problems.

30

Documents

lecture-25 - cs.uic.eduxiaorui/cs401/slides/lecture-25.pdf8 4 5 7 2 3 6 3 2 5 4 0 4 3 1 3 1 0 2 4 0 1 3 0 0 0 j !" p(j) Weighted Interval Scheduling Label jobs by finishing time: #1≤⋯≤#'.)(+)=largest