43
Sofya Raskhodnikova Algorithm Design and Analysis LECTURES 15 Dynamic Programming RNA Secondary Structure Shortest Paths: Bellman- Ford 10/17/2016 S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.1

Algorithm Design and Analysis CSE 565sxr48/cse565/lecture-notes/CSE565-F16...Dynamic Programming Over Intervals Notation. OPT(i, j) = maximum number of base pairs in a secondary structure

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

  • Sofya Raskhodnikova

    Algorithm Design and Analysis

    LECTURES 15Dynamic Programming

    • RNA Secondary Structure

    • Shortest Paths: Bellman-

    Ford

    10/17/2016S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.1

  • Review

    • Weighted independent set on the chain

    – Input: a chain graph of length n with values v1,..,vn

    – Goal: find a heaviest independent set

    • Let OPT(j) = ???

    • Write down a recursive formula for OPT

    – How many different subproblems?

    8 3 7 10 7 2

    10/17/2016S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.2

  • Exercise

    • Do the same with a 2 x n grid graph

    8 3 7 10 7 2

    12 52 7 14 10 8

    10/17/2016S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.3

  • Exercise

    • Do the same with a 2 x n grid graph

    • Three types of subproblems:

    – grid(i)

    – gridTop(i)

    – gridBottom(i)

    8 3 7 10 7 2

    12 52 7 14 10 8

    8 3 71

    07

    1

    2

    5

    27

    1

    4

    1

    0

    8 3 71

    07 2

    1

    2

    5

    27

    1

    4

    1

    0

    8 3 71

    07

    1

    2

    5

    27

    1

    4

    1

    08

    10/17/2016S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.4

  • Exercise

    – grid(i): maximum independent set in the subgraph consisting of only the first i pairs

    of nodes

    – gridTop(i): maximum independent set in the subgraph consisting of the first i pairs

    of nodes plus the top node of the (i+1)-st pair

    – gridBottom(i): maximum independent set in the subgraph consisting of the first i

    pairs of nodes plus the bottom node of the (i+1)-st pair

    8 3 71

    07

    1

    2

    5

    27

    1

    4

    1

    0

    8 3 71

    07 2

    1

    2

    5

    27

    1

    4

    1

    0

    8 3 71

    07

    1

    2

    5

    27

    1

    4

    1

    08

    i pairs

    i pairs

    i pairs10/17/2016S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.5

  • RNA Secondary Structure

    10/17/2016S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.7

  • RNA Secondary Structure

    RNA. String B = b1b2bn over alphabet { A, C, G, U }.

    Secondary structure. RNA is single-stranded so it tends to loop back

    and form base pairs with itself. This structure is essential for

    understanding behavior of molecule.

    G

    U

    C

    A

    GA

    A

    G

    CG

    A

    UG

    A

    U

    U

    A

    G

    A

    C A

    A

    C

    U

    G

    A

    G

    U

    C

    A

    U

    C

    G

    G

    G

    C

    C

    G

    Ex: GUCGAUUGAGCGAAUGUAACAACGUGGCUACGGCGAGA

    complementary base pairs: A-U, C-G

    L15.8

  • RNA Secondary Structure

    Secondary structure. A set of pairs S = { (bi, bj) } that satisfy:

    [Watson-Crick.] S is a matching and each pair in S is a Watson-

    Crick complement: A-U, U-A, C-G, or G-C.

    [No sharp turns.] The ends of each pair are separated by at least 4

    intervening bases. If (bi, bj) S, then i < j - 4.

    [Non-crossing.] If (bi, bj) and (bk, bl) are two pairs in S, then we

    cannot have i < k < j < l.

    Free energy. Usual hypothesis is that an RNA molecule will form the

    secondary structure with the optimum total free energy.

    Goal. Given an RNA molecule B = b1b2bn, find a secondary structure S

    that maximizes the number of base pairs.

    approximate by number of base pairs

    L15.9

  • RNA Secondary Structure: Examples

    Examples.

    C

    G G

    C

    A

    G

    U

    U

    U A

    A U G U G G C C A U

    G G

    C

    A

    G

    U

    U A

    A U G G G C A U

    C

    G G

    C

    A

    U

    G

    U

    U A

    A G U U G G C C A U

    sharp turn crossingok

    G

    G

    4

    base pair

    L15.10

  • RNA Secondary Structure: Subproblems

    First attempt. OPT(j) = maximum number of base pairs in a secondary

    structure of the substring b1b2bj.

    Difficulty. Results in two sub-problems.

    Finding secondary structure in: b1b2bt-1.

    Finding secondary structure in: bt+1bt+2bn-1.

    1 t n

    match bt and bn

    OPT(t-1)

    need more sub-problems

    L15.11

  • Dynamic Programming Over Intervals

    Notation. OPT(i, j) = maximum number of base pairs in a secondary

    structure of the substring bibi+1bj.

    Case 1. i j - 4.

    – OPT(i, j) = 0 by no-sharp turns condition.

    Case 2. Base bj is not involved in a pair.

    – OPT(i, j) = OPT(i, j-1)

    Case 3. Base bj pairs with bt for some i t < j - 4.

    – non-crossing constraint decouples resulting sub-problems

    – OPT(i, j) = 1 + maxt { OPT(i, t-1) + OPT(t+1, j-1) }

    take max over t such that i t < j-4 andbt and bj are Watson-Crick complements

    L15.12

  • Bottom Up Dynamic Programming Over Intervals

    Q. In what order should we solve the subproblems?

    A. Do shortest intervals first.

    Running time. O(n3).

    RNA(b1,…,bn) {

    for k = 5, 6, …, n-1

    for i = 1, 2, …, n-k

    j = i + k

    Compute M[i, j]

    return M[1, n]

    }using recurrence

    0 0 0

    0 0

    02

    3

    4

    1

    i

    6 7 8 9

    j

    L15.13

  • Dynamic Programming Summary

    Recipe.

    Decide which subproblems to use (define OPT(???)).

    Recursively define value of optimal solution.

    Compute value of optimal solution (bottom up or via memoization).

    Construct optimal solution from computed information.

    Dynamic programming techniques.

    Binary choice: weighted interval scheduling.

    Multi-way choice: segmented least squares.

    Adding a new variable: knapsack.

    Dynamic programming over intervals: RNA secondary structure.

    Top-down vs. bottom-up: different people have different intuition.

    Viterbi algorithm for Hidden Markov Models also uses DP to optimize a maximum likelihoodtradeoff between parsimony and accuracy

    parsing algorithm for context-freegrammars has similar structure

    If it fails, try again.

    L15.14

  • Bellman-Ford: Shortest paths

    via dynamic programming

    For consistency with book: shortest

    paths from all vertices

    to a destination 𝑡

    10/17/2016S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.15

  • 10/17/2016

    Single-source Shortest Path Problem

    • Input:

    – Directed graph G = (V, E).

    – Source node s, destination node t.

    – for each edge e, length (e) = length of e.

    – length path = sum of edge lengths

    • Find: shortest directed path from s to t.

    Length of path (s,2,3,5,t)is 9 + 23 + 2 + 16 = 50.

    s

    3

    t

    2

    6

    7

    4

    5

    23

    18

    2

    9

    14

    155

    30

    20

    44

    16

    11

    6

    19

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.16

  • When is there a shortest path?

    Under which conditins do shortest paths

    – Always exist?

    – Sometimes exist and sometimes not exist?

    – Never exist?

    • In a directed graph with nonnegative edge

    lengths?

    • In a directed graph with negative edges lengths?

    10/17/2016S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.17

  • When is there a shortest path?

    • Edge weights nonnegative:

    – shortest path always exists

    • Negative weight edges:

    – shortest path may exist or not

    – If no negative cycles in G, then shortest path exists

    – If negative cycles in G, then no shortest path

    10/17/2016S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.18

  • Shortest Paths: Negative-Cost Cycles

    Negative cost cycles

    Observation. If some path from s to t contains a negative

    cost cycle, there does not exist a shortest s-t path;

    otherwise, there exists one that is simple.

    10/17/2016

    s tW

    c(W) < 0

    -6

    7

    -4

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.19

  • Shortest Paths: Dynamic Programming

    Def. OPT(i, v) = length of shortest v-t path P using at most i edges.

    Case 1: P uses at most i-1 edges.

    – OPT(i, v) = OPT(i-1, v)

    Case 2: P uses exactly i edges.

    – if (v, w) is first edge, then OPT uses (v, w), and then selects best

    w-t path using at most i-1 edges

    Remark. By previous observation, if no negative cycles, then

    OPT(n-1, v) = length of shortest v-t path.

    OPT(i, v)

    0 if i 0

    min OPT(i1, v) ,(v, w) E

    min OPT(i1, w) cvw

    otherwise

    L15.24

  • Shortest Paths: Implementation

    Analysis. (mn) time, (n2) space.

    Finding the shortest paths. Maintain a "successor" for each table

    entry.

    Shortest-Path(G, t) {

    foreach node v V

    M[0, v]

    M[0, t] 0

    for i = 1 to n-1

    foreach node v V

    M[i, v] M[i-1, v]

    foreach edge (v, w) E

    M[i, v] min { M[i, v], M[i-1, w] + cvw }

    }

    L11.25

  • Shortest Paths: Improvements

    Maintain one array M[v] = length of shortest v-t path found so far.

    No need to check edges of the form (v, w) unless M[w] changed

    in previous iteration.

    Theorem. Throughout the algorithm,

    M[v] is length of some v-t path, and

    For every i: after i rounds of updates, the value M[v] is no larger than

    the length of shortest v-t path using i edges.

    Space and time complexity.

    Memory: O(m + n).

    Running time: O(mn) worst case, but substantially faster in practice.

    L15.26

  • Belman-Ford: Efficient Implementation

    10/17/2016

    Bellman-Ford-Shortest-Path(G, s, t) {

    foreach node v V {

    M[v]

    successor[v]

    }

    M[t] = 0

    for i = 1 to n-1 {

    foreach node w V {

    if (M[w] has been updated in previous iteration) {

    foreach node v such that (v, w) E {

    if (M[v] > M[w] + cvw) {

    M[v] M[w] + cvwsuccessor[v] w

    }

    }

    }

    If no M[w] value changed in iteration i, stop.

    }

    }

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.27

  • 10/17/2016

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    The demonstration is for a sligtly different version of the algorithm (see

    CLRS) that computes distances from the sourse node rather than distances

    to the destination node.

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.28

  • 10/17/2016

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    Initialization.

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.29

  • 10/17/2016

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    Order of edge relaxation.

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.30

  • 10/17/2016

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.31

  • 10/17/2016

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.32

  • 10/17/2016

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.33

  • 10/17/2016

    1

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.34

  • 10/17/2016

    4

    1

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.35

  • 10/17/2016

    4

    1

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.36

  • 10/17/2016

    42

    1

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.37

  • 10/17/2016

    2

    1

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.38

  • 10/17/2016

    2

    1

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    End of pass 1.

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.39

  • 10/17/2016

    1

    2

    1

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.40

  • 10/17/2016

    1

    2

    1

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.41

  • 10/17/2016

    1

    1

    2

    1

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.42

  • 10/17/2016

    1

    1

    2

    1

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.43

  • 10/17/2016

    1

    1

    2

    1

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.44

  • 10/17/2016

    1

    1

    2

    1

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.45

  • 10/17/2016

    1

    1

    2

    1

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.46

  • 10/17/2016

    12

    1

    2

    1

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.47

  • 10/17/2016

    2

    1

    2

    1

    Example of Bellman-Ford

    A

    B

    E

    C D

    –1

    4

    12

    –3

    2

    5

    3

    0

    1

    2

    34

    5

    7

    8

    6

    End of pass 2 (and 3 and 4).

    S. Raskhodnikova; based on slides by E. Demaine, C. Leiserson, A. Smith, K. Wayne L15.48