Forest alignment with affine gaps and anchors. Stefanie Schirmer and Robert Giegerich, Practical Computer Science, Bielefeld University. CPM 2011.



  • Forest alignment with affine gaps and anchors

    Stefanie Schirmer and Robert Giegerich

    Practical Computer Science, Bielefeld University

    CPM 2011

    Universität Bielefeld

    S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 1 / 22

  • RNA structure levels

    GCGGAUUUAGCUCAGDDGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA

    Figure: The three structure levels of an RNA molecule, here: tRNA for phenylalanine in yeast [9].


  • Previous works: Forest alignment

    Most closely related:

    Tao Jiang, Lusheng Wang, and Kaizhong Zhang. “Alignment of Trees – An Alternative to Tree Edit.” In: Theoretical Computer Science 143.1 (1995), pp. 137–148

    M. Hoechsmann et al. “Local similarity in RNA secondary structures.” In: Proceedings of the IEEE Computational Systems Bioinformatics Conference (CSB 2003) 2 (2003), pp. 159–168

    A. Lozano et al. “Seeded tree alignment”. In: IEEE Transactions on computational biology and bioinformatics (2008), pp. 503–513

    Plus 100 articles on RNA structure comparison with other methods


  • RNA forest representation and forest alignment

    Hairpin 1: aaacccuuu with structure ((.....))

    Hairpin 2: aaucccauu with structure (((...)))

    One possible alignment of the two structures: (-(.....)-) and (((-...-)))

    Node labels of the alignment tree: (P,P), (a,a), (-,P), (-,a), (P,P), (a,u), (a,-), (c,c), (c,c), (c,c), (u,-), (u,a), (-,u), (u,u)

    Figure: Example for RNA forest alignment model. Top: two hairpin structures and their tree representations. Bottom: one possible alignment tree.


  • Tree and Forest alignment

    Definition (Alignment tree)

    Alignment tree: Tree labeled with pairs from $(A \cup \{-\}) \times (A \cup \{-\}) \setminus \{(-,-)\}$.

    Definition (Tree alignment)

    Tree alignment: Alignment tree A ∈ Alignments(F, G), which can be transformed

    to F by projecting pair node labels to the left component and contracting the resulting tree to remove all gaps

    to G in the same way after projecting pair node labels to the right component
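The projection-and-contraction step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the representation of a node as `(label_pair, children)`, the helper names `project`, `leaf` and `to_string`, and the nested form of the example alignment tree (read off the alignment `(-(.....)-)` / `(((-...-)))` of the two hairpins) are all assumptions made for this example.

```python
def project(forest, side):
    """Project pair labels to one component (0 = left, 1 = right),
    then contract every gap node by splicing its children into its parent."""
    result = []
    for label, children in forest:
        kids = project(children, side)
        if label[side] == "-":
            result.extend(kids)          # gap node: keep only its children
        else:
            result.append((label[side], kids))
    return result


def leaf(left, right):
    """Hypothetical helper: an alignment tree node without children."""
    return ((left, right), [])


# Assumed nesting of the example alignment tree of the two hairpins.
ALIGNMENT = [(("P", "P"), [
    leaf("a", "a"),
    (("-", "P"), [
        leaf("-", "a"),
        (("P", "P"), [leaf("a", "u"), leaf("a", "-"), leaf("c", "c"),
                      leaf("c", "c"), leaf("c", "c"), leaf("u", "-"),
                      leaf("u", "a")]),
        leaf("-", "u"),
    ]),
    leaf("u", "u"),
])]


def to_string(forest):
    """Render a forest in a compact term notation, e.g. P(aP(aacccuu)u)."""
    return "".join(
        label + ("(" + to_string(kids) + ")" if kids else "")
        for label, kids in forest
    )


print(to_string(project(ALIGNMENT, 0)))  # tree of hairpin 1
print(to_string(project(ALIGNMENT, 1)))  # tree of hairpin 2
```

Under these assumptions, the left projection yields the tree of aaacccuuu with structure ((.....)), and the right projection yields the tree of aaucccauu with structure (((...))).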


  • Classic gap model: singleton gap

    Node labels of the alignment tree: (P, P), (a, a), (–, P), (–, a), (P, P), (a, u), (a, –), (c, c), (c, c), (c, c), (u, –), (u, a), (–, u), (u, u)

    Figure: Singleton Gaps.

    Definition (Singleton gap)

    Singleton gap: single node in a forest, labeled with gap symbol “−”.


  • New gap model: General gap

    Node labels of the alignment tree: (P, P), (a, a), (-, P), (-, a), (P, P), (a, u), (a, -), (c, c), (c, c), (c, c), (u, -), (u, a), (-, u), (u, u)

    Figure: General gap.

    Definition (General gap)

    (General) gap: set of singleton gaps in a forest that is maximal and connected under the union of the parent-child and direct-sibling relations.

    A gap in an alignment A of F and G is a gap in either the left or the right projection of A.
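As an illustration of this definition, the following Python sketch groups the singleton gaps of one projection into general gaps and reports their sizes. The node representation `(label_pair, children)`, the function name `general_gap_sizes`, and the nesting of the example tree (derived from the hairpin alignment `(-(.....)-)` / `(((-...-)))`) are assumptions for this example only.

```python
def general_gap_sizes(forest, side):
    """Sizes of the general gaps in one projection (0 = left, 1 = right).

    A gap node joins the gap of its parent if the parent is a gap,
    otherwise the gap of its direct left sibling if that sibling is a gap,
    otherwise it opens a new general gap.
    """
    sizes = []  # sizes[i] = number of singleton gaps in general gap i

    def visit(nodes, parent_gap):
        prev_gap = None  # gap index of the direct left sibling, if any
        for label, children in nodes:
            if label[side] == "-":
                gap = parent_gap if parent_gap is not None else prev_gap
                if gap is None:
                    sizes.append(0)
                    gap = len(sizes) - 1
                sizes[gap] += 1
            else:
                gap = None
            visit(children, gap)
            prev_gap = gap

    visit(forest, None)
    return sizes


def leaf(left, right):
    """Hypothetical helper: an alignment tree node without children."""
    return ((left, right), [])


# Assumed nesting of the example alignment tree of the two hairpins.
ALIGNMENT = [(("P", "P"), [
    leaf("a", "a"),
    (("-", "P"), [
        leaf("-", "a"),
        (("P", "P"), [leaf("a", "u"), leaf("a", "-"), leaf("c", "c"),
                      leaf("c", "c"), leaf("c", "c"), leaf("u", "-"),
                      leaf("u", "a")]),
        leaf("-", "u"),
    ]),
    leaf("u", "u"),
])]

print(general_gap_sizes(ALIGNMENT, 0))  # gaps in the left projection
print(general_gap_sizes(ALIGNMENT, 1))  # gaps in the right projection
```

With this tree, the three singleton gaps (-, P), (-, a) and (-, u) of the left projection form one general gap, while (a, -) and (u, -) in the right projection stay separate because matched nodes lie between them.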


  • New gap model: Oscillating gap

    Node labels of the alignment tree: (P, P), (a, a), (-, P), (-, a), (P, -), (a, -), (a, -), (c, c), (c, c), (c, c), (u, -), (u, -), (-, u), (u, u)

    Figure: Oscillating gap.

    Definition (Oscillating gap)

    Oscillating gap: a gap that may cross from the first to the second component of alignment A and vice versa.


  • Gap scoring

    Classical linear gap model

    each singleton gap adds contribution to score

    score function linear in number of singleton gaps

    procedure can lead to many small gaps / scattered alignment → not reasonable when gaps are interpreted as evolutionary events

    New affine gap model

    consider adjacent gaps as one unit

    → affine gap cost model: high gap opening costs, low extension costs

    cost function $w(l) = w_{open} + (l - 1) \cdot w_{extend}$, where l is the number of singleton gaps

    score of (a,−) depends on context

    algorithm must keep track of opened gaps → gap mode
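The difference between the two scoring schemes can be made concrete with a small Python sketch. The function names and the numeric weights below are illustrative assumptions; only the formula w(l) = w_open + (l − 1) · w_extend is taken from the slide.

```python
def linear_gap_cost(l, w_gap):
    """Classical model: every singleton gap contributes the same cost."""
    return l * w_gap


def affine_gap_cost(l, w_open, w_extend):
    """Affine model: w(l) = w_open + (l - 1) * w_extend for a gap of l singletons."""
    if l < 1:
        raise ValueError("a gap contains at least one singleton gap")
    return w_open + (l - 1) * w_extend


# Illustrative weights (assumed): opening is expensive, extending is cheap.
W_GAP, W_OPEN, W_EXTEND = 3, 10, 1

one_long_gap = [4]                # one general gap of four singletons
four_short_gaps = [1, 1, 1, 1]    # four scattered singleton gaps

for gaps in (one_long_gap, four_short_gaps):
    linear = sum(linear_gap_cost(l, W_GAP) for l in gaps)
    affine = sum(affine_gap_cost(l, W_OPEN, W_EXTEND) for l in gaps)
    print(gaps, "linear:", linear, "affine:", affine)
```

The linear model cannot distinguish the two situations, whereas the affine model charges the scattered gaps far more than the single contiguous gap.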


  • Traditional forest alignment

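    Input: forest F = a1 (with its child forest) followed by siblings a2 . . . ; forest G = b1 (with its child forest) followed by siblings b2 . . .

    R: (a1, b1): align the children of a1 with the children of b1, and a2 . . . with b2 . . .

    D: (a1,−): align the children of a1 with b1...k, and a2 . . . with bk+1 . . .

    I: (−, b1): align a1...k with the children of b1, and ak+1 . . . with b2 . . .

    Figure: Recurrences for traditional forest alignment.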
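The three cases R, D and I can be turned into a short memoized recursion. The following Python sketch is a plain, unoptimized reading of the recurrences with linear gap costs, not the authors' tool: the representation of a tree as `(label, children)` with tuples, the unit costs, and the names `forest_distance`, `size` and `leaf` are assumptions for illustration.

```python
from functools import lru_cache

# A tree is (label, children); a forest is a tuple of trees. Tuples keep
# forests hashable, so lru_cache can memoize the subproblems.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


def forest_distance(F, G, mismatch=1, gap=1):
    """Minimal alignment cost of forests F and G under linear gap costs."""

    @lru_cache(maxsize=None)
    def d(F, G):
        if not F or not G:
            # Only one forest is left: each remaining node is a singleton gap.
            return gap * (size(F) + size(G))
        (a1, a_children), a_rest = F[0], F[1:]
        (b1, b_children), b_rest = G[0], G[1:]
        # R: pair the roots; children with children, siblings with siblings.
        best = ((0 if a1 == b1 else mismatch)
                + d(a_children, b_children) + d(a_rest, b_rest))
        # D: (a1, -); the children of a1 are aligned with b1...bk.
        for k in range(len(G) + 1):
            best = min(best, gap + d(a_children, G[:k]) + d(a_rest, G[k:]))
        # I: (-, b1); a1...ak are aligned with the children of b1.
        for k in range(len(F) + 1):
            best = min(best, gap + d(F[:k], b_children) + d(F[k:], b_rest))
        return best

    return d(F, G)


def leaf(label):
    """Hypothetical helper: a tree node without children."""
    return (label, ())


# The two hairpins aaacccuuu ((.....)) and aaucccauu (((...))) as trees.
HAIRPIN_1 = (("P", (leaf("a"),
                    ("P", tuple(leaf(x) for x in "aacccuu")),
                    leaf("u"))),)
HAIRPIN_2 = (("P", (leaf("a"),
                    ("P", (leaf("a"),
                           ("P", tuple(leaf(x) for x in "uccca")),
                           leaf("u"))),
                    leaf("u"))),)

print(forest_distance(HAIRPIN_1, HAIRPIN_2))
```

The sketch enumerates all split points k explicitly and relies on memoization; it is meant to make the case analysis readable rather than to reach the running time discussed later in the talk.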


  • Forest alignment with affine gap costs

    Figure (seven panels, each repeating the Input / R / D / I recurrence scheme of traditional forest alignment): Affine gap costs: no gap mode, parent gap mode, sibling gap mode.

  • Forest alignment with affine gap costs: Case 1

    Gap modes: no gap mode, parent gap mode, sibling gap mode

    Figure (Input / R / D / I recurrence scheme as in traditional forest alignment): Case 1 of the forest alignment with affine gap costs.

  • Shape abstraction

    CGUCUUAAACUCAUCACCGUGUGGAGCUGCGACCCUUCCCUAGAUUCGAAGACGAG
    ((((((...(((..(((...))))))...(((..((.....))..)))))))))..

    Shape Level 5: [[][]]
    Shape Level 4: [[][[]]]
    Shape Level 3: [[[]][[]]]
    Shape Level 2: [[_[]][_[]_]]
    Shape Level 1: [_[_[]]_[_[]_]]_

    Figure: An example secondary structure and its five shape representations, implemented in the tool RNAshapes [7]. Figure from [4].
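The most abstract level can be reproduced with a few lines of Python. This is a minimal sketch, not the RNAshapes implementation: it assumes that level 5 drops all unpaired bases and merges every base pair that encloses exactly one other base pair (stacking, bulges and internal loops) into a single bracket pair. The function name `shape_level5` is hypothetical.

```python
def shape_level5(dot_bracket):
    """Level-5 style shape string of a dot-bracket structure (assumed rules)."""
    # Build the nesting tree of base pairs; unpaired positions are ignored.
    root = []
    stack = [root]
    for ch in dot_bracket:
        if ch == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif ch == ")":
            stack.pop()

    def render(node):
        # Skip through chains of pairs that enclose exactly one pair.
        while len(node) == 1:
            node = node[0]
        return "[" + "".join(render(child) for child in node) + "]"

    return "".join(render(child) for child in root)


structure = "((((((...(((..(((...))))))...(((..((.....))..))))))))).."
print(shape_level5(structure))
```

For the structure shown on the slide this reproduces the level-5 shape [[][]]; the lower levels additionally track bulges, internal loops and unpaired regions and are not covered by this sketch.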


  • Anchored alignment

    Figure: Anchored alignment input trees with constraints.

    anchoring is a given tree mapping

    tree alignment conforming to an anchoring need not exist


  • Time complexity: traditional alignment

    let n = |F| = |G|

    let d = deg(F) = deg(G)

    time complexity for alignment of F and G is $O(n^2 d^2)$

    RNA: average degree is effectively constant, d ≈ 30


  • Time complexity: affine alignment

    remains asymptotically the same with affine gap scoring

    constant factor ≈ 7 is expected

    measured an average runtime factor of

    * 8.02 for alignments of folded sequences of ≈ 100 nucleotides
    * 7.79 for those of ≈ 200 nucleotides


  • Time complexity: anchored alignment

    k − 1 evenly placed anchors → both F and G are split into k parts of about equal size

    n is divided by k in the complexity expression

    → complexity is reduced from $n^2$ to $k \left(\frac{n}{k}\right)^2 = \frac{n^2}{k}$

    → efficiency with k − 1 anchors: $O\left(\frac{n^2}{k} d^2\right)$.
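The effect of anchor placement on the quadratic term can be checked numerically. The Python sketch below is an idealized work estimate, assuming that both forests are cut into parts of the same sizes and that the factor d² is constant; the function names and the example sizes are illustrative.

```python
def alignment_work(part_sizes):
    """Quadratic work estimate: aligning two parts of size s costs s * s."""
    return sum(s * s for s in part_sizes)


def ideal_speedup(n, part_sizes):
    """Ratio of unanchored work n^2 to the work summed over anchored parts."""
    if sum(part_sizes) != n:
        raise ValueError("parts must add up to n")
    return n * n / alignment_work(part_sizes)


n = 200
print(ideal_speedup(n, [50, 50, 50, 50]))    # 3 evenly placed anchors, k = 4
print(ideal_speedup(n, [170, 10, 10, 10]))   # 3 unevenly placed anchors
```

Evenly placed anchors give the full factor k, while anchors clustered near one end leave one large part that dominates the work, so the gain stays well below k.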


  • Results: speedup by anchoring

    Table: Speed-up factor gained by shape anchoring, for ten members of each of the chosen Rfam families, folded by RNAcast [5].

    RNA Family            Common Shape      Anchors   Avg. Speed-up by Anchoring
    5S rRNA (RF00001)     [[][]]            3         1.27
    Spot 42 (RF00021)     [][][]            3         3.30
    Cobalamin (RF00174)   [[][[][[][]]]]    7         2.96
    T-box (RF00230)       [[[][]][[][]]]    7         3.23

  • Results: better alignment to true family

    Figure: Improved recognition of RNA family with affine gap costs (green arrow). Axes: rank (0 to 1400) and length (80 to 220).

  • Next steps

    submit PhD thesis

    deploy tool on Bielefeld University Bioinformatics Server http://bibiserv.techfak.uni-bielefeld.de

    compose shape analysis and anchored alignment for comparative structure prediction


  • Thanks for your attention. Questions and remarks are welcome.

  • Further reading

    Tao Jiang, Lusheng Wang, and Kaizhong Zhang. “Alignment of Trees – An Alternative to Tree Edit.” In: Theoretical Computer Science 143.1 (1995), pp. 137–148

    M. Hoechsmann et al. “Local similarity in RNA secondary structures.” In: Proceedings of the IEEE Computational Systems Bioinformatics Conference (CSB 2003) 2 (2003), pp. 159–168

    A. Lozano et al. “Seeded tree alignment”. In: IEEE Transactions on computational biology and bioinformatics (2008), pp. 503–513

    Francesc Rosselló and Gabriel Valiente. “An algebraic view of the relation between largest common subtrees and smallest common supertrees”. In: Theoretical Computer Science 362.1 (2006), pp. 33–53

    J. Reeder and R. Giegerich. “Consensus shapes: an alternative to the Sankoff algorithm for RNA consensus structure prediction.” In: Bioinformatics 21.17 (2005), pp. 3516–3523

    Hélène Touzet. “Tree edit distance with gaps”. In: Information Processing Letters 85.3 (2003), pp. 123–129

    Outline: Motivation: RNA secondary structure; Previous works; Structure tree representation and tree alignment; Tree alignment model; Gap models; Traditional and affine gap forest alignment algorithms; Anchored alignment by shape abstraction; Complexity; Results