Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Forest alignment with affine gaps and anchors
Stefanie Schirmer and Robert Giegerich
Practical Computer ScienceBielefeld University
CPM 2011
Universität Bielefeld
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 1 / 22
RNA structure levels
GCGGAUUUAGCUCAGDDGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA
Figure: The three structure levels of an RNA molecule, here: t-RNA forphenylalanine in yeast [9].
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 2 / 22
Previous works: Forest alignment
Most closely related:
Tao Jiang, Lusheng Wang, and Kaizhong Zhang. “Alignment of Trees– An Alternative to Tree Edit.” In: Theoretical Computer Science143.1 (1995), pp. 137–148
M. Hoechsmann et al. “Local similarity in RNA secondarystructures.” In: Proceedings of the IEEE Computational SystemsBioinformatics Conference (CSB 2003) 2 (2003), pp. 159–168
A. Lozano et al. “Seeded tree alignment”. In: IEEE Transactions oncomputational biology and bioinformatics (2008), pp. 503–513
Plus 100 articles on RNA structure comparison with other methods
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 3 / 22
RNA forest representation and forest alignment
P
a P
a a c c
aaacccuuu((.....))
c u u
u
P
a P
a P
u c c
aaucccauu(((...)))
c a
u
u(P,P)
(a,a) (-,P)
(-,a) (P,P)
(a,u) (a,-) (c,c) (c,c)
(-(.....)-)(((-...-)))
(c,c) (u,-) (u,a)
(-,u)
(u,u)
Figure: Example for RNA forest alignment model. Top: two hairpin structuresand their tree representations. Bottom: one possible alignment tree.
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 4 / 22
Tree and Forest alignment
Definition (Alignment tree)
Alignment tree: Tree labeled with pairs from{A ∪ {−} ×A ∪ {−}}/{(−,−)}.
Definition (Tree alignment)
Tree alignment: Alignment tree A ∈ Alignments(F ,G ), which can betransformed
to F by projecting pair node labels to left component, contractingresulting tree to remove all gaps
to G in same way after projecting pair node labels to right component
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 5 / 22
Classic gap model: singleton gap
(P, P)
(a, a) (–, P)
(–, a) (P, P)
(a, u) (a, –) (c, c) (c, c) (c, c) (u, –) (u, a)
(–, u)
(u, u)
Figure: Singleton Gaps.
Definition (Singleton gap)
Singleton gap: single node in a forest, labeled with gap symbol “−”.
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 6 / 22
New gap model: General gap
(P, P)
(a, a) (-, P)
(-, a) (P, P)
(a,u) (a, -) (c, c) (c, c) (c, c) (u, -) (u, a)
(-, u)
(u, u)
Figure: General gap.
Definition (General gap)
(General) gap: set of singleton gaps in forest, maximal and connectedunder union of parent-child and direct-sibling relation.Gap in alignment A of F and G is gap in either left or right projection of A.
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 7 / 22
New gap model: Oscillating gap
(P, P)
(a, a) (-, P)
(-, a) (P, -)
(a, -) (a, -) (c, c) (c, c) (c, c) (u, -) (u, -)
(-, u)
(u, u)
Figure: Oscillating gap.
Definition (Oscillating gap)
Oscillating gap: a gap that may cross from the first to the secondcomponent of alignment A and vice versa.
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 8 / 22
Gap scoring
Classical linear gap model
each singleton gap adds contribution to score
score function linear in number of singleton gaps
procedure can lead to many small gaps / scattered alignment → notreasonable (interpreted as evolutionary events)
New affine gap model
consider adjacent gaps as one unit
→ affine gap cost model: High gap opening costs, low extension costscost function w(l) = wopen + (l − 1) · wextend , l number of singletonsscore of (a,−) depends on contextalgorithm must keep track of opened gaps → gap mode
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 9 / 22
Traditional forest alignment
Input: a1 < a1 · · · > a2 . . . b1 < b1 · · · > b2 . . .
R: (a1, b1)
a1 . . . b1 . . .
a2 . . . b2 . . . D: (a1,−)
a1 . . . b1...k
a2 . . . bk+1 . . . I: (−, b1)
a1...k b1 . . .
ak+1 . . . b2 . . .
Figure: Recurrences for traditional forest alignment.
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 10 / 22
Forest alignment with affine gap costsInput: a1 < a1 · · · > a2 . . . b1 < b1 · · · > b2 . . .
R: (a1, b1)
a1 . . . b1 . . .
a2 . . . b2 . . . D: (a1,−)
a1 . . . b1...k
a2 . . . bk+1 . . . I: (−, b1)
a1...k b1 . . .
ak+1 . . . b2 . . .
Input: a1 < a1 · · · > a2 . . . b1 < b1 · · · > b2 . . .
R: (a1, b1)
a1 . . . b1 . . .
a2 . . . b2 . . . D: (a1,−)
a1 . . . b1...k
a2 . . . bk+1 . . . I: (−, b1)
a1...k b1 . . .
ak+1 . . . b2 . . .
Input: a1 < a1 · · · > a2 . . . b1 < b1 · · · > b2 . . .
R: (a1, b1)
a1 . . . b1 . . .
a2 . . . b2 . . . D: (a1,−)
a1 . . . b1...k
a2 . . . bk+1 . . . I: (−, b1)
a1...k b1 . . .
ak+1 . . . b2 . . .
Input: a1 < a1 · · · > a2 . . . b1 < b1 · · · > b2 . . .
R: (a1, b1)
a1 . . . b1 . . .
a2 . . . b2 . . . D: (a1,−)
a1 . . . b1...k
a2 . . . bk+1 . . . I: (−, b1)
a1...k b1 . . .
ak+1 . . . b2 . . .
Input: a1 < a1 · · · > a2 . . . b1 < b1 · · · > b2 . . .
R: (a1, b1)
a1 . . . b1 . . .
a2 . . . b2 . . . D: (a1,−)
a1 . . . b1...k
a2 . . . bk+1 . . . I: (−, b1)
a1...k b1 . . .
ak+1 . . . b2 . . .
Input: a1 < a1 · · · > a2 . . . b1 < b1 · · · > b2 . . .
R: (a1, b1)
a1 . . . b1 . . .
a2 . . . b2 . . . D: (a1,−)
a1 . . . b1...k
a2 . . . bk+1 . . . I: (−, b1)
a1...k b1 . . .
ak+1 . . . b2 . . .
Input: a1 < a1 · · · > a2 . . . b1 < b1 · · · > b2 . . .
R: (a1, b1)
a1 . . . b1 . . .
a2 . . . b2 . . . D: (a1,−)
a1 . . . b1...k
a2 . . . bk+1 . . . I: (−, b1)
a1...k b1 . . .
ak+1 . . . b2 . . .
Figure: Affine gap costs: no gap mode , parent gap mode , sibling gap mode .
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 11 / 22
Forest alignment with affine gap costs: Case 1
no gap modeparent gap modesibling gap mode
Input: a1 < a1 · · · > a2 . . . b1 < b1 · · · > b2 . . .
R: (a1, b1)
a1 . . . b1 . . .
a2 . . . b2 . . . D: (a1,−)
a1 . . . b1...k
a2 . . . bk+1 . . . I: (−, b1)
a1...k b1 . . .
ak+1 . . . b2 . . .
Figure: Case 1 of the forest alignment with affine gap costs.
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 12 / 22
Shape abstraction
CGUCUUAAACUCAUCACCGUGUGGAGCUGCGACCCUUCCCUAGAUUCGAAGACGAG((((((...(((..(((...))))))...(((..((.....))..)))))))))..
Shape Level 5:[[][]]Shape Level 4:[[][[]]]Shape Level 3:[[[]][[]]]Shape Level 2:[[_[]][_[]_]]Shape Level 1:[_[_[]]_[_[]_]]_
1
10
20
30
40
50
56
C
G
U
C
U
UAA
A
CUC
AU
CACC
G
U G U G GA G
C
UG C
G
A
C
CC
U
U
C C
C
UA
G
A
UU
C
G
A
A
G
A
C
G AG*
*
*
*
*
*
******
*
*
*
*
*
Figure: An example secondary structure and its five shape representations,implemented in the tool RNAshapes [7]. Figure from [4]
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 13 / 22
Anchored alignment
Figure: Anchored alignment input trees with constraints.
anchoring is a given tree mapping
tree alignment conforming to an anchoring need not exist
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 14 / 22
Time complexity: traditional alignment
let n = |F | = |G |let d = deg(F ) = deg(G )
time complexity for alignment of F and G is O(n2d2)RNA: average degree is effectively constant, d ≈ 30
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 15 / 22
Time complexity: affine alignment
remains asymptotically the same with affine gap scoring
constant factor ≈ 7 is expectedmeasured an average runtime factor
* 8.02 for alignments of folded sequences of ≈ 100 nucleotides* 7.79 for those of ≈ 200 nucleotides
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 16 / 22
Time complexity: anchored alignment
k − 1 evenly placed anchors → both F and G are split into k parts ofabout equal size
n is divided by k in the complexity expression
→ complexity is reduced from n2 to k( nk )2 = n2
k
→ efficiency with k − 1 anchors: O(n2k d2).
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 17 / 22
Results: speedup by anchoring
Table: Speed-up factor gained by shape anchoring, for ten members of each ofthe chosen Rfam families, folded by RNAcast [5].
RNA Family Common Shape AnchorsAvg. Speed-upby Anchoring
5S rRNA (RF00001) [[][]] 3 1.27
Spot 42 (RF00021) [][][] 3 3.30
Cobalamin (RF00174) [[][[][[][]]]] 7 2.96
T-box (RF00230) [[[][]][[][]]] 7 3.23
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 18 / 22
Results: better alignment to true family
0
200
400
600
800
1000
1200
1400
80 100 120 140 160 180 200 220
rank
length
Figure: Improved recognition of RNA family with affine gap costs (green arrow).
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 19 / 22
Next steps
submit PhD thesis
deploy tool on Bielefeld University Bioinformatics Serverhttp://bibiserv.techfak.uni-bielefeld.de
compose shape analysis and anchored alignment for comparativestructure prediction
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 20 / 22
Thanks for your attention.Questions and remarks are welcome.
?
?
?
?
?
?
?
??
?
?
? ?
?
?
?
?
?
?
?
?
??
?
?
?
?
??
?
?
?
?
?? ???
?
?
?
?
?
?
? ??
??
?
?
?
?
??
??
?
?
?
?
?
?
?
??
?
?
?
?
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 21 / 22
Further reading
Tao Jiang, Lusheng Wang, and Kaizhong Zhang. “Alignment of Trees – AnAlternative to Tree Edit.” In: Theoretical Computer Science 143.1 (1995),pp. 137–148
M. Hoechsmann et al. “Local similarity in RNA secondary structures.” In:Proceedings of the IEEE Computational Systems Bioinformatics Conference (CSB2003) 2 (2003), pp. 159–168
A. Lozano et al. “Seeded tree alignment”. In: IEEE Transactions on computationalbiology and bioinformatics (2008), pp. 503–513
Francesc Rosselló and Gabriel Valiente. “An algebraic view of the relation betweenlargest common subtrees and smallest common supertrees”. In: TheoreticalComputer Science 362.1 (2006), pp. 33–53
J. Reeder and R. Giegerich. “Consensus shapes: an alternative to the Sankoffalgorithm for RNA consensus structure prediction.” In: Bioinformatics 21.17(2005), pp. 3516–3523
Hélène Touzet. “Tree edit distance with gaps”. In: Information Processing Letters85.3 (2003), pp. 123–129
S. Schirmer, R.Giegerich (Bielefeld Univ.) Forest alignment with affine gaps & anchors 22 / 22
Motivation: RNA secondary structurePrevious worksStructure tree representation and tree alignmentTree alignment modelGap ModelsTraditional and affine gap forest aligment algorithmsAnchored alignment by shape abstractionComplexityResults