Advanced Course: Shapes

Robert Giegerich

Motivation

Lost in Folding Space

Abstraction comes to rescue

Abstract shapes

Defining shape abstractions

Properties of the shape space

RNAshapes

Simple shape analysis

Complete probabilistic shape analysis

Shape Probabilities

Application: Shape based indexing

Application: Shape based matching

RNA Abstract Shape Analysis

Robert Giegerich

Faculty of Technology & Center of Biotechnology, Bielefeld University

robert@TechFak.Uni-Bielefeld.DE

EMBO Practical Course on Computational RNA Biology, Cargese, April 2010

Robert Giegerich Advanced Course: Shapes


Where do we stand ...

1 Thermodynamic model (X. Flamm)

2 MFE folding, optimal structure, fallacies (G. Steger)

3 representative structural alternatives

4 structure prediction from multiple sequences (D. Mathews)

5 structure comparison (D. Mathews)

6 search by structure (I. Meyer, P. Gardner)

7 . . .


1 Motivation: Lost in Folding Space; Abstraction comes to rescue

2 Abstract shapes: Defining shape abstractions; Properties of the shape space

3 RNAshapes: Simple shape analysis; Complete probabilistic shape analysis; Shape Probabilities

4 Application: Shape based indexing

5 Application: Shape based matching


Better than optimal . . . (1)

Can we get better/more information from thermodynamic folding than the MFE structure?

How accurate is the MFE structure anyway?


2004 Mfold evaluation by Gutell Lab

Doshi KJ, Cannone JJ, Cobaugh CW, Gutell RR: Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction. BMC Bioinformatics. 2004 Aug 5;5:105.

Compares MFE foldings to structures derived by comparative analysis and proven by experimental techniques.

Findings:

base pair accuracy of about 20% - 71%

no improvement from recently updated thermodynamic parameters

note: did not check for good near-optimal solutions


Base pair accuracy – what does it mean?

[Figure: 2D drawings of a reference structure and two alternative structures, with their dot-bracket strings]

((((...)))) ((((...))))...((((...))))...((((...))))...((((....))))

((((...)))) ..............((((((((((((((((........))))))))))))))))

((((...)))) ............((((((((((((((((........))))))))))))))))..

4 out of 20 BP correct...

A reference structure and two structures at the same distance 16.

Two structures at distance 16, but with the same "shape": [ [ ] [ ] ]


Accuracy of MFE folding . . .

RNA folding struggles with

adequacy of thermodynamic parameters . . . ?

uncovered structural motifs – pseudoknots, kissing hairpins!

dynamics of interaction with other molecules . . . ?

RNA transcript processing . . . ?

folding kinetics (co-transcriptional folding) . . . ?

...

physical properties of the folding space . . . !


The problem to be solved

We want more comprehensive information about an RNA molecule’s foldings than just its MFE structure.


Lost in folding space (1)

The folding space of a given sequence is LARGE:

number of foldings is exponential in sequence length

number of near-optimal foldings is exponential in energy window

Structure asymptotics:

$$S(n) \approx 1.104366 \cdot n^{-3/2} \cdot 2.618034^{n}$$

Number of secondary structures for ALL sequences of length n.

A typical tRNA of 74 nt has about 4 million feasible structures.

Consider the 111 “best” structures, each with 27 - 28 bp:


gggcccauagcucagugguagagugccuccuuugcaaggaggaugcccuggguucgaaucccagugggucca

((((((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)))))))).

((((((((.((...)).))((.((((((((((...))))))).))).))))))))((.(((....)))))..

((((((((.((...)).))((.((((((((((...))))))).))).))))))))((..(((...)))))..

((.(((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)))))))..

.(((((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)))))))..

((((((((.......((((((.((((((((((...))))))).))).)))(((....)))))))))))))).

(((((((((((((((.(((...((.(((((((...))))))))))))))))))).........)))))))).

((((((((.((...))(((...((.(((((((...)))))))))))).(((((....)).))))))))))).

((((((((.((...))(((...((.(((((((...)))))))))))).(((((....))).)))))))))).

((((((((.((...))((....((((((((((...))))))).)))))(((((....)).))))))))))).

((((((((.((...))((....((((((((((...))))))).)))))(((((....))).)))))))))).

((((((((.((...))(((.((((((((((((...))))))).))((...)))))..)))...)))))))).

((((((((.((...))(((.((((.(((((((...)))))))))(((...)))))..)))...)))))))).

((((((((.((...))(((((.((((((((((...))))))).))).))).((....))))..)))))))).

((((((((.((...))((.((.((((((((((...))))))).))).)).(((....))))).)))))))).

((((((((.((...))(((((.((((((((((...))))))).))).)))((......)))).)))))))).

((((((((.((...))(((((.((((((((((...))))))).))).))).((....)).)).)))))))).

((((((((.((...))(((((.((.(((((((...)))))))))...)))(((....))))).)))))))).

((((((((.((...))(((((..(((((((((...))))))).))..)))(((....))))).)))))))).

((((((((.((...))(((((.(((..((((((....))))))))).)))(((....))))).)))))))).


((((((((.((...))(((((.(((..((((((...)).))))))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((..((((((...))).)))))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((((((((.....)))))).))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((((((.((....)))))).))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((((((..((...)))))).))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((((((.((...)).)))).))).)))(((....))))).)))))))).

((((((((.((...))(((((.((((((.(((....)))))).))).)))(((....))))).)))))))).

((((((((.((...))(((((.((((((..(((...)))))).))).)))(((....))))).)))))))).

((((((((.((...))(((((.((((((.(((...))).))).))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((((((.((...)))).)).))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((((.((((...)))).)).))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((.((((((...))))))..))).)))(((....))))).)))))))).

((((((((.((...))(((((.(((.((((((...)))).)).))).)))(((....))))).)))))))).

((((((((.((...))(((((.((.(((((((...)))))))..)).)))(((....))))).)))))))).

((((((((.((...))((((..((((((((((...))))))).)))..))(((....))))).)))))))).

((((((((.(((....)))((.((((((((((...))))))).))).))((((....))))..)))))))).

((((((((.(((....)))((.((((((((((...))))))).))).))((((....)).)).)))))))).

((((((((.((((.((.((...))))((((((...)))))))).))..(((((....)).))))))))))).


((((((((.((((.((.((...))))((((((...)))))))).))..(((((....))).)))))))))).

(((((((...((....(((((.((((((((((...))))))).))).)))(((....)))))))))))))).

(((((((...((.((.(((...((.(((((((...))))))))))))))((((....)).))))))))))).

(((((((...((.((.(((...((.(((((((...))))))))))))))((((....)))).))))))))).

(((((((...(((((.(((...((.(((((((...)))))))))))))))(((....)))..))))))))).

(((((((..((...))(((...((.(((((((...))))))))))))((((((....)).))))))))))).

(((((((..((...))(((...((.(((((((...))))))))))))((((((....))).)))))))))).

(((((((..((...))(((...((.(((((((...))))))))))))((((((....)))).))))))))).

(((((((..((...))(((((.((((((((((...))))))).))).)))(((....)))))..))))))).

(((((((((.((....))))..((((((((((...))))))).))).((((((....)).))))))))))).

(((((((((.((....))))..((((((((((...))))))).))).((((((....))).)))))))))).

(((((((((.((....))))..((((((((((...))))))).))).((((((....)))).))))))))).

(((((((((..((...))))..((((((((((...))))))).))).((((((....)).))))))))))).

(((((((((..((...))))..((((((((((...))))))).))).((((((....))).)))))))))).

(((((((((..((...))))..((((((((((...))))))).))).((((((....)))).))))))))).

(((((((((((...))..))..((((((((((...))))))).))).((((((....)).))))))))))).

(((((((((((...))..))..((((((((((...))))))).))).((((((....))).)))))))))).

(((((((((((...))..))..((((((((((...))))))).))).((((((....)))).))))))))).


(((((((((..((.((.((...))))((((((...))))))))))..((((((....)).))))))))))).

(((((((((..((.((.((...))))((((((...))))))))))..((((((....))).)))))))))).

(((((((((..((.((.((...))))((((((...))))))))))..((((((....)))).))))))))).

(((((((((((...))((((...))))((((((....))))))))..((((((....)).))))))))))).

(((((((((((...))((((...))))((((((....))))))))..((((((....))).)))))))))).

(((((((((((...))((((...))))((((((....))))))))..((((((....)))).))))))))).

(((((((((((...))((((...))))((((((...)).))))))..((((((....)).))))))))))).

(((((((((((...))((((...))))((((((...)).))))))..((((((....))).)))))))))).

(((((((((((...))((((...))))((((((...)).))))))..((((((....)))).))))))))).

(((((((((((...))((((...))))((((((...))).)))))..((((((....)).))))))))))).

(((((((((((...))((((...))))((((((...))).)))))..((((((....))).)))))))))).

(((((((((((...))((((...))))((((((...))).)))))..((((((....)))).))))))))).

(((((((..((((.((.((...))))((((((...)))))))).)).((((((....)).))))))))))).

(((((((..((((.((.((...))))((((((...)))))))).)).((((((....))).)))))))))).

(((((((..((((.((.((...))))((((((...)))))))).)).((((((....)))).))))))))).

((((((...(((....(((((.((((((((((...))))))).))).)))(((....)))))))))))))).

((((((...(((.((.(((...((.(((((((...))))))))))))))((((....)).))))))))))).

((((((...(((.((.(((...((.(((((((...))))))))))))))((((....)))).))))))))).

((((((...((((((.(((...((.(((((((...)))))))))))))))(((....)))..))))))))).


((((((...((....((((((.((((((((((...))))))).))).)))(((....)))))))))))))). [ [][]]

(((((((((((((((.(((...((.(((((((...)))))))))))))))))))...))......)))))). [ ]

((((((..(((((((.(((...((.(((((((...)))))))))))))))))))...((....)))))))).

(((((..((.((....(((((.((((((((((...))))))).))).)))(((....)))))))))))))).

(((((..((.((.((.(((...((.(((((((...))))))))))))))((((....)).))))))))))).

(((((..((.((.((.(((...((.(((((((...))))))))))))))((((....)))).))))))))).

(((((..((.(((((.(((...((.(((((((...)))))))))))))))(((....)))..))))))))).

(((((..((((...))(((...((.(((((((...))))))))))))((((((....)).))))))))))). [[][][]]

(((((..((((...))(((...((.(((((((...))))))))))))((((((....))).)))))))))).

(((((..((((...))(((...((.(((((((...))))))))))))((((((....)))).))))))))).

(((((..((((...))(((((.((((((((((...))))))).))).)))(((....)))))..))))))).

(((((..((((((.((.((...))))((((((...)))))))).)).((((((....)).))))))))))).

(((((..((((((.((.((...))))((((((...)))))))).)).((((((....))).)))))))))).

(((((..((((((.((.((...))))((((((...)))))))).)).((((((....)))).))))))))).

(((((.((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)).))))).

((((..(((((((((.(((...((.(((((((...)))))))))))))))))))...))((....)))))).

((((..(((((((((.(((...((.(((((((...)))))))))))))))))))...)).((...)))))).

((((.(((.((...))(((((.((((((((((...))))))).))).)))(((....))))).))).)))).

(((.((.((.((....(((((.((((((((((...))))))).))).)))(((....)))))))))))))).

(((.((.((.((.((.(((...((.(((((((...))))))))))))))((((....)).))))))))))).


(((.((.((.((.((.(((...((.(((((((...))))))))))))))((((....)))).))))))))).

(((.((.((.(((((.(((...((.(((((((...)))))))))))))))(((....)))..))))))))).

(((.((.((((...))(((...((.(((((((...))))))))))))((((((....)).))))))))))).

(((.((.((((...))(((...((.(((((((...))))))))))))((((((....))).)))))))))).

(((.((.((((...))(((...((.(((((((...))))))))))))((((((....)))).))))))))).

(((.((.((((...))(((((.((((((((((...))))))).))).)))(((....)))))..))))))).

(((.((.((((((.((.((...))))((((((...)))))))).)).((((((....)).))))))))))).

(((.((.((((((.((.((...))))((((((...)))))))).)).((((((....))).)))))))))).

(((.((.((((((.((.((...))))((((((...)))))))).)).((((((....)))).))))))))).

(((.((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)).))))).

(((((.((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)))).))).

((((((((.((...)).))((.((((((((((...))))))).))).)))))(((..((....)))))))).

(((...(((((((((.(((...((.(((((((...)))))))))))))))))))...))(((...)))))).

(((.((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).)))).))).

((.(((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).))))).)).

.(((((((.((...))(((((.((((((((((...))))))).))).)))(((....))))).))))).)).


Lost in folding space (2)

What we observe from the simple tRNA example:

LARGE number of close-to-optimal foldings

FEW structural classes holding many similar foldings

Can we condense the folding space to good representatives of these classes?


Better than optimal . . . (2)

Alternatives to a single MFE structure prediction:

BP probabilities and dotplots (McCaskill)

sampling of near-optimal structures (Mfold)

complete enumeration within a threshold (RNAsubopt)

stochastic sampling and clustering a posteriori (Sfold)

classified folding by abstract shape (RNAshapes)


Classification by abstract shape

[Figure: two secondary-structure drawings with labelled structural elements: Multiple Loop, Stacking Region, Hairpin Loop, Internal Loop, Bulge Loop (left), Bulge Loop (right)]

What is a shape: LIKE this ... or NOT like this ...?


Levels of abstraction

Level 0: Full structure

Level 1: All types of loops

Level 3: All helix interruptions

Level 4: Multi- and internal loops, no bulges

Level 5: Stem arrangement only


String representation of shapes

CGUCUUAAACUCAUCACCGUGUGGAGCUGCGACCCUUCCCUAGAUUCGAAGACGAG
((((((...(((..(((...))))))...(((..((.....))..)))))))))..

Shape Type 5: [[][]]
Shape Type 4: [[][[]]]
Shape Type 3: [[[]][[]]]
Shape Type 2: [[ []][ [] ]]
Shape Type 1: [ [ [ ]] [ [ ] ]]

[Figure: 2D drawing of this structure with sequence positions 1 to 56 marked]


Formalizing the notion of (abstract) shape

Shape abstraction retains nesting and adjacency of stems

Shape abstraction disregards all sizes (of stems, loops, . . . )

Shape abstraction may retain or disregard presence and type of bulges and internal loops, i.e. helix interruptions

RNAshapes provides shape abstraction levels 1 through 5
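The rules above can be sketched in a few lines of Python. This is only an illustration, not the RNAshapes implementation: the function names (`parse_pairs`, `children`, `shape`) are made up for this sketch, only levels 3 to 5 are covered (levels 1 and 2 also record unpaired regions), and the reading of "bulge" as unpaired bases on one side and "internal loop" as unpaired bases on both sides of a helix interruption is an assumption.

```python
def parse_pairs(db):
    """Partner table for a dot-bracket string: partner[i] = j, or -1 if unpaired."""
    partner = [-1] * len(db)
    stack = []
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def children(partner, lo, hi):
    """Base pairs lying directly inside the open interval (lo, hi)."""
    kids, i = [], lo + 1
    while i < hi:
        if partner[i] > i:            # opening base: record pair, skip past it
            kids.append((i, partner[i]))
            i = partner[i] + 1
        else:
            i += 1
    return kids


def shape(db, level=5):
    """Shape string of a dot-bracket structure at abstraction level 3, 4 or 5."""
    if level not in (3, 4, 5):
        raise ValueError("this sketch only covers levels 3, 4 and 5")
    partner = parse_pairs(db)

    def walk(lo, hi, new_helix):
        out = "[" if new_helix else ""
        kids = children(partner, lo, hi)
        if len(kids) == 1:
            k, l = kids[0]
            left, right = k - lo - 1, hi - l - 1
            if left == 0 and right == 0:
                interrupted = False           # plain stacking: same helix
            elif left > 0 and right > 0:
                interrupted = level <= 4      # internal loop
            else:
                interrupted = level <= 3      # bulge
            out += walk(k, l, interrupted)
        else:
            for k, l in kids:                 # hairpin (no kids) or multiloop
                out += walk(k, l, True)
        return out + ("]" if new_helix else "")

    return "".join(walk(i, j, True) for i, j in children(partner, -1, len(db)))


example = "((((((...(((..(((...))))))...(((..((.....))..))))))))).."
for lvl in (5, 4, 3):
    print(lvl, shape(example, lvl))
```

The example string is the 56-nt structure from the slide on string representation of shapes. All sizes are discarded, and the level only decides whether a helix interruption opens a new bracket.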


Shape abstraction mathematics

General:

tree-like domains of structures $F$ and shapes $P$

tree homomorphism $\pi : F \to P$

For each sequence $s$:

folding space of sequence $s$: $F(s)$

shape space of sequence $s$: $P(s) = \pi(F(s))$

shape class of $p$ in $F(s)$: $f(s, p) = \{x \mid x \in F(s),\ \pi(x) = p\}$

shape representative structure: shrep = class member of minimal free energy, formally

$\mathrm{shrep}(s, p) = \arg\min_{x \in f(s, p)} E_x$, where $E_x$ is the free energy of structure $x$
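As a minimal sketch of these definitions, the following Python code partitions a folding space into shape classes and picks the minimum-energy member of each class. The function names and the toy data are hypothetical; `pi` and `energy` stand for any abstraction function and any energy function supplied by the caller.

```python
from collections import defaultdict


def shape_classes(folding_space, pi):
    """Partition F(s) into classes f(s, p) = {x in F(s) | pi(x) = p}."""
    classes = defaultdict(list)
    for x in folding_space:
        classes[pi(x)].append(x)
    return dict(classes)


def shreps(folding_space, pi, energy):
    """For each shape p, the class member of minimal free energy."""
    classes = shape_classes(folding_space, pi)
    return {p: min(members, key=energy) for p, members in classes.items()}


# Toy folding space (made-up labels, shapes and energies in kcal/mol).
toy = {
    "x1": ("[]", -3.0),
    "x2": ("[]", -5.0),
    "x3": ("[[][]]", -4.0),
}
pi = lambda x: toy[x][0]
energy = lambda x: toy[x][1]

print(shape_classes(toy, pi))
print(shreps(toy, pi, energy))
```

Because `pi` is a function, every structure lands in exactly one class, which is why the shape classes are disjoint.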


Structures and shapes as trees and strings

[Figure: a structure and its abstract shapes drawn as trees. The Level 0 tree uses the node labels sr, ml, bl and hl over the bases; the shape trees use the node labels ML and HE.]

Level 0 (full structure): ((((.(((....)))((...(...)))))))

Level 1: [ _ [_] [ _ [_] ] ]

Level 3 abstract shape: [ [ ] [ [ ] ] ]

Level 5 abstract shape: [ [ ] [ ] ]


Shape algorithmics

Implementation of shape analysis:

shape abstractions are tree homomorphisms

integrate well with DP algorithms

allows for a priori rather than a posteriori analysis

compute shapes in parallel with energy

perform analyses on per-shape basis

Any RNA folding program can implement shape abstraction.

Currently: use RNAshapes.


Properties of shapes and shreps

Good properties:

shape classes are disjoint

shreps are interesting representatives

shapes have sequence-independent representation

shapes are meaningful across different sequences (of different length)

shapes and shreps can be computed efficiently

Bad properties:

shapes are too abstract

shapes are not abstract enough


Simple shape analysis with RNAshapes

The three top shreps of our tRNA example:

Shape     GGGCCCAUAGCUCAGUGGUAGAGUGCCUCCUUUGCAAGGAGGAUGCCCUGGGUUCGAAUCCCAGUGGGUCCA
[]        (((((((((((((((.((((.....(((((((...))))))).))))))))))).........)))))))).  -35.9 kcal/mol
[[][]]    ((((((((.....((.((((.....(((((((...))))))).))))))(((.......))).)))))))).  -32.2 kcal/mol
[[][][]]  ((((((...((((.......)))).(((((((...))))))).....(((((.......))))).)))))).  -31.7 kcal/mol


Shape [ ]

[Figure: secondary-structure drawing of the tRNA example in shape [ ]]


Shape [[ ][ ]]

[Figure: secondary-structure drawing of the tRNA example in shape [[ ][ ]]]


Shape [[ ][ ][ ]]

[Figure: secondary-structure drawing of the tRNA example in shape [[ ][ ][ ]]]


Shape Space Statistics

Condensation of the folding space:

Structure asymptotics:

$$S(n) \approx 1.104366 \cdot n^{-3/2} \cdot 2.618034^{n}$$

Level-k shape asymptotics:

$$P_1(n) \approx 0.98542 \cdot n^{-3/2} \cdot 2.40591^{n}$$

$$P_5(n) \approx 2.44251 \cdot n^{-3/2} \cdot 1.32218^{n}$$
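To get a feeling for how much the shape space condenses the folding space, the three asymptotic formulas can simply be evaluated. The sketch below uses only the constants given above; the function names are chosen for this illustration, and since the formulas are asymptotic the values are only meaningful for larger n.

```python
def asymptotic(c, base, n):
    """Evaluate c * n^(-3/2) * base^n."""
    return c * n ** -1.5 * base ** n


def structures(n):
    """S(n): secondary structures over all sequences of length n."""
    return asymptotic(1.104366, 2.618034, n)


def shapes_level1(n):
    """P1(n): level-1 shapes."""
    return asymptotic(0.98542, 2.40591, n)


def shapes_level5(n):
    """P5(n): level-5 shapes."""
    return asymptotic(2.44251, 1.32218, n)


for n in (20, 74, 100):
    print(f"n={n:3d}  S={structures(n):.3e}  "
          f"P1={shapes_level1(n):.3e}  P5={shapes_level5(n):.3e}")
```

For the tRNA length n = 74 this gives on the order of 10^28 structures against a few million level-5 shapes, a reduction of more than 20 orders of magnitude.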

Empirically, the numbers are much smaller for a concrete sequence.

See some statistics within 5% kcal/mol of MFE:


Numbers of shapes versus structures

[Plot: number of structures/shapes (y-axis, 0 to 400) versus sequence length in nt (x-axis, 0 to 300); series: Shapes, Structures]


Shapes versus structures, logarithmic scale

[Plot: number of structures/shapes (y-axis, logarithmic, 0.01 to 1e+18) versus sequence length N in nt (x-axis, 0 to 120); series: Structures, Shapes; fitted curves, in legend order: $0.0391 \cdot 1.3968912^{N}$ and $0.2064 \cdot 1.1067094^{N}$]


Homogeneity in shape classes

The “Boltzmann Ensemble” on Ice


Best k shreps

[Figure (Björn Voß): RNAshapes, with the shapes [], [[][]] and [[][][]].]


Complete probabilistic shape analysis

“How much would you trust a structure with a probability of $0.1 \cdot 10^{-12}$, even when it is optimal?”

Chip Lawrence, Benasque 2003 and ISMB 2007


From energy to probability

According to Boltzmann statistics, sequence $s$ has structure $x$ with probability

$$\mathrm{Prob}(x) = \frac{e^{-E_x/RT}}{Q}$$

where $E_x$ is the folding energy, $T$ is the temperature, $R$ the universal gas constant, and $Q$ the “partition function”,

$$Q = \sum_{x \in F(s)} e^{-E_x/RT}$$

Accumulated shape probabilities

$$\mathrm{Prob}(p) = \sum_{\pi(x) = p} \mathrm{Prob}(x) \quad \text{for all } p \in P(s)$$
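The two definitions can be traced on a tiny example. The following is a minimal sketch, assuming a hypothetical folding space given as a list of (shape, energy) pairs; the energies, the temperature default and all function names are made up for illustration, and a real folding space F(s) is of course far too large to enumerate this way.

```python
import math
from collections import defaultdict

R = 0.0019872  # universal gas constant in kcal/(mol*K)
T = 310.15     # temperature in K (37 degrees C), an assumed default


def structure_probabilities(energies, temperature=T):
    """Return (probabilities, Q) for a list of folding energies E_x in kcal/mol."""
    rt = R * temperature
    weights = [math.exp(-e / rt) for e in energies]  # Boltzmann weights e^(-E_x/RT)
    q = sum(weights)                                 # partition function Q
    return [w / q for w in weights], q


def shape_probabilities(folding_space, temperature=T):
    """Accumulate Prob(x) over all structures x that map to the same shape pi(x)."""
    probs, _ = structure_probabilities([e for _, e in folding_space], temperature)
    accumulated = defaultdict(float)
    for (shape, _), p in zip(folding_space, probs):
        accumulated[shape] += p
    return dict(accumulated)


# Hypothetical toy folding space: one entry (pi(x), E_x) per structure x.
toy_space = [
    ("[]", -22.9),
    ("[][]", -22.3), ("[][]", -22.2), ("[][]", -22.1), ("[][]", -22.0),
]
shape_probs = shape_probabilities(toy_space)
for shape, p in sorted(shape_probs.items(), key=lambda kv: -kv[1]):
    print(f"{shape:6s} {p:.4f}")
```

In this toy data the structure of minimal energy has shape [], yet the shape [][] accumulates more probability because several near-optimal structures share it.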


New information from shape probabilities

Overtaking: Shape probabilities may contradict energy ranking

[]        E = -22.90 kcal/mol    P = 0.2370279    gets 2nd
[][][]    E = -22.50 kcal/mol    P = 0.0999191    gets 3rd
[][]      E = -22.30 kcal/mol    P = 0.5511424    gets 1st

(Figure: Björn Voß)
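The overtaking effect can be read off by sorting the three shapes once by energy and once by probability. This short sketch uses only the values given on the slide; the variable names are illustrative.

```python
# (shape, energy in kcal/mol, accumulated shape probability), values from the slide
shapes = [
    ("[]", -22.90, 0.2370279),
    ("[][][]", -22.50, 0.0999191),
    ("[][]", -22.30, 0.5511424),
]

# Ranking by energy: lowest energy first.
by_energy = [s for s, _, _ in sorted(shapes, key=lambda t: t[1])]
# Ranking by probability: highest probability first.
by_probability = [s for s, _, _ in sorted(shapes, key=lambda t: t[2], reverse=True)]

print("by energy:     ", by_energy)
print("by probability:", by_probability)

# Probability mass not covered by the three shapes shown.
remaining = 1.0 - sum(p for _, _, p in shapes)
print(f"remaining probability: {remaining:.7f}")
```

The shape that comes last by energy comes first by probability, and the three shapes together leave about 11% of the probability mass to all other shapes.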


A propos “complete”

Probabilistic shape analysis is computationally expensive:

probabilities give full information about the folding space, but

we cannot compute only the k most likely shapes

computation is feasible up to 400 nt ...

but check out RapidShapes by Stefan Janssen


Requirements

Complete probabilistic shape analysis

requires a non-ambiguous grammar with correct dangles at all places

applies “classified” dynamic programming

takes time $O(1.1^n \cdot n^3)$ where $n = |s|$
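The idea of classified dynamic programming, as understood here, is that every table entry holds one value per shape class instead of a single value. The following toy sketch is not the RNAshapes grammar: it counts structures per shape rather than summing Boltzmann weights, ignores energies and dangles, and uses a simple non-ambiguous decomposition (the first base is either unpaired or paired with some partner). The shape abstraction (helix nesting only, with unpaired bases, bulges and internal loops ignored) and all names are assumptions made for illustration.

```python
from collections import Counter
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # minimal number of unpaired bases inside a hairpin loop


def is_single_block(shape: str) -> bool:
    """True if shape is exactly one bracketed block, e.g. '[]' or '[[][]]'."""
    depth = 0
    for pos, ch in enumerate(shape):
        depth += 1 if ch == "[" else -1
        if depth == 0:
            return pos == len(shape) - 1
    return False  # empty shape


def enclose(inner: str) -> str:
    """Shape of a base pair drawn around a region whose shape is `inner`."""
    # Around a single helix the pair only extends that helix (stack, bulge or
    # internal loop); otherwise it starts a new helix (hairpin or multiloop).
    return inner if is_single_block(inner) else "[" + inner + "]"


def shape_counts(seq: str) -> Counter:
    """Count all structures of seq, classified by shape ('' = unfolded)."""

    @lru_cache(maxsize=None)
    def table(i: int, j: int) -> tuple:
        # Entry for the subsequence seq[i:j]: a tuple of (shape, count) pairs.
        if j - i <= 0:
            return (("", 1),)
        counts = Counter(dict(table(i + 1, j)))      # case 1: base i unpaired
        for k in range(i + MIN_LOOP + 1, j):         # case 2: base i pairs with k
            if (seq[i], seq[k]) in PAIRS:
                for inner, c1 in table(i + 1, k):
                    for rest, c2 in table(k + 1, j):
                        counts[enclose(inner) + rest] += c1 * c2
        return tuple(counts.items())

    return Counter(dict(table(0, len(seq))))


print(shape_counts("GGGAAACCC"))
print(shape_counts("GGGAAACCCGGGAAACCC").most_common(5))
```

Because each entry carries one value per shape, the cost of the usual cubic recurrence is multiplied by the number of shapes seen, which is where the exponential factor in the running time comes from. Replacing the counts by Boltzmann weights under a proper energy model would turn the per-shape counts into per-shape contributions to the partition function.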


Results from complete probabilistic analysis

Some observations:

Sequence           Shape 1          Prob.         Shape 2             Prob.
lin-4 precursor    []               0.99999994
tRNA-ala           []               0.989744      [[]]                0.008994
typical mRNA       [][[][]]         0.432154      [[[][]][]]          0.149831
HIV-1 Leader       [][[][[][]]]]    0.6164        [][[[][[][]]][]]    0.3492


The RNAshapes package

Modes of operation:

Computation of low-energy shape representative structures

Computation of accumulated shape probabilities

Computation of consensus shapes

No heuristics involved

Available at http://bibiserv.techfak.uni-bielefeld.de/RNAshapes/


Application: shape based indexing

Assume we have an ncRNA candidate in some novel organism (⇒ lecture by C. Sharma) and want to know whether it resembles something known:

main resource: Rfam database with 600 structural RNA families

families represented by curated structural alignments (cf. Rfam lecture)

search via covariance models (cf. probabilistic models lecture)

search effort $O(n^4)$ per model


Filter techniques

Filter techniques are used to skip unsuccessful searches

1 BLAST filter

2 Ravenna HMM filter

3 shape index based filtering – RNAsifter by Stefan Janssen

Details on (1) and (2) in the Rfam Database lecture
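The principle behind a shape index based filter can be sketched as an inverted index from shape strings to families: only families that share a shape with the query are passed on to the expensive covariance model search. The sketch below is not RNAsifter's actual data structure; the family names, the shapes and the way shapes are chosen per family are hypothetical.

```python
from collections import defaultdict


def build_shape_index(family_shapes):
    """Invert {family: iterable of shape strings} into {shape: set of families}."""
    index = defaultdict(set)
    for family, shapes in family_shapes.items():
        for shape in shapes:
            index[shape].add(family)
    return index


def candidate_families(index, query_shapes):
    """Families sharing at least one shape with the query; all others are skipped."""
    hits = set()
    for shape in query_shapes:
        hits |= index.get(shape, set())
    return hits


# Hypothetical families and shapes, for illustration only.
family_shapes = {
    "family_A": ["[]", "[[][]]"],
    "family_B": ["[][]", "[][[][]]"],
    "family_C": ["[[][][]]"],
}
index = build_shape_index(family_shapes)

query_shapes = ["[][]", "[]"]  # e.g. a few best shapes predicted for the query
print(sorted(candidate_families(index, query_shapes)))
```

A lookup costs one dictionary access per query shape, so the filter is cheap compared with running every covariance model; its sensitivity depends entirely on which shapes are stored for each family and computed for the query.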


Shape index construction


Shape index based search

[Figure: shape index lookup for the query hg17_ct_RNAzset190_s5031. The panels list example shape strings, with and without markers for unpaired regions (_), for instance [][[[[]]]], [[[]][[[]]]], [[[[]]]][[[]]] and _[_[_[]]]_, each followed by a count of further shapes (53,116; 59,337; 93,840; 112,489; 12,156).]


Filtered search performance

[Figure: filtered search performance; y-axis 0.00 to 0.30, x-axis 0.40 to 1.00 (axis labels not recoverable); curves: k-best-shape-index, 1-SS_cons-shape-index, 1-consensus-shape-index, 1-hybrid-shape-index, 1-union-shape-index, 1-RNAalifold-shape-index, cmsearch --hmmfilter, k-RNAlishapes-shape-index.]


Average run times

[Figure: average run times; y-axis 0 to 800, x-axis 0 to 900 (axis labels not recoverable); curves: RNAsifter, cmsearch, HMM-filter, BLAST-filter.]


Shape based matching

Search by structure ...

Assume you have a (single) transcript with a well-defined structure

How to search for structural homologues in related organisms?

Create a specialized folding program via Locomotif at http://bibiserv.cebitec.uni-bielefeld.de/locomotif


References on abstract shape analysis

Abstract Shapes of RNA. Giegerich R, Voss B, Rehmsmeier M. Nucleic Acids Research 2004, Vol. 32, No. 15, 1-9.

Complete Probabilistic Analysis of RNA Abstract Shapes. Voss B, Giegerich R, Rehmsmeier M. BMC Biology 2006, Feb 15;4(1):5.

RNAshapes: an integrated RNA analysis package based on abstract shapes. Steffen P, Voss B, Rehmsmeier M, Reeder J, Giegerich R. Bioinformatics 2006, Feb 15;22(4):500-3.

Shape based indexing for faster search of RNA family databases. Janssen S, Reeder J, Giegerich R. BMC Bioinformatics 2008.

Locomotif

RapidShapes


The End

Thanks for your attention.
