Turbo-Decoding of RNA Secondary Structure

Gaurav Sharma

Department of Electrical and Computer Engineering
Department of Biostatistics and Computational Biology

June 18, 2013

Outline

◮ Turbo-decoding in Communications: A Quick Review

◮ RNA Structure Analysis: Motivation and Background
  ◮ RNA, noncoding RNA, RNA structure and its significance
  ◮ RNA structure prediction
    ◮ Single/Multiple sequence methods

◮ Turbo-decoding RNA secondary structure
  ◮ Iterative probabilistic decoding of structures of multiple homologs: TurboFold
  ◮ Turbo-decoding: RNA vs communications

◮ Conclusions

◮ Ongoing related work

Convolutional Codes

◮ Encoder

[Figure: convolutional encoder built from delay elements D, mapping the messageword symbol streams m1(D), m2(D), m3(D) to the codeword symbol streams c1(D), c2(D), c3(D), c4(D)]

◮ Finite state machine
  ◮ Output and next state are functions of current state and inputs
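The finite state machine view can be sketched in a few lines of Python. This is a minimal illustration and not the encoder in the figure: it assumes a simpler rate-1/2 code with generators 1 + D + D^2 and 1 + D^2, and the names `step` and `encode` are hypothetical.

```python
def step(state, bit):
    """One transition of the finite state machine.

    state is (s1, s2): the previous input bit and the one before it.
    Returns (next_state, (out1, out2)).
    """
    s1, s2 = state
    out1 = bit ^ s1 ^ s2   # assumed generator 1 + D + D^2
    out2 = bit ^ s2        # assumed generator 1 + D^2
    return (bit, s1), (out1, out2)


def encode(bits):
    """Encode a message, starting from the all-zero state."""
    state = (0, 0)
    codeword = []
    for b in bits:
        state, (o1, o2) = step(state, b)
        codeword.extend([o1, o2])
    return codeword


print(encode([1, 0, 1, 1]))
```

Both the output pair and the next state depend only on the current state and the input bit, which is exactly the property the trellis representation relies on.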

ML Decoding: Convolutional Code

◮ Convolutional code structure constrains possibilities to a trellis

[Figure: trellis diagram labeled with the received (Rx) bits, the decoded path, and the message bits; branches carry two-bit labels such as 00, 01, 10, 11]

◮ ML Decoding: Most likely path through the trellis given the received information
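As a hedged sketch of the "most likely path" idea, the following hard-decision Viterbi decoder searches the trellis of an assumed rate-1/2 code (generators 1 + D + D^2 and 1 + D^2, not the code on the slide). For a binary symmetric channel with flip probability below 1/2, the path with the smallest Hamming distance to the received bits is the ML path. The function names are hypothetical.

```python
def step(state, bit):
    """Trellis branch: (next_state, output pair) for the assumed rate-1/2 code."""
    s1, s2 = state
    return (bit, s1), (bit ^ s1 ^ s2, bit ^ s2)


def viterbi(received):
    """Return (message bits, Hamming distance) of the best trellis path.

    received is a flat list of hard bits, two per trellis step.
    """
    # best[state] = (path metric, message bits so far); start in the zero state
    best = {(0, 0): (0, [])}
    for t in range(0, len(received), 2):
        r = received[t:t + 2]
        survivors = {}
        for state, (metric, msg) in best.items():
            for bit in (0, 1):
                new_state, out = step(state, bit)
                m = metric + (out[0] != r[0]) + (out[1] != r[1])
                # keep only the best path entering each state
                if new_state not in survivors or m < survivors[new_state][0]:
                    survivors[new_state] = (m, msg + [bit])
        best = survivors
    metric, msg = min(best.values(), key=lambda v: v[0])
    return msg, metric


# Codeword for message 1 0 1 1 with one received bit flipped
print(viterbi([1, 1, 1, 1, 0, 0, 0, 1]))
```

Keeping one survivor per state at each step is what makes the search linear in the message length rather than exponential.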

Turbo Decoding in Communications

◮ An encoder construction

[Figure: Systematic convolutional encoder (recursive). The messageword m(l) = (m(l)_0, m(l)_1, ..., m(l)_{k−1}) feeds a systematic subunit and a parity computation subunit P(D), which produces p(l); their outputs form the codeword symbols c(l)_0, c(l)_1, ..., c(l)_{n−k−1} and c(l)_{n−k}, c(l)_{n−k+1}, ..., c(l)_n.]

[Figure: Two encoders. Each passes the messageword m(l) through unchanged and adds a parity stream: P1(D) produces p(l)_1 and P2(D) produces p(l)_2.]

[Figure: Parallel concatenation. The messageword m(l) is sent as is, through P1(D) to give p(l)_1, and through the interleaver Π followed by P2(D) to give p(l)_2; together these form the codeword c(l).]

Turbo Decoding in Communications

[Figure: Encoder + channel. The message m̃ and the parity streams p̃1 (from P1(D)) and p̃2 (from the interleaver Π followed by P2(D)) pass through the channel and arrive as the received streams.]

[Figure: Iterative decoder. Iterative phase: one forward-backward algorithm computes the extrinsic information update for code 1 from r̃1 and a second one does the same for code 2 from r̃2, both also using r̃0; the two exchange extrinsic information {T(t)_l(m)} and {U(t)_l(m)}, l = 0, ..., L−1, through the interleaver Π and the deinterleaver Π−1. Final decoding step (t = T): approximate MAP decoding gives the message estimate {m̂(l)}, l = 0, ..., L−1.]

Symbol-wise MAP Decoding: Convolutional Code

◮ Convolutional code structure constrains possibilities to a trellis

[Figure: trellis diagram labeled with the received (Rx) bits; branches carry two-bit labels such as 00, 01, 10, 11]

◮ Most probable value of each bit, taking into account all possible paths through the trellis, given the received information

◮ Localized probabilistic information
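To make the contrast with ML path decoding concrete, here is a small sketch that computes the posterior probability of each message bit by summing over every possible message. It is a brute-force illustration under stated assumptions: a rate-1/2 code with generators 1 + D + D^2 and 1 + D^2, a binary symmetric channel with flip probability `p_flip`, and a uniform prior on messages. The forward-backward algorithm obtains the same marginals on the trellis without enumerating messages.

```python
from itertools import product


def encode(bits):
    """Assumed rate-1/2 convolutional encoder, zero initial state."""
    s1 = s2 = 0
    out = []
    for b in bits:
        out += [b ^ s1 ^ s2, b ^ s2]
        s1, s2 = b, s1
    return out


def bit_posteriors(received, n_bits, p_flip=0.1):
    """P(message bit = 1 | received) for every bit position."""
    num = [0.0] * n_bits
    total = 0.0
    for msg in product((0, 1), repeat=n_bits):
        code = encode(msg)
        d = sum(c != r for c, r in zip(code, received))
        likelihood = (p_flip ** d) * ((1 - p_flip) ** (len(code) - d))
        total += likelihood
        for idx, b in enumerate(msg):
            if b:
                num[idx] += likelihood
    return [x / total for x in num]


# One flipped bit in the codeword for message 1 0 1 1
posteriors = bit_posteriors([1, 1, 1, 1, 0, 0, 0, 1], n_bits=4)
print([round(p, 3) for p in posteriors])
```

The output is one probability per bit rather than a single path, which is the "localized probabilistic information" that turbo decoding passes between decoders.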

Turbo Decoding in Communications: Observations

◮ Multiple encodings of same message information

◮ Joint (optimal) decoding desirable
  ◮ Exact joint decoding ≈ exponential complexity
  ◮ Computationally Efficient Decoding: Iterative approximation (belief propagation)
    ◮ Localized MAP probabilistic formulation
    ◮ Decomposition into loosely coupled individual decodings + information exchange at each iteration
    ◮ Linear complexity in length of data
    ◮ Pseudo-prior interpretation

RNA?

What does this have to do with RNA?

RNA: Ribonucleic Acid

◮ Nucleic acid made of a long chain of units named nucleotides: nitrogenous base, ribose sugar, phosphate

◮ Adjacent nucleotides linked together by strong (covalent) phosphodiester bonds between sugar and phosphate

◮ Information encoded with 4 different types of nucleotides differentiated by base content: Adenine, Guanine, Cytosine, Uracil

[Figure: DNA (deoxyribonucleic acid) and RNA (ribonucleic acid) side by side, showing base pairs, nitrogenous bases, and the sugar-phosphate backbone, with the chemical structures of the bases. RNA uses A, U, C, G; DNA uses A, T, C, G. Uracil replaces Thymine in RNA. Source: http://www.genome.gov]

The Central Dogma

[Figure: DNA is transcribed to mRNA; the mature mRNA is transported across the nuclear membrane to the cytoplasm, where translation takes place at the ribosome: tRNAs match anti-codons to codons and deliver amino acids to the growing amino acid chain (protein). Source: http://www.genome.gov]

◮ Genetic information flows unidirectionally:
  ◮ DNA → RNA → Protein

◮ RNA plays a passive role
  ◮ Transient copy created for protein synthesis

RNA an Active Player: ncRNAs

[Figure: gene expression pathway annotated with ncRNAs. Labels: DNA template, transcription, pre-mRNA, self-splicing, intron, mRNA, codons (AUG, UAA, UAG), anti-codon, tRNA, ribosomal RNA, micro RNA (+/−), protein, ncRNAs]

◮ ncRNAs: play direct functional roles in cellular processes
  ◮ w/o translation to protein ⇒ “noncoding”

◮ Increasing numbers (being) discovered

◮ 1989 Nobel Prize in Chemistry: Ribozymes
  ◮ Thomas Cech and Sidney Altman

◮ 2006 Nobel Prize in Physiology/Medicine: siRNA
  ◮ Andrew Fire and Craig Mello

Noncoding RNAs (ncRNAs): Examples

◮ Commonly known ncRNAs
  ◮ Protein synthesis: tRNA, rRNA
  ◮ RNA modification: snoRNAs

◮ Up/Down regulation of gene expression
  ◮ Regulation of transcription
  ◮ siRNA/miRNA: post-transcriptional regulation, silencing of genes
  ◮ piRNAs: regulation of retrotransposons

◮ RNA Splicing (autocatalysis)

◮ Many more: ...

◮ RNA Genomes (many viruses, including HIV and SIV)

◮ ncRNAs and diseases
  ◮ Abnormal expression of ncRNAs observed in cancerous cells
  ◮ Prader-Willi Syndrome (over-eating and learning disabilities)
  ◮ Autism, Alzheimer’s, ...

Noncoding RNAs (ncRNAs)

◮ RNA molecules that directly play functional roles in cellular processes

◮ Do not code for protein synthesis ⇒ “noncoding”.

◮ Structure determines function in noncoding roles

◮ Determination of structure is of significant interest
  ◮ Further understanding of ncRNA function
  ◮ Enhances understanding of cellular processes and interactions
  ◮ Provides targets for drug design

Computational Prediction of RNA Structure

◮ Structure determines function in noncoding roles

◮ Experimental determination of structure is challenging
  ◮ X-ray Crystallography
  ◮ Crystallization difficult and expensive

◮ Computational estimation of structure is of significant interest
  ◮ Understanding ncRNA function in cellular processes and interactions
  ◮ Genome understanding: structure based ncRNA gene search
  ◮ Therapeutics: targets for drug design

RNA Structure Hierarchy [Tinoco and Bustamante, 1999]

AAUUGCGGGAAAGGGGUCAA
CAGCCGUUCAGUACCAAGUC
UCAGGGGAAACUUUGAGAUG
GCCUUGCAAAGGGUAUGGUA
AUAAGCUGACGGACAUGGUC
CUAACCACGCAGCCAAGUCC
UAAGUCAACAGAUCUUCUGU
UGAUAUGGAUGCAGUUCA

[Figure: the primary sequence above drawn as a folded secondary structure, 5’ to 3’, with helices labeled P4, P5, P5a, P5b, P5c, P6, P6a, P6b and nucleotide positions labeled from 120 to 260]

Figure: Hierarchy of RNA structure formation [Waring and Davies, 1984, Doudna and Cech, 2002, Doudna and Cate, 1997]

RNA Secondary Structure

◮ Folding of RNA linear molecular chain onto itself with base pairing rules

◮ Formation of hydrogen bonds between nucleotides
  ◮ Canonical base pairs
    ◮ A can pair with U
    ◮ G can pair with C and U
    ◮ G-U pair called non Watson-Crick pair

◮ Greater variety of structures than the DNA double helix
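The pairing rules listed above translate directly into a lookup. The sketch below is illustrative only; `can_pair` and `is_valid_structure` are hypothetical helper names, and a structure is represented, by assumption, as a list of 0-based index pairs.

```python
# Allowed pairs from the slide: A-U, G-C, and the non Watson-Crick G-U pair
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(x, y):
    """True if nucleotides x and y may form a base pair."""
    return (x.upper(), y.upper()) in PAIRS


def is_valid_structure(seq, pairs):
    """Check that every (i, j) pair obeys the rules and no base pairs twice."""
    used = set()
    for i, j in pairs:
        if i == j or i in used or j in used or not can_pair(seq[i], seq[j]):
            return False
        used.update((i, j))
    return True


print(can_pair("G", "U"), can_pair("A", "C"))
print(is_valid_structure("GGGAAACCC", [(0, 8), (1, 7), (2, 6)]))
```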

RNA Secondary Structure Elements

[Figure: secondary structure drawing of a sequence of roughly 410 nucleotides (positions labeled from 10 to 410, 5’ to 3’), annotated with its structural elements: hairpin loop, bulge loop, internal loop, multibranch loop, and pseudoknot]

Figure: Structural Elements of LGW17 sequence from RNase P Database [Brown, 1999]

RNA Structure: Thermodynamics

◮ Equilibrium: Boltzmann Distribution of structures

[Figure: the unpaired state in equilibrium with the folded structures S1, S2, ..., Sk, ...; each structure Sk carries the weight exp(−∆G°(Sk)/RT)]

◮ Lower ∆G°(Sk), higher the probability of Sk

◮ Most likely structure → Minimization of free energy
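A short numerical sketch of the Boltzmann weighting: each structure gets the weight exp(−∆G°/RT), and the weights are normalized to probabilities. The free energy values and the temperature of 37 °C are assumptions chosen for illustration, not data from the talk.

```python
import math

R = 1.987e-3   # gas constant in kcal/(mol K)
T = 310.15     # assumed temperature: 37 degrees C in kelvin


def boltzmann_probabilities(delta_g):
    """Map free energy changes (kcal/mol) to equilibrium probabilities."""
    weights = [math.exp(-g / (R * T)) for g in delta_g]
    z = sum(weights)            # partition function
    return [w / z for w in weights]


# Hypothetical ensemble: the unpaired state (0.0) and two folded structures
probs = boltzmann_probabilities([0.0, -1.0, -3.0])
print([round(p, 4) for p in probs])
```

The structure with the lowest free energy change receives the largest probability, which is why the most likely structure is found by free energy minimization.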

Modeling RNA Thermodynamics: Nearest neighbor model

◮ Nearest neighbor model [Xia et al., 1998, Mathews et al., 1999]
  ◮ Computational model for free energy change of RNA structure
  ◮ Experimentally determined free energy terms for each nearest neighbor interaction in secondary structure
  ◮ Loop decomposition

[Figure: a small hairpin structure (5’ to 3’) annotated with its nearest neighbor free energy terms:]

1 nucleotide bulge loop: +3.3 kcal/mol
Stacked loop (UA/GC): −1.8 kcal/mol
4 nucleotide hairpin loop: +5.9 kcal/mol
Terminal mismatch (GC/AA): −1.1 kcal/mol
5’ dangling nucleotide: −0.3 kcal/mol
Stacked loop (AU/CG): −2.1 kcal/mol
Stacked loop (CG/AU): −1.8 kcal/mol
Stacked loop (AU/UA): −0.9 kcal/mol
Stacking of base pairs (GC/GC): −2.9 kcal/mol
Stacked loop (GC/GC): −2.9 kcal/mol

Figure: Total free energy change is summation of all nearest neighbor energies [Durbin et al., 1999]

RNA ML Decoding of Structure: Single Sequence

◮ Most likely or minimum free energy structure, given sequence

◮ Dynamic Programming: MFold [Zuker, 1989], O(N^3) complexity
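The flavor of the O(N^3) dynamic program can be shown with a much simpler stand-in. The sketch below is a Nussinov-style recursion that maximizes the number of nested base pairs; it is not the MFold algorithm, which minimizes nearest neighbor free energy. The minimum hairpin loop length of 3 is an assumption.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style DP)."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = best score for the subsequence seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                  # case: j is unpaired
            for k in range(i, j - min_loop):        # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(max_pairs("GGGAAACCC"))
```

The three nested loops over i, j, and k give the cubic running time; replacing the pair count with loop free energies leads toward the thermodynamic version.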

RNA MAP Decoding of Structure

◮ Posterior probability of base pairing, given sequence

◮ Dynamic Programming [McCaskill, 1990], MFold, RNAfold (O(N^3) in time, O(N^2) in space)

◮ Localized probabilistic information
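To show what a base pairing posterior is, the following sketch enumerates every nested structure of a short sequence, weights each by a Boltzmann factor, and accumulates the probability of each pair. It is a brute-force illustration, exponential in sequence length, whereas the dynamic program cited above obtains the same kind of quantity in O(N^3) time. The energy model (a constant `pair_energy` per base pair) and the minimum loop length are toy assumptions.

```python
import math

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def structures(seq, i, j, min_loop=3):
    """Yield every nested structure on seq[i..j] as a frozenset of pairs."""
    if j - i <= min_loop:
        yield frozenset()          # too short to hold a pair
        return
    # case 1: position i is unpaired
    yield from structures(seq, i + 1, j, min_loop)
    # case 2: position i pairs with k; the rest splits into inside and outside
    for k in range(i + min_loop + 1, j + 1):
        if (seq[i], seq[k]) in PAIRS:
            for inner in structures(seq, i + 1, k - 1, min_loop):
                for outer in structures(seq, k + 1, j, min_loop):
                    yield inner | outer | {(i, k)}


def pair_probabilities(seq, pair_energy=-2.0, rt=0.616):
    """Posterior probability of each base pair under the toy energy model."""
    z = 0.0
    totals = {}
    for s in structures(seq, 0, len(seq) - 1):
        weight = math.exp(-pair_energy * len(s) / rt)
        z += weight
        for pair in s:
            totals[pair] = totals.get(pair, 0.0) + weight
    return {pair: w / z for pair, w in totals.items()}


probs = pair_probabilities("GGGAAACCC")
for pair in sorted(probs):
    print(pair, round(probs[pair], 3))
```

The result is a probability for every candidate pair rather than a single structure, which is the "soft" information that the later slides exchange between sequences.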

RNA Structure Prediction (Single Sequence)

RD0260 Sequence

GCGACCGGGGCUGGCUUGGUAAUGGUACUCCCCUGUCACGGGAGAGAAUGUGGGUUCAAAUCCCAUCGGUCGCGCCA

[Figure: two routes from the RD0260 sequence. Free energy minimization gives a single predicted structure (−28.6 kcal/mol), shown as a secondary structure drawing and as a dot plot over RD0260 nucleotide indices (0 to 80 on both axes). Partition function computation gives a dot plot of base pairing probabilities over the same axes.]

◮ Free energy minimization: “Hard” Prediction
  ◮ Single prediction structure

◮ Base pairing probabilities: “Soft” Prediction
  ◮ Thresholding may yield pseudo-knotted structures
  ◮ Maximum Expected Accuracy Structure Prediction [Do et al., 2006, Lu et al., 2009]

Structure Prediction for Multiple Sequences: Homologous ncRNAs

◮ Homologous ncRNAs
  ◮ Share evolutionary ancestor
  ◮ Serve same function
  ◮ Structural similarity in terms of topology of structures

[Figure: secondary structure drawings of three homologous sequences, RD0260 (Bacteriophage T5, Asp), RD0500 (Haloferax volcanii, Asp), and RE2140 (Synechocystis, Glu), with their corresponding helices shown side by side]

RD0260 GCGACCGGGGCUGGCUUGGUA-AUGGUACUCCCCUGUCACGGGAGAGAAUGUGGGUUCAAAUCCCAUCGGUCGCGCCA
RD0500 GCCCGGGUGGUGUAGU-GGCCCAUCAUACGACCCUGUCACGGUCGUGA-CGCGGGUUCGAAUCCCGCCUCGGGCGCCA
RE2140 GCCCCCAUCGUCUAGA-GGCCUAGGACACCUCCCUUUCACGGAGGCGA-CAGGGAUUCGAAUUCCCUUGGGGGUACCA

[Figure: the three structures with corresponding helices matched to the aligned columns of the sequence alignment above, 5’ to 3’]

◮ “Common” structures and conforming sequence alignment

◮ Joint estimation can harness comparative structure and sequence information across homologs

Multiple Sequence RNA Structure Prediction

[Figure: block diagram. The input sequences x1, x2, ..., xK (of lengths N1, N2, ..., NK) enter a multiple structure prediction block, which outputs the structures S1, S2, ..., SK and, optionally, an alignment A.]

◮ Sankoff’s dynamic programming algorithm [Sankoff, 1985]
  ◮ Simultaneous folding (pseudo-knot free) and alignment of K sequences
  ◮ Time (Memory) complexity: O(N^(3K)) (O(N^(2K)))
  ◮ Computationally infeasible even for short sequences and K = 2 w/o cutting corners
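A quick back-of-the-envelope calculation shows why these exponents matter. The snippet below only evaluates N^(3K) and N^(2K) with constant factors ignored; the sequence length of 100 is an arbitrary illustrative choice and `sankoff_cost` is a hypothetical helper name.

```python
def sankoff_cost(n, k):
    """Order-of-magnitude operation and memory counts, constants ignored."""
    return n ** (3 * k), n ** (2 * k)


single_time = 100 ** 3                       # single sequence folding, N = 100
pair_time, pair_memory = sankoff_cost(n=100, k=2)

print(f"single sequence: {single_time:.0e} operations")
print(f"Sankoff, K = 2:  {pair_time:.0e} operations, {pair_memory:.0e} cells")
```

For two sequences of length 100 the operation count grows from about 10^6 to about 10^12, which motivates methods whose cost stays close to the single sequence case.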

Turbo-Decoding RNA Secondary Structure

◮ Goal: Performance similar to (or “better” than) joint estimation, complexity similar to single sequence computation.

◮ Probabilistic formulation of folding and alignment
  ◮ Base pairing probabilities, posterior alignment probabilities

◮ Iteratively update each using information from other
  ◮ TurboFold [Harmanci et al., 2007, 2011].

[Figure: block diagram. The input sequences x1, x2, ..., xK (of lengths N1, N2, ..., NK) enter TurboFold, which outputs one base pairing probability matrix per sequence, of sizes N1 × N1, N2 × N2, ..., NK × NK.]

Structural Alignment: Joint Representation of Structures and Alignment

◮ Two sequence case

[Figure: Sequence 1 and Sequence 2 (5’ to 3’) drawn with their secondary structures, Structure 1 and Structure 2, together with the co-incidence path for the alignment between them]

Decoupled Probabilistic Representation for RD0260, RD0500 Structural Alignment

◮ Formulate in probabilistic framework and separate the folding/alignment representations

[Figure: three dot plots with axes from 0 to 80. Base pairing probabilities for RD0260 and for RD0500 (via partition function), each over the indices of its own sequence; co-incidence probabilities (via HMM), RD0260 indices against RD0500 indices.]

TurboFold

Given K homologous RNA sequences, each sequence contains some information about folding of every other sequence.

◮ Extrinsic information for a sequence
  ◮ The information about folding of a sequence which is computed using base pairing probabilities of other sequences
  ◮ Thermodynamic model + Alignment model

◮ Base pairing probabilities of a sequence (Intrinsic Information)
  ◮ From sequence itself
  ◮ Thermodynamic model

◮ Iterative updates:
  ◮ Compute extrinsic information using base pairing probabilities and alignment co-incidence probabilities
  ◮ Update base pairing probabilities using updated extrinsic information
  ◮ Update extrinsic information using updated base pairing probabilities
  ◮ . . .
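The alternation described above can be written as a short control loop. This is only a schematic sketch: the slides do not say how extrinsic information enters the folding step, so `fold` and `extrinsic` are hypothetical callables supplied by the caller, and the toy stand-ins in the usage lines operate on plain numbers rather than probability matrices.

```python
def turbo_iterations(sequences, fold, extrinsic, n_iter=3):
    """Alternate between folding and extrinsic information updates.

    fold(seq, prior) returns base pairing information for one sequence;
    prior is None on the first pass (intrinsic information only).
    extrinsic(m, all_probs) returns the prior for sequence m computed
    from the current results of the other sequences.
    """
    probs = [fold(seq, None) for seq in sequences]
    for _ in range(n_iter):
        priors = [extrinsic(m, probs) for m in range(len(sequences))]
        probs = [fold(seq, prior) for seq, prior in zip(sequences, priors)]
    return probs


# Toy stand-ins: scalars instead of matrices, just to exercise the loop
def toy_fold(seq, prior):
    return float(len(seq)) if prior is None else (len(seq) + prior) / 2


def toy_extrinsic(m, all_probs):
    others = [p for idx, p in enumerate(all_probs) if idx != m]
    return sum(others) / len(others)


print(turbo_iterations(["AA", "AAAA"], toy_fold, toy_extrinsic, n_iter=1))
```

Each sequence is refolded on its own in every pass, so the per-iteration cost stays at the single sequence level while information still flows between homologs.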

Extrinsic Information for Base Pairing for RD0260

[Figure: two sequence case, RD0260 and RD0500 (all axes 0 to 80). The initial base pairing probability matrix of RD0500 (RD0500 indices on both axes) is multiplied on one side by the initial co-incidence probability matrix for RD0500 and RD0260 and on the other side by its transpose; the result is weighted by (1 − ψ_{RD0500, RD0260}) to give the extrinsic information matrix for RD0260 (RD0260 indices on both axes).]

3 Sequences

[Figure: base pairing probability dot plots for RD0260, RD0500, and RE2140, each plotted over the nucleotide indices of its own sequence (axes 0 to 80)]

Base Pairing Proclivity Matrix for RD0260 Induced by RE2140

[Figure: the base pairing probability matrix of RE2140 (RE2140 indices on both axes) is multiplied on one side by the co-incidence probability matrix (RD0260 indices against RE2140 indices) and on the other side by its transpose, giving the base pairing proclivity matrix for RD0260 induced by RE2140 (RD0260 indices on both axes). All axes run from 0 to 80.]

◮ Information in RE2140 about folding of RD0260

$$
{}^{t}_{p}\tilde{\Pi}^{(s \to m)} = {}_{c}\Pi^{(m,s)} \; {}^{t-1}_{p}\Pi_{s} \; \left({}_{c}\Pi^{(m,s)}\right)^{T} \qquad (1)
$$


3 Sequences: Extrinsic information Computation

[Figure: flow diagram; all matrix axes are RD0260 indices (0 to 80). The proclivity matrices for RD0260 induced by RE2140 and by RD0500 are each weighted by (1 - similarity) and combined. The result is normalized by its maximum to give the extrinsic information matrix for RD0260.]


K Sequences: Extrinsic Information Computation for xm

[Figure: block diagram for sequence $x_m$ among the $K$ sequences $x_1, \dots, x_K$ of lengths $N_1, \dots, N_K$. Inputs: the co-incidence probabilities between $x_m$ and every other sequence $s \in N \setminus m$, and the base pairing probabilities $\{{}^{t-1}_{p}\Pi_{s}\}_{s \in N \setminus m}$. These give the induced base pairing proclivity matrices $\{{}^{t}_{p}\tilde{\Pi}^{(s \to m)}\}_{s \in N \setminus m}$, each $N_m \times N_m$. The matrices are weighted by $(1 - \psi_{s,m})$ and normalized by the maximum to produce the extrinsic information for base pairing for $x_m$, ${}^{t}_{p}\tilde{\Pi}_{m}$.]
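The computation on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the TurboFold implementation: the function names (`proclivity`, `extrinsic_information`) and the toy matrices are hypothetical, and summing the weighted proclivity matrices before normalizing is an assumption, since the slides state only the weighting by (1 − ψ) and the normalization by the maximum.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def proclivity(coinc_ms, pair_s):
    """Equation (1): map the pairing probabilities of sequence s onto sequence m.

    coinc_ms: N_m x N_s co-incidence probabilities between m and s.
    pair_s:   N_s x N_s base pairing probabilities of s (previous iteration).
    """
    return matmul(matmul(coinc_ms, pair_s), transpose(coinc_ms))


def extrinsic_information(coinc, pair, similarity):
    """Combine the proclivity matrices induced by all other sequences.

    coinc, pair, and similarity are lists over the other sequences s != m.
    Assumption: the weighted matrices are summed before max-normalization.
    """
    n = len(coinc[0])
    total = [[0.0] * n for _ in range(n)]
    for c_ms, p_s, psi in zip(coinc, pair, similarity):
        induced = proclivity(c_ms, p_s)
        for i in range(n):
            for j in range(n):
                total[i][j] += (1.0 - psi) * induced[i][j]
    peak = max(max(row) for row in total)
    if peak == 0.0:
        return total  # nothing to normalize
    return [[v / peak for v in row] for row in total]


# Toy example: sequence m has 3 nucleotides, the single other sequence s has 2.
coinc_ms = [[0.9, 0.0], [0.1, 0.1], [0.0, 0.8]]
pair_s = [[0.0, 0.5], [0.5, 0.0]]
ext = extrinsic_information([coinc_ms], [pair_s], [0.4])
```

In the toy example the only strong pair of s, between its two nucleotides, maps onto positions 0 and 2 of m, so that entry of `ext` becomes the maximum after normalization.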


Base Pairing Probability Computation Using Extrinsic Information

◮ Modified Boltzmann distribution of secondary structures:

\[ P(S) \propto \exp\left( -\frac{\Delta \tilde{G}(S)}{RT} \right) \tag{2} \]

where

\[ \Delta \tilde{G}(S) = \Delta G^{o}(S) - \gamma \sum_{(i,j) \in S} \log(\tilde{\pi}(i,j)) \tag{3} \]

is the modified free energy change for structure S.

◮ π̃(i, j): Extrinsic information for pairing of nucleotides at indices i and j

◮ γ: Weight of extrinsic information on modified free energy relative to $\Delta G^{o}(S)$

Extrinsic information introduced via a pseudo free energy for each base pair

Substituting (3) into (2):

\[ P(S) \propto \underbrace{\exp\left( -\frac{\Delta G^{o}(S)}{RT} \right)}_{\text{Boltzmann distribution proportionality term}} \; \underbrace{\prod_{(i,j) \in S} \left( \tilde{\pi}(i,j) \right)^{\gamma/RT}}_{\text{Extrinsic information}} \]

◮ Base pair (i, j) has a pseudo prior probability of $(\tilde{\pi}(i,j))^{\gamma/RT}$ due to extrinsic information.
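The equivalence between the pseudo free energy form and the pseudo prior form can be checked numerically. The sketch below is illustrative only: the function names, the toy extrinsic values, the free energy of −2.0 kcal/mol, and the value RT ≈ 0.616 kcal/mol (roughly 37 °C) are assumptions, not values from the slides. It also assumes strictly positive extrinsic information so that the logarithm is defined.

```python
import math

RT = 0.616  # kcal/mol near 37 degrees C; illustrative assumption


def modified_free_energy(dg0, pairs, ext, gamma):
    """Equation (3): dG~(S) = dG0(S) - gamma * sum of log(ext) over pairs in S."""
    return dg0 - gamma * sum(math.log(ext[p]) for p in pairs)


def weight_from_energy(dg0, pairs, ext, gamma):
    """Unnormalized Boltzmann weight from the modified free energy, equation (2)."""
    return math.exp(-modified_free_energy(dg0, pairs, ext, gamma) / RT)


def weight_from_pseudo_prior(dg0, pairs, ext, gamma):
    """Same weight written as Boltzmann term times per-pair pseudo priors."""
    prior = 1.0
    for p in pairs:
        prior *= ext[p] ** (gamma / RT)
    return math.exp(-dg0 / RT) * prior


# Hypothetical extrinsic information for three candidate base pairs.
ext = {(1, 9): 0.9, (2, 8): 0.5, (3, 7): 0.05}
gamma = 0.3 * RT

w1 = weight_from_energy(-2.0, [(1, 9), (2, 8)], ext, gamma)
w2 = weight_from_pseudo_prior(-2.0, [(1, 9), (2, 8)], ext, gamma)
# Same free energy, but one pair has weak support from the other sequences.
w_low = weight_from_energy(-2.0, [(1, 9), (3, 7)], ext, gamma)
```

Both forms give the same weight, and a structure containing a pair with low extrinsic information is down-weighted relative to one whose pairs are well supported.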


TurboFold: Iterative Updates

[Figure: block diagram of the iterative updates. Alignment-derived probabilities feed an extrinsic information computation for each of the 1st, ..., mth, ..., Kth sequences, producing Π̃1, ..., Π̃m, ..., Π̃K. Each of these feeds the base pairing probability computation for the same sequence, producing Π1, ..., Πm, ..., ΠK, which are fed back through a unit "time" delay. The loop runs for η iterations.]

◮ For low K, per-iteration complexity is comparable to single sequence structure prediction

◮ Benefits from comparative analysis
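The loop structure of the iterative updates can be sketched as follows. This is a schematic only: `turbo_iterations` and the two callables are hypothetical names, and the toy stand-ins below operate on single numbers rather than on probability matrices or a thermodynamic folding model.

```python
def turbo_iterations(pair_probs, compute_extrinsic, fold_with_extrinsic, eta):
    """Run eta rounds of the two-step update.

    pair_probs: iteration-0 base pairing probabilities, one entry per sequence.
    compute_extrinsic(m, pair_probs): extrinsic information for sequence m,
        built from the previous iteration's probabilities.
    fold_with_extrinsic(m, ext): new base pairing probabilities for sequence m.
    """
    for _ in range(eta):
        # Unit "time" delay: every extrinsic term is computed from the
        # previous iteration before any sequence is updated.
        ext = [compute_extrinsic(m, pair_probs) for m in range(len(pair_probs))]
        pair_probs = [fold_with_extrinsic(m, e) for m, e in enumerate(ext)]
    return pair_probs


# Toy stand-ins (assumptions for illustration, not a folding model).
intrinsic = [0.2, 0.8, 0.5]


def toy_extrinsic(m, probs):
    """Average of the other sequences' current values."""
    others = [p for s, p in enumerate(probs) if s != m]
    return sum(others) / len(others)


def toy_fold(m, ext):
    """Blend a sequence's own (intrinsic) value with its extrinsic value."""
    return 0.5 * intrinsic[m] + 0.5 * ext


result = turbo_iterations(intrinsic, toy_extrinsic, toy_fold, eta=1)
```

Each round costs K extrinsic computations plus K single-sequence folding computations, which is the sense in which the per-iteration cost stays close to single sequence prediction when K is small.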


TurboFold Structure Prediction Overview

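◮ Obtain base pairing probabilities after η iterations, then predict structures
  ◮ Significant base pairs
  ◮ Maximum expected accuracy (MEA) structures

[Figure: pipeline. The input sequences $\{x_m\}_{m \in N}$ ($x_1, \dots, x_K$ of lengths $N_1, \dots, N_K$) pass through η TurboFold iterations, which update $\{{}^{t}_{p}\Pi_{m}\}$ and $\{{}^{t}_{p}\tilde{\Pi}_{m}\}$ and output $\{{}^{\eta}_{p}\Pi_{m}\}_{m \in N}$. Structures are then predicted by thresholding or by maximum expected accuracy prediction (TurboFold-MEA).]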


Structure Prediction

◮ Structure for $x_m$ composed of base pairs with probabilities greater than $P_{\mathrm{thresh}}$:

\[ S^{*}_{m} = \{ (i,j) \ni {}^{\eta}_{p}\pi_{m}(i,j) > P_{\mathrm{thresh}} \} \tag{4} \]
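Equation (4) amounts to a single pass over the upper triangle of the probability matrix. The sketch below assumes a symmetric matrix stored as a list of rows with 0-based indices; the function name and the toy matrix are hypothetical.

```python
def threshold_structure(pair_probs, p_thresh):
    """Return the base pairs (i, j), i < j, with probability above p_thresh."""
    n = len(pair_probs)
    return {(i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if pair_probs[i][j] > p_thresh}


# Toy 5-nucleotide example with three nonzero pairing probabilities.
probs = [[0.0] * 5 for _ in range(5)]
for (i, j), p in {(0, 4): 0.9, (1, 3): 0.55, (0, 3): 0.05}.items():
    probs[i][j] = probs[j][i] = p

structure = threshold_structure(probs, 0.5)
```

Provided the matrix holds valid pairing probabilities, a threshold of 0.5 or more guarantees that no nucleotide appears in two selected pairs, because the pairing probabilities of one nucleotide sum to at most 1.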


TurboFold: Computation Complexity

◮ Initialization
  ◮ Computation of co-incidence matrices: $O(K^2 N^2)$
  ◮ Computation of sequence similarities: $O(K^2 N^2)$

◮ Iterations
  ◮ Extrinsic information computation: $O(\eta K^2 d^2 N^2)$
  ◮ Base pairing probability computation: $O(\eta K U N^3)$

◮ Structure prediction
  ◮ Thresholding: $O(K N^2)$
  ◮ MEA prediction: $O(K N^3)$

Compare to Sankoff's algorithm: $O(N^3 (U^2 d)^K)$


Evaluating Accuracy of Estimates

◮ Sensitivity: Ratio of number of correctly predicted base pairs to the total number of base pairs in the known structure

\[ \text{Sensitivity} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \]

  ◮ Also called recall

◮ Positive Predictive Value (PPV): Ratio of number of correctly predicted base pairs to the total number of base pairs in the predicted structure

\[ \text{PPV} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \]

  ◮ Also called precision
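Both measures follow directly from the sets of predicted and known base pairs. The sketch below counts only exact matches, which is an assumption (the slides do not say whether near-miss pairs are tolerated); the function name and the toy structures are hypothetical.

```python
def sensitivity_ppv(predicted, known):
    """Return (sensitivity, PPV) for two collections of base pairs (i, j)."""
    predicted, known = set(predicted), set(known)
    tp = len(predicted & known)   # correctly predicted pairs
    fn = len(known - predicted)   # known pairs that were missed
    fp = len(predicted - known)   # predicted pairs not in the known structure
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, ppv


known = {(0, 8), (1, 7), (2, 6)}
predicted = {(0, 8), (1, 7), (3, 5), (2, 9)}
sens, ppv = sensitivity_ppv(predicted, known)
```

Here two of the three known pairs are recovered (sensitivity 2/3) and two of the four predictions are correct (PPV 1/2).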


Parameter Selection: Number of iterations, η

Sensitivity vs. PPV over 5S rRNA dataset with changing η

[Figure: plot of PPV (vertical axis) against sensitivity (horizontal axis); the points are labeled with the number of iterations, η = 0, 1, 2, 3, 4, 5.]

◮ η = 3 is used in TurboFold


Parameter Selection: Weight of Extrinsic Information, γ

Sensitivity vs. PPV over 5S rRNA dataset with changing γ/RT

[Figure: plot of PPV (vertical axis) against sensitivity (horizontal axis); the points are labeled with γ/RT = 0.001, 0.05, 0.1, 0.2, 0.3, 0.5, 0.8, 1.0, 1.2.]

◮ γ = 0.3RT is used in TurboFold


Benchmarking Experiments: Datasets

◮ Randomly choose 200 RNase P, 400 5S rRNA, 400 SRP, and 400 tRNA sequences and divide them into combinations of K sequences

◮ Choose and divide for K = 2, . . . , 10

◮ Yields 36 datasets (4 RNA families × 9 values of K)

The datasets have significant diversity:

◮ RNase Ps: 336 nucleotides, 50% average pairwise identity

◮ tmRNA: 366 nucleotides, 45% average pairwise identity

◮ telomerase RNA: 445 nucleotides, 54% average pairwise identity

◮ SRPs: 187 nucleotides, 42% average pairwise identity

◮ tRNAs: 77 nucleotides, 47% average pairwise identity

◮ 5S rRNAs: 119 nucleotides, 63% average pairwise identity


Benchmarking Experiments

◮ TurboFold is benchmarked against methods that estimate base pairing probabilities:
  ◮ LocARNA [Will et al., 2007]
  ◮ RNAalifold [Bernhart et al., 2008]
  ◮ Single sequence partition function [Mathews, 2004]

◮ The set of base pairs with estimated probabilities higher than $P_{\mathrm{thresh}}$ is scored

◮ Sensitivity is plotted against PPV while varying $P_{\mathrm{thresh}}$ between 0 and 1 with a step size of 0.04
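The threshold sweep behind such a curve can be sketched as below. The function name, the dictionary representation of the pairing probabilities, and the toy values are assumptions made for illustration; only the 0 to 1 range and the 0.04 step come from the slide.

```python
def sweep_threshold(pair_probs, known, step=0.04):
    """Score the thresholded pair set at each threshold from 0 to 1.

    pair_probs: dict mapping a base pair (i, j) to its estimated probability.
    known:      collection of base pairs in the known structure.
    Returns a list of (p_thresh, sensitivity, ppv) tuples; ppv is None when
    no pair exceeds the threshold, since the ratio is then undefined.
    """
    known = set(known)
    n_steps = round(1.0 / step)
    curve = []
    for k in range(n_steps + 1):
        p_thresh = k * step
        predicted = {pair for pair, p in pair_probs.items() if p > p_thresh}
        tp = len(predicted & known)
        sensitivity = tp / len(known) if known else 0.0
        ppv = tp / len(predicted) if predicted else None
        curve.append((p_thresh, sensitivity, ppv))
    return curve


# Hypothetical probabilities for four candidate pairs; three are correct.
pair_probs = {(0, 8): 0.95, (1, 7): 0.6, (2, 6): 0.3, (3, 5): 0.1}
known = {(0, 8), (1, 7), (2, 6)}
curve = sweep_threshold(pair_probs, known)
```

Raising the threshold trades sensitivity for PPV: at a threshold of 0 all four candidates are kept, while at 0.4 only the two most probable pairs remain and both are correct.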


Benchmarking Experiments

Sensitivity vs PPV ROC curves for TurboFold vs three alternative methods

[Figure: plot of PPV (vertical axis) against sensitivity (horizontal axis), one curve per method and value of K: TurboFold (K = 3), TurboFold (K = 10), LocARNA (K = 3), LocARNA (K = 10), RNAalifold (K = 3), RNAalifold (K = 10), Single (K = 3), Single (K = 10).]


Run Time Requirements

◮ Run time requirements over 50 RNase P sequence datasets

Table: Time requirements (in seconds) for the methods.

Method        K = 3      K = 5      K = 10
TurboFold     136.75     277.9      517.0
LocARNA       746.44     2815.9     11395.8
RNAalifold    0.2        0.3        0.6

◮ TurboFold's run time grows more slowly with K than LocARNA's: from K = 3 to K = 10 it increases about 3.8-fold, versus about 15-fold for LocARNA


Conclusions

◮ TurboFold: A multiple sequence structure prediction method
  ◮ Lowers complexity with iterative combination of intrinsic and extrinsic information for folding
  ◮ Intrinsic information: From sequence via thermodynamic folding model (nearest neighbor model)
  ◮ Extrinsic information: From other sequences

◮ TurboFold accuracy: close to or higher than the simultaneous folding and alignment methods

◮ Details: BMC Bioinformatics article Harmanci et al. [2011].

◮ Connections to coding theory in digital communications


Turbo Decoding: RNA vs Communications

◮ Multiple encodings of same information
  ◮ Nature/Man

◮ Joint (optimal) decoding desirable
  ◮ Exact joint decoding ≈ exponential complexity
  ◮ Iterative approximation (belief propagation)

◮ Localized MAP probabilistic formulation (base pairing/symbol probs.)

◮ Decomposition into loosely coupled individual decodings + information exchange at each iteration

◮ Linear complexity in length of data
  ◮ Pseudo-prior interpretation


Ongoing Related Work

◮ Moving beyond TurboFold
  ◮ Alignment probability updates based on structures
  ◮ Better handling of dependencies
  ◮ Domain insertions/deletions

◮ Connecting with experiments
  ◮ Incorporating experimental information (e.g. SHAPE) in structural alignments
  ◮ Postulating mechanisms and experimental validation (HIV)


Acknowledgments

◮ Collaborators at UR:
  ◮ Arif O. Harmanci, UR ECE, currently PostDoc at Yale
  ◮ David H. Mathews, Department of Biochemistry and Biophysics

◮ Research support:
  ◮ National Institutes of Health (NIH) (Award # GM097334-01)
  ◮ Center for Research Computing, University of Rochester


Thank you

Questions?


References

S. H. Bernhart, I. L. Hofacker, S. Will, A. R. Gruber, and P. F. Stadler. RNAalifold: improved consensus structure prediction for RNA alignments. BMC Bioinformatics, 9:474, 2008.

J. W. Brown. The Ribonuclease P database. Nucleic Acids Res., 27(1):314, Jan. 1999.

Chuong B. Do, Daniel A. Woods, and Serafim Batzoglou. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics, 22(14):90–98, 2006.

J. A. Doudna and T. R. Cech. The chemical repertoire of natural ribozymes. Nature, 418(6894):222–228, 2002.

J. A. Doudna and J. H. Cate. RNA structure: crystal clear? Current Opinion in Structural Biology, 7:310–316, 1997.

R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, UK, 1999. ISBN 0521629713.

A. Ozgun Harmanci, Gaurav Sharma, and David H. Mathews. Toward turbo decoding of RNA secondary structure. In Proc. IEEE Intl. Conf. Acoustics Speech and Sig. Proc., volume I, pages 365–368, Apr. 2007.

A. Ozgun Harmanci, Gaurav Sharma, and David H. Mathews. TurboFold: Iterative Probabilistic Estimation of Secondary Structures for Multiple RNA Sequences. BMC Bioinformatics, 12:108, 2011. Early access available online, April 20, 2011.

Zhi John Lu, Jason W. Gloor, and David H. Mathews. Improved RNA secondary structure prediction by maximizing expected pair accuracy. RNA, 15(10):1805–1813, 2009.

D. H. Mathews. Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA, 10(8):1178–1190, 2004.

D. H. Mathews, J. Sabina, M. Zuker, and D. H. Turner. Expanded sequence dependence of thermodynamic parameters provides improved prediction of RNA secondary structure. J. Mol. Biol., 288(5):911–940, 1999.

J. S. McCaskill. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29(6-7):1105–1119, Nov. 1990.

D. Sankoff. Simultaneous solution of RNA folding, alignment and protosequence problems. SIAM J. App. Math., 45(5):810–825, Oct. 1985.

I. Tinoco, Jr. and C. Bustamante. How RNA folds. J. Mol. Biol., 293(2):271–281, 1999.

Richard B. Waring and R. Wayne Davies. Assessment of a model for intron RNA secondary structure relevant to RNA self-splicing – a review. Gene, 28(3):277–291, 1984.

Sebastian Will, Kristin Reiche, Ivo L. Hofacker, Peter F. Stadler, and Rolf Backofen. Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering. PLoS Comput. Biol., 3(4):680–691, Apr. 2007.

T. Xia, J. SantaLucia, Jr., R. Kierzek, S. J. Schroeder, X. Jiao, C. Cox, and D. H. Turner. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick pairs. Biochemistry, 37(42):14719–14735, 1998.

M. Zuker. Computer prediction of RNA structure. Methods Enzymol., 180:262–288, 1989.