MicroRNA Prediction with SCFG and MFE Structure Annotation Tim Shaw, Ying Zheng, and Bram Sebastian

Page 1: microRNA Prediction with SCFG and MFE Structure Annotation

Tim Shaw, Ying Zheng, and Bram Sebastian

Page 2: Goal of the Presentation

• Introduction to miRNA
• Survey of computational and experimental approaches to identify microRNA
• CYK Algorithm
• Our Methodology
• Result/Discussion
• Future Direction

Page 3: Computers vs Genetics

Page 4: Background on microRNA and its Classical Definition

• Found in eukaryotes (706 identified in human)
• Genome-encoded stem-loop precursor
• Generally processed by Dicer and a helicase
• Mature microRNA is approximately 22 nucleotides (nt)
• Recognizes target mRNA by base-pairing
  ◦ Acts primarily in gene silencing
  ◦ Some cases of gene enhancing

Page 5: Diagram for miRNA

Page 6: Problems with miRNA Hunting through Lab Experiments

• Biology = network of cause and effect
• miRNA expression might be bound to certain environmental triggers
• Expression of certain microRNA sequences is hard to detect
• Some miRNAs may have physical properties that make them hard to clone, including sequence composition or post-transcriptional modification

Page 7: Problems with miRNA Hunting through Computational Approaches

• Stem-loop structures are common in eukaryotes
• Eukaryotic genomes are long, and most computational approaches are not practical for scanning an entire genome

Page 8: Computationally Driven Approaches

• Structure information (thermodynamics)
  ◦ RNAz
• Homology: conservation of structure (ERPIN, MirScan, snarloop)
  ◦ Stem
  ◦ Loop
  ◦ Target sequence
• Machine learning (miRFinder, microPred)
  ◦ Feature selection based on sequence and structural information

Page 9: Tests Done on Those Methodologies

• ERPIN (2001) (homology)
  ◦ Good results, but very limited by the availability of training data. It detects only 66 of the 706 human miRNAs; if we remove the human training sequences, it detects only 36.
• miRFinder (2007) (ab initio)
  ◦ Human: specificity 84.46% (1320 of 8494 misclassified); sensitivity 84.84% (599 of 706 detected)
  ◦ Mouse: specificity 82.78% (1759 of 10213 misclassified); sensitivity 82.27% (450 of 547 detected)
• microPred (2009) (ab initio)
  ◦ We found a bug and reported it to the author; it is currently being fixed.
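The miRFinder percentages can be reproduced from the counts on this slide if the specificity fractions are read as misclassified negatives over total negatives. That reading is an assumption inferred from the arithmetic, not something the slide states. A minimal sketch:

```python
def sensitivity(detected: int, positives: int) -> float:
    """Fraction of real miRNAs that the tool detects."""
    return detected / positives


def specificity(misclassified: int, negatives: int) -> float:
    """Fraction of negatives correctly rejected.

    Assumes `misclassified` counts negatives wrongly called miRNA.
    """
    return 1.0 - misclassified / negatives


# Counts copied from the slide.
human_spec = specificity(1320, 8494)
human_sens = sensitivity(599, 706)
mouse_spec = specificity(1759, 10213)
mouse_sens = sensitivity(450, 547)

print(f"human: specificity {human_spec:.2%}, sensitivity {human_sens:.2%}")
print(f"mouse: specificity {mouse_spec:.2%}, sensitivity {mouse_sens:.2%}")
```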

Page 10: Negative Set Generation

• Sequences were obtained from the CDS regions of the genome
  ◦ Implementation of a CDS extractor for ccdsgenes.txt files from the UCSC Genome Browser
• CDS means coding region
  ◦ (Sequence that codes for protein)
• Need to implement a new parser based on the cds.txt from the UCSC Genome Browser
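A CDS extractor of this kind could look roughly like the sketch below. It assumes the input rows follow the UCSC genePred-style layout with a leading bin column (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...); that layout should be checked against the actual table schema. The row, chromosome name, and sequence in the example are made up for illustration, and the function names are hypothetical.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def cds_blocks(row: str) -> Tuple[str, str, str, List[Tuple[int, int]]]:
    """Return (name, chrom, strand, coding blocks) for one tab-separated row.

    Assumed columns: 1=name, 2=chrom, 3=strand, 6=cdsStart, 7=cdsEnd,
    9=exonStarts, 10=exonEnds (0-based, half-open coordinates).
    """
    f = row.rstrip("\n").split("\t")
    name, chrom, strand = f[1], f[2], f[3]
    cds_start, cds_end = int(f[6]), int(f[7])
    starts = [int(s) for s in f[9].split(",") if s]
    ends = [int(e) for e in f[10].split(",") if e]
    blocks = []
    for s, e in zip(starts, ends):
        lo, hi = max(s, cds_start), min(e, cds_end)  # clip exon to the CDS
        if lo < hi:
            blocks.append((lo, hi))
    return name, chrom, strand, blocks


def extract_cds(chrom_seq: str, strand: str, blocks: List[Tuple[int, int]]) -> str:
    """Concatenate coding blocks; reverse-complement for minus-strand genes."""
    seq = "".join(chrom_seq[lo:hi] for lo, hi in blocks)
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


# Made-up example row and sequence, for illustration only.
demo_row = "0\tCCDS_demo\tchrDemo\t+\t2\t18\t4\t16\t2\t2,12,\t8,18,"
demo_chrom = "AACCATGGTTTTAAGGTAAC"
name, chrom, strand, blocks = cds_blocks(demo_row)
print(name, blocks, extract_cds(demo_chrom, strand, blocks))
```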

Page 11: Positive Set

• Downloaded from miRBase: 706 human and 547 mouse miRNAs

Page 12: Algorithms for SCFG

• CYK algorithm
  ◦ Calculates the optimal alignment of a sequence to an SCFG with ambiguity
• Inside algorithm
  ◦ Calculates the probability of a sequence given an SCFG
• Inside-outside algorithm
  ◦ Estimates optimal probability parameters for an SCFG given a set of example sequences

Page 13: Advantages of CYK

• A relatively fast algorithm, O(n^3); if we take advantage of the dynamic programming table, we can scan through the sequence in O(n^2)
• We can quickly compute multiple windows at the same time
• It can force an RNA to fold into a specific structure that we specify

Page 14: Introduction to the Modified CYK Algorithm

• Given x = x1…xn and an SCFG G,
  ◦ Find the optimal parse of x
  ◦ Dynamic programming
• γ(i, j, V): likelihood of the most likely parse of xi…xj, rooted at nonterminal V

Page 15: Stochastic CYK

Initialization:
  γ(i, i-1) = log P(S → ε)

Iteration:
  For i = N down to 1
    For j = i to N
      γ(i, j) = max of:
        γ(i+1, j-1) + log P(S → xi S xj)
        γ(i, j-1) + log P(S → S xj)
        γ(i+1, j) + log P(S → xi S)
        max over i < k < j of γ(i, k) + γ(k+1, j) + log P(S → S S)
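The recurrence can be sketched in Python as follows. This is a minimal illustration for a grammar with a single nonterminal S, not the presenters' implementation. The rule probabilities are hypothetical values chosen only so that they sum to 1, the table uses half-open spans x[i:j] filled in order of increasing length, and the bifurcation case tries every split into two non-empty halves.

```python
import math
from typing import Dict

NEG_INF = float("-inf")


def build_log_params() -> Dict[str, float]:
    """Hypothetical rule probabilities (sum to 1), returned as log values."""
    probs: Dict[str, float] = {}
    for p in ("AU", "UA", "GC", "CG"):
        probs["pair:" + p] = 0.15          # S -> a S b, Watson-Crick pair
    for p in ("GU", "UG"):
        probs["pair:" + p] = 0.05          # S -> a S b, wobble pair
    for b in "ACGU":
        probs["left:" + b] = 0.025         # S -> a S
        probs["right:" + b] = 0.025        # S -> S a
    probs["split"] = 0.05                  # S -> S S
    probs["end"] = 0.05                    # S -> epsilon
    return {rule: math.log(p) for rule, p in probs.items()}


def cyk_log_likelihood(x: str, logp: Dict[str, float]) -> float:
    """Log-likelihood of the most likely parse of x under the grammar."""
    n = len(x)
    # g[i][j] scores the span x[i:j]; empty spans get log P(S -> epsilon).
    g = [[NEG_INF] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        g[i][i] = logp["end"]
    for length in range(1, n + 1):         # shorter spans are filled first
        for i in range(n - length + 1):
            j = i + length
            best = max(
                g[i + 1][j] + logp.get("left:" + x[i], NEG_INF),
                g[i][j - 1] + logp.get("right:" + x[j - 1], NEG_INF),
            )
            if length >= 2:                # pairing needs two bases
                pair = "pair:" + x[i] + x[j - 1]
                best = max(best, g[i + 1][j - 1] + logp.get(pair, NEG_INF))
            for k in range(i + 1, j):      # bifurcation into x[i:k], x[k:j]
                best = max(best, g[i][k] + g[k][j] + logp["split"])
            g[i][j] = best
    return g[0][n]


logp = build_log_params()
print(cyk_log_likelihood("GGGAAAUCCC", logp))
```

Filling the table by span length guarantees that every sub-span on the right-hand side of the recurrence has been computed before it is read.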

Page 16: Weight Estimation of Each Non-terminal Emission

• 57 let-7 miRNA sequences obtained from Rfam
• Used R-Coffee to estimate the lengths of the hairpin loop, stem, and bulge
• The parameters that we estimated seem to work well for the majority of microRNA cases

Page 17: Result for CYK

Insert Plot Here

Page 18: RNAfold

• Most commonly used tool for predicting RNA secondary structure
• All the ab initio approaches and hairpin-loop finders currently use RNAfold to estimate a microRNA's structure and its MFE
• We use RNAfold's MFE as a measuring stick and use some of its structural features to assist our routine

Page 19: Result for RNAfold

Insert Plot Here

Page 20: CYK RNAfold Hybrid

• We use the following formula: CombinedScore = [CYK] * 2 + [MFE]
• During the calculation, if RNAfold predicts a structure with two or more hairpin loops, we penalize the CYK score
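A rough sketch of this scoring rule is shown below. The slide gives neither the size of the penalty nor the sign conventions of the two scores, so the `hairpin_penalty` parameter and its default value are hypothetical, as are the function names. Hairpin loops are counted in RNAfold-style dot-bracket notation as an opening bracket followed only by dots and then a closing bracket.

```python
import re

# A hairpin loop in dot-bracket notation: "(" + one or more "." + ")".
HAIRPIN = re.compile(r"\(\.+\)")


def count_hairpin_loops(dot_bracket: str) -> int:
    """Count hairpin loops in a dot-bracket structure string."""
    return len(HAIRPIN.findall(dot_bracket))


def combined_score(cyk: float, mfe: float, dot_bracket: str,
                   hairpin_penalty: float = 1.0) -> float:
    """CombinedScore = [CYK] * 2 + [MFE], with a CYK penalty when the
    predicted structure has two or more hairpin loops.

    `hairpin_penalty` is an illustrative value, not taken from the slides.
    """
    if count_hairpin_loops(dot_bracket) >= 2:
        cyk -= hairpin_penalty
    return 2.0 * cyk + mfe


single = combined_score(1.5, 0.5, "((((...))))")
double = combined_score(1.5, 0.5, "((...))..((...))")
print(single, double)
```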

Page 21: Z Score Calculation

• To combine the MFE and CYK scores, we randomly sampled 20,000 sequences from the human genome and calculated the MFE and CYK score of each
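One way such a background sample could be used is to standardize each raw score against the sample's mean and standard deviation, which puts MFE and CYK values on a comparable scale. The sketch below assumes that this is the intended Z score; the slide does not spell out the formula, and the background values here are toy numbers rather than the 20,000 sampled sequences.

```python
from statistics import mean, stdev
from typing import Sequence


def z_score(value: float, background: Sequence[float]) -> float:
    """Standardize `value` against a background sample of the same score type."""
    mu = mean(background)
    sigma = stdev(background)  # sample standard deviation
    return (value - mu) / sigma


# Toy background values, for illustration only.
background_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
z = z_score(5.0, background_scores)
print(round(z, 4))
```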

Page 22: CYK RNAfold Hybrid Result

Insert Plot Here

Page 23: Optimized Sensitivity/Specificity Comparison

Method       Human specificity      Human sensitivity   Mouse specificity       Mouse sensitivity
             (8494 pseudo-miRNA)    (706 miRNA)         (10213 pseudo-miRNA)    (547 miRNA)
MFE          73.15%                 73.07%              65.83%                  66.97%
CYK          79.09%                 78.60%              72.19%                  72.47%
CYK-Hybrid   81.05%                 81.08%              72.17%                  71.93%
miRFinder    84.46%                 84.84%              82.78%                  82.27%

Page 24: Disadvantage of our Program

• Limited by its structural accuracy

Page 25: To Do List

• Possibly test the accuracy of CYK in terms of its ability to predict the structure of the microRNA
• Need to run through the

Page 26: Summary

• We currently have a routine that is capable of identifying microRNA with 82% sensitivity and specificity based solely on its structure
• We are currently communicating with a student from the UK who published microPred, to see whether we can use our program to retrain their SVM and get a better result

Page 27: See Website for More Details

• http://128.192.76.177/ProjectUpdate/microRNA.html
• http://128.192.76.177/CYK.html for testing out the grammar

Page 28: References