microRNA Prediction with SCFG and MFE Structure Annotation
Tim Shaw, Ying Zheng, and Bram Sebastian

Goal of the Presentation
• Introduction to miRNA
• Survey of computational and experimental approaches to identify microRNA
• CYK algorithm
• Our methodology
• Results/Discussion
• Future direction

Computers vs Genetics

Background on microRNA and Its Classical Definition
• Found in eukaryotes (706 identified in human)
• Genome-encoded stem-loop precursor
• Generally processed by Dicer and a helicase
• Mature microRNA is approximately 22 nucleotides (nt)
• Recognizes target mRNA by base pairing
  ◦ Acts primarily as a gene silencer
  ◦ Some cases of gene enhancement

Diagram for miRNA

Problems with miRNA Hunting through Lab Experiments
• Biology is a network of cause and effect
• miRNA expression may be bound to certain environmental triggers
• Expression of certain microRNA sequences is hard to detect
• Some miRNAs have physical properties that make them hard to clone, including sequence composition or post-transcriptional modification

Problems with miRNA Hunting through Computational Approaches
• Stem-loop structures are common in eukaryotes
• Eukaryotic genomes are long, and most computational approaches are not practical for scanning an entire genome

Computationally Driven Approaches
• Structure information (thermodynamics)
  ◦ RNAz
• Homology: conservation of structure (ERPIN, MirScan, snarloop)
  ◦ Stem
  ◦ Loop
  ◦ Target sequence
• Machine learning (miRFinder, microPred)
  ◦ Feature selection based on sequence and structural information

Tests Done on These Methodologies
• ERPIN (2001) (homology)
  ◦ Good results, but very limited by the availability of training data: it detects only 66 of the 706 miRNAs, and only 36 if the human training sequences are removed
• miRFinder (2007) (ab initio)
  ◦ Human: specificity 84.46% (1320 of 8494 negatives misclassified); sensitivity 84.84% (599 of 706 miRNAs detected)
  ◦ Mouse: specificity 82.78% (1759 of 10213 negatives misclassified); sensitivity 82.27% (450 of 547 miRNAs detected)
• microPred (2009) (ab initio)
  ◦ We found a bug and reported it to the author; it is currently being fixed
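
The miRFinder percentages follow the standard definitions. Sensitivity is the fraction of real miRNAs detected. Specificity is the fraction of negatives correctly rejected, so the 1320 and 1759 counts are misclassified negatives. A small Python check reproduces the reported figures; the helper names are illustrative only.

```python
def sensitivity(true_positives: int, positives: int) -> float:
    """Fraction of real miRNAs that the method detects."""
    return true_positives / positives


def specificity(false_positives: int, negatives: int) -> float:
    """Fraction of pseudo-miRNAs that the method correctly rejects."""
    return (negatives - false_positives) / negatives


# miRFinder counts quoted above: (detected, total) and (misclassified, total)
reported = {
    "human": {"sens": (599, 706), "spec": (1320, 8494)},
    "mouse": {"sens": (450, 547), "spec": (1759, 10213)},
}

for species, counts in reported.items():
    sens = sensitivity(*counts["sens"])
    spec = specificity(*counts["spec"])
    print(f"{species}: sensitivity {sens:.2%}, specificity {spec:.2%}")
```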

Negative Set Generation
• Sequences were obtained from the CDS region of the genome (CDS = coding sequence, i.e., sequence that codes for protein)
  ◦ Implemented a CDS extractor for ccdsgenes.txt files from the UCSC Genome Browser
  ◦ Still need to implement a new parser based on the cds.txt file from the UCSC Genome Browser
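
The slides do not show the CDS extractor itself. Here is a hedged sketch. It assumes each row of the UCSC table follows the genePred layout with a leading bin column (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...) and uses 0-based, half-open coordinates. The coding sequence of one transcript is then spliced together by clipping each exon to [cdsStart, cdsEnd). `cds_from_row` and `windows` are hypothetical helpers.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def cds_from_row(fields: list[str], chrom_seq: str) -> str:
    """Splice the coding exons of one genePred-style row (assumed column order)."""
    strand = fields[3]
    cds_start, cds_end = int(fields[6]), int(fields[7])
    # exonStarts/exonEnds are comma-separated lists with a trailing comma
    starts = [int(x) for x in fields[9].rstrip(",").split(",")]
    ends = [int(x) for x in fields[10].rstrip(",").split(",")]
    pieces = []
    for exon_start, exon_end in zip(starts, ends):
        lo, hi = max(exon_start, cds_start), min(exon_end, cds_end)
        if lo < hi:  # keep only the part of the exon inside the CDS
            pieces.append(chrom_seq[lo:hi])
    cds = "".join(pieces).upper()
    if strand == "-":  # minus-strand genes are read as the reverse complement
        cds = cds.translate(COMPLEMENT)[::-1]
    return cds


def windows(seq: str, size: int, step: int) -> list[str]:
    """Cut a CDS into fixed-length candidate pseudo-miRNA windows (hypothetical helper)."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]
```

Each CDS would then be cut into windows to serve as pseudo-miRNAs. The slides do not give the window size or step.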

Positive Set
• Downloaded from miRBase: 706 human and 547 mouse miRNA sequences
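
The human and mouse entries can be selected from a miRBase FASTA download with a short filter. This sketch assumes, as in miRBase's FASTA files, that the first token of each header is an identifier with a three-letter species prefix such as `hsa` (human) or `mmu` (mouse). `read_fasta` and `by_species` are illustrative helpers.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def by_species(records: Iterable[tuple[str, str]], prefix: str) -> dict[str, str]:
    """Keep records whose identifier starts with e.g. 'hsa-' (human) or 'mmu-' (mouse)."""
    return {
        header.split()[0]: seq
        for header, seq in records
        if header.split()[0].startswith(prefix + "-")
    }
```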

Algorithms for SCFG
• CYK algorithm
  ◦ Calculates the optimal alignment (most probable parse) of a sequence to an SCFG, even when the grammar is ambiguous
• Inside algorithm
  ◦ Calculates the probability of a sequence given an SCFG
• Inside-outside algorithm
  ◦ Estimates (locally) optimal probability parameters for an SCFG from a set of example sequences
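
The inside algorithm fills the same table over subsequences as CYK, but it sums over parses where CYK takes the maximum. Below is a minimal sketch for a single-nonterminal grammar S → x S y | x S | S x | S S | ε. It restricts S → S S to splits with two non-empty halves, the same restriction used in the CYK recurrence. The rule probabilities are hypothetical placeholders that sum to 1. They are not the parameters estimated in this work.

```python
# Hypothetical rule probabilities for S; together they sum to 1.
P_EMPTY, P_BIFURCATE = 0.05, 0.05
P_PAIR = {("G", "C"): 0.10, ("C", "G"): 0.10, ("A", "U"): 0.08,
          ("U", "A"): 0.08, ("G", "U"): 0.04, ("U", "G"): 0.04}
P_LEFT = {b: 0.0575 for b in "ACGU"}   # S -> x S
P_RIGHT = {b: 0.0575 for b in "ACGU"}  # S -> S x


def inside(seq: str) -> float:
    """P(seq | grammar) summed over parse trees (bifurcations split into non-empty halves)."""
    n = len(seq)
    # a[i][j] = probability that S derives seq[i:j] (half-open interval)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        a[i][i] = P_EMPTY
    for length in range(1, n + 1):  # shorter spans are always filled first
        for i in range(n - length + 1):
            j = i + length
            total = P_LEFT[seq[i]] * a[i + 1][j] + P_RIGHT[seq[j - 1]] * a[i][j - 1]
            if length >= 2:
                total += P_PAIR.get((seq[i], seq[j - 1]), 0.0) * a[i + 1][j - 1]
            for k in range(i + 1, j):  # both halves non-empty
                total += P_BIFURCATE * a[i][k] * a[k][j]
            a[i][j] = total
    return a[0][n]
```

Replacing each sum with a max (and working in log space) turns this into the CYK score.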

Advantages of CYK
• A relatively fast algorithm, O(n³); by reusing the dynamic-programming table, we can scan through the sequence in O(n²)
• We can quickly compute multiple windows at the same time
• It can force an RNA to fold into a specific structure that we specify
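
Windows can be scanned cheaply because CYK scores every subsequence x_i…x_j in the same table. Here is a hedged sketch that assumes a fixed window length w and a toy grammar with hypothetical log-probabilities. Filling only the cells whose span is at most w gives the CYK score of every length-w window in one pass. That costs O(n·w²) time, compared with O(n·w³) for rerunning a separate parse per window. `window_scores` is a hypothetical helper.

```python
import math

# Hypothetical log-probabilities for S -> x S y | x S | S x | S S | epsilon.
LOG_EMPTY = LOG_BIFURCATE = math.log(0.05)
LOG_PAIR = {pair: math.log(p) for pair, p in {
    ("G", "C"): 0.10, ("C", "G"): 0.10, ("A", "U"): 0.08,
    ("U", "A"): 0.08, ("G", "U"): 0.04, ("U", "G"): 0.04}.items()}
LOG_SINGLE = math.log(0.0575)  # same weight for S -> x S and S -> S x


def window_scores(seq: str, w: int) -> list[float]:
    """CYK log-score of every length-w window, from one banded DP table."""
    n = len(seq)
    # g[(i, j)] = best log-score for seq[i:j]; only spans up to w are stored
    g = {(i, i): LOG_EMPTY for i in range(n + 1)}
    for length in range(1, w + 1):
        for i in range(n - length + 1):
            j = i + length
            best = max(g[(i + 1, j)], g[(i, j - 1)]) + LOG_SINGLE
            pair = (seq[i], seq[j - 1])
            if length >= 2 and pair in LOG_PAIR:
                best = max(best, g[(i + 1, j - 1)] + LOG_PAIR[pair])
            for k in range(i + 1, j):  # split into two non-empty halves
                best = max(best, g[(i, k)] + g[(k, j)] + LOG_BIFURCATE)
            g[(i, j)] = best
    return [g[(i, i + w)] for i in range(n - w + 1)]
```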

Introduction to the Modified CYK Algorithm
• Given a sequence X = x_1…x_n and an SCFG G:
  ◦ Find the optimal parse of X
  ◦ Use dynamic programming: γ(i, j, V) is the log-likelihood of the most likely parse of x_i…x_j rooted at nonterminal V
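
Stochastic CYK
For the single-nonterminal grammar S → x_i S x_j | x_i S | S x_j | S S | ε, write γ(i, j) for γ(i, j, S).
Initialization:
$$\gamma(i, i-1) = \log P(S \to \epsilon), \qquad i = 1, \dots, N+1$$
Iteration: for span length l = 1 to N, for i = 1 to N - l + 1, set j = i + l - 1 (so every shorter span is filled first) and compute
$$
\gamma(i, j) = \max \begin{cases}
\gamma(i+1, j-1) + \log P(S \to x_i\, S\, x_j) & \text{if } i < j \\
\gamma(i, j-1) + \log P(S \to S\, x_j) \\
\gamma(i+1, j) + \log P(S \to x_i\, S) \\
\max_{i \le k < j} \left[ \gamma(i, k) + \gamma(k+1, j) \right] + \log P(S \to S\, S)
\end{cases}
$$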
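
The recurrence can be sketched in Python with 0-based, half-open indices: g[i][j] scores seq[i:j], which is x_{i+1}…x_j in the slide's 1-based notation. A traceback then turns the winning parse into dot-bracket notation. The rule log-probabilities are hypothetical placeholders, not the weights estimated from the let-7 sequences. The "modified" parts of the authors' CYK, such as forcing a specified structure, are not reproduced here.

```python
import math

# Hypothetical rule log-probabilities (they sum to 1 in probability space).
LOG_EMPTY = math.log(0.05)        # S -> epsilon
LOG_BIFURCATE = math.log(0.05)    # S -> S S
LOG_PAIR = {pair: math.log(p) for pair, p in {
    ("G", "C"): 0.10, ("C", "G"): 0.10, ("A", "U"): 0.08,
    ("U", "A"): 0.08, ("G", "U"): 0.04, ("U", "G"): 0.04}.items()}
LOG_LEFT = LOG_RIGHT = math.log(0.0575)  # S -> x S and S -> S x, per base


def cyk(seq: str) -> tuple[float, str]:
    """Return the best log-score and its structure in dot-bracket notation."""
    n = len(seq)
    # g[i][j]: best log-score of seq[i:j]; back[i][j]: rule chosen for that span
    g = [[-math.inf] * (n + 1) for _ in range(n + 1)]
    back = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        g[i][i] = LOG_EMPTY
    for length in range(1, n + 1):  # shorter spans are always filled first
        for i in range(n - length + 1):
            j = i + length
            options = [
                (g[i + 1][j] + LOG_LEFT, ("left",)),
                (g[i][j - 1] + LOG_RIGHT, ("right",)),
            ]
            pair = (seq[i], seq[j - 1])
            if length >= 2 and pair in LOG_PAIR:
                options.append((g[i + 1][j - 1] + LOG_PAIR[pair], ("pair",)))
            for k in range(i + 1, j):  # split into two non-empty halves
                options.append((g[i][k] + g[k][j] + LOG_BIFURCATE, ("split", k)))
            g[i][j], back[i][j] = max(options, key=lambda option: option[0])

    structure = ["."] * n
    stack = [(0, n)]
    while stack:  # traceback: replay the chosen rules to recover the parse
        i, j = stack.pop()
        if i == j:
            continue
        rule = back[i][j]
        if rule[0] == "left":
            stack.append((i + 1, j))
        elif rule[0] == "right":
            stack.append((i, j - 1))
        elif rule[0] == "pair":
            structure[i], structure[j - 1] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            stack.extend([(i, rule[1]), (rule[1], j)])
    return g[0][n], "".join(structure)
```

For example, `cyk("GGGAAACCC")` pairs the three G-C bases around an unpaired AAA loop.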

Weight Estimation of Each Non-terminal Emission
• 57 let-7 miRNA sequences obtained from Rfam
• Used R-Coffee to estimate the lengths of the hairpin loop, stem, and bulge
• The parameters we estimated seem to work well for the majority of microRNA cases
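
The slides do not say exactly how the lengths were measured or turned into weights. As a hypothetical sketch, assume each aligned precursor has a single-hairpin consensus structure in dot-bracket notation. The stem, terminal-loop, and bulge/internal-loop lengths can then be read off each structure and averaged. `hairpin_lengths` and `average_lengths` are illustrative helpers.

```python
from statistics import mean


def hairpin_lengths(dot_bracket: str) -> tuple[int, int, int]:
    """Return (stem base pairs, terminal loop length, unpaired bases inside the stem)."""
    if "(" not in dot_bracket or dot_bracket.count("(") != dot_bracket.count(")"):
        raise ValueError("expected a balanced structure with at least one pair")
    innermost_open = dot_bracket.rindex("(")
    innermost_close = dot_bracket.index(")")
    if innermost_open > innermost_close:
        raise ValueError("more than one hairpin loop")
    # the region from the first '(' to the last ')', excluding unpaired flanks
    outer = dot_bracket[dot_bracket.index("("):dot_bracket.rindex(")") + 1]
    loop = innermost_close - innermost_open - 1
    pairs = dot_bracket.count("(")
    bulges = outer.count(".") - loop  # bulge and internal-loop bases
    return pairs, loop, bulges


def average_lengths(structures: list[str]) -> tuple[float, float, float]:
    """Mean stem, loop, and bulge lengths over a set of single-hairpin structures."""
    stems, loops, bulges = zip(*(hairpin_lengths(s) for s in structures))
    return mean(stems), mean(loops), mean(bulges)
```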

Result for CYK
[Plot not included]

RNAfold
• The most commonly used tool for predicting RNA secondary structure
• All current ab initio approaches and hairpin-loop finders use RNAfold to estimate a microRNA's structure and its MFE (minimum free energy)
• We use RNAfold's MFE as a measuring stick and use some of its structural features to assist our routine

Result for RNAfold
[Plot not included]

CYK-RNAfold Hybrid
• We combine the two scores as CombinedScore = 2 × [CYK] + [MFE]
• During the calculation, if RNAfold predicts a structure with two or more hairpin loops, we penalize the CYK score
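
The combination rule can be sketched as follows, with two assumptions. First, the slides do not give the size of the penalty, so `penalty` is a hypothetical parameter. Second, the inputs are taken to be scores on a common scale where larger means more miRNA-like, such as the z-scores from the Z Score Calculation slide. A hairpin is counted in RNAfold's dot-bracket output as a ')' whose nearest preceding bracket is '(', meaning a base pair that encloses only unpaired bases.

```python
def count_hairpins(dot_bracket: str) -> int:
    """Count hairpin loops: a ')' whose nearest preceding bracket is '('."""
    hairpins, last_bracket = 0, None
    for symbol in dot_bracket:
        if symbol in "()":
            if symbol == ")" and last_bracket == "(":
                hairpins += 1
            last_bracket = symbol
    return hairpins


def combined_score(cyk: float, mfe: float, dot_bracket: str, penalty: float = 1.0) -> float:
    """2 * CYK + MFE, with the CYK term penalized when RNAfold predicts 2+ hairpins."""
    if count_hairpins(dot_bracket) >= 2:
        cyk -= penalty  # hypothetical penalty size; not given in the slides
    return 2 * cyk + mfe
```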
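
Z Score Calculation
• To combine the MFE and CYK features, we randomly sampled 20,000 sequences from the human genome and calculated the MFE and CYK score of each
• This background sample gives the mean μ and standard deviation σ of each feature, so a candidate's score x can be standardized as z = (x - μ)/σ before the two features are combined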
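
Sampling the background and standardizing a score against it takes only a few lines with Python's `random` and `statistics` modules. This sketch makes two assumptions: the fixed window length, and the use of the population rather than the sample standard deviation. The helper names are illustrative.

```python
import random
from statistics import mean, pstdev


def sample_windows(genome: str, length: int, count: int, seed: int = 0) -> list[str]:
    """Draw `count` random fixed-length windows (window length is an assumption)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(count)]
    return [genome[s:s + length] for s in starts]


def background_stats(scores: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation of one feature over the background sample."""
    return mean(scores), pstdev(scores)


def z_score(score: float, mu: float, sigma: float) -> float:
    """Number of background standard deviations between a score and the background mean."""
    return (score - mu) / sigma
```

With the 20,000 background MFE values in a list such as `mfe_background` (a hypothetical name), `z_score(candidate_mfe, *background_stats(mfe_background))` gives the standardized MFE. The same is done for the CYK score.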

CYK-RNAfold Hybrid Result
[Plot not included]

Optimized Sensitivity/Specificity Comparison
Test sets: human specificity on 8494 pseudo-miRNAs, human sensitivity on 706 miRNAs, mouse specificity on 10213 pseudo-miRNAs, mouse sensitivity on 547 miRNAs.
Method        Human spec.   Human sens.   Mouse spec.   Mouse sens.
MFE           73.15%        73.07%        65.83%        66.97%
CYK           79.09%        78.60%        72.19%        72.47%
CYK-Hybrid    81.05%        81.08%        72.17%        71.93%
miRFinder     84.46%        84.84%        82.78%        82.27%
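
Each method's sensitivity and specificity are nearly equal within a species, which suggests a decision threshold chosen to balance the two. The slides do not state the optimization criterion. One hypothetical way to pick such a threshold from lists of positive and negative scores, assuming a higher score means more miRNA-like, is:

```python
def balanced_threshold(pos: list[float], neg: list[float]) -> tuple[float, float, float]:
    """Pick the cutoff (predict miRNA if score >= cutoff) where sensitivity ~ specificity."""
    best = None
    for cutoff in sorted(set(pos) | set(neg)):
        sens = sum(s >= cutoff for s in pos) / len(pos)
        spec = sum(s < cutoff for s in neg) / len(neg)
        key = (abs(sens - spec), -(sens + spec))  # balance first, then larger sens + spec
        if best is None or key < best[0]:
            best = (key, cutoff, sens, spec)
    return best[1], best[2], best[3]
```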

Disadvantages of Our Program
• Limited by its structural accuracy

To-Do List
• Possibly test CYK's accuracy at predicting the structure of microRNA
• Need to run through the

Summary
• We currently have a routine capable of identifying microRNA with 82% sensitivity and specificity, based solely on its structure
• We are currently communicating with a student in the UK who published microPred, to see whether our program can be used to retrain their SVM for better results

See Website for More Details
• http://128.192.76.177/ProjectUpdate/microRNA.html
• http://128.192.76.177/CYK.html for testing out the grammar