RBP1 Splicing Regulation in Drosophila Melanogaster. 03-711 - Fall 2005 Jacob Joseph, Ahmet Bakan, Amina Abdulla. This presentation available at http://www.jjoseph.org/biology/. Alternative Splicing in Dros. RBP1 Regulation. Involved in dsx splicing and Rbp1 auto-regulation - PowerPoint PPT Presentation
RBP1 Splicing Regulation in Drosophila melanogaster
03-711 - Fall 2005
Jacob Joseph, Ahmet Bakan, Amina Abdulla
This presentation available at http://www.jjoseph.org/biology/
Alternative Splicing in Drosophila
RBP1 Regulation
  Involved in dsx splicing and Rbp1 auto-regulation
  Suspected in many other related pathways
Genome Data
  Sequences of all introns of known splice variants
  Two annotated genomes available
    D. melanogaster
    D. pseudoobscura
  As the gene names for D. mel. and D. pseu. differ, a list of gene orthologs was also obtained
Computational Approach
  Create a profile HMM for each motif (B-B, B-A)
  Select the end of every intron (~50 bases)
  Perform an HMM search on each intron segment, in both D. mel. and D. pseu.
  Keep matches found in both species
  Keep matches at the end of introns (~15 bases)
  Return the alignment of both species
  Examine the biological similarity of matches
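The filtering steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Hit` record, its field names, the gene-level ortholog dictionary, and the example scores are all assumptions introduced here, and the HMM search itself is taken as already done.

```python
from dataclasses import dataclass

TAIL_LEN = 50  # bases kept from the 3' end of each intron (slide: ~50)
MAX_GAP = 15   # a hit must end within this many bases of the 3' splice site (slide: ~15)


@dataclass(frozen=True)
class Hit:
    """One motif match inside an intron tail (hypothetical record layout)."""
    gene: str        # gene identifier used as key in the ortholog table
    intron_id: str   # e.g. "CG30271-RC-in_5"
    start: int       # 1-based start within the intron tail
    end: int         # 1-based inclusive end within the intron tail
    score: float


def intron_tail(intron_seq, tail_len=TAIL_LEN):
    """Return the last tail_len bases of an intron (the whole intron if shorter)."""
    return intron_seq[-tail_len:]


def near_splice_site(hit, tail_len=TAIL_LEN, max_gap=MAX_GAP):
    """True if the hit ends at most max_gap bases before the end of the tail."""
    return tail_len - hit.end <= max_gap


def conserved_hits(mel_hits, pse_hits, orthologs, tail_len=TAIL_LEN, max_gap=MAX_GAP):
    """Pair D. mel. hits with D. pseu. hits in the orthologous gene.

    Only hits close to the 3' splice site in both species are kept.
    """
    pse_by_gene = {}
    for hit in pse_hits:
        if near_splice_site(hit, tail_len, max_gap):
            pse_by_gene.setdefault(hit.gene, []).append(hit)

    pairs = []
    for mel in mel_hits:
        if not near_splice_site(mel, tail_len, max_gap):
            continue
        # orthologs maps a D. mel. gene name to its D. pseu. gene name
        for pse in pse_by_gene.get(orthologs.get(mel.gene), []):
            pairs.append((mel, pse))
    return pairs
```

Matching at the gene level rather than intron by intron is a design choice of this sketch; it allows a hit in one intron of a D. mel. gene to pair with a hit in a differently numbered intron of its ortholog.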
Data Summary
Profile Hidden Markov Models (HMM) and HMMer
  We needed an HMM profiler and search program.
  HMMer uses Plan 7, a revised version of the Krogh/Haussler model
  Not only global alignment
HMMer Advantages
  Possible alignments
    Classic global alignment
    Classic local alignment
    Global profile, local sequence alignment
    Fully local “multihit” alignment
  Scoring
    Raw alignment score
    E-value, showing the significance of the alignment
HMMer
  Create an HMM from the multiple alignment of each B-B and B-A motif
  The genome is scanned for high-scoring matches
  Only hits within 15 base pairs of the 3’ splice site are considered
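To make the idea of building a profile from aligned motif instances and scanning for high-scoring matches concrete, here is a deliberately simplified sketch. It is not HMMer and not a Plan 7 model: it is an ungapped position-specific log-odds profile, which corresponds to a profile HMM with match states only. The pseudocount, the uniform background frequency, and the function names are assumptions of the sketch, and sequences are assumed to contain only A, C, G and T.

```python
import math

BASES = "ACGT"


def build_profile(aligned, pseudocount=1.0, background=0.25):
    """Per-column log2-odds scores from equal-length, gap-free motif instances."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all training sequences must have the same length")
    profile = []
    for col in range(length):
        counts = {base: pseudocount for base in BASES}
        for seq in aligned:
            counts[seq[col].upper()] += 1
        total = sum(counts.values())
        profile.append(
            {base: math.log2(counts[base] / total / background) for base in BASES}
        )
    return profile


def best_hit(profile, seq):
    """Return (score, start, end) of the best window; positions are 1-based inclusive."""
    seq = seq.upper()
    width = len(profile)
    best = None
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        score = sum(column[base] for column, base in zip(profile, window))
        if best is None or score > best[0]:
            best = (score, i + 1, i + width)
    return best
```

A real profile HMM additionally models insertions and deletions and reports an E-value for each hit; this sketch reports only a raw log-odds score.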
Results: B-A Motif

CG30271-RC-in_5 (27 - 39), GA15740-in_5 (27 - 39) score: -6
ctgttgaatcacttggaaagcaatcaGTCGACAATTGTTtacttttacag
cctttgaatcactcggaaagcaatcaGTCGACAATTGTTtacttttacag

CG30020-RA-in_3 (25 - 37), GA15581-in_9 (24 - 36) score: -8
ccgtcccagtgacttacaatacgaTTCTACTATTTTTtgtacgcttacag
taaggctcttcatactttatcaaATCTACAATTTCTcaatgtaattgcag

Klp3A-RA-in_3 (31 - 43), GA21186-in_3 (26 - 38) score: -9
ttgaagttcgaaaactcctgaaactaattgTTCCACAATTTTTttttatt
tgttcaattcttaaataaaaccaatTTCGACTCTTTTTctcttctttcag

na-RB-in_0 (33 - 45), GA13546-in_2 (25 - 37) score: -9
tctggtgcactgagagaaatgccatctacttcATCGATACTCTTTtgcag
tgtaaacactcgttgcaaacacaaATTTACAATCAATttccatgttttat

CG30428-RA-in_2 (33 - 45), GA15840-in_1 (25 - 37) score: -9
ggtaaggaagcgtaaaaataaattctttttttATCACCAATATTTttcag
aaaatatcaagccgaaacaaatttATGTACAATTTTTtttttatggaaag

CG2199-RB-in_0 (36 - 48), GA15296-in_0 (33 - 45) score: -10
ttgctactgccattataggtagtttaaaaactgttTTCTACACTCTTTct
aacaaaaacaaaaatatggccctctgataattGGGGACACTTTATttcag
Results: B-B Motif

ps-RD-in_4 (31 - 42), GA20847-in_4 (31 - 42) score: -11
catttaatatcttgaaaatatttaacataaATCTGATGCAAAtattccag
attactattcttaaaatatatttaacataaATCTGATGCAAAtattccag

fru-RE-in_6 (26 - 37), GA12896-in_5 (24 - 35) score: -13
cccacccccacagtgatgacgcctaATATGAACCAAGcaaatgtttgcag
tgctaaataaaccaaattccaaaCTCTGATCAAAAaataccgataaaaag

Ptp52F-RA-in_0 (38 - 49), GA14851-in_14 (34 - 45) score: -13
tactctttgaaaaataagcatatggatgtcactgataATATGATATTAAt
tctaaatcgtattcaaatcgaattgaaacataaATCGAATCCAAAaacag

CG9455-RA-in_0 (32 - 43), GA21800-in_0 (27 - 38) score: -13
aatagtggctttgttttaataacaatgtaatATCTGATATTTAttctcag
cagagcgtgccccgtctgatgatccgAACTGATCTGATgtttttcggtag

CG8709-RA-in_2 (34 - 45), GA21271-in_9 (34 - 45) score: -13
acaaatcttaggaaataccaaagttgttctacgATCTTATCTATGgagtc
gccccatcagtgtcagtggcagctgaccccaccATTTGATCTATTtgcag

CG7966-RA-in_0 (37 - 48), GA20727-in_4 (26 - 37) score: -13
tatatgtacacattgtactgcaaacacatgccctgaATCTTTGATAAAga
gtgttgaatgaaagaatacacttgaATCGGTTCTAAAttgcatcgcacag
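In the result listings, the matched motif is written in uppercase inside a 50-base intron tail, and the numbers in parentheses are its 1-based inclusive coordinates within that tail. The short sketch below recovers those coordinates from a sequence line and compares two motif instances position by position. The helper names are assumptions made for illustration, and the identity measure is a plain ungapped comparison, not the HMM score reported on the slides.

```python
import re


def motif_span(seq):
    """Return (start, end, motif) for the first uppercase run; 1-based inclusive."""
    match = re.search(r"[A-Z]+", seq)
    if match is None:
        raise ValueError("no uppercase motif found")
    return match.start() + 1, match.end(), match.group()


def identity(a, b):
    """Fraction of identical positions between two equal-length motif instances."""
    if len(a) != len(b):
        raise ValueError("motifs must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# First and second B-A entries, copied from the results listing.
mel_start, mel_end, mel_motif = motif_span(
    "ctgttgaatcacttggaaagcaatcaGTCGACAATTGTTtacttttacag"
)
_, _, mel2_motif = motif_span("ccgtcccagtgacttacaatacgaTTCTACTATTTTTtgtacgcttacag")
_, _, pse2_motif = motif_span("taaggctcttcatactttatcaaATCTACAATTTCTcaatgtaattgcag")
```

For the first entry this yields the span 27 to 39, matching the coordinates printed in its header line.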
Biomolecular Activity: B-A
Biomolecular Activity: B-B
Biomolecular activity analysis
  The fru gene, regulated by the tra and tra2 genes, is expressed at the same time as the dsx gene, which helps validate our results.
  Expected presence of sxl and tra genes.
  Functional similarity:
    B-A motif: SNF4Agamma, rdgc, qtc.
    B-B motif: ps, ptp, CG9455.
Difficulties & Future Directions
  Support Vector Machines were applied
  Lack of significant training data.
  Lack of direct experimental data for cross-validation.
  Since the current D. pseu. genome has far fewer intron sequences, reliance upon orthologs introduces many false negatives.
Alternate Approach: Support Vector Machines (SVM)
  Used for data classification
  Creates hyperplanes that separate data into two classes with maximum margin
  Appropriate for multidimensional classification problems
  Examples
    Article classification
    Protein classification
  Critical points
    Feature selection
    Training
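As a rough illustration of the maximum-margin idea, the sketch below trains a linear soft-margin SVM by batch subgradient descent on the regularized hinge loss. The slides do not say which SVM implementation or kernel was used, so the training routine, its hyperparameters (`lam`, `lr`, `epochs`), and the toy data are all assumptions of this example.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=300):
    """Minimize lam/2 * |w|^2 + mean(hinge loss); labels must be +1 or -1."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify x by the side of the hyperplane w.x + b = 0 it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable data (made up for this example).
toy_points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (0.0, 0.0), (-1.0, 0.0), (0.0, -1.0)]
toy_labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(toy_points, toy_labels)
```

Only points with a margin below 1 contribute to the update, which is what pushes the hyperplane away from the closest examples of each class.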
HMM and SVM
  HMMer is used to generate features
  The whole genome is searched for the A and B consensus sequences
  Search results for each intron are combined to create features
  Features
    Scores of the two motifs in the upstream region (2)
    Distance of the motifs to the splice site (1)
    Length of consensus sequence overlap (1)
    Length of motif (1)
    Whether consensus sequence B precedes A (1)
  Number of features = 6
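One possible way to assemble the six features for a single intron is sketched below. The slides do not define the features precisely, so several readings are assumptions of this sketch: the distance is measured from the end of the downstream-most motif to the 3' end of the intron segment, the motif length is taken as the combined span of both hits, and `MotifHit` is a hypothetical record type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifHit:
    """A consensus-sequence hit within an intron segment (hypothetical layout)."""
    start: int    # 1-based start
    end: int      # 1-based inclusive end
    score: float  # search score for this hit


def intron_features(hit_a, hit_b, segment_len):
    """Build the 6-element feature vector from the A and B hits of one intron."""
    overlap = max(0, min(hit_a.end, hit_b.end) - max(hit_a.start, hit_b.start) + 1)
    span_start = min(hit_a.start, hit_b.start)
    span_end = max(hit_a.end, hit_b.end)
    return [
        hit_a.score,                               # score of motif A
        hit_b.score,                               # score of motif B
        float(segment_len - span_end),             # distance to the 3' splice site
        float(overlap),                            # length of consensus overlap
        float(span_end - span_start + 1),          # length of the combined motif
        1.0 if hit_b.start < hit_a.start else 0.0, # does B precede A?
    ]


# Made-up hits in a 50-base intron segment.
example = intron_features(MotifHit(30, 39, -6.0), MotifHit(27, 33, -8.0), 50)
```

Encoding the ordering as 0.0 or 1.0 keeps the whole vector numeric, which is what a linear classifier expects.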
Summary
  Profile HMMs used for modeling
  Comparative analysis with the D. pseu. genome
  High-scoring alignments for both motifs further analyzed for biomolecular activity
  The existence of fru and other close matches helps to validate our results