Poster produced by Faculty & Curriculum Support (FACS), Georgetown University Medical Center
Peptide sequence databases, meta-search engine, and
machine-learning combiner available from:
http://edwardslab.bmcb.georgetown.edu
Applying enumeration, meta-search, and
machine learning can significantly improve the
sensitivity of peptide identification.
Improving the Sensitivity of Peptide Identification With Meta-Search and Machine Learning
Nathan J. Edwards¹, Xue Wu², Chau-Wen Tseng²
Introduction
All peptide sequences from: six-frame translations of EST and HTC sequences; three-frame translations of mRNA sequences; all IPI, RefSeq, GenBank, Vega, EMBL, H-InvDB,
SwissProt, and TrEMBL proteins; and SwissProt variants, splice isoforms, sequence conflicts, and mature isoforms,
grouped by gene cluster, compressed, and distributed as FASTA.
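Six-frame translation, listed above for EST and HTC sequences, reads both DNA strands in all three reading frames. The poster does not say how its translations were produced, so the following is only a minimal sketch assuming the standard genetic code (NCBI translation table 1); the function names are hypothetical, and the gene-cluster grouping and compression steps are not shown.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated with
# bases in T, C, A, G order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    # Complement each base, then reverse; ambiguous bases such as N pass through.
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str):
    """Return the three forward frames followed by the three reverse-complement frames."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


def stop_free_segments(protein: str, min_length: int = 1):
    """Split a translated frame at stop codons ('*'), keeping segments of at least min_length residues."""
    return [seg for seg in protein.split("*") if len(seg) >= min_length]


frames = six_frame_translation("ATGGCCTAA")
print(frames)  # ['MA*', 'WP', 'GL', 'LGH', '*A', 'RP']
```

Splitting each frame at stop codons with `stop_free_segments` yields the candidate protein fragments that a search engine could then digest into peptides.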
¹Georgetown University Medical Center; ²University of Maryland, College Park
Peptide Sequence Databases
PepSeqDB Release 1.2
Peptide Identification Meta-Search
HMMatch Spectral Matching
Conclusions
References
We use a variety of techniques, from sequence
enumeration and meta-search to machine learning,
to increase the number of high-confidence peptide
identifications from large tandem mass spectra
datasets. These techniques seek to increase the number of
peptide identifications made at a given level of
statistical significance. We show that these techniques can
significantly improve identification sensitivity.
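"Identifications at a given level of statistical significance" can be made concrete with a target-decoy estimate of the false discovery rate (FDR). The poster does not state its exact procedure, so the following is a minimal sketch assuming the common estimate FDR(t) = (#decoy hits scoring at least t) / (#target hits scoring at least t), higher scores being better, and target and decoy databases of equal size; the function name is hypothetical.

```python
def identifications_at_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Number of target PSMs accepted at the most permissive score threshold whose
    target-decoy FDR estimate (#decoys >= t) / (#targets >= t) is at most max_fdr.
    Higher scores are assumed to be better."""
    # Pair each score with a decoy flag and walk from the best score down.
    hits = sorted(
        [(s, False) for s in target_scores] + [(s, True) for s in decoy_scores],
        key=lambda h: h[0],
        reverse=True,
    )
    n_target = n_decoy = best = 0
    i = 0
    while i < len(hits):
        threshold = hits[i][0]
        # Consume every hit tied at this score so the threshold means "score >= t".
        while i < len(hits) and hits[i][0] == threshold:
            if hits[i][1]:
                n_decoy += 1
            else:
                n_target += 1
            i += 1
        if n_target > 0 and n_decoy / n_target <= max_fdr:
            best = n_target  # n_target only grows as the threshold is lowered
    return best


targets = [10, 9, 8, 7, 3]
decoys = [5, 2]
print(identifications_at_fdr(targets, decoys, max_fdr=0.25))  # 5
```

Under this sketch, a method is more sensitive if it returns a larger count for the same `max_fdr` on the same spectra.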
1. Edwards. Novel Peptide Identification using Expressed Sequence
Tags and Sequence Database Compression. Mol. Syst. Biol. 2007.
2. Wu, Tseng, Edwards. HMMatch: Peptide Identification by Spectral
Matching of Tandem Mass Spectra using Hidden Markov Models.
J. Comput. Biol. 2007.
3. Wu, Tseng, Rudnick, Balgley, Edwards. PepArML: An Unsupervised,
Model-Free, Combining, Peptide Identification Arbiter for Tandem
Mass Spectra via Machine Learning. In preparation.
Organism     Size (AA)   Size (entries)
Human        209Mb       75,043
Mouse        151Mb       55,929
Rat          67Mb        43,211
Zebrafish    90Mb        47,922
Schedule: automated rebuild every few months.
Coming soon: fast mapping of peptides to genes and source sequences
using suffix trees and gene sequence groups.
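The planned peptide-to-source mapping relies on an index that finds every occurrence of a short peptide in a large sequence collection. As a compact stand-in for the suffix trees named above, the sketch below uses a suffix array, a closely related structure, with a naive construction suitable only for small inputs. The class and method names are hypothetical, and gene sequence groups are not modeled; each hit is reported as a (protein name, 0-based offset) pair.

```python
from bisect import bisect_right


class PeptideIndex:
    """Hypothetical suffix-array index over a set of protein sequences."""

    def __init__(self, proteins):
        self.names = list(proteins)
        self.starts = []  # start offset of each protein in the concatenated text
        parts, pos = [], 0
        for name in self.names:
            self.starts.append(pos)
            parts.append(proteins[name])
            pos += len(proteins[name]) + 1  # +1 for the '|' separator
        # '|' never occurs in a peptide, so no match can span two proteins.
        self.text = "|".join(parts)
        # Naive O(n^2 log n) construction; real indexes use linear-time algorithms.
        self.sa = sorted(range(len(self.text)), key=lambda i: self.text[i:])

    def _prefix(self, rank, length):
        start = self.sa[rank]
        return self.text[start:start + length]

    def _lower_bound(self, peptide):
        # First suffix whose prefix is >= peptide (prefixes stay sorted after truncation).
        lo, hi = 0, len(self.sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._prefix(mid, len(peptide)) < peptide:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def lookup(self, peptide):
        """Return sorted (protein name, 0-based offset) pairs where peptide occurs."""
        hits = []
        rank = self._lower_bound(peptide)
        while rank < len(self.sa) and self._prefix(rank, len(peptide)) == peptide:
            pos = self.sa[rank]
            k = bisect_right(self.starts, pos) - 1  # which protein contains pos
            hits.append((self.names[k], pos - self.starts[k]))
            rank += 1
        return sorted(hits)


index = PeptideIndex({"P1": "MKLVPEPTIDEK", "P2": "PEPTIDER"})
print(index.lookup("PEPTIDE"))  # [('P1', 4), ('P2', 0)]
```

All occurrences of a peptide occupy one contiguous run of the suffix array, so a single binary search followed by a short scan finds them.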
Annual Meeting, 2008
PepArML - Unsupervised Machine-Learning Combiner
NSF TeraGrid: 1000+ CPUs
UMIACS: 250+ CPUs
Edwards Lab: scheduler & 48+ CPUs
Meta-search with four search engines; target & decoy searches run automatically.
Web-service API for all data
Secure communication
Heterogeneous compute resources
Simple search description
Scales to 100s of simultaneous searches
Free, instant registration
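Running decoy searches automatically requires a decoy sequence database alongside the target one. The poster does not say how its decoys are generated; the sketch below assumes one common choice, reversing each target protein, and writes the result as FASTA. The function names and the `DECOY_` header prefix are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA lines into (header, sequence) pairs; the '>' is stripped from headers."""
    entries, header, seq = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        entries.append((header, "".join(seq)))
    return entries


def reversed_decoys(entries, prefix="DECOY_"):
    """Assumed decoy scheme: reverse each target sequence and tag its header."""
    return [(prefix + header, seq[::-1]) for header, seq in entries]


def to_fasta(entries, width=60):
    """Write (header, sequence) pairs as FASTA, wrapping sequences at `width` residues."""
    out = []
    for header, seq in entries:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


targets = read_fasta(">sp|P1\nMKLV\nPEPK\n>sp|P2\nACDE\n".splitlines())
print(reversed_decoys(targets))  # [('DECOY_sp|P1', 'KPEPVLKM'), ('DECOY_sp|P2', 'EDCA')]
```

Reversed decoys keep the amino-acid composition and length distribution of the targets, which is why they are a popular default; shuffled or pseudo-reversed decoys are alternatives.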
Figure: PepArML results on Q-TOF, LTQ, and MALDI datasets (x-axis: false positive rate).
Legend: Heuristic: H; classifier with 5-fold cross-validation: C-T, C-M, C-O, C-TM, C-TO, C-MO, C-TMO; unsupervised classifier with 5-fold cross-validation: U-TMO; unsupervised classifier without cross-validation: U*-TMO.
Figure: HMMatch hidden Markov model with Begin and End states, fragment-ion states (b1, y1, b2, y2), insert states (I0 to I4), and delete states (D1 to D4).
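The legend separates combiners evaluated with 5-fold cross-validation (CV) from one evaluated without CV. The poster does not describe the classifiers themselves, so the sketch below swaps in a deliberately simple nearest-centroid linear score to show only the CV mechanics: each peptide-spectrum match (PSM) is scored by a model that never saw it during training. The feature layout (for example, one column per search engine), the boolean labels, the round-robin fold assignment, and the function names are all assumptions; PepArML's unsupervised label construction is not shown.

```python
def centroid_direction(rows, labels):
    """Stand-in linear classifier: weight vector = mean(positive rows) - mean(negative rows)."""
    pos = [r for r, lab in zip(rows, labels) if lab]
    neg = [r for r, lab in zip(rows, labels) if not lab]
    if not pos or not neg:
        raise ValueError("each training split needs both positive and negative examples")

    def mean(group):
        # Column-wise average of a list of equal-length feature vectors.
        return [sum(col) / len(group) for col in zip(*group)]

    return [p - n for p, n in zip(mean(pos), mean(neg))]


def cross_validated_scores(features, labels, k=5):
    """Score every PSM with a model trained only on the other k - 1 folds.
    Folds are assigned round-robin: PSM i belongs to fold i % k."""
    scores = [0.0] * len(features)
    for fold in range(k):
        train = [i for i in range(len(features)) if i % k != fold]
        w = centroid_direction([features[i] for i in train], [labels[i] for i in train])
        # Apply the held-out model only to the PSMs in this fold.
        for i in range(fold, len(features), k):
            scores[i] = sum(wj * xj for wj, xj in zip(w, features[i]))
    return scores


# Toy data: two features per PSM, alternating true (even i) and false (odd i) matches.
features = [[2.0 + 0.1 * i, 1.5] if i % 2 == 0 else [0.5, 0.1 * i] for i in range(10)]
labels = [i % 2 == 0 for i in range(10)]
scores = cross_validated_scores(features, labels)
```

Scoring without CV (as in the U*-TMO variant) would train and score on the same PSMs, which is simpler but lets the model see the examples it is evaluated on.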