http://www.bits.vib.be/training
BITS MS Data Processing – Protein Inference
UGent, Gent, Belgium – 16 December 2011
Kenny [email protected]
Lennart [email protected]
Proteomics Services Group
European Bioinformatics Institute
Hinxton, Cambridge
United Kingdom
www.ebi.ac.uk
Kenny Helsens
Computational Omics and Systems Biology Group
Department of Medical Protein Research, VIB
Department of Biochemistry, Ghent University
Ghent, Belgium
Peptide validation and protein inference
Raw data
Peaklists
Peptide sequences
Protein accession numbers
[Figure axes: data size, ambiguity]
See: Martens and Hermjakob, Molecular BioSystems, 2007
Data processing and information ambiguity
PEPTIDE IDENTIFICATION VALIDATION
Populations and individuals
10,000 peptide-to-spectrum matches
5% decoy hits
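The two numbers above describe a population, not individual matches. As a minimal sketch, assuming the 5% is counted over all 10,000 matches and that the false discovery rate is estimated as decoy hits divided by target hits (both are assumptions, the slide gives only the counts), the arithmetic could look like this:

```python
def estimate_fdr(n_target_hits: int, n_decoy_hits: int) -> float:
    """Estimate the FDR among target hits as decoys / targets (assumed estimator)."""
    if n_target_hits <= 0:
        raise ValueError("need at least one target hit")
    return n_decoy_hits / n_target_hits


total_psms = 10_000                      # peptide-to-spectrum matches on the slide
decoy_hits = total_psms * 5 // 100       # 5% decoy hits -> 500
target_hits = total_psms - decoy_hits    # assumption: the rest are target hits
fdr = estimate_fdr(target_hits, decoy_hits)

print(f"{decoy_hits} decoy hits, {target_hits} target hits, estimated FDR {fdr:.3f}")
```

The estimate says roughly how many target hits are expected to be wrong, but not which ones. That gap between the population and the individual is what the following slides address.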
Suspect peptide identifications happen.
The problem is that finding them requires detailed analysis of a single spectrum and its identifications, amongst thousands of other spectra…
Eliminating false positives
Automated interpretation
The Netherlands??
Manual interpretation
Tyrosine phosphorylation
See: Ghesquière and Helsens, Proteomics, 2010
Peptizer expert system
Aggregation of the votes
Agent a
Agent b
Agent c
Agent d
Agent e
Votes cast: +1 +1 0 -1 +1
Trusted subset
Suspicious subset
Confident Peptide Identifications
See: Helsens et al, Molecular and Cellular Proteomics, 2008
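The voting scheme on this slide can be sketched in a few lines. This is an illustration only, not Peptizer's actual API: the two agents, the field names `length` and `mass_error_ppm`, the threshold of 2, and the convention that a positive vote means "suspicious" are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Identification = Dict[str, float]
Agent = Callable[[Identification], int]  # each agent votes +1, 0 or -1


def short_peptide_agent(ident: Identification) -> int:
    """Hypothetical agent: votes +1 for short peptides, 0 otherwise."""
    return 1 if ident["length"] < 8 else 0


def mass_error_agent(ident: Identification) -> int:
    """Hypothetical agent: +1 for a large precursor mass error, -1 otherwise."""
    return 1 if abs(ident["mass_error_ppm"]) > 10 else -1


def cast_votes(ident: Identification, agents: List[Agent]) -> List[int]:
    """Let every agent inspect the identification and collect the votes."""
    return [agent(ident) for agent in agents]


def is_suspicious(votes: List[int], threshold: int = 2) -> bool:
    """Aggregate by summing; at or above the (assumed) threshold is suspicious."""
    return sum(votes) >= threshold


slide_votes = [+1, +1, 0, -1, +1]  # the five votes shown on the slide
subset = "suspicious" if is_suspicious(slide_votes) else "trusted"
print(sum(slide_votes), subset)
```

Identifications in the suspicious subset would then go to manual inspection, while the trusted subset passes through as confident peptide identifications.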
PROTEIN INFERENCE
[Figure: one gene, its alternative transcripts, and their translations, drawn as numbered exons (1a, 1b, 2, 3, 4, 5, 6a, 6b)]
Rows: Gene, Transcripts, Translations
Legend: Intron, Exon UTR, Exon CDS, Peptide
Peptides: matching all transcripts, matching a transcript subset, matching exactly 1 translation
Other label: redundant
Not all peptides are created equal
Sample preparation consequences
See: Nesvizhskii AI et al, Molecular and Cellular Proteomics, 2005
peptides: a b c d
proteins:
  prot X: x x
  prot Y: x
  prot Z: x x x
Minimal set (Occam)
peptides: a b c d
proteins:
  prot X: x x
  prot Y: x
  prot Z: x x x
Maximal set (anti-Occam)
peptides: a b c d
proteins:
  prot X (-): x x
  prot Y (+): x
  prot Z (0): x x x
Minimal set with maximal annotation (true Occam?)
See: Martens and Hermjakob, Molecular BioSystems, 2007
Protein inference: a question of conviction
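To make the difference between the minimal and the maximal set concrete, here is a small sketch. The peptide-to-protein assignment in `protein_peptides` is hypothetical: it only respects the number of matches per protein shown above (two for prot X, one for prot Y, three for prot Z), since the exact columns are not recoverable here. The exhaustive search is an assumption chosen for clarity and is only practical for toy examples.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def maximal_set(protein_peptides: Dict[str, Set[str]], observed: Set[str]) -> List[str]:
    """Anti-Occam: report every protein matched by at least one observed peptide."""
    return sorted(p for p, peps in protein_peptides.items() if peps & observed)


def minimal_sets(protein_peptides: Dict[str, Set[str]], observed: Set[str]) -> List[Tuple[str, ...]]:
    """Occam: all smallest protein combinations that explain every observed peptide."""
    names = sorted(protein_peptides)
    for size in range(1, len(names) + 1):
        hits = [
            combo
            for combo in combinations(names, size)
            if observed <= set().union(*(protein_peptides[p] for p in combo))
        ]
        if hits:
            return hits  # stop at the first (smallest) size that covers everything
    return []


# Hypothetical assignment of peptides a-d to the three proteins.
protein_peptides = {
    "prot X": {"a", "b"},
    "prot Y": {"b"},
    "prot Z": {"b", "c", "d"},
}
observed = {"a", "b", "c", "d"}

print(maximal_set(protein_peptides, observed))
print(minimal_sets(protein_peptides, observed))
```

Under this assumed assignment the maximal set reports all three proteins, while the only minimal set is prot X plus prot Z. When several minimal sets exist, the third option on the slide would pick among them by annotation quality.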
ALGORITHMS FOR THE PROTEIN INFERENCE PROBLEM
• IDPicker: Zhang et al, Journal of Proteome Research, 2007
• ProteinProphet: Nesvizhskii AI et al, Analytical Chemistry, 2003
• DBToolkit: Martens et al, Bioinformatics, 2005, http://genesis.UGent.be/dbtoolkit
A few algorithms for protein inference
IDPicker parsimonious protein assembly
(I) Initialize
See: Zhang et al, Journal of Proteome Research, 2007
IDPicker parsimonious protein assembly
(II) Collapse
See: Zhang et al, Journal of Proteome Research, 2007
IDPicker parsimonious protein assembly
(III) Separate
See: Zhang et al, Journal of Proteome Research, 2007
IDPicker parsimonious protein assembly
(IV) Reduce
See: Zhang et al, Journal of Proteome Research, 2007
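The four steps named on these slides (Initialize, Collapse, Separate, Reduce) can be sketched on a bipartite peptide-protein graph. This is a simplified illustration, not the IDPicker code: it collapses only the protein side, uses a greedy set cover for the Reduce step, and the function names and example data are made up.

```python
from collections import defaultdict


def connected_components(group_peps):
    """(III) Separate: cluster protein groups that are linked through shared peptides."""
    pep_to_groups = defaultdict(set)
    for group, peps in group_peps.items():
        for pep in peps:
            pep_to_groups[pep].add(group)
    seen, clusters = set(), []
    for start in sorted(group_peps):
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            group = stack.pop()
            if group in seen:
                continue
            seen.add(group)
            cluster.add(group)
            for pep in group_peps[group]:
                stack.extend(pep_to_groups[pep] - seen)
        clusters.append(cluster)
    return clusters


def parsimonious_assembly(peptide_to_proteins):
    # (I) Initialize: build the bipartite graph as protein -> set of peptides.
    protein_peps = defaultdict(set)
    for pep, prots in peptide_to_proteins.items():
        for prot in prots:
            protein_peps[prot].add(pep)

    # (II) Collapse: proteins with identical peptide sets become one group.
    by_peptides = defaultdict(list)
    for prot, peps in protein_peps.items():
        by_peptides[frozenset(peps)].append(prot)
    group_peps = {tuple(sorted(prots)): set(peps) for peps, prots in by_peptides.items()}

    # (IV) Reduce: greedy set cover within each separated cluster.
    kept = []
    for cluster in connected_components(group_peps):
        uncovered = set().union(*(group_peps[g] for g in cluster))
        remaining = set(cluster)
        while uncovered:
            best = max(sorted(remaining), key=lambda g: len(group_peps[g] & uncovered))
            kept.append(best)
            uncovered -= group_peps[best]
            remaining.discard(best)
    return sorted(kept)


# Made-up example: A and B are indistinguishable, E is explained away by D.
example = {
    "pep1": ["A", "B"], "pep2": ["A", "B"], "pep3": ["A", "B", "C"],
    "pep4": ["C"], "pep5": ["D"], "pep6": ["D", "E"],
}
print(parsimonious_assembly(example))
```

In this example A and B collapse into one group, C is kept because only it explains pep4, and E is dropped because D already explains all of its peptides. A greedy cover is not guaranteed to be the true minimum in every case, which is one reason to treat the result as a sketch.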
[Formula labels: protein probability, peptide weight, peptide probability]
In iteration 1, all weights w start off as 1/n, with n the degeneracy count for the peptide
See: Nesvizhskii AI et al., Analytical Chemistry, 2003
ProteinProphet: the simplified view
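As a rough sketch of this simplified view, assume the protein probability is one minus the product of (1 - w·p) over its peptides, and that each peptide's weight is re-estimated after every pass as the protein's share of the summed probabilities of all proteins containing that peptide. This follows the general model in Nesvizhskii et al. (2003) but leaves out refinements such as the adjustment for sibling peptides; the function name, the fixed iteration count, and the example data are assumptions.

```python
def protein_probabilities(peptide_probs, peptide_to_proteins, n_iter=20):
    """Iteratively estimate protein probabilities from weighted peptide probabilities."""
    proteins = sorted({prot for prots in peptide_to_proteins.values() for prot in prots})

    # Iteration 1: every weight starts at 1/n, with n the degeneracy of the peptide.
    weights = {
        (pep, prot): 1.0 / len(prots)
        for pep, prots in peptide_to_proteins.items()
        for prot in prots
    }

    prot_prob = {}
    for _ in range(n_iter):
        # Protein probability: 1 - product over its peptides of (1 - w * p).
        for prot in proteins:
            miss = 1.0
            for pep, prots in peptide_to_proteins.items():
                if prot in prots:
                    miss *= 1.0 - weights[(pep, prot)] * peptide_probs[pep]
            prot_prob[prot] = 1.0 - miss

        # Re-apportion each shared peptide according to the current protein probabilities.
        for pep, prots in peptide_to_proteins.items():
            total = sum(prot_prob[prot] for prot in prots)
            for prot in prots:
                weights[(pep, prot)] = prot_prob[prot] / total if total > 0 else 1.0 / len(prots)
    return prot_prob


# Made-up example: pepA is unique to prot X, pepB is shared by prot X and prot Y.
peptide_probs = {"pepA": 0.9, "pepB": 0.8}
peptide_to_proteins = {"pepA": ["prot X"], "pepB": ["prot X", "prot Y"]}
probs = protein_probabilities(peptide_probs, peptide_to_proteins)
print(probs)
```

In the example the shared peptide drifts toward prot X, which also has unique evidence, so prot Y loses probability with each iteration.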
peptides: a b c d
proteins:
  prot X (-): x x
  prot Y (+): x
  prot Z (0): x x x
Minimal set with maximal annotation
DBToolkit protein inference
peptides: a b c d
proteins:
  prot X (-): x x
  prot Y (+): x
  prot Z (0): x x x
Some indications from the HUPO BPP
PROTEIN INFERENCE AND QUANTIFICATION
Some inference examples (i)
See: Colaert et al, Proteomics, 2010
http://genesis.ugent.be/rover/
Nice and easy, 1/1, only unique peptides (blue) and a narrow distribution
Some inference examples (ii)
See: Colaert et al, Proteomics, 2010
Nice and easy, down-regulated
http://genesis.ugent.be/rover/
Some inference examples (iii)
See: Colaert et al, Proteomics, 2010
A little less easy, up-regulated
http://genesis.ugent.be/rover/
Some inference examples (iv)
See: Colaert et al, Proteomics, 2010
A nice example of the mess of degenerate peptides
http://genesis.ugent.be/rover/
Some inference examples (v)
See: Colaert et al, Proteomics, 2010
A bit of chaos, but a defined core distribution
http://genesis.ugent.be/rover/
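The examples above all hinge on how peptide ratios are combined into one protein ratio and on whether degenerate peptides are included. A minimal sketch follows, assuming the protein ratio is summarised as the median of log2 peptide ratios. That summary is an assumption for illustration, not necessarily what Rover computes, and the ratios are invented.

```python
import math
from statistics import median


def protein_log2_ratio(peptide_ratios, unique_only=True):
    """Median log2 ratio over a protein's peptides.

    peptide_ratios: list of (ratio, is_unique) pairs; degenerate (shared)
    peptides are skipped when unique_only is True.
    """
    values = [
        math.log2(ratio)
        for ratio, is_unique in peptide_ratios
        if is_unique or not unique_only
    ]
    return median(values) if values else None


# Invented ratios: three unique peptides around 1/1 plus one degenerate outlier.
ratios = [(1.0, True), (2.0, True), (0.5, True), (8.0, False)]
unique_estimate = protein_log2_ratio(ratios)                  # unique peptides only
all_estimate = protein_log2_ratio(ratios, unique_only=False)  # degenerate one included
print(unique_estimate, all_estimate)
```

Including the degenerate peptide shifts the estimate, because that peptide may report on a different protein than the one being quantified.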
Thank you!
Questions?