Top-down characterization of proteins in bacteria with unsequenced genomes
Nathan Edwards
Georgetown University Medical Center
Microorganism Identification
Homeland-security/defense applications
  Long history of fingerprinting approaches
Clinical applications in strain identification:
  Selection of treatment and/or antibiotics
New applications in microbiome analysis:
  Bacterial colonies in gut, ...
  Chronic wound infections
Compete with genomic approaches?
  PCR, next-gen sequencing
  Primary sales pitch is speed.
Microorganism Identifications
Match spectra with proteome (or genome) sequence for (species) identity
  Provides robust match with respect to instrumentation and sample prep
Many bacteria will never be sequenced or "finished"...
  Pathogen simulants, for example
...but many have: about 2500 to date.
Can we use the available sequence to identify proteins from unknown, unsequenced bacteria? Yes, for some proteins in some organisms!
Intact protein LC-MS/MS
Crude cell lysate
Capillary HPLC, C8 column
LTQ-Orbitrap XL
Precursor scan: resolution 30,000 at m/z 400
Data-dependent precursor selection:
  5 most abundant ions
  10 second dynamic exclusion
  Charge state +3 or greater
CAD product ion scan: resolution 15,000 at m/z 400
[Figure: LC-MS/MS data file yr_inclusion (yrohdei, file dated 3/11/2009 3:43:13 PM). Upper panels: total ion chromatograms, relative abundance vs. time (min), RT 19.04-25.39; TIC MS (NL 1.66E8) and TIC FTMS MS2 (NL 1.01E7). Lower panel: FTMS MS2 product ion spectrum averaged over scans 1937-2437 (RT 19.45-24.36, AV 21, NL 4.80E4), relative abundance vs. m/z, 200-2000. Precursor: m/z 756.70, +8, MW 6044.11.]
CID Protein Fragmentation Spectrum from Y. rohdei
Enterobacteriaceae Protein Sequences
Exhaustive set of all Enterobacteriaceae family protein sequences from Swiss-Prot, TrEMBL, RefSeq, GenBank, and [CMR]
...plus Glimmer3 predictions on RefSeq Enterobacteriaceae genomes
  Primary and alternative translation start sites
Filter for intact mass in range 1 kDa to 20 kDa
  253,626 distinct protein sequences, 256 species
Derived from "Rapid Microorganism Identification Database" (RMIDb.org) infrastructure.
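The intact-mass filter on this slide can be illustrated with a minimal sketch. It assumes average (not monoisotopic) residue masses, unmodified sequences, and standard amino acids only; the slides do not say which mass type the database build used, and the function names here are hypothetical.

```python
# Approximate average residue masses in Da (assumption: average masses,
# unmodified residues; the slides do not specify the mass type used).
AVG_RESIDUE_MASS = {
    "G": 57.0513, "A": 71.0779, "S": 87.0773, "P": 97.1152, "V": 99.1311,
    "T": 101.1039, "C": 103.1429, "L": 113.1576, "I": 113.1576, "N": 114.1026,
    "D": 115.0874, "Q": 128.1292, "K": 128.1723, "E": 129.1140, "M": 131.1961,
    "H": 137.1393, "F": 147.1739, "R": 156.1857, "Y": 163.1733, "W": 186.2099,
}
WATER = 18.01528  # average mass of H2O added for the free termini


def intact_mass(seq):
    """Average intact mass (Da) of an unmodified protein sequence."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def in_mass_range(seq, lo=1000.0, hi=20000.0):
    """True if the sequence is all standard residues and 1 kDa <= mass <= 20 kDa."""
    if not seq or any(aa not in AVG_RESIDUE_MASS for aa in seq):
        return False
    return lo <= intact_mass(seq) <= hi


def filter_sequences(seqs, lo=1000.0, hi=20000.0):
    """Distinct sequences whose intact mass falls in [lo, hi], sorted."""
    return sorted({s for s in seqs if in_mass_range(s, lo, hi)})
```

Collapsing to a set mirrors the slide's count of "distinct protein sequences": the same sequence reported by several source databases is kept once.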
ProSightPC 2.0
Product ion scan decharging
  Enabled by high-resolution fragment ion measurements
  THRASH algorithm implementation
Absolute mass search mode
  15 ppm fragment ion match tolerance
  250 Da precursor ion match tolerance
"Single-click" analysis of entire LC-MS/MS datafile.
Other tools
Explored using standard search engines:
  Decharge and format as charge +1 spectrum
  X!Tandem scoring plugin (ProSight, delta M)
  OMSSA, Mascot, etc.
MS-Tools: MS-Deconv, MS-TopDown, MS-Align, MS-Align+, MS-Align-E!
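The two match tolerances quoted for the ProSightPC absolute mass search (15 ppm for fragment ions, 250 Da for the precursor) can be expressed as a short sketch. This is not ProSightPC code; the function names are hypothetical and the masses are assumed to be decharged (neutral) values.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def fragment_matches(observed, theoretical, tol_ppm=15.0):
    """True if a decharged fragment mass is within tol_ppm of the theoretical mass."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def precursor_matches(observed, theoretical, tol_da=250.0):
    """True if the intact precursor mass is within tol_da (Da) of the theoretical mass."""
    return abs(observed - theoretical) <= tol_da
```

The wide precursor window with a tight fragment window is what lets a database protein from a related species be considered even when its intact mass differs from the observed one.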
[Figure repeated: CID protein fragmentation spectrum from Y. rohdei; precursor m/z 756.70, +8, MW 6044.11]
Match to Y. pestis 50S Ribosomal Protein L32
Exact match sequence…
Phylogeny: Protein vs DNA
[Figure panels: Protein Sequence; 16S-rRNA Sequence]
What about mixtures?
Shared Small Ribosomal Proteins
Identified E. herbicola proteins
30S Ribosomal Protein S19 m/z 686.39, z 15+, E-value 1.96e-16, Δ 0.007
Six proteins identified with |Δ| < 0.02
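The m/z and charge values reported for each identification relate to a neutral precursor mass by simple arithmetic, sketched below. The sketch assumes protonated ions and assumes that Δ is the observed-minus-theoretical precursor mass difference in Da; the slides do not state the units, and the function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da


def neutral_mass(mz, z):
    """Neutral mass (Da) of an ion observed at m/z with z added protons."""
    return z * (mz - PROTON)


def is_small_delta(delta, threshold=0.02):
    """True if |delta| is under the slide's 0.02 cutoff (units assumed to be Da)."""
    return abs(delta) < threshold


# 30S Ribosomal Protein S19 from the slide: m/z 686.39, z 15+, delta 0.007
s19_mass = neutral_mass(686.39, 15)
s19_small = is_small_delta(0.007)
```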
Identified E. herbicola proteins
DNA-binding protein HU-alpha: m/z 732.71, z 13+, E-value 7.5e-26, Δ -14.128
Eight proteins identified with "large" |Δ|
Identified E. herbicola proteins
DNA-binding protein HU-alpha: m/z 732.71, z 13+, E-value 1.91e-58
Use "Sequence Gazer" to find the mass shift
ΔM mode can "tolerate" one shift for free!
ProSightPC: ΔM mode
[Diagram labels: Protein Sequence; Experimental Precursor; ΔM; b- and y-ions]
Also: PIITA (Tsai et al. 2009)
ProSightPC: ΔM mode
[Diagram labels: Protein Sequence; Experimental Precursor; ΔM; b- and y-ions; ΔM; b'- and y'-ions]
Also: PIITA (Tsai et al. 2009)
Match a single "blind" mass-shift for free!
ProSightPC: ΔM mode
[Diagram labels: Protein Sequence; Experimental Precursor; ΔM; b-, b'-, y- and y'-ions; ΔM]
Also: PIITA (Tsai et al. 2009)
Match a single "blind" mass-shift for free!
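The idea behind ΔM mode, as these slides present it, can be sketched as follows. The sketch assumes that ΔM is the observed precursor mass minus the theoretical mass of the database sequence, and that b'- and y'-ions are the ordinary b- and y-ions shifted by ΔM. It uses neutral monoisotopic fragment masses and is not ProSightPC's implementation; all names are hypothetical.

```python
# Monoisotopic residue masses in Da (unmodified standard amino acids).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def fragment_masses(seq):
    """Neutral b- and y-fragment masses; list position i-1 holds b_i or y_i."""
    res = [MONO[aa] for aa in seq]
    b = [sum(res[:i]) for i in range(1, len(seq))]
    y = [sum(res[-j:]) + WATER for j in range(1, len(seq))]
    return b, y


def delta_m_matches(seq, observed_precursor, observed_fragments, tol_ppm=15.0):
    """Match fragments against b, y and the delta-M shifted b', y' series."""
    b, y = fragment_masses(seq)
    theoretical_precursor = sum(MONO[aa] for aa in seq) + WATER
    delta_m = observed_precursor - theoretical_precursor
    series = {
        "b": b,
        "y": y,
        "b'": [m + delta_m for m in b],  # b-ions that contain the shifted site
        "y'": [m + delta_m for m in y],  # y-ions that contain the shifted site
    }
    matches = []
    for obs in observed_fragments:
        for name, masses in series.items():
            for idx, theo in enumerate(masses, start=1):
                if abs(obs - theo) <= abs(theo) * tol_ppm * 1e-6:
                    matches.append((name, idx, obs))
    return delta_m, matches
```

Because the shift is taken from the precursor mass rather than from a list of known modifications, a single unanticipated mutation or modification costs nothing extra in the search, which is the sense of "for free" on the slide.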
Identified E. herbicola proteins
DNA-binding protein HU-alpha: m/z 732.71, z 13+, E-value 7.5e-26, Δ -14.128
Extract N- and C-terminus sequence supported by at least 3 b- or y-ions
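One possible reading of the terminus-extraction rule is sketched below; the slides do not spell out the exact criterion, so this is an assumption. A matched unshifted b_i ion is taken as support for every N-terminal prefix of length i or less, and likewise y_j for C-terminal suffixes, so the longest prefix backed by at least three ions ends at the third-largest matched index. Function names and the example sequence are hypothetical.

```python
def supported_length(matched_indices, min_ions=3):
    """Longest terminal length covered by at least min_ions distinct matched ions."""
    distinct = sorted(set(matched_indices), reverse=True)
    if len(distinct) < min_ions:
        return 0
    # The min_ions-th largest index is covered by itself and all larger ones.
    return distinct[min_ions - 1]


def supported_termini(seq, b_indices, y_indices, min_ions=3):
    """Return (N-terminal prefix, C-terminal suffix) under the assumed rule."""
    n_len = supported_length(b_indices, min_ions)
    c_len = supported_length(y_indices, min_ions)
    n_term = seq[:n_len]
    c_term = seq[len(seq) - c_len:] if c_len else ""
    return n_term, c_term


# Hypothetical example: b2, b4, b5, b7 and y1, y3, y6 matched without a shift.
example = supported_termini("ACDEFGHIKLMNPQRS", [2, 4, 5, 7], [1, 3, 6])
```

Only the extracted terminal stretches, not the full database sequence, would then be carried forward as experimentally supported sequence for the unsequenced organism.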
E. herbicola protein sequences
E. herbicola sequences found in other species
Phylogenetic placement of E. herbicola
  Phylogram, Cladogram (phylogeny.fr, "One-Click")
Genome annotation errors
UniProt: E. coli Cell division protein ZapB
  22 (371) E. coli strains
  MQFRRGMTMSLEVFEKLEAKVQQAIDTITL…
  [Figure labels: 3 (204); 17 (166); 0 (2)]
Need ±1500 Da precursor tolerance…
Conclusions
Protein identification for unsequenced organisms.
Identification and localization of sequence mutations and post-translational modifications.
Extraction of confidently established sequence suitable for phylogenetic analysis.
Genome annotation correction.
New paradigm for phylogenetic analysis?
Acknowledgements
Dr. Catherine Fenselau, Avantika Dhabaria, Joe Cannon*, Colin Wynne*
  University of Maryland Biochemistry
Dr. Yan Wang
  University of Maryland Proteomics Core
Dr. Art Delcher
  University of Maryland CBCB
Funding: NIH/NCI