Upload
maryann-smith
View
215
Download
0
Embed Size (px)
Citation preview
Peak Detection with Chemical Noise Removal Using Short-Time
FFT for a Kind of MALDI Data
Xiaobo Zhou
HCNR-CBI, Harvard Medical School
and Brigham & Women’s Hospital, Boston, MA.
MS Profiling and Biomarkers
Biosci Rep 25(1-2):107
MALDI-TOF Mass Spectrometry
Biochemistry @ NCBI Books
Outline
Introduction
Peak detection method Random noise removal Chemical noise removal Peak identification
Results Conclusion
Introduction
Peak detection is the first step in the biomarker extraction from the MS data.
Proper peak detection methods should be based on the properties of different types of MS data.
Properties of the MALDI (Matrix Assisted Laser Desorption Ionization) data interested: high mass accuracy and resolution over a wide
mass range: m/z can be as large as 20000 Da. the unique noise pattern: low frequency noise
peaks at 1 Da apart.
Example of a spectrum
Fig. 1 The largest m/z is more than 9000Da. The two enlarged regions show the detailed signals when m/z is relatively small and large in this spectrum. When m/z is small, the pattern of the chemical noise can be seen clearly, while when m/z is becoming larger, it is weaker.
Small m/z
Large m/z
Example of chemical noise
Fig. 2 Part of a raw spectrum. The chemical noise has a unique pattern that peaks at 1 Da apart.
Existed methods
Most methods: Based on the a priori knowledge of the mass of the peptides Drawback: effective for m/z less than 4000Da
Yu W. et al. : Gaussian filter to smooth the signal Gabor quadrature filter to extract the envelope signal Drawback: the threshold of defining peaks is got via an empirical
approach without considering the noise model. Some small peaks can be overlooked / omitted.
Jurgen K. et al: To remove chemical noise by Fourier transform with fixed window
size. Drawback: useful for the eletrospray quadrupole TOF data, whose
maximum m/z is less than 2000Da.
Peak detection method
Proposed model:
:number of the spectrum
:number of the time of flight (TOF) point
:the true intensity :the observed intensity
:chemical noise :random noise
( ) ( ) ( ) ( )j k j j k j k j kx t N s t c t n t 1,2,...j m 1,2,...k T
( )j ks t
j
kt
( )j kx t( )j kc t ( )j kn t
Three steps
Random noise removalundecimated wavelet transform to get the smoothed signal
software: Rice Wavelet Toolbox (RWT)
website: http://www-dsp.rice.edu/software/rwt.shtml
Chemical noise removalshort time discrete Fourier transform with adaptive window size to
get an estimate of chemical noise
Peak identification Normalization Threshold Approach via SNR Identification
ˆ ( )j kx t
Chemical noise removal
Why STDFT? The shape of the chemical noise is similar to the sinusoidal
signal. The number of points in a 1Da region is different and
decreasing from lower m/z to higher m/z. Short Time Discrete Fourier Transform
Choose the window sizethe region with the same number of points in 1Da (figure in thefollowing page)
2 ( 1)( 1) /
1
ˆ( , ) ( , ) ( )T
w i k l Tj j k
k
X l u w k u x t e
1,2,...l T
1, ,( , )
0, .
if k u window sizew k u
otherwise
bb
Window size of STDFT
0 1000 2000 3000 4000 5000 6000 7000 8000 9000 1000010
15
20
25
30
35
40
45
50
55
m/z
Num
ber
of p
oint
s in
1D
a
Fig. 3 Approximate number of data points (window size) in 1Da region as a function of m/z. With the increasing of m/z, the region with same number of time point is becoming larger.
Estimate of the chemical noise
Small m/z Large m/z
5360 5365 5370 5375 5380 5385 5390 5395 5400
0
5
10
15
20
25
30
35
40
m/zin
tens
ity
Fig. 4 Estimate of the chemical noise for the two enlarged regions in Fig. 1. Black curve: smoothed signal; Blue curve: estimate of the chemical noise. If the smoothed signal minus the estimate of the chemical noise is less than a certain threshold, it is set to be zero.
Peak identification
Normalization of the processed spectrum The spectrum is divided by the mean intensity of the processed
spectrum to correct the systematic differences
Definition of the Signal-to-noise ratio (SNR)
Peak identification
The peak is located in the position where it has the highest intensity in a cluster of peaks
Intensity of the estimated true signal
Mean of the intensity of the random noise in a windowSNR
Peak identification
5360 5365 5370 5375 5380 5385 5390 5395 5400
0
5
10
15
20
25
30
35
40
45
m/z
inte
nsity
1340 1345 1350 1355 1360 1365 1370 1375 13800
20
40
60
80
100
120
140
160
180
200
m/z
inte
nsity
Fig. 5 Examples of envelope extraction and peak detection. The blue curves show the envelope extraction. Each concave part of the curve corresponds to the same protein. The red circles
label the peaks detected by our method.
Small m/z Large m/z
Results:
Data sets -The data got from the serum of a cohort of patients
undergoing endarterctomy (EA) (therapy) for occluded carotid arteries (disease).
-Here we use the data of a group of all the patients: the symptomatic EA patients (EAS).
-After the sample preparation and the data acquisition, we finally got 35 spectra.
Numerical experiment settings The threshold of wavelet transform: 6 The threshold of chemical noise removal: 80
percentiles of the value got by subtracting the estimated chemical noise from the smoothed signal in each adaptive region
We define true peaks: the number of peaks in one location is more than 15% of the number of spectra in one group in the same 1Da region.
Position of the peaks: the mean m/z over all the peaks in 1 Da.
Peaks in the EAS group
1000 2000 3000 4000 5000 6000 7000 8000 9000
5
10
15
20
25
30
35
m/z
num
ber
of s
pect
ra
Fig. 6 All peaks detected in EAS group when SNR is 2. One bad spectrum is removed. Totally 34 spectra are considered.
SNR comparison
Fig. 7 Comparison of SNR between the same spectrum without and with removal of chemical noise.
Left: the SNR got without chemical noise removal.
Right: SNR got with chemical noise removal.
Conclusions
Proposed an automatic peak detection method for a kind of MALDI data: chemical noise removal to increase the SNR of some small peaks.
Future works: Peak alignment: to align all the peaks
corresponding to the same protein but with shifting
Feature extraction: to select the biomarkers
References
Berndt, P., Hobohm, U., Langen, H., Reliable automatic protein identification from matrix-assisted laser desorption ionization mass spectrometric peptide finger prints, Electrophoresis, 1999, 20, 3521-3526.
Breen, E. J, Hopwood, F.G., Williams, K.L., Wilkins, M. R., Automatic poisson peak harvesting for high throughput protein identification, Electrophoresis, 2000, 21, 2243-2251.
Coombes, K. R., Tsavachidis, S., Morris, J. S., Baggerly, K. A., Hung, M. C., H. M. Kuerer, PImproved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform, Proteomics, 2005, 5, 4107-4117.
Du, P., Kibbe, W. A., Lin, S. M., Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching, Bioinformatics, 2006, 22, 2059-2065.
Gras, R., Muller, M., Gasteiger, E., Gay, S., Binz, P. A., Bienvenut, W., Hoogland, C., Sanchez, J. C., Bairoch, A., Hochstrasser, D. R., Appel, R. D., Improving protein identification from peptide mass fingerprinting through a parameterized multi-level scoring algorithm and an optimized peak detection, Electrophoresis, 1999, 20, 3535-3550.
References Jurgen, K., Marc, G., Keith, R., Matthias, W., Noise filtering techniques for
electrospray quadrupole time of flight mass spectra, J. Am. Soc. Mass Spectrom., 2003, 14, 766-776.
Krutchinsky, A. N., Chait, B. T., On the nature of the chemical noise in MALDI mass spectra, Journal American Society for Mass Spectrometry, 2002, 13, 129-134.
Morris, J. S., Coombes, K. R., Koomen , J., Baggerly, K. A., Kobayashi, R., Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum, Bioinformatics, 2005, 21, 1764-1775.
Samuelsson, J., Dalevi, D., Levander, F., Rognvaldsson, T., Modular, scriptable and automated analysis tools for high-throughput peptide mass fingerprinting, Bioinformatics, 2004, 20, 3628-3635.
Yasui, Y., McLerran, D., Adam, B., Winget, M., Thornquist, M., Feng, Z., An automated peak-identification / calibration procedure for high-dimensional protein measures from mass spectrometers, Journal of Biomedicine and Biotechnology, 2003, 4, 242-248.
Yu, W., Wu, B., Lin, N., Stone, K., Williams, K., Zhao, H., Detecting and Aligning Peaks in Mass Spectrometry Data with Applications to MALDI, Computational Biology and Chemistry, 2006, 30, 27-38.