22
Peak Detection with Chemical Noise Removal Using Short- Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s Hospital, Boston, MA.

Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Embed Size (px)

Citation preview

Page 1: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Peak Detection with Chemical Noise Removal Using Short-Time

FFT for a Kind of MALDI Data

Xiaobo Zhou

HCNR-CBI, Harvard Medical School

and Brigham & Women’s Hospital, Boston, MA.

Page 2: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

MS Profiling and Biomarkers

Biosci Rep 25(1-2):107

Page 3: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

MALDI-TOF Mass Spectrometry

Biochemistry @ NCBI Books

Page 4: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Outline

Introduction

Peak detection method Random noise removal Chemical noise removal Peak identification

Results Conclusion

Page 5: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Introduction

Peak detection is the first step in the biomarker extraction from the MS data.

Proper peak detection methods should be based on the properties of different types of MS data.

Properties of the MALDI (Matrix Assisted Laser Desorption Ionization) data interested: high mass accuracy and resolution over a wide

mass range: m/z can be as large as 20000 Da. the unique noise pattern: low frequency noise

peaks at 1 Da apart.

Page 6: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Example of a spectrum

Fig. 1 The largest m/z is more than 9000Da. The two enlarged regions show the detailed signals when m/z is relatively small and large in this spectrum. When m/z is small, the pattern of the chemical noise can be seen clearly, while when m/z is becoming larger, it is weaker.

Small m/z

Large m/z

Page 7: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Example of chemical noise

Fig. 2 Part of a raw spectrum. The chemical noise has a unique pattern that peaks at 1 Da apart.

Page 8: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Existed methods

Most methods: Based on the a priori knowledge of the mass of the peptides Drawback: effective for m/z less than 4000Da

Yu W. et al. : Gaussian filter to smooth the signal Gabor quadrature filter to extract the envelope signal Drawback: the threshold of defining peaks is got via an empirical

approach without considering the noise model. Some small peaks can be overlooked / omitted.

Jurgen K. et al: To remove chemical noise by Fourier transform with fixed window

size. Drawback: useful for the eletrospray quadrupole TOF data, whose

maximum m/z is less than 2000Da.

Page 9: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Peak detection method

Proposed model:

:number of the spectrum

:number of the time of flight (TOF) point

:the true intensity :the observed intensity

:chemical noise :random noise

( ) ( ) ( ) ( )j k j j k j k j kx t N s t c t n t 1,2,...j m 1,2,...k T

( )j ks t

j

kt

( )j kx t( )j kc t ( )j kn t

Page 10: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Three steps

Random noise removalundecimated wavelet transform to get the smoothed signal

software: Rice Wavelet Toolbox (RWT)

website: http://www-dsp.rice.edu/software/rwt.shtml

Chemical noise removalshort time discrete Fourier transform with adaptive window size to

get an estimate of chemical noise

Peak identification Normalization Threshold Approach via SNR Identification

ˆ ( )j kx t

Page 11: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Chemical noise removal

Why STDFT? The shape of the chemical noise is similar to the sinusoidal

signal. The number of points in a 1Da region is different and

decreasing from lower m/z to higher m/z. Short Time Discrete Fourier Transform

Choose the window sizethe region with the same number of points in 1Da (figure in thefollowing page)

2 ( 1)( 1) /

1

ˆ( , ) ( , ) ( )T

w i k l Tj j k

k

X l u w k u x t e

1,2,...l T

1, ,( , )

0, .

if k u window sizew k u

otherwise

bb

Page 12: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Window size of STDFT

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 1000010

15

20

25

30

35

40

45

50

55

m/z

Num

ber

of p

oint

s in

1D

a

Fig. 3 Approximate number of data points (window size) in 1Da region as a function of m/z. With the increasing of m/z, the region with same number of time point is becoming larger.

Page 13: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Estimate of the chemical noise

Small m/z Large m/z

5360 5365 5370 5375 5380 5385 5390 5395 5400

0

5

10

15

20

25

30

35

40

m/zin

tens

ity

Fig. 4 Estimate of the chemical noise for the two enlarged regions in Fig. 1. Black curve: smoothed signal; Blue curve: estimate of the chemical noise. If the smoothed signal minus the estimate of the chemical noise is less than a certain threshold, it is set to be zero.

Page 14: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Peak identification

Normalization of the processed spectrum The spectrum is divided by the mean intensity of the processed

spectrum to correct the systematic differences

Definition of the Signal-to-noise ratio (SNR)

Peak identification

The peak is located in the position where it has the highest intensity in a cluster of peaks

Intensity of the estimated true signal

Mean of the intensity of the random noise in a windowSNR

Page 15: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Peak identification

5360 5365 5370 5375 5380 5385 5390 5395 5400

0

5

10

15

20

25

30

35

40

45

m/z

inte

nsity

1340 1345 1350 1355 1360 1365 1370 1375 13800

20

40

60

80

100

120

140

160

180

200

m/z

inte

nsity

Fig. 5 Examples of envelope extraction and peak detection. The blue curves show the envelope extraction. Each concave part of the curve corresponds to the same protein. The red circles

label the peaks detected by our method.

Small m/z Large m/z

Page 16: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Results:

Data sets -The data got from the serum of a cohort of patients

undergoing endarterctomy (EA) (therapy) for occluded carotid arteries (disease).

-Here we use the data of a group of all the patients: the symptomatic EA patients (EAS).

-After the sample preparation and the data acquisition, we finally got 35 spectra.

Page 17: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Numerical experiment settings The threshold of wavelet transform: 6 The threshold of chemical noise removal: 80

percentiles of the value got by subtracting the estimated chemical noise from the smoothed signal in each adaptive region

We define true peaks: the number of peaks in one location is more than 15% of the number of spectra in one group in the same 1Da region.

Position of the peaks: the mean m/z over all the peaks in 1 Da.

Page 18: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Peaks in the EAS group

1000 2000 3000 4000 5000 6000 7000 8000 9000

5

10

15

20

25

30

35

m/z

num

ber

of s

pect

ra

Fig. 6 All peaks detected in EAS group when SNR is 2. One bad spectrum is removed. Totally 34 spectra are considered.

Page 19: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

SNR comparison

Fig. 7 Comparison of SNR between the same spectrum without and with removal of chemical noise.

Left: the SNR got without chemical noise removal.

Right: SNR got with chemical noise removal.

Page 20: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

Conclusions

Proposed an automatic peak detection method for a kind of MALDI data: chemical noise removal to increase the SNR of some small peaks.

Future works: Peak alignment: to align all the peaks

corresponding to the same protein but with shifting

Feature extraction: to select the biomarkers

Page 21: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

References

Berndt, P., Hobohm, U., Langen, H., Reliable automatic protein identification from matrix-assisted laser desorption ionization mass spectrometric peptide finger prints, Electrophoresis, 1999, 20, 3521-3526.

Breen, E. J, Hopwood, F.G., Williams, K.L., Wilkins, M. R., Automatic poisson peak harvesting for high throughput protein identification, Electrophoresis, 2000, 21, 2243-2251.

Coombes, K. R., Tsavachidis, S., Morris, J. S., Baggerly, K. A., Hung, M. C., H. M. Kuerer, PImproved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform, Proteomics, 2005, 5, 4107-4117.

Du, P., Kibbe, W. A., Lin, S. M., Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching, Bioinformatics, 2006, 22, 2059-2065.

Gras, R., Muller, M., Gasteiger, E., Gay, S., Binz, P. A., Bienvenut, W., Hoogland, C., Sanchez, J. C., Bairoch, A., Hochstrasser, D. R., Appel, R. D., Improving protein identification from peptide mass fingerprinting through a parameterized multi-level scoring algorithm and an optimized peak detection, Electrophoresis, 1999, 20, 3535-3550.

Page 22: Peak Detection with Chemical Noise Removal Using Short-Time FFT for a Kind of MALDI Data Xiaobo Zhou HCNR-CBI, Harvard Medical School and Brigham & Women’s

References Jurgen, K., Marc, G., Keith, R., Matthias, W., Noise filtering techniques for

electrospray quadrupole time of flight mass spectra, J. Am. Soc. Mass Spectrom., 2003, 14, 766-776.

Krutchinsky, A. N., Chait, B. T., On the nature of the chemical noise in MALDI mass spectra, Journal American Society for Mass Spectrometry, 2002, 13, 129-134.

Morris, J. S., Coombes, K. R., Koomen , J., Baggerly, K. A., Kobayashi, R., Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum, Bioinformatics, 2005, 21, 1764-1775.

Samuelsson, J., Dalevi, D., Levander, F., Rognvaldsson, T., Modular, scriptable and automated analysis tools for high-throughput peptide mass fingerprinting, Bioinformatics, 2004, 20, 3628-3635.

Yasui, Y., McLerran, D., Adam, B., Winget, M., Thornquist, M., Feng, Z., An automated peak-identification / calibration procedure for high-dimensional protein measures from mass spectrometers, Journal of Biomedicine and Biotechnology, 2003, 4, 242-248.

Yu, W., Wu, B., Lin, N., Stone, K., Williams, K., Zhao, H., Detecting and Aligning Peaks in Mass Spectrometry Data with Applications to MALDI, Computational Biology and Chemistry, 2006, 30, 27-38.