57
A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳吳吳 1

A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

Embed Size (px)

Citation preview

Page 1: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

1

A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data

吳立青

Page 2: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

2

Outline

• Introduction• Methods• Data source• Methods of comparison• Results• Conclusion

Page 3: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

3

Introduction

• Using protein mass spectrometry to discriminate diseased from healthy individuals becomes more popular

• MALDI-TOF and SELDI-TOF have the advantage :– Fast– High through-put– Accuracy– Protein ID

Page 4: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

4

SELDI-TOF MS applications in clinical oncology

Page 5: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

5

The example of ovarian cancer data

•Large scale

•Full of noise

•Nonlinear and non-stationary

Page 6: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

6

The analysis of mass spectra seems to be simple.However, we do suffer from several problems ,and they need to

be solved.

Motivation

Page 7: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

7

Common preprocessing step

• Baseline subtraction• Denoising : very important and also complicated !

• Normalization• Peak detection• Peak alignment

Page 8: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

8

Noise component

• Chemical noise : – From the matrix material and sample

contaminations– One kind of the biochemical material

• Electrical noise : The physical characteristics of the machine – Do not mean anything actually

Page 9: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

9

Chemical noise

• Chemical component (organic acid)– We call it Matrix

• Ionization– Provide H+ to peptide or protein for ionization and

flight in the machine• Protection

– Protect the peptide or protein in the process of laser flash

Page 10: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

10

Problem

• The simulated model before did not separate the chemical noise from spectra.

• The chemical noise mixed with the machine noise is worse.

• A novel preprocessing method should be developed

Page 11: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

11

Kwon, D., M. Vannucci, et al. (2008). "A novel wavelet-based thresholding method for the pre-processing of mass spectrometry data that accounts for heterogeneous noise." Proteomics 8(15): 3019-29.

Page 12: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

12

Goal

• Develop a method which can be satisfied– The electrical noise should be removed– Preserve the significant peaks even the chemical

noise

Page 13: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

13

Methods

• Hilbert Huang transform– Denoising

• Modification– Baseline subtraction– Rescale– Peak detection

Page 14: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

14

Flow chart

Ovarian cancer data HHT

IMFs

Remove IMFs

Modification

Baseline subtraction

Rescale

Peak detection

Page 15: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

15

Hilbert Huang transform

• Method : Hilbert Huang transform (HHT)– Wu, Z., N. E. Huang, et al. (2007). "On the trend, detrending, and variability of nonlinear and

nonstationary time series." Proc Natl Acad Sci U S A 104(38): 14889-94.

– An adaptive data analysis method for nonlinear and non-stationary processes

– The main feature of HHT is the empirical mode decomposition (EMD)

– After the process of EMD, we get the intrinsic mode functions and remove several from them as noise

• Goal : denoising

Page 16: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

16

Process of EMD (1/5)

• Find the envelope of the local maxima

Page 17: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

17

Process of EMD (2/5)

• Find the envelope of the local minima

Page 18: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

18

Process of EMD (3/5)

• Compute the mean envelope from the maximum envelope and minimum envelope

Page 19: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

19

Process of EMD (4/5)• We get IMF1 (i.e. h1) by subtracting the mean

envelope m1 from the original signal X(t)

Page 20: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

20

Process of EMD (5/5)

• We take IMF1 as X(t) and repeat the same process and so on.

• We terminate the process untill the number of the extrema and the zero-crossing of IMFn

differ by more than one

Page 21: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

21

IMFs : 1~16

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

x 104

0

1

2

3

4

5

6

7

8

9x 10

9

HHT

Page 22: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

22

Modification

• Baseline subtraction– Remove the systematic artifacts

• Rescale– Shift the scale to positive

• Peak detection– Key feature of the preprocessing method– We compare several popular methods

Page 23: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

23

Baseline subtraction

Page 24: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

24

Baseline subtraction

Page 25: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

25

Peak detection

• We use three popular algorithms for peak detection– MassSpecWavelet (Du, Kibbe et al. 2006)– SpecAlign (Wong, Cagney et al. 2005)– PROcess (Li 2005)

Page 26: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

26

Data source

• Source : National Cancer Institute• Type : 50 ovarian cancer data• Meuleman, W., J. Y. Engwegen, et al. (2008). "Comparison of normalisation methods for surface-enhanced

laser desorption and ionisation (SELDI) time-of-flight (TOF) mass spectrometry data." BMC Bioinformatics 9: 88

• Kwon, D., M. Vannucci, et al. (2008). "A novel wavelet-based thresholding method for the pre-processing of mass spectrometry data that accounts for heterogeneous noise." Proteomics 8(15): 3019-29

Page 27: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

27

Methods of comparison

• Judgment– Count of peaks detected– Real location of the peaks in visual

• Interior comparison– HHT and modification+SpecAlign– HHT and modification+PROcess– HHT and modification+MassSpecWavelet

Page 28: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

28

Methods of comparison

• Exterior comparison– SpecAlign

• Abbreviation : SA

– PROcess• Interpolation : PRO1• Regression : PRO2

– MassSpecWavelet• Abbreviation : MSW

– PRO2+MSW• As suggested in Cruz-Marcelo, Guerra et al. 2008

Page 29: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

29

Raw data

Page 30: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

30

Results after HHT and modification

Page 31: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

31

Results of interior comparison

• Results of interior comparison

Algorithm \ peak detected 2000~ 4000 DA

4000~6000 DA

6000~8000 DA

8000~10000 DA

10000~12000 DA

12000~14000 DA

14000~15000 DA Total

HHT modification +MassSpecWavelet 35 32 27 21 44 38 21 218

HHT modification+SpecAlign 18 17 6 11 8 9 3 80

HHT modification+PROcess 21 18 19 13 14 13 10 108

Page 32: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

32

安追

3

13

0 2000 4000 6000 8000 10000 12000 14000 160000

5

10

15x 10

8

Modified HHT+MSW Peak detected : 218M over z range : whole region

Page 33: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

33

安追

3

13

0 2000 4000 6000 8000 10000 12000 14000 160000

5

10

15x 10

8

Modified HHT+SpecAlign Peak detected : 80M over z range : whole region

Page 34: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

34

安追

3

13

0 2000 4000 6000 8000 10000 12000 14000 160000

5

10

15x 10

8

Modified HHT+PROcess Peak detected : 108M over z range : whole region

Significant peak lost

Page 35: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

35

Results of exterior comparison

• Results of exterior comparisonAlgorithm \ peak detected 2000~

4000 DA4000~6000 DA

6000~8000 DA

8000~10000 DA

10000~12000 DA

12000~14000 DA

14000~15000 DA Total

mHHT+MassSpecWavelet 35 32 27 21 44 38 21 218

mHHT+SpecAlign 18 17 6 11 8 9 3 80

mHHT+PROcess 21 18 19 13 14 13 10 108

SpecAlign 67 43 21 18 16 14 7 186

PRO1 46 24 19 18 16 16 6 145

PRO2 40 23 8 12 13 14 4 114

MassSpecWavelet 51 39 25 24 22 20 7 188

PRO2+MSW 54 37 33 25 23 18 8 198

Page 36: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

36

安追

3

13

0 2000 4000 6000 8000 10000 12000 14000 160000

5

10

15x 10

8

PRO1 Peak detected : 145M over z range : whole region

Meuleman, W., J. Y. Engwegen, et al. (2008). "Comparison of normalisation methods for surface-enhanced laser desorption and ionisation (SELDI) time-of-flight (TOF) mass spectrometry data." BMC Bioinformatics 9: 88

Page 37: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

37

安追

3

13

0 2000 4000 6000 8000 10000 12000 14000 160000

5

10

15x 10

8

PRO2 Peak detected : 114M over z range : whole region

Meuleman, W., J. Y. Engwegen, et al. (2008). "Comparison of normalisation methods for surface-enhanced laser desorption and ionisation (SELDI) time-of-flight (TOF) mass spectrometry data." BMC Bioinformatics 9: 88

Page 38: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

38

安追

3

13

0 2000 4000 6000 8000 10000 12000 14000 160000

5

10

15x 10

8

MSW Peak detected : 188M over z range : whole region

Meuleman, W., J. Y. Engwegen, et al. (2008). "Comparison of normalisation methods for surface-enhanced laser desorption and ionisation (SELDI) time-of-flight (TOF) mass spectrometry data." BMC Bioinformatics 9: 88

Page 39: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

39

安追

3

13

6000 6500 7000 7500 8000 85000

5

10

15x 10

8

MSWPeak detected : 25M over z range : 6000~8000

Page 40: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

40

安追

3

13

0 2000 4000 6000 8000 10000 12000 14000 160000

5

10

15x 10

8

PRO2+MSW Peak detected : 198M over z range : whole region

Meuleman, W., J. Y. Engwegen, et al. (2008). "Comparison of normalisation methods for surface-enhanced laser desorption and ionisation (SELDI) time-of-flight (TOF) mass spectrometry data." BMC Bioinformatics 9: 88

Page 41: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

41

安追

3

13

6000 6500 7000 7500 8000 85000

5

10

15x 10

8

PRO2+MSWPeak detected : 33M over z range : 6000~8000

Ex

Page 42: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

42

Results

• Interior comparison:– HHT and modification+MSW covers the most of

the peaks– HHT and modification+SpecAlign pick the most

important peaks• Exterior comparison:

– PROcess miss the significant peaks– MassSpecWavelet and PRO2MSW have many

redundancies

Page 43: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

43

Results of validation

• Validation– Data source : Cathay General Hospital– Experiments :

• Divide into three experiments– Water only– VrD1

Page 44: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

44

Water

Sample : waterOrganic acid : CHCA (<1000 DA)

Page 45: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

45

VrD1

Sample : VrD1Type : proteinOrganic acid : CHCA (<1000 Da)Molecular weight : 5119 Da

Page 46: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

46

Results of validation

Algorithm\sample Water VrD1 (Mw: 5119) Peak located in M/Z=5119 of VrD1

Double charge of VrD1 (5119+2)/2

Number of peaks (>1000Da) detected in

VrD1

MassSpecWavelet 400 369 Detected Detected 340

SpecAlign 391 477 Detected Detected 355

HHT modification+SpecAlign 17 22 Detected Detected 5

The amount of peaks which HHT modification removed 95.7% 94.8% 98.6%

Number of the peaks detected

Page 47: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

47

The peaks of Water detected by MassSpecWacelet

Page 48: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

48

The peaks of VrD1 detected by MassSpecWacelet

Molecular weight : 5119 Da

Page 49: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

49

The peaks of Water detected by SpecAlign

Page 50: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

50

The peaks of VrD1 detected by SpecAlign

Molecular weight : 5119 Da

Page 51: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

51

The peaks of water detected by HHT modification + SpecAlign

Page 52: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

52

The peaks of VrD1 detected by HHT modification + SpecAlign

Molecular weight : 5119 Da

Page 53: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

53

The peaks of VrD1 detected by HHT modification + SpecAlign 0-5200Da

Whole

Molecular weight : 5119 Da

Double charge : (5119+2)/2

Page 54: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

54

Results of validation

• MassSpecWavelet and SpecAlign do not remove the noise

• HHT and modification+SpecAlign detects the least peaks but the most significant peaks

• HHT and modification+SpecAlign removes 98.6% of the peaks (>1000Da) which are redudancies and noise

Page 55: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

55

Conclusion

• HHT performs well at denoising• As the result of comparison, HHT and

modification can make the raw data more simple

• Simultaneously, HHT and modification preserve the significant information.

• After the preprocessing of HHT and modification , it is suggested that detect the peaks by SpecAlign

Page 56: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

56

Acknowledgement

• 感謝本所陳欣昊同學的投入• 感謝黃鄂院士以及本所博士班林澂同學• 感謝汐止國泰醫院的鄭宇哲博士的實驗驗證

Page 57: A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青 1

57

Thanks for your attention!