Information-Theoretic Mass Spectral Library Search

Arvind Visvanathan, CSCE 990 Seminar in Multi-Dimensional Chromatography Systems, Informatics, and Applications


Page 1: Information-Theoretic Mass Spectral Library Search

Information-Theoretic Mass Spectral Library Search

Arvind Visvanathan

CSCE 990 Seminar in Multi-Dimensional Chromatography Systems, Informatics, and Applications

Information-Theoretic Mass Spectral Library Search CSCE 990 – GCxGC Seminar

Outline: Introduction | Related Work | Method | Results and Discussion

Page 2: Information-Theoretic Mass Spectral Library Search

Outline

• Introduction
  – Mass spectrum search types
• Related Work
  – Other techniques
    • NIST, PBM, DotMap
• Method
  – Probability and Information
  – Normalized distribution function
• Results
• Conclusion

Page 3: Information-Theoretic Mass Spectral Library Search

Introduction – Mass Spectrum

Mass Spectrum | Search Algorithm | Search Types | Applications

[Figure: mass spectrum of decane, intensity vs. m/z]

Page 4: Information-Theoretic Mass Spectral Library Search

Introduction – Mass Spectrum Search

[Diagram: Unknown Spectrum and MS Library → Search Algorithm → Potential Matches]

Page 5: Information-Theoretic Mass Spectral Library Search

Introduction – Search Types

• Identity search
  – Unknown mass spectrum is present in the library
  – Looking for the exact spectrum
• Similarity search
  – Unknown mass spectrum is not present in the library
  – Looking for a similar spectrum

Page 6: Information-Theoretic Mass Spectral Library Search

Introduction – MS Search Applications

• Steroid detection in athletes
• Monitor patient breath during surgery
• Composition of molecular species found in space
• Honey adulterated with corn syrup
• Locate oil deposits
• Monitor fermentation process in the biotechnology industry
• Detect dioxins in contaminated fish

Page 7: Information-Theoretic Mass Spectral Library Search

Related Work – NIST MS-Search [Stein ’94]

MS Search | Probability Based Matching | DotMap

• Pre-search the library for the unknown spectrum
  – Reduces the search domain (160K → 4K compounds)
• Compute a match factor for each compound in the pre-search result
• Match Factor (MF)
  – Range 0–999
  – The higher, the better
• Pre-search result is sorted by MF value
• Pick the topmost compounds as possible matches

Page 8: Information-Theoretic Mass Spectral Library Search

Related Work – NIST MS-Search [Stein ’94]

• Match Factor Computation [Stein ’94]
  – Term 1 (F1): mass-weighted normalized dot product
  – Term 2 (F2): relative intensities of adjacent peaks in both spectra
  – MF: combination of F1 and F2
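As a rough illustration of Term 1, the sketch below computes a mass-weighted normalized dot product between two spectra. The slide does not give the formula, so the weighting used here (each peak contributes m/z times the square root of its intensity) and the scaling to 0–999 are assumptions for this example, and the function name `weighted_dot_product` is hypothetical. The two toy spectra are the C-1 and C-2 spectra from the comparison example on the next slide.

```python
import math


def weighted_dot_product(lib, unk):
    """Mass-weighted normalized dot product, scaled to 0..999.

    lib, unk: dicts mapping m/z -> intensity.
    Assumed weighting (not taken from the slides): m/z * sqrt(intensity).
    """
    masses = set(lib) | set(unk)
    # Cross term: sum over m/z of m * sqrt(A_lib * A_unk)
    num = sum(m * math.sqrt(lib.get(m, 0) * unk.get(m, 0)) for m in masses)
    # Normalization: product of the two mass-weighted intensity sums
    den = sum(m * a for m, a in lib.items()) * sum(m * a for m, a in unk.items())
    if den == 0:
        return 0
    return round(999 * num * num / den)


c1 = {35: 100, 36: 1, 37: 1, 45: 999, 55: 200}
c2 = {35: 100, 36: 1, 37: 2, 45: 999, 55: 200}

print(weighted_dot_product(c1, c1))
print(weighted_dot_product(c1, c2))
```

Under this assumed weighting, a one-count change in a tiny peak barely moves the dot product, which is why a second term that looks at ratios of adjacent peaks is useful for separating such spectra.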

Page 9: Information-Theoretic Mass Spectral Library Search

Related Work – NIST MS-Search [Stein ’94]

m/z    C-1 intensity    C-2 intensity
35     100              100
36     1                1
37     1                2
45     999              999
55     200              200

       Compare C-1 & C-1    Compare C-1 & C-2
F1     999                  999
F2     999                  824
MF     999                  925

Page 10: Information-Theoretic Mass Spectral Library Search

Related Work – Probability Based Matching [McLafferty et al. ’75]

• Confidence Value (K) instead of MF
• Four components for each m/z
  – Term 1: U: based on the uniqueness of an m/z value
  – Term 2: A: intensity contribution to the confidence
  – Term 3: W: window factor (measure of agreement)
  – Term 4: D: dilution factor (measure of purity)
  – K = ∑ (U + A + W – D), summed over each m/z

Page 11: Information-Theoretic Mass Spectral Library Search

Related Work – DotMap [Synovec et al. ’04]

[Figure: DotMap example; labels: Fumaric acid, Adipic acid, Lactic acid]

Page 12: Information-Theoretic Mass Spectral Library Search

Related Work – DotMap [Synovec et al. ’04]

• Inverse problem
• DotMap computed across the image
• Higher-valued areas indicate the presence of the compound of interest
• Multiple compounds of interest
  – Compute DotMap overlay
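A minimal sketch of the idea of computing a map across the image: score the spectrum at every pixel against the spectrum of the compound of interest, so that high values mark where the compound is likely present. The cosine-style normalized dot product used as the score, the image layout (a grid of m/z → intensity dicts), and the names `cosine` and `dot_map` are assumptions for illustration; the slides do not specify how the published DotMap algorithm scores or normalizes.

```python
import math


def cosine(a, b):
    """Normalized dot product of two spectra given as m/z -> intensity dicts."""
    dot = sum(a[m] * b[m] for m in a if m in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dot_map(image, target):
    """Score every pixel spectrum of a 2-D image against the target spectrum."""
    return [[cosine(pixel, target) for pixel in row] for row in image]


target = {45: 999, 55: 200}
image = [
    [{45: 999, 55: 200}, {}],              # exact match, empty pixel
    [{35: 500}, {45: 500, 55: 100}],       # unrelated peak, scaled copy of target
]

for row in dot_map(image, target):
    print(["%.2f" % v for v in row])
```

For several compounds of interest, one map per target could be computed this way and the maps overlaid.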

Page 13: Information-Theoretic Mass Spectral Library Search

Related Work – DotMap [Synovec et al. ’04]

Page 14: Information-Theoretic Mass Spectral Library Search

Related Work – DotMap [Synovec et al. ’04]

Page 15: Information-Theoretic Mass Spectral Library Search

Method – Motivation

Motivation | Probability & Entropy | Distribution Function | Match Factor

• NIST MS-Search [Stein ’94]
  – No domain information utilized
• Probability Based Matching [McLafferty et al. ’75]
  – Old technique (’75)
  – Ad hoc use of domain information
• DotMap
  – No domain information utilized

Page 16: Information-Theoretic Mass Spectral Library Search

Method – Entropy

• Entropy-based approach
  – Entropy: a measure of the amount of uncertainty
  – Based on probabilities
• Include domain-based knowledge (information) in computing the match factor
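For reference, Shannon entropy of a discrete distribution is H = −∑ p · log2(p), measured in bits. The short sketch below evaluates it for a few distributions; the function name `entropy` is chosen for this example only, and how entropy enters the match factor is not shown here.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


print(entropy([0.5, 0.5]))                 # two equally likely outcomes
print(entropy([0.25, 0.25, 0.25, 0.25]))   # four equally likely outcomes
print(entropy([0.9, 0.1]))                 # skewed, so less uncertain
```

A certain outcome has zero entropy, and a uniform distribution has the largest entropy for a given number of outcomes.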

Page 17: Information-Theoretic Mass Spectral Library Search

Method – Distribution Function

• Library
  – NIST EPA Library
  – 163K compounds
• Compute distribution function (DF)
  – Two-dimensional array
    • m/z vs. intensity
  – DF[i][j] = number of compounds in the library with
    • m/z = i
    • intensity = j
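The DF can be sketched as a counting pass over the library. This example assumes integer intensities and counts a compound with no peak at a given m/z under intensity 0, so that each m/z row sums to the library size; the three-compound library and the name `distribution_function` are made up for illustration.

```python
from collections import Counter, defaultdict


def distribution_function(library, mz_values):
    """DF[mz][intensity] = number of library spectra with that intensity at mz.

    library: list of spectra, each a dict m/z -> integer intensity.
    A missing peak is counted as intensity 0 (an assumption of this sketch).
    """
    df = defaultdict(Counter)
    for spectrum in library:
        for mz in mz_values:
            df[mz][spectrum.get(mz, 0)] += 1
    return df


# Toy library of three spectra (hypothetical data)
library = [
    {41: 100, 43: 999},
    {43: 999, 57: 500},
    {41: 100},
]

df = distribution_function(library, mz_values=[41, 43, 57])
for mz in sorted(df):
    print(mz, dict(df[mz]))
```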

Page 18: Information-Theoretic Mass Spectral Library Search

Method – Distribution Function

[Figure: distribution function, intensity vs. m/z]

Page 19: Information-Theoretic Mass Spectral Library Search

Method – Normalized Distribution Function (NDF)

• Normalized Distribution Function
  – NDF[mz][int] = DF[mz][int] / ∑_i DF[mz][i]
  – where ∑_i DF[mz][i] = 163K (the number of compounds in the library)
  – NDF values are probabilities in the range [0, 1]
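The normalization can be sketched as dividing each count by the total for its m/z row. The small `df` table below is invented for the example (a four-compound library), and `normalize` is a hypothetical helper name.

```python
def normalize(df):
    """NDF[mz][intensity] = DF[mz][intensity] / sum over i of DF[mz][i]."""
    ndf = {}
    for mz, counts in df.items():
        total = sum(counts.values())
        if total == 0:
            continue  # no compounds counted at this m/z; nothing to normalize
        ndf[mz] = {intensity: n / total for intensity, n in counts.items()}
    return ndf


# Toy DF: counts per (m/z, intensity) for a library of four compounds
df = {
    43: {0: 1, 999: 3},
    57: {0: 2, 500: 2},
}

ndf = normalize(df)
print(ndf[43])
print(ndf[57])
```

Each row of the result sums to 1, so an entry can be read as the probability that a library compound has that intensity at that m/z.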

Page 20: Information-Theoretic Mass Spectral Library Search

Method – Assumptions

• Assumption
  – Each m/z is treated independently when computing the match factor from the normalized distribution function

Page 21: Information-Theoretic Mass Spectral Library Search

Method – Match Factor

Page 22: Information-Theoretic Mass Spectral Library Search

Results – Overview

Overview | Additive Noise | Multiplicative Noise | Johnson Colored Noise | Random Spectrum Noise

• Technique
  – Compound in library + noise
  – Search for the noisy compound in the library
• Evaluation metric: average rank
  – Rank = position of the correct compound in the hit list
  – Repeat the above 3000 times and take the average rank
• Compared with
  – NIST
  – NISTDOT (first term in the NIST algorithm)
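The evaluation metric can be sketched as follows, assuming each trial yields a score per library compound and that a higher score means a better match. The compound names, scores, and the helper names `rank_of` and `average_rank` are hypothetical; a real run would use 3000 trials rather than two.

```python
def rank_of(correct_id, scores):
    """1-based position of the correct compound in the hit list.

    scores: dict compound id -> match score; higher is assumed to be better.
    """
    hit_list = sorted(scores, key=scores.get, reverse=True)
    return hit_list.index(correct_id) + 1


def average_rank(trials):
    """trials: list of (correct_id, scores) pairs, one per noisy search."""
    ranks = [rank_of(correct_id, scores) for correct_id, scores in trials]
    return sum(ranks) / len(ranks)


trials = [
    ("decane", {"decane": 950, "nonane": 900}),                    # rank 1
    ("decane", {"decane": 800, "nonane": 900, "undecane": 700}),   # rank 2
]
print(average_rank(trials))
```

An average rank of 1 would mean the correct compound always tops the hit list, so lower is better.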

Page 23: Information-Theoretic Mass Spectral Library Search

Results – Noise models

• Additive: A_U = A_L + G(0, σ)
• Multiplicative: A_U = A_L + A_L * G(0, σ)
• Johnson Colored: A_U = A_L + G(0, σ * √m)
• Random spectrum: A_U = A_L + x * A_R

Page 24: Information-Theoretic Mass Spectral Library Search

Results – Additive Noise

• Compound = Compound + Additive noise
• Additive Gaussian noise
  – Zero mean
  – Variable standard deviation
• For each m/z in the library spectrum: A_U = A_L + G(0, σ)
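A minimal sketch of the additive model, applying an independent zero-mean Gaussian draw to each peak of a library spectrum. The function name, the toy spectrum, and the choice not to clip negative intensities are assumptions of this example.

```python
import random


def add_additive_noise(spectrum, sigma, rng):
    """A_U = A_L + G(0, sigma) for each m/z in the library spectrum.

    spectrum: dict m/z -> intensity. Results are not clipped at zero here.
    """
    return {mz: a + rng.gauss(0.0, sigma) for mz, a in spectrum.items()}


library_spectrum = {41: 300, 43: 999, 57: 650}
rng = random.Random(0)  # seeded so the run is repeatable
noisy = add_additive_noise(library_spectrum, sigma=10.0, rng=rng)
print(noisy)
```

Because the noise does not depend on peak height, small peaks are disturbed proportionally more than large ones.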

Page 25: Information-Theoretic Mass Spectral Library Search

Results – Additive Noise (Example)

[Figure: pure and noisy spectra, intensity (0–1200) vs. m/z (27–85); added noise intensity (−50 to 20) vs. m/z]

Page 26: Information-Theoretic Mass Spectral Library Search

Results – Additive Noise (Performance)

Page 27: Information-Theoretic Mass Spectral Library Search

Results – Multiplicative Noise

• Compound = Compound + Multiplicative noise
• Multiplicative Gaussian noise
  – Zero mean
  – Variable standard deviation
• For each m/z in the library spectrum: A_U = A_L + A_L * G(0, σ)
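A minimal sketch of the multiplicative model, in which the Gaussian draw is scaled by the peak's own intensity. The function name and toy spectrum are assumptions of this example.

```python
import random


def add_multiplicative_noise(spectrum, sigma, rng):
    """A_U = A_L + A_L * G(0, sigma) for each m/z in the library spectrum."""
    return {mz: a + a * rng.gauss(0.0, sigma) for mz, a in spectrum.items()}


library_spectrum = {41: 0, 43: 999, 57: 650}
rng = random.Random(0)  # seeded so the run is repeatable
noisy = add_multiplicative_noise(library_spectrum, sigma=0.05, rng=rng)
print(noisy)
```

Here σ acts as a relative error: a zero-intensity peak stays at zero, and the largest peaks receive the largest absolute perturbations.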

Page 28: Information-Theoretic Mass Spectral Library Search

Results – Multiplicative Noise (Example)

[Figure: added noise intensity (−200 to 100) vs. m/z; pure and noisy spectra, intensity (0–1200) vs. m/z (27–85)]

Page 29: Information-Theoretic Mass Spectral Library Search

Results – Multiplicative Noise (Performance)

Page 30: Information-Theoretic Mass Spectral Library Search

Results – Johnson Colored Noise

• Compound = Compound + Colored Noise
• Gaussian noise
  – Zero mean
  – Variable standard deviation
• For each m/z in the library spectrum: A_U = A_L + G(0, σ * √m)
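A minimal sketch of the colored model, reading m in the formula as the m/z value of the peak; that reading is an assumption, since the slide does not define m. The function name and toy spectrum are also hypothetical.

```python
import math
import random


def add_colored_noise(spectrum, sigma, rng):
    """A_U = A_L + G(0, sigma * sqrt(m)), with m taken to be the peak's m/z."""
    return {
        mz: a + rng.gauss(0.0, sigma * math.sqrt(mz))
        for mz, a in spectrum.items()
    }


library_spectrum = {25: 300, 43: 999, 100: 650}
rng = random.Random(0)  # seeded so the run is repeatable
noisy = add_colored_noise(library_spectrum, sigma=2.0, rng=rng)
print(noisy)
```

Under this reading the standard deviation grows with the square root of m/z, so a peak at m/z 100 sees twice the noise spread of a peak at m/z 25.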

Page 31: Information-Theoretic Mass Spectral Library Search

Results – Johnson Colored Noise (Example)

[Figure: added noise intensity (−50 to 40) vs. m/z; pure and noisy spectra, intensity (0–1200) vs. m/z (27–85)]

Page 32: Information-Theoretic Mass Spectral Library Search

Results – Johnson Colored Noise (Performance)

Page 33: Information-Theoretic Mass Spectral Library Search

Results – Random Spectrum Noise

• Compound = Compound + Random Spectrum
• Additive Spectrum
  – Add x% of another random spectrum
• For each m/z in the library or random spectrum: A_U = A_L + x * A_R
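A minimal sketch of the random-spectrum model, mixing a fraction x of a second spectrum into the library spectrum over the union of their m/z values. Here x is written as a fraction (0.5 for 50%), and the function name and both toy spectra are assumptions of this example; choosing the random spectrum from the library is left out.

```python
def add_random_spectrum(library_spectrum, random_spectrum, x):
    """A_U = A_L + x * A_R for each m/z present in either spectrum.

    A peak missing from one spectrum is treated as intensity 0 there.
    """
    masses = set(library_spectrum) | set(random_spectrum)
    return {
        mz: library_spectrum.get(mz, 0) + x * random_spectrum.get(mz, 0)
        for mz in masses
    }


library_spectrum = {41: 100, 43: 999}
other_spectrum = {43: 500, 57: 200}

mixed = add_random_spectrum(library_spectrum, other_spectrum, x=0.5)
print(sorted(mixed.items()))
```

Unlike the Gaussian models, this one can introduce peaks at m/z values that the library spectrum does not contain, as at m/z 57 in the toy data.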

Page 34: Information-Theoretic Mass Spectral Library Search

Results – Random Spectrum Noise (Example)

[Figure: added noise intensity (0–25) vs. m/z; pure and noisy spectra, intensity (0–1200) vs. m/z (27–85)]

Page 35: Information-Theoretic Mass Spectral Library Search

Results – Random Spectrum Noise (Performance)

Page 36: Information-Theoretic Mass Spectral Library Search

Results – Summary of Noise Models

• Additive: A_U = A_L + G(0, σ)
• Multiplicative: A_U = A_L + A_L * G(0, σ)
• Johnson Colored: A_U = A_L + G(0, σ * √m)
• Random Spectrum: A_U = A_L + x * A_R

Page 37: Information-Theoretic Mass Spectral Library Search

Results – Summary of Noise Models

[Figure: intensity (−200 to 100) vs. m/z (27–85); series: Additive, Multiplicative, Johnson, Random]

Page 38: Information-Theoretic Mass Spectral Library Search

Results – Summary of Noise Models

[Figure: intensity (0–1200) vs. m/z (27–85); series: Additive, Multiplicative, Johnson, Random]

Page 39: Information-Theoretic Mass Spectral Library Search

Conclusion

• MS library search algorithm
• Information theoretic
  – Domain knowledge incorporated
• Algorithm works well for various noise models
• Future work
  – Must improve performance for the random spectrum noise case

Page 40: Information-Theoretic Mass Spectral Library Search

Questions & Suggestions