Generalized Protein Parsimony and Spectral Counting for Functional Enrichment Analysis

Nathan Edwards
Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center


Page 1: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Generalized Protein Parsimony and Spectral Counting for Functional Enrichment Analysis

Nathan Edwards
Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center

Page 2: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Systems Biology

Structured High-Throughput Experiments

Knowledge Databases

Page 3: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Systems Biology

molecular biology ↕ phenotype

Knowledge Databases
  • Localization
  • Function
  • Process
  • Interactions
  • Pathway
  • Mutation

Structured High-Throughput Experiments
  • Proteomics
  • Sequencing
  • Microarrays
  • Metabolomics

molecular biology ↕ biology

Page 4: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Systems Biology

molecular biology ↕ phenotype

Mathematical Models

Knowledge Databases
  • Localization
  • Function
  • Process
  • Interactions
  • Pathway
  • Mutation

Structured High-Throughput Experiments
  • Proteomics
  • Sequencing
  • Microarrays
  • Metabolomics

molecular biology ↕ biology

Page 5: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Systems Biology

molecular biology ↕ phenotype

Mathematical Models

Knowledge Databases
  • Localization
  • Function
  • Process
  • Interactions
  • Pathway
  • Mutation

Structured High-Throughput Experiments
  • Proteomics
  • Sequencing
  • Microarrays
  • Metabolomics

molecular biology ↕ biology

Functional Annotation Enrichment

Page 8: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Functional Annotation Enrichment

Draw 10 of 50!

  • In any draw, we expect: ~5 "evens", ~2 "≤ 10", etc.
  • Each ball is equally likely
  • Balls are independent
  • p-value is surprise!
  • For transcriptomics:
      Genes ↔ Balls
      Genome ↔ Tumbler
      Diff. Expr. ↔ Draw
      Annotation ↔ "evens", …
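The urn picture on this slide can be sketched with a hypergeometric calculation. The block below is a minimal illustration, assuming balls numbered 1 to 50 (so 25 "evens" and 10 balls "≤ 10"); the function name `upper_tail` and the example draw with 9 even balls are hypothetical and not taken from the talk.

```python
from math import comb

def upper_tail(k, N, K, n):
    """P(X >= k) when n balls are drawn without replacement from N, K of them annotated."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)

N, n = 50, 10            # tumbler of 50 balls, draw 10
evens = 25               # balls 2, 4, ..., 50 (assumed numbering 1..50)
at_most_10 = 10          # balls 1, 2, ..., 10

# Expected number of annotated balls in a draw (mean of the hypergeometric).
expected_evens = n * evens / N
expected_at_most_10 = n * at_most_10 / N

# Hypothetical draw: 9 of the 10 balls are even. The p-value is the
# probability of a draw at least this surprising.
p_value = upper_tail(9, N, evens, n)
print(expected_evens, expected_at_most_10, p_value)
```

The two expectations correspond to the "~5 evens" and "~2 ≤ 10" on the slide; the p-value is only meaningful when each ball is equally likely and the balls are independent, which is the point the following slides question for proteomics.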

Page 9: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Why not in proteomics?

  • Double counting and false positives…
      …due to traditional protein inference
  • Proteomics cannot see all proteins…
      …proteins are not equally likely to be drawn
  • Good relative abundance is hard…
      …extra chemistries, workflows, and software
      …missing values are particularly problematic

Page 10: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

In proteomics…

  • Double counting and false positives…
      Use generalized protein parsimony
  • Proteomics cannot see all proteins…
      Use identified proteins as background
  • Good relative abundance is hard…
      Model differential spectral counts directly

Page 11: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Ignore some PSMs

  • FDR filtering leaves some false PSMs
  • Enforce strict protein inference criteria
  • Leave some PSMs uncovered

[Figure: Proteins and PSMs, labeled 10%]

Page 12: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Ignore some PSMs

  • FDR filtering leaves some false PSMs
  • Enforce strict protein inference criteria
  • Leave some PSMs uncovered

[Figure: Proteins and PSMs, labeled 90%]

Page 13: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Match uncovered PSMs to FDR
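One way to picture "leave some PSMs uncovered" and "match uncovered PSMs to FDR" is a greedy cover with a k-unique rule. The sketch below is an illustration under stated assumptions, not the method used in the talk: the greedy heuristic, the function names `greedy_parsimony` and `pick_k`, the rule "choose the k whose uncovered fraction is closest to the FDR", and the toy data are all hypothetical.

```python
def greedy_parsimony(protein_psms, k):
    """Greedily choose proteins that each explain >= k PSMs not yet covered."""
    covered, chosen = set(), []
    candidates = {prot: set(psms) for prot, psms in protein_psms.items()}
    while candidates:
        # Protein adding the most new PSMs; sorted() makes ties deterministic.
        best = max(sorted(candidates), key=lambda p: len(candidates[p] - covered))
        if len(candidates[best] - covered) < k:
            break  # no remaining protein meets the k-unique criterion
        covered |= candidates.pop(best)
        chosen.append(best)
    return chosen, covered

def pick_k(protein_psms, target_fdr, max_k=5):
    """Return the k whose uncovered-PSM fraction is closest to target_fdr (assumed rule)."""
    all_psms = set().union(*protein_psms.values())

    def uncovered_fraction(k):
        _, covered = greedy_parsimony(protein_psms, k)
        return 1 - len(covered) / len(all_psms)

    return min(range(1, max_k + 1), key=lambda k: abs(uncovered_fraction(k) - target_fdr))

# Hypothetical toy input: protein -> identifiers of the PSMs it could explain.
toy = {
    "A": {1, 2, 3, 4, 5, 6},
    "B": {6, 7, 8},
    "C": {2, 3},   # fully explained by A, so never needed
    "D": {9},      # supported by a single PSM
    "E": {10},     # supported by a single PSM
}
chosen, covered = greedy_parsimony(toy, k=2)
print(chosen, sorted(set(range(1, 11)) - covered), pick_k(toy, target_fdr=0.2))
```

With k=2 the single-PSM proteins D and E are dropped and their PSMs stay uncovered, which is the behavior the slides describe: stricter inference criteria trade a few unexplained PSMs for fewer false-positive proteins.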


Page 14: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Plasma membrane enrichment

  • Pellicle enrichment of plasma membrane
      Choksawangkarn et al. JPR 2013 (Fenselau Lab)
  • Six replicate LC-MS/MS analyses each
      Cell-lysate (44,861 MS/MS)
      Fe3O4-Al2O3 pellicle (21,871 MS/MS)
  • 625 3-unique proteins to match 10% FDR:
      Lysate: 18,976 PSMs; Pellicle: 13,723 PSMs
  • 89 proteins with significantly (< 10^-5) increased counts

Page 15: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Plasma membrane enrichment

  • Na/K+ ATPase subunit alpha-1 (P05023):
      Lysate: 1; Pellicle: 90; p-value: 5.2 x 10^-33
  • Transferrin receptor protein 1 (P02786):
      Lysate: 17; Pellicle: 63; p-value: 2.0 x 10^-11
  • DAVID Bioinformatics analysis (89/625):
      Plasma membrane (GO:0005886): 29 (5.2 x 10^-5)
      Transmembrane (SwissProtKW): 24 (1.3 x 10^-6)
  • Transmembrane (SwissProtKW):
      Lysate: 524; Pellicle: 1335; p-value: 2.6 x 10^-158
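A differential spectral count test of this kind can be sketched as a hypergeometric upper tail, the model named in the talk's summary. The block below assumes the background totals are the 18,976 lysate and 13,723 pellicle PSMs reported for the 625 proteins and that the test is one-sided; both are assumptions, as is the function name `increased_count_pvalue`. Whether it reproduces the p-values on the slide depends on the exact model and totals the author used.

```python
from math import comb

def increased_count_pvalue(count_a, count_b, total_a, total_b):
    """P(at least count_b of a protein's PSMs fall in sample B) under a hypergeometric null."""
    N = total_a + total_b      # all PSMs in both samples
    K = total_b                # PSMs belonging to sample B
    n = count_a + count_b      # PSMs assigned to this protein
    # Sum exact integer numerators first, then divide once to limit rounding error.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(count_b, min(K, n) + 1))
    return tail / comb(N, n)

LYSATE_PSMS, PELLICLE_PSMS = 18976, 13723   # totals reported on the previous slide

# Spectral counts (lysate, pellicle) for the two proteins on this slide.
p_atpase = increased_count_pvalue(1, 90, LYSATE_PSMS, PELLICLE_PSMS)
p_tfr = increased_count_pvalue(17, 63, LYSATE_PSMS, PELLICLE_PSMS)
print(f"P05023: {p_atpase:.1e}  P02786: {p_tfr:.1e}")
```

The same function applies to an annotation category by summing the counts of all proteins carrying that annotation, as in the Transmembrane row above.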


Page 16: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

A protein's PSMs rise and fall together!


Page 17: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

A protein's PSMs rise and fall together?


Page 18: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Anomalies indicate proteoforms
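If a protein's PSMs rise and fall together, each peptide's counts should split between the two samples in roughly the same proportion as the protein's total. The sketch below flags peptides that depart from that proportion with a per-peptide hypergeometric test. It is a simplified stand-in for the multivariate-hypergeometric model mentioned in the summary; the doubling rule for a two-sided p-value, the 0.01 threshold, the function name, and the toy counts are all hypothetical.

```python
from math import comb

def peptide_split_pvalue(pep_a, pep_b, prot_a, prot_b):
    """Two-sided (doubled one-tail) hypergeometric p-value for one peptide's A/B split."""
    N = prot_a + prot_b    # all PSMs of the protein
    K = prot_a             # the protein's PSMs in sample A
    n = pep_a + pep_b      # PSMs of this peptide
    denom = comb(N, n)

    def pmf(i):
        # comb() returns 0 for impossible splits, so no range clipping is needed.
        return comb(K, i) * comb(N - K, n - i) / denom

    lower = sum(pmf(i) for i in range(0, pep_a + 1))
    upper = sum(pmf(i) for i in range(pep_a, n + 1))
    return min(1.0, 2 * min(lower, upper))

# Hypothetical spectral counts per peptide: (sample A, sample B).
toy = {"pep1": (10, 12), "pep2": (9, 11), "pep3": (0, 15)}
prot_a = sum(a for a, _ in toy.values())
prot_b = sum(b for _, b in toy.values())

flagged = [pep for pep, (a, b) in toy.items()
           if peptide_split_pvalue(a, b, prot_a, prot_b) < 0.01]
print(flagged)
```

A flagged peptide is only a candidate: a different proteoform (isoform, modification, cleavage product) is one explanation, and a shared or misassigned peptide is another.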


Page 19: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Nascent polypeptide-associated complex subunit alpha

7.3 x 10^-8

Page 20: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Pyruvate kinase isozymes M1/M2

2.5 x 10^-5

Page 21: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Summary

  • Functional annotation enrichment for proteomics too:
      Careful counting (generalized parsimony)
      Differential abundance by spectral counts
  • Use (multivariate-)hypergeometric model for
      Differential abundance by spectral counts
      Proteoform detection

Page 22: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

HER2/Neu Mouse Model of Breast Cancer

  • Paulovich, et al. JPR, 2007
  • Study of normal and tumor mammary tissue by LC-MS/MS
      1.4 million MS/MS spectra
  • Peptide-spectrum assignments
      Normal samples (Nn): 161,286 (49.7%)
      Tumor samples (Nt): 163,068 (50.3%)
  • 4270 proteins identified in total
      2-unique generalized protein parsimony

Page 23: Generalized  Protein  Parsimony and  Spectral  Counting  for  Functional Enrichment Analysis

Distribution of p-values (Yeast)
