1 Proteogenomic Novelty in 105 TCGA Breast Tumors Karl Clauser CPTAC Breast Cancer Analysis Group Broad Institute of MIT and Harvard Fred Hutchinson Cancer Research Center Washington University New York University CPTAC Data Jamboree April 16, 2014 National Institutes of Health Bethesda, Maryland

Page 1

Proteogenomic Novelty in 105 TCGA Breast Tumors

Karl Clauser

CPTAC Breast Cancer Analysis Group

Broad Institute of MIT and Harvard

Fred Hutchinson Cancer Research Center

Washington University

New York University

CPTAC Data Jamboree

April 16, 2014

National Institutes of Health

Bethesda, Maryland

Page 2

Tumor-specific protein databases for MS/MS spectra searches

Kelly Ruggles, David Fenyo, NYU

Page 3

QUILTS: Treatment of different variant types

[Figure: schematic of how QUILTS treats each variant type. Categories shown: Variants; Unannotated Alternative Splicing; Partially Novel Splicing (novel downstream: 1 frame translation; novel upstream: 6 frame translation); Completely Novel Expression; Fusion Genes. Translation labels: 1 frame translation, 6 frame translation. Database labels: in variants db; in frameshifts db; in alternates/frameshifts; in other db.]

Page 4

Type                                          Genome    Proteome36   P/G     Proteome27
Variants (Nonsynonymous) ∑                    119,977   3,247        2.71%   3,028
  Germline                                    91,944    2,138        2.33%   1,930
  Somatic                                     9,607     88           0.92%   85
  Germline & Somatic                          18,426    1,021        5.54%   1,013
Alternative splicing (junction-spanning)      36,195    197          0.54%   279
Frameshifts (novel exon splicing 1 side)      20,240    82           0.41%
  Truncation (frameshift overlap)             3,671     22           0.60%
  Novel exon insertion (insert overlap)       4,643     11           0.24%
  Partial exon deletion (junction-spanning)   11,913    49           0.41%
Novel exon splicing (2 sides)
Fusion genes (junction-spanning)
Completely novel gene

Proteogenomic mapping: Genetic alterations can be observed on protein level (105 tumors)

Work in progress

• Low thresholds applied to Genome calls (>1 read RNA-seq, >2 QUAL phred-scaled Variants)
• High thresholds applied to Proteome calls (<0.1% FDR)
• 0.2-2.7% of frameshifts, alternative splices & single AA variants observable by proteomics
• mRNA may not be translated or at low abundance
• Proteome coverage is incomplete
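The P/G percentages on this slide are the proteome count divided by the genome count for each feature type. A minimal Python sketch of that arithmetic follows, using counts copied from the slide (Genome and Proteome36 columns). The dictionary layout and the names `COUNTS`, `observation_rate`, and `RATES` are illustrative assumptions, not part of the analysis pipeline.

```python
# Counts copied from the slide: (genome calls, proteome observations in 36 experiments).
COUNTS = {
    "Variants (nonsynonymous)": (119_977, 3_247),
    "Germline": (91_944, 2_138),
    "Somatic": (9_607, 88),
    "Germline & Somatic": (18_426, 1_021),
    "Alternative splicing": (36_195, 197),
    "Frameshifts": (20_240, 82),
    "Truncation": (3_671, 22),
    "Novel exon insertion": (4_643, 11),
    "Partial exon deletion": (11_913, 49),
}


def observation_rate(genome: int, proteome: int) -> float:
    """Return the proteome/genome ratio as a percentage."""
    return 100.0 * proteome / genome


# Feature type -> percentage of genomic calls with a proteomic observation.
RATES = {name: observation_rate(g, p) for name, (g, p) in COUNTS.items()}

if __name__ == "__main__":
    for name, rate in RATES.items():
        print(f"{name:28s} {rate:5.2f}%")
```

Computing the rates from the raw counts makes it easy to re-derive the column if the genomic thresholds, and therefore the denominators, change.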


Page 5

Global proteome and phosphoproteome discovery workflow for TCGA breast tumors

1 mg total protein per tumor
Internal reference: equal representation of basal, Her2 and Luminal A/B subtypes

Page 6

Serial Search Strategy with Personalized Databases

> Canonical Protein
SIGNALINGPATHWAYREGULATOR

25,776,160 Spectra (105 patients) (36 iTRAQ experiments) (25 LC-MS/MS runs / experiment)

RefSeq-Hs-7/2013: 31,852

11,328,955 Matched Spectra (44% of total) (1% FDR)

3247 Variants Matched

197 Splice Junctions Matched

14,447,205 Leftover Spectra

• Concatenated FASTA files, 105 patients
• Altered proteins
• Removed redundant entries

> Canonical – Variant Patient 1
SIGNALINGPATHWAHREGULATOR
> Canonical Protein – Variant Patient 2
SIKNALINGPATHWAYREGULATOR

Variants: 133,241

> Canonical – Alternate splice Patient 1
SIGNALINGREGULATOR
> Canonical – Alternate splice Patient 2
SIGNALINGPATHREGULATOR

Alternate Spliceforms: 67,853

Low confidence thresholds for Genome calls
• Variants: >2 QUAL score (phred-scaled)
• Alternative splices, frameshifts: >1 read

Concatenated: 252,890

> Canonical – Truncation Patient 1
SIGNALINGPATFRAMESHIF
> Canonical – Novel Exon Insert Patient 2
SIGNALINGPATHWAYINSERTREGULATOR
> Canonical – Partial Exon Deletion Patient 3
SIGNALINGPATHWAYULATOR

Frameshifts: 19,944

22 Truncation Overlaps Matched

11 Insertion Overlaps Matched

49 Deletion Junctions Matched

High confidence for Proteome IDs
• <0.1% FDR peptide spectrum match
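The database construction on this slide (concatenate the per-patient FASTA entries, then remove redundant ones) can be sketched as follows. This is a minimal illustration that assumes "redundant" means an identical protein sequence; the slide does not state the exact rule. The function name `merge_nonredundant` and the entry "Variant Patient 2b" are hypothetical. The sequences are the toy examples from the slide, and the size and spectra counts are copied from it.

```python
from typing import Dict, Iterable, Tuple

FastaEntry = Tuple[str, str]  # (header, sequence)


def merge_nonredundant(databases: Iterable[Iterable[FastaEntry]]) -> Dict[str, str]:
    """Concatenate entries, keeping the first header seen for each distinct sequence."""
    merged: Dict[str, str] = {}  # sequence -> header
    for database in databases:
        for header, sequence in database:
            merged.setdefault(sequence, header)
    return merged


canonical = [("Canonical Protein", "SIGNALINGPATHWAYREGULATOR")]
patient1 = [
    ("Canonical - Variant Patient 1", "SIGNALINGPATHWAHREGULATOR"),
    ("Canonical - Alternate splice Patient 1", "SIGNALINGREGULATOR"),
]
patient2 = [
    ("Canonical - Variant Patient 2", "SIKNALINGPATHWAYREGULATOR"),
    # Hypothetical duplicate: patient 2 also carries patient 1's variant.
    ("Canonical - Variant Patient 2b", "SIGNALINGPATHWAHREGULATOR"),
]

merged = merge_nonredundant([canonical, patient1, patient2])

# Component database sizes and spectra counts as reported on the slide.
db_sizes = {"RefSeq": 31_852, "Variants": 133_241, "Alternate spliceforms": 67_853, "Frameshifts": 19_944}
concatenated_size = sum(db_sizes.values())
leftover_spectra = 25_776_160 - 11_328_955
```

Keying on the sequence rather than the header means a variant shared by several patients is searched only once, which keeps the concatenated search space from growing with every patient.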

Page 7

Frequency of Single AA Variants, Alternative Splices, Frameshifts Across Patients

[Figure: # Features (log scale, 1 to 100,000) vs. # Patients with Feature (1 to 101). Series: Germline Variants, Somatic Variants, Alternative Splices, Frameshifts. Annotation: "very common".]

• Somatic variants are less frequent than germline variants
• Some germline variants are very common
– Rare germline variants present in RefSeq
• Some alternative splice forms and frameshifts are very common
– Should be in RefSeq

Genome & Transcriptome Data

Page 8

How many RNA-seq reads to yield a proteomics observation of an alternate splice or frameshift?

1 experiment: 3 individual patients + 1 Common control (40 patients)

[Figure, two panels (82 Frameshifts; 197 Alternative splices). Panel 1: Max # Reads for Frameshift Transcript (log scale, 1 to 1000) vs. # Proteomics Experiments with Frameshift Peptide (0 to 35). Panel 2: Max # Reads for Splice Transcript (log scale, 1 to 1000) vs. # Proteomics Experiments with Splice Peptide (0 to 35). Panel annotations as extracted: "Max #Reads 17 observed in >1 Expmt" and "Max #Reads 19 observed in >1 Expmt".]

Page 9

Frameshift Truncation: ras-Related protein Rab-15
Observed only in Proteomics Exp 3

E159

Max RNA-Seq Reads: 1
Present in only 1 Common control member

Page 10

Frameshift Truncation: Cysteine-rich protein 1
Observed in 9 Proteomics Experiments

E159

Max RNA-Seq Reads: 1
Present in only 1 Common control member

Page 11

Frameshift Truncation: Cullin-2 isoform a
Observed in 3 Proteomics Experiments

E159

Max RNA-Seq Reads: 1
Present in only 1 Common control member

Page 12

Many missing observations even when transcript present in many common control members

1 experiment: 3 individual patients + 1 Common control (40 patients)

[Figure, two panels. Alternative splices: # Patients in Common Control with AS Transcript (0 to 40) vs. # of Proteomics Experiments with Splice Peptide (0 to 35). Frameshifts: # Patients in Common Control with Frameshift Transcript (0 to 40) vs. # Proteomics Experiments with Frameshift Peptide (0 to 35).]

Page 13

Majority of Alternative Splice Junctions and Frameshifts observed in >1 Proteomics Experiment

1 experiment: 3 individual patients + 1 Common control (40 patients)

[Figure, two histograms. Frameshifts: # Frameshift Peptides (0 to 50) vs. # Proteomics Experiments with Frameshift Peptide (1 to 34). Alternative splices: # Alternative Splice Junction Peptides (0 to 50) vs. # Proteomics Experiments with Splice Junction Peptide (1 to 34). Individual bar labels are not recoverable from the extraction.]

Alternative splices: 150/197 observed in >1 experiment
Frameshifts: 44/82 observed in >1 experiment
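The "observed in >1 experiment" fractions can be derived from a per-peptide record of which experiments identified it. The sketch below is illustrative only. The `toy` input and the function names are hypothetical; only the totals 150/197 and 44/82 come from the slide.

```python
from collections import Counter
from typing import Dict, Set


def experiments_histogram(observations: Dict[str, Set[int]]) -> Counter:
    """Map 'number of experiments' -> 'number of peptides seen in that many experiments'."""
    return Counter(len(experiments) for experiments in observations.values())


def recurrent_fraction(histogram: Counter) -> float:
    """Fraction of peptides observed in more than one experiment."""
    total = sum(histogram.values())
    recurrent = sum(count for n_expts, count in histogram.items() if n_expts > 1)
    return recurrent / total


# Hypothetical toy input: peptide label -> experiments (1..36) in which it was identified.
toy = {"pepA": {3}, "pepB": {1, 2, 9}, "pepC": {5, 6}, "pepD": {36}}
toy_histogram = experiments_histogram(toy)
toy_fraction = recurrent_fraction(toy_histogram)

# Totals reported on the slide.
splice_fraction = 150 / 197      # alternative splice junction peptides
frameshift_fraction = 44 / 82    # frameshift peptides
```

With the slide's totals, roughly three quarters of the splice-junction peptides and just over half of the frameshift peptides recur across experiments.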


Page 14

Next steps:

• Examine “other” category
– Fusion genes (junction-spanning)
– Novel exon splicing (2 sides)
– Completely novel gene
• Use updated somatic variants from QUILTS
• Define genomic data thresholds suitable for proteomic observations
– RNA-seq: Min read count
– Variant calling: phred-scaled QUAL score
– Sort out Germline/Somatic variant call mix status across patients
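Once thresholds are chosen, applying them is a simple filter over the genomic calls. The sketch below is a hypothetical illustration. The `GenomicCall` record, its field names, and the example calls are assumptions. The defaults mirror the current low thresholds quoted in this deck (>2 QUAL, >1 read), read literally as strict comparisons; whether the pipeline treats them as inclusive is not stated.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GenomicCall:
    feature_id: str    # hypothetical identifier
    kind: str          # "variant", "splice", or "frameshift"
    qual: float = 0.0  # phred-scaled QUAL score, used for variants
    reads: int = 0     # RNA-seq read count, used for splices and frameshifts


def passes_thresholds(call: GenomicCall, min_qual: float = 2.0, min_reads: int = 1) -> bool:
    """Apply the QUAL threshold to variants and the read threshold to splices/frameshifts."""
    if call.kind == "variant":
        return call.qual > min_qual
    if call.kind in ("splice", "frameshift"):
        return call.reads > min_reads
    raise ValueError(f"unknown call kind: {call.kind!r}")


def filter_calls(calls: Iterable[GenomicCall], min_qual: float = 2.0, min_reads: int = 1) -> List[GenomicCall]:
    """Keep only the calls that pass the thresholds."""
    return [call for call in calls if passes_thresholds(call, min_qual, min_reads)]


# Hypothetical example calls.
calls = [
    GenomicCall("var1", "variant", qual=35.0),
    GenomicCall("var2", "variant", qual=1.5),
    GenomicCall("splice1", "splice", reads=12),
    GenomicCall("fs1", "frameshift", reads=1),
]
kept_low = [call.feature_id for call in filter_calls(calls)]
kept_strict = [call.feature_id for call in filter_calls(calls, min_qual=30.0, min_reads=20)]
```

Keeping the thresholds as parameters makes it straightforward to sweep them and see how the proteomic observation rate changes as the genomic calls become more stringent.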

Page 15

Summary of Proteome Re-processing
105 TCGA patients - 36 iTRAQ experiments

Metric                                 initial      re-processed
MS/MS spectra Identified               10,264,670   11,232,970
PIP <50 Spectra (%)                    -            7.77
Isobaric Labeled Spectra (%)           100.00       97.40
Isobaric Fully Labeled Spectra (%)     100.00       92.30
Isobaric No Label Spectra (%)          0.00         2.60
Isobaric Only Lys Label Spectra (%)    0.00         4.95
FDR Spectra (%)                        0.87         0.89
FDR Distinct Peptide (%)               5.46         6.00

Page 16

Changes in Re-processing of TCGA data

Extraction
• Centroiding: use Xcalibur instead of SM.
– iTRAQ ratios are little changed,
– intensities lower by ~5x (will more closely match NIST central analysis pipeline)
• Precursor MH+ range expanded from 750-4000 to 750-6000.

Searches
• Replace database with RefSeq version used as reference for the personalized database generation.
– database content/size very similar,
– protein identifiers change from gi numbers to RefSeq numbers.
• Allowed modifications will be expanded. Increases the # of identified spectra by ~10%.
– From Full iTRAQ, M-ox, N-deam, q-pyro
– To iTRAQ-Full-Lys-only, M-ox, N-deam, q-pyro, c-pyro, Ac-nTermProt

Autovalidation
• Proteome initial processing, peptide FDR per experiment: 1.1-1.4%,
– but overall peptide FDR across all 36 experiments: ~5.5%
• Phosphoproteome initial processing, peptide FDR per experiment: 1.6-2.1%
– but overall peptide FDR across all 36 experiments: ~7.2%.
• Changes will seek to bring the overall peptide FDRs down to ~1%
– require multiple observations (protein, P-site) across experiments
– raise score thresholds

Quantitation
• Will use PIP (precursor ion purity) filtering to exclude from quantitation but not identification.
– Requiring PIP > 50% excludes ~7.8% of spectra.
– Filtering reduces standard deviations on protein & phosphosite level iTRAQ ratios
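One way to see why a peptide FDR of roughly 1% per experiment can grow to several percent across all experiments, and why requiring multiple observations helps, is a toy model in which true peptides recur across experiments while false matches are mostly unique to one experiment. The sketch below is a deliberately extreme, hypothetical illustration of that assumption. The peptide labels, counts, and ground-truth false-hit set are invented and are not the CPTAC data or the autovalidation procedure itself.

```python
from collections import Counter
from typing import List, Set


def distinct_peptide_fdr(peptides: Set[str], false_hits: Set[str]) -> float:
    """Fraction of distinct peptides that are (known, in this toy) false identifications."""
    return len(peptides & false_hits) / len(peptides)


# Toy model: 4 experiments share the same 99 true peptides,
# and each adds 1 false peptide that no other experiment reports.
true_peptides = {f"TRUE{i}" for i in range(99)}
experiments: List[Set[str]] = [true_peptides | {f"FALSE{e}"} for e in range(4)]
false_hits = {f"FALSE{e}" for e in range(4)}

# Per experiment: 1 false out of 100 distinct peptides.
per_experiment = [distinct_peptide_fdr(expt, false_hits) for expt in experiments]

# Across experiments: true peptides collapse to 99, false peptides accumulate to 4.
union = set().union(*experiments)
overall = distinct_peptide_fdr(union, false_hits)

# Requiring an observation in at least 2 experiments drops the unique false peptides.
counts = Counter(peptide for expt in experiments for peptide in expt)
recurrent = {peptide for peptide, n in counts.items() if n >= 2}
filtered = distinct_peptide_fdr(recurrent, false_hits)
```

In real data some false matches do repeat and some true peptides are seen only once, so a recurrence filter trades sensitivity for specificity rather than removing all false identifications.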

Page 17

Y Chromosome Frameshift - CD99 antigen
Observed in 36 Proteomics Experiments

Partial exon deletion splice, plus frameshift truncation

E159

Max RNA-Seq Reads: 12

Transcript present in 18/40 Common Control Members

Page 18

Acknowledgments

Washington U./MD Anderson/NYU
- Sherri Davies
- Matthew Ellis
- David Fenyo
- Kelly Ruggles
- Reid Townsend
- Li Ding

Broad Institute/FHCRC
- Steve Carr
- Karl Clauser
- Michael Gillette
- Jana Qiao
- Philipp Mertins
- DR Mani
- Eric Kuhn
- Sue Abbatiello
- Amanda Paulovich
- Pei Wang
- Sean Wang
- Ping Yan

NCI Staff
- Emily Boja
- Mehdi Mesri
- Rob Rivers
- Chris Kinsinger
- Henry Rodriguez

Funding
- National Cancer Institute

Page 19

Single AA Variants may be Somatic in Some Patients, Germline in Others

Genomic

Proteomic

• Highly Interesting, should correlate with prognosis and/or subtype.

• May correlate with prognosis?• Might as well be canonical isoforms?

• Detectable, but too rare to indicate biology.

Variant Type                 Gen       Prot    P/G
Germline Only >1 patient     34,022    1,226   3.6%
Germline Only 1 patient      57,922    704     1.2%
G&S mix                      18,426    1,013   5.5%
Somatic Only >1 patient      270       3       1.1%
Somatic Only 1 patient       9,337     82      0.9%
Total                        119,977   3,028   2.5%

• G&S mix genomic variants have the highest observation rate by Proteomics.

• Genomic variants present in only a single patient are observable by Proteomics

81 Patients, Nov 2013

Page 20

Not all Germline & Somatic mix Single AA Variants are “Essentially” Germline

• Is G&S mix status primarily an artifact of variant calling accuracy/sensitivity?
• Is there some cancer biology involved for high S/G ratio variants?
• Are patients with germline form more cancer prone?
• Does somatic form correlate with prognosis, development of drug-resistance?

Genomic Proteomic

81 Patients, Nov 2013

Page 21

Wide Range of Somatic Single AA Variants/Patient

[Figure: per-patient counts (log scale, 10 to 100,000) of Germline Variants, Somatic Variants, and Alternative Splices. Patients labeled on the x-axis: D8-A13Y, A7-A0CJ, A2-A0YM, E2-A10A, AR-A0TV, AN-A0AL, BH-A0BZ, A8-A09I, BH-A18N, BH-A0C0, AO-A12B, A8-A08G, AR-A0U4, AR-A1AW, BH-A0DG, A8-A08Z, A2-A0EX, A2-A0T2.]

Low confidence thresholds applied to calls
• Variants: >2 QUAL score (phred-scaled)
• Alternative splices: >1 read