Proteogenomic Novelty in 105 TCGA Breast Tumors
Karl Clauser
CPTAC Breast Cancer Analysis Group
Broad Institute of MIT and Harvard
Fred Hutchinson Cancer Research Center
Washington University
New York University
CPTAC Data Jamboree
April 16, 2014
National Institutes of Health
Bethesda, Maryland
Tumor-specific protein databases for MS/MS-spectra searches
Kelly Ruggles, David Fenyo, NYU
QUILTS: Treatment of different variant types
[Figure: QUILTS handling by variant type. Variant types: Unannotated Alternative Splicing, Partially Novel Splicing, Completely Novel Expression, Fusion Genes, Variants. Diagram labels: Novel; Novel downstream: 1 frame translation; Novel upstream: 6 frame translation. Output database labels: In frameshifts db; In alternates / frameshifts; In variants db; In other db.]
Type                                          Genome    Proteome36   P/G      Proteome27
Variants (Nonsynonymous)                      119,977   3,247        2.71%    ∑ 3,028
  Germline                                     91,944   2,138        2.33%    1,930
  Somatic                                       9,607      88        0.92%       85
  Germline & Somatic                           18,426   1,021        5.54%    1,013
Alternative splicing (junction-spanning)       36,195     197        0.54%      279
Frameshifts (novel exon splicing 1 side)       20,240      82        0.41%
  Truncation (frameshift overlap)               3,671      22        0.60%
  Novel exon insertion (insert overlap)         4,643      11        0.24%
  Partial exon deletion (junction-spanning)    11,913      49        0.41%
Novel exon splicing (2 sides)
Fusion genes (junction-spanning)
Completely novel gene
Proteogenomic mapping: Genetic alterations can be observed on protein level (105 tumors)
(Work in progress)
• Low thresholds applied to Genome calls (>1 read RNA-seq, >2 QUAL phred-scaled Variants)
• High thresholds applied to Proteome calls (<0.1% FDR)
• 0.2-2.7% of frameshifts, alternative splices & single AA variants observable by proteomics
  – mRNA may not be translated or at low abundance
  – Proteome coverage is incomplete
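The P/G column is the proteome count divided by the genome count. A minimal Python sketch that recomputes it from the Genome and Proteome36 counts in the table; the dictionary layout and the function name `observation_rate` are illustrative choices, not part of the original analysis:

```python
# Genome and Proteome36 counts copied from the table; the names are illustrative.
counts = {
    "Variants (nonsynonymous)": (119_977, 3_247),
    "Germline": (91_944, 2_138),
    "Somatic": (9_607, 88),
    "Germline & Somatic": (18_426, 1_021),
    "Alternative splicing": (36_195, 197),
    "Frameshifts": (20_240, 82),
    "Truncation": (3_671, 22),
    "Novel exon insertion": (4_643, 11),
    "Partial exon deletion": (11_913, 49),
}


def observation_rate(genome: int, proteome: int) -> float:
    """Percentage of genome-level calls that have a proteome-level match."""
    return round(100.0 * proteome / genome, 2)


rates = {name: observation_rate(g, p) for name, (g, p) in counts.items()}

for name, rate in rates.items():
    print(f"{name:28s} {rate:5.2f}%")
```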
Global proteome and phosphoproteome discovery workflow for TCGA breast tumors
1 mg total protein per tumor
Internal reference: equal representation of basal, Her2 and Luminal A/B subtypes
Serial Search Strategy with Personalized Databases
> Canonical Protein
SIGNALINGPATHWAYREGULATOR
25,776,160 Spectra (105 patients) (36 iTRAQ experiments) (25 LC-MS/MS runs / experiment)
RefSeq-Hs-7/2013: 31,852
11,328,955 Matched Spectra (44% of total) (1% FDR)
3247 Variants Matched
197 Splice Junctions Matched
14,447,205 Leftover Spectra
• Concatenated FASTA files, 105 patients
• Altered proteins
• Removed redundant entries
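One way to picture the concatenate-and-deduplicate step is the sketch below. It assumes each patient database is a list of (header, sequence) pairs and treats two entries as redundant when their sequences are identical; the redundancy rule actually used is not stated in the slides. The toy sequences follow the slide's SIGNALINGPATHWAYREGULATOR example, and the duplicate entry for Patient 2 is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, str]  # (FASTA header, protein sequence)


def concatenate_unique(databases: Iterable[List[Entry]]) -> List[Entry]:
    """Concatenate per-patient entries, keeping the first copy of each sequence."""
    first_header_by_sequence: Dict[str, str] = {}
    for entries in databases:
        for header, sequence in entries:
            # setdefault keeps the header of the first patient that had this sequence
            first_header_by_sequence.setdefault(sequence, header)
    return [(header, seq) for seq, header in first_header_by_sequence.items()]


def to_fasta(entries: List[Entry]) -> str:
    """Render entries as FASTA text, one header line and one sequence line each."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in entries)


patient_1 = [
    ("Canonical - Variant Patient 1", "SIGNALINGPATHWAHREGULATOR"),
    ("Canonical - Alternate splice Patient 1", "SIGNALINGREGULATOR"),
]
patient_2 = [
    ("Canonical - Variant Patient 2", "SIKNALINGPATHWAYREGULATOR"),
    # Hypothetical: Patient 2 also carries the Patient 1 variant (redundant entry).
    ("Canonical - Variant Patient 2 (shared)", "SIGNALINGPATHWAHREGULATOR"),
]

merged = concatenate_unique([patient_1, patient_2])
print(to_fasta(merged))
```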
> Canonical – Variant Patient 1
SIGNALINGPATHWAHREGULATOR
> Canonical Protein – Variant Patient 2
SIKNALINGPATHWAYREGULATOR
Variants: 133,241
> Canonical – Alternate splice Patient 1
SIGNALINGREGULATOR
> Canonical – Alternate splice Patient 2
SIGNALINGPATHREGULATOR
Alternate Spliceforms: 67,853
Low confidence thresholds for Genome calls
• Variants: >2 QUAL score (phred-scaled)
• Alternative splices, frameshifts: >1 read
Concatenated: 252,890
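The counts on this slide are internally consistent, which a few lines of Python arithmetic confirm. Reading the concatenated total as the plain sum of the four database sizes is an inference from the numbers, not something the slide states:

```python
# Database sizes as listed on the slide (assumed to be entry counts).
database_sizes = {
    "RefSeq-Hs-7/2013": 31_852,
    "Variants": 133_241,
    "Alternate spliceforms": 67_853,
    "Frameshifts": 19_944,
}
concatenated = sum(database_sizes.values())

# Spectrum counts from the same slide.
total_spectra = 25_776_160
matched_spectra = 11_328_955
leftover_spectra = total_spectra - matched_spectra
matched_percent = round(100 * matched_spectra / total_spectra)

print(f"Concatenated database size: {concatenated:,}")
print(f"Leftover spectra: {leftover_spectra:,}")
print(f"Matched: {matched_percent}% of total")
```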
> Canonical – Truncation Patient 1
SIGNALINGPATFRAMESHIF
> Canonical – Novel Exon Insert Patient 2
SIGNALINGPATHWAYINSERTREGULATOR
> Canonical – Partial Exon Deletion Patient 3
SIGNALINGPATHWAYULATOR
Frameshifts: 19,944
22 Truncation Overlaps Matched
11 Insertion Overlaps Matched
49 Deletion Junctions Matched
High confidence for Proteome IDs
• <0.1% FDR peptide spectrum match
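The asymmetric thresholds (permissive on the genome side, strict on the proteome side) can be sketched as two small filters. The record type and function names are hypothetical. The comparisons are taken literally from the slide (strictly greater than, strictly less than); later slides show transcripts with a maximum of 1 read, so the read cutoff may in practice have been inclusive.

```python
from dataclasses import dataclass

MIN_QUAL = 2.0       # variants: phred-scaled QUAL must exceed this
MIN_READS = 1        # splices and frameshifts: RNA-seq reads must exceed this
MAX_PSM_FDR = 0.001  # peptide-spectrum matches: FDR must be below 0.1%


@dataclass
class GenomeCall:
    """Hypothetical record for one genome-level call."""
    kind: str          # "variant", "splice", or "frameshift"
    qual: float = 0.0  # phred-scaled QUAL, used for variants
    reads: int = 0     # RNA-seq read count, used for splices and frameshifts


def passes_genome_threshold(call: GenomeCall) -> bool:
    """Low-confidence filter deciding what enters the personalized database."""
    if call.kind == "variant":
        return call.qual > MIN_QUAL
    return call.reads > MIN_READS


def passes_proteome_threshold(psm_fdr: float) -> bool:
    """High-confidence filter on the peptide-spectrum match side."""
    return psm_fdr < MAX_PSM_FDR


calls = [
    GenomeCall("variant", qual=2.5),
    GenomeCall("variant", qual=2.0),
    GenomeCall("splice", reads=2),
    GenomeCall("frameshift", reads=1),
]
kept = [c for c in calls if passes_genome_threshold(c)]
print(f"{len(kept)} of {len(calls)} calls enter the database")
```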
Frequency of Single AA Variants, Alternative Splices, Frameshifts Across Patients
[Figure: # Features (log scale, 1 to 100,000) vs. # Patients with Feature (1 to 101). Series: Germline Variants, Somatic Variants, Alternative Splices, Frameshifts. Annotation: very common.]
• Somatic variants are less frequent than germline variants
• Some germline variants are very common
  – Rare germline variants present in RefSeq
• Some alternative splice forms and frameshifts are very common
  – Should be in RefSeq
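A frequency plot of this kind can be derived by counting, for each feature, how many patients carry it, and then counting how many features share each patient count. A minimal sketch with made-up patient and feature IDs (none of these identifiers come from the study):

```python
from collections import Counter
from typing import Dict, Set

# Hypothetical input: patient ID -> set of feature IDs called in that patient.
calls_by_patient: Dict[str, Set[str]] = {
    "patient_A": {"var1", "var2", "splice1"},
    "patient_B": {"var1", "splice1"},
    "patient_C": {"var1", "var3"},
}


def patients_per_feature(calls: Dict[str, Set[str]]) -> Counter:
    """Number of patients in which each feature was called."""
    counts: Counter = Counter()
    for features in calls.values():
        counts.update(features)
    return counts


def features_per_patient_count(calls: Dict[str, Set[str]]) -> Counter:
    """Histogram: number of patients (x-axis) -> number of features (y-axis)."""
    return Counter(patients_per_feature(calls).values())


histogram = features_per_patient_count(calls_by_patient)
for n_patients in sorted(histogram):
    print(f"{histogram[n_patients]} feature(s) found in {n_patients} patient(s)")
```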
Genome & Transcriptome Data
How many RNA-seq reads to yield a proteomics observation of an alternate splice or frameshift?
1 experiment: 3 individual patients + 1 Common control (40 patients)
82 Frameshifts, 197 Alternative splices
[Figure, two panels (y-axis log scale, 1 to 1000; x-axis 0 to 35):
 – Max # Reads for Frameshift Transcript vs. # Proteomics Experiments with Frameshift Peptide
 – Max # Reads for Splice Transcript vs. # Proteomics Experiments with Splice Peptide
 Panel annotations as extracted: "Max #Reads 17 observed in >1 Expmt"; "Max #Reads 19 observed in >1 Expmt".]
Frameshift Truncation: ras-Related protein Rab-15
Observed only in Proteomics Exp 3
Max RNA-Seq Reads: 1
Present in only 1 Common control member
Frameshift Truncation: Cysteine-rich protein 1
Observed in 9 Proteomics Experiments
Max RNA-Seq Reads: 1
Present in only 1 Common control member
Frameshift Truncation: Cullin-2 isoform a
Observed in 3 Proteomics Experiments
Max RNA-Seq Reads: 1
Present in only 1 Common control member
Many missing observations even when transcript present in many common control members
1 experiment: 3 individual patients + 1 Common control (40 patients)
[Figure, two panels (y-axis 0 to 40; x-axis 0 to 35):
 – Alternative splices: # Patients in Common Control with AS Transcript vs. # of Proteomics Experiments with Splice Peptide
 – Frameshifts: # Patients in Common Control with Frameshift Transcript vs. # Proteomics Experiments with Frameshift Peptide]
[Figure: histogram of # Frameshift Peptides (0 to 50) vs. # Proteomics Experiments with Frameshift Peptide (1 to 34)]
Majority of Alternative Splice Junctions and Frameshifts observed in >1 Proteomics Experiment
1 experiment: 3 individual patients + 1 Common control (40 patients)
[Figure: histogram of # Alternative Splice Junction Peptides (0 to 50) vs. # Proteomics Experiments with Splice Junction Peptide (1 to 34)]
Alternative splices: 150/197 observed in >1 experiment
Frameshifts: 44/82 observed in >1 experiment
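Expressed as fractions, the counts observed in more than one experiment work out as below. The numbers are from the slide; the variable and function names are illustrative.

```python
# (observed in >1 experiment, total observed), as given on the slide.
observed = {
    "Alternative splices": (150, 197),
    "Frameshifts": (44, 82),
}


def percent_multi(multi: int, total: int) -> float:
    """Percentage of features seen in more than one proteomics experiment."""
    return round(100.0 * multi / total, 1)


summary = {
    name: {"percent_multi": percent_multi(multi, total), "single_only": total - multi}
    for name, (multi, total) in observed.items()
}

for name, stats in summary.items():
    print(f"{name}: {stats['percent_multi']}% in >1 experiment, "
          f"{stats['single_only']} in exactly 1 experiment")
```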
Next steps:
• Examine “other” category
  – Fusion genes (junction-spanning)
  – Novel exon splicing (2 sides)
  – Completely novel gene
• Use updated somatic variants from QUILTS
• Define genomic data thresholds suitable for proteomic observations
  – RNA-seq: Min read count
  – Variant calling: phred-scaled QUAL score
  – Sort out Germline/Somatic variant call mix status across patients
Summary of Proteome Re-processing: 105 TCGA patients, 36 iTRAQ experiments

Metric                                 initial       re-processed
MS/MS spectra Identified               10,264,670    11,232,970
PIP <50 Spectra (%)                    -             7.77
Isobaric Labeled Spectra (%)           100.00        97.40
Isobaric Fully Labeled Spectra (%)     100.00        92.30
Isobaric No Label Spectra (%)          0.00          2.60
Isobaric Only Lys Label Spectra (%)    0.00          4.95
FDR Spectra (%)                        0.87          0.89
FDR Distinct Peptide (%)               5.46          6.00
Karl Clauser, Proteomics and Biomarker Discovery
Changes in Re-processing of TCGA data
Extraction
• Centroiding: use Xcalibur instead of SM.
  – iTRAQ ratios are little changed
  – intensities lower by ~5x (will more closely match NIST central analysis pipeline)
• Precursor MH+ range expanded from 750-4000 to 750-6000.
Searches
• Replace database with RefSeq version used as reference for the personalized database generation.
  – database content/size very similar
  – protein identifiers change from gi numbers to RefSeq numbers
• Allowed modifications will be expanded. Increases the # of identified spectra by ~10%.
  – From: Full iTRAQ, M-ox, N-deam, q-pyro
  – To: iTRAQ-Full-Lys-only, M-ox, N-deam, q-pyro, c-pyro, Ac-nTermProt
Autovalidation
• Proteome initial processing, peptide FDR per experiment: 1.1-1.4%
  – but overall peptide FDR across all 36 experiments: ~5.5%
• Phosphoproteome initial processing, peptide FDR per experiment: 1.6-2.1%
  – but overall peptide FDR across all 36 experiments: ~7.2%
Changes will seek to bring the overall peptide FDRs down to ~1%
  – require multiple observations (protein, P-site) across experiments
  – raise score thresholds
Quantitation
• Will use PIP (precursor ion purity) filtering to exclude from quantitation but not identification.
  – Requiring PIP > 50% excludes ~7.8% of spectra.
  – Filtering reduces standard deviations on protein & phosphosite level iTRAQ ratios
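The gap between per-experiment and overall peptide FDR can arise because correct peptides tend to recur across experiments while false positives are mostly different each time, so the union accumulates false hits faster than true ones. A toy calculation follows; every input number is hypothetical and chosen only to illustrate the effect, and the assumption that no false hit repeats across experiments is a worst-case simplification.

```python
def overall_peptide_fdr(
    n_experiments: int,
    peptides_per_experiment: int,
    per_experiment_fdr: float,
    distinct_true_peptides: int,
) -> float:
    """Distinct-peptide FDR of the union of all experiments.

    Assumes every false identification is unique to its experiment, while
    the true identifications collapse to `distinct_true_peptides`.
    """
    false_per_experiment = peptides_per_experiment * per_experiment_fdr
    distinct_false = n_experiments * false_per_experiment
    return distinct_false / (distinct_false + distinct_true_peptides)


# Hypothetical inputs: 36 experiments, 20,000 peptides each at 1.2% FDR,
# and 150,000 distinct correct peptides across the whole data set.
fdr = overall_peptide_fdr(36, 20_000, 0.012, 150_000)
print(f"Per-experiment FDR: 1.2%, overall FDR: {100 * fdr:.1f}%")
```

Under the same assumption, requiring a peptide to be seen in at least two experiments would remove most of these one-off false hits, which is the rationale behind the "multiple observations" bullet.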
Y Chromosome Frameshift - CD99 antigen
Observed in 36 Proteomics Experiments
Partial exon deletion splice, plus frameshift truncation
Max RNA-Seq Reads: 12
Transcript present in 18/40 Common Control Members
Acknowledgments
Washington U./MD Anderson/NYU
- Sherri Davies
- Matthew Ellis
- David Fenyo
- Kelly Ruggles
- Reid Townsend
- Li Ding
Broad Institute/FHCRC
- Steve Carr
- Karl Clauser
- Michael Gillette
- Jana Qiao
- Philipp Mertins
- DR Mani
- Eric Kuhn
- Sue Abbatiello
- Amanda Paulovich
- Pei Wang
- Sean Wang
- Ping Yan
NCI Staff
- Emily Boja
- Mehdi Mesri
- Rob Rivers
- Chris Kinsinger
- Henry Rodriguez
Funding
- National Cancer Institute
Single AA Variants may be Somatic in Some Patients, Germline in Others
Genomic
Proteomic
• Highly Interesting, should correlate with prognosis and/or subtype.
• May correlate with prognosis?
• Might as well be canonical isoforms?
• Detectable, but too rare to indicate biology.
Variant Type                 Gen       Prot    P/G
Germline Only >1 patient     34,022    1,226   3.6%
Germline Only 1 patient      57,922      704   1.2%
G&S mix                      18,426    1,013   5.5%
Somatic Only >1 patient         270        3   1.1%
Somatic Only 1 patient        9,337       82   0.9%
Total                       119,977    3,028   2.5%
• G&S mix genomic variants have the highest observation rate by Proteomics.
• Genomic variants present in only a single patient are observable by Proteomics
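The five variant-type categories can be assigned from per-patient calls with a rule like the sketch below. The input format (patient ID mapped to "germline" or "somatic" for one variant) and the function name are assumptions made for illustration; the slides do not describe how the categories were computed.

```python
from typing import Dict


def classify_variant(status_by_patient: Dict[str, str]) -> str:
    """Assign one variant to a category from its per-patient call status.

    `status_by_patient` maps patient ID -> "germline" or "somatic" for every
    patient in which the variant was called.
    """
    if not status_by_patient:
        raise ValueError("variant has no calls")
    statuses = set(status_by_patient.values())
    if not statuses <= {"germline", "somatic"}:
        raise ValueError(f"unexpected status values: {statuses}")
    if len(statuses) == 2:
        return "G&S mix"  # germline in some patients, somatic in others
    origin = "Germline Only" if "germline" in statuses else "Somatic Only"
    spread = ">1 patient" if len(status_by_patient) > 1 else "1 patient"
    return f"{origin} {spread}"


# Hypothetical calls for three variants.
print(classify_variant({"p1": "germline", "p2": "germline"}))
print(classify_variant({"p1": "somatic"}))
print(classify_variant({"p1": "germline", "p2": "somatic"}))
```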
81 Patients, Nov 2013
Not all Germline & Somatic mix Single AA Variants are “Essentially” Germline
• Is G&S mix status primarily an artifact of variant calling accuracy/sensitivity?
• Is there some cancer biology involved for high S/G ratio variants?
  – Are patients with germline form more cancer prone?
  – Does somatic form correlate with prognosis, development of drug-resistance?
Genomic Proteomic
81 Patients, Nov 2013
Wide Range of Somatic Single AA Variants/Patient
[Figure: per-patient counts (log scale, 10 to 100,000) of Germline Variants, Somatic Variants, and Alternative Splices. Patient IDs on the x-axis: D8-A13Y, A7-A0CJ, A2-A0YM, E2-A10A, AR-A0TV, AN-A0AL, BH-A0BZ, A8-A09I, BH-A18N, BH-A0C0, AO-A12B, A8-A08G, AR-A0U4, AR-A1AW, BH-A0DG, A8-A08Z, A2-A0EX, A2-A0T2.]
Low confidence thresholds applied to calls
• Variants: >2 QUAL score (phred-scaled)
• Alternative splices: >1 read