CRITICAL COMPARISON OF SAMPLE PREPARATION STRATEGIES FOR SHOTGUN PROTEOMIC ANALYSIS OF FORMALIN-FIXED, PARAFFIN-EMBEDDED SAMPLES
Marcello Abbondio, Alessandro Tanca*, Salvatore Pisanu, Sergio Uzzau, Daniela Pagnozzi, Maria Filippa Addis
Porto Conte Ricerche Srl, Tramariglio, Alghero (SS), Italy; *[email protected]
1. INTRODUCTION
The growing field of formalin-fixed, paraffin-embedded (FFPE) tissue proteomics holds promise for improving translational research. Archival tissue banks worldwide hold a large number and variety of tissue samples, together with a wealth of retrospective information on diagnosis, prognosis, and response to therapy. This makes them an important resource for protein biomarker discovery and validation. Direct tissue trypsinization (DT) and protein extraction followed by in-solution digestion (ISD) or filter-aided sample preparation (FASP) are the most common workflows for shotgun LC-MS/MS analysis of FFPE samples. However, there is currently no consensus on the optimal protocol, and no study critically comparing the performance of these three methods on FFPE specimens has been reported so far. Liver tissue was chosen as a model because of its high proteome complexity in terms of expressed proteins and metabolic pathways.
4. CONCLUSIONS
These results highlight that diverse sample preparation strategies provide qualitatively and quantitatively different proteomic information, and present typical biases that should be taken into account when planning a shotgun proteomic investigation dealing with FFPE samples. In view of the considerable portion of unique identifications provided by each method (particularly by DT and FASP), when a sufficient amount of tissue is available, a complementary, parallel use of different sample preparation strategies is suggested to increase proteome coverage, width and depth.
3.2. QUALITATIVE AND QUANTITATIVE COMPARISON
[Figure, panels A-F. A, B: hierarchical clustering of the nine samples (DT 1-3, FASP 1-3, ISD 1-3). C: Venn diagram of proteins identified with DT, FASP and ISD; 1274 proteins (50.0%) are common to all three methods. D: Venn diagram of peptides; 3595 peptides (26.6%) are common to all three methods. E: Log10 NSAF dot plots at protein level for the three pairwise method comparisons (r = 0.952, 0.897 and 0.941). F: Log10 NSAF dot plots at peptide level for the three pairwise method comparisons (r = 0.775, 0.576 and 0.622).]
Top: unsupervised hierarchical cluster analysis based on protein (A) and peptide (B) label-free quantitative data. Middle: Venn diagrams illustrating the distribution of all identified proteins (C) and peptides (D). Percentages of common proteins and peptides are indicated in yellow. Bottom: dot plots describing the correlation of protein (E) and peptide (F) abundance between DT and FASP, DT and ISD, and FASP and ISD. Pearson correlation coefficients are also reported.
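The pairwise abundance correlations described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes that the Pearson coefficient is computed on log10-transformed NSAF values and only on entries quantified in both samples, neither of which is stated on the poster. The function names and the toy values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def log_nsaf_correlation(nsaf_a, nsaf_b):
    """Correlate log10 NSAF for entries with a positive value in both samples.

    Assumption: entries missing from one sample are ignored rather than
    imputed; the poster does not say how such entries were handled.
    """
    shared = sorted(
        p for p in nsaf_a
        if p in nsaf_b and nsaf_a[p] > 0 and nsaf_b[p] > 0
    )
    xs = [math.log10(nsaf_a[p]) for p in shared]
    ys = [math.log10(nsaf_b[p]) for p in shared]
    return pearson_r(xs, ys)


# Hypothetical toy values, not data from the poster.
method_a = {"P1": 1.0, "P2": 10.0, "P3": 100.0, "P4": 5.0}
method_b = {"P1": 2.0, "P2": 20.0, "P3": 200.0}
r = log_nsaf_correlation(method_a, method_b)
```

Restricting the calculation to shared entries is a design choice: it measures quantitative agreement only, leaving qualitative differences to the Venn diagrams.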
3. RESULTS AND DISCUSSION
[Figure, panels A and B. A, protein level: Venn diagrams of replicates 1-3 for each method, with 1126 proteins (58.0%) common to all DT replicates, 1353 (65.2%) common to all FASP replicates and 1124 (69.5%) common to all ISD replicates; Log10 NSAF dot plots between replicate pairs, with r = 0.910, 0.957 and 0.931 (DT), r = 0.995, 0.974 and 0.978 (FASP), r = 0.994, 0.987 and 0.988 (ISD). B, peptide level: 2895 peptides (32.6%) common to all DT replicates, 4623 (44.1%) common to all FASP replicates and 2798 (56.5%) common to all ISD replicates; r = 0.582, 0.561 and 0.488 (DT), r = 0.920, 0.705 and 0.728 (FASP), r = 0.899, 0.833 and 0.822 (ISD).]
3.1. REPRODUCIBILITY
DT
• lower reproducibility
• good preservation of high-MW proteins
• much lower keratin contamination
• higher abundance of non-tryptic peptides
FASP AND ISD
• depletion of high-MW proteins
• enrichment in hydrophobic and membrane proteins
FASP
• higher identification yields
ISD
• higher reproducibility
Qualitative and quantitative reproducibility of DT, FASP and ISD. A) Top: distribution of identified proteins among replicates. Percentages of common proteins are indicated in yellow. Bottom: correlation of protein abundance between all replicate combinations for each method. Pearson correlation coefficients are also reported. B) Same as panel A, but at the peptide level.
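The percentage of common identifications reported in the Venn diagrams can be reproduced with a short sketch. It assumes that the percentage is the number of entries found in all three replicates divided by the number found in at least one replicate; this reading matches the DT protein diagram (1126 common proteins, 58.0%). The function names are hypothetical.

```python
def common_percentage(rep1, rep2, rep3):
    """Percentage of identifications shared by all three replicates.

    Each argument is a set of protein (or peptide) identifiers.
    """
    union = rep1 | rep2 | rep3
    if not union:
        raise ValueError("no identifications in any replicate")
    common = rep1 & rep2 & rep3
    return 100.0 * len(common) / len(union)


def common_percentage_from_regions(common, other_regions):
    """Same quantity, computed from the seven region counts of a Venn diagram."""
    total = common + sum(other_regions)
    if total == 0:
        raise ValueError("empty Venn diagram")
    return 100.0 * common / total


# Region counts read from the DT protein-level Venn diagram of this poster:
# 1126 proteins in the central region, plus the six outer regions.
dt_protein_common = common_percentage_from_regions(
    1126, [179, 235, 61, 112, 174, 53]
)

# Hypothetical toy identifier sets.
toy = common_percentage({"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"})
```

The same function applies unchanged at peptide level, since it only needs the central count and the six outer counts.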
3.5. NON-TRYPTIC AND FORMALDEHYDE-MODIFIED PEPTIDES
[Figure, panels A and B. A: Venn diagrams of peptides identified with ‘trypsin’ and ‘no enzyme’ searches, with additional peptides from the ‘no enzyme’ search of +7.3% (DT), +3.8% (FASP) and +5.3% (ISD), and a Venn diagram of non-tryptic peptides among DT, FASP and ISD. B: Venn diagrams of peptides identified with ‘no mod’ and ‘mod’ searches, with additional peptides from the ‘mod’ search of +3.6% (DT), +1.5% (FASP) and +2.1% (ISD), and a Venn diagram of formaldehyde-modified peptides among DT, FASP and ISD.]
A) Left: distribution of peptides identified with ‘trypsin’ and ‘no enzyme’ searches in DT, FASP and ISD samples. Right: distribution of non-tryptic peptides among all methods. B) Left: distribution of peptides identified with standard search (‘no mod’) and search comprising formaldehyde-induced modifications (‘mod’) in DT, FASP and ISD samples. Right: distribution of formaldehyde-modified peptides among all methods.
3.4. QUANTITATIVE PROTEIN DISTRIBUTION: PHYSICOCHEMICAL FEATURES
[Figure, panels A-D: bar charts of % NSAF for DT, FASP and ISD, with asterisks marking significant differences. A: MW (kDa) bins 0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, 150-200, >200. B: pI bins <5, 5-6, 6-7, 7-8, 8-9, 9-10, 10-11, >11. C: TMD>0, TMD>1, TMD>2. D: GRAVY >0.5, shown as percentage of proteins and as percentage of NSAF.]
Quantitative protein distribution according to MW (A), pI (B), number of transmembrane domains (TMD, C) and hydrophobicity (GRAVY score, D). Mean and SD values of NSAF percentage for three independent experimental replicates are shown. NSAF values are expressed as a percentage of all proteins. Asterisks indicate statistical significance according to Student’s t-test (p value < 0.05): statistically significant difference versus DT, versus FASP, versus ISD, or versus all other methods.
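The GRAVY score used for panel D is the grand average of hydropathy, that is, the mean Kyte-Doolittle hydropathy value over all residues of a sequence. The sketch below is a minimal illustration; the poster does not state which tool was used to compute GRAVY, and the function names and example sequences are hypothetical. The 0.5 threshold is the one shown in the figure.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    try:
        total = sum(KYTE_DOOLITTLE[residue] for residue in seq)
    except KeyError as exc:
        raise ValueError(f"non-standard residue: {exc.args[0]}") from None
    return total / len(seq)


def is_hydrophobic(sequence, threshold=0.5):
    """True when GRAVY exceeds the threshold used in the figure (GRAVY > 0.5)."""
    return gravy(sequence) > threshold


# Hypothetical short sequences, for illustration only.
membrane_like = gravy("LIVALLFVAI")
soluble_like = gravy("KDERNQSTKE")
```

Sequences containing non-standard residue codes raise an error here; a production workflow would need an explicit policy for them.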
3.3. QUANTITATIVE PROTEIN DISTRIBUTION: SUBCELLULAR LOCALIZATION
Mean and SD values of NSAF percentage for three independent experimental replicates are shown. NSAF values are expressed as a percentage of the annotated proteins. Asterisks indicate statistical significance according to Student’s t-test (p value < 0.05): statistically significant difference versus DT, versus FASP, versus ISD, or versus all other methods.
[Figure: bar chart of NSAF (axis 0-4500) for DT, FASP and ISD by subcellular localization, with asterisks marking significant differences. Categories: Extracellular matrix; Secreted; Membrane; Multi-pass membrane protein; Single-pass membrane protein; Peripheral membrane protein; Lipid-anchor; Cell membrane; Cytoplasm; Cytosol; Cytoskeleton; Nucleus; Nucleus membrane; Nucleus matrix; Nucleolus; Endoplasmic reticulum membrane; Endoplasmic reticulum lumen; Golgi apparatus; Mitochondrion outer membrane; Mitochondrion matrix; Mitochondrion inner membrane; Lysosome.]
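The significance testing behind the asterisks can be sketched with a two-sample Student's t-test on the three replicate values per method. This is an illustration under assumptions: the poster does not state whether the test was paired or unpaired, so the sketch assumes an unpaired, equal-variance, two-sided test. With three replicates per group there are 4 degrees of freedom, for which the two-sided 5% critical value of t is 2.776. The function names and replicate values are hypothetical.

```python
import math
import statistics

# Two-sided critical value of Student's t for alpha = 0.05 and 4 degrees of
# freedom (two groups of three replicates each).
T_CRIT_DF4 = 2.776


def student_t(a, b):
    """Pooled-variance (equal-variance) two-sample t statistic."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    pooled_var = (
        (n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)
    ) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    if std_err == 0:
        raise ValueError("t is undefined when both groups have zero variance")
    return (statistics.mean(a) - statistics.mean(b)) / std_err


def differs_at_5_percent(a, b):
    """True if two triplicate groups differ at p < 0.05 (two-sided)."""
    if len(a) != 3 or len(b) != 3:
        raise ValueError("the hard-coded critical value assumes 3 vs 3 replicates")
    return abs(student_t(a, b)) > T_CRIT_DF4


# Hypothetical NSAF percentages for one category in two methods.
method_1 = [10.0, 11.0, 12.0]
method_2 = [20.0, 21.0, 22.0]
significant = differs_at_5_percent(method_1, method_2)
```

Comparing against a tabulated critical value avoids the need for a t-distribution implementation; a statistics library would return the exact p value instead.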
2. METHODS
HUMAN LIVER TISSUE: 3 independent replicates per method, five 5-μm-thick slices per replicate
DEPARAFFINIZATION & REHYDRATION
DIRECT TISSUE TRYPSINIZATION (DT): ammonium bicarbonate 50 mM
PROTEIN EXTRACTION (for FASP and ISD): SDS 2 %, DTT 200 mM, Tris–HCl (pH 8.8) 20 mM, 99 °C for 60 min
FASP: Microcon YM-30
IN-SOLUTION DIGESTION (ISD): Detergent Removal Spin Columns
TRYPSIN DIGESTION
PEPTIDE MIXTURE
LC-MS/MS: UltiMate 3000 RSLCnano LC system, 485 min gradient; LTQ Orbitrap Velos, HCD
PROTEIN IDENTIFICATION: search engine Sequest-HT; peptide validation with Percolator; FDR < 1 % based on peptide q-value
DATA ANALYSIS: label-free quantification via spectral counting; multivariate statistics using Perseus
COMPARISON: reproducibility; qualitative-quantitative; subcellular localization, pI, MW, GRAVY, TMD; formaldehyde-modified and non-tryptic peptides
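NSAF_k = \frac{(SpC/L)_k}{\sum_{i=1}^{N} (SpC/L)_i}
where SpC is the spectral count of a protein, L is its length and N is the number of proteins in the sample.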
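The NSAF definition above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name and the toy spectral counts and lengths are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor per protein.

    NSAF_k = (SpC_k / L_k) / sum_i (SpC_i / L_i), where SpC is the spectral
    count and L the protein length. Both arguments are dicts keyed by
    protein identifier.
    """
    saf = {}
    for protein, count in spectral_counts.items():
        length = lengths[protein]
        if length <= 0:
            raise ValueError(f"invalid length for {protein}")
        saf[protein] = count / length
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectral counts in this sample")
    return {protein: value / total for protein, value in saf.items()}


# Hypothetical toy data: spectral counts and protein lengths (residues).
counts = {"P1": 10, "P2": 30, "P3": 5}
protein_lengths = {"P1": 100, "P2": 600, "P3": 50}
values = nsaf(counts, protein_lengths)
```

In this toy sample P2 has the most spectra but, being six times longer than P1, ends up with half its NSAF. The values sum to 1 within a sample, so they can be expressed as percentages, as in the figures.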
5. REFERENCES
• Tanca A, Abbondio M, Pisanu S, Pagnozzi D, Uzzau S, Addis MF: Critical comparison of sample preparation strategies for shotgun proteomic analysis of formalin-fixed, paraffin-embedded samples: insights from liver tissue. Clin Proteomics 2014, 11(1):28.
• Tanca A, Pagnozzi D, Addis MF: Setting proteins free: Progresses and achievements in proteomics of formalin-fixed, paraffin-embedded tissues. Proteomics Clin Appl 2012, 6:7–21.
• Zybailov B, Mosley AL, Sardiu ME, Coleman MK, Florens L, Washburn MP: Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae. J Proteome Res 2006, 5:2339–2347.
• Ostasiewicz P, Zielinska DF, Mann M, Wisniewski JR: Proteome, phosphoproteome, and N-glycoproteome are quantitatively preserved in formalin-fixed paraffin-embedded tissue and analyzable by high-resolution mass spectrometry. J Proteome Res 2010, 9:3688–3700.
• Alkhas A, Hood BL, Oliver K, Teng PN, Oliver J, Mitchell D, Hamilton CA, Maxwell GL, Conrads TP: Standardization of a sample preparation and analytical workflow for proteomics of archival endometrial cancer tissue. J Proteome Res 2011, 10:5264–5271.
• Gamez-Pozo A, Ferrer NI, Ciruelos E, Lopez-Vacas R, Martinez FG, Espinosa E, Vara JA: Shotgun proteomics of archival triple-negative breast cancer samples. Proteomics Clin Appl 2013, 7:283–291.