Page 1: Bosc2011 isobar-fbp

CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences

The isobar R package: Analysis of quantitative proteomics data

F. Breitwieser, J. Colinge

Bioinformatics Open Source Conference, 2011

Page 2: Bosc2011 isobar-fbp

isobar for Analysis of Quantitative Proteomics Data

General Statistical Modeling of Data from Protein Relative Expression Isobaric Tags

Florian P. Breitwieser¹, André Müller¹, Loïc Dayon², Thomas Köcher³, Alexandre Hainard², Peter Pichler⁴, Ursula Schmidt-Erfurth⁵, Giulio Superti-Furga¹, Jean-Charles Sanchez², Karl Mechtler³, Keiryn L. Bennett¹, and Jacques Colinge*¹

¹ CeMM, Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
² Biomedical Proteomics Group, Department of Structural Biology and Bioinformatics, Faculty of Medicine, University of Geneva, Geneva, Switzerland
³ Institute of Molecular Pathology, Vienna, Austria
⁴ CD Laboratory for Proteome Analysis, University of Vienna, 1030 Vienna, Austria
⁵ Department of Ophthalmology, Medical University of Vienna, Vienna, Austria

Supporting Information

ABSTRACT: Quantitative comparison of the protein content of biological samples is a fundamental tool of research. The TMT and iTRAQ isobaric labeling technologies allow the comparison of 2, 4, 6, or 8 samples in one mass spectrometric analysis. Sound statistical models that scale with the most advanced mass spectrometry (MS) instruments are essential for their efficient use. Through the application of robust statistical methods, we developed models that capture variability from individual spectra to biological samples. Classical experimental designs with a distinct sample in each channel as well as the use of replicates in multiple channels are integrated into a single statistical framework. We have prepared complex test samples including controlled ratios ranging from 100:1 to 1:100 to characterize the performance of our method. We demonstrate its application to actual biological data sets originating from three different laboratories and MS platforms. Finally, test data and an R package, named isobar, which can read Mascot, Phenyx, and mzIdentML files, are made available. The isobar package can also be used as an independent software that requires very little or no R programming skills.

KEYWORDS: bioinformatics, statistics, iTRAQ, TMT, quantitative proteomics

INTRODUCTION

Proteomic technologies provide access to the protein content of biological samples [1,2] and are important tools for current medical, biological, and systems biology research. Several highly efficient approaches also using MS exist to measure quantitative information related to proteins [3–5] that can be combined with PTM analysis.

In this work, we consider methods allowing the measurement of proteome-wide protein relative expression [5]. In general, protein digestion by an enzyme, e.g., trypsin, and tandem mass spectrometry (MS/MS) are required to identify the resultant peptides [6]. The samples for comparison are prepared such that the peptides from each of them are labeled in order to distinguish them after sample pooling and shared MS analysis. Several methods have been designed along this principle, e.g., ICPL [7], ICAT [8], SILAC [9], COFRADIC [10], 16O/18O [11], iTRAQ [12], and TMT [13], to cite the most common ones. iTRAQ is especially convenient as (1) it can be multiplexed (up to 4 samples can be analyzed simultaneously), and (2) quantitative information resides in each single MS/MS spectrum (not necessary to combine spectra). Multiplexing is achieved through the use of isobaric tags (equal mass) to label the peptides. These tags fragment during MS/MS, thus yielding reporter peaks with distinct m/z ratios [12], e.g., 114, 115, 116, and 117 Da. Direct comparison of the reporter peak intensities, or channel intensities, provides an estimate of relative expression. TMT (2- or 6-plex) works according to the same principle, and there exists an 8-plex version of iTRAQ; the theory we develop here applies to all of them. In this work, we are interested in the prevalent experimental settings where biological samples are compared in a single experiment (± replicates). Experimental design that is composed of multiple iTRAQ/TMT experiments is out of the scope of this work and has been studied by others [14–16].

Regarding statistical analysis, iTRAQ/TMT data have similarities with gene microarray data, though they also have clear specificities. One notable difference is the variability of available information due to the variable number of measured spectra.

Received: December 23, 2010

J. Proteome Res., dx.doi.org/10.1021/pr1012784

■ Mass Spectrometers to identify and quantify proteins
■ isobar: R package for handling isobarically tagged data
  □ analyze and visualize protein expression changes
  □ interactive within R
  □ scripts to generate PDF (via LaTeX) and Excel reports
■ http://bioinformatics.cemm.oeaw.ac.at/isobar

Page 3: Bosc2011 isobar-fbp

Quantitative Proteomics via Mass Spectrometry

■ peptide fragmentation spectrum for identification
■ isobaric peptide tags for quantification
  □ up to 8 different samples
■ isobar package
  □ extracts identification from Mascot/Phenyx results
  □ extracts quantitative information from spectrum
  □ groups proteins to have reporters with specific peptides
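As a rough illustration of what "extracting quantitative information from a spectrum" involves, the sketch below picks the reporter peaks out of a toy MS/MS peak list in base R. It is not how isobar does it internally: the helper `extract_reporters`, the tolerance `tol`, the peak list, and the 114.11/115.11 reporter masses are assumptions made for this example (116.11 and 117.11 are the values shown in the quality control appendix).

```r
# Hypothetical helper (not part of isobar): for each reporter m/z, take the
# most intense peak within `tol` of the expected mass, or NA if none is found.
extract_reporters <- function(mz, intensity, reporter_mz, tol = 0.01) {
  sapply(reporter_mz, function(target) {
    hit <- abs(mz - target) <= tol
    if (any(hit)) max(intensity[hit]) else NA_real_
  })
}

# Toy MS/MS peak list (made-up numbers).
mz        <- c(114.111, 115.108, 116.112, 117.115, 175.119, 230.170)
intensity <- c(1200,    2500,    1100,    2300,    800,     400)

# Reporter masses for a 4-plex experiment (114.11 and 115.11 are assumed).
reporters <- c("114" = 114.11, "115" = 115.11, "116" = 116.11, "117" = 117.11)

channels <- extract_reporters(mz, intensity, reporters)
print(channels)
```

The result is one intensity per channel and spectrum; ratios between channels are then computed from these values.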


Page 4: Bosc2011 isobar-fbp

Modelling Technical Variability on a Spectrum Level

■ correct for isotope impurities
■ normalize
■ handle technical variability
  □ depends on signal intensity
  □ using noise model

# ib: an IBSpectra object holding identifications and reporter intensities
ib <- correctIsotopeImpurities(ib)   # correct for isotope impurities
ib <- normalize(ib)                  # normalize channel intensities
nm <- NoiseModel(ib)                 # fit the intensity-dependent noise model
maplot(ib, channel1="114", channel2="115", noise.model=nm)
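The first two steps can be pictured with a small base R sketch. It assumes that impurity correction amounts to solving a linear system (observed = A × true) and that normalization equalizes channel medians. Both are generic formulations chosen for illustration, not a description of isobar's internals, and the impurity matrix `A`, the intensities, and the helper `normalize_median` are made up.

```r
channels <- c("114", "115", "116", "117")

# Made-up impurity matrix: A[i, j] is the fraction of tag j's signal that is
# observed in channel i (each column sums to 1).
A <- matrix(c(0.93, 0.02, 0.00, 0.00,
              0.06, 0.93, 0.03, 0.00,
              0.01, 0.05, 0.93, 0.04,
              0.00, 0.00, 0.04, 0.96),
            nrow = 4, byrow = TRUE, dimnames = list(channels, channels))

# Toy "true" intensities: one row per spectrum, one column per channel.
true_int <- rbind(c(1000, 2000, 1000, 2000),
                  c( 500, 1000,  500, 1000),
                  c(4000, 8000, 4000, 8000))

# What the instrument would record under this impurity model.
obs <- t(A %*% t(true_int))

# Correction: solve A %*% x = observed for every spectrum.
corrected <- t(solve(A, t(obs)))

# Hypothetical median normalization: rescale each channel so that all
# channel medians are equal (here, equal to the largest median).
normalize_median <- function(x) {
  med <- apply(x, 2, median, na.rm = TRUE)
  sweep(x, 2, max(med) / med, "*")
}
normalized <- normalize_median(corrected)
print(round(normalized))
```

Median equalization assumes that most proteins are not regulated between channels, so it would distort the toy data's genuine 1:2 pattern; it is shown only to make the mechanics concrete.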

Page 8: Bosc2011 isobar-fbp

Differential Protein Expression

[Figure: ratio versus average intensity for CERU_RAT. x-axis: average intensity (log scale, 5e+02 to 5e+06); y-axis: ratio (log scale, ticks at 1, 2, 5, 10, 20); series: 115/114, 116/114, 117/114]

■ spectra → peptides → protein
■ summarize ratio with a weighted mean
  □ relative to spectrum intensity
■ calculate significance after assessing biological variability
■ compute ratios between classes
  □ Healthy versus Diseased

# ratio of channel 116 to 114 for one protein (ceru.rat holds its accession)
estimateRatio(ib,noise.model.hcd,"114","116",ceru.rat)
# ratios between classes: H = healthy, D = diseased
proteinRatios(ib,cl=c("H","H","D","D"),
              summarize=TRUE,method="interclass")
# plot the protein's spectrum ratios relative to channel 114
maplot2(ib,relative.to="114",ceru.rat,main="CERU_RAT")
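The weighted mean can be sketched in base R as an inverse-variance average of the per-spectrum log ratios, where the variance comes from an intensity-dependent noise function. The functions `noise_variance` and `weighted_log_ratio`, the variance formula, and the intensities below are assumptions for illustration only and are not taken from isobar.

```r
# Hypothetical noise model: the variance of a log10 ratio shrinks as the
# average reporter intensity of the spectrum grows.
noise_variance <- function(avg_intensity) 0.01 + 50 / avg_intensity

# Inverse-variance weighted mean of the per-spectrum log10 ratios.
weighted_log_ratio <- function(ch1, ch2) {
  log_ratio <- log10(ch2 / ch1)
  w         <- 1 / noise_variance((ch1 + ch2) / 2)
  c(log10_ratio = sum(w * log_ratio) / sum(w),
    variance    = 1 / sum(w))
}

# Toy reporter intensities of four spectra assigned to one protein.
ch114 <- c( 500,  5000,  50000,  500000)
ch116 <- c(1500, 10500, 100000, 1000000)

res        <- weighted_log_ratio(ch114, ch116)
ratio      <- 10^res[["log10_ratio"]]
unweighted <- 10^mean(log10(ch116 / ch114))
print(c(weighted = ratio, unweighted = unweighted))
```

Low-intensity spectra, whose ratios are the noisiest, pull the weighted estimate much less than they pull the plain mean.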


Page 9: Bosc2011 isobar-fbp

Deciding on Significant Regulation

■ Biological variability
  □ can be learned from replicates

[Figure: two panels over samples h1, h2, d1, d2; scale from −1 to 1]

■ 'Volcano plot'
  □ fold change versus p-value

[Figure: volcano plot. Horizontal axis ticks from −4 to 4; labels: −log10 sample p-value and −log10 signal p-value; scale from 0 to 60]
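One way to picture how replicates feed into the decision is sketched below in base R. The spread of ratios between replicates of the same class gives a null distribution, against which ratios between classes are tested. The normal model, the log10 scale, the Benjamini-Hochberg adjustment, and all simulated numbers are assumptions made for this example; isobar may use a different distribution and different thresholds.

```r
set.seed(1)
n <- 200

# Toy log10 ratios between replicates of the same class (e.g. h1 versus h2):
# these reflect biological and technical variability only.
replicate_lr <- rnorm(n, mean = 0, sd = 0.1)

# Toy log10 ratios between classes (healthy versus diseased), with five
# proteins given a real change.
class_lr      <- rnorm(n, mean = 0, sd = 0.1)
class_lr[1:5] <- c(-1.2, -0.9, 0.8, 1.0, 1.5)

# Biological variability learned from the replicates (robust spread).
bio_sd <- mad(replicate_lr)

# Two-sided p-value of each class ratio under a normal null model (assumed).
p_sample <- 2 * pnorm(-abs(class_lr) / bio_sd)

volcano <- data.frame(log10_ratio = class_lr,
                      neg_log10_p = -log10(p_sample))
significant <- p.adjust(p_sample, method = "BH") < 0.05

# Volcano plot: fold change versus p-value (drawn in interactive sessions).
if (interactive()) {
  plot(volcano$log10_ratio, volcano$neg_log10_p,
       col = ifelse(significant, "red", "grey40"), pch = 19,
       xlab = "log10 ratio", ylab = "-log10 sample p-value")
}
```

Proteins far from zero on the horizontal axis and high on the vertical axis are the candidates for significant regulation.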


Page 10: Bosc2011 isobar-fbp

Automating the Analysis - PDF Report

      ch1  ch2  protein group                               peptides  spectra  ratio
1     C    T    Serpina1e: Q00898                     1/1   7         1        0.22
2     C    T    Acaca: Q5SWU9^1,2                     2/2   5         4        0.40
3     C    T    Atp5j: P97450                         1/1   4         19       0.49
...
130   C    T    Hist1h3a: P68433, Hist1h3c: P84228    2/3   8         2        2.42
131   C    T    Postn: Q62009^1−5                     5/5   1         3        3.05
132   C    T    Myh7: Q91Z83                          1/1   128       62       3.66

■ via Sweave: R code within LaTeX
  □ reproducible research
■ sections
  □ Significantly regulated proteins
  □ All protein ratios
  □ Protein grouping
■ not shown: QC report, Excel report

Proteins

pos  accession  gene name  protein name
1    P68433     Hist1h3a   Histone H3.1
1    P84228     Hist1h3c   Histone H3.2
2    P84244     H3f3b      Histone H3.3

Peptides

pos  rs  gs  us
1    1   7   0
2    0   7   0

Sweave("isobar-analysis.Rnw") # generate report using Sweave

Page 11: Bosc2011 isobar-fbp

Acknowledgments

■ Research Center for Molecular Medicine, Vienna
  □ Jacques Colinge
  □ Keiryn Bennett's Masspec group
  □ Giulio Superti-Furga
  □ Bioinformatics group
    ■ Alexey Stukalov
    ■ Gerhard Duernberger
    ■ Patrick Meidl
■ isobar Collaborators
  □ University of Geneva: Jean-Charles Sanchez
  □ IMP, Vienna: Peter Pichler and Karl Mechtler
■ Open Source Software Developers
  □ Richard Stallman, Linus Torvalds, Robert Gentleman, ...
  □ Donald Knuth, Hadley Wickham, Till Tantau, ...

Page 12: Bosc2011 isobar-fbp

Appendix: Quality Control Report

[Figure: histograms of reporter mass. x-axis: mass (ticks from −1e−03 to 1e−03); y-axis: count (0 to 500); panels: tag 116: m/z 116.11, tag 117: m/z 117.11]

■ shows reporter mass precision and biological variability

reporterMassPrecision(ib)   # plot reporter mass precision per tag
Sweave("isobar-qc.Rnw")     # generate the QC report using Sweave

Page 13: Bosc2011 isobar-fbp

Appendix: Protein Identification Using a Mass Spectrometer