Predict PPI / Protein-DNA / GO
Short title: pp2_ppi2
Lecture: Protein Prediction 2 - Protein function, TUM, Winter 2011/2012
© Burkhard Rost (TUM Munich), Monday February 6, 2012


Announcements

Videos: SciVe / www.rostlab.org
THANKS: Tim Karl + Julia Gerke

Special lectures:
• Jan 25: Marco De Vivo (ISS Geneva)
• Jan 27: Marco Punta (Pfam)

NO lectures (not final):

LAST lecture: Feb 3
Exam: Feb 8, 12:00 (likely this room)
• Makeup: likely Apr 19, morning

CONTACT: Marlena Drabik [email protected]

Today: Secondary structure prediction 1

LAST YEAR
• Predicting effects of change

THIS WEEK
• Predicting effects of change
• Protein-protein interactions

NEXT WEEK
• Marco Punta (Pfam, Sanger, Cambridgeshire): Families
• Marco De Vivo (Geneva, ISS): Drug design

2 WEEKS from now
• Protein-protein interactions
• Protein-DNA interactions

IV. (b) Predict protein interactions

IV.6 Protein interactions: PPI - predictions

Protein-protein interaction networks

S Li et al. & M Vidal (2004) Science 303, 540-3

CE Turner (2000) J Cell Sci 113, 4139-40

In silico predictions of P=P interactions

(A) PROFILES:
• AJ Enright, I Iliopoulos, NC Kyrpides and CA Ouzounis 1999 Nature 402, 86-90
• M Pellegrini, EM Marcotte, MJ Thompson, D Eisenberg and TO Yeates 1999 PNAS 96, 4285-4288

(B) FUSION:
• T Gaasterland and MA Ragan 1998 Microb Comp Genomics 3, 177-192
• EM Marcotte, M Pellegrini, HL Ng, DW Rice, TO Yeates and D Eisenberg 1999 Science 285, 751-753

(C) CORRELATED MUTATIONS:
• F Pazos and A Valencia 2002 Proteins 47, 219-227

Mirror tree: similarity of phylogenetic trees

Pazos and Valencia (2001) Protein Engineering; Juan et al. (2008) PNAS. © Ta-Tsen Soong, Columbia Univ
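The mirror-tree idea (interacting families tend to have similar phylogenetic trees) can be sketched by correlating the pairwise species distances of two protein families. This is a minimal illustration, not the published implementation: the species names, the distance values and the helper names `to_distances` and `mirror_tree_score` are all assumptions made for the example.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def to_distances(values, species):
    """Map each unordered species pair to a distance (hypothetical helper)."""
    return {frozenset(p): v for p, v in zip(combinations(species, 2), values)}


def mirror_tree_score(dist_a, dist_b, species):
    """Correlate the inter-species distances of family A with those of family B."""
    pairs = [frozenset(p) for p in combinations(species, 2)]
    return pearson([dist_a[p] for p in pairs], [dist_b[p] for p in pairs])


# Made-up distances for two families over the same four species.
species = ["s1", "s2", "s3", "s4"]
base = [0.10, 0.40, 0.50, 0.35, 0.45, 0.20]
family_a = to_distances(base, species)
# Family B evolves at twice the rate but with the same tree shape.
family_b = to_distances([2 * d + 0.1 for d in base], species)

score = mirror_tree_score(family_a, family_b, species)
print(round(score, 3))
```

A score close to 1 means the two distance matrices "mirror" each other; in practice the shared species phylogeny inflates this correlation, so a high score alone is weak evidence.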

Mirror tree vs. phylogenetic profiles

Mirror tree more sophisticated

[Figure panels: Mirror tree; Phylogenetic profiles]

F Pazos & A Valencia (2001) Protein Engineering. © Ta-Tsen Soong, Columbia Univ

Mirror tree performs worse than phylogenetic profiles
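For comparison, a phylogenetic profile only records in which genomes a protein is present, and two proteins with matching profiles are candidate functional partners. The sketch below assumes binary presence/absence vectors and a simple fraction-of-agreement score; the protein names and profiles are invented for illustration.

```python
def profile_similarity(profile_a, profile_b):
    """Fraction of genomes in which both proteins are present or both absent."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    matches = sum(a == b for a, b in zip(profile_a, profile_b))
    return matches / len(profile_a)


# Hypothetical presence (1) / absence (0) of three proteins in six genomes.
profiles = {
    "protA": [1, 0, 1, 1, 0, 1],
    "protB": [1, 0, 1, 1, 0, 1],  # same pattern as protA
    "protC": [0, 1, 1, 0, 1, 0],  # mostly the opposite pattern
}

sim_ab = profile_similarity(profiles["protA"], profiles["protB"])
sim_ac = profile_similarity(profiles["protA"], profiles["protC"])
print(sim_ab, sim_ac)
```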

F Pazos & A Valencia (2001) Protein Engineering. © Ta-Tsen Soong, Columbia Univ

In silico predictions of P=P interactions

MOTIFS:
• E Sprinzak & H Margalit 2001 J Mol Biol 311, 681-692
• SM Gomez & A Rzhetsky 2002 Pac Symp Biocomput 413-24

Features commonly used for PPI prediction

• Gene fusion
• Homology (interolog)
• Domain interaction
• Microarrays
• Functional similarity
• Phylogenetic profile

Enright et al. (1999) Nature; Matthews et al. (2001) Genome Res.; Rhodes et al. (2005) Nature Biotech
© Ta-Tsen Soong, Columbia Univ

Other sources with evidence for PPI

Integrating diverse data types

[Figure: evidence sources (gene fusion, homology, microarray, functional similarity, sequence domain, mirror tree, phylogenetic profiles, conserved coexpression, SVM-based protocol, subcellular localization, text mining) feeding into one integration step (naïve Bayes)]

Ta-Tsen Soong & B Rost, unpublished. © Ta-Tsen Soong, Columbia Univ
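A naïve Bayes integration treats the evidence sources as conditionally independent, so their likelihood ratios multiply onto the prior odds of interaction. The sketch below shows only that arithmetic; the prior and the likelihood ratios are made-up numbers, and `naive_bayes_posterior` is a hypothetical name, not the lecture's code.

```python
import math


def naive_bayes_posterior(prior, likelihood_ratios):
    """Posterior probability of interaction from independent evidence sources.

    prior: P(interaction) before seeing evidence.
    likelihood_ratios: one value per source,
        P(evidence | interaction) / P(evidence | no interaction).
    """
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)  # independence: log ratios add up
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical pair: three sources, two supporting and one slightly against.
posterior = naive_bayes_posterior(prior=0.01, likelihood_ratios=[10.0, 5.0, 0.8])
no_evidence = naive_bayes_posterior(prior=0.01, likelihood_ratios=[])
print(round(posterior, 4), round(no_evidence, 4))
```

Working in log odds avoids numerical underflow when many sources are combined; a source with no data for a pair can simply be skipped (likelihood ratio 1).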

Integrative PPI prediction

[Figure: area under the ROC curve at FPR < 0.01 for each evidence source and for the combination; panels: YEAST, HUMAN]

A Rzhetsky et al. (2004) Text mining: GeneWays. JBI; R Nair & B Rost (2005) LocTree. JMB
© Ta-Tsen Soong, Columbia Univ

• all better than random (0.005)
• combination best
• major contributions: GO, text mining, SVM
• at low FPR: homology, gene fusion, domain interaction
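One reading of the "random (0.005)" baseline: the area under the diagonal up to FPR = 0.01 is 0.01 × 0.01 / 2 = 0.00005, and dividing by the FPR cutoff gives 0.005. The normalization by the cutoff is an assumption (the slides do not state it), and `normalized_partial_auc` is a hypothetical helper that works on ROC points sorted by increasing FPR.

```python
def normalized_partial_auc(fpr, tpr, max_fpr=0.01):
    """Trapezoidal area under the ROC curve for FPR <= max_fpr, divided by max_fpr."""
    area = 0.0
    for i in range(1, len(fpr)):
        x0, x1 = fpr[i - 1], fpr[i]
        y0, y1 = tpr[i - 1], tpr[i]
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at the cutoff by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / max_fpr


# A random predictor follows the diagonal of the ROC plot.
random_score = normalized_partial_auc([0.0, 1.0], [0.0, 1.0])
# A made-up predictor that recovers half of the positives at FPR = 0.005.
better_score = normalized_partial_auc([0.0, 0.005, 1.0], [0.0, 0.5, 1.0])
print(random_score, better_score)
```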

Data coverage

[Table not recovered; footnotes:]
1 GPL570 for human, GPL90 for yeast
2 Predictions made with LocTree (Nair and Rost, 2005). Experimental annotations taken from SWISS-PROT
3 Annotations taken from the GeneWays database (Rzhetsky et al., 2004)

© Ta-Tsen Soong, Columbia Univ

PPI through array data?

Microarray data

cDNA microarrays measure gene expression in a high-throughput (ht) manner

[Figure: workflow for cancer cells and normal cells: RNA isolation (mRNA) -> reverse transcriptase and labeling (cDNA) -> hybridization to microarray -> expression level readout]

© Ta-Tsen Soong, Columbia Univ

High-throughput technologies

Yeast two-hybrid system
• Interaction type: transient, binary
• Takes place in the nucleus
• Shortcomings: folding, localization, post-translational modification

Affinity purification with mass spectrometry (AP-MS)
• Interaction type: protein complex membership
• Takes place in the native cellular environment
• Shortcomings: affinity tag interference, purification, sticky proteins, no details about pairwise binding

© Ta-Tsen Soong, Columbia Univ

Microarrays

Large amount of data available
• Human: ~137,000 samples in GEO microarray database (Barrett, T. et al. 2007. NAR)
• 18 organisms with > 1000 samples

mRNA level correlates with protein abundance (r = .57) (Ghaemmaghami et al. 2003. Nature)

PPI prediction from microarrays
• Correlation of expression patterns
• Difficult to predict physical PPIs from microarray data

[Figure: microarray coexpression (Pearson correlation); series: stable, permanent protein complexes; transient, direct, physical PPIs]

R Jansen et al. & M Gerstein (2002) Genome Research
© Ta-Tsen Soong, Thesis Defense (2009), Columbia Univ.
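Coexpression as used here is the Pearson correlation between the expression vectors of two genes across the same microarray samples. A minimal sketch with invented expression values (gene names and numbers are assumptions for the example):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two expression vectors."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression levels of three genes over five samples.
expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene2": [2.1, 3.9, 6.2, 8.0, 10.1],  # rises with gene1
    "gene3": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as gene1 rises
}

r_12 = pearson(expression["gene1"], expression["gene2"])
r_13 = pearson(expression["gene1"], expression["gene3"])
print(round(r_12, 3), round(r_13, 3))
```

A high correlation suggests functional association (for example, membership in the same complex), which is not the same as direct physical contact.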

Experiments

Yeast S. cerevisiae

Interactions:
• 5299 interactions from DIP (Salwinski et al. 2004. NAR)
• 5299 random protein pairs (Ben-Hur et al. 2005. Bioinformatics)

Microarrays:
• 349 microarrays from GEO database (Barrett et al. 2007. NAR)
• Remove noise and extract underlying biological processes

Compare our protocol with correlation-based predictions
• Cross validation
• Genome wide analysis

© Ta-Tsen Soong, Columbia Univ

Microarray expression reveals functional associations

Physical protein–protein interactions predicted from microarrays*

*Soong, TT, Wrzeszczynski, KO, Rost, B. (2008) Bioinformatics. © Ta-Tsen Soong, Columbia Univ

Association vs. Interaction

[Figure: gp120, CD4, antibody-1, antibody-2; network of proteins A to G]

7 physical PPI: AB, BC, CD, DE, DF, EF, FG
7*6/2 = 21 associations
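The counts for the seven-protein example can be checked directly: all unordered pairs among A to G are associations, and only the listed edges are physical interactions. A short sketch of that arithmetic:

```python
from itertools import combinations

proteins = ["A", "B", "C", "D", "E", "F", "G"]
# Physical interactions exactly as listed for the example network.
physical = {frozenset(pair) for pair in ["AB", "BC", "CD", "DE", "DF", "EF", "FG"]}

# Every unordered pair of proteins in the complex counts as an association.
associations = [frozenset(pair) for pair in combinations(proteins, 2)]
# Associations that are not backed by direct physical contact.
indirect = [pair for pair in associations if pair not in physical]

print(len(associations), len(physical), len(indirect))
```

So two thirds of the associated pairs in this toy network (14 of 21) are not in direct contact.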

Most associated proteins are not in direct physical contact.
Our goal: predict physical interactions from microarray data.

Two components of method

• PCA to group the microarray experiments (noise reduction)
• SVM to separate association and physical interaction

© Ta-Tsen Soong, Columbia Univ

Step 1: PCA noise reduction

Remove noise and recover underlying biological processes
• Principal Component Analysis (PCA): statistical technique (projection method)
  – Misra et al. (2002) Genome Research
  – Liebermeister (2002) Bioinformatics
  – Lee et al. (2003) Genome Biology

PCA components correspond to distinct biological processes

[Figure: genes × microarray samples -> PCA -> genes × PCA components (expression modes, eigenarrays), ranked by importance (eigenvalue)]

© Ta-Tsen Soong, Columbia Univ
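The projection step can be sketched with a singular value decomposition: each gene (row) gets a coordinate on each principal component, and components are ranked by the variance they explain. This is a generic PCA on synthetic data under the assumption that NumPy is available; `pca_components` is a hypothetical helper, not the preprocessing used in the lecture's experiments.

```python
import numpy as np


def pca_components(expression, n_components):
    """Project a genes x samples matrix onto its top principal components.

    Returns (scores, variances): scores is genes x n_components,
    variances holds the variance explained by each kept component.
    """
    # Center every sample (column) so components describe variation, not offsets.
    centered = expression - expression.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    variances = (s ** 2) / (expression.shape[0] - 1)
    return scores, variances[:n_components]


# Synthetic data: 50 genes driven by 2 hidden processes, measured on 10 arrays.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 10))
expression = hidden @ mixing + 0.01 * rng.normal(size=(50, 10))

scores, variances = pca_components(expression, n_components=3)
print(scores.shape, variances)
```

With two hidden processes, the first two components carry the signal and later ones mostly carry noise, which is the sense in which keeping only the top components reduces noise.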

Step 2: SVM physical vs. associated

Learn PPIs from PCA components with SVM
Vapnik, Statistical Learning Theory, 1998

[Figure: genes × top N PCA components (ranked by importance) give the protein features; outer product or concatenation gives the protein pairwise features; an SVM with a kernel function, trained on interaction and non-interaction pairs, classifies an unknown pair]

© Ta-Tsen Soong, Columbia Univ
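The two ways of turning per-protein features into pairwise features can be sketched as follows. Only the feature construction is shown; the SVM training itself is omitted. The function name `pair_features` and the toy vectors are assumptions for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def pair_features(feat_a, feat_b, mode="concat"):
    """Build one feature vector for a protein pair from two per-protein vectors."""
    feat_a = np.asarray(feat_a, dtype=float)
    feat_b = np.asarray(feat_b, dtype=float)
    if mode == "concat":
        # 2N values; depends on the order of the two proteins.
        return np.concatenate([feat_a, feat_b])
    if mode == "outer":
        # N*N values; every component of A multiplied by every component of B.
        return np.outer(feat_a, feat_b).ravel()
    raise ValueError("mode must be 'concat' or 'outer'")


# Hypothetical coordinates of two proteins on the top 3 PCA components.
protein_a = [1.0, 2.0, 3.0]
protein_b = [4.0, 5.0, 6.0]

concat = pair_features(protein_a, protein_b, mode="concat")
outer = pair_features(protein_a, protein_b, mode="outer")
print(concat.shape, outer.shape)
```

Because concatenation is order dependent, a pair (A, B) and its swap (B, A) give different vectors; one common workaround is to train on both orders.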

SVM provided better prediction than correlation

• Implemented the correlation-based method as a Bayes model.
• Bayes (correlation) performed slightly better than random (green vs. diagonal).
• A small number of PCA components performed better than Bayes (e.g. SVM20 > Bayes).
• Performance increases with more input PCA components and reaches its maximum at ~150 (SVM150 > SVM50 > SVM20).
• SVM provided performance improvement (SVMAllMA > Bayes).

© Ta-Tsen Soong, Columbia Univ

PCA components improve SVM

• Compared SVM performance with increasing PCA components (red) to using randomly selected microarrays (green) as input.
• PCA components provide a more distinct representation of gene activity.

[Figure: area under the ROC curve for SVM with PCA components vs. SVM with microarrays; panels: A. FPR<0.05, B. Entire ROC]

© Ta-Tsen Soong, Columbia Univ

Prediction score indicative of network distance

• Predicted interaction score for all protein pairs in the DIP network and plotted against network distance.
• SVM score is significantly more correlated with network distance than Bayes is (p<<.05).
• Potential use of SVM score to help functional prediction in a network context.

[Figure: prediction score vs. network distance; panels: SVM (r = .29), Bayes (r = .04)]

© Ta-Tsen Soong, Columbia Univ
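Network distance is taken here to mean the number of edges on the shortest path between two proteins in the interaction network (an assumption; the slides do not define it). A breadth-first search sketch on the seven-protein toy network from the "Association vs. Interaction" example:

```python
from collections import deque


def network_distance(edges, source, target):
    """Shortest-path length (in edges) between two nodes, or None if unreachable."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
         ("D", "F"), ("E", "F"), ("F", "G")]

d_ab = network_distance(edges, "A", "B")
d_ag = network_distance(edges, "A", "G")
d_missing = network_distance(edges, "A", "Z")
print(d_ab, d_ag, d_missing)
```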

Predictions confirmed by experimental annotations

• SVM in general has more predictions confirmed by BioGRID*.
• SVM also predicted other types of interactions (e.g. genetic).
• Big difference between the two affinity purification methods.

© Ta-Tsen Soong, Columbia Univ

Promising predictions by the SVM

8% of top predictions share specific Gene Ontology annotations, suggesting biologically plausible interactions, while only 2% are expected by chance.

Examples from literature:
• POB3_YEAST (YML06W) and CTK3_YEAST (YML11W)
  – Both interact with RNA pol II and are involved in chromatin-modulated transcription functions
  – Suggested role in regulation of FACT via the Ctk kinase complex (Singer and Johnston. 2004. Biochem Cell Biology. 82:419-427; Wood et al. 2007. Mol Cell Biol. 27:709-720)
• SEC27_YEAST (YGL137W) and GCS1_YEAST (YDL226C)
  – Implicated through E-MAP experiments (Schuldiner et al. 2005. Cell. 123:507-519)
  – Sec27p is a coatomer subunit and is known to bind the di-lysine motif critical to retrograde transport of proteins from the Golgi to the ER.
  – Gcs1p contains the di-lysine motif and also acts as a mediator in the secretory pathway, suggesting a plausible interaction.

© Ta-Tsen Soong, Columbia Univ

Microarray data can predict physical interactions

T-t Soong, K Wrzeszczynski & B Rost 2008 Bioinformatics: 2608-14

[Figure: gp120, CD4, antibody-1, antibody-2]

A->B->C->D: 6 possible, 3 true

IV.7 Protein interactions: PPI - PiNat

PiNat (Protein Interaction Network analysis tool)

Y Ofran et al. & Rost 2006 Bioinformatics 22:e402-7

Protein-protein interactions across compartments

[Figure: matrix of interactions between compartments; rows and columns: Extra-cellular, Cytoplasm, Organelles, Mitochondria, Nuclear, TM (transmembrane)]

PiNat view of Alzheimers

[Figure: PiNat network view; labeled nodes: Q9P2H0, ADD]

Y Ofran, G Yachdav, E Mozes, T Soong, R Nair, B Rost 2006 Bioinformatics 22:e402-7

IV.8 Protein interactions: Protein-DNA interactions

PPI interfaces use local segments

Y Ofran & B Rost (2003) FEBS Lett 544, 236-9

Data: protein-DNA interaction

• 291 protein-DNA complexes from PDB
• 250 chains bind DNA
• 46,000 residues
• Trevor Siggers / Barry Honig

Impressively accurate

Y Ofran & B Rost (2004) unpublished

Very accurate prediction of DNA binding

Y Ofran & B Rost (2004) in preparation

Most predictions are discoveries!

Y Ofran & B Rost (2004) in preparation

Future DNA/RNA-binding

• Consolidate
• Proteomes
• DNA/RNA
• DNA-binding and membrane insertion
• Experimental verification of new motifs
• Discover unknown DNA-binders in regulatory complexes:
  – Transcription factor X
  – Find all proteins Y implicated with X that are:
    not known to bind DNA/RNA
    predicted by our method

T Agalioti, G Chen, D Thanos (2002) Cell 111, 381-92
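The search for unknown DNA-binders described above is a set operation: start from the proteins implicated with transcription factor X, drop those already known to bind DNA/RNA, and keep those the predictor flags. The sketch uses placeholder identifiers (Y1 to Y5) and a hypothetical helper name; none of them come from the lecture.

```python
def candidate_binders(partners_of_x, known_binders, predicted_binders):
    """Proteins implicated with X, not known to bind DNA/RNA, yet predicted to bind."""
    unknown = set(partners_of_x) - set(known_binders)
    return sorted(unknown & set(predicted_binders))


# Placeholder identifiers for proteins implicated with transcription factor X.
partners_of_x = ["Y1", "Y2", "Y3", "Y4", "Y5"]
known_binders = ["Y1", "Y4"]               # annotated as DNA/RNA-binding
predicted_binders = ["Y1", "Y2", "Y5", "Y9"]  # flagged by the prediction method

candidates = candidate_binders(partners_of_x, known_binders, predicted_binders)
print(candidates)
```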

Most predictions new!

Increasing accuracy for subset

Y Ofran & B Rost (2004) unpublished

DNA/RNA motif-discovery engine

[Figure: two bar charts. Panel 1: known binding motifs, "Always predicted" vs. "Random" (axis 0 to 16). Panel 2: possible motif (>10), "Always predicted" vs. "Random" (axis 0 to 18)]

Y Ofran, V Mysore, R Nair & B Rost (2004) unpublished

How many known motifs picked up?

[Figure: bar chart of known binding motifs, "Always predicted" vs. "Random" (axis 0 to 16)]

Y Ofran & B Rost (2004) unpublished

How many new motifs discovered?

[Figure: bar chart of possible motifs (>10), "Always predicted" vs. "Random" (axis 0 to 18); a second axis runs from 0 to 90]

Y Ofran & B Rost (2004) unpublished

CAFA: Critical Assessment of protein Function Annotation

CAFA: SIG meeting @ ISMB/ECCB 2011

Iddo Friedberg, Miami University, Oxford OH
Predrag Radivojac, Indiana University, Bloomington IN

CAFA data

60

September 15, 2010Sequences released (48,298)

Molecular Function

Biological Process

a Timeline

b Target Counts c Functional Terms

January 18, 2011Submission deadline

September 21, 2011Target set defined (762)

0

1

2

4

8

16

32

64

128

256

1

12

2

3

3

Molecular Function

Biological Process

Total

Prediction Phase Target Accumulation Phase

CAFA: P Radivojac et al. & I Friedberg (2012) in submissionMonday February 6, 2012

Page 70

CAFA: top performers

[Figure: four panels]
A: maximum F-measure, Molecular Function (axis 0.2-0.6)
B: maximum F-measure, Biological Process (axis 0.2-0.6)
C: semantic distance, Molecular Function (axis 4-9)
D: semantic distance, Biological Process (axis 4-9)

CAFA: P Radivojac et al. & I Friedberg (2012) in submission
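
The "maximum F-measure" named in panels A and B can be illustrated with a minimal sketch. It assumes the protein-centric definition commonly used for CAFA: precision is averaged over proteins with at least one prediction at or above the score threshold, recall is averaged over all targets, and the best F-measure over all thresholds is reported. The function name `f_max`, the toy GO identifiers, and the threshold grid are illustrative assumptions, not taken from the slides; the term sets are assumed to be already propagated to ancestor terms.

```python
from typing import Dict, Set


def f_max(predictions: Dict[str, Dict[str, float]],
          truth: Dict[str, Set[str]],
          steps: int = 100) -> float:
    """Protein-centric maximum F-measure over a grid of score thresholds.

    predictions: protein -> {term: score in [0, 1]}  (hypothetical layout)
    truth:       protein -> non-empty set of experimentally annotated terms
    """
    best = 0.0
    for i in range(1, steps + 1):
        t = i / steps
        precisions = []   # only proteins with at least one prediction >= t
        recall_sum = 0.0  # summed over all target proteins
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, s in scored.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # no protein has a prediction at this threshold
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Toy data (made up for illustration only)
toy_truth = {"P1": {"GO:a", "GO:b"}, "P2": {"GO:c"}}
toy_predictions = {
    "P1": {"GO:a": 0.9, "GO:b": 0.4, "GO:x": 0.3},
    "P2": {"GO:c": 0.8, "GO:y": 0.6},
}
print(round(f_max(toy_predictions, toy_truth), 3))
```

Averaging precision only over proteins that received a prediction means a method is not penalized in precision for abstaining at high thresholds, while recall still counts every target.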

Page 71

CAFA homology-based inference

CAFA: T Hamp et al. (2012) submitted

Page 72

CAFA ranking (homology only)

               BPO                          MFO
               Top-20  Threshold  Leaf      Top-20  Threshold  Leaf
Priors            8        8       11          7        6       11
Priors'          10       10       10         10       10        6
BLAST             9        9        9          6        9       10
GOtcha            6        6        8          2        3        9
Student A         5        5        5          8        7        5
Student A'        3        4        4          5        5        2
Student B        11       11        7         11       11        7
Student B'        2        2        1          3        4        1
Student C         7        7        6          9        8        8
Student C'        4        3        3          4        2        4
MetaStudent'      1        1        2          1        1        3

CAFA: T Hamp et al. (2012) submitted

Page 74

Lecture plan

01: 2011/10/19: welcome: who we are
02: 2011/10/21: individualized medicine
03: 2011/10/26: Intro - function 1: concepts
04: 2011/10/28: ?
05: 2011/11/02: FVV (Student plenary meeting)
06: 2011/11/04: ?
07: 2011/11/09: Intro - function 2: homology
08: 2011/11/11: Intro - function 3: motifs
09: 2011/11/16: Andrea Schafferhans: Docking
10: 2011/11/18: Andrea Schafferhans: 3D function prediction
11: 2011/11/23: Localization 1
12: 2011/11/25: Localization 2
13: 2011/11/30: Marc Offman: Flexibility 1
14: 2011/12/02: Marc Offman: Flexibility 2
15: 2011/12/07: Bioinfo & Industry + Localization 3
16: 2011/12/09: Localization 3
17: 2011/12/14: skip
18: 2011/12/16: Localization 4: Tatyana Goldberg
19-20: no lectures (2011/12/21 - 2011/12/23)
21-24: no lectures - winter break (2011/12/21 - 2012/01/06)
25: 2012/01/11: SNP effect 1
26: 2012/01/13: SNP effect 2
27: 2012/01/18: SNP effect 3 / Protein-protein interaction 1
28: 2012/01/20: Protein-protein interaction 2
29: 2012/01/25: Marco De Vivo (ISS Geneva)
30: 2012/01/27: Marco Punta (Pfam)
31: 2012/02/01: Protein-DNA interaction 1
32: 2012/02/03: Protein-DNA interaction 2

Page 75

© Burkhard Rost (ISCB President)

ISMB 2012 Long Beach Jul 15-17

ISCB Conferences Director: Steven Leard, Marketwhys Corp.

Honorary Chair: Sydney Brenner, UCSD, USA

Burkhard Rost, TUM Munich, Germany & Columbia Univ, USA

Terry Gaasterland, UCSD, USA

SC Co-Chairs

Rick Lathrop, UC Irvine, USA

Page 77

Key Submission Deadlines

• Special Interest Groups: Oct 7 2011
• Special Sessions: Oct 21
• Proceedings Papers: Jan 13 2012
• Workshops: Feb 10
• Highlights Papers: Mar 2
• Late Breaking: Mar 9
• Posters: Mar 16
• Travel Fellowship Applications: Apr 13
• Technology Track: Apr 20
• Late Posters: Apr 20
• Student Council Symposium: May 4

google “ismb 2012” for details