Inferring strengths of protein-protein interactions from experimental data using linear programming

Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University


Page 1: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Inferring strengths of protein-protein interactions from experimental data using linear programming

Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu

Bioinformatics Center, Kyoto University

Page 2: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Overview

Background
Probabilistic model
Related work
Biological experimental data
Proposed methods
  For binary data
  For numerical data
Results of computational experiments
Conclusion

Page 3: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Background (1/3)

Understanding protein-protein interactions is useful for understanding protein functions.

Transcription factors
  Proteins interact with a factor.
  Regulate the gene.

Receptors, etc.

Page 4: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Background (2/3)

Various methods were developed for inference of protein-protein interactions.

Gene fusion/Rosetta stone (Enright et al. and Marcotte et al. 1999)
  Number of possible genes to be applied is limited.

Molecular dynamics
  Long CPU time
  Difficult to predict precisely

Page 5: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Background (3/3)

A model based on domain-domain interactions has been proposed.

Use domains defined by databases like InterPro or Pfam.

[Figure: protein diagram with labeled domains]


Page 7: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Probabilistic model of interaction (1/2)

Model (Deng et al., 2002)
  Two proteins interact if and only if at least one pair of domains interacts.
  Interactions between domains are independent events.

[Figure: proteins P1 and P2 and their domains, labeled D1, D2, D3 and D2, D4]

Page 8: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Probabilistic model of interaction (2/2)

Pij = 1: Proteins Pi and Pj interact
Dmn = 1: Domains Dm and Dn interact
Dmn ∈ Pij: Domain pair (Dm, Dn) is included in protein pair Pi × Pj
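Under the two model assumptions (a protein pair interacts when at least one domain pair interacts, and domain-domain interactions are independent), the probability of a protein-protein interaction is Pr(Pij=1) = 1 − ∏ (1 − Pr(Dmn=1)), with the product taken over the domain pairs contained in Pi × Pj. A minimal Python sketch of this computation follows. The function name, the dictionary layout, the treatment of domain pairs as unordered, and the example probabilities are illustrative assumptions, not taken from the slides.

```python
from itertools import product


def interaction_probability(domains_i, domains_j, domain_pair_prob):
    """Return Pr(Pij=1) = 1 - prod(1 - Pr(Dmn=1)) over domain pairs in Pi x Pj.

    domain_pair_prob maps frozenset({Dm, Dn}) -> Pr(Dmn=1); domain pairs are
    treated as unordered (an assumption), and missing pairs count as 0.
    """
    no_interaction = 1.0
    for dm, dn in product(set(domains_i), set(domains_j)):
        key = frozenset((dm, dn))
        no_interaction *= 1.0 - domain_pair_prob.get(key, 0.0)
    return 1.0 - no_interaction


# Hypothetical domain compositions and probabilities, for illustration only.
p1_domains = ["D1", "D2", "D3"]
p2_domains = ["D2", "D4"]
example_probs = {
    frozenset(("D1", "D4")): 0.5,
    frozenset(("D3", "D2")): 0.2,
}
print(interaction_probability(p1_domains, p2_domains, example_probs))
```

With these example values the two non-zero domain pairs give 1 − (0.5 × 0.8), so the printed value is approximately 0.6.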

Page 9: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Overview

Background
Probabilistic model
Related work
  Association method (Sprinzak et al., 2001)
  EM method (Deng et al., 2002)
Biological experimental data
Proposed methods
Results of computational experiments
Conclusion

Page 10: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Related work

INPUT:
  interacting protein pairs (positive examples)
  non-interacting protein pairs (negative examples)

OUTPUT:
  Pr(Dmn=1) for all domain pairs

Page 11: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Association method (Sprinzak et al., 2001)

Inference of probabilities of domain-domain interactions using ratios of frequencies:

Pr(Dmn=1) = Imn / Nmn

Imn: Number of interacting protein pairs that include (Dm, Dn)

Nmn: Number of protein pairs that include (Dm, Dn)
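The frequency ratio can be sketched in a few lines of Python: for each domain pair, count the protein pairs that contain it and the interacting protein pairs that contain it, then divide. The function name, the input layout, and the toy data below are hypothetical; domain pairs are assumed to be unordered.

```python
from collections import Counter
from itertools import product


def association_scores(protein_domains, pairs, interacting):
    """Estimate Pr(Dmn=1) as a ratio of frequencies.

    protein_domains: dict protein -> list of domains
    pairs:           list of (Pi, Pj) protein pairs (positive and negative)
    interacting:     set of (Pi, Pj) pairs observed to interact
    """
    n_total = Counter()        # protein pairs that include (Dm, Dn)
    n_interacting = Counter()  # interacting protein pairs that include (Dm, Dn)
    for pi, pj in pairs:
        domain_pairs = {
            frozenset(dp)
            for dp in product(protein_domains[pi], protein_domains[pj])
        }
        for dp in domain_pairs:
            n_total[dp] += 1
            if (pi, pj) in interacting:
                n_interacting[dp] += 1
    return {dp: n_interacting[dp] / n_total[dp] for dp in n_total}


# Toy data, for illustration only.
toy_domains = {"P1": ["D1"], "P2": ["D2"], "P3": ["D1", "D3"]}
toy_pairs = [("P1", "P2"), ("P3", "P2")]
toy_interacting = {("P1", "P2")}
toy_scores = association_scores(toy_domains, toy_pairs, toy_interacting)
print(toy_scores[frozenset(("D1", "D2"))])  # one of two pairs interacts
```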

Page 12: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

EM method (Deng et al., 2002)

Probability (likelihood L) that experimental data {Oij={0,1}} are observed.

Use EM algorithm in order to (locally) maximize L.

Estimate Pr(Dmn=1)
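The likelihood itself is not preserved in this transcript. As a sketch only, and assuming the observation model commonly attributed to Deng et al. (2002), in which experiments have a false positive rate fp and a false negative rate fn (both symbols are assumptions that do not appear on the slide), it can be written as:

```latex
\[
L = \prod_{i,j} \Pr(O_{ij}=1)^{O_{ij}}\,\bigl(1-\Pr(O_{ij}=1)\bigr)^{1-O_{ij}},
\qquad
\Pr(O_{ij}=1) = \Pr(P_{ij}=1)\,(1-fn) + \bigl(1-\Pr(P_{ij}=1)\bigr)\,fp .
\]
```

Here Pr(Pij=1) depends on the parameters Pr(Dmn=1) through the probabilistic model, which is why the EM algorithm is used to search for a local maximum of L.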


Page 14: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Biological experimental data

Related methods (Association and EM) use only binary data (interact or not).

Experimental data using yeast two-hybrid
  Ito et al. (2000, 2001)
  Uetz et al. (2001)

For many protein pairs, different results (Oij = {0,1}) were observed.

We developed new methods using raw numerical data.

Page 15: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Numerical data

Ito et al. (2000, 2001)
  For each protein pair, experiments were performed multiple times.
  IST (Interaction Sequence Tag): number of observed interactions
  By using a threshold, we obtain binary data.
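Thresholding IST counts into binary labels can be sketched as below. The threshold value, the use of "greater than or equal", and the example counts are assumptions for illustration; the slides do not state them.

```python
def binarize_ist(ist_counts, threshold):
    """Map each protein pair's IST count to 1 (interact) or 0 (do not interact).

    A pair is labeled 1 when its count is at least `threshold`
    (the >= comparison is an assumption).
    """
    return {pair: int(count >= threshold) for pair, count in ist_counts.items()}


# Hypothetical IST counts and threshold.
example_counts = {("P1", "P2"): 5, ("P1", "P3"): 1}
example_labels = binarize_ist(example_counts, threshold=3)
print(example_labels)
```

The numerical information (5 versus 1 observations) is lost after this step, which is the motivation for methods that use the raw counts directly.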


Page 17: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Proposed methods

It seems difficult to modify the EM method for numerical data.

Linear Programming

For binary data
  LPBN
  Combined methods
    LPEM
    EMLP
  SVM-based method

For numerical data
  ASNM
  LPNM


Page 19: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

LPBN (LP-based method) (1/2)

Transformation into linear inequalities

Pi and Pj interact
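The inequalities from this slide are not preserved in the transcript. One way the transformation can be carried out, assuming a decision threshold Θ on Pr(Pij=1) (the symbols Θ and γ are assumptions), is to take logarithms of the product in the probabilistic model:

```latex
\[
\Pr(P_{ij}=1) \ge \Theta
\;\Longleftrightarrow\;
\prod_{D_{mn}\in P_{ij}} \bigl(1-\Pr(D_{mn}=1)\bigr) \le 1-\Theta
\;\Longleftrightarrow\;
\sum_{D_{mn}\in P_{ij}} \gamma_{mn} \le \ln(1-\Theta),
\qquad
\gamma_{mn} = \ln\bigl(1-\Pr(D_{mn}=1)\bigr) \le 0 .
\]
```

The last inequality is linear in the variables γmn, provided Pr(Dmn=1) < 1 so that the logarithm is finite.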

Page 20: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

LPBN (LP-based method) (2/2)

Linear programming for inference of protein-protein interactions
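The linear program on this slide is not preserved in the transcript. A plausible soft-margin form, given here only as a sketch, uses variables γmn = ln(1 − Pr(Dmn=1)), a threshold Θ, and non-negative slack variables αij that absorb violated constraints; the objective and the slack variables are assumptions, not the authors' stated formulation:

```latex
\[
\begin{aligned}
\text{minimize}\quad & \sum_{i,j} \alpha_{ij} \\
\text{subject to}\quad
& \sum_{D_{mn}\in P_{ij}} \gamma_{mn} \le \ln(1-\Theta) + \alpha_{ij}
  && \text{for interacting pairs } (O_{ij}=1), \\
& \sum_{D_{mn}\in P_{ij}} \gamma_{mn} \ge \ln(1-\Theta) - \alpha_{ij}
  && \text{for non-interacting pairs } (O_{ij}=0), \\
& \gamma_{mn} \le 0, \qquad \alpha_{ij} \ge 0 .
\end{aligned}
\]
```

After solving, the domain-domain probabilities are recovered as Pr(Dmn=1) = 1 − exp(γmn).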

Page 21: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Combination of EM and LPBN

LPEM method
  Use the results of LPBN as initial parameter values for EM.

EMLP method
  Add constraints to LPBN, in the form of inequalities, so that LP solutions are close to EM solutions.

Page 22: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Simple SVM-based method

Feature vector

Simple linear kernel with
  Interacting pairs = Positive examples
  Non-interacting pairs = Negative examples
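The definition of the feature vector is not preserved in the transcript. One natural choice, assumed here purely for illustration, is a binary vector with one component per known domain pair, set to 1 when the protein pair contains that domain pair. The function name and the index layout are hypothetical.

```python
from itertools import product


def feature_vector(domains_i, domains_j, domain_pair_index):
    """Build a 0/1 vector with one entry per domain pair.

    domain_pair_index maps frozenset({Dm, Dn}) -> position in the vector.
    An entry is 1 when (Dm, Dn) occurs in Pi x Pj (assumed encoding).
    """
    vec = [0] * len(domain_pair_index)
    for dm, dn in product(domains_i, domains_j):
        idx = domain_pair_index.get(frozenset((dm, dn)))
        if idx is not None:
            vec[idx] = 1
    return vec


# Hypothetical index over three domain pairs.
pair_index = {
    frozenset(("D1", "D2")): 0,
    frozenset(("D1", "D4")): 1,
    frozenset(("D3", "D4")): 2,
}
example_vec = feature_vector(["D1"], ["D2", "D4"], pair_index)
print(example_vec)
```

Vectors built this way for interacting and non-interacting pairs could then be passed, with labels +1 and −1, to any linear-kernel SVM implementation.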


Page 24: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Strength of protein-protein interaction

For each protein pair, experiments were performed multiple times.

The ratio Kij / Mij can be considered as strength.

Kij: Number of observed interactions for a protein pair (Pi, Pj)

Mij: Number of experiments for (Pi, Pj)

Page 25: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

LPNM method (1/2)

Minimize the gap between Pr(Pij=1) and Kij / Mij using LP.

Page 26: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

LPNM method (2/2)

Linear programming for inference of strengths of protein-protein interactions
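The linear program on this slide is not preserved in the transcript. As a sketch under stated assumptions, write ρij = Kij / Mij for the observed strength and γmn = ln(1 − Pr(Dmn=1)); measuring the gap in log space (an assumption) keeps every constraint linear:

```latex
\[
\begin{aligned}
\text{minimize}\quad & \sum_{i,j} \alpha_{ij} \\
\text{subject to}\quad
& -\alpha_{ij} \;\le\; \sum_{D_{mn}\in P_{ij}} \gamma_{mn} - \ln(1-\rho_{ij}) \;\le\; \alpha_{ij}, \\
& \gamma_{mn} \le 0, \qquad \alpha_{ij} \ge 0 .
\end{aligned}
\]
```

This form requires ρij < 1 so that ln(1 − ρij) is finite; pairs with ρij = 1 would need to be capped slightly below 1 or handled separately.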

Page 27: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

ASNM

Modified Association method for numerical data

For binary data (Sprinzak et al., 2001)
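The ASNM formula is not preserved in the transcript. A plausible reading, assumed here, is that the count of interacting protein pairs in the Association method is replaced by the sum of observed strengths Kij / Mij over the protein pairs that include (Dm, Dn). The function name, input layout, and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def asnm_scores(protein_domains, observations):
    """Average the strengths Kij/Mij of protein pairs that include (Dm, Dn).

    protein_domains: dict protein -> list of domains
    observations:    dict (Pi, Pj) -> (Kij, Mij), with Mij > 0
    """
    strength_sum = defaultdict(float)
    pair_count = defaultdict(int)
    for (pi, pj), (k_ij, m_ij) in observations.items():
        rho = k_ij / m_ij  # observed interaction strength
        domain_pairs = {
            frozenset(dp)
            for dp in product(protein_domains[pi], protein_domains[pj])
        }
        for dp in domain_pairs:
            strength_sum[dp] += rho
            pair_count[dp] += 1
    return {dp: strength_sum[dp] / pair_count[dp] for dp in pair_count}


# Toy data, for illustration only.
asnm_domains = {"P1": ["D1"], "P2": ["D2"], "P3": ["D1"]}
asnm_obs = {("P1", "P2"): (3, 4), ("P3", "P2"): (1, 4)}
asnm_result = asnm_scores(asnm_domains, asnm_obs)
print(asnm_result[frozenset(("D1", "D2"))])  # (0.75 + 0.25) / 2
```

When every strength is 0 or 1, this reduces to the frequency ratio of the binary Association method.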


Page 29: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Computational experiments for binary data

DIP database (Xenarios et al., 2002)
  1767 protein pairs as positive
  2/3 of the pairs for training, 1/3 for test

Computational environment
  Xeon processor 2.8 GHz
  LP solver: loqo

Page 30: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Results on training data (binary data)

[Figure: results on training data for SVM, EM, LPBN, and Association]

Page 31: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Results on test data (binary data)

[Figure: results on test data for SVM, EM, EMLP, Association, and LPEM]

Page 32: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Computational experiments for numerical data

YIP database (Ito et al., 2000, 2001)
  IST (Interaction Sequence Tag)
  1586 protein pairs
  4/5 for training, 1/5 for test

Computational environment
  Xeon processor 2.8 GHz
  LP solver: lp_solve

Page 33: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Results on test data (numerical data)

[Figure: results on test data for ASNM, EM, LPNM, and Association]

Page 34: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Results on test data (numerical data)

LPNM is the best.

EM and Association methods classify Pr(Pij=1) into either 0 or 1.

              LPNM     ASNM     EM      ASSOC
Ave. Error    0.0308   0.0405   0.295   0.277
CPU (sec.)    1.20     0.0077   1.62    0.0088
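The slides do not define "Ave. Error". A reasonable assumption, used in the sketch below, is the mean absolute difference between the predicted Pr(Pij=1) and the observed strength Kij / Mij over the test pairs. The function name and the example values are hypothetical and unrelated to the reported results.

```python
def average_error(predicted, observed):
    """Mean absolute difference between predicted and observed strengths.

    predicted: dict (Pi, Pj) -> predicted Pr(Pij=1)
    observed:  dict (Pi, Pj) -> observed strength Kij/Mij
    Only pairs present in both dictionaries are compared.
    """
    pairs = predicted.keys() & observed.keys()
    if not pairs:
        raise ValueError("no protein pairs in common")
    return sum(abs(predicted[p] - observed[p]) for p in pairs) / len(pairs)


# Hypothetical predictions and observations.
example_pred = {("P1", "P2"): 0.5, ("P1", "P3"): 0.25}
example_obs = {("P1", "P2"): 0.75, ("P1", "P3"): 0.25}
print(average_error(example_pred, example_obs))
```

Under this definition, a method that outputs only values near 0 or 1 is penalized on pairs with intermediate strengths, which is consistent with the larger errors reported for EM and Association.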

Page 35: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Conclusion

We have defined a new problem to infer strengths of protein-protein interactions.

We have proposed LP-based methods.

For binary data
  LPBN, LPEM, EMLP
  SVM-based method

For numerical data
  ASNM
  LPNM

LPNM outperformed the other methods.

Page 36: Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Future work

Improve the methods to avoid overfitting.

Improve the probabilistic model to understand protein-protein interactions more accurately.