Inferring strengths of protein-protein interactions from experimental data using linear programming
Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu
Bioinformatics Center, Kyoto University
Overview
- Background
- Probabilistic model
- Related work
- Biological experimental data
- Proposed methods
  - For binary data
  - For numerical data
- Results of computational experiments
- Conclusion
Background (1/3)
- Understanding protein-protein interactions is useful for understanding protein functions.
  - Transcription factors: proteins interact with a factor and regulate the gene.
  - Receptors, etc.
Background (2/3)
- Various methods have been developed for inference of protein-protein interactions.
  - Gene fusion / Rosetta stone (Enright et al. and Marcotte et al., 1999)
    - The number of genes to which it can be applied is limited.
  - Molecular dynamics
    - Long CPU time
    - Difficult to predict precisely
Background (3/3)
- A model based on domain-domain interactions has been proposed.
- Use domains defined by databases such as InterPro or Pfam.
[Figure: diagram with two regions labeled "Domain"]
Probabilistic model of interaction (1/2)
- Model (Deng et al., 2002)
  - Two proteins interact if and only if at least one pair of their domains interacts.
  - Interactions between domains are independent events.
[Figure: proteins P1 and P2 and their domains D1, D2, D3, D4]
- Pij = 1: proteins Pi and Pj interact
- Dmn = 1: domains Dm and Dn interact
- (Dm, Dn) ∈ Pi × Pj: domain pair (Dm, Dn) is included in protein pair Pi × Pj
Probabilistic model of interaction (2/2)
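Under the two assumptions of the model (a protein pair interacts exactly when at least one of its domain pairs interacts, and domain-domain interactions are independent), the probability that Pi and Pj interact is Pr(Pij=1) = 1 − ∏ (1 − Pr(Dmn=1)), with the product taken over the domain pairs (Dm, Dn) contained in the protein pair. The sketch below evaluates that expression; the function name and the example probabilities are hypothetical.

```python
from math import prod


def protein_interaction_prob(domain_pair_probs):
    """Return Pr(Pij=1) = 1 - prod(1 - Pr(Dmn=1)) over the domain pairs of (Pi, Pj)."""
    return 1.0 - prod(1.0 - p for p in domain_pair_probs)


# Hypothetical values of Pr(Dmn=1) for the two domain pairs of one protein pair.
example = protein_interaction_prob([0.5, 0.2])
```

A protein pair with no domain pairs gets probability 0 under this expression, because the empty product is 1.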
Related work
- INPUT:
  - Interacting protein pairs (positive examples)
  - Non-interacting protein pairs (negative examples)
- OUTPUT: Pr(Dmn=1) for all domain pairs
Association method (Sprinzak et al., 2001)
- Infers probabilities of domain-domain interactions using ratios of frequencies.
- Pr(Dmn=1) is estimated as the ratio of:
  - the number of interacting protein pairs that include (Dm, Dn), to
  - the number of protein pairs that include (Dm, Dn).
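The frequency ratio can be sketched in a few lines. This is an illustration only: the function name, the data layout, and the choice to treat domain pairs as unordered are assumptions, and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def association_scores(protein_domains, pairs, interacting):
    """Estimate Pr(Dmn=1) as (#interacting protein pairs containing (Dm, Dn))
    divided by (#protein pairs containing (Dm, Dn))."""
    n_total = defaultdict(int)
    n_inter = defaultdict(int)
    for pi, pj in pairs:
        # Unordered domain pairs contained in this protein pair (an assumption).
        domain_pairs = {
            tuple(sorted(dp))
            for dp in product(protein_domains[pi], protein_domains[pj])
        }
        for dp in domain_pairs:
            n_total[dp] += 1
            if (pi, pj) in interacting:
                n_inter[dp] += 1
    return {dp: n_inter[dp] / n_total[dp] for dp in n_total}


# Hypothetical toy data.
protein_domains = {"P1": ["D1", "D2"], "P2": ["D3"], "P3": ["D2"]}
pairs = [("P1", "P2"), ("P3", "P2"), ("P1", "P3")]
interacting = {("P1", "P2")}
scores = association_scores(protein_domains, pairs, interacting)
```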
EM method (Deng et al., 2002)
- Define the probability (likelihood L) that the experimental data {Oij ∈ {0,1}} are observed.
- Use the EM algorithm to (locally) maximize L.
- Estimate Pr(Dmn=1).
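For binary observations, a likelihood of this kind is a product of Bernoulli terms, one per observed protein pair. The expression below is a sketch of that general form under this assumption, not the exact formula from the slide; how Pr(Oij=1) depends on Pr(Pij=1) (for example through experimental error rates) is left unspecified here.

```latex
L = \prod_{(i,j)} \Pr(O_{ij}=1)^{O_{ij}} \, \bigl(1 - \Pr(O_{ij}=1)\bigr)^{1 - O_{ij}}
```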
Biological experimental data
- Related methods (Association and EM) use only binary data (interact or not).
- Experimental data from yeast two-hybrid assays
  - Ito et al. (2000, 2001)
  - Uetz et al. (2001)
- For many protein pairs, different results (Oij ∈ {0,1}) were observed.
- We developed new methods that use the raw numerical data.
Numerical data
- Ito et al. (2000, 2001)
  - For each protein pair, experiments were performed multiple times.
  - IST (Interaction Sequence Tag): number of observed interactions
  - By applying a threshold, we obtain binary data.
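Thresholding the counts can be sketched as follows. The threshold value, the use of "greater than or equal", and the toy counts are assumptions made for illustration.

```python
def binarize_ist(ist_counts, threshold):
    """Map each protein pair's IST count to 1 if count >= threshold, else 0."""
    return {pair: int(count >= threshold) for pair, count in ist_counts.items()}


# Hypothetical IST counts for three protein pairs.
ist_counts = {("P1", "P2"): 5, ("P1", "P3"): 1, ("P2", "P3"): 3}
binary = binarize_ist(ist_counts, threshold=3)
```

Any such cutoff discards the difference between a pair observed 3 times and one observed 5 times, which is the information the numerical methods aim to keep.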
Proposed methods
- It seems difficult to modify the EM method for numerical data.
- Linear programming
- For binary data
  - LPBN
  - Combined methods: LPEM, EMLP
  - SVM-based method
- For numerical data
  - ASNM
  - LPNM
LPBN (LP-based method) (1/2)
- Transformation into linear inequalities
- Pi and Pj interact
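One way to obtain linear inequalities from the model is to take logarithms of its product form. Assuming a pair is called interacting when Pr(Pij=1) is at least some threshold Θ < 1, and that every Pr(Dmn=1) < 1 (Θ and the variable name α_mn are assumptions of this sketch), the condition becomes linear in α_mn:

```latex
\begin{align*}
\Pr(P_{ij}=1) \ge \Theta
&\iff 1 - \prod_{(D_m,D_n)\in P_{ij}} \bigl(1 - \Pr(D_{mn}=1)\bigr) \ge \Theta \\
&\iff \prod_{(D_m,D_n)\in P_{ij}} \bigl(1 - \Pr(D_{mn}=1)\bigr) \le 1 - \Theta \\
&\iff \sum_{(D_m,D_n)\in P_{ij}} \alpha_{mn} \le \ln(1 - \Theta),
\qquad \alpha_{mn} = \ln\bigl(1 - \Pr(D_{mn}=1)\bigr) \le 0 .
\end{align*}
```

For a non-interacting pair the inequality is reversed, which is again linear in α_mn.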
LPBN (LP-based method) (2/2)
- Linear programming for inference of protein-protein interactions
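A minimal sketch of how such a linear program could be assembled, assuming variables α_mn = ln(1 − Pr(Dmn=1)) ≤ 0, one slack variable per protein pair, and a hypothetical threshold Θ. The objective and constraints below are assumptions of this sketch and do not reproduce the formulation on the slide. The code only builds the data for "minimize c·x subject to A x ≤ b", which can then be handed to any LP solver.

```python
from itertools import product
from math import log


def build_lp(protein_domains, examples, theta=0.5):
    """Build (c, A, b, domain_pairs) for: minimize total slack s.t. A x <= b.

    examples maps a protein pair (Pi, Pj) to 1 (interacting) or 0 (not).
    x = [alpha for each domain pair] + [slack for each protein pair];
    the bounds alpha <= 0 and slack >= 0 are left to the solver call.
    """
    pair_dps = {
        pair: {tuple(sorted(dp))
               for dp in product(protein_domains[pair[0]], protein_domains[pair[1]])}
        for pair in examples
    }
    dps = sorted({dp for members in pair_dps.values() for dp in members})
    col = {dp: k for k, dp in enumerate(dps)}
    n_alpha, n_pairs = len(dps), len(examples)

    c = [0.0] * n_alpha + [1.0] * n_pairs  # objective: sum of slacks
    rhs = log(1.0 - theta)
    A, b = [], []
    for r, (pair, label) in enumerate(examples.items()):
        # Positive pair:  sum(alpha) - slack <=  ln(1 - theta)
        # Negative pair: -sum(alpha) - slack <= -ln(1 - theta)
        sign = 1.0 if label == 1 else -1.0
        row = [0.0] * (n_alpha + n_pairs)
        for dp in pair_dps[pair]:
            row[col[dp]] = sign
        row[n_alpha + r] = -1.0
        A.append(row)
        b.append(sign * rhs)
    return c, A, b, dps


# Hypothetical toy data: one positive and one negative example.
protein_domains = {"P1": ["D1"], "P2": ["D2"], "P3": ["D3"]}
examples = {("P1", "P2"): 1, ("P1", "P3"): 0}
c, A, b, dps = build_lp(protein_domains, examples)
```

The slack variables let the program stay feasible when the training examples contradict each other, at the cost of a larger objective value.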
Combination of EM and LPBN
- LPEM method
  - Use the results of LPBN as initial parameter values for EM.
- EMLP method
  - Add constraints (inequalities) to LPBN so that the LP solutions stay close to the EM solutions.
Simple SVM-based method
- Feature vector
- Simple linear kernel, with
  - Interacting pairs = positive examples
  - Non-interacting pairs = negative examples
Strength of protein-protein interaction
- For each protein pair, experiments were performed multiple times.
- The ratio Kij / Mij can be considered as the strength.
  - Kij: number of observed interactions for a protein pair (Pi, Pj)
  - Mij: number of experiments for (Pi, Pj)
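The ratio of observed interactions to experiments is a one-line computation. The function name and the counts below are hypothetical; pairs with no experiments are skipped to avoid dividing by zero.

```python
def interaction_strengths(k_counts, m_counts):
    """Return Kij / Mij for every protein pair with at least one experiment."""
    return {
        pair: k_counts.get(pair, 0) / m
        for pair, m in m_counts.items()
        if m > 0
    }


# Hypothetical counts: Kij observed interactions out of Mij experiments.
k = {("P1", "P2"): 3, ("P1", "P3"): 0}
m = {("P1", "P2"): 4, ("P1", "P3"): 4, ("P2", "P3"): 0}
strengths = interaction_strengths(k, m)
```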
LPNM method (1/2)
- Minimize the gap between Pr(Pij=1) and the observed strength Kij / Mij using LP.
LPNM method (2/2)
- Linear programming for inference of strengths of protein-protein interactions
ASNM
- Modified Association method for numerical data
- For binary data (Sprinzak et al., 2001)
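One natural way to extend the Association ratio to numerical data, assumed here rather than taken from the slide, is to replace the 0/1 interaction indicator of each protein pair with its strength Kij / Mij and average over the protein pairs that contain the domain pair. The function name and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def asnm_scores(protein_domains, strengths):
    """Average the strengths of all protein pairs that contain each domain pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (pi, pj), rho in strengths.items():
        # Unordered domain pairs contained in this protein pair (an assumption).
        domain_pairs = {
            tuple(sorted(dp))
            for dp in product(protein_domains[pi], protein_domains[pj])
        }
        for dp in domain_pairs:
            sums[dp] += rho
            counts[dp] += 1
    return {dp: sums[dp] / counts[dp] for dp in counts}


# Hypothetical strengths (Kij / Mij) for two protein pairs.
protein_domains = {"P1": ["D1", "D2"], "P2": ["D3"], "P3": ["D2"]}
strengths = {("P1", "P2"): 0.75, ("P3", "P2"): 0.25}
scores = asnm_scores(protein_domains, strengths)
```

When every strength is 0 or 1, this average equals the frequency ratio used for binary data.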
Computational experiments for binary data
- DIP database (Xenarios et al., 2002)
  - 1767 protein pairs as positive examples
  - 2/3 of the pairs for training, 1/3 for testing
- Computational environment
  - Xeon processor, 2.8 GHz
  - LP solver: loqo
Results on training data (binary data)
[Figure: comparison of SVM, EM, LPBN, and Association]
Results on test data (binary data)
[Figure: comparison of SVM, EM, EMLP, Association, and LPEM]
Computational experiments for numerical data
- YIP database (Ito et al., 2001, 2002)
  - IST (Interaction Sequence Tag)
  - 1586 protein pairs
  - 4/5 for training, 1/5 for testing
- Computational environment
  - Xeon processor, 2.8 GHz
  - LP solver: lp_solve
Results on test data (numerical data)
[Figure: comparison of ASNM, EM, LPNM, and Association]
Results on test data (numerical data)
- LPNM is the best (lowest average error).
- The EM and Association methods classify Pr(Pij=1) as either 0 or 1.

              LPNM     ASNM     EM      ASSOC
Ave. Error    0.0308   0.0405   0.295   0.277
CPU (sec.)    1.20     0.0077   1.62    0.0088
Conclusion
- We have defined a new problem: inferring strengths of protein-protein interactions.
- We have proposed LP-based methods.
  - For binary data
    - LPBN, LPEM, EMLP
    - SVM-based method
  - For numerical data
    - ASNM
    - LPNM
- LPNM outperformed the other methods.
- Future work
  - Improve the methods to avoid overfitting.
  - Improve the probabilistic model to understand protein-protein interactions more accurately.