Biocomputing Unit, Department of Biology, University of Bologna, Italy (www.biocomp.unibo.it). A neural network-based method for predicting protein stability changes upon single point mutations. Emidio Capriotti, Piero Fariselli and Rita Casadio.
A neural network-based method for predicting protein stability changes upon single point mutations
Emidio Capriotti, Piero Fariselli and Rita Casadio
Biocomputing Unit
Department of Biology,
University of Bologna,
Italy
www.biocomp.unibo.it
• Problem Definition
• The State of the Art
• Data Base
• Neural Network Predictor
• Results
• Comparison with other Methods
• I-Mutant
Problem Definition (I)
[Figure: native protein (residue A) and mutant protein (residue L).]
If we replace Alanine 35 with a Leucine (A35L), is the protein stability increased or decreased?
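[Figure: free energy diagram showing the unfolded (U) and folded (F) states of the native and mutant proteins, with $\Delta G_f^{nat}$ and $\Delta G_f^{mut}$.]
$\Delta G_f = G_u - G_f$
$\Delta\Delta G_f = \Delta G_f^{mut} - \Delta G_f^{nat}$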
Problem Definition (II)
The sign of $\Delta\Delta G_f$ identifies the direction of the stability change.
The sign is more informative than the magnitude $|\Delta\Delta G_f|$.
$\Delta\Delta G_f < 0$ => the mutation increases the protein stability
$\Delta\Delta G_f > 0$ => the mutation decreases the protein stability
Our neural networks are trained to predict the sign of the stability change.
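The sign convention above can be sketched as a small labelling function. This is only an illustration: the function name, the numeric values and the mutation E10A are hypothetical, and the handling of an exactly zero value is an assumption, since the slide does not discuss it.

```python
def stability_sign(ddg: float) -> str:
    """Map a free energy change to the class label used on the slide.

    Convention taken from the slide: a negative value means the mutation
    increases stability, a positive value means it decreases stability.
    """
    if ddg < 0:
        return "increases stability"
    if ddg > 0:
        return "decreases stability"
    return "no change"  # assumption: zero is not covered by the slide


# Hypothetical values in kcal/mol, for illustration only.
examples = {"A35L": -1.2, "E10A": 0.8}
labels = {name: stability_sign(value) for name, value in examples.items()}
print(labels)
```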
Energy-based predictive methods
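1) physical effective energy potentials (classical MM force fields)
$E = \sum \tfrac{1}{2} k_{s,ij} (r_{ij} - r_0)^2 + \sum \tfrac{1}{2} k_{b,ij} (\theta_{ij} - \theta_0)^2 + \dots$
2) statistical potentials
$E(i,j) = -kT \log f(i,j)$
3) empirical energies
$\Delta G = W_{vdw} \Delta G_{vdw} + W_{solv} \Delta G_{solv} + W_{sc} T \Delta S_{sc} + \dots$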
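As a rough illustration of the statistical potential listed above, the following sketch evaluates E = -kT log f for a single frequency value. It assumes that the logarithm is natural, that f is a relative frequency (observed over reference), and that energies are expressed in kcal/mol. The function name and the default temperature are hypothetical choices, not taken from the slides.

```python
import math

# Gas constant in kcal/(mol*K), used here in place of k to obtain molar units.
R_KCAL = 0.0019872


def statistical_energy(frequency: float, temperature_k: float = 298.15) -> float:
    """Return -kT * ln(f) for a relative contact frequency f > 0."""
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return -R_KCAL * temperature_k * math.log(frequency)


# A pair seen twice as often as the reference gets a favourable (negative) energy.
print(statistical_energy(2.0))
# A pair seen half as often gets an unfavourable (positive) energy.
print(statistical_energy(0.5))
```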
The State of the Art
[Figure: 2 x 2 diagram of + and - signs; two cells are labelled OK and two are labelled Over/Under-predictions.]
The Data Base
http://www.rtc.riken.go.jp/jouhou/Protherm/
ProTherm is a collection of numerical data of thermodynamic parameters, including Gibbs free energy change, enthalpy change, heat capacity change, transition temperature, etc., for wild-type and mutant proteins.
Total number of entries: 15379
Number of unique proteins: 471
Total number of all proteins: 668
Number of proteins with mutants: 195
Number of single mutations: 7586
Number of double mutations: 1192
Number of multiple mutations: 563
Number of wild type: 6038
Gromiha et al. (2000). Nucleic Acids Res. 28, 283-285
Training/testing Data set (I)
The data set of proteins was extracted from ProTherm, with the following constraints:
i) the $\Delta\Delta G$ value was experimentally determined and reported in the data base;
ii) the protein structure is known at atomic resolution and deposited in the PDB (Berman et al., 2000);
iii) the data refer to single mutations (no multiple mutations have been taken into account).
After this filtering procedure, we ended up with two data sets:
S1615: 1615 different single mutations
S388: 388 mutations, restricted to experiments performed at physiological conditions (T 20-40 °C, pH 6-8)
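The filtering constraints above can be sketched as two predicates over database records. This is a minimal sketch, not the authors' procedure: the `Record` fields, the example records and the PDB identifier `1ABC` are hypothetical placeholders, and it assumes that the physiological-condition set is drawn from records that also satisfy constraints i) to iii).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    mutation: str
    ddg: Optional[float]      # experimental value, None if not reported
    pdb_id: Optional[str]     # None if no structure is deposited
    n_mutations: int          # 1 for a single point mutation
    temperature_c: float
    ph: float


def passes_base_filter(r: Record) -> bool:
    """Constraints i) to iii): measured value, known structure, single mutation."""
    return r.ddg is not None and r.pdb_id is not None and r.n_mutations == 1


def is_physiological(r: Record) -> bool:
    """Conditions quoted for S388: T 20-40 degrees C and pH 6-8."""
    return 20 <= r.temperature_c <= 40 and 6 <= r.ph <= 8


# Hypothetical records, for illustration only.
records = [
    Record("A35L", -1.2, "1ABC", 1, 25.0, 7.0),
    Record("E10A", 0.8, "1ABC", 1, 60.0, 3.0),
    Record("G5V", None, "1ABC", 1, 25.0, 7.0),
    Record("A35L+E10A", 0.3, "1ABC", 2, 25.0, 7.0),
]
base = [r for r in records if passes_base_filter(r)]
physiological = [r for r in base if is_physiological(r)]
print(len(base), len(physiological))
```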
Training/testing Data set (II)
[Figure: data sets S388 and S1615.]
Neural Network Predictor (I)
• N1: a 20-element vector that describes the amino acid mutation, plus two inputs for pH and temperature T (22 inputs in total)
• N2: adds to the N1 input one more neuron for the relative solvent accessibility of the mutated residue (23 inputs in total)
• N3: adds to N2 20 more input neurons (43 in total) encoding the three-dimensional residue environment
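[Figure: networks N1 and N2. The mutation input is a vector over the residue alphabet A C D E F G H I K L M N P Q R S T V W Y, shown for the mutation E->A with the entries 1 and -1 marked. N1 also receives pH and T; N2 adds the relative solvent accessibility.]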
Neural Network Predictor (II)
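[Figure: network N3. For the mutation E->A, the residues found within a given radius of the mutated residue (the environment) are encoded as a second vector over the residue alphabet A C D E F G H I K L M N P Q R S T V W Y.]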
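The three input layouts described above can be sketched as a single encoding function. Several details are assumptions rather than statements from the slides: the value -1 is assigned to the native residue and 1 to the new residue, the environment is encoded as a count of each residue type within the radius, the inputs are concatenated in the order shown, and no scaling is applied to pH, temperature or accessibility. The function name `encode_n3` is hypothetical.

```python
RESIDUES = "ACDEFGHIKLMNPQRSTVWY"


def encode_n3(native, new, ph, temperature, rsa, environment):
    """Build a 43-element input: 20 (mutation) + pH + T + RSA + 20 (environment)."""
    mutation = [0.0] * 20
    mutation[RESIDUES.index(native)] = -1.0  # assumption: native residue gets -1
    mutation[RESIDUES.index(new)] = 1.0      # assumption: new residue gets 1

    env = [0.0] * 20
    for residue in environment:
        env[RESIDUES.index(residue)] += 1.0  # assumption: count per residue type

    return mutation + [ph, temperature, rsa] + env


# Mutation E->A with a hypothetical environment of ten neighbouring residues.
vector = encode_n3("E", "A", ph=7.0, temperature=25.0, rsa=0.3,
                   environment="AALGGLEILE")
print(len(vector))
```

Dropping the last 20 elements gives an N2-style input (23 values), and dropping the accessibility as well gives an N1-style input (22 values).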
Cross-validation performance of the different neural networks on S1615
+ and -: the index is evaluated for positive and negative signs of the protein stability change, respectively.
Method  Q2    P(+)  Q(+)  P(-)  Q(-)  C
N1      0.74  0.59  0.23  0.76  0.94  0.24
N2      0.75  0.57  0.45  0.80  0.87  0.34
N3      0.81  0.71  0.52  0.83  0.91  0.49
Cross-validation performance of N3 as a function of different protein environments (different radius) centred on the mutated residue
Method   Radius  Q2    P(+)  Q(+)  P(-)  Q(-)  C
N3-4.5   4.5     0.79  0.63  0.55  0.83  0.88  0.45
N3-6.0   6.0     0.79  0.63  0.57  0.84  0.87  0.46
N3-9.0   9.0     0.81  0.71  0.52  0.83  0.91  0.49
N3-12.0  12.0    0.79  0.63  0.59  0.84  0.87  0.47
[Figure: Q2 accuracy of the neural network (N3-9.0) as a function of the reliability index (Rel).]
[Figure: Q2 accuracy of the neural network (N3-9.0) as a function of the absolute value of the protein stability change upon mutation (|Stability Change|, kcal/mol; bins <0.5, 1, 2, >2). Plotted quantities: Q2 and DB(%).]
Comparison of neural network with other methods on S388
Method       Q2    P(+)  Q(+)  P(-)  Q(-)  C
FOLDX(1)     0.75  0.26  0.56  0.93  0.78  0.25
DFIRE(2)     0.68  0.18  0.44  0.90  0.71  0.11
PoPMuSiC(3)  0.85  0.33  0.25  0.90  0.93  0.20
N3-9.0       0.87  0.44  0.21  0.90  0.96  0.25
(1) http://fold-x.embl-heidelberg.de
(2) http://phyyz4.med.buffalo.edu/hzhou/dmutation.html
(3) http://babylone.ulb.ac.be/popmusic/
Accuracy of joint-methods on subsets of S388
Method                Agreement  Q2    P(+)  Q(+)  P(-)  Q(-)  C
N3-9.0 + FOLDX(1)     72%        0.93  0.88  0.28  0.93  0.99  0.47
N3-9.0 + DFIRE(2)     69%        0.90  0.36  0.16  0.92  0.97  0.19
N3-9.0 + PoPMuSiC(3)  86%        0.91  0.67  0.07  0.92  0.99  0.19
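One plausible reading of the joint-method table is that accuracy is measured only on the mutations for which the two methods predict the same sign, with the Agreement column giving the size of that subset. The slides do not spell this out, so the sketch below is an interpretation under that assumption. The function name and the example predictions are hypothetical.

```python
def joint_accuracy(pred_a, pred_b, observed):
    """Return (agreement fraction, Q2 on the subset where both methods agree)."""
    agreed = [(a, o) for a, b, o in zip(pred_a, pred_b, observed) if a == b]
    agreement = len(agreed) / len(observed)
    if not agreed:
        return agreement, float("nan")
    q2 = sum(a == o for a, o in agreed) / len(agreed)
    return agreement, q2


# Hypothetical sign predictions from two methods and the observed signs.
pred_a = ["+", "-", "-", "+"]
pred_b = ["+", "-", "+", "+"]
observed = ["+", "-", "-", "-"]
agreement, q2 = joint_accuracy(pred_a, pred_b, observed)
print(agreement, q2)
```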
I-Mutant
I-Mutant Web Server
http://gpcr.biocomp.unibo.it/cgi/predictors/I-Mutant/I-Mutant.cgi/
Thank you for your attention. That's all!
Measures of Accuracy
The efficiency of the predictor is scored using the statistical indexes defined below.
Overall accuracy:
$Q_2 = \frac{\sum_i p_i}{N}$
Correlation coefficient:
$C_i = \frac{p_i n_i - u_i o_i}{[(p_i + u_i)(p_i + o_i)(n_i + u_i)(n_i + o_i)]^{1/2}}$
Coverage:
$Q_i = \frac{p_i}{p_i + u_i}$
Probability of correct prediction:
$P_i = \frac{p_i}{p_i + o_i}$
where N is the total number of predictions, $p_i$ is the number of correct predictions for class i, $n_i$ is the number of correctly rejected assignments, and $u_i$ and $o_i$ are the numbers of under- and over-predictions.
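The indexes above can be computed directly from two lists of predicted and observed signs. The sketch below assumes that an over-prediction for a class is a case predicted in that class but observed in the other, that an under-prediction is the reverse, and that n counts the cases correctly assigned to the other class. Function names and the example data are hypothetical.

```python
import math


def overall_accuracy(predicted, observed):
    """Q2: fraction of predictions whose sign matches the observed sign."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def class_indexes(predicted, observed, label):
    """Return coverage Q, probability of correct prediction P and correlation C."""
    pairs = list(zip(predicted, observed))
    p = sum(pr == label and ob == label for pr, ob in pairs)  # correct
    o = sum(pr == label and ob != label for pr, ob in pairs)  # over-predictions
    u = sum(pr != label and ob == label for pr, ob in pairs)  # under-predictions
    n = sum(pr != label and ob != label for pr, ob in pairs)  # correctly rejected

    nan = float("nan")
    coverage = p / (p + u) if p + u else nan
    precision = p / (p + o) if p + o else nan
    denom = math.sqrt((p + u) * (p + o) * (n + u) * (n + o))
    correlation = (p * n - u * o) / denom if denom else nan
    return {"Q": coverage, "P": precision, "C": correlation}


# Hypothetical predicted and observed signs for six mutations.
predicted = ["+", "+", "-", "-", "+", "-"]
observed = ["+", "-", "-", "-", "-", "-"]
q2 = overall_accuracy(predicted, observed)
plus = class_indexes(predicted, observed, "+")
print(q2, plus)
```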
[Figure: Q2 accuracy of the neural network (N3-9.0) as a function of the relative solvent accessibility (RSA) of the mutated residue (bins from 0 to >70). Plotted quantities: Q2 and DB(%).]
Q2 accuracy as a function of the residue mutation type
native \ new  Charged    Polar       Apolar
Charged       0.62 (4%)  0.77 (8%)   0.72 (9%)
Polar         0.69 (6%)  0.82 (10%)  0.77 (17%)
Apolar        0.75 (3%)  0.92 (12%)  0.87 (31%)