Optimization methods
Morten Nielsen
Department of Systems Biology, DTU
Outline
• Optimization procedures
  – Gradient descent
  – Monte Carlo
• Overfitting
  – Cross-validation
• Method evaluation
Linear methods. Error estimate
[Diagram: inputs I1 and I2, with weights w1 and w2, feed a linear function with output o]
Gradient descent (from Wikipedia)
Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if
b = a − γ∇F(a)
for γ > 0 a small enough number, then F(b) < F(a)
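A small numerical check of the step b = a − γ∇F(a). This is only a sketch: the function F(x) = (x − 3)^2, the start point a = 0 and the step size γ = 0.1 are assumptions chosen for illustration, not values from the slides.

```python
def F(x):
    """Example function (assumed for illustration): parabola with minimum at x = 3."""
    return (x - 3.0) ** 2


def grad_F(x):
    """Analytical derivative of the example function F."""
    return 2.0 * (x - 3.0)


def gradient_step(a, gamma):
    """One gradient descent step: b = a - gamma * grad F(a)."""
    return a - gamma * grad_F(a)


a = 0.0      # start point (assumed)
gamma = 0.1  # step size (assumed)
b = gradient_step(a, gamma)

print(f"a = {a}, F(a) = {F(a)}")
print(f"b = {b}, F(b) = {F(b)}")
```

With these values the step moves from a = 0 to b = 0.6 and F drops from 9 to 5.76.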
Gradient descent (example)
Gradient descent
Weights are changed in the opposite direction of the gradient of the error
Gradient descent (Linear function)
Weights are changed in the opposite direction of the gradient of the error
[Diagram: inputs I1 and I2, with weights w1 and w2, feed a linear function with output o]
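For the linear function the gradient can be written out explicitly. The derivation below is a sketch: it assumes a squared error with a factor 1/2 and writes the step size as ε. Neither the error definition nor the symbol survives in the text, so both are assumptions.

```latex
\begin{align}
o &= \sum_i w_i I_i \\
E &= \tfrac{1}{2}\,(o - t)^2 \\
\frac{\partial E}{\partial w_i} &= (o - t)\, I_i \\
w_i' &= w_i - \varepsilon\,\frac{\partial E}{\partial w_i} = w_i - \varepsilon\,(o - t)\, I_i
\end{align}
```

Here t is the target value and w_i' is the updated weight.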
Gradient descent. Example
Weights are changed in the opposite direction of the gradient of the error
[Diagram: inputs I1 and I2, with weights w1 and w2, feed a linear function with output o]
Gradient descent. Doing it yourself
Weights are changed in the opposite direction of the gradient of the error
[Diagram: inputs I1 = 1 and I2 = 0, with weights W1 = 0.1 and W2 = 0.1, feed a linear function with output o]
What are the weights after 2 forward (calculate predictions) and backward (update weights) iterations with the given input, and has the error decreased (use a step size of 0.1 and t = 1)?
Fill out the table
itr  W1   W2   O
0    0.1  0.1
1
2
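One way to check the table is to run the iterations in code. The sketch below assumes the error E = 1/2 (o − t)^2, so that the update is w_i' = w_i − ε (o − t) I_i. The error definition and the name epsilon are assumptions; a different error definition (for example without the factor 1/2) gives different numbers.

```python
def forward(weights, inputs):
    """Forward pass: output of the linear function, o = sum_i w_i * I_i."""
    return sum(w * x for w, x in zip(weights, inputs))


def backward(weights, inputs, target, epsilon):
    """Backward pass: move each weight against the gradient (o - t) * I_i."""
    o = forward(weights, inputs)
    return [w - epsilon * (o - target) * x for w, x in zip(weights, inputs)]


def error(weights, inputs, target):
    """Assumed error function: E = 1/2 * (o - t)^2."""
    return 0.5 * (forward(weights, inputs) - target) ** 2


inputs = [1.0, 0.0]    # I1, I2 from the exercise
weights = [0.1, 0.1]   # W1, W2 at iteration 0
target = 1.0           # t
epsilon = 0.1          # step size

history = []
for itr in range(3):
    o = forward(weights, inputs)
    history.append((itr, weights[0], weights[1], o, error(weights, inputs, target)))
    weights = backward(weights, inputs, target, epsilon)

print("itr  W1      W2      O       E")
for itr, w1, w2, o, e in history:
    print(f"{itr:<4d} {w1:<7.4f} {w2:<7.4f} {o:<7.4f} {e:<7.4f}")
```

Under these assumptions W1 goes from 0.1 to 0.19 to 0.271, W2 stays at 0.1 because its input is 0, and the error decreases at every iteration.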
Monte Carlo
Because of their reliance on repeated computation of random or pseudo-random numbers, Monte Carlo methods are most suited to calculation by a computer. Monte Carlo methods tend to be used when it is unfeasible or impossible to compute an exact result with a deterministic algorithm
Or when you are too stupid to do the math yourself?
Monte Carlo (Minimization)
dE < 0    dE > 0
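A minimal sketch of Monte Carlo minimization, assuming the common Metropolis rule: a move with dE < 0 is always accepted, and a move with dE > 0 is accepted with probability exp(−dE/T). The energy function, step size, temperature and the function name metropolis_minimize are all hypothetical choices for illustration.

```python
import math
import random


def metropolis_minimize(energy, x0, n_steps=10000, step=0.5, T=0.1, seed=1):
    """Minimize energy(x) with random moves and the Metropolis acceptance rule."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)   # random change
        e_new = energy(x_new)
        dE = e_new - e
        # Downhill moves (dE < 0) are always accepted;
        # uphill moves (dE > 0) are accepted with probability exp(-dE / T).
        if dE < 0 or rng.random() < math.exp(-dE / T):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def energy(x):
    """Example energy (assumed): a parabola with minimum at x = 2."""
    return (x - 2.0) ** 2


best_x, best_e = metropolis_minimize(energy, x0=-5.0)
print(f"best x = {best_x:.3f}, energy = {best_e:.5f}")
```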
Gibbs sampler. Monte Carlo simulations
[Figure: a set of peptide sequences, shown twice]
E1 = 5.4, E2 = 5.7: dE > 0; Paccept = 1
E1 = 5.4, E2 = 5.2: dE < 0; 0 < Paccept < 1
Note the sign. Maximization
Monte Carlo Temperature
• What is the Monte Carlo temperature?
• Say dE=-0.2, T=1
• T=0.001
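The effect of the temperature can be seen by computing the acceptance probability for the two cases above. The sketch assumes the Metropolis form Paccept = exp(dE/T) for dE < 0 in a maximization (Paccept = 1 otherwise); the formula itself is not shown in the text, so it is an assumption.

```python
import math


def p_accept(dE, T):
    """Acceptance probability for a maximization: improvements (dE >= 0) always pass."""
    if dE >= 0:
        return 1.0
    return math.exp(dE / T)


for T in (1.0, 0.001):
    print(f"dE = -0.2, T = {T}: Paccept = {p_accept(-0.2, T):.3g}")
```

At T = 1 the unfavorable move is accepted about 82% of the time; at T = 0.001 the probability is effectively zero, so only improvements are accepted.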
MC minimization
Monte Carlo - Examples
• Why a temperature?
Local minima
Data driven method training
• A prediction method contains a very large set of parameters
  – A matrix for predicting binding for 9-mer peptides has 9x20=180 weights
• Overfitting is a problem
[Figure: Temperature plotted against years]
Evaluation of predictive performance
• Train PSSM on raw data
  – No pseudo counts, no sequence weighting
  – Fit 9*20 parameters to 9*10 data points
• Evaluate on training data
  – PCC = 0.97
  – AUC = 1.0
• Close to a perfect prediction method
Binders:
ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV
Non-binders:
MRSGRVHAV VRFNIDETP ANYIGQDGL AELCGDPGD QTRAVADGK GRPVPAAHP MTAQWWLDA FARGVVHVI LQRELTRLQ AVAEEMTKS
Evaluation of predictive performance
• Train PSSM on permuted (random) data
  – No pseudo counts, no sequence weighting
  – Fit 9*20 parameters to 9*10 data points
• Evaluate on training data
  – PCC = 0.97
  – AUC = 1.0
• Close to a perfect prediction method AND
• Same performance as on the original data
Binders:
AAAMAAKLA AAKNLAAAA AKALAAAAR AAAAKLATA ALAKAVAAA IPELMRTNG FIMGVFTGL NVTKVVAWL LEPLNLVLK VAVIVSVPF
Non-binders:
MRSGRVHAV VRFNIDETP ANYIGQDGL AELCGDPGD QTRAVADGK GRPVPAAHP MTAQWWLDA FARGVVHVI LQRELTRLQ AVAEEMTKS
Repeat on large training data (229 ligands)
Cross validation
Train on 4/5 of data
Test/evaluate on 1/5
=> Produce 5 different methods each with a different prediction focus
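The 4/5 versus 1/5 split can be sketched as below. The function name five_fold_partitions and the placeholder data are hypothetical; real data would normally be shuffled or redundancy-reduced before splitting.

```python
def five_fold_partitions(data, n_folds=5):
    """Yield (train, test) pairs: each fold is the test set once, the rest is training data."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


data = list(range(20))  # placeholder data points (assumed)
splits = list(five_fold_partitions(data))
for n, (train, test) in enumerate(splits):
    print(f"fold {n}: train on {len(train)} points, evaluate on {len(test)} points")
```

Each data point is used for evaluation exactly once and for training four times.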
Model over-fitting
2000 MHC:peptide binding data: PCC=0.99
Evaluate on 600 MHC:peptide binding data: PCC=0.80
Model over-fitting (early stopping)
Evaluate on 600 MHC:peptide binding data: PCC=0.89
Stop training
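Early stopping can be sketched as follows: train by gradient descent, monitor the error on the test set after every pass, and keep the weights from the pass with the lowest test error. Everything here is an assumption for illustration: the synthetic data, the linear model, the mean squared error, and the parameter names epsilon and patience.

```python
import random


def predict(w, x):
    """Linear function: o = sum_i w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, data):
    """Mean squared error of weights w on a list of (inputs, target) pairs."""
    return sum((predict(w, x) - t) ** 2 for x, t in data) / len(data)


def train_early_stopping(train, test, n_weights, epsilon=0.01, max_epochs=500, patience=10):
    """Gradient descent on train; return the weights with the lowest test error."""
    w = [0.0] * n_weights
    best_w, best_err, since_best = list(w), mse(w, test), 0
    for _ in range(max_epochs):
        for x, t in train:
            o = predict(w, x)
            w = [wi - epsilon * (o - t) * xi for wi, xi in zip(w, x)]
        err = mse(w, test)
        if err < best_err:
            best_w, best_err, since_best = list(w), err, 0
        else:
            since_best += 1
            if since_best >= patience:  # test error stopped improving: stop training
                break
    return best_w, best_err


rng = random.Random(0)


def make_point():
    """Synthetic data point (assumed): noisy linear target, 5 inputs."""
    x = [rng.uniform(-1, 1) for _ in range(5)]
    return x, x[0] - 0.5 * x[1] + rng.gauss(0, 0.3)


train = [make_point() for _ in range(15)]
test = [make_point() for _ in range(15)]
baseline_err = mse([0.0] * 5, test)
best_w, best_err = train_early_stopping(train, test, n_weights=5)
print(f"test error before training: {baseline_err:.3f}, at stopping point: {best_err:.3f}")
```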
What is going on?
[Figure: Temperature plotted against years]
5 fold training
Which method to choose?
Method evaluation
• Use cross validation
• Evaluate on concatenated data and not as an average over each cross-validated performance
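The difference between the two evaluation schemes can be shown with a toy case. The numbers below are made up for illustration: two folds whose predictions are each perfectly correlated with their targets but sit on different scales, as can happen when each fold is predicted by a different method.

```python
import math


def pcc(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (predictions, targets) per cross-validation fold
folds = [
    ([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]),
    ([0.9, 0.8, 0.7], [0.4, 0.3, 0.2]),
]

per_fold = [pcc(pred, targ) for pred, targ in folds]
average_pcc = sum(per_fold) / len(per_fold)

all_pred = [v for pred, _ in folds for v in pred]
all_targ = [v for _, targ in folds for v in targ]
concatenated_pcc = pcc(all_pred, all_targ)

print(f"average of per-fold PCC: {average_pcc:.2f}")
print(f"PCC on concatenated data: {concatenated_pcc:.2f}")
```

In this toy case the per-fold average is 1.00 while the concatenated PCC is about 0.73, because the average hides the scale mismatch between the folds.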
Method evaluation
Which prediction to use?
SMM - Stabilization matrix method
[Diagram: inputs I1 and I2, with weights w1 and w2, feed a linear function with output o]
Per target:
Global:
Sum over weights
Sum over data points
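The two error functions are missing from the text. A hedged reconstruction, assuming a squared error plus a penalty on the squared weights with strength λ (the exact form and the factor 1/2 are assumptions):

```latex
\begin{align}
\text{Per target:}\quad E &= \tfrac{1}{2}\,(O - t)^2 + \lambda \sum_i w_i^2 \\
\text{Global:}\quad E &= \sum_l \tfrac{1}{2}\,(O_l - t_l)^2 + \lambda \sum_i w_i^2
\end{align}
```

In the global error the sum over l runs over data points and the sum over i runs over weights.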
SMM - Stabilization matrix method
[Diagram: inputs I1 and I2, with weights w1 and w2, feed a linear function with output o]
λ per target
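A sketch of one gradient descent step with the penalty applied per target. It assumes the per-target error E = 1/2 (O − t)^2 + λ Σ w_i^2, which gives the gradient (O − t) I_i + 2 λ w_i. The error form, the function name smm_update and the parameter names epsilon and lam are assumptions, not the SMM program's actual implementation.

```python
def smm_update(weights, inputs, target, epsilon, lam):
    """One per-target update: w_i' = w_i - epsilon * ((O - t) * I_i + 2 * lam * w_i)."""
    o = sum(w * x for w, x in zip(weights, inputs))
    return [w - epsilon * ((o - target) * x + 2.0 * lam * w)
            for w, x in zip(weights, inputs)]


# Same toy input as in the gradient descent exercise: I = (1, 0), t = 1
w_plain = smm_update([0.1, 0.1], [1.0, 0.0], 1.0, epsilon=0.1, lam=0.0)
w_reg = smm_update([0.1, 0.1], [1.0, 0.0], 1.0, epsilon=0.1, lam=0.1)

print("lambda = 0.0:", [round(w, 4) for w in w_plain])
print("lambda = 0.1:", [round(w, 4) for w in w_reg])
```

With λ = 0.1 the penalty pulls every weight toward zero, including W2, which receives no signal from the data because its input is 0.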
SMM training
Evaluate on 600 MHC:peptide binding data
λ=0: PCC=0.70
λ=0.1: PCC=0.78
SMM - Stabilization matrix method
Monte Carlo
[Diagram: inputs I1 and I2, with weights w1 and w2, feed a linear function with output o]
Global:
• Make random change to weights
• Calculate change in “global” error
• Update weights if MC move is accepted
Note difference between MC and GD in the use of “global” versus “per target” error
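The three steps above can be sketched as below. The sketch assumes a global error of the form Σ 1/2 (O − t)^2 + λ Σ w_i^2 and the Metropolis acceptance rule; the toy data, the temperature and all function and parameter names are hypothetical.

```python
import math
import random


def global_error(weights, data, lam):
    """Assumed global error: squared error summed over data points plus lam * sum of w^2."""
    sq = sum(0.5 * (sum(w * x for w, x in zip(weights, inputs)) - t) ** 2
             for inputs, t in data)
    return sq + lam * sum(w * w for w in weights)


def mc_train(data, n_weights, lam=0.01, T=0.01, n_steps=5000, step=0.1, seed=1):
    """Monte Carlo training on the global error."""
    rng = random.Random(seed)
    weights = [0.0] * n_weights
    e = global_error(weights, data, lam)
    for _ in range(n_steps):
        # Make random change to one weight
        trial = list(weights)
        trial[rng.randrange(n_weights)] += rng.uniform(-step, step)
        # Calculate change in global error
        e_new = global_error(trial, data, lam)
        dE = e_new - e
        # Update weights if the MC move is accepted
        if dE < 0 or rng.random() < math.exp(-dE / T):
            weights, e = trial, e_new
    return weights, e


# Toy data (assumed): (inputs, target) pairs
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0)]
start_error = global_error([0.0, 0.0], data, 0.01)
weights, final_error = mc_train(data, n_weights=2)
print(f"global error: {start_error:.3f} -> {final_error:.3f}")
```

Unlike the per-target gradient descent update, every MC move here is judged on the error over all data points.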
Training/evaluation procedure
• Define method
• Select data
• Deal with data redundancy
  – In method (sequence weighting)
  – In data (Hobohm)
• Deal with over-fitting either
  – in method (SMM regularization term) or
  – in training (stop fitting on test set performance)
• Evaluate method using cross-validation
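A small doit script
/usr/opt/www/pub/CBS/courses/27623.algo/exercises/code/SMM/doit_ex

#! /bin/tcsh
# Loop over alleles, lambda values and the 5 cross-validation partitions
foreach a ( `cat allelefile` )
  mkdir -p $a
  cd $a
  foreach l ( 0 1 2.5 5 10 20 30 )
    mkdir -p l.$l
    cd l.$l
    foreach n ( 0 1 2 3 4 )
      # Train on partition n, then score the matching evaluation set
      smm -nc 500 -l $l train.$n > mat.$n
      pep2score -mat mat.$n eval.$n > eval.$n.pred
    end
    # Correlation on the concatenated predictions from all 5 partitions
    echo $a $l `cat eval.?.pred | grep -v "#" | gawk '{print $2,$3}' | xycorr`
    cd ..
  end
  cd ..
end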