
Optimization methods

Morten Nielsen

Department of Systems Biology, DTU

Outline

• Optimization procedures
  – Gradient descent
  – Monte Carlo

• Overfitting
  – cross-validation

• Method evaluation

Linear methods. Error estimate

[Figure: linear function with inputs I1, I2, weights w1, w2, and output o]

Gradient descent (from Wikipedia)

Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if

$$b = a - \gamma \nabla F(a)$$

for $\gamma > 0$ a small enough number, then F(b) < F(a)
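A minimal sketch of the statement above: one gradient descent step on a hypothetical function F(x, y) = x² + 3y². The function, the starting point, and the step size are chosen here only for illustration and are not from the slides.

```python
def F(x, y):
    """Hypothetical example function (not from the slides)."""
    return x ** 2 + 3 * y ** 2


def grad_F(x, y):
    """Analytical gradient of F."""
    return (2 * x, 6 * y)


def gradient_step(a, gamma):
    """Return b = a - gamma * grad F(a)."""
    gx, gy = grad_F(*a)
    return (a[0] - gamma * gx, a[1] - gamma * gy)


a = (1.0, 1.0)
b = gradient_step(a, gamma=0.1)
print("F(a) =", F(*a))
print("F(b) =", F(*b))
```

With a step size that is too large the same step can overshoot and increase F, which is why the statement requires a small enough number.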

Gradient descent (example)

Gradient descent

Weights are changed in the opposite direction of the gradient of the error

Gradient descent (Linear function)
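The equations for this slide did not survive extraction. As a hedged sketch, assuming a squared error between the output o and a target t (the factor 1/2 and the step-size symbol γ are assumptions, not taken from the slides), the weight update for the linear function would follow as:

```latex
\begin{aligned}
o &= \sum_i w_i I_i \\
E &= \tfrac{1}{2}\,(o - t)^2 \\
\frac{\partial E}{\partial w_i} &= (o - t)\, I_i \\
w_i' &= w_i - \gamma\,(o - t)\, I_i
\end{aligned}
```

Under these assumptions a weight whose input is zero is never changed, because its gradient term vanishes.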


Gradient descent

Gradient descent. Example

Gradient descent. Doing it yourself

Weights are changed in the opposite direction of the gradient of the error

[Figure: linear function with inputs I1 = 1, I2 = 0, weights W1 = 0.1, W2 = 0.1, and output o]

What are the weights after 2 forward (calculate predictions) and backward (update weights) iterations with the given input, and has the error decreased (use a step size of 0.1 and t = 1)?

Fill out the table

itr   W1    W2    O
0     0.1   0.1
1
2
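One way to check a hand calculation is the short sketch below. It assumes the error is E = ½(o − t)², which the extracted slides do not state explicitly; the function names are hypothetical.

```python
def forward(weights, inputs):
    """Linear function: o = sum_i w_i * I_i."""
    return sum(w * x for w, x in zip(weights, inputs))


def error(o, t):
    """Assumed squared error E = 1/2 (o - t)^2."""
    return 0.5 * (o - t) ** 2


def backward(weights, inputs, o, t, step):
    """Move each weight against the gradient dE/dw_i = (o - t) * I_i."""
    return [w - step * (o - t) * x for w, x in zip(weights, inputs)]


inputs = [1.0, 0.0]
weights = [0.1, 0.1]
t, step = 1.0, 0.1

rows = []
for itr in range(3):
    o = forward(weights, inputs)
    rows.append((itr, weights[0], weights[1], o, error(o, t)))
    weights = backward(weights, inputs, o, t, step)

print("itr  W1     W2     O      E")
for itr, w1, w2, o, e in rows:
    print(f"{itr}    {w1:.3f}  {w2:.3f}  {o:.3f}  {e:.4f}")
```

Each row holds the weights before the update of that iteration, together with the prediction and error they produce.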

Monte Carlo

Because of their reliance on repeated computation of random or pseudo-random numbers, Monte Carlo methods are most suited to calculation by a computer. Monte Carlo methods tend to be used when it is unfeasible or impossible to compute an exact result with a deterministic algorithm.

Or when you are too stupid to do the math yourself?

Monte Carlo (Minimization)

[Figure labels: dE<0, dE>0]

Gibbs sampler. Monte Carlo simulations

RFFGGDRGAPKRGYLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPA GSLFVYNITTNKYKAFLDKQ SALLSSDITASVNCAK GFKGEQGPKGEPDVFKELKVHHANENI SRYWAIRTRSGGITYSTNEIDLQLSQEDGQTIE

RFFGGDRGAPKRGYLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPAGSLFVYNITTNKYKAFLDKQ SALLSSDITASVNCAK GFKGEQGPKGEPDVFKELKVHHANENI SRYWAIRTRSGGITYSTNEIDLQLSQEDGQTIE

E1 = 5.4, E2 = 5.7: dE>0; Paccept = 1

E1 = 5.4, E2 = 5.2: dE<0; 0 < Paccept < 1

Note the sign: this is maximization

Monte Carlo Temperature

• What is the Monte Carlo temperature?

• Say dE=-0.2, T=1

• T=0.001
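The formula on this slide was lost in extraction. The sketch below assumes the standard Metropolis acceptance rule for maximization, P = min(1, exp(dE/T)); the function name is hypothetical. It evaluates the two cases from the bullets.

```python
import math


def accept_probability(dE, T):
    """Assumed Metropolis rule for maximization: P = min(1, exp(dE / T))."""
    if dE >= 0:
        return 1.0  # improvements are always accepted
    return math.exp(dE / T)


p_warm = accept_probability(-0.2, 1.0)
p_cold = accept_probability(-0.2, 0.001)
print(f"dE=-0.2, T=1     -> P = {p_warm:.3f}")
print(f"dE=-0.2, T=0.001 -> P = {p_cold:.3g}")
```

Under this rule the same unfavorable move is accepted often at T=1 and practically never at T=0.001, so the temperature controls how readily the search accepts a worse state.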

MC minimization

Monte Carlo - Examples

• Why a temperature?

Local minima
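To illustrate why a temperature matters when there are local minima, here is a minimal Metropolis minimization sketch. The energy landscape, step size, number of steps, and seed are all hypothetical choices for illustration. The acceptance rule exp(-dE/T) is the standard Metropolis form for minimization, assumed here rather than taken from the slides.

```python
import math
import random


def energy(x):
    """Hypothetical 1-D landscape with several local minima."""
    return 0.1 * x ** 2 + math.sin(3 * x)


def mc_minimize(x0, T, steps=2000, step_size=0.5, seed=0):
    """Metropolis walk; returns the lowest-energy point visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        dE = e_new - e
        # Minimization: downhill moves are always accepted,
        # uphill moves with probability exp(-dE / T).
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


cold = mc_minimize(4.0, T=0.001)
warm = mc_minimize(4.0, T=1.0)
print("T=0.001:", cold)
print("T=1.0:  ", warm)
```

At very low T the walk is almost greedy and can stay in the basin where it starts; a higher T lets it cross barriers between minima, at the cost of a noisier search.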

• A prediction method contains a very large set of parameters

  – A matrix for predicting binding for 9-mer peptides has 9x20=180 weights

• Overfitting is a problem

Data driven method training

[Figure with axes labeled years and Temperature]

Evaluation of predictive performance

Binders: ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV

Non-binders: MRSGRVHAV VRFNIDETP ANYIGQDGL AELCGDPGD QTRAVADGK GRPVPAAHP MTAQWWLDA FARGVVHVI LQRELTRLQ AVAEEMTKS

• Train PSSM on raw data
  – No pseudo counts, no sequence weighting
  – Fit 9*20 parameters to 9*10 data points

• Evaluate on training data
  – PCC = 0.97
  – AUC = 1.0

• Close to a perfect prediction method

Evaluation of predictive performance

Binders (permuted): AAAMAAKLA AAKNLAAAA AKALAAAAR AAAAKLATA ALAKAVAAA IPELMRTNG FIMGVFTGL NVTKVVAWL LEPLNLVLK VAVIVSVPF

Non-binders: MRSGRVHAV VRFNIDETP ANYIGQDGL AELCGDPGD QTRAVADGK GRPVPAAHP MTAQWWLDA FARGVVHVI LQRELTRLQ AVAEEMTKS

• Train PSSM on permuted (random) data
  – No pseudo counts, no sequence weighting
  – Fit 9*20 parameters to 9*10 data points

• Evaluate on training data
  – PCC = 0.97
  – AUC = 1.0

• Close to a perfect prediction method AND
• Same performance as on the original data

Repeat on large training data (229 ligands)

Cross validation

Train on 4/5 of data

Test/evaluate on 1/5

=> Produce 5 different methods each with a different prediction focus
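The 4/5 versus 1/5 split can be sketched as follows. The helper name and the toy peptide identifiers are hypothetical; the partition is a simple interleaved split, one of several reasonable ways to form the folds.

```python
def five_fold_partitions(data, n_folds=5):
    """Yield (train, test) pairs; each fold serves as the test set once."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train, test


peptides = [f"pep{i}" for i in range(10)]  # toy identifiers
splits = list(five_fold_partitions(peptides))
for n, (train, test) in enumerate(splits):
    print(n, "train:", len(train), "test:", test)
```

Each of the 5 resulting train/test pairs gives one trained method, and every data point is used for evaluation exactly once.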

Model over-fitting

2000 MHC:peptide binding data: PCC = 0.99

Evaluate on 600 MHC:peptide binding data: PCC = 0.80

Model over-fitting (early stopping)

Evaluate on 600 MHC:peptide binding data: PCC = 0.89

Stop training

What is going on?

[Figure with axes labeled years and Temperature]

5 fold training

Which method to choose?

5 fold training

Method evaluation

• Use cross validation
• Evaluate on concatenated data and not as an average over each cross-validated performance
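The difference between the two ways of evaluating can be seen in a small sketch. The prediction and target values below are made up for illustration, and only two folds are used to keep the example short.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient (PCC) of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical predictions and targets for two test folds.
fold_preds = [[0.1, 0.2, 0.3], [0.7, 0.8, 0.9]]
fold_targets = [[0.2, 0.1, 0.3], [0.9, 0.7, 0.8]]

# Average of the per-fold PCC values.
per_fold = [pearson(p, t) for p, t in zip(fold_preds, fold_targets)]
average_pcc = sum(per_fold) / len(per_fold)

# PCC on the concatenated test predictions.
all_preds = [p for fold in fold_preds for p in fold]
all_targets = [t for fold in fold_targets for t in fold]
concatenated_pcc = pearson(all_preds, all_targets)

print("average of per-fold PCC:", round(average_pcc, 3))
print("PCC on concatenated data:", round(concatenated_pcc, 3))
```

The two numbers can differ substantially, because the per-fold values ignore how well predictions are ranked across folds. The doit script at the end of this deck takes the concatenated route: it pipes all eval.?.pred files together into xycorr.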

Method evaluation

Which prediction to use?

Method evaluation

SMM - Stabilization matrix method

Per target:

Global:

Sum over weights

Sum over data points

λ per target
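The SMM equations themselves were lost in extraction. As a loosely hedged sketch, assume a per-target error of the form E = ½(o − t)² + λ_t Σ w_i², where λ_t is the per-target share of the regularization weight. This form, the factor ½, and all names below are assumptions, not the author's stated method. A single gradient descent update would then look like this:

```python
def smm_gradient_step(weights, inputs, t, step, lam_per_target):
    """One per-target update under the assumed regularized error.

    Assumed gradient: dE/dw_i = (o - t) * I_i + 2 * lam_per_target * w_i
    """
    o = sum(w * x for w, x in zip(weights, inputs))
    return [
        w - step * ((o - t) * x + 2.0 * lam_per_target * w)
        for w, x in zip(weights, inputs)
    ]


new_weights = smm_gradient_step(
    [0.1, 0.1], [1.0, 0.0], t=1.0, step=0.1, lam_per_target=0.05
)
print(new_weights)
```

In this sketch the penalty term shrinks every weight toward zero, including weights whose input is zero and which the data term alone would leave untouched.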

SMM training

Evaluate on 600 MHC:peptide binding data

λ = 0: PCC = 0.70

λ = 0.1: PCC = 0.78

SMM - Stabilization matrix method. Monte Carlo

Global:

• Make random change to weights

• Calculate change in “global” error

• Update weights if MC move is accepted

Note difference between MC and GD in the use of “global” versus “per target” error
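The three steps above can be sketched as a Monte Carlo loop over the "global" error. The error form (squared error summed over data points plus λ times the sum of squared weights), the move size, the temperature, and the toy data are all assumptions made for illustration.

```python
import math
import random


def global_error(weights, data, lam):
    """Assumed global error: sum over data points plus lam * sum over weights."""
    fit = sum(
        0.5 * (sum(w * x for w, x in zip(weights, inputs)) - t) ** 2
        for inputs, t in data
    )
    return fit + lam * sum(w * w for w in weights)


def mc_train(weights, data, lam, T, steps, seed=0):
    """Metropolis minimization of the global error."""
    rng = random.Random(seed)
    weights = list(weights)
    e = global_error(weights, data, lam)
    for _ in range(steps):
        # Make a random change to one weight.
        trial = list(weights)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-0.1, 0.1)
        # Calculate the change in global error.
        e_new = global_error(trial, data, lam)
        dE = e_new - e
        # Update the weights if the MC move is accepted.
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            weights, e = trial, e_new
    return weights, e


data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]  # toy (inputs, target) pairs
start = [0.1, 0.1]
e_start = global_error(start, data, lam=0.01)
trained, e_final = mc_train(start, data, lam=0.01, T=1e-9, steps=500)
print(e_start, e_final, trained)
```

Unlike the gradient descent update, which uses one target at a time, every move here is judged on the error over the whole data set.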

Training/evaluation procedure

• Define method
• Select data
• Deal with data redundancy
  – In method (sequence weighting)
  – In data (Hobohm)
• Deal with over-fitting either
  – in method (SMM regularization term) or
  – in training (stop fitting on test set performance)
• Evaluate method using cross-validation

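A small doit script

/usr/opt/www/pub/CBS/courses/27623.algo/exercises/code/SMM/doit_ex

#! /bin/tcsh

# One directory per allele listed in allelefile
foreach a ( `cat allelefile` )

    mkdir -p $a
    cd $a

    # One sub-directory per value passed to smm -l
    foreach l ( 0 1 2.5 5 10 20 30 )

        mkdir -p l.$l
        cd l.$l

        # Train and evaluate on each of the 5 partitions
        foreach n ( 0 1 2 3 4 )

            smm -nc 500 -l $l train.$n > mat.$n
            pep2score -mat mat.$n eval.$n > eval.$n.pred

        end

        # Concatenate the 5 prediction files, drop comment lines,
        # and pass columns 2 and 3 to xycorr
        echo $a $l `cat eval.?.pred | grep -v "#" | gawk '{print $2,$3}' | xycorr`

        cd ..

    end

    cd ..

end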