Experimental Evaluation (CS446, Spring 06)

Experimental Evaluation
• In experimental Machine Learning we evaluate the accuracy of a hypothesis empirically. This raises a few important methodological questions:
• Given the observed accuracy of the hypothesis over a limited sample of data, how well does it estimate its accuracy over additional examples? (Estimating Hypothesis Accuracy)
• Given that one hypothesis outperforms another over some sample of data, how probable is it that it is more accurate in general? (Comparing Classifiers/Learning Algorithms)
• When data is limited, what is the best way to use this data to both learn the hypothesis and estimate its accuracy?

Statistical Problems: Parameter Estimation and Hypothesis Testing

Estimating Hypothesis Accuracy
• Given a hypothesis h and a data sample containing n examples drawn at random according to some distribution D, what is the best estimate of the accuracy of h over future instances drawn from the same distribution?

PAC: Given a sample drawn according to D, we want to guarantee that with confidence 1-δ we will be ε-accurate on a new sample from D.
Here: We observe some accuracy and want to know if it is typical.

Note the difference from the (worst case) PAC learning question. Here we are interested in a statistical estimation problem.


• The problem is to estimate the proportion of a population that exhibits some property, given the observed proportion over some random sample of the population.


Estimating Hypothesis Accuracy
• The property we are interested in is (for some fixed function f) the True Error:
      error_D(h) = Pr_{x~D}[h(x) ≠ f(x)]
• Since we cannot observe it, we are performing an experiment: we collect a random sample S of n independently drawn instances from the distribution D, and use it to measure the Sample Error:
      error_S(h) = (1/n) |{x ∈ S | h(x) ≠ f(x)}|
• Naturally, each time we run an experiment (i.e., collect a sample of n test examples) we expect to get a different Sample Error.
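A minimal Python sketch of the Sample Error computation (illustrative, not from the slides; the particular h, f, and S below are hypothetical):

```python
# Illustrative sketch: the sample error of h on S with respect to target f.
def sample_error(h, S, f):
    """Fraction of examples in S on which h disagrees with f."""
    return sum(1 for x in S if h(x) != f(x)) / len(S)

# Hypothetical example: h errs on 12 of 40 instances, so error_S(h) = 0.3.
h = lambda x: x % 10 < 7
f = lambda x: x % 10 < 4
S = list(range(40))
print(sample_error(h, S, f))  # 0.3
```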


Estimating Hypothesis Accuracy
• The distribution of the number of mistakes r is Binomial(n, p) with p = error_D(h):
      Pr(# of mistakes = r) = C(n, r) p^r (1-p)^(n-r)
  The mean is E[r] = np and the standard deviation is σ_r = sqrt(np(1-p)).


Estimating Hypothesis Accuracy
• The Sample Error is distributed like r/n, where r is Binomial(n, p).
  The mean is E[error_S(h)] = p = error_D(h),
  and the standard deviation is σ_{error_S(h)} = (1/n) sqrt(np(1-p)) = sqrt(p(1-p)/n).
• But, due to the central limit theorem, if n is large enough (30 or more) we can assume that the distribution of the Sample Error is Normal, with mean error_D(h) and standard deviation estimated by
      σ_{error_S(h)} ≈ sqrt(error_S(h)(1 - error_S(h))/n)


Estimating Hypothesis Accuracy
The distribution of the Sample Error:
[Figure: the density Pr(error_S(h)), a Normal curve over error_S(h) centered at error_D(h).]
Consequently, one can give a range on the error of a hypothesis such that with high probability the true error will be within this range.
Given the observed error (your estimate of the true error), you know with some confidence that the true error is within some range around it.


Some Numbers
Assume you test a hypothesis h and find that it commits r = 12 errors on a sample of n = 40 examples.
• The estimate of the true error is p = r/n = 0.3.
• What is the variance of this error? (n is fixed; r is a random variable, distributed Binomial(n, 0.3).)
  Therefore σ(# of mistakes) = sqrt(40 · 0.3 · (1-0.3)) ≈ 2.89,
  and σ(sample error) = 2.89/40 ≈ 0.07.
Now assume h commits r = 300 errors on a sample of n = 1000 examples.
• The estimate of the true error is again p = r/n = 0.3,
  but σ(sample error) = sqrt(0.3 · (1-0.3)/1000) ≈ 14.5/1000 ≈ 0.014.
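These numbers can be checked with a small Python sketch (illustrative):

```python
import math

# Illustrative sketch: std. deviation of the sample error for r errors out of n.
def sample_error_std(r, n):
    p = r / n
    return math.sqrt(p * (1 - p) / n)

print(sample_error_std(12, 40))     # ~0.072, the n = 40 case
print(sample_error_std(300, 1000))  # ~0.014, the n = 1000 case
```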


Estimating Hypothesis Accuracy
The distribution of the Sample Error:
[Figure: the Normal density Pr(error_S(h)) centered at error_D(h); 95% of the samples are within ±2σ of the mean.]
Consequently, one can give a range on the error of a hypothesis such that with high probability the true error will be within this range. With confidence N%:
      error_D(h) ∈ error_S(h) ± R_N · sqrt(error_S(h)(1 - error_S(h))/n)
where R_N satisfies Prob(-R_N ≤ Z ≤ R_N) = N% for a standard Normal Z:

Confidence N%:  50%   68%   80%   90%   95%   98%   99%
Constant R_N:   0.67  1.00  1.28  1.64  1.96  2.33  2.58
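A small Python sketch of this interval (illustrative):

```python
import math

# Illustrative sketch: two-sided N% confidence interval for the true error.
R = {50: 0.67, 68: 1.00, 80: 1.28, 90: 1.64, 95: 1.96, 98: 2.33, 99: 2.58}

def error_confidence_interval(error_s, n, confidence=95):
    """error_S(h) +/- R_N * sqrt(error_S(h)(1 - error_S(h))/n)."""
    half_width = R[confidence] * math.sqrt(error_s * (1 - error_s) / n)
    return error_s - half_width, error_s + half_width

# 12 errors on 40 examples: roughly (0.16, 0.44) at 95% confidence.
print(error_confidence_interval(0.3, 40, 95))
```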


Comparing Two Hypotheses
When comparing two hypotheses, the ordering of their sample accuracies may or may not accurately reflect the ordering of their true accuracies.
[Figure: two sample-error densities Pr(error_S(h)), one centered at error_{S1}(h1) and one at error_{S2}(h2).]
Interpretation: assume we test h1 on S1 and h2 on S2 and measure error_{S1}(h1) and error_{S2}(h2), respectively. These graphs indicate the probability distribution of the sample error. We can see that it is possible that the true error of h2 is lower than that of h1, and vice versa.


Comparing Two Hypotheses
• We wish to estimate the difference between the true errors of these hypotheses:
      d = error_{S1}(h1) - error_{S2}(h2)
• The difference of two normally distributed variables is also normally distributed.
[Figure: the density Pr(d) of the difference d, with 0 marked.]
Notice that the density function is a convolution of the original two.


Confidence in Difference
• The probability that error_D(h1) > error_D(h2) is the probability that d > 0, which is given by the shaded area.
[Figure: the density Pr(d) of d = error_{S1}(h1) - error_{S2}(h2), with the area where d > 0 shaded.]


Confidence in Difference
• Since the normal distribution is symmetric, we can also assert confidence intervals with lower bounds and upper bounds.
[Figure: the density Pr(d) of d = error_{S1}(h1) - error_{S2}(h2), with a symmetric two-sided interval marked.]


Standard Deviation of the Difference
• The variance of the difference is the sum of the variances:
      σ_d² ≈ error_{S1}(h1)(1 - error_{S1}(h1))/n1 + error_{S2}(h2)(1 - error_{S2}(h2))/n2
• The mean is the observed difference d. Therefore, the N% confidence interval on the true difference is d ± R_N σ_d.
• What is the probability that error_D(h1) > error_D(h2)? This is the confidence that d is in the one-sided interval d > 0.
• We find the highest value N such that d ≥ R_N σ_d, that is, R_N ≤ d/σ_d, and conclude that error_D(h1) > error_D(h2) with probability (100 - (100-N)/2)%.


Hypothesis Testing
• A statistical hypothesis is a statement about a set of parameters of a distribution. We are looking for procedures that determine whether the hypothesis is correct or not.
• In this case we can say that we accept the hypothesis that error_D(h1) > error_D(h2) with N% confidence.
• Equivalently, we can say that we reject the hypothesis that the difference is due to random chance, at a (100-N)/100 level of significance.
• By convention, in normal scientific practice, a confidence of 95% is high enough to assert that there is a "significant difference".


A Hypothesis Test
• Assume that, based on two different samples of 100 test instances, we observe:
      error_{S1}(h1) = 0.2,   error_{S2}(h2) = 0.3
• The observed difference is d = 0.3 - 0.2 = 0.1, and
      σ_d ≈ sqrt(0.2·(1-0.2)/100 + 0.3·(1-0.3)/100) ≈ 0.0608
• With N = 90: R_90 σ_d = 1.64 · 0.0608 ≈ 0.0997 ≤ d, and 100 - (100-90)/2 = 95.
• We can say that we accept the hypothesis that "h1 is better than h2" with 95% confidence, or that the difference is significant at the .05 level.


A Hypothesis Test
• Now assume that, based on two different samples of 100 test instances, we observe:
      error_{S1}(h1) = 0.2,   error_{S2}(h2) = 0.25
• The observed difference is d = 0.25 - 0.2 = 0.05, and
      σ_d ≈ sqrt(0.2·(1-0.2)/100 + 0.25·(1-0.25)/100) ≈ 0.0589
• Here d/σ_d ≈ 0.848, so the highest N in the table with R_N ≤ d/σ_d is N = 50 (R_50 = 0.67), and 100 - (100-50)/2 = 75.
• We conclude: "h1 is better than h2" with only 75% confidence.
• We cannot conclude that the difference is significant (since p > .05).
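Both examples can be reproduced with a short Python sketch (illustrative; it walks the table of constants R_N from the slides):

```python
import math

# Illustrative sketch: one-sided confidence that h1 is better than h2.
R = [(99, 2.58), (98, 2.33), (95, 1.96), (90, 1.64),
     (80, 1.28), (68, 1.00), (50, 0.67)]

def confidence_h1_better(e1, n1, e2, n2):
    """Highest (100 - (100-N)/2)% such that d >= R_N * sigma_d."""
    d = e2 - e1
    sigma_d = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    for N, r_n in R:  # table is sorted from the strictest constant down
        if r_n * sigma_d <= d:
            return 100 - (100 - N) / 2
    return None  # observed difference too small for even 50% in the table

print(confidence_h1_better(0.2, 100, 0.3, 100))   # 95.0 (first example)
print(confidence_h1_better(0.2, 100, 0.25, 100))  # 75.0 (second example)
```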


Comparing Learning Algorithms
• Given two algorithms A and B, we would like to know which of the methods is the better method, on average, for learning a particular function f.
• Statistical tests must control several sources of variation:
  - variation in selecting the test data
  - variation in selecting the training data
  - random decisions of the algorithms
• Algorithm A might do better than B when trained on a particular randomly selected training set, or when tested on a particular randomly selected test set, even though on the whole population they perform identically.


Comparing Learning Algorithms
• An ideal statistical test should derive conclusions based on estimating:
      E_{S ⊂ D} [ error_D(L_A(S)) - error_D(L_B(S)) ]
  where L(S) denotes the output hypothesis of the algorithm when trained on S, and the expectation is over all possible samples drawn independently from the underlying distribution.
• In practice we usually have a single sample D' from D to work with. The average is therefore taken over different splits of this sample into training/test sets.
• We want methods that:
  - identify a difference between the algorithms when it exists
  - do not find a difference when it does not exist


Methodology
• Assume a hypothesis (the null hypothesis), e.g., "the algorithms are equivalent".
• Choose a statistic: a figure that you can compute from the data and whose distribution you can estimate given that the hypothesis holds.
  - What value do we expect? (assuming the hypothesis holds)
  - What value do we get? (experimentally)
• What is the probability distribution of the statistic? What is the deviation of the empirical figure from the expected one?
• Decide: Is this due to chance? Yes/No, and with what confidence?


Distributions
• Normal Distribution: N(μ, σ²).
• Chi-Square: Consider the random variables X_i ~ N(μ_i, σ_i²) and the standardized variables Z_i = (X_i - μ_i)/σ_i. The random variable defined by
      Σ_{i=1}^{n} Z_i²
  is χ²(n) (chi-square with n degrees of freedom).


t-Distributions
• Student's t distribution: Let W be N(0,1), let V be χ²(n), and assume W and V are independent. Then the distribution of the random variable
      T_n = W / sqrt(V/n)
  is called a t-distribution (with n degrees of freedom).
• T_n is symmetric about zero. As n becomes larger, it becomes more and more like N(0,1).
• E[T_n] = 0, Var[T_n] = n/(n-2).


t Distributions
• Originally used when one can obtain an estimate for the mean but not for the standard deviation σ.
• We want a distribution that allows us to compute a confidence in the mean without knowing σ, using only an estimate s for it (based on the same sample that produced the mean).
• The quantity t is given by:
      t = (X̄ - μ) / (s/√n)
• That is, t is the deviation of the sample mean from the population mean, measured in units of the mean's standard error s/√n.
• This is good for small samples, and the tables depend on n.


K-Fold Cross Validation
• Partition the data D' into k disjoint subsets T_1, T_2, ..., T_k of equal size.
• For i from 1 to k do:
  - Use T_i for testing and the rest, S_i = D' - T_i, for training.
  - Set h_A = L_A(S_i), h_B = L_B(S_i), and
        δ_i = error_{T_i}(h_A) - error_{T_i}(h_B)
• Return the average difference in error:
      δ̄ = (1/k) Σ_i δ_i
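A minimal Python sketch of this loop (illustrative; train_A, train_B, and error_on are hypothetical stand-ins for the two learning algorithms and the error measurement):

```python
# Illustrative sketch: k-fold cross-validated difference in error between
# two learning algorithms A and B.
def kfold_error_difference(data, k, train_A, train_B, error_on):
    folds = [data[i::k] for i in range(k)]  # k disjoint, near-equal subsets
    deltas = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        h_A, h_B = train_A(train), train_B(train)
        deltas.append(error_on(h_A, test) - error_on(h_B, test))
    return sum(deltas) / k, deltas  # average difference and per-fold deltas
```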


K-Fold Cross Validation Comments
• 10 is a standard number of folds. When k = |D|, the method is called leave-one-out.
• Every example gets used as a test example exactly once and as a training example k-1 times.
• Test sets are independent, but training sets overlap significantly.
• The hypotheses are generated using (k-1)/k of the training data.
• Before, we compared hypotheses using independent test sets. Here, the hypotheses generated by algorithms A and B are tested on the same test sets (paired tests).


Paired t Tests
• Paired tests produce tighter bounds, since any difference is due to differences between the hypotheses rather than differences in the test sets.
Significance Testing of the Paired Tests:
• Compute the statistic:
      t = δ̄ / sqrt( (1/(k(k-1))) Σ_{i=1}^{k} (δ_i - δ̄)² )
  where δ_i is the measured difference between A and B on the ith data set and δ̄ = (1/k) Σ_i δ_i is their average.
• When k paired tests are performed, the statistic is distributed according to a t-distribution with k-1 degrees of freedom.
• With k = 30, the 95% (two-sided) critical value is |t| = 2.04.
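A small Python sketch of the statistic (illustrative; the δ values below are hypothetical):

```python
import math

# Illustrative sketch: paired t statistic over k measured differences.
def paired_t(deltas):
    k = len(deltas)
    mean = sum(deltas) / k
    var_of_mean = sum((d - mean) ** 2 for d in deltas) / (k * (k - 1))
    return mean / math.sqrt(var_of_mean)

# Hypothetical per-fold differences from a 10-fold experiment:
deltas = [0.02, 0.01, 0.03, 0.00, 0.02, 0.01, 0.02, 0.03, 0.01, 0.02]
print(paired_t(deltas))  # compare to a t-table with k-1 degrees of freedom
```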


Paired t Tests
• Paired t tests can be used in many ways.
• Sample the data 30 times. Split each sample into Train and Test. Run A and B on Train and evaluate on Test. Let δ_i be the difference in error. Estimate the same statistic. (Most common in Machine Learning; has problems.)
• 10-Fold Cross Validation: the ith experiment is done on test set T_i. Better, but has problems due to training set overlap.


5x2 Cross Validation
• Perform 5 replications of 2-fold cross validation.
• In each replication, the available data is randomly partitioned into two halves S1 and S2 of equal size.
• Train algorithms A and B on each half and test on the other. Error measures: e_A^1, e_B^1, e_A^2, e_B^2.
• Differences: p^1 = e_A^1 - e_B^1 and p^2 = e_A^2 - e_B^2, with mean p̄ = (p^1 + p^2)/2 and variance estimate s² = (p^1 - p̄)² + (p^2 - p̄)².
• Let s_i², i = 1, ..., 5, be the variances computed for each of the 5 replications. Then
      t = p^1_1 / sqrt( (1/5) Σ_{i=1}^{5} s_i² )
• has a t-distribution with k = 5 degrees of freedom.
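A Python sketch of the whole procedure (illustrative; train_A, train_B, and error_on are hypothetical stand-ins as before):

```python
import math
import random

# Illustrative sketch: the 5x2cv t statistic for comparing algorithms A and B.
def five_by_two_t(data, train_A, train_B, error_on):
    p11 = None       # p^1 of the first replication (the numerator)
    variances = []
    for rep in range(5):
        shuffled = random.sample(data, len(data))
        half = len(data) // 2
        S1, S2 = shuffled[:half], shuffled[half:]
        # Two folds: train on one half, test on the other.
        p1 = error_on(train_A(S1), S2) - error_on(train_B(S1), S2)
        p2 = error_on(train_A(S2), S1) - error_on(train_B(S2), S1)
        p_bar = (p1 + p2) / 2
        variances.append((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2)
        if rep == 0:
            p11 = p1
    return p11 / math.sqrt(sum(variances) / 5)  # t-distributed, 5 d.o.f.
```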


McNemar’s Test
• An alternative to Cross Validation, when the test can be run only once.
• Divide the sample S into a training set R and a test set T.
• Train algorithms A and B on R, yielding classifiers A, B.
• Record how each example in T is classified, and compute the counts:
      N_00 = examples misclassified by both A and B
      N_01 = examples misclassified by A but not B
      N_10 = examples misclassified by B but not A
      N_11 = examples misclassified by neither A nor B
      N = N_00 + N_01 + N_10 + N_11
  where N is the total number of examples in the test set T.


McNemar’s Test
• The hypothesis: the two learning algorithms have the same error rate on a randomly drawn sample. That is, we expect that N_01 = N_10, and that both equal (N_01 + N_10)/2.
• The statistic we use to measure deviation from the expected counts:
      (|N_01 - N_10| - 1)² / (N_01 + N_10)
• This statistic is distributed (approximately) as χ² with 1 degree of freedom. (The -1 is a continuity correction, since the statistic is discrete.)
• Example: since χ²_{1, 0.95} = 3.841, we reject the hypothesis with 95% confidence if the above ratio is greater than 3.841.
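A short Python sketch of the statistic (illustrative; the counts are hypothetical):

```python
# Illustrative sketch: McNemar's statistic from the disagreement counts.
def mcnemar_statistic(n01, n10):
    """(|N01 - N10| - 1)^2 / (N01 + N10), approximately chi-square, 1 d.o.f."""
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)

# Hypothetical counts: A errs alone on 30 test examples, B alone on 14.
stat = mcnemar_statistic(30, 14)
print(stat, stat > 3.841)  # ~5.11, True: reject "same error rate" at 95%
```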


Experimental Evaluation - Final Comments
• Good experimental methodology, including statistical analysis, is important in empirically comparing learning algorithms.
• The methods have their shortcomings; this is an active area of research. See Tom Dietterich, "Approximate statistical tests for comparing supervised classification learning algorithms" (Neural Computation).
• Artificial data is useful for testing certain hypotheses about specific strengths and weaknesses of algorithms, but only real data can test the hypothesis that the bias of the learner is useful for the actual problem.
• There are a few benchmarks for comparing learning algorithms. The UC Irvine repository is the one most commonly used: http://www.ics.uci.edu/~mlearn/MLRepository.html