
Lecture Slides for
Introduction to Machine Learning, 2nd Edition

Ethem Alpaydın © The MIT Press, 2010

[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e

Introduction

Questions:

Assessment of the expected error of a learning algorithm: Is the error rate of 1-NN less than 2%?

Comparing the expected errors of two algorithms: Is k-NN more accurate than MLP?

Training/validation/test sets

Resampling methods: K-fold cross-validation


Algorithm Preference

Criteria (application-dependent):

Misclassification error, or risk (loss functions)

Training time/space complexity

Testing time/space complexity

Interpretability

Easy programmability

Cost-sensitive learning


Factors and Response


Strategies of Experimentation


Response surface design for approximating and maximizing the response function in terms of the controllable factors


Guidelines for ML Experiments

A. Aim of the study

B. Selection of the response variable

C. Choice of factors and levels

D. Choice of experimental design

E. Performing the experiment

F. Statistical Analysis of the Data

G. Conclusions and Recommendations


Resampling and K-Fold Cross-Validation

The need for multiple training/validation sets:

{X_i, V_i}_i: training/validation sets of fold i

K-fold cross-validation: divide X into K parts X_i, i = 1, ..., K

V_1 = X_1, T_1 = X_2 ∪ X_3 ∪ ... ∪ X_K
V_2 = X_2, T_2 = X_1 ∪ X_3 ∪ ... ∪ X_K
...
V_K = X_K, T_K = X_1 ∪ X_2 ∪ ... ∪ X_{K−1}

Any two training sets T_i share K − 2 of the K parts.
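A minimal NumPy sketch of this partitioning, working on instance indices only (the function name, n, and K below are illustrative, not from the slides):

```python
import numpy as np

def kfold_indices(n, K, seed=0):
    """Split indices 0..n-1 into K parts; yield (training, validation) indices per fold."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(n), K)        # X_1, ..., X_K
    for i in range(K):
        V_i = parts[i]                                   # validation set = X_i
        T_i = np.concatenate(parts[:i] + parts[i + 1:])  # training set = union of the other parts
        yield T_i, V_i

for T_i, V_i in kfold_indices(n=20, K=5):
    print(len(T_i), len(V_i))   # 16 4 on each fold
```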


5×2 Cross-Validation

5 times 2-fold cross-validation (Dietterich, 1998): in each of five replications, the data are split into two equal halves X_i^(1) and X_i^(2):

T_1 = X_1^(1), V_1 = X_1^(2)
T_2 = X_1^(2), V_2 = X_1^(1)
T_3 = X_2^(1), V_3 = X_2^(2)
T_4 = X_2^(2), V_4 = X_2^(1)
...
T_9 = X_5^(1), V_9 = X_5^(2)
T_10 = X_5^(2), V_10 = X_5^(1)
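A sketch of the same splitting scheme in NumPy (index-based, illustrative names):

```python
import numpy as np

def five_by_two_indices(n, seed=0):
    """Five replications; each shuffles the data, splits it in half, then swaps the roles."""
    rng = np.random.default_rng(seed)
    for i in range(5):
        perm = rng.permutation(n)
        half1, half2 = perm[: n // 2], perm[n // 2:]   # X_i^(1), X_i^(2)
        yield half1, half2                             # T = X_i^(1), V = X_i^(2)
        yield half2, half1                             # T = X_i^(2), V = X_i^(1)

folds = list(five_by_two_indices(n=100))
print(len(folds))   # 10 training/validation pairs
```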


Bootstrapping

Draw N instances from a dataset of size N with replacement.

The probability that we do not pick a given instance after N draws is

(1 − 1/N)^N ≈ e⁻¹ = 0.368

that is, only 36.8% of the instances are new (left out of the bootstrap sample); the sample contains roughly 63.2% of the distinct instances.
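A quick numerical check of the 0.368 figure by simulation (N is an illustrative choice):

```python
import numpy as np

N = 1000
rng = np.random.default_rng(0)
sample = rng.integers(0, N, size=N)        # draw N instances with replacement
never_drawn = N - np.unique(sample).size   # instances that were never picked
print(never_drawn / N, (1 - 1 / N) ** N)   # both close to e**-1 = 0.368
```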


Measuring Error

Error rate = # of errors / # of instances = (FN + FP) / N
Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
Precision = # of found positives / # of found = TP / (TP + FP)
Specificity = TN / (TN + FP)
False alarm rate = FP / (FP + TN) = 1 − Specificity
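All of these follow directly from the confusion-matrix counts; a small sketch with hypothetical TP, FP, TN, FN values:

```python
# Hypothetical confusion-matrix counts for a two-class problem
TP, FP, TN, FN = 40, 10, 45, 5
N = TP + FP + TN + FN

error_rate  = (FN + FP) / N
recall      = TP / (TP + FN)     # sensitivity, hit rate
precision   = TP / (TP + FP)
specificity = TN / (TN + FP)
false_alarm = FP / (FP + TN)     # = 1 - specificity
print(error_rate, recall, precision, specificity, false_alarm)
```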


ROC Curve
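The ROC curve plots hit rate (TP rate) against false-alarm rate (FP rate) as the threshold on the classifier's output is varied; a minimal sketch of that sweep, assuming made-up scores and labels:

```python
import numpy as np

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1])  # hypothetical outputs
labels = np.array([1,   1,   0,   1,   1,    0,   0,   1,   0,   0])    # 1 = positive class

pos, neg = labels.sum(), (labels == 0).sum()
points = []
for thr in np.unique(scores)[::-1]:              # sweep the threshold from high to low
    pred = scores >= thr
    tpr = (pred & (labels == 1)).sum() / pos     # hit rate
    fpr = (pred & (labels == 0)).sum() / neg     # false-alarm rate
    points.append((fpr, tpr))
print(points)   # moves toward (1, 1) as the threshold drops
```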


Precision and Recall


Interval Estimation

X = {x^t}_t where x^t ~ N(μ, σ²)

m = Σ_t x^t / N;   m ~ N(μ, σ²/N)

√N (m − μ) / σ ~ Z   (unit normal)

P{−1.96 < √N (m − μ)/σ < 1.96} = 0.95
P{m − 1.96 σ/√N < μ < m + 1.96 σ/√N} = 0.95

100(1 − α) percent confidence interval:

P{m − z_{α/2} σ/√N < μ < m + z_{α/2} σ/√N} = 1 − α
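A sketch of the 95% z-interval, assuming σ is known (all values illustrative):

```python
import numpy as np
from scipy import stats

sigma, N = 2.0, 50
x = np.random.default_rng(0).normal(loc=5.0, scale=sigma, size=N)  # x^t ~ N(mu, sigma^2)
m = x.mean()

alpha = 0.05
z = stats.norm.ppf(1 - alpha / 2)        # 1.96 for alpha = 0.05
half = z * sigma / np.sqrt(N)
print(m - half, m + half)                # 100(1 - alpha)% confidence interval for mu
```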


One-sided confidence interval:

P{√N (m − μ)/σ < 1.64} = 0.95
P{m − 1.64 σ/√N < μ} = 0.95
P{m − z_α σ/√N < μ} = 1 − α

When σ² is not known:

S² = Σ_t (x^t − m)² / (N − 1)

√N (m − μ) / S ~ t_{N−1}

P{m − t_{α/2, N−1} S/√N < μ < m + t_{α/2, N−1} S/√N} = 1 − α
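The same computation with the sample estimate S and the t distribution (illustrative sample):

```python
import numpy as np
from scipy import stats

x = np.array([2.3, 1.9, 2.8, 2.5, 2.1, 2.6, 2.2, 2.4])   # hypothetical sample
N, m = len(x), x.mean()
S = x.std(ddof=1)                        # S^2 = sum_t (x^t - m)^2 / (N - 1)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=N - 1)
half = t_crit * S / np.sqrt(N)
print(m - half, m + half)                # 100(1 - alpha)% confidence interval for mu
```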


Hypothesis Testing

Reject a null hypothesis if it is not supported by the sample with enough confidence.

X = {x^t}_t where x^t ~ N(μ, σ²)

H0: μ = μ0 vs. H1: μ ≠ μ0

Accept H0 with level of significance α if μ0 is in the 100(1 − α) percent confidence interval, i.e., if

√N (m − μ0) / σ ∈ (−z_{α/2}, z_{α/2})

(two-sided test)
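A sketch of the two-sided z test with σ assumed known (μ0, σ, and the sample are illustrative):

```python
import numpy as np
from scipy import stats

mu0, sigma, alpha = 5.0, 2.0, 0.05
x = np.random.default_rng(1).normal(5.3, sigma, size=40)   # hypothetical sample

z = np.sqrt(len(x)) * (x.mean() - mu0) / sigma
z_crit = stats.norm.ppf(1 - alpha / 2)
print("accept H0" if abs(z) < z_crit else "reject H0", z)
```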


One-sided test: H0: μ ≤ μ0 vs. H1: μ > μ0

Accept H0 if

√N (m − μ0) / σ ∈ (−∞, z_α)

Variance unknown: use t instead of z.

Accept H0: μ = μ0 if

√N (m − μ0) / S ∈ (−t_{α/2, N−1}, t_{α/2, N−1})
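With unknown variance the same test uses S and t_{N−1}; SciPy's one-sample t test gives an equivalent answer (numbers illustrative):

```python
import numpy as np
from scipy import stats

mu0, alpha = 5.0, 0.05
x = np.array([5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.2, 5.0])   # hypothetical sample

t = np.sqrt(len(x)) * (x.mean() - mu0) / x.std(ddof=1)
t_crit = stats.t.ppf(1 - alpha / 2, df=len(x) - 1)
print("accept H0" if abs(t) < t_crit else "reject H0")

print(stats.ttest_1samp(x, popmean=mu0))   # library cross-check (statistic, p-value)
```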


Assessing Error: H0: p ≤ p0 vs. H1: p > p0

Single training/validation set: Binomial Test

If the error probability is p0, the probability that there are e errors or fewer in N validation trials is

P{X ≤ e} = Σ_{j=0}^{e} (N choose j) p0^j (1 − p0)^{N−j}

Accept H0 if this probability is less than 1 − α.

Example: N = 100, e = 20.
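For the slide's example (N = 100, e = 20), the cumulative probability can be computed directly; p0 and α below are illustrative choices:

```python
from scipy import stats

N, e, p0, alpha = 100, 20, 0.15, 0.05       # p0 and alpha are illustrative
prob = stats.binom.cdf(e, N, p0)            # P{X <= e} when the error probability is p0
print(prob, "accept H0" if prob < 1 - alpha else "reject H0")
```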


Normal Approximation to the Binomial

The number of errors X is approximately normal with mean Np0 and variance Np0(1 − p0):

(X − Np0) / √(Np0 (1 − p0)) ~ Z (approximately)

Accept H0 if the value of this statistic for X = e is less than z_{1−α}.
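A comparison of the exact binomial tail with its normal approximation (same illustrative numbers):

```python
import numpy as np
from scipy import stats

N, e, p0, alpha = 100, 20, 0.15, 0.05
z = (e - N * p0) / np.sqrt(N * p0 * (1 - p0))
print(z, "accept H0" if z < stats.norm.ppf(1 - alpha) else "reject H0")
print(stats.binom.cdf(e, N, p0), stats.norm.cdf(z))   # exact vs. approximate P{X <= e}
```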


t Test

Multiple training/validation sets: x_i^t = 1 if instance t is misclassified on fold i, 0 otherwise.

Error rate of fold i:

p_i = Σ_{t=1}^{N} x_i^t / N

With m and S² the average and variance of the p_i, we accept the hypothesis of p0 or less error if

√K (m − p0) / S ~ t_{K−1}

is less than t_{α, K−1}.
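A sketch of this fold-based t test with hypothetical per-fold error rates p_i:

```python
import numpy as np
from scipy import stats

p = np.array([0.18, 0.22, 0.19, 0.25, 0.21, 0.17, 0.20, 0.23, 0.19, 0.22])  # K fold errors
p0, alpha = 0.20, 0.05
K, m, S = len(p), p.mean(), p.std(ddof=1)

t = np.sqrt(K) * (m - p0) / S
print("accept p <= p0" if t < stats.t.ppf(1 - alpha, df=K - 1) else "reject")
```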


Comparing Classifiers: H0: μ0 = μ1 vs. H1: μ0 ≠ μ1

Single training/validation set: McNemar's Test

e01: number of instances misclassified by classifier 1 but not by classifier 2; e10: vice versa.

Under H0, we expect e01 = e10 = (e01 + e10)/2

(|e01 − e10| − 1)² / (e01 + e10) ~ χ²_1

Accept H0 if this statistic is < χ²_{α,1}.
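A sketch with hypothetical disagreement counts e01 and e10:

```python
from scipy import stats

e01, e10, alpha = 12, 25, 0.05                      # hypothetical disagreement counts
stat = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)      # McNemar statistic
crit = stats.chi2.ppf(1 - alpha, df=1)
print(stat, "accept H0" if stat < crit else "reject H0")
```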


K-Fold CV Paired t Test

Use K-fold cv to get K training/validation folds.

p_i^1, p_i^2: errors of classifiers 1 and 2 on fold i
p_i = p_i^1 − p_i^2: paired difference on fold i

The null hypothesis is that p_i has mean 0:

H0: μ = 0 vs. H1: μ ≠ 0

m = Σ_{i=1}^{K} p_i / K,   s² = Σ_{i=1}^{K} (p_i − m)² / (K − 1)

√K m / s ~ t_{K−1}

Accept H0 if this statistic is in (−t_{α/2, K−1}, t_{α/2, K−1}).
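A sketch with hypothetical per-fold error rates for the two classifiers:

```python
import numpy as np
from scipy import stats

p1 = np.array([0.20, 0.18, 0.22, 0.19, 0.21, 0.23, 0.20, 0.18, 0.24, 0.21])  # classifier 1
p2 = np.array([0.22, 0.21, 0.25, 0.20, 0.24, 0.22, 0.23, 0.21, 0.26, 0.24])  # classifier 2
diff, alpha = p1 - p2, 0.05

K, m, s = len(diff), diff.mean(), diff.std(ddof=1)
t = np.sqrt(K) * m / s
t_crit = stats.t.ppf(1 - alpha / 2, df=K - 1)
print("accept H0" if abs(t) < t_crit else "reject H0")

print(stats.ttest_rel(p1, p2))   # library cross-check: paired t test
```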


5×2 cv Paired t Test

Use 5×2 cv to get 2 folds of 5 training/validation replications (Dietterich, 1998).

p_i^(j): difference between the errors of classifiers 1 and 2 on fold j = 1, 2 of replication i = 1, ..., 5

p̄_i = (p_i^(1) + p_i^(2)) / 2,   s_i² = (p_i^(1) − p̄_i)² + (p_i^(2) − p̄_i)²

t = p_1^(1) / √(Σ_{i=1}^{5} s_i² / 5) ~ t_5

Two-sided test: accept H0: μ0 = μ1 if t ∈ (−t_{α/2, 5}, t_{α/2, 5})
One-sided test: accept H0: μ0 ≤ μ1 if t < t_{α, 5}
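A sketch of the statistic from a hypothetical 5×2 array of error differences p[i, j]:

```python
import numpy as np
from scipy import stats

# p[i, j]: difference between the two classifiers' errors on fold j of replication i (hypothetical)
p = np.array([[0.02, 0.01],
              [0.03, 0.00],
              [0.01, 0.02],
              [0.02, 0.03],
              [0.00, 0.01]])
pbar = p.mean(axis=1)                                  # per-replication mean
s2 = (p[:, 0] - pbar) ** 2 + (p[:, 1] - pbar) ** 2     # per-replication variance estimate

t = p[0, 0] / np.sqrt(s2.sum() / 5)                    # numerator uses p_1^(1) only
alpha = 0.05
print("accept H0" if abs(t) < stats.t.ppf(1 - alpha / 2, df=5) else "reject H0")
```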


5×2 cv Paired F Test

f = Σ_{i=1}^{5} Σ_{j=1}^{2} (p_i^(j))² / (2 Σ_{i=1}^{5} s_i²) ~ F_{10, 5}

Two-sided test: accept H0: μ0 = μ1 if f < F_{α, 10, 5}
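The F variant uses all ten differences in the numerator; a sketch reusing the same hypothetical array:

```python
import numpy as np
from scipy import stats

p = np.array([[0.02, 0.01], [0.03, 0.00], [0.01, 0.02], [0.02, 0.03], [0.00, 0.01]])
pbar = p.mean(axis=1)
s2 = (p[:, 0] - pbar) ** 2 + (p[:, 1] - pbar) ** 2

f = (p ** 2).sum() / (2 * s2.sum())
alpha = 0.05
print("accept H0" if f < stats.f.ppf(1 - alpha, dfn=10, dfd=5) else "reject H0")
```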


Comparing L > 2 Algorithms: Analysis of Variance (ANOVA)

H0: μ1 = μ2 = ... = μL

Errors of L algorithms on K folds:

X_ij ~ N(μ_j, σ²), j = 1, ..., L, i = 1, ..., K

We construct two estimators of σ². One is valid only if H0 is true, the other is always valid. We reject H0 if the two estimators disagree.


If H0 is true:

m_j = Σ_{i=1}^{K} X_ij / K ~ N(μ, σ²/K)

Thus an estimator of σ²/K is Σ_{j=1}^{L} (m_j − m)² / (L − 1), where m = Σ_{j=1}^{L} m_j / L; namely, an estimator of σ² is

σ̂² = K · Σ_{j=1}^{L} (m_j − m)² / (L − 1)

So when H0 is true, with SSb ≡ K Σ_j (m_j − m)², we have

SSb / σ² ~ χ²_{L−1}


Regardless of whether H0 is true, our second estimator of σ² is the average of the group variances S_j²:

S_j² = Σ_{i=1}^{K} (X_ij − m_j)² / (K − 1)

σ̂² = Σ_{j=1}^{L} S_j² / L = SSw / (L(K − 1)), where SSw ≡ Σ_j Σ_i (X_ij − m_j)²

SSw / σ² ~ χ²_{L(K−1)}

So, if H0 is true:

(SSb / (L − 1)) / (SSw / (L(K − 1))) ~ F_{L−1, L(K−1)}

Reject H0: μ1 = μ2 = ... = μL if this F statistic is > F_{α, L−1, L(K−1)}.
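A sketch of the F computation from a hypothetical K×L error matrix X (rows = folds, columns = algorithms), with SciPy's one-way ANOVA as a cross-check:

```python
import numpy as np
from scipy import stats

# X[i, j]: error of algorithm j on fold i (hypothetical values)
X = np.array([[0.21, 0.24, 0.20],
              [0.19, 0.25, 0.22],
              [0.22, 0.23, 0.21],
              [0.20, 0.26, 0.19],
              [0.23, 0.24, 0.20]])
K, L = X.shape
m_j, m = X.mean(axis=0), X.mean()

SSb = K * ((m_j - m) ** 2).sum()            # between-group sum of squares
SSw = ((X - m_j) ** 2).sum()                # within-group sum of squares
F = (SSb / (L - 1)) / (SSw / (L * (K - 1)))

alpha = 0.05
crit = stats.f.ppf(1 - alpha, dfn=L - 1, dfd=L * (K - 1))
print(F, "reject H0" if F > crit else "accept H0")
print(stats.f_oneway(*[X[:, j] for j in range(L)]))   # library cross-check
```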


ANOVA table

If ANOVA rejects, we do pairwise posthoc tests:

H0: μ_i = μ_j vs. H1: μ_i ≠ μ_j

t = (m_i − m_j) / √(2 σ̂_w² / K) ~ t_{L(K−1)}

where σ̂_w² = SSw / (L(K − 1)) is the within-group variance estimate.


Comparison over Multiple Datasets

Comparing two algorithms:

Sign test: Count how many times A beats B over N datasets, and check if this could have happened by chance if A and B had the same error rate.

Comparing multiple algorithms:

Kruskal-Wallis test: Calculate the average rank of all algorithms over the N datasets, and check if these ranks could have occurred by chance if they all had equal error.

If KW rejects, we do pairwise posthoc tests to find which pairs have a significant rank difference.
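A sketch of the sign test over N datasets, plus SciPy's Kruskal-Wallis for the multi-algorithm case (all counts and error values hypothetical):

```python
from scipy import stats

# Sign test: A beats B on 12 of 15 datasets; under H0 the number of wins ~ Binomial(N, 0.5)
wins, N = 12, 15
p_value = min(1.0, 2 * min(stats.binom.cdf(wins, N, 0.5), stats.binom.sf(wins - 1, N, 0.5)))
print(p_value)

# Kruskal-Wallis over the per-dataset errors of three algorithms
errA = [0.21, 0.25, 0.19, 0.30, 0.22]
errB = [0.24, 0.27, 0.20, 0.33, 0.25]
errC = [0.20, 0.24, 0.18, 0.29, 0.21]
print(stats.kruskal(errA, errB, errC))
```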
