Lecture Slides for Introduction to Machine Learning 2e
ETHEM ALPAYDIN © The MIT Press, 2010
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e
Introduction
Questions:
Assessment of the expected error of a learning algorithm: is the error rate of 1-NN less than 2%?
Comparing the expected errors of two algorithms: is k-NN more accurate than MLP?
Training/validation/test sets
Resampling methods: K-fold cross-validation
Lecture Notes for E. Alpaydın, 2010, Introduction to Machine Learning 2e © The MIT Press (V1.0)
Algorithm Preference
Criteria (application-dependent):
Misclassification error, or risk (loss functions)
Training time/space complexity
Testing time/space complexity
Interpretability
Easy programmability
Cost-sensitive learning
Factors and Response
Strategies of Experimentation
Response surface design for approximating and maximizing the response function in terms of the controllable factors
Guidelines for ML experiments
A. Aim of the study
B. Selection of the response variable
C. Choice of factors and levels
D. Choice of experimental design
E. Performing the experiment
F. Statistical analysis of the data
G. Conclusions and recommendations
Resampling and K-Fold Cross-Validation
The need for multiple training/validation sets:
{Xi, Vi}i: training/validation sets of fold i
K-fold cross-validation: divide X into K parts, Xi, i = 1, ..., K
The training sets Ti of any two folds share K − 2 of the K parts:
V1 = X1, T1 = X2 ∪ X3 ∪ … ∪ XK
V2 = X2, T2 = X1 ∪ X3 ∪ … ∪ XK
…
VK = XK, TK = X1 ∪ X2 ∪ … ∪ XK−1
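The Vi/Ti construction above can be sketched as follows (index-based; the helper name `k_fold_splits` is hypothetical, not from the book):

```python
def k_fold_splits(n_instances, K):
    """Return (train_indices, val_indices) pairs, one per fold:
    fold i validates on part i and trains on the other K-1 parts."""
    indices = list(range(n_instances))
    fold_size = n_instances // K
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(K)]
    folds[-1].extend(indices[K * fold_size:])  # remainder into last fold
    splits = []
    for i in range(K):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits
```

Any two training sets produced this way indeed share K − 2 parts, so the folds' error estimates are not independent.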
5×2 Cross-Validation
T1 = X1(1), V1 = X2(1)
T2 = X2(1), V2 = X1(1)
T3 = X1(2), V3 = X2(2)
T4 = X2(2), V4 = X1(2)
…
T9 = X1(5), V9 = X2(5)
T10 = X2(5), V10 = X1(5)
5 times 2 fold cross-validation (Dietterich, 1998)
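A sketch of the 5×2 scheme above (hypothetical helper name; each of 5 replications shuffles the data, halves it, and uses each half once for training and once for validation):

```python
import random

def five_by_two_splits(n_instances, seed=0):
    """Return the 10 (train, validation) index pairs of 5x2 CV."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(5):
        idx = list(range(n_instances))
        rng.shuffle(idx)
        half = n_instances // 2
        a, b = idx[:half], idx[half:]
        pairs.append((a, b))  # T = first half, V = second half
        pairs.append((b, a))  # roles swapped
    return pairs
```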
Bootstrapping
Draw N instances from a dataset of size N with replacement.
The probability that we do not pick a given instance after N draws is
(1 − 1/N)^N ≈ e⁻¹ = 0.368
that is, only 36.8% of the instances are new (never drawn)!
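The 0.368 figure above can be checked numerically (hypothetical helper name):

```python
def prob_never_drawn(N):
    """Probability that a given instance is never picked in N draws
    with replacement: (1 - 1/N)**N, which tends to e**-1 ≈ 0.368."""
    return (1 - 1 / N) ** N
```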
Measuring Error
Error rate = # of errors / # of instances = (FN + FP) / N
Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
Precision = # of found positives / # of found = TP / (TP + FP)
Specificity = TN / (TN + FP)
False alarm rate = FP / (FP + TN) = 1 − specificity
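The confusion-matrix measures above translate directly into code (hypothetical helper name):

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the error measures defined above from confusion counts."""
    n = tp + fp + tn + fn
    return {
        "error_rate": (fn + fp) / n,
        "recall": tp / (tp + fn),            # sensitivity, hit rate
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "false_alarm_rate": fp / (fp + tn),  # 1 - specificity
    }
```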
ROC Curve
Precision and Recall
Interval Estimation
X = {x^t}t where x^t ~ N(μ, σ²)
m ~ N(μ, σ²/N)
√N (m − μ)/σ ~ Z

P{−1.96 < √N (m − μ)/σ < 1.96} = 0.95
P{m − 1.96 σ/√N < μ < m + 1.96 σ/√N} = 0.95

100(1 − α) percent confidence interval:
P{m − z(α/2) σ/√N < μ < m + z(α/2) σ/√N} = 1 − α
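The 95% interval above (σ known, z(0.025) = 1.96) can be computed as (hypothetical helper name):

```python
import math

def z_confidence_interval(m, sigma, N, z=1.96):
    """100(1-α)% interval m ± z(α/2) σ/√N; z=1.96 gives ~95%."""
    half = z * sigma / math.sqrt(N)
    return (m - half, m + half)
```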
One-sided confidence interval:
P{√N (m − μ)/σ < 1.64} = 0.95
P{m − 1.64 σ/√N < μ} = 0.95
P{m − z(α) σ/√N < μ} = 1 − α

When σ² is not known:
S² = Σt (x^t − m)² / (N − 1)
√N (m − μ)/S ~ t(N−1)
P{m − t(α/2, N−1) S/√N < μ < m + t(α/2, N−1) S/√N} = 1 − α
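A minimal sketch of the unknown-σ interval (hypothetical helper name; the t critical value must be looked up for df = N − 1, e.g. t(0.025, 4) ≈ 2.776 for a 95% interval with N = 5):

```python
import math
import statistics

def t_confidence_interval(sample, t_crit):
    """m ± t(α/2, N-1) S/√N for a given tabulated t critical value."""
    N = len(sample)
    m = statistics.mean(sample)
    S = statistics.stdev(sample)  # sample std dev, N-1 denominator as above
    half = t_crit * S / math.sqrt(N)
    return (m - half, m + half)
```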
Hypothesis Testing
Reject a null hypothesis if it is not supported by the sample with enough confidence
X = {x^t}t where x^t ~ N(μ, σ²)
H0: μ = μ0 vs. H1: μ ≠ μ0
Two-sided test: accept H0 with level of significance α if μ0 is in the 100(1 − α) percent confidence interval, that is, if
√N (m − μ0)/σ ∈ (−z(α/2), z(α/2))
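The two-sided acceptance rule above, as a short sketch (hypothetical helper name; z(0.025) = 1.96 for α = 0.05):

```python
import math

def z_test_two_sided(m, mu0, sigma, N, z_crit=1.96):
    """Accept H0: μ = μ0 iff √N (m - μ0)/σ lies in (-z_crit, z_crit)."""
    z = math.sqrt(N) * (m - mu0) / sigma
    return -z_crit < z < z_crit
```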
One-sided test: H0: μ ≤ μ0 vs. H1: μ > μ0
Accept if √N (m − μ0)/σ ∈ (−∞, z(α))
Variance unknown: use t instead of z
Accept H0: μ = μ0 if √N (m − μ0)/S ∈ (−t(α/2, N−1), t(α/2, N−1))
Assessing Error: H0: p ≤ p0 vs. H1: p > p0
Single training/validation set: binomial test
If the error probability is p0, the probability that there are e errors or fewer in N validation trials is
P{X ≤ e} = Σ(j=0..e) C(N, j) p0^j (1 − p0)^(N−j)
Accept H0 if this probability is less than 1 − α
Example: N = 100, e = 20
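The binomial tail probability above can be computed exactly with the standard library (hypothetical helper name):

```python
from math import comb

def binomial_cdf(e, N, p0):
    """P{X <= e}: probability of e errors or fewer in N Bernoulli(p0) trials."""
    return sum(comb(N, j) * p0**j * (1 - p0)**(N - j) for j in range(e + 1))
```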
Normal Approximation to the Binomial
The number of errors X is approximately normal with mean Np0 and variance Np0(1 − p0):
(X − Np0) / √(Np0(1 − p0)) ~ Z (approximately)
Accept H0 if this statistic for X = e is less than z(1−α)
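The standardized statistic above, as a one-line sketch (hypothetical helper name):

```python
import math

def binomial_z(e, N, p0):
    """Standardize the error count: (X - Np0)/√(Np0(1-p0)) ≈ Z for X = e."""
    return (e - N * p0) / math.sqrt(N * p0 * (1 - p0))
```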
t Test
Multiple training/validation sets
x_i^t = 1 if instance t is misclassified on fold i
Error rate of fold i: p_i = Σt x_i^t / N
With m and S² the average and variance of the p_i, we accept the hypothesis of p0 or less error if
√K (m − p0) / S ~ t(K−1)
is less than t(α, K−1)
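The fold-level t statistic above, as a sketch (hypothetical helper name; `p` holds the K per-fold error rates):

```python
import math
import statistics

def t_statistic_vs_p0(p, p0):
    """√K (m - p0)/S over K per-fold error rates; compare with t(α, K-1)."""
    K = len(p)
    m = statistics.mean(p)
    S = statistics.stdev(p)
    return math.sqrt(K) * (m - p0) / S
```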
Comparing Classifiers: H0: μ0 = μ1 vs. H1: μ0 ≠ μ1
Single training/validation set: McNemar's Test
e01: number of examples misclassified by classifier 1 but not by 2; e10: misclassified by 2 but not by 1
Under H0, we expect e01 = e10 = (e01 + e10)/2
(|e01 − e10| − 1)² / (e01 + e10) ~ X²(1)
Accept H0 if this statistic is < X²(α, 1)
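The McNemar statistic above is a one-liner (hypothetical helper name; the critical value X²(0.05, 1) ≈ 3.84):

```python
def mcnemar_statistic(e01, e10):
    """(|e01 - e10| - 1)^2 / (e01 + e10), with continuity correction."""
    return (abs(e01 - e10) - 1) ** 2 / (e01 + e10)
```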
K-Fold CV Paired t Test
Use K-fold cv to get K training/validation folds
p_i¹, p_i²: errors of classifiers 1 and 2 on fold i
p_i = p_i¹ − p_i²: paired difference on fold i
The null hypothesis is that p_i has mean 0:
H0: μ = 0 vs. H1: μ ≠ 0
m = Σi p_i / K, s² = Σi (p_i − m)² / (K − 1)
√K m / s ~ t(K−1)
Accept H0 if this statistic is in (−t(α/2, K−1), t(α/2, K−1))
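The paired statistic above, sketched on two lists of per-fold errors (hypothetical helper name):

```python
import math
import statistics

def paired_t_statistic(p1, p2):
    """√K m / s on the per-fold differences p_i = p_i¹ - p_i²."""
    p = [a - b for a, b in zip(p1, p2)]
    K = len(p)
    m = statistics.mean(p)
    s = statistics.stdev(p)
    return math.sqrt(K) * m / s
```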
5×2 cv Paired t Test
Use 5×2 cv to get 2 folds of 5 training/validation replications (Dietterich, 1998)
p_i(j): difference between the errors of classifiers 1 and 2 on fold j = 1, 2 of replication i = 1, ..., 5
p̄_i = (p_i(1) + p_i(2)) / 2, s_i² = (p_i(1) − p̄_i)² + (p_i(2) − p̄_i)²
t = p_1(1) / √(Σ(i=1..5) s_i² / 5) ~ t(5)
Two-sided test: accept H0: μ0 = μ1 if t ∈ (−t(α/2, 5), t(α/2, 5))
One-sided test: accept H0: μ0 ≤ μ1 if t < t(α, 5)
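The 5×2 t statistic above, sketched with `p` as five (fold-1, fold-2) difference pairs (hypothetical helper name):

```python
import math

def five_by_two_t(p):
    """t = p[0][0] / sqrt(mean of per-replication variances) ~ t(5)."""
    s2 = []
    for p1, p2 in p:
        pbar = (p1 + p2) / 2
        s2.append((p1 - pbar) ** 2 + (p2 - pbar) ** 2)
    return p[0][0] / math.sqrt(sum(s2) / 5)
```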
5×2 cv Paired F Test
f = (Σ(i=1..5) Σ(j=1,2) (p_i(j))²) / (2 Σ(i=1..5) s_i²) ~ F(10, 5)
Two-sided test: accept H0: μ0 = μ1 if f < F(α, 10, 5)
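The F statistic above, on the same five (fold-1, fold-2) difference pairs (hypothetical helper name):

```python
def five_by_two_f(p):
    """f = (Σ_i Σ_j p_ij²) / (2 Σ_i s_i²), compared with F(α, 10, 5)."""
    num = sum(p1 ** 2 + p2 ** 2 for p1, p2 in p)
    s2 = sum((p1 - (p1 + p2) / 2) ** 2 + (p2 - (p1 + p2) / 2) ** 2
             for p1, p2 in p)
    return num / (2 * s2)
```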
Comparing L > 2 Algorithms: Analysis of Variance (ANOVA)
H0: μ1 = μ2 = … = μL
Errors of L algorithms on K folds: X_ij ~ N(μ_j, σ²), j = 1, ..., L, i = 1, ..., K
We construct two estimators of σ². One is valid only if H0 is true; the other is always valid. We reject H0 if the two estimators disagree.
If H0 is true:
m_j = Σi X_ij / K ~ N(μ, σ²/K)
m = Σj m_j / L
S² = Σj (m_j − m)² / (L − 1)
Thus an estimator of σ² is K · S², namely
σ̂² = K · Σj (m_j − m)² / (L − 1)
So when H0 is true, with SSb = K Σj (m_j − m)², we have
SSb / σ² ~ X²(L−1)
Regardless of H0, our second estimator of σ² is the average of the group variances S_j²:
S_j² = Σi (X_ij − m_j)² / (K − 1)
σ̂² = Σj S_j² / L = Σj Σi (X_ij − m_j)² / (L(K − 1))
With SSw = Σj Σi (X_ij − m_j)²:
SSw / σ² ~ X²(L(K−1))
(SSb / (L − 1)) / (SSw / (L(K − 1))) ~ F(L−1, L(K−1))
Accept H0: μ1 = … = μL if this statistic is < F(α, L−1, L(K−1))
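The between/within decomposition above, sketched end to end (hypothetical helper name; rows of `X` are algorithms, columns are folds):

```python
def anova_f(X):
    """(SSb/(L-1)) / (SSw/(L(K-1))), the F statistic under H0."""
    L, K = len(X), len(X[0])
    m_j = [sum(row) / K for row in X]          # per-algorithm means
    m = sum(m_j) / L                           # grand mean
    ssb = K * sum((mj - m) ** 2 for mj in m_j)
    ssw = sum((x - m_j[j]) ** 2 for j, row in enumerate(X) for x in row)
    return (ssb / (L - 1)) / (ssw / (L * (K - 1)))
```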
ANOVA table
If ANOVA rejects, we do pairwise posthoc tests:
H0: μi = μj vs. H1: μi ≠ μj
t = (m_i − m_j) / √(2 σ̂w² / K) ~ t(L(K−1))
Comparison over Multiple Datasets
Comparing two algorithms:
Sign test: count how many times A beats B over N datasets, and check whether this could have occurred by chance if A and B indeed had the same error rate
Comparing multiple algorithms:
Kruskal-Wallis test: calculate the average rank of each algorithm over the N datasets, and check whether these ranks could have occurred by chance if all algorithms had equal error
If KW rejects, we do pairwise posthoc tests to find which pairs have a significant rank difference
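The sign test described above is an exact binomial test; a minimal sketch of its two-sided p-value (hypothetical helper name; under H0 the number of wins is Bin(N, 1/2)):

```python
from math import comb

def sign_test_p_value(wins, N):
    """Two-sided sign test: p = 2 * P{X >= max(wins, N - wins)}
    for X ~ Bin(N, 0.5), capped at 1."""
    k = max(wins, N - wins)
    tail = sum(comb(N, j) for j in range(k, N + 1)) / 2 ** N
    return min(1.0, 2 * tail)
```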