Lecture Slides for
Introduction to Machine Learning, 2nd Edition
ETHEM ALPAYDIN © The MIT Press, 2010
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e


Rationale

No Free Lunch Theorem: there is no single algorithm that is always the most accurate.

Generate a group of base-learners which, when combined, has higher accuracy.

Different learners use different:
  Algorithms
  Hyperparameters
  Representations / Modalities / Views
  Training sets
  Subproblems

Diversity vs. accuracy

Voting

Linear combination:
\( y = \sum_{j=1}^{L} w_j d_j \quad \text{where } w_j \ge 0 \text{ and } \sum_{j=1}^{L} w_j = 1 \)

Classification:
\( y_i = \sum_{j=1}^{L} w_j d_{ji} \)
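A minimal NumPy sketch of the voting rule above; the base-learner outputs d_ji and the weights w_j below are made-up posterior estimates, used only for illustration.

```python
import numpy as np

# Outputs d_ji of L = 3 base-learners for K = 4 classes on one input x
# (rows: learners j, columns: classes i); values are illustrative posteriors.
d = np.array([[0.1, 0.6, 0.2, 0.1],
              [0.2, 0.5, 0.2, 0.1],
              [0.3, 0.3, 0.3, 0.1]])

# Weights w_j >= 0 with sum_j w_j = 1 (simple voting uses w_j = 1/L).
w = np.array([0.5, 0.3, 0.2])

# y_i = sum_j w_j * d_ji  -- the linear combination on the slide.
y = w @ d
print(y)                  # combined posteriors
print(int(np.argmax(y)))  # choose the class with the highest combined vote
```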

Bayesian perspective:
\( P(C_i \mid x) = \sum_{\text{all models } \mathcal{M}_j} P(C_i \mid x, \mathcal{M}_j)\, P(\mathcal{M}_j) \)

If the d_j are iid:
\( E[y] = E\!\left[\frac{1}{L}\sum_j d_j\right] = \frac{1}{L}\, L\, E[d_j] = E[d_j] \)
\( \mathrm{Var}(y) = \mathrm{Var}\!\left(\frac{1}{L}\sum_j d_j\right) = \frac{1}{L^2}\,\mathrm{Var}\!\left(\sum_j d_j\right) = \frac{1}{L^2}\, L\, \mathrm{Var}(d_j) = \frac{1}{L}\,\mathrm{Var}(d_j) \)
Bias does not change; variance decreases by a factor of L.

If the d_j are dependent:
\( \mathrm{Var}(y) = \frac{1}{L^2}\,\mathrm{Var}\!\left(\sum_j d_j\right) = \frac{1}{L^2}\left[\sum_j \mathrm{Var}(d_j) + 2\sum_j \sum_{i<j} \mathrm{Cov}(d_i, d_j)\right] \)
Error increases with positive correlation.
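A quick simulation (not from the slides) of the two results above: averaging L independent d_j cuts the variance to roughly Var(d_j)/L, while positive correlation between the d_j keeps it higher. The distributions and constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
L, n_trials = 10, 200_000

# Independent case: Var(y) should be close to Var(d_j) / L.
d = rng.normal(0.0, 1.0, size=(n_trials, L))        # Var(d_j) = 1
y = d.mean(axis=1)
print("independent:", y.var(), "expected ~", 1.0 / L)

# Positively correlated case: each d_j shares a common component,
# so Var(d_j) is still 1 but Cov(d_i, d_j) = 0.64 > 0.
shared = rng.normal(0.0, 1.0, size=(n_trials, 1))
d_corr = 0.8 * shared + 0.6 * rng.normal(0.0, 1.0, size=(n_trials, L))
y_corr = d_corr.mean(axis=1)
print("correlated:", y_corr.var(), "stays well above", 1.0 / L)
```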

Fixed Combination Rules
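The table of rules that accompanies this slide did not survive extraction. As a hedged stand-in, the sketch below applies the fixed rules commonly listed for combining class posteriors (sum/average, weighted sum, median, minimum, maximum, product) to made-up base-learner outputs.

```python
import numpy as np

# d[j, i]: output of base-learner j for class i (illustrative values).
d = np.array([[0.2, 0.5, 0.3],
              [0.1, 0.7, 0.2],
              [0.4, 0.4, 0.2]])
w = np.array([0.5, 0.3, 0.2])          # weights for the weighted-sum rule

rules = {
    "sum (average)": d.mean(axis=0),
    "weighted sum":  w @ d,
    "median":        np.median(d, axis=0),
    "minimum":       d.min(axis=0),
    "maximum":       d.max(axis=0),
    "product":       d.prod(axis=0),
}
for name, y in rules.items():
    print(f"{name:14s} y = {y}, choose class {int(np.argmax(y))}")
```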

Error-Correcting Output Codes

K classes; L problems (Dietterich and Bakiri, 1995)
Code matrix W codes classes in terms of learners.

One per class (L = K):
\( W = \begin{pmatrix} +1 & -1 & -1 & -1 \\ -1 & +1 & -1 & -1 \\ -1 & -1 & +1 & -1 \\ -1 & -1 & -1 & +1 \end{pmatrix} \)

Pairwise (L = K(K-1)/2):
\( W = \begin{pmatrix} +1 & +1 & +1 & 0 & 0 & 0 \\ -1 & 0 & 0 & +1 & +1 & 0 \\ 0 & -1 & 0 & -1 & 0 & +1 \\ 0 & 0 & -1 & 0 & -1 & -1 \end{pmatrix} \)

Full code: \( L = 2^{K-1} - 1 \)
With a reasonable L, find W such that the Hamming distances between rows (and between columns) are maximized.
\( W \): [4x7 full-code matrix for K = 4, entries in {−1, +1}]

Voting scheme:
\( y_i = \sum_{j=1}^{L} w_{ij} d_j \)

Subproblems may be more difficult than one-per-K.
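A sketch, with assumed values, that builds the one-per-class and pairwise code matrices for K = 4 and decodes with the voting rule y_i = Σ_j w_ij d_j from the slide; the base-learner outputs d_j below are invented.

```python
import numpy as np
from itertools import combinations

K = 4

# One-per-class code: L = K, +1 on the diagonal, -1 elsewhere.
W_ovr = -np.ones((K, K)) + 2 * np.eye(K)

# Pairwise code: L = K(K-1)/2, one column per class pair (i, k).
pairs = list(combinations(range(K), 2))
W_pw = np.zeros((K, len(pairs)))
for j, (i, k) in enumerate(pairs):
    W_pw[i, j], W_pw[k, j] = +1, -1

def decode(W, d):
    """Voting: y_i = sum_j w_ij * d_j; pick the class with the largest vote."""
    y = W @ d
    return int(np.argmax(y)), y

# Illustrative base-learner outputs d_j in [-1, +1] for the pairwise setup.
d_pw = np.array([0.9, 0.8, -0.2, 0.7, -0.1, -0.6])
print(decode(W_pw, d_pw))

# One score per one-vs-rest learner for the one-per-class setup.
d_ovr = np.array([-0.5, 0.8, -0.7, -0.2])
print(decode(W_ovr, d_ovr))
```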

Bagging

Use bootstrapping to generate L training sets and train one base-learner with each (Breiman, 1996).
Use voting to combine (average or median for regression).
Unstable algorithms profit from bagging.
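A minimal bagging sketch for regression: L bootstrap samples of a toy data set, one deliberately flexible (unstable) polynomial base-learner per sample, and predictions combined by averaging. The data and the polynomial degree are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (assumed for illustration).
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=x.shape)

L = 25                      # number of bootstrap replicates / base-learners
degree = 5                  # an intentionally flexible, unstable base-learner
models = []
for _ in range(L):
    idx = rng.integers(0, len(x), size=len(x))   # bootstrap sample (with replacement)
    models.append(np.polyfit(x[idx], y[idx], degree))

x_test = np.linspace(0, 1, 11)
preds = np.array([np.polyval(c, x_test) for c in models])
y_bagged = preds.mean(axis=0)     # average the L predictions (voting for regression)
print(y_bagged)
```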

AdaBoost

Generate a sequence of base-learners, each focusing on the previous ones' errors (Freund and Schapire, 1996).
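A hedged sketch of AdaBoost-style training (weighted examples, decision stumps, β = ε/(1−ε) weight updates) on a toy 1-D problem; the data, the stump search, and all constants are assumptions, not the book's code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D data with labels in {-1, +1} (assumed for illustration).
x = rng.uniform(-1, 1, size=200)
y = np.where(np.abs(x) < 0.5, 1, -1)          # not separable by a single threshold

def fit_stump(x, y, p):
    """Weighted decision stump: predict s * sign(x - theta)."""
    best = None
    for theta in np.unique(x):
        for s in (+1, -1):
            pred = s * np.where(x - theta >= 0, 1, -1)
            err = p[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, theta, s)
    return best

T = 20
p = np.full(len(x), 1.0 / len(x))             # example weights, start uniform
stumps, alphas = [], []
for _ in range(T):
    err, theta, s = fit_stump(x, y, p)
    if err >= 0.5:                            # no better than chance: stop
        break
    beta = err / (1.0 - err)
    pred = s * np.where(x - theta >= 0, 1, -1)
    p = np.where(pred == y, p * beta, p)      # shrink weights of correct examples
    p /= p.sum()                              # renormalize the example weights
    stumps.append((theta, s))
    alphas.append(np.log(1.0 / beta))         # confidence of this base-learner

def predict(xq):
    votes = sum(a * s * np.where(xq - theta >= 0, 1, -1)
                for a, (theta, s) in zip(alphas, stumps))
    return np.where(votes >= 0, 1, -1)

print("training accuracy:", (predict(x) == y).mean())
```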

Mixture of Experts

Voting where the weights are input-dependent (gating) (Jacobs et al., 1991):
\( y = \sum_{j=1}^{L} w_j(x)\, d_j \)

Experts or gating can be nonlinear.
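A minimal mixture-of-experts sketch: a softmax gate turns linear scores of the input into weights w_j(x), which combine three toy experts. In practice the gate and experts are trained together; the experts and gating parameters below are assumed purely for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Three toy "experts" d_j(x) (placeholders for trained base-learners).
experts = [lambda x: 2 * x, lambda x: np.sin(3 * x), lambda x: 1 - x]

# Gating parameters (assumed; normally learned together with the experts):
# each expert gets a linear score v_j0 + v_j1 * x, turned into weights by softmax.
V = np.array([[ 0.0,  3.0],
              [ 0.5, -1.0],
              [-0.5, -3.0]])

def mixture(x):
    w = softmax(V[:, 0] + V[:, 1] * x)            # w_j(x): input-dependent weights
    d = np.array([e(x) for e in experts])         # expert outputs d_j
    return float(w @ d)                           # y = sum_j w_j(x) d_j

for x in (-1.0, 0.0, 1.0):
    print(x, mixture(x))
```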

Stacking

The combiner f() is another learner (Wolpert, 1992).
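A minimal stacking sketch: two polynomial base-learners are fit on one half of a toy data set, and the combiner f() is a linear least-squares model fit on the other half, using the base-learners' outputs as its inputs. All data and model choices are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 300)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)

# Split: base-learners are fit on one half, the combiner f() on the other,
# so f() learns to combine predictions it has not memorized.
x_base, y_base = x[:150], y[:150]
x_comb, y_comb = x[150:], y[150:]

# Two simple base-learners (polynomial fits of different complexity).
c1 = np.polyfit(x_base, y_base, 1)
c2 = np.polyfit(x_base, y_base, 7)

def base_outputs(xq):
    return np.column_stack([np.polyval(c1, xq), np.polyval(c2, xq)])

# Combiner f(): linear least squares on the base-learner outputs (plus a bias).
D = np.column_stack([base_outputs(x_comb), np.ones(len(x_comb))])
w, *_ = np.linalg.lstsq(D, y_comb, rcond=None)

x_test = np.linspace(0, 1, 5)
D_test = np.column_stack([base_outputs(x_test), np.ones(len(x_test))])
print(D_test @ w)        # stacked prediction y = f(d_1, ..., d_L)
```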

Fine-Tuning an Ensemble

Given an ensemble of dependent classifiers, do not use it as is; try to get independence.

1. Subset selection: forward (growing) / backward (pruning) approaches to improve accuracy, diversity, and independence.
2. Train metaclassifiers: from the outputs of correlated classifiers, extract new combinations that are uncorrelated. Using PCA, we get "eigenlearners" (see the sketch below).

This is similar to feature selection vs. feature extraction.
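A sketch of idea 2 above: PCA applied to the centered matrix of correlated base-learner outputs yields uncorrelated linear combinations, the "eigenlearners." The synthetic outputs below are assumptions made only to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(3)

# D[n, j]: output of base-learner j on example n.  The learners are made
# deliberately correlated by sharing a common signal component.
n, L = 500, 6
signal = rng.normal(size=(n, 1))
D = signal + 0.3 * rng.normal(size=(n, L))

# PCA on the centered outputs: eigenvectors of the covariance matrix give
# uncorrelated linear combinations of the base-learners ("eigenlearners").
Dc = D - D.mean(axis=0)
cov = np.cov(Dc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = Dc @ eigvecs                              # new, uncorrelated "learner" outputs
print(np.round(np.corrcoef(Z, rowvar=False), 2))   # off-diagonal correlations ~ 0
print(eigvals / eigvals.sum())                # variance explained per eigenlearner
```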

Cascading

Use d_j only if the preceding base-learners are not confident.
Cascade learners in order of complexity.
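A toy cascading sketch with two stages: a cheap learner answers when its posterior estimate is confident (above an assumed threshold), otherwise the example falls through to a more complex learner. Both learners and the threshold are made up.

```python
import numpy as np

def learner_simple(x):
    """Cheap first-stage rule (assumed); returns a posterior estimate for class +1."""
    return 1.0 / (1.0 + np.exp(-4.0 * x))

def learner_complex(x):
    """More expensive model (assumed), used only when the simple one is unsure."""
    return 1.0 / (1.0 + np.exp(-12.0 * (x - 0.1)))

THETA = 0.8      # confidence threshold for the first stage

def cascade(x):
    p = learner_simple(x)
    if max(p, 1 - p) >= THETA:                # confident enough: stop here
        return (1 if p >= 0.5 else -1), "stage 1"
    p = learner_complex(x)                    # otherwise fall through
    return (1 if p >= 0.5 else -1), "stage 2"

for x in (-1.0, -0.1, 0.05, 0.9):
    print(x, cascade(x))
```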

Combining Multiple Sources

Early integration: concatenate all features and train a single learner.
Late integration: with each feature set, train one learner, then use either a fixed rule or stacking to combine their decisions.
Intermediate integration: with each feature set, calculate a kernel, then use a single SVM with multiple kernels.

Combining features vs. decisions vs. kernels.
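A small sketch contrasting early and late integration on two synthetic feature sets (views); intermediate integration is only indicated in a comment, since it needs a kernel machine. The data, the least-squares base-learner, and the sum rule used for late integration are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
y = np.where(rng.uniform(size=n) > 0.5, 1.0, -1.0)

# Two feature sets ("views") of the same examples, each weakly informative.
X1 = y[:, None] * 0.8 + rng.normal(size=(n, 3))
X2 = y[:, None] * 0.5 + rng.normal(size=(n, 2))

def fit_linear(X, y):
    """Least-squares linear classifier with a bias term (toy base-learner)."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xq: np.column_stack([Xq, np.ones(len(Xq))]) @ w

# Early integration: concatenate all features, train a single learner.
early = fit_linear(np.hstack([X1, X2]), y)

# Late integration: one learner per feature set, combine decisions (fixed sum rule).
f1, f2 = fit_linear(X1, y), fit_linear(X2, y)
late_scores = f1(X1) + f2(X2)

# Intermediate integration (not shown): compute one kernel per view and
# train a single SVM on a combination of those kernels.
print("early acc:", (np.sign(early(np.hstack([X1, X2]))) == y).mean())
print("late  acc:", (np.sign(late_scores) == y).mean())
```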