Lecture Slides for
INTRODUCTION TO Machine Learning
ETHEM ALPAYDIN © The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 11:

Multilayer Perceptrons

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)


Neural Networks

Networks of processing units (neurons) with connections (synapses) between them

Large number of neurons: $10^{10}$

Large connectivity: $10^5$

Parallel processing

Distributed computation/memory

Robust to noise, failures


Understanding the Brain

Levels of analysis (Marr, 1982):

1. Computational theory

2. Representation and algorithm

3. Hardware implementation

Reverse engineering: From hardware to theory

Parallel processing: SIMD vs MIMD

Neural net: SIMD with modifiable local memory

Learning: Update by training/experience


Perceptron

(Rosenblatt, 1962)

$$y = \sum_{j=1}^{d} w_j x_j + w_0 = \mathbf{w}^T\mathbf{x}$$

$$\mathbf{w} = [w_0, w_1, \ldots, w_d]^T, \qquad \mathbf{x} = [1, x_1, \ldots, x_d]^T$$
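A minimal Python sketch of this computation, assuming the weights are kept in a plain list with the bias $w_0$ first; the function name `perceptron_output` is our own, not from the slides.

```python
def perceptron_output(w, x):
    """Linear perceptron output y = w^T x.

    w = [w0, w1, ..., wd] holds the bias weight w0 first;
    x = [x1, ..., xd] is the raw input, augmented here with x0 = +1.
    """
    if len(w) != len(x) + 1:
        raise ValueError("w needs one more entry than x (the bias w0)")
    x_aug = [1.0] + list(x)  # x0 = +1, so w0 acts as the intercept
    return sum(wj * xj for wj, xj in zip(w, x_aug))


# y = 0.5 + 2.0 * 1.0 + (-1.0) * 3.0
print(perceptron_output([0.5, 2.0, -1.0], [1.0, 3.0]))  # -0.5
```

Folding $w_0$ into $\mathbf{w}$ through the constant input $x_0 = +1$ is what allows the whole model to be written as a single dot product.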


What a Perceptron Does

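Regression: $y = wx + w_0$

Classification: $y = 1(wx + w_0 > 0)$

[Figure: one-input perceptron (input x, bias unit x0 = +1, weights w and w0, output y) drawn for regression, classification, and sigmoid output]

$$y = \mathrm{sigmoid}(o) = \frac{1}{1 + \exp\left[-\mathbf{w}^T\mathbf{x}\right]}$$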
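The three output functions can be compared side by side in a short sketch; `linear_output`, `threshold_output`, `sigmoid` and the weights $w = 2$, $w_0 = -1$ are illustrative choices, and the sign-split form of the sigmoid is only there to keep `math.exp` from overflowing.

```python
import math


def linear_output(w, w0, x):
    """Regression: y = w*x + w0."""
    return w * x + w0


def threshold_output(w, w0, x):
    """Classification: y = 1(w*x + w0 > 0)."""
    return 1 if w * x + w0 > 0 else 0


def sigmoid(a):
    """1 / (1 + exp(-a)), split by sign so exp never overflows."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


w, w0 = 2.0, -1.0
for x in (0.0, 0.5, 1.0):
    a = linear_output(w, w0, x)
    print(x, a, threshold_output(w, w0, x), round(sigmoid(a), 3))
```

At $x = 0.5$, where $wx + w_0 = 0$, the threshold output is 0 while the sigmoid output is exactly 0.5: the sigmoid behaves like a smoothed, differentiable version of the threshold.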


K Outputs

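Regression:

$$y_i = \sum_{j=1}^{d} w_{ij} x_j + w_{i0} = \mathbf{w}_i^T\mathbf{x}, \qquad \mathbf{y} = \mathbf{W}\mathbf{x}$$

Classification:

$$o_i = \mathbf{w}_i^T\mathbf{x}, \qquad y_i = \frac{\exp o_i}{\sum_k \exp o_k}$$

choose $C_i$ if $y_i = \max_k y_k$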
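A sketch of the K-output classifier in plain Python, assuming each row of `W` is $[w_{i0}, w_{i1}, \ldots, w_{id}]$. Subtracting $\max_k o_k$ before exponentiating is a standard numerical precaution that leaves the softmax unchanged; the names and the example weights are hypothetical.

```python
import math


def softmax(o):
    """y_i = exp(o_i) / sum_k exp(o_k), computed after subtracting max(o)."""
    m = max(o)
    exps = [math.exp(oi - m) for oi in o]
    total = sum(exps)
    return [e / total for e in exps]


def k_output_classify(W, x):
    """Return (y, i): softmax outputs y and the chosen class index i
    (choose C_i if y_i = max_k y_k). Each row of W is [w_i0, ..., w_id]."""
    x_aug = [1.0] + list(x)                          # x0 = +1
    o = [sum(wij * xj for wij, xj in zip(w_i, x_aug)) for w_i in W]
    y = softmax(o)
    i = max(range(len(y)), key=lambda k: y[k])
    return y, i


W = [[0.0, 1.0, 0.0],    # w_1
     [0.0, 0.0, 1.0],    # w_2
     [0.5, -1.0, -1.0]]  # w_3
y, i = k_output_classify(W, [2.0, 1.0])
print([round(yk, 3) for yk in y], "-> choose C%d" % (i + 1))
```

Because the softmax is monotonic in $o_i$, picking the largest $y_i$ is the same as picking the largest $o_i$; the normalized $y_i$ matter when they are used as posterior estimates or in the training updates.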


Training

Online (instances seen one by one) vs batch (whole sample) learning:

No need to store the whole sample

Problem may change in time

Wear and degradation in system components

Stochastic gradient-descent: Update after a single pattern

Generic update rule (LMS rule):

$$\text{Update} = \text{LearningFactor} \cdot (\text{DesiredOutput} - \text{ActualOutput}) \cdot \text{Input}$$

$$\Delta w_{ij}^t = \eta\,(r_i^t - y_i^t)\,x_j^t$$
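The generic rule is one multiply-add per weight. The sketch below, with a hypothetical `lms_update` helper, applies it to every weight of a K-output linear unit after a single instance, which is the online (stochastic) setting: only the current instance is needed.

```python
def lms_update(W, x, r, y, eta):
    """Online update for a K-output linear unit:
    Delta w_ij = eta * (r_i - y_i) * x_j,
    i.e. LearningFactor * (DesiredOutput - ActualOutput) * Input.

    W: K rows [w_i0, ..., w_id]; x: augmented input [1, x1, ..., xd];
    r, y: desired and actual outputs (length K). Returns a new weight matrix.
    """
    return [[wij + eta * (ri - yi) * xj for wij, xj in zip(w_i, x)]
            for w_i, ri, yi in zip(W, r, y)]


W = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, 2.0]                 # x0 = +1, x1 = 2
print(lms_update(W, x, r=[1.0, 0.0], y=[0.0, 1.0], eta=0.5))
# [[0.5, 1.0], [-0.5, -1.0]]
```

When the actual output already equals the desired one, $r_i^t - y_i^t = 0$ and the weights are left unchanged.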


Training a Perceptron: Regression

Regression (Linear output):

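$$E^t(\mathbf{w} \mid \mathbf{x}^t, r^t) = \frac{1}{2}(r^t - y^t)^2 = \frac{1}{2}\left[r^t - \mathbf{w}^T\mathbf{x}^t\right]^2$$

$$\Delta w_j^t = \eta\,(r^t - y^t)\,x_j^t$$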
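A minimal online training loop for the regression perceptron, assuming noise-free synthetic data generated from $r = 1 + 2x$ (our own example); `train_linear_regression` and its defaults for `eta`, `epochs` and `seed` are illustrative, not tuned values from the slides.

```python
import random


def train_linear_regression(X, R, eta=0.05, epochs=500, seed=0):
    """Online gradient descent on E^t = 1/2 (r^t - w^T x^t)^2.

    X: list of raw inputs [x1, ..., xd]; R: list of scalar targets r^t.
    Each epoch visits the instances in a shuffled order and applies
    Delta w_j = eta * (r^t - y^t) * x_j^t after every single instance.
    Returns w = [w0, w1, ..., wd].
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for t in order:
            x = [1.0] + list(X[t])                   # x0 = +1
            y = sum(wj * xj for wj, xj in zip(w, x))
            w = [wj + eta * (R[t] - y) * xj for wj, xj in zip(w, x)]
    return w


# Noise-free synthetic data from r = 1 + 2x on [0, 1].
X = [[k / 10] for k in range(11)]
R = [1 + 2 * x[0] for x in X]
print([round(wj, 3) for wj in train_linear_regression(X, R)])
```

Shuffling the instance order each epoch is a common choice for stochastic gradient descent; with noise-free data the learned $[w_0, w_1]$ should settle close to $[1, 2]$.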


Classification

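Single sigmoid output

$$y^t = \mathrm{sigmoid}(\mathbf{w}^T\mathbf{x}^t)$$

$$E^t(\mathbf{w} \mid \mathbf{x}^t, r^t) = -r^t \log y^t - (1 - r^t)\log(1 - y^t)$$

$$\Delta w_j^t = \eta\,(r^t - y^t)\,x_j^t$$

K>2 softmax outputs

$$y_i^t = \frac{\exp \mathbf{w}_i^T\mathbf{x}^t}{\sum_k \exp \mathbf{w}_k^T\mathbf{x}^t}$$

$$E^t(\{\mathbf{w}_i\}_i \mid \mathbf{x}^t, \mathbf{r}^t) = -\sum_i r_i^t \log y_i^t$$

$$\Delta w_{ij}^t = \eta\,(r_i^t - y_i^t)\,x_j^t$$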
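One way to check these update rules is to compare them with a finite-difference estimate of $-\eta\,\partial E^t/\partial w$. The sketch below does this for both the sigmoid and the softmax case; the helper names, the weights, and the step size `h` are arbitrary assumptions, and `x[0] = +1` plays the role of the bias input.

```python
import math


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def sigmoid_step(w, x, r, eta):
    """Single sigmoid output: Delta w_j = eta * (r - y) * x_j."""
    y = 1.0 / (1.0 + math.exp(-dot(w, x)))
    return [eta * (r - y) * xj for xj in x]


def sigmoid_error(w, x, r):
    """Cross-entropy E^t = -r log y - (1 - r) log(1 - y)."""
    y = 1.0 / (1.0 + math.exp(-dot(w, x)))
    return -r * math.log(y) - (1 - r) * math.log(1 - y)


def softmax_step(W, x, r, eta):
    """K softmax outputs: Delta w_ij = eta * (r_i - y_i) * x_j (r is one-hot)."""
    o = [dot(w_i, x) for w_i in W]
    m = max(o)
    e = [math.exp(oi - m) for oi in o]
    s = sum(e)
    return [[eta * (ri - ei / s) * xj for xj in x] for ri, ei in zip(r, e)]


def softmax_error(W, x, r):
    """E^t = -sum_i r_i log y_i, using log y_i = o_i - log sum_k exp(o_k)."""
    o = [dot(w_i, x) for w_i in W]
    log_z = math.log(sum(math.exp(oi) for oi in o))
    return -sum(ri * (oi - log_z) for ri, oi in zip(r, o))


def numeric_grad(f, w, h=1e-6):
    """Central-difference estimate of dE/dw for a flat list of weights."""
    g = []
    for j in range(len(w)):
        wp, wm = list(w), list(w)
        wp[j] += h
        wm[j] -= h
        g.append((f(wp) - f(wm)) / (2 * h))
    return g


# The updates should equal -eta * dE/dw; compare against finite differences.
x, eta = [1.0, 0.3, 2.0], 0.1          # x[0] = +1 is the bias input
w = [0.2, -0.5, 1.0]
g = numeric_grad(lambda v: sigmoid_error(v, x, 1), w)
print(max(abs(d + eta * gj) for d, gj in zip(sigmoid_step(w, x, 1, eta), g)))

W = [[0.1, 0.2, -0.3], [0.0, -0.1, 0.4], [0.3, 0.0, 0.1]]
r = [0, 1, 0]
flat = [wij for w_i in W for wij in w_i]
g = numeric_grad(lambda v: softmax_error([v[0:3], v[3:6], v[6:9]], x, r), flat)
step = [d for row in softmax_step(W, x, r, eta) for d in row]
print(max(abs(d + eta * gj) for d, gj in zip(step, g)))
```

Both printed discrepancies should be tiny: with cross-entropy error, the sigmoid and softmax outputs lead to the same $\eta\,(r - y)\,x$ form as the regression rule.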


Learning Boolean AND
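As an illustration (not the figure from the slide), a single sigmoid unit can be trained on the AND truth table with the online rule $\Delta w_j = \eta(r^t - y^t)x_j^t$; the learning rate, epoch count, seed and initial weight range below are arbitrary assumptions.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def train_and(eta=0.5, epochs=1000, seed=1):
    """Online training of one sigmoid unit on Boolean AND with
    Delta w_j = eta * (r - y) * x_j and x = [1, x1, x2] (x0 = +1 is the bias)."""
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    rng = random.Random(seed)
    w = [rng.uniform(-0.01, 0.01) for _ in range(3)]
    for _ in range(epochs):
        rng.shuffle(data)
        for (x1, x2), r in data:
            x = (1.0, x1, x2)
            y = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            w = [wj + eta * (r - y) * xj for wj, xj in zip(w, x)]
    return w


w = train_and()
for x1 in (0, 1):
    for x2 in (0, 1):
        y = sigmoid(w[0] + w[1] * x1 + w[2] * x2)
        print(x1, x2, "->", int(y > 0.5))
```

AND is linearly separable, so after training the unit's output, thresholded at 0.5, reproduces the truth table.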


XOR

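No $w_0, w_1, w_2$ satisfy:

$$w_0 \le 0$$

$$w_2 + w_0 > 0$$

$$w_1 + w_0 > 0$$

$$w_1 + w_2 + w_0 \le 0$$

(Minsky and Papert, 1969)

These are the conditions for a threshold unit $y = 1(w_1 x_1 + w_2 x_2 + w_0 > 0)$ to output 0, 1, 1, 0 on the inputs (0,0), (0,1), (1,0), (1,1). Adding the second and third inequalities gives $w_1 + w_2 + w_0 > -w_0 \ge 0$, which contradicts the fourth.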


Multilayer Perceptrons

(Rumelhart et al., 1986)

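$$y_i = \mathbf{v}_i^T\mathbf{z} = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}$$

$$z_h = \mathrm{sigmoid}(\mathbf{w}_h^T\mathbf{x}) = \frac{1}{1 + \exp\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}$$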
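A plain-Python sketch of the forward pass above, assuming `W` stores one row $[w_{h0}, w_{h1}, \ldots, w_{hd}]$ per hidden unit and `V` one row $[v_{i0}, v_{i1}, \ldots, v_{iH}]$ per output; `mlp_forward` and the example weights are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def mlp_forward(W, V, x):
    """Forward pass of a one-hidden-layer MLP.

    W: H rows [w_h0, w_h1, ..., w_hd]  (input -> hidden, bias first)
    V: K rows [v_i0, v_i1, ..., v_iH]  (hidden -> output, bias first)
    x: raw input [x1, ..., xd]
    Returns (z, y) with z_h = sigmoid(w_h^T x) and linear outputs y_i = v_i^T z.
    """
    x_aug = [1.0] + list(x)                 # x0 = +1
    z = [sigmoid(sum(w * xj for w, xj in zip(w_h, x_aug))) for w_h in W]
    z_aug = [1.0] + z                       # z0 = +1
    y = [sum(v * zh for v, zh in zip(v_i, z_aug)) for v_i in V]
    return z, y


W = [[-1.0, 2.0], [1.0, -2.0]]   # H = 2 hidden units, d = 1 input
V = [[0.5, 1.0, -1.0]]           # K = 1 output
print(mlp_forward(W, V, [0.25]))
```

Both layers use the same bias trick as the single perceptron: a constant +1 is prepended to $\mathbf{x}$ and to $\mathbf{z}$.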


x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2)
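The decomposition can be wired by hand with threshold units: two hidden units compute the two AND terms and the output unit computes their OR. The weights below are one hand-picked choice among many, not values from the slides.

```python
def step(a):
    """Threshold unit: 1 if a > 0, else 0."""
    return 1 if a > 0 else 0


def xor_mlp(x1, x2):
    z1 = step(-0.5 + x1 - x2)    # hidden unit: x1 AND NOT x2, fires only for (1, 0)
    z2 = step(-0.5 - x1 + x2)    # hidden unit: NOT x1 AND x2, fires only for (0, 1)
    return step(-0.5 + z1 + z2)  # output unit: z1 OR z2


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))
```

Each hidden unit solves a linearly separable subproblem, which is why two layers succeed where a single perceptron cannot.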


Backpropagation

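$$y_i = \mathbf{v}_i^T\mathbf{z} = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}$$

$$z_h = \mathrm{sigmoid}(\mathbf{w}_h^T\mathbf{x}) = \frac{1}{1 + \exp\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}$$

$$\frac{\partial E}{\partial w_{hj}} = \frac{\partial E}{\partial y_i}\,\frac{\partial y_i}{\partial z_h}\,\frac{\partial z_h}{\partial w_{hj}}$$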


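Regression

$$E(\mathbf{W}, \mathbf{v} \mid \mathcal{X}) = \frac{1}{2}\sum_t (r^t - y^t)^2$$

Forward (from input $\mathbf{x}$):

$$z_h = \mathrm{sigmoid}(\mathbf{w}_h^T\mathbf{x})$$

$$y^t = \sum_{h=1}^{H} v_h z_h^t + v_0$$

Backward:

$$\Delta v_h = \eta \sum_t (r^t - y^t)\, z_h^t$$

$$\Delta w_{hj} = -\eta \frac{\partial E}{\partial w_{hj}} = -\eta \sum_t \frac{\partial E}{\partial y^t}\,\frac{\partial y^t}{\partial z_h^t}\,\frac{\partial z_h^t}{\partial w_{hj}} = -\eta \sum_t -(r^t - y^t)\, v_h\, z_h^t (1 - z_h^t)\, x_j^t = \eta \sum_t (r^t - y^t)\, v_h\, z_h^t (1 - z_h^t)\, x_j^t$$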
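A minimal batch implementation of these forward and backward equations for one input and one linear output, assuming synthetic targets $r = \sin(\pi x)$ on $[0, 1]$ and arbitrary settings `H=5`, `eta=0.01`, `epochs=2000`; `train_mlp_regression` is a hypothetical helper, not code from the book.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def train_mlp_regression(X, R, H=5, eta=0.01, epochs=2000, seed=0):
    """Batch backpropagation for an MLP with one hidden layer and one linear output.

    Forward:  z_h = sigmoid(w_h^T x),  y = sum_h v_h z_h + v_0
    Backward: Delta v_h  = eta * sum_t (r - y) z_h
              Delta w_hj = eta * sum_t (r - y) v_h z_h (1 - z_h) x_j
    Returns (W, v, errors); errors[e] is E = 1/2 sum_t (r - y)^2 at the start of epoch e.
    """
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-0.01, 0.01) for _ in range(d + 1)] for _ in range(H)]
    v = [rng.uniform(-0.01, 0.01) for _ in range(H + 1)]
    errors = []
    for _ in range(epochs):
        dW = [[0.0] * (d + 1) for _ in range(H)]
        dv = [0.0] * (H + 1)
        E = 0.0
        for x_raw, r in zip(X, R):
            x = [1.0] + list(x_raw)                      # x[0] = +1 (bias)
            z = [1.0] + [sigmoid(sum(a * b for a, b in zip(w_h, x))) for w_h in W]
            y = sum(a * b for a, b in zip(v, z))         # z[0] = +1 (bias)
            err = r - y
            E += 0.5 * err * err
            for h in range(H + 1):
                dv[h] += eta * err * z[h]
            for h in range(H):
                zh = z[h + 1]                            # hidden unit h
                back = eta * err * v[h + 1] * zh * (1.0 - zh)
                for j in range(d + 1):
                    dW[h][j] += back * x[j]
        errors.append(E)
        # Apply the accumulated (batch) updates once per epoch.
        v = [a + b for a, b in zip(v, dv)]
        W = [[a + b for a, b in zip(w_h, dw_h)] for w_h, dw_h in zip(W, dW)]
    return W, v, errors


X = [[t / 19] for t in range(20)]
R = [math.sin(math.pi * x[0]) for x in X]
W, v, errors = train_mlp_regression(X, R)
print(round(errors[0], 3), round(errors[-1], 3))
```

The hidden-layer update uses the value of $v_h$ from before the epoch's update, since all changes are accumulated first and applied together, as the sums over $t$ indicate.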


Regression with Multiple Outputs

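[Figure: network with inputs x_j, hidden units z_h, outputs y_i; first-layer weights w_hj, second-layer weights v_ih]

$$y_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0}$$

$$E(\mathbf{W}, \mathbf{V} \mid \mathcal{X}) = \frac{1}{2}\sum_t \sum_i (r_i^t - y_i^t)^2$$

$$\Delta v_{ih} = \eta \sum_t (r_i^t - y_i^t)\, z_h^t$$

$$\Delta w_{hj} = \eta \sum_t \left[\sum_i (r_i^t - y_i^t)\, v_{ih}\right] z_h^t (1 - z_h^t)\, x_j^t$$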


[Figure: plots of w_h x + w_0, z_h, and v_h z_h for the hidden units]


Two-Class Discrimination

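One sigmoid output $y^t$ for $P(C_1 \mid \mathbf{x}^t)$, and $P(C_2 \mid \mathbf{x}^t) \equiv 1 - y^t$

$$y^t = \mathrm{sigmoid}\left(\sum_{h=1}^{H} v_h z_h^t + v_0\right)$$

$$E(\mathbf{W}, \mathbf{v} \mid \mathcal{X}) = -\sum_t \left[r^t \log y^t + (1 - r^t)\log(1 - y^t)\right]$$

$$\Delta v_h = \eta \sum_t (r^t - y^t)\, z_h^t$$

$$\Delta w_{hj} = \eta \sum_t (r^t - y^t)\, v_h\, z_h^t (1 - z_h^t)\, x_j^t$$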


K>2 Classes

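$$o_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0}$$

$$y_i^t = \frac{\exp o_i^t}{\sum_k \exp o_k^t} \equiv P(C_i \mid \mathbf{x}^t)$$

$$E(\mathbf{W}, \mathbf{V} \mid \mathcal{X}) = -\sum_t \sum_i r_i^t \log y_i^t$$

$$\Delta v_{ih} = \eta \sum_t (r_i^t - y_i^t)\, z_h^t$$

$$\Delta w_{hj} = \eta \sum_t \left[\sum_i (r_i^t - y_i^t)\, v_{ih}\right] z_h^t (1 - z_h^t)\, x_j^t$$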


Multiple Hidden Layers

MLP with one hidden layer is a universal approximator (Hornik et al., 1989), but using multiple layers may lead to simpler networks

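$$z_{1h} = \mathrm{sigmoid}(\mathbf{w}_{1h}^T\mathbf{x}) = \mathrm{sigmoid}\left(\sum_{j=1}^{d} w_{1hj} x_j + w_{1h0}\right), \quad h = 1, \ldots, H_1$$

$$z_{2l} = \mathrm{sigmoid}(\mathbf{w}_{2l}^T\mathbf{z}_1) = \mathrm{sigmoid}\left(\sum_{h=1}^{H_1} w_{2lh} z_{1h} + w_{2l0}\right), \quad l = 1, \ldots, H_2$$

$$y = \mathbf{v}^T\mathbf{z}_2 = \sum_{l=1}^{H_2} v_l z_{2l} + v_0$$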


Improving Convergence

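Momentum

$$\Delta w_i^t = -\eta \frac{\partial E^t}{\partial w_i} + \alpha\, \Delta w_i^{t-1}$$

Adaptive learning rate

$$\Delta\eta = \begin{cases} +a & \text{if } E^{t+\tau} < E^t \\ -b\eta & \text{otherwise} \end{cases}$$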
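A sketch of both techniques as separate helpers; `momentum_step`, `adapt_eta` and the defaults `eta=0.1`, `alpha=0.9`, `a=0.01`, `b=0.5` are assumptions for illustration, and the caller decides how many steps $\tau$ to wait before comparing errors. The usage loop runs momentum alone (with `alpha=0.5`) on $E(w) = \frac{1}{2}(w - 3)^2$.

```python
def momentum_step(w, grad, prev_delta, eta=0.1, alpha=0.9):
    """Delta w_i^t = -eta * dE^t/dw_i + alpha * Delta w_i^(t-1).

    Returns (new_w, delta); pass delta back in as prev_delta on the next call.
    """
    delta = [-eta * g + alpha * p for g, p in zip(grad, prev_delta)]
    return [wi + di for wi, di in zip(w, delta)], delta


def adapt_eta(eta, E_later, E_now, a=0.01, b=0.5):
    """Delta eta = +a if E^(t+tau) < E^t, otherwise -b * eta.

    The rate grows additively while the error keeps falling and shrinks
    geometrically as soon as it does not.
    """
    return eta + a if E_later < E_now else eta - b * eta


# Momentum only, on E(w) = 1/2 (w - 3)^2 whose gradient is w - 3.
w, prev = [0.0], [0.0]
for _ in range(100):
    w, prev = momentum_step(w, [w[0] - 3.0], prev, eta=0.1, alpha=0.5)
print(w)
```

The returned `delta` is what carries the $\alpha\,\Delta w_i^{t-1}$ term from one step to the next, so successive updates in a consistent direction build up speed.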


Overfitting/Overtraining

Number of weights: $H(d+1) + (H+1)K$ (d inputs, H hidden units, K outputs)
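The weight count is simple arithmetic; the hypothetical helper below evaluates it for a few illustrative network sizes to show how quickly it grows with the number of hidden units $H$.

```python
def mlp_weight_count(d, H, K):
    """H*(d+1) input-to-hidden weights plus (H+1)*K hidden-to-output weights;
    the +1 terms are the bias weights of each layer."""
    return H * (d + 1) + (H + 1) * K


print(mlp_weight_count(d=2, H=2, K=1))     # 9, e.g. a 2-2-1 network for XOR
for H in (2, 10, 50):
    print(H, mlp_weight_count(d=64, H=H, K=10))   # 160, 760, 3760
```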


Structured MLP

(Le Cun et al., 1989)


Weight Sharing


Hints

Invariance to translation, rotation, size

Virtual examples

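Augmented error: $E' = E + \lambda_h E_h$

If $\mathbf{x}'$ and $\mathbf{x}$ are the “same”: $E_h = [g(\mathbf{x} \mid \theta) - g(\mathbf{x}' \mid \theta)]^2$

Approximation hint:

(Abu-Mostafa, 1995)

$$E_h = \begin{cases} 0 & \text{if } g(\mathbf{x} \mid \theta) \in [a_x, b_x] \\ (g(\mathbf{x} \mid \theta) - a_x)^2 & \text{if } g(\mathbf{x} \mid \theta) < a_x \\ (g(\mathbf{x} \mid \theta) - b_x)^2 & \text{if } g(\mathbf{x} \mid \theta) > b_x \end{cases}$$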
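The hint error terms translate directly into small functions. In this sketch `g_x` stands for $g(\mathbf{x} \mid \theta)$ and `g_x_prime` for $g(\mathbf{x}' \mid \theta)$, both assumed to be already computed by the model; the function names are hypothetical.

```python
def invariance_hint(g_x, g_x_prime):
    """E_h = [g(x|theta) - g(x'|theta)]^2 for inputs that should be treated the same."""
    return (g_x - g_x_prime) ** 2


def approximation_hint(g_x, a_x, b_x):
    """E_h = 0 inside [a_x, b_x]; squared distance to the violated bound outside."""
    if g_x < a_x:
        return (g_x - a_x) ** 2
    if g_x > b_x:
        return (g_x - b_x) ** 2
    return 0.0


def augmented_error(E, E_h, lambda_h):
    """E' = E + lambda_h * E_h."""
    return E + lambda_h * E_h


print(approximation_hint(0.4, 0.0, 1.0))   # 0.0, inside the interval
print(approximation_hint(1.5, 0.0, 1.0))   # 0.25, above b_x by 0.5
print(augmented_error(1.0, invariance_hint(1.0, 0.5), lambda_h=2.0))  # 1.5
```

The approximation hint only penalizes outputs that leave the allowed interval $[a_x, b_x]$, so it constrains the model without fixing an exact target value.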


Tuning the Network Size

Destructive

Weight decay:

$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} - \lambda w_i$$

$$E' = E + \frac{\lambda}{2}\sum_i w_i^2$$

Constructive

Growing networks

(Ash, 1989) (Fahlman and Lebiere, 1989)
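A sketch of one weight-decay step and the matching augmented error; `weight_decay_step`, `error_with_decay` and their defaults are hypothetical. Following the slide, $\lambda$ multiplies $w_i$ directly in the update, which corresponds to gradient descent on $E'$ with the learning rate absorbed into $\lambda$.

```python
def weight_decay_step(w, grad, eta=0.1, lam=0.01):
    """Delta w_i = -eta * dE/dw_i - lambda * w_i (lambda applied directly, as on the slide)."""
    return [wi - eta * gi - lam * wi for wi, gi in zip(w, grad)]


def error_with_decay(E, w, lam=0.01):
    """E' = E + (lambda / 2) * sum_i w_i^2."""
    return E + 0.5 * lam * sum(wi * wi for wi in w)


# With a zero error gradient, each step only shrinks the weights by (1 - lambda).
print(weight_decay_step([1.0, -2.0], [0.0, 0.0], eta=0.1, lam=0.5))  # [0.5, -1.0]
print(error_with_decay(1.0, [1.0, -2.0], lam=0.5))                   # 2.25
```

Weights that the error gradient does not support are steadily pulled toward zero, which is how weight decay effectively prunes the network.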


Bayesian Learning

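Consider weights $w_i$ as random vars, prior $p(w_i)$

$$p(\mathbf{w} \mid \mathcal{X}) = \frac{p(\mathcal{X} \mid \mathbf{w})\, p(\mathbf{w})}{p(\mathcal{X})}$$

$$\hat{\mathbf{w}}_{MAP} = \arg\max_{\mathbf{w}} \log p(\mathbf{w} \mid \mathcal{X})$$

$$\log p(\mathbf{w} \mid \mathcal{X}) = \log p(\mathcal{X} \mid \mathbf{w}) + \log p(\mathbf{w}) + C$$

$$p(\mathbf{w}) = \prod_i p(w_i) \quad \text{where} \quad p(w_i) = c \cdot \exp\left[-\frac{w_i^2}{2(1/2\lambda)}\right]$$

$$E' = E + \lambda \|\mathbf{w}\|^2$$

Weight decay, ridge regression, regularization:

cost = data-misfit + λ complexity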


Dimensionality Reduction


Learning Time

Applications:

Sequence recognition: Speech recognition

Sequence reproduction: Time-series prediction

Sequence association

Network architectures:

Time-delay networks (Waibel et al., 1989)

Recurrent networks (Rumelhart et al., 1986)
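For time-delay networks, the idea can be sketched on the input side: a window of the most recent inputs is fed together to an ordinary feedforward network. The sketch below builds such windows, plus (input window, next value) pairs for time-series prediction; the helper names and the window length are assumptions.

```python
def time_delay_inputs(series, window):
    """Input vectors [x^(t-window+1), ..., x^t] for every t with a full window."""
    return [series[t - window + 1:t + 1] for t in range(window - 1, len(series))]


def prediction_pairs(series, window):
    """(window ending at t, x^(t+1)) pairs for one-step time-series prediction."""
    return [(series[t - window + 1:t + 1], series[t + 1])
            for t in range(window - 1, len(series) - 1)]


seq = [0.1, 0.4, 0.3, 0.9, 0.7]
for inputs, target in prediction_pairs(seq, window=3):
    print(inputs, "->", target)
```

Each window can then be treated as one ordinary input vector $\mathbf{x}^t$ for the MLP training rules above.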


Time-Delay Neural Networks


Recurrent Networks


Unfolding in Time