Lecture Slides for
INTRODUCTION TO Machine Learning
ETHEM ALPAYDIN © The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml


CHAPTER 11:

Multilayer Perceptrons


Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Neural Networks

Networks of processing units (neurons) with connections (synapses) between them

Large number of neurons: $10^{10}$

Large connectivity: $10^{5}$

Parallel processing
Distributed computation/memory
Robust to noise, failures


Understanding the Brain

Levels of analysis (Marr, 1982)
1. Computational theory
2. Representation and algorithm
3. Hardware implementation

Reverse engineering: From hardware to theory
Parallel processing: SIMD vs MIMD

Neural net: SIMD with modifiable local memory
Learning: Update by training/experience


Perceptron

(Rosenblatt, 1962)

$$y = \sum_{j=1}^{d} w_j x_j + w_0 = \mathbf{w}^T \mathbf{x}$$

$$\mathbf{w} = [w_0, w_1, \ldots, w_d]^T$$

$$\mathbf{x} = [1, x_1, \ldots, x_d]^T$$


What a Perceptron Does

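Regression: $y = wx + w_0$

Classification: $y = 1(wx + w_0 > 0)$

[Figure: perceptron diagrams with input $x$, bias input $x_0 = +1$, weights $w$ and $w_0$, and output $y$; one diagram adds a sigmoid unit $s$ at the output]

$$o = \mathbf{w}^T \mathbf{x}$$

$$y = \mathrm{sigmoid}(o) = \frac{1}{1 + \exp[-\mathbf{w}^T \mathbf{x}]}$$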


K Outputs

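Regression:

$$y_i = \sum_{j=1}^{d} w_{ij} x_j + w_{i0} = \mathbf{w}_i^T \mathbf{x}$$

$$\mathbf{y} = \mathbf{W}\mathbf{x}$$

Classification:

$$o_i = \mathbf{w}_i^T \mathbf{x}$$

$$y_i = \frac{\exp o_i}{\sum_k \exp o_k}$$

$$\text{choose } C_i \text{ if } y_i = \max_k y_k$$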
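The K-output classification rule on this slide (linear outputs, softmax, then pick the largest) can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the slides: the function names, the weight layout (each row is `[w_i0, w_i1, ..., w_id]`), and the example weights are assumptions made here. Subtracting the maximum before exponentiating is a standard numerical-stability step that leaves the softmax values unchanged.

```python
import math

def linear_outputs(W, x):
    """o_i = w_i^T x, with x augmented by a leading 1 for the bias w_i0."""
    x_aug = [1.0] + list(x)
    return [sum(w * xj for w, xj in zip(row, x_aug)) for row in W]

def softmax(o):
    """y_i = exp(o_i) / sum_k exp(o_k), shifted by max(o) to avoid overflow."""
    m = max(o)
    e = [math.exp(oi - m) for oi in o]
    s = sum(e)
    return [ei / s for ei in e]

def choose_class(y):
    """Index i of the class C_i with the largest output y_i."""
    return max(range(len(y)), key=lambda i: y[i])

# Hypothetical weights for K = 3 classes and d = 2 inputs.
W = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.5, -1.0, -1.0]]
y = softmax(linear_outputs(W, [2.0, 0.5]))
chosen = choose_class(y)
```

The outputs `y` are positive and sum to 1, so they can be read as class posteriors.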


Training

Online (instances seen one by one) vs batch (whole sample) learning:
- No need to store the whole sample
- Problem may change in time
- Wear and degradation in system components

Stochastic gradient-descent: Update after a single pattern

Generic update rule (LMS rule):

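$$\text{Update} = \text{LearningFactor} \cdot (\text{DesiredOutput} - \text{ActualOutput}) \cdot \text{Input}$$

$$\Delta w_{ij}^t = \eta \left(r_i^t - y_i^t\right) x_j^t$$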
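As a rough illustration of the generic update rule (update = learning factor × (desired output − actual output) × input), the sketch below applies it online to a single linear output. The function names, the learning factor 0.1, and the synthetic noise-free stream generated from r = 2x + 1 are all assumptions chosen for the example.

```python
def lms_update(w, x, r, eta):
    """One online step: w_j <- w_j + eta * (r - y) * x_j, with x_0 = 1 for the bias."""
    x_aug = [1.0] + list(x)
    y = sum(wj * xj for wj, xj in zip(w, x_aug))
    return [wj + eta * (r - y) * xj for wj, xj in zip(w, x_aug)]

def train_online(stream, d, eta=0.1):
    """Process instances one by one; the sample is never stored by the learner."""
    w = [0.0] * (d + 1)
    for x, r in stream:
        w = lms_update(w, x, r, eta)
    return w

# Hypothetical stream: 1000 passes over five points of the line r = 2x + 1.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
stream = [([x], 2.0 * x + 1.0) for _ in range(1000) for x in xs]
w = train_online(stream, d=1)  # w[0] approaches 1 and w[1] approaches 2
```

Because each step uses only the current instance, the same loop would also track a target that drifts over time.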


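Training a Perceptron: Regression

Regression (Linear output):

$$E^t(\mathbf{w} \mid \mathbf{x}^t, r^t) = \frac{1}{2}\left(r^t - y^t\right)^2 = \frac{1}{2}\left[r^t - \mathbf{w}^T\mathbf{x}^t\right]^2$$

$$\Delta w_j^t = \eta \left(r^t - y^t\right) x_j^t$$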


Classification

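Single sigmoid output

$$y^t = \mathrm{sigmoid}(\mathbf{w}^T\mathbf{x}^t)$$

$$E^t(\mathbf{w} \mid \mathbf{x}^t, r^t) = -r^t \log y^t - (1 - r^t)\log(1 - y^t)$$

$$\Delta w_j^t = \eta \left(r^t - y^t\right) x_j^t$$

K>2 softmax outputs

$$y_i^t = \frac{\exp \mathbf{w}_i^T\mathbf{x}^t}{\sum_k \exp \mathbf{w}_k^T\mathbf{x}^t}$$

$$E^t(\{\mathbf{w}_i\}_i \mid \mathbf{x}^t, \mathbf{r}^t) = -\sum_i r_i^t \log y_i^t$$

$$\Delta w_{ij}^t = \eta \left(r_i^t - y_i^t\right) x_j^t$$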


Learning Boolean AND
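The figure for this slide did not survive extraction. As a stand-in, here is a small sketch of a perceptron learning Boolean AND with the update rule Δw_j = η (r − y) x_j. It assumes a threshold output y = 1(wᵀx > 0), zero initial weights, η = 1, and 20 passes over the four input patterns; none of these settings come from the slides.

```python
def threshold(a):
    """Threshold output: 1 if the weighted sum is positive, else 0."""
    return 1 if a > 0 else 0

def predict(w, x):
    """y = 1(w_0 + sum_j w_j x_j > 0)."""
    return threshold(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def train_perceptron(samples, eta=1.0, epochs=20):
    """Online training: after each pattern, w_j += eta * (r - y) * x_j."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, r in samples:
            err = r - predict(w, x)
            w[0] += eta * err            # bias input x_0 = +1
            for j, xj in enumerate(x):
                w[j + 1] += eta * err * xj
    return w

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND)
predictions = [predict(w, x) for x, _ in AND]
```

AND is linearly separable, so the weights stop changing once every pattern is classified correctly.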


XOR

No $w_0$, $w_1$, $w_2$ satisfy:

$$\begin{aligned}
w_0 &\le 0 \\
w_2 + w_0 &> 0 \\
w_1 + w_0 &> 0 \\
w_1 + w_2 + w_0 &\le 0
\end{aligned}$$

(Minsky and Papert, 1969)


Multilayer Perceptrons

(Rumelhart et al., 1986)

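$$y_i = \mathbf{v}_i^T \mathbf{z} = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}$$

$$z_h = \mathrm{sigmoid}(\mathbf{w}_h^T \mathbf{x}) = \frac{1}{1 + \exp\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}$$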


x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2)
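The decomposition maps directly onto a network with two hidden units and one output unit. The sketch below uses threshold units and hand-picked weights, both of which are assumptions for illustration; the slide's own figure and weight values are not recoverable here.

```python
def step(a):
    """Threshold unit: fires when its weighted input is positive."""
    return 1 if a > 0 else 0

def xor_mlp(x1, x2):
    z1 = step(x1 - x2 - 0.5)     # hidden unit 1: x1 AND ~x2
    z2 = step(-x1 + x2 - 0.5)    # hidden unit 2: ~x1 AND x2
    return step(z1 + z2 - 0.5)   # output unit: z1 OR z2

table = [(x1, x2, xor_mlp(x1, x2)) for x1 in (0, 1) for x2 in (0, 1)]
# table == [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

Each hidden unit solves a linearly separable subproblem, and the output unit combines them, which a single perceptron cannot do for XOR.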


Backpropagation

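$$y_i = \mathbf{v}_i^T \mathbf{z} = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}$$

$$z_h = \mathrm{sigmoid}(\mathbf{w}_h^T \mathbf{x}) = \frac{1}{1 + \exp\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}$$

$$\frac{\partial E}{\partial w_{hj}} = \frac{\partial E}{\partial y_i}\,\frac{\partial y_i}{\partial z_h}\,\frac{\partial z_h}{\partial w_{hj}}$$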


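Regression

$$E(\mathbf{W}, \mathbf{v} \mid \mathcal{X}) = \frac{1}{2}\sum_t \left(r^t - y^t\right)^2$$

Forward

$$z_h = \mathrm{sigmoid}(\mathbf{w}_h^T \mathbf{x})$$

$$y^t = \sum_{h=1}^{H} v_h z_h^t + v_0$$

Backward

$$\Delta v_h = \eta \sum_t \left(r^t - y^t\right) z_h^t$$

$$\begin{aligned}
\Delta w_{hj} &= -\eta \frac{\partial E}{\partial w_{hj}} \\
&= -\eta \sum_t \frac{\partial E}{\partial y^t}\,\frac{\partial y^t}{\partial z_h^t}\,\frac{\partial z_h^t}{\partial w_{hj}} \\
&= -\eta \sum_t -\left(r^t - y^t\right) v_h\, z_h^t\left(1 - z_h^t\right) x_j^t \\
&= \eta \sum_t \left(r^t - y^t\right) v_h\, z_h^t\left(1 - z_h^t\right) x_j^t
\end{aligned}$$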
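The forward and backward passes for single-output regression can be sketched in plain Python as below. This is a minimal batch implementation under assumed settings (H = 3 hidden units, η = 0.02, 500 epochs, a toy target r = x² on 11 points, a fixed random seed), none of which come from the slides. Both weight layers are updated from the same forward pass, so the hidden-layer update uses the old values of v_h.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(W, v, x):
    """Forward pass: z_h = sigmoid(w_h^T x), y = sum_h v_h z_h + v_0 (z[0] = 1 is the bias unit)."""
    x_aug = [1.0] + list(x)
    z = [1.0] + [sigmoid(sum(w * xj for w, xj in zip(w_h, x_aug))) for w_h in W]
    y = sum(vh * zh for vh, zh in zip(v, z))
    return x_aug, z, y

def error(W, v, X, R):
    """E = 1/2 * sum_t (r^t - y^t)^2."""
    return 0.5 * sum((r - forward(W, v, x)[2]) ** 2 for x, r in zip(X, R))

def backprop_epoch(W, v, X, R, eta):
    """One batch epoch: accumulate both updates over the sample, then apply them."""
    H, d1 = len(W), len(W[0])
    dv = [0.0] * (H + 1)
    dW = [[0.0] * d1 for _ in range(H)]
    for x, r in zip(X, R):
        x_aug, z, y = forward(W, v, x)
        delta = r - y
        for h in range(H + 1):
            dv[h] += eta * delta * z[h]                      # delta v_h
        for h in range(H):
            back = delta * v[h + 1] * z[h + 1] * (1.0 - z[h + 1])
            for j in range(d1):
                dW[h][j] += eta * back * x_aug[j]            # delta w_hj
    for h in range(H + 1):
        v[h] += dv[h]
    for h in range(H):
        for j in range(d1):
            W[h][j] += dW[h][j]

random.seed(0)
H = 3
W = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(H)]
v = [random.uniform(-0.5, 0.5) for _ in range(H + 1)]
X = [[-1.0 + 0.2 * t] for t in range(11)]
R = [x[0] ** 2 for x in X]

e_before = error(W, v, X, R)
for _ in range(500):
    backprop_epoch(W, v, X, R, eta=0.02)
e_after = error(W, v, X, R)
```

Comparing `e_before` with `e_after` gives a quick sanity check that the updates move downhill on the training error.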


Regression with Multiple Outputs

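[Figure: network diagram with inputs $x_j$, first-layer weights $w_{hj}$, hidden units $z_h$, second-layer weights $v_{ih}$, and outputs $y_i$]

$$E(\mathbf{W}, \mathbf{V} \mid \mathcal{X}) = \frac{1}{2}\sum_t \sum_i \left(r_i^t - y_i^t\right)^2$$

$$y_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0}$$

$$\Delta v_{ih} = \eta \sum_t \left(r_i^t - y_i^t\right) z_h^t$$

$$\Delta w_{hj} = \eta \sum_t \left[\sum_i \left(r_i^t - y_i^t\right) v_{ih}\right] z_h^t\left(1 - z_h^t\right) x_j^t$$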


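[Figure: panels showing the hidden unit weighted sums $w_h x + w_0$, the hidden unit outputs $z_h$, and the weighted hidden outputs $v_h z_h$]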


Two-Class Discrimination

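One sigmoid output $y^t$ for $P(C_1 \mid \mathbf{x}^t)$, and $P(C_2 \mid \mathbf{x}^t) \equiv 1 - y^t$

$$y^t = \mathrm{sigmoid}\left(\sum_{h=1}^{H} v_h z_h^t + v_0\right)$$

$$E(\mathbf{W}, \mathbf{v} \mid \mathcal{X}) = -\sum_t \left[r^t \log y^t + (1 - r^t)\log(1 - y^t)\right]$$

$$\Delta v_h = \eta \sum_t \left(r^t - y^t\right) z_h^t$$

$$\Delta w_{hj} = \eta \sum_t \left(r^t - y^t\right) v_h\, z_h^t\left(1 - z_h^t\right) x_j^t$$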


K>2 Classes

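$$o_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0}$$

$$y_i^t = \frac{\exp o_i^t}{\sum_k \exp o_k^t} \equiv P(C_i \mid \mathbf{x}^t)$$

$$E(\mathbf{W}, \mathbf{V} \mid \mathcal{X}) = -\sum_t \sum_i r_i^t \log y_i^t$$

$$\Delta v_{ih} = \eta \sum_t \left(r_i^t - y_i^t\right) z_h^t$$

$$\Delta w_{hj} = \eta \sum_t \left[\sum_i \left(r_i^t - y_i^t\right) v_{ih}\right] z_h^t\left(1 - z_h^t\right) x_j^t$$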
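One way to gain confidence in the hidden-layer update for K > 2 classes is to compare the analytic gradient, ∂E/∂w_hj = −Σ_t [Σ_i (r_i − y_i) v_ih] z_h (1 − z_h) x_j, against a finite-difference estimate of the cross-entropy error. The sketch below does this for a tiny network; the weights, data, and helper names are made up for the check.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def dot(u, w):
    return sum(a * b for a, b in zip(u, w))

def forward(W, V, x):
    """z_h = sigmoid(w_h^T x), o_i = v_i^T z, y = softmax(o); index 0 is the bias."""
    x_aug = [1.0] + list(x)
    z = [1.0] + [sigmoid(dot(w_h, x_aug)) for w_h in W]
    o = [dot(v_i, z) for v_i in V]
    m = max(o)
    e = [math.exp(oi - m) for oi in o]
    s = sum(e)
    return x_aug, z, [ei / s for ei in e]

def cross_entropy(W, V, X, R):
    """E = -sum_t sum_i r_i^t log y_i^t."""
    return -sum(ri * math.log(yi)
                for x, r in zip(X, R)
                for ri, yi in zip(r, forward(W, V, x)[2]))

def analytic_grad_W(W, V, X, R):
    """dE/dw_hj from the backpropagation formula (the update is -eta times this)."""
    g = [[0.0] * len(W[0]) for _ in W]
    for x, r in zip(X, R):
        x_aug, z, y = forward(W, V, x)
        for h in range(len(W)):
            back = sum((r[i] - y[i]) * V[i][h + 1] for i in range(len(V)))
            for j in range(len(x_aug)):
                g[h][j] -= back * z[h + 1] * (1.0 - z[h + 1]) * x_aug[j]
    return g

def numeric_grad_W(W, V, X, R, eps=1e-6):
    """Central finite differences on each w_hj."""
    g = [[0.0] * len(W[0]) for _ in W]
    for h in range(len(W)):
        for j in range(len(W[0])):
            old = W[h][j]
            W[h][j] = old + eps
            e_plus = cross_entropy(W, V, X, R)
            W[h][j] = old - eps
            e_minus = cross_entropy(W, V, X, R)
            W[h][j] = old
            g[h][j] = (e_plus - e_minus) / (2.0 * eps)
    return g

# Hypothetical network: d = 2 inputs, H = 2 hidden units, K = 3 classes.
W = [[0.1, -0.4, 0.3], [-0.2, 0.5, 0.7]]
V = [[0.0, 0.6, -0.3], [0.2, -0.5, 0.4], [-0.1, 0.3, 0.8]]
X = [[0.5, -1.0], [1.5, 0.25]]
R = [[1, 0, 0], [0, 0, 1]]

ga = analytic_grad_W(W, V, X, R)
gn = numeric_grad_W(W, V, X, R)
max_diff = max(abs(a - n) for ra, rn in zip(ga, gn) for a, n in zip(ra, rn))
```

A small `max_diff` indicates that the formula and the error function agree; a large one usually points to a sign or index mistake.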


Multiple Hidden Layers

MLP with one hidden layer is a universal approximator (Hornik et al., 1989), but using multiple layers may lead to simpler networks

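$$z_{1h} = \mathrm{sigmoid}(\mathbf{w}_{1h}^T \mathbf{x}) = \mathrm{sigmoid}\left(\sum_{j=1}^{d} w_{1hj} x_j + w_{1h0}\right), \quad h = 1, \ldots, H_1$$

$$z_{2l} = \mathrm{sigmoid}(\mathbf{w}_{2l}^T \mathbf{z}_1) = \mathrm{sigmoid}\left(\sum_{h=1}^{H_1} w_{2lh} z_{1h} + w_{2l0}\right), \quad l = 1, \ldots, H_2$$

$$y = \mathbf{v}^T \mathbf{z}_2 = \sum_{l=1}^{H_2} v_l z_{2l} + v_0$$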
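A forward pass through two sigmoid hidden layers followed by a linear output can be written by reusing one layer function. The sketch below is illustrative only: the helper names and the example weights (d = 2, H1 = 3, H2 = 2) are assumptions, and each weight row stores the bias first.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def layer(weights, inputs, activation):
    """Apply one layer; each row is [bias, w_1, ..., w_n] and inputs get a leading 1."""
    aug = [1.0] + list(inputs)
    return [activation(sum(w * a for w, a in zip(row, aug))) for row in weights]

def mlp2_forward(W1, W2, v, x):
    z1 = layer(W1, x, sigmoid)            # z_1h, h = 1..H1
    z2 = layer(W2, z1, sigmoid)           # z_2l, l = 1..H2
    y = layer([v], z2, lambda a: a)[0]    # linear output y = v^T z_2
    return z1, z2, y

# Hypothetical weights.
W1 = [[0.1, 0.5, -0.5], [0.0, -1.0, 1.0], [0.3, 0.2, 0.2]]
W2 = [[-0.2, 1.0, -1.0, 0.5], [0.4, 0.3, 0.3, -0.6]]
v = [0.5, 1.0, -2.0]
z1, z2, y = mlp2_forward(W1, W2, v, [1.0, 2.0])
```

Adding a further hidden layer only means one more `layer` call between `z2` and the output.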


Improving Convergence

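Momentum

$$\Delta w_i^t = -\eta \frac{\partial E^t}{\partial w_i} + \alpha\, \Delta w_i^{t-1}$$

Adaptive learning rate

$$\Delta \eta = \begin{cases} +a & \text{if } E^{t+\tau} < E^t \\ -b\,\eta & \text{otherwise} \end{cases}$$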
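Both ideas can be tried on a one-dimensional toy error E(w) = ½ (w − 3)², chosen here purely for illustration. The momentum sketch adds α times the previous update to the current gradient step; the adaptive sketch raises η by a constant a when the error dropped and shrinks it by bη otherwise. Comparing consecutive steps (τ = 1) and the constants η = 0.1, α = 0.9, a = 0.01, b = 0.5 are assumptions, not values from the slides.

```python
def E(w):
    """Toy error with its minimum at w = 3."""
    return 0.5 * (w - 3.0) ** 2

def grad_E(w):
    return w - 3.0

def descend_with_momentum(w, eta=0.1, alpha=0.9, steps=500):
    """delta_w(t) = -eta * dE/dw + alpha * delta_w(t-1)."""
    dw_prev = 0.0
    for _ in range(steps):
        dw = -eta * grad_E(w) + alpha * dw_prev
        w += dw
        dw_prev = dw
    return w

def descend_with_adaptive_rate(w, eta=0.1, a=0.01, b=0.5, steps=200):
    """Plain gradient steps; eta grows by a after a decrease in E, else shrinks by b * eta."""
    e_prev = E(w)
    for _ in range(steps):
        w -= eta * grad_E(w)
        e_new = E(w)
        eta += a if e_new < e_prev else -b * eta
        e_prev = e_new
    return w, eta

w_momentum = descend_with_momentum(0.0)
w_adaptive, final_eta = descend_with_adaptive_rate(0.0)
```

The additive increase and multiplicative decrease keep η positive while letting it back off quickly after an overshoot.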


Overfitting/Overtraining

Number of weights: $H(d+1) + (H+1)K$
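The weight count follows from the two layers: each of the H hidden units has d inputs plus a bias, and each of the K outputs has H hidden inputs plus a bias. A tiny helper (the function name is an assumption) makes the arithmetic concrete:

```python
def mlp_weight_count(d, H, K):
    """Number of free parameters in a one-hidden-layer MLP: H(d+1) + (H+1)K."""
    return H * (d + 1) + (H + 1) * K

small = mlp_weight_count(d=2, H=2, K=1)      # 2*3 + 3*1 = 9
larger = mlp_weight_count(d=256, H=10, K=10)  # 10*257 + 11*10 = 2680
```

The count grows linearly in H, which is why the number of hidden units is the usual handle on model complexity.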


Structured MLP

(Le Cun et al., 1989)


Weight Sharing


Hints

Invariance to translation, rotation, size

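Virtual examples

Augmented error: $E' = E + \lambda_h E_h$

If $x'$ and $x$ are the “same”: $E_h = [g(x \mid \theta) - g(x' \mid \theta)]^2$

Approximation hint:

$$E_h = \begin{cases}
0 & \text{if } g(x \mid \theta) \in [a_x, b_x] \\
\left(g(x \mid \theta) - a_x\right)^2 & \text{if } g(x \mid \theta) < a_x \\
\left(g(x \mid \theta) - b_x\right)^2 & \text{if } g(x \mid \theta) > b_x
\end{cases}$$

(Abu-Mostafa, 1995)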
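The hint penalties are simple functions of the model output, so they can be sketched directly. The function names below are hypothetical, and `g_x` stands for an already computed output g(x|θ).

```python
def invariance_hint(g_x, g_x_prime):
    """E_h = [g(x) - g(x')]^2 when x and x' should be treated as the same."""
    return (g_x - g_x_prime) ** 2

def approximation_hint(g_x, a_x, b_x):
    """E_h is zero inside [a_x, b_x] and grows quadratically outside it."""
    if g_x < a_x:
        return (g_x - a_x) ** 2
    if g_x > b_x:
        return (g_x - b_x) ** 2
    return 0.0

def augmented_error(E, E_h, lambda_h):
    """E' = E + lambda_h * E_h."""
    return E + lambda_h * E_h

inside = approximation_hint(0.5, 0.0, 1.0)   # 0.0
below = approximation_hint(-1.0, 0.0, 1.0)   # 1.0
above = approximation_hint(3.0, 0.0, 1.0)    # 4.0
```

The weight `lambda_h` controls how strongly the hint is enforced relative to the data error.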


Tuning the Network Size

Destructive

Weight decay:

$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} - \lambda w_i$$

$$E' = E + \frac{\lambda}{2}\sum_i w_i^2$$

Constructive

Growing networks (Ash, 1989) (Fahlman and Lebiere, 1989)
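The effect of the decay term is easiest to see on a weight that the error does not depend on: with a zero gradient, each step multiplies the weight by (1 − λ), so it shrinks toward zero. A minimal sketch, with λ = 0.1, η = 0.1, and 100 steps chosen only for illustration:

```python
def decay_step(w, grad, eta, lam):
    """delta w_i = -eta * dE/dw_i - lam * w_i, applied to every weight."""
    return [wi - eta * gi - lam * wi for wi, gi in zip(w, grad)]

# A weight with zero error gradient (the error does not use it) decays geometrically.
w = [1.0]
for _ in range(100):
    w = decay_step(w, grad=[0.0], eta=0.1, lam=0.1)
unused_weight = w[0]  # roughly 0.9 ** 100
```

Weights that the data do need are held away from zero by the gradient term, which is how decay prunes only the unnecessary ones.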


Bayesian Learning

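Consider weights $w_i$ as random vars, prior $p(w_i)$

$$p(\mathbf{w} \mid \mathcal{X}) = \frac{p(\mathcal{X} \mid \mathbf{w})\, p(\mathbf{w})}{p(\mathcal{X})}, \qquad \hat{\mathbf{w}}_{MAP} = \arg\max_{\mathbf{w}} \log p(\mathbf{w} \mid \mathcal{X})$$

$$\log p(\mathbf{w} \mid \mathcal{X}) = \log p(\mathcal{X} \mid \mathbf{w}) + \log p(\mathbf{w}) + C$$

$$p(\mathbf{w}) = \prod_i p(w_i) \quad \text{where} \quad p(w_i) = c \cdot \exp\left[-\frac{w_i^2}{2(1/2\lambda)}\right]$$

$$E' = E + \lambda \|\mathbf{w}\|^2$$

Weight decay, ridge regression, regularization

cost = data-misfit + λ complexity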


Dimensionality Reduction


Learning Time

Applications:
- Sequence recognition: Speech recognition
- Sequence reproduction: Time-series prediction
- Sequence association

Network architectures:
- Time-delay networks (Waibel et al., 1989)
- Recurrent networks (Rumelhart et al., 1986)


Time-Delay Neural Networks


Recurrent Networks


Unfolding in Time