A Statistical Mechanical Analysis of Online Learning: Seiji MIYOSHI, Kobe City College of Technology, miyoshi@kobe-kosen.ac.jp


1

A Statistical Mechanical Analysis of Online Learning:

Seiji MIYOSHI, Kobe City College of Technology

miyoshi@kobe-kosen.ac.jp


2

Background (1)

• Batch learning
  – Examples are used repeatedly
  – Correct answers can be given for all examples
  – Takes a long time
  – Requires a large memory

• Online learning
  – Examples are discarded after being used once
  – Correct answers cannot be given for all examples
  – A large memory is not necessary
  – Can follow a time-variant teacher


3

A Statistical Mechanical Analysis of Online Learning:

Can Student be more Clever than Teacher?

Seiji MIYOSHI, Kobe City College of Technology

miyoshi@kobe-kosen.ac.jp

Jan. 2006


4

[Figure: moving teacher B, student J, and true teacher A]

Jan. 2006


5

A Statistical Mechanical Analysis of Online Learning:

Many Teachers or Few Teachers?

Seiji MIYOSHI, Kobe City College of Technology

miyoshi@kobe-kosen.ac.jp


6

[Figure: true teacher A, ensemble teachers B_1, ..., B_k, B_k', ..., B_K, and student J]


7

PURPOSE

• To analyze the generalization performance of a model composed of a student, a true teacher, and K teachers (ensemble teachers) that exist around the true teacher

• To discuss the relationship between the number and diversity of the ensemble teachers and the generalization error

[Figure: true teacher A, ensemble teachers B_1, ..., B_k, B_k', ..., B_K, and student J]


8

MODEL (1/4)

• J learns B_1, B_2, ... in turn.

• J cannot learn A directly.

• A, B_1, B_2, ..., and J are linear perceptrons with noise.

[Figure: true teacher A, ensemble teachers B_1, ..., B_k, B_k', ..., B_K, and student J]


9

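Simple Perceptron

[Figure: inputs x_1, ..., x_N connected through connection weights J_1, ..., J_N to a single output, which takes the value +1 or -1]

$$\text{Output} = \operatorname{sgn}\Bigl(\sum_{i=1}^{N} J_i x_i\Bigr)$$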


10

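[Figure: inputs x_1, ..., x_N connected through connection weights J_1, ..., J_N to a single output]

Simple Perceptron

$$\text{Output} = \operatorname{sgn}\Bigl(\sum_{i=1}^{N} J_i x_i\Bigr)$$

Linear Perceptron

$$\text{Output} = \sum_{i=1}^{N} J_i x_i$$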
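As a minimal sketch of the two output rules above, in Python (the function names are hypothetical, and treating sgn(0) as +1 is an assumption the slides do not address):

```python
def simple_perceptron(J, x):
    """Simple perceptron: Output = sgn(sum_i J_i x_i), which is +1 or -1."""
    u = sum(j * xi for j, xi in zip(J, x))
    # Assumption: sgn(0) is taken as +1 (the slides do not specify this case).
    return 1 if u >= 0 else -1


def linear_perceptron(J, x):
    """Linear perceptron: Output = sum_i J_i x_i, with no sign function."""
    return sum(j * xi for j, xi in zip(J, x))
```

For J = [0.5, -1.0, 2.0] and x = [1.0, 2.0, 0.0], the linear perceptron outputs -1.5 and the simple perceptron outputs -1; the only difference is whether the weighted sum passes through sgn.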


11

MODEL (2/4)

Linear Perceptrons with Noise

[Figure: true teacher A, ensemble teachers B_1, ..., B_k, B_k', ..., B_K, and student J]
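The noise model on this slide was lost in extraction. A common formulation, assumed here rather than recovered from the slide, adds independent zero-mean Gaussian noise to the linear output, v = Σ_i w_i x_i + n with n ~ N(0, σ²), where σ² stands for σ_A², σ_B², or σ_J² for A, B_k, and J respectively. A hypothetical helper:

```python
import random


def noisy_linear_output(w, x, noise_var, rng=random):
    """Linear perceptron with additive output noise:
    v = sum_i w_i x_i + n, with n ~ N(0, noise_var).
    Assumption: the noise is zero-mean Gaussian and drawn afresh for every input."""
    u = sum(wi * xi for wi, xi in zip(w, x))
    return u + rng.gauss(0.0, noise_var ** 0.5)
```

Because the noise is redrawn on every call, presenting the same input twice generally gives two different outputs; with noise_var = 0 the helper reduces to the noiseless linear perceptron.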


12

MODEL (3/4)

• Inputs:

• Initial value of student:

• True teacher:

• Ensemble teachers:

• N→∞ (thermodynamic limit)

• Order parameters
  – Length of student
  – Direction cosines
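The definitions on this slide did not survive extraction. The sketch below assumes a standard setup: inputs with x_i ~ N(0, 1/N); teachers with |A|²/N = |B_k|²/N = 1; direction cosine R_B between A and every B_k; and q between every pair B_k, B_k'. The construction B_k = R_B A + √(q − R_B²) C + √(1 − q) D_k, with a random vector C shared by all teachers and private random vectors D_k, is one hypothetical way to realize these overlaps (it requires R_B² ≤ q ≤ 1). All function names are illustrative.

```python
import math
import random


def make_teachers(N, K, R_B, q, rng):
    """Hypothetical construction: true teacher A and K ensemble teachers B_k with
    |A|^2/N ~ |B_k|^2/N ~ 1, A.B_k/N ~ R_B and B_k.B_k'/N ~ q (k != k').
    B_k = R_B*A + sqrt(q - R_B^2)*C + sqrt(1 - q)*D_k, where C is shared by all
    teachers and D_k is private to teacher k. Requires R_B^2 <= q <= 1."""
    if not (R_B ** 2 - 1e-12 <= q <= 1.0):
        raise ValueError("need R_B**2 <= q <= 1")
    A = [rng.gauss(0.0, 1.0) for _ in range(N)]
    C = [rng.gauss(0.0, 1.0) for _ in range(N)]
    a, c, d = R_B, math.sqrt(max(0.0, q - R_B ** 2)), math.sqrt(1.0 - q)
    B = []
    for _ in range(K):
        D = [rng.gauss(0.0, 1.0) for _ in range(N)]
        B.append([a * Ai + c * Ci + d * Di for Ai, Ci, Di in zip(A, C, D)])
    return A, B


def make_input(N, rng):
    """Input vector with independent components x_i ~ N(0, 1/N), so |x|^2 ~ 1."""
    sd = 1.0 / math.sqrt(N)
    return [rng.gauss(0.0, sd) for _ in range(N)]


def direction_cosine(u, v):
    """Direction cosine u.v / (|u||v|), e.g. R_J between A and J."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    return dot / math.sqrt(sum(ui * ui for ui in u) * sum(vi * vi for vi in v))


def length(w):
    """Length order parameter l = |w| / sqrt(N)."""
    return math.sqrt(sum(wi * wi for wi in w) / len(w))
```

With R_B = 0.7 and q = 0.49 (values used in the figures), q = R_B², so the shared component C drops out and the teachers' deviations from A are mutually uncorrelated. For large N the measured overlaps approach their targets, with fluctuations of order 1/√N.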


13

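[Figure: direction cosines among the true teacher A, the ensemble teachers B_1, ..., B_K, and the student J: R_B (A and B_k), q_{kk'} (B_k and B_k'), R_J (A and J), and R_{B_kJ} (B_k and J)]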


14

MODEL (4/4)

Student learns the K ensemble teachers in turn.

• Gradient method

• Squared errors ($f_{k_m}$)
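A direct simulation can make the learning rule concrete. This sketch assumes the update J ← J + η(v_{B_k} − v_J)x, which is gradient descent on the squared error f = (v_{B_k} − v_J)²/2 with noisy outputs v_{B_k} = B_k·x + n_B and v_J = J·x + n_J, and it assumes the teachers are visited cyclically (k = m mod K). The teacher construction, the initial student, and N and t_max are assumptions for illustration; the other defaults are the parameters listed for the dynamical-behavior figure. This is not the author's code.

```python
import math
import random


def simulate_online_learning(N=400, K=3, R_B=0.7, q=0.49, eta=0.3,
                             var_B=0.1, var_J=0.2, t_max=15.0, seed=0):
    """Student J learns the K ensemble teachers in turn (k = m mod K) by the
    gradient method on the squared error f = (v_B - v_J)**2 / 2.
    Each example x is used once and then discarded (online learning).
    Returns the order parameters (R_J, l) at t = m/N = t_max."""
    rng = random.Random(seed)
    sd_B, sd_J, sd_x = math.sqrt(var_B), math.sqrt(var_J), 1.0 / math.sqrt(N)

    # Hypothetical teacher construction giving A.B_k/N ~ R_B and B_k.B_k'/N ~ q.
    A = [rng.gauss(0.0, 1.0) for _ in range(N)]
    C = [rng.gauss(0.0, 1.0) for _ in range(N)]
    a, c, d = R_B, math.sqrt(max(0.0, q - R_B ** 2)), math.sqrt(1.0 - q)
    B = [[a * Ai + c * Ci + d * rng.gauss(0.0, 1.0) for Ai, Ci in zip(A, C)]
         for _ in range(K)]

    # Assumed initial student: components N(0, 1), so l(0) ~ 1 and R_J(0) ~ 0.
    J = [rng.gauss(0.0, 1.0) for _ in range(N)]

    for m in range(int(t_max * N)):
        x = [rng.gauss(0.0, sd_x) for _ in range(N)]          # x_i ~ N(0, 1/N)
        Bk = B[m % K]                                          # teachers in turn
        v_B = sum(b * xi for b, xi in zip(Bk, x)) + rng.gauss(0.0, sd_B)
        v_J = sum(j * xi for j, xi in zip(J, x)) + rng.gauss(0.0, sd_J)
        # Gradient step: -df/dJ = (v_B - v_J) x, scaled by the learning rate eta.
        step = eta * (v_B - v_J)
        J = [j + step * xi for j, xi in zip(J, x)]

    JJ = sum(j * j for j in J)
    AA = sum(Ai * Ai for Ai in A)
    AJ = sum(Ai * j for Ai, j in zip(A, J))
    return AJ / math.sqrt(AA * JJ), math.sqrt(JJ / N)
```

For N = 400 the result carries finite-size scatter of a few hundredths; with these defaults a run should land roughly near R_J ≈ 0.8 and l ≈ 0.88, in line with the steady-state values that the same assumptions predict. Larger N reduces the scatter, but pure Python becomes slow.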


15

GENERALIZATION ERROR

• A goal of statistical learning theory is to obtain the generalization error theoretically.

• Generalization error = mean of the errors over the distribution of new inputs
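Under assumed conditions (x_i ~ N(0, 1/N), |A|²/N = 1, N → ∞), the projections A·x and J·x of a new input are jointly Gaussian with variances 1 and l² and covariance R_J l. The mean of (v_A − v_J)²/2 over new inputs and output noises then reduces to ε_g = (1 − 2R_J l + l² + σ_A² + σ_J²)/2. This assumes the error is measured against the noisy true-teacher output with the student's own noise included; the slide's exact definition was lost. The hypothetical functions below check the closed form against a Monte Carlo average:

```python
import math
import random


def gen_error_theory(R_J, l, var_A, var_J):
    """eps_g = 1/2 * (1 - 2 R_J l + l^2 + sigma_A^2 + sigma_J^2)."""
    return 0.5 * (1.0 - 2.0 * R_J * l + l * l + var_A + var_J)


def gen_error_monte_carlo(R_J, l, var_A, var_J, n=100_000, seed=0):
    """Average of (v_A - v_J)^2 / 2 over new inputs and output noises.
    For N -> infinity, y_A = A.x and y_J = J.x are jointly Gaussian with
    Var(y_A) = 1, Var(y_J) = l^2 and Cov(y_A, y_J) = R_J * l."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y_A = z1
        y_J = l * (R_J * z1 + math.sqrt(1.0 - R_J ** 2) * z2)
        v_A = y_A + rng.gauss(0.0, math.sqrt(var_A))   # noisy true-teacher output
        v_J = y_J + rng.gauss(0.0, math.sqrt(var_J))   # noisy student output
        total += 0.5 * (v_A - v_J) ** 2
    return total / n
```

For example, R_J = 0.8, l = 0.9, σ_A² = 0, and σ_J² = 0.2 give ε_g = 0.5 × (1 − 1.44 + 0.81 + 0.2) = 0.285, and the Monte Carlo estimate should agree to within about 0.01.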


16

Simultaneous differential equations in deterministic form, which describe the dynamical behavior of the order parameters
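The equations on this slide were lost in extraction. The following is a hedged reconstruction, not a transcription, of what one obtains for linear perceptrons under these assumptions:
- x_i ~ N(0, 1/N) and |A|² = |B_k|² = N;
- the update J^{m+1} = J^m + η(v_{B_{k_m}} − v_J)x^m, with v_{B_k} = B_k·x + n_{B_k}, v_J = J·x + n_J, and k_m cycling through 1, ..., K;
- the overlaps r = A·J/N = R_J l, r_k = B_k·J/N = R_{B_kJ} l, and Q = J·J/N = l².

Average each step over x and the noises, use |x|² → 1, and average over the K teachers visited in turn. With t = m/N and q_{kk} = 1, this gives:

```latex
% Hedged reconstruction (assumptions listed in the lead-in, not taken from the slide).
\begin{align}
\frac{dr}{dt} &= \eta\,(R_B - r), \\
\frac{dr_{k'}}{dt} &= \eta\Bigl(\frac{1}{K}\sum_{k=1}^{K} q_{kk'} - r_{k'}\Bigr), \\
\frac{dQ}{dt} &= \frac{1}{K}\sum_{k=1}^{K}\Bigl[\,2\eta\,(r_k - Q)
   + \eta^{2}\bigl(1 - 2r_k + Q + \sigma_B^{2} + \sigma_J^{2}\bigr)\Bigr].
\end{align}
```

The term linear in η comes from 2η e (J·x), where e = v_{B_k} − v_J. The η² term comes from η² e²|x|², which is why it carries both noise variances. R_J = r/√Q and l = √Q recover the slide's order parameters.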


17

Analytical solutions of order parameters
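The analytical solutions on this slide were lost. The following derivation is a hedged reconstruction, not the slide's own result. Assume the order parameters obey the averaged equations dr/dt = η(R_B − r), dr̄/dt = η(q̄ − r̄), and dQ/dt = 2η(r̄ − Q) + η²(1 − 2r̄ + Q + σ_B² + σ_J²). These are written for linear perceptrons with x_i ~ N(0, 1/N), |A|² = |B_k|² = N, and q_{kk'} = q for k ≠ k', where r = R_J l, r̄ is the mean of R_{B_kJ} l, Q = l², and q̄ = (1 + (K − 1)q)/K. Solving them gives
- r(t) = R_B + (r(0) − R_B)e^{−ηt},
- r̄(t) = q̄ + (r̄(0) − q̄)e^{−ηt},
- Q(t) = Q_∞ + 2(r̄(0) − q̄)e^{−ηt} + (Q(0) − Q_∞ − 2(r̄(0) − q̄))e^{−η(2−η)t}.

The sketch below evaluates these expressions and cross-checks them by forward-Euler integration. The function names and the default initial values (a student independent of all teachers, with l(0) = 1) are assumptions.

```python
import math


def order_params(t, eta, K, R_B, q, var_A, var_B, var_J, r0=0.0, rbar0=0.0, Q0=1.0):
    """Closed-form order parameters at time t = m/N (hedged reconstruction).
    Defaults assume a student started independently of all teachers with l(0) = 1.
    Requires eta != 2. Returns (R_J, l, eps_g)."""
    qbar = (1.0 + (K - 1) * q) / K                  # (1/K) * sum_k q_kk'
    decay = math.exp(-eta * t)
    r = R_B + (r0 - R_B) * decay                    # r = R_J * l
    Q_inf = (2.0 * (1.0 - eta) * qbar + eta * (1.0 + var_B + var_J)) / (2.0 - eta)
    beta = 2.0 * (rbar0 - qbar)                     # coefficient of exp(-eta t) in Q
    Q = Q_inf + beta * decay + (Q0 - Q_inf - beta) * math.exp(-eta * (2.0 - eta) * t)
    l = math.sqrt(Q)                                # Q = l^2
    eps_g = 0.5 * (1.0 - 2.0 * r + Q + var_A + var_J)
    return r / l, l, eps_g


def integrate_odes(t, eta, K, R_B, q, var_B, var_J, r0=0.0, rbar0=0.0, Q0=1.0, dt=1e-4):
    """Forward-Euler integration of the same ODEs, as a numerical cross-check.
    Returns (r, rbar, Q) at time t."""
    qbar = (1.0 + (K - 1) * q) / K
    r, rbar, Q = r0, rbar0, Q0
    for _ in range(int(round(t / dt))):
        dr = eta * (R_B - r)
        drbar = eta * (qbar - rbar)
        dQ = 2.0 * eta * (rbar - Q) + eta ** 2 * (1.0 - 2.0 * rbar + Q + var_B + var_J)
        r, rbar, Q = r + dr * dt, rbar + drbar * dt, Q + dQ * dt
    return r, rbar, Q
```

With η = 0.3, K = 3, R_B = 0.7, q = 0.49, σ_A² = 0, σ_B² = 0.1, and σ_J² = 0.2, the long-time values are l ≈ 0.879, R_J ≈ 0.796, and ε_g ≈ 0.286. At η = 1 the two exponentials coincide, and the formula remains valid without special-casing.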


18

GENERALIZATION ERROR

• A goal of statistical learning theory is to obtain the generalization error theoretically.

• Generalization error = mean of the errors over the distribution of new inputs


19

Dynamical behaviors of generalization error, R_J and l

( η=0.3, K=3, R_B=0.7, σ_A²=0.0, σ_B²=0.1, σ_J²=0.2 )

[Figure: two panels plotted against t=m/N (0 to 20), each with curves for q=1.00, 0.80, 0.60, 0.49. One panel shows the order parameters R_J and l (curve labels: Student, Ensemble teachers); the other shows the generalization error.]


20

Analytical solutions of order parameters


21

Steady state analysis (t → ∞)

・ If η < 0 or η > 2: the generalization error and the length of the student diverge.

・ If 0 < η < 2:
  – If η < 1, the more teachers there are or the richer their diversity, the cleverer the student can become.
  – If η > 1, the fewer teachers there are or the poorer their diversity, the cleverer the student can become.
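The following is a sketch under a hedged reconstruction of the model. It assumes linear perceptrons, x_i ~ N(0, 1/N), |A|² = |B_k|² = N, and q_{kk'} = q for k ≠ k', with the averaged equations dr/dt = η(R_B − r), dr̄/dt = η(q̄ − r̄), and dQ/dt = 2η(r̄ − Q) + η²(1 − 2r̄ + Q + σ_B² + σ_J²), where q̄ = (1 + (K − 1)q)/K. Setting the time derivatives to zero gives
- l² = Q_∞ = [2(1 − η)q̄ + η(1 + σ_B² + σ_J²)]/(2 − η),
- R_J = R_B/l,
- ε_g = (1 − 2R_B + Q_∞ + σ_A² + σ_J²)/2.

The equation for Q relaxes at rate η(2 − η), which is positive only for 0 < η < 2, so the dynamics are unstable (divergent) outside that range. The derivative ∂Q_∞/∂q̄ = 2(1 − η)/(2 − η) changes sign at η = 1. Meanwhile, q̄ = q + (1 − q)/K shrinks as K grows or q falls. Together these reproduce the reversal stated above. The function name is hypothetical.

```python
import math


def steady_state(eta, K, q, R_B, var_A, var_B, var_J):
    """Steady-state (t -> infinity) values (R_J, l, eps_g) under the hedged
    reconstruction dQ/dt = 2 eta (rbar - Q) + eta^2 (1 - 2 rbar + Q + var_B + var_J),
    with r -> R_B and rbar -> qbar = (1 + (K - 1) q) / K.
    Outside 0 < eta < 2 the dynamics diverge: reported as (nan, inf, inf)."""
    if not 0.0 < eta < 2.0:
        return math.nan, math.inf, math.inf
    qbar = (1.0 + (K - 1) * q) / K
    Q = (2.0 * (1.0 - eta) * qbar + eta * (1.0 + var_B + var_J)) / (2.0 - eta)
    l = math.sqrt(Q)
    eps_g = 0.5 * (1.0 - 2.0 * R_B + Q + var_A + var_J)
    return R_B / l, l, eps_g
```

Take the parameters of the K-dependent figure (q = 0.49, R_B = 0.7, σ_A² = 0, σ_B² = 0.1, σ_J² = 0.2). The steady error is then smaller for K = 30 than for K = 1 at η = 0.5 and larger at η = 1.5. At η = 1 it is independent of K and q, because Q_∞ = 1 + σ_B² + σ_J² there.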


22

Steady value of generalization error, R_J and l

( K=3, R_B=0.7, σ_A²=0.0, σ_B²=0.1, σ_J²=0.2 )

[Figure: two panels plotted against η (0 to 2), each with curves for q=1.00, 0.80, 0.60, 0.49. One panel shows the steady value of R_J; the other shows the steady generalization error on a log scale (0.1 to 10).]


23

Steady value of generalization error, R_J and l

( q=0.49, R_B=0.7, σ_A²=0.0, σ_B²=0.1, σ_J²=0.2 )

[Figure: two panels plotted against η (0 to 2), each with curves for K=1, 3, 10, 30. One panel shows the steady value of R_J; the other shows the steady generalization error on a log scale (0.1 to 10).]


24

CONCLUSIONS

We have analyzed the generalization performance of a student in a model composed of linear perceptrons: a true teacher, K teachers, and the student.

By calculating the generalization error of the student analytically with statistical mechanics in the framework of online learning, we have proven that when the learning rate satisfies η < 1, the larger K is and the more diverse the teachers are, the smaller the generalization error becomes. When 1 < η < 2, on the other hand, these properties are completely reversed.

If the diversity of the K teachers is rich enough, the direction cosine between the true teacher and the student becomes unity in the limit of η→0 and K→∞.
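The limit in the last statement can be checked against steady-state values obtained under a hedged reconstruction of the model. The reconstruction assumes linear perceptrons, x_i ~ N(0, 1/N), |A|² = |B_k|² = N, q_{kk'} = q for k ≠ k', and the gradient update on the squared error. It also reads "rich enough" diversity as q = R_B², which is the value 0.49 = 0.7² used in the figures; this reading is an assumption.

```latex
% Steady state (0 < \eta < 2) of the hedged reconstruction; not taken from the slides.
\begin{align}
l^{2} &= \frac{2(1-\eta)\,\bar q + \eta\,(1+\sigma_B^{2}+\sigma_J^{2})}{2-\eta},
  \qquad \bar q = \frac{1+(K-1)\,q}{K} = q + \frac{1-q}{K},
  \qquad R_J = \frac{R_B}{l}, \\
\lim_{\eta\to 0} l^{2} &= \bar q
  \;\xrightarrow{\;K\to\infty\;}\; q,
  \qquad\text{so}\qquad
  R_J \to \frac{R_B}{\sqrt{q}} = 1 \quad\text{when } q = R_B^{2}.
\end{align}
```

The noise variances drop out as η → 0 because they enter only through the η² term of the learning dynamics.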