15
1 : n : n : n : . ! n : n : : n : . n , n n Rosenblatt 1962 n Minsky and Papert 1969 n Perceptron: n Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n net input: n : threshold function ( threshold θ = -w 0 ) n activation function n Perceptron Networks n weighted links w i n Multi-Layer Perceptron (MLP): = = n 0 i net i i x w Perceptron x 1 x 2 x n w 1 w 2 w n Σ x 0 = 1 w 0 = n 0 i i i x w ( 29 > = = otherwise if 1 0 1 , , 0 2 1 - x w x x x o i n i i n K (29 ( 29 > = = otherwise if : Vector 1 0 1 sgn - x w w , x x o r r r r r 0 1 w x w i n i i - > = n : n (McCulloch and Pitts, 1943) n e.g., AND(x 1 , x 2 ), OR(x 1 , x 2 ), NOT(x) n n + - + + - - x 1 x 2 + + - - x 1 x 2 0 0 = = i n i i x w 0 0 > = i n i i x w 0 0 < = i n i i x w n n <x i , t i > 1in x i n t i +1 or -1 n x i t i n n x i x t + - + + - - x 1 x 2 + - + + - - x 1 x 2 n : Hebb Hebbian Learning Rule (Hebb, 1949) n : active (firing) , n w ij = w ij + r o i o j , r learning rate n , , n n Perceptron Learning Rule (Rosenblatt, 1959) n : , , n (Bool , Boolean-valued) ; n t = c(x) , o , r , . 1 , r n D linearly separable , . r i i i i i o)x r(t ∆w ∆w w w - = +

Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

1

�����

G�������G�����

� ������������

���: G�����

n ��� !"#: $%&'()*+*,-./01n 234567: 89:;<=n >?@ABCDEF: HI. JKLMN !

n OPQ<=67: RS9TUn OPQ<=679VWDXAY: Z[@ABCDEF\]^K_: RS9TUn 2345679VWDXAY: Z[@ABCDEF. ̀

n abc(d, efgh0ijklmnopghqn rs��� !"#

n Rosenblatt 1962

n Minsky and Papert 1969

n ��� !"# Perceptron: tuvw�"#*xyzn {|234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG)

n 67}9~�� net input: 23��n 679��: ~���45�� threshold function �����9 (45 threshold θ = −w0)

n ~����������_��� ����� activation function \]�n ��� !"#��!��� Perceptron Networks

n @ABCDEF��L������ weighted links wi�����L���_n Multi-Layer Perceptron (MLP): `

∑=

=n

0i

netii

xw

G����� Perceptron

x1

x2

xn

w1

w2

wn

ΣΣΣΣ

x0 = 1

w0

∑=

n

0i

ii xw( )

>= ∑

=

otherwise

if

1

01,, 021

-

xwxxxo

i

n

i

i

nK

( ) ( ) >⋅

==otherwise

if : Vector

1

01sgn

-

xww, xxo

rrrrr

 ¡

¢£¤¥¦§¨©ª0

1

wxw i

n

i

i −>∑=

G�����«¬­«®¯°±

n ��� !"#: ²³´µ¶·0¸¹+ºtg»¼(½1n ¾¿�� (McCulloch and Pitts, 1943) 9ZÀ9�9n e.g., ÁÂÃ��Ä AND(x1, x2), OR(x1, x2), NOT(x)

n »¼(½´0µ¶+Å1n 23ÆÇÈÉÄÃ��9 –�ÊËÌÄÍL

+

-+

+

--

x1

x2

+

+

-

-x1

x2

ÎÏÐÑ0

0

=∑=

i

n

i

i xw

00

>∑=

i

n

i

i xw

00

<∑=

i

n

i

i xw

ÒÓÔÕÖ×ØÙÚÛÜÝÞß

n àáâãn äåæçèé <xi, ti> êëì í1≤i≤îïð

n xidñòórôõ��ö�÷øj*ùn tid +1 or −1

n úûüçèéýþÿG� xiF�����ê� tié��ëì�ëì

n ��âã ��âã�n xiê����� x�ÿG���ì���� té��ëì

�*�e��$%(�

+

-+

+

--

x1

x2 +

-+

+

--

x1

x2

� !"

#$%&'("

G�����)*+,-./0

n 12: Hebb*�e3 Hebbian Learning Rule (Hebb, 1949)

n 4<54: ��67967L89\� active (“firing”)Ä:K^, ����;<=Í_n wij = wij + r oi oj , >� r;?@A� learning rateÄBC�Ä:_n DEH¿?I�, JK, LMNK��_n �O�BPQ�Ã�

n ��� !"#�eRzSöTU Perceptron Learning Rule (Rosenblatt, 1959)

n 4<54: V��WYDX�Y����5LZ[\K��_Ã\, ���]^I�_`Í_J\��a,

Nb��5L��Ä�_�c�Ã_n 65�� (Bool5, Boolean-valued) �dC; Âe@ABCDEF67n

>� t = c(x) ;fg��5, o;@ABCDEF9hi9��5, r;?@A�, jC�Ä:K^kÄ�l�. 1Ä��9ÄBm;B@ABCDEF?@4XnopqÄ;, r;rs

n DL23ÆÇÈÉ linearly separableÄ:K^, PQÍ_. rLtÆuN�J\�vw\Í_xy�:_LzK;{a

ii

iii

o)xr(t∆w

∆www

−=+←

Page 2: Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

2

G�����)*+,-./0

n t�´���� Gradient Descent RzSöTU(Å1n J94<54;, 4NÃ�h���K^, ��?@��I�?@��4�ÈÉ

n RzSöTU Train-Perceptron (D ≡ {<x, t(x) ≡ c(x)>})

n �� wi �;FGq5���Í_ // @ABCDEF;0��������n WHILE j������Ã���L:_ DO

FOR zKFK9�� x ∈ D

hi9��hi9��hi9��hi9�� o(x) �� �� �� �� FOR i = 1 to n

wi ←←←← wi + r(t - o)xi // perceptron learning rule. r is any positive #

n ��� !"#�e���n h ∈ H 9\�9�?@ÈÉ - i.e., 23ÆÇÈÉ linearly separable (LS) functions

n Minsky and Papert (1969) Perceptrons : >?@ABCDEF9�h�?@9������• �: 67e7Ä; parity (n-�� XOR: x1 ⊕ x2 ⊕ … ⊕ xn) ��L�hÄ�Ã�, \�c9;��• e.g., #�9 symmetry, connectedness;�>?@ABCDEFÄ��hÄ�Ã�• APerceptrons”9 �Ä ANN %!L10"$À&K�\�'K�Í_LB(J)Ä*mO+

,-./

Linearly Separable (LS)Data Set

x1

x2

+

+

+

++

++

+

+

+

+

-

-

--

-

-

--

-

-

-

--

- - -

n Ïcn f(x) = 1 if w1x1 + w2x2 + … + wnxn ≥ θ, 0 otherwise

n θ: 45n 0123��56

n �: DL23ÆÇÈÉ7O\\���,

*9�� c(x) L23ÆÇÈÉ\;�\Ã�n 89 disjunction: c(x) = x1’ ∨ x2’ ∨ … ∨ xm’

n m of n: c(x) = at least 3 of (x1’ , x2’, …, xm’ )

n :;I exclusive OR (XOR): c(x) = x1 ⊕ x2

n e<9 DNF: c(x) = T1 ∨ T2 ∨ …∨ Tm; Ti = l1 ∧ l1 ∧ … ∧ lk

n »¼*=>n 23ÆÇÈÉÄÃ�?@�23ÆÇÈÉÃ?@��BÄ�_O?

n zK;CD9:_J\Ã9O? hmIÃ9O?

n hm?@9�sÃEÆ�HI_9O?

JJKK

G�����)*±LM

n ��� !"#�e*NOÏPn QR: ��ST5AU\ consistent Ã��V�L:K^ (i.e., 5AUL23ÆÇÈÉÃ\), @ABCDEF?@4Xnopq;PQÍ_

n Wy: XYZ[L��9:_\]�Ã���_ (“^9_`Lab�cd���À) –efMinsky and Papert, 11.2-11.3

n �C 1: PQ)Ä9gh[;?

n �C 2: ��23ÆÇÈÉÄÃiK^(cÃ_9O?

n ��� !"#jkÏPn QR: ST5AUL23ÆÇÈÉÄÃiK^@ABCDEF?@4Xnopq��a�\K_��WYDX;B:_l�V�m�n)_. ��Lo�WYDXÃ\, l�V�m�n)_.

n Wy: ��tÆ�pY5Lq����WYDXO\rI_\, pY5;s(q�ÀÃKÃ�J\L� _; ST��98> n9�?Ituv��_ – Minsky and Papert, 11.10

n wx"yz!g, {mwx»¼|,}~1gd?

n fI 1: ��\�l�$����Í_4Xnopq9��n fI 2: �h9����[_`��4A��Y��9��

������

n ���������������n ����� ¡¢

n £¤¥¦n §¨¡©ª�¢«¬­¨®¯°±²¢³�¢°±´n µ¶·¸¹

n º»¼�½¾¿À¥¦

ÁÂÃÄÅÆÇ

1x

2x

nx

1w

2w

nw

10 =x

0w

i

n

i

i xwnet ∑=

=0 ( )

== ∑=

i

n

i

i xwneto0

σσ

ÈÉÊËÌÍ�ÎÏã σ Ð ( )xe

x −+≡

1

ÑÒÓÔÕÖ×Ø

ÙÚÛÜÝÞ

ßà

áâ

),(

),(

),(

),(

4

1

4

1

4

3

1

3

1

3

2

1

2

1

2

1

1

1

1

1

xwfy

xwfy

xwfy

xwfy

=

=

=

=

),(

),(

),(

12

3

2

3

12

2

2

2

12

1

2

1

ywfy

ywfy

ywfy

=

=

=

,

1

4

1

3

1

2

1

1

1

=

y

y

y

y

y

),( 23

1 ywfyOut

=

,2

3

2

3

2

3

2

=y

y

y

y

1x

2x

3x

4x

1

1y

1

2y

1

3y

1

4y

2

1y

2

2y

2

3y

1x

2x

3x

4x

Page 3: Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

3

G��������

n ������ �����n ������������

���� !!"#$%&"'#"(')*+"#'#,$!-�.$"/�0&"1��%&2!-�.$"/�0&"1��%&2'��34

56�789:;: <=>?

http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-3/www/ml.html

@AB9:C

DE FHI�J KL�XORMN

KO KP

QRSTÿ

QRSOUV

QRSWUV

XYI

OXYI���Zé[Z\ÿ�]�

^_í`ÿ�UVïabð

A

AB

B

A

AB

B

A

AB

B

BA

BA

BA

cd8�

n e¼�f½ghi��jÀklmnopqrstuv�w�rxyzvÀ{x|r�ozvÀ}~��������w�

x

1

1y

1

2y

1

3y

1

4y

x

1

�����������

����

n ��� !"#�ep*��,��mn ������������ !"#�e*m�gd

n ���(� �*¡(Å1i¢¡n·£³n .5.¤¥¦��� !"#(d¤§|��¨©gd¤i� n·´0

n ª'«¬­®¯°±²³´«��µ¶·¸¹º"»n ¼«��µ¼$½(¶·¸¹º"¾%¿À�Á®ÃÄÅÆÇ«ÈÉʱ%¿À"

Ë̵Í"»n ÎÏÐ�ÑÒ¹ÓÔº»Í"ÕÖ%×Ø»ÙÚÛ²ÜÝ%Þßغ»àáÙÚâ㺲äå¹Þßغ»���ÙÚâãº%¼æ¿À$çÔº¾èn #æ¸Øéê��� ëê��ìí²îï!"»n ªð²�credit assignment problem ñºò$"

n ¹{x¤��� !"#�epdóo(½´0ôn õ/¤¤¤¤

x

1

1y

1

2y

1

3y

1

4y

x

1ö÷øù

ö÷úû

Page 4: Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

4

1����� -- G�����

n ��� ���n ��UV��ì�í��� –%����ð2 é���ÿ�

n F*�¤��gd¤F*�*�y��¤�§|��gµ�1� ¤, E(w) �!Fq

n úûüçè"�UVR�#$&' w

n E(w) é w ()*í+)*(ëðÿG0ê,-n ./01 w gµ�1234,567w0n ��� !"#(d¤82(½´0

n �9:;<�=ã>�Í?>�n @�AïéBCDEHAï

n ��:IJ KL MN OPQ�RSTUVn ï�WXÿ�Y��(]�Z�T[Yn \]Z^é_`�

( ) ∑=

−=N

i

ii txWFN

WE1

2)),((1

ab1980cde«fgh

ijkl

n PDP (Parallel Distributed Computing) ê[mn��oépqrTs�Yÿ\ÿ�

n tuvwéxy�úûüçè()*ÿ�0ê,z�"�]sê{\Z|}(�s�Yn �d, ~�*�pd, F*j, ��õ0/0m (Amari 1967; Werbos,

1974, “dynamic feedback”; Parker, 1982, “learning logic”) *(¤�����+&�1ô��di��gq{¸0¸n�0qF�,�.mF�

n ("�T��-�T�zTs��\� _`Z�ì���n ��p(NO�1�d��´5lmôn ��p�*+*·�=´�����0/0môn ��2·�xg+�.��/¤��/ºt´��¨©q{¸�¸�d��´5lm

n �Õ��ÓÈ «¡Æ¢£¤º�ͥػn ¦.��¶·§�1�¤¨©*ª#«w��(dNO.´5lmn �¬´F�·­®(��0/01�d��´5lmô

¯°ÜÝ��Ö�Ó���±

n ²³´²µ¶·¸rÀ�y¹º¹ºr»¦¼½u�w�¾

n ¶·¸¼µyn ¿�«¬feedforward´n ÀÁ«¬recurrent´

ÂÃ: Ä�-Å��G�����Æ

ÇÈ: ÉÊËÌÍrecurrentÎ

n ���Ï����Ðhj�ÑÒÓlÔÕ�

ÇÖ× 1�������

• ØÙÚÛ0�ÜÝÞßàáâãäåæÝçèéêë– �ìí�:îï9:;<�= KðíÍ9ñ?Í Vò– óîô:õö÷øùúû�9TÉü– ýð�:þÿF����Í?�ü����Û

• 8�� �ßàá���²��²ãä�Ý• ��ã�¬�Ú��²ãâ��ÚÛÝ�ß�´â������ !²��� âG"�ß�â���

( ) ( ) ( ) L>>> 321 WEWEWE

L321 ,, WWW

Page 5: Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

5

������

• ���ß������ÛÝ��• ����µ��²�� �

– �����Éã V:����� �îô�Î K�ü

-2-1

01

23w0

-3 -2 -1 0 1 2w1

0

5

10

E>w�

-2-1

01

23w0

-3 -2 -1 0 1 2w1

������í��PíF â� Í!"L

�#$%��ÝÞ&

• '(À)*µ+,�Õ-.¹|/• 01µy234567��89:• ¿��y

���¾;µ�<1�4=¼?@r¹�¹tr¹A�

WWW ∆+←

∂∂−=∆W

EW α

)BCDEgG

2)( ytE −=

xyt )(2 −−=

)()(2)]([ 2 xfdx

dxfxf

dx

d =W

ytyt

W

E

∂−∂−=

∂∂ )(

)(2

W

y

W

yt

∂∂−=

∂−∂ )(

x−=

´¼º�!µ

W

xwn

i

ii

∂−=∑

=1

∑=

=n

i

ii xwy1

(t d� �(Åx¤W gHI.´0)

ñx dJ�!z(�ô

KLMN

xytW )( −=∆ α

��ãUV:2O � O íP=Q�Ï�

xytW

E)(2 −−=

∂∂

∂∂−+←W

EWW α Uó

WWW ∆+← Uó

F*R*�e3d¤yz�3�+S701ñWidrow-Hoff 3�+S701ô

Perceptron ��TB�UV

• ÿ\ÿ�äåWX�YGYìê, perceptron äåW

", Z[�[G, ê[\](�^=1 /2 ê�[�]�

• ÿ\ÿ@�Aï")*_`T�y�perceptron äåW�abc"�tuvw���("de(�T[Y

=−=≠−

=≠=∆

yt

tytx

tytx

W

if0

1 and if

1 and if

WWW ∆+← Uó

xytW )( −=∆ αWWW ∆+← Uó

1,1 ±=±= yt

• 0123��: f�´2~·(½1– � A: @ABCDEF?@4XnopqLPQ

• 0123g�: h�(½1*i– � B: 23ÆÇrÉ; 5XUjk;PQ, �O�37jl�a;�ÀÃ\Ã�– � C: 23ÆÇrÉ; 5XUjkÄ���m

nopqrsperceptronturvwx¯©yz�{|yz

+

+

+

+

+

++

++

++

+

+

+

+

-

-

--

--

-

--

-

--

-

--

- - -

}}}} A

+

-+

+

--

x1

x2

}}}} B

-

-x1

x2

}}}} C

x1

x2

Page 6: Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

6

• 0123��: $%2~,�1– � A: @ABCDEF?@k�5XUjk�2G67����Æ��Í_�7�B�m;FÃ_9ÄB�����YÍ_��;FÃ_ÈÉ�:a

• 0123��: $´12~,�1ô0R��*��¤��*y�#z,�1– � B: 23ÆÇÈÉ; Perceptron?@k;jl�aÆ�– � C: 23ÆÇrÉ; 5XUjk;jl�a;Æ��Ã�+�;F:;���

nopqrsperceptrontur��yz�{|yz

}}}} A

+

-+

+

--

x1

x2

}}}} B

x1

x2

+++

+

+

+

+

+

+

+

+

-

-- -

--

-- -}}}} C

x1

x2

+++

+

+

+

+

+

+

+

+

-

-- -

--

-- -

��� ������±®

s

sins

WWW

xyfytW

∆+←−←∆ )(')(α

sW∆

α

sx

t

���� s �ëì&'����

äå� (��ûç, Wz��$�ï)

%��

!Fáâ�

���� s

=

=

=

=n

i

iiin

n

i

ii

xwy

xwfy

1

1. . . "#

%&

n'â 1 áâ

( )inyfy =

∑=

=n

i

iiin xwy1

f )*(`TAï

°)*+

2)( ytE −=

( )xyfyt in′−−= )(2

W

ytyt

W

E

∂−∂−=

∂∂ )(

)(2

W

y

W

yt

∂∂−=

∂−∂ )(

( )W

xw

yf

n

i

ii

in ∂

∂′−=

∑=1

( )W

yf in

∂∂−=

( ) ∑=

==n

i

iiinin xwyyfy1

,

( )W

y

y

yf in

in

in

∂∂

∂∂−=

( )xyf in′−= Óx ,-./012Ø

3B45678

( )xyfytW in′−=∆ )(α

��ãUV:2O � O íP=Q�Ï�

( )xyfytW

Ein

′−−=∂∂

)(2

∂∂−+←W

EWW α Uó

WWW ∆+← Uó

B9:Ù

• -�;("�����F<î=`Z��ê���&'��é_`G��Y• ÿ\ÿ�-�"��\ÿ[Y• T�TZ�vwê[]�"�=`Z������>?�vwé_`T[ê_@FT[

• T�TZ��ì���� x1AëìvwéAB�C�#D�E����� x2 �ëìvwFHIÿGÿ;[�#DêÿG�>?vw"HIÿGÿ;(`cF�ì\Z(�ì

• (�ì\Z�2)( ytE −= �9 J ∑ −=

s

ss ytE2)(

��ãK= ðÏ� batch mode íK� vs. online mode��

( )∑ ′−=∆s

ssinss xyfytW,

)(αWWW ∆+← Uóý?Ê:

LNMNOÞß

• PQÍR?Küý?ÊðFS:OTU�� J:àáVWXø�YôZâ[Í:\ ] � O^0 í ã V:

Í>ìÊ: 9 _`��a�í ãðíTbcÏã

• �9:batch mode F�TKKFUd• ñ9ý�ef�9 Kü• 9:online mode F�T, ghÍ, >ôacK E �iãðíTjk�lVÏÊKãü

( )∑ ′−=∆s

ssinss xyfytW,

)(αWWW ∆+← Uó

∑ −=s

ss ytE2)(

Page 7: Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

7

��ÆÇ�����B�

jsjj

sjinjjjs

WWW

xyfytW

,

,, )(')(

∆+←−←∆ α

jsW ,∆

α

sx

tj

���� s �ëìFj��UV��&'����

äå� (��ûç, Wz��$�ï)

Fj��UV�%��

!Fáâ�

���� s

=

=

=

=

n

i

ijijin

n

i

ijij

xwy

xwfy

1

,,

1

,

.

. . "#%&

n'â m áâ

( )jinj yfy ,=

∑=

=n

i

ijijin xwy1

,,

f )*(`TAï . . .

3B45678

( )xyfytW jinjjj ,)( ′−=∆ α

��ãUV:2O � O íP=Q�Ï�

( )xyfytW

Ejinjj

j

,)(2 ′−−=∂∂

∂∂−+←

j

jjW

EWW α Uó

jjj WWW ∆+← Uó

��������?

• ��j��(*²��, q�l/��07w05• perceptron �e3�yz��3,�07251·¤²*��,��1m�gd¤¢¡·£³(Å1¤�*m�gd¤� §|�·£³(Å1ô

• �F�·¤�j��gd¤� §|�·´0ñ��/0/0´0�ô– ������� �Í"

• ¹{x¤��!�����(dñhl/¤§|��(dô¢¡·Ïc(½1·¤��*�j��g�61¢¡dÏc(½´0¤�´��¤��¢¡*�j��(*�x x2·25/´0ô

• F*wqg¤Å1!��(*".�#$·25lm�½¤�*��%&',�)*+g *wqg2��,½5�0q-.d¤0�0�´�/(�0�1ô

• F*-.d¤credit assignment *-.�0�01ô• �j¦·Å1NN(¤credit assignment -.·�0.m*(Å1

2134256789:;credit titleÔcredit<=>

( )xyfytW jinjjj ,)( ′−=∆ α

Credit Assignment ?@

.

. .. . .

n'â m A[

"#%&

"#%&

p áâ

.

. .

• -����ìB>?(�CDEGHF*\s�ê����IJKLMéNDOPQ��*Rë��\ê[MN"�[S[ST�I(TUëìY

• -�MN"�credit assignment �MNê[V�ìY• QRSF�ìNN(�credit assignment MNFTUÿ��(�ì

ÁÂÃÄÅÆÇÛWXY

• ÚêÚ²vwx¯©yzæ��²ØÙZ[�\]^_`âa�b¹�c²d��ß���²���e�åfgh� credit assignment ijcãk��

• �� !²­lmno�±�p�q��²

�ܱ²��å²�� �ârs��åæÝ∑ −=

s

ss ytE2)(

∑ =−= p

k kk ytE1

2][

2)( ytE −=tu9,t � §|y �§|�T?:

vw"� p��é_`ìY�Xÿ�����ï"1îêÿG�z:

.

. .. . .

n'â m A[

"#%&

"#%&

p áâ

.

. .

IxJz

KyIJv JKw…

……

Page 8: Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

8

IJ

Kin

Kin

IJ

Kin

Kin

Kin

IJ

Kin

IJ

K

v

yyf

v

y

y

yf

v

yf

v

y

∂∂

=

∂∂

∂∂

=

∂∂

=∂∂

,

,

,

,

,

,

)('

)(

)(

∑ =

∂∂

′−−= p

kIJ

kin

kinkkv

yyfyt

1

,

, )(][2

∑ =

∂∂−−=

∂∂ p

kIJ

kkk

IJ v

yyt

v

E1

][2

∑ =−= p

k kk ytE1

2][

/��e�Íâã�É, )('][ ,kinkkk yfyt −=δ

∑ =

∂∂

−= p

kIJ

kin

kv

y

1

,2 δ

(F0dF����1)

IxJz

KyIJv JKw…

……

E = [tk

− yk]2

k=1

p

∑)('][ ,kinkkk yfyt −=δ∑ =

∂∂

−=∂∂ p

kIJ

kin

k

IJ v

y

v

E1

,2 δ

IJ

JJK

IJ

m

j jKj

IJ

Kin

v

zw

v

wz

v

y

∂∂=

∂=

∂∂ ∑ =1,

Ijin

IJ

n

i iJi

jin

IJ

Jin

jin

IJ

jin

IJ

J

xzf

v

vxzf

v

zzf

v

zf

v

z

)('

)('

)('

)(

,

1,

,

,

,

=∂

∂=

∂∂

=

∂∂

=∂∂

∑ =Ix

Jz

KyIJv JKw…

……

E = [tk

− yk]2

k=1

p

])[(' , kkkink ytyf −=δ

{ }{ }

( )( )( )( )∑∑∑

=

=

=

=

=

−=

−=

−=

∂∂−=

∂∂

−=∂∂

p

k kJkJinI

IJin

p

k Jkk

p

k IJinJkk

p

kIJ

JJkk

p

kIJ

kin

k

IJ

wzfx

xzfw

xzfw

v

zw

v

y

v

E

1,

,1

1 ,

1

1

,

)('2

)('2

)('2

2

2

δ

δ

δ

δ

δ

IxJz

KyIJv JKw…

……

])[(' , KKKinKytyf −=δ

IJ

IJv

Ev

∂∂−=∆ α

IxJz

KyIJv JKw…

……

( )( )( )( )∑ =−=

∂∂ p

k kJkJinI

IJ

wzfxv

E1, )('2 δ

( )( )( )∑ ==∆ p

k kJkJinIIJwzfxv

1, )(' δα

2αéαêm��`ìê

])[(' , kkkink ytyf −=δ

Ix)(' ,Jinzf

)(' ,kinyfIJv JKw…

……

kk yt −

])[(' , kkkink ytyf −=δ

IJIJIJ vvv ∆+←���Z:

( )( )( )∑ ==∆ p

k kJkJinIIJ wzfxv1, )(' δα

Ix)(' ,Jinzf

)(' ,kinyfIJv JK

w…

……

kk yt −

])[(' , kkkink ytyf −=δ

)(' ,kinyfKKKO yt −=,δ

∑ == p

k kJkJinJ wzf1, )(' δδ

KOkinK yf ,, )(' δδ =

)(' ,JinzfJK

w…

……

KδIy

IJvJδ

JIIJ yv δα=∆

)(' ,kinyfKKKO yt −←,δ

∑ =← p

k kJkJinJ wzf1, )(' δδ

KOkinK yf ,, )(' δδ ←

)(' ,JinzfJK

w…

……

KδIz

IJvJδJIIJ zv δα=∆

( )JJinJin

n

i i zzfzz →→∑ = ,,1,

Jz)( ,Jinzf

JKw…

……

Iz

forward propagation

backword propagation

δ Fvw��IJ(?)*(�ì

z Fb�UV���(�ì

Page 9: Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

9

H��kl����À�����H��kl����À�����H��kl����À�����H��kl����À�����

��S

xi

x1

x2

xn

1

2

i

n

��S

1

2

k

p

yk

y1

y2

yp

í��ð��

���

wjk

7�S

wij

1

2

j

m

∑ =−= p

k kk ytE1

2][

.

. .. . .

n'â m A[

"#%&

"#%&

p áâ

.

. .ky

ix

1x1y

nx py

p"%

kt

1t

pt

IxJz

KyIJv JKw…

……

3�× ����ÝÞ

∑=

=n

i

iijjin xvz1

,

( )jinj zfz ,=

IxJz

KyIJv JKw

……

∑=

↑n

i

iij

jin

xv

z

1

,

( )jin

j

zf

z

,

↑f

Jz

3�× ����ÝÞ

∑=

=m

j

jjkkin zwy1

,

( )kink yfy ,=

Ix IJv JKw

……

Jz

∑=

↑n

i

jjk

kin

zw

y

1

,

( )kin

k

yf

y

,

↑f

Jz

∑=

↑n

i

iij

jin

xv

z

1

,

( )jin

j

zf

z

,

↑f Ky

Jz

��× 1���ÝÞ

z

JoutJin

z

Jzf ,, )(' δδ =

∑ == p

k

y

kJk

z

Jout w1, δδ

òGòG �!"ËÌ�0º

x

Iδ y

KδIJv JKw

……

Jzf ′

z

Jout

p

k

y

kJkw

,

1

δ

δ

∑=

z

J

z

Jout

Jinzf

δ

δ↓

,

, )('

z

Jδ��× 1���ÝÞ

E = [tk − yk]2

k=1

p

y

koutkin

y

k yf ,, )(' δδ =kk

y

koutyt −=,δ

òGòG �!"ËÌ�0º

IJv JKw

……

Jzf ′

z

Jout

p

k

y

kJkw

,

1

δ

δ

∑=

z

J

z

Jout

Jinzf

δ

δ↓

,

, )('

Jzf ′ y

kout

kk yt

,δ↓−

y

k

y

kout

kinyf

δ

δ↓

,

, )(

y

z

x

Page 10: Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

10

�L��ÝÞ

z

JIIJ xv δα=∆

IJIJIJ vvv ∆+←

y

KJJK zw δα=∆

JKJKJK www ∆+←

IJv JKw

……

Jzf ′

z

Jout

p

k

y

kJkw

,

1

δ

δ

∑=

z

J

z

Jout

Jinzf

δ

δ↓

,

, )('

Jzf ′ y

kout

kk yt

,δ↓−

y

k

y

kout

kinyf

δ

δ↓

,

, )(

y

z

x

Ix IJv JKw

……

Jz

∑=

↑n

i

jjk

kin

zw

y

1

,

( )kin

k

yf

y

,

↑f

Jz

∑=

↑n

i

iij

jin

xv

z

1

,

( )jin

j

zf

z

,

↑f Ky

Jz

)(' ,kinyfKKKO yt −←,δ

∑ =← p

k kJkJinJ wzf1, )(' δδ

KOkinK yf ,, )(' δδ ←

)(' ,JinzfJK

w…

……

KδIz

IJvJδJIIJ zv δα=∆

( )JJinJin

n

i i zzfzz →→∑ = ,,1,

Jz)( ,Jinzf

JKw…

……I

z

forward propagation

backword propagation

δ Fvw��IJ(?)*(�ì

z Fb�UV���(�ì

îíÉÐ 1�V���UV

GB�1: ������

n ¥¦)�À�@6n ��­¡mâ ���n Þ��� ¡¢���_ân � _`â�������c²

n FEâ��^��WâG"�åæÝ

( ) ∑=

−=N

t

tt yxWFN

WE1

2)),((1

( ){ }Ntyx tt ≤≤1, |),( xWFy =

GB�2: 1�������

n ØÙÚÛ0�ÜÝÞßàáâãäåæÝçèéêën îï9:;<�= KðíÍ9ñ?Í Vòn õö÷øùúû�9TÉü

n 8�� �ßàá���²��²ãä�Ýn ��ã�¬�Ú��²ãâ��ÚÛÝ�ß�´â������ !²��� âG"�ß�â���

( ) ( ) ( ) L>>> 321 WEWEWE

L321 ,, WWW

GB�3: ������

n ���ß������ÛÝ��n ����µ��²�� �

n �����Éã V:����� �îô�Î K�ü

-2-1

01

23w0

-3 -2 -1 0 1 2w1

0

5

10

E>w�

-2-1

01

23w0

-3 -2 -1 0 1 2w1

������í��PíF â� Í!"L

e�4: ������±���

n '(À)*µ+,�Õ-.¹|/n 01µy234567��89:n ¿��y

���¾ηµ�<1�4=¼?@r¹�¹tr¹A�

j

i

j

i

newj

i

j

i

j

i

www

Ww

Ew

∆+=

∂∂⋅−=∆

,

)(η

Page 11: Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

11

1�����±LM

n ���´�ó5(*NOd��õ0´0n ��: @ABCDEF9PQ (�4à h ∈ H �, >� h ∈ H Ã_vwH; i.e., 23ÆÇÈÉ)

n :_F�4l ():qGI�4lÄ;ÃOc) }$����Àn backprop (BP) �YÍ_ �(O��KÃ�)

• ��� ( ��_`jk�Zd�_): ��F�ul;:�WCÍ_O�• MWnew ← MW + αMWnew ; ����α�1�������

• & I�!"H stochastic gradient descent : Fl�#)_& L$H• 0�79VWD�FÃ_��5Ä��; c)À%�Í_

n '(A=')XA=VWDXAY9 �• ANNs 9W<p?@ Bayesian learning (e.g., simulated annealing)

n NO*3n 0�$��5, i.e., 23�$�VWDXAYO\�r, +,�-2GVWDXAY}: �ly

n �!��NO./n C;DA9l2: 3456v natural gradient [Amari, 1998]

��Ò�7Ó889Ý:M

-2 -1 0 1 2-2

-1

0

1

2

-2 -1 0 1 2-2

-1

0

1

2

-2 -1 0 1 2-2

-1

0

1

2

;<1� vs. w11,1 B w2

1,1

-50

510

15

-5

0

5

10

15

0

5

10

-5 0 5 10 15-5

0

5

10

15

w11,1w2

1,1

w11,1

w21,1

;<1� vs. w11,1 B b1

1

w11,1

b11

-10

0

10

20

30 -30-20

-100

1020

0

0.5

1

1.5

2

2.5

b11w1

1,1-10 0 10 20 30-25

-15

-5

5

15

;<1� vs. b11 B b1

2

-10-5

05

10

-10

-5

0

5

10

0

0.7

1.4

-10 -5 0 5 10-10

-5

0

5

10

b11

b21

b21b1

1

=>?@�;

-5 0 5 10 15-5

0

5

10

15

w11,1

w21,1

Page 12: Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

12

��V����Ü�Ý�Ì

-5 0 5 10 15-5

0

5

10

15

w11,1

w21,1

BP�Ù��Þß

n ê�(]T[Fn "���^" ((�ì-êFV\sGÿ;`�����Z^���[í���Tð]�"�[zZ(]�ìY

n ��T�^F�YZ��n ��T�;z"[z�XF�%é�ìQ("T[

n "g-.´*d��©jn 9�0� ��|W|∗|W| (|W|���«��) «��«���«��µËÌñ!"¾¤n ���%���«3���ññÓ²�3���!" µ!º¤$"

n �0g��q�6*¤*#$�%&5'($·./0´0n Neural Networks dt�´R*wq�·¤))�*·+0ô"gi"$ùn·Ål/¤NO,,¸.mx¤�-·"$g´lmx�1

n .Fý�K�Y�Q(�/�]�"�Levenberg-Marquardt ̂

Timothy Masters, Advanced Algorithms for Neural Networks: A

C++ Sourcebook, John Wiley & Sons (1995).

Levenberg-Marquardt �

-5 0 5 10 15-5

0

5

10

15

w11,1

w21,1

Feedforward ANNs: 012s4�56

n »¼|n 7K[89feedforward ANN

• :C9 Boolean functionLmhÄ�_(“Ä�_` J\;3y. AND-OR VWDXAY�*�)

• :C9 l�;<�� bounded continuous function (:C=>Ä$�) [Funahashi, 1989;

Cybenko, 1989; Hornik et al, 1989]

n OPQ<=���ÄÃÀ\����: ?@�� basis functions; �JK�FIÃHÄ��$�n ANNs L$�ABÃ��: Network Efficiently Representable Functions (NERFs) -CD�i;Ä���Ã� [Russell and Norvig, 1995]

n ANNs *EGyIRzn n-8>JAYoW=Z[ (����9Z[ weight space)

n ;<�� (��@;KAU����;<)

n 8L�<4:: ST��9 AM\OÃmN`n �À;ÆO���Ã�

ANNs ±O)*

n �e: *�e*Ïcn h’ ; h \����, worse on Dtrain, better on Dtest

n *�e: Å11n PaQ�RSn ST: UVvw

(cross-validation: holdout, k-fold)

n STW: weight decay

Error versus epochs (Example 1)Error versus epochs (Example 2)

ANNs ±O)*

n *�e*��/01�*XYn ZI[CÍ_7K67�97�n dÃÍS_\, tÆ�?@Ä�Ã� (“underfitting”)

• \]r^• ;_: ;`9abÄBb�NNQ5X�97��3c>��a���*9���97�LZ�

n ZÍS_\R?@• deaNK��Ã�• ;_: 28Z�b��af89Z�bÄ$�Í_

n 5n Zg: h�EÆV�8L attribute subset selection (pre-filter )�; wrapper)

n ST• cross-validation (CV)

• Weight decay: ijWYk\����eC5Ä�pY5��cdN _n ��/SÌ: random restarts: �5�;FGq�O[�, ��l679 addition and deletion

n mnopqrstuvwxyz{u|}~��qr����u������S. Amari, N. Murata, K.-R. Muller, M Finke and H. H. Yang, Asymptotic Statistical Theory of Overtraining and Cross-Validation, IEEE

Transactions on Neural Networks, Vol. 8, No. 5, pp. 985-996, 1997.

Page 13: Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

13

�±1���

n �= �����W��

n ��F6=�àá��

( ) ( ) ∑∑ ∑ +−≡∈ ∈ ji

ij

Dd outputsk

dkdk wotwE,

2

,

2

,, γr

( ) ( )∑ ∑ ∑∈ ∈ ∈

∂∂

−∂∂

+−≡Dd outputsk inputsj

j

d

dk

j

d

dk

dkdkx

o

x

totwE

2

,,2

,, µr

F±�k±����,

n " ,+lmvw�"#n Neuroids [Valiant, 1994]

n zKFK9?M67L�����n zKFK9_`jk;FÃ����� ()�; ���?�ÀFÃ��� )

n 44IÃVWDXAYQ5X• ;FGqP;'9G�• ?M67;?@Ra9eE\��CD���_

n �zz%ª�y�#�n :@<Y���AEF spiking neurons [Maass and Schmitt, 1997]

n ��>L��9�hÄ;Ã�n �$9�[9�9�KLCD���

n ��[�A5(FP temporal coding Ä; rate coding L��\KBzK;��>Ä�hÈÉ

n �.0��3n -= I_` [Stein and Meredith, 1993; Seguin, 1998]

n :@<Y���AEF�Q5X spiking neuron model

ESN: MN��� NN

!P9#J�>K%&

ðF%&TJ'R P�(70¦5/§|¦(*)�*i

%&)�%&)�%&)�%&)�70¦�§|¦70¦�§|¦70¦�§|¦70¦�§|¦H�=�*�H�*�H�*�H�*�H�#*UgÎÏ.¤¨�+Ï

))()()1(()1( nyWnWxnuWfnxbackin +++=+

®,õv�!*§|®,õv�!*§|®,õv�!*§|®,õv�!*§|

Jaeger H. and Haas, H. Harnessing nonlinearity: Predicting chaotic systems and

saving energy in wireless communication. Science, 304:78-80, 2004.

ESN: Echo State Network

ESN���;

)5/(sin2/1)(7

nny teach =)5/sin()( nnu =���������

���������-10.�R/(��K01æçèé2D

35y��8| 95:;y���*MSE,��

<5MSE,�=>�1)�?²,��%&)�%&)�%&)�%&)�70¦®,70¦®,70¦®,70¦®,H0 +0.4 -0.4g0.95 0.025 0.025*@$(ÎÏ8|¦�70¦8|¦�70¦8|¦�70¦8|¦�70¦H1 -1gA@$(ÎÏB��÷y��)�B��÷y��)�B��÷y��)�B��÷y��)�H 1 -1gA@$(ÎÏ

ESN���;: The Mackey Glass system

)())(1/()()( tytytyty γττα β −−+−=&

1.0,10,2.0 === γβα17=τ

n Chaotic attractor�����dynamical system

��Þ"CD�.EIJK¡L�MÚÝ

30=τ

NOPQRSTUVWXYZ[\

NMild]^_ NWild]^_

))(1.0)/(1

)/(2.0()()1(

10ny

ny

nynyny −

−+−+=+

δτδτδ

`abcabcabcabc

defghijk

17=τ 30=τNMildlmn NWildlmn

)))()()()1((()()1(

)1(

nnyWnWxnuWfCnxCa

nx

backin νδδ +++++−=+

opqrstuvwopqrstuvwopqrstuvwopqrstuvwxyz{|}

Page 14: Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

14

de��(WildF��) 30=τ

G����G����G����G���� 21000step�������� 3000step���������� ��������� ��������� ��������� �������

de��(MildF��) 17=τ

G����G����G����G���� 21000step�������� 3000step��������

` �������� �������� �������� ��������

/�� ������fg �

n !"#$%&'()*+,-.012345678,9:;<=>?,@<A

n BCBCD<?E-HIJK78L@34M;<

n N%&OP*QPRSTUVWXJ<?BayesTUY1>?,Z>M&E[\]^_K;1`a&E-HbcK,d`ef<>?,gh<

���ij �

n >e%TULk<2l

mnopq rpqopqsttttttt u sttttttttstttttt u tsttttttttsttttt u ttsttttttttstttt u tttsttttttttsttt u ttttsttttttttstt u tttttsttttttttst u ttttttstttttttts u ttttttts

���ij �vwx

n TUyz

{|}~� �~�}~��������� � ��� ��� ��� � ���������������� � ��� ��� ��� � ���������������� � ��� ��� ��� � ���������������� � ��� ��� ��� � ���������������� � ��� ��� ��� � ���������������� � ��� ��� ��� � ���������������� � ��� ��� ��� � ���������������� � ��� ��� ��� � ��������

���ij �v�x

n TU]�d]��

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0 500 1000 1500 2000 2500

Sum of squared errors for each output unit

Page 15: Perceptron - Keio University...Perceptron: " tuvz# yw *x n {| 234567 Linear Threshold Unit (LTU) or Linear Threshold Gate (LTG) n 67}9~ net input: 23 n 679 : ~ 45 threshold function

15

���ij �v7x

n TU]�d]��=�A

���ij �v�x

n TU]�d]��=�A

-5

-4

-3

-2

-1

0

1

2

3

4

0 500 1000 1500 2000 2500

Weights from inputs to one hidden unit

��k: 8��jde

n ���]�

... ...

left strt right up

30 x 32

8��jde

n TU]�

... ...

left strt right up

30 x 32