Index matrix interpretation of the Multilayer Perceptron
Krassimir Atanassov
Institute of Biophysics and Biomedical Engineering, Bulgarian Academy of Sciences, 105 "Acad. G. Bonchev" Str., Sofia 1113, Bulgaria, and Asen Zlatarov University – Burgas, 1 "Prof. Yakimov" Blvd, Burgas 8010, Bulgaria
e-mail: [email protected]

Sotir Sotirov
Asen Zlatarov University – Burgas, 1 "Prof. Yakimov" Blvd, Burgas 8010, Bulgaria
e-mail: [email protected]
Abstract—Neural networks are a mathematical model for solving problems that is inspired by the structure of the human brain. One of the most widely used kinds of neural networks, the multilayer perceptron (MLP), has been modelled with various tools. Here, starting with the MLP, we approach the problem by modelling neural networks in terms of index matrices (IMs). The work includes IM interpretations of the building components of the neural network, namely the input vector, weight coefficients, transfer functions, and biases, as well as the various operations defined over them.
Keywords—index matrix; modelling; neural network.
I. INTRODUCTION

Artificial neural networks are a mathematical model inspired by biological neural networks [3, 6]; their functions are borrowed from those of the human brain. There is as yet no uniform opinion on the definition of neural networks, but increasingly many specialists share the view that a neural network is a number of simple connected items, each featuring a rather limited local memory. These items are joined by connections that transfer numerical data, coded with various tools.
Figure 1 shows, in abbreviated notation, a classical three-layered neural network.
Fig. 1. Block diagram of the feedforward neural network
In multilayer networks, the outputs of one layer become the inputs of the next one. The equations describing this operation are:
a^{m+1} = f^{m+1}(W^{m+1} \cdot a^m + b^{m+1})

for m = 0, 1, 2, ..., M − 1, where:

• m is the current number of the layer in the network;

• M is the number of layers in the network;

• P is the input vector of the network;

• a^m is the output of the m-th layer of the neural network;

• s^m is the number of neurons in the m-th layer of the neural network;

• W^m is the matrix of the weight coefficients of all inputs of the m-th layer;

• b^m is the vector of the neurons' input biases;

• f^m is the transfer function of the m-th layer.
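As a sketch, the layer recursion above can be written directly in code. The function name and the NumPy array representation are illustrative, not part of the paper:

```python
import numpy as np

def forward(p, weights, biases, transfers):
    """Iterate a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}), starting from a^0 = p.

    weights[m] holds W^{m+1}, biases[m] holds b^{m+1},
    transfers[m] holds the transfer function f^{m+1}.
    """
    a = p
    for W, b, f in zip(weights, biases, transfers):
        a = f(W @ a + b)
    return a
```

With M = 1, a single 1×1 weight matrix, and the identity transfer function, the recursion reduces to a scalar affine map.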
II. SHORT REMARKS ON INDEX MATRICES

Let I be a fixed set of indices and let R be the set of the real numbers. An index matrix (IM) with index sets K and L (K, L ⊂ I) is defined (see [1, 2]) by
A = [K, L, \{a_{k_i,l_j}\}] \equiv
\begin{array}{c|cccc}
 & l_1 & l_2 & \cdots & l_n \\ \hline
k_1 & a_{k_1,l_1} & a_{k_1,l_2} & \cdots & a_{k_1,l_n} \\
k_2 & a_{k_2,l_1} & a_{k_2,l_2} & \cdots & a_{k_2,l_n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
k_m & a_{k_m,l_1} & a_{k_m,l_2} & \cdots & a_{k_m,l_n}
\end{array}

where K = {k_1, k_2, ..., k_m}, L = {l_1, l_2, ..., l_n}, and a_{k_i,l_j} ∈ R for 1 ≤ i ≤ m, 1 ≤ j ≤ n.
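A minimal data-structure sketch of an index matrix, assuming entries are stored in a dictionary keyed by (row index, column index) pairs; the class name and attributes are illustrative, not from the paper:

```python
class IndexMatrix:
    """An IM [K, L, {a_{k,l}}]: real entries keyed by (k, l) index pairs."""

    def __init__(self, entries):
        self.entries = dict(entries)            # {(k, l): a_{k,l}}
        self.K = {k for k, _ in self.entries}   # row index set
        self.L = {l for _, l in self.entries}   # column index set

    def __getitem__(self, kl):
        # absent index pairs read as 0.0
        return self.entries.get(kl, 0.0)
```

Unlike an ordinary matrix, rows and columns are addressed by symbolic indices rather than by position, which is what the operations below exploit.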
Let A = [K, L, \{a_{k_i,l_j}\}] and B = [P, Q, \{b_{p_r,q_s}\}] be two IMs. For them, we define the following operations:
a) addition
A ⊕ B = [K ∪ P, L ∪ Q, \{c_{t_u,v_w}\}], where
The authors acknowledge the financial support provided by the Bulgarian National Science Fund under Grant DID-02-29 "Modelling of Processes with Fixed Development Rules".
978-1-4799-0661-1/13/$31.00 ©2013 IEEE
c_{t_u,v_w} =
\begin{cases}
a_{k_i,l_j}, & \text{if } t_u = k_i \in K \text{ and } v_w = l_j \in L - Q, \\
 & \text{or } t_u = k_i \in K - P \text{ and } v_w = l_j \in L; \\
b_{p_r,q_s}, & \text{if } t_u = p_r \in P \text{ and } v_w = q_s \in Q - L, \\
 & \text{or } t_u = p_r \in P - K \text{ and } v_w = q_s \in Q; \\
a_{k_i,l_j} + b_{p_r,q_s}, & \text{if } t_u = k_i = p_r \in K \cap P \text{ and } v_w = l_j = q_s \in L \cap Q; \\
0, & \text{otherwise.}
\end{cases}
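The addition above can be sketched in a few lines, assuming IMs are stored as dicts keyed by (row, column) index pairs; `im_add` is an illustrative name:

```python
def im_add(A, B):
    """A ⊕ B over K∪P, L∪Q: entries at shared index pairs are summed,
    all other entries are carried over unchanged (the 0 case of the
    definition never needs storing explicitly)."""
    C = dict(A)
    for key, value in B.items():
        C[key] = C.get(key, 0.0) + value
    return C
```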
b) termwise multiplication

A ⊗ B = [K ∩ P, L ∩ Q, \{c_{t_u,v_w}\}], where c_{t_u,v_w} = a_{k_i,l_j} \cdot b_{p_r,q_s}

for t_u = k_i = p_r ∈ K ∩ P and v_w = l_j = q_s ∈ L ∩ Q;
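Termwise multiplication keeps only the index pairs common to both IMs; a dict-based sketch under the same storage assumption (illustrative name):

```python
def im_termwise_mul(A, B):
    """A ⊗ B over K∩P, L∩Q: products of entries at shared index pairs."""
    return {key: A[key] * B[key] for key in A.keys() & B.keys()}
```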
c) multiplication

A ☼ B = [K ∪ (P − L), Q ∪ (L − P), \{c_{t_u,v_w}\}], where

c_{t_u,v_w} =
\begin{cases}
a_{k_i,l_j}, & \text{if } t_u = k_i \in K \text{ and } v_w = l_j \in L - P; \\
b_{p_r,q_s}, & \text{if } t_u = p_r \in P - L \text{ and } v_w = q_s \in Q; \\
\sum\limits_{l_j = p_r \in L \cap P} a_{k_i,l_j} \cdot b_{p_r,q_s}, & \text{if } t_u = k_i \in K \text{ and } v_w = q_s \in Q; \\
0, & \text{otherwise.}
\end{cases}
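The multiplication ☼ contracts columns of A against rows of B wherever the index names coincide, while unmatched entries pass through unchanged. A dict-based sketch under the same storage assumption (illustrative name):

```python
def im_mul(A, B):
    """A ☼ B: where a column index of A names a row index of B, the
    entries are contracted (summed products); unmatched entries of A
    and of B are carried over unchanged."""
    rows_B = {p for p, _ in B}
    cols_A = {l for _, l in A}
    C = {}
    for (k, l), a in A.items():
        if l in rows_B:
            # contracted part: sum over l_j = p_r in L ∩ P
            for (p, q), b in B.items():
                if p == l:
                    C[(k, q)] = C.get((k, q), 0.0) + a * b
        else:
            C[(k, l)] = C.get((k, l), 0.0) + a
    for (p, q), b in B.items():
        if p not in cols_A:
            C[(p, q)] = C.get((p, q), 0.0) + b
    return C
```

When A's column indices exactly match B's row indices, this reduces to ordinary matrix multiplication.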
d) structural subtraction

A ⊖ B = [K − P, L − Q, \{c_{t_u,v_w}\}], where "−" is the set-theoretic difference operation and c_{t_u,v_w} = a_{k_i,l_j}, for t_u = k_i ∈ K − P and v_w = l_j ∈ L − Q.

e) multiplication by a constant

α · A = [K, L, \{α \cdot a_{k_i,l_j}\}], where α is a constant.

f) termwise subtraction

A − B = A ⊕ (−1) · B.
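Operations d)–f) are straightforward under the same dict representation; the function names are illustrative:

```python
def im_struct_sub(A, B):
    """A ⊖ B: keep entries of A whose row index lies outside B's row
    index set P and whose column index lies outside B's column set Q."""
    P = {p for p, _ in B}
    Q = {q for _, q in B}
    return {(k, l): v for (k, l), v in A.items() if k not in P and l not in Q}

def im_scale(alpha, A):
    """alpha * A: multiply every entry by the constant alpha."""
    return {key: alpha * v for key, v in A.items()}

def im_sub(A, B):
    """A - B = A ⊕ (-1) * B: addition with the negated second operand."""
    C = dict(A)
    for key, value in im_scale(-1.0, B).items():
        C[key] = C.get(key, 0.0) + value
    return C
```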
III. MAIN RESULTS

Let P be an input vector, given in the form of the IM

P =
\begin{array}{c|ccc}
 & p_1 & \cdots & p_R \\ \hline
a_0 & p_{0,1} & \cdots & p_{0,R}
\end{array}
Let the weight coefficients of the connections between the nodes of the input vector and those of the first layer be given by the IM

W^1 =
\begin{array}{c|ccc}
 & a^1_1 & \cdots & a^1_{s_1} \\ \hline
p_1 & W^1_{1,1} & \cdots & W^1_{1,s_1} \\
\vdots & \vdots & \ddots & \vdots \\
p_R & W^1_{R,1} & \cdots & W^1_{R,s_1}
\end{array}
while the biases of the neurons from the first layer are given by the IM

B^1 =
\begin{array}{c|ccc}
 & a^1_1 & \cdots & a^1_{s_1} \\ \hline
a_0 & b^1_1 & \cdots & b^1_{s_1}
\end{array}
Then a^1 is the IM with the values of the neurons in the first layer. It is obtained by the formula

a^1 = (P ☼ W^1) ⊕ B^1 =
\begin{array}{c|ccc}
 & a^1_1 & \cdots & a^1_{s_1} \\ \hline
a_0 & \sum\limits_{k=1}^{R} p_{0,k} W^1_{k,1} + b^1_1 & \cdots & \sum\limits_{k=1}^{R} p_{0,k} W^1_{k,s_1} + b^1_{s_1}
\end{array}
=
\begin{array}{c|ccc}
 & a^1_1 & \cdots & a^1_{s_1} \\ \hline
a_0 & a_{1,1} & \cdots & a_{1,s_1}
\end{array}
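As a quick numerical check, the entrywise formula for a^1 coincides with the familiar matrix form p · W^1 + b^1. The sizes below (R = 3 inputs, s_1 = 2 neurons) and all values are made up for illustration:

```python
import numpy as np

# Illustrative sizes: R = 3 inputs, s1 = 2 first-layer neurons.
p = np.array([1.0, 2.0, 3.0])            # p_{0,1}, ..., p_{0,R}
W1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])              # W^1_{k,j}: k indexes inputs, j neurons
b1 = np.array([0.5, -0.5])               # b^1_1, ..., b^1_{s1}

# a^1_j = sum_k p_{0,k} * W^1_{k,j} + b^1_j, exactly as in the IM formula
a1 = p @ W1 + b1
```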
Let i be a natural number from the set {2, 3, ..., M}. Let the IM of the weight coefficients of the connections between the nodes of the (i − 1)-st and i-th layers be

W^i =
\begin{array}{c|ccc}
 & a^i_1 & \cdots & a^i_{s_i} \\ \hline
a^{i-1}_1 & W^i_{1,1} & \cdots & W^i_{1,s_i} \\
\vdots & \vdots & \ddots & \vdots \\
a^{i-1}_{s_{i-1}} & W^i_{s_{i-1},1} & \cdots & W^i_{s_{i-1},s_i}
\end{array}
and let the biases of the neurons from the i-th layer be given by the IM

B^i =
\begin{array}{c|ccc}
 & a^i_1 & \cdots & a^i_{s_i} \\ \hline
a_0 & b^i_1 & \cdots & b^i_{s_i}
\end{array}
Let us have the IM for the (i − 1)-st layer

a^{i-1} =
\begin{array}{c|ccc}
 & a^{i-1}_1 & \cdots & a^{i-1}_{s_{i-1}} \\ \hline
a_0 & a_{i-1,1} & \cdots & a_{i-1,s_{i-1}}
\end{array}
Then

a^i = (a^{i-1} ☼ W^i) ⊕ B^i =
\begin{array}{c|ccc}
 & a^i_1 & \cdots & a^i_{s_i} \\ \hline
a_0 & \sum\limits_{k=1}^{s_{i-1}} a_{i-1,k} W^i_{k,1} + b^i_1 & \cdots & \sum\limits_{k=1}^{s_{i-1}} a_{i-1,k} W^i_{k,s_i} + b^i_{s_i}
\end{array}
=
\begin{array}{c|ccc}
 & a^i_1 & \cdots & a^i_{s_i} \\ \hline
a_0 & a_{i,1} & \cdots & a_{i,s_i}
\end{array}
and, in particular, for i = M the preceding layer is

a^{M-1} =
\begin{array}{c|ccc}
 & a^{M-1}_1 & \cdots & a^{M-1}_{s_{M-1}} \\ \hline
a_0 & a_{M-1,1} & \cdots & a_{M-1,s_{M-1}}
\end{array}
In [5], the transfer function F : R → R is defined, where R is the set of the real numbers. The above formulas can be interpreted as results of the identity function F, for which F(x) = x for every real number x. Below, we discuss the case when F is a real function different from the identity; for example, in [4] this function is the sigmoid or the hyperbolic tangent function.
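For concreteness, the two transfer functions mentioned can be written out explicitly; these are the standard definitions, and the function names follow common neural-network usage rather than the paper:

```python
import math

def logsig(x):
    """Logistic sigmoid: 1 / (1 + e^{-x}), values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tansig(x):
    """Hyperbolic tangent transfer function, values in (-1, 1)."""
    return math.tanh(x)
```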
Now, for a fixed real function F, we define over the IM

A =
\begin{array}{c|ccc}
 & l_1 & \cdots & l_n \\ \hline
k_1 & a_{1,1} & \cdots & a_{1,n} \\
\vdots & \vdots & \ddots & \vdots \\
k_m & a_{m,1} & \cdots & a_{m,n}
\end{array}

the operator

O_F(A) =
\begin{array}{c|ccc}
 & l_1 & \cdots & l_n \\ \hline
k_1 & F(a_{1,1}) & \cdots & F(a_{1,n}) \\
\vdots & \vdots & \ddots & \vdots \\
k_m & F(a_{m,1}) & \cdots & F(a_{m,n})
\end{array}
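In code, the operator O_F is simply an entrywise map over the stored values, assuming IMs are kept as dicts keyed by (row, column) index pairs (an illustrative representation):

```python
def O(F, A):
    """O_F(A): apply the transfer function F to every entry of the IM A;
    the index sets (the dict keys) are left unchanged."""
    return {key: F(value) for key, value in A.items()}
```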
Now we can describe the neural network in the form

a^1 = O_F((P ☼ W^1) ⊕ B^1), \quad a^i = O_F((a^{i-1} ☼ W^i) ⊕ B^i).

Therefore,

a^{M-1} = O_F((a^{M-2} ☼ W^{M-1}) ⊕ B^{M-1})
= O_F((\dots O_F((O_F((P ☼ W^1) ⊕ B^1) ☼ W^2) ⊕ B^2) \dots ☼ W^{M-1}) ⊕ B^{M-1}).
A more general case is the following: each layer has its own transfer function, i.e., a function F_i is associated with the i-th layer. Then the NN has the IM-representation

a^{M-1} = O_{F_{M-1}}((\dots O_{F_2}((O_{F_1}((P ☼ W^1) ⊕ B^1) ☼ W^2) ⊕ B^2) \dots ☼ W^{M-1}) ⊕ B^{M-1}).
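Putting the pieces together, the IM-representation a^i = O_{F_i}((a^{i-1} ☼ W^i) ⊕ B^i) can be executed end to end. This self-contained sketch assumes the dict-of-index-pairs storage used for illustration throughout; all function names are made up:

```python
def im_add(A, B):
    """A ⊕ B: sum entries at shared index pairs, carry the rest over."""
    C = dict(A)
    for key, value in B.items():
        C[key] = C.get(key, 0.0) + value
    return C

def im_mul(A, B):
    """A ☼ B: contract matching column/row indices, carry the rest over."""
    rows_B = {p for p, _ in B}
    cols_A = {l for _, l in A}
    C = {}
    for (k, l), a in A.items():
        if l in rows_B:
            for (p, q), b in B.items():
                if p == l:
                    C[(k, q)] = C.get((k, q), 0.0) + a * b
        else:
            C[(k, l)] = C.get((k, l), 0.0) + a
    for (p, q), b in B.items():
        if p not in cols_A:
            C[(p, q)] = C.get((p, q), 0.0) + b
    return C

def O(F, A):
    """O_F(A): apply F to every entry, keeping the index sets."""
    return {key: F(value) for key, value in A.items()}

def mlp(P, Ws, Bs, Fs):
    """a^i = O_{F_i}((a^{i-1} ☼ W^i) ⊕ B^i), starting from a^0 = P."""
    a = P
    for W, B, F in zip(Ws, Bs, Fs):
        a = O(F, im_add(im_mul(a, W), B))
    return a
```

With hidden-layer transfer functions such as tanh and the identity on the output layer, this reproduces a conventional MLP forward pass over symbolically indexed weights.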
IV. CONCLUSION

Index matrices are a non-standard instrument for the description of neural networks, yet one that can be successfully applied to model one of the most widely used types of neural networks: the multilayer perceptron. Apart from the classical type, various other types of neural networks have been developed, which divide by the method of learning into two general classes: supervised and unsupervised. In the present work we propose for the first time the approach of interpreting neural networks using index matrices. In a series of works, we plan to apply this approach to the description of some other types of neural networks, from both the supervised and the unsupervised classes.

It is particularly challenging to investigate the various algorithms for neural network learning in terms of index matrices, and this will also be an object of future research.
REFERENCES

[1] Atanassov, K., Generalized index matrices, Comptes rendus de l'Academie bulgare des Sciences, Vol. 40, 1987, No. 11, 15–18.
[2] Atanassov, K., On index matrices, Part 1: Standard cases, Advanced Studies in Contemporary Mathematics, Vol. 20, 2010, No. 2, 291–302.
[3] Cybenko, G., Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems, Vol. 2, 1989, 303–314.
[4] Hagan, M., H. Demuth, M. Beale, Neural Network Design, Boston, MA: PWS Publishing, 1996.
[5] Haykin, S., Neural Networks: A Comprehensive Foundation, NY: Macmillan, 1994.
[6] Rumelhart, D., G. Hinton, R. Williams, Learning representations by back-propagating errors, Nature, Vol. 323, 1986, 533–536.