
2013 IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), Albena, Bulgaria, 19–21 June 2013


Index matrix interpretation of the Multilayer Perceptron

Krassimir Atanassov
Institute of Biophysics and Biomedical Engineering, Bulgarian Academy of Sciences
105 “Acad. G. Bonchev” Str., Sofia 1113, Bulgaria
and Asen Zlatarov University – Burgas
1 “Prof. Yakimov” Blvd, Burgas 8010, Bulgaria
e-mail: [email protected]

Sotir Sotirov
Asen Zlatarov University – Burgas
1 “Prof. Yakimov” Blvd, Burgas 8010, Bulgaria
e-mail: [email protected]

Abstract—Neural networks are a mathematical model for solving problems that uses the structure of the human brain. One of the most widely used kinds of neural networks, the multilayer perceptron (MLP), has been modelled with various tools. Here, starting with the MLP, we approach the problem by modelling neural networks in terms of index matrices (IMs). The work includes IM interpretations of the building components of the neural network, namely the input vector, weight coefficients, transfer functions, and biases, as well as the various operations defined over them.

Keywords—index matrix; modelling; neural network.

I. INTRODUCTION

Artificial neural networks are a mathematical model inspired by biological neural networks [3, 6]. Their functions are borrowed from those of the human brain. There is not yet a uniform opinion on the definition of neural networks, yet increasingly more specialists share the view that a neural network is a collection of simple connected items, each featuring a rather limited local memory. These items are linked by connections that transfer numerical data, coded with various tools.

Figure 1 shows, in abbreviated notation, a classical three-layered neural network.

Fig. 1. Block diagram of the feedforward neural network

In multilayer networks, the outputs of one layer become the inputs of the next one. The equations describing this operation are:

a^{m+1} = F^{m+1}(W^{m+1} · a^m + b^{m+1})

for m = 0, 1, 2, ..., M − 1, where:

• m is the index of the current layer of the network;

• M is the number of layers in the network;

• P is the network's input vector;

• a^m is the output of the m-th layer of the neural network;

• s^m is the number of neurons in the m-th layer of the neural network;

• W^m is the matrix of the weight coefficients of all inputs of the m-th layer;

• b^m is the vector of the biases of the neurons in the m-th layer;

• F^m is the transfer function of the m-th layer.
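The layer recursion above can be sketched as follows. This is a minimal illustration with invented numbers, not code from the paper; it uses the row-vector convention a^{m+1} = F(a^m W + b), with one weight row per input, to match the summation formulas used later in the paper.

```python
import numpy as np

def forward(p, weights, biases, transfers):
    """Propagate input vector p through an MLP.

    weights[m] has shape (s^m, s^{m+1}), biases[m] has shape (s^{m+1},),
    transfers[m] is the transfer function of layer m+1.
    """
    a = p
    for W, b, F in zip(weights, biases, transfers):
        a = F(a @ W + b)  # a^{m+1} = F(a^m W + b)
    return a

identity = lambda x: x
p = np.array([1.0, 2.0])
W1 = np.array([[1.0, 0.0, 1.0],   # 2 inputs, 3 neurons
               [0.0, 1.0, 1.0]])
b1 = np.array([0.5, -0.5, 0.0])
out = forward(p, [W1], [b1], [identity])
# out = [1.5, 1.5, 3.0]
```

With the identity transfer function this is exactly the affine map of one layer; swapping in a sigmoid or tanh gives the nonlinear case discussed later.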

II. SHORT REMARKS ON INDEX MATRICES

Let I be a fixed set of indices and let R be the set of the real numbers. An index matrix (IM) with sets of indices K and L (K, L ⊂ I) is defined (see [1, 2]) by

$$[K, L, \{a_{k_i,l_j}\}] \equiv
\begin{array}{c|cccc}
 & l_1 & l_2 & \cdots & l_n \\ \hline
k_1 & a_{k_1,l_1} & a_{k_1,l_2} & \cdots & a_{k_1,l_n} \\
k_2 & a_{k_2,l_1} & a_{k_2,l_2} & \cdots & a_{k_2,l_n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
k_m & a_{k_m,l_1} & a_{k_m,l_2} & \cdots & a_{k_m,l_n}
\end{array}$$

where K = {k_1, k_2, ..., k_m}, L = {l_1, l_2, ..., l_n}, and a_{k_i,l_j} ∈ R for 1 ≤ i ≤ m, 1 ≤ j ≤ n.
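For concreteness, an index matrix can be represented in code as a mapping from (row index, column index) pairs to real values. This dict-backed representation, including the class name `IM`, is our own illustrative convention, not notation from [1, 2]:

```python
# A hedged sketch: an index matrix stored as (row-index, column-index) -> value.
class IM:
    def __init__(self, entries):
        self.a = dict(entries)            # {(k, l): real value}
        self.K = {k for k, _ in self.a}   # row index set K
        self.L = {l for _, l in self.a}   # column index set L

A = IM({("k1", "l1"): 1.0, ("k1", "l2"): 2.0,
        ("k2", "l1"): 3.0, ("k2", "l2"): 4.0})
# A.K == {"k1", "k2"}, A.L == {"l1", "l2"}
```

Unlike an ordinary matrix, rows and columns are addressed by symbolic indices rather than positions, which is what the operations below exploit.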

Let A = [K, L, {a_{k_i,l_j}}] and B = [P, Q, {b_{p_r,q_s}}] be two IMs. For them, we define the following operations:

a) addition

A ⊕ B = [K ∪ P, L ∪ Q, {c_{t_u,v_w}}], where

The authors acknowledge the financial support provided by the Bulgarian National Science Fund under Grant DID-02-29 “Modelling of Processes with Fixed Development Rules”.

978-1-4799-0661-1/13/$31.00 ©2013 IEEE

$$c_{t_u,v_w} = \begin{cases}
a_{k_i,l_j}, & \text{if } t_u = k_i \in K \text{ and } v_w = l_j \in L - Q, \\
 & \text{or } t_u = k_i \in K - P \text{ and } v_w = l_j \in L; \\
b_{p_r,q_s}, & \text{if } t_u = p_r \in P \text{ and } v_w = q_s \in Q - L, \\
 & \text{or } t_u = p_r \in P - K \text{ and } v_w = q_s \in Q; \\
a_{k_i,l_j} + b_{p_r,q_s}, & \text{if } t_u = k_i = p_r \in K \cap P \text{ and } v_w = l_j = q_s \in L \cap Q; \\
0, & \text{otherwise.}
\end{cases}$$
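With the dict-backed representation sketched earlier (a convention of ours, not of [1, 2]), the addition A ⊕ B reduces to reading absent positions as 0, which reproduces its case analysis:

```python
# Sketch of A ⊕ B over dict-backed index matrices {(row, col): value}.
def im_add(A, B):
    K = {k for k, _ in A} | {p for p, _ in B}   # K ∪ P
    L = {l for _, l in A} | {q for _, q in B}   # L ∪ Q
    # a-only positions keep a, b-only keep b, shared positions sum, rest are 0
    return {(t, v): A.get((t, v), 0.0) + B.get((t, v), 0.0)
            for t in K for v in L}

A = {("k1", "l1"): 1.0, ("k1", "l2"): 2.0}
B = {("k1", "l1"): 10.0, ("k2", "l1"): 5.0}
C = im_add(A, B)
# C[("k1", "l1")] == 11.0; positions outside both index sets become 0
```

Note that the result is indexed by the unions K ∪ P and L ∪ Q, so the two operands need not have the same shape at all.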

b) termwise multiplication

A ⊗ B = [K ∩ P, L ∩ Q, {c_{t_u,v_w}}], where c_{t_u,v_w} = a_{k_i,l_j} · b_{p_r,q_s} for t_u = k_i = p_r ∈ K ∩ P and v_w = l_j = q_s ∈ L ∩ Q;

c) multiplication

A ☼ B = [K ∪ (P − L), Q ∪ (L − P), {c_{t_u,v_w}}], where

$$c_{t_u,v_w} = \begin{cases}
a_{k_i,l_j}, & \text{if } t_u = k_i \in K \text{ and } v_w = l_j \in L - P; \\
b_{p_r,q_s}, & \text{if } t_u = p_r \in P - L \text{ and } v_w = q_s \in Q; \\
\displaystyle\sum_{l_j = p_r \in L \cap P} a_{k_i,l_j} \cdot b_{p_r,q_s}, & \text{if } t_u = k_i \in K \text{ and } v_w = q_s \in Q; \\
0, & \text{otherwise.}
\end{cases}$$
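The operation ☼ can be sketched in the same dict-backed convention. The key point, illustrated below, is that when the column indices of A coincide with the row indices of B, the K × Q block of A ☼ B is the ordinary matrix product:

```python
# Sketch of A ☼ B following the four-case definition above.
def im_mul(A, B):
    K = {k for k, _ in A}; L = {l for _, l in A}
    P = {p for p, _ in B}; Q = {q for _, q in B}
    C = {}
    for t in K | (P - L):
        for v in Q | (L - P):
            if t in K and v in L - P:
                C[(t, v)] = A.get((t, v), 0.0)
            elif t in P - L and v in Q:
                C[(t, v)] = B.get((t, v), 0.0)
            elif t in K and v in Q:
                # sum of products over the shared indices L ∩ P
                C[(t, v)] = sum(A.get((t, s), 0.0) * B.get((s, v), 0.0)
                                for s in L & P)
            else:
                C[(t, v)] = 0.0
    return C

# Columns of A match rows of B, so ☼ acts as matrix multiplication:
A = {("r", "x"): 1.0, ("r", "y"): 2.0}
B = {("x", "c"): 3.0, ("y", "c"): 4.0}
C = im_mul(A, B)
# C == {("r", "c"): 11.0}, i.e. 1*3 + 2*4
```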

d) structural subtraction

A ㊀ B = [K − P, L − Q, {c_{t_u,v_w}}], where “−” is the set-theoretical difference operation and c_{t_u,v_w} = a_{k_i,l_j}, for t_u = k_i ∈ K − P and v_w = l_j ∈ L − Q.

e) multiplication with a constant

α · A = [K, L, {α · a_{k_i,l_j}}], where α is a constant.

f) termwise subtraction

A − B = A ⊕ (−1) · B.

III. MAIN RESULTS

Let P be an input vector in the form of the IM

$$P = \begin{array}{c|cccc}
 & p_1 & p_2 & \cdots & p_R \\ \hline
p_0 & a_{0,1} & a_{0,2} & \cdots & a_{0,R}
\end{array}$$

where R is the number of the network's inputs and a_{0,1}, ..., a_{0,R} are the input values.

Let the weight coefficients of the connections between the nodes of the input vector and these from the first layer be given by the IM

$$W^1 = \begin{array}{c|ccc}
 & a_{1,1} & \cdots & a_{1,s^1} \\ \hline
p_1 & W_{1,1} & \cdots & W_{1,s^1} \\
\vdots & \vdots & \ddots & \vdots \\
p_R & W_{R,1} & \cdots & W_{R,s^1}
\end{array}$$

while the biases of the neurons from the first layer are given by the IM

$$B^1 = \begin{array}{c|ccc}
 & a_{1,1} & \cdots & a_{1,s^1} \\ \hline
p_0 & b_{1,1} & \cdots & b_{1,s^1}
\end{array}$$

Then, a1 is the IM with the values of the neurons in the first layer. It is obtained by the formula

$$a^1 = (P ☼ W^1) ⊕ B^1 =
\begin{array}{c|ccc}
 & a_{1,1} & \cdots & a_{1,s^1} \\ \hline
p_0 & \displaystyle\sum_{k=1}^{R} a_{0,k} W_{k,1} + b_{1,1} & \cdots & \displaystyle\sum_{k=1}^{R} a_{0,k} W_{k,s^1} + b_{1,s^1}
\end{array}
= \begin{array}{c|ccc}
 & a_{1,1} & \cdots & a_{1,s^1} \\ \hline
p_0 & a_{1,1} & \cdots & a_{1,s^1}
\end{array}$$
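As a quick numerical sanity check (with our own illustrative numbers, not data from the paper), the entries Σ_k a_{0,k} W_{k,j} + b_{1,j} of a^1 are exactly the usual affine layer computed on the value arrays:

```python
import numpy as np

a0 = np.array([1.0, 2.0, 3.0])        # input values a_{0,1..R}, R = 3
W1 = np.array([[1.0, 0.0],            # W_{k,j}: R rows, s^1 = 2 columns
               [0.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.1, 0.2])             # biases b_{1,1}, b_{1,2}

a1 = a0 @ W1 + b1                     # entries of the IM a^1 (row p_0)
# a1 = [4.1, 5.2]
```

The IM form carries the same numbers but keeps the symbolic row index p_0 and column indices a_{1,1}, ..., a_{1,s^1} attached to them.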

Let i be a natural number from the set {2, 3, ..., M}. Let the IM of the weight coefficients of the connections between the nodes of the (i − 1)-st and i-th layers be

$$W^i = \begin{array}{c|ccc}
 & a_{i,1} & \cdots & a_{i,s^i} \\ \hline
a_{i-1,1} & W_{1,1} & \cdots & W_{1,s^i} \\
\vdots & \vdots & \ddots & \vdots \\
a_{i-1,s^{i-1}} & W_{s^{i-1},1} & \cdots & W_{s^{i-1},s^i}
\end{array}$$

and let the biases of the neurons from the i-th layer be given by the IM

$$B^i = \begin{array}{c|ccc}
 & a_{i,1} & \cdots & a_{i,s^i} \\ \hline
p_0 & b_{i,1} & \cdots & b_{i,s^i}
\end{array}$$

Let us have the IM for the (i − 1)-st layer

$$a^{i-1} = \begin{array}{c|ccc}
 & a_{i-1,1} & \cdots & a_{i-1,s^{i-1}} \\ \hline
p_0 & a_{i-1,1} & \cdots & a_{i-1,s^{i-1}}
\end{array}$$

Then

$$a^i = (a^{i-1} ☼ W^i) ⊕ B^i =
\begin{array}{c|ccc}
 & a_{i,1} & \cdots & a_{i,s^i} \\ \hline
p_0 & \displaystyle\sum_{k=1}^{s^{i-1}} a_{i-1,k} W_{k,1} + b_{i,1} & \cdots & \displaystyle\sum_{k=1}^{s^{i-1}} a_{i-1,k} W_{k,s^i} + b_{i,s^i}
\end{array}
= \begin{array}{c|ccc}
 & a_{i,1} & \cdots & a_{i,s^i} \\ \hline
p_0 & a_{i,1} & \cdots & a_{i,s^i}
\end{array}$$

and, for i = M, the IM of the preceding (M − 1)-st layer is

$$a^{M-1} = \begin{array}{c|ccc}
 & a_{M-1,1} & \cdots & a_{M-1,s^{M-1}} \\ \hline
p_0 & a_{M-1,1} & \cdots & a_{M-1,s^{M-1}}
\end{array}$$

In [5], the transfer function F : R → R is defined, where R is the set of the real numbers. We see that the above formulas can be interpreted as results of the identity function F, for which F(x) = x for every real number x. Below, we discuss the case when F is a real function different from the identity. For example, in [4] this function is the sigmoid or the hyperbolic tangent function.

Now, for a fixed real function F, we define over the IM

$$A = \begin{array}{c|ccc}
 & l_1 & \cdots & l_n \\ \hline
k_1 & a_{1,1} & \cdots & a_{1,n} \\
\vdots & \vdots & \ddots & \vdots \\
k_m & a_{m,1} & \cdots & a_{m,n}
\end{array}$$

the operator

$$O_F(A) = \begin{array}{c|ccc}
 & l_1 & \cdots & l_n \\ \hline
k_1 & F(a_{1,1}) & \cdots & F(a_{1,n}) \\
\vdots & \vdots & \ddots & \vdots \\
k_m & F(a_{m,1}) & \cdots & F(a_{m,n})
\end{array}$$
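In the dict-backed sketch used earlier (our own convention), the operator O_F simply maps F over the values while leaving the index sets untouched. The sigmoid below is one of the transfer functions mentioned in connection with [4]:

```python
import math

# Sketch of the operator O_F: apply a fixed real function F to every entry.
def O(F, A):
    return {kl: F(x) for kl, x in A.items()}

sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
A = {("p0", "a11"): 0.0, ("p0", "a12"): 1.0}
OA = O(sigmoid, A)
# OA[("p0", "a11")] == 0.5
```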

Now we can describe the neural network in the form

a^1 = O_F((P ☼ W^1) ⊕ B^1),
a^i = O_F((a^{i-1} ☼ W^i) ⊕ B^i).

Therefore,

a^{M-1} = O_F((a^{M-2} ☼ W^{M-1}) ⊕ B^{M-1})
        = O_F((… O_F((O_F((P ☼ W^1) ⊕ B^1) ☼ W^2) ⊕ B^2) … ☼ W^{M-1}) ⊕ B^{M-1}).

A more general case is the following: each layer has its own transfer function, i.e., function F_i is associated with the i-th layer. Therefore, the NN has the IM-representation

a^{M-1} = O_{F_{M-1}}((… O_{F_2}((O_{F_1}((P ☼ W^1) ⊕ B^1) ☼ W^2) ⊕ B^2) … ☼ W^{M-1}) ⊕ B^{M-1}).
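The whole IM representation can be exercised end to end. The sketch below (our own toy data and dict-backed convention; `im_mul` is restricted to the K × Q block, which is the only one arising here) computes two layers as O_F((a ☼ W) ⊕ B) and checks the result against the ordinary numpy forward pass:

```python
import numpy as np

def im_add(A, B):  # A ⊕ B, with absent positions read as 0
    rows = {k for k, _ in A} | {p for p, _ in B}
    cols = {l for _, l in A} | {q for _, q in B}
    return {(t, v): A.get((t, v), 0.0) + B.get((t, v), 0.0)
            for t in rows for v in cols}

def im_mul(A, B):  # A ☼ B, K × Q block: sum of products over L ∩ P
    K = {k for k, _ in A}; L = {l for _, l in A}
    P = {p for p, _ in B}; Q = {q for _, q in B}
    return {(t, v): sum(A.get((t, s), 0.0) * B.get((s, v), 0.0)
                        for s in L & P)
            for t in K for v in Q}

def O(F, A):       # the operator O_F
    return {kl: F(x) for kl, x in A.items()}

F = np.tanh
p = np.array([1.0, -1.0])
W1 = np.array([[0.5, 1.0], [1.0, -0.5]]); b1 = np.array([0.0, 0.1])
W2 = np.array([[1.0], [2.0]]);            b2 = np.array([-0.2])

# IM encodings: row p0, with invented symbolic column labels.
P   = {("p0", "x1"): p[0], ("p0", "x2"): p[1]}
W1i = {("x1", "h1"): W1[0, 0], ("x1", "h2"): W1[0, 1],
       ("x2", "h1"): W1[1, 0], ("x2", "h2"): W1[1, 1]}
B1  = {("p0", "h1"): b1[0], ("p0", "h2"): b1[1]}
W2i = {("h1", "o1"): W2[0, 0], ("h2", "o1"): W2[1, 0]}
B2  = {("p0", "o1"): b2[0]}

a1 = O(F, im_add(im_mul(P, W1i), B1))    # a^1 = O_F((P ☼ W^1) ⊕ B^1)
a2 = O(F, im_add(im_mul(a1, W2i), B2))   # a^2 = O_F((a^1 ☼ W^2) ⊕ B^2)

ref = F(F(p @ W1 + b1) @ W2 + b2)        # standard forward pass
assert abs(a2[("p0", "o1")] - ref[0]) < 1e-12
```

The agreement is by construction: the ☼ sum over shared indices is the matrix product, ⊕ on matching indices is the bias addition, and O_F is the elementwise transfer function.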

IV. CONCLUSION

Index matrices are a non-standard instrument for the description of neural networks, yet one that can be successfully applied to model one of the most widely used types of neural networks: the multilayer perceptron. Apart from the classical type, various other types of neural networks have been developed, which divide by the method of learning into two general classes: supervised and unsupervised. In the present work we propose for the first time the approach of interpreting neural networks using index matrices. In a series of works, we plan to apply this approach to the description of some other types of neural networks, from both the supervised and the unsupervised classes.

It is particularly challenging to investigate the various algorithms for neural network learning in terms of index matrices, and these will also be an object of future research.

REFERENCES

[1] Atanassov, K., Generalized index matrices, Comptes rendus de l'Academie Bulgare des Sciences, Vol. 40, 1987, No. 11, 15–18.
[2] Atanassov, K., On index matrices, Part 1: Standard cases, Advanced Studies in Contemporary Mathematics, Vol. 20, 2010, No. 2, 291–302.
[3] Cybenko, G., Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems, Vol. 2, 1989, 303–314.
[4] Hagan, M., H. Demuth, M. Beale, Neural Network Design, Boston, MA: PWS Publishing, 1996.
[5] Haykin, S., Neural Networks: A Comprehensive Foundation, NY: Macmillan, 1994.
[6] Rumelhart, D., G. Hinton, R. Williams, Learning representations by back-propagating errors, Nature, Vol. 323, 1986, 533–536.