
Pergamon

CONTRIBUTED ARTICLE

0893-6080(94)00080-8

Neural Networks, Vol. 8, No. 4, pp. 579-596, 1995 Copyright © 1995 Elsevier Science Ltd Printed in the USA. All rights reserved

0893-6080/95 $9.50 + .00

Properties of Block Feedback Neural Networks

SIMONE SANTINI AND ALBERTO DEL BIMBO

Dipartimento di Sistemi e Informatica

(Received 27 September 1993; accepted 19 July 1994)

Abstract--In this paper, we discuss some properties of Block Feedback Neural Networks (BFN). In the first part of the paper, we study network structures. We define formally what a structure is, and then show that the set $\mathcal{F}_n$ of n-layer BFN structures can be expressed as the direct sum of the set $\mathcal{A}_n$ of n-layer BFN architectures and the set $\mathcal{D}_n$ of n-layer BFN dimensions. Both $\mathcal{A}_n$ and $\mathcal{D}_n$ are shown to have the structure of a distributive lattice and to induce such a structure in $\mathcal{F}_n$. Moreover, we show that the computing capabilities of BFNs are monotonically nondecreasing with the elements of $\mathcal{A}_n$ ordered according to the lattice structure. In the second part we show that the increase in computing power allows the BFNs to be universal computers, having the same computing power as a Turing machine.

Keywords---Block feedback neural networks, Network structures, Network architectures, Computing capabilities.

1. INTRODUCTION

In this paper, we discuss some properties of the Block Feedback Networks (BFN) model, which we presented in Santini, Del Bimbo, and Jain (1993). BFNs are discrete-time dynamic multilayer perceptrons, built according to a recursive block-based scheme. We used these networks for nonlinear system identification and image sequence prediction (Del Bimbo, Landi, & Santini, 1992, 1993).

Being built according to a recursive scheme, the BFN model enjoys a number of unique properties, some of which we investigate in this paper. Most of these properties stem from the great structural flexibility allowed by the recursive scheme. Intuitively (the term will be formally defined in the paper), the structure of a network is a description of how the neurons are connected, regardless of the weight values and the details of how the neurons work.

Most current network models are based on a well-defined structure, and instances of the models are built by selecting the appropriate number of neurons in various parts of a network. For all these networks, researchers study how several properties depend on the number of neurons, the structure being fixed. A number of structures have been studied for generalization (Schwarze & Hertz, 1992), approximation capabilities (Blum & Li, 1991), and probability estimation from

Requests for reprints should be sent to Simone Santini, Department of Computer Science, UCSD, 9500 Gilman Drive, La Jolla, CA 92093-0116.

finite training sets (Moody, 1992; Murata, Yoshizawa, & Amari, 1992).

On the other hand, if one wants to study the possibilities of entirely arbitrary models, one is overwhelmed by the countless structural possibilities and cannot, usually, obtain general results.

BFNs are in quite a unique position. The model is general enough to allow the definition of a great number of structures and therefore to make structural issues worth studying. Yet, the systematic way in which networks are built suggests that general results can be inferred.

We argue that structural issues play a central role in neural network models and that the possibility of deriving general properties for BFNs should be regarded as one of their most desirable features.

This paper is divided into two parts. In the first part, we discuss the algebraic properties of the set of BFN structures. We begin by giving a formal definition of the terms "structure" and "architecture," and then we introduce a partial ordering in the set of architectures and in the orthogonal set of BFN dimensions. At the end of this section, we show that the partial ordering corresponds to an ordering of the computing capabilities of the BFNs.

This leaves the open question of what can be done with this model. In the second part we show that, if BFNs are seen as sequential computing devices, their computing power is the same as that of a Turing machine.

Before discussing the properties of BFNs, we include a brief reminder of the model, which also serves


FIGURE 1. Cascade connection (a) and feedback connection (b). N1 is the embedded block, that is, a feedback neural network whose input-output behavior is known but whose internal details are unknown. The weight matrix W in (a) and A, B in (b) contain the parameters to be modified when the connection is trained. We suppose we know the derivatives of the cost with respect to the output of the connection (i.e., with respect to the output of N1) and the derivatives of the outputs of N1 with respect to the inputs of N1. The learning algorithm makes it possible to compute the derivatives of the cost with respect to all the parameters in W, A, and B and the derivatives of the outputs of the overall block with respect to the inputs of the overall block.

as an introduction to the symbolism used throughout the paper.

2. INTRODUCTION TO BLOCK FEEDBACK NETWORKS

Block feedback networks have been extensively discussed in Santini, Del Bimbo, and Jain (1991) and Santini, Del Bimbo, and Jain (1995). The BFN model is based on a block diagram notation, inspired by that in Narendra and Parthasarathy (1990). A block can be a single layer or a whole network, which is regarded as a black box, whose input-output behavior can be specified without knowledge of the internal details.

Blocks can be connected by using a limited number of connections. In this paper, we consider two such connections: the cascade and the feedback connection.

The two connections are shown in Figure 1. We assume that the block N1 is a BFN built by repeated applications of the same elementary connections. The network we obtain with the connections can also be considered as a block, to be embedded in further connections in a recursive way.

Throughout this paper we will use a compact notation to represent BFNs: we denote single layers with lower-case Latin letters, the cascade of layer n and block N with $n \cdot N$, and the feedback connection of the block N and the layer q as $q\{N\}$. For instance, the network of Figure 2 is represented as:

$a \cdot b\{c\{d\}\} \cdot e$.  (1)
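As an aside (our own sketch, not part of the paper), the recursive block-based scheme can be mirrored by a tiny expression tree; the class and function names below are hypothetical illustrations of the compact notation just introduced.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Layer:
    name: str

@dataclass
class Cascade:            # the cascade connection of Figure 1a
    first: "Block"
    second: "Block"

@dataclass
class Feedback:           # the feedback connection of Figure 1b: layer q wrapped around block N
    layer: Layer
    inner: "Block"

Block = Union[Layer, Cascade, Feedback]

def show(b: Block) -> str:
    """Render a block in the paper's compact notation."""
    if isinstance(b, Layer):
        return b.name
    if isinstance(b, Cascade):
        return f"{show(b.first)}·{show(b.second)}"
    return f"{b.layer.name}{{{show(b.inner)}}}"

# The network of Figure 2, eqn (1):
net = Cascade(Layer("a"),
              Cascade(Feedback(Layer("b"), Feedback(Layer("c"), Layer("d"))),
                      Layer("e")))
print(show(net))   # a·b{c{d}}·e
```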

A layer is characterized by two weight matrices, the feedforward weight matrix and the feedback weight matrix, and by the array f of output functions. For the layer of Figure 1b, A is the feedforward weight matrix and B is the feedback weight matrix. The layer of Figure 1a has feedforward weight matrix W and no feedback weight matrix.

When the internal structure of a layer needs to be made explicit, we will use the symbol

where A is the feedforward weights matrix and B the feedback weights matrix. A feedforward layer will be indicated as

This symbol can be used in more complex network specifications. For instance, a complete specification of the network of Figure 2 is

(2)

FIGURE 2. Sample block feedback neural network.


The application of the network N to the input vector x yielding the output vector y will be indicated by

xNy. (3)

In Part II, we will need to build networks with a computing power at least equal to that of a given network. We can formalize this with the following definition.

DEFINITION 2.1. Given two networks $N_1$ and $N_2$, we say that $N_2$ covers $N_1$ ($N_1 \trianglelefteq N_2$) if, for any weights configuration of $N_1$, there exists a weights configuration of $N_2$ such that, for every x, $xN_1y \Rightarrow xN_2y$.

The covering relation is reflexive and transitive.

We introduce a further operation over networks: parallelization. It is based on the following lemma.

LEMMA 2.1. Let $N_1$ and $N_2$ be two layers, with feedforward weight matrices $A_1$ and $A_2$ and feedback weight matrices $B_1$ and $B_2$, such that

$[x \mid x_1]\, N_1\, y_1, \qquad [x \mid x_2]\, N_2\, y_2,$

with

$A_1 = [A_{11}\ A_{12}], \qquad A_2 = [A_{21}\ A_{22}],$

where $x \in \mathbb{R}^n$, $x_1 \in \mathbb{R}^m$, $x_2 \in \mathbb{R}^p$, $y_1 \in \mathbb{R}^k$, $y_2 \in \mathbb{R}^h$, $A_{11} \in \mathbb{R}^{k \times n}$, $A_{12} \in \mathbb{R}^{k \times m}$, $A_{21} \in \mathbb{R}^{h \times n}$, $A_{22} \in \mathbb{R}^{h \times p}$, $B_1 \in \mathbb{R}^{k \times k}$ and $B_2 \in \mathbb{R}^{h \times h}$.

Then there exists a layer W such that

$[x \mid x_1 \mid x_2]\, W\, [y_1 \mid y_2].$

Proof. To prove the lemma we just have to build such a layer. This is easily done by setting

$A = \begin{bmatrix} A_{11} & A_{12} & 0 \\ A_{21} & 0 & A_{22} \end{bmatrix}, \qquad B = \begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix},$

where A is the feedforward weight matrix and B the feedback weight matrix of W. It is easy to see that this layer has k + h outputs, that its first k outputs are equal to $y_1$, and that the other h outputs are equal to $y_2$. From the arbitrariness of $A_1$, $A_2$, $B_1$, $B_2$ the lemma follows. ∎

Thus, if we have two layers, possibly with some common inputs and separate outputs, with $n_1$ and $n_2$ neurons, respectively, a single layer with $n_1 + n_2$ neurons can generate both outputs in parallel, while receiving the same inputs as the two layers. This operation will be referred to as parallelization. The parallelization of layers N and M will be denoted by

Note that the parallelization of two layers is still a layer, and not a more complex structure. In the real world there is no such thing as a "parallelized" pair of layers, and there is no need to define a learning procedure for them. Parallelization is just a modelling tool, not a third type of connection. Parallelization can be applied to whole networks, on a layer-by-layer basis.

PART I: STRUCTURAL PROPERTIES

In this part, we study some structural properties of BFNs. Loosely speaking (the term will be defined more precisely in the next section), structural properties are those referring to the way neurons are connected, independently of the values of the synaptic weights and of the characteristics of the neurons. In BFNs, structural properties are independent of the training algorithm, because training merely adjusts the weight values. This would not be true, on the other hand, for cascade-correlation type algorithms (Fahlman & Lebiere, 1989) or for pruning algorithms (Karnin, 1990; Mozer & Smolensky, 1989), which change both the weights and the network structure.

We identify two elements of the network structure: the architecture of the network and the dimension of the network. The former translates into mathematical terms the intuitive notion of the "connection scheme" of the layers, whereas the latter retains the information on the layer sizes. Both architecture and dimension determine partitions in the set of BFNs and, therefore, can be used to build two distinct quotient sets.

We study the quotient set $\mathcal{A}_n$ of n-layer BFN architectures and the quotient set $\mathcal{D}_n$ of n-layer BFN dimensions. We introduce a partial ordering in $\mathcal{A}_n$ and $\mathcal{D}_n$ and show that this endows both partitions with the structure of a distributive and pseudocomplemented lattice. Then we show that the two quotient sets $\mathcal{A}_n$ and $\mathcal{D}_n$ make up an orthogonal decomposition of the set of n-layer BFNs.

Finally, we show that the ordering induced by $\mathcal{A}_n$ and $\mathcal{D}_n$ in the BFNs corresponds to an increasing computing capability.

3. SOME DEFINITIONS

Our task in this section is to give a formal definition of the term "network architecture." We first introduce a few definitions that restrict the scope of the discussion to multilayer neural networks. Then we give a definition of "structure," which reflects the intuitive notion of something obtained from a neural network by ignoring the values of the weights. Finally, we refine this definition to obtain a definition of network architecture.

Let $W_N$ be the set of all weights of the network N, $V_N$ the set of all neurons of the network N, and $Z_N$ the set of all delay units of the network N.

If the output of the neuron $n \in V_N$ is connected to an input of the neuron $m \in V_N$ by the weight w, we use the notation $n(w)m$. If the output of the neuron n is connected to an input of the neuron m by the weight w and the delay unit z, we use the notation $n(w,z)m$.

DEFINITION 3.1. A feedforward chain of order r, $C_r$, is an ordered set of r neurons $\{n_1, \ldots, n_r\}$ such that

$\forall i \in [1, \ldots, r-1], \ \exists w_i \in W_N : n_i(w_i)n_{i+1}$.  (5)

Two neurons p, q are r-connected if there exists a feedforward chain of order r, $C_r$, such that either $C_r = \{p, n_2, \ldots, n_{r-1}, q\}$ or $C_r = \{q, n_2, \ldots, n_{r-1}, p\}$.

This definition doesn't consider the feedback connections. Two neurons are part of a feedforward chain only if there is a feedforward (i.e., instantaneous) connection between them. Two neurons are said to be disconnected if they are not r-connected for any value of r.

DEFINITION 3.2. A neuron $n_o$ is an output neuron if there are no $m \in V_N$ and $w \in W_N$ such that $n_o(w)m$. A neuron $n_i$ is an input neuron if there are no $m \in V_N$ and $w \in W_N$ such that $m(w)n_i$.

We assume, without loss of generality, that no neuron is both an input and an output neuron.

DEFINITION 3.3. A network N is multilayered if there exists a partition $\mathcal{L} = \{L_1, \ldots, L_n\}$ of $V_N$ such that, for every pair of neurons $n, m \in L_i$ ($p, q \in V_N$, $w_1, w_2 \in W_N$):
1. m and n are disconnected.
2. Both n and m are output neurons, or

$n(w_1)q, \ m(w_2)p \Rightarrow p, q \in L_j, \ i \ne j$.  (6)

3. Both n and m are input neurons, or

$q(w_1)n, \ p(w_2)m \Rightarrow p, q \in L_j, \ i \ne j$.  (7)

The sets $L_i$ are called layers of the network.

From the definition, it is clear that if the neuron $n \in L_i$ is connected to the neuron $m \in L_j$, then all the neurons of the layer $L_i$ are either disconnected or connected to neurons of the layer $L_j$. We can briefly say that the layer $L_i$ is connected to the layer $L_j$. Note that the concept of r-feedforward chain $C_r$ can easily be extended to layers.

DEFINITION 3.4. Two layers $L_1$, $L_2$ are feedforward connected ($L_1(w)L_2$) if there exist a neuron $n \in L_1$ and a neuron $m \in L_2$ such that $n(w)m$, with $w \in W_N$.

Two layers $L_1$, $L_2$ are feedback connected ($L_2(w,z)L_1$) if there exist a neuron $n \in L_1$ and a neuron $m \in L_2$ such that $n(w,z)m$, with $w \in W_N$ and $z \in Z_N$.

DEFINITION 3.5. A network is fully connected if

$L_1(w)L_2 \Rightarrow \forall n \in L_1, \forall m \in L_2 : n(w)m$  (8)

and

$L_1(w,z)L_2 \Rightarrow \forall n \in L_1, \forall m \in L_2 : n(w,z)m$.  (9)

This means that if a neuron of layer $L_1$ is connected (feedback connected) to a neuron of layer $L_2$, then every neuron of layer $L_1$ is connected (feedback connected) to all the neurons of layer $L_2$.

The feedforward connections also induce a natural ordering in the set of layers.

DEFINITION 3.6. Let $\mathcal{L} = \{L_1, \ldots, L_n\}$ be the ordered set of layers of the neural network N. The set $\mathcal{L}$ is properly ordered if, for each i, $L_i$ and $L_{i+1}$ are connected by a forward connection, that is,

$\forall n_h \in L_i, m_k \in L_{i+1} \ \exists w_{ihk} \in W_N : n_h(w_{ihk})m_k$.

To study structural relations among BFNs, we must abstract from the details of the particular networks. For instance, we don't care about the specific weight values, because the intuitive notion of structure is something that is left unchanged when the weight values change. We capture this with the following definition.

DEFINITION 3.7. Two neural networks N and M are structurally equal ($N \stackrel{s}{\equiv} M$) if they have the same number of neurons (i.e., $|V_N| = |V_M|$), and $V_N$ and $V_M$ can be ordered as $V_N = \{n_1, \ldots, n_s\}$, $V_M = \{m_1, \ldots, m_s\}$ so that
1. $n_i(w)n_j \Leftrightarrow m_i(\bar{w})m_j$
2. $n_i(w,z)n_j \Leftrightarrow m_i(\bar{w},\bar{z})m_j$.

This definition does not imply that N and M implement the same function, because structurally equal networks may have different weights. On the other hand, the definition ensures that $|W_N| = |W_M|$.

THEOREM 3.1. The structural equality relation (3.7) is an equivalence relation.

We omit the proof of this property for the sake of brevity. It can be found in Santini and Del Bimbo (1993). It is easy to see that Definitions 3.4 and 3.7 ensure that structurally equal networks have the same number of layers.

THEOREM 3.2. Let M and N be such that $M \stackrel{s}{\equiv} N$; let $\{Q_1, \ldots, Q_n\}$ be the layers of M, and $\{L_1, \ldots, L_n\}$ the layers of N. These two sets can be ordered such that if $L_i$ is r-connected to $L_j$ for some r, then $Q_i$ is r-connected to $Q_j$ for the same r.

Let $S_n$ be the set of all multilayer neural networks with n layers. The equivalence relation $\stackrel{s}{\equiv}$ (3.7) can be used to build equivalence classes in $S_n$, defined as

$[N](\stackrel{s}{\equiv}) = \{M \in S_n \mid M \stackrel{s}{\equiv} N\}$.  (10)


DEFINITION 3.8. Let $S_n^B$ be the set of n-layer BFNs. An n-layer structure is an element of the quotient set

$\mathcal{F}_n = S_n^B/(\stackrel{s}{\equiv})$.  (11)

The definitions and properties given so far are quite general, and they are valid for a large class of networks. In this paper, however, we are concerned with BFNs, for which architectures are determined by the connections of Figure 1. We now discuss some specific properties of the BFN model. These properties stem from the recursive mechanism that creates BFN networks.

First, we establish a principle that governs the transmission of properties among BFNs. Let P be an arbitrary logic predicate with domain in the set of BFNs, and N a BFN. If P is true for N, we will use the notation PN. Then, the following holds.

LEMMA 3.1. (Inheritance Principle) If Pq holds for each single layer q, and if

$PN_1, PN_2 \Rightarrow P(N_1 \cdot N_2)$

$PN \Rightarrow P(q\{N\})$ for each layer q,

then P holds for all block feedback neural networks.

THEOREM 3.3. Let N be a BFN with layers $\{L_1, \ldots, L_n\}$. Then:
1. it is possible to find a permutation $p_1, \ldots, p_n$ of $1, \ldots, n$ such that, for the ordered set $\{L_{p_1}, \ldots, L_{p_n}\}$:

(a) each nondelay connection is forward-directed, that is,

$n \in L_{p_i}, \ m \in L_{p_j}, \ n(w)m \Rightarrow j = i + 1$;

(b) each delay connection is backward-directed, that is,

$n \in L_{p_i}, \ m \in L_{p_j}, \ n(w,z)m \Rightarrow j \le i$;

2. each layer has its input connected with the output of at most one layer: if $m, n \in L_i$, $p \in L_k$, and $q \in L_h$, then

$p(w_1,z_1)m, \ q(w_2,z_2)n \Rightarrow h = k$.

Proof. We will prove only the last assertion (the proofs of the other two being similar) by using the inheritance principle.

The property is certainly true for a single layer, because it receives either no feedback or feedback from itself. Suppose now that it is true for a network N, and consider a network N' obtained by application of the elementary connections to N and an arbitrary layer l. There are four different ways in which we can use the two elementary connections to put together the layer l and the network N:

$l \cdot N$

$l\{N\}$

$l\{\ \} \cdot N$

$N \cdot l\{\ \}$

In the first case, the layers of N receive exactly the same number of feedbacks as before the connection (i.e., none or one) by hypothesis; in case 2, the layers of N are untouched, and layer l receives feedback only from the (k + 1)th layer; in cases 3 and 4, the layer l receives feedback from itself only. Thus, the property holds for the whole network. ∎

DEFINITION 3.9. Let $N_1$ and $N_2$ be two layered networks, with $|\mathcal{L}_1| = |\mathcal{L}_2|$ (i.e., let the two networks have the same number of layers). The network $N_1$ is said to be φ-related to the network $N_2$ ($N_1 \stackrel{s}{\equiv} N_2(\phi)$) if there are $L_i^1 \in \mathcal{L}_1$, $L_i^2 \in \mathcal{L}_2$ with $L_i^1 \ne \emptyset$ and $L_i^2 \ne \emptyset$, and there is a neuron $n \notin V_{N_2}$ such that

$N' = \{L_1^2, \ldots, L_{i-1}^2, L_i^2 \cup \{n\}, L_{i+1}^2, \ldots, L_n^2\} \stackrel{s}{\equiv} N_1$,  (12)

that is, there are two nonempty layers $L_i^1 \in \mathcal{L}_1$, $L_i^2 \in \mathcal{L}_2$ such that the two networks can be made structurally equal by adding a neuron to $L_i^2$.

This definition doesn't allow layer creation and destruction. The two layers $L_i^1$ and $L_i^2$ must be nonempty before we can add a neuron to $L_i^2$.

This definition is evidently antisymmetrical: if $N_1 \stackrel{s}{\equiv} N_2(\phi)$, then we cannot have $N_2 \stackrel{s}{\equiv} N_1(\phi)$. We can state the following symmetrical version.

DEFINITION 3.10. Two networks $N_1$ and $N_2$ are Ψ-related ($N_1 \stackrel{s}{\equiv} N_2(\Psi)$) if either $N_1 \stackrel{s}{\equiv} N_2(\phi)$ or $N_2 \stackrel{s}{\equiv} N_1(\phi)$.

The following property of the Ψ relation is easily verified.

LEMMA 3.2. If $N_1 \stackrel{s}{\equiv} N_2$ then $N_1 \stackrel{s}{\equiv} N_2(\Psi)$.

The Ψ relation, though symmetric, is not an equivalence relation, because it is not transitive.

DEFINITION 3.11. Two networks $N_1$ and $N_2$ are said to be Θ-related, written $N_1 \equiv N_2(\Theta)$, if there exists an ordered set of networks $\{Q_0, Q_1, \ldots, Q_r\}$ such that $N_1 \stackrel{s}{\equiv} Q_0$, $N_2 \stackrel{s}{\equiv} Q_r$, and

$\forall i \in [1, r-1] \quad Q_i \stackrel{s}{\equiv} Q_{i+1}(\Psi)$.  (13)

For the Θ relation, the following property holds (see Santini & Del Bimbo, 1993, for the proof).

THEOREM 3.4. The Θ relation is an equivalence relation.

As an equivalence relation, Θ can be used to build equivalence classes in the set $\mathcal{F}_n$ of all n-layer BFN structures:

$[N](\Theta) = \{M \in \mathcal{F}_n \mid M \equiv N(\Theta)\}$.  (14)

This also generates the quotient set $\mathcal{A}_n = \mathcal{F}_n/\Theta$: the set of all the Θ equivalence classes of $\mathcal{F}_n$.

We can now define the term network architecture.


DEFINITION 3.12. A network architecture for n-layer BFNs is a Θ equivalence class over the set of all n-layer neural networks.

The set $\mathcal{A}_n = \mathcal{F}_n/\Theta$ is the set of n-layer network architectures.

Note that, because the network architecture is defined in terms of the Θ equivalence, structurally equal networks, φ-related networks, and Ψ-related networks all have the same architecture.

An equivalence class can be represented by one of its elements, because if N and M are Θ equivalent, then $[N](\Theta) = [M](\Theta)$. Therefore, we can pick one of the networks belonging to a particular equivalence class and use it to label the whole class. We exploit the following lemma, whose proof is in Santini and Del Bimbo (1993).

LEMMA 3.3. Given an equivalence class $[N](\Theta) \in \mathcal{F}_n/\Theta$, there is a network $\eta_N \in [N](\Theta)$ having exactly one neuron in each layer.

Thus, we can label a network architecture using the "single-neuron" network. In the following we will often refer to the "$\eta_N$ architecture." This will mean that we refer to the equivalence class that $\eta_N$ belongs to.

4. ARCHITECTURE FUNCTIONS

The structure of the set $\mathcal{A}_n$ can be more easily studied if we find a way to characterize an architecture in terms of numerical properties only, without any reference to the recursive process of actually building an architecture.

In this section we associate a descriptive function with each network architecture. We first give an intuitive description of the architecture function, and then the technical definitions.

The idea is quite simple: assign a number from zero to n to each layer of an n-layer network. The value of the function for layer i will be equal to the layer where the feedback path ending in layer i starts from.

For instance, consider the network $a \cdot b\{c\{d\}\} \cdot e$ [eqn (1) and Figure 2]. This network has five layers. The feedback path ending in layer b (layer number two) starts from layer d (layer number four); therefore, for this network, f(2) = 4.

There is a problem in this representation: which value should be assigned to the layers without any feedback path ending on them. A simple idea would be to define the value as equal to the layer number. Unfortunately, this does not work. Consider the following two networks:

$a \cdot b \cdot c$

and

$a \cdot b\{\ \} \cdot c$.

According to this definition, they would both have f(1) = 1, f(2) = 2, and f(3) = 3. Nevertheless, they have different architectures (i.e., they do not belong to the same Θ class).

To work the problem out, we must modify the representation. Imagine, after any layer, the presence of a ghost layer. The architecture functions take their arguments on the real layers only, but take values on the set of real plus ghost layers, so that, for an n-layer network, $f : \{1, \ldots, n\} \to \{1, \ldots, 2n\}$. In the target set, real layers correspond to odd numbers and ghost layers to even numbers. The value of f(i) is equal to the real layer corresponding to i (i.e., 2i − 1) if no feedback path ends in layer i. Otherwise, the value is always even, and corresponds to the ghost layer associated with the real layer where the feedback path starts.

For instance, for the first network above, we have f(1) = 1, f(2) = 3, f(3) = 5. For the second network we have f(1) = 1, f(2) = 4 (the ghost layer after layer b), and f(3) = 5.

In the following we will sometimes use a shorthand representation for the nestings. A function f is represented as an n-tuple whose ith element is equal to the value of f(i). For instance, the network $a\{b\} \cdot c$ will be represented by the vector [4, 3, 5].

This is an isomorphic representation; that is, the mapping from networks to functions is one-to-one and onto. We now give a more formal definition of the concepts introduced so far.

Let $\mathbb{N}_n$ be the set of integer numbers from 1 to n, $\mathbb{E}_n$ the set of even numbers from 1 to n, and $\mathbb{O}_n$ the set of odd numbers from 1 to n.

DEFINITION 4.1. Let an n-layer BFN N be given, and let $\eta_N$ be a unitary network such that

$\eta_N \equiv N(\Theta)$,

with layers $\{L_1, \ldots, L_n\}$. The architecture function of N is a function $f : \mathbb{N}_n \to \mathbb{N}_{2n}$ defined as follows: for each $n \in L_i$, if there exists $m \in L_j$ such that $m(w,z)n$, then define $f(i) = 2j$; otherwise define $f(i) = 2i - 1$.

It is easy to see that Definitions 3.3 and 3.4 and the last point of Theorem 3.3 ensure that the function f is properly defined.

A set of functions isomorphic to architecture functions can be defined without making any reference to BFNs. To this end, we introduce the following class of functions.

DEFINITION 4.2. A function $f : \mathbb{N}_n \to \mathbb{N}_{2n}$ is a nesting if:
1. $f(i) \ge 2i - 1$
2. $f(i) \in \mathbb{O}_{2n} \Rightarrow f(i) = 2i - 1$
3. $f(i) = 2k \Rightarrow \forall j : i \le j \le k, \ f(j) \le 2k$.
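The three conditions of Definition 4.2 are easy to check mechanically on the vector shorthand introduced above. The following Python sketch is ours (the function name is hypothetical) and is only meant to make the definition concrete.

```python
def is_nesting(f):
    """Check Definition 4.2 on the vector shorthand: f[i-1] == f(i), i = 1..n."""
    n = len(f)
    for i in range(1, n + 1):
        v = f[i - 1]
        if v < 2 * i - 1:                      # condition 1
            return False
        if v % 2 == 1 and v != 2 * i - 1:      # condition 2 (odd values)
            return False
        if v % 2 == 0:                         # condition 3: f(j) <= 2k for i <= j <= k
            k = v // 2
            if any(f[j - 1] > 2 * k for j in range(i, k + 1)):
                return False
    return True

print(is_nesting([1, 3, 5]))      # a·b·c        -> True
print(is_nesting([1, 4, 5]))      # a·b{ }·c     -> True
print(is_nesting([4, 3, 5]))      # a{b}·c       -> True
print(is_nesting([1, 2, 2]))      # violates condition 1 at i = 3 -> False
```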

The relation between nesting functions and BFNs is established by Theorems 4.1 and 4.2.

THEOREM 4.1. If N is a BFN, its architecture description function is a nesting function or, using the term N-representable to indicate entities that can be represented as nestings, every BFN is N-representable.

Proof. We use the inheritance principle. First we prove that a single layer is N-representable; then we prove that the two allowed operations, cascade and feedback, map N-representable networks (or, in the case of cascade, pairs of N-representable networks) into N-representable networks.

A single layer is described by the function f(1) = 1, which is trivially a nesting. Let $N_1$ be a network described by $f_1$ and $N_2$ a network described by $f_2$, with both $f_1$ and $f_2$ nesting functions. For ease of notation, we suppose both networks are made up of n layers. Let $N = N_1 \cdot N_2$, and let $f : \mathbb{N}_{2n} \to \mathbb{N}_{4n}$ be the corresponding describing function. Then:

$f(i) = f_1(i)$ if $i \le n$; $f(i) = f_2(i - n) + 2n$ if $i > n$.  (15)

To prove the theorem, we have to show that f is a nesting, that is, that it fulfills the three parts of Definition 4.2:
1. First, we prove that $f(i) \ge 2i - 1$: if $i \le n$, then $f(i) = f_1(i) \ge 2i - 1$; if $i > n$, then $f_2(i - n) \ge 2(i - n) - 1$; thus $f(i) = f_2(i - n) + 2n \ge 2i - 1$.
2. If $f(i) \in \mathbb{O}_{4n}$, then either $i \le n$ and $f_1(i) \in \mathbb{O}_{2n}$, or $i > n$ and $f_2(i - n) \in \mathbb{O}_{2n}$. The first case is obvious: because $f(i) = f_1(i)$, then $f(i) = 2i - 1$. For the second case: if $f(i) = f_2(i - n) + 2n \in \mathbb{O}_{4n}$, then $f_2(i - n) \in \mathbb{O}_{2n}$. Because $f_2$ is a nesting, this implies that $f_2(i - n) = 2(i - n) - 1$, thus $f(i) = 2(i - n) - 1 + 2n = 2i - 1$.
3. Finally, let $f(i) = 2k$. We must show that, for all j such that $i \le j \le k$, we have $f(j) \le f(i)$. It can be either $i \le n$ (and then $k \le n$) or $i > n$ (and then $k > n$). The first case is straightforward, f being equal to $f_1$. For the second case, let $j > i$. Then $f(j) = f_2(j - n) + 2n$. But, because $i \le j \le k$ and $f_2$ is a nesting, $f_2(j - n) \le f_2(i - n)$. Thus, $f(j) = f_2(j - n) + 2n \le f_2(i - n) + 2n = f(i)$.
Thus, the describing function of the cascade of two N-representable networks is a nesting.

Now, let N be an N-representable network, and let q be a layer. We build the feedback network $q\{N\}$. If $f : \mathbb{N}_n \to \mathbb{N}_{2n}$ is the representation function of N, then the representation function of the whole network is $g : \mathbb{N}_{n+1} \to \mathbb{N}_{2(n+1)}$ defined as:

$g(i) = 2(n + 1)$ if $i = 1$; $g(i) = f(i - 1) + 2$ if $i > 1$.  (16)

It can be shown, using the same technique used above, that if f is a nesting, g is also a nesting. We skip the proof for the sake of brevity. ∎
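Equations (15) and (16) give a direct way to compute the architecture function of a cascade or feedback composition from those of its parts. The sketch below is ours (vector shorthand, hypothetical names); it rebuilds the function of the network $a \cdot b\{c\{d\}\} \cdot e$ of Figure 2 from single layers.

```python
def cascade(f1, f2):
    """Architecture function of the cascade N1·N2 (eqn (15)), vector shorthand."""
    n = len(f1)
    return list(f1) + [v + 2 * n for v in f2]

def feedback(f):
    """Architecture function of q{N} (eqn (16)): the new first layer is fed back
    from the last layer of N; every other value is shifted by one layer."""
    n = len(f)
    return [2 * (n + 1)] + [v + 2 for v in f]

b_c_d = feedback(feedback([1]))             # b{c{d}}      -> [6, 6, 5]
full = cascade([1], cascade(b_c_d, [1]))    # a·b{c{d}}·e  -> [1, 8, 8, 7, 9]
print(b_c_d, full)
```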

To prove the second theorem, we first state the following lemma.

LEMMA 4.1. Consider a nesting function f with $f(i) = 2k > 2i - 1$, and the set of the restrictions of such functions to $[i, k]$:

$\mathcal{R} = \{f|_{[i,k]} : [i, k] \to [2i - 1, 2k]\}$.

Then $\mathcal{R}$ is isomorphic to the set of the nesting functions $g : \mathbb{N}_{k-i+1} \to \mathbb{N}_{2(k-i+1)}$, with an isomorphism defined by:

$g(j) = f(i + j - 1) - 2(i - 1), \quad j = 1, \ldots, k - i + 1$.

Proof. The proof is a straightforward application of the definition.

THEOREM 4.2. If f is a nesting function, then there exists a BFN N such that the architecture function of N is equal to f. That is, every nesting function represents a network.

Proof. The proof is by induction on the size n of the function domain. It can be verified by direct enumeration that for each nesting $f : \mathbb{N}_2 \to \mathbb{N}_4$ there is at least one BFN architecture described by f.

Suppose now the property holds for all the nestings $f_n : \mathbb{N}_n \to \mathbb{N}_{2n}$, and consider a nesting $f_{n+1} : \mathbb{N}_{n+1} \to \mathbb{N}_{2(n+1)}$.

Consider the value of f(1). We must discuss two cases: f(1) = 1 and f(1) = 2j, j ≥ 1.

Consider the case f(1) = 1. By Lemma 4.1, the restriction of f to $\{2, \ldots, n+1\}$ can be mapped onto a nesting function $f_2 : \mathbb{N}_n \to \mathbb{N}_{2n}$. Because the domain of this function has size n, by hypothesis it is the architectural description of a BFN. Let N be this network, and consider the network $1 \cdot N$. For the architecture function $f_a$ of this network, it holds: $f_a(1) = 1$, $f_a(i)|_{i>1} = f_2(i-1) + 2 = f(i)$. Thus, $f_a = f$, and f is the description function of the network.

Assume now f(1) = 2j, with j ≥ 1. Consider the two functions $f_2$ and $f_3$, $f_2$ being the restriction of f to $\{2, \ldots, j\}$, and $f_3$ the restriction of f to $\{j+1, \ldots, n+1\}$. Both $f_2$ and $f_3$ can be mapped, by virtue of Lemma 4.1, onto two nestings with a domain of size less than or equal to n; thus, by assumption, they are the architectural descriptions of two networks $N_1$ and $N_2$. If we consider the network $1\{N_1\} \cdot N_2$, it is easy to see, with an argument similar to that above, that f is the architecture function of this network. ∎

We have proved that all architectural description functions are nestings, and all nestings are architectural descriptions. We still need to prove that the correspon- dence is unique.

THEOREM 4.3. Each B F N architecture is described by a unique nesting.

Proof. Because we have shown that architecture functions are nestings, it will be sufficient to show that the association is unique.

Suppose $f_1$ and $f_2$ are two distinct nesting functions describing the same network N, and assume that $f_1(k) \ne f_2(k)$.


Let us distinguish three cases:
1. If both $f_1(k) \in \mathbb{O}_{2n}$ and $f_2(k) \in \mathbb{O}_{2n}$, then, by point 2 of Definition 4.2, $f_1(k) = 2k - 1 = f_2(k)$, thus contradicting the hypothesis.
2. Assume $f_1(k) \in \mathbb{E}_{2n}$ and $f_2(k) \in \mathbb{O}_{2n}$. In this case, $f_1(k) = 2h$ and there are neurons $n \in L_k$, $m \in L_h$ such that $m(w,z)n$. But if such a connection exists, then, by Definition 3.3, all the neurons in the layer $L_h$ are connected to neurons in the layer $L_k$ and, by the definition of Θ equivalence, the hth layer of $\eta_N$ is connected to the kth layer. Thus, we must have $f_2(k) = 2h$, contradicting the hypothesis. The case $f_2(k) \in \mathbb{E}_{2n}$ and $f_1(k) \in \mathbb{O}_{2n}$ is evidently symmetric.
3. If $f_1(k) = 2h$ and $f_2(k) = 2q$, with $h \ne q$, then by a reasoning analogous to point 2 we can see that this contradicts Theorem 3.3. ∎

THEOREM 4.4. If two architectures are described by the same nesting, they are equal.

Proof. Suppose A and B are two network architectures, that is, two networks with just one neuron in each layer. Because they are described by the same function $f : \mathbb{N}_n \to \mathbb{N}_{2n}$, they have the same number of layers, namely n. Because they are both layered networks, if $\{L_{A1}, \ldots, L_{An}\}$ are the layers of the first network, and $\{L_{B1}, \ldots, L_{Bn}\}$ are the layers of the second, we have

$n \in L_{Ai}, \ m \in L_{A(i+1)} \Rightarrow n(w)m$.

Similarly,

$n \in L_{Bi}, \ m \in L_{B(i+1)} \Rightarrow n(w)m$.

Thus, the feedforward connections satisfy the equality requirements. We have to show that the feedback connections do the same.

Let $f(i) = 2j \ge 2i$. Then

$n \in L_{Ai}, \ m \in L_{Aj} \Rightarrow m(w,z)n$.

Because B is described by the same function,

$n \in L_{Bi}, \ m \in L_{Bj} \Rightarrow m(w,z)n$.

If $f(i) = 2i - 1$, then there does not exist m such that $n \in L_{Ai}$ and $m(w,z)n$. Similarly, there does not exist a corresponding m for the second network.

Thus, for each n, m in A such that $n(w)m$ or $m(w,z)n$, there exists a corresponding couple in B for which the same connection holds. ∎

The above properties imply that the relation between network architectures and nesting functions is one-to-one and onto (i.e., it is an isomorphism).

5. LATTICE STRUCTURE OF NETWORK ARCHITECTURES

Given the isomorphism between network architectures and nesting functions, we can study the properties of the nesting functions to investigate the structure of the set $\mathcal{A}_n$.

Because of the isomorphism, we can use the same symbol $\mathcal{A}_n$ to indicate the set of all nestings $f : \mathbb{N}_n \to \mathbb{N}_{2n}$. In $\mathcal{A}_n$ we can define a partial ordering as follows.

DEFINITION 5.1. Let $f_1, f_2 \in \mathcal{A}_n$. We say that $f_1 \le f_2$ if

$\forall i \in [1, \ldots, n] \quad f_1(i) \le f_2(i)$.

This definition makes the set $\mathcal{A}_n$ a poset. The meet (or infimum, ∧) operator can be defined as:

DEFINITION 5.2. $\forall f, f_1, f_2 \in \mathcal{A}_n$:

$f = f_1 \wedge f_2 \Leftrightarrow f(i) = \min(f_1(i), f_2(i)) \quad \forall i \in [1, \ldots, n]$.

It is easy to verify that $f_1, f_2 \in \mathcal{A}_n \Rightarrow f_1 \wedge f_2 \in \mathcal{A}_n$; thus $\mathcal{A}_n$ is closed with respect to ∧.

To make $\mathcal{A}_n$ into a lattice, we have to show that, for all $f_1, f_2 \in \mathcal{A}_n$, $\sup\{f_1, f_2\} = f_1 \vee f_2$ exists. This is shown by the following lemma.

LEMMA 5.1. For all $f_1, f_2 \in \mathcal{A}_n$, there exists $f \in \mathcal{A}_n$ such that $f = \sup\{f_1, f_2\}$.

Proof. Given $f_1$ and $f_2$, consider the set

$A_{f_1,f_2} = \{g \in \mathcal{A}_n : f_1 \le g \text{ and } f_2 \le g\}$.

This set is surely nonempty, because the "1" function, defined as $1(i) = 2n$ for all i, belongs to it. Moreover, $A_{f_1,f_2}$ is a finite partial lattice, because $f, g \in A_{f_1,f_2} \Rightarrow f \wedge g \in A_{f_1,f_2}$. Therefore, there is a minimum in $A_{f_1,f_2}$, that is, an element a such that

$x \in A_{f_1,f_2} \Rightarrow a \le x$.

It is easy to see that $a = f_1 \vee f_2$. ∎

The closure property of $\mathcal{A}_n$ with respect to ∨ is a straightforward consequence of the definition. These operators are associative, idempotent, and satisfy the absorption identities (Grätzer, 1971):

$a \wedge (a \vee b) = a, \qquad a \vee (a \wedge b) = a$.  (17)

From this, the next theorem easily follows.

THEOREM 5.1. For each n, the set $\mathcal{A}_n$ is a lattice.

Figures 3 and 4 show the lattice diagrams for two- and three-layer feedback neural networks. The codes beside the lattice elements are the corresponding function representations.

The covering relation between two functions f and g is defined as: f covers g ($g \prec f$) if $g \le f$ and there is no element h, distinct from f and g, such that $g \le h \le f$.

THEOREM 5.2. Let $f, g \in \mathcal{A}_n$. Then $g \prec f$ iff $g(i) = f(i)$ for $i \ne k$ and either:
1. $g(k) = 2k - 1$ and $f(k) = 2k$, or
2. $g(k) = 2j$ and $f(k) = \max\{2(j+1), g(j+1)\}$.

The proof of this theorem is omitted; it can be found in Santini and Del Bimbo (1993).
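On the vector shorthand, the meet of Definition 5.2 is an elementwise minimum, while the join of Lemma 5.1 is the least element of the set of common upper bounds. The brute-force sketch below is ours (hypothetical names; exhaustive enumeration is only feasible for small n) and mirrors exactly that construction.

```python
from functools import reduce
from itertools import product

def nestings(n):
    """Enumerate all nesting functions on n layers (vector shorthand), brute force."""
    def ok(f):
        for i in range(1, n + 1):
            v = f[i - 1]
            if v < 2 * i - 1:                                   # condition 1
                return False
            if v % 2 == 1 and v != 2 * i - 1:                   # condition 2
                return False
            if v % 2 == 0 and any(f[j - 1] > v for j in range(i, v // 2 + 1)):
                return False                                    # condition 3
        return True
    return [list(f) for f in product(range(1, 2 * n + 1), repeat=n) if ok(f)]

def meet(f1, f2):
    return [min(a, b) for a, b in zip(f1, f2)]                  # Definition 5.2

def join(f1, f2, n):
    ups = [g for g in nestings(n)
           if all(a <= c and b <= c for a, b, c in zip(f1, f2, g))]
    return reduce(meet, ups)     # the upper-bound set is closed under meet (Lemma 5.1)

f1, f2 = [1, 4, 5], [4, 3, 5]    # a·b{ }·c  and  a{b}·c
print(meet(f1, f2))              # [1, 3, 5]  (a·b·c)
print(join(f1, f2, 3))           # [4, 4, 5]  (a{b{ }}·c)
```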

FIGURE 3. Lattice diagram of $\mathcal{A}_2$.

The passage from an element of the lattice $\mathcal{A}_n$ to another element that covers it corresponds to one of the two following "elementary" operations:
1. the insertion of a narrow feedback loop (i.e., a loop with no layers embedded into it) in a feedforward layer, or
2. the extension of an existing feedback loop to the next available position.

6. DISTRIBUTIVITY AND PSEUDOCOMPLEMENTATION

The lattice of BFN architectures has several noticeable properties, stated in the two theorems that follow. To prove the first, we begin by proving the following lemma.

LEMMA 6.1. Let I and J be two ideals of $\mathcal{A}_n$, and let $p \in I$, $q \in J$. If $g \le p \vee q$, then there exist $\hat{p} \in I$ and $\hat{q} \in J$ such that $g = \hat{p} \vee \hat{q}$.

Proof. Define $\hat{p} = p \wedge g$ and $\hat{q} = q \wedge g$, that is,

$\hat{p}(i) = \min\{p(i), g(i)\}$

$\hat{q}(i) = \min\{q(i), g(i)\}$

and $h = \hat{p} \vee \hat{q}$. According to the definition of ∨ (Lemma 5.1), if

$S_h = \{h \in \mathcal{A}_n : \hat{p} \le h, \ \hat{q} \le h\}$,

then the proof of the theorem is equivalent to the proof that $g = \inf\{S_h\}$.

To prove this, first note that from the definition of $\hat{p}$ and $\hat{q}$ it follows that $g \in S_h$.

Then, let $s \in S_h$, and consider the value of s(i). By definition, $s(i) \ge \max\{\hat{p}(i), \hat{q}(i)\}$. That is, considering s(i) as an element of the lattice of integer numbers,

$s(i) \ge \hat{p}(i) \vee \hat{q}(i) = (p(i) \wedge g(i)) \vee (q(i) \wedge g(i))$,

but the lattice of integer numbers is distributive; thus, for the single values s(i), the distributive law holds:

$s(i) \ge g(i) \wedge (p(i) \vee q(i))$

and, because $g \le p \vee q$, we have $g(i) \wedge (p(i) \vee q(i)) = g(i)$; thus $s(i) \ge g(i)$ and, repeating the argument for all i, $s \ge g$. Therefore,

$s \in S_h \Rightarrow s \ge g$.

Because we have just seen that $g \in S_h$, it must be $g = \min\{S_h\}$. This proves that $g = \hat{p} \vee \hat{q}$.

We still have to prove that $\hat{p} \in I$ and $\hat{q} \in J$. From the definition, it is apparent that $\hat{p} \le p$ and $\hat{q} \le q$.

Because $\mathcal{A}_n$ is finite, every ideal is principal. Thus, there exist $i, j \in \mathcal{A}_n$ such that $I = \{x \mid x \le i\}$ and $J = \{x \mid x \le j\}$. This means that $p \in I$, $\hat{p} \le p \Rightarrow \hat{p} \in I$, and similarly, $\hat{q} \in J$.

This completes the proof of the lemma. ∎

Moreover, it is possible to prove that the following theorem holds (Grätzer, 1971).

THEOREM 6.1. A lattice L is distributive iff, for any two ideals I and J of L,

$I \vee J = \{i \vee j \mid i \in I, j \in J\}$.

The first relevant property of $\mathcal{A}_n$ is stated in the following theorem.

THEOREM 6.2. The lattice $\mathcal{A}_n$ is distributive.

Proof. The proof is an application of Theorem 6.1. Let I and J be two ideals of $\mathcal{A}_n$, and H the subset of $\mathcal{A}_n$ such that $I \vee J = (H]$. The theorem can then be proved by showing that

$(H] = \{a : a = i \vee j, \ i \in I, \ j \in J\}$.

FIGURE 4. Lattice diagram of $\mathcal{A}_3$.

For a general property of ideals, it holds (Grätzer, 1971):

$g \in (H] \Rightarrow \exists h_1, \ldots, h_n \in H : g \le h_1 \vee \cdots \vee h_n$.

Because $H = I \cup J$, for every $h_i$, either $h_i \in I$ or $h_i \in J$ (or both).

Let us order the $h_i$ as $\{h_1, \ldots, h_n\} = \{p_1, \ldots, p_r, q_1, \ldots, q_s\}$ with $p_i \in I$ and $q_i \in J$. Thus,

$g \le p_1 \vee \cdots \vee p_r \vee q_1 \vee \cdots \vee q_s$.

But, being ideals, I and J are sublattices; thus, there are two elements

$\bar{p} = p_1 \vee \cdots \vee p_r \in I$

$\bar{q} = q_1 \vee \cdots \vee q_s \in J$

such that

$g \in (H] \Rightarrow g \le \bar{p} \vee \bar{q}, \quad \bar{p} \in I, \ \bar{q} \in J$.

The distributivity is then a consequence of Lemma 6.1. ∎

The second property of $\mathcal{A}_n$ is stated in the following theorem.

THEOREM 6.3. The lattice $\mathcal{A}_n$ is pseudocomplemented.

Proof. The proof is by construction. Given $f \in \mathcal{A}_n$, consider the function g defined as:

$g(i) = 2i - 1$ if $f(i) > 2i - 1$; $g(i) = 2n$ if $f(i) = 2i - 1$.  (18)

It is easy to prove that $g = f^*$. The proof is in Santini and Del Bimbo (1993). ∎

Moreover, it is easy to check that for all f in $\mathcal{A}_n$, it holds $f^* \vee f^{**} = 1$.
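Equation (18) gives the pseudocomplement explicitly, so it can be computed directly on the vector shorthand; the short sketch below is ours and only illustrates the formula.

```python
def pseudocomplement(f):
    """Pseudocomplement f* of an architecture function, per eqn (18)."""
    n = len(f)
    return [2 * i - 1 if f[i - 1] > 2 * i - 1 else 2 * n
            for i in range(1, n + 1)]

f = [1, 4, 5]                      # a·b{ }·c
fs = pseudocomplement(f)           # f*  = [6, 3, 6]
fss = pseudocomplement(fs)         # f** = [1, 6, 5]
print(fs, fss)                     # their join is the "1" function [6, 6, 6]
```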

THEOREM 6.4. The lattice $\mathcal{A}_n$ is a Stone algebra $(\mathcal{A}_n, \wedge, \vee, {}^*)$.

7. THE LATTICE OF THE LAYER DIMENSIONS

So far, we have considered network architectures, that is, networks containing, by definition, a single neuron in each layer. In this section we turn our attention to the number of neurons in the layers. We will not consider the feedback structure of the network, which is described by the elements of $\mathcal{A}_n$. The feedforward connections will be implicitly considered, because we will use the proper ordering (Definition 3.6) that is induced by the feedforward connections. All sets of layers considered in the following will be assumed to be properly ordered.

Consider the set of networks with n layers. We first need to establish an equivalence principle, different from the Θ equivalence, that focuses on layer dimension properties. To this end, we state the following definition.

DEFINITION 7.1. Let $N_1$ and $N_2$ be two neural networks with n layers, and let $L = \{L_1, \ldots, L_n\}$ and $K = \{K_1, \ldots, K_n\}$ be the respective layer sets. $N_1$ and $N_2$ are dimensionally equivalent ($N_1 \equiv N_2(D)$) if, for all i, $|L_i| = |K_i|$, that is, the number of neurons in $L_i$ is equal to the number of neurons in $K_i$.

It is easy to show the following (we omit the proof).

THEOREM 7.1. D is an equivalence relation.

Thus, D induces equivalence classes in the set $\mathcal{F}_n$ of neural network structures

$[N](D) = \{M : M \equiv N(D)\}$  (19)

and a quotient set $\mathcal{D}_n = \mathcal{F}_n/(D)$. Our next step is to make $\mathcal{D}_n$ into a poset.

DEFINITION 7.2. Let $N_1, N_2 \in \mathcal{F}_n$ and let $L = \{L_1, \ldots, L_n\}$ and $K = \{K_1, \ldots, K_n\}$ be the respective layer sets. Then,

$N_1 \le N_2$ iff $\forall i \ |L_i| \le |K_i|$.

Then, we can make $\mathcal{D}_n$ into a lattice.

DEFINITION 7.3. Let $N_1, N_2, M \in \mathcal{F}_n$ and let $L = \{L_1, \ldots, L_n\}$, $K = \{K_1, \ldots, K_n\}$ and $Q = \{Q_1, \ldots, Q_n\}$ be the respective layer sets; then:
1. $M = N_1 \wedge N_2 \Leftrightarrow \forall i \ |Q_i| = \min(|L_i|, |K_i|)$
2. $M = N_1 \vee N_2 \Leftrightarrow \forall i \ |Q_i| = \max(|L_i|, |K_i|)$.
It is easy to verify that these two operations are associative, idempotent, and satisfy the absorption identities. Moreover, for all $N, M \in \mathcal{D}_n$, we have $N \wedge M \in \mathcal{D}_n$ and $N \vee M \in \mathcal{D}_n$.

If $\mathbb{N}$ is the set of integer numbers, made into a lattice by the natural ordering, then it is easy to see from the definition of direct product that $\mathcal{D}_n \cong \mathbb{N}^n$. This implies that $\mathcal{D}_n$ is distributive and pseudocomplemented and, as a lattice, it is a Stone algebra.

As in the case of neural network architectures, we will pick a particular element of each equivalence class and use it as a representative for the class. We exploit the following theorem, stated without proof.

THEOREM 7.2. For each n-layer neural network N, there exists a feedforward network $\delta_N$ such that $\delta_N \in [N](D)$.

The feedforward network will be used as the "class placeholder."

8. THE LATTICE OF NEURAL NETWORKS

In the previous sections, we created two distinct partitions in the set $\mathcal{F}_n$ of n-layer neural network structures. The first led to the lattice $\mathcal{A}_n$ of neural network architectures, the second to the lattice $\mathcal{D}_n$ of neural network dimensions.

We now study the relation between these two partitions. Before doing so, we want to stress again that $\mathcal{F}_n$ is not the set of BFNs with n layers, but the set of neural network structures; that is, $\mathcal{F}_n$ is the quotient set of the set $S_n$ under the structural equivalence relation $\stackrel{s}{\equiv}$ (3.7). To study the two partitions we have introduced, it will be easier to refer directly to $S_n$ rather than to $\mathcal{F}_n$. Because $\mathcal{F}_n$ is a partition of $S_n$, the Θ and the D equivalences also are partitions of $S_n$.

We define two "pro jec t ion" operators:

7r . : S. --, ~A.

where, for each network M E S,, the two projections are defined as

7L,(M) = ~IM

7%(M) = ~M.

We want to show that the two quotient lattices we have defined completely characterize the lattice $\mathcal{F}_n$ of neural network structures.

First, we state an intermediate result with the following lemma, whose proof is omitted for the sake of brevity and can be found in Santini and Del Bimbo (1993).

LEMMA 8.1. Let $N, M \in S_n$. Then $\pi_a(M) = \pi_a(N)$ and $\pi_d(M) = \pi_d(N)$ iff $M \stackrel{s}{\equiv} N$.

With the aid of this lemma, we can prove the following theorem, which states the result we were looking for.

THEOREM 8.1. The lattice $\mathcal{F}_n$ is isomorphic to the direct product of $\mathcal{A}_n$ and $\mathcal{D}_n$, that is,

$\mathcal{F}_n \cong \mathcal{A}_n \times \mathcal{D}_n$.  (20)

If $s \in \mathcal{F}_n$, let $s = [N](\stackrel{s}{\equiv})$ [see eqn (11)], and $M \in [N](\stackrel{s}{\equiv})$; then an isomorphism between $\mathcal{F}_n$ and $\mathcal{A}_n \times \mathcal{D}_n$ is given by

$\Lambda(s) = (\pi_a(M), \pi_d(M))$.  (21)

Proof. We first prove that Λ is one-to-one, by contradiction. Suppose $s_1, s_2 \in \mathcal{F}_n$ with $s_1 \ne s_2$ and $\Lambda(s_1) = \Lambda(s_2)$. Let $N_1, N_2 \in S_n$, $s_1 = [N_1](\stackrel{s}{\equiv})$, and $s_2 = [N_2](\stackrel{s}{\equiv})$. Because

$\Lambda(s_1) = \Lambda(s_2) \Leftrightarrow \pi_a(N_1) = \pi_a(N_2), \ \pi_d(N_1) = \pi_d(N_2)$,

from Lemma 8.1 it follows that $N_1 \stackrel{s}{\equiv} N_2$ and thus that

$s_2 = [N_2](\stackrel{s}{\equiv}) = [N_1](\stackrel{s}{\equiv}) = s_1$,

which is a contradiction.

If $N \in S_n$, then, by Theorem 4.1, there is always a nesting associated with it and, thus, an element of $\mathcal{A}_n$. Moreover, an element of $\mathcal{D}_n$ associated with N can be trivially built by counting the neurons in its layers. Therefore, $\pi_a$ and $\pi_d$ are defined for every element in $S_n$ (i.e., Λ is onto). ∎

The isomorphism Λ induces a lattice structure in the set $\mathcal{F}_n$. We call $\mathcal{F}_n$ the lattice of BFNs.
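Theorem 8.1 says that a structure is fully described by the pair (architecture function, layer-size vector), and lattice operations act componentwise on the pair. A minimal sketch of the meet (ours, with hypothetical names):

```python
# A structure as a pair (architecture function, layer sizes); by Theorem 8.1
# the pairing is one-to-one, so the meet acts componentwise on the pair.
def structure_meet(s1, s2):
    (a1, d1), (a2, d2) = s1, s2
    return ([min(x, y) for x, y in zip(a1, a2)],   # meet in A_n (Definition 5.2)
            [min(x, y) for x, y in zip(d1, d2)])   # meet in D_n (Definition 7.3)

s1 = ([1, 4, 5], [3, 5, 2])   # architecture a·b{ }·c, layer sizes 3, 5, 2
s2 = ([4, 3, 5], [4, 1, 2])   # architecture a{b}·c,  layer sizes 4, 1, 2
print(structure_meet(s1, s2)) # ([1, 3, 5], [3, 1, 2])
```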


9. SOME RELATION BETWEEN LATTICES AND COMPUTING CAPABILITIES

In the previous sections we discussed some algebraic properties of the set of BFNs, which derive from the introduction of partial orderings into the sets $\mathcal{A}_n$ and $\mathcal{D}_n$. The resulting algebra allows us to define operations over networks.

It is interesting to investigate whether there is a relation between the orderings introduced so far and certain network properties. In this section, we present a first result in this sense.

Let $x = \{x_1, \ldots, x_t, \ldots\}$ and $y = \{y_1, \ldots, y_t, \ldots\}$ be two sequences of vectors ($x_t \in \mathbb{R}^{n_x}$ and $y_t \in \mathbb{R}^{n_y}$ for all t), and let N be a neural network with $n_x$ inputs and $n_y$ outputs. Suppose that when the sequence x is given as the network input, the network outputs describe the sequence y; we will say in this case that y is generated by x via N, and use the symbol $xN \to y$. If $f_N : \mathbb{R}^{n_x} \times \mathbb{N} \to \mathbb{R}^{n_y}$ is the function implemented by N, then

$xN \to y \Leftrightarrow \forall t \ f_N(x_t, t) = y_t$.

DEFINITION 9.1. If $N_1, N_2 \in S_n$, we say that $N_1$ is covered by $N_2$ within ε ($N_1 \trianglelefteq N_2(\epsilon)$) if, given a weights configuration $w_1$ of $N_1$, it is possible to find a weights configuration $w_2$ of $N_2$ such that

$\forall x, y \quad xN_1 \to y \Rightarrow xN_2 \to \tilde{y}$  (22)

$\forall t \quad \|y(t) - \tilde{y}(t)\| \le \epsilon$.  (23)

It is easy to see that, for a fixed ε, $\trianglelefteq$ is a partial ordering.

The results in the following are strictly related to the universal approximation theorem. There are a number of proofs of the theorem (see, for example, Hecht-Nielsen, 1989; Hornik, 1991; Cotter, 1990) under different hypotheses.

One form of these theorems (see Santini & Del Bimbo, 1993, for the proof) is the following lemma.

LEMMA 9.1. If φ is a function computed by a single layer whose output function satisfies the hypotheses of the universal approximation theorem, then the same function can be approximated arbitrarily well with a cascade of two layers.

By multiple applications of the above lemma, it follows that:

LEMMA 9.2. If g is a function computed by a single layer whose output function satisfies the hypotheses of the universal approximation theorem, then the same function can be approximated arbitrarily well with a cascade of n layers, n ≥ 2.

A first result that relates this definition with the lattice structures is the following.

LEMMA 9.3. Let $s_1, s_2 \in \mathcal{A}_n$, with $s_1 \prec s_2$, and fix $\epsilon > 0$. Then there exist two networks $N_1, N_2 \in S_n$ such that $\pi_a(N_1) = s_1$, $\pi_a(N_2) = s_2$ and $N_1 \trianglelefteq N_2(\epsilon)$.


Proof. Let $f_1$ and $f_2$ be the architecture functions of $s_1$ and $s_2$, respectively. By Theorem 5.2, there is a value k such that $f_1(i)$ and $f_2(i)$ differ only for $i = k$. We have two possibilities.

$f_1(i) = 2i - 1$: In this case, by Theorem 5.2, we have $f_2(i) = 2i$. If $\pi_a(N_1) = s_1$, the ith layer of $N_1$ must be without feedback, that is, a layer with feedforward matrix $A_i$ and no feedback matrix. Let $N_2$ be equal to $N_1$ everywhere but in the ith layer, and let the ith layer of $N_2$ have feedforward matrix $A_i$ and feedback matrix 0. Then the two networks compute the same function, and $f_2(i) = 2i$, as required.

$f_1(i) = 2j$: Suppose, for the sake of simplicity, that $f_1(j+1) = 2(j+1) - 1$; thus, according to Theorem 5.2, $f_2(i) = 2(j+1)$. Let the layers of $N_1$ between the ith and the jth have feedforward matrices $A_k$ and feedback matrices $B_k$, $k = i, \ldots, j$, where, for some k, it might be $B_k = 0$; the $(j+1)$th layer has feedforward matrix $A_{j+1}$ and no feedback. Let $x_t^{(k)}$ be the output vector of the kth layer; then the following holds:

$x_t^{(i)} = \psi(A_i x_t^{(i-1)} + B_i x_{t-1}^{(j)})$

$x_t^{(k)} = \psi(A_k x_t^{(k-1)} + B_k x_{t-1}^{(h_k)}), \quad h_k = \tfrac{1}{2} f_1(k) \le j$

$x_t^{(j+1)} = \psi(A_{j+1} x_t^{(j)})$.  (24)

Let $m_k$ be the size of the kth layer of $N_1$. The network $N_2$ (also with n layers) is defined as follows:

• The jth layer of $N_2$ has size $m_j + M$, for some suitable M, with feedforward matrix $\begin{bmatrix} A_j \\ \tilde{A}_j \end{bmatrix}$ and feedback matrix $\begin{bmatrix} B_j & 0 \\ \tilde{B}_j & 0 \end{bmatrix}$,  (25)

where $A_j \in \mathbb{R}^{m_j \times m_{j-1}}$, $\tilde{A}_j \in \mathbb{R}^{M \times m_{j-1}}$, $B_j \in \mathbb{R}^{m_j \times m_j}$, $\tilde{B}_j \in \mathbb{R}^{M \times m_j}$. Note that, because $f(i) = 2j$, by definition of nesting function the layer j either has no feedback or has a narrow feedback onto itself.

• The $(j+1)$th layer has size $m_{j+1} + m_j$ and feedforward matrix $\begin{bmatrix} A_{j+1} & 0 \\ 0 & \tilde{A}_{j+1} \end{bmatrix}$,  (26)

where $A_{j+1} \in \mathbb{R}^{m_{j+1} \times m_j}$, $\tilde{A}_{j+1} \in \mathbb{R}^{m_j \times M}$, and the feedback matrix is empty because $f(j+1) = 2(j+1) - 1$.

• The $(j+2)$th layer has size $m_{j+2}$, feedforward matrix $[A_{j+2} \mid 0]$ and feedback matrix $B_{j+2}$.  (27)

• The ith layer has size $m_i$, feedforward matrix $A_i$ and feedback matrix $[0 \mid B_i]$,  (28)

with the 0 matrix appearing in the feedback matrix an element of $\mathbb{R}^{m_i \times m_{j+1}}$. Moreover, the ith layer has its feedback path attached to layer $j+1$ instead of layer j.

• All the other layers have the same size and structure as the corresponding layers in $N_1$.

One can easily be convinced that the architecture function of the network $N_2$ so defined is indeed $f_2$. We must show that $N_2$ can compute the same function as $N_1$ within an approximation ε.

To this end, let $y_t^{(j)}$ be the output of the jth layer of $N_2$ at time t. For the sake of simplicity, the outputs of the layers j and j+1 will be divided as

$y_t^{(j)} = \begin{bmatrix} z_t^{(j)} \\ q_t^{(j)} \end{bmatrix}$  (29)

$y_t^{(j+1)} = \begin{bmatrix} z_t^{(j+1)} \\ q_t^{(j+1)} \end{bmatrix}$  (30)

with $z_t^{(j)} \in \mathbb{R}^{m_j}$ and $z_t^{(j+1)} \in \mathbb{R}^{m_{j+1}}$. From the structure (25), we can write the equations for the jth layer as

$z_t^{(j)} = \psi(A_j y_t^{(j-1)} + B_j z_{t-1}^{(j)})$

$q_t^{(j)} = \psi(\tilde{A}_j y_t^{(j-1)} + \tilde{B}_j z_{t-1}^{(j)})$,  (31)

the equations for the $(j+1)$th layer as

$z_t^{(j+1)} = \psi(A_{j+1} z_t^{(j)})$  (32)

$q_t^{(j+1)} = \psi(\tilde{A}_{j+1} q_t^{(j)})$,  (33)

and the equation for the layer j+2 as

$y_t^{(j+2)} = \psi(A_{j+2} z_t^{(j+1)} + B_{j+2} y_{t-1}^{(j+2)})$.  (34)

Note that eqn (33) can be written as

$q_t^{(j+1)} = \psi(\tilde{A}_{j+1} \psi(\tilde{A}_j y_t^{(j-1)} + \tilde{B}_j z_{t-1}^{(j)}))$.  (35)

By Lemma 9.2, it is possible to choose M, $\tilde{A}_{j+1}$, $\tilde{A}_j$ and $\tilde{B}_j$ such that

$\|q_t^{(j+1)} - z_t^{(j)}\| < \delta$

for any specified δ.

If we start with the equality $x_t^{(i-1)} = y_t^{(i-1)}$, whose validity derives from the equality of the two networks up to layer i, we can see, from the equations above, that, if we could take δ = 0 (that is, if we could reproduce exactly the function of layer j+1), then we would have $z_t^{(j)} = x_t^{(j)}$, $z_t^{(j+1)} = x_t^{(j+1)}$, and $y_t^{(j+2)} = x_t^{(j+2)}$.

This is not possible in general but, because the function computed by the network is continuous, we find that

$\forall \epsilon > 0 \ \exists \delta > 0 : \|q_t^{(j+1)} - z_t^{(j)}\| < \delta \Rightarrow \|y_t^{(n)} - x_t^{(n)}\| < \epsilon$.  (36)

If $f(j) = 2h$, the computation is more involved, because the feedback connection of layer i must be moved to layer h, and all the layers between j and h must be augmented, just like layer j+1 in the case discussed here. However, the arguments are the same, and are based on the application of Lemma 9.2. ∎

By reiterated application of Lemma 9.3, we can prove the following theorem.

THEOREM 9.1. If $s_1, s_2 \in \mathcal{A}_n$ and $s_1 \le s_2$, then, for every network $N_1$ such that $\pi_a(N_1) = s_1$, and for every $\epsilon > 0$, there exists a network $N_2$ with $\pi_a(N_2) = s_2$ such that $N_1 \trianglelefteq N_2(\epsilon)$.

This theorem can also be stated in a way that is not related to any particular realization $N_1$ and $N_2$. For $s \in \mathcal{A}_n$, let:

$\mathcal{F}_s^{(\epsilon)} = \{\varphi \in C \mid \exists N \in S_n : \pi_a(N) = s, \ y_t = \varphi(x_t), \ xN \to \tilde{y} \Rightarrow \|y_t - \tilde{y}_t\| < \epsilon\}$.  (37)

Then Theorem 9.1 is equivalent to:

THEOREM 9.2. If $s_1, s_2 \in \mathcal{A}_n$, then, for all ε,

$s_1 \le s_2 \Rightarrow \mathcal{F}_{s_1}^{(\epsilon)} \subseteq \mathcal{F}_{s_2}^{(\epsilon)}$.  (38)

PART II: COMPUTING POWER OF BLOCK FEEDBACK NETWORKS

In Part I, we have seen that BFN architectures make up an algebra and that the partial ordering implied by this algebra corresponds to an increasing computing power of the networks. This leaves open the question of what kind of devices can be built within the BFN framework. More specifically, it leaves open the question of whether the BFN model is universal, that is, whether any computing device can be represented as a BFN.

In this part, we show that the BFN model has the same computing power as a Turing machine. That is, for any Turing-computable function $f : \mathbb{N} \to \mathbb{N}$ there exists a BFN $\mathcal{N}(f)$ such that, considering a finite set $M \subset \mathbb{N}$, for every $n \in M$, if $n$ is given as input to the BFN, then, after a finite number of steps, the output of the network is equal to $f(n)$.

This will be proved by proving the equivalence of the class of BFN-computable functions with the class of μ-recursive functions which, in turn, is known to be equivalent to the class of Turing-computable (T-computable) functions.

10. μ RECURSIVITY AND TURING COMPUTABILITY

Throughout this second part of the paper, we will consider functions defined on the set of integer numbers and taking integer values. This may seem a strong limitation, because one often uses neural networks to approximate functions defined on $\mathbb{R}^n$. However, integer numbers can be used to represent the elements of any set whose cardinality is, at most, $\aleph_0$. This can be done, for example, by using the Gödel numbering (Gödel, 1986). Among the sets whose cardinality is at most $\aleph_0$ there is the set $\mathbb{Q}$ of rational numbers, that is, of the numbers of the form $p/q$, $p, q \in \mathbb{N}$, $q \neq 0$. This set is dense in the set of real numbers, and so $\mathbb{Q}^N$, which also has cardinality $\aleph_0$, is dense in $\mathbb{R}^N$. Therefore, any function $f : \mathbb{R}^N \to \mathbb{R}^M$ can be approximated arbitrarily well by a function $f : \mathbb{Q}^N \to \mathbb{Q}^M$ and, consequently, by a network implementing a function defined on the set of integers.
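As a concrete illustration of this kind of encoding, the sketch below (ours, for illustration only; the function names are not taken from the paper) uses the Cantor pairing function to map a pair $(p, q)$, and hence a rational $p/q$, to a single natural number, which is the sort of integer code a network defined on $\mathbb{N}$ can operate on.

```python
def cantor_pair(p: int, q: int) -> int:
    """Map a pair of naturals (p, q) bijectively to a single natural number."""
    return (p + q) * (p + q + 1) // 2 + q

def cantor_unpair(z: int) -> tuple:
    """Invert cantor_pair."""
    w = int(((8 * z + 1) ** 0.5 - 1) // 2)   # index of the largest triangular number <= z
    q = z - w * (w + 1) // 2
    return w - q, q

# A rational p/q is represented by the integer code of the pair (p, q).
code = cantor_pair(3, 7)                     # encodes 3/7
assert cantor_unpair(code) == (3, 7)
```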

10.1. Primitive Recursive Functions

Let $h_1, \ldots, h_r$ be $n$-ary functions (i.e., $h_i : \mathbb{N}^n \to \mathbb{N}$), $g$ be an $r$-ary function, and $x = (x_1, \ldots, x_n) \in \mathbb{N}^n$. If, for any $x$, it holds that

$$f(x) = g\bigl(h_1(x), \ldots, h_r(x)\bigr), \qquad (39)$$

the function $f$ is said to be obtained from $g$ by substitution of $h_1, \ldots, h_r$.

Let $g$ be an $n$-ary function, and $h$ an $(n+2)$-ary function. If, for any $x \in \mathbb{N}^n$, $y \in \mathbb{N}$, it holds that

$$f(x, 0) = g(x), \qquad f(x, y') = h\bigl(x, y, f(x, y)\bigr) \qquad (40)$$

(where $y'$ is the successor of $y$), then $f$ is said to be defined by induction from $g$ and $h$.

We also consider three elementary functions, whose definitions are self-contained:
1. the successor function, which, for $x \in \mathbb{N}$, has value $S(x) = x' = x + 1$;¹
2. the identity functions $U_i^n$, $1 \le i \le n$, defined as $U_i^n(x_1, \ldots, x_n) = x_i$ for any $x_1, \ldots, x_n$;
3. the 0-ary constant $C^0$ (whose value is 0).

DEFINITION 10.1. A function is said to be primitive recursive if it is one of the functions 1, 2, 3 above, or if it is obtained by finite application of induction and substitution to the functions 1, 2, 3 above.

To state it differently, the function $f$ is primitive recursive if it is obtained by applying substitution or induction to primitive recursive functions. For instance, if the function $h$ in eqn (40) is primitive recursive, so is $f$.
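The two schemes can be paraphrased in a short Python sketch (ours; the helper names substitution and induction are merely illustrative). Addition, defined by induction from the identity and the successor exactly as in footnote 1, serves as a check.

```python
def successor(x):             # S(x) = x'
    return x + 1

def identity(i):              # U_i^n: projection onto the i-th argument (1-based)
    return lambda *xs: xs[i - 1]

def substitution(g, *hs):     # f(x) = g(h_1(x), ..., h_r(x)), eqn (39)
    return lambda *xs: g(*(h(*xs) for h in hs))

def induction(g, h):          # f(x, 0) = g(x); f(x, y') = h(x, y, f(x, y)), eqn (40)
    def f(*args):
        *x, y = args
        acc = g(*x)
        for k in range(y):
            acc = h(*x, k, acc)
        return acc
    return f

# The sum, defined by induction as in footnote 1: f(x, 0) = x, f(x, y') = S(f(x, y)).
add = induction(identity(1), lambda x, y, fxy: successor(fxy))
assert add(3, 4) == 7

# Substitution example: double(x) = add(U_1(x), U_1(x)).
double = substitution(add, identity(1), identity(1))
assert double(5) == 10
```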

A similar definition holds for predicates on $n$-tuples of integer numbers.

DEFINITION 10.2. An $n$-ary predicate $P$ ($n \ge 1$) is primitive recursive iff there exists an $n$-ary primitive recursive function $f$ such that

¹ In the definition of $S(x)$ we have used the sum $x + 1$. This may lead to the impression that $S(x)$ is defined in terms of the sum. Actually, the sum was used only as a shorthand. The successor should be considered as a primitive function, not defined in terms of anything else, as in Peano's axioms. The sum is a function defined by induction in terms of it. If $x + y$ is regarded as a function of $y$, parametrized by $x$, then

$$f(x, 0) = x, \qquad f(x, y') = S\bigl(f(x, y)\bigr).$$


$$Px \;\Leftrightarrow\; f(x) = 0; \qquad (41)$$

$f$ is called the characteristic function of the predicate $P$.

Given an $(n+1)$-ary predicate $Pxy$, the μ operator applied to $P$, $\mu y\,Pxy$, gives the lowest value $y$ for which $Pxy$ holds. Note that the μ operator gives a result only if there exists at least one $y$ such that $Pxy$. A predicate $Pxy$ is regular if such a $y$ exists for all $x$. Similarly, the primitive recursive function $f(x, y)$ is said to be regular if, for any $x$, there exists at least one $y$ such that $f(x, y) = 0$.

If $g(x, y)$ is a regular function, the function $f(x)$ is obtained by application of the μ operator if:
1. for any $x$ there exists at least one $y$ such that $g(x, y) = 0$, and
2. $f(x)$ is the smallest $y$ such that $g(x, y) = 0$.

DEFINITION 10.3. A function $f$ is said to be μ-recursive if it is obtained by repeated application of
1. substitution,
2. inductive definition,
3. the μ operator (applied to regular functions)
to the three elementary functions.
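A minimal sketch (ours) of the μ operator under the regularity assumption: the unbounded search below terminates for every x precisely because a zero of g is assumed to exist.

```python
def mu(g):
    """Return f(x) = the least y with g(x, y) = 0.

    The search is unbounded; it terminates for every x only if g is regular,
    i.e., if such a y always exists (the operator is applied to regular
    functions only).
    """
    def f(*x):
        y = 0
        while g(*x, y) != 0:
            y += 1
        return y
    return f

# Example: integer square root as the least y with (y + 1)^2 > x, written as a zero test.
isqrt = mu(lambda x, y: 0 if (y + 1) ** 2 > x else 1)
assert isqrt(10) == 3
```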

The importance of the definition of μ computability lies in the following theorem (Hermes, 1969).

THEOREM 10.1. Any Turing-computable function is μ-recursive, and any μ-recursive function is Turing computable.

11. BUILDING UP ELEMENTARY TOOLS

Our proof of the equivalence between BFN computability and μ recursivity is based upon the construction of a BFN that computes an arbitrarily selected μ-recursive function. We do this by developing networks that implement the elementary functions, the substitution and induction schemes, and the μ operator.

All these networks must be integrated together to make up a single complex network that computes the required function. Integration is done by a pair of neurons that we call the "go" and the "done" neurons. Suppose the network $N_1$ has to perform some operation which will, in general, take several time steps to be completed. The network $N_1$ needs to know when to start computing, and the environment of $N_1$ needs to know when $N_1$ has finished and its results are ready to be used. To do this, we add to the network $N_1$ an extra input neuron, which we call the go neuron, and an extra output neuron, which we call the done neuron (see Figure 5). We assume that the inputs to the network remain constant while go is active, and that go remains active at least until done becomes active. This means that the network $N_1$ "senses" constant inputs and constant activations throughout its computation. This will not constitute a limitation, thanks to the memory block we will develop shortly.

FIGURE 5. The "go" and "done" organization of subnetworks. The subnetwork illustrated receives the input data and a "go" signal. During its work, the data are kept constant. When the network finishes, it outputs the results and activates the "done" signal.

11.1. Number Representation and Output Functions

To develop network-implemented functions taking values on the set of integer numbers (and, by extension, on any set whose cardinality is at most $\aleph_0$), we need to define a representation of integer numbers in terms of neurons. Throughout this paper, we will use the binary representation, with every neuron representing either a 0 or a 1 value. The numbers from 0 to $n - 1$ can be represented in this way using $\lceil \log_2 n \rceil$ neurons. Of course, any other consistent representation would do as well.

To compute any T-computable function, for any value of the argument, we would need an infinite number of neurons. This is a general property of the number representation, and in a sense it corresponds to the "infinite tape" condition for Turing machines. To ease the development, we will restrict ourselves to a $B$-bit representation of numbers. Of course, $B$ depends on the interval where the functions have to be computed but, for every finite interval $I \subset \mathbb{N}$ assumed as the domain of any given T-computable function $f$, there is a $B$ such that a BFN with a $B$-bit representation can compute $f(n)$ for all $n \in I$.

As far as the neuron output function is concerned, we will follow the dominant trend in the MLP literature and assume the sigmoid function

$$\sigma(x) = \frac{1}{1 + \exp\bigl(-\beta(x - \theta)\bigr)}. \qquad (42)$$

The use of this function gives rise to some approximation issues. We assume a representation made up of 0 and 1 values, but the value of $\sigma(x)$ always lies in the open interval $(0, 1)$.² This produces a representation error, which propagates through the network, invalidating the results obtained.

² This problem cannot be avoided when dealing with finite-codomain functions. Had the range of the output function been the closed interval $[0, 1]$, its first derivative would have had to be 0 over a finite interval of the argument. But any finite interval with 0 derivative makes gradient descent methods, and thus the algorithm in Santini et al. (1991), inapplicable.

In any case, for any network with $M$ neurons, it is possible to adjust the slope $\beta$ of the output functions [eqn (42)] so that, while retaining the continuity of the function, the error is less than any arbitrarily set positive number. We assume that, for any network we build, the appropriate $\beta$ has been chosen to make the representation error negligible (i.e., less than an arbitrary value $\epsilon$). The output function with this characteristic will be referred to as $\sigma_\beta$. When the threshold value $\theta$ of eqn (42) needs to be represented, it will be added as a superscript, as in $\sigma_\beta^{(\theta)}$. We also define $\omega$ as an arbitrary real value such that $\sigma_\beta(\omega) > 1 - \epsilon$ and $\sigma_\beta(-\omega) < \epsilon$.
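As a numerical illustration (ours, under the parametrization of eqn (42) given above), a moderate slope already satisfies the two saturation conditions for a prescribed ε:

```python
import math

def sigma(x, beta, theta=0.0):
    # Logistic output function with slope beta and threshold theta, as in eqn (42).
    return 1.0 / (1.0 + math.exp(-beta * (x - theta)))

eps, omega = 1e-3, 1.0
beta = 1.1 * math.log((1.0 - eps) / eps) / omega   # slope chosen with a 10% margin
assert sigma(omega, beta) > 1.0 - eps              # "high" activations are within eps of 1
assert sigma(-omega, beta) < eps                   # "low" activations are within eps of 0
```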

In the following we will also use the notation $I^{n\times n}$ for the identity matrix and $e_i$ for the vector made up of all zeros, with a "1" in the $i$th position.

11.2. Some Basic Blocks

In this subsection we develop some specialized neural blocks that will be useful in the following to build more complex networks.

Some of these blocks only have to perform a static mapping from input to output. Due to the approximation theorem (Cotter, 1990; Hornik, 1991), a three-layer feedforward network can perform any required operation (subject to some regularity constraints) with any degree of accuracy.

Thanks to this theorem, we can use a feedforward network to build each of the following blocks, where the inputs and the outputs are supposed to be binary representations of integer numbers:
1. the successor network $\mathcal{S}_+$, such that $\forall x$, $x\,\mathcal{S}_+\,y \Rightarrow y = x'$;
2. the predecessor network $\mathcal{S}_-$, such that $\forall x > 0$, $x\,\mathcal{S}_-\,y \Rightarrow x = y'$;
3. the zero network $\mathcal{Z}$, such that $x\,\mathcal{Z}\,y \Rightarrow (y = 1 \Leftrightarrow x = 0)$;
4. the not zero network $\bar{\mathcal{Z}}$, such that $x\,\bar{\mathcal{Z}}\,y \Rightarrow (y = 1 \Leftrightarrow x \neq 0)$;
5. the unit network $\mathcal{U}$, such that $x\,\mathcal{U}\,y \Rightarrow y = x$.

We now use these feedforward blocks to build less straightforward basic tools.

The 0-enable block has $B + 1$ inputs and $B$ outputs. The first $B$ inputs carry an integer value $n$; the $(B+1)$th input carries a control signal $e$. If $e$ is zero, the output is equal to $n$; if $e$ is 1, the output is zero. The 0-enable network is described by

$$\mathcal{E}_0 = \bigl[\,2\omega I^{B\times B} \;\; -2\omega\mathbf{1}\,\bigr]^{(\omega)}. \qquad (43)$$

The 1-enable block is the dual of the 0-enable block. The only difference is that this block allows the $n$ value to be transferred to the output when the $e$ input is equal to 1:

$$\mathcal{E}_1 = \bigl[\,2\omega I^{B\times B} \;\; 2\omega\mathbf{1}\,\bigr]^{(3\omega)}. \qquad (44)$$
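The behavior of the two enable blocks can be checked directly, using a saturated sigmoid and weight and threshold choices consistent with the description above (a sketch of ours; the slope value is arbitrary):

```python
import math

def sigma(x, beta=50.0):
    return 1.0 / (1.0 + math.exp(-beta * x))

w = 1.0   # the saturation value omega

def enable0(bits, e):
    # 0-enable: output equals n when e = 0, and is all zeros when e = 1.
    return [round(sigma(2 * w * b - 2 * w * e - w)) for b in bits]

def enable1(bits, e):
    # 1-enable: output equals n when e = 1, and is all zeros when e = 0.
    return [round(sigma(2 * w * b + 2 * w * e - 3 * w)) for b in bits]

n = [1, 0, 1, 1]
assert enable0(n, 0) == n and enable0(n, 1) == [0, 0, 0, 0]
assert enable1(n, 1) == n and enable1(n, 0) == [0, 0, 0, 0]
```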

The pulse block has one input and one output. When the input steps to "1," the network produces a one-tick-wide output pulse, then returns to zero, and there it rests until the input has been reset to 0 and set to 1 again. This network is the neural version of the monostable logic circuit; it is represented in Figure 6, and its description is

(45)
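Behaviorally, the pulse block is an edge detector with a one-step memory. The sketch below (ours; the two-neuron layer and its weights are illustrative, not taken from eqn (45)) reproduces the described behavior: a sustained "1" at the input yields a single one-tick output pulse.

```python
import math

def sigma(x, beta=50.0):
    return 1.0 / (1.0 + math.exp(-beta * x))

w = 1.0

def pulse_run(u_sequence):
    m_prev, out = 0.0, []
    for u in u_sequence:
        p = sigma(2 * w * u - 3 * w * m_prev - w)   # fire iff u = 1 and the delayed input is 0
        m_prev = sigma(2 * w * u - w)               # one-step-delayed copy of the input
        out.append(round(p))
    return out

assert pulse_run([0, 1, 1, 1, 0, 1, 1]) == [0, 1, 0, 0, 0, 1, 0]
```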

The reverse net is a simple layer that, when a binary number is presented at its input, reverses all its bits:

$$\mathcal{R} = \bigl[\,-2\omega I^{B\times B}\,\bigr]^{(-\omega)}. \qquad (46)$$

The store block has $B + 1$ inputs and $B$ outputs. The first $B$ inputs contain the representation of an integer number $n$, whereas the $(B+1)$th carries a control input $e$. When $e$ is equal to 1, the input $n$ is transferred to the block's output. When $e$ is equal to 0, the value of $n$ currently in output is retained. The store network is represented in Figure 7 and described by

$$\bigl[\,\bigl(3\omega I^{B\times B}\;\; -3\omega I^{B\times B}\bigr) \;\big|\; 2\omega I^{B\times B}\,\bigr]. \qquad (47)$$
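Functionally, the store block is a B-bit latch. A behavioral sketch (ours), at the level of bits rather than of neurons and weights:

```python
def store_run(steps):
    """steps: sequence of (bits, e); returns the block's output after each step."""
    held, trace = [0, 0, 0, 0], []
    for bits, e in steps:
        if e:                        # e = 1: the input is transferred to the output
            held = list(bits)
        trace.append(list(held))     # e = 0: the current output value is retained
    return trace

steps = [([0, 1, 1, 0], 1), ([1, 1, 1, 1], 0), ([1, 0, 0, 1], 1), ([0, 0, 0, 0], 0)]
assert store_run(steps)[-1] == [1, 0, 0, 1]
```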

The back count block has $B + 2$ inputs: a $B$-bit integer number, a load signal, and a count input. It "loads" a value into its memory element when the load signal is active and then, for each pulse at the count signal, decrements

FIGURE 6. The network that sends a single pulse when activated.


FIGURE 7. Network that stores a value. The value to be stored is given in input to the enable blocks. When a pulse is applied, the value is moved to the output, and there it remains until a new pulse is sent to store a new value.

the value until it arrives at 0. The actions taken by the back count block after its output reaches 0 are undefined. Its description is

(48)

with $Q = \mathrm{block\;diag}(5\omega, -5\omega, 5\omega, \omega)$ and

$$A = \bigl[\,5\omega I^{B\times B}\;\; -5\omega I^{B\times B}\;\; 0\;\; \omega I^{B\times B}\;\; 0\;\; 0\,\bigr]. \qquad (49)$$

Note that a forward counter $\mathcal{C}_+$ can be obtained simply by substituting the block $\mathcal{S}_-$ in eqn (48) with a block $\mathcal{S}_+$.
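A behavioral model (ours) of the back count block and of the forward counter derived from it; the load/count interface follows the description above, while the internal weights of eqns (48)-(49) are abstracted away. Here the value is simply clamped at 0, although the block's behavior past 0 is left undefined in the text.

```python
class Counter:
    """Behavioral model of the back count block (step = -1) or forward counter (step = +1)."""

    def __init__(self, step=-1):
        self.step, self.value = step, 0

    def tick(self, n=0, load=0, count=0):
        if load:                                           # "load": store n in the memory element
            self.value = n
        elif count and (self.step > 0 or self.value > 0):  # one "count" pulse: one increment/decrement
            self.value += self.step
        return self.value

back = Counter(step=-1)
back.tick(n=3, load=1)
assert [back.tick(count=1) for _ in range(4)] == [2, 1, 0, 0]
```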

12. COMPUTING μ-RECURSIVE FUNCTIONS

In this section we will show that, for any μ-recursive function $f : \mathbb{N} \to \mathbb{N}$, there exists a BFN that computes $f(n)$ for all $n < M$, $M$ being an arbitrary integer number.

Let $F$ be the class of all BFN-computable functions. To prove our assertion we have to show that:
1. The functions $S(x)$, $U_i^n$, and $C^0$ are in $F$.
2. If $g \in F$, $h_i \in F$, $i = 1, \ldots, r$, and $f$ is obtained from $g$ by substitution of $h_1, \ldots, h_r$, then $f \in F$.
3. If $h \in F$, $g \in F$, and $f$ is obtained by induction from $g$ and $h$, then $f \in F$.
4. If $P$ is an $(n+1)$-ary predicate such that $Pxy \Leftrightarrow q(x, y) = 0$, $q \in F$, and $f(x) = \mu y\,Pxy$, then $f \in F$.

The first point can be worked out by feedforward neural networks. The successor $S(x)$ can be implemented (under the hypothesis of binary representation of integer numbers) by a suitable network of NAND circuits. The NAND circuit can be put in canonical form, and the layers of the canonical form can be mapped onto the layers of the feedforward neural network.
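For instance, a B-bit successor reduces to a ripple-carry increment, each stage of which uses only XOR and AND (and hence NAND) logic and therefore maps onto a fixed feedforward layer. A sketch of the arithmetic (ours; the canonical-form construction itself is not reproduced here):

```python
def successor_bits(bits):
    """Increment a B-bit number given as a list of 0/1 values, least significant bit first."""
    out, carry = [], 1              # adding 1: the initial carry is 1
    for b in bits:
        out.append(b ^ carry)       # sum bit = b XOR carry
        carry = b & carry           # carry   = b AND carry
    return out                      # the carry out of the last bit is dropped

def to_bits(n, B):
    return [(n >> i) & 1 for i in range(B)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

assert from_bits(successor_bits(to_bits(11, 5))) == 12
```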

The unit functions $U_i^n$ can be implemented by a feedforward layer such as

(50)

with

$$B = \bigl[\,0^{B\times B}\;\cdots\;0^{B\times B}\;\; 2\omega I^{B\times B}\;\; 0^{B\times B}\;\cdots\;0^{B\times B}\,\bigr], \qquad (51)$$

where the block $2\omega I^{B\times B}$ occupies the $i$th position and the zero blocks occupy positions $1, \ldots, i-1$ and $i+1, \ldots, n$.

The 0-ary constant can be computed by a neuron with all weights at $-\omega$.

To be inserted in the network architectures we design, these functions need to accept a "go" signal and to yield a "done" signal when the computation is finished. Let $N$ be a feedforward network that computes one of the elementary functions; $N$ can be extended so that it performs the same computation with the required control signals.

Now, let $\mathcal{H}_1, \ldots, \mathcal{H}_r$ be $r$ BFNs that compute the functions $h_1, \ldots, h_r$, and $\mathcal{G}$ the BFN that computes the function $g$. It is apparent that

$$\mathcal{G} \cdot \begin{pmatrix} \mathcal{H}_1 \\ \vdots \\ \mathcal{H}_r \end{pmatrix} \qquad (53)$$

computes the function $f$ obtained from $g$ by substitution of $h_1, \ldots, h_r$.

Let us now consider the induction scheme. This is quite a complex realization, and it may be better understood if decomposed into pieces. Let us begin by considering only the second of eqn (40), and assume $f(x, 0) = 0$.

We have a network $\mathcal{H}$ that computes the function $h$. The existence of such a network is guaranteed by point 2 above, because $h$ is supposed to be BFN computable.

FIGURE 8. Network that implements the induction scheme. This network assumes no initialization for the function obtained from the induction scheme. If f is the function obtained by induction on h, then it is assumed that f(0) = 0.


FIGURE 9. Network that implements the induction scheme. With initialization f(0) = g(x).

We claim that the network of Figure 8 solves the problem. This network is described by

(54)

In fact, let $\mathcal{N}$ be the network (54) and set $x_t = [\,x \mid n_t \mid d_t \mid e_t \mid h_{t-1}\,]$.

Consider the four blocks separately. If we issue the "$e$" signal to the enable blocks $\mathcal{E}_0$ and $\mathcal{E}_1$, then

$$x_t\,\mathcal{N}\,y_t \;\Rightarrow\; y_t = [\,x \mid n_t - 1 \mid 1 \mid h_{t-1}\,]. \qquad (55)$$

Therefore, $n_t$ runs from $n$ to 0, while successive values of $h$ are fed into the $\mathcal{H}$ block. Note that $n_t$ is not fed back and therefore does not enter the actual computation. It is used, in conjunction with the block $\mathcal{Z}$, only to detect the end of the computation and issue the done signal.

The insertion of the first of eqn (40) can be easily achieved by forcing, at the initial step, the value $g(x)$ into the last $\mathcal{U}$ block of Figure 8, that is, the feedback layer where the value of $f(n-1)$ is taken back. Suppose we have a network that computes $g$ (again, this is allowed by the assumption we make that $g$ is BFN computable) and uses the same "go-done" signal scheme used for $h$. We must design a network that takes as input $x$, $n$, and the "go" signal, and yields as output $x$, $n$, $g(x)$, and a "done" signal to state the end of the computation of $g$. If $\tilde{\mathcal{G}}$ is the network that performs this operation, then

$$\tilde{\mathcal{G}} = \begin{pmatrix} \mathcal{U} \\ \mathcal{G} \end{pmatrix}, \qquad (56)$$

where $\mathcal{G}$ is the network that computes $g$. To attach this to the induction network, we substitute the last $\mathcal{U}$ block in the first layer of Figure 8 with a slightly different block, made up of the single feedback layer

(57)

The result of this process is the network

(58)

which implements the induction scheme. This network is depicted in Figure 9.
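The operation of the networks of Figures 8 and 9 can be paraphrased as follows (a behavioral sketch of ours; h and g stand for the sub-networks computing h and g, and the explicit counter and zero test stand in for the corresponding blocks of the network):

```python
def induction_network(h, g=lambda *x: 0):
    """Behavioral model of Figures 8/9: f(x, 0) = g(x), f(x, y') = h(x, y, f(x, y))."""
    def f(x, n):
        value = g(x)              # initialization forced into the feedback layer (Figure 9)
        counter, k = n, 0         # the counter runs from n down to 0
        while counter > 0:        # the zero test raises "done" when the counter reaches 0
            value = h(x, k, value)
            counter, k = counter - 1, k + 1
        return value
    return f

# Example: f(x, 0) = 1, f(x, y') = (y + 1) * f(x, y), i.e., the factorial of the second argument.
fact = induction_network(h=lambda x, y, fxy: (y + 1) * fxy, g=lambda x: 1)
assert fact(0, 5) == 120
```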

FIGURE 10. Network that implements the μ operator.


We finally prove the closure of the set $F$ with respect to the application of the μ operator to regular functions. To show this, let $\mathcal{F}$ be the network that computes the (BFN-computable) function $f(x, y)$. The network

$$(\mathcal{C}_+\;\;0\;\;0)\cdot(\mathcal{S}_+)\,\bigl\{(\mathcal{U})\cdot\bigl((\mathcal{F}\cdot\mathcal{Z}\cdot\mathcal{R})\bigr)\bigr\}, \qquad (59)$$

depicted in Figure 10, computes $\mu y\,Pxy$. Please note that if there is no $y$ such that $Pxy$, the network runs forever. This case, however, has been explicitly excluded, because we impose that $P$ have a regular characteristic function.

Packing together the above observations, we have the proof of the following theorem.

THEOREM 12.1. For any Turing-computable function $f$ and for every integer number $M$, there exists a BFN $\mathcal{N}$ such that

$$\forall n < M : n\,\mathcal{N}\,f(n). \qquad (60)$$

13. CONCLUSIONS

In this paper, we have discussed some architectural properties of the BFN model.

In the first part of the paper, we built two different quotient sets in the set of BFN networks: the set $\mathcal{A}_n$ of architectures and the set $\mathcal{D}_n$ of dimensions. We have shown that these two partitions are orthogonal and that a couple $(a, s)$, $a \in \mathcal{A}_n$, $s \in \mathcal{D}_n$, completely specifies a network structure.

Both the sets $\mathcal{A}_n$ and $\mathcal{D}_n$ can be endowed with a lattice structure, and we have determined the properties of these structures.

We have also shown that the ordering in the architectures implies an ordering in the computing capacity of the networks.

In the second part, we have considered the problem of "how far" this increase in computing capacity can lead us. We have shown that the BFN model is as powerful as μ recursivity, which, in turn, has the same power as the Turing machine.

REFERENCES

Blum, E. K., & Li, L. K. (1991). Approximation theory and feedforward networks. Neural Networks, 4, 511-515.

Cotter, N. E. (1990). The Stone-Weierstrass theorem and its application to neural networks. IEEE Transactions on Neural Networks, 1(4), 290-295.

Del Bimbo, A., Landi, L., & Santini, S. (1992). Dynamic neural estimation for autonomous vehicles driving. In Proceedings of the 11th International Conference on Pattern Recognition, The Hague, The Netherlands.

Del Bimbo, A., Landi, L., & Santini, S. (1993). Determination of road directions using feedback neural networks. Signal Processing, 32(1-2), 147-160.

Fahlman, S. E., & Lebiere, C. (1989). The cascade-correlation learning architecture. In D. S. Touretzky (Ed.), Advances in neural information processing systems 2 (pp. 524-532). San Mateo: Morgan Kaufmann.

Gödel, K. (1986). On formally undecidable propositions of Principia Mathematica and related systems I. In Kurt Gödel: Collected works. New York: Oxford University Press.

Grätzer, G. (1971). Lattice theory. A Series of Books in Mathematics. New York: W. H. Freeman and Company.

Hecht-Nielsen, R. (1989). Theory of the backpropagation neural network. In Proceedings of the International Joint Conference on Neural Networks, pp. I-593-I-605.

Hermes, H. (1961 (1969)). Aufzählbarkeit, Entscheidbarkeit, Berechenbarkeit (English version: Enumerability, Decidability, Computability). New York: Springer-Verlag.

Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4, 251-257.

Karnin, E. D. (1990). A simple procedure for pruning back-propagation trained neural networks. IEEE Transactions on Neural Networks, 1(2), 239-242.

Moody, J. E. (1992). The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. In J. E. Moody, S. J. Hanson, & R. P. Lippmann (Eds.), Advances in neural information processing systems, 4. San Mateo, CA: Morgan Kaufmann.

Mozer, M. C., & Smolensky, P. (1989). Skeletonization: A technique for trimming the fat from a network via relevance assessment. In D. S. Touretzky (Ed.), Advances in neural information processing systems 1 (pp. 107-115). San Mateo: Morgan Kaufmann.

Murata, N., Yoshizawa, S., & Amari, S.-i. (1992). Network information criterion: Determining the number of hidden units for an artificial network model (Tech. Rep.). Department of Mathematical Engineering and Information Physics, University of Tokyo, Bunkyo-ku, Tokyo 113, Japan.

Narendra, K. S., & Parthasarathy, K. (1990). Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks, 1(1), 4-27.

Santini, S., & Del Bimbo, A. (1993). Block feedback neural networks are universal computers (Tech. Rep. TR 19/93). Dipartimento di Sistemi e Informatica, Università di Firenze.

Santini, S., Del Bimbo, A., & Jain, R. (1991). An algorithm for training neural networks with arbitrary feedback structure (Tech. Rep. TR 10/91). Dipartimento di Sistemi e Informatica, Università di Firenze.

Santini, S., Del Bimbo, A., & Jain, R. (1995). Block-structured recurrent neural networks. Neural Networks, 8(1), 135-147.

Schwarze, H., & Hertz, J. (1992). Generalization in fully connected committee machines (Tech. Rep. CONNECT). The Niels Bohr Institute and Nordita, Blegdamsvej 17, DK-2100 Copenhagen Ø, Denmark.