ON SOLVING SYSTEMS OF LINEAR INEQUALITIES
WITH ARTIFICIAL NEURAL NETWORKS
Gilles Labonté
Department of Mathematics and Computer Science
Royal Military College of Canada
Kingston, Ontario, K7K 5L0 Canada
Abstract. The implementation of the relaxation-projection algorithm by artificial
neural networks to solve sets of linear inequalities is examined. The different
versions of this algorithm are described, and theoretical convergence results are
given. The best known analogue optimization solvers are shown to use the
simultaneous projection version of it. Neural networks that implement each
version are described. The results of tests, made with simulated realizations of
these networks, are reported. These tests consisted in having all networks solve
some sample problems. The results obtained help determine good values for
the step size parameters, and point out the relative merits of the different
networks.
1. INTRODUCTION
The problem of solving a system of linear inequalities arises in numerous
applications. It is omnipresent in optimization, where it is solved by itself, or
concurrently with the problem of finding the minimum value of a cost function, as
with the simplex algorithm [6], or as a preliminary step for interior point methods
(see for example, Chapter 5 of [9]).
The particular method of solution of this problem called the relaxation-
projection method is the main object of the present article. The original research
on this method was carried out, around 1954, by S. Agmon [1], T.S. Motzkin and
I.J. Schoenberg [17]. Associating the inequalities to half-spaces, in which lie the
points corresponding to the feasible solutions, they proved that such a point can
be reached, from an arbitrary outside point, by constructing a trajectory of
straight line segments, each of which is in the direction of one of the half-spaces
corresponding to violated constraints.
Many neural network training procedures consist, or are based upon, a
method very closely related to that algorithm. For example, the single layer
perceptron training method is such a process. Notwithstanding this fact, it
seems that F. Rosenblatt [21], H.D. Block [3], and the many others who
contributed to its proof of convergence were unaware of the work of Agmon,
Motzkin and Schoenberg, since no mention of it can be found in their writings.
Recently, H. Oh and S.C. Kothari [18,19], in their study of neural networks
used as bidirectional associative memory, realized the usefulness of these
results and, in effect, proposed using a particular version of the relaxation-
projection algorithm to calculate directly the weights of the neurons.
Even though the mathematical results concerning these algorithms are
clearly very pertinent to the field of artificial neural networks, they seem to have
gone very much unnoticed by the researchers in that field, until their use by Oh
and Kothari. Thus, one of our aims, in the present article, is to draw attention to
the most important results concerning these methods. We shall describe the
different versions of the relaxation-projection algorithm, known as the maximal
distance, the maximal residual, the systematic, the general recurrent and the
simultaneous relaxation-projection methods. We shall also give definite
theorems concerning the step size parameters for which convergence, and even
termination in a finite number of steps, is guaranteed.
After having done so, we shall look at some of the best known analogue
optimization networks, namely those of L.O. Chua and G.N. Lin [5] and of M.P.
Kennedy and L.O. Chua [13], of D.W. Tank and J.J. Hopfield [23], and of A.
Rodríguez-Vázquez et al. [20]. We shall demonstrate that they are all making
use of a continuous time version of the simultaneous relaxation-projection
algorithm.
We shall then show neural networks which implement each of the
different versions of the relaxation-projection algorithm. We shall give the
number of units of time needed to perform one step, and the formulas for the
number of neurons these networks require, in terms of the number of variables
and the number of inequalities to solve. We shall describe two types of
implementations, one with fixed weights, and one with weights varying according
to Hebb's rule.
Finally, we report on tests we made with simulated realizations of all these
networks. These tests consisted in having each network solve a set of fifteen
small problems, with from two to six variables, and from four to sixteen
inequalities, and one somewhat larger problem with twenty variables and
thirty-five inequalities. Different step size parameters were used, so that we can
determine good values to use for these parameters, as well as compare the
relative merits of the different networks.
1.1 Notation
We consider the problem of finding a vector x ∈ Rn such that Ax + b ≥ 0,
where A is a constant m×n matrix and b is a constant vector in Rm. If we let ai
denote the transpose of the i'th row of A, these inequalities can be written as
wi(x) = <ai , x> + bi ≥ 0 for i=1,...,m (1)
where < , > is the Euclidean scalar product. We assume that no ai is the zero
vector.
Define the closed half-space hi and its bounding hyperplane πi as
hi = { x : wi(x) ≥ 0 } ,  πi = { x : wi(x) = 0 } . (2)
ni = ai / |ai| is then the unit normal to πi that points inward of hi. A point x is "on
the right side" of πi if it is in hi; otherwise, it is "on the wrong side" of it. The
Euclidean distance between point x and hyperplane πi is
dist(x, πi) = εi (<ni , x> + βi) (3)
where βi = bi / |ai|, and εi = 1 if x is on the right side of πi and -1 if it is on its
wrong side. The distance between point x and the half-space hi is dist(x, hi) =
dist(x, πi) if x ∉ hi, and zero if x ∈ hi. The solutions to the system of inequalities
correspond to the points of the convex polytope Ω, defined as the intersection of
all half-spaces hi. We shall assume hereafter that Ω is non-empty.
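The notation above can be made concrete with a short numerical sketch (not from the paper; the matrix A, vector b and test points are illustrative):

```python
import numpy as np

# Illustrative system: x >= 0, y >= 0, x + y <= 1.5.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])   # rows are the a_i^t
b = np.array([0.0, 0.0, 1.5])

def residuals(x):
    """The linear forms w_i(x) = <a_i, x> + b_i of Ineqs. (1)."""
    return A @ x + b

def dist_to_halfspace(x):
    """dist(x, h_i): zero when w_i(x) >= 0, else the distance to the plane pi_i."""
    w = residuals(x)
    norms = np.linalg.norm(A, axis=1)      # |a_i|
    return np.where(w >= 0, 0.0, -w / norms)

assert np.all(residuals(np.array([0.5, 0.5])) >= 0)   # a point of Omega
d = dist_to_halfspace(np.array([2.0, -1.0]))          # violates y >= 0 only
```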
1.2 Methods of Solution
Essentially all optimization methods, except of course those that require
starting from a feasible solution, will solve the feasibility problem when the true
cost function is set to zero. This is particularly straightforward to implement for
methods which use an objective function consisting of two terms: a term for
the actual cost to be minimized, and a penalty term for the unsatisfied
constraints. We recall (see, for example, Chapter 5 of Ref. [9]), that the penalty
functions that are most commonly used in optimization are the two functions F1
and F2 defined by:
F1(x) = ∑i∈I(x) ηi |<ai , x> + bi|  and  F2(x) = ∑i∈I(x) ηi (<ai , x> + bi)² (4)
where I(x) is the set of the indices of the constraints which are violated by x, and
ηi are some positive constants.
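As an illustration, the penalty functions of Eq. (4) can be sketched as follows (illustrative constraints, with all ηi = 1):

```python
import numpy as np

# Illustrative constraints: x >= 1 and y >= 1.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, -1.0])

def F1(x):
    """Sum of |w_i(x)| over the violated constraints only (eta_i = 1)."""
    w = A @ x + b
    return np.sum(np.abs(w[w < 0]))

def F2(x):
    """Sum of w_i(x)^2 over the violated constraints only (eta_i = 1)."""
    w = A @ x + b
    return np.sum(w[w < 0] ** 2)

x = np.array([0.0, 0.5])
# w(x) = (-1.0, -0.5): both constraints violated, so F1 = 1.5 and F2 = 1.25
```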
There are also algorithms which have been developed especially for the
solution of the feasibility problem. This is the case for the relaxation-projection
method of S. Agmon [1], T.S. Motzkin and I.J. Schoenberg [17], mentioned
above, and for the simultaneous relaxation-projection method, proposed more
recently by Y. Censor and T. Elfving [4]. This latter method is a variant of the
former, in which the steps of the iteration sequence are made in the direction of
an average of the normals toward all the half-spaces of the violated constraints.
This method is remarkable in that, as proved by A.R. De Pierro and A.N. Iusem
[7], even for inconsistent problems, it will produce a point for which the weighted
average of the squares of the distances to the half-spaces, i.e. the value of the
function F2, is minimum.
2. THE RELAXATION-PROJECTION ALGORITHMS
Define the operators T(hi), i=1,...,m, such that
T(hi) x = x  if x ∈ hi
T(hi) x = x + [λi dist(x, πi) + ρi] ni  if x ∉ hi (5)
where λi and ρi are non-negative constants. Define also the operator T:
T = ∑i=1,...,m γi T(hi) ,  where each γi > 0 and ∑i=1,...,m γi = 1 . (6)
Single-Plane Algorithm. Define an infinite sequence of half-spaces Hν by
repeating elements of the set {hi , i=1,...,m}, as prescribed below. Take an
arbitrary x0 in Rn and, as long as xν ∉ Ω, define inductively xν+1 = T(Hν) xν.
How the sequence Hν is defined characterizes different versions of this
algorithm. Some often considered choices are
1) The maximal distance algorithm, for which Hν is the half-space farthest away
from xν, or any one of them, if more than one is at the largest distance.
2) The maximal residual algorithm, for which Hν = hi if wi(xν) is the linear form in
the set of Ineqs. (1) which has the smallest negative value, or any one of them, if
more than one has the smallest value.
3) The systematic projection algorithm, for which Hν is the infinite cyclic
sequence with Hν = hi for ν = i (mod m).
4) The general recurrent projection algorithm, for which the infinite sequence of
half-spaces Hν is arbitrary except for the requirement that any one half-space
hi must reoccur within a finite number of steps after any given ν. Sequences so
defined are commonly considered in neural network theory, when it comes to
presenting a finite set of exemplars to a learning neural network; (see, for
example, F. Rosenblatt [21] and H.D. Block [3]). The systematic projection
algorithm is obviously a particular case of it.
Multi-Plane Algorithm. Take an arbitrary x0 in Rn and, as long as xν ∉ Ω,
define inductively xν+1 = T xν.
This is the simultaneous projection algorithm. In its more general form,
the γi's are allowed to vary from step to step, as long as their sum remains 1.
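As a sketch of how these steps might be implemented (an assumed reading of Eqs. (5)-(6), with λi = 1, ρi = 0 and γi = 1/m; the constraint system is illustrative):

```python
import numpy as np

def step_maximal_distance(A, b, x, lam=1.0):
    """One step of the single-plane maximal distance algorithm (rho_i = 0)."""
    norms = np.linalg.norm(A, axis=1)
    d = -(A @ x + b) / norms          # signed distance: positive iff violated
    k = int(np.argmax(d))
    if d[k] <= 0:                     # x already satisfies every inequality
        return x
    return x + lam * d[k] * A[k] / norms[k]   # x + lam * dist(x, pi_k) * n_k

def step_simultaneous(A, b, x, lam=1.0):
    """One simultaneous (multi-plane) step with gamma_i = 1/m."""
    norms = np.linalg.norm(A, axis=1)
    d = np.maximum(-(A @ x + b) / norms, 0.0)     # dist(x, h_i)
    return x + (lam / A.shape[0]) * (A / norms[:, None]).T @ d

# Illustrative system: x >= 0 and y >= 0.
A = np.array([[1.0, 0.0], [0.0, 1.0]]); b = np.array([0.0, 0.0])
x = np.array([-2.0, -1.0])
for _ in range(100):
    x = step_simultaneous(A, b, x)
# x is now within numerical tolerance of the polytope Omega
```

On this example the maximal distance variant terminates in two steps, while the simultaneous variant only approaches Ω geometrically, in accordance with Theorems 2 and 5.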
2.1 General Convergence Properties
In this and the following two sections, we review some important results
concerning the convergence of the relaxation-projection algorithms described
above. Because the results we could find published did not cover all the variants
of these algorithms, we had to generalize some of them. We simply state those
results that had already been proven as such, and prove those that resulted from
some generalization. The proofs are worth going over in that they provide a
good understanding of the nature of the algorithms.
We start with two preliminary lemmas on which most of the proofs are
based.
Lemma 2.1: Let x be an arbitrary point, let y = T(hi) x with 0 ≤ λi ≤ 2, and let the
point a ∈ hi be such that 0 ≤ ρi ≤ 2 dist(a, πi); then | y - a | ≤ | x - a |.
Proof: If x ∈ hi, then y = x and the result is trivial. Let us therefore consider x ∉
hi and, to simplify the notation, define di(x) = dist(x, πi). A straightforward
algebraic calculation, making use of Eqs. (5) and (3), yields the following
equation.
| y - a |² = | x - a |² - Qi(x) (7)
with Qi(x) = [λi di(x) + ρi] [(2 - λi) di(x) + 2 di(a) - ρi] (8)
The factorization we have made in Qi(x) makes it evident that, under the
hypotheses of this lemma, Qi(x) ≥ 0 ∀ i. •
Lemma 2.2: Let Ω be non-empty, let x be an arbitrary point outside Ω, and let
y = Tx, while for each T(hi) entering in T, 0 ≤ λi ≤ 2 and 0 ≤ ρi ≤ 2 dist(a, πi) for
some point a ∈ Ω; then | y - a | ≤ | x - a |.
Proof: Upon using the definition of T, given in Eq. (6), and the result of Lemma
2.1, one gets:
| y - a | = | ∑i=1,...,m γi T(hi) x - a | ≤ ∑i=1,...,m γi | T(hi) x - a | ≤ ∑i=1,...,m γi | x - a | = | x - a |. (9) •
Theorem 1: Let xν be any type of single-plane or multi-plane relaxation-
projection sequence, with 0 ≤ λi ≤ 2 ∀ i and with 0 ≤ ρi ≤ 2 dist(a, πi) ∀ i, for
some point a ∈ Ω; then the sequence of distances | xν - a | is monotonically
non-increasing and thus convergent.
Proof: Lemmas 2.1 and 2.2 imply that the inequality | xν+1 - a | ≤ | xν - a |
holds for all ν's. The sequence of distances | xν - a | is therefore monotone
non-increasing and since it is obviously bounded below by zero, it is then
necessarily convergent. •
Theorem 1 states that all relaxation-projection sequences have the
remarkable geometrical property, called Fejér monotonicity, of approaching
pointwise the polytope Ω or a subset of it. Indeed, when ρi = 0 ∀ i, each step xν
→ xν+1 of the algorithm produces a point xν+1 that is closer than xν, or at an
equal distance, to every point of the polytope Ω. When ρi > 0, a similar property
holds with respect to the subset {a : 2 dist(a, πi) ≥ ρi ∀ i} of Ω. This subset is a
sort of core inside Ω, the boundaries of which are obtained by translating inwards
by ρi/2 each hyperplane πi, i=1,...,m. Note that, when Ω is not full-dimensional,
this subset is empty when all ρi's are positive.
Theorem 1, or particular cases of it, can be found stated in most articles
dealing with the convergence of relaxation-projection algorithms. S. Agmon [1],
T.S. Motzkin and I.J. Schoenberg [17] were the first to mention it for single-plane
algorithms, with ρi = 0, λi = λ ∀ i, and 0 < λ < 2. H. Oh and S.C. Kothari [19]
proved the same result for algorithms with ρi > 0 and the same conditions as
above on the λi's. Although the latter authors talk explicitly about the systematic
relaxation-projection algorithm, their proof clearly holds for all single-plane
algorithms. However, they do not provide an explicit upper bound on the ρi's, as
we did in Theorem 1; they simply state that if Ω is full-dimensional, the ρi's can
always be taken small enough for the property to hold. Y. Censor and T. Elfving
[4] proved the Fejér monotonicity of the multi-plane simultaneous relaxation-
projection algorithm with ρi = 0 and 0 < λi < 2 ∀ i.
Our proof of Theorem 1 has the merit of covering all variants of the
algorithm and it is somewhat more direct than some of the above mentioned
proofs.
Theorem 2: Let xν be any type of single-plane or multi-plane relaxation-
projection sequence, with 0 ≤ λi ≤ 2 ∀ i and with 0 < ρi < 2 dist(a, πi) ∀ i, for
some point a ∈ Ω; then xν terminates after a finite number of steps at a point of
Ω.
Proof: Let ρm be the smallest of the ρi's and define the positive constant
K = Min {[2 di(a) - ρi] : i=1,...,m}. Eq. (8) shows that Qi(x) ≥ ρm K ∀ i.
i) Single-plane methods. With the help of Eq. (7) and the above lower bound on
Qi, one sees that each time a step xν → xν+1 of the algorithm is made with a
half-space that does not contain xν, | xν+1 - a |² ≤ | xν - a |² - ρm K. Thus, if at
point xµ there have been N non-trivial steps, | xµ - a |² ≤ | x0 - a |² - N ρm K.
Since distances are bounded below by zero, there can only be a finite number of
non-trivial steps.
ii) Multi-plane method. With the lower bound obtained above on Qi, Eq. (7),
which holds for x ∉ hi, leads to | T(hi) x - a | ≤ | x - a | - [ρm K]½. Thus, Ineq. (9)
can be refined to yield
| xν+1 - a | ≤ | xν - a | - ∑i∈Iν γi [ρm K]½ ≤ | xν - a | - γm [ρm K]½
where Iν is the set of indices of the half-spaces not containing xν, and γm is the
smallest of the γi's. Thus, at point xµ, | xµ - a | ≤ | x0 - a | - µ γm [ρm K]½. Again,
since distances are bounded below by zero, there can only be a finite number of
non-trivial steps. •
F. Rosenblatt, in Chapter 5 of Ref.[21] and H.D. Block in his article [3]
about the convergence of the learning procedure of single (evolving) layer
perceptrons, have proven the result of Theorem 2, for the single-plane general
recurrent algorithm, with λi = 0 and ρi > 0 ∀ i, in the particular situation where
the polytope Ω is actually a hypercone. For hypercones, the conditions of the
theorem impose no upper bound on the values of the ρi's since, whatever these
values, it is always possible to find a point a inside the hypercone, far enough
from its apex, so that these conditions hold. Although it is not obvious, the
pseudo-relaxation method proposed by H. Oh and S.C. Kothari [18,19] can be
recognized as the single-plane systematic relaxation-projection algorithm, with λi
= λ, 0 < λ < 2 and ρi > 0 ∀ i. The argument we used above, for the part of our
Theorem 2 that deals with the general single-plane algorithm, is the same one
they used in proving their Theorem 1 of Ref.[19]. Note however that they did not
provide an explicit upper bound on the ρi's, as we do; they simply stated that if Ω
is full-dimensional, the ρi's can always be taken small enough for the property to
hold. We did not find any published proof of Theorem 2 for the simultaneous
relaxation-projection algorithm.
2.2 Convergence of Single-Plane Methods with ρi = 0
Theorem 3: Let xν be any type of single-plane relaxation-projection sequence
with 0 < λi < 2 and ρi = 0 ∀ i; then xν converges to a point of Ω.
For the proof of this theorem, we shall use the following lemma, which
holds under the same hypotheses as Theorem 3.
Lemma 2.3: The sequence | xν+1 - xν | converges to zero.
Proof: The definition of the sequence xν is such that it terminates only if it has
reached a point of Ω; thus the lemma needs to be proven only for infinite
sequences xν. We write hereafter Λν for λi and Πν for πi if the half-space Hν is
hi, and we write Dν(x) for dist(x, Πν). Thus, Eq. (7) becomes:
| xν+1 - a |² = | xν - a |² - Qν  when xν ∉ Hν (10)
with Qν = Λν Dν(xν) [(2 - Λν) Dν(xν) + 2 Dν(a)] ≥ Λν(2 - Λν) [Dν(xν)]² ≥ 0.
Since the sequences | xν+1 - a | and | xν - a | have the same limit, it
follows from Eq. (10) that the limit of the non-negative sequence Qν must be
zero. The sequence Λν(2 - Λν), taking its values in a finite set of positive
numbers, is bounded away from zero; hence this can happen if and only if the
sequence Dν(xν) converges to zero. The conclusion of the lemma follows from
the fact that, for any single-plane relaxation-projection sequence with ρi = 0 ∀ i,
the step size is | xν+1 - xν | = Λν Dν(xν). •
Proof of Theorem 3: Lemma 2.1 states that the sequence xν is bounded.
Thus, by the Bolzano-Weierstrass theorem, it must have at least one
accumulation point l. Our proof of Theorem 3 will consist in proving that there is
only one such point, which is thus the limit of xν, and that this point is in Ω.
We first show that l cannot be outside of Ω. For this, we consider the
possibility that it is; let d be the distance between the point l and the closest
half-space not containing it, and let λm be the smallest of the λi's. By the
definition of accumulation points, for any ε > 0, there exists an index ν0(ε) after
which the sequence xν has an infinite number of its points in the closed sphere
Sc(l, ε) of radius ε, centered on l. Consider then ε < λm d / [2(λm + 1)], and a
point xν ∈ Sc(l, ε) with ν large enough that | xν+1 - xν | < ε (recall Lemma 2.3).
The former condition on xν implies that, whatever the plane Πν,
Dν(xν) ≥ Dν(l) - ε ≥ d - ε > (2 + λm) ε / λm. Therefore
| xν+1 - xν | = Λν Dν(xν) ≥ λm Dν(xν) > (2 + λm) ε,
which is incompatible with the latter condition on xν. Thus no accumulation
point of the sequence xν can be outside Ω.
The accumulation point l must be on the surface of Ω. Indeed, it cannot
be inside Ω since, then, ε can be taken such that the sphere Sc(l, ε) lies entirely
inside Ω. The first point of the sequence xν to enter this sphere would then be
inside Ω, and the sequence would terminate at this point.
There then remains only to prove that there can be only one accumulation
point at the surface of Ω. Suppose there are two different such points l and q,
and take ε < | l - q | / 2 and small enough that the sphere Sc(l, ε) is traversed only
by hyperplanes containing l. Then, if xν is a point of the sequence in this sphere,
xν+1 must also be in this sphere, because the reflecting hyperplane Πν
necessarily passes through l. By induction, one can prove that all following
points are also necessarily in Sc(l, ε), contrary to the hypothesis of existence of
another accumulation point q ≠ l. •
As for our Theorem 1 above, S. Agmon was the first one to prove this
result explicitly, in his Theorem 3 of Ref. [1], for maximal distance relaxation-
projection sequences, with ρi = 0, λi = λ ∀ i, and 0 < λ < 2. In Section 4 of
Ref. [1], he states that his result can be proven as well for maximal residual and
systematic relaxation-projection sequences. T.S. Motzkin and I.J. Schoenberg [17]
also proved exactly the same result as Agmon, but by a different method. The
proof we present above covers explicitly all variants of the single-plane algorithm
and involves ideas similar to those used by Motzkin and Schoenberg.
Theorem 4: (a) When Ω is full-dimensional, there exists a constant λ0 ∈ [1,2)
such that all the single-plane relaxation-projection sequences, for which λ0 < λi ≤
2 and ρi = 0 ∀ i, terminate after a finite number of steps. (λ0 is a geometrical
constant associated with the polytope Ω, defined in Ref. [10].)
(b) Furthermore, if λ > 2, then the sequence either converges
finitely to a point of Ω or it does not converge.
T.S. Motzkin and I.J. Schoenberg [17] were the first to prove finite
termination, for the particular case λi = 2 and ρi = 0 ∀ i. J.L. Goffin (see his
Theorem (3.3.1) and his Section 4.1 of Ref. [10]) then proved the above
theorem, which constitutes a noteworthy improvement over the result of Motzkin
and Schoenberg, in that it guarantees termination for a whole interval of λi's.
This fact may prove important when doing numerical computations, in that it
would allow one to avoid the inevitable instability of a property which holds only
for one particular value of some parameter.
2.3 Convergence of the Multi-Plane Method with ρi = 0
Theorem 5: Consider F(x) = ∑i=1,...,m γi λi [dist(x, hi)]². A simultaneous relaxation-
projection sequence xν, with ρi = 0 and 0 < λi < 2, produces a monotonically
non-increasing sequence F(xν). xν converges to a solution if the system of
inequalities is consistent and, if not, to a minimizer of F(x).
Y. Censor and T. Elfving [4] were the first to prove the convergence of
the simultaneous relaxation-projection sequence under the conditions of
Theorem 5. A proof similar to theirs is easily produced with the help of the ideas
we used in the proofs presented above. A different proof was given by A.R. De
Pierro and A.N. Iusem in Ref. [7] who also had the merit of proving Theorem 5
as such.
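A one-dimensional sketch of this behaviour, with the inconsistent pair of constraints x ≥ 1 and -x ≥ 0, γi = 1/2 and λi = 1 (illustrative values, not taken from Ref. [7]):

```python
# Inconsistent 1-D system: h1 = {x >= 1} (normal n1 = +1) and
# h2 = {x <= 0} (normal n2 = -1).  With gamma_i = 1/2 and lambda_i = 1, the
# simultaneous step is x' = x + (1/2)[dist(x,h1)*n1 - dist(x,h2)], and the
# iterates converge to x = 1/2, the minimizer of
# F(x) = (1/2)[dist(x,h1)^2 + dist(x,h2)^2].
def step(x):
    d1 = max(1.0 - x, 0.0)    # dist(x, h1)
    d2 = max(x, 0.0)          # dist(x, h2)
    return x + 0.5 * (d1 - d2)

x = 3.0
for _ in range(200):
    x = step(x)
# x settles at 1/2, equidistant from the two incompatible half-spaces
```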
It is remarkable that Theorem 5 is the only result to the effect that each
step of a relaxation-projection sequence decreases an objective function. Even
the similar simultaneous relaxation-projection sequence with ρi > 0 has not been
proven to have this property with respect to its objective function F:
F(x) = ∑i=1,...,m γi { λi [dist(x, hi)]² + 2 ρi dist(x, hi) } . (11)
It is, of course, nonetheless true that all relaxation-projection sequences which
converge to a point of Ω produce a sequence of values F(xν) which converges
to 0, and therefore to the minimum of F(x), where F is the objective function of
Eq. (11), with arbitrary positive constants γi.
3. ON THE ANALOG FEASIBILITY SOLVERS
Widespread interest in the use of electrical circuits as analog solvers for
optimization problems was really awakened in 1986 by D.W. Tank and J.J.
Hopfield [23] (for an overview of this subject, see C.Y. Maa and M. Shanblatt
[15]). As M.P. Kennedy and L.O. Chua [12] pointed out, the network proposed
by Tank and Hopfield, with corrected sign for the penalty function, is closely
related to the canonical nonlinear programming circuit of L.O. Chua and G.N. Lin
[5]. Another circuit that is also often mentioned is that of A. Rodríguez-Vázquez
et al. [20].
We shall briefly examine, in this section, the feasibility solvers that these
networks become when they are used with a zero cost function. Although they
are also of interest, we shall not discuss, in the present article, models where
equality and inequality constraints are treated separately (as, for example, the
model of S.H. Zak et al. [24]). Thus, we consider models that solve instances of
the optimization problem:
Minimize φ(x), subject to the set of constraints Ax + b ≥ 0.
3.1 The Model of Chua-Lin and Kennedy-Chua
The circuit of Chua-Lin and Kennedy-Chua [5, 13] is devised to solve the
general optimization problem, where minimal assumptions are made about the
cost function and the constraint functions. When the constraints are taken to be
linear in x, the evolution equation for the n-vector x of voltages in this circuit is
C dx/dt = -∇φ + (1/R) ∑i=1,...,m |ai|² dist(x, hi) ni (12)
where C is an n×n diagonal matrix of constant capacitances and R is a constant
resistance. A Liapunov function that is minimized by this system is
E(x) = φ(x) + (1/2R) ∑i=1,...,m |ai|² [dist(x, hi)]² . (13)
When the cost function φ is zero, the network implements a continuous-
time version of the simultaneous relaxation-projection algorithm, with ρi = 0 ∀ i.
In order to see this, do the change of variables x̃ = (RC)½ x, ãi = (RC)-½ ai to
remove the constants R and C from Eq. (12). A first-order Euler discretization of
the resulting equation then produces the equation x̃ν+1 = T x̃ν, which describes
one step of the simultaneous relaxation-projection algorithm, with ρi = 0 ∀ i and
γiλi = ∆t |ãi|². The value of the Liapunov function E is then F(x)/(2R∆t), where F
is the objective function of our Theorem 5.
Note that the conditions for the convergence of the discrete algorithm,
seen in Section 2, correspond here to upper bounds on the step size ∆t.
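The correspondence can be sketched numerically: an Euler step of Eq. (12) with φ = 0 and, for simplicity, R = C = 1 (illustrative constraints; ∆t is assumed small enough that ∆t |ai|² < 2 for every i):

```python
import numpy as np

def euler_step(A, b, x, dt):
    """x' = x + dt * sum_i |a_i|^2 dist(x, h_i) n_i  (Eq. (12), phi = 0, R = C = 1)."""
    norms = np.linalg.norm(A, axis=1)
    d = np.maximum(-(A @ x + b) / norms, 0.0)          # dist(x, h_i)
    return x + dt * (norms**2 * d) @ (A / norms[:, None])

# Illustrative constraints: x >= 0 and 2y >= 2; here dt*|a_i|^2 <= 0.4 < 2.
A = np.array([[1.0, 0.0], [0.0, 2.0]]); b = np.array([0.0, -2.0])
x = np.array([-1.0, 0.0])
for _ in range(500):
    x = euler_step(A, b, x, dt=0.1)
# x approaches the feasible set {x >= 0, y >= 1}
```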
3.2 The Model of Rodríguez-Vázquez et al.
The circuit proposed by Rodríguez-Vázquez et al.[20] is also devised to
solve the general optimization problem, with minimal conditions on the cost and
constraint functions. Their model is characterized by its division of the space in
two regions: the region of feasibility, inside of which the objective function is
solely the cost function, and the rest of space, where the objective function is
solely the penalty function for the violated constraints. Correspondingly, they
define the pseudo-cost function
ψ(x) = U(Ax+b) φ(x) + µ P(x) ,  with U(Ax+b) = 1 if Ax+b ≥ 0, and 0 otherwise. (14)
The constant µ is called the penalty multiplier and P is the penalty function for
the violated constraints. P can be taken to be either one of P1 or P2 with:
P1(x) = ∑i=1,...,m |ai| dist(x, hi)  and  P2(x) = ∑i=1,...,m |ai|² [dist(x, hi)]² . (15)
The equation of motion that corresponds to their circuit is
dx/dt = - U(Ax+b) ∇φ - µ ∑i∈I(x) vi ai (16)
where I(x) is the set of indices of the violated constraints, and
vi = -1 if P = P1 ,  vi = - |ai| dist(x, πi) if P = P2 .
When the cost function φ is zero, this model corresponds to a continuous-
time version of the simultaneous relaxation-projection algorithm, with λi = 0 ∀ i
when the penalty function P is P1, and with ρi = 0 ∀ i when P = P2.
A first-order Euler discretization of Eq. (16) transforms it into the equation
describing a step of the simultaneous relaxation-projection algorithm with
λi = 0 ∀ i and γiρi = µ ∆t |ai| when P = P1 ,
ρi = 0 ∀ i and γiλi = µ ∆t |ai|² when P = P2 .
The value of the pseudo-cost function in Eq. (14), when P = P1, is F(x)/(2∆t),
where F is the objective function of Eq. (11) with λi = 0 ∀ i. When P = P2, it is
F(x)/∆t, where F is the objective function defined in our Theorem 5. The
conditions of convergence, discussed in Section 2, correspond to upper bounds
on ∆t.
3.3 The Model of Tank-Hopfield
The linear programming network of Tank and Hopfield [23] minimizes the
objective function φ(x) = <k , x>, where k is a constant n-vector. Its equation of
motion, for the n-vector of voltages x, with corrected sign for the penalty term
(see M.P. Kennedy and L.O. Chua [12]), is
rC dx/dt = - k - (rR)-1 x + ∑i=1,...,m |ai|² dist(x, hi) ni (17)
h (17)
where C is an n×n constant diagonal matrix of capacitances, R is a constant
n×n diagonal matrix of resistances, and r is the proportionality constant in the
linear input-output function of the variable amplifiers. The Liapunov function for
this model is
E(x) = <k, x> + (1/2) ∑i=1,...,m |ai|² [dist(x, hi)]² + (1/2r) <x , R-1 x> . (18)
When the cost function is zero, this model does not quite correspond to the
simultaneous relaxation-projection algorithm, due to the additional term -(rR)-1x
on the right hand side of Eq.(17).
In order to make conspicuous the effect of this additional term, we
consider a problem with a one-dimensional variable x, and the single constraint
x ≥ b > 0. The equation of motion and its solution, in the region x < b, are then
rC dx/dt = - x/(rR) + (b - x)  and  x(t) = β + [x(0) - β] exp(-αt) ,
with α = [1 + 1/(rR)] / (rC) and β = b rR / (1 + rR). It is straightforward to show
that x(t) always remains smaller than b at finite times, when x(0) < b, and that
the limit x(∞) = β < b. Thus, x(t) never moves into the region where the
constraint is satisfied, even asymptotically. Not only that: if x(0) is such that
x(∞) < x(0) < b, x(t) will actually move away from the region of feasibility and
decrease monotonically to x(∞).
These calculations corroborate the remark, made by M.P. Kennedy and
L.O. Chua [12] and C.Y. Maa and M. Shanblatt [15], to the effect that the Tank-
Hopfield network should be used only when the resistances in R are very large,
so that the second term on the right hand side of Eq. (17) is negligible. When
this is the case, the circuit implements the simultaneous relaxation-projection
algorithm.
4. OTHER RELAXATION-PROJECTION NETWORKS
The analogue networks of Section 3 were all implementations of the
simultaneous relaxation-projection algorithm. We now present networks which
implement all the variants of the relaxation-projection algorithm. Although we
describe them as digital networks performing the discrete algorithms, it should be
clear that they can also be realized as analogue electrical circuits, performing the
continuous time algorithms.
We consider networks of McCulloch-Pitts type neurons, such that when
a neuron has input vector x, weight vector w and activation function f, its output
is f(<w, x>). These neurons are arranged in layers, the data taking one unit of
time to go through each layer. A clock, and possibly delays, ensure that the
proper data enter and leave each layer in step. As with the analogue networks
of Section 3, the neurons all have fixed weights which depend on the parameters
in the inequalities.
These networks all work on the same principle: an arbitrary vector x0 is
initially fed to them as input. They are then left to cycle, their output being fed
back as input, until a solution is reached.
4.1 Maximal Distance and Maximal Residual Algorithms
The maximal distance and the maximal residual algorithms each require a
Winner-Takes-All (WTA) subnetwork to select the maximum xM in a set of values
{x1,...,xn}. This WTA subnetwork must take the n-vector x = (x1,...,xn)t as input,
and return as output the n-vector y = (y1,...,yn)t of zeros, except for a 1 at only
one of the positions of xM in x.
Some Winner-Takes-All Networks .
1) Feldman and Ballard [8] have presented a WTA network which operates in
one unit of time. It is however composed of neurons that are somewhat more
complex than McCulloch-Pitts neurons, in that the value of their activation
function depends on the position of the inputs on their surface, as well as the
connection weights. When n inputs arrive at different locations on the surface of
such a neuron, one of them is favoured in that only when this one is the largest
of all inputs does the neuron fire, with output 1. The same behavior is obtained
when considering that the "favoured" input is presented directly to the neuron,
and the other (n-1) inputs come from the neighboring neurons, through inhibiting
channels.
A single layer of such neurons, each having one of the xi's arriving at its
favoured surface location, constitutes a WTA network; its output vector will have
a 1 only at the positions of xM in x, and zeros elsewhere. This WTA network is
obviously the optimum. However, we cannot use it if we restrict ourselves to
networks with only McCulloch-Pitts neurons.
2) A WTA network which operates in only two time steps can be made with
McCulloch-Pitts neurons.
Its first layer has n(n-1)/2 neurons, which we label as nij, with i < j and
i, j ∈ {1,...,n}, according to the two components xi and xj of x that each receives
as input. The weights of its connections to these inputs are respectively +1 and -1.
Their activation function is the sign function sgn+: sgn+(x) = 1 if x ≥ 0 and -1 if x
< 0. Thus, the components of the output vector of this first layer are the signs of
all the possible differences between two components of the input x.
The second layer has n neurons, with the k'th one connected to each
neuron nij of the first layer, for which either i or j = k. Its connection weight is +1
if i = k and -1 if j = k. These neurons have the activation function f: f(x) = 1 if x ≥
(n - 3/2) and 0 otherwise. Thus, when the components of the input to the
network are all different, the total input to the k'th neuron is
∑_{i≠k} sgn(xk - xi), the sum running over i = 1,...,n with i ≠ k. It
can be seen that, with sgn+ as activation function for the first layer, the output
vector of the network will have a 1 only at the first position of xM in x, and 0
everywhere else, as desired. This WTA network requires a total of n(n+1)/2
neurons. An example of this network, with n = 4, is shown in Figure 1.
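The operation of this two-layer WTA network can be sketched in ordinary code. The following is a minimal numerical simulation of the computation it performs, not a circuit-level model; the function name wta and the use of NumPy are our own choices:

```python
import numpy as np

def sgn_plus(v):
    """The sgn+ activation: +1 if v >= 0, -1 otherwise."""
    return 1.0 if v >= 0 else -1.0

def wta(x):
    """Two-layer McCulloch-Pitts WTA network: returns a 0/1 vector with a
    single 1 at the first position of the maximum component of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    totals = np.zeros(n)
    # First layer: one neuron per pair (i, j), i < j, output sgn+(x_i - x_j).
    # Second layer: neuron k collects these with weight +1 when i = k and
    # -1 when j = k, then fires iff its total input reaches n - 3/2.
    for i in range(n):
        for j in range(i + 1, n):
            s = sgn_plus(x[i] - x[j])
            totals[i] += s
            totals[j] -= s
    return (totals >= n - 1.5).astype(float)
```

For x = (2, 5, 3, 5)t the output is (0, 1, 0, 0)t: a tied maximum at a later position does not fire, which is the "first position of xM" behavior noted above.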
3) As a final example of a WTA network, we mention the binary maximum selector
network devised by T. Martin [16], and described by R.P. Lippmann [14]. By
appropriately defining its activation functions at zero, this network can be made
to return an output vector y which has a 1 only at the first position of xM in x and
0 everywhere else. This network requires 2[log2 n] + 1 layers of neurons if n > 2
(where [x] = x if x is an integer and the next integer greater than x otherwise),
and 1 layer if n = 2. It then operates in as many time steps. It has
(5 × 2^[log2 n] - 6 + n) neurons if n > 2 and 2 neurons if n = 2.
We remark that when n ≥ 5, this network is slower than the second
network mentioned above. However, it requires fewer neurons than the latter
whenever n ≥ 13. Since we shall here consider neurons to be
inexpensive, we will use the faster second WTA network.
The network that realizes the algorithm. The network shown in Figure 2
performs one step of the maximal distance or the maximal residual algorithm. It
takes an arbitrary vector x as input. If this x is a solution of Ineqs.(1), it is
returned as the output. If it is not, the output is the vector T(hk) x, where k is the
index of the half-space farthest from x, if the maximal distance algorithm is
performed, and the index of the smallest negative linear form wi of Ineqs.(1), if
the maximal residual algorithm is performed.
Here is how this network functions. Its first layer comprises m neurons:
one for each inequality. The threshold of the i'th neuron of this layer is αi βi,
where αi = 1 for the maximal distance algorithm and αi = |ai| for the maximal
residual algorithm. Its weight vector is -αi ni, where ni is the unit vector normal
to the hyperplane πi. As is common practice, we shall take the threshold into
account, by augmenting the weight vector by one component. Thus, we let the
threshold be its zeroth component, so that it becomes Wi = αi (-βi, -ni1,...,-nin)t.
Correspondingly, an augmented input vector X is defined as (1, x1,...,xn)t. The
activation function for each of these neurons is f: f(x) = x if x ≥ 0 and 0 if x < 0.
The output vector of the first layer is therefore [α1dist(x,h1),..., αmdist(x,hm)]t.
This output vector serves as input for the WTA subnetwork. If x is already
a solution, the output of this subnetwork is z = 0 and y = [1,0,...0]t, and if it is not,
it is z = αk dist(x,πk) and the vector y = [0,...1,...0]t, where the 1 is at the k'th
position.
The WTA subnetwork is followed by a layer of m neurons, with zero
threshold, and multiplicatively arranged input connections to the input and output
ports of the WTA subnetwork. These connections are such that the i'th neuron
of this layer has the input Oi Ii, where Oi is the i'th output of the WTA network and
Ii is its i'th input. How to realize multiplicative synaptic arrangements has been
discussed by G.E. Hinton [11] and others (see, for example, Section 9.6 of Ref.
[2]). The activation function of the i'th neuron is fi: fi(x) = (λi/αi) x + ρi if x > 0 and 0
if x ≤ 0. The output vector of this layer is then the zero vector if the point x input
to the first layer of the network is a solution, and the vector
[0,...,0, λk dist(x,πk) + ρk, 0,...,0]t, if it is not.
The last layer is made up of n neurons, each with zero threshold and
activation function fl: fl(x) = x. There is a connection from the i'th neuron of the
previous layer to the j'th one of this layer, with weight nij, where nij is the j'th
component of the unit vector ni, normal to the hyperplane πi. This j'th neuron is
also fed, with weight one, the j'th component of the input vector x to the first layer
of the network. The output vector of this last layer is then T(hk) x.
If one unit of time is required for the data to go through each layer of the
network, 5 units of time will be required for it to perform one step of the
algorithm. The network has (m² + 5m + 2n)/2 neurons. A solution to the system
of inequalities is obtained when the output vector of the network is identical to its
input vector x.
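One step of this network's computation can be sketched in plain code. The following is an illustrative simulation of the algorithm the network performs, not of the network itself; it assumes the inequalities of Ineqs.(1) are written in the form <a_i, x> + b_i ≥ 0, so that dist(x, πi) = -(<n_i, x> + β_i) whenever the i'th constraint is violated:

```python
import numpy as np

def max_distance_step(x, A, b, lam=1.5, rho=0.25):
    """One step of the maximal distance relaxation-projection algorithm
    for the system A x + b >= 0 (one row a_i and scalar b_i per inequality)."""
    norms = np.linalg.norm(A, axis=1)
    N = A / norms[:, None]                    # unit normals n_i
    beta = b / norms
    dist = np.maximum(-(N @ x + beta), 0.0)   # dist(x, pi_i); 0 if satisfied
    if not dist.any():
        return x                              # x already solves the system
    k = int(np.argmax(dist))                  # role of the WTA subnetwork
    return x + (lam * dist[k] + rho) * N[k]   # the operator T(h_k)
```

On the sample problem of Section 6 (x1 - x2 ≥ 1 and -x1 + 5x2 ≥ 5, i.e. A = [[1, -1], [-1, 5]], b = [-1, -5]), iterating this step from the origin with λ = 1.5 and ρ = 0.25 reaches a feasible point in a few steps.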
4.2 Systematic Projection Algorithm
The network for this algorithm is composed of as many subnetworks, as
that illustrated in Figure 3, as there are inequalities to satisfy. These
subnetworks are chained together to perform a full cycle of the algorithm.
Here is how the i'th subnetwork functions. Its first layer contains one
neuron, with the same weights and threshold as the i'th neuron in the first layer of
our maximal distance algorithm network. Its activation function fi is however
different, with fi(x) = λi x + ρi if x > 0 and 0 if x ≤ 0.
The last layer of this subnetwork is similar to that of the maximal distance
algorithm network. The connection from the single neuron of the previous layer
to the j'th one of this layer has weight nij, where nij is the j'th component of the
unit vector ni, normal to the hyperplane πi. This j'th neuron is also fed, with
weight one, the j'th component of the input vector x for the first layer of this i'th
subnetwork. The output vector of this last layer is therefore T(hi) x.
Two units of time are required to perform one step of the algorithm, i.e. for
the data to go through one subnetwork, which contains (n+1) neurons. Since m
such subnetworks, connected in series, are required for a whole cycle through all
the inequalities, 2m units of time and m(n+1) neurons will be required for one
such cycle. A solution is obtained when the output vector, at the end of the
chain, is identical to the input vector x, at its beginning.
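A full cycle of this chain can be sketched in the same way; again this is a simulation of the algorithm under the assumed convention <a_i, x> + b_i ≥ 0, not of the subnetworks themselves:

```python
import numpy as np

def systematic_cycle(x, A, b, lam=1.0, rho=0.25):
    """One full cycle of the systematic projection algorithm for A x + b >= 0:
    the point traverses the m chained subnetworks in turn, each applying
    its operator T(h_i) when its own inequality is violated."""
    x = np.array(x, dtype=float)
    for a_i, b_i in zip(A, b):
        norm = np.linalg.norm(a_i)
        d = -(a_i @ x + b_i) / norm          # dist(x, pi_i) when positive
        if d > 0:
            x = x + (lam * d + rho) * (a_i / norm)
    return x
```

Repeating the cycle until the output equals the input reproduces the stopping criterion described above.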
4.3 Simultaneous Projection Algorithm
The basic structure of the network for this algorithm, shown in Figure 4,
can be recognized in each of the analogue optimization networks discussed in
Section 3.
Its first layer has m neurons, the i'th of which is identical to that of
the first layer of the i'th subnetwork for the systematic algorithm, with its
activation function multiplied by γi.
The last layer of this network is identical to that of the maximal distance
algorithm network, and it is connected in the same way to its preceding layer and
the input x for the whole network. Its output vector is Tx.
Each step of the algorithm is performed in two units of time. The network
has (m+n) neurons. A solution is obtained when its output vector is identical to
its input vector x.
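The corresponding step of the simultaneous network can be sketched as follows (same assumed convention <a_i, x> + b_i ≥ 0; the weights γi are taken to be 1/m, the choice used in the tests of Section 6):

```python
import numpy as np

def simultaneous_step(x, A, b, lam=2.0, rho=0.5, gamma=None):
    """One step of the simultaneous projection algorithm for A x + b >= 0:
    the corrections toward all violated half-spaces are formed in parallel
    and combined with the weights gamma_i."""
    m = len(b)
    if gamma is None:
        gamma = np.full(m, 1.0 / m)
    norms = np.linalg.norm(A, axis=1)
    N = A / norms[:, None]
    dist = np.maximum(-(N @ x + b / norms), 0.0)     # 0 for satisfied constraints
    out = np.where(dist > 0, lam * dist + rho, 0.0)  # first-layer outputs
    return x + (gamma * out) @ N                     # last layer: x -> T x
```

Once x is feasible all the distances vanish and the step returns x unchanged, which is exactly the stopping condition stated above.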
5. RECIPROCAL IMPLEMENTATIONS
Another set of networks implementing the same algorithms is obtained by
interchanging, in the networks of Section 4, the way in which the coordinates X
and the inequality parameters Wi are treated. Thus, the weights of the neurons
of the first layer of the networks would now all be set to X, and the i'th neuron of
the first layer would receive the vector Wi as input. This interchange leaves its
total input <Wi, X> unchanged. When these neurons are allowed to evolve according
to Hebb's rule, a solution to the system of inequalities is obtained as the
final value of their weights.
More precisely, consider a neuron, with (n+1)-dimensional weight vector X
= (1, x1,...,xn)t, and activation function fi: fi(x) = λix + ρi , if x > 0 and f(x) = 0, if x
≤ 0.
When the vector Wi is presented to it as input, its output fi(<Wi, X>) will be
0 if x ∈ hi, and λi dist(x, πi) + ρi if x ∉ hi. The first component of its weight vector
is then kept to the constant value 1, and its other weights are made to change
according to the Hebbian learning rule: x ← x + fi(<Wi, X>) ni. Thus, this neuron
implements the action of the operator T(hi) on the vector x.
1) Systematic and General Recurrent Algorithm. A single neuron, as described
above, can perform the systematic and the general recurrent versions of the
single-plane algorithm, with λi = λ and ρi = ρ for all i's. For this, it suffices to
present the inputs Wi to it, in the order specified by these algorithms.
In order to allow for different parameters λi and ρi, it would be necessary
to use m neurons, each one with a different value of these parameters in its
activation function. The exemplar Wi would then be presented only to the i'th
neuron, the output of which would provide the weight correction for all m
neurons.
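This reciprocal, variable-weights scheme can be sketched in code. Under the assumed convention <a_i, x> + b_i ≥ 0, the exemplar Wi = (-βi, -ni)t satisfies <Wi, X> = dist(x, πi) exactly when x violates hi; a single neuron performing one systematic sweep then reads:

```python
import numpy as np

def hebbian_sweep(x, A, b, lam=1.0, rho=0.25):
    """Single-neuron reciprocal implementation: the weight vector X = (1, x)
    is the trial solution.  Presenting the exemplar W_i = (-beta_i, -n_i)
    and applying the Hebbian correction x <- x + f(<W_i, X>) n_i realizes
    T(h_i); one call performs one systematic sweep over all inequalities."""
    x = np.array(x, dtype=float)
    norms = np.linalg.norm(A, axis=1)
    for n_i, beta_i in zip(A / norms[:, None], b / norms):
        W = np.concatenate(([-beta_i], -n_i))
        X = np.concatenate(([1.0], x))       # zeroth weight pinned to 1
        v = W @ X                            # = dist(x, pi_i) if h_i violated
        if v > 0:
            x = x + (lam * v + rho) * n_i    # Hebbian weight update
    return x
```

The solution is read off as the final value of the weights x, exactly as described above.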
2) Maximal Distance and Maximal Residual Algorithms. The interchange of the
roles of X and Wi, as described above, is made for the neurons of the first layer
of the network described in Section 4.1. To execute one step, all the Wi's are
presented simultaneously as inputs (Wi being the input for the i'th neuron). If the
direct connections, between the input to the network and the last layer, are
removed, the output of the last layer will be the zero vector, if x is a solution, and
the vector [λk dist(x,πk) + ρk] nk if not. This output is the correction to be added
to the x part of the weight vector for each neuron of the first layer. A solution is
recognized as such when the output of the network is zero.
3) Simultaneous Algorithm. The same modifications made to the maximal
distance network should be applied to the network described in Section 4.3. The
network would then perform one step of the algorithm, by weight correction for
the neurons of the first layer, exactly as described above for the maximal
distance algorithm network.
6. SOME SIMULATION RESULTS
We have simulated the digital neural networks implementing the maximal
distance, the systematic and the simultaneous relaxation-projection algorithms.
In a first series of tests, they were used to solve some 15 small feasibility
problems (most of these problems are optimization problems from Ref. [22], in
which we have set the cost function to zero). Upon characterizing a problem,
with n variables and m inequalities, by the pair (n, m), the problems solved can
be described as two of each of the types (2, 4), (3, 6) and (4, 7), four of the type
(5, 9) and one of each of the types (3, 7), (4, 8), (5, 3), (5, 8) and (6, 16). For
each algorithm, the same step size parameters λi and ρi were used for all
hyperplanes. Values of λ = 0.5k, with k = 0,...,6, and ρ ∈ {0, 0.25, 0.5, 1} were
tried. For the simultaneous projection algorithm, the additional values of λ with k
= 7,...,20, and ρ = 0.5s, with s = 3,...,11, were also tried. Note that whenever this
algorithm was used, its parameters γi, for i=1,...,m, were all taken to be 1/m,
where m is the number of inequalities. Table 1 reports the total number of steps
and the total number of units of time each network required to solve all of these
problems, when the best values for λ and ρ were used. Notice that for the
simultaneous relaxation-projection algorithm, the best results were obtained for
values of λ well outside the bounds given in Theorems 2 and 5. These
results, and those with λ =2, the upper "safety" bound, appear in Table 1.
ALGORITHM λ ρ Steps Time
Max. Distance 1.5 0.25 119 595
Systematic 1 0.25 370 740
Simultaneous 2 2.5 277 554
(> safe bounds) 7 2 118 236
TABLE 1: The values of the parameters λ and ρ for which the three
networks took the least overall time to solve the 15 small feasibility problems.
As this table indicates, all the networks, given appropriate step size parameters,
solved the 15 problems in a finite number of steps. In terms of the number of
steps, the best performance of the maximal distance and of the simultaneous
projection algorithms are comparable, and are much better than that of the
systematic projection algorithm. In terms of the time required however, the
simultaneous projection network is faster because of its fewer layers.
Nonetheless, if λ had to be taken within the safe range λ ≤ 2, the times required
by the two would be comparable (595 vs 554 time units). And if we had allowed
ourselves the single-level WTA network, the maximal distance network would
have performed the best, with 476 vs 554 time units.
Figures 5 to 7 are graphs showing the values taken, at each step of the
solution, by the two variables x1 and x2, when the following sample problem is
solved by the different networks.
Find x1 and x2 such that: x1 - x2 ≥ 1 and -x1 + 5x2 ≥ 5.
The behavior seen is representative of the general one observed with the
different networks. The maximal distance and the simultaneous projection
networks are seen to be of somewhat similar effectiveness in terms of the
number of steps. However, the trajectories produced by the first network
oscillate more, in general, as should be expected from the fact that the
simultaneous projection algorithm involves an average of the directions toward
all violated constraint hyperplanes. The systematic projection algorithm is seen
to converge most slowly. All networks started from the point (0, 0)t. The
maximal distance algorithm network reached the solution (3.371, 1.864)t after 6
steps, taking 36 units of time. The systematic algorithm network reached the
solution (2.976, 1.623)t after 16 steps, taking 32 time units. The simultaneous
algorithm network reached the solution (4.961, 2.193)t after 8 steps, taking 16
units of time.
Table 2 shows the number of neurons each network requires for solving a
type (20, 35) problem. The systematic projection and the maximal distance
networks are seen to be, by far, the most costly in terms of the number of
neurons. This same table also shows the number of steps and of units of time
required for the solution, with the best values for the parameters, as determined
in the tests with the 15 small problems, as well as those values among those
mentioned above, which yielded the best solution time for this (20, 35) problem
alone. The mention "Ended" in the table indicates that the network was stopped,
the algorithm having run for 100 steps without producing a solution. The
systematic projection network proved the least efficient, exceeding the 100-step
limit for most values of the parameters. The performances of the maximal
distance algorithm and the simultaneous projection algorithm are comparable in
terms of the number of steps, when λ is in the "safe" range. The latter algorithm is
however definitely superior in terms of the number of units of time required. The
best performance, obtained with the simultaneous projection network, is
remarkable in that the solution is obtained in a single step.
ALGORITHM Neurons λ ρ Steps Time
Max. Distance 720 1.5 0.25 27 135
 1.5 0 12 60
Systematic 735 1 0.25 Ended Ended
 1.5 0 70 140
Simultaneous 55 2 2.5 15 30
 10 1 1 2
TABLE 2: Values of the network parameters and the corresponding times taken
by the three networks to solve a 20-variable, 35-inequality problem.
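The neuron counts in Table 2 follow directly from the formulas derived in Section 4; a quick check for the (20, 35) problem:

```python
def neuron_counts(n, m):
    """Neuron counts of the three fixed-weights networks of Section 4,
    for n variables and m inequalities."""
    return {
        "maximal distance": (m * m + 5 * m + 2 * n) // 2,
        "systematic": m * (n + 1),
        "simultaneous": m + n,
    }

counts = neuron_counts(20, 35)   # the problem of Table 2
```

This reproduces the 720, 735 and 55 neurons listed in the table.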
7. CONCLUSIONS
We have shown that the solution method used by the best known
analogue optimization networks is a continuous time version of the simultaneous
relaxation-projection algorithm. As for the Tank-Hopfield network however, the
input resistances of the variable amplifiers make its behavior deviate slightly
from that of this algorithm. By solving its equation of motion exactly, we have
demonstrated that this additional term has a negative effect, in that it prevents a
feasible solution from being reached.
We have produced neural networks that implement each of the relaxation-
projection algorithms. For the fixed weights implementation, the number of
neurons required to solve a problem with n variables and m inequalities are
(m² + 5m + 2n)/2 neurons for the maximal distance version,
m(n+1) neurons for the systematic projection version, and
(m+n) for the simultaneous projection version.
These numbers clearly show that, among these three versions, the last one is
the most economical in terms of neurons used, its number of neurons increasing
only linearly with the problem parameters. The variable weights networks have
the same basic structure and same efficiency as the above ones. However, as
mentioned in Section 5, a single neuron with the Hebbian learning capacity
suffices to perform the systematic and any recurrent single-plane algorithm.
Although we have found these algorithms to be generally less efficient than the
others, this economy definitely renders them worthy of consideration for
applications where speed of solution is not a critical factor.
The results of the preliminary tests we have conducted with these
networks have been discussed to a certain extent in Section 6. We sum them
up as follows. For the sample problems solved, the maximal distance and the
simultaneous projection algorithms required comparable numbers of steps,
always far fewer than the systematic projection algorithm. In terms of the
number of units of time used, the simultaneous projection network appears
superior to the maximal distance network. This comes from the fact that the
latter has more layers than the former.
The simultaneous projection algorithm furthermore provides its user with
the important, unique advantage of a guarantee to minimize the objective
function, even when the system of inequalities has no solution (see Theorem 5).
For the single plane methods, we have found that good values, among
those tried, for the step size parameters λ and ρ are λ ≈ 1.5 and ρ ≈ 0.25. This is
consistent with the convergence theorems mentioned. For the simultaneous
projection algorithm, although good results were found with λ ≈ 2, the best
results were consistently obtained for larger λ's as well as for rather large ρ's,
between 1 and 2.5. This fact can very well be interpreted as an indication that
the sufficient conditions in the convergence theorems are not really necessary,
and that the theoretical results need to be refined.
We note that, when both step size parameters λ and ρ are non-zero, the
convergence should generally be better than when one of the two is zero.
Indeed, when the point xν is far from the polytope, the distance dependence of
the step size ensures that the points of the sequence approach the polytope at a
faster pace than if the steps were of constant lengths. On the other hand, as the
points get close to the polytope and the distance term in the step size becomes
small, the constant term takes over and ensures that the points of the sequence
keep on moving toward the polytope at a minimum, non-infinitesimal rate, so
that it is reached in a finite number of steps.
For computing solutions, it suffices, of course, to know that the iteration
sequence converges; the calculations can then always be stopped when a
certain precision criterion is satisfied. This will always happen after a finite
number of steps, even though the exact sequence xν may actually be infinite.
Nevertheless, it still remains a very important property for an iteration sequence
to exactly terminate in a finite number of steps. Indeed, this generally means
that its limit point is inside the polytope Ω, while for infinite sequences, it is
necessarily on its surface. Interior point solutions are more stable and more
robust because they are completely surrounded by a whole neighborhood of
other solutions. On the other hand, surface limit points have neighbors both in Ω
and outside of it; so that they can easily cease to be solutions under small
perturbations of the parameters of the problem, as when the coefficients of the
inequalities are slightly modified. For example, this is the kind of stability that
leads to a better ability of neural networks to generalize to new inputs the
knowledge they have accumulated during their training.
Given the fact that all the networks we described can be realized with very
inexpensive computing elements, it would be practical to further improve on the
solution time by having many copies of the networks work simultaneously on the
same problem, each using different values of the step size parameters (some
even with λ > 2) or different starting points x0.
It is certainly worthwhile to conduct other tests, with more complex and
larger sample problems, in order to see whether the results we observed persist.
We believe that the study reported in the present article is important for
the theory of optimization neural networks, as well as for feasibility networks,
since after all, the latter networks are always an essential part of the first ones.
ACKNOWLEDGMENT
The author is grateful to the reviewers for their constructive comments
and suggestions for improving this manuscript.
References
[1] S. Agmon, "The relaxation method for linear inequalities", Can. J. Math., vol.
6, pp. 382-392, 1954.
[2] I. Aleksander and H. Morton, "An Introduction to Neural Computing",
Chapman and Hall, New York, 1990.
[3] H.D. Block, "The perceptron: a model for brain functioning. I", Rev. Mod.
Phys., vol. 34, pp. 123-135, 1962.
[4] Y. Censor and T. Elfving, "New methods for linear inequalities", Linear
Algebra Appl., vol. 42, pp. 199-211, 1982.
[5] L.O. Chua and G.N. Lin, "Nonlinear programming without computation", IEEE
Trans. Circ.Syst., vol. CAS-31, pp. 182-188, Feb. 1984.
[6] G.B. Dantzig, "Linear Programming and Extensions", Princeton University
Press, Princeton, NJ, 1963.
[7] A.R. De Pierro and A.N. Iusem, "A simultaneous projections method for linear
inequalities", Linear Algebra Appl., vol. 64, pp. 243-253, 1985.
[8] J.A. Feldman and D.H. Ballard, "Connectionist models and their properties",
Cognitive Science, vol. 6, pp. 205-254, 1982.
[9] P.E. Gill, W. Murray and M.H. Wright, "Practical Optimization", Academic
Press, London, 1981.
[10] J.L. Goffin, "On the Finite Convergence of the Relaxation Method for Solving
Systems of Inequalities", Operations Research Center Report ORC 71-36, Univ.
of California at Berkeley, 1971.
[11] G.E. Hinton, "A parallel computation that assigns object-based frames of
reference", in Proc. 7th Int. Joint Conf. on Artificial Intelligence, 1981.
[12] M.P. Kennedy and L.O. Chua, "Unifying the Tank and Hopfield linear
programming network and the canonical nonlinear programming network of
Chua and Lin", IEEE Trans. Circ. Syst., vol. CAS-34, pp. 210-214, Feb. 1987.
[13] M.P. Kennedy and L.O. Chua, "Neural networks for nonlinear programming",
IEEE Trans. Circ. Syst., vol. 35, pp. 554-562, May 1988.
[14] R.P. Lippmann, "An introduction to computing with neural nets", in "Artificial
Neural Networks: Theoretical Concepts", V. Vemuri, Editor, IEEE Computer
Society Press, 1988.
[15] C.Y. Maa and M. Shanblatt, "Linear and quadratic programming neural
network analysis", IEEE Trans. Neural Networks, vol. 3, pp. 580-594, Jul. 1992.
[16] T. Martin, "Acoustic Recognition of a Limited Vocabulary in Continuous
Speech", Ph.D. Thesis, Dept. Electrical Engineering Univ. Pennsylvania, 1970.
[17] T.S. Motzkin and I.J. Schoenberg, "The relaxation method for linear
inequalities", Can. J. Math., vol. 6, pp. 393-404, 1954.
[18] H. Oh and S.C. Kothari, "A pseudo-relaxation learning algorithm for
bidirectional associative memory", in Proceedings of the International Joint
Conference on Neural Networks, Baltimore, Maryland, (1992), Volume II, pp.
208-213.
[19] H. Oh and S.C. Kothari, "Adaptation of the relaxation method for learning in
bidirectional associative memory", IEEE Trans. Neural Networks, vol. 5, No 4,
pp. 576-583, 1994.
[20] A. Rodríguez-Vázquez, R. Domínguez-Castro, A. Rueda, J.L. Huertas and E.
Sánchez-Sinencio, "Nonlinear switched-capacitor 'neural' networks for
optimization problems", IEEE Trans. Circ. Syst., vol. 37, pp. 384-397, Mar. 1990.
[21] F. Rosenblatt, "Principles of Neurodynamics", Spartan Books, Washington,
D.C., 1962.
[22] W.R. Smythe Jr and L.A. Johnson, "Introduction to linear programming with
applications", Prentice Hall, Englewood Cliffs, N.J., 1966.
[23] D.W. Tank and J.J. Hopfield, "Simple 'neural' optimization networks: An A/D
converter, signal decision circuit, and a linear programming circuit", IEEE Trans.
Circ. Syst., vol. CAS-33, pp. 533- 541, May 1986.
[24] S.H. Zak, V. Upatising and S. Hui, "Solving linear programming problems
with neural networks: a comparative study", IEEE Trans. Neural Networks, vol. 6,
pp. 94-104, Jan. 1995.
Fig. 1: Winner-Takes-All network with 4 inputs. Full lines have weight 1, dashed
lines -1. Activation functions are a sign function for the first layer and a step
function, with threshold of 5/2, for the second layer. Data transits from left to
right. The outputs yi are 0 except at the position of the maximum input xi.
Fig. 2: Artificial neural network to perform the maximal distance and maximal
residual relaxation-projection algorithms.
Fig. 3: The i'th subnetwork of the chain that constitutes the artificial neural
network to perform the systematic relaxation-projection algorithm.
Fig. 4: Artificial neural network to perform the simultaneous relaxation-projection
algorithm.
Fig. 5: Trajectories (value vs step number) of the variables x1 and x2, produced
by the maximal distance relaxation-projection network, for a sample problem,
with λ = 1.5 and ρ = 0.25. x2 is the variable that increases at the start.
Fig. 6: Trajectories (value vs step number) of the variables x1 and x2, produced
by the systematic relaxation-projection network, for a sample problem, with λ = 1
and ρ = 0.25. x1 is the variable that increases at the start.
Fig. 7: Trajectories (value vs step number) of the variables x1 and x2, produced
by the simultaneous relaxation-projection network, for a sample problem, with λ =
2 and ρ = 2.5. x1 is the variable that increases most at the start.