ON SOLVING SYSTEMS OF LINEAR INEQUALITIES
WITH ARTIFICIAL NEURAL NETWORKS
Gilles Labonté
Department of Mathematics and Computer Science
Royal Military College of Canada
Kingston, Ontario, K7K 5L0 Canada
Abstract. The implementation of the relaxation-projection algorithm by artificial
neural networks to solve sets of linear inequalities is examined. The different
versions of this algorithm are described, and theoretical convergence results are
given. The best known analogue optimization solvers are shown to use the
simultaneous projection version of it. Neural networks that implement each
version are described. The results of tests, made with simulated realizations of
these networks, are reported. These tests consisted in having all networks solve
some sample problems. The results obtained help determine good values for
the step size parameters, and point out the relative merits of the different
networks.
1. INTRODUCTION
The problem of solving a system of linear inequalities arises in numerous
applications. It is omnipresent in optimization, where it is solved by itself, or
concurrently with the problem of finding the minimum value of a cost function, as
with the simplex algorithm [6], or as a preliminary step for interior point methods
(see for example, Chapter 5 of [9]).
The particular method of solution of this problem called the relaxation-
projection method is the main object of the present article. The original research
on this method was carried out, around 1954, by S. Agmon [1], T.S. Motzkin and
I.J. Schoenberg [17]. Associating the inequalities to half-spaces, in which lie the
points corresponding to the feasible solutions, they proved that such a point can
be reached, from an arbitrary outside point, by constructing a trajectory of
straight line segments, each of which is in the direction of one of the half-spaces
corresponding to violated constraints.
Many neural network training procedures consist, or are based upon, a
method very closely related to that algorithm. For example, the single layer
perceptron training method is such a process. Notwithstanding this fact, it
seems that F. Rosenblatt [21], H.D. Block [3], and the many others who
contributed to its proof of convergence were unaware of the work of Agmon,
Motzkin and Schoenberg, since no mention of it can be found in their writings.
Recently, H. Oh and S.C. Kothari [18,19], in their study of neural networks
used as bidirectional associative memory, realized the usefulness of these
results and, in effect, proposed using a particular version of the relaxation-
projection algorithm to calculate directly the weights of the neurons.
Even though the mathematical results concerning these algorithms are
clearly very pertinent to the field of artificial neural networks, they seem to have
gone very much unnoticed by the researchers in that field, until their use by Oh
and Kothari. Thus, one of our aims, in the present article, is to draw attention to
the most important results concerning these methods. We shall describe the
different versions of the relaxation-projection algorithm, known as the maximal
distance, the maximal residual, the systematic, the general recurrent and the
simultaneous relaxation-projection methods. We shall also give definite
theorems concerning the step size parameters for which convergence, and even
termination in a finite number of steps, is guaranteed.
After having done so, we shall look at some of the best known analogue
optimization networks, namely those of L.O. Chua and G.N. Lin [5] and of M.P.
Kennedy and L.O. Chua [13], of D.W. Tank and J.J. Hopfield [23], and of A.
Rodríguez-Vázquez et al. [20]. We shall demonstrate that they are all making
use of a continuous time version of the simultaneous relaxation-projection
algorithm.
We shall then show neural networks which implement each of the
different versions of the relaxation-projection algorithm. We shall give the
number of units of time needed to perform one step, and the formulas for the
number of neurons these networks require, in terms of the number of variables
and the number of inequalities to solve. We shall describe two types of
implementations, one with fixed weights, and one with weights varying according
to Hebb's rule.
Finally, we report on tests we made with simulated realizations of all these
networks. These tests consisted in having each network solve a set of fifteen
small problems, with from two to six variables, and from four to sixteen
inequalities, and one somewhat larger problem with twenty variables and
thirty-five inequalities. Different step size parameters were used, so that we can
determine good values to use for these parameters, as well as compare the
relative merits of the different networks.
1.1 Notation
We consider the problem of finding a vector x ∈ Rn such that Ax + b ≥ 0,
where A is a constant m×n matrix and b is a constant vector in Rm. If we let ai
denote the transpose of the i'th row of A, these inequalities can be written as
wi(x) = <ai , x> + bi ≥ 0 for i=1,...,m (1)
where < , > is the Euclidean scalar product. We assume that no ai is the zero
vector.
Define the closed half-space hi and its bounding hyperplane πi as
hi = { x : wi(x) ≥ 0 } ,  πi = { x : wi(x) = 0 } . (2)
ni = ai / |ai| is then the unit normal to πi that points inward of hi. A point x is "on
the right side" of πi if it is in hi; otherwise, it is "on the wrong side" of it. The
Euclidean distance between point x and hyperplane πi is
dist(x, πi) = εi (<ni , x> + βi) (3)
where βi = bi / |ai|, and εi = 1 if x is on the right side of πi and -1 if it is on its
wrong side. The distance between point x and the half-space hi is dist(x, hi) =
dist(x, πi) if x ∉ hi, and zero if x ∈ hi. The solutions to the system of inequalities
correspond to the points of the convex polytope Ω, defined as the intersection of
all half-spaces hi. We shall assume hereafter that Ω is non-empty.
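The notation above can be made concrete with a short numerical sketch (not from the paper; the matrix A, vector b and test points are illustrative):

```python
import numpy as np

# Illustrative system: x >= 0, y >= 0, x + y <= 1.5.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])   # rows are the a_i^t
b = np.array([0.0, 0.0, 1.5])

def residuals(x):
    """The linear forms w_i(x) = <a_i, x> + b_i of Ineqs. (1)."""
    return A @ x + b

def dist_to_halfspace(x):
    """dist(x, h_i): zero when w_i(x) >= 0, else the distance to the plane pi_i."""
    w = residuals(x)
    norms = np.linalg.norm(A, axis=1)      # |a_i|
    return np.where(w >= 0, 0.0, -w / norms)

assert np.all(residuals(np.array([0.5, 0.5])) >= 0)   # a point of Omega
d = dist_to_halfspace(np.array([2.0, -1.0]))          # violates y >= 0 only
```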
1.2 Methods of Solution
Essentially all optimization methods, except of course those that require
starting from a feasible solution, will solve the feasibility problem when the true
cost function is set to zero. This is particularly straightforward to implement for
methods which use an objective function consisting of two terms: a term for
the actual cost to be minimized, and a penalty term for the unsatisfied
constraints. We recall (see, for example, Chapter 5 of Ref. [9]), that the penalty
functions that are most commonly used in optimization are the two functions F1
and F2 defined by:
F1(x) = ∑i∈I(x) ηi |<ai , x> + bi|  and  F2(x) = ∑i∈I(x) ηi (<ai , x> + bi)² (4)
where I(x) is the set of the indices of the constraints which are violated by x, and
ηi are some positive constants.
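As an illustration, the penalty functions of Eq. (4) can be sketched as follows (illustrative constraints, with all ηi = 1):

```python
import numpy as np

# Illustrative constraints: x >= 1 and y >= 1.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, -1.0])

def F1(x):
    """Sum of |w_i(x)| over the violated constraints only (eta_i = 1)."""
    w = A @ x + b
    return np.sum(np.abs(w[w < 0]))

def F2(x):
    """Sum of w_i(x)^2 over the violated constraints only (eta_i = 1)."""
    w = A @ x + b
    return np.sum(w[w < 0] ** 2)

x = np.array([0.0, 0.5])
# w(x) = (-1.0, -0.5): both constraints violated, so F1 = 1.5 and F2 = 1.25
```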
There are also algorithms which have been developed especially for the
solution of the feasibility problem. This is the case for the relaxation-projection
method of S. Agmon [1], T.S. Motzkin and I.J. Schoenberg [17], mentioned
above, and for the simultaneous relaxation-projection method, proposed more
recently by Y. Censor and T. Elfving [4]. This latter method is a variant of the
former, in which the steps of the iteration sequence are made in the direction of
an average of the normals toward all the half-spaces of the violated constraints.
This method is remarkable in that, as proved by A.R. De Pierro and A.N. Iusem
[7], even for inconsistent problems, it will produce a point for which the weighted
average of the squares of the distances to the half-spaces, i.e. the value of the
function F2, is minimum.
2. THE RELAXATION-PROJECTION ALGORITHMS
Define the operators T(hi), i=1,...,m, such that
T(hi) x = x  if x ∈ hi
T(hi) x = x + [λi dist(x, πi) + ρi] ni  if x ∉ hi (5)
where λi and ρi are non-negative constants. Define also the operator T:
T = ∑i=1,...,m γi T(hi) ,  where each γi > 0 and ∑i=1,...,m γi = 1 . (6)
Single-Plane Algorithm. Define an infinite sequence of half-spaces Hν by
repeating elements of the set {hi , i=1,...,m}, as prescribed below. Take an
arbitrary x0 in Rn and, as long as xν ∉ Ω, define inductively xν+1 = T(Hν) xν.
How the sequence Hν is defined characterizes different versions of this
algorithm. Some often considered choices are
1) The maximal distance algorithm, for which Hν is the half-space farthest away
from xν, or any one of them, if more than one is at the largest distance.
2) The maximal residual algorithm, for which Hν = hi if wi(xν) is the linear form in
the set of Ineqs. (1) which has the smallest negative value, or any one of them, if
more than one has the smallest value.
3) The systematic projection algorithm, for which Hν is the infinite cyclic
sequence with Hν = hi for ν = i (mod m).
4) The general recurrent projection algorithm, for which the infinite sequence of
half-spaces Hν is arbitrary except for the requirement that any one half-space
hi must reoccur within a finite number of steps after any given ν. Sequences so
defined are commonly considered in neural network theory, when it comes to
presenting a finite set of exemplars to a learning neural network; (see, for
example, F. Rosenblatt [21] and H.D. Block [3]). The systematic projection
algorithm is obviously a particular case of it.
Multi-Plane Algorithm. Take an arbitrary x0 in Rn and, as long as xν ∉ Ω,
define inductively xν+1 = T xν.
This is the simultaneous projection algorithm. In its more general form,
the γi's are allowed to vary from step to step, as long as their sum remains 1.
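As a sketch of how these steps might be implemented (an assumed reading of Eqs. (5)-(6), with λi = 1, ρi = 0 and γi = 1/m; the constraint system is illustrative):

```python
import numpy as np

def step_maximal_distance(A, b, x, lam=1.0):
    """One step of the single-plane maximal distance algorithm (rho_i = 0)."""
    norms = np.linalg.norm(A, axis=1)
    d = -(A @ x + b) / norms          # signed distance: positive iff violated
    k = int(np.argmax(d))
    if d[k] <= 0:                     # x already satisfies every inequality
        return x
    return x + lam * d[k] * A[k] / norms[k]   # x + lam * dist(x, pi_k) * n_k

def step_simultaneous(A, b, x, lam=1.0):
    """One simultaneous (multi-plane) step with gamma_i = 1/m."""
    norms = np.linalg.norm(A, axis=1)
    d = np.maximum(-(A @ x + b) / norms, 0.0)     # dist(x, h_i)
    return x + (lam / A.shape[0]) * (A / norms[:, None]).T @ d

# Illustrative system: x >= 0 and y >= 0.
A = np.array([[1.0, 0.0], [0.0, 1.0]]); b = np.array([0.0, 0.0])
x = np.array([-2.0, -1.0])
for _ in range(100):
    x = step_simultaneous(A, b, x)
# x is now within numerical tolerance of the polytope Omega
```

On this example the maximal distance variant terminates in two steps, while the simultaneous variant only approaches Ω geometrically, in accordance with Theorems 2 and 5.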
2.1 General Convergence Properties
In this and the following two sections, we review some important results
concerning the convergence of the relaxation-projection algorithms described
above. Because the results we could find published did not cover all the variants
of these algorithms, we had to generalize some of them. We simply state those
results that had already been proven as such, and prove those that resulted from
some generalization. The proofs are worth going over in that they provide a
good understanding of the nature of the algorithms.
We start with two preliminary lemmas on which most of the proofs are
based.
Lemma 2.1: Let x be an arbitrary point, let y = T(hi) x with 0 ≤ λi ≤ 2, and let the
point a ∈ hi be such that 0 ≤ ρi ≤ 2 dist(a, πi); then | y - a | ≤ | x - a |.
Proof: If x ∈ hi, then y = x and the result is trivial. Let us therefore consider x ∉
hi and, to simplify the notation, define di(x) = dist(x, πi). A straightforward
algebraic calculation, making use of Eqs. (5) and (3), yields the following
equation.
| y - a |² = | x - a |² - Qi(x) (7)
with Qi(x) = [λi di(x) + ρi] [(2 - λi) di(x) + 2 di(a) - ρi] (8)
The factorization we have made in Qi(x) makes it evident that, under the
hypotheses of this lemma, Qi(x) ≥ 0 ∀ i. •
Lemma 2.2: Let Ω be non-empty, let x be an arbitrary point outside Ω, and let
y = Tx, while for each T(hi) entering in T, 0 ≤ λi ≤ 2 and 0 ≤ ρi ≤ 2 dist(a, πi) for
some point a ∈ Ω; then | y - a | ≤ | x - a |.
Proof: Upon using the definition of T, given in Eq. (6), and the result of Lemma
2.1, one gets:
| y - a | = | ∑i=1,...,m γi T(hi) x - a | ≤ ∑i=1,...,m γi | T(hi) x - a | ≤ ∑i=1,...,m γi | x - a | = | x - a |. (9) •
Theorem 1: Let xν be any type of single-plane or multi-plane relaxation-
projection sequence, with 0 ≤ λi ≤ 2 ∀ i and with 0 ≤ ρi ≤ 2 dist(a, πi) ∀ i, for
some point a ∈ Ω; then the sequence of distances | xν - a | is monotonically
non-increasing and thus convergent.
Proof: Lemmas 2.1 and 2.2 imply that the inequality | xν+1 - a | ≤ | xν - a |
holds for all ν's. The sequence of distances | xν - a | is therefore monotone
non-increasing and since it is obviously bounded below by zero, it is then
necessarily convergent. •
Theorem 1 states that all relaxation-projection sequences have the
remarkable geometrical property, called Fejér monotonicity, of approaching
pointwise the polytope Ω or a subset of it. Indeed, when ρi = 0 ∀ i, each step xν
→ xν+1 of the algorithm produces a point xν+1 that is closer than xν, or at an
equal distance, to every point of the polytope Ω. When ρi > 0, a similar property
holds with respect to the subset {a : 2 dist(a, πi) ≥ ρi ∀ i} of Ω. This subset is a
sort of core inside Ω, the boundaries of which are obtained by translating inwards
by ρi/2 each hyperplane πi, i=1,...,m. Note that, when Ω is not full-dimensional,
this subset is empty when all ρi's are positive.
Theorem 1, or particular cases of it, can be found stated in most articles
dealing with the convergence of relaxation-projection algorithms. S. Agmon [1],
T.S. Motzkin and I.J. Schoenberg [17] were the first to mention it for single-plane
algorithms, with ρi = 0, λi = λ ∀ i, and 0 < λ < 2. H. Oh and S.C. Kothari [19]
proved the same result for algorithms with ρi > 0 and the same conditions as
above on the λi's. Although the latter authors talk explicitly about the systematic
relaxation-projection algorithm, their proof clearly holds for all single-plane
algorithms. However, they do not provide an explicit upper bound on the ρi's, as
we did in Theorem 1; they simply state that if Ω is full-dimensional, the ρi's can
always be taken small enough for the property to hold. Y. Censor and T. Elfving
[4] proved the Fejér monotonicity of the multi-plane simultaneous relaxation-
projection algorithm with ρi = 0 and 0 < λi < 2 ∀ i.
Our proof of Theorem 1 has the merit of covering all variants of the
algorithm and it is somewhat more direct than some of the above mentioned
proofs.
Theorem 2: Let xν be any type of single-plane or multi-plane relaxation-
projection sequence, with 0 ≤ λi ≤ 2 ∀ i and with 0 < ρi < 2 dist(a, πi) ∀ i, for
some point a ∈ Ω; then xν terminates after a finite number of steps at a point of
Ω.
Proof: Let ρm be the smallest of the ρi's and define the positive constant
K = Min {[2 di(a) - ρi] : i=1,...,m}. Eq. (8) shows that Qi(x) ≥ ρm K ∀ i.
i) Single-plane methods. With the help of Eq. (7) and the above lower bound on
Qi, one sees that each time a step xν → xν+1 of the algorithm is made with a
half-space that does not contain xν, | xν+1 - a |² ≤ | xν - a |² - ρm K. Thus, if at
point xµ there have been N non-trivial steps, | xµ - a |² ≤ | x0 - a |² - N ρm K.
Since distances are bounded below by zero, there can only be a finite number of
non-trivial steps.
ii) Multi-plane method. With the lower bound obtained above on Qi, Eq. (7),
which holds for x ∉ hi, leads to | T(hi) x - a | ≤ | x - a | - [ρm K]½. Thus, Ineq. (9)
can be refined to yield
| xν+1 - a | ≤ | xν - a | - ∑i∈Iν γi [ρm K]½ ≤ | xν - a | - γm [ρm K]½
where Iν is the set of indices of the half-spaces not containing xν, and γm is the
smallest of the γi's. Thus, at point xµ, | xµ - a | ≤ | x0 - a | - µ γm [ρm K]½. Again,
since distances are bounded below by zero, there can only be a finite number of
non-trivial steps. •
F. Rosenblatt, in Chapter 5 of Ref.[21] and H.D. Block in his article [3]
about the convergence of the learning procedure of single (evolving) layer
perceptrons, have proven the result of Theorem 2, for the single-plane general
recurrent algorithm, with λi = 0 and ρi > 0 ∀ i, in the particular situation where
the polytope Ω is actually a hypercone. For hypercones, the conditions of the
theorem impose no upper bound on the values of the ρi's since, whatever these
values, it is always possible to find a point a inside the hypercone, far enough
from its apex, so that these conditions hold. Although it is not obvious, the
pseudo-relaxation method proposed by H. Oh and S.C. Kothari [18,19] can be
recognized as the single-plane systematic relaxation-projection algorithm, with λi
= λ, 0 < λ < 2 and ρi > 0 ∀ i. The argument we used above, for the part of our
Theorem 2 that deals with the general single-plane algorithm, is the same one
they used in proving their Theorem 1 of Ref.[19]. Note however that they did not
provide an explicit upper bound on the ρi's, as we do; they simply stated that if Ω
is full-dimensional, the ρi's can always be taken small enough for the property to
hold. We did not find any published proof of Theorem 2 for the simultaneous
relaxation-projection algorithm.
2.2 Convergence of Single-Plane Methods with ρi = 0
Theorem 3: Let xν be any type of single-plane relaxation-projection sequence
with 0 < λi < 2 and ρi = 0 ∀ i; then xν converges to a point of Ω.
For the proof of this theorem, we shall use the following lemma, which
holds under the same hypotheses as Theorem 3.
Lemma 2.3: The sequence | xν+1 - xν | converges to zero.
Proof: The definition of the sequence xν is such that it terminates only if it has
reached a point of Ω; thus the lemma needs to be proven only for infinite
sequences xν. We write hereafter Λν for λi and Πν for πi if the half-space Hν is
hi, and we write Dν(x) for dist(x, Πν). Thus, Eq. (7) becomes:
| xν+1 - a |² = | xν - a |² - Qν  when xν ∉ Hν (10)
with Qν = Λν Dν(xν) [(2 - Λν) Dν(xν) + 2 Dν(a)] ≥ Λν(2 - Λν) [Dν(xν)]² ≥ 0.
Since the sequences | xν+1 - a | and | xν - a | have the same limit, it
follows from Eq. (10) that the limit of the non-negative sequence Qν must be
zero. The sequence Λν(2 - Λν), taking its values in a finite set of positive
numbers, is bounded away from zero; hence this can happen if and only if the
sequence Dν(xν) converges to zero. The conclusion of the lemma follows from
the fact that, for any single-plane relaxation-projection sequence with ρi = 0 ∀ i,
the step size is | xν+1 - xν | = Λν Dν(xν). •
Proof of Theorem 3: Lemma 2.1 states that the sequence xν is bounded.
Thus, by the Bolzano-Weierstrass theorem, it must have at least one
accumulation point l. Our proof of Theorem 3 will consist in proving that there is
only one such point, which is thus the limit of xν, and that this point is in Ω.
We first show that l cannot be outside of Ω. For this, we consider the
possibility that it is; let d be the distance between the point l and the closest
half-space not containing it, and let λm be the smallest of the λi's. By the
definition of accumulation points, for any ε > 0, there exists an index ν0(ε) after
which the sequence xν has an infinite number of its points in the closed sphere
Sc(l, ε) of radius ε, centered on l. Consider then ε < λm d / [2(λm + 1)], and a
point xν ∈ Sc(l, ε) with ν large enough that | xν+1 - xν | < ε (recall Lemma 2.3).
The former condition on xν implies that, whatever the plane Πν,
Dν(xν) ≥ Dν(l) - ε ≥ d - ε > (2 + λm) ε / λm. Therefore
| xν+1 - xν | = Λν Dν(xν) ≥ λm Dν(xν) > (2 + λm) ε,
which is incompatible with the latter condition on xν. Thus no accumulation
point of the sequence xν can be outside Ω.
The accumulation point l must be on the surface of Ω. Indeed, it cannot
be inside Ω since, then, ε can be taken such that the sphere Sc(l, ε) lies entirely
inside Ω. The first point of the sequence xν to enter this sphere would then be
inside Ω, and the sequence would terminate at this point.
There then remains only to prove that there can be only one accumulation
point at the surface of Ω. Suppose there are two different such points l and q,
and take ε < | l - q | / 2 and small enough that the sphere Sc(l, ε) is traversed only
by hyperplanes containing l. Then, if xν is a point of the sequence in this sphere,
xν+1 must also be in this sphere, because the reflecting hyperplane Πν
necessarily passes through l. By induction, one can prove that all following
points are also necessarily in Sc(l, ε), contrary to the hypothesis of existence of
another accumulation point q ≠ l. •
As for our Theorem 1 above, S. Agmon was the first one to prove this
result explicitly, in his Theorem 3 of Ref. [1], for maximal distance relaxation-
projection sequences, with ρi = 0, λi = λ ∀ i, and 0 < λ < 2. In Section 4 of
Ref. [1], he states that his result can be proven as well for maximal residual and
systematic relaxation-projection sequences. T.S. Motzkin and I.J. Schoenberg [17]
also proved exactly the same result as Agmon, but by a different method. The
proof we present above covers explicitly all variants of the single-plane algorithm
and involves ideas similar to those used by Motzkin and Schoenberg.
Theorem 4: (a) When Ω is full-dimensional, there exists a constant λ0 ∈ [1,2)
such that all the single-plane relaxation-projection sequences, for which λ0 < λi ≤
2 and ρi = 0 ∀ i, terminate after a finite number of steps. (λ0 is a geometrical
constant associated with the polytope Ω, defined in Ref. [10].)
(b) Furthermore, if λ > 2, then the sequence either converges
finitely to a point of Ω or it does not converge.
T.S. Motzkin and I.J. Schoenberg [17] were the first to prove finite
termination, for the particular case λi = 2 and ρi = 0 ∀ i. J.L. Goffin (see his
Theorem (3.3.1) and his Section 4.1 of Ref. [10]) then proved the above
theorem, which constitutes a noteworthy improvement over the result of Motzkin
and Schoenberg, in that it guarantees termination for a whole interval of λi's.
This fact may prove important when doing numerical computations, in that it
would allow one to avoid the inevitable instability of a property which holds only
for one particular value of some parameter.
2.3 Convergence of the Multi-Plane Method with ρi = 0
Theorem 5: Consider F(x) = ∑i=1,...,m γi λi [dist(x, hi)]². A simultaneous relaxation-
projection sequence xν, with ρi = 0 and 0 < λi < 2, produces a monotonically
non-increasing sequence F(xν). xν converges to a solution if the system of
inequalities is consistent and, if not, to a minimizer of F(x).
Y. Censor and T. Elfving [4] were the first to prove the convergence of
the simultaneous relaxation-projection sequence under the conditions of
Theorem 5. A proof similar to theirs is easily produced with the help of the ideas
we used in the proofs presented above. A different proof was given by A.R. De
Pierro and A.N. Iusem in Ref. [7] who also had the merit of proving Theorem 5
as such.
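A one-dimensional sketch of this behaviour, with the inconsistent pair of constraints x ≥ 1 and -x ≥ 0, γi = 1/2 and λi = 1 (illustrative values, not taken from Ref. [7]):

```python
# Inconsistent 1-D system: h1 = {x >= 1} (normal n1 = +1) and
# h2 = {x <= 0} (normal n2 = -1).  With gamma_i = 1/2 and lambda_i = 1, the
# simultaneous step is x' = x + (1/2)[dist(x,h1)*n1 - dist(x,h2)], and the
# iterates converge to x = 1/2, the minimizer of
# F(x) = (1/2)[dist(x,h1)^2 + dist(x,h2)^2].
def step(x):
    d1 = max(1.0 - x, 0.0)    # dist(x, h1)
    d2 = max(x, 0.0)          # dist(x, h2)
    return x + 0.5 * (d1 - d2)

x = 3.0
for _ in range(200):
    x = step(x)
# x settles at 1/2, equidistant from the two incompatible half-spaces
```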
It is remarkable that Theorem 5 is the only result to the effect that each
step of a relaxation-projection sequence decreases an objective function. Even
the similar simultaneous relaxation-projection sequence with ρi > 0 has not been
proven to have this property with respect to its objective function F:
F(x) = ∑i=1,...,m γi { λi [dist(x, hi)]² + 2 ρi dist(x, hi) } . (11)
It is, of course, nonetheless true that all relaxation-projection sequences which
converge to a point of Ω produce a sequence of values F(xν) which converges
to 0, and therefore to the minimum of F(x), where F is the objective function of
Eq. (11), with arbitrary positive constants γi.
3. ON THE ANALOG FEASIBILITY SOLVERS
Widespread interest in the use of electrical circuits as analog solvers for
optimization problems was really awakened in 1986 by D.W. Tank and J.J.
Hopfield [23] (for an overview of this subject, see C.Y. Maa and M. Shanblatt
[15]). As M.P. Kennedy and L.O. Chua [12] pointed out, the network proposed
by Tank and Hopfield, with corrected sign for the penalty function, is closely
related to the canonical nonlinear programming circuit of L.O. Chua and G.N. Lin
[5]. Another circuit that is also often mentioned is that of A. Rodríguez-Vázquez
et al. [20].
We shall briefly examine, in this section, the feasibility solvers that these
networks become when they are used with a zero cost function. Although they
are also of interest, we shall not discuss, in the present article, models where
equality and inequality constraints are treated separately (as, for example, the
model of S.H. Zak et al. [24]). Thus, we consider models that solve instances of
the optimization problem:
Minimize φ(x), subject to the set of constraints Ax + b ≥ 0.
3.1 The Model of Chua-Lin and Kennedy-Chua
The circuit of Chua-Lin and Kennedy-Chua [5, 13] is devised to solve the
general optimization problem, where minimal assumptions are made about the
cost function and the constraint functions. When the constraints are taken to be
linear in x, the evolution equation for the n-vector x of voltages in this circuit is
C dx/dt = -∇φ + (1/R) ∑i=1,...,m |ai|² dist(x, hi) ni (12)
where C is an n×n diagonal matrix of constant capacitances and R is a constant
resistance. A Liapunov function that is minimized by this system is
E(x) = φ(x) + (1/2R) ∑i=1,...,m |ai|² [dist(x, hi)]² . (13)
When the cost function φ is zero, the network implements a continuous-
time version of the simultaneous relaxation-projection algorithm, with ρi = 0 ∀ i.
In order to see this, do the change of variables x̃ = (RC)½ x, ãi = (RC)-½ ai to
remove the constants R and C from Eq. (12). A first-order Euler discretization of
the resulting equation then produces the equation x̃ν+1 = T x̃ν, which describes
one step of the simultaneous relaxation-projection algorithm, with ρi = 0 ∀ i and
γiλi = ∆t |ãi|². The value of the Liapunov function E is then F(x)/(2R∆t), where F
is the objective function of our Theorem 5.
Note that the conditions for the convergence of the discrete algorithm,
seen in Section 2, correspond here to upper bounds on the step size ∆t.
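The correspondence can be sketched numerically: an Euler step of Eq. (12) with φ = 0 and, for simplicity, R = C = 1 (illustrative constraints; ∆t is assumed small enough that ∆t |ai|² < 2 for every i):

```python
import numpy as np

def euler_step(A, b, x, dt):
    """x' = x + dt * sum_i |a_i|^2 dist(x, h_i) n_i  (Eq. (12), phi = 0, R = C = 1)."""
    norms = np.linalg.norm(A, axis=1)
    d = np.maximum(-(A @ x + b) / norms, 0.0)          # dist(x, h_i)
    return x + dt * (norms**2 * d) @ (A / norms[:, None])

# Illustrative constraints: x >= 0 and 2y >= 2; here dt*|a_i|^2 <= 0.4 < 2.
A = np.array([[1.0, 0.0], [0.0, 2.0]]); b = np.array([0.0, -2.0])
x = np.array([-1.0, 0.0])
for _ in range(500):
    x = euler_step(A, b, x, dt=0.1)
# x approaches the feasible set {x >= 0, y >= 1}
```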
3.2 The Model of Rodríguez-Vázquez et al.
The circuit proposed by Rodríguez-Vázquez et al.[20] is also devised to
solve the general optimization problem, with minimal conditions on the cost and
constraint functions. Their model is characterized by its division of the space in
two regions: the region of feasibility, inside of which the objective function is
solely the cost function, and the rest of space, where the objective function is
solely the penalty function for the violated constraints. Correspondingly, they
define the pseudo-cost function
ψ(x) = U(Ax+b) φ(x) + µ P(x) ,  with U(Ax+b) = 1 if Ax+b ≥ 0, and 0 otherwise. (14)
The constant µ is called the penalty multiplier and P is the penalty function for
the violated constraints. P can be taken to be either one of P1 or P2 with:
P1(x) = ∑i=1,...,m |ai| dist(x, hi)  and  P2(x) = ∑i=1,...,m |ai|² [dist(x, hi)]² . (15)
The equation of motion that corresponds to their circuit is
dx/dt = - U(Ax+b) ∇φ - µ ∑i∈I(x) vi ai (16)
where I(x) is the set of indices of the violated constraints, and
vi = -1 if P = P1 ,  vi = - |ai| dist(x, πi) if P = P2 .
When the cost function φ is zero, this model corresponds to a continuous-
time version of the simultaneous relaxation-projection algorithm, with λi = 0 ∀ i
when the penalty function P is P1, and with ρi = 0 ∀ i when P = P2.
A first-order Euler discretization of Eq. (16) transforms it into the equation
describing a step of the simultaneous relaxation-projection algorithm with
λi = 0 ∀ i and γiρi = µ ∆t |ai| when P = P1 ,
ρi = 0 ∀ i and γiλi = µ ∆t |ai|² when P = P2 .
The value of the pseudo-cost function in Eq. (14), when P = P1, is F(x)/(2∆t),
where F is the objective function of Eq. (11) with λi = 0 ∀ i. When P = P2, it is
F(x)/∆t, where F is the objective function defined in our Theorem 5. The
conditions of convergence, discussed in Section 2, correspond to upper bounds
on ∆t.
3.3 The Model of Tank-Hopfield
The linear programming network of Tank and Hopfield [23] minimizes the
objective function φ(x) = <k , x>, where k is a constant n-vector. Its equation of
motion, for the n-vector of voltages x, with corrected sign for the penalty term
(see M.P. Kennedy and L.O. Chua [12]), is
rC dx/dt = - k - (rR)-1 x + ∑i=1,...,m |ai|² dist(x, hi) ni (17)
h (17)
where C is an n×n constant diagonal matrix of capacitances, R is a constant
n×n diagonal matrix of resistances, and r is the proportionality constant in the
linear input-output function of the variable amplifiers. The Liapunov function for
this model is
E(x) = <k, x> + (1/2) ∑i=1,...,m |ai|² [dist(x, hi)]² + (1/2r) <x , R-1 x> . (18)
When the cost function is zero, this model does not quite correspond to the
simultaneous relaxation-projection algorithm, due to the additional term -(rR)-1x
on the right hand side of Eq.(17).
In order to make conspicuous the effect of this additional term, we
consider a problem with a one-dimensional variable x, and the single constraint
x ≥ b > 0. The equation of motion and its solution, in the region x < b, are then
rC dx/dt = - x/(rR) + (b - x)  and  x(t) = β + [x(0) - β] exp(-αt) ,
with α = [1 + 1/(rR)] / (rC) and β = b rR / (1 + rR). It is straightforward to show
that x(t) always remains smaller than b at finite times, when x(0) < b, and that
the limit x(∞) = β < b. Thus, x(t) never moves into the region where the
constraint is satisfied, even asymptotically. Not only that: if x(0) is such that
x(∞) < x(0) < b, x(t) will actually move away from the region of feasibility and
decrease monotonically to x(∞).
These calculations corroborate the remark, made by M.P. Kennedy and
L.O. Chua [12] and C.Y. Maa and M. Shanblatt [15], to the effect that the Tank-
Hopfield network should be used only when the resistances in R are very large,
so that the second term on the right hand side of Eq. (17) is negligible. When
this is the case, the circuit implements the simultaneous relaxation-projection
algorithm.
4. OTHER RELAXATION-PROJECTION NETWORKS
The analogue networks of Section 3 were all implementations of the
simultaneous relaxation-projection algorithm. We now present networks which
implement all the variants of the relaxation-projection algorithm. Although we
describe them as digital networks performing the discrete algorithms, it should be
clear that they can also be realized as analogue electrical circuits, performing the
continuous time algorithms.
We consider networks of McCulloch-Pitts type neurons, such that when
a neuron has input vector x, weight vector w and activation function f, its output
is f(<w, x>). These neurons are arranged in layers, the data taking one unit of
time to go through each layer. A clock, and possibly delays, ensure that the
proper data enter and leave each layer in step. As with the analogue networks
of Section 3, the neurons all have fixed weights which depend on the parameters
in the inequalities.
These networks all work on the same principle: an arbitrary vector x0 is
initially fed to them as input. They are then left to cycle, their output being fed
back as input, until a solution is reached.
4.1 Maximal Distance and Maximal Residual Algorithms
The maximal distance and the maximal residual algorithms each require a
Winner-Takes-All (WTA) subnetwork to select the maximum xM in a set of values
{x1,...,xn}. This WTA subnetwork must take the n-vector x = (x1,...,xn)t as input,
and return as output the n-vector y = (y1,...,yn)t of zeros, except for a 1 at only
one of the positions of xM in x.
Some Winner-Takes-All Networks .
1) Feldman and Ballard [8] have presented a WTA network which operates in
one unit of time. It is however composed of neurons that are somewhat more
complex than McCulloch-Pitts neurons, in that the value of their activation
function depends on the position of the inputs on their surface, as well as the
connection weights. When n inputs arrive at different locations on the surface of
such a neuron, one of them is favoured in that only when this one is the largest
of all inputs does the neuron fire, with output 1. The same behavior is obtained
when considering that the "favoured" input is presented directly to the neuron,
and the other (n-1) inputs come from the neighboring neurons, through inhibiting
channels.
A single layer of such neurons, each having one of the xi's arriving at its
favoured surface location, constitutes a WTA network; its output vector will have
a 1 only at the positions of xM in x, and zeros elsewhere. This WTA network is
obviously the optimum. However, we cannot use it if we restrict ourselves to
networks with only McCulloch-Pitts neurons.
2) A WTA network which operates in only two time steps can be made with
McCulloch-Pitts neurons.
Its first layer has n(n-1)/2 neurons, which we label as nij, with i < j and
i, j ∈ {1,...,n}, according to the two components xi and xj of x that each receives
as input. The weights of its connections to these inputs are respectively +1 and -1.
Their activation function is the sign function sgn+: sgn+(x) = 1 if x ≥ 0 and -1 if x
< 0. Thus, the components of the output vector of this first layer are the signs of
all the possible differences between two components of the input x.
The second layer has n neurons, with the k'th one connected to each
neuron nij of the first layer, for which either i or j = k. Its connection weight is +1
if i = k and -1 if j = k. These neurons have the activation function f: f(x) = 1 if x ≥
(n - 3/2) and 0 otherwise. Thus, when the components of the input to the
network are all different, the total input to the k'th neuron is
∑_{i≠k} sgn(xk - xi), the sum running over i = 1,...,n with i ≠ k. It
can be seen that, with sgn+ as activation function for the first layer, the output
vector of the network will have a 1 only at the first position of xM in x, and 0
everywhere else, as desired. This WTA network requires a total of n(n+1)/2
neurons. An example of this network, with n = 4, is shown in Figure 1.
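The operation of this two-layer WTA network can be sketched in ordinary code. The following is a minimal numerical simulation of the computation it performs, not a circuit-level model; the function name wta and the use of NumPy are our own choices:

```python
import numpy as np

def sgn_plus(v):
    """The sgn+ activation: +1 if v >= 0, -1 otherwise."""
    return 1.0 if v >= 0 else -1.0

def wta(x):
    """Two-layer McCulloch-Pitts WTA network: returns a 0/1 vector with a
    single 1 at the first position of the maximum component of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    totals = np.zeros(n)
    # First layer: one neuron per pair (i, j), i < j, output sgn+(x_i - x_j).
    # Second layer: neuron k collects these with weight +1 when i = k and
    # -1 when j = k, then fires iff its total input reaches n - 3/2.
    for i in range(n):
        for j in range(i + 1, n):
            s = sgn_plus(x[i] - x[j])
            totals[i] += s
            totals[j] -= s
    return (totals >= n - 1.5).astype(float)
```

For x = (2, 5, 3, 5)t the output is (0, 1, 0, 0)t: a tied maximum at a later position does not fire, which is the "first position of xM" behavior noted above.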
3) As a final example of a WTA network, we mention the binary maximum selector
network devised by T. Martin [16], and described by R.P. Lippmann [14]. By
appropriately defining its activation functions at zero, this network can be made
to return an output vector y which has a 1 only at the first position of xM in x and
0 everywhere else. This network requires 2[log2 n] + 1 layers of neurons if n > 2
(where [x] = x if x is an integer and the next integer greater than x otherwise),
and 1 layer if n = 2. It then operates in as many time steps. It has
(5 × 2^[log2 n] - 6 + n) neurons if n > 2 and 2 neurons if n = 2.
We remark that when n ≥ 5, this network is slower than the second
network mentioned above. However, it requires fewer neurons than the latter
whenever n ≥ 13. Since we shall here consider neurons to be
inexpensive, we will use the faster second WTA network.
The network that realizes the algorithm. The network shown in Figure 2
performs one step of the maximal distance or the maximal residual algorithm. It
takes an arbitrary vector x as input. If this x is a solution of Ineqs.(1), it is
returned as the output. If it is not, the output is the vector T(hk) x, where k is the
index of the half-space farthest from x, if the maximal distance algorithm is
performed, and the index of the smallest negative linear form wi of Ineqs.(1), if
the maximal residual algorithm is performed.
Here is how this network functions. Its first layer comprises m neurons:
one for each inequality. The threshold of the i'th neuron of this layer is αi βi,
where αi = 1 for the maximal distance algorithm and αi = |ai| for the maximal
residual algorithm. Its weight vector is -αi ni, where ni is the unit vector normal
to the hyperplane πi. As is common practice, we shall take the threshold into
account, by augmenting the weight vector by one component. Thus, we let the
threshold be its zeroth component, so that it becomes Wi = αi (-βi, -ni1,...,-nin)t.
Correspondingly, an augmented input vector X is defined as (1, x1,...,xn)t. The
activation function for each of these neurons is f: f(x) = x if x ≥ 0 and 0 if x < 0.
The output vector of the first layer is therefore [α1dist(x,h1),..., αmdist(x,hm)]t.
This output vector serves as input for the WTA subnetwork. If x is already
a solution, the output of this subnetwork is z = 0 and y = [1,0,...0]t, and if it is not,
it is z = αk dist(x,πk) and the vector y = [0,...1,...0]t, where the 1 is at the k'th
position.
The WTA subnetwork is followed by a layer of m neurons, with zero
threshold, and multiplicatively arranged input connections to the input and output
ports of the WTA subnetwork. These connections are such that the i'th neuron
of this layer has the input Oi Ii, where Oi is the i'th output of the WTA network and
Ii is its i'th input. How to realize multiplicative synaptic arrangements has been
discussed by G.E. Hinton [11] and others (see, for example, Section 9.6 of Ref.
[2]). The activation function of the i'th neuron is fi: fi(x) = (λi/αi) x + ρi if x > 0 and 0
if x ≤ 0. The output vector of this layer is then the zero vector if the point x input
to the first layer of the network is a solution, and the vector
[0,...,0, λk dist(x,πk) + ρk, 0,...,0]t, if it is not.
The last layer is made up of n neurons, each with zero threshold and
activation function fl: fl(x) = x. There is a connection from the i'th neuron of the
previous layer to the j'th one of this layer, with weight nij, where nij is the j'th
component of the unit vector ni, normal to the hyperplane πi. This j'th neuron is
also fed, with weight one, the j'th component of the input vector x to the first layer
of the network. The output vector of this last layer is then T(hk) x.
If one unit of time is required for the data to go through each layer of the
network, 5 units of time will be required for it to perform one step of the
algorithm. The network has (m² + 5m + 2n)/2 neurons. A solution to the system
of inequalities is obtained when the output vector of the network is identical to its
input vector x.
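One step of this network's computation can be sketched in plain code. The following is an illustrative simulation of the algorithm the network performs, not of the network itself; it assumes the inequalities of Ineqs.(1) are written in the form <a_i, x> + b_i ≥ 0, so that dist(x, πi) = -(<n_i, x> + β_i) whenever the i'th constraint is violated:

```python
import numpy as np

def max_distance_step(x, A, b, lam=1.5, rho=0.25):
    """One step of the maximal distance relaxation-projection algorithm
    for the system A x + b >= 0 (one row a_i and scalar b_i per inequality)."""
    norms = np.linalg.norm(A, axis=1)
    N = A / norms[:, None]                    # unit normals n_i
    beta = b / norms
    dist = np.maximum(-(N @ x + beta), 0.0)   # dist(x, pi_i); 0 if satisfied
    if not dist.any():
        return x                              # x already solves the system
    k = int(np.argmax(dist))                  # role of the WTA subnetwork
    return x + (lam * dist[k] + rho) * N[k]   # the operator T(h_k)
```

On the sample problem of Section 6 (x1 - x2 ≥ 1 and -x1 + 5x2 ≥ 5, i.e. A = [[1, -1], [-1, 5]], b = [-1, -5]), iterating this step from the origin with λ = 1.5 and ρ = 0.25 reaches a feasible point in a few steps.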
4.2 Systematic Projection Algorithm
The network for this algorithm is composed of as many subnetworks, as
that illustrated in Figure 3, as there are inequalities to satisfy. These
subnetworks are chained together to perform a full cycle of the algorithm.
Here is how the i'th subnetwork functions. Its first layer contains one
neuron, with the same weights and threshold as the i'th neuron in the first layer of
our maximal distance algorithm network. Its activation function fi is however
different, with fi(x) = λi x + ρi if x > 0 and 0 if x ≤ 0.
The last layer of this subnetwork is similar to that of the maximal distance
algorithm network. The connection from the single neuron of the previous layer
to the j'th one of this layer has weight nij, where nij is the j'th component of the
unit vector ni, normal to the hyperplane πi. This j'th neuron is also fed, with
weight one, the j'th component of the input vector x for the first layer of this i'th
subnetwork. The output vector of this last layer is therefore T(hi) x.
Two units of time are required to perform one step of the algorithm, i.e. for
the data to go through one subnetwork, which contains (n+1) neurons. Since m
such subnetworks, connected in series, are required for a whole cycle through all
the inequalities, 2m units of time and m(n+1) neurons will be required for one
such cycle. A solution is obtained when the output vector, at the end of the
chain, is identical to the input vector x, at its beginning.
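A full cycle of this chain can be sketched in the same way; again this is a simulation of the algorithm under the assumed convention <a_i, x> + b_i ≥ 0, not of the subnetworks themselves:

```python
import numpy as np

def systematic_cycle(x, A, b, lam=1.0, rho=0.25):
    """One full cycle of the systematic projection algorithm for A x + b >= 0:
    the point traverses the m chained subnetworks in turn, each applying
    its operator T(h_i) when its own inequality is violated."""
    x = np.array(x, dtype=float)
    for a_i, b_i in zip(A, b):
        norm = np.linalg.norm(a_i)
        d = -(a_i @ x + b_i) / norm          # dist(x, pi_i) when positive
        if d > 0:
            x = x + (lam * d + rho) * (a_i / norm)
    return x
```

Repeating the cycle until the output equals the input reproduces the stopping criterion described above.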
4.3 Simultaneous Projection Algorithm
The basic structure of the network for this algorithm, shown in Figure 4,
can be recognized in each of the analogue optimization networks discussed in
Section 3.
Its first layer has m neurons, the i'th of which is identical to that of
the first layer of the i'th subnetwork for the systematic algorithm, with its
activation function multiplied by γi.
The last layer of this network is identical to that of the maximal distance
algorithm network, and it is connected in the same way to its preceding layer and
the input x for the whole network. Its output vector is Tx.
Each step of the algorithm is performed in two units of time. The network
has (m+n) neurons. A solution is obtained when its output vector is identical to
its input vector x.
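The corresponding step of the simultaneous network can be sketched as follows (same assumed convention <a_i, x> + b_i ≥ 0; the weights γi are taken to be 1/m, the choice used in the tests of Section 6):

```python
import numpy as np

def simultaneous_step(x, A, b, lam=2.0, rho=0.5, gamma=None):
    """One step of the simultaneous projection algorithm for A x + b >= 0:
    the corrections toward all violated half-spaces are formed in parallel
    and combined with the weights gamma_i."""
    m = len(b)
    if gamma is None:
        gamma = np.full(m, 1.0 / m)
    norms = np.linalg.norm(A, axis=1)
    N = A / norms[:, None]
    dist = np.maximum(-(N @ x + b / norms), 0.0)     # 0 for satisfied constraints
    out = np.where(dist > 0, lam * dist + rho, 0.0)  # first-layer outputs
    return x + (gamma * out) @ N                     # last layer: x -> T x
```

Once x is feasible all the distances vanish and the step returns x unchanged, which is exactly the stopping condition stated above.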
5. RECIPROCAL IMPLEMENTATIONS
Another set of networks implementing the same algorithms is obtained by
interchanging, in the networks of Section 4, the way in which the coordinates X
and the inequality parameters Wi are treated. Thus, the weights of the neurons
of the first layer of the networks would now all be set to X, and the i'th neuron of
the first layer would receive the vector Wi as input. This interchange leaves its
total input <Wi, X> unchanged. When these neurons are allowed to evolve according
to Hebb's rule, a solution to the system of inequalities is obtained as the
final value of their weights.
More precisely, consider a neuron, with (n+1)-dimensional weight vector X
= (1, x1,...,xn)t, and activation function fi: fi(x) = λix + ρi , if x > 0 and f(x) = 0, if x
≤ 0.
When the vector Wi is presented to it as input, its output fi(<Wi, X>) will be
0 if x ∈ hi, and λi dist(x, πi) + ρi if x ∉ hi. The first component of its weight vector
is then kept to the constant value 1, and its other weights are made to change
according to the Hebbian learning rule: x ← x + fi(<Wi, X>) ni. Thus, this neuron
implements the action of the operator T(hi) on the vector x.
1) Systematic and General Recurrent Algorithm. A single neuron, as described
above, can perform the systematic and the general recurrent versions of the
single-plane algorithm, with λi = λ and ρi = ρ for all i's. For this, it suffices to
present the inputs Wi to it, in the order specified by these algorithms.
In order to allow for different parameters λi and ρi, it would be necessary
to use m neurons, each one with a different value of these parameters in its
activation function. The exemplar Wi would then be presented only to the i'th
neuron, the output of which would provide the weight correction for all m
neurons.
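This reciprocal, variable-weights scheme can be sketched in code. Under the assumed convention <a_i, x> + b_i ≥ 0, the exemplar Wi = (-βi, -ni)t satisfies <Wi, X> = dist(x, πi) exactly when x violates hi; a single neuron performing one systematic sweep then reads:

```python
import numpy as np

def hebbian_sweep(x, A, b, lam=1.0, rho=0.25):
    """Single-neuron reciprocal implementation: the weight vector X = (1, x)
    is the trial solution.  Presenting the exemplar W_i = (-beta_i, -n_i)
    and applying the Hebbian correction x <- x + f(<W_i, X>) n_i realizes
    T(h_i); one call performs one systematic sweep over all inequalities."""
    x = np.array(x, dtype=float)
    norms = np.linalg.norm(A, axis=1)
    for n_i, beta_i in zip(A / norms[:, None], b / norms):
        W = np.concatenate(([-beta_i], -n_i))
        X = np.concatenate(([1.0], x))       # zeroth weight pinned to 1
        v = W @ X                            # = dist(x, pi_i) if h_i violated
        if v > 0:
            x = x + (lam * v + rho) * n_i    # Hebbian weight update
    return x
```

The solution is read off as the final value of the weights x, exactly as described above.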
2) Maximal Distance and Maximal Residual Algorithms. The interchange of the
roles of X and Wi, as described above, is made for the neurons of the first layer
of the network described in Section 4.1. To execute one step, all the Wi's are
presented simultaneously as inputs (Wi being the input for the i'th neuron). If the
direct connections, between the input to the network and the last layer, are
removed, the output of the last layer will be the zero vector, if x is a solution, and
the vector [λk dist(x,πk) + ρk] nk if not. This output is the correction to be added
to the x part of the weight vector for each neuron of the first layer. A solution is
recognized as such when the output of the network is zero.
3) Simultaneous Algorithm. The same modifications made to the maximal
distance network should be applied to the network described in Section 4.3. The
network would then perform one step of the algorithm, by weight correction for
the neurons of the first layer, exactly as described above for the maximal
distance algorithm network.
6. SOME SIMULATION RESULTS
We have simulated the digital neural networks implementing the maximal
distance, the systematic and the simultaneous relaxation-projection algorithms.
In a first series of tests, they were used to solve some 15 small feasibility
problems (most of these problems are optimization problems from Ref. [22], in
which we have set the cost function to zero). Upon characterizing a problem,
with n variables and m inequalities, by the pair (n, m), the problems solved can
be described as two of each of the types (2, 4), (3, 6) and (4, 7), four of the type
(5, 9) and one of each of the types (3, 7), (4, 8), (5, 3), (5, 8) and (6, 16). For
each algorithm, the same step size parameters λi and ρi were used for all
hyperplanes. Values of λ = 0.5k, with k = 0,...,6, and ρ ∈ {0, 0.25, 0.5, 1} were
tried. For the simultaneous projection algorithm, the additional values of λ with k
= 7,...,20, and ρ = 0.5s, with s = 3,...,11, were also tried. Note that whenever this
algorithm was used, its parameters γi, for i=1,...,m, were all taken to be 1/m,
where m is the number of inequalities. Table 1 reports the total number of steps
and the total number of units of time each network required to solve all of these
problems, when the best values for λ and ρ were used. Notice that for the
simultaneous relaxation-projection algorithm, the best results were obtained for
values of λ well outside the bounds given in Theorems 2 and 5. These
results, and those with λ =2, the upper "safety" bound, appear in Table 1.
ALGORITHM λ ρ Steps Time
Max. Distance 1.5 0.25 119 595
Systematic 1 0.25 370 740
Simultaneous 2 2.5 277 554
(> safe bounds) 7 2 118 236
TABLE 1: The values of the parameters λ and ρ for which the three
networks took the least overall time to solve the 15 small feasibility problems.
As this table indicates, all the networks, given appropriate step size parameters,
solved the 15 problems in a finite number of steps. In terms of the number of
steps, the best performance of the maximal distance and of the simultaneous
projection algorithms are comparable, and are much better than that of the
systematic projection algorithm. In terms of the time required however, the
simultaneous projection network is faster because of its fewer layers.
Nonetheless, if λ had to be taken within the safe range λ ≤ 2, the times required
by the two would be comparable (595 vs 554 time units). And if we had allowed
ourselves the single-level WTA network, the maximal distance network would
have performed the best, with 476 vs 554 time units.
Figures 5 to 7 are graphs showing the values taken, at each step of the
solution, by the two variables x1 and x2, when the following sample problem is
solved by the different networks.
Find x1 and x2 such that: x1 - x2 ≥ 1 and -x1 + 5x2 ≥ 5.
The behavior seen is representative of the general one observed with the
different networks. The maximal distance and the simultaneous projection
networks are seen to be of somewhat similar effectiveness in terms of the
number of steps. However, the trajectories produced by the first network
oscillate more, in general, as should be expected from the fact that the
simultaneous projection algorithm involves an average of the directions toward
all violated constraint hyperplanes. The systematic projection algorithm is seen
to converge most slowly. All networks started from the point (0, 0)t. The
maximal distance algorithm network reached the solution (3.371, 1.864)t after 6
steps, taking 36 units of time. The systematic algorithm network reached the
solution (2.976, 1.623)t after 16 steps, taking 32 time units. The simultaneous
algorithm network reached the solution (4.961, 2.193)t after 8 steps, taking 16
units of time.
Table 2 shows the number of neurons each network requires for solving a
type (20, 35) problem. The systematic projection and the maximal distance
networks are seen to be, by far, the most costly in terms of the number of
neurons. This same table also shows the number of steps and of units of time
required for the solution, with the best values for the parameters, as determined
in the tests with the 15 small problems, as well as those values among those
mentioned above, which yielded the best solution time for this (20, 35) problem
alone. The mention "Ended" in the table indicates that the network was stopped,
the algorithm having run for 100 steps without producing a solution. The
systematic projection network proved the least efficient, exceeding the 100-step
limit for most values of the parameters. The performances of the maximal
distance algorithm and the simultaneous projection algorithm are comparable in
terms of the number of steps, when λ is in the "safe" range. The latter algorithm is
however definitely superior in terms of the number of units of time required. The
best performance, obtained with the simultaneous projection network, is
remarkable in that the solution is obtained in a single step.
ALGORITHM Neurons λ ρ Steps Time
Max. Distance 720 1.5 0.25 27 135
 1.5 0 12 60
Systematic 735 1 0.25 Ended Ended
 1.5 0 70 140
Simultaneous 55 2 2.5 15 30
 10 1 1 2
TABLE 2: Values of the network parameters and the corresponding times taken
by the three networks to solve a 20-variable, 35-inequality problem.
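The neuron counts in Table 2 follow directly from the formulas derived in Section 4; a quick check for the (20, 35) problem:

```python
def neuron_counts(n, m):
    """Neuron counts of the three fixed-weights networks of Section 4,
    for n variables and m inequalities."""
    return {
        "maximal distance": (m * m + 5 * m + 2 * n) // 2,
        "systematic": m * (n + 1),
        "simultaneous": m + n,
    }

counts = neuron_counts(20, 35)   # the problem of Table 2
```

This reproduces the 720, 735 and 55 neurons listed in the table.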
7. CONCLUSIONS
We have shown that the solution method used by the best known
analogue optimization networks is a continuous time version of the simultaneous
relaxation-projection algorithm. As for the Tank-Hopfield network however, the
input resistances of the variable amplifiers make its behavior deviate slightly
from that of this algorithm. By solving its equation of motion exactly, we have
demonstrated that this additional term has a negative effect, in that it prevents a
feasible solution from being reached.
We have produced neural networks that implement each of the relaxation-
projection algorithms. For the fixed weights implementation, the number of
neurons required to solve a problem with n variables and m inequalities are
(m² + 5m + 2n)/2 neurons for the maximal distance version,
m(n+1) neurons for the systematic projection version, and
(m+n) for the simultaneous projection version.
These numbers clearly show that, among these three versions, the last one is
the most economical in terms of neurons used, its number of neurons increasing
only linearly with the problem parameters. The variable weights networks have
the same basic structure and same efficiency as the above ones. However, as
mentioned in Section 5, a single neuron with the Hebbian learning capacity
suffices to perform the systematic and any recurrent single-plane algorithm.
Although we have found these algorithms to be generally less efficient than the
others, this economy definitely renders them worthy of consideration for
applications where speed of solution is not a critical factor.
The results of the preliminary tests we have conducted with these
networks have been discussed to a certain extent in Section 6. We sum them
up as follows. For the sample problems solved, the maximal distance and the
simultaneous projection algorithms required comparable numbers of steps,
always far fewer than the systematic projection algorithm. In terms of the
number of units of time used, the simultaneous projection network appears
superior to the maximal distance network. This comes from the fact that the
latter has more layers than the former.
The simultaneous projection algorithm furthermore provides its user with
the important, unique advantage of a guarantee to minimize the objective
function, even when the system of inequalities has no solution (see Theorem 5).
For the single plane methods, we have found that good values, among
those tried, for the step size parameters λ and ρ are λ ≈ 1.5 and ρ ≈ 0.25. This is
consistent with the convergence theorems mentioned. For the simultaneous
projection algorithm, although good results were found with λ ≈ 2, the best
results were consistently obtained for larger λ's as well as for rather large ρ's,
between 1 and 2.5. This fact can very well be interpreted as an indication that
the sufficient conditions in the convergence theorems are not really necessary,
and that the theoretical results need to be refined.
We note that, when both step size parameters λ and ρ are non-zero, the
convergence should generally be better than when one of the two is zero.
Indeed, when the point xν is far from the polytope, the distance dependence of
the step size ensures that the points of the sequence approach the polytope at a
faster pace than if the steps were of constant lengths. On the other hand, as the
points get close to the polytope and the distance term in the step size becomes
small, the constant term takes over and ensures that the points of the sequence
keep on moving toward the polytope at a minimum, non-infinitesimal rate, so
that it is reached in a finite number of steps.
For computing solutions, it suffices, of course, to know that the iteration
sequence converges; the calculations can then always be stopped when a
certain precision criterion is satisfied. This will always happen after a finite
number of steps, even though the exact sequence xν may actually be infinite.
Nevertheless, it still remains a very important property for an iteration sequence
to exactly terminate in a finite number of steps. Indeed, this generally means
that its limit point is inside the polytope Ω, while for infinite sequences, it is
necessarily on its surface. Interior point solutions are more stable and more
robust because they are completely surrounded by a whole neighborhood of
other solutions. On the other hand, surface limit points have neighbors both in Ω
and outside of it; so that they can easily cease to be solutions under small
perturbations of the parameters of the problem, as when the coefficients of the
inequalities are slightly modified. For example, this is the kind of stability that
leads to a better ability of neural networks to generalize to new inputs the
knowledge they have accumulated during their training.
Given the fact that all the networks we described can be realized with very
inexpensive computing elements, it would be practical to further improve on the
solution time by having many copies of the networks work simultaneously on the
same problem, each using different values of the step size parameters (some
even with λ > 2) or different starting points x0.
It is certainly worthwhile to conduct other tests, with more complex and
larger sample problems, in order to see whether the results we observed persist.
We believe that the study reported in the present article is important for
the theory of optimization neural networks, as well as for feasibility networks,
since after all, the latter networks are always an essential part of the first ones.
ACKNOWLEDGMENT
The author is grateful to the reviewers for their constructive comments
and suggestions for improving this manuscript.
References
[1] S. Agmon, "The relaxation method for linear inequalities", Can. J. Math., vol.
6, pp. 382-392, 1954.
[2] I. Aleksander and H. Morton, "An Introduction to Neural Computing",
Chapman and Hall, New York, 1990.
[3] H.D. Block, "The perceptron: a model for brain functioning. I", Rev. Mod.
Phys., vol. 34, pp. 123-135, 1962.
[4] Y. Censor and T. Elfving, "New methods for linear inequalities", Linear
Algebra Appl., vol. 42, pp. 199-211, 1982.
[5] L.O. Chua and G.N. Lin, "Nonlinear programming without computation", IEEE
Trans. Circ.Syst., vol. CAS-31, pp. 182-188, Feb. 1984.
[6] G.B. Dantzig, "Linear Programming and Extensions", Princeton University
Press, Princeton, NJ, 1963.
[7] A.R. De Pierro and A.N. Iusem, "A simultaneous projections method for linear
inequalities", Linear Algebra Appl., vol. 64, pp. 243-253, 1985.
[8] J.A. Feldman and D.H. Ballard, "Connectionist models and their properties",
Cognitive Science, vol. 6, pp. 205-254, 1982.
[9] P.E. Gill, W. Murray and M.H. Wright, "Practical Optimization", Academic
Press, London, 1981.
[10] J.L. Goffin, "On the Finite Convergence of the Relaxation Method for Solving
Systems of Inequalities", Operations Research Center Report ORC 71-36, Univ.
of California at Berkeley, 1971.
[11] G.E. Hinton, "A parallel computation that assigns object-based frames of
reference", in Proc. 7th Int. Joint Conf. on Artificial Intelligence, 1981.
[12] M.P. Kennedy and L.O. Chua, "Unifying the Tank and Hopfield linear
programming network and the canonical nonlinear programming network of
Chua and Lin", IEEE Trans. Circ. Syst., vol. CAS-34, pp. 210-214, Feb. 1987.
[13] M.P. Kennedy and L.O. Chua, "Neural networks for nonlinear programming",
IEEE Trans. Circ. Syst., vol. 35, pp. 554-562, May 1988.
[14] R.P. Lippmann, "An introduction to computing with neural nets", in "Artificial
Neural Networks: Theoretical Concepts", V. Vemuri, Editor, IEEE Computer
Society Press, 1988.
[15] C.Y. Maa and M. Shanblatt, "Linear and quadratic programming neural
network analysis", IEEE Trans. Neural Networks, vol. 3, pp. 580-594, Jul. 1992.
[16] T. Martin, "Acoustic Recognition of a Limited Vocabulary in Continuous
Speech", Ph.D. Thesis, Dept. Electrical Engineering Univ. Pennsylvania, 1970.
[17] T.S. Motzkin and I.J. Schoenberg, "The relaxation method for linear
inequalities", Can. J. Math., vol. 6, pp. 393-404, 1954.
[18] H. Oh and S.C. Kothari, "A pseudo-relaxation learning algorithm for
bidirectional associative memory", in Proceedings of the International Joint
Conference on Neural Networks, Baltimore, Maryland, (1992), Volume II, pp.
208-213.
[19] H. Oh and S.C. Kothari, "Adaptation of the relaxation method for learning in
bidirectional associative memory", IEEE Trans. Neural Networks, vol. 5, No 4,
pp. 576-583, 1994.
[20] A. Rodríguez-Vázquez, R. Domínguez-Castro, A. Rueda, J.L. Huertas and E.
Sánchez-Sinencio, "Nonlinear switched-capacitor 'neural' networks for
optimization problems", IEEE Trans. Circ. Syst., vol. 37, pp. 384-397, Mar. 1990.
[21] F. Rosenblatt, "Principles of Neurodynamics", Spartan Books, Washington,
D.C., 1962.
[22] W.R. Smythe Jr and L.A. Johnson, "Introduction to linear programming with
applications", Prentice Hall, Englewood Cliffs, N.J., 1966.
[23] D.W. Tank and J.J. Hopfield, "Simple 'neural' optimization networks: An A/D
converter, signal decision circuit, and a linear programming circuit", IEEE Trans.
Circ. Syst., vol. CAS-33, pp. 533- 541, May 1986.
[24] S.H. Zak, V. Upatising and S. Hui, "Solving linear programming problems
with neural networks: a comparative study", IEEE Trans. Neural Networks, vol. 6,
pp. 94-104, Jan. 1995.
Fig. 1: Winner-Takes-All network with 4 inputs. Full lines have weight 1, dashed
lines -1. Activation functions are a sign function for the first layer and a step
function, with threshold of 5/2, for the second layer. Data transits from left to
right. The outputs yi are 0 except at the position of the maximum input xi.
Fig. 2: Artificial neural network to perform the maximal distance and maximal
residual relaxation-projection algorithms.
Fig. 3: The i'th subnetwork of the chain that constitutes the artificial neural
network to perform the systematic relaxation-projection algorithm.
Fig. 4: Artificial neural network to perform the simultaneous relaxation-projection
algorithm.
Fig. 5: Trajectories (value vs step number) of the variables x1 and x2, produced
by the maximal distance relaxation-projection network, for a sample problem,
with λ = 1.5 and ρ = 0.25. x2 is the variable that increases at the start.
Fig. 6: Trajectories (value vs step number) of the variables x1 and x2, produced
by the systematic relaxation-projection network, for a sample problem, with λ = 1
and ρ = 0.25. x1 is the variable that increases at the start.
Fig. 7: Trajectories (value vs step number) of the variables x1 and x2, produced
by the simultaneous relaxation-projection network, for a sample problem, with λ =
2 and ρ = 2.5. x1 is the variable that increases most at the start.