
Pergamon

CONTRIBUTED ARTICLE 0893-6080(94)E0012-A

Neural Networks, Vol. 7, No. 5, pp. 819-831, 1994 Copyright © 1994 Elsevier Science Ltd Printed in the USA. All rights reserved

0893-6080/94 $6.00 + .00

Static and Dynamic Stabilizing Neural Controllers, Applicable to Transition Between Equilibrium Points

JOHAN A. K. SUYKENS, BART L. R. DE MOOR,* AND JOOS VANDEWALLE

Katholieke Universiteit Leuven

(Received 1 March 1993; accepted 19 January 1994)

Abstract--A design method for stabilization of nonlinear systems by feedforward and recurrent neural networks is proposed, applicable to transition between equilibrium points with local stabilization at the end point. Both static and dynamic state and output feedback stabilizing neural control are discussed. The link with linear controller design techniques is explained by linearizing the model around the target equilibrium point and incorporating the linear controller design results in the neural controller. The weights are learned off-line and are the solution to a nonlinear optimization problem through simulation of the system. The method is illustrated with the example of swinging up the pole of an inverted pendulum system with local stabilization at the upper equilibrium point, both by a feedforward and a recurrent neural network.

Keywords--Feedforward and recurrent neural networks, Nonlinear optimization, Static and dynamic output feedback, Neural control.

1. INTRODUCTION

In many important applications the control engineer is confronted with the following problem: given a multivariable nonlinear plant that may operate around several equilibrium points, design a control strategy that not only stabilizes the plant around these equilibrium points, but that can also switch the system from one operating point to another. In this paper, we will propose a general design strategy and then illustrate it on a particular example. The general idea is as follows:

• First a nonlinear model is obtained by whatever modeling method is available (nonlinear system identification, physical laws, bondgraphs, etc.).

• Next the equilibrium points are identified. Around each of these points the system is linearized and a linear stabilizing controller is calculated for each specific operating point according to classical or modern linear control theory (e.g., by pole placement, PID, LQR, H_2, H_\infty control, etc.) [see Åström & Wittenmark (1984); Franklin, Powell, & Workman (1990); Maciejowski (1989) for an introduction to linear control theory].

• A general parametrized control law is proposed for the switching from one operating point to another, either by a feedforward or a recurrent neural network, depending on the linear controller design. This control law is overparametrized, but the parameter vector is constrained in the following sense: in the neighborhood of each operating point, the control law coincides with the linear stabilizing controller around that specific point. The additional freedom in the parameters is used to enforce the desired switching from one point to another.

• If there are more than two equilibrium points, one can repeat the whole strategy for each pair of operating points and design an appropriate switching circuitry that governs the switchings.

Often a good linear controller is already available around some working point. The neural controller can then be used to realize the desired transient and act as the linear controller around the working point. The transition is formulated as a nonlinear optimization problem in the interconnection weights of the neural controller, constrained by the linear controller design at the target point.

* Research Associate of the Belgian National Fund for Scientific Research.

Acknowledgement: This research work was carried out at the ESAT laboratory and the Interdisciplinary Center for Neural Networks ICNN of the Katholieke Universiteit Leuven, in the framework of a Concerted Action Project of the Flemish Community, entitled Applicable Neural Networks. The scientific responsibility is assumed by its authors.

Requests for reprints should be sent to Johan A. K. Suykens, Katholieke Universiteit Leuven, Department of Electrical Engineering, ESAT-SISTA, Kardinaal Mercierlaan 94, B-3001 Leuven (Heverlee), Belgium.


FIGURE 1. Static and dynamic output feedback laws, parametrized by feedforward or recurrent neural networks, respectively.

An academic test example for this kind of problem is, for example, an inverted pendulum. Until now, most of the literature on control of an inverted pendulum is concerned with the problem of keeping a pole up (Barto, Sutton, & Anderson, 1983; Miller, Sutton, & Werbos, 1990). In this paper, we study the more difficult control task of swinging the pole from down to up and stabilizing the pole locally at the upper equilibrium point. The proposed neural controller, parametrized by a feedforward or recurrent neural net, is able to perform this kind of transition.

This paper is organized as follows. Section 2 explains how the controller may be parametrized by feedforward and recurrent neural networks. In Section 3 a design method is presented where linear controller design results can be included in the neural controller. Section 4 treats the problem of transition between equilibrium points, making use of the results of Section 3. A comparison is made with other neural optimal control methods in Section 5, and in Section 6 the methods are illustrated on the example of an inverted pendulum system.

2. PARAMETRIZATION OF CONTROLLERS BY FEEDFORWARD AND RECURRENT NEURAL NETWORKS

In this section a general outline is proposed of a method for stabilization of nonlinear systems by means of feedforward and recurrent neural networks.

Given a single input nonlinear system

\dot{x} = f(x) + g(x)u
y = h(x)    (1)

with state vector x \in \mathbb{R}^n, input u \in \mathbb{R}, output vector y \in \mathbb{R}^m, where f, g, h are continuous nonlinear mappings. Furthermore, we suppose that f(0) = 0, h(0) = 0. If this is not the case, a change of coordinates can be applied. All techniques that will be discussed carry over to the multi-input and discrete time case too, but this will not be discussed in the text. Suppose the control task for the system (1) is to bring the state x from x(0) to a target equilibrium point x_{eq}.

Therefore, we consider both the case of parametrized static and dynamic output feedback. For static output feedback we have

u = \mu(y; \theta)    (2)

where \theta \in \mathbb{R}^p is a parameter vector to be determined and \mu is a continuous nonlinear mapping. This feedback law may be parametrized by a feedforward neural network with one hidden layer (Figure 1) as

u = \alpha \tanh(w^t \tanh(V y))    (3)

with interconnection weights w \in \mathbb{R}^{n_h}, V \in \mathbb{R}^{n_h \times m}, where n_h is the number of hidden neurons and the scalar \alpha > 0 is to be determined.¹ The parameter vector \theta in eqn (2) contains the elements of w, V, and \alpha. The motivation for such a parametrization stems from the fact that any continuous nonlinear function may be approximated arbitrarily well on a compact interval by a multilayer neural network with one or more hidden layers (Cybenko, 1989; Funahashi, 1989; Hornik, Stinchcombe, & White, 1989).
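As an illustration, the control law of eqn (3) is a one-line function in code. The sketch below is ours; the dimensions and numerical values are arbitrary, chosen only to exercise the shapes, and are not taken from the paper.

```python
import numpy as np

def static_controller(y, w, V, alpha):
    """Static output feedback u = alpha * tanh(w^t tanh(V y)), eqn (3).
    The tanh output layer bounds |u| by alpha automatically."""
    return alpha * np.tanh(w @ np.tanh(V @ y))

# Arbitrary shapes and numbers (n_h = 4, m = 2), chosen only for illustration:
rng = np.random.default_rng(0)
V = rng.standard_normal((4, 2))
w = rng.standard_normal(4)
u = static_controller(np.array([0.3, -0.1]), w, V, alpha=10.0)
assert abs(u) < 10.0  # |u| stays below alpha by construction
```

Note that the saturation built into the output layer is what later allows the amplitude bound on the control signal to be enforced for free.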

In the dynamic output feedback case we have

\dot{z} = \nu(z, y; \theta_1)
u = \rho(z, y; \theta_2)    (4)

with z \in \mathbb{R}^{n_z} the state of the controller, \theta_1 \in \mathbb{R}^{p_1} and \theta_2 \in \mathbb{R}^{p_2} parameter vectors to be determined, and \nu, \rho continuous nonlinear mappings.

This dynamic controller can be parametrized by a recurrent neural network (Figure 1) as

\dot{z} = W_1 \tanh\left([V_{11} \; V_{12}] \begin{bmatrix} z \\ y \end{bmatrix}\right)
u = \alpha \tanh\left(w_2^t \tanh\left([V_{21} \; V_{22}] \begin{bmatrix} z \\ y \end{bmatrix}\right)\right)    (5)

with interconnection matrices W_1 \in \mathbb{R}^{n_z \times n_{h1}}, V_{11} \in \mathbb{R}^{n_{h1} \times n_z}, V_{12} \in \mathbb{R}^{n_{h1} \times m}, V_{21} \in \mathbb{R}^{n_{h2} \times n_z}, V_{22} \in \mathbb{R}^{n_{h2} \times m}, and vector w_2 \in \mathbb{R}^{n_{h2}}. The parameter vectors \theta_1 and \theta_2 in

¹ As activation function we take the hyperbolic tangent function [tanh(x) = (1 - exp(-2x))/(1 + exp(-2x))], which is applied elementwise to a vector or a matrix.


eqn (4) contain the elements of W_1, V_{11}, V_{12} and \alpha, w_2, V_{21}, V_{22}, respectively.
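A sketch of how the recurrent controller of eqn (5) could be simulated, using a forward-Euler step; the step size and all dimensions below are our illustrative assumptions, not the paper's choices.

```python
import numpy as np

def recurrent_controller_step(z, y, W1, V11, V12, w2, V21, V22, alpha, dt=0.01):
    """One forward-Euler step of the recurrent controller of eqn (5):
    zdot = W1 tanh([V11 V12][z; y]),  u = alpha tanh(w2^t tanh([V21 V22][z; y]))."""
    zy = np.concatenate([z, y])
    zdot = W1 @ np.tanh(np.hstack([V11, V12]) @ zy)
    u = alpha * np.tanh(w2 @ np.tanh(np.hstack([V21, V22]) @ zy))
    return z + dt * zdot, u

# Illustrative sizes (n_z = 2, m = 1, n_h1 = n_h2 = 3), not taken from the paper:
rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 3))
V11, V12 = rng.standard_normal((3, 2)), rng.standard_normal((3, 1))
w2 = rng.standard_normal(3)
V21, V22 = rng.standard_normal((3, 2)), rng.standard_normal((3, 1))
z, u = recurrent_controller_step(np.zeros(2), np.array([0.5]),
                                 W1, V11, V12, w2, V21, V22, alpha=10.0)
```

As in the static case, the output saturation keeps |u| below alpha at every step.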

The design procedure for the feedforward and recurrent neural networks basically works as follows:
1. Design a linear controller around the target equilibrium point x_{eq} based on the linearized model. This controller may be static or dynamic depending on the particular system to be controlled. The nonlinear control laws, eqns (3) and (5), coincide with the linearized controller around x_{eq}.
2. The parameters (interconnection weights) serve to satisfy other performance criteria such as specifications on the transition from x(0) to x_{eq}. This results in optimization problems constrained by the linear controller design results of step 1.

This procedure can be repeated between each pair of equilibrium points of the system or for other initial states.

3. DESIGNING THE NEURAL CONTROLLER

Incorporation of linear controller design results around the target equilibrium point will be outlined here for the case of static and dynamic output feedback.

3.1. Static Output Feedback Using Feedforward Neural Networks

3.1.1. General Case. Suppose the plant model is given by eqn (1) and the origin is the target equilibrium point [f(0) = 0, h(0) = 0]. If this is not the case, a change of coordinates can be applied to accomplish this. The static controller, parametrized by a feedforward neural network with one hidden layer, is given by eqn (3)

u = \alpha \tanh(w^t \tanh(V y))    (6)

where \alpha = |u|_{max}, a user-defined maximum amplitude for the control signal. Another choice may be an output layer consisting of linear neurons

u = w^t \tanh(V y),    (7)

but eqn (6) has the advantage of automatically restricting |u| by \alpha. Hence, eqn (6) is used in the sequel.

In closed loop we obtain

\dot{x} = f(x) + g(x)\,\alpha \tanh(w^t \tanh(V h(x))).    (8)

Local stability around x = 0 is then guaranteed if the eigenvalues \lambda_i of the Jacobian J(0) are in the open left half complex plane

\mathrm{Re}\{\lambda_i(f_0 + g(0)\,\alpha w^t V h_0)\} < 0, \quad i = 1, \ldots, n    (9)

where the matrices f_0 and h_0 represent the Jacobians f_x = [\partial f_i/\partial x_j] and h_x = [\partial h_i/\partial x_j] evaluated at 0 [see Appendix A for calculation of the Jacobian J(x) for the closed loop system (8)]. It will now be explained how condition (9) can be related to linear controller design results and how such results can be incorporated into the neural controller. Linear static output feedback u = -k^t y applied to the linearized system around x = 0

\dot{x} = f_0 x + g(0)u
y = h_0 x    (10)

leads to the closed loop system

\dot{x} = [f_0 - g(0)k^t h_0]x,    (11)

which is stable if the n eigenvalues of the matrix f_0 - g(0)k^t h_0 are in the open left half complex plane

\mathrm{Re}\{\lambda_i(f_0 - g(0)k^t h_0)\} < 0, \quad i = 1, \ldots, n.    (12)

Comparing eqn (9) with eqn (12) gives the following set of constraints on the set of weights related to a certain linear controller design with static feedback gain k

k^t = -\alpha w^t V.    (13)

Observe that the interconnection weights w and V are not fully determined by eqn (13), which means that additional requirements can be met by the neural controller. This degree of freedom in the choice of the weights will be used further on to enforce a transition between equilibrium points with local stabilization at the end point.
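The constraint (13) rests on the fact that eqn (3) linearizes to u ≈ \alpha w^t V y around y = 0 (since tanh'(0) = 1), so that it matches u = -k^t y locally. This can be checked numerically; the random weights and tolerances below are our own illustrative choices.

```python
import numpy as np

# Check that eqn (3) linearizes to u ≈ alpha * w^t V y around y = 0,
# so that constraint (13), k^t = -alpha w^t V, reproduces u = -k^t y locally.
rng = np.random.default_rng(1)
alpha, nh, mdim = 5.0, 3, 3       # illustrative sizes, not the paper's
V = rng.standard_normal((nh, mdim))
w = rng.standard_normal(nh)

def u(y):
    return alpha * np.tanh(w @ np.tanh(V @ y))

# Central finite differences of u at y = 0, one output component per direction:
eps = 1e-6
grad = np.array([(u(eps * e) - u(-eps * e)) / (2 * eps) for e in np.eye(mdim)])
assert np.allclose(grad, alpha * (w @ V), atol=1e-5)
```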

3.1.2. Full State Feedback: Example of LQR. In the case of full state feedback we have h_0 = I if y = x, and the set of constraints on w, V can then be related, for example, to an LQR design (Franklin et al., 1990). The linear state feedback law u = -k_{lqr}^t x applied to eqn (10) can be determined such that the cost function

C_{lqr} = \int_0^\infty (x^t Q x + u^t R u)\, dt    (14)

is minimized where Q and R are given positive definite symmetric matrices. The solution to this problem is given by

k_{lqr} = R^{-1} g(0)^t P    (15)

where P is the stabilizing solution to the matrix algebraic Riccati equation

0 = P f_0 + f_0^t P - P g(0) R^{-1} g(0)^t P + Q    (16)

and k_{lqr}^t = -\alpha w^t V. Other techniques like H_\infty control (state feedback case) may be applied here.

3.1.3. Introduction of More Hidden Layers. It is a straightforward calculation to see that the results for one hidden layer [eqn (13)] extend to feedforward neural networks with more than one hidden layer. We can summarize the conditions for local stability at the target equilibrium point as
1. One neuron:

k^t = -\alpha w^t.    (17)


2. One hidden layer:

k^t = -\alpha w^t V.    (18)

3. General case of q hidden layers:

k^t = -\alpha w^t V_1 V_2 \cdots V_q.    (19)

Notice that in the case of one neuron all weights are determined by eqn (17) for a given \alpha, whereas in the other cases [eqns (18) and (19)] a set of constraints is obtained from the local stability condition. This means that linear static state feedback can be placed at the same level as static state feedback with one neuron.

3.2. Dynamic Output Feedback Using Recurrent Neural Networks

We suppose again that the plant model is given by eqn (1) with the same assumptions on f, g, and h as in the static output feedback case. The dynamic controller is now parametrized by a recurrent neural network with one hidden layer [eqn (5)].

The closed loop system

\dot{z} = \phi(z, x)
\dot{x} = \psi(z, x)    (20)

is

\dot{z} = W_1 \tanh\left([V_{11} \; V_{12}] \begin{bmatrix} z \\ h(x) \end{bmatrix}\right)
\dot{x} = f(x) + g(x)\,\alpha \tanh\left(w_2^t \tanh\left([V_{21} \; V_{22}] \begin{bmatrix} z \\ h(x) \end{bmatrix}\right)\right)    (21)

with |u|_{max} = \alpha. Local stability around x = 0, z = 0 is guaranteed if the n + n_z eigenvalues \lambda_i of the Jacobian J(0, 0) are in the open left half complex plane

\mathrm{Re}\{\lambda_i(J(0, 0))\} < 0, \quad i = 1, \ldots, n + n_z    (22)

with

J(0, 0) = \begin{bmatrix} W_1 V_{11} & W_1 V_{12} h_0 \\ g(0)\alpha w_2^t V_{21} & f_0 + g(0)\alpha w_2^t V_{22} h_0 \end{bmatrix}.    (23)

An expression for the Jacobian J(z, x) of the closed loop system (21) can be found in Appendix A. As in the static output feedback case, it will be shown how linear controller design results can be included in the neural controller. Linear dynamic output feedback

\dot{z} = E z + F y
u = G z + H y    (24)

applied on the linearized system around x = 0

\dot{x} = A x + B u
y = C x    (25)

with A = f_0, B = g(0), C = h_0 leads to the closed loop system

\dot{z} = E z + F C x
\dot{x} = B G z + (A + B H C)x    (26)

which is stable if

\mathrm{Re}\left\{\lambda_i\left(\begin{bmatrix} E & FC \\ BG & A + BHC \end{bmatrix}\right)\right\} < 0, \quad i = 1, \ldots, n + n_z.    (27)

A set of constraints on the interconnection weights is obtained by selecting the linearized dynamic output feedback controller. Indeed, by comparing eqns (22) and (23) with eqn (27) one obtains

\begin{bmatrix} E & F \\ G & H \end{bmatrix} = \begin{bmatrix} W_1 V_{11} & W_1 V_{12} \\ \alpha w_2^t V_{21} & \alpha w_2^t V_{22} \end{bmatrix}.    (28)

Again the weights are not fully determined by eqn (28), and this additional degree of freedom can be used to enforce a transition between equilibrium points with local stabilization at the end point, as will be illustrated for an inverted pendulum system in Section 6. For the linear controller design on the linearized system, all existing linear controller design methods can be applied, including, e.g., PID, LQG, H_2, H_\infty control, \mu-synthesis, etc. (Åström & Wittenmark, 1984; Franklin et al., 1990; Maciejowski, 1989). These methods are briefly reviewed in Appendix B for the reader who is not familiar with these techniques. The ideas can also be extended to parametrizations with more than one hidden layer, as in the static output feedback case.

4. TRANSITION BETWEEN EQUILIBRIUM POINTS

Transition between equilibrium points will be discussed for the static and dynamic output feedback case.

4.1. Static Feedback

To impose on the system (1) a transition from a given initial state x(0) to a target equilibrium point x_{eq} (we suppose x_{eq} = 0 here without loss of generality), the optimal control problem may be formulated as

\min_{\alpha, w, V} C(\alpha, w, V, f, g, h, x(0), \eta, \zeta, T),    (29)

where the cost function C is defined as

C = \eta(x(T)) + \int_0^T \zeta(x(t))\, dt

with constraints
1. Local stability at x = 0 [eqn (13)]:

k^t = -\alpha w^t V    (30)

2. Closed loop system dynamics [eqn (8)]:

\dot{x} = f(x) + g(x)\,\alpha \tanh(w^t \tanh(V h(x)))    (31)


where, for example, \eta(x(T)) = \|x(T)\| and \zeta(x(t)) = x(t)^t x(t) (quadratic control) or \zeta = 0 (terminal control), and T is the time horizon (Bryson & Ho, 1969). In the case of m hidden neurons (V square and full rank), constraint (30) can be eliminated because w^t = -(k^t/\alpha)V^{-1}, so that we get the problem

\min_{\alpha, V} C(\alpha, V, f, g, h, x(0), \eta, \zeta, T)    (32)

with constraint

\dot{x} = f(x) + g(x)\,\alpha \tanh\left(-\frac{k^t}{\alpha} V^{-1} \tanh(V h(x))\right).    (33)

Some remarks are made with respect to eqns (29)-(33).

REMARKS:

1. The region around x = 0 where the linearization (9) is valid depends on w, V. This indicates that it will be necessary to add to eqn (33) an additional constraint on the norms of w and V.

2. It is not guaranteed a priori that the linearized region will be entered. This question of controllability may depend, for example, on the choice of the time horizon T. But once the linearized region is entered, the corresponding linear controller takes over, ensuring the state will remain in this region for all time, even if a finite time horizon T is considered for the optimal control problem.

3. The determination of the number of hidden neurons needed to perform a certain transition for a given system is in fact a trial-and-error procedure, but it will be illustrated with the example of an inverted pendulum that small-size neural nets are capable of performing complex control tasks.

4. Robustness considerations are twofold: robustness with respect to the initial state and robustness inside the linearized region. For the linearized region, it can be expected that the robustness is comparable to that of the corresponding linear controller on which the neural controller was designed. Omitting the constraint (30) would result in a controller that is less robust to perturbations or even locally unstable around the target equilibrium point. With respect to the initial state, nothing can be said in general, and this problem has to be investigated for each particular system under study.

5. It is also possible to let the controller learn for several initial states x(0) by defining, for example, a cost function C that is the sum of the cost functions each related to a single initial state.

4.2. Numerical Solution to the Optimization Problem

A practical solution to eqns (32) and (33) is to simulate the closed loop system (33) over a finite time horizon T, by applying an appropriate integration rule, and to calculate the cost function from this simulation result. To guarantee a sufficiently large region for which the linearization (9) is valid, we add the following constraints to eqn (33)

\|V\| < \beta_1,
\|w^t V\| < \beta_2    (34)

where \|\cdot\| represents any induced norm and \beta_1, \beta_2 are user-defined scalars. The last constraint, on \|w^t V\|, can be interpreted as a design rule for \alpha because \|k^t/\alpha\| = \|w^t V\| < \beta_2, so that \alpha > \|k\|/\beta_2. This leads to the constrained optimization problem

\min_V C(V, f, g, h, x(0), \eta, \zeta, T) \quad \text{subject to } \|V\| < \beta_1    (35)

where the cost function C is calculated from the sim- ulation result of the closed loop system dynamics (33).

In general, many local optima may exist for this problem. For solving eqn (35) one can use global optimization techniques (e.g., the Monte Carlo method, genetic algorithms, etc.) or local optimization algorithms (e.g., sequential quadratic programming) by trying several different starting points. If a local optimization routine is used, the constraint in eqn (35) is also needed to keep \|V\| bounded, because it was observed that otherwise \|V\| \to \infty for several starting points.
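The simulate-then-optimize scheme of this subsection might be sketched as follows. The forward-Euler rule, the quadratic running cost, the toy plant, and the penalty used in place of the hard norm constraint are our own illustrative choices, not the paper's.

```python
import numpy as np

def transition_cost(V_flat, k, alpha, f, g, h, x0, T, dt, beta1):
    """Simulate the closed loop (33) with a forward-Euler rule and return the
    accumulated cost; a penalty stands in for the hard bound ||V|| < beta1."""
    m = len(h(np.asarray(x0, float)))
    V = V_flat.reshape(m, m)
    # Eliminate constraint (30): w^t = -(k^t / alpha) V^{-1}
    w = -np.linalg.solve(V.T, k) / alpha
    x, cost = np.asarray(x0, float), 0.0
    for _ in range(int(T / dt)):
        u = alpha * np.tanh(w @ np.tanh(V @ h(x)))
        x = x + dt * (f(x) + g(x) * u)
        cost += dt * float(x @ x)          # zeta(x) = x^t x, quadratic control
    if np.linalg.norm(V, 2) >= beta1:
        cost += 1e6                        # crude penalty for the norm bound
    return cost

# Toy double-integrator plant with full state output (not the paper's pendulum):
f = lambda x: np.array([x[1], 0.0])
g = lambda x: np.array([0.0, 1.0])
h = lambda x: x
k = np.array([1.0, np.sqrt(3.0)])          # a stabilizing linear gain
c = transition_cost(np.eye(2).ravel(), k, alpha=10.0, f=f, g=g, h=h,
                    x0=[1.0, 0.0], T=5.0, dt=0.01, beta1=5.0)
```

A local or global optimizer would then minimize `transition_cost` over the entries of `V_flat`, exactly as eqn (35) prescribes.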

REMARK. In the case where the matrix V \in \mathbb{R}^{n_h \times m} is not square (n_h \neq m), the following trick can be used to eliminate the constraint (30) from eqn (29):
• n_h < m: Define s_1, s_2, V_1, V_2 according to

-\frac{1}{\alpha} k^t = [s_1^t \; s_2^t] = w^t [V_1 \; V_2]    (36)

with V_1 square (n_h \times n_h). This leads to a condition

s_1^t T = s_2^t    (37)

with T = V_1^{-1} V_2, where one part of T may be chosen freely and the other one is calculated afterwards from eqn (37).

• n_h > m: Define V_1, V_2 such that

-\frac{1}{\alpha} k^t = w_1^t V_1 + w_2^t V_2    (38)

with V_1 square (m \times m). The elements of w_2, V_1, V_2 may be chosen freely and w_1 can then be calculated from eqn (38).

In both cases the parameter vector to be optimized consists of all elements that may be chosen freely in eqns (37) and (38).

4.3. Dynamic Output Feedback

In the dynamic output feedback case, the following optimization problem can be formulated for a transition


from a given initial state x(0) to x = 0, z = 0 for the system (1):

\min_{\alpha, W_1, w_2, V_{11}, V_{12}, V_{21}, V_{22}} C(\alpha, W_1, w_2, V_{11}, V_{12}, V_{21}, V_{22}, f, g, h, x(0), z(0), \eta, \zeta, T)    (39)

where the cost function C is defined as

C = \eta(x(T), z(T)) + \int_0^T \zeta(x(t), z(t))\, dt

with constraints
1. Local stability at x = 0, z = 0 [eqn (28)]:

[E \; F] = W_1 [V_{11} \; V_{12}]
[G \; H] = \alpha w_2^t [V_{21} \; V_{22}].    (40)

2. Closed loop system dynamics [eqn (21)]:

\dot{z} = W_1 \tanh\left([V_{11} \; V_{12}] \begin{bmatrix} z \\ h(x) \end{bmatrix}\right)
\dot{x} = f(x) + g(x)\,\alpha \tanh\left(w_2^t \tanh\left([V_{21} \; V_{22}] \begin{bmatrix} z \\ h(x) \end{bmatrix}\right)\right)    (41)

If the matrices [V_{11} \; V_{12}] and [V_{21} \; V_{22}] are square and full rank, constraint (40) can be eliminated, so that

\min_{\alpha, V_{11}, V_{12}, V_{21}, V_{22}} C(\alpha, V_{11}, V_{12}, V_{21}, V_{22}, f, g, h, x(0), z(0), \eta, \zeta, T)    (42)

with constraint

\dot{z} = [E \; F][V_{11} \; V_{12}]^{-1} \tanh\left([V_{11} \; V_{12}] \begin{bmatrix} z \\ h(x) \end{bmatrix}\right)
\dot{x} = f(x) + g(x)\,\alpha \tanh\left(\frac{1}{\alpha}[G \; H][V_{21} \; V_{22}]^{-1} \tanh\left([V_{21} \; V_{22}] \begin{bmatrix} z \\ h(x) \end{bmatrix}\right)\right)    (43)

The same remarks can be made here as in the static output feedback case with respect to controllability, reachability of the linearized region, and robustness.

4.4. Numerical Solution to the Optimization Problem

Again the system (43) is simulated over a finite time horizon T, by applying an integration rule, and next the cost function is calculated from this simulation result. A sufficiently large region for which the linearization (22) is valid is guaranteed by

\|[V_{11} \; V_{12}]\| < \beta_1,
\|[V_{21} \; V_{22}]\| < \beta_2,
\|w_2^t [V_{21} \; V_{22}]\| < \beta_3.    (44)

The last constraint serves as a design rule for \alpha because [G \; H] = \alpha w_2^t [V_{21} \; V_{22}], such that \alpha > \|[G \; H]\|/\beta_3. This leads to the constrained optimization problem

\min_{V_{11}, V_{12}, V_{21}, V_{22}} C(V_{11}, V_{12}, V_{21}, V_{22}, f, g, h, x(0), z(0), \eta, \zeta, T)    (45)

with constraints

\|[V_{11} \; V_{12}]\| < \beta_1,
\|[V_{21} \; V_{22}]\| < \beta_2.    (46)

5. COMPARISON WITH OTHER NEURAL OPTIMAL CONTROL STRATEGIES

Some other candidate methods in neural optimal control for solving eqn (29) are the method proposed by Nguyen and Widrow (1990), which makes use of the back-propagation algorithm to minimize a cost function, illustrated on the well-known truck backer-upper example, and methods in reinforcement learning such as heuristic dynamic programming (Werbos, 1990) and Q-learning. The basic differences between our method and the other schemes are the following:
1. Neural controller architecture. Full state information is not required for the present algorithm. The controller design holds for full state feedback as well as output feedback. Both static and dynamic neural controllers are proposed, parametrized by feedforward and recurrent neural networks, respectively. There is no fundamental difference in treating the discrete time or continuous time case.
2. Learning algorithm. In the present algorithm, the choice of the controller architecture and the learning algorithm used to find its optimal interconnection weights are treated separately: the optimal control problem (with finite time horizon) is formulated as a nonlinear optimization in the interconnection weights of the controller. Either local or global optimization algorithms can be used to solve these problems. In the local optimization case, efficient algorithms from mathematical programming can be applied that are faster than steepest descent algorithms (like back propagation without momentum term).
3. Constraint from linear control theory. A linear controller design constraint is introduced at the endpoint, ensuring a locally stabilizing controller at the endpoint that can also be made robust with respect to perturbations (parametric uncertainties, external noise signals, etc.), keeping the closed loop system stable for all time, although a finite time horizon is considered in the optimal control problem (a finite time horizon that is large enough to bring the state close to the target point).

More specifically, compared with the method of Nguyen and Widrow, there is no need here to first derive a neural network model to make error back propagation through the plant possible. The optimization algorithm can work with any kind of nonlinear model (such as models from physical laws, neural network models, polynomial expansions, etc.) because the idea is simply


to optimize the result of a simulation. However, our approach is also an indirect method in the sense that a nonlinear model for the plant must be available to apply the control strategy. The latter is not the case with reinforcement learning algorithms, where the control algorithm is not based on any model of the plant, but receives full state information and reward/punishment from the environment. As stated in Sutton, Barto, and Williams (1992), reinforcement learning algorithms are direct adaptive optimal control methods and are in fact an approximation to dynamic programming (which is used if a precise model is available) for cases where no state space model of the environment is available. The curse of dimensionality, which is well known to be a problem in Q-learning and adaptive critic methods, where the state space is decoded into boxes (Barto et al., 1983), does not occur in the present algorithm because the control signal is parametrized by a neural network architecture. Furthermore, concerning the linear controller constraint, one can also notice that a linearized model around the target equilibrium point is used for deriving a linear controller at that point. In the context of direct neural adaptive control, a priori knowledge of linear models was also used by Saerens and Soquet (1991) to estimate the sign of the output of the plant with respect to its input, which is needed to make back propagation through the plant possible (see also Schiffmann & Geffers, 1993, and Psaltis, Sideris, & Yamamura, 1988) if one is not able to use a neural network model for the plant to back propagate through.

6. EXAMPLE: INVERTED PENDULUM

We now discuss an example of an inverted pendulum system for transition between equilibrium points with local stabilization at the target equilibrium point. A state space model (1) for the inverted pendulum system (Barto et al., 1983) (Figure 2) is given by

X2

4 mg -~ mlx2sin x3 - "-~ sin(2x3)

f ( x ) = a3 mt - m cos2x3

X4

ml mtg sin x3 - -~- xEsin(2x3)

l(] m t - m cos2x3) _

0

4 1

g (x )= - 3 3 m , - m c o s 2 x 3 , h ( x ) = [ x ' ] . (47) 0 x3

COS x 3

l(3 m, - m cos2x3)

In this model friction is not taken into account.

FIGURE 2. Inverted pendulum system: x1 is the position of the cart, x3 is the angle of the pole, u is the force applied to the cart, and x2 = ẋ1, x4 = ẋ3.

The state variables x1, x2, x3, x4 are position and velocity of the cart, angle of the pole with the vertical, and rate of change of the angle, respectively. The input signal u is the force applied to the cart's center of mass. The symbols m, m_t, l, g denote the mass of the pole, the total mass of cart and pole, the half pole length, and the acceleration due to gravity, respectively. The input signal u is constrained by |u| < α. Notice that in the autonomous case x = [0 0 0 0]^T and x = [0 0 π 0]^T are equilibrium points and we call them eq+ and eq−, respectively. The linearized system around the target equilibrium point eq+ is

$$
f_0 = \begin{bmatrix}
0 & 1 & 0 & 0 \\[2pt]
0 & 0 & \dfrac{-mg}{\tfrac{4}{3}m_t - m} & 0 \\[8pt]
0 & 0 & 0 & 1 \\[2pt]
0 & 0 & \dfrac{m_t g}{l\left(\tfrac{4}{3}m_t - m\right)} & 0
\end{bmatrix},
\qquad
g(0) = \begin{bmatrix}
0 \\[2pt]
\dfrac{4/3}{\tfrac{4}{3}m_t - m} \\[8pt]
0 \\[2pt]
\dfrac{-1}{l\left(\tfrac{4}{3}m_t - m\right)}
\end{bmatrix},
\qquad
h_0 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}. \tag{48}
$$
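Using the linearization (48), the LQR design quoted in Section 6.1 can be reproduced numerically. The sketch below solves the algebraic Riccati equation through the stable invariant subspace of the Hamiltonian matrix (a standard construction, not the paper's own code); g = 9.8 m/s² is assumed, and Q = I_4, R = 0.01 are the weights used in Section 6.1.

```python
import numpy as np

m, mt, l, g = 0.1, 1.1, 0.5, 9.8      # parameter values from Section 6.1
d = (4.0 / 3.0) * mt - m              # denominator of the linearization (48)

A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, -m * g / d, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, mt * g / (l * d), 0.0]])
B = np.array([[0.0], [(4.0 / 3.0) / d], [0.0], [-1.0 / (l * d)]])

Q, R = np.eye(4), np.array([[0.01]])

# Stabilizing Riccati solution P = X2 X1^{-1} from the stable
# eigenvectors of the Hamiltonian matrix.
H = np.block([[A, -B @ np.linalg.inv(R) @ B.T],
              [-Q, -A.T]])
w, V = np.linalg.eig(H)
X = V[:, w.real < 0]
P = (X[4:, :] @ np.linalg.inv(X[:4, :])).real

K = np.linalg.inv(R) @ B.T @ P        # state feedback u = -K x
closed_loop_eigs = np.linalg.eigvals(A - B @ K)
```

Up to sign conventions, the resulting gain should be comparable in magnitude to the k_lqr reported below; in any case the closed loop A − BK is Hurwitz by construction.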

Two neural controllers will be studied now: a full static state feedback neural controller (feedforward net) related to LQR design and a dynamic output feedback neural controller (recurrent net) based on LQG.

6.1. Feedforward Neural Network

We take full static state feedback (Figure 3) here in eqn (6) (y = x) with V ∈ R^{4×4} and w ∈ R^4 (four hidden neurons). The design method for the neural controller can be outlined as:
• Control task. The control task is a transition from eq− to eq+ with local stability at eq+.
• LQR design on the linearized system. An LQR controller, eqns (14)-(16), is calculated for Q = I_4 and R = 0.01, with m = 0.1, m_t = 1.1, l = 0.5, which gives

$$k_{lqr} = [\,-10.0000 \ \ -16.8140 \ \ -100.4101 \ \ -27.7187\,].$$



FIGURE 3. Inverted pendulum controlled by a feedforward neural network with full state feedback for performing the swinging up problem.

• Choice of α. In eqn (34) β2 = 15 was taken and α = 10 such that α > ‖k‖2/β2. This choice of β2 turns out not to be very critical.

• Optimization problem. The optimization problem (35) was solved here with β1 = 2, x(0) = [0 0 π 0]^T, T = 3. The closed loop system is simulated by means of a trapezoidal integration rule (Rice, 1983) with constant step length 0.03. This simulation was written in C code. The cost function was calculated from this simulation result with terminal control. A local optimization method of sequential quadratic programming (SQP) was used (the function constr of Matlab's Optimization Toolbox; Matlab User's Guide, 1992) for solving the optimization problem (35). A constrained nonlinear optimization problem is in general of the form

$$\min_{x \in \mathbb{R}^p} \varphi(x) \quad \text{subject to} \quad c(x) \le 0 \tag{49}$$

where φ ∈ R and c ∈ R^q. The associated Lagrangian function is

$$L(x, \lambda) = \varphi(x) + \sum_{i=1}^{q} \lambda_i c_i(x). \tag{50}$$

In the SQP method a quadratic programming (QP) subproblem is solved at each iteration x_{k+1} = x_k + α_k d_k by linearizing the nonlinear constraints

$$\min_{d \in \mathbb{R}^p} \tfrac{1}{2}\, d^T H_k d + \nabla\varphi(x_k)^T d
\quad \text{subject to} \quad c(x_k) + \nabla c(x_k)^T d \le 0 \tag{51}$$

with Hessian update using the BFGS formula (Broyden, Fletcher, Goldfarb, Shanno)

$$H_{k+1} = H_k + \frac{q_k q_k^T}{q_k^T s_k} - \frac{H_k s_k s_k^T H_k}{s_k^T H_k s_k} \tag{52}$$

where s_k = x_{k+1} − x_k and q_k = ∇φ(x_{k+1}) + Σ_{i=1}^{q} λ_{k,i} ∇c_i(x_{k+1}) − [∇φ(x_k) + Σ_{i=1}^{q} λ_{k,i} ∇c_i(x_k)] (Fletcher, 1987; Gill, Murray, & Wright, 1981; Powell, 1983).

A locally optimal solution obtained by SQP is

$$V = \begin{bmatrix}
0.8375 & -0.0987 & 0.8139 & 0.2730 \\
0.3902 & 0.7257 & 0.9669 & -1.1032 \\
-0.0051 & 0.4888 & -0.2993 & 0.7534 \\
0.2590 & 0.0862 & 0.1389 & -0.8283
\end{bmatrix}.$$

In Figure 4 this solution is given in state space. Figure 5 illustrates the evolution of the pole during swinging up.
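As a side note, the BFGS update (52) is straightforward to implement; the sketch below shows the update in isolation (not the full SQP iteration). By construction the updated matrix satisfies the secant condition H_{k+1} s_k = q_k and remains symmetric.

```python
import numpy as np

def bfgs_update(H, s, q):
    """BFGS Hessian update of eqn (52):
    H+ = H + q q^T / (q^T s) - (H s)(H s)^T / (s^T H s)."""
    Hs = H @ s
    return H + np.outer(q, q) / (q @ s) - np.outer(Hs, Hs) / (s @ Hs)

H = np.eye(4)                              # initial Hessian approximation
s = np.array([1.0, 0.5, -0.2, 0.1])        # step s_k = x_{k+1} - x_k
q = np.array([2.0, 1.5, -0.1, 0.3])        # Lagrangian gradient difference q_k
H_next = bfgs_update(H, s, q)
```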

Other methods, like reinforcement learning or the method of Nguyen and Widrow, are probably also able to swing up the pole (although this has not been shown as far as we know), but will have problems with ensuring that the pole stays at the upright position for all time, because in all these problems a finite time horizon T is considered. A controller is then obtained that is not necessarily locally stabilizing at the end point and can become unstable from time T on. In the presented scheme the locally stabilizing LQR controller will take over and keep the pole in its upright position for all time.

Robustness. In Figure 6 the neural optimal controller was tested for some other initial states in the neighborhood of eq−. Close to eq+ the controller acts as a classical LQR controller with comparable robustness. In Figure 7 the behaviour of the neural controller for initial states in the neighborhood of eq+ is given.
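The trapezoidal rule used for the closed-loop simulation above is implicit; how the implicit equation was solved is not stated in the text, so the sketch below uses one common choice, an explicit Euler predictor followed by fixed-point iteration.

```python
import numpy as np

def trapezoidal_step(F, x, h, iters=10):
    """One step of the implicit trapezoidal rule
    x_{k+1} = x_k + (h/2) (F(x_k) + F(x_{k+1})),
    solved by fixed-point iteration from an explicit Euler predictor."""
    Fx = F(x)
    x_new = x + h * Fx                      # predictor
    for _ in range(iters):
        x_new = x + 0.5 * h * (Fx + F(x_new))
    return x_new

# sanity check on dx/dt = -x with the step length 0.03 used in the text:
# after 100 steps (t = 3) the state should be close to exp(-3)
x, h = np.array([1.0]), 0.03
for _ in range(100):
    x = trapezoidal_step(lambda y: -y, x, h)
```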

6.2. Recurrent Neural Network

The control signal is parametrized by a recurrent neural network (5) (Figure 8) with W1 ∈ R^{4×6}, w2 ∈ R^6, V11 ∈ R^{6×4}, V12 ∈ R^{6×2}, V21 ∈ R^{6×4}, V22 ∈ R^{6×2} (n_h1 =


FIGURE 4. Solution in state space to the swinging up problem of an inverted pendulum system by a feedforward neural network related to LQR design (initial state x(0) = [0 0 π 0]^T and target equilibrium point x = 0). This solution corresponds to the intuitive solution that a human would come up with when asked to swing up and stabilize an inverted pendulum by manual control: first apply a maximal force to the right and to the left to swing up the pole and finally a smaller force to keep the pole at its upper equilibrium state.



FIGURE 5. Evolution of the pole during the transition from the equilibrium point x(0) = [0 0 π 0]^T (pole down) to the target equilibrium point x = 0 (pole up). After swinging up, the pole is stabilized locally at its upright position: the neural controller is then acting as a classical LQR controller. The transition between the equilibrium points (nonlocal control) is performed in closed loop by the feedforward neural net of Figures 3-4.

n_h2 = 6), and y = [x1 x3]^T. The neural controller design works as:
• Control task. The control task is a transition from eq− to eq+ with local stability at eq+.
• Linear controller design. Several linear controller design techniques may be applied here to eqn (28), like H2, H∞ control, etc. (Åström & Wittenmark, 1984; Franklin et al., 1990; Maciejowski, 1989). We did an LQG design around the target equilibrium point x = 0 with equally weighted state and input in eqn (B.3) and W = I_4 and V = I_2 (see Appendix B), yielding the matrices E, F, G, H in eqn (28).
• Choice of α. In eqn (44) β3 = 4 was taken and α = 10 such that α > ‖[G H]‖/β3. The choice of β3 is not critical here.


FIGURE 6. Robustness of the feedforward neural controller of Figures 3-4 with respect to the initial state x(0) lying in the neighborhood of eq−. Different trajectories are shown for different initial states, all ending in the target equilibrium state eq+ = 0.


FIGURE 7. Behaviour of the feedforward neural controlled system close to the target equilibrium point eq+. All trajectories shown in the figure go to the origin.

• Optimization problem. The optimization problem in eqns (45) and (46) was solved with β1 = β2 = 2, x(0) = [0 0 π 0]^T, z(0) = 0, T = 10. The other parameters were the same as in the static output feedback case, with a trapezoidal integration rule with constant step length 0.01. The cost function C in eqn (45) was calculated from the simulation result with


FIGURE 8. Inverted pendulum controlled by a recurrent neural network (dynamic output feedback) for performing the swinging up problem. The outputs y1 and y2 are the position of the cart and the angle of the pole.



FIGURE 9. Solution in state space to the swinging up problem of an inverted pendulum system by a recurrent neural network based on LQG design (initial state x(0) = [0 0 π 0]^T, z(0) = 0 and target equilibrium point x = 0, z = 0). Plot of z_i(t): z1 (solid line), z2 (dashed line), z3 (dotted line), z4 (dashdot line).

terminal control. A locally optimal solution obtained in Matlab is given by the weight matrices with entries

-0.1496 0.1444
-0.0889 0.1358 0.1234 0.0590
-0.0462 -0.1275 -0.0985
0.0566 -0.0249
0.1224
-0.1356 -0.0200 0.1870 0.1255 0.0085 -0.0312 0.0090 0.0032 0.0507
-0.0653 -0.1455 -0.0025 0.0129 0.1871 0.1086
-0.0615 0.0540 0.1158
-0.0326 0.0962 0.0905 0.0797 -0.0140 -0.0248
-0.0147 -0.1009 0.1019 0.0750 -0.2760 -0.0069 0.1195 0.0659 0.0398 0.0249 -0.1407 -0.0584
-0.1646 0.1179 0.0386 0.0693 0.1512 0.0328 0.1289 -0.0117 0.0607 -0.0696
-0.0187 0.0479
-0.1174 -0.0314 -0.0925 0.0349
0.0907 0.0229 -0.1212 0.0158
0.0042 -0.0958 -0.0188 -0.1166

In Figure 9 this solution is presented in state space. Figure 10 shows the evolution of the pole during swinging up.


FIGURE 11. Robustness of the recurrent neural controller of Figures 8-9 with respect to the initial state x(0) lying in the neighborhood of eq−: notice that this is superior to the feedforward neural control of Figure 6 because there exists a larger region around x = [0 0 π 0]^T containing x(0) such that the neural controller is still able to perform the control task of swinging up and locally stabilizing at the end point.

Robustness. The neural optimal controller was tested for some other initial states in the neighborhood of eq− (Figure 11) and seems very robust with respect to the choice of x(0), and certainly far more robust for this inverted pendulum system than the feedforward neural net controller based on LQR. Close to eq+ the controller acts as an LQG controller with comparable robustness. In Figure 12 the behaviour of the neural controller is given for initial states in the neighborhood of eq+.

7. CONCLUSION

It was shown how transitions between equilibrium points can be realized by static and dynamic neural controllers, parametrized by feedforward or recurrent


FIGURE 10. Evolution of the pole during the transition from x(0) = [0 0 π 0]^T, z(0) = 0 to the target equilibrium point x = 0, z = 0 by the recurrent neural net control of Figures 8-9.


FIGURE 12. Behaviour of the recurrent neural controlled system close to the target equilibrium point eq+. All trajectories shown in the figure go to the origin.


neural networks, with local stability at the target equilibrium point. All classical and modern linear controller design techniques like PID, LQR, pole placement, H2, H∞ control, etc., may be applied around the target equilibrium point and can be incorporated into the neural controller. The need for imposing such a linear controller design at the end point comes basically from considering a finite time horizon in the optimal control problem. Neglecting the constraint may result in a neural controller that is not locally stabilizing at the end point, even when the cost function would become exactly zero. The slightest perturbation would then make the closed loop system unstable.

Differences with previous approaches in neural optimal control are the study of output feedback (full state information is not required) and the fact that the controller may be either static (feedforward neural network) or dynamic (recurrent neural network). For the swinging up problem of an inverted pendulum system, a remarkable observation was that the dynamic controller could generalize much better with respect to new initial states, for which the neural controller was not trained, than the static controller. Another difference is that in our approach the choices of the controller architecture and the learning algorithm are treated more separately from each other, by formulating the optimal control problem as a nonlinear optimization problem in the interconnection weights, which may be solved either by local or global optimization algorithms; in the other methods the learning algorithm (such as back propagation or reinforcement learning) plays a more central and essential role in the controller design than the architecture of the controller itself. In the presented framework it is shown that nonlinear controllers, parametrized by neural networks, are capable of performing complex control tasks.

REFERENCES

Åström, K. J., & Wittenmark, B. (1984). Computer-controlled systems: Theory and design. Englewood Cliffs, NJ: Prentice-Hall.

Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(5), 834-846.

Boyd, S., & Barratt, C. (1991). Linear controller design: Limits of performance. Englewood Cliffs, NJ: Prentice-Hall.

Bryson, A. E., & Ho, Y. C. (1969). Applied optimal control. Waltham, MA: Blaisdell.

Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2, 303-314.

Fletcher, R. (1987). Practical methods of optimization (2nd ed.). Chichester and New York: John Wiley and Sons.

Franklin, G. F., Powell, J. D., & Workman, M. L. (1990). Digital control of dynamic systems. Reading, MA: Addison-Wesley.

Funahashi, K.-I. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks, 2, 183-192.

Gill, P. E., Murray, W., & Wright, M. H. (1981). Practical optimization. London: Academic Press.

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359-366.

Maciejowski, J. M. (1989). Multivariable feedback design. Reading, MA: Addison-Wesley.

Matlab User's Guide (1992). Optimization toolbox user's guide, Version 3.5f. Natick, MA: The MathWorks, Inc.

Miller, W. T., Sutton, R. S., & Werbos, P. J. (1990). Neural networks for control. Cambridge, MA: MIT Press.

Nguyen, D., & Widrow, B. (1990). Neural networks for self-learning control systems. IEEE Control Systems Magazine, 10(3), 18-23.

Powell, M. J. D. (1983). Variable metric methods for constrained optimization. In A. Bachem, M. Grötschel, & B. Korte (Eds.), Mathematical programming: The state of the art (pp. 288-311). Berlin and New York: Springer-Verlag.

Psaltis, D., Sideris, A., & Yamamura, A. (1988). A multilayered neural network controller. IEEE Control Systems Magazine, April, 17-21.

Rice, J. R. (1983). Numerical methods, software and analysis. New York: McGraw-Hill.

Saerens, M., & Soquet, A. (1991). Neural controller based on back-propagation algorithm. IEE Proceedings-E, 138(1), 55-62.

Schiffmann, W. H., & Geffers, H. W. (1993). Adaptive control of dynamic systems by back propagation networks. Neural Networks, 6, 517-524.

Sutton, R. S., Barto, A. G., & Williams, R. J. (1992). Reinforcement learning is direct adaptive optimal control. IEEE Control Systems Magazine, April, 19-22.

Werbos, P. J. (1990). Consistency of HDP applied to a simple reinforcement learning problem. Neural Networks, 3, 179-189.

APPENDIX A

Jacobians for the Closed Loop Systems

The Jacobian matrix J related to the closed loop system (8) with a feedforward neural net controller is

$$J(x) = f_x + g_x\,\alpha \tanh\!\big(w^T \tanh(V h(x))\big) + g(x)\,\alpha\,\gamma_1 w^T \Gamma_2 V h_x \tag{A.1}$$

where f_x, g_x, and h_x denote the matrices with partial derivatives [∂f_i/∂x_j], [∂g_i/∂x_j] (i, j = 1, ..., n), and [∂h_i/∂x_j] (i = 1, ..., m, j = 1, ..., n). The matrices f_0 and h_0 represent the matrices f_x and h_x evaluated at 0. The scalar γ1 and the diagonal matrix Γ2 = diag([γ_{2,1}, ..., γ_{2,n_h}]) are given by

$$\gamma_1 = 1 - \tanh^2\!\big(w^T \tanh(V h(x))\big), \qquad
\gamma_{2,i} = 1 - \tanh^2\!\big(v_i^T h(x)\big), \quad i = 1, \ldots, n_h \tag{A.2}$$
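Expressions (A.1)-(A.2) can be checked against finite differences. The sketch below verifies only the controller term ∂u/∂x = α γ1 w^T Γ2 V h_x for h(x) = x, using random illustrative weights (not the trained ones from Section 6).

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(4, 4))       # hidden-layer weights (illustrative)
w = rng.normal(size=4)            # output weights (illustrative)
alpha = 10.0

def u(x):
    # static controller u = alpha * tanh(w^T tanh(V h(x))), with h(x) = x
    return alpha * np.tanh(w @ np.tanh(V @ x))

x = rng.normal(size=4)

# analytic gradient from (A.1)-(A.2): alpha * gamma1 * w^T Gamma2 V
gamma1 = 1.0 - np.tanh(w @ np.tanh(V @ x)) ** 2
Gamma2 = np.diag(1.0 - np.tanh(V @ x) ** 2)
grad = alpha * gamma1 * (w @ Gamma2 @ V)

# central finite differences for comparison
eps = 1e-6
fd = np.array([(u(x + eps * e) - u(x - eps * e)) / (2.0 * eps)
               for e in np.eye(4)])
```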

where v_i^T denotes the ith row of V. The Jacobian matrix J related to the closed loop system (21) with a recurrent neural net controller is

$$J(z, x) =
\begin{bmatrix}
\dfrac{\partial \dot{z}}{\partial z} & \dfrac{\partial \dot{z}}{\partial x} \\[6pt]
\dfrac{\partial \dot{x}}{\partial z} & \dfrac{\partial \dot{x}}{\partial x}
\end{bmatrix}
=
\begin{bmatrix}
W_1 \Gamma_1 V_{11} & W_1 \Gamma_1 V_{12} h_x \\[4pt]
g(x)\,\alpha\,\gamma_2 w_2^T \Gamma_3 V_{21} & f_x + g_x\,\alpha \tanh(\cdot) + g(x)\,\alpha\,\gamma_2 w_2^T \Gamma_3 V_{22} h_x
\end{bmatrix} \tag{A.3}$$

with diagonal matrices Γ1 = diag([γ_{1,1}, ..., γ_{1,n_{h1}}]), Γ3 = diag([γ_{3,1}, ..., γ_{3,n_{h2}}]), and γ2 equal to


$$\gamma_{1,i} = 1 - \tanh^2\!\big[v_{11,i}^T z + v_{12,i}^T h(x)\big], \quad i = 1, \ldots, n_{h1},$$
$$\gamma_2 = 1 - \tanh^2\!\big(w_2^T \tanh[V_{21} z + V_{22} h(x)]\big),$$
$$\gamma_{3,i} = 1 - \tanh^2\!\big[v_{21,i}^T z + v_{22,i}^T h(x)\big], \quad i = 1, \ldots, n_{h2} \tag{A.4}$$

with v_{11,i}^T, v_{12,i}^T, v_{21,i}^T, v_{22,i}^T the ith row of V_{11}, V_{12}, V_{21}, V_{22}, respectively.

APPENDIX B

Some Results From Linear Control Theory

Linear control theory is a domain that is well developed and many excellent books can be found on this topic, such as Åström and Wittenmark (1984), Boyd and Barratt (1991), Bryson and Ho (1969), and Franklin et al. (1990). Because it is shown in this paper how this theory can fit into the framework of neural optimal control, a very brief review of some classical and modern linear controller techniques is given here, such as PID, LQR, LQG, H2 and H∞ control, and μ-synthesis. The techniques exist both for continuous and discrete time systems and can be applied to MIMO (multiple input multiple output) systems, except the PID controller, which is only relevant for SISO (single input single output) systems.

The LQR controller (linear quadratic regulator) is the solution to the following optimal control problem

$$\min C_{lqr} = \int_0^{\infty} \big(x^T Q x + u^T R u\big)\, dt \tag{B.1}$$

with weighting matrices Q = Q^T > 0, R = R^T > 0, subject to the linear system

$$\dot{x} = A x + B u. \tag{B.2}$$

The solution is given by the static full state feedback

$$u = -K x$$

where K = R^{-1} B^T P and P is the solution to the matrix algebraic Riccati equation

$$0 = P A + A^T P - P B R^{-1} B^T P + Q.$$

The LQG controller (linear quadratic Gaussian) is the solution to the optimal control problem

$$\min C_{lqg} = \lim_{T \to \infty} E\, \frac{1}{T}\int_0^T \big(z^T Q z + u^T R u\big)\, dt \tag{B.3}$$

with z = M x and subject to the system dynamics

$$\dot{x} = A x + B u + \Gamma w, \qquad y = C x + v$$

with w, v zero mean white Gaussian noise processes, having covariances E{w w^T} = W ≥ 0, E{v v^T} = V > 0, and E{w v^T} = 0 (E{·} denotes the expectation operator). The solution to this problem is a dynamic controller, which is a combination of Kalman filtering and full state feedback (the so-called separation principle)

$$u = -K_c \hat{x}, \qquad \dot{\hat{x}} = A \hat{x} + B u + K_f (y - C \hat{x})$$

where K_c = R^{-1} B^T P_c, K_f = P_f C^T V^{-1}, and P_c = P_c^T ≥ 0, P_f = P_f^T ≥ 0 each satisfy an algebraic Riccati equation

$$A^T P_c + P_c A - P_c B R^{-1} B^T P_c + M^T Q M = 0$$
$$P_f A^T + A P_f - P_f C^T V^{-1} C P_f + \Gamma W \Gamma^T = 0. \tag{B.4}$$
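Note that the filter Riccati equation in (B.4) is the dual of the control one, so a single Riccati solver covers both. A minimal Python sketch on an illustrative second-order plant (the matrices below are placeholders, not taken from the paper):

```python
import numpy as np

def care(A, B, Q, R):
    """Stabilizing solution of A^T P + P A - P B R^{-1} B^T P + Q = 0
    via the stable invariant subspace of the Hamiltonian matrix."""
    n = A.shape[0]
    H = np.block([[A, -B @ np.linalg.inv(R) @ B.T],
                  [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    X = V[:, w.real < 0]
    return (X[n:, :] @ np.linalg.inv(X[:n, :])).real

A = np.array([[0.0, 1.0], [2.0, 0.0]])    # unstable toy plant
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.eye(1)               # weights on z = M x (M = I here)
W, Vn = np.eye(2), np.eye(1)              # process / measurement noise covariances

Pc = care(A, B, Q, R)
Kc = np.linalg.inv(R) @ B.T @ Pc          # state feedback gain
Pf = care(A.T, C.T, W, Vn)                # filter Riccati is the dual problem
Kf = Pf @ C.T @ np.linalg.inv(Vn)         # Kalman gain
```

By the separation principle, A − B Kc and A − Kf C are each Hurwitz, so the combined observer-based compensator stabilizes the plant.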

In PID control (proportional, integral, derivative), studied in classical control theory, the controller K(s) in Figure B.1a is chosen to be of the specific form


FIGURE B.1. (a) Block diagram of a 1-DOF (one degree of freedom) control scheme, (b) standard form of a control scheme with augmented plant P(s), (c) augmented plant P(s) for the mixed sensitivity problem, and (d) control scheme that takes into account uncertainties Δ on the plant model.


$$U(s) = K_P\left(1 + \frac{1}{s T_I} + s T_D\right) E(s) \tag{B.5}$$

where the constants K_P, T_I, and T_D are related to the proportional, integral, and derivative term, respectively.
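A discrete-time realization of (B.5) can use, for example, a forward-Euler integral and a backward-difference derivative (one common discretization choice; nothing here is specific to this paper):

```python
class PID:
    """Discrete PID of eqn (B.5): u = Kp * (e + (1/Ti) * int(e) + Td * de/dt)."""

    def __init__(self, Kp, Ti, Td, h):
        self.Kp, self.Ti, self.Td, self.h = Kp, Ti, Td, h
        self.integral = 0.0
        self.e_prev = 0.0

    def step(self, e):
        self.integral += e * self.h                 # forward-Euler integral
        deriv = (e - self.e_prev) / self.h          # backward difference
        self.e_prev = e
        return self.Kp * (e + self.integral / self.Ti + self.Td * deriv)

pid = PID(Kp=2.0, Ti=1.0, Td=0.0, h=0.1)
u0 = pid.step(1.0)    # first sample with unit error: 2 * (1 + 0.1) = 2.2
```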

More recently, H2 and H∞ control for the mixed sensitivity problem were investigated. In these methods one brings the control scheme of Figure B.1a into the standard form of Figure B.1b with P(s) the so-called augmented plant representation. In the H2 problem one minimizes

$$\min_{K(s)} \| T_{y_1 u_1} \|_2 \tag{B.6}$$

with T_{y_1 u_1} the transfer function from u_1 to y_1, equal to

$$T_{y_1 u_1} = \begin{bmatrix} W_1(s) S(s) \\ W_2(s) T(s) \end{bmatrix} \tag{B.7}$$

with S(s) = [1 + G(s)K(s)]^{-1} the sensitivity function and T(s) = 1 − S(s) the complementary sensitivity function, which are the transfer functions from d to y and from r to y, respectively. W_1(s), W_2(s) are user-defined weighting functions. In H∞ control for the mixed sensitivity problem one minimizes

$$\min_{K(s)} \| T_{y_1 u_1} \|_{\infty}. \tag{B.8}$$

Methods such as μ-synthesis can take into account structured uncertainty on the nominal plant G(s), pulled out into the system Δ (Figure B.1d). This robust control scheme accounts for parametric uncertainties on the plant model, unmodeled dynamics, etc. The following objective is often minimized then:

$$\min_{K(s)} \| D(s)\, T_{y_1 u_1}\, D(s)^{-1} \|_{\infty}, \tag{B.9}$$

where D(s) is a diagonal scaling matrix.