
Automatica, Vol. 5, pp. 731-739. Pergamon Press, 1969. Printed in Great Britain.

A Parameter-Adaptive Control Technique*

G. STEIN† and G. N. SARIDIS‡

An approximate solution of the functional equation of dynamic programming has been used to develop a simple adaptive controller for linear stochastic systems with unknown parameters.

Summary--Control of linear stochastic systems with unknown parameters is accomplished by means of an approximate solution of the associated functional equation of dynamic programming. The approximation is based on repeated linearizations of a quadratic weighting matrix appearing in the optimal cost function for the control process. This procedure leads to an adaptive control system which is linear in an expanded vector of state estimates, with feedback gains which are explicit functions of a posteriori parameter probabilities. The performance of this controller is illustrated with a simple example.

1. INTRODUCTION

FOR SOME years now the dual control formulation [5, 6] and the dynamic programming formulation [2, 3] for the so-called "optimal adaptive control problem" have been available to solve control problems involving certain unknown quantities such as parameters of the system's mathematical model, parameters of the statistical descriptions for various disturbances affecting the system, or entire functional relationships involved in the mathematical representation of the control problem. Various efforts have been made to utilize these formulations and to modify them for numerous special circumstances [1, 11, 13]. Only limited success, however, has been achieved in dealing with the significant analytical complexities and computational burdens associated with both formulations.

This paper considers a special case of the optimal adaptive control problem for which it is possible to exploit a simple approximation technique to obtain an approximate solution of the functional equation associated with the dynamic programming formulation. The adaptive control problem itself is formulated in section 2 of the paper, followed by a discussion of the approximation technique in section 3 and the resulting adaptive control system in section 4. The solution is then illustrated with a simple example in section 5.

* Received 14 February 1969; revised 26 May 1969. The original version of this paper was presented at the 4th IFAC Congress which was held in Warsaw, Poland during June 1969. It was recommended for publication in revised form by associate editor P. Dorato.

† Honeywell Inc., Systems & Research Div., Research Dept., 2345 Walnut Street, St. Paul, Minnesota 55113, USA.

‡ Purdue University, Department of Electrical Engineering, West Lafayette, Indiana, USA.

2. A FORMULATION OF THE ADAPTIVE CONTROL PROBLEM

Let the system be described by the following linear, discrete-time, stochastic model,

$$x(k+1) = A(\alpha, k)x(k) + B(\alpha, k)u(k) + F(\alpha, k)\xi(k), \qquad k = 0, 1, \ldots, N-1, \qquad \alpha \in \Omega_\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_s\} \tag{1}$$

with the measurement equation

$$y(k) = C(\alpha, k)x(k) + D(\alpha, k)\eta(k), \qquad k = 1, 2, \ldots, N. \tag{2}$$

The vector x(k) is an n-vector of state variables defined at time instant t_k, u(k) is an unconstrained m-vector of control inputs, and y(k) is an r-vector of measured outputs. The r₁-vectors ξ(k) and the r₂-vectors η(k) form two independent sequences of independent, identically distributed Gaussian random vectors:

$$\xi(k) \sim N(0, I_{r_1}), \qquad I_{r_1} = E\{\xi(k)\xi^T(k)\}, \qquad k = 0, 1, \ldots, N-1$$

$$\eta(k) \sim N(0, I_{r_2}), \qquad I_{r_2} = E\{\eta(k)\eta^T(k)\}, \qquad k = 1, 2, \ldots, N. \tag{3}$$

Similarly, the system's initial state x(0) is assumed to be a Gaussian distributed random vector:

$$x(0) \sim N[\mu(0), P(0)], \qquad P(0) = E\{[x(0) - \mu(0)][x(0) - \mu(0)]^T\}. \tag{4}$$



The quantities A(α, k), B(α, k), F(α, k), C(α, k) and D(α, k) are matrices with appropriate dimensions whose elements are arbitrary but known functions of the time index k and of the l-vector α. The vector α consists of unknown system parameters. It is assumed to belong to the finite set Ω_α and to be constant on the control interval k = 0, 1, ..., N.
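As a concrete (and entirely hypothetical) instance of this setup, the model (1)-(2) can be sketched as follows; the matrices, dimensions, and the two-element parameter set are invented for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state instance of model (1)-(2): the true parameter alpha
# is one element of a finite set and stays constant over the interval.
models = {
    "alpha_1": dict(A=np.array([[1.0, 0.1], [-0.1, 0.98]]),
                    B=np.array([[0.0], [0.1]]),
                    F=0.1 * np.eye(2),
                    C=np.array([[1.0, 0.0]]),
                    D=np.array([[0.316]])),
    "alpha_2": dict(A=np.array([[1.0, 0.1], [-0.1, 0.85]]),
                    B=np.array([[0.0], [0.1]]),
                    F=0.9 * np.eye(2),
                    C=np.array([[1.0, 0.0]]),
                    D=np.array([[0.316]])),
}

def simulate(u_seq, true_alpha="alpha_1"):
    """Propagate x(k+1) = Ax + Bu + F xi and measure y(k+1) = Cx + D eta."""
    m = models[true_alpha]
    x = rng.multivariate_normal(np.zeros(2), np.eye(2))  # x(0) ~ N(mu(0), P(0))
    ys = []
    for u in u_seq:
        xi = rng.standard_normal(2)                      # xi(k) ~ N(0, I)
        x = m["A"] @ x + m["B"] @ np.atleast_1d(u) + m["F"] @ xi
        eta = rng.standard_normal(1)                     # eta(k+1) ~ N(0, I)
        ys.append(m["C"] @ x + m["D"] @ eta)
    return np.array(ys)

ys = simulate(np.zeros(10))   # ten stages of open-loop (u = 0) operation
```

The controller never observes which entry of the set generated the data; it sees only the measurement sequence, which is the point of the adaptive formulation that follows.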

The adaptive control problem for this system consists of finding a sequence of control inputs {u(k), k = 0, 1, 2, ..., N−1} as functions of the available measurements,

$$u(k) = f_k(y^k), \qquad y^k = \{y(1), y(2), \ldots, y(k)\}, \qquad k = 0, 1, \ldots, N-1 \tag{5}$$

such that the following average cost function is minimized:

$$J = E\Big\{\sum_{k=0}^{N-1}\big[\|x(k+1)\|^2_{Q(\alpha, k+1)} + \|u(k)\|^2_{R(\alpha, k+1)}\big]\Big\}. \tag{6}$$

The symmetric matrices Q(α, k) and R(α, k) represent the relative weights to be placed upon various components of state and control deviations. Their dependence on α is included to reflect the empirical fact that quadratic weights are not chosen a priori but rather are chosen to suit the particular plant and the designer's overall conception of satisfactory performance.

The following additional assumptions are made for all α ∈ Ω_α and k = 1, 2, ..., N:

(i) D(α, k)D^T(α, k) > 0;  (7)

(ii) Q(α, k) = Q^T(α, k) ≥ 0;

(iii) R(α, k) = R^T(α, k) > 0;

(iv) an a priori discrete probability distribution q(0) for the vector α is available, where q(0) is an s-vector with components

$$0 \le q_i(0) = \operatorname{Prob}[\alpha = \alpha_i] \le 1, \qquad i = 1, 2, \ldots, s$$

satisfying

$$\sum_{i=1}^{s} q_i(0) = 1.$$

Since a feedback control of the form (5) is desired, the method of dynamic programming will be used to minimize criterion (6).

Define the "optimal return function":

V(y^k, N−k) ≜ cost of an N−k stage adaptive control process using the optimal control sequence {u*(k), u*(k+1), ..., u*(N−1)}, based upon the a priori information (4) and (7 iv) and upon the measurement sequence y^k = {y(1), y(2), ..., y(k)}.

Applying the "Principle of Optimality" [2], the optimal return function obeys the following recursive functional equation:

$$V(y^k, N-k) = \min_{u(k)} E\big\{\|x(k+1)\|_Q^2 + \|u(k)\|_R^2 + V(y^{k+1}, N-k-1) \,\big|\, y^k\big\} \tag{8}$$

with V(y^N, 0) ≡ 0 (with probability one).

In this equation, E{... | y^k} denotes the mathematical expectation conditioned on the sequence y^k and on the a priori data (4) and (7 iv). The dependence of Q and R upon the parameters α and the time index k has been suppressed. As a matter of convenience, this practice is continued in all subsequent derivations.

Equation (8) may be solved backwards, starting with a one-stage process.

$$V(y^{N-1}, 1) = \min_{u(N-1)} E\big\{\|x(N)\|_Q^2 + \|u(N-1)\|_R^2 \,\big|\, y^{N-1}\big\}. \tag{9}$$

The conditional expectation of equation (9) can be expressed as

$$E\{(\ldots) \mid y^{N-1}\} = \sum_{i=1}^{s} \operatorname{Prob}[\alpha = \alpha_i \mid y^{N-1}]\, E\{(\ldots) \mid \alpha = \alpha_i, y^{N-1}\} = \sum_{i=1}^{s} q_i(N-1)\, E\{(\ldots) \mid \alpha = \alpha_i, y^{N-1}\} \tag{10}$$

where q_i(N−1), i = 1, 2, ..., s, is the a posteriori probability distribution of the parameter vector based upon the measurements y^{N−1}. So (9) becomes

$$V(y^{N-1}, 1) = \min_{u(N-1)} \sum_{i=1}^{s} q_i(N-1)\, E\big\{\|x(N)\|^2_{Q(\alpha_i, N)} + \|u(N-1)\|^2_{R(\alpha_i, N)} \,\big|\, \alpha = \alpha_i,\ y^{N-1}\big\}. \tag{11}$$

Replacing x(N) by the system equation (1) for each value α = α_i and carrying out the expectations and minimization, it can now be readily verified [12] that V(y^{N−1}, 1) is quadratic and that the corresponding optimal control is linear in terms of the following expanded vector of state estimates:

$$\hat{x}(k) \triangleq [\mu^T(\alpha_1, k), \mu^T(\alpha_2, k), \ldots, \mu^T(\alpha_s, k)]^T \tag{12}$$

where

$$\mu(\alpha_i, k) \triangleq E\{x(k) \mid \alpha = \alpha_i, y^k\}, \qquad i = 1, 2, \ldots, s.$$


The one-stage cost and control are

$$V(y^{N-1}, 1) = \|\hat{x}(N-1)\|^2_{S[q(N-1),\,1]} + T[q(N-1), 1] \tag{13}$$

$$u^*(N-1) = -G[q(N-1), 1]\,\hat{x}(N-1) \tag{14}$$

where the matrices S and G and the scalar T are non-linear functions of the a posteriori distribution q_i(N−1), i = 1, 2, ..., s. These are defined in the Appendix, equations (A.1), (A.2), and (A.3).

It is now evident that the vectors x̂(k) and q(k) constitute a set of "sufficient coordinates" [15] for the adaptive control problem formulated in equations (1)-(7). The optimal return function can be expressed as

$$V(y^k, N-k) = V[\hat{x}(k), q(k), N-k]$$

and the functional equation (8) becomes

$$V[\hat{x}(k), q(k), N-k] = \min_{u(k)} \sum_{i=1}^{s} q_i(k)\, E\big\{\|x(k+1)\|^2_Q + \|u(k)\|^2_R + V[\hat{x}(k+1), q(k+1), N-k-1] \,\big|\, \alpha = \alpha_i,\ \hat{x}(k)\big\} \tag{15}$$

with V[x̂(N), q(N), 0] ≡ 0 (with probability one). The existence of sufficient coordinates reduces the dependence of V(...) upon a growing number of variables (y^k) to a dependence upon a constant and finite number of variables [x̂(k), q(k)].

Equation (15) can be used to continue the dynamic programming solution, starting with the quadratic return function V[x̂(N−1), q(N−1), 1] of (13). As defined by equation (A.2), however, the matrix S[q(N−1), 1] is a non-linear function of the a posteriori distribution q_i(N−1), i = 1, 2, ..., s. This fact prevents the successful completion of the solution in closed form. The function V[x̂(N−2), q(N−2), 2] and all subsequent optimal return functions are no longer expressible in terms of quadratics or in terms of other similarly convenient functional forms.

The solution of (15) must therefore be obtained by numerical techniques [10] or by approximation methods. Because the computing time and memory requirements of numerical solutions are prohibitive for all but the simplest problems, the following discussion will deal with an approximation method which is based upon a very intuitive and appealing linearization technique.

3. LINEARIZATION OF THE WEIGHTING MATRIX OF THE OPTIMAL RETURN FUNCTION

It has been pointed out that the optimal return function for a single stage of the adaptive control problem formulated above is quadratic in x̂(N−1), with a weighting matrix S[q(N−1), 1] which is a non-linear function of the a posteriori distribution q(N−1). That is,

$$S[q(N-1), 1] = f_1[q_1(N-1), q_2(N-1), \ldots, q_{s-1}(N-1)], \tag{16}$$

where f₁(...) is the matrix-valued function of (s−1) independent variables defined by equations (A.1) and (A.2). The fact that f₁(...) has only (s−1) arguments is a consequence of the relation

$$\sum_{i=1}^{s} q_i(k) = 1, \qquad k = 0, 1, \ldots, N-1. \tag{17}$$

Let the matrix S̃(q, 1) be the matrix-valued "tangent plane" to the matrix S(q, 1) at the point q(0). This new matrix S̃ can be computed by considering S itself to be a matrix "surface" over an (s−1)-dimensional Euclidean space. That is,

$$f_1(q_1, q_2, \ldots, q_{s-1}) - S = 0. \tag{18}$$

Then the "tangent plane" at the point q(0) is defined by

$$\sum_{i=1}^{s-1} \frac{\partial f_1}{\partial q_i}[q_i - q_i(0)] - \{\tilde{S} - S[q(0), 1]\} = 0, \tag{19}$$

where the partial derivatives are evaluated at q = q(0). Now define

$$U_s(1) \triangleq S[q(0), 1] - \sum_{i=1}^{s-1} \frac{\partial f_1}{\partial q_i} q_i(0). \tag{20}$$

Then the tangent plane becomes

$$\tilde{S}(q, 1) = \sum_{i=1}^{s-1} \frac{\partial f_1}{\partial q_i} q_i + \Big(1 - \sum_{i=1}^{s-1} q_i\Big)U_s(1) + \sum_{i=1}^{s-1} q_i U_s(1).$$

Using (17), this expression reduces to the desired linear function,

$$\tilde{S}(q, 1) = q_1 U_1(1) + q_2 U_2(1) + \ldots + q_s U_s(1) \tag{21}$$

with

$$U_i(1) = \frac{\partial f_1}{\partial q_i} + U_s(1), \qquad i = 1, 2, \ldots, s-1.$$
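The tangent-plane construction (18)-(21) can be verified numerically. In the sketch below, f₁ is an invented matrix-valued surface standing in for the true weighting-matrix function; only the linearization mechanics of (20) and (21) follow the text, including the exactness of S̃ at the linearization point q(0):

```python
import numpy as np

# Hypothetical matrix-valued surface f1 for s = 3 (so s - 1 = 2 free
# variables); it stands in for the true non-linear weighting matrix S(q, 1).
def f1(q1, q2):
    q3 = 1.0 - q1 - q2
    M1, M2, M3 = np.diag([2.0, 1.0]), np.diag([1.0, 3.0]), np.diag([4.0, 2.0])
    return q1 * M1 + q2 * M2 + q3 * M3 + 0.5 * q1 * q2 * np.eye(2)

q0 = np.array([0.5, 0.3])       # linearization point q(0), first s-1 entries
eps = 1e-6

# partial derivatives of f1 at q(0), by central differences
df = [(f1(*(q0 + eps * e)) - f1(*(q0 - eps * e))) / (2 * eps)
      for e in np.eye(2)]

# equation (20): U_s(1) = S[q(0), 1] - sum_i (df1/dq_i) q_i(0)
Us = f1(*q0) - sum(d * qi for d, qi in zip(df, q0))

# equation (21): U_i(1) = df1/dq_i + U_s(1); then S~(q, 1) = sum_i q_i U_i(1)
U = [d + Us for d in df] + [Us]

def S_tilde(q):                 # q is the full s-vector, summing to one
    return sum(qi * Ui for qi, Ui in zip(q, U))

# the tangent plane agrees with the surface at the linearization point
q0_full = np.array([q0[0], q0[1], 1.0 - q0.sum()])
```

Because Σ q_i = 1, the constant U_s(1) is absorbed into the linear combination, which is exactly the reduction from (19)-(20) to (21).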

The optimal return function of the one-stage adaptive control process (13) will now be approximated by replacing the weighting matrix S[q(N−1), 1] by the linearized matrix S̃[q(N−1), 1] defined in equations (20) and (21). Using this approximation,


the return function of a two-stage adaptive control process can be obtained analytically from the following equation:

$$\bar{V}[\hat{x}(N-2), q(N-2), 2] = \min_{u(N-2)} \sum_{i=1}^{s} q_i(N-2)\, E\big\{\|x(N-1)\|^2_Q + \|u(N-2)\|^2_R + \|\hat{x}(N-1)\|^2_{\tilde{S}[q(N-1),\,1]} + T[q(N-1), 1] \,\big|\, \alpha = \alpha_i,\ \hat{x}(N-2)\big\}. \tag{22}$$

The resulting return function is again quadratic,

$$\bar{V}[\hat{x}(N-2), q(N-2), 2] = \|\hat{x}(N-2)\|^2_{S[q(N-2),\,2]} + T[q(N-2), 2] \tag{23}$$

and the corresponding control is linear [12],

$$\bar{u}^*(N-2) = -G[q(N-2), 2]\,\hat{x}(N-2). \tag{24}$$

The matrices S[q(N−2), 2] and G[q(N−2), 2] are non-linear functions of the a posteriori distribution q(N−2) which have exactly the same functional forms as the corresponding matrices of the one-stage return function.

The symbols V̄ and ū* in equations (22)-(24) are used to emphasize the fact that these quantities are no longer the optimal return function and the optimal control respectively, but rather that they depend upon the approximation of S[q(N−1), 1] by the linearized matrix S̃[q(N−1), 1]. Since this approximation is directly involved in the minimization indicated by equation (22), the return function V̄ of (23) and the control ū* of (24) have a meaningful interpretation only if an inequality of the type

$$\|\hat{x}\|^2_{S(q,1)} \le \|\hat{x}\|^2_{\tilde{S}(q,1)} \tag{25}$$

can be established for all x̂ and all probability distributions q. V̄[x̂(N−2), q(N−2), 2] is then the minimum cost of a two-stage adaptive control process for which the "cost of the final stage is somewhat higher than the optimal cost". ū*(N−2) is the corresponding minimizing control. It is not valid, of course, to claim that ū*(N−2) is "close" to the optimal control signal u*(N−2) as a consequence of (25). However, the actual cost incurred by using ū*(N−2) will be less than or equal to the right-hand side of (25) [12]. Therefore, if V̄ is close to V, then the control signal ū*, no matter how different from u*, will achieve nearly optimal cost.

The inequality (25) is indeed satisfied as a consequence of the following property.

Upper bound property of the S̃ approximation. For any fixed but arbitrary vector x̂, the function ‖x̂‖²_{S̃(q,1)}, considered as a function of the s-vector q, defines a supporting hyperplane [7] of the closed convex set Ω,

$$\Omega = \big\{(z, q) \mid 0 \le z \le \|\hat{x}\|^2_{S(q,1)},\ q \in \Omega_q\big\} \tag{26}$$

at the point [‖x̂‖²_{S[q(0),1]}, q(0)], where

$$\Omega_q = \Big\{q \mid 0 \le q_i \le 1,\ i = 1, 2, \ldots, s;\ \sum_{i=1}^{s} q_i = 1\Big\}.$$

Proof. The proof of this property consists of two parts:

(i) A proof of convexity for the set Ω, which reduces to a proof of convexity for the function ‖x̂‖²_{S(q,1)} on the domain Ω_q. The details can be found in Ref. [12].

(ii) A proof of the fact that ‖x̂‖²_{S̃(q,1)} defines a supporting hyperplane of Ω at [‖x̂‖²_{S[q(0),1]}, q(0)]. This follows directly from the definition of S̃ as the matrix-valued "tangent plane" to S at q = q(0), and from the convexity of Ω.

Inequality (25) is now a direct consequence of the fact that the set Ω, and particularly its boundary ‖x̂‖²_{S(q,1)}, lies in one closed half-space produced by the supporting hyperplane ‖x̂‖²_{S̃(q,1)} [7].

The inequality (25) is a property of the S̃ approximation which lends a meaningful interpretation to the two-stage return function V̄[x̂(N−2), q(N−2), 2]. Equally important, however, is the fact that this function itself is again quadratic, with a non-linear weighting matrix S(q, 2) which has exactly the same functional form as the matrix S(q, 1). It is therefore possible to approximate the new weighting matrix by the same linearized form

$$\tilde{S}(q, 2) = \sum_{i=1}^{s} q_i U_i(2), \tag{27}$$

where the matrices U_i(2), i = 1, ..., s, are defined by analogy to equation (21). This approximation again satisfies an upper bound property

$$\|\hat{x}\|^2_{S(q,2)} \le \|\hat{x}\|^2_{\tilde{S}(q,2)} \qquad \text{for all } \hat{x} \text{ and all } q \in \Omega_q \tag{28}$$

and further, it permits the computation of an approximate three-stage return function

$$\bar{V}[\hat{x}(N-3), q(N-3), 3] = \|\hat{x}(N-3)\|^2_{S[q(N-3),\,3]} + T[q(N-3), 3] \tag{29}$$

with the minimizing control

$$\bar{u}^*(N-3) = -G[q(N-3), 3]\,\hat{x}(N-3). \tag{30}$$

Since this three-stage return function is again


quadratic with the same non-linear weighting matrix, it is evident that the S̃ approximation may be applied once more to yield an approximate four-stage return function, and that repeated applications of the same procedure can be used to obtain a solution for the entire N-stage adaptive control process. The computations required for such a solution are summarized by the following recursive equations:

Solve backwards for k = N, N−1, ..., 1:

$$Q_i(k) = [I - \bar{K}\bar{C}_i(k)]^{-T}[W_i(k) + U_i(N-k)][I - \bar{K}\bar{C}_i(k)]^{-1}, \qquad i = 1, 2, \ldots, s \tag{31}$$

$$\bar{R}(q, k) = \sum_{i=1}^{s} q_i R(\alpha_i, k) \tag{32}$$

$$\bar{Q}(q, k) = \sum_{i=1}^{s} q_i Q_i(k) \tag{33}$$

$$G(q, N-k+1) = [\bar{B}^T\bar{Q}(q, k)\bar{B} + \bar{R}(q, k)]^{-1}\bar{B}^T\bar{Q}(q, k)\bar{A} \tag{34}$$

$$S(q, N-k+1) = \bar{A}^T\bar{Q}(q, k)[\bar{A} - \bar{B}G(q, N-k+1)] \tag{35}$$

$$\begin{aligned} \frac{\partial f_{N-k+1}}{\partial q_i}\bigg|_{q(0)} ={}& \bar{A}^T[Q_i(k) - Q_s(k)]\bar{A} \\ &- \bar{A}^T[Q_i(k) - Q_s(k)]\bar{B}\,G[q(0), N-k+1] \\ &- G^T[q(0), N-k+1]\bar{B}^T[Q_i(k) - Q_s(k)]\bar{A} \\ &+ G^T[q(0), N-k+1]\{\bar{B}^T[Q_i(k) - Q_s(k)]\bar{B} + R(\alpha_i, k) - R(\alpha_s, k)\}G[q(0), N-k+1], \\ & \qquad i = 1, 2, \ldots, s-1 \end{aligned} \tag{36}$$

$$U_s(N-k+1) = S[q(0), N-k+1] - \sum_{i=1}^{s-1} q_i(0)\frac{\partial f_{N-k+1}}{\partial q_i}\bigg|_{q(0)} \tag{37}$$

$$U_i(N-k+1) = U_s(N-k+1) + \frac{\partial f_{N-k+1}}{\partial q_i}\bigg|_{q(0)}, \qquad i = 1, 2, \ldots, s-1 \tag{38}$$

with initial conditions U_i(0), i = 1, 2, ..., s.

These recursive equations define the desired sequence of (m × ns) feedback gain matrices G(·), as well as a sequence of weighting matrices U_i(·), i = 1, 2, ..., s, each of dimension (ns × ns). The equations can be solved entirely off-line. They require a computational effort roughly equivalent to solving linear-quadratic control problems for s separate ns-dimensional systems, recalling that n is the order of system (1) and s is the total number of values which the vector α may assume. Again, detailed derivations of the recursion equations are available in Ref. [12]. The needed definitions of the known matrices Ā = Ā(k−1), B̄ = B̄(k−1), K̄ = K̄(k), and C̄_i(k), W_i(k) are found in the Appendix, equations (A.4), (A.5), (A.6), and (A.10), (A.11), respectively.
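A compact sketch of this backward recursion, for an invented problem with s = 2 and ns = 2 (so the matrices of (A.4)-(A.6), (A.10), and (A.11) are replaced by hypothetical stand-ins), might look as follows; it is meant only to show how (31)-(38) chain together off-line, with all quantities evaluated at the fixed a priori distribution q(0):

```python
import numpy as np

# Invented problem data with s = 2 and ns = 2 (n = 1): these matrices stand
# in for the quantities defined in (A.4)-(A.6), (A.10), and (A.11).
s, N = 2, 10
Abar = np.array([[0.9, 0.0], [0.0, 0.7]])        # A-bar, block diagonal
Bbar = np.array([[0.1], [0.1]])                  # B-bar, stacked
Kbar = np.array([[0.5], [0.4]])                  # K-bar, stacked Kalman gains
Cbar = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]   # C-bar_i
W = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]   # W_i: Q(alpha_i) in ith block
R = [np.array([[1.0]]), np.array([[2.0]])]       # R(alpha_i, k), constant in k
q0 = np.array([0.5, 0.5])                        # a priori distribution q(0)

U = [np.zeros((2, 2)) for _ in range(s)]         # initial conditions U_i(0)
gains = []
for k in range(N, 0, -1):
    Minv = [np.linalg.inv(np.eye(2) - Kbar @ Cbar[i]) for i in range(s)]
    Q = [Minv[i].T @ (W[i] + U[i]) @ Minv[i] for i in range(s)]      # (31)
    Rbar = sum(q0[i] * R[i] for i in range(s))                       # (32)
    Qbar = sum(q0[i] * Q[i] for i in range(s))                       # (33)
    G = np.linalg.solve(Bbar.T @ Qbar @ Bbar + Rbar,
                        Bbar.T @ Qbar @ Abar)                        # (34)
    S = Abar.T @ Qbar @ (Abar - Bbar @ G)                            # (35)
    df = []                                       # (36), for i = 1 .. s-1
    for i in range(s - 1):
        dQ, dR = Q[i] - Q[s - 1], R[i] - R[s - 1]
        df.append(Abar.T @ dQ @ Abar
                  - Abar.T @ dQ @ Bbar @ G
                  - G.T @ Bbar.T @ dQ @ Abar
                  + G.T @ (Bbar.T @ dQ @ Bbar + dR) @ G)
    Us = S - sum(q0[i] * df[i] for i in range(s - 1))                # (37)
    U = [df[i] + Us for i in range(s - 1)] + [Us]                    # (38)
    gains.append(G)               # G evaluated at q(0) for stage N-k+1
```

In the paper the stage-dependent matrices Ā(k−1), B̄(k−1), K̄(k) would be recomputed at each step; here they are held constant purely to keep the sketch short.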

Repeated application of the S̃ approximation thus yields a closed-form approximate solution of the adaptive control problem formulated by equations (1)-(7). This solution can be readily interpreted in the form of a closed-loop adaptive control system.

4. THE RESULTING ADAPTIVE CONTROL SYSTEM

As shown in the derivations above, the adaptive controller must generate control signals ū*(k) defined by

$$\bar{u}^*(k) = -G[q(k), N-k]\,\hat{x}(k), \qquad k = 0, 1, \ldots, N-1.$$

The controls are thus functions of the "sufficient coordinates" x̂(k), q(k) and of the matrices G(q, k). Expressions for the feedback matrices can, of course, be obtained entirely off-line by solving equations (31)-(38) recursively. The sufficient coordinates, on the other hand, must be computed on-line by the adaptive controller itself. The computation of x̂(k) may be interpreted as "state estimation", which can be performed by the simultaneous operation of s Kalman-Bucy filters [9] [equations (A.13)-(A.16)], and the computation of q(k) may be interpreted as "parameter identification", which can be performed by the application of Bayes' theorem [8]

$$q_i(k+1) = \frac{p[y(k+1) \mid \alpha = \alpha_i, y^k]\, q_i(k)}{\displaystyle\sum_{j=1}^{s} p[y(k+1) \mid \alpha = \alpha_j, y^k]\, q_j(k)}, \qquad i = 1, 2, \ldots, s \tag{39}$$

where p[y(k+1) | α = α_i, y^k] is the probability density function of the (k+1)-th measurement conditioned on α = α_i and y^k. With these interpretations, the resulting closed-loop adaptive control system has the form shown in Fig. 1. It is important to observe that the apparent separation of the state estimation and parameter identification functions of this controller is not a consequence of an a priori assumption [4, 14], but rather that it is a consequence of the approximation method used to solve the recursive dynamic programming equations. A further consequence of this method is the fact that the feedback matrices G(q, k) are intimately related to both the state estimation and parameter identification schemes.
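The identification step (39) can be sketched in a few lines. Under the linear-Gaussian model, p[y(k+1) | α_i, y^k] is a Gaussian density whose mean and covariance come from the ith filter's one-step prediction; the filter summaries below (mu_pred, M, C, D) are hypothetical stand-ins, not values from the paper:

```python
import numpy as np

# Sketch of the on-line parameter identification step (39).
def gaussian_pdf(y, mean, cov):
    r = np.atleast_1d(y) - np.atleast_1d(mean)
    return (np.exp(-0.5 * r @ np.linalg.solve(cov, r))
            / np.sqrt((2 * np.pi) ** r.size * np.linalg.det(cov)))

def bayes_update(q, y, filters):
    """q_i(k+1) proportional to p[y(k+1) | alpha_i, y^k] q_i(k) -- eq. (39)."""
    lik = np.array([gaussian_pdf(y, f["C"] @ f["mu_pred"],
                                 f["C"] @ f["M"] @ f["C"].T + f["D"] @ f["D"].T)
                    for f in filters])
    post = lik * q
    return post / post.sum()

# two hypothetical one-step-ahead filter summaries and a new measurement
filters = [
    dict(C=np.array([[1.0, 0.0]]), D=np.array([[0.316]]),
         mu_pred=np.array([0.0, 0.0]), M=np.eye(2)),
    dict(C=np.array([[1.0, 0.0]]), D=np.array([[0.316]]),
         mu_pred=np.array([2.0, 0.0]), M=np.eye(2)),
]
q = bayes_update(np.array([0.5, 0.5]), np.array([0.1]), filters)
# the measurement 0.1 is far more likely under the first model, so q[0] > q[1]
```

Each call consumes the bank of filter predictions for the same measurement, so the state estimation and parameter identification loops naturally share their intermediate quantities, as noted above.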


FIG. 1. The adaptive control system.

5. AN EXAMPLE

The parameter-adaptive control technique presented above is now illustrated with the following simple example. Let the system be described by the discretized version of a continuous-time stochastic second-order differential equation with a known natural frequency ω_n = 1 rad/sec and with an unknown parameter vector α^T = (δ, σ), where δ is the damping ratio of the system and σ is the standard deviation of the stochastic disturbance. Let the sampling period Δ of the discretization be 0.1 and let the parameter set Ω_α, the performance index J, and the a priori data be given as follows:

(40)

$$y(k+1) = (1,\ 0)\,x(k+1) + 0.316\,\eta(k+1) \tag{41}$$

$$k = 0, 1, \ldots, 49, \qquad i = 1, 2, \ldots, s.$$

The control gains G(q, k), k = 1, ..., 50, for this example were computed for several sets of values of δ₁ and δ₂. Using these gains and a "true" system corresponding to the index i = 1, the 50-stage adaptive control process was simulated on a CDC 6500 digital computer. The average cost of 100 simulation runs was then used to compare the performance of the adaptive control technique presented here with the performance of two other controllers: (1) the optimal stochastic controller computed for a plant with known parameter values corresponding to i = 1, and (2) the optimal stochastic controller computed for a plant with known parameter values corresponding to i = 2. Note that neither of these controllers is optimal for the adaptive control problem formulated in section 2. The true optimal solution for this problem is, of course, not available. The two controllers do, however, provide a meaningful standard of comparison. They are the controllers which would be obtained if an a priori decision were made about the value of the parameter vector α^T = (δ, σ). Controller (1) for i = 1 corresponds to the correct decision and controller (2) for i = 2 corresponds to the incorrect decision.

A typical comparison between the adaptive control system and the two controllers above is given in Table 1.

TABLE 1. COMPARISON OF CONTROLLERS

δ₁, δ₂        Cost using optimal      Cost of adaptive     Cost using optimal
              controller for i = 1    control process      controller for i = 2
0.1,  0.9     571                     587                  958
0.25, 0.75    447                     455                  536
0.40, 0.60    357                     358                  367

Additional comparisons can be found in Figs. 2 and 3. Figure 2 compares the per-stage costs of the three processes represented in

FIG. 2. Per-stage costs. (Curves: optimal control, i = 1; optimal control, i = 2; adaptive control.)


FIG. 3. Phase-plane trajectories.

the first row of Table 1, while Fig. 3 compares their phase-plane trajectories. (Per-stage costs are defined to be the individual summands of the performance index J.) Again, both figures were obtained by averaging 100 separate simulation runs. To conserve space, the behavior of the a posteriori probabilities q(k), k = 0, 1, ..., 50, associated with the adaptive control process is not shown. It is sufficient to state that these probabilities exhibit well-behaved convergence properties from the a priori values q^T(0) = (0.5, 0.5) toward q^T(∞) = (1, 0).

The comparisons of Table 1 and Figs. 2 and 3 all indicate that the proposed parameter-adaptive control scheme represents a promising approach to the solution of appropriately formulated control problems.
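The averaging procedure behind such comparisons can be sketched as follows; the scalar plant, the two gains, and the horizon are hypothetical stand-ins rather than the discretized second-order system of the example:

```python
import numpy as np

# Monte Carlo comparison in the spirit of Table 1: average the realized
# quadratic cost of a closed-loop run over many simulations.
rng = np.random.default_rng(1)

def run_cost(gain, a=0.95, b=0.1, sigma=0.5, N=50):
    x, cost = 1.0, 0.0
    for _ in range(N):
        u = -gain * x                 # linear feedback, as in (14)/(24)
        x = a * x + b * u + sigma * rng.standard_normal()
        cost += x * x + u * u         # per-stage summand of J
    return cost

def average_cost(gain, runs=100):
    return float(np.mean([run_cost(gain) for _ in range(runs)]))

well_tuned = average_cost(0.6)   # near the LQ gain for this invented plant
detuned = average_cost(4.0)      # an overly aggressive "wrong decision" gain
```

Averaging over independent runs is what makes the per-stage cost curves and the table entries comparable across controllers, since any single realization is dominated by the particular noise sequence drawn.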

6. CONCLUSIONS

This paper has presented an approximation technique for the closed-form solution of the functional equation of dynamic programming associated with a particular class of linear, parameter-adaptive control problems. The method leads to a simple and intuitively appealing adaptive controller whose performance, at least in the example presented, appears quite promising. Many questions, however, remain unresolved. For example, the control processes considered here are limited to finite duration. This restriction eliminates the need to consider the convergence question of the approximate solution of the dynamic programming equation [3]. It is clear, however, that if convergence is indeed obtained, then the storage requirement associated with the gains G(q, k), k = 1, 2, ..., N, can be significantly reduced by storing only G(q, ∞) for an infinite-time adaptive control process. Another interesting question concerns the fact that the linearization procedure makes G(q, k) an implicit function of the a priori distribution q(0). It would seem appropriate, therefore, to recompute the feedback gains occasionally as the control process evolves and as a posteriori probabilities become available about which to relinearize. Of course, this procedure will require additional on-line computational capability. Finally, a very important practical and theoretical question concerns the "closeness" of the approximate solution. Is it possible to evaluate analytically the performance losses associated with the proposed adaptive control system? An answer to this question, which significantly strengthens the theoretical value of the proposed controller, was obtained only very recently and is briefly discussed in Appendix B.

REFERENCES

[1] M. AOKI: Optimization of Stochastic Systems. Academic Press, New York (1967).

[2] R. BELLMAN: Adaptive Control Processes. Princeton University Press (1961).

[3] R. BELLMAN and R. KALABA: Dynamic Programming and Modern Control Theory. Academic Press, New York (1965).

[4] J. B. FARISON, R. E. GRAHAM and R. C. SHELTON: Identification and control of linear discrete systems. IEEE Trans. Aut. Control AC-12, 438-442 (1967).

[5] A. A. FEL'DBAUM: Theory of dual control, I, II, III, IV. Automn remote Control 21, 1240-1249, 1453-1464 (1960); 22, 3-16, 129-143 (1961).

[6] A. A. FEL'DBAUM: Optimal Control Systems. Academic Press, New York (1965).

[7] G. HADLEY: Linear Algebra. Addison-Wesley Publishing Company, Reading, Massachusetts (1964).

[8] Y. C. HO and R. C. K. LEE: A Bayesian approach to problems in stochastic estimation and control. IEEE Trans. Aut. Control AC-9, 333-339 (1964).

[9] R. E. KALMAN and R. S. BUCY: New results in linear filtering and prediction theory. J. bas. Engng 83, 95-108 (1961).

[10] R. E. LARSON: A survey of dynamic programming procedures. IEEE Trans. Aut. Control AC-12, 767-774 (1967).

[11] D. SWORDER: Optimal Adaptive Control Systems. Academic Press, New York (1966).

[12] G. STEIN: An Approach to the Parameter-Adaptive Control Problem. Ph.D. Thesis, Purdue University, West Lafayette, Indiana, January (1969).

[13] J. T. TOU: System optimization via learning and adaptation. Int. J. Control 2, (1965).

[14] YA. Z. TSYPKIN: Adaptation, training and self-organization in automatic systems. Automn remote Control 27, No. 1 (1966).

[15] W. M. WONHAM: Stochastic Problems in Optimal Control. RIAS Technical Report 63-14 (May 1963).

APPENDIX A

Definitions of non-linear functions of a posteriori distributions in cost and control

Matrices for the one-stage adaptive control process are

$$G(q, 1) = [\bar{B}^T\bar{Q}(q, N)\bar{B} + \bar{R}(q, N)]^{-1}\bar{B}^T\bar{Q}(q, N)\bar{A} \tag{A.1}$$


$$S(q, 1) = \bar{A}^T\bar{Q}(q, N)[\bar{A} - \bar{B}G(q, 1)] \tag{A.2}$$

$$T(q, 1) = \sum_{i=1}^{s} q_i T_i(1) \tag{A.3}$$

where

$$\bar{A} = \bar{A}(N-1) = \operatorname{diag}\{[I - K(\alpha_i, N)C(\alpha_i, N)]A(\alpha_i, N-1)\} \tag{A.4}$$

$$\bar{B} = \bar{B}(N-1) = \operatorname{column}\{[I - K(\alpha_i, N)C(\alpha_i, N)]B(\alpha_i, N-1)\} \tag{A.5}$$

$$\bar{K} = \bar{K}(N) = \operatorname{column}\{K(\alpha_i, N)\} \tag{A.6}$$

$$\bar{R}(q, N) = \sum_{i=1}^{s} q_i R(\alpha_i, N) \tag{A.7}$$

$$\bar{Q}(q, N) = \sum_{i=1}^{s} q_i Q_i(N) \tag{A.8}$$

$$Q_i(N) = [I - \bar{K}\bar{C}_i(N)]^{-T} W_i(N)[I - \bar{K}\bar{C}_i(N)]^{-1}, \qquad i = 1, \ldots, s \tag{A.9}$$

$$\bar{C}_i(N) = [0\ \cdots\ C(\alpha_i, N)\ \cdots\ 0] \quad (i\text{th partition}), \qquad i = 1, \ldots, s \tag{A.10}$$

$$W_i(N) = \operatorname{diag}\{0, \ldots, 0, Q(\alpha_i, N), 0, \ldots, 0\} \quad (i\text{th diagonal partition}), \qquad i = 1, \ldots, s \tag{A.11}$$

$$T_i(1) = \operatorname{Trace}[Q(\alpha_i, N)P(\alpha_i, N)] + \operatorname{Trace}[W_i(N)\bar{K}(CMC^T + DD^T)\bar{K}^T], \qquad i = 1, \ldots, s \tag{A.12}$$

and where the matrices K, P, and M are obtained from the solution of s Kalman-Bucy filter equations [9]:

$$\mu(\alpha_i, k+1) = (I - KC)[A\mu(\alpha_i, k) + Bu(k)] + Ky(k+1), \qquad \mu(\alpha_i, 0) = \mu(0) \tag{A.13}$$

$$K = P(\alpha_i, k+1)C^T(DD^T)^{-1} \tag{A.14}$$

$$P(\alpha_i, k+1) = M - MC^T[CMC^T + DD^T]^{-1}CM \tag{A.15}$$

$$M = AP(\alpha_i, k)A^T + FF^T, \qquad P(\alpha_i, 0) = P(0), \qquad i = 1, 2, \ldots, s, \quad k = 0, 1, \ldots, N-1. \tag{A.16}$$

In equations (A.12)-(A.16) the suppressed α and time dependence is given by

K = K(α_i, k+1), C = C(α_i, k+1), M = M(α_i, k+1), D = D(α_i, k+1), A = A(α_i, k), B = B(α_i, k), F = F(α_i, k).

APPENDIX B

"Closeness" of the approximate solution

Whenever approximation methods are employed to solve engineering problems, it is of great practical and theoretical interest to evaluate the error magnitudes incurred in the final solution. For the case of the adaptive control system proposed in this paper, such evaluations have been obtained in Ref. [12] and are briefly described in this appendix.

To begin with, it can be readily shown, by using the upper bound property of the S̃-approximation, that the approximate solution V̄ of the dynamic programming equation represents an upper bound on the optimal adaptive control cost V. That is,

$$\bar{V}(\hat{x}, q, k) \ge V(\hat{x}, q, k) \tag{B.1}$$

for all state estimates x̂, all probability distributions q, and all stage indices k = 0, 1, ..., N. Moreover, the approximate solution V̄ also bounds the actual cost incurred by using the approximate adaptive control signals ū*(k), k = 0, 1, ..., N−1. This is established as follows: Let V^{ac}(x̂, q, k) be the actual operating cost realized by the approximate controller and assume that for some stage index k

$$\bar{V}(\hat{x}, q, k) \ge V^{ac}(\hat{x}, q, k). \tag{B.2}$$

Then it follows from the S̃-approximation and from (B.2) that

$$\bar{V}(\hat{x}, q, k+1) \ge E\{L_{N-k} + \bar{V}(\hat{x}', q', k)\}\big|_{u(N-k-1) = \bar{u}^*(N-k-1)} \ge E\{L_{N-k} + V^{ac}(\hat{x}', q', k)\}\big|_{u = \bar{u}^*} \triangleq V^{ac}(\hat{x}, q, k+1), \tag{B.3}$$


where L_{N−k} is the (N−k)-th summand in the performance index (6). Since V̄(x̂, q, 0) = V^{ac}(x̂, q, 0) = 0, an inductive argument now yields

$$\bar{V}(\hat{x}, q, k) \ge V^{ac}(\hat{x}, q, k) \ge V(\hat{x}, q, k). \tag{B.4}$$

Equation (B.4) represents upper and lower bounds on the actual operating cost of the proposed adaptive control system. Because V is an unknown quantity, however, these bounds are of limited utility as performance measures of the controller. A much more useful set of bounds has been obtained by replacing V(x̂, q, k) with a known lower bound V^l(x̂, q, k) on the optimal cost. Such a bound is derived in Ref. [12]. Intuitively, it represents the optimal cost of the control process with known values of the parameter α, averaged over the initial parameter distribution. With this bound, equation (B.4) becomes

$$\bar{V}(\hat{x}, q, k) \ge V^{ac}(\hat{x}, q, k) \ge V(\hat{x}, q, k) \ge V^{l}(\hat{x}, q, k) \tag{B.5}$$

for all x̂, q, and k. These inequalities represent an effective performance measure for the proposed adaptive controller, one which can be evaluated analytically without extensive simulation studies or actual system operation.
