A parameter-adaptive control technique


Automatica, Vol. 5, pp. 731-739. Pergamon Press, 1969. Printed in Great Britain.

A Parameter-Adaptive Control Technique*

G. STEIN† and G. N. SARIDIS‡

An approximate solution of the functional equation of dynamic programming has been used to develop a simple adaptive controller for linear stochastic systems with unknown parameters.

Summary — Control of linear stochastic systems with unknown parameters is accomplished by means of an approximate solution of the associated functional equation of dynamic programming. The approximation is based on repeated linearizations of a quadratic weighting matrix appearing in the optimal cost function for the control process. This procedure leads to an adaptive control system which is linear in an expanded vector of state estimates, with feedback gains which are explicit functions of a posteriori parameter probabilities. The performance of this controller is illustrated with a simple example.

1. INTRODUCTION

For some years now the dual control formulation [5, 6] and the dynamic programming formulation [2, 3] for the so-called "optimal adaptive control problem" have been available to solve control problems involving certain unknown quantities, such as parameters of the system's mathematical model, parameters of the statistical descriptions of various disturbances affecting the system, or entire functional relationships involved in the mathematical representation of the control problem. Various efforts have been made to utilize these formulations and to modify them for numerous special circumstances [1, 11, 13]. Only limited success, however, has been achieved in dealing with the significant analytical complexities and computational burdens associated with both formulations.

This paper considers a special case of the optimal adaptive control problem for which it is possible to exploit a simple approximation technique to obtain an approximate solution of the functional equation associated with the dynamic programming formulation. The adaptive control problem itself is formulated in section 2 of the paper, followed by a discussion of the approximation technique in section 3 and the resulting adaptive control system in section 4. The solution is then illustrated with a simple example in section 5.

* Received 14 February 1969; revised 26 May 1969. The original version of this paper was presented at the 4th IFAC Congress which was held in Warsaw, Poland during June 1969. It was recommended for publication in revised form by associate editor P. Dorato.

† Honeywell Inc., Systems & Research Div., Research Dept., 2345 Walnut Street, St. Paul, Minnesota 55113, USA.

‡ Purdue University, Department of Electrical Engineering, West Lafayette, Indiana, USA.

2. A FORMULATION OF THE ADAPTIVE CONTROL PROBLEM

Let the system be described by the following linear, discrete-time, stochastic model,

    x(k+1) = A(α, k) x(k) + B(α, k) u(k) + F(α, k) ξ(k),
    k = 0, 1, ..., N−1,  α ∈ Ω_α = {α_1, α_2, ..., α_s}    (1)

with the measurement equation

    y(k) = C(α, k) x(k) + D(α, k) η(k),  k = 1, 2, ..., N.    (2)
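Before the formulation continues, the structure of (1)-(2) can be made concrete with a small simulation: the parameter α is drawn once from a finite set according to a prior q(0) and then held fixed while the state and measurement equations are iterated. This is only an illustrative sketch; the dimensions, matrices, and prior below are assumptions chosen for the example, not values from the paper.

```python
# Illustrative sketch of the model (1)-(2): a linear stochastic system whose
# matrices depend on an unknown parameter alpha drawn from a finite set.
import numpy as np

rng = np.random.default_rng(0)

# Finite parameter set {alpha_1, ..., alpha_s}, here encoded by two candidate A matrices.
A_candidates = [np.array([[1.0, 0.1], [0.0, 0.9]]),
                np.array([[1.0, 0.1], [0.0, 1.1]])]
B = np.array([[0.0], [0.1]])      # B(alpha, k), taken alpha- and time-invariant here
F = 0.05 * np.eye(2)              # F(alpha, k)
C = np.array([[1.0, 0.0]])        # C(alpha, k)
D = np.array([[0.1]])             # D(alpha, k)

def simulate(N, u_seq, q0=(0.5, 0.5)):
    """Draw alpha from the prior q(0), then run (1)-(2) for N steps."""
    alpha = rng.choice(len(A_candidates), p=q0)   # unknown but constant parameter
    A = A_candidates[alpha]
    x = rng.multivariate_normal(np.zeros(2), 0.1 * np.eye(2))  # x(0) ~ N(mu(0), P(0))
    ys = []
    for k in range(N):
        xi = rng.standard_normal(2)               # xi(k) ~ N(0, I)
        eta = rng.standard_normal(1)              # eta(k+1) ~ N(0, I)
        x = A @ x + B @ u_seq[k] + F @ xi         # state equation (1)
        ys.append(C @ x + D @ eta)                # measurement equation (2)
    return alpha, ys

alpha_true, measurements = simulate(N=20, u_seq=[np.zeros(1)] * 20)
```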
The vector x(k) is an n-vector of state variables defined at time instant t_k, u(k) is an unconstrained m-vector of control inputs, and y(k) is an r-vector of measured outputs. The r_1-vectors ξ(k) and the r_2-vectors η(k) form two independent sequences of independent, identically distributed Gaussian random vectors:

    ξ(k) ~ N(0, I_{r_1}),  I_{r_1} = E{ξ(k) ξ^T(k)},  k = 0, 1, ..., N−1
    η(k) ~ N(0, I_{r_2}),  I_{r_2} = E{η(k) η^T(k)},  k = 1, 2, ..., N.    (3)

Similarly, the system's initial state x(0) is assumed to be a Gaussian distributed random vector:

    x(0) ~ N[μ(0), P(0)],  P(0) = E{[x(0) − μ(0)][x(0) − μ(0)]^T}.    (4)

The quantities A(α, k), B(α, k), F(α, k), C(α, k) and D(α, k) are matrices of appropriate dimensions whose elements are arbitrary but known functions of the time index k and of the l-vector α. The vector α consists of unknown system parameters. It is assumed to belong to the finite set Ω_α and to be constant on the control interval k = 0, 1, ..., N.

The adaptive control problem for this system consists of finding a sequence of control inputs {u(k), k = 0, 1, 2, ..., N−1} as functions of the available measurements,

    u(k) = f_k(y^k),  y^k = {y(1), y(2), ..., y(k)},  k = 0, 1, ..., N−1    (5)

such that the following average cost function is minimized:

    E{ Σ_{k=0}^{N−1} [ ‖x(k+1)‖²_{Q(α,k+1)} + ‖u(k)‖²_{R(α,k)} ] }.    (6)

The symmetric matrices Q(α, k) and R(α, k) represent the relative weights to be placed upon various components of state and control deviations. Their dependence on α is included to reflect the empirical fact that quadratic weights are not chosen a priori, but rather are chosen to suit the particular plant and the designer's overall conception of satisfactory performance.

The following additional assumptions are made:

(i) D(α, k) D^T(α, k) > 0
(ii) Q(α, k) = Q^T(α, k) ≥ 0      for all α ∈ Ω_α and k = 1, 2, ..., N    (7)
(iii) R(α, k) = R^T(α, k) > 0
(iv) an a priori discrete probability distribution function q(0) for the vector α is available, where q(0) is an s-vector with components 0 ≤ q_i(0) = Prob[α = α_i] ≤ 1, i = 1, 2, ..., s, satisfying Σ_{i=1}^{s} q_i(0) = 1.

Since a feedback control of the form (5) is desired, the method of dynamic programming will be used to minimize criterion (6).

Define the "optimal return function":

    V(y^k, N−k) ≜ cost of an (N−k)-stage adaptive control process using the optimal control sequence {u*(k), u*(k+1), ..., u*(N−1)}, based upon the a priori information (4) and (7 iv) and upon the measurement sequence y^k = {y(1), y(2), ..., y(k)}.

Applying the "Principle of Optimality" [2], the optimal return function obeys the following recursive functional equation:

    V(y^k, N−k) = min_{u(k)} E{ ‖x(k+1)‖²_Q + ‖u(k)‖²_R + V(y^{k+1}, N−k−1) | y^k }    (8)

with V(y^N, 0) ≡ 0 (with probability one).

In this equation, E{... | y^k} denotes the mathematical expectation conditioned on the sequence y^k and on the a priori data (4) and (7 iv). The dependence of Q and R upon the parameters α and the time index k has been suppressed. As a matter of convenience, this practice is continued in all subsequent derivations.

Equation (8) may be solved backwards, starting with a one-stage process:

    V(y^{N−1}, 1) = min_{u(N−1)} E{ ‖x(N)‖²_Q + ‖u(N−1)‖²_R | y^{N−1} }.    (9)
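For orientation, the following sketch solves the backward recursion (8) in the degenerate case where Ω_α contains a single element, so that there is nothing to adapt and (8) reduces to the familiar discrete-time linear-quadratic recursion; the adaptive development that follows generalizes exactly this structure. All matrices and the horizon below are illustrative assumptions.

```python
# Sketch of solving (8) backwards in the single-parameter case: the return
# function stays quadratic, V = x' S x + const, and the minimization gives
# the usual discrete-time LQ recursion.  Matrices are illustrative only.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)                  # weight on x(k+1)
R = np.array([[1.0]])          # weight on u(k)
N = 10

S = np.zeros((2, 2))           # terminal condition: V(y^N, 0) = 0
gains = []
for k in range(N - 1, -1, -1):
    W = Q + S                                             # weight on x(k+1) inside (8)
    G = np.linalg.solve(B.T @ W @ B + R, B.T @ W @ A)     # u*(k) = -G x(k)
    S = A.T @ W @ (A - B @ G)                             # weight for the next stage back
    gains.append(G)
gains.reverse()                # gains[k] is the feedback gain at stage k
```

The pair of updates for G and S in this loop is the pattern that reappears, weighted by the a posteriori probabilities, in equations (34)-(35) below.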
The conditional expectation in equation (9) can be expressed as

    E{(...) | y^{N−1}} = Σ_{i=1}^{s} Prob[α = α_i | y^{N−1}] E{(...) | α = α_i, y^{N−1}}
                       = Σ_{i=1}^{s} q_i(N−1) E{(...) | α = α_i, y^{N−1}}    (10)

where q_i(N−1), i = 1, 2, ..., s, is the a posteriori probability distribution of the parameter vector α based upon the measurements y^{N−1}. So (9) becomes

    V(y^{N−1}, 1) = min_{u(N−1)} Σ_{i=1}^{s} q_i(N−1) E{ ‖x(N)‖²_Q + ‖u(N−1)‖²_R | α = α_i, y^{N−1} }.    (11)

Replacing x(N) by the system equation (1) for each value α = α_i and carrying out the expectations and minimization, it can now be readily verified [12] that V(y^{N−1}, 1) is quadratic and that the corresponding optimal control is linear in terms of the following expanded vector of state estimates:

    x̂(k) ≜ [μ^T(α_1, k), μ^T(α_2, k), ..., μ^T(α_s, k)]^T    (12)

where

    μ(α_i, k) ≜ E{x(k) | α = α_i, y^k},  i = 1, 2, ..., s.

The one-stage cost and control are

    V(y^{N−1}, 1) = ‖x̂(N−1)‖²_{S[q(N−1), 1]} + T[q(N−1), 1]    (13)

    u*(N−1) = −G[q(N−1), 1] x̂(N−1)    (14)

where the matrices S and G and the scalar T are non-linear functions of the a posteriori distribution q_i(N−1), i = 1, 2, ..., s. These are defined in the Appendix, equations (A.1), (A.2), and (A.3).

It is now evident that the vectors x̂(k) and q(k) constitute a set of "sufficient coordinates" [15] for the adaptive control problem formulated in equations (1)-(7). The optimal return function can be expressed as

    V(y^k, N−k) = V[x̂(k), q(k), N−k]

and the functional equation (8) becomes

    V[x̂(k), q(k), N−k] = min_{u(k)} Σ_{i=1}^{s} q_i(k) E{ ‖x(k+1)‖²_Q + ‖u(k)‖²_R + V[x̂(k+1), q(k+1), N−k−1] | α = α_i, x̂(k) }    (15)

with V[x̂(N), q(N), 0] ≡ 0 (with probability one). The existence of sufficient coordinates reduces the dependence of V(...) upon a growing number of variables (y^k) to a dependence upon a constant and finite number of variables [x̂(k), q(k)].

Equation (15) can be used to continue the dynamic programming solution, starting with the quadratic return function V[x̂(N−1), q(N−1), 1] of (13). As defined by equation (A.2), however, the matrix S[q(N−1), 1] is a non-linear function of the a posteriori distribution q_i(N−1), i = 1, 2, ..., s. This fact prevents the successful completion of the solution in closed form. The function V[x̂(N−2), q(N−2), 2] and all subsequent optimal return functions are no longer expressible in terms of quadratics or in terms of other similarly convenient functional forms.

The solution of (15) must therefore be obtained by numerical techniques [10] or by approximation methods. Because the computing time and memory requirements of numerical solutions are prohibitive for all but the simplest problems, the following discussion deals with an approximation method which is based upon a very intuitive and appealing linearization technique.
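The sufficient coordinates [x̂(k), q(k)] can be generated recursively: one conditional-mean filter per candidate parameter α_i produces μ(α_i, k), and the a posteriori probabilities q_i(k) are updated from each filter's measurement likelihood, as in (10). The paper defers its own estimator and probability-update expressions to the Appendix, which is not part of this transcript; the sketch below therefore uses the standard Kalman-filter and Bayes-rule forms for a bank of linear-Gaussian models and should be read as an assumption about those details.

```python
import numpy as np

def mm_filter_step(mus, Ps, q, u, y, models):
    """One recursion of the sufficient coordinates [x_hat(k), q(k)].

    mus[i], Ps[i] : conditional mean / covariance of x given alpha = alpha_i
    q[i]          : a posteriori probability of alpha_i
    models[i]     : dict with matrices A, B, F, C, D for alpha_i
    """
    new_mus, new_Ps, likes = [], [], []
    for mu, P, m in zip(mus, Ps, models):
        A, B, F, C, D = m["A"], m["B"], m["F"], m["C"], m["D"]
        # Time update under alpha_i (equation (1))
        mu_pred = A @ mu + B @ u
        P_pred = A @ P @ A.T + F @ F.T
        # Measurement update under alpha_i (equation (2))
        S = C @ P_pred @ C.T + D @ D.T              # innovation covariance
        K = P_pred @ C.T @ np.linalg.inv(S)          # Kalman gain
        innov = y - C @ mu_pred
        new_mus.append(mu_pred + K @ innov)
        new_Ps.append((np.eye(len(mu)) - K @ C) @ P_pred)
        # Gaussian likelihood of y under alpha_i, used for the Bayes update of q
        expo = -0.5 * innov @ np.linalg.solve(S, innov)
        likes.append(np.exp(expo) / np.sqrt(np.linalg.det(2 * np.pi * S)))
    q_new = q * np.array(likes)
    q_new /= q_new.sum()                              # a posteriori distribution q(k+1)
    return new_mus, new_Ps, q_new
```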
3. LINEARIZATION OF THE WEIGHTING MATRIX OF THE OPTIMAL RETURN FUNCTION

It has been pointed out that the optimal return function for a single stage of the adaptive control problem formulated above is quadratic in x̂(N−1), with a weighting matrix S[q(N−1), 1] which is a non-linear function of the a posteriori distribution q(N−1). That is,

    S[q(N−1), 1] = f_1[q_1(N−1), q_2(N−1), ..., q_{s−1}(N−1)]    (16)

where f_1(...) is the matrix-valued function of (s−1) independent variables defined by equations (A.1) and (A.2). The fact that f_1(...) has only (s−1) arguments is a consequence of the relation

    Σ_{i=1}^{s} q_i(k) = 1,  k = 0, 1, ..., N−1.    (17)

Let the matrix S̄(q, 1) be the matrix-valued "tangent plane" to the matrix S(q, 1) at the point q(0). This new matrix S̄ can be computed by considering S itself to be a matrix "surface" over an (s−1)-dimensional Euclidean space. That is,

    f_1(q_1, q_2, ..., q_{s−1}) − S = 0.    (18)

Then the "tangent plane" at the point q(0) is defined by

    Σ_{i=1}^{s−1} (∂f_1/∂q_i) [q_i − q_i(0)] − {S̄ − S[q(0), 1]} = 0,    (19)

where the partial derivatives are evaluated at q = q(0). Now define

    U_s(1) ≜ S[q(0), 1] − Σ_{i=1}^{s−1} (∂f_1/∂q_i) q_i(0).    (20)

Then the tangent plane becomes

    S̄(q, 1) = Σ_{i=1}^{s−1} (∂f_1/∂q_i) q_i + U_s(1).

Using (17), this expression reduces to the desired linear function,

    S̄(q, 1) = q_1 U_1(1) + q_2 U_2(1) + ... + q_s U_s(1)    (21)

with

    U_i(1) = ∂f_1/∂q_i + U_s(1),  i = 1, 2, ..., s−1.

The optimal return function of the one-stage adaptive control process (13) will now be approximated by replacing the weighting matrix S[q(N−1), 1] by the linearized matrix S̄[q(N−1), 1] defined in equations (20) and (21). Using this approximation, the return function of a two-stage adaptive control process can be obtained analytically from the following equation:

    V̄[x̂(N−2), q(N−2), 2] = min_{u(N−2)} Σ_{i=1}^{s} q_i(N−2) E{ ‖x(N−1)‖²_Q + ‖u(N−2)‖²_R + ‖x̂(N−1)‖²_{S̄[q(N−1), 1]} + T[q(N−1), 1] | α = α_i, x̂(N−2) }.    (22)

The resulting return function is again quadratic,

    V̄[x̂(N−2), q(N−2), 2] = ‖x̂(N−2)‖²_{S[q(N−2), 2]} + T[q(N−2), 2]    (23)

and the corresponding control is linear [12],

    ū*(N−2) = −G[q(N−2), 2] x̂(N−2).    (24)

The matrices S[q(N−2), 2] and G[q(N−2), 2] are non-linear functions of the a posteriori distribution q(N−2) which have exactly the same functional forms as the corresponding matrices of the one-stage return function.

The symbols V̄ and ū* in equations (22)-(24) are used to emphasize the fact that these quantities are no longer the optimal return function and the optimal control respectively, but rather that they depend upon the approximation of S[q(N−1), 1] by the linearized matrix S̄[q(N−1), 1]. Since this approximation is directly involved in the minimization indicated by equation (22), the return function V̄ of (23) and the control ū* of (24) have a meaningful interpretation only if an inequality of the type

    ‖x̂‖²_{S(q, 1)} ≤ ‖x̂‖²_{S̄(q, 1)}    (25)

can be established for all x̂ and all probability distributions q. V̄[x̂(N−2), q(N−2), 2] is then the minimum cost of a two-stage adaptive control process for which the "cost of the final stage is somewhat higher than the optimal cost". ū*(N−2) is the corresponding minimizing control. It is not valid, of course, to claim that ū*(N−2) is "close" to the optimal control signal u*(N−2) as a consequence of (25). However, the actual cost incurred by using ū*(N−2) will be less than or equal to V̄, which is computed with the right-hand side of (25) as the final-stage weight [12]. Therefore, if V̄ is close to V, then the control signal ū*, no matter how different from u*, will achieve nearly optimal cost.
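The tangent-plane construction (18)-(21) can be checked numerically: pick any matrix-valued f_1 of the free variables q_1, ..., q_{s−1}, form the partial derivatives at q(0), build U_s and U_i, and verify that S̄(q, 1) = Σ_i q_i U_i(1) reproduces f_1 at the linearization point. The f_1 used below is an arbitrary illustrative function, not the one defined by (A.1)-(A.2).

```python
# Numerical sketch of the tangent-plane linearization (18)-(21).
import numpy as np

s = 3
def f1(q_free):                      # q_free = (q1, ..., q_{s-1})
    q1, q2 = q_free
    q3 = 1.0 - q1 - q2               # relation (17)
    return np.array([[1 + q1 * q2, q3], [q3, 2 + q2 ** 2]])  # illustrative only

q0_free = np.array([0.3, 0.3])       # first s-1 entries of the a priori distribution q(0)
eps = 1e-6

# Partial derivatives of f1 at q(0), by central differences
dfs = []
for i in range(s - 1):
    d = np.zeros(s - 1); d[i] = eps
    dfs.append((f1(q0_free + d) - f1(q0_free - d)) / (2 * eps))

# Equation (20): U_s = S[q(0)] - sum_i (df1/dq_i) q_i(0)
U_s = f1(q0_free) - sum(df * qi for df, qi in zip(dfs, q0_free))
# Equation (21): U_i = df1/dq_i + U_s, and S_bar(q) = sum_i q_i U_i
U = [df + U_s for df in dfs] + [U_s]

def S_bar(q):                        # q is the full s-vector
    return sum(qi * Ui for qi, Ui in zip(q, U))

# At q(0) the tangent plane agrees with f1 itself:
q0_full = np.array([0.3, 0.3, 0.4])
assert np.allclose(S_bar(q0_full), f1(q0_free))
```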
</p><p>The inequality (25) is indeed satisfied as a con- sequence of the following property. </p><p>Upper bound property of the g approximation. For any fixed but arbitrary vector ~, the function 11211~(q, t), considered as a function of the s-vector </p><p>q, defines a supporting hyperplane [7] of the closed convex set f2 </p><p>^' z . qe~o ] n= {(z, q)[O</p></li><li><p>A parameter-adaptive control technique 735 </p><p>quadratic with the same non-linear weighting matrix, it is evident that the ~ approximation may be applied once more to yield an approximate four- stage return function and that repeated applica- tions of the same procedure can be used to obtain a solution for the entire N-stage adaptive control process. The computations required for such a solution are summarized by the following recursive equations: </p><p>Solve backwards for k = N, N - I . . . . . 1 </p><p>Q,(k) = [ I - KC,(k)]- ~*[ W,(k) </p><p>+ U,(N- k)] [ I - KC,(k)]-' i=1,2 . . . . . s (31) </p><p>R(q, k)= ~ qiR(ai, k) (32) i=1 </p><p>Q.(q, k)= ~ q,Q,(k) (33) i=1 </p><p>G(q, N - k + 1) = [BTO.(q, k)B </p><p>+.~(q, k)] - ~ BTQ(q, k)A (34) </p><p>S(q, N - k + 1) = ~rQ(q, k) [A - BG(q, N - k + 1)] (35) </p><p>~fN-k+q ~qi q(o) </p><p>=AT[Q,(k - Qs(k)]A </p><p>- ~T[Q~(k)- Qs(k)]Ba(q(O), N - k + 1) </p><p>- Gr(q(0), N- k + 1)BT[Q,(k)- O~(k)].4 </p><p>+ GT[q(O), N - k + 1] {BT[Q,(k)- Q~(k)] B </p><p>+ n(ct i, k)-R(a~, k)}G[q(O), N -k+ 1] </p><p>i=1, 2 . . . . . s -1 (36) </p><p>U, (N-k + I)=S[q(O), N-k+l ] </p><p>-~ ' qi(0)O~] (37) ~= i cqi [q(o) </p><p>V, (N-k + l)= U~(N-k + I)+ O fu-k+' Oqi q(O)' </p><p>i= 1, 2 . . . . . s - 1 (38) </p><p>with initial conditions U~(0), i= 1, 2 . . . . . s. These recursive equations define the desired </p><p>sequence of (m x ns) feedback gain matrices G(') as well as a sequence of weighting matrices Ui('), i= 1, 2 . . . . . s, each of dimension (ns ns). The equations can be solved entirely off-line. They require a computational effort roughly equivalent to solving linear-quadratic control problems for s separate ns-dimensional sys...</p></li></ul>