
Optimal Design of Control Systems Stochastic and Deterministic Problems





PURE AND APPLIED MATHEMATICS

A Program of Monographs, Textbooks, and Lecture Notes

EXECUTIVE EDITORS

Earl J. Taft, Rutgers University, New Brunswick, New Jersey

Zuhair Nashed, University of Delaware, Newark, Delaware

EDITORIAL BOARD

M. S. Baouendi, University of California, San Diego
Jane Cronin, Rutgers University
Jack K. Hale, Georgia Institute of Technology
S. Kobayashi, University of California, Berkeley
Marvin Marcus, University of California, Santa Barbara
W. S. Massey, Yale University
Anil Nerode, Cornell University
Donald Passman, University of Wisconsin, Madison
Fred S. Roberts, Rutgers University
Gian-Carlo Rota, Massachusetts Institute of Technology
David L. Russell, Virginia Polytechnic Institute and State University
Walter Schempp, Universität Siegen
Mark Teply, University of Wisconsin, Milwaukee


MONOGRAPHS AND TEXTBOOKS IN PURE AND APPLIED MATHEMATICS

1. K. Yano, Integral Formulas in Riemannian Geometry (1970)
2. S. Kobayashi, Hyperbolic Manifolds and Holomorphic Mappings (1970)
3. V. S. Vladimirov, Equations of Mathematical Physics (A. Jeffrey, ed.; A. Littlewood, trans.) (1970)
4. B. N. Pshenichnyi, Necessary Conditions for an Extremum (L. Neustadt, translation ed.; K. Makowski, trans.) (1971)
5. L. Narici et al., Functional Analysis and Valuation Theory (1971)
6. D. S. Passman, Infinite Group Rings (1971)
7. L. Dornhoff, Group Representation Theory. Part A: Ordinary Representation Theory. Part B: Modular Representation Theory (1971, 1972)
8. W. Boothby and G. L. Weiss, eds., Symmetric Spaces (1972)
9. Y. Matsushima, Differentiable Manifolds (E. T. Kobayashi, trans.) (1972)
10. L. E. Ward, Jr., Topology (1972)
11. A. Babakhanian, Cohomological Methods in Group Theory (1972)
12. R. Gilmer, Multiplicative Ideal Theory (1972)
13. J. Yeh, Stochastic Processes and the Wiener Integral (1973)
14. J. Barros-Neto, Introduction to the Theory of Distributions (1973)
15. R. Larsen, Functional Analysis (1973)
16. K. Yano and S. Ishihara, Tangent and Cotangent Bundles (1973)
17. C. Procesi, Rings with Polynomial Identities (1973)
18. R. Hermann, Geometry, Physics, and Systems (1973)
19. N. R. Wallach, Harmonic Analysis on Homogeneous Spaces (1973)
20. J. Dieudonné, Introduction to the Theory of Formal Groups (1973)
21. I. Vaisman, Cohomology and Differential Forms (1973)
22. B.-Y. Chen, Geometry of Submanifolds (1973)
23. M. Marcus, Finite Dimensional Multilinear Algebra (in two parts) (1973, 1975)
24. R. Larsen, Banach Algebras (1973)
25. R. O. Kujala and A. L. Vitter, eds., Value Distribution Theory: Part A; Part B: Deficit and Bezout Estimates by Wilhelm Stoll (1973)
26. K. B. Stolarsky, Algebraic Numbers and Diophantine Approximation (1974)
27. A. R. Magid, The Separable Galois Theory of Commutative Rings (1974)
28. B. R. McDonald, Finite Rings with Identity (1974)
29. I. Satake, Linear Algebra (S. Koh et al., trans.) (1975)
30. J. S. Golan, Localization of Noncommutative Rings (1975)
31. G. Klambauer, Mathematical Analysis (1975)
32. M. K. Agoston, Algebraic Topology (1976)
33. K. R. Goodearl, Ring Theory (1976)
34. L. E. Mansfield, Linear Algebra with Geometric Applications (1976)
35. N. J. Pullman, Matrix Theory and Its Applications (1976)
36. B. R. McDonald, Geometric Algebra Over Local Rings (1976)
37. C. W. Groetsch, Generalized Inverses of Linear Operators (1977)
38. J. E. Kuczkowski and J. L. Gersting, Abstract Algebra (1977)
39. C. O. Christenson and W. L. Voxman, Aspects of Topology (1977)
40. M. Nagata, Field Theory (1977)
41. R. L. Long, Algebraic Number Theory (1977)
42. W. F. Pfeffer, Integrals and Measures (1977)
43. R. L. Wheeden and A. Zygmund, Measure and Integral (1977)
44. J. H. Curtiss, Introduction to Functions of a Complex Variable (1978)
45. K. Hrbacek and T. Jech, Introduction to Set Theory (1978)
46. W. S. Massey, Homology and Cohomology Theory (1978)
47. M. Marcus, Introduction to Modern Algebra (1978)
48. E. C. Young, Vector and Tensor Analysis (1978)
49. S. B. Nadler, Jr., Hyperspaces of Sets (1978)
50. S. K. Sehgal, Topics in Group Rings (1978)
51. A. C. M. van Rooij, Non-Archimedean Functional Analysis (1978)
52. L. Corwin and R. Szczarba, Calculus in Vector Spaces (1979)
53. C. Sadosky, Interpolation of Operators and Singular Integrals (1979)
54. J. Cronin, Differential Equations (1980)
55. C. W. Groetsch, Elements of Applicable Functional Analysis (1980)


56. I. Vaisman, Foundations of Three-Dimensional Euclidean Geometry (1980)
57. H. I. Freedman, Deterministic Mathematical Models in Population Ecology (1980)
58. S. B. Chae, Lebesgue Integration (1980)
59. C. S. Rees et al., Theory and Applications of Fourier Analysis (1981)
60. L. Nachbin, Introduction to Functional Analysis (R. M. Aron, trans.) (1981)
61. G. Orzech and M. Orzech, Plane Algebraic Curves (1981)
62. R. Johnsonbaugh and W. E. Pfaffenberger, Foundations of Mathematical Analysis (1981)
63. W. L. Voxman and R. H. Goetschel, Advanced Calculus (1981)
64. L. J. Corwin and R. H. Szczarba, Multivariable Calculus (1982)
65. V. I. Istrăţescu, Introduction to Linear Operator Theory (1981)
66. R. D. Järvinen, Finite and Infinite Dimensional Linear Spaces (1981)
67. J. K. Beem and P. E. Ehrlich, Global Lorentzian Geometry (1981)
68. D. L. Armacost, The Structure of Locally Compact Abelian Groups (1981)
69. J. W. Brewer and M. K. Smith, eds., Emmy Noether: A Tribute (1981)
70. K. H. Kim, Boolean Matrix Theory and Applications (1982)
71. T. W. Wieting, The Mathematical Theory of Chromatic Plane Ornaments (1982)
72. D. B. Gauld, Differential Topology (1982)
73. R. L. Faber, Foundations of Euclidean and Non-Euclidean Geometry (1983)
74. M. Carmeli, Statistical Theory and Random Matrices (1983)
75. J. H. Carruth et al., The Theory of Topological Semigroups (1983)
76. R. L. Faber, Differential Geometry and Relativity Theory (1983)
77. S. Barnett, Polynomials and Linear Control Systems (1983)
78. G. Karpilovsky, Commutative Group Algebras (1983)
79. F. Van Oystaeyen and A. Verschoren, Relative Invariants of Rings (1983)
80. I. Vaisman, A First Course in Differential Geometry (1984)
81. G. W. Swan, Applications of Optimal Control Theory in Biomedicine (1984)
82. T. Petrie and J. D. Randall, Transformation Groups on Manifolds (1984)
83. K. Goebel and S. Reich, Uniform Convexity, Hyperbolic Geometry, and Nonexpansive Mappings (1984)
84. T. Albu and C. Năstăsescu, Relative Finiteness in Module Theory (1984)
85. K. Hrbacek and T. Jech, Introduction to Set Theory: Second Edition (1984)
86. F. Van Oystaeyen and A. Verschoren, Relative Invariants of Rings (1984)
87. B. R. McDonald, Linear Algebra Over Commutative Rings (1984)
88. M. Namba, Geometry of Projective Algebraic Curves (1984)
89. G. F. Webb, Theory of Nonlinear Age-Dependent Population Dynamics (1985)
90. M. R. Bremner et al., Tables of Dominant Weight Multiplicities for Representations of Simple Lie Algebras (1985)
91. A. E. Fekete, Real Linear Algebra (1985)
92. S. B. Chae, Holomorphy and Calculus in Normed Spaces (1985)
93. A. J. Jerri, Introduction to Integral Equations with Applications (1985)
94. G. Karpilovsky, Projective Representations of Finite Groups (1985)
95. L. Narici and E. Beckenstein, Topological Vector Spaces (1985)
96. J. Weeks, The Shape of Space (1985)
97. P. R. Gribik and K. O. Kortanek, Extremal Methods of Operations Research (1985)
98. J.-A. Chao and W. A. Woyczynski, eds., Probability Theory and Harmonic Analysis (1986)
99. G. D. Crown et al., Abstract Algebra (1986)
100. J. H. Carruth et al., The Theory of Topological Semigroups, Volume 2 (1986)
101. R. S. Doran and V. A. Belfi, Characterizations of C*-Algebras (1986)
102. M. W. Jeter, Mathematical Programming (1986)
103. M. Altman, A Unified Theory of Nonlinear Operator and Evolution Equations with Applications (1986)
104. A. Verschoren, Relative Invariants of Sheaves (1987)
105. R. A. Usmani, Applied Linear Algebra (1987)
106. P. Blass and J. Lang, Zariski Surfaces and Differential Equations in Characteristic p > 0 (1987)
107. J. A. Reneke et al., Structured Hereditary Systems (1987)
108. H. Busemann and B. B. Phadke, Spaces with Distinguished Geodesics (1987)
109. R. Harte, Invertibility and Singularity for Bounded Linear Operators (1988)
110. G. S. Ladde et al., Oscillation Theory of Differential Equations with Deviating Arguments (1987)
111. L. Dudkin et al., Iterative Aggregation Theory (1987)
112. T. Okubo, Differential Geometry (1987)
113. D. L. Stancl and M. L. Stancl, Real Analysis with Point-Set Topology (1987)


114. T. C. Gard, Introduction to Stochastic Differential Equations (1988)
115. S. S. Abhyankar, Enumerative Combinatorics of Young Tableaux (1988)
116. H. Strade and R. Farnsteiner, Modular Lie Algebras and Their Representations (1988)
117. J. A. Huckaba, Commutative Rings with Zero Divisors (1988)
118. W. D. Wallis, Combinatorial Designs (1988)
119. W. Więsław, Topological Fields (1988)
120. G. Karpilovsky, Field Theory (1988)
121. S. Caenepeel and F. Van Oystaeyen, Brauer Groups and the Cohomology of Graded Rings (1989)
122. W. Kozlowski, Modular Function Spaces (1988)
123. E. Lowen-Colebunders, Function Classes of Cauchy Continuous Maps (1989)
124. M. Pavel, Fundamentals of Pattern Recognition (1989)
125. V. Lakshmikantham et al., Stability Analysis of Nonlinear Systems (1989)
126. R. Sivaramakrishnan, The Classical Theory of Arithmetic Functions (1989)
127. N. A. Watson, Parabolic Equations on an Infinite Strip (1989)
128. K. J. Hastings, Introduction to the Mathematics of Operations Research (1989)
129. B. Fine, Algebraic Theory of the Bianchi Groups (1989)
130. D. N. Dikranjan et al., Topological Groups (1989)
131. J. C. Morgan II, Point Set Theory (1990)
132. P. Biler and A. Witkowski, Problems in Mathematical Analysis (1990)
133. H. J. Sussmann, Nonlinear Controllability and Optimal Control (1990)
134. J.-P. Florens et al., Elements of Bayesian Statistics (1990)
135. N. Shell, Topological Fields and Near Valuations (1990)
136. B. F. Doolin and C. F. Martin, Introduction to Differential Geometry for Engineers (1990)
137. S. S. Holland, Jr., Applied Analysis by the Hilbert Space Method (1990)
138. J. Okniński, Semigroup Algebras (1990)
139. K. Zhu, Operator Theory in Function Spaces (1990)
140. G. B. Price, An Introduction to Multicomplex Spaces and Functions (1991)
141. R. B. Darst, Introduction to Linear Programming (1991)
142. P. L. Sachdev, Nonlinear Ordinary Differential Equations and Their Applications (1991)
143. T. Husain, Orthogonal Schauder Bases (1991)
144. J. Foran, Fundamentals of Real Analysis (1991)
145. W. C. Brown, Matrices and Vector Spaces (1991)
146. M. M. Rao and Z. D. Ren, Theory of Orlicz Spaces (1991)
147. J. S. Golan and T. Head, Modules and the Structure of Rings (1991)
148. C. Small, Arithmetic of Finite Fields (1991)
149. K. Yang, Complex Algebraic Geometry (1991)
150. D. G. Hoffman et al., Coding Theory (1991)
151. M. O. González, Classical Complex Analysis (1992)
152. M. O. González, Complex Analysis (1992)
153. L. W. Baggett, Functional Analysis (1992)
154. M. Sniedovich, Dynamic Programming (1992)
155. R. P. Agarwal, Difference Equations and Inequalities (1992)
156. C. Brezinski, Biorthogonality and Its Applications to Numerical Analysis (1992)
157. C. Swartz, An Introduction to Functional Analysis (1992)
158. S. B. Nadler, Jr., Continuum Theory (1992)
159. M. A. Al-Gwaiz, Theory of Distributions (1992)
160. E. Perry, Geometry: Axiomatic Developments with Problem Solving (1992)
161. E. Castillo and M. R. Ruiz-Cobo, Functional Equations and Modelling in Science and Engineering (1992)
162. A. J. Jerri, Integral and Discrete Transforms with Applications and Error Analysis (1992)
163. A. Charlier et al., Tensors and the Clifford Algebra (1992)
164. P. Biler and T. Nadzieja, Problems and Examples in Differential Equations (1992)
165. E. Hansen, Global Optimization Using Interval Analysis (1992)
166. S. Guerre-Delabrière, Classical Sequences in Banach Spaces (1992)
167. Y. C. Wong, Introductory Theory of Topological Vector Spaces (1992)
168. S. H. Kulkarni and B. V. Limaye, Real Function Algebras (1992)
169. W. C. Brown, Matrices Over Commutative Rings (1993)
170. J. Loustau and M. Dillon, Linear Geometry with Computer Graphics (1993)
171. W. V. Petryshyn, Approximation-Solvability of Nonlinear Functional and Differential Equations (1993)
172. E. C. Young, Vector and Tensor Analysis: Second Edition (1993)
173. T. A. Bick, Elementary Boundary Value Problems (1993)


174. M. Pavel, Fundamentals of Pattern Recognition: Second Edition (1993)
175. S. A. Albeverio et al., Noncommutative Distributions (1993)
176. W. Fulks, Complex Variables (1993)
177. M. M. Rao, Conditional Measures and Applications (1993)
178. A. Janicki and A. Weron, Simulation and Chaotic Behavior of α-Stable Stochastic Processes (1994)
179. P. Neittaanmäki and D. Tiba, Optimal Control of Nonlinear Parabolic Systems (1994)
180. J. Cronin, Differential Equations: Introduction and Qualitative Theory, Second Edition (1994)
181. S. Heikkilä and V. Lakshmikantham, Monotone Iterative Techniques for Discontinuous Nonlinear Differential Equations (1994)
182. X. Mao, Exponential Stability of Stochastic Differential Equations (1994)
183. B. S. Thomson, Symmetric Properties of Real Functions (1994)
184. J. E. Rubio, Optimization and Nonstandard Analysis (1994)
185. J. L. Bueso et al., Compatibility, Stability, and Sheaves (1995)
186. A. N. Michel and K. Wang, Qualitative Theory of Dynamical Systems (1995)
187. M. R. Darnel, Theory of Lattice-Ordered Groups (1995)
188. Z. Naniewicz and P. D. Panagiotopoulos, Mathematical Theory of Hemivariational Inequalities and Applications (1995)
189. L. J. Corwin and R. H. Szczarba, Calculus in Vector Spaces: Second Edition (1995)
190. L. H. Erbe et al., Oscillation Theory for Functional Differential Equations (1995)
191. S. Agaian et al., Binary Polynomial Transforms and Nonlinear Digital Filters (1995)
192. M. I. Gil', Norm Estimations for Operator-Valued Functions and Applications (1995)
193. P. A. Grillet, Semigroups: An Introduction to the Structure Theory (1995)
194. S. Kichenassamy, Nonlinear Wave Equations (1996)
195. V. F. Krotov, Global Methods in Optimal Control Theory (1996)
196. K. I. Beidar et al., Rings with Generalized Identities (1996)
197. V. I. Arnautov et al., Introduction to the Theory of Topological Rings and Modules (1996)
198. G. Sierksma, Linear and Integer Programming (1996)
199. R. Lasser, Introduction to Fourier Series (1996)
200. V. Sima, Algorithms for Linear-Quadratic Optimization (1996)
201. D. Redmond, Number Theory (1996)
202. J. K. Beem et al., Global Lorentzian Geometry: Second Edition (1996)
203. M. Fontana et al., Prüfer Domains (1997)
204. H. Tanabe, Functional Analytic Methods for Partial Differential Equations (1997)
205. C. Q. Zhang, Integer Flows and Cycle Covers of Graphs (1997)
206. E. Spiegel and C. J. O'Donnell, Incidence Algebras (1997)
207. B. Jakubczyk and W. Respondek, Geometry of Feedback and Optimal Control (1998)
208. T. W. Haynes et al., Fundamentals of Domination in Graphs (1998)
209. T. W. Haynes et al., Domination in Graphs: Advanced Topics (1998)
210. L. A. D'Alotto et al., A Unified Signal Algebra Approach to Two-Dimensional Parallel Digital Signal Processing (1998)
211. F. Halter-Koch, Ideal Systems (1998)
212. N. K. Govil et al., Approximation Theory (1998)
213. R. Cross, Multivalued Linear Operators (1998)
214. A. A. Martynyuk, Stability by Liapunov's Matrix Function Method with Applications (1998)
215. A. Favini and A. Yagi, Degenerate Differential Equations in Banach Spaces (1999)
216. A. Illanes and S. Nadler, Jr., Hyperspaces: Fundamentals and Recent Advances (1999)
217. G. Kato and D. Struppa, Fundamentals of Algebraic Microlocal Analysis (1999)
218. G. X.-Z. Yuan, KKM Theory and Applications in Nonlinear Analysis (1999)
219. D. Motreanu and N. H. Pavel, Tangency, Flow Invariance for Differential Equations, and Optimization Problems (1999)
220. K. Hrbacek and T. Jech, Introduction to Set Theory, Third Edition (1999)
221. G. E. Kolosov, Optimal Design of Control Systems (1999)
222. A. I. Prilepko et al., Methods for Solving Inverse Problems in Mathematical Physics (1999)

Additional Volumes in Preparation


OPTIMAL DESIGN OF CONTROL SYSTEMS

Stochastic and Deterministic Problems

G. E. Kolosov
Moscow University of Electronics and Mathematics
Moscow, Russia

MARCEL DEKKER, INC.


Library of Congress Cataloging-in-Publication Data

Kolosov, G. E. (Gennadii Evgen'evich)
  Optimal design of control systems: stochastic and deterministic problems / G. E. Kolosov.
    p. cm. -- (Monographs and textbooks in pure and applied mathematics; 221)
  Includes bibliographical references and index.
  ISBN 0-8247-7537-6 (alk. paper)
  1. Control theory. 2. Mathematical optimization. I. Title. II. Series.

QA402.3.K577 1999
629.8'312--dc21 99-30940

CIP

This book is printed on acid-free paper.

Headquarters
Marcel Dekker, Inc.
270 Madison Avenue, New York, NY 10016
tel: 212-696-9000; fax: 212-685-4540

Eastern Hemisphere Distribution
Marcel Dekker AG
Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland
tel: 41-61-261-8482; fax: 41-61-261-8896

World Wide Web
http://www.dekker.com

The publisher offers discounts on this book when ordered in bulk quantities. For more information, write to Special Sales/Professional Marketing at the headquarters address above.

Copyright © 1999 by Marcel Dekker, Inc. All Rights Reserved.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Current printing (last digit): 10 9 8 7 6 5 4 3 2 1

PRINTED IN THE UNITED STATES OF AMERICA


PREFACE

The rise of optimal control theory is a remarkable example of interaction between practical needs and mathematical theories.

Indeed, in the middle of this century the development of various automatic control systems in technology and of systems for control of motion of mechanical objects (in particular, of flying objects such as airplanes and rockets) gave rise to specific mathematical problems concerned with finding the conditional extremum of functions or functionals, which could not be solved by means of the methods of classical mathematical analysis and the calculus of variations.

The extreme urgency of these problems for practical needs stimulated the efforts of mathematicians to develop methods for solving these new problems. At the end of the fifties and at the beginning of the sixties, these efforts were crowned with success when new mathematical approaches such as Pontryagin's maximum principle, Bellman's dynamic programming, and linear and convex programming (developed somewhat earlier by L. Kantorovich, G. Dantzig, and others) were established. These new approaches greatly affected the research carried out in control theory at that time. It should be noted that these approaches have played a very important role in the formation of optimal control theory as an independent branch of science. One can say that the role of the maximum principle and dynamic programming in the theory of optimal control is as significant as that of Maxwell's equations in electromagnetic theory in physics.

Optimal control theory evolved most intensively at the end of the sixties and during the seventies. This period showed a very high degree of cooperation and interaction between mathematicians and all those dealing with applications of control theory in technology, mechanics, physics, chemistry, biology, etc.

Later on, a gap between the purely mathematical and the practical approach to solving applied problems of optimal control began to emerge and is now apparent. Although the appearance of this gap can be explained by quite natural reasons, the further growth of this trend seems undesirable. The author hopes that this book will to some extent reduce the gap between these two branches of research.


This book is primarily intended for specialists dealing with applications of control theory. It is well known that the use of such approaches as, say, the maximum principle or dynamic programming often leads to optimal control algorithms whose implementation for actual real-time plants encounters great (sometimes insurmountable) difficulties. This is the reason that for solving control problems in practice one often employs methods based on various simplifications and heuristic concepts. Naturally, this results in losses in optimality but makes it possible to obtain control algorithms that allow simple technological implementations. In some cases the use of simplifications and heuristic concepts can also result in significant deviations of the system performance index from its optimal value (Chapter VI).

In this book we describe ways of constructing simply realizable algorithms of optimal (suboptimal) control, which are based on the dynamic programming approach. These algorithms are derived on the basis of exact, approximate analytical, or numerical solutions of the differential and functional Bellman equations corresponding to the control problems considered.

The book contains an introduction and seven chapters. Chapter I deals with some general concepts of control theory and the description of mathematical approaches to solving problems of optimal control. We consider both deterministic and stochastic models of controlled systems and discuss the distinguishing features of stochastic models, which arise due to the possible ambiguous interpretation of solutions to stochastic differential equations describing controlled systems with white noise disturbances.

We define the synthesis problem as the principal problem of optimal control theory and give a general scheme of the dynamic programming approach. The Bellman equations for deterministic and stochastic control problems (for Markov models and stochastic models with indirect observations) are studied. For problems with infinite horizon we introduce the concept of stationary operating conditions, which is widely used in further chapters of the book.

Exact methods of synthesis are considered in Chapter II. We describe the exceptional cases in which the Bellman equations have exact solutions, and hence the optimal control algorithms can be obtained in explicit analytical form.

First (in §2.1), we briefly discuss some well-known results concerned with the solution of the so-called LQ-problems. Next, in §§2.2-2.4, we write exact solutions for three specific problems of optimal control with bounded control actions. We consider deterministic and stochastic problems of control of the population size and the problem of constructing an optimal servomechanism. In these systems, the optimal controllers are of "bang-bang" form, and the switch point coordinates are given by finite formulas.


The following four chapters are devoted to the description of approximate methods of synthesis. In this case, the design of suboptimal control systems is based, as a rule, on using approximate solutions of the corresponding Bellman equations. To obtain these approximate solutions, we mainly use various versions of small parameter methods or successive approximation procedures.

In Chapter III we study weakly controlled systems. We consider control problems with bounded controls and assume that the values of admissible control actions are small. This stipulates the appearance of a small parameter in the nonlinear term in the Bellman equation. This, in turn, makes it possible to propose a natural successive approximation procedure for solving the Bellman equation, and thus the synthesis problem, approximately. This procedure is a modification of the well-known Picard and Bellman procedures, which provide a way of obtaining approximate solutions of nonlinear differential equations by solving a sequence of linear equations.

Chapter III is organized as follows. First (in §3.1), we describe the general scheme of approximate synthesis for controlled systems under stationary operating conditions. Next (in §3.2), by using this general scheme, we calculate a suboptimal controller for an oscillatory system with one degree of freedom. Later (in §§3.3 and 3.4), we generalize our approach to nonstationary problems and to the case of correlated disturbances; then we estimate the error obtained. In §3.5 we prove that the successive approximation procedure in question converges asymptotically. Finally (in §3.6), we apply this approach to an approximate design of a stochastic system with distributed parameters.

Chapter IV is about stochastic controlled systems with noises of small intensity. In this case, the diffusion terms in the Bellman equation contain small coefficients. Under certain assumptions this allows us to replace the initial stochastic problem by a sequence of auxiliary deterministic problems of optimal control whose solutions (i) can be calculated more easily and (ii) give a way of designing suboptimal control systems (with respect to the initial stochastic problem). This approach is used for calculating suboptimal controllers for two specific servomechanisms.

In Chapter V we consider a class of controlled systems whose dynamics are quasiharmonic. The trajectories of such systems are close to harmonic oscillations, and this is the reason that the well-developed techniques of the theory of nonlinear oscillations can be effectively applied to studying these systems. By using polar coordinates as the phase variables, we describe the system state in terms of slowly changing amplitude and phase. The presence of a small parameter on the right-hand sides of the differential equations for these variables allows us to elaborate different versions of approximate solutions for the various problems of optimal control. These


solutions are based on the use of appropriate asymptotic expansions of the performance index, the optimal control algorithm, etc. in powers of the small parameter.

We illustrate these techniques by solving four specific problems of optimal damping of deterministic and stochastic oscillations in a biological predator-prey system and in a mechanical system with oscillatory dynamics.

In Chapter VI we discuss some special asymptotic methods of synthesis which do not belong to the classes of control problems studied in Chapters III-V. We consider problems of control of plants with unknown parameters (adaptive control problems), in which the a priori uncertainty of their values is small. In addition, we study stochastic control problems with bounded phase variables and a problem of optimal control of the population size whose behavior is governed by a stochastic logistic equation with a large value of the medium capacity. We use small parameter approaches for solving the problems mentioned above. For the construction of suboptimal controls, we employ asymptotic series expansions for the loss functions and the optimal control algorithms. The error obtained is estimated.

Numerical methods of synthesis are covered in the final Chapter VII. We discuss the problem of the assignment of boundary conditions to grid functions and propose several different schemes for solving specific problems of optimal control. The numerical methods proposed are used for solving specific synthesis problems.

The presentation of all the approaches studied in the book is accompanied by numerous examples of actual control problems. All calculations are carried out up to the accuracy level sufficient for comparatively simple implementation of the optimal (suboptimal) algorithms obtained in actual devices. In many cases, the algorithms are presented in the form of analog circuits or flow charts.

The book can be helpful to students, postgraduate students, and specialists working in the field of automatic control and applied mathematics. The book may be of interest to mechanical and electrical engineers, physicists, and biologists. Only knowledge of the foundations of probability theory is required for assimilating the subject matter of the book.

The reader should be acquainted with basic notions of probability theory such as random events and random variables, the probability distribution function and the probability density of random variables, the mean value of a random variable, mutually exclusive and independent random events and variables, etc. It is not compulsory to know the foundations of the theory of random processes, since Chapter I provides all necessary facts about the methods for describing random processes that are encountered further in


the book. This makes the book accessible to a wide circle of students and specialists who are interested in applications of optimal control theory.

The author's intention to write this book was supported by R. L. Stratonovich, who was the supervisor of the author's Ph.D. thesis and for many years, till his sudden death in 1997, remained the author's friend.

The author wishes to express his deep gratitude to V. B. Kolmanovskii, R. S. Liptser, and all participants of the seminar "Stability and Control" at the Moscow University of Electronics and Mathematics for useful remarks and advice concerning the contents of this book.

The author's special thanks go to M. A. Shishkova for translating the manuscript into English and keyboarding.

G. E. Kolosov


CONTENTS

Preface

Introduction

Chapter I. Synthesis Problems for Control Systems and the Dynamic Programming Approach

1.1. Statement of synthesis problems for optimal control systems

1.2. Differential equations for controlled systems with random functions

1.3. Deterministic control problems. Formal scheme of the dynamic programming approach

1.4. The Bellman equations for Markov controlled processes

1.5. Sufficient coordinates in control problems with indirect observations

Chapter II. Exact Methods for Synthesis Problems

2.1. Linear-quadratic problems of optimal control (LQ-problems)

2.2. Problem of optimal tracking of a wandering coordinate
2.3. Optimal control of the population size
2.4. Stochastic problem of optimal fisheries management

Chapter III. Approximate Synthesis of Stochastic Control Systems with Small Control Actions

3.1. Approximate solution of stationary synthesis problems

3.2. Calculation of a quasioptimal regulator for the oscillatory plant


3.3. Synthesis of quasioptimal controls in the case of correlated noises

3.4. Nonstationary problems. Estimates of the quality of approximate synthesis

3.5. Analysis of the asymptotic convergence of successive approximations (3.0.6)-(3.0.8) as k → ∞

3.6. Approximate synthesis of some stochastic systems with distributed parameters

Chapter IV. Synthesis of Quasioptimal Systems in the Case of Small Diffusion Terms in the Bellman Equation

4.1. Approximate synthesis of a servomechanism with small-intensity noise

4.2. Calculation of a quasioptimal system for tracking a discrete Markov process

Chapter V. Control of Oscillatory Systems

5.1. Optimal control of a quasiharmonic oscillator. An asymptotic synthesis method
5.2. Control of the "predator-prey" system. The case of a poorly adapted predator
5.3. Optimal damping of random oscillations
5.4. Optimal control of quasiharmonic systems with noise in the feedback circuit

Chapter VI. Some Special Applications of Asymptotic Synthesis Methods

6.1. Adaptive problems of optimal control
6.2. Some stochastic control problems with constrained phase coordinates
6.3. Optimal control of the population size governed by the stochastic logistic model

Chapter VII. Numerical Synthesis Methods

7.1. Numerical solution of the problem of optimal damping of random oscillations
7.2. Optimal control for the "predator-prey" system (the general case)

Conclusion

References

Index


INTRODUCTION

The main problem of control theory can be formulated as follows.

In the design of control systems it is assumed that each control system (see Fig. 1) consists of the following two principal parts (blocks or subsystems): the subsystem P to be controlled (the plant) and the controlling subsystem C (the controller). The plant P is a dynamical system (mechanical, electrical, biological, etc.) whose behavior is described by a well-known operator mapping the input (controlling) actions u(t) into the output trajectories x(t). This operator can be defined by a system of ordinary differential, functional, functional-differential, or integral equations or by partial differential equations. It is important that the operator (or, in technical terms, the structure or the construction) of the plant P is assumed to be given and fixed from the outset.

As for the controller C, no preliminary restrictions are imposed on its structure. This block must be constructed in such a way that the output trajectories {x(t): 0 ≤ t ≤ T} (the case T = +∞ is not excluded) possess, in a sense, sufficiently "good" properties.

Whether the trajectories are "good" or not depends on the requirements imposed on the control system in question. These requirements are often stated by using the concept of a support (or standard) trajectory x̄(t), and the control system itself is constructed so that the deviation |x(t) − x̄(t)| on the time interval 0 ≤ t ≤ T does not exceed a value given in advance.

If the "quality" of an individual trajectory {x(t) : 0 < t < T} can be es- timated by the value of some functional I[x(t)] of this trajectory, then there is a possibility to find an optimal trajectory x,(t) on which the functional


I[x(t)] attains its extremum value (in this case, the extremum type (minimum or maximum) is determined by the character of the control problem). The functional I[x(t)] used for estimating the control quality is often called the optimality criterion or the performance index of the control system designed.

If there are no random actions on the system, the problem of finding the optimal trajectory x*(t) amounts to finding the optimal control program {u*(t): 0 ≤ t ≤ T} that ensures the plant motion along the extremum trajectory {x*(t): 0 ≤ t ≤ T}. The optimal control u*(t) can be calculated by using methods of the classical calculus of variations [64], or, in more general situations, Pontryagin's maximum principle [156], or various approximate methods [138] based on these two fundamental approaches. Different methods for calculating the optimal control programs are discussed in [137].
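(As a hedged aside, not part of the original text: for the typical program problem of minimizing a cost I[x] = ∫₀ᵀ f₀(t, x, u) dt subject to ẋ = g(t, x, u) and u(t) ∈ U, the maximum-principle conditions used to compute such a program read, in one standard form,

\[
H(t, x, \psi, u) = \psi^{\top} g(t, x, u) - f_0(t, x, u), \qquad
\dot{\psi} = -\frac{\partial H}{\partial x}, \qquad
u_*(t) = \arg\max_{u \in U} H\bigl(t, x_*(t), \psi(t), u\bigr),
\]

where ψ is the adjoint vector; the cost integrand f₀ and the control set U are generic placeholders rather than notation fixed by this book.)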

If an optimal control system is constructed without considering stochastic effects, then the system can be open (as in Fig. 1), since the plant trajectory {x(t): 0 ≤ t ≤ T} and hence the value of the optimality criterion I[x(t)] are determined uniquely for a chosen realization {u(t): 0 ≤ t ≤ T} of control actions. (Needless to say, the equation of the plant is assumed to have a unique solution for a given initial state x(0) = x₀ and a given input function u(t).)

The situation is different if the system is subject to noncontrolled random actions. In this case, to obtain an effective control, one needs some information about the actual current state x(t) of the plant, that is, the optimal system must be a closed-loop (or feedback) system. For example, all servomechanisms are designed according to this principle (see Fig. 2). In this case, in addition to the operator of the plant P, it is necessary to take into account the properties of a source of information, which determines the required value y(t) of the output parameter vector x(t) at each instant t (examples of specific servomechanisms can be found in [2, 20, 38, 50]). The block C measures the current values of the input y(t) and output x(t) variables and forms controlling actions in the form of the functional u(t) = φ(y₀ᵗ, x₀ᵗ) of the observed trajectories y₀ᵗ = {y(s): 0 ≤ s ≤ t}, x₀ᵗ = {x(s): 0 ≤ s ≤ t} so that the equality x(t) ≡ y(t) holds, if possible,


for 0 ≤ t ≤ T. However, the stochastic nature of the assigning action (command signal) y(t) on one side, and the inertial properties of the plant P on the other side, do not make it possible to ensure the required identity between the input and output parameters. Therefore, a problem of optimal control arises in a natural way.

Hence, just as in the deterministic case, the optimality criterion I[|y(t) − x(t)|] is introduced, which is a measure of the "distance" between the functions y(t) and x(t) on the time interval 0 ≤ t ≤ T. The final statement of the problem depends on the type of assumptions on the properties of the assigning action y(t). Throughout this book, we use the probability description for all random actions on the system. This means that all assigning actions are treated as random functions with known (completely or partially) probability characteristics. In this approach, the optimal control law that determines the structure of the block C can be found from the condition that the mean value of the criterion I[|y(t) − x(t)|] attains its minimum. Another approach, in which the regions of admissible values of perturbations rather than their probability characteristics are specified and the optimal system is constructed by methods of game theory, is described in [23, 114, 115, 145, 195].
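A concrete instance, supplied here for illustration and not drawn from the text: with a quadratic measure of distance, the quantity to be minimized over admissible control laws is the mean integral squared error

\[
\mathsf{E}\, I = \mathsf{E} \int_0^T |y(t) - x(t)|^2 \, dt,
\]

where the expectation is taken over the random command signal and the disturbances; this is exactly the type of functional written out in §1.1.2 below.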

If the servomechanism shown in Fig. 2 is significantly affected by noises arising due to measurement errors, instability of voltage sources in electrical circuits, or varying properties of the medium surrounding the automatic system, then the block diagram in Fig. 2 becomes more complicated and can be of the form shown in Fig. 3.

Here ζ(t) and η(t) denote random perturbations distorting information on the command signal y(t) and the state x(t) of the plant to be controlled; the random function ξ(t) describes the perturbing actions on the plant P. By '1' and '2' we denote the blocks in which useful signals and noises are combined. It is usually assumed that the structure of such blocks is known.


In this book we do not consider control systems whose block diagrams are more complicated than that shown in Fig. 3. All control systems studied in the sequel are special cases of the system shown in Fig. 3.

The main emphasis of this book is on the methods for calculating the optimal control algorithms

u(t) = φ_t(y₀ᵗ, x₀ᵗ), (*)

which determine the structure of the controller C and guarantee the optimal behavior of the feedback control system shown in Fig. 3. Since the methods studied in this book are oriented to solving applied control problems in mechanics, engineering, and biology, much attention is paid to obtaining (*) in a form that can easily be used in practice. This means that all optimal control algorithms described in the book for specific problems are such that the functional (mapping) φ_t in (*) either has a finite analytic form or can be implemented by sufficiently simple standard modeling methods.

From the mathematical viewpoint, all problems of optimal control are related to finding a conditional extremum of a functional (the optimality criterion), i.e., they are problems of the calculus of variations [28, 58, 64, 137]. However, a distinguishing feature of many optimal control problems is that they are "nonclassical" due to restrictions imposed on the admissible values of the controlling actions u(t). For instance, this often leads to discontinuous extremals, which are inadmissible in the classical theory [64]. Therefore, problems of optimal control are usually solved by contemporary mathematical methods, the most important being the Pontryagin maximum principle [156] and the Bellman dynamic programming approach [14]. These methods develop and generalize two different approaches to variational problems in the classical theory: the Euler method and the Weierstrass variational principle, used for constructing a separate extremal, and the Hamilton-Jacobi method, based on the consideration of the entire field of extremals, which leads to partial differential equations for controlled systems with lumped parameters or to equations with functional derivatives for controlled systems with distributed parameters.

The maximum principle, which is a rigorously justified mathematical method, can be used in general for solving both deterministic and stochastic problems of optimal control [58, 116, 156]. However, this method, based on the consideration of individual trajectories of the control process, leads to certain technical difficulties when one needs to find the structure of the controller C in feedback stochastic systems (see Figs. 2 and 3). In this situation, the dynamic programming approach looks more attractive. This method, however, suffers some flaws from the accuracy viewpoint (for example, it is well known that the Bellman differential equations cannot be


used in some cases of deterministic time-optimal control problems [50, 137, 156]).

In systems with lumped parameters, where the behavior of the plant P is governed by ordinary differential equations, the dynamic programming approach allows the reduction of the optimal control problem to solving a nonlinear partial differential equation (the Bellman equation). In this case, the structure of the controller C (and hence the form of the function (mapping) φ_t in (*)) is determined simultaneously with solving this equation. Thus this method provides a straightforward solution of the main problem in control theory, namely, the synthesis of a closed-loop automatic control system. As for the applicability of this method, so far it has been rigorously proved that the Bellman differential equations are valid and form the basis for solving the synthesis problems for a wide class of stochastic and deterministic control systems [113, 175]. Therefore, the dynamic programming approach is widely used in this book and underlies practically all methods developed for calculating optimal (or quasioptimal) controls. As noted above, these methods constitute the dominant bulk of the subject matter of this book.
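For orientation, here is a standard form of such an equation, supplied by us rather than quoted from the book (whose own derivation appears in Chapter I): with plant ẋ = g(t, x, u), a generic running cost c(t, x, u), and V(t, x) denoting the minimal cost-to-go from state x at time t, the deterministic Bellman equation is

\[
\frac{\partial V}{\partial t} + \min_{u \in U}\Bigl[ c(t, x, u) + g(t, x, u)^{\top} \frac{\partial V}{\partial x} \Bigr] = 0,
\]

with an appropriate terminal condition at t = T; the synthesis function is read off as the minimizing u = φ*(t, x) at each point (t, x), which is precisely the sense in which the controller structure is determined simultaneously with solving the equation.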

As is known, the functional and differential Bellman equations can be used effectively only if the controlled process (or, in more general cases, the system phase trajectory in some state space) is a process without aftereffect, that is, a process of Markov type. In deterministic problems, this Markov property of trajectories readily follows from the corresponding existence and uniqueness theorems for the solutions of the Cauchy problem. To ensure the Markov property of trajectories in stochastic control problems, it is necessary to impose some restrictions on the class of random functions used as mathematical models of random disturbances on the system. To this end, throughout this book, it is assumed that all random actions on the system are either "white noise" type processes or Markov stochastic processes.

When the perturbations are of white noise type, the controlled process x(t) itself can be Markov. If the noises are of Markov type, then the process x(t) is, generally speaking, a component of a partially observable Markov process of larger dimension. Therefore, to solve the synthesis problem effectively in this case, one needs to use a special state space formed by sufficient statistics, so that the time evolution of these statistics possesses the Markov property. In this case, the controller C consists of two parts: a block that forms the sufficient statistics (coordinates) and an actual controller whose structure can be found by solving the Bellman equation.
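A familiar special case, sketched here for concreteness (the notation A, B, C, M, N, z is ours, not the book's): for linear dynamics observed linearly in Gaussian white noise, the sufficient coordinates are the conditional mean m(t) and covariance γ(t) of the state, generated by the Kalman-Bucy filter

\[
dm = (Am + Bu)\,dt + \gamma C^{\top} N^{-1} (dz - Cm\,dt), \qquad
\dot{\gamma} = A\gamma + \gamma A^{\top} - \gamma C^{\top} N^{-1} C \gamma + M,
\]

where z(t) is the observation process and M, N are the intensities of the plant and observation noises; the controller then takes the form u = φ(t, m, γ), i.e., a filter block followed by a memoryless control law.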

These topics are studied in more detail in Chapter I.


CHAPTER I

SYNTHESIS PROBLEMS FOR CONTROL SYSTEMS

AND THE DYNAMIC PROGRAMMING APPROACH

§1.1. Statement of synthesis problems for optimal control systems

In synthesis problems it is required to find the structure of the control block (controller) C in a feedback control system (see Figs. 2 and 3). From the mathematical viewpoint, this problem is solved if we know the form of the mapping

u = φ(ȳ, x̄) (1.1.1)

that determines a single-valued correspondence between the input functions¹ ȳ = {ȳ(t): 0 ≤ t ≤ T} and x̄ = {x̄(t): 0 ≤ t ≤ T} and the control vector-function u = {u(t): 0 ≤ t ≤ T} (the system is considered on the time interval [0, T]). The conditions under which algorithm (1.1.1) can physically be implemented impose some restrictions on the form of the mapping φ in (1.1.1). Usually, it is assumed that the current values of the control vector u(t) = (u₁(t), ..., u_r(t)) at time t are independent of the future values ȳ(t′) and x̄(t′), t′ > t. Therefore, the mapping (1.1.1) can be written as follows (see (*) in the Introduction):

u(t) = φ_t(ȳ₀ᵗ, x̄₀ᵗ), (1.1.2)

where ȳ₀ᵗ = {ȳ(s): 0 ≤ s ≤ t} and x̄₀ᵗ = {x̄(s): 0 ≤ s ≤ t} denote the functions ȳ and x̄ realized up to time t.

In simpler situations (say, in the case of the servomechanism shown in Fig. 2), the synthesis function φ may depend only on the current values of the input processes

u(t) = φ(t, x(t), y(t)) (1.1.3)

or even may be of the form

u(t) = φ(t, x(t)) (1.1.4)

¹The functions ȳ and x̄ are the input functions for the controller C.

Page 25: Optimal Design of Control Systems Stochastic and Deterministic Problems

8 Chapter I

if the command signal y(t) is either absent or a known deterministic function of time.

The explicit form of the synthesis function φ is determined by the character of the optimal control problem.

To state the synthesis problem for an optimal control system mathematically, we need to know:

(1) the dynamic equations of the controlled plant;
(2) the goal of control;
(3) the restrictions (if any) on the domain of admissible values of the control actions u, on the domain of the phase variables x, etc.;
(4) the probability characteristics of the stochastic processes that affect the system.

Obviously, in problems of deterministic optimal control we need only the first three objects.

1.1.1. Dynamic equations of the controlled plant. The present monograph, except for §3.6, deals with control systems in which the plant P can be described by a system of ordinary differential equations in the normal form

ẋ = g(t, x, u), (1.1.5)

where x = x(t) ∈ R_n and u = u(t) ∈ R_r are the current values of an n-dimensional vector of output parameters (the phase variables) and of an r-dimensional control vector, g(t, x, u): R × R_n × R_r → R_n is a given vector-function, and the dot over a letter denotes the derivative with respect to time (that is, ẋ is an n-vector with components dx_i/dt, i = 1, ..., n). Here and in the sequel, R_k denotes the Euclidean space of k-dimensional vectors.

If, in addition to the control u, the controlled plant experiences uncontrolled random perturbations (see Fig. 3), then its behavior is described by the equation

ẋ = g(t, x, u, ξ(t)), (1.1.6)

where ξ(t) is an m-vector of random functions (ξ₁(t), ..., ξ_m(t)). Differential equations of the form (1.1.6) with random functions on the right-hand sides are called stochastic differential equations. In contrast with the "usual" differential equations of the form (1.1.5), they have some special properties, which we consider in detail in the next section.
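As a rough illustration of how a trajectory of an equation of the form (1.1.6) with white-noise perturbations might be simulated numerically, here is a minimal Euler-Maruyama sketch under the usual Itô reading; the drift g, noise intensity sigma, and feedback law phi below are hypothetical placeholders, not objects defined in the book.

    import numpy as np

    def simulate_sde(g, sigma, phi, x0, T=10.0, dt=1e-3, seed=0):
        """Euler-Maruyama integration of dx = g(t, x, u) dt + sigma(t, x) dW
        (Ito sense) with feedback control u = phi(t, x).  All model
        functions here are illustrative placeholders."""
        rng = np.random.default_rng(seed)
        n_steps = int(T / dt)
        x = np.empty((n_steps + 1, np.size(x0)))
        x[0] = np.asarray(x0, dtype=float)
        t = 0.0
        for k in range(n_steps):
            u = phi(t, x[k])                      # synthesis form u = phi(t, x)
            dw = rng.normal(0.0, np.sqrt(dt), size=x[k].shape)
            x[k + 1] = x[k] + g(t, x[k], u) * dt + sigma(t, x[k]) * dw
            t += dt
        return x

    # Scalar example: plant dx = (-x + u) dt + 0.2 dW with bang-bang feedback.
    traj = simulate_sde(g=lambda t, x, u: -x + u,
                        sigma=lambda t, x: 0.2,
                        phi=lambda t, x: -np.sign(x),
                        x0=[1.0])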

The form of the vector-functions g(t, x, u) and g(t, x, u, ξ(t)) on the right in (1.1.5) and (1.1.6) is determined by the physical nature of the plant. In the subsequent chapters, we consider various special cases of Eqs. (1.1.5)

Page 26: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems 9

and (1.1.6) and solve some specific control problems for mechanical, technical, and biological objects. In the present chapter, we only discuss general restrictions that we need to impose on the function g(·) in (1.1.5) and (1.1.6) to obtain a well-posed mathematical statement of the problem of optimal control synthesis.

The most important and, in fact, the only restriction on the function g(·) is the existence of a unique solution to the Cauchy problem for Eqs. (1.1.5) and (1.1.6) with any given control function u(t) chosen from a function class that is called the class of admissible controls. This means that the trajectory x(t) of system (1.1.5) or (1.1.6) is uniquely determined² on the time interval t₀ ≤ t ≤ t₀ + T by the initial state x(t₀) = x₀ and a chosen function {u(t): t₀ ≤ t ≤ t₀ + T}.

The uniqueness of the solution x(t) of system (1.1.5) with the initial condition x(t₀) = x₀ is guaranteed by well-known existence and uniqueness theorems for systems of ordinary differential equations [137]. The following theorem [156] presents very general sufficient conditions for the existence and uniqueness of the solution of system (1.1.5) with the initial condition x(t₀) = x₀ (the Cauchy problem).

THEOREM. Let a vector-function g(t, x, u) be continuous with respect to all variables (t, x, u) and continuously differentiable with respect to the components of the vector x = (x₁, ..., x_n), and let the vector-function u = u(t) be continuous with respect to time. Then there exists a number T > 0 such that a unique continuous vector-function x(t) satisfies system (1.1.5) with the initial condition x(t₀) = x₀ on the interval t₀ ≤ t ≤ t₀ + T.

If T → ∞, that is, if the domain of existence of the unique solution is arbitrarily large, then the solution of the Cauchy problem is said to be infinitely continuable to the right.

It should be noted that the functions g(·) and u need not be continuous with respect to t. The theorem remains valid for piecewise continuous and even for bounded functions g(·) and u that are measurable with respect to t. In the latter case, the solution x(t): t₀ ≤ t ≤ t₀ + T of system (1.1.5) is an absolutely continuous function [91].

The assumption that the function g(·) is smooth with respect to the components of the vector x is much more essential. If this condition is not satisfied, then we can encounter situations in which system (1.1.5) does not have any solutions in the "common" classical sense (for example, for some initial vectors x(t₀) = x₀, it may be impossible to construct a function

²The solution of the stochastic differential equation (1.1.6) is a stochastic process x(t). The uniqueness of the solution to (1.1.6) is understood in the sense that the initial condition x(t₀) = x₀ and the control function u(t): t₀ ≤ t ≤ t₀ + T uniquely determine the probability characteristics of the random variables x(t) for all t ∈ (t₀, t₀ + T].

Page 27: Optimal Design of Control Systems Stochastic and Deterministic Problems

10 Chapter I

x(t) that identically satisfies (1.1.5) on an arbitrarily small finite interval t₀ ≤ t ≤ t₀ + T).

It is significant that we cannot exclude such seemingly "exotic" cases from our consideration. As was already noted, the control function u on the right-hand side of (1.1.5) can be defined either as a control program (that is, as a function of time) or in the synthesis form, for example, in the form u = φ(t, x(t)) as in (1.1.4). It is well known (this will be illustrated by numerous special examples considered later) that many problems of optimal control with control constraints often result in control algorithms u* = φ(t, x(t)) in which the synthesis function φ is discontinuous with respect to the phase variables x. In this case the assumptions of the above-cited theorem may be violated even if the vector-function g(t, x, u) in (1.1.5) is continuously differentiable with respect to x.
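A standard one-dimensional illustration of this difficulty (our example, not one worked in the book): for the closed-loop equation

\[
\dot{x} = -\operatorname{sign} x, \qquad x(0) = x_0 \neq 0,
\]

every classical solution reaches x = 0 at the finite time t = |x₀| and cannot be continued classically: a trajectory leaving zero into x > 0 would need ẋ = −1 < 0, and into x < 0 would need ẋ = +1 > 0, a contradiction either way. The natural continuation x(t) ≡ 0 is exactly a generalized (sliding) solution of the kind introduced next.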

Now let us generalize the notion of the solution to the case of discontinuous (with respect to x) right-hand sides of Eqs. (1.1.5). Here we discuss only the basic ideas for constructing generalized solutions. The detailed and rigorous theory of generalized solutions of equations with discontinuous right-hand sides can be found in Filippov's monograph [54].

We assume that in (1.1.5) the control function u has the synthesis form (1.1.4). Then, by setting g̃(t, x) = g(t, x, φ(t, x)), we can rewrite (1.1.5) as follows:

ẋ = g̃(t, x). (1.1.7)

In the space of variables (t, x), we choose a domain D on which we need to construct the solution of system (1.1.7). Suppose that a twice continuously differentiable surface S divides the domain D into two domains D⁺ and D⁻, and that some vector-functions g̃₊ and g̃₋, continuous in t and continuously differentiable in x₁, x₂, ..., x_n, are defined on D⁺ + S and on D⁻ + S so that g̃ = g̃₊ in D⁺ and g̃ = g̃₋ in D⁻. In this case, the solution of (1.1.7) on the domain D⁻ can be uniquely continued till the surface S. If the vector g̃ is directed towards the surface S in D⁻ and away from the surface S in D⁺, then the solution goes from D⁻ to D⁺, intersecting the surface S only once (Fig. 4). But if the vector g̃ is directed towards the surface S both in D⁻ and in D⁺, then the solution, once coming to S, can leave it neither to D⁻ nor to D⁺. Therefore, there is a problem of continuation of this solution. In [54] it is assumed that after the solution x(t) comes to the surface S, the subsequent motion of system (1.1.7) is realized along the surface S with velocity

ẋ = g̃₀(t, x) ≡ αg̃₊(t, x) + (1 − α)g̃₋(t, x), (1.1.8)

where x ∈ S and the number α (0 ≤ α ≤ 1) are chosen so that the vector g̃₀(t, x) is tangent to the surface S at the point x.

The vector g̃₀(t, x) in (1.1.8) can be constructed in the following way.

Page 28: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems

At the point x ∈ S we construct the vectors g̃₊(t, x) and g̃₋(t, x) and connect their endpoints with a straight line. The point of intersection of this straight line with the plane tangent to S at the point x is the endpoint of the desired vector g̃₀(t, x) (Fig. 5).
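Analytically (a routine computation we add for convenience; it is not displayed in the original): if near the point of interest S = {x : s(x) = 0} with normal vector n(x) = ∂s/∂x, the tangency requirement ⟨n, g̃₀⟩ = 0 applied to (1.1.8) gives

\[
\alpha(t, x) = \frac{\langle n(x),\, \tilde{g}_-(t, x) \rangle}{\langle n(x),\, \tilde{g}_-(t, x) - \tilde{g}_+(t, x) \rangle},
\]

which indeed lies in [0, 1] exactly when ⟨n, g̃₋⟩ and ⟨n, g̃₊⟩ have opposite signs, i.e., when both fields point towards the surface.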

A function x(t) satisfying Eq. (1.1.7) in D⁺ and in D⁻ and satisfying Eq. (1.1.8) on the surface S is called the generalized solution of Eq. (1.1.7) or a solution in the sense of Filippov.

This definition makes sense, since a solution in the sense of Filippov is the limit of a sequence of classical solutions to Eq. (1.1.7) with smoothed (in

Page 29: Optimal Design of Control Systems Stochastic and Deterministic Problems

12 Chapter I

x) right-hand sides g̃_k(t, x) if g̃_k(t, x) → g̃(t, x) as k → ∞. Moreover, the sequence x_k(t) of classical solutions of equations with retarded argument

ẋ_k(t) = g̃(t, x_k(t − τ_k))

uniquely converges to the same limit if the delay τ_k → 0 as k → ∞ (see [54]). We also note that, in practice, solutions in the sense of Filippov can

be realized in some technical, mechanical, and other systems of automatic control, which are sometimes called systems with variable structure [46]. In such systems, the plant is described by Eq. (1.1.5), and the control vector u makes a jump when the phase vector x(t) intersects a given switching surface S. In such systems, if the motion is along the switching surface, the critical segments of the trajectory can be realized by infinitely fast switching of the control. In the theory of automatic control such regimes are called "sliding modes" [2, 46].
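To see a sliding mode emerge concretely, the following sketch (our illustration; the plant and switching law are hypothetical, not taken from the book) integrates a double integrator with the discontinuous control u = −sign s(x); with a small time step the trajectory chatters about the switching line s(x) = 0, approximating the Filippov motion along it.

    import numpy as np

    def simulate_variable_structure(T=5.0, dt=1e-4):
        """Double integrator x1' = x2, x2' = u with switching control
        u = -sign(s), s = x1 + x2.  On the line s = 0 the closed-loop
        vector field points towards the line from both sides (for |x2| < 1),
        so the numerical trajectory chatters about it: a sliding mode."""
        n = int(T / dt)
        x = np.array([1.0, 0.0])
        path = np.empty((n + 1, 2))
        path[0] = x
        for k in range(n):
            s = x[0] + x[1]              # switching surface s(x) = x1 + x2
            u = -np.sign(s)              # discontinuous (bang-bang) control
            x = x + dt * np.array([x[1], u])
            path[k + 1] = x
        return path

    path = simulate_variable_structure()
    # Once the trajectory hits s = 0, x1 + x2 stays near zero: the motion
    # "slides" along the surface, with x1(t) decaying like exp(-t) there.
    print(abs(path[-1, 0] + path[-1, 1]))    # small residual, O(dt)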

Generalized solutions in the sense of Filippov allow us to construct the unique solution of the Cauchy problem for Eq. (1.1.5) with function g(t, x, u) piecewise continuous in x.

Now let us consider the stochastic differential equations (1.1.6). We have already pointed out that these equations differ substantially from ordinary differential equations of the form (1.1.5); the special properties of Eqs. (1.1.6) are studied in §1.2. Here we only briefly dwell on the nature of the special properties of these equations.

The stochastic differential equations (1.1.6) have the following fundamental characteristic property. If the random function ξ(t) on the right-hand side of (1.1.6) is a stochastic process of the "white noise" type, then the Cauchy problem for (1.1.6) can have an infinite (more than countable) set of different solutions. Everything depends on how we understand the solution of (1.1.6) or, in other words, on how we construct the random function x(t) that satisfies the corresponding Cauchy problem for (1.1.6). It turns out that in this case we can propose infinitely many well-defined solutions of equation (1.1.6).

This situation may give the impression that the differential equations (1.1.6) do not make any sense. However, since control systems perturbed by a white noise play an important role, it is necessary to specify how the dynamics of a system is described in this case and in which sense Eq. (1.1.6) must be understood if it is still used.

On the other hand, the existence and uniqueness of the solution to the Cauchy problem for equations of the forms (1.1.5) and (1.1.6) is the basic assumption that allows us to use the dynamic programming approach for solving problems of optimal control synthesis.

In §1.2 we discuss these and some other topics.


1.1.2. Goal of control. The requirements imposed on a designed control system determine the form of the functional (the optimality criterion), which is a numerical estimate of the control process. Let us consider some typical problems of optimal control and write out the cost functionals needed to state these problems.

We begin with deterministic problems in which the plant is described by the system of differential equations (1.1.5). First, we assume that the time interval 0 ≤ t ≤ T (on which we consider the control process) is fixed and the initial position of the plant is given, that is, x(0) = x₀, where x₀ is a vector of given numbers. Such problems are called control problems with variable right endpoint of the trajectory. Suppose that it is required to construct an optimal servomechanism (see Fig. 2) for which the input command signal y(t): 0 ≤ t ≤ T is a known function of time. If the goal of the servomechanism shown in Fig. 2 is to reproduce the input function y(t) via the output function x(t): 0 ≤ t ≤ T as closely as possible, then one possible criterion for estimating the performance of this servomechanism is the integral

$$\int_0^T |x(t) - y(t)|^p\, dt, \qquad (1.1.9)$$

where p is a given positive number and |a| denotes the Euclidean norm of a vector a, that is, $|a| = (\sum_{j=1}^n a_j^2)^{1/2}$. In an "ideal" servomechanism, the controlled output process is identically equal to the command signal, that is, x(t) ≡ y(t), 0 ≤ t ≤ T, and the functional (1.1.9) is equal to zero, which is its least possible value. In other cases, the value of (1.1.9) is a numerical estimate of the proximity between the input and output processes.

It may happen that much "effort" is required to ensure sufficient proximity between the processes x(t) and y(t), that is, a large control action u(t) is needed at the input of the plant P. However, in many actual devices it is undesirable to use too "large" controls, both for energy and economy reasons and for reliability considerations. In these cases, instead of (1.1.9), it is better to use, for example, the cost functional

$$\int_0^T \bigl(|x(t) - y(t)|^p + a|u(t)|^q\bigr)\, dt, \qquad (1.1.10)$$

where a, q > 0 are given numbers. This functional takes into account both the proximity between the output process x(t) and a given input process y(t) and the total "cost" of control on the time interval [0, T].

Of course, the functionals (1.1.9) and (1.1.10) do not exhaust all methods for stating integral optimality criteria used in problems of synthesis of optimal servomechanisms (Fig. 2). The most general form of integral criteria can be obtained by using the penalty functions introduced by Wald [188]. Suppose that each current state of the system shown in Fig. 2, characterized by the set of vectors (x(t), y(t), u(t)), is "penalized" by a given nonnegative scalar function c(x, y, u) of these arguments. If c(x, y, u) has the meaning of specific penalties per unit time, then the functional

$$I_1[u] = \int_0^T c\bigl(x(t), y(t), u(t)\bigr)\, dt \qquad (1.1.11)$$

is a natural performance criterion on the time interval [0, T]. Obviously, the functionals (1.1.9) and (1.1.10) are special cases of (1.1.11) in which the penalty function c is defined as c(x, y, u) = |x − y|^p or c(x, y, u) = |x − y|^p + a|u|^q, respectively.
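As a small computational illustration (not from the original text), the integral criterion (1.1.11) can be approximated for sampled trajectories by a Riemann sum; the penalty used below is the special case c(x, y, u) = |x − y|^p + a|u|^q of (1.1.10), and all names are illustrative:

import numpy as np

def integral_criterion(x, y, u, dt, p=2.0, a=0.1, q=2.0):
    # Riemann-sum approximation of (1.1.11) with the penalty of (1.1.10):
    # c(x, y, u) = |x - y|^p + a*|u|^q, trajectories sampled with step dt.
    track = np.linalg.norm(x - y, axis=1) ** p
    effort = a * np.linalg.norm(u, axis=1) ** q
    return float(np.sum(track + effort) * dt)

# Toy data: scalar trajectories on [0, T] with step dt.
dt, T = 0.01, 1.0
t = np.arange(0.0, T, dt)
y = np.sin(t)[:, None]          # command signal
x = np.sin(t - 0.05)[:, None]   # slightly lagging output
u = np.cos(t)[:, None]          # control record
print(integral_criterion(x, y, u, dt))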

Another class of optimal control problems is formed by problems of terminal control. Such problems appear when the character of the transition processes in the system is not essential for 0 < t < T and we are interested only in the state of the system at the terminal moment of time T. In this case, using the corresponding penalty function ψ(x, y), we obtain the terminal optimality criterion

$$I_2[u] = \psi\bigl(x(T), y(T)\bigr). \qquad (1.1.12)$$

It should be noted that, by an appropriate extension of the phase vector x, we can rewrite the integral criterion (1.1.11) in the form (1.1.12). Thus, from the mathematical viewpoint, the integral criterion (1.1.11) is a special case of the terminal criterion (1.1.12) (see [1, 34, 137]). Nevertheless, we distinguish these criteria in the sequel, since they have different meanings in applications.

In addition to (1.1.11) and (1.1.12), we often use their combination

$$I_3[u] = \int_0^T c\bigl(x(t), y(t), u(t)\bigr)\, dt + \psi\bigl(x(T), y(T)\bigr); \qquad (1.1.13)$$

this criterion depends both on the transition process and on the terminal state of the system.

If the worst (with respect to a chosen penalty function) state of the controlled system on a fixed time interval [0, T] is the crucial factor, then, instead of (1.1.11), we must use the criterion

$$I_4[u] = \max_{0 \le t \le T} c\bigl(x(t), y(t), u(t)\bigr). \qquad (1.1.14)$$

An optimal system constructed by minimization of the criterion (1.1.14) provides the best (in comparison with any other system) result only in the worst operating mode. Criteria of the form (1.1.14) were studied in [16, 40, 92, 148].


If a dynamic system described by Eqs. (1.1.5) is totally controllable,³ then optimal control problems with fixed endpoints or with some fixed terminal set are often considered together with control problems with variable right endpoint of the trajectory. In these problems, the control time T is not fixed in advance, and, as admissible controls, we take the control functions u(t): 0 ≤ t ≤ T that transfer system (1.1.5) from a given initial state x(0) = x₀ to a fixed terminal state x(T) = x₁ or to a fixed terminal set. An admissible control u*(t): 0 ≤ t ≤ T is optimal if the integral functional (1.1.11) attains its minimum at u*(t).

The control problems with fixed endpoints contain time-optimal problems as an important special case. In these problems, the penalty function in (1.1.11) is c ≡ 1, and the minimized functional (1.1.11) is equal to the transition time I₁[u] = T from the state x₀ to the state x₁ (or to a given terminal set). Time-optimal problems find wide application in mechanics, physics, technology, etc. (see [1, 24, 85, 90, 123, 137, 156]). It should be noted that in due time it was precisely the time-optimal problems that greatly influenced the formation of the theory of optimal control as a subject of independent study.

Most of the cited optimal control problems can readily be generalized to the stochastic case in which the plant is described by the stochastic differential equations (1.1.6). It only remains to note that in this case each of the functionals (1.1.11)-(1.1.14) is a random variable for any fixed control algorithm u(t) (given, say, in the form (1.1.3)). Such random variables are characterized primarily by their mean values, which determine the mean "losses" or "costs" of control if the control algorithm (1.1.3) is repeated many times. The mean values

$$I_5[u] = \mathrm{E} I_1[u], \qquad I_6[u] = \mathrm{E} I_2[u], \qquad I_7[u] = \mathrm{E} I_3[u], \qquad (1.1.15)\hbox{--}(1.1.17)$$

$$I_8[u] = \mathrm{E} I_4[u] = \mathrm{E} \max_{0 \le t \le T} c\bigl(x(t), y(t), u(t)\bigr) \qquad (1.1.18)$$

of the functionals (1.1.11)-(1.1.14) usually serve as optimality criteria in stochastic problems of optimal control.

³According to [78, 111], the system (1.1.5) is called totally controllable if for any two given vectors x₀ and x₁ there always exist a finite time T > 0 and an admissible control function u(t): 0 ≤ t ≤ T such that system (1.1.5) is transferred from the initial state x(0) = x₀ to the terminal state x(T) = x₁ during time T.


(In (1.1.15)-(1.1.18) and in what follows, EA denotes the mean value (the mathematical expectation) of A.)

In controlled stochastic systems, we often encounter control problems in which the terminal moment is random, for example, the optimal halting problems [113, 132]. In these problems, we have an additional optimization parameter, namely, the random terminal time τ of the process. Therefore, the optimality criterion depends both on the control actions u = u(t): 0 ≤ t ≤ τ and on τ as follows:

$$I_9[u, \tau] = \mathrm{E}\Bigl[\int_0^\tau c\bigl(x(t), y(t), u(t)\bigr)\, dt + \psi\bigl(x(\tau), y(\tau)\bigr)\Bigr]. \qquad (1.1.19)$$

There is another type of problems with a random terminal time of the process. Suppose that D ⊂ R_{n+m} is a subset of the Cartesian product R_{n+m} = R_n × R_m of the Euclidean spaces of the vectors x and y. Suppose that τ_D is the time instant at which the point (x(t), y(t)) reaches the boundary ∂D of the set D for the first time.

Then we can state the problem [34, 113] of finding the control law (1.1.3) for which the functional

$$I_{10}[u] = \mathrm{E}\Bigl[\int_0^{\tau_D} c\bigl(x(t), y(t), u(t)\bigr)\, dt + \psi\bigl(x(\tau_D), y(\tau_D)\bigr)\Bigr] \qquad (1.1.20)$$

attains an extremum. The functional

$$I_{11}[u] = \mathrm{E}\,\tau_D \qquad (1.1.21)$$

is a special case of (1.1.20) with c(·) ≡ 1 and ψ(·) ≡ 0; the value of this functional is equal to the mean time during which the point (x, y) reaches the boundary ∂D of the set D. If the criterion I₁₁ is used, then the goal of control depends on whether the initial state (x(0), y(0)) of the system is an interior point of D or, vice versa, (x(0), y(0)) ∈ R_{n+m} \ D. If (x(0), y(0)) ∈ D, then, as a rule, we need to maximize (1.1.21) (see §2.2); otherwise ((x(0), y(0)) ∉ D), the goal of control is to minimize (1.1.21). The latter problem is a stochastic version of the time-optimal problem [1, 85].

The criteria I₁, ..., I₁₁ considered above do not exhaust all possible statements of optimal control problems. Other statements can be found in the literature on control theory [1, 3, 5, 24, 34, 43, 58, 111, 112, 128, 156]. The choice of the criterion depends on practical needs, that is, on the specific technical problem that arises at the design stage. It should be noted that in the mathematical approach to optimal systems more attention is paid to general problems of the qualitative theory (the existence and uniqueness of the solution, justification of the optimality principle,


estimates and asymptotics of solutions, etc.), while the choice of a criterion is not very essential. Moreover, by introducing some additional variables, one can transform different optimality criteria to some standard form, for example, to the integral form I₁ (I₅) or to the terminal form I₂ (I₆). The situation is different if some quantitative calculations of the optimal control block (controller) C are required, that is, if we need to write the algorithm (1.1.3) explicitly. The complexity, and sometimes the method, of the calculations significantly depend on the optimality criterion. On the other hand, criteria of different form may lead to optimal systems that are close to each other (see §2.2). In such cases, it may be useful to replace the original criterion by a new one that simplifies the calculation of the optimal algorithm (1.1.3) but does not change it essentially. Problems of the rational choice of optimality criteria are considered in [50, 155].

In the present monograph, we assume that an optimality criterion is already chosen from some considerations and is a known functional of trajectories of the system.

1.1.3. Constraints in control problems. In the design of actual control systems (both open- and closed-loop systems, see Figs. 1-3), it is often required to take into account some restrictions on the parameters of the control process. For example, in many problems some constraints are imposed on the control actions u. Suppose that only the magnitude of the control action is important. Then the control function u is admissible if for all t it takes values from some bounded set U of the control space, that is,

$$u(t) \in U. \qquad (1.1.22)$$

In particular, (1.1.22) can take the form of the constraints (1.1.23) and (1.1.24) bounding the control vector and its components, where u₀, pᵢ, and k are given positive numbers. Constraints of the form (1.1.22)-(1.1.24) reflect the fact that control

actions of any physical nature (force, torque, electric voltage, potential, heat quantity, concentration, etc.) always vary in a bounded range. The control is not allowed to take large values, since this may result in mechanical breakdowns, damage to electric circuits, etc.

Constraints of integral character are also possible. Sometimes they are called constraints on control resources. In this case, the admissible control functions u(t) must satisfy the condition

$$\int_0^T |u(t)|^2\, dt \le Q, \qquad (1.1.25)$$

where Q > 0 is a given number. Problems with constraints (1.1.25) were considered in [22, 34].

The same technical considerations often show that it is necessary to impose some restrictions on the domain of admissible values of the phase variables x. If X ⊂ R_n is the set of possible values of x, then the related constraint on the phase trajectories x(t) can be of the form

$$x(t) \in X, \qquad (1.1.26)$$

which is similar to (1.1.22). Constraints of the form (1.1.22)-(1.1.26) are of considerable and sometimes of decisive importance in problems of optimal control. Indeed, control problems often make sense only under constraints of the form (1.1.22)-(1.1.25). To see this, let us consider a control problem in which the plant is described by system (1.1.5) and the control performance is estimated by the integral optimality criterion (1.1.11) with a penalty function independent of y and u:

$$I[u] = \int_0^T c\bigl(x(t)\bigr)\, dt. \qquad (1.1.27)$$

Suppose that the penalty function c(x) attains its minimum value at x = x₀ (we can always assume that this minimum value is zero). Then, by using arbitrarily large controls u (admissible in the absence of the constraints (1.1.22)-(1.1.25)), we can obtain a trajectory of motion x(t) that is arbitrarily close to x(t) ≡ x₀ = const (it is assumed that system (1.1.5) is controllable [78, 111] and the current state of system (1.1.5) can be measured exactly). Thus, if the control function is unbounded, the functional (1.1.27) can be arbitrarily close to zero, its absolute minimum. But if the control u(t) is bounded by one of the conditions (1.1.22)-(1.1.25), then the functional (1.1.27) never takes the zero value for x(0) ≠ x₀, and the minimization problem for (1.1.27) is nondegenerate.

In some cases, restrictions (1.1.26) on the phase variables allow us to improve the mathematical model of the control process and to describe the actual situation more precisely. Let us consider an illustrative example. Suppose that the plant is a servomotor with bounded speed. The equation of motion has the form

$$\dot x = u, \qquad |u| \le u_0, \qquad (1.1.28)$$


where x and u are scalars. Suppose that, solving the synthesis problem, we obtain an optimal control algorithm of the relay type:

$$u_*(t, x) = u_0\, \mathrm{sign}\bigl(x - x_0(t)\bigr), \qquad \mathrm{sign}\, y = \begin{cases} +1, & y > 0,\\ 0, & y = 0,\\ -1, & y < 0, \end{cases} \qquad (1.1.29)$$

where x₀(t) is a given function of time. In this algorithm the control action varies instantaneously by a finite value when the difference (x − x₀(t)) changes sign. If an actual control device implementing (1.1.29) has some inertial properties (if, for example, the absolute rate v of change of the control action is bounded by v₀), then it is more convenient to model such a system by a plant whose behavior is described by two phase coordinates x₁ = x and x₂ = u such that

$$\dot x_1 = x_2, \qquad \dot x_2 = v, \qquad |x_2| \le u_0, \qquad |v| \le v_0. \qquad (1.1.30)$$

In this case, v (the rate of change of x₂ = u) is a control parameter, and the control constraint in (1.1.28) becomes a constraint imposed on the phase coordinate x₂ in (1.1.30).
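The following Python sketch (illustrative, not from the original text) simulates the rate-limited model (1.1.30) under a relay command of the type (1.1.29); the sign of the command is chosen here so that the example tracks x₀(t), and all parameter values are assumptions:

import numpy as np

def simulate_relay(x0_fun, u0=1.0, v0=5.0, dt=1e-3, T=2.0, x_init=0.5):
    # Euler simulation of (1.1.30): the relay command is chased by x2,
    # whose slew rate is bounded by v0 and whose value is bounded by u0.
    n = int(T / dt)
    x1, x2 = x_init, 0.0
    xs = np.empty(n)
    for i in range(n):
        t = i * dt
        u_cmd = -u0 * np.sign(x1 - x0_fun(t))      # relay law of the type (1.1.29)
        v = np.clip((u_cmd - x2) / dt, -v0, v0)    # bounded rate |v| <= v0
        x2 = np.clip(x2 + v * dt, -u0, u0)         # phase constraint |x2| <= u0
        x1 += x2 * dt                              # plant (1.1.28): dx/dt = u
        xs[i] = x1
    return xs

trajectory = simulate_relay(x0_fun=lambda t: 0.0)
print(trajectory[-1])   # x stays near the switching curve x0(t) = 0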

1.1.4. Probability characteristics of stochastic processes. As was already pointed out in the Introduction, in the present monograph we consider stochastic processes under the assumption that all random actions on the system (the random disturbances shown in Fig. 3) are either white noises or processes of Markov type. We restrict our consideration of methods for the mathematical description of such processes to a rather elementary presentation of the related notions and problems. The rigorous theory of Markov processes based on measure theory can be found in the monographs [44, 45].

A stationary scalar stochastic process ξ(t) is called the standard white noise if it is Gaussian and has zero mean and a delta-type correlation function,

$$\mathrm{E}\xi(t) = 0, \qquad \mathrm{E}\xi(t)\xi(t - \tau) = \delta(\tau). \qquad (1.1.31)$$

In (1.1.31), δ(τ) denotes the Dirac delta-function, which is zero for τ ≠ 0 and becomes infinite for τ = 0 (see [65, 91]). Moreover, any continuous function f(t) satisfies the relation

$$\int_a^b f(t)\,\delta(t - t_0)\, dt = \begin{cases} f(t_0), & a < t_0 < b,\\ f(b)/2, & t_0 = b,\\ f(a)/2, & t_0 = a,\\ 0, & t_0 < a \ \text{or}\ t_0 > b. \end{cases} \qquad (1.1.32)$$


Various nonstationary generalizations of the notion of white noise are combinations (obtained by multiplication and addition) of the standard process (1.1.31) and some deterministic functions of time.

Obviously, a Gaussian stochastic process with characteristics (1.1.31) is a mathematical idealization. Such a process cannot be physically realized since, as we can see from (1.1.31), it has the infinite variance

$$\mathrm{D}\xi(t) = \mathrm{E}\xi^2(t) = \delta(0) = \infty,$$

and hence, to realize this process, we would need a source of infinite power. Therefore, a process of the white noise type can be considered on some time interval [0, T] as the limit (as Δ → 0) model of a time sequence of independent identically distributed random variables ξᵢ = ξ(tᵢ = iΔ) (i = 0, 1, ..., N, N = T/Δ) with probability density

$$p(\xi_i) = \sqrt{\Delta/2\pi}\, \exp\{-\Delta \xi_i^2 / 2\}. \qquad (1.1.33)$$

From (1.1.33) we can see that Dξᵢ = 1/Δ → ∞ as Δ → 0. This means that on any arbitrarily small finite time interval, with probability 1, a realization of the white noise takes values both larger and smaller than any fixed number. Thus the white noise is a stochastic process that oscillates extremely fast and with infinite amplitude about its mean value. If we try to draw a realization of the white noise on the time interval [t₀, t₀ + T], then this realization completely fills an infinite band parallel to the x-axis, as shown in Fig. 6.
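A minimal numerical sketch (not in the original) of the discrete model behind (1.1.33): samples ξᵢ ~ N(0, 1/Δ) on a grid of step Δ, whose empirical variance grows like 1/Δ while the integral over [0, T] stays of order one:

import numpy as np

rng = np.random.default_rng(0)
T = 1.0
for dt in (1e-1, 1e-2, 1e-3):
    n = int(T / dt)
    xi = rng.normal(0.0, 1.0 / np.sqrt(dt), size=n)  # D(xi_i) = 1/dt
    # The empirical variance blows up like 1/dt, but the "integral"
    # sum(xi)*dt approximates eta(T) - eta(0) ~ N(0, T) and stays finite.
    print(dt, xi.var(), xi.sum() * dt)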


The white noise is a convenient abstraction of actual stochastic processes. This model is of great advantage for mathematical calculations with expressions that contain white noise type processes (in particular, one can readily calculate the mean values of integrals of such processes); this is related to the properties (1.1.32) of the delta-function. In mathematical investigations, actual stochastic processes ξ̃(t) with finite correlation time τ_cor can be replaced by white noise type processes if τ_cor ≪ T_s, where T_s is either the characteristic time constant or the transient time in the system under the action of ξ̃(t) (for more details, see [173, 181]). A rigorous description of generalized stochastic processes of the white noise type can be found in [63, 157].

A multidimensional generalization of the standard white noise is an n-dimensional vector-column of random functions ξ(t) whose components ξᵢ(t) (i = 1, ..., n) are independent Gaussian stochastic processes with characteristics (1.1.31). Thus, instead of (1.1.31), the n-dimensional standard white noise is characterized by

$$\mathrm{E}\xi(t) = 0, \qquad \mathrm{E}\xi(t)\xi^{\mathrm T}(t - \tau) = E\,\delta(\tau), \qquad (1.1.34)$$

where E is the n × n identity matrix and the superscript T indicates the transpose of a matrix.

Now let us consider methods for defining Markov processes. A stochastic process ξ(t): 0 ≤ t ≤ T with values in some set (the phase space) X is called a Markov process, or a process without aftereffect, if its conditional probabilities satisfy the relation

$$\mathrm{P}\{\xi(t_n) \in G \mid \xi(t_1), \dots, \xi(t_{n-1})\} = \mathrm{P}\{\xi(t_n) \in G \mid \xi(t_{n-1})\} \qquad (1.1.35)$$

for any instants of time t₁ < t₂ < ⋯ < tₙ from [0, T] and any subset G ⊂ X.

Formula (1.1.35) means that the probabilities of future values of a Markov process are completely determined by the last measured state of the process and are independent of the preceding states (the absence of aftereffect). Depending on whether the sets [0, T] and X are discrete or continuous, we distinguish discrete Markov sequences, or Markov chains (the sets [0, T] and X are discrete), continuous sequences (the set [0, T] is discrete and the set X is continuous), and discrete Markov processes (the set [0, T] is continuous and the set X is discrete). But if the phase space X is continuous and the argument t of the stochastic process ξ(t) may take any value t ∈ [0, T], then we have the following two types of Markov processes: continuous (all sample paths of the process ξ(t): 0 ≤ t ≤ T are continuous functions of time with probability 1) and strictly discontinuous (all sample paths of the process ξ(t): 0 ≤ t ≤ T are step-functions, while the moments and amplitudes of jumps are continuous random variables).


There exist more complicated Markov processes that are combinations of the processes listed above [181]. Numerous papers and monographs deal with various types of Markov processes (see, for example, [11, 36, 38, 67, 160]). In the present monograph, we consider the following three types of stochastic processes in most detail: discrete, continuous, and strictly discontinuous Markov processes.

Discrete Markov processes. As was already noted, in this case the time is continuous, but the phase space X is discrete. We assume that the set X consists of finitely many elements x₁, ..., x_α, ..., x_m. At each instant of time t ∈ [0, T] (possibly, T = ∞), the process ξ(t) takes one of the m possible values x_α with probability P_α(t), α = 1, ..., m. The transitions to new states are instantaneous and take place at random moments. Thus a sample path of the process ξ(t) is a step-function of time, as shown in Fig. 7. Suppose that the process is in the state ξ(t) = x_α at time t. Then it follows from (1.1.35) that the probability of the event that the process comes to the state ξ(τ) = x_β at time τ > t depends only on t, τ, x_α, and x_β. The corresponding conditional probability

$$P_{\alpha\beta}(t, \tau) = \mathrm{P}\{\xi(\tau) = x_\beta \mid \xi(t) = x_\alpha\}, \qquad (1.1.36)$$

which is usually called the transition probability, is an important characteristic of the Markov process ξ(t).

The unconditional probabilities P_α(t) = P{ξ(t) = x_α}, α = 1, ..., m, and the functions (1.1.36) describe the process ξ(t) completely.⁴ Actually, the

⁴If the probabilities P{ξ(t₁) = x_α, ξ(t₂) = x_β, ..., ξ(tₙ) = x_ω} are known for any (t₁, t₂, ..., tₙ) ∈ [0, T] and for any set of indices (α, β, ..., ω), then the stochastic process is said to be well defined.


probability multiplication theorem [52, 67] and the Markov property of the process ξ(t) imply that for any t₁ < t₂ < ⋯ < tₙ and α, β, ..., ω = 1, ..., m the probability of the event {ξ(t₁) = x_α, ξ(t₂) = x_β, ..., ξ(tₙ) = x_ω} can be expressed in terms of the functions P_α(t) and P_{αβ}(t, τ) as follows:

$$\mathrm{P}\{\xi(t_1) = x_\alpha,\ \xi(t_2) = x_\beta,\ \dots,\ \xi(t_n) = x_\omega\} = P_\alpha(t_1)\, P_{\alpha\beta}(t_1, t_2) \cdots P_{\psi\omega}(t_{n-1}, t_n). \qquad (1.1.37)$$

On the other hand, the functions P_α(t) and P_{αβ}(t, τ) can be obtained as solutions of certain systems of ordinary differential equations.

Let us derive the corresponding equations for P_α(t) and P_{αβ}(t, τ). To this end, we first obtain the Chapman-Kolmogorov equation for the transition probabilities:

$$P_{\alpha\beta}(t, \tau) = \sum_{\gamma=1}^m P_{\alpha\gamma}(t, \sigma)\, P_{\gamma\beta}(\sigma, \tau), \qquad t \le \sigma \le \tau. \qquad (1.1.38)$$

We write formula (1.1.37) for the three instants of time t, σ, and τ as follows:

$$\mathrm{P}\{\xi(t) = x_\alpha,\ \xi(\sigma) = x_\gamma,\ \xi(\tau) = x_\beta\} = P_\alpha(t)\, P_{\alpha\gamma}(t, \sigma)\, P_{\gamma\beta}(\sigma, \tau). \qquad (1.1.39)$$

Since

$$\sum_{\gamma=1}^m \mathrm{P}\{\xi(t) = x_\alpha,\ \xi(\sigma) = x_\gamma,\ \xi(\tau) = x_\beta\} = \mathrm{P}\{\xi(t) = x_\alpha,\ \xi(\tau) = x_\beta\}, \qquad (1.1.40)$$

we write the right-hand side of (1.1.40) in the form

$$\mathrm{P}\{\xi(t) = x_\alpha,\ \xi(\tau) = x_\beta\} = P_\alpha(t)\, P_{\alpha\beta}(t, \tau) \qquad (1.1.41)$$

and, substituting (1.1.39) and (1.1.41) into (1.1.40), obtain Eq. (1.1.38) after P_α(t) is canceled out.

To derive differential equations for P_{αβ}(t, τ), we need some local time characteristics of the Markov process ξ(t). If we assume that there is at most one change of the state of the process ξ(t) on a small time interval Δ,⁵ then for small τ − t we can write the transition probabilities P_{αβ}(t, τ) as follows:

$$P_{\alpha\beta}(t, \tau) = (\tau - t)\,\lambda_{\alpha\beta}(t) + o(\tau - t), \quad \beta \ne \alpha; \qquad P_{\alpha\alpha}(t, \tau) = 1 - (\tau - t)\,\lambda_\alpha(t) + o(\tau - t) \qquad (1.1.42)$$

⁵This is the well-known condition that the process ξ(t) is ordinary [157, 160], which means that the probability of two and more jumps of ξ(t) on a small time interval Δ is o(Δ).


(in (1.1.42), as everywhere in the following, o(Δ) denotes expressions of higher order than the infinitesimal Δ; that is, o(Δ) is a scalar function such that lim_{Δ→0} o(Δ)/Δ = 0).

The normalization condition Σ_{β=1}^m P_{αβ}(t, τ) = 1 for the transition probability and formula (1.1.42) imply that

$$\lambda_\alpha(t) = \sum_{\beta \ne \alpha} \lambda_{\alpha\beta}(t). \qquad (1.1.43)$$

As is known [160, 181], the parameters λ_{αβ}(t) determine the intensity of jumps of the process ξ(t). The variable λ_α(t) defined by (1.1.43) is often called the exit intensity or the exit density of the state x_α. It determines the time interval on which the process ξ(t) remains in the state x_α, in the sense that the probability P_τ of changing the state on the time interval [t, t + τ] under the condition ξ(t) = x_α is

$$P_\tau = 1 - \exp\Bigl\{-\int_t^{t+\tau} \lambda_\alpha(s)\, ds\Bigr\}.$$

By setting σ = t + Δ in (1.1.38) and using (1.1.42), we obtain

$$P_{\alpha\beta}(t, \tau) = \bigl(1 - \lambda_\alpha(t)\Delta\bigr)\, P_{\alpha\beta}(t + \Delta, \tau) + \Delta \sum_{\gamma \ne \alpha} \lambda_{\alpha\gamma}(t)\, P_{\gamma\beta}(t + \Delta, \tau) + o(\Delta). \qquad (1.1.44)$$

Dividing (1.1.44) by Δ and passing to the limit as Δ → 0, we arrive at the system of differential equations (α, β = 1, ..., m)

$$\frac{\partial P_{\alpha\beta}(t, \tau)}{\partial t} = \lambda_\alpha(t)\, P_{\alpha\beta}(t, \tau) - \sum_{\gamma \ne \alpha} \lambda_{\alpha\gamma}(t)\, P_{\gamma\beta}(t, \tau) \qquad (1.1.45)$$

for the transition probabilities P_{αβ}(t, τ), considered as functions of the initial time t. Equations (1.1.45) hold for t ≤ τ. The unique solution of system (1.1.45) is determined by the additional conditions on the functions P_{αβ}(t, τ) for t = τ:

$$P_{\alpha\beta}(\tau, \tau) = \delta_{\alpha\beta} = \begin{cases} 1, & \alpha = \beta,\\ 0, & \alpha \ne \beta. \end{cases} \qquad (1.1.46)$$


With respect to the variable τ, the transition probabilities P_{αβ}(t, τ) satisfy the other system of equations (α, β = 1, ..., m)

$$\frac{\partial P_{\alpha\beta}(t, \tau)}{\partial \tau} = -\lambda_\beta(\tau)\, P_{\alpha\beta}(t, \tau) + \sum_{\gamma \ne \beta} \lambda_{\gamma\beta}(\tau)\, P_{\alpha\gamma}(t, \tau), \qquad (1.1.47)$$

which, by analogy with (1.1.45), can be obtained from (1.1.38) with σ = τ − Δ by passing to the limit as Δ → 0. The initial conditions

$$P_{\alpha\beta}(t, t) = \delta_{\alpha\beta}, \qquad (1.1.48)$$

which are similar to (1.1.46), provide the uniqueness of the solution of (1.1.47) defined for τ ≥ t.

Equations (1.1.47) and (1.1.45) are the forward and backward systems of Kolmogorov equations for the transition probabilities. From equations (1.1.47) one can also readily derive equations for the unconditional probabilities P_α(t), α = 1, ..., m. It suffices to multiply (1.1.47) by P_α(t) and sum over α, taking into account the fact that

$$\sum_{\alpha=1}^m P_\alpha(t)\, P_{\alpha\beta}(t, \tau) = P_\beta(\tau).$$

As a result, after we rename β → α and τ → t, we obtain the system of equations (α = 1, ..., m)

$$\frac{dP_\alpha(t)}{dt} = -\lambda_\alpha(t)\, P_\alpha(t) + \sum_{\gamma \ne \alpha} \lambda_{\gamma\alpha}(t)\, P_\gamma(t). \qquad (1.1.49)$$

The initial probabilities P_α(0), α = 1, ..., m, ensure that the solution of system (1.1.49) is unique for t ≥ 0.

Thus an ordinary discrete Markov process is completely determined by the probabilities P_α(0), α = 1, ..., m, of the initial states and by the intensities λ_{αβ}(t), α, β = 1, ..., m, α ≠ β, of jumps. Indeed, if we know these characteristics, then we can find the probabilities P_α(t) and P_{αβ}(t, τ) by solving the systems of linear differential equations (1.1.49) and (1.1.47) (or (1.1.45)). Conversely, if we know the probabilities P_α(t) and P_{αβ}(t, τ), then we can calculate all possible probabilities of the form (1.1.37).
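The following illustrative Python sketch (not part of the original text) integrates the system (1.1.49) for a hypothetical two-state chain with constant intensities λ₁₂ = 2 and λ₂₁ = 1 by the Euler method; P₁(t) should relax to the stationary value λ₂₁/(λ₁₂ + λ₂₁) = 1/3:

import numpy as np

def evolve_probabilities(P0, lam, dt=1e-3, T=10.0):
    # Euler integration of dP_a/dt = -lam_a P_a + sum_{g != a} lam_{ga} P_g,
    # where lam[a, b] is the jump intensity a -> b and lam_a = sum_b lam[a, b].
    P = np.asarray(P0, dtype=float)
    exit_rates = lam.sum(axis=1)                  # lam_a, cf. (1.1.43)
    for _ in range(int(T / dt)):
        P = P + dt * (lam.T @ P - exit_rates * P)
    return P

lam = np.array([[0.0, 2.0],
                [1.0, 0.0]])                      # lam_12 = 2, lam_21 = 1
print(evolve_probabilities([1.0, 0.0], lam))      # -> approximately [1/3, 2/3]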

Continuous Markov processes. These processes are continuous both in the phase space X and with respect to time. On each time interval t₀ ≤ t ≤ t₀ + T, the sample paths ξ(t) of such processes are continuous functions of time with probability 1.


First, let us consider a one-dimensional (scalar) continuous stochastic process. In this case, the phase space X = R₁ is the set of points on the real axis. Since the instant value ξ(t) = x of the process is a continuous random variable, its probability properties can be determined via the probability density function p(x, t). In a similar way, one can use the multidimensional density function p(x₁, x₂, ..., xₙ; t₁, t₂, ..., tₙ) to describe the set of instant values ξ(t₁) = x₁, ξ(t₂) = x₂, ..., ξ(tₙ) = xₙ. A stochastic process ξ(t): 0 ≤ t ≤ T is considered to be determined if we know all possible joint density functions p(x₁, ..., xₙ; t₁, ..., tₙ) for any finite set of time instants t₁, t₂, ..., tₙ on the interval [0, T].

The multidimensional density functions p(x₁, ..., xₙ; t₁, ..., tₙ) are nonnegative functions that satisfy the normalization condition

$$\int \cdots \int p(x_1, \dots, x_n;\, t_1, \dots, t_n)\, dx_1 \cdots dx_n = 1$$

with respect to the variables x₁, ..., xₙ. With respect to the variables t₁, ..., tₙ, these functions satisfy the symmetry conditions [66, 173]

$$p(x_1, \dots, x_n;\, t_1, \dots, t_n) = p(x_{\alpha_1}, \dots, x_{\alpha_n};\, t_{\alpha_1}, \dots, t_{\alpha_n})$$

(here (α₁, α₂, ..., αₙ) is a permutation of the indices 1, 2, ..., n) and the compatibility conditions, which allow us to obtain the marginal distribution [39] by integrating the initial density function:

$$p(x_\alpha, x_\beta;\, t_\alpha, t_\beta) = \int \cdots \int p(x_1, \dots, x_n;\, t_1, \dots, t_n)\, dx_1 \cdots dx_{\alpha-1}\, dx_{\alpha+1} \cdots dx_{\beta-1}\, dx_{\beta+1} \cdots dx_n. \qquad (1.1.50)$$

It follows from the probability multiplication theorem for joint distributions (valid for any, not necessarily Markov, process) that

$$p(x_1, x_2, \dots, x_n;\, t_1, t_2, \dots, t_n) = p(x_1, t_1)\, p(x_2, t_2 \mid x_1, t_1) \cdots p(x_n, t_n \mid x_1, t_1; \dots; x_{n-1}, t_{n-1}), \qquad (1.1.51)$$

where p(xᵢ, tᵢ | x₁, t₁; ...; x_{i−1}, t_{i−1}), i = 2, ..., n, are the densities of the conditional distributions of the process ξ(tᵢ) provided that the instant values ξ(t₁) = x₁, ξ(t₂) = x₂, ..., ξ(t_{i−1}) = x_{i−1} are fixed. However, if in (1.1.51) the sequence of times t₁ < t₂ < ⋯ < tₙ increases and ξ(t) is a Markov process, then, by (1.1.35), we can write (1.1.51) in the form

$$p(x_1, \dots, x_n;\, t_1, \dots, t_n) = p(x_1, t_1)\, p(x_2, t_2 \mid x_1, t_1)\, p(x_3, t_3 \mid x_2, t_2) \cdots p(x_n, t_n \mid x_{n-1}, t_{n-1}). \qquad (1.1.52)$$


Relation (1.1.52) is an analog of (1.1.37) in the case of continuous Markov processes.

It follows from (1.1.52) that to write any multidimensional density function, one needs to know only the unconditional density p(x, t) and the conditional density p(y, τ | x, t) for any t and τ > t. The function p(y, τ | x, t), just as P_{αβ}(t, τ) in the discrete case, is called the transition probability. One can obtain differential equations for the functions p(x, t) and p(y, τ | x, t), which are analogs of Eqs. (1.1.45), (1.1.47), and (1.1.49). However, in contrast with (1.1.45), (1.1.47), and (1.1.49), in this case these are partial differential equations.

We write p(y, τ | x, t) = p(x, t; y, τ) for the transition probability and consider this probability as a function of the four variables x, t, y, and τ. Then, using (1.1.50) and (1.1.52), we readily obtain the relation

$$p(x, t;\, y, \tau) = \int p(x, t;\, z, \sigma)\, p(z, \sigma;\, y, \tau)\, dz, \qquad t < \sigma < \tau, \qquad (1.1.53)$$

which is a continuous analog of the Chapman-Kolmogorov equation; this equation is often also called the Markov [67] or the Smoluchowski equation [173].

We define the local characteristics of the stochastic process ξ(t) by the relations

$$A(t, x) = \lim_{\Delta \to 0} \frac{1}{\Delta}\, \mathrm{E}\{\xi(t+\Delta) - \xi(t) \mid \xi(t) = x\}, \qquad B(t, x) = \lim_{\Delta \to 0} \frac{1}{\Delta}\, \mathrm{E}\{[\xi(t+\Delta) - \xi(t)]^2 \mid \xi(t) = x\}. \qquad (1.1.54)$$

A Markov process with such characteristics is called a diffusion process. The values A(t, x) and B(t, x) that determine ξ(t) are called the drift and the diffusion coefficients. Figure 8 illustrates these parameters by showing some sample paths of the diffusion process ξ(t) issued from the point x at time t₀. The straight line AB shows the direction along which the "centroid" of the fan of sample paths drifts for t close to t₀. The angle a between the line AB and the x-axis is determined by the drift coefficient: tan a = A(t₀, x). For small (t − t₀), the diffusion coefficient B(t₀, x) determines the rate of increase of the variance of the instant values of ξ(t) about the line AB; that is, B(t₀, x) determines the expansion rate of the fan of sample paths issued from the point (t₀, x).

Note that the conditional expectations in (1.1.54) can be calculated by integration with the transition probability. For example,

$$\mathrm{E}\{[\xi(t+\Delta) - \xi(t)] \mid \xi(t) = x\} = \int (z - x)\, p(x, t;\, z, t+\Delta)\, dz. \qquad (1.1.55)$$


In (1.1.53) we set σ = t + Δ, assume that the transition probability p(z, t + Δ; y, τ) is a sufficiently smooth function, and write its Taylor expansion

$$p(z, t+\Delta;\, y, \tau) = p(x, t+\Delta;\, y, \tau) + \sum_{k=1}^{\infty} \frac{(z - x)^k}{k!}\, \frac{\partial^k p}{\partial x^k}(x, t+\Delta;\, y, \tau). \qquad (1.1.56)$$

Substituting (1.1.56) into (1.1.53) and taking into account (1.1.54) and (1.1.55), we obtain

$$p(x, t;\, y, \tau) = p(x, t+\Delta;\, y, \tau) + A(t, x)\Delta\, \frac{\partial p}{\partial x}(x, t+\Delta;\, y, \tau) + \frac{\Delta}{2}\, B(t, x)\, \frac{\partial^2 p}{\partial x^2}(x, t+\Delta;\, y, \tau) + o(\Delta). \qquad (1.1.57)$$

Dividing (1.1.57) by Δ and passing to the limit as Δ → 0, we arrive at the backward Kolmogorov equation

$$-\frac{\partial p(x, t;\, y, \tau)}{\partial t} = A(t, x)\, \frac{\partial p(x, t;\, y, \tau)}{\partial x} + \frac{1}{2}\, B(t, x)\, \frac{\partial^2 p(x, t;\, y, \tau)}{\partial x^2}. \qquad (1.1.58)$$

To obtain the forward equation, which the transition probability satisfies as a function of y and τ, we note that if ξ(σ) = z, then the probability properties of the increment ξ(τ) − ξ(σ) of the stochastic process ξ(τ) are completely determined by the function p(z, σ; y, τ). As is


known [67, 173], the characteristic function Θ(u; z, σ) related to p(z, σ; y, τ) by the Fourier transform

$$\Theta(u;\, z, \sigma) = \mathrm{E}\bigl\{\exp[ju(\xi(\tau) - \xi(\sigma))] \bigm| \xi(\sigma) = z\bigr\} = \int e^{ju(y-z)}\, p(z, \sigma;\, y, \tau)\, dy \qquad (1.1.59)$$

and by the inverse Fourier transform

$$p(z, \sigma;\, y, \tau) = \frac{1}{2\pi} \int e^{-ju(y-z)}\, \Theta(u;\, z, \sigma)\, du \qquad (1.1.60)$$

is also a universal characteristic of the random increment ξ(τ) − ξ(σ). Considering Θ(u; z, σ) as a function of u and writing its Maclaurin series

$$\Theta(u;\, z, \sigma) = 1 + \sum_{k=1}^{\infty} \frac{(ju)^k}{k!}\, \nu_k(z, \sigma) \qquad (1.1.61)$$

(here ν_k(z, σ) = E{[ξ(τ) − ξ(σ)]^k | ξ(σ) = z} is the k-order moment of the increment), we see that (1.1.61) and (1.1.54) imply

$$\Theta(u;\, z, \sigma) = 1 + ju\, A(\sigma, z)\Delta + \frac{(ju)^2}{2}\, B(\sigma, z)\Delta + o(\Delta) \qquad (1.1.62)$$

for σ = τ − Δ. Applying the inverse Fourier transform (1.1.60) to the left- and right-hand sides of this relation, we obtain

$$p(z, \tau-\Delta;\, y, \tau) = \delta(y - z) - \Delta\, \frac{\partial}{\partial y}\bigl[A(\tau-\Delta, z)\,\delta(y - z)\bigr] + \frac{\Delta}{2}\, \frac{\partial^2}{\partial y^2}\bigl[B(\tau-\Delta, z)\,\delta(y - z)\bigr] + o(\Delta). \qquad (1.1.63)$$

In (1.1.63) we have used the well-known formal relation [41] for the delta-function:

$$\frac{1}{2\pi} \int (ju)^k e^{-ju(y-z)}\, du = (-1)^k\, \frac{\partial^k}{\partial y^k}\, \delta(y - z),$$

which acquires an exact meaning after it is multiplied by an arbitrary continuous function φ(z) and integrated with respect to z, with (1.1.32) taken into account.

We set σ = τ − Δ in (1.1.53), use (1.1.63), integrate with respect to z, and obtain

$$p(x, t;\, y, \tau) = p(x, t;\, y, \tau-\Delta) - \Delta\, \frac{\partial}{\partial y}\bigl[A(y, \tau-\Delta)\, p(x, t;\, y, \tau-\Delta)\bigr] + \frac{\Delta}{2}\, \frac{\partial^2}{\partial y^2}\bigl[B(y, \tau-\Delta)\, p(x, t;\, y, \tau-\Delta)\bigr] + o(\Delta).$$


Then, passing to the limit as Δ → 0, we obtain the forward Kolmogorov equation

$$\frac{\partial p(x, t;\, y, \tau)}{\partial \tau} = -\frac{\partial}{\partial y}\bigl[A(y, \tau)\, p(x, t;\, y, \tau)\bigr] + \frac{1}{2}\, \frac{\partial^2}{\partial y^2}\bigl[B(y, \tau)\, p(x, t;\, y, \tau)\bigr], \qquad (1.1.64)$$

which is also called the Fokker-Planck equation.

Equations (1.1.58) and (1.1.64) are linear partial differential equations of parabolic type. It is well known [166, 179] that the unique solution of such equations is not determined solely by the initial condition (for (1.1.64)) or the endpoint condition (for (1.1.58))

$$p(x, \tau;\, y, \tau) = \delta(y - x), \qquad (1.1.65)$$

which the transition probability satisfies for t = τ. It is also necessary to take into account the boundary conditions, which, in the case of an infinite phase space X, consist of restrictions imposed on the asymptotic behavior of the function p(x, t; y, τ) as |x|, |y| → ∞. To obtain unique solutions of (1.1.58) and (1.1.64), it suffices to require that p(x, t; y, τ) be bounded, though it follows from the normalization condition that we always have the sharper condition p(x, t; y, τ) → 0 as |x|, |y| → ∞. If the phase space X is bounded, then the additional conditions that the function p(x, t; y, τ) must satisfy at the boundary points of X are determined by the behavior of the phase trajectories ξ(t) near these boundary points (see §6.2).

Multiplying (1.1.64) by p(x, t), integrating with respect to x, and taking into account the fact that

$$\int p(x, t)\, p(x, t;\, y, \tau)\, dx = p(y, \tau),$$

we obtain (after the change τ → t and y → x) the following equation for the unconditional density p(x, t):

$$\frac{\partial p(x, t)}{\partial t} = -\frac{\partial}{\partial x}\bigl[A(x, t)\, p(x, t)\bigr] + \frac{1}{2}\, \frac{\partial^2}{\partial x^2}\bigl[B(x, t)\, p(x, t)\bigr], \qquad (1.1.66)$$

which is similar to (1.1.64). To solve (1.1.66) for t ≥ t₀, it is necessary to prescribe the initial density p(x, t₀) = p₀(x) and to take into account the boundary conditions. If p₀(x) = δ(x − x₀), then the solution of (1.1.66) is the transition probability p(x₀, t₀; x, t), t > t₀.
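As an illustration outside the original text, here is a minimal explicit finite-difference sketch for the Fokker-Planck equation (1.1.66) with the assumed coefficients A(x, t) = −x and B(x, t) = 1 (an Ornstein-Uhlenbeck process), starting from a narrow Gaussian in place of δ(x − x₀):

import numpy as np

# Grid and coefficients (assumed for the example): A(x) = -x, B(x) = 1.
L, nx, dt, nt = 5.0, 201, 2e-4, 5000
x = np.linspace(-L, L, nx)
dx = x[1] - x[0]
A, B = -x, np.ones_like(x)

p = np.exp(-(x - 1.0) ** 2 / 0.02)   # narrow Gaussian approximating delta(x - 1)
p /= np.trapz(p, x)

for _ in range(nt):
    dAp = np.gradient(A * p, dx)                      # d/dx [A p]
    d2Bp = np.gradient(np.gradient(B * p, dx), dx)    # d^2/dx^2 [B p]
    p = p + dt * (-dAp + 0.5 * d2Bp)
    p[0] = p[-1] = 0.0                                # p -> 0 far from the origin

print(np.trapz(p, x), np.trapz(x * p, x))  # mass stays ~1, mean decays toward 0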

If the process ξ(t) is an n-dimensional vector-function of time (the phase space is X = Rₙ), then the local characteristics (1.1.54) of the process are determined by the vector A(x, t) of drift coefficients with components Aᵢ(x, t),


i = 1, ..., n, and by the matrix of diffusion coefficients Bᵢⱼ(x, t), i, j = 1, ..., n. In the multidimensional case, the Fokker-Planck equation (1.1.66) has the form

$$\frac{\partial p(x, t)}{\partial t} = -\frac{\partial}{\partial x_i}\bigl[A_i(x, t)\, p(x, t)\bigr] + \frac{1}{2}\, \frac{\partial^2}{\partial x_i \partial x_j}\bigl[B_{ij}(x, t)\, p(x, t)\bigr]. \qquad (1.1.67)$$

The sums on the right-hand side of (1.1.67) are taken over twice repeated indices. This means that these expressions are understood as

$$\frac{\partial}{\partial x_i}\bigl[A_i p\bigr] = \sum_{i=1}^n \frac{\partial}{\partial x_i}\bigl[A_i p\bigr], \qquad \frac{\partial^2}{\partial x_i \partial x_j}\bigl[B_{ij} p\bigr] = \sum_{i=1}^n \sum_{j=1}^n \frac{\partial^2}{\partial x_i \partial x_j}\bigl[B_{ij} p\bigr].$$

In what follows, we shall also use this short notation for sums. It should be noted that the Fokker-Planck equation is not the only

method for describing the properties of continuous Markov processes. Another method for defining diffusion processes is based on stochastic differential equations. We consider this method in §1.2.

Strictly discontinuous Markov processes. Suppose that the state x of the process ξ(t) varies by jumps at random instants of time. By analogy with the case of discrete processes, we assume that the moments of jumps form an ordinary sequence of events; we denote the intensity of jumps by λ(t, x) provided that ξ(t) = x. A jump at time t transfers the process ξ(t) to a random state y with probability density π(x, y, t) = p(ξ(t + 0) = y | ξ(t) = x). Then for small τ − t > 0 the transition probability p(x, t; y, τ) of this process can be written in the form

$$p(x, t;\, y, \tau) = \bigl[1 - (\tau - t)\,\lambda(x, t)\bigr]\,\delta(y - x) + (\tau - t)\,\lambda(x, t)\,\pi(x, y, t) + o(\tau - t). \qquad (1.1.68)$$

Just as previously, in (1.1.53) we first set σ = t + Δ and then σ = τ − Δ, and apply (1.1.68). Passing to the limit as Δ → 0 in (1.1.53), we obtain the following pair of integro-differential Feller equations for the transition probability:

$$\frac{\partial p(x, t;\, y, \tau)}{\partial t} = \lambda(x, t)\Bigl[p(x, t;\, y, \tau) - \int \pi(x, z, t)\, p(z, t;\, y, \tau)\, dz\Bigr], \qquad (1.1.69)$$

$$\frac{\partial p(x, t;\, y, \tau)}{\partial \tau} = -\lambda(y, \tau)\, p(x, t;\, y, \tau) + \int \lambda(z, \tau)\, \pi(z, y, \tau)\, p(x, t;\, z, \tau)\, dz. \qquad (1.1.70)$$


It follows from (1.1.70) that the one-dimensional unconditional density p(x, t) satisfies the equation

$$\frac{\partial p(x, t)}{\partial t} = -\lambda(x, t)\, p(x, t) + \int \lambda(z, t)\, \pi(z, x, t)\, p(z, t)\, dz. \qquad (1.1.71)$$

Equations (1.1.69)-(1.1.71) describe the probability properties of a strictly discontinuous Markov process. They are analogs of Eqs. (1.1.45), (1.1.47), and (1.1.49) for discrete processes and of Eqs. (1.1.58), (1.1.64), and (1.1.66) for diffusion processes.
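To make the jump mechanism behind (1.1.68) concrete, the following sketch (illustrative only, with an assumed constant intensity λ and an assumed Gaussian jump kernel π) simulates one sample path of a strictly discontinuous Markov process by drawing exponential waiting times between jumps:

import numpy as np

def sample_path(lam=2.0, x0=0.0, T=5.0, rng=np.random.default_rng(1)):
    # Between jumps the state is constant; jump times have intensity lam,
    # and a jump from x lands at y ~ pi(x, .) = N(x/2, 1) (assumed kernel).
    t, x = 0.0, x0
    times, states = [0.0], [x0]
    while True:
        t += rng.exponential(1.0 / lam)   # waiting time ~ Exp(lam)
        if t >= T:
            break
        x = rng.normal(0.5 * x, 1.0)      # new state drawn from pi(x, .)
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)

times, states = sample_path()
print(len(times) - 1, "jumps on [0, 5]; final state", states[-1])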

§1.2. Differential equations for controlled systems with random functions

Let us consider system (1.1.6) of stochastic differential equations describing the dynamics of the controlled plant P shown in Fig. 3, namely,

$$\dot x = g\bigl(t, x, u, \xi(t)\bigr). \qquad (1.1.6)$$

Recall that x = x(t) is an n-dimensional vector-column with components xᵢ = xᵢ(t), i = 1, ..., n, and g(t, x, u, ξ) is a given vector-function of the time t and the vectors x, u, and ξ.

We begin the study of stochastic differential equations with the case in which the vector-function g is independent of the control parameters u (motion without control):

$$\dot x = g\bigl(t, x, \xi(t)\bigr). \qquad (1.2.1)$$

In this case, we mainly consider a special case of Eq. (1.2.1), namely, the scalar equation

$$\dot x = a(t, x) + \sigma(t, x)\, \xi(t). \qquad (1.2.2)$$

The results obtained for (1.2.2) in what follows can readily be generalized to the general case (1.2.1), as well as to the case of controlled motion (1.1.6). We shall do this at the end of the section.

If the stochastic process ξ(t) is the time derivative ξ(t) = η̇(t) = dη(t)/dt of some random function η(t), then, multiplying (1.2.2) by dt, we can write Eq. (1.2.2) in terms of differentials:

$$dx(t) = a\bigl(t, x(t)\bigr)\, dt + \sigma\bigl(t, x(t)\bigr)\, d\eta(t). \qquad (1.2.3)$$

The stochastic process x(t), t ≥ t₀, is called the solution of the stochastic differential equations (1.2.2), (1.2.3) with initial condition x(t₀) = x₀, and the


expression on the right-hand side of (1.2.3) is called the stochastic differential of this process if for any t > t₀ we have the integral representation

$$x(t) = x_0 + \int_{t_0}^t a\bigl(\tau, x(\tau)\bigr)\, d\tau + \int_{t_0}^t \sigma\bigl(\tau, x(\tau)\bigr)\, d\eta(\tau). \qquad (1.2.4)$$

Suppose that ξ(t) in (1.2.2) is the standard white noise with characteristics (1.1.31). Then the stochastic process

$$\eta(t) = \eta(0) + \int_0^t \xi(s)\, ds \qquad (1.2.5)$$

is Gaussian, and it follows from (1.1.31) that the increment η(t) − η(0) of this process has the following mean and variance over time t:

$$\mathrm{E}[\eta(t) - \eta(0)] = 0, \qquad \mathrm{D}[\eta(t) - \eta(0)] = t. \qquad (1.2.6)$$

It also follows from (1.1.31) that the increments of η(t) on nonintersecting time intervals are independent random variables, since we have

$$\mathrm{E}\{[\eta(t_2) - \eta(t_1)][\eta(t_4) - \eta(t_3)]\} = \int_{t_1}^{t_2}\!\int_{t_3}^{t_4} \mathrm{E}\xi(s)\xi(s')\, ds'\, ds = 0$$

if the intervals [t₁, t₂] and [t₃, t₄] have no common interior points. The process η(t) is called Brownian motion or a Wiener random process. One can show [66] that with probability 1 the realizations of this process are continuous but (in view of (1.1.31) and (1.2.5)) nowhere differentiable functions of time. Formula (1.2.7) illustrates these properties of the Wiener process. Indeed, the order of the increment Δη = η(t + Δ) − η(t) is given by its mean square deviation

$$\sqrt{\mathrm{E}(\Delta\eta)^2} = \sqrt{\Delta}. \qquad (1.2.7)$$


Thus, as Δ → 0, we have |Δη| ∼ √Δ → 0 (continuity), while the rate |Δη|/Δ ∼ 1/√Δ → ∞ (nondifferentiability). The important formula [175]

$$\lim_{\Delta \to 0} \sum_{i=0}^{N-1} [\eta(t_{i+1}) - \eta(t_i)]^2 = t - t_0 \qquad (1.2.8)$$

$$(t_0 < t_1 < t_2 < \cdots < t_N = t; \qquad \Delta = \max_i (t_{i+1} - t_i))$$

is an immediate consequence of these specific properties of the Wiener process. In (1.2.8) the convergence of the sequence of random sums on the left-hand side to the nonrandom variable on the right-hand side is understood in the sense of convergence in probability.

Let us prove (1.2.8). To this end, we take into account the fact that, according to (1.2.6) and (1.2.7), the increment Δηᵢ = η(t_{i+1}) − η(tᵢ) is a Gaussian random variable with zero mean and variance

$$\mathrm{E}(\Delta\eta_i)^2 = t_{i+1} - t_i. \qquad (1.2.9)$$

We calculate the fourth-order moment of this increment:

$$\mathrm{E}(\Delta\eta_i)^4 = \frac{1}{\sqrt{2\pi(t_{i+1} - t_i)}} \int_{-\infty}^{\infty} x^4 \exp\Bigl\{-\frac{x^2}{2(t_{i+1} - t_i)}\Bigr\}\, dx = 3(t_{i+1} - t_i)^2. \qquad (1.2.10)$$

Let us consider the random sum

$$\zeta_N = \sum_{i=0}^{N-1} (\Delta\eta_i)^2. \qquad (1.2.11)$$

It follows from (1.2.9) that

$$\mathrm{E}\zeta_N = \sum_{i=0}^{N-1} (t_{i+1} - t_i) = t - t_0. \qquad (1.2.12)$$

Since the Δηᵢ are independent, we have the variance

$$\mathrm{D}\zeta_N = \sum_{i=0}^{N-1} \mathrm{D}(\Delta\eta_i)^2.$$

Taking into account the inequality D(Δηᵢ)² ≤ E(Δηᵢ)⁴ = 3(t_{i+1} − tᵢ)², we obtain

$$\mathrm{D}\zeta_N \le 3 \sum_{i=0}^{N-1} (t_{i+1} - t_i)^2 \le 3 \max_i (t_{i+1} - t_i) \sum_{i=0}^{N-1} (t_{i+1} - t_i) = 3\Delta(t - t_0),$$


that is, the variance of (1.2.11) tends to zero as Δ → 0. Thus Chebyshev's inequality

$$\mathrm{P}\{|\zeta_N - \mathrm{E}\zeta_N| \ge \varepsilon\} \le \mathrm{D}\zeta_N / \varepsilon^2$$

and formula (1.2.12) imply (1.2.8).
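A quick Monte Carlo check of (1.2.8), outside the original text: the sum of squared Wiener increments over [0, 1] concentrates at t − t₀ = 1 as the mesh Δ is refined:

import numpy as np

rng = np.random.default_rng(2)
t0, t = 0.0, 1.0
for N in (10, 100, 10_000):
    d = (t - t0) / N
    increments = rng.normal(0.0, np.sqrt(d), size=N)  # eta(t_{i+1}) - eta(t_i)
    print(N, np.sum(increments ** 2))                 # -> t - t0 = 1 as N grows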

Now let us return to formula (1.2.4) for the solution of the stochastic differential equations (1.2.2), (1.2.3). The right-hand side of (1.2.4) contains integrals of random functions, that is, stochastic integrals. The rigorous theory of such integrals, first presented in [75], can be found in [66]. Let us consider some special properties of these integrals that distinguish them from ordinary integrals of sufficiently smooth deterministic functions. From the course of mathematical analysis it is known that (nonstochastic) Riemann and Stieltjes integrals are defined as the limits as Δ → 0 of the integral sums

$$\int_{t_0}^t a\bigl(\tau, x(\tau)\bigr)\, d\tau = \lim_{\Delta \to 0} \sum_{i=0}^{N-1} a\bigl(\tau_i, x(\tau_i)\bigr)\, (t_{i+1} - t_i), \qquad (1.2.13)$$

$$\int_{t_0}^t \sigma\bigl(\tau, x(\tau)\bigr)\, d\eta(\tau) = \lim_{\Delta \to 0} \sum_{i=0}^{N-1} \sigma\bigl(\tau_i, x(\tau_i)\bigr)\, [\eta(t_{i+1}) - \eta(t_i)], \qquad (1.2.14)$$

obtained by the Δ-decomposition of the integration interval [t₀, t]: t₀ < t₁ < t₂ < ⋯ < t_N = t, Δ = maxᵢ (t_{i+1} − tᵢ). In (1.2.13) and (1.2.14), τᵢ denotes a point τᵢ ∈ [tᵢ, t_{i+1}] of the ith subinterval of the Δ-decomposition. Note that for any piecewise continuous function a(τ, x) and any continuous function σ(τ, x), the limits of the integral sums in (1.2.13) and (1.2.14) are independent of the position of the points τᵢ in the intervals [tᵢ, t_{i+1}], i = 0, ..., N − 1.

In the stochastic case, in which a(τ, x(τ)) = a(τ), σ(τ, x(τ)) = σ(τ), and η(τ) are random functions (η(τ) is a Wiener process), the integrals (1.2.13), (1.2.14) can also be defined by the same formulas provided that [160]:

(1) the random functions a(τ) and σ(τ) are uniformly mean square continuous on the interval [t₀, t], that is, E[a(τ + Δ) − a(τ)]² → 0 as Δ → 0 uniformly in τ ∈ [t₀, t] (and the same holds for σ(τ));

(2) the functions a(τ) and σ(τ) are square integrable; more precisely,

$$\int_{t_0}^t \mathrm{E}\, a^2(\tau)\, d\tau < \infty, \qquad \int_{t_0}^t \mathrm{E}\, \sigma^2(\tau)\, d\tau < \infty;$$


(3) the limit in (1.2.13), (1.2.14) is understood as the mean square limit (recall that a random variable ζ is called the mean square limit of the sequence of random variables ζₙ,

$$\zeta = \mathop{\mathrm{l.i.m.}}_{n \to \infty} \zeta_n,$$

if E(ζₙ − ζ)² → 0 as n → ∞).

If assumptions (1)-(3) are satisfied, then the limits in (1.2.13) and (1.2.14) exist, and the stochastic integrals are well defined by formulas (1.2.13) and (1.2.14).

The fact that the value of the stochastic integral (1.2.14) depends on the choice of the points τᵢ is essentially new in contrast with the deterministic case. This follows from the special properties of the Wiener process and formula (1.2.8) (integral (1.2.13) is independent of the choice of τᵢ). Let us consider this fact in more detail. It follows from the method for constructing the integral sum (1.2.14) that we can replace the continuous function σ(τ) = σ(τ, x(τ)) on the integration interval [t₀, t] by a piecewise constant function σ_Δ(τ) whose constant values on the segments [tᵢ, t_{i+1}] of the Δ-decomposition coincide with the values of the continuous function σ(τ) calculated at arbitrary points τᵢ ∈ [tᵢ, t_{i+1}] (see Fig. 9). Let us define the rule for the choice of τᵢ by the formula

$$\tau_i = t_i + \nu\,(t_{i+1} - t_i), \qquad 0 \le \nu \le 1. \qquad (1.2.15)$$

We show that in this case the value of the integral (1.2.14) depends on ν.


Since the random function x(τ) is continuous, we can approximate its realization on the interval [tᵢ, t_{i+1}] (i = 0, ..., N − 1) by the segment of the straight line

$$x(\tau) = x(t_i) + \frac{x(t_{i+1}) - x(t_i)}{t_{i+1} - t_i}\, (\tau - t_i) + o(\Delta), \qquad t_i \le \tau \le t_{i+1} \qquad (1.2.16)$$

(relations (1.2.16), (1.2.8), (1.2.13), and (1.2.14) are satisfied almost surely). From (1.2.16) and (1.2.15), we have

$$x(\tau_i) = x(t_i) + \nu\,[x(t_{i+1}) - x(t_i)] + o(\Delta). \qquad (1.2.17)$$

We assume that the function σ(τ, x) is continuously differentiable with respect to both arguments. Then it follows from (1.2.15) and (1.2.17) that

$$\sigma\bigl(\tau_i, x(\tau_i)\bigr) = \sigma\bigl(t_i + \nu(t_{i+1} - t_i),\; x(t_i) + \nu[x(t_{i+1}) - x(t_i)] + o(\Delta)\bigr). \qquad (1.2.18)$$

Using (1.2.18), we transform (1.2.14), which defines the stochastic integral, as follows:

$$I_\nu = \int_{t_0}^t \sigma\bigl(\tau, x(\tau)\bigr)\, d_\nu\eta(\tau) = \lim_{\Delta \to 0} \sum_{i=0}^{N-1} \sigma\bigl(\tau_i, x(\tau_i)\bigr)\, [\eta(t_{i+1}) - \eta(t_i)]. \qquad (1.2.19)$$

In the integral I_ν and in the differential of the Wiener process d_ν η(τ) in (1.2.19), the subscript ν indicates the method by which the integral sum was constructed.

For ν = 0, formula (1.2.19) defines the Ito stochastic integral

$$I_0 = \int_{t_0}^t \sigma\bigl(\tau, x(\tau)\bigr)\, d_0\eta(\tau) = \lim_{\Delta \to 0} \sum_{i=0}^{N-1} \sigma\bigl(t_i, x(t_i)\bigr)\, [\eta(t_{i+1}) - \eta(t_i)], \qquad (1.2.20)$$

which is widely used in the theory of stochastic processes of diffusion type [45, 66, 113, 117, 132]. Let us calculate the difference I_ν − I₀ for an arbitrary ν. Since we assume that the function σ(τ, x) is differentiable, this function can be expanded into the Taylor series up to the first two terms in a neighborhood of an arbitrary point (tᵢ, x(tᵢ)) as follows:

$$\sigma\bigl(\tau_i, x(\tau_i)\bigr) = \sigma\bigl(t_i, x(t_i)\bigr) + \nu\,(t_{i+1} - t_i)\, \frac{\partial\sigma}{\partial\tau}\bigl(t_i, x(t_i)\bigr) + \nu\,[x(t_{i+1}) - x(t_i)]\, \frac{\partial\sigma}{\partial x}\bigl(t_i, x(t_i)\bigr) + o(\Delta). \qquad (1.2.21)$$


Passing in (1.2.3) from differentials to finite increments and taking into account the fact that t_{i+1} − tᵢ is small, we obtain

$$x(t_{i+1}) - x(t_i) = a\bigl(x(\tau_i'), \tau_i'\bigr)\,(t_{i+1} - t_i) + \sigma\bigl(\tau_i', x(\tau_i')\bigr)\,[\eta(t_{i+1}) - \eta(t_i)] + o(t_{i+1} - t_i)$$
$$= \sigma\bigl(t_i, x(t_i)\bigr)\,[\eta(t_{i+1}) - \eta(t_i)] + o(\Delta). \qquad (1.2.22)$$

We substitute (1.2.22) into (1.2.21), then substitute the result into (1.2.19), and obtain the desired difference of the integrals

$$I_\nu - I_0 = \lim_{\Delta \to 0} \sum_{i=0}^{N-1} \nu\, \frac{\partial\sigma}{\partial x}\bigl(t_i, x(t_i)\bigr)\, \sigma\bigl(t_i, x(t_i)\bigr)\, [\eta(t_{i+1}) - \eta(t_i)]^2. \qquad (1.2.23)$$

Let us calculate the limit on the right-hand side of (1.2.23). To this end, following [175], we consider both the Δ-decomposition tᵢ, 0 ≤ i ≤ N, of the integration domain [t₀, t] and a coarser ε-decomposition tⱼ*, 0 ≤ j ≤ M < N, such that maxⱼ(t*_{j+1} − tⱼ*) = ε > Δ. For each fixed ε-decomposition we define piecewise constant functions γ̲_ε(t) and γ̄_ε(t) whose constant values on the jth part of the ε-decomposition are given by the formulas

$$\underline{\gamma}_\varepsilon(t) = \min_{t_j^* \le \tau \le t_{j+1}^*} \nu\, \frac{\partial\sigma}{\partial x}\bigl(\tau, x(\tau)\bigr)\, \sigma\bigl(\tau, x(\tau)\bigr), \qquad \overline{\gamma}_\varepsilon(t) = \max_{t_j^* \le \tau \le t_{j+1}^*} \nu\, \frac{\partial\sigma}{\partial x}\bigl(\tau, x(\tau)\bigr)\, \sigma\bigl(\tau, x(\tau)\bigr), \qquad t_j^* \le t < t_{j+1}^*.$$

Obviously, we have

$$\sum_{i=0}^{N-1} \underline{\gamma}_\varepsilon(t_i)\, [\eta(t_{i+1}) - \eta(t_i)]^2 \le \sum_{i=0}^{N-1} \nu\, \frac{\partial\sigma}{\partial x}\bigl(t_i, x(t_i)\bigr)\, \sigma\bigl(t_i, x(t_i)\bigr)\, [\eta(t_{i+1}) - \eta(t_i)]^2 \le \sum_{i=0}^{N-1} \overline{\gamma}_\varepsilon(t_i)\, [\eta(t_{i+1}) - \eta(t_i)]^2. \qquad (1.2.24)$$

It follows from (1.2.8) that

$$\lim_{\Delta \to 0} \sum_{t_i \in [t_j^*,\, t_{j+1}^*]} [\eta(t_{i+1}) - \eta(t_i)]^2 = t_{j+1}^* - t_j^*;$$

therefore, the inequality (1.2.24) implies

$$\sum_{j=0}^{M-1} \underline{\gamma}_\varepsilon(t_j^*)\,(t_{j+1}^* - t_j^*) \le \lim_{\Delta \to 0} \sum_{i=0}^{N-1} \nu\,\frac{\partial\sigma}{\partial x}\bigl(t_i, x(t_i)\bigr)\,\sigma\bigl(t_i, x(t_i)\bigr)\,[\eta(t_{i+1}) - \eta(t_i)]^2 \le \sum_{j=0}^{M-1} \overline{\gamma}_\varepsilon(t_j^*)\,(t_{j+1}^* - t_j^*). \qquad (1.2.25)$$


The last inequality holds for any fixed ε-decomposition. Since the function σ(t, x) is continuously differentiable, we have

$$\overline{\gamma}_\varepsilon(t) - \underline{\gamma}_\varepsilon(t) \to 0$$

as ε → 0. Thus the first and the last sums in (1.2.25) have the same limit as ε → 0:

$$\lim_{\varepsilon \to 0} \sum_{j=0}^{M-1} \underline{\gamma}_\varepsilon(t_j^*)\,(t_{j+1}^* - t_j^*) = \lim_{\varepsilon \to 0} \sum_{j=0}^{M-1} \overline{\gamma}_\varepsilon(t_j^*)\,(t_{j+1}^* - t_j^*) = \nu \int_{t_0}^t \frac{\partial\sigma}{\partial x}\bigl(\tau, x(\tau)\bigr)\,\sigma\bigl(\tau, x(\tau)\bigr)\, d\tau.$$

This relation and (1.2.25) imply

$$\lim_{\Delta \to 0} \sum_{i=0}^{N-1} \nu\,\frac{\partial\sigma}{\partial x}\bigl(t_i, x(t_i)\bigr)\,\sigma\bigl(t_i, x(t_i)\bigr)\,[\eta(t_{i+1}) - \eta(t_i)]^2 = \nu \int_{t_0}^t \frac{\partial\sigma}{\partial x}\bigl(\tau, x(\tau)\bigr)\,\sigma\bigl(\tau, x(\tau)\bigr)\, d\tau.$$

Now we return to (1.2.23) and obtain the following relation between the stochastic integral I_ν and the Ito integral I₀:

$$I_\nu = \int_{t_0}^t \sigma\bigl(\tau, x(\tau)\bigr)\, d_\nu\eta(\tau) = \int_{t_0}^t \sigma\bigl(\tau, x(\tau)\bigr)\, d_0\eta(\tau) + \nu \int_{t_0}^t \frac{\partial\sigma}{\partial x}\bigl(\tau, x(\tau)\bigr)\,\sigma\bigl(\tau, x(\tau)\bigr)\, d\tau. \qquad (1.2.26)$$

Thus we see that a similar formula holds for any square integrable function Φ(τ, x(τ)) that is continuously differentiable with respect to both arguments (provided that the stochastic process x(t) satisfies Eq. (1.2.3)):

$$\int_{t_0}^t \Phi\bigl(\tau, x(\tau)\bigr)\, d_\nu\eta(\tau) = \int_{t_0}^t \Phi\bigl(\tau, x(\tau)\bigr)\, d_0\eta(\tau) + \nu \int_{t_0}^t \frac{\partial\Phi}{\partial x}\bigl(\tau, x(\tau)\bigr)\,\sigma\bigl(\tau, x(\tau)\bigr)\, d\tau. \qquad (1.2.27)$$

We also note that if the function σ in (1.2.26) or Φ in (1.2.27) is independent of x, then there is no difference between the integrals I_ν and I₀.


The stochastic integral I_ν with parameter ν = 1/2 plays an important role. Such integrals were introduced by R. L. Stratonovich [174] and are called symmetrized. The advantage of such integrals is that they can be calculated by the rules of integration for deterministic smooth functions. The calculation of the stochastic integral

$$\int_{t_0}^t \eta(\tau)\, d\eta(\tau) \qquad (1.2.28)$$

of the Wiener process is usually presented as an illustrative example [167, 174, 175]. Indeed, in this case the Ito integral can readily be calculated from (1.2.20) and (1.2.8). By (1.2.20) we have

$$\int_{t_0}^t \eta(\tau)\, d_0\eta(\tau) = \lim_{\Delta \to 0} \sum_{i=0}^{N-1} \eta(t_i)\,[\eta(t_{i+1}) - \eta(t_i)].$$

We write the ith summand in the form

$$\eta(t_i)\,[\eta(t_{i+1}) - \eta(t_i)] = \tfrac{1}{2}\,[\eta^2(t_{i+1}) - \eta^2(t_i)] - \tfrac{1}{2}\,[\eta(t_{i+1}) - \eta(t_i)]^2,$$

take into account the equality η(t_N) = η(t) and (1.2.8), sum up, and thus obtain

$$\int_{t_0}^t \eta(\tau)\, d_0\eta(\tau) = \tfrac{1}{2}\,[\eta^2(t) - \eta^2(t_0)] - \tfrac{1}{2}\,(t - t_0).$$

A similar symmetrized integral can be found from (1.2.28) and (1.2.27) with ν = 1/2, σ(τ, x) ≡ 1, and Φ(τ, x(τ)) = x(τ) = η(τ). Since in this case the second summand on the right-hand side of (1.2.27) is equal to (t − t₀)/2, we have

$$\int_{t_0}^t \eta(\tau)\, d_{1/2}\eta(\tau) = \tfrac{1}{2}\,[\eta^2(t) - \eta^2(t_0)],$$

that is, the usual formula of integration of deterministic functions is valid for symmetrized stochastic integrals.
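The difference between the Ito (ν = 0) and symmetrized (ν = 1/2) readings of (1.2.28) is easy to see numerically; this sketch (not from the original text) compares both integral sums with the closed-form answers derived above:

import numpy as np

rng = np.random.default_rng(3)
N, T = 200_000, 1.0
dt = T / N
eta = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), N))])

d_eta = np.diff(eta)
ito = np.sum(eta[:-1] * d_eta)                    # nu = 0: integrand at t_i
sym = np.sum(0.5 * (eta[:-1] + eta[1:]) * d_eta)  # nu = 1/2: symmetrized sum

print(ito, 0.5 * eta[-1] ** 2 - 0.5 * T)   # Ito value: eta^2(T)/2 - T/2
print(sym, 0.5 * eta[-1] ** 2)             # symmetrized value: eta^2(T)/2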

It follows from the preceding that if the solution of the stochastic differential equations (1.2.2), (1.2.3) is defined as a random function x(t) satisfying (1.2.4), then this definition must be supplemented with a method for calculating the last integral in (1.2.4), since different stochastic integrals determine different solutions of Eqs. (1.2.2) and (1.2.3).

In fact, it follows from (1.2.4) that for all interpretations of the stochastic integral, the solution x(t) is a Markov stochastic process, since the initial


value x(t₀) = x₀ and the future increments of the Wiener process η(τ), τ > t₀, independent of x₀, uniquely determine the future (with respect to t₀) behavior of the stochastic process x(t), t > t₀. The Markov process x(t) is a continuous stochastic process of diffusion type; therefore, according to §1.1, its probability properties are completely determined by the drift and diffusion coefficients (1.1.54):

$$A(t, x) = \lim_{\Delta \to 0} \frac{1}{\Delta}\, \mathrm{E}\bigl\{x(t+\Delta) - x(t) \bigm| x(t) = x\bigr\}, \qquad (1.2.29)$$

$$B(t, x) = \lim_{\Delta \to 0} \frac{1}{\Delta}\, \mathrm{E}\bigl\{[x(t+\Delta) - x(t)]^2 \bigm| x(t) = x\bigr\}. \qquad (1.2.30)$$

It follows from (1.2.4) that

$$x(t+\Delta) - x(t) = \int_t^{t+\Delta} a\bigl(\tau, x(\tau)\bigr)\, d\tau + \int_t^{t+\Delta} \sigma\bigl(\tau, x(\tau)\bigr)\, d\eta(\tau). \qquad (1.2.31)$$

Since the process x(τ) and the function a(τ, x) are continuous, the mean value of the first integral in (1.2.31) can be written for small Δ as follows:

$$\mathrm{E}\Bigl\{\int_t^{t+\Delta} a\bigl(\tau, x(\tau)\bigr)\, d\tau \Bigm| x(t) = x\Bigr\} = a(t, x)\Delta + o(\Delta). \qquad (1.2.32)$$

The result of averaging the second integral in (1.2.31) depends on the definition of the stochastic integral. If we have an Ito integral, then by definition (1.2.20) we have

$$\mathrm{E}\Bigl\{\int_t^{t+\Delta} \sigma\bigl(\tau, x(\tau)\bigr)\, d_0\eta(\tau) \Bigm| x(t) = x\Bigr\} = 0 \qquad (1.2.33)$$

(for any Δ, not necessarily small). This important property of Ito integrals follows from the fact that the increments of η(τ) are independent of the integrand σ(τ, x(τ)) (here σ is not an extrapolating function [132]). For the same reason, for any Δ, formulas (1.2.8) and (1.2.20) imply

$$\mathrm{E}\Bigl\{\Bigl[\int_t^{t+\Delta} \sigma\bigl(\tau, x(\tau)\bigr)\, d_0\eta(\tau)\Bigr]^2 \Bigm| x(t) = x\Bigr\} = \int_t^{t+\Delta} \mathrm{E}\,\sigma^2\bigl(\tau, x(\tau)\bigr)\, d\tau. \qquad (1.2.34)$$

From (1.2.29)-(1.2.34), we obtain the local characteristics

$$A(t, x) = a(t, x), \qquad B(t, x) = \sigma^2(t, x) \qquad (1.2.35)$$


of the Markov process defined by (1.2.4) on the basis of the Ito integral.

If the second integral in (1.2.31) is understood in the sense of (1.2.19), then formula (1.2.34) remains valid after the change d₀η(τ) → d_νη(τ), but, instead of (1.2.33), we obtain the following formula from (1.2.26) and (1.2.33):

$$\mathrm{E}\Bigl\{\int_t^{t+\Delta} \sigma\bigl(\tau, x(\tau)\bigr)\, d_\nu\eta(\tau) \Bigm| x(t) = x\Bigr\} = \nu\,\frac{\partial\sigma}{\partial x}(t, x)\,\sigma(t, x)\Delta + o(\Delta). \qquad (1.2.36)$$

In this case, the diffusion process x(t) has the other characteristics

$$A(t, x) = a(t, x) + \nu\,\frac{\partial\sigma}{\partial x}(t, x)\,\sigma(t, x), \qquad B(t, x) = \sigma^2(t, x). \qquad (1.2.37)$$

Thus, to avoid misunderstanding, it would be more correct from the very beginning to write the stochastic differential equation (1.2.3), say, in the form

$$dx(t) = a\bigl(t, x(t)\bigr)\, dt + \sigma\bigl(t, x(t)\bigr)\, d_\nu\eta(t), \qquad (1.2.38)$$

where the subscript ν in the differential d_νη(t) shows in which sense we understand the stochastic integrals in the integral equation (1.2.4) equivalent to (1.2.3). If ν = 0 in (1.2.38), then Eq. (1.2.38) is called the Ito differential equation. If ν assumes different values, then, as was shown previously, we have different solutions of Eq. (1.2.38). Conversely, the same diffusion process x(t) with drift coefficient A(t, x) and diffusion coefficient B(t, x) can be described by infinitely many stochastic differential equations

$$dx(t) = \Bigl[A\bigl(t, x(t)\bigr) - \nu\,\frac{\partial\sigma}{\partial x}\bigl(t, x(t)\bigr)\,\sigma\bigl(t, x(t)\bigr)\Bigr] dt + \sigma\bigl(t, x(t)\bigr)\, d_\nu\eta(t), \qquad \sigma(t, x) = B^{1/2}(t, x), \qquad (1.2.39)$$

corresponding to different values of the parameter ν. If the Markov process x(t) is defined by the Ito equation

$$dx(t) = a\bigl(t, x(t)\bigr)\, dt + \sigma\bigl(t, x(t)\bigr)\, d_0\eta(t), \qquad (1.2.40)$$

then Eq. (1.2.38), equivalent to (1.2.40), has the form

$$dx(t) = \Bigl[a\bigl(t, x(t)\bigr) - \nu\,\frac{\partial\sigma}{\partial x}\bigl(t, x(t)\bigr)\,\sigma\bigl(t, x(t)\bigr)\Bigr] dt + \sigma\bigl(t, x(t)\bigr)\, d_\nu\eta(t). \qquad (1.2.41)$$


Finally, if x(t) is defined by (1.2.38), then the Ito equation corresponding to this process has the form

$$dx(t) = \Bigl[a\bigl(t, x(t)\bigr) + \nu\,\frac{\partial\sigma}{\partial x}\bigl(t, x(t)\bigr)\,\sigma\bigl(t, x(t)\bigr)\Bigr] dt + \sigma\bigl(t, x(t)\bigr)\, d_0\eta(t) \qquad (1.2.42)$$

(formulas (1.2.39), (1.2.41), and (1.2.42) readily follow from (1.2.35) and (1.2.37)).

From (1.2.38) and (1.2.40)-(1.2.42) we see that the different forms of the differential equations can readily be transformed into one another.
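The relation (1.2.41)-(1.2.42) between the Ito and symmetrized forms can be checked by simulation. The sketch below (illustrative only; the noise coefficient σ(x) = x is an assumption) integrates dx = σ(x) d_νη for ν = 0 by the Euler scheme and for ν = 1/2 by the Heun (midpoint) scheme, and shows that the ν = 1/2 solution matches the Ito equation with the added drift ½ σ ∂σ/∂x = x/2:

import numpy as np

rng = np.random.default_rng(4)
N, T, x0 = 100_000, 1.0, 1.0
dt = T / N
dW = rng.normal(0.0, np.sqrt(dt), N)

x_ito, x_strat, x_corr = x0, x0, x0
for dw in dW:
    x_ito += x_ito * dw                        # Ito (nu = 0): dx = x d0_eta
    pred = x_strat + x_strat * dw              # Heun predictor for nu = 1/2
    x_strat += 0.5 * (x_strat + pred) * dw     # symmetrized evaluation of sigma
    x_corr += 0.5 * x_corr * dt + x_corr * dw  # Ito form (1.2.42) with drift x/2

# x_strat and x_corr are driven by the same noise and nearly coincide;
# x_ito differs from them by roughly the factor exp(-T/2).
print(x_ito, x_strat, x_corr)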

In this connection, the following two questions arise immediately: (1) Do we need different forms of Eq. (1.2.38) at all? (2) Is it possible to use only one definite form of stochastic equations, say, the Ito form, and to consider all differential equations of the form (1.2.3) as Ito equations? The last question has an affirmative answer. Namely, in the majority of mathematical papers [44, 45, 56, 66, 113, 132] it is postulated from the outset that the motion of a controlled system is described by Ito differential equations, and the theory is constructed via Ito stochastic differentials and integrals. The answer to the first question is based on the advantages and disadvantages of the different forms of stochastic equations and integrals, on whether an actual process is adequately represented by its mathematical model, and on the character of the problem considered.

The main advantage of Ito stochastic differential equations is the fact that their solutions x(t) are martingales; this follows from (1.2.4), the definition (1.2.20) of the Ito integral, and formula (1.2.33). This fact allows us to study the processes x(t) by rather general methods from the theory of martingales [132]. Moreover, if we use the Ito equation, then many formulas, for example, in the theory of filtration (§1.5), take their most compact form.

However, sometimes it is inconvenient to use Ito differentials and integrals, because very often we cannot use the formulas of ordinary analysis for operations with Ito processes. This was already pointed out when the integral (1.2.28) was calculated. A similar situation arises when we differentiate a function of the stochastic process x(t) that is a solution of the Ito equation (1.2.40).

Suppose that p ( t , x) is a continuous function with continuous partial derivatives &plat, dp /dx , and a2v /dx2 . Then the stochastic process v( t ,x( t ) ) (here x(t) is a solution of Eq. (1.2.40)) has the following Ito stochastic differential [66, 131, 1671:

Page 61: Optimal Design of Control Systems Stochastic and Deterministic Problems

Chapter I

But if we use the usual differentiation rule, then we have

for the differential of a composite function cp (under the condition that x(t) satisfies Eq. (1.2.40) with usual differential dq(t)).

The outlined computational difficulties disappear if we use the sym- metrized form of stochastic integrals and equations. This has already been shown for integration when we calculated the integral (1.2.28). Let us show that the usual formula of the composite function differentiation (1.2.44) holds for the stochastic process x(t) defined by the symmetrized differen- tial equation (that is, by Eq. (1.2.38) with v = 112). The proof of this statement is indirect. Namely, we show that formula (1.2.44) for x(t) , de- fined by the symmetrized stochastic equation

implies formula (1.2.43) for x(t) , defined by the Ito equation (1.2.40). Indeed, it follows from (1.2.41) that the symmetrized equation equivalent

to (1.2.40) has the form

(the arguments of a and a are omitted). From this relation and (1.2.44) we obtain the symmetrized stochastic differential

Now we note that (1.2.27) implies

By setting @ = adcplax in (1.2.47), we obtain the Ito stochastic differential (1.2.43) from (1.2.46) and (1.2.47).

Page 62: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems 45

Now let us consider another problem, which is extremely important from the viewpoint of applications. This is the question whether our mathemat- ical model is adequate to the actual process in the dynamic system with random perturbations. One of the starting points in the theory of optimal control is the assumption that the equation of motion of a dynamic sys- tem is given a priori (51.1). Suppose that the corresponding equation has the form (1.2.2) or (1.2.3). We have already shown that one can construct infinitely many solutions of such equations by choosing one or other form of stochastic integrals and differentials. Which solution from these infin- itely many ones corresponds to the actual stochastic process in the system? Does this solution exist? The answers can be obtained only by analyzing specific physical premises that lead to Eqs. (1.2.2), (1.2.3). Such investiga- tions were performed in [167, 173, 175, 1811, whose basic results relative to Eqs. (1.2.2), (1.2.3) we state without details.

If we consider the solution x(t) of Eq. (1.2.3) as a continuous model for a stochastic discrete time process xk = x(kA), k = 0,1,2, . . . , which is computer simulated according to the formula

(Jk, k = 1,2, . . . , is a sequence of independent identically distributed Gauss- ian random variables with zero mean and variance DJk = A), then as A + 0 the sequence xk (under the linear interpolation with respect to t between the points tk = kA) converges in probability to the solution x(t) of (1.2.3), provided that the latter is the Ito equation.

If the motion of a dynamic system is given by (1.2.2) (stochastic equa- tions of the form (1.2.2) are called Langevin equations [127]), where ((t) is a sufficiently wide-band stationary stochastic process (for example, the Gaussian Ornstein-Uhlenbeck process with the autocorrelation function Rt(r) = (a/2) exp{-aj.rl} for large values of a), then the solution of (1.2.2) coincides with the solution of the symmetrized equation (1.2.3), that is, of (1.2.38) with u = 112. In particular, each simulation of the Langevin equa- tion (1.2.2) with an "actual" white noise by using analog computers gives a symmetrized solution of Eq. (1.2.3) (see [37]).

In the present monograph, all stochastic equations given in Langevin form (1.2.2) with the white noise [(t) are understood in the symmetrized sense. In what follows, the symmetrized form of stochastic equations is used rather often, since this is the most convenient form for calculations relative to transformations of random functions, the change of variables, etc. In this connection, we omit the index u = 112 in the stochastic differential. The subscript 0 in the differential doq(t) in Ito equations is used if and only if the Ito equation and the corresponding symmetrized equation have

Page 63: Optimal Design of Control Systems Stochastic and Deterministic Problems

46 Chapter I

different solutions. In other cases, just as in symmetrized equations, we write stochastic differentials without subscripts. Stochastic integrals and differentials that correspond to other values of v [191, 1921 are basically of theoretical interest. We shall not consider them in what follows.

In conclusion, let us consider some possible generalizations of the results obtained. First we note that all above-mentioned facts for scalar equations (1.1.2), (1.2.3) can readily be generalized to the multidimensional case, so that the form of (1.2.2), (1.2.3) is preserved, provided that x E R, and [(t) ( ~ ( t ) ) are n-dimensional vector-columns of phase coordinates and random functions and the functions a and u are an n-column and an n x n matrix. If necessary, the corresponding systems of equations can be written in more detail, for example, instead of (1.2.2), we can write

(as usual, the summation is taken over repeated indices if any, that is, in (1.2.49) we have u;j[j = ~ ~ = l u i j < j ) . Systems (1.2.2) and (1.2.3) (or (1.2.49)) determine an n-dimensional Markov process x(t) with the vector of drift coefficients

1 daij(t , x) Ai(t, x) = ai( t , x) + - akj(t, x); i = 1 , . . . , n , (1.2.50)

2 dxk

and the matrix of diffusion coefficients

If the process x(t) is defined by the Ito equation (1.2.40), then, instead of (1.2.50) and (1.2.51), we have

According to [173], stochastic equations of the more general form (1.2.1) can always be represented in the form (1.2.2). Indeed, as shown in [173], if random functions [(t) in (1.2.1) have a small correlation time (for example, one can assume that [(t) is an n-vector of independent stochastic processes of Ornstein-Uhlenbeck type with a large parameter a) , then Eq. (1.2.1) determines a Markov process with the following drift and diffusion coeffi- cients:

Page 64: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems 47

J-00

(here K [a, P] = E(a - Ea) (P - EP) denotes the covariance of random vari- ables a and 0 ; moreover, the mean Eg; and the correlation functions in (1.2.53) and (1.2.54) are calculated under the assumption that the argu- ment x is a nonrandom fixed vector).

Since similar characteristics of the Markov process defined by (1.2.2) (or by (1.2.49)) have the form (1.2.50), (1.2.51), we can obtain the differential equation (1.2.2), which is stochastically equivalent to (1.2.1), by solving system (1.2.50), (1.2.51) with respect to the unknown variables ai and uij, . .

z , ~ = 1,. . . , n. Note that system of Eqs. (1.2.50), (1.2.51) can always be solved with respect to ai and uij. This follows from the fact that the diffusion matrix B is positive definite (semidefinite) and symmetric. As is known [62], to such matrices there corresponds a real symmetric positive (semidefinite) matrix u that is the matrix square root u = 1/B (here we do not consider methods for calculating u). Since u is symmetric, we have B = u2 = uuT, that is, the matrix equation (1.2.51) is solvable, and hence (1.2.50) implies

It follows from the preceding that to study Markov processes of diffusion type, without loss of generality, we can consider stochastic equations only in the form (1.2.2.), (1.2.3) or (1.2.40).

Therefore, the most general form of differential equations of motion of a controlled system with random perturbations [(t) of the white noise type is given by the equation

or by the equivalent equation

(in (1.2.55) [(t) is the standard white noise with the characteristics (1.1.34); in (1.2.56)

r t

is the standard Wiener process). In (1.2.55) and (1.2.56) u = u(t) is un- derstood as the control algorithm (1.1.2). The form of this algorithm can be found by solving the Bellman equation.

Page 65: Optimal Design of Control Systems Stochastic and Deterministic Problems

48 Chapter I

31.3. Deterministic control problems. Formal scheme of the dynamic programming approach

The dynamic programming approach [I41 was proposed by R. Bellman in the fifties as a method for solving a wide range of problems relative to processes of multistage choice. In this section we briefly discuss the main idea of this method applied to synthesis problems for optimal feedback control systems [16, 171. We begin with deterministic problems of optimal control and pay the main attention to the algorithm of the method, that is, to the method for constructing the optimal control in the synthesis form.

Let us consider the control problem with free right endpoint of the tra- jectory, in which the plant is given by system (1.1.5)

the performance criterion is a functional of the form (1.1.11)

and the control vector u may take values a t each moment of time in a given bounded set U C R,,

u ( t ) E U. (1.3.3)

In problem (1.3.1)-(1.3.3) the time interval 10, T] and the initial vector of phase variables xo are known; it is required to find the control function u, ( t ) : 0 < - t < - T satisfying (1.3.3) and minimizing the functional (1.3.2) on the trajectory x , ( t ) : 0 5 t 5 T, which is a solution of the Cauchy problem (1.3.1) with u(t) = u,(t). If the Cauchy problem (1.3.1) with u ( t ) = u , ( t ) has a single solution, then the optimal control u, ( t ) : 0 5 t < T may be represented in the form

u*(t) = ~ ( t , ~ ( t ) ) , (1.3.4)

where the current values of the control vector are expressed in terms of the current values of phase variables of system (1.3.1). The optimal control of the form (1.3.4) is called optimal control i n the synthesis form, and formula (1.3.4) itself is often called the algorithm of optimal control.

The dynamic programming approach allows us to obtain the optimal control in the synthesis form (1.3.4) for problem (1.3.1)-(1.3.3) as follows. We write

Page 66: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems 49

The function F ( t , xt), called later the loss f ~ n c t i o n , ~ plays an important role in the method of dynamic programming. This function is equal to the minimum value of the functional (1.3.2) provided that the control process is considered on the time interval [t, TI, 0 5 t < T, and the vector of phase variables is equal to x(t) = xt a t the beginning of this interval (that is, at time t). In (1.3.5) the minimum is calculated over all possible strategies U(T) = (p(r, x(T)) , that is, over all possible vector-functions p(r, x) : [t, T] x R, -+ R, provided that:

(a) these functions take values in an admissible set U ; (b) for any t E [O, T] the Cauchy problem for system (1.3.1),

has a unique solution X(T) : t < T 5 T.

The dynamic programming method is based on the Bellman optimality principle [14, 171, which implies that the loss function (1.3.5) satisfies the basic functional equation

] (1.3.6) F( t , z t ) = rnin c (a, %(a), u(a)) d a + F (3, x ~ ) , U ( ~ ) € U

for all 5 E [t,T]. For different statements of the optimality principle and comments see [I, 16, 501. However, here we do not discuss these statements, since to derive Eq. (1.3.6) it suffices to have the definition of the loss function (1.3.5) and to understand that this is a function of time and of the state x(t) = xt of the controlled system (1.3.1) a t time t (recall that the control process is terminated a t a fixed time T).

T To derive Eq. (1.3.6), we write the integral in (1.3.5) as the sum St = -

$ + of two integrals and write the minimum as the succession of minima

min = min min . u ( r ) € U u ( a ) E U y(p)EU tSr<T t ~ ~ < t t < p < ~

Then we can write (1.3.5) as follows:

F ( t , xt) = min min c(a, %(a), u(a)) d a u ( a ) E U ~ ( P ) E U tSo<x t<p<T

'The function (1.3.5) is also called a value funct ion, a cost function, or t h e Be l lman funct ion.

Page 67: Optimal Design of Control Systems Stochastic and Deterministic Problems

50 Chapter I

Since, by (1.3.1), the control u(p) on the interval p, T) does not affect the solution x(o) of (1.3.1) on the preceding interval [ t , q , formula (1.3.7) takes the form

F( t ,x t ) = min u(u)EU

Now, since by (1.3.5) the second term in the braces in (1.3.8) is the loss function F(i, x,), we finally obtain Eq. (1.3.6) from (1.3.8).

The basic functional equation (1.3.6) of the dynamic programming ap- proach naturally allows us to derive the differential equation for the loss function F ( t , 2). To this end, in (1.3.6) we set t = t + A, where A > 0 is small, and obtain

t+A F ( t , xt) = min [l c(o, x(o), u ( r ) ) do + ~ ( t + A, xtca)]. (1.3.9)

u ( u ) € U t<u<t+A

Since the solutions x(t) of system (1.1.3) are continuous, the increments . . ( x t + ~ - xt) of the phase vector are small for admissible controls u(t) = cp(t, x(t)). Taking into account this fact and assuming that the loss function F ( t , x) is continuously differentiable with respect to all its arguments, we can expand the function F ( t + A, x ~ + ~ ) into its Taylor series about the point (t, xt) as follows:

In (1.3.10) aF/dx denotes an n-vector column with components aF/dxi , i = 1,2, . . . , n; therefore, the third term on the right-hand side of (1.3.10) is the scalar product of the vector of increments ( x t + ~ - xt) and the gradient of the loss function

the function o(A) denotes the terms whose order is larger than that of the infinitesimal A. It follows from (1.3.1) that for small A the increment of the phase vector x can be written in the form

Page 68: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems 51

Writing the first term in the square brackets in (1.3.9) as

J t

substituting (1.3.10) and (1.3.12) into (1.3.9), and taking into account (1.3.11), we arrive at

Note that only the first and the fourth terms on the right-hand side of (1.3.13) depend on the control ut. Therefore, the minimum is calculated only over these terms, the other terms in the brackets can be ignored. Dividing (1.3.13) by A, passing to the limit as A -+ 0, and taking into ac- count the fact that limA+o o(A)/A = 0, we obtain the following differential equation for the loss function F ( t , x):

(here we omit the subscript t of the phase vector xt and the control ut). Note that the loss function F ( t , x) satisfies Eq. (3.1.14) on the entire interval of control 0 5 t < T except a t the endpoint t = T, where, in view of (1.3.5), the loss function satisfies the condition

The differential equation (1.3.14), called the Bellman equation, plays the central role in applications of the dynamic programming approach to the synthesis of feedback optimal control. The solution of the synthesis prob- lem, that is, the optimal strategy or the control algorithm u, (t) = p, (t, x) = p, (t, x(t)) can be found simultaneously with the solution of Eq. (1.3.14). Namely, suppose that we have somehow found the function F ( t , x) that satisfies (1.3.14) and (1.3.15). Then the expression in the square brackets in (1.3.14) is a known function o f t , x, and u. Calculating the minimum of this function with respect to u, we obtain the optimal control u, = p, ( t , x) (u, determines the minimum point of this function in U c R,).

If the functions c(t, x, u) and g (t, x, u) and the set of admissible controls U allow us to minimize the function in the square brackets explicitly, then the optimal control can be written in the form

Page 69: Optimal Design of Control Systems Stochastic and Deterministic Problems

5 2 Chapter I

where dF/dx is a vector of partial derivatives yet unknown; when we min- imize the function in the square brackets in (1.3.14), we assume that this vector is given. Using (1.3.16) and denoting

we write (1.3.14) without the symbol "minx as follows:

To complete the synthesis problem, it is necessary to solve (1.3.18) with regard to (1.3.15), that is, to find the function F ( t , x) that satisfies (1.3.18) for 0 < t < T and continuously tends to a given function $(x) as t + T , and to substitute the function F ( t , x) obtained into (1.3.16).

In practice, the main difficulty in this synthesis procedure, is related to solving the Bellman equation (1.3.14) or (1.3.18), which is a first-order par- tial differential equation. The main distinguishing feature of the Bellman equation is that it is nonlinear because of the symbol "min" in (1.3.14), which shows that the function in (1.3.18) nonlinearly depends on the components of the vector of the partial derivatives dF/dx. The character of this nonlinearity is determined by the form of the functions c(t, x, u) and g(t, x, u), as well as by the set of admissible controls U .

Let us consider some typical illustrative examples.

lo. Suppose that c(t, x, u) = cl(t, x) + uTp( t , x)u, where P is a sym- metric r x r matrix positive definite for all x E R and t E [0, TI, g(t, x, u) = a(t, x) + Q(t, x)u (a is an n-vector and Q is an n x r matrix), and the con- trol u is unbounded (that is, U = R,). Then the expression in the square brackets in (1.3.14) takes the form

By differentiating this function with respect to u and solving the system d[.]/du = 0, we obtain

Page 70: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems 5 3

(the matrix P-l is the inverse of P ) . Substituting (1.3.20) into (1.3.19) instead of u, we obtain

2'. Suppose that c(t, x, u) = cl(t, x), g(t, x, u) = a( t , z ) + Q(t , x)u, and the domain U is an r-dimensional parallelepiped, that is, luil < uoi, i = 1, . . . , r, where the numbers uoi > 0 are given. One can readily see that

where signA and jAl are matrices obtained from A by replacing each its element a i j by sign ai j and laij 1 , respectively; {uol, . . . , uor) denotes the diagonal r x r matrix with uol, . . . , uor on its principal diagonal.

3'. Let the functions c(.) and g(.) be the same as in 2'; for the domain U, instead of a parallelepiped, we take an r-dimensional ball of radius Ro centered a t the origin. Then, instead of (1.3.22), we obtain the following expressions for the functions cpo and Q!:

Note that in (1.3.23) and in the following, dF/dxT denotes an n-row-vector with components dF/dxi , i = 1,. . . , n. Therefore, the function ~ Q Q ~ is a quadratic form in components of the gradient vector of the loss function, and the matrix QQT is its kernel.

As a rule, the nonlinear character of Bellman equations does not allow one to solve these equations (and the synthesis problem) explicitly. There is only one exception, namely, the so-called linear-quadratic problems of op- timal control (LQ-problems). In this case the differential equations (1.3.1) of the plant are linear:

Page 71: Optimal Design of Control Systems Stochastic and Deterministic Problems

54 Chapter I

(here A(t) and B( t ) are given n x n and n x r matrices), the penalty functions c(t, x, u) and $(x) in the optimality criterion (1.3.2) are linear-quadratic forms of the phase variables x and controls u, and there are no restrictions on the domain of admissible controls (that is, U = R, in (1.3.3)).

Let us solve the synthesis problem for the simplest one-dimensional LQ- problem with constant coefficients; in this case, the solution of the Bellman equation and the optimal control can be obtained as finite analytic formulas. Suppose that the plant is described by the scalar differential equation

2 = ax + bu, (1.3.24)

and the optimality criterion has the form

(cl > 0, c > 0, T > 0, h > 0, and a and b in (1.3.24) and (1.3.25) are given constant numbers). The Bellman equation (1.3.14) and the boundary condition (1.3.15) for problem (1.3.24), (1.3.25) have the form

The expression in the square brackets in (1.3.26) considered as a function of u is a quadratic trinomial. Since h > 0, this trinomial has the single minimum

which can readily be obtained from the relation d[.]/du = 0; this is a necessary condition for an extremum. Substituting u, into (1.3.26) instead of u and omitting the symbol "min", we rewrite the Bellman equation in the form

We shall seek the loss function F ( t , x) satisfying Eq. (1.3.29) and the boundary condition (1.3.27) in the form

Page 72: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems 55

where p(t) is the desired function of time. If we substitute (1.3.30) into (1.3.29), then we see that p(t) must satisfy the ordinary differential equation

for 0 < - t < T. Moreover, it follows from (1.3.27) and (1.3.30) that the function p(t) assumes a given value a t the right endpoint of the control interval:

p(T) = cl. (1.3.32)

Equation (1.3.31) can readily be integrated by separation of variables. The boundary condition (1.3.32) determines the unique solution of (1.3.3 1). Per- forming the necessary calculations, we obtain the following function p(t) that satisfies Eq. (1.3.31) and the boundary condition (1.3.32):

Thus it follows from (1.3.28) and (1.3.30) that the optimal control in the synthesis form for problem (1.3.24), (1.3.25) has the form

where p(t) is determined by (1.3.33). Note that problem (1.3.24), (1.3.25) is one of few optimal control prob-

lems, for which the Bellman equation can be solved exactly. In Chapter I1 we consider some other examples of exact solutions to synthesis problems of optimal control (for deterministic and stochastic control systems). How- ever, the majority of the optimal control problems cannot be solved exactly. In these cases, one usually employs approximate and numerical synthesis methods considered in Chapters 111-VII.

We complete this section with some remarks. First we note that we have considered only a formal scheme or, as is said

sometimes, the algorithmic essence of the dynamic programming approach. The described method for constructing an optimal control in the synthesis form (1.3.4) is justified by some assumptions, which are violated sometimes.

We need to take into account the following. (1) The loss function F ( x , t ) determined by (1.3.5) is not always dif-

ferentiable even if the penalty functions c(t, x, u) and $(x) are sufficiently

Page 73: Optimal Design of Control Systems Stochastic and Deterministic Problems

56 Chapter I

smooth (or even analytic) functions. It is well known that, by this rea- son, the dynamic programming approach cannot be used for solving many time-optimal control problems [50, 1561.

(2) Even in the case where the loss function F ( x , t ) satisfies the Bell- man equation (1.3.14), the control u,(t, x) minimizing the function in the square brackets in (1.3.14) may not be admissible. In particular, this con- trol can violate the existence and uniqueness conditions for the solution of the Cauchy problem for system (1.3.1).

(3) The Bellman equation (1.3.14) (or (1.3.18)) with the boundary con- dition (1.3.15) can have nonunique solutions.

Nevertheless, we have the following theorem [I].

THEOREM. Suppose that there exists a unique continuously differentiable solution Fo(t, x) of Eq. (1.3.14) with boundary condition (1.3.15) and there exists an admissible control u,(t, x) such that

T ~ F o c(t, 2, u) + 9 (t, 2 , ~ ) - ( t , 2) = c(t, x, u*) +gT( t , 2, u*)-(t, 2).

u E U "O dx 1 dx

Then the control u,(t, x) in the synthesis form is optimal, and the function Fo(t, x) coincides with the loss function (1.3.5).

In conclusion, we point out another fact relative to the dynamic pro- gramming approach. The matter is that this method can be used for solving problems of optimal control for which the optimal control u,(t, x) does not exist. For example, such situations appear when the domain of admissible controls U in (1.3.3) is an open set.

The absence of an optimal control does not prevent us from deriving the basic equations for the dynamic programming approach. It only suffices to modify the definition of the loss function (1.3.5). So, if we define the function F ( t , xi) as the greatest lower bound of the functional in the square brackets in (1.3.5),

then one can readily see that the function (1.3.35) satisfies the equations

F ( t , xt) = inf (1.3.36) U ( U ) € U

c(t, X, U) + g ~ ( t , x, ~)g(t , x)] = 0, dt

(1.3.37) a x

Page 74: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems 5 7

which are similar to Eqs. (1.3.6) and (1.3.14). However, in this case the functions u,(t, x) realizing the infimum of the function in the square brack- ets in (1.3.37) may not exist.

Nevertheless, the absence of an optimal control u,(t, x) is of no funda- mental importance in applications of the dynamic programming approach, since if the lower bound in (1.3.37) is not attainable, one can always con- struct the so-called &-optimal strategy u, (t, x). If this strategy is used in system (1.3.1), then the performance functional (1.3.2) attains the value I(u,) = F ( 0 , xo) + E , where E is a given positive number. Obviously, to construct an actual control system, it suffices to know the &-optimal strat- egy u, ( t , x) for a small E.

Here we do not describe methods for constructing &-optimal strategies. First, these methods are considered in detail in the literature (see, for ex- ample, 1113, 1371). Second (and this is the main point), the optimal control always exists in all special problems studied in Chapters 11-VII. This is the reason that, from the very beginning, in the definition of the loss function (1.3.5) we use the symbol "min" instead of a more general symbol "inf."

$1.4. The Bellman equations for Markov controlled processes

The dynamic programming approach is widely used for solving stochas- tic problems of optimal control. In this section we consider the control problems in which the controlled process is a Markov stochastic process. It follows from the definition of the Markov processes given in $1.1 that the probabilities of future states of a controlled system are completely de- termined by the current states of the vector of phase variables, which are assumed to be known a t any time t.

One can readily see that the servomechanism shown in Fig. 10 possesses the listed Markov properties if the following conditions are satisfied:

(1) the joint vector ( y ( t ) , x(t)) of instant values that define the input actions and output variables is treated as the phase vector of the system;

Page 75: Optimal Design of Control Systems Stochastic and Deterministic Problems

5 8 Chapter I

(2) the input action y(t) is a Markov stochastic process; (3) the random perturbation f (t) is a white noise type process; (4) the controller C is a noninertial device that forms current values of

the control actions u(t) according to the rule

Actually, if the plant P is described by equations of the form (1.2.55) and y(t) is a Markov process with known probability characteristics, then it follows from (1.2.55) and (1.4.1) that the joint vector Z(t) = (x(t), y(t)) is a Markov process. In particular, if y(t) is a diffusion process with drift coef- ficient A(t, y) and diffusion coefficient B(t , y), then it follows from (1.2.39), (1.2.55), and (1.4.1) that the vector E(t) satisfies a system of stochastic differential equations of the form (1.2.2), that is, E(t) is a diffusion Markov process.

In this section we deal only with systems of the type shown in Fig. 10. In §1.5 we consider the possibilities of applying the dynamic programming approach in a more general situation with non-Markov controlled process (Fig. 3).

Later we shall derive the Bellman equations for various stochastic prob- lems of optimal control that are studied in Chapters 11-VII. These problems were stated in 51.1.

1.4.1. Base problem. Optimal tracking of a diffusion pro- cess. As the basic problem we consider the synthesis of the optimal ser- vomechanism shown in Fig. 10 under the following conditions:

(i) the controlled plant P is described by a system of stochastic differ- ential equations of the form

where x E R, is a vector of controlled output variables, u E R, is a vector of control actions, [(t) is the n-dimensional standard white noise with characteristics (1.1.34), a and a are a given matrix and a vector- function, and the initial vector x(0) = xo and the time interval [O,T] are specified;

(ii) the optimal control is sought in the form (1.4.1), the goal of control is to minimize the functional

(iii) the restrictions on admissible controls have the form

Page 76: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems 5 9

where U is a given bounded closed subset in the space R,; (iv) the input stochastic process y(t) is independent of ((t) and is an

m-dimensional diffusion Markov process with a known vector AY(t, y) of drift coefficients and with matrix BY(t, y) of diffusion coefficients;'

(v) there are no restrictions on the phase variables, that is, on the com- ponents of the vector Z = (x, y) E Rn+m; the current values of the compo- nents of the joint vector Z can be measured precisely at any instant of time t E [O,T].

By analogy with (1.3.5) we determine the loss function F(t, xt, yt) for problem (i)-(v) as follows:

The loss function (1.4.5) for stochastic problem (i)-(v) differs from the loss function (1.3.5) in the deterministic case by the additional operation of averaging the functional in the square brackets in (1.4.5). The averaging in (1.4.5) is performed over the set of sample paths XT = [x(r) : t < r < TI, $' = [y(r): t < r 5 TI, that on the interval [t,T] satisfy the stochastic differential equations (1.4.2) and (*) (see the footnote) with initial condi- tions x(t) = z t , y(t) = yt and control function ~ ( r ) = ( ~ ( 7 , ~ ( r ) , Y(r)), t < r < T .

Since the process Z(r) = (x( r ) , Y(r)) is Markov, the result of averaging F,(t, 2,) = F,(t, xt, yt) = E[.] in (1.4.5) is uniquely determined by the time moment t , by the state vector of the system Zt = (xt , yt) a t this moment, and by a chosen algorithm of control, that is, by the vector-function y(.) in (1.4.1). Therefore, it turns out that the loss function (1.4.5) obtained by minimizing F,(t, Zt) = F,(t, xt, yt) over all admissible controlss (that is, over all admissible vector-functions c p ( - ) ) depends only on time t and the state (xt, yt) of the servomechanism (Fig. 10) a t this time moment.

'As was shown in 51.2, the coefficients AY(t , y ) and B Y ( t , y ) uniquely determi~le the system of stochastic differential equations

$ ( t ) = ay ( t , ~ ( t ) ) + ay ( t , ~ ( t ) ) v ( t ) , (*I whose solutions are sample paths of the Markov process y ( t ) ; in (*) I l ( t ) denotes the standard white noise (1.1.34) independent of E(t ) .

8 J ~ s t as in the deterministic case (51.3) the control in the form (1 .4 .1) is called admissible if (i) for all t E [0, T ) , x E R,, and y E R,, the vector-function p ( t , x , y) takes values from an admissible set U and (ii) the Cauchy problenl for the stocllastic differential equation (1 .4 .2) with control u ( t ) in the form (1 .4 .1) has a unique solution.

Page 77: Optimal Design of Control Systems Stochastic and Deterministic Problems

60 Chapter I

One can readily see that, for any Z E [t, TI, the loss function (1.4.5) satisfies the equation

that is a stochastic generalization of the functional equation (1.3.6). The averaging in (1.4.6) is performed over the sample paths xf and &, and the symbol E[-] in (1.4.6) indicates the conditional expectation E - - [-I. ":>y:l"t,yt

To prove (1.4.6), we write Ez, - -(-) for the conditional expectation , >Y;lx;,Y;

of a functional of phase trajectdries denoted by (.). Here we average over all possible sample paths x: = [x(r) : t 5 T < TI, y: = [y(r): t 5 r 5 T] provided that the sample paths x(T), y(r) are known" and fixed on the time interval [t, a. Then, using the formula for the repeated expectations

- T writing the integral in (1.4.5) as the sum & = J: + 1; of two integrals

and writing the minimum as the succession of minima

min = min min , u(r )€U U ( U ) E ~ E (P)EU t<s<T t<u<t t_<p<T

we can rewrite (1.4.5) as

- - min min E - -

t j a < t ?<P<T

It follows from (1.4.1) and (1.4.2) that the controls u ( ~ ) on the time interval Z 5 p < T do not affect the stochastic process x(u) on the preceding time interval t < - u < t and the input stochastic process y(r) is independent of controls a t all. Therefore, taking into account the obvious relation

Page 78: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems

we can rewrite (1.4.8) in the form

-

F(t, xt, yt) = min E u(a)EU t<a<S

Since the process (x(t), y(t)) is Markov, the result of averaging in the second term in the braces in (1.4.9) depends only on the terminal state of a fixed sample path (x:, y:) . Thus, replacing ExI , - - by Ex:,yTIxi,yi and taking

,Yi I":'Y: into account the fact that, by (1.4.5), the second term in (1.4.9) is the loss function F(3, xt; x), we finally obtain the functional equation (1.4.6) from (1.4.9).

Just as in the deterministic case, the functional equation (1.4.6) allows us to obtain a differential equation for the loss function F ( t , x, y). By setting 3 = t + A, we rewrite (1.4.6) in the form

F ( t , xt, yt) = min u f r ) E U

Assuming that A > 0 is small and the penalty function c(x, y, u) is con- tinuous in its arguments and having in mind that the diffusion processes x ( r ) and y(r) are continuous, we can represent the first term in the square brackets in (1.4.10) as

where, as usual, the function o(A) denotes infinitesimals of higher order than that of A.

Now we assume that the loss function F ( t , x, y) has continuous deriva- tives with respect to t and continuous second-order derivatives with respect to phase variables x and y. Then for small A we can expand the function

Page 79: Optimal Design of Control Systems Stochastic and Deterministic Problems

62 Chapter I

F ( t + A, x t+a , yt+a) in the Taylor series

Here all derivatives of the loss function are calculated a t the point ( t , xt, yt); as usual, d F / d x and d F / d y denote the n- and m-column-vectors of partial derivatives of the loss function with respect to the components of the vectors z and y, respectively; a2F/dxazT, a2F/axaYT, and a2F/aYaYT denote the n x n , n x m, and m x m matrices of second derivatives.

To obtain the desired differential equation for F ( t , x, y), we substitute (1.4.11) and (1.4.12) into (1.4.10), average, and pass to the limit as A + 0. Note that if we average expressions containing the random increments ( z t + ~ - z t ) and (yt+a - yt), then all derivatives of F in (1.4.12) are con- sidered as constants, since they depend on (t, xt, yt) and the mathematical expectation in (1.4.10) is calculated under the assumption that the values of xt and yt are known and fixed.

The mean values of the increments ( x t + ~ - xt) can be calculated by integrating Eqs. (1.4.2). However, we can avoid this calculation if we use the results discussed in $1.2. Indeed, if just as in (1.4.11) we assume that the control u ( r ) is fixed and constant, U(T) E ut , then we see that for t < T < t + A, Eq. (1.4.2) determines a Markov process X(T) such that we can write (see (1.1.54))

where Ax(t, xt, ut) is the vector of drift coefficients of this process. But since (for a fixed u( t ) = ut) Eq. (1.4.2) is similar to (1.2.2), it follows from (1.2.50) that the components of this vector have the formg

' ~ e c a l l that formula (1.4.14) holds for the symmetrized stochastic differential equa- tion (1.4.2). But if (1.4.2) is an Ito equation, then we have Ax( t , z t ,u t ) = a ( t , s t , u t ) instead of (1.4.14).

Page 80: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems 6 3

In a similar way, (1.4.2), (1.1.50) and (1.2.52) imply

where Bx(t , z t ) = u(t, xt)uT(t, xt). (1.4.16)

The other mean values in (1.4.12) can be expressed in terms of the input Markov process y(t) as follows:

E(Y~+A - ~ t ) = AY (t, yt)A + o(A), (1.4.17)

Finally, since the stochastic processes y(t) and [(t) are independent, we have

E(zt+a - xt)(yt+a - ~ t ) ~ = o(A)- (1.4.19)

Taking into account (1.4.13)-(1.4.19), we substitute (1.4.11) and (1.4.12) into (1.4.10) and rewrite the resulting expression as follows:

T a~ T aF + (AX (t, xt, ut)) & + (AY (t, yt)) - a y

1 d 2 F 1 + - Sp B x (t, xt) - 2 axaxT + 5 SP B~ (t, ~ t ) - ayayT a2F ] + o ( ~ ) } .

(1.4.20)

For brevity, in (1.4.20) we omit the arguments (t , xt, yt) of all partial deriva- tives of F and denote the trace of the matrix A = Ilaijll; by SPA = all + a22 + - - - + a,,.

By analogy with Eq. (1.3.14), we divide (1.4.20) by A, pass to the limit as A -+ 0, and obtain the following Bellman differential equation for the loss function F = F ( t , x, y):

By analogy with (1.3.14), we omit the subscripts of xt, yt, and ut, assuming that the phase variables x, y and the control vector u in (1.4.21) are taken

Page 81: Optimal Design of Control Systems Stochastic and Deterministic Problems

64 Chapter I

at the current time t. We also note that the loss function F = F( t , x, y) must satisfy Eq. (1.4.21) for 0 < t < T. At the right endpoint of the control interval, this function must satisfy the condition

which readily follows from its definition (1.4.5). By using the operator

a2 Sp Bx (t, X ) -

axaxT + SP BY (t, Y)- ayayT 7 (1.4.23) I we can rewrite (1.4.21) in the compact form

In the theory of Markov processes [45, 157, 1751, the operator (1.4.23) is called an injinitesimal operator of the diffusion Markov process Z(t) = ( ~ ( t ) ) , ~ ( t ) ) .

To obtain the optimal control in the synthesis form u, = cp, (t, x, y) for problem (i)-(v), we need to solve the Bellman equation (1.4.21) with the additional condition (1.4.22).

If it is possible to calculate the minimum of the function in the square brackets in (1.4.21) explicitly, then the optimal control can be written as follows (see 81.3, (1.3.16)-(1.3.18)):

and the Bellman equation (1.4.21) can be written without the symbol "min"

where @ denotes a nonlinear function of components of the vector dF/dx

Page 82: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthes i s P r o b l e m s for C o n t r o l S y s t e m s 65

In this case, solving the synthesis problem is equivalent to solving (1.4.26) with the additional condition (1.4.22). After the loss function F ( t , x, y) satisfying (1.4.26) and (1.4.22) is found, we can calculate the gradient d F ( t , x, y)/dx = w(t, x, y) and obtain the desired optimal control

Obviously, the main difficulty in this approach to the synthesis problem is to solve Eq. (1.4.26). Comparing this equation with a similar equation (1.3.18) for the deterministic problem (1.3.1)-(1.3.3), we see that, in con- trast with (1.3.18), Eq. (1.4.26) is a second-order partial differential equa- tion of parabolic type. By analogy with (1.3.18), Eq. (1.4.26) is nonlinear, but, in contrast with the deterministic case, the nonlinearity of Eq. (1.4.26) is weak, since (1.4.26) is linear with respect to the higher-order derivatives of the loss function. This is why, in the general theory of parabolic equa- tions [61, 1241, equations of type (1.4.26) are usually called quasilinear or semilinear.

In the general theory [I241 of quasilinear parabolic equations of type (1.4.26), the existence and uniqueness theorems for their solutions are proved for some classes of nonlinear functions @. The unique solution of (1.4.26) is selected by initial and boundary conditions on the function F ( t , x, y). In our case, condition (1.4.22) that determines the loss function f o r t = T plays the role of the "initial" condition. The boundary conditions are determined by the restrictions imposed on the phase variables x and y in the original statement of the synthesis problem. If, as in problem (i)-(v) considered here, there are no restrictions on the phase variables, then it is necessary to solve the Cauchy problem for (1.4.26). In this case, the unique- ness of the solution is ensured by some requirements on the rate of growth of the function F ( t , x, y) as 1x1, Iyl -+ rn (for details see Chapter 111).

However, there are no general methods for solving equations of type (1.4.26) explicitly. Nevertheless, in some specific cases, Eq. (1.4.26) can be solved approximately or numerically, and sometimes, exactly. We describe such special cases in detail in Chapters 11-VII.

Now let us consider some modifications of problem (i)-(v) that we shall study later. First of all, we trace how the form of the Bellman equation (1.4.21) varies if, in the initial problem (i)-(v), we use optimality criteria that differ from (1.4.3).

1.4.2. S t a t i o n a r y t racking. We begin by modifying the criterion (1.4.3), which allows us to examine stationary operating conditions of the servomechanism shown in Fig. 10.

We assume that criterion (1.4.3) does not penalize the terminal state of the controlled system, that is, the penalty function $(x, y) = 0 in the

Page 83: Optimal Design of Control Systems Stochastic and Deterministic Problems

66 Chapter I

functional (1.4.3). Then the servomechanism shown in Fig. 10 can operate in the time-invariant (stationary) tracking mode if the following conditions are satisfied:

(1) the input Markov process y(t) is homogeneous in time, namely, its drift and diffusion coefficients are independent of time: AY(t, y) = AY(y) and BY(t, y) = BY (y);

(2) the plant is autonomous, that is, the right-hand sides of Eqs. (1.4.2) do not depend on time explicitly, a ( t , x, u) = a(x, u) and u(t , x) = u(x);

3) the system works sufficiently long (the upper integration limit T + ca in (1.4.3)).

A process of relaxation to the stationary operating conditions is schemat- ically shown in Fig. 11, where the error z(t) = y(t) -x(t) between the input action (the command signal) and the controlled value (x and y are scalar variables) is plotted on the ordinate axis. One can see that for large T the operation interval [0, T] can be conventionally divided into two intervals: the time-varying operation interval [0, t l ] and the time-invariant operation interval [tl, TI. The first is characterized by a correlation relation between the values of random sample paths z(t) , t E [0, t l] , and the initial state z(0). On this interval the probability characteristics of the stochastic process z(t) depend on t. For t > t l , this correlation disappears, and we can assume that z(t), t E [tl, TI, is a stationary process. Hence, the characteristics of the process related to time t > t l are independent of t. In particular, the instant values of the processes x(t) and y(t) on the interval It1, T] have a constant density p, (x, y) of the probability distribution. Conditions for the existence of time-invariant operating conditions for linear controlled systems are discussed in [194].

Page 84: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems 6 7

The performance on the time-invariant interval is characterized by the value y of mean losses per unit time (the stationary tracking error). If the operation time T increases to T + AT (see Fig. l l ) , then the loss function (1.4.5) increases by yAT. Therefore, to study the stationary tracking, it is expedient, instead of the loss function (1.4.5), to use the loss function f (2, y) that is independent of time and can be written as

It follows from (1.4.23) and (1.4.24) that function (1.4.29) satisfies the stationary Bellman equation

where LE,y denotes the elliptic operator

Obviously, for the optimal control u, = cp,(x, y), the error y of stationary tracking has the form

and, together with the functions f (x, y) and u, = c p , (x, y), can be found by solving the time-invariant equation (1.4.30). Some methods for solving the stationary Bellman equations are considered in Chapters IILVI.

1.4.3. Maximization of the mean time of the first passage to the boundary.

As previously, we assume that in the servomechanism shown in Fig. 10 the stochastic process y(t) is homogeneous in time and the plant P is autonomous. We also assume that a simply connected closed domain D C Rn+, is chosen in the (n + m)-dimensional Euclidean space Rn+, of vectors (2, y). It is required to find a control that, for any initial state (x(0), ~ ( 0 ) ) E D of the system, maximizes the mean time E 7 during which the representative point (x(t), y(t)) achieves the boundary d D of the do- main D (see the criterion (1.1.21) in 31.1).

By Wu( t -to, xo, yo) we denote the probability of the event that the rep- resentative point (x, y) does not reach d D during time t-to if x(to) = xo and

Page 85: Optimal Design of Control Systems Stochastic and Deterministic Problems

6 8 Chapter I

y( to) = YO, ( z o , yo) E D , and a control algorithm u(t) = c p ( x ( t ) , y ( t ) ) is chosen. This definition implies the following properties of the function Wu:

wu (0 , xo, yo) = 1, WU (+m, xo, yo) = 0 ,

if (xo, yo) is an interior point of D ; (1.4.33)

Wu( t - to , X O , yo) E 0 , V t > to, if ("0, yo) E aD.

If t , denotes a random instant of time at which the phase vector Z ( t ) = ( x ( t ) , y ( t ) ) comes to the boundary a D for the first time, then the time r = t , -to of coming to the boundary is a random variable and the function Wu (-) can be expressed via the conditional probability

WU(t - to, "0 , yo) = PU { r 2 t - to I x ( t0 ) = 20 , Y( t0) = Y O )

= p U { r > t - to I ~ 0 , ~ ~ ) . (1.4.34)

For the mutually disjoint events { r < t-to} and { r > t-to}, the probability addition theorem implies

Expressing the distribution function of the probabilities Pu{r < t - to I xO, yo) via the probability density w,(u) of the continuous random vari- able r , we obtain

P { ~ < t - t o I " o , ~ o ) = I'-" W , (u) d a = 1 - Wu (t - to, xo, yo)

from (1.4.34) and (1.4.35). Hence, after the differentiation with respect

Using the same notation for the argument of the density and for the random value, that is, writing w,(t - to) = w ( r ) , from (1.4.33) and (1.4.36) we obtain the mean time Er of achieving the boundary

= Lm WU (r, " 0 , Yo) d r = 1; WU(t - to, "0 , Y O ) dt. (1.4.37)

This formula holds if limt,, (t - to) W u (t - to, xo, yo) = 0.

Page 86: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems 6 9

The mean time Er depends both on the initial state (xo, yo) of the controlled system shown in Fig. 10 and on a chosen control algorithm u = p(x, y). Therefore, the Bellman function for the problem considered is determined by the relation

By analogy with (1.4.10), for the function (1.4.38), the basic functional equation of the dynamic programming approach has the form

Fl(zt , yt) = max W U ( r - t , zt , yt) d~ + EFl(zt+a, Y ~ + A . u ( r ) € U

t<~< t+A 11

(1.4.39) The Bellman differential equation for the function Fl(x, y) can be de-

rived from (1.4.39) by passing to the limit as A t 0. In this case, the procedure is almost the same as that used for the derivation of Eq. (1.4.21) for the basic problem (i)-(v). Expanding F l ( x t + ~ , yt+a) in the Taylor series around the point (zt , Y,), averaging the expansion with respect to the random increments ( x ~ + A - xt) and ( y t + ~ - yt), taking into account the relation limA,o Wu(A, xt, yt) = 1 for all (xt, yt) lying in the interior of D, and passing to the limit as A + 0, from (1.4.39) with regard to (1.4.13)-(1.4.19), we obtain the Bellman differential equation for the func- tion Fl(x, y):

m a ~ L $ , ~ F l ( x , y) = -1, (1.4.40) u E U

where the elliptic operator Lz,y is given by (1.4.31). We also note that the function Fl(z , y) satisfies Eq. (1.4.40) in the in-

terior of the domain D. It follows from (1.4.33) and (1.4.38) that a t the points of the boundary dD the function Fl vanishes,

In the theory of differential equations of elliptic type, the problem of solving Eq. (1.4.40) with the boundary condition (1.4.41) is called the first interior boundary-value problem or the Dirichlet problem. Thus, solving the synthesis problem for the optimal control that maximizes the mean time of the first passage to the boundary is equivalent to solving the Dirichlet problem for the semilinear elliptic equation (1.4.40).

Page 87: Optimal Design of Control Systems Stochastic and Deterministic Problems

70 Chapter I

1.4.4. Minimization of the maximum penalty. Now let us consider the synthesis problem with optimality criterion (1.1.18) for the optimal control system shown in Fig. 10. In this case, it is reasonable to introduce the loss function

Fz(t ,xt ,yt) = min E max c ( x ( r ) , y ( r ) , u ( r ) ) . I (1.4.42)

t<s<T

In (1.4.42) the averaging has the meaning of the conditional mathematical expectation E[.] = E{[.] I x(t) = xt, y(t) = yt).

For small A we have the following basic functional relation for the func- tion F2:

Let us introduce the notation cO(x, y) = m i h E u c(x, y, u). Then it follows from (1.4.43) that either

provided that the function F2(t, xt , yt) > cO(xt, yt) has been obtained from (1.4.45).

Acting by analogy with Section 1.4.1, that is, expanding the function F2(t + A, x t + ~ , y t + ~ ) in the series (1.4.12), averaging, and passing to the limit as A + 0, from (1.4.44) and (1.4.45) we obtain (with regard to (1.4.13)-(1.4.19)) the Bellman equation in the differential form:

minLZL,z,yF~(t, 2, Y) = 0 if Fz(t, 2, Y) > cO(x, Y), (1.4.46)

Fz(t, X , Y) = cO(z, Y) otherwise,

where LT,z,y denotes the operator (1.4.23). The unique solvability of (1.4.46) implies the condition

Page 88: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems 7 1

as well as the matching conditions for the function F2(t, x, y) on the inter- face between the domains on which the equations in (1.4.46) are defined. These conditions of "smooth matching" [I131 require the continuity of the function F2(t, x, y) and of its first-order derivatives with respect to the phase variables x and y on the interface mentioned above.

If, by analogy with Sections 1.4.2 and 31.4.3, the characteristics of the input process y(t) and of the controlled plant P are time-independent, then it is often expedient to use a somewhat different statement of the problem considered, which allows us to assume that the loss function is independent of time. In this case, we do not fix the observation time but assume that the optimal system minimizes the functional

where ,f3 > 0 is a given number. This change of the mathematical statement preserves all characteristic features of the problem. Indeed, it follows from (1.4.48) that the time of observation of the function c(x, y, u) is bounded and determined by 0. Namely, this time is large for small /? and small for large 0.

For the criterion (1.4.48) the loss function is determined by the formula10

f2 (2, y) = min E maxc(x(r) , y ( ~ ) , u ( ~ ) ) e - P ( ~ - ~ ) . I (1.4.49) U ( T ) E U ~ ? t

Taking into account the relations

min E max c(x( r ) , Y ( ~ ) , u ( ~ ) ) e - P ( ~ - ~ ) u ( r ) ~ U I T > ~ + * J

- - min E max c(x( r ) , Y(r), u ( ~ ) ) e - P ( ~ - ~ - ~ )

r>t+A I

we can rewrite Eq. (1.4.43) for the function f2(x, y) in the form

''As usual E[.] in (1.4.49) is treated as the conditional mathematical expectation

E[.I = EU.1 I x(t) = 2, ~ ( t ) = Y}.

Page 89: Optimal Design of Control Systems Stochastic and Deterministic Problems

72 Chapter I

By analogy with the previous reasoning, from (1.4.50) we obtain the Bell- man equation for the function f2 (x, y):

minL;,,fi(x, Y) = Pf i (x , Y) if f2(x, Y) > cO(x, Y), (1.4.51)

otherwise,

where L;,, is the elliptic operator (1.4.31) that does not contain the deriv- ative with respect to time t. In $2.2 of Chapter 11, we solve Eq. (1.4.51) for a special problem of optimal control.

1.4.5. Optimal tracking of a strictly discontinuous Markov pro- cess. Let us consider a version of the synthesis problem for the optimal tracking system that differs from the basic problem (i)-(v) by conditions (i) and (iv). Namely, we assume that (i) the input process y(t) in the ser- vomechanism (see Fig. 10) is given by a strictly discontinuous Markov pro- cess (see 31.1) with known characteristics X(t, y) and ~ ( y , z , t ) determining the intensity of jumps and the density of the transition probability a t the state (t, y) and that (ii) there are no random perturbations ((t) that act on the plant P. In this case, the plant P is described by the system of ordinary (nonstochastic) differential equations

It follows from (1.1.68) that for small A the transition probability p(t , yt; t +

A, yt+A) = p(y(t + A) = y t + ~ I y(t) = yt) for the input process y(t) is determined by the formula

By analogy with the solution of the basic problem, in the case considered, the loss function F3(t,x, y) is determined by (1.4.5) if E[.] in (1.4.5) is understood as the averaging of the functional [.] over the set of sample paths y: = [y(r): t 5 T 5 T] issued from a given initial point y(t) = yt. Obviously, F3(t, x, y) satisfies the functional equations (1.4.6) and (1.4.10).

We rewrite Eq. (1.4.10) for F3 as follows:

Page 90: Optimal Design of Control Systems Stochastic and Deterministic Problems

Synthesis Problems for Control Systems 73

Note that for small A we can explicitly average (1.4.54) by integrating the function in the square brackets multiplied by the transition probability (1.4.53).

Since the sample paths of the input process y(t) are discontinuous, the random increments (yt+a - yt) are, generally speaking, not small. There- fore, in our case, instead of (1.4.12), we use the following representation of F3(t + A, xt+a, yt+a) as A +- 0:

(in (1.4.55) it is assumed that F3(t, x, y) is a continuously differentiable function with respect to t and x).

The Bellman equation for F3(t, x, y) can be derived from (1.4.54) in the standard way. To this end, we substitute expansion (1.4.55) into (1.4.54), average with probability density (1.4.53), and pass to the limit as A + 0 in (1.4.54).

Using (1.4.53), we obtain

In a similar way, it follows from (1.4.52) and (1.4.53) that

x t + ~ - xt = a(t, xt, ut)A + o(A), (1.4.57)

dF3 dF3 Ea+(t, xt, Y ~ + A ) = %(t, xt, yt) + O(A),

aF3 dF3 ET(t, x ~ , Y ~ + A ) = -(t, dx xt, yt) + O(A),

(in (1.4.58) and (1.4.59) the functions O(A) denote terms of the order of A such that l ima ,~ O(A) /A = N, where N is a finite number).

Page 91: Optimal Design of Control Systems Stochastic and Deterministic Problems

74 Chapter I

Using (1.4.55)-(1.4.60) and passing to the limit as A -+ 0 in (1.4.54), we obtain the following Bellman integro-differential equation for the func- tion F3:

F3(T7 2, Y) = $(z, Y). (1.4.62)

If X(t, Y) = X(Y), r ( t , Y, z) = T(Y, z), a( t , 2, u) = a(x, u), *(x, Y) = 0, and T t m, then the system shown in Fig. 10 may operate in the stationary tracking mode (see Section 1.4.2). In this case, instead of (1.4.61), we have the stationary Bellman equation

where a stationary loss function f3(x, y) is determined by analogy with (1.4.29) as

f3(x, Y) = lim [F3(t, x, Y) - 7(T - t)] T-+03

and the number y > 0 determines mean losses per unit time in the station- ary tracking mode under the optimal control. The solution of the time- invariant equation (1.4.63) for a special synthesis problem is given in 32.2.

In conclusion, we make some remarks. First, we note that in this section we have considered only the synthesis

problems (and the corresponding Bellman equations) that are studied in the present monograph. The Bellman equations for other stochastic control problems can be found in [I, 3, 5, 18, 34,50, 57, 58, 113, 1221. Moreover, the ideas and methods of the dynamic programming approach are widely used for solving problems of optimal control for Markov sequences and processes with finitely or countably many states [151, 1521, which we do not consider in this book.

We also point out that many arguments and computations in this section are of rather formal character and sometimes correspond to the "physical level of rigor." To justify the optimality principle, the sufficiency of Markov optimal strategies, the validity of Bellman differential equations, and the solvability of synthesis problems rigorously, it is required to have rather complicated and refined mathematical constructions that are beyond the framework of this book. The reader interested in a closer examination of these problems is referred to the monographs [58, 59, 1751, and especially to [113].


§1.5. Sufficient coordinates in control problems with indirect observations

We have already noted that the dynamic programming method in, so to say, its "pure" form can be used only for Markov controlled processes. Let X_t be the current phase state of the system. The probabilities of future states X_{t+Δ} (Δ > 0) of the process X(t) must be completely determined by the last measured value X_t. However, since the time evolution of X(t) depends on random perturbations and control actions, the process X(t) satisfies the Markov property only if the values u_t of the current control are determined by the instant values of the phase variables and time as follows:

The Markov property of the process X(t) allows us to write the basic functional equation of the optimality principle, then to obtain the Bellman equation, etc., that is, to follow the procedure described in §1.4.

To implement the control algorithm in the form (1.5.1), it is necessary to measure the phase variables X_t exactly at each instant of time. This possibility is provided by the servomechanism shown in Fig. 10. In this case, the phase variables X_t = Z_t = (x_t, y_t) are the components of the (n + m)-dimensional vector of instant input (assigning) actions and output (controlled) variables.

Now let us consider the more general case of the system shown in Fig. 3. At each instant of time, instead of the true values of the vectors x_t and y_t, we have only the results of measurements x̃_0^t and ỹ_0^t, which are sample paths of the stochastic processes {x̃(s): 0 ≤ s ≤ t} and {ỹ(s): 0 ≤ s ≤ t}. These processes are mixtures of "useful signals" y_0^t, x_0^t and "random noises" ζ_0^t, η_0^t. Only these results of measurements can be used for calculating the current values of the control actions u_t; therefore, the desired control algorithm for the system shown in Fig. 3 has the form of the functional

To illustrate the computation of the optimal functional φ(t, x̃_0^t, ỹ_0^t), we consider, as an example, the basic synthesis problem (see §1.4, Section 1.4.1) in the case of indirect observations.

Assume that the equation of the controlled plant, the restrictions on the control, and the optimality criterion have the form


(here we use the notation from (1.4.2), (1.4.3), and (1.4.4) in §1.4). The observed processes x̃(t) and ỹ(t) are determined by the relations

Here P, Q, H, and G are given matrices whose dimensions agree with the dimensions of the vectors x̃, x, η, ỹ, y, and ζ. We also assume that the vectors x̃ and η (as well as the vectors ỹ and ζ) are of the same dimension, and the square matrices Q(t) and G(t) are nondegenerate for all t ∈ [0, T].¹¹ We assume that the stochastic process ξ(t) in (1.5.3) is the standard white noise (1.1.34) and the other stochastic functions y(t), ζ(t), and η(t) are Markov diffusion processes with known characteristics (that is, with given drift and diffusion coefficients). The stochastic processes ξ(t), y(t), ζ(t), and η(t) are assumed to be independent. We also note that the stochastic process x(t), which is a solution of the stochastic equation (1.5.3), is not Markov, since in this case the control functions u(t) = u_t on the right-hand side of (1.5.3) have the form of the functionals (1.5.2) and depend on the history of the process.

Following the formal scheme of the dynamic programming approach, by analogy with (1.4.5), we can define the loss function for the problem considered as follows:

$$F(t, \tilde{x}_0^t, \tilde{y}_0^t) = \min_{\substack{u(\tau)\in U\\ t\le\tau\le T}} E\Big[\int_t^T c\big(x(\tau), y(\tau), u(\tau)\big)\,d\tau + \psi\big(x(T), y(T)\big)\,\Big|\,\tilde{x}_0^t, \tilde{y}_0^t\Big]. \quad (1.5.7)$$

Since the functions x̃_0^t and ỹ_0^t are arguments of F in (1.5.7), it would be more correct if expression (1.5.7) were called a loss functional; however, both (1.5.7) and (1.4.5) are called loss functions.

In contrast with §1.4, it is essentially new that we cannot write the optimality principle equation of type (1.4.6) or (1.4.10) for the function (1.5.7), since this function depends on the stochastic processes x̃(t) and ỹ(t), which are not Markov. Formula (1.5.6) immediately shows that x̃(t) and ỹ(t) have no Markov properties, since the sum of Markov processes is, in general, not a Markov process. Moreover, it was pointed out that the process x(t) itself is not Markov. Therefore, we can solve the synthesis problem

"For simplicity, we assume that Q( t ) and G ( t ) are nondegenerate, but this condition is not necessary [132, 1751.


by using the dynamic programming approach only if we can choose new "phase" variables X(t) = X_t for the loss function (1.5.7) so that, on the one hand, they are sufficient for the computation of minimum future losses in the sense of

$$F(t, \tilde{x}_0^t, \tilde{y}_0^t) = F(t, X_t)$$

and, on the other hand, the stochastic process X(t) is Markov. Such phase variables X_t are called sufficient coordinates [171] by analogy with the sufficient statistics used in mathematical statistics [185].

It turns out that sufficient coordinates exist for the problem considered, and X_t is the collection of the instant values of the observable processes x̃(t) = x̃_t and ỹ(t) = ỹ_t and of the a posteriori probability density p(t, x_t, y_t) = p(x(t) = x_t, y(t) = y_t | x̃_0^t, ỹ_0^t) of the unobserved vectors x_t and y_t,

In what follows, it will be shown that the coordinates (1.5.8) are sufficient to compute the loss function (1.5.7). In the case of an uncontrolled process x(t), the Markov property of (1.5.8) follows from Theorem 5.9 in [175].

To derive the Bellman differential equation, it is necessary to know the equations that determine the time evolution of the sufficient coordinates. For the first two components of (1.5.8), that is, for the processes x̃(t) and ỹ(t), these equations can be assumed to be known, because one can readily obtain them from the a priori characteristics of the processes y(t), x(t), ζ(t), η(t) and formulas (1.5.6). Later we derive the equation for the a posteriori probability density p(t, x_t, y_t).

First, we do not pay attention to the fact that the control u_t has the form of a functional (1.5.2). In other words, we assume that u(t) in (1.5.3) is a known deterministic function of time. Then the stochastic process x(t) that satisfies the stochastic equation (1.5.3) is a diffusion Markov process whose characteristics (the drift and diffusion coefficients) are uniquely determined by the vector a(t, x, u) and the matrix σ(t, x) (see §1.2). Thus, in our case, x(t), y(t), ζ(t), and η(t) are independent Markov diffusion processes with given drift coefficients and matrices of diffusion coefficients. In view of formulas (1.5.6) and the fact that the matrices Q(t) and G(t) are nondegenerate, it follows that the collection (x̃(t), ỹ(t), x(t), y(t)) is a Markov diffusion process whose characteristics can be expressed via the given characteristics of the processes x(t), y(t), ζ(t), and η(t). Indeed, if we denote the vectors of drift coefficients by A_x(t, x), A_y(t, y), A_ζ(t, ζ), A_η(t, η) and the diffusion matrices of the independent Markov processes x(t), y(t), ζ(t), and η(t) by B_x(t, x), B_y(t, y), B_ζ(t, ζ), B_η(t, η), then it follows from (1.5.6) that the drift coefficients A_x̃ and A_ỹ for the components x̃(t) and ỹ(t) are


determined by the relations

and the matrix B of the diffusion coefficients of the joint process (x̃(t), ỹ(t), x(t), y(t)) has the block form

where B_x̃(t, x̃, x) and B_ỹ(t, ỹ, y) are square matrices¹² determined by the relations

Now we point out that in the Markov collection of random functions (x̃(t), ỹ(t), x(t), y(t)) the components x̃(t) and ỹ(t) are observable, but the components x(t) and y(t) are not. Partially observable Markov processes are often called conditional Markov processes. The rigorous theory of such processes can be found in [132, 175].

Let us consider the conditional (a posteriori) density p(t, x_t, y_t) = p(x(t) = x_t, y(t) = y_t | x̃_0^t, ỹ_0^t) of the probability distribution of the unobservable components of the partially observable Markov process (x̃(t), ỹ(t), x(t), y(t)). It turns out that the a posteriori density p(t, x_t, y_t) satisfies a stochastic partial differential equation, first obtained in [175]. This equation is a generalization of the Fokker-Planck equation (1.1.67) to the case of observation. In what follows, we briefly derive this equation.

¹²If P, Q, H, and G in (1.5.6) are row matrices, then B_x̃(t, x̃, x) and B_ỹ(t, ỹ, y) are scalar functions.


According to [175], we introduce the following notation. We denote the collection of random functions (x̃(t), ỹ(t), x(t), y(t)) that forms a Markov process by a single letter z(t) and assume that the dimension of the vector z is equal to n. We assume that the unobservable components of the vector z are numbered from 1 to m and the observable components are numbered from m+1 to n. For convenience, we write x_α (1 ≤ α ≤ m) for unobservable components and y_ρ (m+1 ≤ ρ ≤ n) for observable ones. We also use three groups of indices: the indices i, k, ℓ, ... vary from 1 to n; the indices α, β, γ, ... from 1 to m; and ρ, σ, τ, ... from m+1 to n.

By assumption, the local characteristics of the diffusion process z(t) are given a priori:

$$\lim_{\Delta\to 0}\frac{1}{\Delta}\,E\big[\Delta z_k \mid z(t) = z\big] = A_k(t, z); \qquad \Delta z_k = z_k(t + \Delta) - z_k(t);$$

It is required to obtain an equation for the a posteriori probability density p(t, x_t) = p(x_t | y_0^t), provided that (1.5.14) and the results of observation y_0^t are known.

Using the transition probability p_Δ(z_{t+Δ} | z_t) = p_Δ(x_{t+Δ}, y_{t+Δ} | x_t, y_t) and the probability multiplication theorem, we obtain

Integrating (1.5.15) with respect to xt and taking into account (1.1.50), we obtain

If we write the left-hand side of (1.5.16) in the form

then we can write (1.5.16) as follows:

Integrating (1.5.16) with respect to x_{t+Δ}, we obtain


Substituting (1.5.18) into (1.5.17) and taking into account the fact that the equality

is valid, since the arguments are continuous, we obtain


(1.5.19)

Equation (1.5.19) for partially observable Markov processes plays the same role as the Markov (Smoluchowski) equation (1.1.53) for complete observation. To derive the differential equation for the a posteriori density p(t, x_t) from (1.5.19), we use the same method as for the derivation of the Fokker-Planck equation in §1.1 (see (1.1.59)-(1.1.64)).

Let us introduce two characteristic functions of the random increments Δz_α, α = 1, ..., m, and Δz_k, k = 1, ..., n:¹³

The transition probability can be expressed in terms of inverse Fourier transforms as follows:

¹³In (1.5.20) and (1.5.21), as usual, j = √−1 and the sum is taken over repeated indices:


Using the expansion of ln Θ₂(u, z_t) in the Maclaurin series, we can write

$$\Theta_2(u_1,\ldots,u_n,z_t) = \exp\Big\{\sum_{s=1}^{\infty}\frac{1}{s!}\sum_{k,\ell,\ldots,\tau=1}^{n} K_s\big[\Delta z_k, \Delta z_\ell, \ldots, \Delta z_\tau\big]\,(ju_k)(ju_\ell)\cdots(ju_\tau)\Big\}, \quad (1.5.24)$$

where K_s[Δz_k, ..., Δz_τ] denotes the s-order correlation between the components of the vector of increments Δz = z(t + Δ) − z(t) of the Markov process z(t). Using well-known relations between correlations and initial moments [173, 181], we see that (1.5.14) gives the following representation for (1.5.24):

$$\Theta_2(u_1,\ldots,u_n,z_t) = \exp\Big[A_k j u_k\Delta - \frac{\Delta}{2}B_{k\ell}u_k u_\ell + o(\Delta)\Big] \quad (1.5.25)$$

(for brevity, in (1.5.25) and in the following we do not write the arguments of A_k and B_{kℓ}, namely, A_k = A_k(t, z_t) and B_{kℓ} = B_{kℓ}(t, z_t)).

Comparing (1.5.22) and (1.5.23), we see that

$$\Theta_1(u_1,\ldots,u_m,z_t,y_{t+\Delta}) = (2\pi)^{m-n}\int \Theta_2(u_1,\ldots,u_n,z_t)\,\exp\big[-ju_\sigma(y_{\sigma,t+\Delta} - y_{\sigma t})\big]\,du_{m+1}\cdots du_n. \quad (1.5.26)$$

After the substitution of (1.5.25), we can calculate the integral (1.5.26) explicitly. As a result of the integration, we obtain the following formula¹⁴ for the characteristic function:

where

¹⁴To obtain (1.5.27) and (1.5.28), it suffices to use the well-known formula [67]

which holds for real symmetric positive definite matrices B = ||B_{kℓ}||₁ⁿ and any constants m_k and m_ℓ.


and K is a constant that does not influence the final result of the calculations. Note that we calculated (1.5.26) under the assumption that the matrix ||B_{ρσ}||, ρ, σ = m+1, ..., n, is nondegenerate, and we used the notation ||F_{ρσ}|| = ||B_{ρσ}||⁻¹.

Since the exponent in (1.5.27) is small (∼ Δ), we replace the exponential by the first three terms of the Maclaurin series e^x = 1 + x + x²/2, truncate the terms whose order with respect to Δ is higher than 1, and obtain

$$\exp\big[L(u_1,\ldots,u_m,z_t,y_{t+\Delta})\Delta + o(\Delta)\big] = 1 + L(u_1,\ldots,u_m,z_t,y_{t+\Delta})\Delta + \tfrac{1}{2}L^2\Delta^2 + o(\Delta). \quad (1.5.29)$$

In (1.5.29) we used the relation F_{ρσ}B_{στ} = B_{ρσ}F_{στ} = δ_{ρτ}, where δ_{ρτ} is the Kronecker delta, and the formula

which follows from the properties of Wiener processes and is a multidimensional generalization of formula (1.2.8) (for details, see Lemma 2.2 in [175]). Substituting (1.5.28) into (1.5.29) and collecting similar terms, we obtain

$$\exp\big[L(u_1,\ldots,u_m,z_t,y_{t+\Delta})\Delta + o(\Delta)\big] = 1 + (ju_\alpha)\big(A_\alpha\Delta + B_{\alpha\sigma}F_{\sigma\rho}\Delta y_\rho\big) + \cdots \quad (1.5.31)$$

Using (1.5.23), (1.5.27), and (1.5.31), we calculate the numerator of the fraction on the right-hand side in (1.5.19):

$$\int\Big[1 + ju_\alpha\big(A_\alpha\Delta + B_{\alpha\sigma}F_{\sigma\rho}\Delta y_\rho\big) + \frac{1}{2}(ju_\alpha)(ju_\beta)\big(B_{\alpha\beta}\Delta + \cdots\big)\Big]\,p(t, x_t)\,du_1\cdots du_m\,dx_{1t}\cdots dx_{mt}$$

Taking into account the formulas (see (1.1.60)-(1.1.64))


we obtain the numerator in (1.5.19) (we omit the constant K, since K and a similar constant in the denominator of (1.5.19) cancel each other):

(in (1.5.33), (t, x_{t+Δ}, y_t) are the arguments of the coefficients A_k, B_{kℓ}, and F_{ρσ}).

The denominator of the right-hand expression in (1.5.19) differs from the numerator by integration with respect to x_{t+Δ}. We perform this integration, take into account the fact that the normalization condition for the probability density p(t, x_{t+Δ}) implies

and from (1.5.33) obtain the following expression (without K) for the denominator in (1.5.19):

where E_ps(·) denotes the a posteriori averaging ∫(·)p(t, x) dx. We assume that the elements of the matrix B_{ρσ} (and of F_{ρσ}, respectively) are independent of the unobservable components x and take into account (1.5.30). Then we can write


Multiplying (1.5.33) by (1.5.35) and substituting the result into (1.5.19), we obtain

As Δ → 0, the terms denoted by o(Δ) in (1.5.36) disappear, and the finite increments become differentials. In this case, according to §1.2, it is necessary to point out in which sense the stochastic differentials are understood, since the differential equation obtained is stochastic (it contains the differential dy_ρ(t) of the Markov process).

Comparing Eq. (1.5.36) (as Δ → 0) with the stochastic equation (1.2.3), we see that now the a posteriori probability density p(t, x) in (1.5.36) plays the role of the random function x(t) in (1.2.3), and the vector-function

plays the role of the function a(t, x) in (1.2.3). In this case, as follows from the derivation of Eq. (1.5.36), the functions (1.5.37) are multiplied not by the time increment Δ but by the increments Δy_ρ = y_ρ(t + Δ) − y_ρ(t) (see (1.5.14)). Therefore, as Δ → 0, (1.5.36) implies the following differential equation in the Itô form:

Equation (1.5.38) is the desired equation for the a posteriori probability density of the unobservable components x(t) of the partially observable Markov process z(t).

For the a posteriori density p(t, x), we obtain the symmetrized equation (equivalent to (1.5.38))


$$\cdots - E_{ps}\Big(B_{\tau\sigma}\frac{\partial(F_{\sigma\rho}A_\rho)}{\partial x_\tau} + B_{\tau\sigma}\frac{\partial(F_{\sigma\rho}A_\rho)}{\partial y_\tau} + F_{\sigma\rho}A_\sigma A_\rho\Big)\Big]\,p \quad (1.5.39)$$

by using the coupling formulas between stochastic differentials and integrals (see §1.2);¹⁵ in (1.5.39), ẏ_ρ = ẏ_ρ(t) = dy_ρ(t)/dt denotes the formal time derivative of the Markov process y_ρ(t).

Equation (1.5.39) is a generalization of the Fokker-Planck equation to the case of observation. It should be noted that if some transformations of the random function p(t, x) are necessary (see formulas (1.5.41)-(1.5.44) below), then it is more convenient to use Eq. (1.5.39), although it is more cumbersome than the similar Itô equation (1.5.38), since (see §1.2) the symmetrized form allows us to treat random functions (even such singular functions as white noises) according to the same formal rules as for deterministic and sufficiently smooth functions.

One can show [132] that √F_{σρ}[d°y_ρ(t) − E_ps A_ρ dt]¹⁶ is the differential of the standard Wiener process dη(t) studied in §1.2. Therefore, in view of Eq. (1.5.38), the already cited Markov property of the set (y_t, p(t, x)) can be obtained by not completely rigorous but sufficiently illustrative arguments.

Indeed, since the increments [η_ρ(t + Δ) − η_ρ(t)] of the stochastic processes η(t) = √F[y(t) − ∫ E_ps A(τ, y(τ)) dτ] in (1.5.38) are mutually independent, the future values of the a posteriori probability p(t + Δ, x) are completely determined by (x_t, y_t, p(t, x)). Since the vector x_t is unobservable, the probabilities p(t + Δ, x) of future values are determined by (y_t, p(t, x)) and the probability of the current value x_t, that is, by the a posteriori density p(t, x) contained in (y_t, p(t, x)). On the other hand, since the process z(t) is of Markov character, the probabilities of future values of the observable process y_{t+Δ} are completely determined by its current state z_t = (x_t, y_t), that is, by the same set (y_t, p(t, x)), since x_t is unobservable. This implies that (y(t), p(t, x)) is a Markov process.

Now let us recall that Eqs. (1.5.38) and (1.5.39) were derived under the assumption that the control u(t) in (1.5.3) is a known deterministic function of time. However, if the control u(t) is given by the functional (1.5.2) (in the new notation introduced after (1.5.14), this functional has the form u(t) = u_t = φ(t, y_0^t)), then this fact does not affect the Markov properties of (y_t, p(t, x)), since it is assumed that (y_t, p(t, x)) is determined by the

¹⁵Here we do not show in detail how to transform the Itô equation (1.5.38) into the symmetrized form (1.5.39); the reader is strongly recommended to do this useful exercise on his own.

¹⁶√F_{σρ} denotes an element of the matrix √F, which is the square root of the matrix F; since the matrix ||B_{ρσ}|| is nondegenerate, the inverse F = ||B_{ρσ}||⁻¹, as well as ||B_{ρσ}||, is positive definite and symmetric; therefore, the matrix square root √F exists [62], and the matrix √F, as well as F, is positive definite and symmetric.


entire past history of the observations y_0^t = {y(s): 0 ≤ s ≤ t}. Thus, for a given state of (y_t, p(t, x)) and any chosen functional φ in (1.5.2), the control u_t is a known vector on which the functions a(t, x, u) in (1.5.3) and the functions A_α and A_ρ in (1.5.38) depend as on a parameter. Hence it follows that Eqs. (1.5.38) and (1.5.39) are also valid for controlled processes (provided that the control is given in the form (1.5.2)).

Now let us return to the synthesis problem and the dynamic programming approach. Describing the state of the controlled system at time t by (1.5.8) or, briefly, by (y_t, p(t, x)) (recall that after (1.5.14) we introduced the new notation: x̃_t, ỹ_t → y_t and x_t, y_t → x_t), we can write the loss function (1.5.7) as F(t, y_0^t) = F(t, y_t, p(t, x)). Using the Markov property of (y_t, p(t, x)), we can write the basic equation of the optimality principle for the function F(t, y_t, p(t, x)) as follows:

Generally speaking, by passing to the limit as Δ → 0 in (1.5.40) and using (1.5.14) and (1.5.38) (or (1.5.39)), we can obtain the Bellman differential equations by analogy with §1.4. However, the equation obtained in this way contains the functional derivatives δF/δp(t, x), δ²F/δp(t, x)δp(t, x̄), etc.; usually it is difficult to solve this equation (as pointed out in §1.4 and §1.5, even the solution of "usual" Bellman partial differential equations is a rather complicated problem). Therefore, in practice it is more convenient, instead of the a posteriori density p(t, x), to use some equivalent set of parameters as arguments of the function F. We show how to do this.

Assume that the a posteriori probability density p(t, x) is a unimodal function of the vector variable x for all t ∈ [0, T]. By the vector m_t = m(t) we denote the maximum point of the a posteriori density p(t, x) at time t. Expanding ln p(t, x) in the Taylor series with respect to x around the point m(t), we obtain the following representation of the a posteriori density p(t, x):

$$p(t, x) = \exp\Big\{a(t) + \sum_{s=2}^{\infty}\frac{1}{s!}\sum_{\alpha,\beta,\ldots,\zeta=1}^{m} a_{\alpha\beta\ldots\zeta}(t)\,\big(x_\alpha - m_\alpha(t)\big)\cdots\big(x_\zeta - m_\zeta(t)\big)\Big\} \quad (1.5.41)$$


(the scalar function a(t) in (1.5.41) is determined by the normalization condition ∫p(t, x) dx = 1).

Using (1.5.41), we readily obtain a system of equations for the parameters (m_α(t), a_{αβ}(t), a_{αβγ}(t), ...) instead of the symmetrized equation (1.5.39). To this end, we rewrite (1.5.39) in the more compact form

Next we replace the functions Ā_α and Φ(x, y) by their Taylor series,¹⁷ substitute (1.5.41) into (1.5.42), and successively set the coefficients of equal powers of (x_α − m_α), (x_α − m_α)(x_β − m_β), ... on the left- and right-hand sides of (1.5.42) equal to each other; thus we obtain the following system of ordinary differential equations for m_α(t), a_{αβ}(t), a_{αβγ}(t), ...:

In (1.5.43) the dot over a variable indicates, as usual, the time derivative (ṁ_β = dm_β(t)/dt). Moreover, in (1.5.43) we assume that B_{αβ} is independent of x and omit the arguments of the functions Ā, Φ, and of their derivatives; we assume that the values of these functions are taken at the point x = m, that is, Ā_β = Ā_β(t, m, y), ∂Φ/∂x_α = ∂Φ(t, m, y)/∂x_α, etc.

It follows from (1.5.41) that the set of parameters m_α(t), a_{αβ}(t), ... uniquely determines the a posteriori probability density p(t, x) at time t.

¹⁷The functions Ā_α and Φ(x, y) are expanded with respect to x in a neighborhood of the point m(t).


Thus we can use these parameters as new arguments of the loss function, since F(t, y_t, p(t, x)) = F(t, y_t, m_{αt}, a_{αβt}, ...). However, in the general case, system (1.5.43) is of infinite order; therefore, if we use the new sufficient coordinates (y_t, m_{αt}, a_{αβt}, ...) instead of the old coordinates (y_t, p(t, x)), we do not gain a considerable advantage in solving special problems. Nevertheless, there is an important class of problems in which the a posteriori probability density (1.5.41) is Gaussian (conditional Gaussian Markov processes are studied in detail in [131, 132]). We have such processes if [175]: (1) the elements of the matrix B are constant numbers; (2) the functions Ā_α depend linearly on x; (3) the function Φ(x, y) depends on x linearly and quadratically; (4) the initial probability density (the a priori probability density of the unobservable components before the observation) p(0, x) is Gaussian. Under these conditions, we have a_{αβγ} = a_{αβγδ} = ··· = 0 in (1.5.41) and (1.5.43), and system (1.5.43) is closed and of finite dimension:

Now let us consider the synthesis problem corresponding to this case. To avoid cumbersome formulas, we deal with a simplified version of problem (1.5.3)-(1.5.6). Namely, we assume that the input y(t) is absent and the system shown in Fig. 3 does not contain Block 1. Suppose that the plant P is described by a system that is linear with respect to the output (controlled) variables,

$$\dot{x} = G(t, u)x + b(t, u) + \sigma(t)\xi(t), \quad (1.5.45)$$

where x = x(t) is an m-vector of output variables, G(t, u) and σ(t) are given m × m matrices, b(t, u) is a given m-vector-function, and ξ(t) is an m-vector of random perturbations of the standard white noise type (1.1.34). More explicitly, the vector-matrix equation (1.5.45) has the form

We observe the stochastic process

where x̃ and η are k-vectors, P and Q are k × m and k × k matrices, the matrix Q(t) is nondegenerate for all 0 ≤ t ≤ T, and η(t) is the standard white noise (1.1.34) independent of ξ(t).


Under the assumption that the admissible control satisfies condition (1.5.4), it is required to find the optimal control u_*(t) = φ(t, x̃_0^t) such that the cost functional

attains its minimum value. We write

Then (x(t), y(t)) is a Markov stochastic process, and it follows from relations (1.5.45), (1.5.46), and (1.5.48) that the characteristics (1.5.14) of this process have the form

In (1.5.49) the indices α, β, γ take values from 1 to m and the indices ρ, σ, τ from m+1 to m+k.

In this case, it follows from (1.5.49) and (1.5.39) that in (1.5.42) we have

(F_{ρσ} is an element of the matrix ||B_{ρσ}||⁻¹ = [Q(t)Qᵀ(t)]⁻¹). It follows from (1.5.49) and (1.5.50) that in this case system (1.5.43) takes the form (1.5.44). Substituting (1.5.49) and (1.5.50) into (1.5.44), we obtain the following system of equations for the parameters m_α(t), a_{αβ}(t), α, β = 1, ..., m, of the a posteriori density:

System (1.5.51) can be written in a more compact vector-matrix form. Introducing the matrix A = ||a_{αβ}||₁ᵐ and taking into account the fact that y_ρ = x̃_ρ according to (1.5.48), we see that (1.5.51) implies

$$A\dot{m} = A\big[G(t, u)m + b(t, u)\big] + P^T(t)\big[Q(t)Q^T(t)\big]^{-1}\big(\tilde{x} - P(t)m\big),$$

$$\dot{A} = -A\sigma(t)\sigma^T(t)A - G^T(t, u)A - AG(t, u) + P^T(t)\big[Q(t)Q^T(t)\big]^{-1}P(t). \quad (1.5.52)$$


Now we note that the right-hand sides of (1.5.52) do not explicitly depend on y(t), and moreover, the cost functional (1.5.47) is independent of the observable process x̃(t). Therefore, in this case, the current values of the vector y_t do not belong to the sufficient coordinates of problem (1.5.45)-(1.5.47), which are the current values of the components of the vector m_t and of the elements of the matrix A_t.

If instead of the matrix A we consider the matrix D = A⁻¹ of a posteriori covariances, then, multiplying the first equation in (1.5.52) by the matrix D from the left and the second equation in (1.5.52) from the left and from the right, and taking into account the formulas

we obtain, instead of (1.5.52), the relations

Equations (1.5.53) are the well-known equations of the Kalman filter [1, 5, 58, 79, 132]. As is known, the Kalman filter is a device for optimal filtering of the "useful signal" x(t) observed against the background of a random noise. In this case, the vector m(t) is the optimal¹⁸ estimate of the current values of the components of the unobservable stochastic process x(t) obtained from the observation of x̃_0^t = {x̃(s): 0 ≤ s ≤ t}, provided that the observation process is given by (1.5.46). The matrix D(t) that satisfies the second (matrix) equation in (1.5.53) characterizes the accuracy of the estimation of the unobservable components of the process x(t) by the vector m(t) (see [1, 5, 79]).
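As a numerical illustration of Eqs. (1.5.53), the following sketch propagates m(t) and D(t) with a simple Euler scheme; it is not from the monograph, and the model matrices, noise levels, step size, and the frozen control term b are illustrative assumptions:

```python
import numpy as np

# A minimal sketch of the Kalman filter (1.5.53) for the model
#   dx = (G x + b) dt + sig dW,   observation  x_tilde = P x + Q eta,
# with eta a standard white noise; Euler discretization with step dt.
rng = np.random.default_rng(0)
dt, n_steps = 1e-3, 5000
G = np.array([[0.0, 1.0], [-2.0, -0.5]])    # illustrative plant matrix G(t, u)
b = np.zeros(2)                             # illustrative vector b(t, u)
sig = 0.3 * np.eye(2)                       # plant noise matrix sigma(t)
P = np.array([[1.0, 0.0]])                  # observation matrix P(t)
Q = np.array([[0.2]])                       # observation noise matrix Q(t)
R_inv = np.linalg.inv(Q @ Q.T)              # [Q Q^T]^{-1}

x = np.array([1.0, 0.0])                    # true (unobservable) state
m = np.zeros(2)                             # estimate m(t)
D = np.eye(2)                               # a posteriori covariance D(t)
for _ in range(n_steps):
    # simulate the plant and the instantaneous observation
    x = x + (G @ x + b) * dt + sig @ rng.normal(size=2) * np.sqrt(dt)
    x_tilde = P @ x + Q @ rng.normal(size=1) / np.sqrt(dt)
    # filter equations (1.5.53): estimate and covariance updates
    m = m + (G @ m + b) * dt + D @ P.T @ R_inv @ (x_tilde - P @ m) * dt
    D = D + (sig @ sig.T + G @ D + D @ G.T
             - D @ P.T @ R_inv @ P @ D) * dt
print("estimate m(T):", m, "   true x(T):", x)
```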

Equations (1.5.53) play the role of "equations of motion" for the controlled system in the space of sufficient coordinates. Since the process Q⁻¹(t)(x̃(t) − P(t)m) is a white noise, the first equation in (1.5.53) is a stochastic equation of type (1.5.45), and the second equation is a usual differential (matrix) equation. Therefore, the Bellman differential equation for the loss function F(t, m_t, D_t) can be derived by a technique similar to that used in §1.4 to derive Eq. (1.4.21) for the function (1.4.5).

¹⁸The optimality of the estimate m(t) is understood as the minimum of the mean square deviation E|x(t) − m(t)|²; as is known [167, 175, 181], in the Gaussian case, m(t) coincides with the maximum point of the a posteriori probability density p(t, x) = p(x(t) = x | x̃_0^t).


After similar calculations, we obtain the Bellman equation of the following form (see also [34, 175]) for the function F(t, m, D) in problem (1.5.45)-(1.5.47):

$$\cdots + \operatorname{Sp}\Big[\frac{\partial F}{\partial D}\,\big(\sigma\sigma^T - DP^T(QQ^T)^{-1}PD\big)\Big] \quad (1.5.54)$$

where ∂F/∂m is an m-vector with components ∂F/∂m_α, α = 1, ..., m; ∂²F/∂m∂mᵀ is the m × m matrix of the derivatives ∂²F/∂m_α∂m_β, α, β = 1, ..., m; ∂F/∂D is the m × m matrix of the partial derivatives ∂F/∂D_{αβ}, α, β = 1, ..., m; and c̄(m, D, u) denotes the a posteriori mean of the penalty function c(x, u) in the functional (1.5.47), that is,

$$\bar{c}(m, D, u) = \big[(2\pi)^m \det D\big]^{-1/2}\int c(x, u)\,e^{-(x - m)^T D^{-1}(x - m)/2}\,dx. \quad (1.5.55)$$

The loss function F(t, m, D) satisfies (1.5.54) for 0 ≤ t < T. At the terminal instant of time t = T, this function is determined by the relation

where, by analogy with (1.5.55), E_ps(·) denotes the integration of (·) with the Gaussian density. We see that (1.5.56) is a generalization of condition (1.4.22) to the case of indirect observations.

As usual, by solving Eq. (1.5.54) with the additional condition (1.5.56), we simultaneously obtain the optimal control u_*(t) = φ₁(t, m(t), D(t)) (see §1.3 and §1.4). Thus the desired algorithm of optimal control in the functional form u_*(t) = φ(t, x̃_0^t) for problem (1.5.45)-(1.5.47) is the superposition of two operations: the optimal filtering of the observed process {x̃(t): 0 ≤ t ≤ T} by means of the Kalman filter (1.5.53) and the formation of the current control u_*(t) = φ₁(t, m(t), D(t)).

This situation is typical of other problems with indirect observations. Therefore, in the general case of the servomechanism shown in Fig. 3, the


controller C actually consists of two blocks that are functionally different (see Fig. 12): the sufficient coordinate block SC that models the corresponding filter and the decision block D whose structure is determined by the solution of the Bellman equation.

Some examples of other Bellman equations obtained by using sufficient coordinates, as well as solutions of these equations, will be considered later in §3.3, §4.2, §5.4, and §6.1.


CHAPTER II

EXACT METHODS FOR SYNTHESIS PROBLEMS

Exact solutions to synthesis problems of optimal control are of deep theoretical and practical interest. However, exact solutions can be obtained only in some special cases. The point is that exact methods are characterized by rather strict restrictions on the assumptions of the synthesis problem, and these assumptions are seldom satisfied in actual practice. It is well known that, for instance, the Bellman equation can be solved exactly under the following assumptions: (1) the dynamic equations of the plant are linear; (2) the optimality criterion of the form (1.1.11) or (1.4.3) contains only quadratic penalty functions; (3) no restrictions are imposed on the control and on the phase coordinates; (4) random actions (if any) on the system are Gaussian Markov processes or processes of the white noise type. The synthesis problems satisfying (1)-(4) are called linear-quadratic problems of optimal control. An extensive literature is devoted to these problems [3, 5, 18, 24, 72, 112, 122, 128, 132, 168]. In the present chapter we restrict our consideration to an outline of methods for solving such problems (§2.1) and consider in more detail less known results concerning the solution of some special synthesis problems with bounded controls (§§2.2-2.4).

§2.1. Linear-quadratic problems of optimal control (LQ-problems)

2.1.1. First, let us consider an elementary optimal stabilization problem for a first-order system perturbed by a Gaussian white noise (see Fig. 13). Suppose that the plant P is described by a linear scalar equation of the form

$$\dot{x} = ax + bu + \sqrt{\nu}\,\xi(t), \quad (2.1.1)$$

where a, b, and ν are given constants (ν > 0) and ξ(t) is the standard white noise (1.1.31). The performance of this system is estimated by the following functional of the form (1.4.3) with quadratic penalty functions:


(here c, c₁, and h are given positive constants). We do not impose any restrictions on the control u and the phase variable x.

Problem (2.1.1), (2.1.2) is a stochastic generalization of the linear-quadratic problem (1.3.24), (1.3.25) considered in §1.3 and a special case of the more general problem (1.4.2)-(1.4.4). Since the stabilization system shown in Fig. 13 is a specific case of the servomechanism shown in Fig. 8, the Bellman equation for problem (2.1.1), (2.1.2),

can be obtained from (1.4.21) by setting

$$A^x = ax + bu, \quad A^y = B^y = 0, \quad B^x = \nu, \quad c(x, y, u) = cx^2 + hu^2.$$

In (2.1.3) the loss function F = F(t, x), determined, as usual, by

$$F(t, x) = \min_{\substack{u(\tau)\\ t\le\tau\le T}} E\Big\{\int_t^T\big[cx^2(\tau) + hu^2(\tau)\big]\,d\tau + c_1 x^2(T)\,\Big|\, x(t) = x\Big\} \quad (2.1.4)$$

satisfies Eq. (2.1.3) in the strip Π_T = {0 ≤ t < T, −∞ < x < ∞} and becomes a given quadratic function,

for t = T. Condition (2.1.5) readily follows from the definition of the loss function (2.1.4) or from formula (1.4.22) with ψ(x, y) = c₁x².

The optimal control u_* in the form (1.4.25), which minimizes the expression in the square brackets in (2.1.3), is determined by the condition


Substituting the control u_* instead of u into the expression in the square brackets in (2.1.3) and omitting the symbol "min", we rewrite Eq. (2.1.3) in the form

(Eq. (2.1.7) is just Eq. (1.4.26) for problem (2.1.1), (2.1.2)). Now, to solve the synthesis problem, it remains to find the solution F(t, x) that satisfies Eq. (2.1.7) in the strip Π_T and continuously attains the values (2.1.5) as t → T. We shall seek such a solution in the form

where p(t) and r(t) are some functions of time. We choose these functions so that a solution of the form (2.1.8) satisfies (2.1.5) and (2.1.7). Substituting (2.1.8) into (2.1.7) and setting the coefficient of x² equal to zero, as well as the terms independent of x, we obtain the following equations for the unknown functions p(t) and r(t):

It follows from (2.1.5) that the solutions p(t) and r(t) of (2.1.9) and (2.1.10) attain the values

$$p(T) = c_1, \qquad r(T) = 0 \quad (2.1.11)$$

at the terminal time t = T. The system of ordinary differential equations (2.1.9), (2.1.10) with the additional conditions (2.1.11) can readily be integrated. As a result, we obtain the following expressions for the functions p(t) and r(t):

where the constants β, D₁, D₂, D₃, and D₄ are related to the parameters of problem (2.1.1), (2.1.2) as follows:


From (2.1.6), (2.1.8), and (2.1.12), we obtain the optimal control law

which is the solution of the synthesis problem for the optimal stabilization system in Fig. 13. It follows from (2.1.14) that in this case the controller C in Fig. 13 is a linear amplifier in the variable x with a variable amplification factor. In the sequel, we indicate such amplifiers by a special mark ">". Therefore, the optimal system for problem (2.1.1), (2.1.2) can be represented as the block diagram shown in Fig. 14.
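A numerical cross-check of this one-dimensional solution is straightforward. The sketch below integrates Eqs. (2.1.9), (2.1.10) backward from the terminal data (2.1.11); the forms ṗ = (b²/h)p² − 2ap − c and ṙ = −νp used here are what substituting (2.1.8) into (2.1.7) gives, and all numerical data are illustrative assumptions:

```python
import numpy as np

# Backward Euler integration of the Riccati system (2.1.9), (2.1.10),
# assuming the forms  dp/dt = (b**2/h)*p**2 - 2*a*p - c,  dr/dt = -nu*p,
# with terminal conditions (2.1.11): p(T) = c1, r(T) = 0.
a, b, nu, c, h, c1, T = -1.0, 1.0, 0.5, 1.0, 1.0, 0.0, 10.0   # illustrative
steps = 100_000
dt = T / steps
p, r = c1, 0.0
for _ in range(steps):                 # march backward from t = T to t = 0
    p -= dt * ((b**2 / h) * p**2 - 2 * a * p - c)
    r -= dt * (-nu * p)
beta = np.sqrt(a**2 + b**2 * c / h)
print("p(0):", p, "  stationary value c/(beta - a):", c / (beta - a))
# the optimal gain in (2.1.14) is then u*(t, x) = -(b/h) * p(t) * x
```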

Obviously, the minimum value I[u_*] of the optimality criterion (2.1.2) with the control (2.1.14) and the initial state x(0) = x is equal to F(0, x). From (2.1.8), (2.1.12), and (2.1.13), we have

To complete the study of problem (2.1.1), (2.1.2), it remains to prove that the solution (2.1.12)-(2.1.15) of the synthesis problem is unique. It follows from our discussion that the uniqueness of (2.1.12)-(2.1.15) is equivalent to the uniqueness of the solution (2.1.8) of Eq. (2.1.7). The general theory of quasilinear parabolic equations [124] implies that Eq. (2.1.7) with the additional condition (2.1.5) has a unique solution in the class of functions F(t, x) whose growth as |x| → ∞ does not exceed that of some finite power of |x|. On the other hand, an analysis of the properties of the loss function (2.1.4) performed in [113] showed that, for each t ∈ [0, T] and x ∈ R¹, the function (2.1.4) satisfies the estimate


where N(T) is bounded for any finite T. Therefore, the function (2.1.8) is the unique solution of Eq. (2.1.7) corresponding to the problem considered, and the synthesis problem has no solutions other than (2.1.12)-(2.1.15).

REMARK. The optimal control (2.1.14) is independent of the parameter ν, that is, of the intensity of random actions on the plant P, and coincides with the optimal control algorithm (1.3.33), (1.3.34) for the deterministic problem (1.3.24), (1.3.25). Such a situation is typical of many other linear-quadratic problems of optimal control with perturbations in the form of a Gaussian white noise.

The exact formulas (2.1.12)-(2.1.15) allow us to examine the process of relaxation to the stationary operating conditions (see §1.4, Section 1.4.2) for the stabilization system in question. To this end, let us consider a special case of problem (2.1.1) in which the terminal state x(T) is not penalized (c₁ = 0). In this case, formulas (2.1.12) and (2.1.13) read

If the operating time satisfies T > t₁ = 3/(2β), then the functions p(t) and r(t) determined by (2.1.16) and (2.1.17) have the form shown in Fig. 15.

The functions p(t) and r(t) are characterized by the existence of two time intervals [0, T − t₁] and [T − t₁, T] on which p(t) and r(t) behave in different ways. The first interval [0, T − t₁] corresponds to the stationary operating mode: p(t) ≈ c/(β − a) = const for t ∈ [0, T − t₁], the function r(t) decreases linearly as t grows, and on this interval the rate of decrease in r(t) is constant and equal to νc/(β − a). The terminal interval [T − t₁, T] is essentially nonstationary. It follows from (2.1.16) and (2.1.17) that the length of this nonstationary interval is of the order of 3/(2β). Obviously, in the case where this nonstationary interval is a small part of the entire operating time [0, T], the control performance is little affected if, instead of the exact optimal control (2.1.14), we use the control

that corresponds to the stationary operating mode. It follows from (2.1.18) that for large T the controller C in Fig. 13 is a linear amplifier with constant amplification factor, whose technical realization is much simpler than that of the nonstationary control block described by (2.1.14) and (2.1.12).

Formulas (2.1.16) and (2.1.17) show that, for large values of T - t , the loss function (2.1.8) satisfies the approximate relation

$$F(t, x) \approx \frac{c}{\beta - a}\,x^2 + \frac{\nu c}{\beta - a}\,(T - t). \quad (2.1.19)$$

Comparing (2.1.19) and (1.4.29), we see that in this case the value γ of the stationary mean losses per unit time, introduced in §1.4, is equal to

$$\gamma = \frac{\nu c}{\beta - a}, \quad (2.1.20)$$

that is, it coincides with the rate of decrease of the function r(t) on the stationary interval [0, T − t₁] (Fig. 15). In this case, the stationary loss function defined by (1.4.29) is equal to

$$f(x) = \frac{c}{\beta - a}\,x^2. \quad (2.1.21)$$

It should be noted that to calculate γ and the function f(x), we need not have the exact formulas for p(t) and r(t) in (2.1.8). It suffices to use the corresponding stationary Bellman equation (1.4.30), which in this case has the form

and to substitute the desired solution in the form f(x) = px² into (2.1.22). We obtain the numbers p and γ, just as in the nonstationary case, by setting


the coefficients of x² and the free terms on the left- and right-hand sides in (2.1.22) equal to each other.
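A sketch of this computation, under the reconstruction of Eq. (2.1.7) used above (so the stationary equation below is an assumption consistent with (2.1.19)-(2.1.21)):

$$\gamma = ax\,f'(x) + \frac{\nu}{2}f''(x) + cx^2 - \frac{b^2}{4h}\big(f'(x)\big)^2, \qquad f(x) = px^2;$$

$$x^2:\quad 2ap + c - \frac{b^2}{h}p^2 = 0 \;\Longrightarrow\; p = \frac{h(a + \beta)}{b^2} = \frac{c}{\beta - a}, \qquad \beta = \sqrt{a^2 + \frac{b^2 c}{h}};$$

$$x^0:\quad \gamma = \nu p = \frac{\nu c}{\beta - a}.$$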

We also note that if at least one of the parameters a, b, ν, c, and h of problem (2.1.1), (2.1.2) depends on time, then, in general, there does not exist any stationary operating mode. In this case, one cannot obtain closed formulas for the functions p(t) and r(t) in (2.1.8), since Eq. (2.1.9) is a Riccati equation and, in general, cannot be integrated exactly. Therefore, if the problem has variable parameters, the solution is constructed, as a rule, by using numerical integration methods.

2.1.2. All of the preceding can readily be generalized to multidimensional problems of optimal stabilization. Let us consider the system shown in Fig. 13 whose plant P is described by a linear vector-matrix equation of the form

$$\dot{x} = A(t)x + B(t)u + \sigma(t)\xi(t), \quad (2.1.23)$$

where x = x(t) ∈ R_n is an n-column-vector of phase variables, u ∈ R_r is an r-vector of controlling actions, and ξ(t) ∈ R_m is an m-vector of random perturbations of the Gaussian white noise type with characteristics (1.1.34). The dimensions of the matrices A, B, and σ agree with the dimensions of the corresponding vectors and are equal to n × n, n × r, and n × m, respectively. The elements of these matrices are continuous functions of time¹ defined for all t from the interval [0, T] on which the controlled system is considered.

For the optimality criterion, we take a quadratic functional of the form

$$I[u] = E\Big\{x^T(T)\,Q\,x(T) + \int_0^T\big[x^T(t)G(t)x(t) + u^T(t)H(t)u(t)\big]\,dt\Big\}. \quad (2.1.24)$$

Here Q and G(t) are symmetric nonnegative definite n × n matrices, and the symmetric r × r matrix H(t) is positive definite for each t ∈ [0, T].

Just as (2.1.3), the Bellman equation for problem (2.1.23), (2.1.24) follows from (1.4.21) if we set A^y = B^y = 0, B^x = σ(t)σᵀ(t), A^x = A(t)x + B(t)u, and c(x, y, u) = xᵀGx + uᵀHu. Thus we obtain

$$\frac{\partial F}{\partial t} + x^T A^T(t)\frac{\partial F}{\partial x} + \frac{1}{2}\operatorname{Sp}\Big[\sigma(t)\sigma^T(t)\frac{\partial^2 F}{\partial x\,\partial x^T}\Big] + x^T G(t)x + \min_u\Big[u^T B^T(t)\frac{\partial F}{\partial x} + u^T H(t)u\Big] = 0. \quad (2.1.25)$$

¹As was shown in [156], it suffices to assume that the elements of the matrices A(t), B(t), and σ(t) are measurable and bounded.


In this case, the additional condition on the loss function (1.4.22) has the form

$$F(T, x) = x^T Q\,x.$$

The further considerations leading to the solution of the synthesis problem are similar to those in the one-dimensional case. Calculating the minimum value of the expression in the square brackets in (2.1.25), we obtain the optimal control

$$u_* = -\frac{1}{2}H^{-1}(t)B^T(t)\frac{\partial F}{\partial x}, \quad (2.1.26)$$

which is a vector analog of formula (2.1.6). Substituting the expression obtained for u_* into (2.1.25), we arrive at the equation

$$\frac{\partial F}{\partial t} + x^T A^T(t)\frac{\partial F}{\partial x} + \frac{1}{2}\operatorname{Sp}\Big[\sigma(t)\sigma^T(t)\frac{\partial^2 F}{\partial x\,\partial x^T}\Big] + x^T G(t)x - \frac{1}{4}\Big(\frac{\partial F}{\partial x}\Big)^T B(t)H^{-1}(t)B^T(t)\frac{\partial F}{\partial x} = 0. \quad (2.1.27)$$

We seek the solution of (2.1.27) as the following quadratic form with respect to the phase variables:

$$F(t, x) = x^T P(t)x + r(t). \quad (2.1.28)$$

Substituting (2.1.28) into (2.1.27) and setting the coefficients of the quadratic (with respect to x) terms and the free terms on the left-hand side in (2.1.27) equal to zero, we obtain the following system of differential equations for the unknown matrix P(t) and the scalar function r(t):

If system (2.1.29) is solved, then the optimal solution of the synthesis problem has the form

$$u_*(t, x) = -H^{-1}(t)B^T(t)P(t)\,x, \quad (2.1.30)$$

which follows from (2.1.26) and (2.1.28). Formula (2.1.30) shows that the controller C in the optimal system in Fig. 13 is a linear amplifier with n inputs and r outputs and variable amplification factors.

Let us briefly discuss the possibilities of solving system (2.1.29). The existence and uniqueness of the nonnegative definite matrix P(t) satisfying the matrix-valued Riccati equation (2.1.29) are proved in [72] under the above assumptions on the properties of the matrices A(t), B(t), G(t), H(t), and Q. One can obtain explicit formulas for the elements of the matrix P(t) only by numerical methods,² which is a rather complicated problem for large dimensions of the phase vector x.

In the special case of the zero matrix G(t) ≡ 0, the solution of the matrix equation (2.1.29) has the form [1, 132]

$$P(t) = X^T(T, t)\Big[Q^{-1} + \int_t^T X(T, s)B(s)H^{-1}(s)B^T(s)X^T(T, s)\,ds\Big]^{-1}X(T, t). \quad (2.1.31)$$

Here X(t, s), t ≥ s, denotes the fundamental matrix of system (2.1.23); sometimes this matrix is also called the Cauchy matrix. The properties of the fundamental matrix are described by the relations

$$\frac{\partial X(t, s)}{\partial t} = A(t)X(t, s), \qquad X(s, s) = I. \quad (2.1.32)$$

One can construct the matrix X(t, s) if the so-called integral matrix Z(t) of system (2.1.23) is known. According to [111], a square n × n matrix Z(t) is called the integral matrix of system (2.1.23) if its columns consist of any n linearly independent solutions of the homogeneous system ẋ = A(t)x. If the matrix Z(t) is known, then the fundamental matrix X(t, s) has the form

$$X(t, s) = Z(t)Z^{-1}(s). \quad (2.1.33)$$

One can readily see that the matrix (2.1.33) satisfies conditions (2.1.32). The fundamental matrix can readily be calculated if and only if the elements of the matrix A(t) in (2.1.23) are time-independent, that is, if A(t) ≡ A = const. In this case, we have

$$X(t, s) = e^{A(t - s)},$$

and the exponential matrix can be expressed in the standard way [62] either via the Lagrange-Sylvester interpolation polynomial (in the case of simple eigenvalues of the matrix A) or via the generalized interpolation polynomial (in the case of multiple eigenvalues and not simple elementary divisors of the matrix A).

If the matrix A is time-varying, the construction of the fundamental matrix (2.1.33) becomes more complicated and requires, as a rule, the use of numerical integration methods.
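For a constant matrix A, the fundamental matrix and the properties (2.1.32) can be checked directly; a sketch (SciPy's matrix exponential, illustrative data):

```python
import numpy as np
from scipy.linalg import expm

# Fundamental (Cauchy) matrix for constant A:  X(t, s) = exp[A (t - s)].
A = np.array([[0.0, 1.0], [-1.0, -0.3]])   # illustrative constant matrix
t, s, eps = 2.0, 0.5, 1e-6

X = expm(A * (t - s))
print("X(s, s) = I:", np.allclose(expm(A * 0.0), np.eye(2)))

# finite-difference check of dX/dt = A X from (2.1.32)
dX = (expm(A * (t + eps - s)) - expm(A * (t - eps - s))) / (2 * eps)
print("dX/dt = A X:", np.allclose(dX, A @ X, atol=1e-5))

# For time-varying A(t) one falls back on (2.1.33): integrate dZ/dt = A(t) Z
# with Z(0) = I numerically and set X(t, s) = Z(t) @ inv(Z(s)).
```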

²There also exist approximate analytic methods for calculating the matrices P(t) [1, 72]. However, for matrices P(t) of larger dimensions, these methods meet serious computational difficulties.


2.1.3. The results obtained by solving the basic linear-quadratic problem (2.1.23), (2.1.24) can readily be generalized to more general statements of the optimal control problem. Here we only list the basic lines of these generalizations; for a detailed discussion of this subject see [1, 5, 34, 58, 72, 122, 132].

First of all, note that the synthesis problem (2.1.23), (2.1.24) admits an exact solution even if there are noises in the feedback circuit, that is, if instead of the exact values of the phase variables x(t), the controller C (see Fig. 13) receives distorted information of the form

$$\tilde{y}(t) = N(t)x(t) + \sigma_0(t)\eta(t), \quad (2.1.34)$$

where N(t) and σ₀(t) are given matrices and η(t) is either a stochastic process of the white noise type (1.1.34) or a Gaussian Markov process. In this case, the optimal control algorithm coincides with (2.1.30), in which, instead of the true values of the current phase vector x = x(t), we use the vector of current estimates m = m(t) of the phase vector. These estimates are formed with the help of Eqs. (1.5.53) for the Kalman filter, which, with regard to the notation in (2.1.23) and (2.1.34), have the form³

$$\dot{m} = \big[A(t) - B(t)H^{-1}(t)B^T(t)P(t)\big]m + DN^T(t)\big[\sigma_0(t)\sigma_0^T(t)\big]^{-1}\big(\tilde{y}(t) - N(t)m\big), \quad (2.1.35)$$

$$\dot{D} = \sigma(t)\sigma^T(t) + A(t)D + DA^T(t) - DN^T(t)\big[\sigma_0(t)\sigma_0^T(t)\big]^{-1}N(t)D. \quad (2.1.36)$$

Thus in the case of indirect observation (2.1.34), as shown schematically in Fig. 16, the optimal controller C consists of the following two functionally different blocks connected in series: the block KF modeling Eqs. (2.1.35), (2.1.36) for the Kalman filter and a linear amplifier with the matrix amplification factor −H⁻¹(t)Bᵀ(t)P(t). This statement follows from the well-known separation theorem [58, 193].
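A compact sketch of this two-block structure (the function below is hypothetical; it combines one Euler step of the filter (2.1.35), (2.1.36) with the gain −H⁻¹BᵀP, and all matrix arguments, e.g. P(t) from the Riccati sketch above, are assumed given):

```python
import numpy as np

def controller_step(m, D, P_ric, y_obs, dt, A, B, H, N, S0, sig):
    """One Euler step of the optimal controller of Fig. 16 (a sketch):
    Kalman filter block (2.1.35), (2.1.36) followed by the linear gain."""
    R_inv = np.linalg.inv(S0 @ S0.T)         # [sigma_0 sigma_0^T]^{-1}
    u = -np.linalg.inv(H) @ B.T @ P_ric @ m  # amplifier block
    # filter equations with the control substituted: A m + B u
    m = m + (A @ m + B @ u) * dt + D @ N.T @ R_inv @ (y_obs - N @ m) * dt
    D = D + (sig @ sig.T + A @ D + D @ A.T
             - D @ N.T @ R_inv @ N @ D) * dt
    return m, D, u
```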

The next generalization of the linear-quadratic problem (2.1.23), (2.1.24) is related to a more general model of the plant. Suppose that, in addition to the additive noises ξ(t), the plant P is subject to perturbations depending on the state x and control u and to pulsed random actions with a Poisson distribution of the pulse moments. It is assumed that the behavior of the plant P is described by the special equation

³Equations (2.1.35) and (2.1.36) correspond to the case in which η(t) in (2.1.34) is a white noise.


where ξ₁(t) and ξ₂(t) are scalar Gaussian white noises (1.1.31), θ(t) is an ℓ-vector of independent Poisson processes with intensity coefficients λ_i (i = 1, ..., ℓ), σ₁, σ₂, and σ₃ are given n × n, n × r, and n × ℓ matrices, and the other variables have the same meaning as in (2.1.23). For the exact solution of problem (2.1.37), (2.1.24), see [34].

We also note that sufficiently effective methods have been developed for infinite-dimensional linear-quadratic problems of optimal control, in which the plant P is either a linear dynamic system with distributed parameters or a quantum-mechanical system. Results on the control of distributed parameter systems can be found in [118, 130, 164, 182] and on the control of quantum systems in [12, 13].

All linear-quadratic problems of optimal control, as well as the above-treated examples, are characterized by the fact that the loss function satisfying the Bellman equation is of quadratic form (a quadratic functional) and the optimal control law is a linear function (a linear operator) with respect to the phase variables (the state function).

Solving the Bellman equation becomes much more difficult if it is necessary to take into account restrictions on the domain of admissible control values in the design of an optimal system. In this case, exact analytical results can be obtained, as a rule, only for one-dimensional synthesis problems (or for problems reducible to one-dimensional ones). Some problems of this kind are considered in the following sections of this chapter.

§2.2. Problem of optimal tracking of a wandering coordinate

Let the input (command) signal y(t) in the servomechanism shown in Fig. 2 be a scalar Markov process with known characteristics, and let the plant P be a servomotor whose speed is bounded and whose behavior is


described by the scalar deterministic equation

$$\dot{x} = u, \qquad |u| \le u_m \quad (2.2.1)$$

(here u_m determines the admissible range of the motor speed, −u_m ≤ ẋ ≤ u_m). Equation (2.2.1) adequately describes the dynamics of a constant current motor controlled by the voltage on the motor armature under the assumption that the moment of inertia and the inductance of the armature winding are small [2, 50]. We shall show that various synthesis problems stated in §1.4 can be solved for such servomechanisms.

2.2.1. Let y(t) be a diffusion Markov process with constant drift coefficient a and diffusion coefficient B. We need to calculate the controller C (see Fig. 2) that minimizes the integral optimality criterion

$$I[u] = E\int_0^T c\big(x(t), y(t)\big)\,dt, \quad (2.2.2)$$

where c(x, y) is a given penalty function. By setting A^y = a, B^y = B, A^x = u, and B^x = 0 in (1.4.21), we readily obtain the following Bellman equation for problem (2.2.1), (2.2.2):

$$\frac{\partial F}{\partial t} + a\frac{\partial F}{\partial y} + \frac{B}{2}\frac{\partial^2 F}{\partial y^2} + c(x, y) + \min_{|u|\le u_m}\Big[u\frac{\partial F}{\partial x}\Big] = 0. \quad (2.2.3)$$

We shall consider penalty functions c(x, y) depending only on the error signal, that is, on the difference z = y − x between the command input y and the controlled variable x. Obviously, in this case, the loss function F(t, x, y) = F(t, y − x) = F(t, z) in (2.2.3) also depends only on z. Instead of (2.2.3), we have

$$\frac{\partial F}{\partial t} + a\frac{\partial F}{\partial z} + \frac{B}{2}\frac{\partial^2 F}{\partial z^2} + c(z) + \min_{|u|\le u_m}\Big[-u\frac{\partial F}{\partial z}\Big] = 0. \quad (2.2.4)$$

The minimum value of the function in the square brackets in (2.2.4) is attained by the control⁴

$$u_*(t, z) = u_m \operatorname{sign}\frac{\partial F}{\partial z}(t, z), \quad (2.2.5)$$

⁴In (2.2.5), sign a indicates the following scalar function of a scalar variable a:


which requires switching the servomotor speed instantly from one admissible limit value to the opposite one when the derivative ∂F(t, z)/∂z of the loss function changes its sign. Control of the form (2.2.5) is naturally called control of relay type (sometimes this control is called "bang-bang" control).

Substituting (2.2.5) instead of u into (2.2.4) and omitting the symbol "min", we reduce Eq. (2.2.4) to the form

$$\frac{\partial F}{\partial t} + a\frac{\partial F}{\partial z} + \frac{B}{2}\frac{\partial^2 F}{\partial z^2} + c(z) - u_m\Big|\frac{\partial F}{\partial z}\Big| = 0. \quad (2.2.6)$$

In [113, 124] it was shown that in the strip Π_T = {0 ≤ t ≤ T, −∞ < z < ∞} Eq. (2.2.6) has a unique solution F(t, z) satisfying the additional condition F(T, z) = 0 if the penalty function c(z) is continuous and does not grow too rapidly as |z| → ∞.⁵ In this case, F(t, z) is twice continuously differentiable with respect to z and once with respect to t. In particular, since ∂F/∂z is continuous, the condition

$$\frac{\partial F}{\partial z}(t, z) = 0 \quad (2.2.7)$$

must be satisfied at the moment of switching the controlling action. If c(z) ≥ 0 attains its single minimum at the point z = 0 and does not decrease monotonically as |z| → ∞, then Eq. (2.2.7) has a single root z_r(t) for each t. This root determines the switch point of the control. On different sides of the switch point the derivative ∂F/∂z has opposite signs. If ∂F/∂z > 0 for z > z_r(t) and ∂F/∂z < 0 for z < z_r(t), then we can write the optimal control (2.2.5) in the form

$$u_*(t, z) = u_m \operatorname{sign}\big(z - z_r(t)\big). \quad (2.2.8)$$

Thus, the synthesis problem is reduced to finding the switch point z_r(t). To this end, we need to solve Eq. (2.2.6).

Equation (2.2.6) has an exact solution if we consider stationary tracking. In this case, the terminal time (the upper limit of integration in (2.2.2)) T → ∞, and Eq. (2.2.6) for the time-invariant loss function (see (1.4.29))

$$f(z) = \lim_{T\to\infty}\big[F(t, z) - \gamma(T - t)\big] \quad (2.2.9)$$

becomes the ordinary differential equation

$$\frac{B}{2}\frac{d^2 f}{dz^2} + a\frac{df}{dz} + c(z) - \gamma - u_m\Big|\frac{df}{dz}\Big| = 0, \quad (2.2.10)$$

⁵More precisely, the constraint on the growth of the function c(z) is that there exist positive constants A₁, A₂, and a such that 0 ≤ c(z) ≤ A₁ + A₂|z|^a for all z.


which can be solved by the matching method [113, 171, 172]. Let us show how to do this. Obviously, the nonlinear equation (2.2.10) is equivalent to the two linear equations

$$\frac{B}{2}\frac{d^2 f_1}{dz^2} + (a - u_m)\frac{df_1}{dz} + c(z) - \gamma = 0, \qquad z > z_r,$$

$$\frac{B}{2}\frac{d^2 f_2}{dz^2} + (a + u_m)\frac{df_2}{dz} + c(z) - \gamma = 0, \qquad z < z_r, \quad (2.2.11)$$

for the functions f₁(z) and f₂(z) that determine the function f(z) on each side of the switch point z_r. The unique solutions of the linear equations (2.2.11) are determined by the behavior of f₁ and f₂ as |z| → ∞. It follows from the statement of the problem that, if we take into account the diffusion "divergence" of the trajectories z(t) for large |z|, we only obtain small corrections to the value of the optimality criterion; hence, in the limit as |z| → ∞, the loss functions f₁(z) and f₂(z) must behave just as the solutions of Eqs. (2.2.11) with B = 0. The corresponding solutions of Eqs. (2.2.11) have the form

dfl 2 O"

Z = E ~ - yl exp [ - 2(U\- a) (T - z)] dz, (2.2.12)

df 2 - - - ? / z - [.(a) - 71 exp [2(u\+ a) (Z - z)] d ~ . dz B -00

According to (2.2.7), we have the following relations at the switch point z_r:

$$\frac{df_1}{dz}(z_r) = \frac{df_2}{dz}(z_r) = 0. \quad (2.2.13)$$

Substituting (2.2.12) into (2.2.13), considering (2.2.13) as a system of equations with respect to the two unknown variables z_r and γ, and performing some simple transformations, we obtain the equation for the switch point

and the expression for the stationary tracking error


To obtain explicit formulas for the switch points and stationary errors, it is necessary to choose some special penalty functions c(z). For example, for the quadratic penalty function c(z) = z², formulas (2.2.14) and (2.2.15) yield
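A sketch of this computation, based on the reconstructed formulas (2.2.12), (2.2.13) above (so the expressions below are derived under that reconstruction rather than quoted): with k₁ = 2(u_m − a)/B and k₂ = 2(u_m + a)/B, the conditions (2.2.13) give

$$\frac{2}{k_1^2} + \frac{2z_r}{k_1} + z_r^2 = \gamma, \qquad \frac{2}{k_2^2} - \frac{2z_r}{k_2} + z_r^2 = \gamma,$$

whence

$$z_r = \frac{1}{k_2} - \frac{1}{k_1} = -\frac{aB}{u_m^2 - a^2}, \qquad \gamma = \frac{B^2}{2(u_m - a)^2} + \frac{z_r B}{u_m - a} + z_r^2.$$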

If c(z) = |z|, then we have

It should be noted that formulas (2.2.16)-(2.2.19) make sense only under the condition u_m > a. This is due to the fact that the stationary operating mode in the problem considered may exist only for u_m > a. Otherwise (for a > u_m), the mean rate of increase of the command signal y(t) is larger than the highest admissible rate of change of the output variable x(t), and the error signal z(t) = y(t) − x(t) grows infinitely in time.

If the switch point zr is found, then we know how to control the servo- motor P under the stationary operating conditions. In this case, according to (2.2.8), the optimal control has the form

and hence, the block diagram of the optimal servomechanism has the form shown in Fig. 17.

The optimal system shown in Fig. 17 differs from the optimal systems considered in the preceding section by the presence of an essentially nonlin- ear ideal-relay-type element in the feedback circuit. The other distinction between the system in Fig. 17 and the optimal linear systems considered in 32.1 is that the control method depends on the diffusion coefficient B of the input stochastic process (in $2.1, the optimal control is independent of the diffusion coefficient^,^ and therefore, the block diagrams of optimal deterministic and stochastic systems coincide).

If B = 0 (the deterministic case), then it follows from (2.2.16)-(2.2.19) that the switch point zr = 0 and the stationary tracking error y = 0. These

'This takes place if the current values of the state vector z ( t ) are measured exactly.

Page 125: Optimal Design of Control Systems Stochastic and Deterministic Problems

Chapter I1

results readily follow from the statement of the problem; to obtain these results it is not necessary to use the dynamic programming method. Indeed, if a t some instant of time we have y(t) > x(t) (z(t) > 0), then, obviously, it is necessary to increase x a t the maximum rate (that is, a t u = +urn) till the equality y = x (z = 0) is attained. Then the motor can be stopped. In a similar way, for y < x (z < O), the control u = -urn is switched on and operates till y becomes equal to x. After y = x is attained and the motor is stopped, the zero error z remains constant, since there are no random actions to take the system out of the state z = 0. Therefore, the stationary tracking "error" is zero.'

If the diffusion is taken into account, then the optimal deterministic control u p t = urn signz is not optimal. This fact can be explained as follows. Let u = urn signz, and let B # 0. Then the following two factors affect the trajectories z(t): they regularly move downwards with velocity (urn - a ) for z > O and upwards with velocity (urn + a ) for z < 0 due to the drift a and control u (see Fig. IS), and they "spread" due to the diffusion B that is the same for all z. As a result, the stochastic process z(t) becomes stationary (since the regular displacement towards the t-axis is proportional to t and the diffusion spreading away from the t-axis is proportional to &) and all sample paths of z( t ) are localized in a strip of finite width containing the t-axis.' However, since the "returning" velocities in the upper and lower half-planes are different, the stationary trajectories of z(t) are arranged not

' ~ t is assumed that the penalty function c ( z ) attains its minimum value a t r = 0 and c(0) = 0.

'More precisely: if z ( 0 ) = 0, then with probability 1 the values of r ( t ) lie in a strip of finite width for all t 2 0.

Page 126: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems

symmetrically with respect to the line z = 0, as is conventionally shown in Fig. 19. If the penalty function c(z) is an even function (c(z) = c(-z)), then, obviously, the stationary tracking error y = Ec(z) (see (1.4.32)) can be decreased by placing the strip AB (where the trajectories are localized) symmetrically with respect to the axis z = 0. This effect can be reached by switching the control u a t some negative value zr rather than a t z = 0. The exact position of the switch point zr is determined by formulas (2.2.14), (2.2.16), and (22.2.18).

In conclusion, we note that all results obtained in this section can readily be generalized to the case where the plant P is subject to additive noncon- trolled perturbations of the white noise type (see Fig. 10). In this case,

Page 127: Optimal Design of Control Systems Stochastic and Deterministic Problems

110 Chapter I1

instead of Eq. (2.2.1), we have

where [(t) is the standard white noise (1.1.31) independent of the input process y(t) and N > 0 is a given number.

In this case, the Bellman equation (2.2.3) acquires the form

a~ a~ B a 2 ~ N a 2 ~ a~ - +a- + - + - + c(x, y) + min [az] = 0, a t ay 2 a y 2 2 a x 2 I U I S U ~

and instead of (2.2.4), we obtain

This equation differs from (2.2.4) only by a coefficient of the diffusion term. Therefore, all results obtained for systems whose block diagram is shown in Fig. 2 and whose plant is described by Eq. (2.2.1) are automatically valid for systems in Fig. 10 with Eq. (2.2.21) if in the original problem the diffusion coefficient B is replaced by B + N. In particular, if noises in the plant are taken into account, then formulas (2.2.16) and (2.2.17) for the stationary switch point and the stationary tracking error take the form

Note also that the problem studied in this section is equivalent to the synthesis problem for servomechanism tracking a Wiener process of inten- sity B with nonsymmetric constraints on admissible controls -urn + a 5 u 5 u, + a , since both these problems have the same Bellman equation (2.2.4).

2.2.2. Now let us consider the synthesis problem that differs from the problem considered in the preceding section only by the optimality crite- rion. We assume that there is an admissible domain I l l , 12] for the error ~ ( t ) = y(t) - x(t) (el and l 2 are given numbers such that el < e2) . We assume that if z(t) leaves this domain, then serious undesirable effects may occur. For example, the system considered or a part of any other more complicated system containing our system may be destroyed. In this case,

Page 128: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems 111

it is natural to look for controls that keep z(t) within the admissible limits for the maximum possible time.

General problems of calculating the maximum mean time of the first passage to the boundary were considered in $1.4. In particular, the Bell- man equation (1.4.40) was obtained. In the scalar case studied here, this equation has the form

B d 2 F 1 dF1 -- aFl 2 ay2

+ a p + max u- = -1 ay . I i u m [ ax I (Eq. (2.2.24) follows from (1.4.40), (1.4.3 I ) , since AY = a , Ax = U, BY = B, Bx = 0). Recall that the function Fl(x ,y) in (2.2.24) is equal to the maximum mean time of the first passage to the boundary of the domain of admissible phase variables if the initial state of the system is (x, y). In the case where the domain of admissible values (x, y) is determined by the error signal z = y - x, the function Fl depends only on the difference Fl(x, y) = Fl (y - x) = Fl (z) and, instead of the partial differential equation (2.2.24), we have the following ordinary differential equation for the function Fl(z):

B d2Fl dFl -- a Fi +a- + max -u- [ & I = - 1 . (2.2.25) 2 dz2 dz l u l ~ u ~

The function Fl(z) satisfies Eq. (2.2.25) a t the interior points of the domain [11,12] of admissible errors z. At the boundary points of this domain, Fl vanishes (see (1.4.41)):

The optimal system can be synthesized by solving Eq. (2.2.25) with the boundary conditions (2.2.26). Just as in the preceding section, one can see that the optimal control u, (z) is of relay type and is equal to

U, (z) = -urn sign (3 Using (2.2.27), we transform Eq. (2.2.25) to the form

The condition of smooth matching (see [113], p. 52) implies that the solution Fl(z) of Eq. (2.2.28) and the derivatives dF11d.z and d2Fl/dz2 are

Page 129: Optimal Design of Control Systems Stochastic and Deterministic Problems

Chapter I1

continuous everywhere in the interior of [.el, e2] . Therefore, the switch point z: is determined by the condition

The same continuity conditions and the boundary conditions (2.2.26), as well as the "physical" meaning of the function Fl(z), a priori allow us to estimate the qualitative behavior of the functional dependence Fl(z). The corresponding curve is shown in Fig. 20. It follows from (2.2.29) that the switch point corresponds to the maximum value of Fl(z) . In this case, Fi(z) < 0 for z > z i , and F i ( z ) > 0 for z < 2;. In particular, this implies that the optimal control (2.2.27) can be written in the form

1 U* (z) = U , sign(z - zr ),

which is similar to (2.2.20) and differs only by the position of the switch point. Thus, in this case, if the applied constant displacement -zr is re- placed by -zt, then the block diagram of the optimal system coincides with that in Fig. 15.

The switch point z: can be found by solving Eq. (2.2.28) with the bound- ary conditions (2.2.26). Just as in the preceding section, we replace the nonlinear equation (2.2.28) by the following pair of linear equations for the

Page 130: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems 113

function ~ : ( z ) , z: < z < e2, and the function F c ( z ) , l1 < z < z i :

The required switch point z: can be obtained from the matching conditions

for F$(z) and F c ( z ) . Since F l ( z ) is twice continuously differentiable, it follows from (2.2.27) that these conditions have the form

The boundary conditions (2.2.26) and (2.2.32) for F: ( z ) and F; ( z ) imply

z - l2 2(um - a ) (2; - t 2 ) F$(Z) = - +

u r n - - a 2 ( u , - ~ ) ~ B - a)( . - 2:)

I - exp [2(um

e, - z F; ( 2 ) = - + B { exp [ 2 ( u m + a$l(z: - e l )

u m + a ~ ( u , + u ) ~ I 2(um + a ) ( z - z i )

- exp [ - B I ) -

By using (2.2.33) and the continuity condition (2.2.31), we obtain the fol- lowing transcendental equation for the required point z::

2aum B 2urnz: = (urn + a)l2 + ( u , - a)el + -

u& - a2

B u r n - a 2 + -{- 2 u m + a exp [E (urn + a) (2: - e l ) ]

Urn -- 2 + a e x p [ - B ( u m - a ) ( z : - & ) ] } . (2.2.34)

urn - a

In the simple special case a = 0, it follows from (2.2.34) that

Page 131: Optimal Design of Control Systems Stochastic and Deterministic Problems

114 Chapter I1

that is, the switch point is the midpoint of the interval of admissible er- rors z. This natural result can be predicted without solving the Bellman equation. In the other special case where -el = l2 = l ( l > 0) and a << um, Eq. (2.2.34) gives the following approximate expression for zi:

To find z: in the other cases, it is necessary to solve the transcendental equation (2.2.34).

2.2.3. Assume that the performance of the servomechanism shown in Fig. 2 is determined by the maximum error z( t ) = y(z) - x(t) on a fixed time interval 0 < t 5 T. Then it is natural to minimize the optimality criterion

I[u] = E max Iz(t)l = E max ly(t) - x(t)l, (2.2.35) O S t l T O < t < T

which is a special case of the criterion (1.1.18). For convenience, we shall use the modification (1.4.48) of the criterion (1.1.18), that is, instead of (2.2.35), we shall minimize

The parameter ,6 > 0 determines the observation time for the stochastic process ~ ( 7 ) . We assume that the criteria (2.2.35) and (2.2.36) are equiv- alent if the terminal time T and the variable ,6 are matched, for example, as follows: T = c/o, where c > 0 is a constant.

The Bellman equation for the problem considered can be obtained from (1.4.51) with regard to the relation f2(x, y) = f2(y - x) = f2(z). This equation has the form

Just as in the preceding sections, after the expression in the square brackets is minimized, Eq. (2.2.37) acquires the form

df2 2 , iff2(z) > z ] , (2.2.38)

otherwise.

In this case, just as in the preceding sections, the optimal control u,(z) is of relay type and can be written in the form (2.2.20). The only distinction

Page 132: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems 115

is that, in general, the switch point z: differs from zr and 2:. The point z: can be found by solving Eq. (2.2.38).

Solving Eq. (2.2.38), we shall distinguish two domains on the z-axis: the domain Z1 where f2 (z) > jzI and the domain Z2 where f2(z) = 121.

Obviously, if f2(z*) = lz*I for any z*, then f2(z) = 1x1 for any z such that I z I > Iz*I. In other words, the domain Z2 consists of two infinite intervals (-CQ, z') and (z", +a). In the domain Z1 lying between the boundary points z' < 0 and z" > 0, we have

Next, the interval [zl, z"] is divided by the switch point z: into the fol- lowing two parts: the interval z' < z < z: where Eq. (2.2.39) takes the form

and z: < z < Z" where

Thus, in this case, we have seven unknown variables: z', z", z:, and the four constants obtained by integrating Eqs. (2.2.40) and (2.2.41). They can be obtained from the following seven conditions:

Formulas (2.2.42) are smooth matching conditions for the solutions f; (z) and f$(z). The last three conditions show that the solutions and their first-order derivatives are continuous a t the switch point z: (see (2.2.31) and (2.2.32)). The first four conditions show that the solutions and their first-order derivatives are continuous a t the boundary points of Z1.

By solving (2.2.40) and (2.2.41) with regard to (2.2.42), we readily obtain

Page 133: Optimal Design of Control Systems Stochastic and Deterministic Problems

116 Chapter I1

the following three equations for z', zl', and 2::

In (2.2.43)-(2.2.45) we have used the notation

The desired switch point z: can be found by solving the system of tran- scendental equations (2.2.43)-(2.2.45). Usually, this system can be solved by numerical methods. One can obtain the explicit expression for z: from Eqs. (2.2.43)-(2.2.45) only if the problem is symmetric, that is, if a = 0. In this case, the domain 2 1 is symmetric about the origin, z' = -zl ' , and we have the switch point z: = 0. However, this is a rather trivial case and of no practical interest.

REMARK. It should be noted that the optimal systems considered in Sections 2.2.2 and 2.2.3 are very close to each other (the switch points nearly coincide, z: z:) if the corresponding parameters of the problem agree well with each other. These parameters can be made consistent in the following way. Assume that the same parameters a, B , and urn are given in problems of Sections 2.2.2 and 2.2.3. Then, choosing a value of

Page 134: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems 117

the parameter p, we can calculate three numbers z' = zl(P), z" = z1'(/3), and z: = z:(,B) in dependence on the choice of 0. Now if we use z' and z" as the boundary values of admissible errors (el = zl(P), ez = zl'(P)) in the problem considered in Section 2.2.2, then by solving Eq. (2.2.34), we obtain the coordinate of the switch point z: and show that z:(/3) w ~:(/3)' for varying from 1.0 to 10W4. This is confirmed by the numerical experiment described in [92]. Moreover, in [92] it is shown that Fl(z;(P)) w P-l for these values of the parameter p.

2.2.4. Now let us consider the synthesis problem of optimal tracking a discontinuous Markov process. Let us assume that the input process y(t) in the problem of Section 2.2.1 is a pure discontinuous Markov process. As shown in 51.1, such processes are completely characterized by the intensity A(y) of jumps and the density function ~ ( y , y') describing the transition probabilities a t the jump moments. The one-dimensional density p(t , y) of this process satisfies the Feller equation (see (1.1.71))

From (1.4.61) with regard to (2.2.1) and (2.2.2), we obtain the Bellman equation

+ c(z, y) + min I u I I u m

If we denote the integro-differential operator of Eq. (2.2.46) by Lt,y, then this equations can be written in the short form

Comparing Eqs. (2.2.46) and (2.2.47) with the Feller equations (1.1.69) and (1.1.70), we see that, for pure discontinuous processes, the Bellman equation (2.2.47) contains the integro-differential operator ~t~ of the backward Feller equation; this operator is dual to Lt,y. Therefore, Eq. (2.2.47) can be written in the form

9 ~ h e approximate relation zt(P) Z x;(P) means that lz:(P)-zr(P)I << z"(P)-~'(P).

Page 135: Optimal Design of Control Systems Stochastic and Deterministic Problems

118 Chapter I1

In what follows, we assume that the input Markov process y(t) is homo- geneous with respect to the state variable y, that is, X(y) = X = const and ~ ( y , y') = ~ ( y - y'). In this case, by using the formal method proposed in [176], we can replace the integro-differential operator L:~ in (2.2.47) and (2.2.49) by an equivalent differential operator.

Let us show how to do this. First, we try to write Eqs. (2.2.46) and (2.2.48) in the form

where L is the required differential operator and Lp is the density of the probability flow [160, 1731. We apply the Fourier transform to (2.2.50) and (2.2.46). For the Fourier transform of the probability density

we obtain the following two equations from the well-known property of the Fourier transform of the convolution of two functions:1°

where r ( s ) denotes

Comparing (2.2.51) and (2.2.52), we obtain the spectral representation of the desired operator

T(s) - 1 L(s) = X-. (2.2.54)

S

If the expression on the right-hand side of (2.2.54) is the ratio of polynomials

(hi and qi are constant numbers), then, as follows from the theory of Fourier transforms [41], the desired operator L(d/ay) can be obtained from L(s)

l o ~ e c a l l that X(y) = X and T ( Z , y) = ~ ( y - Z) in (2.2.46).

Page 136: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems 119

by the formal change s 4 dldy. Using the operator L(d/dy), we transform the Bellman equation (2.2.49) to the form

(note that if L(d/ay) = H(d/dy)/Q(d/dy) , then the L 4 = cp has the meaning of the relation H(d/dy)+ = Q(d/dy)cp).

Equation (2.2.56) can be simplified just as in Section 2.2.1. Namely, assuming that the penalty function c(x, Y) depends only on the difference z = y - x, instead of (2.2.56), we obtain the following ordinary differential equation for the stationary operating mode:

(here the time-invariant loss function f3 = f3(z) is determined by analogy with (2.2.9) and y is the stationary error.) The optimal control

differs from (2.2.20) only by the position of the switch point z:, which can be obtained from the condition

To complete the solution of the synthesis problem, we need to calcu- late 2:. By analogy with Section 2.2.1, the switch point z: divides the z-axis into two parts: the domain z > z:, where df3/dz > 0, u, = urn, and Eq. (2.2.57) has the form

and the domain z < z:, where df3/dz < 0, u, = urn, and Eq. (2.2.57) has the form

kz (i) df' = y - c(z). -L - f'+u,-&-

At the switch point 2% the derivatives of the functions f+(z ) and f - (z ) vanish:

Page 137: Optimal Design of Control Systems Stochastic and Deterministic Problems

120 Chapter I1

To solve the linear equations (2.2.59) and (2.2.60) explicitly, we need to specify the form of the linear operator L(d/dz). Assume that the density of the transition probability Z(Y, Y') a t the jump moment is given by the formula

k ~ k z exp(-klIyl - YI) for Y' > Y, (2.2.62) .(Y, Y') = .(Y' - Y) = kl + k2 --{ exp(-k2y' - y/ ) for y' < y.

Calculating the integral (2.2.53), we obtain

After the change s + dldz, we obtain the following expression for the operator L(d/dz) from (2.2.63) and (2.2.54):

With regard to (2.2.64), we can write Eqs. (2.2.59) and (2.2.60) in the form

Introducing the functions

we transform the system (2.2.65) as follows:

d 2 ~ + d $ Q k2 * - XAk $ Q = ( z ) (2.2.67) ( A ) U r n )

where f c (z) ==I--

Urn

ReIations (2.2.6 1) and (2.2.66) impIy the following matching conditions for the functions $Q* (z) a t the switch point zz:

Page 138: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems 121

The characteristic equations corresponding to Eqs. (2.2.67) are

By I-"$ and p i we denote the roots of the characteristic equation for the function cp+(z) (correspondingly, by pT and py the roots of the character- istic equation for cp-(2)). A straightforward verification shows that if

then

(1) all roots p t 2 are real, (2) each characteristic equation in (2.2.70) has roots of opposite signs

(for definiteness, in what follows, we assume that p$ and p: are positive, and respectively, p: < 0).

REMARK. Note that condition (2.2.71) must be satisfied, since this is the existence condition for the stationary tracking considered here. Indeed, the expression on the left-hand side in (2.2.71) is equal to the absolute value of the mean rate of the regular displacement in the command signal y(t) caused by random jumps. Obviously, this rate of change cannot exceed the limit admissible speed of the servomotor. Inequality (2.2.7 1) just confirms this fact.

Taking into account these properties of the roots of the characteristic equations (2.2.70), one can readily see that the solutions of Eqs. (2.2.67) have the form

Using (2.2.72), from the matching conditions (2.2.69), we obtain the fol- lowing equation for the required switch point z;:

Page 139: Optimal Design of Control Systems Stochastic and Deterministic Problems

122 Chapter I1

For the quadratic penalty function c(z) = z2 , Eq. (2.2.73) has an exact solution. Actually, taking into account (2.2.68)) we can rewrite Eq. (2.2.73) in the form

where

Calculating the integrals in (2.2.74), we obtain

Since p t 2 and pL2 satisfy the quadratic equations (2.2.70), after easily transformation, we obtain the following explicit formula for the switch point:

Using (2.2.69)) (2.2.72), and (2.2.77), we can readily calculate the stationary specific error

2u; X 2

+ (AAk - k 2 ~ m ) 2 [(Ak + -) urn + (k2 - E)] urn . (2.2.78)

Page 140: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems 123

If instead of condition (2.2.71), we have a stronger inequality

then we can substantially simplify (2.2.77) and (2.2.78) by expanding these expressions in power series in the small parameter E = XAk/umk2 and taking only the leading terms of these expansions. In this case, instead of (2.2.77) and (2.2.78), we have

For the first time, formulas (2.2.79) and (2.2.80) were derived by somewhat different methods in [176].

$2.3. Optimal control of the population size

Numerous investigations deal with the dynamics of animal and microor- ganism populations and control of the population size. For example, an extensive literature on this subject can be found in 151, 73, 87, 89, 133, 142, 186, 187, 1891. Various mathematical evolutional models depending on the environmental conditions of biological populations are used for de- scribing the variations in the population size. We begin with a brief review of such models. The main attention is given to the models that we shall consider in this book later.

2.3.1. Models describing the population dynamics. Apparently, Malthus was the first who considered the following model for the population dynamics in 1798:

Here x = x(t) is the population size1' a t time t , and the constant number a , called the growth factor, is determined as the difference between the birth- rate and the death-rate factors.

If the birth-rate is larger than the death-rate (a > O), then, according to the Malthus model (2.3.1), the population size must grow infinitely.

"The variable I is assumed to be continuous, although the number of individuals in the population can be only integer. However, if the number of individuals is sufficiently large, then the continuous model (2.3.1) can be used. In this case, the variable x is treated as the population density, that is, as the number of individuals per unit area (or volume) of the population habitat.

Page 141: Optimal Design of Control Systems Stochastic and Deterministic Problems

124 Chapter I1

Usually, this result is not confirmed, which shows that the mode1 (2.3.1) is not perfect. Nevertheless, the basic idea of this model, namely, the assumption that the rate of the population variation is proportional to the current population size proved to be very fruitful. Many more realistic models were constructed on this basis by introducing the corresponding corrections to the growth factor a.

So, for example, if we assume that in (2.3.1) the growth factor a depends on the population size x as

then we obtain the Gompertz mode1 (1825)

or the Verhulst mode1 (1838)

Equation (2.3.3) is often called the logistic equation. The positive con- stants r and K are usually called the natural growth factor and the capacity of the medium.

Models for more complicated systems of interacting populations are also based on the Malthus model (2.3.1). Assume that in the same habitat there are two different populations of sizes x l and 22, respectively. Let each of these populations be described by the Malthus type equations

Now we assume that individuals in the second population (predators) can exist only if they eat individuals (prey) from the first population.12 In this case, it is natural to assume that the growth factors a and b in (2.3.4) have the form

12This model is usually illustrated by the assumption that z l denotes a community of hares and x2 a community of wolves. Hares need vegetable food, and wolves feed on hares (and only on hares).

Page 142: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems 125

Thus, we arrive a t the two equations

which are well-known Lotka-Volterra equations that model the behavior of the simplest system of interacting populations, the "predator-prey" model. These equations were studied in detail by V. Volterra [187], who found many remarkable properties of their solutions.

The multidimensional generalization of the Lotka-Volterra model has the form

= - a T T , r = 1 , 2 ,..., n. (2.3.6)

The dynamics of system (2.3.6) depends on the form of the matrix A = ljaijI(;L. If this matrix is antisymmetric, i.e., if a i j = -aji and aii = 0, then Eq. (2.3.6) describes a conservative model for the population interaction. If the quadratic form a,,xTxs is positive definite, then model (2.3.6) is called dissipative.

Further generalizations of models for the population dynamics were re- lated to more detailed descriptions of the interaction between individuals in the population. For example, in many actual situations, the growth fac- tor a depends on the population size a t some preceding moment of time rather than on the current population size. In these cases, it is expedient to use the Hutchinson model (1948)

which is a generalization of the logistic model (2.3.3). In 1976 Cushing proposed the following more general model, in which both discrete and distributed delays are taken into account:

where K ( s ) is a nondecreasing bounded function and the integral on the right-hand side is of the Stieltjes type.

In some special cases, it is necessary to take into account the spatial distribution of the population size. In these cases, the state of the popu- lation is described by the density function D(t , x, y) a t the point (x, y). If the movement of individuals within the habitat is a diffusion process, then instead of (2.3.7), we have the Hutchinson model with diffusion

Page 143: Optimal Design of Control Systems Stochastic and Deterministic Problems

126 Chapter I1

Equations (2.3.1)-(2.3.9) model the behavior of isolated biological com- munities (autonomous models). If there are external actions on the pop- ulation, then some additional terms responsible for these actions appear on the right-hand sides of Eqs. (2.3.1)-(2.3.9) As usual, we distinguish two types of external actions: purposeful controlled actions, which can be used to control the population size, and noncontrolled random perturbations.

Let us consider a population described by the model (2.3.3). If there are external actions, say, some individuals are taken away from the population, then we obtain the controlled logistic model

5 = T 1-- x - q u x , ( 3 where the function u = u(t) > 0 is the intensity of the catching process and the number q > 0 is the catchability coefficient. In this case, the value

Q = 9 lr u(t)x(t) dt (2.3.11)

gives the number of individuals caught during the time interval [tl, t2].13 In a similar way, the Lotka-Volterra equations can be generalized to the

following controlled system:

Here the control functions u l = ul(t) > 0 and u2 = uz(t) 2 0 are, re- spectively, the intensities of catching prey and predators. If individuals are removed only from one population, then we have ul( t ) 0 or ug(t) 0 in (2.3.12).

If the population behavior is substantially influenced by noncontrolled random perturbations, then the dynamics of the population is described by stochastic differential equations. For example, in some problems, the population behavior can be satisfactory described by the stochastic logistic model

x - qux +2/2Bx[(t), (2.3.13)

where [(t) is the scalar Gaussian white noise (1.1.31) and the number B > 0 determines the intensity of random perturbations. Many other stochastic models used to describe the dynamics of various biological systems can be found in [51, 831.

13Note that Eq. (2.3.10) can be used not only in the case of "mechanical" removal (catching, shooting, etc.) but also in the case where the population size is controlled by treating the habitat with chemical agents.

Page 144: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems 127

2.3.2. Statement of the problem. In this section we consider the op- timal control of the size of a population described by the controlled logistic model (2.3.10). The statement of this problem is borrowed from the books 135, 681, where this problem is formulated in conformity to the problem of fisheries management.

We shall assume that the state x = x(t) of the controlled system (2.3.10) characterizes the general quantity (or the mean density) of fish at time t in some chosen habitat. We also assume that the intensity of fishing is bounded by a given value u, > 0. In this case, the mathematical model for the dynamics of fish population has the form

j :=r 1 - - x-qux, ( 3 By p > 0 we denote the price of the unit mass of caught fish and by

c > 0 the price of unit "efforts" u for fishing. Then it is natural to estimate the "quality" of control by the functional

which, with regard to (2.3.11), gives the profit provided by the fishing process defined by the control function u(t) : 0 < t 2 T. The problem is to find the optimal control u, (t) : 0 < t < T for which the functional (2.3.15) attains its maximum.

Following [35, 681, instead of (2.3.15), we shall estimate the quality of control (i.e., of fishing) by the functional in which the terminal time T + m. In this case, an additional "killing" factor appears in the integrand to ensure the convergence, namely,

I (u ) = Lrn e-6t (pqx(t) - c)u(t) dt, (2.3.16)

where b > 0 is a given positive number. As a result, we arrive at the following problem:

for an arbitrary initial state x(0) = xo of the controlled system (2.3.14), to find a control function 0 5 u,(t) 5 urn: t > 0 (or 0 5 u, (x (t)) < u, : t > 0) for which the functional (2.3.16) attains its maximum on the trajectories of system (2.3.14).

REMARK. If the initial population size xo does not exceed the capacity K of the medium, then it follows from Eq. (2.3.14) that for any time moment

Page 145: Optimal Design of Control Systems Stochastic and Deterministic Problems

128 Chapter I1

t > 0 and any admissible control u(t) , the population size has the same property, x(t) < K . Therefore, this problem is well posed if the parameters p, q, and c in the functional (2.3.16) satisfy the condition

Otherwise, (xl > - K ) , this problem has only a trivial solution u,(t) 0, t > - O.I4 Therefore, in what follows, we assume that the inequality (2.3.17) is satisfied. We also assume that qu, > T .

2.3.3. The solution of problem (2.3.14), (2.3.16). If we define the function F ( x ) of the maximum future profit by the relation

F ( x ) = max [lm (pqx(t) - c)u(t) dt I ~ ( 0 ) = x , (2.3.18) O5u(t)lum I

then, using the standard procedure described in 3 1.3, we obtain the Bellman equation

max {[rx (1 - ) - qux] - b F + (pyx - c)u 0LuLum

corresponding to problem (2.3.14), (2.3.16). I t follows from Eq. (2.3.19) that, depending on the current state (the

population size) x of the system (2.3.14), to perform the optimal control we need to choose u,(x) G 0 for all points x E R1 C R+ a t which the . . function

d F p(x) = pqx - c - qx-

dx (2.3.20)

is negative. Conversely, a t all points x E R2 c Rf where p (x) > 0, we need to take the maximum admissible control u,(x) urn. If p(x,) = 0 a t a point x, (in view of continuity, the point x, separating R1 from R2 is the limit point of these domains), then the optimal control u, (x,) for Eq. (2.3.19) is formally undetermined. However, one can see that the choice of any admissible control 0 5 u 5 urn a t the point x, does not affect the solution of Eq. (2.3.19).

Now let us consider how the population size in system (2.3.14) varies with time. Let x(0) = xo < XI. Obviously, in this case, there exists an initial half-interval [O, t,) a t all points of which we must set u, (t) 0. This statement immediately follows from the fact that the expression pqx - c in the parentheses in (2.3.16) is negative for x(t) close to xo. Thus, we have

14Since XI > K, we have ~ ~ x ( t ) - c < 0 for all t.

Page 146: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems 129

x(t) E R1 for all t E 10, t,). Hence, it follows from Eq. (2.3.14) with u 0 that, on the interval [0, t,), the population size x(t) increases monotonically up to the value x, = x(t,) that separates the sets R1 and R2. At the point x,, as was already noted, the control may take any arbitrary value. It is expedient to take this value equal to

and keep it constant for t > t,. It follows from (2.3.14) that the control (2.3.21) preserves the population size x,.

REMARK. For u(x,) # ul , the representative point of system (2.3.14), starting from the state x,, comes either to the set R1 (for u(x,) > ul) or to the set R2 (for u(x,) < ul) during an infinitely small time interval. Then the control u = 0 (or u = u,) immediately returns the representative point to the state x,. Thus, for u(x,) # ul , the population size x, is preserved by infinitely rapid switches of control (this is the sliding mode). Though, as follows from (2.3.19), the value of the functional (2.3.16) for this control remains the same as for u(x,) = ul , the constant control u(t) E u(x,) = u1, t > t,, is more convenient, since in this case the existence problem does not arise for the solution x(t) , t > t,, of Eq. (2.3.14). The optimal control

for 0 < t < t, (x(t) < x,), 21, ( t) = (2.3.22)

for t > t , ,

realizes the generalized solution x,(t) of Eq. (2.3.14) in the Filippov sense (see 51.1).

Thus, for x(0) = xo < X I the optimal control (2.3.22) is a piecewise function shown in Fig. 21 together with the plot of the function x, (t) , which shows the change of the population size corresponding to this control. I t remains to find the moment t, a t which the catching of individuals starts or, which is the same, to find the size (density) x, = x, (t,) of the population that we need to keep constant in the area of active catching. These variables can readily be obtained by calculating the functional (2.3.16) and taking its maximum with respect to t,. Indeed, for the control (2.3.22), the functional (2.3.16) is equal to

We can calculate its maximum with respect to t, by using the fact that x, = x, (t,) as a function of t, satisfies Eq. (2.3.14) for u E 0. After the

Page 147: Optimal Design of Control Systems Stochastic and Deterministic Problems

Chapter I1

F I G . 21

differentiation, from the extremum condition

d.1C, dx* - -de-8t* $ (x , ) + e-6'. - - dt * d z , d t ,

we obtain the following equation for the optimal size x , of the population:

ScK

This equation has only one positive solution

which has a physical meaning. Note that, in view of (2.3.17), the value x , determined by (2.3.25) always

satisfies the inequality X I < x , < K. We also note that the condition xo < X I , introduced for the sake of obviousness, does not influence the

Page 148: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems 131

choice of the optimal control. This strategy is completely determined by x,, and according to this strategy, we do not catch individuals if the current population density x(t) < x, and start catching with constant intensity (2.3.21) if the population size attains the value x, (2.3.25).

We can readily calculate the profit function (2.3.18) corresponding to this strategy. Integrating the equation in (2.3.14) with x(0) = x and u 0, we obtain

x K x(t) = t 2 0.

x + ( K - x)ecTt ' Using (2.3.26), we see that the condition

allows us to find the moment t,. From (2.3.23) and (2.3.27), we explicitly calculate the profit function

- T x ( K - x,) 'IT - -(pqx* - c) 1 - -

6q ( '1 [.*in - (2.3.2,)

for x 5 2,.

To solve problem (2.3.14), (2.3.16) completely, it remains to consider the case x(0) = xo > x,, that is, the case where the initial population size is larger than the optimal size (2.3.25). First, we note that, in view of (2.3.28), the profit function F ( x ) monotonically increases on the interval 0 < x 5 x, from zero to the maximum value

We also note that the function +(x) has only one maximum point

and since the "killing" factor 6 in (2.3.16) is strictly positive, we always have the strict inequality x2 > x,. Now if x(0) = xo = x2 > x ,, then using the constant control

u(x2) = r (1 - z ) , (2.3.30) 4

Page 149: Optimal Design of Control Systems Stochastic and Deterministic Problems

132 Chapter I1

we can keep the population size a t a level of x2 for which the functional (2.3.16) attains the value

However, the constant control (2.3.30) is not optimal. One can readily see that the functional (2.3.16) can take values larger than I (u (x2) ) = $(a2) if, instead of (2.3.30), we use the piecewise constant control function

shown in Fig. 22.

We choose a time interval A during which the control urn is applied, so that a t the end of A the population size attains the value (2.3.25), that is, x(A) = x,.15 The time interval A for the control urn is determined by the equation

15The inequality I(ua(t)) > I(u(12)) = 4(12) is obtained by calculating the func- tional (2.3.16) with regard to Eq. (2.3.14), where control has the form (2.3.31). Here we do not perform the corresponding elementary but cumbersome calculations and leave them to the reader as an exercise.

Page 150: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems 133

Obviously, control functions of the form (2.3.31) can be used not only for the initial population size x(0) = 2 2 but also for arbitrary initial sizes x(0) = x > 2,. In this case, we must perform the change 2 2 -+ x only in Eq. (2.3.32) for the length A of the initial pulse urn. One can easily verify that (2.3.20) implies p(x) > 0 for all x > x,. Therefore, the optimal control as a function of the current population size (the synthesizing function) for problem (2.3.14), (2.3.16) has the form

for 0 < x < x,,

for x = x , , (2.3.33)

for x > x*,

where x, is determined by (2.3.25). Formula (2.3.33) gives the mathematical expression for the control strat-

egy that is well known in the theory of optimal fisheries management [35, 681. The key point in this strategy is the existence of an optimal size x, of fish population given by (2.3.25). The goal of control is to achieve the op- timal size x, of the population as soon as possible and to preserve the size x, by using the constant control (2.3.21). This control strategy maximizes the profit obtained by fishing if the profit is estimated by the functional (2.3.16).

In conclusion, we note that the results presented in this section can be generalized to the case in which the dynamics of fish population is subject to the retarded equation (or the equation with delay)

here we have studied the controlled Hutchinson model. For the results related to this case, see [99].

The stochastic version of problem (2.3.14), (2.3.16), when the behavior of the population is described by the stochastic equation (2.3.13), will be described in $6.3.

$2.4. Stochastic problem of optimal fisheries management

Now let us consider the problem of optimal fisheries management that differs from the problem considered in $2.3 by the stochastic character of the model used for the description of the population dynamics. We assume that the behavior of a fish population is subject to the stochastic differential equation of the form

Page 151: Optimal Design of Control Systems Stochastic and Deterministic Problems

134 Chapter I1

where ((t) is a scalar Gaussian white noise (1.1.31), B > 0 is a given positive number, the natural growth factor r > 0 and the catchability coefficient q > 0 have the same meaning as similar coefficients in (2.3.10), (2.3.13), and (2.3.14). Equation (2.4.1) is a special case (as K + m ) of Eq. (2.3.13) and, in accordance with the classification presented in Section 2.3.1, the model described by Eq. (2.4.1) can be called a controlled stochastic Malthus model.

Just as in $2.3, the size x(t) of the fish population is controlled by catch- ing a part of this population. In this case, the catching intensity u(t) has an upper bound u,, and therefore, the set of all nonnegative measurable bounded functions of the form u(t) : [0, a) + [O, u,] is considered as the set of admissible controls. The goal of control is to maximize the functional (2.3.16), which, in view of the random character of the functions x(t) and u(t), is replaced by the corresponding mean value. As a result, we have the problem

In what follows, we assume that the decay index S in (2.4.2) satisfies the condition S > r .

We shall solve problem (2.4.1), (2.4.2) by using the standard procedure of the dynamic programming approach described in 51.4. We define the profit function for problem (2.4.1), (2.4.2) by the relation

where El(.) I x(0) = x] denotes the conditional mathematical expectation of (.). As was shown in [113, 1751, the second-order derivative of the profit function (2.4.3) is continuous. It follows from Theorem 3.1.5 in [I131 that for all x E R+ = [0, m ) , this function has the upper bound

where N > 0 is a constant. d Z ~ The Bellman equation (F, = g, F,, = =)

for the profit function (2.4.3) can be obtained as usual (see $1.4). I t should be pointed out that a symmetrized stochastic integral (see [I741 and 51.2)

Page 152: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems 135

was used for writing (2.4.5). This led to an additional term B in the parentheses in (2.4.5), that is, in the coefficient of xF,.16

Equation (2.4.5) allows us to find the optimal control u, as a function u,(x) of the current states of system (2.4.1). First, we note that, according to (2.4.5), the set of all admissible states of (2.4.1) can be divided into the following two subsets (just as in $2.3):

the subset R1, where p (x) = pqx - c - qxFx < 0 and u,(x) = 0 and

the subset R ~ , where ~ ( x ) > 0 and u, (x) = urn.

The boundary between these two subsets is determined by the relation

pqx - c - qxFx = 0. (2.4.6)

The further calculations show that, in this problem, there exists a unique point x, satisfying (2.4.6). Therefore, the subsets R1 and R2 are the inter- vals R1 = [0, x,) and R~ = (2, , CQ). Thus the optimal control in the syn- thesis form u, = u,(x) becomes uniquely determined a t all points x E R+ except for the point x,. I t follows from (2.4.6) that we can use any admis- sible control u(x,) E [0, urn] a t the point x,.

Therefore, the optimal control function u, (x) can be represented in the form

and the final solution of the synthesis problem is reduced to calculating the coordinate of the switch point 2,. To calculate x,, we need to solve the Bellman equation (2.4.5).

As was already noted, the second-order derivative of the profit function F ( x ) is continuous, thus the profit function F ( x ) satisfying (2.4.5) can be obtained by using the matching method (see $2.2). In what follows, we describe the procedure for solving the Bellman equation (2.4.5) and calculating the coordinate of the switch point x, in detail.

By F1(x) and F 2 ( x ) we denote the profit function F ( x ) on the intervals R1 = [O, x,) and R' = (x,, CQ). It follows from (2.4.5) and (2.4.7) that the functions F1 and F2 satisfy the linear equations

BX~F,:, + (T + B)XF: - 6 ~ ' = 0, O 5 x < x,, (2.4.8) 2 2

B x Fxx + ( r + B - q u r n ) x ~ : - 6~~ + (pqx - c)um = 0, x > 2,.

(2.4.9)

I6If the stochastic differential equation in (2.4.1) is the Ito equation, then the second term in the Bellman equation (2.4.5) has the form (T - qu)xF,.

Page 153: Optimal Design of Control Systems Stochastic and Deterministic Problems

136 Chapter I1

Since the profit function F ( x ) is sufficiently smooth (recall that the second- order derivative of F ( x ) is continuous), both functions F 1 and F 2 must satisfy the condition (2.4.6) a t the switch point x,. Taking into account the fact that F ( 0 ) = 0 according to (2.4.1) and (2.4.3), we have the following additional boundary condition for the function F 1 ( x ) :

The boundary conditions (2.4.6), (2.4.10), and the upper bound (2.4.4) allow us to obtain the explicit analytic solution of Eqs. (2.4.8) and (2.4.9).

Equation (2.4.8) is the well-known homogeneous Euler equation. Its general solution has the form

where A1 and A2 are constants, and k i and k l satisfy the characteristic equation

B k ( k - 1) + ( T + B ) k - S = 0. (2.4.12)

The constants A1 and A2 are determined by two boundary conditions (2.4.10) and (2.4.6) a t the points x = 0 and x = x,. Since the roots

of Eq. (2.4.12) have opposite signs, we conclude that to satisfy the condition (2.4.10), we need to set A2 equal to zero in (2.4.11). The constant

can be calculated by substituting F 1 ( x ) = ~ 1 % ~ : into (2.4.6) and taking into account the fact that (2.4.6) is valid a t the switch point x , . Thus, the solution of Eq. (2.4.8) is given by the formula

The inhomogeneous Euler equation (2.4.9) can be solved in a similar way. By using the standard method of variation of parameters, we obtain the general solution

Page 154: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems

where 1 k2 - -

1 - 2B [ P m - T. + J ( ~ u , - T ) ~ + 4B6],

satisfy the characteristic equation

and A3 and A4 are arbitrary constants. Since kf is positive, we must set the constant As equal to zero (otherwise,

formula (2.4.15) contradicts the upper bound (2.4.4)). The constant A* can be calculated from condition (2.4.6) a t the switch point x,. Substituting F2(x) (determined by (2.4.15) with A3 = 0) into (2.4.6) instead of the function F , we obtain

This implies the following expression for the function F2(x) :

The two functions F1(x) (2.4.14) and F2(x) (2.4.17) determine the profit function F ( x ) satisfying the Bellman equation (2.4.5) for all x E R+ = [O, co). These functions contain a parameter x,, which remains unknown. We can calculate x, by using the continuity property of the profit func- tion F (x) .

Each of the functions F1 and F2 is continuous. Hence, to ensure the continuity of F ( x ) , it suffices to satisfy the condition

a t the switch point 2,. I t follows from (2.4.6), (2.4.8), and (2.4.9) that (2.4.18) is equivalent to the condition

a t the switch point x,.

Page 155: Optimal Design of Control Systems Stochastic and Deterministic Problems

138 Chapter I1

Calculating the second-order derivative of the functions (2.4.14) and (2.4.17), we derive the following equation for x, from (2.4.19):

Hence, the switch point x, is determined by the explicit formula

Formula (2.4.20) and the optimal control algorithm (2.4.7) constitute the complete analytic solution of the stochastic problem (2.4.1), (2.4.2) of optimal fisheries management.

Some final comments and remarks. It is of interest to compare (2.4.20) and (2.3.25), which is the optimal size of the population in the deterministic problem (2.3.14), (2.3.16) of optimal control considered in $2.3. Denoting (2.3.25) by c,, we may expect that the equality

lim F* = lim 2, K+m B+O

(2.4.21)

is valid due to continuity reasons (since the deterministic version of problem (2.4.1), (2.4.2) formally coincides with problem (2.3.14), (2.3.16) as K + m).

We can verify (2.4.21) by straightforward calculations of the limits on both sides. Indeed, using (2.3.25), we readily calculate the limit on the left-hand side of (2.4.21) for 6 > r,

c6 lim T, =

~ - t m pq(6 - r ) '

The same result is obtained by calculating the limit of (2.4.20) as B + 0, since

k: - 1 + (6 - r ) (qum - r) k ; - k; B+O aqum ,

which follows from (2.4.13) and (2.4.16). Formula (2.4.21) shows how the results obtained in this section for

problem (2.4.1), (2.4.2) are related to similar results for problem (2.3.14), (2.3.16) obtained in Section 2.3.3 by quite different methods.

There is another interesting specific feature of problem (2.4.1), (2.4.2). Namely, the standard "classical" approach of the dynamic programming

Page 156: Optimal Design of Control Systems Stochastic and Deterministic Problems

Exact Methods for Synthesis Problems 139

that leads to the exact solution of the stochastic problem (2.4.1), (2.4.2) does not allow us to solve the synthesis problem (that is, to find the switch point 2,) for the deterministic version of problem (2.4.1), (2.4.2), that is, in the case where there are no random perturbations in Eq. (2.4.1). This fact can readily be verified if we consider the deterministic version of the Bellman equation (2.4.5).

max [(T - o<u<u,

and calculate the functions

which, in this case, determine the profit function F ( x ) on the intervals R1 = [O, x*) and R2 = [x,, 00).

Contrary to the stochastic case in which the continuity condition (2.4.18) for the functions (2.4.14) and (2.4.17) determines the unique switch point (2.4.20), one can readily verify that the same continuity condition F1(z,) = F2(x,) for the functions (2.4.24) and (2.4.25) holds for any point x , E (0, m). Therefore, the control problem considered can serve as an example illustrating the well-known idea (see [113, 1751) that the dynamic program- ming approach is more suited for solving control problems with stochastic models of plants (which, by the way, describe the actual reality more ade- quately).

REMARK. If the equation in (2.4.1) is understood as the Ito stochastic equation, then the Bellman equation for problem (2.4.1), (2.4.2) differs from (2.4.5) and has the form

The way for solving this equation is quite similar to the above procedure for solving Eq. (2.4.5). However, the population size x, that determines the switch point for the optimal control (2.4.7) differs from (2.4.20) and is

Page 157: Optimal Design of Control Systems Stochastic and Deterministic Problems

Chapter I1

given by the expressions

Z* = c(6 - r + qu,) (El 1)

p q [ h - - r + - I q u m l l (k:-%;)

Page 158: Optimal Design of Control Systems Stochastic and Deterministic Problems

CHAPTER I11

APPROXIMATE SYNTHESIS OF STOCHASTIC

CONTROL SYSTEMS WITH SMALL CONTROL ACTIONS

Various approximate synthesis methods can be useful if the Bellman equation cannot be solved exactly. Chapters 111-VI deal with some of these methods.

Approximate methods are usually efficient if the initial statement of the optimal control problem contains a small parameter. Quasioptimal control algorithms are constructed by using either the corresponding procedures of successive approximations or asymptotic expansions of the loss function in powers of a small parameter of the problem. The choice of a method for constructing an approximate solution of the synthesis problem essentially depends on the choice of a parameter that is considered to be small. For example, in this chapter, the values of control actions are assumed to be small. Chapter IV is about the Bellman equation with small diffusion coefficients. In Chapter V, we consider control problems for oscillating systems with small attenuation decrement. In Chapter VI, the role of small parameters is played by the a posteriori covariances of unknown coefficients in the plant equations.

Let us formulate the main idea of the approximate synthesis method studied in this chapter. As was already noted, the method is based on the assumption that control actions on the plant P are relatively small. From the physical viewpoint, this assumption means that the effect of the control actions on the phase trajectories of the system is small, and therefore the system dynamics is similar to noncontrolled motion. In particular, this assumption holds for control problems with constraints if the noises acting on the plant are of large intensity.

Indeed, let us assume that the controlled (unperturbed) plant is a stable mechanical system. Then large random perturbations lead to large devia- tions of the system from the equilibrium state. In this case, some "internal" inertial and elastic forces arise in the system. These forces can significantly exceed the (bounded) control forces whose effects on the system turn out to be relatively small.'

'Note that in this book we do not consider deterministic synthesis problems for

141

Page 159: Optimal Design of Control Systems Stochastic and Deterministic Problems

142 Chapter I11

From the formal mathematical viewpoint, the fact that control actions are small leads to a small parameter in the nonlinear term in the Bellman equation. To verify this fact, let us consider the synthesis problem for the servomechanism (Fig. 10) governed by the Bellman equation (1.4.21). Assume that the dimensions of the region U of admissible controls are bounded by a small value of the order of E. For definiteness, we assume that U is either an r-dimensional parallelepiped (R, > U = (u: lujl < - u,i, i = 1,2, . . . , T ; maxi u,i = E ) ) or an r-dimensional ball of radius E,

that is, (R, > U = (u: x:=l U: < E)) .

In the first case, according to (1.3.22), the solution of the synthesis prob- lem is given by the formula (the control algorithm)

where the vector of partial derivatives d F / d x is calculated by solving the equation

Here denotes the r-vector (column) U,/E and L is a linear operator of the form2

In the second case (where U is a ball), the optimal control has the form (see (1.3.23))

systems controlled by small forces. Such systems called weakly controllable in [32] were studied in [32, 1371.

'Recall that relations (3.0.1) and (3.0.2) follow from the Bellman equation (1.4.21) with c ( z , y, u ) = CI ( x , y) and Az(t , z ) = a ( t , x ) + Q(t)u; {u,~, . . . , u,,) denotes a diag- onal (T x T)-matrix; for a column A with components Al , . . . , A,, the expressions sign A and [A[ denote T-columns with components sign Aj and [Aj[ ( i = 1 , . . . , T), respectively.

Page 160: Optimal Design of Control Systems Stochastic and Deterministic Problems

Approximate Synthesis of Stochastic Control Systems 143

where the vector dF /dz is the gradient of the loss function satisfying the equation

If we denote the nonlinear terms in Eqs. (3.0.2) and (3.0.4) in the same way, then we can write both equations in the form

where @(t, dF/dx) is a given nonlinear function of its arguments. As a rule, equations of the type (3.0.5) cannot be solved exactly. How-

ever, the presence of a small parameter in the nonlinear term of this equa- tion yields a rather natural way of solving this equation approximately. To this end, one can use the method of successive approximations in which the zero-order approximation Fo(t, x, y) satisfies the equation

and the successive approximations Fk(t, x, y) can be calculated recurrently by solving the sequence of linear equations of the form

If we know the solution Fk(t , x, y) of the equation for the kth approxima- tion (k = 0,1, . . . ), then we can perform an approximate synthesis of the controlled system by setting the quasioptimal control algorithm as

In this chapter we consider an approximate method for the synthesis of optimal systems whose "algorithmic" essence is given in formulas (3.0.6)- (3.0.8).3

Needless to say, the practical use of procedure (3.0.6)-(3.08) in special problems leads to additional problems of constructivity and efficiency of

3The approximate synthesis algorithm (3.0.6)-(3.08) is a modification of the well- known Bellman method of successive approximations [14, 161. This method was used by W. Fleming for solving some stochastic problems of optimal control 1551. The procedure (3.0.6)-(3.0.8) is a special case of the Bellman method if the trivial strategy uo(t, x, y) 5 0 is used as the initial "generating" control strategy in the Bellman method.


this approximate synthesis method. In this chapter we shall discuss these questions in detail. All related material is divided into sections as follows.

First (§§3.1–3.3), we consider some methods for calculating the successive approximations for stationary synthesis problems. We write out approximate solutions (corresponding to the first two approximations) for some special control systems with various types of disturbances affecting the system. In §3.1 and §3.2, we consider random perturbations of the white noise type. In §3.3 the results obtained in §3.1 and §3.2 are generalized to the case of correlated noises.

In §3.4 we study nonstationary problems and estimate the error of the approximate synthesis (3.0.6)–(3.0.8) for the first two approximations.

In §3.5 we study asymptotic properties of the successive approximations (3.0.7), (3.0.8) as k → ∞. We show that, under some special conditions, as k → ∞ the sequence F_k converges to the exact solution of the Bellman equation, and the corresponding quasioptimal control algorithms (3.0.8) converge to the optimal control u_*(t, x, y). In this case, the convergence u_k → u_* is understood in the sense of convergence of the values of the functional to be minimized.

Finally, in §3.6 the method of successive approximations (3.0.6)–(3.0.8) is used for the approximate synthesis of some stochastic control systems with distributed parameters.

§3.1. Approximate solution of stationary synthesis problems

3.1.1. Let us consider the problem of optimal damping of oscillations in a dynamic system subject to random perturbations of the white noise type (Fig. 13). Let the plant P be described by the following system of linear stochastic differential equations with constant coefficients:

Here x = x(t) is an n-vector (column) of current phase variables of the system (x₁(t), …, x_n(t)); u = u(t) is an r-vector (column) of control actions (u₁(t), …, u_r(t)); ξ(t) is an n-vector (column) of random perturbations with independent components (ξ₁(t), …, ξ_n(t)) of the standard white noise type (1.1.31); and A, Q, and σ are given constant matrices of the corresponding sizes.

It is required to construct a control block C so as to ensure the optimal suppression (damping) of the oscillations x(t) arising at the output of the closed-loop system shown in Fig. 13 under the action of the random perturbations ξ(t). As the optimality criterion, we shall use an integral functional of the form

(3.1.2)


where c(x) ≥ 0 is a given convex penalty function attaining its absolute minimum c(0) = 0 at the point x = 0 (the restrictions on c(x) are discussed in detail in §3.4 and §3.5).

Let admissible controls be bounded and small. We assume that all components of the control vector u satisfy the conditions

where ε > 0 is a small parameter and u_{m1}, …, u_{mr} > 0 are given numbers of order 1.

The system shown in Fig. 13 is a special case (with input signal y(t) ≡ 0) of the servomechanism shown in Fig. 10. Therefore, the Bellman equation for problem (3.1.1)–(3.1.3) readily follows from (1.4.21); taking into account the relations Aʸ(t, y) = 0, Bʸ(t, y) = 0, Aˣ(t, x, u) = Ax + Qu, and c(x, y, u) = c(x), we obtain

Here L denotes a linear elliptic operator of the form

where, according to (1.4.16), the matrix B = σσᵀ and, as usual, the sum in the last expression on the right-hand side of (3.1.5) is taken over repeated indices from 1 to n.

It follows from (3.0.1) and (3.0.2) that in this case the optimal control has the form

$$u_*(t, x) = -\{\varepsilon u_{m1}, \ldots, \varepsilon u_{mr}\}\,\operatorname{sign}\Bigl(Q^T \frac{\partial F}{\partial x}\Bigr), \qquad (3.1.6)$$

where the loss function F(t, x) satisfies the equation

Some methods for solving Eq. (3.1.7) will be considered in §3.4. In the present section, we restrict our consideration to stationary operating conditions of the stabilization system in question. It follows from §1.4 that the stationary mode of stabilization (damping) can take place as T → ∞, where T is the terminal instant of the operation interval (the upper


integration limit in (3.1.2)). Obviously, in this case, stationary operating conditions exist only if the unperturbed motion of the plant P is stable, that is, if the real parts of the eigenvalues of the matrix A in (3.1.1) are negative. In what follows, we assume that these conditions are satisfied.

If we define the stationary loss function f(x) by the relation (see (1.4.29), (2.2.9))

$$f(x) = \lim_{T \to \infty}\bigl[F(t, x) - \gamma (T - t)\bigr],$$

then (3.1.7) implies the following time-invariant equation for f(x):

where the parameter γ, characterizing the stationary "specific losses," together with the function f(x) can be found by solving Eq. (3.1.8). We shall solve Eq. (3.1.8) by the method of successive approximations. The calculation scheme (3.0.6), (3.0.7), applied to the time-invariant equation (3.1.8), leads to the sequence of equations

It follows from (3.1.9) and (3.1.10) that each time we calculate the next approximation, we need to solve a linear inhomogeneous elliptic equation of the form

We shall consider a method for solving Eq. (3.1.11) with a given function φ(x), which allows us to represent the function f(x) in the form of a series in eigenfunctions of some Sturm–Liouville problem [179].

3.1.2. The passage to the adjoint equation. Let us consider the operator

$$L^* = -\frac{\partial}{\partial x_i}\bigl(A_{ij}x_j\,\cdot\,\bigr) + \frac{1}{2}\,\frac{\partial^2}{\partial x_i\,\partial x_j}\bigl(B_{ij}\,\cdot\,\bigr), \qquad (3.1.12)$$

which is adjoint to the operator (3.1.5). The equation


is the Fokker–Planck equation (1.1.67) for the n-dimensional Gaussian Markov process x(t) [45, 167, 173]. The assumption that the matrix A is stable implies that this process has a stationary density function p₀(x) such that

$$L^* p_0(x) = 0. \qquad (3.1.14)$$

In this case the stationary probability density p₀(x) has the form

$$p_0(x) = \bigl[(2\pi)^n \det P^{-1}\bigr]^{-1/2} \exp\bigl[-\tfrac{1}{2}(x^T P x)\bigr], \qquad (3.1.15)$$

where P⁻¹ is the matrix of covariances of the components of the vector x. We shall present a possible method for solving Eq. (3.1.13) and calculating the matrix P in (3.1.15). The diffusion Markov process x(t) described by (3.1.13) satisfies the system of linear stochastic differential equations

describing the uncontrolled motion of the plant (3.1.1). We pass from x to new variables y related to x by the linear transformation

$$x = V y \qquad (3.1.17)$$

with a nondegenerate matrix V. As a result, instead of (3.1.16), we have the following system for the new variables:

where

$$\bar{A} = V^{-1} A V, \qquad \bar{\sigma} = V^{-1}\sigma. \qquad (3.1.19)$$

We choose V so that the matrix Ā is diagonal:

As is known [62], such a matrix V always exists and can readily be constructed if the eigenvalues of the matrix A are simple, that is, if the characteristic equation of the matrix A,

$$\det(A - \lambda E) = 0, \qquad (3.1.21)$$

has distinct roots λ₁, λ₂, …, λ_n. In this case, the columns of the matrix V are the eigenvectors of the matrix A, which satisfy the linear equations


The system (3.1.18) can readily be solved in the case (3.1.20). Indeed, writing (3.1.18) in rows, we obtain

where the random functions v_ℓ(t) = σ̄_{ℓk} ξ_k(t) are processes of the white noise type with the characteristics

Here B̄_{ℓm} is an element of the matrix B̄ = σ̄σ̄ᵀ. Solving Eq. (3.1.23), we obtain

and, taking into account (3.1.24), derive the following expressions for the means and covariances:

$$\mathbf{E}y_\ell(t)\,y_m(t) - \mathbf{E}y_\ell(t)\cdot\mathbf{E}y_m(t) = \frac{\bar{B}_{\ell m}}{\lambda_\ell + \lambda_m}\Bigl[e^{(\lambda_\ell + \lambda_m)(t - t_0)} - 1\Bigr], \qquad (3.1.26)$$

which determine the transition probability p(y(t) | y(t₀)) of the Gaussian process y(t). It follows from (3.1.26) that

in the stationary case as t → ∞, since, by assumption, Re λᵢ < 0 (i = 1, …, n) for all roots of the characteristic equation (3.1.21) of the matrix A.

It follows from (3.1.27) that the stationary density function p₀(y) can be written in the form

where each entry of the matrix P̄⁻¹ is given by the formula


The stationary density p₀(y) satisfies the stationary Fokker–Planck equation

$$\bar{L}^* p_0(y) = -\frac{\partial}{\partial y_i}\bigl(\lambda_i y_i p_0\bigr) + \frac{1}{2}\,\frac{\partial^2}{\partial y_i\,\partial y_j}\bigl(\bar{B}_{ij} p_0\bigr) = 0. \qquad (3.1.30)$$

Since the random processes y(t) and x(t) are related by the linear transformation (3.1.17), comparison of (3.1.15) with (3.1.28) yields the formula

which together with (3.1.29) allows us to calculate the matrix P.
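As a computational cross-check of the chain (3.1.19), (3.1.26), (3.1.29), (3.1.31), the stationary covariance obtained through the eigenbasis of A can be compared with the solution of the Lyapunov equation AS + SAᵀ + σσᵀ = 0. A sketch (the matrices A and σ below are illustrative; any stable A works):

    import numpy as np
    from scipy.linalg import solve_lyapunov

    # Stationary covariance of (3.1.16) computed two ways:
    # (i) in the eigenbasis of A via (3.1.19), (3.1.29), (3.1.31);
    # (ii) from the Lyapunov equation A S + S A^T = -sigma sigma^T.
    A = np.array([[0.0, 1.0], [-1.0, -0.5]])
    sigma = np.array([[0.0, 0.0], [0.0, 1.0]])

    lam, V = np.linalg.eig(A)                   # eigenvalues/eigenvectors, cf. (3.1.22)
    sig_bar = np.linalg.inv(V) @ sigma          # sigma_bar = V^{-1} sigma, (3.1.19)
    B_bar = sig_bar @ sig_bar.T                 # plain transpose (x itself is real)
    C = -B_bar / (lam[:, None] + lam[None, :])  # stationary covariances, cf. (3.1.29)
    S_eig = (V @ C @ V.T).real                  # back to the x-variables, cf. (3.1.31)

    S_lyap = solve_lyapunov(A, -sigma @ sigma.T)
    print(np.allclose(S_eig, S_lyap))           # True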

Now let us return to the Fokker–Planck equation (3.1.14). If the operator (3.1.5) satisfies the potentiality condition (see §4, Section 5 in [173]), then the operator equality⁴

$$p_0 L = L^* p_0 \qquad (3.1.32)$$

readily follows from (3.1.5), (3.1.12), and (3.1.14). However, even if the potentiality conditions are not satisfied and (3.1.32) does not hold, one can choose an operator L₁* satisfying a similar relation

One can readily see that the operator L₁* has the form⁵

where the matrix G = ‖G_{ij}‖ is similar to the transposed matrix Aᵀ from (3.1.1) and (3.1.5),

$$G = P^{-1} A^T P. \qquad (3.1.35)$$

The similarity transformation (3.1.35) employs the matrix P from (3.1.15). Relation (3.1.33) allows us to replace Eq. (3.1.11) by a similar equation

for the adjoint operator. In other words, it follows from (3.1.11) and (3.1.33) that the problem of finding f(x) in (3.1.11) is equivalent to the problem of finding z(x) in the equation

where z(x), ψ(x) and the functions f(x), φ(x) from Eq. (3.1.11) satisfy the relations

⁴As usual, the operator equality is understood in the sense that it denotes the ordinary relation p₀(x)Lw(x) = L*[p₀(x)w(x)] for any sufficiently smooth function w(x).

⁵The verification of (3.1.33) is left to the reader as an exercise.


3.1.3. The solution of equations (3.1.36) and (3.1.11). Let us consider the following problem of finding the eigenfunctions z_s(x) and eigenvalues Λ_s of the operator L₁* (the Sturm–Liouville problem):

Since L₁* is a Fokker–Planck operator, its eigenfunctions z_s must satisfy zero conditions at infinity (as |x| → ∞).

By passing from x to new variables y (x = Ṽy) and acting in a way similar to (3.1.17)–(3.1.31), we can transform the operator (3.1.34) to the form

where B̃_{ij} is an element of the matrix B̃ = σ̃σ̃ᵀ, σ̃ = Ṽ⁻¹σ, Ṽ is a nondegenerate matrix such that the transformation Ṽ⁻¹GṼ makes the matrix G diagonal, Ṽ⁻¹GṼ = {λ₁, …, λ_n}, and λᵢ are the roots of Eq. (3.1.21).⁶

In the new variables the stationary Fokker–Planck equation has the form

This equation differs from (3.1.30) only in the matrix of diffusion coefficients; therefore, the stationary probability density p̃₀(y) is determined by the formulas

similar to (3.1.28) and (3.1.29). Differentiating (3.1.40) sufficiently many times, we see that

⁶According to (3.1.35), the matrix G is similar to the transposed matrix Aᵀ. Since similar matrices, as well as mutually transposed matrices, have the same eigenvalues, the characteristic equation det(G − λE) = 0 for the matrix G coincides with (3.1.21).


where m₁, m₂, …, m_n are arbitrary integers between 0 and ∞. It follows from (3.1.43) that the functions

$$\frac{\partial^{m_1 + \cdots + m_n}\,\tilde{p}_0}{\partial y_1^{m_1}\cdots\partial y_n^{m_n}}$$

and the numbers (m₁λ₁ + ⋯ + m_nλ_n) can be treated as the eigenfunctions z_s and the eigenvalues Λ_s of problem (3.1.38), respectively.

By using (3.1.41), we can write the functions z_s in more detail as follows:⁷

Here H_{m₁…m_n}(y) = H_{m₁…m_n}(y₁, …, y_n) denote multidimensional Hermite polynomials (see, e.g., [4]) that, by definition, are equal to

$$H_{m_1\ldots m_n}(y) = (-1)^{m_1+\cdots+m_n}\exp\Bigl[\tfrac{1}{2}(y^T\tilde{P}y)\Bigr]\,\frac{\partial^{m_1+\cdots+m_n}}{\partial y_1^{m_1}\cdots\partial y_n^{m_n}}\exp\Bigl[-\tfrac{1}{2}(y^T\tilde{P}y)\Bigr]. \qquad (3.1.45)$$

It follows from the general theory [4] of Hermite polynomials in real variables y that these polynomials form a closed and complete system of functions, and an arbitrary function from a sufficiently large class (functions growing at infinity no faster than a finite power of |y|) can be expanded in an absolutely and uniformly convergent series in this system of functions. Furthermore, the polynomials H are orthogonal to another family of Hermite polynomials G given by the formula

Here the variables µ and y satisfy the relation

and the orthogonality condition itself has the form

(δ is the Kronecker delta).

⁷The constant coefficient [(2π)ⁿ det P̃⁻¹]^(−1/2) in (3.1.44) is omitted.


However, we often need to use a complex matrix Ṽ for the change of variables x → y (for instance, see the problem in §3.2). To pass to complex variables, we need to verify some additional statements of the general theory [4], which holds for real variables. In particular, it is necessary to verify the orthogonality conditions (3.1.48), which are the most important in practical calculations.

This was verified in [107], where it was shown that all properties of the polynomials H and G remain valid for complex variables provided that all the functions H_{m₁…m_n}(y), G_{m₁…m_n}(y), exp[½(yᵀP̃y)], and exp[½(µᵀP̃⁻¹µ)] are considered as functions of the initial real variables x of the problem. To this end, we need to make the change of variables y = Ṽ⁻¹x in all these functions. In particular, in this case, the orthogonality condition (3.1.48) takes the form

where the matrices P₁ and P̃ satisfy the relation

similar to (3.1.31). Thus, we obtain the following algorithm for constructing the solution f(x) of Eq. (3.1.11). First, we seek a stationary density p₀(x) satisfying (3.1.14) and an operator L₁* satisfying (3.1.33). Then we transform problem (3.1.11) to problem (3.1.36). After this, to find the eigenfunctions and eigenvalues of problem (3.1.38), we calculate the matrix Ṽ that transforms the matrix G to the diagonal form {λ₁, …, λ_n} by the similarity transformation Ṽ⁻¹GṼ. Next, using the known λᵢ and Ṽ and (3.1.42), we calculate the matrices P̃⁻¹ and P̃ that determine the stationary distribution (3.1.41). The expression obtained for p̃₀(y) enables us to find the eigenfunctions z_s = z_{m₁…m_n} (3.1.44) for problem (3.1.38) and the orthogonal polynomials G_{m₁…m_n} (3.1.46).

Finally, we seek the function z(x) satisfying (3.1.36) in the form of the series with respect to the eigenfunctions:


where a_{m₁…m_n} are unknown coefficients; the eigenfunctions z_{m₁…m_n}(x) can be calculated by formulas (3.1.44) with y = Ṽ⁻¹x. If we also represent the right-hand side ψ(x) = p₀(x)φ(x) in (3.1.36) as the series

$$p_0(x)\varphi(x) = \sum_{m_1,\ldots,m_n=0}^{\infty} b_{m_1\ldots m_n}\, z_{m_1\ldots m_n}(x), \qquad (3.1.52)$$

where, in view of (3.1.49),

then we can calculate the unknown coefficients in (3.1.51) by the formula

which follows from (3.1.38) and (3.1.43). Now we see that (3.1.37) implies the expression

for the solution of the initial equation (3.1.11). The algorithm obtained for solving (3.1.11) can be used for calculating the successive approximations (3.1.9) and (3.1.10). It remains only to solve the problem of how to choose the stationary losses γ_k (k = 0, 1, 2, …) in Eqs. (3.1.9) and (3.1.10).

3.1.4. Calculation of the parameters γ_k (k = 0, 1, 2, …). The structure of the solution (3.1.55) and the natural requirement that the stationary loss function f(x) be finite imply that there is a unique way of choosing γ_k. Indeed, since, according to (3.1.54), the eigenvalue Λ₀₀…₀ = 0, the coefficient a₀₀…₀ in (3.1.51) is finite only if the necessary condition b₀₀…₀ = 0 is satisfied or, more precisely (in view of (3.1.53) and (3.1.46)), if we have


This relation together with (3.1.9) and (3.1.10) implies the following expressions for the stationary losses γ_k:

$$\gamma^0 = \int_{\mathbb{R}_n} c(x)\, p_0(x)\, dx_1 \cdots dx_n, \qquad (3.1.56)$$

$$\gamma^k = \int_{\mathbb{R}_n}\Bigl[c(x) - \varepsilon\, u_m^T\Bigl|Q^T\frac{\partial f_{k-1}}{\partial x}\Bigr|\Bigr] p_0(x)\, dx_1 \cdots dx_n, \qquad k \ge 1. \qquad (3.1.57)$$

Thus, we have completely solved the problem of how to calculate the successive approximations (3.1.9), (3.1.10) for the stationary operating conditions of the optimal stabilization system.

If the loss function f_k(x) in the kth approximation is calculated, then the quasioptimal control u_k(x) in the kth approximation is completely defined; namely, in view of (3.0.8) and (3.1.6), we have

$$u_k(x) = -\{\varepsilon u_{m1}, \ldots, \varepsilon u_{mr}\}\,\operatorname{sign}\Bigl(Q^T\frac{\partial f_k}{\partial x}\Bigr).$$
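In implementation terms this relay law is a one-liner; the following hypothetical helper (the names and the gradient callable are placeholders, not part of the original text) makes the structure explicit.

    import numpy as np

    def quasioptimal_control(x, grad_fk, Q, eps_um):
        """Relay law u_k(x) = -{eps*u_m1, ...} * sign(Q^T grad f_k(x)).

        grad_fk: callable returning the gradient of the k-th loss function f_k;
        eps_um:  vector of the bounds eps*u_mi (illustrative names).
        """
        return -eps_um * np.sign(Q.T @ grad_fk(x))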

In the next section, using this general algorithm of approximate synthesis, we shall design a system for optimal damping of random oscillations when the plant is a linear oscillatory system with one degree of freedom.

§3.2. Calculation of a quasioptimal regulator for the oscillatory plant

In this section we consider the stabilization system shown in Fig. 13, in which the plant P is an oscillatory dynamic system described by the equation

$$\ddot{x} + \beta\dot{x} + x = u + \sqrt{B}\,\xi(t), \qquad (3.2.1)$$

where the absolute value of the scalar control u is bounded,

the scalar random process ξ(t) is the standard white noise (1.1.31), and β, B, and ε are given positive numbers (β < 2).

Equations of the type (3.2.1) describe the motion of a single mass point under the action of elastic forces, viscous friction, control forces, and random perturbations. The same equation describes the dynamics of a direct-current motor controlled by the voltage applied to the armature when

Page 172: Optimal Design of Control Systems Stochastic and Deterministic Problems

Approximate Synthesis of Stochastic Control Systems 155

the load on the shaft varies randomly. Examples of other actual physical objects described by Eq. (3.2.1) can be found in [2, 19, 27, 136].

For system (3.2.1), (3.2.2), it is required to calculate the optimal regulator (damper) C (see Fig. 13), which damps, in the best possible way with respect to the mean square error, the oscillations constantly arising in the system due to the random perturbations ξ(t). More precisely, as the optimality criterion (3.1.2), we shall consider the functional

which has the meaning of the mean energy of random oscillations in system (3.2.1). Note that the mean square criterion (3.2.3) is used most frequently, and this criterion corresponds to the most natural statement of the optimal damping problem [1, 50]. However, there are other statements of the problem with penalty functions other than the function c(x) = x² + ẋ² exploited in (3.2.3). From the viewpoint of the method used here for solving the synthesis problem, the choice of the penalty function is of no fundamental importance.

To make the problem (3.2.1)–(3.2.3) consistent with the general statement treated in §3.1, we write Eq. (3.2.1) as the following system of two first-order equations for the phase coordinates x₁ and x₂ (these variables can be considered as the displacement x₁ = x and the velocity x₂ = ẋ):

$$\dot{x}_1 = x_2, \qquad \dot{x}_2 = -x_1 - \beta x_2 + u + \sqrt{B}\,\xi(t). \qquad (3.2.4)$$

Using vector-matrix notation, we can write system (3.2.4) in the form (3.1.1), where A, Q, and σ are the matrices

$$A = \begin{pmatrix} 0 & 1 \\ -1 & -\beta \end{pmatrix}, \qquad Q = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad \sigma = \begin{pmatrix} 0 & 0 \\ 0 & \sqrt{B} \end{pmatrix}. \qquad (3.2.5)$$

According to §3.1, under stationary operating conditions (T → ∞ in (3.2.3)), the desired optimal damper C (Fig. 13) is a relay-type regulator described by the equation (see (3.1.6))

$$u_*(x_1, x_2) = -\varepsilon\,\operatorname{sign}\Bigl(\frac{\partial f}{\partial x_2}\Bigr). \qquad (3.2.6)$$

Here f = f(x₁, x₂) is the loss function satisfying the stationary Bellman equation (see (3.1.8))


where, according to (3.1.5) and (3.2.5),

The equation

determines the switching line for the optimal control action (from u = +ε to u = −ε and back) on the phase plane (x₁, x₂). The goal of the present section is to obtain explicit expressions for the control algorithm (3.2.6) and the switching line (3.2.9). To this end, it is necessary to solve Eq. (3.2.7). We shall solve this equation by the method of successive approximations discussed in §3.1.

First, we shall prepare the mathematical apparatus for calculating the successive approximations. A straightforward verification shows that the stationary density function p₀(x) = p₀(x₁, x₂) satisfying Eq. (3.1.14) has the form

$$p_0(x_1, x_2) = \frac{\beta}{\pi B}\exp\Bigl[-\frac{\beta}{B}\bigl(x_1^2 + x_2^2\bigr)\Bigr]. \qquad (3.2.10)$$

Hence, the matrices P and P⁻¹ in (3.1.15) are equal to

$$P = \frac{2\beta}{B}\,E, \qquad P^{-1} = \frac{B}{2\beta}\,E. \qquad (3.2.11)$$

It follows from (3.2.11) and (3.1.35) that in this case the matrix G of the operator (3.1.34) coincides with the transposed matrix Aᵀ, that is, according to (3.2.5), we have

$$G = A^T = \begin{pmatrix} 0 & -1 \\ 1 & -\beta \end{pmatrix}, \qquad (3.2.12)$$

and the operator (3.1.34) has the form

$$L_1^* = x_2\frac{\partial}{\partial x_1} + \frac{\partial}{\partial x_2}\bigl[(\beta x_2 - x_1)\,\cdot\,\bigr] + \frac{B}{2}\,\frac{\partial^2}{\partial x_2^2}. \qquad (3.2.13)$$

One can readily see that the same probability density (3.2.10) satisfies the stationary equation L₁*p₀(x₁, x₂) = 0. Therefore, in this case, the matrix P₁ from (3.1.49) and (3.1.50) coincides with the matrix P determined by (3.2.11).


The matrix Ṽ that reduces (3.2.12) to diagonal form by the similarity transformation is equal to

(3.2.14)

This expression and formulas (3.1.50) and (3.2.12) imply

Correspondingly, the inverse matrix P̃⁻¹ has the form

The matrices (3.2.15) and (3.2.16) allow us to calculate the two-dimensional Hermite polynomials

$$H_{\ell m}(y) = (-1)^{\ell+m}\exp\Bigl[\tfrac{1}{2}(y^T\tilde{P}y)\Bigr]\frac{\partial^{\ell+m}}{\partial y_1^\ell\,\partial y_2^m}\exp\Bigl[-\tfrac{1}{2}(y^T\tilde{P}y)\Bigr], \qquad (3.2.17)$$

$$G_{\ell m}(y) = (-1)^{\ell+m}\exp\Bigl[\tfrac{1}{2}(\mu^T\tilde{P}^{-1}\mu)\Bigr]\frac{\partial^{\ell+m}}{\partial \mu_1^\ell\,\partial \mu_2^m}\exp\Bigl[-\tfrac{1}{2}(\mu^T\tilde{P}^{-1}\mu)\Bigr], \qquad \mu = \tilde{P}y. \qquad (3.2.18)$$

Then these polynomials must be represented as functions of x₁ and x₂ by using the formula x = Ṽy and expression (3.2.14) for the matrix Ṽ. Table 3.2.1 shows the first few polynomials H and G.
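Entries of this kind are easy to regenerate symbolically from the definition (3.2.17). The following sympy sketch does this for a generic symmetric matrix P̃ whose entries p11, p12, p22 are placeholders:

    import sympy as sp

    # Two-dimensional Hermite polynomials H_lm from the definition (3.2.17),
    # for a generic symmetric matrix Ptil = [[p11, p12], [p12, p22]].
    y1, y2, p11, p12, p22 = sp.symbols('y1 y2 p11 p12 p22')
    Q = sp.Rational(1, 2) * (p11*y1**2 + 2*p12*y1*y2 + p22*y2**2)  # (1/2) y^T P y

    def H(l, m):
        expr = (-1)**(l + m) * sp.exp(Q) * sp.diff(sp.exp(-Q), y1, l, y2, m)
        return sp.expand(sp.simplify(expr))

    print(H(1, 0))   # p11*y1 + p12*y2
    print(H(2, 0))   # (p11*y1 + p12*y2)**2 - p11, expanded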

In this case, in view of (3.1.51)–(3.1.55), (3.2.7), (3.2.10), and (3.2.11), the solutions of the equations of successive approximations (3.1.9), (3.1.10) can be written in terms of the Hermite polynomials H_{ℓm}(x₁, x₂) as the series

(3.2.19)

where the coefficients b^k_{ℓm} are calculated by the formulas


Table 3.2.1. The first few polynomials H_{ℓm} (left column) and G_{ℓm} (right column).

Expressions for the polynomials G₂₀, G₁₁, … can be obtained from the corresponding expressions for H_{ℓm} by the change P̃_{pq} → P̃⁻¹_{pq}, y_p → µ_p.

Before we pass to the straightforward calculation of the successive approximations f_k, we make the following two remarks about some singularities of the series on the right-hand side of (3.2.19).

REMARK 3.2.1. In practice, the series (3.2.19) is usually replaced by a finite sum. The number of terms of the series (3.2.19) kept in this sum is determined by the rate of convergence of the series; we do not discuss this question here (see, for example, [26, 166, 179]). However, in our case the series (3.2.19) cannot be truncated in an arbitrary way, since it contains complex terms such as the polynomials H_{ℓm} and the coefficients a^k_{ℓm} (this follows from (3.2.14)–(3.2.18) and (3.2.20)). At the same time, the loss function f_k(x₁, x₂) represented by this series has the meaning of a real function of real arguments. Therefore, when truncating the series (3.2.19), we must remember that a finite sum of this series determines a real function only if the last terms of this sum contain all terms with H_{ℓm} of a certain group (namely, all H_{ℓm} with ℓ + m = s, where s is the highest order of the polynomials kept in the sum (3.2.19)).

REMARK 3.2.2. Equation (3.2.7), as well as the corresponding equations of successive approximations (3.1.9) and (3.1.10), is centrally symmetric (such equations remain unchanged under the substitution (x₁, x₂) → (−x₁, −x₂)). Therefore, the series (3.2.19) must not contain terms for which the sum (ℓ + m) is odd, since the polynomials H_{ℓm} with odd (ℓ + m) are not centrally symmetric (see Table 3.2.1). Taking this fact into account considerably reduces the amount of practical calculation.

In what follows, we present the first two approximations calculated according to (3.1.9) and (3.1.10) and the quasioptimal control algorithms u₀(x₁, x₂) and u₁(x₁, x₂) corresponding to these approximations.

The zero approximation. First of all, let us calculate the parameter γ⁰ of specific stationary losses in the zero approximation. From (3.1.56), with regard to c(x) = x₁² + x₂² and (3.2.10), we have

$$\gamma^0 = \frac{\beta}{\pi B}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\bigl(x_1^2 + x_2^2\bigr)\exp\Bigl[-\frac{\beta}{B}\bigl(x_1^2 + x_2^2\bigr)\Bigr]dx_1\,dx_2.$$

Calculating the integral, we obtain

$$\gamma^0 = \frac{B}{\beta}. \qquad (3.2.21)$$
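The integral can be checked directly, for instance symbolically:

    import sympy as sp

    # Direct verification of (3.2.21): the Gaussian integral for gamma^0 is B/beta.
    x1, x2 = sp.symbols('x1 x2', real=True)
    beta, B = sp.symbols('beta B', positive=True)
    p0 = beta/(sp.pi*B) * sp.exp(-(beta/B)*(x1**2 + x2**2))    # (3.2.10)
    g0 = sp.integrate((x1**2 + x2**2) * p0,
                      (x1, -sp.oo, sp.oo), (x2, -sp.oo, sp.oo))
    print(sp.simplify(g0))                                     # B/beta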

In view of Remark 3.2.2, the first coefficients a⁰₁₀ and a⁰₀₁ in the series (3.2.19) are equal to zero.⁸ The coefficients b⁰₂₀, b⁰₁₁, and b⁰₀₂ can be calculated by using the formulas for G₂₀, G₁₁, and G₀₂ from Table 3.2.1 and

⁸The same result can be obtained if we formally calculate the coefficients b⁰₁₀ and b⁰₀₁ by using (3.2.20).


(3.2.16). Then, according to (3.2.20), the coefficient b⁰₂₀ has the form

The integral in (3.2.22) can readily be calculated; thus, taking into account (3.2.21) and (3.2.14), we obtain

In a similar way, we can easily find

All other coefficients b⁰_{ℓm} with ℓ + m > 2 vanish in view of the orthogonality condition (3.1.49).

According to (3.2.19), it follows from (3.2.23) and (3.2.24) that

Finally, using the formulas for H₂₀, H₁₁, and H₀₂ from Table 3.2.1 and (3.2.25), we obtain the loss function in the zero approximation

This relation and condition (3.2.9) imply the following equation for the zero-approximation switching line Γ⁰:

$$x_1 + \frac{2}{\beta}\,x_2 = 0. \qquad (3.2.27)$$


In this case, the quasioptimal control algorithm u₀(x) in the zero approximation has the form

REMARK 3.2.3. The loss function f₀(x₁, x₂) in the zero approximation (up to a constant term) and the parameter of stationary losses (3.2.21) can be calculated in a different way, without using the method considered above. Indeed, if we seek the solution of the zero-approximation equation (L is the operator (3.2.8))

$$Lf_0 = \gamma^0 - x_1^2 - x_2^2 \qquad (3.2.29)$$

in the form of the quadratic form

$$f_0 = h_{11}x_1^2 + 2h_{12}x_1x_2 + h_{22}x_2^2$$

with unknown coefficients h₁₁, h₁₂, and h₂₂, then, substituting this expression into (3.2.29), we obtain four equations for h₁₁, h₁₂, h₂₂, and γ⁰. However, higher approximations cannot be obtained by this simple reasoning.
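This undetermined-coefficients computation is easy to mechanize. A sympy sketch, with L written out as the generator of the uncontrolled system (3.2.4):

    import sympy as sp

    # Undetermined-coefficients solution of (3.2.29) per Remark 3.2.3.
    # L = x2 d/dx1 - (x1 + beta*x2) d/dx2 + (B/2) d^2/dx2^2.
    x1, x2 = sp.symbols('x1 x2', real=True)
    beta, B = sp.symbols('beta B', positive=True)
    h11, h12, h22, g0 = sp.symbols('h11 h12 h22 gamma0')

    f0 = h11*x1**2 + 2*h12*x1*x2 + h22*x2**2
    Lf0 = (x2*sp.diff(f0, x1) - (x1 + beta*x2)*sp.diff(f0, x2)
           + sp.Rational(1, 2)*B*sp.diff(f0, x2, 2))

    # Match coefficients of L f0 = gamma0 - x1^2 - x2^2 in x1, x2
    eqs = sp.Poly(sp.expand(Lf0 - (g0 - x1**2 - x2**2)), x1, x2).coeffs()
    print(sp.solve(eqs, [h11, h12, h22, g0], dict=True)[0])
    # {h11: beta/2 + 1/beta, h12: 1/2, h22: 1/beta, gamma0: B/beta}

The resulting f₀ reproduces (3.2.21) for γ⁰ and, through ∂f₀/∂x₂ = x₁ + (2/β)x₂, the switching line (3.2.27).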

The first approximation. It follows from (3.1.10) and (3.2.26) that in the first approximation we need to solve the equation

This equation can be solved by analogy with the zero-approximation equation (3.2.29), but the calculations are much more cumbersome because of the more complicated expression on the right-hand side.

First, we employ (3.1.57) and (3.2.21) to find the specific stationary losses

$$\gamma^1 = \frac{B}{\beta} - \varepsilon\,\frac{\beta}{\pi B}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\Bigl|x_1 + \frac{2}{\beta}x_2\Bigr|\exp\Bigl[-\frac{\beta}{B}\bigl(x_1^2 + x_2^2\bigr)\Bigr]dx_1\,dx_2;$$

then, after the integral is calculated, we obtain

$$\gamma^1 = \frac{B}{\beta} - \varepsilon\sqrt{\frac{B(\beta^2 + 4)}{\pi\beta^3}}. \qquad (3.2.30)$$
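A quick numerical cross-check of the correction term (β and B below are illustrative values): under p₀ the coordinates are independent N(0, B/(2β)) variables, so s = x₁ + (2/β)x₂ is Gaussian and E|s| = σ_s√(2/π).

    import numpy as np

    # Monte Carlo check of the correction in (3.2.30).
    rng = np.random.default_rng(0)
    beta, B = 0.4, 1.0
    x = rng.normal(0.0, np.sqrt(B/(2*beta)), size=(2, 1_000_000))
    mc = np.abs(x[0] + (2/beta)*x[1]).mean()
    sigma_s = np.sqrt(B*(beta**2 + 4)/(2*beta**3))
    print(mc, sigma_s*np.sqrt(2/np.pi))   # the two values agree to ~3 decimals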

The coefficients a¹₂₀, a¹₁₁, … in (3.2.19) are calculated by (3.2.19) and (3.2.20) with regard to the formulas for G_{ℓm} from Table 3.2.1. We omit the intermediate calculations and write out the final expression for f₁(x₁, x₂). Keeping only the terms of the series (3.2.19) up to the fourth order inclusive (that is, omitting the terms for which (ℓ + m) > 4), we obtain the following expression for the loss function in the first approximation:

The condition β² ≪ 1 has also been used for calculating the coefficients (3.2.32).

From (3.2.9) and (3.2.31) we obtain the following equation for the switching line Γ¹ in the first approximation:

It follows from continuity considerations that for small ε the switching line Γ¹ is close to the line Γ⁰ determined by Eq. (3.2.27). Therefore, if we set x₂ = −(β/2)x₁ in the terms of order ε in (3.2.33), then we introduce errors of order ε² into the equation for Γ¹. Using this fact and formulas (3.2.32) and (3.2.33), we arrive at the following expression, accurate up to terms of order O(ε²):

Figure 23 shows the position of the switching lines Γ⁰ and Γ¹ on the phase plane (x₁, x₂).

The switching line (3.2.34) determines the quasioptimal control algorithm in the first approximation:

⁹We do not calculate the coefficients ν and ρ and the constant term "const" in (3.2.31), since they do not affect the position of the switching line and the control algorithm in the first approximation.


This algorithm can easily be implemented with the help of standard blocks of analog computers. The corresponding block diagram of a quasioptimal control system for the damping of random oscillations is shown in Fig. 24, where 1 and 2 denote direct-current amplifiers with amplification factors


In conclusion, we dwell on one more fact that follows from the calculations of the first approximation. Namely, all expressions involving the small parameter contain it in the combination ε/√(βB). This concerns the loss function (3.2.31), the switching line (3.2.34), and the formula (3.2.30) for the stationary specific losses, which can be written in the form

or, more briefly,

if the condition β² ≪ 1 is used in the same way as in calculating (3.2.32). As was already noted, the method of successive approximations exploited in the present section is efficient if the nonlinear term of the Bellman equation contains a small parameter ε (we discuss this question in §3.4 in detail). However, in the problem considered here, the convergence was ensured, in fact, by the parameter ε/√(βB). If we recall that, by the conditions (3.2.2) of the problem, the parameter ε determines the range of admissible controls, then it turns out that this quantity need not be small for the method of successive approximations to be efficient; only the relation between the bounds on the admissible control and the intensity B of the random perturbations is important.

All this confirms the assertion made at the beginning of this chapter that the method of successive approximations considered here is convenient for solving problems with bounded controls when the intensity of the random perturbations is large.

§3.3. Synthesis of quasioptimal controls in the case of correlated noises

Now we shall show how the method of successive approximations studied in this chapter can be used for constructing quasioptimal controls when the random actions on the system are not white noises. Instead of the system shown in Fig. 13, we shall consider a stabilization system of a somewhat more general form (see Fig. 25), where in addition to the random actions ξ(t) on the plant we also take into account the noise η(t) in the feedback circuit.

Let the controlled plant P be described, just as in §3.1, by the system of linear differential equations with constant coefficients


where xᵀ = (x₁, …, x_n), uᵀ = (u₁, …, u_r), ξᵀ(t) = (ξ₁(t), …, ξ_m(t)), and the constant matrices A, Q, and Γ are of dimensions n × n, n × r, and n × m, respectively. Block 1 in Fig. 25 is assumed to be a linear inertialess device described by the equation

where yᵀ = (y₁, …, y_ℓ), ηᵀ = (η₁, …, η_ℓ), and C and D are constant matrices of dimensions ℓ × n and ℓ × ℓ, respectively (det D ≠ 0). The goal of control is to minimize a functional of the form

We assume that the random perturbations ξ(t) and η(t) affecting the system are independent diffusion processes with the drift coefficients

and the matrices of local diffusion coefficients B_ξ and B_η (G and B_ξ are constant m × m matrices; H and B_η are constant ℓ × ℓ matrices; the matrices B_ξ and B_η are symmetric, B_ξ is nonnegative definite, and B_η is positive definite). It is well known that in this case the diffusion processes ξ(t) and η(t) are Gaussian.

The stated problem is a special case of the synthesis problem treated in §1.5. This problem is characterized by the fact that the controlled process x(t) is not a Markov process (in contrast, say, with the problems considered in §3.1 and §3.2; moreover, x(t) is not observable), and therefore, to describe the controlled system shown in Fig. 25, we need a special space X_t


of states. This space was called the space of sufficient coordinates in §1.5 (see also [171]). As was shown in §1.5, in this case, as sufficient coordinates, we must use a system of parameters that determine the current a posteriori probability density of the unobserved stochastic processes:

The a posteriori density (3.3.5) satisfies a stochastic partial differential equation, which is a special case of Eq. (1.5.39). It follows from §1.5 that to write an equation for the density (3.3.5), we need to use a priori probability characteristics of the (n + m + ℓ)-dimensional stochastic Markov process (x(t), ξ(t), y(t)).¹⁰

It follows from (3.3.1), (3.3.2), and (3.3.4) that the combined process (x(t), ξ(t), y(t)) has the drift coefficients

and the matrix of local variances

The matrices introduced in (3.3.6) and (3.3.7) are

¹⁰In this case, the control u in (3.3.1) is assumed to be a given known vector at each time instant t.


Using (3.3.6) and (3.3.7), we obtain the following equation for the a posteriori probability density (3.3.5):¹¹

Here p(t, z) = p_t(x, ξ) denotes the a posteriori density (3.3.5), z denotes the vector (x, ξ), a_z is the vector composed of the vector-columns a_x and a_ξ, the matrix B_z is the part of the matrix (3.3.7) consisting of its first (n + m) rows and columns, and E_ps denotes the a posteriori averaging of the corresponding expressions (that is, integration with respect to z with the density p(t, z)).

It follows from (3.3.6)–(3.3.8) that the matrix B_z is constant, the components of the vector a_z are linear functions of z, and the expression in the square brackets in (3.3.9) depends on z linearly and quadratically. Therefore, as shown in §1.5 (see also [170, 175]), the a posteriori density p(t, z) satisfying (3.3.9) is Gaussian, that is,

$$p(t, z) = \bigl[(2\pi)^{n+m}\det K(t)\bigr]^{-1/2}\exp\Bigl[-\tfrac{1}{2}\bigl(z - \bar{z}(t)\bigr)^T K^{-1}(t)\bigl(z - \bar{z}(t)\bigr)\Bigr], \qquad (3.3.10)$$

if the initial (a priori) density p(0, z) = p₀(z) is Gaussian (this is assumed in what follows).

Substituting (3.3.10) into (3.3.9), one can obtain a system of differential equations for the parameters z̄ and K⁻¹ of the a posteriori probability density (3.3.10). One can readily see that this system has the form

(in our special case, the system (1.5.52) acquires the form (3.3.11), (3.3.12)). If instead of K⁻¹ we use the inverse matrix K (which is the matrix of a posteriori covariances), then the system (3.3.11), (3.3.12) can be written in the form

"TO derive (3.3.9) from (1.5.39), we need to recall that, according to the notation used in (1.5.39), the vector A, coincides with the vector a,, the vector A, with ay , and the structure of the diffusion matrix (3.3.7) implies the following relations between the matrices: llBapll = Bz, IIBaull = 0, llFupll = By1, and llBoPll = BY.


Here U, V, and W are the matrices

where, in turn, k_{xx}, k_{xξ}, … are the elements of the block covariance matrix K,

(the dimensions of a block are determined by the dimensions of its subscripts; for example, k_{xξ} is of dimension n × m).

The loss function for problem (3.3.1)–(3.3.3),

$$F(t, \bar{z}_t, K_t) = \min_{u(\tau),\, t \le \tau \le T} \mathbf{E}_{ps}\Bigl[\int_t^T c\bigl(x(\tau), u(\tau)\bigr)\,d\tau \Bigm| y_0^t\Bigr] = \min_{u(\tau),\, t \le \tau \le T} \mathbf{E}\Bigl[\int_t^T c\bigl(x(\tau), u(\tau)\bigr)\,d\tau \Bigm| \bar{z}(t) = \bar{z}_t,\ K(t) = K_t\Bigr], \qquad (3.3.16)$$

is completely determined by the time instant t and the current values of the parameters (z̄_t, K_t) of the a posteriori density (3.3.10) at this instant of time. It follows from the definition given in §1.5 that (z̄(t), K(t)) are sufficient coordinates for problem (3.3.1)–(3.3.3).

The Bellman equation (1.5.54) for the function (3.3.16) can readily be obtained in the standard way from Eqs. (3.3.13), (3.3.14) for the sufficient coordinates. However, it should be noted that in this case the system (3.3.13), (3.3.14) has a special feature that allows us to exclude the a posteriori covariance K(t) from the sufficient coordinates. The point is that, in contrast, say, with the similar system (1.5.53), the matrix equation (3.3.14) does not involve the control u and is in no way related to the system of differential equations (3.3.13) for the a posteriori means z̄(t). This allows us first to solve the system (3.3.14) and calculate the matrix of a posteriori covariances K(t) as a known function of time on the entire control interval 0 ≤ t ≤ T (we solve (3.3.14) with the initial matrix K(0) = K₀, where K₀ is the covariance matrix of the a priori probability density p₀(z)); see the sketch below. If K(t) is assumed known, then in view of (3.3.8) and (3.3.15) we can also assume that the matrix σ_z in (3.3.13) is a known function of time, σ_z(t), and the loss function (3.3.16) depends only on the pair (t, z̄_t). Therefore,


instead of Eq. (1.5.54) for the loss function F(t, z̄), we have the Bellman equation of the form

(here N(z̄, K(t)) denotes the normal probability density (3.3.10) with the vector of mean values z̄ and the covariance matrix K(t)).
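Because (3.3.14) is control-independent, K(t) can be tabulated in advance. A sketch for a hypothetical scalar analog of (3.3.14), a Riccati-type equation with stand-in coefficients a, Bx, Bn (all names and numbers are illustrative assumptions):

    import numpy as np
    from scipy.integrate import solve_ivp

    # Precompute the a posteriori covariance K(t) on [0, T] from a scalar
    # Riccati-type equation dK/dt = 2*a*K - K**2/Bn + Bx, then use it as a
    # known function of time in the filter equations, cf. (3.3.13).
    a, Bx, Bn, T, K0 = -1.0, 0.5, 0.2, 10.0, 1.0

    def rhs(t, K):
        return 2*a*K - K**2/Bn + Bx

    sol = solve_ivp(rhs, (0.0, T), [K0], dense_output=True)
    K_of_t = sol.sol            # K(t) as a callable, known in advance
    print(K_of_t(T)[0])         # approaches the stationary root, cf. (3.3.18)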

Just as in §3.1 and §3.2, Eq. (3.3.17) becomes simpler if we consider the stationary operating conditions of the stabilization system shown in Fig. 25. Stationary operating conditions, established after a long operating time (which corresponds to large t), can exist only if there exists a real symmetric nonnegative definite matrix K_* such that

and this constant matrix K_* is an asymptotically stable solution of (3.3.14). Let us assume that this condition is satisfied. Denoting the mean "control losses" per unit time under stationary operating conditions, as usual, by γ, we can define the stationary loss function

$$f(\bar{z}) = \lim_{T \to \infty}\bigl[F(t, \bar{z}) - \gamma (T - t)\bigr],$$

for which, from (3.3.17), we derive the time-invariant Bellman equation

In (3.3.19), σ_* is the matrix σ_z (see (3.3.8) and (3.3.15)) in which k_{xx}, k_{xξ}, … are replaced by the corresponding blocks of the matrix K_* determined by (3.3.18).

In some cases, it is convenient to solve Eq. (3.3.19) by the method of successive approximations treated in §3.1 and §3.2. The following example shows how this method can be used. Let us consider the simplest version of the synthesis problem (3.3.1)–(3.3.3), in which Eqs. (3.3.1), (3.3.2) contain scalar variables instead of vectors and matrices. In (3.3.3) we write the penalty function c(x, u) in the form


where ε > 0 is a small parameter. From the "physical" viewpoint, this penalty function means that the control actions are penalized much more strongly than the deviations of the phase coordinate x(t) of the control system (3.3.1) from the equilibrium state x = 0.

For simplicity, we set Q = Γ = C = D = 1, A = −a, G = −g, and H = −h (a, g, h are given positive numbers) in (3.3.1), (3.3.2), and (3.3.4). Then the optimal filtering equations (3.3.13) and (3.3.14) take the following (nonmatrix) form:

$$\dot{\bar{x}} = -a\bar{x} + \bar{\xi} + u + \frac{k_{xx} + k_{x\xi}}{B_\eta}\bigl[\dot{y} - (h - a)\bar{x} - \bar{\xi} - u + h y\bigr],$$

$$\dot{\bar{\xi}} = -g\bar{\xi} + \frac{k_{x\xi} + k_{\xi\xi}}{B_\eta}\bigl[\dot{y} - (h - a)\bar{x} - \bar{\xi} - u + h y\bigr], \qquad (3.3.21)$$

In this case the Bellman equation (3.3.19) has the form

Here the constants σ_x* and σ_ξ* are

where the constant covariances k*_{xx}, k*_{xξ}, and k*_{ξξ} form the stationary solution of the system of differential equations (3.3.22).

Passing to the new variables x₁ = (√2/σ_x*)x̄, x₂ = (√2/σ_ξ*)ξ̄ and denoting by L the linear operator

where r = σ_x*/σ_ξ*, we can rewrite Eq. (3.3.23) as


where b is a constant determined by the problem parameters. Taking into account the formula

$$\mathbf{E}_{ps}|x| = \bigl(2\pi k^*_{xx}\bigr)^{-1/2}\int_{-\infty}^{\infty}|x|\exp\Bigl[-\frac{(x - b x_1)^2}{2 k^*_{xx}}\Bigr]dx$$

and minimizing the expression in the square brackets, we obtain from (3.3.25) the optimal control for the stationary stabilization conditions:

where the function f(x₁, x₂) satisfies the nonlinear elliptic equation

Equation (3.3.28) is similar to Eqs. (3.1.8) and (3.2.7); therefore, in this case, we can use the same method of approximate synthesis as in §3.1 and §3.2. Then the quasioptimal control u_k(x₁, x₂) in the kth approximation is determined by the formula

where the functions f_k(x₁, x₂) satisfy the linear equations of successive approximations

In this case, the calculation of the successive approximations f_k(x₁, x₂) is completely similar to that discussed in §3.1 and §3.2. Therefore, here we restrict ourselves to a brief description of the calculation of f_k(x₁, x₂), dwelling only upon the distinctions in the formulas.

The operator (3.3.24) can be written in the form (3.1.5) if A = ‖A_{ij}‖₁² and B = ‖B_{ij}‖₁² in (3.1.5) are understood as the matrices

The stationary density p₀(x) satisfying (3.1.14) has the form (3.1.15), and the matrices P and P⁻¹, as one can readily see, have the form

(µ = a + g, ν = a − g + r, and ρ = r + 2g). Using (3.3.31), we can find the matrix (see (3.1.35))

as well as the matrix

By the similarity transformation, this matrix reduces the matrix (3.3.32) to the diagonal form

It follows from (3.1.44), (3.1.51), and (3.1.55) that the solutions of the successive approximation equations (3.3.30) can be represented as the series

where H_{ℓm}(x₁, x₂) are the two-dimensional Hermite polynomials calculated by formulas (3.2.17) with y = Ṽ⁻¹x (the matrix Ṽ⁻¹ is the inverse of (3.3.33)).


The coefficients a^k_{ℓm} are calculated by the formula (see (3.2.19))

and the coefficients

$$b^k_{\ell m} = \frac{\det^{1/2}P}{2\pi\,\ell!\,m!}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} G_{\ell m}(\mu)\Big|_{\mu = \tilde{V}^T P x}\exp\Bigl[-\tfrac{1}{2}(x^T P x)\Bigr]\,w_k(x)\,dx_1\,dx_2, \qquad (3.3.37)$$

which are expressed in terms of the family of Hermite polynomials G_{ℓm}(x₁, x₂) orthogonal to H_{ℓm}(x₁, x₂) and calculated by (3.2.18).

Parallel to the calculation of the successive approximations (3.3.35) to the loss function, we calculate the specific stationary losses γ_k (corresponding to the kth approximation) from the condition b^k₀₀ = 0. Thus, in the zero approximation we have

hence, performing simple calculations and taking into account (3.3.31), we obtain

Next, using the obtained value of γ₀ and formulas (3.3.26), (3.3.30), (3.3.36), and (3.3.27), we can calculate any desired number of coefficients a⁰_{ℓm} in the series (3.3.35). With the help of these coefficients, we can construct an approximate expression for the function f₀(x₁, x₂), which allows us to derive an explicit formula for the quasioptimal control algorithm u₀(x₁, x₂) in the zero approximation and to calculate the functions f₁(x₁, x₂) and u₁(x₁, x₂) related to the first approximation.

We do not write out explicit formulas for f₀(x₁, x₂) and f₁(x₁, x₂) here, since they are very cumbersome. We only remark that in this case all the quasioptimal control algorithms (3.3.29) are nonlinear functions of the phase variables (x₁, x₂); moreover, the character of the nonlinearity is determined by the number of terms of the series (3.3.35) kept in the calculations.


Thus, it follows from the preceding that the methods for calculating stationary operating conditions of the stabilization system (Fig. 13) can readily be generalized to the case of a more general system with correlated noise (Fig. 25) if the noise is a Gaussian Markov process. In this case, the optimal system is characterized by the appearance of an optimal filter in the regulator circuit; this filter is responsible for the formation of the sufficient coordinates. In our example (Fig. 25), where x, y, u, ξ, and η are scalar, this filter is described by Eqs. (3.3.21). The circuit of functional elements of this closed-loop control system is shown in Fig. 26.

Blocks P and 1 are units of the initial block diagram (Fig. 25). The rest of the diagram in Fig. 26 determines the structure of the optimal controller. One can see that this controller contains standard linear elements of analog computers, such as integrators, amplifiers, and adders, and one nonlinear converter NC, which implements the functional dependence (3.3.29). The units of the diagram marked by ">" and numbered 1, 2, …, 8 are amplifiers with the following amplification factors Kᵢ:


§3.4. Nonstationary problems. Estimates of the quality of approximate synthesis

3.4.1. Nonstationary synthesis problems. If the equations of the plant are time-dependent or if the operating time T of the system is bounded, then the optimal control algorithm is essentially time-varying, and we cannot find this algorithm by using the methods considered in §§3.1–3.3. In this case, to synthesize an optimal system, it is necessary to solve a time-varying Bellman equation, which, in general, is a more complicated problem. However, if the plant is governed by a system of linear (time-varying) equations, then we can readily write the solutions of the successive approximation equations (3.0.6), (3.0.7) in quadratures.

Let us show how this is done. Just as in §3.1, we consider the synthesis problem for the stabilization system (Fig. 13) with a plant P described by equations of the form

where x is an n-dimensional vector of phase coordinates, u is an r-dimensional vector of controls, A(t), Q(t), and σ(t) are, respectively, given n × n, n × r, and n × n matrices continuous for all t ∈ [0, T], and ξ(t) is the n-dimensional standard white noise (1.1.34). To estimate the quality of control, we shall use the following criterion of the type (1.1.13):

and assume that the absolute values of the components of the control vector u are bounded by small values (see (3.1.3)):

According to (3.1.6) and (3.1.7), the optimal control u*(t, x) for problem (3.4.1)-(3.4.3) is given by the formula

$$u_*(t, x) = -\{\varepsilon u_{m1}, \ldots, \varepsilon u_{mr}\}\,\operatorname{sign}\Bigl(Q^T(t)\frac{\partial F}{\partial x}\Bigr), \qquad (3.4.4)$$

where the loss function F(t, x) satisfies the equation


with L_{t,x} denoting a linear parabolic operator of the form

For the function Φ(t, ∂F/∂x), we have the expression

In this case, the function F(t, x) must satisfy (3.4.5) for all x ∈ ℝ_n, 0 ≤ t < T, and be a continuous continuation of the function

as t → T (see (1.4.22)). The nonlinear equation (3.4.5) is similar to (3.0.5) and, according to (3.0.6) and (3.0.7), can be solved by the method of successive approximations. To this end, we need to solve the sequence of linear equations

(all functions F_k(t, x) determined by (3.4.9) and (3.4.10) must satisfy condition (3.4.8)). Next, if we take F_k(t, x) as an approximate solution of Eq. (3.4.5) and substitute F_k into (3.4.4) instead of F, we obtain a quasioptimal control algorithm u_k(t, x) in the kth approximation.

Let us write the solutions F_k(t, x), k = 0, 1, 2, …, in quadratures. First, let us consider Eq. (3.4.9). Obviously, its solution F₀(t, x) is equal to the value of the cost functional

on the time interval [t, T] provided that there are no control actions. In this case, the functional on the right-hand side of (3.4.11) is calculated along the trajectories x(τ), t ≤ τ ≤ T, that are solutions of the system of stochastic differential equations

describing the uncontrolled motion of the plant (u ≡ 0 in (3.4.1)).


It follows from §1.1 and §1.2 that the solution of (3.4.12) is a continuous Markov process x(τ) (a diffusion process). This process is completely determined by the transition probability density function p(x, t; z, τ), which gives the probability density of the random variable z = x(τ) given that the stochastic process was in the state x(t) = x at the preceding time moment t. Obviously, by using p(x, t; z, τ), we can write the functional (3.4.11) in the form

On the other hand, we can write the transition density p(x, t; z, τ) for the diffusion process x(τ) defined by (3.4.12) in explicit closed form if we know the fundamental matrix X(t, τ) of the unperturbed (deterministic) system ż = A(t)z.

Indeed, since Eqs. (3.4.12) are linear, the stochastic process x(τ) satisfying this equation is Markov and Gaussian. Therefore, for this process, the transition probability density has the form

$$p(x, t; z, \tau) = \bigl[(2\pi)^n\det D\bigr]^{-1/2}\exp\Bigl[-\tfrac{1}{2}(z - a)^T D^{-1}(z - a)\Bigr], \qquad (3.4.14)$$

where a = Ez = E(x(τ) | x(t) = x) is the vector of mean values and D = E[(z − Ez)(z − Ez)ᵀ] is the covariance (dispersion) matrix of the random vector z = x(τ). On the other hand, using the fundamental matrix X(t, τ),¹² we can write the solution of system (3.4.12) in the form (the Cauchy formula)

$$x(\tau) = X(t, \tau)\,x + \int_t^\tau X(s, \tau)\,\sigma(s)\,\xi(s)\,ds. \qquad (3.4.15)$$

Hence, performing the averaging and taking into account the properties of the white noise (1.1.34), we obtain the following expressions for the vector a and the matrix D:

$$a = X(t, \tau)\,x, \qquad D = \int_t^\tau X(s, \tau)\,\sigma(s)\,\sigma^T(s)\,X^T(s, \tau)\,ds. \qquad (3.4.16)$$

¹²Recall that the fundamental matrix X(t, τ), τ ≥ t, is a nondegenerate n × n matrix whose columns are linearly independent solutions of the system ż(τ) = A(τ)z(τ) such that X(t, t) = E, where E is the identity matrix. Methods for constructing fundamental matrices and their properties are briefly described on page 101 (for details, see [62, 111]).


Formulas (3.4.13)–(3.4.16) determine, in quadratures, the solution F₀(t, x) of the zero-approximation equation (3.4.9) satisfying (3.4.8). It follows from (3.4.13)–(3.4.16) that the function F₀(t, x) is infinitely differentiable with respect to the components of the vector x if the functions c(z) and ψ(z) belong to a rather wide class (it suffices that the functions c(z)exp(−½zᵀD⁻¹z) and ψ(z)exp(−½zᵀD⁻¹z) be absolutely integrable [25]). Therefore, by analogy with (3.4.13), we can write the solution F_k(t, x) of the successive approximation equations (3.4.10), satisfying (3.4.8), in the form

To obtain explicit formulas for the functions F₀(t, x), F₁(t, x), …, which allow us to write the quasioptimal control algorithms u₀(t, x), u₁(t, x), … as finite analytic formulas, we need to have an analytic expression for the matrix X(t, τ) and to calculate the integrals in (3.4.13) and (3.4.17). For autonomous plants (the case where the matrix A(t) in (3.4.1) and (3.4.12) is constant, A(t) ≡ A = const), the fundamental matrix X(t, τ) has the form of a matrix exponential:

$$X(t, \tau) = e^{A(\tau - t)}, \qquad (3.4.18)$$

whose elements can be calculated by standard methods. On the other hand, it is well known that fundamental matrices of nonautonomous systems can be constructed, as a rule, only by numerical methods.¹³ Thus, for A(t) ≠ const, it is often difficult to obtain analytical results.
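Even when no closed form is available, F₀(t, x) in (3.4.13) can be evaluated numerically. For a quadratic penalty c(z) = |z|² and ψ ≡ 0, Ec(x(τ)) = |a(τ)|² + tr D(τ), where a and D obey the moment equations that follow from (3.4.15), (3.4.16). A sketch (A, σ, and T below are illustrative):

    import numpy as np
    from scipy.integrate import solve_ivp

    # F0(t,x) of (3.4.13) for c(z) = |z|^2, psi = 0: propagate the mean m and
    # covariance D of (3.4.12) via dm = A m, dD = A D + D A^T + sigma sigma^T,
    # and accumulate the running mean cost |m|^2 + tr D.
    A = np.array([[0.0, 1.0], [-1.0, -0.5]])
    Bm = np.diag([0.0, 1.0])          # sigma sigma^T

    def F0(t, x, T=10.0):
        def rhs(tau, s):
            m, D = s[:2], s[2:6].reshape(2, 2)
            return np.concatenate([A @ m,
                                   (A @ D + D @ A.T + Bm).ravel(),
                                   [m @ m + np.trace(D)]])
        s0 = np.concatenate([x, np.zeros(4), [0.0]])
        sol = solve_ivp(rhs, (t, T), s0, rtol=1e-8, atol=1e-10)
        return sol.y[-1, -1]          # accumulated mean losses on [t, T]

    print(F0(0.0, np.array([1.0, 0.0])))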

If the plant equation (3.4.1) contains a constant matrix A(t) ≡ A = const, then formulas (3.4.13) and (3.4.17) allow us to generalize the results obtained in §§3.1–3.3 for stationary operating conditions to the time-varying case. For example, let us consider a time-varying version of the problem of optimal damping of random oscillations studied in §3.2.

¹³Examples of special matrices A(t) for which the fundamental matrix of the system ẋ = A(t)x can be calculated analytically can be found, e.g., in [139].


Just as in §3.2, we shall consider the optimal control problem (3.2.1)–(3.2.3). However, in contrast with §3.2, we now assume that the terminal time (the upper limit T of integration in the functional (3.2.3)) is a finite fixed value. Writing the plant equation (3.2.1) in the form of the system (3.2.4), we see that problem (3.2.1)–(3.2.3) is a special case of problem (3.4.1)–(3.4.3) if

Therefore, it follows from the general scheme (3.4.4)–(3.4.10) that in this case the optimal control has the form

$$u_*(t, x_1, x_2) = -\varepsilon\,\operatorname{sign}\Bigl(\frac{\partial F}{\partial x_2}\Bigr), \qquad (3.4.20)$$

where for 0 ≤ t < T the function F(t, x₁, x₂) satisfies the equation

and vanishes at the terminal point, that is,

According to (3.4.6) and (3.4.19), the operator L_{t,x} in (3.4.21) has the form

Let us calculate the loss function F₀(t, x₁, x₂) of the zero approximation. In view of (3.4.9), (3.4.21), and (3.4.22), this function satisfies the linear equation

$$L_{t,x}F_0(t, x_1, x_2) = -x_1^2 - x_2^2, \qquad 0 \le t < T, \qquad (3.4.23)$$

with the boundary condition

$$F_0(T, x_1, x_2) = 0. \qquad (3.4.24)$$

According to (3.4.13), the function F₀(t, x₁, x₂) can be written in quadratures:


where the transition probability density p(x, t; z, τ) is given by (3.4.14). It follows from (3.4.15) and (3.4.16) that to find the parameters of the transition density we need to calculate the fundamental matrix (3.4.18).

Obviously, the roots λ₁ and λ₂ of the characteristic equation det(A − λE) = 0 of the matrix A given by (3.4.19) are

$$\lambda_{1,2} = -\frac{\beta}{2} \pm i\delta, \qquad \delta = \sqrt{1 - \beta^2/4}.$$

From this and the Lagrange–Sylvester formula [62] we obtain the following expression for the fundamental matrix (3.4.18) (here ρ = τ − t):

$$X(t, \tau) = \frac{e^{-\beta\rho/2}}{\delta}\begin{pmatrix} \frac{\beta}{2}\sin\delta\rho + \delta\cos\delta\rho & \sin\delta\rho \\[2pt] -\sin\delta\rho & \delta\cos\delta\rho - \frac{\beta}{2}\sin\delta\rho \end{pmatrix}. \qquad (3.4.26)$$
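The closed form (3.4.26) can be checked against the matrix exponential (3.4.18) numerically (β and ρ below are arbitrary test values):

    import numpy as np
    from scipy.linalg import expm

    # Verify (3.4.26) against X(t,tau) = expm(A*(tau - t)) for A of (3.4.19).
    beta, rho = 0.6, 1.3
    delta = np.sqrt(1 - beta**2/4)
    A = np.array([[0.0, 1.0], [-1.0, -beta]])
    s, c = np.sin(delta*rho), np.cos(delta*rho)
    X_closed = np.exp(-beta*rho/2)/delta * np.array(
        [[beta/2*s + delta*c, s],
         [-s, delta*c - beta/2*s]])
    print(np.allclose(X_closed, expm(A*rho)))     # True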

It follows from (3.4.15), (3.4.16), and (3.4.26) that in this case the vector of means a and the covariance matrix D of the transition probability density (3.4.14) have the form

$$a = e^{-\beta\rho/2}\begin{pmatrix} x_1\cos\delta\rho + \frac{1}{\delta}\bigl(x_2 + \frac{\beta}{2}x_1\bigr)\sin\delta\rho \\[2pt] x_2\cos\delta\rho - \frac{1}{\delta}\bigl(x_1 + \frac{\beta}{2}x_2\bigr)\sin\delta\rho \end{pmatrix}, \qquad (3.4.27)$$

$$p_1(\rho) = \frac{1}{\beta}\bigl(1 - e^{-\beta\rho}\bigr), \qquad p_2(\rho) = \beta + e^{-\beta\rho}\bigl(2\delta\sin 2\delta\rho - \beta\cos 2\delta\rho\bigr),$$
$$p_3(\rho) = 2\delta - e^{-\beta\rho}\bigl(\beta\sin 2\delta\rho + 2\delta\cos 2\delta\rho\bigr). \qquad (3.4.28)$$

Substituting (3.4.14) with these parameters into (3.4.25) instead of p(x, t; z, τ), integrating, taking into account (3.4.27) and (3.4.28), and performing some easy calculations, we obtain the following final expression for the function F₀(t, x₁, x₂):

$$\cdots + \beta x_1 x_2 + x_2^2\bigr)\Bigr], \qquad (3.4.29)$$

where η = T − t. Let us briefly discuss formula (3.4.29). If we consider the terms on

the right-hand side of (3.4.29) as functions of the "reverse" time η = T − t, then these terms can be divided into three groups: infinitely increasing, damped, and independent of η as η → ∞. These three types of terms have the following physical meaning. The only infinitely growing term, (B/β)η, in (3.4.29) shows how the mean losses (3.4.11) depend on the operating time in the stationary operating mode. Therefore, the coefficient B/β has the meaning of the specific mean error γ, which was calculated in §3.2 by other methods and for which we obtained γ⁰ = B/β in the zero approximation (see (3.2.21)). Next, the terms independent of η (in the braces in (3.4.29)) coincide with the expression for the stationary loss function obtained in §3.2 (formula (3.2.26)). Finally, the damped terms in (3.4.29) characterize the deviations of the operating conditions of the control system from the stationary ones.

Using (3.4.29), we can approximately synthesize the optimal system in the zero approximation, where the control algorithm u₀(t, x₁, x₂) has the form (3.4.20) with F replaced by F₀. The equation

determines the switching line on the phase plane (x₁, x₂). Formula (3.4.30) shows that this is a straight line that coincides with the x₁-axis as η → 0 and rotates clockwise as η → ∞ (see Fig. 27) up to the limit position x₁ + 2x₂/β = 0 corresponding to the stationary switching line (see (3.2.27)).

Formulas (3.4.29) and (3.4.30) also allow us to estimate whether it is important to take into account the fact that the control algorithm is time-varying. Indeed, (3.4.29) and (3.4.30) show that deviations from the stationary operating conditions are observed only on a time interval lying at a distance of order 1/β from the terminal time T. Thus, if the overall operating time T is substantially larger than this interval (say, T ≫ 3/β), then we can use the stationary algorithm on the entire interval [0, T], since in this case the value of the optimality criterion (3.2.3) does not practically differ from the optimal value. This fact is important for the practical implementation of optimal systems, since the design of regulators with varying parameters is a rather sophisticated technical problem.

3.4.2. Estimates of the approximate synthesis performance. Up to this point in the present chapter, we have studied the problem of how to find a control system close to the optimal one by using the method of successive approximations. In this section we shall consider the problem of how close the quasioptimal system constructed in this way is to the optimal system, that is, the problem of approximate synthesis performance.

Let us estimate the approximate synthesis performance for the first two (the zero and the first) approximations calculated by (3.0.6)–(3.0.8). As an example, we use the time-varying problem (3.4.1)–(3.4.3). We assume that the entries of the matrices A(t), Q(t), and σ(t) in (3.4.1) are continuous functions of time defined on the interval 0 ≤ t ≤ T. We also assume that the penalty functions c(x) and ψ(x) in (3.4.2) are continuous and bounded for all x ∈ R_n. Then [124] there exists a unique function F(t, x) that satisfies the Cauchy problem (3.4.5), (3.4.8) for the quasilinear parabolic equation (3.4.5).¹⁴ This function is continuous in the strip Π_T = {|x| < ∞, 0 ≤ t ≤ T}

and continuously differentiable once with respect to t and twice with respect to x for 0 ≤ t < T; its first- and second-order derivatives with respect to x are bounded in Π_T.

¹⁴We shall use the following terminology: Eq. (3.4.5) is called a quasilinear (semilinear) parabolic equation, the problem of solving Eq. (3.4.5) with the boundary condition (3.4.8) is called the Cauchy problem, and the boundary condition (3.4.8) itself is sometimes called the "initial" condition for the Cauchy problem (3.4.5), (3.4.8). This terminology corresponds to the universally accepted standards [61, 124] if (as we shall do in §3.5) in Eq. (3.4.5) we perform a change of variables and use the "reverse" time ρ = T − t instead of t. In this case, the backward parabolic equation (3.4.5) becomes a "usual" parabolic equation, and the boundary value problem (3.4.5), (3.4.8) takes the form of the standard Cauchy problem.

One can readily see that in this case

$$|F(t, x) - F_0(t, x)| \sim \varepsilon, \qquad |F(t, x) - F_1(t, x)| \sim \varepsilon^2, \qquad (3.4.31)$$

and hence, for small ε, the functions F_0(t, x) and F_1(t, x) nicely approximate the exact solution of Eq. (3.4.5).

To prove relations (3.4.31), let us consider the functions S_0(t, x) = F(t, x) − F_0(t, x) and S_1(t, x) = F(t, x) − F_1(t, x). It follows from (3.4.5), (3.4.9), and (3.4.10) that these functions satisfy the equations

Equations (3.4.32) and (3.4.33) differ from (3.4.9) only by the expressions on the right-hand sides and by the initial data. Therefore, according to (3.4.13), the functions S_0 and S_1 can be written in the form

Since the function Φ is continuous (see (3.4.7)) and the components of the vector ∂F/∂x are bounded, we have |Φ(τ, ∂F/∂x)| ≤ P for all τ ∈ [0, T]; hence we have the estimate
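In view of (3.4.34) and the bound |Φ| ≤ P just established, this estimate can be written out as follows (a sketch, using only the fact that the transition density p(x, t; z, τ) integrates to one with respect to z):

$$|S_0(t, x)| \le \varepsilon \int_t^T\!\!\int_{R_n} p(x, t; z, \tau)\,\Bigl|\Phi\Bigl(\tau, \frac{\partial F}{\partial z}\Bigr)\Bigr|\, dz\, d\tau \le \varepsilon P\,(T - t).$$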



The first relation in (3.4.31) is thereby proved.

To prove the second relation in (3.4.31), we need to estimate the difference S_0^i = ∂F/∂x_i − ∂F_0/∂x_i. To this end, we differentiate (3.4.32) with respect to x_i. As a result, we obtain the following equation for the function S_0^i:

(in fact, the derivative on the right-hand side of (3.4.37) is formal, since the function Φ (3.4.7) is not differentiable). Using (3.4.13) for S_0^i, we obtain

Integrating (3.4.38) by parts with respect to z_i and taking into account (3.4.14) and (3.4.15), we arrive at

From (3.4.39) we obtain the following estimate for S_0^i:

Now we note that since Q(t) in (3.4.7) is bounded, the function Φ(t, y) satisfies the Lipschitz condition with respect to y:

Using (3.4.40), (3.4.41), and (3.4.35), we obtain

$$|S_0^i| \le \varepsilon N P V (T - t), \qquad V = \sum_i V_i,$$


which proves the second relation in (3.4.31).

In a similar way, we can also estimate the difference ∂F/∂x_i − ∂F_1/∂x_i = S_1^i. Indeed, just as (3.4.39) was obtained from (3.4.32), we use (3.4.33) to obtain

This relation and (3.4.40), (3.4.41) for the function S_1^i readily yield the estimate

which we shall use later.

According to (3.0.8), in this case the quasioptimal controls u_0(t, x) and u_1(t, x) are determined by (3.4.4), where instead of the loss function F(t, x) we use the successive approximations F_0(t, x) and F_1(t, x), respectively. By G_0(t, x) and G_1(t, x) we denote the mean values of the functional (3.4.11) calculated on the trajectories of the system (3.4.1) with the use of the quasioptimal controls u_0(t, x) and u_1(t, x). The functions G_i(t, x), i = 0, 1, estimate the performance of the quasioptimal control algorithms u_i(t, x), i = 0, 1. Therefore, it is clear that the approximate synthesis may be considered justified if there is only a small difference between the performance criteria G_0(t, x) and G_1(t, x) of the suboptimal systems and the exact solution F(t, x) of Eq. (3.4.5) with the initial condition (3.4.8).
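In practice the functions G_i can also be estimated directly by statistical simulation of the closed-loop system. The sketch below is hypothetical (the drift matrix A, input matrix Q, noise matrix sigma, penalties c, psi, and the feedback u itself are placeholders to be supplied from the specific problem (3.4.1)–(3.4.3)); it uses the simplest Euler–Maruyama scheme:

```python
import numpy as np

def performance(u, A, Q, sigma, c, psi, x0, T, n_steps=1000, n_paths=5000, seed=0):
    """Monte Carlo estimate of G(0, x0) = E[ int_0^T c(x) dt + psi(x(T)) ]
    for the closed loop dx = (A x + u(t, x)) dt + sigma dW (Euler-Maruyama)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.tile(np.asarray(x0, dtype=float), (n_paths, 1))
    cost = np.zeros(n_paths)
    for k in range(n_steps):
        t = k * dt
        cost += c(x) * dt                      # accumulate running penalty
        drift = x @ A.T + u(t, x) @ Q.T        # plant drift with feedback u
        noise = rng.standard_normal(x.shape) @ sigma.T * np.sqrt(dt)
        x = x + drift * dt + noise
    cost += psi(x)                             # terminal penalty
    return cost.mean()
```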

One can readily see that the functions G_0 and G_1 satisfy estimates of the type (3.4.31), that is,

Relations (3.4.45) can be proved by analogy with (3.4.31). Indeed, the functions G_0 and G_1 satisfy the linear partial differential equations [45, 157]

$$LG_i(t, x) = -c(x) - \varepsilon\,\tilde u_i^{\mathrm T}(t, x)\,Q^{\mathrm T}(t)\,\frac{\partial G_i}{\partial x}(t, x), \qquad (3.4.46)$$
$$G_i(T, x) = \psi(x), \qquad \tilde u_i(t, x) = u_i(t, x)/\varepsilon, \quad i = 0, 1.$$


This fact and (3.4.9), (3.4.10) imply the following equations for the functions H_0 = F_0 − G_0 and H_1 = F_1 − G_1:

Since −ũ_1^T(t, x) Q^T(t) ∂F_1/∂x = Φ(t, ∂F_1/∂x), Eq. (3.4.48) can be rewritten as follows:

It follows from (3.4.4) that Eqs. (3.4.46), (3.4.49) are linear parabolic equations with discontinuous coefficients. Such equations were studied in [80, 81, 144]. It was shown that if, just as in our case, the coefficients in (3.4.46), (3.4.49) have discontinuities of the first kind, then, under our assumptions about the properties of A(t), Q(t), c(x), and ψ(x), the solutions of Eqs. (3.4.46), (3.4.49) and their first-order partial derivatives are bounded.

Using this fact, we can readily verify that the right-hand sides of (3.4.47) and (3.4.49) are of the order of ε and ε², respectively. For Eq. (3.4.47), this statement readily follows from the boundedness of the components of the vectors ∂G_0/∂x and ũ_0 and of the elements of the matrix Q. The right-hand side of (3.4.49) can be estimated by the Lipschitz condition (3.4.41) and the inequality

which follows from (3.4.40) and (3.4.44). Therefore, for the functions H_0 and H_1 we have

$$|H_0| \sim \varepsilon, \qquad |H_1| \sim \varepsilon^2. \qquad (3.4.50)$$

To prove (3.4.45), it suffices to take into account the inequalities

$$|F - G_i| \le |F - F_i| + |F_i - G_i|, \qquad i = 0, 1,$$

and to use (3.4.31) and (3.4.50).

Thus, relations (3.4.45) show that if the Bellman equation contains a small parameter in the nonlinear terms, then the difference between the quasioptimal control system calculated by (3.0.6)–(3.0.8) and the optimal control system is small and, for sufficiently small ε, we can restrict our calculations to a small number of approximations. We need either one (the zero) or two (the zero and the first) approximations. This depends on the admissible deviation of the quasioptimal system performance criteria G_i(t, x) from the loss function F(t, x).

In conclusion, we make two remarks about (3.4.45).

REMARK 3.4.1. One can readily see that all arguments that lead to the estimates (3.4.45) remain valid for any type of nonlinear function in (3.4.5) that satisfies the Lipschitz condition (3.4.41). Therefore, in particular, all statements proved above for the function Φ (3.4.7) automatically hold for equations of the form (3.0.4) with an r-dimensional ball taken as the set U of admissible controls instead of an r-dimensional parallelepiped.

REMARK 3.4.2. The estimates of the approximate synthesis accuracy considered in this section are based on the assumption that the solutions of the Bellman equation and their first-order partial derivatives are bounded. At first glance, it would seem that this assumption substantially narrows the class of problems for which the approximate synthesis procedure (3.0.6)–(3.0.8) can be justified. Indeed, the solutions of Eqs. (3.4.5), (3.4.9), (3.4.10), and (3.4.46) are unbounded for x ∈ R_n if the functions c(x) and ψ(x) grow infinitely as |x| → ∞. Therefore, for example, we would have to eliminate the frequently used quadratic penalty functions from consideration. However, if we are interested in the solution of the synthesis problem in a given bounded region X_0 of initial states x(0) of the control system, then the procedure (3.0.6)–(3.0.8) can also be used in the case of unbounded penalty functions. This statement is based on the following heuristic arguments. Since the plant equation (3.4.1) is linear and the matrices A(t), Q(t), and σ(t) and the control vector u are bounded, we can always choose a sufficiently large number R such that the probability P{sup_{0≤t≤T} |x(t)| > R} becomes arbitrarily small [11, 45, 157] for any fixed domain X_0 of the initial states x(0). Therefore, without loss of accuracy, we can replace the unbounded functions c(x) and ψ(x) in (3.4.2) (if, in a certain sense, these functions grow as |x| = R → ∞ slower than the probability P{sup_{0≤t≤T} |x(t)| ≥ R} decreases as R → ∞) by the expressions

$$\bar c(x) = \begin{cases} c(x) & \text{for } |x| < R, \\ c(x)\big|_{|x| = R} & \text{for } |x| \ge R, \end{cases} \qquad \bar\psi(x) = \begin{cases} \psi(x) & \text{for } |x| < R, \\ \psi(x)\big|_{|x| = R} & \text{for } |x| \ge R, \end{cases}$$

for which the solutions of Eqs. (3.4.5), (3.4.9), (3.4.10), and (3.4.46) satisfy the boundedness assumptions.
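In computational terms, this replacement is a one-line wrapper (a hypothetical sketch: values outside the ball |x| < R are frozen at the point where the ray through x meets the sphere |x| = R):

```python
import numpy as np

def truncate(c, R):
    """Return a bounded version of the penalty c: unchanged for |x| < R,
    frozen at its value on the sphere |x| = R along the ray through x otherwise."""
    def c_R(x):
        x = np.asarray(x, dtype=float)
        r = np.linalg.norm(x)
        return c(x) if r < R else c(x * (R / r))
    return c_R

# e.g., a quadratic penalty becomes bounded by R**2:
c_R = truncate(lambda x: float(np.dot(x, x)), R=10.0)
```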


The question of whether procedure (3.0.6)–(3.0.8) can be used for solving synthesis problems with unbounded functions c(x) and ψ(x) in the functional (3.4.2) will be rigorously examined in the next section.

§3.5. Analysis of the asymptotic convergence of successive approximations (3.0.6)–(3.0.8) as k → ∞

The method of successive approximations (3.0.6)–(3.0.8) can also be used for the synthesis of quasioptimal control systems if the Bellman equation does not contain a small parameter in the nonlinear terms. Needless to say, in this case (in contrast with Section 3.4.2 in §3.4) the first two approximations, as a rule, do not approximate the exact solution of the synthesis problem sufficiently well. We can only hope that the suboptimal system synthesized on the basis of (3.0.9) is close to the optimal system for large k. Therefore, we need to investigate the asymptotic behavior as k → ∞ of the functions F_k(t, x) and u_k(t, x) in (3.0.6)–(3.0.8). The present section deals with this problem.

Let us consider the time-varying synthesis problem of the form (3.4.1)- (3.4.3) in a more general setting. We assume that the plant is described by the vector-matrix stochastic differential equation of the form

Here x is an n-dimensional vector of phase coordinates of the system, u is an r-dimensional vector of controls, ξ(t) is an n-dimensional vector of random actions of the standard white noise type (1.1.34), ā(t, x) is a given vector-function of the phase coordinates x and time t, and q(t) and σ(t, x) are n × r and n × n matrices whose elements depend on t and (t, x), respectively. The conditions imposed on the functions ā(t, x), q(t), and σ are stated later in detail. Here we only note that these functions are always assumed to be such that for t ≥ t_0, 0 ≤ t_0 < T, the stochastic equation (3.5.1) has a unique solution x(t) satisfying the condition x(t_0) = x_0, at least in the weak sense (see §IV.4 in [132]).

As an optimality criterion, we take the functional (3.4.2),

Here c(x) and ψ(x) are given nonnegative scalar penalty functions whose specific form is determined by the character of the problem under consideration (the requirements on c(x) and ψ(x) are given later).

The constraints on the domain of admissible controls have the form (1.1.22),

$$u \in U. \qquad (3.5.3)$$


where U ⊂ R_r is a closed bounded convex set in the Euclidean space R_r. It is required to find a function u_* = u_*(t, x(t)) satisfying (3.5.3) such that the functional (3.5.2) calculated on the trajectories of system (3.5.1) with the control u_* attains its minimum value.

In accordance with the dynamic programming approach, solving this problem is equivalent to solving the Bellman equation that, for problem (3.5.1)–(3.5.3), reads (see §1.4)

Here ā(t, x) is a column of functions with components (see (1.2.48))

$$\bar a_\ell(t, x) = a_\ell(t, x) + \frac{1}{2}\,\frac{\partial\sigma_{\ell m}(t, x)}{\partial x_i}\,\sigma_{mi}(t, x), \qquad \ell = 1, \dots, n. \qquad (3.5.5)$$

Recall that we assumed in §1.2 that throughout this book all stochastic differential equations written (just as (3.5.1)) in the Langevin form [127] are symmetrized [174].

By definition, the loss function F in (3.5.4) is equal to

$$F = F(t, x) = \min_{u(\tau) \in U} \operatorname{E}\Bigl[\int_t^T c\bigl(x(\tau)\bigr)\,d\tau + \psi\bigl(x(T)\bigr) \;\Big|\; x(t) = x\Bigr]. \qquad (3.5.6)$$

Here E[(·) | x(t) = x] means averaging over all possible realizations of the controlled stochastic process x(τ) = x_{u(τ)}(τ) (τ > t) issued from the point x at τ = t. It follows from (3.5.6) that

Passing to the "reverse" time ρ = T − t, we transform Eq. (3.5.4) and the condition (3.5.7) to the form

$$LF(\rho, x) = -c(x) - \min_{u \in U}\Bigl[u^{\mathrm T} q^{\mathrm T}(\rho)\,\frac{\partial F(\rho, x)}{\partial x}\Bigr], \qquad (3.5.8)$$
$$F(0, x) = \psi(x). \qquad (3.5.9)$$

In (3.5.8) we have the following notation:

Page 207: Optimal Design of Control Systems Stochastic and Deterministic Problems

190 Chapter I11

a_i(ρ, x) = ā_i(x, T − ρ), q(ρ) = q(T − ρ), b_{ij}(ρ, x) is the general element of the matrix ½σ(T − ρ, x)σᵀ(T − ρ, x), and, as usual, the sum in (3.5.10) (just as in (3.5.5)) is taken over repeated indices from 1 to n.

Assuming that the gradient ∂F/∂x of the loss function is a known vector and calculating the minimum in (3.5.8), we obtain

In addition, we obtain the function

that satisfies the condition

and solves the synthesis problem (after we have solved Eq. (3.5.11) with the initial condition (3.5.9)). The form of the functions φ and Φ depends on the form of the domain U in (3.5.3) (see (1.3.19)–(1.3.23)).

Equation (3.5.11) is an equation of the form (3.0.5). It differs from Eq. (3.0.5) only in that there is no small coefficient ε multiplying the function Φ. Nevertheless, in this case we shall also use the approximate synthesis procedure (3.0.6)–(3.0.8) in which, instead of the exact solution F(ρ, x) of Eq. (3.5.11), we take the sequence of functions F_0(ρ, x), F_1(ρ, x), ... recurrently calculated by solving the following sequence of linear equations:

The successive approximations u_0(ρ, x), u_1(ρ, x), ... of the control are determined by the expressions

Below we shall find the conditions under which the recurrent procedure (3.5.13)-(3.5.15) converges to the exact solution of the synthesis problem.


Let us consider Eq. (3.5.11) with the operator L determined by (3.5.10). The solution F(ρ, x) and the coefficients b_{ij}(ρ, x) and a_i(ρ, x) of the operator L are defined on Π_T = {[0, T] × R_n} ≡ {(ρ, x): 0 ≤ ρ ≤ T, x ∈ R_n}. We assume that everywhere in Π_T the matrix ‖b_{ij}(ρ, x)‖₁ⁿ satisfies the condition that the operator L is uniformly parabolic, that is, everywhere in Π_T for any real vector χ we have

where λ and μ are positive constants. Moreover, we assume that the functions b_{ij}(ρ, x) and a_i(ρ, x) are bounded in Π_T, continuous in both variables (ρ, x), and satisfy the Hölder condition with respect to x uniformly in ρ, that is,

We assume that the functions c(x), ψ(x), and Φ(ρ, ∂F/∂x) are continuous in Π_T and that c(x) and ψ(x) satisfy the following restrictions on their growth as |x| → ∞:

$$c(x) \le K_1 e^{h|x|}, \qquad \psi(x) \le K_1 e^{h|x|} \qquad (3.5.18)$$

(h is a positive constant). We also assume that the function Φ(ρ, v) satisfies the Lipschitz condition with respect to v = (v_1, ..., v_n) uniformly in ρ ∈ [0, T], that is,

$$|\Phi(\rho, v') - \Phi(\rho, v'')| \le K_2 \sum_{i=1}^{n} |v_i' - v_i''|. \qquad (3.5.19)$$

In particular, the functions Φ from (3.4.7) and (1.3.23) satisfy (3.5.19).

The following three consequences of the above assumptions are well known [74].

(1) There exists a unique fundamental solution G(x, ρ; y, σ) of the linear equations (3.5.13), (3.5.14). This solution is defined for all (x, ρ) ∈ Π_T and (y, σ) ∈ Π_T (ρ > σ), satisfies the homogeneous equation LG = 0 in the variables (x, ρ), and

for any continuous function f (x) such that


(here λ is taken from (3.5.16)).

(2) Solutions of the inhomogeneous equations (3.5.13) and (3.5.14) can be expressed in terms of G(x, ρ; y, σ) as follows:

In this case, formula (3.5.22) holds unconditionally in view of (3.5.18); formula (3.5.23) holds only if the derivatives ∂F_k/∂x_i satisfy some inequalities of the form (3.5.18) (or at least of the form (3.5.21)). In the sequel, we show that this condition is always satisfied. The solutions F_k(ρ, x), k = 0, 1, ..., are twice continuously differentiable in x, and the derivatives ∂F_k/∂x_i and ∂²F_k/∂x_i∂x_j can be calculated by differentiating the integrands on the right-hand sides of (3.5.22) and (3.5.23).

(3) The following inequalities hold (for any λ′ < λ, where λ is taken from (3.5.16)):

Statements (1)–(3) hold for the linear equations (3.5.13), (3.5.14) of successive approximations. Now we return to the synthesis problem and consider the two stages of its solution. First, by using the majorant estimates (3.5.24) and (3.5.25), we prove that the successive approximations F_k(ρ, x) converge as k → ∞ to the solution F(ρ, x) of Eq. (3.5.11) (in this case, we simultaneously prove that there exists a unique solution of Eq. (3.5.11) with the initial condition (3.5.9)). Next, we show that the suboptimal systems constructed by the control law (3.5.15) are asymptotically as k → ∞ equivalent to the optimal system.

1. First, we prove that the sequence of functions F_0(ρ, x), F_1(ρ, x), ... determined by the recurrent formulas (3.5.22), (3.5.23) and the sequence of their partial derivatives ∂F_k(ρ, x)/∂x_i, k = 0, 1, 2, ..., are uniformly convergent. To this end, we construct the differences

$$Q_k(\rho, x) = F_k(\rho, x) - F_{k-1}(\rho, x), \qquad (3.5.26)$$
$$\frac{\partial Q_k(\rho, x)}{\partial x_i} = \frac{\partial F_k(\rho, x)}{\partial x_i} - \frac{\partial F_{k-1}(\rho, x)}{\partial x_i} \qquad (3.5.27)$$

(in (3.5.26), (3.5.27) we set k = 0, 1, 2, ..., provided that F_{−1} ≡ 0). Using (3.5.19), (3.5.26), and (3.5.27), we obtain the inequalities

$$\Bigl|\frac{\partial Q_k(\rho, x)}{\partial x_i}\Bigr| \le K_2 \int_0^\rho\!\!\int_{R_n} \Bigl|\frac{\partial G(x, \rho; y, \sigma)}{\partial x_i}\Bigr|\, \sum_j \Bigl|\frac{\partial Q_{k-1}(\sigma, y)}{\partial y_j}\Bigr|\, dy\, d\sigma. \qquad (3.5.29)$$

Formulas (3.5.28), (3.5.29) and (3.5.24), (3.5.25) allow us to calculate estimates for the differences (3.5.26), (3.5.27) recurrently. To this end, it is necessary only to estimate |∂Q_0/∂x_i|. It turns out that an estimate of the type (3.5.18) holds, that is,

Indeed, since

$$t^{-n/2}\int_{R_n} \exp\Bigl(-\frac{\lambda |y|^2}{t} + h|y|\Bigr)\, dy < \infty$$

for λ > 0, we have


for the derivative ∂F_0/∂x_i, provided that (3.5.18), (3.5.22), and (3.5.25) are taken into account.

By using the inequality

with regard to (3.5.19), (3.5.27), and (3.5.32), we obtain

$$\times \exp\Bigl(-\frac{\lambda|x - y|^2}{\rho - \sigma} + h|y|\Bigr)\, dy$$

and since ρ is bounded, we arrive at (3.5.30).

Using (3.5.30) and applying formulas (3.5.28) and (3.5.29) repeatedly, we estimate the differences (3.5.26) and (3.5.27) for an arbitrary number k ≥ 1 (here Γ(·) is the gamma function) as follows:
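Estimates of this kind have the factorial structure familiar from Picard iterations; schematically (a sketch with a generic constant C rather than the exact constants of (3.5.33), (3.5.34)):

$$|Q_k(\rho, x)| \le K\,\frac{\bigl(C\sqrt{\rho}\,\bigr)^k}{\Gamma(k/2 + 1)}\, e^{h|x|}, \qquad \Bigl|\frac{\partial Q_k}{\partial x_i}(\rho, x)\Bigr| \le K\,\frac{C^k \rho^{(k-1)/2}}{\Gamma\bigl((k + 1)/2\bigr)}\, e^{h|x|},$$

so that the corresponding series in k are dominated by convergent majorant series for every finite ρ.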

(formulas (3.5.33) and (3.5.34) are proved by induction over k). The estimates obtained show that the sequences of functions

$$F_k(\rho, x) = \sum_{j=0}^{k} Q_j(\rho, x), \qquad (3.5.35)$$
$$\frac{\partial F_k(\rho, x)}{\partial x_i} = \sum_{j=0}^{k} \frac{\partial Q_j(\rho, x)}{\partial x_i} \qquad (3.5.36)$$

converge to some limit functions

$$F(\rho, x) = \lim_{k \to \infty} F_k(\rho, x), \qquad W_i(\rho, x) = \lim_{k \to \infty} \frac{\partial F_k}{\partial x_i}(\rho, x).$$

In this case, the partial sums on the right-hand side of (3.5.35) converge uniformly in any bounded domain lying in Π_T, while in (3.5.36) the partial sums converge uniformly if they begin from the second term. The estimate (3.5.32) shows that the first summand is majorized by a function with a singularity at ρ = 0. However, one can readily see that this is an integrable singularity. Therefore, we can pass to the limit (as k → ∞) in (3.5.23) and in the formula obtained by differentiating (3.5.23) with respect to x_i. As a result, we obtain

This implies that W_i(ρ, x) = ∂F(ρ, x)/∂x_i, and hence the limit function F(ρ, x) satisfies the equation

Equation (3.5.37) is equivalent to the initial equation (3.5.11) with the initial condition (3.5.9), which can be readily verified by differentiating with regard to (3.5.20).

Thus, we have proved that there exists a solution of Eq. (3.5.11) with the initial condition (3.5.9). The proof of this statement shows that the solution F(ρ, x) and its derivatives ∂F/∂x_i have the following majorants everywhere in Π_T:

By using (3.5.38), we can prove that the solution of Eq. (3.5.11) with the initial condition (3.5.9) is unique. Indeed, assume that there exist two solutions F_1 and F_2 of Eq. (3.5.11) (or of (3.5.37)). For the difference V = F_1 − F_2 we obtain the expression

which together with (3.5.19) allows us to write


The same reasoning as for the functions F_k leads to the following estimate for the difference V = F_1 − F_2, which holds for any k:

This implies that V(ρ, x) ≡ 0, that is, F_1(ρ, x) = F_2(ρ, x).

We have proved that the successive approximations F_0(ρ, x), F_1(ρ, x), ... obtained by the recurrent formulas (3.5.13) and (3.5.14) converge asymptotically as k → ∞ to the solution of the Bellman equation, which exists and is unique.

2. Now let us return to the synthesis problem. Previously, it was proposed to use the functions u_k(ρ, x) given by (3.5.15) for the synthesis of the control system. By H_k(ρ, x) we denote the functional

calculated on the trajectories of system (3.5.1) that pass through the point x at time t = T − ρ under the action of the control u_k. The function H_k(ρ, x) determines the "quality" of the control u_k(ρ, x) and satisfies the linear equation

$$LH_k(\rho, x) = -c(x) - u_k^{\mathrm T}(\rho, x)\, q^{\mathrm T}(\rho)\,\frac{\partial H_k}{\partial x}(\rho, x), \qquad H_k(0, x) = \psi(x). \qquad (3.5.39)$$

From (3.5.14), (3.5.39), and the relation −u_k^T q^T ∂F_k/∂x = Φ(ρ, ∂F_k/∂x), it follows that the difference Δ_k(ρ, x) = F_k(ρ, x) − H_k(ρ, x) satisfies the equation

Since the right-hand side of (3.5.40) is small for large k (see (3.5.19), (3.5.34)), that is,

$$\Bigl|\Phi\Bigl(\rho, \frac{\partial F_k}{\partial x}\Bigr) - \Phi\Bigl(\rho, \frac{\partial F_{k-1}}{\partial x}\Bigr)\Bigr| \le \varepsilon_k' K_6\, e^{h|x|}, \qquad \varepsilon_k' \to 0 \ \text{as}\ k \to \infty, \qquad (3.5.41)$$

and the initial condition in (3.5.40) is zero, we can expect that the difference Δ_k(ρ, x) considered as the solution of Eq. (3.5.40) is of the same order, that is,

$$|\Delta_k(\rho, x)| \le \varepsilon_k' K_6\, e^{h|x|}. \qquad (3.5.42)$$


If the functions u_k(ρ, x) are bounded and sufficiently smooth, so that the coefficients of the operator L_k are Hölder continuous, then the operator L_k can be treated just as L, and the inequality (3.5.42) can readily be obtained from (3.5.22), (3.5.24), and (3.5.41). On the other hand, if the u_k(ρ, x) are discontinuous functions (but without singularities, for example, such as in (3.0.1) and (3.0.8)), then the inequality (3.5.42) follows from the results of [81].

Since the series (3.5.35) is convergent, we have |F(ρ, x) − F_k(ρ, x)| ≤ ε_k″ K_7 e^{h|x|} (where ε_k″ → 0 as k → ∞). Finally, this fact, the inequality |F − H_k| ≤ |F − F_k| + |F_k − H_k|, and (3.5.42) imply

$$|F(\rho, x) - H_k(\rho, x)| \le 2\varepsilon_k K_8\, e^{h|x|} \qquad (3.5.43)$$

(ε_k = max(ε_k′, ε_k″) and K_8 = max(K_6, K_7)). Formula (3.5.43) proves the asymptotic (as k → ∞) optimality of the suboptimal systems constructed according to the control algorithms u_k(ρ, x) calculated by the recurrent formulas (3.5.13)–(3.5.15).

REMARK 3.5.1. If the coefficients of the operator L are unbounded in Π_T, then the estimates (3.5.24) and (3.5.25), generally speaking, do not hold. However, there may be a change of variables that reduces the problem to the case considered above. If, for example, the coefficients ā(t, x) in (3.5.1) depend on x linearly (that is, ā(t, x) = A(t)x, where A(t) is an n × n matrix depending only on t), then the change of variables x = X(0, t)y (where X(0, t) is the fundamental matrix of the system ẋ = A(t)x) eliminates the unbounded coefficients in the operator L (in the new variables y), which allows us to investigate such systems by the methods considered above.
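The effect of this change of variables is easy to display (a sketch under the stated linear-drift assumption). If ẋ = A(t)x + q(t)u + σξ(t) and Ẋ(0, t) = A(t)X(0, t), then substituting x = X(0, t)y gives

$$\dot X(0, t)\,y + X(0, t)\,\dot y = A(t)\,X(0, t)\,y + q(t)\,u + \sigma\,\xi(t) \quad\Longrightarrow\quad \dot y = X^{-1}(0, t)\bigl[q(t)\,u + \sigma\,\xi(t)\bigr],$$

so the unbounded term A(t)x disappears, and the coefficients of the operator L in the new variables y are bounded on the finite interval [0, T].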

In conclusion, let us consider an example from [96], which illustrates the efficiency of the method of successive approximations for a one-dimensional synthesis problem that can be solved exactly.

Let the control system be described by the scalar equation

$$\dot x = u + \xi(t), \qquad |u| \le u_m, \qquad \operatorname{E}\xi(t)\xi(t + \tau) = b\,\delta(\tau).$$

Here δ(τ) is the delta function; b and u_m are given positive numbers. We shall assume that the penalty function c(x) in the optimality criterion (3.5.2) is even (that is, c(x) = c(−x)) and the final state x(T) is not penalized. Then the Bellman equation (3.5.8) and the initial condition (3.5.9) take the form

$$\frac{\partial F}{\partial \rho} = c(x) + \min_{|u| \le u_m}\Bigl[u\,\frac{\partial F}{\partial x}\Bigr] + \frac{b}{2}\,\frac{\partial^2 F}{\partial x^2}, \qquad F(0, x) = 0. \qquad (3.5.44)$$


Minimizing the expression in the square brackets, we obtain the optimal control

$$u_*(\rho, x) = -u_m \operatorname{sign}\frac{\partial F}{\partial x}(\rho, x),$$

and transform the Bellman equation to the form

Since the penalty function c(x) is even, it follows from (3.5.45) that for any ρ the loss function F(ρ, x) satisfying (3.5.45) is an even function of x; hence we have the explicit formula

$$u_*(\rho, x) = u_*(x) = -u_m \operatorname{sign} x.$$

In this case, for x > 0, the loss function F(ρ, x) is determined by the formula [26]

$$F(\rho, x) = \int c(y)\Bigl\{\,\cdots\,\exp\Bigl[-\frac{(x + y + u_m\bar\rho)^2}{2b\bar\rho}\Bigr]\, d\bar\rho\Bigr\}\, dy.$$

The successive approximations F_0(ρ, x), F_1(ρ, x), ... are even functions of the variable x (since c(x) is even). Therefore, in this case, any approximate control (3.5.15) coincides with the optimal control u_*, and the efficiency of the method can be estimated by the deviation of the successive approximations F_0, F_1, ... from the exact solution F(ρ, x) written above. Choosing the quadratic penalty function c(x) = x² and taking into account the fact that in this case the fundamental solution G(x, ρ; y, σ) (the transition probability density) has the form

$$G(x, \rho; y, \sigma) = \frac{1}{\sqrt{2\pi b(\rho - \sigma)}}\,\exp\Bigl[-\frac{(x - y)^2}{2b(\rho - \sigma)}\Bigr],$$

we obtain from (3.5.22) and (3.5.23) the following expressions for the first two approximations:
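For the zero approximation the quadrature can be carried out in closed form: F_0(ρ, x) = ∫₀^ρ (x² + b(ρ − σ)) dσ = x²ρ + bρ²/2 (the expression for F_1 is lengthier). A minimal simulation sketch confirming this value (the sample sizes are arbitrary choices):

```python
import numpy as np

# Zero approximation by simulation: with u = 0 the plant is dx = sqrt(b) dW,
# so F0(rho, x0) = E int_0^rho x(t)^2 dt = x0**2 * rho + b * rho**2 / 2.
b, rho, x0 = 1.0, 1.0, 0.5
rng = np.random.default_rng(1)
n_paths, n_steps = 200_000, 200
dt = rho / n_steps

x = np.full(n_paths, x0)
F0 = np.zeros(n_paths)
for _ in range(n_steps):
    F0 += x**2 * dt                                  # accumulate running cost
    x += np.sqrt(b * dt) * rng.standard_normal(n_paths)
print(F0.mean(), x0**2 * rho + b * rho**2 / 2)       # both are close to 0.75
```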


The functions F_0, F_1, F calculated for u_m = b = ρ = 1 are shown in Fig. 28. One can see that

$$\frac{\max_x |F(1, x) - F_0(1, x)|}{\max_x F(1, x)} \approx \cdots, \qquad \frac{\max_x |F(1, x) - F_1(1, x)|}{\max_x F(1, x)} \approx \cdots,$$

that is, the second approximation gives a satisfactory approximation to the exact solution.

This example shows that the actual rate of convergence of the successive approximations to the exact solution of the Bellman equation can be higher than the theoretical rate estimated by (3.5.35) and (3.5.33), since the proof of the convergence of the method of successive approximations (3.5.13)–(3.5.15) is based on the rather rough estimates (3.5.24) and (3.5.25) for the fundamental solution.

§3.6. Approximate synthesis of some stochastic systems with distributed parameters

This section occupies a special place in the book, since only here we consider optimal control systems with distributed parameters in which the plant dynamics is described by partial differential equations.

To date, the theory of optimal control of systems with distributed parameters has seen significant progress, first of all in its deterministic branch [30, 130]. Important results have also been obtained in stochastic problems (the distributed Kalman filter, the separation theorem in the optimal control synthesis for linear systems with a quadratic criterion, etc. [118, 182]).


However, many problems in the stochastic theory of systems with lumped parameters still remain to be generalized to the case of distributed plants.

We do not try to consider these problems in detail but only discuss the possible use of the approximate synthesis procedure (3.0.6)–(3.0.8) for solving some stochastic control problems for distributed systems. Our consideration is confined to problems in which the plants are described by linear partial differential equations of parabolic type.

3.6.1. Statement of the problem. Let us consider control systems subject to the equation

$$\frac{\partial v(t, x)}{\partial t} = \mathcal{L}_x v(t, x) + u(t, x) + \xi(t, x), \quad 0 < t \le T; \qquad v(0, x) = v_0(x). \qquad (3.6.1)$$

Here L_x denotes a smooth elliptic operator with respect to the spatial variables x = (x_1, ..., x_n),

$$\mathcal{L}_x = a_{ij}(t, x)\,\frac{\partial^2}{\partial x_i \partial x_j} + b_i(t, x)\,\frac{\partial}{\partial x_i} + c(t, x), \qquad (3.6.2)$$

whose coefficients a_{ij}(t, x), b_i(t, x), and c(t, x) are defined in the cylinder Ω = D × [0, T], where D is the closure of an arbitrary domain in the n-dimensional Euclidean space R_n, and the matrix a(t, x) satisfies the inequality

$$\eta^{\mathrm T} a \eta = a_{ij}(t, x)\,\eta_i \eta_j > 0 \qquad (3.6.3)$$

for all (t, x) ∈ Ω and all η = (η_1, ..., η_n) (as usual, in (3.6.2) and (3.6.3) the sum is taken over twice repeated indices from 1 to n).

If D does not coincide with the entire space R_n, then, in addition to (3.6.1), the following boundary conditions must be satisfied on the boundary ∂D of the domain D:

$$\mathcal{M}_x v(t, x) = u_\Gamma(t, x), \qquad (3.6.4)$$

where the linear operator M_x depends on the character of the boundary value problem. Thus, for the first, the second, and the third boundary value problems, condition (3.6.4) has the form

Here x ∈ ∂D, ∂v/∂σ denotes the outward conormal derivative, and σ is the outward conormal vector whose components σ_i (i = 1, ..., n) and the components of the outward normal ν on the boundary ∂D are related by the formulas σ_i = a_{ij}ν_j [61, 124]; in particular, if ‖a_{ij}‖ is the identity matrix, i.e., a_{ij} = δ_{ij}, then the conormal coincides with the normal.

For example, equations of the form (3.6.1) with the boundary conditions (3.6.4) describe heat propagation or the variation of a substance concentration in diffusion processes in some volume D [166, 179]. In this case, v(t, x) is the temperature (or, respectively, the concentration) at the point x ∈ D at time t. Then the boundary condition (3.6.4.I) determines the temperature (concentration), and the condition (3.6.4.II) determines the heat (substance) flux through the boundary ∂D of the volume D.

System (3.6.1) is controlled both by control actions u(t, x) distributed throughout the volume and by variations of the boundary operating conditions u_Γ(t, x). The admissible controls are piecewise continuous functions u(t, x) and u_Γ(t, x) with values in bounded closed domains:

We assume that the spatially distributed random action ξ(t, x) has the nature of a spatially correlated normal white noise,

where K(t, x, y) is a positive definite kernel symmetric in x and y, and δ(t) is the delta function.

We also assume that, under the above assumptions, the function v(t, x) characterizing the plant state at time t is uniquely determined as the generalized solution of Eq. (3.6.1) that satisfies (3.6.1) for (x, t) ∈ D × (0, T] and is a continuous continuation of a given initial function v(0, x) = v_0(x) as t → 0 and of the boundary conditions (3.6.4) as x → ∂D.

The problem is to find functions u_*(t, x) and u_Γ^*(t, x) satisfying (3.6.5) that minimize the optimality criterion

where x^i = (x_1^i, x_2^i, ..., x_n^i), dx^i = dx_1^i dx_2^i ⋯ dx_n^i (i = 1, 2, ..., s), and ω is an arbitrary nonnegative integrable function. In this case, the desired functions u_* and u_Γ^* must depend on the current state v(t, x) of the controlled system (the synthesis functions), that is, they must have the operator form

(it is assumed that the state function v(t, x) can be measured precisely).


3.6.2. The Bellman equation and the equations of successive approximations. To find the operators (3.6.8), we shall use the dynamic programming approach. Taking into account the properties of the parabolic equation (3.6.1) and the nature of the random actions (3.6.6), we can prove [95] that the time evolution of v(t, x) is Markov in the following sense: for given functions u(t, x) and u_Γ(t, x), the probability distribution of the future values of v(τ, x) for τ > t is completely determined by the value of the function v(t, x) at time t. This allows us to consider the minimum losses on the time interval [t, T],

$$F[t, v(t, x)] = \min_{\substack{u(\tau, x) \in U(x) \\ u_\Gamma(\tau, x) \in U_\Gamma(x) \\ t \le \tau \le T}} \operatorname{E}\bigl\{\,\cdots\,\bigr\}, \qquad (3.6.9)$$

where

as a functional depending only on the initial (at time t) state v(t, x) and on time t. Therefore, the fundamental difference equation of the dynamic programming approach (see (1.4.6)) can be written as

$$F[t, v(t, x)] = \min_{\substack{u_\tau \in U,\ u_{\Gamma\tau} \in U_\Gamma \\ t \le \tau \le t + \Delta t}} \operatorname{E}\Bigl\{\int_t^{t+\Delta t} (\cdots)\, d\tau + F[t + \Delta t,\, v(t + \Delta t, x)]\Bigr\}. \qquad (3.6.10)$$

For small Δt, in view of (3.6.1), we have

$$v(t + \Delta t, x) = v(t, x) + \Delta v(t, x)$$

Taking (3.6.11) into account, we can expand the functional F[t + Δt, v(t + Δt, x)] in the functional Taylor series [91]

$$F[t + \Delta t, v(t + \Delta t, x)] = F[t + \Delta t, v(t, x)] + \int_D \frac{\delta F[t, v(t, x)]}{\delta v(t, x)}\,\Delta v(t, x)\, dx$$
$$+ \frac{1}{2}\int_D\!\!\int_D \frac{\delta^2 F[t, v(t, x)]}{\delta v(t, x)\,\delta v(t, y)}\,\Delta v(t, x)\,\Delta v(t, y)\, dx\, dy + \cdots. \qquad (3.6.12)$$


The functional derivatives δF/δv and δ²F/δv(x)δv(y) in (3.6.12) (for their detailed description, see [91]) can be obtained by calculating the standard derivatives in the formulas

$$\frac{\delta F}{\delta v(t, x)} = \lim_{\substack{\Delta \to 0 \\ \Delta_i \to x}} \frac{1}{\Delta^n}\,\frac{\partial F_\Delta(v_1, v_2, \dots)}{\partial v_i}, \qquad \frac{\delta^2 F}{\delta v(t, x)\,\delta v(t, y)} = \lim_{\substack{\Delta \to 0 \\ \Delta_i \to x,\ \Delta_j \to y}} \frac{1}{\Delta^{2n}}\,\frac{\partial^2 F_\Delta(v_1, v_2, \dots)}{\partial v_i\,\partial v_j}. \qquad (3.6.13)$$

In (3.6.13) the functional F_Δ(v_1, v_2, ...) denotes a discrete analog of the functional F(t, v(t, x)), which can be obtained by dividing the volume D into n-dimensional cubes Δ_i of equal volume Δⁿ and replacing the continuous function v(t, x) by the set of discrete values v_1, v_2, ..., each of which is equal to the value of v(t, x) at the center of the cube Δ_i. In this case, the functional F is assumed to be sufficiently smooth, that is, its weak and strong Gâteaux and Fréchet derivatives [91] exist up to the second order inclusive, are equal to each other, and coincide with (3.6.13).
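A concrete rendering of this discretization may be helpful; in the sketch below the quadratic functional and the kernel are illustrative choices, not the problem's data:

```python
import numpy as np

# One-dimensional illustration of (3.6.13) for the quadratic functional
#   F[v] = 1/2 * double integral of theta(x, y) v(x) v(y) dx dy,
# discretized on m cells of size Delta.
L, m = 1.0, 200
Delta = L / m
xs = np.linspace(0.0, L, m, endpoint=False) + Delta / 2
theta = np.exp(-np.abs(xs[:, None] - xs[None, :]))   # illustrative kernel
v = np.sin(np.pi * xs)

F_disc = 0.5 * Delta**2 * v @ theta @ v   # discrete analog F_Delta(v_1, ..., v_m)
dF = Delta**2 * (theta @ v)               # partial F_Delta / partial v_i
func_deriv = dF / Delta                   # (1/Delta) * dF  ->  deltaF/deltav(x_i)
# func_deriv[i] approximates the exact value: integral of theta(x_i, y) v(y) dy.
```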

Substituting the expansion (3.6.12) into (3.6.10), passing to the limit as Δt → 0, and taking into account (3.6.6) and (3.6.11), we obtain the Bellman equation with functional derivatives:

$$-\frac{\partial F}{\partial t} = \min_{u \in U,\ u_\Gamma \in U_\Gamma}\Bigl\{\omega(u, u_\Gamma, v) + \int_D \frac{\delta F}{\delta v(t, x)}\bigl[\mathcal{L}_x v(t, x) + u(t, x)\bigr]\, dx\Bigr\}$$
$$+ \frac{1}{2}\int_D\!\!\int_D K(t, x, y)\,\frac{\delta^2 F}{\delta v(t, x)\,\delta v(t, y)}\, dx\, dy. \qquad (3.6.14)$$

To find the desired optimal control operators (3.6.8), it is necessary to solve Eq. (3.6.14).

The integral in the braces in (3.6.14) depends (in addition to the "volume" controls u(t, x)) on the control actions u_Γ(t, x) that determine the boundary conditions (3.6.4) for the functions v(t, x) obtained by solving Eq. (3.6.1). We can write this dependence explicitly by using the Green formula [61, 124]


where L_x* denotes the differential operator adjoint to L_x in the variables x, and ν is the outward normal on ∂D. In (3.6.15) the integral over the boundary ∂D of the domain D explicitly depends on the control u_Γ(t, x) of the boundary operating conditions, as follows from (3.6.4). To be definite, let us consider the third boundary value problem (3.6.4.III). The outward conormal derivative of the state function v(t, x) on the boundary ∂D can be written as

Substituting (3.6.16) into (3.6.15) and (3.6.15) into (3.6.14), we obtain the following final Bellman equation (for the third boundary value problem):

$$-\frac{\partial F}{\partial t} = \min_{u \in U,\ u_\Gamma \in U_\Gamma}\Bigl\{\omega(u, u_\Gamma, v) + \int_D \frac{\delta F}{\delta v}\, u\, dx + \int_{\partial D} \cdots\, u_\Gamma\, dx\Bigr\} + \cdots. \qquad (3.6.17)$$

This equation can be solved only approximately if the penalty functions are arbitrary and the controls u and u_Γ are subject to constraints.

Let us consider one of the methods for solving (3.6.17), based on the approximate synthesis procedure of the form (3.0.6)–(3.0.8). As already noted (§§3.1–3.4), the approximate synthesis method is especially convenient if the controlling actions are small, namely, if ‖v − v_0‖/‖v‖ ≪ 1, where v is the solution of Eq. (3.6.1) with the boundary condition (3.6.4) and with any admissible functions u and u_Γ satisfying (3.6.5), v_0 is the solution of the corresponding homogeneous (in u and u_Γ) problem, and ‖·‖ is the norm in the space L_2. From a physical viewpoint, this means that the power of the (controlled) sources is not large compared with ‖v‖² or with the intensity ∫_D∫_D K(t, x, y) dx dy of the random perturbations ξ(t, x).

Then, by setting u(t, x) = u_Γ(t, x) ≡ 0, we obtain the following equation for the zero approximation instead of (3.6.17):


Here, according to (3.6.9), G_0(v(t, x)) is a functional of the form

$$G_0\bigl(v(t, x)\bigr) = \int_D \cdots \int_D \omega_0\bigl[v(t, x^1), \dots, v(t, x^s)\bigr]\, dx^1 \cdots dx^s. \qquad (3.6.19)$$

If the functional F_0(t, v(t, x)) satisfying (3.6.18) is found, then the condition

$$\min_{u \in U,\ u_\Gamma \in U_\Gamma}\Bigl\{\omega(u, u_\Gamma, v) + \int_D \frac{\delta F_0}{\delta v}\, u\, dx + \int_{\partial D} \cdots\Bigr\} \qquad (3.6.20)$$

allows us to calculate the zero-approximation optimal control functions (operators) u_0(t, x) = φ_0(t, v(t, x)) and u_Γ^0(t, x) = ψ_0(t, v(t, x)).

The expression for G_1(v(t, x)) is used to calculate the first approximation F_1(t, v(t, x)), and so on. In general, the kth approximation F_k(t, v(t, x)) (k = 1, 2, ...) of the loss functional is determined as the solution of an equation of the form (3.6.18) in which the changes G_0 → G_k and F_0 → F_k are performed. Furthermore, simultaneously with F_k, we determine the pair of functions (operators)

$$u_k(t, x) = \varphi_k[t, v(t, x)], \quad x \in D; \qquad u_\Gamma^k(t, x) = \psi_k[t, v(t, x)], \quad x \in \partial D,$$

which allow us to synthesize a suboptimal control system in the kth approximation (the functions φ_k and ψ_k can be obtained from Eq. (3.6.20) with F_0 replaced by F_k).

3.6.3. Quadrature formulas for the functionals of successive approximations F_k[t, v(t, x)], k = 0, 1, 2, .... To use the above procedure of approximate synthesis in practice, we need to solve Eq. (3.6.18) and the corresponding equations for F_k (k = 1, 2, ...).


First, let us consider the zero-approximation equation (3.6.18). We show that if the influence function G(x, t; ζ, τ) of an instantaneous point source¹⁵ is known, then the solution of Eq. (3.6.18) can be written in the form

where the function ω_0(v_1, ..., v_s) is defined by (3.6.19) and

Here the entries of the matrix ‖D_{tτ}‖ are given by the formulas

and (D_{tτ}^{-1})_{αβ} denotes the (α, β)th entry of the inverse matrix ‖D_{tτ}‖^{-1}. To prove (3.6.21) and (3.6.22), we need to recall some well-known facts [61, 124] from the theory of adjoint operators and Green functions.

Suppose that a smooth elliptic operator L_x of the form (3.6.2) is given in an arbitrary domain D of the n-dimensional Euclidean space R_n. We also assume that this operator is defined on the space of functions f sufficiently smooth in D and satisfying the equation

on the boundary ∂D of the domain D; here M_x denotes a certain differential operator with respect to the variables x ∈ ∂D (a boundary operator).

¹⁵The function G(x, t; ζ, τ), t > τ, with respect to the variables (x, t), is the solution of the homogeneous boundary value problem (3.6.1), (3.6.4) (the case in which u(t, x) = u_Γ(t, x) = ξ(t, x) ≡ 0 in (3.6.1) and (3.6.4)) with the initial condition v(τ, x) = δ(x − ζ). This function is also called the fundamental solution or the Green function of problem (3.6.1), (3.6.4).


DEFINITION 3.6.1. The operators L_x* (x ∈ D) and M_x* (x ∈ ∂D) are called adjoint to L_x and M_x if for arbitrary sufficiently smooth functions f(x) satisfying (3.6.24) and p(x) satisfying

we have the relation

In general, the adjoint operators L_x* and M_x* are not uniquely defined. However, if we take L_x* to be the adjoint operator defined in the unbounded domain D = R_n [61], that is,

then it follows from Definition 3.6.1 and the Green formula

that the operator M_x* is defined uniquely. So, for the first, second, and third homogeneous boundary conditions (that is, for the conditions (3.6.4.I)–(3.6.4.III)) with u_Γ(t, x) = 0, Eq. (3.6.25) takes, respectively, the form

Now let us consider the parabolic operators


DEFINITION 3.6.2. A function G(x, t; ζ, τ) defined and continuous for (x, t), (ζ, τ) ∈ Ω, t > τ, is called the influence function of a point source (the Green function) for the equation Lf = 0 in the domain Ω if for any τ ∈ [0, T) the function G(x, t; ζ, τ) satisfies the equation

in the variables (t, x) in the domain D × (τ < t < T) and satisfies the initial and boundary conditions

$$\lim_{t \downarrow \tau} G(x, t; \zeta, \tau) = \delta(x - \zeta), \qquad (3.6.31)$$

$$\mathcal{M}_x G = 0 \quad \text{for}\ x \in \partial D,\ \tau < t < T. \qquad (3.6.32)$$

In a similar way, the Green function G*(x, t; ζ, τ) is defined for the adjoint parabolic operator (3.6.29). The only difference is that, in this case, the function G* is defined for times t < τ. The conditions (similar to (3.6.30)–(3.6.32)) that determine the Green function of the adjoint problem have the form

$$\mathcal{L}^* G^* = 0 \quad \text{for}\ (t, x) \in D \times (0 < t < \tau), \qquad (3.6.33)$$
$$\lim_{t \uparrow \tau} G^*(x, t; \zeta, \tau) = \delta(x - \zeta), \qquad (3.6.34)$$
$$\mathcal{M}_x^* G^* = 0 \quad \text{for}\ (t, x) \in \partial D \times (0 < t < \tau). \qquad (3.6.35)$$

The following statement readily holds for the functions G and G*.

DUALITY THEOREM. If G(x, t; ζ, τ) and G*(x, t; ζ, τ) satisfy problems (3.6.30)–(3.6.32) and (3.6.33)–(3.6.35), then

$$G(x, t; \zeta, \tau) = G^*(\zeta, \tau; x, t). \qquad (3.6.36)$$

PROOF. Let us consider the functions G(y, η; ζ, τ) and G*(y, η; x, t) for y ∈ D and τ < η < t. Taking into account the fact that these functions satisfy (3.6.30) and (3.6.33) in y and η, in view of Definition 3.6.1 of the adjoint (in y) operator L_y*, we have


Rewriting (3.6.37) in the form

passing to the limit as ε → 0, and taking into account (3.6.31) and (3.6.34), we obtain (3.6.36).

Now, by using the properties of the Green functions, we shall show that the functional (3.6.21) actually satisfies Eq. (3.6.18). To this end, we need to calculate all the derivatives in (3.6.18). Taking into account the relation

$$\lim_{\tau \downarrow t}\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} dv_1 \cdots dv_s\; \omega_0(v_1, \dots, v_s)\,\bigl[(2\pi)^s \det\|D_{t\tau}\|\bigr]^{-1/2}\exp\{\cdots\} = \omega_0\bigl(v(t, x^1), \dots, v(t, x^s)\bigr)$$

and the property (3.6.31) of the Green function, we differentiate (3.6.21) with respect to time and obtain

To calculate ∂D_{tτ}/∂t, we use the rules for differentiating determinants and inverse matrices:

(here Ḃ is the matrix composed of the time derivatives of the entries of the matrix B). Performing the necessary calculations, we obtain


where, for brevity, we use the notation

By formulas (3.6.13) and (3.6.22), we can readily obtain the first- and second-order functional derivatives

In view of (3.6.36), the Green functions G(x^α, τ; x, t) in (3.6.39)–(3.6.42) satisfy (with respect to x and t) the adjoint equation (3.6.33) in the interior of the domain D and the adjoint boundary condition on the boundary ∂D. Taking into account the fact that the adjoint boundary condition has the form (3.6.25.III) for the third boundary value problem (Eq. (3.6.18) was written just for this problem) and substituting (3.6.41) into (3.6.18), we readily verify that the integral over the boundary ∂D in (3.6.18) is equal to zero. Finally, substituting (3.6.38)–(3.6.42) into (3.6.18), we arrive at an identity, and relation (3.6.21) is thereby proved.

The solution of the zero-approximation equation (3.6.18) is given by formulas (3.6.21) and (3.6.22). As a rule, the higher-order approximations F_k(t, v(t, x)), k ≥ 1, are calculated by more complicated formulas, where, in addition, we must pass to the limit, since, in general, G_k(v(t, x)), k ≥ 1, are not integral functionals of the form (3.6.19). Therefore, we can calculate the successive approximations F_k(t, v(t, x)), k ≥ 1, by using, instead of (3.6.21),


the formula [95]

where w̃_k(v_1, ..., v_r) = G_k(v(t, x^1), ..., v(t, x^r)) is a finite-dimensional analog of the functional G_k[v(t, x)] such that

$$\lim_{\substack{r \to \infty \\ \Delta \to 0}} \tilde w_k(v_1, \dots, v_r) = G_k[v(t, x)]. \qquad (3.6.44)$$

The following example illustrates calculations with the help of formula (3.6.43).

3.6.4. An example. If we choose some special expressions for the functional (3.6.7), the operator (3.6.2), etc., then, using formulas (3.6.21) and (3.6.43), we can obtain a finite approximate solution of the synthesis problem. As an example, we calculate the optimal control of a substance concentration in a cylinder of finite length.

Let us consider a control problem often encountered in chemical industry processes. Suppose that there is a chemical reactor in which the output product is obtained by catalytic synthesis reactions. We assume that the reacting agents diffuse into the catalysis chamber through pipelines. There may be branches in the pipeline through which reagents come to technological units, where the concentration of the entering substance varies at random. At the same time, to obtain a high-quality output product, it is necessary to maintain the reagent concentrations close to given values. One possible way to stabilize the concentration in the catalysis chamber is to change the flow rate at the input of the corresponding pipeline.

After appropriate generalizations and idealizations, this problem can be stated as follows. Let the plant (a pipeline) be a cylinder of length ℓ filled with a homogeneous porous medium; the assumption r ≪ ℓ, where r is the radius of the base, allows us to neglect radial variations of the concentration v and to assume that v depends only on (x, t), 0 ≤ x ≤ ℓ. We also assume that the cylinder is closed at one end (x = ℓ) and the flow rate is given at the other end of the cylinder. The concentration v of the substance in the cylinder can be affected by changes of the flow rate at the end of the cylinder (the rate of the incoming flow is the controlling action). Assuming that the random perturbation ξ(t, x) is a stationary white noise, we obtain the following mathematical model of the plant to be controlled [95]:


(here B and C are the diffusion and porosity coefficients of the medium);

For the plant (3.6.45)–(3.6.47), we need to synthesize a regulator that minimizes the mean value of the quadratic performance criterion

$$I = \operatorname{E}\Bigl[\int_0^T\!\!\int_0^\ell\!\!\int_0^\ell \theta(x, y)\, v(t, x)\, v(t, y)\, dx\, dy\, dt\Bigr] \qquad (3.6.48)$$

(θ(x, y) is a given positive definite function, i.e., a kernel), provided that the absolute value of the boundary control action (the boundary flow of the substance) u is bounded, that is,

$$|u| \le u_m. \qquad (3.6.49)$$

In this example the Bellman equation (3.6.17) has the form

e " drdy

+ f K(x' ~ ) S v ( t , x)Sv(t, y) x=t

va(E)] , F[T,V(T,X)]=O. + a 2 min I U I L U ~ a x Sv dx Sv %=a

Taking into account (3.6.45) and (3.6.46) and calculating the minimum with respect to u, we can rewrite (3.6.50) in the form

$$-\frac{\partial F}{\partial t} = \cdots + \frac{1}{2}\int_0^\ell\!\!\int_0^\ell K(x, y)\,\frac{\delta^2 F}{\delta v(t, x)\,\delta v(t, y)}\,dx\,dy - a^2 u_m\,\Bigl|\frac{\delta F}{\delta v}\Big|_{x=0}\Bigr|. \qquad (3.6.51)$$

Simultaneously, we obtain the optimal control law

Thus, to obtain the final solution of the synthesis problem, it remains to calculate the functional derivative [δF/δv(t, x)]_{x=0} in (3.6.52). We calculate it by the method of successive approximations.


The zero approximation. Suppose that u_m is small. To solve (3.6.51), we first set u_m = 0. As a result, we obtain the following equation of the zero approximation:

$$-\frac{\partial F_0}{\partial t} = \int_0^\ell\!\!\int_0^\ell \theta(x, y)\,v(t, x)\,v(t, y)\,dx\,dy + a^2 \int_0^\ell v(t, x)\,\frac{\partial^2}{\partial x^2}\Bigl(\frac{\delta F_0}{\delta v(t, x)}\Bigr)\,dx$$
$$+ \frac{1}{2}\int_0^\ell\!\!\int_0^\ell K(x, y)\,\frac{\delta^2 F_0}{\delta v(t, x)\,\delta v(t, y)}\,dx\,dy, \qquad F_0[T, v(T, x)] = 0.$$

Elementary calculations show that its solution (3.6.21) can be written in the form

$$F_0[t, v(t, x)] = \int_t^T d\tau\Bigl\{\int_0^\ell\!\!\int_0^\ell \theta(x, y)\Bigl[\int_0^\ell\!\!\int_0^\ell G(x, \tau; \tilde x, t)\,G(y, \tau; \tilde y, t)\,v(t, \tilde x)\,v(t, \tilde y)\,d\tilde x\,d\tilde y\Bigr] dx\, dy + \cdots\Bigr\} \qquad (3.6.54)$$

Here the Green function G for the boundary value problem (3.6.45), (3.6.46) can readily be obtained by separation of variables (the Fourier method) [26, 179] and represented as a series.
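If the boundary conditions (3.6.46) amount to homogeneous no-flux (Neumann) conditions at both ends — a sketch consistent with the closed-end description above, with a² denoting the effective diffusivity — the standard expansion reads:

$$G(x, t; y, \tau) = \frac{1}{\ell}\Bigl[1 + 2\sum_{n=1}^{\infty} \cos\frac{n\pi x}{\ell}\,\cos\frac{n\pi y}{\ell}\; e^{-a^2 (n\pi/\ell)^2 (t - \tau)}\Bigr], \qquad t > \tau.$$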

The functional derivative of the quadratic functional (3.6.54) can readily be calculated (for example, by using formulas (3.6.13); see also [91]) as follows:

Hence it follows that the optimal control law (3.6.52) has the following form in the zero approximation:

$$u_0[t, v(t, x)] = u_m \operatorname{sign}\Bigl[\int_t^T d\tau \int_0^\ell\!\!\int_0^\ell\!\!\int_0^\ell \theta(x, y)\, G(x, \tau; 0, t)\, G(y, \tau; \tilde y, t)\, v(t, \tilde y)\, d\tilde y\, dx\, dy\Bigr] \qquad (3.6.56)$$


The first approximation. Taking into account (3.6.56), we can write Eq. (3.6.51) in the first approximation with respect to u_m as follows:

$$-\frac{\partial F_1}{\partial t} = \cdots + \frac{1}{2}\int_0^\ell\!\!\int_0^\ell K(x, y)\,\frac{\delta^2 F_1}{\delta v(t, x)\,\delta v(t, y)}\,dx\,dy - 2a^2 u_m\, G[v(t, x)],$$

Now formulas (3.6.21) and (3.6.22) are not sufficient for calculating F_1(t, v(t, x)); we need to use the more complicated calculation procedure based on (3.6.43) and (3.6.44). A finite-dimensional analog of the functional G can be obtained by dividing the interval [0, t] into subintervals of length Δ = t/r and replacing G by

Next, we use formulas (3.6.21), (3.6.22), and (3.6.43) as well as the formula


where

$$\mu, \nu = 1, \dots, r; \qquad \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-y^2/2}\, dy; \qquad h_\mu g_\mu = h_1 g_1 + \dots + h_r g_r.$$

As a result, for F_1[t, v(t, x)], we obtain the expression

$$F_1[t, v(t, x)] = F_0[t, v(t, x)] - 2a^2 u_m \int_t^T d\tau\,\Bigl\{\sqrt{\tfrac{2}{\pi}}\; e^{-H^2/2\gamma(\tau)}\,\cdots\Bigr\},$$

where F_0[t, v(t, x)] is given by (3.6.54), and moreover,

$$H = H[t, \tau, v(t, x)] = \int_0^\ell\!\!\cdots\!\!\int_0^\ell \theta(x, y)\,G(x, \sigma; 0, \tau)\,G(y, \sigma; \tilde x, \tau)\,G(\tilde x, \tau; \tilde y, t)\,v(t, \tilde y)\,dx\,dy\,d\tilde x\,d\tilde y, \qquad (3.6.60)$$

and γ(τ) is a similar multiple integral containing the products G(x̃, τ; x, σ) G(ỹ, τ; y, σ) G(x′, σ; 0, τ) G(y′, σ; ·, τ) G(x″, σ̃; 0, τ) G(y″, σ̃; ·, τ) with the kernels θ, taken over dx dy dx′ dy′ dx″ dy″ dσ dσ̃.

After the functional derivative (δF_1/δv(t, x))_{x=0} is calculated, relations (3.6.52) and (3.6.59) yield the controlling functional

$$u_1[t, v(t, x)] = u_m \operatorname{sign}\Bigl\{\int_t^T d\tau\;\Phi(\cdots)\int_0^\ell\!\!\int_0^\ell\!\!\int_0^\ell \theta(x, y)\,G(x, \tau; 0, t)\,G(y, \tau; \tilde y, t)\,v(t, \tilde y)\,d\tilde y\,dx\,dy + \cdots\Bigr\} \qquad (3.6.61)$$

Formula (3.6.61) enables us to synthesize the quasioptimal control system in the first approximation.


Although the quasioptimal control algorithms (3.6.56) and (3.6.61) look somewhat cumbersome (especially formula (3.6.61)), they admit a transparent technical realization. For example, let us consider the zero-approximation algorithm (3.6.56), which can be written as

where

$$Q(\tilde y, t) = \int_t^T d\tau \int_0^\ell\!\!\int_0^\ell \theta(x, y)\, G(x, \tau; 0, t)\, G(y, \tau; \tilde y, t)\, dx\, dy$$

is a known function calculated in advance. The current value of the state function v(t, x) can be determined by a system of data units that measure the concentrations v(t, x_1), v(t, x_2), ..., v(t, x_p) at points x_1, x_2, ..., x_p lying along the cylinder. In particular, if the concentration gauges are placed uniformly along the cylinder, then the integral in (3.6.62) can be replaced by the sum

As a result, we obtain an algorithm whose realization does not present any difficulties.
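A literal rendering of this relay law for p gauges (a hypothetical sketch: the weights Q_i(t) are assumed precomputed from the kernel θ and the Green function as in (3.6.62), and the quadrature weight ℓ/p corresponds to uniformly spaced measurement points):

```python
import numpy as np

def relay_control(Q_i, v_i, u_m, weight):
    """Zero-approximation boundary control (cf. (3.6.63)):
    u0 = u_m * sign( weight * sum_i Q_i * v_i ),
    where v_i are the measured concentrations v(t, x_i) and Q_i = Q(x_i, t)."""
    s = weight * np.dot(Q_i, v_i)   # quadrature replacing the integral in (3.6.62)
    return u_m * np.sign(s)

# Example: p = 8 gauges on a pipeline of length ell.
ell, p, u_m = 1.0, 8, 0.1
v_i = np.random.default_rng(0).normal(size=p)   # measured v(t, x_i)
Q_i = np.ones(p)                                # placeholder precomputed weights
u0 = relay_control(Q_i, v_i, u_m, weight=ell / p)
```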


Indeed, it follows from (3.6.63) that, besides a system of data units, the control circuit (the feedback circuit) contains a system of linear amplifiers with amplification factors Q_i(t), an adder, and a relay-type switching device that connects the pipeline [0, ℓ] either to reservoir 1 (for pumping in additional substance) or to reservoir 2 (for substance suction at the pipeline input).

Figure 29 shows the block diagram of the system realizing the control algorithm (3.6.63).

The quasioptimal first-approximation algorithm (3.6.61) can be realized in a similar way. Here the control circuit, along with a nonlinear unit of the ideal relay type, also contains nonlinear transformers that realize the probability error function Φ(x).

However, it should be noted that an error is inevitably present in the finite-dimensional approximation of the state function v(t, x) (when the algorithm (3.6.56) is replaced by (3.6.63)), since it is impossible to measure the system state v(t, x) precisely (this state is a point in the infinite-dimensional Hilbert space L_2). However, if the points x_1, ..., x_p where the concentration data units are located lie sufficiently close to each other, then this error can be neglected.


CHAPTER IV

SYNTHESIS OF QUASIOPTIMAL SYSTEMS IN THE CASE OF SMALL DIFFUSION TERMS IN THE BELLMAN EQUATION

If the random actions ξ(t) on the plant in the closed-loop control system shown in Fig. 3 are of small intensity and the observation errors η(t) and ζ(t) are large, then the Bellman equation contains a small parameter; namely, the coefficients of the second-order derivatives of the loss function with respect to the phase variables are small.

Indeed, considering the synthesis problem for which we derived the Bellman equation in the form (1.4.26) in §1.4, we assume that the matrix σ(t, x(t)) in the plant equation (1.4.2), which determines the intensity of the random perturbations, has the form σ(t, x) = ε^{1/2} σ_0(t, x), where ε is a small parameter (0 < ε ≪ 1). Moreover, if B^y(t, y) = ε B_0^y(t, y) is the diffusion matrix of the input process, then Eq. (1.4.26) acquires the form

$$F_t + [A^y(t, y)]^{\mathrm T} F_y + \frac{\varepsilon}{2}\bigl[\operatorname{Sp} B_0^x(t, x) F_{xx} + \operatorname{Sp} B_0^y(t, y) F_{yy}\bigr] + \Phi(t, x, y, F_x) = 0, \qquad (4.0.1)$$

where B_0^x(t, x) = σ_0(t, x) σ_0^T(t, x).

On the other hand, large observation errors correspond to the case in which the matrix Q(t) in (1.5.46) has the form Q(t) = ε^{−1/2} Q_0(t). In this case, the Bellman equation (1.5.54) for the problem considered can be written in the form

$$F_t + \frac{\varepsilon}{2}\operatorname{Sp} DRDF_{mm} + \operatorname{Sp}\bigl[F_D\bigl(\sigma\sigma^{\mathrm T} - \varepsilon DRD\bigr)\bigr] + \widetilde\Phi(m, D, F_m, F_D) = 0, \qquad (4.0.2)$$

where


If the value of the parameter ε is small, then the solutions of the above equations and the solutions of the equations

obtained from (4.0.1), (4.0.2) by setting ε = 0 are expected to be close to each other. The equations for F⁰ are, generally speaking, simpler than the original Bellman equations, since they do not contain second-order derivatives and thus are first-order partial differential equations. If these simpler equations can be solved exactly, then we can construct solutions of the original Bellman equations as series in powers of the small parameter ε,

that is, as F = F^0 + εF^1 + ε^2 F^2 + ⋯. Here the function F^0 plays the role of the leading term (generating solution) of the expansion. Taking finitely many terms

F_k = F^0 + εF^1 + ⋯ + ε^k F^k   (4.0.5)

of the asymptotic series and considering F_k as an approximate solution of the Bellman equation (the kth approximation), we can readily solve the synthesis problem corresponding to this approximation. To this end, it suffices to make the change F → F_k in the expression for the optimal control algorithm u_* = φ_0(t, x, y, ∂F/∂x) (see, for instance, (1.4.25)). In this way, we obtain the quasioptimal algorithm for the kth approximation:

u_k(t, x, y) = φ_0(t, x, y, ∂F_k(t, x, y)/∂x). For k ≥ 1, the functions F^k (or F_k) can be calculated recurrently. If

the functions entering (4.0.1), (4.0.2) are sufficiently smooth, then the system of equations for the successive terms F^1, F^2, . . . in the expansion (4.0.5) can be obtained in the standard way: we substitute the expansion (4.0.5) into Eqs. (4.0.1) or (4.0.2) and set the coefficients of each power ε^k (k ≥ 1) of the small parameter equal to zero. In other cases, it may be convenient to use a somewhat different scheme of calculations in which the successive approximations F^k (k ≥ 1) are obtained as solutions of the sequence of equations:


This approximate synthesis procedure was studied in detail and exploited for solving some special problems in [34, 56, 58, 172, 175]. The accuracy of the approximate synthesis was investigated in [34, 56]. It was shown that, under certain conditions, the use of the quasioptimal control u_k in the kth approximation gives an error of the order of ε^{k+1} in the value of the minimized functional. In other words, if instead of the optimal control algorithm u_* we use the quasioptimal algorithm u_k, then the difference between the value of the optimality criterion I[u_k] corresponding to this control and the minimum possible (optimal) value I[u_*] = F is of the order of ε^{k+1}, that is,

I[u_k] − I[u_*] ≤ c ε^{k+1},

where c is a constant. In the present section the main attention is paid to the "algorithmic" aspects of the method, that is, to computational methods for obtaining quasioptimal controls u_k. As an example, we consider two specific problems of optimal servomechanism synthesis. First (in §4.1), we consider the synthesis problem that generalizes the problem considered in §2.2 to the case in which the input process y(t) is a diffusion Markov process inhomogeneous in the phase variable y. Next (in §4.2), we write an approximate solution of the synthesis problem for an optimal system of tracking a discrete Markov process of the "telegraph signal" type when the command input is observed against the background of a white noise.
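Before turning to these examples, it is worth noting that the power-matching step underlying the expansion (4.0.5) is purely mechanical and easy to automate. The following sympy sketch applies it to a hypothetical one-dimensional stationary analogue of (4.0.1); the drift A(x) = −x and the penalty c(x) = x² are arbitrary illustrative choices, not the book's system.

```python
import sympy as sp

# Perturbation expansion of a toy Bellman-type equation in powers of epsilon:
#   (eps/2) F''(x) + A(x) F'(x) + c(x) = 0   (hypothetical 1-D analogue of (4.0.1))
eps, x = sp.symbols('epsilon x')
F0, F1, F2 = (sp.Function(name)(x) for name in ('F0', 'F1', 'F2'))

A = -x            # assumed drift coefficient
c = x**2          # assumed penalty function
F = F0 + eps*F1 + eps**2*F2           # truncated expansion, cf. (4.0.5)

lhs = sp.expand(eps/2*sp.diff(F, x, 2) + A*sp.diff(F, x) + c)

# Setting the coefficient of each power eps^k to zero gives the recurrent
# system: a first-order equation for F0, then for F1 (driven by F0''), etc.
for k in range(3):
    print(f"order eps^{k}:", sp.Eq(lhs.coeff(eps, k), 0))
```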

§4.1. Approximate synthesis of a servomechanism with small-intensity noise

Let us consider the servomechanism shown in Fig. 10. Assume that the plant P is described by a scalar equation of the form

where ξ(t) is the standard white noise of unit intensity (1.1.31), ε and N are given positive constants (ε is a small parameter), and the values of admissible controls u lie in the region¹

¹The nonsymmetric constraints (4.1.2) are, first, more general (see [21]) and, second, they allow a more convenient comparison between the results obtained later and the corresponding formulas constructed in §2.2.


where u_m > a > 0. The command input y(t) is a ξ(t)-independent scalar Markov diffusion process with the drift and diffusion coefficients

where β and B > 0 are given numbers and ε is the same small parameter as in (4.1.1). The performance of the tracking system will be estimated by the value of the integral optimality criterion

where the penalty function c(y(t) − x(t)) = c(z(t)) ≥ 0, c(0) = 0, is a given convex function of the error signal z(t) = y(t) − x(t).

The problem stated above is a generalization of the problem studied in Section 2.2.1 of §2.2 to the case in which the plant is subject to uncontrolled random perturbations and the input Markov process y(t) is inhomogeneous in the phase variable y (the drift coefficient A^y = A^y(y) = −βy ≢ const). The inhomogeneity of the input process y(t) makes the synthesis problem more complicated, since in this case the Bellman equation cannot be reduced to a one-dimensional equation (as it was in Section 2.2.1 of §2.2).

Since problem (4.1.1)-(4.1.4) is a special case of problem (1.4.2)-(1.4.4), it follows from (1.4.21), (1.4.22), and (4.1.1)-(4.1.4) that the Bellman equation has the form

−βyF_y + min_{−a−u_m ≤ u ≤ −a+u_m} [uF_x] + (ε/2)(N F_{xx} + B F_{yy}) + c(y − x) = −F_t.   (4.1.5)

If, as in Section 2.2.1 of §2.2, we introduce a new phase variable z = y − x and replace the loss function F(t, x, y) by F(t, y, z), then Eq. (4.1.5) can readily be written as

- min @y(" + Fz) + - a - u m < u < - a + u m

[-uFz1

We are interested in the stationary tracking mode as the terminal time T → ∞. If the stationary loss function f(y, z) is introduced in the standard way (see (1.4.29) and (2.2.9)),

f(y, z) = lim_{T→∞} [F(t, y, z) − γ(T − t)],   (4.1.7)


then (4.1.6) implies the following stationary Bellman equation for the problem considered:

−βy(f_y + f_z) + min_{−a−u_m ≤ u ≤ −a+u_m} [−uf_z] + (ε/2)[N f_{zz} + B(f_{yy} + 2f_{yz} + f_{zz})] + c(z) = γ.   (4.1.8)

As usual, the number γ ≥ 0 in (4.1.8) characterizes the mean losses per unit time under stationary operating conditions. This number is unknown in advance and is obtained together with the solution of Eq. (4.1.8).

Let us discuss how Eq. (4.1.8) can be solved. By R_+ we denote the domain on the phase plane (y, z) where f_z > 0 and by R_− the domain where f_z < 0. It follows from (4.1.8) that the optimal control u_*(y, z) must be equal to u_* = u_m − a in R_+ and to u_* = −u_m − a in R_−. Denoting by f_+(y, z) and f_−(y, z) the values of the loss function f(y, z) in the domains R_+ and R_−, we obtain the following two equations from (4.1.8):

Since the first derivatives f_y and f_z in (4.1.8) are continuous on the interface Γ between R_+ and R_− [172], both equations in (4.1.9) hold on Γ, and we have the condition

Since the control action u_* has opposite signs on the two sides of the interface Γ, the line Γ is naturally called a switching line. It follows from the preceding that the problem of the optimal system synthesis is equivalent to the problem of finding the equation for the switching line Γ.

Equations (4.1.9) cannot be solved exactly. The fact that the expressions with second-order derivatives contain a small parameter ε allows us to solve these equations by the method of successive approximations. In the zero approximation, instead of (4.1.9), we need to solve the system of equations

By f⁰_±, γ_0, and Γ⁰ we denote the loss function, the stationary error, and the switching line obtained from Eq. (4.1.11) for the zero approximation. The successive approximations f^k_±, γ_k, and Γ^k (k ≥ 1) are calculated


recurrently by solving a sequence of equations of the form

where

A method for solving Eqs. (4.1.11), (4.1.12) was proposed in [172]. Let us briefly describe the procedure for calculating the successive approximations f^k, γ_k, and Γ^k, k = 0, 1, 2, . . . . First of all, note that Eqs. (4.1.11), (4.1.12) are the Bellman equations for deterministic synthesis problems for second-order control systems in which the equation of motion has the form

(in the second equation the signs "minus" and "plus" of u_m correspond to the domains R^k_+ and R^k_−, respectively). As was shown in [172], the gradient ∇f of the solution of the nondiffusion equations (4.1.11), (4.1.12) remains continuous when we cross the interface Γ^k, that is, on Γ^k we have the conditions

∂f^k_+/∂y = ∂f^k_−/∂y,   ∂f^k_+/∂z = ∂f^k_−/∂z,   k = 0, 1, 2, . . . ,   (4.1.15)

if the phase trajectories of the deterministic system (4.1.14) either approach the line Γ^k on both sides (a switching line of the first kind) or approach Γ^k on one side and recede from it on the other side (a switching line of the second kind, see Fig. 4). This fact allows us to calculate the gradient ∇f^k along Γ^k. Indeed, in the domain R^k_+ we have

and in the domain R^k_−


It follows from the preceding continuity considerations that both equations (4.1.16) and (4.1.17) must be satisfied on Γ^k simultaneously. Solving these equations for the first-order derivatives, we find the gradient of the loss function on the interface Γ^k between R^k_+ and R^k_−:

This allows us to write the difference between the values of the loss function at different points on the boundary Γ^k as a contour integral along the boundary,

f^k(Q) − f^k(P) = ∫_P^Q (A^k_y dy + A^k_z dz).   (4.1.19)

If the part of Γ^k between the points P and Q is a boundary of the first kind (that is, the representative point of system (4.1.14), once it reaches the boundary, moves along it in the "sliding regime" [172]), then formula (4.1.19) makes it possible to obtain a necessary condition for the boundary Γ^k to be optimal. The corresponding equation for the desired switching line z = z^k(y) is obtained from the condition that the difference (4.1.19) must be minimal. This equation can be written in the form [172]

Equation (4.1.20) is a consequence of the following illustrative arguments. Let y_Q and y_P be the coordinates of the points Q and P on the y-axis. We divide the interval [y_Q, y_P] into N equal intervals of length Δ = |y_P − y_Q|/N and replace the contour integral (4.1.19) by the corresponding integral sum

where y_i = y_P ± (i − 1)Δ and z_i = z(y_i). We need to choose the z_i so as to minimize the function Φ_Δ(z_1, . . . , z_{N+1}). The necessary extremum condition ∂Φ_Δ/∂z_i = 0, written for an arbitrary i and the sum (4.1.21), allows us to write the following system of equations for the optimal z_i:


If A^k_y(y, z), A^k_z(y, z), and z^k(y) are sufficiently smooth functions of their arguments, then we have

A^k_z(y_{i−1}, z_{i−1}) − A^k_z(y_i, z_i) = −(∂A^k_z/∂y)(y_i, z_i) Δ + (∂A^k_z/∂z)(y_i, z_i)(z_{i−1} − z_i) + o(Δ)   (4.1.23)

for small Δ = y_i − y_{i−1}. Substituting (4.1.23) into (4.1.22), taking into account the relation z_{i+1} − 2z_i + z_{i−1} = o(Δ), and passing to the limit as Δ → 0, we obtain the condition

which coincides with (4.1.20), since i is arbitrary. If we know the gradient of the loss function along the switching line Γ^k and the equation z = z^k(y) for Γ^k, then we can find a condition for the parameter γ_k, which is the kth approximation of the stationary tracking error γ in the original diffusion equation (4.1.8). By using (4.1.18) and the equation z = z^k(y), we obtain the following expression for the total derivative df^k/dy along Γ^k:

The unknown parameter γ_k can be found from the condition that the derivative (4.1.25) is finite at a stable point; in the problem considered, the point y = 0 is stable. More precisely, this condition can be written as

lim_{y→0} w_k(y, γ_k) = 0.   (4.1.26)

The expression

is the increment of the loss function f^k on the time interval dt. Hence (4.1.26) means that this increment vanishes after the controlled deterministic system (4.1.14) arrives at the stable state y = 0. Obviously, in this case, it follows from the above properties of the penalty function c(z) that we also have z = 0. Thus, relation (4.1.26) is a necessary condition for the deterministic Bellman equations (4.1.11), (4.1.12) to have stationary solutions.

Let us use the calculation procedure described above for solving the equations of successive approximations (4.1.11), (4.1.12). We restrict our calculations to a small number of successive approximations that determine the most important terms of the corresponding asymptotic expansions and primarily affect the structure of the controller C when a quasioptimal control system is designed.


The zero approximation. To calculate the zero approximation, we need to solve the system of equations (see (4.1.11))

Using (4.1.15) and solving system (4.1.27) for the derivatives ∂f⁰_+/∂y = ∂f⁰_−/∂y and ∂f⁰_+/∂z = ∂f⁰_−/∂z, we obtain the following expressions for the components of the gradient ∇f⁰ (see (4.1.18)) on the switching line Γ⁰:

Equation (4.1.20), which is a necessary condition for a switching line of the first kind, together with (4.1.28) allows us to obtain the equation for Γ⁰:

Since, by assumption, the penalty function c(z) attains its unique minimum at z = 0, condition (4.1.29) implies the equation

z = z⁰(y) ≡ 0,   (4.1.30)

that is, in the zero approximation, the switching line coincides with the y-axis on the plane (y, z).

Now let us verify whether (4.1.30) is a switching line of the first kind. An examination of the phase trajectories of system (4.1.14) shows that on the segment

ℓ_− ≤ y ≤ ℓ_+,   where ℓ_− = −(u_m − a)/β, ℓ_+ = (u_m + a)/β,   (4.1.31)

the phase trajectories approach the y-axis on both sides;² therefore, this segment is an actual switching line. For y ∉ [ℓ_−, ℓ_+], the equation for the switching line Γ⁰ will be obtained in the sequel.

Now let us calculate the stationary tracking error γ_0. From (4.1.25), (4.1.26), and (4.1.28), we have

²Obviously, in this case, the domain R⁰_+ (R⁰_−) is the upper (lower) half-plane of the phase plane (y, z). Therefore, to construct the phase trajectories, in the second equation in (4.1.14) we must take u_m with the sign "minus" for z > 0 and with the sign "plus" for z < 0.


It also follows from (4.1.28) and (4.1.31) (with regard to c(0) = 0) that the loss function is constant on the y-axis for ℓ_− < y < ℓ_+; thus we can set f⁰(y, 0) = 0 for y ∈ [ℓ_−, ℓ_+].

To calculate the loss function f⁰ at an arbitrary point (y, z), we need to integrate Eqs. (4.1.27). To this end, let us first write the system of equations for the integral curves (characteristics):

If y_0 denotes the point at which a given integral curve intersects the y-axis z = 0, then (4.1.32) implies the following equation for the characteristics (the phase trajectories):

as well as for the zero approximation of the loss function

f⁰_±(y, z) = ∫_0^z c(z′) dz′ / (β φ_±^{−1}[φ_±(y) + z − z′] − a ∓ u_m).   (4.1.34)

In (4.1.34) we have y_0 = φ_±^{−1}[φ_±(y) + z], where φ^{−1}(y) is the inverse function of φ(y).

By using the loss function (4.1.34) obtained, we can determine how to continue the switching line Γ⁰ outside the interval [ℓ_−, ℓ_+], where Γ⁰ is a switching line of the second kind (that is, the phase trajectories of system (4.1.14) approach Γ⁰ on one side and go away from it on the other side). In this case, as already noted, the gradient (4.1.15) remains continuous on Γ⁰; therefore, the derivatives of the loss function along Γ⁰ are determined, as previously, by (4.1.28). However, in general, formula (4.1.20), from which Eq. (4.1.30) was derived, may no longer be valid. In this case, the equation for Γ⁰ can be obtained by differentiating (4.1.34), say, with respect to z and by setting the resulting expression, in view of (4.1.28), equal to zero. This implies the following equation for the switching line Γ⁰:

Here we took into account the equality c(0) = 0 and assumed that the condition (dφ_±/dy_0)(∂y_0/∂z) ≠ 0 must be satisfied on the line Γ⁰ determined by (4.1.35).


An analysis of the phase trajectories (4.1.14) shows that, to find Γ⁰ for y > ℓ_+, we must use the function φ_−(y) in Eq. (4.1.35) (correspondingly, φ_+(y) to obtain Γ⁰ for y < ℓ_−).

Let us calculate Γ⁰ for y > ℓ_+ in the case of the penalty function c(z) = z². In this case, the integral in (4.1.35) can readily be calculated, and Eq. (4.1.35) acquires the form

(in (4.1.36) we have y_0 = φ_−^{−1}[φ_−(y) + z], where φ_−(y) is determined by (4.1.33)).

Equation (4.1.36) determines the switching line z = z⁰(y) for y > ℓ_+ implicitly. Near the point y = ℓ_+ = (a + u_m)/β at which the switching line changes its type, Eq. (4.1.36) yields an approximate formula, and thus the equation for Γ⁰ can be written explicitly:

Figure 30 shows the position of the switching line Γ⁰ and the phase trajectories in the zero approximation.
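The qualitative picture of Fig. 30 near the origin can be reproduced numerically. The sketch below integrates the deterministic zero-approximation dynamics (4.1.14) (ẏ = −βy, ż = ẏ − ẋ = −βy − u) under the relay control u_* = u_m − a for z > 0 and u_* = −u_m − a for z < 0 found above; the parameter values are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Zero-approximation deterministic dynamics (4.1.14) with the relay control
# switching on the segment z = 0 of Gamma^0; beta, a, u_m are illustrative.
beta, a, u_m = 0.5, 0.2, 1.0

def rhs(t, s):
    y, z = s
    u = u_m - a if z > 0 else -u_m - a   # u_* on each side of Gamma^0
    return [-beta*y, -beta*y - u]        # ydot; zdot = ydot - xdot = ydot - u

for y0, z0 in [(1.0, 0.8), (-1.5, -0.5), (0.5, -1.0)]:
    sol = solve_ivp(rhs, (0.0, 20.0), [y0, z0], max_step=0.01)
    y_T, z_T = sol.y[:, -1]
    print(f"({y0:+.1f},{z0:+.1f}) -> ({y_T:+.3f},{z_T:+.3f})")
```

The trajectories first hit the line z = 0 and then slide along it toward the stable point y = 0, as described above.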


Higher-order approximations. Everywhere in the sequel we assume that the penalty function is c(z) = z². Let us consider Eqs. (4.1.12) corresponding to the first approximation:

To simplify the further calculations, we note that, in the stationary tracking mode and for the small diffusion coefficients considered here, the probability that the phase variables y and z fluctuate near the origin of the phase plane (y, z) is very large. The values y = ℓ_− and y = ℓ_+ at which the switching line Γ⁰ changes its type are attained very seldom (under stationary operating conditions); therefore, we are mainly interested in the exact position of the switching line in the region −(u_m − a)/β < y < (u_m + a)/β, where, in the zero approximation, the position of the switching line is given by the equation z = 0. Next, note that the first-approximation equation (4.1.37) differs from the corresponding zero-approximation equation (4.1.27) only by a small (of the order of ε) term in the expression for c¹_±(y, z) (see (4.1.38)). Therefore, the continuity conditions imply that the switching line Γ¹ in the first approximation determined by (4.1.37) is sufficiently close to the previous position z = 0. Thus, we can calculate Γ¹ by using, instead of exact formulas, approximate expressions corresponding to small values of z.

Now, taking into account the preceding arguments, let us calculate the function c¹_±(y, z) determined by (4.1.38). To this end, we differentiate the second expression in (4.1.34) and restrict ourselves to the first- and second-order terms in z. As a result, we obtain³

∂²f⁰_±/∂z² = 2z/(βy − a ∓ u_m) + β²yz²/(βy − a ∓ u_m)³ + O(z³),
∂²f⁰_±/∂z∂y = −βz²/(βy − a ∓ u_m)² + O(z³),   ∂²f⁰_±/∂y² = O(z³).   (4.1.39)

³The functions f¹_+(y, z) and f¹_−(y, z), as the solutions of Eqs. (4.1.37), are defined in R¹_+ and R¹_−. At the same time, the functions f⁰_+(y, z) and f⁰_−(y, z) are defined in R⁰_+ and R⁰_−. However, since the switching lines Γ⁰ (between R⁰_+ and R⁰_−) and Γ¹ (between R¹_+ and R¹_−) are close to each other, to calculate (4.1.39), we have used the expressions (4.1.34) for f⁰_± in R¹_+ and R¹_−.


Substituting (4.1.39) into (4.1.38) and (4.1.37), we arrive at the equations

βy ∂f¹_±/∂y + (βy − a ∓ u_m) ∂f¹_±/∂z = z² − γ_1 + ε(B + N)z/(βy − a ∓ u_m)   (4.1.40)

(in Eqs. (4.1.40) we preserve only the most important terms in the functions c¹_±(y, z) and neglect the terms of order ε³ and higher).

In view of (4.1.15), both equations (4.1.40) hold on the boundary Γ¹. By solving these equations, we obtain the components of the gradient ∇f¹(y, z) of the loss function on the switching line Γ¹:

In this case, condition (4.1.20) (a necessary condition for a switching line of the first kind) leads to the equation

Hence, neglecting the higher-order terms, we obtain the following equation for the switching line Γ¹ in the first approximation:

Equation (4.1.43) allows us to calculate the stationary tracking error γ_1 in the first approximation. The function w_1(y, γ_1) readily follows from (4.1.25), (4.1.41), and (4.1.43). Substituting the expression obtained for w_1(y, γ_1) into (4.1.26), we see that γ_1 = O(ε²), that is, the stationary tracking error in the first approximation coincides with that in the zero approximation, namely, γ_1 = γ_0 = 0.

The stationary error γ attains nonzero values only in the second approximation. To calculate the derivative (4.1.25) with the desired accuracy, we need not calculate the loss function f¹_±(y, z) in the first approximation; it suffices to calculate c_2(y, z) in (4.1.12) and (4.1.13)


by using expressions (4.1.41) for the derivatives ∂f¹/∂y and ∂f¹/∂z, which hold along the switching line Γ¹.

Differentiating the first relation in (4.1.41), we obtain

∂²f¹/∂z² = ε(B + N)/(u_m² − (βy − a)²)   along Γ¹.   (4.1.45)

As follows from (4.1.41), the other second-order derivatives ∂²f¹/∂z∂y and ∂²f¹/∂y² on Γ¹ are infinitesimals of higher order and can be neglected when we calculate γ_2. Therefore, (4.1.45) and (4.1.13) yield the following approximate expression for the function c_2(y, z):

Taking (4.1.46) into account and solving the system (4.1.16), (4.1.17) (with k = 2) for ∂f²/∂y and ∂f²/∂z, we calculate the functions A²_y and A²_z in (4.1.44) as

From (4.1.26), (4.1.43), (4.1.44), and (4.1.47), we derive the equation for the stationary tracking error in the second approximation:

whence it follows that

Formula (4.1.48) exactly coincides with the stationary error (2.2.23) obtained for an input process homogeneous in y. The inhomogeneity, in other words, the dependence of the stationary error on the parameter β, begins to manifest itself only in the calculations of higher approximations. However, the drift coefficient −βy affects the position of the switching line (4.1.43) already in the first approximation. Formula (4.1.43) is a generalization of the corresponding formula (2.2.22); for β = 0 these formulas coincide.

Figure 31 shows the analog circuit diagram of the tracking system that realizes the optimal control algorithm in the first approximation. The unit NC is an inertialess nonlinear transformer governed by the functional


dependence (4.1.43). The practical realization of the unit NC is substantially simplified owing to the fact that the operating region of the input variable y (where (4.1.43) must be maintained) is small. In fact, it suffices to maintain (4.1.43) for |y| < Cε^{1/2}, where C is a positive constant of the order of O(1). Outside this region, the character of the functional input-output relation describing NC is of no importance. In particular, for |y| > Cε^{1/2}, the nonlinear transformer NC can be constructed by using the equation for the switching line Γ⁰ in the zero approximation or, which is even simpler, by using the equation z ≡ 0. This is due to the fact that the system shown in Fig. 31 optimizes only the stationary tracking conditions, when the phase variables fluctuate in a small neighborhood of the origin on the plane (y, z).

§4.2. Calculation of a quasioptimal system for tracking a discrete Markov process

As the second example illustrating the approximate synthesis procedure described above, we consider the problem of constructing an optimal system for tracking a Markov "telegraph signal" type process (a discrete process with two states) in the case where the measurement of the input signal is accompanied by a white noise and the plant is subject to random actions.

Figure 32 shows the block diagram of the system in question. We assume that y(t) is a symmetric Markov process with two states (y(t) = ±1) whose a priori probabilities p_t(±1) = P[y(t) = ±1] satisfy the equations


Here the number μ > 0 determines the intensity of transitions between the states y = +1 and y = −1 per unit time. The system (4.2.1) is a special case of system (1.1.49) with m = 2 and all transition intensities equal to μ. It readily follows from (4.2.1) that realizations of the input signal y(t) are sequences of random pulses; the lengths τ of these pulses and of the intervals between them are independent exponentially distributed random variables, P(τ > c) = e^{−μc}.
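For illustration, a realization of such a telegraph signal is easy to sample directly from this exponential holding-time description; the value of μ and the horizon below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, T_total = 2.0, 10.0        # transition intensity and horizon (illustrative)

# Sample one realization of the symmetric telegraph signal y(t) = +/-1:
# holding times between switches are independent Exp(mu) variables,
# P(tau > c) = exp(-mu*c).
switch_times, t = [], rng.exponential(1.0/mu)
while t < T_total:
    switch_times.append(t)
    t += rng.exponential(1.0/mu)

def y_at(t_query, y0=1.0):
    """Value of the sampled realization at time t_query."""
    n = np.searchsorted(switch_times, t_query)   # switches before t_query
    return y0 * (-1)**n

print(len(switch_times), "switches; expected about", int(mu*T_total))
print([y_at(tq) for tq in (0.0, 2.5, 5.0, 7.5)])
```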

The observable process ỹ(t) is an additive mixture of the input signal y(t) and a white noise (independent of y(t)) of intensity κ̃:

As in §4.1, the plant P is described by the scalar equation

where ξ(t) is the standard white noise independent of y(t) and ζ(t), and the controlling action is bounded in absolute value,

To estimate the system performance, we use the integral optimality criterion

where the penalty function c(y - x) is the same as in (4.1.4). In the method used here for solving problem (4.2.1)-(4.2.5), it is important that c(y - x) is a differentiable function. In the subsequent calculations, this function is quadratic, namely,

c(y − x) = (y − x)².   (4.2.6)


A peculiar feature of our problem, in contrast, say, with the problem studied in §4.1, is that the observed pair of stochastic processes (ỹ(t), x(t)) is not a Markov process. Therefore, as was already noted in §1.5, to use the dynamic programming approach, it is necessary to introduce a special space of states formed by sufficient coordinates that do possess the Markov property.

4.2.1. Sufficient coordinates and the Bellman equation. Let us show that the current value of the output variable x(t) and the a posteriori probability w_t(1) = P[y(t) = +1 | ỹ_0^t] are sufficient coordinates X_t in the problem considered. In the sequel, for purely technical reasons, it is more convenient to take, instead of w_t(1), the variable z_t = w_t(1) − w_t(−1) as the second component of X_t. It follows from the normalization condition w_t(1) + w_t(−1) = 1 that the a posteriori probabilities w_t(1) and w_t(−1) can be uniquely expressed via z_t as follows:

w_t(±1) = (1 ± z_t)/2.   (4.2.7)

Obviously, z_t varies randomly in time. Let us derive the stochastic equation describing the random function z_t = z(t). Here we shall consider a somewhat more general case of an input signal that is nonsymmetric with respect to probability. In this case, instead of (4.2.1), the a priori properties of y(t) are described by the equations

that is, the intensities of transitions from the state y = +1 to y = −1 (equal to μ) and from y = −1 to y = +1 (equal to ν) differ from each other.

Let us pass to discrete time. In this case, the random functions in (4.2.2) are replaced by sequences of random variables

where ỹ_n, y_n, and ζ_n are understood as the mean values of the realizations over the time-quantization interval Δ:


It follows from (4.2.8) (see also (1.1.42)) that the sequence y_n is a simple Markov chain characterized by the following four transition probabilities p_Δ(y_{n+1} | y_n):

(all relations in (4.2.11) hold up to terms of the order of o(Δ)). It follows from the properties of the white noise (1.1.31) that the random variables ζ_n corresponding to different indices are independent of each other and have the same probability densities

Using these properties of the sequences y_n and ζ_n, we can write recurrent formulas relating the a posteriori probabilities at successive time instants (with numbers n and n + 1) and the result of the last observation.

The probability addition and multiplication theorems yield the formulas

Taking into account the relation p(y_n = ±1, ỹ_1^n) = w_n(±1) p(ỹ_1^n), we can rewrite (4.2.13) and (4.2.14) as follows:

We write d_n = w_n(1)/w_n(−1) and note that (4.2.9) and (4.2.12) imply


Now, dividing (4.2.15) by (4.2.16) and taking into account (4.2.11), we obtain the following recurrent relation for the parameter d_n:

By letting the time interval Δ → 0 and taking into account (4.2.17) and the fact that lim_{Δ→0}(d_{n+1} − d_n)/Δ = ḋ_t, we derive the following differential equation for the function d_t = d(t):

Since, in view of (4.2.7), the functions z_t = z(t) and d_t satisfy the relation d_t = (1 + z_t)/(1 − z_t), Eq. (4.2.18) for z_t has the form

For a symmetric signal (μ = ν), instead of (4.2.19), we have

REMARK. According to (4.2.2), the observable process ỹ(t) contains a white noise, and the coefficients of ỹ(t) in (4.2.18)-(4.2.20) contain the random functions d_t = d(t) and z_t = z(t). It follows from §1.2 that, in this case, we must indicate in which sense we understand the stochastic integrals used for calculating the solutions of the stochastic differential equations (4.2.18)-(4.2.20). A more rigorous analysis (e.g., see [132, 175]) shows that all three equations (4.2.18)-(4.2.20) must be treated as symmetrized equations. In particular, it is precisely due to this fact that we can pass from Eq. (4.2.18) to Eq. (4.2.19) by using the standard rules for differentiating composite functions (instead of the more complicated differentiation rule (1.2.43) for solutions of Itô differential equations).
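As a numerical illustration of how the sufficient coordinate z_t can be generated from the observations, the sketch below integrates a filter of the form suggested by (4.2.20). Since the display of (4.2.20) is not reproduced in this copy, the right-hand side used here, dz = −2μz dt + κ̃^{-1}(1 − z²)(dỹ − z dt), is an assumption consistent with the symmetric case μ = ν; the Heun (midpoint) step is chosen because the equation is to be understood in the symmetrized (Stratonovich) sense.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, kappa = 2.0, 0.5                  # illustrative intensities
dt, n_steps = 1e-3, 200_000

y, z, sq_err = 1.0, 0.0, 0.0
for _ in range(n_steps):
    if rng.random() < mu*dt:          # telegraph signal switches with rate mu
        y = -y
    dY = y*dt + np.sqrt(kappa*dt)*rng.normal()   # observation increment

    # Assumed filter for z_t (cf. (4.2.20)), integrated by the Heun scheme,
    # consistent with the symmetrized (Stratonovich) interpretation:
    step = lambda zv: -2.0*mu*zv*dt + (1.0 - zv*zv)/kappa*(dY - zv*dt)
    z_pred = z + step(z)
    z = np.clip(z + 0.5*(step(z) + step(z_pred)), -1.0, 1.0)
    sq_err += (y - z)**2

print("mean-square error E(y - z)^2 ~", sq_err/n_steps)
```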

Now let us verify that the coordinates X_t = (x_t, z_t) are sufficient for the solution of the synthesis problem in question. To this end, according to [171] and §1.5, we need to verify that the coordinates X_t = (x_t, z_t) are sufficient

(1) for obtaining the conditional mean penalties


(2) for finding constraints on the set of admissible controls u;
(3) for determining their future evolution (that is, the probabilities of the future values X_{t+Δ}, Δ > 0).
In this problem, in view of (4.2.4), the set of admissible controls is the fixed interval −1 ≤ u ≤ 1 of the number axis, independent of anything; therefore, we need not take into account the statement of item (2).⁴

Obviously, the conditional mean penalties (4.2.21) can be expressed via the a posteriori probabilities as follows:

Since formulas (4.2.7) express the a posteriori probabilities w_t(±1) in terms of z_t, statement (1) is trivially satisfied for the variables (x_t, z_t).
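For the quadratic penalty (4.2.6), this expression can be verified symbolically: with w_t(±1) = (1 ± z)/2 from (4.2.7), the conditional mean penalty reduces to the polynomial x² − 2xz + 1 that reappears in the stationary Bellman equation (4.2.37) below.

```python
import sympy as sp

x, z = sp.symbols('x z')
# A posteriori probabilities (4.2.7): w(+1) = (1+z)/2, w(-1) = (1-z)/2.
w_plus, w_minus = (1 + z)/2, (1 - z)/2
c = lambda e: e**2                       # quadratic penalty (4.2.6)

mean_penalty = sp.expand(w_plus*c(1 - x) + w_minus*c(-1 - x))
print(mean_penalty)                      # -> x**2 - 2*x*z + 1
```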

Let us study the time evolution of (x_t, z_t). The variable x_t = x(t) satisfies an equation of the form (4.2.3). If in this equation the control u_t at time t is determined by the current values of (x_t, z_t), then, in view of the white noise properties, the probabilities of the future values of x(τ), τ > t, are completely determined by X_t = (x_t, z_t). Now let us consider Eq. (4.2.20). Note that, according to (4.2.2), ỹ(t) = y(t) + √κ̃ ζ(t), where y(t) is a Markov process and ζ(t) is a white noise. Therefore, it follows from Eq. (4.2.20) that the probabilities of the future values z_{t+Δ} are determined by z_t and the behavior of y(τ), τ ≥ t. However, since y(τ) is a Markov process, its behavior for τ > t is determined by the state y_t described by the probabilities w_t(y_t = ±1), that is, in view of (4.2.7), again by the coordinate z_t. Thus, statement (3) is proved for X_t = (x_t, z_t).

Equations (4.2.3) and (4.2.20) allow us to write the Bellman equation for the problem considered. Introducing the loss function

(4.2.23) and using the Markov property of the sufficient coordinates (x(t), z(t)), from (4.2.23) we obtain the basic functional equation of the dynamic programming approach:

F(t, x_t, z_t) = min_{|u(τ)|≤1, t≤τ≤t+Δ} E[∫_t^{t+Δ} c(y(τ) − x(τ)) dτ + F(t + Δ, x_{t+Δ}, z_{t+Δ}) | x_t, z_t].   (4.2.24)

⁴It is necessary to verify the statement of item (2) only in special cases in which the control constraints depend on the state of the control system. Such problems are not considered in this book.


The Bellman differential equation can be derived from (4.2.24) by the standard method (see §1.4 and §1.5) of expanding F(t + Δ, x_{t+Δ}, z_{t+Δ}) in the Taylor series around the point (t, x_t, z_t), averaging, and passing to the limit as Δ → 0. In this procedure, we use the following obvious formulas, which are consequences of (4.2.3), (4.2.7), and (4.2.20)-(4.2.22):

E[(x_{t+Δ} − x_t)(z_{t+Δ} − z_t) | x_t, z_t] = o(Δ),   (4.2.29)
E[(x_{t+Δ} − x_t)^k | x_t, z_t] = o(Δ),   E[(z_{t+Δ} − z_t)^k | x_t, z_t] = o(Δ),   k ≥ 3.   (4.2.30)

It is somewhat more difficult to calculate the mean value of the difference z_{t+Δ} − z_t. Since, as was already noted, (4.2.20) is a symmetrized stochastic equation, E[(z_{t+Δ} − z_t) | x_t, z_t] = E[(z_{t+Δ} − z_t) | z_t] can be calculated with the help of formulas (1.2.29) and (1.2.37) (with ν = 1/2 in (1.2.37)). Then, taking into account the relation

from (4.2.20) and (1.2.37), we obtain

As Δ → 0, relations (4.2.24)-(4.2.31) enable us to write the Bellman differential equation in the form

∂F/∂t + min_{|u|≤1}[u ∂F/∂x] − 2μz ∂F/∂z + (B/2) ∂²F/∂x² + ((1 − z²)²/(2κ̃)) ∂²F/∂z² + c̄(x, z) = 0,   (4.2.32)

where c̄(x, z) is the a posteriori mean penalty (4.2.22).


The second term in Eq. (4.2.32) can also be written as −|∂F/∂x|. To the equation obtained, we must add a condition on the loss function at the end of the control process, namely,

and some boundary conditions. Since the input signal takes one of the two values y(t) = ±1 at each instant of time t, we can restrict our consideration to the region |x| ≤ 1. Thus the sufficient coordinates are defined on the square −1 ≤ x ≤ +1, −1 ≤ z ≤ +1. The boundary conditions on the sides x = −1 and x = +1 of this square are

These conditions mean that there is no probability flow [11, 173] through the boundaries x = ±1.⁵

On the other sides z = ±1 of the square, the diffusion coefficient contained in the second diffusion term ((1 − z²)²/(2κ̃)) ∂²F/∂z² vanishes. Therefore, instead of the conditions ∂F/∂z = 0 on these sides of the square, we have the trivial conditions

If, by analogy with the problem solved in §4.1, we denote by R_+ and R_− the regions in the space of sufficient coordinates (x, z) where ∂F/∂x > 0 and ∂F/∂x < 0, respectively, then in these regions the nonlinear equation (4.2.32) is replaced by the corresponding linear equation, and the optimal control is formed by the rule

Since the first-order derivatives of the loss function are continuous [113, 175], on the interface Γ between R_+ and R_− we have

Solving the synthesis problem is equivalent to finding the interface Γ between R_+ and R_− (the switching line for the controlling action). A straightforward way to obtain the equation for the switching line Γ is to solve

⁵The condition (4.2.34) means that there are reflecting screens on the boundary segments (x = +1, −1 ≤ z ≤ +1) and (x = −1, −1 ≤ z ≤ +1) (for a detailed description of diffusion processes with phase constraints and various screens, see §6.2).


the original nonlinear equation (4.2.32) with the initial and boundary conditions (4.2.33)-(4.2.35) and then, on the plane (x, z), to find the geometric locus of points where the condition (4.2.36) is satisfied.

However, this method can be implemented only numerically. To solve the synthesis problem analytically, let us return to the approximate method used in §4.1.

4.2.2. Calculation of the successive approximations. Suppose that the intensity of the random actions on the plant is small but the error of measurement of the input signal is large. In this case, we can set B = εB_0 and κ̃ = κ_0/ε (where ε > 0 is a small parameter). We consider, just as in §4.1, the stationary tracking operating conditions. Then for the quadratic loss function (4.2.6), the Bellman equation (4.2.32) takes the form

min_{|u|≤1}[u ∂f/∂x] − 2μz ∂f/∂z + (εB_0/2) ∂²f/∂x² + (ε(1 − z²)²/(2κ_0)) ∂²f/∂z² + x² − 2xz + 1 = γ   (4.2.37)

(here f = f(x, z) is the stationary loss function defined just as in (4.1.7), and γ is the stationary tracking error).

Introducing the special notation f_+ and f_− for the loss function f in R_+ and R_−, we can replace the nonlinear equation (4.2.37) by the pair of linear equations

each of which is valid only in one of the regions (R_+ or R_−) on the phase plane (x, z).

We shall solve Eqs. (4.2.38) by the method of successive approximations considered in §4.1. In this case, instead of (4.2.38), we need to solve a number of simpler equations that successively approximate the original equations (4.2.38). By setting ε = 0 in (4.2.38), we obtain the zero-approximation equations

The next approximations are calculated according to the scheme


By solving the equations of the kth approximation (k = 0, 1, 2, . . . ), we obtain the set f^k_±(x, z), Γ^k, γ_k consisting of approximate expressions for the loss function, the switching line, and the stationary tracking error.

In what follows, we solve the synthesis problem in the first two approximations, the zero and the first.

The zero approximation. Let us consider Eqs. (4.2.39). By analogy with §4.1, the equation for the interface Γ⁰ between R⁰_+ and R⁰_−, on which both equations for f⁰_+ and f⁰_− hold, and the stationary tracking error γ_0 can be found without solving Eqs. (4.2.39). Indeed, using the condition that the gradient ∇f^k (see (4.1.15)) is continuous on the switching line Γ^k,

we obtain from (4.2.39) the following components of the gradient ∇f⁰ along Γ⁰:

The condition

which is necessary for the existence of a switching line of the first kind (see (4.1.20)), together with (4.2.42) implies that the line

z = x   (4.2.44)

is a possible Γ⁰ for the zero approximation. An analysis of the phase trajectories of the deterministic system

shows that the trajectories actually approach the line (4.2.44) on both sides⁶ provided that 2μ < 1. In what follows, we assume that this condition is satisfied.

The stationary error is obtained from the condition that the derivative df⁰/dx calculated along Γ⁰ is finite at the stable point (e.g., at the origin x = 0,

⁶In the first equation in (4.2.45), the sign "+" corresponds to the region z > x and the sign "−" to z < x.


z = 0) (see (4.1.25) and (4.1.26)). In view of (4.2.42) and (4.2.44), df⁰/dx = (1 − x² − γ_0)/(2μx) along Γ⁰. The condition (4.1.26) in this case has the form lim_{x→0} 2μx (df⁰/dx) = 0, which implies γ_0 = 1.

Now, to solve Eq. (4.2.39), we write the characteristic equations

To solve (4.2.46) uniquely, it is necessary to pose an additional "initial" condition (to pose the Cauchy problem) for the loss function f(x, z). This condition follows from (4.2.42) and (4.2.44). The second relation in (4.2.42) implies that f⁰(z, z) = −(z²/4μ) + f⁰(0, 0) on the line (4.2.44). Without loss of generality, we can set f⁰(0, 0) = 0. Thus, among the solutions f⁰_± obtained from (4.2.46), we choose the solution satisfying the condition

on the line z = x. We readily obtain this solution

where x_0 = X_±(x, z) and the functions X_± are determined as solutions of the equations

X_± e^{∓2μX_±} = z e^{∓2μx}.   (4.2.49)

The first approximation. Now, using (4.2.48), we can find the switching line Γ¹ in the first approximation. Relations (4.2.40) and (4.2.41) allow us to write the components of the gradient ∇f¹ on the line Γ¹:

Differentiating (4.2.48) and using the relations


that follow from (4.2.49), we find the components

Substituting (4.2.51) into (4.2.50), we obtain

Using again the condition (4.2.43), we find Γ¹. The derivatives of the functions A¹_x and A¹_z are calculated with regard to the fact that the difference between the position of the switching line Γ¹ in the first approximation and the position of Γ⁰ determined by (4.2.44) is small. Therefore, after the differentiation of (4.2.52), we can use the replacement X_+ = X_− = z = x. If this replacement is performed only in the terms of the order of ε, then the error caused by it is an infinitesimal of higher order.


Taking into account this fact, we obtain from (4.2.52):

Hence, using (4.2.43), we obtain the equation for the switching line Γ¹:

The position of Γ¹ on the plane (x, z) depends on the values of μ, κ_0, and B_0. Figure 33 shows one of the possible switching lines and the phase trajectories of system (4.2.45).

By analogy with the zero approximation, we find the stationary tracking error γ_1 from the condition that the gradient (4.2.52) is finite at the origin. By letting z → 0 and x → 0 in (4.2.52) and taking into account the fact that X_+ and X_− tend to zero just as x and z, we obtain

Hence it follows that the stationary error in the first approximation depends on the noise intensity at the input of the system shown in Fig. 32 but is independent of the noise in the plant.


Using the equation (4.2.53) for the switching line and Eq. (4.2.20), we construct the analog circuit (see Fig. 34) of a quasioptimal tracking system in the first approximation. The unit SC indicated by the dotted line produces the sufficient coordinate z(t); the unit NC is an inertialess transducer that realizes the functional dependence on the right-hand side of (4.2.53). If the small parameter contained in the problem satisfies ε ≪ 1, then the output variable x(t) fluctuates mostly in a small neighborhood of zero. In this case (|x(t)| ≪ 1), as follows from (4.2.53), the nonlinear unit NC can be replaced by a linear amplifier with the amplification factor


CHAPTER V

CONTROL OF OSCILLATORY SYSTEMS

The present chapter deals with some synthesis problems for optimal systems with quasiharmonic plants. Here the term "quasiharmonic" means that the plant dynamics is close to harmonic oscillations in the process of control. In this case, over each time interval of length 2π, the phase trajectories of the second-order systems considered in this chapter are close to circles on the plane (x, ẋ).

There exists an extensive literature on the methods for studying such systems, including controlled ones (e.g., see [2, 19, 27, 33, 69, 70, 136, 153, 154] and the references therein). These methods are based on the idea (going back to Poincaré) that the motion in oscillatory systems can be divided into "fast" and "slow" motions. This idea, along with the averaging method [2], enables one to derive equations for "slow" variables that can readily be integrated. These equations are usually derived by different versions of the method of successive approximations.

Various approximate methods based on the first-approximation equation for slowly varying variables play an important role in industrial engineering. For the first time, such a method for studying nonlinear oscillatory systems was proposed by van der Pol [183, 184] (the method of slowly varying amplitudes). Among other first-approximation methods, we also point out the "mean steepness" method [2] and the harmonic balance method [69, 70], which is widely used in engineering calculations of automatic control systems.

More precise results can be obtained by regular asymptotic methods, the most important of which is the asymptotic Krylov-Bogolyubov method [19]. Originally, this method was developed for studying nonlinear oscillations in deterministic uncontrolled systems. Later on, this method was also used for the investigation of stochastic [109, 173] and controlled [33] oscillatory systems. In the present chapter, the Krylov-Bogolyubov method is also widely used for constructing quasioptimal control algorithms.

This chapter consists of four sections, in which we consider four special problems of optimal damping of oscillations in quasiharmonic second-order systems with constrained controlling actions. In the first two sections (§5.1


and §5.2) we consider deterministic problems; the other two sections (§5.3 and §5.4) deal with stochastic synthesis problems.

First, in §5.1 we study the control problem for an arbitrary quasiharmonic oscillator with one degree of freedom. We describe a method for solving the synthesis problem approximately. In this method, the minimized functional and the equation for the switching line are presented as asymptotic expansions in powers of a small parameter contained in the problem. The method of approximate synthesis is illustrated by examples of solving the optimal control problems for a linear oscillator and a nonlinear van der Pol oscillator. In §5.2 we use the method considered in §5.1 for solving the control problem for a system of two biological populations, namely, the "predator-prey" model described by the Lotka-Volterra equation (see §2.3). We study a special Lotka-Volterra model with a "poorly adapted predator." In this case, the sizes of both interacting populations obey quasiharmonic dynamics. Next, in §5.3, we consider the stochastic version of the problem studied in §5.1. We consider an asymptotic synthesis method that allows us to construct quasioptimal control systems with an oscillatory plant subject to additive random disturbances. Finally, in §5.4, the method considered in §5.3 is generalized to the case of indirect observation, when the measurement of the current state of the oscillator is accompanied by a white noise.

§5.1. Optimal control of a quasiharmonic oscillator. An asymptotic synthesis method

According to [2], a mechanical system with one degree of freedom is called a quasiharmonic oscillator if its behavior is described by the system of the form

ẋ_1 = x_2 + εχ_1(x_1, x_2, u),
ẋ_2 = −x_1 + εχ_2(x_1, x_2, u),   (5.1.1)

where x_1 and x_2 are the phase coordinates, χ_1 and χ_2 are sufficiently arbitrary (in the general case, nonlinear) functions of their arguments, u = (u_1, . . . , u_r) is an r-dimensional vector of controlling actions subject to various restrictions, and the number ε is a small parameter.

It follows from (5.1.1) that for ε = 0 the general solution of system (5.1.1) is a union of two harmonic oscillations

x_1(t) = a sin(t + α),   x_2(t) = a cos(t + α),   (5.1.2)

¹The only assumption is that, for the given functions χ_1 and χ_2, the Cauchy problem for system (5.1.1) has a unique solution in a chosen domain D in the space of the variables (t, x_1, x_2) (see §1.1).


with the same period T = 2π and the phase shift Δφ = π/2. Note that, in the phase plane (x_1, x_2), the trajectory corresponding to the solution (5.1.2) is a circle of radius a. If ε ≠ 0 but is sufficiently small, then, in view of continuity, the difference between the solution of system (5.1.1) and the solution (5.1.2) is small on a time interval that is not too large. More precisely, if for ε ≠ 0 we seek the solution of system (5.1.1) in the form

then the "amplitude" increment Aa = a( t + 27r) - a(t) and the "phase" increment A a = a ( t + 27r) - a ( t ) are small during time T = 27r, that is, Aa N E and A a E. This fact justifies the term "quasiharmonic" for systems of the form (5.1.1) and serves as a basis for the elaboration of various asymptotic methods for the analysis of such systems.

5.1.1. Statement of the problem. In the present section we consider controlled oscillators whose behavior is described by an equation of the form

ẍ + εχ(x, ẋ)ẋ + x = εu,   (5.1.3)

where χ(x, ẋ) is an arbitrary given function (nonlinear in the general case) that is centrally symmetric, i.e., χ(x, ẋ) = χ(−x, −ẋ). In the phase variables x_1, x_2 (determined, as usual, by x_1 = x and x_2 = ẋ), we can replace Eq. (5.1.3) by the following equivalent system of first-order equations:

ẋ_1 = x_2,   ẋ_2 = −x_1 − εχ(x_1, x_2)x_2 + εu,   (5.1.4)

hence it follows that the oscillator (5.1.3) is a special case of the oscillator (5.1.1) with χ_1 ≡ 0 and χ_2(x_1, x_2, u) = u − χ(x_1, x_2)x_2.

It should be noted that equations of the form (5.1.3) describe a wide class of controlled plants of various physical nature: mechanical (the Froude pendulum [2]), electrical (vacuum-tube and semiconductor generators of harmonic oscillations [2, 19, 183, 184]), electromechanical (remote tracking systems for angle reconstruction [2]), etc. Numerous examples of actual systems mathematically modeled by Eq. (5.1.3) can be found in [2, 19, 136].

For the controlled oscillator (5.1.3), we shall consider the following optimal control problem with a free right-hand trajectory endpoint.

We assume that the absolute value of the admissible (scalar) control u = u(t) is bounded at each time instant t:

|u(t)| ≤ u_m,   (5.1.5)


and the goal of control for system (5.1.3) is to minimize the integral functional

I[u] = ∫_0^T c(x(t), ẋ(t)) dt → min_{|u(t)|≤u_m, 0≤t≤T}   (5.1.6)

over the trajectories {x(t) = x^u(t): 0 ≤ t ≤ T} of system (5.1.3) that correspond to all possible controls u satisfying (5.1.5). The time interval [0, T] and the initial state of the oscillator x(0) = x_1(0) = x_{10}, ẋ(0) = x_2(0) = x_{20} are given. The penalty function c(x, ẋ) = c(x_1, x_2) in (5.1.6) is assumed to be nonnegative, symmetric with respect to the origin, c(x_1, x_2) = c(−x_1, −x_2), and vanishing only at the point (x_1 = 0, x_2 = 0). In this case, the optimal control u_* minimizing the functional (5.1.6) is sought in the synthesis form u_* = u_*(t, x_1(t), x_2(t)).

Problem (5.1.3)-(5.1.6) is a special case of problem (1.3.1)-(1.3.3) considered in §1.3. Therefore, if we define the function of minimum future losses

F(t, x_1, x_2) = min_{|u(τ)|≤u_m, t≤τ≤T} [∫_t^T c(x_1(τ), x_2(τ)) dτ | x_1(t) = x_1, x_2(t) = x_2]   (5.1.7)

in the standard way and use the standard derivation procedure described in §1.3, then, for the function (5.1.7), we obtain the Bellman differential equation

−∂F/∂t = x_2 ∂F/∂x_1 − (x_1 + εχ(x_1, x_2)x_2) ∂F/∂x_2 + min_{|u|≤u_m} [εu ∂F/∂x_2] + c(x_1, x_2),
0 ≤ t < T,   F(T, x_1, x_2) = 0,   (5.1.8)

that corresponds to problem (5.1.3)-(5.1.6).

Equation (5.1.8) allows us to obtain some general properties of the optimal control in the synthesis form u_*(t, x_1, x_2), which we shall use later. Indeed, it follows from (5.1.8) that the optimal control u_*, for which the expression in the square brackets attains its minimum, is a relay-type control and can be written in the form

u_*(t, x_1, x_2) = −u_m sign (∂F/∂x_2)(t, x_1, x_2).   (5.1.9)

REMARK 5.1.1. Rigorously speaking, the optimal control in this problem is not unique. This is related to the fact that at the points (t, x_1, x_2) where ∂F(t, x_1, x_2)/∂x_2 = 0 the optimal control u_* is not uniquely determined by Eq. (5.1.8). On the other hand, one can see that at the points


(t, x_1, x_2) where ∂F/∂x_2 = 0, the choice of any control u⁰ lying in the admissible region [−u_m, u_m] does not affect the value of the loss function F(t, x_1, x_2) that satisfies the Bellman equation. Therefore, in particular, the control (5.1.9), which requires the choice of u_* = 0 at the points (t, x_1, x_2) where ∂F(t, x_1, x_2)/∂x_2 = 0,² is optimal.

Using (5.1.9), we can rewrite the Bellman equation (5.1.8) in the form

0 ≤ t < T,   F(T, x_1, x_2) = 0.   (5.1.10)

It follows from (5.1.10) and the central symmetry of χ(x_1, x_2) and c(x_1, x_2) that the loss function (5.1.7) satisfying (5.1.10) is centrally symmetric with respect to the phase coordinates, namely, F(t, x_1, x_2) = F(t, −x_1, −x_2).

Therefore, for any t, x_1, x_2 we have

It follows from this relation and (5.1.9) that the optimal control algorithm u_*(t, x_1, x_2) has the important property of being antisymmetric, namely,

The facts that the optimal control in problem (5.1.3)-(5.1.6) is of relay type (5.1.9) and antisymmetric (5.1.11) play an important role in the asymptotic synthesis method discussed in the sequel.

We also note that the optimal control algorithm in problem (5.1.3)-(5.1.6) can be simplified significantly if we consider the optimal control of system (5.1.3) on an infinite time interval. In this case, the upper limit of integration in (5.1.6) is T → ∞, and, instead of (5.1.7), we have the time-independent³ loss function

f(x_1, x_2) = min_{|u(τ)|≤u_m, τ≥0} [∫_0^∞ c(x_1(τ), x_2(τ)) dτ | x_1(0) = x_1, x_2(0) = x_2]   (5.1.12)

²Recall that the discontinuous function sign x is determined by the relation

3The loss function (5.1.12) is time-independent, since the plant equations (5.1.4) are time-invariant.


and, instead of (5.1.9), we have a time-invariant control algorithm of the form

u_*(x_1, x_2) = −u_m sign (∂f/∂x_2)(x_1, x_2).   (5.1.13)

In what follows we shall consider just such a time-invariant version of the optimal control problem (5.1.3)-(5.1.6) on an infinite time interval.

REMARK 5.1.2. As T → ∞, problem (5.1.3)-(5.1.6) makes sense only if there exists an admissible control ū(x_1, x_2) in the synthesis form ensuring the convergence of the improper integral⁴

I^ū(x_1, x_2) = ∫_0^∞ c(x_1^ū(t), x_2^ū(t)) dt,   (5.1.14)

where x_1^ū(t) and x_2^ū(t) denote the solutions of system (5.1.4) with the control ū and the initial conditions x_1(0) = x_1 and x_2(0) = x_2. At the same time, for some constraints of the form (5.1.5) imposed on the admissible controls and for some nonlinear functions χ(x_1, x_2) in (5.1.3), (5.1.4), it may happen that none of the admissible controls u ensures the convergence of the integral (5.1.14). For example, if χ(x_1, x_2) = x_1² − 1, then system (5.1.3) is a controlled van der Pol oscillator. It is well known [2, 183, 184] that undamped quasiharmonic auto-oscillations arise in such systems for u ≡ 0. Moreover, this auto-oscillating process is stable with respect to small disturbances affecting the oscillator. Therefore, for sufficiently small u_m in (5.1.5), no admissible control can "suppress" the auto-oscillations in the oscillator (5.1.3). In turn, in view of the properties of the penalty function c(x_1, x_2), it follows that the integral (5.1.14) does not converge.

Everywhere in the sequel, we assume that the parameters of problem (5.1.3)-(5.1.6) are chosen so that this problem has a solution as T → ∞. The solvability conditions for problem (5.1.3)-(5.1.6) as T → ∞ will be studied in more detail in Section 5.1.4.

5.1.2. Equations for the amplitude and the phase. Reduction of the synthesis problem. To study quasiharmonic systems of the form (5.1.1) and (5.1.3), it is convenient to describe the current state of the system by using, instead of the coordinate x_1 and the velocity x_2, the polar coordinates A and φ, which have the meaning of the "amplitude" and the "phase" of almost harmonic oscillations. We can pass to the new coordinates by the formulas

x_1 = A cos Φ,   x_2 = −A sin Φ,   Φ = t + φ.   (5.1.15)

⁴It also follows from the properties of the penalty function c(x_1, x_2) that the control ū(x_1, x_2) guarantees the asymptotic stability of the trivial solution x_1(t) = x_2(t) ≡ 0 of system (5.1.4).


The change of variables (5.1.15) transforms system (5.1.4) into the following equations for the slowly changing amplitude and phase (equations in the normal form [2, 19, 136]):

Ȧ = εG(A, Φ, u),   φ̇ = εH(A, Φ, u),   (5.1.16)

where

G(A, Φ, u) = χ_A(A, Φ) − u_s(A, Φ),   H(A, Φ, u) = χ_φ(A, Φ) − (1/A) u_c(A, Φ),
χ_A(A, Φ) = (A/2)(cos 2Φ − 1) χ(A cos Φ, −A sin Φ),
χ_φ(A, Φ) = −(sin 2Φ/2) χ(A cos Φ, −A sin Φ),   (5.1.17)
u_s(A, Φ) = u(A, Φ) sin Φ,   u_c(A, Φ) = u(A, Φ) cos Φ.

Since the optimal control is of relay type (5.1.9), (5.1.13) and antisymmetric (5.1.11), for the control function u(A, Φ) in (5.1.17) we can immediately write

u(A, Φ) = u_m sign [sin (Φ − ψ(A))].   (5.1.18)

Note that, in view of the change of variables (5.1.15), controls of the form (5.1.18) are automatically of relay type and antisymmetric on the phase plane (x_1, x_2). The function ψ(A) in (5.1.18) determines (in the polar coordinates) the equation of the switching line for the controlling action. Thus, in this case, the synthesis problem is equivalent to the problem of finding the function ψ(A) that minimizes a given optimality criterion. The function ψ(A) is calculated by the method of successive approximations presented in Section 5.1.4.
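The effect of a control of the form (5.1.18) can be checked by direct simulation. The sketch below uses the van der Pol nonlinearity and the simplest switching curve ψ(A) ≡ 0 (an illustrative choice, not the optimal ψ constructed later); with u_m large enough, the auto-oscillations are damped and the amplitude decays.

```python
import numpy as np
from scipy.integrate import solve_ivp

eps, u_m = 0.1, 1.0                       # illustrative parameters

def rhs(t, s):
    x1, x2 = s
    Phi = np.arctan2(-x2, x1)             # phase from (5.1.15)
    u = u_m*np.sign(np.sin(Phi))          # relay control (5.1.18), psi(A) = 0
    chi = x1**2 - 1.0                     # assumed van der Pol nonlinearity
    return [x2, -x1 - eps*chi*x2 + eps*u]

sol = solve_ivp(rhs, (0.0, 60.0), [3.0, 0.0], max_step=0.01)
A = np.hypot(sol.y[0], sol.y[1])
for T in (0.0, 30.0, 60.0):
    print(f"A(t={T:>4}) = {A[np.searchsorted(sol.t, T)]:.3f}")
```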

It is well known [2, 19, 33] that for a sufficiently small parameter ε, instead of Eqs. (5.1.16), one can use some other auxiliary equations, which are constructed according to certain rules and are called truncated equations. These equations allow one to obtain approximate solutions of the original equations in a rather simple way (the accuracy is the higher, the smaller the parameter ε).⁵

In the simplest case, the truncated equations

5Here we do not justify the approximatingproperties of the solutions constructed with the help of truncated equations. A detailed discussion of these problems can be found in numerous textbooks and monographs devoted to the theory of nonlinear oscillations (e.g., see 12, 19, 33, 1361).

Page 271: Optimal Design of Control Systems Stochastic and Deterministic Problems

254 Chapter V

are obtained from (5.1.16) by neglecting the vibrational terms in the ex- pressions for G(A, @, u) and H(A, @, u) or, which is the same, by averaging the right-hand sides of Eqs. (5.1.16) over the "fast phase" Q, while the amplitude A is fixed,6 namely,

A higher accuracy of approximation to the solution of system (5.1.16) is ensured by the regular asymptotic Krylov-Bogolyubov method [19, 1731, in which the vibrational terms on the right-hand sides of Eqs. (5.1.16) are eliminated by the additional change of variables

where

denote purely vibrational functions such that

iT 12R v(A*, Q*) dm* = 0, v(A*,@*) = -

2i 12' "(A*, a*) dm* = 0. "(A*,@*) = -

By the change of variables (5.1.21), we obtain the following equations for the nonvibrational amplitude A* and phase JO* from (5.1.16):

A* = EG* (A*) = EGT(A*) + E~G;(A*) + E3 . . . , (5.1.23)

@* = EH*(A*) = EH;(A*) +E~H;(A*) + E ~ . . .

In this case, the successive terms G;, H;, G;, H;, . . . , v1, "1, v2, "2,. . . of the asymptotic series (5.1.23) and (5.1.22) are calculated recurrently by the method of successive approximations.

'This method for obtaining truncated equations is often called the method of slowly varying amplitudes or the v a n der Pol method.

Page 272: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 255

Let us illustrate this method. By using (5.1.21), we can write (5.1.16) in the form

Substituting (5.1.22) and (5.1.23) into (5.1.24) and retaining only the terms of the order of E in (5.1.24), we obtain the first-approximation relations

awl 1 H;(A*) + -(A*, a * ) = H(A*, a * ) = x,(A*, a * ) - -u,(A*, a * ) . a@* A* (5.1.25)

Now, by equating the nonvibrational and purely vibrational terms on the left and on the right in (5.1.25), we obtain the following expressions for the first terms of the asymptotic series (5.1.23) and (5.1.22):

G;(A*) = x A (A*, a * ) - u,(A*, a * ) = G(A*),

1 (5.1.26) H;(A*) = xP(A*, a*) - ;u,(A*, a * ) = H(A*),

A

( A * a*) = * ) - ] d f - ( A * a*), (5.1.2'7)

where

Q, (A*, a * ) = [u, (A*, @I) - ?&I da'. 6; In (5.1.26)-(5.1.28), as usual, the bar over an expression indicates the av- eraging over the period, that is, & J:*. . . d@*; the lower integration limits

and are chosen so that the functions vl(A*, a*) and wl(A*, @*), determined by (5.1.27) and (5.1.28), be "purely vibrational" in the vari- able a*.

In a similar way, we can calculate the next terms Ga(A*), H,*(A*), v2(Ar, a * ) , . . . of the asymptotic expansions (5.1.23) and (5.1.22). So, to

Page 273: Optimal Design of Control Systems Stochastic and Deterministic Problems

256 Chapter V

calculate the functions Ga, H,+, 212, w2 in (5.1.24), we need to retain the ex- pressions of the order of E ~ . Then (5.1.24) implies the second-approximation relations

8% avl dv2 G;(A*) + -G;(A*) + -H; (A*) + a, d ~ * aa*

dG dG = v17(A*,@*) + wi;(A*,@*),

d A d@ (5.1.29) awl awl dw2 H; (A*) + --G;(A*) + mH; (A*) + dA*

In its turn, each equality in (5.1.29) splits into two separate relations for the nonvibrational and vibrational terms contained in (5.1.29), respectively. This allows us to calculate the four functions GZ(AS), H;(A*), v2(A*, a*), and wg(A*, a*). In particular, for the nonvibrational terms, the first equal- ity in (5.1.29) implies

Using (5.1.17), (5.1.27), and (5.1.28), we can write the right-hand side of (5.1.30) in more details as follow^:^

where the expression

g; (A*) = - n,) dm* + ( - ) a * (5.1.32)

indicates the control-independent terms. We do not write out the expressions for H,+ (A*), v2(A*, a * ) , . . . since we

do not need them in the sequel.

or brevity, we omit the arguments ( A * , a*) of the functions x , , xP, us, and Q, in (5.1.31) and (5.1.32).

Page 274: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 257

5.1.3. Auxiliary formulas. The functions G; (A*), HI (A*), Ga (A*), Hz(A*), . . . that form the asymptotic series in (5.1.23) depend on the choice of the control algorithm u(A, a ) , that is, in view of (5.1.18), on the func- tion cp,(A). It follows from (5.1.26) and (5.1.31) that we can write this dependence explicitly if we know the expressions

- - 8% duS 8% us, u,, aA sin n@, - cosna, Qs -, d a d A (5.1.33)

auS Q-, Qc s i nna , 9, cos n@. aa

The average values (5.1.33) can readily be calculated by using (5.1.18), the properties of the S-function, and the fact that the functions u,(A, a), u,(A, a), Qs (A, a), and +,(A, @) are periodic (with respect to @).

1. If, for definiteness, we assume that 0 5 cpr 5 r / 2 , then it follows from (5.1.17) and (5.1.18) that

u, sign [sin (a - c p , ( ~ ) ) ] sin @ d@

"+qr(A) s i n a d @ + / s i n a d @ - J2" sin dm

Y,(A) "+Y,(A) 2u, = - cos pr(A).

I 7r

(5.1.34)

One can readily see that formula (5.1.34) remains valid for any cp,(A) such that -7r < cp,(A) 5 7r.

2. In a similar way, we obtain

2% u, sign [sin (a - cp, (A))] cos @ d@ = - - sin cpp(A) 7r

and the relation U,U,= 0,

which we shall use later.

3. Using the formal relation

d - sign x = 2S(x) dx

Page 275: Optimal Design of Control Systems Stochastic and Deterministic Problems

Chapter V

and formula (5.1. IS), we can write

au, a - = - [u(A, @) sin @I dA dA

a = -{urn sign [sin (0 - pr (A))] sin @) a A

Using (5.1.37) and the properties of the &function, after the integration and some elementary calculations, we obtain

[cos(n + l)pr - cos(n - l ) p r ] for even n , for odd n.

4. By the straightforward integration with regard to

8% - a0 = 2urn6[sin(0 - p,)] cos(@ - p,) sin cP + u, sign[sin(0 - pr)] cos 0,

we obtain

[* ~ i n ( n + l ) ~ , - sin(n - l ) p r ] for even n , for odd n.

5. Since \E, (A, cP) and du, (A, @)/dA are periodic functions, we have

Next, using (5.1.27) and (5.1.37), we arrive a t

where

&[sin(@' - pr)] cos(@' - p,) sin 0 ' d0 '

Page 276: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems

It follows from (5.1.40) that the choice of does not affect the value of

Q,%. Hence we set = 0. Furthermore, if we consider 0 5 cp, 5 n-, then the piecewise constant function F ( @ ) in (5.1.41) has jumps of value sinp, a t the points cpr and n- + cp, as shown in Fig. 35. For this function F(@), one can readily calculate F and q, namely,

2 sin cp,.

sin pr

These relations, (5.1.34), (5.1.40), and (5.1.41) imply

- - 2u& dcp 1 - 3 (5 sin 2pp - - sin f p , + sin cp, n- d A n- 2

0 'f, 7T n - + ' f , 27T @

- - - - - - - - - - -

.- -

Carrying out similar calculations for - 7 ~ 5 cp, 5 0 and comparing the result with the last formula, we finally obtain

I

I I I I

6. Using the relation

I I I 1 I I I I

*

Page 277: Optimal Design of Control Systems Stochastic and Deterministic Problems

260 Chapter V

and expressions (5.1.34)-(5.1.36), we obtain

du, 2u2 Qc- = --sin2pr. aa 7r2

7. The relation 1

Q, sin n@ = -u, cos n 9 (5.1.44) n

allows us to reduce the calculation of the desired mean value to finding a simpler expression u, cos n a . Using (5.1.17) and (5.1.18) and performing some simple calculations, we obtain

1 n+l sin(n + 1 ) ~ ~ + 5 sin(n - l ) y , for even n,

for odd n.

8. The value Q, co sn9 can readily be obtained by using the obvious relation

1 au, Q, cos n@ = - - - cos n 9

n2 d@

and formula (5.1.39). The expressions obtained for the average values (5.1.33) will be used

later for solving the synthesis problem.

5.1.4. Approximate solution of the synthesis problem. Now let us return to the basic problem of minimizing the functional (5.1.14). By choosing the nonvibrational amplitude and phase as the state variables, we rewrite (5.1.14) in the form8

( A * * = ] c* (A;, a;) d t ,

where c*(A*, 9*) is obtained from the penalty function c(xl, x2) by the change of variables (5.1.15), (5.1.21).

Note that the functional (5.1.46), treated as a function of the initial state (A*, a * ) , is a periodic function in the second variable, namely, I(A* , a * ) =

'The value of the functional (5.1.46) depends both on the initial state A*(O) = A*, @ * ( O ) = @* of the system and on the control algorithm u(A;, Q;): 0 < t < oo. There-

fore, for the functional (5.1.46) it is more correct to use the notation IU(~;,*:)(A*, @*) or I ' ~ ( ~ * ) ( A * , a*) (which, in view of (5.1.18), is the same). However, for simplicity, we write I(A*, a*) .

Page 278: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 261

I (A*, @* + 27~). Therefore, taking into account (5.1.21) and the second equation in (5.1.23), we obtain

from (5.1.46). In (5.1.47) the integration over the period is performed along a trajectory of the system, and hence the amplitude A; is treated as a function of the fast phase @ f . This function Af (a;) is determined by the relation

that follows from Eqs. (5.1.23). Note that the amplitude increment AA* = A* (a* + 27~) - A*(@*) during

the period is, in view of (5.1.23), a small variable (of the order of E). By using this fact and the Taylor expansion of the left-hand side of (5.1.47), for the derivative dI(A*, @*)/dA* we obtain the following power series in the small parameter E:

1 d 2 1 ( ~ * , a*) - -

2 dA*2 AA* - . .

Since AA* = €G;(AS)2.rr in the first approximation with respect to E, it follows from (5.1.49) that

& dI(A*, a*) - c* (A*)

- -- dA* G;(A*) + € . . . .

where - C* (A*) = - '*(A*, a,') d@F, (5.1.51)

and the function G;(A*) = G;(A*, (pr(A*)) is determined by (5.1.26), (5.1.17), and (5.1.34).

Calculating the right-hand side of (5.1.49) with a higher accuracy (in this case, to calculate the last term in (5.1.49), we need to differentiate (5.1.50)), we obtain

& dI(A*, a*) - c* (A*) dc*

- - dA*

- €-(A*, @f)(@; - a*) Gf (A*) + EG; (A*) dA*

dc* (A* ) - E7T-

dA* + c 2 . . . ,

Page 279: Optimal Design of Control Systems Stochastic and Deterministic Problems

262 Chapter V

where, just as in (5.1.51), the bar over a letter indicates the averaging over the period with respect to @:, and the function GB(A*) is determined by (5.1.31).

Let us write the functional to be minimized as follows:

(note that, by the assumptions of the problem considered, we can set AT, = I ( A 2 ) = 0).

It follows from (5.1.53) that, to minimize the functional (5.1.46), it suf- fices to find the minimum of the derivative a I (A*, @*)/dAS for an arbitrary current state (A*, @*) of the control system. The accuracy of this minimiza- tion procedure depends on the number of terms retained in the expansion of the right-hand side of (5.1.49) in powers of E. Let us perform the corre- sponding calculations for the first two approximations.

According to (5.1.50), to minimize the functional (5.1.46) in the first approximation in E, it suffices to minimize (in cpr) the expression

Since the penalty function c(x, i ) = c(xl, 2 2 ) is nonnegative, we have i?*(A*) > 0 for A* # 0. Therefore, to minimize (5.1.54), it suffices to minimize the function G; (A*, v r ( ~ * ) ) . In its turn, it follows from (5.1.17) and (5.1.26) that GT attains its minimum value for the maximum value of

- 1 27r u. = u, (A*, m*) = - j U(A* , a*) sin a* d@*

2~ 0

This fact and (5.1.5) readily imply that the optimal control ul(A*, a*) in the first approximation must have the form

Cornparing (5.1.55) and (5.1.18), we see that cpr(A*) z 0 in the first approximation in E. This means that, in this case, the switching line of the control coincides with the abscissa axis on the phase plane (xl = x, x2 = 2). Indeed, if, instead of the amplitude A* and the phase @*, we take the coordinate x and the velocity 2 as the state variables, then it follows from (5.1.15), (5.1.21), and (5.1.55) that, in this approximation, the optimal control of the oscillator (5.1.3) is ensured by the synthesis function of the form

u1 (x, 2) = -urn sign 2. (5.1.56)

Page 280: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 263

From the mechanical viewpoint, this result means that, to obtain the optimal damping of oscillations in the oscillator (5.1.3), we must apply the maximum admissible controlling force (the torque) and this force (the torque) must always be opposite to the velocity (the angular velocity) of the motion. It must also be emphasized that the control algorithm in the first approximation is universal, since it depends neither on the nonlinear characteristics of the oscillator (that is, on the function ~ ( x , i) in (5.1.3)) nor on the form of the penalty function c ( x , i) in the optimality criterion (5.1.6).

To find the quasioptimal control algorithm in the second approximation, we need to calculate the function cp,(A*) that minimizes (5.1.52) or, which is the same, the expression

G; (A*, P,(A*)) + E G ~ (A*, P,(A* ) ) - (5.1.57)

Since (5.1.57) differs from G;(A*, c p r ( ~ * ) ) by a term of the order of E, it is natural to assume that the difference between the function cp,(A*) that minimizes (5.1.57) and the function cpr(A*) 0 in the first approximation is small, that is, it is natural to assume that we have cp,(A*) -- E for the desired function.

Having in mind the fact that cp,(A*) -- E and using the average values (5.1.33) calculated in Section 5.1.3, we can estimate the order of differ- ent terms in formula (5.1.31) for the function G; (A*, c p r ( ~ * ) ) . We also note that since the function x in (5.1.3) is symmetric, that is, ~ ( x , i) = x(-x, - i ) , there are only cosines (sines) of a, 2@, . . . in the Fourier series for the functions X, (A, a ) &(A, a ) ) . Thus, it follows from the results obtained in Section 5.1.3 that, among all terms in (5.1.31), only two terms &Q,$$ and -&*,&& are of the order of E . The other terms (depend- ing on the control, that is, on cpr(A*)) in (5.1.31) are of the order of s2 or E ~ . This implies that the function cpr(At) minimizing (5.1.57) in the second approximation is just the function maximizing the expression

E au, E F(pr) = ii, - -!I!,- + --!I! X".

A* a@* A* "a@*

To obtain some special results, we need to define the function ~ ( x , i) ex- plicitly. Let us consider two examples.

EXAMPLE 1. Suppose that the plant is a linear quasiharmonic oscillator described by Eq. (5.1.3). In this case, ~ ( x , i) 1 and, in view of (5.1.17), x(A, a) = A(cos 2@ - 1)/2.

By using (5.1.34), (5.1.43), and (5.1.45), we obtain

2um urn 1 2u; F(c,) = cos pr + E- (- sin 3cpr + sin cpr) + E-

27r 3 sin 2v,.

Page 281: Optimal Design of Control Systems Stochastic and Deterministic Problems

264 Chapter V

The desired function qr(A*) can be found from the condition

aF - 2u, urn 4 4 , - -- - sinv, + r-(cos3q, + cosq,) +E= cos 2qr = 0. a% '7r 21r

(5.1.59) Since cpr is small (q, - E), it follows from (5.1.59) thatg

The function pr (A*) determines (in the polar coordinates) the switching line equation for the quasioptimal control in the second approximation. The position of this switching line on the phase plane (x, 5 ) is shown in Fig. 36.

It follows from (5.1.18) and (5.1.60) that in this case the quasioptimal control algorithm (the synthesis function) in the second approximation has the form

u2(A*,@*) =urnsign [ sin ( @* - E (i - + - ::))Ie REMARK 5.1.3. It follows from (5.1.60) that qr(A*) + co as A* 4 0

and formulas (5.1.60) and (5.1.61) do not make sense any more. The reason

he terms of the order of c2 and of higher orders on the right-hand side of (5.1.60) are omitted.

Page 282: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 265

is that if we use a control of the form (5.1.18), then there always exists a small neighborhood of the origin on the phase plane (x, i ) and the quasi- harmonic character of the trajectories of the plant (5.1.3) is violated in this neighborhood. In Fig. 36, this neighborhood is the circle of radius R (R - E)." In the interior of this neighborhood, the applicability conditions for the asymptotic (van der Pol, Krylov-Bogolyubov, etc.) methods are vi- olated. Therefore, the quasioptimal control algorithms (5.1.56) and (5.1.61) can be used everywhere except for the interior of this neighborhood. More- over, it is important to keep in mind that, by using the asymptotic synthesis method discussed in this section, it is in principle impossible to find the optimal control in a small neighborhood of the point (x = 0, i = 0).

EXAMPLE 2. Now let x ( x , i ) = x2 - 1. In this case, the plant (5.1.3) is a self-oscillating system (a self-exciting circuit) sometimes called the van der Pol oscillator or the T h o m s o n generator. It follows from (5.1.17) that, in this case, we have

Using formulas (5.1.34), (5.1.43), and (5.1.45) for the function (5.1.58), we obtain

2urn F(cpp) = cos cp sin 58, + 3

1 4um - - sin 3yr - sin cp, + - sin 2yr

3 7r A* I Just as in Example 1, from the condition dF/dpr = 0 with regard to the fact that cpr is small (cpp - E), we derive the equation of the switching line,

and the synthesis function in the second approximation,

u2(A*, a*) = urn sign [sin (a* - f ($ - 1 + -- 4urn))]. (5.1.64) 7rA*

' O A ~ elementary analysis of the phase trajectories of a linear oscillator subject to the control (5.1.56) shows that the phase trajectories of the system, once entering the circle of radius R = 2&um, not only cease to be quasiharmonic, but cease to be oscillatory in character at all.

Page 283: Optimal Design of Control Systems Stochastic and Deterministic Problems

Chapter V

FIG. 37

The switching line (5.1.63) is shown in Fig. 37.

REMARK 5.1.4. It was pointed out in Remark 5.1.2 that the problem of optimal damping of oscillations in system (5.1.3) on an infinite time interval is well posed if the optimal (quasioptimal ) control of the plant (5.1.3) ensures the convergence of the improper integral (5.1.14) (or, which is the same, of the integral (5.1.46)). Let us establish the convergence conditions for these integrals in Example 2.

The properties of the penalty function c ( x , 2) readily imply that the in- tegral (5.1.46) converges if, for a chosen control algorithm and any initial value of the nonvibrational amplitude A* (0), the solution of the first equa- tion in (5.1.23) A* (t) + 0 as t t oo, and furthermore, if A* (t) tends to zero not too "slowly." Let us consider the special form of Eq. (5.1.23) in Exam- ple 2. We confine ourselves to the first approximation A* = &G;(A*). Since the quasioptimal control in the first approximation has the form (5.1.55), it follows from (5.1.26) and (5.1.62) that the nonvibrational amplitude obeys the equation

If u, > ~ 1 3 4 , then for any A* > 0 the function on the right-hand side of (5.1.65) cannot be positive; therefore, A* (t) + 0 as t + oo for any solution of (5.1.65). If in this case

Page 284: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 267

then the solution Ai(t) of Eq. (5.1.65) attains the value A* = 0 on a finite time interval, which guarantees the convergence of the integral (5.1.46). Thus, the inequality (5.1.66) is the solvability condition for problem (5.1.3)- (5.1.6) a s T + m i n the case ofExample2.11

In conclusion we note that, in principle, the approximate method con- sidered here can also be used for calculating the quasioptimal control al- gorithms in the third, fourth and higher approximations. However, in this case, the number of required calculations increases sharply.

$5.2. Control of the "predator-prey'' system. The case of a poorly adapted predator

In this section, by using the asymptotic synthesis method considered in $5.1, we solve the optimal control problem for a biological system consisting of two different populations interpreted as "predators" and "prey" coexist- ing in the same habitat (e.g., see $2.3 and 1133, 186, 1871). This system is mathematically described by the standard Lotka-Volterra model in which the behavior of an isolated system is subject to the following system of equations (see (2.3.5)):

- Recall that 5 = Z(r ) and 5 = G(t ) are the respective population sizes1' of prey and predators a t time Fand the positive constants ax, a2, bl, and b2 have the following meaning: a1 is the rate of growth of the number of prey, a2 is the rate of prey consumption by predators, bl is the rate at which the prey biomass is processed into the new biomass of predators, and b2 is the rate of predator natural death.

In this section we consider a special case of system (5.2.1) in which the predators die at a high natural rate and are "poor" predators, since they consume their prey at a low rate. In the nomenclature of [177], this problem corresponds to the case of predators poorly adapted to the habitat. For system (5.2.1), this means that we can take the ratio azbllb2 = E << 1 as a small parameter in the subsequent calculations.

''condition (5.1.66) becomes sharper with an increase in the number of terms re- tained in the asymptotic series on the right-hand side of Eq. (5.1.23) for the nonvibra- tional amplitude.

''If the distribution of species over the habitat is uniform, then Z and y" denote the densities of the corresponding populations, that is, the numbers of species per unit area (volume) of the habitat.

Page 285: Optimal Design of Control Systems Stochastic and Deterministic Problems

268 Chapter V

5.2.1. Statement of the problem. We assume that system (5.2.1) is controlled by eliminating prey specimens from the population (by shoot- ing, catching, and using herbicides). Then, instead of (5.2.1), we have the system (see (2.3.12)

here the control Z = Z(T) satisfies the constraints

where 7 is a given positive number. We consider the control problem for system (5.2.2) with infinite time

interval; the goal of control is to take the system from any initial state - - xo,& > 0 to the equilibrium state 2, = b2/bl, y* = a1/a2 of system (5.2.1). For the optimality criterion we use the functional

b2 2

I,, = Lrn ( ~ ( r ) - ,) + c2 (5(T) - %) ] (5.2.4) a2

where cl and c2 are given positive constants. We assume that the integral (5.2.4) is convergent.

In (5.2.2) we change the variables as follows:

This allows us to rewrite system (5.2.2) in the form

In this case, the functional (5.2.4) to be minimized acquires the form

In the new variables (x, y), the goal of control is to transfer the system to the origin (x = y = O), and the range of admissible values is bounded by the

Page 286: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 269

quadrant z > -1 /~ , y < W / E (since the initial variables are nonnegative, Z , ? > 0).

We assume that the admissible control is bounded by a small value. To this end, we set 7 = E~~ in (5.2.3). Then, changing the scale of the controlling function, Z = E ~ U , we can write system (5.2.6) and the constraint (5.2.3) as

Thus the desired optimal control u, can be found from the condition that the functional (5.2.7) attains the minimum value on the trajectories of system (5.2.8) with constraint (5.2.9) imposed on the control actions. In this case, we seek the control in the form u, = u, (x(t), y(t)).

5.2.2. Approximate solution of problem (5.2.7)-(5.2.9). In the case of "poorly adapted" predators, the number E in (5.2.8) is small, and system (5.2.8) is a special case of the controlled quasiharmonic oscillator (5.1.1). Therefore, the method of $5.1 can immediately be used for solving problem (5.2.7)-(5.2.9). The single distinction is that admissible controls are subject to nonsymmetric constraints (5.2.9); thus the antisymmetry property (5.1.11) of the optimal control is violated. As a result, it is im- possible to write the desired controls in the form (5.1.18). However, as is shown later, no special difficulties in calculating the quasioptimal controls in problem (5.2.7)-(5.2.9) arise due to this fact.

On the whole, the scheme for solving problem (5.2.7)-(5.2.9) repeats the approximate synthesis procedure described in $5.1. Therefore, in what fol- lows, the main attention is paid to distinctions in expressions and formulas caused by the special nature of problem (5.2.7)-(5.2.9).

Just as in $5.1, by changing variables according to formulas (5.1. 15)13 we transform system (5.2.8) to the following equations for the slowly changing amplitude and phase (5.1.16):

Now, instead of (5.1.17), we have the following expressions for the functions G(A, 9 ) and H(A, 9 ) only:

G(A, 9) = g(A, 9) - uc(A, 9) - Eu:(A, @),

H(A, 9) = h(A, 9) - u, (A, 9) - EU: ( A , 9),

13With the obvious change in notation: XI = x and 2 2 = y.

Page 287: Optimal Design of Control Systems Stochastic and Deterministic Problems

270 Chapter V

g(A, a ) = A' sin @ cos @(sin cf, - cos a ) ,

h(A, a ) = A sin @ cos @(sin @ + cos a ) ,

u,(A, m) = - @) cos m, u:(A, m) = - b2w

u(A' @) A cos2 a, b2w

us (A, m) = -- "(A' @) sin a, u' (A, .PI = -- b2wA

') sin 8 cos a. b2w

The passage to Eqs. (5.1.23) for the nonvibrational amplitude A* and phase p* is performed, as above, by using formulas (5.1.21)-(5.1.24). The terms Gfi, Hg , Ga, . . . in the asymptotic series in (5.1.23) are calculated from (5.1.24), (5.2.10) by the method of successive approximations. In particular, in the first approximation, instead of (5.1.26)-(5.1.28), we have

G;(A*) = -u,(A*, a * ) , H,*(A*) = -us (A*, a*), (5.2.11)

~ l ( ~ * , m * ) = ~ ~ * l o ( ~ * l ~ ) - u , ( ~ * , ~ ) + ~ ~ d ~ , (5.2.12)

w,(A*, m*) = l:* [h(A*, 5 ) - u,(A*, 6) + us] da . (5.2.13)

In (5.2.11)-(5.2.13) we took into account the fact that, in view of (5.2.10), we have

For the second term of the asymptotic series on the right-hand side of Eq.(5.1.23), instead of (5.1.31), we have

&(A*, a*) &(A*, a*) [ aA*

-

dA*

&(A* a*) ~ u , ( A * 7 a*) wl(A*, @*) + [ am*

-

am* I By $5.1 the quasioptimal controls ul(A*, @*), u2(A*, a*), . . . are found from the condition that the partial derivative dI(A*, @*)/dA* attains its mini- mum. In view of (5.1.50) and (5.1.52), this condition is equivalent to the condition that G;(A*) attains its minimum (in the first approximation) or

Page 288: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 271

the sum GT(A*) + EG; (A*) attains its minimum (in the second approxima- tion). It follows from (5.2.9), (5.2.10), and (5.2.11) that minimization of G'; (A*) means maximization of

- 1 2" U, = fic(A*,@*) = - 1 u(A*,@)cos@d@ i max . (5.2.15)

2.n o oSu<r

This fact immediately implies the following implicit formula for quasiopti- ma1 control in the first approximation:

Y ul(A*, a*) = -(sign cos @* + 1). 2

(5.2.16)

Taking into account formulas (5.1.15) and (5.1.21) for the change of vari- ables, we can write x = A* cos @* with accuracy up to terms of the order of E. This fact and (5.2.16) readily imply the following expression for the synthesis control in the first approximation in terms of the variables (x, y):

I u l (x , y) = -(sign% + 1). 2

(5.2.17)

Thus, in the course of the control process, the controlling action assumes only the boundary values from the admissible range (5.2.9) and is switched from the state u l = 0 to the state u l = y (or conversely) each time when the representative point (x, y) intersects the y-axis (the switching line in the first approximation). We also point out that, according to (5.2.5), in the variables (5, fj) corresponding to the original statement of problem (5.2.2)- (5.2.4), this control algorithm leads to the switching line that is the vertical line passing through the point 5 = 2, = b2/b l on the abscissa axis; this point determines the number of prey if system (5.2.1) is in equilibrium.

To find the optimal control in the second approximation, we need to minimize the expression G;(A*) + EG;(A*) = F(A*, u). The functions G; (A*) = G; (A*, u) and G; (A*) = G; (A*, u) are calculated by formulas (5.2.11) and (5.2.14) with regard to (5.2.10), (5.2.12), and (5.2.13). In ac- tual calculations by these formulas, it is convenient to use the fact that the difference between the optimal control uz(A*, a*) in the second approxi- mation and (5.2.16) must be small. More precisely, we can assume that on the one-period interval of the fast phase @* variation, the optimal control in the second approximation has the form of the function shown in Fig. 38 (the solid lines), where A1 and A2 are the phase shifts of the switch times with respect to the switch times of the control in the first approximation (the dashed lines); these variables are small (Al , A2 - E).

This fact allows us, without loss of generality, to seek the control algo- rithm ug(A*, @*) in the second approximation immediately in the form

Y UZ(A*, a*) = - 2 {sign[cos(@* - ( P ~ ) - sin ( P ~ ] + 1) .

Page 289: Optimal Design of Control Systems Stochastic and Deterministic Problems

Chapter V

Here $71 = pl(A*) and $72 = $72(A*) are related to Al and A2 as

and hence, are also of the order of E.

If the desired control in the second approximation is written in the form (5.2.18), then there are a t least two advantages. First, in this case, we can minimize F(A*, u) = G;(A*) + E G ~ ( A * ) by finding the minimum of the known function F(A*, p l , $72) of two variables $71 and $72. Second, we can calculate GT and G% by formulas (5.2.11) and (5.2.14) using the fact that $71 and $72 are small ($71, $72 -- E)).

From (5.2.10), (5.2.11), and (5.2.18), we obtain

277

G;(A*) = -u,(A*,Q*) = -- I 1 n2(A*, Q*) cos Q* dQ* 27rbzw

- - Y --

2xb2w [c0~($71 - $72) + cos((p1 + $72)) (5.2.20)

Since $71, $72 -- E , it follows from (5.2.20) that the maximal terms (de- pending on $71 and $72) in the expansion of (5.2.20) in powers of E are of the order of E ~ . Therefore, to calculate the second term EG;~ in the function F(A*, $71, $72) = G; +EG; to be minimized, we can retain only terms of the order of E~ and neglect the terms of the order of e3 and of higher orders.

Page 290: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems

With regard to this remark we calculate the mean values on the right-hand side of (5.2.14) and thus see1* that we need to minimize the function

(5.2.21) to obtain the optimal values of cpl and cpz in the second approximation.

From the condition dF/dcpl = dF/dcp2 = 0 necessary for an extremum, we obtain the desired optimal values

Expressions (5.2.22) determine (in the polar coordinates) the switching line for the optimal control in the second approximation. The form of this line on the phase plane (x, y) is shown in Fig. 39. The neighborhood of the origin in the interior of the circle of radius R = ~ E Y / w ~ ~ is the region where the quasiharmonic character of the phase trajectories is violated. Generally speaking, the results obtained here are not authentic, and we need to use some other methods for constructing the switching line in this region.

5.2.3. Comparative analysis of different control algorithms. It is of interest to compare the results obtained in the preceding subsection

14Here we omit cumbersome elementary transformations leading t o (5.2.21). To ob- tain (5.2.21), we need to use formulas (5.2.10), (5.2.12), (5.2.13), and (5.2.18) and the technique used in Section 5.1.3 for calculating average values.

Page 291: Optimal Design of Control Systems Stochastic and Deterministic Problems

2 74 Chapter V

with the solutions of similar synthesis problems obtained by other methods. To this end, we can use the results discussed in $7.2 (see also [105]), where we present a numerical method for solving the synthesis problem for the "normalized" predator-prey system controlled on a finite time interval.

In 57.2 we consider the optimal control problem in which the plant equa- tions, the constraints on admissible controls, and the optimality criterion have the form

In this case, in 57.2 we derive the optimal control G(T, 5, y) in the synthesis form by solving the Bellman equation corresponding to problem (5.2.23)- (5.2.25) numerically.

Note that problem (5.2.23)-(5.2.25) turns into problem (5.2.2)-(5.2.4) if the following assumptions are satisfied:

We also note that, in view of the changes of variables (5.2.5) and (5.2.26)) the quasioptimal control algorithm in the first approximation (5.2.17) ac- quires the form

7 ul (Z, Y) = -[sign@ - 1) + 11. (5.2.27) 2

To estimate the effectiveness of algorithm (5.2.27), we performed a nu- merical simulation of the normalized system (5.2.23). Namely, we con- structed a numerical solution of (5.2.23) on the fixed time interval 0 < T 5 T = 15 for three different algorithms of control E (1) the optimal control Ti = Z,(T, :, y); (2) the optimal stationary control Ti = ?i:(z, y) corresponding to the case where the terminal time T t oo in problem (5.2.23)-(5.2.25); (3) the quasioptimal control in the first approximation (5.2.27).

Page 292: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems

For these three control algorithms, the transient processes in system (5.2.23) are shown as functions of time in Fig. 40 and as phase trajectories in Fig. 41. Moreover, the following parameters of problem (5.2.2)-(5.2.4) were used for the simulation: a1 = a2 = bl = b2 = 0.5, 7 = 0.125, E = 0.5, w = 1, y = 0.5, cl = ca = 1 (in problem (5.2.23)-(5.2.25), to these values there correspond = 0.25 and b = 1).

Comparing the curves in Figs. 40 and 41, we see that these three al- gorithms lead to close transient processes in the control system. Hence,

Page 293: Optimal Design of Control Systems Stochastic and Deterministic Problems

276 Chapter V

the second and the third algorithms provide a sufficiently "good" con- trol. This fact is also confirmed by calculating the quality functional (5.2.25) for these three algorithms, namely, we obtain I[u,(r, Z, y)] = 4.812, I[ui(%,?j)] = 4.827, and I[ul(:,y)] = 4.901. Thus, any of these algo- rithms can be used with approximately the same result. Obviously, the simplest practical realization is provided by the first-approximation algo- rithm (5.2.27) obtained here; by the way, this algorithm corresponds to reasonable intuitive heuristic considerations of how to control the system. Indeed, according to (5.2.27), i t is necessary to start catching (shooting, etc.) every time when the prey population size becomes larger than the equilibrium size (for the normalized dimensionless system (5.2.23), this equilibrium size is equal to 1). Conversely, as soon as the prey popula- tion size becomes smaller than the equilibrium size, any external action on the system must be stopped.

It should be noted that the situation when the first-order approximation allows one to obtain a control algorithm close to the optimal control is rather typical not only of this special case but also of other cases where the small parameter methods are used for solving approximate synthesis problems for control systems. This fact is often (and not without success) used in practice for solving special problems [2, 331. However, it should be noted that this fact is not universal. There are several cases where the first-approximation control leads to considerable increase in the value of the functional to be minimized with respect to its optimal value. At the same time, the higher-order approximations allow one to obtain control algorithms close to the optimal control. Some examples of such situations (however, related to control problems of different nature) are examined in $6.1 and in [97, 981.

$5.3. Optimal damping of random oscillations

In this section we consider the optimal control problem for a quasihar- monic oscillator, which is a stochastic generalization of the problem studied in $5.1. Therefore, many ideas and calculational formulas from $5.1 are widely used in the sequel.

However, it should be pointed out that the foundations underlying the approximate synthesis methods in these two sections are absolutely dif- ferent. In $5.1 the quasioptimal controls are obtained by straightforward calculations and minimization of the cost functional, while in the present section the approximate synthesis is based on an approximate method for solving the Bellman equation corresponding to the problem in question.

5.3.1. Statement of the problem. Preliminary notes. Here we consider a stochastic version of problem (5.1.3)-(5.1.6) as the initial synthe-

Page 294: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 277

sis problem. We assume that the quasiharmonic oscillator (5.1.3) is subject to small controls EU = E U ( ~ ) and, in addition, to random perturbations of small intensity

where [(t) denotes the standard scalar white noise (1.1.31) and B > 0 is a given number.

The admissible controls u = u(t), just as in (5.1.5), are subject to the constraints

Iu(t) I I Urn, (5.3.2)

and the goal of control is to minimize the mean value of the functional

T

I[U] = E [l c(x(t), i ( t ) ) dt] i min . (5.3.3) Iu(t)llum

O<t<T

The nonlinear functions ~ ( x , k) and c(x, k) in (5.3.1) and (5.3.3), just as in $5.1, are assumed to be centrally symmetric, ~ ( x , 2) = x(-x, -k) and c(x, k) = c(-x, -2). Next, it is assumed that the penalty function c(x, k) is nonnegative and possesses a single minimum a t the point (x = 0, k = 0) and c(0,O) = 0.

Let us introduce the coordinates x l = x, x2 = k and rewrite (5.3.1) as

Then, using the standard procedure from $1.4, for the function of minimum future losses

F(t,x1,x2) = min [ T ( 1 ( ) 2 ( ) ) d 1 xi( t) = xl,x2(t) = x2 , Iu(.)l<um

t<r<T I

(5.3.5) we obtain the Bellman differential equation

d F -- - d F d F

dt - 2 2 - - ( x I + E x ( x ~ , x ~ ) x z ) - + min EU-

ax1 8x2 Iullu, [ El E B a2F + -7 +- c ( x ~ , X Z ) , 0 5 t < T, F(T, XI, x2) = 0, 2 dx,

(5.3.6)

corresponding to problem (5.3.1)-(5.3.3).

Page 295: Optimal Design of Control Systems Stochastic and Deterministic Problems

278 Chapter V

It follows from (5.3.6) that the desired optimal control u,(t, XI , 2 2 ) can be written in the form

aF u*(t, xi, 2 2 ) = -urn sign -(t, 21, x2),

ax2

where the loss function F( t , 21, x2) satisfies the following semilinear equa- tion of parabolic type:

Equation (5.3.8) and the fact that the functions ~ ( x l , 2 2 ) and c(xl,x2) are symmetric imply that F = F( t , X I , x2), satisfying (5.3.8), is symmetric with respect to the phase coordinates, that is, F ( t , XI , 2 2 ) = F( t , -21, -22). This and formula (5.3.7) show that the optimal control (5.3.7) possesses an important property, which will be used in what follows; namely, the optimal control (5.3.7) is antisymmetric (see (5.1.11)):

We also stress that in this section the main attention is paid to solving the stationary version of problem (5.3.1)-(5.3.3), that is, to solving the control problem in which the terminal time T + m. In the nomenclature of [I], problem (5.3.1)-(5.3.3) as T + m is called the problem of optimal stabilization of the oscillator (5.3.1).

5.3.2. Passage to the polar coordinates. The Bellman differen- tial and functional equations. By using the change of variables (5.1.15), we transform Eqs. (5.3.4) to equations for the slowly changing amplitude A and phase cp:

where

(??(A, @, u, t ) = G(A, @, u) - E-'/~(, (t), (, (t) = B1I2<(t) sin @,

E-1/2 G(A, a, u, t ) = H(A, @, U) - -(,(t), &(t) = ~ l I ~ ( ( t ) cos @,

A (5.3.11)

and the functions G(A, @, u) and H(A, @, u) are determined by (5.1.17).

Page 296: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 279

Note that the right-hand sides of the differential equations (5.3.10) for the amplitude and phase contain a random function ((t) that is a white noise. Therefore, Eqs. (5.3.10) are stochastic equations. The expressions (5.3.11) for 6' and & are derived from (5.3.4) and (5.1.15) by changing the variables according to the usual rules valid for smooth functions [(t). Thus it follows from 31.2 that the stochastic equations (5.3.4) and (5.3.10) are equivalent if they are symmetrized.15

We also note that by passing to the polar coordinates (which become the arguments of the loss function (5.3.5)), we can equally use either the set (A, p , t ) of current values (at time t ) of the amplitude A, the "slow" phase c p , and time t or the set (A, @, t ) in which the "slow" phase is replaced by the "fast" phase @. For the calculations performed later, the set (A, @, t ) is more convenient.

For the loss function F ( t , A, @) defined by analogy with (5.3.5),

F (t, A, a) = min E [ lT ~1 ( ~ ( 7 ) ~ ~ ( r ) ) d r 1 ~ ( t ) = A, ~ ( t ) = @ , Iu(.)I<um

t<s<T I

we can write the basic functional equation of the dynamic programming approach (see (1.4.6)) as

t + A

~ ( t , A t , @ t ) = min E [l c i ( ~ r , a,) d r + ~ ( t + A, A,+*, at+,)]. 1u(.)11um t<r<t+A

(5.3.13) This equation expresses the "optimality principle." It is important to stress that relation (5.3.13) holds for any time interval A (not necessarily small). This fact is important in what follows.

But if A -+ 0 in (5.3.13), than, using (5.3.10) and (5.3.11), we can readily obtain (see 3 1.4) the following Bellman differential equation for the function (5.3.12):

aF d F --

d F = - + E L F + min EG(A, @, u)- + EH(A, @, u)- a t a@ I U ( T ) I S U ~ a A

15More precisely, for Eqs. (5.3.10) it is important to take into account the sym- metrization property, since these equations contain a white noise E ( t ) multiplicatively with expressions that depend on the state variables A and 9. As for Eqs.(5.3.4), they have the same solutions independent of whether they are understood in the Ito, Stratonovich, or any other sense.

Page 297: Optimal Design of Control Systems Stochastic and Deterministic Problems

Chapter V

where L denotes the operator

The last two terms in (5.3.15) appear due to the fact that the stochastic equations (5.3.10) are symmetrized.

If we change the time scale and pass to the slowly varying time ?= ~ t , then Eq. (5.3.14) for the loss function F(K A, @) acquires the form

It follows from (5.3.16) that the derivatives of the loss functions with respect to the amplitude and the fast phase are of different orders of magnitude (if dF/dA - 1, then dF/a@ E). This fact, important for the subsequent considerations, follows from the quasiharmonic character of the motion of system (5.3.4).

Equation (5.3.16) can be simplified if, just as in $1.4, $2.2, $3.1, etc., we consider the stationary stabilization of random oscillations in system (5.3.4). In this case, the upper limit of integration T -+ oo in (5.3.5) and (5.3.12). The right-hand side of (5.3.12) also tends to infinity because of random perturbations ((t). Therefore, to suppress the divergence in the stationary case, we need to consider the following stationary loss function f (A, @) (see (1.4.29), (2.2.9), and (4.1.7)):

f (A, a) = lim [F (6 A, a) - y (ET - t)] , T-tw

where the constant y characterizes the mean losses of control per unit time in the stationary operating conditions. For the function (5.3.17), we have the stationary version of Eq. (5.3.16):

- min IuILum

Just as in $5.1, taking into account the relay (5.3.7) and the antisymme- try (5.3.9) properties, without loss of generality, we can seek the optimal

Page 298: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 281

control u* (A, a ) , which minimizes the expression in the square brackets in (5.3.18), in the set of controlling actions of the form (5.1.18):

u(A, a ) = u, sign [sin (a - p , ( ~ ) ) ] .

This allows us to rewrite Eq. (5.3.18) in the form

where G(A, a , p,) and H(A, a, p,) denote the functions obtained from (5.1.17) after the substitution of the control u(A, a ) in the form (5.3.19).

Thus, solving the synthesis problem is reduced to finding the function $(A) that minimizes the expression in the square brackets in (5.3.20) and determines (in polar coordinates) the equation for the switching line of the controlling actions u* = f U, under the optimal control u, (A, a). To calcu- late the function p+r(A), just as to solve Eq. (5.3.20), we use the method of successive approximations (see Section 5.3.3), which allows us to obtain the desired function @+(A) in the form of a series in powers of the parameter E:

Now let us write the functional equation (5.3.13) for the time interval A = 27r. With regard to (5.3.19), we can write

t+2n F ( t , A t , a t ) = min E [l C I ( A ~ , a,) d r + ~ ( t + 271, ~ t + 2 n , @t+2n -

v,(Ar) t<r<t+27T

)I (5.3.22)

Since the loss function (5.3.12) is periodic in the variable a, we have F ( t , A, a ) = F( t , A, - 27r). This and (5.3.10) imply that relation (5.3.22) can be rewritten as

t+2n F ( t , At, a t ) = min E [l cl(A,, a,) d ~

v r

+ F ( t + 2 ~ , At + EAA, at + E A ~ ) 1 , (5.3.23)

Page 299: Optimal Design of Control Systems Stochastic and Deterministic Problems

Chapter V

where

t + 2 ~ EAA = E J: G ( A r , @ r , u r , r ) d ~

t+27r = ,5 lt+2= G(A,, a, , cpr(Ar)) d r - &J L ( T ) dr,

t (5.3.24) t + a ~

~ A c p = E H(A,, a , , u ~ , 7) d r

H(Ar , @,, cpr(Ar)) d~ - & dr.

- Using, just as in (5.3.16), the "slow" time t = ~t and expanding

F(;+ ~ T E , At + EAA, @t + ~Acp) in the Taylor series, we rewrite (5.3.23) in the form

( E A A ) ~ d2 F - d 2 F - +-- ,,, (t + 2re, At, at) + (EAA) ( i A p ) (t + ~ T E , At, at) (&Av)' d 2 ~ - +--

2 aa2 (t + 2TE, At, @t) + . . . = 0. I

In the stationary case considered in what follows, Eq. (5.3.25) acquires the form

min E [E ltiZn cl(A,, a,) d r - 2 r & ~ + EAA-(At, a f at) v F a A

8 f (EAA)' d2 f + ~Acp-(At, at) + -- d a 2 dA2 (At, at)

Equation (5.3.26) is of infinite order and looks much more complicated than the differential equation (5.3.20). Nevertheless, since the differences EAA and ~ A c p are small, the higher-order derivatives of the loss function in (5.2.26) are, as a rule, of higher orders of magnitude with respect to powers of the parameter E. This allows us, considering terms of more and more

Page 300: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 283

higher order of E in (5.3.26) successively, to solve equations of comparatively non-high orders and then to use these solutions for approximate solving of the synthesis problem.

In practice, in this procedure of approximate synthesis, special attention must be paid to a very important fact that simplifies the calculations of successive approximations. Namely, in this case, there are two equations, (5.3.20) and (5.3.26), for the same function f (A, a). Thus, combining both these equations, we can exclude the derivatives d f Id@, d2 f IdAda, . . . of the loss function with respect to the phase from (5.3.26) and thus to de- crease the dimension and to turn the two-dimensional equation (5.3.26) into a one-dimensional equation.

It is convenient to exclude the derivatives with respect to the phase, just as to solve Eqs. (5.3.26), by using the method of successive approximations.

5.3.3. Approximate solution of the synthesis problem. To apply the method of successive approximations, we need to calculate the mean value of the integral

in (5.3.26) and the mean values of the amplitude and phase increments

over the time 27r. By using system (5.3.10), we can calculate expressions (5.3.27) and (5.3.28) with arbitrary accuracy in the form of series in powers of the small parameter E.

Let us write

Z(A, O, urn sign[sin(@ - (or(A))], t) = G(A, a, t) , (5.3.29)

&(A, O, urn sign[sin(@ - or(^))], t ) = H(A, O, t).

Then it follows from (5.3.10) that the increments of the amplitude A and the slow phase (o over an arbitrary time interval r are

By using formulas (5.3.30) repeatedly, we can present &SAT and ESP, as the series

Page 301: Optimal Design of Control Systems Stochastic and Deterministic Problems

284 Chapter V

where

The increments (5.3.24) are calculated by formulas (5.3.31)-(5.3.36) with regard to (5.1.17), (5.3.10), and (5.3.11) as

Finally, we need to use (5.3.31)-(5.3.37) and average the corresponding expressions with respect to ( ( t ) , taking into account (1.1.31).

In a similar way, using formulas (5.3.32)-(5.3.36), we can also calculate the integral in (5.3.27) as a series in powers of E. Indeed, writing

.... ... substituting &A,, Sly,, given by formulas (5.3.32), (5.3.35), and averaging with respect to ( ( t ) , we obtain the desired expansion for (5.3.27).

In practice, to use this method for calculating the mean values of (5.3.27) and (5.3.28), we need to remember that formulas (5.3.30)-(5.3.38) possess a

Page 302: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 285

specific distinguishing feature relative to the fact that the random functions in expressions (5.3.29) have the coefficients &-'I2 :

G(A, a , t ) = G(A, a , p,) - E-'/~[, (t)

= xA(A, a) - us (A, a) - p t ( t ) sin a , E

E-'I2 H(A, a, t ) = H(A, a , cp,) - -tc(t) A

= x,(A, 9 ) - A E A

(formulas (5.3.39) follows from (5.1.17), (5.3.11), and (5.3.29)). Thus, terms of the order of E-' appear in Eb2A2,, ES3A2,, . . ., ES2p2,, E63 (~2~ , . . .. Therefore, in the calculations of the mean values of (5.3.27) and (5.3.28), the number of terms retained in the expansions (5.3.31) must always be larger by 1 than needed for the accuracy desired (if, for example, we need to calculate the mean values of (5.3.27) and (5.3.28) with accuracy up to terms of order of E', then we need to retain (s + 1) terms in the expansions (5.3.31)).

For example, let us calculate the first term in the expansion of the mean value E(EAA). From (5.3.32) and (5.3.35), we have

Averaging (5.3.40) with respect to t ( t ) and taking into account the prop- erties of the white noise, we obtain

where the bar, as usual, indicates averaging with respect to the fast phase over the period (e.g., zA(At) = & gn x,(At, a) d a ) , and us (At, p,) = G,

is given by (5.1.34)). Next, it follows from (5.3.33), (5.3.40), and (5.3.41) that

Page 303: Optimal Design of Control Systems Stochastic and Deterministic Problems

286 Chapter V

Averaging (5.3.43) with respect to [(t) and taking into account (1.1.31) and (1.1.32), we obtain

Ed2A2, = J2, [Ar E[(rl)t(t) cos(Qt + r') cos(Qt + r) d ~ ' d r + D &At 1

= 8 J2" [AT 6(r1 - r) cos(Qt + r') cos(mt + r) dr ' d r + D &At 0

- - L J2' cos2(Qt + r) d r + D = - I ~ B + D,

I (5.3.44)

2 4 0 2 ~ A t

where

Finally, from (5.1.34), (5.3.31), (5.3.37), (5.3.42), and (5.3.44) we obtain the desired mean value

cp,) + - + e 2 . . . 4At I

( p = u,/~r). In a similar way, we obtain

2~ %(At) + - sin cpp(At) + c2 . . . At 1 (5.3.46)

Page 304: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 287

For the other mean values of (5.3.27) and (5.3.28), in the first approximation in E, we have

All the other mean values E[(EAA)(EAY)], E ( E A ~ ~ ) ~ , . . . in (5.3.28) are higher-order infinitesimals in E.

Now let us calculate successive approximations of the Bellman equation (5.3.26). Simultaneously, with the help of Eq. (5.3.20), we shall exclude the derivatives of the loss function with respect to the phase from (5.3.26).

The first approximation. We represent the loss function f (A, a) as the series

f (A, @) = fl(A, a) + Ef2 (A, @) + E2 - . . , (5.3.49)

substitute into Eq. (5.3.26), and retain only terms of the order of E (omitting the terms of the order of c2 and of higher orders). Since, in view of (5.3.20), d f I d @ - E, using (5.3.45)-(5.3.48), we obtain the following equation of the first approximation from (5.3.26):

In (5.3.50) we calculate the minimum with respect to 9, under the assump- tion that d fl /dA > 0 and thus obtain the expression

for the minimizing function cp*,(A) that determines the switching line in the first approximation. In this case, in view of (5.3.51), Eq. (5.3.50) acquires the form

Comparing the result obtained with the approximate synthesis result (5.1.55) for a similar deterministic problem, we see that, in the first ap- proximation in E, the perturbation ( ( t ) in no way affects the switching line. Just as in the deterministic case (5.1.55), (5.1.56), the switching line coincides with the abscissa axis on the phase plane (x, k ) for any type of nonlinearity, that is, for any function ~ ( x , &) in Eq. (5.3.1).

Page 305: Optimal Design of Control Systems Stochastic and Deterministic Problems

288 Chapter V

To find the switching line in the second approximation, we need to cal- culate the derivative afl/b'A satisfying the differential equation (5.3.52), where the stationary error y is not jet found. But we can readily show how to calculate this error. Namely, since the stationary error is defined (in the probability sense) as the mean penalty value (see (1.4.32)), we have

where pl(A) is the stationary probability density for the distribution of the amplitude A. The Fokker-Planck equation that determines this stationary density is conjugate to the Bellman equation. Therefore, in the case of (5.3.52), the equation for pl(A) has the form

For the zero probability flow (see $4, item 4 in [173]), Eq. (5.3.54) has the solution

where the constant C is determined by the normalization condition

As soon as y is known, we can solve Eq. (5.3.52). The unique solution of this equation is specified by the condition that the function fl / aA must behave as A -+ oo just as in the deterministic case (that is, in (5.3.1) the random perturbations [ ( t ) 0). This assumption on the solution of Eq. (5.3.52) is quite natural, since, obviously, the role of the diffusion term in the equation decreases as A increases (similar considerations were used in 32.2).

It follows from (5.3.52) that if there are no perturbations ( B = y = O ) , then this equation has the solution

afl - - - - E l (A)

X,(A) - 2 ~ '

Therefore, the diffusion equation (5.3.52) has the solution

4 A' (y - E~(A') ) exp [z J (A") - 2p + ) ~ A ' I d ~ ' .

A 4A1' 1 (5.3.58)

Page 306: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 289

Now we can verify whether the derivative dfl/dA is positive (this was our assumption, when we derived (5.3.51)). It follows from (5.3.58) that this assumption is satisfied for not too small values of the amplitude A. Therefore, if we solve the synthesis problem by this method, we need not consider a small neighborhood of the origin on the phase plane (x, k ) . Just as in the deterministic case in 55.1, it is clear from the "physical" viewpoint that the controlling action u and the perturbations [ ( t ) lead to appearance of a neighborhood where the quasiharmonic character of the phase trajec- tories is violated.

The second approximation. To obtain the Bellman equation in the second approximation, we retain the following expression in (5.3.26):

min E{E ['IT cl(A,, a,) d r - 2,rray $ EAA- af 1 + EAF- af 1 +'r a A a@

The other terms in (5.3.26) are necessarily of orders larger than that of E ~ .

The derivatives dfl/a@, d2 fl/dAa@, . . . of the loss function with respect to the phase can be eliminated from (5.3.59) by using (5.3.20). Hence we have

To find the function cpr(A) that minimizes the expression in the braces in (5.3.59), we shall consider only the terms in (5.3.59) that depend on the control (or, which is the same, on cp,(A)). In this case, we shall use the fact that the minimizing function p+r(A) is small in the second approximation: p+r(A) = ~ c p a (A). Therefore, in the part of Eq. (5.3.59) that depends on cp,, we can retain only the terms that depend on cpa by expressions - E'

and - E ~ .

Clearly, it is no longer sufficient to have only formulas (5.3.45)-(5.3.48) for the mean values of (5.3.27) and (5.3.28) in the first approximation. In the expansions (5.3.45)-(5.3.48) we need to calculate the terms - E ~ .

Following the way for calculating (5.3.30)-(5.3.38) and retaining only ex- pressions depending on cp, = E ( P ~ in the terms of the order of E ~ , we see that, in the second approximation, formulas (5.3.45)-(5.3.48) must be

Page 307: Optimal Design of Control Systems Stochastic and Deterministic Problems

Chapter V

replaced by

B7T 2 r c ( A , p,) + -1

2A

~'27rB - E(EAA)' = EBT - -

A uc (A, p,) sin2 a, (5.3.62)

E [& lt+21 cl(A,, a , ) d r = &27rE1 (A) I

where Z(A, cP) and &(A, cP) denote the purely vibrational components of the functions G(A, a, p,) = E(A, p,) + Z(A, a ) and cl(A, a) = El(A) + El (A, a ) .

By using (5.3.60)-(5.3.64), (5.1.34), (5.3.42), and (5.3.59), we see that the desired function φ2*(A) = εφ̃2(A), which determines the switching line in the second approximation, can be found by minimizing the expression

N(φ2) = −ε4u_m cos φ2 ∂f1/∂A − (ε²/2π){ [ū_c(A, φ2) x̃(A, Φ) − u_c(A, Φ) x̃(A, Φ)] ∂f1/∂A + ū_c(A, φ2) [γ − c1(A, Φ)] − (B/2) sin²Φ ∂²f1/∂A² − (B/2A) cos²Φ ∂f1/∂A } + ε²(Bπ/A) ū_c(A, φ2) sin²Φ ∂f1/∂A.   (5.3.65)

We collect similar terms in (5.3.65) with the help of (5.1.34)-(5.1.36).

Page 308: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems

As a result, we obtain

In the following two examples, we calculate the function φ2*(A) for which (5.3.66) attains its minimum.

EXAMPLE 1. Suppose that the plant to be controlled is a linear system. In this case, x(x, ẋ) ≡ 1 in (5.3.1), and it follows from (5.1.17) that

For simplicity, we assume that the vibrational component of the penalty function c̃1(A, Φ) ≡ 0 (this holds, e.g., if c(x, ẋ) = x² + ẋ² in (5.3.3)). Then, in view of (5.1.44) and (5.1.45), the expression (5.3.66) acquires the form

The condition ∂N/∂φ2 = 0 leads to the following equation for the desired function φ2*(A):

cos 3φ2 + [1 − B/(2A²) − 4γ/(A ∂f1/∂A)] cos φ2 = 0.   (5.3.67)

Representing the desired function φ2*(A) in the form of the asymptotic expansion φ2*(A) = εφ̃2(A) + ε²··· , we readily obtain from (5.3.67) the following expression for the leading term of this expansion:

φ2*(A) = εφ̃2(A) = ε[B/(2A²) + 2γ/(A ∂f1/∂A)].   (5.3.68)

Formula (5.3.68) determines the switching line of the suboptimal control u2(A, Φ) = u_m sign[sin(Φ − εφ̃2(A))] in the second approximation.

Page 309: Optimal Design of Control Systems Stochastic and Deterministic Problems

292 Chapter V

In (5.3.68), γ is calculated by formula (5.3.53) with the stationary probability density

p1(A) = C A exp[−(1/B)(A² + 8μA)],

C⁻¹ = B/2 − 2μ√(πB) e^{16μ²/B} [1 − F(4μ/√B)],   F(u) = (2/√π) ∫_0^u e^{−x²} dx.   (5.3.69)

Here the derivative ∂f1/∂A, determined by (5.3.58) in the general case, has the form

∂f1/∂A = (4/(BA)) exp[(1/B)(A² + 8μA)] ∫_A^∞ [c̄1(A') − γ] A' exp[−(1/B)(A'² + 8μA')] dA'.   (5.3.70)

Since

γ → 0 and ∂f1/∂A → c̄1(A)/(2μ − x̄(A))

as B → 0, one can readily see that formula (5.3.68) coincides as B → 0 with the corresponding expression (5.1.60) for the switching line of the deterministic problem.
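Since formulas (5.3.69) and (5.3.70) above are reconstructed from a damaged source, the following short numerical sketch should be read as an illustration only; the choices c̄1(A) = A² (averaged quadratic penalty), μ = 2u_m/π, and γ = 0 are our assumptions, made to check the limit B → 0 numerically.

import numpy as np
from scipy.integrate import quad

u_m = 1.0
mu = 2.0 * u_m / np.pi          # assumed relay-control drift constant

def c1_bar(A):
    return A ** 2               # period average of x^2 + xdot^2 for x = A cos(t)

def df1_dA(A, B, gamma=0.0):
    # reconstructed (5.3.70), with the outer exponential factor absorbed
    w = lambda Ap: (c1_bar(Ap) - gamma) * (Ap / A) * np.exp(
        -((Ap ** 2 - A ** 2) + 8.0 * mu * (Ap - A)) / B)
    val, _ = quad(w, A, np.inf)
    return 4.0 / B * val

A = 2.0
for B in (1.0, 0.1, 0.01):
    print("B =", B, "df1/dA =", df1_dA(A, B))
print("deterministic limit:", c1_bar(A) / (A / 2.0 + 2.0 * mu))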

EXAMPLE 2. Let us consider a nonlinear plant with x(x, ẋ) = x² − 1 in (5.3.1) (in this case, the plant is a self-exciting van der Pol circuit). For such a system, it follows from (5.1.17) that

x̄(A) = A/2 − A³/8,   x̃(A, Φ) = −(A/2) cos 2Φ + (A³/8) cos 4Φ.   (5.3.71)
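As a quick consistency check of (5.3.71) (ours, not the book's), one can average the amplitude drift produced by the van der Pol nonlinearity numerically; the standard averaging step Ȧ = −x(A cos Φ, −A sin Φ)·A sin²Φ, taken over one period, is assumed here.

import numpy as np

def x_bar_numeric(A, n=4096):
    F = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    drift = -((A * np.cos(F)) ** 2 - 1.0) * A * np.sin(F) ** 2
    return drift.mean()          # period average of the amplitude drift

for A in (0.5, 1.0, 2.0, 3.0):
    print(A, x_bar_numeric(A), A / 2.0 - A ** 3 / 8.0)   # both columns agree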

Substituting (5.3.71) into (5.3.66) and using (5.1.44) and (5.1.45), from (5.3.66) and the condition ∂N/∂φ2 = 0 we derive the expression for the switching line in the second approximation, which coincides in form with the expression obtained in the previous example. However, now the loss function and the stationary error in (5.3.68) must be calculated in a different way.

So, in this case, the stationary probability density (5.3.55) for the distribution of the amplitude has the form

p1(A) = C A exp[−(1/B)(A⁴/8 − A² + 8μA)],   (5.3.72)


where C is the normalization constant:

C⁻¹ = ∫_0^∞ A exp[−(1/B)(A⁴/8 − A² + 8μA)] dA.   (5.3.73)

The stationary error γ in (5.3.68) is calculated by formula (5.3.53) with the help of (5.3.72) and (5.3.73). The expression for ∂f1/∂A can be obtained from (5.3.58) with regard to (5.3.71). As a result, we see that the derivative ∂f1/∂A in (5.3.68) has the form

∂f1/∂A = (4/(BA)) exp[(1/B)(A⁴/8 − A² + 8μA)] ∫_A^∞ (c̄1(A') − γ) A' exp[−(1/B)(A'⁴/8 − A'² + 8μA')] dA'.

Just as in Example 1, formula (5.3.68) coincides as B → 0 with the corresponding expression

obtained in §5.1 (see (5.1.63)) for the deterministic problem.

The influence of random perturbations on the position of the switching line in the second approximation is shown in Fig. 42, where four switching


lines for the linear quasiharmonic system from Example 1 are depicted. Curve 1 corresponds to the deterministic problem (B = 0). Curves 2, 3, and 4 show the switching lines in the stochastic case and correspond to the white noise intensities B = 1, B = 5, and B = 20, respectively. These switching lines correspond to the quadratic penalty function c(x, ẋ) = x² + ẋ² in the optimality criterion (5.3.3) and the parameters u_m = 1 and ε = 0.25 in problem (5.3.1)-(5.3.3). The dashed circle in Fig. 42 approximately indicates the domain where the quasiharmonic character of the phase trajectories of the system is violated. In the interior of this domain, the synthesis method studied here may lead to large errors, and we need to employ some other methods for calculating the switching line near the origin.

5.3.4. Approximate synthesis of control that maximizes the mean time of the first passage to the boundary. As another example of the method of successive approximations treated above, let us consider the synthesis problem for a system maximizing the mean time during which the representative point (x(t), ẋ(t)) first comes to the boundary of some domain on the phase plane (x, ẋ). For definiteness, we assume that this domain is the disk of radius R0 centered at the origin. As before, we consider a system whose behavior is described by Eq. (5.3.1) with the constraints (5.3.2) imposed on the control.

Passing to the polar coordinates and considering the new state variables A and Φ̃ as functions of the "slow" time t̃ = εt, we transform Eq. (5.3.1) to a system of equations of the form

where the functions G and H are given by (5.3.11) and (5.1.17). By using Eq. (5.3.74), we can write the Bellman equation for the problem in question.

It follows from §1.4 that the maximum mean time during which the representative point (A(τ), Φ̃(τ)) reaches the boundary (the loss function for the synthesis problem considered) can be written as (see (1.4.38))

Recall that W(τ, A_t̃, Φ̃_t̃) denotes the probability that the representative point with the polar coordinates (A_t̃, Φ̃_t̃) at time t̃ does not reach the boundary of the region of admissible values during the time (τ − t̃). For the optimality principle (see (1.4.39)) corresponding to the function (5.3.75), we can write the equation

F(A_t̃, Φ̃_t̃) = max_{|u|≤u_m, t̃≤τ<t̃+Δ} E[ ∫_t̃^{t̃+Δ} W(τ, A_t̃, Φ̃_t̃) dτ + F(A_{t̃+Δ}, Φ̃_{t̃+Δ}) ].   (5.3.76)

By letting the time interval Δ → 0, in the usual way (§1.4), we obtain the following differential Bellman equation for the function F(A, Φ̃):

∂F/∂Φ̃ = ε{ W + LF + max_{|u|≤u_m} [G(A, Φ̃, u) ∂F/∂A + H(A, Φ̃, u) ∂F/∂Φ̃] },   A < R0,   F(R0, Φ̃) = 0.   (5.3.77)

Here L is the operator (5.3.15), and the functions G and H are determined by formulas (5.1.17).

On the other hand, if we set Δ = 2πε in (5.3.76), then we arrive at the finite-difference Bellman equation (an analog of (5.3.26))

max_{φ} E[ ∫_t̃^{t̃+2πε} W(τ, A_τ, Φ̃_τ) dτ + (εΔA) ∂F/∂A + (εΔφ̃) ∂F/∂Φ̃ + (1/2)(εΔA)² ∂²F/∂A² + ··· ] = 0.   (5.3.78)

Here the increments of the amplitude εΔA and the "slow" phase εΔφ̃ are the same as in (5.3.24), and satisfy (5.3.45)-(5.3.48) and (5.3.61)-(5.3.64).

Next, to solve the synthesis problem approximately, we need, just as in Section 5.3.3, to solve Eqs. (5.3.77) and (5.3.78) simultaneously. Here we write out the first two approximations of the function φ*(A) determining the switching line in the optimal regulator, which, just as in Section 5.3.3, is of relay type and has the form (5.3.19).

The first approximation. Substituting the expression ∂F/∂Φ̃ from (5.3.77) into Eq. (5.3.78), omitting the terms of the order of ε² and of higher orders, and using (5.3.45)-(5.3.48), we obtain the following Bellman equation in the first approximation:

Since, by definition, W(τ, A_t̃, Φ̃_t̃) = 1 at all points in the interior of the domain of admissible states (that is, for all A_t̃ < R0), we can transform (5.3.79), with regard to (5.3.45), to the form

The function φ1(A) determining the switching line in the first approximation is found from the condition that the expression in the square brackets in (5.3.80) attains its maximum. For ∂F1/∂A < 0,16 we obtain

Comparing (5.3.81) with (5.3.51), as well as with (5.1.55), we conclude that, in the first approximation in ε, the switching line of the optimal quasiharmonic stabilization system always coincides with the abscissa axis on the plane (x, ẋ); this fact is independent of the type of system nonlinearity, the existence of random perturbations, and the optimality criterion. Some distinctions between the expressions for φ*(A) appear only in higher-order approximations.

The equation for the loss function F1 (A) in the first approximation with regard to (5.3.81) has the form

A unique solution of this equation is determined by the natural boundary conditions

For simplicity, we shall consider the case where the plant is a linear quasiharmonic system. In this case, we have x(x, ẋ) ≡ 1 in (5.3.1) and x̄(A) = −A/2 in (5.3.82). Solving (5.3.82) with the second condition in (5.3.83), we readily obtain

The expression (5.3.84) is used for determining the switching line in the second approximation.

16It follows from (5.3.84) that the condition ∂F1/∂A < 0 is satisfied for all A ∈ (0, R0].
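Equation (5.3.82) itself is not legible in this copy, so the following sketch assumes the standard mean-first-passage-time form for the averaged amplitude motion of the linear plant, (B/4)F1'' + (x̄(A) − 2μ + B/(4A))F1' = −1 with F1(R0) = 0 and x̄(A) = −A/2; under this assumption, quadrature gives F1'(A) = −(4/B)e^{−ψ(A)} ∫_0^A e^{ψ(s)} ds with ψ(A) = ln A − (A² + 8μA)/B, and the footnote's claim ∂F1/∂A < 0 can be checked numerically.

import numpy as np

B, u_m, R0 = 1.0, 1.0, 3.0
mu = 2.0 * u_m / np.pi

A = np.linspace(1e-6, R0, 4001)
h = A[1] - A[0]
psi = np.log(A) - (A ** 2 + 8.0 * mu * A) / B
inner = np.cumsum(np.exp(psi)) * h            # int_0^A exp(psi(s)) ds
dF1 = -(4.0 / B) * np.exp(-psi) * inner       # F1'(A), nonpositive
F1 = np.cumsum((-dF1)[::-1])[::-1] * h        # F1(A) = int_A^R0 (-F1'(s)) ds
print("max F1' =", dF1.max(), " F1(0+) =", F1[0])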


The second approximation. The switching line in the second approximation is calculated by analogy with Section 5.3.3. Namely, in Eq. (5.3.78) we consider the terms of the order of ε² and retain the terms depending on φ2, taking into account the fact that φ2*(A) is small (φ2* = εφ̃2). Then we see that the desired function φ2*(A) in the second approximation is determined by the condition that the expression

attains its maximum. If the system is linear, then we have x̃(A, Φ) = (A cos 2Φ)/2, and the desired expression for φ2*(A), which follows from the condition ∂N/∂φ2 = 0 with regard to (5.1.44) and (5.1.45), has the form

Figure 43 shows the switching line given by (5.3.86).

In conclusion, let us present the block diagram (Fig. 44) of a quasioptimal self-stabilizing feedback control system with plant P described by Eq. (5.3.1). The feedback circuit (the regulator) of this system contains a differentiator, a multiplier, an adder, an inverter, a relay unit, and two nonlinear transducers NC1 and NC2. Unit NC1 realizes the functional dependence A = √(x² + ẋ²), that is, produces the current value of the amplitude A. Unit NC2 models the functional dependence φ2*(A), which is given either by (5.3.68) or by (5.3.86), depending on the problem considered. Thus, the feedback circuit in the diagram in Fig. 44 realizes the control law

u(x, ẋ) = −u_m sign(ẋ + x φ2*(√(x² + ẋ²))),

which coincides with (5.3.19) to within terms of the order of ε².
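A minimal sketch of this control law (the code and the interface are ours, not the book's); the switching-line function φ2* is passed in, since its specific form, (5.3.68) or (5.3.86), depends on the problem considered.

import math

def quasioptimal_control(x, xdot, u_m, phi2):
    # u = -u_m * sign(xdot + x * phi2(A)),  A = sqrt(x^2 + xdot^2)
    A = math.hypot(x, xdot)
    return -u_m if xdot + x * phi2(A) >= 0.0 else u_m

# first approximation: phi2 = 0, switching on the abscissa axis
print(quasioptimal_control(1.0, -0.5, 1.0, lambda A: 0.0))   # prints 1.0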

We also note that the diagram in Fig. 44 becomes significantly simpler if system (5.3.1) is controlled by using the quasioptimal algorithm in the first approximation (5.1.55), (5.1.56). In this case, the part of the diagram indicated by the dashed line is absent.

§5.4. Optimal control of quasiharmonic systems with noise in the feedback circuit

Now we shall show how to generalize the results of the preceding section to the case where the error in the measurement of the output (controlled) variable x(t) cannot be removed.

5.4.1. Statement of the problem. We shall consider the feedback control system whose block diagram is shown in Fig. 25. Just as in §5.3, we assume that the plant P is a quasiharmonic controlled system perturbed by the standard white noise and described by the equation

We seek the optimal (scalar) control u* = u*(t) in the class of piecewise continuous functions whose absolute value is bounded by u_m:

It is required to calculate the controller C so as to provide the optimal damping of the oscillations x(t) arising in system (5.4.1) under the action of the random perturbations ξ(t). In this case, the quality of the damping is estimated by the mean value of the functional

The functions x(x, ẋ) and c(x, ẋ) in (5.4.1), (5.4.3) are the same as in (5.3.1), (5.3.3). Therefore, problem (5.4.1)-(5.4.3) is completely identical to problem (5.3.1)-(5.3.3).

The single but important distinction between these problems is the fact that now it is impossible to measure the current value of the controlled variable x(t) exactly. We assume that the result y(t) of our measurement is an additive mixture of the true value of x(t) and a random error of small intensity:

y(t) = x(t) + √ε η(t),   (5.4.4)

where ε is the same small parameter as in (5.4.1) and the random function η(t) is a white noise (independent of ξ(t)) with the characteristics

where N > 0 is the intensity (spectral density) of the process η(t).

Now, to obtain information about the current state of the plant at time t, we need to use the entire prehistory of the observed process y_0^t = {y(τ): 0 ≤ τ ≤ t} from the initial time t = 0 till the current time t. Therefore, in this case, the current values of the control action u_t and the function (5.3.5) of minimum future losses depend on the observed realization y_0^t, that is, they are the functionals

F(t, y_0^t) = min_{|u(τ)|≤u_m, t≤τ≤T} E[ ∫_t^T c(x(τ), ẋ(τ)) dτ | y_0^t ].   (5.4.7)

Page 317: Optimal Design of Control Systems Stochastic and Deterministic Problems

300 Chapter V

The principal distinction between problems (5.4.1)-(5.4.4) and (5.3.1)-(5.3.3) is that, to find the optimal control functional (5.4.6) that minimizes the optimality criterion (5.4.3), we need to choose the space of states of the controlled system (the sufficient coordinates of the problem; see §1.5, §3.3, and §4.2) in a special way, which will allow us to use the dynamic programming approach for solving the synthesis problem.

Let us show how to determine the sufficient coordinates for problem (5.4.1)-(5.4.5).

5.4.2. Equations for the sufficient coordinates. Let us consider the random function z(t) = ∫_0^t y(τ) dτ. Then, writing the plant equation (5.4.1) as a system of first-order equations,

and assuming that the control u is a given function of time, we can readily show that z(t) is the observable component of the three-dimensional Markov process (x1(t), x2(t), z(t)). By using (5.4.4), (5.4.5), and (5.4.8), as well as the results of §1.5, we readily obtain an equation for the a posteriori probability density w_ps(t, x) = w_ps(t, x1, x2) = w(x1, x2 | z_0^t) = w(x1, x2 | y_0^t) of the components of the unobservable diffusion process determined by system (5.4.8). The corresponding equation is a special case of Eq. (1.5.39) and has the form

∂w_ps(t, x)/∂t = −∂/∂x_α (K_α w_ps) + (1/2) ∂²/∂x_α∂x_β (B_αβ w_ps) + [Q(x, y) − Q̄] w_ps.   (5.4.9)

Here the subscripts α, β take the values 1 and 2, and

Equation (5.4.9) for the a posteriori density also remains valid if the control u in (5.4.8) is a functional of the observed process z_0^t (or y_0^t) or even of the a posteriori density w_ps(t, x) itself. This fact is justified in [175] (see also §1.5).

It follows from (5.4.4), (5.4.5), (5.4.9), (5.4.10), and the results of §1.5 that the a posteriori probability density w_ps(t, x), treated as a function of time, is a Markov stochastic process and thus can be used as a sufficient coordinate in the synthesis problem. However, instead of w_ps(t, x), it is usually more convenient to use a system of parameters equivalent to w_ps(t, x). If we write x1⁰(t) = x⁰_{1t}, x2⁰(t) = x⁰_{2t} for the coordinates of the maximum


point of the a posteriori probability density w_ps(t, x) at time t,17 then, expanding w_ps(t, x) in a Taylor series around this point, we obtain the following representation for w_ps(t, x) = w_ps(t, x1, x2) (see (1.5.41)):

(in (5.4.11) the sum is over n_i, i = 1, ..., s, assuming the values 1 and 2). If we substitute (5.4.11) into (5.4.9) and equate the coefficients of equal powers of (x_{n1} − x⁰_{n1}) ··· (x_{ns} − x⁰_{ns}) on the left- and right-hand sides, then we obtain a system of differential equations for the parameters x⁰_i(t) and a_{n1...ns}(t) (see (1.5.43)). Note that since Eq. (5.4.9) is symmetrized, the stochastic equations obtained for x⁰_i(t) and a_{n1...ns}(t) are also symmetrized.

It is convenient to replace the probability density w_ps(t, x) by a set of parameters, since we can often truncate the infinite system of the parameters x⁰_i, a_{n1...ns} [167, 170, 181], retaining only a comparatively small number of terms in the sum in the exponent in (5.4.11). The error admitted in this case, as compared with the exact expression for w_ps, is the smaller, the higher the a posteriori accuracy of estimation of the unobservable components x1 and x2 (or, which is the same, the smaller the norm of the matrix ||D_αβ|| of the a posteriori variances); here the norm of the matrix ||D_αβ|| is of the order of ε, since, in view of (5.4.4), the observation error is a small variable of the order of √ε.

It is often assumed [167, 170] that a_{n1n2n3} = a_{n1n2n3n4} = ··· = 0 in (5.4.11) (the Gaussian approximation). In the Gaussian approximation, from (5.4.9) and (5.4.10) we obtain the following system of equations for the parameters of the a posteriori density w_ps(t, x1, x2):18

17The variables x1⁰(t) and x2⁰(t) are estimates of the current values of the coordinate x(t) and the velocity ẋ(t) of the control system (5.4.1). If the estimation quality is determined only by the value of the a posteriori probability, then x1⁰(t) and x2⁰(t) are the optimal estimates.

18For the linear oscillator (when x(x, ẋ) ≡ 1 in (5.4.1)), the a posteriori density (5.4.11) is exactly Gaussian, and Eqs. (5.4.12) are exact.


Ḋ11 = 2D12 − D11²/(εN) + ··· ,

Ḋ12 = D22 − D11 − D11D12/(εN) + ··· ,

Ḋ22 = εB − 2D12 − D12²/(εN) + ··· ,   (5.4.12)

where the omitted terms contain the partial derivatives of the nonlinearity x(x1, x2) evaluated at the a posteriori means (for the linear oscillator these terms vanish).

To write these equations, we have passed from the parameter system ||a_αβ|| to the matrix ||D_αβ|| = ||a_αβ||⁻¹ of the a posteriori covariances. Besides, in (5.4.12) we have used the notation

Let us make some remarks concerning Eqs. (5.4.12). First, since (see (5.4.1), (5.4.4), and (5.4.5)) the noise intensity in the plant and in the feedback circuit is assumed to be small (of the order of ε), the covariances of the a posteriori distribution are also small variables of the order of ε, that is, we can write D11 = εD̃11, D12 = εD̃12, and D22 = εD̃22. This implies that the terms in (5.4.12) are of different orders of magnitude, and thus Eqs. (5.4.12) can be simplified further. Retaining the most important terms and omitting the terms of the order of ε² and of higher orders, we can rewrite (5.4.12) in the form

We also note that, in this approximation, the last three equations in (5.4.13) can be solved independently of the first two equations. In particular, we


see that, for a long observation time, stationary operating conditions set in and the covariances of the a posteriori probability distribution attain some steady-state values D*11, D*12, and D*22 that do not change during the further observation. These limit covariances depend neither on the control nor on the type of the plant nonlinearity (the function x(x, ẋ) in (5.4.1)) and are equal to

In what follows, we obtain the control algorithm for the optimal stabilizer (controller) C under these stationary observation conditions.
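Because the intermediate relations (5.4.12)-(5.4.14) are only partially legible here, the following sketch merely illustrates the structure they describe for the linear oscillator (x ≡ 1): the covariance of the sufficient coordinates obeys a Riccati equation of the Kalman-Bucy type and settles to a steady state. The specific matrices are our assumptions matching ẋ1 = x2, ẋ2 = −x1 − εx2 + ···, y = x1 + √ε η.

import numpy as np

eps, B, N = 0.1, 1.0, 1.0
A = np.array([[0.0, 1.0], [-1.0, -eps]])     # drift matrix of (x1, x2)
Q = np.array([[0.0, 0.0], [0.0, eps * B]])   # plant noise enters x2 only
C = np.array([[1.0, 0.0]])                   # the observation sees x1
R_inv = np.array([[1.0 / (eps * N)]])        # inverse observation intensity

def riccati_step(D, dt):
    # dD/dt = A D + D A^T + Q - D C^T R^{-1} C D
    return D + (A @ D + D @ A.T + Q - D @ C.T @ R_inv @ C @ D) * dt

D = np.eye(2)
for _ in range(200000):
    D = riccati_step(D, 1e-4)
print(D)   # steady-state a posteriori covariances, cf. (5.4.14)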

5.4.3. The Bellman equation and the solution of the synthesis problem. In the Gaussian approximation, the loss function (5.4.7) is completely determined by the current values of the a posteriori means x1⁰(t) = x⁰_{1t} and x2⁰(t) = x⁰_{2t} and by the values of the a posteriori covariances D11, D12, and D22. Under the stationary observation conditions, the a posteriori covariances (5.4.14) are constant, and therefore we can take x1⁰(t), x2⁰(t), and the time t as the arguments of the loss function (5.4.7). Thus, in this case, instead of (5.4.7), we have

F(t, x⁰_{1t}, x⁰_{2t}) = min_{|u(τ)|≤u_m, t≤τ≤T} E[ ∫_t^T c(x(τ), ẋ(τ)) dτ | x⁰_{1t}, x⁰_{2t} ].   (5.4.15)

In (5.4.15) the symbol E of the mathematical expectation means the a posteriori averaging, that is, the averaging with the a posteriori probability density. In other words, if we write the integral in the square brackets in (5.4.15) as a function of the initial values of the unobservable variables x_{1t} and x_{2t}, then, to obtain F(t, x⁰_{1t}, x⁰_{2t}), we need to integrate this function with respect to x_{1t} and x_{2t} with the Gaussian probability density

For the function (5.4.15), the basic functional equation (the optimality principle) of the dynamic programming approach has the form

F(t, x⁰_{1t}, x⁰_{2t}) = min_{|u(τ)|≤u_m, t≤τ<t+Δ} E[ ∫_t^{t+Δ} c(x_{1τ}, x_{2τ}) dτ + F(t + Δ, x⁰_{1,t+Δ}, x⁰_{2,t+Δ}) ].   (5.4.17)

The differential Bellman equation can be obtained from (5.4.17) by the standard derivation procedure outlined in §1.4 and §1.5. To this end, we need to expand the function F(t + Δ, x⁰_{1,t+Δ}, x⁰_{2,t+Δ}) in a Taylor series around the point (t, x⁰_{1t}, x⁰_{2t}), to calculate the mean values of the increments

and the integral

E[ ∫_t^{t+Δ} c(x_{1τ}, x_{2τ}) dτ ],   (5.4.19)

to substitute the expressions obtained for (5.4.18) and (5.4.19) into (5.4.17), and to pass to the limit as Δ → 0.

To calculate the mean values of (5.4.18), we need Eqs. (5.4.13) and formulas (5.4.4) and (5.4.5). So, from (5.4.13) we obtain

Since the stochastic processes x_{1τ} = x1(τ), x⁰_{1τ} = x1⁰(τ), and x⁰_{2τ} = x2⁰(τ) are continuous, for small Δ we can replace these stochastic functions by the constant values x_{1t}, x⁰_{1t}, and x⁰_{2t}. The error of this replacement is of the order of o(Δ). As a result, if we average with respect to η(t) with regard to (5.4.5), then (5.4.20) implies

By averaging (*) with respect to x_{1t} with the probability density (5.4.16), we finally obtain

E(x⁰_{1,t+Δ} − x⁰_{1t}) = x⁰_{2t} Δ + o(Δ).   (5.4.21)

Page 322: Optimal Design of Control Systems Stochastic and Deterministic Problems

Control of Oscillatory Systems 305

In a similar way, we can find the other expressions for (5.4.18) and (5.4.19):

E[ ∫_t^{t+Δ} c(x_{1τ}, x_{2τ}) dτ ] = Δ ∫∫ c(x_{1t}, x_{2t}) N(x⁰, D*) dx_{1t} dx_{2t} + o(Δ).   (5.4.22)

Using (5.4.21) and (5.4.22) and letting Δ → 0 in (5.4.17), we obtain

−∂F/∂t (t, x1⁰, x2⁰) = x2⁰ ∂F/∂x1⁰ − [x1⁰ + εx(x1⁰, x2⁰)x2⁰ − ···] ∂F/∂x2⁰ + ··· + ∫∫ c(x1, x2) N(x⁰, D*) dx1 dx2   (5.4.23)

(here we omit the subscript t in x1⁰, x2⁰, x1, x2). If the terminal time T in (5.4.3), (5.4.7), and (5.4.15) is sufficiently large, then the fact that F depends on t becomes unimportant (the stationary stabilization conditions take place), since the derivative −∂F/∂t → γ as T → ∞ (here γ is a constant that characterizes the mean losses per unit time under the optimal control). As is usual in such cases (see (1.4.29), (2.2.9), (4.1.7), and (5.3.17)), passing from F(t, x1⁰, x2⁰) to the time-independent loss function

f(x1⁰, x2⁰) = lim_{T→∞} [F(t, x1⁰, x2⁰) − γ(T − t)],

we arrive at the stationary version of Eq. (5.4.23):

Page 323: Optimal Design of Control Systems Stochastic and Deterministic Problems

306 Chapter V

Just as in §5.3, it is more convenient to solve Eq. (5.4.24) in the polar coordinates if, instead of the estimated values of the coordinate x1⁰ and the velocity x2⁰, we use as the arguments of the loss function the corresponding values of the amplitude A0 and the phase Φ0:

x1⁰ = A0 cos Φ0,   x2⁰ = −A0 sin Φ0   (Φ0 = t + φ0).   (5.4.25)

Performing the change of variables (5.4.25), we transform (5.4.24) to the form

− min_{|u|≤u_m} [G(A0, Φ0, u) ∂f/∂A0 + H(A0, Φ0, u) ∂f/∂Φ0]).   (5.4.26)

The expressions for G(A0, Φ0, u) and H(A0, Φ0, u) coincide with (5.1.17) after the change A, Φ → A0, Φ0. The function c*(A0, Φ0) is determined by the penalty function c(x, ẋ) in (5.4.3) (e.g., for c(x, ẋ) = x² + ẋ², we have

c*(A0, Φ0) = A0² + ε(D̃*11 + D̃*22)). In (5.4.26), L0 denotes the differential operator

··· + (cos²Φ0/A0) ∂/∂A0 − (sin 2Φ0/A0²) ∂/∂Φ0).   (5.4.27)

Note that, as N → 0, formula (5.4.27) passes into formula (5.3.15) for the operator L obtained in §5.3 for systems with complete information about the phase coordinates of the plant. We can readily verify this fact by substituting the values (5.4.14) of the steady-state covariances into (5.4.27) and passing to the limit as N → 0. Then (5.4.27) acquires the form of (5.3.15), and Eq. (5.4.26) coincides with (5.3.18).


Equation (5.4.26) can be solved by the approximate method outlined in §5.3. Indeed, the principal assumption (necessary for the approximate method to be efficient) that the trajectories of the sufficient coordinates x1⁰(t) and x2⁰(t) are quasiharmonic is satisfied in this case, since the noise ξ(t) in the plant and the noise η(t) in the feedback circuit are small (their intensities are of the order of ε). In view of this fact, the rates of change of the estimated values of the amplitude A0 and the phase φ0 are small, and hence we can use the successive approximation procedure of §5.3.

To this end, considering the loss function (5.4.15) as a function of the estimates of the amplitude A0 and the phase Φ0 and writing the finite-difference equation (5.4.17) for the time interval Δ = 2π, we obtain the equation

··· + (1/2)(εΔφ0)² ∂²f/∂Φ0² + ···] = 0,   (5.4.28)

similar to Eq. (5.3.26). Next, just as in §5.3, by using (5.4.26), we eliminate the derivatives of the loss function with respect to the phase Φ0 from (5.4.28) and solve the resulting one-dimensional infinite-order equation by the method of successive approximations.

Note that the increments of the estimated values of the amplitude εΔA0 and the phase εΔφ0 on the time interval Δ = 2π can readily be calculated with the help of Eqs. (5.4.13) for the sufficient coordinates written in the polar coordinates A0 and Φ0 in accordance with the change of variables (5.4.25). In this case, just as in §5.3, we assume in advance that, in view of the symmetry of the problem, the optimal control has the form

u*(A0, Φ0) = u_m sign[sin(Φ0 − φ*(A0))],   (5.4.29)

and thus solving the synthesis problem is equivalent to finding the equation in the polar coordinates for the switching line φ*(A0).

We do not consider the mathematical calculations in detail (they coincide with those in §5.3), but illustrate the results obtained for the switching line in the first two approximations by the example of a controlled plant that is a linear quasiharmonic system (in (5.4.1) we have x(x, ẋ) ≡ 1). By using the above procedure, we simultaneously solve Eqs. (5.4.26) and (5.4.28) and obtain the following one-dimensional Bellman equation in the first approximation (in the case of quadratic penalties c(x, ẋ) = x² + ẋ²):

··· ∂²f1/∂A0² + min_{φ*} [−2μ cos φ* ∂f1/∂A0] = γ − A0²,   (5.4.30)

μ = 2u_m/π,   γ̃ = (D̃*11)²/(4N).

Hence we obtain the following equation for the switching line in the first approximation:

φ1(A0) ≡ 0,   (5.4.31)

which corresponds to the control law

Taking into account (5.4.31), from (5.4.30) we obtain the expression

for the derivative ∂f1/∂A0, which enters the formula for the switching line in the second approximation:


Since φ2*(A0) is small, it follows from (5.4.25) and (5.4.29) that the quasioptimal control algorithm in the second approximation can be written as

u2(x1⁰, x2⁰) = −u_m sign[x2⁰ + x1⁰ φ2*(√((x1⁰)² + (x2⁰)²))].

The block diagram of a self-stabilizing system realizing the control algorithm in the second approximation is shown in Fig. 45. The most important distinction between this system and that in Fig. 44 is that the feedback circuit contains an additional element SC producing the current values of the sufficient coordinates x1⁰(t) and x2⁰(t). Figure 46 presents the diagram of this element in detail.



CHAPTER VI

SOME SPECIAL APPLICATIONS OF ASYMPTOTIC SYNTHESIS METHODS

In this chapter we consider some methods for solving adaptive problems of optimal control (§6.1), as well as problems of control with constrained phase coordinates (§6.2). Furthermore, in §6.3 we solve a problem of controlling the size of a population whose behavior is described by a stochastic logistic model.

"Adaptive problems" are optimal control problems, similar to those con- sidered above, that are solved under the assumption that some system pa- rameters are unknown a priori. In this case, just as in problems with observation noise (33.3, $4.2, and $5.4), the optimal controller is a com- bination of the optimal filtration unit and the controlling unit properly producing the required controlling actions on the plant. In $6.1 we present an approximate method for calculating such controllers; this method is ef- fective if the a priori indeterminacy of unknown parameters is relatively small.

In §6.2 we present exact and approximate solutions of some stochastic problems of control with constrained phase coordinates. We consider two servomechanisms and a stabilizing system under the assumption that the range of admissible deviations between the command signal and the output coordinate is a fixed interval on the coordinate axis. We consider the two cases of reflecting and absorbing screens at the endpoints of this interval. In solving the stabilization problem, we study a two-dimensional problem in which the phase trajectories reflect along the normal at the boundary of the region of admissible phase variables.

In §2.4 we have already studied the problem of control of a population size and have exactly solved a special control problem based on the stochastic Malthus model. In §6.3 we shall consider the general case of a stochastic logistic controlled model and construct an optimal control algorithm for this model in terms of generalized power series. We also obtain approximate finite formulas for quasioptimal algorithms, which can be used for large values of the model parameter called the medium capacity.


§6.1. Adaptive problems of optimal control

In this section we consider the synthesis problem for controlled dynamic systems perturbed by a white noise and described by equations with unknown parameters. We assume that the system equations contain these parameters linearly and that the a priori indeterminacy of these parameters is small in some sense. First we present a formal algorithm for solving the Bellman equation approximately (and for the synthesis of a quasioptimal control). The algorithm is based on the method of successive approximations in which the solution of the optimal control problem with completely known values of all parameters is used as the zero approximation (a generative solution). Next, we estimate the quality of the approximate synthesis (for the first two approximations). Finally, we illustrate our method by calculating a quasioptimal stabilization system in which the controlled plant is an aperiodic dynamic unit with an unknown inertia factor.

6.1.1. We shall consider control systems where the plant is described by stochastic differential equations of the form

Here x is an n-dimensional phase vector, u is an r-dimensional control vector, θ(x) is an n-dimensional vector of known functions, ξ(t) is an n-dimensional vector of random functions of the white noise type (1.1.34), and A, B, σ are constant matrices of the corresponding dimensions.

Here B and σ are known matrices (det σ ≠ 0), and some (or all) elements of the matrix A are a priori unknown. The functions θ1(x), ..., θn(x) are arbitrary. The only assumption is that, at least in the weak sense [131], Eqs. (6.1.1) have a unique solution x(t) = x_u(t), t ≥ t0, for a given x(t0) = x and any admissible control u.

In what follows, it is convenient to denote the unknown parameters of the matrix A by the special letter α. Numbering all unknown parameters in an arbitrary way and writing them as a column α = (α1, ..., αk), we can rewrite Eq. (6.1.1) as

where A* is obtained from the matrix A by substituting zeros for all unknown elements, and the n × k matrix Q(x) (which consists of the functions θi(x) and zeros) is uniquely determined by the vector α from the condition Aθ(x) = A*θ(x) + Q(x)α. The goal of control is to minimize with respect to u the mean value of the functional


where c(x) and ψ(x) are some nonnegative bounded continuous functions, and H is a positive definite constant r × r matrix. We do not impose any restrictions on the admissible values of the control vector u and assume that the state vector x can be measured exactly at any time t ∈ [0, T]. Thus, we can seek the optimal control u* that minimizes the mathematical expectation (6.1.3) in the form of the functional

where x_0^t = {x(τ): 0 ≤ τ ≤ t} is an observed realization of the state vector from the initial instant of time to the current time t.
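As a simple illustration of the decomposition Aθ(x) = A*θ(x) + Q(x)α introduced above (the example is ours): let n = 2 and suppose that only the element in the second row and first column of A is unknown, A21 = α1, all other elements being known. Then

A* = ( A11  A12 ; 0  A22 ),   Q(x) = ( 0 ; θ1(x) ),   α = (α1),

and indeed Aθ(x) = A*θ(x) + Q(x)α, since the unknown entry contributes only the term α1θ1(x) to the second component of Aθ(x).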

6.1.2. The approximate synthesis algorithm. We assume that the difference between the unknown parameters α and the a priori known vector α0 is small. To obtain a rigorous mathematical statement, we assume that α is a random vector subject to an a priori Gaussian distribution with mean α0 and the covariance matrix D0 = εD̃0 (ε is a small parameter). This assumption and Eqs. (6.1.2) imply the following two facts that we need in the sequel.

1. The a posteriori probability density p(α | x_0^t) = p_t(α) calculated from observations of the process x(t)¹ is a Gaussian (conditionally Gaussian) density completely described by the vector m = m(t) = m_t of a posteriori mean values and the matrix D = D(t) = D_t of a posteriori covariances. The latter are described by the following differential equations (see [132, 175]):

d°m = DQᵀ(x(t))N⁻¹[d°x(t) − (Ã(m)θ(x) + Bu) dt],   (6.1.5)

Ḋ = −D N1(x(t)) D.   (6.1.6)

Throughout this section, N⁻¹ is the inverse of the matrix N = σσᵀ, N1 = QᵀN⁻¹Q, and the matrix Ã(m) is obtained from A in (6.1.1) by replacing all unknown parameters α by their a posteriori means m.² We also note that system (6.1.5) consists of stochastic differential Ito equations, while the differential equations in system (6.1.6) are understood in the usual sense.

2. The elements of the matrix D_t are small variables (∼ ε) for all t > 0. Indeed, by integrating the matrix equation (6.1.6), we obtain the following explicit formula for the covariance matrix D_t in quadratures:

D_t = (E + D0 ∫_0^t N1(x(s)) ds)⁻¹ D0   (6.1.7)

¹It follows from (6.1.2) and (6.1.4) that x(t) is a diffusion-type process.

²As is known [38, 39, 167], the a posteriori means m = m_t are optimal estimates of α with respect to the minimum mean square error criterion.


(E is the k × k identity matrix). Denoting the columns (with the same numbers) of the matrices D_t and D0 by y_t and y0, respectively, we obtain from (6.1.7) the relations

Since the constant matrices D0 and N⁻¹ are positive definite, the matrix R(s) is nonnegative definite; R(s) is degenerate if and only if all elements of at least one column of the matrix Q are zero.

Let λ(s) ≥ 0 be the minimum eigenvalue of the matrix R(s). On multiplying (6.1.8) by y_t in the scalar way, we obtain

(here ||y_t|| is the Euclidean norm of the vector y_t). Replacing the quadratic form in (6.1.9) by its lower bound and estimating the inner product (y0, y_t) with the help of the Cauchy-Schwarz-Bunyakovskii inequality, we arrive at the inequality

Since ||y0|| ∼ ε, it follows from (6.1.10) that ||y_t|| ∼ ε. Thus we have D_t ∼ ε

for all t ∈ [0, T].
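The decay of D_t can be seen directly in simulation. The sketch below (ours, not the book's) integrates the filter (6.1.5), (6.1.6) in the scalar case n = k = r = 1 with θ(x) = x, so Q(x) = x and N = σ²; the plant drift αx + bu used here is an assumption chosen for illustration.

import numpy as np

rng = np.random.default_rng(0)
alpha, b, sigma = -0.7, 1.0, 0.5     # alpha is unknown to the filter
m, D = -0.3, 0.2                      # a priori mean and variance of alpha
x, dt = 1.0, 1e-3
N = sigma ** 2

for _ in range(200000):
    u = 0.0
    dx = (alpha * x + b * u) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    m += D * x / N * (dx - (m * x + b * u) * dt)   # Eq. (6.1.5)
    D += -(D ** 2) * x ** 2 / N * dt               # Eq. (6.1.6)
    x += dx

print("m =", m, " true alpha =", alpha, " D =", D)   # D has decayed ~ eps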

We shall solve the problem of optimal control synthesis by the dynamic programming approach. To this end, we first note that the a posteriori probability density p_t(α) (or the current values of its parameters m_t and D_t) together with the current values of the phase vector x_t form the sufficient coordinates (see §1.5) for the problem in question. Therefore, these parameters and the time t are the arguments of the loss function given, as usual, by the formula

F(t, x, m, D) = min_{u(s)∈R_r, t≤s≤T} E[ ∫_t^T (c(x(s)) + uᵀ(s)Hu(s)) ds + ψ(x(T)) ].   (6.1.11)

The expression in the square brackets in (6.1.5) is the differential of the Wiener process (the innovation process [132]) with the matrix N of


diffusion coefficients. Therefore, it follows from (6.1.2), (6.1.5), and (6.1.6) that the variables (x_t, m_t, D_t) form a diffusion Markov process (degenerate with respect to D). By applying the standard derivation procedure (see §1.4, as well as [97]), we obtain the following differential Bellman equation for the function F = F(t, x, m, D):

−F_t = θᵀ(x)Ãᵀ(m)F_x + min_{u∈R_r} [uᵀBᵀF_x + uᵀHu] + (1/2) Sp(N F_{xxᵀ}) + ··· .   (6.1.12)

Here F_t = ∂F/∂t, F_x is a column vector with the components ∂F/∂x1, ..., ∂F/∂xn, and F_{xxᵀ}, F_{xmᵀ}, ... are the matrices of partial derivatives; Sp(·) denotes the trace of the matrix (·).

Since the covariance matrix D is of the order of ε, it is now expedient

to pass to the new variables D̃ according to the formula D = εD̃. Performing this substitution and minimizing the expression in the square brackets, we transform Eq. (6.1.12) to the form

In this case, the vector

u* = −(1/2) H⁻¹ Bᵀ F_x(t, x, m, D),   (6.1.14)


at which the function in the square brackets in (6.1.12) attains its minimum, determines the optimal control law; it becomes a known function u* = u*(t, x, m, D) of the sufficient coordinates after the loss function F = F(t, x, m, D) is calculated from Eq. (6.1.13).

Now let us discuss whether Eqs. (6.1.13) can be solved. Obviously, in the more or less general case, it is hardly possible to obtain an exact solution. Moreover, one cannot construct the exact solution of Eq. (6.1.13) even in the special case where θ(x) is a linear function and c(x) and ψ(x) are quadratic functions of x, that is, in the case in which the synthesis problem with known parameters in system (6.1.1) can be solved exactly. The crucial difficulty in this case is related to the bilinear form (in the variables x and m) appearing in the coefficients of the first-order derivatives F_x. On the other hand, the high accuracy of estimating the unknown parameters α, due to which a small parameter ε appeared in the last three terms in (6.1.13), leads to the rather natural assumption that the difference between the exact solution of (6.1.13) and the solution of (6.1.13) with ε = 0 is small. (In other words, the difference between the solution of the synthesis problem with unknown parameters α and the similar solution with given α = α0 is small.)

The above considerations allow us to believe that an efficient approximate solution of Eq. (6.1.13) (that is, of the synthesis problem) can be obtained by means of the regular asymptotic method based on the expansion of the desired loss function F in powers of the small parameter ε:

F = F⁰ + εF¹ + ε²F² + ··· .   (6.1.15)

Substituting (6.1.15) into (6.1.13) and grouping terms of the same order with respect to ε, we obtain the following equations for the successive approximations:

−F⁰_t = θᵀ(x)Ãᵀ(m)F⁰_x − (1/4)(F⁰_x)ᵀB1F⁰_x + (1/2) Sp(N F⁰_{xxᵀ}) + c(x),
0 ≤ t < T,   F⁰(T, x, m, D̃) = ψ(x);   (6.1.16)


The zero-approximation equation (6.1.16) is nonlinear,³ while the successive approximations can be found by solving the linear equations (6.1.17) and (6.1.18), which usually is a simpler computational problem. Thus, the described scheme for the approximate solution of Eq. (6.1.13) is useful only if Eq. (6.1.16), that is, the Bellman equation for the problem with completely known parameters α, can be solved exactly. As was already pointed out, this last condition is satisfied if the θi(x) are linear functions and c(x) and ψ(x) are quadratic functions of the phase variables x. In this case, all successive approximations can also be calculated in the form of quadratures (see §3.1 in [34]).

The solutions of Eqs. (6.1.16)-(6.1.18) of successive approximations can be used for obtaining an approximate solution of the synthesis problem. Namely, the quasioptimal control u_s(t, x, m, D̃) corresponding to the sth approximation is determined by formula (6.1.14) after the function F in (6.1.14) is replaced by the approximate expression F^s = F⁰ + εF¹ + ··· + ε^sF^s.

6.1.3. Estimates of the quality of approximate synthesis. We assume that the quasioptimal control u_s(t, x, m, D) has already been obtained in the sth approximation. By

G^s(t, x, m, D) = E{ ∫_t^T [c(x(τ)) + u_sᵀ(τ)Hu_s(τ)] dτ + ψ(x(T)) }   (6.1.19)

we denote the mean value (calculated from the time instant t) of the optimality criterion (6.1.3) for the control u_s.⁴ The deviation Δ^s = G^s − F of the function (6.1.19) from the exact solution F(t, x, m, D) of the Bellman equation (6.1.13) is a natural measure of the quality of the approximate control u_s(t, x, m, D). In what follows, we calculate the order of Δ^s in the first two approximations, that is, we estimate Δ⁰ and Δ¹.

Just as in §3.4, we calculate the desired estimates Δ^s (s = 0, 1) in two steps. First we estimate the differences δ^s = F − F^s, and then γ^s = F^s − G^s, which immediately implies the estimates for Δ^s (in view of the triangle inequality).

Estimation of the differences δ⁰ and δ¹. Let θ(x), c(x), and ψ(x) be bounded continuous functions for all x ∈ Rⁿ. Then it follows from Theorem 2.8 (for the Cauchy problem) in [124] that the quasilinear equations

³The partial differential equations (6.1.13) and (6.1.16) of parabolic type are linear with respect to the higher-order derivatives of the loss function. That is why equations of the form (6.1.13) and (6.1.16) are sometimes called weakly nonlinear (quasilinear or semilinear); see [61, 124].

⁴In (6.1.19), u_s(τ) = u_s(τ, x_{u_s}(τ), m_{u_s}(τ), D_{u_s}(τ)), where x_{u_s}(τ), m_{u_s}(τ), and D_{u_s}(τ) satisfy Eqs. (6.1.2), (6.1.5), and (6.1.6) with u = u_s(τ) for τ > t and the initial conditions x_{u_s}(t) = x, m_{u_s}(t) = m, D_{u_s}(t) = D.


(6.1.13) and (6.1.16) have at most one solution in the class of functions that are continuous in the strip Π_T = {|x| < ∞, |D| < ∞, |m| < ∞, 0 ≤ t ≤ T}, continuously differentiable once in t and twice in the other variables for 0 < t < T, and possessing bounded first- and second-order derivatives with respect to x, m, D in Π_T. Furthermore, Theorem 2.5 (for quasilinear equations) in [124] implies the following estimate for the solution of the Cauchy problem (6.1.13):

(here C1, C2 ≥ 0 are some constants; it is assumed that the function c may depend not only on x, as in (6.1.13), but also on the other variables t, m, D). The above arguments also hold for the linear equations (6.1.17) and (6.1.18) of successive approximations.

By introducing the quasilinear operator L, we rewrite Eq. (6.1.13) in the form

LF = −c(x),   0 ≤ t < T;   F(T, x, m, D) = ψ(x).

Then, for δ⁰ = F − F⁰, we obtain from (6.1.13) and (6.1.16) a quasilinear equation of the form

(with regard to the fact that the solution F⁰ of the zero-approximation equation (6.1.16) is independent of D̃, and therefore F⁰_D̃ = 0). The vector of partial derivatives F⁰_x is a bounded continuous function in view of the above-mentioned properties of the solution of Eq. (6.1.16). Hence, (6.1.21) is an equation of the form (6.1.13). To use the estimate (6.1.20), we need to verify that the right-hand side of (6.1.21) is bounded.

The elements of the matrices Q and N1 are bounded, since the functions θ(x) are bounded and the matrix N is bounded and nondegenerate. Moreover, it follows from the inequality (6.1.10) that the norm of the matrix D can only decrease with time t. Therefore, the matrix D is bounded for all t ∈ [0, T] if the matrix D0 of the initial (a priori) covariances is bounded, which was assumed in advance.

It remains to estimate the matrices F⁰_{xmᵀ} and F⁰_{mmᵀ} of partial derivatives. To this end, we turn to the zero-approximation equation (6.1.16). Writing v^i = ∂F⁰/∂m_i (here m_i is an arbitrary component of the vector m) and differentiating (6.1.16) with respect to the parameter m_i, we obtain


the linear equation for v^i:

0 ≤ t < T;   v^i(T, x, m, D) = 0.   (6.1.22)

Equation (6.1.22) is written for the case where the unknown parameter α_i stands in the rth row and the jth column of the matrix A of the initial system (6.1.1); here θ_j = θ_j(x) is the jth component of the vector function θ(x). Since F⁰_x is bounded, the solution v^i of Eq. (6.1.22) and its partial derivatives v^i_x and v^i_{xxᵀ}, as was already noted, are also bounded. Finally, since v^i_x = F⁰_{m_i x} is bounded and the number i is arbitrary, the matrix F⁰_{xmᵀ} in the first term on the right in (6.1.21) is also bounded. In a similar way, we verify the boundedness of F⁰_{mmᵀ}.

Thus, it follows from (6.1.21) and (6.1.20) that δ⁰ satisfies the estimate

where C is a positive constant.

In a similar way, we can estimate δ¹ = F − F¹ = F − F⁰ − εF¹. From (6.1.13), (6.1.16), and (6.1.17), it follows that δ¹ satisfies the equation

The boundedness of F¹_x, F¹_m, F¹_{xmᵀ}, and F¹_{mmᵀ} can be verified by analogy with the case where we estimated δ⁰. Therefore, (6.1.24) and the inequality (6.1.20) imply

|δ¹| ≤ Cε².   (6.1.25)

Estimation of the differences γ⁰ and γ¹. For the functions G^s = G^s(t, x, m, D), s = 0, 1, 2, ..., determined by (6.1.19), we have the linear partial differential equations [45]

··· + c(x) + ε Sp(D̃Qᵀ(x)G^s_{xmᵀ}) + (ε²/2) Sp(D̃N1(x)D̃G^s_{mmᵀ}) − ε Sp(D̃N1(x)D̃G^s_{D̃}),   0 ≤ t < T,

G^s(T, x, m, D) = ψ(x).   (6.1.26)


The quasioptimal controls

contained in (6.1.26) are bounded continuous functions. Therefore, in view of [66], the functions G^s satisfying (6.1.26) are also bounded and twice continuously differentiable, just as the functions F and F^s discussed above.

By using the expressions u0 = −H⁻¹BᵀF⁰_x/2 and u1 = −H⁻¹Bᵀ(F⁰_x + εF¹_x)/2 for the quasioptimal controls, as well as Eqs. (6.1.26), (6.1.16), and (6.1.17), we can readily obtain the following equations for the differences γ⁰ = F⁰ − G⁰ and γ¹ = F⁰ + εF¹ − G¹:

where L0 and L1 are the linear differential operators

Since the expressions in the square brackets in (6.1.27) and (6.1.28) are bounded, the inequalities (6.1.20) for the solutions γ⁰(t, x, m, D) and γ¹(t, x, m, D) of Eqs. (6.1.27) and (6.1.28) yield the estimates

Finally, from (6.1.29), (6.1.23), and (6.1.25), with regard to the inequality |Δ^s| ≤ |δ^s| + |γ^s|, we have


The estimates (6.1.30) show that the use of the quasioptimal control u0 or u1 instead of the optimal control (6.1.14) results in a deviation (an increase) of the functional (6.1.3) by ∼ ε in the zero approximation and by ∼ ε² in the first approximation. Thus, it follows from (6.1.30) that the method of approximate synthesis of optimal control considered in Section 6.1.2 is asymptotically efficient.

6.1.4. An example. Let us consider the simplest case of system (6.1.1) in which the plant is an aperiodic first-order unit with an unknown inertia factor. In this case, Eq. (6.1.2) is the scalar equation

ẋ = −αx + bu + √ν ξ(t),   (6.1.31)

where α is an unknown parameter, b and ν > 0 are given numbers, and ξ(t) is a scalar white noise of unit intensity. We define the optimality criterion (6.1.3) as

where g and h > 0 are given constants. The optimal filtration equations (6.1.5), (6.1.6) and the Bellman equation (6.1.13) for problem (6.1.31), (6.1.32) are

d°m = −(D/ν) x(t) [d°x(t) + (m x(t) − bu) dt],   (6.1.33)

Ḋ = −(D²/ν) x²(t),   (6.1.34)

(x, m, and D are scalar variables in (6.1.33)-(6.1.35)). The zero approximation (6.1.16) for Eq. (6.1.35) has the form


The exact solution of Eq. (6.1.36) is5

F⁰(t, x, m) = f⁰(t, m)x² + r⁰(t, m),   (6.1.37)

f⁰(t, m) = g(1 − e^{−2β(T−t)}) / [β + m + (β − m)e^{−2β(T−t)}],   β = √(m² + b²g/h),

r⁰(t, m) = gν(T − t)/(β + m) − (νh/b²) ln{2β / [β + m + (β − m)e^{−2β(T−t)}]}.

It follows from (6.1.14) and (6.1.37) that the quasioptimal control in the zero approximation has the form

u0(t, x, m) = −(b/h) f⁰(t, m) x,   (6.1.38)

where f⁰(t, m) is determined by (6.1.37). To obtain the quasioptimal control in the first approximation, we need to calculate the second term in the asymptotic expansion (6.1.15). In our case, Eq. (6.1.17) for the function F¹ = F¹(t, x, m, D̃) has the form

−F¹_t = −mxF¹_x − (b²/h) f⁰(t, m) x F¹_x + (ν/2) F¹_{xx} − D̃ x F⁰_{xm},
0 ≤ t < T,   F¹(T, x, m, D̃) = 0.   (6.1.39)

Since, in view of (6.1.37), we have F⁰_{xm} = 2f⁰_m(t, m)x, we obtain the following expression for the desired function F¹(t, x, m, D̃):

(here f⁰_m(T − s, m) denotes the partial derivative ∂f⁰(s, m)/∂m of the function f⁰(s, m) in (6.1.37) with respect to the parameter m).

⁵Note that the loss function in the zero approximation is independent of the estimate variance D, i.e., F⁰ = F⁰(t, x, m).


It follows from (6.1.14), (6.1.15), (6.1.37), and (6.1.40) that the quasioptimal control synthesis in the first approximation is given by the formula

Comparing (6.1.38) and (6.1.41), we note that the optimal regulators in the zero and first approximations are linear in the phase variable x. However, if higher-order approximations are used, then we obtain nonlinear "laws of control."
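For concreteness, here is how the zero-approximation law could be evaluated numerically; since (6.1.37) and (6.1.38) are printed above in reconstructed form, the exact coefficient expressions below should be treated as assumptions.

import numpy as np

def f0(t, m, T, g, b, h):
    beta = np.sqrt(m ** 2 + b ** 2 * g / h)
    E = np.exp(-2.0 * beta * (T - t))
    return g * (1.0 - E) / (beta + m + (beta - m) * E)

def u0(t, x, m, T, g, b, h):
    return -(b / h) * f0(t, m, T, g, b, h) * x   # linear in x; gain uses the estimate m

print(u0(0.0, 1.0, 1.0, T=3.0, g=1.0, b=1.0, h=1.0))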

For example, in the second approximation, we obtain from (6.1.18) and (6.1.35) the following equation for the function F² = F²(t, x, m, D̃):

Obviously, its solution has the form

F²(t, x, m, D̃) = q(t, m, D̃)x⁴ + f²(t, m, D̃)x² + r²(t, m, D̃),

and therefore, it follows from (6.1.14), (6.1.15), (6.1.37), and (6.1.40) that the quasioptimal control in the second approximation

is a linear-plus-cubic function of x.

Figures 47 and 48 show the block diagrams of quasioptimal feedback control systems, which correspond to the first (Fig. 47) and the second (Fig. 48) approximations. By W_i (i = 0, 1, 2, 3) we denote linear (in x) amplifiers with the varying amplification coefficients

W3 = −ε²(2b/h) q(t, m, D̃).


The plant P is described by Eq. (6.1.31). The unit SC of optimal filtration forms the current values of the sufficient coordinates m = m(t) = m_t and D = D(t) = D_t. It should be noted that the coordinate m_t is formed in SC with the aid of the equation

which differs from Eq. (6.1.33). The reason is that only stochastic equations understood in the symmetrized sense [174] admit straightforward simulation. Therefore, the symmetrized equation (6.1.42) is chosen so that its solution coincides with the solution of the Ito equation (6.1.33).

6.1.5. Some results of numerical experiments. The estimates (6.1.30) establish only the asymptotic optimality of the quasioptimal controls u0 and u1. Roughly speaking, the estimates (6.1.30) only mean that the smaller the parameter ε (i.e., the smaller the a priori indeterminacy of the components of the vector α), the more grounds we have for using the quasioptimal controls u0 and u1 (calculated according to the algorithm given in Section 6.1.2) instead of the optimal (unknown) control (6.1.4) that solves problem (6.1.1)-(6.1.3).

On the other hand, in practice we always deal with problems (6.1.1)-(6.1.3) in which all parameters (including ε) have definite finite values. As a rule, it is difficult to determine in advance whether a given specific value of the parameter ε is small enough for the above approximate synthesis procedure to be used effectively. Some idea of the situations arising for various relations between the parameters of problem (6.1.1)-(6.1.3) is given by the results of numerical experiments performed to analyze the efficiency of the quasioptimal algorithms (6.1.38) and (6.1.41) (see the example considered in Section 6.1.4).

As was already noted, it is natural to estimate the quality of the quasioptimal controls u_s (s = 0, 1, 2, ...) by the differences Δ^s = G^s − F, where the functions G^s = G^s(t, x, m, D), given by (6.1.19), satisfy the linear parabolic equations (6.1.26) and the loss function F = F(t, x, m, D) satisfies the Bellman equation (6.1.13). In the example considered in Section 6.1.4, the Bellman equation has the form (6.1.35), and the functions G^s (s = 0, 1, 2, ...) satisfy the equations

Equations (6.1.35) and (6.1.43) were solved numerically (Eq. (6.1.43) was solved for s = 0 and s = 1 with the quasioptimal controls (6.1.38) and (6.1.41) taken as u_s, s = 0, 1).

Here we do not describe the finite-difference schemes for constructing numerical solutions of Eqs. (6.1.35) and (6.1.43)⁶ but only present the results of the corresponding calculations performed for different values of the parameters of problem (6.1.31), (6.1.32).

⁶Numerical methods for solving equations of the form (6.1.35) and (6.1.43) are discussed in Chapter VII.
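Although the schemes themselves are deferred to Chapter VII, a generic explicit backward-time step for a one-dimensional parabolic equation of the type −F_t = a(x)F_x + (ν/2)F_xx + q(x) may clarify what "solved numerically" involves; this sketch is ours and is not the scheme actually used.

import numpy as np

def backward_step(F, x, dt, a, nu, q):
    dx = x[1] - x[0]
    Fx = np.gradient(F, dx)                              # first derivative
    Fxx = np.zeros_like(F)
    Fxx[1:-1] = (F[2:] - 2.0 * F[1:-1] + F[:-2]) / dx ** 2
    Fnew = F + dt * (a(x) * Fx + 0.5 * nu * Fxx + q(x))  # step t -> t - dt
    Fnew[0], Fnew[-1] = Fnew[1], Fnew[-2]                # crude boundaries
    return Fnew
# explicit stepping is stable only for dt of the order of dx**2 / nu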



In Fig. 49 the plots of the loss function F (solid curves) and of the function G⁰ (dashed curves) are given for three values of the a posteriori variance D = εD̃ in the case where m = 1, ρ = T − t = 3, and problem (6.1.31), (6.1.32) has the parameters g = h = b = ν = 1. (Since the functions F and G⁰ are even with respect to the variable x, that is, F(t, x, m, D) = F(t, −x, m, D) and G⁰(t, x, m) = G⁰(t, −x, m), Fig. 49 shows the plots of F and G⁰ only for x > 0.) Since the corresponding curves for F and G⁰ are close to each other, we can state that, in this case, the quasioptimal zero-approximation control (6.1.38) ensures a control quality close to that of the optimal control. However, this situation is not universal, as is illustrated by the numerical results shown in Fig. 50.

Figure 50 shows the plots of the functions F (solid curves), G⁰ (dot-and-dash curves), and G¹ (dashed curves) for the "reverse" time ρ = T − t = 2.5 and the parameters g = h = 1, b = 0.1, and ν = 5 of problem (6.1.31), (6.1.32). One can see that the use of the quasioptimal zero-approximation control u0(t, x, m) leads to a considerable increase in the value of the functional (6.1.19) compared with the minimum possible (optimal) value F(t, x, m, D). Therefore, in this case, to ensure a high-quality control of system (6.1.31), we need to use quasioptimal controls of higher-order approximations. In particular, it follows from Fig. 50 that, in this case, the quasioptimal first-approximation control u1(t, x, m, D) determined by (6.1.37), (6.1.40), and (6.1.41) provides a control quality close


to the optimal. Thus, the results of the numerical solution of Eqs. (6.1.35) and (6.1.43) confirm that the quasioptimal control algorithm (6.1.41) is "highly qualitative." We point out that this result was obtained in spite of the fact that the a posteriori variance D, which plays the role of a small parameter in the asymptotic synthesis method considered here, is of the same order of magnitude as the other parameters (g, h, b, ν) of problem (6.1.31), (6.1.32). This fact allows us to believe that the asymptotic synthesis method (see Section 6.1.2) can be used successfully for solving various practical problems of the form (6.1.1)-(6.1.3) with finite values of the parameter ε.

In conclusion, we make some methodological remarks. First, we recall that in the title of this section the problems of optimal control with unknown parameters of the form (6.1.1)-(6.1.3) are called "adaptive." It is well known that problems of adaptive control are very important in modern control theory, and at present there are numerous publications in this field (e.g., see [6-9, 190] and the references therein). Thus, it is of interest to compare the results obtained in this section with other approaches to similar problems.

The following heuristic idea is very often used for constructing adaptive control algorithms. For example, suppose that for the feedback control system shown in Fig. 13 it is required to construct a controller C that provides some desired (not necessarily optimal) behavior of the system in the case where some parameters α of the plant P are not known in advance. Suppose also that for given parameters α the required


behavior of the system in Fig. 13 is ensured by the known control algorithm u = φ(t, x, α). In this case the proposed heuristic adaptive algorithm consists of two steps: (1) include an optimal filter in the controller C so that this filter produces optimal estimates α̂_t = α̂(x₀ᵗ) of the vector of unknown plant parameters by observing the output process x₀ᵗ = {x(τ): 0 ≤ τ ≤ t}; (2) define the adaptive control by the formula u_* = φ(t, x, α̂_t). Needless to say, an additional analysis is required to answer the question of whether such a control ensures the desired behavior of the system. The corresponding analysis [6-9, 190] shows that this method for constructing adaptive control is quite acceptable in many specific problems.
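In outline, the heuristic can be coded as follows (a minimal sketch; the scalar plant model, the recursive least-squares estimator, and the law phi are hypothetical illustrations, not the filter (6.1.33), (6.1.34) or the law (2.1.14) of the book):

```python
def certainty_equivalence_control(t, x, alpha_hat, phi):
    """Step (2) of the heuristic: substitute the current estimate
    alpha_hat of the unknown plant parameter into the known law phi."""
    return phi(t, x, alpha_hat)

def rls_update(alpha_hat, P, x_prev, dx_obs, u, dt):
    """Step (1), sketched as recursive least squares for a scalar plant
    dx = (alpha * x + u) dt + noise: refine alpha_hat from the observed
    increment dx_obs over a step of length dt."""
    y = dx_obs - u * dt            # part of the increment attributed to alpha * x * dt
    h = x_prev * dt                # regressor multiplying alpha
    k = P * h / (1.0 + h * P * h)  # scalar gain
    alpha_hat = alpha_hat + k * (y - h * alpha_hat)
    P = (1.0 - k * h) * P
    return alpha_hat, P
```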

Now let us discuss the results of this section. Note that the above-mentioned heuristic idea is realized exactly if system (6.1.2) is controlled by the quasioptimal zero-approximation control u₀(t, x, m). To verify this fact, we return to the example considered in Section 6.1.4. The algorithm of optimal control for problem (6.1.31), (6.1.32) with a known parameter −a is given by formulas (2.1.14) and (2.1.16) in §2.1. Comparing (2.1.14), (2.1.16) with (6.1.37), (6.1.38), we see that the quasioptimal zero-approximation algorithm (6.1.37), (6.1.38) can be obtained from the optimal algorithm (2.1.14), (2.1.16) by replacing the parameter −a by its optimal estimate m_t produced by the filter equations (6.1.33) and (6.1.34).

On the other hand, a numerical analysis of the quasioptimal algorithms u₀(t, x, m) and u₁(t, x, m, D) (see Figs. 49 and 50) shows that the first-approximation algorithm u₁ is preferable to the "heuristic" zero-approximation algorithm u₀. This result shows that the regular asymptotic method considered in this section is effective for solving adaptive problems of optimal control.

§6.2. Some stochastic control problems with constrained phase coordinates

As was pointed out in §1.1, in the process of constructing actual control systems, one often needs to take into account constraints of various types imposed on the set of possible values of the phase coordinates. Such constraints arise from the operating conditions of particular systems, from additional requirements on the transient processes, from allowance for the finite time of control switching, and from other causes. In these cases, a region of admissible values is specified in the phase space, and the representative point of the controlled system must not leave this region. The equations of the system dynamics that determine the phase trajectories in the interior of this region can then be violated at the boundary of this region. Additional constraints that are imposed on the phase trajectories on the


boundary depend on the type of the problem. In what follows, we consider two one-dimensional and one two-dimensional problems of optimal control synthesis (the problem dimension is determined by the number of phase variables on which, in addition to the time t, the loss function depends). In the one-dimensional problems, the controlled variable z(t) is interpreted as the difference (error signal) between the current values of the random command input y(t) and the controlled variable x(t) in the servomechanism studied in §2.2. However, in contrast with §2.2, where any value of the error signal z(t) was admissible, in the present section it is assumed that the region of admissible values of z is an interval [ℓ₁, ℓ₂]. At the endpoints of this interval, we have either reflecting or absorbing screens [157, 160]. In the first case, if the representative point z(t) comes to ℓ₁ or ℓ₂, it is instantaneously reflected into the interior of the interval; in the second case, on the contrary, the representative point "sticks" to the boundary and remains there forever. In practice, we have the first problem if error signal values lying outside the admissible interval [ℓ₁, ℓ₂] are prohibited, and the second problem if the tracking is interrupted at the endpoints (just as in radio systems with phase-lock adjustment [143, 180]).

In the two-dimensional problem we consider the optimal control of a diffusion process in the interior of the disk of radius r₀ centered at the origin of the phase plane (x, y). The circle bounding this disk is a regular boundary [124] reflecting the phase trajectories along the inward normal.

6.2.1. One-dimensional problems. Reflecting screens. Let us consider, just as in §2.2, the problem of synthesizing an optimal system for tracking a wandering coordinate in the case where a servomotor with bounded speed is used as the executive mechanism. By analogy with §2.2, we assume that the command input y(t) is a continuous Markov diffusion process with known drift coefficient a and diffusion coefficient B (a, B = const, B > 0). By using a servomotor with bounded speed (ẋ = u, |u| ≤ u_m, u_m > |a|), it is required to "follow" the command signal y(t) on the time interval 0 ≤ t ≤ T so as to minimize the mathematical expectation (mean value) of the integral performance criterion

I[u] = ∫₀ᵀ c(z(t)) dt,

where z(t) = y(t) − x(t) is the error signal and c(z) is a nonnegative penalty function attaining its minimum at the unique point z = 0, with c(0) = 0. In this case, as shown in §2.2, solving the synthesis problem (in the case of unbounded phase coordinates) is equivalent to solving the Bellman equation


(see (2.2.4))

with the loss function

F(t, z) = min_{|u(s)| ≤ u_m, t ≤ s ≤ T} E[ ∫ₜᵀ c(z(s)) ds | z(t) = z ]  (6.2.2)

satisfying the following natural condition for t = T:

F(T, z) = 0. (6.2.3)

According to §1.4, the Bellman equation is determined only by the local characteristics of the controlled process z(t). Therefore, for problems with constraints on the error signal, Eq. (6.2.1) remains valid at all interior points ℓ₁ < z < ℓ₂. Indeed, since the stochastic process z(t) is continuous, its realizations issued from an interior point z with large probability (almost surely) move only a small distance during a small time Δt and cannot reach the endpoints ℓ₁ and ℓ₂. Therefore, in a sufficiently small neighborhood of any interior point z, the controlled stochastic process behaves in the same way as if there were no reflecting screens. Hence, the differential equation (6.2.1) is valid at these points.

At the points ℓ₁ and ℓ₂, Eq. (6.2.1) is not valid, and additional conditions on the function F at these points are determined by the character of the process z(t) near these points. For example, in the case of reflecting screens considered here, we have the conditions [157]

The conditions (6.2.4) can be explained intuitively by modeling the diffusion process z(t) approximately as a discrete random walk [160] in which, with certain probabilities, the representative point goes from the point z to the neighboring points z ± Δz, Δz = √(B Δt), during the time Δt. Then if at some time instant t the point z comes to the boundary, say, z = ℓ₁, then with probability 1 the process z attains the value ℓ₁ + Δz at time t + Δt, and therefore, we can write the following relation for the loss function (6.2.2):
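In sketch form (our reconstruction, with the elementary step of the walk taken as Δz = √(B Δt)):

```latex
\[
F(t,\ell_1) \;=\; c(\ell_1)\,\Delta t \;+\; F\bigl(t+\Delta t,\ \ell_1+\Delta z\bigr) \;+\; o(\Delta t),
\qquad \Delta z = \sqrt{B\,\Delta t} .
\]
```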


By expanding the second term in the Taylor series around the point (t, ℓ₁), we obtain F(t + Δt, ℓ₁ + Δz) ≈ F(t, ℓ₁) + F_t(t, ℓ₁)Δt + F_z(t, ℓ₁)√(B Δt); since the term of order √Δt dominates as Δt → 0, passing to the limit we arrive at (6.2.4). Thus, to synthesize an optimal servomechanism with the variable z subject to constraints in the form of reflecting screens at the points z = ℓ₁ and z = ℓ₂, we need to solve Eq. (6.2.1) with the additional conditions (6.2.3) and (6.2.4) on the function F(t, z) (ℓ₁ ≤ z ≤ ℓ₂, 0 ≤ t ≤ T). In this case, the synthesis problem is solved according to the scheme studied in §2.2 for the similar problem without constraints on z. Therefore, here we only briefly recall this scheme, paying particular attention to the distinctions arising in the calculational formulas due to the constraints on the phase variable.

Obviously, the expression in the square brackets in (6.2.1) is minimized by an optimal control of the form

u_*(t, z) = u_m sign (∂F/∂z)(t, z). (6.2.5)

Substituting (6.2.5) into (6.2.1) and omitting the symbol min, we obtain

If we pass to the reverse time τ = T − t, then the boundary value problem to be solved acquires the form

By taking into account the properties of the penalty function c(z), we see that the loss function F(τ, z) satisfying the boundary value problem (6.2.7)-(6.2.9) has, for all τ (0 < τ ≤ T), a single minimum (with respect to z) on the interval ℓ₁ < z < ℓ₂. Therefore, the optimal control (6.2.5) can be written as (see (2.2.8))

u_*(τ, z) = u_m sign (z − z_*(τ)), (6.2.10)


where z_*(τ) is the minimum point (with respect to z) of the function F(τ, z) and simultaneously the switch point of the controlling action. This point can be found from the condition

∂F/∂z (τ, z) |_{z = z_*(τ)} = 0. (6.2.11)

Thus, to synthesize an optimal system, we need to solve the boundary value problem (6.2.7)-(6.2.9) and to use the condition (6.2.11).

Problem (6.2.7)-(6.2.9) can be solved exactly if, just as in §2.2, we consider the stationary operating conditions corresponding to large values of τ.

In this case, instead of the function F(τ, z), we can consider the stationary loss function f(z) given by the relation

f(z) = lim_{τ→∞} [F(τ, z) − γτ]

(just as in (1.4.29), (2.2.9), (4.1.7), and (5.3.17), the number γ characterizes the mean losses per unit time in the stationary tracking mode). Therefore, for large τ (more precisely, as τ → ∞), the partial differential equation (6.2.7) is replaced by the following ordinary differential equation for the function f(z):

with the boundary conditions

In this case, the coordinate of the switch point given by (6.2.11), with F replaced by f, attains a constant value z_* (that is, we have a stationary switch point).

The boundary value problem (6.2.12), (6.2.13) can readily be solved by the matching method. By analogy with §2.2, let us consider Eq. (6.2.12) on the two sides of the switch point z_*. Then the nonlinear equation (6.2.12) is replaced by the pair of linear equations

Solving Eqs. (6.2.14), (6.2.15) with the boundary conditions (6.2.13), we arrive at the solutions (6.2.16).


By using (6.2.11), we obtain the two equations

for the two unknown parameters γ and z_*. Substituting (6.2.16) into (6.2.17) and eliminating the parameter γ from the resulting system, we see that the stationary switch point z_* satisfies the transcendental equation

For the quadratic penalty function c(z) = z², Eq. (6.2.18) acquires the form

If ℓ₁ → −∞ and ℓ₂ → +∞ (that is, the reflecting screens are absent), then Eq. (6.2.19) implies the following explicit formula for the switch point z_*:

this formula was obtained in §2.2 (see (2.2.16)). In the other special case ℓ₂ = −ℓ₁ and λ₁ = −λ₂ (the last equality is possible only if a = 0), Eq. (6.2.19) has the single trivial root z_* = 0, that is, the optimal control (6.2.10) coincides in sign with the error signal z.
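In the general case, the pair (6.2.17) is solved numerically for (γ, z_*); a minimal sketch (the residual function below is a hypothetical stand-in, since the explicit form of (6.2.16)-(6.2.18) depends on the penalty c(z)):

```python
import numpy as np
from scipy.optimize import fsolve

def stationary_parameters(residuals, guess):
    """Solve the two equations (6.2.17) for the unknown pair
    (gamma, z_star), where 'residuals' evaluates both equations for a
    trial pair using the matched solutions (6.2.16) on the two sides
    of the switch point."""
    gamma, z_star = fsolve(residuals, guess)
    return gamma, z_star

# Hypothetical smoke test with a decoupled stand-in system:
residuals = lambda v: np.array([v[0] - 1.0, np.tanh(v[1]) - 0.25])
gamma, z_star = stationary_parameters(residuals, guess=(0.5, 0.0))
```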

6.2.2. Absorbing screens. Let us see how the tracking system studied in the preceding section operates with absorbing screens. Obviously, in this case, the loss function (6.2.2) must also satisfy Eq. (6.2.7) in the interior of the interval [ℓ₁, ℓ₂] and the zero initial condition (6.2.9). At the boundary points, instead of (6.2.8), we have

The conditions (6.2.20) follow from formula (6.2.2) and from the fact that the trajectories z(t) stick to the boundary. Indeed, by using, as above, the discrete random walk model for z(t), we can rewrite (6.2.2) as


and hence, since t and Δt are arbitrary, we obtain

Just as in the preceding section, the exact solution of the synthesis problem with absorbing screens can be obtained only in the stationary case (as τ → ∞). Suppose that the stationary operating mode exists and that z₀ is the corresponding stationary switch point. Then for large τ, the nonlinear equation (6.2.7) can be replaced by the two linear equations

For z = z₀, z = ℓ₁, and z = ℓ₂, the functions F₁ and F₂ satisfy (6.2.11) and (6.2.20).

In accordance with [26], for large τ, we seek the solutions of the linear equations (6.2.21) and (6.2.22) in the form
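Judging from the system (6.2.24)-(6.2.26) below, the ansatz (6.2.23) separates a linearly growing part from a stationary one (our reconstruction, consistent with the sticking conditions (6.2.20)):

```latex
\[
F_i(\tau, z) \;=\; \tau\,\psi_i(z) \;+\; f_i(z), \qquad i = 1, 2 .
\]
```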

Using (6.2.23), we obtain from (6.2.21), (6.2.11), and (6.2.20) the following system of ordinary differential equations for the functions ψ₁(z) and f₁(z):

(dψ₁/dz)(z₀) = (df₁/dz)(z₀) = 0,  ψ₁(ℓ₁) = c(ℓ₁),  f₁(ℓ₁) = 0. (6.2.24)

From (6.2.24) we obtain

In a similar way, for the functions ψ₂ and f₂ we have

(here λ₁ and λ₂ are given by (6.2.16)). It follows from (6.2.23), (6.2.25), and (6.2.26) that Eq. (6.2.7) has a continuous solution only if


The same continuity condition also allows us to obtain the following equation for the switch point z₀ (provided that (6.2.27) is satisfied):

Just as in the case of reflecting screens, Eq. (6.2.28) can be specialized to various expressions for the penalty function c(z).

REMARK 6.2.1. If the condition (6.2.27) is violated, then it makes no sense to study the stationary operating mode in the problem with absorbing boundaries, since in this case the synthesis problem has only a trivial solution. In fact, one can readily see that for c(ℓ₁) > c(ℓ₂) we always need to set u ≡ −u_m (correspondingly, for c(ℓ₁) < c(ℓ₂) we need to set u ≡ +u_m). This control behavior stems from the fact that, being regular, the diffusion process z(t) sticks to one or the other boundary with probability 1 (as t → ∞). Therefore, it is clear that this algorithm for controlling the process z(t) maximizes the probability of the event that the process sticks to the boundary with the smaller value of the penalty function c(z).

In the general case c(ℓ₁) ≠ c(ℓ₂), we need to solve the nonstationary boundary value problem (6.2.7), (6.2.20), (6.2.9). Since this problem cannot be solved exactly, it is necessary to use approximate synthesis methods. In particular, we can use the method of successive approximations considered in Chapter III for problems with unbounded phase coordinates. According to Chapter III, the approximate solutions F^(k)(τ, z) of Eq. (6.2.7) can be found by recurrently solving the sequence of linear equations

(all F^(k)(τ, z), k = 0, 1, 2, ..., in (6.2.29) satisfy (6.2.9) and (6.2.20)). After F^(k)(τ, z) is calculated, a suboptimal system is synthesized by (6.2.10) and (6.2.11) with F replaced by F^(k). Just as in Chapter III, one can prove that, as k → ∞, the sequence of functions F^(k)(τ, z) converges to the exact solution F(τ, z) of the boundary value problem (6.2.7), (6.2.9), (6.2.20), and the corresponding suboptimal systems converge to the optimal system (the latter convergence being measured by the quality functional).
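As an illustration of the scheme (6.2.29), here is a minimal explicit finite-difference sketch for the one-dimensional absorbing-screen problem (the drift a − u of the error signal, the sticking boundary values c(ℓᵢ)τ, and the explicit Euler time stepping are our modeling choices for the sketch, not the book's scheme):

```python
import numpy as np

def successive_approximations(B, a, um, c, l1, l2, T,
                              nz=201, nt=4000, n_iter=4):
    """Iterate (6.2.29): at step k the nonlinear term is frozen with the
    control u_k(z) = um * sign(dF^(k-1)/dz) from the previous iterate,
    and the resulting linear parabolic equation is integrated in the
    reverse time tau by an explicit Euler scheme."""
    z = np.linspace(l1, l2, nz)
    dz, dt = z[1] - z[0], T / nt
    assert B * dt / dz**2 <= 0.5, "explicit scheme stability condition"
    u_prev = np.zeros(nz)                 # zero approximation: u = 0
    for _ in range(n_iter):
        F = np.zeros(nz)                  # initial condition (6.2.9)
        for n in range(1, nt + 1):
            Fz = np.gradient(F, dz)
            Fzz = np.zeros(nz)
            Fzz[1:-1] = (F[2:] - 2.0 * F[1:-1] + F[:-2]) / dz**2
            # dF/dtau = (B/2) F_zz + (a - u) F_z + c(z): the error signal
            # z = y - x drifts with velocity a - u between the screens.
            F = F + dt * (0.5 * B * Fzz + (a - u_prev) * Fz + c(z))
            tau = n * dt
            F[0], F[-1] = c(z[0]) * tau, c(z[-1]) * tau  # sticking BCs (6.2.20)
        u_prev = um * np.sign(np.gradient(F, dz))        # next control iterate
    return z, F, u_prev
```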

6.2.3. The two-dimensional problem. Suppose that the motion of a controlled system is similar to the dynamics of a Brownian particle


randomly walking on the plane (x, y), so that along one of the axes, say, the x-axis, the motion is controlled by variations of the drift velocity within a given region, while along the y-axis we have purely diffusive uncontrolled wandering. In this case, the equations describing the system motion have the form

ẋ = u + √B ξ₁(t),  ẏ = √B ξ₂(t),  −(u_m − a) ≤ u ≤ u_m + a, (6.2.30)

where ξ₁(t) and ξ₂(t) are independent stochastic processes of the white noise type with intensity 1 and, by analogy with the one-dimensional problems, −(u_m − a) < 0 and (u_m + a) > 0 are the boundary values of the nonsymmetric region of admissible controls u.

We assume that the representative point (x(t), y(t)) must not move away from the origin of the plane (x, y) to distances larger than r₀. To this end, we assume that the phase trajectories are reflected from the circle of radius r₀ along the inward normal to this boundary. Under this assumption, it is required to find a control law that minimizes the mean value of the quadratic optimality criterion

One can readily see that the Bellman equation related to this problem, written in the reverse time τ = T − t, has the form (F_τ, F_x, F_y denote the partial derivatives with respect to τ, x, y):

B(F_xx + F_yy) + min_{−(u_m−a) ≤ u ≤ (u_m+a)} [uF_x] = F_τ − x² − y². (6.2.32)

Equation (6.2.32) holds for 0 < τ ≤ T and √(x² + y²) < r₀; in addition, the loss function F(τ, x, y) must satisfy the zero initial condition

F(0, x, y) = 0 (6.2.33)

and the boundary condition of the form [157]

∂F/∂n (τ, x, y) |_{x² + y² = r₀²} = 0, (6.2.34)

where ∂/∂n is the normal derivative on the circle of radius r₀.


In the polar coordinates (r, φ) defined by the formulas x = r cos φ, y = r sin φ, the boundary value problem (6.2.32)-(6.2.34) acquires the form

F_τ = B[F_rr + (1/r)F_r + (1/r²)F_φφ] + min_{−(u_m−a) ≤ u ≤ (u_m+a)} [u(cos φ F_r − (sin φ/r)F_φ)] + r². (6.2.35)

It follows from (6.2.35) that, just as in the one-dimensional case, the optimal control is of relay type:

but now, instead of the switch point, we have a switching line on the plane (x, y). This switching line is given by the equation (in the polar coordinates)

cos φ F_r − (sin φ/r) F_φ = 0. (6.2.39)

To obtain an explicit formula for the switching line, we need to solve Eq. (6.2.35) or (since this is impossible) the equations of successive approximations obtained by analogy with Eqs. (6.2.29). Let us now calculate the loss functions and the corresponding switching lines for the first two approximations of Eq. (6.2.35).

The zero approximation. Following the algorithm of successive approximations considered in Chapter III (see also (6.2.29)), we set the nonlinear term in the zero approximation of (6.2.35) equal to zero and thus obtain

F_τ^(0) = B[F_rr^(0) + (1/r)F_r^(0) + (1/r²)F_φφ^(0)] + r². (6.2.40)

It follows from (6.2.40), (6.2.36), and (6.2.37) that the solution F^(0) is radially symmetric, F^(0) = F^(0)(τ, r), and therefore, instead of (6.2.40), (6.2.36), and (6.2.37), we have


It is well known [179] that the solution of Eq. (6.2.41) can be found by separation of variables (by the Fourier method) as the series

Here I₀(x) is the Bessel function of zero order and μ_m⁰ is the mth root of the equation dI₀(μ)/dμ = 0.

It follows from the properties of the zeros of the Bessel function [179] that the series (6.2.42) converges rapidly. Therefore, since we are interested only in the qualitative character of the suboptimal control laws, it suffices to retain only the first term of the sum in (6.2.42).

Calculating c₁ and using the tables of Bessel functions [77], we obtain the following approximate expression (θ = B/r₀²) for the function F^(0):

F^(0)(τ, r) = (r₀²/2)τ − 0.04262 (r₀²/θ) I₀(μ₁⁰ r/r₀) (1 − exp[−θ(μ₁⁰)²τ]). (6.2.44)

By differentiating (6.2.44) with respect to r and taking into account the relations dI₀(x)/dx = I₁(x) and μ₁⁰ = 3.84, we find

F_r^(0)(τ, r) = 0.164 (r₀/θ) I₁(μ₁⁰ r/r₀) (1 − exp[−θ(μ₁⁰)²τ]). (6.2.45)


Since the first-order Bessel function I₁(μ₁⁰ r/r₀) is positive for 0 < r < r₀ (I₁(μ₁⁰) = 0), the derivative (6.2.45) is positive everywhere in the interior of the disk of radius r₀ on the plane (x, y). Hence, in view of (6.2.38), the sign of the controlling action in the zero approximation is determined by the sign of cos φ, that is, the switching line coincides with the vertical diameter of the disk of radius r₀ on the plane (x, y) (in Fig. 51 the switching line is indicated by AOB; the arrows show the direction of the mean drift velocity).

The first approximation. By using the results obtained above, we can write the first-approximation equation as

F_τ^(1) = B[F_rr^(1) + (1/r)F_r^(1) + (1/r²)F_φφ^(1)] + Φ(τ, r, φ), (6.2.46)

F^(1)(0, r, φ) = 0,  F_r^(1)(τ, r₀, φ) = 0,  F^(1)(τ, r, φ) = F^(1)(τ, r, φ + 2π),

Φ(τ, r, φ) = r² − (u_m − a) F_r^(0)(τ, r) cos φ  for 0 ≤ φ < π/2 and 3π/2 < φ ≤ 2π,
Φ(τ, r, φ) = r² + (u_m + a) F_r^(0)(τ, r) cos φ  for π/2 < φ < 3π/2 (6.2.47)

(here the function F_r^(0) is given by formula (6.2.45)). The solution F^(1) of Eq. (6.2.46) may also be written as a series in eigenfunctions, but since there is now no radial symmetry, this series differs from (6.2.42) and has the form [179]

F^(1)(τ, r, φ) = c₀₀(τ) + Σ_{n=0}^∞ Σ_{k=1}^∞ (c_{nk}(τ) cos nφ + c′_{nk}(τ) sin nφ) I_n(μ_k^n r/r₀), (6.2.48)

where

d_n = 1 for n ≠ 0,  d_n = 2 for n = 0,


and c₀₀(τ) denotes the terms independent of r and φ, which are therefore insignificant for the control law (6.2.38). The numbers μ_k^n are the roots of the equation dI_n(μ)/dμ = 0, where I_n(μ) is the nth-order Bessel function.

By analogy with the zero approximation, we consider only the first, most important terms of the series (6.2.48). Namely, we retain only the terms corresponding to the two roots μ₁¹ and μ₁⁰ of the equation dI_n(μ)/dμ = 0; according to [77], μ₁¹ = 1.84 and μ₁⁰ = 3.84. This means that all coefficients in (6.2.48) except for c₀₁, c₁₁, and c′₁₁ must be set equal to zero. The coefficient c₀₁ coincides with c₁ in (6.2.43) and has been calculated in the zero approximation (therefore, in the series (6.2.48) the term containing c₀₁ coincides with the second term in formula (6.2.44)). Calculating c′₁₁ according to (6.2.50) with regard to (6.2.47), we obtain c′₁₁ = 0. Thus, to find the loss function F^(1), it suffices to calculate only c₁₁. Substituting (6.2.47) and (6.2.45) into (6.2.49), we obtain
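These roots are easy to verify numerically; the book's I_n is apparently the Bessel function of the first kind (its derivative's first roots, 1.8412 and 3.8317, match the values cited), denoted J_n in SciPy:

```python
from scipy.special import jnp_zeros

# First roots of dJ_n(mu)/dmu = 0, which truncate (6.2.42) and (6.2.48).
mu_1_1 = jnp_zeros(1, 1)[0]   # n = 1: about 1.8412, the 1.84 cited above
mu_1_0 = jnp_zeros(0, 1)[0]   # n = 0: about 3.8317, the 3.84 cited above
```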

× ∫₀^τ exp[−θ(μ₁¹)²(τ − σ)] (1 − exp[−θ(μ₁⁰)²σ]) dσ

Since we have (see [179], §2, Part 1, Appendix 2)

we calculate the other integrals in (6.2.51) and obtain


Substituting (6.2.52) into (6.2.39) and letting τ → ∞, we arrive at the following equation for the switching line corresponding to the stationary operating conditions:

(here ν = r/r₀, ε = θr₀/a = B/(a r₀), and I₁′(μ) = dI₁(μ)/dμ).

Curves 1, 2, and 3 in Fig. 52 correspond to three values of the parameter ε in Eq. (6.2.53): ε = 0.4, 1.0, 3.0. Thus, the optimal control in the first approximation consists in switching the control action from u = −(u_m − a) in the region R₋ to u = +(u_m + a) in the region R₊, which (depending on the value of the parameter ε) lies inside one of the closed curves 1-3 in Fig. 52.

REMARK 6.2.2. The decomposition (Fig. 52) of the phase space into the regions R₋ and R₊ can be refined if the functions F^(0)(τ, r) and F^(1)(τ, r, φ) are calculated more precisely (that is, approximated by a larger number of terms of the series (6.2.42) and (6.2.48)). However, as the corresponding calculations show, the curves 1-3 obtained in this way differ little from those shown in Fig. 52.

§6.3. Optimal control of the population size governed by the stochastic logistic model

In this section we return to the problem of optimal control of the population size, which was formulated in §2.4 (but not solved there). Let us briefly recall the statement of this problem.


6.3.1. Statement of the problem. We shall consider a single-species population whose dynamics is described by the controlled stochastic logistic model

ẋ = r(1 − x/K)x − qux + √B x ξ(t),  t > 0,  x(0) = x₀, (6.3.1)

where x = x(t) is the population size (density) at time t, ξ(t) is a stochastic process (1.1.31) of the standard white noise type, and r, K, q, B, and x₀ are given positive constants.

Admissible controls belong to the class of nonnegative scalar bounded measurable functions u = u(t) that for all t satisfy a condition of the form

0 ≤ u(t) ≤ u_m, (6.3.2)

where u_m is a given positive number. We shall consider the control problem on the infinite time interval R₊ = [0, ∞) with an arbitrary initial population size x(0) = x₀ > 0. The goal of control is to maximize the functional

I[u] = E[ ∫₀^∞ e^{−δt} (pq x(t) − c) u(t) dt ] → max_{0 ≤ u(t) ≤ u_m, t ≥ 0}, (6.3.3)

where δ, p, q, c > 0 are given numbers and E denotes the mathematical expectation of the expression in the square brackets (we average over the ensemble of random trajectories issued from a given point x(0) = x₀ and satisfying the stochastic differential equation (6.3.1)).

It follows from §2.4 that problem (6.3.1)-(6.3.3) is a stochastic generalization of the optimal fisheries management problems studied in [35, 68, 101]. If, just as in §2.4 and in [35, 68, 101], the number p is the cost of unit mass of caught fish, the number c denotes the cost of unit effort u(t) spent on fishing, and q is the catchability coefficient, then the functional (6.3.3) is an estimate of the mean profit obtained by fishing during a time of the order of 1/δ.
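For intuition, the model (6.3.1) and the profit functional (6.3.3) are straightforward to estimate by simulation; a minimal Monte Carlo sketch with a relay (threshold) harvesting rule (the Itô-style Euler-Maruyama step below ignores the symmetrized-interpretation correction noted later in the footnote to (6.3.4), and all parameter values are illustrative):

```python
import numpy as np

def simulated_profit(r, K, q, B, delta, p, c, um, x_switch,
                     x0=1.0, dt=1e-3, T=50.0, n_paths=2000, seed=0):
    """Estimate I[u] in (6.3.3) for the relay rule u = um * (x > x_switch)
    by Euler-Maruyama simulation of the stochastic logistic model (6.3.1)."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    profit = np.zeros(n_paths)
    for n in range(int(T / dt)):
        u = um * (x > x_switch)
        profit += np.exp(-delta * n * dt) * (p * q * x - c) * u * dt
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)
        x += (r * (1.0 - x / K) * x - q * u * x) * dt + np.sqrt(B) * x * dw
        np.clip(x, 0.0, None, out=x)   # the population size stays nonnegative
    return profit.mean()
```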

The optimal control function u_*(t): R₊ → [0, u_m] maximizing the functional (6.3.3) is a random function of time. To obtain a constructive algorithm for calculating this function, we need to use some results of the general control theory for processes of diffusion type (see [58, 113, 175] as well as §1.4).

We assume that the controlling party has information about the current values of the controlled process x(t). Then it is expedient to choose the control u(t) at time t on the basis of the entire body of information available


on the controlled process. This leads to a controlling function of the form u(t) = u(t, x₀ᵗ), x₀ᵗ = {x(s): 0 ≤ s ≤ t}, that is sometimes called a natural control strategy (the function u(t, x₀ᵗ) can be a probability measure). But if, just as in our case, the controlled system obeys an equation of the form (6.3.1) with perturbations ξ(t) in the form of a Gaussian white noise, then, as was shown in [113, 175], the prehistory of the controlled process x(s): 0 ≤ s ≤ t does not affect the quality of control. Therefore, to solve the optimization problem (6.3.3), it suffices to consider only the class of controlling functions that are deterministic functions of the current phase variable, u(t) = u(t, x(t)) (the nonrandomized Markov strategy). Next, since the stochastic process ξ(t) is stationary and the coefficients in (6.3.1) are time-invariant, the optimal strategy for the infinite-horizon problem in question does not depend on time explicitly, that is, u_*(t) = u_*(x(t)). By using the controlling function (the synthesizing function) in the form u_*(x), we can realize the optimal control of system (6.3.1) in the form of an automatic feedback control system.

In what follows, we present a method for calculating the synthesizing function u_*(x) for problem (6.3.1)-(6.3.3).

6.3.2. Solution of problem (6.3.1)-(6.3.3). By analogy with §2.4 and on the basis of the results obtained in [113, 175], we can assert that the maximum value of the functional (6.3.3) (that is, the cost function)

F(x) = max_{0 ≤ u(t) ≤ u_m, t ≥ 0} E[ ∫₀^∞ e^{−δt} (pq x(t) − c) u(t) dt | x(0) = x ],

considered as a function of the initial state x, is twice continuously differentiable and satisfies the following Bellman equation⁷ (F′ = dF/dx, F″ = d²F/dx²):

Bx²F″ + x(r + B − (r/K)x)F′ + max_{0 ≤ u ≤ u_m} [(pqx − c − qxF′)u] − δF = 0. (6.3.4)

The cost function is defined only for nonnegative values of the variable x; for x = 0, it satisfies the natural boundary condition

F(0) = 0, (6.3.5)

which is a straightforward consequence of (6.3.1) and (6.3.3) (indeed, it follows from (6.3.1) that if x(0) = 0, then x(t) ≡ 0 for all t > 0; hence,

⁷Equation (6.3.4) is written with regard to the fact that the solution of the stochastic equation (6.3.1) is understood in the symmetrized sense (see §1.2 and [174]).


it follows from (6.3.3) that in this case the optimal control has the form u_*(t) ≡ 0; and hence, (6.3.3) implies (6.3.5)).

First, note that for δ > r + B and K → ∞, Eq. (6.3.4) has the exact solution (obtained in §2.4)

Here

x₀ = c(δ − r − B + qu_m) / { pq[ δ − r − B + (k₁⁰/(k₁⁰ − 1)) qu_m ] }

determines the switch point of the optimal control in the synthesis form

and the numbers k₁⁰ > 0 and k₂⁰ < 0 in (6.3.6) and (6.3.7) can be written in terms of the parameters of problem (6.3.1)-(6.3.3) as

For an arbitrarily chosen value of the parameter K > 0 (the medium capacity), it is impossible to find the solution of Eq. (6.3.4) in the form of finite formulas like (6.3.6) and (6.3.7). Nevertheless, as is shown below, constructive methods for solving the synthesis problem can be found in this case as well.

Let us construct a solution of Eq. (6.3.4). First, we note that it follows from (6.3.4) that the optimal control takes only the boundary values u = 0 and u = u_m of the set [0, u_m] of admissible controls. The choice between these values is determined by the sign of the expression γ(x) = pqx − c − qxF′(x). If γ(x) = 0, then the choice of control is not determined formally by Eq. (6.3.4). However, one can see that in this case the choice of any admissible value of u does not affect the solution of Eq. (6.3.4), since the nonlinear term of Eq. (6.3.4) vanishes for γ(x) = 0 and any admissible u. Therefore, we can write the optimal control in the form

u_*(x) = u_m for γ(x) > 0,  u_*(x) = 0 for γ(x) ≤ 0. (6.3.8)


If the equation γ(x) = 0 has a single root x_*, then the optimal control can be written in the form

similar to (6.3.8), where the coordinate of the switch point x_* is determined by the equation

pqx − c − qxF′(x) = 0, (6.3.10)

whose solution can be obtained after the cost function F(x) is calculated. By F₀(x) and F₁(x) we shall denote the cost function F(x) on the two sides of the switch point x_*. Then, as follows from (6.3.4) and (6.3.9), instead of the one nonlinear equation (6.3.4), we have two linear equations for F₀ and F₁:

Bx²F₀″ + x(r + B − (r/K)x)F₀′ − δF₀ = 0,  x < x_*, (6.3.11)

Bx²F₁″ + x(r + B − qu_m − (r/K)x)F₁′ − δF₁ = u_m(c − pqx),  x_* < x. (6.3.12)

Since the cost function F(x), being a solution of the Bellman equation (6.3.4), is twice continuously differentiable for all x ∈ [0, ∞), the functions F₀ and F₁ satisfy the boundary condition (6.3.10) at the switch point x_*. Moreover, it follows from (6.3.5) that F₀(0) = 0. These boundary conditions allow us to obtain the unique solution of Eqs. (6.3.11) and (6.3.12) and thus, for all x ∈ [0, ∞), to construct the cost function F(x) satisfying the Bellman equation (6.3.4).

We shall seek the solution of Eq. (6.3.11) as the generalized power series

F₀(x) = Σ_{i=0}^∞ a_i x^{α+i}. (6.3.13)

By substituting the series (6.3.13) into (6.3.11) and setting the coefficients of x^α, x^{α+1}, ... equal to zero, we obtain the following system for the characteristic exponent α and the coefficients a_i, i = 0, 1, 2, ...:
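Matching the coefficient of x^{α+n} gives the following sketch of the system (our reconstruction from the form of Eq. (6.3.11)):

```latex
\[
\bigl[B\alpha^{2} + r\alpha - \delta\bigr]a_0 = 0,
\qquad
\bigl[B(\alpha+n)^{2} + r(\alpha+n) - \delta\bigr]a_n
   = \frac{r}{K}\,(\alpha+n-1)\,a_{n-1}, \quad n \ge 1 .
\]
```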


If we set a₀ ≠ 0, then the first relation in (6.3.14) implies the characteristic equation

Bα² + rα − δ = 0,

whose roots

α_{1,2} = [−r ± √(r² + 4Bδ)] / (2B)

determine the two possible values of the characteristic exponent α in (6.3.13). Since α₂ is negative, it follows from the boundary condition (6.3.5) that we must take α₁ = k₁⁰ in (6.3.13). Therefore, the solution of Eq. (6.3.11) can be written in the form

F₀(x) = a₀ψ(x), (6.3.15)

where

For the coefficients of the series (6.3.16) we have the estimate

Thus, the series (6.3.16) converges for any finite x > 0, it can be differentiated term by term, and its sum ψ(x) is an entire analytic function satisfying the estimate

The constant a₀ in (6.3.15) can be found from the boundary condition (6.3.10) for the function F₀ at the switch point x_*. Hence we have the following final expression for the solution of Eq. (6.3.11):

The nonhomogeneous equation (6.3.12) is of the same type as Eq. (6.3.11) and its solution can also be expressed in terms of generalized power series.


It is well known that the general solution of the nonhomogeneous equation (6.3.12) is the sum of the general solution of the homogeneous equation,

and any particular solution of Eq. (6.3.12). Equation (6.3.18) is similar to Eq. (6.3.11), and therefore, its solution can be constructed by analogy with the above-described procedure (6.3.13)-(6.3.17). Performing the required calculations, we obtain the following expression for the general solution of Eq. (6.3.18):

F₁(x) = c₁ψ₁(x) + c₂ψ₂(x). (6.3.19)

Here c₁ and c₂ are arbitrary constants, and the functions ψ₁(x) and ψ₂(x) are the sums of the generalized power series

ψ₁(x) = x^{k₁¹} [ 1 + Σ_{n=1}^∞ (1/n!) · (k₁¹(k₁¹ + 1) ⋯ (k₁¹ + n − 1)) / ((α¹ + 1)(α¹ + 2) ⋯ (α¹ + n)) · (rx/KB)ⁿ ], (6.3.20)

where the numbers k₁¹, k₂¹, and α¹ are determined by the expressions

Note that, for any finite x, the series (6.3.20) can be majorized by a convergent numerical series. Therefore, the series (6.3.20) can be differentiated and integrated term by term, and its sum ψ₁(x) is an entire function. Similar statements for the series (6.3.21) hold only for α¹ ≠ n (n a positive integer); in what follows, we assume that this inequality is satisfied.

A particular solution of the nonhomogeneous equation (6.3.12) can be found by the standard procedure of variation of parameters. We write the desired particular solution Φ as

where the unknown functions c₁(x) and c₂(x) satisfy the condition


By substituting the relation (6.3.23) instead of F₁ into (6.3.12), after simple calculations with regard to (6.3.24), we obtain

Note that the expression in the square brackets in the integrands in (6.3.25) and (6.3.26) is the Wronskian of Eq. (6.3.12), which is nonzero for all x, since the solutions ψ₁(x) and ψ₂(x) are linearly independent. Therefore, we can readily calculate the integrals in (6.3.25) and (6.3.26) and thus find the functions c₁(x) and c₂(x) as generalized power series obtained by term-by-term integration in (6.3.25) and (6.3.26).

Thus the general solution of the nonhomogeneous equation (6.3.12) has the form

F₁(x) = c₁ψ₁(x) + c₂ψ₂(x) + Φ(x), (6.3.27)

where Φ(x) is given by (6.3.23), (6.3.25), and (6.3.26). To obtain the unique solution satisfying the Bellman equation (6.3.4) for x > x_*, we need to choose the arbitrary constants c₁ and c₂ in (6.3.27) appropriately. To this end, we use the boundary condition (6.3.10) for the function F₁(x) at the switch point x_*. To obtain the second condition, we require that the functions F₀(x) and F₁(x) coincide, as K → ∞, with the known exact solution F(x) given by (6.3.6). It follows from (6.3.16), (6.3.17), (6.3.20), (6.3.21), (6.3.25), and (6.3.26) that this condition is satisfied if we set c₁ = 0. The condition (6.3.10) for the function F₁(x) at the point x_* then implies

Thus, the desired solution of the inhomogeneous equation (6.3.12) acquires the form

Formulas (6.3.17) and (6.3.28) determine the cost function F(x) that satisfies the Bellman equation (6.3.4) for all x ∈ [0, ∞). In these formulas, only the coordinate of the switch point x_* remains unknown. To find x_*, we use the condition that the cost function F(x) must be continuous at the switch point:

F₀(x)|_{x = x_*} = F₁(x)|_{x = x_*}, (6.3.29)


or, which is the same due to (6.3.10), the condition that the second-order derivative must be continuous:

F₀″(x)|_{x = x_*} = F₁″(x)|_{x = x_*}. (6.3.30)

Since the series (6.3.16) and (6.3.21) are convergent, we can calculate x_* with any prescribed accuracy and thus solve these equations numerically. Furthermore, for large values of the medium capacity K, formulas (6.3.29) and (6.3.30) yield approximate analytic formulas for the switch point, and these formulas allow us to construct control algorithms that are close to the optimal control.
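Numerically, this reduces to one-dimensional root finding; a minimal sketch (the callable mismatch is a hypothetical stand-in that evaluates F₀″(x) − F₁″(x) from the truncated series, with the constants a₀ and c₂ recomputed from (6.3.10) for each trial switch point):

```python
from scipy.optimize import brentq

def find_switch_point(mismatch, x_lo, x_hi, tol=1e-10):
    """Solve Eq. (6.3.30): accept the trial point x at which the second
    derivatives of F0 and F1, built from the truncated series (6.3.16),
    (6.3.20), (6.3.21), coincide."""
    return brentq(mismatch, x_lo, x_hi, xtol=tol)

# For large K, the zero approximation x0 of (6.3.7) brackets the root:
# x_star = find_switch_point(mismatch, 0.5 * x0, 2.0 * x0)
```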

6.3.3. The calculation of x_* for large K. In the case K → ∞, the functions ψ(x), ψ₁(x), and ψ₂(x), as follows from (6.3.16), (6.3.20), and (6.3.21), are given by the finite formulas

Correspondingly, instead of the series (6.3.15) and (6.3.28), we have

By substituting (6.3.31) and (6.3.32) into (6.3.30), we obtain x_* = x₀, where x₀ is given by (6.3.7) (derived in §2.4).

If the medium capacity K is finite, then the coordinate x_* cannot be written as a finite formula. However, it follows from continuity considerations that for large K the coordinate x_* is close to x₀, so that we can take x₀ as the first approximation to the root of Eqs. (6.3.29) and (6.3.30). The corrections refining this first approximation can then be calculated by the following scheme.

For large K, the quantity ε = r/KB can be considered as a small parameter, and, as follows from (6.3.15), (6.3.16), (6.3.20), (6.3.21), and (6.3.28), the functions F₀(x) and F₁(x) can be represented as power series in ε:


We also seek the root of Eqs. (6.3.29) and (6.3.30), that is, the coordinate x_*, as the series

x_* = x₀ + εΔ₁ + ε²Δ₂ + ⋯, (6.3.35)

where the numbers x₀, Δ₁, Δ₂, ... must be calculated. By substituting the expansions (6.3.33)-(6.3.35) into Eq. (6.3.29) (or (6.3.30)) and equating the coefficients of equal powers of the small parameter ε on the left- and right-hand sides, we obtain a system of equations for the successive calculation of the numbers x₀, Δ₁, Δ₂, ... in the expansion (6.3.35).

Obviously, the first term x₀ in (6.3.35) coincides with (6.3.7). To calculate the first correction Δ₁, in the expansions (6.3.33) and (6.3.34) we retain the zero-order and first-order terms and omit the terms of order ε² and higher. As a result, from (6.3.16), (6.3.17), (6.3.20), (6.3.21), and (6.3.28) we obtain the following expressions for the functions F₀(x) and F₁(x) in the first approximation:

where

By differentiating (6.3.36) and (6.3.37) twice, we rewrite Eq. (6.3.30) as

To calculate the first two terms in the expansion (6.3.35), we substitute the root x_* = x₀ + εΔ₁ into Eq. (6.3.38) and collect the terms of the zero


and the first order with respect to the small parameter ε. If we retain only the zero-order terms in Eq. (6.3.38), then we readily see that (6.3.38) implies formula (6.3.7) for x₀. Collecting the terms of order ε, from (6.3.38) we obtain the first correction

Thus, for large values of the parameter K (that is, for small ε), the coordinate x₀ given by (6.3.7) can be interpreted as the switch point in the zero approximation. Correspondingly, the formula

x₁ = x₀ + εΔ₁, (6.3.40)

where x₀ and Δ₁ are given by (6.3.7) and (6.3.39), determines the switch point in the first approximation.

Let u₀(x) and u₁(x) denote the controls

u₀(x) = u_m for x > x₀, u₀(x) = 0 for x ≤ x₀;  u₁(x) = u_m for x > x₁, u₁(x) = 0 for x ≤ x₁. (6.3.41)

Obviously, using these algorithms to control system (6.3.1) can only decrease the value of the functional (6.3.3) compared with its maximum value F(x), which is attained by the optimal control (6.3.9). However, it is natural to expect that this decrease in the value of the functional (6.3.3) is negligible for large K and, moreover, that the quasioptimal control u₁(x) is "better" than the zero-approximation algorithm u₀(x) in the sense that I[u₁] ≥ I[u₀].

6.3.4. Results of the numerical analysis. Our expectations are confirmed by the following results of a numerical analysis of the quasioptimal algorithms (6.3.41). By G_i(x) we denote the value of the functional (6.3.3) obtained by using the control u_i and a given initial population size x(0) = x. Then G_i(x) is a continuously differentiable function of the initial state x and satisfies the linear equation


Denoting by G_{i0}(x) and G_{i1}(x), just as in Section 6.3.2, the values of the function G_i(x) on the two sides of the switch point x_i, we obtain from (6.3.42) the following equations for G_{i0} and G_{i1}:

which are quite similar to Eqs. (6.3.11) and (6.3.12). Therefore, the general solutions of these equations, by analogy with Section 6.3.2, have the form

where the functions ψ(x), ψ₁(x), ψ₂(x), and Φ(x) are given by formulas (6.3.16), (6.3.20), (6.3.21), (6.3.23), (6.3.25), and (6.3.26).

The functions (6.3.45) differ from the corresponding functions (6.3.17) and (6.3.28) of Section 6.3.2 in the method used for calculating the constants C₁ and C₂ in (6.3.45). In Section 6.3.2 the corresponding constants (a₀ in (6.3.15) and c₁, c₂ in (6.3.27)) were determined by the condition (6.3.10) at an unknown switch point x_*, while in Eqs. (6.3.42) the switch point x_i is given in advance, either by (6.3.7) for i = 0 or by (6.3.40) for i = 1. By substituting (6.3.45) into the relations G_{i0}(x_i) = G_{i1}(x_i) and G′_{i0}(x_i) = G′_{i1}(x_i),⁸ we obtain the following formulas for the coefficients C₁ and C₂ in (6.3.45):

By choosing specific numerical values of the parameters r, K, q, u_m in problem (6.3.1)-(6.3.3), one can calculate the coefficients (6.3.46) and thus construct computer plots of the functions G_i(x), i = 0, 1. We also note that the same formulas (6.3.45) and (6.3.46) can be used for the numerical calculation of the cost function F(x) satisfying the Bellman equation (6.3.4). To this end, it suffices first to calculate the root of Eq. (6.3.29) (or (6.3.30)) and then to substitute the obtained value into (6.3.46) instead of x_i. In this case, the functions G_{i0}(x) and G_{i1}(x) given by (6.3.45)

⁸These formulas follow from the condition that the solutions G_i(x) of Eqs. (6.3.42) are continuously differentiable.


coincide, respectively, with the functions F₀(x) and F₁(x) given by (6.3.17) and (6.3.28), that is, we have G_i(x) ≡ F(x).

The procedure described above for numerically constructing the functions G₀(x), G₁(x), and F(x) was implemented in software and used in numerical experiments for estimating the quality of the quasioptimal control algorithms u₀(x) in the zero approximation and u₁(x) in the first approximation. Some results of these experiments are shown in Figs. 53 and 54, where the cost function F(x) is plotted by solid curves and the functions G₀(x) and G₁(x) by dot-and-dash and dashed curves, respectively.

In Fig. 53 these curves are constructed for two values of the parameter K: K = 7.5 and K = 11; the other parameters of problem (6.3.1)-(6.3.3) are r = 1, δ = 3, B = 1, q = 3, u_m = 1.5, c = 3, and p = 2. In this case, the variable ε = r/KB, treated as a small parameter in the expansions (6.3.33)-(6.3.35), attains the values ε = 0.091 (the upper group of curves) and ε = 0.133 (the lower group of curves). Figure 53 shows that in this case all three curves F(x), G₀(x), and G₁(x) within the same group of parameters are quite close to each other. Hence, the use of the quasioptimal algorithms (6.3.41) ensures a control quality close to that of the optimal control (obviously, the first-approximation control u₁(x) is


preferable to the zero-approximation control u₀(x), since the mean cost G₁(x) corresponding to u₁(x) is closer to the optimal cost F(x)).

It is of interest to point out that an improvement in the control quality can be obtained by using u₁(x) instead of u₀(x) even if the parameter ε = r/KB is not small. This phenomenon is clearly illustrated by the results of the calculations shown in Fig. 54, where the curves F(x), G₀(x), and G₁(x) are drawn for the following parameters of problem (6.3.1)-(6.3.3): r = 1, δ = 20, B = 1, q = 3, u_m = 100, c = 3, p = 2, K = 0.3, and K = 0.17.

Many times in Chapters III, V, and VI we have encountered similar situations, in which the formal use of an approximate synthesis procedure developed for problems with a small parameter ε ≪ 1 provides satisfactory results even for ε ~ 1. Thus we see that the small parameter methods and the related methods of successive approximations are very effective tools for the investigation and solution of various specific practical problems of optimal control.


CHAPTER VII

NUMERICAL SYNTHESIS METHODS

Numerical synthesis methods are, generally speaking, the most universal of all methods for solving optimal control problems, since numerical methods are largely insensitive to the specific conditions of the problem.

Indeed, each of the approximate methods described in Chapters III-VI is intended for solving optimal control problems from a certain class characterized by specific features of the plant dynamics equations, by small parameters, etc. The choice of the method for obtaining quasioptimal control algorithms depends essentially on the specific features of the control problem considered.

On the other hand, if the control problem is solved, just as in the present book, by the dynamic programming method, then the possibility of solving the synthesis problem numerically is determined by how a numerical solution of the Bellman equation corresponding to the problem in question can be constructed. The type of this Bellman equation is determined by the character of the problem considered. Thus, the majority of the stochastic synthesis problems studied in Chapters II-VI correspond to Bellman equations in the form of nonlinear second-order partial differential equations of parabolic type. Correspondingly, the Bellman equations for deterministic synthesis problems are (nonlinear) first-order partial differential equations of advection type.

Equations of both types have been studied thoroughly for a long time. Such equations arise in many problems of mathematical physics and the mechanics of continuous media, in modeling chemical and biological processes, etc. Hence, numerous numerical methods have been developed for solving such equations,¹ many of which are realized as standard programs that

¹It would be right to note that numerical methods have been developed mostly for solving second-order parabolic equations. Nonlinear advection equations have been studied less until the present time. However, many papers dealing with the qualitative analysis and numerical solution of such equations have appeared recently. Here we would like to mention the Italian school of mathematicians (M. Falcone, R. Ferretti, and others) who studied various discrete schemes that allow the construction of numerical solutions for various types of nonlinear advection equations, including those with discontinuous solutions [10, 31, 48, 49, 53].


are parts of well-known software packages such as MATLAB, Mathematica, and some others.

It should be noted that the existing software can rather seldom be used for solving synthesis problems in practice. This fact is related to some peculiar features of the Bellman equations (see §3.5 in [34]), which make the application of standard numerical methods rather difficult. For example, the difficulties arising in solving Bellman equations of higher dimensions are well known. Furthermore, an obstacle known as the "boundary difficulty" is often encountered in the numerical solution of synthesis problems.

Obviously, any numerical procedure allows us to construct the solution of the Bellman equation only in a bounded region D where the arguments of the loss function vary. Therefore, if, for example, we solve a Bellman equation of parabolic type, then we need to pose the initial and boundary conditions on the boundary of D. At the same time, many optimal control problems do not contain any restrictions on the phase coordinates (in this case, to solve the synthesis problem, we need to solve the Cauchy problem for the Bellman equation). Thus, for a reasonable choice of the boundary conditions required for the numerical solution of the problem, we need, in addition, to study the asymptotic behavior of the loss function at infinity. These problems are considered in more detail in §7.1.

In §7.1 and §7.2 we show how one of the most widely used methods for solving partial differential equations numerically (known as the grid function method) can be applied to the numerical solution of some specific optimal control problems studied in the previous chapters by other methods.

§7.1. Numerical solution of the problem of optimal damping of random oscillations

The main results of this section are related to the numerical solution of the problem of optimal damping of random oscillations in a linear oscillator; this problem was studied in §3.2 and §3.4. However, we begin with some general questions concerning methods for stating the boundary conditions for the loss function when the synthesis problem is solved numerically.

7.1.1. Choice of the boundary conditions for the loss function. Let us consider a control system governed by the Itô differential equation

dx(t) = [a(t, x) + q(t)u] dt + σ(t, x) dη(t),  0 < t ≤ T,  x(0) = x₀. (7.1.1)


Here x = x(t) is an n-dimensional vector of phase variables, u = u(t) is an r-dimensional vector of controlling actions, η(t) is a d-dimensional vector of independent Wiener stochastic processes of unit intensity, a(t, x) is an n-dimensional vector of given functions, and q(t) and σ(t, x) are given n × r and n × d matrices.

We assume that admissible control actions are subject to constraints of the form

u(t) ∈ U, (7.1.2)

where U is a given closed bounded set in R_r. If the vector of current phase variables x(t) can be measured exactly,

then we need to construct a control function u_* = u_*(t, x(t)), 0 ≤ t ≤ T, in the synthesis form so that, for any given initial state x(0) = x₀, the function u_* minimizes the following functional defined on the trajectories of Eq. (7.1.1):

I[u] = E[ ∫₀ᵀ c(x(t)) dt + ψ(x(T)) ] (7.1.3)

(here E[·] is the mathematical expectation of [·], c(x) and ψ(x) ≥ 0 are given penalty functions, and 0 ≤ t ≤ T is a given time interval).

According to §1.4, the dynamic programming approach allows one to reduce problem (7.1.1)-(7.1.3) to solving the partial differential equation (the Bellman equation)

L = ∂/∂t + aᵀ(t, x) ∂/∂x + Sp[ b(t, x) ∂²/∂x∂xᵀ ],  b(t, x) = (1/2) σ(t, x) σᵀ(t, x).

Here F = F(t, x) is the loss function determined as usual by

Equation (7.1.4) is a semilinear (linear with respect to the higher-order derivatives) equation of parabolic type, and we shall try to solve it numerically by using different versions of the well-studied finite-difference procedures of the grid function methods (the grid methods) [135, 162, 163, 179]. However, these calculational schemes allow one to obtain the solution only in a bounded domain D of the phase variables x. To apply these methods, we need to impose some boundary conditions on the loss function F(t, x) on the boundary of D. Since in the initial statement of the problem it is


assumed that the solution of Eq. (7.1.4) is defined on the unbounded phase space (x ∈ R_n), the boundary conditions for F(t, x) require a special analysis if Eq. (7.1.4) is to be solved numerically.

A possible method for overcoming the boundary indeterminacy in stochastic time-optimal control problems was proposed in [85].

For the problem considered here, the essence of the method suggested in [85] consists in the following. Suppose that it is required to construct a numerical solution of Eq. (7.1.4) in a bounded region D. Let us consider a sequence of expanding bounded regions D_R ⊃ D in the phase space (D_R can be the n-dimensional ball of radius R or the n-dimensional cube with edge R centered at the origin). Then the desired solution F(t, x) is defined in the region D as the limit of the sequence of numerical solutions of the boundary value problems for Eq. (7.1.4) in the regions D_R corresponding to an increasing sequence of values of the parameter R. In this case, the boundary conditions posed on the boundaries of the regions D_R can be arbitrary (for example, the zero conditions F(t, x)|_{∂D_R} = 0).

However, in practice, the use of this procedure in numerical synthesis requires an extremely large amount of computation. For example, already for the second-order system (7.1.1) (x ∈ R₂), this method is unacceptable, since the time required to compute the solution is too large.

Here we present a more economical numerical method based on the asymptotic behavior of the loss function for large |x|. In this case, we need an a priori estimate of the asymptotic behavior of the function F(t, x) satisfying (7.1.4) as |x| → ∞.

Suppose that q(t) is a piecewise continuous bounded function for all t ∈ [0, T] and that a(t, x) and σ(t, x) are continuous in x, Borel in (t, x), and satisfy the conditions

for all x, y ∈ R_n and t ∈ [0, T], where N > 0 is a constant, |a| is the Euclidean norm of the vector a, and ‖σ‖ = (Sp σσᵀ)^{1/2}.

We assume that the penalty functions c(x) and ψ(x) ≥ 0 are continuous and satisfy the condition

for all x ∈ R_n and some m, N₁, N₂ > 0; furthermore,

for all R > 0 and x, y ∈ S_R (S_R is the ball of radius R in R_n).


By using Theorem IV.1.1 of [113], one can show that the conditions (7.1.6) and (7.1.8), together with the upper estimates (7.1.7), guarantee that the function F(t, x) corresponding to problem (7.1.1)-(7.1.3) has generalized first-order derivatives in x and that the estimate

holds for any t ∈ [0, T] and almost all x. The lower bounds (7.1.7) for the penalty functions and the continuity of the phase trajectories x(t) imply the following lower estimate for the loss function:

F(t, x) ≥ N(1 + |x|)^m. (7.1.10)

Let F⁰(t, x) denote the solution of the linear equation

(L is the operator in (7.1.4)). Obviously, F⁰ is the value of the functional

This functional is calculated on the trajectories of system (7.1.1) corresponding to the uncontrolled motion (the averaging in (7.1.12) is performed over the set of sample paths x(s): t ≤ s ≤ T issued from a given point x(t) = x and satisfying the stochastic differential equation (7.1.1) for u ≡ 0).

It follows from (7.1.4) and (7.1.11) that the difference G(t, x) = F⁰(t, x) − F(t, x) satisfies the equation

Here Φ denotes the nonlinear function Φ(t, F_x) = −min_{u∈U} [uᵀqᵀF_x]. Since the set U of admissible controls and the function q(t) are bounded, we have the estimate

|Φ(t, F_x)| ≤ N |F_x(t, x)|. (7.1.14)

If the transition probability density of the noncontrolled Markov process x(s) satisfying Eq. (7.1.1) for u ≡ 0 is denoted by p(x, t; y, s) (s > t), then we can write the solutions of Eqs. (7.1.11) and (7.1.13) in quadratures (see (3.4.13)). In particular, for the function G we have


This relation and (7.1.9) imply the following upper bound (similar to (7.1.9)) for the difference G = F⁰ − F:

Hence, with regard to (7.1.10), we obtain

as |x| → ∞. This condition allows us to use F⁰(t, x) as the asymptotics of the loss function F(t, x) when solving the Bellman equation (7.1.4) numerically.

In some cases, for instance, in the example considered below, we succeed in obtaining a finite analytic formula for the function F⁰(t, x).
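When no closed formula is available, the representation (7.1.12) suggests estimating F⁰(t, x) directly by averaging the cost over simulated noncontrolled trajectories. The following sketch illustrates this idea; the helper name F0_monte_carlo is ours, the drift a, diffusion σ, and penalties c, ψ are user-supplied placeholders, and the oscillator illustration assumes the noise intensity √(2B) corresponding to the diffusion coefficient B in (7.1.18).

```python
import numpy as np

def F0_monte_carlo(t, x0, T, a, sigma, c, psi, n_paths=2000, n_steps=400, rng=None):
    """Monte Carlo estimate of the noncontrolled cost F0(t, x0) in (7.1.12):
    the mean of  int_t^T c(x(s)) ds + psi(x(T))  over Euler-Maruyama sample
    paths of  dx = a(s, x) ds + sigma(s, x) dW  started at x(t) = x0."""
    rng = np.random.default_rng() if rng is None else rng
    dt = (T - t) / n_steps
    x = np.full((n_paths, np.size(x0)), x0, dtype=float)
    cost = np.zeros(n_paths)
    s = t
    for _ in range(n_steps):
        cost += c(x) * dt                        # accumulate the running penalty
        dw = rng.normal(0.0, np.sqrt(dt), x.shape)
        x += a(s, x) * dt + sigma(s, x) * dw     # Euler-Maruyama step
        s += dt
    return (cost + psi(x)).mean()                # add the terminal penalty

# Illustration: the oscillator (7.1.17) with u = 0, written as a 2D system.
beta, B = 1.0, 1.0
a = lambda s, x: np.column_stack((x[:, 1], -x[:, 0] - beta * x[:, 1]))
sigma = lambda s, x: np.column_stack((np.zeros(len(x)), np.full(len(x), np.sqrt(2 * B))))
c = lambda x: x[:, 0] ** 2 + x[:, 1] ** 2
psi = lambda x: np.zeros(len(x))
print(F0_monte_carlo(0.0, np.array([3.0, 0.0]), 1.0, a, sigma, c, psi))
```

Such an estimate is practical only for filling boundary values at a moderate number of nodes; where a finite formula like (7.1.21) is available, it should of course be used instead.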

7.1.2. Numerical solution of a specific problem. We shall discuss the method of numerical synthesis in more detail for the problem of optimal damping of random oscillations studied in §3.2 and §3.4. Suppose that the plant to be controlled is a linear oscillator with one degree of freedom governed by an equation of the form

where ξ(t) is the scalar standard white noise (1.1.31), u is a scalar control, and β, B, and u_m are given positive numbers (β < 2). By setting the penalty functions c(x(t)) = x²(t) + ẋ²(t) and ψ(x) = 0 in (7.1.3), we obtain the Bellman equation

for the loss function F(t, x, y) (here x and y = ẋ are the phase variables). By passing to the reverse time ρ = T − t, we can rewrite (7.1.18) as a standard Cauchy problem for a semilinear parabolic equation. Using the old notation t for the reverse time ρ, we rewrite (7.1.18) as

We shall seek the numerical solution of Eq. (7.1.19) in the square region D = {−L ≤ x ≤ L, −L ≤ y ≤ L} of the phase variables (see Fig. 55). We need to pose boundary conditions for the function F(t, x, y) on the boundary of D. It follows from (7.1.17) that the phase trajectories lying in


the interior of D cannot terminate on the boundary segments BC and ED indicated by dashed lines in Fig. 55. Therefore, we need not pose boundary conditions on these segments; on the other parts of the boundary, as follows from Section 7.1.1, the boundary conditions are posed with the aid of the asymptotics F⁰(t, x, y) satisfying the linear equation

Up to notation, Eq. (7.1.20) coincides with Eq. (3.4.23), whose solution was obtained in §3.4 as the finite formula (3.4.29). Rewriting (3.4.29) in the notation of the present problem, we obtain the solution of Eq. (7.1.20) in the form

Formula (7.1.21) allows us to pose the boundary conditions for the desired function F = F(t, x, y) on the unhatched parts of the boundary


of D = {−L ≤ x, y ≤ L}. To this end, we set F = F(t, x, y) = F⁰(t, −L, y) on AB, F = F⁰(t, x, L) on CF, F = F⁰(t, L, y) on EF, and F = F⁰(t, x, −L) on AD.

Let us construct a uniform grid in the domain Π_T = D × [0, T] = {(x, y, t): −L ≤ x, y ≤ L, 0 ≤ t ≤ T}. By F^k_{ij} we denote the value of the function F(t, x, y) at the point with coordinates (t = kτ, x = ih, y = jh), where h and τ are the approximation steps in the coordinates x, y and in time t, and i, j, k are integer-valued variables with −Q ≤ i ≤ Q, −Q ≤ j ≤ Q, and 0 ≤ k ≤ K (L = Qh, T = Kτ).

The boundary conditions for the grid function F^k_{ij} have the following form (here F⁰(t, x, y) is the function (7.1.21)):

F^k_{Q,j} = F⁰(kτ, Qh, jh), 0 ≤ j ≤ Q;

F^k_{−Q,j} = F⁰(kτ, −Qh, jh), −Q ≤ j ≤ 0;

F^k_{i,Q} = F⁰(kτ, ih, Qh), −Q + 1 ≤ i ≤ Q;

F^k_{i,−Q} = F⁰(kτ, ih, −Qh), −Q ≤ i ≤ Q − 1. (7.1.22)
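In code, imposing (7.1.22) amounts to writing the values of F⁰ into four boundary strips of each time layer. A minimal sketch follows; the array layout, the helper name fill_boundary, and the shifted indexing are our assumptions, and F0 may be the exact formula (7.1.21) or any surrogate such as the Monte Carlo estimate above.

```python
import numpy as np

def fill_boundary(Fk, k, tau, h, Q, F0):
    """Impose the boundary conditions (7.1.22) on a (2Q+1) x (2Q+1) time
    layer Fk, taking boundary values from the asymptotics F0(t, x, y)."""
    t = k * tau
    idx = lambda m: m + Q              # map a grid index m in [-Q, Q] to [0, 2Q]
    for j in range(0, Q + 1):          # side x = +L, 0 <= j <= Q
        Fk[idx(Q), idx(j)] = F0(t, Q * h, j * h)
    for j in range(-Q, 1):             # side x = -L, -Q <= j <= 0
        Fk[idx(-Q), idx(j)] = F0(t, -Q * h, j * h)
    for i in range(-Q + 1, Q + 1):     # side y = +L, -Q+1 <= i <= Q
        Fk[idx(i), idx(Q)] = F0(t, i * h, Q * h)
    for i in range(-Q, Q):             # side y = -L, -Q <= i <= Q-1
        Fk[idx(i), idx(-Q)] = F0(t, i * h, -Q * h)
    return Fk
```

The restricted index ranges reflect the fact that no conditions are needed on the dashed segments BC and ED.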

It follows from (7.1.19) that for k = 0 we must set

at all nodes of the grid.

For the difference approximation of Eq. (7.1.19) we shall use a locally one-dimensional solution method (a lengthwise-transverse scheme) [163]. In this case the complete approximation scheme consists in successively solving the following two one-dimensional (with respect to the phase coordinates) equations:

Each of Eqs. (7.1.24) and (7.1.25) is replaced by a two-layer difference scheme defined by the three-point pattern (Eq. (7.1.24)) or by the four-point pattern (Eq. (7.1.25)). In this case, since the parts of the boundary of D indicated by dashed lines in Fig. 55 are inaccessible, we shall approximate v_x = ∂v/∂x by the right difference derivative for y > 0 (j > 0) and by the left difference derivative for y < 0 (j < 0). The derivatives V_y = ∂V/∂y and V_yy = ∂²V/∂y² are then approximated by the central difference derivatives


The values of the grid functions v^k_{ij} and V^k_{ij} at the grid nodes are calculated successively for the time layers k = 1, 2, ... by an implicit scheme. In this case the (k+1)th layer function v^{k+1}_{ij} corresponding to Eq. (7.1.24) is used as the initial function V^k_{ij} = v^{k+1}_{ij} for solving Eq. (7.1.25). The grid functions F^k_{ij} corresponding to the original equation (7.1.19) and the functions v^k_{ij} and V^k_{ij} corresponding to the auxiliary equations (7.1.24) and (7.1.25) are related as follows: F^k_{ij} = v^k_{ij}, v^{k+1}_{ij} = V^k_{ij}, and V^{k+1}_{ij} = F^{k+1}_{ij}. Moreover, since the time step is assumed to be small (we take τ = 0.01), in the difference approximation of Eq. (7.1.25) we can use the sign of the derivative of V^k instead of that of V^{k+1}; that is, we shall use u_{i,j} = sign(V^k_{i,j+1} − V^k_{i,j−1}) instead of sign V_y (a similar replacement was performed in [34, 86]).

It follows from the preceding that the difference approximation transforms Eqs. (7.1.24) and (7.1.25) into the following three difference equations:

Formulas (7.1.26) and (7.1.27) together with the boundary conditions (7.1.22) and the initial conditions (7.1.23) allow us to calculate the functions v^{k+1}_{ij} recurrently at all nodes of the grid. Indeed, rewriting (7.1.26) and (7.1.27) in the form


we see that, for given v^k_{ij} = F^k_{ij} and each fixed j ≥ 0, the desired set of values v^{k+1}_{ij} can be calculated successively from right to left by formula (7.1.29). For the initial value v^{k+1}_{Q,j} we take F⁰((k+1)τ, L, jh), where F⁰(t, x, y) is the function (7.1.21). Correspondingly, for j < 0 the values of v^{k+1}_{ij} can be calculated from left to right by formula (7.1.30) with the initial value v^{k+1}_{−Q,j} = F⁰((k+1)τ, −L, jh).² Since V^k_{ij} = v^{k+1}_{ij}, we obtain the grid function V^k_{ij} for the kth time layer after the grid function v^{k+1}_{ij} is calculated.

Now to calculate the grid function V^{k+1}_{ij} = F^{k+1}_{ij} on the layer (k+1), we need to solve the linear algebraic system (7.1.28). It is convenient to solve this system by the sweep method [162, 179], which we briefly discuss here. Let us denote the desired values of the grid function on the layer (k+1) by z_j = V^{k+1}_{ij}. Then system (7.1.28) can be written in the form

where A_j, C_j, M_j, and φ_j are the known expressions

A_j = 2τB + hτ(ih + βjh + u_m u_{i,j}), C_j = 2h² + 4τB,

M_j = 2τB − hτ(ih + βjh + u_m u_{i,j}), φ_j = 2h²(v^{k+1}_{i,j} + τ(jh)²). (7.1.32)

Since the number of equations in (7.1.31) is less than the number of unknown variables z_j, −Q ≤ j ≤ Q, to solve system (7.1.31) uniquely we need to complete it with the two conditions

z_{−Q} = F⁰((k+1)τ, ih, −L), z_Q = F⁰((k+1)τ, ih, L), (7.1.33)

that follow from the boundary conditions (7.1.22). We seek the solution of problem (7.1.31), (7.1.33) in the form

where the coefficients μ_j and ν_j are calculated by the recurrent formulas

²The recurrent formulas (7.1.29) and (7.1.30) are used for k = 0, 1, 2, ..., K − 1. It follows from (7.1.23) that in (7.1.29) and (7.1.30) we must set v⁰_{ij} = 0, −Q ≤ i, j ≤ Q, for k = 0.


with the initial conditions

μ_{−Q+1} = 0, ν_{−Q+1} = F⁰((k+1)τ, ih, −L). (7.1.36)

Thus, the algorithm for solving problem (7.1.31), (7.1.33) by the sweep method consists in the following two steps:

(1) find μ_j and ν_j recurrently for −Q + 1 ≤ j ≤ Q (from left to right, from j to j + 1) by using the initial values (7.1.36) and formulas (7.1.35);

(2) employing z_Q from (7.1.33), calculate (from right to left, from j + 1 to j) the values z_{Q−1}, z_{Q−2}, ..., z_{−Q+1}, z_{−Q} successively according to formulas (7.1.34) (note that in this case, in view of (7.1.36), the value of z_{−Q} coincides with that given by (7.1.33)).
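A minimal sketch of these two steps in Python is given below. The sign convention A_j z_{j−1} − C_j z_j + M_j z_{j+1} = −φ_j assumed for (7.1.31) follows the reconstruction above; the arrays A, C, M, phi are indexed by j + Q, and the function name is ours.

```python
import numpy as np

def sweep(A, C, M, phi, z_left, z_right, Q):
    """Solve the tridiagonal system (7.1.31),
        A_j z_{j-1} - C_j z_j + M_j z_{j+1} = -phi_j,  -Q+1 <= j <= Q-1,
    with the boundary values (7.1.33) z_{-Q} = z_left, z_Q = z_right,
    by the two-step sweep method (7.1.34)-(7.1.36)."""
    n = 2 * Q + 1
    mu = np.zeros(n)
    nu = np.zeros(n)
    mu[1], nu[1] = 0.0, z_left                # initial conditions (7.1.36)
    for m in range(1, n - 1):                 # step (1): forward sweep
        d = C[m] - A[m] * mu[m]
        mu[m + 1] = M[m] / d
        nu[m + 1] = (phi[m] + A[m] * nu[m]) / d
    z = np.zeros(n)
    z[n - 1] = z_right                        # condition (7.1.33) at j = Q
    for m in range(n - 2, -1, -1):            # step (2): back substitution (7.1.34)
        z[m] = mu[m + 1] * z[m + 1] + nu[m + 1]
    return z
```

The back substitution reproduces z_{−Q} = z_left automatically, in agreement with the remark in step (2); stability requires the dominance conditions discussed next (roughly |C_j| ≥ |A_j| + |M_j|).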

As was shown in [162, 179], the procedure of calculations by formulas (7.1.34) and (7.1.35) is stable if for any j we have

It follows from (7.1.32) that, in the problem in question, these conditions reduce to the following one:

Obviously, the last condition can always be satisfied by choosing a sufficiently small approximation step h.

This computational procedure was implemented as a software package used for numerical experiments on computers. The parameters of the difference scheme were chosen so as to ensure a prescribed accuracy. It is well known [163] that the complete locally one-dimensional approximation scheme (7.1.22), (7.1.23), (7.1.26)-(7.1.28) is absolutely stable and its error is O(h² + τ).

The approximation steps were τ = 0.01 and h = 0.1. The dimensions of the region D were L = 3 and Q = 30. The other parameters β, u_m, B of the problem were varied from one calculation to another. The two-dimensional data array of the loss function F(t, x, y) was printed for t = 0.25, 0.5, 0.75, ....

Some results of these calculations are shown in Figs. 56-60. Figure 56 presents the axonometry of the loss function F(t, x, y) in Eq. (7.1.19) with β = B = u_m = 1 at three time moments t = 0.25, 0.5, 1.0. Figure 57 shows curves of constant level F(t, x, y) = 3 and switching lines in an optimal system with β = B = u_m = 1 at three time moments t = 0.5, 2.0, 8.0. In view of the central symmetry of Eq. (7.1.19), these curves are plotted in two different halves of the region D. The switching line uniquely determines the optimal control of system (7.1.17) as follows: u = −u_m at the points of


the phase plane (x, y) lying above the switching line, and u = +u_m below this line.
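Once the switching line has been tabulated, applying this relay law is a one-line operation; in the sketch below, switch_y is an assumed interpolant of the computed switching line, and the helper name is ours.

```python
import numpy as np

def relay_control(x, y, switch_y, u_m):
    """Bang-bang law read off Fig. 57: u = -u_m above the switching line,
    u = +u_m below it; switch_y(x) returns the line's ordinate at x."""
    return np.where(y > switch_y(x), -u_m, u_m)
```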

Figure 58 illustrates how the switching line and the value of the performance criterion of this optimal system depend on the bound u_m of the admissible control for B = β = 1 and t = 4. Figure 58 shows that an increase in the range of admissible controls uniformly improves the control quality,


that is, decreases the value of the optimality criterion independently of the initial state of system (7.1.17).

Figures 59 and 60 show how the switching lines and the constant level curves depend on the other parameters of the problem.


§7.2. Optimal control for the "predator-prey" system (the general case)

In this section we consider the deterministic problem of optimal control for a biological system consisting of two interacting populations ("predators" and "prey"). We have already considered this system in §5.2, where we studied a special case called there the case of a "poorly adapted predator." In what follows, we consider the general case of this problem. The synthesis problem corresponding to this case is solved numerically. Furthermore, we obtain some analytic results for a control problem with infinite horizon.

7.2.1. The normalized Lotka-Volterra model. Statement of the problem. We assume that the system considered is described by the Lotka-Volterra model (see [133, 186, 187] as well as §2.3 and §5.2), in which the behavior of the isolated system is governed by a system of the form

Here x₁(τ) and y₁(τ) are the sizes (densities) of the prey and predator populations at time τ, and the positive numbers a_i (i = 1, 2, 3, 4) characterize the intraspecific (a₁, a₄) and interspecific (a₂, a₃) interactions. By changing the variables

we rewrite system (7.2.1) in the dimensionless (normalized) form


Just as in §5.2, we assume that the external (controlling) action on system (7.2.2) consists in removing some prey individuals from the habitat (by catching, shooting, or using chemical substances). In this case, the control system considered is described by equations of the form

ẋ(t) = (1 − y)x − ux, ẏ(t) = b(x − 1)y, (7.2.3)

t > 0, x(0) = x₀ > 0, y(0) = y₀ > 0,

where u = u(t) is a nonnegative bounded scalar control function that satisfies, for all t > 0, the constraints

where u_m is a given positive number.

Let us consider the phase trajectories of the controlled system (7.2.3).

They are solutions of the differential equation

First, we note that, in view of Eqs. (7.2.3), the phase variables x(t) and y(t) cannot attain negative values for t ≥ 0 if the initial values x₀ and y₀ are nonnegative (the last assumption is always satisfied, since x₀ and y₀ denote the initial sizes of the prey and predator populations, respectively). Therefore, all solutions of Eq. (7.2.5) (the phase trajectories of system (7.2.3)) lie in the first quadrant (x ≥ 0, y ≥ 0) of the phase plane (x, y). Furthermore, we shall consider only the phase trajectories that correspond to the two boundary values of the control: u = 0 and u = u_m.

For u = 0, Eqs. (7.2.3) coincide with Eqs. (7.2.2) for an isolated (autonomous) Lotka-Volterra system. The dynamics of system (7.2.2) was studied in detail in [187]. Omitting the details, we only note that in the first quadrant (x ≥ 0, y ≥ 0) there are two singular points (x = 0, y = 0) and (x = 1, y = 1) that are the equilibrium states of system (7.2.2). In this case the origin (x = 0, y = 0) is an unstable equilibrium state, while the state (x = 1, y = 1) is stable and is a center-type singular point. All phase trajectories of system (7.2.2) (except for the two that lie on the coordinate axes: (x > 0, y = 0) and (x = 0, y > 0)) form a family of closed concentric curves around the point (x = 1, y = 1). Thus, in a noncontrolled system the sizes of both populations are subject to undecaying oscillations whose period and amplitude depend on the initial state (x₀, y₀). However, if the initial state (x₀, y₀) lies on one of the coordinate axes in the plane (x, y), then there arise singular (aperiodic) phase trajectories. In this case it follows from Eqs. (7.2.2) that the representative point of the system cannot


leave the corresponding coordinate axis and in the course of time either approaches the origin (along the y-axis) or goes to infinity (along the x-axis). The singular phase trajectories correspond to the degenerate case of system (7.2.2). In this case, the biological system considered contains only one population.

If u = u_m > 0, then the dynamics of system (7.2.3) substantially depends on u_m. For example, if 0 < u_m < 1, then the periodic character of solutions of system (7.2.3) is preserved (just as in the case u = 0), and only the center of the family of phase trajectories moves to the point (x = 1, y = 1 − u_m). For u_m ≥ 1 the solution of system (7.2.3) is aperiodic. In the special case u_m = 1, Eq. (7.2.5) can easily be solved, and the phase trajectories of system (7.2.3) can be written explicitly as

For u_m > 1, Eq. (7.2.5) has a unique singular point (x = 0, y = 0), and this equilibrium state is globally asymptotically stable.³
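This qualitative picture is easy to reproduce numerically. The sketch below (the helper name and parameter values are purely illustrative) integrates system (7.2.3) with a classical Runge-Kutta step for the three regimes of a constant control u.

```python
import numpy as np

def trajectory(x0, y0, b, u, T=30.0, n=30000):
    """Integrate the controlled Lotka-Volterra system (7.2.3),
    x' = (1 - y)x - ux,  y' = b(x - 1)y, for a constant control u (RK4)."""
    def f(s):
        x, y = s
        return np.array([(1.0 - y) * x - u * x, b * (x - 1.0) * y])
    dt = T / n
    s = np.array([x0, y0], dtype=float)
    out = np.empty((n + 1, 2))
    out[0] = s
    for i in range(n):
        k1 = f(s); k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2); k4 = f(s + dt * k3)
        s = s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        out[i + 1] = s
    return out

# u = 0: a closed orbit around (1, 1); u = 0.5: oscillations around (1, 0.5);
# u = 1.5: both population sizes decay toward the origin.
for u in (0.0, 0.5, 1.5):
    print(u, trajectory(1.5, 1.0, b=0.5, u=u)[-1])
```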

Now let us formulate the goal of control for system (7.2.3). In many cases [90, 105] it is most desirable that system (7.2.3) be in equilibrium for u = 0, that is, the point (x = 1, y = 1) is the most desirable state of system (7.2.3). In this case, one is interested in a control u_* = u_*(x, y) that takes system (7.2.3) from any initial state (x₀, y₀) to the point (x = 1, y = 1) in minimum time. This problem was solved in [90]. Here we consider the problem of constructing a control u_* = u_*(t, x, y) which, in general, does not guarantee that the system comes to the equilibrium point (x = 1, y = 1) but ensures the minimum mean square deviation of the system phase trajectories from the state (x = 1, y = 1) on a given time interval 0 ≤ t ≤ T:

7.2.2. The Bellman equation and calculation of the boundary conditions. By using the standard procedure of the dynamic programming approach (see §1.3), we obtain the following algorithm for solving problem (7.2.3), (7.2.4), (7.2.7).

³In this case the term "global" means that the trivial solution of system (7.2.3) is asymptotically stable for any initial values (x₀, y₀) from the first quadrant of the phase plane.


Now we define the loss function (the functional of minimum future losses) by the relation

F(t, x, y) = min_{0 ≤ u(σ) ≤ u_m, t ≤ σ ≤ T} ∫_t^T [(x(σ) − 1)² + (y(σ) − 1)²] dσ (7.2.8)

and thus write the Bellman equation for problem (7.2.3), (7.2.4), (7.2.7) as

x, y > 0, 0 ≤ t < T, F(T, x, y) = 0. (7.2.9)

If the function F(t, x, y) satisfying (7.2.9) is found, then the desired optimal control u_*(t, x, y) in the synthesis form is given by the expression

for g(t, x, y) < 0,

for g(t, x, y) > 0.

By using (7.2.10), we can rewrite the Bellman equation in the form

It follows from (7.2.10) that the optimal control is a relay-type function, that is, at each time instant the control u is either u = 0 or u = u_m (this is a bang-bang control). If the loss function (7.2.8) is continuously differentiable with respect to x, then the control is switched from one value to the other each time the condition

is satisfied. Equation (7.2.12) determines the switching line on the phase plane (x, y) at each time instant. This switching line divides the phase space x, y > 0 into two regions R₀ and R_m, where the control is u = 0 and u = u_m, respectively. Finding the switching line is equivalent to solving the problem of optimal control synthesis.


Of course, it must be remembered that the above procedure for solving the synthesis problem can be used only if the loss function (7.2.8) is sufficiently smooth and the Bellman equation (7.2.9) (or (7.2.11)) holds at all points of the domain Π_T = {x, y > 0, 0 < t ≤ T} of definition of the loss function. The smoothness properties of solutions of equations of the form (7.2.9) (or (7.2.11)) were studied in detail in [172]. As applied to Eq. (7.2.9), the main result of [172] has the following meaning. The loss function F(t, x, y) satisfying (7.2.9) has continuous first-order derivatives with respect to all its arguments in the regions R₀ and R_m. On the interface between R₀ and R_m, that is, on the switching line, the derivatives ∂F/∂x and ∂F/∂y can be discontinuous (have jumps), depending on the type of the switching line. Namely, for switching lines of the first and second kind, the first-order derivatives of the loss function are continuous everywhere in Π_T. On a switching line of the third kind, the partial derivatives ∂F/∂x and ∂F/∂y always have jumps. Recall that, according to the classification given in [172], the type of the switching line is determined by the character of the phase trajectories of system (7.2.3) in the regions R₀ and R_m near the switching line. For example, if the phase trajectories approach the switching line on both sides, then such a switching line is called a switching line of the first kind. In this case, the representative point of system (7.2.3), once coming to the switching line, moves along this line in the sliding mode (see §1.1). If the phase trajectories approach the switching line on one side (say, in the region R₀) and leave it on the other side (in R_m), then we have a switching line of the second kind. Finally, if the switching line coincides with a phase trajectory in the region R_m (or R₀), then we have a switching line of the third kind.

In what follows, switching lines of the third kind do not occur; thus we can assume that for problem (7.2.3), (7.2.4), (7.2.7) studied here the Bellman equation (7.2.9) (or (7.2.11)) is valid everywhere in the region x > 0, y > 0, 0 ≤ t < T, and in this region the function F(t, x, y) satisfying this equation has continuous first-order derivatives with respect to all its arguments.

To solve Eq. (7.2.9) uniquely, we need to pose boundary conditions for the loss function F(t, x, y) on the boundary of the region of admissible phase variables, that is, for x = 0 and y = 0. Such boundary conditions can readily be obtained by a straightforward calculation of the functional on the right in (7.2.8), using Eqs. (7.2.3) describing the system considered.

Let us write F(t, 0, y) = φ(t, y) and F(t, x, 0) = ψ(t, x). Then, using (7.2.3) and (7.2.8), we obtain


To find ψ(t, x), we need to solve the following one-dimensional optimization problem:

ẋ(σ) = (1 − u)x(σ), σ > t, x(t) = x. (7.2.14)

Problem (7.2.14) can readily be solved, although the solution of (7.2.14), and hence the form of the function ψ(t, x), substantially depends on the value of u_m.

(a) Let 0 < u_m < 1. In this case the points

divide the x-axis into three intervals. On the intervals 0 ≤ x ≤ x₁ and x₂ ≤ x < ∞, the function ψ(t, x) has the explicit form

ψ(t, x) = 2(T − t) − 2x[e^{T−t} − 1] + (x²/2)[e^{2(T−t)} − 1], 0 ≤ x ≤ x₁,

ψ(t, x) = 2(T − t) − (2x/(1 − u_m))[e^{(1−u_m)(T−t)} − 1] + (x²/(2(1 − u_m)))[e^{2(1−u_m)(T−t)} − 1], x₂ ≤ x. (7.2.16)

On the interval x₁ ≤ x ≤ x₂, the function ψ(t, x) is given by the formula

where z is the root of the transcendental algebraic equation

One can readily see that the possible values of the root z of Eq. (7.2.18) always lie in the region 1 ≤ z ≤ e^{(1−u_m)(T−t)}, and the boundary values z = 1 and z = e^{(1−u_m)(T−t)} correspond to the endpoints (7.2.15) of the interval x₁ ≤ x ≤ x₂. The optimal control u_*, which solves problem (7.2.14), depends on the variable x(t) = x and is determined as follows:

if x ≤ x₁, then u_* ≡ 0, t ≤ σ ≤ T; if x ≥ x₂, then u_* ≡ u_m, t ≤ σ ≤ T;


if x₁ < x ≤ x₂, then u_* = 0 for x(σ) < x_* = x z^{1/(1−u_m)} and u_* = u_m for x(σ) > x_*.

(b) Let u_m = 1. In this case, for u = u_m the coordinate x(σ) = const, and problem (7.2.14) has the obvious solution

u_* = 0 for x(σ) < 1, u_* = u_m for x(σ) ≥ 1. (7.2.19)

The minimum value of the functional in (7.2.14) can readily be calculated for the control (7.2.19), and as a result, for the desired function ψ(t, x) we obtain the expression

ψ(t, x) = 2(T − t) − 2x[e^{T−t} − 1] + (x²/2)[e^{2(T−t)} − 1], 0 ≤ x ≤ e^{−(T−t)},

ψ(t, x) = (T − t) − ln x + 2x − x²/2 − 3/2, e^{−(T−t)} ≤ x ≤ 1,

ψ(t, x) = (T − t)(2 − 2x + x²), x ≥ 1. (7.2.20)
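For use in a difference scheme, this boundary function is a direct transcription. The sketch below implements (7.2.20) as reconstructed above; in particular, the first branch is inferred from the u = 0 branch of (7.2.16), and continuity of the three pieces at x = e^{−(T−t)} and x = 1 can be checked numerically.

```python
import numpy as np

def psi_um1(t, x, T):
    """Boundary function psi(t, x) = F(t, x, 0) for u_m = 1, formula (7.2.20)."""
    s = T - t                                  # time to go
    if x <= np.exp(-s):                        # u = 0 all the way, x(T) <= 1
        return 2.0 * s - 2.0 * x * (np.exp(s) - 1.0) + 0.5 * x**2 * (np.exp(2.0 * s) - 1.0)
    if x <= 1.0:                               # u = 0 until x reaches 1, then hold
        return s - np.log(x) + 2.0 * x - 0.5 * x**2 - 1.5
    return s * (2.0 - 2.0 * x + x**2)          # u = 1 holds x(sigma) = const >= 1

# psi(t, 1) = T - t: holding x = 1 on the axis y = 0 costs (y - 1)^2 = 1 per unit time.
print(psi_um1(0.0, 1.0, 2.0))
```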

(c) Let u_m > 1. In this case the optimal control solving problem (7.2.14) coincides with (7.2.19).⁴ After some simple calculations, we obtain

⁴For e^{−(T−t)} < x < e^{(u_m−1)(T−t)}, there always exists a time instant σ₀ at which the solution x(σ) of the equation ẋ(σ) = (1 − u_*)x(σ), σ ≥ t, x(t) = x, attains the value x(σ₀) = 1. After the time σ₀, the control (7.2.19) ensures the constant value x(σ) ≡ 1, σ₀ ≤ σ ≤ T, by switching the control u infinitely fast between the boundary values u = 0 and u = u_m (the sliding mode). Just the same trajectory x(σ): t ≤ σ ≤ T, but without the sliding mode, can be obtained by using, instead of (7.2.19), the control

u_* = 0 for x(σ) < 1, u_* = 1 for x(σ) = 1, u_* = u_m for x(σ) > 1.

Under this control we can realize the generalized solution in the sense of Filippov of the equation ẋ(σ) = (1 − u_*)x(σ) (see [54] and §1.1).


Thus, to find the optimal control in the synthesis form that solves problem (7.2.3), (7.2.4), (7.2.7), we need to solve the following boundary value problem for the loss function F(t, x, y):

where u_* has the form (7.2.10), φ(t, y) is given by formula (7.2.13), and the function ψ(t, x) is given by expressions (7.2.16)-(7.2.18), (7.2.20), or (7.2.21), depending on the value of the maximum admissible control u_m.

The boundary value problem (7.2.22) was solved numerically. The results obtained are given in Section 7.2.4.

7.2.3. Problem with infinite horizon. Stationary operating mode. Let us consider the control problem (7.2.3), (7.2.4), (7.2.7) on an infinite time interval (in this case the terminal time T → ∞). If the optimal control u_*(t, x, y) that solves problem (7.2.3), (7.2.4), (7.2.7) ensures the convergence of the functional (7.2.8) for any initial state (x > 0, y > 0) of the system, then, due to the time-invariance of Eqs. (7.2.3), the loss function (7.2.8) is also time-invariant, that is, F(t, x, y) → f(x, y), where the function f(x, y) satisfies the equation

which is the stationary version of the Bellman equation (7.2.9).

In this case, the optimal control u_*(x, y) and the switching line do not depend on time explicitly and are given by formulas (7.2.10) and (7.2.12) with F(t, x, y) replaced by the loss function f(x, y).

Let us denote the loss function f(x, y) in the region R₀ (u_* = 0) by f₀(x, y) and the loss function f(x, y) in the region R_m (u_* = u_m) by f_m(x, y). In R₀ the function f₀ satisfies the equation

Correspondingly, for the function f_m defined on R_m we have


Since the gradient of the loss function is continuous on the switching line, that is, on the interface between R₀ and R_m, we have

Equations (7.2.24)-(7.2.26) allow us to obtain explicit formulas for the partial derivatives ∂f/∂x and ∂f/∂y along the switching line.

If the switching line contains intervals of sliding mode, then formulas (7.2.27) allow us to find these intervals and to obtain explicit analytic formulas for the switching line on these intervals. As was shown in §4.1 (see also [172]), the second-order mixed partial derivatives of the loss function f(x, y) must coincide on the intervals of sliding mode, that is, we have

By using formulas (7.2.27), one can readily see that condition (7.2.28) is satisfied along the two lines y = x and y = 2 − x. To verify whether these lines (or some parts of them) are lines of the sliding mode, we need to consider the families of phase trajectories (that is, the solutions of Eq. (7.2.5)) for u = 0 and u = u_m near these lines.

The corresponding analysis of the phase trajectories of system (7.2.3) shows that the sliding mode may take place along the straight line y = x for x < 1 and along the line y = 2 − x for x > 1. In this case the representative point of system (7.2.3), once coming to the line y = x (x < 1), moves along this line (in the sliding mode) away from the equilibrium state (x = 1, y = 1). On the other hand, along the line y = 2 − x (x > 1), system (7.2.3) asymptotically approaches the point (x = 1, y = 1) as t → ∞ due to the sliding mode. That is why only the straight line segment

can be considered as the switching line for the optimal control in the stationary operating mode.

If u = u_m, then the integral curve of Eq. (7.2.5) is tangent to the line y = 2 − x at the endpoint x⁰ of the segment (7.2.29). By using (7.2.5), we can write the tangency condition as


For different values of the parameters in problem (7.2.3), (7.2.4), (7.2.7) (that is, of the numbers b > 0 and u_m > 0), the solution of Eq. (7.2.30) has the form

x⁰ = [3b − 1 − u_m − √((3b − 1 − u_m)² − 8b(b − 1))] / (2(b − 1)), if 0 < u_m < 1, b ≠ 1, or u_m ≥ 1, b > b_*;

x⁰ = 2/(2 − u_m), if 0 < u_m < 1, b = 1;

if u_m ≥ 1, b ≤ b_*,

where the number b_* is equal to

One can easily obtain a finite formula for the stationary loss function f(x, y) along the switching line (7.2.29). By using the second equation in (7.2.3) and formula (7.2.29), we see that the coordinate y(t) is governed by the differential equation

ẏ = b(y − y²) (7.2.32)

while moving along the straight line (7.2.29). By integrating (7.2.32) with the initial condition y(0) = y, we obtain

Using (7.2.33) and the relation x(t) = 2 − y(t) and calculating the functional I in (7.2.7) for T = ∞, we find the desired stationary loss function

Here y is an arbitrary point in the interval 2 − x⁰ < y ≤ 1.
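The same quantity can be obtained numerically without the closed form (7.2.34): along the sliding segment one has x(s) = 2 − y(s), so the integrand of (7.2.7) equals 2(1 − y(s))², with y(s) the logistic solution of (7.2.32). A short sketch follows; the function name and the truncation horizon are ours, and truncation is harmless because the tail decays exponentially.

```python
import numpy as np

def stationary_loss_on_segment(y, b, S=60.0, n=200001):
    """Stationary loss f(2 - y, y) along the sliding segment (7.2.29):
    integrate 2(1 - y(s))^2, where y(s) solves (7.2.32), y' = b(y - y^2),
    y(0) = y, and x(s) = 2 - y(s)."""
    s = np.linspace(0.0, S / b, n)
    ys = y * np.exp(b * s) / (1.0 - y + y * np.exp(b * s))  # solution of (7.2.32)
    g = 2.0 * (1.0 - ys) ** 2
    ds = s[1] - s[0]
    return ds * (g.sum() - 0.5 * (g[0] + g[-1]))            # trapezoidal rule

print(stationary_loss_on_segment(0.8, b=0.5))
```

Comparing such values with the finite formula (7.2.34) gives a useful consistency check on the numerical solution of the nonstationary problem below (cf. Fig. 64).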

7.2.4. Numerical solution of the nonstationary synthesis problem. If the control time T is finite, then the optimal control algorithm u_*(t, x, y) depends on time and, to find this control, we need to solve the nonstationary Bellman equation (7.2.22). This equation is solved numerically in the bounded region R = {0 ≤ x ≤ x_max, 0 ≤ y ≤ y_max, 0 ≤ t ≤ T}. To this end, in R we construct the grid

ω = {x_i = ih_x, i = 0, 1, ..., N_x, h_x N_x = x_max;

y_j = jh_y, j = 0, 1, ..., N_y, h_y N_y = y_max;

t_k = kτ, k = 0, 1, ..., N, τN = T}, (7.2.35)


and define the grid function F^k_{ij} that approximates the desired continuous solution F(t, x, y) of Eq. (7.2.22) at the nodes of the grid (x_i, y_j, t_k). The values of the grid function F^k_{ij} at the nodes of the grid (7.2.35) are related to each other by algebraic equations obtained by the difference approximation of the Bellman equation (7.2.22). In what follows, we use well-known methods for constructing difference schemes [60, 135, 162, 163]; therefore, here we restrict our consideration to a formal description of the difference equations used for solving Eq. (7.2.22) numerically. We stress that the problems of approximation accuracy and stability and of the convergence of the grid function F^k_{ij} to the exact solution F(t, x, y) of Eq. (7.2.22) as h_x, h_y, τ → 0 are studied in detail in [49, 53, 135, 162, 163, 179].

Just as in §7.1, by using the alternating direction method [163], we replace the two-dimensional (with respect to the phase variables) equation (7.2.22) by the following pair of one-dimensional equations:

each of which is approximated by a finite-difference scheme with fractional steps in the variable t. To ensure the stability of the difference approximation of Eqs. (7.2.36), (7.2.37), we use the scheme of "oriented differences" [163]. For 0 < i < N_x, 0 < j < N_y, and 0 < k ≤ N, we replace Eq. (7.2.36) by the difference scheme

where

and the approximation steps h_x and τ satisfy the condition τ|r_x| ≤ h_x for all r_x on the grid ω.

For Eq. (7.2.37) we used the difference approximation


where

and the steps h_y and τ are specified by the condition τ|r_y| ≤ h_y for all r_y on the grid (7.2.35).

The grid functions for the initial Bellman equation (7.2.22) and for the auxiliary equations (7.2.36), (7.2.37) are related as F^k_{ij} = v^k_{ij}, v^{k−0.5}_{ij} = V^{k−0.5}_{ij}, and V^{k−1}_{ij} = F^{k−1}_{ij}.

The grid functions are calculated backwards over the time layers (numbered by k) from k = N to an arbitrary number 0 ≤ k < N. The grid function F^k_{ij} approximates the loss function F(T − kτ, ih_x, jh_y) corresponding to Eq. (7.2.22).

To obtain the unknown values of the grid functions v^k_{ij} and V^k_{ij} uniquely from the algebraic equations (7.2.38) and (7.2.39), in view of (7.2.22), we need to complete these equations with the zero "initial" conditions

and the boundary conditions of the form

where the function φ(t, y) is determined by (7.2.13) and the function ψ(t, x) is calculated either by formulas (7.2.16)-(7.2.18) or by formula (7.2.20) (or (7.2.21)), depending on the value of the admissible control u_m. According to [163], the difference scheme (7.2.38)-(7.2.41) approximates the loss function F(t, x, y) of Eq. (7.2.22) up to O(h_x + h_y + τ).
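To make the marching procedure concrete, the sketch below implements an upwind ("oriented differences") fractional-step iteration of the same type. It is a schematic reconstruction, not the exact scheme (7.2.38)-(7.2.41): we assume the splitting assigns the transport term (1 − y − u)x F_x and half of the running cost (x − 1)² + (y − 1)² to the first half-step, and the term b(x − 1)y F_y with the other half to the second, with the pointwise minimization over u ∈ {0, u_m} taken from the current layer; the step conditions τ|r_x| ≤ h_x and τ|r_y| ≤ h_y are assumed to hold, and phi, psi are the (vectorized) boundary functions (7.2.13) and (7.2.16)-(7.2.21).

```python
import numpy as np

def upwind(a, F, axis, h):
    """One-sided difference a * dF, oriented by the sign of the coefficient a."""
    fwd = (np.roll(F, -1, axis) - F) / h          # forward difference
    bwd = (F - np.roll(F, 1, axis)) / h           # backward difference
    last = [slice(None)] * F.ndim; last[axis] = -1
    first = [slice(None)] * F.ndim; first[axis] = 0
    fwd[tuple(last)] = bwd[tuple(last)]           # repair wrapped edges with the
    bwd[tuple(first)] = fwd[tuple(first)]         # admissible one-sided value
    return np.where(a > 0, a * fwd, a * bwd)

def solve_bellman(b, u_m, T, x_max, y_max, Nx, Ny, N, phi, psi):
    hx, hy, tau = x_max / Nx, y_max / Ny, T / N
    x = np.linspace(0.0, x_max, Nx + 1)[:, None]
    y = np.linspace(0.0, y_max, Ny + 1)[None, :]
    cost = (x - 1.0) ** 2 + (y - 1.0) ** 2
    F = np.zeros((Nx + 1, Ny + 1))                # terminal condition F(T, .) = 0
    for k in range(1, N + 1):                     # march in reverse time
        t = T - k * tau
        Fx = np.gradient(F, hx, axis=0)
        u = np.where(x * Fx > 0.0, u_m, 0.0)      # pointwise minimization in u
        half = F + tau * (upwind((1.0 - y - u) * x, F, 0, hx) + 0.5 * cost)
        F = half + tau * (upwind(b * (x - 1.0) * y, half, 1, hy) + 0.5 * cost)
        F[0, :] = phi(t, y.ravel())               # boundary condition at x = 0
        F[:, 0] = psi(t, x.ravel())               # boundary condition at y = 0
    return F
```

The switching line at each reverse-time layer can then be located where the coefficient of u in the minimized expression (here x ∂F/∂x) changes sign, which is one way to produce curves analogous to those in Figs. 61-63.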

Calculations according to formulas (7.2.38)-(7.2.41) were performed on computers, and some numerical results are shown in Figs. 61-64. Figure 61 shows the position of the switching lines (7.2.12) on the phase plane (x, y) for different values of the "reverse" time ρ = T − t. The curves in Fig. 61 were constructed for the problem parameters b = u_m = 0.5 and the parameters h_x = h_y = 0.1, τ = 0.01, and N_x = N_y = 20 of the grid (7.2.35). Curves 1-5 correspond to the values of the reverse time ρ = 1.5, 2.5, 3.5, 5.0, 7.0, respectively. The dashed line in Fig. 61 indicates the segment of the line (7.2.29) that is the part of the switching line corresponding to the sliding mode of control in the limit case ρ = T − t → ∞. Figures 62 and 63 show similar results for the maximum values u_m = 1.0 and u_m = 1.5 of the admissible control. Curves 1-3 in Figs. 62 and 63 are the switching lines corresponding to three values of the


reverse time ρ = 3.5, 6.0, 12.0. Figure 64 illustrates the variation of the loss function F(t, x, y) along a part of the line (7.2.29) for different time moments. The dotted line in Fig. 64 shows the stationary loss function (7.2.34).


Figures 61-64 show that the results of the numerical solution of Eq. (7.2.22) (and of the synthesis problem) as ρ → ∞ allow us to study the passage to the stationary control of population sizes. Moreover, these data confirm the results of the theoretical analysis of the stationary mode carried out in Section 7.2.3.


We also point out that the nonstationary u_*(t, x, y) and the stationary u_*(x, y) = lim_{ρ→∞} u_*(t, x, y) optimal control algorithms, obtained by solving the Bellman equation (7.2.22) numerically, were used for the numerical simulation of transient processes in system (7.2.3) in the comparative analysis of different control algorithms. The results of this simulation and comparative analysis were discussed in §5.2.


CONCLUSION

Design methods that use the frequency approach to the analysis and synthesis of control systems [119-121, 146, 147] are widely applied in modern control engineering. Based on such notions as the transfer functions of open- or closed-loop systems, these methods allow one to evaluate the control quality by the position of zeros and poles of these transfer functions in the frequency domain. The frequency methods are very illustrative and effective in studying linear feedback control systems.

As for methods for calculating optimal (suboptimal) control algorithms in the state space, such as those presented in this book, modern engineering most frequently deals with results obtained by solving problems of linear quadratic optimization, which lead to linear optimal control systems.

Linear quadratic problems of optimal control have by now been studied comprehensively, and the literature on this subject is quite extensive; therefore these problems are only briefly outlined here. It should be noted that the practical realization of linear optimal systems often involves difficulties, as one needs to solve the matrix-valued Riccati equation and to use the solution of this equation in real time. These problems are discussed in [47, 126, 134, 149, 150].

It is well known that a large number of practically important problems of optimal control cannot be reduced to linear quadratic problems. In particular, this is true for control problems in which constraints imposed on the values of the admissible control play an important role. Despite their practical importance, there is currently no universal approach to solving such constrained optimal control problems in a form that ensures a simple technical realization of the optimal control algorithm. The author hopes that the results obtained in this book will help to develop new engineering methods for solving such problems by using constructive methods for solving the Bellman equations.

Some remarks concerning the prospects for solving applied problems of optimal control on the basis of the dynamic programming approach should be made.

The existing methods of optimal control synthesis could be categorized as exact, approximate analytic, and numerical. If a synthesis problem can


be solved exactly, then the optimal control algorithm can be written as a finite formula obtained by analytically solving the corresponding Bellman equation. Then the block C (the controller) in the functional diagram (see Figs. 2 and 3) is a device simulating the analytic expression derived for the optimal algorithm. Unfortunately, the Bellman equations can seldom be solved exactly (as a rule, only for one-dimensional control problems). The same holds in the case of linear quadratic problems, for which the dynamic programming approach only simplifies the procedure of solving the synthesis problem by reducing the problem of solving a nonlinear partial differential equation to solving a finite system of ordinary differential equations (a matrix-valued Riccati equation). In general, one could say that intuition and conjecture are crucial in the search for exact solutions to the Bellman equations. Therefore, the construction of exact solutions resembles a kind of art rather than a formal scientific approach.¹ Thus, we cannot expect that exact synthesis methods will be widely used for solving actual control problems. The "practical" value of exact solutions to Bellman equations (and to synthesis problems) is that they, as a rule, form the basis for a family of approximate analytic synthesis methods, which in turn enable one to find control algorithms close to optimal ones for a significantly larger class of specific applied problems.

The most common approximate synthesis methods employ various versions of the methods of a small parameter and of successive approximations for solving the Bellman equation. On the one hand, a large variety of versions of asymptotic synthesis methods (described in this book and by other authors, see [22, 33, 34, 56-58, 110]) is available, which allows one to obtain solutions for many important classes of optimal control problems often encountered in practice. On the other hand, the asymptotic synthesis methods usually have a remarkable feature (demonstrated repeatedly in this book) that ensures their high effectiveness in practice. Namely, quasioptimal control algorithms derived according to some scheme with small parameters are often sufficient even when the parameter supposed to be small is in fact of a finite value comparable to the other parameters of the problem. In the design of actual control systems, this allows one to obtain reasonable control algorithms by introducing a purely formal small parameter into the specific problem considered. Moreover, by formally applying the method of a small parameter, it is often possible to significantly improve various heuristic control algorithms commonly used in engineering (a typical example of such an improvement is given in §6.1). All this makes approximate synthesis methods based on the use of asymptotic

¹A similar situation arises in the search for Liapunov functions in the theory of stability [1, 29, 125, 129]. This fact was pointed out by T. Burton [29, p. 166]: "... Beyond any doubt, construction of Liapunov functions is an art."


methods for solving the Bellman equations one of the most promising trends in the engineering design of optimal control systems.

Another important branch of applied methods for solving problems of optimal control is the development of numerical methods for solving the Bellman equations (and synthesis problems). This field has recently received much attention [10, 31, 48, 49, 53, 86, 104, 169]. The main benefit of numerical synthesis methods is their high universality. It is worth noting that numerical methods also play an important role in problems of evaluating the performance index of quasioptimal control algorithms calculated by other methods. Currently, the widespread use of numerical synthesis methods in modern engineering is somewhat hampered by the following two factors: (i) the approximation properties of discrete schemes for solving some classes of Bellman equations still remain to be rigorously mathematically justified, and (ii) the calculation of grid functions requires a great number of operations. All this makes it difficult to solve control problems of higher dimension and those with unbounded phase space. However, one must not consider these facts as an obstacle to using numerical methods in engineering. Recent developments in numerical methods for solving the Bellman equations and in the decomposition of multidimensional problems [31], continuous advances in parallel computing, and the progress in computer technology itself suggest that numerical methods for the synthesis of optimal systems will soon become a regular tool for all those dealing with the design of actual control systems.


REFERENCES

1. V. N. Afanasiev, V. B. Kolmanovskii, and V. R. Nosov, Mathematical Theory of Control Systems Design. Dordrecht: Kluwer Academic Publishers, 1996.

2. A. A. Andronov, A. A. Vitt, and S. E. Khaikin, Theory of Oscilla- tions, Moscow: Fizmatgiz, 1971.

3. M. Aoki, Optimization of Stochastic Systems, New York-London: Academic Press, 1967.

4. P. Appell et J. Kampé de Fériet, Fonctions hypergéométriques et hypersphériques. Polynômes d'Hermite. Paris, 1926.

5. K. J. Åström, Introduction to Stochastic Control Theory. New York: Academic Press, 1970.

6. K. J. Åström, Theory and Applications of Adaptive Control - a Survey. Automatica-J. IFAC, 19: 471-486, 1992.

7. K. J. Åström, Adaptive control. In: Antoulas, ed., Mathematical System Theory, Berlin: Springer, 1991, pp. 437-450.

8. K. J. Åström, Adaptive control around 1960. IEEE Control Systems, 16, No. 3: 44-49, 1996.

9. K. J. Åström and B. Wittenmark, A survey of adaptive control applications. Proceedings 34th IEEE Conference on Decision and Control, New Orleans, Louisiana, 1995, pp. 649-654.

10. M. Bardi, S. Bottacin, and M. Falcone, Convergence of discrete schemes for discontinuous value functions of pursuit-evasion games. In: G. J. Olsder, ed., New Trends in Dynamic Games and Applications, Basel-Boston: Birkhäuser, 1995, pp. 273-304.

11. A. T. Bharucha-Reid, Elements of the Theory of Markov Processes and Their Applications, New York: McGraw-Hill, 1960.

12. V. P. Belavkin, Optimization of quantum observation and control. Proceedings of 9th IFIP Conference on Optimization Techniques, Warszawa, 1979, Springer, 1980, pp. 141-149.


13. V. P. Belavkin, Nondemolition measurement and control in quantum dynamic systems. Proceedings of CISM Seminar on Information Complexity and Control in Quantum Physics, Springer, 1987, pp. 311-329.

14. R. Bellman, Dynamic Programming. Princeton: Princeton University Press, 1957.

15. R. Bellman and E. Angel, Dynamic Programming and Partial Differential Equations. New York: Academic Press, 1972.

16. R. Bellman, I. Gliksberg, and O. A. Gross, Some Aspects of the Mathematical Theory of Control Processes. Santa Monica, California: Rand Corporation, 1958.

17. R. Bellman and R. Kalaba, Theory of dynamic programming and feedback systems. Proceedings of 1st IFAC Congress, Theory of Discrete, Optimal, and Self-Tuning Systems, Moscow: Akad. Nauk USSR, 1961.

18. D. P. Bertsekas, Dynamic Programming and Stochastic Control. London: Academic Press, 1976.

19. N. N. Bogolyubov and Yu. A. Mitropolskii, Asymptotic Methods in Nonlinear Oscillation Theory. Moscow: Fizmatgiz, 1974.

20. I. A. Boguslavskii, Navigation and Control under Incomplete Statistical Information. Moscow: Mashinostroenie, 1970.

21. I. A. Boguslavskii and A. V. Egorova, Stochastic optimal control of motion with nonsymmetric constraints. Avtomat. i Telemekh., 33, No. 8, 1972.

22. M. Y. Borodovskii, A. S. Bratus, and F. L. Chernous'ko, Optimal pulse correction under random disturbances. Prikl. Mat. Mekh., 39, No. 5, 1975.

23. N. D. Botkin and V. S. Patsko, Universal strategy in a differential game with fixed terminal time. Problems Control Inform. Theory, 11, No. 6: 419-432, 1982.

24. A. E. Bryson and Y. C. Ho, Applied Optimal Control. Toronto-London: Blaisdell, 1969.

25. B. M. Budak and S. V. Fomin, Multiple Integrals and Series. Moscow: Nauka, 1965.

26. B. M. Budak, A. A. Samarskii, and A. N. Tikhonov, Collection of Problems in Mathematical Physics. Moscow: Nauka, 1972.

27. B. V. Bulgakov, Oscillations. Moscow: Gostekhizdat, 1954.

28. R. Bulirsch and H. J. Pesch, The maximum principle, Bellman's equation, and Carathéodory's work. J. Optim. Theory and Appl., 80, No. 2: 203-229, 1994.

29. T. A. Burton, Volterra Integral and Differential Equations. New York: Academic Press, 1983.

30. A. G. Butkovskii, Distributed Control Systems. New York: Elsevier, 1969.

31. F. Camilli, M. Falcone, P. Lanucara, and A. Seghini, A domain decomposition method for Bellman equations. In: D. E. Keyes and J. Xu, eds., Domain Decomposition Methods in Scientific and Engineering Computing. Contemp. Math., Vol. 180, Providence: Amer. Math. Soc., 1994, pp. 477-483.

32. F. L. Chernous'ko, Some problems of optimal control with a small parameter. Prikl. Mat. Mekh., 32, No. 1, 1968.

33. F. L. Chernous'ko, L. D. Akulenko, and B. N. Sokolov, Control of Oscillations. Moscow: Nauka, 1980.

34. F. L. Chernous'ko and V. B. Kolmanovskii, Optimal Control under Random Disturbances. Moscow: Nauka, 1978.

35. C. W. Clark, Bioeconomic Modelling and Fisheries Management. New York: Wiley, 1985.

36. D. R. Cox and H. D. Miller, The Theory of Stochastic Processes. Methuen, 1965.

37. M. L. Dashevskiy and R. S. Liptser, Analog modeling of stochastic differential equations connected with change point problem. Avtomat. i Telemekh., 27, No. 4, 1966.

38. M. H. A. Davis and R. B. Vinter, Stochastic Modeling and Control. London: Chapman and Hall, 1985.

39. M. H. DeGroot, Optimal Statistical Decisions. New York: McGraw-Hill, 1970.

40. V. F. Dem'yanov, On minimization of maximal deviation. Vestnik Leningrad Univ. Math., No. 7, 1966.

41. V. A. Ditkin and A. P. Prudnikov, Integral Transforms and Operational Calculus. Moscow: Fizmatgiz, 1961.

42. A. L. Dontchev, Error estimates for a discrete approximation to constrained control problems. SIAM J. Numer. Anal., 18: 500-514, 1981.

43. A. L. Dontchev, Perturbations, Approximations, and Sensitivity Analysis of Optimal Control Systems. Lecture Notes in Control and Inform. Sci., Vol. 52, Berlin: Springer, 1983.

44. J. L. Doob, Stochastic Processes. New York: Wiley, 1953.

Page 407: Optimal Design of Control Systems Stochastic and Deterministic Problems

390 References

45. E. B. Dynkin, Markov Processes. Berlin: Springer, 1965.

46. S. V. Emel'yanov, ed., Theory of Variable-Structure Systems. Moscow: Nauka, 1970.

47. C. Endrikat and I. Hartmann, Optimal design of discrete-time MIMO systems in the frequency domain. Internat. J. Control, 48, No. 4: 1569-1582, 1988.

48. M. Falcone, Numerical solution of dynamic programming equations. Appendix to the monograph by M. Bardi and I. Capuzzo Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations. Basel-Boston: Birkhäuser, 1997.

49. M. Falcone and R. Ferretti, Convergence analysis for a class of semi-Lagrangian advection schemes. SIAM J. Numer. Anal., 38, 1998.

50. A. A. Feldbaum, Foundations of the Theory of Optimal Automatic Systems. Moscow: Nauka, 1966.

51. M. Feldman and J. Roughgarden, A population's stationary distribution and chance of extinction in stochastic environments with remarks on the theory of species packing. Theor. Pop. Biol., 7, No. 12: 197-207, 1975.

52. W. Feller, An Introduction to Probability Theory and Its Applications. New York: Wiley, 1970.

53. R. Ferretti, On a Class of Approximation Schemes for Linear Boundary Control Problems. Lecture Notes in Pure and Appl. Math., Vol. 163, New York: Marcel Dekker, 1994.

54. A. F. Filippov, Differential Equations with Discontinuous Right-Hand Sides. Dordrecht: Kluwer Academic Publishers, 1986.

55. W. H. Fleming, Some Markovian optimization problems. J. Math. and Mech., 12, No. 1, 1963.

56. W. H. Fleming, Stochastic control for small noise intensities. SIAM J. Control, 9, No. 3, 1971.

57. W. H. Fleming and M. R. James, Asymptotic series and exit time probabilities. Ann. Probab., 20, No. 3: 1369-1384, 1992.

58. W. H. Fleming and R. W. Rishel, Deterministic and Stochastic Optimal Control. Berlin: Springer, 1975.

59. W. H. Fleming and H. M. Soner, Controlled Markov Processes and Viscosity Solutions. Berlin: Springer, 1993.

60. G. E. Forsythe, M. A. Malcolm, and C. B. Moler, Computer Methods for Mathematical Computations. Englewood Cliffs, N.J.: Prentice Hall, 1977.


61. A. Friedman, Partial Differential Equations of Parabolic Type. Englewood Cliffs, N.J.: Prentice Hall, 1964.

62. F. R. Gantmacher, The Theory of Matrices. Vol. 1, New York: Chelsea, 1964.

63. I. M. Gelfand, Generalized Stochastic Processes. Dokl. Akad. Nauk SSSR, 100, No. 5, 1955.

64. I. M. Gelfand and S. V. Fomin, Calculus of Variations. Moscow: Fizmatgiz, 1961.

65. I. M. Gelfand and G. I. Shilov, Generalized Functions and Their Calculations. Moscow: Fizmatgiz, 1959.

66. I. I. Gikhman and A. V. Skorokhod, The Theory of Stochastic Processes. Berlin: Springer, Vol. 1, 1974; Vol. 2, 1975.

67. B. V. Gnedenko, Theory of Probabilities. Moscow: Nauka, 1969.

68. B. S. Goh, Management and Analysis of Biological Populations. Amsterdam: Elsevier Sci., 1980.

69. L. S. Goldfarb, On some nonlinearities in automatic regulation systems. Avtomat. i Telemekh., 8, No. 5, 1947.

70. L. S. Goldfarb, Research method for nonlinear regulation systems based on harmonic balance principle. In: Theory of Automatic Regulation, Moscow: Mashgiz, 1951.

71. E. Goursat, Cours d'Analyse Mathématique. Vol. 3, Paris: Gauthier-Villars, 1927.

72. R. Z. Hasminskii, Stochastic Stability of Differential Equations. Alphen: Sijthoff and Noordhoff, 1980.

73. G. E. Hutchinson, Circular control systems in ecology. Ann. New York Acad. Sci., 50, 1948.

74. A. M. Il'in, A. S. Kalashnikov, and O. A. Oleynik, Second-order parabolic linear equations. Uspekhi Mat. Nauk, 17, No. 3, 1962.

75. K. Ito, Stochastic integral. Proc. Imp. Acad., Tokyo, 20, 1944.

76. K. Ito, On a formula concerning stochastic differentials. Nagoya Math. J., 3: 55-65, 1951.

77. E. Jahnke, F. Emde, and F. Lösch, Tafeln höherer Funktionen. Stuttgart: Teubner, 1960.

78. R. E. Kalman, On general theory of control systems. In: Proceedings of the 1st IFAC Congress, Vol. 2, Moscow: Akad. Nauk SSSR, 1960.

79. R. E. Kalman and R. S. Bucy, New results in linear filtering and prediction theory. Trans. ASME Ser. D (J. Basic Engineering), 83: 95-108, 1961.


80. L. I. Kamynin, Methods of heat potentials for a parabolic equation with discontinuous coefficients. Siberian Math. J., 4, No. 5, 1963.

81. L. I. Kamynin, On existence of boundary problem solution for parabolic equations with discontinuous coefficients. Izv. Akad. Nauk SSSR Ser. Mat., 28, No. 4, 1964.

82. V. A. Kazakov, Introduction to the Theory of Markov Processes and Radio Engineering Problems. Moscow: Sovetskoe Radio, 1973.

83. M. Kimura, Some problems of stochastic processes in genetics. Ann. Math. Statist., 28: 882-901, 1957.

84. V. B. Kolmanovskii, On approximate synthesis of some stochastic systems. Avtomat. i Telemekh., 36, No. 1, 1975.

85. V. B. Kolmanovskii, Some time-optimal control problems for stochastic systems. Problems Control Inform. Theory, 4, No. 4, 1975.

86. V. B. Kolmanovskii and G. E. Kolosov, Approximate and numerical methods to design optimal control of stochastic systems. Izv. Akad. Nauk SSSR Tekhn. Kibernet., No. 4: 64-79, 1989.

87. V. B. Kolmanovskii and A. D. Myshkis, Applied Theory of Functional Differential Equations. Dordrecht: Kluwer Academic Publishers, 1992.

88. V. B. Kolmanovskii and V. R. Nosov, Stability of Functional Differential Equations. London: Academic Press, 1986.

89. V. B. Kolmanovskii and L. E. Shaikhet, Control of Systems with Aftereffect. Transl. Math. Monographs, Vol. 157, Providence: Amer. Math. Soc., 1996.

90. V. B. Kolmanovskii and A. K. Spivak, Time-optimal control in a predator-prey system. Prikl. Mat. Mekh., 54, No. 3: 502-506, 1990.

91. A. N. Kolmogorov and S. V. Fomin, Elements of Function Theory and Functional Analysis. Moscow: Nauka, 1968.

92. G. E. Kolosov, Synthesis of statistical feedback systems optimal with respect to different performance indices. Vestnik Moskov. Univ. Ser. III, No. 1: 3-14, 1966.

93. G. E. Kolosov, Optimal control of quasiharmonic plants under incomplete information about the current values of phase variables. Avtomat. i Telemekh., 30, No. 3: 33-41, 1969.

94. G. E. Kolosov, Some problems of optimal control of Markov plants. Avtomat. i Telemekh., 35, No. 2: 16-24, 1974.

95. G. E. Kolosov, Analytical solution of problems in synthesis of optimal distributed-parameter control systems subject to random perturbations. Automat. Remote Control, No. 11: 1612-1622, 1978.

96. G. E. Kolosov, Synthesis of optimal stochastic control systems by the method of successive approximations. Prikl. Mat. Mekh., 43, No. 1: 7-16, 1979.

97. G. E. Kolosov, Approximate synthesis of stochastic control systems with random parameters. Avtomat. i Telemekh., 43, No. 6: 107-116, 1982.

98. G. E. Kolosov, Approximate method for design of stochastic adaptive optimal control systems. In: G. S. Ladde and M. Sambandham, eds., Proceedings of Dynamic Systems and Applications, Vol. 1, 1994, pp. 173-180.

99. G. E. Kolosov, On a problem of population size control. Izv. Ross. Akad. Nauk Teor. Sist. Upravlen., No. 2: 181-189, 1995.

100. G. E. Kolosov, Numerical analysis of some stochastic suboptimal controlled systems. In: Z. Deng, Z. Liang, G. Lu, and S. Ruan, eds., Differential Equations and Control Theory. Lecture Notes in Pure and Appl. Math., Vol. 176, New York: Marcel Dekker, 1996, pp. 143-148.

101. G. E. Kolosov, Exact solution of a stochastic problem of optimal control by population size. Dynamic Systems and Appl., 5, No. 1: 153-161, 1996.

102. G. E. Kolosov, Size control of a population described by a stochastic logistic model. Automat. Remote Control, 58, No. 4: 678-686, 1997.

103. G. E. Kolosov and D. V. Nezhmetdinova, Stochastic problems of optimal fisheries management. In: Proceedings of the 15th IMACS Congress on Scientific Computation, Modelling and Applied Mathematics, Vol. 5, Berlin: Springer, 1997, pp. 15-20.

104. G. E. Kolosov and M. M. Sharov, Numerical method of design of stochastic optimal control systems. Automat. Remote Control, 49, No. 8: 1053-1058, 1988.

105. G. E. Kolosov and M. M. Sharov, Optimal damping of population size fluctuations in an isolated "predator-prey" ecological system. Automation and Remote Control, 53, No. 6: 912-920, 1992.

106. G. E. Kolosov and M. M. Sharov, Optimal control of population sizes in a predator-prey system. Approximate design in the case of an ill-adapted predator. Automat. Remote Control, 54, No. 10: 1476-1484, 1993.

107. G. E. Kolosov and R. L. Stratonovich, An asymptotic method for solution of problems of optimal regulator design. Avtomat. i Telemekh., 25, No. 12: 1641-1655, 1964.

108. G. E. Kolosov and R. L. Stratonovich, On optimal control of quasiharmonic systems. Avtomat. i Telemekh., 26, No. 4: 601-614, 1965.

109. G. E. Kolosov and R. L. Stratonovich, Asymptotic method for solution of stochastic problems of optimal control of quasiharmonic systems. Avtomat. i Telemekh., 28, No. 2: 45-58, 1967.

110. N. N. Krasovskii and E. A. Lidskii, Analytical design of regulators in systems with random properties. Avtomat. i Telemekh., 22, No. 9-11, 1961.

111. N. N. Krasovskii, Theory of the Control of Motion. Moscow: Nauka, 1968.

112. V. F. Krotov, Global Methods in Optimal Control Theory. New York: Marcel Dekker, 1996.

113. N. V. Krylov, Controlled Diffusion Processes. New York: Springer, 1980.

114. S. I. Kumkov and V. S. Patsko, Information sets in the problem of pulse control. Avtomat. i Telemekh., 22, No. 7: 195-206, 1997.

115. A. B. Kurzhanskii, Control and Observation under Uncertainty. Moscow: Nauka, 1977.

116. H. J. Kushner and F. C. Schweppe, A maximum principle for stochastic control systems. J. Math. Anal. Appl., No. 8, 1964.

117. H. J. Kushner, Stochastic Stability and Control. New York-London: Academic Press, 1967.

118. H. J. Kushner, On the optimal control of a system governed by a linear parabolic equation with white noise inputs. SIAM J. Control, 6, No. 4, 1968.

119. H. Kwakernaak, The polynomial approach to H∞ optimal regulation. In: E. Mosca and L. Pandolfi, eds., H∞-Control Theory, Como, 1990. Lecture Notes in Math., Vol. 1496, Berlin: Springer, 1991.

120. H. Kwakernaak, Robust control and H∞-optimization. Automatica-J. IFAC, 29, No. 2: 255-273, 1993.

121. H. Kwakernaak, Symmetries in control system design. In: A. Isidori, ed., Trends in Control: A European Perspective, Rome. Berlin: Springer, 1995.

122. H. Kwakernaak and R. Sivan, Linear Optimal Control Systems. New York-London: Wiley, 1972.

123. J. P. La Salle, The time-optimal control problem. In: Contributions to Differential Equations, Vol. 5, Princeton, N.J.: Princeton Univ. Press, 1960.

124. O. Ladyzhenskaya, V. Solonnikov, and N. Uraltseva, Linear and Quasilinear Equations of Parabolic Type. Transl. Math. Monographs, Vol. 23, Providence: Amer. Math. Soc., 1968.

125. V. Lakshmikantham, S. Leela, and A. A. Martynyuk, Stability Analysis of Nonlinear Systems. New York: Marcel Dekker, 1988.

126. P. Lancaster and L. Rodman, Solutions of the continuous and discrete time algebraic Riccati equations. In: S. Bittanti, A. J. Laub, and J. C. Willems, eds., The Riccati Equation. Berlin: Springer, 1991.

127. P. Langevin, Sur la théorie du mouvement brownien. Comptes Rendus Acad. Sci. Paris, 146, No. 10, 1908.

128. E. B. Lee and L. Markus, Foundations of Optimal Control Theory. New York-London: Wiley, 1969.

129. X. X. Liao, Mathematical Theory and Application of Stability. Wuhan, China: Huazhong Normal Univ. Press, 1988.

130. J. L. Lions, Optimal Control of Systems Governed by Partial Differential Equations. Berlin: Springer, 1971.

131. R. S. Liptser and A. N. Shiryaev, Statistics of conditionally Gaussian random sequences. In: Proc. of the 6th Berkeley Symp. on Mathematical Statistics and Probability, University of California, 1970.

132. R. S. Liptser and A. N. Shiryaev, Statistics of Random Processes. Berlin: Springer, Vol. 1, 1977 and Vol. 2, 1978.

133. A. J. Lotka, Elements of Physical Biology. Baltimore: Williams and Wilkins, 1925.

134. R. Luttmann, A. Munack, and M. Thoma, Mathematical modelling, parameter identification, and adaptive control of single cell protein processes in tower loop bioreactors. In: Advances in Biochemical Engineering, Biotechnology, Vol. 32, Berlin-Heidelberg: Springer, 1985, pp. 95-205.

135. G. I. Marchuk, Methods of Numerical Mathematics. New York- Berlin: Springer, 1975.

136. N. N. Moiseev, Asymptotical Methods of Nonlinear Analysis. Moscow: Nauka, 1969.

137. N. N. Moiseev, Foundations of the Theory of Optimal Systems. Moscow: Nauka, 1975.

138. B. S. Mordukhovich, Approximation Methods in Problems of Optimization and Control. Moscow: Nauka, 1988.

139. V. M. Morozov and I. N. Kalenkova, Estimation and Control in Nonstationary Systems. Moscow: Moscow State Univ. Press, 1988.

140. E. M. Moshkov, On accuracy of optimal control of terminal condition. Prikl. Mat. Mekh., 34, No. 3, 1970.

142. J. D. Murray, Lectures on Nonlinear Differential Equation Models in Biology. Oxford: Clarendon Press, 1977.

143. G. V. Obrezkov and V. D. Razevig, Methods of Analysis of Tracking Breakdowns. Moscow: Sovetskoe Radio, 1972.

144. O. A. Oleynik, Boundary problems for linear elliptic and parabolic equations with discontinuous coefficients. Izv. Akad. Nauk SSSR Ser. Mat., 25, No. 1, 1961.

145. V. S. Patsko et al., Control of an aircraft landing in windshear. J. Optim. Theory and Appl., 83, No. 2: 237-267, 1994.

146. A. E. Pearson, Y. Shen, and J. Q. Pan, Discrete frequency formats for linear differential system identification. In: Proc. of 12th World Congress IFAC, Sydney, Australia, Vol. VII, 1993, pp. 143-148.

147. A. E. Pearson and A. A. Pandiscio, Control of time lag systems via reducing transformations. In: Proc. of 15th IMACS World Congress, A. Sydow, ed., Systems Engineering, Vol. 5, Berlin: Wissenschaft & Technik, 1997, pp. 9-14.

148. A. A. Pervozvanskii, On the minimum of maximal deviation of a controlled linear system. Izv. Akad. Nauk SSSR Mekhanika, No. 2, 1965.

149. H. J. Pesch, Real-time computation of feedback controls for constrained extremals (Part 1: Neighboring extremals; Part 2: A correction method based on multiple shooting). Optimal Control Appl. Methods, 10, No. 2: 129-171, 1989.

150. H. J. Pesch, A practical guide to the solution of real-life optimal control problems. Control Cybernet., 23, No. 1 and 2: 7-60, 1994.

151. A. B. Piunovskiy, Optimal control of stochastic sequences with constraints. Stochastic Anal. Appl., 15, No. 2: 231-254, 1997.

152. A. B. Piunovskiy, Optimal Control of Random Sequences in Problems with Constraints. Dordrecht: Kluwer Academic Publishers, 1997.

153. H. Poincaré, Sur le problème des trois corps et les équations de la dynamique. Acta Math., 13, 1890.

154. H. Poincaré, Les Méthodes Nouvelles de la Mécanique Céleste. Paris: Gauthier-Villars, 1892-1899.

155. I. I. Poletayeva, Choice of optimality criterion. In: Engineering Cybernetics, Moscow: Nauka, 1965.

156. L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko, The Mathematical Theory of Optimal Processes. New York: Interscience, 1962.

157. Yu. V. Prokhorov and Yu. A. Rozanov, Probability Theory: Foundations, Limit Theorems, and Stochastic Processes. Moscow: Nauka, 1967.

158. N. S. Rao and E. O. Roxin, Controlled growth of competing species. SIAM J. Appl. Math., 50, No. 3: 853-864, 1990.

159. V. I. Romanovskii, Discrete Markov Chains. Moscow: Gostekhizdat, 1949.

160. Yu. A. Rozanov, Stochastic Processes. Moscow: Nauka, 1971.

161. A. P. Sage and J. L. Melsa, Estimation Theory with Applications to Communications and Control. New York: McGraw-Hill, 1971.

162. A. A. Samarskii, Introduction to the Theory of Difference Schemes. Moscow: Nauka, 1971.

163. A. A. Samarskii and A. V. Gulin, Numerical Methods. Moscow: Nauka, 1989.

164. M. S. Sholar and D. M. Wiberg, Canonical equation for boundary feedback control of stochastic distributed parameter systems. Automatica-J. IFAC, 8, 1972.

165. H. L. Smith, Competitive coexistence in an oscillating chemostat. SIAM J. Appl. Math., 40, No. 3: 498-552, 1981.

166. S. L. Sobolev, Equations of Mathematical Physics. Moscow: Nauka, 1966.

167. Yu. G. Sosulin, Theory of Detection and Estimation of Stochastic Signals. Moscow: Sovetskoe Radio, 1978.

168. J. Song and J. Yu, Population System Control. Berlin: Springer, 1987.

169. J. Stoer, Principles of sequential quadratic programming methods for solving nonlinear programs. In: K. Schittkowski, ed., Computational Mathematical Programming. NATO ASI Series, F15, 1985, pp. 165-207.

170. R. L. Stratonovich, Application of the theory of Markov processes to optimal filtering of signals. Radiotekhn. i Elektron., 5, No. 11, 1960.

171. R. L. Stratonovich, On the optimal control theory. Sufficient coordinates. Avtomat. i Telemekh., 23, No. 7, 1962.

172. R. L. Stratonovich, On the optimal control theory. Asymptotic method for solving the diffusion alternative equation. Avtomat. i Telemekh., 23, No. 11, 1962.

173. R. L. Stratonovich, Topics in the Theory of Random Noise. New York: Gordon and Breach, Vol. 1, 1963 and Vol. 2, 1967.

174. R. L. Stratonovich, New form of stochastic integrals and equations. Vestnik Moskov. Univ. Ser. I Mat. Mekh., No. 1, 1964.

175. R. L. Stratonovich, Conditional Markov Processes and Their Application to the Theory of Optimal Control. New York: Elsevier, 1968.

176. R. L. Stratonovich and V. I. Shmalgauzen, Some stationary problems of dynamic programming. Izv. Akad. Nauk SSSR Energetika i Avtomatika, No. 5, 1962.

177. Yu. M. Svirezhev, Nonlinear Waves, Dissipative Structures, and Catastrophes in Ecology. Moscow: Nauka, 1987.

178. G. W. Swan, Role of optimal control theory in cancer chemotherapy. Math. Biosci., 101: 237-284, 1990.

179. A. N. Tikhonov and A. A. Samarskii, Equations of Mathematical Physics. Moscow: Nauka, 1972.

180. V. I. Tikhonov, Phase adjustment of frequency in the presence of noise. Avtomat. i Telemekh., 21, No. 3, 1960.

181. V. I. Tikhonov and M. A. Mironov, Markov Processes. Moscow: Sovetskoe Radio, 1977.

182. S. G. Tzafestas and J. M. Nightingale, Optimal control of a class of linear stochastic distributed parameter systems. Proc. IEE, 115, No. 8, 1968.

183. B. van der Pol, A theory of the amplitude of free and forced triode vibrations. Radio Review, 1, 1920.

184. B. van der Pol, Nonlinear theory of electrical oscillations. Proc. IRE, 22, No. 9, 1934.

185. B. L. van der Waerden, Mathematische Statistik. Berlin: Springer, 1957.

186. V. Volterra, Variazioni e fluttuazioni del numero d'individui in specie animali conviventi. Mem. Accad. Lincei, 2: 31-113, 1926.

187. V. Volterra, Leçons sur la théorie mathématique de la lutte pour la vie. Paris: Gauthier-Villars, 1931.

188. A. Wald, Sequential Analysis. New York: Wiley, 1950.

189. K. E. F. Watt, Ecology and Resource Management. New York: McGraw-Hill, 1968.

190. B. Wittenmark and K. J. Åström, Practical issues in the implementation of self-tuning control. Automatica-J. IFAC, 20: 595-605, 1984.

191. E. Wong and M. Zakai, On the relation between ordinary and stochastic differential equations. Internat. J. Engrg. Sci., 3, 1965.

192. E. Wong and M. Zakai, On the relation between ordinary and stochastic differential equations and applications to stochastic problems in control theory. Proc. Third Intern. Congress IFAC, London, 1966.

193. W. M. Wonham, On the separation theorem of stochastic control. SIAM J. Control, 6: 312-326, 1968.

194. W. M. Wonham, Random differential equations in control theory. In: A. T. Bharucha-Reid, ed., Probabilistic Methods in Applied Mathematics, Vol. 2, New York: Academic Press, 1970.

195. M. A. Zarkh and V. S. Patsko, Strategy of the second player in the linear differential game. Prikl. Mat. Mekh., 51, No. 2: 193-200, 1987.


INDEX

Adaptive problems of optimal control, 9
A posteriori covariances, 90
A posteriori mean values, 91
Asymptotic series, 220
Asymptotic synthesis method, 248

Bellman equation, 47, 51
    differential, 63
    functional, 278
    integro-differential, 74
    stationary, 67
Bellman optimality principle, 49
Brownian motion, 33

Capacity of the medium, 124
Cauchy problem, 9
Chapman-Kolmogorov equation, 23
Constraints, control, 17
    on control resources, 17
    on phase variables, 18
Control, admissible, 9
    bang-bang, 105
    boundary, 212
    distributed, 201
    program, 2
    of relay type, 105, 111
Control problem with infinite horizon, 343
Controller, 1, 7
Cost function (functional), 49
Covariance matrix, 147

Diffusion process, 27
Dynamic programming approach, 47

Equations, Langevin, 45
    logistic, 124
    of a single population, 342
    stochastic differential, 32
    truncated, 253
Error, stationary tracking, 67, 226
Error signal, 104
Estimate, of approximate synthesis, 182
    of unknown parameters, 316
Euler equation, 136

Feedback control system, 2
Filippov generalized solution, 12
Fokker-Planck equation, 29
Functional, cost, 19
    quadratic, 93, 99

Gaussian, conditionally, 313
    probability density, 92
    process, 20

Hutchinson model, 125

Integral criterion, 14
Ito, equation, 42
    stochastic integral, 37

Kalman filter, 91
Kolmogorov, backward equation, 25
    forward equation, 25
Krylov-Bogolyubov method, 254

Loss function, 49
Lotka-Volterra, equation, 125
    normalized model, 274, 368

Malthus model, 123
Markov process, 21
    conditional, 79
    continuous, 25
    discrete, 22
    strictly discontinuous, 31
Mathematical expectation, 15
    conditional, 60
Matrix, fundamental, 177
Method, alternating direction, 378
    grid function, 356
    small parameter, 220
    of successive approximations, 143
    sweep, 364
Model, stochastic logistic, 126, 311

Natural growth factor, 124
Nonvibrational amplitude, 254
    phase, 254

Optimal, damping of random oscillations, 276
    fisheries management, 133, 342
Optimality criterion, 2, 13
    terminal, 14
Oscillator, quasiharmonic, 248
Oscillatory systems, 247

Performance index, 2
Plant, 1, 7
Plant with distributed parameters, 199
Poorly adapted predator, 267
Population models, 123
Predator-prey model, 125
Probability density, 20
Problem, boundary-value, 70
    linear-quadratic (LQ-), 53
    with free endpoint, 48
Process, stochastic, 19
    of optimal stabilization, 278

Regulator, 154
Riccati equation, 100

Sample path, 108
Scheme, lengthwise-transverse, 362
Screen, absorbing, 333
    reflecting, 329
Servomechanism, 7
Sliding mode, 12
Stationary operating conditions, 65
Sufficient coordinates, 75
Switch point, 105
Switching line, 156
Symmetrized (Stratonovich) stochastic integral, 40
Synthesis, numerical, 355
Synthesis problem, 7

Transition probability, 22

Van-der-Pol method, 254
Van-der-Pol oscillator, 252

White noise, 19
Wiener random process, 33
