Optimal Predictive Adaptive Control


  • OPTIMAL, PREDICTIVE, AND ADAPTIVE CONTROL

    [Cover diagram: a bank of parallel RLS identifiers S1(t), S2(t), S3(t), driven by regressors z(t+1), z(t+2), z(t+3), each producing a feedback parameter vector w1(t), w2(t), w3(t).]

    Edoardo Mosca


    COPYRIGHT

    This electronic book is protected by copyright law and its sale is forbidden; it may be freely distributed, without making any modification to it, for educational purposes for students enrolled in the Sistemi Adattativi course of the Facoltà di Ingegneria of the Università di Firenze.

  • CONTENTS

    CHAPTER 1 Introduction . . . 1
    1.1 Optimal, Predictive and Adaptive Control . . . 1
    1.2 About This Book . . . 2
    1.3 Part and Chapter Outline . . . 4

    PART I Basic Deterministic Theory of LQ and Predictive Control 7

    CHAPTER 2 Deterministic LQ Regulation I: Riccati-Based Solution . . . 9
    2.1 The Deterministic LQ Regulation Problem . . . 9
    2.2 Dynamic Programming . . . 11
    2.3 Riccati-Based Solution . . . 15
    2.4 Time-Invariant LQR . . . 17
    2.5 Steady-State LQR Computation . . . 29
    2.6 Cheap Control . . . 32
    2.7 Single Step Regulation . . . 36

    Notes and References . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

    CHAPTER 3 I/O Descriptions and Feedback Systems . . . 39
    3.1 Sequences and Matrix Fraction Descriptions . . . 39
    3.2 Feedback Systems . . . 45
    3.3 Robust Stability . . . 53
    3.4 Streamlined Notations . . . 56
    3.5 1-DOF Trackers . . . 57

    Notes and References . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

    CHAPTER 4 Deterministic LQ Regulation II . . . 61
    4.1 Polynomial Formulation . . . 62
    4.2 Causal-Anticausal Decomposition . . . 65
    4.3 Stability . . . 66
    4.4 Solvability . . . 69
    4.5 Relationship with the Riccati-Based Solution . . . 72
    4.6 Robust Stability of LQ Regulated Systems . . . 74

    Notes and References . . . . . . . . . . . . . . . . . . . . . . . . . . . 76



    CHAPTER 5 Deterministic Receding Horizon Control . . . 77
    5.1 Receding Horizon Regulation . . . 77
    5.2 RDE Monotonicity and Stabilizing RHR . . . 80
    5.3 Zero Terminal State RHR . . . 82
    5.4 Stabilizing Dynamic RHR . . . 91
    5.5 SIORHR Computations . . . 95
    5.6 Generalized Predictive Regulation . . . 99
    5.7 Receding Horizon Iterations . . . 103
    5.8 Tracking . . . 114

    5.8.1 1-DOF Trackers . . . 114
    5.8.2 2-DOF Trackers . . . 115
    5.8.3 Reference Management and Predictive Control . . . 124
    Notes and References . . . 125

    PART II State Estimation, System Identification, LQ and Predictive Stochastic Control 127

    CHAPTER 6 Recursive State Filtering and System Identification . . . 129
    6.1 Indirect Sensing Measurement Problems . . . 129
    6.2 Kalman Filtering . . . 136

    6.2.1 The Kalman Filter . . . 136
    6.2.2 Steady-State Kalman Filtering . . . 142
    6.2.3 Correlated Disturbances . . . 143
    6.2.4 Distributional Interpretation of the Kalman Filter . . . 144
    6.2.5 Innovations Representation . . . 145
    6.2.6 Solution via Polynomial Equations . . . 146

    6.3 System Parameter Estimation . . . 148
    6.3.1 Linear Regression Algorithms . . . 149
    6.3.2 Pseudolinear Regression Algorithms . . . 159
    6.3.3 Parameter Estimation for MIMO Systems . . . 163
    6.3.4 The Minimum Prediction Error Method . . . 164
    6.3.5 Tracking and Covariance Management . . . 168
    6.3.6 Numerically Robust Recursions . . . 170

    6.4 Convergence of Recursive Identification Algorithms . . . 172
    6.4.1 RLS Deterministic Convergence . . . 173
    6.4.2 RLS Stochastic Convergence . . . 176
    6.4.3 RELS Convergence Results . . . 182
    Notes and References . . . 185

    CHAPTER 7 LQ and Predictive Stochastic Control . . . 187
    7.1 LQ Stochastic Regulation: Complete State Information . . . 187
    7.2 LQ Stochastic Regulation: Partial State Information . . . 192

    7.2.1 LQG Regulation . . . 192
    7.2.2 Linear Non-Gaussian Plants . . . 196
    7.2.3 Steady-State LQG Regulation . . . 196

    7.3 Steady-State Regulation of CARMA Plants: Solution via Polynomial Equations . . . 199
    7.3.1 Single Step Stochastic Regulation . . . 199
    7.3.2 Steady-State LQ Stochastic Linear Regulation . . . 202


    7.3.3 LQSL Regulator Optimality among Nonlinear Compensators . . . 211
    7.4 Monotonic Performance Properties of LQ Stochastic Regulation . . . 213
    7.5 Steady-State LQS Tracking and Servo . . . 215

    7.5.1 Problem Formulation and Solution . . . 215
    7.5.2 Use of Plant CARIMA Models . . . 220
    7.5.3 Dynamic Control Weight . . . 221

    7.6 H∞ and LQ Stochastic Control . . . 222
    7.7 Predictive Control of CARMA Plants . . . 225

    Notes and References . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

    PART III Adaptive Control 233

    CHAPTER 8 Single-Step-Ahead Self-Tuning Control . . . 235
    8.1 Control of Uncertain Plants . . . 236
    8.2 Bayesian and Self-Tuning Control . . . 240
    8.3 Global Convergence Tools for Deterministic STCs . . . 244
    8.4 RLS Deterministic Properties . . . 248
    8.5 Self-Tuning Cheap Control . . . 251
    8.6 Constant Trace Normalized RLS and STCC . . . 257
    8.7 Self-Tuning Minimum Variance Control . . . 262

    8.7.1 Implicit Linear Regression Models and ST Regulation . . . 262
    8.7.2 Implicit RLS+MV ST Regulation . . . 263
    8.7.3 Implicit SG+MV ST Regulation . . . 265

    8.8 Generalized Minimum-Variance Self-Tuning Control . . . 271
    8.9 Robust Self-Tuning Cheap Control . . . 273

    8.9.1 Reduced-Order Models . . . 274
    8.9.2 Prefiltering the Data . . . 274
    8.9.3 Dynamic Weights . . . 275
    8.9.4 CT-NRLS with dead zone and STCC . . . 276
    Notes and References . . . 281

    CHAPTER 9 Adaptive Predictive Control . . . 285
    9.1 Indirect Adaptive Predictive Control . . . 285

    9.1.1 The Ideal Case . . . 285
    9.1.2 The Bounded Disturbance Case . . . 296
    9.1.3 The Neglected Dynamics Case . . . 303

    9.2 Implicit Multistep Prediction Models of Linear-Regression Type . . . 305
    9.3 Use of Implicit Prediction Models in Adaptive Predictive Control . . . 309
    9.4 MUSMAR as an Adaptive Reduced-Complexity Controller . . . 317
    9.5 MUSMAR Local Convergence Properties . . . 328

    9.5.1 Stochastic Averaging: the ODE Method . . . 328
    9.5.2 MUSMAR ODE Analysis . . . 334
    9.5.3 Simulation Results . . . 343

    9.6 Extensions of the MUSMAR Algorithm . . . 346
    9.6.1 MUSMAR with Mean-Square Input Constraint . . . 346
    9.6.2 Implicit Adaptive MKI: MUSMAR . . . 354
    Notes and References . . . 364


    Appendices 366

    APPENDIX A Some Results from Linear Systems Theory . . . 369
    A.1 State-Space Representations . . . 369
    A.2 Stability . . . 372
    A.3 State-Space Realizations . . . 373

    APPENDIX B Some Results of Polynomial Matrix Theory . . . 375
    B.1 Matrix-Fraction Descriptions . . . 375

    B.1.1 Divisors and Irreducible MFDs . . . 376
    B.1.2 Elementary Row (Column) Operations for Polynomial Matrices . . . 377
    B.1.3 A Construction for a gcrd . . . 377
    B.1.4 Bezout Identity . . . 378

    B.2 Column- and Row-Reduced Matrices . . . 378
    B.3 Reachable Realizations from Right MFDs . . . 379
    B.4 Relationship between z- and d-MFDs . . . 380
    B.5 Divisors and System-Theoretic Properties . . . 380

    APPENDIX C Some Results on Linear Diophantine Equations . . . 383
    C.1 Unilateral Polynomial Matrix Equations . . . 383
    C.2 Bilateral Polynomial Matrix Equations . . . 385

    APPENDIX D Probability Theory and Stochastic Processes . . . 387
    D.1 Probability Space . . . 387
    D.2 Random Variables . . . 387
    D.3 Conditional Probabilities . . . 389
    D.4 Gaussian Random Vectors . . . 390
    D.5 Stochastic Processes . . . 390
    D.6 Convergence . . . 391
    D.7 Minimum Mean-Square-Error Estimators . . . 393

    References 394

  • List of Figures

    2.2-1 Optimal solution of the regulation problem in a state-feedback form as given by Dynamic Programming. . . . 15
    2.3-1 LQR solution. . . . 17
    2.5-1 A control-theoretic interpretation of Kleinman iterations. . . . 31
    3.2-1 The feedback system. . . . 45
    3.2-2 The feedback system with a Q-parameterized compensator. . . . 52
    3.2-3 The feedback system with a Q-parameterized compensator for P(d) stable. . . . 52
    3.5-1 Unity-feedback configuration of a closed-loop system with a 1-DOF controller. . . . 58
    3.5-2 Closed-loop system with a 2-DOF controller. . . . 58
    3.5-3 Unity-feedback closed-loop system with a 1-DOF controller for asymptotic tracking. . . . 59

    4.6-1 Plant/compensator cascade unity feedback for an LQ regulated system. 74

    5.4-1 Plant with I/O transport delay. . . . 93
    5.7-1 MKI closed-loop eigenvalues for the plant (16) with = 2. . . . 106
    5.7-2 MKI closed-loop eigenvalues for the plant (16) with = 1.999001. . . . 106
    5.7-3 TCI closed-loop eigenvalues for the plant (16) with = 2 and = 0.1, when high precision (h) and low precision (l) computations are used. TCI are initialized from a feedback close to FSS. . . . 109
    5.7-4 TCI closed-loop eigenvalues for the plant (16) with = 2, = 0.1 and high precision computations. TCI are initialized from FLQ. . . . 109
    5.7-5 TCI closed-loop eigenvalues for the plant of Example 4 and = 0.1, when high precision computations are used. TCI are initialized from a feedback close to FSS. . . . 110
    5.7-6 TCI feedback gains for = 101 and = 103, for the plant of Example 5. . . . 111
    5.7-7 Realization of the TCI regulation law via a bank of T parallel feedback-gain matrices. . . . 113
    5.8-1 Reference and plant output when a 1-DOF LQ controller is used for the tracking problem of Example 1. . . . 116
    5.8-2 Reference and plant output when a 2-DOF LQ controller is used for the tracking problem of Example 1. . . . 116
    5.8-3 Deadbeat tracking for the plant of Example 3 controlled by SIORHC (or GPC) when the reference consistency condition is satisfied. T = 3 is used for SIORHC (N1 = Nu = 3, N2 = 5 and u = 0 for GPC). . . . 123



    5.8-4 Tracking performance for the plant of Example 3 controlled by GPC (N1 = Nu = 3, N2 = 5 and u = 0) when the reference consistency condition is violated, viz. the time-varying sequence r(t + Nu + i), i = 1, ..., N2 - Nu, is used in calculating u(t). . . . 123
    6.1-1 The ISLM estimate as given by an orthogonal projection. . . . 130
    6.1-2 Block diagram view of algorithm (37)-(44) for computing recursively the ISLM estimate w|r. . . . 136
    6.2-1 Illustration of the Kalman filter. . . . 141
    6.2-2 Illustration of the KF as an innovations generator. The third system recovers z from its innovations e. . . . 145
    6.3-1 Orthogonalized projection algorithm estimate of the impulse response of a 16-step delay system when the input is a PRBS of period 31. . . . 152
    6.3-2 Geometric interpretation of the projection algorithm. . . . 153
    6.3-3 Recursive estimation of the impulse response of the 6-pole Butterworth filter of Example 2. . . . 154
    6.3-4 Geometrical illustration of the Least Squares solution. . . . 156
    6.3-5 Block diagram of the MPE estimation method. . . . 166
    6.4-1 Polar diagram of C(ei) with C(d) as in (37). . . . 183
    6.4-2 Time evolution of the four RELS estimated parameters when the data are generated by the ARMA model (37). . . . 184

    7.2-1 The LQG regulator. . . . 195
    7.4-1 The relation between E{u²(k)} and E{y²(k)} parameterized by for the plant of Example 1 under Single Step Stochastic regulation (solid line) and steady-state LQS regulation (dotted line). . . . 215

    8.1-1 Block diagram of a MRAC system. . . . 237
    8.1-2 Block diagram of a STC system. . . . 237
    8.2-1 Block diagram of an adaptive controller as the solution of an optimal stochastic control problem. . . . 241
    8.8-1 The original CARMA plant controlled by the GMV controller on the L.H.S. is equivalent to the modified CARMA plant controlled by the MV controller on the R.H.S. . . . 272
    8.9-1 Block scheme of a robust adaptive MV control system involving a low-pass filter L(d) for identification, and a high-pass filter H(d) for the control synthesis. . . . 276
    9.1-1 Illustration of the mode of operation of adaptive SIORHC. . . . 290
    9.1-2 Plant with input and output bounded disturbances. . . . 296
    9.2-1 Visualization of the constraint (9a). . . . 307
    9.3-1 Signal flow in the interlaced identification scheme. . . . 310
    9.3-2 Visualization of the constraints (7) and (10). . . . 313
    9.3-3 Time steps for the regressor and regressands when the next input to be computed is u(t). . . . 316
    9.3-4 Signal flows in the bank of parallel MUSMAR RLS identifiers when T = 3 and the next input is u(t). . . . 316
    9.4-1 Reference trajectories for joint 1 (above) and joint 2 (below). . . . 324


    9.4-2a Time evolution of the three PID feedback gains KP, KI, and KD adaptively obtained by MUSMAR for joint 1 of the robot manipulator. . . . 325
    9.4-2b Time evolution of the three PID feedback gains KP, KI, and KD adaptively obtained by MUSMAR for joint 2 of the robot manipulator. . . . 326
    9.4-3 Time evolution of the tracking errors for the robot manipulator controlled by a digital PID autotuned by MUSMAR (solid lines) or the Ziegler and Nichols method (dotted lines): (a) joint 1 error; (b) joint 2 error. . . . 327
    9.5-1 Time behaviour of the MUSMAR feedback parameters in Example 1 for T = 1, 2, 3, respectively. . . . 344
    9.5-2 The unconditional cost C(F) in Example 3 and feedback convergence points for various control horizons T. . . . 345
    9.5-3 Time behaviour of MUSMAR feedback parameters of Example 4 for T = 5. . . . 346
    9.6-1 Superposition of the feedback time-evolution over the constant level curves of E{y²(t)} and the allowed boundary E{u²(t)} = 0.1 for CMUSMAR with T = 5 and the plant in Example 1. . . . 353
    9.6-2 Time evolution of and E{u²(t)} for CMUSMAR with T = 2 and the plant of Example 2. . . . 354
    9.6-3 Illustration of the minorant imposed on T. . . . 359
    9.6-4 The accumulated loss divided by time when the plant of Example 3 is regulated by MUSMAR (T = 3) and MUSMAR (T = 3). . . . 362
    9.6-5 The evolution of the feedback calculated by MUSMAR in Example 5, superimposed to the level curves of the underlying quadratic cost. . . . 363
    9.6-6 Convergence of the feedback when the plant of Example 6 is controlled by MUSMAR. . . . 363
    9.6-7 The accumulated loss divided by time when the plant of Example 6 is controlled by ILQS, standard MUSMAR (T = 3) and MUSMAR (T = 3). . . . 364


  • CHAPTER 1

    INTRODUCTION

    1.1 Optimal, Predictive and Adaptive Control

    This book covers various topics related to the design of discrete-time control systems via the Linear Quadratic (LQ) control methodology.

    LQ control is an optimal control approach whereby the control law of a given dynamic linear system, the so-called plant, is analytically obtained by minimizing a performance index quadratic in the regulation/tracking error and control variables. LQ control is either deterministic or stochastic according to the deterministic or stochastic nature of the plant. Mastering LQ control theory is important for several reasons:

    LQ control theory provides a set of analytical design procedures that facilitate the synthesis of control systems with nice properties. These procedures, often implemented by commercial software packages, yield a solution which can also be used as a first cut in a trial-and-error iterative process, in case some specifications are not met by the initial LQ solution.

    LQ control allows us to design control systems under various assumptions on the information available to the controller. If this also includes knowledge of the reference to be tracked, feedback as well as feedforward control laws (the so-called 2-DOF controllers) are jointly obtained analytically.

    More advanced control design methodologies, such as H∞ control theory, can be regarded as extensions of LQ control theory.

    LQ control theory can be applied to nonlinear systems operating on a small-signal basis.

    There exists a relationship of duality between LQ control and Minimum Mean-Square linear prediction, filtering and smoothing. Hence any LQ control result has a direct counterpart in the latter areas.
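For concreteness, the finite-horizon LQ regulation problem referred to above can be stated, in a generic notation chosen here for illustration (the book fixes its own symbols in Chapter 2), as the minimization of a quadratic cost subject to linear plant dynamics:

```latex
\min_{u_0,\ldots,u_{N-1}} \; J
  = x_N^T Q_N x_N + \sum_{k=0}^{N-1}\left( x_k^T Q\, x_k + u_k^T R\, u_k \right),
\qquad \text{s.t.}\quad x_{k+1} = A x_k + B u_k ,
```

with $Q \succeq 0$, $Q_N \succeq 0$ weighting the regulation error and $R \succ 0$ weighting the control effort.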

    LQ control theory is complemented in the book with a treatment of multistep predictive control algorithms. With respect to LQ control, predictive control basically adds constraints on the tracking error and control variables and uses the receding-horizon control philosophy. In this way, relatively simple 2-DOF control laws can be synthesized. Their feature is that the profile of the reference over the prediction horizon can be made part of the overall control system design and dependent on both the current plant state and the desired set point. This extra freedom can be effectively used so as to guarantee bumpless behaviour and avoid exceeding saturation bounds. These aspects are of such importance that multistep predictive control has gained wide acceptance in industrial control applications.
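The receding-horizon philosophy can be sketched in a few lines: at every sampling instant a finite-horizon LQ problem is solved from the current state, only the first move of the resulting plan is applied, and the computation is repeated at the next instant. The plant (a double integrator), weights and horizon below are invented for illustration; the constraints and terminal conditions that distinguish the book's predictive control laws are omitted from this minimal sketch.

```python
import numpy as np

# Invented double-integrator plant, weights and horizon.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.5],
              [1.0]])
Q = np.eye(2)            # state weight
R = np.array([[1.0]])    # control weight
N = 10                   # prediction horizon

def first_step_gain(A, B, Q, R, N):
    """Feedback gain of the first move of an N-step LQ plan (backward Riccati recursion)."""
    P = Q.copy()
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
    return K

x = np.array([[5.0], [0.0]])
for t in range(30):
    K = first_step_gain(A, B, Q, R, N)  # re-solved at every instant: the receding horizon
    u = -K @ x                          # only the first move of the N-step plan is applied
    x = A @ x + B @ u
print(np.linalg.norm(x))  # the state has been regulated toward the origin
```

For a time-invariant plant the gain is the same at every instant, so the loop amounts to a constant feedback; the receding-horizon mechanism earns its keep when the model, the constraints, or the reference change online, which is exactly the adaptive setting of Part III.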

    Both the LQ and the predictive control methodologies assume that a model of the physical system is available to the designer. When, as often happens in practice, this is not the case, we can combine online system identification and control design methods to build up adaptive control systems. Our study of such systems basically covers two classes of adaptive control systems: single-step-ahead self-tuning controllers and multistep predictive adaptive controllers. The controllers in the first class are based on simple control laws, and the price paid for this simplicity is the stability problems they exhibit with nonminimum-phase and open-loop unstable plants. They also require that the plant I/O delay be known. The multistep predictive adaptive controllers have a substantially greater computational load and require a greater effort for convergence analysis, but overcome the above-mentioned limitations.
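As a minimal illustration of the online identification ingredient, here is the standard recursive least-squares (RLS) update run on an invented stable second-order ARX plant; all numerical values are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = np.array([1.2, -0.5, 0.8])   # [a1, a2, b]: y(t) = a1 y(t-1) + a2 y(t-2) + b u(t-1) + e(t)
theta = np.zeros(3)                       # parameter estimate
P = 1000.0 * np.eye(3)                    # "covariance" matrix of the estimator
y1 = y2 = u1 = 0.0                        # y(t-1), y(t-2), u(t-1)
for t in range(500):
    phi = np.array([y1, y2, u1])                          # regressor
    y = theta_true @ phi + 0.1 * rng.standard_normal()    # measured plant output
    k = P @ phi / (1.0 + phi @ P @ phi)                   # RLS gain
    theta = theta + k * (y - phi @ theta)                 # correct by the prediction error
    P = P - np.outer(k, phi) @ P                          # covariance update
    y2, y1, u1 = y1, y, rng.standard_normal()             # shift data; white-noise test input
print(theta)  # close to theta_true
```

An adaptive controller couples such an identifier with a control design rule evaluated on the current estimates, as developed in detail in Part III.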

    1.2 About This Book

    LQ control is by far the most thoroughly studied analytic approach to linear feedback control system design. In particular, several excellent textbooks exist on the topic. Considering also the already available books on adaptive control at various levels of rigour and generality, the question arises whether this new addition to the preexisting literature can be justified. We answer this question by listing some of the distinguishing features of this book.

    The Dynamic Programming vs. the Polynomial Equation Approach

    LQ control, either in a deterministic or in a stochastic setting, is customarily approached via Dynamic Programming by using a state-space, or internal, model of the physical system. This is a time-domain approach and yields the desired solution in terms of a Riccati difference equation. For time-invariant plants, the so-called steady-state LQ control law is obtained by letting the control horizon become infinitely long, and hence can be computed by solving an algebraic Riccati equation. This steady-state solution can also be obtained in an alternative way, the so-called Polynomial Equation approach. This derives from a quite different way of looking at the LQ control problem. It uses transfer matrices, or external models, of the physical system, and turns out to be more akin to a frequency-domain methodology. It leads us to solve the steady-state LQ control problem by spectral factorization and a couple of linear Diophantine equations.

    In this book the Dynamic Programming and the Polynomial Equation approaches are thoroughly studied and compared, our experience being that mastering both approaches can be highly beneficial for the student or the practitioner. The two approaches in fact play a synergetic role, providing us with two alternative ways of looking at the same problem and different sets of solving equations. As a consequence, our insight is enhanced and our ability to apply the theory strengthened.
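The Riccati side of this comparison can be illustrated directly: iterating the Riccati difference equation backward until it reaches a fixed point produces the steady-state solution that the algebraic Riccati equation characterizes in one shot. The plant and weights below are invented for illustration.

```python
import numpy as np

A = np.array([[1.1, 0.3],
              [0.0, 0.9]])   # open-loop unstable but controllable from B
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[0.5]])

P = Q.copy()
for _ in range(500):                                      # backward Riccati difference equation
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)     # stage-optimal gain
    P = Q + A.T @ P @ A - A.T @ P @ B @ K                 # Riccati update

K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)         # steady-state gain
residual = Q + A.T @ P @ A - A.T @ P @ B @ K - P          # ARE residual at the fixed point
print(np.max(np.abs(residual)))                           # essentially zero
print(np.abs(np.linalg.eigvals(A - B @ K)))               # closed-loop eigenvalues inside the unit circle
```

The polynomial-equation route arrives at the same steady-state control law through a spectral factorization and linear Diophantine equations, with no Riccati matrix appearing at all.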


    Predictive vs. LQ Control

    Multistep, or long-range, predictive control is an important topic for process control applications, some of the reasons having been outlined above. In this book the emphasis is, however, on design techniques that are applicable when the plant is only partially known. Further, we study predictive control within the framework of LQ control theory. In fact, a predictive control law referred to as SIORHC (Stabilizing I/O Receding Horizon Control) is singled out by addressing a dynamic LQ control problem in the presence of suitable constraints on its terminal regulation/tracking error and on control signals. SIORHC has the peculiarity of possessing a guaranteed stabilizing property, provided that its prediction horizon length equals or exceeds the order of the plant. This finite-horizon stabilizing property makes SIORHC particularly well-suited for adaptive control, wherein stabilization must be ensured irrespective of the actual value of the estimated plant parameters. SIORHC, as its prediction horizon becomes larger, tightly approximates steady-state LQ control, when the latter optimally exploits the knowledge of the future reference profile. However, thanks to the finite length of its prediction horizon, SIORHC can be computed more easily than steady-state LQ control, since it does not require the solution of an algebraic Riccati equation or a spectral factorization problem.

    Single-Step-Ahead vs. Multistep Predictive Adaptive Control

    One entire chapter is devoted to single-step-ahead self-tuning control. This is mainly done to introduce the subject of adaptive control and the tools for analysing more general schemes, our prevalent interest being in adaptive (multistep) predictive control systems because of their wider application potential. However, in going from single-step-ahead to more complicated control design procedures such as pole-assignment, LQ and some predictive control laws, a difficulty arises in that it may not always be feasible to evaluate the control law. A typical situation is when the estimated model has unstable pole-zero cancellations, i.e. the estimated model becomes unstabilizable. We refer to the above difficulty as the singularity problem. This has been one of the stumbling blocks for the construction of stable adaptive predictive control systems. The standard way to circumvent the singularity problem is to assume that the true plant parameter vector belongs to an a priori known convex set whose elements correspond to reachable plant models. Next, the recursive identification algorithm is modified so as to guarantee that the estimates belong to the set. E.g., this can be achieved by embodying a projection facility in the identification algorithm. The alleged prior knowledge of such a convex set is instrumental to the development of (locally) convergent algorithms, but it does not appear to be justifiable in many instances. In contrast with the above approach, in order to address convergence of adaptive multistep predictive control systems, an alternative technique is here adopted and analyzed. It consists of a self-excitation mechanism by which a dither signal is superimposed on the plant input whenever the estimated plant model is close to becoming unreachable. Under quite general conditions, the self-excitation mechanism turns off after a finite time and global convergence of the adaptive system is ensured.


    Implicit Adaptive Predictive Control

    One classic result in stochastic adaptive control is that an autoregressive moving-average plant under Minimum-Variance control can be described in terms of a linear-regression model. This allows one to construct simple implicit self-tuning controllers based on the Minimum-Variance control law. In the book it is shown that a similar property also holds when multistep predictive control laws are used. Hence, implicit adaptive predictive controllers can be constructed, wherein simple linear-regression identifiers are used. The fact that no global convergence proofs are generally available, or even feasible, does not deter us from considering implicit adaptive predictive control, in view of its excellent local self-optimizing properties in the presence of neglected dynamics, and hence its possible use for autotuning reduced-complexity controllers of complex plants.
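The flavour of the implicit linear-regression idea can be conveyed with a toy simulation of the simplest case: for an invented first-order plant y(t+1) = a y(t) + b u(t) + e(t+1), the one-step prediction model is already linear in the regressor [y(t), u(t)], so an RLS identifier can be coupled with the certainty-equivalence Minimum-Variance law u(t) = -(a_hat/b_hat) y(t). The small dither added below is our own addition to keep the closed-loop data informative, and all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = 0.9, 0.5                 # true plant parameters, unknown to the controller
theta = np.array([0.5, 1.0])    # initial guesses for [a, b]
P = 100.0 * np.eye(2)           # RLS covariance
y, u = 0.0, 0.0
for t in range(5000):
    phi = np.array([y, u])                                # implicit-model regressor
    y_next = a * y + b * u + 0.1 * rng.standard_normal()  # plant response
    k = P @ phi / (1.0 + phi @ P @ phi)                   # RLS update
    theta = theta + k * (y_next - phi @ theta)
    P = P - np.outer(k, phi) @ P
    y = y_next
    a_hat, b_hat = theta
    u = -a_hat / b_hat * y + 0.1 * rng.standard_normal()  # MV law plus dither
    u = float(np.clip(u, -10.0, 10.0))                    # crude guard against early bad estimates
print(theta)  # approaches [a, b]
```

The identifier here estimates the implicit prediction model rather than a full plant description; the multistep counterpart of this construction is developed in Chapter 9.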

    1.3 Part and Chapter Outline

    In this section, we briefly describe the breakdown of the book into parts and chapters. There are three parts; they are listed below with some comments.

    Part I Basic deterministic theory of LQ and predictive control This part consists of Chapters 2-5. The purpose of Chapter 2 is to establish the main facts on the deterministic LQ regulation problem. Dynamic Programming is discussed and used to get the Riccati-based solution of the LQ regulator. Next, time-invariant LQ regulation is considered, and existence conditions and properties of the steady-state LQ regulator are established. Finally, two simple versions of LQ regulation, based on a control horizon comprising a single step, are analyzed and their limitations are pointed out. Chapter 3 introduces the d-representation of a sequence, matrix-fraction descriptions of system transfer matrices and system polynomial representations. Using these tools, a study follows on the characterization of stability of feedback linear systems and on the so-called YJBK parameterization of all stabilizing compensators. Finally, the asymptotic tracking problem is considered and formulated as a stability problem of a feedback system. In Chapter 4, the polynomial approach to LQ regulation is addressed, the related solution found in terms of a spectral factorization problem and a couple of linear Diophantine equations, and its relationship with the Riccati-based solution established. Some remarks follow on robust stability of LQ regulated systems. Chapter 5 introduces receding horizon control. Zero terminal state regulation is first considered so as to develop dynamic receding horizon regulation with a guaranteed stabilizing property. Within the same framework, generalized predictive regulation is treated. Next, receding horizon iterations are introduced, our interest in them being motivated by their possible use in adaptive multistep predictive control. Finally, the tracking problem is discussed. In particular, predictive control is introduced as a 2-DOF receding horizon control methodology whereby the feedforward action is made dependent on the future reference evolution which, in turn, can be selected online so as to avoid saturation phenomena.

Part II: State estimation, system identification, LQ and predictive stochastic control. This part consists of Chapters 6 and 7. The purpose of Chapter 6 is to lay down the main results on recursive state estimation and system identification. The Kalman filter and various linear and pseudo-linear recursive system


parameter estimators are considered and related to algorithms derived systematically via the Prediction Error Method. Finally, convergence properties of recursive identification algorithms are studied. The emphasis here is on proving convergence to the true system parameter vector under some strong conditions which typically cannot be guaranteed in adaptive control. Chapter 7 extends LQ and predictive receding-horizon control to a stochastic setting. To this end, Stochastic Dynamic Programming is used to yield the optimal LQG (Linear Quadratic Gaussian) control solution via the so-called Certainty Equivalence Principle. Next, Minimum-Variance control and steady-state LQ stochastic control for CARMA (Controlled AutoRegressive Moving Average) plants are tackled via the stochastic variant of the polynomial equation approach introduced in Chapter 4. Finally, 2-DOF tracking and servo problems are considered, and the stabilizing predictive control law introduced in Chapter 5 is extended to a stochastic setting.

Part III: Adaptive control. Chapters 8 and 9 combine recursive system identification algorithms with LQ and predictive control methods to build adaptive control systems for unknown linear plants. Chapter 8 describes the two basic groups of adaptive controllers, viz. model-reference and self-tuning controllers. Next, we point out the difficulties encountered in formulating adaptive control as an optimal stochastic control problem and, in contrast, the possibility of adopting a simple suboptimal procedure by enforcing the Certainty Equivalence Principle. We discuss the deterministic properties of the RLS (Recursive Least Squares) identification algorithm that hold without persistency of excitation and are, hence, applicable in the analysis of adaptive systems. These properties are used to construct a self-tuning control system, based on a simple one-step-ahead control law, for which global convergence can be established. Global convergence is also shown to hold true when a constant-trace RLS identifier with data normalization is used, the finite memory-length of this identifier being important for time-varying plants. Self-tuning Minimum-Variance control is discussed by pointing out that implicit modelling of CARMA plants under Minimum-Variance control can be exploited so as to construct algorithms whose global convergence can be proved via the stochastic Lyapunov equation method. Further, it is shown that Generalized Minimum-Variance control is equivalent to Minimum-Variance control of a modified plant and, hence, globally convergent self-tuning algorithms based on the former control law can be developed by exploiting such an equivalence. Chapter 8 ends by discussing how to robustify self-tuning single-step-ahead controllers so as to deal with plants with bounded disturbances and neglected dynamics. Chapter 9 studies various adaptive multistep predictive control algorithms, the main interest being in extending the potential applications beyond the restrictions inherent in single-step-ahead controllers. We start by considering an indirect adaptive version of the stabilizing predictive control (SIORHC) algorithm introduced in Chapter 5. We show that, in order to avoid a singularity problem in the controller parameter evaluation, the notion of a self-excitation mechanism can be used. The resulting control philosophy is of the dual control type in that the self-excitation mechanism switches on an input dither whenever the estimated plant parameter vector becomes close to singularity. The dither intensity must be suitably chosen, by taking into account the interaction between the dither and the feedforward signal, so as to ensure global convergence of the adaptive system. We next discuss how the indirect adaptive predictive control algorithm can be robustified in order to deal with plant bounded disturbances and neglected dynamics. The second part of Chapter 9 deals


with implicit adaptive predictive control. It first shows how the implicit modelling property of CARMA plants, previously derived under Minimum-Variance control, can be extended to more complex control laws, such as steady-state LQ stochastic control and variants thereof. Next, the possible use of implicit prediction models in adaptive predictive control is discussed and some examples of such controllers are given. One such controller, MUSMAR, which possesses attractive local self-optimizing properties, is studied via the Ordinary Differential Equation (ODE) approach to the analysis of recursive stochastic algorithms. Two extensions of MUSMAR are finally studied. These extensions aim, respectively, at recovering exactly the steady-state LQ stochastic regulation law as an equilibrium point of the algorithm, and at imposing a mean-square input constraint on the controlled system.

Appendices. The appendices collect results from linear system theory, polynomial matrix theory, linear Diophantine equations, probability theory, and stochastic processes.

  • PART I

BASIC DETERMINISTIC THEORY OF LQ AND PREDICTIVE CONTROL


  • CHAPTER 2

DETERMINISTIC LQ REGULATION I: RICCATI-BASED SOLUTION

The purpose of this chapter is to establish the main facts on the deterministic Linear Quadratic (LQ) regulator. After formulating the problem in Sect. 1, Dynamic Programming is discussed in Sect. 2 and used in Sect. 3 to obtain the Riccati-based solution of the LQ regulator. Sect. 4 discusses time-invariant LQ regulation, along with the existence and properties of the steady-state regulator obtained asymptotically by letting the regulation horizon become infinitely large. Sect. 5 considers iterative methods for computing the steady-state regulator. In Sects. 6 and 7, two simple versions of LQ regulation, viz. Cheap Control and Single Step Control, are presented and analysed.

    2.1 The Deterministic LQ Regulation Problem

The plant to be regulated consists of a discrete-time linear dynamic system represented as follows

$$x(k+1) = \Phi(k)x(k) + G(k)u(k) \qquad (2.1\text{-}1)$$

Here: $k \in \mathbb{Z} := \{\cdots, -1, 0, 1, \cdots\}$; $x(k) \in \mathbb{R}^n$ denotes the plant state at time $k$; $u(k) \in \mathbb{R}^m$ the plant input or control at time $k$; and $\Phi(k)$ and $G(k)$ are matrices of compatible dimensions.

Assuming that the plant state at a given time $t_0$ is $x(t_0)$, the interest is to find a control sequence over the regulation horizon $[t_0, T]$, $t_0 \le T - 1$,

$$u_{[t_0,T)} := \{u(k)\}_{k=t_0}^{T-1} \qquad (2.1\text{-}2)$$

which minimizes the quadratic performance index or cost functional

$$J\big(t_0, x(t_0), u_{[t_0,T)}\big) := \|x(T)\|^2_{\Psi_x(T)} + \sum_{k=t_0}^{T-1} \Big[ \|x(k)\|^2_{\Psi_x(k)} + 2u'(k)M(k)x(k) + \|u(k)\|^2_{\Psi_u(k)} \Big] \qquad (2.1\text{-}3)$$

  • 10 Deterministic LQ Regulation I RiccatiBased Solution

where $\|x\|^2_\Psi := x'\Psi x$, and the prime denotes matrix transposition. W.l.o.g., it will be assumed that $\Psi_x(k)$, $\Psi_u(k)$ and $\Psi_x(T)$ are symmetric matrices.

Problem 2.1-1 Consider the quadratic form $x'\Psi x$, $x \in \mathbb{R}^n$, with $\Psi$ any $n \times n$ matrix with real entries. Let $\Psi_s = \Psi_s' := (\Psi + \Psi')/2$. Show that $x'\Psi x = x'\Psi_s x$. [Hint: Use the fact that $\Psi = \Psi_s + \Psi_a$ if $\Psi_a := (\Psi - \Psi')/2$.]

$J(t_0, x(t_0), u_{[t_0,T)})$ quantifies the regulation performance of the plant (1) from the initial event $(t_0, x(t_0))$ when its input is specified by $u_{[t_0,T)}$. It is assumed that any nonzero input $u(k) \ne 0_m$ is costly. This condition amounts to assuming that the symmetric matrix $\Psi_u(k)$ is positive definite

$$\Psi_u(k) = \Psi_u'(k) > 0 \qquad (2.1\text{-}4)$$

It is also assumed that the instantaneous loss at time $k$, viz. the term within brackets in (3), is nonnegative

$$\ell(k, x(k), u(k)) := \|x(k)\|^2_{\Psi_x(k)} + 2u'(k)M(k)x(k) + \|u(k)\|^2_{\Psi_u(k)} \ge 0 \qquad (2.1\text{-}5)$$

Since by (4) $\Psi_u(k)$ is nonsingular, (5) is equivalent to the following nonnegative-definiteness condition

$$\Psi_x(k) - M'(k)\Psi_u^{-1}(k)M(k) \ge 0 \qquad (2.1\text{-}6)$$

Problem 2.1-2 Consider the quadratic form

$$\ell(x, u) := \|x\|^2_{\Psi_x} + 2u'Mx + \|u\|^2_{\Psi_u}$$

with $x \in \mathbb{R}^n$, $u \in \mathbb{R}^m$, $\Psi_u = \Psi_u' > 0$, and $\Psi_x$ and $M$ matrices of compatible dimensions. Show that $\ell(x, u) \ge 0$ for every $(x, u) \in \mathbb{R}^n \times \mathbb{R}^m$ if and only if $\Psi_x - M'\Psi_u^{-1}M \ge 0$. [Hint: Find the vector $u^0(x) \in \mathbb{R}^m$ which minimizes $\ell(x, u)$ for any given $x$, viz. $\ell(x, u^0(x)) \le \ell(x, u)$, $\forall u \in \mathbb{R}^m$.]

The terminal cost $\|x(T)\|^2_{\Psi_x(T)}$ is finally assumed nonnegative

$$\Psi_x(T) = \Psi_x'(T) \ge 0 \qquad (2.1\text{-}7)$$

Let us consider the following as a formal statement of the deterministic LQ regulation problem.

Deterministic LQ regulator (LQR) problem. Consider the linear plant (1). Define the quadratic performance index (3) with $\Psi_x(k)$, $\Psi_u(k)$, $\Psi_x(T)$ symmetric matrices satisfying (4), (6) and (7). Find an optimal input $u^0_{[t_0,T)}$ to the plant (1), initialized from the event $(t_0, x(t_0))$, minimizing the performance index (3).

The general LQR problem can be transformed into an equivalent problem with no cross-product terms in its instantaneous loss. In order to see this, set

$$u(k) = \bar{u}(k) + K(k)x(k) \qquad (2.1\text{-}8)$$

This means that the plant input $u(k)$ at time $k$ is the sum of $K(k)x(k)$, a state-feedback component, and a vector $\bar{u}(k)$.

  • Sect. 2.2 Dynamic Programming 11

Problem 2.1-3 Consider the instantaneous loss (5). Rewrite it as

$$\bar{\ell}(k, x(k), \bar{u}(k)) := \ell(k, x(k), \bar{u}(k) + K(k)x(k)).$$

Show that the cross-product terms in $\bar{\ell}$ vanish provided that

$$K(k) = -\Psi_u^{-1}(k)M(k) \qquad (2.1\text{-}9)$$

Show also that under the choice (9)

$$\bar{\ell}(k, x(k), \bar{u}(k)) = \|x(k)\|^2_{\bar{\Psi}_x(k)} + \|\bar{u}(k)\|^2_{\Psi_u(k)} \qquad (2.1\text{-}10)$$

where $\bar{\Psi}_x(k)$ equals the L.H.S. of (6)

$$\bar{\Psi}_x(k) := \Psi_x(k) - M'(k)\Psi_u^{-1}(k)M(k) \qquad (2.1\text{-}11)$$

Taking into account the solution of Problem 3, we see that the general LQR problem is equivalent to the following. Given the plant

$$x(k+1) = \big[\Phi(k) - G(k)\Psi_u^{-1}(k)M(k)\big]x(k) + G(k)\bar{u}(k), \qquad (2.1\text{-}12)$$

find an optimal input $\bar{u}^0_{[t_0,T)}$ minimizing the performance index

$$J\big(t_0, x(t_0), \bar{u}_{[t_0,T)}\big) = \sum_{k=t_0}^{T-1} \bar{\ell}(k, x(k), \bar{u}(k)) + \|x(T)\|^2_{\Psi_x(T)} \qquad (2.1\text{-}13)$$

where the instantaneous loss is given by (10).
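As a numerical sanity check, the change of input variable (8)-(11) can be verified directly: with $K = -\Psi_u^{-1}M$, the loss evaluated at $u = \bar{u} + Kx$ collapses to $\|x\|^2_{\bar{\Psi}_x} + \|\bar{u}\|^2_{\Psi_u}$. The following sketch uses randomly generated weights (hypothetical data, for illustration only) to check this identity.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 3, 2
# Random problem data (hypothetical values, for illustration only).
Psi_u = np.eye(m) + 0.1 * np.ones((m, m))            # Psi_u = Psi_u' > 0
M = rng.standard_normal((m, n))
Psi_x = M.T @ np.linalg.solve(Psi_u, M) + np.eye(n)  # guarantees condition (2.1-6)

K = -np.linalg.solve(Psi_u, M)                       # K = -Psi_u^{-1} M, cf. (2.1-9)
Psi_x_bar = Psi_x - M.T @ np.linalg.solve(Psi_u, M)  # cf. (2.1-11)

def loss(x, u):
    """Instantaneous loss (2.1-5): x'Psi_x x + 2 u'M x + u'Psi_u u."""
    return x @ Psi_x @ x + 2 * u @ M @ x + u @ Psi_u @ u

# For arbitrary x and u_bar, the loss written in the (x, u_bar) variables
# has no cross term: it equals x'Psi_x_bar x + u_bar'Psi_u u_bar, cf. (2.1-10).
for _ in range(100):
    x = rng.standard_normal(n)
    u_bar = rng.standard_normal(m)
    u = u_bar + K @ x                                # (2.1-8)
    assert abs(loss(x, u) - (x @ Psi_x_bar @ x + u_bar @ Psi_u @ u_bar)) < 1e-9
```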

Problem 2.1-4 (An LQ Tracking Problem) Consider the plant (1) along with the $n$-dimensional linear system

$$x_w(k+1) = \Phi_w(k)x_w(k)$$

with $x_w(t_0) \in \mathbb{R}^n$ given. Let

$$\tilde{x}(k) := x(k) - x_w(k)$$

and

$$J\big(t_0, \tilde{x}(t_0), u_{[t_0,T)}\big) := \sum_{k=t_0}^{T-1} \ell(k, \tilde{x}(k), u(k)) + \|\tilde{x}(T)\|^2_{\Psi_x(T)}$$

$$\ell(k, \tilde{x}(k), u(k)) := \|\tilde{x}(k)\|^2_{\Psi_x(k)} + 2u'(k)M(k)\tilde{x}(k) + \|u(k)\|^2_{\Psi_u(k)}$$

Show that the problem of finding an optimal input $u^0_{[t_0,T)}$ for the plant (1) which minimizes the above performance index can be cast into an equivalent LQR problem. [Hint: Consider the plant with extended state $\xi(k) := [\, x'(k) \;\; x_w'(k) \,]'$.]

2.2 Dynamic Programming

A solution method which exploits in an essential way the dynamic nature of the LQR problem is Bellman's technique of Dynamic Programming. Dynamic Programming is discussed here only to the extent necessary to solve the LQR problem. In doing this, we consider a larger class of optimal regulation problems so as to better focus our attention on the essential features of Dynamic Programming.

Let the plant be described by a possibly nonlinear state-space representation

$$x(k+1) = f(k, x(k), u(k)) \qquad (2.2\text{-}1)$$

As in (1-1), $x(k) \in \mathbb{R}^n$ and $u(k) \in \mathbb{R}^m$. The function $f$, referred to as the local state-transition function, specifies the rule according to which the event $(k, x(k))$ is transformed, by a given input $u(k)$ at time $k$, into the next plant state $x(k+1)$


at time $k+1$. By iterating (1), it is possible to define the global state-transition function

$$x(j) = \varphi\big(j, k, x(k), u_{[k,j)}\big), \qquad j \ge k \qquad (2.2\text{-}2)$$

The function $\varphi$, for a given input sequence $u_{[k,j)}$, $j \ge k$, specifies the rule according to which the initial event $(k, x(k))$ is transformed into the final event $(j, x(j))$. E.g.,

$$x(k+2) = f(k+1, x(k+1), u(k+1)) = f(k+1, f(k, x(k), u(k)), u(k+1)) =: \varphi\big(k+2, k, x(k), u_{[k,k+2)}\big)$$

For $j = k$, $u_{[k,j)}$ is empty, and, consequently, the system is left in the event $(k, x(k))$. This amounts to assuming that $\varphi$ satisfies the following consistency condition

$$\varphi\big(k, k, x(k), u_{[k,k)}\big) = x(k) \qquad (2.2\text{-}3)$$

Problem 2.2-1 Show that for the linear dynamic system (1-1) the global state-transition function equals

$$\varphi\big(j, k, x(k), u_{[k,j)}\big) = \Phi(j, k)x(k) + \sum_{i=k}^{j-1} \Phi(j, i+1)G(i)u(i)$$

where

$$\Phi(j, k) := \begin{cases} I_n & j = k \\ \Phi(j-1)\cdots\Phi(k) & j > k \end{cases} \qquad (2.2\text{-}4)$$

is the state-transition matrix of the linear system.

Along with the plant (1) initialized from the event $(t_0, x(t_0))$, we consider the following possibly nonquadratic performance index

$$J\big(t_0, x(t_0), u_{[t_0,T)}\big) = \sum_{k=t_0}^{T-1} \ell(k, x(k), u(k)) + \ell_T(x(T)) \qquad (2.2\text{-}5)$$

Here again $\ell(k, x(k), u(k))$ stands for a nonnegative instantaneous loss incurred at time $k$, $\ell_T(x(T))$ for a nonnegative loss due to the terminal state $x(T)$, and $[t_0, T]$ for the regulation horizon. The problem is to find an optimal control $u^0_{[t_0,T)}$ for the plant (1), initialized from $(t_0, x(t_0))$, minimizing (5).

Hereafter, conditions on (1) and (5) will be implicitly assumed in order that each step of the adopted optimization procedure makes sense. For $t \in [t_0, T]$, consider the so-called Bellman function

$$V(t, x(t)) := \min_{u_{[t,T)}} J\big(t, x(t), u_{[t,T)}\big) = \min_{u_{[t,t_1)}} \Big\{ \min_{u_{[t_1,T)}} \Big[ \sum_{k=t}^{t_1-1} \ell(k, x(k), u(k)) + J\big(t_1, \varphi(t_1, t, x(t), u_{[t,t_1)}), u_{[t_1,T)}\big) \Big] \Big\} \qquad (2.2\text{-}6)$$

The second equality follows since $u_{[t,T)} = u_{[t,t_1)} \frown u_{[t_1,T)}$ for $t_1 \in [t, T)$, $\frown$ denoting concatenation. Eq. (6) can be rewritten as follows

$$V(t, x(t)) = \min_{u_{[t,t_1)}} \Big\{ \sum_{k=t}^{t_1-1} \ell(k, x(k), u(k)) + \min_{u_{[t_1,T)}} J\big(t_1, \varphi(t_1, t, x(t), u_{[t,t_1)}), u_{[t_1,T)}\big) \Big\} \qquad (2.2\text{-}7)$$
$$= \min_{u_{[t,t_1)}} \Big\{ \sum_{k=t}^{t_1-1} \ell(k, x(k), u(k)) + V\big(t_1, \varphi(t_1, t, x(t), u_{[t,t_1)})\big) \Big\}$$

Suppose now that $u^0_{[t,T)}$ is an optimal input over the horizon $[t, T)$ for the initial event $(t, x(t))$, viz.

$$V(t, x(t)) = J\big(t, x(t), u^0_{[t,T)}\big) \le J\big(t, x(t), u_{[t,T)}\big)$$

for all control sequences $u_{[t,T)}$. Then, from (7) it follows that $u^0_{[t_1,T)}$, the restriction of $u^0_{[t,T)}$ to $[t_1, T)$, is again an optimal input over the horizon $[t_1, T)$ for the initial event $(t_1, x(t_1))$, $x(t_1) := \varphi(t_1, t, x(t), u^0_{[t,t_1)})$, viz.

$$V(t_1, x(t_1)) = J\big(t_1, x(t_1), u^0_{[t_1,T)}\big) \le J\big(t_1, x(t_1), u_{[t_1,T)}\big)$$

The above statement is a way of expressing Bellman's Principle of Optimality. In words,

the Principle of Optimality states that an optimal input sequence $u^0_{[t,T)}$ is such that, given an event $(t_1, x(t_1))$ along the corresponding optimal trajectory, $x(t_1) = \varphi(t_1, t_0, x(t_0), u^0_{[t_0,t_1)})$, the subsequent input sequence $u^0_{[t_1,T)}$ is again optimal for the cost-to-go over the horizon $[t_1, T]$.

For $t_1 = t + 1$, (7) yields the Bellman equation

$$V(t, x(t)) = \min_{u(t)} \big\{ \ell(t, x(t), u(t)) + V(t+1, f(t, x(t), u(t))) \big\} \qquad (2.2\text{-}8)$$

with the terminal event condition

$$V(T, x(T)) = \ell_T(x(T)) \qquad (2.2\text{-}9)$$

The functional equation (8) can be used as follows. Eq. (8) for $t = T - 1$ gives

$$V(T-1, x(T-1)) = \min_{u(T-1)} \big\{ \ell(T-1, x(T-1), u(T-1)) + \ell_T(x(T)) \big\}$$
$$x(T) = f(T-1, x(T-1), u(T-1)) \qquad (2.2\text{-}10)$$

If this can be solved w.r.t. $u(T-1)$ for any state $x(T-1)$, one finds an optimal input at time $T-1$ in a state-feedback form

$$u^0(T-1) = u^0(T-1, x(T-1)) \qquad (2.2\text{-}11)$$

and hence determines $V(T-1, x(T-1))$. By iterating the above procedure backward, provided that at each step a solution can be found, one can determine an optimal control law in a state-feedback form

$$u^0(k) = u^0(k, x(k)), \qquad k \in [t_0, T) \qquad (2.2\text{-}12)$$

and $V(k, x(k))$.

Before proceeding any further, let us consolidate the discussion so far. We have used the Principle of Optimality of Dynamic Programming to obtain the Bellman equation (8). This suggests the procedure outlined above for obtaining an optimal control. It is remarkable that, if a solution can be obtained, it is in a state-feedback form. The next theorem shows that, provided the procedure yields a solution, it solves the optimal regulation problem at hand.
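The backward use of the Bellman equation (8)-(9) is easy to mechanize when the state and input sets are finite. The following sketch (a toy time-invariant plant with hypothetical, randomly generated data) runs the backward recursion and confirms, by exhaustive enumeration of all input sequences, that $V(t_0, \cdot)$ is indeed the minimal cost.

```python
import itertools
import numpy as np

# A toy finite plant: 4 states, 2 inputs, horizon N = 5.  All data are
# hypothetical; the point is the backward recursion (2.2-8)-(2.2-9).
S, U, N = range(4), range(2), 5
rng = np.random.default_rng(1)
f = rng.integers(0, 4, size=(4, 2))        # f[x, u] = next state
ell = rng.random((4, 2))                   # instantaneous loss l(x, u)
ell_T = rng.random(4)                      # terminal loss l_T(x)

# Backward Bellman recursion: V[t][x] = min_u { l(x,u) + V[t+1][f(x,u)] }.
V = [None] * N + [ell_T.copy()]
policy = [None] * N                        # u0(t, x), the state-feedback law
for t in reversed(range(N)):
    V[t] = np.array([min(ell[x, u] + V[t + 1][f[x, u]] for u in U) for x in S])
    policy[t] = [min(U, key=lambda u, x=x: ell[x, u] + V[t + 1][f[x, u]]) for x in S]

def cost(x, seq):
    """Cost (2.2-5) of an input sequence applied from initial state x."""
    c = 0.0
    for u in seq:
        c += ell[x, u]
        x = f[x, u]
    return c + ell_T[x]

# Brute force over all input sequences confirms V[0] is the optimal cost.
for x0 in S:
    best = min(cost(x0, seq) for seq in itertools.product(U, repeat=N))
    assert abs(V[0][x0] - best) < 1e-12
```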

Theorem 2.2-1. Suppose that $\{V(t, x)\}_{t=t_0}^{T}$ satisfies the Bellman equation (8) with terminal condition (9). Suppose that a minimum as in (8) exists and is attained at

$$u(t) = u^0(t, x)$$

viz.

$$\ell(t, x, u^0(t)) + V(t+1, f(t, x, u^0(t))) \le \ell(t, x, u) + V(t+1, f(t, x, u)), \qquad \forall u \in \mathbb{R}^m.$$

Define $x^0_{[t_0,T]}$ and $u^0_{[t_0,T)}$ recursively as follows

$$x^0(t_0) = x(t_0) \qquad (2.2\text{-}13)$$

$$\left. \begin{aligned} u^0(t) &= u^0(t, x^0(t)) \\ x^0(t+1) &= f(t, x^0(t), u^0(t)) \end{aligned} \right\} \qquad t = t_0, t_0+1, \cdots, T-1 \qquad (2.2\text{-}14)$$

Then $u^0_{[t_0,T)}$ is an optimal control sequence, and the minimum cost equals $V(t_0, x(t_0))$.

Proof. It is to be shown that, if $u^0_{[t_0,T)}$ is defined as in (14),

$$V(t_0, x(t_0)) = J\big(t_0, x(t_0), u^0_{[t_0,T)}\big) \le J\big(t_0, x(t_0), u_{[t_0,T)}\big) \qquad (2.2\text{-}15)$$

for all control sequences $u_{[t_0,T)}$. Since for $x(t) = x^0(t)$ the R.H.S. of (8) attains its minimum at $u^0(t)$, one has

$$V(t, x^0(t)) = \ell(t, x^0(t), u^0(t)) + V(t+1, x^0(t+1)) \qquad (2.2\text{-}16)$$

Hence

$$V(t_0, x(t_0)) - V(T, x^0(T)) = \sum_{t=t_0}^{T-1} \big[ V(t, x^0(t)) - V(t+1, x^0(t+1)) \big] = \sum_{t=t_0}^{T-1} \ell(t, x^0(t), u^0(t)) \qquad (2.2\text{-}17)$$

Since $V(T, x^0(T)) = \ell_T(x^0(T))$, the equality in (15) follows. Now, for every control sequence $u_{[t_0,T)}$ applied to the plant initialized from the event $(t_0, x(t_0))$, one has

$$V(t, x(t)) \le \ell(t, x(t), u(t)) + V(t+1, x(t+1)) \qquad (2.2\text{-}18)$$

if

$$x(t+1) = f(t, x(t), u(t)) = \varphi\big(t+1, t_0, x(t_0), u_{[t_0,t+1)}\big)$$

Using (18) instead of (16), one finds the next inequality in place of (17)

$$V(t_0, x(t_0)) \le \sum_{t=t_0}^{T-1} \ell(t, x(t), u(t)) + \ell_T(x(T)) = J\big(t_0, x(t_0), u_{[t_0,T)}\big). \qquad (2.2\text{-}19)$$

[Figure 2.2-1: Optimal solution of the regulation problem in a state-feedback form as given by Dynamic Programming: the regulator feeds $u(t) = u^0(t, x(t))$ back to the plant, whose state is $x(t) = \varphi(t, t_0, x(t_0), u_{[t_0,t)})$, starting from the initial event $(t_0, x(t_0))$.]

Main points of the section. The Bellman equation of Dynamic Programming, if solvable, yields, via backward iterations, the optimal regulator in a state-feedback form (Fig. 1).

2.3 Riccati-Based Solution

The Bellman equation (2-8) is now applied to solve the deterministic LQR problem of Sect. 2.1. In this case, the plant is as in (1-1) and the performance index as in (1-3), with the instantaneous loss as in (1-5). Taking into account (1-7), one sees that $V(T, x(T))$, the Bellman function at the terminal event, equals the quadratic function $x'(T)\Psi_x(T)x(T)$ with the matrix $\Psi_x(T)$ symmetric and nonnegative definite. By adopting the procedure outlined in Sect. 2.2 to compute backward $V(t, x(t))$, $t = T-1, T-2, \cdots, t_0$, the solution of the LQR problem is obtained.

Theorem 2.3-1. The solution to the deterministic LQR problem of Sect. 2.1 is given by the following linear state-feedback control law

$$u(t) = F(t)x(t), \qquad t \in [t_0, T) \qquad (2.3\text{-}1)$$

where $F(t)$ is the LQR feedback-gain matrix

$$F(t) = -\big[\Psi_u(t) + G'(t)P(t+1)G(t)\big]^{-1}\big[M(t) + G'(t)P(t+1)\Phi(t)\big] \qquad (2.3\text{-}2)$$

and $P(t)$ is the symmetric nonnegative definite matrix given by the solution of the following Riccati backward difference equation

$$P(t) = \Phi'(t)P(t+1)\Phi(t) - \big[M'(t) + \Phi'(t)P(t+1)G(t)\big]\big[\Psi_u(t) + G'(t)P(t+1)G(t)\big]^{-1}\big[M(t) + G'(t)P(t+1)\Phi(t)\big] + \Psi_x(t) \qquad (2.3\text{-}3)$$

$$= \Phi'(t)P(t+1)\Phi(t) - F'(t)\big[\Psi_u(t) + G'(t)P(t+1)G(t)\big]F(t) + \Psi_x(t) \qquad (2.3\text{-}4)$$

$$= \big[\Phi(t) + G(t)F(t)\big]'P(t+1)\big[\Phi(t) + G(t)F(t)\big] + F'(t)\Psi_u(t)F(t) + M'(t)F(t) + F'(t)M(t) + \Psi_x(t) \qquad (2.3\text{-}5)$$

with terminal condition

$$P(T) = \Psi_x(T) \qquad (2.3\text{-}6)$$

Further,

$$V(t, x(t)) = \min_{u_{[t,T)}} J\big(t, x(t), u_{[t,T)}\big) = x'(t)P(t)x(t) \qquad (2.3\text{-}7)$$

Proof (by induction). It is known that $V(T, x(T))$ is given by (7) if $P(T)$ is as in (6). Next, assume that $V(t+1, x(t+1)) = \|x(t+1)\|^2_{P(t+1)}$ with $P(t+1) = P'(t+1) \ge 0$ and $x(t+1) = \Phi(t)x(t) + G(t)u(t)$. We show that $V(t, x(t)) = \|x(t)\|^2_{P(t)}$ with $P(t)$ satisfying (3). One has

$$V(t, x(t)) = \min_{u(t)} \tilde{J}(t, x(t), u(t)) := \min_{u(t)} \big\{ \|x(t)\|^2_{\Psi_x(t)} + 2u'(t)M(t)x(t) + \|u(t)\|^2_{\Psi_u(t)} + \|\Phi(t)x(t) + G(t)u(t)\|^2_{P(t+1)} \big\} \qquad (2.3\text{-}8)$$

Let $u(t) = [u_1(t) \cdots u_m(t)]'$. Set to zero the gradient vector of $\tilde{J}$ w.r.t. $u(t)$

$$0_m = \frac{\partial \tilde{J}(t, x(t), u(t))}{2\,\partial u(t)} := \frac{1}{2}\Big[ \frac{\partial \tilde{J}}{\partial u_1(t)} \cdots \frac{\partial \tilde{J}}{\partial u_m(t)} \Big]' = \big[M(t) + G'(t)P(t+1)\Phi(t)\big]x(t) + \big[\Psi_u(t) + G'(t)P(t+1)G(t)\big]u(t) \qquad (2.3\text{-}9)$$

This yields (1) and (2). That these two equations give uniquely the minimizing input $u(t)$ follows from the invertibility of $[\Psi_u(t) + G'(t)P(t+1)G(t)]$ and the positive definiteness of the Hessian matrix

$$\frac{\partial^2 \tilde{J}(t, x(t), u(t))}{\partial^2 u(t)} = 2\big[\Psi_u(t) + G'(t)P(t+1)G(t)\big] > 0.$$

Substituting (1) and (2) into $\tilde{J}(t, x(t), u(t))$, (7) is obtained with $P(t)$ satisfying (3). Eq. (3) shows that $P(t)$ is symmetric.

To complete the induction it now remains to show that $P(t)$ is nonnegative definite. Rewrite (4) as follows

$$P(t) = \Phi'(t)P(t+1)\big[\Phi(t) + G(t)F(t)\big] + M'(t)F(t) + \Psi_x(t) \qquad (2.3\text{-}10)$$

Further, premultiply both sides of (2) by $F'(t)\big[\Psi_u(t) + G'(t)P(t+1)G(t)\big]$ to get

$$F'(t)\Psi_u(t)F(t) = -F'(t)M(t) - F'(t)G'(t)P(t+1)\big[\Phi(t) + G(t)F(t)\big] \qquad (2.3\text{-}11)$$

Subtracting (11) from (10), we find (5). Next, by virtue of (1-6),

$$F'(t)\Psi_u(t)F(t) + M'(t)F(t) + F'(t)M(t) + \Psi_x(t) \ge F'(t)\Psi_u(t)F(t) + M'(t)F(t) + F'(t)M(t) + M'(t)\Psi_u^{-1}(t)M(t) = \big[F(t) + \Psi_u^{-1}(t)M(t)\big]'\Psi_u(t)\big[F(t) + \Psi_u^{-1}(t)M(t)\big] \qquad (2.3\text{-}12)$$

From (5), (12) and the nonnegative definiteness of $P(t+1)$, $P(t)$ is seen to be lower-bounded by the sum of two nonnegative definite matrices. Hence, $P(t)$ is also nonnegative definite.

Main points of the section. For any horizon of finite length, the LQR problem is solved (Fig. 1) by a regulator consisting of a linear time-varying state-feedback gain matrix $F(t)$, computable by solving a Riccati difference equation.
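A minimal sketch of Theorem 2.3-1, assuming for brevity time-invariant data: the backward Riccati recursion (2.3-2)-(2.3-6) is iterated from $P(T) = \Psi_x(T)$, and the resulting feedback gains are checked against (2.3-7) by simulating the closed loop and comparing the accumulated cost with $x'(t_0)P(t_0)x(t_0)$. The plant data below are hypothetical, chosen only for illustration.

```python
import numpy as np

def lqr_backward(Phi, G, Psi_x, Psi_u, M, Psi_xT, N):
    """Backward Riccati recursion (2.3-2)-(2.3-6) over an N-step horizon.
    Time-invariant data is assumed here purely to keep the sketch short."""
    P = Psi_xT.copy()                       # P(T) = Psi_x(T), cf. (2.3-6)
    gains = []
    for _ in range(N):
        R = Psi_u + G.T @ P @ G
        S = M + G.T @ P @ Phi
        F = -np.linalg.solve(R, S)          # (2.3-2)
        P = Phi.T @ P @ Phi - S.T @ np.linalg.solve(R, S) + Psi_x   # (2.3-3)
        gains.append(F)
    gains.reverse()                         # gains[t] is F(t), t = 0, ..., N-1
    return gains, P                         # P = P(t0); V(t0, x) = x'Px, cf. (2.3-7)

# Example (hypothetical data): a 2-state, 1-input plant.
Phi = np.array([[1.0, 1.0], [0.0, 0.9]])
G = np.array([[0.0], [1.0]])
Psi_x, Psi_u = np.eye(2), np.array([[1.0]])
M, Psi_xT = np.zeros((1, 2)), np.eye(2)

gains, P0 = lqr_backward(Phi, G, Psi_x, Psi_u, M, Psi_xT, N=20)

def traj_cost(x, inputs):
    """Cost (1-3) along the trajectory generated by the given inputs."""
    c = 0.0
    for u in inputs:
        c += x @ Psi_x @ x + 2 * u @ M @ x + u @ Psi_u @ u
        x = Phi @ x + G @ u
    return c + x @ Psi_xT @ x

# Simulate the closed loop u(t) = F(t) x(t) and compare with x'P(t0)x.
x0 = np.array([1.0, -1.0])
u_opt, x = [], x0
for F in gains:
    u_opt.append(F @ x)
    x = Phi @ x + G @ u_opt[-1]

assert abs(traj_cost(x0, u_opt) - x0 @ P0 @ x0) < 1e-9
```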

[Figure 2.3-1: LQR solution: the linear state feedback $F(t)$ drives the plant $(\Phi(t), G(t))$ from the initial event $(t_0, x(t_0))$.]

2.4 Time-Invariant LQR

It is of interest to make a detailed study of the LQR properties in the time-invariant case. In this case, the plant (1-1) and the weights in (1-3) are time-invariant, viz. $\Phi(k) = \Phi$; $G(k) = G$; $\Psi_x(k) = \Psi_x$; $M(k) = M$; and $\Psi_u(k) = \Psi_u$, for $k = t_0, \cdots, T-1$.

By time-invariance, we have for the cost (1-3)

$$J\big(t_0, x, u_{[t_0,T)}\big) = J\big(0, x, \bar{u}_{[0,N)}\big) \qquad (2.4\text{-}1)$$

where

$$\bar{u}(\cdot) := u(\cdot + t_0) \quad \text{and} \quad N := T - t_0$$

for any $x \in \mathbb{R}^n$ and input sequence $u(\cdot)$. In (1), $u(\cdot + t_0)$ indicates the sequence $u(\cdot)$ anticipated in time by $t_0$ steps. The notation can be further simplified by rewriting (1) as $J\big(x, u_{[0,N)}\big)$, where it is understood that $x$ denotes the initial state of the plant to be regulated, and $N$ the length of the regulation horizon. The following is a restatement of the deterministic LQR problem in the time-invariant case.

LQR problem in the time-invariant case. Consider the time-invariant linear plant

$$\left. \begin{aligned} x(k+1) &= \Phi x(k) + Gu(k) \\ x(0) &= x \end{aligned} \right\} \qquad (2.4\text{-}2)$$

along with the quadratic performance index

$$J\big(x, u_{[0,N)}\big) := \sum_{k=0}^{N-1} \ell(x(k), u(k)) + \|x(N)\|^2_{\Psi_x(N)} \qquad (2.4\text{-}3)$$

$$\ell(x, u) := \|x\|^2_{\Psi_x} + 2u'Mx + \|u\|^2_{\Psi_u} \qquad (2.4\text{-}4)$$

where $\Psi_x$, $\Psi_u$, $\Psi_x(N)$ are symmetric matrices satisfying

$$\Psi_u = \Psi_u' > 0 \qquad (2.4\text{-}5)$$
$$\bar{\Psi}_x := \Psi_x - M'\Psi_u^{-1}M \ge 0 \qquad (2.4\text{-}6)$$
$$\Psi_x(N) = \Psi_x'(N) \ge 0 \qquad (2.4\text{-}7)$$

Find an optimal input $u^0(\cdot)$ to the plant (2) with initial state $x$, minimizing the performance index (3) over an $N$-step regulation horizon.

For any finite $N$, Theorem 3-1 provides, of course, the solution to the problem (2)-(7). Here, the solution depends on the matrix sequence $\{P(t)\}_{t=0}^{N}$ which can be computed by iterating backward the matrix Riccati equation (3-3)-(3-5). Equivalently, by setting

$$P(j) := P(N - j), \qquad j = 0, 1, \cdots, N \qquad (2.4\text{-}8)$$

where on the right-hand side $P(\cdot)$ is the backward solution of Theorem 2.3-1, we can express the solution via Riccati forward iterations as in the next theorem.

Theorem 2.4-1. In the time-invariant case, the solution to the deterministic LQR problem (2)-(7) is given by the following state-feedback control

$$u(N - j) = F(j)x(N - j), \qquad j = 1, \cdots, N \qquad (2.4\text{-}9)$$

where $F(j)$ is the LQR feedback matrix

$$F(j) = -\big[\Psi_u + G'P(j-1)G\big]^{-1}\big[M + G'P(j-1)\Phi\big] \qquad (2.4\text{-}10)$$

and $P(j)$ is the symmetric nonnegative definite matrix solution of the following Riccati forward difference equation

$$P(j) = \Phi'P(j-1)\Phi - \big[M' + \Phi'P(j-1)G\big]\big[\Psi_u + G'P(j-1)G\big]^{-1}\big[M + G'P(j-1)\Phi\big] + \Psi_x \qquad (2.4\text{-}11)$$

$$= \Phi'P(j-1)\Phi - F'(j)\big[\Psi_u + G'P(j-1)G\big]F(j) + \Psi_x \qquad (2.4\text{-}12)$$

$$= \big[\Phi + GF(j)\big]'P(j-1)\big[\Phi + GF(j)\big] + F'(j)\Psi_u F(j) + M'F(j) + F'(j)M + \Psi_x \qquad (2.4\text{-}13)$$

with initial condition

$$P(0) = \Psi_x(N) \qquad (2.4\text{-}14)$$

Further, the Bellman function $V_j(x)$, relative to an initial state $x$ and a $j$-step regulation horizon with the terminal state costed by $P(0)$, equals

$$V_j(x) := \min_{u_{[N-j,N)}} J\big(x, u_{[N-j,N)}\big) = \min_{u_{[0,j)}} J\big(x, u_{[0,j)}\big) = x'P(j)x \qquad (2.4\text{-}15)$$

Our interest will now be focused on the limit properties of the LQR solution (9)-(15) as $j \to \infty$, i.e. as the length of the regulation horizon becomes infinite. The interest is motivated by the fact that, if a limit solution exists, the corresponding state feedback may yield good transient as well as steady-state regulation properties to the controlled system.

We start by studying the convergence properties of $P(j)$ as $j \to \infty$. As the next example shows, the limit of $P(j)$ for $j \to \infty$ need not exist. In particular, we shall see that some stabilizability condition on the pair $(\Phi, G)$ must be satisfied for the limit to exist.

Example 2.4-1. Consider the plant (2) with

$$\Phi = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix} \qquad G = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \qquad (2.4\text{-}16)$$

For the pair $(\Phi, G)$, 2 is an unstable unreachable eigenvalue. Hence, $(\Phi, G)$ is not stabilizable. Let $x_2(k)$ be the second component of the plant state $x(k)$. It is seen that $x_2(k)$ is unaffected by $u(\cdot)$. In fact, it satisfies the following homogeneous difference equation

$$x_2(k+1) = 2x_2(k) \qquad (2.4\text{-}17)$$

Consider the performance index (3) with $\Psi_x(N) = 0_{2\times 2}$ and instantaneous loss

$$\ell(x, u) = x_2^2 + u^2. \qquad (2.4\text{-}18)$$

Assume that the corresponding matrix sequence $\{P(j)\}_{j=0}^{\infty}$ admits a limit as $j \to \infty$

$$\lim_{j\to\infty} P(j) = P(\infty) \qquad (2.4\text{-}19)$$

Then, according to (15), there is an input sequence for which

$$\lim_{j\to\infty} J\big(x, u_{[0,j)}\big) = \lim_{j\to\infty} \sum_{k=0}^{j-1} \big[x_2^2(k) + u^2(k)\big] = x'P(\infty)x < \infty$$

However, the last inequality contradicts the fact that the performance index (3), with $\ell(x, u)$ as in (18) and $x_2(k)$ satisfying (17), diverges as $j \to \infty$ for any initial state $x \in \mathbb{R}^2$ such that $x_2 \ne 0$, irrespective of the input sequence. Therefore, by contradiction, we conclude that the limit (19) does not exist.
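The divergence claimed in the example can be observed numerically: iterating the forward Riccati recursion (11) for the pair (16) with $P(0) = 0$, the entry of $P(j)$ associated with the unreachable unstable mode grows without bound (roughly like $4^j$). A short sketch:

```python
import numpy as np

# Forward Riccati iterations (2.4-11) for the unstabilizable pair (2.4-16),
# with Psi_x = diag(0, 1), Psi_u = 1, M = 0, P(0) = 0, as in Example 2.4-1.
Phi = np.array([[1.0, 1.0], [0.0, 2.0]])
G = np.array([[1.0], [0.0]])
Psi_x = np.diag([0.0, 1.0])
Psi_u = np.array([[1.0]])

P = np.zeros((2, 2))
traces = []
for _ in range(30):
    S = G.T @ P @ Phi
    P = Phi.T @ P @ Phi - S.T @ np.linalg.solve(Psi_u + G.T @ P @ G, S) + Psi_x
    traces.append(np.trace(P))

# The (2,2) entry, tied to the unreachable mode x2(k+1) = 2 x2(k),
# accumulates sum_{k<j} 4^k: no finite limit P(infinity) exists.
assert traces[-1] > traces[-2] > 1e6
```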

The next problem applies the results of Theorem 1 to the plant (2) when $G = 0_{n\times m}$ and $\Phi$ is a stability matrix.

Problem 2.4-1 Consider the sequence $\{x(k)\}_{k=0}^{N-1}$ satisfying the difference equation

$$x(k+1) = \Phi x(k) \qquad (2.4\text{-}20)$$

Show that

$$\sum_{k=0}^{N-1} \|x(k)\|^2_{\Psi_x} = \|x(0)\|^2_{L(N)} \qquad (2.4\text{-}21)$$

where $L(N)$ is the symmetric nonnegative definite matrix obtained by the following Lyapunov difference equation

$$L(j+1) = \Phi'L(j)\Phi + \Psi_x, \qquad j = 0, 1, \cdots \qquad (2.4\text{-}22)$$

initialized from $L(0) = 0_{n\times n}$. Next, show that the following limits exist

$$\lim_{N\to\infty} \sum_{k=0}^{N-1} \|x(k)\|^2_{\Psi_x} = \|x(0)\|^2_{L(\infty)} \qquad (2.4\text{-}23)$$

$$\lim_{N\to\infty} L(N) =: L(\infty), \qquad (2.4\text{-}24)$$

provided that $\Phi$ is a stability matrix, i.e.

$$|\lambda(\Phi)| < 1 \qquad (2.4\text{-}25)$$

if $\lambda(\Phi)$ denotes any eigenvalue of $\Phi$. Finally, show that $L(\infty)$ satisfies the following (algebraic) Lyapunov equation

$$L(\infty) = \Phi'L(\infty)\Phi + \Psi_x \qquad (2.4\text{-}26)$$

That (26) has a unique solution under (25) follows from a result of matrix theory [Fra64].

The next lemma will be used in the study of the limiting properties, as $j \to \infty$, of the solution $P(j)$ of the Riccati equation (11)-(13).

Lemma 2.4-1. Let $\{P(j)\}_{j=0}^{\infty}$ be a sequence of matrices in $\mathbb{R}^{n\times n}$ such that:

i. every $P(j)$ is symmetric and nonnegative definite

$$P(j) = P'(j) \ge 0 \qquad (2.4\text{-}27)$$

ii. $\{P(j)\}_{j=0}^{\infty}$ is monotonically nondecreasing, viz.

$$i \le j \implies P(i) \le P(j) \qquad (2.4\text{-}28)$$

iii. $\{P(j)\}_{j=0}^{\infty}$ is bounded from above, viz. there exists a matrix $Q \in \mathbb{R}^{n\times n}$ such that, for every $j$,

$$P(j) \le Q \qquad (2.4\text{-}29)$$

Then, $\{P(j)\}_{j=0}^{\infty}$ admits a symmetric nonnegative definite limit $P$ as $j \to \infty$

$$\lim_{j\to\infty} P(j) = P \qquad (2.4\text{-}30)$$

Proof. For every $x \in \mathbb{R}^n$ the real-valued sequence $\{\pi(j)\} := \{x'P(j)x\}$ is, by ii., monotonically nondecreasing and, by iii., upper-bounded by $x'Qx$. Hence, there exists $\lim_{j\to\infty} \pi(j)$. Take now $x = e_i$, where $e_i$ is the $i$-th vector of the natural basis of $\mathbb{R}^n$. With such a choice, $x'P(j)x = P_{ii}(j)$, if $P_{ik}$ denotes the $(i,k)$-entry of $P$. Hence, we have established that there exist

$$\lim_{j\to\infty} P_{ii}(j) = P_{ii}, \qquad i = 1, \cdots, n$$

Next, take $x = e_i + e_k$. Under such a choice, $x'P(j)x = P_{ii}(j) + 2P_{ik}(j) + P_{kk}(j)$. This admits a limit as $j \to \infty$. Since $\lim_{j\to\infty} P_{ii}(j) = P_{ii}$ and $\lim_{j\to\infty} P_{kk}(j) = P_{kk}$, there exists

$$\lim_{j\to\infty} P_{ik}(j) = P_{ik}$$

Since we have established the existence of the limit as $j \to \infty$ of all entries of $P(j)$, and $P(j)$ satisfies (27), it follows that $P$ exists, symmetric and nonnegative definite.

We show next that the solution of the Riccati iterations (11)-(13) initialized from $P(0) = 0_{n\times n}$ enjoys the properties i.-iii. of Lemma 1, provided that the pair $(\Phi, G)$ is stabilizable.

Proposition 2.4-1. Consider the matrix sequence $\{P(j)\}_{j=0}^{\infty}$ generated by the Riccati iterations (11)-(13) initialized from $P(0) = 0_{n\times n}$. Then $\{P(j)\}_{j=0}^{\infty}$ enjoys the properties i.-iii. of Lemma 2.4-1, provided that $(\Phi, G)$ is a stabilizable pair.

Proof. Property i. of Lemma 1 is clearly satisfied. To prove property ii. we proceed as follows. Consider the LQ optimal input $u^0_{[0,j+1)}$ for the regulation horizon $[0, j+1]$ and an initial plant state $x$. Let $x^0_{[0,j+1]}$ be the corresponding state evolution. Then,

$$x'P(j+1)x = \sum_{k=0}^{j-1} \ell(x^0(k), u^0(k)) + \ell(x^0(j), u^0(j)) \ge \sum_{k=0}^{j-1} \ell(x^0(k), u^0(k)) \ge \min_{u_{[0,j)}} \sum_{k=0}^{j-1} \ell(x(k), u(k)) = x'P(j)x \qquad (2.4\text{-}31)$$

Hence, $\{P(j)\}_{j=0}^{\infty}$ is monotonically nondecreasing.

To check property iii., consider a feedback-gain matrix $\bar{F}$ which stabilizes $\Phi$, viz. $\Phi + G\bar{F}$ is a stability matrix. Let

$$\bar{u}(k) = \bar{F}\bar{x}(k), \qquad k = 0, 1, \cdots \qquad (2.4\text{-}32)$$

and correspondingly

$$\left. \begin{aligned} \bar{x}(k+1) &= (\Phi + G\bar{F})\bar{x}(k) \\ \bar{x}(0) &= x. \end{aligned} \right\} \qquad (2.4\text{-}33)$$

Recall that, by (3-12), $\Psi_x + \bar{F}'M + M'\bar{F} + \bar{F}'\Psi_u\bar{F}$ is a symmetric and nonnegative definite matrix. Then, by Problem 1, there exists a matrix $Q = Q' \ge 0$, solution of the Lyapunov equation

$$Q = (\Phi + G\bar{F})'Q(\Phi + G\bar{F}) + \Psi_x + \bar{F}'M + M'\bar{F} + \bar{F}'\Psi_u\bar{F} \qquad (2.4\text{-}34)$$

and such that

$$x'Qx = \sum_{k=0}^{\infty} \ell(\bar{x}(k), \bar{u}(k)) \ge \sum_{k=0}^{j-1} \ell(\bar{x}(k), \bar{u}(k)) \ge \min_{u_{[0,j)}} \sum_{k=0}^{j-1} \ell(x(k), u(k)) = x'P(j)x \qquad (2.4\text{-}35)$$

Hence, $\{P(j)\}_{j=0}^{\infty}$ is upper-bounded by $Q$.

Proposition 1, together with Lemma 1, enables us to establish a sufficient condition for the existence of the limit of $P(j)$ as $j \to \infty$.

Theorem 2.4-2. Consider the matrix sequence $\{P(j)\}_{j=0}^{\infty}$ generated by the Riccati iterations (11)-(13) initialized from $P(0) = 0_{n\times n}$. Then, if $(\Phi, G)$ is a stabilizable pair, there exists the limit of $P(j)$ as $j \to \infty$

$$\bar{P} := \lim_{j\to\infty} P(j) \qquad (2.4\text{-}36)$$

$\bar{P}$ is symmetric nonnegative definite and satisfies the algebraic Riccati equation

$$\bar{P} = \Phi'\bar{P}\Phi - (M' + \Phi'\bar{P}G)(\Psi_u + G'\bar{P}G)^{-1}(M + G'\bar{P}\Phi) + \Psi_x \qquad (2.4\text{-}37)$$

$$= \Phi'\bar{P}\Phi - \bar{F}'(\Psi_u + G'\bar{P}G)\bar{F} + \Psi_x \qquad (2.4\text{-}38)$$

$$= (\Phi + G\bar{F})'\bar{P}(\Phi + G\bar{F}) + \bar{F}'\Psi_u\bar{F} + M'\bar{F} + \bar{F}'M + \Psi_x \qquad (2.4\text{-}39)$$

with

$$\bar{F} = -(\Psi_u + G'\bar{P}G)^{-1}(M + G'\bar{P}\Phi) \qquad (2.4\text{-}40)$$

Under the above circumstances, the infinite-horizon or steady-state LQR, for which

$$\min_{u_{[0,\infty)}} J\big(x, u_{[0,\infty)}\big) = x'\bar{P}x, \qquad (2.4\text{-}41)$$

is given by the state-feedback control

$$u(k) = \bar{F}x(k), \qquad k = 0, 1, \cdots \qquad (2.4\text{-}42)$$

It is to be pointed out that Theorem 1 does not give any assurance on the asymptotic stability of the resulting closed-loop system

$$x(k+1) = (\Phi + G\bar{F})x(k) \qquad (2.4\text{-}43)$$

Stability has to be guaranteed in order to make the steady-state LQR applicable in practice. We now begin to study stability of the closed-loop system (43), should $P(j)$ admit a limit $\bar{P}$ for $j \to \infty$. For the sake of simplicity, this study will be carried out with reference to the Linear Quadratic Output Regulation (LQOR) problem, defined as follows.
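For a stabilizable example (a double integrator with unit weights, chosen here only for illustration), the forward Riccati iterations settle to a matrix $\bar{P}$ satisfying the algebraic Riccati equation (37). Stability of (43) is not guaranteed by the existence result itself, but it can be checked a posteriori from the eigenvalues of $\Phi + G\bar{F}$:

```python
import numpy as np

# Steady-state LQR for a stabilizable example: iterate the forward Riccati
# recursion (2.4-11) from P(0) = 0 until it settles, then verify the
# algebraic Riccati equation (2.4-37) and the closed-loop poles of (2.4-43).
Phi = np.array([[1.0, 1.0], [0.0, 1.0]])        # double integrator (example data)
G = np.array([[0.0], [1.0]])
Psi_x, Psi_u, M = np.eye(2), np.array([[1.0]]), np.zeros((1, 2))

P = np.zeros((2, 2))
for _ in range(200):
    S = M + G.T @ P @ Phi
    P = Phi.T @ P @ Phi - S.T @ np.linalg.solve(Psi_u + G.T @ P @ G, S) + Psi_x

F = -np.linalg.solve(Psi_u + G.T @ P @ G, M + G.T @ P @ Phi)    # (2.4-40)

# P solves the algebraic Riccati equation (2.4-37) ...
S = M + G.T @ P @ Phi
assert np.allclose(P, Phi.T @ P @ Phi - S.T @ np.linalg.solve(Psi_u + G.T @ P @ G, S) + Psi_x)
# ... and, for this example, Phi + G F turns out to be a stability matrix.
assert np.all(np.abs(np.linalg.eigvals(Phi + G @ F)) < 1.0)
```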

  • 22 Deterministic LQ Regulation I RiccatiBased Solution

LQOR problem in the time-invariant case   Here the plant is described by a linear time-invariant state-space representation

    x(k + 1) = Φx(k) + Gu(k),   x(0) = x
    y(k) = Hx(k)                                                        (2.4-44)

where y(k) ∈ IRᵖ is the output to be regulated at zero. A quadratic performance index as in (3) is considered, with instantaneous loss

    ℓ(x, u) := ‖y‖²_{Ψy} + ‖u‖²_{Ψu}                                    (2.4-45)

where Ψu satisfies (5) and

    Ψy = Ψy′ > 0                                                        (2.4-46)

Since, in view of (44), ‖y‖²_{Ψy} = ‖x‖²_{Ψx} whenever

    Ψx = H′ΨyH,                                                         (2.4-47)

it appears that the LQOR problem is an LQR problem with M = 0. However, we recall that, by (1-8)–(1-13), each LQR problem can be cast into an equivalent problem with no cross-product terms in the instantaneous loss. In turn, any state instantaneous loss such as ‖x‖²_{Ψx} can be equivalently rewritten as ‖y‖²_{Ψy}, with y = Hx and Ψy = Ψy′ > 0, if H and Ψy are selected as follows. Let rank Ψx = p ≤ n. Then there exist matrices H ∈ IRᵖˣⁿ and Ψy = Ψy′ > 0 such that the factorization (47) holds. Any such pair (H, Ψy) can be used for rewriting ‖x‖²_{Ψx} as ‖y‖²_{Ψy}. Therefore, we conclude that, in principle, there is no loss of generality in considering the LQOR in place of the LQR problem.

For any finite regulation horizon, the solution of the LQOR problem in the time-invariant case is given by (9)–(15) of Theorem 1, provided that M = 0 and Ψx is as in (47). An advantage of the LQOR formulation is that the limiting properties as N → ∞ of the LQOR solution can be nicely related to the system-theoretic properties of the plant Σ = (Φ, G, H) in (44).
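In practice, a pair (H, Ψy) satisfying (47) can be obtained from the eigendecomposition of Ψx, keeping the strictly positive eigenvalues and discarding the null ones. The sketch below is our own Python illustration (function and variable names are not from the text):

```python
import numpy as np

def factor_state_weight(Psi_x, tol=1e-12):
    """Factor a symmetric nonnegative definite state weight as
    Psi_x = H' Psi_y H, with H (p x n) of full row rank and
    Psi_y = Psi_y' > 0, where p = rank(Psi_x)."""
    lam, V = np.linalg.eigh(Psi_x)   # Psi_x = V diag(lam) V'
    keep = lam > tol                 # retain the strictly positive eigenvalues
    H = V[:, keep].T                 # p x n
    Psi_y = np.diag(lam[keep])       # p x p, symmetric positive definite
    return H, Psi_y

# A rank-one state weight on a two-dimensional state
Psi_x = np.array([[4.0, 2.0],
                  [2.0, 1.0]])
H, Psi_y = factor_state_weight(Psi_x)
assert np.allclose(H.T @ Psi_y @ H, Psi_x)
```

Here p = 1, so the equivalent output y = Hx is scalar; any other factorization of the positive-definite part of Ψx would serve equally well.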

Problem 2.4-2   Consider the plant (44) in a Gilbert–Kalman (GK) canonical observability decomposition

    Φ = [ Φo    0  ]     G = [ Go ]     H = [ Ho  0 ]     Ψx = [ Ψxo  0 ]
        [ Φōo   Φō ]         [ Gō ]                           [  0   0 ]          (2.4-48)

It is to be remarked that this can be assumed w.l.o.g., since any plant (44) is algebraically equivalent to (48). With reference to (10)–(15), with M = 0 and Ψx = H′ΨyH, show that, if

    P(0) = [ Po(0)  0 ]
           [   0    0 ] ,                                               (2.4-49)

then

    P(j) = [ Po(j)  0 ]
           [   0    0 ] ,                                               (2.4-50)

with

    Po(j + 1) = Φo′Po(j)Φo                                              (2.4-51)
                − Φo′Po(j)Go[Ψu + Go′Po(j)Go]⁻¹Go′Po(j)Φo + Ho′ΨyHo

and

    F(j) = [ Fo(j)  0 ]                                                 (2.4-52)

with

    Fo(j) = −[Ψu + Go′Po(j − 1)Go]⁻¹Go′Po(j − 1)Φo                      (2.4-53)

Expressing in words the conclusions of Problem 2, we can say that the solution of the LQOR problem depends solely on the observable subsystem Σo = (Φo, Go, Ho) of the plant, provided that only the observable component xo(N) of the final state x(N) = [xo′(N)  xō′(N)]′ is costed.

For the time-invariant LQOR problem, Theorem 3 below gives a necessary and sufficient condition for the existence of P̄ in (36).

Theorem 2.4-3.   Consider the time-invariant LQOR problem and the corresponding matrix sequence {P(j)}_{j=0}^∞ generated by the Riccati iterations (11)–(13), with M = Om×n, initialized from P(0) = On×n. Let Σo = (Φo, Go, Ho) be the completely observable subsystem obtained via a GK canonical observability decomposition of the plant (44) Σ = (Φ, G, H). Next, let Φor̄ be the state transition matrix of the unreachable subsystem obtained via a GK canonical reachability decomposition of Σo. Then, there exists

    P̄ = lim_{j→∞} P(j)                                                 (2.4-54)

if and only if Φor̄ is a stability matrix.

Proof   According to Problem 2, everything depends on Σo. Thus, w.l.o.g., we can assume that the plant is Σo. To say that Φor̄ is a stability matrix is equivalent to stabilizability of Σo. Then, by Theorem 1, the above condition implies (54).

We prove that the condition is necessary by contradiction. Assume that Φor̄ is not a stability matrix. Then there are observable initial states of the form xo = [xor′  xor̄′]′, with xor = 0, such that ∑_{k=0}^{j−1} ‖y(k)‖²_{Ψy} diverges as j → ∞, irrespective of the input sequence. This contradicts (54). □

The reader is warned about the right order of the GK canonical decompositions that must be used to get Φor̄ in Theorem 3.

Example 2.4-2   Consider the plant Σ = (Φ, G, H) with

    Φ = [ 1  1 ]     G = [ 1 ]     H = [ 1  0 ]
        [ 0  2 ]         [ 0 ]

Σ is seen to be completely observable. Hence, we can set Σ = Σo. Further, Σ is already in a GK reachability canonical decomposition, with Φor̄ = 2. Hence, we conclude that the limit (54) does not exist.

If we reverse the order of the GK canonical decompositions, we first get the unreachable part Σr̄ of Σ. It equals Σr̄ = (2, 0, 0), which is unobservable. Then, Σr̄o is empty (no unreachable and observable eigenvalue). Hence, we would erroneously conclude that the limit (54) exists.
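The conclusion of this example can also be observed numerically: iterating the Riccati difference equation for this plant with, say, Ψu = Ψy = 1 (our arbitrary choices) produces a cost-matrix sequence that grows without bound, so the limit (54) cannot exist. A minimal sketch:

```python
import numpy as np

Phi = np.array([[1.0, 1.0],
                [0.0, 2.0]])
G = np.array([[1.0],
              [0.0]])
H = np.array([[1.0, 0.0]])
Psi_u = np.array([[1.0]])
Psi_x = H.T @ H                      # Psi_y = 1

# Riccati iterations initialized from P(0) = 0
P = np.zeros((2, 2))
trace_hist = []
for _ in range(30):
    S = Psi_u + G.T @ P @ G
    P = Phi.T @ P @ Phi - Phi.T @ P @ G @ np.linalg.solve(S, G.T @ P @ Phi) + Psi_x
    trace_hist.append(np.trace(P))

# The observable, unreachable, unstable mode (eigenvalue 2) makes the
# finite-horizon cost -- and hence trace P(j) -- diverge geometrically.
assert trace_hist[-1] > 1e6
```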

Problem 2.4-3   Consider the LQOR problem for the plant Σ = (Φ, G, H). Assume that the matrix Φor̄, defined in Theorem 3, is a stability matrix. Then, by Theorem 3, there exists P̄ as in (54). Prove by contradiction that P̄ is positive definite if and only if the pair (Φ, H) is completely observable. [Hint: Make use of (50) and the positive definiteness of Ψy.]

Theorem 2.4-4.   Consider the time-invariant LQOR problem and the corresponding matrix sequence {P(j)}_{j=0}^∞ generated by the Riccati iterations (11)–(13), with M = Om×n, initialized from P(0) = On×n. Then, there exists

    P̄ = lim_{j→∞} P(j)

such that the corresponding feedback-gain matrix

    F̄ = −(Ψu + G′P̄G)⁻¹G′P̄Φ                                            (2.4-55)

yields a state-feedback control law u(k) = F̄x(k) which stabilizes the plant, viz. Φ + GF̄ is a stability matrix, if and only if the plant Σ = (Φ, G, H) is stabilizable and detectable.

Proof   We first show that stabilizability and detectability of Σ is a necessary condition for the existence of P̄ and the stability of the corresponding closed-loop system.

First, Φ + GF̄ stable implies stabilizability of the pair (Φ, G). Second, necessity of detectability of (Φ, H) is proved by contradiction. Assume, then, that (Φ, H) is undetectable. Referring to Problem 2, w.l.o.g. (Φ, G, H) can be considered in a GK canonical observability decomposition and, according to (52), F̄ = [F̄o  0]. Hence, the unobservable subsystem of Σ is left unchanged by the steady-state LQ regulator. This contradicts the stability of Φ + GF̄.

We next show that stabilizability and detectability of Σ is a sufficient condition. Since the pair (Φ, G) is stabilizable, by Theorem 1 there exists P̄. Further, according to Problem 2, the unobservable eigenvalues of Φ are again eigenvalues of Φ + GF̄. Since by detectability of (Φ, H) they are stable, w.l.o.g. we can complete the proof by assuming that (Φ, H) is completely observable. Suppose now that Φ + GF̄ is not a stability matrix, and show that this contradicts (54). To see this, consider that complete observability of (Φ, H) implies complete observability of (Φ + GF, [F′  H′]′) for any F of compatible dimensions. Then, if Φ + GF̄ is not a stability matrix, there exist states x(0) such that

    ∑_{k=0}^{j−1} [‖y(k)‖²_{Ψy} + ‖u(k)‖²_{Ψu}]
        = ∑_{k=0}^{j−1} x′(k) [F̄′  H′] [ Ψu  0  ] [ F̄ ] x(k)
                                        [ 0   Ψy ] [ H ]

diverges as j → ∞. This contradicts (54). □

We next show that, whenever the validity conditions of Theorem 4 are fulfilled, the Riccati iterations (11)–(13), with M = Om×n, initialized from any P(0) = P′(0) ≥ 0, yield the same limit as (54).

Lemma 2.4-2.   Consider the time-invariant LQOR problem (44)–(46) with terminal-state cost weight P(0) = P′(0) ≥ 0. Let the plant be stabilizable and detectable. Then, the corresponding matrix sequence {P(j)}_{j=0}^∞ generated by the Riccati iterations (11)–(13) with M = Om×n admits, as j → ∞, a unique limit, no matter how P(0) is chosen. Such a limit is the same as that of (54). Further,

    x′P̄x = lim_{j→∞} min_{u[0,j)} { ∑_{k=0}^{j−1} [‖y(k)‖²_{Ψy} + ‖u(k)‖²_{Ψu}] + ‖x(j)‖²_{P(0)} }     (2.4-56)

and the optimal input sequence minimizing the performance index in (56) is given by the state-feedback control law u(k) = F̄x(k), with F̄ as in (55).

Proof   Since the plant is stabilizable and detectable, Theorem 4 guarantees that, if we adopt the control law u°(k) = F̄x°(k), then

    lim_{j→∞} ‖x°(j)‖²_{P(0)} = 0

and

    lim_{j→∞} { ∑_{k=0}^{j−1} [‖y°(k)‖²_{Ψy} + ‖u°(k)‖²_{Ψu}] + ‖x°(j)‖²_{P(0)} } = x′P̄x

where the superscript ° denotes all system variables obtained by using the above control law. Assume now that, for some P(0) and some initial state x ∈ IRⁿ, F̄ is not the steady-state LQOR feedback-gain matrix, and denote by a tilde the variables generated by a control law achieving a smaller value of the performance index. Then,

    ∞ > x′P̄x > lim_{j→∞} { ∑_{k=0}^{j−1} [‖ỹ(k)‖²_{Ψy} + ‖ũ(k)‖²_{Ψu}] + ‖x̃(j)‖²_{P(0)} }
              ≥ lim_{j→∞} ∑_{k=0}^{j−1} [‖ỹ(k)‖²_{Ψy} + ‖ũ(k)‖²_{Ψu}]

This contradicts the steady-state optimality of F̄ for P(0) = On×n. □
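Lemma 2 lends itself to a direct numerical check. In the sketch below (our own illustration; the second-order stabilizable and detectable plant and the unit weights are arbitrary choices) the Riccati iterations are run from two different terminal weights P(0) and are seen to approach the same limit:

```python
import numpy as np

def riccati_iterate(Phi, G, Psi_x, Psi_u, P0, n_iter=200):
    """Iterate P(j+1) = Phi' P Phi - Phi' P G (Psi_u + G' P G)^{-1} G' P Phi + Psi_x."""
    P = P0.copy()
    for _ in range(n_iter):
        S = Psi_u + G.T @ P @ G
        P = Phi.T @ P @ Phi - Phi.T @ P @ G @ np.linalg.solve(S, G.T @ P @ Phi) + Psi_x
    return P

Phi = np.array([[1.2, 0.5],
                [0.0, 0.4]])
G = np.array([[1.0],
              [1.0]])
H = np.array([[1.0, 0.0]])
Psi_x, Psi_u = H.T @ H, np.array([[1.0]])

P_from_zero = riccati_iterate(Phi, G, Psi_x, Psi_u, np.zeros((2, 2)))
P_from_eye = riccati_iterate(Phi, G, Psi_x, Psi_u, 10.0 * np.eye(2))
assert np.allclose(P_from_zero, P_from_eye, atol=1e-6)   # same limit, as the lemma states
```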

Whenever the Riccati iterations (11)–(13) for the LQOR problem converge as j → ∞ and P̄ = lim_{j→∞} P(j), the limit matrix P̄ satisfies the following algebraic Riccati equation (ARE)

    P̄ = Φ′P̄Φ − Φ′P̄G(Ψu + G′P̄G)⁻¹G′P̄Φ + Ψx                            (2.4-57)

Conversely, not all solutions of (57) coincide with a limiting matrix of the Riccati iterations for the LQOR problem. The situation again simplifies under stabilizability and detectability of the plant.

Lemma 2.4-3.   Consider the time-invariant LQOR problem. Let the plant be stabilizable and detectable. Then, the ARE (57) has a unique symmetric nonnegative definite solution, which coincides with the matrix P̄ in (54).

Proof   Assume that, besides P̄, (57) has a different solution P̃ = P̃′ ≥ 0, P̃ ≠ P̄. If the Riccati iterations (11)–(13) are initialized from P(0) = P̃, we get P(j) = P̃, j = 1, 2, ⋯. Then, P̄ and P̃ are two different limits of the Riccati iterations. This contradicts Lemma 2. □

Since the ARE is a nonlinear matrix equation, it may have many solutions. Among these, the solutions P yielding a feedback-gain matrix F = −(Ψu + G′PG)⁻¹G′PΦ for which the closed-loop transition matrix Φ + GF has all its eigenvalues in the closed unit disk are called strong solutions. The following result completes Lemma 3 in this respect.

Result 2.4-1.   Consider the time-invariant LQOR problem and its associated ARE. Then:

i.  The ARE has a unique strong solution if and only if the plant is stabilizable;

ii.  The strong solution is the only nonnegative definite solution of the ARE if and only if the plant is stabilizable and has no undetectable eigenvalue outside the closed unit disk.

The most useful results of steady-state LQR theory are summed up in Theorem 5. Its conclusions are reassuring in that, under general conditions, they guarantee that the steady-state LQOR exists and stabilizes the plant. One important implication is that steady-state LQR theory provides a tool for systematically designing regulators which, while optimizing an engineering-significant performance index, yield stable closed-loop systems.

Theorem 2.4-5.   Consider the time-invariant LQOR problem (44)–(46) and the related matrix sequence {P(j)}_{j=0}^∞ generated via the Riccati iterations (11)–(13), with M = Om×n, initialized from any P(0) = P′(0) ≥ 0. Then, there exists

    P̄ = lim_{j→∞} P(j)                                                 (2.4-58)

such that

    x′P̄x = V(x)                                                        (2.4-59)
         = lim_{j→∞} min_{u[0,j)} { ∑_{k=0}^{j−1} [‖y(k)‖²_{Ψy} + ‖u(k)‖²_{Ψu}] + ‖x(j)‖²_{P(0)} }

and the LQOR control law given by

    u(k) = F̄x(k)                                                       (2.4-60)
    F̄ = −(Ψu + G′P̄G)⁻¹G′P̄Φ                                            (2.4-61)

stabilizes the plant, if and only if the plant (Φ, G, H) is stabilizable and detectable. Further, under such conditions, the matrix P̄ in (58) coincides with the unique symmetric nonnegative definite solution of the ARE (57).

Main points of the section   The infinite-time or steady-state LQOR solution can be used to stabilize any time-invariant plant, while optimizing a quadratic performance index, provided that the plant is stabilizable and detectable. The steady-state LQOR consists of a time-invariant state feedback whose gain (61) is expressed in terms of the limit matrix P̄ (58) of the Riccati iterations (11)–(13). This limit also coincides with the unique symmetric nonnegative definite solution of the ARE (57). While stabilizability is an intrinsic plant property which cannot be enforced by the designer, detectability can, on the contrary, be guaranteed by a suitable choice of the matrix H or the state weighting matrix Ψx (47).

Problem 2.4-4   Show that the zero eigenvalues of the plant are also eigenvalues of the LQOR closed-loop system.
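The conclusions of Theorem 5 are easy to verify numerically. The sketch below (our own illustration; the double-integrator plant and the unit weights are arbitrary choices, not from the text) computes P̄ by Riccati iterations, forms F̄ as in (61), and checks both closed-loop stability and the ARE (57):

```python
import numpy as np

Phi = np.array([[1.0, 1.0],
                [0.0, 1.0]])        # discrete-time double integrator
G = np.array([[0.0],
              [1.0]])
H = np.array([[1.0, 0.0]])
Psi_x, Psi_u = H.T @ H, np.array([[1.0]])

# Riccati iterations (11)-(13) initialized from P(0) = 0
P = np.zeros((2, 2))
for _ in range(500):
    S = Psi_u + G.T @ P @ G
    P = Phi.T @ P @ Phi - Phi.T @ P @ G @ np.linalg.solve(S, G.T @ P @ Phi) + Psi_x

# Steady-state feedback gain, eq. (61)
F = -np.linalg.solve(Psi_u + G.T @ P @ G, G.T @ P @ Phi)

# The closed loop Phi + G F is a stability matrix...
assert max(abs(np.linalg.eigvals(Phi + G @ F))) < 1.0
# ...and the limit matrix satisfies the ARE (57)
ARE_rhs = Phi.T @ P @ Phi \
    - Phi.T @ P @ G @ np.linalg.solve(Psi_u + G.T @ P @ G, G.T @ P @ Phi) + Psi_x
assert np.allclose(P, ARE_rhs)
```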

Problem 2.4-5 (Output Dynamic Compensator as an LQOR)   Consider the SISO plant described by the following difference equation

    y(t) + a₁y(t − 1) + ⋯ + aₙy(t − n) = b₁u(t − 1) + ⋯ + bₙu(t − n)    (2.4-62)

Show that:

i.  x(t) := [y(t − n + 1) ⋯ y(t)  u(t − n + 1) ⋯ u(t − 1)]′            (2.4-63)

    is a state vector for the plant (62);

ii.  The state-space representation (Φ, G, H) with state x(t) is stabilizable and detectable if the polynomials

        A(q) := qⁿ + a₁qⁿ⁻¹ + ⋯ + aₙ
        B(q) := b₁qⁿ⁻¹ + ⋯ + bₙ                                        (2.4-64)

    have a strictly Hurwitz greatest common divisor;

iii.  Under the assumption in ii., the steady-state LQOR obtained by using the triplet (Φ, G, H) consists of an output dynamic compensator of the form

        u(t) + r₁u(t − 1) + ⋯ + rₙ₋₁u(t − n + 1) =
            s₀y(t) + s₁y(t − 1) + ⋯ + sₙ₋₁y(t − n + 1)                  (2.4-65)

[Hint: Use the result [GS84] according to which (Φ, G) is completely reachable if and only if A(q) and B(q) are coprime.]

Problem 2.4-6   Consider the LQOR problem for the SISO plant

    Φ = [   0    1  ]     G = [ 0 ]     H = [ α  1 ] ,   α ∈ IR         (2.4-66)
        [ −1/2  3/2 ]         [ 1 ]

and the cost ∑_{k=0}^∞ [y²(k) + ρu²(k)], ρ > 0. Find the values of α for which the steady-state LQOR problem has no solution yielding an asymptotically stable closed-loop system. [Hint: The unobservable eigenvalues of (66) coincide with the common roots of ϕ(z) := det(zI₂ − Φ) and H Adj(zI₂ − Φ)G.]


Problem 2.4-7   Consider the plant

    y(t) − (1/4)y(t − 2) = u(t − 1) + u(t − 2)

with initial conditions

    x₁(0) := (1/4)y(−1) + u(−1)
    x₂(0) := y(0)

and state-feedback control law u(t) = −(1/4)y(t). Compute the corresponding cost J = ∑_{k=0}^∞ [y²(k) + λu²(k)], λ ≥ 0. [Hint: Use the Lyapunov equation (26) with a suitable choice for x(t).]

Problem 2.4-8   Consider the LQOR problem for the plant

    Φ = [ 1/2  α ]     G = [ g₁ ]     H = [ 0  1 ]
        [  0   2 ]         [ g₂ ]

and performance index J = ∑_{k=0}^∞ [y²(k) + 10⁻⁴u²(k)]. Give detailed answers to the following questions.

i.  Find the set of values of the parameters (α, g₁, g₂) for which there exists P̄ = lim_{j→∞} P(j) as in (54).

ii.  Assuming that P̄ as in i. exists, find for which values of (α, g₁, g₂) the state-feedback control law u(k) = −(10⁻⁴ + G′P̄G)⁻¹G′P̄Φx(k) makes the closed-loop system asymptotically stable.

Problem 2.4-9 (LQOR with a Prescribed Degree of Stability)   Consider the LQOR problem for a plant (Φ, G, H) and performance index

    J = ∑_{k=0}^∞ r²ᵏ [‖y(k)‖²_{Ψy} + ‖u(k)‖²_{Ψu}]                     (2.4-67)

with r ≥ 1, Ψy = Ψy′ > 0 and Ψu = Ψu′ > 0. Show that:

i.  The above LQOR problem is equivalent to an LQOR problem with the following performance index

        ∑_{k=0}^∞ [‖ȳ(k)‖²_{Ψy} + ‖ū(k)‖²_{Ψu}]

    and a new plant (Φ̄, Ḡ, H̄) to be specified;

ii.  Provided that (Φ̄, H̄) is a detectable pair, the eigenvalues λ of the characteristic polynomial of the closed-loop system consisting of the initial plant optimally regulated according to (67) satisfy the inequality

        |λ| < 1/r

Problem 2.4-10 (Tracking as a Regulation Problem)   Consider a detectable plant (Φ, G, H) with input u(t), state x(t) and scalar output y(t). Let r be any real number. Define ε(t) := y(t) − r. Prove that, if 1 is an eigenvalue of Φ, viz. ϕ(1) := det(In − Φ) = 0, there exist eigenvectors xr of Φ associated with the eigenvalue 1 such that, for x̄(t) := x(t) − xr, we have

    x̄(t + 1) = Φx̄(t) + Gu(t)
    ε(t) = Hx̄(t)                                                        (2.4-68)

This shows that, under the stated assumptions, the plant with input u(t), state x̄(t) and output ε(t) has a description coinciding with the initial triplet (Φ, G, H). Then, if (Φ, G) is stabilizable, the LQ regulation law u(t) = F̄x̄(t) minimizing

    ∑_{k=0}^∞ [ε²(k) + ‖u(k)‖²_{Ψu}],   Ψu > 0                          (2.4-69)

for the plant (68), exists and the corresponding closed-loop system is asymptotically stable.


Problem 2.4-11 (Tracking as a Regulation Problem)   Consider again the situation described in Problem 2.4-10, where u(t) ∈ IR. Let

    δx(t) := x(t) − x(t − 1)
    δu(t) := u(t) − u(t − 1)
    ξ(t) := [δx′(t)  ε(t)]′ ∈ IRⁿ⁺¹                                      (2.4-70)

i.  Show that the state-space representation Σξ of the plant with input δu(t), state ξ(t) and output ε(t) is given by the triplet

        Σξ = ( [ Φ   0 ]     [ G  ]              )
             ( [ HΦ  1 ] ,   [ HG ] ,  [ O′n  1 ] )

ii.  Let Ω be the observability matrix of Σξ. Show that, by taking elementary row operations on Ω, we can get a matrix which can be factorized as follows

        Θ̄ [ Φ    On ] ,      Θ̄ = [ O′n     1 ]
           [ O′n   1 ]            [ H       0 ]
                                  [ HΦ      0 ]
                                  [  ⋮      ⋮ ]
                                  [ HΦⁿ⁻¹   0 ]

    Show that (Φ, H) detectable implies detectability of Σξ.

iii.  Let R be the reachability matrix of Σξ. Define

        R̄ := [ In   On ] R
             [ −H   1  ]

    Show that, by taking elementary column operations on R̄, we can get a matrix which can be factorized as LR̃, with

        L = [ G   In − Φ ] ,     R̃ = [ 1   0   0    ⋯   0      ]
            [ 0   H      ]           [ 0   G   ΦG   ⋯   Φⁿ⁻¹G ]

iv.  Prove that nonsingularity of L is equivalent to Hyu(1) := H(In − Φ)⁻¹G ≠ 0.

v.  Prove that (Φ, G) stabilizable and Hyu(1) ≠ 0 implies that Σξ is stabilizable.

vi.  Conclude that if (Φ, G, H) is stabilizable and detectable, and Hyu(1) ≠ 0, the LQ regulation law

        δu(t) = Fx δx(t) + Fε ε(t)                                       (2.4-71)

    minimizing

        ∑_{k=0}^∞ [ε²(k) + λ(δu(k))²],   λ > 0                           (2.4-72)

    for the plant Σξ, exists and the corresponding closed-loop system is asymptotically stable.

Note that (71) gives

    u(t) − u(0) = ∑_{k=1}^t δu(k)                                        (2.4-73)
                = Fx [x(t) − x(0)] + Fε ∑_{k=1}^t ε(k)

In other terms, (71) is a feedback-control law including an integral action on the tracking error.

Problem 2.4-12 (Fake ARE)   Consider the Riccati forward difference equation (11) with M = Om×n and Ψx as in (46) and (47):

    P(j + 1) = Φ′P(j)Φ                                                   (2.4-74)
               − Φ′P(j)G[Ψu + G′P(j)G]⁻¹G′P(j)Φ + H′ΨyH

We note that the above equation can be formally rewritten as follows

    P(j) = Φ′P(j)Φ − Φ′P(j)G[Ψu + G′P(j)G]⁻¹G′P(j)Φ + Q(j)               (2.4-75)

    Q(j) := H′ΨyH + P(j) − P(j + 1)                                      (2.4-76)

The latter has the same form as the ARE (57) and has been called the Fake ARE [BGW90]. Make use of Theorem 4 to show that the feedback-gain matrix

    F(j + 1) = −[Ψu + G′P(j)G]⁻¹G′P(j)Φ                                  (2.4-77)

stabilizes the plant, viz. Φ + GF(j + 1) is a stability matrix, provided that (Φ, G, H) is stabilizable and detectable, and P has the property

    P(j) − P(j + 1) ≥ 0                                                  (2.4-78)

[Hint: Show that (78) implies that Q(j) can be written as H′ΨyH + Γ′ΩΓ, Ω = Ω′ > 0, Γ ∈ IRʳˣⁿ, r := rank [P(j) − P(j + 1)]. Next, prove that detectability of (Φ, H) implies detectability of (Φ, [H′  Γ′]′). Finally, consider the Fake ARE.]

    2.5 SteadyState LQR Computation

There are several numerical procedures available for computing the matrix P̄ in (4-58). We limit our discussion to the ones that will be used in this text. In particular, we shall not enter here into numerical factorization techniques for solving LQ problems, which will be touched upon for the dual estimation problem in Sect. 6.5.

    Riccati Iterations

Eqs. (4-11)–(4-13), with M = Om×n, can be iterated, once they are initialized from any P(0) = P′(0) ≥ 0, for computing P̄ as in (4-58). Of the three different forms, the third, viz.

    P(j + 1) = [Φ + GF(j + 1)]′P(j)[Φ + GF(j + 1)]
               + F′(j + 1)ΨuF(j + 1) + Ψx                                (2.5-1)

    F(j + 1) = −[Ψu + G′P(j)G]⁻¹G′P(j)Φ                                  (2.5-2)

is referred to as the robustified form of the Riccati iterations. The attribute is motivated by the fact that, unlike the other two forms, it updates the matrix P(j) by adding symmetric nonnegative definite matrices. When computations with round-off errors are considered, this is a feature that can help to obtain at each iteration step a new symmetric nonnegative definite matrix P(j), as required by LQR theory.

The rate of convergence of the Riccati iterations is generally not very rapid, even in the neighborhood of the steady-state solution P̄. The numerical procedure described next exhibits fast convergence in the vicinity of P̄.
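In exact arithmetic, the robustified form (1)–(2) produces the same update as the standard form of the Riccati difference equation; its advantage shows up only under round-off. The sketch below (our own illustration; the plant and weights are arbitrary choices) computes one step both ways and checks that they agree:

```python
import numpy as np

def riccati_step_robustified(P, Phi, G, Psi_x, Psi_u):
    """One step in the robustified form (1)-(2): the update adds only
    symmetric nonnegative definite terms."""
    F = -np.linalg.solve(Psi_u + G.T @ P @ G, G.T @ P @ Phi)
    Phi_cl = Phi + G @ F
    return Phi_cl.T @ P @ Phi_cl + F.T @ Psi_u @ F + Psi_x

def riccati_step_standard(P, Phi, G, Psi_x, Psi_u):
    """One step of the standard Riccati difference equation."""
    S = Psi_u + G.T @ P @ G
    return Phi.T @ P @ Phi - Phi.T @ P @ G @ np.linalg.solve(S, G.T @ P @ Phi) + Psi_x

Phi = np.array([[1.0, 1.0], [0.0, 1.0]])
G = np.array([[0.0], [1.0]])
Psi_x = np.array([[1.0, 0.0], [0.0, 0.0]])
Psi_u = np.array([[1.0]])

P = np.eye(2)
assert np.allclose(riccati_step_robustified(P, Phi, G, Psi_x, Psi_u),
                   riccati_step_standard(P, Phi, G, Psi_x, Psi_u))
```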

    Kleinman Iterations

Given a stabilizing feedback-gain matrix Fk ∈ IRᵐˣⁿ, let Lk be the solution of the Lyapunov equation

    Lk = Φk′LkΦk + Fk′ΨuFk + Ψx                                          (2.5-3)
    Φk := Φ + GFk                                                        (2.5-4)

The next feedback-gain matrix Fk+1 is then computed as

    Fk+1 = −(Ψu + G′LkG)⁻¹G′LkΦ                                          (2.5-5)

The iterative equations (3)–(5), k = 0, 1, 2, ⋯, enjoy the following properties.


Suppose that the ARE (4-57) has a unique nonnegative definite solution, e.g. let (Φ, G, H), with Ψx = H′ΨyH, Ψy = Ψy′ > 0, be stabilizable and detectable. Then, provided that F0 is such as to make Φ0 a stability matrix:

i.  the sequence {Lk}_{k=0}^∞ is monotonic nonincreasing and lower-bounded by the solution P̄ of the ARE (4-57)

        L0 ≥ ⋯ ≥ Lk ≥ Lk+1 ≥ ⋯ ≥ P̄;                                     (2.5-6)

ii.  lim_{k→∞} Lk = P̄;                                                  (2.5-7)

iii.  the rate of convergence to P̄ is quadratic, viz.

        ‖P̄ − Lk+1‖ ≤ c‖P̄ − Lk‖²                                         (2.5-8)

    for any matrix norm and for a constant c independent of the iteration index k.

Eq. (8) shows that the rate of convergence of the Kleinman iterations is fast in the vicinity of P̄. It is however required that the iterations be initialized from a stabilizing feedback-gain matrix F0. In order to speed up convergence, [AL84] suggests selecting F0 via a direct Schur-type method.

The main problem with Kleinman iterations is that (3) must be solved at each iteration step. Although (3) is linear in Lk, its solution cannot be obtained by a simple matrix inversion. Actually, the numerical effort for solving it may be rather formidable, since the number of linear equations that must be solved at each iteration step equals n(n + 1)/2, if n denotes the plant order.

Kleinman iterations result from applying the Newton–Raphson method [Lue69] to the ARE (4-57).
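The iterations (3)–(5) can be sketched in a few lines (our own Python illustration; the plant, the weights, the stabilizing initial gain F0 and the vectorization-based Lyapunov solver are our choices, not prescribed by the text). For comparison, the same limit is also computed by plain Riccati iterations:

```python
import numpy as np

def dlyap(A, Q):
    """Solve the discrete Lyapunov equation L = A' L A + Q by vectorization;
    this builds an n^2 x n^2 linear system, adequate only for small plant orders."""
    n = A.shape[0]
    vecL = np.linalg.solve(np.eye(n * n) - np.kron(A.T, A.T), Q.flatten(order="F"))
    return vecL.reshape((n, n), order="F")

def kleinman(Phi, G, Psi_x, Psi_u, F0, n_iter=10):
    """Kleinman iterations (3)-(5), started from a stabilizing gain F0."""
    F = F0
    for _ in range(n_iter):
        Phi_k = Phi + G @ F                                        # eq. (4)
        L = dlyap(Phi_k, F.T @ Psi_u @ F + Psi_x)                  # eq. (3)
        F = -np.linalg.solve(Psi_u + G.T @ L @ G, G.T @ L @ Phi)   # eq. (5)
    return L, F

Phi = np.array([[1.2, 0.5], [0.0, 0.4]])
G = np.array([[1.0], [1.0]])
H = np.array([[1.0, 0.0]])
Psi_x, Psi_u = H.T @ H, np.array([[1.0]])
F0 = np.array([[-0.6, -0.2]])        # Phi + G F0 has spectral radius sqrt(0.3) < 1

L, F = kleinman(Phi, G, Psi_x, Psi_u, F0)

# Baseline: plain Riccati iterations from P(0) = 0
P = np.zeros((2, 2))
for _ in range(500):
    S = Psi_u + G.T @ P @ G
    P = Phi.T @ P @ Phi - Phi.T @ P @ G @ np.linalg.solve(S, G.T @ P @ Phi) + Psi_x

assert np.allclose(L, P, atol=1e-6)  # a few Kleinman steps already match the limit
```

Thanks to the quadratic rate (8), ten Kleinman steps suffice here, while the plain iterations converge only linearly.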

Problem 2.5-1   Consider the matrix function

    N(P) := −P + Φ′PΦ − Φ′PG[H(P)]⁻¹G′PΦ + Ψx

where

    H(P) := Ψu + G′PG

The aim is to find the symmetric nonnegative definite matrix P such that

    N(P) = On×n

Let Lk−1 = L′k−1 ≥ 0 be a given approximation to P. It is asked to find a next approximation Lk by increasing Lk−1 by a small correction δL

    Lk = Lk−1 + δL

δL, and hence Lk, has to be determined in such a way that N(Lk) ≈ On×n. By omitting the terms in δL of order higher than the first, show that

    H⁻¹(Lk) ≈ H⁻¹(Lk−1) − H⁻¹(Lk−1)G′δLGH⁻¹(Lk−1)

and, further, that N(Lk) ≈ On×n if Lk satisfies (3)–(4).

Control-theoretic interpretation   It is of interest, for its possible use in adaptive control, to give a specific control-theoretic interpretation to the Kleinman iterations. To this end, consider the quadratic cost J(x, u[0,∞)) under the assumption that all inputs, except u(0), are given by feeding back the current plant state by a stabilizing constant gain matrix Fk, viz.

    u(j) = Fkx(j),   j = 1, 2, ⋯                                         (2.5-9)

[Figure 2.5-1: A control-theoretic interpretation of Kleinman iterations — a switch applies the free input u(0) to the plant (Φ, G, H) at t = 0 and the state feedback Fkx(t) thereafter.]

The situation is depicted in Fig. 1, where t = 0⁺ indicates that the switch commutes from position a to position b after u(0) has been applied and before u(1) is fed into the plant.

Let the corresponding cost be denoted as follows

    J(x, u(0), Fk) := J(x, u[0,∞) | u(j) = Fkx(j), j = 1, 2, ⋯)          (2.5-10)

We show that, for given x and Fk,

    u(0) = Fk+1x                                                         (2.5-11)

minimizes (10) w.r.t. u(0), if Fk+1 is related to Fk via (3)–(5). To see this, rewrite (10) as follows

    J(x, u(0), Fk) = ‖x‖²_{Ψx} + ‖u(0)‖²_{Ψu} + ∑_{j=1}^∞ ‖x(j)‖²_{(Ψx + Fk′ΨuFk)}
                   = ‖x‖²_{Ψx} + ‖u(0)‖²_{Ψu} + ‖x(1)‖²_{Lk}
                   = ‖x‖²_{Ψx} + ‖u(0)‖²_{Ψu} + ‖Φx + Gu(0)‖²_{Lk}       (2.5-12)

where the first equality follows from (9), the second from Problem 4-1 if Lk is the solution of the Lyapunov equation (3), and the third since x(1) = Φx + Gu(0). Minimization of (10) w.r.t. u(0) yields (11) with Fk+1 as in (5).

Problem 2.5-2   Consider (12) and define the symmetric nonnegative definite matrix Rk+1 implicitly via

    x′Rk+1x := min_{u(0)} J(x, u(0), Fk)                                 (2.5-13)

Show that Rk+1 satisfies the recursions

    Rk+1 = Φ′LkΦ − Φ′LkG(Ψu + G′LkG)⁻¹G′LkΦ + Ψx                        (2.5-14)

with Lk as in (3).

Problem 2.5-3   If Rk+1 and Lk are as in Problem 2, show that

    Lk − Rk+1 ≥ 0                                                        (2.5-15)

Problem 2.5-4   Assume that Ψx = H′ΨyH, Ψy = Ψy′ > 0, and (Φ, G, H) is stabilizable and detectable. Use (14) and (15) to prove that (5) is a stabilizing feedback-gain matrix, viz. Φk+1 = Φ + GFk+1 is a stability matrix. [Hint: Refer to Problem 4-12. From (14) form a fake ARE Lk = Φ′LkΦ − Φ′LkG(Ψu + G′LkG)⁻¹G′LkΦ + Qk. Etc.]


    2.6 Cheap Control

The performance index used in the LQR problem has to be regarded as a compromise between two conflicting objectives: to obtain good regulation performance, viz. small y(k), as well as to prevent u(k) from becoming too large. This compromise is achieved by selecting suitable values for the weights Ψu, Ψx and M in the performance index. It is however interesting to consider, in the time-invariant case, a performance index in which

    Ψu = Om×m,   M = Om×n,   Ψx(N) = On×n

This means that the plant input is allowed to take on even very large values, the control effort not being penalized in the resulting performance index

    J(x, u[0,N)) = ∑_{k=0}^{N−1} ‖x(k)‖²_{Ψx} = ∑_{k=0}^{N−1} ‖y(k)‖²_{Ψy}    (2.6-1)

This choice should hopefully yield a high regulation performance, though at the expense of possibly large inputs. The LQR pr