
  • C.K. Chui · G. Chen

    Kalman Filtering with Real-Time Applications

    Fourth Edition


  • Kalman Filtering

  • Professor Charles K. Chui
    Texas A&M University
    Department of Mathematics
    608K Blocker Hall
    College Station, TX 77843
    USA

    Professor Guanrong Chen
    City University of Hong Kong
    Department of Electronic Engineering
    83 Tat Chee Avenue
    Kowloon
    Hong Kong/PR China

    Second printing of the third edition with ISBN 3-540-64611-6, published as softcover edition in Springer Series in Information Sciences.

    ISBN 978-3-540-87848-3

    DOI 10.1007/978-3-540-87849-0

    e-ISBN 978-3-540-87849-0

    Library of Congress Control Number: 2008940869

    © 2009, 1999, 1991, 1987 Springer-Verlag Berlin Heidelberg

    This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

    The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

    Cover design: eStudioCalamar S.L., F. Steinen-Broo, Girona, Spain

    Printed on acid-free paper

    9 8 7 6 5 4 3 2 1

    springer.com

  • Preface to the Third Edition

    Two modern topics in Kalman filtering are new additions to this Third Edition of Kalman Filtering with Real-Time Applications. Interval Kalman Filtering (Chapter 10) is added to expand the capability of Kalman filtering to uncertain systems, and Wavelet Kalman Filtering (Chapter 11) is introduced to incorporate efficient techniques from wavelets and splines with Kalman filtering to give more effective computational schemes for treating problems in such areas as signal estimation and signal decomposition. It is hoped that with the addition of these two new chapters, the current edition gives a more complete and up-to-date treatment of Kalman filtering for real-time applications.

    College Station and Houston, August 1998

    Charles K. Chui
    Guanrong Chen

  • Preface to the Second Edition

    In addition to making a number of minor corrections and updating the list of references, we have expanded the section on "real-time system identification" in Chapter 10 of the first edition into two sections and combined it with Chapter 8. In its place, a very brief introduction to wavelet analysis is included in Chapter 10. Although the pyramid algorithms for wavelet decompositions and reconstructions are quite different from the Kalman filtering algorithms, they can also be applied to time-domain filtering, and it is hoped that splines and wavelets can be incorporated with Kalman filtering in the near future.

    College Station and Houston, September 1990

    Charles K. Chui
    Guanrong Chen

  • Preface to the First Edition

    Kalman filtering is an optimal state estimation process applied to a dynamic system that involves random perturbations. More precisely, the Kalman filter gives a linear, unbiased, and minimum error variance recursive algorithm to optimally estimate the unknown state of a dynamic system from noisy data taken at discrete real-time. It has been widely used in many areas of industrial and government applications such as video and laser tracking systems, satellite navigation, ballistic missile trajectory estimation, radar, and fire control. With the recent development of high-speed computers, the Kalman filter has become more useful even for very complicated real-time applications.

    In spite of its importance, the mathematical theory of Kalman filtering and its implications are not well understood even among many applied mathematicians and engineers. In fact, most practitioners are just told what the filtering algorithms are without knowing why they work so well. One of the main objectives of this text is to disclose this mystery by presenting a fairly thorough discussion of its mathematical theory and applications to various elementary real-time problems.

    A very elementary derivation of the filtering equations is first presented. By assuming that certain matrices are nonsingular, the advantage of this approach is that the optimality of the Kalman filter can be easily understood. Of course these assumptions can be dropped by using the more well known method of orthogonal projection usually known as the innovations approach. This is done next, again rigorously. This approach is extended first to take care of correlated system and measurement noises, and then colored noise processes. Kalman filtering for nonlinear systems with an application to adaptive system identification is also discussed in this text. In addition, the limiting or steady-state Kalman filtering theory and efficient computational schemes such as the sequential and square-root algorithms are included for real-time application purposes. One such application is the design of a digital tracking filter such as the α-β-γ and α-β-γ-θ trackers.


    Using the limit of Kalman gains to define the α, β, γ parameters for white noise and the α, β, γ, θ values for colored noise processes, it is now possible to characterize this tracking filter as a limiting or steady-state Kalman filter. The state estimation obtained by these much more efficient prediction-correction equations is proved to be near-optimal, in the sense that its error from the optimal estimate decays exponentially with time. Our study of this topic includes a decoupling method that yields the filtering equations for each component of the state vector.

    The style of writing in this book is intended to be informal, the mathematical argument throughout elementary and rigorous, and in addition, easily readable by anyone, student or professional, with a minimal knowledge of linear algebra and system theory. In this regard, a preliminary chapter on matrix theory, determinants, probability, and least-squares is included in an attempt to ensure that this text be self-contained. Each chapter contains a variety of exercises for the purpose of illustrating certain related view-points, improving the understanding of the material, or filling in the gaps of some proofs in the text. Answers and hints are given at the end of the text, and a collection of notes and references is included for the reader who might be interested in further study.

    This book is designed to serve three purposes. It is written not only for self-study but also for use in a one-quarter or one-semester introductory course on Kalman filtering theory for upper-division undergraduate or first-year graduate applied mathematics or engineering students. In addition, it is hoped that it will become a valuable reference to any industrial or government engineer.

    The first author would like to thank the U.S. Army Research Office for continuous support and is especially indebted to Robert Green of the White Sands Missile Range for his encouragement and many stimulating discussions. To his wife, Margaret, he would like to express his appreciation for her understanding and constant support. The second author is very grateful to Professor Mingjun Chen of Zhongshan University for introducing him to this important research area, and to his wife Qiyun Xian for her patience and encouragement.

    Among the colleagues who have made valuable suggestions, the authors would especially like to thank Professors Andrew Chan (Texas A&M), Thomas Huang (Illinois), and Thomas Kailath (Stanford). Finally, the friendly cooperation and kind assistance from Dr. Helmut Lotsch, Dr. Angela Lahee, and their editorial staff at Springer-Verlag are greatly appreciated.

    College Station, Texas, January 1987

    Charles K. Chui
    Guanrong Chen

  • Contents

    Notation

    1. Preliminaries
       1.1 Matrix and Determinant Preliminaries
       1.2 Probability Preliminaries
       1.3 Least-Squares Preliminaries
       Exercises

    2. Kalman Filter: An Elementary Approach
       2.1 The Model
       2.2 Optimality Criterion
       2.3 Prediction-Correction Formulation
       2.4 Kalman Filtering Process
       Exercises

    3. Orthogonal Projection and Kalman Filter
       3.1 Orthogonality Characterization of Optimal Estimates
       3.2 Innovations Sequences
       3.3 Minimum Variance Estimates
       3.4 Kalman Filtering Equations
       3.5 Real-Time Tracking
       Exercises

    4. Correlated System and Measurement Noise Processes
       4.1 The Affine Model
       4.2 Optimal Estimate Operators
       4.3 Effect on Optimal Estimation with Additional Data
       4.4 Derivation of Kalman Filtering Equations
       4.5 Real-Time Applications
       4.6 Linear Deterministic/Stochastic Systems
       Exercises

    5. Colored Noise
       5.1 Outline of Procedure
       5.2 Error Estimates
       5.3 Kalman Filtering Process
       5.4 White System Noise
       5.5 Real-Time Applications
       Exercises

    6. Limiting Kalman Filter
       6.1 Outline of Procedure
       6.2 Preliminary Results
       6.3 Geometric Convergence
       6.4 Real-Time Applications
       Exercises

    7. Sequential and Square-Root Algorithms
       7.1 Sequential Algorithm
       7.2 Square-Root Algorithm
       7.3 An Algorithm for Real-Time Applications
       Exercises

    8. Extended Kalman Filter and System Identification
       8.1 Extended Kalman Filter
       8.2 Satellite Orbit Estimation
       8.3 Adaptive System Identification
       8.4 An Example of Constant Parameter Identification
       8.5 Modified Extended Kalman Filter
       8.6 Time-Varying Parameter Identification
       Exercises

    9. Decoupling of Filtering Equations
       9.1 Decoupling Formulas
       9.2 Real-Time Tracking
       9.3 The α-β-γ Tracker
       9.4 An Example
       Exercises

    10. Kalman Filtering for Interval Systems
        10.1 Interval Mathematics
        10.2 Interval Kalman Filtering
        10.3 Weighted-Average Interval Kalman Filtering
        Exercises

    11. Wavelet Kalman Filtering
        11.1 Wavelet Preliminaries
        11.2 Signal Estimation and Decomposition
        Exercises

    12. Notes
        12.1 The Kalman Smoother
        12.2 The α-β-γ-θ Tracker
        12.3 Adaptive Kalman Filtering
        12.4 Adaptive Kalman Filtering Approach to Wiener Filtering
        12.5 The Kalman-Bucy Filter
        12.6 Stochastic Optimal Control
        12.7 Square-Root Filtering and Systolic Array Implementation

    References

    Answers and Hints to Exercises

    Subject Index

  • Notation

    A, A_k            system matrices
    A^c               "square-root" of A in Cholesky factorization
    A^u               "square-root" of A in upper triangular decomposition
    B, B_k            control input matrices
    C, C_k            measurement matrices
    Cov(X, Y)         covariance of random variables X and Y
    E(X)              expectation of random variable X
    E(X|Y = y)        conditional expectation
    e_j, ē_j          (Chapter 3)
    f(x)              probability density function
    f(x_1, x_2)       joint probability density function
    f(x_1|x_2)        conditional probability density function
    f_k(x_k)          vector-valued nonlinear function
    G                 limiting Kalman gain matrix
    G_k               Kalman gain matrix
    H_k(x_k)          matrix-valued nonlinear function
    H*                (Chapter 4)
    I_n               n × n identity matrix
    J                 Jordan canonical form of a matrix
    K_k               (Chapter 4)
    L(x, v)           (Chapter 4)
    M_{AΓ}            controllability matrix
    N_{CA}            observability matrix
    0_{n×m}           n × m zero matrix
    P                 limiting (error) covariance matrix
    P_{k,k}           estimate (error) covariance matrix
    P[i,j]            (i, j)th entry of matrix P
    P(X)              probability of random variable X
    Q_k               variance matrix of random vector ξ_k
    R_k               variance matrix of random vector η_k
    R^n               space of column vectors x = [x_1 ··· x_n]^T
    S_k               covariance matrix of ξ_k and η_k
    tr                trace
    u_k               deterministic control input (at the kth time instant)
    Var(X)            variance of random variable X
    Var(X|Y = y)      conditional variance
    v_k               observation (or measurement) data (at the kth time instant)
    W_k               weight matrix
    (W_ψ f)(b, a)     integral wavelet transform
    x_k               state vector (at the kth time instant)
    x̂_k, x̂_{k|k}      optimal filtering estimate of x_k
    x̂_{k|k−1}         optimal prediction of x_k
    x*_k              suboptimal estimate of x_k
    x#_k              near-optimal estimate of x_k
    ||w||             "norm" of w
    (x, w)            "inner product" of x and w
    Y(w_0, ···, w_r)  "linear span" of vectors w_0, ···, w_r
    {z_j}             innovations sequence of data
    α, β, γ, θ        tracker parameters
    {ξ_k}, {η_k}      white noise sequences
    Γ, Γ_k            system noise matrices
    δ_{ij}            Kronecker delta
    ε_{k,ℓ}, ε̄_{k,ℓ}  random (noise) vectors
    η_k               measurement noise (at the kth time instant)
    ξ_k               system noise (at the kth time instant)
    Φ_{kℓ}            transition matrix
    ∂f/∂x             Jacobian matrix
    ∂h/∂x             Jacobian matrix

  • 1. Preliminaries

    The importance of Kalman filtering in engineering applications is well known, and its mathematical theory has been rigorously established. The main objective of this treatise is to present a thorough discussion of the mathematical theory, computational algorithms, and application to real-time tracking problems of the Kalman filter.

    In explaining how the Kalman filtering algorithm is obtained and how well it performs, it is necessary to use some formulas and inequalities in matrix algebra. In addition, since only the statistical properties of both the system and measurement noise processes in real-time applications are being considered, some knowledge of certain basic concepts in probability theory will be helpful. This chapter is devoted to the study of these topics.

    1.1 Matrix and Determinant Preliminaries

    Let R^n denote the space of all column vectors x = [x_1 ··· x_n]^T, where x_1, ···, x_n are real numbers. An n × n real matrix A is said to be positive definite if x^T A x is a positive number for all nonzero vectors x in R^n. It is said to be non-negative definite if x^T A x is non-negative for any x in R^n. If A and B are any two n × n matrices of real numbers, we will use the notation

    A > B

    when the matrix A − B is positive definite, and

    A ≥ B

    when A − B is non-negative definite.


    We first recall the so-called Schwarz inequality:

    |x^T y| ≤ |x| |y|,   x, y ∈ R^n,

    where, as usual, the notation

    |x| = (x^T x)^{1/2}

    is used. In addition, we recall that the above inequality becomes equality if and only if x and y are parallel, and this, in turn, means that

    x = λy  or  y = λx

    for some scalar λ. Note, in particular, that if y ≠ 0, then the Schwarz inequality may be written as

    x^T x ≥ (y^T x)^T (y^T y)^{-1} (y^T x).

    This formulation allows us to generalize the Schwarz inequality to the matrix setting.

    Lemma 1.1. (Matrix Schwarz inequality) Let P and Q be m × n and m × ℓ matrices, respectively, such that P^T P is nonsingular. Then

    Q^T Q ≥ (P^T Q)^T (P^T P)^{-1} (P^T Q).   (1.1)

    Furthermore, equality in (1.1) holds if and only if Q = PS for some n × ℓ matrix S.

    The proof of the (vector) Schwarz inequality is simple. It amounts to observing that the minimum of the quadratic polynomial

    (x − λy)^T (x − λy),   y ≠ 0,

    of λ is attained at

    λ = (y^T y)^{-1}(y^T x)

    and using this λ value in the above inequality. Hence, in the matrix setting, we consider

    (Q − PS)^T (Q − PS) ≥ 0

    and choose

    S = (P^T P)^{-1}(P^T Q),

    so that

    Q^T Q ≥ S^T(P^T Q) + (P^T Q)^T S − S^T(P^T P)S = (P^T Q)^T (P^T P)^{-1}(P^T Q)

    as stated in (1.1). Furthermore, this inequality becomes equality if and only if

    (Q − PS)^T (Q − PS) = 0,

    or equivalently, Q = PS for some n × ℓ matrix S. This completes the proof of the lemma.
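    As a quick numerical illustration, the following NumPy sketch checks the matrix Schwarz inequality (1.1) on randomly generated matrices; the matrices P, Q, S below are arbitrary test data, not taken from the text.

```python
# Numerical check of the matrix Schwarz inequality (1.1).
import numpy as np

rng = np.random.default_rng(0)
m, n, l = 6, 3, 4
P = rng.standard_normal((m, n))          # m x n, generic, so P^T P is nonsingular
Q = rng.standard_normal((m, l))          # m x l

lhs = Q.T @ Q
rhs = (P.T @ Q).T @ np.linalg.solve(P.T @ P, P.T @ Q)

# lhs - rhs should be non-negative definite: smallest eigenvalue >= 0 (up to rounding).
print(np.linalg.eigvalsh(lhs - rhs).min())

# Equality holds when Q = P S for some n x l matrix S.
S = rng.standard_normal((n, l))
Q2 = P @ S
diff = Q2.T @ Q2 - (P.T @ Q2).T @ np.linalg.solve(P.T @ P, P.T @ Q2)
print(np.abs(diff).max())                # ~ 0
```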


    We now turn to the following so-called matrix inversion lemma.

    Lemma 1.2. (Matrix inversion lemma) Let

    A = [ A_11  A_12 ]
        [ A_21  A_22 ],

    where A_11 and A_22 are n × n and m × m nonsingular submatrices, respectively, such that

    (A_11 − A_12 A_22^{-1} A_21)  and  (A_22 − A_21 A_11^{-1} A_12)

    are also nonsingular. Then A is nonsingular with

    A^{-1} = [ A_11^{-1} + A_11^{-1} A_12 (A_22 − A_21 A_11^{-1} A_12)^{-1} A_21 A_11^{-1}    −A_11^{-1} A_12 (A_22 − A_21 A_11^{-1} A_12)^{-1} ]
             [ −(A_22 − A_21 A_11^{-1} A_12)^{-1} A_21 A_11^{-1}                              (A_22 − A_21 A_11^{-1} A_12)^{-1}               ]

           = [ (A_11 − A_12 A_22^{-1} A_21)^{-1}                    −(A_11 − A_12 A_22^{-1} A_21)^{-1} A_12 A_22^{-1}                              ]
             [ −A_22^{-1} A_21 (A_11 − A_12 A_22^{-1} A_21)^{-1}     A_22^{-1} + A_22^{-1} A_21 (A_11 − A_12 A_22^{-1} A_21)^{-1} A_12 A_22^{-1}  ].   (1.2)

    In particular,

    (A_11 − A_12 A_22^{-1} A_21)^{-1} = A_11^{-1} + A_11^{-1} A_12 (A_22 − A_21 A_11^{-1} A_12)^{-1} A_21 A_11^{-1}   (1.3)

    and

    A_11^{-1} A_12 (A_22 − A_21 A_11^{-1} A_12)^{-1} = (A_11 − A_12 A_22^{-1} A_21)^{-1} A_12 A_22^{-1}.   (1.4)

    Furthermore,

    det A = (det A_11) det(A_22 − A_21 A_11^{-1} A_12) = (det A_22) det(A_11 − A_12 A_22^{-1} A_21).   (1.5)

    To prove this lemma, we write

    A = [ I_n              0   ] [ A_11   A_12                        ]
        [ A_21 A_11^{-1}   I_m ] [ 0      A_22 − A_21 A_11^{-1} A_12 ]


    and

    A = [ I_n   A_12 A_22^{-1} ] [ A_11 − A_12 A_22^{-1} A_21   0    ]
        [ 0     I_m            ] [ A_21                         A_22 ].

    Taking determinants, we obtain (1.5). In particular, we have

    det A ≠ 0,

    or A is nonsingular. Now observe that

    [ A_11   A_12                        ]^{-1}   [ A_11^{-1}   −A_11^{-1} A_12 (A_22 − A_21 A_11^{-1} A_12)^{-1} ]
    [ 0      A_22 − A_21 A_11^{-1} A_12 ]       = [ 0            (A_22 − A_21 A_11^{-1} A_12)^{-1}                ]

    and

    [ I_n              0   ]^{-1}   [ I_n               0   ]
    [ A_21 A_11^{-1}   I_m ]      = [ −A_21 A_11^{-1}   I_m ].

    Hence, multiplying these two inverses in reverse order, we have the product A^{-1}, which gives the first part of (1.2). A similar proof also gives the second part of (1.2). Finally, (1.3) and (1.4) follow by equating the appropriate blocks in (1.2).
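    The block formulas above are easy to verify numerically. The following sketch checks the first expression in (1.2) and the determinant identity (1.5) on random, well-conditioned blocks; all numbers are made-up test data.

```python
# Numerical sanity check of (1.2) and (1.5).
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
A11 = rng.standard_normal((n, n)) + 5 * np.eye(n)   # keep the diagonal blocks well conditioned
A22 = rng.standard_normal((m, m)) + 5 * np.eye(m)
A12 = rng.standard_normal((n, m))
A21 = rng.standard_normal((m, n))
A = np.block([[A11, A12], [A21, A22]])

inv11 = np.linalg.inv(A11)
S = A22 - A21 @ inv11 @ A12                          # A_22 - A_21 A_11^{-1} A_12
Sinv = np.linalg.inv(S)
Ainv = np.block([
    [inv11 + inv11 @ A12 @ Sinv @ A21 @ inv11, -inv11 @ A12 @ Sinv],
    [-Sinv @ A21 @ inv11,                       Sinv],
])

print(np.abs(Ainv - np.linalg.inv(A)).max())                       # ~ 0, first part of (1.2)
print(np.linalg.det(A) - np.linalg.det(A11) * np.linalg.det(S))    # ~ 0, (1.5)
```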

    An immediate application of Lemma 1.2 yields the following result.

    Lemma 1.3. If P ≥ Q > 0, then Q^{-1} ≥ P^{-1} > 0.

    Let P(ε) = P + εI where ε > 0. Then P(ε) − Q > 0. By Lemma 1.2, we have

    P^{-1}(ε) = [Q + (P(ε) − Q)]^{-1} = Q^{-1} − Q^{-1}[(P(ε) − Q)^{-1} + Q^{-1}]^{-1} Q^{-1},

    so that

    Q^{-1} − P^{-1}(ε) = Q^{-1}[(P(ε) − Q)^{-1} + Q^{-1}]^{-1} Q^{-1} ≥ 0.

    Letting ε → 0 gives Q^{-1} − P^{-1} ≥ 0, so that

    Q^{-1} ≥ P^{-1} > 0.

    Now let us turn to discussing the trace of an n × n matrix A. The trace of A, denoted by trA, is defined as the sum of its diagonal elements, namely:

    trA := Σ_{i=1}^{n} a_ii,

    where A = [a_ij]. We first state some elementary properties.

    Lemma 1.4. If A and B are n × n matrices, then

    trA^T = trA,   (1.6)

    tr(A + B) = trA + trB,   (1.7)

    and

    tr(λA) = λ trA.   (1.8)

    If A is an n × m matrix and B is an m × n matrix, then

    trAB = trB^T A^T = trBA = trA^T B^T   (1.9)

    and

    trA^T A = Σ_{i=1}^{n} Σ_{j=1}^{m} a_ij^2.   (1.10)

    The proof of the above identities is immediate from the definition and we leave it to the reader (cf. Exercise 1.1). The following result is important.

    Lemma 1.5. Let A be an n × n matrix with eigenvalues λ_1, ···, λ_n, multiplicities being listed. Then

    trA = Σ_{i=1}^{n} λ_i.   (1.11)

    To prove the lemma, we simply write A = UJU^{-1} where J is the Jordan canonical form of A and U is some nonsingular matrix. Then an application of (1.9) yields

    trA = tr[(UJ)U^{-1}] = tr[U^{-1}(UJ)] = trJ = Σ_{i=1}^{n} λ_i.

    It follows from this lemma that if A > 0 then trA > 0, and if A ≥ 0 then trA ≥ 0.
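    As a small illustration of (1.11), the following one-liner compares the trace of a random matrix with the sum of its eigenvalues; the matrix is arbitrary test data.

```python
# Quick check of Lemma 1.5: trace equals the sum of the eigenvalues.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
print(np.trace(A), np.sum(np.linalg.eigvals(A)).real)   # the two numbers agree
```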

    Next, we state some useful inequalities on the trace.


    Lemma 1.6. Let A be an n × n matrix. Then

    trA ≤ (n trAA^T)^{1/2}.   (1.12)

    We leave the proof of this inequality to the reader (cf. Exercise 1.2).

    Lemma 1.7. If A and B are n × m and m × ℓ matrices, respectively, then

    tr(AB)(AB)^T ≤ (trAA^T)(trBB^T).

    Consequently, for any matrices A_1, ···, A_p with appropriate dimensions,

    tr(A_1 ··· A_p)(A_1 ··· A_p)^T ≤ (trA_1 A_1^T) ··· (trA_p A_p^T).   (1.13)

    If A = [a_ij] and B = [b_ij], then

    tr(AB)(AB)^T = Σ_{i=1}^{n} Σ_{p=1}^{ℓ} ( Σ_{k=1}^{m} a_ik b_kp )^2
                 ≤ Σ_{i=1}^{n} Σ_{p=1}^{ℓ} ( Σ_{k=1}^{m} a_ik^2 )( Σ_{k=1}^{m} b_kp^2 )
                 = ( Σ_{i=1}^{n} Σ_{k=1}^{m} a_ik^2 )( Σ_{p=1}^{ℓ} Σ_{k=1}^{m} b_kp^2 )
                 = (trAA^T)(trBB^T),

    where the Schwarz inequality has been used. This completes the proof of the lemma.

    It should be remarked that A ≥ B > 0 does not necessarily imply trAA^T ≥ trBB^T; a 2 × 2 counterexample is asked for in Exercise 1.3. For a symmetric matrix, however, we can draw the expected conclusion as follows.


    Lemma 1.8. Let A and B be non-negative definite symmetric matrices with A ≥ B. Then trAA^T ≥ trBB^T, or trA^2 ≥ trB^2.

    We leave the proof of this lemma as an exercise (cf. Exercise 1.4).

    Lemma 1.9. Let B be an n × n non-negative definite symmetric matrix. Then

    trB^2 ≤ (trB)^2.   (1.14)

    Consequently, if A is another n × n non-negative definite symmetric matrix such that B ≤ A, then

    trB^2 ≤ (trA)^2.   (1.15)

    To prove (1.14), let λ_1, ···, λ_n be the eigenvalues of B. Then λ_1^2, ···, λ_n^2 are the eigenvalues of B^2. Now, since λ_1, ···, λ_n are non-negative, Lemma 1.5 gives

    trB^2 = Σ λ_i^2 ≤ ( Σ λ_i )^2 = (trB)^2.

    (1.15) follows from the fact that B ≤ A implies trB ≤ trA.

    We also have the following result which will be useful later.

    Lemma 1.10. Let F be an n × n matrix with eigenvalues λ_1, ···, λ_n such that

    λ := max(|λ_1|, ···, |λ_n|) < 1.

    Then there exist a real number r satisfying 0 < r < 1 and a constant C such that

    |trF^k (F^k)^T| ≤ C r^k

    for all k = 1, 2, ···.

    Let J be the Jordan canonical form for F. Then F = UJU^{-1} for some nonsingular matrix U. Hence, using (1.13), we have

    |trF^k (F^k)^T| = |trUJ^k U^{-1}(U^{-1})^T (J^k)^T U^T|
                    ≤ |trUU^T| |trJ^k (J^k)^T| |trU^{-1}(U^{-1})^T|
                    ≤ p(k) λ^{2k},

    where p(k) is a polynomial in k. Now, any choice of r satisfying λ^2 < r < 1 yields the desired result, by choosing a positive constant C that satisfies

    p(k) λ^{2k} ≤ C r^k

    for all k.
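    Lemma 1.10 is the key to the geometric convergence results used later for the limiting Kalman filter. A minimal sketch of the decay, using an arbitrary stable test matrix (spectral radius below one), is given below.

```python
# Illustration of Lemma 1.10: when all eigenvalues of F lie inside the unit circle,
# tr F^k (F^k)^T decays geometrically in k.
import numpy as np

F = np.array([[0.5, 1.0],
              [0.0, 0.6]])               # eigenvalues 0.5 and 0.6, so lambda = 0.6 < 1
Fk = np.eye(2)
for k in range(1, 31):
    Fk = Fk @ F
    if k % 5 == 0:
        print(k, np.trace(Fk @ Fk.T))    # decreases roughly like r^k for some r < 1
```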


    1.2 Probability Preliminaries

    Consider an experiment in which a fair coin is tossed such that on each toss either the head (denoted by H) or the tail (denoted by T) occurs. The actual result that occurs when the experiment is performed is called an outcome of the experiment and the set of all possible outcomes is called the sample space (denoted by S) of the experiment. For instance, if a fair coin is tossed twice, then each result of two tosses is an outcome, the possibilities are HH, TT, HT, TH, and the set {HH, TT, HT, TH} is the sample space S. Furthermore, any subset of the sample space is called an event and an event consisting of a single outcome is called a simple event.

    Since there is no way to predict the outcomes, we have to assign a real number P, between 0 and 1, to each event to indicate the probability that a certain outcome occurs. This is specified by a real-valued function, called a random variable, defined on the sample space. In the above example, if the random variable X = X(s), s ∈ S, denotes the number of H's in the outcome s, then the number P := P(X(s)) gives the probability (in percentage) of the number of H's of the outcome s. More generally, let S be a sample space and X : S → R^1 be a random variable. For each measurable set A ⊂ R^1 (and in the above example, A = {0}, {1}, or {2} indicating no H, one H, or two H's, respectively) define P : {events} → [0, 1], where each event is a set {s ∈ S : X(s) ∈ A ⊂ R^1} := {X ∈ A}, subject to the following conditions:

    (1) P(X ∈ A) ≥ 0 for any measurable set A ⊂ R^1,
    (2) P(X ∈ R^1) = 1, and
    (3) for any countable sequence of pairwise disjoint measurable sets A_i in R^1,

        P(X ∈ ∪_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} P(X ∈ A_i).

    P is called the probability distribution (or probability distribution function) of the random variable X.

    If there exists an integrable function f such that

    P(X ∈ A) = ∫_A f(x) dx   (1.16)

    for all measurable sets A, we say that P is a continuous probability distribution and f is called the probability density function of the random variable X.


    Note that actually we could have defined f(x)dx = dλ, where λ is a measure (for example, step functions), so that the discrete case such as the example of "tossing coins" can be included.

    If the probability density function f is given by

    f(x) = (1/(√(2π) σ)) e^{−(x−μ)^2/(2σ^2)},   σ > 0 and μ ∈ R,   (1.17)

    called the Gaussian (or normal) probability density function, then P is called a normal distribution of the random variable X, and we use the notation: X ~ N(μ, σ^2). It can be easily verified that the normal distribution P is a probability distribution. Indeed, (1) since f(x) > 0, P(X ∈ A) = ∫_A f(x)dx ≥ 0 for any measurable set A ⊂ R, (2) by substituting y = (x − μ)/(√2 σ),

    P(X ∈ R^1) = ∫_{−∞}^{∞} f(x)dx = (1/√π) ∫_{−∞}^{∞} e^{−y^2} dy = 1,

    (cf. Exercise 1.5), and (3) since

    ∫_{∪_i A_i} f(x)dx = Σ_i ∫_{A_i} f(x)dx

    for any countable sequence of pairwise disjoint measurable sets A_i ⊂ R^1, we have

    P(X ∈ ∪_i A_i) = Σ_i P(X ∈ A_i).

    Let X be a random variable. The expectation of X indicates the mean of the values of X, and is defined by

    E(X) = ∫_{−∞}^{∞} x f(x) dx.   (1.18)

    Note that E(X) is a real number for any random variable X with probability density function f. For the normal distribution, using the substitution y = (x − μ)/(√2 σ) again, we have

    E(X) = ∫_{−∞}^{∞} x f(x) dx
         = (1/(√(2π) σ)) ∫_{−∞}^{∞} x e^{−(x−μ)^2/(2σ^2)} dx
         = (1/√π) ∫_{−∞}^{∞} (√2 σ y + μ) e^{−y^2} dy
         = μ (1/√π) ∫_{−∞}^{∞} e^{−y^2} dy
         = μ.   (1.19)


    Note also that E(X) is the first moment of the probability density function f. The second moment gives the variance of X defined by

    Var(X) = E(X − E(X))^2 = ∫_{−∞}^{∞} (x − E(X))^2 f(x) dx.   (1.20)

    This number indicates the dispersion of the values of X from its mean E(X). For the normal distribution, using the substitution y = (x − μ)/(√2 σ) again, we have

    Var(X) = ∫_{−∞}^{∞} (x − μ)^2 f(x) dx
           = (1/(√(2π) σ)) ∫_{−∞}^{∞} (x − μ)^2 e^{−(x−μ)^2/(2σ^2)} dx
           = (2σ^2/√π) ∫_{−∞}^{∞} y^2 e^{−y^2} dy
           = σ^2,   (1.21)

    where we have used the equality ∫_{−∞}^{∞} y^2 e^{−y^2} dy = √π/2 (cf. Exercise 1.6).
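    The closed forms (1.19) and (1.21) are easy to confirm by simulation. The sketch below draws a large sample from N(μ, σ^2) and compares the sample mean and variance with μ and σ^2; the particular values of μ and σ are arbitrary.

```python
# Monte Carlo confirmation of (1.19) and (1.21) for the normal distribution.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 1.5, 2.0
x = rng.normal(mu, sigma, size=1_000_000)
print(x.mean())        # ~ 1.5  (E(X) = mu)
print(x.var())         # ~ 4.0  (Var(X) = sigma^2)
```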

    We now turn to random vectors whose components are random variables. We denote a random n-vector X = [X_1 ··· X_n]^T where each X_i(s) ∈ R^1, s ∈ S.

    Let P be a continuous probability distribution function of X. That is,

    P(X_1 ∈ A_1, ···, X_n ∈ A_n) = ∫_{A_1} ··· ∫_{A_n} f(x_1, ···, x_n) dx_1 ··· dx_n,   (1.22)

    where A_1, ···, A_n are measurable sets in R^1 and f an integrable function. f is called a joint probability density function of X and P is called a joint probability distribution (function) of X. For each i, i = 1, ···, n, define

    f_i(x) = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f(x_1, ···, x_{i−1}, x, x_{i+1}, ···, x_n) dx_1 ··· dx_{i−1} dx_{i+1} ··· dx_n.   (1.23)

    Then it is clear that ∫_{−∞}^{∞} f_i(x)dx = 1. f_i is called the ith marginal probability density function of X corresponding to the joint probability density function f(x_1, ···, x_n). Similarly, we define f_{ij} and f_{ijk} by deleting the integrals with respect to x_i, x_j and x_i, x_j, x_k, respectively, etc., as in the definition of f_i.


    If

    f(x) = (1/((2π)^{n/2}(det R)^{1/2})) exp{ −(1/2)(x − μ)^T R^{-1}(x − μ) },   (1.24)

    where μ is a constant n-vector and R is a symmetric positive definite matrix, we say that f(x) is a Gaussian (or normal) probability density function of X. It can be verified that

    ∫_{−∞}^{∞} f(x)dx := ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f(x) dx_1 ··· dx_n = 1,   (1.25)

    E(X) = ∫_{−∞}^{∞} x f(x)dx = μ,   (1.26)

    and

    Var(X) = E(X − μ)(X − μ)^T = R.   (1.27)

    Indeed, since R is symmetric and positive definite, there is a unitary matrix U such that R = U^T J U where J = diag[λ_1, ···, λ_n] and λ_1, ···, λ_n > 0. Let y = (1/√2) diag[1/√λ_1, ···, 1/√λ_n] U(x − μ). Then

    ∫_{−∞}^{∞} f(x)dx = (2^{n/2} √λ_1 ··· √λ_n / ((2π)^{n/2}(λ_1 ··· λ_n)^{1/2})) ∫_{−∞}^{∞} e^{−y_1^2} dy_1 ··· ∫_{−∞}^{∞} e^{−y_n^2} dy_n = 1.

    Equations (1.26) and (1.27) can be verified by using the same substitution as that used for the scalar case (cf. (1.21) and Exercise 1.7).

    Next, we introduce the concept of conditional probability. Consider an experiment in which balls are drawn one at a time from an urn containing M_1 white balls and M_2 black balls. What is the probability that the second ball drawn from the urn is also black (event A_2) under the condition that the first one is black (event A_1)? Here, we sample without replacement; that is, the first ball is not returned to the urn after being drawn.


    To solve this simple problem, we reason as follows: since the first ball drawn from the urn is black, there remain M_1 white balls and M_2 − 1 black balls in the urn before the second drawing. Hence, the probability that a black ball is drawn is now

    (M_2 − 1)/(M_1 + M_2 − 1).

    Note that

    (M_2 − 1)/(M_1 + M_2 − 1) = [ (M_2/(M_1 + M_2)) · ((M_2 − 1)/(M_1 + M_2 − 1)) ] / [ M_2/(M_1 + M_2) ],

    where M_2/(M_1 + M_2) is the probability that a black ball is picked at the first drawing, and (M_2/(M_1 + M_2)) · ((M_2 − 1)/(M_1 + M_2 − 1)) is the probability that black balls are picked at both the first and second drawings. This example motivates the following definition of conditional probability: The conditional probability of X_1 ∈ A_1 given X_2 ∈ A_2 is defined by

    P(X_1 ∈ A_1 | X_2 ∈ A_2) = P(X_1 ∈ A_1, X_2 ∈ A_2) / P(X_2 ∈ A_2).   (1.28)

    Suppose that P is a continuous probability distribution function with joint probability density function f. Then (1.28) becomes

    P(X_1 ∈ A_1 | X_2 ∈ A_2) = ( ∫_{A_1} ∫_{A_2} f(x_1, x_2) dx_1 dx_2 ) / ( ∫_{A_2} f_2(x_2) dx_2 ),   (1.29)

    where f_2, defined by

    f_2(x_2) = ∫_{−∞}^{∞} f(x_1, x_2) dx_1,

    is the second marginal probability density function of f. Let f(x_1|x_2) denote the probability density function corresponding to P(X_1 ∈ A_1 | X_2 ∈ A_2). f(x_1|x_2) is called the conditional probability density function corresponding to the conditional probability distribution function P(X_1 ∈ A_1 | X_2 ∈ A_2). It is known that

    f(x_1|x_2) = f(x_1, x_2) / f_2(x_2),

    which is called the Bayes formula (see, for example, Probability by A. N. Shiryayev (1984)). By symmetry, the Bayes formula can be written as

    f(x_1|x_2) = f(x_2|x_1) f_1(x_1) / f_2(x_2).   (1.30)


    We remark that this formula also holds for random vectors X_1 and X_2.

    Let X and Y be random n- and m-vectors, respectively. The covariance of X and Y is defined by the n × m matrix

    Cov(X, Y) = E[(X − E(X))(Y − E(Y))^T].   (1.31)

    When Y = X, we have the variance matrix, which is sometimes called a covariance matrix of X, Var(X) = Cov(X, X).

    It can be verified that the expectation, variance, and covariance have the following properties:

    E(AX + BY) = AE(X) + BE(Y),   (1.32a)

    E((AX)(BY)^T) = A(E(XY^T))B^T,   (1.32b)

    Var(X) ≥ 0,   (1.32c)

    Cov(X, Y) = (Cov(Y, X))^T,   (1.32d)

    and

    Cov(X, Y) = E(XY^T) − E(X)(E(Y))^T,   (1.32e)

    where A and B are constant matrices (cf. Exercise 1.8). X and Y are said to be independent if f(x|y) = f_1(x) and f(y|x) = f_2(y), and X and Y are said to be uncorrelated if Cov(X, Y) = 0. It is easy to see that if X and Y are independent then they are uncorrelated. Indeed, if X and Y are independent then f(x, y) = f_1(x) f_2(y). Hence,

    E(XY^T) = ∫ ∫ x y^T f(x, y) dx dy = ∫ x f_1(x) dx ∫ y^T f_2(y) dy = E(X)(E(Y))^T,

    so that by property (1.32e), Cov(X, Y) = 0. But the converse does not necessarily hold, unless the probability distribution is normal. Let X_1 and X_2 be jointly normally distributed with variance matrix

    R = [ R_11  R_12 ]
        [ R_21  R_22 ],     R_12 = R_21^T,

    where R_11 and R_22 are symmetric, and R is positive definite. Then it can be shown that X_1 and X_2 are independent if and only if R_12 = Cov(X_1, X_2) = 0 (cf. Exercise 1.9).


    Let X and Y be two random vectors. Similar to the definitions of expectation and variance, the conditional expectation of X under the condition that Y = y is defined to be

    E(X|Y = y) = ∫_{−∞}^{∞} x f(x|y) dx   (1.33)

    and the conditional variance, which is sometimes called the conditional covariance of X, under the condition that Y = y, to be

    Var(X|Y = y) = ∫_{−∞}^{∞} [x − E(X|Y = y)][x − E(X|Y = y)]^T f(x|y) dx.   (1.34)

    Next, suppose that

    E([ X ; Y ]) = [ μ_X ; μ_Y ]   and   Var([ X ; Y ]) = [ R_xx  R_xy ]
                                                           [ R_yx  R_yy ].

    Then it follows from (1.24) that f(x, y) = f([ x ; y ]) is the normal density with this mean vector and variance matrix. It can be verified that

    f(x|y) = f(x, y)/f(y)
           = (1/((2π)^{n/2}(det R̃)^{1/2})) exp{ −(1/2)(x − μ̃)^T R̃^{-1}(x − μ̃) },   (1.35)

    where

    μ̃ = μ_X + R_xy R_yy^{-1}(y − μ_Y)

    and

    R̃ = R_xx − R_xy R_yy^{-1} R_yx

    (cf. Exercise 1.10). Hence, by rewriting μ̃ and R̃, we have

    E(X|Y = y) = E(X) + Cov(X, Y)[Var(Y)]^{-1}(y − E(Y))   (1.36)


    and

    Var(X|Y = y) = Var(X) − Cov(X, Y)[Var(Y)]^{-1} Cov(Y, X).   (1.37)
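    Formulas (1.36) and (1.37) are the probabilistic heart of the Kalman filter, so a quick numerical illustration may help. The sketch below simulates a two-dimensional Gaussian vector (X, Y), conditions on Y being near a chosen value y0, and compares the empirical conditional mean and variance of X with (1.36)-(1.37); all numbers (mean, R, y0) are arbitrary test values.

```python
# Monte Carlo illustration of the conditional mean (1.36) and variance (1.37).
import numpy as np

rng = np.random.default_rng(4)
mean = np.array([1.0, -2.0])
R = np.array([[2.0, 0.8],
              [0.8, 1.0]])               # [[R_xx, R_xy], [R_yx, R_yy]]
samples = rng.multivariate_normal(mean, R, size=2_000_000)
x, y = samples[:, 0], samples[:, 1]

y0 = -1.5                                # condition on Y close to y0
sel = np.abs(y - y0) < 0.01
print(x[sel].mean(), mean[0] + R[0, 1] / R[1, 1] * (y0 - mean[1]))   # (1.36)
print(x[sel].var(), R[0, 0] - R[0, 1] ** 2 / R[1, 1])                # (1.37)
```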

    1.3 Least-Squares Preliminaries

    Let {ξ_k} be a sequence of random vectors, called a random sequence. Denote E(ξ_k) = μ_k, Cov(ξ_k, ξ_j) = R_kj, so that Var(ξ_k) = R_kk := R_k. A random sequence {ξ_k} is called a white noise sequence if Cov(ξ_k, ξ_j) = R_kj = R_k δ_kj, where δ_kj = 1 if k = j and 0 if k ≠ j. {ξ_k} is called a sequence of Gaussian (or normal) white noise if it is white and each ξ_k is normal.

    Consider the observation equation of a linear system where the observed data is contaminated with noise, namely:

    v_k = C_k x_k + D_k u_k + ξ_k,

    where, as usual, {x_k} is the state sequence, {u_k} the control sequence, and {v_k} the data sequence. We assume, for each k, that the q × n constant matrix C_k, q × p constant matrix D_k, and the deterministic control p-vector u_k are given. Usually, {ξ_k} is not known but will be assumed to be a sequence of zero-mean Gaussian white noise, namely: E(ξ_k) = 0 and E(ξ_k ξ_j^T) = R_k δ_kj with R_k being symmetric and positive definite, k, j = 1, 2, ···.

    Our goal is to obtain an optimal estimate ŷ_k of the state vector x_k from the information {v_k}. If there were no noise, then it is clear that z_k − C_k y_k = 0, where z_k := v_k − D_k u_k, whenever this linear system has a solution; otherwise, some measurement of the error z_k − C_k y_k must be minimized over all y_k. In general, when the data is contaminated with noise, we will minimize the quantity:

    F(y_k, W_k) = E(z_k − C_k y_k)^T W_k (z_k − C_k y_k)

    over all n-vectors y_k, where W_k is a positive definite and symmetric q × q matrix, called a weight matrix. That is, we wish to find a ŷ_k = ŷ_k(W_k) such that

    F(ŷ_k, W_k) = min{ F(y_k, W_k) : y_k ∈ R^n }.


    In addition, we wish to determine the optimal weight W_k. To find ŷ_k = ŷ_k(W_k), assuming that (C_k^T W_k C_k) is nonsingular, we rewrite

    F(y_k, W_k) = E(z_k − C_k y_k)^T W_k (z_k − C_k y_k)
                = E[(C_k^T W_k C_k)y_k − C_k^T W_k z_k]^T (C_k^T W_k C_k)^{-1} [(C_k^T W_k C_k)y_k − C_k^T W_k z_k]
                  + E(z_k^T [I − W_k C_k (C_k^T W_k C_k)^{-1} C_k^T] W_k z_k),

    where the first term on the right-hand side is non-negative. To minimize F(y_k, W_k), the first term on the right must vanish, so that

    ŷ_k = (C_k^T W_k C_k)^{-1} C_k^T W_k z_k.

    Note that if (C_k^T W_k C_k) is singular, then ŷ_k is not unique. To find the optimal weight W_k, let us consider

    F(ŷ_k, W_k) = E(z_k − C_k ŷ_k)^T W_k (z_k − C_k ŷ_k).

    It is clear that this quantity does not attain a minimum value at a positive definite weight W_k, since such a minimum would result from W_k = 0. Hence, we need another measurement to determine an optimal W_k. Noting that the original problem is to estimate the state vector x_k by ŷ_k(W_k), it is natural to consider a measurement of the error (x_k − ŷ_k(W_k)). But since not much about x_k is known and only the noisy data can be measured, this measurement should be determined by the variance of the error. That is, we will minimize Var(x_k − ŷ_k(W_k)) over all positive definite symmetric matrices W_k. We write ŷ_k = ŷ_k(W_k) and

    x_k − ŷ_k = (C_k^T W_k C_k)^{-1}(C_k^T W_k C_k) x_k − (C_k^T W_k C_k)^{-1} C_k^T W_k z_k
              = (C_k^T W_k C_k)^{-1} C_k^T W_k (C_k x_k − z_k)
              = −(C_k^T W_k C_k)^{-1} C_k^T W_k ξ_k.

    Therefore, by the linearity of the expectation operation, we have

    Var(x_k − ŷ_k) = (C_k^T W_k C_k)^{-1} C_k^T W_k E(ξ_k ξ_k^T) W_k C_k (C_k^T W_k C_k)^{-1}
                   = (C_k^T W_k C_k)^{-1} C_k^T W_k R_k W_k C_k (C_k^T W_k C_k)^{-1}.

    This is the quantity to be minimized. To write this as a perfect square, we need the positive square root of the positive definite symmetric matrix R_k defined as follows: Let the eigenvalues of R_k be λ_1, ···, λ_q, which are all positive, and write R_k = U^T diag[λ_1, ···, λ_q] U where U is a unitary matrix (formed by the normalized eigenvectors corresponding to λ_i, i = 1, ···, q). Then we define


    R_k^{1/2} = U^T diag[√λ_1, ···, √λ_q] U, which gives (R_k^{1/2})(R_k^{1/2})^T = R_k. It follows that

    Var(x_k − ŷ_k) = Q^T Q,

    where Q = (R_k^{1/2})^T W_k C_k (C_k^T W_k C_k)^{-1}. By Lemma 1.1 (the matrix Schwarz inequality), under the assumption that P is a q × n matrix with nonsingular P^T P, we have

    Q^T Q ≥ (P^T Q)^T (P^T P)^{-1} (P^T Q).

    Hence, if (C_k^T R_k^{-1} C_k) is nonsingular, we may choose P = (R_k^{1/2})^{-1} C_k, so that

    P^T P = C_k^T ((R_k^{1/2})^T)^{-1} (R_k^{1/2})^{-1} C_k = C_k^T R_k^{-1} C_k

    is nonsingular, and

    (P^T Q)^T (P^T P)^{-1} (P^T Q)
      = [C_k^T ((R_k^{1/2})^{-1})^T (R_k^{1/2})^T W_k C_k (C_k^T W_k C_k)^{-1}]^T (C_k^T R_k^{-1} C_k)^{-1} [C_k^T ((R_k^{1/2})^{-1})^T (R_k^{1/2})^T W_k C_k (C_k^T W_k C_k)^{-1}]
      = (C_k^T R_k^{-1} C_k)^{-1}
      = Var(x_k − ŷ_k(R_k^{-1})).

    Hence, Var(x_k − ŷ_k(W_k)) ≥ Var(x_k − ŷ_k(R_k^{-1})) for all positive definite symmetric weight matrices W_k. Therefore, the optimal weight matrix is W_k = R_k^{-1}, and the optimal estimate of x_k using this optimal weight is

    x̂_k := ŷ_k(R_k^{-1}) = (C_k^T R_k^{-1} C_k)^{-1} C_k^T R_k^{-1}(v_k − D_k u_k).   (1.38)

    We call x̂_k the least-squares optimal estimate of x_k. Note that x̂_k is a linear estimate of x_k. Being the image of a linear transformation of the data v_k − D_k u_k, it gives an unbiased estimate of x_k, in the sense that E x̂_k = E x_k (cf. Exercise 1.12), and it also gives a minimum variance estimate of x_k, since

    Var(x_k − x̂_k) ≤ Var(x_k − ŷ_k(W_k))

    for all positive definite symmetric weight matrices W_k.
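    A minimal sketch of (1.38) in NumPy follows; the measurement matrix C, the noise covariance R, and the true state are made-up test values, and the term D_k u_k is assumed to have been subtracted from the data already.

```python
# Least-squares estimate (1.38) with the optimal weight W_k = R_k^{-1}.
import numpy as np

rng = np.random.default_rng(5)
n, q = 2, 4
C = rng.standard_normal((q, n))          # C_k
R = np.diag([0.1, 0.2, 0.1, 0.3])        # R_k = Var(measurement noise), positive definite
x_true = np.array([1.0, -0.5])           # x_k (unknown in practice)
z = C @ x_true + rng.multivariate_normal(np.zeros(q), R)   # z_k = v_k - D_k u_k

Rinv = np.linalg.inv(R)
x_hat = np.linalg.solve(C.T @ Rinv @ C, C.T @ Rinv @ z)    # (1.38)
P = np.linalg.inv(C.T @ Rinv @ C)        # Var(x_k - x_hat) for this optimal weight
print(x_hat, np.diag(P))
```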


    Exercises

    1.1. Prove Lemma 1.4.

    1.2. Prove Lemma 1.6.

    1.3. Give an example of two matrices A and B such that A ≥ B > 0 but for which the inequality AA^T ≥ BB^T is not satisfied.

    1.4. Prove Lemma 1.8.

    1.5. Show that ∫_{−∞}^{∞} e^{−y^2} dy = √π.

    1.6. Verify that ∫_{−∞}^{∞} y^2 e^{−y^2} dy = √π/2. (Hint: Differentiate the integral −∫_{−∞}^{∞} e^{−xy^2} dy with respect to x and then let x → 1.)

    1.7. Let

         f(x) = (1/((2π)^{n/2}(det R)^{1/2})) exp{ −(1/2)(x − μ)^T R^{-1}(x − μ) }.

         Show that
         (a) E(X) = ∫_{−∞}^{∞} x f(x)dx := ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} [x_1 ··· x_n]^T f(x) dx_1 ··· dx_n = μ, and
         (b) Var(X) = E(X − μ)(X − μ)^T = R.

    1.8. Verify the properties (1.32a-e) of the expectation, variance, and covariance.

    1.9. Prove that two random vectors X_1 and X_2 with normal distributions are independent if and only if Cov(X_1, X_2) = 0.

    1.10. Verify (1.35).

    1.11. Consider the minimization of the quantity

          F(y_k) = (z_k − C_k y_k)^T W_k (z_k − C_k y_k)

          over all n-vectors y_k, where z_k is a q × 1 vector, C_k a q × n matrix, and W_k a q × q weight matrix, such that the matrix (C_k^T W_k C_k) is nonsingular. By letting dF(y_k)/dy_k = 0, show that the optimal solution ŷ_k is given by

          ŷ_k = (C_k^T W_k C_k)^{-1} C_k^T W_k z_k.

          (Hint: The differentiation of a scalar-valued function F(y) with respect to the n-vector y = [y_1 ··· y_n]^T is defined to be dF/dy = [∂F/∂y_1 ··· ∂F/∂y_n]^T.)

    1.12. Verify that the estimate x̂_k given by (1.38) is an unbiased estimate of x_k in the sense that E x̂_k = E x_k.

  • 2. Kalman Filter: An Elementary Approach

    This chapter is devoted to a most elementary introduction to the Kalman filtering algorithm. By assuming invertibility of certain matrices, the Kalman filtering "prediction-correction" algorithm will be derived based on the optimality criterion of least-squares unbiased estimation of the state vector with the optimal weight, using all available data information. The filtering algorithm is first obtained for a system with no deterministic (control) input. By superimposing the deterministic solution, we then arrive at the general Kalman filtering algorithm.

    2.1 The Model

    Consider a linear system with state-space description

    { y_{k+1} = A_k y_k + B_k u_k + Γ_k ξ_k
    { w_k = C_k y_k + D_k u_k + η_k,

    where A_k, B_k, Γ_k, C_k, D_k are n × n, n × m, n × p, q × n, q × m (known) constant matrices, respectively, with 1 ≤ m, p, q ≤ n, {u_k} a (known) sequence of m-vectors (called a deterministic input sequence), and {ξ_k} and {η_k} are, respectively, (unknown) system and observation noise sequences, with known statistical information such as mean, variance, and covariance. Since both the deterministic input {u_k} and noise sequences {ξ_k} and {η_k} are present, the system is usually called a linear deterministic/stochastic system. This system can be decomposed into the sum of a linear deterministic system:

    { z_{k+1} = A_k z_k + B_k u_k
    { s_k = C_k z_k + D_k u_k,


    and a linear (purely) stochastic system:

    { x_{k+1} = A_k x_k + Γ_k ξ_k
    { v_k = C_k x_k + η_k,   (2.1)

    with w_k = s_k + v_k and y_k = z_k + x_k. The advantage of the decomposition is that the solution of z_k in the linear deterministic system is well known and is given by the so-called transition equation

    z_k = (A_{k−1} ··· A_0) z_0 + Σ_{i=1}^{k} (A_{k−1} ··· A_i) B_{i−1} u_{i−1}.

    Hence, it is sufficient to derive the optimal estimate x̂_k of x_k in the stochastic state-space description (2.1), so that

    ŷ_k = z_k + x̂_k

    becomes the optimal estimate of the state vector y_k in the original linear system. Of course, the estimate has to depend on the statistical information of the noise sequences. In this chapter, we will only consider zero-mean Gaussian white noise processes.

    Assumption 2.1. Let {ξ_k} and {η_k} be sequences of zero-mean Gaussian white noise such that Var(ξ_k) = Q_k and Var(η_k) = R_k are positive definite matrices and E(ξ_k η_ℓ^T) = 0 for all k and ℓ. The initial state x_0 is also assumed to be independent of ξ_k and η_k in the sense that E(x_0 ξ_k^T) = 0 and E(x_0 η_k^T) = 0 for all k.

    2.2 Optimality Criterion

    In determining the optimal estimate x̂_k of x_k, it will be seen that the optimality is in the sense of least-squares followed by choosing the optimal weight matrix that gives a minimum variance estimate as discussed in Section 1.3. However, we will incorporate the information of all data v_j, j = 0, 1, ···, k, in determining the estimate x̂_k of x_k (instead of just using v_k as discussed in Section 1.3). To accomplish this, we introduce the vectors

    v_j = [ v_0^T ··· v_j^T ]^T,   j = 0, 1, ···,


    and obtain x̂_k from the data vector v_j. For this approach, we assume for the time being that all the system matrices A_j are nonsingular. Then it can be shown that the state-space description of the linear stochastic system can be written as

    v_j = H_{k,j} x_k + ε_{k,j},   (2.2)

    where

    H_{k,j} = [ C_0 Φ_{0k} ; ⋮ ; C_j Φ_{jk} ]   and   ε_{k,j} = [ ε_{k,0} ; ⋮ ; ε_{k,j} ],

    with Φ_{ℓk} being the transition matrices defined by

    Φ_{ℓk} = A_{ℓ−1} ··· A_k   if ℓ > k,
    Φ_{ℓk} = I                 if ℓ = k,
    Φ_{ℓk} = Φ_{kℓ}^{-1}       if ℓ < k,

    and

    ε_{k,ℓ} = η_ℓ − C_ℓ Σ_{i=ℓ+1}^{k} Φ_{ℓi} Γ_{i−1} ξ_{i−1}.

    Indeed, by applying the inverse transition property of Φ_{kℓ} described above and the transition equation

    x_k = Φ_{kℓ} x_ℓ + Σ_{i=ℓ+1}^{k} Φ_{ki} Γ_{i−1} ξ_{i−1},

    which can be easily obtained from the first recursive equation in (2.1), we have

    x_ℓ = Φ_{ℓk} x_k − Σ_{i=ℓ+1}^{k} Φ_{ℓi} Γ_{i−1} ξ_{i−1};

    and this yields

    H_{k,j} x_k + ε_{k,j} = [ C_0 Φ_{0k} x_k + ε_{k,0} ; ⋮ ; C_j Φ_{jk} x_k + ε_{k,j} ] = [ C_0 x_0 + η_0 ; ⋮ ; C_j x_j + η_j ] = v_j,

    which is (2.2).


    Now, using the least-squares estimate discussed in Chapter 1, Section 1.3, with weight W_{k,j} = (Var(ε_{k,j}))^{-1}, where the inverse is assumed only for the purpose of illustrating the optimality criterion, we arrive at the linear, unbiased, minimum variance least-squares estimate x̂_{k|j} of x_k using the data v_0, ···, v_j.

    Definition 2.1. (1) For j = k, we denote x̂_k = x̂_{k|k} and call the estimation process a digital filtering process. (2) For j < k, we call x̂_{k|j} an optimal prediction of x_k and the process a digital prediction process. (3) For j > k, we call x̂_{k|j} a smoothing estimate of x_k and the process a digital smoothing process.

    We will only discuss digital filtering. However, since x̂_k = x̂_{k|k} is determined by using all data v_0, ···, v_k, the process is not applicable to real-time problems for very large values of k, since the need for storage of the data and the computational requirement grow with time. Hence, we will derive a recursive formula that gives x̂_k = x̂_{k|k} from the "prediction" x̂_{k|k−1} and x̂_{k|k−1} from the estimate x̂_{k−1} = x̂_{k−1|k−1}. At each step, we only use the incoming bit of the data information so that very little storage of the data is necessary. This is what is usually called the Kalman filtering algorithm.

    2.3 Prediction-Correction Formulation

    To compute x̂_k in real-time, we will derive the recursive formula

    { x̂_{k|k} = x̂_{k|k−1} + G_k(v_k − C_k x̂_{k|k−1})
    { x̂_{k|k−1} = A_{k−1} x̂_{k−1|k−1},

    where G_k will be called the Kalman gain matrices. The starting point is the initial estimate x̂_0 = x̂_{0|0}. Since x̂_0 is an unbiased estimate of the initial state x_0, we could use x̂_0 = E(x_0), which is a constant vector. In the actual Kalman filtering, G_k must also be computed recursively. The two recursive processes together will be called the Kalman filtering process.

    Let x̂_{k|j} be the (optimal) least-squares estimate of x_k with minimum variance by choosing the weight matrix to be

    W_{k,j} = (Var(ε_{k,j}))^{-1},


    using v_j in (2.2) (see Section 1.3 for details). It is easy to verify that

    W_{k,k−1}^{-1} = diag[ R_0, ···, R_{k−1} ] + Var( [ C_0 Σ_{i=1}^{k} Φ_{0i} Γ_{i−1} ξ_{i−1} ; ⋮ ; C_{k−1} Φ_{k−1,k} Γ_{k−1} ξ_{k−1} ] )   (2.3)

    and

    W_{k,k}^{-1} = [ W_{k,k−1}^{-1}  0 ]
                   [ 0               R_k ]   (2.4)

    (cf. Exercise 2.1). Hence, W_{k,k−1} and W_{k,k} are positive definite (cf. Exercise 2.2).

    In this chapter, we also assume that the matrices

    (H_{k,j}^T W_{k,j} H_{k,j}),   j = k − 1 and k,

    are nonsingular. Then it follows from Chapter 1, Section 1.3, that

    x̂_{k|j} = (H_{k,j}^T W_{k,j} H_{k,j})^{-1} H_{k,j}^T W_{k,j} v_j.   (2.5)

    Our first goal is to relate x̂_{k|k−1} with x̂_{k|k}. To do so, we observe that

    H_{k,k}^T W_{k,k} H_{k,k} = H_{k,k−1}^T W_{k,k−1} H_{k,k−1} + C_k^T R_k^{-1} C_k

    and

    H_{k,k}^T W_{k,k} v_k = H_{k,k−1}^T W_{k,k−1} v_{k−1} + C_k^T R_k^{-1} v_k.

    Using (2.5) and the above two equalities, we have

    (H_{k,k−1}^T W_{k,k−1} H_{k,k−1}) x̂_{k|k−1} = H_{k,k−1}^T W_{k,k−1} v_{k−1}

    and

    (H_{k,k−1}^T W_{k,k−1} H_{k,k−1} + C_k^T R_k^{-1} C_k) x̂_{k|k} = (H_{k,k}^T W_{k,k} H_{k,k}) x̂_{k|k} = H_{k,k−1}^T W_{k,k−1} v_{k−1} + C_k^T R_k^{-1} v_k.

    A simple subtraction gives

    (H_{k,k−1}^T W_{k,k−1} H_{k,k−1} + C_k^T R_k^{-1} C_k)(x̂_{k|k} − x̂_{k|k−1}) = C_k^T R_k^{-1}(v_k − C_k x̂_{k|k−1}).


    Now define

    G_k := (H_{k,k−1}^T W_{k,k−1} H_{k,k−1} + C_k^T R_k^{-1} C_k)^{-1} C_k^T R_k^{-1} = (H_{k,k}^T W_{k,k} H_{k,k})^{-1} C_k^T R_k^{-1}.

    Then we have

    x̂_{k|k} = x̂_{k|k−1} + G_k(v_k − C_k x̂_{k|k−1}).   (2.6)

    Since x̂_{k|k−1} is a one-step prediction and (v_k − C_k x̂_{k|k−1}) is the error between the real data and the prediction, (2.6) is in fact a "prediction-correction" formula with the Kalman gain matrix G_k as a weight matrix. To complete the recursive process, we need an equation that gives x̂_{k|k−1} from x̂_{k−1|k−1}. This is simply the equation

    x̂_{k|k−1} = A_{k−1} x̂_{k−1|k−1}.   (2.7)

    To prove this, we first note that

    ε_{k,k−1} = ε_{k−1,k−1} − H_{k−1,k−1} Φ_{k−1,k} Γ_{k−1} ξ_{k−1},

    so that

    W_{k,k−1}^{-1} = W_{k−1,k−1}^{-1} + H_{k−1,k−1} Φ_{k−1,k} Γ_{k−1} Q_{k−1} Γ_{k−1}^T Φ_{k−1,k}^T H_{k−1,k−1}^T   (2.8)

    (cf. Exercise 2.3). Hence, by Lemma 1.2, we have

    W_{k,k−1} = W_{k−1,k−1} − W_{k−1,k−1} H_{k−1,k−1} Φ_{k−1,k} Γ_{k−1} (Q_{k−1}^{-1} + Γ_{k−1}^T Φ_{k−1,k}^T H_{k−1,k−1}^T W_{k−1,k−1} H_{k−1,k−1} Φ_{k−1,k} Γ_{k−1})^{-1} Γ_{k−1}^T Φ_{k−1,k}^T H_{k−1,k−1}^T W_{k−1,k−1}   (2.9)

    (cf. Exercise 2.4). Then by the transition relation

    H_{k,k−1} = H_{k−1,k−1} Φ_{k−1,k},

    we have

    H_{k,k−1}^T W_{k,k−1} = Φ_{k−1,k}^T { I − H_{k−1,k−1}^T W_{k−1,k−1} H_{k−1,k−1} Φ_{k−1,k} Γ_{k−1} (Q_{k−1}^{-1} + Γ_{k−1}^T Φ_{k−1,k}^T H_{k−1,k−1}^T W_{k−1,k−1} H_{k−1,k−1} Φ_{k−1,k} Γ_{k−1})^{-1} Γ_{k−1}^T Φ_{k−1,k}^T } H_{k−1,k−1}^T W_{k−1,k−1}   (2.10)


    (cf. Exercise 2.5). It follows that

    (H_{k,k−1}^T W_{k,k−1} H_{k,k−1}) Φ_{k,k−1} (H_{k−1,k−1}^T W_{k−1,k−1} H_{k−1,k−1})^{-1} H_{k−1,k−1}^T W_{k−1,k−1} = H_{k,k−1}^T W_{k,k−1}   (2.11)

    (cf. Exercise 2.6). This, together with (2.5) with j = k − 1 and k, gives (2.7).

    Our next goal is to derive a recursive scheme for calculating the Kalman gain matrices G_k. Write

    G_k = (H_{k,k}^T W_{k,k} H_{k,k})^{-1} C_k^T R_k^{-1} = P_{k,k} C_k^T R_k^{-1},

    where

    P_{k,k} = (H_{k,k}^T W_{k,k} H_{k,k})^{-1},

    and set

    P_{k,k−1} = (H_{k,k−1}^T W_{k,k−1} H_{k,k−1})^{-1}.

    Then, since

    P_{k,k}^{-1} = P_{k,k−1}^{-1} + C_k^T R_k^{-1} C_k,

    we obtain, using Lemma 1.2,

    P_{k,k} = P_{k,k−1} − P_{k,k−1} C_k^T (C_k P_{k,k−1} C_k^T + R_k)^{-1} C_k P_{k,k−1}.

    It can be proved that

    G_k = P_{k,k−1} C_k^T (C_k P_{k,k−1} C_k^T + R_k)^{-1} = P_{k,k} C_k^T R_k^{-1}   (2.12)

    (cf. Exercise 2.7), so that

    P_{k,k} = (I − G_k C_k) P_{k,k−1}.   (2.13)

    Furthermore, we can show that

    P_{k,k−1} = A_{k−1} P_{k−1,k−1} A_{k−1}^T + Γ_{k−1} Q_{k−1} Γ_{k−1}^T   (2.14)

    (cf. Exercise 2.8). Hence, using (2.13) and (2.14) with the initial matrix P_{0,0}, we obtain a recursive scheme to compute P_{k−1,k−1}, P_{k,k−1}, G_k and P_{k,k} for k = 1, 2, ···. Moreover, it can be shown that

    P_{k,k−1} = E(x_k − x̂_{k|k−1})(x_k − x̂_{k|k−1})^T = Var(x_k − x̂_{k|k−1})   (2.15)


    (cf. Exercise 2.9) and that

    P_{k,k} = E(x_k − x̂_{k|k})(x_k − x̂_{k|k})^T = Var(x_k − x̂_{k|k}).   (2.16)

    In particular, we have

    P_{0,0} = E(x_0 − Ex_0)(x_0 − Ex_0)^T = Var(x_0).

    Finally, combining all the results obtained above, we arrive at the following Kalman filtering process for the linear stochastic system with state-space description (2.1):

    P_{0,0} = Var(x_0)
    P_{k,k−1} = A_{k−1} P_{k−1,k−1} A_{k−1}^T + Γ_{k−1} Q_{k−1} Γ_{k−1}^T
    G_k = P_{k,k−1} C_k^T (C_k P_{k,k−1} C_k^T + R_k)^{-1}
    P_{k,k} = (I − G_k C_k) P_{k,k−1}
    x̂_{0|0} = E(x_0)
    x̂_{k|k−1} = A_{k−1} x̂_{k−1|k−1}
    x̂_{k|k} = x̂_{k|k−1} + G_k(v_k − C_k x̂_{k|k−1})
    k = 1, 2, ···.   (2.17)

    This algorithm may be realized as shown in Fig. 2.1.

    [Fig. 2.1]
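    The recursion (2.17) translates directly into code. The following NumPy sketch implements it for time-invariant system matrices and runs it on a simulated constant-velocity tracking example; the example's matrices and noise levels are made-up illustrative values, not taken from the book.

```python
# Minimal implementation of the prediction-correction scheme (2.17) for system (2.1).
import numpy as np

def kalman_filter(A, C, Gamma, Q, R, v, x0_mean, P0):
    """Run (2.17) over the data v[0], v[1], ... and return all filtering estimates."""
    x_hat = np.asarray(x0_mean, dtype=float)      # x_hat_{0|0} = E(x_0)
    P = np.asarray(P0, dtype=float)               # P_{0,0} = Var(x_0)
    estimates = [x_hat.copy()]
    for vk in v:
        # Prediction
        x_pred = A @ x_hat                        # x_hat_{k|k-1}
        P_pred = A @ P @ A.T + Gamma @ Q @ Gamma.T
        # Correction
        G = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
        x_hat = x_pred + G @ (vk - C @ x_pred)    # x_hat_{k|k}
        P = (np.eye(len(x_hat)) - G @ C) @ P_pred
        estimates.append(x_hat.copy())
    return np.array(estimates)

# Example: position-only observations of a constant-velocity target (illustrative values).
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Gamma = np.eye(2)
Q = 0.01 * np.eye(2)
R = np.array([[1.0]])

rng = np.random.default_rng(6)
x = np.array([0.0, 1.0])
data = []
for _ in range(50):
    x = A @ x + Gamma @ rng.multivariate_normal(np.zeros(2), Q)
    data.append(C @ x + rng.multivariate_normal(np.zeros(1), R))

print(kalman_filter(A, C, Gamma, Q, R, data, x0_mean=[0.0, 0.0], P0=np.eye(2))[-1])
```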

    2.4 Kalman Filtering Process

    Let us now consider the general linear deterministic/stochastic system where the deterministic control input {u_k} is present. More precisely, let us consider the state-space description

    { x_{k+1} = A_k x_k + B_k u_k + Γ_k ξ_k
    { v_k = C_k x_k + D_k u_k + η_k,


    where {u_k} is a sequence of m-vectors with 1 ≤ m ≤ n. Then by superimposing the deterministic solution with (2.17), the Kalman filtering process for this system is given by

    P_{0,0} = Var(x_0)
    P_{k,k−1} = A_{k−1} P_{k−1,k−1} A_{k−1}^T + Γ_{k−1} Q_{k−1} Γ_{k−1}^T
    G_k = P_{k,k−1} C_k^T (C_k P_{k,k−1} C_k^T + R_k)^{-1}
    P_{k,k} = (I − G_k C_k) P_{k,k−1}
    x̂_{0|0} = E(x_0)
    x̂_{k|k−1} = A_{k−1} x̂_{k−1|k−1} + B_{k−1} u_{k−1}
    x̂_{k|k} = x̂_{k|k−1} + G_k(v_k − D_k u_k − C_k x̂_{k|k−1})
    k = 1, 2, ···,   (2.18)

    (cf. Exercise 2.13). This algorithm may be implemented as shown in Fig. 2.2.

    [Fig. 2.2]
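    As a sketch of how (2.18) changes the loop from the previous code example, the function below adds the B_{k−1} u_{k−1} term to the prediction and the D_k u_k term to the correction; the gain and covariance recursions are unchanged. The names B, D, u are assumptions for illustration only.

```python
# Sketch of the control-input version (2.18), assuming time-invariant matrices.
import numpy as np

def kalman_filter_with_input(A, B, C, D, Gamma, Q, R, v, u, x0_mean, P0):
    """u[0], ..., u[len(v)] are the control inputs u_0, ..., u_K; v holds v_1, ..., v_K."""
    x_hat, P = np.asarray(x0_mean, float), np.asarray(P0, float)
    out = [x_hat.copy()]
    for k, vk in enumerate(v, start=1):
        x_pred = A @ x_hat + B @ u[k - 1]                       # x_hat_{k|k-1}
        P_pred = A @ P @ A.T + Gamma @ Q @ Gamma.T
        G = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
        x_hat = x_pred + G @ (vk - D @ u[k] - C @ x_pred)       # x_hat_{k|k}
        P = (np.eye(len(x_hat)) - G @ C) @ P_pred
        out.append(x_hat.copy())
    return np.array(out)
```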


    Exercises

    2.1. Let

         ε_{k,j} = [ ε_{k,0} ; ⋮ ; ε_{k,j} ]

         and

         ε_{k,ℓ} = η_ℓ − C_ℓ Σ_{i=ℓ+1}^{k} Φ_{ℓi} Γ_{i−1} ξ_{i−1},

         where {ξ_k} and {η_k} are both zero-mean Gaussian white noise sequences with Var(ξ_k) = Q_k and Var(η_k) = R_k. Define W_{k,j} = (Var(ε_{k,j}))^{-1}. Show that

         W_{k,k−1}^{-1} = diag[ R_0, ···, R_{k−1} ] + Var( [ C_0 Σ_{i=1}^{k} Φ_{0i} Γ_{i−1} ξ_{i−1} ; ⋮ ; C_{k−1} Φ_{k−1,k} Γ_{k−1} ξ_{k−1} ] )

         and

         W_{k,k}^{-1} = [ W_{k,k−1}^{-1}  0 ]
                        [ 0               R_k ].

    2.2. Show that the sum of a positive definite matrix A and a non-negative definite matrix B is positive definite.

    2.3. Let ε_{k,j} and W_{k,j} be defined as in Exercise 2.1. Verify the relation

         ε_{k,k−1} = ε_{k−1,k−1} − H_{k−1,k−1} Φ_{k−1,k} Γ_{k−1} ξ_{k−1},

         where

         H_{k−1,k−1} = [ C_0 Φ_{0,k−1} ; ⋮ ; C_{k−1} ],

         and then show that

         W_{k,k−1}^{-1} = W_{k−1,k−1}^{-1} + H_{k−1,k−1} Φ_{k−1,k} Γ_{k−1} Q_{k−1} Γ_{k−1}^T Φ_{k−1,k}^T H_{k−1,k−1}^T.

    2.4. Use Exercise 2.3 and Lemma 1.2 to show that

         W_{k,k−1} = W_{k−1,k−1} − W_{k−1,k−1} H_{k−1,k−1} Φ_{k−1,k} Γ_{k−1} (Q_{k−1}^{-1} + Γ_{k−1}^T Φ_{k−1,k}^T H_{k−1,k−1}^T W_{k−1,k−1} H_{k−1,k−1} Φ_{k−1,k} Γ_{k−1})^{-1} Γ_{k−1}^T Φ_{k−1,k}^T H_{k−1,k−1}^T W_{k−1,k−1}.

    2.5. Use Exercise 2.4 and the relation H_{k,k−1} = H_{k−1,k−1} Φ_{k−1,k} to show that

         H_{k,k−1}^T W_{k,k−1} = Φ_{k−1,k}^T { I − H_{k−1,k−1}^T W_{k−1,k−1} H_{k−1,k−1} Φ_{k−1,k} Γ_{k−1} (Q_{k−1}^{-1} + Γ_{k−1}^T Φ_{k−1,k}^T H_{k−1,k−1}^T W_{k−1,k−1} H_{k−1,k−1} Φ_{k−1,k} Γ_{k−1})^{-1} Γ_{k−1}^T Φ_{k−1,k}^T } H_{k−1,k−1}^T W_{k−1,k−1}.


    2.6. Use Exercise 2.5 to derive the identity:

         (H_{k,k−1}^T W_{k,k−1} H_{k,k−1}) Φ_{k,k−1} (H_{k−1,k−1}^T W_{k−1,k−1} H_{k−1,k−1})^{-1} H_{k−1,k−1}^T W_{k−1,k−1} = H_{k,k−1}^T W_{k,k−1}.

    2.7. Use Lemma 1.2 to show that

         P_{k,k−1} C_k^T (C_k P_{k,k−1} C_k^T + R_k)^{-1} = P_{k,k} C_k^T R_k^{-1} = G_k.

    2.8. Start with P_{k,k−1} = (H_{k,k−1}^T W_{k,k−1} H_{k,k−1})^{-1}. Use Lemma 1.2, (2.8), and the definition of P_{k,k} = (H_{k,k}^T W_{k,k} H_{k,k})^{-1} to show that

         P_{k,k−1} = A_{k−1} P_{k−1,k−1} A_{k−1}^T + Γ_{k−1} Q_{k−1} Γ_{k−1}^T.

    2.9. Use (2.5) and (2.2) to prove that

         E(x_k − x̂_{k|k−1})(x_k − x̂_{k|k−1})^T = P_{k,k−1}

         and

         E(x_k − x̂_{k|k})(x_k − x̂_{k|k})^T = P_{k,k}.

    2.10. Consider the one-dimensional linear stochastic dynamic system

          x_{k+1} = a x_k + ξ_k,   x_0 = 0,

          where E(x_k) = 0, Var(x_k) = σ^2, E(x_k ξ_j) = 0 for j ≥ k, E(ξ_k) = 0, and E(ξ_k ξ_j) = μ^2 δ_kj. Prove that σ^2 = μ^2/(1 − a^2) and E(x_k x_{k+j}) = a^{|j|} σ^2 for all integers j.

    2.11. Consider the one-dimensional stochastic linear system with E(η_k) = 0, Var(η_k) = σ^2, E(x_0) = 0, and Var(x_0) = μ^2. Show that x̂_{k|k} → c for some constant c as k → ∞.

    2.12. Let {v_k} be a sequence of data obtained from the observation of a zero-mean random vector y with unknown variance Q. The variance of y can be estimated by

          Q̂_k = (1/k) Σ_{j=1}^{k} v_j v_j^T.

          Derive a prediction-correction recursive formula for this estimation.

    2.13. Consider the linear deterministic/stochastic system

          { x_{k+1} = A_k x_k + B_k u_k + Γ_k ξ_k
          { v_k = C_k x_k + D_k u_k + η_k,

          where {u_k} is a given sequence of deterministic control input m-vectors, 1 ≤ m ≤ n. Suppose that Assumption 2.1 is satisfied and the matrix Var(ε_{k,j}) is nonsingular (cf. (2.2) for the definition of ε_{k,j}). Derive the Kalman filtering equations for this model.

    2.14. In digital signal processing, a widely used mathematical model is the following so-called ARMA (autoregressive moving-average) process:

          v_k = Σ_{i=1}^{N} B_i v_{k−i} + Σ_{i=0}^{M} A_i u_{k−i},

          where the n × n matrices B_1, ···, B_N and the n × q matrices A_0, A_1, ···, A_M are independent of the time variable k, and {u_k} and {v_k} are input and output digital signal sequences, respectively (cf. Fig. 2.3). Assuming that M ≤ N, show that the input-output relationship can be described as a state-space model

          { x_{k+1} = A x_k + B u_k
          { v_k = C x_k + D u_k

          with x_0 = 0, where

          A = [ B_1      I  0  ···  0 ]        B = [ A_1 + B_1 A_0 ]
              [ B_2      0  I  ···  0 ]            [ A_2 + B_2 A_0 ]
              [  ⋮                ⋱   ]            [       ⋮       ]
              [ B_{N−1}  0  0  ···  I ]            [ A_M + B_M A_0 ]
              [ B_N      0  0  ···  0 ]            [ B_{M+1} A_0   ]
                                                    [       ⋮       ]
                                                    [ B_N A_0       ]

          C = [ I  0  ···  0 ]   and   D = [ A_0 ].


    [Fig. 2.3]

  • 3. Orthogonal Projection and Kalman Filter

    The elementary approach to the derivation of the optimal Kalman filtering process discussed in Chapter 2 has the advantage that the optimal estimate x̂_k = x̂_{k|k} of the state vector x_k is easily understood to be a least-squares estimate of x_k with the properties that (i) the transformation that yields x̂_k from the data v_k = [v_0^T ··· v_k^T]^T is linear, (ii) x̂_k is unbiased in the sense that E(x̂_k) = E(x_k), and (iii) it yields a minimum variance estimate with (Var(ε_{k,k}))^{-1} as the optimal weight. The disadvantage of this elementary approach is that certain matrices must be assumed to be nonsingular. In this chapter, we will drop the nonsingularity assumptions and give a rigorous derivation of the Kalman filtering algorithm.

    3.1 Orthogonality Characterization of Optimal Estimates

    Consider the linear stochastic system described by (2.1) such that Assumption 2.1 is satisfied. That is, consider the state-space description

    { x_{k+1} = A_k x_k + Γ_k ξ_k
    { v_k = C_k x_k + η_k,   (3.1)

    where A_k, Γ_k and C_k are known n × n, n × p and q × n constant matrices, respectively, with 1 ≤ p, q ≤ n, and

    E(ξ_k) = 0,  E(ξ_k ξ_ℓ^T) = Q_k δ_kℓ,
    E(η_k) = 0,  E(η_k η_ℓ^T) = R_k δ_kℓ,
    E(ξ_k η_ℓ^T) = 0,  E(x_0 ξ_k^T) = 0,  E(x_0 η_k^T) = 0,

    for all k, ℓ = 0, 1, ···, with Q_k and R_k being positive definite and symmetric matrices.

    Let x be a random n-vector and w a random q-vector. We define the "inner product" (x, w) to be the n × q matrix


    (x, w) = Cov(x, w) = E(x − E(x))(w − E(w))^T.

    Let ||w||_q be the positive square root of (w, w). That is, ||w||_q is a non-negative definite q × q matrix with

    ||w||_q^2 = ||w||_q ||w||_q^T = (w, w).

    Similarly, let ||x||_n be the positive square root of (x, x). Now, let w_0, ···, w_r be random q-vectors and consider the "linear span":

    Y(w_0, ···, w_r) = { y : y = Σ_{i=0}^{r} P_i w_i,  P_0, ···, P_r  n × q constant matrices }.

    The first minimization problem we will study is to determine a ŷ in Y(w_0, ···, w_r) such that tr||x_k − ŷ||_n^2 = F_k, where

    F_k := min{ tr||x_k − y||_n^2 : y ∈ Y(w_0, ···, w_r) }.   (3.2)

    The following result characterizes ŷ.

    Lemma 3.1. ŷ ∈ Y(w_0, ···, w_r) satisfies tr||x_k − ŷ||_n^2 = F_k if and only if

    (x_k − ŷ, w_j) = 0_{n×q}

    for all j = 0, 1, ···, r. Furthermore, ŷ is unique in the sense that

    tr||x_k − y||_n^2 = tr||x_k − ŷ||_n^2

    only if y = ŷ.

    To prove this lemma, we first suppose that tr||x_k − ŷ||_n^2 = F_k but (x_k − ŷ, w_{j_0}) = C ≠ 0_{n×q} for some j_0 where 0 ≤ j_0 ≤ r. Then w_{j_0} ≠ 0 so that ||w_{j_0}||_q^2 is a positive definite symmetric matrix and so is its inverse ||w_{j_0}||_q^{-2}. Hence, C||w_{j_0}||_q^{-2}C^T ≠ 0_{n×n} and is a non-negative definite and symmetric matrix. It can be shown that

    tr{ C ||w_{j_0}||_q^{-2} C^T } > 0   (3.3)

    (cf. Exercise 3.1). Now, the vector ŷ + C||w_{j_0}||_q^{-2} w_{j_0} is in Y(w_0, ···, w_r) and

    tr||x_k − (ŷ + C||w_{j_0}||_q^{-2} w_{j_0})||_n^2
      = tr{ ||x_k − ŷ||_n^2 − (x_k − ŷ, w_{j_0})(C||w_{j_0}||_q^{-2})^T − C||w_{j_0}||_q^{-2}(w_{j_0}, x_k − ŷ) + C||w_{j_0}||_q^{-2} ||w_{j_0}||_q^2 (C||w_{j_0}||_q^{-2})^T }
      = tr{ ||x_k − ŷ||_n^2 − C||w_{j_0}||_q^{-2} C^T }
      < tr||x_k − ŷ||_n^2 = F_k

    by using (3.3). This contradicts the definition of F_k in (3.2).


    Conversely, let (x_k − ŷ, w_j) = 0_{n×q} for all j = 0, 1, ···, r. Let y be an arbitrary random n-vector in Y(w_0, ···, w_r) and write y_0 = y − ŷ = Σ_{j=0}^{r} P_{0j} w_j, where P_{0j} are constant n × q matrices, j = 0, 1, ···, r. Then

    tr||x_k − y||_n^2 = tr||(x_k − ŷ) − y_0||_n^2
      = tr{ ||x_k − ŷ||_n^2 − (x_k − ŷ, y_0) − (y_0, x_k − ŷ) + ||y_0||_n^2 }
      = tr{ ||x_k − ŷ||_n^2 − Σ_{j=0}^{r} (x_k − ŷ, w_j) P_{0j}^T − Σ_{j=0}^{r} P_{0j}(x_k − ŷ, w_j)^T + ||y_0||_n^2 }
      = tr||x_k − ŷ||_n^2 + tr||y_0||_n^2
      ≥ tr||x_k − ŷ||_n^2,

    so that tr||x_k − ŷ||_n^2 = F_k. Furthermore, equality is attained if and only if tr||y_0||_n^2 = 0 or y_0 = 0, so that y = ŷ (cf. Exercise 3.1). This completes the proof of the lemma.

    3.2 Innovations Sequences

    To use the data information, we require an "orthogonalization" process.

    Definition 3.1. Given a random q-vector data sequence {v_j}, j = 0, ···, k, the innovations sequence {z_j}, j = 0, ···, k, of {v_j} (i.e., a sequence obtained by changing the original data sequence {v_j}) is defined by

    z_j = v_j − C_j ŷ_{j−1},   j = 0, 1, ···, k,   (3.4)

    with ŷ_{−1} = 0 and

    ŷ_{j−1} = Σ_{i=0}^{j−1} P_{j−1,i} v_i ∈ Y(v_0, ···, v_{j−1}),   j = 1, ···, k,

    where the q × n matrices C_j are the observation matrices in (3.1) and the n × q matrices P_{j−1,i} are chosen so that ŷ_{j−1} solves the minimization problem (3.2) with Y(w_0, ···, w_r) replaced by Y(v_0, ···, v_{j−1}).

    We first give the correlation property of the innovations sequence.


    Lemma 3.2. The innovations sequence {z_j} of {v_j} satisfies the following property:

    (z_j, z_ℓ) = (R_ℓ + C_ℓ ||x_ℓ − ŷ_{ℓ−1}||_n^2 C_ℓ^T) δ_{jℓ},

    where R_ℓ = Var(η_ℓ) > 0.

    For convenience, we set

    e_j = C_j(x_j − ŷ_{j−1}).   (3.5)

    To prove the lemma, we first observe that

    z_j = e_j + η_j,   (3.6)

    where {η_k} is the observation noise sequence, and

    (η_ℓ, e_j) = 0_{q×q}  for all ℓ ≥ j.   (3.7)

    Clearly, (3.6) follows from (3.4), (3.5), and the observation equation in (3.1). The proof of (3.7) is left to the reader as an exercise (cf. Exercise 3.2). Now, for j = ℓ, we have, by (3.6), (3.7), and (3.5) consecutively,

    (z_ℓ, z_ℓ) = (e_ℓ + η_ℓ, e_ℓ + η_ℓ)
              = (e_ℓ, e_ℓ) + (η_ℓ, η_ℓ)
              = C_ℓ ||x_ℓ − ŷ_{ℓ−1}||_n^2 C_ℓ^T + R_ℓ.

    For j ≠ ℓ, since (e_ℓ, e_j)^T = (e_j, e_ℓ), we can assume without loss of generality that j > ℓ. Hence, by (3.6), (3.7), and Lemma 3.1 we have

    (z_j, z_ℓ) = (e_j, e_ℓ) + (e_j, η_ℓ) + (η_j, e_ℓ) + (η_j, η_ℓ)
              = (e_j, e_ℓ + η_ℓ)
              = (e_j, z_ℓ)
              = (e_j, v_ℓ − C_ℓ ŷ_{ℓ−1})
              = ( C_j(x_j − ŷ_{j−1}), v_ℓ − C_ℓ Σ_{i=0}^{ℓ−1} P_{ℓ−1,i} v_i )
              = C_j(x_j − ŷ_{j−1}, v_ℓ) − C_j Σ_{i=0}^{ℓ−1} (x_j − ŷ_{j−1}, v_i) P_{ℓ−1,i}^T C_ℓ^T
              = 0_{q×q}.

    This completes the proof of the lemma.

  • 3.3 Minimum Variance Estimates 37

    Since Rj > 0, Lemma 3.2 says that {Zj} is an "orthogonal"sequence of nonzero vectors which we can normalize by setting

    (3.8)

    Then {ej} is an "orthonormal" sequence in the sense that (ei' ej) =bij Iq for all i and j. Furthermore, it should be clear that

    (3.9)

    (cf. Exercise 3.3).

    3.3 Minimum Variance Estimates

    We are now ready to give the minimum variance estimate Xk ofthe state vector Xk by introducing the "Fourier expansion"

    k

    Xk = L(Xk' ei)eii=o

    (3.10)

    of Xk with respect to the "orthonormal" sequence {ej}' Since

    k

    (xk,ej) = L(xk,ei)(ei,ej) = (xk,ej)'i=o

    we have(Xk-Xk,ej)=Onxq, j=O,l,···,k.

    It follows from Exercise 3.3 that

    so that by Lemma 3.1,

    That is, Xk is a minimum variance estimate of Xk.

    (3.11 )

    (3.12)

  • 38 3. Orthogonal Projection and Kalman Filter

    3.4 Kalman Filtering Equations

    This section is devoted to the derivation of the Kalman filteringequations. From Assumption 2.1, we first observe that

    (cf. Exercise 3.4), so that

    k

    Xk = L(xk,ej)ejj=ok-l

    = L(xk,ej)ej + (xk,ek)ekj=ok-l

    = L {(Ak-lXk-l, ej )ej + (rk-l~k_l' ej )ej} + (Xk, ek)ekj=o

    k-l

    = Ak- 1 L(xk-l,ej)ej + (xk,ek)ekj=o

    = Ak-1Xk-l + (Xk' ek)ek .

    Hence, by definingXklk-l = Ak-lXk-l ,

    where Xk-l := Xk-llk-l, we have

    (3.13)

    (3.14)

    Obviously, if we can show that there exists a constant n x q matrixGk such that

    (xk,ek)ek = Gk(Vk - CkXk1k-l) '

    then the "prediction-correction" formulation of the Kalman filteris obtained. To accomplish this, we consider the random vector(Vk - CkXklk-l) and obtain the following:

    Lemma 3.3. For j = 0,1, ... ,k,

    To prove the lemma, we first observe that

    (3.15)

  • 3.4 Kalman Filtering Equations 39

    (cf. Exercise 3.4). Hence, using (3.14), (3.11), and (3.15), wehave

    (Vk - CkXklk-1 ,ek)

    = (Vk - Ck(Xklk - (Xk, ek)ek), ek)

    = (Vk' ek) - Ck{ (xklk, ek) - (Xk, ek)}= (vk,ek) - Ck(Xklk - xk,ek)= (Vk' ek)= (Zk + CkYk-1, Il zkll;lzk)= (Zk, Zk) IIzkll;l + Ck(Yk-1, Zk) IIZk 11;1= II Z kllq·

    On the other hand, using (3.14), (3.11), and (3.7), we have

    (Vk - Ck Xk1k -1,ej)

    = (CkXk +'!lk - Ck(Xklk - (xk,ek)ek),ej)= Ck (Xk - Xklk, ej) + ('!lk' ej) + Ck(Xk, ek) (ek' ej)= Oqxq

    for j = 0,1,···, k - 1. This completes the proof of the Lemma.

    It is clear, by using Exercise 3.3 and the definition of Xk-1 =Xk-1Ik-1, that the random q-vector (Vk-CkXklk-1) can be expressedas I:i=o Miei for some constant q xq matrices Mi. It follows nowfrom Lemma 3.3 that for j = 0,1,· .. ,k,

    so that Mo = M1 = ... = Mk-l = 0 and Mk = IIzkllq. Hence,

    Define

    Then we obtain

    (Xk,ek)ek = Gk(Vk - CkXklk-l).

    This, together with (3.14), gives the "prediction-correction"equation:

    (3.16)

  • 40 3. Orthogonal Projection and Kalman Filter

    We remark that xklk is an unbiased estimate of Xk by choosingan appropriate initial estimate. In fact,

    Xk - xklk

    ==Ak-1Xk-l + rk-l~k_l - Ak-lXk-llk-l - Gk(Vk - CkAk-lXk-llk-l) .

    Xk - xklk

    ==(1 - GkCk)Ak-l(Xk-l - Xk-llk-l)

    + (I - GkCk)rk-l~k_l - Gk'!lk .

    Since the noise sequences are of zero-mean, we have

    so that

    Hence, if we set

    (3.17)

    XOIO == E(xo) , (3.18)

    then E(Xk - xklk) == 0 or E(Xklk) == E(Xk) for all k, Le., xklk is indeedan unbiased estimate of Xk.

    Now what is left is to derive a recursive formula for Gk. Using(3.12) and (3.17), we first have

    o == (Xk - xklk, Vk)== ((I - GkCk)Ak-l(Xk-l - Xk-llk-l) + (I - GkCk)rk-l~k_l - Gk'!lk'

    CkAk-l((Xk-l - Xk-llk-l) + Xk-llk-l) + Ckrk-l~k_l + '!lk)v v 2 T T

    == (I - GkCk)Ak-lllxk-l - Xk-llk-lllnAk-l Ckv T T v+ (I - GkCk)rk-lQk-lrk_lck - GkRk, (3.19)

    where we have used the facts that (Xk-l -Xk-llk-l' Xk-llk-l) == Onxn,a consequence of Lemma 3.1, and

    (Xk'~k) == Onxn,

    (Xk'!lj) == Onxq ,

    (Xklk' €.) == Onxn ,-J

    (Xk-llk-l, !lk) == Onxq ,(3.20)

    j == 0, ... , k (cf. Exercise 3.5). Define

    Pk,k == Ilxk - xklkll;

  • 3.4 Kalman Filtering Equations 41

    andPk,k-l = Ilxk - xklk-lll~·

    Then again by Exercise 3.5 we have

    Pk,k-l = IIAk-1Xk-l + rk-l~k_l - Ak-lXk-llk-lll~= Ak-1\lxk-l - Xk-llk-lll~Al-l + rk-lQk-lrl-l

    orPk,k-l = Ak-1Pk-l,k-lAl-..l +rk-lQk-lrl-l' (3.21)

    On the other hand, from (3.19), we also obtainv T T

    (I - GkCk)Ak-lPk-l,k-lAk-lCkv T T v+ (I - GkCk)rk-lQk-lrk_lCk - GkRk = o.

    In solving for Ok from this expression, we writev T T T

    Gk[Rk + Ck(Ak-1Pk-l,k-lAk-l + rk-1Qk-lrk-l)Ck ]= [Ak-lPk-l,k-lAl-l + rk-lQk-lrl-l]C~= Pk,k-lC~ .

    and obtainv T T 1

    Gk = Pk,k-lCk (Rk + CkPk,k-lCk)- , (3.22)where Rk is positive definite and CkPk,k-lC~ is non-negative def-inite so that their sum is positive definite (cf. Exercise 2.2).

    Next, we wish to write Pk,k in terms of Pk,k-l, so that togetherwith (3.21), we will have a recursive scheme. This can be doneas follows:

    Pk,k = II x k - xklkll~= Ilxk - (xklk-l + Ok(Vk - Ckxklk-l))II~

    v v 2= Ilxk - xklk-l - Gk(CkXk + '!lk) + GkCkxklk-llln

    v v 2= 11(1 - GkCk)(Xk - xklk-l) - Gk!lklln

    v 2 v T v VT= (I - GkCk)\Ixk - xklk-llln(1 - GkCk) + GkRkG k

    v v T v VT= (I - GkCk)Pk,k-l(1 - GkCk) + GkRkGk ,

    where we have applied Exercise 3.5 to conclude that (Xk -xklk-l,!lk) = Onxq. This relation can be further simplified by using(3.22). Indeed, since

  • (3.25)

    42 3. Orthogonal Projection and Kalman Filter

    we have.... v T.... .... T

    Pk,k ==(1 - GkCk)Pk,k-l(I - GkCk) + (I - GkCk)Pk,k-l(GkCk)==(1 - OkCk)Pk,k-l . (3.23)

    Therefore, combining (3.13), (3.16), (3.18), (3.21), (3.22) and(3.23), together with

    Po,o == IIxo - xOloll~ == Var(xo) , (3.24)

    we obtain the Kalman filtering equations which agree with theones we derived in Chapter 2. That is, we have xklk == xklk' xklk-l ==xklk-l and Ok == Gk as follows:

    Po,o == Var(xo)

    Pk,k-l == Ak-lPk-l,k-lAl-l + fk-lQk-lfr-lGk == Pk,k-l C-:' (CkPk,k-1C-:' + Rk)-lPk,k == (I - GkCk)Pk,k-lxOlo == E(xo)

    xklk-l == Ak-1Xk-llk-lxklk == xklk-l + Gk(Vk - CkXk1k-l)k == 1,2, ....

    Of course, the Kalman filtering equations (2.18) derived inSection 2.4 for the general linear deterministic/stochastic system

    {

    Xk+l == AkXk + BkUk + rk~kVk == CkXk + DkUk +!1k

    can also be obtained without the assumption on the invertibilityof the matrices Ak, VarC~k,j)' etc. (cf. Exercise 3.6).

    3.5 Real-Time Tracking

    To illustrate the application of the Kalman filtering algorithm de-scribed by (3.25), let us consider an example of real-time tracking.Let x(t), 0 :::; t < 00, denote the trajectory in three-dimensionalspace of a flying object, where t denotes the time variable (cf.Fig.3.1). This vector-valued function is discretized by samplingand quantizing with sampling time h > 0 to yield

    Xk ~ x(kh), k == 0,1,···.

  • 3.5 Real-Time 'fracking 43

    Fig. 3.1.

    -~, ", - - x(t)"'-,

    II

    • x(O)

    For practical purposes, x(t) can be assumed to have continuousfirst and second order derivatives, denoted by x(t) and x(t), re-spectively, so that for small values of h, the position and velocityvectors Xk and Xk ~ x(kh) are governed by the equations

    {

    h · 1h2 ..~k+l = ~k + ~k +"2 XkXk+l = Xk + hXk ,

    where Xk ~ x(kh) and k = 0,1,···. In addition, in many applica-tions only the position (vector) of the flying object is observedat each time instant, so that Vk = CXk with C = [I 0 0] ismeasured. In view of Exercise 3.8, to facilitate our discussion weonly consider the tracking model

    (3.26)

  • 44 3. Orthogonal Projection and Kalman Filter

    to be zero-mean Gaussian white noise sequences satisfying:

    E(~k) = 0, E("1k) = 0,

    E(~k~;) = Qk6kl, E("1k"1l) = rk6kl,

    E(xo~;) = 0, E(Xo"1k) = 0,

    where Qk is a non-negative definite symmetric matrix and rk > 0for all k. It is further assumed that initial conditions E(xo) andVar(xo) are given. For this tracking model, the Kalman filteringalgorithm can be specified as follows: Let Pk := Pk,k and let P[i, j]denote the (i, j)th entry of P. Then we have

    Pk,k-l[l,l] = Pk-l[l, 1] + 2hPk-l[1, 2] + h2Pk_l[1, 3] + h2Pk-l[2, 2]h4

    + h3Pk-l[2, 3] + 4Pk-1[3, 3] + Qk-l[l, 1],

    Pk,k-l[1,2] = Pk,k-l[2, 1]

    3h2= Pk-l[l, 2] + hPk-l[l, 3] + hPk-l[2, 2] + TPk-1[2, 3]

    h3+ 2Pk-1[3, 3] + Qk-l[l, 2],

    Pk,k-l[2,2] = Pk-l[2, 2] + 2hPk-l[2, 3] + h2Pk-l[3, 3] + Qk-l[2, 2],Pk,k-l[1,3] = Pk,k-l[3, 1]

    h2= Pk- 1[1, 3] + hPk- 1[2, 3] + 2Pk-1[3, 3] + Qk-l[l, 3],

    Pk,k-l [2,3] = Pk,k-l [3,2]= Pk-l[2, 3] + hPk-l[3, 3] + Qk-l[2, 3],

    Pk,k-l [3,3] = Pk-l [3,3] + Qk-l [3,3] ,

    with Po,o = Var(xo) ,

  • with :Kala = E(xo).

    Exercises

    Exercises 45

    (3.27)

    3.1. Let A =f. 0 be a non-negative definite and symmetric constantmatrix. Show that trA > o. (Hint: Decompose A as A = BBTwith B =f. 0.)

    3.2. Letj-I

    ej = Cj(Xj - Yj-I) = C j (Xi - L Pi-l,iVi) ,'1.=0

    where Pj-I,i are some constant matrices. Use Assumption 2.1to show that

    for all /!, ? j.3.3. For random vectors WO,"', W r , define

    Y(Wo,"', w r )r

    y= LPiWi,i=O

    Po, ... 'Pr' constant matrices}.

    Letj-I

    Zj = Vj - C j L Pj-l,iVii=O

    be defined as in (3.4) and ej = Ilzjll-lzj' Show that

    3.4. Letj-I

    Yj-I = L Pj-l,iVii=O

    andj-I

    Zj = Vj - C j L Pj-l,iVi .i=O

    Show that

    j = 0,1," . ,k - 1.

  • 46 3. Orthogonal Projection and Kalman Filter

    3.5. Let ej be defined as in Exercise 3.3. Also define

    k

    Xk == L(Xk, ei)eii=O

    as in (3.10). Show that

    (Xk, "l.) == Onxq ,-J

    for j == O,l,···,k.3.6. Consider the linear deterministic/stochastic system

    {Xk+l == AkXk + BkUk + rk{k

    Vk == CkXk + DkUk +!lk '

    where {Uk} is a given sequence of deterministic control inputm-vectors, 1 :::; m :::; n. Suppose that Assumption 2.1 is satis-fied. Derive the Kalman filtering algorithm for this model.

    3.7. Consider a simplified radar tracking model where a large-amplitude and narrow-width impulse signal is transmittedby an antenna. The impulse signal propagates at the speedof light c, and is reflected by a flying object being tracked.The radar antenna receives the reflected signal so that a time-difference b..t is obtained. The range (or distance) d from theradar to the object is then given by d == c~t/2. The impulsesignal is transmitted periodically with period h. Assume thatthe object is traveling at a constant velocity w with randomdisturbance ~ ~ N(O, q), so that the range d satisfies the dif-ference equation

    dk+1 == dk + h(Wk + ~k) .

    Suppose also that the measured range using the formula d ==cb..t/2 has an inherent error ~d and is contaminated with noise"l where "l~N(O,r), so that

    Assume that the initial target range is do which is indepen-dent of ~k and "lk, and that {~k} and {"lk} are also independent(cf. Fig.3.2). Derive a Kalman filtering algorithm as a range-estimator for this radar tracking system.

  • radar

    Exercises 47

    Fig. 3.2.

    3.8. A linear stochastic system for radar tracking can be describedas follows. Let E, ~A, ~E be the range, the azimuthal an-gular error, and the elevational angular error, respectively,of the target, with the radar being located at the origin (cf.Fig.3.3). Consider E, ~A, and ~E as functions of time withfirst and second derivatives denoted by E, ~A, ~E, ~, ~A,~E, respectively. Let h > 0 be the sampling time unit andset Ek = E(kh), Ek = E(kh), Ek = E(kh), etc. Then, using thesecond degree Taylor polynomial approximation, the radartracking model takes on the following linear stochastic state-space description:

    {Xk+l = AXk + rk~k

    Vk = CXk + '!lk'where

    Xk = [Ek Ek Ek ~Ak ~Ak ~Ak ~Ek ~Ek ~Ek] T ,1 h h2 /2o 1 ho 0 1

    1 h h2 /2o 1 ho 0 1

    1 h h2 /20 1 h0 0 1

    C = [~0 0 0 0 0 0 0

    ~] ,0 0 1 0 0 0 00 0 0 0 0 1 0

  • 48 3. Orthogonal Projection and Kalman Filter

    and {{k} and {!lk} are independent zero-mean Gaussian whitenoise sequences with Var({k) = Qk and Var(!lk) = Rk. Assumethat [fI f~] ,r k = k r 2k

    [QI Q~] ,[RI

    Rf] ,Qk = k Q~ Rk = k R2k

    where r~ are 3 x 3 submatrices, Q~, 3 x 3 non-negative def-inite symmetric submatrices, and Rl, 3 x 3 positive definitesymmetric submatrices, for i = 1, 2, 3. Show that this sys-tem can be decoupled into three subsystems with analogousstate-space descriptions.

    Fig. 3.3.

  • 4. Correlated Systemand Measurement Noise Processes

    In the previous two chapters, Kalman filtering for the model in-volving uncorrelated system and measurement noise processeswas studied. That is, we have assumed all along that

    E(~kiJ) = °for k, R = 0,1,' ". However, in applications such as aircraft iner-tial navigation systems, where vibration of the aircraft induces acommon source of noise for both the dynamic driving system andonboard radar measurement, the system and measurement noisesequences {~k} and {!lk} are correlated in the statistical sense,with

    E(~kiJ) = Sk8k£ ,k,P = 0,1,"', where each Sk is a known non-negative definite ma-trix. This chapter is devoted to the study of Kalman filtering forthe above model.

    4.1 The Affine Model

    Consider the linear stochastic state-space description

    {Xk+l = AkXk + rk~k

    Vk = CkXk + !lkwith initial state Xo, where Ak' Ck and rk are known constantmatrices. We start with least-squares estimation as discussed inSection 1.3. Recall that least-squares estimates are linear func-tions of the data vectors; that is, if x is the least-squares estimateof the state vector x using the data v, then it follows that x= Hvfor some matrix H. To study Kalman filtering with correlatedsystem and measurement noise processes, it is necessary to ex-tend to a more general model in determining the estimator x. Itturns out that the affine model

    x=h+Hv (4.1)

  • 50 4. Correlated Noise Processes

    which provides an extra parameter vector h is sufficient. Here, his some constant n-vector and H some constant n x q matrix. Ofcourse, the requirements on our optimal estimate x of x are: x isan unbiased estimator of x in the sense that

    E(x) = E(x)

    and the estimate is of minimum (error) variance.From (4.1) it follows that

    h = E(h) = E(x - Hv) = E(x) - H(E(v)).

    Hence, to satisfy the requirement (4.2), we must have

    h = E(x) - HE(v).

    or, equivalently,x= E(x) - H(E(v) - v) .

    (4.2)

    (4.3)

    (4.4)

    On the other hand, to satisfy the minimum variance requirement,we use the notation

    F(H) = Var(x - x) = Ilx - xII;,

    so that by (4.4) and the fact that Ilvll; = Var(v) is positive definite,we obtain·

    F(H) = (x-x,x-x)= ((x - E(x)) - H(v - E(v)), (x - E(x)) - H(v - E(v)))= Ilxll; - H(v,x) - (x, v)HT + Hllvll~HT= {lIxll; - (x,v)[Ilvll~]-l(v,x)}

    + {Hllvll~HT - H(v, x) - (x, v)HT + (x, v)[lIvll~]-l(v, x)}= {lIxll~ - (x; v) [lIvll~]-l (v, x)}

    + [H - (x, v)[lIvll~]-l]lIvll~[H - (x, v)[lIvll~]-l]T ,

    where the facts that (x, v) T = (v, x) and that Var(v) is nonsingularhave been used.

    Recall that minimum variance estimation means the exis-tence of H* such that F(H) 2:: F(H*), or F(H) - F(H*) is non-negative definite, for all constant matrices H. This can be at-tained by simply setting

    H* = (x, v)[Ilvll~]-l , (4.5)

  • 4.2 Optimal Estimate Operators 51

    so that

    F(H) - F(H*) = [H - (x, v)[llvll~]-l]lIvll~[H - (x, v)[llvll~]-l]T ,

    which is non-negative definite for all constant matrices H. Fur-thermore, H* is unique in the sense that F(H) - F(H*) = 0 if andonly if H = H*. Hence, we can conclude that x can be uniquelyexpressed as

    x = h+H*v,

    where H* is given by (4.5). We will also use the notation x =L(x, v) for the optimal estimate of x with data v, so that by using(4.4) and (4.5), it follows that this "optimal estimate operator"satisfies:

    L(x, v) = E(x) + (x, v)[llvll~]-l(v - E(v)). (4.6)

    4.2 Optimal Estimate Operators

    First, we remark that for any fixed data vector v, L(·, v) is a linearoperator in the sense that

    L(Ax + By, v) = AL(x, v) + BL(y, v) (4.7)

    for all constant matrices A and B and state vectors x and y (cf.Exercise 4.1). In addition, if the state vector is a constant vectora, then

    L(a, v) = a (4.8)

    (cf. Exercise 4.2). This means that if x is a constant vector, sothat E(x) = x, then x = x, or the estimate is exact.

    We need some additional properties of L(x, v). For this pur-pose we first establish the following.

    Lemma 4.1. Let v be a given data vector and y = h+Hv, whereh is determined by the condition E(y) = E(x), so that y is uniquelydetermined by the constant matrix H. If x* is one of the y's suchthat

    trllx - x*ll~ = mintrllx - YII~,H

    then it follows that x* = x, where x= L(x, v) is given by (4.6).

  • 52 4. Correlated Noise Processes

    This lemma says that the minimum variance estimate x andthe "minimum trace variance" estimate x* of x from the samedata v are identical over all affine models.

    To prove the lemma, let us consider

    trllx - YII;== trE((x - y)(x _ y) T)== E((x - y)T (x - y))== E((x - E(x)) - H(v - E(v))T ((x - E(x)) - H(v - E(v)),

    where (4.3) has been used. Taking

    8 28H (trllx - Ylln) == 0,

    we arrive at

    x* == E(x) - (x, v)[Ilvll~]-l(E(v) - v) (4.9)

    which is the same as the x given in (4.6) (cf. Exercise 4.3). Thiscompletes the proof of the Lemma.

    4.3 Effect on Optimal Estimation with Additional Data

    Now, let us recall from Lemma 3.1 in the previous chapter thaty E Y == Y(wo,"" wr ) satisfies

    trllx - YII; == mintrllx - YII;yEY

    if and only if

    (x-y,Wj)==Onxq, j==O,l,···,r.

    Set Y == Y(v - E(v)) and x == L(x, v) == E(x) + H*(v - E(v)), whereH* == (x, v)[llvll~]-l. If we use the notation

    x == x - E(x) and v == v - E(v) ,

    then we obtain

    IIx - xii; == 11 (x - E(x)) - H*(v - E(v))II; == Ilx - H*vll; .

  • 4.3 Effect on Optimal Estimation 53

    But H* was chosen such that F(H*) ::; F(H) for all H, and thisimplies that trF(H*) ::; trF(H) for all H. Hence, it follows that

    trllx - H*vll; ::; trllx - YII;

    for all Y E Y(v - E(v)) = y(v). By Lemma 3.1, we have

    (x - H*v, v) = Onxq .

    Since E(v) is a constant, (x - H*v, E(v)) = Onxq, so that

    (x - H*v, v) = Onxq ,

    or(x - x, v) = Onxq .

    Consider two random data vectors vI and v 2 and set

    (4.10)

    (4.11)

    Then from (4.10) and the definition of the optimal estimate op-erator L, we have

    (4.12)

    and similarly,(v2#, vI) = o. (4.13)

    The following lemma is essential for further investigation.

    Lemma 4.2. Let x be a state vector and vI, v 2 be randomobservation data vectors with nonzero finite variances. Set

    Then the minimum variance estimate x of x using the data v canbe approximated by the minimum variance estimate L(x, vI) of xusing the data vI in the sense that

    with the error

    e(x,v2 ) :=L(x#,v2#)= (x#, v 2#)[llv2#11 2]-Iv 2# .

    (4.14)

    (4.15)

  • 54 4. Correlated Noise Processes

    We first verify (4.15). Since L(x,y1) is an unbiased estimateof x (cf. (4.6)),

    E(x#) = E(x - L(x, yl)) = o.

    Similarly, E(y2#) = o. Hence, by (4.6), we have

    L(x#, y2#) = E(x#) + (x#, y2#) [Il y2# 11 2]-I(y2# - E(y2#))= (x#,y2#)[lIy2#1I 2]-ly2#,

    yielding (4.15). To prove (4.14), it is equivalent to showing that

    XO := L(x, yl) + L(x# ,y2#)

    is an affine unbiased minimum variance estimate of x from thedata y, so that by the uniqueness of x, XO = X = L(x, v). First,note that

    XO = L(x, yl) + L(x#, y2#)= (hI + H1y l) + (h2 + H2(y2 - L(y2, yl))= (hI + H1yl) + h2 + H2(y2 - (h3 + H3yl))

    = (hI + h 2 - H2h3 ) + H [ :~ ]:= h+Hy,

    where H = [HI - H2 H3 H2 ]. Hence, XO is an affine transformationof Y. Next, since E(L(x, yl)) = E(x) and E(L(x#, y2#)) = E(x#) = 0,we have

    E(xO) = E(L(x, yl)) + E(L(x#, y2#)) = E(x).

    Hence, XO is an unbiased estimate of x. Finally, to prove thatXO is a minimum variance estimate of x, we note that by usingLemmas 4.1 and 3.1, it is sufficient to establish the orthogonalityproperty

    (x-XO,y) =Onxq.

    This can be done as follows. By (4.15), (4.11), (4.12), and (4.13),we have

    (x - xO, y)

    = (x# - (x#,y2#)[lIy 2#11 2]-ly2#,y)1 1

    = (x#, [:2]) - (x#,v2#Hllv2#112j-l(v2#, [:2])= (x#, y2) _ (x#, v 2#) [ll y2# 11 2]-1 (y2#, y2) .

  • (4.16)

    4.4 Kalman Filtering Equations 55

    But since v 2 = v 2# + L(v2 , vI), it follows that

    (v2#,v2) = (v2#,v2#) + (v2#,L(v2,vl ))

    from which, by using (4.6), (4.13), and the fact that (v2#,E(vl )) =(v2#, E(v2 )) = 0, we arrive at

    (v2#, L(v2 , vI))

    = (v2#, E(v2 ) + (v2, vl)[llvI1l2]-I(vl - E(vl )))= ((v2#,vl ) _ (v2#,E(vl)))[llvII12]-I(vl,v2)

    =0,

    so that(v2#, v 2) == (v2#, v 2#) .

    Similarly, we also have

    (x#, v 2) = (x#, v 2#) .

    Hence, indeed, we have the orthogonality property:

    (x - xo, v)

    =(x#,v2#) - (x#,v2#)[Ilv2#112]-I(v2#,v2#)

    ==Onxq.

    This completes the proof of the Lemma.

    4.4 Derivation of Kalman Filtering Equations

    We are now ready to study Kalman filtering with correlated sys-tem and measurement noises. Let us again consider the linearstochastic system described by

    {

    Xk+1 = AkXk + rk~kVk = CkXk + '!lk

    with initial state Xo, where Ak, Ck and rk are known constantmatrices. We will adopt Assumption 2.1 here with the exceptionthat the two noise sequences {~k} and {'!lk} may be correlated,namely: we assume that {~k} and {'!lk} are zero-mean Gaussianwhite noise sequences satisfying

    E(~kXci) = Opxn , E('!lkxci ) = Oqxn ,E(~k~;) == Qk8kl , E('!lkiJ) == Rk8kl ,E(~kiJ) = Sk8kl ,

  • 56 4. Correlated Noise Processes

    where Qk, Rk are, respectively, known non-negative definite andpositive definite matrices and Sk is a known, but not necessarilyzero, non-negative definite matrix.

    The problem is to determine the optimal estimate Xk = xklkof the state vector Xk from the data vectors Vo, VI,···, Vk, usingthe initial information E(xo) and Var(xo). We have the followingresult.

    Theorem 4.1. The optimal estimate Xk = xklk of Xk from thedata vo, VI,···, Vk can be computed recursively as follows: Define

    Po,o = Var(xo) .

    Then, for k = 1,2, ... , compute

    Pk,k-I =(Ak- I - Kk-ICk-I)Pk-l,k-I(Ak-1 - Kk-ICk-l) T

    + rk-IQk-Irl-1 - Kk-IRk-IKl-I , (a)

    where(b)

    and the Kalman gain matrix

    withPk,k = (I - GkCk)Pk,k-1 .

    Then, with the initial condition

    xOlo = E(xo) ,

    compute, for k = 1,2,· .. , the prediction estimates

    (c)

    (d)

    and the correction estimates

    (cf. Fig.4.1).

    (f)

  • 4.4 Kalman Filtering Equations 57

    These are the Kalman filtering equations for correlated sys-tem and measurement noise processes. We remark that if thesystem noise and measurement noise are uncorrelated; that is,Sk-I = Opxq, so that Kk-I = Onxq for all k = 1,2,···, then theabove Kalman filtering equations reduce to the ones discussed inChapters 2 and 3.

    +

    Fig. 4.1.

    We will first derive the prediction-correction formulas (e) and(f). In this process, the matrices Pk,k-I, Pk,k, and Gk will bedefined, and their computational schemes (a), (b), (c), and (d)will be determined. Let

    Then, v k , V k - I , and Vk can be considered as the data vectors v,VI, and v 2 in Lemma 4.2, respectively. Also, set

    A L( k-I)Xklk-I = Xk, v ,Xklk = L(Xk, v k ) ,

    and# A L( k-I)X k = Xk - Xklk-I = Xk - Xk, v .

  • 58 4. Correlated Noise Processes

    Then we have the following properties

    (~k-1'Vk-

    2) = 0,

    (~k-1'Xk-1) = 0,(Xt-1'~k_1) = 0,(Xk-1Ik-2, ~k-1) = 0,

    ('1k-1' Vk-

    2) = 0,

    ('1k-1' Xk-1) = 0,(xt-1' '1k-1) = 0,(Xk-1Ik-2, '!lk-1) = 0,

    (4.17)

    (cf. Exercise 4.4). To derive the prediction formula, the idea is toadd the "zero term" Kk-1(Vk-1 - Ck-1Xk-1 - '!lk-1) to the estimate

    For an appropriate matrix Kk-1, we could eliminate the noise cor-relation in the estimate by absorbing it in Kk-1. More precisely,since L(.,vk - 1 ) is linear, it follows that

    Xklk-1

    =L(Ak-1Xk-1 + rk-1~k_1 + Kk-1(Vk-1 - Ck-1Xk-1 - '!lk-1)' v k- 1)=L((Ak- 1 - Kk-1Ck-1)Xk-1 + Kk-1Vk-1

    + (rk-1~k_1 - Kk-1'!1.k_1) , Vk- 1)=(Ak-1 - Kk-1Ck-1)L(Xk-1, v k- 1) + Kk-1L(Vk-1, Vk- 1)

    + L(rk-1~k_1 - Kk-1'!lk_1' v k - 1 ):=11 + 12 + 13 .

    We will force the noise term 13 to be zero by choosing Kk-1 appro-priately. To accomplish this, observe that from (4.6) and (4.17),we first have

    13 = L(rk-1~k_1 - Kk-1'!lk_1' vk

    -1

    )

    = E(rk-1~k_1 - Kk-1!l.k_1)

    + (rk-1~k_1 - Kk-1!l.k_1' vk-1)[llvk-1112]-1(vk-1 - E(vk- 1))

    = (rk-l~k_l - Kk-l!lk_l' [::~:J) [llvk-1112rl(vk-l - E(Vk- 1))

    = (rk-1~k_1 - Kk-1!l.k_1' Vk_1)[!Ivk-1112]-1(vk-1 - E(vk- 1))

    = (rk-1~k_1 - Kk-1'!lk_1' Ck-lXk-l + '1k-1) [lIvk-1112]-1(vk-l - E(vk- 1))= (rk-lSk-l - Kk_lRk_l)[llvk-1112]-1(vk-l - E(Vk- 1)).

    Hence, by choosing

  • 4.4 Kalman Filtering Equations 59

    so that (b) is satisfied, we have 13 = o. Next, 11 and 12 can bedetermined as follows:

    11 = (Ak- 1 - Kk-1Ck-l)L(Xk-l, v k - 1 )= (Ak- 1 - Kk-1Ck-l)Xk-llk-l

    and, by Lemma 4.2 with vt-l = Vk-l - L(Vk-l, vk- 2), we have

    12 = Kk-1L(Vk-l, v k - 1)k-2

    = Kk-1L(Vk- 1, [V ])Vk-l

    = Kk-l(L(Vk-l, vk- 2) + (vt-l' vt_l)[llvt_11l 2]-lvt_l)= Kk-1(L(Vk-l, vk- 2) + vt-l)= Kk-lVk-l.

    Hence, it follows that

    Xklk-l = 11+ 12= (Ak-l - Kk-1Ck-l)Xk-llk-l + Kk-lVk-l= Ak-lXk-llk-l + Kk-l(Vk-l - Ck-1Xk-llk-l)

    which is the prediction formula (e).To derive the correction formula, we use Lemma 4.2 again to

    conclude that

    Xklk = L(Xk,vk- 1 ) + (xt,vt)[llvtI12]-lvt= xklk-l + (xt,vt)[llvtIl2]-lvt, (4.18)

    where

    and, using (4.6) and (4.17), we arrive at

    vt = Vk - L(Vk, vk- 1)= CkXk + ~k - L(CkXk + ~k' v k - 1)= CkXk + ~k - CkL(Xk, v k - 1) - L(~k' v k - 1)= Ck(Xk - L(Xk, V k - 1)) + ~k - E('!lk)

    - (!1k' vk-l)[lIvk-1112]-1(vk-l - E(vk- 1))

    = Ck(Xk - L(Xk, v k - 1)) + ~k= Ck(Xk - xklk-l) + '!lk .

  • 60 4. Correlated Noise Processes

    Hence, by applying (4.17) again, it follows from (4.18) that

    Xklk == xklk-I + (Xk - xklk-I, Ck(Xk - xklk-I) + !!..k). [IICk(Xk - xklk-I) + !lkll~]-I(Ck(xk - xklk-I) + !lk)

    == Xk\k-I + IIx k - xk\k-IIl;C~. [Ckllxk - xklk-III;C~ + Rk]-I(Vk - CkXk1k-l)

    == xklk-I + Gk(Vk - CkXk1k-I) '

    which is the correction formula (f), if we set

    Pk,j == Ilxk - xk1jll;

    andGk == Pk,k-IC-: (CkPk,k-IC-: + Rk)-I. (4.19)

    What is left is to verify the recursive relations (a) and (d) forPk,k-I and Pk,k. To do so, we need the following two formulas, thejustification of which is left to the reader:

    (4.20)

    and(Xk-I - Xk-Ilk-I, rk-lik_1 - Kk-l!!..k_l) == Onxn , (4.21)

    (cf. Exercise 4.5).Now, using (e), (b), and (4.21) consecutively, we have

    Pk,k-I

    == II x k - xklk-Ill;== II A k-I Xk-1 + rk-l~k_1 - Ak-IXk-Ilk-1

    - Kk-I(Vk-1 - Ck-IXk-Ilk-I)II;

    == IIAk-IXk-1 + rk-l~k_1 - Ak-IXk-Ilk-1- Kk-I(Ck-IXk-1 + !lk-I - Ck-IXk-Ilk-I)II;

    == II(Ak- 1 - Kk-ICk-I)(Xk-1 - Xk-Ilk-I)+ (rk-l~k_1 - Kk-l!!..k_l) 11;

    == (Ak- I - Kk-ICk-I)Pk-Ilk-I(Ak-1 - Kk-ICk-l) T + rk-IQk-IrJ-I+ Kk-IRk-IK-:_I - rk-ISk-IK-:_I - Kk-IS~-lrJ-I

    == (Ak- I - Kk-ICk-I)Pk-l,k-1 (Ak- I - Kk-ICk-I) T+ rk-IQk-IrJ-I - Kk-IRk-IK~_1 ,

    which is (a).

  • (4.22)

    4.5 Real-Time Applications 61

    Finally, using (f), (4.17), and (4.20) consecutively, we alsohave

    Pk,k

    = Ilxk - xklkll;= Ilxk - xklk-l - Gk(Vk - CkXklk-l)ll~= lI(xk ~ xklk-l) - Gk(CkXk + '!lk - CkXklk-l)ll;= 11(1 - GkCk)(Xk - xklk-l) - Gk'!lk ll ;= (1 - GkCk)Pk,k-l(1 - GkCk)T + GkRkGl= (1 - GkCk)Pk,k-l - (1 - GkCk)Pk,k-lC-:Gl + GkRkGl= (1 - GkCk)Pk,k-l ,

    which is (d). This completes the proof of the theorem.

    4.5 Real-Time Applications

    An aircraft radar guidance system provides a typical application.This system can be described by using the model (3.26) con-sidered in the last chapter, with only one modification, namely:the tracking radar is now onboard to provide the position datainformation. Hence, both the system and measurement noise pro-cesses come from the same source such as vibration, and will becorrelated. For instance, let us consider the following state-spacedescription

    [Xk+l[l]] [1 h h2 /2] [Xk[l]] [~k[l]]Xk+l[2] = 0 1 h xk[2] + ~k[2]Xk+l[3] 0 0 1 xk[3] ~k[3]

    [

    Xk[l]]Vk = [1 0 0] xk[2] + TJk ,

    xk[3]

    where {{k}' with {k := [~k[l] ~k[2] ~k[3]]T, and {TJk} are assumedto be correlated zero-mean Gaussian white noise sequences sat-isfying

    E({k) = 0, E(TJk) = 0,

    E({k(;) = Qk6ki!, E(TJkTJi!) = rk 6ki!, E({kTJi!) = Sk 6ki!,

    E(xo{~) = 0, E(XoTJk) = 0 ,

    with Qk ~ 0, rk > 0, Sk := [sk[l] sk[2] sk[3]]T ~ 0 for all k, andE(xo) and Var(xo) are both assumed to be given.

  • 62 4. Correlated Noise Processes

    An application of Theorem 4.1 to this system yields the fol-lowing:

    Pk,k-l [1,1]

    == Pk-l[l, 1] + 2hPk-l[l, 2] + h2Pk-l[l, 3] + h2Pk-l[2, 2]h4

    + h3Pk-d2, 3J + 4"Pk-l[3, 3J + Qk-d1, 1]

    + Sk-l[l] {Sk-d1JPk-d1, 1J - 2(Pk-l[l, 1]rk-l rk-l

    + hPk-d1,2J + ~2 Pk-d1,3J) - Sk-l[11},

    Pk,k-l [1, 2] == Pk,k-l [2, 1]3h2

    == Pk- 1 [1,2] + hPk-l[I,3] + hPk-l[2,2] + T Pk- 1 [2,3]

    h3

    {Sk-l[I]Sk-l[2]+ -Pk-l[3, 3] + Qk-l[l, 2] + 2 Pk-l[I,I]

    2 r k - 1

    Sk-l[l] ( ) Sk-l[2] (--- Pk-l[I,2]+ hPk-l[I,3] --- Pk-l[I,I]rk-l rk-l

    h2

    ) Sk-I[I]Sk-l[2]}+ hPk-l[l, 2] + -Pk- 1 [1, 3] - ,2 rk-l

    Pk,k-l [2,2]

    == Pk-l[2, 2] + 2hPk-l[2, 3] + h2Pk-l[3, 3] + Qk-I[2, 2]

    Sk-l [2] { Sk-l [2] ( )}+-- --Pk-I[2,2] - 2 Pk- I [I,2] + hPk- 1 [1, 3] - Sk-l[2] ,rk-l rk-l

    Pk,k-l [1,3] == Pk,k-l [3,1]h2

    == Pk-I[I, 3] + hPk-I[2, 3] + 2Pk-1[3, 3] + Qk-l[l, 3]

    + { Sk-d11Sk-l