
Appendix A

Matrix Properties

THIS appendix provides a reasonably comprehensive account of matrix properties, which are used in the linear algebra of estimation and control theory. Several theorems are stated but not proven here; those proofs that are given are constructive (i.e., they suggest an algorithm). The account is therefore not fully self-contained, but references are provided where rigorous proofs may be found.

    A.1 Basic Definitions of Matrices

The system of $m$ linear equations

$$
\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
&\;\;\vdots \\
y_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{aligned}
\tag{A.1}
$$

can be written in matrix form as

$$y = A\,x \tag{A.2}$$

where $y$ is an $m \times 1$ vector, $x$ is an $n \times 1$ vector (see Section A.2 for a definition of a vector), and $A$ is an $m \times n$ matrix, with

$$
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}
\tag{A.3}
$$

If $m = n$, then the matrix $A$ is square.

Matrix Addition, Subtraction, and Multiplication

Matrices can be added, subtracted, or multiplied. For addition and subtraction, all matrices must be of the same dimension. Suppose we wish to add or subtract two matrices $A$ and $B$:

$$C = A \pm B \tag{A.4}$$


Then each element of $C$ is given by $c_{ij} = a_{ij} \pm b_{ij}$. Matrix addition is commutative, $A + B = B + A$, and associative, $(A + B) + C = A + (B + C)$. Matrix multiplication is more involved. Suppose we wish to multiply two matrices $A$ and $B$:

$$C = A\,B \tag{A.5}$$

This operation is valid only when the number of columns of $A$ equals the number of rows of $B$ (i.e., $A$ and $B$ must be conformable). The resulting matrix $C$ has as many rows as $A$ and as many columns as $B$. Thus, if $A$ has dimension $m \times n$ and $B$ has dimension $n \times p$, then $C$ has dimension $m \times p$. The $c_{ij}$ element of $C$ is given by

$$c_{ij} = \sum_{k=1}^{n} a_{ik}\,b_{kj} \tag{A.6}$$

for all $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, p$. Matrix multiplication is associative, $A\,(B\,C) = (A\,B)\,C$, and distributive, $A\,(B + C) = A\,B + A\,C$, but not commutative in general, $A\,B \neq B\,A$. If, in a particular case, $A\,B = B\,A$, then $A$ and $B$ are said to commute.

The transpose of a matrix, denoted $A^T$, has rows that are the columns of $A$ and columns that are the rows of $A$. The transpose operator has the following properties:

$$(\alpha A)^T = \alpha A^T, \quad \text{where } \alpha \text{ is a scalar} \tag{A.7a}$$
$$(A + B)^T = A^T + B^T \tag{A.7b}$$
$$(A\,B)^T = B^T A^T \tag{A.7c}$$

If $A = A^T$, then $A$ is said to be a symmetric matrix. Also, if $A = -A^T$, then $A$ is said to be a skew symmetric matrix.
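As a concrete illustration of eqns. (A.5) through (A.7c), the following NumPy sketch (not part of the original text; the matrices are arbitrary examples) checks the element-by-element product rule of eqn. (A.6) and the transpose identity $(A\,B)^T = B^T A^T$:

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # 2 x 3
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])             # 3 x 2, conformable with A

C = A @ B                               # eqn. (A.5); C is 2 x 2
# element-by-element form of eqn. (A.6)
c11 = sum(A[0, k] * B[k, 0] for k in range(A.shape[1]))
assert np.isclose(C[0, 0], c11)

# transpose identity, eqn. (A.7c)
assert np.allclose((A @ B).T, B.T @ A.T)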

    Matrix Inverse

We now discuss the properties of the matrix inverse. Suppose we are given both $y$ and $A$ in eqn. (A.2), and we want to determine $x$. The following terminology should be noted carefully: if $m > n$, the system in eqn. (A.2) is said to be overdetermined (there are more equations than unknowns). Under typical circumstances an exact solution for $x$ does not exist; algorithms for approximate solutions for $x$ are therefore characterized by some measure of how well the linear equations are satisfied. If $m < n$, the system is underdetermined (there are fewer equations than unknowns) and, in general, infinitely many solutions exist. If $m = n$ and $A$ is nonsingular, the unique solution is $x = A^{-1}y$, where $A^{-1}$, the inverse of $A$, exists if and only if:


$A$ has linearly independent columns.
$A$ has linearly independent rows.

The inverse satisfies $A^{-1}A = A\,A^{-1} = I$

where $I$ is an $n \times n$ identity matrix:

$$
I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}
\tag{A.8}
$$

A nonsingular matrix is a matrix whose inverse exists (if $A$ is nonsingular, then $A^T$ is likewise nonsingular):

$$(A^{-1})^{-1} = A \tag{A.9a}$$
$$(A^T)^{-1} = (A^{-1})^T \equiv A^{-T} \tag{A.9b}$$

Furthermore, let $A$ and $B$ be $n \times n$ matrices. The matrix product $A\,B$ is nonsingular if and only if $A$ and $B$ are nonsingular. If this condition is met, then

$$(A\,B)^{-1} = B^{-1} A^{-1} \tag{A.10}$$

Formal proof of this relationship and other relationships is given in Ref. [1]. The inverse of a square matrix $A$ can be computed by

$$A^{-1} = \frac{\text{adj}(A)}{\det(A)} \tag{A.11}$$

where $\text{adj}(A)$ is the adjoint of $A$ and $\det(A)$ is the determinant of $A$. The adjoint and determinant of a matrix of large dimension can ultimately be broken down into a series of $2 \times 2$ cases, for which the adjoint and determinant are given by

$$\text{adj}(A_{2\times 2}) = \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} \tag{A.12a}$$
$$\det(A_{2\times 2}) = a_{11}a_{22} - a_{12}a_{21} \tag{A.12b}$$

Other determinant identities are given by

$$\det(I) = 1 \tag{A.13a}$$
$$\det(A\,B) = \det(A)\det(B) \tag{A.13b}$$
$$\det(A\,B) = \det(B\,A) \tag{A.13c}$$
$$\det(A\,B + I) = \det(B\,A + I) \tag{A.13d}$$
$$\det(A + x\,y^T) = \det(A)\,(1 + y^T A^{-1} x) \tag{A.13e}$$
$$\det(A)\det(D + C\,A^{-1}B) = \det(D)\det(A + B\,D^{-1}C) \tag{A.13f}$$

$$\det(A^2) = [\det(A)]^2, \quad \text{which must be positive if } \det(A) \neq 0 \tag{A.13g}$$
$$\det(\alpha A) = \alpha^n \det(A) \tag{A.13h}$$

$$\det(A_{3\times 3}) \equiv \det\begin{bmatrix} a & b & c \end{bmatrix} = a^T[b\times]\,c = b^T[c\times]\,a = c^T[a\times]\,b \tag{A.13i}$$


where the matrices $[a\times]$, $[b\times]$, and $[c\times]$ are defined in eqn. (A.38). The adjoint is given by the transpose of the cofactor matrix:

$$\text{adj}(A) = [\text{cof}(A)]^T \tag{A.14}$$

The cofactor is given by

$$C_{ij} = (-1)^{i+j} M_{ij} \tag{A.15}$$

where $M_{ij}$ is the minor, which is the determinant of the matrix obtained by crossing out row $i$ and column $j$ of the element $a_{ij}$. The determinant can be computed using an expansion about row $i$ or column $j$:

$$\det(A) = \sum_{k=1}^{n} a_{ik}\,C_{ik} = \sum_{k=1}^{n} a_{kj}\,C_{kj} \tag{A.16}$$
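The following NumPy sketch (illustrative, not from the original text) evaluates a determinant by the cofactor expansion of eqn. (A.16) and the inverse by eqn. (A.11); for matrices of any significant size the factorization methods of Section A.4 are preferable:

import numpy as np

def minor(A, i, j):
    """Determinant of A with row i and column j removed (the minor M_ij)."""
    sub = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return det_cofactor(sub)

def det_cofactor(A):
    """Determinant by cofactor expansion about the first row, eqn. (A.16)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    return sum((-1) ** k * A[0, k] * minor(A, 0, k) for k in range(n))

def inverse_adjoint(A):
    """Inverse via adj(A)/det(A), eqns. (A.11), (A.14), and (A.15)."""
    n = A.shape[0]
    cof = np.array([[(-1) ** (i + j) * minor(A, i, j) for j in range(n)]
                    for i in range(n)])
    return cof.T / det_cofactor(A)          # adj(A) = cof(A)^T

A = np.array([[4.0, 2.0, 1.0], [0.0, 3.0, -1.0], [2.0, 1.0, 5.0]])
assert np.isclose(det_cofactor(A), np.linalg.det(A))
assert np.allclose(inverse_adjoint(A), np.linalg.inv(A))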

From eqn. (A.11), $A^{-1}$ exists if and only if the determinant of $A$ is nonzero. Matrix inverses are usually complicated to compute numerically; however, a special case occurs when the inverse is given by the transpose of the matrix itself. Such a matrix is said to be orthogonal, with the property

$$A^T A = A\,A^T = I \tag{A.17}$$

Also, the determinant of an orthogonal matrix can be shown to be $\pm 1$. An orthogonal matrix preserves the length (norm) of a vector (see eqn. (A.27) for a definition of the norm of a vector). Hence, if $A$ is an orthogonal matrix, then $||A\,x|| = ||x||$.

Block Structures and Other Identities

Matrices can also be analyzed using block structures. Assume that $A$ is an $n \times n$ matrix and that $C$ is an $m \times m$ matrix. Then, we have

$$\det\begin{bmatrix} A & B \\ 0 & C \end{bmatrix} = \det\begin{bmatrix} A & 0 \\ B & C \end{bmatrix} = \det(A)\det(C) \tag{A.18a}$$

$$\det\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det(A)\det(P) = \det(D)\det(Q) \tag{A.18b}$$

$$
\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1}
= \begin{bmatrix} Q^{-1} & -Q^{-1} B\,D^{-1} \\ -D^{-1} C\,Q^{-1} & D^{-1}(I + C\,Q^{-1} B\,D^{-1}) \end{bmatrix}
= \begin{bmatrix} A^{-1}(I + B\,P^{-1} C\,A^{-1}) & -A^{-1} B\,P^{-1} \\ -P^{-1} C\,A^{-1} & P^{-1} \end{bmatrix}
\tag{A.18c}
$$

where $P$ and $Q$ are Schur complements of $A$ and $D$, respectively:

$$P \equiv D - C\,A^{-1} B \tag{A.19a}$$
$$Q \equiv A - B\,D^{-1} C \tag{A.19b}$$


Other useful matrix identities involve the Sherman-Morrison lemma, given by

$$(I + A\,B)^{-1} = I - A\,(I + B\,A)^{-1} B \tag{A.20}$$

and the matrix inversion lemma, given by

$$(A + B\,C\,D)^{-1} = A^{-1} - A^{-1} B\,(D\,A^{-1} B + C^{-1})^{-1} D\,A^{-1} \tag{A.21}$$

where $A$ is an arbitrary $n \times n$ matrix and $C$ is an arbitrary $m \times m$ matrix. A proof of the matrix inversion lemma is given in §1.3.
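A quick numerical check of eqns. (A.20) and (A.21) can be written in a few lines of NumPy (an illustrative sketch with randomly chosen, well-conditioned matrices, not part of the original text):

import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3
A = rng.standard_normal((n, n)) + 5.0 * np.eye(n)   # keep A well conditioned
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, m)) + 5.0 * np.eye(m)
D = rng.standard_normal((m, n))
inv = np.linalg.inv

# Sherman-Morrison lemma, eqn. (A.20), with B (n x m) and D (m x n)
lhs = inv(np.eye(n) + B @ D)
rhs = np.eye(n) - B @ inv(np.eye(m) + D @ B) @ D
assert np.allclose(lhs, rhs)

# matrix inversion lemma, eqn. (A.21)
lhs = inv(A + B @ C @ D)
rhs = inv(A) - inv(A) @ B @ inv(D @ inv(A) @ B + inv(C)) @ D @ inv(A)
assert np.allclose(lhs, rhs)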

    Matrix Trace

Another useful quantity often used in estimation theory is the trace of a matrix, which is defined only for square matrices:

$$\text{Tr}(A) = \sum_{i=1}^{n} a_{ii} \tag{A.22}$$

Some useful identities involving the matrix trace are given by

$$\text{Tr}(\alpha A) = \alpha\,\text{Tr}(A) \tag{A.23a}$$
$$\text{Tr}(A + B) = \text{Tr}(A) + \text{Tr}(B) \tag{A.23b}$$
$$\text{Tr}(A\,B) = \text{Tr}(B\,A) \tag{A.23c}$$
$$\text{Tr}(x\,y^T) = x^T y \tag{A.23d}$$
$$\text{Tr}(A\,y\,x^T) = x^T A\,y \tag{A.23e}$$
$$\text{Tr}(A\,B\,C\,D) = \text{Tr}(B\,C\,D\,A) = \text{Tr}(C\,D\,A\,B) = \text{Tr}(D\,A\,B\,C) \tag{A.23f}$$

Equation (A.23f) shows the cyclic invariance of the trace. The operation $y\,x^T$ is known as the outer product (note that $y\,x^T \neq x\,y^T$ in general).

Solution of Triangular Systems

An upper triangular system of linear equations has the form

$$
\begin{aligned}
t_{11}x_1 + t_{12}x_2 + t_{13}x_3 + \cdots + t_{1n}x_n &= y_1 \\
t_{22}x_2 + t_{23}x_3 + \cdots + t_{2n}x_n &= y_2 \\
t_{33}x_3 + \cdots + t_{3n}x_n &= y_3 \\
&\;\;\vdots \\
t_{nn}x_n &= y_n
\end{aligned}
\tag{A.24}
$$

or

$$T\,x = y \tag{A.25}$$

where

$$
T = \begin{bmatrix}
t_{11} & t_{12} & t_{13} & \cdots & t_{1n} \\
0 & t_{22} & t_{23} & \cdots & t_{2n} \\
0 & 0 & t_{33} & \cdots & t_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & t_{nn}
\end{bmatrix}
\tag{A.26}
$$


The matrix $T$ can be shown to be nonsingular if and only if its diagonal elements are nonzero [1]. Clearly, $x_n$ can be determined immediately from the last equation of the upper triangular form. The remaining $x_i$ coefficients can be determined by a back substitution algorithm:

for $i = n, n-1, \ldots, 1$
  $x_i = t_{ii}^{-1}\left( y_i - \sum_{j=i+1}^{n} t_{ij}\,x_j \right)$
next $i$

This algorithm will fail only if $t_{ii} \approx 0$. But this can occur only if $T$ is singular (or nearly singular). Experience indicates that the algorithm is well behaved for most applications.
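The back substitution loop above translates directly into NumPy; the sketch below (illustrative only) solves $T\,x = y$ for an upper triangular $T$:

import numpy as np

def back_substitution(T, y):
    """Solve T x = y for upper triangular T, as in eqns. (A.24)-(A.26)."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):            # i = n, n-1, ..., 1 (0-based)
        x[i] = (y[i] - T[i, i + 1:] @ x[i + 1:]) / T[i, i]
    return x

T = np.array([[2.0, 1.0, -1.0],
              [0.0, 3.0,  2.0],
              [0.0, 0.0,  4.0]])
y = np.array([1.0, 5.0, 8.0])
x = back_substitution(T, y)
assert np.allclose(T @ x, y)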

The back substitution algorithm can be modified to compute the inverse, $S = T^{-1}$, of an upper triangular matrix $T$. We now summarize an algorithm for calculating $S = T^{-1}$ and overwriting $T$ by $T^{-1}$:

for $k = n, n-1, \ldots, 1$
  $t_{kk} \leftarrow s_{kk} = t_{kk}^{-1}$
  $t_{ik} \leftarrow s_{ik} = -t_{ii}^{-1} \sum_{j=i+1}^{k} t_{ij}\,s_{jk}, \quad i = k-1, k-2, \ldots, 1$
next $k$

where $\leftarrow$ denotes replacement (the symbol $x \leftarrow y$ means overwrite $x$ by the current $y$-value; this notation indicates how storage may be conserved by overwriting quantities that are no longer needed). This algorithm requires about $n^3/6$ calculations (note: if only the solution for $x$ is required and not the explicit form of $T^{-1}$, then the back substitution algorithm alone should be employed, since it requires only about $n^2/2$ calculations).
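A direct (not in-place) transcription of this recursion is sketched below in NumPy; it is illustrative only, and in practice a library routine such as scipy.linalg.solve_triangular would normally be used instead:

import numpy as np

def upper_triangular_inverse(T):
    """Invert an upper triangular matrix by the column-wise recursion above."""
    n = T.shape[0]
    S = np.zeros_like(T, dtype=float)
    for k in range(n - 1, -1, -1):            # k = n, n-1, ..., 1 (0-based)
        S[k, k] = 1.0 / T[k, k]
        for i in range(k - 1, -1, -1):        # i = k-1, ..., 1 (0-based)
            S[i, k] = -np.dot(T[i, i + 1:k + 1], S[i + 1:k + 1, k]) / T[i, i]
    return S

T = np.array([[2.0, 1.0, -1.0],
              [0.0, 3.0,  2.0],
              [0.0, 0.0,  4.0]])
S = upper_triangular_inverse(T)
assert np.allclose(S @ T, np.eye(3))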

    A.2 Vectors

The quantities $x$ and $y$ in eqn. (A.2) are known as vectors, which are a special case of a matrix. Vectors can consist of one row, known as a row vector, or one column, known as a column vector.

    Vector Norm and Dot Product

    A measure of the length of a vector is given by the norm:

$$||x|| \equiv \sqrt{x^T x} = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} \tag{A.27}$$

Also, $||\alpha\,x|| = |\alpha|\,||x||$ for any scalar $\alpha$. A vector with norm one is said to be a unit vector.

[Figure A.1: Depiction of the angle between two vectors and an orthogonal projection. (a) Angle between Two Vectors; (b) Orthogonal Projection.]

Any nonzero vector can be made into a unit vector by dividing it by its norm:

$$\hat{x} \equiv \frac{x}{||x||} \tag{A.28}$$

Note that the caret is also used to denote an estimate in this text. The dot product or inner product of two vectors of equal dimension, $n \times 1$, is given by

$$x^T y = y^T x = \sum_{i=1}^{n} x_i\,y_i \tag{A.29}$$

If the dot product is zero, then the vectors are said to be orthogonal. Suppose that a set of vectors $x_i$ $(i = 1, 2, \ldots, m)$ satisfies

$$x_i^T x_j = \delta_{ij} \tag{A.30}$$

where the Kronecker delta $\delta_{ij}$ is defined as

$$\delta_{ij} = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} \tag{A.31}$$

Then this set is said to be orthonormal. The column and row vectors of an orthogonal matrix, defined by the property shown in eqn. (A.17), form orthonormal sets.

    Angle Between Two Vectors and the Orthogonal Projection

Figure A.1(a) shows two vectors, $x$ and $y$, and the angle $\theta$ between them. This angle can be computed from the cosine law:

$$\cos(\theta) = \frac{x^T y}{||x||\,||y||} \tag{A.32}$$


[Figure A.2: Cross Product and the Right Hand Rule.]

Figure A.1(b) shows the orthogonal projection of a vector $y$ onto a vector $x$. The orthogonal projection of $y$ onto $x$ is given by

$$p = \frac{x^T y}{||x||^2}\,x \tag{A.33}$$

This projection yields $(y - p)^T x = 0$.

Triangle and Schwartz Inequalities

Some important inequalities are given by the triangle inequality:

$$||x + y|| \leq ||x|| + ||y|| \tag{A.34}$$

and the Schwartz inequality:

$$|x^T y| \leq ||x||\,||y|| \tag{A.35}$$

Note that the Schwartz inequality implies the triangle inequality.
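The next NumPy fragment (an illustrative sketch, not part of the original text) evaluates eqns. (A.32) through (A.35) for two specific vectors:

import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, 0.0, 1.0])
norm = np.linalg.norm

theta = np.arccos(x @ y / (norm(x) * norm(y)))      # angle between x and y, eqn. (A.32)
assert np.isclose(np.sin(theta), norm(np.cross(x, y)) / (norm(x) * norm(y)))  # eqn. (A.45)

p = (x @ y) / norm(x) ** 2 * x                      # projection of y onto x, eqn. (A.33)
assert np.isclose((y - p) @ x, 0.0)

# triangle and Schwartz inequalities, eqns. (A.34) and (A.35)
assert norm(x + y) <= norm(x) + norm(y)
assert abs(x @ y) <= norm(x) * norm(y)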

    Cross Product

The cross product of two vectors yields a vector that is perpendicular to both vectors. The cross product of $x$ and $y$ is given by

$$
z = x \times y = \begin{bmatrix} x_2 y_3 - x_3 y_2 \\ x_3 y_1 - x_1 y_3 \\ x_1 y_2 - x_2 y_1 \end{bmatrix}
\tag{A.36}
$$

The cross product follows the right hand rule, which states that the orientation of $z$ is determined by placing $x$ and $y$ tail-to-tail, flattening the right hand, extending it in the direction of $x$, and then curling the fingers in the direction of the angle that $y$ makes with $x$. The thumb then points in the direction of $z$, as shown in Figure A.2. The cross product can also be obtained using matrix multiplication:

$$z = [x\times]\,y \tag{A.37}$$


where $[x\times]$ is the cross product matrix, defined by

$$
[x\times] \equiv \begin{bmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{bmatrix}
\tag{A.38}
$$

Note that $[x\times]$ is a skew symmetric matrix. The cross product has the following properties:

$$[x\times]^T = -[x\times] \tag{A.39a}$$
$$[x\times]\,y = -[y\times]\,x \tag{A.39b}$$
$$[x\times][y\times] = -x^T y\,I + y\,x^T \tag{A.39c}$$
$$[x\times]^3 = -x^T x\,[x\times] \tag{A.39d}$$
$$[x\times][y\times] - [y\times][x\times] = y\,x^T - x\,y^T = [(x \times y)\times] \tag{A.39e}$$
$$x\,y^T[w\times] + [w\times]\,y\,x^T = -[\{x \times (y \times w)\}\times] \tag{A.39f}$$
$$(I - [x\times])(I + [x\times])^{-1} = \frac{1}{1 + x^T x}\left[(1 - x^T x)\,I + 2\,x\,x^T - 2\,[x\times]\right] \tag{A.39g}$$
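A short NumPy check of the cross product matrix of eqn. (A.38) and a few of the identities in eqn. (A.39) is sketched below (illustrative only):

import numpy as np

def cross_matrix(x):
    """Cross product matrix [x x] of eqn. (A.38), so that cross_matrix(x) @ y = x x y."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

x = np.array([1.0, -2.0, 0.5])
y = np.array([0.3, 4.0, -1.0])
X, Y = cross_matrix(x), cross_matrix(y)

assert np.allclose(X @ y, np.cross(x, y))                               # eqn. (A.37)
assert np.allclose(X.T, -X)                                             # eqn. (A.39a)
assert np.allclose(X @ Y, -np.dot(x, y) * np.eye(3) + np.outer(y, x))   # eqn. (A.39c)
assert np.allclose(X @ Y - Y @ X, cross_matrix(np.cross(x, y)))          # eqn. (A.39e)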

Other useful properties involving an arbitrary $3 \times 3$ square matrix $M$ are given by [2]

$$M[x\times] + [x\times]M^T + [(M^T x)\times] = \text{Tr}(M)\,[x\times] \tag{A.40a}$$
$$M[x\times]M^T = [\{\text{adj}(M^T)\,x\}\times] \tag{A.40b}$$
$$(M\,x) \times (M\,y) = \text{adj}(M^T)\,(x \times y) \tag{A.40c}$$
$$[\{(M\,x) \times (M\,y)\}\times] = M\,[(x \times y)\times]\,M^T \tag{A.40d}$$

If we write $M$ in terms of its columns,

$$M = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \tag{A.41}$$

then

$$\det(M) = x_1^T (x_2 \times x_3) \tag{A.42}$$

Also, if $A$ is an orthogonal matrix with determinant $+1$, then from eqn. (A.40b) we have

$$A\,[x\times]\,A^T = [(A\,x)\times] \tag{A.43}$$

Another important quantity, using eqn. (A.39c), is given by

$$[x\times]^2 = -x^T x\,I + x\,x^T \tag{A.44}$$

For a unit vector $x$, $-[x\times]^2 = I - x\,x^T$ is the projection operator onto the space perpendicular to $x$. Many other interesting relations involving the cross product are given in Ref. [3].


Table A.1: Matrix and Vector Norms

One-norm:        $||x||_1 = \sum_{i=1}^{n} |x_i|$;   $||A||_1 = \max_j \sum_{i=1}^{n} |a_{ij}|$
Two-norm:        $||x||_2 = \left(\sum_{i=1}^{n} x_i^2\right)^{1/2}$;   $||A||_2 =$ maximum singular value of $A$
Frobenius norm:  $||x||_F = ||x||_2$;   $||A||_F = \sqrt{\text{Tr}(A\,A^*)}$
Infinity-norm:   $||x||_\infty = \max_i |x_i|$;   $||A||_\infty = \max_i \sum_{j=1}^{n} |a_{ij}|$

The angle $\theta$ in Figure A.1(a) can also be computed from

$$\sin(\theta) = \frac{||x \times y||}{||x||\,||y||} \tag{A.45}$$

Using $\sin^2(\theta) + \cos^2(\theta) = 1$, eqns. (A.32) and (A.45) also give

$$||x \times y|| = \sqrt{(x^T x)(y^T y) - (x^T y)^2} \tag{A.46}$$

From the Schwartz inequality in eqn. (A.35), the quantity within the square root in eqn. (A.46) is always non-negative.

    A.3 Matrix Norms and Definiteness

Norms for matrices are slightly more difficult to define than for vectors. Also, the definition of a positive or negative matrix is more complicated for matrices than for scalars. Before showing these quantities, we first define the following terms for any complex matrix $A$:

Conjugate transpose: the transpose of the conjugate of each element; denoted by $A^*$.

Hermitian: has the property $A^* = A$ (note: any real symmetric matrix is Hermitian).

Normal: has the property $A^* A = A\,A^*$.

Unitary: the inverse is equal to the Hermitian (conjugate) transpose, so that $A^* A = A\,A^* = I$ (note: a real unitary matrix is an orthogonal matrix).


    Matrix Norms

Several possible matrix norms can be defined. Table A.1 lists the most commonly used norms for both vectors and matrices. The one-norm is the largest column sum. The two-norm is the maximum singular value (see §A.4). Also, unless otherwise stated, a norm written without a subscript is the two-norm, as shown in eqn. (A.27). The Frobenius norm is defined as the square root of the sum of the absolute squares of the elements. The infinity-norm is the largest row sum. The matrix norms described in Table A.1 have the following properties:

$$||\alpha A|| = |\alpha|\,||A|| \tag{A.47a}$$
$$||A + B|| \leq ||A|| + ||B|| \tag{A.47b}$$
$$||A\,B|| \leq ||A||\,||B|| \tag{A.47c}$$

Not all norms follow eqn. (A.47c), though (e.g., the maximum absolute matrix element). More matrix norm properties can be found in Refs. [4] and [5].
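The norms in Table A.1 map directly onto numpy.linalg.norm; the sketch below (illustrative) evaluates each of them for a small matrix:

import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

one_norm = np.linalg.norm(A, 1)          # largest column sum
two_norm = np.linalg.norm(A, 2)          # maximum singular value
fro_norm = np.linalg.norm(A, 'fro')      # sqrt of sum of squared elements
inf_norm = np.linalg.norm(A, np.inf)     # largest row sum

assert np.isclose(one_norm, max(abs(A).sum(axis=0)))
assert np.isclose(two_norm, np.linalg.svd(A, compute_uv=False)[0])
assert np.isclose(fro_norm, np.sqrt(np.trace(A.T @ A)))
assert np.isclose(inf_norm, max(abs(A).sum(axis=1)))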

    Definiteness

Sufficiency tests in least squares and the minimization of functions of multiple variables often require one to determine the definiteness of the matrix of second partial derivatives. A real and square matrix $A$ is

Positive definite if $x^T A\,x > 0$ for all nonzero $x$.
Positive semi-definite if $x^T A\,x \geq 0$ for all nonzero $x$.
Negative definite if $x^T A\,x < 0$ for all nonzero $x$.
Negative semi-definite if $x^T A\,x \leq 0$ for all nonzero $x$.

For any positive semi-definite matrix $A \geq 0$ there exists a unique positive semi-definite matrix $B$ (the matrix square root of $A$) such that $A = B^2$ (note: $A$ and $B$ commute, so that $A\,B = B\,A$). The following relationship:

$$B > A \tag{A.49}$$

implies $(B - A) > 0$, which states that the matrix $(B - A)$ is positive definite. Also,

$$B \geq A \tag{A.50}$$


implies $(B - A) \geq 0$, which states that the matrix $(B - A)$ is positive semi-definite. The conditions for negative definite and negative semi-definite follow analogously from the definitions stated for positive definite and positive semi-definite.
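For a symmetric matrix, definiteness can be checked from the signs of its eigenvalues; the following NumPy sketch (illustrative, not from the original text) classifies a matrix this way:

import numpy as np

def definiteness(A, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eigvals = np.linalg.eigvalsh(A)           # real eigenvalues of a symmetric matrix
    if np.all(eigvals > tol):
        return "positive definite"
    if np.all(eigvals >= -tol):
        return "positive semi-definite"
    if np.all(eigvals < -tol):
        return "negative definite"
    if np.all(eigvals <= tol):
        return "negative semi-definite"
    return "indefinite"

A = np.array([[2.0, -1.0], [-1.0, 2.0]])
print(definiteness(A))                         # positive definite
print(definiteness(np.zeros((2, 2))))          # positive semi-definite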

    A.4 Matrix Decompositions

Several matrix decompositions are given in the open literature. Many of these decompositions are used in place of a matrix inverse, either to simplify the calculations or to provide more numerically robust approaches. In this section we present several useful matrix decompositions that are widely used in estimation and control theory. The methods used to compute these decompositions are beyond the scope of the present text; Ref. [4] provides all the necessary algorithms and proofs for the interested reader. Before we proceed, a short description of the rank of a matrix is provided. Several definitions are possible. We will state that the rank of a matrix is given by the dimension of the range of the matrix, corresponding to the number of linearly independent rows or columns. An $m \times n$ matrix $A$ is rank deficient if the rank of $A$ is less than $\min(m, n)$. Suppose that the rank of an $n \times n$ matrix $A$ is given by $\text{rank}(A) = r$. Then a set of $(n - r)$ nonzero unit vectors, $x_i$, can always be found that have the following property for a singular square matrix $A$:

$$A\,x_i = 0, \quad i = 1, 2, \ldots, n - r \tag{A.51}$$

The value of $(n - r)$ is known as the nullity, which is the maximum number of linearly independent null vectors of $A$. These vectors can form an orthonormal basis (which is how they are commonly shown) for the null space of $A$, and can be computed from the singular value decomposition. If $A$ is nonsingular, then no nonzero vector $x_i$ can be found to satisfy eqn. (A.51). For more details on the rank of a matrix see Refs. [4] and [6].

    Eigenvalue/Eigenvector Decomposition and the Cayley-Hamilton Theorem

One of the most widely used decompositions for a square $n \times n$ matrix $A$ in the study of dynamical systems is the eigenvalue/eigenvector decomposition. A real or complex number $\lambda$ is an eigenvalue of $A$ if there exists a nonzero (right) eigenvector $p$ such that

$$A\,p = \lambda\,p \tag{A.52}$$

The solution for $p$ is not unique in general, so $p$ is usually given as a unit vector. In order for eqn. (A.52) to have a nonzero solution for $p$, from eqn. (A.51), the matrix $(\lambda I - A)$ must be singular. Therefore, from eqn. (A.11) we have

$$\det(\lambda I - A) = \lambda^n + \alpha_1 \lambda^{n-1} + \cdots + \alpha_{n-1}\lambda + \alpha_n = 0 \tag{A.53}$$

Equation (A.53) leads to a polynomial of degree $n$, which is called the characteristic equation of $A$. For example, the characteristic equation for a $3 \times 3$ matrix is given


by

$$\lambda^3 - \lambda^2\,\text{Tr}(A) + \lambda\,\text{Tr}[\text{adj}(A)] - \det(A) = 0 \tag{A.54}$$

If all eigenvalues of $A$ are distinct, then the set of eigenvectors is linearly independent.

Therefore, the matrix $A$ can be diagonalized as

$$\Lambda = P^{-1} A\,P \tag{A.55}$$

where $\Lambda = \text{diag}\,[\lambda_1\;\; \lambda_2\;\; \cdots\;\; \lambda_n]$ and $P = [\,p_1\;\; p_2\;\; \cdots\;\; p_n\,]$. If $A$ has repeated eigenvalues, then a block diagonal and triangular-form representation must be used, called a Jordan block [6]. The eigenvalue/eigenvector decomposition can be used for linear state variable transformations (see §3.1.4). Eigenvalues and eigenvectors can be either real or complex. This decomposition is very useful when $A$ is symmetric, since $\Lambda$ is always diagonal (even for repeated eigenvalues) and $P$ is orthogonal in this case. A proof is given in Ref. [4]. Also, Ref. [4] provides many algorithms to compute the eigenvalue/eigenvector decomposition.

One of the most useful properties in linear algebra is the Cayley-Hamilton theorem, which states that a matrix satisfies its own characteristic equation, so that

$$A^n + \alpha_1 A^{n-1} + \cdots + \alpha_{n-1} A + \alpha_n I = 0 \tag{A.56}$$

This theorem is useful for computing powers of $A$ that are larger than $n$, since $A^{n+1}$ can be written as a linear combination of $(A, A^2, \ldots, A^n)$ [6].
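The sketch below (illustrative only) diagonalizes a small symmetric matrix per eqn. (A.55) and verifies the Cayley-Hamilton theorem of eqn. (A.56) using the characteristic polynomial coefficients returned by numpy.poly:

import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, -1.0],
              [0.0, -1.0, 2.0]])

# eigenvalue/eigenvector decomposition, eqn. (A.55)
eigvals, P = np.linalg.eig(A)
Lam = np.linalg.inv(P) @ A @ P
assert np.allclose(Lam, np.diag(eigvals))

# Cayley-Hamilton theorem, eqn. (A.56): A satisfies its own characteristic equation
coeffs = np.poly(A)                        # [1, alpha_1, ..., alpha_n]
n = A.shape[0]
result = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))
assert np.allclose(result, np.zeros((n, n)))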

QR Decomposition

The QR decomposition is especially useful in least squares (see §1.6.1) and the Square Root Information Filter (SRIF) (see §5.7.1). The QR decomposition of an $m \times n$ matrix $A$, with $m \geq n$, is given by

$$A = Q\,R \tag{A.57}$$

where $Q$ is an $m \times m$ orthogonal matrix, and $R$ is an upper triangular $m \times n$ matrix with all elements $R_{ij} = 0$ for $i > j$. If $A$ has full column rank, then the first $n$ columns of $Q$ form an orthonormal basis for the range of $A$ [4]. Therefore, the thin QR decomposition is often used:

$$A = Q\,R \tag{A.58}$$

where $Q$ is now an $m \times n$ matrix with orthonormal columns and $R$ is an upper triangular $n \times n$ matrix. Since the QR decomposition is widely used throughout the present text, we present a numerical algorithm to compute this decomposition by the modified Gram-Schmidt method [4]. Let $A$ and $Q$ be partitioned by columns as $[\,a_1\;\, a_2\;\, \cdots\;\, a_n\,]$ and $[\,q_1\;\, q_2\;\, \cdots\;\, q_n\,]$, respectively. To begin the algorithm we set $Q = A$, and then

for $k = 1, 2, \ldots, n$
  $r_{kk} = ||q_k||_2$
  $q_k \leftarrow q_k / r_{kk}$
  $r_{kj} = q_k^T q_j, \quad j = k+1, \ldots, n$


  $q_j \leftarrow q_j - r_{kj}\,q_k, \quad j = k+1, \ldots, n$
next $k$

where $\leftarrow$ denotes replacement (note: $r_{kk}$ and $r_{kj}$ are elements of the matrix $R$). This algorithm works even when $A$ is complex. The QR decomposition is also useful for inverting an $n \times n$ matrix $A$, since $A^{-1} = R^{-1} Q^T$ (note: the inverse of an upper triangular matrix is also triangular). Other methods, based on the Householder transformation and Givens rotations, can be used to compute the QR decomposition [4].
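The modified Gram-Schmidt procedure above can be transcribed into NumPy as follows (an illustrative sketch for real matrices; numpy.linalg.qr would normally be preferred in practice):

import numpy as np

def mgs_qr(A):
    """Thin QR decomposition by modified Gram-Schmidt, as in eqn. (A.58)."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = A.copy()                       # the algorithm starts with Q = A
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] = Q[:, k] / R[k, k]
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ Q[:, j]
            Q[:, j] = Q[:, j] - R[k, j] * Q[:, k]
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Q, R = mgs_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(2))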

    Singular Value Decomposition

Another decomposition of an $m \times n$ matrix $A$ is the singular value decomposition [4, 7], which decomposes a matrix into a diagonal matrix and two orthogonal matrices:

$$A = U\,S\,V^* \tag{A.59}$$

where $U$ is an $m \times m$ unitary matrix, $S$ is an $m \times n$ diagonal matrix such that $S_{ij} = 0$ for $i \neq j$, and $V$ is an $n \times n$ unitary matrix. Many efficient algorithms can be used to determine the singular value decomposition [4]. Note that the zeros below the diagonal in $S$ (with $m > n$) imply that the elements of columns $(n+1), (n+2), \ldots, m$ of $U$ are arbitrary. So, we can define the following reduced singular value decomposition:

$$A = \tilde{U}\,\tilde{S}\,\tilde{V}^* \tag{A.60}$$

where $\tilde{U}$ is the $m \times n$ subset matrix of $U$ (with columns $(n+1), (n+2), \ldots, m$ eliminated), $\tilde{S}$ is the upper $n \times n$ partition of $S$, and $\tilde{V} = V$. Note that $\tilde{U}^* \tilde{U} = I$, but it is no longer possible to make the same statement for $\tilde{U}\,\tilde{U}^*$. The elements of $\tilde{S} = \text{diag}\,[s_1\;\cdots\;s_n]$ are known as the singular values of $A$, which are ordered from the smallest singular value to the largest singular value. These values are extremely

important since they can give an indication of how well we can invert a matrix [8]. A common measure of the invertibility of a matrix is the condition number, which is usually defined as the ratio of its largest singular value to its smallest singular value:

$$\text{Condition Number} = \frac{s_n}{s_1} \tag{A.61}$$

Large condition numbers may indicate a nearly singular matrix, and the minimum value of the condition number is unity (which occurs when the matrix is orthogonal).

The rank of $A$ is given by the number of nonzero singular values. Also, the singular value decomposition is useful for determining various norms (e.g., $||A||_F^2 = s_1^2 + \cdots + s_p^2$, where $p = \min(m, n)$, and the two-norm as shown in Table A.1).
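The following NumPy sketch (illustrative only) computes the decomposition of eqn. (A.59), the condition number of eqn. (A.61), and the rank from the singular values; note that numpy.linalg.svd returns the singular values ordered from largest to smallest:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

U, s, Vt = np.linalg.svd(A)                # full SVD, eqn. (A.59)
S = np.zeros(A.shape)
S[:len(s), :len(s)] = np.diag(s)
assert np.allclose(U @ S @ Vt, A)

condition_number = s[0] / s[-1]            # largest / smallest singular value, eqn. (A.61)
rank = np.sum(s > 1e-12)                   # number of nonzero singular values
assert rank == np.linalg.matrix_rank(A)
assert np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(s ** 2))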

Gaussian Elimination

Gaussian elimination is a classical reduction procedure by which a matrix $A$ can be reduced to upper triangular form. This procedure involves pre-multiplications of a square matrix $A$ by a sequence of elementary lower triangular matrices, each chosen to introduce a column with zeros below the diagonal (this process is often called annihilation). Several possible variations of Gaussian elimination can be derived.


We present a very robust algorithm called Gaussian elimination with complete pivoting. This approach requires data movements such as the interchange of two matrix rows. These interchanges can be tracked by using permutation matrices, which are just identity matrices with rows or columns reordered. For example, consider the following matrix:

$$
P = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}
\tag{A.62}
$$

So $P\,A$ is a row-permuted version of $A$ and $A\,P$ is a column-permuted version of $A$. Permutation matrices are orthogonal. The algorithm computes the complete pivoting factorization

$$A = P\,L\,U\,Q^T \tag{A.63}$$

where $P$ and $Q$ are permutation matrices, $L$ is a unit (with ones along the diagonal) lower triangular matrix, and $U$ is an upper triangular matrix. The algorithm begins by setting $P = Q = I$, which are partitioned into column vectors as

$$P = [\,p_1\;\; p_2\;\; \cdots\;\; p_n\,], \quad Q = [\,q_1\;\; q_2\;\; \cdots\;\; q_n\,] \tag{A.64}$$

The algorithm for Gaussian elimination with complete pivoting is given by overwriting the $A$ matrix:

for $k = 1, 2, \ldots, n-1$
  Determine $\mu$ with $k \leq \mu \leq n$ and $\lambda$ with $k \leq \lambda \leq n$ so that $|a_{\mu\lambda}| = \max\{|a_{ij}|,\; i = k, \ldots, n,\; j = k, \ldots, n\}$
  if $\mu \neq k$
    $p_k \leftrightarrow p_\mu$
    $a_{kj} \leftrightarrow a_{\mu j}, \quad j = 1, 2, \ldots, n$
  end if
  if $\lambda \neq k$
    $q_k \leftrightarrow q_\lambda$
    $a_{jk} \leftrightarrow a_{j\lambda}, \quad j = 1, 2, \ldots, n$
  end if
  if $a_{kk} \neq 0$
    $a_{jk} \leftarrow a_{jk}/a_{kk}, \quad j = k+1, \ldots, n$
    $a_{ji} \leftarrow a_{ji} - a_{jk}\,a_{ki}, \quad i, j = k+1, \ldots, n$
  end if
next $k$

where $\leftarrow$ denotes replacement and $\leftrightarrow$ denotes interchange of the values assigned. The matrix $U$ is given by the upper triangular part (including the diagonal elements) of the overwritten $A$ matrix, and the matrix $L$ is given by the lower triangular part (replacing the diagonal elements with ones) of the overwritten $A$ matrix. More details on Gaussian elimination can be found in Ref. [4].


LU and Cholesky Decompositions

The LU decomposition factors an $n \times n$ matrix $A$ into a product of a lower triangular matrix $L$ and an upper triangular matrix $U$, so that

$$A = L\,U \tag{A.65}$$

Gaussian elimination is a foremost example of an LU decomposition. In general, the LU decomposition is not unique. This can be seen by observing that for an arbitrary nonsingular diagonal matrix $D$, setting $\bar{L} = L\,D$ and $\bar{U} = D^{-1}U$ yields new lower and upper triangular matrices that satisfy $\bar{L}\,\bar{U} = L\,D\,D^{-1}U = L\,U = A$. The fact that the decomposition is not unique suggests the possible wisdom of forming the normalized decomposition

$$A = L\,D\,U \tag{A.66}$$

in which $L$ and $U$ are unit lower and upper triangular matrices and $D$ is a diagonal matrix. The question of existence and uniqueness is addressed by Stewart [1], who proves that the $A = L\,D\,U$ decomposition is unique, provided the leading principal sub-matrices of $A$ are nonsingular.

There are three important variants of the $L\,D\,U$ decomposition. The first associates $D$ with the lower triangular part to give the factorization

$$A = \tilde{L}\,U \tag{A.67}$$

where $\tilde{L} \equiv L\,D$. This is known as the Crout reduction. The second variant associates $D$ with the upper triangular factor as

$$A = L\,\tilde{U} \tag{A.68}$$

where $\tilde{U} \equiv D\,U$. This reduction is exactly that obtained by Gaussian elimination. The third variant is possible only for symmetric positive definite matrices, in which case

$$A = L\,D\,L^T \tag{A.69}$$

Thus $A$ can be written as

$$A = \tilde{L}\,\tilde{L}^T \tag{A.70}$$

where now $\tilde{L} \equiv L\,D^{1/2}$ is known as the matrix square root, and the factorization in eqn. (A.70) is known as the Cholesky decomposition. Efficient algorithms to compute the LU and Cholesky decompositions can be found in Ref. [4].
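For a symmetric positive definite matrix, the Cholesky factor of eqn. (A.70) is available directly in NumPy; the sketch below (illustrative) also recovers the $L\,D\,L^T$ form of eqn. (A.69):

import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 3.0, 1.0],
              [2.0, 1.0, 3.0]])

# Cholesky decomposition, eqn. (A.70)
L_tilde = np.linalg.cholesky(A)
assert np.allclose(L_tilde @ L_tilde.T, A)

# recover the L D L^T form of eqn. (A.69)
d = np.diag(L_tilde) ** 2                 # diagonal of D
L = L_tilde / np.diag(L_tilde)            # unit lower triangular factor
assert np.allclose(L @ np.diag(d) @ L.T, A)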

    A.5 Matrix Calculus

In this section several relations are given for taking partial or time derivatives of matrices. (Most of these relations can be found on a website maintained by Mike Brooks, Imperial College, London, UK; as of this writing the website is http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html.) Before providing a list of matrix calculus identities, we first define


the Jacobian and Hessian of a scalar function $f(x)$, where $x$ is an $n \times 1$ vector. The Jacobian of $f(x)$ is an $n \times 1$ vector given by

$$
\nabla_x f \equiv \frac{\partial f}{\partial x} = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\[1ex] \dfrac{\partial f}{\partial x_2} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{bmatrix}
\tag{A.71}
$$

The Hessian of $f(x)$ is an $n \times n$ matrix given by

$$
\nabla_x^2 f \equiv \frac{\partial^2 f}{\partial x\,\partial x^T} = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1 \partial x_1} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\[1ex]
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n \partial x_n}
\end{bmatrix}
\tag{A.72}
$$

Note that the Hessian of a scalar is a symmetric matrix. If $f(x)$ is an $m \times 1$ vector and $x$ is an $n \times 1$ vector, then the Jacobian matrix is given by

$$
\nabla_x f \equiv \frac{\partial f}{\partial x} = \begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\[1ex]
\dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial f_m}{\partial x_1} & \dfrac{\partial f_m}{\partial x_2} & \cdots & \dfrac{\partial f_m}{\partial x_n}
\end{bmatrix}
\tag{A.73}
$$

Note that the Jacobian matrix is an $m \times n$ matrix. Also, there is a slight inconsistency between eqn. (A.71) and eqn. (A.73) when $m = 1$, since eqn. (A.71) gives an $n \times 1$ vector while eqn. (A.73) gives a $1 \times n$ vector. This should pose no problem for the reader, since the context of the notation is clear for each particular system shown in this text.
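As an illustration of eqns. (A.71) and (A.72) (not part of the original text), the sketch below approximates the Jacobian and Hessian of the scalar function $f(x) = x_1^2 x_2 + \sin(x_2)$ by central finite differences:

import numpy as np

def f(x):
    return x[0] ** 2 * x[1] + np.sin(x[1])

def jacobian(f, x, h=1e-6):
    """Numerical Jacobian (gradient) of a scalar function, eqn. (A.71)."""
    n = len(x)
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(f, x, h=1e-4):
    """Numerical Hessian of a scalar function, eqn. (A.72)."""
    n = len(x)
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        H[:, j] = (jacobian(f, x + e) - jacobian(f, x - e)) / (2 * h)
    return H

x = np.array([1.0, 2.0])
print(jacobian(f, x))    # analytically [2*x1*x2, x1^2 + cos(x2)]
print(hessian(f, x))     # analytically [[2*x2, 2*x1], [2*x1, -sin(x2)]]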


    A list of derivatives involving linear products is given by

$$\frac{\partial}{\partial x}\,(A\,x) = A \tag{A.74a}$$
$$\frac{\partial}{\partial A}\,(a^T A\,b) = a\,b^T \tag{A.74b}$$
$$\frac{\partial}{\partial A}\,(a^T A^T b) = b\,a^T \tag{A.74c}$$
$$\frac{d}{dt}(A\,B) = A\,\frac{d}{dt}(B) + \left[\frac{d}{dt}(A)\right] B \tag{A.74d}$$

    A list of derivatives involving quadratic and cubic products is given by

$$\frac{\partial}{\partial x}\,(A\,x + b)^T C\,(D\,x + e) = A^T C\,(D\,x + e) + D^T C^T (A\,x + b) \tag{A.75a}$$
$$\frac{\partial}{\partial x}\,(x^T C\,x) = (C + C^T)\,x \tag{A.75b}$$
$$\frac{\partial}{\partial A}\,(a^T A^T A\,b) = A\,(a\,b^T + b\,a^T) \tag{A.75c}$$
$$\frac{\partial}{\partial A}\,(a^T A^T C\,A\,b) = C^T A\,a\,b^T + C\,A\,b\,a^T \tag{A.75d}$$
$$\frac{\partial}{\partial A}\,(A\,a + b)^T C\,(A\,a + b) = (C + C^T)\,(A\,a + b)\,a^T \tag{A.75e}$$
$$\frac{\partial}{\partial x}\,(x^T A\,x\,x^T) = (A + A^T)\,x\,x^T + (x^T A\,x)\,I \tag{A.75f}$$

    A list of derivatives involving the inverse of a matrix is given by

$$\frac{d}{dt}(A^{-1}) = -A^{-1}\left[\frac{d}{dt}(A)\right] A^{-1} \tag{A.76a}$$
$$\frac{\partial}{\partial A}\,(a^T A^{-1} b) = -A^{-T} a\,b^T A^{-T} \tag{A.76b}$$

    A list of derivatives involving the trace of a matrix is given by

$$\frac{\partial}{\partial A}\,\text{Tr}(A) = \frac{\partial}{\partial A}\,\text{Tr}(A^T) = I \tag{A.77a}$$

$$\frac{\partial}{\partial A}\,\text{Tr}(A^{-1}) = -\left(A^{-2}\right)^T \tag{A.77b}$$

$$\frac{\partial}{\partial A}\,\text{Tr}(C\,A^{-1}B) = -A^{-T} C^T B^T A^{-T} \tag{A.77c}$$
$$\frac{\partial}{\partial A}\,\text{Tr}(C^T A\,B^T) = \frac{\partial}{\partial A}\,\text{Tr}(B\,A^T C) = C\,B \tag{A.77d}$$
$$\frac{\partial}{\partial A}\,\text{Tr}(C\,A\,B\,A^T D) = C^T D^T A\,B^T + D\,C\,A\,B \tag{A.77e}$$
$$\frac{\partial}{\partial A}\,\text{Tr}(C\,A\,B\,A) = C^T A^T B^T + B^T A^T C^T \tag{A.77f}$$


    A list of derivatives involving the determinant of a matrix is given by

$$\frac{\partial}{\partial A}\det(A) = \frac{\partial}{\partial A}\det(A^T) = [\text{adj}(A)]^T \tag{A.78a}$$
$$\frac{\partial}{\partial A}\det(C\,A\,B) = \det(C\,A\,B)\,A^{-T} \tag{A.78b}$$
$$\frac{\partial}{\partial A}\ln[\det(C\,A\,B)] = A^{-T} \tag{A.78c}$$
$$\frac{\partial}{\partial A}\det(A) = \det(A)\,A^{-T} \tag{A.78d}$$
$$\frac{\partial}{\partial A}\ln[\det(A)] = A^{-T} \tag{A.78e}$$
$$\frac{\partial}{\partial A}\det(A^T C\,A) = \det(A^T C\,A)\,(C + C^T)\,A\,(A^T C\,A)^{-1} \tag{A.78f}$$
$$\frac{\partial}{\partial A}\ln[\det(A^T C\,A)] = (C + C^T)\,A\,(A^T C\,A)^{-1} \tag{A.78g}$$

    Relations involving the Hessian matrix are given by

$$\frac{\partial^2}{\partial x\,\partial x^T}\,(A\,x + b)^T C\,(D\,x + e) = A^T C\,D + D^T C^T A \tag{A.79a}$$
$$\frac{\partial^2}{\partial x\,\partial x^T}\,(x^T C\,x) = C + C^T \tag{A.79b}$$
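A finite-difference check of eqns. (A.75b) and (A.79b) is sketched below (illustrative only, with a randomly chosen matrix):

import numpy as np

rng = np.random.default_rng(1)
n = 4
C = rng.standard_normal((n, n))
x = rng.standard_normal(n)
h = 1e-6
I = np.eye(n)

f = lambda v: v @ C @ v                    # scalar function x^T C x

# gradient by central differences versus eqn. (A.75b): (C + C^T) x
grad_fd = np.array([(f(x + h * I[i]) - f(x - h * I[i])) / (2 * h) for i in range(n)])
assert np.allclose(grad_fd, (C + C.T) @ x, atol=1e-5)

# Hessian by differencing the analytic gradient versus eqn. (A.79b): C + C^T
grad = lambda v: (C + C.T) @ v
hess_fd = np.column_stack([(grad(x + h * I[j]) - grad(x - h * I[j])) / (2 * h) for j in range(n)])
assert np.allclose(hess_fd, C + C.T, atol=1e-5)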

References

[1] Stewart, G.W., Introduction to Matrix Computations, Academic Press, New York, NY, 1973.

[2] Shuster, M.D., "A Survey of Attitude Representations," Journal of the Astronautical Sciences, Vol. 41, No. 4, Oct.–Dec. 1993, pp. 439–517.

[3] Tempelman, W., "The Linear Algebra of Cross Product Operations," Journal of the Astronautical Sciences, Vol. 36, No. 4, Oct.–Dec. 1988, pp. 447–461.

[4] Golub, G.H. and Van Loan, C.F., Matrix Computations, The Johns Hopkins University Press, Baltimore, MD, 3rd ed., 1996.

[5] Zhang, F., Linear Algebra: Challenging Problems for Students, The Johns Hopkins University Press, Baltimore, MD, 1996.

[6] Chen, C.T., Linear System Theory and Design, Holt, Rinehart and Winston, New York, NY, 1984.

[7] Horn, R.A. and Johnson, C.R., Matrix Analysis, Cambridge University Press, Cambridge, MA, 1985.

[8] Nash, J.C., Compact Numerical Methods for Computers: Linear Algebra and Function Minimization, Adam Hilger Ltd., Bristol, 1979.