
Appendix A

Matrix Properties

THIS appendix provides a reasonably comprehensive account of matrix properties, which are used in the linear algebra of estimation and control theory. Several theorems are stated but not proven here; those proofs that are given are constructive (i.e., they suggest an algorithm). The account is therefore not fully self-contained, but references are provided where rigorous proofs may be found.

    A.1 Basic Definitions of Matrices

The system of $m$ linear equations

$$
\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
&\;\;\vdots \\
y_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{aligned}
\tag{A.1}
$$

can be written in matrix form as

$$y = A\,x \tag{A.2}$$

where $y$ is an $m \times 1$ vector, $x$ is an $n \times 1$ vector (see Section A.2 for a definition of a vector), and $A$ is an $m \times n$ matrix, with

$$
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}
\tag{A.3}
$$

If $m = n$, then the matrix $A$ is square.

Matrix Addition, Subtraction, and Multiplication

Matrices can be added, subtracted, or multiplied. For addition and subtraction, all matrices must be of the same dimension. Suppose we wish to add or subtract two matrices $A$ and $B$:

$$C = A \pm B \tag{A.4}$$


Then each element of $C$ is given by $c_{ij} = a_{ij} \pm b_{ij}$. Matrix addition is commutative, $A + B = B + A$, and associative, $(A + B) + C = A + (B + C)$. Matrix multiplication is more involved. Suppose we wish to multiply two matrices $A$ and $B$:

$$C = A\,B \tag{A.5}$$

This operation is valid only when the number of columns of $A$ equals the number of rows of $B$ (i.e., $A$ and $B$ must be conformable). The resulting matrix $C$ has as many rows as $A$ and as many columns as $B$. Thus, if $A$ has dimension $m \times n$ and $B$ has dimension $n \times p$, then $C$ has dimension $m \times p$. The $c_{ij}$ element of $C$ is given by

$$c_{ij} = \sum_{k=1}^{n} a_{ik}\,b_{kj} \tag{A.6}$$

for all $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, p$. Matrix multiplication is associative, $A\,(B\,C) = (A\,B)\,C$, and distributive, $A\,(B + C) = A\,B + A\,C$, but not commutative in general, $A\,B \neq B\,A$. If, in a particular case, $A\,B = B\,A$, then $A$ and $B$ are said to commute.

The transpose of a matrix, denoted $A^T$, has rows that are the columns of $A$ and columns that are the rows of $A$. The transpose operator has the following properties:

$$(\alpha A)^T = \alpha A^T, \quad \text{where } \alpha \text{ is a scalar} \tag{A.7a}$$
$$(A + B)^T = A^T + B^T \tag{A.7b}$$
$$(A\,B)^T = B^T A^T \tag{A.7c}$$

If $A = A^T$, then $A$ is said to be a symmetric matrix. Also, if $A = -A^T$, then $A$ is said to be a skew symmetric matrix.
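As a concrete illustration of eqns. (A.5) through (A.7c), the following NumPy sketch (not part of the original text; the matrices are arbitrary examples) checks the element-by-element product rule of eqn. (A.6) and the transpose identity $(A\,B)^T = B^T A^T$:

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # 2 x 3
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])             # 3 x 2, conformable with A

C = A @ B                               # eqn. (A.5); C is 2 x 2
# element-by-element form of eqn. (A.6)
c11 = sum(A[0, k] * B[k, 0] for k in range(A.shape[1]))
assert np.isclose(C[0, 0], c11)

# transpose identity, eqn. (A.7c)
assert np.allclose((A @ B).T, B.T @ A.T)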

    Matrix Inverse

We now discuss the properties of the matrix inverse. Suppose we are given both $y$ and $A$ in eqn. (A.2), and we want to determine $x$. The following terminology should be noted carefully: if $m > n$, the system in eqn. (A.2) is said to be overdetermined (there are more equations than unknowns). Under typical circumstances an exact solution for $x$ does not exist; algorithms for approximate solutions for $x$ are therefore characterized by some measure of how well the linear equations are satisfied. If $m < n$, the system is underdetermined (there are fewer equations than unknowns) and, in general, infinitely many solutions exist. If $m = n$ and $A$ is nonsingular, the unique solution is $x = A^{-1}y$, where $A^{-1}$, the inverse of $A$, exists if and only if:


$A$ has linearly independent columns.
$A$ has linearly independent rows.

The inverse satisfies $A^{-1}A = A\,A^{-1} = I$

where $I$ is an $n \times n$ identity matrix:

$$
I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}
\tag{A.8}
$$

A nonsingular matrix is a matrix whose inverse exists (if $A$ is nonsingular, then $A^T$ is likewise nonsingular):

$$(A^{-1})^{-1} = A \tag{A.9a}$$
$$(A^T)^{-1} = (A^{-1})^T \equiv A^{-T} \tag{A.9b}$$

Furthermore, let $A$ and $B$ be $n \times n$ matrices. The matrix product $A\,B$ is nonsingular if and only if $A$ and $B$ are nonsingular. If this condition is met, then

$$(A\,B)^{-1} = B^{-1} A^{-1} \tag{A.10}$$

Formal proof of this relationship and other relationships is given in Ref. [1]. The inverse of a square matrix $A$ can be computed by

$$A^{-1} = \frac{\text{adj}(A)}{\det(A)} \tag{A.11}$$

where $\text{adj}(A)$ is the adjoint of $A$ and $\det(A)$ is the determinant of $A$. The adjoint and determinant of a matrix of large dimension can ultimately be broken down into a series of $2 \times 2$ cases, for which the adjoint and determinant are given by

$$\text{adj}(A_{2\times 2}) = \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} \tag{A.12a}$$
$$\det(A_{2\times 2}) = a_{11}a_{22} - a_{12}a_{21} \tag{A.12b}$$

Other determinant identities are given by

$$\det(I) = 1 \tag{A.13a}$$
$$\det(A\,B) = \det(A)\det(B) \tag{A.13b}$$
$$\det(A\,B) = \det(B\,A) \tag{A.13c}$$
$$\det(A\,B + I) = \det(B\,A + I) \tag{A.13d}$$
$$\det(A + x\,y^T) = \det(A)\,(1 + y^T A^{-1} x) \tag{A.13e}$$
$$\det(A)\det(D + C\,A^{-1}B) = \det(D)\det(A + B\,D^{-1}C) \tag{A.13f}$$

$$\det(A^2) = [\det(A)]^2, \quad \text{which must be positive if } \det(A) \neq 0 \tag{A.13g}$$
$$\det(\alpha A) = \alpha^n \det(A) \tag{A.13h}$$

$$\det(A_{3\times 3}) \equiv \det\begin{bmatrix} a & b & c \end{bmatrix} = a^T[b\times]\,c = b^T[c\times]\,a = c^T[a\times]\,b \tag{A.13i}$$


where the matrices $[a\times]$, $[b\times]$, and $[c\times]$ are defined in eqn. (A.38). The adjoint is given by the transpose of the cofactor matrix:

$$\text{adj}(A) = [\text{cof}(A)]^T \tag{A.14}$$

The cofactor is given by

$$C_{ij} = (-1)^{i+j} M_{ij} \tag{A.15}$$

where $M_{ij}$ is the minor, which is the determinant of the matrix obtained by crossing out row $i$ and column $j$ of the element $a_{ij}$. The determinant can be computed using an expansion about row $i$ or column $j$:

$$\det(A) = \sum_{k=1}^{n} a_{ik}\,C_{ik} = \sum_{k=1}^{n} a_{kj}\,C_{kj} \tag{A.16}$$
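The following NumPy sketch (illustrative, not from the original text) evaluates a determinant by the cofactor expansion of eqn. (A.16) and the inverse by eqn. (A.11); for matrices of any significant size the factorization methods of Section A.4 are preferable:

import numpy as np

def minor(A, i, j):
    """Determinant of A with row i and column j removed (the minor M_ij)."""
    sub = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return det_cofactor(sub)

def det_cofactor(A):
    """Determinant by cofactor expansion about the first row, eqn. (A.16)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    return sum((-1) ** k * A[0, k] * minor(A, 0, k) for k in range(n))

def inverse_adjoint(A):
    """Inverse via adj(A)/det(A), eqns. (A.11), (A.14), and (A.15)."""
    n = A.shape[0]
    cof = np.array([[(-1) ** (i + j) * minor(A, i, j) for j in range(n)]
                    for i in range(n)])
    return cof.T / det_cofactor(A)          # adj(A) = cof(A)^T

A = np.array([[4.0, 2.0, 1.0], [0.0, 3.0, -1.0], [2.0, 1.0, 5.0]])
assert np.isclose(det_cofactor(A), np.linalg.det(A))
assert np.allclose(inverse_adjoint(A), np.linalg.inv(A))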

From eqn. (A.11), $A^{-1}$ exists if and only if the determinant of $A$ is nonzero. Matrix inverses are usually complicated to compute numerically; however, a special case occurs when the inverse is given by the transpose of the matrix itself. Such a matrix is said to be orthogonal, with the property

$$A^T A = A\,A^T = I \tag{A.17}$$

Also, the determinant of an orthogonal matrix can be shown to be $\pm 1$. An orthogonal matrix preserves the length (norm) of a vector (see eqn. (A.27) for a definition of the norm of a vector). Hence, if $A$ is an orthogonal matrix, then $||A\,x|| = ||x||$.

Block Structures and Other Identities

Matrices can also be analyzed using block structures. Assume that $A$ is an $n \times n$ matrix and that $C$ is an $m \times m$ matrix. Then, we have

$$\det\begin{bmatrix} A & B \\ 0 & C \end{bmatrix} = \det\begin{bmatrix} A & 0 \\ B & C \end{bmatrix} = \det(A)\det(C) \tag{A.18a}$$

$$\det\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det(A)\det(P) = \det(D)\det(Q) \tag{A.18b}$$

$$
\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1}
= \begin{bmatrix} Q^{-1} & -Q^{-1} B\,D^{-1} \\ -D^{-1} C\,Q^{-1} & D^{-1}(I + C\,Q^{-1} B\,D^{-1}) \end{bmatrix}
= \begin{bmatrix} A^{-1}(I + B\,P^{-1} C\,A^{-1}) & -A^{-1} B\,P^{-1} \\ -P^{-1} C\,A^{-1} & P^{-1} \end{bmatrix}
\tag{A.18c}
$$

where $P$ and $Q$ are Schur complements of $A$ and $D$, respectively:

$$P \equiv D - C\,A^{-1} B \tag{A.19a}$$
$$Q \equiv A - B\,D^{-1} C \tag{A.19b}$$


Other useful matrix identities involve the Sherman-Morrison lemma, given by

$$(I + A\,B)^{-1} = I - A\,(I + B\,A)^{-1} B \tag{A.20}$$

and the matrix inversion lemma, given by

$$(A + B\,C\,D)^{-1} = A^{-1} - A^{-1} B\,(D\,A^{-1} B + C^{-1})^{-1} D\,A^{-1} \tag{A.21}$$

where $A$ is an arbitrary $n \times n$ matrix and $C$ is an arbitrary $m \times m$ matrix. A proof of the matrix inversion lemma is given in §1.3.
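A quick numerical check of eqns. (A.20) and (A.21) can be written in a few lines of NumPy (an illustrative sketch with randomly chosen, well-conditioned matrices, not part of the original text):

import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3
A = rng.standard_normal((n, n)) + 5.0 * np.eye(n)   # keep A well conditioned
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, m)) + 5.0 * np.eye(m)
D = rng.standard_normal((m, n))
inv = np.linalg.inv

# Sherman-Morrison lemma, eqn. (A.20), with B (n x m) and D (m x n)
lhs = inv(np.eye(n) + B @ D)
rhs = np.eye(n) - B @ inv(np.eye(m) + D @ B) @ D
assert np.allclose(lhs, rhs)

# matrix inversion lemma, eqn. (A.21)
lhs = inv(A + B @ C @ D)
rhs = inv(A) - inv(A) @ B @ inv(D @ inv(A) @ B + inv(C)) @ D @ inv(A)
assert np.allclose(lhs, rhs)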

    Matrix Trace

Another useful quantity often used in estimation theory is the trace of a matrix, which is defined only for square matrices:

$$\text{Tr}(A) = \sum_{i=1}^{n} a_{ii} \tag{A.22}$$

Some useful identities involving the matrix trace are given by

$$\text{Tr}(\alpha A) = \alpha\,\text{Tr}(A) \tag{A.23a}$$
$$\text{Tr}(A + B) = \text{Tr}(A) + \text{Tr}(B) \tag{A.23b}$$
$$\text{Tr}(A\,B) = \text{Tr}(B\,A) \tag{A.23c}$$
$$\text{Tr}(x\,y^T) = x^T y \tag{A.23d}$$
$$\text{Tr}(A\,y\,x^T) = x^T A\,y \tag{A.23e}$$
$$\text{Tr}(A\,B\,C\,D) = \text{Tr}(B\,C\,D\,A) = \text{Tr}(C\,D\,A\,B) = \text{Tr}(D\,A\,B\,C) \tag{A.23f}$$

Equation (A.23f) shows the cyclic invariance of the trace. The operation $y\,x^T$ is known as the outer product (note that $y\,x^T \neq x\,y^T$ in general).

Solution of Triangular Systems

An upper triangular system of linear equations has the form

$$
\begin{aligned}
t_{11}x_1 + t_{12}x_2 + t_{13}x_3 + \cdots + t_{1n}x_n &= y_1 \\
t_{22}x_2 + t_{23}x_3 + \cdots + t_{2n}x_n &= y_2 \\
t_{33}x_3 + \cdots + t_{3n}x_n &= y_3 \\
&\;\;\vdots \\
t_{nn}x_n &= y_n
\end{aligned}
\tag{A.24}
$$

or

$$T\,x = y \tag{A.25}$$

where

$$
T = \begin{bmatrix}
t_{11} & t_{12} & t_{13} & \cdots & t_{1n} \\
0 & t_{22} & t_{23} & \cdots & t_{2n} \\
0 & 0 & t_{33} & \cdots & t_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & t_{nn}
\end{bmatrix}
\tag{A.26}
$$


The matrix $T$ can be shown to be nonsingular if and only if its diagonal elements are nonzero [1]. Clearly, $x_n$ can be determined immediately from the last equation of the upper triangular form. The remaining $x_i$ coefficients can be determined by a back substitution algorithm:

for $i = n, n-1, \ldots, 1$
  $x_i = t_{ii}^{-1}\left( y_i - \sum_{j=i+1}^{n} t_{ij}\,x_j \right)$
next $i$

This algorithm will fail only if $t_{ii} \approx 0$. But this can occur only if $T$ is singular (or nearly singular). Experience indicates that the algorithm is well behaved for most applications.
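The back substitution loop above translates directly into NumPy; the sketch below (illustrative only) solves $T\,x = y$ for an upper triangular $T$:

import numpy as np

def back_substitution(T, y):
    """Solve T x = y for upper triangular T, as in eqns. (A.24)-(A.26)."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):            # i = n, n-1, ..., 1 (0-based)
        x[i] = (y[i] - T[i, i + 1:] @ x[i + 1:]) / T[i, i]
    return x

T = np.array([[2.0, 1.0, -1.0],
              [0.0, 3.0,  2.0],
              [0.0, 0.0,  4.0]])
y = np.array([1.0, 5.0, 8.0])
x = back_substitution(T, y)
assert np.allclose(T @ x, y)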

The back substitution algorithm can be modified to compute the inverse, $S = T^{-1}$, of an upper triangular matrix $T$. We now summarize an algorithm for calculating $S = T^{-1}$ and overwriting $T$ by $T^{-1}$:

for $k = n, n-1, \ldots, 1$
  $t_{kk} \leftarrow s_{kk} = t_{kk}^{-1}$
  $t_{ik} \leftarrow s_{ik} = -t_{ii}^{-1} \sum_{j=i+1}^{k} t_{ij}\,s_{jk}, \quad i = k-1, k-2, \ldots, 1$
next $k$

where $\leftarrow$ denotes replacement (the symbol $x \leftarrow y$ means overwrite $x$ by the current $y$-value; this notation indicates how storage may be conserved by overwriting quantities that are no longer needed). This algorithm requires about $n^3/6$ calculations (note: if only the solution for $x$ is required and not the explicit form of $T^{-1}$, then the back substitution algorithm alone should be employed, since it requires only about $n^2/2$ calculations).
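A direct (not in-place) transcription of this recursion is sketched below in NumPy; it is illustrative only, and in practice a library routine such as scipy.linalg.solve_triangular would normally be used instead:

import numpy as np

def upper_triangular_inverse(T):
    """Invert an upper triangular matrix by the column-wise recursion above."""
    n = T.shape[0]
    S = np.zeros_like(T, dtype=float)
    for k in range(n - 1, -1, -1):            # k = n, n-1, ..., 1 (0-based)
        S[k, k] = 1.0 / T[k, k]
        for i in range(k - 1, -1, -1):        # i = k-1, ..., 1 (0-based)
            S[i, k] = -np.dot(T[i, i + 1:k + 1], S[i + 1:k + 1, k]) / T[i, i]
    return S

T = np.array([[2.0, 1.0, -1.0],
              [0.0, 3.0,  2.0],
              [0.0, 0.0,  4.0]])
S = upper_triangular_inverse(T)
assert np.allclose(S @ T, np.eye(3))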

    A.2 Vectors

The quantities $x$ and $y$ in eqn. (A.2) are known as vectors, which are a special case of a matrix. Vectors can consist of one row, known as a row vector, or one column, known as a column vector.

    Vector Norm and Dot Product

    A measure of the length of a vector is given by the norm:

$$||x|| \equiv \sqrt{x^T x} = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} \tag{A.27}$$

Also, $||\alpha\,x|| = |\alpha|\,||x||$ for any scalar $\alpha$. A vector with norm one is said to be a unit vector.

[Figure A.1: Depiction of the angle between two vectors and an orthogonal projection. (a) Angle between Two Vectors; (b) Orthogonal Projection.]

Any nonzero vector can be made into a unit vector by dividing it by its norm:

$$\hat{x} \equiv \frac{x}{||x||} \tag{A.28}$$

Note that the caret is also used to denote an estimate in this text. The dot product or inner product of two vectors of equal dimension, $n \times 1$, is given by

$$x^T y = y^T x = \sum_{i=1}^{n} x_i\,y_i \tag{A.29}$$

If the dot product is zero, then the vectors are said to be orthogonal. Suppose that a set of vectors $x_i$ $(i = 1, 2, \ldots, m)$ satisfies

$$x_i^T x_j = \delta_{ij} \tag{A.30}$$

where the Kronecker delta $\delta_{ij}$ is defined as

$$\delta_{ij} = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} \tag{A.31}$$

Then this set is said to be orthonormal. The column and row vectors of an orthogonal matrix, defined by the property shown in eqn. (A.17), form orthonormal sets.

    Angle Between Two Vectors and the Orthogonal Projection

Figure A.1(a) shows two vectors, $x$ and $y$, and the angle $\theta$ between them. This angle can be computed from the cosine law:

$$\cos(\theta) = \frac{x^T y}{||x||\,||y||} \tag{A.32}$$


[Figure A.2: Cross Product and the Right Hand Rule.]

Figure A.1(b) shows the orthogonal projection of a vector $y$ onto a vector $x$. The orthogonal projection of $y$ onto $x$ is given by

$$p = \frac{x^T y}{||x||^2}\,x \tag{A.33}$$

This projection yields $(y - p)^T x = 0$.

Triangle and Schwartz Inequalities

Some important inequalities are given by the triangle inequality:

$$||x + y|| \leq ||x|| + ||y|| \tag{A.34}$$

and the Schwartz inequality:

$$|x^T y| \leq ||x||\,||y|| \tag{A.35}$$

Note that the Schwartz inequality implies the triangle inequality.
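The next NumPy fragment (an illustrative sketch, not part of the original text) evaluates eqns. (A.32) through (A.35) for two specific vectors:

import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, 0.0, 1.0])
norm = np.linalg.norm

theta = np.arccos(x @ y / (norm(x) * norm(y)))      # angle between x and y, eqn. (A.32)
assert np.isclose(np.sin(theta), norm(np.cross(x, y)) / (norm(x) * norm(y)))  # eqn. (A.45)

p = (x @ y) / norm(x) ** 2 * x                      # projection of y onto x, eqn. (A.33)
assert np.isclose((y - p) @ x, 0.0)

# triangle and Schwartz inequalities, eqns. (A.34) and (A.35)
assert norm(x + y) <= norm(x) + norm(y)
assert abs(x @ y) <= norm(x) * norm(y)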

    Cross Product

The cross product of two vectors yields a vector that is perpendicular to both vectors. The cross product of $x$ and $y$ is given by

$$
z = x \times y = \begin{bmatrix} x_2 y_3 - x_3 y_2 \\ x_3 y_1 - x_1 y_3 \\ x_1 y_2 - x_2 y_1 \end{bmatrix}
\tag{A.36}
$$

The cross product follows the right hand rule, which states that the orientation of $z$ is determined by placing $x$ and $y$ tail-to-tail, flattening the right hand, extending it in the direction of $x$, and then curling the fingers in the direction of the angle that $y$ makes with $x$. The thumb then points in the direction of $z$, as shown in Figure A.2. The cross product can also be obtained using matrix multiplication:

$$z = [x\times]\,y \tag{A.37}$$


where $[x\times]$ is the cross product matrix, defined by

$$
[x\times] \equiv \begin{bmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{bmatrix}
\tag{A.38}
$$

Note that $[x\times]$ is a skew symmetric matrix. The cross product has the following properties:

$$[x\times]^T = -[x\times] \tag{A.39a}$$
$$[x\times]\,y = -[y\times]\,x \tag{A.39b}$$
$$[x\times][y\times] = -x^T y\,I + y\,x^T \tag{A.39c}$$
$$[x\times]^3 = -x^T x\,[x\times] \tag{A.39d}$$
$$[x\times][y\times] - [y\times][x\times] = y\,x^T - x\,y^T = [(x \times y)\times] \tag{A.39e}$$
$$x\,y^T[w\times] + [w\times]\,y\,x^T = -[\{x \times (y \times w)\}\times] \tag{A.39f}$$
$$(I - [x\times])(I + [x\times])^{-1} = \frac{1}{1 + x^T x}\left[(1 - x^T x)\,I + 2\,x\,x^T - 2\,[x\times]\right] \tag{A.39g}$$
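A short NumPy check of the cross product matrix of eqn. (A.38) and a few of the identities in eqn. (A.39) is sketched below (illustrative only):

import numpy as np

def cross_matrix(x):
    """Cross product matrix [x x] of eqn. (A.38), so that cross_matrix(x) @ y = x x y."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

x = np.array([1.0, -2.0, 0.5])
y = np.array([0.3, 4.0, -1.0])
X, Y = cross_matrix(x), cross_matrix(y)

assert np.allclose(X @ y, np.cross(x, y))                               # eqn. (A.37)
assert np.allclose(X.T, -X)                                             # eqn. (A.39a)
assert np.allclose(X @ Y, -np.dot(x, y) * np.eye(3) + np.outer(y, x))   # eqn. (A.39c)
assert np.allclose(X @ Y - Y @ X, cross_matrix(np.cross(x, y)))          # eqn. (A.39e)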

Other useful properties involving an arbitrary $3 \times 3$ square matrix $M$ are given by [2]

$$M[x\times] + [x\times]M^T + [(M^T x)\times] = \text{Tr}(M)\,[x\times] \tag{A.40a}$$
$$M[x\times]M^T = [\{\text{adj}(M^T)\,x\}\times] \tag{A.40b}$$
$$(M\,x) \times (M\,y) = \text{adj}(M^T)\,(x \times y) \tag{A.40c}$$
$$[\{(M\,x) \times (M\,y)\}\times] = M\,[(x \times y)\times]\,M^T \tag{A.40d}$$

If we write $M$ in terms of its columns,

$$M = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \tag{A.41}$$

then

$$\det(M) = x_1^T (x_2 \times x_3) \tag{A.42}$$

Also, if $A$ is an orthogonal matrix with determinant $+1$, then from eqn. (A.40b) we have

$$A\,[x\times]\,A^T = [(A\,x)\times] \tag{A.43}$$

Another important quantity, using eqn. (A.39c), is given by

$$[x\times]^2 = -x^T x\,I + x\,x^T \tag{A.44}$$

For a unit vector $x$, $-[x\times]^2 = I - x\,x^T$ is the projection operator onto the space perpendicular to $x$. Many other interesting relations involving the cross product are given in Ref. [3].


Table A.1: Matrix and Vector Norms

One-norm:        $||x||_1 = \sum_{i=1}^{n} |x_i|$;   $||A||_1 = \max_j \sum_{i=1}^{n} |a_{ij}|$
Two-norm:        $||x||_2 = \left(\sum_{i=1}^{n} x_i^2\right)^{1/2}$;   $||A||_2 =$ maximum singular value of $A$
Frobenius norm:  $||x||_F = ||x||_2$;   $||A||_F = \sqrt{\text{Tr}(A\,A^*)}$
Infinity-norm:   $||x||_\infty = \max_i |x_i|$;   $||A||_\infty = \max_i \sum_{j=1}^{n} |a_{ij}|$

The angle $\theta$ in Figure A.1(a) can also be computed from

$$\sin(\theta) = \frac{||x \times y||}{||x||\,||y||} \tag{A.45}$$

Using $\sin^2(\theta) + \cos^2(\theta) = 1$, eqns. (A.32) and (A.45) also give

$$||x \times y|| = \sqrt{(x^T x)(y^T y) - (x^T y)^2} \tag{A.46}$$

From the Schwartz inequality in eqn. (A.35), the quantity within the square root in eqn. (A.46) is always non-negative.

    A.3 Matrix Norms and Definiteness

Norms for matrices are slightly more difficult to define than for vectors. Also, the definition of a positive or negative matrix is more complicated for matrices than for scalars. Before showing these quantities, we first define the following terms for any complex matrix $A$:

Conjugate transpose: the transpose of the conjugate of each element; denoted by $A^*$.

Hermitian: has the property $A^* = A$ (note: any real symmetric matrix is Hermitian).

Normal: has the property $A^* A = A\,A^*$.

Unitary: the inverse is equal to the Hermitian (conjugate) transpose, so that $A^* A = A\,A^* = I$ (note: a real unitary matrix is an orthogonal matrix).


    Matrix Norms

Several possible matrix norms can be defined. Table A.1 lists the most commonly used norms for both vectors and matrices. The one-norm is the largest column sum. The two-norm is the maximum singular value (see §A.4). Also, unless otherwise stated, a norm written without a subscript is the two-norm, as shown in eqn. (A.27). The Frobenius norm is defined as the square root of the sum of the absolute squares of the elements. The infinity-norm is the largest row sum. The matrix norms described in Table A.1 have the following properties:

$$||\alpha A|| = |\alpha|\,||A|| \tag{A.47a}$$
$$||A + B|| \leq ||A|| + ||B|| \tag{A.47b}$$
$$||A\,B|| \leq ||A||\,||B|| \tag{A.47c}$$

Not all norms follow eqn. (A.47c), though (e.g., the maximum absolute matrix element). More matrix norm properties can be found in Refs. [4] and [5].
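The norms in Table A.1 map directly onto numpy.linalg.norm; the sketch below (illustrative) evaluates each of them for a small matrix:

import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

one_norm = np.linalg.norm(A, 1)          # largest column sum
two_norm = np.linalg.norm(A, 2)          # maximum singular value
fro_norm = np.linalg.norm(A, 'fro')      # sqrt of sum of squared elements
inf_norm = np.linalg.norm(A, np.inf)     # largest row sum

assert np.isclose(one_norm, max(abs(A).sum(axis=0)))
assert np.isclose(two_norm, np.linalg.svd(A, compute_uv=False)[0])
assert np.isclose(fro_norm, np.sqrt(np.trace(A.T @ A)))
assert np.isclose(inf_norm, max(abs(A).sum(axis=1)))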

    Definiteness

Sufficiency tests in least squares and the minimization of functions of multiple variables often require one to determine the definiteness of the matrix of second partial derivatives. A real and square matrix $A$ is

Positive definite if $x^T A\,x > 0$ for all nonzero $x$.
Positive semi-definite if $x^T A\,x \geq 0$ for all nonzero $x$.
Negative definite if $x^T A\,x < 0$ for all nonzero $x$.
Negative semi-definite if $x^T A\,x \leq 0$ for all nonzero $x$.

For any positive semi-definite matrix $A \geq 0$ there exists a unique positive semi-definite matrix $B$ (the matrix square root of $A$) such that $A = B^2$ (note: $A$ and $B$ commute, so that $A\,B = B\,A$). The following relationship:

$$B > A \tag{A.49}$$

implies $(B - A) > 0$, which states that the matrix $(B - A)$ is positive definite. Also,

$$B \geq A \tag{A.50}$$


implies $(B - A) \geq 0$, which states that the matrix $(B - A)$ is positive semi-definite. The conditions for negative definite and negative semi-definite follow analogously from the definitions stated for positive definite and positive semi-definite.
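For a symmetric matrix, definiteness can be checked from the signs of its eigenvalues; the following NumPy sketch (illustrative, not from the original text) classifies a matrix this way:

import numpy as np

def definiteness(A, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eigvals = np.linalg.eigvalsh(A)           # real eigenvalues of a symmetric matrix
    if np.all(eigvals > tol):
        return "positive definite"
    if np.all(eigvals >= -tol):
        return "positive semi-definite"
    if np.all(eigvals < -tol):
        return "negative definite"
    if np.all(eigvals <= tol):
        return "negative semi-definite"
    return "indefinite"

A = np.array([[2.0, -1.0], [-1.0, 2.0]])
print(definiteness(A))                         # positive definite
print(definiteness(np.zeros((2, 2))))          # positive semi-definite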

    A.4 Matrix Decompositions

Several matrix decompositions are given in the open literature. Many of these decompositions are used in place of a matrix inverse, either to simplify the calculations or to provide more numerically robust approaches. In this section we present several useful matrix decompositions that are widely used in estimation and control theory. The methods used to compute these decompositions are beyond the scope of the present text; Ref. [4] provides all the necessary algorithms and proofs for the interested reader. Before we proceed, a short description of the rank of a matrix is provided. Several definitions are possible. We will state that the rank of a matrix is given by the dimension of the range of the matrix, corresponding to the number of linearly independent rows or columns. An $m \times n$ matrix $A$ is rank deficient if the rank of $A$ is less than $\min(m, n)$. Suppose that the rank of an $n \times n$ matrix $A$ is given by $\text{rank}(A) = r$. Then a set of $(n - r)$ nonzero unit vectors, $x_i$, can always be found that have the following property for a singular square matrix $A$:

$$A\,x_i = 0, \quad i = 1, 2, \ldots, n - r \tag{A.51}$$

The value of $(n - r)$ is known as the nullity, which is the maximum number of linearly independent null vectors of $A$. These vectors can form an orthonormal basis (which is how they are commonly shown) for the null space of $A$, and can be computed from the singular value decomposition. If $A$ is nonsingular, then no nonzero vector $x_i$ can be found to satisfy eqn. (A.51). For more details on the rank of a matrix see Refs. [4] and [6].

    Eigenvalue/Eigenvector Decomposition and the Cayley-Hamilton Theorem

One of the most widely used decompositions for a square $n \times n$ matrix $A$ in the study of dynamical systems is the eigenvalue/eigenvector decomposition. A real or complex number $\lambda$ is an eigenvalue of $A$ if there exists a nonzero (right) eigenvector $p$ such that

$$A\,p = \lambda\,p \tag{A.52}$$

The solution for $p$ is not unique in general, so $p$ is usually given as a unit vector. In order for eqn. (A.52) to have a nonzero solution for $p$, from eqn. (A.51), the matrix $(\lambda I - A)$ must be singular. Therefore, from eqn. (A.11) we have

$$\det(\lambda I - A) = \lambda^n + \alpha_1 \lambda^{n-1} + \cdots + \alpha_{n-1}\lambda + \alpha_n = 0 \tag{A.53}$$

Equation (A.53) leads to a polynomial of degree $n$, which is called the characteristic equation of $A$. For example, the characteristic equation for a $3 \times 3$ matrix is given


by

$$\lambda^3 - \lambda^2\,\text{Tr}(A) + \lambda\,\text{Tr}[\text{adj}(A)] - \det(A) = 0 \tag{A.54}$$

If all eigenvalues of $A$ are distinct, then the set of eigenvectors is linearly independent.

Therefore, the matrix $A$ can be diagonalized as

$$\Lambda = P^{-1} A\,P \tag{A.55}$$

where $\Lambda = \text{diag}\,[\lambda_1\;\; \lambda_2\;\; \cdots\;\; \lambda_n]$ and $P = [\,p_1\;\; p_2\;\; \cdots\;\; p_n\,]$. If $A$ has repeated eigenvalues, then a block diagonal and triangular-form representation must be used, called a Jordan block [6]. The eigenvalue/eigenvector decomposition can be used for linear state variable transformations (see §3.1.4). Eigenvalues and eigenvectors can be either real or complex. This decomposition is very useful when $A$ is symmetric, since $\Lambda$ is always diagonal (even for repeated eigenvalues) and $P$ is orthogonal in this case. A proof is given in Ref. [4]. Also, Ref. [4] provides many algorithms to compute the eigenvalue/eigenvector decomposition.

One of the most useful properties in linear algebra is the Cayley-Hamilton theorem, which states that a matrix satisfies its own characteristic equation, so that

$$A^n + \alpha_1 A^{n-1} + \cdots + \alpha_{n-1} A + \alpha_n I = 0 \tag{A.56}$$

This theorem is useful for computing powers of $A$ that are larger than $n$, since $A^{n+1}$ can be written as a linear combination of $(A, A^2, \ldots, A^n)$ [6].
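The sketch below (illustrative only) diagonalizes a small symmetric matrix per eqn. (A.55) and verifies the Cayley-Hamilton theorem of eqn. (A.56) using the characteristic polynomial coefficients returned by numpy.poly:

import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, -1.0],
              [0.0, -1.0, 2.0]])

# eigenvalue/eigenvector decomposition, eqn. (A.55)
eigvals, P = np.linalg.eig(A)
Lam = np.linalg.inv(P) @ A @ P
assert np.allclose(Lam, np.diag(eigvals))

# Cayley-Hamilton theorem, eqn. (A.56): A satisfies its own characteristic equation
coeffs = np.poly(A)                        # [1, alpha_1, ..., alpha_n]
n = A.shape[0]
result = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))
assert np.allclose(result, np.zeros((n, n)))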

QR Decomposition

The QR decomposition is especially useful in least squares (see §1.6.1) and the Square Root Information Filter (SRIF) (see §5.7.1). The QR decomposition of an $m \times n$ matrix $A$, with $m \geq n$, is given by

$$A = Q\,R \tag{A.57}$$

where $Q$ is an $m \times m$ orthogonal matrix, and $R$ is an upper triangular $m \times n$ matrix with all elements $R_{ij} = 0$ for $i > j$. If $A$ has full column rank, then the first $n$ columns of $Q$ form an orthonormal basis for the range of $A$ [4]. Therefore, the thin QR decomposition is often used:

$$A = Q\,R \tag{A.58}$$

where $Q$ is now an $m \times n$ matrix with orthonormal columns and $R$ is an upper triangular $n \times n$ matrix. Since the QR decomposition is widely used throughout the present text, we present a numerical algorithm to compute this decomposition by the modified Gram-Schmidt method [4]. Let $A$ and $Q$ be partitioned by columns as $[\,a_1\;\, a_2\;\, \cdots\;\, a_n\,]$ and $[\,q_1\;\, q_2\;\, \cdots\;\, q_n\,]$, respectively. To begin the algorithm we set $Q = A$, and then

for $k = 1, 2, \ldots, n$
  $r_{kk} = ||q_k||_2$
  $q_k \leftarrow q_k / r_{kk}$
  $r_{kj} = q_k^T q_j, \quad j = k+1, \ldots, n$


  $q_j \leftarrow q_j - r_{kj}\,q_k, \quad j = k+1, \ldots, n$
next $k$

where $\leftarrow$ denotes replacement (note: $r_{kk}$ and $r_{kj}$ are elements of the matrix $R$). This algorithm works even when $A$ is complex. The QR decomposition is also useful for inverting an $n \times n$ matrix $A$, since $A^{-1} = R^{-1} Q^T$ (note: the inverse of an upper triangular matrix is also triangular). Other methods, based on the Householder transformation and Givens rotations, can be used to compute the QR decomposition [4].
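The modified Gram-Schmidt procedure above can be transcribed into NumPy as follows (an illustrative sketch for real matrices; numpy.linalg.qr would normally be preferred in practice):

import numpy as np

def mgs_qr(A):
    """Thin QR decomposition by modified Gram-Schmidt, as in eqn. (A.58)."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = A.copy()                       # the algorithm starts with Q = A
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] = Q[:, k] / R[k, k]
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ Q[:, j]
            Q[:, j] = Q[:, j] - R[k, j] * Q[:, k]
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Q, R = mgs_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(2))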

    Singular Value Decomposition

Another decomposition of an $m \times n$ matrix $A$ is the singular value decomposition [4, 7], which decomposes a matrix into a diagonal matrix and two orthogonal matrices:

$$A = U\,S\,V^* \tag{A.59}$$

where $U$ is an $m \times m$ unitary matrix, $S$ is an $m \times n$ diagonal matrix such that $S_{ij} = 0$ for $i \neq j$, and $V$ is an $n \times n$ unitary matrix. Many efficient algorithms can be used to determine the singular value decomposition [4]. Note that the zeros below the diagonal in $S$ (with $m > n$) imply that the elements of columns $(n+1), (n+2), \ldots, m$ of $U$ are arbitrary. So, we can define the following reduced singular value decomposition:

$$A = \tilde{U}\,\tilde{S}\,\tilde{V}^* \tag{A.60}$$

where $\tilde{U}$ is the $m \times n$ subset matrix of $U$ (with columns $(n+1), (n+2), \ldots, m$ eliminated), $\tilde{S}$ is the upper $n \times n$ partition of $S$, and $\tilde{V} = V$. Note that $\tilde{U}^* \tilde{U} = I$, but it is no longer possible to make the same statement for $\tilde{U}\,\tilde{U}^*$. The elements of $\tilde{S} = \text{diag}\,[s_1\;\cdots\;s_n]$ are known as the singular values of $A$, which are ordered from the smallest singular value to the largest singular value. These values are extremely

important since they can give an indication of how well we can invert a matrix [8]. A common measure of the invertibility of a matrix is the condition number, which is usually defined as the ratio of its largest singular value to its smallest singular value:

$$\text{Condition Number} = \frac{s_n}{s_1} \tag{A.61}$$

Large condition numbers may indicate a nearly singular matrix, and the minimum value of the condition number is unity (which occurs when the matrix is orthogonal).

The rank of $A$ is given by the number of nonzero singular values. Also, the singular value decomposition is useful for determining various norms (e.g., $||A||_F^2 = s_1^2 + \cdots + s_p^2$, where $p = \min(m, n)$, and the two-norm as shown in Table A.1).
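The following NumPy sketch (illustrative only) computes the decomposition of eqn. (A.59), the condition number of eqn. (A.61), and the rank from the singular values; note that numpy.linalg.svd returns the singular values ordered from largest to smallest:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

U, s, Vt = np.linalg.svd(A)                # full SVD, eqn. (A.59)
S = np.zeros(A.shape)
S[:len(s), :len(s)] = np.diag(s)
assert np.allclose(U @ S @ Vt, A)

condition_number = s[0] / s[-1]            # largest / smallest singular value, eqn. (A.61)
rank = np.sum(s > 1e-12)                   # number of nonzero singular values
assert rank == np.linalg.matrix_rank(A)
assert np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(s ** 2))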

Gaussian Elimination

Gaussian elimination is a classical reduction procedure by which a matrix $A$ can be reduced to upper triangular form. This procedure involves pre-multiplications of a square matrix $A$ by a sequence of elementary lower triangular matrices, each chosen to introduce a column with zeros below the diagonal (this process is often called annihilation). Several possible variations of Gaussian elimination can be derived.


We present a very robust algorithm called Gaussian elimination with complete pivoting. This approach requires data movements such as the interchange of two matrix rows. These interchanges can be tracked by using permutation matrices, which are just identity matrices with rows or columns reordered. For example, consider the following matrix:

$$
P = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}
\tag{A.62}
$$

So $P\,A$ is a row-permuted version of $A$ and $A\,P$ is a column-permuted version of $A$. Permutation matrices are orthogonal. The algorithm computes the complete pivoting factorization

$$A = P\,L\,U\,Q^T \tag{A.63}$$

where $P$ and $Q$ are permutation matrices, $L$ is a unit (with ones along the diagonal) lower triangular matrix, and $U$ is an upper triangular matrix. The algorithm begins by setting $P = Q = I$, which are partitioned into column vectors as

$$P = [\,p_1\;\; p_2\;\; \cdots\;\; p_n\,], \quad Q = [\,q_1\;\; q_2\;\; \cdots\;\; q_n\,] \tag{A.64}$$

The algorithm for Gaussian elimination with complete pivoting is given by overwriting the $A$ matrix:

for $k = 1, 2, \ldots, n-1$
  Determine $\mu$ with $k \leq \mu \leq n$ and $\lambda$ with $k \leq \lambda \leq n$ so that $|a_{\mu\lambda}| = \max\{|a_{ij}|,\; i = k, \ldots, n,\; j = k, \ldots, n\}$
  if $\mu \neq k$
    $p_k \leftrightarrow p_\mu$
    $a_{kj} \leftrightarrow a_{\mu j}, \quad j = 1, 2, \ldots, n$
  end if
  if $\lambda \neq k$
    $q_k \leftrightarrow q_\lambda$
    $a_{jk} \leftrightarrow a_{j\lambda}, \quad j = 1, 2, \ldots, n$
  end if
  if $a_{kk} \neq 0$
    $a_{jk} \leftarrow a_{jk}/a_{kk}, \quad j = k+1, \ldots, n$
    $a_{ji} \leftarrow a_{ji} - a_{jk}\,a_{ki}, \quad i, j = k+1, \ldots, n$
  end if
next $k$

where $\leftarrow$ denotes replacement and $\leftrightarrow$ denotes interchange of the values assigned. The matrix $U$ is given by the upper triangular part (including the diagonal elements) of the overwritten $A$ matrix, and the matrix $L$ is given by the lower triangular part (replacing the diagonal elements with ones) of the overwritten $A$ matrix. More details on Gaussian elimination can be found in Ref. [4].


LU and Cholesky Decompositions

The LU decomposition factors an $n \times n$ matrix $A$ into a product of a lower triangular matrix $L$ and an upper triangular matrix $U$, so that

$$A = L\,U \tag{A.65}$$

Gaussian elimination is a foremost example of an LU decomposition. In general, the LU decomposition is not unique. This can be seen by observing that for an arbitrary nonsingular diagonal matrix $D$, setting $\bar{L} = L\,D$ and $\bar{U} = D^{-1}U$ yields new lower and upper triangular matrices that satisfy $\bar{L}\,\bar{U} = L\,D\,D^{-1}U = L\,U = A$. The fact that the decomposition is not unique suggests the possible wisdom of forming the normalized decomposition

$$A = L\,D\,U \tag{A.66}$$

in which $L$ and $U$ are unit lower and upper triangular matrices and $D$ is a diagonal matrix. The question of existence and uniqueness is addressed by Stewart [1], who proves that the $A = L\,D\,U$ decomposition is unique, provided the leading principal sub-matrices of $A$ are nonsingular.

There are three important variants of the $L\,D\,U$ decomposition. The first associates $D$ with the lower triangular part to give the factorization

$$A = \tilde{L}\,U \tag{A.67}$$

where $\tilde{L} \equiv L\,D$. This is known as the Crout reduction. The second variant associates $D$ with the upper triangular factor as

$$A = L\,\tilde{U} \tag{A.68}$$

where $\tilde{U} \equiv D\,U$. This reduction is exactly that obtained by Gaussian elimination. The third variant is possible only for symmetric positive definite matrices, in which case

$$A = L\,D\,L^T \tag{A.69}$$

Thus $A$ can be written as

$$A = \tilde{L}\,\tilde{L}^T \tag{A.70}$$

where now $\tilde{L} \equiv L\,D^{1/2}$ is known as the matrix square root, and the factorization in eqn. (A.70) is known as the Cholesky decomposition. Efficient algorithms to compute the LU and Cholesky decompositions can be found in Ref. [4].
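For a symmetric positive definite matrix, the Cholesky factor of eqn. (A.70) is available directly in NumPy; the sketch below (illustrative) also recovers the $L\,D\,L^T$ form of eqn. (A.69):

import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 3.0, 1.0],
              [2.0, 1.0, 3.0]])

# Cholesky decomposition, eqn. (A.70)
L_tilde = np.linalg.cholesky(A)
assert np.allclose(L_tilde @ L_tilde.T, A)

# recover the L D L^T form of eqn. (A.69)
d = np.diag(L_tilde) ** 2                 # diagonal of D
L = L_tilde / np.diag(L_tilde)            # unit lower triangular factor
assert np.allclose(L @ np.diag(d) @ L.T, A)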

    A.5 Matrix Calculus

In this section several relations are given for taking partial or time derivatives of matrices. (Most of these relations can be found on a website maintained by Mike Brooks, Imperial College, London, UK; as of this writing the website is http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html.) Before providing a list of matrix calculus identities, we first define


the Jacobian and Hessian of a scalar function $f(x)$, where $x$ is an $n \times 1$ vector. The Jacobian of $f(x)$ is an $n \times 1$ vector given by

$$
\nabla_x f \equiv \frac{\partial f}{\partial x} = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\[1ex] \dfrac{\partial f}{\partial x_2} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{bmatrix}
\tag{A.71}
$$

The Hessian of $f(x)$ is an $n \times n$ matrix given by

$$
\nabla_x^2 f \equiv \frac{\partial^2 f}{\partial x\,\partial x^T} = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1 \partial x_1} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\[1ex]
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n \partial x_n}
\end{bmatrix}
\tag{A.72}
$$

Note that the Hessian of a scalar is a symmetric matrix. If $f(x)$ is an $m \times 1$ vector and $x$ is an $n \times 1$ vector, then the Jacobian matrix is given by

$$
\nabla_x f \equiv \frac{\partial f}{\partial x} = \begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\[1ex]
\dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial f_m}{\partial x_1} & \dfrac{\partial f_m}{\partial x_2} & \cdots & \dfrac{\partial f_m}{\partial x_n}
\end{bmatrix}
\tag{A.73}
$$

Note that the Jacobian matrix is an $m \times n$ matrix. Also, there is a slight inconsistency between eqn. (A.71) and eqn. (A.73) when $m = 1$, since eqn. (A.71) gives an $n \times 1$ vector while eqn. (A.73) gives a $1 \times n$ vector. This should pose no problem for the reader, since the context of the notation is clear for each particular system shown in this text.
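As an illustration of eqns. (A.71) and (A.72) (not part of the original text), the sketch below approximates the Jacobian and Hessian of the scalar function $f(x) = x_1^2 x_2 + \sin(x_2)$ by central finite differences:

import numpy as np

def f(x):
    return x[0] ** 2 * x[1] + np.sin(x[1])

def jacobian(f, x, h=1e-6):
    """Numerical Jacobian (gradient) of a scalar function, eqn. (A.71)."""
    n = len(x)
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(f, x, h=1e-4):
    """Numerical Hessian of a scalar function, eqn. (A.72)."""
    n = len(x)
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        H[:, j] = (jacobian(f, x + e) - jacobian(f, x - e)) / (2 * h)
    return H

x = np.array([1.0, 2.0])
print(jacobian(f, x))    # analytically [2*x1*x2, x1^2 + cos(x2)]
print(hessian(f, x))     # analytically [[2*x2, 2*x1], [2*x1, -sin(x2)]]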


    A list of derivatives involving linear products is given by

$$\frac{\partial}{\partial x}\,(A\,x) = A \tag{A.74a}$$
$$\frac{\partial}{\partial A}\,(a^T A\,b) = a\,b^T \tag{A.74b}$$
$$\frac{\partial}{\partial A}\,(a^T A^T b) = b\,a^T \tag{A.74c}$$
$$\frac{d}{dt}(A\,B) = A\,\frac{d}{dt}(B) + \left[\frac{d}{dt}(A)\right] B \tag{A.74d}$$

    A list of derivatives involving quadratic and cubic products is given by

$$\frac{\partial}{\partial x}\,(A\,x + b)^T C\,(D\,x + e) = A^T C\,(D\,x + e) + D^T C^T (A\,x + b) \tag{A.75a}$$
$$\frac{\partial}{\partial x}\,(x^T C\,x) = (C + C^T)\,x \tag{A.75b}$$
$$\frac{\partial}{\partial A}\,(a^T A^T A\,b) = A\,(a\,b^T + b\,a^T) \tag{A.75c}$$
$$\frac{\partial}{\partial A}\,(a^T A^T C\,A\,b) = C^T A\,a\,b^T + C\,A\,b\,a^T \tag{A.75d}$$
$$\frac{\partial}{\partial A}\,(A\,a + b)^T C\,(A\,a + b) = (C + C^T)\,(A\,a + b)\,a^T \tag{A.75e}$$
$$\frac{\partial}{\partial x}\,(x^T A\,x\,x^T) = (A + A^T)\,x\,x^T + (x^T A\,x)\,I \tag{A.75f}$$

    A list of derivatives involving the inverse of a matrix is given by

$$\frac{d}{dt}(A^{-1}) = -A^{-1}\left[\frac{d}{dt}(A)\right] A^{-1} \tag{A.76a}$$
$$\frac{\partial}{\partial A}\,(a^T A^{-1} b) = -A^{-T} a\,b^T A^{-T} \tag{A.76b}$$

    A list of derivatives involving the trace of a matrix is given by

$$\frac{\partial}{\partial A}\,\text{Tr}(A) = \frac{\partial}{\partial A}\,\text{Tr}(A^T) = I \tag{A.77a}$$

$$\frac{\partial}{\partial A}\,\text{Tr}(A^{-1}) = -\left(A^{-2}\right)^T \tag{A.77b}$$

$$\frac{\partial}{\partial A}\,\text{Tr}(C\,A^{-1}B) = -A^{-T} C^T B^T A^{-T} \tag{A.77c}$$
$$\frac{\partial}{\partial A}\,\text{Tr}(C^T A\,B^T) = \frac{\partial}{\partial A}\,\text{Tr}(B\,A^T C) = C\,B \tag{A.77d}$$
$$\frac{\partial}{\partial A}\,\text{Tr}(C\,A\,B\,A^T D) = C^T D^T A\,B^T + D\,C\,A\,B \tag{A.77e}$$
$$\frac{\partial}{\partial A}\,\text{Tr}(C\,A\,B\,A) = C^T A^T B^T + B^T A^T C^T \tag{A.77f}$$


    A list of derivatives involving the determinant of a matrix is given by

$$\frac{\partial}{\partial A}\det(A) = \frac{\partial}{\partial A}\det(A^T) = [\text{adj}(A)]^T \tag{A.78a}$$
$$\frac{\partial}{\partial A}\det(C\,A\,B) = \det(C\,A\,B)\,A^{-T} \tag{A.78b}$$
$$\frac{\partial}{\partial A}\ln[\det(C\,A\,B)] = A^{-T} \tag{A.78c}$$
$$\frac{\partial}{\partial A}\det(A) = \det(A)\,A^{-T} \tag{A.78d}$$
$$\frac{\partial}{\partial A}\ln[\det(A)] = A^{-T} \tag{A.78e}$$
$$\frac{\partial}{\partial A}\det(A^T C\,A) = \det(A^T C\,A)\,(C + C^T)\,A\,(A^T C\,A)^{-1} \tag{A.78f}$$
$$\frac{\partial}{\partial A}\ln[\det(A^T C\,A)] = (C + C^T)\,A\,(A^T C\,A)^{-1} \tag{A.78g}$$

    Relations involving the Hessian matrix are given by

$$\frac{\partial^2}{\partial x\,\partial x^T}\,(A\,x + b)^T C\,(D\,x + e) = A^T C\,D + D^T C^T A \tag{A.79a}$$
$$\frac{\partial^2}{\partial x\,\partial x^T}\,(x^T C\,x) = C + C^T \tag{A.79b}$$
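A finite-difference check of eqns. (A.75b) and (A.79b) is sketched below (illustrative only, with a randomly chosen matrix):

import numpy as np

rng = np.random.default_rng(1)
n = 4
C = rng.standard_normal((n, n))
x = rng.standard_normal(n)
h = 1e-6
I = np.eye(n)

f = lambda v: v @ C @ v                    # scalar function x^T C x

# gradient by central differences versus eqn. (A.75b): (C + C^T) x
grad_fd = np.array([(f(x + h * I[i]) - f(x - h * I[i])) / (2 * h) for i in range(n)])
assert np.allclose(grad_fd, (C + C.T) @ x, atol=1e-5)

# Hessian by differencing the analytic gradient versus eqn. (A.79b): C + C^T
grad = lambda v: (C + C.T) @ v
hess_fd = np.column_stack([(grad(x + h * I[j]) - grad(x - h * I[j])) / (2 * h) for j in range(n)])
assert np.allclose(hess_fd, C + C.T, atol=1e-5)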

References

[1] Stewart, G.W., Introduction to Matrix Computations, Academic Press, New York, NY, 1973.

[2] Shuster, M.D., "A Survey of Attitude Representations," Journal of the Astronautical Sciences, Vol. 41, No. 4, Oct.–Dec. 1993, pp. 439–517.

[3] Tempelman, W., "The Linear Algebra of Cross Product Operations," Journal of the Astronautical Sciences, Vol. 36, No. 4, Oct.–Dec. 1988, pp. 447–461.

[4] Golub, G.H. and Van Loan, C.F., Matrix Computations, The Johns Hopkins University Press, Baltimore, MD, 3rd ed., 1996.

[5] Zhang, F., Linear Algebra: Challenging Problems for Students, The Johns Hopkins University Press, Baltimore, MD, 1996.

[6] Chen, C.T., Linear System Theory and Design, Holt, Rinehart and Winston, New York, NY, 1984.

[7] Horn, R.A. and Johnson, C.R., Matrix Analysis, Cambridge University Press, Cambridge, MA, 1985.

[8] Nash, J.C., Compact Numerical Methods for Computers: Linear Algebra and Function Minimization, Adam Hilger Ltd., Bristol, 1979.