8/10/2019 c391x Appendix A
1/20
A
Matrix Properties
THIS appendix provides a reasonably comprehensive account of matrix properties, which are used in the linear algebra of estimation and control theory. Several theorems are shown, but are not proven here; those proofs given are constructive (i.e., suggest an algorithm). The account here is thus not satisfactorily self-contained, but references are provided where rigorous proofs may be found.
A.1 Basic Definitions of Matrices
The system of m linear equations

\[
\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
&\;\;\vdots \\
y_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{aligned}
\tag{A.1}
\]

can be written in matrix form as

\[ y = A\,x \tag{A.2} \]

where y is an m × 1 vector, x is an n × 1 vector (see §A.2 for a definition of a vector), and A is an m × n matrix, with

\[
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}
\tag{A.3}
\]

If m = n, then the matrix A is square.
Matrix Addition, Subtraction, and Multiplication

Matrices can be added, subtracted, or multiplied. For addition and subtraction, all matrices must be of the same dimension. Suppose we wish to add/subtract two matrices A and B:

\[ C = A \pm B \tag{A.4} \]
533 © 2004 by CRC Press LLC
534 Optimal Estimation of Dynamic Systems
Then each element of C is given by c_{ij} = a_{ij} ± b_{ij}. Matrix addition and subtraction are both commutative, A ± B = B ± A, and associative, (A ± B) ± C = A ± (B ± C).

Matrix multiplication is much more complicated though. Suppose we wish to multiply two matrices A and B:

\[ C = A\,B \tag{A.5} \]

This operation is valid only when the number of columns of A is equal to the number of rows of B (i.e., A and B must be conformable). The resulting matrix C will have rows equal to the number of rows of A and columns equal to the number of columns of B. Thus, if A has dimension m × n and B has dimension n × p, then C will have dimension m × p. The c_{ij} element of C can be determined by

\[ c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj} \tag{A.6} \]

for all i = 1, 2, ..., m and j = 1, 2, ..., p. Matrix multiplication is associative, A (B C) = (A B) C, and distributive, A (B + C) = A B + A C, but not commutative in general, A B ≠ B A. In some cases though, if A B = B A, then A and B are said to commute.
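As an illustration of eqn. (A.6), here is a minimal pure-Python sketch of matrix multiplication (the function name `matmul` and the list-of-rows representation are ours, for illustration only; production code should use an optimized linear algebra library):

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B per eqn. (A.6):
    c_ij = sum_k a_ik * b_kj.  Matrices are stored as lists of rows."""
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "A and B must be conformable"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]           # 3 x 2
C = matmul(A, B)         # 2 x 2 result: [[58, 64], [139, 154]]
```

Note that `matmul(B, A)` is a 3 × 3 matrix, illustrating that A B ≠ B A in general.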
The transpose of a matrix, denoted A^T, has rows that are the columns of A and columns that are the rows of A. The transpose operator has the following properties:

\[ (\alpha A)^T = \alpha A^T, \quad \text{where } \alpha \text{ is a scalar} \tag{A.7a} \]
\[ (A + B)^T = A^T + B^T \tag{A.7b} \]
\[ (A\,B)^T = B^T A^T \tag{A.7c} \]

If A = A^T, then A is said to be a symmetric matrix. Also, if A = −A^T, then A is said to be a skew-symmetric matrix.
Matrix Inverse
We now discuss the properties of the matrix inverse. Suppose we are given both y and A in eqn. (A.2), and we want to determine x. The following terminology should be noted carefully: if m > n, the system in eqn. (A.2) is said to be overdetermined (there are more equations than unknowns). Under typical circumstances we will find that the exact solution for x does not exist; therefore algorithms for approximate solutions for x are usually characterized by some measure of how well the linear equations are satisfied. If m < n, the system is said to be underdetermined (there are fewer equations than unknowns), and an infinity of exact solutions typically exists. If m = n, a unique solution x = A^{-1} y exists if and only if the inverse A^{-1} exists; equivalently:

• A has linearly independent columns.
• A has linearly independent rows.

The inverse satisfies A^{-1} A = A\,A^{-1} = I
where I is an n × n identity matrix:

\[ I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \tag{A.8} \]
A nonsingular matrix is a matrix whose inverse exists (likewise A^T is nonsingular):

\[ (A^{-1})^{-1} = A \tag{A.9a} \]
\[ (A^T)^{-1} = (A^{-1})^T \equiv A^{-T} \tag{A.9b} \]

Furthermore, let A and B be n × n matrices. The matrix product A B is nonsingular if and only if A and B are nonsingular. If this condition is met, then

\[ (A\,B)^{-1} = B^{-1} A^{-1} \tag{A.10} \]

Formal proof of this relationship and other relationships are given in Ref. [1]. The inverse of a square matrix A can be computed by

\[ A^{-1} = \frac{\operatorname{adj}(A)}{\det(A)} \tag{A.11} \]
where adj(A) is the adjoint of A and det(A) is the determinant of A. The adjoint and determinant of a matrix with large dimension can ultimately be broken down to a series of 2 × 2 matrix cases, where the adjoint and determinant are given by

\[ \operatorname{adj}(A_{2\times 2}) = \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} \tag{A.12a} \]
\[ \det(A_{2\times 2}) = a_{11} a_{22} - a_{12} a_{21} \tag{A.12b} \]

Other determinant identities are given by
\[ \det(I) = 1 \tag{A.13a} \]
\[ \det(A\,B) = \det(A)\det(B) \tag{A.13b} \]
\[ \det(A\,B) = \det(B\,A) \tag{A.13c} \]
\[ \det(A\,B + I) = \det(B\,A + I) \tag{A.13d} \]
\[ \det(A + x\,y^T) = \det(A)\,(1 + y^T A^{-1} x) \tag{A.13e} \]
\[ \det(A)\det(D + C A^{-1} B) = \det(D)\det(A + B D^{-1} C) \tag{A.13f} \]
\[ \det(A\,A^T) = [\det(A)]^2, \ \text{which must be positive if } \det(A) \neq 0 \tag{A.13g} \]
\[ \det(\alpha A) = \alpha^n \det(A) \tag{A.13h} \]
\[ \det(A_{3\times 3}) \equiv \det\begin{bmatrix} a & b & c \end{bmatrix} = a^T [b\times]\, c = b^T [c\times]\, a = c^T [a\times]\, b \tag{A.13i} \]
where the matrices [a×], [b×], and [c×] are defined in eqn. (A.38). The adjoint is given by the transpose of the cofactor matrix:

\[ \operatorname{adj}(A) = [\operatorname{cof}(A)]^T \tag{A.14} \]

The cofactor is given by

\[ C_{ij} = (-1)^{i+j} M_{ij} \tag{A.15} \]

where M_{ij} is the minor, which is the determinant of the resulting matrix given by crossing out the row and column of the element a_{ij}. The determinant can be computed using an expansion about row i or column j:

\[ \det(A) = \sum_{k=1}^{n} a_{ik}\, C_{ik} = \sum_{k=1}^{n} a_{kj}\, C_{kj} \tag{A.16} \]
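Eqns. (A.11) and (A.12) can be checked numerically for the 2 × 2 case; the following pure-Python sketch (the function name `inv2` is ours, for illustration only) forms A^{-1} = adj(A)/det(A):

```python
def inv2(A):
    """Invert a 2 x 2 matrix via eqns. (A.11)-(A.12): A^{-1} = adj(A)/det(A)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21          # eqn. (A.12b)
    if det == 0:
        raise ValueError("matrix is singular")
    adj = [[a22, -a12], [-a21, a11]]     # eqn. (A.12a)
    return [[adj[i][j] / det for j in range(2)] for i in range(2)]

A = [[4.0, 7.0], [2.0, 6.0]]
Ainv = inv2(A)                           # [[0.6, -0.7], [-0.2, 0.4]]
```

Multiplying A by `Ainv` recovers the identity matrix, confirming A A^{-1} = I.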
From eqn. (A.11), A^{-1} exists if and only if the determinant of A is nonzero. Matrix inverses are usually complicated to compute numerically; however, a special case is when the inverse is given by the transpose of the matrix itself. This matrix is then said to be orthogonal, with the property

\[ A^T A = A\,A^T = I \tag{A.17} \]

Also, the determinant of an orthogonal matrix can be shown to be ±1. An orthogonal matrix preserves the length (norm) of a vector (see eqn. (A.27) for a definition of the norm of a vector). Hence, if A is an orthogonal matrix, then ||A x|| = ||x||.

Block Structures and Other Identities
Matrices can also be analyzed using block structures. Assume that A is an n × n matrix and that C is an m × m matrix. Then, we have

\[ \det\begin{bmatrix} A & B \\ 0 & C \end{bmatrix} = \det\begin{bmatrix} A & 0 \\ B & C \end{bmatrix} = \det(A)\det(C) \tag{A.18a} \]

\[ \det\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det(A)\det(P) = \det(D)\det(Q) \tag{A.18b} \]

\[ \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1}
= \begin{bmatrix} Q^{-1} & -Q^{-1} B D^{-1} \\ -D^{-1} C Q^{-1} & D^{-1}(I + C Q^{-1} B D^{-1}) \end{bmatrix}
= \begin{bmatrix} A^{-1}(I + B P^{-1} C A^{-1}) & -A^{-1} B P^{-1} \\ -P^{-1} C A^{-1} & P^{-1} \end{bmatrix} \tag{A.18c} \]

where P and Q are Schur complements of A and D:

\[ P \equiv D - C A^{-1} B \tag{A.19a} \]
\[ Q \equiv A - B D^{-1} C \tag{A.19b} \]
Other useful matrix identities involve the Sherman-Morrison lemma, given by

\[ (I + A\,B)^{-1} = I - A\,(I + B\,A)^{-1} B \tag{A.20} \]

and the matrix inversion lemma, given by

\[ (A + B\,C\,D)^{-1} = A^{-1} - A^{-1} B\,\left(D\,A^{-1} B + C^{-1}\right)^{-1} D\,A^{-1} \tag{A.21} \]

where A is an arbitrary n × n matrix and C is an arbitrary m × m matrix. A proof of the matrix inversion lemma is given in §1.3.
Matrix Trace
Another useful quantity often used in estimation theory is the trace of a matrix, which is defined only for square matrices:

\[ \operatorname{Tr}(A) = \sum_{i=1}^{n} a_{ii} \tag{A.22} \]

Some useful identities involving the matrix trace are given by

\[ \operatorname{Tr}(\alpha A) = \alpha\,\operatorname{Tr}(A) \tag{A.23a} \]
\[ \operatorname{Tr}(A + B) = \operatorname{Tr}(A) + \operatorname{Tr}(B) \tag{A.23b} \]
\[ \operatorname{Tr}(A\,B) = \operatorname{Tr}(B\,A) \tag{A.23c} \]
\[ \operatorname{Tr}(x\,y^T) = x^T y \tag{A.23d} \]
\[ \operatorname{Tr}(A\,y\,x^T) = x^T A\,y \tag{A.23e} \]
\[ \operatorname{Tr}(A\,B\,C\,D) = \operatorname{Tr}(B\,C\,D\,A) = \operatorname{Tr}(C\,D\,A\,B) = \operatorname{Tr}(D\,A\,B\,C) \tag{A.23f} \]

Equation (A.23f) shows the cyclic invariance of the trace. The operation y x^T is known as the outer product (also y x^T ≠ x y^T in general).

Solution of Triangular Systems
An upper triangular system of linear equations has the form

\[
\begin{aligned}
t_{11}x_1 + t_{12}x_2 + t_{13}x_3 + \cdots + t_{1n}x_n &= y_1 \\
t_{22}x_2 + t_{23}x_3 + \cdots + t_{2n}x_n &= y_2 \\
t_{33}x_3 + \cdots + t_{3n}x_n &= y_3 \\
&\;\;\vdots \\
t_{nn}x_n &= y_n
\end{aligned}
\tag{A.24}
\]

or

\[ T\,x = y \tag{A.25} \]

where

\[ T = \begin{bmatrix} t_{11} & t_{12} & t_{13} & \cdots & t_{1n} \\ 0 & t_{22} & t_{23} & \cdots & t_{2n} \\ 0 & 0 & t_{33} & \cdots & t_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & t_{nn} \end{bmatrix} \tag{A.26} \]
The matrix T can be shown to be nonsingular if and only if its diagonal elements are nonzero [1]. Clearly, x_n can be easily determined using the upper triangular form. The x_i coefficients can be determined by a back substitution algorithm:

for i = n, n−1, ..., 1
    x_i = t_ii^{-1} ( y_i − Σ_{j=i+1}^{n} t_ij x_j )
next i

This algorithm will fail only if t_ii ≈ 0. But, this can occur only if T is singular (or nearly singular). Experience indicates that the algorithm is well-behaved for most applications though.
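The back substitution algorithm above translates directly to Python (a sketch with 0-based indices; the function name `back_substitute` is ours, for illustration only):

```python
def back_substitute(T, y):
    """Solve the upper triangular system T x = y of eqn. (A.25) by back
    substitution: x_i = (y_i - sum_{j>i} t_ij x_j) / t_ii."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):       # i = n, n-1, ..., 1 (0-based here)
        s = sum(T[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / T[i][i]      # fails only if t_ii is zero
    return x

T = [[2.0, 1.0, 1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
y = [7.0, 13.0, 8.0]
x = back_substitute(T, y)                # [1.0, 3.0, 2.0]
```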
The back substitution algorithm can be modified to compute the inverse, S = T^{-1}, of an upper triangular matrix T. We now summarize an algorithm for calculating S = T^{-1} and overwriting T by T^{-1}:

for k = n, n−1, ..., 1
    t_kk ← s_kk = t_kk^{-1}
    t_ik ← s_ik = −t_ii^{-1} Σ_{j=i+1}^{k} t_ij s_jk,   i = k−1, k−2, ..., 1
next k

where ← denotes replacement. This algorithm requires about n³/6 calculations (note: if only the solution for x is required and not the explicit form for T^{-1}, then the back substitution algorithm should be solely employed, since only n²/2 calculations are required for this algorithm).
A.2 Vectors
The quantities x and y in eqn. (A.2) are known as vectors, which are a special case of a matrix. Vectors can consist of one row, known as a row vector, or one column, known as a column vector.
Vector Norm and Dot Product
A measure of the length of a vector is given by the norm:

\[ ||x|| \equiv \sqrt{x^T x} = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} \tag{A.27} \]

Also, ||αx|| = |α| ||x||. A vector with norm one is said to be a unit vector. (The symbol x ← y means overwrite x by the current y-value; this notation is employed to indicate how storage may be conserved by overwriting quantities no longer needed.) Any
[Figure A.1: Depiction of the Angle between Two Vectors and an Orthogonal Projection. (a) Angle between Two Vectors; (b) Orthogonal Projection.]
nonzero vector can be made into a unit vector by dividing it by its norm:

\[ \hat{x} = \frac{x}{||x||} \tag{A.28} \]

Note that the caret is also used to denote an estimate in this text. The dot product or inner product of two vectors of equal dimension, n × 1, is given by

\[ x^T y = y^T x = \sum_{i=1}^{n} x_i\, y_i \tag{A.29} \]

If the dot product is zero, then the vectors are said to be orthogonal. Suppose that a set of vectors x_i (i = 1, 2, ..., m) follows

\[ x_i^T x_j = \delta_{ij} \tag{A.30} \]

where the Kronecker delta δ_{ij} is defined as

\[ \delta_{ij} = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} \tag{A.31} \]

Then, this set is said to be orthonormal. The column and row vectors of an orthogonal matrix, defined by the property shown in eqn. (A.17), form an orthonormal set.
Angle Between Two Vectors and the Orthogonal Projection
Figure A.1(a) shows two vectors, x and y, and the angle θ, which is the angle between them. This angle can be computed from the cosine law:

\[ \cos(\theta) = \frac{x^T y}{||x||\,||y||} \tag{A.32} \]
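Eqn. (A.32) can be exercised with a short sketch (the helper name `angle_between` is ours, for illustration only):

```python
import math

def angle_between(x, y):
    """Angle between two vectors from the cosine law, eqn. (A.32):
    cos(theta) = x^T y / (||x|| ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))      # x^T y
    nx = math.sqrt(sum(a * a for a in x))       # ||x||
    ny = math.sqrt(sum(b * b for b in y))       # ||y||
    return math.acos(dot / (nx * ny))

theta = angle_between([1.0, 0.0], [1.0, 1.0])   # pi/4 (a 45-degree angle)
```

Orthogonal vectors (zero dot product) give θ = π/2, consistent with eqn. (A.30).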
[Figure A.2: Cross Product and the Right Hand Rule]
Figure A.1(b) shows the orthogonal projection of a vector y onto a vector x. The orthogonal projection of y onto x is given by

\[ p = \frac{x^T y}{||x||^2}\, x \tag{A.33} \]

This projection yields (y − p)^T x = 0.

Triangle and Schwartz Inequalities
Some important inequalities are given by the triangle inequality:

\[ ||x + y|| \leq ||x|| + ||y|| \tag{A.34} \]

and the Schwartz inequality:

\[ |x^T y| \leq ||x||\,||y|| \tag{A.35} \]

Note that the Schwartz inequality implies the triangle inequality.
Cross Product
The cross product of two vectors yields a vector that is perpendicular to both vectors. The cross product of x and y is given by

\[ z = x \times y = \begin{bmatrix} x_2 y_3 - x_3 y_2 \\ x_3 y_1 - x_1 y_3 \\ x_1 y_2 - x_2 y_1 \end{bmatrix} \tag{A.36} \]

The cross product follows the right hand rule, which states that the orientation of z is determined by placing x and y tail-to-tail, flattening the right hand, extending it in the direction of x, and then curling the fingers in the direction that the angle y makes with x. The thumb then points in the direction of z, as shown in Figure A.2. The cross product can also be obtained using matrix multiplication:

\[ z = [x \times]\, y \tag{A.37} \]
where [x×] is the cross product matrix, defined by

\[ [x \times] \equiv \begin{bmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{bmatrix} \tag{A.38} \]

Note that [x×] is a skew-symmetric matrix. The cross product has the following properties:
\[ [x \times]^T = -[x \times] \tag{A.39a} \]
\[ [x \times]\, y = -[y \times]\, x \tag{A.39b} \]
\[ [x \times][y \times] = -x^T y\, I + y\, x^T \tag{A.39c} \]
\[ [x \times]^3 = -x^T x\, [x \times] \tag{A.39d} \]
\[ [x \times][y \times] - [y \times][x \times] = y\,x^T - x\,y^T = [(x \times y) \times] \tag{A.39e} \]
\[ x\,y^T [w \times] + [w \times]\, y\, x^T = [\{x \times (y \times w)\} \times] \tag{A.39f} \]
\[ (I - [x \times])(I + [x \times])^{-1} = \frac{1}{1 + x^T x}\left[ (1 - x^T x)\, I + 2\,x\,x^T - 2\,[x \times] \right] \tag{A.39g} \]
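The equivalence of eqns. (A.36) and (A.37) can be verified numerically; the sketch below (helper names ours, for illustration only) builds [x×] from eqn. (A.38):

```python
def cross_matrix(x):
    """Cross product matrix [x x] of eqn. (A.38); z = [x x] y gives x cross y."""
    x1, x2, x3 = x
    return [[0.0, -x3,  x2],
            [ x3, 0.0, -x1],
            [-x2,  x1, 0.0]]

def cross(x, y):
    """Cross product per eqn. (A.36)."""
    return [x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0]]

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
Xm = cross_matrix(x)
z1 = [sum(Xm[i][k] * y[k] for k in range(3)) for i in range(3)]  # [x x] y
z2 = cross(x, y)
# z1 == z2 == [-3.0, 6.0, -3.0], perpendicular to both x and y
```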
Other useful properties involving an arbitrary 3 × 3 square matrix M are given by [2]

\[ M [x \times] + [x \times] M^T + [(M^T x) \times] = \operatorname{Tr}(M)\, [x \times] \tag{A.40a} \]
\[ M [x \times] M^T = [\{\operatorname{adj}(M^T)\, x\} \times] \tag{A.40b} \]
\[ (M x) \times (M y) = \operatorname{adj}(M^T)\,(x \times y) \tag{A.40c} \]
\[ [\{(M x) \times (M y)\} \times] = M [(x \times y) \times] M^T \tag{A.40d} \]

If we write M in terms of its columns

\[ M = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \tag{A.41} \]

then

\[ \det(M) = x_1^T (x_2 \times x_3) \tag{A.42} \]

Also, if A is an orthogonal matrix with determinant 1, then from eqn. (A.40b) we have

\[ A [x \times] A^T = [(A x) \times] \tag{A.43} \]

Another important quantity using eqn. (A.39c) is given by

\[ [x \times]^2 = -x^T x\, I + x\, x^T \tag{A.44} \]

This matrix is the projection operator onto the space perpendicular to x. Many other interesting relations involving the cross product are given in Ref. [3].
Table A.1: Matrix and Vector Norms

Norm             Vector                                     Matrix
One-norm         ||x||_1 = \sum_{i=1}^{n} |x_i|             ||A||_1 = max_j \sum_{i=1}^{n} |a_{ij}|
Two-norm         ||x||_2 = (\sum_{i=1}^{n} x_i^2)^{1/2}     ||A||_2 = maximum singular value of A
Frobenius norm   ||x||_F = ||x||_2                          ||A||_F = [\operatorname{Tr}(A^* A)]^{1/2}
Infinity-norm    ||x||_∞ = max_i |x_i|                      ||A||_∞ = max_i \sum_{j=1}^{n} |a_{ij}|
The angle θ in Figure A.1(a) can be computed from

\[ \sin(\theta) = \frac{||x \times y||}{||x||\,||y||} \tag{A.45} \]

Using sin²(θ) + cos²(θ) = 1, eqns. (A.32) and (A.45) also give

\[ ||x \times y|| = \sqrt{(x^T x)(y^T y) - (x^T y)^2} \tag{A.46} \]

From the Schwartz inequality in eqn. (A.35), the quantity within the square root in eqn. (A.46) is always positive.
A.3 Matrix Norms and Definiteness
Norms for matrices are slightly more difficult to define than for vectors. Also, the definition of a positive or negative matrix is more complicated for matrices than scalars. Before showing these quantities, we first define the following quantities for any complex matrix A:

• Conjugate transpose: defined as the transpose of the conjugate of each element; denoted by A^*.
• Hermitian: has the property A^* = A (note: any real symmetric matrix is Hermitian).
• Normal: has the property A^* A = A\,A^*.
• Unitary: inverse is equal to its Hermitian transpose, so that A^* A = A\,A^* = I (note: a real unitary matrix is an orthogonal matrix).
Matrix Norms
Several possible matrix norms can be defined. Table A.1 lists the most commonly used norms for both vectors and matrices. The one-norm is the largest column sum. The two-norm is the maximum singular value (see §A.4). Also, unless otherwise stated, the norm defined without showing a subscript is the two-norm, as shown by eqn. (A.27). The Frobenius norm is defined as the square root of the sum of the absolute squares of its elements. The infinity-norm is the largest row sum. The matrix norms described in Table A.1 have the following properties:

\[ ||\alpha A|| = |\alpha|\,||A|| \tag{A.47a} \]
\[ ||A + B|| \leq ||A|| + ||B|| \tag{A.47b} \]
\[ ||A\,B|| \leq ||A||\,||B|| \tag{A.47c} \]

Not all norms follow eqn. (A.47c) though (e.g., the maximum absolute matrix element). More matrix norm properties can be found in Refs. [4] and [5].
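The one-, infinity-, and Frobenius norms of Table A.1 reduce to short loops; a pure-Python sketch (helper names ours, for illustration only):

```python
def one_norm(A):
    """Largest absolute column sum: ||A||_1 (Table A.1)."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def inf_norm(A):
    """Largest absolute row sum: ||A||_inf (Table A.1)."""
    return max(sum(abs(v) for v in row) for row in A)

def fro_norm(A):
    """Frobenius norm: square root of the sum of squared elements."""
    return sum(v * v for row in A for v in row) ** 0.5

A = [[1.0, -7.0],
     [-2.0, 3.0]]
# column sums: |1|+|-2| = 3, |-7|+|3| = 10 -> ||A||_1 = 10
# row sums:    |1|+|-7| = 8, |-2|+|3| = 5  -> ||A||_inf = 8
```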
Definiteness
Sufficiency tests in least squares and the minimization of functions with multiple variables often require that one determine the definiteness of the matrix of second partial derivatives. A real and square matrix A is

• Positive definite if x^T A x > 0 for all nonzero x.
• Positive semi-definite if x^T A x ≥ 0 for all nonzero x.
• Negative definite if x^T A x < 0 for all nonzero x.
• Negative semi-definite if x^T A x ≤ 0 for all nonzero x.

For any positive semi-definite matrix A ≥ 0 there exists a unique positive semi-definite matrix B such that A = B² (note: A and B commute, so that A B = B A). The following relationship:

\[ B > A \tag{A.49} \]

implies (B − A) > 0, which states that the matrix (B − A) is positive definite. Also,

\[ B \geq A \tag{A.50} \]
implies (B − A) ≥ 0, which states that the matrix (B − A) is positive semi-definite. The conditions for negative definite and negative semi-definite are obvious from the definitions stated for positive definite and positive semi-definite.
A.4 Matrix Decompositions
Several matrix decompositions are given in the open literature. Many of these decompositions are used in place of a matrix inverse either to simplify the calculations or to provide more numerically robust approaches. In this section we present several useful matrix decompositions that are widely used in estimation and control theory. The methods to compute these decompositions are beyond the scope of the present text. Reference [4] provides all the necessary algorithms and proofs for the interested reader. Before we proceed, a short description of the rank of a matrix is provided. Several definitions are possible. We will state that the rank of a matrix is given by the dimension of the range of the matrix, corresponding to the number of linearly independent rows or columns. An m × n matrix is rank deficient if the rank of A is less than min(m, n). Suppose that the rank of an n × n matrix A is given by rank(A) = r. Then, a set of (n − r) nonzero unit vectors, x_i, can always be found that have the following property for a singular square matrix A:

\[ A\,x_i = 0, \quad i = 1, 2, \ldots, n - r \tag{A.51} \]

The value of (n − r) is known as the nullity, which is the maximum number of linearly independent null vectors of A. These vectors can form an orthonormal basis (which is how they are commonly shown) for the null space of A, and can be computed from the singular value decomposition. If A is nonsingular, then no nonzero vector x_i can be found to satisfy eqn. (A.51). For more details on the rank of a matrix see Refs. [4] and [6].
Eigenvalue/Eigenvector Decomposition and the Cayley-Hamilton Theorem
One of the most widely used decompositions for a square n × n matrix A in the study of dynamical systems is the eigenvalue/eigenvector decomposition. A real or complex number λ is an eigenvalue of A if there exists a nonzero (right) eigenvector p such that

\[ A\,p = \lambda\,p \tag{A.52} \]

The solution for p is not unique in general, so usually p is given as a unit vector. In order for eqn. (A.52) to have a nonzero solution for p, from eqn. (A.51), the matrix (λI − A) must be singular. Therefore, from eqn. (A.11) we have

\[ \det(\lambda I - A) = \lambda^n + \alpha_1 \lambda^{n-1} + \cdots + \alpha_{n-1} \lambda + \alpha_n = 0 \tag{A.53} \]

Equation (A.53) leads to a polynomial of degree n, which is called the characteristic equation of A. For example, the characteristic equation for a 3 × 3 matrix is given
by

\[ \lambda^3 - \lambda^2 \operatorname{Tr}(A) + \lambda \operatorname{Tr}[\operatorname{adj}(A)] - \det(A) = 0 \tag{A.54} \]

If all eigenvalues of A are distinct, then the set of eigenvectors is linearly independent. Therefore, the matrix A can be diagonalized as

\[ \Lambda = P^{-1} A\,P \tag{A.55} \]

where Λ = diag(λ₁, λ₂, ..., λₙ) and P = [p₁ p₂ ⋯ pₙ]. If A has repeated eigenvalues, then a block diagonal and triangular-form representation must be used, called a Jordan block [6]. The eigenvalue/eigenvector decomposition can be used for linear state variable transformations (see §3.1.4). Eigenvalues and eigenvectors can either be real or complex. This decomposition is very useful when A is symmetric, since Λ is always diagonal (even for repeated eigenvalues) and P is orthogonal for this case. A proof is given in Ref. [4]. Also, Ref. [4] provides many algorithms to compute the eigenvalue/eigenvector decomposition.

One of the most useful properties used in linear algebra is the Cayley-Hamilton theorem, which states that a matrix satisfies its own characteristic equation, so that

\[ A^n + \alpha_1 A^{n-1} + \cdots + \alpha_{n-1} A + \alpha_n I = 0 \tag{A.56} \]

This theorem is useful for computing powers of A that are larger than n, since A^{n+1} can be written as a linear combination of (A, A², ..., Aⁿ) [6].
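For a 2 × 2 matrix the characteristic equation is λ² − Tr(A)λ + det(A) = 0, so the Cayley-Hamilton theorem asserts A² − Tr(A)A + det(A)I = 0; a quick numerical check (pure-Python sketch, helper name ours):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1.0, 2.0],
     [3.0, 4.0]]
tr = A[0][0] + A[1][1]                        # Tr(A) = 5
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # det(A) = -2
A2 = matmul(A, A)
# Cayley-Hamilton: A^2 - Tr(A) A + det(A) I should be the zero matrix
CH = [[A2[i][j] - tr * A[i][j] + det * (1.0 if i == j else 0.0)
       for j in range(2)] for i in range(2)]
```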
QR Decomposition

The QR decomposition is especially useful in least squares (see §1.6.1) and the Square Root Information Filter (SRIF) (see §5.7.1). The QR decomposition of an m × n matrix A, with m ≥ n, is given by

\[ A = Q\,R \tag{A.57} \]

where Q is an m × m orthogonal matrix, and R is an upper triangular m × n matrix with all elements r_{ij} = 0 for i > j. If A has full column rank, then the first n columns of Q form an orthonormal basis for the range of A [4]. Therefore, the thin QR decomposition is often used:

\[ A = \tilde{Q}\,\tilde{R} \tag{A.58} \]

where Q̃ is an m × n matrix with orthonormal columns and R̃ is an upper triangular n × n matrix. Since the QR decomposition is widely used throughout the present text, we present a numerical algorithm to compute this decomposition by the modified Gram-Schmidt method [4]. Let A and Q̃ be partitioned by columns [a₁ a₂ ⋯ aₙ] and [q₁ q₂ ⋯ qₙ], respectively. To begin the algorithm we set Q̃ = A, and then

for k = 1, 2, ..., n
    r_kk = ||q_k||₂
    q_k ← q_k / r_kk
    r_kj = q_k^T q_j,   j = k+1, ..., n
    q_j ← q_j − r_kj q_k,   j = k+1, ..., n
next k

where ← denotes replacement (note: r_kk and r_kj are elements of the matrix R̃). This algorithm works even when A is complex. The QR decomposition is useful to invert an n × n matrix A, which is given by A^{-1} = R̃^{-1} Q̃^T (note: the inverse of an upper triangular matrix is also a triangular matrix). Other methods, based on the Householder transformation and Givens rotations, can be used for the QR decomposition [4].
Singular Value Decomposition

Another decomposition of an m × n matrix A is the singular value decomposition [4, 7], which decomposes a matrix into a diagonal matrix and two orthogonal matrices:

\[ A = U\,S\,V^* \tag{A.59} \]

where U is an m × m unitary matrix, S is an m × n diagonal matrix such that S_{ij} = 0 for i ≠ j, and V is an n × n unitary matrix. Many efficient algorithms can be used to determine the singular value decomposition [4]. Note that the zeros below the diagonal in S (with m > n) imply that the elements of columns (n+1), (n+2), ..., m of U are arbitrary. So, we can define the following reduced singular value decomposition:

\[ A = \tilde{U}\,\tilde{S}\,\tilde{V}^* \tag{A.60} \]

where Ũ is the m × n subset matrix of U (with the (n+1), (n+2), ..., m columns eliminated), S̃ is the upper n × n matrix of S, and Ṽ = V. Note that Ũ^* Ũ = I, but it is no longer possible to make the same statement for Ũ Ũ^*. The elements of S̃ = diag(s₁, ..., sₙ) are known as the singular values of A, which are ordered from the smallest singular value to the largest singular value. These values are extremely important since they can give an indication of how well we can invert a matrix [8]. A common measure of the invertibility of a matrix is the condition number, which is usually defined as the ratio of its largest singular value to its smallest singular value:

\[ \text{Condition Number} = \frac{s_n}{s_1} \tag{A.61} \]

Large condition numbers may indicate a near singular matrix, and the minimum value of the condition number is unity (which occurs when the matrix is orthogonal). The rank of A is given by the number of nonzero singular values. Also, the singular value decomposition is useful to determine various norms (e.g., ||A||_F² = s₁² + ⋯ + s_p², where p = min(m, n), and the two-norm as shown in Table A.1).

Gaussian Elimination
Gaussian elimination is a classical reduction procedure by which a matrix A can be reduced to upper triangular form. This procedure involves pre-multiplications of a square matrix A by a sequence of elementary lower triangular matrices, each chosen to introduce a column with zeros below the diagonal (this process is often called annihilation). Several possible variations of Gaussian elimination can be derived.
We present a very robust algorithm called Gaussian elimination with complete pivoting. This approach requires data movements such as the interchange of two matrix rows. These interchanges can be tracked by using permutation matrices, which are just identity matrices with rows or columns reordered. For example, consider the following matrix:

\[ P = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \tag{A.62} \]
So P A is a row permutated version of A and A P is a column permutated version of A. Permutation matrices are orthogonal. This algorithm computes the complete pivoting factorization

\[ A = P\,L\,U\,Q^T \tag{A.63} \]

where P and Q are permutation matrices, L is a unit (with ones along the diagonal) lower triangular matrix, and U is an upper triangular matrix. The algorithm begins by setting P = Q = I, which are partitioned into column vectors as

\[ P = [p_1\; p_2\; \cdots\; p_n], \quad Q = [q_1\; q_2\; \cdots\; q_n] \tag{A.64} \]

The algorithm for Gaussian elimination with complete pivoting is given by overwriting the A matrix:

for k = 1, 2, ..., n−1
    Determine μ with k ≤ μ ≤ n and λ with k ≤ λ ≤ n so that
        |a_{μλ}| = max{|a_{ij}|, i = k, ..., n, j = k, ..., n}
    if μ ≠ k
        p_k ↔ p_μ
        a_{kj} ↔ a_{μj},   j = 1, 2, ..., n
    end if
    if λ ≠ k
        q_k ↔ q_λ
        a_{jk} ↔ a_{jλ},   j = 1, 2, ..., n
    end if
    if a_{kk} ≠ 0
        a_{jk} ← a_{jk} / a_{kk},   j = k+1, ..., n
        a_{ji} ← a_{ji} − a_{jk} a_{ki},   i, j = k+1, ..., n
    end if
next k

where ← denotes replacement and ↔ denotes interchange of the assigned values. The matrix U is given by the upper triangular part (including the diagonal elements) of the overwritten A matrix, and the matrix L is given by the lower triangular part (replacing the diagonal elements with ones) of the overwritten A matrix. More details on Gaussian elimination can be found in Ref. [4].
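The elimination loop above, stripped of the complete-pivoting row/column interchanges (a simplified sketch that assumes no zero pivot is encountered; the helper name is ours), overwrites a copy of A with the multipliers and the upper triangular factor:

```python
def lu_no_pivot(A):
    """Gaussian elimination overwriting a copy of A, as in the algorithm
    above but WITHOUT the pivoting interchanges (a simplified sketch that
    assumes nonzero pivots).  Returns (L, U) with L unit lower triangular
    and U upper triangular such that L U = A."""
    n = len(A)
    M = [row[:] for row in A]
    for k in range(n - 1):
        for j in range(k + 1, n):
            M[j][k] /= M[k][k]                # multiplier stored in place
            for i in range(k + 1, n):
                M[j][i] -= M[j][k] * M[k][i]  # eliminate below the pivot
    L = [[1.0 if i == j else (M[i][j] if j < i else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[M[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U

A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
L, U = lu_no_pivot(A)   # L U reconstructs A
```

In practice the pivoting bookkeeping of the full algorithm is what makes the factorization robust; this stripped-down form is only for seeing the elimination pattern.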
LU and Cholesky Decompositions

The LU decomposition factors an n × n matrix A into a product of a lower triangular matrix L and an upper triangular matrix U, so that

\[ A = L\,U \tag{A.65} \]

Gaussian elimination is a foremost example of LU decompositions. In general, the LU decomposition is not unique. This can be seen by observing that for an arbitrary nonsingular diagonal matrix D, setting L′ = L D and U′ = D^{-1} U yields new lower and upper triangular matrices that satisfy L′ U′ = L D D^{-1} U = L U = A. The fact that the decomposition is not unique suggests the possible wisdom of forming the normalized decomposition

\[ A = L\,D\,U \tag{A.66} \]

in which L and U are unit lower and upper triangular matrices and D is a diagonal matrix. The question of existence and uniqueness is addressed by Stewart [1], who proves that the A = L D U decomposition is unique, provided the leading diagonal sub-matrices of A are nonsingular.

There are three important variants of the L D U decomposition; the first associates D with the lower triangular part to give the factorization

\[ A = \check{L}\,U \tag{A.67} \]

where Ľ ≡ L D. This is known as the Crout reduction. The second variant associates D with the upper triangular factor as

\[ A = L\,\check{U} \tag{A.68} \]

where Ǔ ≡ D U. This reduction is exactly that obtained by Gaussian elimination. The third variation is possible only for symmetric positive definite matrices, in which case

\[ A = L\,D\,L^T \tag{A.69} \]

Thus A can be written as

\[ A = \bar{L}\,\bar{L}^T \tag{A.70} \]

where now L̄ ≡ L D^{1/2} is known as the matrix square root, and the factorization in eqn. (A.69) is known as the Cholesky decomposition. Efficient algorithms to compute the LU and Cholesky decompositions can be found in Ref. [4].
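A standard inner-product form of the Cholesky factor L̄ of eqn. (A.70) can be sketched as follows (assuming a symmetric positive definite A; the helper name is ours, for illustration only):

```python
def cholesky(A):
    """Cholesky factor Lbar of eqn. (A.70): A = Lbar Lbar^T for a
    symmetric positive definite A.  Returns lower triangular Lbar."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (A[i][i] - s) ** 0.5     # diagonal entries
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]  # below-diagonal entries
    return L

A = [[4.0, 2.0],
     [2.0, 5.0]]
L = cholesky(A)            # [[2.0, 0.0], [1.0, 2.0]]
```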
A.5 Matrix Calculus
In this section several relations are given for taking partial or time derivatives of matrices. Before providing a list of matrix calculus identities, we will first define
†Most of these relations can be found in a website given by Mike Brookes, Imperial College, London, UK. As of this writing this website is given by http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html.
the Jacobian and Hessian of a scalar function f(x), where x is an n × 1 vector. The Jacobian of f(x) is an n × 1 vector given by

\[ \nabla_x f \equiv \frac{\partial f}{\partial x} = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \dfrac{\partial f}{\partial x_2} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{bmatrix} \tag{A.71} \]
The Hessian of f(x) is an n × n matrix given by

\[ \nabla_x^2 f \equiv \frac{\partial^2 f}{\partial x\,\partial x^T} = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1 \partial x_1} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n \partial x_n}
\end{bmatrix} \tag{A.72} \]
Note that the Hessian of a scalar is a symmetric matrix. If f(x) is an m × 1 vector and x is an n × 1 vector, then the Jacobian matrix is given by

\[ \nabla_x f \equiv \frac{\partial f}{\partial x} = \begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\
\dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial f_m}{\partial x_1} & \dfrac{\partial f_m}{\partial x_2} & \cdots & \dfrac{\partial f_m}{\partial x_n}
\end{bmatrix} \tag{A.73} \]

Note that the Jacobian matrix is an m × n matrix. Also, there is a slight inconsistency between eqn. (A.71) and eqn. (A.73) when m = 1, since eqn. (A.71) gives an n × 1 vector, while eqn. (A.73) gives a 1 × n vector. This should pose no problems for the reader though, since the context of this notation is clear for the particular system shown in this text.
A list of derivatives involving linear products is given by

\[ \frac{\partial}{\partial x}(A\,x) = A \tag{A.74a} \]
\[ \frac{\partial}{\partial A}(a^T A\,b) = a\,b^T \tag{A.74b} \]
\[ \frac{\partial}{\partial A}(a^T A^T b) = b\,a^T \tag{A.74c} \]
\[ \frac{d}{dt}(A\,B) = A\,\frac{d}{dt}(B) + \left[\frac{d}{dt}(A)\right] B \tag{A.74d} \]
A list of derivatives involving quadratic and cubic products is given by

\[ \frac{\partial}{\partial x}\,(A\,x + b)^T C\,(D\,x + e) = A^T C\,(D\,x + e) + D^T C^T (A\,x + b) \tag{A.75a} \]
\[ \frac{\partial}{\partial x}(x^T C\,x) = (C + C^T)\,x \tag{A.75b} \]
\[ \frac{\partial}{\partial A}(a^T A^T A\,b) = A\,(a\,b^T + b\,a^T) \tag{A.75c} \]
\[ \frac{\partial}{\partial A}(a^T A^T C\,A\,b) = C^T A\,a\,b^T + C\,A\,b\,a^T \tag{A.75d} \]
\[ \frac{\partial}{\partial A}(A\,a + b)^T C\,(A\,a + b) = (C + C^T)\,(A\,a + b)\,a^T \tag{A.75e} \]
\[ \frac{\partial}{\partial x}(x^T A\,x\,x^T) = (A + A^T)\,x\,x^T + (x^T A\,x)\,I \tag{A.75f} \]
A list of derivatives involving the inverse of a matrix is given by

\[ \frac{d}{dt}(A^{-1}) = -A^{-1}\,\left[\frac{d}{dt}(A)\right] A^{-1} \tag{A.76a} \]
\[ \frac{\partial}{\partial A}(a^T A^{-1} b) = -A^{-T} a\,b^T A^{-T} \tag{A.76b} \]
A list of derivatives involving the trace of a matrix is given by

\[ \frac{\partial}{\partial A}\operatorname{Tr}(A) = \frac{\partial}{\partial A}\operatorname{Tr}(A^T) = I \tag{A.77a} \]
\[ \frac{\partial}{\partial A}\operatorname{Tr}(A^{-1}) = -\left(A^{-2}\right)^T \tag{A.77b} \]
\[ \frac{\partial}{\partial A}\operatorname{Tr}(C\,A^{-1} B) = -A^{-T} C^T B^T A^{-T} \tag{A.77c} \]
\[ \frac{\partial}{\partial A}\operatorname{Tr}(C^T A\,B^T) = \frac{\partial}{\partial A}\operatorname{Tr}(B\,A^T C) = C\,B \tag{A.77d} \]
\[ \frac{\partial}{\partial A}\operatorname{Tr}(C\,A\,B\,A^T D) = C^T D^T A\,B^T + D\,C\,A\,B \tag{A.77e} \]
\[ \frac{\partial}{\partial A}\operatorname{Tr}(C\,A\,B\,A) = C^T A^T B^T + B^T A^T C^T \tag{A.77f} \]
A list of derivatives involving the determinant of a matrix is given by

\[ \frac{\partial}{\partial A}\det(A) = \frac{\partial}{\partial A}\det(A^T) = [\operatorname{adj}(A)]^T \tag{A.78a} \]
\[ \frac{\partial}{\partial A}\det(C\,A\,B) = \det(C\,A\,B)\,A^{-T} \tag{A.78b} \]
\[ \frac{\partial}{\partial A}\ln[\det(C\,A\,B)] = A^{-T} \tag{A.78c} \]
\[ \frac{\partial}{\partial A}\det(A^{-1}) = -\det(A^{-1})\,A^{-T} \tag{A.78d} \]
\[ \frac{\partial}{\partial A}\ln[\det(A^{-1})] = -A^{-T} \tag{A.78e} \]
\[ \frac{\partial}{\partial A}\det(A^T C\,A) = \det(A^T C\,A)\,(C + C^T)\,A\,(A^T C\,A)^{-1} \tag{A.78f} \]
\[ \frac{\partial}{\partial A}\ln[\det(A^T C\,A)] = (C + C^T)\,A\,(A^T C\,A)^{-1} \tag{A.78g} \]
Relations involving the Hessian matrix are given by

\[ \frac{\partial^2}{\partial x\,\partial x^T}\,(A\,x + b)^T C\,(D\,x + e) = A^T C\,D + D^T C^T A \tag{A.79a} \]
\[ \frac{\partial^2}{\partial x\,\partial x^T}(x^T C\,x) = C + C^T \tag{A.79b} \]
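Eqns. (A.75b) and (A.79b) can be spot-checked by finite differences; in the sketch below (helper names ours, for illustration only), the analytic gradient (C + C^T)x of f(x) = x^T C x is compared against central differences:

```python
def grad_quadratic(C, x):
    """Analytic gradient of f(x) = x^T C x per eqn. (A.75b): (C + C^T) x."""
    n = len(x)
    return [sum((C[i][j] + C[j][i]) * x[j] for j in range(n)) for i in range(n)]

def f(C, x):
    """f(x) = x^T C x."""
    n = len(x)
    return sum(x[i] * C[i][j] * x[j] for i in range(n) for j in range(n))

C = [[1.0, 2.0],
     [0.0, 3.0]]
x = [1.0, 2.0]
g = grad_quadratic(C, x)       # [6.0, 14.0]

# central-difference check of the analytic gradient
h = 1e-6
num = []
for i in range(2):
    xp, xm = x[:], x[:]
    xp[i] += h
    xm[i] -= h
    num.append((f(C, xp) - f(C, xm)) / (2 * h))
```

Note that the Hessian of this f is C + C^T, a symmetric matrix, consistent with eqn. (A.79b).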
References
[1] Stewart, G.W., Introduction to Matrix Computations, Academic Press, New York, NY, 1973.

[2] Shuster, M.D., "A Survey of Attitude Representations," Journal of the Astronautical Sciences, Vol. 41, No. 4, Oct.-Dec. 1993, pp. 439-517.

[3] Tempelman, W., "The Linear Algebra of Cross Product Operations," Journal of the Astronautical Sciences, Vol. 36, No. 4, Oct.-Dec. 1988, pp. 447-461.

[4] Golub, G.H. and Van Loan, C.F., Matrix Computations, The Johns Hopkins University Press, Baltimore, MD, 3rd ed., 1996.

[5] Zhang, F., Linear Algebra: Challenging Problems for Students, The Johns Hopkins University Press, Baltimore, MD, 1996.

[6] Chen, C.T., Linear System Theory and Design, Holt, Rinehart and Winston, New York, NY, 1984.

[7] Horn, R.A. and Johnson, C.R., Matrix Analysis, Cambridge University Press, Cambridge, UK, 1985.

[8] Nash, J.C., Compact Numerical Methods for Computers: Linear Algebra and Function Minimization, Adam Hilger Ltd., Bristol, 1979.