Kronecker Products and Matrix Calculus:
with Applications
ALEXANDER GRAHAM, M.A., M.Sc., Ph.D., C.Eng., M.I.E.E. Senior Lecturer in Mathematics,
The Open University, Milton Keynes
ELLIS HORWOOD LIMITED Publishers· Chichester
Halsted Press: a division of JOHN WILEY & SONS
New York· Brisbane· Chichester· Toronto
First published in 1981 by ELLIS HORWOOD LIMITED, Market Cross House, Cooper Street, Chichester, West Sussex, PO19 1EB, England
The publisher's colophon is reproduced from James Gillison's drawing of the ancient Market Cross, Chichester.
Distributors: Australia, New Zealand, South-east Asia: Jacaranda-Wiley Ltd., Jacaranda Press, JOHN WILEY & SONS INC., G.P.O. Box 859, Brisbane, Queensland 4001, Australia. Canada: JOHN WILEY & SONS CANADA LIMITED, 22 Worcester Road, Rexdale, Ontario, Canada. Europe, Africa: JOHN WILEY & SONS LIMITED, Baffins Lane, Chichester, West Sussex, England.
North and South America and the rest of the world: Halsted Press: a division of JOHN WILEY & SONS, 605 Third Avenue, New York, N.Y. 10016, U.S.A.
© 1981 A. Graham/Ellis Horwood Ltd.
British Library Cataloguing in Publication Data
Graham, Alexander
Kronecker products and matrix calculus. - (Ellis Horwood series in mathematics and its applications) 1. Matrices I. Title 512.9'43 QA188
Library of Congress Card No. 81-7132 AACR2
ISBN 0-85312-391-8 (Ellis Horwood Limited, Library Edition) ISBN 0-85312-427-2 (Ellis Horwood Limited, Student Edition) ISBN 0-470-27300-3 (Halsted Press)
Typeset in Press Roman by Ellis Horwood Ltd. Printed in Great Britain by R. J. Acford, Chichester
COPYRIGHT NOTICE - All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the permission of Ellis Horwood Limited, Market Cross House, Cooper Street, Chichester, West Sussex, England.
Table of Contents
Author's Preface
Symbols and Notation Used

Chapter 1 - Preliminaries
1.1 Introduction
1.2 Unit Vectors and Elementary Matrices
1.3 Decompositions of a Matrix
1.4 The Trace Function
1.5 The Vec Operator
Problems for Chapter 1

Chapter 2 - The Kronecker Product
2.1 Introduction
2.2 Definition of the Kronecker Product
2.3 Some Properties and Rules for Kronecker Products
2.4 Definition of the Kronecker Sum
2.5 The Permutation Matrix associating vec X and vec X'
Problems for Chapter 2

Chapter 3 - Some Applications for the Kronecker Product
3.1 Introduction
3.2 The Derivative of a Matrix
3.3 Problem 1: solution of AX + XB = C
3.4 Problem 2: solution of AX + XA = µX
3.5 Problem 3: solution of X = AX + XB
3.6 Problem 4: to find the transition matrix associated with the equation X = AX + XB
3.7 Problem 5: solution of AXB = C
3.8 Problem 6: Pole assignment for a Multivariable System

Chapter 4 - Introduction to Matrix Calculus
4.1 Introduction
4.2 The Derivatives of Vectors
4.3 The Chain rule for Vectors
4.4 The Derivative of Scalar Functions of a Matrix with respect to a Matrix
4.5 The Derivative of a Matrix with respect to one of its Elements and Conversely
4.6 The Derivatives of the Powers of a Matrix
Problems for Chapter 4

Chapter 5 - Further Development of Matrix Calculus including an Application of Kronecker Products
5.1 Introduction
5.2 Derivatives of Matrices and Kronecker Products
5.3 The Determination of (∂ vec X)/(∂ vec Y) for more complicated Equations
5.4 More on Derivatives of Scalar Functions with respect to a Matrix
5.5 The Matrix Differential
Problems for Chapter 5

Chapter 6 - The Derivative of a Matrix with respect to a Matrix
6.1 Introduction
6.2 The Definition and some Results
6.3 Product Rules for Matrices
6.4 The Chain Rule for the Derivative of a Matrix with respect to a Matrix
Problems for Chapter 6

Chapter 7 - Some Applications of Matrix Calculus
7.1 Introduction
7.2 The Problems of Least Squares and Constrained Optimization in Scalar Variables
7.3 Problem 1: Matrix Calculus Approach to the Problems of Least Squares and Constrained Optimization
7.4 Problem 2: The General Least Squares Problem
7.5 Problem 3: Maximum Likelihood Estimate of the Multivariate Normal
7.6 Problem 4: Evaluation of the Jacobians of some Transformations
7.7 Problem 5: To Find the Derivative of an Exponential Matrix with respect to a Matrix

Solution to Problems
Tables of Formulae and Derivatives
Bibliography
Index
Author's Preface
My purpose in writing this book is to bring to the attention of the reader some recent developments in the field of Matrix Calculus. Although some concepts, such as Kronecker matrix products, the vector derivative etc. are mentioned in a few specialised books, no book, to my knowledge, is totally devoted to this subject. The interested researcher must consult numerous published papers to appreciate the scope of the concepts involved.

Matrix calculus applicable to square matrices was developed by Turnbull [29, 30] as far back as 1927. The theory presented in this book is based on the works of Dwyer and McPhail [15] published in 1948 and others mentioned in the Bibliography. It is more general than Turnbull's development and is applicable to non-square matrices. But even this more general theory has grave limitations; in particular it requires that in general the matrix elements are non-constant and independent. A symmetric matrix, for example, is treated as a special case. Methods of overcoming some of these limitations have been suggested, but I am not aware of any published theory which is both quite general and simple enough to be useful.

The book is organised in the following way:

Chapter 1 concentrates on the preliminaries of matrix theory and notation which is found useful throughout the book. In particular, the simple and useful elementary matrix is defined. The vec operator is defined and many useful relations are developed. Chapter 2 introduces and establishes various important properties of the matrix Kronecker product.

Several applications of the Kronecker product are considered in Chapter 3. Chapter 4 introduces Matrix Calculus. Various derivatives of vectors are defined and the chain rule for vector differentiation is established. Rules for obtaining the derivative of a matrix with respect to one of its elements and conversely are discussed. Further developments in Matrix Calculus including derivatives of scalar functions of a matrix with respect to the matrix and matrix differentials are found in Chapter 5.

Chapter 6 deals with the derivative of a matrix with respect to a matrix.
This includes the derivation of expressions for the derivatives of both the matrix product and the Kronecker product of matrices with respect to a matrix. There is also the derivation of a chain rule of matrix differentiation. Various applications of at least some of the matrix calculus are discussed in Chapter 7.

By making use, whenever possible, of simple notation, including many worked examples to illustrate most of the important results and other examples at the end of each Chapter (except for Chapters 3 and 7) with solutions at the end of the book, I have attempted to bring a topic studied mainly at post-graduate and research level to an undergraduate level.
Symbols and Notation Used
$A, B, C, \ldots$    matrices
$A'$    the transpose of $A$
$a_{ij}$    the $(i, j)$th element of the matrix $A$
$[a_{ij}]$    the matrix $A$ having $a_{ij}$ as its $(i, j)$th element
$I_m$    the unit matrix of order $m \times m$
$e_i$    the unit vector
$e$    the one vector (having all elements equal to one)
$E_{ij}$    the elementary matrix
$0_m$    the zero matrix of order $m \times m$
$\delta_{ij}$    the Kronecker delta
$A_{\cdot i}$    the $i$th column of the matrix $A$
$A_{j\cdot}$    the $j$th row of $A$ as a column vector
$A_{j\cdot}'$    the transpose of $A_{j\cdot}$ (a row vector)
$(A')_{\cdot i}$    the $i$th column of the matrix $A'$
$(A')_{\cdot i}'$    the transpose of the $i$th column of $A'$ (that is, a row vector)
$\operatorname{tr} A$    the trace of $A$
$\operatorname{vec} A$    an ordered stack of columns of $A$
$A \otimes B$    the Kronecker product of $A$ and $B$
iff    if and only if
$\operatorname{diag}\{A\}$    the square matrix having elements $a_{11}, a_{22}, \ldots$ along its diagonal and zeros elsewhere
$\partial Y / \partial x_{rs}$    a matrix of the same order as $Y$
$\partial y_{ij} / \partial X$    a matrix of the same order as $X$
$E_{rs}$    an elementary matrix of the same order as $X$
$E_{ij}$    an elementary matrix of the same order as $Y$
CHAPTER 1
Preliminaries
1.1 INTRODUCTION
In this chapter we introduce some notation and discuss some results which will be found very useful for the development of the theory of both Kronecker products and matrix differentiation. Our aim will be to make the notation as simple as possible although inevitably it will be complicated. Some simplification may be obtained at the expense of generality. For example, we may show that a result holds for a square matrix of order $n \times n$ and state that it holds in the more general case when $A$ is of order $m \times n$. We will leave it to the interested reader to modify the proof for the more general case.

Further, we will often write

$$\sum_i \sum_j \quad \text{or} \quad \sum_{i,j} \quad \text{or just} \quad \sum_{ij} \quad \text{instead of} \quad \sum_{i=1}^{m} \sum_{j=1}^{n}$$

when the summation limits are obvious from the context.

Many other simplifications will be used as the opportunities arise. Unless of particular importance, we shall not state the order of the matrices considered. It will be assumed that, for example, when taking the product $AB$ or $ABC$ the matrices are conformable.
1.2 UNIT VECTORS AND ELEMENTARY MATRICES
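The unit vectors of order $n$ are defined as

$$e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad e_n = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \tag{1.1}$$

The one vector of order $n$ is defined as

$$e = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \tag{1.2}$$

From (1.1) and (1.2), we obtain the relation

$$e = \sum_i e_i \tag{1.3}$$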
The elementary matrix $E_{ij}$ is defined as the matrix (of order $m \times n$) which has a unity in the $(i, j)$th position and zeros elsewhere.

For example,

$$E_{23} = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}$$

The relation between $e_i$, $e_j$ and $E_{ij}$ is as follows

$$E_{ij} = e_i e_j' \tag{1.4}$$

where $e_j'$ denotes the transposed vector (that is, the row vector) of $e_j$.
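Example 1.1

Using the unit vectors of order 3

(i) form $E_{11}$, $E_{21}$, and $E_{23}$
(ii) write the unit matrix of order $3 \times 3$ as a sum of the elementary matrices.

Solution

(i)
$$E_{11} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

$$E_{21} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

$$E_{23} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

(ii)
$$I = E_{11} + E_{22} + E_{33} = \sum_{i=1}^{3} e_i e_i'$$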
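The Kronecker delta $\delta_{ij}$ is defined as

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

It can be expressed as

$$\delta_{ij} = e_i' e_j = e_j' e_i \tag{1.6}$$

We can now determine some relations between unit vectors and elementary matrices.

$$E_{ij} e_r = e_i e_j' e_r \quad \text{(by 1.5)}$$
$$= \delta_{jr} e_i \tag{1.7}$$

and

$$e_r' E_{ij} = e_r' e_i e_j' = \delta_{ri} e_j' \tag{1.8}$$

Also

$$E_{ij} E_{rs} = e_i e_j' e_r e_s' = \delta_{jr} e_i e_s' = \delta_{jr} E_{is} \tag{1.9}$$

In particular, if $r = j$ we have

$$E_{ij} E_{js} = \delta_{jj} E_{is} = E_{is}$$

and more generally

$$E_{ij} E_{js} E_{sm} = E_{is} E_{sm} = E_{im} \tag{1.10}$$

Notice from (1.9) that

$$E_{ij} E_{rs} = 0 \quad \text{if } j \neq r .$$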
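The relations (1.4) and (1.9) are easy to check numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `unit_vector` and `elementary_matrix` are illustrative and do not come from the text.

```python
import numpy as np

def unit_vector(i, n):
    """Return the unit vector e_i of order n as a column (i is 1-based)."""
    e = np.zeros((n, 1))
    e[i - 1, 0] = 1.0
    return e

def elementary_matrix(i, j, m, n):
    """Return E_ij of order (m x n) as the outer product e_i e_j', as in (1.4)."""
    return unit_vector(i, m) @ unit_vector(j, n).T

E23 = elementary_matrix(2, 3, 3, 3)
E31 = elementary_matrix(3, 1, 3, 3)
E21 = elementary_matrix(2, 1, 3, 3)
E12 = elementary_matrix(1, 2, 3, 3)

# (1.9) with j = r: E_23 E_31 = E_21
prod_match = E23 @ E31
# (1.9) with j != r: E_23 E_12 is the zero matrix
prod_zero = E23 @ E12
```

Here `prod_match` should coincide with `E21`, while `prod_zero` should vanish because the inner indices differ.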
1.3 DECOMPOSITIONS OF A MATRIX
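We consider a matrix $A$ of order $m \times n$ having the following form

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = [a_{ij}] \tag{1.11}$$

We denote the $n$ columns of $A$ by $A_{\cdot 1}, A_{\cdot 2}, \ldots, A_{\cdot n}$, so that

$$A_{\cdot j} = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix} \quad (j = 1, 2, \ldots, n) \tag{1.12}$$

and the $m$ rows of $A$ by $A_{1\cdot}, A_{2\cdot}, \ldots, A_{m\cdot}$, so that

$$A_{i\cdot} = \begin{bmatrix} a_{i1} \\ a_{i2} \\ \vdots \\ a_{in} \end{bmatrix} \quad (i = 1, 2, \ldots, m) \tag{1.13}$$

Both the $A_{\cdot j}$ and the $A_{i\cdot}$ are column vectors. In this notation we can write $A$ as the (partitioned) matrix

$$A = [A_{\cdot 1} \; A_{\cdot 2} \; \ldots \; A_{\cdot n}] \tag{1.14}$$

or as

$$A = [A_{1\cdot} \; A_{2\cdot} \; \ldots \; A_{m\cdot}]' \tag{1.15}$$

(where the prime means 'the transpose of').

For example, let

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

so that

$$A_{1\cdot} = \begin{bmatrix} a_{11} \\ a_{12} \end{bmatrix} \quad \text{and} \quad A_{2\cdot} = \begin{bmatrix} a_{21} \\ a_{22} \end{bmatrix}$$

then

$$[A_{1\cdot} \; A_{2\cdot}]' = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix}' = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = A .$$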
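The elements, the columns and the rows of $A$ can be expressed in terms of the unit vectors as follows:

The $j$th column
$$A_{\cdot j} = A e_j \tag{1.16}$$

The $i$th row
$$A_{i\cdot}' = e_i' A \tag{1.17}$$

So that
$$A_{i\cdot} = (e_i' A)' = A' e_i . \tag{1.18}$$

The $(i, j)$th element of $A$ can now be written as
$$a_{ij} = e_i' A e_j = e_j' A' e_i \tag{1.19}$$

We can express $A$ as the sum
$$A = \sum_i \sum_j a_{ij} E_{ij} \tag{1.20}$$

(where the $E_{ij}$ are of course of the same order as $A$) so that
$$A = \sum_i \sum_j a_{ij} e_i e_j' . \tag{1.21}$$

From (1.16) and (1.21)

$$A_{\cdot j} = A e_j = \Big( \sum_i \sum_k a_{ik} e_i e_k' \Big) e_j = \sum_i \sum_k a_{ik} e_i (e_k' e_j) = \sum_i a_{ij} e_i . \tag{1.22}$$

Similarly

$$A_{i\cdot}' = \sum_j a_{ij} e_j' \tag{1.23}$$

so that

$$A_{i\cdot} = \sum_j a_{ij} e_j . \tag{1.24}$$

It follows from (1.21), (1.22), and (1.24) that

$$A = \sum_j A_{\cdot j} e_j' \tag{1.25}$$

and

$$A = \sum_i e_i A_{i\cdot}' . \tag{1.26}$$

Example 1.2

Write the matrix

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

as a sum of: (i) column vectors of $A$; (ii) row vectors of $A$.

Solutions

(i) Using (1.25)

$$A = A_{\cdot 1} e_1' + A_{\cdot 2} e_2' = \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} + \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix} \begin{bmatrix} 0 & 1 \end{bmatrix}$$

(ii) Using (1.26)

$$A = e_1 A_{1\cdot}' + e_2 A_{2\cdot}' = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} \begin{bmatrix} a_{21} & a_{22} \end{bmatrix}$$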
There exist interesting relations involving the elementary matrices operating on the matrix $A$.

For example

$$E_{ij} A = e_i e_j' A \quad \text{(by 1.5)}$$
$$= e_i A_{j\cdot}' \quad \text{(by 1.17)} \tag{1.27}$$

similarly

$$A E_{ij} = A e_i e_j' = A_{\cdot i} e_j' \quad \text{(by 1.16)} \tag{1.28}$$

so that

$$A E_{jj} = A_{\cdot j} e_j' \tag{1.29}$$

$$A E_{ij} B = A e_i e_j' B = A_{\cdot i} B_{j\cdot}' \quad \text{(by 1.28 and 1.27)} \tag{1.30}$$

$$E_{ij} A E_{rs} = e_i e_j' A e_r e_s' \quad \text{(by 1.5)}$$
$$= e_i a_{jr} e_s' \quad \text{(by 1.19)}$$
$$= a_{jr} e_i e_s' = a_{jr} E_{is} \tag{1.31}$$

In particular

$$E_{jj} A E_{rr} = a_{jr} E_{jr} \tag{1.32}$$
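Example 1.3

Use elementary matrices and/or unit vectors to find an expression for

(i) The product $AB$ of the matrices $A = [a_{ij}]$ and $B = [b_{ij}]$.
(ii) The $k$th column of the product $AB$.
(iii) The $k$th column of the product $XYZ$ of the matrices $X = [x_{ij}]$, $Y = [y_{ij}]$ and $Z = [z_{ij}]$.

Solutions

(i) By (1.25) and (1.29)

$$A = \sum_j A_{\cdot j} e_j' = \sum_j A E_{jj}$$

hence

$$AB = \sum_j (A E_{jj}) B = \sum_j (A e_j)(e_j' B) = \sum_j A_{\cdot j} B_{j\cdot}' \quad \text{(by (1.16) and (1.17))}$$

(ii) (a)
$$(AB)_{\cdot k} = (AB) e_k = A (B e_k) = A B_{\cdot k} \quad \text{by (1.16)}$$

(b) From (i) above we can write

$$(AB)_{\cdot k} = \sum_j (A e_j e_j' B) e_k = \sum_j (A e_j)(e_j' B e_k) = \sum_j A_{\cdot j} b_{jk} \quad \text{by (1.16) and (1.19)}$$

(iii)
$$(XYZ)_{\cdot k} = \sum_j z_{jk} (XY)_{\cdot j} \quad \text{by (ii)(b) above}$$
$$= \sum_j (z_{jk} X) Y_{\cdot j} \quad \text{by (ii)(a) above.}$$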
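1.4 THE TRACE FUNCTION

The trace (or the spur) of a square matrix $A$ of order $(n \times n)$ is the sum of the diagonal terms

$$\sum_{i=1}^{n} a_{ii} .$$

We write

$$\operatorname{tr} A = \sum_i a_{ii} \tag{1.33}$$

From (1.19) we have

$$a_{ii} = e_i' A e_i ,$$

so that

$$\operatorname{tr} A = \sum_i e_i' A e_i \tag{1.34}$$

From (1.16) and (1.34) we find

$$\operatorname{tr} A = \sum_i e_i' A_{\cdot i} \tag{1.35}$$

and from (1.17) and (1.34)

$$\operatorname{tr} A = \sum_i A_{i\cdot}' e_i . \tag{1.36}$$

We can obtain similar expressions for the trace of a product $AB$ of matrices.

For example

$$\operatorname{tr} AB = \sum_i e_i' A B e_i \tag{1.37}$$
$$= \sum_i \sum_j (e_i' A e_j)(e_j' B e_i) \quad \text{(See Ex. 1.3)}$$
$$= \sum_i \sum_j a_{ij} b_{ji} \tag{1.38}$$

Similarly

$$\operatorname{tr} BA = \sum_j e_j' B A e_j = \sum_i \sum_j b_{ji} a_{ij} \tag{1.39}$$

From (1.38) and (1.39) we find that

$$\operatorname{tr} AB = \operatorname{tr} BA . \tag{1.40}$$

From (1.16), (1.17) and (1.37) we have

$$\operatorname{tr} AB = \sum_i A_{i\cdot}' B_{\cdot i} \tag{1.41}$$

Also from (1.40) and (1.41)

$$\operatorname{tr} AB = \sum_j B_{j\cdot}' A_{\cdot j} . \tag{1.42}$$

Similarly

$$\operatorname{tr} AB' = \sum_i A_{i\cdot}' B_{i\cdot} \tag{1.43}$$

and since $\operatorname{tr} AB' = \operatorname{tr} A'B$

$$\operatorname{tr} AB' = \sum_j A_{\cdot j}' B_{\cdot j} \tag{1.44}$$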
Two important properties of the trace are

$$\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B \tag{1.45}$$

and

$$\operatorname{tr}(\alpha A) = \alpha \operatorname{tr} A \tag{1.46}$$

where $\alpha$ is a scalar.

These properties show that trace is a linear function.

For real matrices $A$ and $B$ the various properties of $\operatorname{tr}(AB')$ indicated above show that it is an inner product and is sometimes written as

$$\operatorname{tr}(AB') = (A, B)$$
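As a quick numerical illustration of (1.38), (1.40) and the inner-product reading of $\operatorname{tr}(AB')$, the sketch below uses NumPy with arbitrary random matrices; the matrix sizes and the seed are assumptions made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 4))   # same order as A

# (1.40): tr AB = tr BA, even though AB is 3 x 3 and BA is 4 x 4
tr_AB = np.trace(A @ B)
tr_BA = np.trace(B @ A)

# (1.38): tr AB as the double sum of a_ij * b_ji
double_sum = sum(A[i, j] * B[j, i] for i in range(3) for j in range(4))

# tr(AC') equals the sum of elementwise products, i.e. an inner product
inner_trace = np.trace(A @ C.T)
inner_elementwise = np.sum(A * C)
```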
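1.5 THE VEC OPERATOR

We shall make use of a vector valued function denoted by $\operatorname{vec} A$ of a matrix $A$ defined by Neudecker [22].

If $A$ is of order $m \times n$

$$\operatorname{vec} A = \begin{bmatrix} A_{\cdot 1} \\ A_{\cdot 2} \\ \vdots \\ A_{\cdot n} \end{bmatrix} \tag{1.47}$$

From the definition it is clear that $\operatorname{vec} A$ is a vector of order $mn$.

For example if

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

then

$$\operatorname{vec} A = \begin{bmatrix} a_{11} \\ a_{21} \\ a_{12} \\ a_{22} \end{bmatrix}$$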
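Example 1.4

Show that we can write $\operatorname{tr} AB$ as $(\operatorname{vec} A')' \operatorname{vec} B$.

Solution

By (1.37)

$$\operatorname{tr} AB = \sum_i e_i' A B e_i = \sum_i A_{i\cdot}' B_{\cdot i} \quad \text{by (1.16) and (1.17)}$$
$$= \sum_i (A')_{\cdot i}' B_{\cdot i}$$

(since the $i$th row of $A$ is the $i$th column of $A'$).

Hence (assuming $A$ and $B$ of order $n \times n$)

$$\operatorname{tr} AB = \big[ (A')_{\cdot 1}' \;\; (A')_{\cdot 2}' \;\; \ldots \;\; (A')_{\cdot n}' \big] \begin{bmatrix} B_{\cdot 1} \\ B_{\cdot 2} \\ \vdots \\ B_{\cdot n} \end{bmatrix} = (\operatorname{vec} A')' \operatorname{vec} B$$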
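The vec operator of (1.47) stacks columns, which in NumPy corresponds to column-major (Fortran-order) flattening. The sketch below, with an illustrative helper `vec` and arbitrarily chosen integer matrices, checks the identity of Example 1.4.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single column vector, as in (1.47)."""
    return M.reshape(-1, 1, order="F")

A = np.array([[1, 2, 0],
              [3, -1, 4],
              [0, 5, 2]])
B = np.array([[2, 0, 1],
              [1, 1, 0],
              [-3, 2, 4]])

lhs = np.trace(A @ B)                  # tr AB
rhs = (vec(A.T).T @ vec(B)).item()     # (vec A')' vec B
```

For a 2 x 2 input such as `np.array([[1, 2], [3, 4]])`, `vec` returns the entries in the order 1, 3, 2, 4, matching the ordering $a_{11}, a_{21}, a_{12}, a_{22}$ of the text.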
Before discussing a useful application of the above we must first agree on notation for the transpose of an elementary matrix; we do this with the aid of an example.

Let

$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{bmatrix}$$

then an elementary matrix associated with $X$ will also be of order $(2 \times 3)$. For example, one such matrix is

$$E_{12} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

The transpose of $E_{12}$ is the matrix

$$E_{12}' = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \end{bmatrix}$$

Although at first sight this notation for the transpose is sensible and is used frequently in this book, there are associated snags. The difficulty arises when the suffix notation is not only indicative of the matrix involved but also determines specific elements as in equations (1.31) and (1.32). On such occasions it will be necessary to use a more accurate notation indicating the matrix order and the element involved. Then instead of $E_{12}$ we will write $E_{12}(2 \times 3)$ and instead of $E_{12}'$ we write $E_{21}(3 \times 2)$.

More generally if $X$ is a matrix of order $(m \times n)$ then the transpose of

$$E_{rs}(m \times n)$$

will be written as

$$E_{rs}'$$

unless an accurate description is necessary, in which case the transpose will be written as

$$E_{sr}(n \times m) .$$

Now for the application of the result of Example 1.4 which will be used later on in the book.

From the above

$$\operatorname{tr} E_{rs}' A = (\operatorname{vec} E_{rs})' (\operatorname{vec} A) = a_{rs}$$

where $a_{rs}$ is the $(r, s)$th element of the matrix $A$.

We can of course prove this important result by a more direct method.

$$\operatorname{tr} E_{rs}' A = \sum_k e_k' E_{rs}' A e_k = \sum_{i,j,k} a_{ij}\, e_k' e_s e_r' e_i e_j' e_k \quad \Big(\text{since } A = \sum_{i,j} a_{ij} E_{ij}\Big)$$
$$= \sum_{i,j,k} a_{ij}\, \delta_{ks} \delta_{ri} \delta_{jk} = a_{rs}$$
Problems for Chapter 1
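(1) The matrix $A$ is of order $(4 \times n)$ and the matrix $B$ is of order $(n \times 3)$. Write the product $AB$ in terms of the rows of $A$, that is, $A_{1\cdot}, A_{2\cdot}, \ldots$, and the columns of $B$, that is, $B_{\cdot 1}, B_{\cdot 2}, \ldots$.

(2) Describe in words the matrices

(a) $A E_{ik}$ and (b) $E_{ik} A$.

Write these matrices in terms of an appropriate product of a row or a column of $A$ and a unit vector.

(3) Show that

(a) $\operatorname{tr} ABC = \sum_i A_{i\cdot}' B C_{\cdot i}$

(b) $\operatorname{tr} ABC = \operatorname{tr} BCA = \operatorname{tr} CAB$

(4) Show that $\operatorname{tr} A E_{ij} = a_{ji}$.

(5) $B = [b_{ij}]$ is a matrix of order $(n \times n)$ and
$$\operatorname{diag}\{B\} = \operatorname{diag}\{b_{11}, b_{22}, \ldots, b_{nn}\} = \sum_i b_{ii} E_{ii} .$$
Show that if

$$a_{ij} = \operatorname{tr}(B E_{ji})\, \delta_{ij}$$

then $A = [a_{ij}] = \operatorname{diag}\{B\}$.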
CHAPTER 2
The Kronecker Product
2.1 INTRODUCTION

The Kronecker product, also known as a direct product or a tensor product, is a concept having its origin in group theory and has important applications in particle physics. But the technique has been successfully applied in various fields of matrix theory, for example in the solution of matrix equations which arise when using Lyapunov's approach to stability theory. The development of the technique in this chapter will be as a topic within the scope of matrix algebra.
2.2 DEFINITION OF THE KRONECKER PRODUCT

Consider a matrix $A = [a_{ij}]$ of order $(m \times n)$ and a matrix $B = [b_{ij}]$ of order $(r \times s)$. The Kronecker product of the two matrices, denoted by $A \otimes B$, is defined as the partitioned matrix

$$A \otimes B = \begin{bmatrix} a_{11} B & a_{12} B & \cdots & a_{1n} B \\ a_{21} B & a_{22} B & \cdots & a_{2n} B \\ \vdots & & & \vdots \\ a_{m1} B & a_{m2} B & \cdots & a_{mn} B \end{bmatrix} \tag{2.1}$$

$A \otimes B$ is seen to be a matrix of order $(mr \times ns)$. It has $mn$ blocks; the $(i, j)$th block is the matrix $a_{ij} B$ of order $(r \times s)$.

For example, let

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$$

then

$$A \otimes B = \begin{bmatrix} a_{11} B & a_{12} B \\ a_{21} B & a_{22} B \end{bmatrix} = \begin{bmatrix} a_{11} b_{11} & a_{11} b_{12} & a_{12} b_{11} & a_{12} b_{12} \\ a_{11} b_{21} & a_{11} b_{22} & a_{12} b_{21} & a_{12} b_{22} \\ a_{21} b_{11} & a_{21} b_{12} & a_{22} b_{11} & a_{22} b_{12} \\ a_{21} b_{21} & a_{21} b_{22} & a_{22} b_{21} & a_{22} b_{22} \end{bmatrix}$$

Notice that the Kronecker product is defined irrespective of the order of the matrices involved. From this point of view it is a more general concept than matrix multiplication. As we develop the theory we will note other results which are more general than the corresponding ones for matrix multiplication.
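The block definition (2.1) translates almost literally into code. The sketch below assembles $A \otimes B$ block by block and compares the result with NumPy's built-in `np.kron`; the helper name `kron_blocks` and the example matrices are assumptions made for illustration.

```python
import numpy as np

def kron_blocks(A, B):
    """Assemble A (x) B from definition (2.1): the (i, j)th block is a_ij * B."""
    m, n = A.shape
    return np.block([[A[i, j] * B for j in range(n)] for i in range(m)])

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # order (2 x 3)
B = np.array([[0, 1],
              [1, 0]])         # order (2 x 2)

K = kron_blocks(A, B)          # order (mr x ns) = (4 x 6)
```

Note that no conformability condition between `A` and `B` is needed, in line with the remark that the product is defined irrespective of the orders involved.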
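The Kronecker product arises naturally in the following way. Consider two linear transformations

$$x = Az \quad \text{and} \quad y = Bw$$

which, in the simplest case, take the form

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \tag{2.2}$$

We can consider the two transformations simultaneously by defining the following vectors

$$\mu = x \otimes y = \begin{bmatrix} x_1 y_1 \\ x_1 y_2 \\ x_2 y_1 \\ x_2 y_2 \end{bmatrix} \quad \text{and} \quad \nu = z \otimes w = \begin{bmatrix} z_1 w_1 \\ z_1 w_2 \\ z_2 w_1 \\ z_2 w_2 \end{bmatrix} \tag{2.3}$$

To find the transformation between $\mu$ and $\nu$, we determine the relations between the components of the two vectors.

For example,

$$x_1 y_1 = (a_{11} z_1 + a_{12} z_2)(b_{11} w_1 + b_{12} w_2)$$
$$= a_{11} b_{11} (z_1 w_1) + a_{11} b_{12} (z_1 w_2) + a_{12} b_{11} (z_2 w_1) + a_{12} b_{12} (z_2 w_2)$$

Similar expressions for the other components lead to the transformation

$$\mu = \begin{bmatrix} a_{11} b_{11} & a_{11} b_{12} & a_{12} b_{11} & a_{12} b_{12} \\ a_{11} b_{21} & a_{11} b_{22} & a_{12} b_{21} & a_{12} b_{22} \\ a_{21} b_{11} & a_{21} b_{12} & a_{22} b_{11} & a_{22} b_{12} \\ a_{21} b_{21} & a_{21} b_{22} & a_{22} b_{21} & a_{22} b_{22} \end{bmatrix} \nu$$

or

$$\mu = (A \otimes B)\, \nu ,$$

that is

$$Az \otimes Bw = (A \otimes B)(z \otimes w) . \tag{2.4}$$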
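Example 2.1

Let $E_{ij}$ be an elementary matrix of order $(2 \times 2)$ defined in section 1.2 (see 1.4). Find the matrix

$$U = \sum_{i=1}^{2} \sum_{j=1}^{2} E_{ij} \otimes E_{ji}$$

Solution

$$U = E_{11} \otimes E_{11} + E_{12} \otimes E_{21} + E_{21} \otimes E_{12} + E_{22} \otimes E_{22}$$

$$= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \otimes \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \otimes \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \otimes \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$

so that

$$U = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Note. $U$ is seen to be a square matrix having columns which are unit vectors $e_i$ $(i = 1, 2, \ldots)$. It can be obtained from a unit matrix by a permutation of rows or columns. It is known as a permutation matrix (see also section 2.5).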
2.3 SOME PROPERTIES AND RULES FOR KRONECKER PRODUCTS
We expect the Kronecker product to have the usual properties of a product.

I If $\alpha$ is a scalar, then

$$A \otimes (\alpha B) = \alpha (A \otimes B) . \tag{2.5}$$

Proof
The $(i, j)$th block of $A \otimes (\alpha B)$ is

$$[a_{ij} (\alpha B)] = \alpha [a_{ij} B] = \alpha\, [(i, j)\text{th block of } A \otimes B]$$

The result follows.

II The product is distributive with respect to addition, that is

(a) $(A + B) \otimes C = A \otimes C + B \otimes C$ (2.6)

(b) $A \otimes (B + C) = A \otimes B + A \otimes C$ (2.7)

Proof
We will only consider (a). The $(i, j)$th block of $(A + B) \otimes C$ is

$$(a_{ij} + b_{ij})\, C .$$

The $(i, j)$th block of $A \otimes C + B \otimes C$ is

$$a_{ij} C + b_{ij} C = (a_{ij} + b_{ij})\, C .$$

Since the two blocks are equal for every $(i, j)$, the result follows.

III The product is associative

$$A \otimes (B \otimes C) = (A \otimes B) \otimes C . \tag{2.8}$$

IV There exists

a zero element $0_{mn} = 0_m \otimes 0_n$

a unit element $I_{mn} = I_m \otimes I_n$ (2.9)

The unit matrices are all square; for example $I_m$ is the unit matrix of order $(m \times m)$.

Other important properties of the Kronecker product follow.
V
$$(A \otimes B)' = A' \otimes B' \tag{2.10}$$

Proof
The $(i, j)$th block of $(A \otimes B)'$ is

$$a_{ji} B' .$$

VI (The 'Mixed Product Rule').

$$(A \otimes B)(C \otimes D) = AC \otimes BD \tag{2.11}$$

provided the dimensions of the matrices are such that the various expressions exist.

Proof
The $(i, j)$th block of the left hand side is obtained by taking the product of the $i$th row block of $(A \otimes B)$ and the $j$th column block of $(C \otimes D)$; this is of the following form

$$\begin{bmatrix} a_{i1} B & a_{i2} B & \cdots & a_{in} B \end{bmatrix} \begin{bmatrix} c_{1j} D \\ c_{2j} D \\ \vdots \\ c_{nj} D \end{bmatrix} = \sum_r a_{ir} c_{rj}\, BD .$$

The $(i, j)$th block of the right hand side is (by definition of the Kronecker product)

$$g_{ij}\, BD$$

where $g_{ij}$ is the $(i, j)$th element of the matrix $AC$. But by the rule of matrix multiplication

$$g_{ij} = \sum_r a_{ir} c_{rj} .$$

Since the $(i, j)$th blocks are equal, the result follows.
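Properties V and VI can be spot-checked numerically. The following sketch assumes NumPy and uses random rectangular matrices whose orders were chosen only so that the products $AC$ and $BD$ exist.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
C = rng.standard_normal((3, 4))   # AC exists, order (2 x 4)
B = rng.standard_normal((4, 2))
D = rng.standard_normal((2, 5))   # BD exists, order (4 x 5)

# (2.11): (A (x) B)(C (x) D) = AC (x) BD
mixed_lhs = np.kron(A, B) @ np.kron(C, D)
mixed_rhs = np.kron(A @ C, B @ D)

# (2.10): (A (x) B)' = A' (x) B'
transpose_lhs = np.kron(A, B).T
transpose_rhs = np.kron(A.T, B.T)
```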
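VII Given $A(m \times m)$ and $B(n \times n)$ and subject to the existence of the various inverses,

$$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1} \tag{2.12}$$

Proof
Use (2.11)

$$(A \otimes B)(A^{-1} \otimes B^{-1}) = AA^{-1} \otimes BB^{-1} = I_m \otimes I_n = I_{mn} .$$

The result follows.

VIII (See (1.47))

$$\operatorname{vec}(AYB) = (B' \otimes A) \operatorname{vec} Y \tag{2.13}$$

Proof
We prove (2.13) for $A$, $Y$ and $B$ each of order $n \times n$. The result is true for $A(m \times n)$, $Y(n \times r)$, $B(r \times s)$. We use the solution to Example 1.3(iii).

$$(AYB)_{\cdot k} = \sum_j (b_{jk} A)\, Y_{\cdot j}$$

$$= \begin{bmatrix} b_{1k} A & b_{2k} A & \cdots & b_{nk} A \end{bmatrix} \begin{bmatrix} Y_{\cdot 1} \\ Y_{\cdot 2} \\ \vdots \\ Y_{\cdot n} \end{bmatrix}$$

$$= [B_{\cdot k}' \otimes A] \operatorname{vec} Y$$

$$= [(B')_{k \cdot}' \otimes A] \operatorname{vec} Y$$

since the transpose of the $k$th column of $B$ is the $k$th row of $B'$; the result follows.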
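The identity (2.13) is stated for general orders $A(m \times n)$, $Y(n \times r)$, $B(r \times s)$. A minimal NumPy sketch with non-square random matrices (sizes and the helper `vec` are assumptions for illustration) is:

```python
import numpy as np

def vec(M):
    """Column-stacking vec operator of (1.47)."""
    return M.reshape(-1, 1, order="F")

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))   # m x n
Y = rng.standard_normal((3, 4))   # n x r
B = rng.standard_normal((4, 5))   # r x s

lhs = vec(A @ Y @ B)              # vec(AYB), order ms = 10
rhs = np.kron(B.T, A) @ vec(Y)    # (B' (x) A) vec Y
```

The column-major `order="F"` is essential here; a row-major flattening would instead correspond to $(A \otimes B') $ acting on the row-stacked vector.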
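Example 2.2

Write the equation

$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 & x_3 \\ x_2 & x_4 \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}$$

in a matrix-vector form.

Solution

The equation can be written as $AXI = C$. Use (2.13) to find

$$\operatorname{vec}(AXI) = (I \otimes A) \operatorname{vec} X = \operatorname{vec} C ,$$

so that

$$\begin{bmatrix} a_{11} & a_{12} & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ 0 & 0 & a_{11} & a_{12} \\ 0 & 0 & a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} c_{11} \\ c_{21} \\ c_{12} \\ c_{22} \end{bmatrix}$$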
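Example 2.3

$A$ and $B$ are both of order $(n \times n)$; show that

(i) $\operatorname{vec} AB = (I \otimes A) \operatorname{vec} B$
(ii) $\operatorname{vec} AB = (B' \otimes A) \operatorname{vec} I$
(iii) $\operatorname{vec} AB = \sum_k (B')_{\cdot k} \otimes A_{\cdot k}$

Solution

(i) (As in Example 2.2) In (2.13) let $Y = B$ and $B = I$.

(ii) In (2.13) let $Y = I$.

(iii) In $\operatorname{vec} AB = (B' \otimes A) \operatorname{vec} I$ substitute (1.25), to obtain

$$\operatorname{vec} AB = \Big[ \sum_i (B')_{\cdot i} e_i' \otimes \sum_j A_{\cdot j} e_j' \Big] \operatorname{vec} I$$

$$= \sum_{i,j} \big[ ((B')_{\cdot i} \otimes A_{\cdot j})(e_i' \otimes e_j') \big] \operatorname{vec} I \quad \text{(by 2.11)}$$

The product $e_i' \otimes e_j'$ is a one row matrix having a unit element in the $[(i - 1)n + j]$th column and zeros elsewhere. Hence the product

$$[(B')_{\cdot i} \otimes A_{\cdot j}]\,[e_i' \otimes e_j']$$

is a matrix having

$$(B')_{\cdot i} \otimes A_{\cdot j}$$

as its $[(i - 1)n + j]$th column and zeros elsewhere. Since $\operatorname{vec} I$ is a one column matrix having a unity in the 1st, $(n + 2)$nd, $(2n + 3)$rd, $\ldots$, $n^2$th position and zeros elsewhere, the product of

$$[(B')_{\cdot i} \otimes A_{\cdot j}]\,[e_i' \otimes e_j'] \quad \text{and} \quad \operatorname{vec} I$$

is a one column matrix whose elements are all zeros unless $i$ and $j$ satisfy

$$(i - 1)n + j = 1, \text{ or } n + 2, \text{ or } 2n + 3, \ldots, \text{ or } n^2$$

that is

$$i = j = 1 \text{ or } i = j = 2 \text{ or } i = j = 3 \text{ or } \ldots, \; i = j = n$$

in which case the one column matrix is

$$(B')_{\cdot i} \otimes A_{\cdot i} \quad (i = 1, 2, \ldots, n)$$

The result now follows.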
IX If $\{\lambda_i\}$ and $\{x_i\}$ are the eigenvalues and the corresponding eigenvectors for $A$ and $\{\mu_j\}$ and $\{y_j\}$ are the eigenvalues and the corresponding eigenvectors for $B$, then

$$A \otimes B$$

has eigenvalues $\{\lambda_i \mu_j\}$ with corresponding eigenvectors $\{x_i \otimes y_j\}$.

Proof
By (2.11)

$$(A \otimes B)(x_i \otimes y_j) = (A x_i) \otimes (B y_j)$$
$$= (\lambda_i x_i) \otimes (\mu_j y_j)$$
$$= \lambda_i \mu_j (x_i \otimes y_j) \quad \text{(by 2.5)}$$

The result follows.

X Given the two matrices $A$ and $B$ of order $n \times n$ and $m \times m$ respectively

$$|A \otimes B| = |A|^m |B|^n$$

where $|A|$ means the determinant of $A$.

Proof
Assume that $\lambda_1, \lambda_2, \ldots, \lambda_n$ and $\mu_1, \mu_2, \ldots, \mu_m$ are the eigenvalues of $A$ and $B$ respectively. The proof relies on the fact (see [18] p. 145) that the determinant of a matrix is equal to the product of its eigenvalues.

Hence (from Property IX above)

$$|A \otimes B| = \prod_{i,j} \lambda_i \mu_j$$

$$= \Big( \lambda_1^m \prod_{j=1}^{m} \mu_j \Big) \Big( \lambda_2^m \prod_{j=1}^{m} \mu_j \Big) \cdots \Big( \lambda_n^m \prod_{j=1}^{m} \mu_j \Big)$$

$$= (\lambda_1 \lambda_2 \cdots \lambda_n)^m (\mu_1 \mu_2 \cdots \mu_m)^n = |A|^m |B|^n$$
Another important property of Kronecker products follows.

XI
$$A \otimes B = U_1 (B \otimes A) U_2 \tag{2.14}$$

where $U_1$ and $U_2$ are permutation matrices (see Example 2.1).

Proof
Let $AYB' = X$, then by (2.13)

$$(B \otimes A) \operatorname{vec} Y = \operatorname{vec} X . \tag{1}$$

On taking the transpose, we obtain

$$B Y' A' = X'$$

so that by (2.13)

$$(A \otimes B) \operatorname{vec} Y' = \operatorname{vec} X' . \tag{2}$$

From example 1.5, we know that there exist permutation matrices $U_1$ and $U_2$ such that

$$\operatorname{vec} X' = U_1 \operatorname{vec} X \quad \text{and} \quad \operatorname{vec} Y = U_2 \operatorname{vec} Y' .$$

Substituting for $\operatorname{vec} Y$ in (1) and multiplying both sides by $U_1$, we obtain

$$U_1 (B \otimes A) U_2 \operatorname{vec} Y' = U_1 \operatorname{vec} X . \tag{3}$$

Substituting for $\operatorname{vec} X'$ in (2), we obtain

$$(A \otimes B) \operatorname{vec} Y' = U_1 \operatorname{vec} X . \tag{4}$$

The result follows from (3) and (4).

We will obtain an explicit formula for the permutation matrix $U$ in section 2.5. Notice that $U_1$ and $U_2$ are independent of $A$ and $B$ except for the orders of the matrices.
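XII If $f$ is an analytic function, $A$ is a matrix of order $(n \times n)$ and $f(A)$ exists, then

$$f(I_m \otimes A) = I_m \otimes f(A) \tag{2.15}$$

and

$$f(A \otimes I_m) = f(A) \otimes I_m \tag{2.16}$$

Proof
Since $f$ is an analytic function it can be expressed as a power series such as

$$f(z) = a_0 + a_1 z + a_2 z^2 + \ldots$$

so that

$$f(A) = a_0 I_n + a_1 A + a_2 A^2 + \ldots = \sum_{k=0}^{\infty} a_k A^k$$

where $A^0 = I_n$.

By Cayley Hamilton's theorem (see [18]) the right hand side of the equation for $f(A)$ is the sum of at most $(n + 1)$ matrices.

We now have

$$f(I_m \otimes A) = \sum_{k=0}^{\infty} a_k (I_m \otimes A)^k$$
$$= \sum_{k=0}^{\infty} a_k (I_m \otimes A^k) \quad \text{by (2.11)}$$
$$= \sum_{k=0}^{\infty} (I_m \otimes a_k A^k) \quad \text{by (2.5)}$$
$$= I_m \otimes \sum_{k=0}^{\infty} a_k A^k \quad \text{by (2.7)}$$
$$= I_m \otimes f(A)$$

This proves (2.15); (2.16) is proved similarly. We can write

$$f(A \otimes I_m) = \sum_{k=0}^{\infty} a_k (A \otimes I_m)^k$$
$$= \sum_{k=0}^{\infty} a_k (A^k \otimes I_m) \quad \text{by (2.11)}$$
$$= \sum_{k=0}^{\infty} (a_k A^k \otimes I_m)$$
$$= \Big( \sum_{k=0}^{\infty} a_k A^k \Big) \otimes I_m \quad \text{by (2.6)}$$
$$= f(A) \otimes I_m$$

This proves (2.16).

An important application of the above property is for

$$f(z) = e^z .$$

(2.15) leads to the result

$$e^{I_m \otimes A} = I_m \otimes e^A \tag{2.17}$$

and (2.16) leads to

$$e^{A \otimes I_m} = e^A \otimes I_m \tag{2.18}$$

Example 2.4

Use a direct method to verify (2.17) and (2.18).

Solution

$$e^{I_m \otimes A} = (I_m \otimes I_n) + (I_m \otimes A) + \frac{1}{2!} (I_m \otimes A)^2 + \ldots$$

The right hand side is a block diagonal matrix; each of the $m$ blocks is the sum

$$I_n + A + \frac{1}{2!} A^2 + \ldots = e^A .$$

The result (2.17) follows.

$$e^{A \otimes I_m} = (I_n \otimes I_m) + (A \otimes I_m) + \frac{1}{2!} (A \otimes I_m)^2 + \ldots$$
$$= (I_n \otimes I_m) + (A \otimes I_m) + \frac{1}{2!} (A^2 \otimes I_m) + \ldots$$
$$= \Big( I_n + A + \frac{1}{2!} A^2 + \ldots \Big) \otimes I_m$$
$$= e^A \otimes I_m$$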
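XIII
$$\operatorname{tr}(A \otimes B) = \operatorname{tr} A \operatorname{tr} B$$

Proof
Assume that $A$ is of order $(n \times n)$

$$\operatorname{tr}(A \otimes B) = \operatorname{tr}(a_{11} B) + \operatorname{tr}(a_{22} B) + \ldots + \operatorname{tr}(a_{nn} B)$$
$$= a_{11} \operatorname{tr} B + a_{22} \operatorname{tr} B + \ldots + a_{nn} \operatorname{tr} B$$
$$= (a_{11} + a_{22} + \ldots + a_{nn}) \operatorname{tr} B$$
$$= \operatorname{tr} A \operatorname{tr} B .$$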
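Properties IX, X and XIII lend themselves to a joint numerical check. The sketch below uses two small symmetric matrices (an assumption made so that all eigenvalues are real and `np.linalg.eigvalsh` applies); the particular entries are arbitrary.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])            # order n x n, n = 2
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 5.0, 1.0],
              [2.0, 1.0, -1.0]])      # order m x m, m = 3
n, m = A.shape[0], B.shape[0]
K = np.kron(A, B)                     # symmetric, since A and B are

# Property IX: eigenvalues of A (x) B are the products lambda_i * mu_j
lam = np.linalg.eigvalsh(A)
mu = np.linalg.eigvalsh(B)
products = np.sort(np.outer(lam, mu).ravel())
eig_K = np.linalg.eigvalsh(K)         # returned in ascending order

# Property X: |A (x) B| = |A|^m |B|^n
det_K = np.linalg.det(K)
det_rule = np.linalg.det(A) ** m * np.linalg.det(B) ** n

# Property XIII: tr(A (x) B) = tr A tr B
trace_K = np.trace(K)
trace_rule = np.trace(A) * np.trace(B)
```

Note how the exponents in Property X are crossed: the determinant of the $n \times n$ matrix is raised to the power $m$, and vice versa.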
2.4 DEFINITION OF THE KRONECKER SUM

Given a matrix $A(n \times n)$ and a matrix $B(m \times m)$, their Kronecker Sum, denoted by $A \oplus B$, is defined as the expression

$$A \oplus B = A \otimes I_m + I_n \otimes B \tag{2.19}$$

We have seen (Property IX) that if $\{\lambda_i\}$ and $\{\mu_j\}$ are the eigenvalues of $A$ and $B$ respectively, then $\{\lambda_i \mu_j\}$ are the eigenvalues of the product $A \otimes B$. We now show the equivalent and fundamental property for $A \oplus B$.

XIV If $\{\lambda_i\}$ and $\{\mu_j\}$ are the eigenvalues of $A$ and $B$ respectively, then $\{\lambda_i + \mu_j\}$ are the eigenvalues of $A \oplus B$.

Proof
Let $x$ and $y$ be the eigenvectors corresponding to the eigenvalues $\lambda$ and $\mu$ of $A$ and $B$ respectively, then

$$(A \oplus B)(x \otimes y) = (A \otimes I)(x \otimes y) + (I \otimes B)(x \otimes y) \quad \text{by (2.19)}$$
$$= (Ax \otimes y) + (x \otimes By) \quad \text{by (2.11)}$$
$$= \lambda (x \otimes y) + \mu (x \otimes y)$$
$$= (\lambda + \mu)(x \otimes y)$$

The result follows.
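Example 2.5

Verify the Property XIV for

$$A = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 0 \\ 2 & -1 \end{bmatrix}$$

Solution
For the matrix $A$:

$$\lambda_1 = 1 \text{ and } x_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad \lambda_2 = 2 \text{ and } x_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

For the matrix $B$:

$$\mu_1 = 1 \text{ and } y_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \mu_2 = -1 \text{ and } y_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

We find

$$C = A \oplus B = \begin{bmatrix} 2 & 0 & -1 & 0 \\ 2 & 0 & 0 & -1 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 2 & 1 \end{bmatrix}$$

and $|pI - C| = p(p - 1)(p - 2)(p - 3)$, so that the eigenvalues of $A \oplus B$ are

$$p = 0 = \lambda_1 + \mu_2 \quad \text{and} \quad x_1 \otimes y_2 = [0 \;\; 1 \;\; 0 \;\; 0]'$$
$$p = 1 = \lambda_2 + \mu_2 \quad \text{and} \quad x_2 \otimes y_2 = [0 \;\; 1 \;\; 0 \;\; -1]'$$
$$p = 2 = \lambda_1 + \mu_1 \quad \text{and} \quad x_1 \otimes y_1 = [1 \;\; 1 \;\; 0 \;\; 0]'$$
$$p = 3 = \lambda_2 + \mu_1 \quad \text{and} \quad x_2 \otimes y_1 = [1 \;\; 1 \;\; -1 \;\; -1]' .$$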
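Example 2.5 can be reproduced numerically. The sketch below takes $A = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 \\ 2 & -1 \end{bmatrix}$, the matrices consistent with the eigenvalues and eigenvectors quoted in the solution, and forms the Kronecker sum from (2.19); the helper name `kron_sum` is an assumption.

```python
import numpy as np

def kron_sum(A, B):
    """Kronecker sum (2.19): A (+) B = A (x) I_m + I_n (x) B."""
    n, m = A.shape[0], B.shape[0]
    return np.kron(A, np.eye(m)) + np.kron(np.eye(n), B)

A = np.array([[1.0, -1.0],
              [0.0, 2.0]])
B = np.array([[1.0, 0.0],
              [2.0, -1.0]])

C = kron_sum(A, B)
eigenvalues = np.sort(np.linalg.eigvals(C).real)

# Eigenvector check for p = 3 = lambda_2 + mu_1 with x_2 (x) y_1
x2 = np.array([1.0, -1.0])
y1 = np.array([1.0, 1.0])
v = np.kron(x2, y1)
residual = C @ v - 3.0 * v
```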
The Kronecker sum frequently turns up when we are considering equations of the form

$$AX + XB = C \tag{2.20}$$

where $A(n \times n)$, $B(m \times m)$ and $X(n \times m)$.

Use (2.13) and the solution to Example 2.3 to write the above in the form

$$(I_m \otimes A + B' \otimes I_n) \operatorname{vec} X = \operatorname{vec} C$$

or

$$(B' \oplus A) \operatorname{vec} X = \operatorname{vec} C \tag{2.21}$$

It is interesting to note the generality of the Kronecker sum. For example,

$$\exp(A + B) = \exp A \exp B$$

if and only if $A$ and $B$ commute (see [18] p. 227), whereas

$$\exp(A \oplus B) = \exp(A \otimes I) \exp(I \otimes B)$$

even if $A$ and $B$ do not commute!
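The vectorised form $(B' \oplus A) \operatorname{vec} X = \operatorname{vec} C$ gives a direct, if not the most efficient, way of solving $AX + XB = C$. The following NumPy sketch is an illustration under stated assumptions: the matrices are chosen so that no eigenvalue of $A$ is the negative of an eigenvalue of $B$ (which keeps $B' \oplus A$ non-singular by Property XIV), and `X_true` is a made-up solution used to manufacture $C$.

```python
import numpy as np

def vec(M):
    """Column-stacking vec operator of (1.47), returned as a 1-D array."""
    return M.reshape(-1, order="F")

n, m = 3, 2
A = np.array([[4.0, 1.0, 0.0],
              [0.0, 5.0, 1.0],
              [1.0, 0.0, 6.0]])       # n x n
B = np.array([[1.0, 2.0],
              [0.0, 3.0]])            # m x m
X_true = np.array([[1.0, -2.0],
                   [0.5, 3.0],
                   [2.0, 0.0]])       # n x m, hypothetical solution
C = A @ X_true + X_true @ B

# B' (+) A = B' (x) I_n + I_m (x) A, a square matrix of order mn
G = np.kron(B.T, np.eye(n)) + np.kron(np.eye(m), A)

x = np.linalg.solve(G, vec(C))
X = x.reshape(n, m, order="F")        # undo the column stacking
```

For large $n$ and $m$ the matrix `G` has order $mn$ and forming it explicitly becomes expensive, so this approach is mainly of interest for small problems and for theoretical work.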
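Example 2.6

Show that

$$\exp(A \oplus B) = \exp A \otimes \exp B$$

where $A(n \times n)$, $B(m \times m)$.

Solution
By (2.11)

$$(A \otimes I_m)(I_n \otimes B) = A \otimes B$$

and

$$(I_n \otimes B)(A \otimes I_m) = A \otimes B ,$$

hence $(A \otimes I_m)$ and $(I_n \otimes B)$ commute so that

$$\exp(A \oplus B) = \exp(A \otimes I_m + I_n \otimes B)$$
$$= \exp(A \otimes I_m) \exp(I_n \otimes B)$$
$$= (\exp A \otimes I_m)(I_n \otimes \exp B) \quad \text{(by 2.15 and 2.16)}$$
$$= \exp A \otimes \exp B \quad \text{(by 2.11)}$$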
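The result of Example 2.6 can be checked with a truncated power series for the matrix exponential. The helper `expm_series` below is a deliberately simple sketch written for this illustration (a library routine such as SciPy's `scipy.linalg.expm` would normally be preferred); the matrices are arbitrary and kept small in norm so that 30 terms are ample.

```python
import numpy as np

def expm_series(M, terms=30):
    """Approximate exp(M) by the partial sum I + M + M^2/2! + ... (small-norm M only)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k           # M^k / k!
        result = result + term
    return result

A = np.array([[0.0, 1.0],
              [-0.5, 0.2]])                    # n x n, n = 2
B = np.array([[0.1, 0.3, 0.0],
              [0.0, -0.2, 0.4],
              [0.5, 0.0, 0.3]])                # m x m, m = 3
n, m = A.shape[0], B.shape[0]

kron_sum = np.kron(A, np.eye(m)) + np.kron(np.eye(n), B)   # A (+) B, by (2.19)

lhs = expm_series(kron_sum)                       # exp(A (+) B)
rhs = np.kron(expm_series(A), expm_series(B))     # exp A (x) exp B
```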
2.5 THE PERMUTATION MATRIX ASSOCIATING vec X AND vec X'
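If $X = [x_{ij}]$ is a matrix of order $(m \times n)$ we can write (see (1.20))

$$X = \sum_i \sum_j x_{ij} E_{ij}$$

where $E_{ij}$ is an elementary matrix of order $(m \times n)$. It follows that

$$X' = \sum_i \sum_j x_{ij} E_{ij}'$$

so that

$$\operatorname{vec} X' = \sum_i \sum_j x_{ij} \operatorname{vec} E_{ij}' . \tag{2.22}$$

We can write (2.22) in a form of matrix multiplication as

$$\operatorname{vec} X' = \big[ \operatorname{vec} E_{11}' \;\; \operatorname{vec} E_{21}' \;\; \ldots \;\; \operatorname{vec} E_{m1}' \;\; \operatorname{vec} E_{12}' \;\; \ldots \;\; \operatorname{vec} E_{mn}' \big] \begin{bmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{m1} \\ x_{12} \\ \vdots \\ x_{mn} \end{bmatrix}$$

that is

$$\operatorname{vec} X' = \big[ \operatorname{vec} E_{11}' \;\; \operatorname{vec} E_{21}' \;\; \ldots \;\; \operatorname{vec} E_{m1}' \;\; \operatorname{vec} E_{12}' \;\; \ldots \;\; \operatorname{vec} E_{mn}' \big] \operatorname{vec} X .$$

So the permutation matrix associating $\operatorname{vec} X$ and $\operatorname{vec} X'$ is

$$U = \big[ \operatorname{vec} E_{11}' \;\; \operatorname{vec} E_{21}' \;\; \ldots \;\; \operatorname{vec} E_{mn}' \big] \tag{2.23}$$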
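Example 2.7

Given

$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{bmatrix}$$

determine the matrix $U$ such that

$$\operatorname{vec} X' = U \operatorname{vec} X .$$

Solution

$$E_{11}' = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad E_{21}' = \begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad E_{12}' = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \end{bmatrix},$$

$$E_{22}' = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad E_{13}' = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \end{bmatrix} \quad \text{and} \quad E_{23}' = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}$$

Hence by (2.23)

$$U = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$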
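The construction (2.23) can be coded directly: the columns of $U$ are the vectors $\operatorname{vec} E_{ij}'$, taken in the same order as the entries of $\operatorname{vec} X$. The sketch below, with illustrative helper names `vec` and `vec_permutation`, reproduces the $6 \times 6$ matrix of Example 2.7 for a $(2 \times 3)$ matrix $X$.

```python
import numpy as np

def vec(M):
    """Column-stacking vec operator of (1.47), returned as a 1-D array."""
    return M.reshape(-1, order="F")

def vec_permutation(m, n):
    """Build U of (2.23) for X of order (m x n), so that vec(X') = U vec(X)."""
    columns = []
    for j in range(n):            # entries of vec X run down column 1, then column 2, ...
        for i in range(m):
            E = np.zeros((m, n), dtype=int)
            E[i, j] = 1           # elementary matrix E_ij (0-based indices here)
            columns.append(vec(E.T))
    return np.column_stack(columns)

U = vec_permutation(2, 3)
X = np.array([[11, 12, 13],
              [21, 22, 23]])      # entries encode their own (row, column) position
```

Applying `U` to `vec(X)` should return `vec(X.T)`, which is the defining property of the permutation matrix.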
We now obtain the permutation matrix $U$ in a useful form as a Kronecker product of elementary matrices.

As it is necessary to be precise about the suffixes of the elementary matrices, we will use the notation explained at the end of Chapter 1.

As above, we write

\[ X' = \sum_{r=1}^{m} \sum_{s=1}^{n} x_{rs} E_{sr}\,(n \times m). \]

By (1.31) we can write

\[ X' = \sum_{r,s} E_{sr}(n \times m)\, X\, E_{sr}(n \times m). \]

Hence,

\[
\begin{aligned}
\operatorname{vec} X' &= \operatorname{vec} \sum_{r,s} E_{sr}(n \times m)\, X\, E_{sr}(n \times m) \\
&= \sum_{r,s} \left[E_{rs}(m \times n) \otimes E_{sr}(n \times m)\right] \operatorname{vec} X && \text{by (2.13)}
\end{aligned}
\]

It follows that

\[ U = \sum_{r,s} E_{rs}(m \times n) \otimes E_{sr}(n \times m) \tag{2.24} \]

or in our less rigorous notation

\[ U = \sum_{r,s} E_{rs} \otimes E_{rs}'. \tag{2.25} \]
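Formula (2.24) translates directly into a few lines of code. The sketch below is illustrative only: `elementary`, `vec` and `permutation_U` are hypothetical helper names, indices are 0-based, and vec is taken to stack columns.

```python
import numpy as np

def elementary(r, s, rows, cols):
    """Elementary matrix E_rs of order (rows x cols): 1 at (r, s), zeros elsewhere."""
    E = np.zeros((rows, cols))
    E[r, s] = 1.0
    return E

def vec(X):
    """Stack the columns of X into one vector."""
    return X.reshape(-1, order="F")

def permutation_U(m, n):
    """U = sum over r, s of E_rs (m x n) kron E_sr (n x m), as in (2.24)."""
    U = np.zeros((m * n, m * n))
    for r in range(m):
        for s in range(n):
            U += np.kron(elementary(r, s, m, n), elementary(s, r, n, m))
    return U

X = np.arange(1.0, 7.0).reshape(2, 3)   # an arbitrary (2 x 3) matrix
U = permutation_U(2, 3)
print(U.astype(int))
print(np.array_equal(U @ vec(X), vec(X.T)))
```

For a $(2 \times 3)$ matrix this reproduces the $(6 \times 6)$ permutation matrix of Example 2.7, and, being a permutation matrix, $U$ satisfies $U'U = I$.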
Notice that $U$ is a matrix of order $(mn \times mn)$.

At first sight it may appear that the evaluation of the permutation matrices $U_1$ and $U_2$ in (2.14) using (2.24) is a major task. In fact this is one of the examples where the practice is much easier than the theory.

We can readily determine the form of a permutation matrix, as in Example 2.7. So the only real problem is to determine the orders of the two matrices.

Since the matrices forming the product (2.14) must be conformable, the orders of the matrices $U_1$ and $U_2$ are determined respectively by the number of rows and the number of columns of $(A \otimes B)$.
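Example 2.8
Let $A = [a_{ij}]$ be a matrix of order $(2 \times 3)$, and $B = [b_{ij}]$ be a matrix of order $(2 \times 2)$.

Determine the permutation matrices $U_1$ and $U_2$ such that

\[ A \otimes B = U_1 (B \otimes A) U_2. \]

Solution

$(A \otimes B)$ is of the order $(4 \times 6)$.

From the above discussion we conclude that $U_1$ is of order $(4 \times 4)$ and $U_2$ is of order $(6 \times 6)$.

\[
U_1 = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\quad\text{and}\quad
U_2 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.
\]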
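The result of Example 2.8 is easily checked for particular matrices. In the sketch below the entries of `A` and `B` are arbitrary illustrative integers (an assumption; the example itself is symbolic), while `U1` and `U2` are the permutation matrices of the solution.

```python
import numpy as np

U1 = np.array([
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
])
U2 = np.array([
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1],
])

# Arbitrary test matrices of the orders used in the example.
A = np.array([[1, 2, 3], [4, 5, 6]])      # (2 x 3)
B = np.array([[7, 8], [9, 10]])           # (2 x 2)

left = np.kron(A, B)
right = U1 @ np.kron(B, A) @ U2
print(np.array_equal(left, right))
```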
Another related matrix which will be used (in Chapter 6) is

\[ \bar{U} = \sum_{r,s} E_{rs} \otimes E_{rs}. \tag{2.26} \]

When the matrix $X$ is of order $(m \times n)$, $\bar{U}$ is of order $(m^2 \times n^2)$.

Problems of Chapter 2

(1) Given

\[ U = \sum_{r,s} E_{rs}(m \times n) \otimes E_{sr}(n \times m), \]

show that

\[ U^{-1} = U' = \sum_{r,s} E_{sr}(n \times m) \otimes E_{rs}(m \times n). \]
(2) $A = [a_{ij}]$, $B = [b_{ij}]$ and $Y = [y_{ij}]$ are matrices all of order $(2 \times 2)$; use a direct method to evaluate

(a) (i) $AYB$
    (ii) $B' \otimes A$

(b) Verify (2.13) that

\[ \operatorname{vec} AYB = (B' \otimes A) \operatorname{vec} Y. \]
(3) Givenr2 1
and B =-1 1
2 0A =
01
(a) Calculate
$A \otimes B$ and $B \otimes A$.

(b) Find matrices $U_1$ and $U_2$ such that

\[ A \otimes B = U_1 (B \otimes A) U_2. \]
(4) GivenC3 4
A2 _3
calculate
(a) $\exp(A)$

(b) $\exp(A \otimes I)$.

Verify (2.16), that is

\[ \exp(A) \otimes I = \exp(A \otimes I). \]
(5) Given
2 1 1 2and B , calculate
1 3 4-1 -A
(a) $A^{-1} \otimes B^{-1}$ and

(b) $(A \otimes B)^{-1}$.

Hence verify (2.12), that is

\[ (A \otimes B)^{-1} = A^{-1} \otimes B^{-1}. \]
(6) Given
L4 2]and B = L2
3, find
(a) The eigenvalues and eigenvectors of $A$ and $B$.
(b) The eigenvalues and eigenvectors of $A \otimes B$.
(c) Verify Property IX of Kronecker Products.
A =
(7) $A$, $B$, $C$ and $D$ are matrices such that

$A$ is similar to $C$, and

$B$ is similar to $D$.

Show that $A \otimes B$ is similar to $C \otimes D$.
CHAPTER 3

Some Applications of the Kronecker Product

3.1 INTRODUCTION

There are numerous applications of the Kronecker product in various fields including statistics, economics, optimisation and control. It is not our intention to discuss applications in all these fields, just a selected number to give an idea of the problems tackled in some of the literature mentioned in the Bibliography. There is no doubt that the interested reader will find there various other applications, hopefully in his own field of interest.

A number of the applications involve the derivative of a matrix. It is a well known concept (for example see [18] p. 229) which we now briefly review.
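3.2 THE DERIVATIVE OF A MATRIX

Given the matrix

\[ A(t) = [a_{ij}(t)] \]

the derivative of the matrix, with respect to a scalar variable $t$, denoted by $(d/dt)A(t)$ or just $dA/dt$ or $\dot{A}(t)$, is defined as the matrix

\[ \frac{d}{dt} A(t) = \left[ \frac{d}{dt} a_{ij}(t) \right]. \tag{3.1} \]

Similarly, the integral of the matrix is defined as

\[ \int A(t)\,dt = \left[ \int a_{ij}(t)\,dt \right]. \tag{3.2} \]

For example, given

\[ A = \begin{bmatrix} 2t^2 & 4 \\ \sin t & 2 + t^2 \end{bmatrix} \]

then

\[
\frac{d}{dt} A = \begin{bmatrix} 4t & 0 \\ \cos t & 2t \end{bmatrix}
\quad\text{and}\quad
\int A\,dt = \begin{bmatrix} 2t^3/3 & 4t \\ -\cos t & 2t + t^3/3 \end{bmatrix} + C
\]

where $C$ is a constant matrix.

One important property follows immediately. Given conformable matrices $A(t)$ and $B(t)$, then

\[ \frac{d}{dt}[AB] = \frac{dA}{dt} B + A \frac{dB}{dt}. \tag{3.3} \]

Example 3.1

Given $C = A \otimes B$ (each matrix is assumed to be a function of $t$) show that

\[ \frac{dC}{dt} = \frac{dA}{dt} \otimes B + A \otimes \frac{dB}{dt}. \tag{3.4} \]

Solution
On differentiating the $(i,j)$th block of $A \otimes B$, we obtain

\[ \frac{d}{dt}(a_{ij} B) = \frac{da_{ij}}{dt} B + a_{ij} \frac{dB}{dt} \]

which is the $(i,j)$th partition of

\[ \frac{dA}{dt} \otimes B + A \otimes \frac{dB}{dt}, \]

and the result follows.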
3.3 PROBLEM 1

Determine the condition for the equation

\[ AX + XB = C \]

to have a unique solution.

Solution

We have already considered this equation and wrote it (2.21) as

\[ (B' \oplus A) \operatorname{vec} X = \operatorname{vec} C \]

or

\[ Gx = c \tag{3.5} \]

where $G = B' \oplus A$, $x = \operatorname{vec} X$ and $c = \operatorname{vec} C$.

Equation (3.5) has a unique solution iff $G$ is nonsingular, that is iff the eigenvalues of $G$ are all nonzero. By Property XIV (see section 2.4), the eigenvalues of $G$ are $(\lambda_i + \mu_j)$, where $\{\lambda_i\}$ are the eigenvalues of $A$ and $\{\mu_j\}$ those of $B$ (note that the eigenvalues of the matrix $B'$ are the same as the eigenvalues of $B$). Hence equation (3.5) has a unique solution iff

\[ \lambda_i + \mu_j \neq 0 \quad (\text{all } i \text{ and } j). \]

We have thus proved that $AX + XB = C$ has a unique solution iff $A$ and $(-B)$ have no eigenvalue in common.

If on the other hand, $A$ and $(-B)$ have common eigenvalues then the existence of solutions depends on the rank of the augmented matrix

\[ [G \,\vdots\, c]. \]

If the rank of $[G \,\vdots\, c]$ is equal to the rank of $G$, then solutions do exist, otherwise the set of equations

\[ AX + XB = C \]

is not consistent.
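Example 3.2
Obtain the solution to

\[ AX + XB = C \]

where

(i) $A = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix}$, $B = \begin{bmatrix} -3 & 4 \\ 1 & 0 \end{bmatrix}$ and $C = \begin{bmatrix} 1 & 3 \\ -2 & 2 \end{bmatrix}$

(ii) $A = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix}$, $B = \begin{bmatrix} -3 & 4 \\ 0 & -1 \end{bmatrix}$ and $C = \begin{bmatrix} 0 & 5 \\ 2 & -9 \end{bmatrix}$.

Solution

Writing the equation in the form of (3.5) we obtain,

(i)
\[
\begin{bmatrix}
-2 & -1 & 1 & 0 \\
0 & -1 & 0 & 1 \\
4 & 0 & 1 & -1 \\
0 & 4 & 0 & 2
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
=
\begin{bmatrix} 1 \\ -2 \\ 3 \\ 2 \end{bmatrix}
\]

where for convenience we have denoted

\[ X = \begin{bmatrix} x_1 & x_3 \\ x_2 & x_4 \end{bmatrix}. \]

On solving we obtain the unique solution

\[ X = \begin{bmatrix} 0 & 2 \\ 1 & -1 \end{bmatrix}. \]

(ii) In case (ii) $A$ and $(-B)$ have one eigenvalue ($\lambda = 1$) in common. Equation (3.5) becomes

\[
\begin{bmatrix}
-2 & -1 & 0 & 0 \\
0 & -1 & 0 & 0 \\
4 & 0 & 0 & -1 \\
0 & 4 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 2 \\ 5 \\ -9 \end{bmatrix}.
\]

$G$ is seen to be singular, but

\[ \operatorname{rank} G = \operatorname{rank}\,[G \,\vdots\, c] = 3 \]

hence at least one solution exists. In fact two linearly independent solutions are

\[
X_1 = \begin{bmatrix} 1 & 0 \\ -2 & -1 \end{bmatrix}
\quad\text{and}\quad
X_2 = \begin{bmatrix} 1 & 1 \\ -2 & -1 \end{bmatrix};
\]

any other solution is a linear combination of $X_1$ and $X_2$.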
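The procedure of Problem 1 can be sketched in code: form $G = I_m \otimes A + B' \otimes I_n$, test the eigenvalue condition, and solve $Gx = c$. The matrices below are assumed illustrative values, chosen to give one nonsingular and one singular case, and `sylvester_matrix` and `vec` are hypothetical helper names.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector."""
    return X.reshape(-1, order="F")

def sylvester_matrix(A, B):
    """G = I_m kron A + B' kron I_n, so that vec(AX + XB) = G vec(X)."""
    n, m = A.shape[0], B.shape[0]
    return np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))

A = np.array([[1.0, -1.0], [0.0, 2.0]])

# Case 1 (assumed data): A and -B share no eigenvalue, so the solution is unique.
B = np.array([[-3.0, 4.0], [1.0, 0.0]])
C = np.array([[1.0, 3.0], [-2.0, 2.0]])
G = sylvester_matrix(A, B)
unique = all(abs(lam + mu) > 1e-12
             for lam in np.linalg.eigvals(A)
             for mu in np.linalg.eigvals(B))
X = np.linalg.solve(G, vec(C)).reshape(2, 2, order="F")
print(unique, X)

# Case 2 (assumed data): A and -B share the eigenvalue 1, so G is singular;
# solutions exist iff rank G equals the rank of the augmented matrix [G : c].
B2 = np.array([[-3.0, 4.0], [0.0, -1.0]])
C2 = np.array([[0.0, 5.0], [2.0, -9.0]])
G2 = sylvester_matrix(A, B2)
rank_G2 = np.linalg.matrix_rank(G2)
rank_aug = np.linalg.matrix_rank(np.column_stack([G2, vec(C2)]))
print(rank_G2, rank_aug)
```

For large matrices forming $G$ explicitly is wasteful (it is of order $nm \times nm$); the sketch is intended only to mirror the argument of the text.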
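3.4 PROBLEM 2

Determine the condition for the equation

\[ AX - XA = \mu X \tag{3.6} \]

to have a nontrivial solution.

Solution

We can write (3.6) as

\[ Hx = \mu x \tag{3.7} \]

where $H = I \otimes A - A' \otimes I$ and $x = \operatorname{vec} X$.

(3.7) has a nontrivial solution for $x$ iff

\[ |\mu I - H| = 0 \]

that is iff $\mu$ is an eigenvalue of $H$. But by a simple generalisation of Property XIV, section 2.4, the eigenvalues of $H$ are $\{(\lambda_i - \lambda_j)\}$ where $\{\lambda_i\}$ are the eigenvalues of $A$. Hence (3.6) has a nontrivial solution iff

\[ \mu = \lambda_i - \lambda_j. \]

Example 3.3

Determine the solutions to (3.6) when

\[ A = \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} \quad\text{and}\quad \mu = -2. \]

Solution

$\mu = -2$ is an eigenvalue of $H$, hence we expect a nontrivial solution. Equation (3.7) becomes

\[
\begin{bmatrix}
0 & 0 & -2 & 0 \\
2 & 2 & 0 & -2 \\
0 & 0 & -2 & 0 \\
0 & 0 & 2 & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= -2
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}.
\]

On solving, we obtain

\[ X = \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}. \]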
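A short numerical check of Problem 2 is sketched below. The values of `A`, `mu` and `X` are assumed for illustration; the point is only that the eigenvalues of $H = I \otimes A - A' \otimes I$ are the differences of the eigenvalues of $A$, and that a matrix $X$ with $\operatorname{vec} X$ an eigenvector of $H$ satisfies (3.6).

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector."""
    return X.reshape(-1, order="F")

# Assumed illustrative data.
A = np.array([[1.0, 0.0], [2.0, 3.0]])
mu = -2.0
X = np.array([[1.0, 1.0], [-1.0, -1.0]])

I = np.eye(2)
H = np.kron(I, A) - np.kron(A.T, I)

# Eigenvalues of H compared with all differences lambda_i - lambda_j.
eig_H = np.sort(np.linalg.eigvals(H).real)
lam = np.linalg.eigvals(A).real
differences = np.sort([li - lj for li in lam for lj in lam])
print(eig_H, differences)

# X solves AX - XA = mu X, equivalently H vec(X) = mu vec(X).
print(np.allclose(A @ X - X @ A, mu * X), np.allclose(H @ vec(X), mu * vec(X)))
```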
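3.5 PROBLEM 3

Use the fact (see [18] p. 230) that the solution to

\[ \dot{x} = Ax, \quad x(0) = c \tag{3.8} \]

is

\[ x = \exp(At)\,c \tag{3.9} \]

to solve the equation

\[ \dot{X} = AX + XB, \quad X(0) = C \tag{3.10} \]

where $A\,(n \times n)$, $B\,(m \times m)$ and $X\,(n \times m)$.

Solution

Using the vec operator on (3.10) we obtain

\[ \dot{x} = Gx, \quad x(0) = c \tag{3.11} \]

where

\[ x = \operatorname{vec} X, \quad c = \operatorname{vec} C \]

and

\[ G = I_m \otimes A + B' \otimes I_n. \]

By (3.9) the solution to (3.11) is

\[
\begin{aligned}
\operatorname{vec} X &= \exp\{(I_m \otimes A)t + (B' \otimes I_n)t\} \operatorname{vec} C \\
&= [\exp(I_m \otimes A)t]\,[\exp(B' \otimes I_n)t] \operatorname{vec} C && \text{(see Example 2.6)} \\
&= [I_m \otimes \exp(At)]\,[\exp(B't) \otimes I_n] \operatorname{vec} C && \text{by (2.17) and (2.18).}
\end{aligned}
\]

We now make use of the result

\[ \operatorname{vec} AB = (B' \otimes I) \operatorname{vec} A \]

(in (2.13) put $A = I$ and $Y = A$) in conjunction with the fact that

\[ [\exp(B't)]' = \exp(Bt), \]

to obtain

\[ (\exp(B't) \otimes I_n) \operatorname{vec} C = \operatorname{vec}\,[C \exp(Bt)]. \]

Using the result of Example 2.3(i), we finally obtain

\[ \operatorname{vec} X = \operatorname{vec}\,[\exp(At)\, C \exp(Bt)] \tag{3.12} \]

so that $X = \exp(At)\, C \exp(Bt)$.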
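Example 3.4
Obtain the solution to (3.10) when

\[
A = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix},\quad
B = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}
\quad\text{and}\quad
C = \begin{bmatrix} -2 & 0 \\ 1 & 1 \end{bmatrix}.
\]

Solution

(See [18] p. 227)

\[
\exp(At) = \begin{bmatrix} e^{t} & e^{t} - e^{2t} \\ 0 & e^{2t} \end{bmatrix},\quad
\exp(Bt) = \begin{bmatrix} e^{t} & 0 \\ 0 & e^{-t} \end{bmatrix}
\]

hence

\[ X = \begin{bmatrix} -e^{2t} - e^{3t} & 1 - e^{t} \\ e^{3t} & e^{t} \end{bmatrix}. \]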
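The solution $X = \exp(At)\,C\exp(Bt)$ of (3.10) can be compared numerically with the vectorised solution $\exp(Gt)\operatorname{vec} C$ of (3.11). The sketch below uses assumed illustrative matrices and a hypothetical Taylor-series helper `expm_taylor`; the closed form entered as `X_closed` is what these particular matrices give when the exponentials are worked out by hand.

```python
import numpy as np

def expm_taylor(M, terms=60):
    """Matrix exponential by a truncated Taylor series (adequate for small, well-scaled M)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

def vec(X):
    """Stack the columns of X into one vector."""
    return X.reshape(-1, order="F")

# Assumed illustrative data.
A = np.array([[1.0, -1.0], [0.0, 2.0]])
B = np.array([[1.0, 0.0], [0.0, -1.0]])
C = np.array([[-2.0, 0.0], [1.0, 1.0]])
t = 0.5

# Matrix form of the solution, (3.12).
X_t = expm_taylor(A * t) @ C @ expm_taylor(B * t)

# Vectorised form: vec X = exp(Gt) vec C with G = I kron A + B' kron I.
G = np.kron(np.eye(2), A) + np.kron(B.T, np.eye(2))
x_t = expm_taylor(G * t) @ vec(C)

# Closed form for these particular matrices.
e = np.exp
X_closed = np.array([[-e(2 * t) - e(3 * t), 1 - e(t)],
                     [e(3 * t), e(t)]])

print(np.allclose(vec(X_t), x_t), np.allclose(X_t, X_closed))
```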
3.6 PROBLEM 4

We consider a problem similar to the previous one but in a different context. An important concept in Control Theory is the transition matrix.

Very briefly, associated with the equations

\[ \dot{X} = A(t)X \quad\text{or}\quad \dot{x} = A(t)x \]

is the transition matrix $\Phi_1(t, \tau)$ having the following two properties

\[ \dot{\Phi}_1(t, \tau) = A(t)\,\Phi_1(t, \tau) \tag{3.13} \]

and

\[ \Phi_1(t, t) = I. \]

[For simplicity of notation we shall write $\Phi$ for $\Phi(t, \tau)$.] If $A$ is a constant matrix, it is easily shown that

\[ \Phi_1 = \exp(At). \]

Similarly, with the equation

\[ \dot{X} = XB \quad\text{so that}\quad \dot{X}' = B'X' \]

we associate the transition matrix $\Phi_2$ such that

\[ \dot{\Phi}_2 = B'\Phi_2. \tag{3.14} \]

The problem is to find the transition matrix associated with the equation

\[ \dot{X} = AX + XB \tag{3.15} \]

given the transition matrices $\Phi_1$ and $\Phi_2$ defined above.

Solution
We can write (3.15) as

\[ \dot{x} = Gx \]

where $x$ and $G$ were defined in the previous problem.

We define a matrix $\psi$ as

\[ \psi(t, \tau) = \Phi_2(t, \tau) \otimes \Phi_1(t, \tau). \tag{3.16} \]

We obtain by (3.4)

\[
\begin{aligned}
\dot{\psi} &= \dot{\Phi}_2 \otimes \Phi_1 + \Phi_2 \otimes \dot{\Phi}_1 \\
&= (B'\Phi_2) \otimes \Phi_1 + \Phi_2 \otimes (A\Phi_1) && \text{by (3.13) and (3.14)} \\
&= (B'\Phi_2) \otimes (I\Phi_1) + (I\Phi_2) \otimes (A\Phi_1) \\
&= [B' \otimes I + I \otimes A]\,[\Phi_2 \otimes \Phi_1] && \text{by (2.11).}
\end{aligned}
\]

Hence

\[ \dot{\psi} = G\psi. \tag{3.17} \]

Also

\[ \psi(t, t) = \Phi_2(t, t) \otimes \Phi_1(t, t) = I \otimes I = I. \tag{3.18} \]

The two equations (3.17) and (3.18) prove that $\psi$ is the transition matrix for (3.15).
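Example 3.5
Find the transition matrix for the equation

\[ \dot{X} = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix} X + X \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}. \]

Solution
In this case both $A$ and $B$ are constant matrices. From Example 3.4,

\[
\Phi_1 = \exp(At) = \begin{bmatrix} e^{t} & e^{t} - e^{2t} \\ 0 & e^{2t} \end{bmatrix},\quad
\Phi_2 = \exp(B't) = \begin{bmatrix} e^{t} & 0 \\ 0 & e^{-t} \end{bmatrix}.
\]

So that

\[
\psi = \Phi_2 \otimes \Phi_1 = \begin{bmatrix}
e^{2t} & e^{2t} - e^{3t} & 0 & 0 \\
0 & e^{3t} & 0 & 0 \\
0 & 0 & 1 & 1 - e^{t} \\
0 & 0 & 0 & e^{t}
\end{bmatrix}.
\]

For this equation

\[
G = \begin{bmatrix}
2 & -1 & 0 & 0 \\
0 & 3 & 0 & 0 \\
0 & 0 & 0 & -1 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

and it is easily verified that

\[ \dot{\psi} = G\psi \quad\text{and}\quad \psi(0) = I. \]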
3.7 PROBLEM 5
Solve the equation

\[ AXB = C \tag{3.19} \]

where all matrices are of order $n \times n$.

Solution

Using (2.13) we can write (3.19) in the form

\[ Hx = c \tag{3.20} \]

where $H = B' \otimes A$, $x = \operatorname{vec} X$ and $c = \operatorname{vec} C$.

The criteria for the existence and the uniqueness of a solution to (3.20) are well known (see for example [18]).

The above method of solving the problem is easily generalised to the linear equation of the form

\[ A_1 X B_1 + A_2 X B_2 + \cdots + A_r X B_r = C. \tag{3.21} \]

Equation (3.21) can be written as for example (3.20) where this time

\[ H = B_1' \otimes A_1 + B_2' \otimes A_2 + \cdots + B_r' \otimes A_r. \]

Example 3.6
Find the matrix $X$, given

\[ A_1 X B_1 + A_2 X B_2 = C \]

where
0 2 4 -61
B2 = and C =1-l 3 0 8
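Solution

For this example it is found that

\[
H = B_1' \otimes A_1 + B_2' \otimes A_2 = \begin{bmatrix}
2 & 2 & -2 & -3 \\
1 & -1 & 1 & 2 \\
0 & 2 & 2 & 5 \\
-4 & -2 & -5 & -4
\end{bmatrix}
\]

and

\[ c' = [4 \;\; 0 \;\; -6 \;\; 8]. \]

It follows that

\[ x = H^{-1}c = \begin{bmatrix} 1 \\ -1 \\ -2 \\ 0 \end{bmatrix} \]

so that

\[ X = \begin{bmatrix} 1 & -2 \\ -1 & 0 \end{bmatrix}. \]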
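The generalisation (3.21) lends itself to a small routine. The following is a sketch only: `solve_linear_matrix_equation` is a hypothetical name, the coefficient matrices are assumed values chosen so that $H$ is nonsingular, and $C$ is manufactured from a known $X$ so that the recovered solution can be compared with it.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector."""
    return X.reshape(-1, order="F")

def solve_linear_matrix_equation(As, Bs, C):
    """Solve sum_i A_i X B_i = C via H vec(X) = vec(C), H = sum_i B_i' kron A_i.

    Assumes all matrices are square of the same order and H is nonsingular.
    """
    H = sum(np.kron(B.T, A) for A, B in zip(As, Bs))
    x = np.linalg.solve(H, vec(C))
    return x.reshape(C.shape, order="F")

# Assumed illustrative coefficient matrices.
A1 = np.array([[2.0, 0.0], [1.0, 1.0]])
B1 = np.array([[1.0, 1.0], [0.0, 1.0]])
A2 = np.array([[0.0, 1.0], [-1.0, 0.0]])
B2 = np.array([[1.0, 0.0], [0.0, 2.0]])

# Build C from a known X, then recover X.
X_true = np.array([[1.0, -2.0], [-1.0, 0.0]])
C = A1 @ X_true @ B1 + A2 @ X_true @ B2
X = solve_linear_matrix_equation([A1, A2], [B1, B2], C)
print(np.allclose(X, X_true))
```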
3.8 PROBLEM 6

This problem is to determine a constant output feedback matrix $K$ so that the closed loop matrix of a system has preassigned eigenvalues.

A multivariable system is defined by the equations

\[ \dot{x} = Ax + Bu, \qquad y = Cx \tag{3.22} \]

where $A\,(n \times n)$, $B\,(n \times m)$ and $C\,(r \times n)$ are constant matrices, and $u$, $x$ and $y$ are column vectors of order $m$, $n$ and $r$ respectively.

We are concerned with a system having an output feedback law of the form

\[ u = Ky \tag{3.23} \]

where $K\,(m \times r)$ is the constant control matrix to be determined.

On substituting (3.23) into (3.22), we obtain the equations of the closed loop system

\[ \dot{x} = (A + BKC)x, \qquad y = Cx. \tag{3.24} \]

The problem can now be restated as follows:
Given the matrices $A$, $B$ and $C$, determine a matrix $K$ such that

\[ |\lambda I - A - BKC| = a_0 + a_1\lambda + \cdots + a_{n-1}\lambda^{n-1} + \lambda^n \ \text{(say)} \tag{3.25} \]
\[ = 0 \quad\text{for preassigned values } \lambda = \lambda_1, \lambda_2, \ldots, \lambda_n. \]

Solution
Various solutions exist to this problem. We are interested in the application of the Kronecker product and will follow a method suggested in [24].

We consider a matrix $H\,(n \times n)$ whose eigenvalues are the desired values $\lambda_1, \lambda_2, \ldots, \lambda_n$, that is

\[ |\lambda I - H| = 0 \quad\text{for } \lambda = \lambda_1, \lambda_2, \ldots, \lambda_n \tag{3.26} \]

and

\[ |\lambda I - H| = a_0 + a_1\lambda + \cdots + a_{n-1}\lambda^{n-1} + \lambda^n. \tag{3.27} \]

Let

\[ A + BKC = H \]

so that

\[ BKC = H - A = Q \ \text{(say)}. \tag{3.28} \]

Using (2.13) we can write (3.28) as

\[ (C' \otimes B) \operatorname{vec} K = \operatorname{vec} Q \tag{3.29} \]

or more simply as

\[ Pk = q \tag{3.30} \]

where $P = C' \otimes B$, $k = \operatorname{vec} K$ and $q = \operatorname{vec} Q$.

Notice that $P$ is of order $(n^2 \times mr)$ and $k$ and $q$ are column vectors of order $mr$ and $n^2$ respectively.

The system of equations (3.30) is overdetermined unless of course $m = n = r$, in which case it can be solved in the usual manner, assuming a solution does exist!

In general, to solve the system for $k$ we must consider the subsystem of linearly independent equations, the remaining equations being linearly dependent on this subsystem. In other words we determine a nonsingular matrix $T\,(n^2 \times n^2)$ such that

\[ TP = \begin{bmatrix} P_1 \\ P_2 \end{bmatrix} \tag{3.31} \]

where $P_1$ is the matrix of the coefficients of the linearly independent equations of the system (3.30) and $P_2$ is a null matrix.

Premultiplying both sides of (3.30) by $T$ and making use of (3.31), we obtain

\[ TPk = Tq \]

or

\[ \begin{bmatrix} P_1 \\ P_2 \end{bmatrix} k = \begin{bmatrix} u \\ v \end{bmatrix}. \tag{3.32} \]

If the rank of $P$ is $mr$, then $P_1$ is of order $(mr \times mr)$, $P_2$ is of order $([n^2 - mr] \times mr)$ and $u$ and $v$ are of order $mr$ and $(n^2 - mr)$ respectively.

A sufficient condition for the existence of a solution to (3.32) or equivalently to (3.30) is that

\[ v = 0 \tag{3.33} \]

in (3.32).

If the condition (3.33) holds and rank $P_1 = mr$, then

\[ k = P_1^{-1} u. \tag{3.34} \]

The condition (3.33) depends on an appropriate choice of $H$. The underlying assumption being made is that a matrix $H$ satisfying this condition does exist. This in turn depends on the system under consideration, for example whether it is controllable.

Some obvious choices for the form of matrix $H$ are: (a) diagonal, (b) upper or lower triangular, (c) companion form or (d) certain combinations of the above forms.

Although forms (a) and (b) are well known, the companion form is less well documented.

Very briefly, the matrix

\[
H = \begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
-a_0 & -a_1 & -a_2 & \cdots & -a_{n-1}
\end{bmatrix}
\]

is said to be in 'companion' form; it has the associated characteristic equation

\[ |\lambda I - H| = a_0 + a_1\lambda + \cdots + a_{n-1}\lambda^{n-1} + \lambda^n = 0. \tag{3.35} \]
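Example 3.7
Determine the feedback matrix $K$ so that the two input - two output system

\[
\dot{x} = \begin{bmatrix} 0 & 1 & 0 \\ 3 & 3 & 1 \\ 2 & -3 & 2 \end{bmatrix} x + \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} u,
\qquad
y = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} x
\]

has closed loop eigenvalues $(-1, -2, -3)$.

Solution
We must first decide on the form of the matrix $H$.

Since (see (3.28))

\[ H - A = BKC \]

and the first row of $B$ is zero, it follows that the first row of $H - A$ must be zero.

We must therefore choose $H$ in the companion form. Since the characteristic equation of $H$ is

\[ (\lambda + 1)(\lambda + 2)(\lambda + 3) = \lambda^3 + 6\lambda^2 + 11\lambda + 6 = 0, \]

\[ H = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -6 & -11 & -6 \end{bmatrix} \quad\text{(see (3.35))} \]

and hence (see (3.28))

\[ Q = \begin{bmatrix} 0 & 0 & 0 \\ -3 & -3 & 0 \\ -8 & -8 & -8 \end{bmatrix}. \]

\[
P = C' \otimes B = \begin{bmatrix}
0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}.
\]

An appropriate matrix $T$ is the following

\[
T = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & -1 & 0 & 0 & 0 & 0
\end{bmatrix}.
\]

It follows that

\[
TP = \begin{bmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
= \begin{bmatrix} P_1 \\ P_2 \end{bmatrix}
\quad\text{and}\quad
Tq = \begin{bmatrix} 0 \\ -8 \\ -3 \\ -8 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
= \begin{bmatrix} u \\ v \end{bmatrix}.
\]

Since

\[
P_1 = \begin{bmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1
\end{bmatrix},
\qquad
P_1^{-1} = \begin{bmatrix}
-1 & 0 & 1 & 0 \\
0 & -1 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0
\end{bmatrix}
\]

so that (see (3.34))

\[ k = P_1^{-1} u = \begin{bmatrix} -3 \\ 0 \\ 0 \\ -8 \end{bmatrix}. \]

Hence

\[ K = \begin{bmatrix} -3 & 0 \\ 0 & -8 \end{bmatrix}. \]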
n';
,w.
.w,
coo
't7v0
,
try
(G9
SAC
f-'".,
.ti.-.
.."
'C7
`'"
-L7
.CC
fl..
II.a,:
°.'
CHAPTER 4

Introduction to Matrix Calculus

4.1 INTRODUCTION

It is becoming increasingly clear that there is a real need for matrix calculus in fields such as multivariate analysis. There is a strong analogy here with matrix algebra, which is such a powerful and elegant tool in the study of linear systems and elsewhere.

Expressions in multivariate analysis can be written in terms of scalar calculus, but the compactness of the equivalent relations in terms of matrices not only leads to a better understanding of the problems involved, but also encourages the consideration of problems which may be too complex to tackle by scalar calculus.

We have already defined the derivative of a matrix with respect to a scalar (see (3.1)); we now generalise this concept. The process is frequently referred to as formal or symbolic matrix differentiation. The basic definitions involve the partial differentiation of scalar matrix functions with respect to all the elements of a matrix. These derivatives are the elements of a matrix, of the same order as the original matrix, which is defined as the derived matrix. The words 'formal' and 'symbolic' refer to the fact that the matrix derivatives are defined without the rigorous mathematical justification which we expect for the corresponding scalar derivatives. This is not to say that such justification cannot be made, rather the fact is that this topic is still in its infancy and that appropriate mathematical basis is being laid as the subject develops.

With this in mind we make the following observations about the notation used. In general the elements of the matrices $A, B, C, \ldots$ will be constant scalars. On the other hand the elements of the matrices $X, Y, Z, \ldots$ are scalar variables and we exclude the possibility that any element can be a constant or zero. In general we will also demand that these elements are independent. When this is not the case, for example when the matrix $X$ is symmetric, it is considered as a special case. The reader will appreciate the necessity for these restrictions when he considers the partial derivatives of (say) a matrix $X$ with respect to one of its elements $x_{rs}$. Obviously the derivative is undefined if $x_{rs}$ is a constant. The derivative is $E_{rs}$ if $x_{rs}$ is independent of all the other elements of $X$, but is $E_{rs} + E_{sr}$ if $X$ is symmetric.

There have been attempts to define the derivative when $x_{rs}$ is a constant (or zero) but, as far as this author knows, no rigorous mathematical theory for the general case has been proposed and successfully applied.
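4.2 THE DERIVATIVES OF VECTORS

Let $x$ and $y$ be vectors of orders $n$ and $m$ respectively. We can define various derivatives in the following way [15]:

(1) The derivative of the vector $y$ with respect to vector $x$ is the matrix

\[
\frac{\partial y}{\partial x} = \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_2}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_1} \\
\dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_2} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial y_1}{\partial x_n} & \dfrac{\partial y_2}{\partial x_n} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}
\tag{4.1}
\]

of order $(n \times m)$ where $y_1, y_2, \ldots, y_m$ and $x_1, x_2, \ldots, x_n$ are the components of $y$ and $x$ respectively.

(2) The derivative of a scalar with respect to a vector. If $y$ is a scalar

\[
\frac{\partial y}{\partial x} = \begin{bmatrix}
\dfrac{\partial y}{\partial x_1} \\ \dfrac{\partial y}{\partial x_2} \\ \vdots \\ \dfrac{\partial y}{\partial x_n}
\end{bmatrix}.
\tag{4.2}
\]

(3) The derivative of a vector $y$ with respect to a scalar $x$

\[
\frac{\partial y}{\partial x} = \begin{bmatrix}
\dfrac{\partial y_1}{\partial x} & \dfrac{\partial y_2}{\partial x} & \cdots & \dfrac{\partial y_m}{\partial x}
\end{bmatrix}.
\tag{4.3}
\]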
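Example 4.1

Given

\[ y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \qquad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]

and

\[ y_1 = x_1^2 - x_2, \qquad y_2 = x_3^2 + 3x_2, \]

obtain $\partial y / \partial x$.

Solution

\[
\frac{\partial y}{\partial x} = \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_2}{\partial x_1} \\
\dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_2}{\partial x_2} \\
\dfrac{\partial y_1}{\partial x_3} & \dfrac{\partial y_2}{\partial x_3}
\end{bmatrix}
= \begin{bmatrix}
2x_1 & 0 \\
-1 & 3 \\
0 & 2x_3
\end{bmatrix}.
\]

In multivariate analysis, if $x$ and $y$ are of the same order, the absolute value of the determinant of $\partial x / \partial y$, that is of

\[ J = \left| \frac{\partial x}{\partial y} \right|, \]

is called the Jacobian of the transformation determined by

\[ y = y(x). \]

Example 4.2

The transformation from spherical to Cartesian co-ordinates is defined by $x = r \sin\theta \cos\psi$, $y = r \sin\theta \sin\psi$, and $z = r \cos\theta$ where $r > 0$, $0 < \theta < \pi$ and $0 < \psi < 2\pi$.

Obtain the Jacobian of the transformation.

Solution
Let

\[ x = x_1, \quad y = x_2, \quad z = x_3 \]

and

\[ r = y_1, \quad \theta = y_2, \quad \psi = y_3, \]

then

\[
J = \left| \frac{\partial x}{\partial y} \right| =
\begin{vmatrix}
\sin y_2 \cos y_3 & \sin y_2 \sin y_3 & \cos y_2 \\
y_1 \cos y_2 \cos y_3 & y_1 \cos y_2 \sin y_3 & -y_1 \sin y_2 \\
-y_1 \sin y_2 \sin y_3 & y_1 \sin y_2 \cos y_3 & 0
\end{vmatrix}
= y_1^2 \sin y_2.
\]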
Definitions (4.1), (4.2) and (4.3) can be used to obtain derivatives to many frequently used expressions, including quadratic and bilinear forms.

For example consider

\[ y = x'Ax. \]

Using (4.2) it is not difficult to show that

\[
\frac{\partial y}{\partial x} = Ax + A'x
= 2Ax \quad\text{if } A \text{ is symmetric.}
\]

We can of course differentiate the vector $2Ax$ with respect to $x$; by definition

\[
\frac{\partial}{\partial x}\left(\frac{\partial y}{\partial x}\right) = \frac{\partial}{\partial x}(2Ax)
= 2A' = 2A \quad\text{(if } A \text{ is symmetric).}
\]
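The formula $\partial(x'Ax)/\partial x = Ax + A'x$ is easily checked by finite differences. The sketch below uses an arbitrary nonsymmetric `A` and point `x` (assumed values), with `numerical_gradient` a hypothetical helper implementing definition (4.2) by central differences.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Column of partial derivatives of the scalar f, as in (4.2), by central differences."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

# Arbitrary illustrative data; A is deliberately not symmetric.
A = np.array([[1.0, 2.0, 0.0],
              [-1.0, 3.0, 4.0],
              [0.5, 0.0, 2.0]])
x = np.array([1.0, -2.0, 0.5])

grad_numeric = numerical_gradient(lambda v: v @ A @ v, x)
grad_formula = A @ x + A.T @ x
print(np.allclose(grad_numeric, grad_formula, atol=1e-6))
```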
The following table summarises a number of vector derivative formulae.

$y$ (scalar or a vector) | $\partial y / \partial x$ (4.4)
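4.3 THE CHAIN RULE FOR VECTORS

Let

\[
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},\quad
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_r \end{bmatrix}
\quad\text{and}\quad
z = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_m \end{bmatrix}.
\]

Using the definition (4.1), we can write

\[
\left(\frac{\partial z}{\partial x}\right)' = \begin{bmatrix}
\dfrac{\partial z_1}{\partial x_1} & \dfrac{\partial z_1}{\partial x_2} & \cdots & \dfrac{\partial z_1}{\partial x_n} \\
\dfrac{\partial z_2}{\partial x_1} & \dfrac{\partial z_2}{\partial x_2} & \cdots & \dfrac{\partial z_2}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial z_m}{\partial x_1} & \dfrac{\partial z_m}{\partial x_2} & \cdots & \dfrac{\partial z_m}{\partial x_n}
\end{bmatrix}.
\tag{4.5}
\]

Assume that

\[ z = z(y) \quad\text{and}\quad y = y(x) \]

so that

\[
\frac{\partial z_i}{\partial x_j} = \sum_{q=1}^{r} \frac{\partial z_i}{\partial y_q} \frac{\partial y_q}{\partial x_j},
\qquad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n.
\]

Then (4.5) becomes

\[
\begin{aligned}
\left(\frac{\partial z}{\partial x}\right)'
&= \left[ \sum_{q} \frac{\partial z_i}{\partial y_q} \frac{\partial y_q}{\partial x_j} \right] \\
&= \begin{bmatrix}
\dfrac{\partial z_1}{\partial y_1} & \cdots & \dfrac{\partial z_1}{\partial y_r} \\
\vdots & & \vdots \\
\dfrac{\partial z_m}{\partial y_1} & \cdots & \dfrac{\partial z_m}{\partial y_r}
\end{bmatrix}
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\vdots & & \vdots \\
\dfrac{\partial y_r}{\partial x_1} & \cdots & \dfrac{\partial y_r}{\partial x_n}
\end{bmatrix} \\
&= \left(\frac{\partial z}{\partial y}\right)' \left(\frac{\partial y}{\partial x}\right)' \qquad \text{(by (4.1))} \\
&= \left(\frac{\partial y}{\partial x} \frac{\partial z}{\partial y}\right)'.
\end{aligned}
\]

On transposing both sides, we finally obtain

\[ \frac{\partial z}{\partial x} = \frac{\partial y}{\partial x} \frac{\partial z}{\partial y}. \tag{4.6} \]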
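4.4 THE DERIVATIVE OF SCALAR FUNCTIONS OF A MATRIX WITH RESPECT TO THE MATRIX

Let $X = [x_{ij}]$ be a matrix of order $(m \times n)$ and let

\[ y = f(X) \]

be a scalar function of $X$.

The derivative of $y$ with respect to $X$, denoted by

\[ \frac{\partial y}{\partial X}, \]

is defined as the following matrix of order $(m \times n)$

\[
\frac{\partial y}{\partial X} = \begin{bmatrix}
\dfrac{\partial y}{\partial x_{11}} & \dfrac{\partial y}{\partial x_{12}} & \cdots & \dfrac{\partial y}{\partial x_{1n}} \\
\dfrac{\partial y}{\partial x_{21}} & \dfrac{\partial y}{\partial x_{22}} & \cdots & \dfrac{\partial y}{\partial x_{2n}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial y}{\partial x_{m1}} & \dfrac{\partial y}{\partial x_{m2}} & \cdots & \dfrac{\partial y}{\partial x_{mn}}
\end{bmatrix}
= \left[ \frac{\partial y}{\partial x_{ij}} \right]
= \sum_{i,j} E_{ij} \frac{\partial y}{\partial x_{ij}}
\tag{4.7}
\]

where $E_{ij}$ is an elementary matrix of order $(m \times n)$.

Definition
When $X = [x_{ij}]$ is a matrix of order $(m \times n)$ and $y = f(X)$ is a scalar function of $X$, then $\partial f(X) / \partial X$ is known as a gradient matrix.

Example 4.3

Given the matrix $X = [x_{ij}]$ of order $(n \times n)$ obtain $\partial y / \partial X$ when $y = \operatorname{tr} X$.

Solution
$y = \operatorname{tr} X = x_{11} + x_{22} + \cdots + x_{nn} = \operatorname{tr} X'$ (see (1.33)), hence by (4.7)

\[ \frac{\partial y}{\partial X} = I_n. \]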
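An important family of derivatives with respect to a matrix involves functions of the determinant of a matrix, for example

\[ y = |X| \quad\text{or}\quad y = |AX|. \]

We will consider a general case, say we have a matrix $Y = [y_{ij}]$ whose components are functions of a matrix $X = [x_{ij}]$, that is

\[ y_{ij} = f_{ij}(x) \]

where $x = [x_{11}\; x_{12}\; \ldots\; x_{mn}]'$.

We will determine

\[ \frac{\partial |Y|}{\partial x_{rs}} \]

which will allow us to build up the matrix

\[ \frac{\partial |Y|}{\partial X}. \]

Using the chain rule we can write

\[ \frac{\partial |Y|}{\partial x_{rs}} = \sum_i \sum_j \frac{\partial |Y|}{\partial y_{ij}} \frac{\partial y_{ij}}{\partial x_{rs}}. \]

But $|Y| = \sum_j y_{ij} Y_{ij}$, where $Y_{ij}$ is the cofactor of the element $y_{ij}$ in $|Y|$. Since the cofactors $Y_{i1}, Y_{i2}, \ldots$ are independent of the element $y_{ij}$, we have

\[ \frac{\partial |Y|}{\partial y_{ij}} = Y_{ij}. \]

It follows that

\[ \frac{\partial |Y|}{\partial x_{rs}} = \sum_i \sum_j Y_{ij} \frac{\partial y_{ij}}{\partial x_{rs}}. \tag{4.8} \]

Although we have achieved our objective in determining the above formula, it can be written in an alternate and useful form.

With

\[ a_{ij} = Y_{ij} \quad\text{and}\quad b_{ij} = \frac{\partial y_{ij}}{\partial x_{rs}} \]

we can write (4.8) as

\[
\begin{aligned}
\frac{\partial |Y|}{\partial x_{rs}} &= \sum_i \sum_j a_{ij} b_{ij} = \sum_i \sum_j a_{ij} e_j' e_j b_{ij} \\
&= \sum_i A_{i\cdot}' B_{i\cdot} && \text{(by (1.23) and (1.24))} \\
&= \operatorname{tr}(AB') = \operatorname{tr}(B'A) && \text{(by (1.43))}
\end{aligned}
\]

where $A = [a_{ij}]$ and $B = [b_{ij}]$.

Assuming that $Y$ is of order $(k \times k)$ let

\[
\begin{bmatrix}
Y_{11} & Y_{12} & \cdots & Y_{1k} \\
Y_{21} & Y_{22} & \cdots & Y_{2k} \\
\vdots & \vdots & & \vdots \\
Y_{k1} & Y_{k2} & \cdots & Y_{kk}
\end{bmatrix} = Z
\tag{4.9}
\]

and since

\[ \left[ \frac{\partial y_{ij}}{\partial x_{rs}} \right] = \frac{\partial Y}{\partial x_{rs}} \]

we can write

\[ \frac{\partial |Y|}{\partial x_{rs}} = \operatorname{tr}\left( \frac{\partial Y'}{\partial x_{rs}} Z \right). \tag{4.10} \]

We use (4.10) to evaluate $\partial |Y| / \partial x_{11}$, $\partial |Y| / \partial x_{12}$, $\ldots$, $\partial |Y| / \partial x_{mn}$ and then use (4.7) to construct

\[ \frac{\partial |Y|}{\partial X}. \]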
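Example 4.4

Given the matrix $X = [x_{ij}]$ of order $(2 \times 2)$ evaluate $\partial |X| / \partial X$,

(i) when all the components $x_{ij}$ of $X$ are independent
(ii) when $X$ is a symmetric matrix.

Solution
(i) In the notation of (4.10), we have

\[ Y = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} \]

so that $\partial Y / \partial x_{rs} = E_{rs}$ (for notation see (1.4)).

As

\[ Z = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix} \]

we use the result of Example 1.4 to write (4.10) as

\[ \frac{\partial |Y|}{\partial x_{rs}} = (\operatorname{vec} E_{rs})' \operatorname{vec} Z. \]

So that, for example

\[
\frac{\partial |Y|}{\partial x_{11}} = [1 \;\; 0 \;\; 0 \;\; 0] \begin{bmatrix} X_{11} \\ X_{21} \\ X_{12} \\ X_{22} \end{bmatrix} = X_{11}
\]

and

\[
\frac{\partial |Y|}{\partial x_{12}} = [0 \;\; 0 \;\; 1 \;\; 0] \begin{bmatrix} X_{11} \\ X_{21} \\ X_{12} \\ X_{22} \end{bmatrix} = X_{12} \quad\text{and so on.}
\]

Hence

\[
\frac{\partial |Y|}{\partial X} = \frac{\partial |X|}{\partial X} = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix}
= |X| (X^{-1})' \quad\text{(see [18] p. 124).}
\]

(ii) This time

\[ Y = \begin{bmatrix} x_{11} & x_{12} \\ x_{12} & x_{22} \end{bmatrix} \]

hence

\[ \frac{\partial Y}{\partial x_{11}} = E_{11}, \quad \frac{\partial Y}{\partial x_{12}} = E_{12} + E_{21} \quad\text{and so on.} \]

(See the introduction to Chapter 4 for explanation of the notation.)

It follows that

\[
\frac{\partial |Y|}{\partial x_{12}} = \frac{\partial |Y|}{\partial x_{21}} = [0 \;\; 1 \;\; 1 \;\; 0] \begin{bmatrix} X_{11} \\ X_{21} \\ X_{12} \\ X_{22} \end{bmatrix} = X_{21} + X_{12} = 2X_{12}
\quad (\text{since } X_{12} = X_{21})
\]

hence

\[
\frac{\partial |Y|}{\partial X} = \begin{bmatrix} X_{11} & 2X_{12} \\ 2X_{21} & X_{22} \end{bmatrix}
= 2 \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix} - \begin{bmatrix} X_{11} & 0 \\ 0 & X_{22} \end{bmatrix}.
\]

The above results can be generalised to a matrix $X$ of order $(n \times n)$. We obtain, in the symmetric matrix case

\[ \frac{\partial |X|}{\partial X} = 2[X_{ij}] - \operatorname{diag}\{X_{ii}\}. \]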
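Both determinant results can be checked by finite differences. The sketch below is illustrative only: the $(3 \times 3)$ symmetric matrix is an assumed example, `det_gradient` is a hypothetical helper, and in the symmetric case the elements $x_{rs}$ and $x_{sr}$ are perturbed together, which is what differentiating with respect to a shared element means here.

```python
import numpy as np

def elementary(r, s, n):
    """Square elementary matrix E_rs of order n."""
    E = np.zeros((n, n))
    E[r, s] = 1.0
    return E

def det_gradient(X, symmetric=False, h=1e-4):
    """Matrix of d|X|/dx_rs by central differences.

    With symmetric=True, x_rs and x_sr are treated as one variable.
    """
    n = X.shape[0]
    G = np.zeros((n, n))
    for r in range(n):
        for s in range(n):
            D = elementary(r, s, n)
            if symmetric and r != s:
                D = D + elementary(s, r, n)
            G[r, s] = (np.linalg.det(X + h * D) - np.linalg.det(X - h * D)) / (2 * h)
    return G

# Assumed illustrative symmetric matrix.
X = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, -1.0],
              [0.0, -1.0, 4.0]])

cofactors = np.linalg.det(X) * np.linalg.inv(X).T     # the matrix [X_ij]

grad_independent = det_gradient(X)
grad_symmetric = det_gradient(X, symmetric=True)

print(np.allclose(grad_independent, cofactors, atol=1e-6))
print(np.allclose(grad_symmetric, 2 * cofactors - np.diag(np.diag(cofactors)), atol=1e-6))
```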
We defer the discussion of differentiating other scalar matrix functions to Chapter 5.
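4.5 THE DERIVATIVE OF A MATRIX WITH RESPECT TO ONE OF ITS ELEMENTS AND CONVERSELY

In this section we will generalise the concepts discussed in the previous section. We again consider a matrix

\[ X = [x_{ij}] \quad\text{of order } (m \times n). \]

The derivative of the matrix $X$ relative to one of its elements $x_{rs}$ (say), is obviously (see (3.1))

\[ \frac{\partial X}{\partial x_{rs}} = E_{rs} \tag{4.11} \]

where $E_{rs}$ is the elementary matrix of order $(m \times n)$ (the order of $X$) defined in section 1.2.

It follows immediately that

\[ \frac{\partial X'}{\partial x_{rs}} = E_{rs}'. \tag{4.12} \]

A more complicated situation arises when we consider a product of the form

\[ Y = AXB \tag{4.13} \]

where

$X = [x_{ij}]$ is of order $(m \times n)$
$A = [a_{ij}]$ is of order $(l \times m)$
$B = [b_{ij}]$ is of order $(n \times q)$

and

$Y = [y_{ij}]$ is of order $(l \times q)$.

$A$ and $B$ are assumed independent of $X$.

Our aim is to find the rule for obtaining the derivatives

\[ \frac{\partial Y}{\partial x_{rs}} \quad\text{and}\quad \frac{\partial y_{ij}}{\partial X} \]

where $x_{rs}$ is a typical element of $X$ and $y_{ij}$ is a typical element of $Y$.

We will first obtain the $(i,j)$th element $y_{ij}$ in (4.13) as a function of the elements of $X$.

We can achieve this objective in a number of different ways. For example, we can use (2.13) to write

\[ \operatorname{vec} Y = (B' \otimes A) \operatorname{vec} X. \]

From this expression we see that $y_{ij}$ is the (scalar) product of the $i$th row of

\[ [b_{1j}A \;\vdots\; b_{2j}A \;\vdots\; \cdots \;\vdots\; b_{nj}A] \]

and $\operatorname{vec} X$, so that

\[ y_{ij} = \sum_{p=1}^{n} \sum_{k=1}^{m} a_{ik} b_{pj} x_{kp}. \tag{4.14} \]

From (4.14) we immediately obtain

\[ \frac{\partial y_{ij}}{\partial x_{rs}} = a_{ir} b_{sj}. \tag{4.15} \]

We can now write the expression for $\partial y_{ij} / \partial X$,

\[
\frac{\partial y_{ij}}{\partial X} = \begin{bmatrix}
\dfrac{\partial y_{ij}}{\partial x_{11}} & \dfrac{\partial y_{ij}}{\partial x_{12}} & \cdots & \dfrac{\partial y_{ij}}{\partial x_{1n}} \\
\dfrac{\partial y_{ij}}{\partial x_{21}} & \dfrac{\partial y_{ij}}{\partial x_{22}} & \cdots & \dfrac{\partial y_{ij}}{\partial x_{2n}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial y_{ij}}{\partial x_{m1}} & \dfrac{\partial y_{ij}}{\partial x_{m2}} & \cdots & \dfrac{\partial y_{ij}}{\partial x_{mn}}
\end{bmatrix}.
\tag{4.16}
\]

Using (4.15), we obtain

\[
\frac{\partial y_{ij}}{\partial X} = \begin{bmatrix}
a_{i1}b_{1j} & a_{i1}b_{2j} & \cdots & a_{i1}b_{nj} \\
a_{i2}b_{1j} & a_{i2}b_{2j} & \cdots & a_{i2}b_{nj} \\
\vdots & \vdots & & \vdots \\
a_{im}b_{1j} & a_{im}b_{2j} & \cdots & a_{im}b_{nj}
\end{bmatrix}.
\tag{4.17}
\]

We note that the matrix on the right hand side of (4.17) can be expressed as (for notation see (1.5), (1.13), (1.16) and (1.17))

\[
\begin{bmatrix} a_{i1} \\ a_{i2} \\ \vdots \\ a_{im} \end{bmatrix}
[b_{1j} \;\; b_{2j} \;\; \cdots \;\; b_{nj}]
= A_{i\cdot} B_{\cdot j}' = A' e_i e_j' B'.
\]

So that

\[ \frac{\partial y_{ij}}{\partial X} = A' E_{ij} B' \tag{4.18} \]

where $E_{ij}$ is an elementary matrix of order $(l \times q)$, the order of the matrix $Y$.

We also use (4.14) to obtain an expression for $\partial Y / \partial x_{rs}$

\[ \frac{\partial Y}{\partial x_{rs}} = \left[ \frac{\partial y_{ij}}{\partial x_{rs}} \right] \quad (r, s \text{ fixed};\; i, j \text{ variable},\; 1 \le i \le l,\; 1 \le j \le q) \]

that is

\[
\frac{\partial Y}{\partial x_{rs}} = \begin{bmatrix}
\dfrac{\partial y_{11}}{\partial x_{rs}} & \dfrac{\partial y_{12}}{\partial x_{rs}} & \cdots & \dfrac{\partial y_{1q}}{\partial x_{rs}} \\
\dfrac{\partial y_{21}}{\partial x_{rs}} & \dfrac{\partial y_{22}}{\partial x_{rs}} & \cdots & \dfrac{\partial y_{2q}}{\partial x_{rs}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial y_{l1}}{\partial x_{rs}} & \dfrac{\partial y_{l2}}{\partial x_{rs}} & \cdots & \dfrac{\partial y_{lq}}{\partial x_{rs}}
\end{bmatrix}
= \sum_{i,j} E_{ij} \frac{\partial y_{ij}}{\partial x_{rs}}
\tag{4.19}
\]

where $E_{ij}$ is an elementary matrix of order $(l \times q)$.

We again use (4.15) to write

\[
\frac{\partial Y}{\partial x_{rs}} = \begin{bmatrix}
a_{1r}b_{s1} & a_{1r}b_{s2} & \cdots & a_{1r}b_{sq} \\
a_{2r}b_{s1} & a_{2r}b_{s2} & \cdots & a_{2r}b_{sq} \\
\vdots & \vdots & & \vdots \\
a_{lr}b_{s1} & a_{lr}b_{s2} & \cdots & a_{lr}b_{sq}
\end{bmatrix}
= \begin{bmatrix} a_{1r} \\ a_{2r} \\ \vdots \\ a_{lr} \end{bmatrix}
[b_{s1} \;\; b_{s2} \;\; \cdots \;\; b_{sq}]
= A_{\cdot r} B_{s\cdot}' = A e_r e_s' B.
\]

So that

\[ \frac{\partial (AXB)}{\partial x_{rs}} = A E_{rs} B \tag{4.20} \]

where $E_{rs}$ is an elementary matrix of order $(m \times n)$, the order of the matrix $X$.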
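The two formulae (4.18) and (4.20) are two views of the same array of numbers $a_{ir}b_{sj}$, which the following sketch makes explicit. The matrices are arbitrary illustrative values and `elementary` is a hypothetical helper (0-based indices).

```python
import numpy as np

def elementary(r, s, rows, cols):
    """Elementary matrix E_rs: 1 in position (r, s), zeros elsewhere (0-based)."""
    E = np.zeros((rows, cols))
    E[r, s] = 1.0
    return E

# Illustrative sizes: A (l x m), X (m x n), B (n x q).
l, m, n, q = 2, 3, 4, 2
A = np.arange(1.0, 1.0 + l * m).reshape(l, m)
B = np.arange(-3.0, -3.0 + n * q).reshape(n, q)
X = np.ones((m, n))

ok_420 = True   # dY/dx_rs = A E_rs B
ok_418 = True   # dy_ij/dX = A' E_ij B'
for r in range(m):
    for s in range(n):
        dY = A @ elementary(r, s, m, n) @ B
        # Y = AXB is linear in X, so a unit step in x_rs gives the derivative exactly.
        step = A @ (X + elementary(r, s, m, n)) @ B - A @ X @ B
        ok_420 = ok_420 and np.allclose(dY, step)
        for i in range(l):
            for j in range(q):
                dyij_dX = A.T @ elementary(i, j, l, q) @ B.T
                # The (r, s) element of dy_ij/dX equals the (i, j) element of dY/dx_rs.
                ok_418 = ok_418 and np.isclose(dyij_dX[r, s], dY[i, j])
print(ok_420, ok_418)
```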
Example 4.5
Find the derivative $\partial Y / \partial x_{rs}$, given

\[ Y = AX'B \]

where the order of the matrices $A$, $X$ and $B$ is such that the product on the right hand side is defined.

Solution

By the method used above to obtain the derivative $\partial (AXB) / \partial x_{rs}$, we find

\[ \frac{\partial (AX'B)}{\partial x_{rs}} = A E_{rs}' B. \]

Before continuing with further examples we need a rule for determining the derivative of a product of matrices.

Consider

\[ Y = UV \tag{4.21} \]

where $U = [u_{ij}]$ is of order $(m \times n)$ and $V = [v_{ij}]$ is of order $(n \times l)$ and both $U$ and $V$ are functions of a matrix $X$.

We wish to determine

\[ \frac{\partial Y}{\partial x_{rs}} \quad\text{and}\quad \frac{\partial y_{ij}}{\partial X}. \]

The $(i,j)$th element of (4.21) is

\[ y_{ij} = \sum_{p=1}^{n} u_{ip} v_{pj} \tag{4.22} \]

hence

\[ \frac{\partial y_{ij}}{\partial x_{rs}} = \sum_{p=1}^{n} \frac{\partial u_{ip}}{\partial x_{rs}} v_{pj} + \sum_{p=1}^{n} u_{ip} \frac{\partial v_{pj}}{\partial x_{rs}}. \tag{4.23} \]

For fixed $r$ and $s$, (4.23) is the $(i,j)$th element of the matrix $\partial Y / \partial x_{rs}$ of order $(m \times l)$, the same as the order of the matrix $Y$.

On comparing both the terms on the right hand side of (4.23) with (4.22), we can write

\[ \frac{\partial (UV)}{\partial x_{rs}} = \frac{\partial U}{\partial x_{rs}} V + U \frac{\partial V}{\partial x_{rs}} \tag{4.24} \]

as one would expect.

On the other hand, when fixing $(i,j)$, (4.23) is the $(r,s)$th element of the matrix $\partial y_{ij} / \partial X$, which is of the same order as the matrix $X$, that is

\[ \frac{\partial y_{ij}}{\partial X} = \sum_{p=1}^{n} \frac{\partial u_{ip}}{\partial X} v_{pj} + \sum_{p=1}^{n} u_{ip} \frac{\partial v_{pj}}{\partial X}. \tag{4.25} \]

We will make use of the result (4.24) in some of the subsequent examples.
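Example 4.6
Let $X = [x_{rs}]$ be a non-singular matrix. Find the derivative $\partial Y / \partial x_{rs}$, given

(i) $Y = AX^{-1}B$, and
(ii) $Y = X'AX$.

Solution
(i) Using (4.24) to differentiate

\[ YY^{-1} = I, \]

we obtain

\[ \frac{\partial Y}{\partial x_{rs}} Y^{-1} + Y \frac{\partial Y^{-1}}{\partial x_{rs}} = 0, \]

hence

\[ \frac{\partial Y}{\partial x_{rs}} = -Y \frac{\partial Y^{-1}}{\partial x_{rs}} Y. \]

But by (4.20)

\[ \frac{\partial Y^{-1}}{\partial x_{rs}} = \frac{\partial}{\partial x_{rs}} (B^{-1}XA^{-1}) = B^{-1} E_{rs} A^{-1} \]

so that

\[
\frac{\partial Y}{\partial x_{rs}} = \frac{\partial}{\partial x_{rs}} (AX^{-1}B) = -AX^{-1}B\,B^{-1}E_{rs}A^{-1}\,AX^{-1}B
= -AX^{-1}E_{rs}X^{-1}B.
\]

(ii) Using (4.24), we obtain

\[
\frac{\partial Y}{\partial x_{rs}} = \frac{\partial X'}{\partial x_{rs}} AX + X' \frac{\partial (AX)}{\partial x_{rs}}
= E_{rs}'AX + X'AE_{rs} \quad\text{(by (4.12) and (4.20)).}
\]

Both (4.18) and (4.20) were derived from (4.15) which is valid for all $i, j$ and $r, s$, defined by the orders of the matrices involved.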
The First Transformation Principle
It follows that (4.18) is a transformation of (4.20) and conversely. To obtain (4.18) from (4.20) we replace $A$ by $A'$, $B$ by $B'$ and $E_{rs}$ by $E_{ij}$ (careful, $E_{rs}$ and $E_{ij}$ may be of different orders).

The point is that although (4.18) and (4.20) were derived for constant matrices $A$ and $B$, the above transformation is independent of the status of the matrices and is valid even when $A$ and $B$ are functions of $X$.
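Example 4.7
Find the derivative $\partial y_{ij} / \partial X$, given

(i) $Y = AX'B$,
(ii) $Y = AX^{-1}B$, and
(iii) $Y = X'AX$

where $X = [x_{ij}]$ is a nonsingular matrix.

Solution
(i) Let $W = X'$, then

\[ Y = AWB \quad\text{so that by (4.20)}\quad \frac{\partial Y}{\partial w_{rs}} = AE_{rs}B \]

hence

\[ \frac{\partial y_{ij}}{\partial W} = A'E_{ij}B'. \]

But

\[ \frac{\partial y_{ij}}{\partial X} = \frac{\partial y_{ij}}{\partial W'} = \left( \frac{\partial y_{ij}}{\partial W} \right)' \]

hence

\[ \frac{\partial y_{ij}}{\partial X} = BE_{ij}'A. \]

(ii) From Example 4.6(i)

\[ \frac{\partial Y}{\partial x_{rs}} = -AX^{-1}E_{rs}X^{-1}B. \]

Let $A_1 = AX^{-1}$ and $B_1 = X^{-1}B$, then

\[ \frac{\partial Y}{\partial x_{rs}} = -A_1 E_{rs} B_1 \]

so that

\[ \frac{\partial y_{ij}}{\partial X} = -A_1'E_{ij}B_1' = -(X^{-1})'A'E_{ij}B'(X^{-1})'. \]

(iii) From Example 4.6(ii)

\[ \frac{\partial Y}{\partial x_{rs}} = E_{rs}'AX + X'AE_{rs}. \]

Let $A_1 = I$, $B_1 = AX$, $A_2 = X'A$ and $B_2 = I$, then

\[ \frac{\partial Y}{\partial x_{rs}} = A_1 E_{rs}' B_1 + A_2 E_{rs} B_2. \]

The second term on the right hand side is in standard form. The first term is in the form of the solution to Example 4.5 for which the derivative $\partial y_{ij} / \partial X$ was found in (i) above, hence

\[
\frac{\partial y_{ij}}{\partial X} = B_1 E_{ij}' A_1 + A_2' E_{ij} B_2'
= AXE_{ij}' + A'XE_{ij}.
\]

It is interesting to compare this last result with the example in section 4.2 when we considered the scalar $y = x'Ax$.

In this special case when the matrix $X$ has only one column, the elementary matrix which is of the same order as $Y$, becomes

\[ E_{ij} = E_{ij}' = 1. \]

Hence

\[ \frac{\partial y_{ij}}{\partial X} = \frac{\partial y}{\partial x} = Ax + A'x \]

which is the result obtained in section 4.2 (see (4.4)).

Conversely using the above techniques we can also obtain the derivatives of the matrix equivalents of the other equations in the table (4.4).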
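Example 4.8
Find

\[ \frac{\partial Y}{\partial x_{rs}} \quad\text{and}\quad \frac{\partial y_{ij}}{\partial X} \]

when

(i) $Y = AX$, and
(ii) $Y = X'X$.

Solution
(i) With $B = I$, apply (4.20)

\[ \frac{\partial Y}{\partial x_{rs}} = AE_{rs}. \]

The transformation principle results in

\[ \frac{\partial y_{ij}}{\partial X} = A'E_{ij}. \]

(ii) This is a special case of Example 4.6(ii) in which $A = I$. We have found the solution

\[ \frac{\partial Y}{\partial x_{rs}} = E_{rs}'X + X'E_{rs} \]

and (solution to Example 4.7(iii))

\[ \frac{\partial y_{ij}}{\partial X} = XE_{ij}' + XE_{ij}. \]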
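4.6 THE DERIVATIVES OF THE POWERS OF A MATRIX

Our aim in this section is to obtain the rules for determining

\[ \frac{\partial Y}{\partial x_{rs}} \quad\text{and}\quad \frac{\partial y_{ij}}{\partial X} \]

when

\[ Y = X^n. \]

Using (4.24) when $U = V = X$ so that

\[ Y = X^2 \]

we immediately obtain

\[ \frac{\partial Y}{\partial x_{rs}} = E_{rs}X + XE_{rs} \]

and, applying the first transformation principle,

\[ \frac{\partial y_{ij}}{\partial X} = E_{ij}X' + X'E_{ij}. \]

It is instructive to repeat this exercise with

\[ U = X^2 \quad\text{and}\quad V = X \]

so that

\[ Y = X^3. \]

We obtain

\[ \frac{\partial Y}{\partial x_{rs}} = E_{rs}X^2 + XE_{rs}X + X^2E_{rs} \]

and

\[ \frac{\partial y_{ij}}{\partial X} = E_{ij}(X')^2 + X'E_{ij}X' + (X')^2E_{ij}. \]

More generally, it can be proved by induction, that for

\[ Y = X^n \]

\[ \frac{\partial Y}{\partial x_{rs}} = \sum_{k=0}^{n-1} X^k E_{rs} X^{n-k-1} \tag{4.26} \]

where by definition $X^0 = I$, and

\[ \frac{\partial y_{ij}}{\partial X} = \sum_{k=0}^{n-1} (X')^k E_{ij} (X')^{n-k-1}. \tag{4.27} \]

Example 4.9
Using the result (4.26), obtain $\partial Y / \partial x_{rs}$ when

\[ Y = X^{-n}. \]

Solution
Using (4.24) on both sides of

\[ X^{-n}X^n = I \]

we find

\[ \frac{\partial (X^{-n})}{\partial x_{rs}} X^n + X^{-n} \frac{\partial (X^n)}{\partial x_{rs}} = 0 \]

so that

\[ \frac{\partial (X^{-n})}{\partial x_{rs}} = -X^{-n} \frac{\partial (X^n)}{\partial x_{rs}} X^{-n}. \]

Now making use of (4.26), we conclude that

\[ \frac{\partial (X^{-n})}{\partial x_{rs}} = -X^{-n} \left[ \sum_{k=0}^{n-1} X^k E_{rs} X^{n-k-1} \right] X^{-n}. \]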
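Formula (4.26) and the result of Example 4.9 can be compared with central differences. In the sketch below the matrix `X`, the power `n` and the element `(r, s)` are arbitrary illustrative choices, and `power_derivative` and `central_difference` are hypothetical helper names.

```python
import numpy as np
from numpy.linalg import matrix_power

def elementary(r, s, n):
    """Square elementary matrix E_rs of order n (0-based indices)."""
    E = np.zeros((n, n))
    E[r, s] = 1.0
    return E

def power_derivative(X, n, r, s):
    """Formula (4.26): sum over k = 0..n-1 of X^k E_rs X^(n-k-1)."""
    E = elementary(r, s, X.shape[0])
    return sum(matrix_power(X, k) @ E @ matrix_power(X, n - k - 1) for k in range(n))

def central_difference(f, X, r, s, h=1e-5):
    """Numerical derivative of the matrix function f with respect to x_rs."""
    E = elementary(r, s, X.shape[0])
    return (f(X + h * E) - f(X - h * E)) / (2 * h)

X = np.array([[2.0, 1.0], [0.5, 3.0]])   # assumed nonsingular example
n, r, s = 3, 0, 1

exact = power_derivative(X, n, r, s)
approx = central_difference(lambda M: matrix_power(M, n), X, r, s)

# Example 4.9: derivative of X^(-n) is -X^(-n) [ (4.26) ] X^(-n).
X_inv_n = matrix_power(X, -n)
exact_inv = -X_inv_n @ exact @ X_inv_n
approx_inv = central_difference(lambda M: matrix_power(M, -n), X, r, s)

print(np.allclose(exact, approx, atol=1e-6), np.allclose(exact_inv, approx_inv, atol=1e-6))
```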
Problems for Chapter 4
(1) Given -x= xtl x12 x3
x21 x22x233]
XkErsXn-k-1
Y = 1x-1
2x2 sin x
and y = 2x11x22 -x21x13, calculate
[Ch. 4
(4.26)
(4.27)
ay andBY
ax ax
(2) Given
Xsinx X
cos x czand X =
evaluatealxlax
by(a) a direct method(b) use of a derivative formula.
(3) Given
X =X11 x12 X13
and Y = X'X,Lx 21 x22 X231
use a direct method to evaluate
(a)D Y
and (b) aY i3
ax-21 ax
Fsinx
L'
ex
XI
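(4) Obtain expressions for

\[ \frac{\partial Y}{\partial x_{rs}} \quad\text{and}\quad \frac{\partial y_{ij}}{\partial X} \]

when

(a) $Y = XAX$ and (b) $Y = XAX'$.

(5) Obtain an expression for $\partial |AXB| / \partial x_{rs}$. It is assumed $AXB$ is non-singular.

(6) Evaluate $\partial Y / \partial x_{rs}$ when

(a) $Y = X(X')^2$ and (b) $Y = (X')^2 X$.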
CHAPTER 5
Further Development of MatrixCalculus including an Applicationof Kronecker Products
5.1 INTRODUCTION
In Chapter 4 we discussed rules for determining the derivatives of a vector andthen the derivatives of a matrix.
But it will be remembered that when Y is a matrix, then vec Y is a vector.This fact, together with the closely related Kronecker product techniquesdiscussed in Chapter 2 will now be exploited to derive some interesting results.
Also we explore further the derivatives of some scalar functions with respectto a matrix first considered in the previous chapter.
5.2 DERIVATIVES OF MATRICES AND KRONECKER PRODUCTS
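In the previous chapter we have found $\partial y_{ij}/\partial X$ when
$$Y = AXB \tag{5.1}$$
where $Y = [y_{ij}]$, $A = [a_{ij}]$, $X = [x_{ij}]$ and $B = [b_{ij}]$.
We now obtain $(\partial \operatorname{vec} Y)/(\partial \operatorname{vec} X)$ for (5.1). We can write (5.1) as
$$y = Px \tag{5.2}$$
where $y = \operatorname{vec} Y$, $x = \operatorname{vec} X$ and $P = B' \otimes A$.
By (4.1), (4.4) and (2.10)
$$\frac{\partial y}{\partial x} = P' = (B' \otimes A)' = B \otimes A'. \tag{5.3}$$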
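The identity behind (5.2) and (5.3) is easy to confirm numerically. This is a small sketch assuming NumPy; `vec` is our own helper (column stacking), and the matrix sizes are arbitrary.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

P = np.kron(B.T, A)                      # P = B' (x) A of (5.2)
vec_identity_ok = np.allclose(vec(A @ X @ B), P @ vec(X))

# With the convention of (4.1) the derivative is the transpose of P, as in (5.3).
dy_dx = P.T
transpose_rule_ok = np.allclose(dy_dx, np.kron(B, A.T))
print(vec_identity_ok, transpose_rule_ok)
```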
The corresponding result for the equation
$$Y = AX'B \tag{5.4}$$
is not so simple.
The problem is that when we write (5.4) in the form of (5.2), we have this time
$$y = Pz \tag{5.5}$$
where $z = \operatorname{vec} X'$.
We can find (see (2.25)) a permutation matrix $U$ such that
$$\operatorname{vec} X' = U \operatorname{vec} X \tag{5.6}$$
in which case (5.5) becomes
$$y = PUx$$
so that
$$\frac{\partial y}{\partial x} = (PU)' = U'(B \otimes A'). \tag{5.7}$$
It is convenient to write
$$U'(B \otimes A') = (B \otimes A')_{(n)}. \tag{5.8}$$
$U'$ is seen to premultiply the matrix $(B \otimes A')$. Its effect is therefore to rearrange the rows of $(B \otimes A')$.
In fact the first and every subsequent $n$th row of $(B \otimes A')$ form the first $m$ consecutive rows of $(B \otimes A')_{(n)}$. The second and every subsequent $n$th row form the next $m$ consecutive rows of $(B \otimes A')_{(n)}$, and so on.
A special case of this notation is for $n = 1$; then
$$(B \otimes A')_{(1)} = B \otimes A'. \tag{5.9}$$
Now, returning to (5.5), we obtain, by comparison with (5.3),
$$\frac{\partial y}{\partial x} = (B \otimes A')_{(n)}. \tag{5.10}$$
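The permutation matrix of (5.6) and the row rearrangement described for (5.8) can be made concrete. The sketch below assumes NumPy; `commutation` is a hypothetical helper name, and the sizes of A and B are arbitrary.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, order="F")

def commutation(m, n):
    """Permutation matrix U with U @ vec(X) == vec(X.T) for any (m x n) X."""
    U = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # x_ij sits at position j*m + i in vec(X) and at i*n + j in vec(X').
            U[i * n + j, j * m + i] = 1.0
    return U

m, n = 2, 3
rng = np.random.default_rng(2)
X = rng.standard_normal((m, n))
A = rng.standard_normal((4, n))      # multiplies X' (n x m) from the left
B = rng.standard_normal((m, 5))      # multiplies X' from the right
U = commutation(m, n)

perm_ok = np.allclose(U @ vec(X), vec(X.T))           # (5.6)

M = np.kron(B, A.T)
deriv = U.T @ M                                       # (5.7): U'(B (x) A')
linear_ok = np.allclose(deriv.T @ vec(X), vec(A @ X.T @ B))

# (5.8): rows 0, n, 2n, ... of M come first, then rows 1, n+1, ... and so on.
order = [i * n + j for j in range(n) for i in range(m)]
rows_ok = np.allclose(deriv, M[order, :])
print(perm_ok, linear_ok, rows_ok)
```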
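Example 5.1
Obtain $(\partial \operatorname{vec} Y)/(\partial \operatorname{vec} X)$, given $X = [x_{ij}]$ of order $(m \times n)$, when
(i) $Y = AX$, (ii) $Y = XA$, (iii) $Y = AX'$ and (iv) $Y = X'A$.

Solution
Let $y = \operatorname{vec} Y$ and $x = \operatorname{vec} X$.
(i) Use (5.3) with $B = I$:
$$\frac{\partial y}{\partial x} = I \otimes A'.$$
(ii) Use (5.3):
$$\frac{\partial y}{\partial x} = A \otimes I.$$
(iii) Use (5.10):
$$\frac{\partial y}{\partial x} = (I \otimes A')_{(n)}.$$
(iv) Use (5.10):
$$\frac{\partial y}{\partial x} = (A \otimes I)_{(n)}.$$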
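5.3 THE DETERMINATION OF $(\partial \operatorname{vec} X)/(\partial \operatorname{vec} Y)$ FOR MORE COMPLICATED EQUATIONS
In this section we wish to determine the derivative $(\partial \operatorname{vec} Y)/(\partial \operatorname{vec} X)$ when, for example,
$$Y = X'AX \tag{5.11}$$
where $X$ is of order $(m \times n)$.
Since $Y$ is a matrix of order $(n \times n)$, it follows that vec Y and vec X are vectors of order $nn$ and $nm$ respectively.
With the usual notation
$$Y = [y_{ij}], \quad X = [x_{ij}]$$
we have, by definition (4.1),
$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} =
\begin{bmatrix}
\dfrac{\partial y_{11}}{\partial x_{11}} & \dfrac{\partial y_{21}}{\partial x_{11}} & \cdots & \dfrac{\partial y_{nn}}{\partial x_{11}} \\
\dfrac{\partial y_{11}}{\partial x_{21}} & \dfrac{\partial y_{21}}{\partial x_{21}} & \cdots & \dfrac{\partial y_{nn}}{\partial x_{21}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial y_{11}}{\partial x_{mn}} & \dfrac{\partial y_{21}}{\partial x_{mn}} & \cdots & \dfrac{\partial y_{nn}}{\partial x_{mn}}
\end{bmatrix}. \tag{5.12}$$
But by definition (4.19),
the first row of the matrix (5.12) is $\left(\operatorname{vec} \dfrac{\partial Y}{\partial x_{11}}\right)'$,
the second row of the matrix (5.12) is $\left(\operatorname{vec} \dfrac{\partial Y}{\partial x_{21}}\right)'$, etc.
We can therefore write (5.12) as
$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = \left[\operatorname{vec}\frac{\partial Y}{\partial x_{11}} \;\vdots\; \operatorname{vec}\frac{\partial Y}{\partial x_{21}} \;\vdots\; \cdots \;\vdots\; \operatorname{vec}\frac{\partial Y}{\partial x_{mn}}\right]'. \tag{5.13}$$
We now use the solution to Example (4.6) where we had established that
when $Y = X'AX$, then
$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}'AX + X'AE_{rs}. \tag{5.14}$$
It follows that
$$\operatorname{vec}\frac{\partial Y}{\partial x_{rs}} = \operatorname{vec} E_{rs}'AX + \operatorname{vec} X'AE_{rs}
= (X'A' \otimes I)\operatorname{vec} E_{rs}' + (I \otimes X'A)\operatorname{vec} E_{rs} \tag{5.15}$$
(using (2.13)).
Substituting (5.15) into (5.13) we obtain
$$\begin{aligned}
\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X}
&= \big[(X'A' \otimes I)[\operatorname{vec} E_{11}' \;\vdots\; \operatorname{vec} E_{21}' \;\vdots\; \cdots \;\vdots\; \operatorname{vec} E_{mn}']\big]'
 + \big[(I \otimes X'A)[\operatorname{vec} E_{11} \;\vdots\; \operatorname{vec} E_{21} \;\vdots\; \cdots \;\vdots\; \operatorname{vec} E_{mn}]\big]' \\
&= [\operatorname{vec} E_{11}' \;\vdots\; \operatorname{vec} E_{21}' \;\vdots\; \cdots \;\vdots\; \operatorname{vec} E_{mn}']'(AX \otimes I)
 + [\operatorname{vec} E_{11} \;\vdots\; \operatorname{vec} E_{21} \;\vdots\; \cdots \;\vdots\; \operatorname{vec} E_{mn}]'(I \otimes A'X)
\end{aligned} \tag{5.16}$$
(by (2.10)).
The matrix
$$[\operatorname{vec} E_{11} \;\vdots\; \operatorname{vec} E_{21} \;\vdots\; \cdots \;\vdots\; \operatorname{vec} E_{mn}]$$
is the unit matrix $I$ of order $(mn \times mn)$.
Using (2.23) we can write (5.16) as
$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = U'(AX \otimes I) + (I \otimes A'X).$$
That is
$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = (AX \otimes I)_{(n)} + (I \otimes A'X). \tag{5.17}$$
In the above calculations we have used the derivative $\partial Y/\partial x_{rs}$ to obtain $(\partial \operatorname{vec} Y)/(\partial \operatorname{vec} X)$.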
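The Second Transformation Principle
Only slight modifications are needed to generalise the above calculations and show that whenever
$$\frac{\partial Y}{\partial x_{rs}} = AE_{rs}B + CE_{rs}'D$$
where $A$, $B$, $C$ and $D$ may be functions of $X$, then
$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = B \otimes A' + (D \otimes C')_{(n)}. \tag{5.18}$$
We will refer to the above result as the second transformation principle.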
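As a check on the second transformation principle, the case $Y = X'AX$ can be compared against finite differences. This is a sketch under stated assumptions: NumPy, our own helpers `vec` and `commutation`, and the reading $\partial Y/\partial x_{rs} = (X'A)E_{rs}I + I\,E_{rs}'(AX)$, so that the principle gives $I \otimes A'X + (AX \otimes I)_{(n)}$.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, order="F")

def commutation(m, n):
    """Permutation matrix U with U @ vec(X) == vec(X.T) for (m x n) X."""
    U = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            U[i * n + j, j * m + i] = 1.0
    return U

m, n = 3, 2
rng = np.random.default_rng(3)
X = rng.standard_normal((m, n))
A = rng.standard_normal((m, m))
U = commutation(m, n)

# B (x) A' + (D (x) C')_(n) with A -> X'A, B -> I, C -> I, D -> AX.
formula = np.kron(np.eye(n), A.T @ X) + U.T @ np.kron(A @ X, np.eye(n))

# Finite differences: row k holds vec(dY/dx_k)' where x = vec X, as in (5.13).
h = 1e-6
numeric = np.zeros((m * n, n * n))
for k in range(m * n):
    step = np.zeros(m * n)
    step[k] = h
    dX = step.reshape((m, n), order="F")
    Yp = (X + dX).T @ A @ (X + dX)
    Ym = (X - dX).T @ A @ (X - dX)
    numeric[k] = vec(Yp - Ym) / (2 * h)

err_second_principle = np.max(np.abs(numeric - formula))
print(err_second_principle)
```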
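Example 5.2
Find
$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X}$$
when
(i) $Y = X'X$, (ii) $Y = AX^{-1}B$.

Solution
Let $y = \operatorname{vec} Y$ and $x = \operatorname{vec} X$.
(i) From Example 4.8
$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}'X + X'E_{rs}.$$
Now use the second transformation principle, to obtain
$$\frac{\partial y}{\partial x} = I \otimes X + (X \otimes I)_{(n)}.$$
(ii) From Example 4.6
$$\frac{\partial Y}{\partial x_{rs}} = -AX^{-1}E_{rs}X^{-1}B$$
hence
$$\frac{\partial y}{\partial x} = -(X^{-1}B) \otimes (X^{-1})'A'.$$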
Hopefully, using the above results for matrices, we should be able to rediscover results for the derivatives of vectors considered in Chapter 4.
For example, let $X$ be a column vector $x$; then
$$Y = X'X \quad\text{becomes}\quad y = x'x \quad (y \text{ is a scalar}).$$
The above result for $\partial y/\partial x$ becomes
$$\frac{\partial y}{\partial x} = (I \otimes x) + (x \otimes I)_{(1)}.$$
But the unit vectors involved are of order $(n \times 1)$ which, for the one column vector $X$, is $(1 \times 1)$. Hence
$$\frac{\partial y}{\partial x} = 1 \otimes x + x \otimes 1 \quad\text{(use (5.9))}$$
$$= x + x = 2x$$
which is the result found in (4.4).
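5.4 MORE ON DERIVATIVES OF SCALAR FUNCTIONS WITH RESPECT TO A MATRIX
In section 4.4 we derived a formula, (4.10), which is useful when evaluating $\partial |Y|/\partial X$ for a large class of scalar matrix functions defined by $Y$.

Example 5.3
Evaluate the derivatives
(i) $\dfrac{\partial \log |X|}{\partial X}$ and (ii) $\dfrac{\partial |X|^r}{\partial X}$.

Solution
(i) We have
$$\frac{\partial}{\partial x_{rs}}(\log |X|) = \frac{1}{|X|}\,\frac{\partial |X|}{\partial x_{rs}}.$$
From Example 4.4,
$$\frac{\partial |X|}{\partial X} = |X|(X^{-1})' \quad\text{(non-symmetric case).}$$
Hence
$$\frac{\partial \log |X|}{\partial X} = (X^{-1})'.$$
(ii)
$$\frac{\partial |X|^r}{\partial x_{rs}} = r|X|^{r-1}\,\frac{\partial |X|}{\partial x_{rs}}.$$
Hence
$$\frac{\partial |X|^r}{\partial X} = r|X|^r(X^{-1})'.$$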
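Traces of matrices form an important class of scalar matrix functions covering a wide range of applications, particularly in statistics in the formulation of least squares and various optimisation problems.
Having discussed the evaluation of the derivative $\partial Y/\partial x_{rs}$ for various products of matrices, we can now apply these results to the evaluation of the derivative
$$\frac{\partial (\operatorname{tr} Y)}{\partial X}.$$
We first note that
$$\frac{\partial (\operatorname{tr} Y)}{\partial X} = \left[\frac{\partial (\operatorname{tr} Y)}{\partial x_{rs}}\right] \tag{5.19}$$
where the bracket on the right hand side of (5.19) denotes (as usual) a matrix of the same order as $X$, defined by its $(r,s)$th element.
As a consequence of (5.19), or perhaps more clearly seen from the definition (4.7), we note that on transposing $X$, we have
$$\frac{\partial (\operatorname{tr} Y)}{\partial X'} = \left(\frac{\partial (\operatorname{tr} Y)}{\partial X}\right)'. \tag{5.20}$$
Another, and possibly an obvious, property of a trace is found when considering the definition of $\partial Y/\partial x_{rs}$ (see (4.19)).
Assuming that $Y = [y_{ij}]$ is of order $(n \times n)$,
$$\operatorname{tr}\frac{\partial Y}{\partial x_{rs}} = \frac{\partial y_{11}}{\partial x_{rs}} + \frac{\partial y_{22}}{\partial x_{rs}} + \cdots + \frac{\partial y_{nn}}{\partial x_{rs}}
= \frac{\partial}{\partial x_{rs}}(y_{11} + y_{22} + \cdots + y_{nn}).$$
Hence,
$$\operatorname{tr}\frac{\partial Y}{\partial x_{rs}} = \frac{\partial (\operatorname{tr} Y)}{\partial x_{rs}}. \tag{5.21}$$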
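Example 5.4
Evaluate
$$\frac{\partial \operatorname{tr}(AX)}{\partial X}.$$

Solution
$$\begin{aligned}
\frac{\partial \operatorname{tr}(AX)}{\partial x_{rs}} &= \operatorname{tr}\frac{\partial (AX)}{\partial x_{rs}} && \text{by (5.21)} \\
&= \operatorname{tr}(AE_{rs}) && \text{by Example (4.8)} \\
&= \operatorname{tr}(E_{rs}'A') && \text{since } \operatorname{tr} Y = \operatorname{tr} Y' \\
&= (\operatorname{vec} E_{rs})'(\operatorname{vec} A') && \text{by Example (1.4).}
\end{aligned}$$
Hence,
$$\frac{\partial \operatorname{tr}(AX)}{\partial X} = A'.$$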
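As we found in the previous chapter, we can use the derivative of the trace of one product to obtain the derivative of the trace of a different product.

Example 5.5
Evaluate
$$\frac{\partial \operatorname{tr}(AX')}{\partial X}.$$

Solution
From the previous result
$$\frac{\partial \operatorname{tr}(BX)}{\partial X} = \frac{\partial \operatorname{tr}(X'B')}{\partial X} = B'.$$
Let $A' = B$ in the above equation; it follows that
$$\frac{\partial \operatorname{tr}(X'A)}{\partial X} = \frac{\partial \operatorname{tr}(A'X)}{\partial X} = A.$$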
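The derivatives of traces of more complicated matrix products can be found similarly.

Example 5.6
Evaluate
$$\frac{\partial (\operatorname{tr} Y)}{\partial X}$$
when
(i) $Y = X'AX$, (ii) $Y = X'AXB$.

Solution
It is obvious that (i) follows from (ii) when $B = I$.
(ii) $Y = X_1B$ where $X_1 = X'AX$.
$$\frac{\partial Y}{\partial x_{rs}} = \frac{\partial X_1}{\partial x_{rs}}\,B = E_{rs}'AXB + X'AE_{rs}B \quad\text{(by Example 4.6).}$$
Hence,
$$\begin{aligned}
\operatorname{tr}\left(\frac{\partial Y}{\partial x_{rs}}\right) &= \operatorname{tr}(E_{rs}'AXB) + \operatorname{tr}(X'AE_{rs}B) \\
&= \operatorname{tr}(E_{rs}'AXB) + \operatorname{tr}(E_{rs}'A'XB') \\
&= (\operatorname{vec} E_{rs})'\operatorname{vec}(AXB) + (\operatorname{vec} E_{rs})'\operatorname{vec}(A'XB').
\end{aligned}$$
It follows that
$$\frac{\partial (\operatorname{tr} Y)}{\partial X} = AXB + A'XB'.$$
(i) Let $B = I$ in the above equation; we obtain
$$\frac{\partial (\operatorname{tr} Y)}{\partial X} = AX + A'X = (A + A')X.$$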
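The result of Example 5.6 (ii) can be verified element by element with finite differences. A minimal sketch, assuming NumPy; the matrix sizes and the step `h` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 2
X = rng.standard_normal((m, n))
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))

def f(X):
    """Scalar function tr(X'AXB)."""
    return np.trace(X.T @ A @ X @ B)

# Build d(tr Y)/dX one element at a time, following (5.19).
h = 1e-6
grad_fd = np.zeros_like(X)
for r in range(m):
    for s in range(n):
        E = np.zeros((m, n))
        E[r, s] = 1.0
        grad_fd[r, s] = (f(X + h * E) - f(X - h * E)) / (2 * h)

grad_formula = A @ X @ B + A.T @ X @ B.T
err_trace = np.max(np.abs(grad_fd - grad_formula))
print(err_trace)
```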
5.5 THE MATRIX DIFFERENTIAL
For a scalar function $f(x)$ where $x = [x_1\; x_2\; \ldots\; x_n]'$, the differential $df$ is defined as
$$df = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\,dx_i. \tag{5.23}$$
Corresponding to this definition we define the matrix differential $dX$ for the matrix $X = [x_{ij}]$ of order $(m \times n)$ to be
$$dX = \begin{bmatrix}
dx_{11} & dx_{12} & \cdots & dx_{1n} \\
dx_{21} & dx_{22} & \cdots & dx_{2n} \\
\vdots & \vdots & & \vdots \\
dx_{m1} & dx_{m2} & \cdots & dx_{mn}
\end{bmatrix}. \tag{5.24}$$
The following two results follow immediately:
$$d(aX) = a(dX) \quad\text{(where $a$ is a scalar)} \tag{5.25}$$
$$d(X + Y) = dX + dY. \tag{5.26}$$
Consider now $X = [x_{ij}]$ of order $(m \times n)$ and $Y = [y_{ij}]$ of order $(n \times p)$.
$$XY = \Big[\sum_j x_{ij}y_{jk}\Big]$$
hence
$$d(XY) = d\Big[\sum_j x_{ij}y_{jk}\Big] = \Big[\sum_j (dx_{ij})y_{jk}\Big] + \Big[\sum_j x_{ij}(dy_{jk})\Big].$$
It follows that
$$d(XY) = (dX)Y + X(dY). \tag{5.27}$$
Example 5.7
Given $X = [x_{ij}]$ a non-singular matrix, evaluate
(i) $d|X|$, (ii) $d(X^{-1})$.

Solution
(i) By (5.23)
$$d|X| = \sum_{i,j} \frac{\partial |X|}{\partial x_{ij}}\,(dx_{ij}) = \sum_{i,j} X_{ij}\,(dx_{ij})$$
since $(\partial |X|)/(\partial x_{ij}) = X_{ij}$, the cofactor of $x_{ij}$ in $|X|$.
By an argument similar to the one used in section 4.4, we can write
$$d|X| = \operatorname{tr}\{Z'(dX)\} \quad\text{(compare with (4.10))}$$
where $Z = [X_{ij}]$.
Since $Z' = |X|X^{-1}$, we can write
$$d|X| = |X| \operatorname{tr}\{X^{-1}(dX)\}.$$
(ii) Since
$$X^{-1}X = I$$
we use (5.27) to write
$$d(X^{-1})X + X^{-1}(dX) = 0.$$
Hence
$$d(X^{-1}) = -X^{-1}(dX)X^{-1}$$
(compare with Example 4.6).
Notice that if $X$ is a symmetric matrix, then
$$X = X'$$
and
$$(dX)' = dX. \tag{5.28}$$
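The two differentials of Example 5.7 are first-order statements, so for a small perturbation $dX$ they should agree with the exact change up to terms of second order. The sketch below assumes NumPy; the matrix and the perturbation size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
X = 2.0 * np.eye(3) + 0.3 * rng.standard_normal((3, 3))   # comfortably non-singular
dX = 1e-6 * rng.standard_normal((3, 3))                    # a small perturbation
Xinv = np.linalg.inv(X)

# d|X| = |X| tr(X^{-1} dX)
d_det_exact = np.linalg.det(X + dX) - np.linalg.det(X)
d_det_first_order = np.linalg.det(X) * np.trace(Xinv @ dX)

# d(X^{-1}) = -X^{-1} (dX) X^{-1}
d_inv_exact = np.linalg.inv(X + dX) - Xinv
d_inv_first_order = -Xinv @ dX @ Xinv

err_det = abs(d_det_exact - d_det_first_order)
err_inv = np.max(np.abs(d_inv_exact - d_inv_first_order))
print(err_det, err_inv)
```

The remaining discrepancies are of the order of the square of the perturbation.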
Problems for Chapter 5
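(1) Consider
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad
X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} \quad\text{and}\quad Y = AX'.$$
Use a direct method to evaluate
$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X}$$
and verify (5.10).
(2) Obtain
$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X}$$
when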
(i) Y = AX'B and (ii) Y = )JAII X2.
(3) Find expressions for
$$\frac{\partial \operatorname{tr} Y}{\partial X}$$
when
(a) $Y = AXB$, (b) $Y = X^2$ and (c) $Y = XX'$.
(4) Evaluate
$$\frac{\partial \operatorname{tr} Y}{\partial X}$$
when
(a) $Y = X^{-1}$, (b) $Y = AX^{-1}B$, (c) $Y = X^n$ and (d) $Y = e^X$.
(5) (a) Use the direct method to obtain expressions for the matrix differential $dY$ when
(i) $Y = AX$, (ii) $Y = X'X$ and (iii) $Y = X^2$.
(b) Find $dY$ when
$$Y = AXBX.$$
CHAPTER 6
The Derivative of a Matrix with respect to a Matrix
6.1 INTRODUCTION
In the previous two chapters we have defined the derivative of a matrix with respect to a scalar and the derivative of a scalar with respect to a matrix. We will now generalise the definitions to include the derivative of a matrix with respect to a matrix. The author has adopted the definition suggested by Vetter [31], although other definitions also give rise to some useful results.
6.2 THE DEFINITIONS AND SOME RESULTS
Let $Y = [y_{ij}]$ be a matrix of order $(p \times q)$. We have defined (see (4.19)) the derivative of $Y$ with respect to a scalar $x_{rs}$; it is the matrix $[\partial y_{ij}/\partial x_{rs}]$ of order $(p \times q)$.
Let $X = [x_{rs}]$ be a matrix of order $(m \times n)$. We generalise (4.19) and define the derivative of $Y$ with respect to $X$, denoted by
$$\frac{\partial Y}{\partial X},$$
as the partitioned matrix whose $(r,s)$th partition is
$$\frac{\partial Y}{\partial x_{rs}},$$
in other words
$$\frac{\partial Y}{\partial X} =
\begin{bmatrix}
\dfrac{\partial Y}{\partial x_{11}} & \dfrac{\partial Y}{\partial x_{12}} & \cdots & \dfrac{\partial Y}{\partial x_{1n}} \\
\dfrac{\partial Y}{\partial x_{21}} & \dfrac{\partial Y}{\partial x_{22}} & \cdots & \dfrac{\partial Y}{\partial x_{2n}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial Y}{\partial x_{m1}} & \dfrac{\partial Y}{\partial x_{m2}} & \cdots & \dfrac{\partial Y}{\partial x_{mn}}
\end{bmatrix}
= \sum_{r,s} E_{rs} \otimes \frac{\partial Y}{\partial x_{rs}}. \tag{6.1}$$
The right hand side of (6.1) follows from the definitions (1.4) and (2.1), where $E_{rs}$ is of order $(m \times n)$, the order of the matrix $X$.
It is seen that $\partial Y/\partial X$ is a matrix of order $(mp \times nq)$.
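Example 6.1
Consider
$$Y = \begin{bmatrix} x_{11}x_{12}x_{22} & e^{x_{11}x_{22}} \\ \sin(x_{11} + x_{12}) & \log(x_{11} + x_{21}) \end{bmatrix}
\quad\text{and}\quad
X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}.$$
Evaluate
$$\frac{\partial Y}{\partial X}.$$

Solution
$$\frac{\partial Y}{\partial x_{11}} = \begin{bmatrix} x_{12}x_{22} & x_{22}e^{x_{11}x_{22}} \\ \cos(x_{11} + x_{12}) & \dfrac{1}{x_{11} + x_{21}} \end{bmatrix}, \qquad
\frac{\partial Y}{\partial x_{12}} = \begin{bmatrix} x_{11}x_{22} & 0 \\ \cos(x_{11} + x_{12}) & 0 \end{bmatrix},$$
$$\frac{\partial Y}{\partial x_{21}} = \begin{bmatrix} 0 & 0 \\ 0 & \dfrac{1}{x_{11} + x_{21}} \end{bmatrix}, \qquad
\frac{\partial Y}{\partial x_{22}} = \begin{bmatrix} x_{11}x_{12} & x_{11}e^{x_{11}x_{22}} \\ 0 & 0 \end{bmatrix}.$$
Hence
$$\frac{\partial Y}{\partial X} = \begin{bmatrix}
x_{12}x_{22} & x_{22}e^{x_{11}x_{22}} & x_{11}x_{22} & 0 \\
\cos(x_{11} + x_{12}) & \dfrac{1}{x_{11} + x_{21}} & \cos(x_{11} + x_{12}) & 0 \\
0 & 0 & x_{11}x_{12} & x_{11}e^{x_{11}x_{22}} \\
0 & \dfrac{1}{x_{11} + x_{21}} & 0 & 0
\end{bmatrix}.$$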
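Example 6.2
Given the matrix $X = [x_{ij}]$ of order $(m \times n)$, evaluate $\partial X/\partial X$ when
(i) all elements of $X$ are independent,
(ii) $X$ is a symmetric matrix (of course in this case $m = n$).

Solution
(i) By (6.1)
$$\frac{\partial X}{\partial X} = \sum_{r,s} E_{rs} \otimes E_{rs} = \bar{U} \quad\text{(see (2.26)).}$$
(ii)
$$\frac{\partial X}{\partial x_{rs}} = E_{rs} + E_{sr} \quad\text{for } r \neq s,$$
$$\frac{\partial X}{\partial x_{rs}} = E_{rr} \quad\text{for } r = s.$$
We can write the above as
$$\frac{\partial X}{\partial x_{rs}} = E_{rs} + E_{sr} - \delta_{rs}E_{rr}.$$
Hence,
$$\frac{\partial X}{\partial X} = \sum_{r,s} E_{rs} \otimes E_{rs} + \sum_{r,s} E_{rs} \otimes E_{sr} - \sum_{r,s} \delta_{rs}\,E_{rs} \otimes E_{rr}
= \bar{U} + U - \sum_{r} E_{rr} \otimes E_{rr}.$$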
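Example 6.3
Evaluate and write out in full $\partial X'/\partial X$ given
$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{bmatrix}.$$

Solution
By (6.1) we have
$$\frac{\partial X'}{\partial X} = \sum_{r,s} E_{rs} \otimes E_{rs}' = U \quad\text{(see (2.24) and (2.26)).}$$
Hence
$$\frac{\partial X'}{\partial X} = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.$$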
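The matrix of Example 6.3 can be generated directly from the sum in (6.1). The sketch below assumes NumPy; the function names are our own.

```python
import numpy as np

def elementary(m, n, r, s):
    """E_rs of order (m x n), with 0-based indices."""
    E = np.zeros((m, n), dtype=int)
    E[r, s] = 1
    return E

def derivative_of_transpose(m, n):
    """Sum over r, s of E_rs (x) E_rs', that is dX'/dX by (6.1)."""
    total = np.zeros((m * n, n * m), dtype=int)
    for r in range(m):
        for s in range(n):
            E = elementary(m, n, r, s)
            total += np.kron(E, E.T)
    return total

U = derivative_of_transpose(2, 3)
print(U)

# The same matrix is the permutation of (5.6): U vec X = vec X'.
X = np.arange(1, 7).reshape(2, 3)
permutes_vec = np.array_equal(U @ X.reshape(-1, order="F"), X.T.reshape(-1, order="F"))
```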
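From the definition (6.1) we obtain
$$\begin{aligned}
\left(\frac{\partial Y}{\partial X}\right)' &= \left(\sum_{r,s} E_{rs} \otimes \frac{\partial Y}{\partial x_{rs}}\right)' \\
&= \sum_{r,s} E_{rs}' \otimes \left(\frac{\partial Y}{\partial x_{rs}}\right)' && \text{by (2.10)} \\
&= \sum_{r,s} E_{rs}' \otimes \frac{\partial Y'}{\partial x_{rs}} && \text{from (4.19).}
\end{aligned}$$
It follows that
$$\left(\frac{\partial Y}{\partial X}\right)' = \frac{\partial Y'}{\partial X'}. \tag{6.2}$$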
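6.3 PRODUCT RULES FOR MATRICES
We shall first obtain a rule for the derivative of a product of matrices with respect to a matrix, that is, to find an expression for
$$\frac{\partial (XY)}{\partial Z}$$
where the orders of the matrices are as indicated:
$$X\,(m \times n), \quad Y\,(n \times v), \quad Z\,(p \times q).$$
By (4.24) we write
$$\frac{\partial (XY)}{\partial z_{rs}} = \frac{\partial X}{\partial z_{rs}}\,Y + X\,\frac{\partial Y}{\partial z_{rs}}$$
where $Z = [z_{rs}]$.
If $E_{rs}$ is an elementary matrix of order $(p \times q)$, we make use of (6.1) to write
$$\begin{aligned}
\frac{\partial (XY)}{\partial Z} &= \sum_{r,s} E_{rs} \otimes \left[\frac{\partial X}{\partial z_{rs}}\,Y + X\,\frac{\partial Y}{\partial z_{rs}}\right] \\
&= \sum_{r,s} E_{rs} \otimes \frac{\partial X}{\partial z_{rs}}\,Y + \sum_{r,s} E_{rs} \otimes X\,\frac{\partial Y}{\partial z_{rs}} \\
&= \sum_{r,s} E_{rs}I_q \otimes \frac{\partial X}{\partial z_{rs}}\,Y + \sum_{r,s} I_pE_{rs} \otimes X\,\frac{\partial Y}{\partial z_{rs}}
\end{aligned}$$
(where $I_q$ and $I_p$ are unit matrices of order $(q \times q)$ and $(p \times p)$ respectively)
$$= \sum_{r,s} \left(E_{rs} \otimes \frac{\partial X}{\partial z_{rs}}\right)(I_q \otimes Y) + \sum_{r,s} (I_p \otimes X)\left(E_{rs} \otimes \frac{\partial Y}{\partial z_{rs}}\right) \quad\text{(by (2.11));}$$
finally, by (6.1),
$$\frac{\partial (XY)}{\partial Z} = \frac{\partial X}{\partial Z}\,(I_q \otimes Y) + (I_p \otimes X)\,\frac{\partial Y}{\partial Z}. \tag{6.3}$$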
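Example 6.4
Find an expression for
$$\frac{\partial X^{-1}}{\partial X}.$$

Solution
Using (6.3) on
$$XX^{-1} = I,$$
we obtain
$$\frac{\partial (XX^{-1})}{\partial X} = \frac{\partial X}{\partial X}\,(I \otimes X^{-1}) + (I \otimes X)\,\frac{\partial X^{-1}}{\partial X} = 0,$$
hence
$$\frac{\partial X^{-1}}{\partial X} = -(I \otimes X)^{-1}\,\frac{\partial X}{\partial X}\,(I \otimes X^{-1})
= -(I \otimes X^{-1})\,\bar{U}\,(I \otimes X^{-1})$$
(by Example 6.2 and (2.12)).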
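The expression for $\partial X^{-1}/\partial X$ can be compared with a finite-difference evaluation of definition (6.1). This is a sketch assuming NumPy; `matrix_derivative` is a hypothetical helper, and $\bar{U} = \sum E_{rs} \otimes E_{rs}$ is formed here as the outer product of vec I with itself.

```python
import numpy as np

def matrix_derivative(F, X, h=1e-6):
    """Finite-difference version of (6.1): sum over r, s of E_rs (x) dF/dx_rs."""
    m, n = X.shape
    p, q = F(X).shape
    D = np.zeros((m * p, n * q))
    for r in range(m):
        for s in range(n):
            E = np.zeros((m, n))
            E[r, s] = 1.0
            dF = (F(X + h * E) - F(X - h * E)) / (2 * h)
            D += np.kron(E, dF)
    return D

rng = np.random.default_rng(6)
n = 3
X = 2.0 * np.eye(n) + 0.3 * rng.standard_normal((n, n))   # comfortably non-singular
Xinv = np.linalg.inv(X)

# U_bar = sum of E_rs (x) E_rs, which equals vec(I) vec(I)'.
vecI = np.eye(n).reshape(-1, order="F")
U_bar = np.outer(vecI, vecI)
I_kron_Xinv = np.kron(np.eye(n), Xinv)

formula = -I_kron_Xinv @ U_bar @ I_kron_Xinv
numeric = matrix_derivative(np.linalg.inv, X)
err_inverse_rule = np.max(np.abs(numeric - formula))
print(err_inverse_rule)
```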
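Next we determine a rule for the derivative of a Kronecker product of matrices with respect to a matrix, that is, an expression for
$$\frac{\partial (X \otimes Y)}{\partial Z}.$$
The order of the matrix $Y$ is not now restricted; we will consider that it is $(u \times v)$. On representing $X \otimes Y$ by its $(i,j)$th partition $[x_{ij}Y]$ $(i = 1, 2, \ldots, m,\; j = 1, 2, \ldots, n)$, we can write
$$\frac{\partial (X \otimes Y)}{\partial z_{rs}} = \frac{\partial}{\partial z_{rs}}[x_{ij}Y]$$
where $(r, s)$ are fixed
$$= \left[\frac{\partial x_{ij}}{\partial z_{rs}}\,Y\right] + \left[x_{ij}\,\frac{\partial Y}{\partial z_{rs}}\right]
= \frac{\partial X}{\partial z_{rs}} \otimes Y + X \otimes \frac{\partial Y}{\partial z_{rs}}.$$
Hence by (6.1)
$$\frac{\partial (X \otimes Y)}{\partial Z} = \sum_{r,s} E_{rs} \otimes \frac{\partial X}{\partial z_{rs}} \otimes Y + \sum_{r,s} E_{rs} \otimes X \otimes \frac{\partial Y}{\partial z_{rs}}$$
where $E_{rs}$ is of order $(p \times q)$
$$= \frac{\partial X}{\partial Z} \otimes Y + \sum_{r,s} E_{rs} \otimes \left(X \otimes \frac{\partial Y}{\partial z_{rs}}\right).$$
The summation on the right hand side is not $X \otimes \partial Y/\partial Z$ as may appear at first sight; nevertheless it can be put into a more convenient form, as a product of matrices. To achieve this aim we make repeated use of (2.8) and (2.11):
$$\begin{aligned}
\sum_{r,s} E_{rs} \otimes \left(X \otimes \frac{\partial Y}{\partial z_{rs}}\right)
&= \sum_{r,s} [I_pE_{rs}I_q] \otimes \left[U_1\left(\frac{\partial Y}{\partial z_{rs}} \otimes X\right)U_2\right] && \text{by (2.14)} \\
&= \sum_{r,s} [I_p \otimes U_1]\left[E_{rs} \otimes \frac{\partial Y}{\partial z_{rs}} \otimes X\right][I_q \otimes U_2] && \text{by (2.11)} \\
&= [I_p \otimes U_1]\left[\sum_{r,s} E_{rs} \otimes \frac{\partial Y}{\partial z_{rs}} \otimes X\right][I_q \otimes U_2].
\end{aligned}$$
Hence
$$\frac{\partial (X \otimes Y)}{\partial Z} = \frac{\partial X}{\partial Z} \otimes Y + [I_p \otimes U_1]\left[\frac{\partial Y}{\partial Z} \otimes X\right][I_q \otimes U_2] \tag{6.4}$$
where $U_1$ and $U_2$ are permutation matrices of orders $(mu \times mu)$ and $(nv \times nv)$ respectively.
We illustrate the use of equation (6.4) with a simple example.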
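Example 6.5
$A = [a_{ij}]$ and $X = [x_{ij}]$ are matrices, each of order $(2 \times 2)$. Use
(i) equation (6.4), and
(ii) a direct method
to evaluate
$$\frac{\partial (A \otimes X)}{\partial X}.$$

Solution
(i) In this example (6.4) becomes
$$\frac{\partial (A \otimes X)}{\partial X} = [I \otimes U_1]\left[\frac{\partial X}{\partial X} \otimes A\right][I \otimes U_2]$$
where $I$ is the unit matrix of order $(2 \times 2)$ and
$$U_1 = U_2 = \sum_{r,s} E_{rs} \otimes E_{rs}' = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}.$$
Since
$$\frac{\partial X}{\partial X} = \begin{bmatrix}
1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1
\end{bmatrix}$$
only a simple calculation is necessary to obtain the result. It is found that
$$\frac{\partial (A \otimes X)}{\partial X} = \begin{bmatrix}
a_{11} & 0 & a_{12} & 0 & 0 & a_{11} & 0 & a_{12} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
a_{21} & 0 & a_{22} & 0 & 0 & a_{21} & 0 & a_{22} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
a_{11} & 0 & a_{12} & 0 & 0 & a_{11} & 0 & a_{12} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
a_{21} & 0 & a_{22} & 0 & 0 & a_{21} & 0 & a_{22}
\end{bmatrix}.$$
(ii) We evaluate
$$Y = A \otimes X = \begin{bmatrix}
a_{11}x_{11} & a_{11}x_{12} & a_{12}x_{11} & a_{12}x_{12} \\
a_{11}x_{21} & a_{11}x_{22} & a_{12}x_{21} & a_{12}x_{22} \\
a_{21}x_{11} & a_{21}x_{12} & a_{22}x_{11} & a_{22}x_{12} \\
a_{21}x_{21} & a_{21}x_{22} & a_{22}x_{21} & a_{22}x_{22}
\end{bmatrix}$$
and then make use of (6.1) to obtain the above result.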
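6.4 THE CHAIN RULE FOR THE DERIVATIVE OF A MATRIX WITH RESPECT TO A MATRIX
We wish to obtain an expression for
$$\frac{\partial Z}{\partial X}$$
where the matrix $Z$ is a matrix function of a matrix $X$ through an intermediate matrix $Y$, that is
$$Z = Z(Y) \quad\text{with}\quad Y = Y(X),$$
where
$X = [x_{ij}]$ is of order $(m \times n)$,
$Y = [y_{ij}]$ is of order $(u \times v)$,
$Z = [z_{ij}]$ is of order $(p \times q)$.
By definition in (6.1)
$$\frac{\partial Z}{\partial X} = \sum_{r,s} E_{rs} \otimes \frac{\partial Z}{\partial x_{rs}} \qquad r = 1, 2, \ldots, m;\; s = 1, 2, \ldots, n$$
where $E_{rs}$ is an elementary matrix of order $(m \times n)$,
$$= \sum_{r,s} E_{rs} \otimes \sum_{i,j} E_{ij}\,\frac{\partial z_{ij}}{\partial x_{rs}} \qquad i = 1, 2, \ldots, p;\; j = 1, 2, \ldots, q$$
where $E_{ij}$ is of order $(p \times q)$.
As in section 4.3, we use the chain rule to write
$$\frac{\partial z_{ij}}{\partial x_{rs}} = \sum_{\alpha,\beta} \frac{\partial z_{ij}}{\partial y_{\alpha\beta}}\,\frac{\partial y_{\alpha\beta}}{\partial x_{rs}} \qquad \alpha = 1, 2, \ldots, u;\; \beta = 1, 2, \ldots, v.$$
Hence
$$\begin{aligned}
\frac{\partial Z}{\partial X} &= \sum_{r,s} E_{rs} \otimes \sum_{i,j} E_{ij} \sum_{\alpha,\beta} \frac{\partial z_{ij}}{\partial y_{\alpha\beta}}\,\frac{\partial y_{\alpha\beta}}{\partial x_{rs}} \\
&= \sum_{\alpha,\beta} \left(\sum_{r,s} E_{rs}\,\frac{\partial y_{\alpha\beta}}{\partial x_{rs}}\right) \otimes \left(\sum_{i,j} E_{ij}\,\frac{\partial z_{ij}}{\partial y_{\alpha\beta}}\right) && \text{(by (2.5))} \\
&= \sum_{\alpha,\beta} \frac{\partial y_{\alpha\beta}}{\partial X} \otimes \frac{\partial Z}{\partial y_{\alpha\beta}} && \text{(by (4.7) and (4.19)).}
\end{aligned}$$
If $I_n$ and $I_p$ are unit matrices of orders $(n \times n)$ and $(p \times p)$ respectively, we can write the above as
$$\frac{\partial Z}{\partial X} = \sum_{\alpha,\beta} \left(\frac{\partial y_{\alpha\beta}}{\partial X}\,I_n\right) \otimes \left(I_p\,\frac{\partial Z}{\partial y_{\alpha\beta}}\right).$$
Hence, by (2.11),
$$\frac{\partial Z}{\partial X} = \sum_{\alpha,\beta} \left(\frac{\partial y_{\alpha\beta}}{\partial X} \otimes I_p\right)\left(I_n \otimes \frac{\partial Z}{\partial y_{\alpha\beta}}\right). \tag{6.5}$$
Equation (6.5) can be written in a more convenient form, avoiding the summation, if we define an appropriate notation, a generalisation of the previous one.
Since
$$Y = \begin{bmatrix}
y_{11} & y_{12} & \cdots & y_{1v} \\
y_{21} & y_{22} & \cdots & y_{2v} \\
\vdots & \vdots & & \vdots \\
y_{u1} & y_{u2} & \cdots & y_{uv}
\end{bmatrix}$$
then $(\operatorname{vec} Y)' = [y_{11}\; y_{21}\; \ldots\; y_{uv}]$.
We will write the partitioned matrix
$$\left[\frac{\partial y_{11}}{\partial X} \otimes I_p \;\vdots\; \frac{\partial y_{21}}{\partial X} \otimes I_p \;\vdots\; \cdots \;\vdots\; \frac{\partial y_{uv}}{\partial X} \otimes I_p\right]$$
as
$$\frac{\partial [y_{11}\; y_{21}\; \ldots\; y_{uv}]}{\partial X} \otimes I_p$$
or as
$$\frac{\partial (\operatorname{vec} Y)'}{\partial X} \otimes I_p.$$
Similarly, we write the partitioned matrix
$$\begin{bmatrix}
I_n \otimes \dfrac{\partial Z}{\partial y_{11}} \\
I_n \otimes \dfrac{\partial Z}{\partial y_{21}} \\
\vdots \\
I_n \otimes \dfrac{\partial Z}{\partial y_{uv}}
\end{bmatrix}$$
as
$$I_n \otimes \frac{\partial Z}{\partial \operatorname{vec} Y}.$$
We can write the sum (6.5) in the following order:
$$\frac{\partial Z}{\partial X} = \left[\frac{\partial y_{11}}{\partial X} \otimes I_p\right]\left[I_n \otimes \frac{\partial Z}{\partial y_{11}}\right] + \left[\frac{\partial y_{21}}{\partial X} \otimes I_p\right]\left[I_n \otimes \frac{\partial Z}{\partial y_{21}}\right] + \cdots + \left[\frac{\partial y_{uv}}{\partial X} \otimes I_p\right]\left[I_n \otimes \frac{\partial Z}{\partial y_{uv}}\right].$$
We can write this as a (partitioned) matrix product
$$\frac{\partial Z}{\partial X} = \left[\frac{\partial y_{11}}{\partial X} \otimes I_p \;\vdots\; \frac{\partial y_{21}}{\partial X} \otimes I_p \;\vdots\; \cdots \;\vdots\; \frac{\partial y_{uv}}{\partial X} \otimes I_p\right]
\begin{bmatrix}
I_n \otimes \dfrac{\partial Z}{\partial y_{11}} \\
I_n \otimes \dfrac{\partial Z}{\partial y_{21}} \\
\vdots \\
I_n \otimes \dfrac{\partial Z}{\partial y_{uv}}
\end{bmatrix}.$$
Finally, using the notations defined above, we have
$$\frac{\partial Z}{\partial X} = \left[\frac{\partial [\operatorname{vec} Y]'}{\partial X} \otimes I_p\right]\left[I_n \otimes \frac{\partial Z}{\partial \operatorname{vec} Y}\right]. \tag{6.6}$$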
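We consider a simple example to illustrate the application of the above formula. The example can also be solved by evaluating the matrix $Z$ in terms of the components of the matrix $X$ and then applying the definition in (6.1).

Example 6.6
Given the matrices $A = [a_{ij}]$ and $X = [x_{ij}]$, both of order $(2 \times 2)$, evaluate
$$\frac{\partial Z}{\partial X}$$
where $Z = Y'Y$ and $Y = AX$,
(i) using (6.6),
(ii) using a direct method.

Solution
(i) For convenience write (6.6) as
$$\frac{\partial Z}{\partial X} = QR$$
where
$$Q = \left[\frac{\partial [\operatorname{vec} Y]'}{\partial X} \otimes I_p\right] \quad\text{and}\quad R = \left[I_n \otimes \frac{\partial Z}{\partial \operatorname{vec} Y}\right].$$
From Example 4.8 we know that
$$\frac{\partial y_{ij}}{\partial X} = A'E_{ij}$$
so that $Q$ can now be easily evaluated:
$$Q = \left[\begin{array}{cccc|cccc|cccc|cccc}
a_{11} & 0 & 0 & 0 & a_{21} & 0 & 0 & 0 & 0 & 0 & a_{11} & 0 & 0 & 0 & a_{21} & 0 \\
0 & a_{11} & 0 & 0 & 0 & a_{21} & 0 & 0 & 0 & 0 & 0 & a_{11} & 0 & 0 & 0 & a_{21} \\
a_{12} & 0 & 0 & 0 & a_{22} & 0 & 0 & 0 & 0 & 0 & a_{12} & 0 & 0 & 0 & a_{22} & 0 \\
0 & a_{12} & 0 & 0 & 0 & a_{22} & 0 & 0 & 0 & 0 & 0 & a_{12} & 0 & 0 & 0 & a_{22}
\end{array}\right].$$
Also in Example 4.8 we have found
$$\frac{\partial Z}{\partial y_{rs}} = E_{rs}'Y + Y'E_{rs};$$
we can now evaluate $R$:
$$R = \begin{bmatrix}
2y_{11} & y_{12} & 0 & 0 \\
y_{12} & 0 & 0 & 0 \\
0 & 0 & 2y_{11} & y_{12} \\
0 & 0 & y_{12} & 0 \\
2y_{21} & y_{22} & 0 & 0 \\
y_{22} & 0 & 0 & 0 \\
0 & 0 & 2y_{21} & y_{22} \\
0 & 0 & y_{22} & 0 \\
0 & y_{11} & 0 & 0 \\
y_{11} & 2y_{12} & 0 & 0 \\
0 & 0 & 0 & y_{11} \\
0 & 0 & y_{11} & 2y_{12} \\
0 & y_{21} & 0 & 0 \\
y_{21} & 2y_{22} & 0 & 0 \\
0 & 0 & 0 & y_{21} \\
0 & 0 & y_{21} & 2y_{22}
\end{bmatrix}.$$
The product of $Q$ and $R$ is the derivative we have been asked to evaluate:
$$QR = \begin{bmatrix}
2a_{11}y_{11} + 2a_{21}y_{21} & a_{11}y_{12} + a_{21}y_{22} & 0 & a_{11}y_{11} + a_{21}y_{21} \\
a_{11}y_{12} + a_{21}y_{22} & 0 & a_{11}y_{11} + a_{21}y_{21} & 2a_{11}y_{12} + 2a_{21}y_{22} \\
2a_{12}y_{11} + 2a_{22}y_{21} & a_{12}y_{12} + a_{22}y_{22} & 0 & a_{12}y_{11} + a_{22}y_{21} \\
a_{12}y_{12} + a_{22}y_{22} & 0 & a_{12}y_{11} + a_{22}y_{21} & 2a_{12}y_{12} + 2a_{22}y_{22}
\end{bmatrix}.$$
(ii) By a simple extension of the result of Example 4.6(b) we find that when
$$Z = X'A'AX$$
$$\frac{\partial Z}{\partial x_{rs}} = E_{rs}'A'AX + X'A'AE_{rs} = E_{rs}'A'Y + Y'AE_{rs}$$
where $Y = AX$.
By (6.1) and (2.11)
$$\frac{\partial Z}{\partial X} = \sum_{r,s} (E_{rs} \otimes E_{rs}')(I \otimes A'Y) + \sum_{r,s} (I \otimes Y'A)(E_{rs} \otimes E_{rs}).$$
Since the matrices involved are all of order $(2 \times 2)$,
$$\sum_{r,s} E_{rs} \otimes E_{rs}' = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\quad\text{and}\quad
\sum_{r,s} E_{rs} \otimes E_{rs} = \begin{bmatrix}
1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1
\end{bmatrix}.$$
On substitution and multiplying out in the above expression for $\partial Z/\partial X$, we obtain the same matrix as in (i).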
Problems for Chapter 6
(1) Evaluate aYjaX given
IX-21
y _ [cos (X12 + x22) xux211and X = x11 x12
X12x22 X22
.L]
.mar
(2) The elements of the matrix
$$X = \begin{bmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \\ x_{13} & x_{23} \end{bmatrix}$$
are all independent. Use a direct method to evaluate $\partial X/\partial X$.
(3) Given a non-singular matrix
$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}$$
use a direct method to obtain
$$\frac{\partial X^{-1}}{\partial X}$$
and verify the solution to Example 6.4.
(4) The matrices $A = [a_{ij}]$ and $X = [x_{ij}]$ are both of order $(2 \times 2)$; $X$ is non-singular. Use a direct method to evaluate
$$\frac{\partial (A \otimes X^{-1})}{\partial X}.$$
CHAPTER 7
Some Applications of Matrix Calculus
7.1 INTRODUCTION
As in Chapter 3, where a number of applications of the Kronecker product were considered, in this chapter a number of applications of matrix calculus are discussed. The applications have been selected from a number considered in the published literature, as indicated in the Bibliography at the end of this book.
These problems were originally intended for the expert, but by expansion and simplification it is hoped that they will now be appreciated by the general reader.
7.2 THE PROBLEMS OF LEAST SQUARES AND CONSTRAINED OPTIMISATION IN SCALAR VARIABLES
In this section we consider, very briefly, the Method of Least Squares to obtain a curve or a line of 'best fit', and the Method of Lagrange Multipliers to obtain an extremum of a function subject to constraints.
For the least squares method we consider a set of data
$$(x_i, y_i) \qquad i = 1, 2, \ldots, n \tag{7.1}$$
and a relationship, usually a polynomial function
$$y = f(x). \tag{7.2}$$
For each $x_i$, we evaluate $f(x_i)$ and the residual or the deviation
$$e_i = y_i - f(x_i). \tag{7.3}$$
The method depends on choosing the unknown parameters, the polynomial coefficients when $f(x)$ is a polynomial, so that the sum of the squares of the residuals is a minimum, that is
$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - f(x_i))^2 \tag{7.4}$$
is a minimum.
In particular, when $f(x)$ is a linear function
$$y = a_0 + a_1x,$$
$S(a_0, a_1)$ is a minimum when
$$\frac{\partial S}{\partial a_0} = 0 = \frac{\partial S}{\partial a_1}. \tag{7.5}$$
These two equations, known as normal equations, determine the two unknown parameters $a_0$ and $a_1$ which specify the line of 'best fit' according to the principle of least squares.
For the second method we wish to determine the extremum of a continuously differentiable function
$$f(x_1, x_2, \ldots, x_n) \tag{7.6}$$
whose $n$ variables are constrained by $m$ equations of the form
$$g_i(x_1, x_2, \ldots, x_n) = 0, \qquad i = 1, 2, \ldots, m. \tag{7.7}$$
The method of Lagrange Multipliers depends on defining an augmented function
$$f^* = f + \sum_{i=1}^{m} \mu_i g_i \tag{7.8}$$
where the $\mu_i$ are known as Lagrange multipliers.
The extremum of $f(x)$ is determined by solving the system of the $(m + n)$ equations
$$\frac{\partial f^*}{\partial x_r} = 0 \quad (r = 1, 2, \ldots, n), \qquad g_i = 0 \quad (i = 1, 2, \ldots, m) \tag{7.9}$$
for the $m$ parameters $\mu_1, \mu_2, \ldots, \mu_m$ and the $n$ variables $x$ determining the extremum.
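Example 7.1
Given a matrix $A = [a_{ij}]$ of order $(2 \times 2)$, determine a symmetric matrix $X = [x_{ij}]$ which is a best approximation to $A$ by the criterion of least squares.

Solution
Corresponding to (7.3) we have
$$E = A - X$$
where $E = [e_{ij}]$ and $e_{ij} = a_{ij} - x_{ij}$.
The criterion of least squares for this example is to minimise
$$S = \sum_{i,j} e_{ij}^2 = \sum_{i,j} (a_{ij} - x_{ij})^2$$
which is the equivalent of (7.6) above.
The constraint equation is
$$x_{12} - x_{21} = 0$$
and the augmented function is
$$f^* = \sum_{i,j} (a_{ij} - x_{ij})^2 + \mu(x_{12} - x_{21}).$$
$$\frac{\partial f^*}{\partial x_{11}} = -2(a_{11} - x_{11}) = 0$$
$$\frac{\partial f^*}{\partial x_{12}} = -2(a_{12} - x_{12}) + \mu = 0$$
$$\frac{\partial f^*}{\partial x_{21}} = -2(a_{21} - x_{21}) - \mu = 0$$
$$\frac{\partial f^*}{\partial x_{22}} = -2(a_{22} - x_{22}) = 0.$$
This system of 5 equations (including the constraint) leads to the solution
$$\mu = a_{12} - a_{21},$$
$$x_{11} = a_{11}, \quad x_{22} = a_{22}, \quad x_{12} = x_{21} = \tfrac{1}{2}(a_{12} + a_{21}).$$
Hence
$$X = \begin{bmatrix} a_{11} & \dfrac{a_{12} + a_{21}}{2} \\ \dfrac{a_{12} + a_{21}}{2} & a_{22} \end{bmatrix}
= \frac{1}{2}\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \frac{1}{2}\begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix}
= \tfrac{1}{2}(A + A').$$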
7.3 PROBLEM 1 - MATRIX CALCULUS APPROACH TO THE PROBLEMS OF LEAST SQUARES AND CONSTRAINED OPTIMISATION
If we can express the residuals in the form of a matrix $E$, as in Example 7.1, then the sum of the residuals squared is
$$S = \operatorname{tr} E'E. \tag{7.10}$$
The criterion of the least squares method is to minimise (7.10) with respect to the parameters involved.
The constrained optimisation problem then takes the form of finding the matrix $X$ such that the scalar matrix function
$$S = f(X)$$
is minimised subject to constraints on $X$ in the form of
$$G(X) = 0 \tag{7.11}$$
where $G = [g_{ij}]$ is a matrix of order $(s \times t)$, where $s$ and $t$ are dependent on the number of constraints $g_{ij}$ involved.
As for the scalar case, we use Lagrange multipliers to form an augmented matrix function $f^*(X)$.
Each constraint $g_{ij}$ is associated with a parameter (Lagrange multiplier) $\mu_{ij}$.
Since
$$\sum_{i,j} \mu_{ij}g_{ij} = \operatorname{tr} U'G$$
where
$$U = [\mu_{ij}]$$
we can write the augmented scalar matrix function as
$$f^*(X) = \operatorname{tr} E'E + \operatorname{tr} U'G \tag{7.12}$$
which is the equivalent to (7.8). To find the optimal $X$, we must solve the system of equations
$$\frac{\partial f^*}{\partial X} = 0. \tag{7.13}$$
Problem
Given a non-singular matrix $A = [a_{ij}]$ of order $(n \times n)$, determine a matrix $X = [x_{ij}]$ which is a least squares approximation to $A$
(i) when $X$ is a symmetric matrix,
(ii) when $X$ is an orthogonal matrix.

Solution
(i) The problem was solved in Example 7.1 when $A$ and $X$ are of order $(2 \times 2)$. With the terminology defined above, we write
$$E = A - X$$
$$G(X) = X - X' = 0$$
so that $G$ and hence $U$ are both of order $(n \times n)$.
Equation (7.12) becomes
$$f^* = \operatorname{tr} A'A - \operatorname{tr} A'X - \operatorname{tr} X'A + \operatorname{tr} X'X + \operatorname{tr} U'X - \operatorname{tr} U'X'.$$
We now make use of the results, in modified form if necessary, of Examples 5.4 and 5.5; we obtain
$$\frac{\partial f^*}{\partial X} = -2A + 2X + U - U' = 0 \quad\text{for}\quad X = A + \frac{U' - U}{2}.$$
Then
$$X' = A' + \frac{U - U'}{2}$$
and since $X = X'$, we finally obtain
$$X = \tfrac{1}{2}(A + A').$$
(ii) This time
$$G(X) = X'X - I = 0.$$
Hence
$$f^* = \operatorname{tr}[A' - X'][A - X] + \operatorname{tr} U'[X'X - I]$$
so that
$$\frac{\partial f^*}{\partial X} = -2A + 2X + X[U + U'] = 0 \quad\text{for}\quad X = A - X\,\frac{U + U'}{2}.$$
Premultiplying by $X'$ and using the condition
$$X'X = I$$
we obtain
$$X'A = I + \frac{U + U'}{2}$$
and on transposing
$$A'X = I + \frac{U + U'}{2}.$$
Hence
$$A'X = X'A. \tag{7.14}$$
If a solution to (7.14) exists, there are various ways of solving this matrix equation.
For example, with the help of (2.13) and Example (2.7) we can write it as
$$[(I \otimes A') - (A' \otimes I)U]\,x = 0 \tag{7.15}$$
where $U$ is a permutation matrix (see (2.24)) and
$$x = \operatorname{vec} X.$$
We have now reduced the matrix equation to a system of homogeneous equations which can be solved by a standard method.
If a non-trivial solution to (7.15) does exist, it is not unique. We must scale it appropriately for $X$ to be orthogonal.
There may, of course, be more than one linearly independent solution to (7.15). We must choose the solution corresponding to $X$ being an orthogonal matrix.
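Example 7.2
Given
$$A = \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}$$
find the orthogonal matrix $X$ which is the least squares best approximation to $A$.

Solution
$$[I \otimes A'] = \begin{bmatrix}
1 & -1 & 0 & 0 \\
2 & 1 & 0 & 0 \\
0 & 0 & 1 & -1 \\
0 & 0 & 2 & 1
\end{bmatrix}
\quad\text{and}\quad
[A' \otimes I]U = \begin{bmatrix}
1 & -1 & 0 & 0 \\
0 & 0 & 1 & -1 \\
2 & 1 & 0 & 0 \\
0 & 0 & 2 & 1
\end{bmatrix}.$$
Equation (7.15) can now be written as
$$\begin{bmatrix}
0 & 0 & 0 & 0 \\
2 & 1 & -1 & 1 \\
-2 & -1 & 1 & -1 \\
0 & 0 & 0 & 0
\end{bmatrix} x = 0.$$
There are 3 non-trivial (linearly independent) solutions (see [18] p. 131). They are
$$x = [1\; -2\; 1\; 1]', \quad x = [1\; 1\; 2\; -1]' \quad\text{and}\quad x = [2\; -3\; 3\; 2]'.$$
Only the last solution leads to an orthogonal matrix $X$; it is
$$X = \frac{1}{\sqrt{13}}\begin{bmatrix} 2 & 3 \\ -3 & 2 \end{bmatrix}.$$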
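The answer to Example 7.2 can be cross-checked by a different route. The sketch below uses the singular value decomposition, a technique not used in the text: for a non-singular $A = WSV'$ the orthogonal factor $WV'$ of the polar decomposition is known to be the nearest orthogonal matrix in the least squares sense. NumPy is assumed, and the matrix $A$ is the one implied by the displayed $[I \otimes A']$.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [-1.0, 1.0]])

# Orthogonal factor of the polar decomposition, from the SVD A = W S V'.
W, S, Vt = np.linalg.svd(A)
X = W @ Vt

X_example = np.array([[2.0, 3.0], [-3.0, 2.0]]) / np.sqrt(13.0)
is_orthogonal = np.allclose(X.T @ X, np.eye(2))
satisfies_714 = np.allclose(A.T @ X, X.T @ A)      # condition (7.14)
matches_example = np.allclose(X, X_example)
print(is_orthogonal, satisfies_714, matches_example)
```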
7.4 PROBLEM 2 - THE GENERAL LEAST SQUARES PROBLEM
The linear regression problem presents itself in the following form:
$N$ samples from a population are considered. The $i$th sample consists of an observation from a variable $Y$ and observations from variables $X_1, X_2, \ldots, X_n$ (say).
We assume a linear relationship between the variables. If the variables are measured from zero, the relationship is of the form
$$y_i = b_0 + b_1x_{i1} + b_2x_{i2} + \cdots + b_nx_{in} + e_i. \tag{7.16}$$
If the observations are measured from their means over the $N$ samples, then
$$y_i = b_1x_{i1} + b_2x_{i2} + \cdots + b_nx_{in} + e_i \qquad (i = 1, 2, \ldots, N). \tag{7.17}$$
$b_0, b_1, b_2, \ldots, b_n$ are estimated parameters and $e_i$ is the corresponding residual.
In matrix notation we can write the above equations as
$$y = Xb + e \tag{7.18}$$
where
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \quad
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}, \quad
e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix}$$
and
$$X = \begin{bmatrix}
1 & x_{12} & \cdots & x_{1n} \\
1 & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & & \vdots \\
1 & x_{N2} & \cdots & x_{Nn}
\end{bmatrix}
\quad\text{or}\quad
X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1n} \\
x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{Nn}
\end{bmatrix}.$$
As already indicated, the 'goodness of fit' criterion is the minimisation with respect to the parameters $b$ of the sum of the squares of the residuals, which in this case is
$$S = e'e = (y' - b'X')(y - Xb).$$
Making use of the results in table (4.4), we obtain
$$\frac{\partial (e'e)}{\partial b} = -(y'X)' - X'y + (X'Xb + X'Xb) = -2X'y + 2X'Xb$$
$$= 0 \quad\text{for}\quad X'X\hat{b} = X'y \tag{7.19}$$
where $\hat{b}$ is the least squares estimate of $b$.
If $(X'X)$ is non-singular, we obtain from (7.19)
$$\hat{b} = (X'X)^{-1}X'y. \tag{7.20}$$
We can write (7.19) as
$$X'(y - X\hat{b}) = 0 \quad\text{or}\quad X'e = 0 \tag{7.21}$$
which is the matrix form of the normal equations defined in section 7.2.
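The normal equations (7.19) and the orthogonality condition (7.21) are easy to illustrate on synthetic data. A minimal sketch, assuming NumPy; the data, the coefficients 1.5 and 0.8 and the noise level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 50
x = rng.uniform(0.0, 10.0, size=N)
y = 1.5 + 0.8 * x + rng.normal(scale=0.3, size=N)   # synthetic observations

X = np.column_stack([np.ones(N), x])        # column of ones carries the constant term
b_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solves X'X b = X'y, equation (7.19)
e = y - X @ b_hat                           # residuals

normal_eq_residual = np.max(np.abs(X.T @ e))        # (7.21): X'e should vanish
b_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]      # library least squares solver
print(b_hat, normal_eq_residual)
```

Solving the linear system is generally preferable to forming $(X'X)^{-1}$ explicitly as in (7.20), although both give the same estimate when $X'X$ is well conditioned.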
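Example 7.3
Obtain the normal equations for a least squares approximation when each sample consists of one observation from $Y$ and one observation from
(i) a random variable $X$,
(ii) two random variables $X$ and $Z$.

Solution
(i)
$$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{bmatrix}, \quad
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \quad
b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$
hence
$$X'[y - Xb] = \begin{bmatrix} \sum y_i - b_1N - b_2\sum x_i \\ \sum x_iy_i - b_1\sum x_i - b_2\sum x_i^2 \end{bmatrix}.$$
So that the normal equations are
$$\sum y_i = b_1N + b_2\sum x_i$$
and
$$\sum x_iy_i = b_1\sum x_i + b_2\sum x_i^2.$$
(ii) In this case
$$X = \begin{bmatrix} 1 & x_1 & z_1 \\ 1 & x_2 & z_2 \\ \vdots & \vdots & \vdots \\ 1 & x_N & z_N \end{bmatrix}, \quad
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \quad
b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}.$$
The normal equations are
$$\sum y_i = b_1N + b_2\sum x_i + b_3\sum z_i$$
$$\sum x_iy_i = b_1\sum x_i + b_2\sum x_i^2 + b_3\sum x_iz_i$$
and
$$\sum y_iz_i = b_1\sum z_i + b_2\sum x_iz_i + b_3\sum z_i^2.$$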
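7.5 PROBLEM 3 - MAXIMUM LIKELIHOOD ESTIMATE OF THE MULTIVARIATE NORMAL
Let $X_i$ $(i = 1, 2, \ldots, n)$ be $n$ random variables each having a normal distribution with mean $\mu_i$ and standard deviation $\sigma_i$, that is
$$X_i = n(\mu_i, \sigma_i). \tag{7.22}$$
The joint probability density function (p.d.f.) of the $n$ random variables is
$$f(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}|V|^{1/2}} \exp\left(-\tfrac{1}{2}(x - \mu)'V^{-1}(x - \mu)\right) \tag{7.23}$$
where
$$-\infty < x_i < \infty \quad (i = 1, 2, \ldots, n),$$
$$x' = [x_1\; x_2\; \ldots\; x_n], \quad \mu' = [\mu_1\; \mu_2\; \ldots\; \mu_n]$$
and
$$V = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{12} & \sigma_{22} & \cdots & \sigma_{2n} \\
\vdots & \vdots & & \vdots \\
\sigma_{1n} & \sigma_{2n} & \cdots & \sigma_{nn}
\end{bmatrix}$$
is the covariance matrix.
$$\sigma_{ij} = \rho_{ij}\sigma_i\sigma_j \quad (i \neq j), \qquad \sigma_{ii} = \sigma_i^2$$
are the covariances of the random variables.
$\rho_{ij}$ is the correlation coefficient between $X_i$ and $X_j$. The covariance matrix $V$ is symmetric and positive definite.
Equation (7.23) is called a multivariate normal p.d.f. Maximum likelihood estimates have certain properties (for example, they are asymptotically efficient) which make them very useful in estimation and hypothesis testing problems.
For a sample of $N$ observations from the multivariate normal distribution (7.23) the likelihood function is
$$L = \frac{1}{(2\pi)^{nN/2}|V|^{N/2}} \exp\left\{-\tfrac{1}{2}\sum_{i=1}^{N} (x_i - \mu)'V^{-1}(x_i - \mu)\right\}$$
so that
$$\log L = C - \frac{N}{2}\log |V| - \frac{1}{2}\sum_{i=1}^{N} (x_i - \mu)'V^{-1}(x_i - \mu) \tag{7.24}$$
where $C$ is a constant.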
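(a) The maximum likelihood estimate of $\mu$
On expanding the last term of (7.24), we obtain
$$-\frac{1}{2}\sum_{i=1}^{N} \{x_i'V^{-1}x_i - \mu'V^{-1}x_i - x_i'V^{-1}\mu + \mu'V^{-1}\mu\}.$$
With the help of table (4.4) and using the result
$$(x_i'V^{-1})' = V^{-1}x_i \quad\text{(since $V$ is symmetric)}$$
we differentiate with respect to $\mu$, to obtain
$$\frac{\partial \log L}{\partial \mu} = V^{-1}\sum_{i=1}^{N} (x_i - \mu)$$
$$= 0 \quad\text{when}\quad \hat{\mu} = \frac{\sum x_i}{N} = \bar{x}.$$
Hence the maximum likelihood estimate of $\mu$ is $\hat{\mu} = \bar{x}$, the sample mean.

(b) The maximum likelihood estimate of $V$
We note the following results:
(1)
$$\sum_{i=1}^{N} y_i'V^{-1}y_i = \operatorname{tr}(Y'V^{-1}Y) = \operatorname{tr}(YY'V^{-1})$$
where $Y = [y_1\; y_2\; \ldots\; y_N]$
and $y_i = x_i - \mu$ $(i = 1, 2, \ldots, N)$.
$V^{-1}$ is a symmetric matrix.
(2) By Example 5.3, but taking account of the symmetry of $V^{-1}$ (see Example 4.4),
$$\frac{\partial \log |V^{-1}|}{\partial V^{-1}} = 2V - \operatorname{diag}\{V\}.$$
(3) If $X$ is a symmetric matrix
$$\frac{\partial \operatorname{tr}(AX)}{\partial X} = A + A' - \operatorname{diag}\{A\}.$$
Let $A = YY'$ and $X = V^{-1}$; then
$$\frac{\partial \operatorname{tr}(YY'V^{-1})}{\partial V^{-1}} = 2YY' - \operatorname{diag}\{YY'\}.$$
We now write (7.24) as
$$\log L = C + \frac{N}{2}\log |V^{-1}| - \frac{1}{2}\operatorname{tr}(YY'V^{-1}).$$
Differentiating $\log L$ with respect to $V^{-1}$, using the estimate $\hat{\mu} = \bar{x}$, and the results (2) and (3) above, we obtain
$$\frac{\partial \log L}{\partial V^{-1}} = \frac{N}{2}\,[2V - \operatorname{diag}\{V\}] - YY' + \frac{1}{2}\operatorname{diag}\{YY'\}.$$
Let $Q = NV - YY'$; then
$$\frac{\partial \log L}{\partial V^{-1}} = Q - \frac{1}{2}\operatorname{diag}\{Q\}$$
$$= 0 \quad\text{when}\quad 2Q = \operatorname{diag}\{Q\}.$$
Since $Q$ is symmetric, the only solution to the above equation is
$$Q = 0.$$
It follows that the maximum likelihood estimate of $V$ is
$$\hat{V} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})'}{N}.$$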
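The two estimates can be computed directly from a sample. This is a sketch on synthetic data, assuming NumPy; the sample, its size and the comparison matrix `V_other` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(8)
n, N = 3, 200
samples = rng.standard_normal((N, n)) @ rng.standard_normal((n, n))   # rows are x_i'

x_bar = samples.mean(axis=0)              # maximum likelihood estimate of mu
Y = (samples - x_bar).T                   # columns y_i = x_i - x_bar, order (n x N)
V_hat = (Y @ Y.T) / N                     # maximum likelihood estimate of V

# np.cov with bias=True also divides by N rather than N - 1.
agrees_with_numpy = np.allclose(V_hat, np.cov(samples, rowvar=False, bias=True))

def log_likelihood(V):
    """log L of (7.24) without the constant C, evaluated at mu = x_bar."""
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * N * logdet - 0.5 * np.trace(Y @ Y.T @ np.linalg.inv(V))

# A different positive definite V should not give a larger likelihood.
V_other = V_hat + 0.1 * np.eye(n)
is_maximum = log_likelihood(V_hat) >= log_likelihood(V_other)
print(agrees_with_numpy, is_maximum)
```

Note the divisor $N$: the maximum likelihood estimate differs from the usual unbiased sample covariance, which divides by $N - 1$.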
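7.6 PROBLEM 4 - EVALUATION OF THE JACOBIANS OF SOME TRANSFORMATIONS
The interest in Jacobians arises from their importance, particularly with reference to a change of variables in multiple integration.
In terms of scalars, the problem presents itself in the following way.
We consider a multiple integral over a subset $R$ of an $n$-dimensional space
$$\int_R f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \ldots dx_n \tag{7.25}$$
where $f$ is a piecewise continuous function in $R$.
We consider a one to one transformation which maps $R$ onto a subset $T$
$$y_1 = u_1(x), \quad y_2 = u_2(x), \quad \ldots, \quad y_n = u_n(x) \tag{7.26}$$
and the inverse transformation
$$x_1 = w_1(y), \quad x_2 = w_2(y), \quad \ldots, \quad x_n = w_n(y) \tag{7.27}$$
where
$$x' = [x_1, x_2, \ldots, x_n] \quad\text{and}\quad y' = [y_1, y_2, \ldots, y_n].$$
Assuming the first partial derivatives of the inverse transformation (7.27) to be continuous, (7.25) can be expressed as
$$\int_T f[w_1(y), w_2(y), \ldots, w_n(y)]\,|J|\,dy_1\,dy_2 \ldots dy_n \tag{7.28}$$
where $|J|$ can be written as
$$|J| = \operatorname{mod}\begin{vmatrix}
\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_2}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_1} \\
\dfrac{\partial x_1}{\partial y_2} & \dfrac{\partial x_2}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_2} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial x_1}{\partial y_n} & \dfrac{\partial x_2}{\partial y_n} & \cdots & \dfrac{\partial x_n}{\partial y_n}
\end{vmatrix} \tag{7.29}$$
subject to $|J|$ not vanishing identically in $T$.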
Example 7.4
Let
$$I = \iint_R \exp\{-2x_1 + 3x_2\}\,dx_1\,dx_2, \qquad 0 < x_1 < \infty,\; 0 < x_2 < \infty.$$
Consider the transformation
$$y_1 = 2x_1 - x_2$$
$$y_2 = x_2.$$
Write down the integral corresponding to (7.28).

Solution
We are given
$$R = \{(x_1, x_2) : 0 < x_1 < \infty,\; 0 < x_2 < \infty\}.$$
The above transformation (corresponding to (7.26)) results in the following inverse transformation (7.27)
$$x_1 = \tfrac{1}{2}(y_1 + y_2)$$
$$x_2 = y_2$$
which defines
$$T = \{(y_1, y_2) : y_2 > 0,\; y_2 > -y_1,\; -\infty < y_1 < \infty\},$$
and by (7.29)
$$|J| = \operatorname{mod}\begin{vmatrix} \tfrac{1}{2} & 0 \\ \tfrac{1}{2} & 1 \end{vmatrix} = \tfrac{1}{2}.$$
Hence
$$I = \tfrac{1}{2}\iint_T f[\tfrac{1}{2}(y_1 + y_2), y_2]\,dy_1\,dy_2 = \tfrac{1}{2}\iint_T \exp(-y_1 + 2y_2)\,dy_1\,dy_2.$$
Our main interest in this section is to evaluate Jacobians when the transfor-mation corresponding to (7.26) is expressed in matrix form, for example as
Y = AXB (7.30)
where A, X and B are all assumed to be of order (n X n).As in section 5.2 (see (5.1) and (5.2)) we can write (7.6) as
y =Px (7.31)
where y=vecY,x=vecXandP=B'®A.In this case
ay=BOA'ax
andax
= [B ®A']-t = B-t ® (A')-t by (2.12)ay
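It follows that
$$|J| = \left|\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X}\right|^{-1} = |B|^{-n}\, |A|^{-n} \quad \text{(by Property X, p. 27)} \tag{7.32}$$
Example 7.5
Consider the transformation
$$Y = AXB$$
where
$$A = \begin{bmatrix} 2 & -4 \\ -1 & 3 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}.$$
Find the Jacobian of this transformation
(i) By a direct method
(ii) Using (7.32).
Solution
(i) Writing $\operatorname{vec} X = [x_1\ x_2\ x_3\ x_4]'$ and $\operatorname{vec} Y = [y_1\ y_2\ y_3\ y_4]'$, we have
$$X = A^{-1} Y B^{-1} = \frac{1}{2}\begin{bmatrix} 3y_1 + 4y_2 - 3y_3 - 4y_4 & -3y_1 - 4y_2 + 6y_3 + 8y_4 \\ y_1 + 2y_2 - y_3 - 2y_4 & -y_1 - 2y_2 + 2y_3 + 4y_4 \end{bmatrix}$$
so that
$$|J| = \left|\frac{\partial x}{\partial y}\right| = \left(\tfrac{1}{2}\right)^4 \begin{vmatrix} 3 & 1 & -3 & -1 \\ 4 & 2 & -4 & -2 \\ -3 & -1 & 6 & 2 \\ -4 & -2 & 8 & 4 \end{vmatrix} = \tfrac{1}{4}.$$
(ii) $|A| = 2$, $|B| = 1$, hence $|J| = |B|^{-2}|A|^{-2} = \tfrac{1}{4}$.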
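The agreement between the direct method and (7.32) can also be checked mechanically. The following is a minimal sketch in exact rational arithmetic, assuming the matrices $A$ and $B$ shown in the code (illustrative 2 x 2 data); the helper names `kron`, `transpose` and `det` are introduced here for the illustration only and are not part of the text.

```python
from fractions import Fraction

def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def transpose(M):
    return [list(col) for col in zip(*M)]

def det(M):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return d

# Assumed illustrative data: A and B of order (2 x 2).
A = [[2, -4], [-1, 3]]
B = [[2, 1], [1, 1]]
n = len(A)

dy_dx = kron(B, transpose(A))                        # d vec Y / d vec X = B (x) A'
jacobian_direct = 1 / det(dy_dx)                     # |d vec Y / d vec X|^(-1)
jacobian_formula = det(B) ** (-n) * det(A) ** (-n)   # right-hand side of (7.32)
print(jacobian_direct, jacobian_formula)
```

Both quantities are computed independently, one from the determinant of the Kronecker product and one from the determinants of the factors, so equality is a check of Property X rather than a restatement of it.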
Similarly, we can use the theory developed in this book to evaluate the Jacobians of many other transformations.
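Example 7.6
Evaluate the Jacobian associated with each of the following transformations
(i) $Y = X^{-1}$
(ii) $Y = X^2$.
Solution
(i) From Example 5.2
$$\frac{\partial y}{\partial x} = -X^{-1} \otimes (X^{-1})'$$
so that
$$\frac{\partial x}{\partial y} = -X \otimes X' .$$
Hence
$$J = \operatorname{mod}\left|\frac{\partial x}{\partial y}\right| = |X \otimes X'| = |X|^{n}\,|X|^{n} = |X|^{2n},$$
which, expressed in terms of $Y$, is $|Y|^{-2n}$.
(ii) From section 4.6
$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}X + XE_{rs}$$
so that by the 2nd transformation principle (see section 5.3)
$$\frac{\partial y}{\partial x} = X \otimes I + I \otimes X'$$
and
$$J = \operatorname{mod}\,|X \otimes I + I \otimes X'|^{-1} .$$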
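Both Jacobians can be evaluated for a particular matrix. The sketch below assumes a hypothetical non-singular $X$ of order 2 (chosen only for illustration) and evaluates $\operatorname{mod}|-X \otimes X'|$ for $Y = X^{-1}$ and $\operatorname{mod}|X \otimes I + I \otimes X'|^{-1}$ for $Y = X^2$; the helper functions are defined for this example and do not come from the text.

```python
from fractions import Fraction

def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def transpose(M):
    return [list(col) for col in zip(*M)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def det(M):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return d

X = [[2, 1], [1, 3]]          # hypothetical X with |X| = 5
n = len(X)
I = identity(n)

# (i) Y = X^{-1}:  dx/dy = -(X (x) X')
dx_dy = [[-v for v in row] for row in kron(X, transpose(X))]
J_inverse = abs(det(dx_dy))

# (ii) Y = X^2:  dy/dx = X (x) I + I (x) X'
dy_dx = add(kron(X, I), kron(I, transpose(X)))
J_square = 1 / abs(det(dy_dx))

print(J_inverse, det(X) ** (2 * n), J_square)
```

For case (i) the value agrees with $|X|^{2n}$; case (ii) has no comparably simple closed form in terms of $|X|$ alone, since the determinant is the product of all pairwise sums of eigenvalues of $X$.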
7.7 PROBLEM 5 - TO FIND THE DERIVATIVE OF AN EXPONENTIAL MATRIX WITH RESPECT TO A MATRIX
Since we make use of the spectral decomposition of an exponential matrix, we now discuss this technique briefly.
Assume that the matrix $Q = [q_{ij}]$ of order $(n \times n)$ has eigenvalues
$$\lambda_1, \lambda_2, \ldots, \lambda_n$$
(not necessarily distinct) and corresponding linearly independent eigenvectors
$$x_1, x_2, \ldots, x_n .$$
The eigenvectors of $Q'$ are
$$y_1, y_2, \ldots, y_n .$$
These two sets of eigenvectors have the property
$$x_i' y_j = 0 \quad \text{or (equivalently)} \quad y_j' x_i = 0 \qquad (i \neq j) \tag{7.33}$$
and can be normalised so that
$$x_i' y_i = 1 \quad \text{or} \quad y_i' x_i = 1 \qquad (i = 1, 2, \ldots, n). \tag{7.34}$$
Sets of eigenvectors $\{x_i\}$ and $\{y_i\}$ having the properties (7.33) and (7.34) are said to be properly normalised.
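It is well known (see [18] p. 227) that
$$\exp(Qt) = P \operatorname{diag}\{e^{\lambda_1 t}, e^{\lambda_2 t}, \ldots, e^{\lambda_n t}\}\, P^{-1}$$
where $P$ is the modal matrix of $Q$, that is the matrix
$$P = [x_1\ x_2\ \ldots\ x_n] . \tag{7.35}$$
It follows from (7.33), (7.34) and (7.35) that
$$P^{-1} = [y_1, y_2, \ldots, y_n]' . \tag{7.36}$$
Hence
$$\exp(Qt) = [x_1\ x_2\ \ldots\ x_n] \begin{bmatrix} e^{\lambda_1 t} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2 t} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & e^{\lambda_n t} \end{bmatrix} \begin{bmatrix} y_1' \\ y_2' \\ \vdots \\ y_n' \end{bmatrix}$$
$$= x_1 y_1' \exp(\lambda_1 t) + x_2 y_2' \exp(\lambda_2 t) + \ldots + x_n y_n' \exp(\lambda_n t),$$
that is
$$\exp(Qt) = \sum_{i=1}^{n} x_i y_i' \exp(\lambda_i t) . \tag{7.37}$$
The right hand side of (7.37) is known as the spectral representation (or spectral decomposition) of the exponential matrix $\exp(Qt)$.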
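The spectral representation (7.37) can be compared numerically with the defining power series of the exponential matrix. The sketch below assumes a hypothetical matrix $Q$ with eigenvalues $-1$ and $-2$ and properly normalised eigenvectors worked out by hand; the matrix, the vectors and the function names are illustrative choices, not taken from the text.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def outer(x, y):
    """Column vector x times row vector y'."""
    return [[xi * yj for yj in y] for xi in x]

def expm_series(Q, t, terms=40):
    """exp(Qt) from the truncated power series sum_k (Qt)^k / k!."""
    n = len(Q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v * t / k for v in row] for row in matmul(term, Q)]
        result = [[r + s for r, s in zip(rr, tt)]
                  for rr, tt in zip(result, term)]
    return result

def expm_spectral(eigs, xs, ys, t):
    """exp(Qt) = sum_i x_i y_i' exp(lambda_i t), as in (7.37)."""
    n = len(xs)
    total = [[0.0] * n for _ in range(n)]
    for lam, x, y in zip(eigs, xs, ys):
        w = math.exp(lam * t)
        total = [[a + w * p for a, p in zip(ra, rp)]
                 for ra, rp in zip(total, outer(x, y))]
    return total

# Hypothetical example: Q has eigenvalues -1 and -2.
Q = [[0.0, 1.0], [-2.0, -3.0]]
eigs = [-1.0, -2.0]
xs = [[1.0, -1.0], [1.0, -2.0]]    # eigenvectors of Q
ys = [[2.0, 1.0], [-1.0, -1.0]]    # eigenvectors of Q', with x_i' y_j = delta_ij

t = 0.7
series = expm_series(Q, t)
spectral = expm_spectral(eigs, xs, ys, t)
max_diff = max(abs(a - b) for ra, rb in zip(series, spectral)
               for a, b in zip(ra, rb))
print(max_diff)
```

The series routine is only a reference for small, well-scaled matrices; it is not meant as a general-purpose method of computing matrix exponentials.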
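We consider a very simple illustrative example.
Example 7.7
Find the spectral representation of the matrix $\exp(Qt)$, where
$$Q = \begin{bmatrix} 3 & 4 \\ -2 & -3 \end{bmatrix}.$$
Solution
Here
$$\lambda_1 = 1,\ \lambda_2 = -1; \qquad x_1' = [2\ \ {-1}],\ x_2' = [1\ \ {-1}],\ y_1' = [1\ \ 1],\ y_2' = [-1\ \ {-2}] .$$
By (7.37)
$$\exp(Qt) = \begin{bmatrix} 2 \\ -1 \end{bmatrix}[1\ \ 1] \exp(t) + \begin{bmatrix} 1 \\ -1 \end{bmatrix}[-1\ \ {-2}] \exp(-t) = \begin{bmatrix} 2 & 2 \\ -1 & -1 \end{bmatrix}\exp(t) + \begin{bmatrix} -1 & -2 \\ 1 & 2 \end{bmatrix}\exp(-t) .$$
Although we have considered matrices having real eigenvalues, and eigenvectors having real elements, the spectral decomposition (7.37) is also valid for complex elements, as can be shown by a slight modification of the above exposition.
By the use of (2.17), that is of the result
$$\exp(I \otimes Q) = I \otimes \exp(Q)$$
we generalise the result (7.37) to
$$\exp\{(I \otimes Q)t\} = \sum_i (I \otimes x_i y_i') \exp(\lambda_i t) . \tag{7.38}$$
We now consider the main problem, to obtain an expression for
$$\frac{\partial \Phi}{\partial Z}$$
where
$$\Phi(t) = \exp(Qt), \tag{7.39}$$
so that
$$\Phi(0) = I, \tag{7.40}$$
$$\frac{d\Phi}{dt} = Q\Phi \tag{7.41}$$
and $Z = [z_{ij}]$ is a matrix of order $(r \times s)$.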
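The matrix $Q$ is assumed to be a function of $Z$, that is $Q(Z)$. Making use of the result (6.5), we can write
$$\frac{d}{dt}\frac{\partial \Phi}{\partial Z} = \frac{\partial (Q\Phi)}{\partial Z} = \frac{\partial Q}{\partial Z}(I \otimes \Phi) + (I \otimes Q)\frac{\partial \Phi}{\partial Z} \tag{7.42}$$
and from (7.40)
$$\frac{\partial \Phi}{\partial Z}(0) = 0 . \tag{7.43}$$
We next make use of a generalisation of a well known result (see [19] p. 68). Given
$$\frac{dX}{dt} = RX + BU$$
and
$$X(0) = 0,$$
then
$$X = \int_0^t \exp\{R(t - \tau)\}\, B\, U(\tau)\, d\tau .$$
For
$$X = \frac{\partial \Phi}{\partial Z}, \quad R = I \otimes Q, \quad B = \frac{\partial Q}{\partial Z} \quad \text{and} \quad U = I \otimes \Phi$$
the solution to (7.42) subject to (7.43) becomes
$$\frac{\partial \Phi}{\partial Z} = \int_0^t \exp\{(I \otimes Q)(t - \tau)\}\, \frac{\partial Q}{\partial Z}\, [I \otimes \Phi(\tau)]\, d\tau . \tag{7.44}$$
Hence,
$$\frac{\partial \Phi}{\partial Z} = \int_0^t \sum_{i,j} (I \otimes x_i y_i') \exp(\lambda_i (t - \tau))\, \frac{\partial Q}{\partial Z}\, [I \otimes x_j y_j'] \exp(\lambda_j \tau)\, d\tau \quad \text{(by (7.37) and (7.38))}$$
$$= \sum_{i,j} (I \otimes x_i y_i')\, \frac{\partial Q}{\partial Z}\, (I \otimes x_j y_j') \exp(\lambda_i t) \int_0^t \exp((\lambda_j - \lambda_i)\tau)\, d\tau ,$$
that is
$$\frac{\partial \Phi}{\partial Z} = \sum_{i,j} (I \otimes x_i y_i')\, \frac{\partial Q}{\partial Z}\, (I \otimes x_j y_j') \exp(\lambda_i t)\, f_{ij}(t)$$
where
$$f_{ij}(t) = t \quad \text{if } \lambda_i = \lambda_j$$
and
$$f_{ij}(t) = \frac{1}{\lambda_j - \lambda_i}\left[\exp((\lambda_j - \lambda_i)t) - 1\right] \quad \text{if } \lambda_i \neq \lambda_j .$$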
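The final expression can be tested in the simplest setting, where $Z$ reduces to a single scalar $z$, so that each $I \otimes (\cdot)$ is just the matrix itself. The sketch below assumes a hypothetical dependence $Q(z) = Q + z\,D$ (so that $\partial Q/\partial z = D$) and compares the double sum over $i, j$ with a central finite difference of $\exp(Q(z)t)$; the matrices, step size and function names are all illustrative assumptions.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def outer(x, y):
    """Column vector x times row vector y'."""
    return [[xi * yj for yj in y] for xi in x]

def expm_series(Q, t, terms=60):
    """exp(Qt) from the truncated power series (small matrices only)."""
    n = len(Q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v * t / k for v in row] for row in matmul(term, Q)]
        result = [[r + s for r, s in zip(rr, tt)]
                  for rr, tt in zip(result, term)]
    return result

def f(lam_i, lam_j, t):
    """The factor f_ij(t): integral of exp((lam_j - lam_i) tau) over [0, t]."""
    if lam_i == lam_j:
        return t
    d = lam_j - lam_i
    return (math.exp(d * t) - 1.0) / d

def dphi_dz(eigs, xs, ys, dQ, t):
    """Double sum over i, j of x_i y_i' dQ x_j y_j' exp(lam_i t) f_ij(t)."""
    n = len(xs)
    total = [[0.0] * n for _ in range(n)]
    for li, xi, yi in zip(eigs, xs, ys):
        for lj, xj, yj in zip(eigs, xs, ys):
            M = matmul(matmul(outer(xi, yi), dQ), outer(xj, yj))
            w = math.exp(li * t) * f(li, lj, t)
            total = [[a + w * m for a, m in zip(ra, rm)]
                     for ra, rm in zip(total, M)]
    return total

# Hypothetical data: Q at z = 0 and the direction D = dQ/dz.
Q = [[0.0, 1.0], [-2.0, -3.0]]
D = [[0.0, 0.0], [0.0, 1.0]]
eigs = [-1.0, -2.0]
xs = [[1.0, -1.0], [1.0, -2.0]]    # eigenvectors of Q
ys = [[2.0, 1.0], [-1.0, -1.0]]    # eigenvectors of Q', properly normalised
t, h = 1.0, 1e-5

analytic = dphi_dz(eigs, xs, ys, D, t)
Q_plus = [[q + h * d for q, d in zip(rq, rd)] for rq, rd in zip(Q, D)]
Q_minus = [[q - h * d for q, d in zip(rq, rd)] for rq, rd in zip(Q, D)]
numeric = [[(p - m) / (2 * h) for p, m in zip(rp, rm)]
           for rp, rm in zip(expm_series(Q_plus, t), expm_series(Q_minus, t))]
max_error = max(abs(a - b) for ra, rb in zip(analytic, numeric)
                for a, b in zip(ra, rb))
print(max_error)
```

For a matrix $Z$ of order $(r \times s)$ the same double sum applies with the Kronecker factors restored; the scalar case is used here only because it keeps the check short.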
Solution to Problems
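CHAPTER 1
(1)
$$AB = \begin{bmatrix} A_{1\cdot}'B_{\cdot 1} & A_{1\cdot}'B_{\cdot 2} & A_{1\cdot}'B_{\cdot 3} \\ A_{2\cdot}'B_{\cdot 1} & A_{2\cdot}'B_{\cdot 2} & A_{2\cdot}'B_{\cdot 3} \\ A_{3\cdot}'B_{\cdot 1} & A_{3\cdot}'B_{\cdot 2} & A_{3\cdot}'B_{\cdot 3} \\ A_{4\cdot}'B_{\cdot 1} & A_{4\cdot}'B_{\cdot 2} & A_{4\cdot}'B_{\cdot 3} \end{bmatrix}$$
(2) (a) The $k$th column of $AE_{ik}$ is the $i$th column of $A$, all other columns are zero.
(b) The $i$th row of $E_{ik}A$ is the $k$th row of $A$, all other rows are zero.
$$AE_{ik} = Ae_ie_k' = A_{\cdot i}\,e_k'$$
$$E_{ik}A = e_ie_k'A = e_iA_{k\cdot}'$$
(3)
$$\operatorname{tr} ABC = \sum_i e_i'ABCe_i = \sum_i (e_i'A)B(Ce_i) = \sum_i A_{i\cdot}'\,B\,C_{\cdot i} .$$
(4)
$$\operatorname{tr} AE_{ij} = \sum_{k,r,s} e_k'\,a_{rs}E_{rs}E_{ij}\,e_k = \sum_{k,r,s} a_{rs}\,e_k'e_r\,e_s'e_i\,e_j'e_k = \sum_{k,r,s} a_{rs}\,\delta_{kr}\delta_{si}\delta_{jk} = a_{ji} .$$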
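(5)
$$A = \sum_{i,j} a_{ij}E_{ij} = \sum_{i,j} \operatorname{tr}(BE_{ij}')\,\delta_{ij}\,E_{ij} = \sum_{i,j,k} \delta_{ij}\; e_k'Be_j\, e_i'e_k\; E_{ij} = \sum_{i,j} \delta_{ij}\; e_i'Be_j\; E_{ij} = \sum_i b_{ii}E_{ii} = \operatorname{diag}\{B\} .$$
CHAPTER 2
(1) Since $U$ is an orthogonal matrix, the result follows. More formally,
$$\sum_{r,s} [E_{rs}(m \times n) \otimes E_{sr}(n \times m)] \sum_{p,q} [E_{qp}(n \times m) \otimes E_{pq}(m \times n)]$$
$$= \sum_{r,s} [E_{rs}(m \times n)E_{sr}(n \times m)] \otimes [E_{sr}(n \times m)E_{rs}(m \times n)] \quad \text{(all terms with } (p,q) \neq (r,s) \text{ vanish)}$$
$$= \sum_{r,s} [\delta_{ss}E_{rr}(m \times m)] \otimes [\delta_{rr}E_{ss}(n \times n)]$$
$$= \Big[\sum_r E_{rr}(m \times m)\Big] \otimes \Big[\sum_s E_{ss}(n \times n)\Big]$$
$$= I_m \otimes I_n = I_{mn},$$
and the result follows.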
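(3) (a)
$$A \otimes B = \begin{bmatrix} -2 & 2 & -1 & 1 \\ 4 & 0 & 2 & 0 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 2 & 0 \end{bmatrix}, \qquad B \otimes A = \begin{bmatrix} -2 & -1 & 2 & 1 \\ 0 & -1 & 0 & 1 \\ 4 & 2 & 0 & 0 \\ 0 & 2 & 0 & 0 \end{bmatrix}$$
(b)
$$U_1 = U_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$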
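(4) See [18] p. 228 for methods of calculating matrix exponentials.
(a)
$$\exp(A) = \begin{bmatrix} 2e - e^{-1} & 2(e - e^{-1}) \\ e^{-1} - e & 2e^{-1} - e \end{bmatrix}$$
(b)
$$\exp(A \otimes I) = \begin{bmatrix} 2e - e^{-1} & 0 & 2(e - e^{-1}) & 0 \\ 0 & 2e - e^{-1} & 0 & 2(e - e^{-1}) \\ -(e - e^{-1}) & 0 & -e + 2e^{-1} & 0 \\ 0 & -(e - e^{-1}) & 0 & -e + 2e^{-1} \end{bmatrix}$$
$$\exp(A) \otimes I = \begin{bmatrix} 2e - e^{-1} & 0 & 2(e - e^{-1}) & 0 \\ 0 & 2e - e^{-1} & 0 & 2(e - e^{-1}) \\ e^{-1} - e & 0 & 2e^{-1} - e & 0 \\ 0 & e^{-1} - e & 0 & 2e^{-1} - e \end{bmatrix}$$
Hence $\exp(A) \otimes I = \exp(A \otimes I)$.
(5) (a)
$$A^{-1} = \begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix}, \qquad B^{-1} = -\frac{1}{2}\begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix}$$
so that
$$A^{-1} \otimes B^{-1} = \frac{1}{2}\begin{bmatrix} -4 & 2 & -4 & 2 \\ 3 & -1 & 3 & -1 \\ 4 & -2 & 8 & -4 \\ -3 & 1 & -6 & 2 \end{bmatrix}$$
(b) As
$$A \otimes B = \begin{bmatrix} 2 & 4 & 1 & 2 \\ 6 & 8 & 3 & 4 \\ -1 & -2 & -1 & -2 \\ -3 & -4 & -3 & -4 \end{bmatrix},$$
it follows that
$$(A \otimes B)^{-1} = \begin{bmatrix} -2 & 1 & -2 & 1 \\ 3/2 & -1/2 & 3/2 & -1/2 \\ 2 & -1 & 4 & -2 \\ -3/2 & 1/2 & -3 & 1 \end{bmatrix}$$
This verifies (2.12).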
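The verification of (2.12) can be repeated in exact arithmetic. The sketch below assumes the matrices $A$ and $B$ that can be read off from the blocks of $A \otimes B$ above, forms $A^{-1} \otimes B^{-1}$ and multiplies it by $A \otimes B$; the helper names are introduced only for this illustration.

```python
from fractions import Fraction

def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inverse_2x2(M):
    """Inverse of a non-singular 2 x 2 matrix from the adjugate formula."""
    (a, b), (c, d) = M
    determinant = Fraction(a * d - b * c)
    return [[d / determinant, -b / determinant],
            [-c / determinant, a / determinant]]

# Assumed data, read from the blocks of A (x) B.
A = [[2, 1], [-1, -1]]
B = [[1, 2], [3, 4]]

candidate = kron(inverse_2x2(A), inverse_2x2(B))   # A^{-1} (x) B^{-1}
product = matmul(kron(A, B), candidate)            # should be the unit matrix
identity = [[Fraction(int(i == j)) for j in range(4)] for i in range(4)]
print(product == identity)
```

Since the product is the unit matrix of order 4, $A^{-1} \otimes B^{-1}$ is indeed the inverse of $A \otimes B$ for this pair.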
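(6) (a) For $A$: $\lambda_1 = -1$, $\lambda_2 = 2$, $x_1' = [1\ \ 4]$ and $x_2' = [1\ \ 1]$.
For $B$: $\mu_1 = 1$, $\mu_2 = 4$, $y_1' = [1\ \ {-1}]$ and $y_2' = [1\ \ 2]$.
(b)
$$A \otimes B = \begin{bmatrix} 6 & 3 & -2 & -1 \\ 6 & 9 & -2 & -3 \\ 8 & 4 & -4 & -2 \\ 8 & 12 & -4 & -6 \end{bmatrix} = C \text{ (say).}$$
$$C(\lambda) = |\lambda I - C| = \lambda^4 - 5\lambda^3 - 30\lambda^2 + 40\lambda + 64 = (\lambda + 1)(\lambda + 4)(\lambda - 2)(\lambda - 8) .$$
Hence the eigenvalues of $C$ are
$$\{-1, -4, 2, 8\} = \{\lambda_1\mu_1,\ \lambda_1\mu_2,\ \lambda_2\mu_1,\ \lambda_2\mu_2\} .$$
The corresponding eigenvectors of $C$ are:
$$\begin{bmatrix} 1 \\ -1 \\ 4 \\ -4 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 2 \\ 4 \\ 8 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 \\ 2 \\ 1 \\ 2 \end{bmatrix}.$$
(c) This verifies Property IX.
(7) For some non-singular $P$ and $Q$
$$A = P^{-1}CP \quad \text{and} \quad B = Q^{-1}DQ .$$
Hence
$$A \otimes B = P^{-1}CP \otimes Q^{-1}DQ$$
$$= (P^{-1} \otimes Q^{-1})(CP \otimes DQ) \quad \text{by (2.11)}$$
$$= (P \otimes Q)^{-1}(C \otimes D)(P \otimes Q) \quad \text{by (2.12) and (2.11)}$$
$$= R^{-1}(C \otimes D)R$$
where
$$R = P \otimes Q .$$
The result follows.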
CHAPTER 4
(1) ay
ax
2x22 0
x13 2x11
-x 21
0
(2) (a) L X I= x sin x -exp (2Y)
ajxj ex -cosxax x sin x
X I =exp (x) sin x - x cosx,
NX) x -2e2x
ax -2e2x sin x
(b)
(3)
(a)
0 1
0 0
0 0
ex -cos x
-x sin x
o sin xCx 0
X11X12 +X21X22
x12+x22
X13 X12 +X23 X22
x11 X12 X13
X21 X22
(b) Since Y13 = X11X13 + X21X23
3Y 13 F X13 0 x111
I X11 X12 X13
X23J
X1 1X13 +X21 X23
X12x13+X22X232 2
X13 +X23
rxll x211
+
110 0 00 0 01 +
Lx21 X22 X23J Ll 0 of
X12 x22
X13 X23J
X11
Lx21
0 0 0
1 0 0
which is the result in Example 4.8.
aIxIax
X II X12
X21 X22
2[X1] -diag (X11)
2x -2e"
C2ex 2 sin x
2 zX11 +X21
Y = x12x11 +x22x21
X13Xll +X23 X21
hence
ay
ax21
2x21 x22 X23
x22 0 0
x2J 0 0
From Example 4.8
aY = E21X+X'E21 =aX21
ax LX 23 0 X21 j
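(4) (a)
$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}AX + XAE_{rs}, \qquad \frac{\partial y_{ij}}{\partial X} = E_{ij}X'A' + A'X'E_{ij} .$$
(b)
$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}'AX' + X'AE_{rs}', \qquad \frac{\partial y_{ij}}{\partial X} = AX'E_{ij}' + E_{ij}'X'A .$$
(5) By (4.10)
$$\frac{\partial |Y|}{\partial x_{rs}} = \operatorname{tr}\{|Y|\,(Y^{-1})'B'E_{rs}'A'\} = |Y|\operatorname{tr}\{A'(Y^{-1})'B'E_{rs}'\} = |Y|\,(\operatorname{vec} E_{rs})'\operatorname{vec}[A'(Y^{-1})'B'] = |AXB|\,z_{rs}$$
where
$$[z_{rs}] = Z = A'[(AXB)^{-1}]'B' .$$
(6) (a) Since
$$\frac{\partial (XX')}{\partial x_{rs}} = E_{rs}X' + XE_{rs}',$$
$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}(X')^2 + XE_{rs}'X' + XX'E_{rs}' .$$
(b)
$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}'X'X + X'E_{rs}'X + (X')^2E_{rs} .$$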
CHAPTER 5
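(1) Since
$$\begin{bmatrix} y_{11} \\ y_{21} \\ y_{12} \\ y_{22} \end{bmatrix} = \begin{bmatrix} a_{11}x_{11} + a_{12}x_{12} \\ a_{21}x_{11} + a_{22}x_{12} \\ a_{11}x_{21} + a_{12}x_{22} \\ a_{21}x_{21} + a_{22}x_{22} \end{bmatrix}$$
$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = \begin{bmatrix} a_{11} & a_{21} & 0 & 0 \\ 0 & 0 & a_{11} & a_{21} \\ a_{12} & a_{22} & 0 & 0 \\ 0 & 0 & a_{12} & a_{22} \end{bmatrix}$$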
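(2) (a)
$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = (B \otimes A')_{(n)} \quad \text{by (5.18)}$$
(b)
$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = X \otimes I + I \otimes X' .$$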
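(3) (a)
$$\frac{\partial \operatorname{tr} Y}{\partial x_{rs}} = \operatorname{tr} AE_{rs}B = \operatorname{tr} E_{rs}'A'B' = (\operatorname{vec} E_{rs})'(\operatorname{vec} A'B'),$$
hence
$$\frac{\partial \operatorname{tr} Y}{\partial X} = A'B' .$$
(b)
$$\frac{\partial \operatorname{tr} Y}{\partial x_{rs}} = 2\operatorname{tr} E_{rs}'X',$$
hence
$$\frac{\partial \operatorname{tr} Y}{\partial X} = 2X' .$$
(c)
$$\frac{\partial \operatorname{tr} Y}{\partial x_{rs}} = 2\operatorname{tr} E_{rs}'X,$$
hence
$$\frac{\partial \operatorname{tr} Y}{\partial X} = 2X .$$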
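(4) (a)
$$\frac{\partial \operatorname{tr} Y}{\partial x_{rs}} = -\operatorname{tr} X^{-1}E_{rs}X^{-1} = -\operatorname{tr} E_{rs}'(X^{-2})',$$
hence
$$\frac{\partial \operatorname{tr} Y}{\partial X} = -(X^{-2})' .$$
(b)
$$\frac{\partial \operatorname{tr} Y}{\partial x_{rs}} = -\operatorname{tr} AX^{-1}E_{rs}X^{-1}B,$$
hence
$$\frac{\partial \operatorname{tr} Y}{\partial X} = -(X^{-1}BAX^{-1})' .$$
(c)
$$\frac{\partial \operatorname{tr} Y}{\partial x_{rs}} = \operatorname{tr} E_{rs}X^{n-1} + \operatorname{tr} XE_{rs}X^{n-2} + \ldots + \operatorname{tr} X^{n-1}E_{rs},$$
hence
$$\frac{\partial \operatorname{tr} Y}{\partial X} = n(X^{n-1})' .$$
(d)
$$\exp(X) = I + X + \frac{1}{2!}X^2 + \frac{1}{3!}X^3 + \ldots$$
hence by the result (c) above
$$\frac{\partial \operatorname{tr} Y}{\partial X} = \exp(X') .$$
(5) (a) (i)
$$dY = \begin{bmatrix} a_{11}dx_{11} + a_{12}dx_{21} & a_{11}dx_{12} + a_{12}dx_{22} \\ a_{21}dx_{11} + a_{22}dx_{21} & a_{21}dx_{12} + a_{22}dx_{22} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\begin{bmatrix} dx_{11} & dx_{12} \\ dx_{21} & dx_{22} \end{bmatrix} = A(dX) .$$
(ii)
$$dY = \begin{bmatrix} 2x_{11}dx_{11} + 2x_{21}dx_{21} & x_{11}dx_{12} + x_{12}dx_{11} + x_{22}dx_{21} + x_{21}dx_{22} \\ x_{11}dx_{12} + x_{12}dx_{11} + x_{21}dx_{22} + x_{22}dx_{21} & 2x_{12}dx_{12} + 2x_{22}dx_{22} \end{bmatrix}$$
$$= \begin{bmatrix} dx_{11} & dx_{21} \\ dx_{12} & dx_{22} \end{bmatrix}\begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} + \begin{bmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \end{bmatrix}\begin{bmatrix} dx_{11} & dx_{12} \\ dx_{21} & dx_{22} \end{bmatrix} = (dX)'X + X'(dX) .$$
(iii)
$$dY = \begin{bmatrix} 2x_{11}dx_{11} + x_{12}dx_{21} + x_{21}dx_{12} & x_{11}dx_{12} + x_{12}dx_{11} + x_{12}dx_{22} + x_{22}dx_{12} \\ x_{11}dx_{21} + x_{21}dx_{11} + x_{22}dx_{21} + x_{21}dx_{22} & x_{21}dx_{12} + x_{12}dx_{21} + 2x_{22}dx_{22} \end{bmatrix}$$
$$= \begin{bmatrix} x_{11}dx_{11} + x_{12}dx_{21} & x_{11}dx_{12} + x_{12}dx_{22} \\ x_{21}dx_{11} + x_{22}dx_{21} & x_{21}dx_{12} + x_{22}dx_{22} \end{bmatrix} + \begin{bmatrix} x_{11}dx_{11} + x_{21}dx_{12} & x_{12}dx_{11} + x_{22}dx_{12} \\ x_{11}dx_{21} + x_{21}dx_{22} & x_{12}dx_{21} + x_{22}dx_{22} \end{bmatrix}$$
$$= X(dX) + (dX)X .$$
(b) Write $Y = UV$ where $U = AX$ and $V = BX$, then
$$dY = U(dV) + (dU)V = AXB(dX) + A(dX)BX .$$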
CHAPTER 6
1( ) -3111(x12 + x22) X21 0 0
ay x12exiiXis 0 x11eX"ix'a x22
ax 0 x11 -sin (x12 +x22) 0
0 0 0 X12
(2)
ax1 0
ax0 0
ax0 0
0 0 1 0 0 0 and so on,axl
1ax l2 ax 13
hence by (
0
6.1)
0 0 0 1 0
1 0 0 1
0 0 0 0
0 0 0 0
ax0 0 0 0
ax1 0 0 1 = U.000000000 0 0 0
1001
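(3) Since
$$X^{-1} = \frac{1}{\Delta}\begin{bmatrix} x_{22} & -x_{12} \\ -x_{21} & x_{11} \end{bmatrix}$$
where $\Delta = x_{11}x_{22} - x_{12}x_{21}$, we can calculate $\partial X^{-1}/\partial x_{rs}$, for example
$$\frac{\partial X^{-1}}{\partial x_{11}} = \frac{1}{\Delta^2}\begin{bmatrix} -x_{22}^2 & x_{12}x_{22} \\ x_{21}x_{22} & -x_{12}x_{21} \end{bmatrix}.$$
Hence
$$\frac{\partial X^{-1}}{\partial X} = \frac{1}{\Delta^2}\begin{bmatrix} -x_{22}^2 & x_{12}x_{22} & x_{21}x_{22} & -x_{11}x_{22} \\ x_{21}x_{22} & -x_{12}x_{21} & -x_{21}^2 & x_{11}x_{21} \\ x_{12}x_{22} & -x_{12}^2 & -x_{12}x_{21} & x_{11}x_{12} \\ -x_{11}x_{22} & x_{11}x_{12} & x_{11}x_{21} & -x_{11}^2 \end{bmatrix}$$
$$= -\frac{1}{\Delta^2}\begin{bmatrix} x_{22} & -x_{12} & 0 & 0 \\ -x_{21} & x_{11} & 0 & 0 \\ 0 & 0 & x_{22} & -x_{12} \\ 0 & 0 & -x_{21} & x_{11} \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_{22} & -x_{12} & 0 & 0 \\ -x_{21} & x_{11} & 0 & 0 \\ 0 & 0 & x_{22} & -x_{12} \\ 0 & 0 & -x_{21} & x_{11} \end{bmatrix}$$
$$= -(I \otimes X^{-1})\,\bar{U}\,(I \otimes X^{-1}) .$$
(4)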
-'a11x22 -atlx12 a12x22 -a12x 12
A ©XA -a1 1x 21 a11x11 -a12x21 a12x11
a21x 22 -a21x12 a22x22 -a22x 12
-a2 1 x 21 a21x l t -a22x21 a22x i 1
where A = x11x2
We can now calcu
a (A (D X -')/
2 -x12xlate
axrs
21
axrs
and form
0 0 0 0 0 -all 0 -a12
0 all 0 a 12 0 0 0 0
0 0 0 0 0 all 0 -a22
a(A ©X-') 0 a21 0 a 22 0 0 0 0
ax A 0 0 0 0 all 0 a12 0
- a ll 0 -a12 0 0 0 0 0
0 0 0 0 a21 0 a22 0
- a2 1 0 -a22 0 0 0 0 0
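Tables of Formulae and Derivatives
Table 1
Notation used: $A = [a_{ij}]$, $B = [b_{ij}]$
$E_{ij} = e_ie_j'$
$\delta_{ij} = e_i'e_j = e_j'e_i$
$E_{ij}e_r = \delta_{jr}e_i$
$E_{ij}E_{rs} = \delta_{jr}E_{is}$
$E_{ij}E_{js}E_{sm} = E_{im}$
$E_{ij}E_{rs} = 0$ iff $j \neq r$
$A = \sum_i \sum_j a_{ij}E_{ij}$
$A_{\cdot j} = Ae_j$
$A_{i\cdot} = A'e_i$
$E_{ij}AE_{rs} = a_{jr}E_{is}$
$\operatorname{tr} AB = \sum_i \sum_j a_{ij}b_{ji}$
$\operatorname{tr} AB' = \operatorname{tr} A'B$
$\operatorname{tr} AB = (\operatorname{vec} A')'\operatorname{vec} B$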
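Table 2
$A \otimes B = [a_{ij}B]$
$A \otimes (\alpha B) = \alpha(A \otimes B)$
$(A + B) \otimes C = A \otimes C + B \otimes C$
$A \otimes (B + C) = A \otimes B + A \otimes C$
$A \otimes (B \otimes C) = (A \otimes B) \otimes C$
$(A \otimes B)' = A' \otimes B'$
$(A \otimes B)(C \otimes D) = AC \otimes BD$
$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$
$\operatorname{vec}(AYB) = (B' \otimes A)\operatorname{vec} Y$
$|A \otimes B| = |A|^m |B|^n$ when $A$ and $B$ are of order $(n \times n)$ and $(m \times m)$ respectively
$A \otimes B = U_1(B \otimes A)U_2$, where $U_1$ and $U_2$ are permutation matrices
$\operatorname{tr}(A \otimes B) = \operatorname{tr} A \operatorname{tr} B$
$A \oplus B = A \otimes I_m + I_n \otimes B$
$U = \sum_r \sum_s E_{rs} \otimes E_{rs}'$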
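Two of the entries in Table 2, the mixed product rule $(A \otimes B)(C \otimes D) = AC \otimes BD$ and the identity $\operatorname{vec}(AYB) = (B' \otimes A)\operatorname{vec} Y$, lend themselves to a quick numerical check. The following sketch uses small integer matrices chosen arbitrarily for illustration; the function names are assumptions of this example, with `vec` stacking the columns of its argument.

```python
def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def vec(M):
    """Stack the columns of M into a single list."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

# Arbitrary illustrative matrices.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
C = [[2, 0], [1, 1]]
D = [[1, -1], [0, 2]]
Y = [[1, 2], [3, 4]]

mixed_lhs = matmul(kron(A, B), kron(C, D))
mixed_rhs = kron(matmul(A, C), matmul(B, D))

vec_lhs = vec(matmul(matmul(A, Y), B))
vec_rhs = matvec(kron(transpose(B), A), vec(Y))

print(mixed_lhs == mixed_rhs, vec_lhs == vec_rhs)
```

Integer entries are used so that the comparisons are exact and no tolerance is needed.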
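Table 3
$\dfrac{\partial (Ax)}{\partial x} = A'$
$\dfrac{\partial (x'A)}{\partial x} = A$
$\dfrac{\partial (x'x)}{\partial x} = 2x$
$\dfrac{\partial (x'Ax)}{\partial x} = Ax + A'x$
$\dfrac{\partial z}{\partial x} = \dfrac{\partial y}{\partial x}\dfrac{\partial z}{\partial y}$
$\dfrac{\partial f(X)}{\partial X} = \sum_i \sum_j E_{ij}\dfrac{\partial f(X)}{\partial x_{ij}}$
$\dfrac{\partial |X|}{\partial X} = |X|(X^{-1})'$, when elements of $X$ are independent
$\dfrac{\partial |X|}{\partial X} = 2[X_{ij}] - \operatorname{diag}\{X_{ii}\}$, when $X$ is symmetric
Table 4
$\dfrac{\partial X}{\partial x_{rs}} = E_{rs}$
$\dfrac{\partial (AXB)}{\partial x_{rs}} = AE_{rs}B$
$\dfrac{\partial (AX'B)}{\partial x_{rs}} = AE_{rs}'B$
$\dfrac{\partial (X'A'AX)}{\partial x_{rs}} = E_{rs}'A'AX + X'A'AE_{rs}$
$\dfrac{\partial (AX^{-1}B)}{\partial x_{rs}} = -AX^{-1}E_{rs}X^{-1}B$
$\dfrac{\partial (X'AX)}{\partial x_{rs}} = E_{rs}'AX + X'AE_{rs}$
$\dfrac{\partial (X^n)}{\partial x_{rs}} = \sum_{k=0}^{n-1} X^kE_{rs}X^{n-k-1}$
$\dfrac{\partial (X^{-n})}{\partial x_{rs}} = -X^{-n}\Big[\sum_{k=0}^{n-1} X^kE_{rs}X^{n-k-1}\Big]X^{-n}$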
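Table 5
$\dfrac{\partial \operatorname{vec}(AXB)}{\partial \operatorname{vec} X} = B \otimes A'$
$\dfrac{\partial \operatorname{vec}(X'AX)}{\partial \operatorname{vec} X} = U'(AX \otimes I) + (I \otimes A'X)$
$\dfrac{\partial \operatorname{vec}(AX^{-1}B)}{\partial \operatorname{vec} X} = -(X^{-1}B) \otimes (X^{-1})'A'$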
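The entry for $\operatorname{vec}(AXB)$ can be confirmed by building the derivative matrix one row at a time. In the sketch below the row belonging to $x_{rs}$ is taken to be $(\operatorname{vec}\,\partial Y/\partial x_{rs})'$ with $\partial Y/\partial x_{rs} = AE_{rs}B$, and the rows are ordered as the elements of $\operatorname{vec} X$; the matrices and helper names are illustrative assumptions.

```python
def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def vec(M):
    """Stack the columns of M into a single list."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def unit(r, s, rows, cols):
    """Elementary matrix E_rs (zero-based indices) of order rows x cols."""
    return [[int(i == r and j == s) for j in range(cols)] for i in range(rows)]

# Arbitrary illustrative matrices of order (2 x 2).
A = [[1, 2], [3, 4]]
B = [[0, 1], [-1, 2]]
n = 2

# Row for x_rs is vec(A E_rs B)'; vec X runs down the columns of X.
derivative = [vec(matmul(matmul(A, unit(r, s, n, n)), B))
              for s in range(n) for r in range(n)]
expected = kron(B, transpose(A))
print(derivative == expected)
```

Because $Y = AXB$ is linear in $X$, no finite differences are needed and the comparison is exact.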
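Table 6
$\dfrac{\partial \log |X|}{\partial X} = (X^{-1})'$
$\dfrac{\partial |X|^r}{\partial X} = r|X|^r(X^{-1})'$
$\dfrac{\partial \operatorname{tr}(AX)}{\partial X} = A'$
$\dfrac{\partial \operatorname{tr}(A'X)}{\partial X} = A$
$\dfrac{\partial \operatorname{tr}(X'AXB)}{\partial X} = AXB + A'XB'$
$\dfrac{\partial \operatorname{tr}(XX')}{\partial X} = 2X$
$\dfrac{\partial \operatorname{tr}(X^n)}{\partial X} = n(X^{n-1})'$
$\dfrac{\partial \operatorname{tr}(e^X)}{\partial X} = e^{X'}$
$\dfrac{\partial \operatorname{tr}(AX^{-1}B)}{\partial X} = -(X^{-1}BAX^{-1})'$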
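Table 7
$\dfrac{\partial Y}{\partial X} = \sum_r \sum_s E_{rs} \otimes \dfrac{\partial Y}{\partial x_{rs}}$
$\dfrac{\partial X}{\partial X} = \bar{U}$ (elements of $X$ independent)
$\dfrac{\partial X}{\partial X} = \bar{U} + U - \sum_r E_{rr} \otimes E_{rr}$ ($X$ symmetric)
$\dfrac{\partial X'}{\partial X} = U$
$\dfrac{\partial (XY)}{\partial Z} = \dfrac{\partial X}{\partial Z}(I \otimes Y) + (I \otimes X)\dfrac{\partial Y}{\partial Z}$
$\dfrac{\partial X^{-1}}{\partial X} = -(I \otimes X^{-1})\,\bar{U}\,(I \otimes X^{-1})$
$\dfrac{\partial (X \otimes Y)}{\partial Z} = \dfrac{\partial X}{\partial Z} \otimes Y + [I \otimes U_1]\Big[\dfrac{\partial Y}{\partial Z} \otimes X\Big][I \otimes U_2]$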
Bibliography
[1] Anderson, T. W., (1958), An Introduction to Multivariate Statistical Analysis, John Wiley.
[2] Athans, M., (1968), The Matrix Minimum Principle, Information and Control, 11, 592-606.
[3] Athans, M., and Tse, E., (1967), A Direct Derivation of the Optimal Linear Filter Using the Maximum Principle, IEEE Trans. Auto. Control, AC-12, No. 6, 690-698.
[4] Athans, M., and Schweppe, F. C., (1965), Gradient Matrices and Matrix Calculations, MIT Lincoln Lab., Tech. Note 1965-53, Lexington, Mass.
[5] Barnett, S., (1973), Matrix Differential Equations and Kronecker Products, SIAM J. Appl. Math., 24, No. 1.
[6] Bellman, R., (1960), Introduction to Matrix Analysis, McGraw-Hill.
[7] Bodewig, E., (1959), Matrix Calculus, Amsterdam: North Holland Publishing Co.
[8] Brewer, J. W., (1978), Kronecker Products and Matrix Calculus in System Theory, IEEE Trans. on Circuits and Systems, 25, No. 9, 772-781.
[9] Brewer, J. W., (1977), The Derivatives of the Exponential Matrix with respect to a Matrix, IEEE Trans. Auto. Control, 22, 656-657.
[10] Brewer, J. W., (1979), Derivatives of the Characteristic Polynomial Trace and Determinant with respect to a Matrix, IEEE Trans. Auto. Control, 24, 787-790.
[11] Brewer, J. W., (1977), The Gradient with respect to a Symmetric Matrix, IEEE Trans. Auto. Control, 22, 265-267.
[12] Brewer, J. W., (1977), The Derivative of the Riccati Matrix with respect to a Matrix, IEEE Trans. Auto. Control, 22, No. 6, 980-983.
[13] Conlisk, J., (1969), The Equilibrium Covariance Matrix of Dynamic Econometric Models, American Stat. Ass. Journal, No. 64, 277-279.
[14] Deemer, W. L. and Olkin, I., (1951), The Jacobians of certain Matrix Transformations, Biometrika, 30, 345-367.
[15] Dwyer, P. S. and Macphail, M. S., (1948), Symbolic Matrix Derivatives, Ann. Math. Statist., 19, 517-537.
[16] Dwyer, P. S., (1967), Some Applications of Matrix Derivatives in Multivariate Analysis, American Statistical Ass. Journal, June, pt 62, 607-625.
[17] Geering, H. P., (1976), On Calculating Gradient Matrices, IEEE Trans. Auto. Control, August, 615-616.
[18] Graham, A., (1979), Matrix Theory and Applications for Engineers and Mathematicians, Ellis Horwood.
[19] Graham, A., and Burghes, D., (1980), Introduction to Control Theory Including Optimal Control, Ellis Horwood.
[20] Lancaster, P., (1970), Explicit Solutions of Linear Matrix Equations, SIAM Rev., 12, No. 4, 544-566.
[21] MacDuffee, C. C., (1956), The Theory of Matrices, Chelsea, New York.
[22] Neudecker, H., (1969), Some Theorems on Matrix Differentiation with special reference to Kronecker Matrix Products, J. Amer. Statist. Assoc., 64, 953-963.
[23] Neudecker, H., A Note on Kronecker Matrix Products and Matrix Equation Systems.
[24] Paraskevopoulos, P. N. and King, R. E., (1976), A Kronecker Product approach to Pole assignment by output feedback, Int. J. Contr., 24, No. 3, 325-334.
[25] Roth, W. E., (1944), On Direct Product Matrices, Bull. Amer. Math. Soc., No. 40, 461-468.
[26] Schonemann, P. H., (1965), On the Formal Differentiation of Traces and Determinants, Research Memorandum No. 27, University of North Carolina.
[27] Schweppe, F. C., (1973), Uncertain Dynamic Systems, Englewood Cliffs, Prentice Hall.
[28] Tracy, D. S. and Dwyer, P. S., (1969), Multivariate Maxima and Minima with Matrix Derivatives, J. Amer. Statist. Assoc., 64, 1576-1594.
[29] Turnbull, H. W., (1927), On Differentiating a Matrix, Proc. Edinburgh Math. Soc., 11, ser. 2, 111-128.
[30] Turnbull, H. W., (1930/31), A Matrix Form of Taylor's Theorem, Proc. Edinburgh Math. Soc., Ser. 2, 33-54.
[31] Vetter, W. J., (1970), Derivative Operations on Matrices, IEEE Trans. Auto. Control, AC-15, 241-244.
[32] Vetter, W. J., (1971), Correction to 'Derivative Operations on Matrices', IEEE Trans. Auto. Control, AC-16, 113.
[33] Vetter, W. J., (1971), An Extension to Gradient Matrices, IEEE Trans. Syst. Man. Cybernetics, SMC-1, 184-186.
[34] Vetter, W. J., (1973), Matrix Calculus Operations and Taylor Expansions, SIAM Rev., 2, 352-369.
[35] Vetter, W. J., (1975), Vector Structures and Solutions of Linear Matrix Equations, Linear Algebra and its Applications, 10, 181-188.
[:W. J., (1971), On Linear Estimates, Minimum Variance and Least-Weighting Matrices, IEEE Trans. Auto. Control, AC-16, 265-
[., R. J. and Mulholland, R. J., (1980), Kronecker Product Represen-or the Solution of the General Linear Matrix Equation,IEEE Trans.ontrol, AC-25, No. 3, 563-564.
Index
C
Chain Rule
  matrix, 88
  vector, 54
characteristic equation, 47
cofactor, 57
column vector, 14
companion form, 47
constrained optimisation, 94, 96
D
decomposition of a matrix, 13
direct product, 21
derivative
  Kronecker product, 70
  matrix, 60, 62, 64, 67, 70, 75, 81
  scalar function, 56, 75
  vector, 52
determinant, 27, 56
deviation, 94
E
Eigenvalues, 27, 30
eigenvectors, 27, 30
elementary matrix, 12, 19
  transpose, 19
exponential matrix, 29, 31, 42, 108
G
gradient matrix, 56
J
Jacobian, 53, 109
K
Kronecker
  delta, 13
  product, 21, 23, 33, 70, 85
  sum, 30
L
Lagrange multipliers, 95
least squares, 94, 96, 100
M
Matrix
  calculus, 51, 94
  companion, 47
  decomposition, 13
  derivative, 37, 60, 62, 67, 70, 75, 81, 84, 88
  differential, 78
  elementary, 12, 19
  exponential, 29, 31, 42, 108
  gradient, 56
  integral, 37
  orthogonal, 97
  permutation, 23, 28, 32
  product rule, 84
  symmetric, 58, 95, 97
  transition, 42
maximum likelihood, 102
mixed product rule, 24
multivariable system, 45
multivariate normal, 102
N
normal equations, 95, 101