
Kronecker Products and Matrix Calculus:

with Applications

ALEXANDER GRAHAM, M.A., M.Sc., Ph.D., C.Eng., M.I.E.E., Senior Lecturer in Mathematics,

The Open University, Milton Keynes

ELLIS HORWOOD LIMITED Publishers· Chichester

Halsted Press: a division of JOHN WILEY & SONS

New York· Brisbane· Chichester· Toronto


First published in 1981 by ELLIS HORWOOD LIMITED, Market Cross House, Cooper Street, Chichester, West Sussex, PO19 1EB, England

The publisher's colophon is reproduced from James Gillison's drawing of the ancient Market Cross, Chichester.

Distributors:
Australia, New Zealand, South-east Asia: Jacaranda-Wiley Ltd., Jacaranda Press, JOHN WILEY & SONS INC., G.P.O. Box 859, Brisbane, Queensland 40001, Australia
Canada: JOHN WILEY & SONS CANADA LIMITED, 22 Worcester Road, Rexdale, Ontario, Canada
Europe, Africa: JOHN WILEY & SONS LIMITED, Baffins Lane, Chichester, West Sussex, England
North and South America and the rest of the world: Halsted Press: a division of JOHN WILEY & SONS, 605 Third Avenue, New York, N.Y. 10016, U.S.A.

© 1981 A. Graham/Ellis Horwood Ltd.

British Library Cataloguing in Publication Data
Graham, Alexander
Kronecker products and matrix calculus. - (Ellis Horwood series in mathematics and its applications)
1. Matrices
I. Title
512.9'43 QA188

Library of Congress Card No. 81-7132 AACR2

ISBN 0-85312-391-8 (Ellis Horwood Limited, Library Edition)
ISBN 0-85312-427-2 (Ellis Horwood Limited, Student Edition)
ISBN 0-470-27300-3 (Halsted Press)

Typeset in Press Roman by Ellis Horwood Ltd. Printed in Great Britain by R. J. Acford, Chichester

COPYRIGHT NOTICE - All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the permission of Ellis Horwood Limited, Market Cross House, Cooper Street, Chichester, West Sussex, England.


Table of Contents

Author's Preface
Symbols and Notation Used

Chapter 1 - Preliminaries
1.1 Introduction
1.2 Unit Vectors and Elementary Matrices
1.3 Decompositions of a Matrix
1.4 The Trace Function
1.5 The Vec Operator
Problems for Chapter 1

Chapter 2 - The Kronecker Product
2.1 Introduction
2.2 Definition of the Kronecker Product
2.3 Some Properties and Rules for Kronecker Products
2.4 Definition of the Kronecker Sum
2.5 The Permutation Matrix associating vec X and vec X'
Problems for Chapter 2

Chapter 3 - Some Applications for the Kronecker Product
3.1 Introduction
3.2 The Derivative of a Matrix
3.3 Problem 1: solution of AX + XB = C
3.4 Problem 2: solution of AX + XA = µX
3.5 Problem 3: solution of $\dot{X} = AX + XB$
3.6 Problem 4: to find the transition matrix associated with the equation $\dot{X} = AX + XB$
3.7 Problem 5: solution of AXB = C
3.8 Problem 6: Pole assignment for a Multivariable System

Chapter 4 - Introduction to Matrix Calculus
4.1 Introduction
4.2 The Derivatives of Vectors
4.3 The Chain Rule for Vectors
4.4 The Derivative of Scalar Functions of a Matrix with respect to a Matrix
4.5 The Derivative of a Matrix with respect to one of its Elements and Conversely
4.6 The Derivatives of the Powers of a Matrix
Problems for Chapter 4

Chapter 5 - Further Development of Matrix Calculus including an Application of Kronecker Products
5.1 Introduction
5.2 Derivatives of Matrices and Kronecker Products
5.3 The Determination of $(\partial\,\mathrm{vec}\,X)/(\partial\,\mathrm{vec}\,Y)$ for more complicated Equations
5.4 More on Derivatives of Scalar Functions with respect to a Matrix
5.5 The Matrix Differential
Problems for Chapter 5

Chapter 6 - The Derivative of a Matrix with respect to a Matrix
6.1 Introduction
6.2 The Definition and some Results
6.3 Product Rules for Matrices
6.4 The Chain Rule for the Derivative of a Matrix with respect to a Matrix
Problems for Chapter 6

Chapter 7 - Some Applications of Matrix Calculus
7.1 Introduction
7.2 The Problems of Least Squares and Constrained Optimization in Scalar Variables
7.3 Problem 1: Matrix Calculus Approach to the Problems of Least Squares and Constrained Optimization
7.4 Problem 2: The General Least Squares Problem
7.5 Problem 3: Maximum Likelihood Estimate of the Multivariate Normal
7.6 Problem 4: Evaluation of the Jacobians of some Transformations
7.7 Problem 5: To Find the Derivative of an Exponential Matrix with respect to a Matrix

Solution to Problems
Tables of Formulae and Derivatives
Bibliography
Index


Author's Preface

My purpose in writing this book is to bring to the attention of the reader some recent developments in the field of Matrix Calculus. Although some concepts, such as Kronecker matrix products, the vector derivative etc. are mentioned in a few specialised books, no book, to my knowledge, is totally devoted to this subject. The interested researcher must consult numerous published papers to appreciate the scope of the concepts involved.

Matrix calculus applicable to square matrices was developed by Turnbull [29, 30] as far back as 1927. The theory presented in this book is based on the works of Dwyer and McPhail [15] published in 1948 and others mentioned in the Bibliography. It is more general than Turnbull's development and is applicable to non-square matrices. But even this more general theory has grave limitations; in particular it requires that in general the matrix elements are non-constant and independent. A symmetric matrix, for example, is treated as a special case. Methods of overcoming some of these limitations have been suggested, but I am not aware of any published theory which is both quite general and simple enough to be useful.

The book is organised in the following way:

Chapter 1 concentrates on the preliminaries of matrix theory and notation which is found useful throughout the book. In particular, the simple and useful elementary matrix is defined. The vec operator is defined and many useful relations are developed. Chapter 2 introduces and establishes various important properties of the matrix Kronecker product.

Several applications of the Kronecker product are considered in Chapter 3. Chapter 4 introduces Matrix Calculus. Various derivatives of vectors are defined and the chain rule for vector differentiation is established. Rules for obtaining the derivative of a matrix with respect to one of its elements and conversely are discussed. Further developments in Matrix Calculus, including derivatives of scalar functions of a matrix with respect to the matrix and matrix differentials, are found in Chapter 5.

Chapter 6 deals with the derivative of a matrix with respect to a matrix. This includes the derivation of expressions for the derivatives of both the matrix product and the Kronecker product of matrices with respect to a matrix. There is also the derivation of a chain rule of matrix differentiation. Various applications of at least some of the matrix calculus are discussed in Chapter 7.

By making use, whenever possible, of simple notation, including many worked examples to illustrate most of the important results and other examples at the end of each Chapter (except for Chapters 3 and 7) with solutions at the end of the book, I have attempted to bring a topic studied mainly at postgraduate and research level to an undergraduate level.

Symbols and Notation Used

$A, B, C, \ldots$   matrices
$A'$   the transpose of $A$
$a_{ij}$   the $(i, j)$th element of the matrix $A$
$[a_{ij}]$   the matrix $A$ having $a_{ij}$ as its $(i, j)$th element
$I_m$   the unit matrix of order $m \times m$
$e_i$   the unit vector
$e$   the one vector (having all elements equal to one)
$E_{ij}$   the elementary matrix
$0_m$   the zero matrix of order $m \times m$
$\delta_{ij}$   the Kronecker delta
$A_{\cdot i}$   the $i$th column of the matrix $A$
$A_{j\cdot}$   the $j$th row of $A$ as a column vector
$A_{j\cdot}'$   the transpose of $A_{j\cdot}$ (a row vector)
$(A')_{\cdot i}$   the $i$th column of the matrix $A'$
$(A')_{\cdot i}'$   the transpose of the $i$th column of $A'$ (that is, a row vector)
$\mathrm{tr}\,A$   the trace of $A$
$\mathrm{vec}\,A$   an ordered stack of columns of $A$
$A \otimes B$   the Kronecker product of $A$ and $B$
iff   if and only if
$\mathrm{diag}\{A\}$   the square matrix having elements $a_{11}, a_{22}, \ldots$ along its diagonal and zeros elsewhere
$\partial Y / \partial x_{rs}$   a matrix of the same order as $Y$
$\partial y_{ij} / \partial X$   a matrix of the same order as $X$
$E_{rs}$   an elementary matrix of the same order as $X$
$E_{ij}$   an elementary matrix of the same order as $Y$

CHAPTER 1

Preliminaries

1.1 INTRODUCTION

In this chapter we introduce some notation and discuss some results which will be found very useful for the development of the theory of both Kronecker products and matrix differentiation. Our aim will be to make the notation as simple as possible although inevitably it will be complicated. Some simplification may be obtained at the expense of generality. For example, we may show that a result holds for a square matrix of order $n \times n$ and state that it holds in the more general case when $A$ is of order $m \times n$. We will leave it to the interested reader to modify the proof for the more general case.

Further, we will often write

$$\sum_i \sum_j \quad\text{or}\quad \sum_{i,j} \quad\text{or just}\quad \sum \quad\text{instead of}\quad \sum_{i=1}^{m}\sum_{j=1}^{n}$$

when the summation limits are obvious from the context.

Many other simplifications will be used as the opportunities arise. Unless of particular importance, we shall not state the order of the matrices considered. It will be assumed that, for example, when taking the product $AB$ or $ABC$ the matrices are conformable.

1.2 UNIT VECTORS AND ELEMENTARY MATRICES

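The unit vectors of order $n$ are defined as

$$e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},\quad e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix},\quad \ldots,\quad e_n = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \tag{1.1}$$

The one vector of order $n$ is defined as

$$e = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \tag{1.2}$$

From (1.1) and (1.2), we obtain the relation

$$e = \sum_i e_i \tag{1.3}$$

The elementary matrix $E_{ij}$ is defined as the matrix (of order $m \times n$) which has a unity in the $(i, j)$th position and zeros elsewhere.

For example,

$$E_{23} = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} \tag{1.4}$$

The relation between $e_i$, $e_j$ and $E_{ij}$ is as follows

$$E_{ij} = e_i e_j' \tag{1.5}$$

where $e_j'$ denotes the transposed vector (that is, the row vector) of $e_j$.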

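Example 1.1

Using the unit vectors of order 3

(i) form $E_{11}$, $E_{21}$, and $E_{23}$
(ii) write the unit matrix of order $3 \times 3$ as a sum of the elementary matrices.

Solution

(i)
$$E_{11} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

$$E_{21} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

$$E_{23} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

(ii)
$$I = E_{11} + E_{22} + E_{33} = \sum_{i=1}^{3} e_i e_i'$$

The Kronecker delta $\delta_{ij}$ is defined as

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

It can be expressed as

$$\delta_{ij} = e_i' e_j = e_j' e_i . \tag{1.6}$$

We can now determine some relations between unit vectors and elementary matrices.

$$E_{ij} e_r = e_i e_j' e_r \quad \text{(by 1.5)}$$
$$= \delta_{jr} e_i \tag{1.7}$$

and

$$e_r' E_{ij} = e_r' e_i e_j' = \delta_{ri} e_j' \tag{1.8}$$

Also

$$E_{ij} E_{rs} = e_i e_j' e_r e_s' = \delta_{jr} e_i e_s' = \delta_{jr} E_{is} \tag{1.9}$$

In particular, if $r = j$ we have

$$E_{ij} E_{js} = \delta_{jj} E_{is} = E_{is}$$

and more generally

$$E_{ij} E_{js} E_{sm} = E_{is} E_{sm} = E_{im} \tag{1.10}$$

Notice from (1.9) that

$$E_{ij} E_{rs} = 0 \quad \text{if } j \neq r .$$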
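The relations between unit vectors and elementary matrices can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`unit_vector`, `outer`, `elementary`, `matmul`, `scale`) are illustrative assumptions and do not come from the text. It builds $E_{ij}$ as the product $e_i e_j'$ and tests the multiplication rule $E_{ij}E_{rs} = \delta_{jr}E_{is}$ for all index combinations of order 3.

```python
def unit_vector(i, n):
    """Unit vector e_i of order n as a flat list (i is 1-based, as in the text)."""
    return [1 if k == i else 0 for k in range(1, n + 1)]


def outer(u, v):
    """Outer product u v' of two vectors, returned as a list of rows."""
    return [[a * b for b in v] for a in u]


def elementary(i, j, m, n):
    """Elementary matrix E_ij of order m x n, built as e_i e_j'."""
    return outer(unit_vector(i, m), unit_vector(j, n))


def matmul(P, Q):
    """Ordinary matrix product of two conformable matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def scale(c, M):
    """Multiply every element of M by the scalar c."""
    return [[c * x for x in row] for row in M]


n = 3
E23 = elementary(2, 3, n, n)

# Unit matrix as the sum E_11 + E_22 + E_33, as in Example 1.1(ii).
identity = [[sum(elementary(i, i, n, n)[r][c] for i in range(1, n + 1))
             for c in range(n)] for r in range(n)]

# Check E_ij E_rs = delta_jr E_is for every index combination.
indices = range(1, n + 1)
rule_1_9_holds = all(
    matmul(elementary(i, j, n, n), elementary(r, s, n, n))
    == scale(1 if j == r else 0, elementary(i, s, n, n))
    for i in indices for j in indices for r in indices for s in indices
)
print(E23, rule_1_9_holds)
```

The indices are kept 1-based inside the helpers so that the code reads like the formulae; only the list accesses are 0-based.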

1.3 DECOMPOSITIONS OF A MATRIX

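We consider a matrix $A$ of order $m \times n$ having the following form

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = [a_{ij}] \tag{1.11}$$

We denote the $n$ columns of $A$ by $A_{\cdot 1}, A_{\cdot 2}, \ldots, A_{\cdot n}$, so that

$$A_{\cdot j} = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix} \qquad (j = 1, 2, \ldots, n) \tag{1.12}$$

and the $m$ rows of $A$ by $A_{1\cdot}, A_{2\cdot}, \ldots, A_{m\cdot}$, so that

$$A_{i\cdot} = \begin{bmatrix} a_{i1} \\ a_{i2} \\ \vdots \\ a_{in} \end{bmatrix} \qquad (i = 1, 2, \ldots, m) \tag{1.13}$$

Both the $A_{\cdot j}$ and the $A_{i\cdot}$ are column vectors. In this notation we can write $A$ as the (partitioned) matrix

$$A = [A_{\cdot 1} \; A_{\cdot 2} \; \ldots \; A_{\cdot n}] \tag{1.14}$$

or as

$$A = [A_{1\cdot} \; A_{2\cdot} \; \ldots \; A_{m\cdot}]' \tag{1.15}$$

(where the prime means 'the transpose of').

For example, let

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

so that

$$A_{1\cdot} = \begin{bmatrix} a_{11} \\ a_{12} \end{bmatrix} \quad\text{and}\quad A_{2\cdot} = \begin{bmatrix} a_{21} \\ a_{22} \end{bmatrix}$$

then

$$[A_{1\cdot} \; A_{2\cdot}]' = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix}' = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = A .$$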

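The elements, the columns and the rows of $A$ can be expressed in terms of the unit vectors as follows:

The $j$th column
$$A_{\cdot j} = A e_j \tag{1.16}$$

The $i$th row
$$A_{i\cdot}' = e_i' A . \tag{1.17}$$

So that
$$A_{i\cdot} = (e_i' A)' = A' e_i . \tag{1.18}$$

The $(i, j)$th element of $A$ can now be written as

$$a_{ij} = e_i' A e_j = e_j' A' e_i \tag{1.19}$$

We can express $A$ as the sum

$$A = \sum_i \sum_j a_{ij} E_{ij} \tag{1.20}$$

(where the $E_{ij}$ are of course of the same order as $A$) so that

$$A = \sum_i \sum_j a_{ij} e_i e_j' . \tag{1.21}$$

From (1.16) and (1.21)

$$A_{\cdot j} = A e_j = \Big( \sum_i \sum_k a_{ik} e_i e_k' \Big) e_j = \sum_i \sum_k a_{ik} e_i (e_k' e_j) = \sum_i a_{ij} e_i . \tag{1.22}$$

Similarly

$$A_{i\cdot}' = \sum_j a_{ij} e_j' \tag{1.23}$$

so that

$$A_{i\cdot} = \sum_j a_{ij} e_j . \tag{1.24}$$

It follows from (1.21), (1.22), and (1.24) that

$$A = \sum_j A_{\cdot j} e_j' \tag{1.25}$$

and

$$A = \sum_i e_i A_{i\cdot}' . \tag{1.26}$$

Example 1.2

Write the matrix

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

as a sum of: (i) column vectors of $A$; (ii) row vectors of $A$.

Solution

(i) Using (1.25)

$$A = A_{\cdot 1} e_1' + A_{\cdot 2} e_2' = \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} + \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix} \begin{bmatrix} 0 & 1 \end{bmatrix}$$

(ii) Using (1.26)

$$A = e_1 A_{1\cdot}' + e_2 A_{2\cdot}' = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} \begin{bmatrix} a_{21} & a_{22} \end{bmatrix}$$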

There exist interesting relations involving the elementary matrices operating on the matrix $A$.

For example

$$E_{ij} A = e_i e_j' A \quad \text{(by 1.5)}$$
$$= e_i A_{j\cdot}' \quad \text{(by 1.17)} \tag{1.27}$$

similarly

$$A E_{ij} = A e_i e_j' = A_{\cdot i} e_j' \quad \text{(by 1.16)} \tag{1.28}$$

so that

$$A E_{jj} = A_{\cdot j} e_j' \tag{1.29}$$

$$A E_{ij} B = A e_i e_j' B = A_{\cdot i} B_{j\cdot}' \quad \text{(by 1.28 and 1.27)} \tag{1.30}$$

$$E_{ij} A E_{rs} = e_i e_j' A e_r e_s' \quad \text{(by 1.5)}$$
$$= e_i a_{jr} e_s' \quad \text{(by 1.19)}$$
$$= a_{jr} e_i e_s' = a_{jr} E_{is} \tag{1.31}$$

In particular

$$E_{jj} A E_{rr} = a_{jr} E_{jr} \tag{1.32}$$

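Example 1.3

Use elementary matrices and/or unit vectors to find an expression for

(i) The product $AB$ of the matrices $A = [a_{ij}]$ and $B = [b_{ij}]$.
(ii) The $k$th column of the product $AB$.
(iii) The $k$th column of the product $XYZ$ of the matrices $X = [x_{ij}]$, $Y = [y_{ij}]$ and $Z = [z_{ij}]$.

Solutions

(i) By (1.25) and (1.29)

$$A = \sum_j A_{\cdot j} e_j' = \sum_j A E_{jj}$$

hence

$$AB = \sum_j (A E_{jj}) B = \sum_j (A e_j)(e_j' B) = \sum_j A_{\cdot j} B_{j\cdot}' \quad \text{(by (1.16) and (1.17))}$$

(ii) (a)

$$(AB)_{\cdot k} = (AB) e_k = A (B e_k) = A B_{\cdot k} \quad \text{by (1.16)}$$

(b) From (i) above we can write

$$(AB)_{\cdot k} = \sum_j (A e_j e_j' B) e_k = \sum_j (A e_j)(e_j' B e_k) = \sum_j A_{\cdot j} b_{jk} \quad \text{by (1.16) and (1.19)}$$

(iii)
$$(XYZ)_{\cdot k} = \sum_j z_{jk} (XY)_{\cdot j} \quad \text{by (ii)(b) above}$$
$$= \sum_j (z_{jk} X) Y_{\cdot j} \quad \text{by (ii)(a) above.}$$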
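The decomposition of a product into columns of the first factor and rows of the second can be illustrated with small integer matrices. This is a sketch only: the matrices and the helper names (`matmul`, `outer`, `column`, `add`) are assumptions chosen for the illustration. It checks that $AB$ equals the sum over $j$ of (column $j$ of $A$) times (row $j$ of $B$), and that the $k$th column of $AB$ equals $\sum_j A_{\cdot j} b_{jk}$.

```python
def matmul(P, Q):
    """Ordinary matrix product of two conformable matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def outer(u, v):
    """Outer product u v' of two vectors, returned as a list of rows."""
    return [[a * b for b in v] for a in u]


def column(M, j):
    """Column j of M (0-based) as a flat list."""
    return [row[j] for row in M]


def add(P, Q):
    """Element-wise sum of two matrices of the same order."""
    return [[a + b for a, b in zip(rp, rq)] for rp, rq in zip(P, Q)]


A = [[1, 2, 3], [4, 5, 6]]      # order 2 x 3
B = [[1, 0], [2, 1], [0, 3]]    # order 3 x 2

AB = matmul(A, B)

# AB as a sum of (column j of A) times (row j of B).
sum_of_outer = [[0, 0], [0, 0]]
for j in range(3):
    sum_of_outer = add(sum_of_outer, outer(column(A, j), B[j]))

# k-th column of AB as a combination of the columns of A.
k = 1  # second column, 0-based
kth_column = [sum(column(A, j)[i] * B[j][k] for j in range(3)) for i in range(2)]
print(AB, sum_of_outer, kth_column)
```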

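1.4 THE TRACE FUNCTION

The trace (or the spur) of a square matrix $A$ of order $(n \times n)$ is the sum of the diagonal terms

$$\sum_{i=1}^{n} a_{ii} .$$

We write

$$\mathrm{tr}\,A = \sum_i a_{ii} \tag{1.33}$$

From (1.19) we have

$$a_{ii} = e_i' A e_i ,$$

so that

$$\mathrm{tr}\,A = \sum_i e_i' A e_i \tag{1.34}$$

From (1.16) and (1.34) we find

$$\mathrm{tr}\,A = \sum_i e_i' A_{\cdot i} \tag{1.35}$$

and from (1.17) and (1.34)

$$\mathrm{tr}\,A = \sum_i A_{i\cdot}' e_i . \tag{1.36}$$

We can obtain similar expressions for the trace of a product $AB$ of matrices.

For example

$$\mathrm{tr}\,AB = \sum_i e_i' A B e_i \tag{1.37}$$
$$= \sum_i \sum_j (e_i' A e_j)(e_j' B e_i) \quad \text{(see Ex. 1.3)}$$
$$= \sum_i \sum_j a_{ij} b_{ji} \tag{1.38}$$

Similarly

$$\mathrm{tr}\,BA = \sum_j e_j' B A e_j = \sum_i \sum_j b_{ji} a_{ij} \tag{1.39}$$

From (1.38) and (1.39) we find that

$$\mathrm{tr}\,AB = \mathrm{tr}\,BA . \tag{1.40}$$

From (1.16), (1.17) and (1.37) we have

$$\mathrm{tr}\,AB = \sum_i A_{i\cdot}' B_{\cdot i} \tag{1.41}$$

Also from (1.40) and (1.41)

$$\mathrm{tr}\,AB = \sum_j B_{j\cdot}' A_{\cdot j} . \tag{1.42}$$

Similarly

$$\mathrm{tr}\,AB' = \sum_j A_{j\cdot}' B_{j\cdot} \tag{1.43}$$

and since $\mathrm{tr}\,AB' = \mathrm{tr}\,A'B$

$$\mathrm{tr}\,AB' = \sum_j A_{\cdot j}' B_{\cdot j} \tag{1.44}$$

Two important properties of the trace are

$$\mathrm{tr}\,(A + B) = \mathrm{tr}\,A + \mathrm{tr}\,B \tag{1.45}$$

and

$$\mathrm{tr}\,(\alpha A) = \alpha\, \mathrm{tr}\,A \tag{1.46}$$

where $\alpha$ is a scalar.

These properties show that the trace is a linear function.

For real matrices $A$ and $B$ the various properties of $\mathrm{tr}\,(AB')$ indicated above show that it is an inner product, and it is sometimes written as

$$\mathrm{tr}\,(AB') = (A, B)$$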

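1.5 THE VEC OPERATOR

We shall make use of a vector valued function denoted by $\mathrm{vec}\,A$ of a matrix $A$ defined by Neudecker [22].

If $A$ is of order $m \times n$

$$\mathrm{vec}\,A = \begin{bmatrix} A_{\cdot 1} \\ A_{\cdot 2} \\ \vdots \\ A_{\cdot n} \end{bmatrix} \tag{1.47}$$

From the definition it is clear that $\mathrm{vec}\,A$ is a vector of order $mn$.

For example, if

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

then

$$\mathrm{vec}\,A = \begin{bmatrix} a_{11} \\ a_{21} \\ a_{12} \\ a_{22} \end{bmatrix}$$

Example 1.4

Show that we can write $\mathrm{tr}\,AB$ as $(\mathrm{vec}\,A')'\, \mathrm{vec}\,B$.

Solution

By (1.37)

$$\mathrm{tr}\,AB = \sum_i e_i' A B e_i = \sum_i A_{i\cdot}' B_{\cdot i} \quad \text{by (1.16) and (1.17)}$$
$$= \sum_i (A')_{\cdot i}' B_{\cdot i}$$

(since the $i$th row of $A$ is the $i$th column of $A'$).

Hence (assuming $A$ and $B$ of order $n \times n$)

$$\mathrm{tr}\,AB = \big[ (A')_{\cdot 1}' \;\; (A')_{\cdot 2}' \;\; \ldots \;\; (A')_{\cdot n}' \big] \begin{bmatrix} B_{\cdot 1} \\ B_{\cdot 2} \\ \vdots \\ B_{\cdot n} \end{bmatrix} = (\mathrm{vec}\,A')'\, \mathrm{vec}\,B$$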

Before discussing a useful application of the above we must first agree on notation for the transpose of an elementary matrix. We do this with the aid of an example.

Let

$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{bmatrix}$$

then an elementary matrix associated with $X$ will also be of order $(2 \times 3)$. For example, one such matrix is

$$E_{12} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

The transpose of $E_{12}$ is the matrix

$$E_{12}' = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \end{bmatrix}$$

Although at first sight this notation for the transpose is sensible and is used frequently in this book, there are associated snags. The difficulty arises when the suffix notation is not only indicative of the matrix involved but also determines specific elements, as in equations (1.31) and (1.32). On such occasions it will be necessary to use a more accurate notation indicating the matrix order and the element involved. Then instead of $E_{12}$ we will write $E_{12}(2 \times 3)$ and instead of $E_{12}'$ we write $E_{21}(3 \times 2)$.

More generally, if $X$ is a matrix of order $(m \times n)$ then the transpose of

$$E_{rs}\,(m \times n)$$

will be written as

$$E_{rs}'$$

unless an accurate description is necessary, in which case the transpose will be written as

$$E_{sr}\,(n \times m) .$$

Now for the application of the result of Example 1.4 which will be used later on in the book.

From the above

$$\mathrm{tr}\,E_{rs}' A = (\mathrm{vec}\,E_{rs})' (\mathrm{vec}\,A) = a_{rs}$$

where $a_{rs}$ is the $(r, s)$th element of the matrix $A$.

We can of course prove this important result by a more direct method.

$$\mathrm{tr}\,E_{rs}' A = \sum_k e_k' E_{rs}' A e_k = \sum_{i,j,k} a_{ij}\, e_k' e_s e_r' e_i e_j' e_k \quad \Big(\text{since } A = \sum_{i,j} a_{ij} E_{ij}\Big)$$
$$= \sum_{i,j,k} a_{ij}\, \delta_{ks} \delta_{ri} \delta_{jk} = a_{rs}$$
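The two trace identities of this section, $\mathrm{tr}\,AB = (\mathrm{vec}\,A')'\,\mathrm{vec}\,B$ and $\mathrm{tr}\,E_{rs}'A = a_{rs}$, can be tried out on small matrices. The sketch below assumes a column-stacking `vec` helper and 1-based indices for the elementary matrix; the helper names and the numerical matrices are illustrative assumptions only.

```python
def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def vec(M):
    """Stack the columns of M into one flat list."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]


def trace(M):
    """Sum of the diagonal elements of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def matmul(P, Q):
    """Ordinary matrix product of two conformable matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def dot(u, v):
    """Scalar product of two flat vectors."""
    return sum(a * b for a, b in zip(u, v))


def elementary(r, s, m, n):
    """Elementary matrix E_rs of order m x n (r and s are 1-based)."""
    return [[1 if (i, j) == (r, s) else 0 for j in range(1, n + 1)]
            for i in range(1, m + 1)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
trace_AB = trace(matmul(A, B))
inner = dot(vec(transpose(A)), vec(B))     # (vec A')' vec B

# tr E_rs' M picks out the (r, s)th element of a rectangular M.
M = [[1, 2, 3], [4, 5, 6]]                 # order 2 x 3
E = elementary(2, 3, 2, 3)
picked_by_trace = trace(matmul(transpose(E), M))
picked_by_vec = dot(vec(E), vec(M))
print(trace_AB, inner, picked_by_trace, picked_by_vec)
```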

Problems for Chapter 1

(1) The matrix $A$ is of order $(4 \times n)$ and the matrix $B$ is of order $(n \times 3)$. Write the product $AB$ in terms of the rows of $A$, that is, $A_{1\cdot}, A_{2\cdot}, \ldots$, and the columns of $B$, that is, $B_{\cdot 1}, B_{\cdot 2}, \ldots$.

(2) Describe in words the matrices

(a) $A E_{ik}$ and (b) $E_{ik} A$.

Write these matrices in terms of an appropriate product of a row or a column of $A$ and a unit vector.

(3) Show that

(a) $\mathrm{tr}\,ABC = \sum_i A_{i\cdot}' B C_{\cdot i}$

(b) $\mathrm{tr}\,ABC = \mathrm{tr}\,BCA = \mathrm{tr}\,CAB$

(4) Show that $\mathrm{tr}\,A E_{ij} = a_{ji}$.

(5) $B = [b_{ij}]$ is a matrix of order $(n \times n)$ and

$$\mathrm{diag}\{B\} = \mathrm{diag}\{b_{11}, b_{22}, \ldots, b_{nn}\} = \sum_i b_{ii} E_{ii} .$$

Show that if

$$a_{ij} = \mathrm{tr}\,(B E_{jj})\, \delta_{ij}$$

then $A = [a_{ij}] = \mathrm{diag}\{B\}$.

CHAPTER 2

The Kronecker Product

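2.1 INTRODUCTION

The Kronecker product, also known as a direct product or a tensor product, is a concept having its origin in group theory and has important applications in particle physics. But the technique has been successfully applied in various fields of matrix theory, for example in the solution of matrix equations which arise when using Lyapunov's approach to the stability theory. The development of the technique in this chapter will be as a topic within the scope of matrix algebra.

2.2 DEFINITION OF THE KRONECKER PRODUCT

Consider a matrix $A = [a_{ij}]$ of order $(m \times n)$ and a matrix $B = [b_{ij}]$ of order $(r \times s)$. The Kronecker product of the two matrices, denoted by $A \otimes B$, is defined as the partitioned matrix

$$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix} \tag{2.1}$$

$A \otimes B$ is seen to be a matrix of order $(mr \times ns)$. It has $mn$ blocks; the $(i, j)$th block is the matrix $a_{ij}B$ of order $(r \times s)$.

For example, let

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$$

then

$$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B \\ a_{21}B & a_{22}B \end{bmatrix} = \begin{bmatrix} a_{11}b_{11} & a_{11}b_{12} & a_{12}b_{11} & a_{12}b_{12} \\ a_{11}b_{21} & a_{11}b_{22} & a_{12}b_{21} & a_{12}b_{22} \\ a_{21}b_{11} & a_{21}b_{12} & a_{22}b_{11} & a_{22}b_{12} \\ a_{21}b_{21} & a_{21}b_{22} & a_{22}b_{21} & a_{22}b_{22} \end{bmatrix}$$

Notice that the Kronecker product is defined irrespective of the order of the matrices involved. From this point of view it is a more general concept than matrix multiplication. As we develop the theory we will note other results which are more general than the corresponding ones for matrix multiplication.

The Kronecker product arises naturally in the following way. Consider two linear transformations

$$x = Az \quad\text{and}\quad y = Bw$$

which, in the simplest case, take the form

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \tag{2.2}$$

We can consider the two transformations simultaneously by defining the following vectors

$$\mu = x \otimes y = \begin{bmatrix} x_1 y_1 \\ x_1 y_2 \\ x_2 y_1 \\ x_2 y_2 \end{bmatrix} \quad\text{and}\quad v = z \otimes w = \begin{bmatrix} z_1 w_1 \\ z_1 w_2 \\ z_2 w_1 \\ z_2 w_2 \end{bmatrix} \tag{2.3}$$

To find the transformation between $\mu$ and $v$, we determine the relations between the components of the two vectors.

For example,

$$x_1 y_1 = (a_{11} z_1 + a_{12} z_2)(b_{11} w_1 + b_{12} w_2)$$
$$= a_{11} b_{11} (z_1 w_1) + a_{11} b_{12} (z_1 w_2) + a_{12} b_{11} (z_2 w_1) + a_{12} b_{12} (z_2 w_2)$$

Similar expressions for the other components lead to the transformation

$$\mu = \begin{bmatrix} a_{11}b_{11} & a_{11}b_{12} & a_{12}b_{11} & a_{12}b_{12} \\ a_{11}b_{21} & a_{11}b_{22} & a_{12}b_{21} & a_{12}b_{22} \\ a_{21}b_{11} & a_{21}b_{12} & a_{22}b_{11} & a_{22}b_{12} \\ a_{21}b_{21} & a_{21}b_{22} & a_{22}b_{21} & a_{22}b_{22} \end{bmatrix} v$$

or

$$\mu = (A \otimes B)\, v ,$$

that is

$$Az \otimes Bw = (A \otimes B)(z \otimes w) . \tag{2.4}$$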
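The block definition of the Kronecker product translates directly into a few lines of code. The following is a minimal sketch, assuming matrices stored as lists of rows and vectors stored as one-column matrices; the function name `kron` and the numerical values are illustrative assumptions. It also checks the relation $Az \otimes Bw = (A \otimes B)(z \otimes w)$ on one example.

```python
def kron(A, B):
    """Kronecker product: the (i, j)th block of the result is a_ij * B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def matmul(P, Q):
    """Ordinary matrix product of two conformable matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
z = [[1], [2]]          # column vectors stored as n x 1 matrices
w = [[3], [4]]

AkB = kron(A, B)

# Left side: transform first, then take the Kronecker product.
lhs = kron(matmul(A, z), matmul(B, w))
# Right side: combine the transformations, then apply to z (x) w.
rhs = matmul(AkB, kron(z, w))
print(AkB, lhs, rhs)
```

The rows of the result are ordered by (row of `A`, row of `B`) and the columns by (column of `A`, column of `B`), which matches the partitioned layout in the definition.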

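Example 2.1

Let $E_{ij}$ be an elementary matrix of order $(2 \times 2)$ defined in section 1.2 (see 1.4). Find the matrix

$$U = \sum_{i=1}^{2} \sum_{j=1}^{2} E_{ij} \otimes E_{ij}'$$

Solution

$$U = E_{11} \otimes E_{11} + E_{12} \otimes E_{21} + E_{21} \otimes E_{12} + E_{22} \otimes E_{22}$$

$$= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \otimes \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \otimes \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \otimes \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$

so that

$$U = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Note. $U$ is seen to be a square matrix having columns which are unit vectors $e_i$ $(i = 1, 2, \ldots)$. It can be obtained from a unit matrix by a permutation of rows or columns. It is known as a permutation matrix (see also section 2.5).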

2.3 SOME PROPERTIES AND RULES FOR KRONECKER PRODUCTS

We expect the Kronecker product to have the usual properties of a product.

I If $\alpha$ is a scalar, then

$$A \otimes (\alpha B) = \alpha (A \otimes B) . \tag{2.5}$$

Proof

The $(i, j)$th block of $A \otimes (\alpha B)$ is

$$[a_{ij} (\alpha B)] = \alpha [a_{ij} B] = \alpha\, [(i, j)\text{th block of } A \otimes B]$$

The result follows.

II The product is distributive with respect to addition, that is

(a) $(A + B) \otimes C = A \otimes C + B \otimes C$   (2.6)

(b) $A \otimes (B + C) = A \otimes B + A \otimes C$   (2.7)

Proof

We will only consider (a). The $(i, j)$th block of $(A + B) \otimes C$ is

$$(a_{ij} + b_{ij})\, C .$$

The $(i, j)$th block of $A \otimes C + B \otimes C$ is

$$a_{ij} C + b_{ij} C = (a_{ij} + b_{ij})\, C .$$

Since the two blocks are equal for every $(i, j)$, the result follows.

III The product is associative

$$A \otimes (B \otimes C) = (A \otimes B) \otimes C . \tag{2.8}$$

IV There exists

a zero element $0_{mn} = 0_m \otimes 0_n$

a unit element $I_{mn} = I_m \otimes I_n$   (2.9)

The unit matrices are all square; for example $I_m$ is the unit matrix of order $(m \times m)$.

Other important properties of the Kronecker product follow.

V
$$(A \otimes B)' = A' \otimes B' \tag{2.10}$$

Proof

The $(i, j)$th block of $(A \otimes B)'$ is

$$a_{ji} B' ,$$

which is also the $(i, j)$th block of $A' \otimes B'$.

VI (The 'Mixed Product Rule').

$$(A \otimes B)(C \otimes D) = AC \otimes BD \tag{2.11}$$

provided the dimensions of the matrices are such that the various expressions exist.

Proof

The $(i, j)$th block of the left hand side is obtained by taking the product of the $i$th row block of $(A \otimes B)$ and the $j$th column block of $(C \otimes D)$; this is of the following form

$$\begin{bmatrix} a_{i1}B & a_{i2}B & \cdots & a_{in}B \end{bmatrix} \begin{bmatrix} c_{1j}D \\ c_{2j}D \\ \vdots \\ c_{nj}D \end{bmatrix} = \sum_r a_{ir} c_{rj}\, BD .$$

The $(i, j)$th block of the right hand side is (by definition of the Kronecker product)

$$g_{ij}\, BD$$

where $g_{ij}$ is the $(i, j)$th element of the matrix $AC$. But by the rule of matrix multiplication

$$g_{ij} = \sum_r a_{ir} c_{rj} .$$

Since the $(i, j)$th blocks are equal, the result follows.
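The mixed product rule and the transposition rule hold for rectangular matrices as long as the products exist, which is easy to confirm on a small case. The sketch below is illustrative only: the matrices are arbitrary assumptions, chosen so that $AC$ and $BD$ are defined, and the helper names are not from the text.

```python
def kron(A, B):
    """Kronecker product: the (i, j)th block of the result is a_ij * B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def matmul(P, Q):
    """Ordinary matrix product of two conformable matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


A = [[1, 2, 0], [0, 1, 3]]      # order 2 x 3
C = [[1, 0], [2, 1], [0, 1]]    # order 3 x 2
B = [[2, 1], [1, 1]]            # order 2 x 2
D = [[1], [3]]                  # order 2 x 1

# Mixed product rule: (A (x) B)(C (x) D) = AC (x) BD.
mixed_lhs = matmul(kron(A, B), kron(C, D))
mixed_rhs = kron(matmul(A, C), matmul(B, D))

# Transposition rule: (A (x) B)' = A' (x) B'.
transpose_lhs = transpose(kron(A, B))
transpose_rhs = kron(transpose(A), transpose(B))

# The Kronecker product is not commutative in general.
commutes = kron(A, B) == kron(B, A)
print(mixed_lhs, commutes)
```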

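VII Given $A\,(m \times m)$ and $B\,(n \times n)$ and subject to the existence of the various inverses,

$$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1} \tag{2.12}$$

Proof

Use (2.11)

$$(A \otimes B)(A^{-1} \otimes B^{-1}) = AA^{-1} \otimes BB^{-1} = I_m \otimes I_n = I_{mn} .$$

The result follows.

VIII (See (1.47))

$$\mathrm{vec}\,(AYB) = (B' \otimes A)\, \mathrm{vec}\,Y \tag{2.13}$$

Proof

We prove (2.13) for $A$, $Y$ and $B$ each of order $n \times n$. The result is true for $A\,(m \times n)$, $Y\,(n \times r)$, $B\,(r \times s)$. We use the solution to Example 1.3(iii).

$$(AYB)_{\cdot k} = \sum_j (b_{jk} A)\, Y_{\cdot j}$$

$$= \begin{bmatrix} b_{1k}A & b_{2k}A & \cdots & b_{nk}A \end{bmatrix} \begin{bmatrix} Y_{\cdot 1} \\ Y_{\cdot 2} \\ \vdots \\ Y_{\cdot n} \end{bmatrix}$$

$$= [B_{\cdot k}' \otimes A]\, \mathrm{vec}\,Y$$

$$= [(B')_{k\cdot}' \otimes A]\, \mathrm{vec}\,Y$$

since the transpose of the $k$th column of $B$ is the $k$th row of $B'$; the result follows.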

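Example 2.2

Write the equation

$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 & x_3 \\ x_2 & x_4 \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}$$

in a matrix-vector form.

Solution

The equation can be written as $AXI = C$. Use (2.13) to find

$$\mathrm{vec}\,(AXI) = (I \otimes A)\, \mathrm{vec}\,X = \mathrm{vec}\,C ,$$

so that

$$\begin{bmatrix} a_{11} & a_{12} & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ 0 & 0 & a_{11} & a_{12} \\ 0 & 0 & a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} c_{11} \\ c_{21} \\ c_{12} \\ c_{22} \end{bmatrix}$$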
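The identity $\mathrm{vec}\,(AYB) = (B' \otimes A)\,\mathrm{vec}\,Y$ and its special case from Example 2.2 can be checked on small integer matrices. This is a sketch under stated assumptions: the numerical matrices are arbitrary, `vec` stacks columns into a flat list, and the helper names are illustrative rather than taken from the text.

```python
def kron(A, B):
    """Kronecker product: the (i, j)th block of the result is a_ij * B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def matmul(P, Q):
    """Ordinary matrix product of two conformable matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def matvec(M, v):
    """Product of a matrix with a flat vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def vec(M):
    """Stack the columns of M into one flat list."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]


A = [[1, 2], [3, 4]]            # order 2 x 2
Y = [[1, 0, 2], [0, 1, 1]]      # order 2 x 3
B = [[1, 1], [0, 2], [3, 0]]    # order 3 x 2

vec_of_product = vec(matmul(matmul(A, Y), B))
via_kronecker = matvec(kron(transpose(B), A), vec(Y))

# Special case of Example 2.2: AX = C becomes (I (x) A) vec X = vec C.
X = [[1, 3], [2, 4]]            # x1, x2, x3, x4 = 1, 2, 3, 4
I2 = [[1, 0], [0, 1]]
vec_C = vec(matmul(A, X))
block_form = matvec(kron(I2, A), vec(X))
print(vec_of_product, via_kronecker, vec_C, block_form)
```

Rewriting a matrix equation this way turns it into an ordinary linear system in $\mathrm{vec}\,X$, at the cost of a coefficient matrix whose order is the product of the original orders.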

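Example 2.3

$A$ and $B$ are both of order $(n \times n)$. Show that

(i) $\mathrm{vec}\,AB = (I \otimes A)\, \mathrm{vec}\,B$
(ii) $\mathrm{vec}\,AB = (B' \otimes A)\, \mathrm{vec}\,I$
(iii) $\mathrm{vec}\,AB = \sum_k (B')_{\cdot k} \otimes A_{\cdot k}$

Solution

(i) (As in Example 2.2) In (2.13) let $Y = B$ and $B = I$.

(ii) In (2.13) let $Y = I$.

(iii) In $\mathrm{vec}\,AB = (B' \otimes A)\, \mathrm{vec}\,I$ substitute (1.25), to obtain

$$\mathrm{vec}\,AB = \Big[ \sum_i (B')_{\cdot i}\, e_i' \otimes \sum_j A_{\cdot j}\, e_j' \Big] \mathrm{vec}\,I$$

$$= \sum_{i,j} \big[ ((B')_{\cdot i} \otimes A_{\cdot j})(e_i' \otimes e_j') \big]\, \mathrm{vec}\,I \quad \text{(by 2.11)}$$

The product $e_i' \otimes e_j'$ is a one row matrix having a unit element in the $[(i - 1)n + j]$th column and zeros elsewhere. Hence the product

$$[(B')_{\cdot i} \otimes A_{\cdot j}]\,[e_i' \otimes e_j']$$

is a matrix having

$$(B')_{\cdot i} \otimes A_{\cdot j}$$

as its $[(i - 1)n + j]$th column and zeros elsewhere. Since $\mathrm{vec}\,I$ is a one column matrix having a unity in the 1st, $(n + 2)$nd, $(2n + 3)$rd, ..., $n^2$th position and zeros elsewhere, the product of

$$[(B')_{\cdot i} \otimes A_{\cdot j}]\,[e_i' \otimes e_j'] \quad\text{and}\quad \mathrm{vec}\,I$$

is a one column matrix whose elements are all zeros unless $i$ and $j$ satisfy

$$(i - 1)n + j = 1, \text{ or } n + 2, \text{ or } 2n + 3, \ldots, \text{ or } n^2$$

that is

$$i = j = 1 \text{ or } i = j = 2 \text{ or } i = j = 3 \text{ or } \ldots,\; i = j = n$$

in which case the one column matrix is

$$(B')_{\cdot i} \otimes A_{\cdot i} \qquad (i = 1, 2, \ldots, n)$$

The result now follows.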

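IX If $\{\lambda_i\}$ and $\{x_i\}$ are the eigenvalues and the corresponding eigenvectors for $A$, and $\{\mu_j\}$ and $\{y_j\}$ are the eigenvalues and the corresponding eigenvectors for $B$, then

$$A \otimes B$$

has eigenvalues $\{\lambda_i \mu_j\}$ with corresponding eigenvectors $\{x_i \otimes y_j\}$.

Proof

By (2.11)

$$(A \otimes B)(x_i \otimes y_j) = (A x_i) \otimes (B y_j)$$
$$= (\lambda_i x_i) \otimes (\mu_j y_j)$$
$$= \lambda_i \mu_j (x_i \otimes y_j) \quad \text{(by 2.5)}$$

The result follows.

X Given the two matrices $A$ and $B$ of order $n \times n$ and $m \times m$ respectively

$$|A \otimes B| = |A|^m |B|^n$$

where $|A|$ means the determinant of $A$.

Proof

Assume that $\lambda_1, \lambda_2, \ldots, \lambda_n$ and $\mu_1, \mu_2, \ldots, \mu_m$ are the eigenvalues of $A$ and $B$ respectively. The proof relies on the fact (see [18] p. 145) that the determinant of a matrix is equal to the product of its eigenvalues.

Hence (from Property IX above)

$$|A \otimes B| = \prod_{i,j} \lambda_i \mu_j$$

$$= \Big( \lambda_1^m \prod_{j=1}^{m} \mu_j \Big) \Big( \lambda_2^m \prod_{j=1}^{m} \mu_j \Big) \cdots \Big( \lambda_n^m \prod_{j=1}^{m} \mu_j \Big)$$

$$= (\lambda_1 \lambda_2 \cdots \lambda_n)^m (\mu_1 \mu_2 \cdots \mu_m)^n$$

$$= |A|^m |B|^n$$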
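Properties IX and X can be illustrated without an eigenvalue solver by choosing matrices with an obvious eigenvector. In the sketch below every matrix is an assumption made for the illustration: both have equal row sums, so a vector of ones is an eigenvector with the row sum as eigenvalue. The determinant helper uses cofactor expansion, which is adequate only for small matrices.

```python
def kron(A, B):
    """Kronecker product: the (i, j)th block of the result is a_ij * B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def matvec(M, v):
    """Product of a matrix with a flat vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def kron_vec(x, y):
    """Kronecker product of two flat vectors."""
    return [a * b for a in x for b in y]


def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


A = [[2, 1], [1, 2]]                    # n = 2, eigenpair (3, [1, 1])
B = [[1, 2, 0], [0, 1, 2], [2, 0, 1]]   # m = 3, eigenpair (3, [1, 1, 1])
n, m = len(A), len(B)
lam, x = 3, [1, 1]
mu, y = 3, [1, 1, 1]

AkB = kron(A, B)

# Property IX: x (x) y is an eigenvector of A (x) B with eigenvalue lam * mu.
xy = kron_vec(x, y)
image = matvec(AkB, xy)
expected_image = [lam * mu * t for t in xy]

# Property X: |A (x) B| = |A|^m |B|^n.
det_kron = det(AkB)
det_formula = det(A) ** m * det(B) ** n
print(image, det_kron, det_formula)
```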

Another important property of Kronecker products follows.

XI
$$A \otimes B = U_1 (B \otimes A)\, U_2 \tag{2.14}$$

where $U_1$ and $U_2$ are permutation matrices (see Example 2.1).

Proof

Let $AYB' = X$, then by (2.13)

$$(B \otimes A)\, \mathrm{vec}\,Y = \mathrm{vec}\,X . \tag{1}$$

On taking the transpose, we obtain

$$BY'A' = X' ,$$

so that by (2.13)

$$(A \otimes B)\, \mathrm{vec}\,Y' = \mathrm{vec}\,X' . \tag{2}$$

From Example 1.5, we know that there exist permutation matrices $U_1$ and $U_2$ such that

$$\mathrm{vec}\,X' = U_1\, \mathrm{vec}\,X \quad\text{and}\quad \mathrm{vec}\,Y = U_2\, \mathrm{vec}\,Y' .$$

Substituting for $\mathrm{vec}\,Y$ in (1) and multiplying both sides by $U_1$, we obtain

$$U_1 (B \otimes A)\, U_2\, \mathrm{vec}\,Y' = U_1\, \mathrm{vec}\,X . \tag{3}$$

Substituting for $\mathrm{vec}\,X'$ in (2), we obtain

$$(A \otimes B)\, \mathrm{vec}\,Y' = U_1\, \mathrm{vec}\,X . \tag{4}$$

The result follows from (3) and (4).

We will obtain an explicit formula for the permutation matrix $U$ in section 2.5. Notice that $U_1$ and $U_2$ are independent of $A$ and $B$ except for the orders of the matrices.

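XII If $f$ is an analytic function, $A$ is a matrix of order $(n \times n)$ and $f(A)$ exists, then

$$f(I_m \otimes A) = I_m \otimes f(A) \tag{2.15}$$

and

$$f(A \otimes I_m) = f(A) \otimes I_m \tag{2.16}$$

Proof

Since $f$ is an analytic function it can be expressed as a power series such as

$$f(z) = a_0 + a_1 z + a_2 z^2 + \ldots$$

so that

$$f(A) = a_0 I_n + a_1 A + a_2 A^2 + \ldots = \sum_{k=0}^{\infty} a_k A^k$$

where $A^0 = I_n$.

By Cayley Hamilton's theorem (see [18]) the right hand side of the equation for $f(A)$ is the sum of at most $(n + 1)$ matrices.

We now have

$$f(I_m \otimes A) = \sum_{k=0}^{\infty} a_k (I_m \otimes A)^k$$

$$= \sum_{k=0}^{\infty} a_k (I_m \otimes A^k) \quad \text{by (2.11)}$$

$$= \sum_{k=0}^{\infty} (I_m \otimes a_k A^k)$$

$$= I_m \otimes \sum_{k=0}^{\infty} a_k A^k \quad \text{by (2.7)}$$

$$= I_m \otimes f(A)$$

This proves (2.15); (2.16) is proved similarly. We can write

$$f(A \otimes I_m) = \sum_{k=0}^{\infty} a_k (A \otimes I_m)^k$$

$$= \sum_{k=0}^{\infty} a_k (A^k \otimes I_m) \quad \text{by (2.11)}$$

$$= \sum_{k=0}^{\infty} (a_k A^k \otimes I_m)$$

$$= \Big( \sum_{k=0}^{\infty} a_k A^k \Big) \otimes I_m \quad \text{by (2.6)}$$

$$= f(A) \otimes I_m$$

This proves (2.16).

An important application of the above property is for

$$f(z) = e^z .$$

(2.15) leads to the result

$$e^{I_m \otimes A} = I_m \otimes e^A \tag{2.17}$$

and (2.16) leads to

$$e^{A \otimes I_m} = e^A \otimes I_m \tag{2.18}$$

Example 2.4

Use a direct method to verify (2.17) and (2.18).

Solution

$$e^{I_m \otimes A} = (I_m \otimes I_n) + (I_m \otimes A) + \frac{1}{2!} (I_m \otimes A)^2 + \ldots$$

The right hand side is a block diagonal matrix; each of the $m$ blocks is the sum

$$I_n + A + \frac{1}{2!} A^2 + \ldots = e^A .$$

The result (2.17) follows.

$$e^{A \otimes I_m} = (I_n \otimes I_m) + (A \otimes I_m) + \frac{1}{2!} (A \otimes I_m)^2 + \ldots$$

$$= (I_n \otimes I_m) + (A \otimes I_m) + \frac{1}{2!} (A^2 \otimes I_m) + \ldots$$

$$= \Big( I_n + A + \frac{1}{2!} A^2 + \ldots \Big) \otimes I_m$$

$$= e^A \otimes I_m$$

XIII
$$\mathrm{tr}\,(A \otimes B) = \mathrm{tr}\,A\; \mathrm{tr}\,B$$

Proof

Assume that $A$ is of order $(n \times n)$

$$\mathrm{tr}\,(A \otimes B) = \mathrm{tr}\,(a_{11}B) + \mathrm{tr}\,(a_{22}B) + \ldots + \mathrm{tr}\,(a_{nn}B)$$
$$= a_{11}\, \mathrm{tr}\,B + a_{22}\, \mathrm{tr}\,B + \ldots + a_{nn}\, \mathrm{tr}\,B$$
$$= (a_{11} + a_{22} + \ldots + a_{nn})\, \mathrm{tr}\,B$$
$$= \mathrm{tr}\,A\; \mathrm{tr}\,B .$$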

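2.4 DEFINITION OF THE KRONECKER SUM

Given a matrix $A\,(n \times n)$ and a matrix $B\,(m \times m)$, their Kronecker sum, denoted by $A \oplus B$, is defined as the expression

$$A \oplus B = A \otimes I_m + I_n \otimes B \tag{2.19}$$

We have seen (Property IX) that if $\{\lambda_i\}$ and $\{\mu_j\}$ are the eigenvalues of $A$ and $B$ respectively, then $\{\lambda_i \mu_j\}$ are the eigenvalues of the product $A \otimes B$. We now show the equivalent and fundamental property for $A \oplus B$.

XIV If $\{\lambda_i\}$ and $\{\mu_j\}$ are the eigenvalues of $A$ and $B$ respectively, then $\{\lambda_i + \mu_j\}$ are the eigenvalues of $A \oplus B$.

Proof

Let $x$ and $y$ be the eigenvectors corresponding to the eigenvalues $\lambda$ and $\mu$ of $A$ and $B$ respectively, then

$$(A \oplus B)(x \otimes y) = (A \otimes I)(x \otimes y) + (I \otimes B)(x \otimes y) \quad \text{by (2.19)}$$
$$= (Ax \otimes y) + (x \otimes By) \quad \text{by (2.11)}$$
$$= \lambda (x \otimes y) + \mu (x \otimes y)$$
$$= (\lambda + \mu)(x \otimes y)$$

The result follows.

Example 2.5

Verify Property XIV for

$$A = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 0 \\ 2 & -1 \end{bmatrix}$$

Solution

For the matrix $A$:

$$\lambda_1 = 1 \text{ and } x_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad \lambda_2 = 2 \text{ and } x_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

For the matrix $B$:

$$\mu_1 = 1 \text{ and } y_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \mu_2 = -1 \text{ and } y_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

We find

$$C = A \oplus B = \begin{bmatrix} 2 & 0 & -1 & 0 \\ 2 & 0 & 0 & -1 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 2 & 1 \end{bmatrix}$$

and $|pI - C| = p(p - 1)(p - 2)(p - 3)$, so that the eigenvalues and eigenvectors of $A \oplus B$ are

$$p = 0 = \lambda_1 + \mu_2 \text{ and } x_1 \otimes y_2 = [0 \;\; 1 \;\; 0 \;\; 0]'$$
$$p = 1 = \lambda_2 + \mu_2 \text{ and } x_2 \otimes y_2 = [0 \;\; 1 \;\; 0 \;\; -1]'$$
$$p = 2 = \lambda_1 + \mu_1 \text{ and } x_1 \otimes y_1 = [1 \;\; 1 \;\; 0 \;\; 0]'$$
$$p = 3 = \lambda_2 + \mu_1 \text{ and } x_2 \otimes y_1 = [1 \;\; 1 \;\; -1 \;\; -1]' .$$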
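The eigenvalue property of the Kronecker sum can be checked by direct multiplication. In the sketch below the matrices `A` and `B` are an assumption: they are chosen to be consistent with the matrix $C$ and the eigenpairs quoted in Example 2.5. The helper names are illustrative and not from the text.

```python
def kron(A, B):
    """Kronecker product: the (i, j)th block of the result is a_ij * B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def identity(n):
    """Unit matrix of order n x n."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def add(P, Q):
    """Element-wise sum of two matrices of the same order."""
    return [[a + b for a, b in zip(rp, rq)] for rp, rq in zip(P, Q)]


def kron_sum(A, B):
    """Kronecker sum of A (n x n) and B (m x m): A (x) I_m + I_n (x) B."""
    n, m = len(A), len(B)
    return add(kron(A, identity(m)), kron(identity(n), B))


def matvec(M, v):
    """Product of a matrix with a flat vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def kron_vec(x, y):
    """Kronecker product of two flat vectors."""
    return [a * b for a in x for b in y]


A = [[1, -1], [0, 2]]
B = [[1, 0], [2, -1]]
C = kron_sum(A, B)

eigenpairs_A = [(1, [1, 0]), (2, [1, -1])]
eigenpairs_B = [(1, [1, 1]), (-1, [0, 1])]

# For each pair, x (x) y should be an eigenvector of C with eigenvalue lam + mu.
sums = []
all_pairs_verified = True
for lam, x in eigenpairs_A:
    for mu, y in eigenpairs_B:
        v = kron_vec(x, y)
        all_pairs_verified = all_pairs_verified and (
            matvec(C, v) == [(lam + mu) * t for t in v])
        sums.append(lam + mu)
print(C, sorted(sums), all_pairs_verified)
```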

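The Kronecker sum frequently turns up when we are considering equations of the form

$$AX + XB = C \tag{2.20}$$

where $A\,(n \times n)$, $B\,(m \times m)$ and $X\,(n \times m)$.

Use (2.13) and the solution to Example 2.3 to write the above in the form

$$(I_m \otimes A + B' \otimes I_n)\, \mathrm{vec}\,X = \mathrm{vec}\,C$$

or

$$(B' \oplus A)\, \mathrm{vec}\,X = \mathrm{vec}\,C \tag{2.21}$$

It is interesting to note the generality of the Kronecker sum. For example,

$$\exp (A + B) = \exp A \, \exp B$$

is guaranteed only when $A$ and $B$ commute (see [18] p. 227), whereas

$$\exp (A \oplus B) = \exp (A \otimes I) \exp (I \otimes B)$$

even if $A$ and $B$ do not commute!

Example 2.6

Show that

$$\exp (A \oplus B) = \exp A \otimes \exp B$$

where $A\,(n \times n)$, $B\,(m \times m)$.

Solution

By (2.11)

$$(A \otimes I_m)(I_n \otimes B) = A \otimes B$$

and

$$(I_n \otimes B)(A \otimes I_m) = A \otimes B ,$$

hence $(A \otimes I_m)$ and $(I_n \otimes B)$ commute, so that

$$\exp (A \oplus B) = \exp (A \otimes I_m + I_n \otimes B)$$
$$= \exp (A \otimes I_m) \exp (I_n \otimes B)$$
$$= (\exp A \otimes I_m)(I_n \otimes \exp B) \quad \text{(by 2.15 and 2.16)}$$
$$= \exp A \otimes \exp B \quad \text{(by 2.11)}$$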
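The result of Example 2.6 can be checked numerically with a truncated power series for the exponential. The sketch below is an illustration under stated assumptions: `expm_series` is a plain Taylor sum of 40 terms, adequate for the small matrices used here but not a general-purpose algorithm, and the two matrices are arbitrary non-commuting examples. It compares $\exp(A \oplus B)$ with $\exp A \otimes \exp B$, and also shows that $\exp(A + B)$ differs from $\exp A \exp B$ for this pair.

```python
def identity(n):
    """Unit matrix of order n x n with float entries."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(P, Q):
    """Ordinary matrix product of two conformable matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def add(P, Q):
    """Element-wise sum of two matrices of the same order."""
    return [[a + b for a, b in zip(rp, rq)] for rp, rq in zip(P, Q)]


def kron(A, B):
    """Kronecker product: the (i, j)th block of the result is a_ij * B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def expm_series(M, terms=40):
    """Matrix exponential as a truncated Taylor series (small matrices only)."""
    result = identity(len(M))
    term = identity(len(M))
    for k in range(1, terms + 1):
        # term becomes M^k / k!
        term = [[x / k for x in row] for row in matmul(term, M)]
        result = add(result, term)
    return result


def max_abs_diff(P, Q):
    """Largest absolute element-wise difference between two matrices."""
    return max(abs(a - b) for rp, rq in zip(P, Q) for a, b in zip(rp, rq))


A = [[1, -1], [0, 2]]
B = [[1, 0], [2, -1]]

kron_sum = add(kron(A, identity(2)), kron(identity(2), B))
diff_kron = max_abs_diff(expm_series(kron_sum),
                         kron(expm_series(A), expm_series(B)))

# A and B do not commute, and exp(A + B) differs from exp(A) exp(B).
diff_plain = max_abs_diff(expm_series(add(A, B)),
                          matmul(expm_series(A), expm_series(B)))
print(diff_kron, diff_plain)
```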

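2.5 THE PERMUTATION MATRIX ASSOCIATING vec X AND vec X'

If $X = [x_{ij}]$ is a matrix of order $(m \times n)$ we can write (see (1.20))

$$X = \sum_i \sum_j x_{ij} E_{ij}$$

where $E_{ij}$ is an elementary matrix of order $(m \times n)$. It follows that

$$X' = \sum_i \sum_j x_{ij} E_{ij}'$$

so that

$$\mathrm{vec}\,X' = \sum_i \sum_j x_{ij}\, \mathrm{vec}\,E_{ij}' . \tag{2.22}$$

We can write (2.22) in a form of matrix multiplication as

$$\mathrm{vec}\,X' = \big[ \mathrm{vec}\,E_{11}' \;\; \mathrm{vec}\,E_{21}' \;\; \ldots \;\; \mathrm{vec}\,E_{m1}' \;\; \mathrm{vec}\,E_{12}' \;\; \ldots \;\; \mathrm{vec}\,E_{mn}' \big] \begin{bmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{m1} \\ x_{12} \\ \vdots \\ x_{mn} \end{bmatrix}$$

that is

$$\mathrm{vec}\,X' = \big[ \mathrm{vec}\,E_{11}' \;\; \mathrm{vec}\,E_{21}' \;\; \ldots \;\; \mathrm{vec}\,E_{m1}' \;\; \mathrm{vec}\,E_{12}' \;\; \ldots \;\; \mathrm{vec}\,E_{mn}' \big]\, \mathrm{vec}\,X .$$

So the permutation matrix associating $\mathrm{vec}\,X$ and $\mathrm{vec}\,X'$ is

$$U = \big[ \mathrm{vec}\,E_{11}' \;\; \mathrm{vec}\,E_{21}' \;\; \ldots \;\; \mathrm{vec}\,E_{mn}' \big] \tag{2.23}$$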

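Example 2.7
Given

$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{bmatrix},$$

determine the matrix $U$ such that

$$\operatorname{vec} X' = U \operatorname{vec} X.$$

Solution

$$
E_{11}' = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix},\quad
E_{21}' = \begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix},\quad
E_{12}' = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \end{bmatrix},
$$

$$
E_{22}' = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix},\quad
E_{13}' = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \end{bmatrix} \quad\text{and}\quad
E_{23}' = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}.
$$

Hence by (2.23)

$$
U = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.
$$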

We now obtain the permutation matrix $U$ in a useful form as a Kronecker product of elementary matrices.

As it is necessary to be precise about the suffixes of the elementary matrices, we will use the notation explained at the end of Chapter 1.

As above, we write

$$X' = \sum_{r=1}^{m} \sum_{s=1}^{n} x_{rs} E_{sr}\,(n \times m).$$

By (1.31) we can write

$$X' = \sum_{r,s} E_{sr}\,(n \times m)\; X\; E_{sr}\,(n \times m).$$

Hence,

$$
\begin{aligned}
\operatorname{vec} X' &= \sum_{r,s} \operatorname{vec}\,[E_{sr}\,(n \times m)\; X\; E_{sr}\,(n \times m)] \\
&= \Big[\sum_{r,s} E_{rs}\,(m \times n) \otimes E_{sr}\,(n \times m)\Big] \operatorname{vec} X && \text{by (2.13).}
\end{aligned}
$$

It follows that

$$U = \sum_{r,s} E_{rs}\,(m \times n) \otimes E_{sr}\,(n \times m) \tag{2.24}$$

or in our less rigorous notation

$$U = \sum_{r,s} E_{rs} \otimes E_{rs}'. \tag{2.25}$$

Notice that $U$ is a matrix of order $(mn \times mn)$.

At first sight it may appear that the evaluation of the permutation matrices $U_1$ and $U_2$ in (2.14) using (2.24) is a major task. In fact this is one of the examples where the practice is much easier than the theory.

We can readily determine the form of a permutation matrix, as in Example 2.7. So the only real problem is to determine the orders of the two matrices.

Since the matrices forming the product (2.14) must be conformable, the orders of the matrices $U_1$ and $U_2$ are determined respectively by the number of rows and the number of columns of $(A \otimes B)$.

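Example 2.8
Let $A = [a_{ij}]$ be a matrix of order $(2 \times 3)$, and $B = [b_{ij}]$ be a matrix of order $(2 \times 2)$.

Determine the permutation matrices $U_1$ and $U_2$ such that

$$A \otimes B = U_1 (B \otimes A) U_2.$$

Solution

$(A \otimes B)$ is of the order $(4 \times 6)$.

From the above discussion we conclude that $U_1$ is of order $(4 \times 4)$ and $U_2$ is of order $(6 \times 6)$:

$$
U_1 = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\quad\text{and}\quad
U_2 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.
$$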
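The construction (2.24) and the statement of Example 2.8 can be checked with a short NumPy sketch. This is an illustration only; the helper names `elem`, `perm_matrix` and `vec` are hypothetical, `vec` stacks columns, and the random test matrices are arbitrary.

```python
import numpy as np

def elem(r, s, m, n):
    """Elementary matrix E_rs of order (m x n), zero-based indices."""
    E = np.zeros((m, n))
    E[r, s] = 1.0
    return E

def perm_matrix(m, n):
    """U = sum_{r,s} E_rs (m x n) (x) E_sr (n x m), so vec(X') = U vec(X) for X (m x n)."""
    U = np.zeros((m * n, m * n))
    for r in range(m):
        for s in range(n):
            U += np.kron(elem(r, s, m, n), elem(s, r, n, m))
    return U

def vec(X):
    """Stack the columns of X into one vector."""
    return X.reshape(-1, order="F")

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3))
U = perm_matrix(2, 3)                 # the matrix U of Example 2.7
transpose_ok = np.allclose(U @ vec(X), vec(X.T))

# Example 2.8: A is (2 x 3), B is (2 x 2).
A = rng.standard_normal((2, 3))
B = rng.standard_normal((2, 2))
U1 = perm_matrix(2, 2)                # order (4 x 4)
U2 = perm_matrix(2, 3)                # order (6 x 6)
commute_ok = np.allclose(np.kron(A, B), U1 @ np.kron(B, A) @ U2)
```

Both flags are expected to be true, and `U @ U.T` should be the identity since $U$ is a permutation matrix.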

Another related matrix which will be used (in Chapter 6) is

$$\bar{U} = \sum_{r,s} E_{rs} \otimes E_{rs}. \tag{2.26}$$

When the matrix $X$ is of order $(m \times n)$, $\bar{U}$ is of order $(m^2 \times n^2)$.

Problems of Chapter 2

(1) Given

$$U = \sum_{r,s} E_{rs}\,(m \times n) \otimes E_{sr}\,(n \times m).$$

Show that

$$U^{-1} = U' = \sum_{r,s} E_{sr}\,(n \times m) \otimes E_{rs}\,(m \times n).$$

(2) $A = [a_{ij}]$, $B = [b_{ij}]$ and $Y = [y_{ij}]$ are matrices all of order $(2 \times 2)$; use a direct method to evaluate

(a) (i) $AYB$
    (ii) $B' \otimes A$

(b) Verify (2.13), that is

$$\operatorname{vec} AYB = (B' \otimes A) \operatorname{vec} Y.$$

(3) Givenr2 1

and B =-1 1

2 0A =

01

(a) Calculate

$A \otimes B$ and $B \otimes A$.

(b) Find matrices $U_1$ and $U_2$ such that

$$A \otimes B = U_1 (B \otimes A) U_2.$$

(4) GivenC3 4

A2 _3

calculate

(a) $\exp(A)$

(b) $\exp(A \otimes I)$.

Verify (2.16), that is

$$\exp(A) \otimes I = \exp(A \otimes I).$$

(5) Given

2 1 1 2and B , calculate

1 3 4-1 -A

(a) $A^{-1} \otimes B^{-1}$ and

(b) $(A \otimes B)^{-1}$.

Hence verify (2.12), that is

$$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}.$$

(6) Given

L4 2]and B = L2

3, find

(a) The eigenvalues and eigenvectors of $A$ and $B$.
(b) The eigenvalues and eigenvectors of $A \otimes B$.
(c) Verify Property IX of Kronecker Products.

A =

(7) A, B, C and D are matrices such that

A is similar to C, and

B is similar to D.

Show that $A \otimes B$ is similar to $C \otimes D$.

CHAPTER 3

Some Applications of the Kronecker Product

3.1 INTRODUCTION

There are numerous applications of the Kronecker product in various fields including statistics, economics, optimisation and control. It is not our intention to discuss applications in all these fields, just a selected number to give an idea of the problems tackled in some of the literature mentioned in the Bibliography. There is no doubt that the interested reader will find there various other applications, hopefully in his own field of interest.

A number of the applications involve the derivative of a matrix. It is a well known concept (for example see [18] p. 229) which we now briefly review.

3.2 THE DERIVATIVE OF A MATRIX

Given the matrix

$$A(t) = [a_{ij}(t)]$$

the derivative of the matrix, with respect to a scalar variable $t$, denoted by $(d/dt)A(t)$ or just $dA/dt$ or $\dot{A}(t)$, is defined as the matrix

$$\frac{d}{dt}A(t) = \Big[\frac{d}{dt}a_{ij}(t)\Big]. \tag{3.1}$$

Similarly, the integral of the matrix is defined as

$$\int A(t)\,dt = \Big[\int a_{ij}(t)\,dt\Big]. \tag{3.2}$$

For example, given

$$A = \begin{bmatrix} 2t^2 & 4 \\ \sin t & 2 + t^2 \end{bmatrix}$$

then

$$\frac{d}{dt}A = \begin{bmatrix} 4t & 0 \\ \cos t & 2t \end{bmatrix} \quad\text{and}\quad \int A\,dt = \begin{bmatrix} \tfrac{2}{3}t^3 & 4t \\ -\cos t & 2t + t^3/3 \end{bmatrix} + C$$

where $C$ is a constant matrix.

One important property follows immediately. Given conformable matrices $A(t)$ and $B(t)$, then

$$\frac{d}{dt}[AB] = \frac{dA}{dt}B + A\frac{dB}{dt}. \tag{3.3}$$

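Example 3.1

Given $C = A \otimes B$ (each matrix is assumed to be a function of $t$), show that

$$\frac{dC}{dt} = \frac{dA}{dt} \otimes B + A \otimes \frac{dB}{dt}. \tag{3.4}$$

Solution
On differentiating the $(i,j)$th block of $A \otimes B$, we obtain

$$\frac{d}{dt}(a_{ij}B) = \frac{da_{ij}}{dt}B + a_{ij}\frac{dB}{dt}$$

which is the $(i,j)$th partition of

$$\frac{dA}{dt} \otimes B + A \otimes \frac{dB}{dt};$$

the result follows.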

3.3 PROBLEM 1

Determine the condition for the equation

AX+XB = C

to have a unique solution.

Solution

We have already considered this equation and wrote it (2.21) as

$$(B' \oplus A) \operatorname{vec} X = \operatorname{vec} C$$

or

$$Gx = c \tag{3.5}$$

where $G = B' \oplus A$, $x = \operatorname{vec} X$ and $c = \operatorname{vec} C$.

Equation (3.5) has a unique solution iff $G$ is nonsingular, that is iff the eigenvalues of $G$ are all nonzero. By Property XIV (see section 2.4), the eigenvalues of $G$ are $\{\lambda_i + \mu_j\}$, where $\lambda_i$ and $\mu_j$ are the eigenvalues of $A$ and $B$ (note that the eigenvalues of the matrix $B'$ are the same as the eigenvalues of $B$). Hence equation (3.5) has a unique solution iff

$$\lambda_i + \mu_j \neq 0 \quad (\text{all } i \text{ and } j).$$

We have thus proved that $AX + XB = C$ has a unique solution iff $A$ and $(-B)$ have no eigenvalue in common.

If, on the other hand, $A$ and $(-B)$ have common eigenvalues then the existence of solutions depends on the rank of the augmented matrix

$$[G \;\vdots\; c].$$

If the rank of $[G \;\vdots\; c]$ is equal to the rank of $G$, then solutions do exist, otherwise the set of equations

$$AX + XB = C$$

is not consistent.

Example 3.2Obtain the solution to

AX+XB = Cwhere

(1) A = I0 22, B = [ 1 0]

an d C =12 2

+

3 4(ii) A = 1

0 2 , B = 0 -1 an d C = 2 -9

Solution

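Writing the equation in the form of (3.5) we obtain,

(i)

$$
\begin{bmatrix}
-2 & -1 & 1 & 0 \\
0 & -1 & 0 & 1 \\
4 & 0 & 1 & -1 \\
0 & 4 & 0 & 2
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
=
\begin{bmatrix} 1 \\ -2 \\ 3 \\ 2 \end{bmatrix}
$$

where for convenience we have denoted

$$X = \begin{bmatrix} x_1 & x_3 \\ x_2 & x_4 \end{bmatrix}.$$

On solving we obtain the unique solution

$$X = \begin{bmatrix} 0 & 2 \\ 1 & -1 \end{bmatrix}.$$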
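The vec approach of Problem 1 translates directly into code. The sketch below is illustrative only: the function name `solve_sylvester_vec` is hypothetical, and the matrices `A`, `B` and `C` are assumed values chosen to be consistent with the coefficient system of case (i).

```python
import numpy as np

def solve_sylvester_vec(A, B, C):
    """Solve AX + XB = C through (I_m (x) A + B' (x) I_n) vec X = vec C."""
    n, m = A.shape[0], B.shape[0]
    G = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(G, C.reshape(-1, order="F"))   # vec stacks columns
    return x.reshape((n, m), order="F"), G

# Assumed data, consistent with the system shown for case (i).
A = np.array([[1.0, -1.0], [0.0, 2.0]])
B = np.array([[-3.0, 4.0], [1.0, 0.0]])
C = np.array([[1.0, 3.0], [-2.0, 2.0]])

X, G = solve_sylvester_vec(A, B, C)

# Uniqueness test: no sum lambda_i + mu_j of eigenvalues may vanish.
sums = np.add.outer(np.linalg.eigvals(A), np.linalg.eigvals(B))
unique = bool(np.all(np.abs(sums) > 1e-12))
```

When `unique` is false, `np.linalg.solve` is not appropriate and the rank test on $[G \;\vdots\; c]$ described above decides whether any solution exists.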

(ii) In case (ii) A and (-B) have one eigenvalue (X = 1) in common. Equation(3.5) becomes

H2 -1 0 0 x, 0

0 -1 0 0 x2 _ 2

4 0 0 -1 x3 SS

LO 4 0 J x4 -9

and rank G = rank [G; c].G is seen to be singular, but

rank G = rank [G c] = 3

hence at least one solution exists. In fact two linearly independent solutions are

X, _1 0

-2 -1and X2 =

Ti 1-1

-2 -1

any other solution is a linear combination of X, and X.2.

3.4 PROBLEM 2

Determine the condition for the equation

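$$AX - XA = \mu X \tag{3.6}$$

to have a nontrivial solution.

Solution

We can write (3.6) as

$$Hx = \mu x \tag{3.7}$$

where $H = I \otimes A - A' \otimes I$ and

$$x = \operatorname{vec} X.$$

(3.7) has a nontrivial solution for $x$ iff

$$|\mu I - H| = 0$$

that is iff $\mu$ is an eigenvalue of $H$. But by a simple generalisation of Property XIV, section 2.4, the eigenvalues of $H$ are $\{(\lambda_i - \lambda_j)\}$ where $\{\lambda_i\}$ are the eigenvalues of $A$. Hence (3.6) has a nontrivial solution iff

$$\mu = \lambda_i - \lambda_j.$$

Example 3.3

Determine the solutions to (3.6) when

$$A = \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} \quad\text{and}\quad \mu = -2.$$

Solution

$\mu = -2$ is an eigenvalue of $H$, hence we expect a nontrivial solution. Equation (3.7) becomes

$$
\begin{bmatrix}
0 & 0 & -2 & 0 \\
2 & 2 & 0 & -2 \\
0 & 0 & -2 & 0 \\
0 & 0 & 2 & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= -2
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}.
$$

On solving, we obtain

$$X = \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}.$$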

3.5 PROBLEM 3

Use the fact (see [18] p. 230) that the solution to

$$\dot{x} = Ax, \quad x(0) = c \tag{3.8}$$

is

$$x = \exp(At)\,c \tag{3.9}$$

to solve the equation

$$\dot{X} = AX + XB, \quad X(0) = C \tag{3.10}$$

where $A(n \times n)$, $B(m \times m)$ and $X(n \times m)$.

Solution

Using the vec operator on (3.10) we obtain

$$\dot{x} = Gx, \quad x(0) = c \tag{3.11}$$

where

$$x = \operatorname{vec} X, \quad c = \operatorname{vec} C$$

and

$$G = I_m \otimes A + B' \otimes I_n.$$

By (3.9) the solution to (3.11) is

$$
\begin{aligned}
\operatorname{vec} X &= \exp\{(I_m \otimes A)t + (B' \otimes I_n)t\} \operatorname{vec} C \\
&= [\exp (I_m \otimes A)t]\,[\exp (B' \otimes I_n)t] \operatorname{vec} C && \text{(see Example 2.6)} \\
&= [I_m \otimes \exp(At)]\,[\exp(B't) \otimes I_n] \operatorname{vec} C && \text{by (2.17) and (2.18).}
\end{aligned}
$$

We now make use of the result

$$\operatorname{vec} AB = (B' \otimes I) \operatorname{vec} A$$

(in (2.13) put $A = I$ and $Y = A$) in conjunction with the fact that

$$[\exp(B't)]' = \exp(Bt),$$

to obtain

$$[\exp(B't) \otimes I_n] \operatorname{vec} C = \operatorname{vec}\,[C \exp(Bt)].$$

Using the result of Example 2.3(i), we finally obtain

$$\operatorname{vec} X = \operatorname{vec}\,[\exp(At)\, C \exp(Bt)] \tag{3.12}$$

so that $X = \exp(At)\, C \exp(Bt)$.

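Example 3.4
Obtain the solution to (3.10) when

$$A = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} -2 & 0 \\ 1 & 1 \end{bmatrix}.$$

Solution

(See [18] p. 227)

$$\exp(At) = \begin{bmatrix} e^{t} & e^{t} - e^{2t} \\ 0 & e^{2t} \end{bmatrix}, \quad \exp(Bt) = \begin{bmatrix} e^{t} & 0 \\ 0 & e^{-t} \end{bmatrix}$$

hence

$$X = \begin{bmatrix} -e^{2t} - e^{3t} & 1 - e^{t} \\ e^{3t} & e^{t} \end{bmatrix}.$$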
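The closed form $X = \exp(At)\,C\exp(Bt)$ can be checked against the differential equation by finite differences. The sketch below is not from the original text; it hard-codes the two matrix exponentials of Example 3.4 (with the entries as read above) and uses an arbitrary evaluation time and step size.

```python
import numpy as np

A = np.array([[1.0, -1.0], [0.0, 2.0]])
B = np.array([[1.0, 0.0], [0.0, -1.0]])
C = np.array([[-2.0, 0.0], [1.0, 1.0]])

def exp_At(t):
    """exp(At) for the upper triangular A above."""
    return np.array([[np.exp(t), np.exp(t) - np.exp(2 * t)],
                     [0.0, np.exp(2 * t)]])

def exp_Bt(t):
    """exp(Bt) for the diagonal B above."""
    return np.diag([np.exp(t), np.exp(-t)])

def X(t):
    """Candidate solution (3.12)."""
    return exp_At(t) @ C @ exp_Bt(t)

t, h = 0.5, 1e-6                              # arbitrary time and step
X_dot = (X(t + h) - X(t - h)) / (2 * h)       # central difference
rhs = A @ X(t) + X(t) @ B

closed_form = np.array([[-np.exp(2 * t) - np.exp(3 * t), 1 - np.exp(t)],
                        [np.exp(3 * t), np.exp(t)]])
```

`X_dot` should agree with `rhs` to within the finite-difference error, and `X(0)` should equal `C`.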

3.6 PROBLEM 4

We consider a problem similar to the previous one but in a different context. An important concept in Control Theory is the transition matrix. Very briefly, associated with the equations

$$\dot{X} = A(t)X \quad\text{or}\quad \dot{x} = A(t)x$$

is the transition matrix $\Phi_1(t, \tau)$ having the following two properties

$$\dot{\Phi}_1(t, \tau) = A(t)\Phi_1(t, \tau) \tag{3.13}$$

and

$$\Phi_1(t, t) = I.$$

[For simplicity of notation we shall write $\Phi$ for $\Phi(t, \tau)$.] If $A$ is a constant matrix, it is easily shown that

$$\Phi(t) = \exp(At).$$

Similarly, with the equation

$$\dot{X} = XB \quad\text{so that}\quad \dot{X}' = B'X'$$

we associate the transition matrix $\Phi_2$ such that

$$\dot{\Phi}_2 = B'\Phi_2. \tag{3.14}$$

The problem is to find the transition matrix associated with the equation

$$\dot{X} = AX + XB \tag{3.15}$$

given the transition matrices $\Phi_1$ and $\Phi_2$ defined above.

Solution
We can write (3.15) as

$$\dot{x} = Gx$$

where $x$ and $G$ were defined in the previous problem. We define a matrix $\psi$ as

$$\psi(t, \tau) = \Phi_2(t, \tau) \otimes \Phi_1(t, \tau). \tag{3.16}$$

We obtain by (3.4)

$$
\begin{aligned}
\dot{\psi} &= \dot{\Phi}_2 \otimes \Phi_1 + \Phi_2 \otimes \dot{\Phi}_1 \\
&= (B'\Phi_2) \otimes \Phi_1 + \Phi_2 \otimes (A\Phi_1) && \text{by (3.13) and (3.14)} \\
&= (B'\Phi_2) \otimes (I\Phi_1) + (I\Phi_2) \otimes (A\Phi_1) \\
&= [B' \otimes I + I \otimes A]\,[\Phi_2 \otimes \Phi_1] && \text{by (2.11).}
\end{aligned}
$$

Hence

$$\dot{\psi} = G\psi. \tag{3.17}$$

Also

$$\psi(t, t) = \Phi_2(t, t) \otimes \Phi_1(t, t) = I \otimes I = I. \tag{3.18}$$

The two equations (3.17) and (3.18) prove that $\psi$ is the transition matrix for (3.15).

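Example 3.5
Find the transition matrix for the equation

$$\dot{X} = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix} X + X \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$

Solution
In this case both $A$ and $B$ are constant matrices. From Example 3.4,

$$\Phi_1 = \exp(At) = \begin{bmatrix} e^{t} & e^{t} - e^{2t} \\ 0 & e^{2t} \end{bmatrix}, \qquad \Phi_2 = \exp(B't) = \exp(Bt) = \begin{bmatrix} e^{t} & 0 \\ 0 & e^{-t} \end{bmatrix}.$$

So that

$$
\psi = \Phi_2 \otimes \Phi_1 = \begin{bmatrix}
e^{2t} & e^{2t} - e^{3t} & 0 & 0 \\
0 & e^{3t} & 0 & 0 \\
0 & 0 & 1 & 1 - e^{t} \\
0 & 0 & 0 & e^{t}
\end{bmatrix}.
$$

For this equation

$$
G = \begin{bmatrix}
2 & -1 & 0 & 0 \\
0 & 3 & 0 & 0 \\
0 & 0 & 0 & -1 \\
0 & 0 & 0 & 1
\end{bmatrix}
$$

and it is easily verified that

$$\dot{\psi} = G\psi \quad\text{and}\quad \psi(0) = I.$$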

3.7 PROBLEM 5
Solve the equation

$$AXB = C \tag{3.19}$$

where all matrices are of order $n \times n$.

Solution

Using (2.13) we can write (3.19) in the form

$$Hx = c \tag{3.20}$$

where $H = B' \otimes A$, $x = \operatorname{vec} X$ and $c = \operatorname{vec} C$. The criteria for the existence and the uniqueness of a solution to (3.20) are well known (see for example [18]).

The above method of solving the problem is easily generalised to the linear equation of the form

$$A_1XB_1 + A_2XB_2 + \cdots + A_rXB_r = C. \tag{3.21}$$

Equation (3.21) can be written as for example (3.20) where this time

$$H = B_1' \otimes A_1 + B_2' \otimes A_2 + \cdots + B_r' \otimes A_r.$$

Example 3.6Find the matrix X, given

whereA1XB1 +A2XB2 = C

0 2 4 -61

B2 = and C =1-l 3 0 8

Solution

For this example it is found that r --t2 2 -2 - 3

1 -1 1 2H = B0A1+Bz0A2 =0 2 2 5

-4 -2 -5 - 4

andc'=[4 0 -6 81It follows that

so that

x = H-lc =

X

-1

-2

0

45

3.8 PROBLEM 6

This problem is to determine a constant output feedback matrix $K$ so that the closed loop matrix of a system has preassigned eigenvalues.

A multivariable system is defined by the equations

$$\dot{x} = Ax + Bu, \qquad y = Cx \tag{3.22}$$

where $A(n \times n)$, $B(n \times m)$ and $C(r \times n)$ are constant matrices, and $u$, $x$ and $y$ are column vectors of order $m$, $n$ and $r$ respectively.

We are concerned with a system having an output feedback law of the form

$$u = Ky \tag{3.23}$$

where $K(m \times r)$ is the constant control matrix to be determined. On substituting (3.23) into (3.22), we obtain the equations of the closed loop system

$$\dot{x} = (A + BKC)x, \qquad y = Cx. \tag{3.24}$$

The problem can now be restated as follows: Given the matrices $A$, $B$, and $C$, determine a matrix $K$ such that

$$|\lambda I - A - BKC| = a_0 + a_1\lambda + \cdots + a_{n-1}\lambda^{n-1} + \lambda^n \quad\text{(say)} \tag{3.25}$$

$$= 0 \quad\text{for preassigned values } \lambda = \lambda_1, \lambda_2, \ldots, \lambda_n.$$

Solution
Various solutions exist to this problem. We are interested in the application of the Kronecker product and will follow a method suggested in [24].

We consider a matrix $H(n \times n)$ whose eigenvalues are the desired values $\lambda_1, \lambda_2, \ldots, \lambda_n$, that is

$$|\lambda I - H| = 0 \quad\text{for } \lambda = \lambda_1, \lambda_2, \ldots, \lambda_n \tag{3.26}$$

and

$$|\lambda I - H| = a_0 + a_1\lambda + \cdots + a_{n-1}\lambda^{n-1} + \lambda^n. \tag{3.27}$$

Let

$$A + BKC = H$$

so that

$$BKC = H - A = Q \quad\text{(say)}. \tag{3.28}$$

Using (2.13) we can write (3.28) as

$$(C' \otimes B) \operatorname{vec} K = \operatorname{vec} Q \tag{3.29}$$

or more simply as

$$Pk = q \tag{3.30}$$

where $P = C' \otimes B$, $k = \operatorname{vec} K$ and $q = \operatorname{vec} Q$. Notice that $P$ is of order $(n^2 \times mr)$ and $k$ and $q$ are column vectors of order $mr$ and $n^2$ respectively.

The system of equations (3.30) is overdetermined unless of course $m = n = r$, in which case it can be solved in the usual manner, assuming a solution does exist!

In general, to solve the system for $k$ we must consider the subsystem of linearly independent equations, the remaining equations being linearly dependent on this subsystem. In other words we determine a nonsingular matrix $T(n^2 \times n^2)$ such that

$$TP = \begin{bmatrix} P_1 \\ P_2 \end{bmatrix} \tag{3.31}$$

where $P_1$ is the matrix of the coefficients of the linearly independent equations of the system (3.30) and $P_2$ is a null matrix.

Premultiplying both sides of (3.30) by $T$ and making use of (3.31), we obtain

$$TPk = Tq$$

or

$$\begin{bmatrix} P_1 \\ P_2 \end{bmatrix} k = \begin{bmatrix} u \\ v \end{bmatrix}. \tag{3.32}$$

If the rank of $P$ is $mr$, then $P_1$ is of order $(mr \times mr)$, $P_2$ is of order $([n^2 - mr] \times mr)$ and $u$ and $v$ are of order $mr$ and $(n^2 - mr)$ respectively.

A sufficient condition for the existence of a solution to (3.32) or equivalently to (3.30) is that

$$v = 0 \tag{3.33}$$

in (3.32). If the condition (3.33) holds and rank $P_1 = mr$, then

$$k = P_1^{-1}u. \tag{3.34}$$

The condition (3.33) depends on an appropriate choice of $H$. The underlying assumption being made is that a matrix $H$ satisfying this condition does exist. This in turn depends on the system under consideration, for example whether it is controllable.

Some obvious choices for the form of matrix $H$ are: (a) diagonal, (b) upper or lower triangular, (c) companion form or (d) certain combinations of the above forms.

Although forms (a) and (b) are well known, the companion form is less well documented.

Very briefly, the matrix

$$
H = \begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
-a_0 & -a_1 & -a_2 & \cdots & -a_{n-1}
\end{bmatrix}
$$

is said to be in 'companion' form; it has the associated characteristic equation

$$|\lambda I - H| = a_0 + a_1\lambda + \cdots + a_{n-1}\lambda^{n-1} + \lambda^n = 0. \tag{3.35}$$

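Example 3.7
Determine the feedback matrix $K$ so that the two input - two output system

$$
\dot{x} = \begin{bmatrix} 0 & 1 & 0 \\ 3 & 3 & 1 \\ 2 & -3 & 2 \end{bmatrix} x + \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} u, \qquad
y = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} x
$$

has closed loop eigenvalues $(-1, -2, -3)$.

Solution
We must first decide on the form of the matrix $H$.

Since (see (3.28))

$$H - A = BKC$$

and the first row of $B$ is zero, it follows that the first row of $H - A$ must be zero.

We must therefore choose $H$ in the companion form. Since the characteristic equation of $H$ is

$$(\lambda + 1)(\lambda + 2)(\lambda + 3) = \lambda^3 + 6\lambda^2 + 11\lambda + 6 = 0,$$

$$H = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -6 & -11 & -6 \end{bmatrix} \quad\text{(see (3.35))}$$

and hence (see (3.28))

$$Q = \begin{bmatrix} 0 & 0 & 0 \\ -3 & -3 & 0 \\ -8 & -8 & -8 \end{bmatrix}.$$

$$
P = C' \otimes B = \begin{bmatrix}
0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}.
$$

An appropriate matrix $T$ is the following

$$
T = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & -1 & 0 & 0 & 0 & 0
\end{bmatrix}.
$$

It follows that

$$
TP = \begin{bmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix} = \begin{bmatrix} P_1 \\ P_2 \end{bmatrix}
\quad\text{and}\quad
Tq = \begin{bmatrix} 0 \\ -8 \\ -3 \\ -8 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} u \\ v \end{bmatrix}.
$$

Since

$$
P_1 = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}, \qquad
P_1^{-1} = \begin{bmatrix} -1 & 0 & 1 & 0 \\ 0 & -1 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}
$$

so that (see (3.34))

$$k = P_1^{-1}u = \begin{bmatrix} -3 \\ 0 \\ 0 \\ -8 \end{bmatrix}.$$

Hence

$$K = \begin{bmatrix} -3 & 0 \\ 0 & -8 \end{bmatrix}.$$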
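The computation of Example 3.7 can be reproduced numerically. The sketch below is an illustration under stated assumptions: the output matrix `C` is the value inferred from the structure of $P = C' \otimes B$ and $T$ above, and a least-squares solve (`np.linalg.lstsq`) stands in for the explicit reduction by $T$, which is a swapped-in technique rather than the method of the text.

```python
import numpy as np

A = np.array([[0.0, 1.0, 0.0], [3.0, 3.0, 1.0], [2.0, -3.0, 2.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
C = np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 1.0]])    # assumed output matrix
H = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-6.0, -11.0, -6.0]])  # companion form

Q = H - A                               # (3.28)
P = np.kron(C.T, B)                     # P = C' (x) B, order (9 x 4)
q = Q.reshape(-1, order="F")            # q = vec Q

# Least squares replaces the reduction TP = [P1; P2]; the system is consistent
# exactly when the residual P k - q vanishes (condition v = 0).
k, *_ = np.linalg.lstsq(P, q, rcond=None)
consistent = np.allclose(P @ k, q)
K = k.reshape((2, 2), order="F")        # k = vec K

closed_loop_eigs = np.sort(np.linalg.eigvals(A + B @ K @ C).real)
```

If `consistent` were false, the chosen $H$ would not be attainable by output feedback and another form of $H$ would have to be tried.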


CHAPTER 4

Introduction to Matrix Calculus

4.1 INTRODUCTION

It is becoming ever increasingly clear that there is a real need for matrix calculus in fields such as multivariate analysis. There is a strong analogy here with matrix algebra which is such a powerful and elegant tool in the study of linear systems and elsewhere.

Expressions in multivariate analysis can be written in terms of scalar calculus, but the compactness of the equivalent relations in terms of matrices not only leads to a better understanding of the problems involved, but also encourages the consideration of problems which may be too complex to tackle by scalar calculus.

We have already defined the derivative of a matrix with respect to a scalar (see (3.1)); we now generalise this concept. The process is frequently referred to as formal or symbolic matrix differentiation. The basic definitions involve the partial differentiation of scalar matrix functions with respect to all the elements of a matrix. These derivatives are the elements of a matrix, of the same order as the original matrix, which is defined as the derived matrix. The words 'formal' and 'symbolic' refer to the fact that the matrix derivatives are defined without the rigorous mathematical justification which we expect for the corresponding scalar derivatives. This is not to say that such justification cannot be made, rather the fact is that this topic is still in its infancy and that appropriate mathematical basis is being laid as the subject develops. With this in mind we make the following observations about the notation used. In general the elements of the matrices $A, B, C, \ldots$ will be constant scalars. On the other hand the elements of the matrices $X, Y, Z, \ldots$ are scalar variables and we exclude the possibility that any element can be a constant or zero. In general we will also demand that these elements are independent. When this is not the case, for example when the matrix $X$ is symmetric, it is considered as a special case. The reader will appreciate the necessity for these restrictions when he considers the partial derivatives of (say) a matrix $X$ with respect to one of its elements $x_{rs}$. Obviously the derivative is undefined if $x_{rs}$ is a constant. The derivative is $E_{rs}$ if $x_{rs}$ is independent of all the other elements of $X$, but is $E_{rs} + E_{sr}$ if $X$ is symmetric.

There have been attempts to define the derivative when $x_{rs}$ is a constant (or zero) but, as far as this author knows, no rigorous mathematical theory for the general case has been proposed and successfully applied.

4.2 THE DERIVATIVES OF VECTORS

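Let $x$ and $y$ be vectors of orders $n$ and $m$ respectively. We can define various derivatives in the following way [15]:

(1) The derivative of the vector $y$ with respect to vector $x$ is the matrix

$$
\frac{\partial y}{\partial x} = \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_2}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_1} \\
\dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_2} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial y_1}{\partial x_n} & \dfrac{\partial y_2}{\partial x_n} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix} \tag{4.1}
$$

of order $(n \times m)$ where $y_1, y_2, \ldots, y_m$ and $x_1, x_2, \ldots, x_n$ are the components of $y$ and $x$ respectively.

(2) The derivative of a scalar with respect to a vector. If $y$ is a scalar

$$
\frac{\partial y}{\partial x} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} \\ \dfrac{\partial y}{\partial x_2} \\ \vdots \\ \dfrac{\partial y}{\partial x_n} \end{bmatrix}. \tag{4.2}
$$

(3) The derivative of a vector $y$ with respect to a scalar $x$

$$
\frac{\partial y}{\partial x} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x} & \dfrac{\partial y_2}{\partial x} & \cdots & \dfrac{\partial y_m}{\partial x} \end{bmatrix}. \tag{4.3}
$$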

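Example 4.1

Given

$$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

and

$$y_1 = x_1^2 - x_2, \qquad y_2 = x_3^2 + 3x_2.$$

Obtain $\partial y/\partial x$.

Solution

$$
\frac{\partial y}{\partial x} = \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_2}{\partial x_1} \\
\dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_2}{\partial x_2} \\
\dfrac{\partial y_1}{\partial x_3} & \dfrac{\partial y_2}{\partial x_3}
\end{bmatrix}
= \begin{bmatrix} 2x_1 & 0 \\ -1 & 3 \\ 0 & 2x_3 \end{bmatrix}.
$$

In multivariate analysis, if $x$ and $y$ are of the same order, the absolute value of the determinant of $\partial x/\partial y$, that is of

$$J = \left| \frac{\partial x}{\partial y} \right|,$$

is called the Jacobian of the transformation determined by

$$y = y(x).$$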

Example 4.2

The transformation from spherical to Cartesian co-ordinates is defined by $x = r \sin\theta \cos\psi$, $y = r \sin\theta \sin\psi$, and $z = r \cos\theta$ where $r > 0$, $0 < \theta < \pi$ and $0 < \psi < 2\pi$.

Obtain the Jacobian of the transformation.

Solution
Let

$$x = x_1, \quad y = x_2, \quad z = x_3$$

and

$$r = y_1, \quad \theta = y_2, \quad \psi = y_3,$$

$$
J = \left| \frac{\partial x}{\partial y} \right| = \begin{vmatrix}
\sin y_2 \cos y_3 & \sin y_2 \sin y_3 & \cos y_2 \\
y_1 \cos y_2 \cos y_3 & y_1 \cos y_2 \sin y_3 & -y_1 \sin y_2 \\
-y_1 \sin y_2 \sin y_3 & y_1 \sin y_2 \cos y_3 & 0
\end{vmatrix}
= y_1^2 \sin y_2.
$$

Definitions (4.1), (4.2) and (4.3) can be used to obtain derivatives to many frequently used expressions, including quadratic and bilinear forms.

For example consider

$$y = x'Ax.$$

Using (4.2) it is not difficult to show that

$$\frac{\partial y}{\partial x} = Ax + A'x$$

$$= 2Ax \quad\text{if } A \text{ is symmetric.}$$

We can of course differentiate the vector $2Ax$ with respect to $x$; by definition

$$\frac{\partial}{\partial x}\Big(\frac{\partial y}{\partial x}\Big) = \frac{\partial}{\partial x}(2Ax)$$

$$= 2A' = 2A \quad\text{(if } A \text{ is symmetric).}$$
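The gradient formula for the quadratic form is easy to confirm by finite differences. This is a minimal sketch, not part of the original text; the helper `numerical_gradient` and the random test data are assumptions made for illustration.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the column vector [df/dx_i], as in (4.2)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))          # not symmetric in general
x = rng.standard_normal(3)

grad_numeric = numerical_gradient(lambda v: v @ A @ v, x)
grad_formula = A @ x + A.T @ x           # dy/dx = Ax + A'x

# For a symmetric matrix the formula collapses to 2Ax.
S = A + A.T
grad_symmetric = numerical_gradient(lambda v: v @ S @ v, x)
```

`grad_numeric` should match `grad_formula`, and `grad_symmetric` should match `2 * S @ x`.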

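The following table summarises a number of vector derivative formulae.

| $y$ (scalar or a vector) | $\dfrac{\partial y}{\partial x}$ |
|---|---|
| $Ax$ | $A'$ |
| $x'A$ | $A$ |
| $x'x$ | $2x$ |
| $x'Ax$ | $Ax + A'x$ |

(4.4)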

4.3 THE CHAIN RULE FOR VECTORS

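Let

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_r \end{bmatrix} \quad\text{and}\quad z = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_m \end{bmatrix}.$$

Using the definition (4.1), we can write

$$
\Big(\frac{\partial z}{\partial x}\Big)' = \begin{bmatrix}
\dfrac{\partial z_1}{\partial x_1} & \dfrac{\partial z_1}{\partial x_2} & \cdots & \dfrac{\partial z_1}{\partial x_n} \\
\dfrac{\partial z_2}{\partial x_1} & \dfrac{\partial z_2}{\partial x_2} & \cdots & \dfrac{\partial z_2}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial z_m}{\partial x_1} & \dfrac{\partial z_m}{\partial x_2} & \cdots & \dfrac{\partial z_m}{\partial x_n}
\end{bmatrix}. \tag{4.5}
$$

Assume that

$$z = z(y) \quad\text{and}\quad y = y(x)$$

so that

$$\frac{\partial z_i}{\partial x_j} = \sum_{q=1}^{r} \frac{\partial z_i}{\partial y_q}\frac{\partial y_q}{\partial x_j}, \qquad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n.$$

Then (4.5) becomes

$$
\begin{aligned}
\Big(\frac{\partial z}{\partial x}\Big)' &= \Big[\sum_{q} \frac{\partial z_i}{\partial y_q}\frac{\partial y_q}{\partial x_j}\Big] \\
&= \begin{bmatrix}
\dfrac{\partial z_1}{\partial y_1} & \cdots & \dfrac{\partial z_1}{\partial y_r} \\
\vdots & & \vdots \\
\dfrac{\partial z_m}{\partial y_1} & \cdots & \dfrac{\partial z_m}{\partial y_r}
\end{bmatrix}
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\vdots & & \vdots \\
\dfrac{\partial y_r}{\partial x_1} & \cdots & \dfrac{\partial y_r}{\partial x_n}
\end{bmatrix} \\
&= \Big(\frac{\partial z}{\partial y}\Big)'\Big(\frac{\partial y}{\partial x}\Big)' && \text{(by (4.1))} \\
&= \Big(\frac{\partial y}{\partial x}\frac{\partial z}{\partial y}\Big)'.
\end{aligned}
$$

On transposing both sides, we finally obtain

$$\frac{\partial z}{\partial x} = \frac{\partial y}{\partial x}\frac{\partial z}{\partial y}. \tag{4.6}$$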

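4.4 THE DERIVATIVE OF SCALAR FUNCTIONS OF A MATRIX WITH RESPECT TO THE MATRIX

Let $X = [x_{ij}]$ be a matrix of order $(m \times n)$ and let

$$y = f(X)$$

be a scalar function of $X$.

The derivative of $y$ with respect to $X$, denoted by

$$\frac{\partial y}{\partial X},$$

is defined as the following matrix of order $(m \times n)$

$$
\frac{\partial y}{\partial X} = \begin{bmatrix}
\dfrac{\partial y}{\partial x_{11}} & \dfrac{\partial y}{\partial x_{12}} & \cdots & \dfrac{\partial y}{\partial x_{1n}} \\
\dfrac{\partial y}{\partial x_{21}} & \dfrac{\partial y}{\partial x_{22}} & \cdots & \dfrac{\partial y}{\partial x_{2n}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial y}{\partial x_{m1}} & \dfrac{\partial y}{\partial x_{m2}} & \cdots & \dfrac{\partial y}{\partial x_{mn}}
\end{bmatrix}
= \Big[\frac{\partial y}{\partial x_{ij}}\Big] = \sum_{i,j} E_{ij}\frac{\partial y}{\partial x_{ij}} \tag{4.7}
$$

where $E_{ij}$ is an elementary matrix of order $(m \times n)$.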

Definition
When $X = [x_{ij}]$ is a matrix of order $(m \times n)$ and $y = f(X)$ is a scalar function of $X$, then $\partial f(X)/\partial X$ is known as a gradient matrix.

Example 4.3

Given the matrix $X = [x_{ij}]$ of order $(n \times n)$ obtain $\partial y/\partial X$ when $y = \operatorname{tr} X$.

Solution
$y = \operatorname{tr} X = x_{11} + x_{22} + \cdots + x_{nn} = \operatorname{tr} X'$ (see (1.33)), hence by (4.7)

$$\frac{\partial y}{\partial X} = I_n.$$

An important family of derivatives with respect to a matrix involves functions of the determinant of a matrix, for example

$$y = |X| \quad\text{or}\quad y = |AX|.$$

We will consider a general case, say we have a matrix $Y = [y_{ij}]$ whose components are functions of a matrix $X = [x_{ij}]$, that is

$$y_{ij} = f_{ij}(x)$$

where $x = [x_{11} \; x_{12} \; \ldots \; x_{mn}]'$.

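We will determine

$$\frac{\partial |Y|}{\partial x_{rs}}$$

which will allow us to build up the matrix

$$\frac{\partial |Y|}{\partial X}.$$

Using the chain rule we can write

$$\frac{\partial |Y|}{\partial x_{rs}} = \sum_i \sum_j \frac{\partial |Y|}{\partial y_{ij}}\frac{\partial y_{ij}}{\partial x_{rs}}.$$

But $|Y| = \sum_j y_{ij}Y_{ij}$ where $Y_{ij}$ is the cofactor of the element $y_{ij}$ in $|Y|$. Since the cofactors $Y_{i1}, Y_{i2}, \ldots$ are independent of the element $y_{ij}$, we have

$$\frac{\partial |Y|}{\partial y_{ij}} = Y_{ij}.$$

It follows that

$$\frac{\partial |Y|}{\partial x_{rs}} = \sum_i \sum_j Y_{ij}\frac{\partial y_{ij}}{\partial x_{rs}}. \tag{4.8}$$

Although we have achieved our objective in determining the above formula, it can be written in an alternate and useful form.

With

$$a_{ij} = Y_{ij} \quad\text{and}\quad b_{ij} = \frac{\partial y_{ij}}{\partial x_{rs}}$$

we can write (4.8) as

$$
\begin{aligned}
\frac{\partial |Y|}{\partial x_{rs}} &= \sum_i \sum_j a_{ij}b_{ij} = \sum_i \sum_j a_{ij}e_j'e_jb_{ij} \\
&= \sum_i A_{i\cdot}'B_{i\cdot} && \text{(by (1.23) and (1.24))} \\
&= \operatorname{tr}(AB') = \operatorname{tr}(B'A) && \text{(by (1.43))}
\end{aligned} \tag{4.9}
$$

where $A = [a_{ij}]$ and $B = [b_{ij}]$.

Assuming that $Y$ is of order $(k \times k)$ let

$$
\begin{bmatrix}
Y_{11} & Y_{12} & \cdots & Y_{1k} \\
Y_{21} & Y_{22} & \cdots & Y_{2k} \\
\vdots & \vdots & & \vdots \\
Y_{k1} & Y_{k2} & \cdots & Y_{kk}
\end{bmatrix} = Z
$$

and since

$$\Big[\frac{\partial y_{ij}}{\partial x_{rs}}\Big] = \frac{\partial Y}{\partial x_{rs}}$$

we can write

$$\frac{\partial |Y|}{\partial x_{rs}} = \operatorname{tr}\Big(\frac{\partial Y'}{\partial x_{rs}}Z\Big). \tag{4.10}$$

We use (4.10) to evaluate $\partial |Y|/\partial x_{11}, \partial |Y|/\partial x_{12}, \ldots, \partial |Y|/\partial x_{mn}$ and then use (4.7) to construct

$$\frac{\partial |Y|}{\partial X}.$$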

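Example 4.4

Given the matrix $X = [x_{ij}]$ of order $(2 \times 2)$ evaluate $\partial |X|/\partial X$,

(i) when all the components $x_{ij}$ of $X$ are independent
(ii) when $X$ is a symmetric matrix.

Solution
(i) In the notation of (4.10), we have

$$Y = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}$$

so that $\partial Y/\partial x_{rs} = E_{rs}$ (for notation see (1.4)). As

$$Z = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix}$$

we use the result of Example 1.4 to write (4.10) as

$$\frac{\partial |Y|}{\partial x_{rs}} = (\operatorname{vec} E_{rs})' \operatorname{vec} Z.$$

So that, for example

$$\frac{\partial |Y|}{\partial x_{11}} = [1 \;\; 0 \;\; 0 \;\; 0] \begin{bmatrix} X_{11} \\ X_{21} \\ X_{12} \\ X_{22} \end{bmatrix} = X_{11}$$

and

$$\frac{\partial |Y|}{\partial x_{12}} = [0 \;\; 0 \;\; 1 \;\; 0] \begin{bmatrix} X_{11} \\ X_{21} \\ X_{12} \\ X_{22} \end{bmatrix} = X_{12} \quad\text{and so on.}$$

Hence

$$\frac{\partial |Y|}{\partial X} = \frac{\partial |X|}{\partial X} = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix} = |X|(X^{-1})' \quad\text{(See [18] p. 124).}$$

(ii) This time

$$Y = \begin{bmatrix} x_{11} & x_{12} \\ x_{12} & x_{22} \end{bmatrix}$$

hence

$$\frac{\partial Y}{\partial x_{11}} = E_{11}, \quad \frac{\partial Y}{\partial x_{12}} = E_{12} + E_{21} \quad\text{and so on.}$$

(See the introduction to Chapter 4 for explanation of the notation.) It follows that

$$\frac{\partial |Y|}{\partial x_{12}} = \frac{\partial |Y|}{\partial x_{21}} = [0 \;\; 1 \;\; 1 \;\; 0] \begin{bmatrix} X_{11} \\ X_{21} \\ X_{12} \\ X_{22} \end{bmatrix} = X_{21} + X_{12} = 2X_{12} \quad\text{(since } X_{12} = X_{21}\text{)}$$

hence

$$\frac{\partial |Y|}{\partial X} = \begin{bmatrix} X_{11} & 2X_{12} \\ 2X_{21} & X_{22} \end{bmatrix} = 2\begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix} - \begin{bmatrix} X_{11} & 0 \\ 0 & X_{22} \end{bmatrix}.$$

The above results can be generalised to a matrix $X$ of order $(n \times n)$. We obtain, in the symmetric matrix case

$$\frac{\partial |X|}{\partial X} = 2[X_{ij}] - \operatorname{diag}\{X_{ii}\}.$$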
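Both determinant formulas can be verified by finite differences. The sketch below is illustrative only: the helper names are hypothetical, the cofactor matrix is computed as $|X|(X^{-1})'$ (which assumes a nonsingular matrix), and the two test matrices are arbitrary.

```python
import numpy as np

def cofactor_matrix(X):
    """Matrix of cofactors [X_ij] = |X| (X^{-1})', valid for nonsingular X."""
    return np.linalg.det(X) * np.linalg.inv(X).T

def det_gradient_numeric(X, symmetric=False, h=1e-6):
    """Central-difference estimate of d|X|/dX, element by element."""
    n = X.shape[0]
    G = np.zeros((n, n))
    for r in range(n):
        for s in range(n):
            E = np.zeros((n, n))
            E[r, s] = 1.0
            if symmetric and r != s:
                E[s, r] = 1.0          # x_rs and x_sr are the same variable
            G[r, s] = (np.linalg.det(X + h * E) - np.linalg.det(X - h * E)) / (2 * h)
    return G

X = np.array([[2.0, 1.0, 0.5], [0.3, 3.0, 1.0], [1.0, 0.2, 4.0]])   # independent elements
S = np.array([[2.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 4.0]])   # symmetric

grad_general = cofactor_matrix(X)                  # |X| (X^{-1})'
Z = cofactor_matrix(S)
grad_symmetric = 2 * Z - np.diag(np.diag(Z))       # 2[X_ij] - diag{X_ii}
```

The symmetric case perturbs $x_{rs}$ and $x_{sr}$ together, which is what doubles the off-diagonal entries.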


We defer the discussion of differentiating other scalar matrix functions to Chapter 5.

4.5 THE DERIVATIVE OF A MATRIX WITH RESPECT TO ONE OF ITS ELEMENTS AND CONVERSELY

In this section we will generalise the concepts discussed in the previous section. We again consider a matrix

$$X = [x_{ij}] \text{ of order } (m \times n).$$

The derivative of the matrix $X$ relative to one of its elements $x_{rs}$ (say), is obviously (see (3.1))

$$\frac{\partial X}{\partial x_{rs}} = E_{rs} \tag{4.11}$$

where $E_{rs}$ is the elementary matrix of order $(m \times n)$ (the order of $X$) defined in section 1.2.

It follows immediately that

$$\frac{\partial X'}{\partial x_{rs}} = E_{rs}'. \tag{4.12}$$

A more complicated situation arises when we consider a product of the form

$$Y = AXB \tag{4.13}$$

where

$X = [x_{ij}]$ is of order $(m \times n)$

$A = [a_{ij}]$ is of order $(l \times m)$

$B = [b_{ij}]$ is of order $(n \times q)$

and

$Y = [y_{ij}]$ is of order $(l \times q)$.

$A$ and $B$ are assumed independent of $X$. Our aim is to find the rule for obtaining the derivatives

$$\frac{\partial Y}{\partial x_{rs}} \quad\text{and}\quad \frac{\partial y_{ij}}{\partial X}$$

where $x_{rs}$ is a typical element of $X$ and $y_{ij}$ is a typical element of $Y$. We will first obtain the $(i,j)$th element $y_{ij}$ in (4.13) as a function of the elements of $X$. We can achieve this objective in a number of different ways. For example, we can use (2.13) to write

$$\operatorname{vec} Y = (B' \otimes A) \operatorname{vec} X.$$

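From this expression we see that $y_{ij}$ is the (scalar) product of the $i$th row of

$$[b_{1j}A \;\vdots\; b_{2j}A \;\vdots\; \cdots \;\vdots\; b_{nj}A]$$

and $\operatorname{vec} X$, so that

$$y_{ij} = \sum_{p=1}^{n} \sum_{k=1}^{m} a_{ik}b_{pj}x_{kp}. \tag{4.14}$$

From (4.14) we immediately obtain

$$\frac{\partial y_{ij}}{\partial x_{rs}} = a_{ir}b_{sj}. \tag{4.15}$$

We can now write the expression for $\partial y_{ij}/\partial X$,

$$
\frac{\partial y_{ij}}{\partial X} = \begin{bmatrix}
\dfrac{\partial y_{ij}}{\partial x_{11}} & \dfrac{\partial y_{ij}}{\partial x_{12}} & \cdots & \dfrac{\partial y_{ij}}{\partial x_{1n}} \\
\dfrac{\partial y_{ij}}{\partial x_{21}} & \dfrac{\partial y_{ij}}{\partial x_{22}} & \cdots & \dfrac{\partial y_{ij}}{\partial x_{2n}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial y_{ij}}{\partial x_{m1}} & \dfrac{\partial y_{ij}}{\partial x_{m2}} & \cdots & \dfrac{\partial y_{ij}}{\partial x_{mn}}
\end{bmatrix}. \tag{4.16}
$$

Using (4.15), we obtain

$$
\frac{\partial y_{ij}}{\partial X} = \begin{bmatrix}
a_{i1}b_{1j} & a_{i1}b_{2j} & \cdots & a_{i1}b_{nj} \\
a_{i2}b_{1j} & a_{i2}b_{2j} & \cdots & a_{i2}b_{nj} \\
\vdots & \vdots & & \vdots \\
a_{im}b_{1j} & a_{im}b_{2j} & \cdots & a_{im}b_{nj}
\end{bmatrix}. \tag{4.17}
$$

We note that the matrix on the right hand side of (4.17) can be expressed as (for notation see (1.5), (1.13), (1.16) and (1.17))

$$
\begin{bmatrix} a_{i1} \\ a_{i2} \\ \vdots \\ a_{im} \end{bmatrix} [b_{1j} \;\; b_{2j} \;\; \cdots \;\; b_{nj}] = A_{i\cdot}B_{\cdot j}' = A'e_ie_j'B'.
$$

So that

$$\frac{\partial y_{ij}}{\partial X} = A'E_{ij}B' \tag{4.18}$$

where $E_{ij}$ is an elementary matrix of order $(l \times q)$, the order of the matrix $Y$.

We also use (4.14) to obtain an expression for $\partial Y/\partial x_{rs}$:

$$\frac{\partial Y}{\partial x_{rs}} = \Big[\frac{\partial y_{ij}}{\partial x_{rs}}\Big] \quad (r, s \text{ fixed}; \; i, j \text{ variable}, \; 1 \le i \le l, \; 1 \le j \le q)$$

that is

$$
\frac{\partial Y}{\partial x_{rs}} = \begin{bmatrix}
\dfrac{\partial y_{11}}{\partial x_{rs}} & \dfrac{\partial y_{12}}{\partial x_{rs}} & \cdots & \dfrac{\partial y_{1q}}{\partial x_{rs}} \\
\dfrac{\partial y_{21}}{\partial x_{rs}} & \dfrac{\partial y_{22}}{\partial x_{rs}} & \cdots & \dfrac{\partial y_{2q}}{\partial x_{rs}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial y_{l1}}{\partial x_{rs}} & \dfrac{\partial y_{l2}}{\partial x_{rs}} & \cdots & \dfrac{\partial y_{lq}}{\partial x_{rs}}
\end{bmatrix}
= \sum_{i,j} E_{ij}\frac{\partial y_{ij}}{\partial x_{rs}} \tag{4.19}
$$

where $E_{ij}$ is an elementary matrix of order $(l \times q)$. We again use (4.15) to write

$$
\frac{\partial Y}{\partial x_{rs}} = \begin{bmatrix}
a_{1r}b_{s1} & a_{1r}b_{s2} & \cdots & a_{1r}b_{sq} \\
a_{2r}b_{s1} & a_{2r}b_{s2} & \cdots & a_{2r}b_{sq} \\
\vdots & \vdots & & \vdots \\
a_{lr}b_{s1} & a_{lr}b_{s2} & \cdots & a_{lr}b_{sq}
\end{bmatrix}
= \begin{bmatrix} a_{1r} \\ a_{2r} \\ \vdots \\ a_{lr} \end{bmatrix} [b_{s1} \;\; b_{s2} \;\; \cdots \;\; b_{sq}]
= A_{\cdot r}B_{s\cdot}' = Ae_re_s'B.
$$

So that

$$\frac{\partial (AXB)}{\partial x_{rs}} = AE_{rs}B \tag{4.20}$$

where $E_{rs}$ is an elementary matrix of order $(m \times n)$, the order of the matrix $X$.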
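Formulas (4.18) and (4.20) are two arrangements of the same scalars $a_{ir}b_{sj}$, which a short NumPy sketch can confirm. The code below is illustrative and not from the original text; the helper names, the chosen orders and the random matrices are assumptions, and indices are zero-based.

```python
import numpy as np

def elem(r, s, m, n):
    """Elementary matrix E_rs of order (m x n), zero-based indices."""
    E = np.zeros((m, n))
    E[r, s] = 1.0
    return E

rng = np.random.default_rng(1)
l, m, n, q = 2, 3, 4, 2                       # arbitrary orders
A = rng.standard_normal((l, m))
X = rng.standard_normal((m, n))
B = rng.standard_normal((n, q))

def dY_dxrs(r, s):
    """(4.20): dY/dx_rs = A E_rs B, of order (l x q)."""
    return A @ elem(r, s, m, n) @ B

def dyij_dX(i, j):
    """(4.18): dy_ij/dX = A' E_ij B', of order (m x n)."""
    return A.T @ elem(i, j, l, q) @ B.T

# Y = AXB is linear in X, so a unit step in x_rs gives the derivative exactly.
r, s = 1, 2
unit_step = A @ (X + elem(r, s, m, n)) @ B - A @ X @ B

# Element (i, j) of dY/dx_rs must equal element (r, s) of dy_ij/dX.
consistent = all(
    np.isclose(dY_dxrs(r_, s_)[i, j], dyij_dX(i, j)[r_, s_])
    for r_ in range(m) for s_ in range(n)
    for i in range(l) for j in range(q)
)
```

Note that the two elementary matrices have different orders, $(m \times n)$ in `dY_dxrs` and $(l \times q)$ in `dyij_dX`.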

Example 4.5
Find the derivative $\partial Y/\partial x_{rs}$, given

$$Y = AX'B$$

where the order of the matrices $A$, $X$ and $B$ is such that the product on the right hand side is defined.

Solution

By the method used above to obtain the derivative $\partial (AXB)/\partial x_{rs}$, we find

$$\frac{\partial (AX'B)}{\partial x_{rs}} = AE_{rs}'B.$$

Before continuing with further examples we need a rule for determining the derivative of a product of matrices.

Consider

$$Y = UV \tag{4.21}$$

where $U = [u_{ij}]$ is of order $(m \times n)$ and $V = [v_{ij}]$ is of order $(n \times l)$ and both $U$ and $V$ are functions of a matrix $X$.

We wish to determine

$$\frac{\partial Y}{\partial x_{rs}} \quad\text{and}\quad \frac{\partial y_{ij}}{\partial X}.$$

The $(i,j)$th element of (4.21) is

$$y_{ij} = \sum_{p=1}^{n} u_{ip}v_{pj} \tag{4.22}$$

hence

$$\frac{\partial y_{ij}}{\partial x_{rs}} = \sum_{p=1}^{n} \frac{\partial u_{ip}}{\partial x_{rs}}v_{pj} + \sum_{p=1}^{n} u_{ip}\frac{\partial v_{pj}}{\partial x_{rs}}. \tag{4.23}$$

For fixed $r$ and $s$, (4.23) is the $(i,j)$th element of the matrix $\partial Y/\partial x_{rs}$ of order $(m \times l)$, the same as the order of the matrix $Y$.

On comparing both the terms on the right hand side of (4.23) with (4.22), we can write

$$\frac{\partial (UV)}{\partial x_{rs}} = \frac{\partial U}{\partial x_{rs}}V + U\frac{\partial V}{\partial x_{rs}} \tag{4.24}$$

as one would expect.

On the other hand, when fixing $(i,j)$, (4.23) is the $(r,s)$th element of the matrix $\partial y_{ij}/\partial X$, which is of the same order as the matrix $X$, that is

$$\frac{\partial y_{ij}}{\partial X} = \sum_{p=1}^{n} \frac{\partial u_{ip}}{\partial X}v_{pj} + \sum_{p=1}^{n} u_{ip}\frac{\partial v_{pj}}{\partial X}. \tag{4.25}$$

We will make use of the result (4.24) in some of the subsequent examples.

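Example 4.6
Let $X = [x_{rs}]$ be a non-singular matrix. Find the derivative $\partial Y/\partial x_{rs}$, given

(i) $Y = AX^{-1}B$, and
(ii) $Y = X'AX$.

Solution
(i) Using (4.24) to differentiate

$$YY^{-1} = I,$$

we obtain

$$\frac{\partial Y}{\partial x_{rs}}Y^{-1} + Y\frac{\partial Y^{-1}}{\partial x_{rs}} = 0,$$

hence

$$\frac{\partial Y}{\partial x_{rs}} = -Y\frac{\partial Y^{-1}}{\partial x_{rs}}Y.$$

But by (4.20)

$$\frac{\partial Y^{-1}}{\partial x_{rs}} = \frac{\partial}{\partial x_{rs}}(B^{-1}XA^{-1}) = B^{-1}E_{rs}A^{-1}$$

so that

$$\frac{\partial Y}{\partial x_{rs}} = \frac{\partial}{\partial x_{rs}}(AX^{-1}B) = -AX^{-1}BB^{-1}E_{rs}A^{-1}AX^{-1}B = -AX^{-1}E_{rs}X^{-1}B.$$

(ii) Using (4.24), we obtain

$$\frac{\partial Y}{\partial x_{rs}} = \frac{\partial X'}{\partial x_{rs}}AX + X'\frac{\partial (AX)}{\partial x_{rs}} = E_{rs}'AX + X'AE_{rs} \quad\text{(by (4.12) and (4.20)).}$$

Both (4.18) and (4.20) were derived from (4.15) which is valid for all $i, j$ and $r, s$, defined by the orders of the matrices involved.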

The First Transformation Principle
It follows that (4.18) is a transformation of (4.20) and conversely. To obtain (4.18) from (4.20) we replace $A$ by $A'$, $B$ by $B'$ and $E_{rs}$ by $E_{ij}$ (careful, $E_{rs}$ and $E_{ij}$ may be of different orders).

The point is that although (4.18) and (4.20) were derived for constant matrices $A$ and $B$, the above transformation is independent of the status of the matrices and is valid even when $A$ and $B$ are functions of $X$.

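Example 4.7
Find the derivative $\partial y_{ij}/\partial X$, given

(i) $Y = AX'B$,
(ii) $Y = AX^{-1}B$, and
(iii) $Y = X'AX$

where $X = [x_{ij}]$ is a nonsingular matrix.

Solution
(i) Let $W = X'$, then

$$Y = AWB \quad\text{so that by (4.20)}\quad \frac{\partial Y}{\partial w_{rs}} = AE_{rs}B$$

hence

$$\frac{\partial y_{ij}}{\partial W} = A'E_{ij}B'.$$

But

$$\frac{\partial y_{ij}}{\partial X} = \frac{\partial y_{ij}}{\partial W'} = \Big(\frac{\partial y_{ij}}{\partial W}\Big)'$$

hence

$$\frac{\partial y_{ij}}{\partial X} = BE_{ij}'A.$$

(ii) From Example 4.6(i)

$$\frac{\partial Y}{\partial x_{rs}} = -AX^{-1}E_{rs}X^{-1}B.$$

Let $A_1 = AX^{-1}$ and $B_1 = X^{-1}B$, then

$$\frac{\partial Y}{\partial x_{rs}} = -A_1E_{rs}B_1$$

so that

$$\frac{\partial y_{ij}}{\partial X} = -A_1'E_{ij}B_1' = -(X^{-1})'A'E_{ij}B'(X^{-1})'.$$

(iii) From Example 4.6(ii)

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}'AX + X'AE_{rs}.$$

Let $A_1 = I$, $B_1 = AX$, $A_2 = X'A$ and $B_2 = I$, then

$$\frac{\partial Y}{\partial x_{rs}} = A_1E_{rs}'B_1 + A_2E_{rs}B_2.$$

The second term on the right hand side is in standard form. The first term is in the form of the solution to Example 4.5 for which the derivative $\partial y_{ij}/\partial X$ was found in (i) above, hence

$$\frac{\partial y_{ij}}{\partial X} = B_1E_{ij}'A_1 + A_2'E_{ij}B_2' = AXE_{ij}' + A'XE_{ij}.$$

It is interesting to compare this last result with the example in section 4.2 when we considered the scalar $y = x'Ax$.

In this special case when the matrix $X$ has only one column, the elementary matrix, which is of the same order as $Y$, becomes

$$E_{ij} = E_{ij}' = 1.$$

Hence

$$\frac{\partial y_{ij}}{\partial X} = \frac{\partial y}{\partial x} = Ax + A'x$$

which is the result obtained in section 4.2 (see (4.4)). Conversely using the above techniques we can also obtain the derivatives of the matrix equivalents of the other equations in the table (4.4).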

Example 4.8
Find

$$\frac{\partial Y}{\partial x_{rs}} \quad\text{and}\quad \frac{\partial y_{ij}}{\partial X}$$

when

(i) $Y = AX$, and
(ii) $Y = X'X$.

Solution
(i) With $B = I$, apply (4.20)

$$\frac{\partial Y}{\partial x_{rs}} = AE_{rs}.$$

The transformation principle results in

$$\frac{\partial y_{ij}}{\partial X} = A'E_{ij}.$$

(ii) This is a special case of Example 4.6(ii) in which $A = I$. We have found the solution

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}'X + X'E_{rs}$$

and (Solution to Example 4.7(iii))

$$\frac{\partial y_{ij}}{\partial X} = XE_{ij}' + XE_{ij}.$$

4.6 THE DERIVATIVES OF THE POWERS OF A MATRIX

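Our aim in this section is to obtain the rules for determining

$$\frac{\partial Y}{\partial x_{rs}} \quad\text{and}\quad \frac{\partial y_{ij}}{\partial X}$$

when

$$Y = X^n.$$

Using (4.24) when $U = V = X$ so that

$$Y = X^2$$

we immediately obtain

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}X + XE_{rs}$$

and, applying the first transformation principle,

$$\frac{\partial y_{ij}}{\partial X} = E_{ij}X' + X'E_{ij}.$$

It is instructive to repeat this exercise with

$$U = X^2 \quad\text{and}\quad V = X$$

so that

$$Y = X^3.$$

We obtain

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}X^2 + XE_{rs}X + X^2E_{rs}$$

and

$$\frac{\partial y_{ij}}{\partial X} = E_{ij}(X')^2 + X'E_{ij}X' + (X')^2E_{ij}.$$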

More generally, it can be proved by induction, that for

Y=Xn

k=0

XkEESXn-k-1

where by definition X ° = I, and

ay;l

ax

a(X-n) Xn+X-n a (Xn)=

0

airs axrs

3(X-n)_ `x -n a(Xn)

X-n.axrs axrs

x )k E,j (X ) n -k-1

Example 4.9Using the result (4.26), obtain aYlaxrs when

Y=X-n

SolutionUsing (4.24) on both sides of

X-nXn=Iwe find

so that

Now making use of (4.26), we conclude that

3(X-n)

"-I

k=1

= -x-nFn-1 7

L=°axrs
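Formula (4.26) and the result of Example 4.9 lend themselves to a quick numerical check. The sketch below assumes NumPy; the function name `dpower`, the test matrix and the step size `h` are illustrative choices, not part of the text.

```python
import numpy as np

def elem(r, s, n):
    """Elementary matrix E_rs of order (n x n), 0-based indices."""
    M = np.zeros((n, n))
    M[r, s] = 1.0
    return M

def dpower(X, n_pow, r, s):
    """Right-hand side of (4.26): sum_{k=0}^{n-1} X^k E_rs X^(n-k-1)."""
    mp = np.linalg.matrix_power
    E_rs = elem(r, s, X.shape[0])
    return sum(mp(X, k) @ E_rs @ mp(X, n_pow - k - 1) for k in range(n_pow))

rng = np.random.default_rng(1)
X = 2.0 * np.eye(3) + 0.3 * rng.standard_normal((3, 3))   # test matrix near 2I
r, s, n_pow, h = 1, 2, 4, 1e-6
step = h * elem(r, s, 3)
mp = np.linalg.matrix_power

# Central-difference estimate of d(X^n)/dx_rs
numeric = (mp(X + step, n_pow) - mp(X - step, n_pow)) / (2 * h)
err_pos = np.abs(dpower(X, n_pow, r, s) - numeric).max()

# Example 4.9: d(X^-n)/dx_rs = -X^-n [d(X^n)/dx_rs] X^-n
Xinv_n = np.linalg.inv(mp(X, n_pow))
formula_neg = -Xinv_n @ dpower(X, n_pow, r, s) @ Xinv_n
numeric_neg = (np.linalg.inv(mp(X + step, n_pow))
               - np.linalg.inv(mp(X - step, n_pow))) / (2 * h)
err_neg = np.abs(formula_neg - numeric_neg).max()
print(err_pos, err_neg)
```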


(1) Given -x= xtl x12 x3

x21 x22x233]

XkErsXn-k-1

Y = 1x-1

2x2 sin x

and y = 2x11x22 -x21x13, calculate


(4.26)

(4.27)

ay andBY

ax ax


(2) Given

Xsinx X

cos x czand X =

evaluatealxlax

by(a) a direct method(b) use of a derivative formula.

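(3) Given

$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{bmatrix} \quad\text{and}\quad Y = X'X,$$

use a direct method to evaluate

(a) $\dfrac{\partial Y}{\partial x_{21}}$ and (b) $\dfrac{\partial y_{13}}{\partial X}$.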

Fsinx

L'

ex

XI

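(4) Obtain expressions for

$$\frac{\partial Y}{\partial x_{rs}} \quad\text{and}\quad \frac{\partial y_{ij}}{\partial X}$$

when

(a) $Y = XAX$ and (b) $Y = XAX'$.

(5) Obtain an expression for $\partial |AXB|/\partial x_{rs}$. It is assumed that $AXB$ is non-singular.

(6) Evaluate $\partial Y/\partial x_{rs}$ when

(a) $Y = X(X')^2$ and (b) $Y = (X')^2X$.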

CHAPTER 5

Further Development of Matrix Calculus including an Application of Kronecker Products

5.1 INTRODUCTION

In Chapter 4 we discussed rules for determining the derivatives of a vector and then the derivatives of a matrix.

But it will be remembered that when $Y$ is a matrix, then $\operatorname{vec} Y$ is a vector. This fact, together with the closely related Kronecker product techniques discussed in Chapter 2, will now be exploited to derive some interesting results.

Also we explore further the derivatives of some scalar functions with respect to a matrix, first considered in the previous chapter.

5.2 DERIVATIVES OF MATRICES AND KRONECKER PRODUCTS

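In the previous chapter we have found $\partial y_{ij}/\partial X$ when

$$Y = AXB \qquad (5.1)$$

where $Y = [y_{ij}]$, $A = [a_{ij}]$, $X = [x_{ij}]$ and $B = [b_{ij}]$.

We now obtain $(\partial \operatorname{vec} Y)/(\partial \operatorname{vec} X)$ for (5.1). We can write (5.1) as

$$y = Px \qquad (5.2)$$

where $y = \operatorname{vec} Y$, $x = \operatorname{vec} X$ and $P = B' \otimes A$.

By (4.1), (4.4) and (2.10)

$$\frac{\partial y}{\partial x} = P' = (B' \otimes A)' = B \otimes A'. \qquad (5.3)$$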

The corresponding result for the equation

$$Y = AX'B \qquad (5.4)$$

is not so simple.

The problem is that when we write (5.4) in the form of (5.2), we have this time

$$y = Pz \qquad (5.5)$$

where $z = \operatorname{vec} X'$.

We can find (see (2.25)) a permutation matrix $U$ such that

$$\operatorname{vec} X' = U \operatorname{vec} X \qquad (5.6)$$

in which case (5.5) becomes

$$y = PUx$$

so that

$$\frac{\partial y}{\partial x} = (PU)' = U'(B \otimes A'). \qquad (5.7)$$

It is convenient to write

$$U'(B \otimes A') = (B \otimes A')_{(n)}. \qquad (5.8)$$

$U'$ is seen to premultiply the matrix $(B \otimes A')$. Its effect is therefore to rearrange the rows of $(B \otimes A')$.

In fact the first and every subsequent $n$th row of $(B \otimes A')$ form the first $m$ consecutive rows of $(B \otimes A')_{(n)}$. The second and every subsequent $n$th row form the next $m$ consecutive rows of $(B \otimes A')_{(n)}$, and so on.

A special case of this notation is for $n = 1$, then

$$(B \otimes A')_{(1)} = B \otimes A'. \qquad (5.9)$$

Now, returning to (5.5), we obtain, by comparison with (5.3),

$$\frac{\partial y}{\partial x} = (B \otimes A')_{(n)}. \qquad (5.10)$$

Example 5.1

Obtain $(\partial \operatorname{vec} Y)/(\partial \operatorname{vec} X)$, given $X = [x_{ij}]$ of order $(m \times n)$, when

(i) $Y = AX$, (ii) $Y = XA$, (iii) $Y = AX'$ and (iv) $Y = X'A$.

Solution

Let $y = \operatorname{vec} Y$ and $x = \operatorname{vec} X$.

(i) Use (5.3) with $B = I$:

$$\frac{\partial y}{\partial x} = I \otimes A'.$$

(ii) Use (5.3):

$$\frac{\partial y}{\partial x} = A \otimes I.$$

(iii) Use (5.10):

$$\frac{\partial y}{\partial x} = (I \otimes A')_{(n)}.$$

(iv) Use (5.10):

$$\frac{\partial y}{\partial x} = (A \otimes I)_{(n)}.$$
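Results (5.3), (5.8) and (5.10) can be illustrated numerically. The following is a sketch under stated assumptions: NumPy is used, `vec` stacks columns, and the helper names `commutation` and `book_derivative` are hypothetical. In the layout used here, row $k$ of $(\partial \operatorname{vec} Y)/(\partial \operatorname{vec} X)$ holds the derivatives with respect to the $k$th element of $\operatorname{vec} X$, so the matrix is the transpose of the usual Jacobian.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

def commutation(m, n):
    """Permutation matrix U with U @ vec(X) == vec(X.T) for X of order (m x n)."""
    U = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            U[j + i * n, i + j * m] = 1.0
    return U

def book_derivative(f, m, n):
    """d vec Y / d vec X for a LINEAR map f: row k is vec(dY/dx_k)',
    where x_k is the k-th element of vec X."""
    rows = []
    for k in range(m * n):
        e = np.zeros(m * n)
        e[k] = 1.0
        rows.append(vec(f(e.reshape((m, n), order="F"))))
    return np.array(rows)

rng = np.random.default_rng(2)
m, n = 2, 3
A, B = rng.standard_normal((4, m)), rng.standard_normal((n, 5))     # Y = A X B
A2, B2 = rng.standard_normal((4, n)), rng.standard_normal((m, 5))   # Y = A2 X' B2
U = commutation(m, n)

# (5.3): d vec(AXB) / d vec X = B (x) A'
ok_53 = np.allclose(book_derivative(lambda X: A @ X @ B, m, n), np.kron(B, A.T))

# (5.7)/(5.10): d vec(A2 X' B2) / d vec X = U'(B2 (x) A2')
K = np.kron(B2, A2.T)
ok_510 = np.allclose(book_derivative(lambda X: A2 @ X.T @ B2, m, n), U.T @ K)

# (5.8): U' picks rows 1, n+1, 2n+1, ... first, then rows 2, n+2, ... and so on
order = [j + i * n for j in range(n) for i in range(m)]
ok_rows = np.allclose(U.T @ K, K[order])
print(ok_53, ok_510, ok_rows)
```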

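5.3 THE DETERMINATION OF (∂ vec Y)/(∂ vec X) FOR MORE COMPLICATED EQUATIONS

In this section we wish to determine the derivative $(\partial \operatorname{vec} Y)/(\partial \operatorname{vec} X)$ when, for example,

$$Y = X'AX \qquad (5.11)$$

where $X$ is of order $(m \times n)$.

Since $Y$ is a matrix of order $(n \times n)$, it follows that $\operatorname{vec} Y$ and $\operatorname{vec} X$ are vectors of order $nn$ and $nm$ respectively.

With the usual notation

$$Y = [y_{ij}], \quad X = [x_{ij}]$$

we have, by definition (4.1),

$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} =
\begin{bmatrix}
\dfrac{\partial y_{11}}{\partial x_{11}} & \dfrac{\partial y_{21}}{\partial x_{11}} & \cdots & \dfrac{\partial y_{nn}}{\partial x_{11}} \\
\dfrac{\partial y_{11}}{\partial x_{21}} & \dfrac{\partial y_{21}}{\partial x_{21}} & \cdots & \dfrac{\partial y_{nn}}{\partial x_{21}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial y_{11}}{\partial x_{mn}} & \dfrac{\partial y_{21}}{\partial x_{mn}} & \cdots & \dfrac{\partial y_{nn}}{\partial x_{mn}}
\end{bmatrix}. \qquad (5.12)$$

But by definition (4.19), the first row of the matrix (5.12) is $\left(\operatorname{vec} \dfrac{\partial Y}{\partial x_{11}}\right)'$, the second row of the matrix (5.12) is $\left(\operatorname{vec} \dfrac{\partial Y}{\partial x_{21}}\right)'$, etc.

We can therefore write (5.12) as

$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = \left[\operatorname{vec}\frac{\partial Y}{\partial x_{11}} \;\vdots\; \operatorname{vec}\frac{\partial Y}{\partial x_{21}} \;\vdots\; \cdots \;\vdots\; \operatorname{vec}\frac{\partial Y}{\partial x_{mn}}\right]'. \qquad (5.13)$$

We now use the solution to Example 4.6, where we had established that when $Y = X'AX$, then

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}'AX + X'AE_{rs}. \qquad (5.14)$$

It follows that

$$\operatorname{vec}\frac{\partial Y}{\partial x_{rs}} = \operatorname{vec} E_{rs}'AX + \operatorname{vec} X'AE_{rs} = (X'A' \otimes I)\operatorname{vec} E_{rs}' + (I \otimes X'A)\operatorname{vec} E_{rs} \qquad (5.15)$$

(using (2.13)).

Substituting (5.15) into (5.13) we obtain

$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = \left[(X'A' \otimes I)\left[\operatorname{vec} E_{11}' \;\vdots\; \operatorname{vec} E_{21}' \;\vdots\; \cdots \;\vdots\; \operatorname{vec} E_{mn}'\right]\right]' + \left[(I \otimes X'A)\left[\operatorname{vec} E_{11} \;\vdots\; \operatorname{vec} E_{21} \;\vdots\; \cdots \;\vdots\; \operatorname{vec} E_{mn}\right]\right]'$$

$$= \left[\operatorname{vec} E_{11}' \;\vdots\; \operatorname{vec} E_{21}' \;\vdots\; \cdots \;\vdots\; \operatorname{vec} E_{mn}'\right]'(AX \otimes I) + \left[\operatorname{vec} E_{11} \;\vdots\; \operatorname{vec} E_{21} \;\vdots\; \cdots \;\vdots\; \operatorname{vec} E_{mn}\right]'(I \otimes A'X) \qquad (5.16)$$

(by (2.10)).

The matrix

$$\left[\operatorname{vec} E_{11} \;\vdots\; \operatorname{vec} E_{21} \;\vdots\; \cdots \;\vdots\; \operatorname{vec} E_{mn}\right]$$

is the unit matrix $I$ of order $(mn \times mn)$.

Using (2.23) we can write (5.16) as

$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = U'(AX \otimes I) + (I \otimes A'X).$$

That is

$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = (AX \otimes I)_{(n)} + (I \otimes A'X). \qquad (5.17)$$

In the above calculations we have used the derivative $\partial Y/\partial x_{rs}$ to obtain $(\partial \operatorname{vec} Y)/(\partial \operatorname{vec} X)$.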

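The Second Transformation Principle

Only slight modifications are needed to generalise the above calculations and show that whenever

$$\frac{\partial Y}{\partial x_{rs}} = AE_{rs}B + CE_{rs}'D$$

where $A$, $B$, $C$ and $D$ may be functions of $X$, then

$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X} = B \otimes A' + (D \otimes C')_{(n)}. \qquad (5.18)$$

We will refer to the above result as the second transformation principle.

Example 5.2
Find

$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X}$$

when

(i) $Y = X'X$, (ii) $Y = AX^{-1}B$.

Solution

Let $y = \operatorname{vec} Y$ and $x = \operatorname{vec} X$.

(i) From Example 4.8

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}'X + X'E_{rs}.$$

Now use the second transformation principle to obtain

$$\frac{\partial y}{\partial x} = I \otimes X + (X \otimes I)_{(n)}.$$

(ii) From Example 4.6

$$\frac{\partial Y}{\partial x_{rs}} = -AX^{-1}E_{rs}X^{-1}B,$$

hence

$$\frac{\partial y}{\partial x} = -(X^{-1}B) \otimes (X^{-1})'A'.$$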
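Result (5.17) for $Y = X'AX$ can be compared with a finite-difference construction of the matrix (5.13). This is a sketch only, assuming NumPy; `commutation` is a hypothetical helper that builds the permutation matrix $U$ of (5.6), and the test matrices are arbitrary.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

def commutation(m, n):
    """Permutation matrix U with U @ vec(X) == vec(X.T) for X of order (m x n)."""
    U = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            U[j + i * n, i + j * m] = 1.0
    return U

rng = np.random.default_rng(3)
m, n, h = 3, 2, 1e-5
A = rng.standard_normal((m, m))
X = rng.standard_normal((m, n))
f = lambda X: X.T @ A @ X                     # Y = X'AX

# Build (5.13) row by row: row k is vec(dY/dx_k)' with x_k the k-th element of vec X
rows = []
for k in range(m * n):
    e = np.zeros(m * n)
    e[k] = h
    H = e.reshape((m, n), order="F")
    rows.append(vec(f(X + H) - f(X - H)) / (2 * h))
numeric = np.array(rows)

U = commutation(m, n)
I_n = np.eye(n)
formula = U.T @ np.kron(A @ X, I_n) + np.kron(I_n, A.T @ X)   # (5.17)
max_err = np.abs(formula - numeric).max()
print(max_err)
```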

Using the above results for matrices, we should be able to rediscover results for the derivatives of vectors considered in Chapter 4.

For example, let $X$ be a column vector $x$; then

$$Y = X'X \quad\text{becomes}\quad y = x'x \quad (y \text{ is a scalar}).$$

The above result for $\partial y/\partial x$ becomes

$$\frac{\partial y}{\partial x} = (I \otimes x) + (x \otimes I)_{(1)}.$$

But the unit matrices involved are of order $(n \times n)$ which, for the one-column vector $X$, is $(1 \times 1)$. Hence

$$\frac{\partial y}{\partial x} = 1 \otimes x + x \otimes 1 \quad (\text{use } (5.9))$$

$$= x + x = 2x,$$

which is the result found in (4.4).

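5.4 MORE ON DERIVATIVES OF SCALAR FUNCTIONS WITH RESPECT TO A MATRIX

In section 4.4 we derived a formula, (4.10), which is useful when evaluating $\partial |Y|/\partial X$ for a large class of scalar matrix functions defined by $Y$.

Example 5.3
Evaluate the derivatives

(i) $\dfrac{\partial \log |X|}{\partial X}$ and (ii) $\dfrac{\partial |X|^r}{\partial X}$.

Solution

(i) We have

$$\frac{\partial}{\partial x_{rs}}(\log |X|) = \frac{1}{|X|}\frac{\partial |X|}{\partial x_{rs}}.$$

From Example 4.4,

$$\frac{\partial |X|}{\partial X} = |X|(X^{-1})' \quad (\text{non-symmetric case}).$$

Hence

$$\frac{\partial \log |X|}{\partial X} = (X^{-1})'.$$

(ii)

$$\frac{\partial |X|^r}{\partial x_{rs}} = r|X|^{r-1}\frac{\partial |X|}{\partial x_{rs}}.$$

Hence

$$\frac{\partial |X|^r}{\partial X} = r|X|^r(X^{-1})'.$$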

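Traces of matrices form an important class of scalar matrix functions covering a wide range of applications, particularly in statistics in the formulation of least squares and various optimisation problems.

Having discussed the evaluation of the derivative $\partial Y/\partial x_{rs}$ for various products of matrices, we can now apply these results to the evaluation of the derivative

$$\frac{\partial (\operatorname{tr} Y)}{\partial X}.$$

We first note that

$$\frac{\partial (\operatorname{tr} Y)}{\partial X} = \left[\frac{\partial (\operatorname{tr} Y)}{\partial x_{rs}}\right] \qquad (5.19)$$

where the bracket on the right hand side of (5.19) denotes (as usual) a matrix of the same order as $X$, defined by its $(r,s)$th element.

As a consequence of (5.19), or perhaps more clearly seen from the definition (4.7), we note that on transposing $X$ we have

$$\frac{\partial (\operatorname{tr} Y)}{\partial X'} = \left(\frac{\partial (\operatorname{tr} Y)}{\partial X}\right)'. \qquad (5.20)$$

Another, and possibly an obvious, property of a trace is found when considering the definition of $\partial Y/\partial x_{rs}$ (see (4.19)).

Assuming that $Y = [y_{ij}]$ is of order $(n \times n)$,

$$\operatorname{tr}\frac{\partial Y}{\partial x_{rs}} = \frac{\partial y_{11}}{\partial x_{rs}} + \frac{\partial y_{22}}{\partial x_{rs}} + \cdots + \frac{\partial y_{nn}}{\partial x_{rs}} = \frac{\partial}{\partial x_{rs}}(y_{11} + y_{22} + \cdots + y_{nn}).$$

Hence,

$$\operatorname{tr}\frac{\partial Y}{\partial x_{rs}} = \frac{\partial (\operatorname{tr} Y)}{\partial x_{rs}}. \qquad (5.21)$$

Example 5.4

Evaluate

$$\frac{\partial \operatorname{tr}(AX)}{\partial X}.$$

Solution

$$\frac{\partial \operatorname{tr}(AX)}{\partial x_{rs}} = \operatorname{tr}\frac{\partial (AX)}{\partial x_{rs}} \quad \text{by } (5.21)$$

$$= \operatorname{tr}(AE_{rs}) \quad \text{by Example 4.8}$$

$$= \operatorname{tr}(E_{rs}'A') \quad \text{since } \operatorname{tr} Y = \operatorname{tr} Y'$$

$$= (\operatorname{vec} E_{rs})'(\operatorname{vec} A') \quad \text{by Example 1.4}.$$

Hence,

$$\frac{\partial \operatorname{tr}(AX)}{\partial X} = A'.$$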

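As we found in the previous chapter, we can use the derivative of the trace of one product to obtain the derivative of the trace of a different product.

Example 5.5
Evaluate

$$\frac{\partial \operatorname{tr}(AX')}{\partial X}.$$

Solution

From the previous result

$$\frac{\partial \operatorname{tr}(BX)}{\partial X} = \frac{\partial \operatorname{tr}(X'B')}{\partial X} = B'.$$

Let $A' = B$ in the above equation. Since $\operatorname{tr}(AX') = \operatorname{tr}(X'A)$, it follows that

$$\frac{\partial \operatorname{tr}(AX')}{\partial X} = \frac{\partial \operatorname{tr}(X'A)}{\partial X} = \frac{\partial \operatorname{tr}(A'X)}{\partial X} = A.$$

The derivatives of traces of more complicated matrix products can be found similarly.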

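Example 5.6

Evaluate

$$\frac{\partial (\operatorname{tr} Y)}{\partial X}$$

when

(i) $Y = X'AX$, (ii) $Y = X'AXB$.

Solution

It is obvious that (i) follows from (ii) when $B = I$.

(ii) $Y = X_1B$ where $X_1 = X'AX$.

$$\frac{\partial Y}{\partial x_{rs}} = \frac{\partial X_1}{\partial x_{rs}}B = E_{rs}'AXB + X'AE_{rs}B \quad (\text{by Example 4.6}).$$

Hence,

$$\operatorname{tr}\left(\frac{\partial Y}{\partial x_{rs}}\right) = \operatorname{tr}(E_{rs}'AXB) + \operatorname{tr}(X'AE_{rs}B)$$

$$= \operatorname{tr}(E_{rs}'AXB) + \operatorname{tr}(E_{rs}'A'XB')$$

$$= (\operatorname{vec} E_{rs})'\operatorname{vec}(AXB) + (\operatorname{vec} E_{rs})'\operatorname{vec}(A'XB').$$

It follows that

$$\frac{\partial (\operatorname{tr} Y)}{\partial X} = AXB + A'XB'.$$

(i) Let $B = I$ in the above equation; we obtain

$$\frac{\partial (\operatorname{tr} Y)}{\partial X} = AX + A'X = (A + A')X.$$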
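The result of Example 5.6(ii) can be confirmed by differentiating the trace element by element. The snippet below is a sketch assuming NumPy and arbitrary random test matrices; none of the variable names come from the text.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, h = 3, 2, 1e-5
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
X = rng.standard_normal((m, n))
tr_Y = lambda X: np.trace(X.T @ A @ X @ B)     # tr Y with Y = X'AXB

# Matrix of partial derivatives [d(tr Y)/dx_rs], as in (5.19)
numeric = np.zeros((m, n))
for r in range(m):
    for s in range(n):
        H = np.zeros((m, n))
        H[r, s] = h
        numeric[r, s] = (tr_Y(X + H) - tr_Y(X - H)) / (2 * h)

formula = A @ X @ B + A.T @ X @ B.T
max_err = np.abs(formula - numeric).max()
print(max_err)
```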

5.5 THE MATRIX DIFFERENTIAL

For a scalar function $f(x)$ where $x = [x_1\; x_2\; \ldots\; x_n]'$, the differential $df$ is defined as

$$df = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\,dx_i. \qquad (5.23)$$

Corresponding to this definition we define the matrix differential $dX$ for the matrix $X = [x_{ij}]$ of order $(m \times n)$ to be

$$dX = \begin{bmatrix} dx_{11} & dx_{12} & \cdots & dx_{1n} \\ dx_{21} & dx_{22} & \cdots & dx_{2n} \\ \vdots & \vdots & & \vdots \\ dx_{m1} & dx_{m2} & \cdots & dx_{mn} \end{bmatrix}. \qquad (5.24)$$

The following two results follow immediately:

$$d(aX) = a(dX) \quad (\text{where } a \text{ is a scalar}) \qquad (5.25)$$

$$d(X + Y) = dX + dY. \qquad (5.26)$$

Consider now $X = [x_{ij}]$ of order $(m \times n)$ and $Y = [y_{ij}]$ of order $(n \times p)$.

$$XY = \left[\sum_j x_{ij}y_{jk}\right]$$

hence

$$d(XY) = d\left[\sum_j x_{ij}y_{jk}\right] = \left[\sum_j (dx_{ij})y_{jk}\right] + \left[\sum_j x_{ij}(dy_{jk})\right].$$

It follows that

$$d(XY) = (dX)Y + X(dY). \qquad (5.27)$$

Example 5.7
Given $X = [x_{ij}]$, a nonsingular matrix, evaluate

(i) $d|X|$, (ii) $d(X^{-1})$.

Solution
(i) By (5.23)

$$d|X| = \sum_{i,j} \frac{\partial |X|}{\partial x_{ij}}(dx_{ij}) = \sum_{i,j} X_{ij}(dx_{ij})$$

since $(\partial |X|)/(\partial x_{ij}) = X_{ij}$, the cofactor of $x_{ij}$ in $|X|$.

By an argument similar to the one used in section 4.4, we can write

$$d|X| = \operatorname{tr}\{Z'(dX)\} \quad (\text{compare with } (4.10))$$

where $Z = [X_{ij}]$.

Since $Z' = |X|X^{-1}$, we can write

$$d|X| = |X|\operatorname{tr}\{X^{-1}(dX)\}.$$

(ii) Since

$$X^{-1}X = I$$

we use (5.27) to write

$$d(X^{-1})X + X^{-1}(dX) = 0.$$

Hence

$$d(X^{-1}) = -X^{-1}(dX)X^{-1}$$

(compare with Example 4.6).

Notice that if $X$ is a symmetric matrix, then

$$X = X'$$

and

$$(dX)' = dX. \qquad (5.28)$$
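The two differentials of Example 5.7 are first-order statements, so they can be illustrated by comparing them with the actual change produced by a small perturbation. The sketch below assumes NumPy; the test matrix is shifted towards $2I$ purely so that it is very likely to be well conditioned, and the size of `dX` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
X = 2.0 * np.eye(n) + 0.3 * rng.standard_normal((n, n))   # test matrix near 2I
dX = 1e-6 * rng.standard_normal((n, n))                    # small perturbation
X_inv = np.linalg.inv(X)

# d|X| = |X| tr{X^-1 (dX)}
d_det_formula = np.linalg.det(X) * np.trace(X_inv @ dX)
d_det_actual = np.linalg.det(X + dX) - np.linalg.det(X)

# d(X^-1) = -X^-1 (dX) X^-1
d_inv_formula = -X_inv @ dX @ X_inv
d_inv_actual = np.linalg.inv(X + dX) - X_inv

err_det = abs(d_det_formula - d_det_actual)
err_inv = np.abs(d_inv_formula - d_inv_actual).max()
print(err_det, err_inv)
```

The remaining discrepancies are second order in `dX`, which is why they are several orders of magnitude smaller than the perturbation itself.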



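(1) Consider

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} \quad\text{and}\quad Y = AX'.$$

Use a direct method to evaluate

$$\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X}$$

and verify (5.10).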

(2) Obtain $\dfrac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X}$ when

(i) Y = AX'B and (ii) Y = )JAII X2.

(3) Find expressions for

$$\frac{\partial \operatorname{tr} Y}{\partial X}$$

when

(a) $Y = AXB$, (b) $Y = X^2$ and (c) $Y = XX'$.

(4) Evaluate

$$\frac{\partial \operatorname{tr} Y}{\partial X}$$

when

(a) $Y = X^{-1}$, (b) $Y = AX^{-1}B$, (c) $Y = X^n$ and (d) $Y = e^X$.

(5) (a) Use the direct method to obtain expressions for the matrix differential $dY$ when

(i) $Y = AX$, (ii) $Y = X'X$ and (iii) $Y = X^2$.

(b) Find $dY$ when

$$Y = AXBX.$$

CHAPTER 6

The Derivative of a Matrix with respect to a Matrix

6.1 INTRODUCTION
In the previous two chapters we have defined the derivative of a matrix with respect to a scalar and the derivative of a scalar with respect to a matrix. We will now generalise the definitions to include the derivative of a matrix with respect to a matrix. The author has adopted the definition suggested by Vetter [31], although other definitions also give rise to some useful results.

6.2 THE DEFINITIONS AND SOME RESULTS
Let $Y = [y_{ij}]$ be a matrix of order $(p \times q)$. We have defined (see (4.19)) the derivative of $Y$ with respect to a scalar $x_{rs}$; it is the matrix $[\partial y_{ij}/\partial x_{rs}]$ of order $(p \times q)$.

Let $X = [x_{rs}]$ be a matrix of order $(m \times n)$. We generalise (4.19) and define the derivative of $Y$ with respect to $X$, denoted by

$$\frac{\partial Y}{\partial X},$$

as the partitioned matrix whose $(r,s)$th partition is

$$\frac{\partial Y}{\partial x_{rs}},$$

in other words

$$\frac{\partial Y}{\partial X} = \begin{bmatrix}
\dfrac{\partial Y}{\partial x_{11}} & \dfrac{\partial Y}{\partial x_{12}} & \cdots & \dfrac{\partial Y}{\partial x_{1n}} \\
\dfrac{\partial Y}{\partial x_{21}} & \dfrac{\partial Y}{\partial x_{22}} & \cdots & \dfrac{\partial Y}{\partial x_{2n}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial Y}{\partial x_{m1}} & \dfrac{\partial Y}{\partial x_{m2}} & \cdots & \dfrac{\partial Y}{\partial x_{mn}}
\end{bmatrix} = \sum_{r,s} E_{rs} \otimes \frac{\partial Y}{\partial x_{rs}}. \qquad (6.1)$$

The right hand side of (6.1) follows from the definitions (1.4) and (2.1), where $E_{rs}$ is of order $(m \times n)$, the order of the matrix $X$.

It is seen that $\partial Y/\partial X$ is a matrix of order $(mp \times nq)$.

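Example 6.1

Consider

$$Y = \begin{bmatrix} x_{11}x_{12}x_{22} & e^{x_{11}x_{21}} \\ \sin(x_{11} + x_{12}) & \log(x_{11} + x_{21}) \end{bmatrix} \quad\text{and}\quad X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}.$$

Evaluate

$$\frac{\partial Y}{\partial X}.$$

Solution

$$\frac{\partial Y}{\partial x_{11}} = \begin{bmatrix} x_{12}x_{22} & x_{21}e^{x_{11}x_{21}} \\ \cos(x_{11} + x_{12}) & \dfrac{1}{x_{11} + x_{21}} \end{bmatrix}, \qquad
\frac{\partial Y}{\partial x_{12}} = \begin{bmatrix} x_{11}x_{22} & 0 \\ \cos(x_{11} + x_{12}) & 0 \end{bmatrix},$$

$$\frac{\partial Y}{\partial x_{21}} = \begin{bmatrix} 0 & x_{11}e^{x_{11}x_{21}} \\ 0 & \dfrac{1}{x_{11} + x_{21}} \end{bmatrix}, \qquad
\frac{\partial Y}{\partial x_{22}} = \begin{bmatrix} x_{11}x_{12} & 0 \\ 0 & 0 \end{bmatrix}.$$

$$\frac{\partial Y}{\partial X} = \begin{bmatrix}
x_{12}x_{22} & x_{21}e^{x_{11}x_{21}} & x_{11}x_{22} & 0 \\
\cos(x_{11} + x_{12}) & \dfrac{1}{x_{11} + x_{21}} & \cos(x_{11} + x_{12}) & 0 \\
0 & x_{11}e^{x_{11}x_{21}} & x_{11}x_{12} & 0 \\
0 & \dfrac{1}{x_{11} + x_{21}} & 0 & 0
\end{bmatrix}.$$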

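Example 6.2
Given the matrix $X = [x_{ij}]$ of order $(m \times n)$, evaluate $\partial X/\partial X$ when

(i) all elements of $X$ are independent;
(ii) $X$ is a symmetric matrix (of course in this case $m = n$).

Solution

(i) By (6.1)

$$\frac{\partial X}{\partial X} = \sum_{r,s} E_{rs} \otimes E_{rs} = \bar{U} \quad (\text{see } (2.26)).$$

(ii)

$$\frac{\partial X}{\partial x_{rs}} = E_{rs} + E_{sr} \quad \text{for } r \neq s,$$

$$\frac{\partial X}{\partial x_{rs}} = E_{rs} \quad \text{for } r = s.$$

We can write the above as

$$\frac{\partial X}{\partial x_{rs}} = E_{rs} + E_{sr} - \delta_{rs}E_{rr}.$$

Hence,

$$\frac{\partial X}{\partial X} = \sum_{r,s} E_{rs} \otimes E_{rs} + \sum_{r,s} E_{rs} \otimes E_{sr} - \sum_{r,s} \delta_{rs}E_{rs} \otimes E_{rr} = \bar{U} + U - \sum_{r} E_{rr} \otimes E_{rr}$$

(see (2.24) and (2.26)).

Example 6.3

Evaluate and write out in full $\partial X'/\partial X$ given

$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{bmatrix}.$$

Solution

By (6.1) we have

$$\frac{\partial X'}{\partial X} = \sum_{r,s} E_{rs} \otimes E_{rs}' = U.$$

Hence

$$\frac{\partial X'}{\partial X} = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.$$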
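Definition (6.1) translates directly into a sum of Kronecker products. The sketch below assumes NumPy; `elem` and `matrix_derivative` are illustrative helper names. It assembles $\partial X/\partial X$ and $\partial X'/\partial X$ for a $(2 \times 3)$ matrix $X$ and compares the latter with the matrix written out in Example 6.3.

```python
import numpy as np

def elem(r, s, m, n):
    """Elementary matrix E_rs of order (m x n), 0-based indices."""
    M = np.zeros((m, n))
    M[r, s] = 1.0
    return M

def matrix_derivative(dY_dxrs, m, n):
    """Assemble dY/dX = sum_{r,s} E_rs (x) dY/dx_rs, as in (6.1).
    dY_dxrs(r, s) must return the matrix dY/dx_rs."""
    return sum(np.kron(elem(r, s, m, n), dY_dxrs(r, s))
               for r in range(m) for s in range(n))

m, n = 2, 3
dX_dX = matrix_derivative(lambda r, s: elem(r, s, m, n), m, n)       # sum E_rs (x) E_rs
dXt_dX = matrix_derivative(lambda r, s: elem(r, s, m, n).T, m, n)    # sum E_rs (x) E_rs'

U_example = np.array([[1, 0, 0, 0, 0, 0],
                      [0, 0, 1, 0, 0, 0],
                      [0, 0, 0, 0, 1, 0],
                      [0, 1, 0, 0, 0, 0],
                      [0, 0, 0, 1, 0, 0],
                      [0, 0, 0, 0, 0, 1]], dtype=float)
print(np.array_equal(dXt_dX, U_example), dX_dX.shape)
```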

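From the definition (6.1) we obtain

$$\left(\frac{\partial Y}{\partial X}\right)' = \left(\sum_{r,s} E_{rs} \otimes \frac{\partial Y}{\partial x_{rs}}\right)'$$

$$= \sum_{r,s} E_{rs}' \otimes \left(\frac{\partial Y}{\partial x_{rs}}\right)' \quad \text{by } (2.10)$$

$$= \sum_{r,s} E_{rs}' \otimes \frac{\partial Y'}{\partial x_{rs}} \quad \text{from } (4.19).$$

It follows that

$$\left(\frac{\partial Y}{\partial X}\right)' = \frac{\partial Y'}{\partial X'}. \qquad (6.2)$$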

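6.3 PRODUCT RULES FOR MATRICES

We shall first obtain a rule for the derivative of a product of matrices with respect to a matrix, that is, find an expression for

$$\frac{\partial (XY)}{\partial Z}$$

where the orders of the matrices are as indicated:

$$X\,(m \times n), \quad Y\,(n \times v), \quad Z\,(p \times q).$$

By (4.24) we write

$$\frac{\partial (XY)}{\partial z_{rs}} = \frac{\partial X}{\partial z_{rs}}Y + X\frac{\partial Y}{\partial z_{rs}}$$

where $Z = [z_{rs}]$.

If $E_{rs}$ is an elementary matrix of order $(p \times q)$, we make use of (6.1) to write

$$\frac{\partial (XY)}{\partial Z} = \sum_{r,s} E_{rs} \otimes \left[\frac{\partial X}{\partial z_{rs}}Y + X\frac{\partial Y}{\partial z_{rs}}\right]$$

$$= \sum_{r,s} E_{rs} \otimes \frac{\partial X}{\partial z_{rs}}Y + \sum_{r,s} E_{rs} \otimes X\frac{\partial Y}{\partial z_{rs}}$$

$$= \sum_{r,s} E_{rs}I_q \otimes \frac{\partial X}{\partial z_{rs}}Y + \sum_{r,s} I_pE_{rs} \otimes X\frac{\partial Y}{\partial z_{rs}}$$

(where $I_q$ and $I_p$ are unit matrices of order $(q \times q)$ and $(p \times p)$ respectively)

$$= \sum_{r,s} \left(E_{rs} \otimes \frac{\partial X}{\partial z_{rs}}\right)(I_q \otimes Y) + \sum_{r,s} (I_p \otimes X)\left(E_{rs} \otimes \frac{\partial Y}{\partial z_{rs}}\right) \quad (\text{by } (2.11));$$

finally, by (6.1),

$$\frac{\partial (XY)}{\partial Z} = \frac{\partial X}{\partial Z}(I_q \otimes Y) + (I_p \otimes X)\frac{\partial Y}{\partial Z}. \qquad (6.3)$$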

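Example 6.4
Find an expression for

$$\frac{\partial X^{-1}}{\partial X}.$$

Solution
Using (6.3) on

$$XX^{-1} = I,$$

we obtain

$$\frac{\partial (XX^{-1})}{\partial X} = \frac{\partial X}{\partial X}(I \otimes X^{-1}) + (I \otimes X)\frac{\partial X^{-1}}{\partial X} = 0,$$

hence

$$\frac{\partial X^{-1}}{\partial X} = -(I \otimes X)^{-1}\frac{\partial X}{\partial X}(I \otimes X^{-1}) = -(I \otimes X^{-1})\,\bar{U}\,(I \otimes X^{-1})$$

(by Example 6.2 and (2.12)).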
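The product rule (6.3) and the result of Example 6.4 can both be tested against derivatives assembled from definition (6.1). This is a sketch under stated assumptions: NumPy is used, the partial derivatives are estimated by central differences, and the functions `X_of` and `Y_of` are hypothetical test functions of $Z$ chosen only for illustration.

```python
import numpy as np

def elem(r, s, m, n):
    """Elementary matrix E_rs of order (m x n), 0-based indices."""
    M = np.zeros((m, n))
    M[r, s] = 1.0
    return M

def matrix_derivative(F, Z, h=1e-6):
    """Numerical dF/dZ = sum_{r,s} E_rs (x) dF/dz_rs (definition (6.1)),
    each dF/dz_rs being estimated by central differences."""
    p, q = Z.shape
    total = 0
    for r in range(p):
        for s in range(q):
            E_rs = elem(r, s, p, q)
            dF = (F(Z + h * E_rs) - F(Z - h * E_rs)) / (2 * h)
            total = total + np.kron(E_rs, dF)
    return total

rng = np.random.default_rng(6)
p, q = 2, 3
Z = rng.standard_normal((p, q))
A, B = rng.standard_normal((4, p)), rng.standard_normal((q, 2))
C, D = rng.standard_normal((2, q)), rng.standard_normal((p, 5))
X_of = lambda Z: A @ Z @ B        # X(Z), order (4 x 2)
Y_of = lambda Z: C @ Z.T @ D      # Y(Z), order (2 x 5)

# Product rule (6.3)
lhs = matrix_derivative(lambda Z: X_of(Z) @ Y_of(Z), Z)
rhs = (matrix_derivative(X_of, Z) @ np.kron(np.eye(q), Y_of(Z))
       + np.kron(np.eye(p), X_of(Z)) @ matrix_derivative(Y_of, Z))
err_product = np.abs(lhs - rhs).max()

# Example 6.4: d(W^-1)/dW = -(I (x) W^-1) Ubar (I (x) W^-1)
n = 3
W = 2.0 * np.eye(n) + 0.3 * rng.standard_normal((n, n))
W_inv = np.linalg.inv(W)
U_bar = sum(np.kron(elem(r, s, n, n), elem(r, s, n, n))
            for r in range(n) for s in range(n))
I_kron_inv = np.kron(np.eye(n), W_inv)
formula = -I_kron_inv @ U_bar @ I_kron_inv
err_inverse = np.abs(formula - matrix_derivative(np.linalg.inv, W)).max()
print(err_product, err_inverse)
```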

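Next we determine a rule for the derivative of a Kronecker product of matrices with respect to a matrix, that is, an expression for

$$\frac{\partial (X \otimes Y)}{\partial Z}.$$

The order of the matrix $Y$ is not now restricted; we will consider that it is $(u \times v)$. On representing $X \otimes Y$ by its $(i,j)$th partition $[x_{ij}Y]$ $(i = 1, 2, \ldots, m,\; j = 1, 2, \ldots, n)$, we can write

$$\frac{\partial (X \otimes Y)}{\partial z_{rs}} = \frac{\partial}{\partial z_{rs}}[x_{ij}Y]$$

where $(r, s)$ are fixed

$$= \left[\frac{\partial x_{ij}}{\partial z_{rs}}Y\right] + \left[x_{ij}\frac{\partial Y}{\partial z_{rs}}\right] = \frac{\partial X}{\partial z_{rs}} \otimes Y + X \otimes \frac{\partial Y}{\partial z_{rs}}.$$

Hence by (6.1)

$$\frac{\partial (X \otimes Y)}{\partial Z} = \sum_{r,s} E_{rs} \otimes \frac{\partial X}{\partial z_{rs}} \otimes Y + \sum_{r,s} E_{rs} \otimes X \otimes \frac{\partial Y}{\partial z_{rs}}$$

where $E_{rs}$ is of order $(p \times q)$

$$= \frac{\partial X}{\partial Z} \otimes Y + \sum_{r,s} E_{rs} \otimes \left(X \otimes \frac{\partial Y}{\partial z_{rs}}\right).$$

The summation on the right hand side is not $X \otimes \partial Y/\partial Z$ as may appear at first sight; nevertheless it can be put into a more convenient form, as a product of matrices. To achieve this aim we make repeated use of (2.8) and (2.11):

$$\sum_{r,s} E_{rs} \otimes \left(X \otimes \frac{\partial Y}{\partial z_{rs}}\right) = \sum_{r,s} [I_pE_{rs}I_q] \otimes \left[U_1\left(\frac{\partial Y}{\partial z_{rs}} \otimes X\right)U_2\right] \quad \text{by } (2.14)$$

$$= \sum_{r,s} [I_p \otimes U_1]\left[E_{rs} \otimes \left(\frac{\partial Y}{\partial z_{rs}} \otimes X\right)\right][I_q \otimes U_2] \quad \text{by } (2.11)$$

$$= [I_p \otimes U_1]\left[\sum_{r,s}\left(E_{rs} \otimes \frac{\partial Y}{\partial z_{rs}}\right) \otimes X\right][I_q \otimes U_2].$$

Hence

$$\frac{\partial (X \otimes Y)}{\partial Z} = \frac{\partial X}{\partial Z} \otimes Y + [I_p \otimes U_1]\left[\frac{\partial Y}{\partial Z} \otimes X\right][I_q \otimes U_2] \qquad (6.4)$$

where $U_1$ and $U_2$ are permutation matrices of orders $(mu \times mu)$ and $(nv \times nv)$ respectively.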

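We illustrate the use of equation (6.4) with a simple example.

Example 6.5
$A = [a_{ij}]$ and $X = [x_{ij}]$ are matrices, each of order $(2 \times 2)$. Use

(i) equation (6.4), and
(ii) a direct method

to evaluate

$$\frac{\partial (A \otimes X)}{\partial X}.$$

Solution
(i) In this example (6.4) becomes

$$\frac{\partial (A \otimes X)}{\partial X} = [I \otimes U_1]\left[\frac{\partial X}{\partial X} \otimes A\right][I \otimes U_2]$$

where $I$ is the unit matrix of order $(2 \times 2)$ and

$$U_1 = U_2 = \sum_{r,s} E_{rs} \otimes E_{rs}' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Since

$$\frac{\partial X}{\partial X} = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}$$

only a simple calculation is necessary to obtain the result. It is found that

$$\frac{\partial (A \otimes X)}{\partial X} = \begin{bmatrix}
a_{11} & 0 & a_{12} & 0 & 0 & a_{11} & 0 & a_{12} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
a_{21} & 0 & a_{22} & 0 & 0 & a_{21} & 0 & a_{22} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
a_{11} & 0 & a_{12} & 0 & 0 & a_{11} & 0 & a_{12} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
a_{21} & 0 & a_{22} & 0 & 0 & a_{21} & 0 & a_{22}
\end{bmatrix}.$$

(ii) We evaluate

$$Y = A \otimes X = \begin{bmatrix}
a_{11}x_{11} & a_{11}x_{12} & a_{12}x_{11} & a_{12}x_{12} \\
a_{11}x_{21} & a_{11}x_{22} & a_{12}x_{21} & a_{12}x_{22} \\
a_{21}x_{11} & a_{21}x_{12} & a_{22}x_{11} & a_{22}x_{12} \\
a_{21}x_{21} & a_{21}x_{22} & a_{22}x_{21} & a_{22}x_{22}
\end{bmatrix}$$

and then make use of (6.1) to obtain the above result.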
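Both routes of Example 6.5 can be reproduced in a few lines. The sketch below assumes NumPy and a randomly chosen $A$; since $A \otimes X$ is linear in $X$, the partial derivative with respect to $x_{rs}$ is simply $A \otimes E_{rs}$, which is what the direct method uses here.

```python
import numpy as np

def elem(r, s, m, n):
    """Elementary matrix E_rs of order (m x n), 0-based indices."""
    M = np.zeros((m, n))
    M[r, s] = 1.0
    return M

rng = np.random.default_rng(7)
A = rng.standard_normal((2, 2))
pairs = [(r, s) for r in range(2) for s in range(2)]

# (ii) Direct method: d(A (x) X)/dx_rs = A (x) E_rs, assembled with (6.1)
direct = sum(np.kron(elem(r, s, 2, 2), np.kron(A, elem(r, s, 2, 2)))
             for r, s in pairs)

# (i) Formula (6.4) with dA/dX = 0
U = sum(np.kron(elem(r, s, 2, 2), elem(r, s, 2, 2).T) for r, s in pairs)   # U1 = U2
dX_dX = sum(np.kron(elem(r, s, 2, 2), elem(r, s, 2, 2)) for r, s in pairs)
I2 = np.eye(2)
formula = np.kron(I2, U) @ np.kron(dX_dX, A) @ np.kron(I2, U)

first_row = np.array([A[0, 0], 0, A[0, 1], 0, 0, A[0, 0], 0, A[0, 1]])
print(np.allclose(direct, formula), np.allclose(direct[0], first_row))
```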


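6.4 THE CHAIN RULE FOR THE DERIVATIVE OF A MATRIX WITH RESPECT TO A MATRIX

We wish to obtain an expression for

$$\frac{\partial Z}{\partial X}$$

where the matrix $Z$ is a function of a matrix $Y$, which is itself a function of a matrix $X$, that is

$$Z = Z(Y(X))$$

where

$X = [x_{ij}]$ is of order $(m \times n)$,

$Y = [y_{ij}]$ is of order $(u \times v)$,

$Z = [z_{ij}]$ is of order $(p \times q)$.

By definition in (6.1)

$$\frac{\partial Z}{\partial X} = \sum_{r,s} E_{rs} \otimes \frac{\partial Z}{\partial x_{rs}}, \qquad r = 1, 2, \ldots, m; \quad s = 1, 2, \ldots, n$$

where $E_{rs}$ is an elementary matrix of order $(m \times n)$,

$$= \sum_{r,s} E_{rs} \otimes \sum_{i,j} E_{ij}\frac{\partial z_{ij}}{\partial x_{rs}}, \qquad i = 1, 2, \ldots, p; \quad j = 1, 2, \ldots, q$$

where $E_{ij}$ is of order $(p \times q)$.

As in section 4.3, we use the chain rule to write

$$\frac{\partial z_{ij}}{\partial x_{rs}} = \sum_{\alpha,\beta} \frac{\partial z_{ij}}{\partial y_{\alpha\beta}}\frac{\partial y_{\alpha\beta}}{\partial x_{rs}}, \qquad \alpha = 1, 2, \ldots, u; \quad \beta = 1, 2, \ldots, v.$$

Hence

$$\frac{\partial Z}{\partial X} = \sum_{r,s} E_{rs} \otimes \sum_{i,j} E_{ij}\sum_{\alpha,\beta} \frac{\partial z_{ij}}{\partial y_{\alpha\beta}}\frac{\partial y_{\alpha\beta}}{\partial x_{rs}}$$

$$= \sum_{\alpha,\beta}\left(\sum_{r,s} E_{rs}\frac{\partial y_{\alpha\beta}}{\partial x_{rs}}\right) \otimes \left(\sum_{i,j} E_{ij}\frac{\partial z_{ij}}{\partial y_{\alpha\beta}}\right) \quad (\text{by } (2.5))$$

$$= \sum_{\alpha,\beta} \frac{\partial y_{\alpha\beta}}{\partial X} \otimes \frac{\partial Z}{\partial y_{\alpha\beta}} \quad (\text{by } (4.7) \text{ and } (4.19)).$$

If $I_n$ and $I_p$ are unit matrices of orders $(n \times n)$ and $(p \times p)$ respectively, we can write the above as

$$\frac{\partial Z}{\partial X} = \sum_{\alpha,\beta}\left(\frac{\partial y_{\alpha\beta}}{\partial X}I_n\right) \otimes \left(I_p\frac{\partial Z}{\partial y_{\alpha\beta}}\right).$$

Hence, by (2.11),

$$\frac{\partial Z}{\partial X} = \sum_{\alpha,\beta}\left(\frac{\partial y_{\alpha\beta}}{\partial X} \otimes I_p\right)\left(I_n \otimes \frac{\partial Z}{\partial y_{\alpha\beta}}\right). \qquad (6.5)$$

Equation (6.5) can be written in a more convenient form, avoiding the summation, if we define an appropriate notation, a generalisation of the previous one.

Since

$$Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1v} \\ y_{21} & y_{22} & \cdots & y_{2v} \\ \vdots & \vdots & & \vdots \\ y_{u1} & y_{u2} & \cdots & y_{uv} \end{bmatrix}$$

then $(\operatorname{vec} Y)' = [y_{11}\; y_{21}\; \ldots\; y_{uv}]$.

We will write the partitioned matrix

$$\left[\frac{\partial y_{11}}{\partial X} \otimes I_p \;\vdots\; \frac{\partial y_{21}}{\partial X} \otimes I_p \;\vdots\; \cdots \;\vdots\; \frac{\partial y_{uv}}{\partial X} \otimes I_p\right]$$

as

$$\left[\frac{\partial [y_{11}\; y_{21}\; \ldots\; y_{uv}]}{\partial X} \otimes I_p\right]$$

or as

$$\left[\frac{\partial (\operatorname{vec} Y)'}{\partial X} \otimes I_p\right].$$

Similarly, we write the partitioned matrix

$$\begin{bmatrix} I_n \otimes \dfrac{\partial Z}{\partial y_{11}} \\ I_n \otimes \dfrac{\partial Z}{\partial y_{21}} \\ \vdots \\ I_n \otimes \dfrac{\partial Z}{\partial y_{uv}} \end{bmatrix}$$

as

$$\left[I_n \otimes \frac{\partial Z}{\partial \operatorname{vec} Y}\right].$$

We can write the sum (6.5) in the following order:

$$\frac{\partial Z}{\partial X} = \left[\frac{\partial y_{11}}{\partial X} \otimes I_p\right]\left[I_n \otimes \frac{\partial Z}{\partial y_{11}}\right] + \left[\frac{\partial y_{21}}{\partial X} \otimes I_p\right]\left[I_n \otimes \frac{\partial Z}{\partial y_{21}}\right] + \cdots + \left[\frac{\partial y_{uv}}{\partial X} \otimes I_p\right]\left[I_n \otimes \frac{\partial Z}{\partial y_{uv}}\right].$$

We can write this as a (partitioned) matrix product

$$\frac{\partial Z}{\partial X} = \left[\frac{\partial y_{11}}{\partial X} \otimes I_p \;\vdots\; \frac{\partial y_{21}}{\partial X} \otimes I_p \;\vdots\; \cdots \;\vdots\; \frac{\partial y_{uv}}{\partial X} \otimes I_p\right]\begin{bmatrix} I_n \otimes \dfrac{\partial Z}{\partial y_{11}} \\ I_n \otimes \dfrac{\partial Z}{\partial y_{21}} \\ \vdots \\ I_n \otimes \dfrac{\partial Z}{\partial y_{uv}} \end{bmatrix}.$$

Finally, using the notations defined above, we have

$$\frac{\partial Z}{\partial X} = \left[\frac{\partial [\operatorname{vec} Y]'}{\partial X} \otimes I_p\right]\left[I_n \otimes \frac{\partial Z}{\partial \operatorname{vec} Y}\right]. \qquad (6.6)$$

We consider a simple example to illustrate the application of the above formula. The example can also be solved by evaluating the matrix $Z$ in terms of the components of the matrix $X$ and then applying the definition in (6.1).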

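Example 6.6
Given the matrices $A = [a_{ij}]$ and $X = [x_{ij}]$, both of order $(2 \times 2)$, evaluate

$$\frac{\partial Z}{\partial X}$$

where $Z = Y'Y$ and $Y = AX$,

(i) using (6.6);
(ii) using a direct method.

Solution
(i) For convenience write (6.6) as

$$\frac{\partial Z}{\partial X} = QR$$

where

$$Q = \left[\frac{\partial [\operatorname{vec} Y]'}{\partial X} \otimes I_p\right] \quad\text{and}\quad R = \left[I_n \otimes \frac{\partial Z}{\partial \operatorname{vec} Y}\right].$$

From Example 4.8 we know that

$$\frac{\partial y_{ij}}{\partial X} = A'E_{ij}$$

so that $Q$ can now be easily evaluated:

$$Q = \left[\begin{array}{cccc|cccc|cccc|cccc}
a_{11} & 0 & 0 & 0 & a_{21} & 0 & 0 & 0 & 0 & 0 & a_{11} & 0 & 0 & 0 & a_{21} & 0 \\
0 & a_{11} & 0 & 0 & 0 & a_{21} & 0 & 0 & 0 & 0 & 0 & a_{11} & 0 & 0 & 0 & a_{21} \\
a_{12} & 0 & 0 & 0 & a_{22} & 0 & 0 & 0 & 0 & 0 & a_{12} & 0 & 0 & 0 & a_{22} & 0 \\
0 & a_{12} & 0 & 0 & 0 & a_{22} & 0 & 0 & 0 & 0 & 0 & a_{12} & 0 & 0 & 0 & a_{22}
\end{array}\right].$$

Also in Example 4.8 we have found

$$\frac{\partial Z}{\partial y_{rs}} = E_{rs}'Y + Y'E_{rs};$$

we can now evaluate $R$:

$$R = \left[\begin{array}{cccc}
2y_{11} & y_{12} & 0 & 0 \\
y_{12} & 0 & 0 & 0 \\
0 & 0 & 2y_{11} & y_{12} \\
0 & 0 & y_{12} & 0 \\ \hline
2y_{21} & y_{22} & 0 & 0 \\
y_{22} & 0 & 0 & 0 \\
0 & 0 & 2y_{21} & y_{22} \\
0 & 0 & y_{22} & 0 \\ \hline
0 & y_{11} & 0 & 0 \\
y_{11} & 2y_{12} & 0 & 0 \\
0 & 0 & 0 & y_{11} \\
0 & 0 & y_{11} & 2y_{12} \\ \hline
0 & y_{21} & 0 & 0 \\
y_{21} & 2y_{22} & 0 & 0 \\
0 & 0 & 0 & y_{21} \\
0 & 0 & y_{21} & 2y_{22}
\end{array}\right].$$

The product of $Q$ and $R$ is the derivative we have been asked to evaluate:

$$QR = \begin{bmatrix}
2a_{11}y_{11} + 2a_{21}y_{21} & a_{11}y_{12} + a_{21}y_{22} & 0 & a_{11}y_{11} + a_{21}y_{21} \\
a_{11}y_{12} + a_{21}y_{22} & 0 & a_{11}y_{11} + a_{21}y_{21} & 2a_{11}y_{12} + 2a_{21}y_{22} \\
2a_{12}y_{11} + 2a_{22}y_{21} & a_{12}y_{12} + a_{22}y_{22} & 0 & a_{12}y_{11} + a_{22}y_{21} \\
a_{12}y_{12} + a_{22}y_{22} & 0 & a_{12}y_{11} + a_{22}y_{21} & 2a_{12}y_{12} + 2a_{22}y_{22}
\end{bmatrix}.$$

(ii) By a simple extension of the result of Example 4.6(b) we find that when

$$Z = X'A'AX$$

$$\frac{\partial Z}{\partial x_{rs}} = E_{rs}'A'AX + X'A'AE_{rs} = E_{rs}'A'Y + Y'AE_{rs}$$

where $Y = AX$.

By (6.1) and (2.11)

$$\frac{\partial Z}{\partial X} = \sum_{r,s}(E_{rs} \otimes E_{rs}')(I \otimes A'Y) + \sum_{r,s}(I \otimes Y'A)(E_{rs} \otimes E_{rs}).$$

Since the matrices involved are all of order $(2 \times 2)$

$$\sum_{r,s} E_{rs} \otimes E_{rs}' = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

and

$$\sum_{r,s} E_{rs} \otimes E_{rs} = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}.$$

On substitution and multiplying out in the above expression for $\partial Z/\partial X$, we obtain the same matrix as in (i).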
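The agreement between the chain rule and the direct method in Example 6.6 can also be observed numerically. The sketch below assumes NumPy and random $(2 \times 2)$ test matrices; it evaluates the sum (6.5) term by term, which is equivalent to the product $QR$ of (6.6), and compares it with definition (6.1) applied to $Z = X'A'AX$ using central differences.

```python
import numpy as np

def elem(r, s, m, n):
    """Elementary matrix E_rs of order (m x n), 0-based indices."""
    M = np.zeros((m, n))
    M[r, s] = 1.0
    return M

rng = np.random.default_rng(8)
u, m, n, h = 2, 2, 2, 1e-5       # A is (u x m), X is (m x n), Y = AX is (u x n)
A = rng.standard_normal((u, m))
X = rng.standard_normal((m, n))
Y = A @ X
p = q = n                         # Z = Y'Y is of order (n x n)

# Chain rule (6.5): sum over the elements y_ab of Y
chain = np.zeros((m * p, n * q))
for a in range(u):
    for b in range(n):
        E_ab = elem(a, b, u, n)
        dy_dX = A.T @ E_ab                     # dy_ab/dX, Example 4.8(i)
        dZ_dy = E_ab.T @ Y + Y.T @ E_ab        # dZ/dy_ab for Z = Y'Y
        chain += np.kron(dy_dX, np.eye(p)) @ np.kron(np.eye(n), dZ_dy)

# Direct method: definition (6.1) with central differences on Z(X) = X'A'AX
Z = lambda X: (A @ X).T @ (A @ X)
direct = np.zeros((m * p, n * q))
for r in range(m):
    for s in range(n):
        E_rs = elem(r, s, m, n)
        direct += np.kron(E_rs, (Z(X + h * E_rs) - Z(X - h * E_rs)) / (2 * h))

max_err = np.abs(chain - direct).max()
AtY = A.T @ Y
print(max_err)
```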


(1) Evaluate aYjaX given

IX-21

y _ [cos (X12 + x22) xux211and X = x11 x12

X12x22 X22

(2) The elements of the matrix

$$X = \begin{bmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \\ x_{13} & x_{23} \end{bmatrix}$$

are all independent. Use a direct method to evaluate $\partial X/\partial X$.

(3) Given a non-singular matrix

$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}$$

use a direct method to obtain

$$\frac{\partial X^{-1}}{\partial X}$$

and verify the solution to Example 6.4.

(4) The matrices $A = [a_{ij}]$ and $X = [x_{ij}]$ are both of order $(2 \times 2)$; $X$ is non-singular. Use a direct method to evaluate

$$\frac{\partial (A \otimes X^{-1})}{\partial X}.$$

CHAPTER 7

Some Applications of Matrix Calculus

7.1 INTRODUCTION

As in Chapter 3, where a number of applications of the Kronecker product were considered, in this chapter a number of applications of matrix calculus are discussed. The applications have been selected from a number considered in the published literature, as indicated in the Bibliography at the end of this book.

These problems were originally intended for the expert, but by expansion and simplification it is hoped that they will now be appreciated by the general reader.

7.2 THE PROBLEMS OF LEAST SQUARES AND CONSTRAINED OPTIMISATION IN SCALAR VARIABLES

In this section we consider, very briefly, the Method of Least Squares to obtain a curve or a line of 'best fit', and the Method of Lagrange Multipliers to obtain an extremum of a function subject to constraints.

For the least squares method we consider a set of data

$$(x_i, y_i), \quad i = 1, 2, \ldots, n \qquad (7.1)$$

and a relationship, usually a polynomial function

$$y = f(x). \qquad (7.2)$$

For each $x_i$, we evaluate $f(x_i)$ and the residual or the deviation

$$e_i = y_i - f(x_i). \qquad (7.3)$$

The method depends on choosing the unknown parameters (the polynomial coefficients when $f(x)$ is a polynomial) so that the sum of the squares of the residuals is a minimum, that is

$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - f(x_i))^2 \qquad (7.4)$$

is a minimum.

In particular, when $f(x)$ is a linear function

$$y = a_0 + a_1x,$$

$S(a_0, a_1)$ is a minimum when

$$\frac{\partial S}{\partial a_0} = 0 = \frac{\partial S}{\partial a_1}. \qquad (7.5)$$

These two equations, known as normal equations, determine the two unknown parameters $a_0$ and $a_1$ which specify the line of 'best fit' according to the principle of least squares.

For the second method we wish to determine the extremum of a continuously differentiable function

$$f(x_1, x_2, \ldots, x_n) \qquad (7.6)$$

whose $n$ variables are constrained by $m$ equations of the form

$$g_i(x_1, x_2, \ldots, x_n) = 0, \quad i = 1, 2, \ldots, m. \qquad (7.7)$$

The method of Lagrange Multipliers depends on defining an augmented function

$$f^* = f + \sum_{i=1}^{m} \mu_ig_i \qquad (7.8)$$

where the $\mu_i$ are known as Lagrange multipliers.

The extremum of $f(x)$ is determined by solving the system of the $(m + n)$ equations

$$\frac{\partial f^*}{\partial x_r} = 0, \quad r = 1, 2, \ldots, n$$
$$g_i = 0, \quad i = 1, 2, \ldots, m \qquad (7.9)$$

for the $m$ parameters $\mu_1, \mu_2, \ldots, \mu_m$ and the $n$ variables $x$ determining the extremum.

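Example 7.1

Given a matrix $A = [a_{ij}]$ of order $(2 \times 2)$, determine a symmetric matrix $X = [x_{ij}]$ which is a best approximation to $A$ by the criterion of least squares.

Solution

Corresponding to (7.3) we have

$$E = A - X$$

where $E = [e_{ij}]$ and $e_{ij} = a_{ij} - x_{ij}$.

The criterion of least squares for this example is to minimise

$$S = \sum_{i,j} e_{ij}^2 = \sum_{i,j} (a_{ij} - x_{ij})^2$$

which is the equivalent of (7.6) above.

The constraint equation is

$$x_{12} - x_{21} = 0$$

and the augmented function is

$$f^* = \sum_{i,j} (a_{ij} - x_{ij})^2 + \mu(x_{12} - x_{21}).$$

$$\frac{\partial f^*}{\partial x_{11}} = -2(a_{11} - x_{11}) = 0$$

$$\frac{\partial f^*}{\partial x_{12}} = -2(a_{12} - x_{12}) + \mu = 0$$

$$\frac{\partial f^*}{\partial x_{21}} = -2(a_{21} - x_{21}) - \mu = 0$$

$$\frac{\partial f^*}{\partial x_{22}} = -2(a_{22} - x_{22}) = 0.$$

This system of 5 equations (including the constraint) leads to the solution

$$\mu = a_{12} - a_{21}$$

$$x_{11} = a_{11}, \quad x_{22} = a_{22}, \quad x_{12} = x_{21} = \tfrac{1}{2}(a_{12} + a_{21}).$$

Hence

$$X = \begin{bmatrix} a_{11} & \dfrac{a_{12} + a_{21}}{2} \\ \dfrac{a_{12} + a_{21}}{2} & a_{22} \end{bmatrix} = \frac{1}{2}\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \frac{1}{2}\begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix} = \tfrac{1}{2}(A + A').$$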

7.3 PROBLEM 1 - MATRIX CALCULUS APPROACH TO THE PROBLEMS OF LEAST SQUARES AND CONSTRAINED OPTIMISATION

If we can express the residuals in the form of a matrix $E$, as in Example 7.1, then the sum of the residuals squared is

$$S = \operatorname{tr} E'E. \qquad (7.10)$$

The criterion of the least squares method is to minimise (7.10) with respect to the parameters involved.

The constrained optimisation problem then takes the form of finding the matrix $X$ such that the scalar matrix function

$$S = f(X)$$

is minimised subject to constraints on $X$ in the form of

$$G(X) = 0 \qquad (7.11)$$

where $G = [g_{ij}]$ is a matrix of order $(s \times t)$, where $s$ and $t$ are dependent on the number of constraints $g_{ij}$ involved.

As for the scalar case, we use Lagrange multipliers to form an augmented matrix function $f^*(X)$.

Each constraint $g_{ij}$ is associated with a parameter (Lagrange multiplier) $\mu_{ij}$.

Since

$$\sum_{i,j} \mu_{ij}g_{ij} = \operatorname{tr} U'G$$

where

$$U = [\mu_{ij}]$$

we can write the augmented scalar matrix function as

$$f^*(X) = \operatorname{tr} E'E + \operatorname{tr} U'G \qquad (7.12)$$

which is the equivalent to (7.8). To find the optimal $X$, we must solve the system of equations

$$\frac{\partial f^*}{\partial X} = 0. \qquad (7.13)$$

Problem

Given a non-singular matrix $A = [a_{ij}]$ of order $(n \times n)$, determine a matrix $X = [x_{ij}]$ which is a least squares approximation to $A$

(i) when $X$ is a symmetric matrix;
(ii) when $X$ is an orthogonal matrix.

Solution

(i) The problem was solved in Example 7.1 when $A$ and $X$ are of order $(2 \times 2)$. With the terminology defined above, we write

$$E = A - X$$
$$G(X) = X - X' = 0$$

so that $G$ and hence $U$ are both of order $(n \times n)$.

Equation (7.12) becomes

$$f^* = \operatorname{tr} A'A - \operatorname{tr} A'X - \operatorname{tr} X'A + \operatorname{tr} X'X + \operatorname{tr} U'X - \operatorname{tr} U'X'.$$

We now make use of the results, in modified form if necessary, of Examples 5.4 and 5.5; we obtain

$$\frac{\partial f^*}{\partial X} = -2A + 2X + U - U' = 0 \quad \text{for} \quad X = A + \frac{U' - U}{2}.$$

Then

$$X' = A' + \frac{U - U'}{2}$$

and since $X = X'$, we finally obtain

$$X = \tfrac{1}{2}(A + A').$$

(ii) This time

$$G(X) = X'X - I = 0.$$

Hence

$$f^* = \operatorname{tr}[A' - X'][A - X] + \operatorname{tr} U'[X'X - I]$$

so that

$$\frac{\partial f^*}{\partial X} = -2A + 2X + X[U + U'] = 0 \quad \text{for} \quad X = A - X\frac{U + U'}{2}.$$

Premultiplying by $X'$ and using the condition

$$X'X = I$$

we obtain

$$X'A = I + \frac{U + U'}{2}$$

and on transposing

$$A'X = I + \frac{U + U'}{2}.$$

Hence

$$A'X = X'A. \qquad (7.14)$$

If a solution to (7.14) exists, there are various ways of solving this matrix equation.


For example, with the help of (2.13) and Example 2.7 we can write it as

$$[(I \otimes A') - (A' \otimes I)U]\,x = 0 \qquad (7.15)$$

where $U$ is a permutation matrix (see (2.24)) and

$$x = \operatorname{vec} X.$$

We have now reduced the matrix equation to a system of homogeneous equations which can be solved by a standard method.

If a non-trivial solution to (7.15) does exist, it is not unique. We must scale it appropriately for $X$ to be orthogonal.

There may, of course, be more than one linearly independent solution to (7.15). We must choose the solution corresponding to $X$ being an orthogonal matrix.

Example 7.2
Given

$$A = \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}$$

find the orthogonal matrix $X$ which is the least squares best approximation to $A$.

Solution

$$[I \otimes A'] = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 2 & 1 \end{bmatrix} \quad\text{and}\quad [A' \otimes I]U = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 2 & 1 & 0 & 0 \\ 0 & 0 & 2 & 1 \end{bmatrix}.$$

Equation (7.15) can now be written as

$$\begin{bmatrix} 0 & 0 & 0 & 0 \\ 2 & 1 & -1 & 1 \\ -2 & -1 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix} x = 0.$$

There are 3 non-trivial (linearly independent) solutions (see [18] p. 131). They are

$$x = [1\; -2\; 1\; 1]', \quad x = [1\; 1\; 2\; -1]' \quad\text{and}\quad x = [2\; -3\; 3\; 2]'.$$

Only the last solution leads to an orthogonal matrix $X$; it is

$$X = \frac{1}{\sqrt{13}}\begin{bmatrix} 2 & 3 \\ -3 & 2 \end{bmatrix}.$$
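The orthogonal matrix found in Example 7.2 can be cross-checked with a different technique from the one used in the text: the orthogonal factor of the polar decomposition, computed from the singular value decomposition. This is a sketch assuming NumPy; the matrix $A$ used is the one whose transpose appears in $[I \otimes A']$ above.

```python
import numpy as np

A = np.array([[1.0, 2.0], [-1.0, 1.0]])

# Orthogonal polar factor from the SVD A = W S V' (not the method of the text)
W, _, Vt = np.linalg.svd(A)
X = W @ Vt

X_example = np.array([[2.0, 3.0], [-3.0, 2.0]]) / np.sqrt(13.0)

# Homogeneous system (7.15): [(I (x) A') - (A' (x) I) U] vec X = 0
U = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)
M = np.kron(np.eye(2), A.T) - np.kron(A.T, np.eye(2)) @ U
residual = M @ X.reshape(-1, order="F")      # vec X stacks the columns of X

print(np.allclose(X, X_example), np.allclose(residual, 0.0))
```

The check relies on $X'A$ being symmetric and positive definite for the matrix given in the example, which is the condition under which the polar factor is the orthogonal matrix closest to $A$ in the least squares sense.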


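7.4 PROBLEM 2 - THE GENERAL LEAST SQUARES PROBLEM

The linear regression problem presents itself in the following form:

$N$ samples from a population are considered. The $i$th sample consists of an observation from a variable $Y$ and observations from variables $X_1, X_2, \ldots, X_n$ (say).

We assume a linear relationship between the variables. If the variables are measured from zero, the relationship is of the form

$$y_i = b_0 + b_1x_{i1} + b_2x_{i2} + \cdots + b_nx_{in} + e_i. \qquad (7.16)$$

If the observations are measured from their means over the $N$ samples, then

$$y_i = b_1x_{i1} + b_2x_{i2} + \cdots + b_nx_{in} + e_i \quad (i = 1, 2, \ldots, N). \qquad (7.17)$$

$b_0, b_1, b_2, \ldots, b_n$ are estimated parameters and $e_i$ is the corresponding residual.

In matrix notation we can write the above equations as

$$y = Xb + e \qquad (7.18)$$

where

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}, \quad e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix}$$

and

$$X = \begin{bmatrix} 1 & x_{12} & \cdots & x_{1n} \\ 1 & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ 1 & x_{N2} & \cdots & x_{Nn} \end{bmatrix} \quad\text{or}\quad X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Nn} \end{bmatrix}.$$

As already indicated, the 'goodness of fit' criterion is the minimisation, with respect to the parameters $b$, of the sum of the squares of the residuals, which in this case is

$$S = e'e = (y' - b'X')(y - Xb).$$

Making use of the results in table (4.4), we obtain

$$\frac{\partial (e'e)}{\partial b} = -(y'X)' - X'y + (X'Xb + X'Xb) = -2X'y + 2X'Xb$$

$$= 0 \quad \text{for} \quad X'X\hat{b} = X'y \qquad (7.19)$$

where $\hat{b}$ is the least squares estimate of $b$.

If $(X'X)$ is non-singular, we obtain from (7.19)

$$\hat{b} = (X'X)^{-1}X'y. \qquad (7.20)$$

We can write (7.19) as

$$X'(y - X\hat{b}) = 0 \quad\text{or}\quad X'e = 0 \qquad (7.21)$$

which is the matrix form of the normal equations defined in section 7.2.

Example 7.3

Obtain the normal equations for a least squares approximation when each sample consists of one observation from $Y$ and one observation from

(i) a random variable $X$;
(ii) two random variables $X$ and $Z$.

Solution

(i)

$$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$

hence

$$X'[y - Xb] = \begin{bmatrix} \sum y_i - b_1N - b_2\sum x_i \\ \sum x_iy_i - b_1\sum x_i - b_2\sum x_i^2 \end{bmatrix}.$$

So that the normal equations are

$$\sum y_i = b_1N + b_2\sum x_i$$

and

$$\sum x_iy_i = b_1\sum x_i + b_2\sum x_i^2.$$

(ii) In this case

$$X = \begin{bmatrix} 1 & x_1 & z_1 \\ 1 & x_2 & z_2 \\ \vdots & \vdots & \vdots \\ 1 & x_N & z_N \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}.$$

The normal equations are

$$\sum y_i = b_1N + b_2\sum x_i + b_3\sum z_i$$

$$\sum x_iy_i = b_1\sum x_i + b_2\sum x_i^2 + b_3\sum x_iz_i$$

and

$$\sum y_iz_i = b_1\sum z_i + b_2\sum x_iz_i + b_3\sum z_i^2.$$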
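The normal equations (7.19) and the orthogonality condition (7.21) are easy to try out on synthetic data. The sketch below assumes NumPy; the data-generating coefficients and noise level are invented purely for illustration, and `np.linalg.lstsq` is used only as an independent cross-check.

```python
import numpy as np

rng = np.random.default_rng(9)
N = 50
x = rng.uniform(0.0, 10.0, N)
z = rng.uniform(-5.0, 5.0, N)
y = 1.5 + 2.0 * x - 0.5 * z + rng.normal(0.0, 0.1, N)   # synthetic observations

X = np.column_stack([np.ones(N), x, z])    # design matrix of Example 7.3(ii)
XtX, Xty = X.T @ X, X.T @ y
b_hat = np.linalg.solve(XtX, Xty)          # solves X'X b = X'y, i.e. (7.19)
e = y - X @ b_hat                          # residuals

b_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

# First normal equation of Example 7.3(ii): sum y = b1 N + b2 sum x + b3 sum z
lhs = y.sum()
rhs = b_hat[0] * N + b_hat[1] * x.sum() + b_hat[2] * z.sum()
print(b_hat, np.abs(X.T @ e).max(), lhs - rhs)
```

Solving the linear system (7.19) directly avoids forming $(X'X)^{-1}$ explicitly, although the two are equivalent whenever $X'X$ is non-singular.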



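7.5 PROBLEM 3 - MAXIMUM LIKELIHOOD ESTIMATE OF THE MULTIVARIATE NORMAL

Let $X_i$ $(i = 1, 2, \ldots, n)$ be $n$ random variables each having a normal distribution with mean $\mu_i$ and standard deviation $\sigma_i$, that is

$$X_i = n(\mu_i, \sigma_i). \qquad (7.22)$$

The joint probability density function (p.d.f.) of the $n$ random variables is

$$f(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}|V|^{1/2}}\exp\left(-\tfrac{1}{2}(x - \mu)'V^{-1}(x - \mu)\right) \qquad (7.23)$$

where

$$-\infty < x_i < \infty \quad (i = 1, 2, \ldots, n),$$

$$x' = [x_1\; x_2\; \ldots\; x_n], \quad \mu' = [\mu_1\; \mu_2\; \ldots\; \mu_n]$$

and

$$V = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{12} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & & \vdots \\ \sigma_{1n} & \sigma_{2n} & \cdots & \sigma_{nn} \end{bmatrix}$$

is the covariance matrix.

$$\sigma_{ij} = \rho_{ij}\sigma_i\sigma_j \quad (i \neq j), \qquad \sigma_{ii} = \sigma_i^2$$

are the covariances of the random variables.

$\rho_{ij}$ is the correlation coefficient between $X_i$ and $X_j$. The covariance matrix $V$ is symmetric and positive definite.

Equation (7.23) is called a multivariate normal p.d.f. Maximum likelihood estimates have certain properties (for example, they are asymptotically efficient) which make them very useful in estimation and hypothesis testing problems.

For a sample of $N$ observations from the multivariate normal distribution (7.23) the likelihood function is

$$L = \frac{1}{(2\pi)^{nN/2}|V|^{N/2}}\exp\left\{-\frac{1}{2}\sum_{i=1}^{N}(x_i - \mu)'V^{-1}(x_i - \mu)\right\}$$

so that

$$\log L = C - \frac{N}{2}\log |V| - \frac{1}{2}\sum_{i=1}^{N}(x_i - \mu)'V^{-1}(x_i - \mu) \qquad (7.24)$$

where $C$ is a constant.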

fly

Sec. 7.5] Problem 3

(a) The maximum likelihood estimate of µOn expanding the last term of (7.24), we obtain

1-- {xt'V-

1x'-11, V-'xt -XI' V''µ + µ' V''µ},2

With the help of table (4.4) and using the result

(x1' V-' )' = V-' x1 (since V is symmetric)

we differentiate with respect to it, to obtain

alog L

aµ= V-1

N

J=- u)

Ex,0 when µ =

N= z

Hence the maximum likelihood estimate of is Is µ = z, the sample mean.

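(b) The maximum likelihood estimate of $V$

We note the following results:

(1)

$$\sum_{i=1}^{N} y_i'V^{-1}y_i = \operatorname{tr}(Y'V^{-1}Y) = \operatorname{tr}(YY'V^{-1})$$

where $Y = [y_1 \; y_2 \; \ldots \; y_N]$ and $y_i = x_i - \mu$ $(i = 1, 2, \ldots, N)$. $V^{-1}$ is a symmetric matrix.

(2) By Example 5.3, but taking account of the symmetry of $V^{-1}$ (see Example 4.4)

$$\frac{\partial \log|V^{-1}|}{\partial V^{-1}} = 2V - \operatorname{diag}\{V\}.$$

(3) If $X$ is a symmetric matrix

$$\frac{\partial \operatorname{tr}(AX)}{\partial X} = A + A' - \operatorname{diag}\{A\}.$$

Let $A = YY'$ and $X = V^{-1}$, then

$$\frac{\partial \operatorname{tr}(YY'V^{-1})}{\partial V^{-1}} = 2YY' - \operatorname{diag}\{YY'\}.$$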


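We now write (7.24) as

$$\log L = C + \frac{N}{2}\log|V^{-1}| - \frac{1}{2}\operatorname{tr}(YY'V^{-1}).$$

Differentiating $\log L$ with respect to $V^{-1}$, using the estimate $\hat{\mu} = \bar{x}$, and the results (2) and (3) above, we obtain

$$\frac{\partial \log L}{\partial V^{-1}} = \frac{N}{2}\left[2V - \operatorname{diag}\{V\}\right] - YY' + \frac{1}{2}\operatorname{diag}\{YY'\}.$$

Let $Q = NV - YY'$, then

$$\frac{\partial \log L}{\partial V^{-1}} = Q - \frac{1}{2}\operatorname{diag}\{Q\} = 0 \quad \text{when} \quad 2Q = \operatorname{diag}\{Q\}.$$

Since $Q$ is symmetric, the only solution to the above equation is $Q = 0$. It follows that the maximum likelihood estimate of $V$ is

$$\hat{V} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(x_i - \bar{x})'}{N}.$$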
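The two estimates can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original text; the true parameters, the sample size and the function name `log_likelihood` are assumptions made purely for illustration. It forms the sample mean and the covariance estimate with divisor $N$, confirms that $Q = NV - YY'$ vanishes there, and checks that small random perturbations of the estimates do not increase $\log L$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" parameters, chosen only for this illustration.
mu_true = np.array([1.0, -2.0, 0.5])
V_true = np.array([[2.0, 0.6, 0.2],
                   [0.6, 1.0, 0.3],
                   [0.2, 0.3, 1.5]])
N = 500
X = rng.multivariate_normal(mu_true, V_true, size=N)  # each row is one observation x_i'


def log_likelihood(mu, V, X):
    """log L as in (7.24), with the constant C = -(nN/2) log(2 pi) included."""
    N, n = X.shape
    Y = (X - mu).T                              # columns are y_i = x_i - mu
    _, logdet = np.linalg.slogdet(V)
    quad = np.sum(Y * np.linalg.solve(V, Y))    # sum_i y_i' V^{-1} y_i
    return -0.5 * n * N * np.log(2.0 * np.pi) - 0.5 * N * logdet - 0.5 * quad


# Maximum likelihood estimates: sample mean, and covariance with divisor N.
mu_hat = X.mean(axis=0)
Y_hat = (X - mu_hat).T
V_hat = Y_hat @ Y_hat.T / N

# Q = N V - Y Y' is zero at the estimate.
Q = N * V_hat - Y_hat @ Y_hat.T

# Small symmetric perturbations should not give a larger log-likelihood.
best = log_likelihood(mu_hat, V_hat, X)
perturbed = []
for _ in range(20):
    d_mu = 0.05 * rng.standard_normal(3)
    S = 0.05 * rng.standard_normal((3, 3))
    perturbed.append(log_likelihood(mu_hat + d_mu, V_hat + 0.5 * (S + S.T), X))

print("Q is zero:", np.allclose(Q, 0.0))
print("no perturbation beats the estimate:", max(perturbed) <= best)
```

Note that the divisor is $N$ rather than $N - 1$, so `V_hat` agrees with `np.cov(X.T, bias=True)` and not with the unbiased sample covariance.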

7.6 PROBLEM 4 - EVALUATION OF THE JACOBIANS OF SOME TRANSFORMATIONS

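The interest in Jacobians arises from their importance, particularly with reference to a change of variables in multiple integration.

In terms of scalars, the problem presents itself in the following way. We consider a multiple integral over a subset $R$ of an $n$-dimensional space

$$\int \cdots \int_R f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \ldots dx_n \tag{7.25}$$

where $f$ is a piecewise continuous function in $R$. We consider a one to one transformation which maps $R$ onto a subset $T$

$$y_1 = u_1(x), \quad y_2 = u_2(x), \quad \ldots, \quad y_n = u_n(x) \tag{7.26}$$

and the inverse transformation

$$x_1 = w_1(y), \quad x_2 = w_2(y), \quad \ldots, \quad x_n = w_n(y) \tag{7.27}$$

where $x' = [x_1, x_2, \ldots, x_n]$ and $y' = [y_1, y_2, \ldots, y_n]$.

Assuming the first partial derivatives of the inverse transformation (7.27) to be continuous, (7.25) can be expressed as

$$\int \cdots \int_T f[w_1(y), w_2(y), \ldots, w_n(y)]\, |J|\, dy_1\, dy_2 \ldots dy_n \tag{7.28}$$

where $|J|$ can be written as

$$|J| = \operatorname{mod}\begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_2}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_1} \\ \dfrac{\partial x_1}{\partial y_2} & \dfrac{\partial x_2}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_2} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial x_1}{\partial y_n} & \dfrac{\partial x_2}{\partial y_n} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{vmatrix} \tag{7.29}$$

subject to $|J|$ not vanishing identically in $T$.

Example 7.4

Let

$$I = \iint_R \exp\{-2x_1 + 3x_2\}\, dx_1\, dx_2, \qquad 0 < x_1 < \infty, \quad 0 < x_2 < \infty.$$

Consider the transformation

$$y_1 = 2x_1 - x_2, \qquad y_2 = x_2.$$

Write down the integral corresponding to (7.28).

Solution

We are given

$$R = \{(x_1, x_2) : 0 < x_1 < \infty,\; 0 < x_2 < \infty\}.$$

The above transformation (corresponding to (7.26)) results in the following inverse transformation (7.27)

$$x_1 = \tfrac{1}{2}(y_1 + y_2), \qquad x_2 = y_2$$

which defines

$$T = \{(y_1, y_2) : y_2 > 0,\; y_2 > -y_1,\; -\infty < y_1 < \infty\},$$

and by (7.29)

$$|J| = \operatorname{mod}\begin{vmatrix} \tfrac{1}{2} & 0 \\ \tfrac{1}{2} & 1 \end{vmatrix} = \tfrac{1}{2}.$$

Hence

$$I = \iint_T f\left[\tfrac{1}{2}(y_1 + y_2),\, y_2\right] |J|\, dy_1\, dy_2 = \tfrac{1}{2}\iint_T \exp(-y_1 + 2y_2)\, dy_1\, dy_2.$$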

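Our main interest in this section is to evaluate Jacobians when the transformation corresponding to (7.26) is expressed in matrix form, for example as

$$Y = AXB \tag{7.30}$$

where $A$, $X$ and $B$ are all assumed to be of order $(n \times n)$. As in section 5.2 (see (5.1) and (5.2)) we can write (7.30) as

$$y = Px \tag{7.31}$$

where $y = \operatorname{vec} Y$, $x = \operatorname{vec} X$ and $P = B' \otimes A$. In this case

$$\frac{\partial y}{\partial x} = B \otimes A'$$

and

$$\frac{\partial x}{\partial y} = [B \otimes A']^{-1} = B^{-1} \otimes (A')^{-1} \quad \text{by (2.12)}.$$

It follows that

$$J = \operatorname{mod}\left|\frac{\partial \operatorname{vec} Y}{\partial \operatorname{vec} X}\right|^{-1} = \operatorname{mod}\left(|B|^{-n}\,|A|^{-n}\right) \quad \text{(by Property X)}. \tag{7.32}$$

Example 7.5

Consider the transformation

$$Y = AXB$$

where

$$A = \begin{bmatrix} 2 & -4 \\ -1 & 3 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}.$$

Find the Jacobian of this transformation

(i) by a direct method,
(ii) using (7.32).

Solution

(i) With the elements of $Y$ numbered column by column, so that $y' = (\operatorname{vec} Y)' = [y_1, y_2, y_3, y_4]$, we have

$$X = A^{-1}YB^{-1} = \frac{1}{2}\begin{bmatrix} 3y_1 + 4y_2 - 3y_3 - 4y_4 & -3y_1 - 4y_2 + 6y_3 + 8y_4 \\ y_1 + 2y_2 - y_3 - 2y_4 & -y_1 - 2y_2 + 2y_3 + 4y_4 \end{bmatrix}$$

so that

$$|J| = \operatorname{mod}\left|\frac{\partial x}{\partial y}\right| = \left(\tfrac{1}{2}\right)^4 \operatorname{mod}\begin{vmatrix} 3 & 1 & -3 & -1 \\ 4 & 2 & -4 & -2 \\ -3 & -1 & 6 & 2 \\ -4 & -2 & 8 & 4 \end{vmatrix} = \frac{1}{4}.$$

(ii) $|A| = 2$, $|B| = 1$, hence $|J| = \tfrac{1}{4}$.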
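The agreement between the direct method and (7.32) can be confirmed numerically. The sketch below is an illustration only: the numerical values of $A$ and $B$ are assumptions, chosen to agree with $|A| = 2$ and $|B| = 1$ in Example 7.5, and the helper `vec` (column stacking) is a name introduced here.

```python
import numpy as np

# Assumed data, consistent with |A| = 2 and |B| = 1.
A = np.array([[2.0, -4.0],
              [-1.0, 3.0]])
B = np.array([[2.0, 1.0],
              [1.0, 1.0]])
n = 2


def vec(M):
    """Stack the columns of M into one vector."""
    return M.reshape(-1, order="F")


# y = P x with P = B' (x) A, as in (7.31).
P = np.kron(B.T, A)
X = np.random.default_rng(1).standard_normal((n, n))
vec_identity_holds = np.allclose(vec(A @ X @ B), P @ vec(X))

# In the layout used in the text, dy/dx = P' = B (x) A'.
dy_dx = P.T
dx_dy = np.linalg.inv(dy_dx)

J_direct = abs(np.linalg.det(dx_dy))                                        # direct method
J_formula = abs(np.linalg.det(B)) ** (-n) * abs(np.linalg.det(A)) ** (-n)   # (7.32)

print(vec_identity_holds, J_direct, J_formula)
```

Both values should equal 1/4 up to rounding. Since a determinant is unchanged by transposition, the layout convention chosen for $\partial y / \partial x$ does not affect the Jacobian.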

Similarly, we can use the theory developed in this book to evaluate the Jacobians of many other transformations.

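Example 7.6

Evaluate the Jacobian associated with the following transformations

(i) $Y = X^{-1}$
(ii) $Y = X^2$.

Solution

(i) From Example 5.2

$$\frac{\partial y}{\partial x} = -X^{-1} \otimes (X^{-1})'$$

so that

$$\frac{\partial x}{\partial y} = -X \otimes X'.$$

Hence

$$J = \operatorname{mod}\left|\frac{\partial x}{\partial y}\right| = \operatorname{mod}|X \otimes X'| = \operatorname{mod}\left(|X|^{n}\,|X|^{n}\right) = |X|^{2n}.$$

(ii) From section 4.6

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}X + XE_{rs}$$

so that by the 2nd transformation principle (see section 5.3)

$$\frac{\partial y}{\partial x} = X \otimes I + I \otimes X'$$

and

$$J = \operatorname{mod}\left|X \otimes I + I \otimes X'\right|^{-1}.$$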
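The derivative matrices for $Y = X^{-1}$ and $Y = X^2$ can be compared with finite differences. This is a hedged sketch rather than part of the original exposition: the test matrix, the step size `h` and the helper names `vec` and `numerical_jacobian` are assumptions. The helper returns the Jacobian in the usual layout (rows indexed by outputs), which is transposed to obtain the layout $\partial y / \partial x$ used in the text.

```python
import numpy as np


def vec(M):
    """Stack the columns of M into one vector."""
    return M.reshape(-1, order="F")


def numerical_jacobian(f, X, h=1e-6):
    """Central-difference d vec f(X) / d vec X, rows = outputs, columns = inputs."""
    n = X.shape[0]
    x0 = vec(X)
    cols = []
    for k in range(x0.size):
        e = np.zeros_like(x0)
        e[k] = h
        Xp = (x0 + e).reshape(n, n, order="F")
        Xm = (x0 - e).reshape(n, n, order="F")
        cols.append(vec(f(Xp) - f(Xm)) / (2.0 * h))
    return np.column_stack(cols)


rng = np.random.default_rng(2)
n = 3
X = 0.3 * rng.standard_normal((n, n)) + 2.0 * np.eye(n)   # assumed, well conditioned
I = np.eye(n)
Xinv = np.linalg.inv(X)

# Transpose to match the layout of the text.
D_inv = numerical_jacobian(np.linalg.inv, X).T
D_sq = numerical_jacobian(lambda M: M @ M, X).T

ok_inv = np.allclose(D_inv, -np.kron(Xinv, Xinv.T), atol=1e-6)
ok_sq = np.allclose(D_sq, np.kron(X, I) + np.kron(I, X.T), atol=1e-6)

# Jacobian of Y = X^{-1}: mod |dx/dy| = 1 / mod |dy/dx| = |X|^(2n).
J_inv = 1.0 / abs(np.linalg.det(D_inv))
J_expected = abs(np.linalg.det(X)) ** (2 * n)

print(ok_inv, ok_sq, J_inv, J_expected)
```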


7.7 PROBLEM 5 - TO FIND THE DERIVATIVE OF AN EXPONENTIAL MATRIX WITH RESPECT TO A MATRIX

Since we make use of the spectral decomposition of an exponential matrix, we now discuss this technique briefly.

Assume that the matrix $Q = [q_{ij}]$ of order $(n \times n)$ has eigenvalues

$$\lambda_1, \lambda_2, \ldots, \lambda_n$$

(not necessarily distinct) and corresponding linearly independent eigenvectors

$$x_1, x_2, \ldots, x_n.$$

The eigenvectors of $Q'$ are

$$y_1, y_2, \ldots, y_n.$$

These two sets of eigenvectors have the property

$$x_i'y_j = 0 \quad \text{or (equivalently)} \quad y_j'x_i = 0 \quad (i \neq j) \tag{7.33}$$

and can be normalised so that

$$x_i'y_i = 1 \quad \text{or} \quad y_i'x_i = 1 \quad (i = 1, 2, \ldots, n). \tag{7.34}$$

Sets of eigenvectors $\{x_i\}$ and $\{y_i\}$ having the properties (7.33) and (7.34) are said to be properly normalised.

It is well known (see [18] p. 227) that

$$\exp(Qt) = P \operatorname{diag}\{e^{\lambda_1 t}, e^{\lambda_2 t}, \ldots, e^{\lambda_n t}\}\, P^{-1}$$

where $P$ is the modal matrix of $Q$, that is the matrix

$$P = [x_1 \; x_2 \; \ldots \; x_n]. \tag{7.35}$$

It follows from (7.33), (7.34) and (7.35) that

$$P^{-1} = [y_1, y_2, \ldots, y_n]'. \tag{7.36}$$

Hence

$$\exp(Qt) = [x_1 \; x_2 \; \ldots \; x_n]\begin{bmatrix} e^{\lambda_1 t} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2 t} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & e^{\lambda_n t} \end{bmatrix}\begin{bmatrix} y_1' \\ y_2' \\ \vdots \\ y_n' \end{bmatrix} = x_1y_1'\exp(\lambda_1 t) + x_2y_2'\exp(\lambda_2 t) + \ldots + x_ny_n'\exp(\lambda_n t),$$

that is

$$\exp(Qt) = \sum_{i=1}^{n} x_iy_i'\exp(\lambda_i t). \tag{7.37}$$

The right hand side of (7.37) is known as the spectral representation (or spectral decomposition) of the exponential matrix $\exp(Qt)$.
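The spectral representation lends itself to a quick numerical check. The following sketch is illustrative only: the matrix `Q` (with eigenvalues $1$ and $-1$), the value of `t` and the helper `expm_series` are assumptions introduced here. The rows of $P^{-1}$ are taken as the properly normalised vectors $y_i'$, and the resulting sum is compared with a truncated power series for $\exp(Qt)$.

```python
import numpy as np

# Assumed example matrix with eigenvalues 1 and -1.
Q = np.array([[3.0, 4.0],
              [-2.0, -3.0]])
t = 0.7

lam, P = np.linalg.eig(Q)     # columns of P are the eigenvectors x_i
Pinv = np.linalg.inv(P)       # rows of P^{-1} are the vectors y_i' with y_i' x_i = 1

# Spectral representation: sum_i x_i y_i' exp(lambda_i t).
Phi_spectral = sum(
    np.outer(P[:, i], Pinv[i, :]) * np.exp(lam[i] * t) for i in range(len(lam))
)


def expm_series(M, terms=40):
    """Truncated power series I + M + M^2/2! + ... (adequate for small norms)."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out


Phi_series = expm_series(Q * t)

print(np.allclose(Phi_spectral, Phi_series))
```

For this particular `Q` we have $Q^2 = I$, so $\exp(Qt) = I\cosh t + Q\sinh t$, which gives a third, independent check.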

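We consider a very simple illustrative example.

Example 7.7

Find the spectral representation of the matrix $\exp(Qt)$, where

$$Q = \begin{bmatrix} 3 & 4 \\ -2 & -3 \end{bmatrix}.$$

Solution

$$\lambda_1 = 1, \quad \lambda_2 = -1; \qquad x_1' = [2 \;\; -1], \quad x_2' = [1 \;\; -1], \qquad y_1' = [1 \;\; 1], \quad y_2' = [-1 \;\; -2].$$

By (7.37)

$$\exp(Qt) = \begin{bmatrix} 2 \\ -1 \end{bmatrix}[1 \;\; 1]\exp(t) + \begin{bmatrix} 1 \\ -1 \end{bmatrix}[-1 \;\; -2]\exp(-t) = \begin{bmatrix} 2 & 2 \\ -1 & -1 \end{bmatrix}\exp(t) + \begin{bmatrix} -1 & -2 \\ 1 & 2 \end{bmatrix}\exp(-t).$$

Although we have considered matrices having real eigenvalues, and eigenvectors having real elements, the spectral decomposition (7.37) is also valid for complex elements, as can be shown by a slight modification of the above exposition.

By the use of (2.17), that is of the result

$$\exp(I \otimes Q) = I \otimes \exp(Q)$$

we generalise the result (7.37) to

$$\exp\{(I \otimes Q)t\} = \sum_{i}(I \otimes x_iy_i')\exp(\lambda_i t). \tag{7.38}$$

We now consider the main problem, to obtain an expression for

$$\frac{\partial \Phi}{\partial Z}$$

where

$$\Phi(t) = \exp(Qt), \tag{7.39}$$

so that

$$\Phi(0) = I, \tag{7.40}$$

$$\frac{d\Phi}{dt} = Q\Phi \tag{7.41}$$

and $Z = [z_{ij}]$ is a matrix of order $(r \times s)$.

The matrix $Q$ is assumed to be a function of $Z$, that is $Q(Z)$. Making use of the result (6.5), we can write

$$\frac{d}{dt}\frac{\partial \Phi}{\partial Z} = \frac{\partial (Q\Phi)}{\partial Z} = \frac{\partial Q}{\partial Z}(I \otimes \Phi) + (I \otimes Q)\frac{\partial \Phi}{\partial Z} \tag{7.42}$$

and from (7.40)

$$\frac{\partial \Phi}{\partial Z}(0) = 0. \tag{7.43}$$

We next make use of a generalisation of a well known result (see [19] p. 68). Given

$$\frac{dX}{dt} = RX + BU$$

and

$$X(0) = 0,$$

then

$$X = \int_0^t \exp\{R(t - \tau)\}\,BU(\tau)\,d\tau.$$

For

$$X = \frac{\partial \Phi}{\partial Z}, \quad R = I \otimes Q, \quad B = \frac{\partial Q}{\partial Z} \quad \text{and} \quad U = I \otimes \Phi$$

the solution to (7.42) subject to (7.43) becomes

$$\frac{\partial \Phi}{\partial Z} = \int_0^t \exp\{(I \otimes Q)(t - \tau)\}\,\frac{\partial Q}{\partial Z}\,[I \otimes \Phi(\tau)]\,d\tau. \tag{7.44}$$

Hence, by (7.37) and (7.38),

$$\frac{\partial \Phi}{\partial Z} = \int_0^t \sum_{i,j}(I \otimes x_iy_i')\exp(\lambda_i(t - \tau))\,\frac{\partial Q}{\partial Z}\,[I \otimes x_jy_j']\exp(\lambda_j\tau)\,d\tau$$

$$= \sum_{i,j}(I \otimes x_iy_i')\,\frac{\partial Q}{\partial Z}\,(I \otimes x_jy_j')\exp(\lambda_i t)\int_0^t \exp((\lambda_j - \lambda_i)\tau)\,d\tau.$$

Hence,

$$\frac{\partial \Phi}{\partial Z} = \sum_{i,j}(I \otimes x_iy_i')\,\frac{\partial Q}{\partial Z}\,(I \otimes x_jy_j')\exp(\lambda_i t)\,f_{ij}(t)$$

where

$$f_{ij}(t) = t \quad \text{if } \lambda_i = \lambda_j$$

and

$$f_{ij}(t) = \frac{1}{\lambda_j - \lambda_i}\left[\exp((\lambda_j - \lambda_i)t) - 1\right] \quad \text{if } \lambda_i \neq \lambda_j.$$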
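The final expression for $\partial\Phi/\partial Z$ can be compared with a finite-difference approximation. The sketch below rests on several assumptions that are not in the original text: $Q(Z)$ is taken to be linear in $Z$, namely $Q(Z) = Q_0 + \sum z_{ab}G_{ab}$ with arbitrary matrices $G_{ab}$, the derivative is evaluated at $Z = 0$, and $\partial Q/\partial Z$ is assembled as the block matrix whose $(a, b)$ block is $\partial Q/\partial z_{ab}$. The names `Q0`, `G`, `Q_of`, `f` and `expm_series` are hypothetical.

```python
import numpy as np


def expm_series(M, terms=60):
    """Truncated power series for exp(M); adequate for the small norms used here."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out


n, r, s = 2, 2, 3
t = 0.5
rng = np.random.default_rng(3)
Q0 = np.array([[3.0, 4.0],
               [-2.0, -3.0]])               # assumed, eigenvalues 1 and -1
G = rng.standard_normal((r, s, n, n))       # G[a, b] plays the role of dQ/dz_ab


def Q_of(Z):
    """Assumed dependence of Q on Z: linear in the elements of Z."""
    return Q0 + np.einsum("ab,abij->ij", Z, G)


Z0 = np.zeros((r, s))
lam, P = np.linalg.eig(Q_of(Z0))
Pinv = np.linalg.inv(P)
proj = [np.outer(P[:, i], Pinv[i, :]) for i in range(n)]   # x_i y_i'


def f(i, j):
    """f_ij(t) as defined in the text."""
    d = lam[j] - lam[i]
    return t if np.isclose(d, 0.0) else (np.exp(d * t) - 1.0) / d


# dQ/dZ as an (rn x sn) block matrix.
dQ_dZ = np.block([[G[a, b] for b in range(s)] for a in range(r)])

dPhi_dZ = np.zeros((r * n, s * n))
for i in range(n):
    for j in range(n):
        left = np.kron(np.eye(r), proj[i])
        right = np.kron(np.eye(s), proj[j])
        dPhi_dZ += (left @ dQ_dZ @ right) * np.exp(lam[i] * t) * f(i, j)

# Finite-difference approximation, block by block.
h = 1e-6
fd = np.zeros_like(dPhi_dZ)
for a in range(r):
    for b in range(s):
        E = np.zeros((r, s))
        E[a, b] = h
        diff = expm_series(Q_of(Z0 + E) * t) - expm_series(Q_of(Z0 - E) * t)
        fd[a * n:(a + 1) * n, b * n:(b + 1) * n] = diff / (2.0 * h)

print(np.allclose(dPhi_dZ, fd, atol=1e-6))
```

The eigenvalues of `Q0` are real and distinct, so real arithmetic suffices here; for complex eigenvalues the same loop would need complex arrays.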

Solution to Problems

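CHAPTER 1

(1)

$$AB = \begin{bmatrix} A_{1\cdot}'B_{\cdot 1} & A_{1\cdot}'B_{\cdot 2} & A_{1\cdot}'B_{\cdot 3} \\ A_{2\cdot}'B_{\cdot 1} & A_{2\cdot}'B_{\cdot 2} & A_{2\cdot}'B_{\cdot 3} \\ A_{3\cdot}'B_{\cdot 1} & A_{3\cdot}'B_{\cdot 2} & A_{3\cdot}'B_{\cdot 3} \\ A_{4\cdot}'B_{\cdot 1} & A_{4\cdot}'B_{\cdot 2} & A_{4\cdot}'B_{\cdot 3} \end{bmatrix}$$

(2) (a) The $k$th column of $AE_{ik}$ is the $i$th column of $A$, all other columns are zero.

(b) The $i$th row of $E_{ik}A$ is the $k$th row of $A$, all other rows are zero.

$$AE_{ik} = Ae_ie_k' = A_{\cdot i}e_k'$$

$$E_{ik}A = e_ie_k'A = e_iA_{k\cdot}'$$

(3)

$$\operatorname{tr}ABC = \sum_i e_i'ABCe_i = \sum_i (e_i'A)B(Ce_i) = \sum_i A_{i\cdot}'BC_{\cdot i}.$$

(4)

$$\operatorname{tr}AE_{ij} = \sum_k e_k'\Big(\sum_{r,s} a_{rs}E_{rs}\Big)E_{ij}e_k = \sum_{k,r,s} a_{rs}\,e_k'e_re_s'e_ie_j'e_k = \sum_{k,r,s} a_{rs}\delta_{kr}\delta_{si}\delta_{jk} = a_{ji}.$$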

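(5)

$$A = \sum_{i,j} a_{ij}E_{ij} = \sum_{i,j}\operatorname{tr}(BE_{ij}\delta_{ij})E_{ij} = \sum_{i,j,k} e_k'BE_{ij}e_k\,\delta_{ij}E_{ij} = \sum_{i,j,k} e_k'Be_i\,\delta_{jk}\delta_{ij}E_{ij} = \sum_i e_i'Be_iE_{ii} = \sum_i b_{ii}E_{ii} = \operatorname{diag}\{B\}.$$

CHAPTER 2

(1) Since $U$ is an orthogonal matrix, the result follows. More formally,

$$\sum_{r,s}[E_{rs}(m \times n) \otimes E_{sr}(n \times m)][E_{sr}(n \times m) \otimes E_{rs}(m \times n)]$$

$$= \sum_{r,s}[E_{rs}(m \times n)E_{sr}(n \times m)] \otimes [E_{sr}(n \times m)E_{rs}(m \times n)]$$

$$= \sum_{r,s}[\delta_{ss}E_{rr}(m \times m)] \otimes [\delta_{rr}E_{ss}(n \times n)]$$

$$= \Big[\sum_r E_{rr}(m \times m)\Big] \otimes \Big[\sum_s E_{ss}(n \times n)\Big]$$

$$= I_m \otimes I_n = I_{mn},$$

the result follows.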

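(3) (a)

$$A \otimes B = \begin{bmatrix} -2 & 2 & -1 & 1 \\ 4 & 0 & 2 & 0 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 2 & 0 \end{bmatrix}, \qquad B \otimes A = \begin{bmatrix} -2 & -1 & 2 & 1 \\ 0 & -1 & 0 & 1 \\ 4 & 2 & 0 & 0 \\ 0 & 2 & 0 & 0 \end{bmatrix}$$

(b)

$$U_1 = U_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$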

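(4) See [18] p. 228 for methods of calculating matrix exponentials.

(a)

$$\exp(A) = \begin{bmatrix} 2e - e^{-1} & 2(e - e^{-1}) \\ e^{-1} - e & 2e^{-1} - e \end{bmatrix}$$

(b)

$$\exp(A \otimes I) = \begin{bmatrix} 2e - e^{-1} & 0 & 2(e - e^{-1}) & 0 \\ 0 & 2e - e^{-1} & 0 & 2(e - e^{-1}) \\ -(e - e^{-1}) & 0 & -e + 2e^{-1} & 0 \\ 0 & -(e - e^{-1}) & 0 & -e + 2e^{-1} \end{bmatrix}$$

$$\exp(A) \otimes I = \begin{bmatrix} 2e - e^{-1} & 0 & 2(e - e^{-1}) & 0 \\ 0 & 2e - e^{-1} & 0 & 2(e - e^{-1}) \\ e^{-1} - e & 0 & 2e^{-1} - e & 0 \\ 0 & e^{-1} - e & 0 & 2e^{-1} - e \end{bmatrix}$$

Hence $\exp(A) \otimes I = \exp(A \otimes I)$.

(5) (a)

$$A^{-1} = \begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix}, \qquad B^{-1} = -\frac{1}{2}\begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix}$$

so that

$$A^{-1} \otimes B^{-1} = \frac{1}{2}\begin{bmatrix} -4 & 2 & -4 & 2 \\ 3 & -1 & 3 & -1 \\ 4 & -2 & 8 & -4 \\ -3 & 1 & -6 & 2 \end{bmatrix}$$

(b) As

$$A \otimes B = \begin{bmatrix} 2 & 4 & 1 & 2 \\ 6 & 8 & 3 & 4 \\ -1 & -2 & -1 & -2 \\ -3 & -4 & -3 & -4 \end{bmatrix},$$

it follows that

$$(A \otimes B)^{-1} = \begin{bmatrix} -2 & 1 & -2 & 1 \\ 3/2 & -1/2 & 3/2 & -1/2 \\ 2 & -1 & 4 & -2 \\ -3/2 & 1/2 & -3 & 1 \end{bmatrix}$$

This verifies (2.12).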

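(6) (a) For $A$: $\lambda_1 = -1$, $\lambda_2 = 2$, $x_1' = [1 \;\; 4]$ and $x_2' = [1 \;\; 1]$.

For $B$: $\mu_1 = 1$, $\mu_2 = 4$, $y_1' = [1 \;\; -1]$ and $y_2' = [1 \;\; 2]$.

(b)

$$A \otimes B = \begin{bmatrix} 6 & 3 & -2 & -1 \\ 6 & 9 & -2 & -3 \\ 8 & 4 & -4 & -2 \\ 8 & 12 & -4 & -6 \end{bmatrix} = C \text{ (say)}.$$

$$c(\lambda) = |\lambda I - C| = \lambda^4 - 5\lambda^3 - 30\lambda^2 + 40\lambda + 64 = (\lambda + 1)(\lambda + 4)(\lambda - 2)(\lambda - 8).$$

Hence the eigenvalues of $C$ are

$$\{-1, -4, 2, 8\} = \{\lambda_1\mu_1, \lambda_1\mu_2, \lambda_2\mu_1, \lambda_2\mu_2\}.$$

The corresponding eigenvectors of $C$ are:

$$\begin{bmatrix} 1 \\ -1 \\ 4 \\ -4 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 2 \\ 4 \\ 8 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 \\ 2 \\ 1 \\ 2 \end{bmatrix}.$$

(c) This verifies Property IX.

(7) For some non-singular $P$ and $Q$

$$A = P^{-1}CP \quad \text{and} \quad B = Q^{-1}DQ.$$

Hence

$$A \otimes B = P^{-1}CP \otimes Q^{-1}DQ$$

$$= (P^{-1} \otimes Q^{-1})(CP \otimes DQ) \quad \text{by (2.11)}$$

$$= (P \otimes Q)^{-1}(C \otimes D)(P \otimes Q) \quad \text{by (2.12) and (2.11)}$$

$$= R^{-1}(C \otimes D)R$$

where

$$R = P \otimes Q.$$

The result follows.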

CHAPTER 4

(1) ay

ax

2x22 0

x13 2x11

-x 21

0

(2) (a) L X I= x sin x -exp (2Y)

ajxj ex -cosxax x sin x

X I =exp (x) sin x - x cosx,

NX) x -2e2x

ax -2e2x sin x


(b)

(3)

(a)

0 1

0 0

0 0

ex -cos x

-x sin x

o sin xCx 0

X11X12 +X21X22

x12+x22

X13 X12 +X23 X22

x11 X12 X13

X21 X22

(b) Since Y13 = X11X13 + X21X23

3Y 13 F X13 0 x111

I X11 X12 X13

X23J

X1 1X13 +X21 X23

X12x13+X22X232 2

X13 +X23

rxll x211

+

110 0 00 0 01 +

Lx21 X22 X23J Ll 0 of

X12 x22

X13 X23J

X11

Lx21

0 0 0

1 0 0


which is the result in Example 4.8.

aIxIax

X II X12

X21 X22

2[X1] -diag (X11)

2x -2e"

C2ex 2 sin x

2 zX11 +X21

Y = x12x11 +x22x21

X13Xll +X23 X21

hence

ay

ax21

2x21 x22 X23

x22 0 0

x2J 0 0

From Example 4.8

aY = E21X+X'E21 =aX21

ax LX 23 0 X21 j

...


(4) (a) aY= E,s AX + XAErs

axrs

ay` = E;1 X'A' + A'X'E,; .ax

(b) ayE,'.s Ax' + XAEis

>

17X,3

ayl= AX'E;, + E;, x'A' .

ax

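(5) By (4.10)

$$\frac{\partial |Y|}{\partial x_{rs}} = \operatorname{tr}\{|Y|(Y^{-1})'B'E_{rs}'A'\} = |Y|\operatorname{tr}\{A'(Y^{-1})'B'E_{rs}'\} = |Y|(\operatorname{vec}E_{rs})'\operatorname{vec}[A'(Y^{-1})'B'] = |AXB|\,z_{rs}$$

where

$$[z_{rs}] = Z = A'[(AXB)^{-1}]'B'.$$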

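(6) (a) Since

$$\frac{\partial (XX')}{\partial x_{rs}} = E_{rs}X' + XE_{rs}',$$

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}(X')^2 + XE_{rs}'X' + XX'E_{rs}'.$$

(b)

$$\frac{\partial Y}{\partial x_{rs}} = E_{rs}'X'X + X'E_{rs}'X + (X')^2E_{rs}.$$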

CHAPTER 5

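(1) Since

$$\begin{bmatrix} y_{11} \\ y_{21} \\ y_{12} \\ y_{22} \end{bmatrix} = \begin{bmatrix} a_{11}x_{11} + a_{12}x_{12} \\ a_{21}x_{11} + a_{22}x_{12} \\ a_{11}x_{21} + a_{12}x_{22} \\ a_{21}x_{21} + a_{22}x_{22} \end{bmatrix}$$

$$\frac{\partial \operatorname{vec}Y}{\partial \operatorname{vec}X} = \begin{bmatrix} a_{11} & a_{21} & 0 & 0 \\ 0 & 0 & a_{11} & a_{21} \\ a_{12} & a_{22} & 0 & 0 \\ 0 & 0 & a_{12} & a_{22} \end{bmatrix}$$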

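(2) (a)

$$\frac{\partial \operatorname{vec}Y}{\partial \operatorname{vec}X} = (B \otimes A')_{(n)} \quad \text{by (5.18)}$$

(b)

$$\frac{\partial \operatorname{vec}Y}{\partial \operatorname{vec}X} = X \otimes I + I \otimes X'.$$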

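(3) (a)

$$\frac{\partial \operatorname{tr}Y}{\partial x_{rs}} = \operatorname{tr}AE_{rs}B = \operatorname{tr}E_{rs}'A'B' = (\operatorname{vec}E_{rs})'(\operatorname{vec}A'B'),$$

hence

$$\frac{\partial \operatorname{tr}Y}{\partial X} = A'B'.$$

(b)

$$\frac{\partial \operatorname{tr}Y}{\partial x_{rs}} = 2\operatorname{tr}E_{rs}'X',$$

hence

$$\frac{\partial \operatorname{tr}Y}{\partial X} = 2X'.$$

(c)

$$\frac{\partial \operatorname{tr}Y}{\partial x_{rs}} = 2\operatorname{tr}E_{rs}'X,$$

hence

$$\frac{\partial \operatorname{tr}Y}{\partial X} = 2X.$$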

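(4) (a)

$$\frac{\partial \operatorname{tr}Y}{\partial x_{rs}} = -\operatorname{tr}X^{-1}E_{rs}X^{-1} = -\operatorname{tr}E_{rs}'(X^{-2})',$$

hence

$$\frac{\partial \operatorname{tr}Y}{\partial X} = -(X^{-2})'.$$

(b)

$$\frac{\partial \operatorname{tr}Y}{\partial x_{rs}} = -\operatorname{tr}AX^{-1}E_{rs}X^{-1}B,$$

hence

$$\frac{\partial \operatorname{tr}Y}{\partial X} = -(X^{-1}BAX^{-1})'.$$

(c)

$$\frac{\partial \operatorname{tr}Y}{\partial x_{rs}} = \operatorname{tr}E_{rs}X^{n-1} + \operatorname{tr}XE_{rs}X^{n-2} + \ldots + \operatorname{tr}X^{n-1}E_{rs},$$

hence

$$\frac{\partial \operatorname{tr}Y}{\partial X} = n(X^{n-1})'.$$

(d)

$$\exp(X) = I + X + \frac{1}{2!}X^2 + \frac{1}{3!}X^3 + \ldots$$

hence by the result (c) above

$$\frac{\partial \operatorname{tr}Y}{\partial X} = \exp(X').$$

(5) (a) (i)

$$dY = \begin{bmatrix} a_{11}dx_{11} + a_{12}dx_{21} & a_{11}dx_{12} + a_{12}dx_{22} \\ a_{21}dx_{11} + a_{22}dx_{21} & a_{21}dx_{12} + a_{22}dx_{22} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\begin{bmatrix} dx_{11} & dx_{12} \\ dx_{21} & dx_{22} \end{bmatrix} = A(dX).$$

(ii)

$$dY = \begin{bmatrix} 2x_{11}dx_{11} + 2x_{21}dx_{21} & x_{11}dx_{12} + x_{12}dx_{11} + x_{22}dx_{21} + x_{21}dx_{22} \\ x_{11}dx_{12} + x_{12}dx_{11} + x_{21}dx_{22} + x_{22}dx_{21} & 2x_{12}dx_{12} + 2x_{22}dx_{22} \end{bmatrix}$$

$$= \begin{bmatrix} dx_{11} & dx_{21} \\ dx_{12} & dx_{22} \end{bmatrix}\begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} + \begin{bmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \end{bmatrix}\begin{bmatrix} dx_{11} & dx_{12} \\ dx_{21} & dx_{22} \end{bmatrix} = (dX)'X + X'(dX).$$

(iii)

$$dY = \begin{bmatrix} 2x_{11}dx_{11} + x_{12}dx_{21} + x_{21}dx_{12} & x_{11}dx_{12} + x_{12}dx_{11} + x_{12}dx_{22} + x_{22}dx_{12} \\ x_{11}dx_{21} + x_{21}dx_{11} + x_{22}dx_{21} + x_{21}dx_{22} & x_{21}dx_{12} + x_{12}dx_{21} + 2x_{22}dx_{22} \end{bmatrix}$$

$$= \begin{bmatrix} x_{11}dx_{11} + x_{12}dx_{21} & x_{11}dx_{12} + x_{12}dx_{22} \\ x_{21}dx_{11} + x_{22}dx_{21} & x_{21}dx_{12} + x_{22}dx_{22} \end{bmatrix} + \begin{bmatrix} x_{11}dx_{11} + x_{21}dx_{12} & x_{12}dx_{11} + x_{22}dx_{12} \\ x_{11}dx_{21} + x_{21}dx_{22} & x_{12}dx_{21} + x_{22}dx_{22} \end{bmatrix}$$

$$= X(dX) + (dX)X.$$

(b) Write $Y = UV$ where $U = AX$ and $V = BX$, then

$$dY = U(dV) + (dU)V = AXB(dX) + A(dX)BX.$$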

CHAPTER 6

1( ) -3111(x12 + x22) X21 0 0

ay x12exiiXis 0 x11eX"ix'a x22

ax 0 x11 -sin (x12 +x22) 0

0 0 0 X12

(2)

ax1 0

ax0 0

ax0 0

0 0 1 0 0 0 and so on,axl

1ax l2 ax 13

hence by (

0

6.1)

0 0 0 1 0

1 0 0 1

0 0 0 0

0 0 0 0

ax0 0 0 0

ax1 0 0 1 = U.000000000 0 0 0

1001

(3) SinceX-1 = -

FX22 -x121

-X21 x11

where A =x1 1x22 -x12X21, we can calculate aX_11axrs, for example

ax-t 1

ax 11 A2F

2-x22 X 12X22

x21x22 -x12x21

oho

r"'

--I


Hence

ax-1 ]

ax A2

FI x22

-x21 x22

-x 12x22

xux22 -x12x11 -x11x21

-x22x21

x221

x12x21

xilx22-x11x21

-x11x122

x11

X22 -x12 0 0

1

11 0 10

]x22 -x12 0 0

-x21x11 0 0 0 0 0 0 -x21 x11 - 0 0

0 0 x22 X12 I O 0 0 0 0 0 x22 -x120 0 -x21 xil L1 0 0 ] L° 0 -x21 x11

-(I ©X'1) U (I O X-').(4)

-'a11x22 -atlx12 a12x22 -a12x 12

A ©XA -a1 1x 21 a11x11 -a12x21 a12x11

a21x 22 -a21x12 a22x22 -a22x 12

-a2 1 x 21 a21x l t -a22x21 a22x i 1

where A = x11x2

We can now calcu

a (A (D X -')/

2 -x12xlate

axrs

21

axrs

and form

0 0 0 0 0 -all 0 -a12

0 all 0 a 12 0 0 0 0

0 0 0 0 0 all 0 -a22

a(A ©X-') 0 a21 0 a 22 0 0 0 0

ax A 0 0 0 0 all 0 a12 0

- a ll 0 -a12 0 0 0 0 0

0 0 0 0 a21 0 a22 0

- a2 1 0 -a22 0 0 0 0 0

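Tables of Formulae and Derivatives

Table 1

Notation used: $A = [a_{ij}]$, $B = [b_{ij}]$

$E_{ij} = e_ie_j'$

$\delta_{ij} = e_i'e_j = e_j'e_i$

$E_{ij}e_r = \delta_{jr}e_i$

$E_{ij}E_{rs} = \delta_{jr}E_{is}$

$E_{ij}E_{js}E_{sm} = E_{im}$

$E_{ij}E_{rs} = 0$ iff $j \neq r$

$A = \sum_i\sum_j a_{ij}E_{ij}$

$A_{\cdot j} = Ae_j$

$A_{i\cdot} = A'e_i$

$E_{ij}AE_{rs} = a_{jr}E_{is}$

$\operatorname{tr}AB = \sum_i\sum_j a_{ij}b_{ji}$

$\operatorname{tr}AB' = \operatorname{tr}A'B$

$\operatorname{tr}AB = (\operatorname{vec}A')'\operatorname{vec}B$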

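Table 2

$A \otimes B = [a_{ij}B]$

$A \otimes (\alpha B) = \alpha(A \otimes B)$

$(A + B) \otimes C = A \otimes C + B \otimes C$

$A \otimes (B + C) = A \otimes B + A \otimes C$

$A \otimes (B \otimes C) = (A \otimes B) \otimes C$

$(A \otimes B)' = A' \otimes B'$

$(A \otimes B)(C \otimes D) = AC \otimes BD$

$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$

$\operatorname{vec}(AYB) = (B' \otimes A)\operatorname{vec}Y$

$|A \otimes B| = |A|^m|B|^n$ when $A$ and $B$ are of order $(n \times n)$ and $(m \times m)$ respectively

$A \otimes B = U_1(B \otimes A)U_2$, where $U_1$ and $U_2$ are permutation matrices

$\operatorname{tr}(A \otimes B) = \operatorname{tr}A\,\operatorname{tr}B$

$A \oplus B = A \otimes I_m + I_n \otimes B$

$U = \sum_r\sum_s E_{rs} \otimes E_{rs}'$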
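Several of the Kronecker product identities of Table 2 can be spot-checked numerically. The sketch below is illustrative only: the random test matrices and the helper names `vec` and `commutation` are assumptions made here, and `commutation(p, q)` builds the permutation matrix $\sum_r\sum_s E_{rs} \otimes E_{rs}'$ for $E_{rs}$ of order $(p \times q)$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 2, 3
A, C = rng.standard_normal((n, n)), rng.standard_normal((n, n))
B, D = rng.standard_normal((m, m)), rng.standard_normal((m, m))
Y = rng.standard_normal((n, m))


def vec(M):
    """Stack the columns of M into one vector."""
    return M.reshape(-1, order="F")


def commutation(p, q):
    """Permutation matrix sum_{r,s} E_rs (x) E_rs' with E_rs of order (p x q)."""
    U = np.zeros((p * q, p * q))
    for r in range(p):
        for s in range(q):
            E = np.zeros((p, q))
            E[r, s] = 1.0
            U += np.kron(E, E.T)
    return U


checks = {
    "mixed product": np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)),
    "transpose": np.allclose(np.kron(A, B).T, np.kron(A.T, B.T)),
    "inverse": np.allclose(np.linalg.inv(np.kron(A, B)),
                           np.kron(np.linalg.inv(A), np.linalg.inv(B))),
    "vec": np.allclose(vec(A @ Y @ B), np.kron(B.T, A) @ vec(Y)),
    "determinant": np.isclose(np.linalg.det(np.kron(A, B)),
                              np.linalg.det(A) ** m * np.linalg.det(B) ** n),
    "trace": np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B)),
    "vec of transpose": np.allclose(commutation(n, m) @ vec(Y), vec(Y.T)),
    "permutation": np.allclose(np.kron(A, B),
                               commutation(n, m) @ np.kron(B, A) @ commutation(m, n)),
}

for name, ok in checks.items():
    print(name, ok)
```

In the last check the two permutation matrices are `commutation(n, m)` and `commutation(m, n)`, which are transposes (and inverses) of each other.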

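Table 3

$\dfrac{\partial (Ax)}{\partial x} = A'$

$\dfrac{\partial (x'A)}{\partial x} = A$

$\dfrac{\partial (x'x)}{\partial x} = 2x$

$\dfrac{\partial (x'Ax)}{\partial x} = Ax + A'x$

$\dfrac{\partial z}{\partial x} = \dfrac{\partial y}{\partial x}\,\dfrac{\partial z}{\partial y}$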

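Table 4

$\dfrac{\partial f(X)}{\partial X} = \sum_i\sum_j E_{ij}\,\dfrac{\partial f(X)}{\partial x_{ij}}$

$\dfrac{\partial |X|}{\partial X} = |X|(X^{-1})'$ when elements of $X$ are independent

$\dfrac{\partial |X|}{\partial X} = 2[X_{ij}] - \operatorname{diag}\{X_{ii}\}$ when $X$ is symmetric

$\dfrac{\partial X}{\partial x_{rs}} = E_{rs}$

$\dfrac{\partial (AXB)}{\partial x_{rs}} = AE_{rs}B$

$\dfrac{\partial (AX'B)}{\partial x_{rs}} = AE_{rs}'B$

$\dfrac{\partial (X'A'AX)}{\partial x_{rs}} = E_{rs}'A'AX + X'A'AE_{rs}$

$\dfrac{\partial (AX^{-1}B)}{\partial x_{rs}} = -AX^{-1}E_{rs}X^{-1}B$

$\dfrac{\partial (X'AX)}{\partial x_{rs}} = E_{rs}'AX + X'AE_{rs}$

$\dfrac{\partial (X^n)}{\partial x_{rs}} = \sum_{k=0}^{n-1} X^kE_{rs}X^{n-k-1}$

$\dfrac{\partial (X^{-n})}{\partial x_{rs}} = -X^{-n}\Big[\sum_{k=0}^{n-1} X^kE_{rs}X^{n-k-1}\Big]X^{-n}$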

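Table 5

$\dfrac{\partial \operatorname{vec}(AXB)}{\partial \operatorname{vec}X} = B \otimes A'$

$\dfrac{\partial \operatorname{vec}(X'AX)}{\partial \operatorname{vec}X} = U'(AX \otimes I) + (I \otimes A'X)$

$\dfrac{\partial \operatorname{vec}(AX^{-1}B)}{\partial \operatorname{vec}X} = -(X^{-1}B) \otimes (X^{-1})'A'$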

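Table 6

$\dfrac{\partial \log|X|}{\partial X} = (X^{-1})'$

$\dfrac{\partial |X|}{\partial X} = |X|(X^{-1})'$

$\dfrac{\partial \operatorname{tr}(AX)}{\partial X} = A'$

$\dfrac{\partial \operatorname{tr}(A'X)}{\partial X} = A$

$\dfrac{\partial \operatorname{tr}(X'AXB)}{\partial X} = AXB + A'XB'$

$\dfrac{\partial \operatorname{tr}(XX')}{\partial X} = 2X$

$\dfrac{\partial \operatorname{tr}(X^n)}{\partial X} = n(X^{n-1})'$

$\dfrac{\partial \operatorname{tr}(e^X)}{\partial X} = e^{X'}$

$\dfrac{\partial \operatorname{tr}(AX^{-1}B)}{\partial X} = -(X^{-1}BAX^{-1})'$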
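The scalar derivatives of Table 6 can be compared with central differences, element by element. The sketch below is an illustration, not part of the original tables: the test matrices, the step size and the helper `num_grad` are assumptions, and the elements of $X$ are treated as independent.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
X = 0.3 * rng.standard_normal((n, n)) + 2.0 * np.eye(n)   # assumed, non-singular
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
Xinv = np.linalg.inv(X)


def num_grad(f, X, h=1e-6):
    """Matrix of central differences [df/dx_rs] for a scalar function f."""
    G = np.zeros_like(X)
    for r in range(X.shape[0]):
        for s in range(X.shape[1]):
            E = np.zeros_like(X)
            E[r, s] = h
            G[r, s] = (f(X + E) - f(X - E)) / (2.0 * h)
    return G


cases = {
    "log|X|": (lambda M: np.linalg.slogdet(M)[1], Xinv.T),
    "tr(AX)": (lambda M: np.trace(A @ M), A.T),
    "tr(A'X)": (lambda M: np.trace(A.T @ M), A),
    "tr(X'AXB)": (lambda M: np.trace(M.T @ A @ M @ B), A @ X @ B + A.T @ X @ B.T),
    "tr(XX')": (lambda M: np.trace(M @ M.T), 2.0 * X),
    "tr(X^3)": (lambda M: np.trace(M @ M @ M), 3.0 * (X @ X).T),
    "tr(AX^-1 B)": (lambda M: np.trace(A @ np.linalg.inv(M) @ B),
                    -(Xinv @ B @ A @ Xinv).T),
}

results = {name: np.allclose(num_grad(f, X), expected, atol=1e-5)
           for name, (f, expected) in cases.items()}

for name, ok in results.items():
    print(name, ok)
```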

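Table 7

$\dfrac{\partial Y}{\partial X} = \sum_r\sum_s E_{rs} \otimes \dfrac{\partial Y}{\partial x_{rs}}$

$\dfrac{\partial X}{\partial X} = \bar{U}$ (elements of $X$ independent)

$\dfrac{\partial X}{\partial X} = \bar{U} + U - \sum_r E_{rr} \otimes E_{rr}$ ($X$ symmetric)

$\dfrac{\partial X'}{\partial X} = U$

$\dfrac{\partial (XY)}{\partial Z} = \dfrac{\partial X}{\partial Z}(I \otimes Y) + (I \otimes X)\dfrac{\partial Y}{\partial Z}$

$\dfrac{\partial X^{-1}}{\partial X} = -(I \otimes X^{-1})\,\bar{U}\,(I \otimes X^{-1})$

$\dfrac{\partial (X \otimes Y)}{\partial Z} = \dfrac{\partial X}{\partial Z} \otimes Y + [I \otimes U_1]\Big[\dfrac{\partial Y}{\partial Z} \otimes X\Big][I \otimes U_2]$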

Bibliography

[1] Anderson, T. W., (1958), An Introduction to Multivariate Statistical Analysis, John Wiley.

[2] Athans, M., (1968), The Matrix Minimum Principle, Information and Control, 11, 592-606.

[3] Athans, M., and Tse, E., (1967), A Direct Derivation of the Optimal Linear Filter Using the Maximum Principle, IEEE Trans. Auto. Control, AC-12, No. 6, 690-698.

[4] Athans, M., and Schweppe, F. C., (1965), Gradient Matrices and Matrix Calculations, MIT Lincoln Lab. Tech. Note 1965-53, Lexington, Mass.

[5] Barnett, S., (1973), Matrix Differential Equations and Kronecker Products, SIAM J. Appl. Math., 24, No. 1.

[6] Bellman, R., (1960), Introduction to Matrix Analysis, McGraw-Hill.

[7] Bodewig, E., (1959), Matrix Calculus, Amsterdam: North Holland Publishing Co.

[8] Brewer, J. W., (1978), Kronecker Products and Matrix Calculus in System Theory, IEEE Trans. on Circuits and Systems, 25, No. 9, 772-781.

[9] Brewer, J. W., (1977), The Derivatives of the Exponential Matrix with respect to a Matrix, IEEE Trans. Auto. Control, 22, 656-657.

[10] Brewer, J. W., (1979), Derivatives of the Characteristic Polynomial Trace and Determinant with respect to a Matrix, IEEE Trans. Auto. Control, 24, 787-790.

[11] Brewer, J. W., (1977), The Gradient with respect to a Symmetric Matrix, IEEE Trans. Auto. Control, 22, 265-267.

[12] Brewer, J. W., (1977), The Derivative of the Riccati Matrix with respect to a Matrix, IEEE Trans. Auto. Control, 22, No. 6, 980-983.

[13] Conlisk, J., (1969), The Equilibrium Covariance Matrix of Dynamic Econometric Models, American Stat. Ass. Journal, No. 64, 277-279.

[14] Deemer, W. L. and Olkin, I., (1951), The Jacobians of certain Matrix Transformations, Biometrika, 38, 345-367.


[15] Dwyer, P. S. and Macphail, M. S., (1948), Symbolic Matrix Derivatives, Ann. Math. Statist., 19, 517-537.

[16] Dwyer, P. S., (1967), Some Applications of Matrix Derivatives in Multivariate Analysis, American Statistical Ass. Journal, June, pt 62, 607-625.

[17] Geering, H. P., (1976), On Calculating Gradient Matrices, IEEE Trans. Auto. Control, August, 615-616.

[18] Graham, A., (1979), Matrix Theory and Applications for Engineers and Mathematicians, Ellis Horwood.

[19] Graham, A., and Burghes, D., (1980), Introduction to Control Theory Including Optimal Control, Ellis Horwood.

[20] Lancaster, P., (1970), Explicit Solutions of Linear Matrix Equations, SIAM Rev., 12, No. 4, 544-566.

[21] MacDuffee, C. C., (1956), The Theory of Matrices, Chelsea, New York.

[22] Neudecker, H., (1969), Some Theorems on Matrix Differentiation with special reference to Kronecker Matrix Products, J. Amer. Statist. Assoc., 64, 953-963.

[23] Neudecker, H., A Note on Kronecker Matrix Products and Matrix Equation Systems.

[24] Paraskevopoulos, P. N. and King, R. E., (1976), A Kronecker Product approach to Pole assignment by output feedback, Int. J. Contr., 24, No. 3, 325-334.

[25] Roth, W. E., (1944), On Direct Product Matrices, Bull. Amer. Math. Soc., No. 40, 461-468.

[26] Schonemann, P. H., (1965), On the Formal Differentiation of Traces and Determinants, Research Memorandum No. 27, University of North Carolina.

[27] Schweppe, F. C., (1973), Uncertain Dynamic Systems, Englewood Cliffs, Prentice Hall.

[28] Tracy, D. S. and Dwyer, P. S., (1969), Multivariate Maxima and Minima with Matrix Derivatives, J. Amer. Statist. Assoc., 64, 1576-1594.

[29] Turnbull, H. W., (1927), On Differentiating a Matrix, Proc. Edinburgh Math. Soc., 11, ser. 2, 111-128.

[30] Turnbull, H. W., (1930/31), A Matrix Form of Taylor's Theorem, Proc. Edinburgh Math. Soc., Ser. 2, 33-54.

[31] Vetter, W. J., (1970), Derivative Operations on Matrices, IEEE Trans. Auto. Control, AC-15, 241-244.

[32] Vetter, W. J., (1971), Correction to 'Derivative Operations on Matrices', IEEE Trans. Auto. Control, AC-16, 113.

[33] Vetter, W. J., (1971), An Extension to Gradient Matrices, IEEE Trans. Syst. Man. Cybernetics, SMC-1, 184-186.

[34] Vetter, W. J., (1973), Matrix Calculus Operations and Taylor Expansions, SIAM Rev., 2, 352-369.

[35] Vetter, W. J., (1975), Vector Structures and Solutions of Linear Matrix Equations, Linear Algebra and its Applications, 10, 181-188.


[:W. J., (1971), On Linear Estimates, Minimum Variance and Least-Weighting Matrices, IEEE Trans. Auto. Control, AC-16, 265-

[., R. J. and Mulholland, R. J., (1980), Kronecker Product Represen-or the Solution of the General Linear Matrix Equation,IEEE Trans.ontrol, AC-25, No. 3, 563-564.


Index

C

chain rule
    matrix, 88
    vector, 54
characteristic equation, 47
cofactor, 57
column vector, 14
companion form, 47
constrained optimisation, 94, 96

D

decomposition of a matrix, 13
direct product, 21
derivative
    Kronecker product, 70
    matrix, 60, 62, 64, 67, 70, 75, 81
    scalar function, 56, 75
    vector, 52
determinant, 27, 56
deviation, 94

E

eigenvalues, 27, 30
eigenvectors, 27, 30
elementary matrix, 12, 19
    transpose, 19
exponential matrix, 29, 31, 42, 108

G

gradient matrix, 56

J

Jacobian, 53, 109

K

Kronecker
    delta, 13
    product, 21, 23, 33, 70, 85
    sum, 30

L

Lagrange multipliers, 95
least squares, 94, 96, 100

M

matrix
    calculus, 51, 94
    companion, 47
    decomposition, 13
    derivative, 37, 60, 62, 67, 70, 75, 81, 84, 88
    differential, 78
    elementary, 12, 19
    exponential, 29, 31, 42, 108
    gradient, 56
    integral, 37
    orthogonal, 97
    permutation, 23, 28, 32
    product rule, 84
    symmetric, 58, 95, 97
    transition, 42
maximum likelihood, 102
mixed product rule, 24
multivariable system, 45
multivariate normal, 102

N

normal equations, 95, 101