Basic Matrix Theory


    1. Matrix as a Linear Transformation

Let $A$ be an $n \times m$ matrix of real numbers. Instead of defining a matrix as a rectangular array of real numbers, we view it as a linear function from $\mathbb{R}^m$ to $\mathbb{R}^n$.

First, $A$ is interpreted as a function, so that we write
$$y = Ax,$$
where $y \in \mathbb{R}^n$ is the output corresponding to the input $x \in \mathbb{R}^m$ of the function defined by $A$. Throughout, $Ax$ is defined as the usual multiplication of the vector $x$ by the matrix $A$. Note that we write the function $A$ with its argument $x$ as $Ax$, instead of $A(x)$, which is more standard in the general context. If $A$ is a square matrix, in which case we have $n = m$, the function represented by $A$ becomes a transformation on $\mathbb{R}^n$, i.e., a function from $\mathbb{R}^n$ into itself.

Second, $A$ represents a linear function. This follows directly from the usual rule for matrix and vector multiplication, i.e., we have
$$A(c_1 x_1 + c_2 x_2) = c_1 A x_1 + c_2 A x_2$$
for any scalars $c_1$ and $c_2$ and vectors $x_1$ and $x_2$ in $\mathbb{R}^m$, which implies that $A$ is linear as a function.
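As a quick numerical illustration of this linearity (a minimal NumPy sketch; the matrix and vectors are arbitrary choices):

```python
import numpy as np

# An arbitrary 3 x 2 matrix, viewed as a linear function from R^2 to R^3.
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0]])

x1 = np.array([1.0, -1.0])
x2 = np.array([2.0, 0.5])
c1, c2 = 3.0, -2.0

# A(c1*x1 + c2*x2) should equal c1*A(x1) + c2*A(x2).
lhs = A @ (c1 * x1 + c2 * x2)
rhs = c1 * (A @ x1) + c2 * (A @ x2)
print(np.allclose(lhs, rhs))  # True
```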

As for the usual linear function, we may define the range $\mathcal{R}(A)$ and null space (or kernel) $\mathcal{N}(A)$ of the function defined by a matrix $A$ by
$$\mathcal{R}(A) = \{y \mid y = Ax \text{ for some } x\}$$
$$\mathcal{N}(A) = \{x \mid Ax = 0\}.$$
We may easily show that $\mathcal{R}(A)$ and $\mathcal{N}(A)$ are subspaces respectively of $\mathbb{R}^n$ and $\mathbb{R}^m$, i.e., subsets of the vector spaces $\mathbb{R}^n$ and $\mathbb{R}^m$ that are vector spaces themselves. If $y_1, y_2 \in \mathcal{R}(A)$, then there exist $x_1, x_2$ such that $y_1 = Ax_1$, $y_2 = Ax_2$, and therefore, we have


$c_1 y_1 + c_2 y_2 = A(c_1 x_1 + c_2 x_2)$, which implies that $c_1 y_1 + c_2 y_2 \in \mathcal{R}(A)$ for any linear combination of $y_1, y_2$. Moreover, if $x_1, x_2 \in \mathcal{N}(A)$, then we have $Ax_1 = Ax_2 = 0$, and therefore, $A(c_1 x_1 + c_2 x_2) = 0$, which implies that $c_1 x_1 + c_2 x_2 \in \mathcal{N}(A)$ for any linear combination of $x_1, x_2$.

It is easy to see that the range of $A$ is the space spanned by the column vectors of $A$. Note that $Ax$ yields a linear combination of the column vectors of $A$ with weights given by the components of $x$. We define the rank of $A$, denoted by $\mathrm{rank}(A)$, to be the dimension of the range of $A$. On the other hand, the dimension of the null space of $A$ is often referred to as the nullity of $A$, which is written as $\mathrm{nullity}(A)$. Recall that the dimension of a subspace of a vector space is defined to be the number of linearly independent vectors we need to span it. It is well expected that $\mathrm{rank}(A) = m - \mathrm{nullity}(A)$, or equivalently,
$$\mathrm{rank}(A) + \mathrm{nullity}(A) = m,$$
since we have $m - \mathrm{nullity}(A)$ linearly independent vectors whose span intersects the null space of $A$ only at the origin, and they are mapped into a linearly independent set of vectors due to the linearity of $A$.
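The rank–nullity identity can be confirmed numerically (a minimal sketch using NumPy and SciPy; the example matrix is an arbitrary choice):

```python
import numpy as np
from scipy.linalg import null_space

# A 3 x 4 matrix whose last two columns are linear combinations of the first two.
A = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0, -1.0],
              [1.0, 1.0, 2.0, 1.0]])

rank = np.linalg.matrix_rank(A)
nullity = null_space(A).shape[1]  # number of basis vectors of N(A)
print(rank, nullity, rank + nullity == A.shape[1])  # 2 2 True
```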

We denote by $A'$ the transpose of a matrix $A$. Moreover, for a subspace $M$ of $\mathbb{R}^n$, we define
$$M^{\perp} = \{x \mid x'y = 0 \text{ for all } y \in M\},$$
which is commonly referred to as the orthogonal complement of $M$ and read as $M$-perp. For instance, the orthogonal complement of the $xy$-plane is the $z$-axis in $\mathbb{R}^3$. Note that we have $\dim M^{\perp} = n - \dim M$, where $\dim$ denotes the dimension. It is easy to deduce that
$$\mathcal{R}(A)^{\perp} = \mathcal{N}(A').$$
This just states that a vector $x$ is orthogonal to every column vector of $A$ if and only if $A'x = 0$, which is almost tautological.
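This relationship can also be seen numerically (a minimal sketch; the matrix is arbitrary): a basis of $\mathcal{N}(A')$ is orthogonal to every column of $A$.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Basis vectors of N(A') -- each should be orthogonal to every column of A.
Z = null_space(A.T)
print(np.allclose(A.T @ Z, 0.0))  # True: columns of Z lie in R(A)-perp
```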


    2. Eigenvalues and Eigenvectors

Throughout this section, we let $A$ be an $n \times n$ square matrix. We say that a scalar $\lambda$ is an eigenvalue if there exists a nonzero vector $x \in \mathbb{R}^n$ such that
$$Ax = \lambda x.$$

The vector $x$ is called an eigenvector associated with, or corresponding to, the eigenvalue $\lambda$. Clearly, $\lambda$ is an eigenvalue of $A$ if and only if there exists a nonzero $x$ such that $(A - \lambda I)x = 0$, which holds if and only if $A - \lambda I$ is singular or non-invertible. Therefore, we may find eigenvalues by solving the determinantal equation
$$\det(A - \lambda I) = 0.$$
Once an eigenvalue $\lambda$ is found, the corresponding eigenvectors may be obtained by solving the linear equations $(A - \lambda I)x = 0$.
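In practice, eigenvalues and eigenvectors are computed numerically rather than via the determinantal equation; a minimal NumPy sketch (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigvals[i] is an eigenvalue of A; eigvecs[:, i] is an associated eigenvector.
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)  # eigenvalues 3 and 1 (order not guaranteed)

# Verify Ax = lambda * x for each eigenpair.
for lam, x in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ x, lam * x))  # True
```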

The determinantal equation yields an $n$-th order polynomial equation
$$a_0 \lambda^n + a_1 \lambda^{n-1} + \cdots + a_{n-1} \lambda + a_n = 0$$
in $\lambda$ with coefficients $a_0, \ldots, a_n$ given by the entries of $A$. The equation can also be written after factorization as
$$(\lambda - \lambda_1) \cdots (\lambda - \lambda_n) = 0,$$
where $\lambda_1, \ldots, \lambda_n$ now become eigenvalues of $A$. Eigenvalues of $A$ are generally complex numbers, even if $A$ itself is a real matrix. The set of eigenvalues of $A$ is called the spectrum of $A$. Of course, we may not have $n$ distinct roots for the determinantal equation, since some roots may be repeated. In general, therefore, the number of elements in the spectrum of $A$ may be fewer than the dimension $n$ of $A$. If an eigenvalue $\lambda_i$ is repeated $m$ times as a root of the determinantal equation, then we say that it has algebraic multiplicity $m$.

For each eigenvalue $\lambda_i$, we have at least one nonzero vector $x$ such that $(A - \lambda_i I)x = 0$, which becomes an eigenvector associated with $\lambda_i$.


An eigenvector is identified only up to a constant multiplication, since if $x_i$ is an eigenvector associated with eigenvalue $\lambda_i$, then any constant multiple of $x_i$ also becomes an eigenvector associated with the same eigenvalue $\lambda_i$. Naturally, there may be multiple eigenvectors associated with a single eigenvalue $\lambda_i$. In this case, they are not individually identified even up to a constant multiplication. If, for instance, $x_{1i}$ and $x_{2i}$ are two eigenvectors associated with $\lambda_i$, then any linear combination of them is also an eigenvector associated with $\lambda_i$, since $A(c_1 x_{1i} + c_2 x_{2i}) = \lambda_i(c_1 x_{1i} + c_2 x_{2i})$ for any constants $c_1$ and $c_2$. In fact, it is easy to see that eigenvectors associated with any eigenvalue $\lambda_i$ are identified only up to the space spanned by them, which we call the eigenspace associated with eigenvalue $\lambda_i$. The null space, for instance, can be regarded as the eigenspace associated with the zero eigenvalue. The dimension of the eigenspace associated with eigenvalue $\lambda_i$ is sometimes called the geometric multiplicity of eigenvalue $\lambda_i$. It is known that the geometric multiplicity cannot exceed the algebraic multiplicity. For instance, if the eigenspace associated with eigenvalue $\lambda_i$ is 2-dimensional, then $\lambda_i$ is a root of the determinantal equation that is repeated at least twice.

There are two commonly used functionals of a matrix, the trace and the determinant, whose values are solely determined by its eigenvalues. For a matrix $A$ with eigenvalues $\lambda_1, \ldots, \lambda_n$, we define the trace and determinant of $A$ as
$$\mathrm{tr}(A) = \sum_{i=1}^n \lambda_i \quad \text{and} \quad \det(A) = \prod_{i=1}^n \lambda_i,$$
respectively. The trace and determinant can also be obtained directly from the entries of $A$ using the relationships between the coefficients $a_0, \ldots, a_n$ and the roots $\lambda_1, \ldots, \lambda_n$ of the determinantal equation. To see this more clearly, we consider a 2-dimensional square matrix $A = (a_{ij})$, $i, j = 1, 2$, whose determinantal equation is given by
$$\lambda^2 - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}) = 0,$$
and note that we have $\lambda_1 + \lambda_2 = a_{11} + a_{22}$ and $\lambda_1 \lambda_2 = a_{11}a_{22} - a_{12}a_{21}$ if we set $\lambda_1$ and $\lambda_2$ to be the roots of the determinantal equation.
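These identities are easy to confirm numerically (a minimal NumPy sketch; the $2 \times 2$ matrix below is an arbitrary example):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals = np.linalg.eigvals(A)  # roots of det(A - lambda*I) = 0
print(np.isclose(eigvals.sum(), np.trace(A)))        # True: tr(A) = sum of eigenvalues
print(np.isclose(eigvals.prod(), np.linalg.det(A)))  # True: det(A) = product of eigenvalues
```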


    3. Projections

Let $P$ be an $n$-dimensional square matrix, which we may view as a linear transformation on $\mathbb{R}^n$. We say that a matrix $P$ is idempotent if and only if
$$P^2 = P.$$

For an idempotent matrix $P$, we have
$$P(Px) = P^2 x = Px$$
for all $x \in \mathbb{R}^n$, which implies that $P$ is an identity map if restricted to the range $\mathcal{R}(P)$ of $P$. Note that any vector in $\mathcal{R}(P)$ can be written as $Px$ for some $x \in \mathbb{R}^n$. As for any other linear transformation on $\mathbb{R}^n$, we have $\mathrm{rank}(P) + \mathrm{nullity}(P) = n$, i.e., $\dim \mathcal{R}(P) + \dim \mathcal{N}(P) = n$. Moreover, $\mathcal{R}(P) \cap \mathcal{N}(P) = \{0\}$, since any nonzero vector in $\mathcal{R}(P)$ is mapped to itself and in particular does not belong to $\mathcal{N}(P)$. Consequently, for any vector $x \in \mathbb{R}^n$ we may write $x = y + z$ uniquely for some $y \in \mathcal{R}(P)$ and $z \in \mathcal{N}(P)$.

For an $n$-dimensional idempotent matrix $P$, we have $y = Px$ if $x \in \mathbb{R}^n$ is given by $x = y + z$ with $y \in \mathcal{R}(P)$ and $z \in \mathcal{N}(P)$. Therefore, $y$ can be obtained by projecting $x$ on $\mathcal{R}(P)$ along $\mathcal{N}(P)$. For this reason, we call the transformation given by an idempotent matrix a projection. An idempotent matrix itself is also often called a projection. If an idempotent matrix $P$ is also symmetric, i.e., $P = P'$, then we have $\mathcal{R}(P) \perp \mathcal{N}(P)$, in which case the transformation given by $P$ becomes an orthogonal projection. We also call a matrix $P$ itself an orthogonal projection if it is idempotent and symmetric. In what follows, we say that $P$ is an $m$-dimensional projection or orthogonal projection on $\mathbb{R}^n$ if $\mathcal{R}(P)$ is an $m$-dimensional subspace of $\mathbb{R}^n$. The identity matrix is the only $n$-dimensional projection on $\mathbb{R}^n$.

Obviously, any projection $P$ has at most two distinct eigenvalues, 0 and 1. For all $x \in \mathcal{N}(P)$, we have $Px = 0$, and therefore, $\mathcal{N}(P)$ is the eigenspace associated with eigenvalue 0. On the other hand, for all $x \in \mathcal{R}(P)$, we have $Px = x$, and therefore, $\mathcal{R}(P)$ is the eigenspace associated with eigenvalue 1.


Let $\dim \mathcal{R}(P) = m$. Then the geometric multiplicity of eigenvalue 1 is $m$, and 1 must be a root of the determinantal equation repeated at least $m$ times. Likewise, the geometric multiplicity of eigenvalue 0 is $n - m$, and 0 must be a root of the determinantal equation repeated at least $(n - m)$ times. However, since there cannot be more than $n$ roots of the determinantal equation, the algebraic multiplicities of eigenvalues 1 and 0 are exactly $m$ and $n - m$. It follows, in particular, that $\mathrm{tr}(P) = m$, where $m$ is the dimension of the projection $P$.
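These properties of projections can be checked numerically (a minimal sketch; the idempotent matrix below is an arbitrary, non-symmetric example):

```python
import numpy as np

# A non-symmetric idempotent matrix: a (non-orthogonal) projection.
P = np.array([[1.0, 1.0],
              [0.0, 0.0]])

print(np.allclose(P @ P, P))          # True: P is idempotent
print(np.sort(np.linalg.eigvals(P)))  # [0. 1.]: eigenvalues are 0 and 1
print(np.isclose(np.trace(P), np.linalg.matrix_rank(P)))  # True: tr(P) = dim R(P)
```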

We may easily see that if $P$ is idempotent, so is $I - P$. This implies that if $P$ is a projection, so is $I - P$. In fact, it is clear that $I - P$ is the projection on $\mathcal{N}(P)$ along $\mathcal{R}(P)$. Note that $\mathcal{R}(I - P) = \mathcal{N}(P)$ and $\mathcal{N}(I - P) = \mathcal{R}(P)$ for any projection $P$. Therefore, if $P$ is an $m$-dimensional projection in $\mathbb{R}^n$, then $I - P$ is an $(n - m)$-dimensional projection in $\mathbb{R}^n$. It follows straightforwardly that $\mathrm{tr}(I - P) = n - m$ if $P$ is $m$-dimensional. Clearly, we have $P(I - P) = (I - P)P = 0$ for any projection $P$. If, in particular, $P$ is the orthogonal projection on an $m$-dimensional subspace $M$ of $\mathbb{R}^n$, then $I - P$ is the $(n - m)$-dimensional orthogonal projection on the orthogonal complement $M^{\perp}$ of $M$.

Let $A$ be an $n \times m$ matrix of full column rank, i.e., $\mathrm{rank}(A) = m$ and $A$ has $m$ linearly independent column vectors. Now we construct the orthogonal projection $P$ on the range $\mathcal{R}(A)$ of $A$. We choose an arbitrary $x \in \mathbb{R}^n$ and orthogonally project it on $\mathcal{R}(A)$, which we denote by $Px$. We may set $Px = Ab$ for some $b \in \mathbb{R}^m$, since it is in $\mathcal{R}(A)$. To determine $b \in \mathbb{R}^m$, we note that $A'(x - Ab) = 0$, which yields $b = (A'A)^{-1}A'x$. Consequently, we have $Px = A(A'A)^{-1}A'x$, from which it follows that
$$P = A(A'A)^{-1}A'$$
since the choice of $x \in \mathbb{R}^n$ was arbitrary. As expected, $P$ is idempotent and symmetric. If, in particular, the column vectors of $A$ are orthonormal, then we have $P = AA'$, since $A'A = I$.
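The construction $P = A(A'A)^{-1}A'$ can be sketched numerically as follows (the full-column-rank matrix $A$ is an arbitrary choice):

```python
import numpy as np

# A 3 x 2 matrix of full column rank.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

# Orthogonal projection on R(A): P = A (A'A)^{-1} A'.
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P @ P, P))  # True: idempotent
print(np.allclose(P, P.T))    # True: symmetric
x = np.array([1.0, 2.0, 3.0])
print(np.allclose(A.T @ (x - P @ x), 0.0))  # True: residual is orthogonal to R(A)
```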


    4. Spectral Representation

Throughout this section, we assume that $A$ is an $n$-dimensional symmetric matrix of real numbers. There are three important facts known for symmetric matrices, listed below.

(a) All eigenvalues are real.

(b) Eigenvectors associated with distinct eigenvalues are orthogonal.

(c) Geometric multiplicities of all eigenvalues are identical to their algebraic multiplicities.

One immediate consequence of these facts is that there are orthogonal eigenspaces $M_1, \ldots, M_m$ associated with the real distinct eigenvalues $\lambda_1, \ldots, \lambda_m$ of $A$, the sum of whose dimensions is exactly $n$. Notice that $M_i \cap M_j = \{0\}$ for all $i \neq j$, $i, j = 1, \ldots, m$, since they are orthogonal. Of course, the number of distinct eigenvalues $m$ is generally smaller than $n$, since some roots of the determinantal equation are repeated.

For any $n$-dimensional symmetric matrix $A$, we may therefore partition $\mathbb{R}^n$ into the eigenspaces $M_1, \ldots, M_m$ of $A$, so that we may write any $x \in \mathbb{R}^n$ uniquely as $x = x_1 + \cdots + x_m$ with $x_i \in M_i$ for $i = 1, \ldots, m$. Intuitively, it is clear that $x_i = P_i x$, if we denote by $P_i$ the orthogonal projection on $M_i$, for $i = 1, \ldots, m$, since $M_1, \ldots, M_m$ are orthogonal. It follows that we have
$$x = P_1 x + \cdots + P_m x$$
for all $x \in \mathbb{R}^n$, i.e., $P_1 + \cdots + P_m = I$. Consequently, we may deduce that

$$Ax = A\left(\sum_{i=1}^m P_i x\right) = \sum_{i=1}^m A(P_i x) = \sum_{i=1}^m \lambda_i (P_i x) = \left(\sum_{i=1}^m \lambda_i P_i\right) x$$
for all $x \in \mathbb{R}^n$, and we have
$$A = \sum_{i=1}^m \lambda_i P_i,$$
which is called the spectral representation of $A$. Note that, if restricted on each of the $M_i$, the transformation given by $A$ is extremely simple and reduces to a scalar multiplication by $\lambda_i$, i.e., $A(P_i x) = \lambda_i (P_i x)$, $i = 1, \ldots, m$.


For $i = 1, \ldots, m$, let $H_i$ be a matrix whose column vectors consist of orthonormal eigenvectors associated with eigenvalue $\lambda_i$. Then we may write $P_i$ more explicitly as $P_i = H_i H_i'$. If we further let $x_{i1}, \ldots, x_{i\ell}$ be the column vectors of $H_i$, we have $P_i = H_i H_i' = \sum_{j=1}^{\ell} x_{ij} x_{ij}'$. Therefore, we may write the spectral representation of $A$ generally as
$$A = \sum_{i=1}^n \lambda_i x_i x_i'$$
with eigenvalues $\lambda_1, \ldots, \lambda_n$ and their corresponding orthonormal eigenvectors $x_1, \ldots, x_n$ of $A$, where we allow any eigenvalue $\lambda_i$ to be repeated an arbitrary number of times. As a consequence, we may represent $A$ as
$$A = U \Lambda U',$$
where $U$ is an orthogonal matrix having $x_i$ in its $i$-th column and $\Lambda$ is a diagonal matrix with $\lambda_i$ as its $i$-th diagonal entry. Note that $U$ is a nonsingular matrix such that $U'U = UU' = I$, and therefore, $U' = U^{-1}$. We call such a matrix orthogonal.
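A minimal numerical sketch of the representation $A = U\Lambda U'$ for a symmetric matrix (the example matrix is arbitrary; NumPy's eigh is designed for symmetric inputs):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is specialized for symmetric matrices: real eigenvalues, orthonormal U.
lam, U = np.linalg.eigh(A)
Lambda = np.diag(lam)

print(np.allclose(U @ Lambda @ U.T, A))  # True: A = U Lambda U'
print(np.allclose(U.T @ U, np.eye(2)))   # True: U is orthogonal
```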

The matrix version $A = U\Lambda U'$ of the spectral representation of $A$ is extremely useful in many different contexts. Note that $A^2 = (U\Lambda U')(U\Lambda U') = U\Lambda^2 U'$, which can be easily extended to $A^n = U\Lambda^n U'$ for an arbitrary nonnegative integer $n$. More generally, the spectral representation of $A$ allows us to define a wide class of functions $f(A)$ with the matrix argument $A$ by
$$f(A) = U \begin{pmatrix} f(\lambda_1) & & \\ & \ddots & \\ & & f(\lambda_n) \end{pmatrix} U'.$$
For instance, we may define $\sqrt{A}$ or $A^{1/2}$ as above with $f(\lambda) = \sqrt{\lambda}$, as long as $A \geq 0$, i.e., $\lambda_i \geq 0$ for all $i = 1, \ldots, n$. Likewise, $\log A$ is defined as above with $f(\lambda) = \log \lambda$, which of course requires $A > 0$, i.e., $\lambda_i > 0$ for all $i = 1, \ldots, n$. It is also possible to define $A^{-1}$ as above with $f(\lambda) = 1/\lambda$ if $A$ is invertible and none of $\lambda_i$, $i = 1, \ldots, n$, is zero. Other functions of $A$, such as $e^A$, may also be defined similarly with $f(\lambda) = e^{\lambda}$. Note in particular that, for $A = (a_{ij})$, $f(A)$ is in general not defined as $f(A) = (f(a_{ij}))$.
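A sketch of computing such matrix functions through the eigendecomposition (assuming a symmetric positive definite $A$; the example matrix and the helper matrix_function are illustrative choices):

```python
import numpy as np

# Symmetric with eigenvalues 3 and 1, hence positive definite.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, U = np.linalg.eigh(A)

def matrix_function(f):
    # f(A) = U diag(f(lambda_1), ..., f(lambda_n)) U'
    return U @ np.diag(f(lam)) @ U.T

sqrt_A = matrix_function(np.sqrt)
inv_A = matrix_function(lambda x: 1.0 / x)
log_A = matrix_function(np.log)

print(np.allclose(sqrt_A @ sqrt_A, A))       # True: (A^{1/2})^2 = A
print(np.allclose(inv_A, np.linalg.inv(A)))  # True: f(lambda) = 1/lambda gives A^{-1}
print(np.allclose(np.linalg.eigvalsh(log_A), np.log(lam)))  # True: log A has eigenvalues log(lambda_i)
```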


Above we introduced the matrix inequalities $A \geq 0$ and $A > 0$ for a symmetric matrix $A$, in which case we say that $A$ is positive semi-definite and positive definite respectively. It follows from
$$x'Ax = x'\left(\sum_{i=1}^m \lambda_i P_i\right) x = \sum_{i=1}^m \lambda_i (x'P_i x)$$
that $A \geq 0$ if and only if $x'Ax \geq 0$ for all $x \in \mathbb{R}^n$, and that $A > 0$ if and only if $x'Ax > 0$ for all $x \neq 0$ in $\mathbb{R}^n$. Note that for any orthogonal projection $P$ we have $x'Px = (Px)'(Px) = \|Px\|^2 \geq 0$ for all $x \in \mathbb{R}^n$. Clearly, we have $P \geq 0$ for any orthogonal projection $P$. For symmetric matrices $A$ and $B$ of the same dimension, we write $A \geq B$ and $A > B$ if and only if $A - B \geq 0$ and $A - B > 0$ respectively.
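Definiteness is likewise easy to check through eigenvalues (a minimal sketch; both example matrices are arbitrary):

```python
import numpy as np

def is_positive_semidefinite(A, tol=1e-12):
    # A symmetric matrix is >= 0 iff all its eigenvalues are nonnegative.
    return np.all(np.linalg.eigvalsh(A) >= -tol)

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues 3, 1 -> positive definite
B = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues 3, -1 -> indefinite

print(is_positive_semidefinite(A))  # True
print(is_positive_semidefinite(B))  # False
```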

    5. Exercises

1. Let $A$ and $B$ be matrices of dimensions $n \times m$ and $m \times \ell$, respectively, and define $a_i$ to be the $i$-th column of $A$ and $b_i'$ to be the $i$-th row of $B$. Show that
$$AB = \sum_{i=1}^m a_i b_i'.$$
Apply this to the case of $B = b$ being a vector with $\ell = 1$ and show that $\mathcal{R}(A)$ becomes the space spanned by the column vectors $a_1, \ldots, a_m$ of $A$.

2. Show that $\mathrm{rank}(AB) = \mathrm{rank}(B)$ if and only if $\mathcal{N}(A) \cap \mathcal{R}(B) = \{0\}$ for any matrices $A$ and $B$ of conformable dimensions, and use this result to deduce that $\mathrm{rank}(A'A) = \mathrm{rank}(A)$ for any matrix $A$.

3. Let $A$ and $B$ be $n \times m$ matrices of full column rank such that $\mathcal{R}(A) \cap \mathcal{R}(B)^{\perp} = \{0\}$. Show that the projection on $\mathcal{R}(A)$ along $\mathcal{R}(B)^{\perp}$ in $\mathbb{R}^n$ is given by
$$P = A(B'A)^{-1}B'.$$
Hint: Choose an arbitrary $x \in \mathbb{R}^n$ and write $Px = Ab$ for some $b \in \mathbb{R}^m$, and obtain $b$ from the condition $x - Ab \in \mathcal{R}(B)^{\perp} = \mathcal{N}(B')$.


4. Define
$$x = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \quad \text{and} \quad y = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

    (a) Find the orthogonal projection on the span of x.

    (b) Find the projection on the span of x along the span of y.

5. For a matrix $A$ defined as
$$A = \begin{pmatrix} 3/2 & 1/2 \\ 1/2 & 3/2 \end{pmatrix},$$
find $A^{10}$, $\sqrt{A}$, $\log A$, $A^{-1}$ and $e^A$.

6. On matrix inequality, answer the following:

(a) Show that $A \geq 0$ implies $B'AB \geq 0$ for any matrix $B$ of conformable dimension, and use this result to deduce that $A \geq B$ implies $C'AC \geq C'BC$ for any matrix $C$ of conformable dimension.

(b) Show that $A \geq I$ implies $A^{-1} \leq I$, and use this result to deduce that $A \geq B > 0$ implies $0 < A^{-1} \leq B^{-1}$.
Hint: Note that, if $A$ has the spectral representation $A = \sum_{i=1}^m \lambda_i P_i$, then we have $A - I = \sum_{i=1}^m (\lambda_i - 1) P_i$, since $\sum_{i=1}^m P_i = I$.