
MAT3341 : Applied Linear Algebra

Course Notes

(this version: april 2018 (and corrections december 2019))

These notes are intended for the course MAT3341.

They are not complete (in particular I have a list of unimplemented changes... time...). The chapter on singular value decomposition could be expanded. The chapter on Jordan form is not complete. There should be a chapter on Rayleigh quotients as well. Other things...

The origin of these notes dates from teaching a course based on Noble and Daniel's Applied Linear Algebra [1].

These notes are available on my webpage. Please don't post them elsewhere. Feel free to share the link; the current url is http://web5.uottawa.ca/mnewman/notes/.

Despite my best efforts to eliminate typos, there "may" be some that remain. Please let me know if you find any mistakes, big or small. Thanks to those who have pointed out typos in the past!

References

[1] Ben Noble and James W. Daniel. Applied Linear Algebra. Prentice-Hall, Inc., Englewood Cliffs, N.J., second edition, 1977.


1. Matrix Algebra

matrices

An $m \times n$ matrix is a grid of numbers in $m$ rows and $n$ columns. We will deal with both real and complex matrices. In fact it is quite reasonable to talk of matrices over any field, though in this course we will stick to $\mathbb{R}$ and $\mathbb{C}$. We denote the set of $m \times n$ matrices over the reals or the complexes by $M_{m \times n}(\mathbb{R})$ and $M_{m \times n}(\mathbb{C})$. For a matrix $A$ we use $A_{ij}$ for the $(i,j)$-entry. Sometimes we will use uppercase for the matrix and lowercase for the entries: so $a_{ij}$ for the $(i,j)$-entry of $A$. We use boldface such as $\mathbf{a}$ for column-vectors, which are otherwise known as matrices with a single column. Then $a_i$ is the $i$-th entry of $\mathbf{a}$. This is not to be confused with $\mathbf{u}_i$, which is the $i$-th vector in a sequence. So $a_1, a_2, \dots, a_n$ might be the entries of a vector $\mathbf{a}$, while $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n$ is a sequence of $n$ different vectors. In particular, $\mathbf{u}_j$ might be the $j$-th column of some matrix $U$.

Matrices add component-wise, so $(A+B)_{ij} = A_{ij} + B_{ij}$. Multiplying a matrix by a scalar multiplies each coordinate, so $(\alpha A)_{ij} = \alpha A_{ij}$. Matrix multiplication is according to the rule
$$(AB)_{ij} = \sum_{t=1}^{k} A_{it} B_{tj}$$
Of course this product is only legal when the number of columns of $A$ is equal to the number of rows of $B$. What is "$k$" in this expression?

We can write this as an expression for the matrix.

Proposition 1.1. If $A$ is $m \times k$ and $B$ is $k \times n$ then the entries of the product $AB$ are obtained from products of rows of $A$ and columns of $B$. In other words, if $E_{ij}$ is the $m \times n$ matrix that is 1 in the $(i,j)$-entry and zero elsewhere, then
$$AB = \sum_{i=1}^{m} \sum_{j=1}^{n} E_{ij} \left( \sum_{t=1}^{k} A_{it} B_{tj} \right)$$

We can also understand matrix multiplication in terms of linear combinations of columns.

Proposition 1.2. If $A$ is $m \times k$ and $B$ is $k \times n$, then the $j$-th column of $AB$ is exactly the linear combination of the columns of $A$ whose coefficients are the $j$-th column of $B$. In other words, if $B_{ij}$ is the $(i,j)$-entry of $B$ and $\mathbf{a}_j$ is the $j$-th column of $A$ then the $j$-th column of $AB$ is
$$\sum_{t=1}^{k} B_{tj} \mathbf{a}_t$$

We can also understand matrix multiplication in terms of linear combinations of rows.


Proposition 1.3. If $A$ is $m \times k$ and $B$ is $k \times n$, then the $i$-th row of $AB$ is exactly the linear combination of the rows of $B$ whose coefficients are the $i$-th row of $A$. In other words, if $A_{ij}$ is the $(i,j)$-entry of $A$ and $\mathbf{b}_i^T$ is the $i$-th row of $B$ then the $i$-th row of $AB$ is
$$\sum_{t=1}^{k} A_{it} \mathbf{b}_t^T$$

Here is yet another way to think of a matrix product.

Proposition 1.4. If $A$ is $m \times k$ and $B$ is $k \times n$, then the product $AB$ is the sum of the products of the columns of $A$ with the rows of $B$. In other words, if $\mathbf{a}_j$ is the $j$-th column of $A$ and $\mathbf{b}_i^T$ is the $i$-th row of $B$, then
$$AB = \sum_{t=1}^{k} \mathbf{a}_t \mathbf{b}_t^T$$

The first thing you should do in verifying that Proposition 1.4 is correct is to see that it is syntactically correct, that is, that the sizes are all compatible and produce the right size result. Note that $\mathbf{a}_j$ is $m \times 1$ and $\mathbf{b}_i^T$ is $1 \times n$, so $\mathbf{a}_t \mathbf{b}_t^T$ is $m \times n$. We write $\mathbf{b}_i^T$ for a row of $B$ so that $\mathbf{b}_i$ is a standard column-vector (in fact $\mathbf{b}_i$ is the $i$-th column of $B^T$, right?). You are encouraged to work through the following example if you are at all unclear about Proposition 1.4.

Example 1.5. Calculate
$$\begin{bmatrix} 4 & 0 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 3 \\ -1 & 2 & 1 \end{bmatrix}$$
according to the technique of Proposition 1.4.
$$\begin{bmatrix} 4 & 0 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 3 \\ -1 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 4 \\ -1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 3 \end{bmatrix} + \begin{bmatrix} 0 \\ 2 \end{bmatrix} \begin{bmatrix} -1 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 4 & 0 & 12 \\ -1 & 0 & -3 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ -2 & 4 & 2 \end{bmatrix} = \begin{bmatrix} 4 & 0 & 12 \\ -3 & 4 & -1 \end{bmatrix}$$

Check that this gives the same result if we use the techniques of Proposition 1.2, Proposition 1.3, or the standard definition of matrix multiplication.
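To see Proposition 1.4 on a machine, here is a quick numerical check (a sketch only; it assumes Python with NumPy, which these notes don't otherwise use):

```python
import numpy as np

A = np.array([[4, 0], [-1, 2]])
B = np.array([[1, 0, 3], [-1, 2, 1]])

# Sum of outer products: (column t of A) times (row t of B).
outer_sum = sum(np.outer(A[:, t], B[t, :]) for t in range(A.shape[1]))

print(outer_sum)                         # [[4 0 12], [-3 4 -1]]
print(np.array_equal(outer_sum, A @ B))  # True
```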

Problem 1.6. If $\mathbf{a}$ is $m \times 1$ and $\mathbf{b}^T$ is $1 \times n$, then verify that $\mathbf{a}\mathbf{b}^T$ has rank at most one, by verifying that the columns are all scalar multiples of each other. Is it possible that the rank is less than one? How?

The point of this exercise is that it shows that Proposition 1.4 writes the product $AB$ as a sum of $k$ matrices of rank at most 1. This is of interest due to the following.

Theorem 1.7. Let X and Y be matrices. Then rank(X + Y ) ≤ rank(X) + rank(Y ).

We will see later that there are interesting cases where equality holds.

Problem 1.8. Invent some compatible matrices, and multiply them using the standard definition and the techniques of Proposition 1.2, Proposition 1.3 and Proposition 1.4.

Matrix arithmetic obeys some well-known rules.


Proposition 1.9. For any appropriately-sized matrices $A$, $B$ and $C$ and scalars $r, s$ we have the following.
$$A + B = B + A$$
$$A + (B + C) = (A + B) + C \qquad A(BC) = (AB)C$$
$$A(B + C) = AB + AC \qquad (A + B)C = AC + BC$$
$$(r + s)A = rA + sA \qquad (rs)A = r(sA)$$
$$r(A + B) = rA + rB \qquad r(AB) = (rA)B = A(rB)$$
$$A0 = 0 \qquad 0B = 0$$
$$AI = A \qquad IB = B$$

As usual we write $I$ for the identity matrix and $0$ for the zero matrix, with sizes determined by the circumstances.

Problem 1.10. In the properties of Proposition 1.9 there are implicit conditions on the sizes of the given matrices. State explicitly these conditions.

One "well-known rule" which does not apply to matrices is commutativity: typically $AB \neq BA$. It can happen that one of the products is allowed and the other is not; it can happen that the two products exist but are not equal; it can happen that the two products both exist, are the same size, but are still not equal.

Problem 1.11. Find pairs of matrices $A$ and $B$ with $AB \neq BA$. Find pairs of matrices $A$ and $B$ with $AB = BA$. Try and find non-trivial examples (eg, not diagonal, zero, identity, etc).

There are two other useful operations. The transpose of $A$ is defined by $(A^T)_{ij} = A_{ji}$ and the conjugate transpose is defined by $(A^H)_{ij} = \overline{A_{ji}}$. The conjugate transpose is sometimes called the Hermitian transpose or Hermitian conjugate.

Note that if $A$ is real, then $A^T = A^H$. If $A$ is symmetric then $A^T = A$. This leaves one further case: we say that $A$ is Hermitian if $A = A^H$.

Example 1.12. If $A = \begin{bmatrix} 1 & i \\ -i+2 & 5i \end{bmatrix}$ then $A^H = \begin{bmatrix} 1 & i+2 \\ -i & -5i \end{bmatrix}$ and $A$ is not Hermitian.

If $B = \begin{bmatrix} 0 & -5i+1 \\ 5i+1 & 7 \end{bmatrix}$ then $B^H = \begin{bmatrix} 0 & -5i+1 \\ 5i+1 & 7 \end{bmatrix}$ and $B$ is Hermitian.

"Hermitian" is the natural generalization of symmetric for complex matrices. This may seem a little strange — why should complex numbers require a conjugation when transposed? — but it has to do with inner products. We'll see more later, but here is a quick preview. Consider the following vector.
$$\mathbf{z} = \begin{bmatrix} 1+i \\ 1-i \end{bmatrix} \in \mathbb{C}^2$$
What is the length of this vector? The absolute value of each component is $\sqrt{2}$ (according to Pythagoras applied to the complex plane), so one would think that the length of the vector should then be $\sqrt{(\sqrt{2})^2 + (\sqrt{2})^2} = 2$ (again, according to Pythagoras). We would like a "dot product" to measure length. Compare the following two quantities (that is, compute them and check that the given answer is correct)
$$\sqrt{\mathbf{z} \cdot \mathbf{z}} = \sqrt{\mathbf{z}^T \mathbf{z}} = \sqrt{0} = 0 \qquad\qquad \sqrt{\mathbf{z} \cdot \mathbf{z}} = \sqrt{\mathbf{z}^H \mathbf{z}} = \sqrt{4} = 2$$
It doesn't seem right to say that the length of this vector is zero, so it would seem that dot products of complex vectors should need an automatic conjugate? We'll see more when we consider general inner product spaces.
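If you want to experiment, here is the same computation in code (a sketch assuming NumPy):

```python
import numpy as np

z = np.array([1 + 1j, 1 - 1j])

print(z.T @ z)          # 0j      -- no conjugation: the "length" would be 0
print(z.conj().T @ z)   # (4+0j)  -- with conjugation: length sqrt(4) = 2
print(np.vdot(z, z))    # (4+0j)  -- np.vdot conjugates its first argument
```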


inverses

For a matrix $A$, if $XA = I$ then $X$ is a left inverse for $A$ and if $AY = I$ then $Y$ is a right inverse for $A$. Note that $A$ need not be square and that the two identity matrices might be of different sizes.

Problem 1.13. Show that A is a left inverse of B if and only if B is a right inverse of A.

If $ZA = AZ = I$ then $Z$ is an inverse of $A$. This is sometimes called a two-sided inverse, but we will simply call it an inverse. In this case we say that $A$ is invertible or non-singular (a matrix that has no inverse is called non-invertible or singular). It turns out that existence of inverses and uniqueness of inverses are closely related.

Proposition 1.14. If $X$ is a left inverse of $A$ and $Y$ is a right inverse of $A$ then $X = Y$ (and hence $X$ is an inverse).

The proof is best read starting from the middle and working outwards to the left and right.

Proof. $X = XI = X(AY) = XAY = (XA)Y = IY = Y$

As a corollary we get uniqueness of inverses.

Proposition 1.15. If X and Y are both inverses of A then X = Y .

Proof. $X$ is certainly a left inverse and $Y$ is certainly a right inverse, so by the previous they must be equal.

On the other hand, left inverses and right inverses need not be unique, as long as only one of the two kinds exists. That is, if there are no left inverses it is possible for there to be more than one right inverse. Conversely, if there is more than one right inverse then there is no left inverse.

Problem 1.16. Try and find a matrix $A$ that has more than one right inverse. Does your example have a left inverse?

Problem 1.17. Show that if $A$ has two distinct right inverses, $R_1$ and $R_2$, then $A$ has no left inverse. Show furthermore that $R = \alpha R_1 + (1-\alpha)R_2$ is a right inverse of $A$ for any scalar $\alpha$.

Problem 1.18. What can we say about the number of right and left inverses of a fixed matrix?

We also recall some useful formulas.

Proposition 1.19. Let $A$ and $B$ be $n \times n$ matrices.

• If $A$ and $B$ are invertible then $AB$ is invertible and $(AB)^{-1} = B^{-1}A^{-1}$.

• If $A$ is invertible then $A^T$ is invertible and $(A^T)^{-1} = (A^{-1})^T$.

• If $A$ is invertible then $A^H$ is invertible and $(A^H)^{-1} = (A^{-1})^H$.

Problem 1.20. Prove the previous proposition.

theorems

Recall that the rank of a matrix is the number of pivots it has.


Theorem 1.21. Let $A$ be an $m \times n$ matrix. The following conditions are all equivalent.

• $\operatorname{rank}(A) = m$

• $A\mathbf{x} = \mathbf{b}$ has at least one solution for every vector $\mathbf{b} \in \mathbb{R}^m$ (existence of solutions)

• the columns of $A$ span $\mathbb{R}^m$

• the rows of $A$ are independent

• $A$ has a right inverse

• $\dim \operatorname{nul}(A) = n - m$ (the null space of $A$, or the kernel of $A$)

Theorem 1.22. Let $A$ be an $m \times n$ matrix. The following conditions are all equivalent.

• $\operatorname{rank}(A) = n$

• $A\mathbf{x} = \mathbf{b}$ has at most one solution for every vector $\mathbf{b} \in \mathbb{R}^m$ (uniqueness of solutions)

• the columns of $A$ are independent

• the rows of $A$ span $\mathbb{R}^n$

• $A$ has a left inverse

• $\dim \operatorname{nul}(A^T) = m - n$

Theorem 1.23. Let $A$ be an $n \times n$ matrix. The following conditions are all equivalent.

• $\operatorname{rank}(A) = n$

• the reduced row echelon form of $A$ is $I$

• $A\mathbf{x} = \mathbf{b}$ has at least one solution for every vector $\mathbf{b} \in \mathbb{R}^n$

• $A\mathbf{x} = \mathbf{b}$ has at most one solution for every vector $\mathbf{b} \in \mathbb{R}^n$

• $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every vector $\mathbf{b} \in \mathbb{R}^n$

• the columns of $A$ span $\mathbb{R}^n$

• the columns of $A$ are independent

• the columns of $A$ form a basis for $\mathbb{R}^n$

• the rows of $A$ are independent

• the rows of $A$ span $\mathbb{R}^n$

• the rows of $A$ form a basis for $\mathbb{R}^n$

• $A$ has a right inverse

• $A$ has a left inverse

• $A$ has a (unique, two-sided) inverse

• $\dim \operatorname{nul}(A) = 0$

• $\dim \operatorname{nul}(A^T) = 0$

• $\det(A) \neq 0$

• 0 is not an eigenvalue of $A$


exercises

1. Compute the matrix product $\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 0 & 2 \\ 1 & 1 \\ 0 & 4 \end{bmatrix}$ using Proposition 1.1, Proposition 1.2, Proposition 1.3, Proposition 1.4. For Proposition 1.4, give explicitly the matrices that add up to the product.

2. Prove Proposition 1.1, Proposition 1.2, Proposition 1.3, Proposition 1.4. In other words, verify that they are all equivalent to the standard definition of matrix multiplication.

3. Let $M = \begin{bmatrix} 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}$.
a) Find vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = M$.
b) Find all vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = M$.

4. Let $M = \begin{bmatrix} 0 & 2 & 0 \\ 0 & 6 & 0 \end{bmatrix}$.
a) Find vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = M$.
b) Find all vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = M$.

5. Let $M = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \end{bmatrix}$.
a) Find vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = M$.
b) Find all vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = M$.

6. Let $M = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 6 & 3 \end{bmatrix}$.
a) Find vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = M$.
b) Find all vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = M$.

7. Let $E$ be a matrix with at most one nonzero entry.
a) Find vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = E$.
b) Find all vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = E$.
c) Let $A$ be any $m \times n$ rank $r$ matrix. Show that there exist $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_{mn}$ and $\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_{mn}$ such that $A = \sum_{t=1}^{mn} \mathbf{x}_t \mathbf{y}_t^T$.

8. Let $E$ be a matrix with at most one nonzero column.
a) Find vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = E$.
b) Find all vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = E$.
c) Let $A$ be any $m \times n$ rank $r$ matrix. Show that there exist $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ and $\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_n$ such that $A = \sum_{t=1}^{n} \mathbf{x}_t \mathbf{y}_t^T$.

9. Let $E$ be a matrix with at most one nonzero row.
a) Find vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = E$.
b) Find all vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = E$.
c) Let $A$ be any $m \times n$ rank $r$ matrix. Show that there exist $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m$ and $\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_m$ such that $A = \sum_{t=1}^{m} \mathbf{x}_t \mathbf{y}_t^T$.

10. Let $E$ be a matrix with rank at most one.
a) Find vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = E$.
b) Find all vectors $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x}\mathbf{y}^T = E$.


c) Let $A$ be any $m \times n$ rank $r$ matrix. Show that there exist $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_r$ and $\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_r$ such that $A = \sum_{t=1}^{r} \mathbf{x}_t \mathbf{y}_t^T$. (For the moment we don't have many ideas of how to even approach this. But we will see how to do this later. The exercise is included here for comparison with the previous.)

11. Show that $(A^T)^T = (A^H)^H = A$ and $(A^H)^T = (A^T)^H = \overline{A}$.

12. Show that if A is Hermitian then it is either real and symmetric or non-real and non-symmetric.

13. Show that if A is Hermitian then the diagonal entries must be real.

14. Claim: "The inverse of $AB$ is $B^{-1}A^{-1}$." Give an example to show that this claim is false.


2. Inverses

finding right inverses

You already know how to find the (two-sided) inverse of a matrix $A$. We form the augmented matrix $[A|I]$ and row reduce to its reduced row echelon form (RREF). If we get the identity matrix to the left of the bar, then the inverse is what's on the right. Otherwise, $A$ is non-invertible.

Why does this work? It's based on solving a system of linear equations, or more precisely solving in parallel a whole collection of systems of linear equations, namely $A\mathbf{x} = \mathbf{b}_j$, where $\mathbf{b}_j$ is the $j$th column of the identity matrix. The solution $\mathbf{x}$ is then the $j$th column of the inverse.

Example 2.1. Find the inverse of $A = \begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & 2 \\ -1 & 0 & 2 \end{bmatrix}$.

We find the RREF of $[A|I]$.
$$\left[\begin{array}{ccc|ccc} 1 & 0 & -3 & 1 & 0 & 0 \\ 0 & 1 & 2 & 0 & 1 & 0 \\ -1 & 0 & 2 & 0 & 0 & 1 \end{array}\right] \to \cdots \to \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -2 & 0 & -3 \\ 0 & 1 & 0 & 2 & 1 & 2 \\ 0 & 0 & 1 & -1 & 0 & -1 \end{array}\right]$$

We can see the inverse, but let's examine in more detail. We can think of this as simultaneously solving the linear systems $A\mathbf{x} = \mathbf{b}_j$ where $1 \le j \le 3$, in parallel, by considering the columns to the right only one at a time. Consider $A\mathbf{x} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$. We know the solution already, by considering only the first column on the right.
$$\left[\begin{array}{ccc|c} 1 & 0 & -3 & 1 \\ 0 & 1 & 2 & 0 \\ -1 & 0 & 2 & 0 \end{array}\right] \to \cdots \to \left[\begin{array}{ccc|c} 1 & 0 & 0 & -2 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & -1 \end{array}\right]$$
The (unique) solution is $\mathbf{x} = \begin{bmatrix} -2 \\ 2 \\ -1 \end{bmatrix}$. This is the first column of the (eventual) inverse matrix $B$.

For the second column of the inverse, we need to solve $A\mathbf{x} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$. Again we already know the answer.
$$\left[\begin{array}{ccc|c} 1 & 0 & -3 & 0 \\ 0 & 1 & 2 & 1 \\ -1 & 0 & 2 & 0 \end{array}\right] \to \cdots \to \left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right]$$
The (unique) answer is $\mathbf{x} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$.

The third column is obtained the same way, as $\begin{bmatrix} -3 \\ 2 \\ -1 \end{bmatrix}$.

Note that since we already reduced the big augmented matrix $[A|I]$, we didn't need to even write down these three equations, let alone solve them. We had already done them all in parallel (which is in fact significantly faster than doing them separately).


Example 2.2. Find the right inverse of $A = \begin{bmatrix} 1 & 0 & -3 & 1 \\ 0 & 1 & 2 & 1 \\ -1 & 0 & 2 & 1 \end{bmatrix}$.

We want to find a matrix $B$ such that $AB = I$. The identity matrix is $3 \times 3$ here. So we want to solve $A\mathbf{x} = \mathbf{b}_j$ where $\mathbf{b}_j$ is the $j$th column of $I$. This would give us three augmented matrices, but let's be practical and write down one combined augmented matrix.
$$\left[\begin{array}{cccc|ccc} 1 & 0 & -3 & 1 & 1 & 0 & 0 \\ 0 & 1 & 2 & 1 & 0 & 1 & 0 \\ -1 & 0 & 2 & 1 & 0 & 0 & 1 \end{array}\right] \to \cdots \to \left[\begin{array}{cccc|ccc} 1 & 0 & 0 & -5 & -2 & 0 & -3 \\ 0 & 1 & 0 & 5 & 2 & 1 & 2 \\ 0 & 0 & 1 & -2 & -1 & 0 & -1 \end{array}\right]$$

It is in fact not true that the right inverse matrix is to the right of the bar. But it is related: we can read the columns of the right inverse one at a time.

For clarity of presentation, we'll write down the systems for each column separately, but you should convince yourself that this is not actually necessary.

The first column of $B$ is the solution to $A\mathbf{x} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$. We reduce the augmented matrix to read the solution.
$$\left[\begin{array}{cccc|c} 1 & 0 & -3 & 1 & 1 \\ 0 & 1 & 2 & 1 & 0 \\ -1 & 0 & 2 & 1 & 0 \end{array}\right] \to \cdots \to \left[\begin{array}{cccc|c} 1 & 0 & 0 & -5 & -2 \\ 0 & 1 & 0 & 5 & 2 \\ 0 & 0 & 1 & -2 & -1 \end{array}\right] \qquad \text{solution: } \mathbf{x} = \begin{bmatrix} -2 \\ 2 \\ -1 \\ 0 \end{bmatrix} + t_1 \begin{bmatrix} 5 \\ -5 \\ 2 \\ 1 \end{bmatrix}$$

The second column of $B$ is the solution $\mathbf{x}$ of $A\mathbf{x} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$.
$$\left[\begin{array}{cccc|c} 1 & 0 & -3 & 1 & 0 \\ 0 & 1 & 2 & 1 & 1 \\ -1 & 0 & 2 & 1 & 0 \end{array}\right] \to \cdots \to \left[\begin{array}{cccc|c} 1 & 0 & 0 & -5 & 0 \\ 0 & 1 & 0 & 5 & 1 \\ 0 & 0 & 1 & -2 & 0 \end{array}\right] \qquad \text{solution: } \mathbf{x} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t_2 \begin{bmatrix} 5 \\ -5 \\ 2 \\ 1 \end{bmatrix}$$

We do the same for each column of the right inverse. Each time, we get one parameter, and in fact the vector associated with this parameter is the same each time. The parameters themselves are all distinct variables. So in fact we can, as in the square case, read the answer from the augmented matrix. Here it is in a rather awkward form, followed by a more useful form.
$$\begin{bmatrix} -2+5t_1 & 0+5t_2 & -3+5t_3 \\ 2-5t_1 & 1-5t_2 & 2-5t_3 \\ -1+2t_1 & 0+2t_2 & -1+2t_3 \\ 0+t_1 & 0+t_2 & 0+t_3 \end{bmatrix} = \begin{bmatrix} -2 & 0 & -3 \\ 2 & 1 & 2 \\ -1 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix} + \begin{bmatrix} 5 \\ -5 \\ 2 \\ 1 \end{bmatrix} \begin{bmatrix} t_1 & t_2 & t_3 \end{bmatrix} \qquad (t_1, t_2, t_3 \in \mathbb{R})$$
You are strongly advised to check that this is the solution, and that these two forms are equal. On the right, the $4 \times 3$ matrix is a right inverse of $A$, the $4 \times 1$ matrix is such that its columns form a basis of $\operatorname{nul}(A)$, and the $1 \times 3$ matrix contains the free parameters.
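Here is a sketch (assuming NumPy) that checks the two claims at once: $AB_0 = I$ and $AN = 0$, so $A(B_0 + NT) = I$ for every parameter matrix $T$.

```python
import numpy as np

A  = np.array([[1, 0, -3, 1], [0, 1, 2, 1], [-1, 0, 2, 1]])
B0 = np.array([[-2, 0, -3], [2, 1, 2], [-1, 0, -1], [0, 0, 0]])
N  = np.array([[5], [-5], [2], [1]])    # basis of nul(A)

T = np.random.rand(1, 3)                # arbitrary parameters
print(np.allclose(A @ B0, np.eye(3)))   # True
print(np.allclose(A @ N, 0))            # True
print(np.allclose(A @ (B0 + N @ T), np.eye(3)))   # True
```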

Here is another example, presented a little more concisely.

Example 2.3. Find the general right inverse of $A = \begin{bmatrix} 1 & 1 & 3 & 1 & -1 \\ 2 & 0 & 2 & -2 & 2 \\ 0 & 1 & 2 & 2 & -3 \end{bmatrix}$.

Start by finding the RREF of $[A|I]$.
$$\left[\begin{array}{ccccc|ccc} 1 & 1 & 3 & 1 & -1 & 1 & 0 & 0 \\ 2 & 0 & 2 & -2 & 2 & 0 & 1 & 0 \\ 0 & 1 & 2 & 2 & -3 & 0 & 0 & 1 \end{array}\right] \to \cdots \to \left[\begin{array}{ccccc|ccc} 1 & 0 & 1 & -1 & 0 & -1 & 1 & 1 \\ 0 & 1 & 2 & 2 & 0 & 3 & -3/2 & -2 \\ 0 & 0 & 0 & 0 & 1 & 1 & -1/2 & -1 \end{array}\right]$$


Think of this augmented matrix as three separate but parallel "ordinary" augmented matrices. We read off the general solution for each, giving the three columns of the general right inverse.
$$\begin{bmatrix} -1 \\ 3 \\ 0 \\ 0 \\ 1 \end{bmatrix} + s_1 \begin{bmatrix} -1 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t_1 \begin{bmatrix} 1 \\ -2 \\ 0 \\ 1 \\ 0 \end{bmatrix}; \qquad \begin{bmatrix} 1 \\ -3/2 \\ 0 \\ 0 \\ -1/2 \end{bmatrix} + s_2 \begin{bmatrix} -1 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t_2 \begin{bmatrix} 1 \\ -2 \\ 0 \\ 1 \\ 0 \end{bmatrix}; \qquad \begin{bmatrix} 1 \\ -2 \\ 0 \\ 0 \\ -1 \end{bmatrix} + s_3 \begin{bmatrix} -1 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t_3 \begin{bmatrix} 1 \\ -2 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

−1 1 13 −3/2 −20 0 00 0 01 −1/2 −1

+

−1−2100

[s1 s2 s3]

+

1−2010

[t1 t2 t3]

Using the multiplication technique of Proposition 1.4, we write this in the form B0 + NT , withB0 a particular right inverse, N a matrix whose columns form a basis for nul(A) (the null spaceof A) and T a matrix of parameters.

−1 1 13 −3/2 −20 0 00 0 01 −1/2 −1

+

−1 1−2 −21 00 10 0

[s1 s2 s3t1 t2 t3

](s1, s2, s3, t1, t2, t3 ∈ R)

Check that the two expressions for the general inverse are in fact equal (by multiplying themout). Notice that in fact we can write down the B0 +NT form directly from the RREF matrix.We don’t need to write down each system separately, nor even write down the solution for eachcolumn separately.

Problem 2.4. Find the general right inverse of $\begin{bmatrix} 2 & 1 & 2 & 1 & 1 \\ 1 & 0 & 2 & 4 & 1 \end{bmatrix}$. Give it in the form $B_0 + NT$, where $B_0$ is a particular right inverse, the columns of $N$ form a basis for $\operatorname{nul}(A)$ and $T$ is a matrix of parameters.

Problem 2.5. Try and find the general right inverse for $\begin{bmatrix} 1 & 1 & 0 & 2 \\ 2 & 0 & -2 & 0 \\ 1 & 3 & 2 & 6 \end{bmatrix}$. Explain.

existence of right (and left!) inverses

If the rank of $A$ is equal to the number of rows of $A$, then we can solve all of the linear systems (i.e., the linear systems for each of the columns of the right inverse). We saw this in previous examples. Note that the $\mathbf{b}$-vectors in those systems form a basis themselves (because they are the columns of the identity matrix), so being able to solve all of the systems is equivalent to being able to solve any linear system. When the rank is too small we had problems. In fact what we have is a proof of the final equivalence in Theorem 1.21.

Problem 2.6. Let $A$ be an $m \times n$ matrix of rank $r$. Show that we can find the general right inverse of $A$ if $r = m$. Show that $A$ has no right inverse if $r < m$.

In showing this, you should explain how it is equivalent to one of the other conditions in Theorem 1.21 (you may assume that we proved in a previous course that the others are all equivalent).


We recall that (two-sided) inverses only exist for square matrices, but square matrices may or may not have inverses. Thus being square is a necessary condition for invertibility, but is not sufficient. The same idea applies to right inverses.

Problem 2.7. Show that if $A$ is $m \times n$ with $m > n$ then $A$ never has a right inverse. Show that if $m = n$ then $A$ might or might not have a right inverse (ie: both possibilities can occur).

We finish up with a statement of the algorithm and a theorem implicit in the previous examples.

Algorithm 2.8 (Finding the general right inverse). Let $A$ be an $m \times n$ matrix of rank $r$. In order to find the general right inverse of $A$ we form the augmented matrix $[A|I]$ and find its RREF. For each column to the right of the bar we read the general solution: this is the corresponding column of the general right inverse. We can write it down in a compact form as follows. Considering the RREF of $[A|I]$:

• Read the general solution obtained from setting each parameter to zero. This gives the matrix $B_0$, which is one particular right inverse. Notice that $B_0$ is $n \times m$.

• Read off a basis for $\operatorname{nul}(A)$. This basis becomes the columns of a matrix $N$. Notice that $N$ is $n \times (n-r)$.

• Form a matrix $T$ of parameters. The rows correspond to the basis of $\operatorname{nul}(A)$ and the columns correspond to the columns of the identity matrix. Notice that $T$ is $(n-r) \times m$.

• The general right inverse is $B_0 + NT$.
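Here is a minimal sketch of Algorithm 2.8 in code (an assumption of mine: Python with SymPy for exact arithmetic; these notes don't prescribe any software). It reads $B_0$ from the pivot rows of the RREF of $[A|I]$ and $N$ from a basis of $\operatorname{nul}(A)$, using the matrix of Example 2.2.

```python
from sympy import Matrix

# Sketch of Algorithm 2.8, assuming rank(A) = m (a right inverse exists).
A = Matrix([[1, 0, -3, 1], [0, 1, 2, 1], [-1, 0, 2, 1]])
m, n = A.shape

R, pivots = A.row_join(Matrix.eye(m)).rref()   # RREF of [A|I]
B0 = Matrix.zeros(n, m)
for row, col in enumerate(pivots):   # pivot variables take the solved values,
    B0[col, :] = R[row, n:]          # free variables are set to zero
N = Matrix.hstack(*A.nullspace())    # columns form a basis of nul(A)

print(A * B0)   # identity: B0 is a particular right inverse
print(A * N)    # zero: so B0 + N*T is a right inverse for any T
```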

It’s useful to think of what happens in this algorithm in the special case of square matrices.

What about left inverses? We can use the following idea. Suppose that we are given $A$ and we want to find $B$ such that $BA = I$. If $BA = I$ then $A^TB^T = I^T = I$. So we find the right inverse of $A^T$, and then take the transpose of what we get. Note that we need to be careful about taking the transpose of things in the form $B_0 + NT$.

Problem 2.9. Find the left inverse of $A = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 0 & 1 \\ 3 & 2 & 2 \\ 1 & -2 & 2 \\ -1 & 2 & -3 \end{bmatrix}$. Write your answer in a compact form analogous to $B_0 + NT$ that we had for right inverses. What, exactly, is that compact form? Verify that whatever form you write down has compatible matrix multiplication and addition. Explain why it isn't exactly the same as $B_0 + NT$, even though it is the same idea.

The condition for the existence of left inverses and the method to find them follows directly from what we did for right inverses.

Looking at Algorithm 2.8 (and the fact that it implicitly gives an algorithm for left inverses too), we see that it proves the following. You are invited to explain the details of why this algorithm establishes this result.


Theorem 2.10. Let $A$ be an $m \times n$ matrix of rank $r$.

• If $r = m = n$ then $A$ has a unique right inverse and a unique left inverse, which are equal and hence the (unique) two-sided inverse.

• If $r = m < n$ then $A$ has an infinite number of right inverses and no left inverse, and hence no two-sided inverse.

• If $r = n < m$ then $A$ has no right inverse and an infinite number of left inverses, and hence no two-sided inverse.

• If $r < m, n$ then $A$ has no right inverse, no left inverse and hence no two-sided inverse.

exercises

1. Let $Y_1$ and $Y_2$ be two left inverses of a matrix $A$.
a) Show that if $\alpha + \beta = 1$ then $Y = \alpha Y_1 + \beta Y_2$ is also a left inverse of $A$.
b) If we remove the condition that $\alpha + \beta = 1$, is $Y$ still a left inverse of $A$?

2. Find the general right inverse of each of the following, or explain why there is no right inverse.
a) $\begin{bmatrix} -3 & 4 & 6 & -3 \\ 2 & -1 & 1 & 2 \end{bmatrix}$
b) $\begin{bmatrix} 7 & 4 \\ 3 & -9 \\ 15 & 20 \end{bmatrix}$
c) $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$
d) $\begin{bmatrix} 2 & -4 & 0 & 1 & 8 \\ 3 & -6 & 1 & 1 & 10 \\ 2 & -4 & 4 & 1 & 4 \end{bmatrix}$
e) $\begin{bmatrix} 2 & -2 & -1 \\ 1 & 1 & 3 \\ 3 & -5 & -5 \\ 1 & 1 & 1 \\ 0 & 12 & 13 \end{bmatrix}$

3. Using the fact that $XA = I$ if and only if $A^TX^T = I$, find the general left inverse of each matrix of Exercise 2.2, or explain why there is no left inverse.

4. Let $A$ be an $m \times k$ matrix and $B$ a $k \times n$ matrix, such that $A$ has a left inverse and $B$ has a right inverse.
a) Is it true that $AB$ is invertible? Either prove it or give a counterexample.
b) Does the answer change if $m = n$?

5. True or false: if $A$ and $A^T$ both have at least one right inverse, then $A$ is square and invertible. Prove your answer.

6. True or false: if $A$ and $A^T$ both have at most one right inverse, then $A$ is square and invertible. Prove your answer.

7. Let $A$ be an $m \times n$ rank $r$ matrix, and $B = B_0 + NT$ be the general right inverse of $A$, where $B_0$ is a particular right inverse, the columns of $N$ are a basis for $\operatorname{nul}(A)$ and $T$ is a matrix of parameters. Give the dimensions of $T$, and the total number of parameters in terms of $m, n, r$. Is your formula valid for all $m, n \ge 1$ and $r \ge 0$?

8. Theorem 2.10 states exact conditions on the existence and number of right inverses. Prove these. Prove the statements on left inverses using the fact that $XA = I$ if and only if $A^TX^T = I$.


3. A = LU Decomposition

row operations and elementary matrices

We all (hopefully!) remember how to use row operations to solve a linear system. We have three fundamental operations, called elementary row operations.

I. Swap two rows of $A$: $R_i \leftrightarrow R_j$

II. Multiply one row of $A$ by a non-zero constant: $R_i \mapsto \gamma R_i$

III. Add a multiple of one row to another: $R_i \mapsto R_i + \gamma R_j$

We can understand these operations in terms of matrices. Here are the three corresponding types of elementary matrices.

I. The matrix $E_{ij}$, with 1 in positions $(i,j)$, $(j,i)$ and $(t,t)$ for $t \neq i, j$, and 0 everywhere else.

II. The matrix $E_i(\gamma)$, with 1 on the diagonal except $\gamma$ in position $(i,i)$, and 0 everywhere else.

III. The matrix $E_{ij}(\gamma)$, with 1 on the diagonal, $\gamma$ in position $(i,j)$, and 0 everywhere else.

Put another way, an elementary matrix is the result of applying the corresponding elementary row operation to an identity matrix. Here are some examples, for size $3 \times 3$.

$$E_{12} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad E_3(-4) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -4 \end{bmatrix} \qquad E_{31}(2) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}$$

Each matrix is "almost" an identity matrix. The reason for making the connection is the following fundamental result.

Proposition 3.1. Let $A$ be an $m \times n$ matrix and $E$ an elementary matrix of size $m \times m$. Then the matrix product $EA$ is equal to the result of applying the corresponding row operation to $A$.

Proof. This follows directly from Proposition 1.3. The rows of $E$ give the coefficients of the linear combination of the rows of $A$ that make up the rows of $EA$.

It is highly recommended to fill in the details and understand the proof!
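A quick illustration of Proposition 3.1 (a sketch, assuming Python with NumPy; the matrix is mine):

```python
import numpy as np

# Left-multiplying by an elementary matrix performs the row operation.
A = np.array([[1, 2], [3, 4], [5, 6]])

E = np.eye(3)
E[2, 0] = 2        # E_31(2), corresponding to R3 -> R3 + 2 R1

print(E @ A)       # row 3 becomes [5, 6] + 2*[1, 2] = [7, 10]
```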

Elementary matrices are all invertible. This can be seen several ways (there's an exercise hiding in there), but perhaps the most useful is that we can write down their inverses.

Proposition 3.2. The inverses of elementary matrices are as follows.

I. $(E_{ij})^{-1} = E_{ij}$

II. $(E_i(\gamma))^{-1} = E_i(1/\gamma)$

III. $(E_{ij}(\gamma))^{-1} = E_{ij}(-\gamma)$

Problem 3.3. Prove Proposition 3.2. That is, show that $E_{ij}E_{ij} = I$, $E_i(\gamma)E_i(1/\gamma) = I$, and $E_{ij}(\gamma)E_{ij}(-\gamma) = I$.


Gaussian elimination and elementary matrices

We’ll review Gaussian elimination in order to see how it interacts with the idea of elementary matrices.

Algorithm 3.4 (Gaussian reduction). Repeat the following steps.

1. Choose the first non-zero column.

2. Choose a non-zero position in this column: this will be the next pivot.

3. (permutation) Exchange rows so that this pivot is in the first position.

4. (normalization) Multiply this row by a constant so that the pivot becomes 1.

5. (cancellation) Add multiples of this row to the rows below it in order to make entries below the pivot 0.

6. Ignore the first row and repeat.

We can interpret this in two ways: as a sequence of row operations or as a sequence of multiplications by elementary matrices. The second interpretation gives us the $A = LU$ decomposition. For technical reasons we will start with some assumptions about $A$: we assume that it has rank $\operatorname{rank}(A) = \min\{m, n\}$, all of its pivots are in the first $\min\{m, n\}$ columns and we never need to exchange rows. All of these restrictions will be lifted shortly.

Example 3.5. Apply Gaussian elimination to the matrix $\begin{bmatrix} 1 & 0 & 2 \\ 3 & 2 & 0 \\ -1 & -2 & 5 \end{bmatrix}$.

First, as a sequence of row operations.
$$\begin{bmatrix} 1 & 0 & 2 \\ 3 & 2 & 0 \\ -1 & -2 & 5 \end{bmatrix} \xrightarrow[R_3 \mapsto R_3 + R_1]{R_2 \mapsto R_2 - 3R_1} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 2 & -6 \\ 0 & -2 & 7 \end{bmatrix} \xrightarrow{R_2 \mapsto \frac{1}{2}R_2} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -3 \\ 0 & -2 & 7 \end{bmatrix} \xrightarrow{R_3 \mapsto R_3 + 2R_2} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{bmatrix}$$

Next, as a sequence of multiplications by elementary matrices. The elementary matrices are bracketed for clarity.
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 3 & 2 & 0 \\ -1 & -2 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 2 & -6 \\ 0 & -2 & 7 \end{bmatrix}$$
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 3 & 2 & 0 \\ -1 & -2 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -3 \\ 0 & -2 & 7 \end{bmatrix}$$
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 3 & 2 & 0 \\ -1 & -2 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{bmatrix}$$


The last step can be rewritten as
$$A = \begin{bmatrix} 1 & 0 & 2 \\ 3 & 2 & 0 \\ -1 & -2 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{bmatrix}$$

Problem 3.6. Explain the final step in the previous example (hint: multiply each side by the inverses of the elementary matrices).

Elementary matrices are not useful for doing Gaussian elimination. They are useful for understanding how to turn it into a matrix decomposition. We need a small result first.

Definition 3.7. An elementary matrix of type II or III is an identity matrix with one modified value. The position of this modified value is the active position of the elementary matrix. Let $E$ and $E'$ be two elementary matrices, with active positions $(i,j)$ and $(i',j')$. We say that $E$ precedes $E'$ in the "right order" if the following three conditions are satisfied.

1. $i \ge j$
2. $i' \ge j'$
3. either $j < j'$, or $j = j'$ and $i < i'$

Note that the first two conditions say that the corresponding row operations use a row to modify a row that is not higher up. What about elementary matrices of type I? We'll see soon, but for the moment we'll assume they are not necessary.

The “right order” is a useful notion because of the following result.

Proposition 3.8. Consider a product of elementary matrices such that, reading left to right, they are in the "right order". Then the product is equal to a copy of the identity matrix, with each active position copied into place from the corresponding elementary matrix.

This is perhaps best understood by example. Here is the product of elementary matrices we saw before.
$$\begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix}$$

The active positions (from left to right) are: $(2,1)$, $(3,1)$, $(2,2)$, $(3,2)$. These are in the "right order", because for the active position either the column increases, or else the column stays the same and the row increases. We can think of the active positions as going "from top to bottom and left to right", staying within the lower triangular part. So Proposition 3.8 gives that:
$$\begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 2 & 0 \\ -1 & -2 & 1 \end{bmatrix}$$

One might say that the matrix on the right is the "superposition" of the four elementary matrices. Note that these elementary matrices are the inverses of the ones we used to row reduce $A$, written in reverse order. The fact they are in the "right order" is exactly equivalent to the fact that we did the row reduction according to the order of Algorithm 3.4.

Problem 3.9. Check by direct multiplication that the given product of elementary matrices really does give the result stated.


Theorem 3.10. Let $A$ be an $m \times n$ matrix with rank $r = \min\{m, n\}$, all of its pivots in the first $r$ columns and such that no row exchange operations are needed to bring it to row echelon form (only row operations of type II and III). Then there exists a lower triangular matrix $L$ of size $m \times m$ and an upper triangular matrix $U$ of size $m \times n$ with diagonal 1 such that $A = LU$.

The $A = LU$ decomposition is exactly what Algorithm 3.4 gives. The matrix $U$ is the final result of applying Algorithm 3.4 to $A$ and the matrix $L$ is the superposition of the inverses of the elementary matrices used in the reduction. Here is the result for the above example.
$$A = LU \qquad \begin{bmatrix} 1 & 0 & 2 \\ 3 & 2 & 0 \\ -1 & -2 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 2 & 0 \\ -1 & -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{bmatrix}$$

Problem 3.11. Check by direct multiplication that in fact A = LU .

A little thought will show that we don't actually need to write down all of the elementary matrices in order to write down $L$ and $U$. We just need to know what the operations were and in what order they happened (or more precisely, what they were and that they happened in the "right order"), according to Algorithm 3.4.
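Here is a sketch of Algorithm 3.4 in code (assuming NumPy, and assuming the hypotheses of Theorem 3.10: nonzero pivots on the diagonal, no row exchanges). It builds $L$ by superposing the inverses of the operations as they happen.

```python
import numpy as np

def lu_normalized(A):
    """A = LU with U having 1's on its diagonal (Theorem 3.10)."""
    U = A.astype(float).copy()
    m = U.shape[0]
    L = np.eye(m)
    for j in range(m):
        L[j, j] = U[j, j]                 # inverse of the normalization
        U[j, :] /= U[j, j]                # normalization: pivot becomes 1
        for i in range(j + 1, m):
            L[i, j] = U[i, j]             # inverse of the cancellation
            U[i, :] -= U[i, j] * U[j, :]  # cancellation below the pivot
    return L, U

A = np.array([[1, 0, 2], [3, 2, 0], [-1, -2, 5]])
L, U = lu_normalized(A)
print(L)                        # [[1,0,0],[3,2,0],[-1,-2,1]]
print(U)                        # [[1,0,2],[0,1,-3],[0,0,1]]
print(np.allclose(L @ U, A))    # True
```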

Problem 3.12. Here is Algorithm 3.4 applied to two different matrices. For each, give the decomposition $A = LU$.
$$A = \begin{bmatrix} 4 & 12 & 0 & 16 \\ -2 & -5 & -1 & -8 \\ 3 & 9 & 1/2 & 15 \end{bmatrix} \xrightarrow{R_1 \mapsto \frac{1}{4}R_1} \begin{bmatrix} 1 & 3 & 0 & 4 \\ -2 & -5 & -1 & -8 \\ 3 & 9 & 1/2 & 15 \end{bmatrix} \xrightarrow[R_3 \mapsto R_3 - 3R_1]{R_2 \mapsto R_2 + 2R_1} \begin{bmatrix} 1 & 3 & 0 & 4 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1/2 & 3 \end{bmatrix} \xrightarrow{R_3 \mapsto 2R_3} \begin{bmatrix} 1 & 3 & 0 & 4 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 6 \end{bmatrix}$$

$$A = \begin{bmatrix} 1 & -1 & 3 \\ 2 & 1 & 3 \\ -2 & 2 & -2 \\ 0 & 5 & 1 \end{bmatrix} \xrightarrow[R_3 \mapsto R_3 + 2R_1]{R_2 \mapsto R_2 - 2R_1} \begin{bmatrix} 1 & -1 & 3 \\ 0 & 3 & -3 \\ 0 & 0 & 4 \\ 0 & 5 & 1 \end{bmatrix} \xrightarrow{R_2 \mapsto \frac{1}{3}R_2} \begin{bmatrix} 1 & -1 & 3 \\ 0 & 1 & -1 \\ 0 & 0 & 4 \\ 0 & 5 & 1 \end{bmatrix} \xrightarrow{R_4 \mapsto R_4 - 5R_2} \begin{bmatrix} 1 & -1 & 3 \\ 0 & 1 & -1 \\ 0 & 0 & 4 \\ 0 & 0 & 6 \end{bmatrix} \xrightarrow{R_3 \mapsto \frac{1}{4}R_3} \begin{bmatrix} 1 & -1 & 3 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \\ 0 & 0 & 6 \end{bmatrix} \xrightarrow{R_4 \mapsto R_4 - 6R_3} \begin{bmatrix} 1 & -1 & 3 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

Problem 3.13. Give the row operations that occurred in the following $A = LU$ decompositions, in the order in which they happened. Also give the size of $A$.
$$L = \begin{bmatrix} -1 & 0 & 0 \\ 2 & 1/2 & 0 \\ 3/2 & 0 & 3/2 \end{bmatrix} \quad U = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 6 & 7 \\ 0 & 0 & 1 & 9 \end{bmatrix} \qquad\qquad L = \begin{bmatrix} 7 & 0 & 0 \\ 0 & 1 & 0 \\ 7 & 0 & 7 \end{bmatrix} \quad U = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{bmatrix}$$


application: solving linear systems

We want to solve $A\mathbf{x} = \mathbf{b}$. It is common in applications that we want to solve many different systems that have the same matrix of coefficients, so that $A$ is fixed and we have various vectors $\mathbf{b}$. Solving each system separately amounts to redoing the reduction of $A$ over and over again, which is not very efficient. The $A = LU$ decomposition enables us to do the reduction once and re-use the result, in an optimally efficient manner.

Knowing A = LU , we want to solve LUx = b. We do this in two steps. First we solve Ly = b for avariable y and then we solve Ux = y.

Example 3.14. Solve $A\mathbf{x} = \mathbf{b}$ for $A = \begin{bmatrix} 1 & 0 & 2 \\ 3 & 2 & 0 \\ -1 & -2 & 5 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}$.

We know the LU decomposition already: $\begin{bmatrix} 1 & 0 & 2 \\ 3 & 2 & 0 \\ -1 & -2 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 2 & 0 \\ -1 & -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{bmatrix}$.

We first solve $L\mathbf{y} = \mathbf{b}$. Since $L$ is triangular we can do this by back-substitution, reading top to bottom.
$$\begin{bmatrix} 1 & 0 & 0 \\ 3 & 2 & 0 \\ -1 & -2 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} \qquad \text{solution: } \mathbf{y} = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}$$

Now we solve $U\mathbf{x} = \mathbf{y}$. Since $U$ is triangular we can do this by back-substitution, reading bottom to top.
$$\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix} \qquad \text{solution: } \mathbf{x} = \begin{bmatrix} -5 \\ 9 \\ 3 \end{bmatrix}$$

Notice that y acts as a variable in Ly = b, and as the constant in Ux = y.
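In code the two triangular solves look like this (a sketch assuming SciPy, whose `solve_triangular` does the substitutions):

```python
import numpy as np
from scipy.linalg import solve_triangular

L = np.array([[1, 0, 0], [3, 2, 0], [-1, -2, 1]], dtype=float)
U = np.array([[1, 0, 2], [0, 1, -3], [0, 0, 1]], dtype=float)
b = np.array([1, 3, 2], dtype=float)

y = solve_triangular(L, b, lower=True)    # L y = b, reading top to bottom
x = solve_triangular(U, y, lower=False)   # U x = y, reading bottom to top
print(y)   # [1. 0. 3.]
print(x)   # [-5.  9.  3.]
```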

Problem 3.15. Let $A = \begin{bmatrix} -2 & -4 & 2 & -6 \\ 1 & 3 & -1 & 7 \\ 0 & 3 & -1 & 10 \end{bmatrix} = \begin{bmatrix} -2 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 3 & -1 \end{bmatrix} \begin{bmatrix} 1 & 2 & -1 & 3 \\ 0 & 1 & 0 & 4 \\ 0 & 0 & 1 & 2 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 4 \\ 2 \\ 7 \end{bmatrix}$.

Solve $A\mathbf{x} = \mathbf{b}$ using the LU decomposition (and not by row-reducing).

finding left inverses

Elementary matrices are a theoretical tool that helped us understand the LU decomposition (and will also help us remove those rather technical restrictions in Theorem 3.10). We don't need to write them down every time as long as we understand their properties.

We already saw how to find right inverses. We also saw how that technique could be adapted to find left inverses. Now we'll see that we have accidentally discovered a second way, using the idea of elementary matrices.

Consider an $m \times n$ matrix $A$ for which we want the left inverse: a matrix $B$ such that $BA = I$. We might as well assume that $\operatorname{rank}(A) = n \le m$. We can reduce $A$ to get something like:
$$A \to \cdots \to \begin{bmatrix} I \\ 0 \end{bmatrix}$$


The matrix on the right is an identity matrix stacked on top of a zero matrix. This reduction is equivalent to multiplying by a whole sequence of row operations. Note that the product of these row operations will not usually be in the "right order" (even after we take the inverse of this sequence as we did before) and there might be permutation matrices as well. But we can consider that the product of elementary matrices gives some matrix $E$. We probably can't get $E$ by superposition, but we only care that $E$ exists, not whether or not it is easy to calculate in this way. Now let's partition the rows of $E$ according to the separation of $I$ and $0$.
$$EA = \begin{bmatrix} E_1 \\ E_2 \end{bmatrix} A = \begin{bmatrix} I \\ 0 \end{bmatrix}$$
Now we see that $E_1 A = I$ and $E_2 A = 0$. This means that $E_1$ is a particular left inverse and that the rows of $E_2$ form a basis for the null space of the transpose of $A$, $\operatorname{nul}(A^T)$. The general left inverse is therefore $E_1 + TE_2$, where $T$ is a matrix of parameters.

Problem 3.16. How do we know that the rows of $E_2$ are a basis for $\operatorname{nul}(A^T)$? (hint: think about the dimension of $\operatorname{nul}(A^T)$, and the fact that $E$ is invertible).

How do we find $E$? The natural way is to row-reduce not $A$ but $[A|I]$. Then its reduced form is $[EA|EI]$ so the matrix on the right is $E$. Effectively, doing all the row-operations to the matrix on the right is equivalent to multiplying all the elementary matrices that make up $E$. So the matrices $E_1$ and $E_2$ are at the right, corresponding to the identity and zero matrix on the left.

Example 3.17. We'll find the general left inverse of $\begin{bmatrix} 1 & -1 & 3 \\ 2 & 1 & 3 \\ -2 & 2 & -2 \\ 0 & 5 & 1 \end{bmatrix}$.

We row reduce $[A|I]$.
$$\left[\begin{array}{ccc|cccc} 1 & -1 & 3 & 1 & 0 & 0 & 0 \\ 2 & 1 & 3 & 0 & 1 & 0 & 0 \\ -2 & 2 & -2 & 0 & 0 & 1 & 0 \\ 0 & 5 & 1 & 0 & 0 & 0 & 1 \end{array}\right] \to \cdots \to \left[\begin{array}{ccc|cccc} 1 & 0 & 0 & -2/3 & 1/3 & -1/2 & 0 \\ 0 & 1 & 0 & -1/6 & 1/3 & 1/4 & 0 \\ 0 & 0 & 1 & 1/2 & 0 & 1/4 & 0 \\ \hline 0 & 0 & 0 & 1/3 & -5/3 & -3/2 & 1 \end{array}\right]$$

The horizontal line blocks off the identity matrix from the zero matrix, and hence $E_1$ from $E_2$. To the right of the identity matrix is a particular left inverse; to the right of the zero matrix is a basis for $\operatorname{nul}(A^T)$. This gives the general left inverse.
$$\begin{bmatrix} -2/3 & 1/3 & -1/2 & 0 \\ -1/6 & 1/3 & 1/4 & 0 \\ 1/2 & 0 & 1/4 & 0 \end{bmatrix} + \begin{bmatrix} t_1 \\ t_2 \\ t_3 \end{bmatrix} \begin{bmatrix} 1/3 & -5/3 & -3/2 & 1 \end{bmatrix}$$

Note that the basis for nul(AT ) is written out in rows, and the parameters written out in columns.
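As a machine check of Example 3.17 (a sketch assuming NumPy): $E_1$ is a left inverse, the rows of $E_2$ kill $A$, and so $E_1 + TE_2$ is a left inverse for any $T$.

```python
import numpy as np

A  = np.array([[1, -1, 3], [2, 1, 3], [-2, 2, -2], [0, 5, 1]])
E1 = np.array([[-2/3, 1/3, -1/2, 0],
               [-1/6, 1/3,  1/4, 0],
               [ 1/2,   0,  1/4, 0]])
E2 = np.array([[1/3, -5/3, -3/2, 1]])     # rows span nul(A^T)

T = np.random.rand(3, 1)                  # arbitrary parameters
print(np.allclose(E1 @ A, np.eye(3)))     # True
print(np.allclose(E2 @ A, 0))             # True
print(np.allclose((E1 + T @ E2) @ A, np.eye(3)))   # True
```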

Now we know a way to find right inverses, and a way to find left inverses. In fact we know two ways for each, since the left [right] inverse of $A$ is the transpose of the right [left] inverse of the transpose of $A$.

Problem 3.18. Find the general left inverse of $A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ -3 & 2 & 2 \\ 1 & 1 & 1 \end{bmatrix}$ by reducing $[A|I]$. Compare with Example 2.2.


exercises

1. Let $A$ be an $m \times n$ matrix and $E$ an $m \times m$ elementary matrix. Show that $EA$ is the matrix that results from applying the corresponding elementary row operation to $A$. In other words, prove Proposition 3.1. (hint: Proposition 1.3)

2. Consider some elementary row operation. Show that the corresponding elementary matrix is obtained by applying this row operation to the identity matrix. How do we know what size of identity matrix to use?

3. Consider the relation $A \sim B$ when $B$ is obtained from $A$ by multiplying on the right by some sequence of elementary matrices (multiplying by the empty sequence means multiplying by $I$). Show that this is an equivalence relation on the set of $m \times n$ matrices. What properties do equivalent matrices all have in common?

4. Show that the product of two lower triangular matrices is lower triangular. Show that the product of two lower triangular matrices with 1's on the diagonal is lower triangular with 1's on the diagonal.

5. Prove Proposition 3.8.

6. Compute $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & a & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ b & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$. Explain why this is not a contradiction to Proposition 3.8.

7. Consider two elementary matrices with the same active position (this does not imply the two matrices are equal). According to Definition 3.7 they cannot be in the "right order". What happens when you multiply them? Does Proposition 3.8 hold anyway? (Note that there are two cases: the matrices are of type II or III.)

8. Find the $A = LU$ decomposition for each of the following. Give the sequence of elementary matrices that multiply $A$ on the left to give $U$, as well as the matrices $L$ and $U$.
a) $\begin{bmatrix} 2 & 4 & -2 & 6 \\ 5 & 9 & -7 & 11 \end{bmatrix}$
b) $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$
c) $\begin{bmatrix} 2 & 2 & -2 & 2 \\ -1 & 0 & 4 & 1 \\ 2 & 3 & 0 & 0 \end{bmatrix}$

9. Find the $A = LU$ decomposition for $A = \begin{bmatrix} -1 & -1 & 1 & -1 & 3 \\ 3 & 4 & 0 & 3 & -7 \\ 1 & 4 & 9 & 3 & 2 \end{bmatrix}$. Using this, solve $A\mathbf{x} = \mathbf{b}$ for each of the following.
a) $\mathbf{b} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$
b) $\mathbf{b} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$
c) $\mathbf{b} = \begin{bmatrix} 0 \\ 2 \\ -1 \end{bmatrix}$
d) $\mathbf{b} = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}$

10. Find the general left inverse of each matrix of Exercise 2.2. Carefully specify $B_0$ and $N$ and $T$. For the parameters, use subscripts to indicate the column each corresponds to (so $s_1$, $t_1$, etc. are parameters that determine the first column, etc.)

11. The following row-reduction is given
$$\begin{bmatrix} A & I \end{bmatrix} \leadsto \left[\begin{array}{ccc|cccc} 1 & 0 & 2 & 1 & 2 & 2 & 1 \\ 0 & 1 & 3 & 0 & 1 & -3 & -1 \\ 0 & 0 & 0 & 2 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 3 & 1 & 3 \end{array}\right]$$

a) Explain why A has no left inverse.


b) What is the largest submatrix $A'$ of $A$ that does have a left inverse? Give the general left inverse for $A'$.

c) Give an expression for $A$. You should give a precise and exact (and in fact quite compact) formula for $A$ in terms of the information given. You do not need to evaluate this formula, but it should be something you could evaluate.

12. (slightly harder) Say $A$ is an $m \times n$ matrix of rank $n$ with $m > n$, and we have the row-reduction $[A|I] \leadsto [R|E]$, where $R$ is in reduced row echelon form (since $\operatorname{rank}(A) = n$, $R$ is an identity matrix with zero rows underneath). Let $X'$ be any $n \times (m-n)$ matrix and let $X = [I\,|\,X']$. Show that the product $XE$ is a left inverse of $A$, and that every left inverse of $A$ arises in this way.

13. (slightly harder) Consider a row-reduction of $[A|I] \leadsto [R|E]$, where $R$ is in reduced row-echelon form. Explain how you could (with almost no further work) read off of $[R|E]$ a basis for $\operatorname{nul}(A)$ and $\operatorname{nul}(A^T)$.


4. PAQ = LU Decomposition

Gaussian elimination, variation 1, and $A = L_0U_0$

A variant on Gaussian elimination skips the normalisation step. Pivots are non-zero, but not necessarily equal to 1. In practice, it's more efficient.

Algorithm 4.1 (Gaussian reduction). Repeat the following steps.

1. Choose the first non-zero column.

2. Choose a non-zero position in this column: this will be the next pivot.

3. (permutation) Exchange rows so that this pivot is in the first position.

4. (cancellation) Add multiples of this row to the rows below it in order to make entries below the pivot 0.

5. Ignore the first row and repeat.

This is more numerically stable when done on "real" matrices (i.e., when computations are done approximately using floating point arithmetic). Strictly speaking, the final matrix isn't RREF as the pivots are not 1, but this turns out not to be a problem.

Example 4.2. Here's a reduction of a matrix $A$ using Algorithm 4.1.
$$A = \begin{bmatrix} 2 & 0 & 1 & 3 \\ -4 & -1 & -2 & -4 \\ 2 & -3 & 2 & 12 \end{bmatrix} \xrightarrow[R_3 \mapsto R_3 - R_1]{R_2 \mapsto R_2 + 2R_1} \begin{bmatrix} 2 & 0 & 1 & 3 \\ 0 & -1 & 0 & 2 \\ 0 & -3 & 1 & 9 \end{bmatrix} \xrightarrow{R_3 \mapsto R_3 - 3R_2} \begin{bmatrix} 2 & 0 & 1 & 3 \\ 0 & -1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{bmatrix}$$

Just as for Algorithm 3.4, we can write this reduction in terms of elementary matrices.

Example 4.3. Find the decomposition in terms of elementary matrices of the previous reduction.

Here are the row reductions as elementary matrices on the left.
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 1 & 3 \\ -4 & -1 & -2 & -4 \\ 2 & -3 & 2 & 12 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 1 & 3 \\ 0 & -1 & 0 & 2 \\ 0 & -3 & 1 & 9 \end{bmatrix}$$
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 1 & 3 \\ -4 & -1 & -2 & -4 \\ 2 & -3 & 2 & 12 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 1 & 3 \\ 0 & -1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{bmatrix}$$
As before, we can't multiply these by "superposition". But again, as before, if we move them to the other side then they are in the "right order" and we can multiply them by "superposition".


Active positions are shown in boldface.
$$A = \begin{bmatrix} 2 & 0 & 1 & 3 \\ -4 & -1 & -2 & -4 \\ 2 & -3 & 2 & 12 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ \mathbf{-2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \mathbf{1} & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \mathbf{3} & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 1 & 3 \\ 0 & -1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 1 & 3 \\ 0 & -1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{bmatrix} = L_0 U_0$$

Theorem 4.4. Let $A$ be an $m \times n$ matrix with $\operatorname{rank}(A) = r$, all of its pivots in the first $r$ columns and such that when we use the variant of Gaussian elimination (with no normalisation, Algorithm 4.1) no row exchange operations are needed to bring it to row echelon form (we refuse to use row operations of type II and we don't need to use row operations of type I). Then there exists a lower triangular matrix $L_0$ of size $m \times m$ with 1's on the diagonal and an upper triangular matrix $U_0$ of size $m \times n$ such that $A = L_0 U_0$.

The $A = L_0 U_0$ decomposition is exactly what we get from Algorithm 4.1. The matrix $U_0$ is the result of having applied Algorithm 4.1 to $A$ and the matrix $L_0$ is the superposition of the inverses of the elementary matrices used in the reduction. Compare with Theorem 3.10.

Problem 4.5. Explain why the matrix $U_0$ of Theorem 4.4 (i.e., coming out of Algorithm 4.1) has all of the zeros on its diagonal at the end. Which hypothesis guarantees this? If we remove this hypothesis how can we retain this property?

Proposition 4.6. Let $A = L_0 U_0$ with $\operatorname{rank}(A) = m$. Set $D$ to be the $m \times m$ diagonal matrix with the diagonal elements of $U_0$ on the diagonal. So $D$ is the matrix of pivots. Then $L = L_0 D$ and $U = D^{-1} U_0$.

Note that since the rank of $A$ is $m$, then $D$ is invertible (why?). So $A = LU = L_0 D D^{-1} U_0 = L_0 U_0$.

Example 4.7. Find the $A = LU$ decomposition of the matrix $A$ of Example 4.2 from its $A = L_0 U_0$ decomposition.

We see that $D = \begin{bmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ and so $D^{-1} = \begin{bmatrix} 1/2 & 0 & 0 \\ 0 & 1/(-1) & 0 \\ 0 & 0 & 1/1 \end{bmatrix} = \begin{bmatrix} 1/2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.
$$L = L_0 D = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ -4 & -1 & 0 \\ 2 & -3 & 1 \end{bmatrix}$$
$$U = D^{-1} U_0 = \begin{bmatrix} 1/2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 1 & 3 \\ 0 & -1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1/2 & 3/2 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & 3 \end{bmatrix}$$

Problem 4.8. Check that matrix multiplication does in fact give that $LU = A$ and $L_0 U_0 = A$. Calculate the $A = LU$ decomposition directly from the matrix, and verify that it's the same as what we just obtained.
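Here is Proposition 4.6 on this example in code (a sketch assuming NumPy): extract $D$ from the diagonal of $U_0$ and pass between the two decompositions.

```python
import numpy as np

L0 = np.array([[1, 0, 0], [-2, 1, 0], [1, 3, 1]], dtype=float)
U0 = np.array([[2, 0, 1, 3], [0, -1, 0, 2], [0, 0, 1, 3]], dtype=float)

D = np.diag(np.diag(U0))        # diagonal matrix of pivots
L = L0 @ D
U = np.linalg.inv(D) @ U0       # D is invertible: the pivots are nonzero

print(np.allclose(L0 @ U0, L @ U))   # True: both products equal A
print(np.diag(U))                    # [1. 1. 1.]: U has 1's on the diagonal
```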

Problem 4.9. Calculate the $A = L_0 U_0$ decomposition for the matrices of Problem 3.12. Do it using the matrix $D$, and also directly by reducing.


missing pivots and $A = L_0U_0$

If $\operatorname{rank}(A) < \min\{m, n\}$, then the $A = LU$ decomposition doesn't exist since we would need $U$ to have 1's on the diagonal. But we can still find an $A = L_0 U_0$ decomposition since $U_0$ need not have a pivot in each row.

To find $A = L_0 U_0$ in general, we need a small modification to Algorithm 4.1. Instead of putting the pivots in echelon form, we will put them on the diagonal. Here is the change:

3. (permutation) Exchange rows so that the pivot is on the diagonal.

Note that if the columns with pivots are at the beginning then this amounts to the same thing.

Example 4.10. Find an $A = L_0 U_0$ decomposition for $A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 1 \\ 3 & 6 & 5 \\ 4 & 8 & 8 \end{bmatrix}$.

We start by row-reducing, but without normalizing.
$$A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 1 \\ 3 & 6 & 5 \\ 4 & 8 & 8 \end{bmatrix} \xrightarrow[\substack{R_3 \mapsto R_3 - 3R_1 \\ R_4 \mapsto R_4 - 4R_1}]{R_2 \mapsto R_2 - 2R_1} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & -1 \\ 0 & 0 & 2 \\ 0 & 0 & 4 \end{bmatrix} \xrightarrow{R_4 \mapsto R_4 - 2R_3} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & -1 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}$$

Normally, the last step would be $R_3 \mapsto R_3 + 2R_2$ and $R_4 \mapsto R_4 + 4R_2$. But we want to have the pivots on the diagonal. This makes $L_0$ a lower triangular matrix. We find the decomposition as usual, by taking products of elementary matrices. We can write down $L_0$ and $U_0$ directly, by taking superpositions, but as usual we need to use the inverse of each row operation.

$$\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 1 \\ 3 & 6 & 5 \\ 4 & 8 & 8 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 0 & 1 & 0 \\ 4 & 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & -1 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix} = L_0 U_0$$

There remains one (minor) problem with this example. The diagonal of $U_0$ contains pivots and zeros. We'd like the pivots to come first. For the moment this seems elusive...

Problem 4.11. Find an $A = L_0 U_0$ decomposition for the matrices $\begin{bmatrix} 0 & 1 & 2 \\ 0 & 2 & 3 \\ 0 & 3 & 7 \end{bmatrix}$ and $\begin{bmatrix} 1 & 0 & 2 \\ 2 & 1 & 0 \\ 0 & 2 & -4 \end{bmatrix}$.

permutation matrices

A permutation matrix is a matrix with exactly one 1 in each row and each column, and all other entries being zero. Every permutation matrix is square (why?). Here's an example.
$$P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$


If we calculate $PA$ with this matrix $P$ and any compatible matrix $A$, we get that the first row of $PA$ is equal to the second row of $A$, the second row of $PA$ is the third row of $A$, the third row of $PA$ is the first row of $A$ and the fourth row is unchanged. This is just Proposition 1.3. So multiplying by $P$ is equivalent to permuting the rows of $A$.

Elementary matrices of type I are also permutation matrices. Furthermore every product of elementary matrices of type I is a permutation matrix and vice versa.¹

Problem 4.12. Compute the product $E_{12}E_{13}E_{12}E_{14}$ for the matrices with $m = 4$. Try to do it without "really multiplying" the matrices. Show that the result is a permutation matrix. Determine $E_{12}E_{13}E_{12}E_{14}$ for matrices with $m = 17$.

Permutation matrices are invertible, and we can easily write down their inverses.

Problem 4.13. Let $P$ be a permutation matrix. Show by direct multiplication that $PP^T = I$. Thus $P^{-1} = P^T$, $P$ is an orthogonal matrix and the columns of $P$ form an orthonormal basis of $\mathbb{R}^n$.

We already know that for an elementary matrix of type I, $(E_{ij})^{-1} = E_{ij}$. But such a matrix is a permutation matrix, so we should have $(E_{ij})^{-1} = (E_{ij})^T$. Explain the "contradiction".

permutations and $PA = L_0U_0$, $PA = LU$

Let $A$ be an $m \times n$ matrix with pivots in the first $m$ columns (but maybe not in all of these columns), and suppose that in trying to apply Gaussian elimination we need to exchange rows. If we knew in advance which row exchanges were necessary, we could do them at the beginning. This would give a permutation matrix $P$ such that Gaussian elimination applied to $PA$ would not need any row exchanges. So we could find $PA = LU$ or $PA = L_0U_0$.

What do we do when we only find out part way through that row exchanges are necessary?

Theorem 4.14. Let $A$ be an $m \times n$ rank $r$ matrix with all pivots within the first $r$ columns. Then there exists an $m \times m$ permutation matrix $P$, an $m \times m$ lower triangular matrix $L_0$ with 1's on the diagonal and an $m \times n$ upper triangular matrix $U_0$ such that $PA = L_0U_0$. Also, if $\operatorname{rank}(A) = \min\{m, n\}$ there exists an $m \times m$ lower triangular matrix $L$ and an $m \times n$ upper triangular matrix $U$ with 1's on the diagonal such that $PA = LU$. Furthermore if $\operatorname{rank}(A) = m$ and we let $D$ be the $m \times m$ diagonal matrix with the diagonal elements of $U_0$ on the diagonal then $U = D^{-1}U_0$ and $L = L_0D$.

This is Theorem 3.10 and Theorem 4.4 applied to the matrix $PA$. Notice that this decomposition is not unique: in each step of row permutation in Gaussian elimination we have a choice. Each choice gives a different (valid) decomposition.
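For comparison, library routines compute a decomposition of exactly this type, choosing the row exchanges as they go (partial pivoting, so the permutation will generally differ from the choices made by hand below). A sketch assuming SciPy, whose convention returns $P$, $L$, $U$ with $A = PLU$, i.e. $P^TA = LU$ in our notation:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[2, 1, -2], [8, 4, -8], [-4, -3, 7], [0, 1, 2]], dtype=float)
P, L, U = lu(A)    # A = P L U, with P a permutation matrix

print(np.allclose(P.T @ A, L @ U))   # True
print(L)   # lower "trapezoidal" with 1's on the diagonal
print(U)   # upper triangular
```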

Problem 4.15. Explain why the matrix $U_0$ of Theorem 4.14 has all of the zeros on its diagonal at the end. Which hypothesis guarantees this? If we remove this hypothesis how can we retain this property?

It remains to determine $P$. In principle we could apply Gaussian elimination, see which permutations were required, back up and apply them to $A$ and then redo the decomposition. But we'd like to avoid this extra work, by understanding how permutations act.

Proposition 4.16. Let $E$ be a matrix and $P$ a permutation matrix. Then $PE = E'P$ with $E' = PEP^T$.

¹ This is a result in group theory. The set of all permutation matrices of size $m \times m$ is a representation of the symmetric group $S_m$. The matrices of type I correspond to transpositions. The theorem says the transpositions generate $S_m$.


Proof. We directly check that $E'P = (PEP^T)P = PE(P^TP) = PEI = PE$.

We apply this to elementary matrices.

Corollary 4.17. Let E be an elementary matrix of type II and P an elementary matrix of type I. If E precedes P in Algorithm 3.4 then PE = EP.

Corollary 4.18. Let E be an elementary matrix of type III and P an elementary matrix of type I. If E precedes P in Algorithm 3.4 then PE = E′P, where E′ = I + P(E − I). In other words, E′ is the elementary matrix obtained by applying P to the active position of E.

We can prove both of these using a "block-matrix" approach. We might see this in class, or perhaps on an assignment. For now, here are a few examples.

Example 4.19. Consider what happens when we change the order of a row permutation and another operation.

If we apply a row operation such as, for example, R3 ↦ R3 − 2R1 followed by R2 ↔ R3, then this is exactly equivalent to R2 ↔ R3 followed by R2 ↦ R2 − 2R1.

$$\begin{bmatrix} 1&0&-1&2 \\ 0&0&4&3 \\ 2&1&0&7 \end{bmatrix} \xrightarrow{R_3 \mapsto R_3 - 2R_1} \begin{bmatrix} 1&0&-1&2 \\ 0&0&4&3 \\ 0&1&2&3 \end{bmatrix} \xrightarrow{R_2 \leftrightarrow R_3} \begin{bmatrix} 1&0&-1&2 \\ 0&1&2&3 \\ 0&0&4&3 \end{bmatrix}$$

$$\begin{bmatrix} 1&0&-1&2 \\ 0&0&4&3 \\ 2&1&0&7 \end{bmatrix} \xrightarrow{R_2 \leftrightarrow R_3} \begin{bmatrix} 1&0&-1&2 \\ 2&1&0&7 \\ 0&0&4&3 \end{bmatrix} \xrightarrow{R_2 \mapsto R_2 - 2R_1} \begin{bmatrix} 1&0&-1&2 \\ 0&1&2&3 \\ 0&0&4&3 \end{bmatrix}$$

We add −2R1 either to the old R3 (before the permutation) or to the new R2 (after the permutation). Of course these are the "same" rows. In terms of elementary matrices we have:

$$\begin{bmatrix} 1&0&0 \\ 0&0&1 \\ 0&1&0 \end{bmatrix} \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ -2&0&1 \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ -2&1&0 \\ 0&0&1 \end{bmatrix} \begin{bmatrix} 1&0&0 \\ 0&0&1 \\ 0&1&0 \end{bmatrix}$$

Check this matrix product directly!

Problem 4.20. Show that if we apply R4 ↦ R4 − 2R1 followed by R2 ↔ R3, this is equivalent to doing R2 ↔ R3 followed by R4 ↦ R4 − 2R1 (no change).

Show that if we apply R1 ↦ −2R1 followed by R2 ↔ R3, this is equivalent to doing R2 ↔ R3 followed by R1 ↦ −2R1 (no change).

Problem 4.21. Explain why R2 ↦ −2R2 never precedes R2 ↔ R3 in Gaussian elimination. What does this say in terms of elementary matrices?

Explain why R4 ↦ R4 − 2R2 never precedes R2 ↔ R3 in Gaussian elimination. What does this say in terms of elementary matrices?

The consequence of Corollary 4.17 and Corollary 4.18 is that we know what to do with permutation matrices in Gaussian elimination. We apply these corollaries to effectively "move the permutations to the beginning" (without actually doing so). An example will show how and why.

Example 4.22. Apply non-normalised Gaussian elimination (Algorithm 4.1) to $A = \begin{bmatrix} 2&1&-2 \\ 8&4&-8 \\ -4&-3&7 \\ 0&1&2 \end{bmatrix}$.


Here are the operations.

$$A = \begin{bmatrix} 2&1&-2 \\ 8&4&-8 \\ -4&-3&7 \\ 0&1&2 \end{bmatrix} \xrightarrow[R_3 \mapsto R_3 + 2R_1]{R_2 \mapsto R_2 - 4R_1} \begin{bmatrix} 2&1&-2 \\ 0&0&0 \\ 0&-1&3 \\ 0&1&2 \end{bmatrix} \xrightarrow{R_2 \leftrightarrow R_3} \begin{bmatrix} 2&1&-2 \\ 0&-1&3 \\ 0&0&0 \\ 0&1&2 \end{bmatrix} \xrightarrow{R_4 \mapsto R_4 + R_2} \begin{bmatrix} 2&1&-2 \\ 0&-1&3 \\ 0&0&0 \\ 0&0&5 \end{bmatrix} \xrightarrow{R_3 \leftrightarrow R_4} \begin{bmatrix} 2&1&-2 \\ 0&-1&3 \\ 0&0&5 \\ 0&0&0 \end{bmatrix} = U_0$$

We can write this in terms of elementary matrices, and then apply Corollary 4.18 and Corollary 4.17 so as to rewrite with permutations "at the beginning". At each row, imagine the permutation matrix sliding to the right, and observe its effect on the matrices it slides over: watch the permutation matrices and the active positions in the other elementary matrices.

$$\begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&1&0&1 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&1&0&0 \\ 0&0&0&1 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 2&0&1&0 \\ 0&0&0&1 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ -4&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix} A = \begin{bmatrix} 2&1&-2 \\ 0&-1&3 \\ 0&0&5 \\ 0&0&0 \end{bmatrix}$$

$$\begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&1&0&1 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ 2&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ -4&0&1&0 \\ 0&0&0&1 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&1&0&0 \\ 0&0&0&1 \end{bmatrix} A = \begin{bmatrix} 2&1&-2 \\ 0&-1&3 \\ 0&0&5 \\ 0&0&0 \end{bmatrix}$$

$$\begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&1&1&0 \\ 0&0&0&1 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ 2&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ -4&0&0&1 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&1&0&0 \\ 0&0&0&1 \end{bmatrix} A = \begin{bmatrix} 2&1&-2 \\ 0&-1&3 \\ 0&0&5 \\ 0&0&0 \end{bmatrix}$$

By multiplying by the inverses, we put the non-permutation matrices to the right:

$$\begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&1&0&0 \\ 0&0&0&1 \end{bmatrix} A = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 4&0&0&1 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ -2&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix} \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&-1&1&0 \\ 0&0&0&1 \end{bmatrix} \begin{bmatrix} 2&1&-2 \\ 0&-1&3 \\ 0&0&5 \\ 0&0&0 \end{bmatrix}$$

This gives the PA = L0U0 decomposition by “superposition”.

$$\begin{bmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&1&0&0 \end{bmatrix} A = \begin{bmatrix} 1&0&0&0 \\ -2&1&0&0 \\ 0&-1&1&0 \\ 4&0&0&1 \end{bmatrix} \begin{bmatrix} 2&1&-2 \\ 0&-1&3 \\ 0&0&5 \\ 0&0&0 \end{bmatrix}$$

Of course we didn’t really need to write down all these elementary matrices. The final permutationmatrix P is the product of all of the elementary matrices Eij . The matrix L0 is built up as usual,except that the active positions (up to now) are moved by the Eij matrices.

Example 4.23. Find directly the decomposition PA = L0U0 of the matrix A of Example 4.22.

We apply Gaussian elimination, but by writing down the decomposition in "real-time", thinking of the matrices P, L0 and U0 as being under construction. Note the effect on "L0" and "U0" when we do a row exchange. The steps are the same as Example 4.22.

$$A = \begin{bmatrix} 2&1&-2 \\ 8&4&-8 \\ -4&-3&7 \\ 0&1&2 \end{bmatrix}$$

$$A = \begin{bmatrix} 1&0&0&0 \\ 4&1&0&0 \\ -2&0&1&0 \\ 0&0&0&1 \end{bmatrix} \begin{bmatrix} 2&1&-2 \\ 0&0&0 \\ 0&-1&3 \\ 0&1&2 \end{bmatrix} \qquad \begin{matrix} R_2 \mapsto R_2 - 4R_1 \\ R_3 \mapsto R_3 + 2R_1 \end{matrix}$$

$$\begin{bmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&1&0&0 \\ 0&0&0&1 \end{bmatrix} A = \begin{bmatrix} 1&0&0&0 \\ -2&1&0&0 \\ 4&0&1&0 \\ 0&0&0&1 \end{bmatrix} \begin{bmatrix} 2&1&-2 \\ 0&-1&3 \\ 0&0&0 \\ 0&1&2 \end{bmatrix} \qquad R_2 \leftrightarrow R_3$$

$$\begin{bmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&1&0&0 \\ 0&0&0&1 \end{bmatrix} A = \begin{bmatrix} 1&0&0&0 \\ -2&1&0&0 \\ 4&0&1&0 \\ 0&-1&0&1 \end{bmatrix} \begin{bmatrix} 2&1&-2 \\ 0&-1&3 \\ 0&0&0 \\ 0&0&5 \end{bmatrix} \qquad R_4 \mapsto R_4 + R_2$$

$$\underbrace{\begin{bmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&1&0&0 \end{bmatrix}}_{P} A = \underbrace{\begin{bmatrix} 1&0&0&0 \\ -2&1&0&0 \\ 0&-1&1&0 \\ 4&0&0&1 \end{bmatrix}}_{L_0} \underbrace{\begin{bmatrix} 2&1&-2 \\ 0&-1&3 \\ 0&0&5 \\ 0&0&0 \end{bmatrix}}_{U_0} \qquad R_3 \leftrightarrow R_4$$

We notice that the effect of a row exchange is applied to all the active positions in L0 up till then, and also to U0. This is because the active positions up till then correspond to the row operations that precede the permutation.

Problem 4.24. Give a PA = L0U0 decomposition for the matrix $\begin{bmatrix} 1&2&1&0 \\ 0&0&2&-1 \\ 2&5&4&1 \end{bmatrix}$.

pivots in the wrong columns and PAQ = L0U0, PAQ = LU

We have always presumed that the pivots are at the "beginning" of the matrix, in the first m columns. This guarantees that we can arrange to have all the pivots on the diagonal. What do we do if this is not the case? We apply a column permutation to put them in the right place.

Multiplication on the left by an elementary matrix corresponds to a row operation: this is Proposition 3.1. The same principle tells us that multiplication on the right by an elementary matrix corresponds to a column operation. The proof is essentially Proposition 1.2.

So if we are finding an "LU"-type decomposition and we find that the pivots are not in the first m columns, we apply a column permutation. Because this corresponds to multiplying on the right, there is no interaction between the row operations and the column operations. So we can do a column permutation at any point, as long as the row operations continue in the "right order".


Theorem 4.25. Let A be an m × n matrix. Then there exists an m × m permutation matrix P, an m × m lower triangular matrix L0 with 1's on the diagonal, an m × n upper triangular matrix U0 with all 0's on the diagonal at the end of the diagonal, and an n × n permutation matrix Q such that PAQ = L0U0. Also, if rank(A) = min{m, n} we can find a PAQ = LU decomposition with L lower triangular and U upper triangular with 1's on the diagonal.

Problem 4.26. How can we guarantee that the pivots precede the non-pivots in U0? Compare with Problem 4.15.

Here is an example where a column permutation is necessary. We want all of the pivots to precede the non-pivots: the zeros on the diagonal of U0 should be at the end.

Example 4.27. Calculate a PAQ = L0U0 decomposition for the matrix $A = \begin{bmatrix} 1&2&1 \\ 2&4&7 \\ 3&6&2 \end{bmatrix}$.

We write the process down in an efficient way, building the matrices L0 and U0 in place. Writing down the row operations is strictly speaking redundant, but it's nice to see them. Notice that what goes into L0 is the inverse of the row operations. Do you recall why?

$$A = \begin{bmatrix} 1&2&1 \\ 2&4&7 \\ 3&6&2 \end{bmatrix}$$

$$A = \begin{bmatrix} 1&0&0 \\ 2&1&0 \\ 3&0&1 \end{bmatrix} \begin{bmatrix} 1&2&1 \\ 0&0&5 \\ 0&0&-1 \end{bmatrix} \qquad \begin{matrix} R_2 \mapsto R_2 - 2R_1 \\ R_3 \mapsto R_3 - 3R_1 \end{matrix}$$

$$A \begin{bmatrix} 1&0&0 \\ 0&0&1 \\ 0&1&0 \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ 2&1&0 \\ 3&0&1 \end{bmatrix} \begin{bmatrix} 1&1&2 \\ 0&5&0 \\ 0&-1&0 \end{bmatrix} \qquad C_2 \leftrightarrow C_3$$

$$A \begin{bmatrix} 1&0&0 \\ 0&0&1 \\ 0&1&0 \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ 2&1&0 \\ 3&-\tfrac{1}{5}&1 \end{bmatrix} \begin{bmatrix} 1&1&2 \\ 0&5&0 \\ 0&0&0 \end{bmatrix} \qquad R_3 \mapsto R_3 + \tfrac{1}{5}R_2$$

This is a PAQ = L0U0 decomposition with P = I (no row permutations).

Just like Theorem 4.14, the decomposition is not unique. Not only is the decomposition not unique, but sometimes we can do a column permutation instead of a row permutation.

Example 4.28. Calculate two different PAQ = L0U0 decompositions for $A = \begin{bmatrix} 1&2&1 \\ 1&2&3 \\ 2&7&2 \end{bmatrix}$.

$$A = \begin{bmatrix} 1&2&1 \\ 1&2&3 \\ 2&7&2 \end{bmatrix}$$

$$A = \begin{bmatrix} 1&0&0 \\ 1&1&0 \\ 2&0&1 \end{bmatrix} \begin{bmatrix} 1&2&1 \\ 0&0&2 \\ 0&3&0 \end{bmatrix} \qquad \begin{matrix} R_2 \mapsto R_2 - R_1 \\ R_3 \mapsto R_3 - 2R_1 \end{matrix}$$

$$\begin{bmatrix} 1&0&0 \\ 0&0&1 \\ 0&1&0 \end{bmatrix} A = \begin{bmatrix} 1&0&0 \\ 2&1&0 \\ 1&0&1 \end{bmatrix} \begin{bmatrix} 1&2&1 \\ 0&3&0 \\ 0&0&2 \end{bmatrix} \qquad R_2 \leftrightarrow R_3$$


This is a PAQ = L0U0 decomposition with Q = I (no column permutations).

Alternatively we can do a column permutation.

$$A = \begin{bmatrix} 1&2&1 \\ 1&2&3 \\ 2&7&2 \end{bmatrix}$$

$$A = \begin{bmatrix} 1&0&0 \\ 1&1&0 \\ 2&0&1 \end{bmatrix} \begin{bmatrix} 1&2&1 \\ 0&0&2 \\ 0&3&0 \end{bmatrix} \qquad \begin{matrix} R_2 \mapsto R_2 - R_1 \\ R_3 \mapsto R_3 - 2R_1 \end{matrix}$$

$$A \begin{bmatrix} 1&0&0 \\ 0&0&1 \\ 0&1&0 \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ 1&1&0 \\ 2&0&1 \end{bmatrix} \begin{bmatrix} 1&1&2 \\ 0&2&0 \\ 0&0&3 \end{bmatrix} \qquad C_2 \leftrightarrow C_3$$

This is a PAQ = L0U0 decomposition with P = I (no row permutations).

Problem 4.29. Give a PAQ = L0U0 decomposition for $\begin{bmatrix} 1&2&1&2 \\ 3&2&-1&5 \\ 0&1&1&4 \end{bmatrix}$ and $\begin{bmatrix} 1&2&1&2 \\ 3&6&0&0 \\ 0&0&1&6 \end{bmatrix}$. For the second, give two decompositions: one with no column permutations and one with all the pivots at the beginning of the matrix.

solving systems redux

We can solve linear systems Ax = b using a PAQ = L0U0 decomposition.

$$\begin{aligned} Ax &= b \\ PAQQ^Tx &= Pb \\ L_0U_0Q^Tx &= Pb \\ L_0U_0x' &= b' \end{aligned}$$

The vector x′ is the vector of variables x permuted according to $Q^T$ and the vector b′ is the vector of constants b permuted according to P.
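A Python sketch of this solve (illustrative only: the matrices below are made up, and taken square and invertible so that one forward and one backward substitution finish the job):

```python
import numpy as np
from scipy.linalg import solve_triangular

P  = np.array([[1., 0., 0.], [0., 0., 1.], [0., 1., 0.]])
Q  = np.eye(3)
L0 = np.array([[1., 0., 0.], [2., 1., 0.], [3., 0., 1.]])
U0 = np.array([[1., 2., 1.], [0., 5., 0.], [0., 0., -1.]])
A  = P.T @ L0 @ U0 @ Q.T            # any A satisfying P A Q = L0 U0
b  = np.array([1., 1., 2.])

bp = P @ b                                  # b' = P b
y  = solve_triangular(L0, bp, lower=True)   # solve L0 y  = b'
xp = solve_triangular(U0, y)                # solve U0 x' = y
x  = Q @ xp                                 # x = Q x' undoes x' = Q^T x
print(np.allclose(A @ x, b))  # True
```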

Problem 4.30. For the matrix A of Example 4.27, solve $Ax = \begin{bmatrix}1\\1\\2\end{bmatrix}$ and $Ax = \begin{bmatrix}0\\5\\-1\end{bmatrix}$. Do this using a PAQ = L0U0 decomposition; furthermore, do it using both decompositions to verify that the solutions are the same.

Do the same thing with the matrix A of Example 4.28.

exercises

1. Prove Proposition 4.6.

2. Prove that the product of two permutation matrices is a permutation matrix.

3. Prove Corollary 4.17 and Corollary 4.18.


4. Consider three matrices A, B and C, and their row-reductions as follows.

$$[A\ I] \to \begin{bmatrix} 1&0&0&2&0&0&1&0&3&2 \\ 0&0&1&-3&0&0&0&2&1&-1 \\ 0&0&0&0&1&0&1&2&3&4 \\ 0&0&0&0&0&1&0&0&0&5 \end{bmatrix}$$

$$[B\ I] \to \begin{bmatrix} 1&0&0&0&1&2&3&4&5&6 \\ 0&1&0&0&-1&1&2&3&0&0 \\ 0&0&1&0&1&-1&1&2&0&4 \\ 0&0&0&1&0&0&0&1&2&3 \\ 0&0&0&0&1&0&1&1&3&2 \\ 0&0&0&0&2&1&0&0&0&1 \end{bmatrix}$$

$$[C\ I] \to \begin{bmatrix} 1&0&0&0&0&1&2&-1 \\ 0&1&0&0&1&0&1&1 \\ 0&0&1&0&1&0&0&1 \\ 0&0&0&0&0&0&3&4 \end{bmatrix}$$

Consider the six matrices A, B, C, $A^T$, $B^T$, $C^T$. For each, determine (with justification) if they have a left inverse, a right inverse and an inverse. Give explicitly all the inverses (left, right, inverse) that each one has.

5. Let $A = \begin{bmatrix} 1&2&2 \\ 2&2&3 \\ 0&4&0 \end{bmatrix}$. Find an A = LU decomposition and an A = L0U0 decomposition. Specify the matrices L, U, L0 and U0.

6. Find a PAQ = L0U0 decomposition for the matrix $A = \begin{bmatrix} 1&3&3&0 \\ 2&6&6&0 \\ 2&6&5&1 \end{bmatrix}$. Specify the matrices P, Q, L0 and U0.

7. Assume PAQ = LU with the following matrices.

$$P = \begin{bmatrix} 0&1&0 \\ 1&0&0 \\ 0&0&1 \end{bmatrix} \quad Q = \begin{bmatrix} 1&0&0 \\ 0&0&1 \\ 0&1&0 \end{bmatrix} \quad L = \begin{bmatrix} 2&0&0 \\ 3&1&0 \\ 0&2&-1 \end{bmatrix} \quad U = \begin{bmatrix} 1&3&0 \\ 0&1&2 \\ 0&0&1 \end{bmatrix}$$

Solve Ax = b using the method of the PAQ = LU decomposition for each of the following.

a) $b = \begin{bmatrix}4\\2\\1\end{bmatrix}$ b) $b = \begin{bmatrix}1\\0\\0\end{bmatrix}$ c) $b = \begin{bmatrix}0\\0\\0\end{bmatrix}$

8. Assume PAQ = L0U0 with the following matrices.

$$P = \begin{bmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&1&0&0 \end{bmatrix} \quad Q = \begin{bmatrix} 1&0&0&0&0 \\ 0&0&0&0&1 \\ 0&1&0&0&0 \\ 0&0&0&1&0 \\ 0&0&1&0&0 \end{bmatrix} \quad L = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 3&1&1&0 \\ 0&0&2&1 \end{bmatrix} \quad U = \begin{bmatrix} 1&3&0&-1&0 \\ 0&1&2&0&2 \\ 0&0&1&2&1 \\ 0&0&0&0&0 \end{bmatrix}$$

Solve Ax = b using the method of the PAQ = L0U0 decomposition for each of the following.

a) $b = \begin{bmatrix}2\\-4\\4\\4\end{bmatrix}$ b) $b = \begin{bmatrix}1\\2\\1\\5\end{bmatrix}$ c) $b = \begin{bmatrix}1\\2\\3\\4\end{bmatrix}$ d) $b = \begin{bmatrix}0\\0\\0\\0\end{bmatrix}$


9. Find an example (or two...) of a matrix A such that in trying to find an L0U0-type decomposition we must find PAQ = L0U0 with P ≠ I and Q ≠ I. In other words, find A that needs both row and column swaps. Recall that the L0U0-type decompositions have the pivots on the diagonal, with the nonzero elements on the diagonal (pivots) preceding the zeros on the diagonal.

10. Consider A = L0U0 where A has rank r and the pivots of A all appear in the first r columns. Show that Ax = b has a solution if and only if b is a linear combination of the first r columns of L0. (hint: consider solving Ax = b in two steps as L0y = b, U0x = y: how would there fail to be a solution?)

11. Show that in an A = LU decomposition of a square matrix A, det(A) is equal to the product of the diagonal entries of L. Recall that strictly, LU-type decompositions are only defined when the rank of A is either the number of rows or columns of A, and no row or column swaps were required.

12. Show that in an A = L0U0 decomposition of a square matrix A, det(A) is equal to the product of the diagonal entries of U0. Recall that in L0U0-type decompositions, pivots are on the diagonal, and no row or column swaps were required.

13. Suppose we have A = LU where the rank of A is the number of rows of A, and we wish to solve Ax = b, where the entries of L, U, and b are integers. Show that the solution x has the property that the denominator of xi divides the product of the first i diagonal entries of L. Conclude that the denominator of each xi divides det(A).

14. ( but not too much) Consider a square and invertible matrix A. We saw how A = LU could be obtained using Gaussian elimination. Show that this is in fact the "only" way, in the sense that if A = LU = L′U′ where L, L′ are lower triangular and U, U′ are upper triangular with 1's on the diagonal, then L = L′ and U = U′.

15. ( but not too much) Suppose that A is a square invertible matrix, with A = LU and A = L0U0 decompositions. Define the matrix D to be equal to L on the diagonal and zero elsewhere.
a) Show that $A = LDD^{-1}U$ (in particular, why does $D^{-1}$ exist).
b) Show that $A = L_0DU = LD^{-1}U_0$.
c) Describe the LU-type decomposition and the L0U0-type decomposition for the matrix $A^H$ in terms of the matrices for A. Justify your answer!

16. ( but not too much) Suppose that A is a square invertible real symmetric matrix, with A = LU and A = L0U0 decompositions. Define the matrix D to be equal to L on the diagonal and zero elsewhere.
a) Show that $U = L_0^T$ and $U_0 = L^T$.
b) Show that if the diagonal of D is positive, then there is a lower triangular real matrix M such that $A = MM^T$.


MAT3341 : Applied Linear Algebra Mike Newman, april 2018 +

5. Vector Spaces

One hopes that some of this will be a review, or at the least a reminder. . .

systems and solutions

In order to solve Ax = b we could write down the augmented matrix [A|b] and row-reduce it to RREF. In practice, using an LU-type decomposition is more efficient, but we still need to know how to read solutions from a matrix, either RREF or simply echelon. Notice that "reading a solution" is nothing other than understanding the relationship between an augmented matrix and a system of linear equations. The solution depends on the pivots.

• If rank(A) = rank(A|b) then there is a solution (at least one).

• If rank(A) < rank(A|b) then there is no solution.

• If rank(A) = rank(A|b) = n then the solution is unique.

• If rank(A) = rank(A|b) < n then there are infinitely many solutions.

We saw this when we studied right inverses: the right inverse exists if and only if there is a pivot in each row (rank(A) = rank(A|b) for every vector b) and the right inverse is unique if and only if there are no columns without pivots (rank(A) = rank(A|b) = n). As a consequence we also see that a homogeneous system (meaning a system Ax = 0) is always consistent.

We all know how to read solutions from an augmented matrix, right?

Problem 5.1. Make sure you know how to read the general solution from an augmented matrix.

Notice furthermore that the general solution of Ax = b can be written as x = x0 + h, where x0 is a particular solution and h is an arbitrary vector taken from the null space of A. We could equivalently write this as x = x0 + Nt where the columns of N form a basis for the null space of A and t is a vector of parameters. This is what we did for the general right inverse of a matrix.
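As a concrete illustration (a sympy sketch on a made-up consistent system; here pinv happens to return one particular solution, and nullspace returns a basis of the null space):

```python
import sympy as sp

A = sp.Matrix([[1, 1, 1], [1, 2, 3]])
b = sp.Matrix([2, 3])

x0 = A.pinv() * b                        # a particular solution
N  = sp.Matrix.hstack(*A.nullspace())    # columns: basis of the null space

t = sp.symbols('t0')
x = x0 + N * sp.Matrix([t])              # general solution x = x0 + N t
print(sp.simplify(A * x - b))            # the zero vector, for every t
```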

determinants

A determinant is a function defined on n × n matrices (square). There is a special formula for 1 × 1 matrices.

$$\det[a] = a$$

We can develop a general formula for an arbitrary (square) matrix by cofactors. The (i, j)-cofactor of A is $(-1)^{i+j}\det(A_{[ij]})$, where the matrix $A_{[ij]}$ is obtained by removing the i-th row and j-th column from A. We can compute the determinant of A by expansion on any row s or any column t.

$$\det(A) = A_{s1}(-1)^{s+1}\det(A_{[s1]}) + A_{s2}(-1)^{s+2}\det(A_{[s2]}) + \cdots + A_{sn}(-1)^{s+n}\det(A_{[sn]})$$
$$\det(A) = A_{1t}(-1)^{1+t}\det(A_{[1t]}) + A_{2t}(-1)^{2+t}\det(A_{[2t]}) + \cdots + A_{nt}(-1)^{n+t}\det(A_{[nt]})$$

In fact this isn’t really a definition: one would have to prove that expansion along any row or columngives the same answer. We omit this proof. But this gives an important corollary: for a triangularmatrix, the determinant is the product of the diagonal entries.

Problem 5.2. Show that for any triangular matrix the determinant is the product of the diagonal entries.


Problem 5.3. An (upper) anti-triangular matrix is a matrix A such that Aij = 0 whenever i > n − j + 1 (you should make a picture to see what this means). Give a formula for the determinant of an anti-triangular matrix.

Problem 5.4. You might know that for a 2 × 2 matrix we have $\det\begin{bmatrix} a&b \\ c&d \end{bmatrix} = ad - bc$. Show that this follows from expansion along any row or column based on the formula for a 1 × 1 matrix.

For all practical purposes expansion by cofactors is the worst method for calculating the determinant: it needs on the order of n! operations to find the determinant of an n × n matrix. It is more efficient to reduce the matrix to a triangular form A → T. We can easily calculate det(T), and we can relate det(A) to det(T) using the following rules (a short computational sketch follows the list).

1. If $A \xrightarrow{R_i \leftrightarrow R_j} B$ then det(B) = (−1) det(A).

2. If $A \xrightarrow{R_i \mapsto \gamma R_i} B$ then det(B) = (γ) det(A).

3. If $A \xrightarrow{R_j \mapsto R_j + \gamma R_i} B$ then det(B) = det(A).
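Here is the promised sketch in Python (illustrative; it uses only operations of types I and III, so by the rules above the determinant only changes sign at each swap; the 1e-12 tolerance is an arbitrary zero test):

```python
import numpy as np

def det_by_elimination(A):
    """Reduce to triangular form, tracking sign changes from row swaps."""
    U = A.astype(float).copy()
    n = U.shape[0]
    sign = 1.0
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(U[r, k]))
        if abs(U[piv, k]) < 1e-12:
            return 0.0                     # no pivot: det(A) = 0
        if piv != k:
            U[[k, piv]] = U[[piv, k]]
            sign = -sign                   # rule 1: a swap flips the sign
        for r in range(k + 1, n):
            U[r] -= (U[r, k] / U[k, k]) * U[k]   # rule 3: det unchanged
    return sign * np.prod(np.diag(U))      # det of a triangular matrix

A = np.random.rand(5, 5)
print(np.isclose(det_by_elimination(A), np.linalg.det(A)))  # True
```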

Problem 5.5. If B = γA, then what is the relationship between det(A) and det(B)?

We already know from Theorem 1.23 that A has an inverse if and only if det(A) ≠ 0. Furthermore, if the inverse exists then $\det(A^{-1}) = 1/\det(A)$.

The adjoint of a matrix A is the matrix of cofactors, transposed: $(\operatorname{adj}(A))_{ij} = (-1)^{i+j}\det A_{[ji]}$. So the (i, j) position of the adjoint is computed by removing the j-th row and the i-th column: this is not a typo, it's a transpose. One can show that A adj(A) = adj(A)A = det(A)I, as a matrix product. Assuming the inverse exists, multiplying by $A^{-1}/\det(A)$ gives $A^{-1} = \operatorname{adj}(A)/\det(A)$.

Problem 5.6. Consider the equalities A adj(A) = adj(A)A = det(A)I. The entries on the diagonal of det(A)I are equal to det(A). What do you get if you compute the diagonal entries of A adj(A) or adj(A)A? Why is this equal to det(A)?

vector spaces

A vector space is a collection of objects V (the "vectors") equipped with an operation of addition of vectors and an operation of scalar multiplication (where the scalars are from R or C, or in general any field). Vectors can in principle be anything, and vector addition and scalar multiplication can likewise be any operations, as long as the following axioms are valid for any u, v, w ∈ V and scalars r, s.

1. u + v ∈ V
2. ru ∈ V
3. u + v = v + u
4. u + (v + w) = (u + v) + w
5. r(u + v) = ru + rv
6. (r + s)u = ru + su
7. r(su) = (rs)u
8. 1u = u
9. There exists 0 ∈ V such that u + 0 = u for every u ∈ V.
10. For every u ∈ V there exists a −u ∈ V such that u + (−u) = 0.

Examples are Rn, Cn, P (polynomials), Pk (polynomials of degree at most k), Mm×n (m × n matrices, real or complex). Another more general example is the set of all k-times continuously differentiable functions on an interval [a, b]. These are functions that have derivatives up to order k that are continuous; we write Ck[a, b].

For each of these examples we need to say what the operations are. Typically they will be the standard ones that you already know about. When the operations are standard we often omit explicit mention of them.

The zero vector and inverse vectors are defined by a property and not by a formula. So in principle there is no reason we can't have several. In fact this is not the case.

Proposition 5.7. For every vector space, the zero vector 0 is unique. For every vector u, its inverse −u is unique.

Proposition 5.8. For every scalar r and every vector u the following are true.

• r0 = 0
• 0u = 0
• (−1)u = −u

The last two can be considered as "formulas", since we can use them to calculate the zero vector and inverse vectors as a function of knowing what scalar multiplication means. This is especially useful for vector spaces with non-standard operations.

subspaces

Let V be a vector space and U a subset of V (we write U ⊆ V). It could be the case that U is a vector space unto itself; if so then U is a subspace of V. Note that in this context it is necessarily the case that U has the same operations as V. The fact that the operations are inherited gives a shortcut.

Theorem 5.9. Let V be a vector space and U a non-empty subset of V equipped with the same operations as V. Then the following are equivalent.

• U is a subspace of V
• U is non-empty, and x + y ∈ U and rx ∈ U for all x, y ∈ U and scalars r
• U is non-empty, and rx + y ∈ U for all x, y ∈ U and scalars r
• U is non-empty, and rx + sy ∈ U for all x, y ∈ U and scalars r, s

The last condition shows that we can combine the two closure axioms into one "double-closure" axiom. Another way of thinking of it: we could say that U is closed under the taking of linear combinations.

linear independence and span

Let S = {u1, u2, · · ·, uk} be a set of vectors in some vector space. The set S is linearly independent if the only solution to the equation α1u1 + α2u2 + · · · + αkuk = 0 is α1 = α2 = · · · = αk = 0. Otherwise S is linearly dependent. The key word is "only", since α1 = α2 = · · · = αk = 0 is always a solution.


In general if S ⊆ T with T an independent set then S is also independent. Also if S ⊆ T with S dependent then T is also dependent. Furthermore the set {0} is dependent. You are invited to prove these assertions!

Let S be a set of vectors in some vector space V. The span of S is the set of all linear combinations of (the vectors of) S.

$$\{v : v = \alpha_1u_1 + \alpha_2u_2 + \cdots + \alpha_ku_k\}$$

This is a subspace of V , as we can directly check using the subspace test.

Problem 5.10. Check that the span of a set S is a subspace of the vector space that contains S.

basis, coordinates and isomorphisms

A basis of a vector space V is a set of vectors in V that is independent and spans V. If a set of vectors B = {w1, w2, · · ·, wk} is independent, then B forms a basis for the space it spans.

Theorem 5.11. Let U be a (sub)space.

• Every basis for U has the same number of vectors. This number is the dimension of U.
• If S is an independent set of j vectors in U then j ≤ dim(U).
• If S is a spanning set of k vectors in U then dim(U) ≤ k.
• If B = {w1, w2, · · ·, wn} is a basis of U and u ∈ U then there is a unique solution for u = α1w1 + α2w2 + · · · + αnwn. The numbers α1, · · ·, αn are the coordinates of u with respect to the basis B.

When we speak of coordinates, we are really dealing with an ordered basis. The coordinates depend on the basis but also on the order of the vectors in the basis.

An isomorphism is a function φ : U → V between two vector spaces such that

• φ(ru1 + u2) = rφ(u1) + φ(u2)
• φ(u) = 0 if and only if u = 0
• For every v ∈ V there exists a u ∈ U with φ(u) = v

The first condition is called linearity. The second condition is also called injectivity; the third is also called surjectivity. These two conditions are equivalent to one combined condition called bijectivity.

• For every v ∈ V there exists a unique vector u ∈ U with φ(u) = v

Isomorphisms are useful because they allow us to translate a problem in a "complicated" vector space into an equivalent problem in a "simple" vector space. This is because every "algebraic property" is preserved by an isomorphism. For instance, given an isomorphism φ : U → V then:

• If {w1, w2, · · ·, wk} is independent in U then {φ(w1), φ(w2), · · ·, φ(wk)} is independent in V.
• If {w1, w2, · · ·, wk} spans U then {φ(w1), φ(w2), · · ·, φ(wk)} spans V.
• If {w1, w2, · · ·, wk} is a basis for U then {φ(w1), φ(w2), · · ·, φ(wk)} is a basis for V.
• If u = α1w1 + α2w2 + · · · + αnwn ∈ U then φ(u) = α1φ(w1) + α2φ(w2) + · · · + αnφ(wn) ∈ V.
• etc ...


For notation, if S = {x1, x2, · · ·, xk} is a set of vectors, we will sometimes write φ(S) for the set that is the application of φ to each vector of S, i.e., φ(S) = {φ(x1), φ(x2), · · ·, φ(xk)}.

Every isomorphism is invertible. That is, given an isomorphism φ, we can define the inverse function as φ−1(w) = v exactly when w = φ(v).

There is a close connection between isomorphisms and bases.

Theorem 5.12. Let V be a vector space of dimension n and B = {w1, w2, · · ·, wn} an ordered basis of V. Then the function φB : V → Rn, where φB(v) is the vector of coordinates of v with respect to B, is an isomorphism.

Proof. We check the conditions of an isomorphism directly.

Let u,v ∈ V with x = φB(u) and y = φB(v). Note that x,y ∈ Rn regardless of V .

$$\begin{aligned}
\phi_B(ru + v) &= \phi_B\big(r(x_1w_1 + x_2w_2 + \cdots + x_nw_n) + (y_1w_1 + y_2w_2 + \cdots + y_nw_n)\big) \\
&= \phi_B\big((rx_1+y_1)w_1 + (rx_2+y_2)w_2 + \cdots + (rx_n+y_n)w_n\big) \\
&= rx + y \\
&= r\phi_B(u) + \phi_B(v)
\end{aligned}$$

Thus φB is linear.

Certainly φB(0) = 0 (why?). If we have φB(u) = 0 then u = 0w1 + 0w2 + · · · + 0wn = 0. Thus φB(u) = 0 ⟺ u = 0.

Lastly, if x ∈ Rn then u = x1w1 + x2w2 + · · ·+ xnwn ∈ V is a vector with φB(u) = x.

Example 5.13. Consider the vector space P3. We want to know if the set S = {1 + t, t + t², t + t² + t³, 1 − t³} is independent. Wait — the set is S = {1 + t, t + t² + t³, 1 − t³}.

We can check this directly, but for demonstration purposes we will show how the isomorphism applies. We will need a basis for P3; for instance the set E = {1, t, t², t³} is a basis for P3.

We can see directly that

$$\phi_E(S) = \left\{\begin{bmatrix}1\\1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\1\\1\end{bmatrix}, \begin{bmatrix}1\\0\\0\\-1\end{bmatrix}\right\}$$

which is a set of vectors in R4. We can check that this set is independent because by putting the vectors as columns in a matrix we get a matrix with a pivot in every column.

$$\begin{bmatrix} 1&0&1 \\ 1&1&0 \\ 0&1&0 \\ 0&1&-1 \end{bmatrix} \to \cdots \to \begin{bmatrix} 1&0&1 \\ 0&1&-1 \\ 0&0&1 \\ 0&0&0 \end{bmatrix}$$

The set φE(S) is independent, so the original set S is also independent.

Problem 5.14. One can check that B = {1, 1 + t, 1 + t + t², 1 + t + t² + t³} is another basis for P3. For the previous set S, calculate φB(S) and show that S is independent.


change of basis

Let S = {u1, · · ·, un} and T = {v1, · · ·, vn} be two bases for some vector space V (of dimension n). Suppose that we know φS(w) = x and we want φT(w) = y. So x is the (known) vector of coordinates of w with respect to S and y is the (unknown) vector of coordinates of w with respect to T.

The key idea is to consider the function φT applied to the vectors of S — that is, write the vectors of the basis S in terms of the vectors of the basis T. Set $\phi_T(u_j) = \begin{bmatrix} m_{1j} & m_{2j} & \cdots & m_{nj} \end{bmatrix}^T$.

$$\begin{aligned}
w &= x_1u_1 + x_2u_2 + \cdots + x_nu_n \\
&= x_1(m_{11}v_1 + m_{21}v_2 + \cdots + m_{n1}v_n) + x_2(m_{12}v_1 + m_{22}v_2 + \cdots + m_{n2}v_n) + \cdots + x_n(m_{1n}v_1 + m_{2n}v_2 + \cdots + m_{nn}v_n) \\
&= \sum_{i=1}^n \left(\sum_{j=1}^n x_j m_{ij}\right) v_i
\end{aligned}$$

This gives that $y_i = \sum_j x_j m_{ij}$. We recognize the form of this. If we set M to be the matrix with Mij = mij then we have y = Mx.

The matrix M is a change of basis matrix. The columns of M are exactly the coordinates of the vectors of the basis S in terms of the basis T. We denote this matrix by MS→T in order to explicitly specify the two bases.
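In coordinates this is just a linear solve (a numpy sketch for the bases of the next two examples, each basis stored as the columns of a matrix in the standard basis; column j of MS→T solves T m = uj):

```python
import numpy as np

E = np.eye(3)                        # standard basis as columns
B = np.array([[1., 1., 1.],
              [0., 1., 1.],
              [0., 0., 1.]])         # basis B as columns

M_B_to_E = np.linalg.solve(E, B)     # just B itself
M_E_to_B = np.linalg.solve(B, E)     # B^{-1}
print(M_E_to_B)                      # [[1,-1,0],[0,1,-1],[0,0,1]]
print(np.allclose(M_B_to_E @ M_E_to_B, np.eye(3)))  # an inverse pair
```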

Example 5.15. Consider the vector space R3 with the two bases

$$E = \left\{\begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\end{bmatrix}\right\} \qquad B = \left\{\begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}1\\1\\0\end{bmatrix}, \begin{bmatrix}1\\1\\1\end{bmatrix}\right\}$$

The basis E is of course the standard basis of R3.

In order to determine the matrix MB→E, we need to find φE(u) for each vector u ∈ B; these will be the columns of the matrix. This is a simple calculation, because the vectors of B are already written with respect to this basis. So

$$M_{B\to E} = \begin{bmatrix} 1&1&1 \\ 0&1&1 \\ 0&0&1 \end{bmatrix}$$

For instance, if the coordinates of a vector v with respect to B are $\phi_B(v) = \begin{bmatrix}1\\-2\\3\end{bmatrix}$ then the coordinates of this vector with respect to E are $M_{B\to E}\,\phi_B(v) = \begin{bmatrix}2\\1\\3\end{bmatrix}$.

We can check this directly.

$$v = (1)\begin{bmatrix}1\\0\\0\end{bmatrix} + (-2)\begin{bmatrix}1\\1\\0\end{bmatrix} + (3)\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}2\\1\\3\end{bmatrix} \qquad v = (2)\begin{bmatrix}1\\0\\0\end{bmatrix} + (1)\begin{bmatrix}0\\1\\0\end{bmatrix} + (3)\begin{bmatrix}0\\0\\1\end{bmatrix} = \begin{bmatrix}2\\1\\3\end{bmatrix}$$


Example 5.16. For the same two bases as the previous example, we will find the change of basis matrix ME→B.

First we find the coordinates of the vectors of the basis E with respect to the basis B.

$$\phi_B\left(\begin{bmatrix}1\\0\\0\end{bmatrix}\right) = \begin{bmatrix}1\\0\\0\end{bmatrix} \qquad \phi_B\left(\begin{bmatrix}0\\1\\0\end{bmatrix}\right) = \begin{bmatrix}-1\\1\\0\end{bmatrix} \qquad \phi_B\left(\begin{bmatrix}0\\0\\1\end{bmatrix}\right) = \begin{bmatrix}0\\-1\\1\end{bmatrix}$$

It would be an excellent idea to take the time to understand the precise significance of these. Note that each equation required the solution of a linear system (the details are left as an exercise). Continuing, these will be the columns of the matrix ME→B, so we know the matrix.

$$M_{E\to B} = \begin{bmatrix} 1&-1&0 \\ 0&1&-1 \\ 0&0&1 \end{bmatrix}$$

We know that the vector v of the previous example has coordinates $\phi_E(v) = \begin{bmatrix}2\\1\\3\end{bmatrix}$ with respect to E. We can thus calculate that with respect to the basis B it has coordinates given by the matrix product $\phi_B(v) = M_{E\to B}\,\phi_E(v) = \begin{bmatrix}1\\-2\\3\end{bmatrix}$. Of course this is hardly a surprise...

These two examples show that a change of basis is related to its “inverse change”.

Proposition 5.17. If R, S and T are three bases for a vector space then $M_{R\to T} = M_{S\to T}M_{R\to S}$.

Proof. By definition we have φT(v) = MS→T φS(v) and φS(v) = MR→S φR(v). Thus

$$\phi_T(v) = M_{S\to T}\,\phi_S(v) = M_{S\to T}\big(M_{R\to S}\,\phi_R(v)\big) = \big(M_{S\to T}M_{R\to S}\big)\phi_R(v)$$

So by definition $M_{R\to T} = M_{S\to T}M_{R\to S}$.

Corollary 5.18. If S and T are bases for a vector space then $M_{T\to S} = (M_{S\to T})^{-1}$.

Proof. We have MS→S = MT→T = I (why?). Thus MS→TMT→S = I = MT→SMS→T .

We see that a change of basis matrix is invertible, because we have found its inverse! We can also see that they are invertible directly from their construction. The columns of the matrix MS→T are φT(S).¹ The set S is independent and φT is an isomorphism, so φT(S) is independent. If the columns of a square matrix are independent then it is invertible.

Problem 5.19. For the bases of Example 5.15 and Example 5.16, check that $M_{E\to B} = (M_{B\to E})^{-1}$.

Change of basis matrices apply to any vector space.

Example 5.20. Consider P3 and the two bases

$$A = \{1,\ 1-t,\ 1+t^2,\ t+t^3\} \qquad B = \{-1+t^3,\ 1+t,\ 1+t+t^2,\ 1-t+t^2-t^3\}$$

Find MA→B.

¹ We use the notation φT(S) to mean the set of all φT(v) where v is in S. In general, if f is a function and X is a set of values in the domain of f, then we define f(X) = {f(x) : x ∈ X}, the set of all evaluations of f on values in X. This is a slight abuse of notation, but it is standard and there is typically no risk of confusion, since in our context X cannot be both in the domain of f and also be a set of values in the domain of f.


We find the matrix MA→B by writing the vectors of the basis A in terms of the basis B. For the first vector we find the coordinates by solving the following.

$$\phi_B(1) = \begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} \iff 1 = x_1(-1+t^3) + x_2(1+t) + x_3(1+t+t^2) + x_4(1-t+t^2-t^3)$$

The system on the right gives as solution x1 = 1, x2 = 2, x3 = −1, and x4 = 1. These are the coordinates of 1 with respect to the basis B. In a similar manner we find the coordinates of each vector of A by solving the corresponding system. Notice that each system has a unique solution: why?

$$1 - t = x_1(-1+t^3) + x_2(1+t) + x_3(1+t+t^2) + x_4(1-t+t^2-t^3) \implies \phi_B(1-t) = \begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = \begin{bmatrix}2\\3\\-2\\2\end{bmatrix}$$

$$1 + t^2 = x_1(-1+t^3) + x_2(1+t) + x_3(1+t+t^2) + x_4(1-t+t^2-t^3) \implies \phi_B(1+t^2) = \begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = \begin{bmatrix}1\\1\\0\\1\end{bmatrix}$$

$$t + t^3 = x_1(-1+t^3) + x_2(1+t) + x_3(1+t+t^2) + x_4(1-t+t^2-t^3) \implies \phi_B(t+t^3) = \begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = \begin{bmatrix}1\\1\\0\\0\end{bmatrix}$$

These are the columns of the change of basis matrix.

$$M_{A\to B} = \begin{bmatrix} 1&2&1&1 \\ 2&3&1&1 \\ -1&-2&0&0 \\ 1&2&1&0 \end{bmatrix}$$

Problem 5.21. For the two bases of the previous example, directly find MB→A. Verify that MA→B MB→A = I.

Find the two change of basis matrices MA→E and MB→E, where E = {1, t, t², t³} is the standard basis for P3.

Explain why $M_{A\to B} = M_{B\to E}^{-1}M_{A\to E}$. Give the matrix MB→A in terms of the matrices MA→E and MB→E.

another perspective

Every change of basis matrix is invertible. Furthermore, every invertible matrix is a change of basis matrix! If a matrix A is invertible, then its columns form a basis for Rn. Let uj be the j-th column of A. Then uj is exactly the coordinates of uj with respect to the standard basis. If S = {u1, · · ·, un} and E is the standard basis then A = MS→E.

Furthermore, every invertible matrix is a change of basis matrix for an infinite number of bases. We can start with any basis T and we form the basis S by taking exactly the linear combinations of T prescribed by the columns of A. We will then have A = MS→T.

If we want to solve a system Ax = b with A an invertible matrix, we can understand this as x being the vector of coordinates of b in terms of the basis of the columns of A. An LU decomposition gives LUx = b. We have thus expressed an arbitrary change of basis (A) as a product of two changes of basis (L and U). Since L and U are triangular, it is much easier to invert these two changes of basis than to directly invert A.

exercises

1. Give the general solution to Ax = b in the form x = x0 + Nt, where x0 is a particular solution, the columns of N are a basis for nul(A), and t is a vector of parameters.

a) $A = \begin{bmatrix} 1&1&1 \\ 1&2&3 \end{bmatrix}$ and $b = \begin{bmatrix}2\\3\end{bmatrix}$

b) $A = \begin{bmatrix} 2&4&-2&2&6 \\ -1&-1&4&1&-2 \\ 2&5&0&0&9 \end{bmatrix}$ and $b = \begin{bmatrix}2\\2\\5\end{bmatrix}$

2. Prove Proposition 5.7.

3. Prove Proposition 5.8.

4. Let φ : V → W be an isomorphism between vector spaces. Show that if U is a subspace of V then {φ(u) : u ∈ U} is a subspace of W. (We sometimes write φ(U) for {φ(u) : u ∈ U}, the image of U under φ.)

5. Consider the following sets.

$$A = \left\{\begin{bmatrix}1\\0\\2\end{bmatrix}, \begin{bmatrix}0\\2\\0\end{bmatrix}, \begin{bmatrix}2\\0\\3\end{bmatrix}\right\} \qquad B = \left\{\begin{bmatrix}1\\1\\1\end{bmatrix}, \begin{bmatrix}1\\2\\4\end{bmatrix}, \begin{bmatrix}1\\3\\9\end{bmatrix}\right\}$$

a) Verify (in any way you prefer) that these are both bases of R3.
b) Describe as many other ways as you can to decide if these are bases. You needn't go through all the technical details. How different are all your methods, really? That is, were you to actually do the technical details would they all be "truly distinct"?

6. Consider the bases in the previous question, as well as the standard basis E. Find each of the following change of basis matrices.

a) MA→E b) ME→B c) MA→B d) MB→A

7. Consider the following two sets of vectors from R3.

$$B = \left\{\begin{bmatrix}1\\0\\1\end{bmatrix}, \begin{bmatrix}1\\1\\0\end{bmatrix}, \begin{bmatrix}0\\1\\1\end{bmatrix}\right\} \qquad C = \left\{\begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}2\\1\\0\end{bmatrix}, \begin{bmatrix}3\\2\\1\end{bmatrix}\right\}$$

a) Show that B and C are bases.
b) Find the change of basis matrix MB→C.
c) Let x be the vector with $\phi_B(x) = \begin{bmatrix}2\\-1\\1\end{bmatrix}$. Find φC(x) using your change of basis matrix (without calculating x).
d) Find x without using your change of basis matrix, and find φC(x) from this (without using your change of basis matrix).

8. Consider the matrix $M = \begin{bmatrix} 1&1&1 \\ 0&1&1 \\ 0&0&1 \end{bmatrix}$, and the basis $S = \left\{\begin{bmatrix}0\\1\\2\end{bmatrix}, \begin{bmatrix}1\\0\\1\end{bmatrix}, \begin{bmatrix}2\\1\\0\end{bmatrix}\right\}$ for R3.

a) Find a basis R for R3 such that M = MR→S.
b) Find a basis T for R3 such that M = MS→T.


9. a) Prove that any invertible n × n matrix over C is a change of basis matrix for some suitable pair of bases of Cn.
b) Prove that any invertible n × n matrix over C is a change of basis matrix for an infinite number of pairs of bases for Cn.

10. Let P be a permutation matrix (recall that this means that every row and column of P has exactly one 1 and all other entries are 0).
a) Show that P is invertible.
b) Since P is invertible, it can be considered as a change of basis matrix between some basis B and some other basis B′. Give an elegant description of the relationship between B and B′. Be precise, and justify!

11. Let $P = \begin{bmatrix} 1&2&3 \\ 4&5&6 \\ 7&8&0 \end{bmatrix}$ and $D = \begin{bmatrix} 1&0&0 \\ 0&2&0 \\ 0&0&3 \end{bmatrix}$. Define $A = PDP^{-1}$.
Thinking of A as a function from R3 to R3 defined as x → Ax, describe the action of A by identifying a suitable basis on which the action of A is "easy". You should not have to do any calculation for this, and certainly nothing as complex as actually finding the inverse of any matrix.

12. Let $P = \begin{bmatrix} 1&2&3 \\ 4&5&6 \\ 7&8&0 \end{bmatrix}$ and $D = \begin{bmatrix} 1&1&1 \\ 0&1&1 \\ 0&0&1 \end{bmatrix}$. Define $A = PDP^{-1}$.
Thinking of A as a function from R3 to R3 defined as x → Ax, describe the action of A by identifying a suitable basis on which the action of A is "easy". You should not have to do any calculation for this, and certainly nothing as complex as actually finding the inverse of any matrix.

13. Let B = {u1, · · ·, un} be a basis for Rn. For any vector x ∈ Rn, define αi(x), 1 ≤ i ≤ n, to be the unique scalars such that x = α1(x)u1 + · · · + αn(x)un. Thus each αi is a function from Rn → R, and in fact αi(x) = (φB(x))i. Let P be the matrix whose columns are u1, u2, · · ·, un, and let $w_1^T, w_2^T, \cdots, w_n^T$ be the rows of $P^{-1}$. Finally let e1, e2, · · ·, en be the standard basis.

a) Show that $P^{-1}x = \begin{bmatrix} \alpha_1(x) \\ \vdots \\ \alpha_n(x) \end{bmatrix}$.
b) Show that $Pe_ie_i^TP^{-1}x = \alpha_i(x)u_i$.
c) Show that $\alpha_i(x) = w_i^Tx$.

14. Consider the vectors $u_1 = \begin{bmatrix}0\\1\\1\end{bmatrix}$, $u_2 = \begin{bmatrix}1\\0\\1\end{bmatrix}$, $u_3 = \begin{bmatrix}1\\1\\0\end{bmatrix}$. In each case determine a matrix A that has the desired properties. You may give your answer in the form of a matrix product $A = PDP^{-1}$; there is no need to multiply it out.

a) Au1 = 2u1; Au2 = −1u2; Au3 = 3u3
b) Au1 = 2u1; Au2 = u3; Au3 = u2
c) Au1 = u1 + u2 + u3; Au2 = u2 + u3; Au3 = u3
d) Au1 = u1 + u2 + u3; Au2 = u2 + u3; Au3 = 0

15. Consider the vectors $u_1 = \begin{bmatrix}0\\1\\1\end{bmatrix}$, $u_2 = \begin{bmatrix}1\\0\\1\end{bmatrix}$, $u_3 = \begin{bmatrix}1\\1\\0\end{bmatrix}$.

a) Determine a matrix A such that Au1 = u2 and Au2 = u1, with no particular constraint on Au3.
b) Determine all such matrices A.

You may give your general answer in the form of a matrix product $A = PDP^{-1}$ (with some parameters somewhere); there is no need to multiply it out.

16. Consider the vectors $u_1 = \begin{bmatrix}1\\1\\0\end{bmatrix}$, $u_2 = \begin{bmatrix}0\\0\\1\end{bmatrix}$, $u_3 = \begin{bmatrix}1\\1\\1\end{bmatrix}$. We wish to find a matrix A such that

$$Au_1 = 2u_1, \quad Au_2 = u_1 + u_2, \quad Au_3 = u_3$$

a) Explain why the methods above will not work to write A in the form $A = PDP^{-1}$.
b) Explain why there is no such matrix A of any form whatsoever.


MAT3341 : Applied Linear Algebra Mike Newman, april 2018 +

6. Distance and Norms

motivation

The Euclidean length of a vector x in Rn is given by $\sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$.

But vectors are not always "geometric", even in Rn. If x is a vector of populations then the square root of the sum of the squares of the individual populations doesn't mean much.

$$x = \begin{bmatrix} 50 \text{ foxes} \\ 100 \text{ owls} \end{bmatrix} \qquad \text{“length”} = \sqrt{50^2 + 100^2} = 111.8??$$

In such a context it might be more reasonable and useful to consider the length to be the sum of all the individual populations: 150 animals in all. Alternatively we could take a weighted sum, according to the impact of each species; e.g., if foxes eat s mice per day and owls eat t mice per day, then one might consider the "length" 50s + 100t as a measure of the total impact of these predators on the population of mice.

Even in a geometric context, the meaning of length can change. If we are at 585 King Edward and we want to go to Wilbrod and Friel, the Euclidean distance isn't helpful.

Furthermore if we consider a vector space of (say) functions, how should we interpret the "length" of a function? We are not speaking of the length of a curve between two points, but the length of the function itself. What is the "angle" between two functions, say sin(t) and cos(t)?

We want a general way of measuring distance in a vector space, that is inspired by geometric length in Rn, but allows other applications that have nothing to do with geometric vectors.

norms

Definition 6.1. A norm is a function ‖·‖ that assigns a non-negative real number to each vector such that for all vectors u, v and any scalar α we have:

1. ‖v‖ = 0 ⟺ v = 0
2. ‖αv‖ = |α| ‖v‖
3. ‖u + v‖ ≤ ‖u‖ + ‖v‖

There are three particular norms that are often useful in Rn and Cn.

$$\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|$$
$$\|x\|_2 = \sqrt{|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2}$$
$$\|x\|_\infty = \max\{|x_1|, |x_2|, \cdots, |x_n|\}$$

These are by no means the only norms. In general for any p ∈ R with p ≥ 1 the following function is a norm.

$$\|x\|_p = \left(|x_1|^p + |x_2|^p + \cdots + |x_n|^p\right)^{1/p}$$


We see that ‖·‖1 and ‖·‖2 correspond to p = 1 and p = 2. Furthermore ‖·‖∞ corresponds to the case p → ∞, which explains the notation.
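For computations, numpy exposes all of these (a sketch; numpy.linalg.norm accepts any real p ≥ 1 as well as np.inf):

```python
import numpy as np

x = np.array([-17., 3.])
print(np.linalg.norm(x, 1))        # 20.0
print(np.linalg.norm(x, 2))        # sqrt(298), about 17.26
print(np.linalg.norm(x, np.inf))   # 17.0

# ||x||_p decreases toward ||x||_inf as p grows
for p in (1, 2, 4, 16, 64):
    print(p, np.linalg.norm(x, p))
```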

Example 6.2. Here are some examples of norms in R2.

$$\left\|\begin{bmatrix} -17 \\ +3 \end{bmatrix}\right\|_1 = |-17| + |+3| = 17 + 3 = 20$$
$$\left\|\begin{bmatrix} -17 \\ +3 \end{bmatrix}\right\|_2 = \sqrt{|-17|^2 + |+3|^2} = \sqrt{289 + 9} = \sqrt{298}$$

And for complex numbers.

$$\left\|\begin{bmatrix} i+1 \\ 2i \end{bmatrix}\right\|_2 = \sqrt{|i+1|^2 + |2i|^2} = \sqrt{\left(\sqrt{1^2+1^2}\right)^2 + \left(\sqrt{0^2+2^2}\right)^2} = \sqrt{2+4} = \sqrt{6}$$
$$\left\|\begin{bmatrix} i+1 \\ 2i \end{bmatrix}\right\|_\infty = \max\left\{|i+1|, |2i|\right\} = \max\left\{\sqrt{1^2+1^2}, \sqrt{0^2+2^2}\right\} = \max\left\{\sqrt{2}, \sqrt{4}\right\} = 2$$

Recall that $|z| = |a+bi| = \sqrt{a^2+b^2} = \sqrt{(a+bi)(a-bi)} = \sqrt{z\bar z}$.

Problem 6.3. Show that the function $\left\|\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right\| = \sqrt{x_1^2 + 2x_2^2}$ is a norm. Show that the function $\left\|\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right\| = \sqrt{2x_1^2 + 2x_1x_2 + 2x_2^2}$ is a norm. (hint below¹)

The norm ‖·‖2 corresponds exactly to geometric distance. We have an alternative formula that is sometimes useful. In Rn we see that $\|x\|_2 = \sqrt{x^Tx}$ and in Cn we see that $\|z\|_2 = \sqrt{z^Hz}$. In the complex case the "transpose" becomes a "transpose conjugate".

Problem 6.4. Verify that for x ∈ Rn we have $\sqrt{x^Tx} = \sqrt{|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2}$. Verify that for z ∈ Cn we have $\sqrt{z^Hz} = \sqrt{|z_1|^2 + |z_2|^2 + \cdots + |z_n|^2}$. Careful with the absolute value, since in general $|w|^2 \ne w^2$ for a complex number w.

Cauchy-Schwarz

In Rn and Cn we have an important result for the geometric norm ‖·‖2. This is the Cauchy-Schwarz inequality.

Theorem 6.5. For every x, y ∈ Cn we have $|x^Hy| \le \|x\|_2\|y\|_2$.

Proof. Let x and y be two vectors in Cn. To simplify notation we define the following.

$$\alpha = x^Hx = \|x\|_2^2 \qquad \beta = y^Hy = \|y\|_2^2 \qquad \gamma = x^Hy = \overline{y^Hx}$$

Note that α, β ∈ R, but in general γ ∉ R.

¹ Note that $2x_1^2 + 2x_1x_2 + 2x_2^2 = (x_1+x_2)^2 + x_1^2 + x_2^2$.


We have ‖u‖2² ≥ 0 for any u, because norms are always non-negative reals. Apply this to u = γx − αy.

$$0 \le \|\gamma x - \alpha y\|_2^2 = (\gamma x - \alpha y)^H(\gamma x - \alpha y) = \bar\gamma\gamma\,x^Hx + \bar\alpha\alpha\,y^Hy - \bar\gamma\alpha\,x^Hy - \bar\alpha\gamma\,y^Hx = |\gamma|^2\alpha + \alpha^2\beta - |\gamma|^2\alpha - |\gamma|^2\alpha = \alpha\left(\alpha\beta - |\gamma|^2\right)$$

If α = 0 then x = 0 and the desired statement is true. Otherwise this gives $\alpha\beta \ge |\gamma|^2$, and so $\sqrt\alpha\sqrt\beta \ge |\gamma|$, which is exactly the desired statement.

The case of Rn is similar; in fact it's a special case. As an exercise, interpret the above proof for Rn. What changes?

Problem 6.6. We see in the previous proof that if $|x^Hy| = \|x\|_2\|y\|_2$, then $\|\gamma x - \alpha y\|_2 = 0$ and so γx = αy. If we assume that x, y ≠ 0, then this forces x and y to be parallel, or y = kx.

Show that if y = kx we always have γx = αy. Conclude that $|x^Hy| = \|x\|_2\|y\|_2$ if and only if x and y are parallel.

sequences of vectors

We would like to have a notion of "convergence" for vectors. For example if x(k) is a vector of populations at time k, then we would like to work with $\lim_{k\to\infty} x^{(k)}$.¹

If a sequence of vectors x(k) converges to some limit x, it should be the case that they eventually get "close" to x. So we define the convergence of vectors in terms of norms.

We say that a sequence of vectors x(k) converges to x if ‖x(k) − x‖ → 0. This is the number zero and not the zero vector. It would appear that convergence depends on the choice of norm, but in fact this is not the case. We prove this for the three norms of interest, but it is true in general.

Theorem 6.7. Let x(k) be a sequence of vectors. Then

$$\|x^{(k)} - x\|_1 \to 0 \iff \|x^{(k)} - x\|_2 \to 0 \iff \|x^{(k)} - x\|_\infty \to 0$$

Furthermore, we have one more equivalent condition: ‖x(k) − x‖ → 0 if and only if each coordinate of x(k) converges to the corresponding coordinate of x.

Proof. We apply the famous Sandwich Theorem: if ak ≤ bk ≤ ck for all k and $\lim_{k\to\infty} a_k = \lim_{k\to\infty} c_k = L$ then $\lim_{k\to\infty} b_k = L$ too.

We notice that in an n-dimensional vector space 0 ≤ ‖u‖1 ≤ n‖u‖∞. So if ‖u‖∞ → 0 then ‖u‖1 → 0. If we set u = x(k) − x then we see that ‖x(k) − x‖∞ → 0 implies ‖x(k) − x‖1 → 0.

Similarly we notice that ‖u‖2² ≤ ‖u‖1², so 0 ≤ ‖u‖2 ≤ ‖u‖1 and this gives that ‖x(k) − x‖1 → 0 implies ‖x(k) − x‖2 → 0.

Lastly we notice that ‖u‖∞² ≤ ‖u‖2², so 0 ≤ ‖u‖∞ ≤ ‖u‖2 and this gives that ‖x(k) − x‖2 → 0 implies ‖x(k) − x‖∞ → 0.

¹ We use the notation x(k) to refer to the k-th vector in a sequence in order to avoid confusion with the k-th coordinate of a vector. So (x(k))j is the j-th coordinate of the k-th vector. We won't generally need to explicitly refer to sequences of vectors, so this should not cause confusion.


We can summarize the above by observing that 0 ≤ ‖u‖∞ ≤ ‖u‖2 ≤ ‖u‖1 ≤ n‖u‖∞. Substituting u = x(k) − x and taking the limit as k → ∞, we see that if one limit goes to 0 then all go to 0.

For the final condition, we recall that $0 \le |u_j| \le \|u\|_\infty = \max_{1\le j\le n} |u_j|$. If for each 1 ≤ j ≤ n the j-th coordinate of u tends to 0 then so must ‖u‖∞. If the maximum over all coordinates tends to 0 then so must each coordinate. Setting u = x(k) − x gives the result as stated.

Problem 6.8. Consider the sequence of vectors $x^{(k)} = \begin{bmatrix} 2k/(3+k) \\ 2^{-k} \end{bmatrix}$ and the vector $x = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$. Calculate ‖x(k) − x‖1, ‖x(k) − x‖2 and ‖x(k) − x‖∞ and show that each sequence of norms converges to zero. (These are sequences of numbers, not vectors.) Show also that as k → ∞ each coordinate of x(k) converges to the corresponding coordinate of x.

exercises

1. Let x ∈ Rn. Show explicitly each of the following.
a) if ‖x‖1 ≤ ε then ‖x‖2 ≤ ε
b) if ‖x‖2 ≤ ε then ‖x‖∞ ≤ ε
c) if ‖x‖∞ ≤ ε then ‖x‖2 ≤ √n ε
d) if ‖x‖2 ≤ ε then ‖x‖1 ≤ √n ε

2. Your boss wants a nonzero vector of length at most ε, for some ε > 0. You can manufacture a nonzero vector of length at most δ, for any δ > 0. The problem is that you and your boss don't use the same norm to measure length. Each of you uses one of ‖·‖1 or ‖·‖2 or ‖·‖∞. Is it always possible to succeed, regardless of the choice of norms?

3. For each of the following real vectors, find the norms using ‖·‖1, ‖·‖2, ‖·‖∞.

a) $\begin{bmatrix}1\\2\\3\end{bmatrix}$ b) $\begin{bmatrix}1\\-2\\3\end{bmatrix}$ c) $\begin{bmatrix}0\\0\\3\\0\end{bmatrix}$ d) $\begin{bmatrix}1\\1\\0\\0\end{bmatrix}$

4. Let D be an n × n diagonal matrix with ±1 on the diagonal. Show that for any x ∈ Rn we have ‖Dx‖p = ‖x‖p for p ∈ {1, 2, ∞}.

5. Let $R = \frac{1}{\sqrt 2}\begin{bmatrix} 1&-1 \\ 1&1 \end{bmatrix}$. Show that for any x ∈ R2 we have ‖Rx‖p = ‖x‖p for p = 2. Is this still true for p = 1 or p = ∞?

6. For each of the following complex vectors, find the norms using ‖·‖1, ‖·‖2, ‖·‖∞.

a) $\begin{bmatrix}i\\2\\3\end{bmatrix}$ b) $\begin{bmatrix}-1\\-2i\\3(1/2+i\sqrt3/2)\end{bmatrix}$ c) $\begin{bmatrix}1+i\\1+2i\\3i\end{bmatrix}$ d) $\begin{bmatrix}1+i\\1-i\\-1+i\\-1-i\end{bmatrix}$

7. Let D be an n × n diagonal matrix with complex numbers of the form $e^{i\theta}$ for θ ∈ R on the diagonal. Show that for any z ∈ Cn we have ‖Dz‖p = ‖z‖p for p ∈ {1, 2, ∞}.

8. Draw the region defined by $B_p = \{x \in \mathbb{R}^2 : \|x\|_p \le 1\}$ for p = 1, 2, ∞. In other words, identify the vectors whose norm is at most 1 in each of the three norms. Show that B1 ⊆ B2 ⊆ B∞.


9. Determine if each of the following are norms for R2. Either prove they satisfy the conditions for a norm, or show that they fail at least one of these conditions.

a) $\left\|\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right\| = x_1^2 - x_2^2$ b) $\left\|\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right\| = \sqrt{|x_1| + |x_2|}$ c) $\left\|\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right\| = 0$


MAT3341 : Applied Linear Algebra Mike Newman, april 2018 +

7. Inner Products

geometric angles

In R2 (considered geometrically) we can measure the angle between two vectors x and y, using the triangle whose three sides are x, y and y − x.

[Figure: the triangle formed by x, y and y − x, with θ the angle between x and y.]

The Cosine Law gives $\|y-x\|_2^2 = \|x\|_2^2 + \|y\|_2^2 - 2\|x\|_2\|y\|_2\cos\theta$. We get a formula for cos θ.

$$\cos\theta = \frac{x^Ty}{\|x\|\,\|y\|} = \frac{x^Ty}{\sqrt{x^Tx}\sqrt{y^Ty}}$$

The same formula works in Rn, since two vectors in Rn (with a common basepoint) determine a plane, and the angle is contained in this plane, which is isomorphic to R2. We see that algebraically, the angle between x and y is determined by the norms of the two vectors and the product $x^Ty$.

Motivated by the definition of the angle, we say that two vectors x, y ∈ Rn are orthogonal if $x^Ty = 0$. Vectors in Rn are orthogonal if they are perpendicular. We want to generalise this product: this will give us a way to measure "angles" in arbitrary vector spaces.
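A small numpy sketch of the formula (the clip is only there to guard against rounding pushing the cosine slightly outside [−1, 1]):

```python
import numpy as np

def angle(x, y):
    """Angle theta with cos(theta) = x^T y / (||x||_2 ||y||_2), in radians."""
    c = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))

x = np.array([1., 0., 0.])
y = np.array([1., 1., 0.])
print(np.degrees(angle(x, y)))        # 45.0
print(np.array([1., -1., 0.]) @ y)    # 0.0: these two are orthogonal
```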

inner products

Inner products generalize the “scalar product”.

Definition 7.1. Let V be a vector space over a field F, where F is R or C. An inner product on V is a function ⟨·|·⟩ from V × V to F such that the following properties are valid for all vectors u, v, w and scalars α, β. We give two versions: one for real vector spaces and one for complex vector spaces.

For R spaces:
1. ⟨u|v⟩ ∈ R
2. ⟨u|v⟩ = ⟨v|u⟩
3. ⟨αu + βv|w⟩ = α⟨u|w⟩ + β⟨v|w⟩
4. ⟨w|αu + βv⟩ = α⟨w|u⟩ + β⟨w|v⟩
5. ⟨u|u⟩ ≥ 0
6. ⟨u|u⟩ = 0 ⟺ u = 0

For C spaces:
1. ⟨u|v⟩ ∈ C
2. ⟨u|v⟩ = $\overline{\langle v|u\rangle}$
3. ⟨αu + βv|w⟩ = $\bar\alpha$⟨u|w⟩ + $\bar\beta$⟨v|w⟩
4. ⟨w|αu + βv⟩ = α⟨w|u⟩ + β⟨w|v⟩
5. ⟨u|u⟩ ∈ R, ⟨u|u⟩ ≥ 0
6. ⟨u|u⟩ = 0 ⟺ u = 0


In the complex case the product is not commutative and left distributivity needs a complex conjugate. Also the inner product of a vector with itself is always a real number. One could write an inner product using a more ordinary function notation: f(u, v) for instance, and occasionally we will do this. The notation ⟨u|v⟩ reminds us that it is a rather special type of function.

Problem 7.2. Show that the following examples are all inner products.

• In Rn: the standard inner product $\langle u|v\rangle = u^Tv$.
• In Cn: the standard inner product $\langle u|v\rangle = u^Hv$.
• In P3: $\langle f|g\rangle = \int_0^1 f(t)g(t)\,dt$.
• In R2: ⟨u|v⟩ = 5u1v1 + 2u2v2.

The last example shows a particular case of an important class of inner products that we discuss here briefly. Notice that

$$5u_1v_1 + 2u_2v_2 = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} 5&0 \\ 0&2 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$$

So this is an inner product of the form $\langle u|v\rangle = u^TAv$. Does every matrix A give an inner product in this manner? No.

Problem 7.3. Show that $\langle u|v\rangle = u^TAv$ is not an inner product for $A = \begin{bmatrix} 1&0 \\ 0&-1 \end{bmatrix}$ or $A = \begin{bmatrix} 1&0 \\ 0&0 \end{bmatrix}$. Try and generalize.

Problem 7.4. Show that $\langle u|v\rangle = u^TAv$ is an inner product for Rn if A is real and symmetric and $x^TAx > 0$ for every x ≠ 0.

symmetric matrices, positive definite matrices

Problem 7.4 suggests a definition.

Definition 7.5. A real symmetric matrix with $x^TAx > 0$ for every x ≠ 0 is said to be positive definite. A real symmetric matrix with $x^TAx \ge 0$ for every x ≠ 0 is said to be positive semi-definite. We sometimes write $A \succeq 0$ to mean that A is positive semi-definite and $A \succ 0$ to mean that A is positive definite.

A positive definite matrix is positive semi-definite by definition. Note that when x = 0 we always have $x^TAx = 0^TA0 = 0$. So we can say that a matrix is positive semi-definite if $x^TAx \ge 0$ for all x, and it is positive definite if equality holds only when x = 0.
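A quick numerical test (a sketch: Theorem 7.7 below gives one direction, and the converse — for a real symmetric matrix, positive eigenvalues do force positive definiteness — is also true, though we do not prove it here; the tolerance is an arbitrary choice):

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Real symmetric A: test whether all eigenvalues exceed tol."""
    assert np.allclose(A, A.T)
    return bool(np.min(np.linalg.eigvalsh(A)) > tol)

print(is_positive_definite(np.array([[2., 1.], [1., 2.]])))   # True (eigs 1, 3)
print(is_positive_definite(np.array([[1., 0.], [0., -1.]])))  # False
```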

Symmetric real matrices are oddly special.

Theorem 7.6. A symmetric real matrix has all eigenvalues real, and eigenvectors corresponding to different eigenvalues orthogonal.

Proof. Assume that A is symmetric with Ax = λx for x ≠ 0. Multiplying on the left by $x^H$ we get $x^HAx = \lambda x^Hx$. On the other hand, taking the transpose conjugate and multiplying by x we get $(Ax)^Hx = (\lambda x)^Hx$ which simplifies to $x^HAx = \bar\lambda x^Hx$. Therefore $\lambda x^Hx = \bar\lambda x^Hx$ and since $x^Hx \ne 0$ (why?) we get $\lambda = \bar\lambda$, meaning λ ∈ R.


Assume we have non-zero vectors x, y with Ax = λx and Ay = µy with λ ≠ µ. Then $(Ax)^T = (\lambda x)^T$ so $x^TA = \lambda x^T$ (right?).

$$\lambda x^Ty = (x^TA)y = x^T(Ay) = \mu x^Ty$$

Since λ ≠ µ we must have $x^Ty = 0$.

We get a little bit more from positive (semi-)definiteness.

Theorem 7.7. If A is positive semi-definite then all of its eigenvalues are (real and) non-negative. If A is positive definite then all of its eigenvalues are (real and) positive.

Proof. Let $A \succ 0$ with Ax = λx and x ≠ 0. Then $0 < x^TAx = x^T(Ax) = x^T(\lambda x) = \lambda x^Tx$. But $x^Tx > 0$ so λ > 0. The situation for $A \succeq 0$ is similar.

Problem 7.8. Show that $\langle u|v\rangle = u^T\begin{bmatrix} 2&1 \\ 1&2 \end{bmatrix}v$ is an inner product. Do this directly from the definition of an inner product and also by observing that the matrix is symmetric and calculating its eigenvalues.

inner products, norms and angles

An inner product is a more fundamental object than a norm, in the sense that every inner product determines a norm.

Proposition 7.9. For any inner product ⟨·|·⟩ the function $\|u\| = \sqrt{\langle u|u\rangle}$ is a norm.

Also, the appropriate generalisation of the Cauchy-Schwarz inequality is valid for any inner product.

Theorem 7.10. Let V be a vector space equipped with an inner product ⟨·|·⟩ and the norm that it induces. Then |⟨u|v⟩| ≤ ‖u‖ ‖v‖ for any u, v ∈ V.

Problem 7.11. Prove Theorem 7.10. The idea follows the geometric case, Theorem 6.5. Having done this, prove Proposition 7.9. You may find Theorem 7.10 useful in establishing the triangle inequality for Proposition 7.9. This may sound circular, but... Did you use the fact that ‖u‖ is a norm in proving Theorem 6.5, or merely its definition as a function?

We can now define angles in a general vector space. Let V be a real vector space, equipped with an inner product ⟨·|·⟩. Then the angle θ between two vectors u and v is defined by

$$\cos\theta = \frac{\langle u|v\rangle}{\sqrt{\langle u|u\rangle}\sqrt{\langle v|v\rangle}}$$

The (general) Cauchy-Schwarz inequality (Theorem 7.10) guarantees that the expression on the right is between −1 and +1, and is thus legitimately the cosine of something. This angle is not necessarily anything geometric, but can be thought of as a measure of "similarity of direction" of the two vectors. Note that the angle depends on the choice of inner product. The geometry of a complex vector space is a little more "complex". The angle between two complex vectors in the sense of our definition is a complex number (although there are different ways of thinking about "angle" in a complex vector space), so we will limit ourselves to real vector spaces. But note that even in the complex case, orthogonality still corresponds to a zero inner product.

Example 7.12. Consider the vector space $C[0,2\pi]$ of continuous functions on the interval $[0,2\pi]$, with the inner product $\langle f|g\rangle = \int_0^{2\pi} f(t)g(t)\,dt$. Find the angle between $\cos(t)$ and $\sin(t)$.
$$\langle\cos(t)|\sin(t)\rangle = \int_0^{2\pi} \cos(t)\sin(t)\,dt = \int_0^{2\pi} \tfrac12\sin(2t)\,dt = -\frac{\cos(2t)}{4}\bigg|_0^{2\pi} = 0$$
So these two functions are orthogonal. The "angle" between $\cos(t)$ and $\sin(t)$ is $90°$.
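A numeric version of this computation (a sketch, not part of the notes; it assumes scipy is available):

```python
import numpy as np
from scipy.integrate import quad

# inner product on C[0, 2*pi] from Example 7.12
ip = lambda f, g: quad(lambda t: f(t) * g(t), 0, 2 * np.pi)[0]

num = ip(np.cos, np.sin)
den = np.sqrt(ip(np.cos, np.cos) * ip(np.sin, np.sin))
print(num)                               # ~0: cos and sin are orthogonal
print(np.degrees(np.arccos(num / den)))  # 90.0
```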

Problem 7.13. Consider the vector space $P_2$ equipped with the inner product $\langle f|g\rangle = \int_0^1 f(t)g(t)\,dt$, and the induced norm $\|f\| = \sqrt{\langle f|f\rangle}$. Find the "length" of the two polynomials $t-1$ and $t+1$; that is, find $\|t-1\|$ and $\|t+1\|$. Also, find the "angle" between $t-1$ and $t+1$.

norms, orthogonality and independence

We already know that geometric orthogonality (i.e., with respect to the standard inner product on $\mathbb{R}^n$) implies linear independence. In fact this is true for any inner product. The general proof is essentially the same as the special case.

Theorem 7.14. Let $V$ be a vector space equipped with an inner product $\langle\cdot|\cdot\rangle$. Let $S = \{u_1, u_2, \cdots, u_k\}$ be a set of non-zero orthogonal vectors, so that $\langle u_i|u_j\rangle = 0$ if $i \neq j$. Then $S$ is linearly independent.

Proof. Consider a linear combination of $S$ that gives the zero vector.
$$\alpha_1u_1 + \alpha_2u_2 + \cdots + \alpha_ku_k = 0$$
Taking the inner product of both sides with $u_1$:
$$\langle u_1 \mid \alpha_1u_1 + \alpha_2u_2 + \cdots + \alpha_ku_k\rangle = \langle u_1|0\rangle$$
$$\alpha_1\langle u_1|u_1\rangle + \alpha_2\langle u_1|u_2\rangle + \cdots + \alpha_k\langle u_1|u_k\rangle = 0$$
$$\alpha_1\langle u_1|u_1\rangle + \alpha_2(0) + \cdots + \alpha_k(0) = 0$$
$$\alpha_1\langle u_1|u_1\rangle = 0$$
Since $u_1 \neq 0$ this means that $\alpha_1 = 0$. Similarly we show that for each $j$ we have $\alpha_j = 0$. Thus the set $S$ is linearly independent.

Problem 7.15. In the proof we used the fact that $\langle x|0\rangle = 0$ for any vector $x$. This is certainly true for the standard inner product. Show that it is true for any inner product. (hint: since $0\cdot\mathbf{0} = \mathbf{0}$ we get $\langle x|\mathbf{0}\rangle = \langle x|0\cdot\mathbf{0}\rangle$, where $0\cdot\mathbf{0}$ is the scalar $0$ multiplied by the vector $\mathbf{0}$.)

Orthogonality depends on the choice of inner product, but linear independence has nothing to do with what inner product we are using. So it is perhaps somewhat surprising that orthogonality with respect to any inner product implies linear independence.

Example 7.16. Consider the space $P_2$ equipped with the inner product $\langle f|g\rangle = \int_0^1 f(t)g(t)\,dt$. Show that the set $S = \{t,\ 2-3t,\ 2-12t+12t^2\}$ is a basis for $P_2$.


We could directly show independence, but we'll show something stronger: we will show that the set is an orthogonal basis.
$$\langle t|2-3t\rangle = \int_0^1 t(2-3t)\,dt = \int_0^1 2t-3t^2\,dt = t^2 - t^3\Big|_0^1 = 0$$
So the vectors $t$ and $2-3t$ are orthogonal with respect to this inner product.

One can show (exercise!) that $t$ and $2-12t+12t^2$ are orthogonal, as well as $2-3t$ and $2-12t+12t^2$. We conclude that $S$ is an orthogonal set. Thus (Theorem 7.14), it is linearly independent. A set of three independent vectors in a space of dimension 3 is necessarily a basis; furthermore, this is an orthogonal basis.
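The remaining two integrals (the "exercise!") can be checked symbolically; a sketch, not part of the notes, using Python with sympy:

```python
import sympy as sp

t = sp.symbols('t')
ip = lambda f, g: sp.integrate(f * g, (t, 0, 1))   # inner product on P2

S = [t, 2 - 3*t, 2 - 12*t + 12*t**2]
for i in range(3):
    for j in range(i + 1, 3):
        print(ip(S[i], S[j]))   # 0, 0, 0: the set is orthogonal
```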

Problem 7.17. Consider the space $P_2$ equipped with the inner product $\langle f|g\rangle = \langle f_0+f_1t+f_2t^2 \mid g_0+g_1t+g_2t^2\rangle = f_0g_0 + f_1g_1 + f_2g_2$. Show that the set $S = \{t,\ 2-3t,\ 2-12t+12t^2\}$ is not orthogonal. Is it still a basis? Is this a contradiction with the previous example? (hint: yes. no.)

Problem 7.18. Show that $\langle f|g\rangle = \langle f_0+f_1t+f_2t^2 \mid g_0+g_1t+g_2t^2\rangle = f_0g_0 + f_1g_1 + f_2g_2$ is an inner product for the vector space $P_2$.

Do this in two ways: according to the definition, and also by observing that the function $\varphi(f_0+f_1t+f_2t^2) = \begin{bmatrix}f_0 & f_1 & f_2\end{bmatrix}^T$ is an isomorphism from $P_2$ to $\mathbb{R}^3$, and that $\langle f|g\rangle = \varphi(f)^T\varphi(g)$. Thus this inner product "is" the standard inner product for $\mathbb{R}^3$. Give the usual name for this isomorphism $\varphi$.

Example 7.19. Consider the space $C[-\pi,\pi]$, the vector space of continuous functions on the interval $[-\pi,\pi]$, equipped with the inner product $\langle f|g\rangle = \int_{-\pi}^{+\pi} f(t)g(t)\,dt$. We define the set $F$ of functions as follows.
$$F = \{1\} \cup \{\cos(t), \cos(2t), \cos(3t), \cdots\} \cup \{\sin(t), \sin(2t), \sin(3t), \cdots\}$$
Show that the set $F$ is orthogonal.

Note that $F$ is infinite, so this shows that $C[-\pi,\pi]$ contains an independent set of infinite size, and so $C[-\pi,\pi]$ has no finite basis. The question of whether this vector space has a basis at all is intimately connected with the Axiom of Choice.

First of all we see that the function $1$ is orthogonal to the others, since if $n \geq 1$ we have the following inner products.
$$\langle 1|\cos(nt)\rangle = \int_{-\pi}^{+\pi} \cos(nt)\,dt = \frac{\sin(nt)}{n}\bigg|_{-\pi}^{+\pi} = 0$$
$$\langle 1|\sin(nt)\rangle = \int_{-\pi}^{+\pi} \sin(nt)\,dt = \frac{-\cos(nt)}{n}\bigg|_{-\pi}^{+\pi} = 0$$


Next we see that the functions of the same "type" are orthogonal. If $m, n \geq 1$ with $m \neq n$ then we get the following inner products.
$$\langle\cos(mt)|\cos(nt)\rangle = \int_{-\pi}^{+\pi} \cos(mt)\cos(nt)\,dt = \int_{-\pi}^{+\pi} \tfrac12\big(\cos((m+n)t) + \cos((m-n)t)\big)\,dt$$
$$= \frac12\left(\frac{\sin((m+n)t)}{m+n} + \frac{\sin((m-n)t)}{m-n}\right)\bigg|_{-\pi}^{+\pi} = 0$$
$$\langle\sin(mt)|\sin(nt)\rangle = \int_{-\pi}^{+\pi} \sin(mt)\sin(nt)\,dt = \int_{-\pi}^{+\pi} \tfrac12\big(\cos((m-n)t) - \cos((m+n)t)\big)\,dt$$
$$= \frac12\left(\frac{\sin((m-n)t)}{m-n} - \frac{\sin((m+n)t)}{m+n}\right)\bigg|_{-\pi}^{+\pi} = 0$$
Last we see that functions of different "type" are orthogonal. If $m, n \geq 1$ we get the following inner product (when $m = n$ the second term of the integrand is zero and is simply dropped).
$$\langle\cos(mt)|\sin(nt)\rangle = \int_{-\pi}^{+\pi} \cos(mt)\sin(nt)\,dt = \int_{-\pi}^{+\pi} \tfrac12\big(\sin((m+n)t) + \sin((-m+n)t)\big)\,dt$$
$$= \frac12\left(\frac{-\cos((m+n)t)}{m+n} + \frac{-\cos((-m+n)t)}{-m+n}\right)\bigg|_{-\pi}^{+\pi} = 0$$
In evaluating these integrals, we used some trigonometric identities, such as
$$2\sin(A)\sin(B) = -\cos(A+B) + \cos(A-B)$$
$$2\cos(A)\cos(B) = \cos(A+B) + \cos(A-B)$$
$$2\cos(A)\sin(B) = \sin(A+B) + \sin(-A+B)$$

The functions of the set $F$ of the previous example form a "basis" for Fourier series. Given some (periodic) function $f(t)$, we want to find an approximation in terms of functions of $F$. We compute its projection onto the subspace spanned by the functions of $F$: this gives the Fourier series of $f(t)$. (Note that since a Fourier series is typically infinite, it is not actually a linear combination.) To what extent is it a good approximation? Fourier series usually merit a course on their own.
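The Fourier coefficient of $f$ along a basis function $u$ is just the projection coefficient $\langle u|f\rangle / \langle u|u\rangle$ (the formula developed in the next chapter). A minimal numeric sketch, not part of the notes, with $f(t) = t$ as an arbitrary test function:

```python
import numpy as np
from scipy.integrate import quad

f = lambda t: t
ip = lambda g, h: quad(lambda t: g(t) * h(t), -np.pi, np.pi)[0]

for n in range(1, 4):
    u = lambda t, n=n: np.sin(n * t)
    print(n, ip(u, f) / ip(u, u))   # 2.0, -1.0, 0.666...: coefficients of sin(nt)
```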

exercises

1. Consider some vector space $V$ with some inner product $\langle\cdot|\cdot\rangle$. Show that $\langle x|0\rangle = \langle 0|x\rangle = 0$ for any vector $x$. You should not assume this is the standard inner product, nor that the vector space is $\mathbb{R}^n$ or $\mathbb{C}^n$.

2. For each, decide whether the function $f$ is an inner product on the given vector space $V$. Explicitly determine which of the required properties hold.
a) $V = \mathbb{R}^2$ over $\mathbb{R}$ with $f\left(\begin{bmatrix}x_1\\x_2\end{bmatrix},\begin{bmatrix}y_1\\y_2\end{bmatrix}\right) = x_1y_1 - x_2y_2$.
b) $V = \mathbb{R}^2$ over $\mathbb{R}$ with $f\left(\begin{bmatrix}x_1\\x_2\end{bmatrix},\begin{bmatrix}y_1\\y_2\end{bmatrix}\right) = x_1y_2 - x_2y_1$.
c) $V = \mathbb{C}^2$ over $\mathbb{C}$ with $f\left(\begin{bmatrix}x_1\\x_2\end{bmatrix},\begin{bmatrix}y_1\\y_2\end{bmatrix}\right) = x_1y_1 - x_2y_2$.
d) $V = \mathbb{C}^2$ over $\mathbb{C}$ with $f\left(\begin{bmatrix}x_1\\x_2\end{bmatrix},\begin{bmatrix}y_1\\y_2\end{bmatrix}\right) = x_1y_2 - x_2y_1$.


3. Suppose that $V$ is a real or complex vector space with basis $B = \{u_1, \cdots, u_n\}$. Let $m_{ij}$ be scalars for $1 \leq i, j \leq n$.
a) Show that there may or may not be an inner product $f$ on $V$ that satisfies $f(u_i,u_j) = m_{ij}$. Give examples of choices of $m_{ij}$ that do have an inner product and ones that don't; try and give "non-trivial" examples.
b) Assume that $f$ and $g$ are inner products on $V$ with $f(u_i,u_j) = g(u_i,u_j) = m_{ij}$ (as you just showed, this is a non-trivial constraint on the $m_{ij}$'s, as well as on $f$ and $g$). Show that $f = g$.
In other words, "defining an inner product on a basis" cannot be done arbitrarily, but once done, it defines the entire inner product. Compare this with "defining an isomorphism on a basis" or even "defining a linear map on a basis".

4. Let $B$ be a real $n \times n$ invertible matrix. Show that $f(x,y) = x^TB^TBy$ is an inner product for $\mathbb{R}^n$. Does this still hold if $B$ is not invertible? Does this still hold if $B$ is an $m \times n$ matrix of rank $n$?

5. Let $B$ be a complex $n \times n$ invertible matrix. Show that $f(x,y) = x^HB^HBy$ is an inner product for $\mathbb{C}^n$.

6. Let $B$ be a real $m \times n$ matrix.
a) Show that $B^TB$ is positive semi-definite.
b) Show that $B^TB$ is positive definite if and only if $B$ has rank $n$.

7. Let $V$ be a vector space over $\mathbb{R}$, and $C = \{u_1, \cdots, u_n\}$ a basis for $V$. Let $\varphi_C$ be the isomorphism that maps a vector to its coordinates with respect to $C$, and let $B$ be a real $n \times n$ invertible matrix. Show that $f(u,v) = \varphi_C(u)^TB^TB\varphi_C(v)$ is an inner product on $V$.

8. Consider the vector space $\mathbb{R}^3$, with the inner product $\langle x|y\rangle = x^TAy$ where $A = \begin{bmatrix}2&1&0\\1&2&0\\0&0&1\end{bmatrix}$.
a) Show that $x^TAy$ is an inner product.
b) Calculate the angles between the vectors of the standard basis according to this inner product.

9. Consider the vector space $P$ of polynomials, and $x = 1+t$, $y = 1-t$.
a) Find the angle between $x$ and $y$ with respect to the inner product $\langle f|g\rangle = \int_0^1 f(t)g(t)\,dt$.
b) Find the angle between $x$ and $y$ with respect to the inner product $\langle f|g\rangle = \int_{-1}^1 f(t)g(t)\,dt$.
c) Find an inner product on $P$ that makes $x$ and $y$ orthogonal.

10. Alice and Bob wish to decide whether two sets $S_1$ and $S_2$ are linearly independent. Alice chooses some inner product, and discovers that $S_1$ is orthogonal with respect to her inner product; unfortunately $S_2$ is not orthogonal. Alice concludes that $S_1$ is independent while $S_2$ is dependent. Bob chooses some other inner product, and discovers that $S_2$ is orthogonal with respect to his inner product; unfortunately $S_1$ is not orthogonal. Bob concludes that $S_2$ is independent while $S_1$ is dependent.
Which of them is right? Which of $S_1$ and $S_2$ is independent?

11. Let $B$ be some basis of $\mathbb{R}^n$.
a) Show that $\langle x|y\rangle = (\varphi_B(x))^T\varphi_B(y)$ is an inner product.
b) Show that $B$ is orthogonal with respect to this inner product.
c) Let $P$ be the matrix with $B$ as its columns. Show that $\langle x|y\rangle = x^T\big((P^{-1})^TP^{-1}\big)y$.

12. Define $f(x,y) = x^TAy$ for $x, y \in \mathbb{R}^n$ and some matrix $A$.
a) Show that if $A$ is not symmetric then there exist $x, y \in \mathbb{R}^n$ such that $f(x,y) \neq f(y,x)$.
b) Show that if $A$ has an eigenvalue $\lambda < 0$ then there exists an $x \in \mathbb{R}^n$ with $f(x,x) < 0$.
c) Show that if $A$ has an eigenvalue $\lambda = 0$ then there exists a nonzero $x \in \mathbb{R}^n$ with $f(x,x) = 0$.
d) Conclude that $f$ is an inner product if and only if $A$ is positive definite.


MAT3341 : Applied Linear Algebra Mike Newman, april 2018 +

8. Projection and G-S

geometric projection

Recall the concept of geometric projection in $\mathbb{R}^n$ that we first encountered in kindergarten.

(figure: the vectors $x$ and $y$, and $\mathrm{proj}_y(x)$ along $y$)

We know the formula
$$\mathrm{proj}_y(x) = \frac{y^Tx}{y^Ty}\,y = \frac{y^Tx}{\|y\|^2}\,y$$
This formula is in terms of the (standard) inner product and the (standard) norm induced by it. This gives a hint as to what the general projection should be, but we will define projection in terms of a property.

Definition 8.1. Let $V$ be a vector space equipped with an inner product $\langle\cdot|\cdot\rangle$, and $u$ a non-zero vector. We define the orthogonal projection of $v$ onto $u$ as the vector $f(u,v)\,u$ such that $v - f(u,v)\,u$ is orthogonal to $u$, where $f$ is some scalar-valued function.

Whether such a function $f$ exists, or is unique, is not (yet) clear in general. Before we deal with those issues, note that projection depends on the choice of inner product, since orthogonality depends on the inner product. So we expect $f$ to involve the inner product in some way.

Proposition 8.2. Let $V$ be a vector space equipped with an inner product $\langle\cdot|\cdot\rangle$, and $u, v \in V$ with $u \neq 0$. If we consider some scalar function $f$, then $v - f(u,v)\,u$ is orthogonal to $u$ if and only if $f(u,v) = \frac{\langle u|v\rangle}{\langle u|u\rangle}$.

Proof. For convenience we let $f(u,v) = \frac{\langle u|v\rangle}{\langle u|u\rangle} - \omega$. Of course $\omega$ is still a function of $u$ and $v$, but for clarity we omit this. We would like the following inner product to be zero.
$$\langle u \mid v - f(u,v)u\rangle = \langle u \mid v - \tfrac{\langle u|v\rangle}{\langle u|u\rangle}u + \omega u\rangle = \langle u|v\rangle - \tfrac{\langle u|v\rangle}{\langle u|u\rangle}\langle u|u\rangle + \omega\langle u|u\rangle = \langle u|v\rangle - \langle u|v\rangle + \omega\langle u|u\rangle = \omega\langle u|u\rangle$$
Since $\langle u|u\rangle \neq 0$ (why?), we must have $\omega = 0$. So $f(u,v) = \frac{\langle u|v\rangle}{\langle u|u\rangle}$.


Corollary 8.3. Let $V$ be a vector space equipped with an inner product $\langle\cdot|\cdot\rangle$, and $u, v \in V$ with $u \neq 0$. Then $\mathrm{proj}_u(v) = \frac{\langle u|v\rangle}{\langle u|u\rangle}\,u$.

So in general the projection of $v$ onto $u$ is exactly what we need to remove from $v$ so that the result is orthogonal to $u$.

We can define orthogonality between subspaces too. If $V$ is a vector space with an inner product $\langle\cdot|\cdot\rangle$, $U$ is a subspace of $V$ and $v \in V$, then we say that $v$ is orthogonal to $U$ if $v$ is orthogonal to every vector in $U$. This allows us to think of the projection of $v$ onto $U$ as the vector in $U$ such that the difference between $v$ and the projection is orthogonal to $U$.

Definition 8.4. Let $V$ be a vector space equipped with an inner product $\langle\cdot|\cdot\rangle$, and $U$ a subspace of $V$. We define the orthogonal projection of $v$ onto $U$ as the vector $F(U,v)$ such that $v - F(U,v)$ is orthogonal to $U$, where $F$ is a function whose image lies in $U$.

It is perhaps even less clear now that such a function $F$ exists and is unique. The apparent need to consider every vector in $U$ is a little intimidating; but see Exercise 8.1. In the case of projection onto a vector $u$ we knew we had a scalar times $u$; now we don't even know which vector $u$ of $U$ to pick! One way to approach this is to think of $F(U,v)$ relative to a basis: then the desired $u$ can be expressed in terms of the basis.

Proposition 8.5. Let $V$ be a vector space with inner product $\langle\cdot|\cdot\rangle$, $U$ a nonzero subspace of $V$, and $v$ a vector in $V$. Let $\{u_1, \cdots, u_k\}$ be any orthogonal basis for $U$. If we consider some function $F$ taking values in $U$, then $v - F(U,v)$ is orthogonal to $U$ if and only if $F(U,v) = \sum_{j=1}^k \frac{\langle u_j|v\rangle}{\langle u_j|u_j\rangle}u_j$.

Proof. For convenience we let $F(U,v) = \sum_{j=1}^k \left(\frac{\langle u_j|v\rangle}{\langle u_j|u_j\rangle} - \omega_j\right)u_j$. Of course the $\omega_j$ are still functions of $U$ and $v$, but for clarity we omit this.

For a vector to be orthogonal to $U$ means that it is orthogonal to every vector in $U$. By Exercise 8.1, it suffices to check orthogonality with each of the $u_j$.

We would like the following inner product to be zero, for $1 \leq i \leq k$.
$$\langle u_i \mid v - F(U,v)\rangle = \langle u_i \mid v - \sum_{j=1}^k\left(\tfrac{\langle u_j|v\rangle}{\langle u_j|u_j\rangle} - \omega_j\right)u_j\rangle$$
$$= \langle u_i \mid v - \tfrac{\langle u_1|v\rangle}{\langle u_1|u_1\rangle}u_1 - \cdots - \tfrac{\langle u_k|v\rangle}{\langle u_k|u_k\rangle}u_k + \omega_1u_1 + \cdots + \omega_ku_k\rangle$$
$$= \langle u_i|v\rangle - \tfrac{\langle u_i|v\rangle}{\langle u_i|u_i\rangle}\langle u_i|u_i\rangle + \omega_i\langle u_i|u_i\rangle = \langle u_i|v\rangle - \langle u_i|v\rangle + \omega_i\langle u_i|u_i\rangle = \omega_i\langle u_i|u_i\rangle$$
Since $\langle u_i|u_i\rangle \neq 0$ (why?) we must have $\omega_i = 0$. This must be true for each $1 \leq i \leq k$. So $F(U,v) = \sum_{j=1}^k \frac{\langle u_j|v\rangle}{\langle u_j|u_j\rangle}u_j$.

It is highly recommended to understand the calculation in the above proof; it demonstrates the essential reason why an orthogonal basis is useful.


Corollary 8.6. Let $V$ be a vector space equipped with an inner product $\langle\cdot|\cdot\rangle$, and $U$ a subspace of $V$. Let $\{u_1, \cdots, u_k\}$ be an orthogonal basis for $U$. Then $\mathrm{proj}_U(v) = \sum_{j=1}^k \frac{\langle u_j|v\rangle}{\langle u_j|u_j\rangle}u_j$.

Note that this says, among other things, that the projection onto $U$ does not depend on the choice of basis. Of course the individual coefficients will depend on the basis, but the vector $\mathrm{proj}_U(v)$ does not.
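Corollary 8.6 translates directly into a few lines of code with a pluggable inner product. A minimal sketch (not part of the notes; it uses the inner product on $P_2$ from the earlier examples, but any `ip(f, g)` would do):

```python
import sympy as sp

t = sp.symbols('t')
ip = lambda f, g: sp.integrate(f * g, (t, 0, 1))   # inner product on P2

def proj(U, v):
    # U is an orthogonal basis of the subspace: sum the one-dimensional projections
    return sum(ip(u, v) / ip(u, u) * u for u in U)

U = [t, 2 - 3*t]                  # orthogonal basis for P1 (Example 7.16)
print(sp.expand(proj(U, t**2)))   # t - 1/6; compare Problem 8.10 below
```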

We can understand projection quite well in terms of orthogonality, but there is another (equivalent) interpretation. The vector $\mathrm{proj}_U(v)$ is the vector of $U$ that is closest to $v$; the precise meaning of "close" depends on the norm, which depends on the inner product. Formally, we have the following.

Theorem 8.7. Let $V$ be a vector space with inner product $\langle\cdot|\cdot\rangle$, $U$ a subspace of $V$, and $v$ a vector in $V$. Let $\bar{v} = \mathrm{proj}_U(v)$.
If $u$ is a vector in $U$ other than $\bar{v}$, then $\|v-u\| > \|v-\bar{v}\|$. So among all vectors of $U$, $\bar{v}$ is the (unique) best approximation to $v$.

Proof. In order to simplify the presentation, we will show that $\|v-u\|^2 > \|v-\bar{v}\|^2$.
$$\|v-u\|^2 = \langle v-u \mid v-u\rangle = \langle v-\bar{v}+\bar{v}-u \mid v-\bar{v}+\bar{v}-u\rangle$$
$$= \langle v-\bar{v}|v-\bar{v}\rangle + \langle \bar{v}-u|\bar{v}-u\rangle + \langle v-\bar{v}|\bar{v}-u\rangle + \langle \bar{v}-u|v-\bar{v}\rangle$$
$$= \|v-\bar{v}\|^2 + \|\bar{v}-u\|^2 + 0 + 0 > \|v-\bar{v}\|^2$$
Notice that $\bar{v}-u \in U$, so Proposition 8.5 guarantees that $\langle v-\bar{v}|\bar{v}-u\rangle = 0$. Also $u \neq \bar{v}$ guarantees that $\|\bar{v}-u\|^2 > 0$.

One consequence of Theorem 8.7 is that projection of a vector onto a subspace is unique. We calculated it in terms of a particular orthogonal basis, but the result is independent of the choice of basis, because the answer is the unique best approximation. We saw this in Proposition 8.5, but somehow it was less satisfying to prove that the projection is independent of basis by using a particular basis.

As a special case of projection, if $U = V$ then $\mathrm{proj}_U(v) = \mathrm{proj}_V(v) = v$, since $v$ is already in the space it is being projected onto (it's worth thinking about that sentence if it's not immediately clear). So the projection formula becomes a formula that gives the coordinates of a vector with respect to that particular orthogonal basis. In fact, this is true even for proper subspaces, in the sense that the projection formula gives the coordinates of the projection with respect to that particular orthogonal basis.

Example 8.8. Consider the vector space $P_2$ with the inner product $\langle f|g\rangle = \int_0^1 f(t)g(t)\,dt$. We've already seen that the set $S = \{t,\ 2-3t,\ 2-12t+12t^2\}$ is an orthogonal basis for $P_2$ with respect to this inner product. Now we find the coordinates of $1 \in P_2$ with respect to this basis. We will need the following inner products.

$$\langle t|1\rangle = \int_0^1 t\,dt = \frac{t^2}{2}\bigg|_0^1 = \frac12$$
$$\langle 2-3t|1\rangle = \int_0^1 2-3t\,dt = 2t - \frac{3t^2}{2}\bigg|_0^1 = \frac12$$
$$\langle 2-12t+12t^2|1\rangle = \int_0^1 2-12t+12t^2\,dt = 2t - \frac{12t^2}{2} + \frac{12t^3}{3}\bigg|_0^1 = 0$$
$$\langle t|t\rangle = \int_0^1 t^2\,dt = \frac{t^3}{3}\bigg|_0^1 = \frac13$$
$$\langle 2-3t|2-3t\rangle = \int_0^1 4-12t+9t^2\,dt = 4t - \frac{12t^2}{2} + \frac{9t^3}{3}\bigg|_0^1 = 1$$

Note that $1$ and $2-12t+12t^2$ are orthogonal, since their inner product is zero. So we can avoid the calculation of $\langle 2-12t+12t^2|2-12t+12t^2\rangle$. We get the coordinates as
$$1 = \frac{\langle t|1\rangle}{\langle t|t\rangle}(t) + \frac{\langle 2-3t|1\rangle}{\langle 2-3t|2-3t\rangle}(2-3t) = \frac{1/2}{1/3}(t) + \frac{1/2}{1}(2-3t) = \frac32(t) + \frac12(2-3t)$$
The coordinates of $1$ with respect to $S$ are $3/2$, $1/2$ and $0$.

In the previous example we saw that $1$ and $2-12t+12t^2$ are orthogonal. We could have deduced this with no calculation. The subset $S' = \{t,\ 2-3t\}$ is linearly independent, being a subset of a basis. Also $S'$ is in the subspace $P_1$. A set of two linearly independent vectors in a space of dimension two is automatically a basis. Since $1 \in P_1$ we know that $1$ can be written as a linear combination of the vectors of $S'$: the $2-12t+12t^2$ is not needed.

This example also shows that we can interpret orthogonal projection in terms of coordinates.

Proposition 8.9. Let $U$ be a subspace with orthogonal basis $B = \{u_1, u_2, \cdots, u_k\}$ and $u$ an arbitrary vector of $U$. Then
$$\varphi_B(u) = \begin{bmatrix} \frac{\langle u_1|u\rangle}{\langle u_1|u_1\rangle} & \frac{\langle u_2|u\rangle}{\langle u_2|u_2\rangle} & \cdots & \frac{\langle u_k|u\rangle}{\langle u_k|u_k\rangle} \end{bmatrix}^T$$

So in the previous example we found that $\varphi_S(1) = \begin{bmatrix}3/2 & 1/2 & 0\end{bmatrix}^T$.

Problem 8.10. Consider the space $P_2$ with the inner product $\langle f|g\rangle = \int_0^1 f(t)g(t)\,dt$ and orthogonal basis $S = \{t,\ 2-3t,\ 2-12t+12t^2\}$. We also know that $S' = \{t,\ 2-3t\}$ is an orthogonal basis for the subspace $P_1$.

Find $\mathrm{proj}_{P_1}(t^2)$. Calculate $t^2 - \mathrm{proj}_{P_1}(t^2)$ and show that it is a multiple of $2-12t+12t^2$. Explain.

Gram-Schmidt

The Gram-Schmidt algorithm applies to any vector space, with any inner product. We just use the general version of projection based on the particular inner product.


The algorithm is the same. We take vectors, one at a time, and subtract off the projection onto all the orthogonal vectors we have so far.

Algorithm 8.11. Consider a set of vectors $S = \{v_1, v_2, \cdots, v_k\}$ in some vector space $V$ with an inner product $\langle\cdot|\cdot\rangle$. We start with an empty set $\Omega$, and we repeat the following steps until $S$ is empty.

1. Pick one of the vectors $v_j$ of $S$ (and remove it from $S$).

2. Subtract from $v_j$ the projection of $v_j$ onto each vector in $\Omega$.

3. Add the resulting vector to the set $\Omega$.

At the end, $\Omega$ is an orthogonal set that spans the same space as the original $S$. Note that $\Omega$ could contain the zero vector (perhaps even more than once, making it a multiset). If we were to remove all the zero vectors we would have an orthogonal basis for the space spanned by $S$.

This is exactly what we already know, except that now the projections are more general, based on some inner product in some vector space.
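A minimal sketch of Algorithm 8.11 (not part of the notes; the inner product is the one on $P_2$ used in the examples, but any symmetric positive `ip(f, g)` would do):

```python
import sympy as sp

t = sp.symbols('t')
ip = lambda f, g: sp.integrate(f * g, (t, 0, 1))

def gram_schmidt(S):
    omega = []
    for v in S:
        for u in omega:
            if ip(u, u) != 0:            # zero vectors are kept but skipped here
                v = v - ip(u, v) / ip(u, u) * u
        omega.append(sp.expand(v))
    return omega

S = [sp.Integer(1), 1 + t, 1 + t + t**2, 1 + 2*t + 3*t**2]
print(gram_schmidt(S))   # [1, t - 1/2, t**2 - t + 1/6, 0] -- as in Example 8.12 below
```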

Example 8.12. Consider the set $S = \{1,\ 1+t,\ 1+t+t^2,\ 1+2t+3t^2\}$ in the space $P_2$ with inner product $\langle f|g\rangle = \int_0^1 f(t)g(t)\,dt$. We transform $S$ into an orthogonal set. To keep the presentation clean we will omit all calculations of the projection (though you might want to check them).

We start with $1$. There is nothing to do. Up until now we have the orthogonal set $\Omega = \{1\}$.

Next we take $1+t$.
$$1+t - \mathrm{proj}_1(1+t) = 1+t - \frac{\langle 1|1+t\rangle}{\langle 1|1\rangle}(1) = 1+t - \frac{3/2}{1}(1) = -\frac12+t$$
Up until now we have the orthogonal set $\Omega = \{1,\ -\frac12+t\}$.

Next we take $1+t+t^2$.
$$1+t+t^2 - \mathrm{proj}_1(1+t+t^2) - \mathrm{proj}_{-\frac12+t}(1+t+t^2)$$
$$= 1+t+t^2 - \frac{\langle 1|1+t+t^2\rangle}{\langle 1|1\rangle}(1) - \frac{\langle -\frac12+t|1+t+t^2\rangle}{\langle -\frac12+t|-\frac12+t\rangle}\left(-\frac12+t\right)$$
$$= 1+t+t^2 - \frac{11/6}{1}(1) - \frac{1/6}{1/12}\left(-\frac12+t\right) = \frac16 - t + t^2$$
Up until now we have the orthogonal set $\Omega = \{1,\ -\frac12+t,\ \frac16-t+t^2\}$.

Next we take $1+2t+3t^2$.
$$1+2t+3t^2 - \mathrm{proj}_1(1+2t+3t^2) - \mathrm{proj}_{-\frac12+t}(1+2t+3t^2) - \mathrm{proj}_{\frac16-t+t^2}(1+2t+3t^2)$$
$$= 1+2t+3t^2 - \frac{3}{1}(1) - \frac{5/12}{1/12}\left(-\frac12+t\right) - \frac{1/60}{1/180}\left(\frac16-t+t^2\right)$$
$$= 1+2t+3t^2 - (3) - \left(-\frac52+5t\right) - \left(\frac12-3t+3t^2\right) = 0$$
Up until now we have the orthogonal set $\Omega = \{1,\ -\frac12+t,\ \frac16-t+t^2,\ 0\}$. For most practical purposes there is little reason to keep the polynomial $0$. The fact that we got $0$ tells us that the last polynomial of $S$ was spanned by the others (in fact, we wrote down the linear combination rather explicitly). So we have the orthogonal set $\{1,\ -\frac12+t,\ \frac16-t+t^2\}$ that spans the same space as the original; furthermore, this is an orthogonal basis.

Problem 8.13. After having calculated the first three orthogonal vectors in the previous example, we could have deduced right away that the fourth would give $0$. How?

An orthonormal basis is an orthogonal basis such that the norm of each vector is 1. Note that this norm depends on the choice of inner product. We get an orthonormal basis from an orthogonal spanning set by dividing each vector by its norm, but only if the orthogonal set contains no zero vectors.

Problem 8.14. We know that $\{1,\ -\frac12+t,\ \frac16-t+t^2\}$ is an orthogonal basis for $P_2$ with the inner product $\langle f|g\rangle = \int_0^1 f(t)g(t)\,dt$. Transform it into an orthonormal basis.

Problem 8.15. Consider the set $\{1,\ 1+t,\ 1+t+t^2,\ 1+2t+3t^2\}$ in the vector space $P_2$, with the inner product $\langle f_0+f_1t+f_2t^2|g_0+g_1t+g_2t^2\rangle = f_0g_0+f_1g_1+f_2g_2$. Transform this set into an orthogonal set. Transform the result into an orthonormal basis for $P_2$. Don't forget to explain how you know that this set spans $P_2$ (this explanation should involve no actual work).

exercises

1. Let $U$ be a subspace of some vector space, and $\{u_1, \cdots, u_k\}$ a basis for $U$. Show that $v$ is orthogonal to every vector in $U$ if and only if $v$ is orthogonal to each $u_i$. Does the basis need to be orthogonal?

2. Consider the vector space $\mathbb{R}^3$ equipped with the (standard) inner product $\langle u|v\rangle = u^Tv$ and the norm $\|x\| = \sqrt{\langle x|x\rangle} = \sqrt{x^Tx}$. Let $U = \left\{\begin{bmatrix}\alpha\\\beta\\\gamma\end{bmatrix} : \alpha+\beta+\gamma = 0\right\}$. Let $S = \left\{\begin{bmatrix}-2\\1\\1\end{bmatrix}, \begin{bmatrix}0\\1\\-1\end{bmatrix}\right\}$.
a) Check that $U$ is indeed a vector space.
b) Show that $S$ is an orthogonal set, and that $S \subseteq U$.
c) Show that $S$ is a basis for $U$. Bonus points if you can do this with "no" further arithmetic.
d) For each of the following $x$, find $\bar{x} = \mathrm{proj}_U(x)$ and $\varphi_S(\bar{x})$.
i) $x = \begin{bmatrix}1\\1\\1\end{bmatrix}$ ii) $x = \begin{bmatrix}1\\-2\\1\end{bmatrix}$ iii) $x = \begin{bmatrix}1\\2\\1\end{bmatrix}$ iv) $x = \begin{bmatrix}1\\0\\0\end{bmatrix}$
e) For each $x$ in the previous part, identify the point in $U$ that is closest to $x$, and find the distance between $x$ and $\bar{x}$.

3. Consider the vector space $P_3$ over $\mathbb{R}$ equipped with the inner product $\langle f|g\rangle = \int_{-1}^{+1} f(t)g(t)\,dt$. Let $S = \{1,\ t,\ 3t^2-1\}$ and let $U$ be the subspace of $V$ spanned by $S$. Let $x = t^3$.
a) Verify that $S$ is an orthogonal set.
b) Find $\bar{x} = \mathrm{proj}_U(x)$.
c) Find $\varphi_S(\bar{x})$.
d) Find the vector in $U$ that is closest to $x$, and find the distance between $x$ and $\bar{x}$.

4. Let $V$ be a vector space equipped with an inner product $\langle\cdot|\cdot\rangle$ and the induced norm $\|x\| = \sqrt{\langle x|x\rangle}$. Let $U$ be a subspace of $V$ and $x \in V$ with $\bar{x} = \mathrm{proj}_U(x)$. Show that $\|x\|^2 = \|\bar{x}\|^2 + \|x-\bar{x}\|^2$. (note: this shows that Pythagoras works in "any norm")

5. Suppose that $U$ is a subspace of some vector space $V$, and define $f : V \to U$ by $f(x) = \mathrm{proj}_U(x)$.
a) Show that if $x \in U$ then $f(x) = \lambda x$ for some $\lambda \in \mathbb{R}$, and determine $\lambda$.
b) Show that if $x \perp U$ then $f(x) = \lambda x$ for some $\lambda \in \mathbb{R}$, and determine $\lambda$.

6. Let $U$ be a subspace of a vector space $V$. Show that $\mathrm{proj}_U(x) = x$ if and only if $x \in U$.

7. For each, apply Gram-Schmidt to the set $S$ using the given inner product.
a) $\mathbb{C}^3$ with the standard inner product, $S = \left\{\begin{bmatrix}1\\i\\0\end{bmatrix}, \begin{bmatrix}0\\1\\i\end{bmatrix}\right\}$.
b) $\mathbb{R}^3$ with the standard inner product, $S = \left\{\begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\1\end{bmatrix}, \begin{bmatrix}0\\1\\2\end{bmatrix}\right\}$.
c) $\mathbb{R}^3$ with $\langle x|y\rangle = x^T\begin{bmatrix}2&1&1\\1&2&1\\1&1&2\end{bmatrix}y$, $S = \left\{\begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\end{bmatrix}\right\}$.


MAT3341 : Applied Linear Algebra Mike Newman, april 2018 +

9. QR decomposition

matrix forms

We will translate the notions of projections and Gram-Schmidt into matrix form. We'll also see an application to approximate solutions. We'll work in $\mathbb{R}^n$ or $\mathbb{C}^n$, using the standard inner product.

projection matrices

We’ll start in the real numbers.

Let $\{u_1, u_2, \cdots, u_k\}$ be an orthonormal basis for a subspace $U$ of $\mathbb{R}^m$. Let $Q$ be the matrix with these vectors as columns. We find that $Q^TQ = I$.

Problem 9.1. Explain why $(Q^TQ)_{ij} = u_i^Tu_j$. Conclude that $(Q^TQ)_{ij} = 0$ if $i \neq j$ and also that $(Q^TQ)_{ii} = 1$. Thus show that $Q^TQ = I$.

If $U$ is a subspace of dimension $k$ in $\mathbb{R}^m$ then $Q$ is $m \times k$. The equality $Q^TQ = I$ demonstrates that $Q$ has a left inverse (namely $Q^T$), so we ask the question: what is $QQ^T$?

Problem 9.2. Show that if $m > k$ then $QQ^T \neq I$. (hint: if a matrix has both a left and a right inverse then. . . )

In order to figure out what $QQ^T$ is, we consider the projection of a vector onto $U$. We know that
$$\mathrm{proj}_U(v) = \alpha_1u_1 + \cdots + \alpha_ku_k, \qquad \alpha_j = \frac{\langle u_j|v\rangle}{\langle u_j|u_j\rangle} = \frac{\langle u_j|v\rangle}{1} = \langle u_j|v\rangle$$
The projection is a linear combination of vectors of the orthonormal basis, with the $\alpha_j$ as coefficients. Furthermore, if $a = \begin{bmatrix}\alpha_1 & \alpha_2 & \cdots & \alpha_k\end{bmatrix}^T$ we see that this vector is obtained as a matrix product.
$$\mathrm{proj}_U(v) = Qa, \qquad a = Q^Tv$$
So $\mathrm{proj}_U(v) = Qa = QQ^Tv$. The matrix $QQ^T$ is the projection matrix onto $U$. If $P = QQ^T$ then the function $v \mapsto Pv$ is exactly the function of projection onto $U$. We have established the following result.

Theorem 9.3. Let $\{u_1, u_2, \cdots, u_k\}$ be an orthonormal basis for a subspace $U$ of $\mathbb{R}^m$, $Q$ the matrix having these vectors as columns, and $P = QQ^T$.
Then the projection is given by $\mathrm{proj}_U(v) = Pv$.

For a complex vector space, the idea is similar, except that the inner product behaves in a complex way.1

Problem 9.4. Let $\{u_1, u_2, \cdots, u_k\}$ be an orthonormal basis for a subspace $U$ of $\mathbb{C}^n$. Let $Q$ be the matrix having these vectors as columns. Show that $Q^HQ = I$, and that $\mathrm{proj}_U(v) = QQ^Hv$. Explain why in general $Q^TQ \neq I$ and $\mathrm{proj}_U(v) \neq QQ^Tv$.

1 But not, of course, in a complicated way!


Projection matrices have some special properties.

Proposition 9.5. Let $Q$ be a matrix whose columns are orthonormal, and set $P = QQ^T$ (or $P = QQ^H$ for a complex vector space). Then

• $P$ is symmetric (Hermitian)

• $P^2 = P$

• $P(I-P) = (I-P)P = 0$

• $(I-P)Q = 0$

We can show each of these by simply writing $P = QQ^T$ and simplifying. But it is also useful to think in terms of "operations". Multiplication by $P$ is the operation "project onto $U$". Multiplying by $P^2$ corresponds to "project onto $U$, and then project the result onto $U$". But the second projection changes nothing since the result of the first is already in $U$. So projecting twice is the same thing as projecting once, which says that $P^2 = P$.

Problem 9.6. Show each identity of Proposition 9.5 directly by setting $P = QQ^T$ and simplifying. Show each identity again by arguing in terms of operations. (hint: if $P$ projects onto $U$ then $I-P$ projects onto the orthogonal complement of $U$.)

Example 9.7. Calculate the projection matrix for the subspace $U$ of $\mathbb{R}^4$ spanned by $\begin{bmatrix}1&1&1&-1\end{bmatrix}^T$, $\begin{bmatrix}0&-1&1&0\end{bmatrix}^T$.

We see that this set is already orthogonal (why?), so to get an orthonormal basis we only need to divide each vector by its norm. The norms are $2$ and $\sqrt2$, so we get the following.
$$\begin{bmatrix}\tfrac12 & \tfrac12 & \tfrac12 & -\tfrac12\end{bmatrix}^T, \qquad \begin{bmatrix}0 & -\tfrac{1}{\sqrt2} & \tfrac{1}{\sqrt2} & 0\end{bmatrix}^T$$
These are exactly the columns of the matrix $Q$, so we get $P = QQ^T$.
$$Q = \begin{bmatrix} \tfrac12 & 0 \\ \tfrac12 & -\tfrac{1}{\sqrt2} \\ \tfrac12 & \tfrac{1}{\sqrt2} \\ -\tfrac12 & 0 \end{bmatrix} \qquad P = QQ^T = \begin{bmatrix} \tfrac14 & \tfrac14 & \tfrac14 & -\tfrac14 \\ \tfrac14 & \tfrac34 & -\tfrac14 & -\tfrac14 \\ \tfrac14 & -\tfrac14 & \tfrac34 & -\tfrac14 \\ -\tfrac14 & -\tfrac14 & -\tfrac14 & \tfrac14 \end{bmatrix}$$

Example 9.8. Calculate the projection of $v = \begin{bmatrix}1&2&3&4\end{bmatrix}^T$ onto the space $U$ of the previous example.
$$\mathrm{proj}_U(v) = Pv = \begin{bmatrix} \tfrac14 & \tfrac14 & \tfrac14 & -\tfrac14 \\ \tfrac14 & \tfrac34 & -\tfrac14 & -\tfrac14 \\ \tfrac14 & -\tfrac14 & \tfrac34 & -\tfrac14 \\ -\tfrac14 & -\tfrac14 & -\tfrac14 & \tfrac14 \end{bmatrix}\begin{bmatrix}1\\2\\3\\4\end{bmatrix} = \begin{bmatrix}\tfrac12\\0\\1\\-\tfrac12\end{bmatrix}$$
We can check this by calculating $v - \mathrm{proj}_U(v) = \begin{bmatrix}\tfrac12 & 2 & 2 & \tfrac92\end{bmatrix}^T$; this should be orthogonal to $U$. We can check this orthogonality by computing the inner product with the basis vectors, or by multiplying by $P$.
$$\begin{bmatrix}\tfrac12\\2\\2\\\tfrac92\end{bmatrix}^T\begin{bmatrix}1\\1\\1\\-1\end{bmatrix} = 0 \qquad \begin{bmatrix}\tfrac12\\2\\2\\\tfrac92\end{bmatrix}^T\begin{bmatrix}0\\-1\\1\\0\end{bmatrix} = 0 \qquad P\begin{bmatrix}\tfrac12\\2\\2\\\tfrac92\end{bmatrix} = 0$$


Since we know the projection formula in terms of a matrix multiplication, we can give a "formula" for the projection of an arbitrary vector.
$$\mathrm{proj}_U\left(\begin{bmatrix}a\\b\\c\\d\end{bmatrix}\right) = \begin{bmatrix} \tfrac14 & \tfrac14 & \tfrac14 & -\tfrac14 \\ \tfrac14 & \tfrac34 & -\tfrac14 & -\tfrac14 \\ \tfrac14 & -\tfrac14 & \tfrac34 & -\tfrac14 \\ -\tfrac14 & -\tfrac14 & -\tfrac14 & \tfrac14 \end{bmatrix}\begin{bmatrix}a\\b\\c\\d\end{bmatrix} = \frac14\begin{bmatrix}a+b+c-d\\a+3b-c-d\\a-b+3c-d\\-a-b-c+d\end{bmatrix}$$
This is really just another way of writing the matrix multiplication. . .
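A quick numeric cross-check of Examples 9.7 and 9.8 (a sketch in Python/numpy, not part of the notes):

```python
import numpy as np

Q = np.array([[ 0.5,  0.0],
              [ 0.5, -1/np.sqrt(2)],
              [ 0.5,  1/np.sqrt(2)],
              [-0.5,  0.0]])
P = Q @ Q.T                      # projection matrix onto U (Theorem 9.3)

v = np.array([1., 2., 3., 4.])
print(P @ v)                     # [ 0.5  0.   1.  -0.5]
print(np.allclose(P @ P, P))     # True: projecting twice changes nothing
print(P @ (v - P @ v))           # ~0: the residual is orthogonal to U
```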

Problem 9.9. Find the projection matrix onto the subspace spanned by the following set of vectors.
$$\begin{bmatrix}0&1&1&1\end{bmatrix}^T, \qquad \begin{bmatrix}1&-1&1&0\end{bmatrix}^T, \qquad \begin{bmatrix}0&1&1&-2\end{bmatrix}^T$$

QR decomposition

We found a matrix description of projection. Now we do the same for Gram-Schmidt.

If we start with a set of vectors $v_1, v_2, \cdots, v_n$ and apply Gram-Schmidt to get an orthogonal set $u_1, u_2, \cdots, u_n$, we can sum this up by saying that we calculated the new vectors according to the following.
$$u_j = v_j - \alpha_{1j}u_1 - \alpha_{2j}u_2 - \cdots - \alpha_{j-1,j}u_{j-1}, \qquad \alpha_{ij} = \frac{\langle u_i|v_j\rangle}{\langle u_i|u_i\rangle}$$
This simply says that we get the new vectors by subtracting off the projections onto the new vectors calculated up until then. We can rewrite this in terms of the old vector.
$$v_j = \alpha_{1j}u_1 + \alpha_{2j}u_2 + \cdots + \alpha_{j-1,j}u_{j-1} + u_j$$

In terms of matrices, this gives the relation $A = Q_0R_0$, where $A$ is the matrix whose columns are the $v_j$'s, $Q_0$ is the matrix whose columns are the $u_j$'s, and $R_0$ is the matrix of the $\alpha_{ij}$'s.
$$\begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \begin{bmatrix} 1 & \alpha_{12} & \alpha_{13} & \cdots & \alpha_{1n} \\ 0 & 1 & \alpha_{23} & \cdots & \alpha_{2n} \\ 0 & 0 & 1 & \cdots & \alpha_{3n} \\ & & & \ddots & \vdots \\ 0 & & \cdots & & 1 \end{bmatrix}$$
If we consider the product on the right from the point of view of Proposition 1.2 we see exactly the equation for $v_j$ above.

In order to have nice square matrices, the orthogonal set has the same size as the initial set. This means that if there were any zero vectors obtained in Gram-Schmidt, then we didn't discard them. This isn't necessarily a problem, except that if some $u_i = 0$, then we have a problem in computing further $\alpha_{ij}$ for $j > i$; namely we get $0/0$. The solution is to note that $\alpha_{ij}$ is the coefficient of $u_i$ used in expressing $v_j$ in terms of the previous. But since $u_i = 0$, it doesn't matter what coefficient we use, so we might as well choose it as $0$. So to be precise, we would calculate the $\alpha_{ij}$ as follows.
$$\alpha_{ij} = \begin{cases} \dfrac{\langle u_i|v_j\rangle}{\langle u_i|u_i\rangle} & u_i \neq 0 \\ 0 & u_i = 0 \end{cases}$$


Theorem 9.10. Let $A$ be an $m \times n$ matrix. Then we can write $A = Q_0R_0$ where

• $Q_0$ is $m \times n$ with orthogonal columns.

• $R_0$ is $n \times n$ and is upper triangular with diagonal $1$.

• The norm $\|\cdot\|_2$ of the $j$-th column of $Q_0$ gives the distance between the $j$-th column of $A$ and the space spanned by columns $1, 2, \cdots, j-1$ of $A$.

We get this decomposition by applying Gram-Schmidt to the columns of $A$. The columns of $Q_0$ are the orthogonal vectors that we get, and the entries of $R_0$ are the coefficients used in the projections.

The matrix $Q_0$ could have columns of zero; this will be the case whenever the columns of $A$ (the vectors $v_j$) are dependent. If we consider the product $Q_0R_0$ from the point of view of Proposition 1.4, we see that we can remove these columns, and the corresponding rows of $R_0$. After having done this, we can normalize each column of $Q_0$ by dividing it by its norm $\|\cdot\|_2$, and multiplying the corresponding row of $R_0$ by the same value. This gives the $A = QR$ decomposition.

Theorem 9.11. Let $A$ be an $m \times n$ matrix of rank $r$. Then we can write $A = QR$ where

• $Q$ is $m \times r$ with orthonormal columns.

• $R$ is $r \times n$ and is in echelon form.

• If $p_i$ is the pivot of the $i$-th row of $R$ and this pivot is in the $j$-th column of $R$, then $|p_i|$ gives the distance between the $j$-th column of $A$ and the space spanned by columns $1, 2, \cdots, j-1$ of $A$.

Example 9.12. Find the $Q_0R_0$ decomposition of the matrix $A = \begin{bmatrix} 0 & 1 & 2 & 1 \\ 1 & 1 & 4 & 2 \\ 1 & 3 & 8 & 4 \\ 1 & -1 & 0 & -3 \end{bmatrix}$.

This is Gram-Schmidt, where we fill in $Q_0$ and $R_0$ as we go. In order to simplify the presentation we omit the calculation of all the scalar products (which makes a nice exercise to check).

The first vector requires no work.
$$u_1 = \begin{bmatrix}0\\1\\1\\1\end{bmatrix} \qquad Q_0 = \begin{bmatrix}0\\1\\1\\1\end{bmatrix} \qquad R_0 = \begin{bmatrix}1&&&\\0&1&&\\0&0&1&\\0&0&0&1\end{bmatrix}$$
We calculate the second vector by subtracting off one projection.
$$u_2 = \begin{bmatrix}1\\1\\3\\-1\end{bmatrix} - \frac{3}{3}\begin{bmatrix}0\\1\\1\\1\end{bmatrix} = \begin{bmatrix}1\\0\\2\\-2\end{bmatrix} \qquad Q_0 = \begin{bmatrix}0&1\\1&0\\1&2\\1&-2\end{bmatrix} \qquad R_0 = \begin{bmatrix}1&1&&\\0&1&&\\0&0&1&\\0&0&0&1\end{bmatrix}$$
We calculate the third by subtracting off two projections.
$$u_3 = \begin{bmatrix}2\\4\\8\\0\end{bmatrix} - \frac{12}{3}\begin{bmatrix}0\\1\\1\\1\end{bmatrix} - \frac{18}{9}\begin{bmatrix}1\\0\\2\\-2\end{bmatrix} = \begin{bmatrix}0\\0\\0\\0\end{bmatrix} \qquad Q_0 = \begin{bmatrix}0&1&0\\1&0&0\\1&2&0\\1&-2&0\end{bmatrix} \qquad R_0 = \begin{bmatrix}1&1&4&\\0&1&2&\\0&0&1&\\0&0&0&1\end{bmatrix}$$
Normally the fourth would be calculated by removing three projections, but here the third vector is the zero vector so we needn't bother subtracting it (and in fact it would break the formula).
$$u_4 = \begin{bmatrix}1\\2\\4\\-3\end{bmatrix} - \frac{3}{3}\begin{bmatrix}0\\1\\1\\1\end{bmatrix} - \frac{15}{9}\begin{bmatrix}1\\0\\2\\-2\end{bmatrix} = \begin{bmatrix}-2/3\\1\\-1/3\\-2/3\end{bmatrix}$$
$$Q_0 = \begin{bmatrix}0&1&0&-2/3\\1&0&0&1\\1&2&0&-1/3\\1&-2&0&-2/3\end{bmatrix} \qquad R_0 = \begin{bmatrix}1&1&4&1\\0&1&2&5/3\\0&0&1&0\\0&0&0&1\end{bmatrix}$$

Problem 9.13. There is one value in the matrix $R_0$ that is not explicitly calculated: the $(3,4)$ position. This is the coefficient of the projection that we didn't actually do, so we leave it as zero. Show that if we replace this $0$ by any other number then we would still have a valid $A = Q_0R_0$ decomposition. (hint: multiply)

In the previous example we found a zero column; we left it in the matrix for the $Q_0R_0$ decomposition, but if we wanted an orthonormal basis for $\mathrm{col}(A)$ we would have removed it. Typically we would want a "minimal" decomposition, that does not include "useless" columns, and the idea of Proposition 1.4 shows how.

$$Q_0R_0 = \begin{bmatrix}0\\1\\1\\1\end{bmatrix}\begin{bmatrix}1&1&4&1\end{bmatrix} + \begin{bmatrix}1\\0\\2\\-2\end{bmatrix}\begin{bmatrix}0&1&2&5/3\end{bmatrix} + \begin{bmatrix}0\\0\\0\\0\end{bmatrix}\begin{bmatrix}0&0&1&0\end{bmatrix} + \begin{bmatrix}-2/3\\1\\-1/3\\-2/3\end{bmatrix}\begin{bmatrix}0&0&0&1\end{bmatrix}$$
$$= \begin{bmatrix}0\\1\\1\\1\end{bmatrix}\begin{bmatrix}1&1&4&1\end{bmatrix} + \begin{bmatrix}1\\0\\2\\-2\end{bmatrix}\begin{bmatrix}0&1&2&5/3\end{bmatrix} + 0 + \begin{bmatrix}-2/3\\1\\-1/3\\-2/3\end{bmatrix}\begin{bmatrix}0&0&0&1\end{bmatrix}$$
$$= \begin{bmatrix}0&1&-2/3\\1&0&1\\1&2&-1/3\\1&-2&-2/3\end{bmatrix}\begin{bmatrix}1&1&4&1\\0&1&2&5/3\\0&0&0&1\end{bmatrix}$$
This is exactly the step of "removing the zero columns of $Q_0$ and the corresponding rows of $R_0$" that gives Theorem 9.11.

Example 9.14. Find a QR decomposition of the previous matrix.

We already know $A = Q_0R_0$, so we only need to remove the zero column of $Q_0$ and the corresponding row of $R_0$ (think of Proposition 1.4), and divide/multiply by the norms of the columns of $Q_0$. The three norms are $\sqrt3$, $3$ and $\sqrt2$.
$$Q = \begin{bmatrix} 0 & 1/3 & -\sqrt2/3 \\ 1/\sqrt3 & 0 & \sqrt2/2 \\ 1/\sqrt3 & 2/3 & -\sqrt2/6 \\ 1/\sqrt3 & -2/3 & -\sqrt2/3 \end{bmatrix} \qquad R = \begin{bmatrix} \sqrt3 & \sqrt3 & 4\sqrt3 & \sqrt3 \\ 0 & 3 & 6 & 5 \\ 0 & 0 & 0 & \sqrt2 \end{bmatrix}$$
Since we already know $Q$ we can easily find the projection matrix.
$$P = QQ^T = \begin{bmatrix} 1/3 & -1/3 & 1/3 & 0 \\ -1/3 & 5/6 & 1/6 & 0 \\ 1/3 & 1/6 & 5/6 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Note that this matrix does not depend on the choice of basis, only on the subspace. Compare this example with Problem 9.9.

Problem 9.15. Check that $A = Q_0R_0 = QR$ in the previous examples, and that the columns of $Q_0$ are orthogonal and the columns of $Q$ are orthonormal.
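Problem 9.15 can of course be done by hand, but here is a quick numeric check (Python/numpy, a sketch not part of the notes):

```python
import numpy as np

A = np.array([[0., 1., 2., 1.],
              [1., 1., 4., 2.],
              [1., 3., 8., 4.],
              [1., -1., 0., -3.]])
s2, s3 = np.sqrt(2), np.sqrt(3)
Q = np.array([[0,     1/3, -s2/3],
              [1/s3,  0,    s2/2],
              [1/s3,  2/3, -s2/6],
              [1/s3, -2/3, -s2/3]])
R = np.array([[s3, s3, 4*s3, s3],
              [0., 3., 6.,   5.],
              [0., 0., 0.,   s2]])

print(np.allclose(Q @ R, A))            # A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))  # columns of Q are orthonormal
```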


Problem 9.16. Find a $Q_0R_0$ decomposition and a QR decomposition for $A = \begin{bmatrix} 1 & 2 \\ 1 & 1 \\ 1 & 3 \\ -1 & -2 \end{bmatrix}$.

Problem 9.17. Find a $Q_0R_0$ decomposition and a QR decomposition for $A = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 1 & -1 \\ 1 & 3 & 1 \\ -1 & -2 & 0 \end{bmatrix}$, and compare with the previous.

approximations

Consider a linear system $Ax = b$. It might or might not have a solution, so in general we would look to solve $Ax \approx b$. More precisely, we want to find the vector $x$ that makes $Ax$ as close to $b$ as possible; in other words, find the vector $x$ that minimizes $\|Ax-b\|$. The vector $Ax$ is necessarily in the column space of $A$, and the best approximation to $b$ within that subspace is the projection. So we know the answer: we need to have $Ax = \mathrm{proj}_{\mathrm{col}(A)}(b)$. We will write $\bar{b} = \mathrm{proj}_{\mathrm{col}(A)}(b)$ for the projection.

We want to solve $Ax = \bar{b}$. The QR decomposition does this for us.
$$Ax = \bar{b} \iff QRx = QQ^Hb \iff Q^HQRx = Q^HQQ^Hb \iff Rx = Q^Hb$$
So if we want to solve $Ax \approx b$, then we will actually solve $Rx = Q^Hb$. For a real vector space, the complex conjugate has no effect; we would have $Rx = Q^Tb$.

There is another approach, based on the normal equations.
$$Rx = Q^Hb \iff Q^HQRx = Q^Hb \iff R^HQ^HQRx = R^HQ^Hb \iff A^HAx = A^Hb$$
This method doesn't require a QR decomposition, but we would have to solve a general system anyway. Since $R$ is in echelon form, solving $Rx = Q^Hb$ is faster. Furthermore, solving the normal equations is numerically less stable.

We could also calculate the projection explicitly: that is, calculate $\bar{b} = QQ^Hb$ and then solve $Ax = \bar{b}$. This requires more work since the system is not in echelon form.

The QR method is, in some sense, optimal.
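A minimal sketch of the QR approach in Python/numpy (not part of the notes; it assumes $A$ has full column rank, so that the reduced $R$ is square and invertible, and uses the matrix from Problem 9.16 with an arbitrary right-hand side):

```python
import numpy as np

A = np.array([[1., 2.],
              [1., 1.],
              [1., 3.],
              [-1., -2.]])
b = np.array([1., 0., 2., 1.])

Q, R = np.linalg.qr(A)            # reduced QR: Q is 4x2, R is 2x2
x = np.linalg.solve(R, Q.T @ b)   # solve the (exact) system Rx = Q^T b
print(x)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # agrees with lstsq
print(np.allclose(A @ x, b))      # False here: the solution is approximate
```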

Example 9.18. Solve
$$\begin{bmatrix} 0 & 1 & 2 & 1 \\ 1 & 1 & 4 & 2 \\ 1 & 3 & 8 & 4 \\ 1 & -1 & 0 & -3 \end{bmatrix}x \approx \begin{bmatrix}1\\1\\1\\1\end{bmatrix}$$

We already know a QR decomposition, so we will solve the (exact!) system $Rx = Q^Tb$.
$$\begin{bmatrix} \sqrt3 & \sqrt3 & 4\sqrt3 & \sqrt3 \\ 0 & 3 & 6 & 5 \\ 0 & 0 & 0 & \sqrt2 \end{bmatrix}x = Q^Tb = \begin{bmatrix} 0 & 1/\sqrt3 & 1/\sqrt3 & 1/\sqrt3 \\ 1/3 & 0 & 2/3 & -2/3 \\ -\sqrt2/3 & \sqrt2/2 & -\sqrt2/6 & -\sqrt2/3 \end{bmatrix}\begin{bmatrix}1\\1\\1\\1\end{bmatrix} = \begin{bmatrix}\sqrt3\\1/3\\-\sqrt2/3\end{bmatrix}$$


Being in echelon form, we can solve this directly, starting from the bottom. There is one parameter: $x_3 = t$.
$$\sqrt2\,x_4 = -\frac{\sqrt2}{3} \implies x_4 = -\frac13$$
$$3x_2 + 6x_3 + 5x_4 = \frac13 \implies x_2 = \frac23 - 2t$$
$$\sqrt3\,x_1 + \sqrt3\,x_2 + 4\sqrt3\,x_3 + \sqrt3\,x_4 = \sqrt3 \implies x_1 = \frac23 - 2t$$
We can also calculate the projection, in two different ways.
$$\bar{b} = (QQ^T)b = Q(Q^Tb) = \begin{bmatrix}\tfrac13 & \tfrac23 & \tfrac43 & 1\end{bmatrix}^T \qquad \bar{b} = Ax = \begin{bmatrix}\tfrac13 & \tfrac23 & \tfrac43 & 1\end{bmatrix}^T$$
Note that we can also use the QR method to solve exact systems. So given $Ax = b$, one approach would be to find $A = QR$ and then solve $Rx = Q^Hb$. We know the general solution, but we don't know if the original system had an exact solution or not!

To decide whether the solution is exact or not, we could calculate the approximation $\bar{b}$, since what we really solved was $Ax = \bar{b}$; if $\bar{b} = b$ then the solution is exact. We can compute this in two ways: $\bar{b} = (QQ^H)b$ or $\bar{b} = Ax$ (now that we know the solution $x$).

exercises

1. Consider $\mathbb{C}^n$ with the standard inner product $\langle w|z\rangle = w^Hz$. Let $U$ be the subspace spanned by $\left\{\begin{bmatrix}1\\i\\0\end{bmatrix}, \begin{bmatrix}0\\1\\i\end{bmatrix}\right\}$.
a) Find an orthonormal basis for $U$.
b) Using the previous part, find the projection matrix $P$ such that $Pz = \mathrm{proj}_U(z)$.

2. Consider $\mathbb{C}^n$ with the standard inner product $\langle w|z\rangle = w^Hz$. Suppose $U_1$ and $U_2$ are orthogonal subspaces of $\mathbb{C}^n$ (meaning that $\langle z_1|z_2\rangle = 0$ for all $z_1 \in U_1$ and $z_2 \in U_2$). Let $P_1$ and $P_2$ be the projection matrices onto $U_1$ and $U_2$.
a) Show that $P_1z$ is orthogonal to $U_2$ and that $P_2z$ is orthogonal to $U_1$ for every $z \in \mathbb{C}^n$. Conclude from this that $P_1P_2 = P_2P_1 = 0$.
b) Using the expression $P_i = Q_iQ_i^H$ for appropriate $Q_i$, show that $Q_1^HQ_2 = Q_2^HQ_1 = 0$. Conclude from this that $P_1P_2 = P_2P_1 = 0$.

3. Consider a $Q_0R_0$-decomposition of $A$ with $Q_0 = \begin{bmatrix} 1 & 2 & 0 \\ -2 & 2 & 0 \\ 2 & 1 & 0 \end{bmatrix}$ and $R_0 = \begin{bmatrix} 1 & -2 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}$.
a) Give the QR decomposition.
b) Find the projection matrix $P$ such that $Px = \mathrm{proj}_U(x)$, where $U$ is the column space of $A$.
c) Find the best approximate solution $\bar{x}$ to $Ax = b = \begin{bmatrix}0\\1\\1\end{bmatrix}$.
d) Compute $\bar{b} = \mathrm{proj}_U(b)$. Is your solution to the linear system above exact or approximate?
e) Compute $\bar{b} = A\bar{x}$. Is your solution to the linear system above exact or approximate?


4. Consider the matrix $A = \begin{bmatrix} 1 & 2 & i \\ 0 & 2 & 0 \\ i & 0 & 1 \end{bmatrix}$.
a) Give an $A = Q_0R_0$ decomposition.
b) Give an $A = QR$ decomposition.

5. Suppose that $A = QR$ is a QR-decomposition of a real matrix $A$. Under what additional conditions will it be the case that $A^TA = R^TR$?

6. Say that a complex matrix is special if it is square, lower triangular, and its columns are orthonormal.
a) Find all special matrices.
b) Thus, find all matrices $A$ such that in finding $A = QR$ and $A = LU$ we have $Q = L$.


MAT3341 : Applied Linear Algebra Mike Newman, april 2018 +

10. Linear Transformations

motivation

A matrix can be thought of as a function. If $A$ is $m \times n$, then it corresponds to the function "multiplication by $A$".

$$A : \mathbb{R}^n \to \mathbb{R}^m, \qquad x \mapsto Ax$$

This is in fact a very useful concept. Here are a few applications of matrices as functions.

Example 10.1. Let $x_t$ be a vector of populations (e.g., different species in an ecosystem) at time $t$. Sometimes we can write $x_{t+1}$ in terms of $x_t$, and sometimes this can be approximated by a linear relationship. In this case we have a linear dynamical system $x_{t+1} = Ax_t$, where the matrix $A$ represents the interactions between the species. It gives the transition between time $t$ and time $t+1$.

Example 10.2. Let $A$ be the "matrix of the internet" (rows and columns indexed by web pages, with $A_{ij} = 1$ if page $j$ links to page $i$). This matrix determines a Markov chain. The stationary distribution of this chain corresponds to a "typical" state of a random web-surfer. This is in fact the basis of Google's PageRank.

Example 10.3. We want to send a message $x$, but the communication system is not perfectly reliable: sometimes the message gets corrupted. The solution is to send not $x$, but rather a message $y$ with enough redundancy that we can detect and even correct errors introduced by the system. This is often done with a linear code, where $y = Ax$ for a certain matrix $A$. The linearity allows us to separate the message from the error, and to correctly read a corrupted message.

definition and examples

Let $V$ and $W$ be two vector spaces. The function $T : V \to W$ is a linear transformation if for every $v_1, v_2 \in V$ and every scalar $\alpha$ we have
$$T(v_1+v_2) = T(v_1) + T(v_2)$$
$$T(\alpha v_1) = \alpha T(v_1)$$
Alternatively we could check a combined condition.
$$T(\alpha v_1 + v_2) = \alpha T(v_1) + T(v_2)$$

Problem 10.4. Verify that the following three conditions are equivalent.

• $T(v_1+v_2) = T(v_1) + T(v_2)$ and $T(\alpha v_1) = \alpha T(v_1)$

• $T(\alpha v_1 + v_2) = \alpha T(v_1) + T(v_2)$

• $T(\alpha v_1 + \beta v_2) = \alpha T(v_1) + \beta T(v_2)$

Example 10.5. Let $A$ be an $m \times n$ matrix, and $T : \mathbb{R}^n \to \mathbb{R}^m$ defined by $T(x) = Ax$. Show that $T$ is a linear transformation.


The condition follows directly from the properties of matrix multiplication.
$$T(x_1+x_2) = A(x_1+x_2) = Ax_1 + Ax_2 = T(x_1) + T(x_2)$$
$$T(\alpha x_1) = A\alpha x_1 = \alpha Ax_1 = \alpha T(x_1)$$

We sometimes say that a transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ or $T : \mathbb{C}^n \to \mathbb{C}^m$ that is defined in this way by $T(x) = Ax$ for a matrix $A$ is a matrix transformation. There are many other linear transformations.

Example 10.6. Let $V = W = C[\mathbb{R}]$, the vector space of continuous functions on the real numbers. For instance $\sin(x)$ and $x^2-5x+3$ are functions in $C[\mathbb{R}]$, but $\tan(x)$ is not, since $\tan(90°)$ does not exist.

Let $T : V \to W$ be defined by $T(f) = e^tf(t)$. Show that $T$ is a linear transformation.

This time we'll check the combined condition.
$$T(\alpha f + g) = e^t\big(\alpha f(t) + g(t)\big) = \alpha e^tf(t) + e^tg(t) = \alpha T(f) + T(g)$$
In general, if $h(t)$ is any arbitrary function in $C[\mathbb{R}]$, then the transformation $T(f) = h(t)f(t)$ is linear; $h(t) = e^t$ is just a special case.

Example 10.7. Let $V$ be a vector space of dimension $n$, and $B$ a basis for $V$.

Then $\varphi_B : V \to \mathbb{R}^n$ such that $\varphi_B(x)$ is the vector of coordinates of $x$ with respect to the basis $B$ is a linear transformation (even more, it's an isomorphism). We leave the verification as an exercise.

Example 10.8. Let $V = W = C^\infty[\mathbb{R}]$, the vector space of infinitely differentiable functions.

Let $T : V \to V$ be defined by $T(f) = t\frac{d}{dt}f(t) - \sin(t)f(t)$. Show that $T$ is linear.

We verify the conditions directly.
$$T(\alpha f + g) = t\frac{d}{dt}\big(\alpha f(t) + g(t)\big) - \sin(t)\big(\alpha f(t) + g(t)\big)$$
$$= t\left(\alpha\frac{d}{dt}f(t) + \frac{d}{dt}g(t)\right) - \alpha\sin(t)f(t) - \sin(t)g(t)$$
$$= \alpha t\frac{d}{dt}f(t) + t\frac{d}{dt}g(t) - \alpha\sin(t)f(t) - \sin(t)g(t)$$
$$= \alpha\left(t\frac{d}{dt}f(t) - \sin(t)f(t)\right) + \left(t\frac{d}{dt}g(t) - \sin(t)g(t)\right) = \alpha T(f) + T(g)$$

We can add linear transformations, just as we would functions. For instance if $S$ and $T$ are linear transformations we define $R = S + T$ by $R(v) = S(v) + T(v)$. We can also multiply linear transformations by scalars: $P = \alpha T$ is defined by $P(v) = \alpha T(v)$. These are the standard operations for linear transformations.

Theorem 10.9. Let $\mathcal{L}(V,W)$ be the set of all linear transformations from $V$ to $W$, with the standard operations.
Then $\mathcal{L}(V,W)$ is a vector space. More precisely, it is a subspace of the vector space of all functions from $V$ to $W$.


If we accept that functions form a vector space (if you don't then that's an exercise), then the proof is just the subspace test. Otherwise, the proof is just verifying the ten axioms of a vector space. We won't verify any, but it's a good idea to check a few yourself.

Problem 10.10. Let $S$ and $T$ be two linear transformations. Let $R_1 = S + T$ and $R_2 = T + S$. Show that $R_1(v) = R_2(v)$ for all vectors $v$. (axiom of commutativity)

Problem 10.11. Let $S$ and $T$ be two linear transformations from $V$ to $W$, and $\alpha$ a scalar. Let $R = \alpha S + T$. Show that $R$ is a linear transformation from $V$ to $W$. Show also that the function $Z : V \to W$ with $Z(v) = 0$ is a linear transformation from $V$ to $W$, and furthermore that $Z$ is the zero of $\mathcal{L}(V,W)$.

range and null space

There are two important spaces that arise from a linear transformation.

Let $T : V \to W$ be a linear transformation. The range of $T$ is the set of all vectors that are "$T$ of something". The null space of $T$ is the set of all vectors that $T$ sends to zero.

range of $T$: $\quad \mathrm{im}(T) = \{T(v) \mid v \in V\} = \{w \in W \mid w = T(v) \text{ for some } v \in V\}$

null space of $T$: $\quad \mathrm{nul}(T) = \{v \in V \mid T(v) = 0\}$

These are not simply sets, they are subspaces.

Proposition 10.12. If $T : V \to W$ is linear then its range $\mathrm{im}(T)$ is a subspace of $W$.

Proof. We need to check that if $u, v \in \mathrm{im}(T)$ and $\alpha$ is a scalar, then $\alpha u + v \in \mathrm{im}(T)$.

If $u, v \in \mathrm{im}(T)$ then there exist $x, y \in V$ with $T(x) = u$ and $T(y) = v$. Then we see that:
$$\alpha u + v = \alpha T(x) + T(y) = T(\alpha x + y)$$
Thus $\alpha u + v \in \mathrm{im}(T)$.

Proposition 10.13. If $T : V \to W$ is linear then its null space $\mathrm{nul}(T)$ is a subspace of $V$.

Proof. We need to check that if $u, v \in \mathrm{nul}(T)$ and $\alpha$ is a scalar, then $\alpha u + v \in \mathrm{nul}(T)$.

If $u, v \in \mathrm{nul}(T)$ then $T(u) = 0$ and $T(v) = 0$. Then we see that:
$$T(\alpha u + v) = \alpha T(u) + T(v) = \alpha 0 + 0 = 0$$
Thus $\alpha u + v \in \mathrm{nul}(T)$.

Problem 10.14. If $T$ is a matrix transformation (i.e., $T(v) = Av$ for some matrix $A$), identify $\mathrm{im}(T)$ and $\mathrm{nul}(T)$ in terms of $A$.

The range and null space are closely connected to finding solutions. Linear transformations are generalizations of matrix multiplication, so we expect something similar.


Proposition 10.15. Let $T : V \to W$ be a linear transformation.

• The equation $T(x) = w$ has at least one solution for every $w \in W$ if and only if $\mathrm{im}(T) = W$.

• The equation $T(x) = w$ has at most one solution for every $w \in W$ if and only if $\mathrm{nul}(T) = \{0\}$.

Proof. exercise!

It is worth thinking about what this says for a matrix transformation T (x) = Ax.

If $T(x) = w$ has at least one solution for every $w$ then we say that $T$ is surjective. If $T(x) = w$ has at most one solution for every $w$ then we say that $T$ is injective. So $T$ is surjective if and only if $\mathrm{im}(T) = W$, and $T$ is injective if and only if $\mathrm{nul}(T) = \{0\}$.
If $T$ is injective and surjective then $T$ is an isomorphism.

Problem 10.16. Review the definition of an isomorphism that we saw earlier. Verify that it is equivalent to a linear transformation that is injective and surjective.

Sometimes we say bijective for "injective and surjective". So an isomorphism is a bijective linear transformation. If $T$ is bijective then $T(x) = w$ has exactly one solution for all vectors $w$.

Corollary 10.17. Let $T : V \to W$ be a linear transformation. The equation $T(x) = w$ has exactly one solution for every $w \in W$ if and only if $\mathrm{im}(T) = W$ and $\mathrm{nul}(T) = \{0\}$, if and only if $T$ is a bijection.

dimensions

Recall that for a matrix the dimension of the null space is the number of pivotless columns. In other words, the dimension of the column space plus the dimension of the null space is the number of columns. A linear transformation has neither columns nor pivots, but the same idea still holds.

Theorem 10.18. Let $T : V \to W$ be a linear transformation with $\dim(V) < \infty$.
Then $\dim(V) = \dim(\mathrm{im}(T)) + \dim(\mathrm{nul}(T))$.

Note that $\mathrm{im}(T)$ is a subspace of $W$ and $\mathrm{nul}(T)$ is a subspace of $V$. This result says that the dimensions "lost" in going from $V$ to $W$ correspond to the null space.

Proof. Since $\dim(V) < \infty$, $\dim(\mathrm{im}(T)) < \infty$ also. So let $\{w_1, w_2, \cdots, w_r\}$ be a basis for $\mathrm{im}(T)$. Since these are vectors in the range of $T$ we know there exist $v_j \in V$ with $w_j = T(v_j)$ for $1 \leq j \leq r$. Let $\{u_1, u_2, \cdots, u_k\}$ be a basis of $\mathrm{nul}(T)$. (So $r = \dim(\mathrm{im}(T))$ and $k = \dim(\mathrm{nul}(T))$.) We will show that $\{v_1, v_2, \cdots, v_r, u_1, u_2, \cdots, u_k\}$ is a basis for $V$.

First we will show that the given set is linearly independent, by considering a linear combination that gives the zero vector, and applying $T$ to both sides.
$$0 = \alpha_1v_1 + \alpha_2v_2 + \cdots + \alpha_rv_r + \beta_1u_1 + \beta_2u_2 + \cdots + \beta_ku_k$$
$$T(0) = \alpha_1T(v_1) + \alpha_2T(v_2) + \cdots + \alpha_rT(v_r) + \beta_1T(u_1) + \beta_2T(u_2) + \cdots + \beta_kT(u_k)$$
$$0 = \alpha_1T(v_1) + \alpha_2T(v_2) + \cdots + \alpha_rT(v_r) + 0 + \cdots + 0$$
$$0 = \alpha_1w_1 + \alpha_2w_2 + \cdots + \alpha_rw_r$$


Since $\{w_1, w_2, \cdots, w_r\}$ is a basis for $\mathrm{im}(T)$, the $\alpha_j$'s must all be $0$. But then $0 = \beta_1u_1 + \cdots + \beta_ku_k$, and so the $\beta_j$'s must all be $0$. (why?)

Now we show that the given set spans $V$. Let $v \in V$; since $T(v) \in \mathrm{im}(T)$ there exist $\alpha_j$'s such that
$$T(v) = \alpha_1w_1 + \alpha_2w_2 + \cdots + \alpha_rw_r = \alpha_1T(v_1) + \alpha_2T(v_2) + \cdots + \alpha_rT(v_r)$$
Set $v_0 = v - (\alpha_1v_1 + \cdots + \alpha_rv_r)$. Then $T(v_0) = 0$ and so $v_0 \in \mathrm{nul}(T)$. So there exist $\beta_j$'s such that
$$v_0 = \beta_1u_1 + \beta_2u_2 + \cdots + \beta_ku_k$$
This gives that
$$v = (\alpha_1v_1 + \alpha_2v_2 + \cdots + \alpha_rv_r) + v_0 = \alpha_1v_1 + \cdots + \alpha_rv_r + \beta_1u_1 + \cdots + \beta_ku_k$$
Thus $\{v_1, v_2, \cdots, v_r, u_1, u_2, \cdots, u_k\}$ is a basis for $V$ and $\dim(V) = r + k$.

Problem 10.19. In the proof, we used the fact that if $\dim(V) < \infty$ then $\dim(\mathrm{im}(T)) < \infty$ also. Prove this.

Problem 10.20. If $T$ is a matrix transformation (i.e., $T(v) = Av$ for some matrix $A$), identify Theorem 10.18 in terms of $A$.
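For a matrix transformation the theorem is just "rank plus nullity equals the number of columns", which is easy to see numerically. A sketch (Python/numpy, not part of the notes; the matrix is the rank-3 example from chapter 9):

```python
import numpy as np

A = np.array([[0., 1., 2., 1.],
              [1., 1., 4., 2.],
              [1., 3., 8., 4.],
              [1., -1., 0., -3.]])
n = A.shape[1]                   # dim(V): number of columns
r = np.linalg.matrix_rank(A)     # dim(im T): rank of A
print(n, r, n - r)               # 4 = 3 + 1 for this matrix
print(r + (n - r) == n)          # True: Theorem 10.18
```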

inverse

If a linear transformation $T : V \to W$ is bijective, then for $w \in W$ there is a unique vector $v$ such that $T(v) = w$. This allows us to define the inverse of $T$ as $T^{-1} : W \to V$ such that $T^{-1}(w) = v$ when $w = T(v)$. Surjectivity guarantees that there exists such a $v$ and injectivity guarantees that this $v$ is unique.

Proposition 10.21. If $T : V \to W$ is a bijective linear transformation, then $T^{-1}$ is a linear transformation.

Proof. Let $w_1, w_2 \in W$ and $\alpha$ a scalar. Since $T$ is bijective there exist unique vectors $v_1, v_2$ such that $T(v_1) = w_1$ and $T(v_2) = w_2$, and so $v_1 = T^{-1}(w_1)$ and $v_2 = T^{-1}(w_2)$. Since $T$ is linear we have
$$T(\alpha v_1 + v_2) = \alpha T(v_1) + T(v_2) = \alpha w_1 + w_2$$
We conclude that
$$T^{-1}(\alpha w_1 + w_2) = \alpha v_1 + v_2 = \alpha T^{-1}(w_1) + T^{-1}(w_2)$$
Thus $T^{-1}$ is linear.

Problem 10.22. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a bijective linear transformation defined by $T(x) = Ax$. Show that $m = n$ and that $T$ has an inverse, namely $T^{-1}(y) = A^{-1}y$.

adjoint

There is another transformation associated to T ; this one depends on the choice of inner product.


Let $T : V \to W$ be a linear transformation and $\langle\cdot|\cdot\rangle_V$ and $\langle\cdot|\cdot\rangle_W$ be inner products on $V$ and $W$. If there exists a linear transformation $T^*$ such that for all $v \in V$ and $w \in W$ we have
$$\langle T(v)|w\rangle_W = \langle v|T^*(w)\rangle_V$$
then $T^*$ is an adjoint for $T$.

Problem 10.23. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation with $T(x) = Ax$, with the standard inner product for $\mathbb{R}^n$ and $\mathbb{R}^m$.

Show that $T^*(y) = A^Ty$ is an adjoint for $T$. So the adjoint of a linear transformation is a generalization of the transpose of a matrix.

What changes if we consider complex vector spaces, $T : \mathbb{C}^n \to \mathbb{C}^m$, with $T(z) = Az$, the matrix $A$ being complex?

Adjoints are linear.

Theorem 10.24. If $T \in \mathcal{L}(V,W)$ then $T^*$ is linear; that is, $T^* \in \mathcal{L}(W,V)$.

Proof. See Exercise 10.9.

In fact, adjoints exist in any vector space with an inner product. This follows from the Riesz representation theorem (finite version).1

Theorem 10.25. Let $V$ be a finite dimensional vector space over $F$ (where $F$ is either $\mathbb{R}$ or $\mathbb{C}$) with an inner product $\langle\cdot|\cdot\rangle$. If $g : V \to F$ is linear, then there exists $z \in V$ with $g(x) = \langle z|x\rangle$.

Using Theorem 10.25, we can establish existence and uniqueness of adjoints. We won’t go into detail.

Theorem 10.26. Let $T \in \mathcal{L}(V,W)$. Then $T$ has a unique adjoint $T^*$.

matrix of a transformation

Matrices (i.e., matrix linear transformations) seem "simpler" than linear transformations in general. This is one reason that we want to represent a general linear transformation by a matrix.

Theorem 10.27. Let $T : V \to W$ be a linear transformation. Let $B = \{v_1, \cdots, v_n\}$ be a basis for $V$ and $C = \{w_1, \cdots, w_m\}$ a basis for $W$.
Let $A$ be the matrix whose $j$-th column is $\varphi_C(T(v_j))$. Let $v \in V$ and $w \in W$, and set $x = \varphi_B(v)$ and $y = \varphi_C(w)$.
Then $T(v) = w$ if and only if $Ax = y$.

The matrix $A$ acts on vectors of coordinates in a way that is consistent with the action of $T$ on vectors of $V$. The point of this result is that we can calculate with matrices as in $Ax$ instead of more general calculations like $T(v)$. We can work with arbitrary spaces $V$ and $W$ as if they were $\mathbb{R}^n$ (or $\mathbb{C}^n$). This is an important result, but the proof is essentially an algebraic substitution.

1 In fact, the Riesz representation applies to infinite dimensional spaces as well, as long as they are complete with respect to the norm in question. An inner product space (vector space with an inner product) is complete if every Cauchy sequence is convergent. An inner product space (possibly infinite dimensional) over $\mathbb{R}$ or $\mathbb{C}$ that is complete is called a Hilbert space, and Theorem 10.25 in fact applies to Hilbert spaces. We won't need the general version in this course.


Proof. We use xj for the j-th position of x and aij for the (i, j)-th position of A, and notice that if we fix a column j then the aij are the coordinates of T(vj) with respect to the basis C.

T(v) = T(x1v1 + x2v2 + · · · + xnvn)
     = x1T(v1) + x2T(v2) + · · · + xnT(vn)
     = x1(a11w1 + a21w2 + · · · + am1wm) + x2(a12w1 + a22w2 + · · · + am2wm)
       + · · · + xn(a1nw1 + a2nw2 + · · · + amnwm)
     = (a11x1 + a12x2 + · · · + a1nxn)w1 + (a21x1 + a22x2 + · · · + a2nxn)w2
       + · · · + (am1x1 + am2x2 + · · · + amnxn)wm

By definition the values in parentheses are exactly the coordinates of T(v) with respect to the basis C. In other words, for 1 ≤ i ≤ m we have

yi = ai1x1 + ai2x2 + · · · + ainxn

Thus w = T(v) is equivalent to y = Ax.

The matrix A is the matrix of the transformation T with respect to the bases B and C. Note that the matrix depends on the transformation as well as the two bases involved.

Example 10.28. Let V = M22(C) and W = P2(C), complex 2 × 2 matrices and complex polynomials of degree at most 2. We will use the following two bases for V and W.

M22(C) :  [1 0; 0 1],  [0 1; 1 0],  [1 0; 0 −1],  [0 −i; i 0]

P2(C) :  1 + it,  1 − it,  1 + t + 2it²

Consider the transformation T : V → W with T([a b; c d]) = (a + c) + (b + d)t + (a + d)t². Find the matrix of T with respect to the two given bases.

We find the coordinates, with respect to the second basis, of the image under T of each vector of the first basis. We start with the first vector of B, so we want to solve the following.

T([1 0; 0 1]) = 1 + t + 2t² = α1(1 + it) + α2(1 − it) + α3(1 + t + 2it²)
              = (α1 + α2 + α3) + (iα1 − iα2 + α3)t + (2iα3)t²

This gives the following system of equations and augmented matrix.

1 = α1 + α2 + α3
1 = iα1 − iα2 + α3
2 = 2iα3

[1 1 1 | 1; i −i 1 | 1; 0 0 2i | 2]  −−R2 ↦ R2 − iR1−→  [1 1 1 | 1; 0 −2i 1−i | 1−i; 0 0 2i | 2]

We get the following solution.

2iα3 = 2  →  α3 = −i
−2iα2 + (1 − i)α3 = −2iα2 + (1 − i)(−i) = 1 − i  →  α2 = i
α1 + α2 + α3 = α1 + (i) + (−i) = 1  →  α1 = 1


So the first column of A is [1 i −i]ᵀ. These are the coordinates, with respect to the basis C, of the image under T of the first vector of B.

The second column is calculated from the second vector of B.

T([0 1; 1 0]) = 1 + t = α1(1 + it) + α2(1 − it) + α3(1 + t + 2it²)
              = (α1 + α2 + α3) + (iα1 − iα2 + α3)t + (2iα3)t²

[1 1 1 | 1; i −i 1 | 1; 0 0 2i | 0]  −−R2 ↦ R2 − iR1−→  [1 1 1 | 1; 0 −2i 1−i | 1−i; 0 0 2i | 0]

We get the solution α3 = 0, α2 = (1 + i)/2 and α1 = (1 − i)/2. The second column of A is [(1−i)/2 (1+i)/2 0]ᵀ.

The third column of A is calculated from the third vector of B.

T([1 0; 0 −1]) = 1 − t = α1(1 + it) + α2(1 − it) + α3(1 + t + 2it²)
               = (α1 + α2 + α3) + (iα1 − iα2 + α3)t + (2iα3)t²

[1 1 1 | 1; i −i 1 | −1; 0 0 2i | 0]  −−R2 ↦ R2 − iR1−→  [1 1 1 | 1; 0 −2i 1−i | −1−i; 0 0 2i | 0]

We get the solution α3 = 0, α2 = (1 − i)/2 and α1 = (1 + i)/2. The third column of A is [(1+i)/2 (1−i)/2 0]ᵀ.

We could do similar calculations for the fourth column, or we could observe that T([0 −i; i 0]) = i − it = i(1 − t). We already know that φC(1 − t) = [(1+i)/2 (1−i)/2 0]ᵀ, and so φC(i(1 − t)) = iφC(1 − t) = [(−1+i)/2 (1+i)/2 0]ᵀ.

The matrix of the transformation is

A = [1 (1−i)/2 (1+i)/2 (−1+i)/2; i (1+i)/2 (1−i)/2 (1+i)/2; −i 0 0 0]
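These coordinate computations are easy to check numerically. A sketch in Python/numpy follows (encoding polynomials as coefficient vectors in the monomial basis {1, t, t²} is our own bookkeeping choice): each column of A is obtained by solving the same 3 × 3 system that appears in the augmented matrices above.

```python
import numpy as np

# Columns of M are the C-basis polynomials 1+it, 1-it, 1+t+2it^2,
# written in the monomial basis {1, t, t^2}.
M = np.array([[1, 1, 1],
              [1j, -1j, 1],
              [0, 0, 2j]])

def T(a, b, c, d):
    # T([a b; c d]) = (a+c) + (b+d)t + (a+d)t^2, as a coefficient vector
    return np.array([a + c, b + d, a + d], dtype=complex)

# Images of the four basis matrices of M22(C)
images = [T(1, 0, 0, 1), T(0, 1, 1, 0), T(1, 0, 0, -1), T(0, -1j, 1j, 0)]
A = np.column_stack([np.linalg.solve(M, w) for w in images])
print(np.round(A, 3))   # matches the 3 x 4 matrix computed above
```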

Problem 10.29. In calculating that φC(i(1 − t)) = [(−1+i)/2 (1+i)/2 0]ᵀ, we used a property of the function φC. Which one? Calculate φC(i(1 − t)) directly, like we did for the other columns.

Example 10.30. For the previous transformation, calculate T(v) for v = [2 1+2i; 1−2i 0] via the matrix of the transformation, and check this directly.

We see that

[2 1+2i; 1−2i 0] = 1[1 0; 0 1] + 1[0 1; 1 0] + 1[1 0; 0 −1] − 2[0 −i; i 0]

and so the coordinates of v with respect to B are x = φB(v) = [1 1 1 −2]ᵀ. We calculate y = φC(T(v)) as y = Ax = [3−i 0 −i]ᵀ. This vector is the coordinates of w, so we can express w in terms of the second basis.

w = (3 − i)(1 + it) + (0)(1 − it) + (−i)(1 + t + 2it²) = (3 − 2i) + (1 + 2i)t + 2t²


We can also get T(v) directly, which of course gives the same thing.

T(v) = ((2) + (1 − 2i)) + ((1 + 2i) + (0))t + ((2) + (0))t² = (3 − 2i) + (1 + 2i)t + 2t²

It might appear that the "direct" method is easier. But in reality we need only calculate the matrix A once. Also the operation T can be more complicated, whereas the matrix multiplication is always simple.
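Continuing the numpy sketch above (this assumes A and M from the previous code block), Example 10.30 is a single matrix–vector product:

```python
# Coordinates of v with respect to B, read off from the decomposition above
x = np.array([1, 1, 1, -2])
y = A @ x                  # coordinates of T(v) with respect to C
print(np.round(y, 3))      # [3-i, 0, -i]
print(np.round(M @ y, 3))  # back to monomial coefficients: (3-2i) + (1+2i)t + 2t^2
```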

Problem 10.31. Consider the transformation T : R² → P1 with T([a; b]) = (2a − b) + (a + b)t.

Determine the matrix A of this transformation with respect to the following bases.

R² : B = {[1; 0], [0; 1]}        P1 : C = {1 + t, 1 − t}

Determine the matrix A′ of this transformation with respect to the following bases.

R² : B′ = {[1; 0], [1; 1]}        P1 : C′ = {1, t}

matrix of a transformation and change of basis

The matrix of a transformation is with respect to a particular choice of bases. What if we would rather know the matrix of the same transformation with respect to different bases? We'd need a change of basis matrix.

Theorem 10.32. Let A be the matrix of a transformation T : V → W with respect to a basis B for V and C for W. Then the matrix of T with respect to a basis B′ for V and C′ for W is A′ = MC→C′AMB′→B.

Proof. We know that φC(T(v)) = AφB(v). But φB(v) = MB′→BφB′(v) and φC(T(v)) = MC′→CφC′(T(v)). This gives

MC′→CφC′(T(v)) = AMB′→BφB′(v)
φC′(T(v)) = (MC′→C)⁻¹AMB′→BφB′(v)
φC′(T(v)) = MC→C′AMB′→BφB′(v)
φC′(T(v)) = (MC→C′AMB′→B)φB′(v)

So by definition the matrix of T with respect to B′ and C′ is MC→C′AMB′→B.

Problem 10.33. In Problem 10.31, you calculated A with respect to two different pairs of bases. Calculate MB′→B and MC→C′, and verify that A′ = MC→C′AMB′→B.
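Here is a numerical sketch of Problem 10.31 and Problem 10.33 in Python/numpy; everything is encoded in the monomial basis {1, t} of P1, which is our own bookkeeping choice, not part of the problem.

```python
import numpy as np

def T(a, b):
    return np.array([2*a - b, a + b])   # (2a-b) + (a+b)t in monomial coordinates

C  = np.array([[1, 1], [1, -1]])        # columns: 1+t, 1-t; this is also M_{C->C'}
Bp = np.array([[1, 1], [0, 1]])         # columns of B' in the standard basis: M_{B'->B}

A  = np.linalg.solve(C, np.column_stack([T(1, 0), T(0, 1)]))  # matrix w.r.t. B and C
Ap = np.column_stack([T(1, 0), T(1, 1)])                      # matrix w.r.t. B' and C' = {1, t}
print(A)                              # [[1.5, 0.], [0.5, -1.]]
print(np.allclose(Ap, C @ A @ Bp))    # True: A' = M_{C->C'} A M_{B'->B}
```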

exercises

1. Let T : V → W and S : U → V be linear transformations. Prove that the function R : U → W defined by R(u) = T(S(u)) is a linear transformation.

2. Let S, T ∈ L (V,W ), and let α, β be scalars. Show that αS + βT ∈ L (V,W ).

3. Let Z : V → W be defined by Z(v) = 0 for all v ∈ V.
a) Show that Z ∈ L(V,W).
b) Show that Z + T = T + Z = T for all T ∈ L(V,W).


c) For each T ∈ L(V,W), define T̃ : V → W by T̃(v) = −T(v). Show that T + T̃ = T̃ + T = Z.
(Note that the "−" in −T(v) means the negative of the vector T(v) in the vector space W, and the "+" in T + T̃ means addition of two functions in L(V,W).)

4. Suppose that we accept that the set F(V,W) of functions V → W forms a vector space with addition and scalar multiplication defined in the usual way.
a) Specify what addition and scalar multiplication mean in F(V,W).
b) Show, using the subspace test, that L(V,W) is a subspace of F(V,W).

5. Let P be the vector space of polynomials (over R or C, your choice). Define S : P → P by S(p) = (d/dt)p(t) and T : P → P by T(p) = ∫₀ᵗ p(x) dx.
a) Is S injective? Surjective? Justify!
b) Is T injective? Surjective? Justify!
c) Are S and T inverses? "Clearly" the fundamental theorem of calculus says they are, right? Explain.

6. Prove Proposition 10.15 and Corollary 10.17.

7. Let T ∈ L(V,W), and T(v0) = w. Show that {v : T(v) = w} = {v0 + u : u ∈ nul(T)}.
8. Let T : Cⁿ → Cⁿ with T(z) = Az for some matrix A.
a) Give the transformations T⁻¹ and T∗, in terms of A.
b) A complex matrix A is unitary if AᴴA = I. Show that if A is unitary then T⁻¹ = T∗; that is, for a unitary matrix, the corresponding inverse and adjoint transformations are equal.

9. Prove Theorem 10.24. Let T ∈ L(V,W), and T∗ be the adjoint.
a) Show that for any x ∈ V, and u, v ∈ W and scalars α, β, we have 〈x|T∗(αu + βv)〉 = 〈x|αT∗(u) + βT∗(v)〉.
b) Show that if 〈x|y〉 = 〈x|y′〉 for every x, then y = y′.
c) Combine the above to conclude that T∗ ∈ L(W,V).

10. Consider the transformation T : P3 → P3 with T(f) = (d/dt)f.
a) Show that T ∈ L(P3, P3).
b) Find the matrix A1 of T with respect to the basis B = {1 + t, 1 − t, t² + t³, t² − t³} for P3.
c) Find the matrix A2 of T with respect to the standard basis E for P3.
d) Find the change of basis matrix MB→E.
e) Give an equation that relates the matrices A1, A2 and MB→E. You don't need to simplify, just give the relation.

11. Consider T : P2 → M2×2 defined by T(p(t)) = [p(1) (d/dt)p(t)|t=1; (d²/dt²)p(t)|t=1 p(2)].
a) Find the matrix A1 of T with respect to the following bases (B for P2 and C for M2×2 of course).

B = {1, 1 + t, 1 + t + t²}        C = {[1 0; 0 0], [1 1; 0 0], [1 1; 1 0], [1 1; 1 1]}

b) Find the matrix A2 of T with respect to the following bases (B for P2 and E for M2×2 of course).

B = {1, 1 + t, 1 + t + t²}        E = {[1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1]}

c) Find the change of basis matrix MC→E.


d) Give an equation that relates the matrices A1, A2 and MC→E. You don't need to simplify, just give the relation.

12. Consider T : P3 → P3 defined by T(f) = f(t) − f′(t). The following are all bases for P3 (no need to check).

B1 = {1 + t, 1 − t, t + t², t + t³}
B2 = {1, t, t², t³}
B3 = {1 + t − t², t + t³, t² + 3t³, 2t + 3t² + 4t³}

a) Give the matrix of T with respect to the bases B1 and B1.
b) Give the matrix of T with respect to the bases B1 and B2.
c) Give the matrix of T with respect to the bases B3 and B1.
d) We want to know the matrix of T with respect to the bases Bi → Bj, for each 1 ≤ i, j ≤ 3. Write down two (well-chosen) change of basis matrices, and give all the matrices of T with respect to Bi → Bj in terms of these two change of basis matrices and the matrix of T with respect to B1 → B1. Your answers can be in the form of matrix products, and so should involve no serious arithmetic.


MAT3341 : Applied Linear Algebra Mike Newman, april 2018 +

11. Transformations and Norms

norms of transformations

The space L(V,W) is a vector space, so we should be able to define a norm on it. A useful norm on L(V,W) is based on norms for V and W.

Let ‖·‖V and ‖·‖W be norms for V and W. We define a norm on L(V,W) as

‖T‖V,W = max_{v≠0} ‖T(v)‖W / ‖v‖V

Notice that ‖·‖V, ‖·‖W and ‖·‖V,W are all norms on different vector spaces. We can distinguish them by context (i.e., to which vector space does their argument belong?) so we often write them as ‖v‖, ‖T(v)‖ and ‖T‖, for simplicity.

Proposition 11.1. The function ‖T‖ = ‖T‖V,W is a norm on L (V,W ).

Proof. We need to check the properties of a norm.

Let α be a scalar and T1, T2 ∈ L(V,W) linear transformations. Set R = αT1 and S = T1 + T2, so that R(v) = αT1(v) and S(v) = T1(v) + T2(v).

‖R‖ = max_{v≠0} ‖R(v)‖W / ‖v‖V = max_{v≠0} ‖αT1(v)‖W / ‖v‖V = max_{v≠0} |α| ‖T1(v)‖W / ‖v‖V = |α| ‖T1‖

‖S‖ = max_{v≠0} ‖S(v)‖W / ‖v‖V = max_{v≠0} ‖T1(v) + T2(v)‖W / ‖v‖V
    ≤ max_{v≠0} (‖T1(v)‖W + ‖T2(v)‖W) / ‖v‖V
    ≤ max_{v≠0} ‖T1(v)‖W / ‖v‖V + max_{v≠0} ‖T2(v)‖W / ‖v‖V = ‖T1‖ + ‖T2‖

We also need to show that ‖T‖ = 0 if and only if T = 0. In order to do this we would first need to know what the zero vector of L(V,W) actually is. We leave this as an exercise.

Problem 11.2. Show that the zero of L(V,W) is the transformation Z : V → W with Z(v) = 0 for all v ∈ V. Show that ‖T‖ = 0 if and only if T = Z.

Problem 11.3. In showing that ‖S‖ ≤ ‖T1‖ + ‖T2‖ there were two "≤". Why are these both "≤" instead of "="?

∗ These notes are intended for students in mike’s MAT3341. For other uses please say “hi” to [email protected].


There is another detail with the norm of a transformation. Is the definition itself valid? Does the maximum exist? To better understand this problem, here are two examples in a numerical (as opposed to vectorial) context.

Example 11.4. Consider the following two "maximums".

max_{k≠0} {k² + k} = max{2, 6, 12, 20, · · ·} = ??

max_{k≠0} {(k − 1)/k} = max{0, 1/2, 2/3, 3/4, · · ·} = ??

Of course the "answers" are ∞ and 1. The problem is that neither of these values is actually in the respective sets: there is no value of k that gives ∞ in the first nor 1 in the second.

We define the supremum of a set as the least upper bound of the set. For a set K ⊆ R the supremum of K is the smallest value of m ∈ R ∪ {∞} such that k ≤ m for all k ∈ K. It's like a maximum, but we don't insist that m ∈ K.

Problem 11.5. Show that sup_{k≠0} {k² + k} = ∞, and that sup_{k≠0} {(k − 1)/k} = 1.

This issue doesn’t actually occur for the norm of a transformation, by the following result.

Proposition 11.6. If T : V → W is a linear transformation with dim(V) < ∞ then there exists a vector vmax ∈ V with

‖T(v)‖ / ‖v‖ ≤ ‖T(vmax)‖ / ‖vmax‖ for all v ≠ 0

So the maximum is attained by a vector vmax ∈ V.

In general we replace the "max" by a "sup" in the definition of the norm. For our purposes it is useful to know that it really is a maximum. It means that if we want to compute the norm of an actual transformation then there really is a vector that achieves it.

properties of norms of transformations

The norm ‖T‖ measures the maximum expansion factor that T can produce. Notice that this isn't strictly speaking an expansion of length, since we are comparing the norm of a vector T(v) in W to the norm of a vector v in V.

Proposition 11.7 (properties of the norm of a transformation).

• For all T : V → W and v ∈ V we have ‖T(v)‖ ≤ ‖T‖ ‖v‖.
• If I : V → V is defined by I(v) = v then ‖I‖ = 1.
• If T : V → W and S : U → V and we define R : U → W by R(u) = T(S(u)), then ‖R‖ ≤ ‖T‖ ‖S‖.
• If T : V → V and we define Tᵏ(v) as the application of T repeatedly, k times, Tᵏ(v) = T(T(· · · T(v) · · ·)), then ‖Tᵏ‖ ≤ ‖T‖ᵏ.

Proof. The first statement follows directly from the definition (but is often useful in this form).

The second statement follows from the fact that ‖I(v)‖ / ‖v‖ = ‖v‖ / ‖v‖ = 1 for all v 6= 0.


The third statement arises from the fact that the domain of the max is larger.

‖R‖ = max_{u≠0} ‖T(S(u))‖ / ‖u‖ = max_{u≠0} (‖T(S(u))‖ / ‖S(u)‖)(‖S(u)‖ / ‖u‖)
    ≤ (max_{u≠0} ‖T(S(u))‖ / ‖S(u)‖)(max_{u≠0} ‖S(u)‖ / ‖u‖)
    = (max_{u≠0} ‖T(S(u))‖ / ‖S(u)‖) ‖S‖
    ≤ ‖T‖ ‖S‖

The fourth is a special case (but again, a useful one).

An example of a linear transformation is a linear dynamical system, which describes the interactions among different populations. The fact that ‖Tᵏ‖ ≤ ‖T‖ᵏ tells us that the maximum effect of T after k time intervals is bounded by the k-th power of the maximum effect of T. We can understand the extreme case of the eventual behaviour in terms of the extreme case of one step.

norms of matrices

If T(v) = Av is a matrix transformation then the transformation T, and hence the norm of T, is determined by the matrix A. We can understand the norm of T as being a norm on the matrix A. So for a matrix we define its norm as

‖A‖ = max_{x≠0} ‖Ax‖ / ‖x‖

This is a useful point of view, especially for the "special" norms that we know for Rⁿ and Cⁿ.

‖A‖1 = max_{x≠0} ‖Ax‖1 / ‖x‖1        ‖A‖2 = max_{x≠0} ‖Ax‖2 / ‖x‖2        ‖A‖∞ = max_{x≠0} ‖Ax‖∞ / ‖x‖∞

In general determining the norm of a transformation is difficult, because of the max. But for these norms we can calculate directly from the matrix, as the following result explains.

Theorem 11.8. For an m × n matrix A we have:

‖A‖1 = max_{1≤j≤n} ∑_{i=1}^{m} |Aij|        ‖A‖2 = √(λmax(AᴴA))        ‖A‖∞ = max_{1≤i≤m} ∑_{j=1}^{n} |Aij|

where λmax(B) is the largest eigenvalue of the matrix B.

We give the result for ‖A‖2 for comparison; we'll see it later. Here is a proof for ‖A‖1; the proof for ‖A‖∞ is similar.


Proof. We first show that the norm ‖Ax‖1 is bounded.

‖Ax‖1 = ∑_{i=1}^{m} |(Ax)i| = ∑_{i=1}^{m} |∑_{j=1}^{n} Aij xj|
      ≤ ∑_{i=1}^{m} ∑_{j=1}^{n} |Aij| |xj| = ∑_{j=1}^{n} (|xj| ∑_{i=1}^{m} |Aij|)
      ≤ (∑_{j=1}^{n} |xj|)(max_{1≤j≤n} ∑_{i=1}^{m} |Aij|) = ‖x‖1 max_{1≤j≤n} ∑_{i=1}^{m} |Aij|

So ‖A‖1 ≤ max_{1≤j≤n} ∑_{i=1}^{m} |Aij|. On the other hand, let k be the column where max_{1≤j≤n} ∑_{i=1}^{m} |Aij| attains the maximum. Let ek be the vector having 1 in position k and 0 elsewhere. We see that ‖ek‖1 = 1 and furthermore that

‖A‖1 = max_{x≠0} ‖Ax‖1 / ‖x‖1 ≥ ‖Aek‖1 / ‖ek‖1 = ∑_{i=1}^{m} |Aik| = max_{1≤j≤n} ∑_{i=1}^{m} |Aij|

Thus ‖A‖1 = max_{1≤j≤n} ∑_{i=1}^{m} |Aij|.

Problem 11.9. Calculate ‖A‖1, ‖A‖∞ and ‖A‖2 for A = [1 −2; 3 4].
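Theorem 11.8 turns these norms into short computations. Here is a sketch in Python/numpy, using the matrix of Problem 11.9 so you can check your hand computation; numpy's built-in matrix norms follow the same formulas.

```python
import numpy as np

A = np.array([[1., -2.], [3., 4.]])

norm1   = np.abs(A).sum(axis=0).max()   # max column sum: 6
norminf = np.abs(A).sum(axis=1).max()   # max row sum: 7
norm2   = np.sqrt(np.linalg.eigvalsh(A.conj().T @ A).max())  # sqrt(lambda_max(A^H A))

print(norm1, norminf, norm2)
print(np.linalg.norm(A, 1), np.linalg.norm(A, np.inf), np.linalg.norm(A, 2))  # same values
```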

Recall that for any vector space, all norms are equivalent (that's Theorem 6.7). So this is still the case for norms on matrices. This makes the following exercise redundant . . . but it follows easily as a consequence of Theorem 11.8.

Problem 11.10. Show that for any m × n matrix B we have:

‖B‖1 ≤ m ‖B‖∞        ‖B‖∞ ≤ n ‖B‖1

Conclude that if (A⁽ᵏ⁾)_{k=1}^{∞} is a sequence of matrices then ‖A − A⁽ᵏ⁾‖1 → 0 if and only if ‖A − A⁽ᵏ⁾‖∞ → 0. As a hint, note that we always have |Bij| ≤ ‖B‖∞ and |Bij| ≤ ‖B‖1.

A matrix can represent the interactions of a dynamical system.

Example 11.11. Consider a population of foxes and chickens, such that the populations at time t + 1 are determined by the populations at time t by the following equations.

r_{t+1} = 0.6 r_t + 0.5 p_t
p_{t+1} = −0.18 r_t + 1.2 p_t

This can be written as the following matrix equation x_{t+1} = Ax_t.

[r_{t+1}; p_{t+1}] = [0.6 0.5; −0.18 1.2] [r_t; p_t]

The norm ‖A‖1 represents the maximum possible expansion in terms of "total population". The fact that ‖A‖1 = 1.7 means that the total number of animals can never increase by more than a


factor of 1.7 in one time step. If we measure the total population at each time interval, then the biggest possible increase is by a factor of 1.7.

The norm ‖A‖∞ represents the maximum possible expansion in terms of "dominant population". The fact that ‖A‖∞ = 1.38 means that the population of the dominant species can never increase by more than a factor of 1.38 in one time step. If we measure the population of the dominant species at each time interval, then the biggest possible increase is by a factor of 1.38.

Notice that "length = total population" or "length = population of the dominant species" might be useful, but "length = square root of the sum of the squares of the populations" is less directly practical.

Problem 11.12. Check that ‖A‖1 = 1.7 and that ‖A‖∞ = 1.38 for the previous example. For the same example, calculate ‖A‖2.
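A short simulation sketch in Python/numpy: it verifies the two norms and checks that the one-step growth of the total population never exceeds ‖A‖1 = 1.7. The initial populations are arbitrary test data.

```python
import numpy as np

A = np.array([[0.6, 0.5], [-0.18, 1.2]])
print(np.linalg.norm(A, 1), np.linalg.norm(A, np.inf))   # 1.7, 1.38

x = np.array([50., 10.])      # arbitrary initial populations (foxes, chickens)
for t in range(5):
    ratio = np.abs(A @ x).sum() / np.abs(x).sum()  # one-step growth of "total population"
    print(t, ratio)           # always <= 1.7, as ||A||_1 guarantees
    x = A @ x
```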

exercises

1. Find ‖A‖1, ‖A‖∞, ‖A‖2 for each of the following matrices.
a) A = [i −1; 1 −i]
b) A = [0 1 2; 0 0 3; 0 0 0]
c) A = [1 −1; −1 1]

2. Show that if A is symmetric (Aᵀ = A, real or complex) then ‖A‖1 = ‖A‖∞. Show that if A is Hermitian (Aᴴ = A) then ‖A‖1 = ‖A‖∞.
(This is unusual in that the same result applies to a symmetric real matrix, a symmetric complex matrix and a Hermitian matrix; typically symmetric complex matrices aren't as nice.)

3. Show that if A is Hermitian (Aᴴ = A) with eigenvalues λ1 ≥ · · · ≥ λn then

‖A‖2 = max{|λi| : 1 ≤ i ≤ n} = max{|λ1|, |λn|}

4. Show that if A is diagonal then ‖A‖1 = ‖A‖∞ = ‖A‖2.
5. Let ei be the vector with 1 in the i-th position and zeroes elsewhere, and e be the vector of all 1's (size determined by context). Let A be an m × n matrix, and define |A| to be the matrix such that |A|ij = |Aij|; so |A| takes the absolute value of each entry.
a) Show that ‖A‖1 = max{eᵀ|A|ej : 1 ≤ j ≤ n}.
b) Show that ‖A‖∞ = max{eiᵀ|A|e : 1 ≤ i ≤ m}.

6. a) Give an example (with justification) of matrices A and B such that ‖AB‖1 = ‖A‖1 ‖B‖1.
b) Give an example (with justification) of matrices A and B such that ‖AB‖1 < ‖A‖1 ‖B‖1.
c) Give an example (with justification) of matrices A and B such that ‖AB‖∞ = ‖A‖∞ ‖B‖∞.
d) Give an example (with justification) of matrices A and B such that ‖AB‖∞ < ‖A‖∞ ‖B‖∞.
e) Give an example (with justification) of matrices A and B such that ‖AB‖2 = ‖A‖2 ‖B‖2.
f) Give an example (with justification) of matrices A and B such that ‖AB‖2 < ‖A‖2 ‖B‖2.
g) Give an example (with justification) of matrices A and B such that ‖AB‖ = ‖A‖ ‖B‖ for every norm.
h) Give an example (with justification) of matrices A and B such that ‖AB‖ < ‖A‖ ‖B‖ for every norm.
If something is valid for every norm then that means it follows from the definition of a norm (or other results that themselves depend only on the definition).

7. Let A be an invertible matrix. Show that ‖A‖ > 0 and ‖A⁻¹‖ > 0 for every norm.


8. (★, but some of this is only half-★) Note that real m × n matrices form a vector space under matrix addition and scalar multiplication. This vector space is isomorphic to Rᵐⁿ. So this means that the following are norms:

f(A) = ∑_{i=1}^{m} ∑_{j=1}^{n} |Aij|        g(A) = max{|Aij| : 1 ≤ i ≤ m, 1 ≤ j ≤ n}

These are the 1-norm and the ∞-norm applied by considering an m × n matrix to be like a vector in Rᵐⁿ. This is not the 1-norm and ∞-norm we defined for matrices, which considers an m × n matrix to be a function Rⁿ → Rᵐ.
a) Characterize all matrices A such that ‖A‖1 = f(A).
b) Characterize all matrices A such that ‖A‖∞ = f(A).
c) Characterize all matrices A such that ‖A‖1 = g(A).
d) Characterize all matrices A such that ‖A‖∞ = g(A).

9. Consider the f and g from Exercise 11.8 and ei, e from Exercise 11.5.
a) Show that f(A) = eᵀ|A|e.
b) Show that g(A) = max{eiᵀ|A|ej : 1 ≤ i ≤ m, 1 ≤ j ≤ n}.
c) Show that g(A) ≤ ‖A‖1 ≤ f(A) and g(A) ≤ ‖A‖∞ ≤ f(A).

10. (just for fun) Define ⋁_{i=1}^{n} xi to be the maximum of x1, · · · , xn. So it is like ∑_{i=1}^{n} xi except that instead of adding up all the xi it takes the maximum of them. Inspired by Exercise 11.5 and Exercise 11.9, show the following.

g(A) = ⋁_{i=1}^{m} ⋁_{j=1}^{n} |eiᵀAej|        ‖A‖1 = ⋁_{j=1}^{n} ∑_{i=1}^{m} |eiᵀAej|

‖A‖∞ = ⋁_{i=1}^{m} ∑_{j=1}^{n} |eiᵀAej|        f(A) = ∑_{i=1}^{m} ∑_{j=1}^{n} |eiᵀAej|


MAT3341 : Applied Linear Algebra Mike Newman, april 2018 +

12. Perturbed Matrices

motivation

We want to solve a system Ax = b in a context where A and b are not known exactly. There might be experimental errors, so our solution x represents the correct answer to what is technically the wrong question.

We will consider A and A′, two matrices that are "close" to each other, and two vectors b and b′ that are "close" to each other. We will sometimes write ∆A = A′ − A, ∆b = b′ − b, etc. The notation does carry one risk of confusion: "∆A" is one matrix, and not a scalar multiplied by the matrix A. We typically think of ∆A and ∆b as being "small".

The question comes down to the following: If x is a solution to Ax = b, is it close to a solution for A′x = b′?

We will start by considering invertibility of matrices.

invertibility near the identity

If a matrix is close to the identity matrix, we would like to think that it resembles it (even though it might not be similar!). A matrix A close to the identity matrix can be understood in terms of P = A − I, or A = I + P, for some "small" matrix P. We need to better understand the meaning of "small".

Theorem 12.1. Let P be an n × n matrix and ‖·‖ any matrix norm. If ‖P‖ < 1 then I + P is invertible and furthermore

1/(1 + ‖P‖) ≤ ‖(I + P)⁻¹‖ ≤ 1/(1 − ‖P‖).

Proof. Recall that a matrix is invertible if and only if its null space is trivial (see Theorem 1.23). So to show that I + P is invertible we will show that (I + P)x = 0 implies x = 0.

If (I + P)x = 0 then Px = −Ix = −x. This gives ‖x‖ = ‖−x‖ = ‖Px‖ ≤ ‖P‖ ‖x‖. If x ≠ 0 then ‖x‖ ≠ 0 and hence 1 ≤ ‖P‖, which is a contradiction. So the only possibility is that x = 0 and thus I + P is invertible.

In order to derive the bounds on ‖(I + P)⁻¹‖, set B = (I + P)⁻¹. Since I = (I + P)B we get

1 = ‖I‖ = ‖(I + P)B‖ ≤ ‖I + P‖ ‖B‖ ≤ (‖I‖ + ‖P‖) ‖B‖ = (1 + ‖P‖) ‖B‖

Since B = I − PB we get

‖B‖ = ‖I − PB‖ ≤ ‖I‖ + ‖PB‖ ≤ 1 + ‖P‖ ‖B‖

Solving these for ‖B‖ we get the desired result.

Problem 12.2. We said ‖I − PB‖ ≤ ‖I‖+ ‖PB‖. Why is it “+” and not “−”?

∗ These notes are intended for students in mike’s MAT3341. For other uses please say “hi” to [email protected].


This result tells us that matrices close to the identity are invertible. A small change preserves the property of invertibility.

Example 12.3. Let A = [1.1 0.4 0; 0 0.6 −0.1; −0.1 0.3 1.1]. What does Theorem 12.1 tell us about A?

We see that A = I + P with P = [0.1 0.4 0; 0 −0.4 −0.1; −0.1 0.3 0.1].

We calculate ‖P‖1 = max{0.2, 1.1, 0.2} = 1.1. Using this norm, ‖P‖1 ≥ 1 and so Theorem 12.1 does not apply — with this norm.

We can also calculate ‖P‖∞ = max{0.5, 0.5, 0.5} = 0.5. Since ‖P‖∞ < 1, Theorem 12.1 applies with this norm, and we conclude that A = I + P is invertible. Furthermore we know that 2/3 ≤ ‖A⁻¹‖∞ ≤ 2. This guarantees that in every row of the inverse, the sum of the absolute values is at most 2, and that in one of the rows the sum of the absolute values is at least 2/3. In particular the absolute value of every entry in the inverse is at most 2, and at least one entry of the inverse has absolute value at least 2/9.
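A numerical check of Example 12.3 in Python/numpy: compute ‖P‖1 and ‖P‖∞, then confirm that ‖A⁻¹‖∞ really does land between the bounds 2/3 and 2.

```python
import numpy as np

A = np.array([[1.1, 0.4, 0.0], [0.0, 0.6, -0.1], [-0.1, 0.3, 1.1]])
P = A - np.eye(3)

p1   = np.linalg.norm(P, 1)       # 1.1 -> Theorem 12.1 does not apply in the 1-norm
pinf = np.linalg.norm(P, np.inf)  # 0.5 -> it does apply in the infinity-norm

Ainv = np.linalg.inv(A)
lo, hi = 1/(1 + pinf), 1/(1 - pinf)            # 2/3 and 2
print(lo, np.linalg.norm(Ainv, np.inf), hi)    # the middle value lies in [2/3, 2]
```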

Problem 12.4. Let a ∈ C with |a| < 1, and P = [0 a; a 0].

Show that ‖P‖1 = ‖P‖∞ = ‖P‖2 = |a|. Is I + P invertible? Give bounds on the norms of (I + P)⁻¹ that follow from Theorem 12.1, and interpret these.

Find the inverse (I + P)⁻¹ and its norm, and compare with the bounds obtained from Theorem 12.1.

inverses

Theorem 12.1 explains what happens to matrices close to the identity matrix. We would like to extend these ideas to matrices close to any arbitrary invertible matrix.

Theorem 12.5. Let A and R be n × n matrices with A invertible, and ‖·‖ any norm. Let α = ‖A⁻¹R‖ or α = ‖RA⁻¹‖. If α < 1 then A + R is invertible and furthermore

‖A⁻¹‖/(1 + α) ≤ ‖(A + R)⁻¹‖ ≤ ‖A⁻¹‖/(1 − α).

Note that we have two choices for α. In general ‖A⁻¹R‖ ≠ ‖RA⁻¹‖. A smaller value of α gives a stronger theorem, so we would choose the smaller of the two options.

Proof. Assume that α = ‖A⁻¹R‖ < 1. The matrix A is invertible, so we can set P = A⁻¹R.

We see that A + R = A(I + P) and ‖P‖ = α < 1, so by Theorem 12.1 the matrix I + P is invertible. The product of invertible matrices is invertible and so A + R is invertible — in fact (A + R)⁻¹ = (I + P)⁻¹A⁻¹.

In order to bound the norm, we again appeal to Theorem 12.1.

‖A⁻¹‖ = ‖(I + P)(A + R)⁻¹‖ ≤ ‖I + P‖ ‖(A + R)⁻¹‖ ≤ (‖I‖ + ‖P‖) ‖(A + R)⁻¹‖ = (1 + α) ‖(A + R)⁻¹‖

‖(A + R)⁻¹‖ = ‖(I + P)⁻¹A⁻¹‖ ≤ ‖(I + P)⁻¹‖ ‖A⁻¹‖ ≤ ‖A⁻¹‖/(1 − α)

Solving these for ‖(A + R)⁻¹‖ we get the desired result.


Problem 12.6. Show the case of α = ‖RA⁻¹‖ of Theorem 12.5.

Example 12.7. Let A = [1 1+ε; 1−ε 1], for ε > 0. This is an invertible matrix. For which matrices R can we guarantee that A + R is also invertible?

We see that ‖A⁻¹‖1 = (2 + ε)/ε².

Since ‖A⁻¹R‖1 ≤ ‖A⁻¹‖1 ‖R‖1, having ‖R‖1 < (‖A⁻¹‖1)⁻¹ is sufficient to guarantee ‖A⁻¹R‖1 < 1. Thus if

‖R‖1 < ε²/(2 + ε)

then A + R is invertible. If ε is small, then to guarantee that A + R is invertible we would need to have ‖R‖1 very small.

Problem 12.8. Find a matrix R with A + R non-invertible. Show that your example is consistent with the condition given above.

Problem 12.9. Find the inverse of A = [1 1+ε; 1−ε 1], and show that ‖A⁻¹‖1 = (2 + ε)/ε².

Also, find ‖A⁻¹‖∞ and ‖A⁻¹‖2, and give a condition on ‖R‖∞ and ‖R‖2 that guarantees that A + R is invertible.

We see an important detail in Example 12.7. If we want to know if A + R is invertible for some fixed matrix A and an error term R, a condition on ‖R‖ is more practical than a condition on ‖A⁻¹R‖. We often use this weaker form of Theorem 12.5.

Theorem 12.10. Let A and R be n × n matrices with A invertible, and ‖·‖ any norm. Let α = ‖R‖ ‖A⁻¹‖. If α < 1 then A + R is invertible and furthermore

‖A⁻¹‖/(1 + α) ≤ ‖(A + R)⁻¹‖ ≤ ‖A⁻¹‖/(1 − α).

Proof. We see that both ‖RA⁻¹‖ and ‖A⁻¹R‖ are at most ‖R‖ ‖A⁻¹‖ = α < 1. Then Theorem 12.5 guarantees that A + R is invertible, and gives the stated bounds.

The condition ‖R‖ < (‖A⁻¹‖)⁻¹ can be rewritten in the following seemingly more complicated way.

‖R‖/‖A‖ < 1/(‖A‖ ‖A⁻¹‖)

If we think of A as an initial matrix and A + R as the perturbed matrix, then R is an error term, and ‖R‖/‖A‖ is the relative perturbation, or relative error. This inequality says that the relative error is bounded by a nice function of the matrix and its inverse. Motivated by this, we define the condition number of a matrix A as c(A) = ‖A‖ ‖A⁻¹‖. Notice that the condition number of a matrix is relative to the choice of norm.
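The condition number is computed exactly as the definition says. A sketch in Python/numpy, using as test data the matrix that appears in Problem 12.13 below:

```python
import numpy as np

def cond(A, p):
    # c(A) = ||A|| ||A^{-1}||, relative to the chosen norm p
    return np.linalg.norm(A, p) * np.linalg.norm(np.linalg.inv(A), p)

A = np.array([[10., 8.], [10., 11.]])
print(cond(A, 1), cond(A, np.inf))   # 14.0 and 14.0 (they happen to agree here)
print(np.linalg.cond(A, 1))          # numpy's built-in gives the same value
```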


Corollary 12.11. Let A and R be n × n matrices with A invertible, and ‖·‖ any norm. Let α = (‖R‖/‖A‖) c(A). If α < 1 then A + R is invertible and furthermore

‖A⁻¹‖/(1 + α) ≤ ‖(A + R)⁻¹‖ ≤ ‖A⁻¹‖/(1 − α).

Proof. This is just Theorem 12.10 rewritten.

Lemma 12.12. For any invertible matrix A, c(A) ≥ 1.

Proof. exercise!

Problem 12.13. Let A = [10 8; 10 11] and R = [0 2; 0 −1].

Observe that A + R is not invertible, and calculate ‖R‖1/‖A‖1. Calculate A⁻¹, ‖A‖1, ‖A⁻¹‖1 and thus c(A). Check that c(A) ≥ ‖A‖1/‖R‖1 and explain how this agrees with Theorem 12.10 and Theorem 12.5.

Redo this exercise for the norm ‖·‖∞.

linear systems

We want to apply the idea of Theorem 12.5 to the solution of linear systems. Let Ax = b be a linear system we want to solve, and A′x′ = b′ a "close" system. We set A′ = A + ∆A, b′ = b + ∆b and x′ = x + ∆x. So ∆A and ∆b are small matrices and ∆x is the corresponding change in the solution. For instance, we could have obtained this system from some experimental data, for which the values are not exactly known. We can solve the system, but we want to know to what extent the experimental errors cause errors in the solution. Briefly we want to know to what extent ∆x is small, given that ∆A and ∆b are small.

Theorem 12.14. Let Ax = b where A is an invertible matrix and b, x are nonzero. Let A′x′ = b′, and define ∆A = A′ − A, ∆b = b′ − b, ∆x = x′ − x. Let α = ‖(∆A)A⁻¹‖ or α = ‖A⁻¹(∆A)‖. If α < 1 then:

‖∆x‖/‖x‖ ≤ (c(A)/(1 − α)) (‖∆b‖/‖b‖ + ‖∆A‖/‖A‖)

Proof. Theorem 12.5 guarantees that A + ∆A is invertible, since α < 1. We can write ∆x in terms of the other matrices.

(A + ∆A)(x + ∆x) = b + ∆b
(A + ∆A)∆x = (b + ∆b) − (A + ∆A)x
(A + ∆A)∆x = b + ∆b − Ax − ∆Ax
(A + ∆A)∆x = ∆b − ∆Ax
∆x = (A + ∆A)⁻¹(∆b − ∆Ax)


This allows us to get an upper bound on ‖∆x‖. Theorem 12.5 is again useful.

‖∆x‖ = ‖(A + ∆A)⁻¹(∆b − ∆Ax)‖ ≤ ‖(A + ∆A)⁻¹‖ ‖∆b − ∆Ax‖ ≤ (1/(1 − α)) ‖A⁻¹‖ (‖∆b‖ + ‖∆A‖ ‖x‖)

Dividing by ‖x‖ gives an upper bound on the relative error.

‖∆x‖/‖x‖ ≤ (1/(1 − α)) ‖A⁻¹‖ (‖∆b‖/‖x‖ + ‖∆A‖)
         = (1/(1 − α)) ‖A‖ ‖A⁻¹‖ (‖∆b‖/(‖A‖ ‖x‖) + ‖∆A‖/‖A‖)
         ≤ (1/(1 − α)) ‖A‖ ‖A⁻¹‖ (‖∆b‖/‖b‖ + ‖∆A‖/‖A‖)

where the last step uses ‖b‖ = ‖Ax‖ ≤ ‖A‖ ‖x‖.

Like Theorem 12.10 did for Theorem 12.5, we can give a weaker result that is easier to actually use.

Theorem 12.15. Let Ax = b where A is an invertible matrix and b, x are nonzero. Let A′x′ = b′, and define ∆A = A′ − A, ∆b = b′ − b, ∆x = x′ − x. Let α = ‖∆A‖ ‖A⁻¹‖. If α < 1 then:

‖∆x‖/‖x‖ ≤ (c(A)/(1 − α)) (‖∆b‖/‖b‖ + ‖∆A‖/‖A‖)

Proof. exercise.

Problem 12.16. Let A = [10 8; 10 11], b = [6; 12]. Let ∆A be a matrix with 0's in the first column and numbers of absolute value at most 1 in the second column. Suppose further that ∆b is a vector with numbers of absolute value at most 0.1 in each position. We want to solve A′x′ = b′ with A′ = A + ∆A, b′ = b + ∆b, x′ = x + ∆x.

Give a bound on the relative error ‖∆x‖1/‖x‖1 and ‖∆x‖∞/‖x‖∞.
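Here is a sketch of how Theorem 12.15 applies to this problem in the ∞-norm (Python/numpy). We take the largest allowed ‖∆A‖∞ and ‖∆b‖∞; the code computes the bound of the theorem, not the exact worst case.

```python
import numpy as np

A = np.array([[10., 8.], [10., 11.]])
b = np.array([6., 12.])
x = np.linalg.solve(A, b)                  # [-1, 2]

Ainv_norm = np.linalg.norm(np.linalg.inv(A), np.inf)   # 2/3
dA_norm, db_norm = 1.0, 0.1                # worst allowed ||dA||_inf and ||db||_inf
alpha = dA_norm * Ainv_norm                # 2/3 < 1, so Theorem 12.15 applies

c = np.linalg.norm(A, np.inf) * Ainv_norm  # condition number, 14
bound = c / (1 - alpha) * (db_norm / np.linalg.norm(b, np.inf)
                           + dA_norm / np.linalg.norm(A, np.inf))
print(x, alpha, bound)    # relative error in the inf-norm is at most about 2.35
```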

exercises

1. Let A = [1.1 0.7 −0.1; 0.3 0.8 0.4; 0.2 0.2 1.1].
a) Find a matrix P with A = I + P.
b) Using Theorem 12.1 with the 1-norm, can we conclude that A is invertible? If yes, give values a and b such that a ≤ ‖A⁻¹‖1 ≤ b.
c) Using Theorem 12.1 with the ∞-norm, can we conclude that A is invertible? If yes, give values a and b such that a ≤ ‖A⁻¹‖∞ ≤ b.
2. Let A and R be n × n matrices with A invertible, and fix some norm.
a) Give a condition on the scalar p (using the norm) such that A + pR is invertible.
b) Show that there is a q > 0 such that A + pR is invertible for every p with |p| < q.

2n ,+12n ]. Consider the matrix P


such that A = I + P. Show that ‖P‖∞ < 1. What does this tell us about the invertibility of A?
b) Does there exist an invertible matrix B, such that B = I + P for some P with ‖P‖∞ ≥ 1? Either prove that this never happens or give an example.

4. Let A = [1 2 1 −1; 0 3 1 0; 1 1 0 1; 1 0 0 1], and you are also given that A⁻¹ = [1/2 −1/2 1/2 0; 0 0 1 −1; 0 1 −3 3; −1/2 1/2 −1/2 1]. The 4 × 4 matrix R has all entries in the range (−1/24, 1/24).
a) Find the 1-norms of A and A⁻¹, and give an upper bound on the 1-norm of R.
b) Find the ∞-norms of A and A⁻¹, and give an upper bound on the ∞-norm of R.
c) Based on the 1-norms of the various matrices, can you conclude that A + R is invertible? Explain.
d) Based on the ∞-norms of the various matrices, can you conclude that A + R is invertible? Explain.
e) Is A + R invertible? Explain.

5. Let A be an invertible matrix. Let R = kA for any scalar k ≠ −1.
a) Show that A + R is invertible.
b) Show that ‖A⁻¹R‖ = ‖RA⁻¹‖ = |k|, and so can take on any positive real value.
c) We see that A + R is invertible even though ‖A⁻¹R‖ = ‖RA⁻¹‖ can be greater than 1. Does this contradict Theorem 12.5? Explain.
6. (slightly ★) Find examples of invertible matrices A and matrices R such that A and A + R are both invertible, but ‖A⁻¹R‖ and ‖RA⁻¹‖ are arbitrarily large. You can do this for specific norms; it is not necessary that each example works for every norm. Explain why this does not contradict Theorem 12.5.

7. a) Let Ax = b with A invertible, x ≠ 0, but b = 0. Adapt the proof of Theorem 12.14 to give a bound on ‖∆x‖/‖x‖.
b) Let Ax = b with A invertible and x = 0 (note that this implies that b = 0). Adapt the proof of Theorem 12.14 to give a bound on ‖∆x‖.

8. Consider the matrix A from Exercise 12.4, and let b = [3 4 3 2]ᵀ, x = [1 1 1 1]ᵀ; so Ax = b. Assume that all entries in ∆A and ∆b are in the range (−1/24, 1/24).
a) Using the ∞-norm, what can you conclude about x′ that solves A′x′ = b′?
b) Using the 1-norm, what can you conclude about x′ that solves A′x′ = b′?
c) Based on the above, what can you conclude about x′ that solves A′x′ = b′?

9. a) Solve [1 1; 1.0001 1]x = [1; 1.0001].
b) Solve [1 1; 1.0001 1]x = [1; 1].
c) Are your solutions similar? Compute the condition number of the matrix. Thinking of the second system as a perturbed version of the first system, what does Theorem 12.5 and Theorem 12.10 say here? correction: Theorem 12.14 and Theorem 12.15. (A numerical sketch of this system appears after this exercise list.)

10. Recall the functions

f(A) = ∑_{i=1}^{m} ∑_{j=1}^{n} |Aij|        g(A) = max{|Aij| : 1 ≤ i ≤ m, 1 ≤ j ≤ n}

a) If you didn't already do so, show (using the definition) that the functions f and g are indeed norms on the vector space Mm,n.


b) Consider the matrix of Example 12.3. What do f and g say about this matrix using Theorem 12.1?

11. For any ε > 0 give a matrix P such that det(P) < ε yet I + P is not invertible. This shows that we can't use determinants instead of norms in Theorem 12.1. It also shows that det(·) is not a norm (but you knew that already, right?).

12. a) Show that for P = −I that ‖P‖ = 1 for every norm, yet I + P is not invertible.
b) Give a matrix P ≠ −I such that ‖P‖p = 1 for each p ∈ {1, ∞, 2}, yet I + P is not invertible.
c) Conclude that we cannot strengthen Theorem 12.1 by replacing "‖P‖ < 1" with "‖P‖ ≤ 1". What happens in the (silly) case of 1 × 1 matrices?

13. a) Show that for R = −A that ‖RA⁻¹‖ = ‖A⁻¹R‖ = 1 for every norm, yet A + R is not invertible.
b) For any invertible matrix A, give a matrix R ≠ −A such that either ‖RA⁻¹‖p = 1 or ‖A⁻¹R‖p = 1 for each p ∈ {1, ∞, 2}, yet A + R is not invertible.
c) Conclude that we cannot strengthen Theorem 12.5 by replacing "α < 1" with "α ≤ 1". What happens in the (silly) case of 1 × 1 matrices?

14. a) Explain briefly why for two matrices M and N we have ‖MN‖ ≤ ‖M‖ ‖N‖. (This was implicitly mentioned in an earlier chapter but should have been asked explicitly.)
b) Prove Lemma 12.12.
c) For each of the following matrices, and for each of the 1-norm, ∞-norm, 2-norm, determine whether c(A) = 1.

A = [1 1; 1 0]        A = [0 2; 2 0]

d) For each of the following matrices, and for each of the 1-norm, ∞-norm, 2-norm, determine the values of a, b, c, d that make c(A) = 1.

A = [a 0; 0 d]        A = [0 b; c 0]

15. a) Show that if ‖R‖ ‖A⁻¹‖ < 1 then ‖R‖ < ‖A‖ (Exercise 12.14 might help).
b) Looking at the proof of Theorem 12.10, can we now conclude that if ‖R‖ < ‖A‖ then A + R is invertible? (Note that it is easier to check ‖R‖ < ‖A‖ than ‖R‖ ‖A⁻¹‖ < 1, since we don't need to find A⁻¹, so this would be a somewhat useful criterion.)
c) (slightly ★) Give examples of matrices A and R such that A is invertible and ‖R‖ < ‖A‖ but A + R is not invertible. You can do this for specific norms; it is not necessary that each example works for every norm.
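The numerical sketch promised in Exercise 9 (Python/numpy): two nearly identical systems whose solutions are far apart, together with the condition number that explains it.

```python
import numpy as np

# Exercise 9, numerically: a tiny change in b moves the solution a lot.
A  = np.array([[1., 1.], [1.0001, 1.]])
x1 = np.linalg.solve(A, np.array([1., 1.0001]))   # [1, 0]
x2 = np.linalg.solve(A, np.array([1., 1.]))       # [0, 1]
print(x1, x2)
print(np.linalg.cond(A, 1))   # about 4 * 10^4, which is why the solutions differ so much
```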


MAT3341 : Applied Linear Algebra Mike Newman, april 2018 +

13. Eigenvalues and Eigenvectors

invariant subspaces

For a transformation T : V → W we can always find a matrix A that represents it relative to a choice of bases for V and W. We would like to have A as "simple" as possible, to better understand the action of T. We accomplish this by choosing "good" bases. It's often useful to consider the case V = W, and furthermore to choose the same basis for the domain and the codomain.

An invariant subspace U for a linear transformation T is a subspace such that T(u) ∈ U for all u ∈ U. An invariant subspace describes a part of the structure of T.

Example 13.1. Let V be an n-dimensional vector space over R and T : V → V be a linear transformation. Let A be an n × n matrix that represents T with respect to some basis B = {u1, u2, · · · , un}.

For instance, let n = 4 and A = [1 2 3 4; 2 −1 3 4; 0 0 2 3; 0 0 1 1]. Let X = {[a; b; 0; 0] : a, b ∈ R}, which is a subspace of R⁴.

We see that Ax ∈ X for all x ∈ X. This means that X is an invariant subspace of A.¹ Since A represents some transformation T, we think of vectors x ∈ R⁴ as being the coordinates with respect to B of some vector u ∈ V. So in fact the subspace X of R⁴ corresponds to a subspace U of V, which is an invariant subspace for T. In particular, since we can see that the subspace X is spanned by {e1, e2}, we know that U is spanned by {u1, u2}.

We see two things in this example. One is that we can connect invariant subspaces of a linear transformation to invariant subspaces of the matrix that represents it. Secondly, it can be easier to find invariant subspaces of a matrix, especially if we choose the bases "correctly". The block of zeroes in the matrix makes it easy to find an invariant subspace.

Problem 13.2. For the previous example check that Ax ∈ X for all x ∈ X. Also check that {e1, e2} is a basis for X.

We’ve seen other invariant subspaces. Recall that if U is a subspace of V and f is the projectiononto U , then f ∈ L (V ), and furthermore f(u) = u for every u ∈ U .

Proposition 13.3. If f : V → V is projection onto U , then U is an invariant subspace for f .

We can also combine invariant subspaces. "Recall" that the sum of subspaces U1 and U2 is U1 + U2 = {u1 + u2 : u1 ∈ U1, u2 ∈ U2}.

Proposition 13.4. Let U1 and U2 be invariant subspaces for a linear transformation T. Then U1 ∩ U2 and U1 + U2 are both invariant subspaces for T.

∗ These notes are intended for students in mike's MAT3341. For other uses please say "hi" to [email protected].
1 This is an abuse of notation. But we are thinking of A as a linear transformation from Rⁿ to Rᵐ. So by "an invariant subspace of a matrix A" we mean "an invariant subspace of the linear transformation x → Ax".


The simplest invariant subspaces are those of dimension 1. Let U be an invariant subspace of dimension 1 and u1 ≠ 0 be a vector in U. So T(u1) = λu1 for some scalar λ (since every vector in U is a multiple of u1). Every other vector u ∈ U is such that u = αu1 and so T(u) = T(αu1) = αλu1 = λu.

If T ∈ L(V) and T(v) = λv for v ≠ 0 then we say that v is an eigenvector of T, with corresponding eigenvalue λ. If we take A to be the matrix of T with respect to some basis B, and set x = φB(v), then we see that Ax = λx. Likewise if we have Ax = λx, then A is the matrix of some linear transformation T ∈ L(V) with respect to some basis (can you give a quick example of one such T?) and we can consider x to be a vector of coordinates for a vector v ∈ V. Then we would have T(v) = λv. So we can move seamlessly back and forth between a linear transformation and a matrix.¹ So the next section is written in terms of matrices, but applies to linear transformations as well.

review : eigenvalues and eigenvectors of matrices, and diagonalization

Let A be an n × n matrix. Recall that x ≠ 0 is an eigenvector of A with eigenvalue λ if Ax = λx. We can find eigenvalues and eigenvectors using the following results.

Theorem 13.5. The eigenvalues of A are exactly the roots of det(A− λI).

The determinant det(A − λI) is a polynomial of degree n in the variable λ; it's the characteristic polynomial of A. Note that the roots of this polynomial are in general complex, even if the coefficients of the polynomial are real. For each root λj we define its multiplicity mult(λj) as its multiplicity in the polynomial. We see directly that the sum of the multiplicities is always exactly n, the degree of the polynomial (and the size of A).

Problem 13.6. Give an example of a matrix such that the characteristic polynomial has real coefficients but roots are non-real complex numbers. Give an example of a matrix with at least one non-real entry but for which all eigenvalues are real. Examples can be found among 2 × 2 matrices . . .

Theorem 13.7. The eigenvectors corresponding to λj are exactly the non-zero vectors of the null space nul(A − λjI).

The space nul(A − λjI) is the eigenspace of λj. In order to "give the vectors of the eigenspace" we will give instead a basis for each eigenspace (since there are an infinite number of eigenvectors, right?). For each root λj we define its dimension dim(λj) as the dimension of its eigenspace.² Sometimes we write dim(Ej), where Ej is the eigenspace corresponding to λj.

Problem 13.8. Show that every eigenvalue has an infinite number of eigenvectors.

Theorem 13.9. For each eigenvalue λj we have 1 ≤ dim(λj) ≤ mult(λj).

Notice that the inequality 1 ≤ dim(λj) is a consequence of the fact that each eigenvalue has at least one eigenvector (the definition demands it) and that every multiple of an eigenvector is itself an eigenvector, so we get an eigenspace of dimension at least 1.

If we want to find an independent set of eigenvectors for a matrix (perhaps not all sharing the same eigenvalue) then we can't do any better than taking as many vectors from each eigenspace as its dimension.

1 As elsewhere, to be safe, this assumes V is finite dimensional.
2 This is an abuse of notation. We should say dim(nul(A − λjI)) and not dim(λj).


So taking a basis from each eigenspace gives the biggest possible set of independent eigenvectors. This gives the following.

Theorem 13.10. Let A be an n× n matrix. Then the following are equivalent.

• For each eigenvalue λj of A we have mult(λj) = dim(λj).

• There exists a basis for Rn (Cn) composed of eigenvectors of A.

• There exists a diagonal matrix D and an invertible matrix P such that A = PDP−1.

Proof sketch. The fact that mult(λj) = dim(λj) gives a basis for Rⁿ comes down to saying that the maximum number of vectors that one can take from each eigenspace is dim(λj). The equivalence with A = PDP⁻¹ comes down to considering AP = PD: looking at each column of the product we see that the columns of P are necessarily an independent set of n eigenvectors of A.
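A numerical sketch of Theorem 13.10 in Python/numpy, using the 4 × 4 matrix from Example 13.1. numpy returns eigenvectors as the columns of a matrix P, and since the eigenvalues here turn out to be distinct the matrix is diagonalisable.

```python
import numpy as np

A = np.array([[1., 2., 3., 4.],
              [2., -1., 3., 4.],
              [0., 0., 2., 3.],
              [0., 0., 1., 1.]])

evals, P = np.linalg.eig(A)   # columns of P are eigenvectors
D = np.diag(evals)
print(evals)                                      # four distinct eigenvalues
print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True: A = P D P^{-1}
```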

We leave the proof of the following result as an exercise.

Theorem 13.11. Let S be a set of eigenvectors, not necessarily with the same eigenvalue. Then the space spanned by S is an invariant subspace. In particular, if U1, · · · , Uk are eigenspaces then U1 + · · · + Uk is an invariant subspace.

exercises

1. Let T ∈ L(V). Show that {0} and V are invariant subspaces of T. These are the trivial invariant subspaces.

2. Prove that every invariant subspace of dimension 1 is contained in an eigenspace. Why isn't every invariant subspace of dimension 1 equal to an eigenspace?

3. Let f : V → V be f(x) = projU(x). Show that U is an eigenspace of f, and conclude that it is an invariant subspace. This gives a variation on the proof of Proposition 13.3 alluded to above.

4. Let T ∈ L(V) and let U = im(T). Show that U is an invariant subspace of T. Explain how this gives another proof of Proposition 13.3.

5. Prove Proposition 13.4.

6. Let Pn be the vector space of polynomials of degree at most n in the variable t, and let D ∈ L(Pn) be the map D(f) = (d/dt)f.
a) Find all eigenspaces of D.
b) Show that the polynomials of degree at most k form an invariant subspace of D, for any fixed k with 0 ≤ k ≤ n.
c) Are there other invariant subspaces of D?

7. Let V be the vector space Rⁿ⁺¹, and let A be the (n + 1) × (n + 1) matrix with Aij = i if j = 1 + i and 0 otherwise.
a) Find all eigenspaces of A.
b) Show that the set of vectors with the last n − k entries equal to zero form an invariant subspace of A, for any fixed k with 1 ≤ k ≤ n.
c) Are there other invariant subspaces of A?

8. Let λ be an eigenvalue of a matrix A. Explain why dim(λ) ≥ 1.

9. Prove Theorem 13.11.


MAT3341 : Applied Linear Algebra Mike Newman, april 2018 +

14. Similarity

introduction

A matrix decomposition is useful for better understanding a linear transformation. For a transformation T : V → V we could calculate a matrix A representing it with respect to, e.g., the standard basis. If we then discover that we can diagonalise A as A = PDP⁻¹ then we have D = P⁻¹AP. Had we found the matrix representation of T with respect to the basis of columns of P, we would have obtained the diagonal matrix D. Sometimes it is easier to find a diagonalisation of A than to guess the right basis for T.

Calculating A comes down to Theorem 10.27. We find D by diagonalising A, but we use it as in Theorem 10.32.

The same idea applies to other decompositions. We want the decomposition in order to have a representation of some T with respect to a nice basis. But we often find this by starting with a matrix that is easily obtained (eg, with respect to the standard basis, some given basis, etc.) and then working with the matrix.

similarity

A matrix A is similar to B if there exists an invertible matrix P such that A = PBP⁻¹. In terms of linear transformations we can interpret A and B as two different representations of the same transformation, with respect to different bases. We write "A ∼ B" if A is similar to B. Note that some books use "∼" to refer to Gaussian reduction of a matrix: this is completely different.

Problem 14.1. Let A = [0 −1; 1 2] and B = [1 0; 2 1]. Check that A = PBP⁻¹ with P = [1 −1; 1 1]. Conclude that A is similar to B. Find a matrix Q such that B = QAQ⁻¹. Conclude that B is similar to A.

Similarity is an equivalence relation, as the following result asserts.

Proposition 14.2. The following are true for all square matrices A, B, C.

• A ∼ A
• A ∼ B ⟺ B ∼ A
• A ∼ B, B ∼ C ⟹ A ∼ C.

The fact that similarity is an equivalence relation guarantees that we can partition the set of all matrices into classes, such that all matrices within a class are similar and matrices in different classes are not similar. If A is a representation of some T then in looking for the simplest representation of T we are looking for the simplest matrix among those similar to A.

Similar matrices share several properties.

∗ These notes are intended for students in mike’s MAT3341. For other uses please say “hi” to [email protected].


Proposition 14.3. Let A ∼ B with A = PBP⁻¹. Then

1. A and B have the same characteristic polynomial, and hence the same eigenvalues with the same multiplicities.
2. If x is an eigenvector of A with eigenvalue λ, then P⁻¹x is an eigenvector of B with eigenvalue λ.
3. Aᵏ ∼ Bᵏ
4. det(A) = det(B)
5. B is invertible if and only if A is invertible.

Proof. The characteristic polynomial of A is det(A − λI). Using properties of the determinant we see that

det(A − λI) = det(PBP⁻¹ − λI) = det(PBP⁻¹ − PλIP⁻¹) = det(P(B − λI)P⁻¹)
            = det(P) det(B − λI) det(P⁻¹) = det(P) det(B − λI) (1/det(P)) = det(B − λI)

We conclude that the eigenvalues of A and B (including multiplicities) are equal.

If x is an eigenvector of A with eigenvalue λ then

Ax = λx ⟺ PBP⁻¹x = λx ⟺ B(P⁻¹x) = λ(P⁻¹x)

If A = PBP⁻¹ then

Aᵏ = (PBP⁻¹)(PBP⁻¹) · · · (PBP⁻¹) = PB(P⁻¹P)B(P⁻¹P)B · · · (P⁻¹P)BP⁻¹ = PBᵏP⁻¹

so Aᵏ and Bᵏ are similar.

The fact that det(A) = det(B) is a special case of det(A − λI) = det(B − λI), with λ = 0. This shows, incidentally, that the constant term in the characteristic polynomial is the determinant of the matrix.

If B is invertible then PB−1P−1 is the inverse of A (exercise!).

unitary similarity

An n × n matrix P is unitary if Pᴴ = P⁻¹. An n × n real matrix P is orthogonal if Pᵀ = P⁻¹; so an orthogonal matrix is a real unitary matrix. We've already seen these: the matrix Q in the A = QR decomposition for instance (A complex or real).

Problem 14.4. Show that an n × n matrix P is unitary if and only if the columns of P form an orthonormal basis for Cⁿ.

A matrix A is unitarily similar to B if there exists a unitary matrix P with A = PBPᴴ. A unitary matrix is always invertible (because its inverse is its conjugate transpose), so this is a special case of similarity. If P is an orthogonal matrix then we say that A is orthogonally similar to B.

Problem 14.5. Let A = [0 −1; 1 2] and B = [1 0; 2 1]. Check that A = PBPᴴ with P = (1/√2)[1 −1; 1 1] and that P is unitary. Conclude that A is unitarily similar to B. Find a matrix Q such that B = QAQᴴ. Conclude that B is unitarily similar to A. How does this differ from Problem 14.1?


Problem 14.6. Let A = [1−i i; −i 1+i] and B = [1 2; 0 1]. Check that A = PᴴBP with P = (1/2)[1−i 1−i; −1−i 1+i] and that P is unitary. Conclude that A is unitarily similar to B. Find a matrix Q such that B = QAQᴴ. Conclude that B is unitarily similar to A. Also, give (with no calculation) the eigenvalues of A.

Unitary similarity is also an equivalence relation.

Proposition 14.7. Unitary similarity is an equivalence relation. Furthermore, if two matrices are unitarily similar then they are similar. So the equivalence classes of the unitary similarity relation are a refinement of the equivalence classes for similarity.

If A is the matrix of some linear transformation T with respect to the standard basis and A = PBPᴴ then B also represents T but with respect to an orthonormal basis. Orthonormal bases are generally preferable (see projections, A = QR, . . . ) so we would expect that a unitary similarity would be "preferable".

Some properties of unitary matrices.

Proposition 14.8. If P and Q are unitary matrices then

• The columns of P form an orthonormal basis for Cⁿ
• The rows of P form an orthonormal basis for Cⁿ
• P is invertible and P⁻¹ is unitary
• |det(P)| = 1
• PQ is unitary

Proof Sketch. For the determinant, note that det(Pᴴ) is the complex conjugate of det(P), so

1 = det(I) = det(PᴴP) = det(Pᴴ) det(P)

Thus |det(P)|² = 1.

The other statements are left as exercises.

In particular an orthogonal matrix (unitary and real) has determinant ±1. But a unitary matrix can have as determinant any complex number of absolute value 1.

Unitary matrices preserve inner products.

Proposition 14.9. If P is a unitary matrix then 〈Px|Py〉 = 〈x|y〉 for all x, y ∈ Cⁿ with the standard inner product, and so ‖Px‖2 = ‖x‖2.

Proof. We directly calculate 〈Px|Py〉 = xᴴPᴴPy = xᴴy = 〈x|y〉. This gives that ‖Px‖2 = √〈Px|Px〉 = √〈x|x〉 = ‖x‖2.

If we define T : Rⁿ → Rⁿ by T(x) = Px for an orthogonal matrix P (or T : Cⁿ → Cⁿ for a unitary P) then T preserves lengths and angles, since these are determined by the inner product. The length of x is the same as the length of T(x) and the angle between x and y is the same as the angle between T(x) and T(y).


Proposition 14.10. If P is unitary then ‖P‖2 = 1. Furthermore if A is any matrix then ‖PA‖2 = ‖AP‖2 = ‖A‖2.

Proof. Recall that ‖P‖2 is the maximum of ‖Px‖2/‖x‖2 over all x ≠ 0. But this fraction is always equal to 1 (Proposition 14.9), so ‖P‖2 = 1. For the norm of PA we again use Proposition 14.9, i.e., that ‖Py‖2 = ‖y‖2 for any y.

‖PA‖2 = max_{x≠0} ‖(PA)x‖2/‖x‖2 = max_{x≠0} ‖P(Ax)‖2/‖x‖2 = max_{x≠0} ‖Ax‖2/‖x‖2 = ‖A‖2

The norm of AP is "similar" (!)

‖AP‖2 = max_{x≠0} ‖(AP)x‖2/‖x‖2 = max_{x≠0} ‖A(Px)‖2/‖Px‖2 = max_{y≠0} ‖Ay‖2/‖y‖2 = ‖A‖2

using ‖x‖2 = ‖Px‖2 and then substituting y = Px, which runs over all nonzero vectors as x does.

It's always the case that ‖AB‖2 ≤ ‖A‖2 ‖B‖2. What's new is that if one of the matrices is unitary then "≤" becomes "=".
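A quick numerical sketch in Python/numpy: a unitary matrix can be produced from the QR decomposition of a random complex matrix, and multiplying by it on either side leaves the 2-norm unchanged, as Proposition 14.10 says. The random seed and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
P, _ = np.linalg.qr(Z)            # P is unitary: P^H P = I

A = rng.standard_normal((4, 4))
print(np.allclose(P.conj().T @ P, np.eye(4)))   # True
print(np.linalg.norm(P @ A, 2), np.linalg.norm(A @ P, 2),
      np.linalg.norm(A, 2))       # all three agree
```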

Lastly the eigenvalues of unitary matrices are special.

Proposition 14.11. If P is unitary and Px = λx for some x 6= 0 then |λ| = 1.

Proof. If Px = λx then ‖Px‖2 = ‖λx‖2 = |λ| ‖x‖2. But Proposition 14.9 gives ‖Px‖2 = ‖x‖2, so |λ| = 1.

We recall that similar matrices have the same determinant (Proposition 14.3). Matrices that are unitarily similar share a bit more.

Proposition 14.12. If A and B are unitarily similar then ‖A‖2 = ‖B‖2.

Proof. If $A = PBP^H$ for a unitary matrix $P$ then by Proposition 14.10 we have

$$\|A\|_2 = \|PBP^H\|_2 = \|P(BP^H)\|_2 = \|BP^H\|_2 = \|B\|_2$$

isometries

The fact that transformations $T(\mathbf{x}) = P\mathbf{x}$ for a unitary matrix $P$ preserve lengths and angles gives rise to an important class of transformations. Real unitary matrices (i.e., orthogonal matrices) are exactly the isometries of $\mathbb{R}^n$ that fix the origin. These are the rigid transformations of $\mathbb{R}^n$. These are important in implementing, e.g., computer graphics.

This is something we consider in more detail in MAT2355.

There is a similar result for unitary matrices in general, except that we are then dealing with the geometry of $\mathbb{C}^n$. Each dimension of $\mathbb{C}^n$ corresponds to a complex plane, so this is not quite as useful for computer screens! However, unitary matrices play an important role in a number of applications. For instance, it can be shown that operations computable by a quantum computer correspond exactly to unitary matrices.



exercises

1. Prove Proposition 14.2.

2. Prove the remaining cases of Proposition 14.8. In particular, show that the rows of a unitary matrix are orthonormal, and that the product of unitary matrices is unitary.

3. Consider the equivalence relation of similarity. Is it true that every equivalence class contains a diagonal matrix? Explain.

4. Consider the equivalence relation of unitary similarity. Is it true that every equivalence class contains a diagonal matrix?

5. Consider the set $S$ of matrices that have exactly one nonzero entry in each row and column, and that entry has absolute value 1.
a) Show that $S$ is a subset of the unitary matrices.
b) Show that matrices in $S$ are invertible and their inverse is in $S$.
c) Show that products of matrices from $S$ are in $S$.
d) Show that $\|P\mathbf{x}\|_1 = \|\mathbf{x}\|_1$ and $\|P\mathbf{x}\|_\infty = \|\mathbf{x}\|_\infty$ for $P \in S$.
e) Show that $\|PA\|_1 = \|AP\|_1 = \|A\|_1$ and $\|PA\|_\infty = \|AP\|_\infty = \|A\|_\infty$ for all $P \in S$.



MAT3341 : Applied Linear Algebra Mike Newman, april 2018 +

15. Condition of eigenvalues

Gerschgorin

We start with a tool that gives us estimates of the eigenvalues of a matrix.

A circle in the complex plane is defined by the equation $|z - a| = r$. The centre is the complex point $a$, the radius is the non-negative real number $r$, and $z$ is thus any point on the circle. The corresponding (closed) disc is the set of all points $z$ with $|z - a| \le r$.

Theorem 15.1. Let $A$ be an $n \times n$ matrix. For each row $1 \le i \le n$ of $A$ we define $r_i = \sum_{j\neq i}|A_{ij}|$. We get $n$ discs of the form $|z - A_{ii}| \le r_i$.
Then each eigenvalue of $A$ lies in one of these discs. Furthermore if $s$ of these discs form a connected component then this region contains $s$ eigenvalues.

Proof. Let $\mathbf{x}$ be an eigenvector with eigenvalue $\lambda$, and $t$ an index with $|x_t| \ge |x_i|$ for each $1 \le i \le n$. Since $(\lambda\mathbf{x})_t = (A\mathbf{x})_t = \sum_{j=1}^n A_{tj}x_j$ we get the following.

$$\lambda x_t - A_{tt}x_t = \sum_{j\neq t}A_{tj}x_j$$

$$|\lambda - A_{tt}| = \Bigl|\sum_{j\neq t}A_{tj}\frac{x_j}{x_t}\Bigr| \le \sum_{j\neq t}|A_{tj}|\Bigl|\frac{x_j}{x_t}\Bigr| \le \sum_{j\neq t}|A_{tj}| = r_t$$

So $\lambda$ lies within the disc $|z - A_{tt}| \le r_t$.

Note that we don't actually know the value of $t$, so all we can say in practice is that $\lambda$ lies within one of these discs.

Problem 15.2. In the previous proof we divided by $|x_t|$. What if $x_t = 0$? Explain why this isn't a problem here.

Problem 15.3. Let $A$ be an $n \times n$ matrix and $c_j = \sum_{i\neq j}|A_{ij}|$. Show that Theorem 15.1 is still valid if we replace $r_i$ by $c_j$. (hint: $A$ and $A^T$ have the same eigenvalues.)

Example 15.4. Estimate the eigenvalues of $A = \begin{bmatrix} 0 & 0.1 & 0.1 \\ 10 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$ using Theorem 15.1.

The Gerschgorin discs are

$$|z - 0| \le 0.2 \qquad |z - 0| \le 11 \qquad |z - 0| \le 2$$

These are three discs centered at the origin. They form one connected component, so the only conclusion we can draw from Theorem 15.1 is that the three eigenvalues lie within the disc $|z| \le 11$. We can not use Theorem 15.1 to conclude that one of the eigenvalues lies in the disc $|z| \le 0.2$.




Problem 15.5. Let $A = \begin{bmatrix} 1.1 & 0.4 & 0 \\ 0 & 0.6 & -0.1 \\ -0.1 & 0.3 & 1.1 \end{bmatrix}$. Calculate the Gerschgorin discs. Conclude that $\lambda = 0$ is not an eigenvalue of $A$. Is $A$ invertible? Compare with Example 12.3.

If the off-diagonal entries of the matrix are small then the $r_i$ will be small: the Gerschgorin discs give good estimates. If the off-diagonal entries are small relative to the diagonal entries, then the discs will be far from the origin and the matrix will be invertible.

Problem 15.6. For the following matrices, apply Theorem 15.1 to give information about the eigenvalues. Can we conclude that the matrix is invertible?

$$\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \qquad \begin{bmatrix} -2i & 1 & 0 \\ 0.3 & 1+i & -i \\ 0 & 0 & 1 \end{bmatrix}$$

Problem 15.7. Let $P$ be a permutation matrix (so exactly one 1 in each row and each column, and 0 elsewhere). Show that there are exactly two possibilities for the Gerschgorin discs of $P$, and describe the possible consequences of Theorem 15.1.
Show that the eigenvalues of a permutation matrix all have absolute value $|\lambda| = 1$. Compare with the conclusions of Theorem 15.1.

Problem 15.8. Let $A = \begin{bmatrix} 0 & 0.1 & 0.1 \\ 10 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$, and $S = \begin{bmatrix} \alpha & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.
Calculate the matrix $B = SAS^{-1}$. Explain why $B$ has the same eigenvalues as $A$. Give the Gerschgorin discs for $B$, and find the value of $\alpha$ that gives the strongest conclusion for the eigenvalues. What can we conclude for the eigenvalues of $A$?

condition of eigenvalues

Consider a matrix $A$ for which we know a diagonalization $A = PDP^{-1}$. We certainly know the eigenvalues of $A$. What can we say about the eigenvalues of $A' = A + \Delta A$, if $\|\Delta A\|$ is small?

We can rewrite the similarity as $D = P^{-1}AP$. So the columns of $P$ are a basis with respect to which $A$ becomes diagonal. We can imagine that $A'$ might (hopefully!) also be diagonalizable, with a matrix that is close to $P$. We can use $P$ to compute a matrix similar to $A'$.

$$D' = P^{-1}A'P = P^{-1}(A + \Delta A)P = P^{-1}AP + P^{-1}\Delta AP = D + \Delta D$$

The matrix $D'$ is probably not diagonal, but it is "almost" diagonal in the sense that it is the sum of a diagonal matrix and a "small" matrix. The matrix $D'$ is similar to $A'$ and so has the same eigenvalues. So we can estimate the eigenvalues of $A'$ by estimating the eigenvalues of $D'$. This is a natural case for Theorem 15.1, since the entries of $\Delta D$ are small and so the radii $r_i$ will be small. The discs will be good estimates of the eigenvalues. The $i$-th disc of $D'$ has centre $(D + \Delta D)_{ii} = D_{ii} + \Delta D_{ii}$ and radius $\sum_{j\neq i}|(D + \Delta D)_{ij}| = \sum_{j\neq i}|\Delta D_{ij}|$.

Instead of directly using the Gerschgorin discs, we will modify them so that they are in terms of $\|\Delta D\|$. These discs are each contained in a disc of centre $D_{ii}$ and radius $\sum_{j=1}^n|\Delta D_{ij}|$ (a picture will make this clearer). Since the diagonal entries of $D$ are exactly the eigenvalues of $A$ we have $D_{ii} = \lambda_i$. Also we can bound the radius by

$$\sum_{j=1}^n|\Delta D_{ij}| \le \|\Delta D\|_\infty = \|P^{-1}\Delta AP\|_\infty \le \|P^{-1}\|_\infty\,\|\Delta A\|_\infty\,\|P\|_\infty = c(P)\,\|\Delta A\|_\infty$$



So if the eigenvalues of $A$ are $\lambda_i$ for $1 \le i \le n$ then the eigenvalues of $A'$ are each in one of the following discs.

$$|z - \lambda_i| \le c(P)\,\|\Delta A\|_\infty$$

We've proved the following result — or rather we've proved it for the special case of the ∞-norm. It is in fact valid for any norm.

Theorem 15.9. Let $A$ be a diagonalizable matrix with $A = PDP^{-1}$, and $\lambda_i$ the eigenvalues of $A$, $1 \le i \le n$.
Then each of the eigenvalues of $A + \Delta A$ lies in one of the discs $\{z \in \mathbb{C} : |z - \lambda_i| \le R\}$, where $R = c(P)\,\|\Delta A\|$ (for any matrix norm). If $s$ of these discs form a connected component then this region contains $s$ eigenvalues.

We won't give a proof for arbitrary norms, but notice that by considering the transpose we get a short proof for the 1-norm, following Problem 15.3.

The consequence is that if we change the matrix $A$ a little bit, then in order to guarantee that the eigenvalues won't change much we need both $\|\Delta A\|$ and $c(P)$ to be small.
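A small experiment makes the role of $c(P)$ concrete. A sketch (assuming numpy): we perturb a matrix with a known diagonalization and compare the actual eigenvalue movement to the bound $c(P)\,\|\Delta A\|_\infty$ of Theorem 15.9.

```python
# Illustrating Theorem 15.9 -- a sketch, assuming numpy.
import numpy as np

rng = np.random.default_rng(2)
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])                 # eigenvector matrix (not unitary)
D = np.diag([1.0, 2.0])
A = P @ D @ np.linalg.inv(P)               # eigenvalues 1 and 2

dA = 1e-3 * rng.standard_normal((2, 2))
cP = np.linalg.norm(P, np.inf) * np.linalg.norm(np.linalg.inv(P), np.inf)

shifts = np.sort(np.linalg.eigvals(A + dA).real) - np.array([1.0, 2.0])
print(shifts)                              # actual eigenvalue movement
print(cP * np.linalg.norm(dA, np.inf))     # the bound R = c(P) * ||dA||_inf
```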

Problem 15.10. Let $A$ be a diagonalizable matrix with $A = PDP^H$ with $P$ unitary and $D$ diagonal, and $\lambda_i$ the eigenvalues of $A$. Show that if $\lambda$ is an eigenvalue of $A' = A + \Delta A$ then $|\lambda - \lambda_i| \le \|\Delta A\|_2$ for some $i$.

This is one of the reasons that an orthogonal basis is preferred, since it guarantees stability of the eigenvalues relative to errors in the matrix.

Problem 15.11. Let $A = \begin{bmatrix} 1 & M \\ 0 & 2 \end{bmatrix}$ and $\Delta A = \begin{bmatrix} 0 & 0 \\ x & 0 \end{bmatrix}$ with $|x| \le \varepsilon$.
Check that $A$ is diagonalizable with $P = \begin{bmatrix} 1 & M \\ 0 & 1 \end{bmatrix}$, and show that $\|P\|_\infty = \|P^{-1}\|_\infty = |M| + 1$, as well as $\|\Delta A\|_\infty \le \varepsilon$.
Give the discs within which are to be found the eigenvalues of $A'$, according to Theorem 15.9. If $M$ is very large, are the eigenvalues of $A'$ close to those of $A$? Explain.

exercises

1. Let $A = \begin{bmatrix} 1 & 0.1 & 0.1 \\ 0.1 & 2 & 0.1 \\ 0.1 & 0.1 & 3 \end{bmatrix}$.

a) Give the Gerschgorin discs for A.

b) Let $S = \begin{bmatrix} \alpha & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$. Using the Gerschgorin discs for $SAS^{-1}$ for appropriate $\alpha$, what is the best bound you can give for the eigenvalue of $A$ that is near 1?

c) Let $S = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & 1 \end{bmatrix}$. Using the Gerschgorin discs for $SAS^{-1}$ for appropriate $\beta$, what is the best bound you can give for the eigenvalue of $A$ that is near 2?

d) Let $S = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \gamma \end{bmatrix}$. Using the Gerschgorin discs for $SAS^{-1}$ for appropriate $\gamma$, what is the best bound you can give for the eigenvalue of $A$ that is near 3?



The point of this is that one need not necessarily diagonalize a matrix in order to get good estimates for the eigenvalues. Note also that no factoring of the characteristic polynomial was required.

2. Let $A$ be a square matrix and set $D$ to be the diagonal matrix with 1's on the diagonal except that $D_{11} = \alpha$ is some parameter.
a) By choosing $\alpha$ appropriately, show that the Gerschgorin disc for $DAD^{-1}$ centered at $A_{11}$ can be made arbitrarily small.
b) Can we conclude from this that $A_{11}$ is an eigenvalue of $A$?
c) Can we conclude from this that $A$ has an eigenvalue close to $A_{11}$?

3. Let $A = \begin{bmatrix} 1 & 0.3 & 0 \\ 0.4 & 2 & 0.4 \\ 0 & 0.5 & 4 \end{bmatrix}$, and $P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & x & 0 \\ 0 & 0 & 1 \end{bmatrix}$.

a) Determine the Gerschgorin discs for $A$, and explain what we can conclude about the eigenvalues of $A$ based on these.
b) We would like to know whether $A$ has eigenvalues close to 1, 2, 4. By considering the matrix $B = PAP^{-1}$ for appropriate values of $x$, find the best approximations for each eigenvalue. (hint: For each disc, find the value of $x$ that makes that disc as small as possible while also making it disjoint from the others.)

4. Let $A$ be a square matrix such that one of the Gerschgorin discs has radius zero. What can we conclude about the eigenvalues of $A$? Explain!

5. Let $A$ be a triangular matrix, and $D$ a diagonal matrix. By choosing $D$ carefully, show that the Gerschgorin discs for $DAD^{-1}$ can all be made arbitrarily small, simultaneously. Conclude from this that the eigenvalues of a triangular matrix are the diagonal elements. (Of course we knew this already.)

6. Let $A$ be a diagonal matrix, and $\Delta A$ be some matrix all of whose entries are at most $\varepsilon$ in absolute value. Show that as long as $\varepsilon$ is "small" the eigenvalues of $A + \Delta A$ will be "close" to the eigenvalues of $A$. Make this statement precise, using Theorem 15.9.

7. Let $A$ be a matrix with $A = PDP^{-1}$ for some diagonal matrix $D$ and unitary matrix $P$, and $\Delta A$ be some matrix all of whose entries are at most $\varepsilon$ in absolute value. Show that as long as $\varepsilon$ is "small" the eigenvalues of $A + \Delta A$ will be "close" to the eigenvalues of $A$. Make this statement precise, using Theorem 15.9.

8. Let $A$ be a matrix, and $\Delta A$ be a matrix all of whose entries lie in the range $[-\varepsilon, \varepsilon]$. Assume that $A$ is diagonalizable as $A = PDP^{-1}$, where the eigenvalues of $A$ are $\lambda_1, \lambda_2, \cdots, \lambda_n$. Show that if $\varepsilon$ is small enough, then there are complex numbers $\delta_1, \delta_2, \cdots, \delta_n$ with $|\delta_j| \le n\cdot\varepsilon\cdot c(P)$, such that the eigenvalues of $A + \Delta A$ are $\lambda_1 + \delta_1, \cdots, \lambda_n + \delta_n$. Determine precisely how small $\varepsilon$ has to be for this to be guaranteed. (For the purposes of this question, use the 1-norm on matrices.)

9. Consider a known diagonalization $A = PDP^{-1}$. Some other matrix $A'$ is such that $\|A' - A\|$ is small. For each of the following cases, is it reasonable to expect that the eigenvalues of $A$ are reasonable approximations to those of $A'$? Explain, based on Theorem 15.9.

a) $P = \begin{bmatrix} 1 & 1 & 0 \\ 2 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$, $P^{-1} = \begin{bmatrix} 0 & 1 & -1 \\ 1 & -1 & 1 \\ -1 & 0 & 1 \end{bmatrix}$
b) $P = \begin{bmatrix} 10 & 1 & 0 \\ 2 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$, $P^{-1} = \begin{bmatrix} 0 & 1 & -1 \\ 1 & -10 & 10 \\ -1 & 9 & -8 \end{bmatrix}$
c) $P = \begin{bmatrix} 100 & 1 & 0 \\ 2 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$, $P^{-1} = \begin{bmatrix} 0 & 1 & -1 \\ 1 & -100 & 100 \\ -1 & 99 & -98 \end{bmatrix}$



10. For the matrices $P$ of Exercise 15.9, compute the angles between the vectors that are the columns of $P$, and decide informally if $P$ is "close" to being non-invertible. Do the same for $P^{-1}$. Find $c(P)$ in each case (presumably you already did this in Exercise 15.9). If the angles between the columns of $P$ "seem not too small", so that $P$ is "not too close to being non-invertible", does this mean $c(P)$ is small?



MAT3341 : Applied Linear Algebra Mike Newman, april 2018 +

16. Schur form and normal matrices

introduction

We know about various matrix decompositions, some of which are similarity relations as well. For instance a matrix is diagonalizable if and only if the multiplicity of each eigenvalue is equal to the dimension of the corresponding eigenspace. This is a nice decomposition as it gives us a basis of eigenvectors. We also know that orthogonal bases are in general preferred, as they lead to more numerically stable computations.

Can we combine these two notions — basis of eigenvectors and orthogonal basis? The most obvious idea is to ask for an orthogonal basis of eigenvectors. This turns out not to be possible in general (we'll see more precise conditions shortly). But one of the key ideas is the Schur decomposition of a matrix.

Schur form

Theorem 16.1. Let $A$ be an $n\times n$ matrix. Then there exists a unitary matrix $P$ and an upper triangular matrix $S$ with $A = PSP^H$.

Note that the similarity implies that the eigenvalues of $A$ are exactly the diagonal elements of $S$. Also note that if we rewrite the similarity as $AP = PS$ we do not get that the columns of $P$ are eigenvectors of $A$ (except for the first column), as we do for $A = PDP^{-1}$.

The proof is a little technical, but as we'll see it gives a conceptually simple algorithm for finding the Schur form. "Conceptually simple" means that there might be a lot of technical work involved . . .

Proof. We'll give a proof which is really a recursive algorithm in disguise. To start off with, let $A_1 = A$ and find an eigenvector and eigenvalue of $A_1$. So we have $A_1\mathbf{v}_1 = \lambda_1\mathbf{v}_1$. Now extend $\mathbf{v}_1$ to an orthonormal basis of $\mathbb{C}^n$; let $Q_1$ be the matrix whose columns are this basis. We define $S_1 = Q_1^HA_1Q_1$.

What do we know about $S_1$? By analysing $Q_1S_1 = A_1Q_1$ we know that the first column of $S_1$ is in the right form: that is, it is upper triangular with an eigenvalue of $A_1$ on the diagonal.

We now define $A_2$ to be the matrix $S_1$ with the first row and column removed. So $A_2$ is the part of $S_1$ that is "not yet in Schur form". Note that the eigenvalues of $S_1$ are exactly the eigenvalues of $A_2$, with the addition of the eigenvalue $\lambda_1$.

We can do the same thing with $A_2$ as we did with $A_1$, namely find $\lambda_2$, $\mathbf{v}_2$, and $Q_2$; then we define $S_2 = Q_2^HA_2Q_2$ and set $A_3$ to be $S_2$ with the first row and column removed.

We can continue in this way, obtaining matrices $S_1, S_2, \cdots, S_{n-1}$, and we notice that $S_{n-1}$ is in fact completely in the right form: since it is two by two and its first column is triangular it is itself triangular.

The only problem is that the similarity relationships don't quite chain together. Although $S_2$ is certainly similar to $A_2$, we can't say that it is similar to $A_1$, because these matrices are not the same size.




For each $i$, we define $P_i$ to be the matrix $Q_i$ embedded as the bottom-right corner of an $n \times n$ identity matrix. In other words we add $i - 1$ rows and columns of the identity matrix to the top and left of $Q_i$. So $P_1 = Q_1$, and $P_2$ is the matrix $Q_2$ with a row and column of zeros added, and a 1 in the upper left.

Now we can compute the similarity recursively, since the $P_i$ are all the same size. We find that $P_{n-1}^HP_{n-2}^H\cdots P_2^HP_1^HAP_1P_2\cdots P_{n-2}P_{n-1}$ is a matrix whose $i$-th column, from the diagonal down, is equal to the first column of $S_i$. Namely this is an upper triangular matrix whose diagonal entries are exactly the eigenvalues of $A$.

So if we define $P = P_1P_2\cdots P_{n-2}P_{n-1}$ then $P^HAP = S$ is a triangular matrix and $P$ is a unitary matrix, as required.

Problem 16.2. Show that the eigenvalues of $S_1$ are exactly the eigenvalues of $S_2$, with the addition of the eigenvalue $\lambda_1$.

Problem 16.3. Let's make the recursive step a little more concrete for $i = 2$.
We know that $S_1 = Q_1^HA_1Q_1$ and $S_2 = Q_2^HA_2Q_2$, where $A_2$ is the matrix $S_1$ with first column and row removed. Certainly $P_1 = Q_1$, but $P_2 \neq Q_2$. The formula for $P$ says that we will calculate $P_2^HS_1P_2$. Show by block multiplication that this matrix has exactly the same first column as $S_1$, and upon removing the first row and column we get $S_2$. In other words, show that $P_2^HS_1P_2$ is triangular in the first two columns, with the first two diagonal entries being eigenvalues of $A$.

Problem 16.4. Show, in the proof above, that since the matrices $Q_i$ are unitary, the matrices $P_i$ are also unitary, and hence the matrix $P$ is unitary.

We triangularize one column at a time. In principle, this isn't that different from Gaussian elimination, except that instead of row operations we are using similarity operations.

There is one detail we have omitted from the above proof. We are to find an eigenvalue and eigenvector of the matrix $A_i$ at each stage, which we can in principle do by factoring the characteristic polynomial, either of $A_i$ or just $A$ once and for all. But then we are to find an orthonormal basis for $\mathbb{C}^{n-i+1}$ that uses this eigenvector. How do we do this? We consider the set $\{\mathbf{v}_i, \mathbf{e}_1, \mathbf{e}_2, \cdots, \mathbf{e}_{n-i+1}\}$ obtained by adjoining the standard basis to the vector $\mathbf{v}_i$. Then we run Gram-Schmidt on this (which will necessarily give $\mathbf{0}$ for one vector, which we ignore), and then we normalise the resulting orthogonal set. So at each stage of finding a Schur decomposition we need to run Gram-Schmidt. We certainly know how to do this, but it can involve a bit of work, so it is "conceptually" simple.

You are encouraged to wrestle a bit with the above proof, but an example will definitely help make it more clear.

Example 16.5. Find the Schur form of $A = \begin{bmatrix} 1 & 0 & 1 \\ 3 & 2 & -3 \\ 1 & 0 & 1 \end{bmatrix}$.

Set $A_1 = A$ and find an eigenvector and eigenvalue of $A_1$. We find that $\mathbf{v}_1 = \begin{bmatrix}0\\1\\0\end{bmatrix}$ and $\lambda_1 = 2$ will do. Note that this is the easiest one to see but other possibilities are equally valid. This choice has the lucky coincidence that we can immediately write down a matrix $Q_1$, as $Q_1 = \begin{bmatrix}0&1&0\\1&0&0\\0&0&1\end{bmatrix}$. Note that we could have gotten this by running Gram-Schmidt on the set $\left\{\mathbf{v}_1, \begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\end{bmatrix}\right\}$ (you are encouraged to try this, right now, and see what happens). To sum up we have so far

$$A_1 = \begin{bmatrix}1&0&1\\3&2&-3\\1&0&1\end{bmatrix} \qquad \lambda_1 = 2 \qquad \mathbf{v}_1 = \begin{bmatrix}0\\1\\0\end{bmatrix} \qquad Q_1 = \begin{bmatrix}0&1&0\\1&0&0\\0&0&1\end{bmatrix} = P_1$$



Then we compute $S_1$ as

$$S_1 = Q_1^HA_1Q_1 = \begin{bmatrix}2&3&-3\\0&1&1\\0&1&1\end{bmatrix}$$

The matrix $S_1$ is triangularized in its first column with one eigenvalue of $A$ displayed.

Now we extract $A_2$, and find an eigenvalue and eigenvector. Note that we can either find a root of the characteristic polynomial of $A_2$, or we can find another root of the characteristic polynomial of $A_1$. Then we extend $\mathbf{v}_2$ to an orthonormal basis of $\mathbb{C}^2$, and let $Q_2$ be the matrix whose columns are this basis.

$$A_2 = \begin{bmatrix}1&1\\1&1\end{bmatrix} \qquad \lambda_2 = 0 \qquad \mathbf{v}_2 = \begin{bmatrix}-1\\1\end{bmatrix} \qquad Q_2 = \frac{1}{\sqrt2}\begin{bmatrix}-1&1\\1&1\end{bmatrix} \qquad P_2 = \begin{bmatrix}1&0&0\\0&-\frac{1}{\sqrt2}&\frac{1}{\sqrt2}\\0&\frac{1}{\sqrt2}&\frac{1}{\sqrt2}\end{bmatrix}$$

Now we compute $S_2$ as

$$S_2 = Q_2^HA_2Q_2 = \begin{bmatrix}0&0\\0&2\end{bmatrix}$$

This is triangular, so the other eigenvalue is $\lambda_3 = 2$.

This is fine, but we don't actually want $S_2$. What we'd rather do is nest these, so as to apply $Q_2$ not to $A_2$ but to the larger $S_1$ from whence it came. So we use the larger $P_2$ instead.

$$P_2^HS_1P_2 = \begin{bmatrix}2&-3\sqrt2&0\\0&0&0\\0&0&2\end{bmatrix} = P_2^HP_1^HA_1P_1P_2 = P^HAP = S$$

Thus we have $A = PSP^H$ with $P = P_1P_2$. The eigenvalues of $S$ (and hence of $A$) are 2, 0 and 2, and the columns of $P$ are an orthonormal basis of $\mathbb{C}^3$ (or $\mathbb{R}^3$). Which columns of $P$ are eigenvectors of $A$? Deduce this from the form of $S$, and then compute $P$ explicitly and check.

Since we need to compute unitary matrices at each stage (i.e., not only Gram-Schmidt but normalise too) the matrices can easily get messy. But you should at least convince yourself that this is a decomposition that you could compute.
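In practice one computes Schur decompositions with library routines rather than by hand. A sketch (assuming numpy and scipy) that checks Example 16.5: scipy.linalg.schur with output='complex' returns $T$ and $Z$ with $A = ZTZ^H$, $T$ upper triangular.

```python
# Checking Example 16.5 with a library Schur decomposition -- a sketch,
# assuming numpy and scipy.
import numpy as np
from scipy.linalg import schur

A = np.array([[1, 0, 1],
              [3, 2, -3],
              [1, 0, 1]], dtype=float)

T, Z = schur(A, output='complex')              # A = Z T Z^H with T upper triangular
print(np.round(np.diag(T).real, 6))            # eigenvalues 2, 0, 2 (in some order)
print(np.allclose(Z @ T @ Z.conj().T, A))      # reconstructs A
print(np.allclose(Z.conj().T @ Z, np.eye(3)))  # Z is unitary
```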

normal matrices

A matrix $A$ is normal if $A^HA = AA^H$. In general matrices do not commute, so it would seem reasonable to guess that not all matrices are normal.1

Problem 16.6. Show that if $A$ is Hermitian or unitary then $A$ is normal (special cases for real matrices: symmetric or orthogonal). Also, show that $\begin{bmatrix}1&-1\\1&1\end{bmatrix}$ is normal (it is not in any of the above categories) while $\begin{bmatrix}1&2\\1&1\end{bmatrix}$ is not normal.

Normal matrices are important because of the following result.

1 In fact most matrices are not normal. The word "normal" is not meant as a synonym for "typical" but rather refers to orthogonality, as in "normal vector to a plane". We'll see that normal matrices have eigenspaces that are orthogonal to each other.



Theorem 16.7. Let $A$ be an $n\times n$ matrix.
There exists a unitary matrix $P$ and a diagonal matrix $D$ with $A = PDP^H$ if and only if $A$ is normal.

Such a decomposition $A = PDP^H$ is called a unitary diagonalisation, or, when the matrix $P$ is real, an orthogonal diagonalisation. Note that a unitary diagonalisation $A = PDP^H$ is a special case of ordinary diagonalisation $A = PDP^{-1}$, and so we must necessarily have that each eigenspace has dimension equal to the multiplicity of the corresponding eigenvalue. This condition is in fact implied by normality: we won't show this directly, since it follows from Theorem 16.7.

Proof. First if $A = PDP^H$ then

$$A^HA = (PDP^H)^H(PDP^H) = PD^HP^HPDP^H = PD^HDP^H = PDD^HP^H = PDP^HPD^HP^H = AA^H$$

Thus $A$ is normal. The middle step amounts to saying that diagonal matrices commute.

Now we show that if $A$ is normal then it has a unitary diagonalisation. The matrix $A$ certainly has a Schur decomposition, so we can find a unitary $P$ and a triangular $S$ with $A = PSP^H$, giving $S = P^HAP$. Now we discover that $S$ is normal, by recycling the above argument.

$$S^HS = (P^HAP)^H(P^HAP) = P^HA^HPP^HAP = P^HA^HAP = P^HAA^HP = P^HAPP^HA^HP = SS^H$$

So S is both triangular and normal. The proof is completed by the following result.

Proposition 16.8. If B is triangular and normal then B is diagonal.

Proof. We know that $B^HB = BB^H$. If we calculate both products we find the $(1,1)$-position as

$$\overline{b_{11}}b_{11} + \overline{b_{21}}b_{21} + \cdots + \overline{b_{n1}}b_{n1} = b_{11}\overline{b_{11}} + b_{12}\overline{b_{12}} + \cdots + b_{1n}\overline{b_{1n}}$$

But $B$ is triangular (say upper) so $b_{ij} = 0$ if $i > j$. So in fact we have

$$|b_{11}|^2 = |b_{11}|^2 + |b_{12}|^2 + \cdots + |b_{1n}|^2$$

This gives $|b_{1j}|^2 = 0$ if $j > 1$ and so $b_{1j} = 0$ if $j > 1$. In general (proceeding row by row, and using that the earlier rows have already been shown to be zero off the diagonal) the $(i,i)$-position of the products $B^HB$ and $BB^H$ gives

$$\overline{b_{ii}}b_{ii} + \overline{b_{i+1,i}}b_{i+1,i} + \cdots + \overline{b_{ni}}b_{ni} = b_{ii}\overline{b_{ii}} + b_{i,i+1}\overline{b_{i,i+1}} + \cdots + b_{in}\overline{b_{in}}$$
$$\overline{b_{ii}}b_{ii} = b_{ii}\overline{b_{ii}} + b_{i,i+1}\overline{b_{i,i+1}} + \cdots + b_{in}\overline{b_{in}}$$
$$|b_{ii}|^2 = |b_{ii}|^2 + |b_{i,i+1}|^2 + \cdots + |b_{in}|^2$$

So $b_{ij} = 0$ for all $j > i$. This gives that $b_{ij} = 0$ if $i \neq j$ and so $B$ is diagonal.

The consequence of Theorem 16.7 is that if we find the Schur decomposition of a normal matrix then the result will automatically be a unitary diagonalisation, since the matrix $S$ will be diagonal. This isn't the only way to find it. We could also find an ordinary diagonalisation as $A = PDP^{-1}$, and choose orthonormal bases for each eigenspace. This involves running Gram-Schmidt for each eigenspace, namely on sets of size $\operatorname{mult}(\lambda_j)$ for each eigenvalue $\lambda_j$. The fact that this results in a set that is overall an orthonormal basis of $\mathbb{C}^n$ is a consequence of the existence of an orthonormal diagonalisation.



Problem 16.9. Show that if $A$ is normal and $A\mathbf{x} = \lambda\mathbf{x}$, $A\mathbf{y} = \mu\mathbf{y}$ for $\lambda \neq \mu$, then $\mathbf{x}$ and $\mathbf{y}$ are orthogonal. (hint: Show that $\langle A\mathbf{x}|\mathbf{y}\rangle = \langle\mathbf{x}|A\mathbf{y}\rangle$, and then simplify the two scalar products using $A\mathbf{x} = \lambda\mathbf{x}$ and $A\mathbf{y} = \mu\mathbf{y}$.)
Conclude that if we take an orthonormal basis for each eigenspace of a normal matrix, then the whole set is orthonormal and is thus an orthonormal basis for $\mathbb{C}^n$.

transformations

Our motivation for matrix decompositions is to better understand linear transformations.

Theorem 16.10. Let $T : V \to V$ be a linear transformation that is represented by a matrix $A$ (with respect to some basis).
Then there is an orthonormal basis with respect to which the matrix of $T$ is upper triangular.

The proof is just the combination of Theorem 16.1 and Theorem 10.32. We can further use Theorem 16.7 to get the following.

Theorem 16.11. Let $T : V \to V$ be a linear transformation that is represented by a matrix $A$ (with respect to some basis).
Then there is an orthonormal basis with respect to which the matrix of $T$ is diagonal if and only if $A$ is normal (and in particular, every matrix representing $T$ is normal).

Example 16.12. Let $T : V \to V$ be a linear transformation on a complex vector space $V$ of dimension 2, such that $T$ is represented by the matrix $A = \begin{bmatrix}0&2-i\\2+i&-4\end{bmatrix}$. Is $T$ unitarily diagonalisable?

We directly compute that $A^HA = \begin{bmatrix}5&-8+4i\\-8-4i&21\end{bmatrix} = AA^H$, so the matrix $A$ is normal. Furthermore, we could have noticed that $A^H = A$, an even easier condition to check, which then gives normality automatically (since Hermitian matrices are normal). In any case Theorem 16.7 guarantees that there is a unitary $P$ and a diagonal $D$ with $A = PDP^H$. So $T$ would be represented by $D$ with respect to the basis formed of the columns of $P$.

We can find the unitary diagonalisation of the previous example in two ways.

Example 16.13. Find a unitary diagonalisation of $A = \begin{bmatrix}0&2-i\\2+i&-4\end{bmatrix}$, using $A = PDP^{-1}$.

We calculate the eigenvalues of A and a basis for each eigenspace.

$$\lambda_1 = -5 \qquad \operatorname{mult}(-5) = 1 \qquad \text{basis: } \mathbf{v}_1 = \begin{bmatrix}-1\\2+i\end{bmatrix} \qquad \dim(-5) = 1$$
$$\lambda_2 = 1 \qquad \operatorname{mult}(1) = 1 \qquad \text{basis: } \mathbf{v}_2 = \begin{bmatrix}5\\2+i\end{bmatrix} \qquad \dim(1) = 1$$

Using Problem 16.9 we know that $\mathbf{v}_1$ and $\mathbf{v}_2$ will be orthogonal without even calculating their inner product (calculate it and see!). So in order to find an orthonormal basis we would have to apply Gram-Schmidt to each eigenbasis. But each eigenbasis is of size 1, so we actually only need to normalise the two vectors to get the columns of $P$. The diagonal of $D$ is exactly the eigenvalues.

$$P = \begin{bmatrix}\frac{-1}{\sqrt6}&\frac{5}{\sqrt{30}}\\[2pt]\frac{2+i}{\sqrt6}&\frac{2+i}{\sqrt{30}}\end{bmatrix} = \frac{1}{\sqrt{30}}\begin{bmatrix}-\sqrt5&5\\2\sqrt5+\sqrt5\,i&2+i\end{bmatrix} \qquad D = \begin{bmatrix}-5&0\\0&1\end{bmatrix}$$



Were we to have had an eigenspace of dimension greater than 1, we would have had to do a "real" Gram-Schmidt. This method is relatively easy if the individual multiplicities are all small.

Example 16.14. Find a unitary diagonalisation of $A = \begin{bmatrix}0&2-i\\2+i&-4\end{bmatrix}$, using $A = PSP^H$.

We start by finding one eigenvalue and one eigenvector.

$$\lambda_1 = -5 \qquad \mathbf{v}_1 = \begin{bmatrix}-1\\2+i\end{bmatrix}$$

We needn't find all the eigenvalues, nor even a basis for the eigenspace in question; one eigenvector suffices. Now we build an orthonormal basis for $\mathbb{C}^2$ starting with our eigenvector. We could apply Gram-Schmidt to the set

$$\left\{\frac{1}{\sqrt6}\begin{bmatrix}-1\\2+i\end{bmatrix}, \begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix}\right\}$$

Since we are looking for only one more vector, we could also set $\mathbf{v}_2 = \begin{bmatrix}k&2+i\end{bmatrix}^T$ and find the value of $k$ such that $\{\mathbf{v}_1, \mathbf{v}_2\}$ is orthogonal.

$$0 = \mathbf{v}_1^H\mathbf{v}_2 = \frac{1}{\sqrt6}(-k + 5)$$

So $k = 5$. We normalise to get the columns of the matrix $P$ and we have $D = S = P^HAP$.

$$P = \begin{bmatrix}\frac{-1}{\sqrt6}&\frac{5}{\sqrt{30}}\\[2pt]\frac{2+i}{\sqrt6}&\frac{2+i}{\sqrt{30}}\end{bmatrix} = \frac{1}{\sqrt{30}}\begin{bmatrix}-\sqrt5&5\\2\sqrt5+\sqrt5\,i&2+i\end{bmatrix} \qquad D = S = P^HAP = \begin{bmatrix}-5&0\\0&1\end{bmatrix}$$

The matrix $S$ is automatically diagonal. Instead of calculating $S = P^HAP$, we could have just calculated the second eigenvalue and written down $S$ directly, because we knew it would be diagonal.

Were we to have had a matrix of size greater than 2, we would have had to do a "real" Schur decomposition, that is, iterate to triangularise one column at a time. This method is relatively simple when the matrix $A$ is small.

exercises

1. Which of the following matrices are normal?
a) $\begin{bmatrix}2&3\\1&2\end{bmatrix}$
b) $\begin{bmatrix}1&0&0\\0&1&2\\0&0&1\end{bmatrix}$

2. a) Show that $\begin{bmatrix}a&b\\c&d\end{bmatrix}$ is normal if and only if either $b = c$, or $c = -b$ and $d = a$. Using this, characterize all non-diagonal $2\times 2$ matrices that are orthogonally (unitarily) diagonalizable and show that none of them have real eigenvalues.
b) Show that if $A$ and $B$ are normal matrices, then the block matrix $\begin{bmatrix}A&0\\0&B\end{bmatrix}$ is also normal.

3. Show that if $AB = BA$ then $A^HB^H = B^HA^H$. Using this, show that if $A$ and $B$ are normal matrices that commute, then $AB$ is also normal.

4. Find a Schur decomposition for each matrix.



a) $\begin{bmatrix}3&1\\-5&9\end{bmatrix}$
b) $\begin{bmatrix}1&4&3\\0&-2&-9\\0&-4&7\end{bmatrix}$
c) $\begin{bmatrix}8&-i\\6i&7\end{bmatrix}$
d) $\begin{bmatrix}1&i\sqrt2&\sqrt2\\0&3/2&-i/2\\0&-3i/2&7/2\end{bmatrix}$

5. Suppose that $A = PSP^H$ is a Schur decomposition of $A$. Show that the first $j$ columns of $P$ span an invariant subspace of $A$, for any $j$. (hint: think of $A = PSP^H$ as $AP = PS$.)



MAT3341 : Applied Linear Algebra Mike Newman, april 2018 +

17. Singular Value Decomposition

decompositions

We can always find a representation of a linear transformation that is triangular, with an orthonormal basis ($A = PSP^H$). If the dimensions of the eigenspaces are equal to the multiplicities we can find a diagonal representation ($A = PDP^{-1}$). If the representing matrix is normal then we can even find a unitary diagonalization ($A = PDP^H$).

All of these use the same basis for the origin and destination vector spaces. This is perhaps natural, in that for these decompositions it is typically the same space. But we don't have to do this. Instead of looking for one nice basis $B$ such that we have $A = M_{B\to E}A'M_{E\to B}$ and $A'$ is a nicer matrix (triangular, diagonal, . . . ) we could look for two bases $B$ and $C$ that give $A = M_{C\to E}A''M_{E\to B}$ for some (perhaps nicer) matrix $A''$. So the matrix $A''$ functions between two different bases.

$$T(\mathbf{v}) = \mathbf{w} \iff A''\varphi_B(\mathbf{v}) = \varphi_C(\mathbf{w})$$

It turns out that the added flexibility of having different bases gives an elegant decomposition. It more or less allows us to unitarily diagonalize any matrix.

singular values

Theorem 17.1. Let $A$ be an $m\times n$ matrix. Then there exist unitary matrices $U$ and $V$ and a non-negative diagonal matrix $\Sigma$ with $A = U\Sigma V^H$.

The hypothesis on $A$ is no hypothesis at all. So this decomposition applies to any matrix whatsoever. This is called a singular value decomposition.1

Problem 17.2. Show that if $A$ is $m\times n$ and $A = U\Sigma V^H$ then $U$ is $m\times m$, $V$ is $n\times n$ and $\Sigma$ is $m\times n$. Notice that $\Sigma_{ij} = 0$ if $i \neq j$, and so if $m \neq n$ then $\Sigma$ has either rows or columns of zeros.

We'll prove this theorem by giving an algorithm for finding the decomposition. First we notice some consequences. We denote the diagonal values of $\Sigma$ as $\sigma_1, \sigma_2, \cdots$. These are the singular values of $A$. Furthermore we will choose to put them in non-increasing order, and we will denote by $r$ the number of non-zero singular values. So $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ and $\sigma_i = 0$ for $r + 1 \le i \le \min\{m, n\}$. We will denote the columns of $U$ by $\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_m$ and the columns of $V$ by $\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_n$ (of course the columns of $V$ correspond to the rows of $V^H$).

If $A = U\Sigma V^H$ then $AV = U\Sigma$. Calculating the $j$-th column of both sides of this equation (this is Proposition 1.2) we get

$$A\mathbf{v}_j = \sigma_j\mathbf{u}_j$$

This looks a little like an eigenvector-eigenvalue equation. We see that if $1 \le j \le r$ then $\mathbf{u}_j = \frac{1}{\sigma_j}A\mathbf{v}_j$, and that if $j > r$ then $A\mathbf{v}_j = \mathbf{0}$. In fact we can say a little more.

Proposition 17.3. If $A = U\Sigma V^H$ then $\{\mathbf{v}_{r+1}, \cdots, \mathbf{v}_n\}$ is an orthonormal basis for $\operatorname{nul}(A)$.

1 The notation $\Sigma$ is perhaps a little strange for a matrix, but it is traditional.



Proof. Certainly $\mathbf{v}_j$ is in $\operatorname{nul}(A)$ for $j > r$. Furthermore the columns of $V$ are orthonormal, so $\{\mathbf{v}_{r+1}, \cdots, \mathbf{v}_n\}$ is an orthonormal set in $\operatorname{nul}(A)$. It remains to be shown that they span $\operatorname{nul}(A)$. Now let $\mathbf{v} \in \operatorname{nul}(A)$; we can write it as a linear combination of the columns of $V$, and so calculate $A\mathbf{v}$.

$$\mathbf{v} = \alpha_1\mathbf{v}_1 + \cdots + \alpha_r\mathbf{v}_r + \alpha_{r+1}\mathbf{v}_{r+1} + \cdots + \alpha_n\mathbf{v}_n$$
$$\mathbf{0} = A\mathbf{v} = A(\alpha_1\mathbf{v}_1 + \cdots + \alpha_r\mathbf{v}_r + \alpha_{r+1}\mathbf{v}_{r+1} + \cdots + \alpha_n\mathbf{v}_n)$$
$$\mathbf{0} = \alpha_1A\mathbf{v}_1 + \cdots + \alpha_rA\mathbf{v}_r + \alpha_{r+1}A\mathbf{v}_{r+1} + \cdots + \alpha_nA\mathbf{v}_n$$
$$\mathbf{0} = \alpha_1\sigma_1\mathbf{u}_1 + \cdots + \alpha_r\sigma_r\mathbf{u}_r + \mathbf{0} + \cdots + \mathbf{0}$$

The columns of $U$ are independent so $\alpha_j\sigma_j = 0$ for $1 \le j \le r$. Since $\sigma_j > 0$ for $j \le r$, we must have $\alpha_j = 0$ for $j \le r$. We conclude that $\mathbf{v}$ is in the span of $\mathbf{v}_{r+1}, \cdots, \mathbf{v}_n$; in other words, this set spans $\operatorname{nul}(A)$. The set is certainly independent (right?) and so is a basis for $\operatorname{nul}(A)$.

If $A = U\Sigma V^H$ then we also get $A^HU = V\Sigma^H$. This shows that if $1 \le j \le r$ then $\mathbf{v}_j = \frac{1}{\sigma_j}A^H\mathbf{u}_j$, and that if $j > r$ then $A^H\mathbf{u}_j = \mathbf{0}$.

Proposition 17.4. If $A = U\Sigma V^H$ then $\{\mathbf{u}_{r+1}, \cdots, \mathbf{u}_m\}$ is an orthonormal basis for $\operatorname{nul}(A^H)$.

Problem 17.5. Prove Proposition 17.4, analogously to Proposition 17.3.

decompositions of Hermitian matrices

We notice that if a singular value decomposition $A = U\Sigma V^H$ exists then

$$A^HA = (U\Sigma V^H)^H(U\Sigma V^H) = V\Sigma^HU^HU\Sigma V^H = V\Sigma^H\Sigma V^H$$
$$AA^H = (U\Sigma V^H)(U\Sigma V^H)^H = U\Sigma V^HV\Sigma^HU^H = U\Sigma\Sigma^HU^H$$

Problem 17.6. Describe the difference between $\Sigma^H\Sigma$ and $\Sigma\Sigma^H$. Knowing $\Sigma^H\Sigma$ or $\Sigma\Sigma^H$, is it possible to find $\Sigma$?

Problem 17.7. Show that $A^HA$ and $AA^H$ are Hermitian, regardless of $A$. What can we conclude from Theorem 16.7?

Since $A^HA$ is Hermitian we can find a unitary diagonalization as $A^HA = PDP^H$. But we want to use $D = \Sigma^H\Sigma$ to find the non-negative matrix $\Sigma$. What if $D$ isn't non-negative? What if it isn't real?

Proposition 17.8. Let $B$ be a Hermitian matrix. Then in the unitary diagonalization $B = PDP^H$ the matrix $D$ is real. In particular, the eigenvalues of $B$ are all real.

Proof. If $B = PDP^H$ then $D = P^HBP$. Then $D^H = (P^HBP)^H = P^HB^HP$. Since $B = B^H$ we find that $D^H = D$. The matrix $D$ is diagonal so $D^H = \overline{D}$, giving $\overline{D} = D$, and so $D$ is real.

Proposition 17.9. Let $B = A^HA$ for some matrix $A$. Then in the unitary diagonalization $B = PDP^H$ the matrix $D$ is real and non-negative. In particular the eigenvalues of $B$ are all real and non-negative.



Proof. If $B\mathbf{x} = \lambda\mathbf{x}$ then $A^HA\mathbf{x} = \lambda\mathbf{x}$ and so $\mathbf{x}^HA^HA\mathbf{x} = \lambda\mathbf{x}^H\mathbf{x}$. This gives that $\lambda = \|A\mathbf{x}\|_2^2 / \|\mathbf{x}\|_2^2$. Since the norms are real and non-negative, $\lambda$ is real and non-negative.

This guarantees that knowing $D = P^H(A^HA)P$ we can uniquely determine a non-negative matrix $\Sigma$ with $D = \Sigma^H\Sigma$. We conclude that if a matrix has a singular value decomposition $A = U\Sigma V^H$ then $V$ and $\Sigma$ can be obtained from the unitary diagonalization $A^HA = PDP^H$, and $U$ and $\Sigma$ can likewise be obtained from the unitary diagonalization $AA^H = PDP^H$.

So we can get Σ from AHA or from AAH . Are these the same Σ?

Proposition 17.10. Let $A$ be any matrix. Then the non-zero eigenvalues of $A^HA$ and $AA^H$ are identical, including multiplicity. Furthermore, $\operatorname{nul}(A^HA) = \operatorname{nul}(A)$ and $\operatorname{nul}(AA^H) = \operatorname{nul}(A^H)$.

Proof. Assume $A^HA\mathbf{x} = \lambda\mathbf{x}$ for $\lambda \neq 0$ (and $\mathbf{x} \neq \mathbf{0}$ of course). Then $AA^HA\mathbf{x} = \lambda A\mathbf{x}$ and so $AA^H(A\mathbf{x}) = \lambda(A\mathbf{x})$. Since $A\mathbf{x}$ is non-zero (really? why?) we see that $A\mathbf{x}$ is an eigenvector for $AA^H$ with the same eigenvalue $\lambda$. So every eigenvalue of $A^HA$ is also an eigenvalue of $AA^H$. A similar argument shows that the eigenvalues of $AA^H$ are all eigenvalues of $A^HA$.

For multiplicities, let $X$ be a matrix whose columns form a basis for the $\lambda$-eigenspace of $A^HA$ (for some $\lambda \neq 0$). Certainly $\operatorname{nul}(X) = \{\mathbf{0}\}$ since its columns are independent, and $A^HAX = \lambda X$ since its columns are eigenvectors. Also the columns of $AX$ are eigenvectors of $AA^H$ by the previous; it remains to show that the columns of $AX$ are independent. Assume not, that is assume there is some $\mathbf{y} \neq \mathbf{0}$ such that $(AX)\mathbf{y} = \mathbf{0}$. Then $A^H(AX)\mathbf{y} = A^H\mathbf{0} = \mathbf{0}$. But then $\mathbf{0} = A^HAX\mathbf{y} = \lambda X\mathbf{y}$, which is impossible since the columns of $X$ are independent.

Now if $\mathbf{x} \in \operatorname{nul}(A)$ then $A\mathbf{x} = \mathbf{0}$, which means that $A^HA\mathbf{x} = \mathbf{0}$, making $\mathbf{x} \in \operatorname{nul}(A^HA)$. Conversely, if $\mathbf{x} \in \operatorname{nul}(A^HA)$ then $A^HA\mathbf{x} = \mathbf{0}$, which means $\mathbf{x}^HA^HA\mathbf{x} = 0$. But this says $\|A\mathbf{x}\|_2^2 = 0$, which by a property of norms gives $A\mathbf{x} = \mathbf{0}$, making $\mathbf{x} \in \operatorname{nul}(A)$. Thus $\operatorname{nul}(A^HA) = \operatorname{nul}(A)$. A similar argument shows that $\operatorname{nul}(AA^H) = \operatorname{nul}(A^H)$.

The matrix $\Sigma$ is uniquely determined by the values $\sigma_i$ but $U$ and $V$ are not unique. This means we can't compute $U$ and $V$ separately using unitary diagonalizations for $A^HA$ and $AA^H$; we need to compute one from the other.

We get one further corollary from Proposition 17.10, which explains our choice of $r$ for the number of non-zero singular values of $A$.

Corollary 17.11. The rank of a matrix is equal to the number of non-zero singular values of the matrix, counting multiplicities.

We leave the proof to Exercise 17.3.

Now that we have Proposition 17.9, it seems like a good time to complete the proof of Theorem 11.8.

Proof of Theorem 11.8. We consider $\|A\|_2$:

$$\|A\|_2^2 = \max_{\mathbf{x}\neq\mathbf{0}}\frac{\|A\mathbf{x}\|_2^2}{\|\mathbf{x}\|_2^2} = \max_{\mathbf{x}\neq\mathbf{0}}\frac{\mathbf{x}^HA^HA\mathbf{x}}{\mathbf{x}^H\mathbf{x}}$$

Now we let $\mathbf{u}_1, \cdots, \mathbf{u}_n$ be an orthonormal set of eigenvectors for $A^HA$ (hence a basis of $\mathbb{C}^n$), with $A^HA\mathbf{u}_i = \lambda_i\mathbf{u}_i$, ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. By Proposition 17.9 each $\lambda_i$ is real and non-negative. Let $\alpha_i$ be scalars such that $\mathbf{x} = \sum_{i=1}^n \alpha_i\mathbf{u}_i$. To simplify we consider the square of the expression we want to maximize.

$$\left(\frac{\|A\mathbf{x}\|_2}{\|\mathbf{x}\|_2}\right)^2 = \frac{\mathbf{x}^HA^HA\mathbf{x}}{\mathbf{x}^H\mathbf{x}} = \frac{\bigl(\sum_{i=1}^n \alpha_i\mathbf{u}_i\bigr)^H A^HA \bigl(\sum_{i=1}^n \alpha_i\mathbf{u}_i\bigr)}{\bigl(\sum_{i=1}^n \alpha_i\mathbf{u}_i\bigr)^H \bigl(\sum_{i=1}^n \alpha_i\mathbf{u}_i\bigr)} = \frac{\bigl(\sum_{i=1}^n \alpha_i\mathbf{u}_i\bigr)^H \bigl(\sum_{i=1}^n \alpha_i\lambda_i\mathbf{u}_i\bigr)}{\bigl(\sum_{i=1}^n \alpha_i\mathbf{u}_i\bigr)^H \bigl(\sum_{i=1}^n \alpha_i\mathbf{u}_i\bigr)}$$
$$= \frac{\sum_{i=1}^n |\alpha_i|^2\lambda_i\,\mathbf{u}_i^H\mathbf{u}_i}{\sum_{i=1}^n |\alpha_i|^2\,\mathbf{u}_i^H\mathbf{u}_i} = \frac{\sum_{i=1}^n |\alpha_i|^2\lambda_i}{\sum_{i=1}^n |\alpha_i|^2} \le \lambda_1\frac{\sum_{i=1}^n |\alpha_i|^2}{\sum_{i=1}^n |\alpha_i|^2} = \lambda_1$$

But this upper bound is achieved.

$$\frac{\mathbf{u}_1^HA^HA\mathbf{u}_1}{\mathbf{u}_1^H\mathbf{u}_1} = \frac{\mathbf{u}_1^H\lambda_1\mathbf{u}_1}{\mathbf{u}_1^H\mathbf{u}_1} = \lambda_1\frac{\mathbf{u}_1^H\mathbf{u}_1}{\mathbf{u}_1^H\mathbf{u}_1} = \lambda_1$$

So $\|A\|_2$ is the square root of the largest eigenvalue of $A^HA$.

We see now that Theorem 11.8 says that ‖A‖2 is the largest singular value of A.

Corollary 17.12. With respect to the 2-norm, the condition number of an invertible matrix is the largest singular value divided by the smallest: $c(A) = \sigma_1/\sigma_r$.

So to have a small condition number (close to 1) the largest and smallest singular values should be close to each other. We leave the proof to Exercise 17.5.

singular value decomposition

We turn to the proof of Theorem 17.1: how to find a singular value decomposition.

Algorithm 17.13 (Singular Value Decomposition). We have two methods to find a singular value decomposition $U\Sigma V^H$ of a matrix $A$.
The first method is based on the matrix $B = A^HA$.

1. We find a unitary diagonalization $B = PDP^H$.

2. If the eigenvalues of $B$ are $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0 = \lambda_{r+1} = \cdots = \lambda_n$ then the singular values of $A$ are $\sigma_i = \sqrt{\lambda_i}$ for $1 \le i \le r$ and the other singular values are all zero. So we know the matrix $\Sigma$.

3. The matrix $V$ is the matrix $P$. We need to be careful to keep the columns ordered consistently with $\Sigma$.

4. We compute $\mathbf{u}_j = A\mathbf{v}_j/\sigma_j$ for $1 \le j \le r$. This is the beginning of $U$.

5. We find the remaining columns of $U$ as an orthonormal basis for $\operatorname{nul}(A^H)$.

The second method is similar, but starts with the alternate matrix $B = AA^H$.

1. We find a unitary diagonalization $B = PDP^H$.

2. If the eigenvalues of $B$ are $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0 = \lambda_{r+1} = \cdots = \lambda_m$ then the singular values of $A$ are $\sigma_i = \sqrt{\lambda_i}$ for $1 \le i \le r$ and the other singular values are all zero. So we know the matrix $\Sigma$.

3. The matrix $U$ is the matrix $P$. We need to be careful to keep the columns ordered consistently with $\Sigma$.

4. We compute $\mathbf{v}_j = A^H\mathbf{u}_j/\sigma_j$ for $1 \le j \le r$. This is the beginning of $V$.

5. We find the remaining columns of $V$ as an orthonormal basis for $\operatorname{nul}(A)$.

A singular value decomposition is not unique, since for each eigenspace of $B$ we have a choice of orthonormal basis.
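The first method of Algorithm 17.13 can be coded directly. A sketch (assuming numpy; the function name svd_via_AHA is ours): diagonalize $A^HA$ with eigh, take square roots for the singular values, build the first $r$ columns of $U$ from $\mathbf{u}_j = A\mathbf{v}_j/\sigma_j$, and complete $U$ with an orthonormal basis for $\operatorname{nul}(A^H)$ (here obtained by a QR completion rather than explicit Gram-Schmidt). Of course in practice one would just call np.linalg.svd.

```python
# SVD via a unitary diagonalization of A^H A -- a sketch, assuming numpy.
import numpy as np

def svd_via_AHA(A):
    A = np.asarray(A, dtype=complex)
    m, n = A.shape
    lam, V = np.linalg.eigh(A.conj().T @ A)    # Hermitian eigendecomposition of B = A^H A
    order = np.argsort(lam)[::-1]              # eigenvalues in non-increasing order
    lam, V = lam[order], V[:, order]
    sigma = np.sqrt(np.clip(lam, 0, None))     # singular values sigma_i = sqrt(lambda_i)
    r = int(np.sum(sigma > 1e-12 * sigma[0]))  # number of non-zero singular values
    U = np.zeros((m, m), dtype=complex)
    U[:, :r] = A @ V[:, :r] / sigma[:r]        # u_j = A v_j / sigma_j
    if r < m:                                  # complete U: orthonormal basis of nul(A^H)
        Q, _ = np.linalg.qr(np.hstack([U[:, :r], np.eye(m, dtype=complex)]))
        U[:, r:] = Q[:, r:m]
    Sigma = np.zeros((m, n))
    k = min(m, n)
    Sigma[:k, :k] = np.diag(sigma[:k])
    return U, Sigma, V

A = np.array([[1, 1], [2, 2], [2, 2]], dtype=float)
U, S, V = svd_via_AHA(A)
print(np.allclose(U @ S @ V.conj().T, A))      # True: A = U Sigma V^H
```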

Example 17.14. Find a singular value decomposition of $A = \begin{bmatrix}1&1\\2&2\\2&2\end{bmatrix}$.

We start with $B = A^HA = \begin{bmatrix}9&9\\9&9\end{bmatrix}$.

We find $B = PDP^H$ with the following.

$$D = \begin{bmatrix}18&0\\0&0\end{bmatrix} \qquad P = \frac{1}{\sqrt2}\begin{bmatrix}1&-1\\1&1\end{bmatrix}$$

The matrix $P$ gives $V$ directly; the singular values are the square roots of the eigenvalues of $B$, which give $\Sigma$.

$$\Sigma = \begin{bmatrix}3\sqrt2&0\\0&0\\0&0\end{bmatrix} \qquad V = \frac{1}{\sqrt2}\begin{bmatrix}1&-1\\1&1\end{bmatrix}$$

We have $r = 1$, the number of non-zero singular values. The number of singular values is $\min\{m, n\} = 2$; the singular values are $\sigma_1 = 3\sqrt2$ and $\sigma_2 = 0$. Note that $D$ is the same size as $B$, namely $n\times n$, and $\Sigma$ is the same size as $A$, namely $m\times n$. We compute the first $r$ columns of $U$.

$$\mathbf{u}_1 = \frac{A\mathbf{v}_1}{\sigma_1} = \frac{1}{3\sqrt2}\begin{bmatrix}2/\sqrt2\\4/\sqrt2\\4/\sqrt2\end{bmatrix} = \begin{bmatrix}1/3\\2/3\\2/3\end{bmatrix}$$

Now we find the vectors $\mathbf{u}_2$ and $\mathbf{u}_3$ that give an orthonormal basis for $\operatorname{nul}(A^H)$. We reduce the matrix $A^H$, we read a basis for the null space, and we apply Gram-Schmidt to transform this into an orthonormal basis.

$$\begin{bmatrix}1&2&2\\1&2&2\end{bmatrix} \to \begin{bmatrix}1&2&2\\0&0&0\end{bmatrix} \qquad \text{basis: }\left\{\begin{bmatrix}-2\\1\\0\end{bmatrix}, \begin{bmatrix}-2\\0\\1\end{bmatrix}\right\} \qquad \text{G-S, o.n. basis: }\left\{\frac{1}{\sqrt5}\begin{bmatrix}-2\\1\\0\end{bmatrix}, \frac{1}{3\sqrt5}\begin{bmatrix}2\\4\\-5\end{bmatrix}\right\}$$

We know $U$, $\Sigma$ and $V$ with $A = U\Sigma V^H$.

$$U = \begin{bmatrix}1/3&-2/\sqrt5&2/(3\sqrt5)\\2/3&1/\sqrt5&4/(3\sqrt5)\\2/3&0&-5/(3\sqrt5)\end{bmatrix} \qquad \Sigma = \begin{bmatrix}3\sqrt2&0\\0&0\\0&0\end{bmatrix} \qquad V = \frac{1}{\sqrt2}\begin{bmatrix}1&-1\\1&1\end{bmatrix}$$



Problem 17.15. In Example 17.14 we could have found $\mathbf{u}_2$ and $\mathbf{u}_3$ by observing that we need two vectors that make an orthonormal set with $\mathbf{u}_1$. We start with a set that contains $\mathbf{u}_1$ and spans $\mathbb{R}^3$ ($\mathbf{u}_1$ along with the standard basis will work) and we apply Gram-Schmidt. There will be $r = 1$ zero vectors which we will ignore. Check the Gram-Schmidt in the following.

$$\left\{\begin{bmatrix}1/3\\2/3\\2/3\end{bmatrix}, \begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\end{bmatrix}\right\} \qquad \text{G-S: }\left\{\begin{bmatrix}1/3\\2/3\\2/3\end{bmatrix}, \frac{1}{3\sqrt2}\begin{bmatrix}4\\-1\\-1\end{bmatrix}, \frac{1}{\sqrt2}\begin{bmatrix}0\\1\\-1\end{bmatrix}, \begin{bmatrix}0\\0\\0\end{bmatrix}\right\}$$

Give the matrix $U$ that results from this. Notice that it isn't necessarily the same matrix as before: the singular value decomposition is not unique. Which method of finding $U$ seems simpler: the method in Example 17.14 or this one?

Example 17.16. Find a singular value decomposition of $A = \begin{bmatrix}1&1\\2&2\\2&2\end{bmatrix}$.

This time we'll use the second method. We start with $B = AA^H = \begin{bmatrix}2&4&4\\4&8&8\\4&8&8\end{bmatrix}$.

We find $B = PDP^H$ with

$$D = \begin{bmatrix}18&0&0\\0&0&0\\0&0&0\end{bmatrix} \qquad P = \begin{bmatrix}1/3&-2/\sqrt5&2/(3\sqrt5)\\2/3&1/\sqrt5&4/(3\sqrt5)\\2/3&0&-5/(3\sqrt5)\end{bmatrix}$$

The matrix $P$ gives $U$ directly; the singular values are the square roots of the eigenvalues of $B$, which gives $\Sigma$.

$$\Sigma = \begin{bmatrix}3\sqrt2&0\\0&0\\0&0\end{bmatrix} \qquad U = \begin{bmatrix}1/3&-2/\sqrt5&2/(3\sqrt5)\\2/3&1/\sqrt5&4/(3\sqrt5)\\2/3&0&-5/(3\sqrt5)\end{bmatrix}$$

We have $r = 1$, the number of non-zero singular values. The number of singular values is $\min\{m, n\} = 2$; the singular values are $\sigma_1 = 3\sqrt2$ and $\sigma_2 = 0$. Note that $D$ is the same size as $B$, namely $m\times m$, and $\Sigma$ is the same size as $A$, namely $m\times n$. We compute the first $r$ columns of $V$.

$$\mathbf{v}_1 = \frac{A^H\mathbf{u}_1}{\sigma_1} = \frac{1}{3\sqrt2}\begin{bmatrix}3\\3\end{bmatrix} = \frac{1}{\sqrt2}\begin{bmatrix}1\\1\end{bmatrix}$$

Now we find the vector $\mathbf{v}_2$ that gives an orthonormal basis for $\operatorname{nul}(A)$. We reduce the matrix $A$, we read a basis for the null space, and we apply Gram-Schmidt to transform this into an orthonormal basis.

$$\begin{bmatrix}1&1\\2&2\\2&2\end{bmatrix} \to \begin{bmatrix}1&1\\0&0\\0&0\end{bmatrix} \qquad \text{basis: }\left\{\begin{bmatrix}-1\\1\end{bmatrix}\right\} \qquad \text{G-S, o.n. basis: }\left\{\frac{1}{\sqrt2}\begin{bmatrix}-1\\1\end{bmatrix}\right\}$$

U =

1/3 −2/√

5 2/(3√

5)

2/3 1/√

5 4/(3√

5)

2/3 0 −5/(3√

5)

Σ =

3√

2 00 00 0

V =1√2

[1 −11 1

]

Problem 17.17. In Example 17.16 we could have found $\mathbf{v}_2$ by observing that we need a vector that makes an orthonormal set with $\mathbf{v}_1$. We start with a set that contains $\mathbf{v}_1$ and spans $\mathbb{R}^2$ ($\mathbf{v}_1$ along with the standard basis will work) and we apply Gram-Schmidt. There will be $r = 1$ zero vectors which we will ignore. Check the Gram-Schmidt in the following.

$$\left\{\frac{1}{\sqrt2}\begin{bmatrix}1\\1\end{bmatrix}, \begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix}\right\} \qquad \text{G-S: }\left\{\frac{1}{\sqrt2}\begin{bmatrix}1\\1\end{bmatrix}, \frac{1}{\sqrt2}\begin{bmatrix}-1\\1\end{bmatrix}, \begin{bmatrix}0\\0\end{bmatrix}\right\}$$

(We could almost have guessed $\mathbf{v}_2$!) Give the matrix $V$ that results from this. Notice that it isn't necessarily the same matrix as before: the singular value decomposition is not unique. Which method of finding $V$ seems simpler: the method in Example 17.16 or this one?

Problem 17.18. Check the decomposition $B = PDP^H$ and the calculation of $U$ in Example 17.14. Which of these steps is longer? Check the decomposition $B = PDP^H$ and the calculation of $V$ in Example 17.16. Which of these steps is longer?
In general which step is longer: the unitary diagonalization or the null space? How does this depend on the size of $A$? Is there any real1 advantage to using one method over the other?

singular values and rank

In a singular value decomposition $A = U\Sigma V^H$ we know that the columns $\mathbf{v}_{r+1}, \cdots, \mathbf{v}_n$ form a basis of $\operatorname{nul}(A)$. So the dimension of $\operatorname{nul}(A)$ is $n - r$ and the rank of $A$ is $r$, the number of non-zero singular values.

We can do more than just find the rank of $A$; we can give approximations to $A$ according to any desired rank. First we observe that the decomposition can be naturally written in terms of vectors (following the idea of Proposition 1.4).2

Proposition 17.19. If $U$ and $V$ are unitary matrices and $\Sigma$ is a diagonal matrix then

$$U\Sigma V^H = \sigma_1\mathbf{u}_1\mathbf{v}_1^H + \sigma_2\mathbf{u}_2\mathbf{v}_2^H + \cdots + \sigma_r\mathbf{u}_r\mathbf{v}_r^H$$

We leave the proof as an exercise. Notice that this expression is a sum of $r$ terms, each of which is an $m\times n$ matrix of rank 1. We sometimes say that each of these matrices $\sigma_j\mathbf{u}_j\mathbf{v}_j^H$ is a principal component of $A$.

Problem 17.20. Show that $\sigma_j\mathbf{u}_j\mathbf{v}_j^H$ in the previous is an $m\times n$ matrix of rank 1.

Problem 17.21. Show that if $C$ is an $m\times n$ matrix of rank 1 then there exist $\sigma > 0$ and $\mathbf{u}, \mathbf{v}$ with $\|\mathbf{u}\|_2 = \|\mathbf{v}\|_2 = 1$ and $C = \sigma\mathbf{u}\mathbf{v}^H$. What if $C$ has rank greater than 1? Compare this exercise with Exercise 1.10. . . do you know how to do the last part of that one now?

Also, we note that the expression for $U\Sigma V^H$ only depends on the first $r$ columns of $U$ and $V$.

Corollary 17.22. Let $A = U\Sigma V^H$ be a singular value decomposition with $r$ non-zero singular values. Let $U_r$ be the $m\times r$ matrix formed by the first $r$ columns of $U$ and $V_r$ be the $n\times r$ matrix formed by the first $r$ columns of $V$, and let $\Sigma_r$ be the $r\times r$ matrix obtained by removing the zero rows and columns from $\Sigma$.
Then $A = U_r\Sigma_rV_r^H$.

If we know a singular value decomposition $A = U\Sigma V^H$ then for any $1 \le k \le r$ we define the matrix $A_k$ to be the sum of the first $k$ principal components (recall our convention that singular values are listed in non-increasing order).

$$A_k = \sigma_1\mathbf{u}_1\mathbf{v}_1^H + \sigma_2\mathbf{u}_2\mathbf{v}_2^H + \cdots + \sigma_k\mathbf{u}_k\mathbf{v}_k^H$$

1 or complex. . . (!)
2 This is the first of many reasons which motivated explicitly including Proposition 1.4.



One way of looking at this is that we are saying what the singular value decomposition of $A_k$ is. As a consequence of Corollary 17.11 we see that the rank of $A_k$ is $k$. As a consequence of Theorem 11.8 we see that

$$\|A - A_k\|_2 = \|\sigma_{k+1}\mathbf{u}_{k+1}\mathbf{v}_{k+1}^H + \cdots + \sigma_r\mathbf{u}_r\mathbf{v}_r^H\|_2 = \sigma_{k+1}$$

The matrix $A - A_k$ has a singular value decomposition obtained by removing the first $k$ columns from $U$ and $V$ and the first $k$ rows and columns of $\Sigma$. We can say a little more.

Proposition 17.23. Let $A = U\Sigma V^H$. Among all matrices $A'$ of rank $k$, the one that minimizes $\|A - A'\|_2$ is $A' = A_k$. In other words, the rank $k$ matrix that is closest to $A$ (in the sense of $\|\cdot\|_2$) is $A_k$, the matrix formed of the first $k$ principal components.

It will be useful to notice the following.

Proposition 17.24. For any matrix $A$ and any entry $A_{ij}$ of this matrix we have $|A_{ij}| \le \|A\|_1$, $|A_{ij}| \le \|A\|_\infty$ and $|A_{ij}| \le \|A\|_2$.

Proof. The fact that $|A_{ij}| \le \|A\|_1$ and $|A_{ij}| \le \|A\|_\infty$ is an immediate consequence of Theorem 11.8. To show the result for the 2-norm, first note that using the standard basis we have $A_{ij} = \mathbf{e}_i^HA\mathbf{e}_j$. Then by Theorem 6.5 we get:

$$|A_{ij}| = |\mathbf{e}_i^HA\mathbf{e}_j| \le \|\mathbf{e}_i\|_2\,\|A\mathbf{e}_j\|_2$$

Using Proposition 11.7 we get:

$$\|A\mathbf{e}_j\|_2 \le \|A\|_2\,\|\mathbf{e}_j\|_2$$

But $\|\mathbf{e}_i\|_2 = \|\mathbf{e}_j\|_2 = 1$, so $|A_{ij}| \le \|A\|_2$.

Example 17.25. Consider the matrix $A = \begin{bmatrix}1&1&0\\1&1&1\\0&1&1\end{bmatrix}$, with an uncertainty of ±0.2 in each position. We'd like to know the rank of $A$.

Of course we can just find the RREF, but the real question is not the rank of $A$ as such, but rather the rank of some unknown matrix $A'$ that differs in each position by at most ±0.2.

A singular value decomposition is useful for this.

$$A = U\Sigma V^H \qquad \Sigma \approx \begin{bmatrix}2.41&0&0\\0&1&0\\0&0&0.41\end{bmatrix}$$

This shows that the rank of $A$, as given, is three. But we also see that the matrix $A_2$, of rank 2, is close to $A$, since $\|A - A_2\|_2 = \sigma_3 \approx 0.41$. In other words $A = A_2 + \Delta A$ where $\Delta A = \sigma_3\mathbf{u}_3\mathbf{v}_3^H$. By Proposition 17.24 the entries of $\Delta A$ are no larger than $\|\Delta A\|_2 \approx 0.41$, and computing $\Delta A$ explicitly, its largest entry is about 0.21, which is essentially within the experimental error of ±0.2. So it is quite reasonable to believe that the matrix $A$ is "really" the matrix $A_2$ with an error of $\Delta A$ comparable to the stated uncertainty.

We conclude that it is possible that the "real" rank is two rather than three.
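A quick numerical check of this example (a sketch, assuming numpy); note both the singular values and the size of the discarded component $\sigma_3\mathbf{u}_3\mathbf{v}_3^H$:

```python
# Numerical check of Example 17.25 -- a sketch, assuming numpy.
import numpy as np

A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)

U, s, Vh = np.linalg.svd(A)
print(np.round(s, 2))                     # [2.41 1.   0.41]

dA = s[2] * np.outer(U[:, 2], Vh[2, :])   # the discarded principal component
print(np.round(np.abs(dA).max(), 2))      # largest entry is about 0.21
```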

compression

A matrix $A$ can represent an image, for instance an $m\times n$ matrix where each entry represents a pixel. In order to transmit the image we could transmit the entire matrix, that is, $mn$ numbers.



Alternatively, we could find a singular value decomposition, and keep only the first $k$ principal components.

$$A = U\Sigma V^H = \sigma_1\mathbf{u}_1\mathbf{v}_1^H + \sigma_2\mathbf{u}_2\mathbf{v}_2^H + \cdots + \sigma_r\mathbf{u}_r\mathbf{v}_r^H \approx A_k = \sigma_1\mathbf{u}_1\mathbf{v}_1^H + \cdots + \sigma_k\mathbf{u}_k\mathbf{v}_k^H$$

If $\sigma_{k+1}$ is small, then $A_k \approx A$. The matrix $A_k$ is still $m\times n$ but in order to transmit it we need only send the vectors $\mathbf{u}_1, \cdots, \mathbf{u}_k$ and $\mathbf{v}_1, \cdots, \mathbf{v}_k$ and the singular values $\sigma_1, \cdots, \sigma_k$. This is $km + kn + k$ numbers. If $k$ is not too large then this is a much smaller size than $mn$.

This is sometimes used for satellite imagery: an image is often a superposition of several images taken at different frequencies, and sometimes certain frequencies, or combinations of them, show the majority of the detail. The principal components detect this exactly, choosing the most important parts.
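A sketch of the storage count (assuming numpy): keep $k$ principal components of a random matrix, and compare the storage cost and the 2-norm error, which is exactly $\sigma_{k+1}$.

```python
# SVD compression -- a sketch, assuming numpy.
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 100, 80, 10
A = rng.standard_normal((m, n))

U, s, Vh = np.linalg.svd(A, full_matrices=False)
Ak = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]       # sum of the first k principal components

print(m * n)                                     # numbers needed to send A
print(k * m + k * n + k)                         # numbers needed to send A_k
print(np.linalg.norm(A - Ak, 2), s[k])           # ||A - A_k||_2 equals sigma_{k+1}
```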

approximation

Recall the approximate system $A\mathbf{x} \approx \mathbf{b}$. We can solve this using the $A = QR$ decomposition by solving the exact system $R\mathbf{x} = Q^H\mathbf{b}$.

There is another approach, based on $A = U\Sigma V^H$. To solve $A\mathbf{x} \approx \mathbf{b}$ we want to find a vector $\mathbf{x}$ that minimizes $\|A\mathbf{x} - \mathbf{b}\|_2$.

$$\|A\mathbf{x} - \mathbf{b}\|_2 = \|U\Sigma V^H\mathbf{x} - \mathbf{b}\|_2 = \|U^H(U\Sigma V^H\mathbf{x} - \mathbf{b})\|_2 = \|\Sigma V^H\mathbf{x} - U^H\mathbf{b}\|_2$$

We used Proposition 14.10. If we set $\mathbf{x}' = V^H\mathbf{x}$, $\mathbf{b}' = U^H\mathbf{b}$, we get the following.

$$\|A\mathbf{x} - \mathbf{b}\|_2 = \|\Sigma\mathbf{x}' - \mathbf{b}'\|_2 = \sqrt{(\sigma_1x_1' - b_1')^2 + \cdots + (\sigma_rx_r' - b_r')^2 + (0 - b_{r+1}')^2 + \cdots + (0 - b_m')^2}$$

In order to minimize $\|A\mathbf{x} - \mathbf{b}\|_2$ we need to set $x_i' = b_i'/\sigma_i$ for $1 \le i \le r$, and any value whatsoever for $x_i'$ when $i > r$. In order to minimize $\|A\mathbf{x} - \mathbf{b}\|_2$ and furthermore minimize $\|\mathbf{x}'\|_2$ we choose $x_i' = 0$ for $i > r$. Note that $\|\mathbf{x}'\|_2 = \|\mathbf{x}\|_2$ (why?) so we are also minimizing $\|\mathbf{x}\|_2$.

In order to solve $A\mathbf{x} \approx \mathbf{b}$ we need to compute $\mathbf{b}' = U^H\mathbf{b}$, compute $\mathbf{x}'$, and then compute $\mathbf{x} = V\mathbf{x}'$. This can be expressed nicely as a matrix equation. Given an $m\times n$ diagonal matrix $\Sigma$, we define $\Sigma^+$ as the $n\times m$ diagonal matrix with $\Sigma^+_{ii} = 1/\sigma_i$ for $1 \le i \le r$ and 0 otherwise.

Problem 17.26. Show that $\Sigma\Sigma^+$ and $\Sigma^+\Sigma$ are almost identity matrices and are almost equal. To understand the meaning of "almost", calculate $\Sigma\Sigma^+$ and $\Sigma^+\Sigma$ for a few examples.

Theorem 17.27. Let $A\mathbf{x} \approx \mathbf{b}$. The solution $\mathbf{x}$ that minimizes $\|A\mathbf{x} - \mathbf{b}\|$ and that furthermore minimizes $\|\mathbf{x}\|_2$ can be found from the decomposition $A = U\Sigma V^H$ by $\mathbf{x} = V\Sigma^+U^H\mathbf{b}$. Any other solution that minimizes $\|A\mathbf{x} - \mathbf{b}\|$ can be obtained by adding a linear combination of the last $n - r$ columns of $V$ to $\mathbf{x}$.

Proof. We see that $\mathbf{b}' = U^H\mathbf{b}$. Then we calculate $\mathbf{x}'$, which is equivalent to $\mathbf{x}' = \Sigma^+\mathbf{b}' = \Sigma^+U^H\mathbf{b}$, and then we get $\mathbf{x} = V\mathbf{x}' = V\Sigma^+U^H\mathbf{b}$.



We define the pseudo-inverse of a matrix $A = U\Sigma V^H$ as $A^+ = V\Sigma^+U^H$. If $A$ is $m\times n$ then $A^+$ is $n\times m$. If $A\mathbf{x} = \mathbf{b}$ then $\mathbf{x} = A^+\mathbf{b}$ is a solution, hence the name "pseudo-inverse".

This method of solving approximate systems has an advantage over the QR method: if the entries themselves have errors then we can compensate for this by replacing the small singular values by zeroes, which is equivalent to replacing $A$ with $A_k$. So in order to solve $A\mathbf{x} \approx \mathbf{b}$ we could compute $\mathbf{x} = A_k^+\mathbf{b}$, where $k$ is chosen so that $\sigma_{k+1}$ is small.

Problem 17.28. Show that if $A$ has an inverse then $A^+ = A^{-1}$. This can be done in two ways: apply Theorem 17.27 to an invertible matrix $A$, or show that for every invertible matrix the singular value decomposition is such that (or rather the matrix $\Sigma$ is such that)

$$\operatorname{rank}(A) = m = n \implies \begin{cases} A^+A = (V\Sigma^+U^H)(U\Sigma V^H) = I \\ AA^+ = (U\Sigma V^H)(V\Sigma^+U^H) = I \end{cases}$$

exercises

1. Proposition 17.8 gives that the eigenvalues are real, while Proposition 17.9 gives that the eigenvalues are real and non-negative. Isn't Proposition 17.9 a stronger version of Proposition 17.8? Is there any use to Proposition 17.8?

2. Proposition 17.10 gives that the nonzero eigenvalues and their multiplicities are the same for $A^HA$ and $AA^H$. How does the multiplicity of $\lambda = 0$ relate for $A^HA$ and $AA^H$? Give your answer in terms of $m$, $n$, $r$, where $A$ is $m\times n$ of rank $r$. If $p(t)$ and $q(t)$ are the characteristic polynomials of $A^HA$ and $AA^H$, how are $p$ and $q$ related?

3. Using Proposition 17.10, or otherwise, show that $\dim(\operatorname{nul}(A)) = \dim(\operatorname{nul}(A^HA))$ and $\dim(\operatorname{nul}(A^H)) = \dim(\operatorname{nul}(AA^H))$. From this conclude that $A$, $A^HA$, $AA^H$ all have the same rank. Thus, prove Corollary 17.11.

4. Algorithm 17.13 gives two methods of finding a singular value decomposition of $A$. Show that the second is equivalent to applying the first method to $A^H$, and then taking the conjugate transpose of the resulting decomposition.

5. a) Let $A$ be an invertible matrix, with singular values $\sigma_1, \cdots, \sigma_r$ and singular value decomposition $A = U\Sigma V^H$. Show that the singular values of $A^{-1}$ are $\sigma_1^{-1}, \cdots, \sigma_r^{-1}$, and in particular $A^{-1}$ has a singular value decomposition as $A^{-1} = V\Sigma^{-1}U^H$. As part of this, you should make sure that $\Sigma$ is invertible.
b) Show that with respect to the 2-norm, $c(A) = \sigma_1/\sigma_r$, proving Corollary 17.12.

6. Let $A$ have maximum singular value $\alpha$ and $B$ have maximum singular value $\beta$. Show that the maximum singular value of $AB$ is at most $\alpha\beta$. (hint: Now that we've finally proved Theorem 11.8, we notice that it says that $\|A\|_2 = \alpha$, right?)

7. (slightly harder) Let $A$ have maximum singular value $\alpha$ and $B$ have maximum singular value $\beta$.
a) Give an example of $A$ and $B$ with the maximum singular value of $AB$ equal to $\alpha\beta$.
b) Give an example of $A$ and $B$ with the maximum singular value of $AB$ less than $\alpha\beta$.

8. Let $A = \begin{bmatrix}0&0\\-2&-6\\6&-7\end{bmatrix}$
a) Find a singular value decomposition of $A$.
b) Find $A_1$, the best rank one approximation of $A$.
c) Find the pseudo-inverse of $A$ and of $A_1$.

d) Using the pseudo-inverse, find the best solution to $A\mathbf{x} \approx \begin{bmatrix}1\\1\\1\end{bmatrix}$.
e) Using the pseudo-inverse, find the best solution to $A_1\mathbf{x} \approx \begin{bmatrix}1\\1\\1\end{bmatrix}$.
f) Find the condition number of $A$.



9. Let $A = \begin{bmatrix}1&-2i\\2i&0\\0&1\end{bmatrix}$
a) Find a singular value decomposition of $A$.
b) Find $A_1$, the best rank one approximation of $A$.
c) Find the pseudo-inverse of $A$ and of $A_1$.
d) Using the pseudo-inverse, find the best solution to $A\mathbf{x} \approx \begin{bmatrix}0\\0\\1\end{bmatrix}$.
e) Using the pseudo-inverse, find the best solution to $A_1\mathbf{x} \approx \begin{bmatrix}0\\0\\1\end{bmatrix}$.
f) Find the condition number of $A$.

10. Let A =

[1 i ii 1 1

]a) Find a singular value decomposition of A.b) Find A1, the best rank one approximation of A.c) Find the pseudo-inverse of A and of A1.

d) Using the pseudo-inverse, find the best solution to Ax ≈[111

].

e) Using the pseudo-inverse, find the best solution to A1x ≈[111

].

f) Find the condition number of A.

124

Page 126: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

MAT3341 : Applied Linear Algebra Mike Newman, april 2018 +

18. Jordan Form

Jordan form

We already know that A is similar to a diagonal matrix exactly when dim(λj) = mult(λj) for eacheigenvalue λj . What do we do if dim(λj) 6= mult(λj)? The Jordan form gives the best “almostdiagonal” matrix that always exists.

First, a Jordan block is a square matrix with a constant diagonal, 1’s on the diagonal and 0’severywhere else. Here are some examples.[

3 10 3

] 0 1 00 0 10 0 0

[−7]

Notice that the diagonal can be any number, even 0; the super-diagonal is always 1.

Problem 18.1. Show that a Jordan block is diagonal if and only if it is of size 1× 1.

Let A be an n × n matrix, and consider a maximal collection of linearly independent eigenvectors(so in the best case n vectors in general perhaps less). The Jordan form of A is an n× n matrixthat has a Jordan block for each of these eigenvectors, where the diagonal of the block is equal tothe corresponding eigenvalue.

Example 18.2. If A is a matrix with eigenvalues 2,−3 with mult(2) = 1, mult(−3) = 3 anddim(2) = 1, dim(−3) = 1, then the Jordan form of A is necessarily

J =

2 0 0 00 −3 1 00 0 −3 10 0 0 −3

The matrix J has two Jordan blocks:

[2]

and

−3 1 00 −3 10 0 −3

.

If A is a matrix with eigenvalues 2,−3 with mult(2) = 1, mult(−3) = 3 and dim(2) = 1, dim(−3) = 2,then the Jordan form of A is necessarily

J =

2 0 0 00 −3 1 00 0 −3 00 0 0 −3

The matrix J has three Jordan blocks:

[2],

[−3 10 −3

]and

[−3].

We say “the” Jordan form, but actually we should say “the collection of Jordan blocks”: we couldhave written the blocks in any order.

There is a subtle but important detail hiding in the definition. There is a certain ambiguity in termsof the number of blocks, and it is not always possible to determine the Jordan form solely from themultiplicities and dimensions of the eigenvalues.

∗ These notes are intended for students in mike’s MAT3341. For other uses please say “hi” to [email protected].

125

Page 127: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

Example 18.3. Let A be a matrix with one eigenvalue −2, with mult(−2) = 4 and dim(−2) = 2.Then both of the following satisfy the criteria of a Jordan form for A (the criteria given so far!).

−2 1 0 00 −2 0 00 0 −2 10 0 0 −2

−2 1 0 00 −2 1 00 0 −2 00 0 0 −2

Does J have two blocks of size 2× 2 or one of size 1× 1 and one of size 2× 2? For the moment wedon’t know. There isn’t any actual ambiguity, since in order to be a Jordan form for A the matrix Jwould need to be similar to A.

Problem 18.4. For each case, give the Jordan form, or if that is not possible give all the Jordanforms that are consistent with the given information.

• λ = −1, 1, 0 with mult(−1) = 3, mult(1) = 3, mult(0) = 3 and dim(−1) = 1, dim(1) = 2,dim(0) = 3.

• λ = −1, 1, 0 with mult(−1) = 1, mult(1) = 1, mult(0) = 1.

• λ = 7 with mult(7) = 3.

• λ = 7 with dim(7) = 3.

Problem 18.5. Show that if mult(λj) − 1 ≤ dim(λj) ≤ mult(λj) then we know exactly what theJordan blocks of A are. Describe them.

Problem 18.6. Let A be an n×n matrix. Show that A is diagonalisable if and only if every Jordanblock is of size 1× 1.

Theorem 18.7. Let A be an n× n matrix. Then there exists a Jordan form J that is similarto A. Furthermore the matrix J is unique, up to permutations of the blocks.

This says that we always have a decomposition A = PJP−1. This is the almost-diagonalisation ofA. The proof is rather technical, so we don’t cover it. We will look at some of the consequences.

Corollary 18.8. Let J be the Jordan form of A. Then the multiplicity of λj as eigenvalue ofA is the sum of the sizes of the Jordan blocks of J with diagonal λj. The dimension of theeigenspace of A corresponding to λj is the number of Jordan blocks of J with diagonal λj.

Proof. The Jordan form J is triangular, so the eigenvalues of J are exactly the diagonal entries.Theorem 18.7 implies that A = PJP−1, so we conclude that the multiplicity of λj is exactly thesum of the sizes of the Jordan blocks of J with diagonal λj . The uniqueness of the Jordan formguarantees that the blocks are unique. If we rewrite the decomposition as AP = PJ , then wesee that the first column in each block of J corresponds to an eigenvector in P . So each blockcontributes 1 to the dimension of the eigenspace. The matrix J − λjI is readily seen to have anull space of dimension equal to the number of λj-blocks. By Proposition 14.3 the dimensions ofthe corresponding eigenspaces of A and J are the same, so the dimension of the λj-eigenspace isthe number of λj-blocks.

Corollary 18.9. If A and B are similar, then they have the same Jordan form.

Proof. IfA = PJP−1 andA = QBQ−1 thenB = Q−1AQ = Q−1PJP−1Q =(Q−1P

)J(Q−1P

)−1.

So J is also the Jordan form of B.

126

Page 128: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

generalized eigenvectors

Consider some Jordan block of size ni×ni, with corresponding eigenvalue λi. Denote by v(1)i ,v

(2)i , · · · ,v(ni)

ithe columns of P corresponding to this Jordan block. By considering AP = PJ we see that

Av(1)i = λiv

(1)i

Av(2)i = v

(1)i + λiv

(2)i

...

Av(ni)i = v

(ni−1)i + λiv

(ni)i

So v(1)i is an eigenvector of A corresponding to λi, and v

(j)i , 2 ≤ j ≤ ni are generalised eigen-

vectors. We see that (A − λiI)v(j)i = v

(j−1)i . Knowing the eigenvector v

(1)i we can then calculate

its generalised eigenvectors by solving these iterated linear systems.

Theorem 18.10. Let A be an n× n matrix.Then there exists values ni, λi and vectors vi such that

•∑

i ni = n

• vi is an eigenvector of A with eigenvalue λi

• If we set v(1)i = vi then the iterated linear systems (A − λI)v

(j)i = v

(j−1)i all have a

solution for 2 ≤ j ≤ ni and have no solution for j = ni + 1.

We can imagine we are trying to diagonalise a matrix, but the dimension of some eigenspaces issmaller than the multiplicity of the corresponding eigenvalues. This theorem says that we can findenough generalised eigenvectors to make up for the lack of usual ones. We will omit the proof, but

we will observe that in practice one does not start by finding the vectors v(1)i , but rather the vectors

v(ni)i , the last vectors of the block.

Example 18.11. Let A =

3 1 −2−1 0 5−1 −1 4

.

We find the characteristic polynomial as (3−λ)(2−λ)2. The eigenvalue λ = 3 has multiplicity 1, andso necessarily its eigenspace has dimension 1 also (why?) and so we get a 1 × 1 Jordan block. Theeigenvalue λ = 2 has multiplicity 2, and so its eigenspace has dimension 1 or 1. There will either bea 2× 2 Jordan block or two 1× 1 Jordan blocks. For the moment we have two possibilities for J .

J?=

3 0 00 2 10 0 2

J?=

3 0 00 2 00 0 2

We find a basis for each eigenspace.

λ1 = 3 :

0 1 −2−1 −3 5−1 −1 1

→1 0 1

0 1 −20 0 0

v1 =

−121

λ2 = 2 :

1 1 −2−1 −2 5−1 −1 2

→1 0 1

0 1 −30 0 0

v2 =

−131

127

Page 129: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

Now we know that the dimension of the eigenspace for λ2 = 2 is 1. This determines the matrix J ,as well as part of P (we are still missing one column of P , which we indicate with “·”).

J =

3 0 00 2 10 0 2

P =

−1 −1 ·2 3 ·1 1 ·

For each Jordan block with ni > 1 we would have to determine the generalised eigenvectors. Herethere is only one such block, which is missing only one vector.

(A− λ2I)v(2)2 = v

(1)2 →

1 1 −2−1 −2 5−1 −1 2

v(2)2 =

−131

→ v(2)2 =

1−20

+ t

−131

We can choose any value for t; we choose t = 0 because it is “easier”. We then have A = PJP−1

with

J =

3 0 00 2 10 0 2

P =

−1 −1 12 3 −21 1 0

What happens if we try and calculate another generalised eigenvector?

(A− λ2I)v(3)2 = v

(2)2 →

1 1 −2−1 −2 5−1 −1 2

v(3)2 =

1−20

+ t

−131

→ no solution

If the multiplicity of λ2 had been three, we would have found another generalised eigenvector. But weknew in advance that the iterated systems would only give one further solution since the multiplicitywas 2.

Problem 18.12. Check the solutions of the linear systems in the previous example, especially theone with no solution.

choosing vectors

Theorem 18.10 says, in general, that if we try and diagonalise and we are missing some eigenvectorsthen we should look for generalised eigenvectors (and we will be able to find them) to replace themissing eigenvectors, giving an almost-diagonal matrix J .

There is a difficulty in applying Theorem 18.10 in order to find A = PJP−1. How to choose theeigenvectors for each block? In Example 18.11 the dimension of each eigenspace was 1. This meant

that the only “choice” of eigenvector came down to a scaling factor. Had we chosen α[−1 3 1

]Tfor v

(1)2 (α 6= 0), we would have had α

[1 −2 0

]for v

(2)2 , which comes down to almost the same

thing: some columns of P would be scaled by α, and P−1 would change as a result (how?). Thedifficulty is with the eigenspaces of dimensions larger than 1. Theorem 18.10 does not say thatevery eigenvector will give enough generalised eigenvectors, merely that there exists some choice ofeigenvector that will.

Problem 18.13. Show that if each Jordan block with eigenvalue λ is of size 1 × 1 then there is noambiguity in the choice of eigenvectors for this eigenvalue. This corresponds to the case where thedimension of the eigenspace is equal to the multiplicity of the eigenvalue.

Show that is there is only one Jordan block with eigenvalue λ then there is no ambiguity in thechoice of eigenvectors for this eigenvalue. This corresponds to the case where the dimension of theeigenspace is 1.

128

Page 130: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

For the case where 2 ≤ dim(λ) < mult(λ) there is still the question of how to determine which isthe right eigenvector for each Jordan block. For the curious we give a brief sketch. Set B = A− λIfor an eigenvalue λ of A. A basis for the eigenspace is a basis for nul(B); these are the generalizedeigenvectors of order 1. The generalized eigenvectors of order 2 are in nul(B2) — but nul(B) is asubspace of nul(B2). So a basis for nul(B2) gives the generalized eigenvectors of order at most 2.Also, dim(nul(Bj)) < dim(nul(Bj+1)) up to the point where dim(nul(Bj)) = mult(λj). So we startwith the largest null space and we find the maximal generalized eigenvectors: these will be the lastcolumn of each Jordan block. Next we multiply by B to find the other columns, finishing with oneeigenvector for each block. We omit the details.

Problem 18.14. For a matrix B, show that nul(B) is a subspace of nul(B2).

Problem 18.15. Find the A = PJP−1 decomposition for each A. If the choice of eigenvectorsfor each block is ambiguous, then explain why and give the possibilities for J . If the choice is notambiguous (like in Problem 18.13) then give P and J . For the two matrices the characteristicpolynomial is (1 + λ)4.

A =

−1/4 −1/4 −1/2 −3/41/4 −7/4 −1/2 −1/40 1 −1 00 0 1 −1

A =

−1/2 −1/2 0 −1/21/4 −7/4 −1/2 −1/4−1/4 3/4 −1/2 1/41/4 1/4 1/2 −5/4

Cayley-Hamilton

If p(t) = p0 + p1t+ · · ·+ pntn is a polynomial and A is a matrix then we define p(A) = p0I + p1A+

· · ·+pnAn. A fundamental result gives that every matrix is a “root” of its characteristic polynomial.

We need some technical results first.

Proposition 18.16. Let M and N be two block diagonal matrices, with square blocksM1,M2, · · · ,Mk, and N1, N2, · · · , Nk, such that Mi and Ni have the same size. Then theproduct MN is a block diagonal matrix with blocks MiNi.

M1

M2

. . .

Mk

N1

N2

. . .

Nk

=

M1N1

M2N2

. . .

MkNk

Proposition 18.17. Let M a block diagonal matrix, with square blocks M1,M2, · · · ,Mk. ThenM t is a block diagonal matrix with blocks Mi

t.M1

M2

. . .

Mk

t

=

M t

1

M t2

. . .

M tk

This is exactly the analogue of what happens for a diagonal matrix; here each Mj is itself a squarematrix, instead of a single number (a single number is much like a 1× 1 matrix. . . ).

Proposition 18.18. Let B be an n × n matrix such that Bi,i+1 = 1 for 1 ≤ i ≤ n − 1 andBij = 0 otherwise. Then Bn = 0.

129

Page 131: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

Problem 18.19. Prove Proposition 18.16, Proposition 18.17 and Proposition 18.18. For Proposi-tion 18.18 it might be easier to show that Bt is a matrix such that Bt

i,i+t = 1 and Btij = 0 otherwise.

Theorem 18.20 (Cayley-Hamilton). Let A be an n × n matrix and p(λ) its characteristicpolynomial. Then p(A) = 0.

Proof. We first notice that if A = PJP−1 then

p(A) = p0I + p1A+ · · ·+ pnAn

= p0PP−1 + p1

(PJP−1

)+ · · ·+ pn

(PJP−1

)n= P (p0I + p1J + · · ·+ pnJ

n)P−1

= P p(J)P−1

Since P is an invertible matrix, p(A) = 0 if and only if p(J) = 0. Furthermore, by Proposition 18.17we see that p(J) = 0 if and only if p(Ji) = 0 for each Jordan block Ji. So is is sufficient to provethat each Jordan block is a root of the characteristic polynomial of A.

We can (in principle at least) factor the polynomial.

p(λ) = (λ1 − λ)m1(λ2 − λ)m2 · · · (λk − λ)mk

Each of these terms corresponds to one eigenvalue λi with multiplicity mi. Thus each term is infact the characteristic polynomial of the meta-block composed of all the Jordan blocks with thateigenvalue. Since all of the Jordan blocks corresponding to that eigenvalue have combined sizemi, each of them has size at most mj , and so Proposition 18.18 gives that each Jordan block is aroot of the characteristic polynomial of A.

Problem 18.21. If you look carefully at the proof of Theorem 18.20, you will see that we in factproved that every matrix is a root of another polynomial, which divides the characteristic polynomialbut may in fact be of strictly smaller degree. Describe this polynomial. When will it be of degree 1?

a discrete model : powers of a matrix

A dynamical system is a sequence of vectors x0,x1,x2, · · · such that xt+1 = Axt. For instance thevector xt could represent populations at time t; the matrix A gives the evolution of the populationsat each generation (day, month, . . . ). Alternatively, the vector x could represent a probabilitydistribution in a network at time t; the matrix A gives the transition probabilities between thedifferent states. The second example is a Markov chain and is the basis of the PageRank algorithmof Google.

Given the model xt+1 = Axt, we would like to know the solution and the behaviour of the solution.The solution is straightforward: we see that xt = Atx0 where x0 is the initial vector at time t = 0.

We classify the behaviour as follows: either xt tends to zero, or it is bounded, or it diverges. Moreprecisely we identify three possibilities.

1. As t→∞, xt → 0 for every choice of x0.

2. As t→∞, (xt)j ≤M for every choice of x0, for a fixed value M (which depends on x0).

3. As t→∞, (xt)j diverges for at least one j and at least one choice of x0.

The Jordan form will be useful for distinguishing these.

130

Page 132: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

If xt → 0 for every choice of x0 then At → 0. If At → 0 then PJ tP−1 → 0 and J t → P−10P .Furthermore, Proposition 18.17 gives that if J t → 0 then for each Jordan block Ji of J we haveJit → 0. Each of these implications is in fact valid in both directions. We conclude the following.

Proposition 18.22. Let xt+1 = Axt and A = PJP−1 be the Jordan decomposition of A.Then xt → 0 for each choice of x0 if and only if Ji

t → 0 for each Jordan block of J .

This comes down to a variant of Proposition 18.18.

Proposition 18.23. Let Ji be a Jordan block of size ni×ni with eigenvalue λi. Then Jit → 0

if and only if |λi| < 1.

If xt is bounded, then At is also bounded. So PJ tP−1 is bounded and J t is bounded. As beforethis reduces to Ji

t being bounded for each Jordan block Ji (it might not always be the same bound).Again, the implications are valid in both directions.

Proposition 18.24. Let xt+1 = Axt and A = PJP−1 be the Jordan decomposition of A.Then xt is bounded for each choice of x0 if and only if Ji

t is bounded for each Jordan block ofJ .

This comes down to a variant of Proposition 18.18.

Proposition 18.25. Let Ji be an ni×ni Jordan block with eigenvalue λi. Then Jit is bounded

if and only if either |λi| < 1, or |λi| = 1 and ni = 1.

The divergent case is similar.

Proposition 18.26. Let xt+1 = Axt and A = PJP−1 be the Jordan decomposition of A.Then xt diverges for at least one choice of x0 if and only if Ji

t diverges for at least one Jordanblock of J .

Proposition 18.27. Let Ji be an ni × ni Jordan block with eigenvalue λi. Then Jit diverges

if and only if either |λi| > 1, or |λi| = 1 and ni > 1.

Notice that this is the complement of the two other cases.

We summarise.

Theorem 18.28. Let xt+1 = Axt and A = PJP−1 be the Jordan decomposition of A. Denoteby λi and ni be the eigenvalue and size of the Jordan block Ji of J , 1 ≤ i ≤ k.

• xt → 0 for every choice of x0 if and only if |λi| < 1 for every 1 ≤ i ≤ k. In this casewe have At → 0, or

∥∥At∥∥→ 0 for any norm.

• xt is bounded for every choice of x0 if and only if either |λi| < 1, or |λi| = 1 and ni = 1for every 1 ≤ i ≤ k. In this case we have that At is bounded, or

∥∥At∥∥ ≤ M for everynorm (M depends on the norm).1

• ‖xt‖ → ∞ for at least one choice of x0 if and only if either |λi| > 1, or |λi| = 1 andni > 1 for at least on 1 ≤ i ≤ k. In this case we have

∥∥At∥∥ → ∞. This is valid forevery norm.

131

Page 133: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

Note that the proof relies on the Jordan decomposition A = PJP−1, but it is not necessary toactually know P , since the matrix J contains all the necessary information to apply Theorem 18.28.

Example 18.29. Consider the following Jordan forms.

−i 0 00 0.5 10 0 0.5

0.4 0 00 −1 10 0 −1

0.2 + 0.3i 0 00 0.9 10 0 0.9

For the first, we have two blocks: n1 = 1, |λ1| = |−i| = 1 and n2 = 2, |λ2| = |0.5| < 1. The matrixAt is bounded but does not converge to 0.

For the second, we have two blocks: n1 = 1, |λ1| = |0.4| < 1 and n2 = 2, |λ2| = |−1| = 1. The matrixAt diverges, or

∥∥At∥∥→∞.

For the third, we have two blocks: n1 = 1, |λ1| = |0.2 + 0, 3| < 1 and n2 = 2, |λ2| = |0.9| < 1. Thematrix At converges to 0.

a continuous model : exponential of a matrix

The derivative of a vector (with respect to a fixed variable, say, time) is the vector of its derivatives.We write x for d

dtx. So if x is a vector of functions of t (eg, each element of x is a function of thevariable t), then

x =[ddtx1(t)

ddtx2(t) · · ·

ddtxn(t)

]TWe should really write x(t) to indicate that it is a function of t, but we omit this for clarity.

A continuous model is ddtx = Ax. We find this type of relationship in physics (eg a mechanical

system of springs and masses) and in the study of infectious diseases. Note that this is different fromthe preceding model in two ways: it is a continuous function instead of a discrete one, but also thematrix A is playing a different role: is the discrete model it was giving us the changes but here itgives the rate of change.

Given this model we would like to know the solution and the behaviour of the solution. The solutionis already less obvious, as it is now a system of linear differential equations.

We recall thee Taylor series of the exponential function, and we apply it to a matrix to give theexponential of a matrix B.

exp(x) = 1 + x+1

2x2 +

1

6x3 + · · ·

exp(B) = I +B +1

2B2 +

1

6B3 + · · ·

This is the same idea as evaluating a polynomial at a matrix, which we already saw in Theorem 18.20.But a polynomial is a finite computation, which we can always perform. The exponential of a matrixis an infinite series, and so we need to consider convergence. In fact this does converge, for anymatrix B. We will not prove this, but we will give some properties of the exponential of a matrix.Notice especially the result about exp(B) exp(C) · · · , which might be a little surprising.

132

Page 134: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

Proposition 18.30. Let exp(B) = I +B + 12B

2 + 16B

3 + · · · .• exp(B) converges for every matrix B

• exp(B) is invertible for every matrix B, and (exp(B))−1 = exp(−B)

• exp(αB) exp(βB) = exp((α+ β)B) for α, β ∈ C• exp(B) exp(C) = exp(B + C) if and only if BC = CB

• B exp(B) = exp(B)B

• exp(0) = I (that’s the matrix 0 and not the number 0)

For our model, there are three other useful properties.

Proposition 18.31. If M is block diagonal with blocks Mi, then exp(M) is block diagonalwith blocks exp(Mi).

exp

M1

. . .

Mk

=

exp(M1). . .

exp(Mk)

This is similar to Proposition 18.17; we leave the proof as an exercise.

Proposition 18.32. If A = PBP−1 then exp(A) = P exp(B)P−1.

This is an application of the definition of exponentiation; it essentially amounts to the fact thatAk = PBkP−1.

Proposition 18.33. ddt exp(tA)v = A exp(tA)v

The derivative of the exponential of a matrix follows the same logic as an ordinary exponential. TheA is a constant.

Proposition 18.33 gives the solution to our model x = Ax: it’s x = exp(tA)x0, where x0 is the vectorat time t = 0. This is completely analogous to the differential equation d

dtx = ax.

For the behaviour of the solution we again get three possibilities.

If x→ 0 as t→∞ for every choice of x0, then exp(tA)→ 0. Proposition 18.32 gives that exp(tJ)→ 0and Proposition 18.31 gives that exp(tJi)→ 0 for each Jordan block Ji of J .

Proposition 18.34. Consider the model x = Ax, and let A = PJP−1 be the Jordan decom-position of A. Then x → 0 for each choice of x0 if and only if exp(Ji) → 0 for each Jordanblock of J .

Proposition 18.35. Let Ji be a Jordan block of size ni×ni with eigenvalue λi. Then exp(Ji)→0 if and only if Re(λi) < 0.

Recall that Re(z) = Re(a+bi) = a, the real part of a complex number. The exponential of a complexnumber is given by exp(a+ bi) = ea(cos(b) + i sin(b), so the exponential tends to zero exactly whenthe real part tends to −∞.

We obtain the other two cases in a similar manner. First the bounded case.

133

Page 135: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

Proposition 18.36. Consider the model x = Ax, and let A = PJP−1 be the Jordan decom-position of A. Then x is bounded for every choice of x0 if and only if exp(Ji) is bounded foreach Jordan block of J .

Proposition 18.37. Let Ji be a Jordan block of size ni×ni with eigenvalue λi. Then exp(Ji)is bounded if and only if either Re(λi) < 0, or Re(λi) = 0 and ni = 1.

Next, the divergent case.

Proposition 18.38. Consider the model x = Ax and let A = PJP−1 be the Jordan decom-position of A. Then x diverges for at least one choice of x0 if and only if exp(Ji) diverges forat least one Jordan block of J .

Proposition 18.39. Let Ji be a Jordan block of size ni×ni with eigenvalue λi. Then exp(Ji)diverges if and only if either Re(λi) > 0, or Re(λi) = 0 and ni > 1.

In summary, we have the following.

Theorem 18.40. Consider the model x = Ax, and let A = PJP−1 be the Jordan decomposi-tion of A. Let λi and ni be the eigenvalue and size of the Jordan block Ji of J , 1 ≤ i ≤ k.

• x→ 0 for every choice of x0 if and only if Re(λi) < 0 for each 1 ≤ i ≤ k. In this casewe have exp(tA)→ 0, or ‖exp(tA)‖ → 0.

• x is bounded for every choice of x0 if and only if either Re(λi) < 0, or Re(λi) = 0 andni = 1 for each 1 ≤ i ≤ k. In this case exp(tA) is bounded, or ‖exp(tA)‖ ≤ M forevery norm (M depends on the norm).

• ‖x‖ → ∞ for at least one choice of x0 if and only if either Re(λi) > 0, or Re(λi) = 0and ni > 1 for at least one 1 ≤ i ≤ k. In this case we have ‖exp(tA)‖ → ∞. This isvalid for any norm.

Example 18.41. Consider the following matrices J .−i 0 00 −0.5 10 0 −0.5

−0.4 + i 0 00 −1 10 0 −1

−0.2 + 0.3i 0 00 0.9i 10 0 0.9i

For the first matrix, we have two blocks: n1 = 1, Re(λ1) = Re(−i) = 0 and n2 = 2, Re(λ2) =|−0.5| < 0. The matrix exp(tA) is bounded but does not tend to 0.

For the second matrix, we have two blocks: n1 = 1, Re(λ1) = Re(−0.4 + i) < 0 and n2 = 2,Re(λ2) = Re(−1) < 0. The matrix exp(tA) tends to 0.

For the third matrix, we have two blocks: n1 = 1, Re(λ1) = Re(−0.2 + 0, 3i) < 0 and n2 = 2,Re(λ2) = Re(0.9i) = 0. The matrix exp(tA) diverges, or ‖exp(tA)‖ → ∞.

134

Page 136: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

MAT3341 : Applied Linear Algebra Mike Newman, april 2018 +

42. Solutions

Following are some solutions of exercises.

Sometimes an answer or even a hint is all that is given. That doesn’t mean that an answer or hintis all that is required of you should such a question show up on a final exam. . .

If you spot a typo let me know. There is an ε chance of errors creeping in, for some ε > 0.

Notice that the very last page contains a list of all exercises that have solutions (for reasons toocomplicated to explain here this list is not quite sorted), in case you want to look those up so youcan make sure you looked at the question before you look at the solution/answer/hint.

1.1 We’ll omit the solution for Proposition 1.1. . .

Using Proposition 1.1 we can compute the entries as

1 · 0 + 2 · 1 + 3 · 0 = 2 1 · 2 + 2 · 1 + 3 · 4 = 164 · 0 + 5 · 1 + 6 · 0 = 5 4 · 2 + 5 · 1 + 6 · 4 = 37

;[2 165 37

]Each of these is a 1× 1 rank 1 matrix, which we then concatenate entrywise.

Using Proposition 1.2 we can compute the columns as

first: 0

[14

]+ 1

[25

]+ 0

[36

]=

[25

]second: 2

[14

]+ 1

[25

]+ 4

[36

]=

[2037

] ;[2 165 37

]

Each of these is a m× 1 rank 1 matrix, which we then concatenate vertically.

Using Proposition 1.3 we can compute the rows as

first 1[0 2

]+ 2

[1 1

]+ 3

[0 4

]=[2 20

]second 4

[0 2

]+ 5

[1 1

]+ 6

[0 4

]=[5 37

] ;[2 165 37

]Each of these is a 1× n rank 1 matrix, which we then concatenate horizontally.

Using Proposition 1.4 we can compute the matrix as a sum of matrices as[14

] [0 2

]+

[25

] [1 1

]+

[36

] [0 4

]=

[0 20 8

]+

[2 25 5

]+

[0 120 24

]=

[2 165 37

]Each of these is a m× n rank 1 matrix, which we then concatenate additively.

1.3 b) First note that x1y2 = 2, so set x1 = t and y2 = 2/t for some t 6= 0. Now x1y1 = x1y3 = 0but x1 6= 0 so y1 = y3 = 0. Also x2y2 = 0 but y2 6= 0 so x2 = 0. Furthermore we aresupposed to have x2y1 = 0, x2y3 = 0, but luckily enough this is already the case; wehave no further choice in x or y so had we not been lucky then there would have beenno x, y.

x =

[t0

],y =

02/t0

t 6= 0

A solution like the one given for Exercise 1.4 or Exercise 1.5 or Exercise 1.6 is alsopossible. . .

∗ These notes are intended for students in mike’s MAT3341. For other uses please say “hi” to [email protected].

135

Page 137: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

1.4 b) First note that x1y2 = 2, so set x1 = t and y2 = 2/t for some t 6= 0. Now x1y1 = x1y3 = 0but x1 6= 0 so y1 = y3 = 0. Also x2y2 = 6 and y2 = 2/t so x2 = 3t. Furthermore weare supposed to have x2y1 = 0, x2y3 = 0, but luckily enough this is already the case; wehave no further choice in x or y so had we not been lucky then there would have beenno such x, y. We started this argument with M12 but we could have started with anynonzero entry.

x =

[t3t

],y =

02/t0

t 6= 0

Alternatively each column is a multiple of x, so the possibilities for x are exactly anynonzero multiple of any nonzero column; in this case we can say s times the secondcolumn. Then yj is the number such that the j-th column is yjx. Had no such yjexisted, there would have been no x, y.

x =

[2s6s

],y =

01/s0

s 6= 0

A solution like the one given for Exercise 1.5 or Exercise 1.6 is also possible. . .

1.5 b) First note that x1y2 = 2, so set x1 = t and y2 = 2/t for some t 6= 0. Now x1y1 = 1,x1y3 = 1 and x1 = t so y1 = 1/t, y3 = 1/t. Also x2y2 = 0 but y2 6= 0 so x2 = 0.Furthermore we are supposed to have x2y1 = 0, x2y3 = 0, but luckily enough this isalready the case; we have no further choice in x or y so had we not been lucky thenthere would have been no such x, y. We started this argument with M12 but we couldhave started with any nonzero entry.

x =

[t0

],y =

1/t2/t1/t

t 6= 0

Alternatively each row is a multiple of y, so the possibilities for y are exactly any nonzeromultiple of any nonzero row; in this case we can say s times the first row. Then xi isthe number such that the i-th row is xiy

T . Had no such xi existed, there would havebeen no x, y.

x =

[1/s0

],y =

s2ss

s 6= 0

A solution like the one given for Exercise 1.4 or Exercise 1.6 is also possible. . .

1.6 b) First note that x1y2 = 2, so set x1 = t and y2 = 2/t for some t 6= 0. Now x1y1 = 1,x1y3 = 1 and x1 = t so y1 = 1/t, y3 = 1/t. Also x2y2 = 3 and y2 = 1/t so x2 = 3t.Furthermore we are supposed to have x2y1 = 3, x2y3 = 3, but luckily enough this isalready the case; we have no further choice in x or y so had we not been lucky thenthere would have been no such x, y. We started this argument with M12 but we couldhave started with any nonzero entry.

x =

[t3t

],y =

1/t2/t1/t

t 6= 0

Alternatively each column is a multiple of x and each row is a multiple of y, so thepossibilities for x are exactly any nonzero multiple of any column and for y are exactly

136

Page 138: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

any nonzero multiple of any nonzero row; in this case we can say x is r times the secondcolumn and y is s times the first row.

x =

[r3r

],y =

s2ss

r, s??

Now we must have xiyj = Mij . This gives: rs = 1, 2rs = 2, rs = 1, 3rs = 3, 6rs = 6,3rs = 3. So we have rs = 1.

x =

[r3r

],y =

1/r2/r1/r

r 6= 0

A solution like the one given for Exercise 1.4 of Exercise 1.5 is also possible. . .

1.13 We are given that A = AH = AT

. This means that Aij =(AT)ij

=(A)ji

= Aji In

particular for a diagonal element we have i = j so Aii = Aii. But the only numbers equal totheir complex conjugates are the reals, therefore Aii ∈ R.

1.14 The problem is that A and B need not be invertible. A cheap example can be found asfollows:

A =

[1 0 00 1 0

]B =

1 00 10 0

AB =

[1 00 1

]Clearly AB is invertible, but A and B are not even square. In fact if we choose A to be m×nwith rank m and B = AT then AB = AAT will be m×m with rank m (can you prove thisassertion?), so taking m < n gives more examples.

2.1 a) We check by multiplying.

Y A = (αY1 + βY2)A = αY1A+ βY2A = αI + βI = (α+ β)I = I

So yes it is a left inverse.

b) By the argument given above we see that Y A = (α+β)I. So if α+β 6= 1 then Y A 6= I.Therefore Y is a left inverse if and only if α+ β = 1.

2.2 a) It has a right inverse if it has a pivot in every row; which is not a priori impossible. Wecheck to see if it has a right inverse by trying to find it.[−3 4 6 −3 1 0

2 −1 1 2 0 1

]R1 7→−1/3R1−−−−−−−−−−→R2 7→R2−2R1

[1 −4/3 −2 1 −1/3 00 5/3 5 0 2/3 1

]R2 7→3/5R2

−−−−−−−−−−−→R1 7→R1+4/3R2

[1 0 2 1 1/5 4/50 1 3 0 2/5 3/5

]The inverse is Bo +NT , where:

B0 =

1/5 4/52/5 3/50 00 0

N =

−2 −1−3 01 00 1

T =

[s1 s2t1 t2

]

137

Page 139: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

b) For a right inverse to exist requires a pivot in every row. But there are four rows andtwo columns, so this can’t happen.

2.3 a) For a left inverse to exist requires a pivot in every column. But there are two rows andfour columns, so this can’t happen.

b) We row-reduce AT .[7 3 15 1 04 −9 20 0 1

]R1 7→1/7R1−−−−−−−−−−→R2 7→R2−4R1

[1 3/7 15/7 1/7 00 −75/7 80/7 −4/7 1

]R2 7→−7/75R2−−−−−−−−−−−→R1 7→R1−3/7R2

[1 0 13/5 3/25 1/250 1 −16/15 4/75 −7/75

]Now we find the right inverse of AT .3/25 1/25

4/75 −7/750 0

+

−13/516/15

1

[t1 t2]

The left inverse of A is the transpose of this. Careful![3/25 4/75 01/25 −7/75 0

]+

[t1t2

] [−13/5 16/15 1

]Although it is true that every entry in the “T” matrix is an arbitrary parameter, theyare somehow not all the equivalent. In the right inverse, each column corresponds to acolumn of the right inverse, and each row corresponds to a basis element for the nullspace. In the left inverse, each row corresponds to a row of the left inverse, and eachcolumn corresponds to a basis element for the left null space.

2.4 a) answer: no (prove it!)

b) answer: no (prove it!)

2.5 answer: true (prove it!)

2.6 answer: false (prove it!)

3.4 Let A and B be lower triangular. We might as well presume they are compatible for multi-plication, so A is m× k and B is k × n.

(AB)ij =k∑t=0

AitBtj

Now Ait = 0 when i < t and Btj = 0 whenever t < j. So we need only consider values of twith j ≤ t ≤ i in the sum.

(AB)ij =

i∑t=j

AitBtj

But if j > i this is the empty sum, which is zero. So (AB)ij = 0 when i < j which meansAB is lower triangular.

138

Page 140: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

Now assume in addition that A and B have 1’s on the diagonal.

(AB)ii =

k∑t=0

AitBti =

i∑t=i

AitBti = AiiBii = (1)(1) = 1

So AB has 1’s on the diagonal also.

3.9 answers: The LU decomposition is as follows.

L =

−1 0 03 1 01 3 1

U =

1 1 −1 1 −30 1 3 0 20 0 1 2 −1

a) Ly =

[111

]gives y =

[ −14−10

], and Ux = y gives x =

[−45−9s+9t34+6s−5t−10−2s+t

st

]=

[−4534−1000

]+

[−9 96 −5−2 11 00 1

][ st ].

3.10 a) For a left inverse to exist requires a pivot in every column. But there are two rows andfour columns, so this can’t happen.

b) We will find the left inverse using what we learned about elementary matrices and row-reduction. Compare the answer (and the solution!) for what we had in Exercise 2.3.

7 4 1 0 03 −9 0 1 0

15 20 0 0 1

;

1 0 3/25 4/75 00 1 1/25 −7/75 00 0 −13/5 16/15 1

=

1 0 3/25 4/75 00 1 1/25 −7/75 00 0 −13/5 16/15 1

[3/25 4/75 01/25 −7/75 0

]+

[t1t2

] [−13/5 16/15 1

]

3.11 a) A left-inverse requires a pivot in every column. This isn’t the case so there is no left-inverse.

b) We’d like a submatrix that has a pivot in every column; the first two columns of A havethis property (actually, and two of the columns do). So let A′ be the matrix consistingof the first two columns of A. The row-reduction for A′ is obtained from removing thethird column to the left of the bar. So the general left inverse is the following.

[1 2 2 10 1 −3 −1

]+

[s1 t1s2 t2

] [2 0 1 11 3 1 3

]

c) answer: A =

1 2 2 10 1 −3 −12 0 1 11 3 1 3

−1

1 0 20 1 30 0 00 0 0

139

Page 141: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

4.1 First consider what happens when we change the order of row operations. The following twosequences of row operations have exactly the same effect.

Ri 7→ 1/cRi Ri+1 7→ Ri+1 − b′i+1Ri

Ri+1 7→ Ri+1 − bi+1Ri Ri+1 7→ Ri+1 − b′i+1Ri

Ri+2 7→ Ri+2 − bi+2Ri...

... Ri+1 7→ Ri+1 − b′i+1Ri

Rm 7→ Rm − bmRi

On the left is the order of operations we would have done for the pivot in row i in finding anA = LU decomposition. On the right is the order of operations we would have done for thepivot in row i in finding an A = L0U0 decomposition. In each case we are making the entriesbelow the pivot become zero.

From this we see that the initial value of the pivot was c, and the entry below it in row j isbj (according to the left) so then according to the right we have b′j = bj/c

Then the entries in the i-th column of L are (starting from the pivot)[c bi+1 · · · bm

]and

the corresponding entries in the i-th column of L0 are[1 bi+1/c · · · bm/c

]. This means

that if we multiply everything in the i-th column of L0 by the value of the i-th pivot, weget the i-th column of L. So let D be the diagonal matrix with the pivots on the diagonal.Multiplying the i-th column L0 by the i-th pivot is (by Proposition 1.2) the same as L0D,so L0D = L. On the other hand D−1U0 divides the i-th row by the i-th pivot. NowA = L0U0 = L0DD

−1U0 = LD−1U0 and A = LU . Since L and L0 are both invertible (L0

always is; L is since rank(A) = m) we have U = L−1A = D−1U0.

So L0D = L and U = D−1U0.

4.2 Let P1 and P2 both be permutation matrices, and P = P1P2

Consider the i-th row of P . By Proposition 1.3 this is a linear combination of the rows of P2,where the coefficients are the i-th row of P1. But the i-th row of P1 is all zeroes except forone 1. So this means that the i-th row of P is one of the rows of P2. Therefore every row ofP is all zeroes except for one 1.

Consider the j-th row of P . By Proposition 1.2 this is a linear combination of the columnsof P1, where the coefficients are the j-th column of P2. But the j-th column of P2 is allzeroes except for one 1. So this means that the j-th column of P is one of the columns of P1.Therefore every column of P is all zeroes except for one 1.

So every row and column of P has exactly one 1, and everything else is 0. Therefore P is apermutation matrix.

4.4 The general right inverse of A is1 0 3 20 0 0 00 2 1 −10 0 0 01 2 3 40 0 0 5

+

0 −21 00 30 10 0

[s1 s2t1 t2

]

140

Page 142: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

The general right inverse of B is

1 2 3 4 5 6−1 1 2 3 0 01 −1 1 2 0 40 0 0 1 2 3

+

[s1 t1s2 t2

] [1 0 1 1 3 22 1 0 0 0 1

]

A has no left inverse, B has no right inverse, C has neither.

4.5 First PAQ = LU .

A =

1 2 22 2 30 4 0

A =

1 0 02 1 00 0 1

1 2 20 −2 −10 4 0

R2 7→ R2 − 2R1

A =

1 0 02 −2 00 0 1

1 2 20 1 1/20 4 0

R2 7→ (−1/2)R2

A =

1 0 02 −2 00 4 1

1 2 20 1 1/20 0 −2

R3 7→ R3 − 4R2

A =

1 0 02 −2 00 4 −2

1 2 20 1 1/20 0 1

R3 7→ (−1/2)R3

This is A = LU , with P = I and Q = I.

Next PAQ = L0U0.

A =

1 2 22 2 30 4 0

A =

1 0 02 1 00 0 1

1 2 20 −2 −10 4 0

R2 7→ R2 − 2R1

A =

1 0 02 1 00 −2 1

1 2 20 −2 −10 0 −2

R3 7→ R3 + 2R2

This is A = L0U0, with P = I and Q = I.

141

Page 143: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

4.6

A =

1 3 3 02 6 6 02 6 5 1

A =

1 0 02 1 02 0 1

1 3 3 00 0 0 00 0 −1 1

R2 7→ R2 − 2R1

R3 7→ R3 − 2R1

A

1 0 0 00 0 1 00 1 0 00 0 0 1

=

1 0 02 1 02 0 1

1 3 3 00 0 0 00 −1 0 1

C2 C3

1 0 00 0 10 1 0

A

1 0 0 00 0 1 00 1 0 00 0 0 1

=

1 0 02 1 02 0 1

1 3 3 00 −1 0 10 0 0 0

R2 R3

This is PAQ = L0U0.

Note that we could have done C2 C4 instead if we wanted. Also note that in the last step,the two “2” in the L0 matrix actually changed places, right?

4.7 a) b′ =

241

, y =

111

, x′ =

4−11

, x =

41−1

b) b′ =

010

, y =

012

, x′ =

9−32

, x =

92−3

c) b′ =

000

, y =

000

, x′ =

000

, x =

000

4.8 a) b′ =

244−4

, y =

24−68

, no solution for x′, and hence no solution for x.

b) b′ =

1152

, y =

1110

, x′ =

4−1100

+

−11 0

4 0−2 −11 00 1

[st

], x =

40−101

+

−11 0

0 14 01 0−2 −1

[st

],

c) b′ =

1342

, y =

13−26

, no solution for x′, and hence no solution for x.

142

Page 144: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

d) b′ =

0000

, y =

0000

, x′ =

0000

+

−11 0

4 0−2 −11 00 1

[st

], x =

0000

+

−11 0

0 14 01 0−2 −1

[st

]

4.9 We would generally choose the first non-zero column, and the first non-zero element in thatcolumn, but we don’t have to. So row and column swaps could happen if we wanted themto. But the only reason they would be forced to happen is because we didn’t have enoughpivots in the initial rows/columns. So if there is a dependency in the first k columns, butthese do not span all columns, then we will need to swap something from the first k columnswith a later one, and likewise with rows. Here is an example where the first two rows aredependent but don’t span all the rows, and the first 3 columns are dependent but don’t spanall the columns. 1 0 −2 3 1

2 0 −4 6 20 −3 3 0 1

4.11 We have det(A) = det(LU) = det(L) det(U). Now U is upper triangular with 1’s on thediagonal, so det(U) = 1. Also L is lower triangular, so det(L) is the product of the diagonalentries of L. Note that the diagonal entries of L are the (unnormalized) pivots of A.

4.12 We have det(A) = det(L0U0) = det(L0) det(U0). Now U0 is upper triangular, so det(U0) isthe product of the diagonal entries of U0. Also L0 is lower triangular with 1’s on the diagonal,so det(L0) = 1. Note that the diagonal entries of U0 are the (unnormalized) pivots of A.

5.1 a)

110

+

1−21

[t] (yes, a 1× 1 matrix is a valid matrix)

b)

−530

+

−25 1310 −7−4 21 00 1

[s1 s2 s3t1 t2 t3

]

5.2 Assume that z1 and z2 are both zeros, that is x + z1 = x = x + z2 for all vectors x.Then z1 + z2 = z1, becausez2 is a zero. Also z1 + z2 = z2, becausez1 is a zero. Thereforez1 = z1 + z2 = z2.

5.3 a) Using properties of a vector space we have

r0 = r(0 + 0) = r0 + r0

Now we add the inverse of r0 to both sides. We don’t officially know what this is yet,but we do know it exists,because every vector has an inverse, and we know that a vector

143

Page 145: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

plus its inverse gives 0.

r0 + (−r0) = r0 + r0 + (−r0)0 = r0

b) Analogously, we have

0u = (0 + 0)u = 0u + 0u

Now we add the inverse of 0u to both sides. We don’t officially know what this is yet,but we do know it exists,because every vector has an inverse, and we know that a vectorplus its inverse gives 0.

0u + (−0u) = 0u + 0u + (−0u)0 = 0u

c) Using the previous we know that 0u = 0.

0 = 0u = (1− 1)u = 1u + (−1)u = u + (−1)u

Since 0 = u + (−1)u, then (−1)u must be an inverse of u, and by the above, must bethe inverse.

5.4 We use the subspace test. For convenience, let W0 = φ(u) : u ∈ U be the image of φ.

Let w1,w2 ∈W0. Let w = w1 + w2; we want to show w ∈W0. Then there exist v1,v2 ∈ Vwith φ(v1) = w1 and φ(v2) = w2. But φ(v1) + φ(v2) = φ(v1 + v2). Let v = v1 + v2, whichis in V ; then this says that w = φ(v), which means that w ∈W0.

Now let w1 ∈W0 and α a scalar. Let w = αw1; we want to show w ∈W0. Then there existsv1 ∈ V with φ(v1) = w1. But αφ(v1) = φ(αv1). Let v = αv1, which is in V ; then this saysthat w = φ(v), which means that w ∈W0.

5.6 a)

1 0 020 2 02 0 3

b)

1 1 11 2 31 4 9

−1, which you can compute yourself. . .

c) We can write this as

1 1 11 2 31 4 9

−1 1 0 20 2 02 0 3

.

But to compute it it’s more efficient to solve the linear systems in parallel. 1 1 1 1 0 021 2 3 0 2 01 4 9 2 0 3

;

1 0 0 4 −5 15/20 1 0 −5 8 −90 0 1 2 −3 7/2

The required change of basis matrix is to be found to the right of the bar.

d) This is the inverse of the previous matrix; alternatively, it can be found directly as theprevious one was.

144

Page 146: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

5.7 b)

2 −1 −1−2 1 −11 0 1

c)

2 −1 −1−2 1 −11 0 1

2−11

=

4−63

d) First x = (2)

101

+ (−1)

110

+ (1)

011

=

103

, then solve x =

103

= a

100

+ b

210

+

c

321

to get φC(x) =

abc

=

4−63

5.10 a) We can row-reduce it. The only operations we will need are row swaps, which willrearrange it to be the identity matrix, so it is invertible. But we can also notice that weknow what the inverse is. We show that PP T = I The ij-entry of PP T is the standardscalar product of the i-th row of P and the j-th column of P T . This is the same thingas the standard scalar product of the i-th row of P and the j-th row of P . Now everyrow of P has a single 1 and the rest zero, and every column is the same. This meansthat every row of P has a single 1 and this 1 is in a different column for every row. So(PP T )ij = 1 if i 6= j and (PP T )ij = 0 if i = j.

b) It goes from a re-ordered version of the standard basis (according to the columns of P )to the standard basis.

6.1 a) This is equivalent to ‖x‖2 ≤ ‖x‖1, which is itself equivalent to (‖x‖2)2 ≤ (‖x‖1)

2

(‖x‖2)2 =

(√|x1|2 + |x2|2 + · · ·+ |xn|2

)2

= |x1|2 + |x2|2 + · · ·+ |xn|2

≤ (|x1|+ |x2|+ · · ·+ |xn|)2

= (‖x‖1)2

So if ‖x‖1 ≤ ε then ‖x‖2 ≤ ‖x‖1 ≤ ε.

b) Again, we need to show (‖x‖∞)2 ≤ (‖x‖2)2.

(‖x‖∞)2 = (max (|x1| , |x2| , · · · , |xn|))2

≤ (|x1|+ |x2|+ · · ·+ |xn|)2

= (‖x‖1)2

c) Again, we show (‖x‖2)2 ≤ n (‖x‖∞)2

(‖x‖2)2 =

(√|x1|2 + |x2|2 + · · ·+ |xn|2

)2

= |x1|2 + |x2|2 + · · ·+ |xn|2

≤ nmax (|x1| , |x2| , · · · , |xn|)2

= n (‖x‖∞)2

So if ‖x‖∞ ≤ ε then ‖x‖2 ≤√n ‖x‖∞ ≤

√nε.

145

Page 147: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

d) We show ‖x‖1 ≤√n ‖x‖2. But now we notice that if we set z to be the vector of all

1’s, and x be the vector with x′i = |xi| then we have, with respect to the standard innerproduct, the following.

‖x‖1 = 〈z|x′〉√n = 〈z|z〉 = ‖z‖2 ‖x‖2 =

∥∥x′∥∥2

So the desired inequality is Cauchy-Schwarz with z and x′.

6.2 Yes. We have the following inequalities.

‖x‖∞ ≤ ‖x‖2 ≤ ‖x‖1 ≤√n ‖x‖2 ≤ n ‖x‖∞

So if your boss uses ‖·‖p and you use ‖·‖q, then there is an inequality of the form ‖·‖p ≤ α ‖·‖qfor some α. So when your boss wants ‖·‖p ≤ ε, you make ‖·‖q ≤ δ with δ = ε/α.

‖·‖p ≤ α ‖·‖q ≤ α (ε/α) = ε

6.4 Let dj = Djj , the diagonal elements of D. Note that |dj | = |±1| = 1

‖Dx‖p =

∑j

|(Dx)j |p1/p

=

∑j

|djxj |p1/p

=

∑j

|dj |p |xj |p1/p

=

∑j

|xj |p1/p

= ‖x‖p

6.6 a) ‖z‖1 = |−i|+ |2|+ |3| = 6

‖z‖2 =

√|−i|2 + |2|2 + |3|2 =

√√√√√[−i 2 3] i2

3

=√

14

‖z‖∞ = max (|−i|+ |2|+ |3|) = 3

b) ‖z‖1 = |−1|+ |2i|+∣∣3(1/2− i

√3/2)∣∣ = 6

‖z‖2 =

√√√√√[−1 2i 3(1/2− i√3/2)] −1

−2i3(1/2 + i

√3/2)

=√

14

‖z‖∞ = max(|−1|+ |2i|+

∣∣3(1/2− i√3/2)∣∣) = 3

6.7 Let dj = Djj , the diagonal elements of D. Note that |dj | =∣∣eiθ∣∣ = 1

‖Dx‖p =

∑j

|(Dx)j |p1/p

=

∑j

|djxj |p1/p

=

∑j

|dj |p |xj |p1/p

=

∑j

|xj |p1/p

= ‖x‖p

It’s worth comparing this with Exercise 6.4, and also with the norms of the different vectorsin Exercise 6.6.

6.9 a) No, since ‖[ a±a ]‖ = 0 but [ a

±a ] 6= 0 for all a 6= 0. Also, it fails the triangle inequality:‖[ 21 ]‖+ ‖[ 32 ]‖ < ‖[ 53 ]‖.

146

Page 148: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

b) No, since ‖αx‖ =√|αx1|+ |αx2| =

√α√|x1|+ |x2| =

√|α| ‖x‖. So ‖αx‖ 6= |α| ‖vx‖

for all α 6= 0, 1.

c) No, since ‖x‖ = 0 for every nonzero vector x, which is not allowed.

7.1 First notice that 0 + 0 = 0. Let z = 〈x|0〉; certainly z ∈ C. Then 〈x|0〉 = 〈x|0 + 0〉 =〈x|0〉+ 〈x|0〉. This says that z = 2z, which implies that z = 0.

If we let w = 〈0|x〉 we get almost the same thing. Then 〈0|x〉 = 〈0 + 0|x〉 = 〈0|x〉 + 〈0|x〉.This says that w = 2w, which implies that w = 0.

In the second part, why don’t we have to worry about a complex conjugate?

7.2 a) No. In this case we have 〈x|x〉 = x21−x22. So we have 〈[ 11 ] | [ 11 ]〉 = 0 even though [ 11 ] 6= 0.

b) No. In this case we have 〈x|x〉 = x1x2 − x2x1 = 0. So we have 〈x|x〉 = 0 for every x,including nonzero ones.

c) No. Same as above.

d) No. Same as above.

7.4 The first four conditions of Definition 7.1 follow from basic matrix properties.

1. xTBTBy is clearly a real number

2. xTBTBy =(xTBTBy

)Tsince it is a real number. But then f(x,y) = xTBTBy =(

xTBTBy)T

= yTBTBx = f(y,x).

3. follows from matrix multiplication. . .

4. follows from matrix multiplication. . .

5. In order to show that f(x,x) ≥ 0, we notice that we can rewrite it as f(x,x) =xTBTBy = (Bx)T (Bx). If we set w = Bx, then we see that f(x,x) is the squareof the standard norm of w. Certainly the standard norm of w is non-negative, becausethe standard norm is a norm.

6. Now f(0,0) = 0 by directly matrix arithmetic. If f(x,x) = 0 then 0 = xTBTBy =(Bx)T (Bx) = wtw. But again, this is the square of the standard norm of w, and theonly way that is 0 is when w = 0. Then Bx = 0. Since B is invertible, x = B−10 = 0.

So it is an inner product.

If B is a square non-invertible matrix, then it has a non-trivial null space. Namely, there issome nonzero vector z such that Bz = 0. Then f(z, z) = (Bz)t(Bz) = 0T0 = 0. So this failsthe last condition and is not an inner product.

If B is m× n with rank(B) = n, then the above proof works, except for the last step of thelast property. We get Bx = 0, but we can’t multiply by B−1 because it doesn’t have aninverse. We can either multiply by a left inverse (which is does have) or else we can observedirectly that it has trivial null space by virtue of having full column rank.

So in fact as long as B is any matrix with full column rank, this will be an inner product.

147

Page 149: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

7.10 If a set of vectors is orthogonal with respect to any valid inner product, then it is linearlyindependent. This is Theorem 7.14. It may or may not be orthogonal with respect to someother inner product, but that doesn’t make it dependent. In particular, Theorem 7.14 does notsay that sets non-orthogonal sets are dependent. So S1 and S2 are both linearly independent.

But then it turns out that Alice and Bob both forgot to actually check the definition of innerproduct, so in fact both sets consist of sixteen copies of the zero vector. Oh well, life is likethat.

8.1 If v is orthogonal to every vector in U it is certainly orthogonal to the vectors in a basis forU , because they are in U .

Now assume that 〈v|uj〉 = 0 for 1 ≤ j ≤ k. If u is an arbitrary vector in U , then there existsscalars aj such that u =

∑j ajuj . In fact they are uniquely determined but we don’t need

that here.

〈v|u〉 = 〈v|∑j

ajuj〉 =∑j

aj〈v|uj〉 =∑j

aj0 = 0

So v is orthogonal to every vector in U .

8.2 b) We check 〈[−2

11

]|[

01−1

]〉 = [−2 1 1 ]

[01−1

]= (−2)(0)+(1)(1)+(1)(−1) = 0 So the vectors

of S are pairwise orthogonal. Now we check the condition: both vectors in S sum tozero, so they are in U .

c) We just showed that the vectors of S are pairwise orthogonal. A cursory glance revealsthat none of them are the zero vector. So Theorem 7.14 tells us it’s independent.Therefore S spans a 2-dimensional subspace of U .

Now U is a subspace of R3, and U has a subspace of dimension 2. So U has dimensioneither 2 or 3. If U has dimension 3 then U = R3. But there are vectors in R3 thatare not in U , eg, any vector with all positive coefficients. So U 6= R3, meaning thatdim(U) = 2, meaning that U = span(S).

d) i) projU ([111

]) =

〈[−2

11

]|[111

]〉

〈[−2

11

]|[−2

11

]〉

[−211

]+〈[

01−1

]|[111

]〉

〈[

01−1

]|[

01−1

]〉

[01−1

]= 0

6

[−211

]+ 0

2

[01−1

]=[000

]So in fact

[111

]is orthogonal to U ; from this it follows that

[−211

],[

01−1

],[111

]is

an orthogonal basis for R3.

ii) projU ([

1−21

]) =〈[−2

11

]|[

1−21

]〉

〈[−2

11

]|[−2

11

]〉

[−211

]+〈[

01−1

]|[

1−21

]〉

〈[

01−1

]|[

01−1

]〉

[01−1

]= −3

6

[−211

]+ −3

2

[01−1

]=[

1−21

]

148

Page 150: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

So in fact[

1−21

]∈ U .

e) i) The closest point is projU (x) =[000

].

ii) The closest point is projU (x) =[

1−21

].

8.3 a) We check the pairwise inner products.

〈1|t〉 =

∫ +1

−1(1)(t) dt =

∫ +1

−1t dt =

t2

2

∣∣∣∣+1

−1= 0

〈1|3t2 − 1〉 =

∫ +1

−1(1)(3t2 − 1) dt =

∫ +1

−13t2 − 1 dt = t3 − t

∣∣+1

−1 = 0

〈t|3t2 − 1〉 =

∫ +1

−1(t)(3t2 − 1) dt =

∫ +1

−13t3 − t dt =

3t4

4− t2

2

∣∣∣∣+1

−1= 0

b) x = projU (x) =〈1|t3〉〈1|1〉

(1) +〈t|t3〉〈t|t〉

(t) +〈3t2 − 1|t3〉

〈3t2 − 1|3t2 − 1〉(3t2 − 1)

=

∫ +1−1 (1)(t3) dt∫ +1−1 (1)(1) dt

(1) +

∫ +1−1 (t)(t3) dt∫ +1−1 (t)(t) dt

(t) +

∫ +1−1 (3t2 − 1)(t3) dt∫ +1

−1 (3t2 − 1)(3t2 − 1) dt(3t2 − 1)

= 02(1) + 2/5

2/3(t) + 08/5(3t2 − 1)

= 35 t

c) This is the vector whose entries are the coefficients we just computed, so φS(x) =[ 0

3/50

].

d) The vector in U closest to x is x = projU (x) (this is Theorem 8.7), which we justcomputed.. The distance between x and x is ‖x− x‖.

‖x− x‖ =∥∥(t3)− (3/5t)

∥∥ =√〈t3 − 3/5t|t3 − 3/5t〉 =

√∫ +1

−1(t3 − 3/5t)2 dt =

√27 − 12 + 6

25

8.5 (briefly:)

a) If x ∈ U then projU (x) = x and so f(x) = x = λx with λ = 1.

b) If x ⊥ U then projU (x) = 0 and so f(x) = 0 = λx with λ = 0.

9.1 a)

1/

√2

i(1/√2)

0

,i(1/√6)1/

√6

2/√6

b) 1

3

2 −i ii 2 1−i 1 2

149

Page 151: MAT3341 : Applied Linear Algebra Course Notesweb5.uottawa.ca/mnewman/notes/mat3341.pdfMAT3341 : Applied Linear Algebra Mike Newman, april 2018 + 1. Matrix Algebra matrices An m nmatrix

9.2 a) Since P1 is a projection matrix, P1z is a vector in U1; since every vector in U1 isorthogonal to every vector in U2, then P1z is orthogonal to every vector in U2. Thereforethe projection of P1z onto U2 is the zero vector. So we have (P2P1)z = P1(P2z) = 0for every vector z. In particular, (P2P1)I is a matrix each of show columns is zero, soP1P2 is the zero matrix. As an alternative to this last sentence, nul(P1P2) = Cn sorank(P1P2) = 0 so P1P2 is the zero matrix. Interchanging the indices 1 and 2 in theprevious argument, we get that P2P1 is also the zero matrix.

b) We know that P1 can be written as Q1QH1 where the columns of Q1 are an orthonormal

basis for U1, and P2 can be written as Q2QH2 where the columns of Q2 are an orthonormal

basis for U2. Since U1 and U2 are orthogonal, then the inner product of a vector of U1

and a vector of U2 is zero. But the matrix QH1 Q2 is a matrix whose (i, j) entry is theinner product of the i-th column of Q1 with the j-th column of Q2. So QH1 Q2 is the

zero matrix, and likewise QH2 Q1 is also the zero matrix (or: QH2 Q1 =(QH1 Q2

)H. . . ).

P1P2 = Q1QH1 Q2Q

H2 = Q1

(QH1 Q2

)QH2 = Q1 (0)QH2 = 0

P2P1 = Q2QH2 Q1Q

H1 = Q2

(QH2 Q1

)QH1 = Q2 (0)QH1 = 0

9.3 a) Q =

1/3 2/3−2/3 2/32/3 1/3

and R =

[3 −6 60 3 9

]

b) x =

2/31/30

+ t

−8−31

.

c) b = QQTb = Q(QTb) =

2/32/31/3

. Note that we already found QTb, right? Since b 6= b

then the solution is not exact and is the (best) approximation.

d) b = Ax = (QR)x = Q(Rx) =

2/32/31/3

. Note that we already found Rx, right? Since

b 6= b then the solution is not exact and is the (best) approximation.

10.1 Let $\mathbf{u}_1, \mathbf{u}_2 \in U$ and $a, b$ be scalars (field elements). Then using the linearity of $S$ and then of $T$, we get the linearity of $R$.
\begin{align*}
R(a\mathbf{u}_1 + b\mathbf{u}_2) &= T(S(a\mathbf{u}_1 + b\mathbf{u}_2)) \\
&= T(aS(\mathbf{u}_1) + bS(\mathbf{u}_2)) \\
&= aT(S(\mathbf{u}_1)) + bT(S(\mathbf{u}_2)) = aR(\mathbf{u}_1) + bR(\mathbf{u}_2)
\end{align*}


10.2 Let $R = \alpha S + \beta T$, let $\mathbf{u}_1, \mathbf{u}_2 \in U$, and let $a, b$ be scalars (field elements). The definition of $R$ splits into $S$ and $T$, which are both linear.
\begin{align*}
R(a\mathbf{u}_1 + b\mathbf{u}_2) &= (\alpha S + \beta T)(a\mathbf{u}_1 + b\mathbf{u}_2) \\
&= \alpha S(a\mathbf{u}_1 + b\mathbf{u}_2) + \beta T(a\mathbf{u}_1 + b\mathbf{u}_2) \\
&= \alpha a\,S(\mathbf{u}_1) + \alpha b\,S(\mathbf{u}_2) + \beta a\,T(\mathbf{u}_1) + \beta b\,T(\mathbf{u}_2) \\
&= a\,(\alpha S(\mathbf{u}_1) + \beta T(\mathbf{u}_1)) + b\,(\alpha S(\mathbf{u}_2) + \beta T(\mathbf{u}_2)) = a\,R(\mathbf{u}_1) + b\,R(\mathbf{u}_2)
\end{align*}

10.10 a) $T$ is linear by properties of the derivative: $\frac{d}{dt}\bigl(af(t) + bg(t)\bigr) = a\,\frac{d}{dt}f(t) + b\,\frac{d}{dt}g(t)$.

b) Note that we are using the same basis for "input" and "output". We consider each element of the basis, apply the transformation, and write the answer in terms of the basis. Note that in the last step we need to solve a system of linear equations; we just give the answer here, even though there are some calculations involved.
\begin{align*}
T(1+t) &= 1, & 1 &= \tfrac12(1+t) + \tfrac12(1-t) + 0(t^2+t^3) + 0(t^2-t^3), & &\begin{bmatrix}1/2\\1/2\\0\\0\end{bmatrix} \\
T(1-t) &= -1, & -1 &= -\tfrac12(1+t) - \tfrac12(1-t) + 0(t^2+t^3) + 0(t^2-t^3), & &\begin{bmatrix}-1/2\\-1/2\\0\\0\end{bmatrix} \\
T(t^2+t^3) &= 2t+3t^2, & 2t+3t^2 &= 1(1+t) - 1(1-t) + \tfrac32(t^2+t^3) + \tfrac32(t^2-t^3), & &\begin{bmatrix}1\\-1\\3/2\\3/2\end{bmatrix} \\
T(t^2-t^3) &= 2t-3t^2, & 2t-3t^2 &= 1(1+t) - 1(1-t) - \tfrac32(t^2+t^3) - \tfrac32(t^2-t^3), & &\begin{bmatrix}1\\-1\\-3/2\\-3/2\end{bmatrix}
\end{align*}
This gives the four columns of the desired matrix:
\[
A_1 = \begin{bmatrix}1/2 & -1/2 & 1 & 1\\ 1/2 & -1/2 & -1 & -1\\ 0 & 0 & 3/2 & -3/2\\ 0 & 0 & 3/2 & -3/2\end{bmatrix}
\]

c) $A_2 = \begin{bmatrix}0 & 1 & 0 & 0\\ 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 3\\ 0 & 0 & 0 & 0\end{bmatrix}$

d) We write each element of the basis $B$ in terms of the standard basis; the coefficients are the columns.
\begin{align*}
1+t &= 1(1) + 1(t) + 0(t^2) + 0(t^3), & &\begin{bmatrix}1\\1\\0\\0\end{bmatrix} \\
1-t &= 1(1) - 1(t) + 0(t^2) + 0(t^3), & &\begin{bmatrix}1\\-1\\0\\0\end{bmatrix} \\
t^2+t^3 &= 0(1) + 0(t) + 1(t^2) + 1(t^3), & &\begin{bmatrix}0\\0\\1\\1\end{bmatrix} \\
t^2-t^3 &= 0(1) + 0(t) + 1(t^2) - 1(t^3), & &\begin{bmatrix}0\\0\\1\\-1\end{bmatrix}
\end{align*}
This gives the matrix:
\[
M_{B\to E} = \begin{bmatrix}1 & 1 & 0 & 0\\ 1 & -1 & 0 & 0\\ 0 & 0 & 1 & 1\\ 0 & 0 & 1 & -1\end{bmatrix}.
\]

e) $A_1 = M_{E\to B}\,A_2\,M_{B\to E} = (M_{B\to E})^{-1}A_2\,M_{B\to E}$
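The identity in e) gives a quick numeric check of the matrix found in b) (a sketch, using numpy):

```python
import numpy as np

# Check of 10.10: the matrix of d/dt in the basis B = {1+t, 1-t, t^2+t^3, t^2-t^3}
# equals M^{-1} A2 M, where A2 is the matrix of d/dt in the standard basis.
A2 = np.array([[0., 1., 0., 0.],
               [0., 0., 2., 0.],
               [0., 0., 0., 3.],
               [0., 0., 0., 0.]])
M = np.array([[1., 1., 0., 0.],    # columns: the B-vectors in the standard basis
              [1., -1., 0., 0.],
              [0., 0., 1., 1.],
              [0., 0., 1., -1.]])
print(np.linalg.inv(M) @ A2 @ M)
# rows: [1/2 -1/2 1 1], [1/2 -1/2 -1 -1], [0 0 3/2 -3/2], [0 0 3/2 -3/2]
```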

10.11 a) $A_1 = \begin{bmatrix}1 & 1 & 0\\ 0 & 0 & 1\\ -1 & -2 & -5\\ 1 & 3 & 7\end{bmatrix}$

b) $A_2 = \begin{bmatrix}1 & 2 & 3\\ 0 & 1 & 3\\ 0 & 1 & 2\\ 1 & 3 & 7\end{bmatrix}$

c) $M_{B\to E} = \begin{bmatrix}1 & 1 & 1 & 1\\ 0 & 1 & 1 & 1\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1\end{bmatrix}$

d) $A_1 = M_{E\to B}\,A_2\,M_{B\to E} = (M_{B\to E})^{-1}A_2\,M_{B\to E}$

11.1 a) $\|A\|_1 = 2$, $\|A\|_\infty = 2$, $\|A\|_2 = \sqrt{2}$

b) $\|A\|_1 = 5$, $\|A\|_\infty = 3$, $\|A\|_2 = 0$

c) $\|A\|_1 = 2$, $\|A\|_\infty = 2$, $\|A\|_2 = 2$

11.2 If $A^T = A$ or $A^H = A$ then $|A_{ij}| = |A_{ji}|$, which means that considering rows or columns amounts to the same thing.

11.3 If $A^H = A$ then $A^HA = A^2$. The eigenvalues of $A^2$ are the squares of the eigenvalues of $A$. Since $A$ is Hermitian its eigenvalues are real (Proposition 17.8). So that means the eigenvalues of $A^2$ are real and non-negative, and in particular, the largest eigenvalue of $A^2$ is the square of either the largest or the smallest (most negative) eigenvalue of $A$. So
\[
\|A\|_2 = \sqrt{\lambda_{\max}(A^HA)} = \max\{|\lambda_1|, |\lambda_n|\}.
\]
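A numeric illustration (a sketch; the symmetric matrix below is a sample of my own, not the one from the exercise):

```python
import numpy as np

# 11.3: for Hermitian A, ||A||_2 = max(|lambda_1|, |lambda_n|).
A = np.array([[2., -1., 0.],
              [-1., -3., 1.],
              [0., 1., 1.]])          # symmetric, hence Hermitian
evals = np.linalg.eigvalsh(A)         # real eigenvalues, in increasing order
print(np.linalg.norm(A, 2))                  # the spectral norm
print(max(abs(evals[0]), abs(evals[-1])))    # the same number
```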

11.7 By a general property of norms, $\|\mathbf{x}\| = 0$ if and only if $\mathbf{x} = \mathbf{0}$, in any vector space: the only vector whose norm is zero is the zero vector. For matrices, this means that the only matrix whose matrix norm is zero is the zero matrix. Invertible matrices are certainly nonzero, so they have nonzero norm, which means positive norm.

12.1 a) $P = \begin{bmatrix}0.1 & 0.7 & -0.1\\ 0.3 & -0.2 & 0.4\\ 0.2 & 0.2 & 0.1\end{bmatrix}$

b) No, since $\|P\|_1 = 1.1 > 1$.

c) Yes, since $\|P\|_\infty = 0.9 < 1$. We get
\[
\frac{1}{1+0.9} < \left\|(I+P)^{-1}\right\| < \frac{1}{1-0.9}, \quad\text{so } 0.526\ldots < \|A^{-1}\| < 10.
\]
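These norms and the resulting bounds are easy to confirm numerically (a sketch):

```python
import numpy as np

# Check of 12.1: ||P||_1 = 1.1 > 1 but ||P||_inf = 0.9 < 1, so the Neumann-series
# bound 1/(1+||P||) <= ||(I+P)^{-1}|| <= 1/(1-||P||) applies in the inf-norm.
P = np.array([[0.1, 0.7, -0.1],
              [0.3, -0.2, 0.4],
              [0.2, 0.2, 0.1]])
print(np.linalg.norm(P, 1), np.linalg.norm(P, np.inf))   # 1.1, 0.9
inv_norm = np.linalg.norm(np.linalg.inv(np.eye(3) + P), np.inf)
print(1/1.9, inv_norm, 1/0.1)    # 0.526... <= ||A^{-1}|| <= 10
```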

12.2 For convenience let $r = \|R\|$.

a) If $\|A^{-1}pR\| < 1$ or $\|pRA^{-1}\| < 1$ then $A + pR$ is definitely invertible (Theorem 12.5). This is a little hard to work with, so we weaken slightly.
\[
\|A^{-1}pR\| \le \|A^{-1}\|\,\|pR\| = |p|\,\|A^{-1}\|\,r
\]
So if $|p|\,\|A^{-1}\|\,r < 1$ then $A + pR$ is definitely invertible (Theorem 12.10). This means that if
\[
|p| < \frac{1}{\|A^{-1}\|\,r}
\]
then $A + pR$ is definitely invertible.
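A numeric illustration of this invertibility radius (a sketch; $A$ and $R$ below are sample matrices of my own):

```python
import numpy as np

# 12.2: A + pR is guaranteed invertible whenever |p| < 1/(||A^{-1}|| ||R||).
A = np.array([[2., 0.], [0., 0.5]])
R = np.array([[0., 1.], [1., 0.]])
bound = 1 / (np.linalg.norm(np.linalg.inv(A), np.inf) * np.linalg.norm(R, np.inf))
print(bound)                          # 0.5: any |p| below this is safe
print(np.linalg.det(A + 0.45 * R))    # inside the radius: nonzero det, invertible
print(np.linalg.det(A + 1.00 * R))    # outside it, invertibility can fail (0 here)
```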

12.3 a) If we let $P = A - I$ then the diagonal elements of $P$ all lie in $(-0.3, 0.2)$ and the off-diagonal elements of $P$ all lie in $[-\frac{1}{2n}, +\frac{1}{2n}]$. This means that $\|P\|_\infty \le 0.3 + (n-1)/(2n) < 0.3 + 1/2 < 1$. Thus we know that $A = I + P$ is invertible.

b) Certainly. For instance, let $P$ be upper triangular with zero diagonal (or more generally, no $-1$ on the diagonal). Then $I + P$ is an upper triangular matrix with no zeroes on the diagonal, so certainly invertible.

12.5 hint: what are $A + R$ and $A^{-1}R$ when $R = kA$?

12.7 This question is asking what we can say about the null space of $A$ under small perturbations. The point of this question is that adapting the proof of Theorem 12.14 doesn't give much in this situation.

a) This scenario doesn't even make sense. If $\mathbf{b} = \mathbf{0}$, then $\mathbf{x} \in \operatorname{nul}(A)$; but if $\mathbf{x} \neq \mathbf{0}$, then $A$ is not invertible. So Theorem 12.14 doesn't go through; in fact we can't apply Theorem 12.5, which is the way we solve for $\mathbf{x}$. The fundamental issue is that $A$ being non-invertible is akin to saying there is a discontinuity in the function $\mathbf{x} \mapsto A\mathbf{x}$. It is very difficult to say something meaningful about small perturbations in a function near a singularity (think of the behaviour of $f(x) = 1/x$ near $x = 0$).

b) This scenario is at least possible. If $\mathbf{x} = \mathbf{0}$ then we can follow the proof up to the point just before where we divide by $\|\mathbf{x}\|$. This gives (remember that $\|\mathbf{x}\| = \|\mathbf{0}\| = 0$):
\[
\|\Delta\mathbf{x}\| \le \frac{1}{1-\alpha}\,\|A^{-1}\|\,\|\Delta\mathbf{b}\|
\]
So one possible solution to $A'\mathbf{x}' = \mathbf{0}$ is $\mathbf{x}' = \mathbf{x} + \Delta\mathbf{x} = \Delta\mathbf{x}$ with $\|\Delta\mathbf{x}\|$ bounded as above. Except that this doesn't really say much at all, because here $\Delta\mathbf{b} = \mathbf{0}$, so the bound just gives $\Delta\mathbf{x} = \mathbf{0}$. To put it another way, if we know that $A\mathbf{x} = \mathbf{0}$ implies that $\mathbf{x} = \mathbf{0}$, then a small perturbation in $A$ will not affect the solution for $\mathbf{x}$. In fact, this is true even if "small" is replaced by "huge".

12.8 We apply Theorem 12.14 with different norms. Since we don't actually know $\Delta A$, we are essentially forced to use the slightly weaker form of Theorem 12.15.

a) First, we compute various norms; Theorem 11.8 is useful for the matrix norms. For $\Delta A$ and $\Delta\mathbf{b}$ note that the absolute values of the entries are all at most $1/24$.
\[
\|\mathbf{b}\|_\infty = 4 \qquad \|\mathbf{x}\|_\infty = 1 \qquad \|A\|_\infty = 5 \qquad \|A^{-1}\|_\infty = 7 \qquad \|\Delta A\|_\infty \le 4(1/24) \qquad \|\Delta\mathbf{b}\|_\infty \le 1/24
\]
This means that we have $\alpha \le \|\Delta A\|_\infty\,\|A^{-1}\|_\infty = 28/24 > 1$. So we can't actually apply Theorem 12.15.

b) Again, various norms, this time using the 1-norm:
\[
\|\mathbf{b}\|_1 = 12 \qquad \|\mathbf{x}\|_1 = 4 \qquad \|A\|_1 = 6 \qquad \|A^{-1}\|_1 = 5 \qquad \|\Delta A\|_1 \le 4(1/24) \qquad \|\Delta\mathbf{b}\|_1 \le 4(1/24)
\]
This means that we have $\alpha \le \|\Delta A\|_1\,\|A^{-1}\|_1 = 20/24 < 1$. Furthermore, $c(A) = \|A\|_1\,\|A^{-1}\|_1 = 30$. So we can apply Theorem 12.15.
\[
\|\Delta\mathbf{x}\|_1 \le \|\mathbf{x}\|_1\,\frac{1}{1-\alpha}\,c(A)\left(\frac{\|\Delta\mathbf{b}\|_1}{\|\mathbf{b}\|_1} + \frac{\|\Delta A\|_1}{\|A\|_1}\right) = 4 \times \frac{1}{1-20/24} \times 30 \times \left(\frac{4/24}{12} + \frac{4/24}{6}\right) = 30
\]
This means that $\mathbf{x}' = \mathbf{x} + \Delta\mathbf{x}$ where $\|\Delta\mathbf{x}\|_1 \le 30$. In other words, the sum of the absolute values of $x_i - x_i'$ is at most 30. Which doesn't say much!

c) The conclusion based on the 1-norm is valid. It is also true that the ∞-norm doesn't tell us anything here, but that doesn't mean we don't know anything. So even if the ∞-norm is somehow "more important" here, we still know something about the perturbed solution $\Delta\mathbf{x}$: in fact, $\|\Delta\mathbf{x}\|_\infty \le \|\Delta\mathbf{x}\|_1$, so we know that $\|\Delta\mathbf{x}\|_\infty \le 30$ also.

12.9 a) $\mathbf{x} = \begin{bmatrix}1\\0\end{bmatrix}$

b) $\mathbf{x} = \begin{bmatrix}0\\1\end{bmatrix}$

c) We have $A^{-1} = -10000\begin{bmatrix}1 & -1\\ -1.0001 & 1\end{bmatrix}$.

Using the 1-norm we get $\|A\|_1 = 2.0001$ and $\|A^{-1}\|_1 = 20001$, so $c(A) \approx 40000$. Using the ∞-norm we get $\|A\|_\infty = 2.0001$ and $\|A^{-1}\|_\infty = 20001$, so $c(A) \approx 40000$.

If we think of the first system as the unprimed one and the second as the primed one, we can use Theorem 12.14 (or Theorem 12.15) to estimate $\mathbf{x}' = \mathbf{x} + \Delta\mathbf{x}$. Given the enormous condition number, we aren't surprised to see that this estimate of $\mathbf{x}'$ doesn't tell us much. We compute the norms and apply Theorem 12.15. Note that since $\Delta A$ is zero (the matrix doesn't change) we get $\alpha = 0$. We'll permit ourselves the luxury of rounding the condition number to 40000, so in fact our estimate below is a little too precise.
\[
\|\Delta\mathbf{x}\|_1 \le \|\mathbf{x}\|_1\,\frac{1}{1-\alpha}\,c(A)\left(\frac{\|\Delta\mathbf{b}\|_1}{\|\mathbf{b}\|_1} + \frac{\|\Delta A\|_1}{\|A\|_1}\right) \le 1 \times \frac{1}{1-0} \times 40000 \times \left(\frac{0.0001}{2.0001} + \frac{0}{2.0001}\right) \approx 2
\]
The ∞-norm is similar.
\[
\|\Delta\mathbf{x}\|_\infty \le \|\mathbf{x}\|_\infty\,\frac{1}{1-\alpha}\,c(A)\left(\frac{\|\Delta\mathbf{b}\|_\infty}{\|\mathbf{b}\|_\infty} + \frac{\|\Delta A\|_\infty}{\|A\|_\infty}\right) \le 1 \times \frac{1}{1-0} \times 40000 \times \left(\frac{0.0001}{1.0001} + \frac{0}{2.0001}\right) \approx 4
\]
In fact $\Delta\mathbf{x} = \begin{bmatrix}-1\\1\end{bmatrix}$, so $\|\Delta\mathbf{x}\|_1 = 2$. Our estimate based on the 1-norm is in fact sharp.
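For concreteness, here is a numeric sketch; the matrix is obtained by inverting the stated $A^{-1}$, and the right-hand sides are reconstructed to match the stated solutions, so treat them as assumptions:

```python
import numpy as np

# 12.9: an ill-conditioned system; a 0.0001 change in b flips the solution.
A = np.array([[1.0, 1.0],
              [1.0001, 1.0]])
b = np.array([1.0, 1.0001])        # solution (1, 0)
bp = np.array([1.0, 1.0])          # perturbed b: solution (0, 1)
print(np.linalg.solve(A, b), np.linalg.solve(A, bp))
print(np.linalg.cond(A, 1))        # ~40000
```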

13.6 a) We are looking for a polynomial $f(t)$ and a scalar $\lambda$ such that $D(f) = \lambda f$. Let $f = f_0 + f_1t + \cdots + f_nt^n$.
\begin{align*}
D(f) &= f_1 + 2f_2t + 3f_3t^2 + \cdots + nf_nt^{n-1} \\
\lambda f &= \lambda f_0 + \lambda f_1t + \lambda f_2t^2 + \cdots + \lambda f_nt^n
\end{align*}
If $n > 0$ then the degrees of $D(f)$ and $\lambda f$ are different (for $\lambda \neq 0$; and if $\lambda = 0$ then $D(f)$ still has the nonzero leading coefficient $nf_n$), so they cannot possibly be equal. If $n = 0$ then $f(t) = f_0$ is a constant, so we need to have $\lambda f_0 = D(f_0) = 0$. We must have $f_0 \neq 0$ (since eigenvectors are not the zero vector), which means that $\lambda = 0$.

So the eigenvectors are exactly the constant polynomials, with eigenvalue 0.

b) The derivative does not increase the degree, so the derivative of a polynomial of degree at most $k$ is a polynomial of degree at most $k$ (in fact it is of degree at most $k-1$ but we don't need that here). So if we let $P_k$ be the subspace of polynomials of degree at most $k$, we see this is an invariant subspace of $D$.
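This matches the matrix picture (a sketch, reusing the matrix $A_2$ of $\frac{d}{dt}$ from 10.10 c):

```python
import numpy as np

# 13.6: the matrix of d/dt on polynomials of degree <= 3 is nilpotent; its only
# eigenvalue is 0 and the eigenvectors are the constant polynomials.
D = np.array([[0., 1., 0., 0.],
              [0., 0., 2., 0.],
              [0., 0., 0., 3.],
              [0., 0., 0., 0.]])
evals, evecs = np.linalg.eig(D)
print(evals)              # all zero
print(evecs[:, 0])        # e1 (up to sign), i.e. the constant polynomial 1
print(np.linalg.matrix_rank(np.linalg.matrix_power(D, 4)))   # 0: D^4 = 0
```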

13.7 It is useful to compare this with Exercise 13.6.

13.8 You should think about this one.

14.2 The key observation is that for square matrices $PQ = I$ implies $QP = I$. This is actually Theorem 2.10. Thus if $P^HP = I$ then $PP^H = I$ also, which means that the columns of $P^H$ (which are the conjugates of the rows of $P$) form an orthonormal basis of $\mathbb{C}^n$. If $P$ and $Q$ are both unitary, then $PQ(PQ)^H = PQQ^HP^H = I$ and so $PQ$ is unitary.
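A quick numeric check (a sketch; the unitary pair below is a sample of my own):

```python
import numpy as np

# 14.2: if P and Q are unitary then so is PQ.
P = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)
Q = np.array([[0, 1], [1, 0]], dtype=complex)
for name, M in (("P", P), ("Q", Q), ("PQ", P @ Q)):
    print(name, np.allclose(M.conj().T @ M, np.eye(2)))   # True for all three
```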

14.3 This would say that every matrix is diagonalizable.

14.4 This would say that every matrix is unitarily diagonalizable.

15.2 a) Set $D_{11}$ to be some small (positive) number $\alpha$. Then the first row will have small radius as $\alpha \to 0$.

b) We need $\alpha > 0$ in order for $D$ to be invertible, so we'll always get a disc. One might think that we could say that $A$ has an eigenvalue in the intersection of all the discs that come from all the values of $\alpha$, but see the following part.

c) In fact we can't really conclude anything at all. We need to know what the other discs are, and whether or not they overlap this one. In point of fact, for all the other rows, the first entry is being divided by a small number $\alpha$, so if that entry is not zero the corresponding disc will grow, and so eventually will overlap the first one.

15.6 Applying Theorem 15.9 we could write $A = IAI^{-1}$, taking $I$ to be the matrix usually known as $P$. Then $\|\Delta A\|_1 \le n\varepsilon$. The radius of the discs from Theorem 15.9 is $\rho = c(I)\,\|\Delta A\|_1 = \|\Delta A\|_1$ since $c(I) = 1$. So as long as $\varepsilon$ is small enough, the radius will be small. In particular, if we let $\delta$ be the smallest difference between any two distinct diagonal elements of $A$, then choosing $\varepsilon < \delta/(2n)$ makes $\rho < \delta/2$ so that the discs will be disjoint. So for such a value of $\varepsilon$, Theorem 15.9 tells us that every eigenvalue lies in some disc, and all the discs are disjoint, so every disc contains one eigenvalue. Precisely: there is a bijective correspondence between the eigenvalues of $A$ and the eigenvalues of $A + \Delta A$, and each eigenvalue of $A + \Delta A$ is close (within $\rho$) to the corresponding eigenvalue of $A$.
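A numeric illustration (a sketch; the diagonal matrix and $\varepsilon$ below are sample choices of my own):

```python
import numpy as np

# 15.6: perturbing a diagonal matrix by entries of size at most eps moves each
# eigenvalue by at most about n*eps, so for small eps the discs stay disjoint.
rng = np.random.default_rng(0)
n, eps = 4, 1e-3
A = np.diag([1., 2., 4., 7.])
dA = rng.uniform(-eps, eps, (n, n))
print(np.sort_complex(np.linalg.eigvals(A + dA)))
# each eigenvalue lies within n*eps = 0.004 of one of 1, 2, 4, 7
```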

15.7 This is, in some sense, the same as Exercise 15.6. This time $c(P)$ might not be 1, but it is some fixed value. The radius we get from Theorem 15.9 is $\rho = c(P)\,\|\Delta A\|_1 \le c(P)\,n\varepsilon$. Let $\delta$ be the smallest difference between any two distinct diagonal elements of $A$. Then choosing $\varepsilon < \delta/(c(P)\,2n)$ makes $\rho < \delta/2$ so that the discs will be disjoint. So for such a value of $\varepsilon$, Theorem 15.9 tells us that every eigenvalue lies in some disc, and all the discs are disjoint, so every disc contains one eigenvalue. Precisely: there is a bijective correspondence between the eigenvalues of $A$ and the eigenvalues of $A + \Delta A$, and each eigenvalue of $A + \Delta A$ is close (within $\rho$) to the corresponding eigenvalue of $A$.

15.9 From the point of view of Theorem 15.9, we know that the eigenvalues of the perturbed matrix $A' = A + \Delta A$ lie in discs centred at the eigenvalues of $A$, with radius $\rho = c(P)\,\|\Delta A\|$. If the discs are disjoint then we know there is one eigenvalue per disc.

a) • Using the 1-norm we get $\|P\|_1 = 4$, $\|P^{-1}\|_1 = 3$, $c(P) = 12$. The eigenvalues of $A'$ all lie in discs with centres at the eigenvalues of $A$ and radius $12\,\|\Delta A\|_1$. The eigenvalues of $A$ would have to be separated by a distance of at least $24\,\|\Delta A\|_1$ for the discs to be disjoint.

• Using the ∞-norm we get $\|P\|_\infty = 4$, $\|P^{-1}\|_\infty = 3$, $c(P) = 12$. The eigenvalues of $A'$ all lie in discs with centres at the eigenvalues of $A$ and radius $12\,\|\Delta A\|_\infty$. The eigenvalues of $A$ would have to be separated by a distance of at least $24\,\|\Delta A\|_\infty$ for the discs to be disjoint.

• Using the 2-norm we get $\|P\|_2 \approx 3.2$, $\|P^{-1}\|_2 \approx 2.1$, $c(P) \approx 6.9$. The eigenvalues of $A'$ all lie in discs with centres at the eigenvalues of $A$ and radius $6.9\,\|\Delta A\|_2$. The eigenvalues of $A$ would have to be separated by a distance of at least $13.8\,\|\Delta A\|_2$ for the discs to be disjoint.

b) • For the 1-norm we get $\|P\|_1 = 13$, $\|P^{-1}\|_1 = 20$, $c(P) = 260$. The discs have radius $260\,\|\Delta A\|_1$.

• For the ∞-norm we get $\|P\|_\infty = 11$, $\|P^{-1}\|_\infty = 21$, $c(P) = 231$. The discs have radius $231\,\|\Delta A\|_\infty$.

• For the 2-norm we get $\|P\|_2 \approx 10.3$, $\|P^{-1}\|_2 \approx 18.7$, $c(P) \approx 192.9$. The discs have radius $192.9\,\|\Delta A\|_2$.

The radii of the discs are roughly 20 times bigger than the previous. It seems less likely that they are disjoint.

c) • Using the 1-norm we get $\|P\|_1 = 103$, $\|P^{-1}\|_1 = 200$, $c(P) = 20600$. The discs have radius $20600\,\|\Delta A\|_1$.

• Using the ∞-norm we get $\|P\|_\infty = 101$, $\|P^{-1}\|_\infty = 201$, $c(P) = 20301$. The discs have radius $20301\,\|\Delta A\|_\infty$.

• Using the 2-norm we get $\|P\|_2 \approx 100.0$, $\|P^{-1}\|_2 \approx 198.5$, $c(P) \approx 19857$. The discs have radius $19857\,\|\Delta A\|_2$.

The radii of the discs are roughly 100 times bigger than the previous, and roughly 2000 times bigger than the first. Even if they are disjoint, the estimates are so vague as to be not of much use.

16.1 a) We check if the following two expressions are equal.
\[
\begin{bmatrix}2 & 3\\1 & 2\end{bmatrix}\begin{bmatrix}2 & 3\\1 & 2\end{bmatrix}^H = \begin{bmatrix}2 & 3\\1 & 2\end{bmatrix}\begin{bmatrix}2 & 1\\3 & 2\end{bmatrix} = \begin{bmatrix}13 & 8\\8 & 5\end{bmatrix}
\]
\[
\begin{bmatrix}2 & 3\\1 & 2\end{bmatrix}^H\begin{bmatrix}2 & 3\\1 & 2\end{bmatrix} = \begin{bmatrix}2 & 1\\3 & 2\end{bmatrix}\begin{bmatrix}2 & 3\\1 & 2\end{bmatrix} = \begin{bmatrix}5 & 8\\8 & 13\end{bmatrix}
\]
It is not normal.

b) Check the following.
\[
\begin{bmatrix}1 & 0 & 0\\0 & 1 & 2\\0 & 0 & 1\end{bmatrix}\begin{bmatrix}1 & 0 & 0\\0 & 1 & 2\\0 & 0 & 1\end{bmatrix}^H = \begin{bmatrix}1 & 0 & 0\\0 & 1 & 2\\0 & 0 & 1\end{bmatrix}\begin{bmatrix}1 & 0 & 0\\0 & 1 & 0\\0 & 2 & 1\end{bmatrix} = \begin{bmatrix}1 & 0 & 0\\0 & 5 & 2\\0 & 2 & 1\end{bmatrix}
\]
\[
\begin{bmatrix}1 & 0 & 0\\0 & 1 & 2\\0 & 0 & 1\end{bmatrix}^H\begin{bmatrix}1 & 0 & 0\\0 & 1 & 2\\0 & 0 & 1\end{bmatrix} = \begin{bmatrix}1 & 0 & 0\\0 & 1 & 0\\0 & 2 & 1\end{bmatrix}\begin{bmatrix}1 & 0 & 0\\0 & 1 & 2\\0 & 0 & 1\end{bmatrix} = \begin{bmatrix}1 & 0 & 0\\0 & 1 & 2\\0 & 2 & 5\end{bmatrix}
\]
These are not equal, so the matrix is not normal. Actually, we could have used Proposition 16.8, which says that a matrix that is triangular and normal must be diagonal. Rephrased, Proposition 16.8 says that a matrix that is triangular but not diagonal cannot be normal.
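Numerically the check is one line (a sketch):

```python
import numpy as np

# 16.1: A is normal iff A A^H equals A^H A.
def is_normal(A):
    return np.allclose(A @ A.conj().T, A.conj().T @ A)

print(is_normal(np.array([[2., 3.], [1., 2.]])))                         # False
print(is_normal(np.array([[1., 0., 0.], [0., 1., 2.], [0., 0., 1.]])))   # False
```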

16.4 Some detailed calculations are omitted below; fill them in to check your understanding.

a) We set $A_1 = A = \begin{bmatrix}3 & 1\\-5 & 9\end{bmatrix}$. Then we find an eigenvalue and eigenvector such that $A_1\mathbf{v}_1 = \lambda_1\mathbf{v}_1$. We find the characteristic polynomial of $A_1$ as:
\[
\det(A_1 - \lambda I) = \det\begin{bmatrix}3-\lambda & 1\\-5 & 9-\lambda\end{bmatrix} = (3-\lambda)(9-\lambda) - (1)(-5) = \lambda^2 - 12\lambda + 32 = (\lambda-4)(\lambda-8)
\]
We find an eigenvector for (say) $\lambda = 4$ as:
\[
A_1 - 4I = \begin{bmatrix}-1 & 1\\-5 & 5\end{bmatrix} \xrightarrow{\text{rowops}} \begin{bmatrix}-1 & 1\\0 & 0\end{bmatrix} \xrightarrow{\text{eigenvector}} \begin{bmatrix}1\\1\end{bmatrix}
\]
Then we make an orthonormal basis using this vector.
\[
\begin{bmatrix}1\\1\end{bmatrix} \xrightarrow{\text{add basis}} \begin{bmatrix}1\\1\end{bmatrix}, \begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix} \xrightarrow{\text{Gram--Schmidt}} \begin{bmatrix}1\\1\end{bmatrix}, \begin{bmatrix}1/2\\-1/2\end{bmatrix}, \begin{bmatrix}0\\0\end{bmatrix} \xrightarrow{\text{normalize}} \begin{bmatrix}1/\sqrt{2}\\1/\sqrt{2}\end{bmatrix}, \begin{bmatrix}1/\sqrt{2}\\-1/\sqrt{2}\end{bmatrix}
\]
We might have simply recognized what the second vector required for the orthonormal basis is. In any case, we get $Q_1$ using this basis as columns. Then we find $S_1 = Q_1^HA_1Q_1$.
\[
Q_1 = \begin{bmatrix}1/\sqrt{2} & 1/\sqrt{2}\\1/\sqrt{2} & -1/\sqrt{2}\end{bmatrix} \qquad
S_1 = \begin{bmatrix}1/\sqrt{2} & 1/\sqrt{2}\\1/\sqrt{2} & -1/\sqrt{2}\end{bmatrix}\begin{bmatrix}3 & 1\\-5 & 9\end{bmatrix}\begin{bmatrix}1/\sqrt{2} & 1/\sqrt{2}\\1/\sqrt{2} & -1/\sqrt{2}\end{bmatrix} = \begin{bmatrix}4 & -6\\0 & 8\end{bmatrix}
\]
At this point we would define $A_2$ to be the matrix $S_1$ with the first row and column removed, but we don't need to since $S_1$ is already triangular.

We define $P_1 = Q_1$ since it is already the right size, and then $P = P_1$, and then $S = P^HAP$. But of course all that does nothing here since we started with a $2\times 2$ matrix. So in fact $P = P_1$ and $S = S_1$ as above, with $A = PSP^H$.
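A quick numeric check of this factorization (a sketch):

```python
import numpy as np

# 16.4 a): Q1^H A Q1 is upper triangular with the eigenvalues 4, 8 on the diagonal.
A = np.array([[3., 1.], [-5., 9.]])
Q1 = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
print(Q1.T @ A @ Q1)   # [[4, -6], [0, 8]]
```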

d) We start by setting $A_1 = A$. Then we find an eigenvector and eigenvalue for $A_1$. Either by computing the characteristic polynomial, or by realizing we already know an eigenvector, we find:
\[
\mathbf{v}_1 = \begin{bmatrix}1\\0\\0\end{bmatrix} \qquad \lambda_1 = 1
\]
Then we need to find an orthonormal basis based on $\mathbf{v}_1$.
\[
\begin{bmatrix}1\\0\\0\end{bmatrix} \xrightarrow{\text{add basis}} \begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\end{bmatrix} \xrightarrow{\text{Gram--Schmidt}} \begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\end{bmatrix}
\]
Of course you didn't actually do any such Gram--Schmidt, right? This gives the matrix $Q_1$, from which we get $S_1 = Q_1^HA_1Q_1$.
\[
Q_1 = \begin{bmatrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\end{bmatrix} \qquad
S_1 = \begin{bmatrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\end{bmatrix}\begin{bmatrix}1 & i\sqrt{2} & \sqrt{2}\\0 & 3/2 & -\tfrac12 i\\0 & -\tfrac32 i & 7/2\end{bmatrix}\begin{bmatrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\end{bmatrix} = \begin{bmatrix}1 & i\sqrt{2} & \sqrt{2}\\0 & 3/2 & -\tfrac12 i\\0 & -\tfrac32 i & 7/2\end{bmatrix}
\]
Of course, we didn't really need to do anything up until now: the matrix $A_1$ already was $S_1$, because its first column was already "triangular".

In any case we now define $A_2$ to be $S_1$ with the first row and column removed, and repeat.
\[
A_2 = \begin{bmatrix}3/2 & -\tfrac12 i\\ -\tfrac32 i & 7/2\end{bmatrix}
\]
We find an eigenvector and eigenvalue. Details omitted, but you should be able to fill them in.
\[
\mathbf{v}_2 = \begin{bmatrix}-i\\3\end{bmatrix} \qquad \lambda_2 = 3
\]
Then we get an orthonormal basis involving $\mathbf{v}_2$.
\[
\begin{bmatrix}-i\\3\end{bmatrix} \xrightarrow{\text{add basis}} \begin{bmatrix}-i\\3\end{bmatrix}, \begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix} \xrightarrow{\text{Gram--Schmidt}} \begin{bmatrix}-i\\3\end{bmatrix}, \begin{bmatrix}3\\-i\end{bmatrix} \xrightarrow{\text{normalize}} \frac{1}{\sqrt{10}}\begin{bmatrix}-i\\3\end{bmatrix}, \frac{1}{\sqrt{10}}\begin{bmatrix}3\\-i\end{bmatrix}
\]
As columns, this gives $Q_2$. Then we get $S_2 = Q_2^HA_2Q_2$.
\[
Q_2 = \frac{1}{\sqrt{10}}\begin{bmatrix}-i & 3\\3 & -i\end{bmatrix} \qquad
S_2 = \frac{1}{\sqrt{10}}\begin{bmatrix}i & 3\\3 & i\end{bmatrix}\begin{bmatrix}3/2 & -\tfrac12 i\\-\tfrac32 i & 7/2\end{bmatrix}\frac{1}{\sqrt{10}}\begin{bmatrix}-i & 3\\3 & -i\end{bmatrix} = \begin{bmatrix}3 & -2i\\0 & 2\end{bmatrix}
\]
Now $S_2$ is triangular so we are done. We "upsize" the $Q_j$ matrices to $P_j$.
\[
P_1 = \begin{bmatrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\end{bmatrix} \qquad
P_2 = \begin{bmatrix}1 & 0 & 0\\0 & -i/\sqrt{10} & 3/\sqrt{10}\\0 & 3/\sqrt{10} & -i/\sqrt{10}\end{bmatrix}
\]
Then $P = P_1P_2$ and $S = P^HAP$.
\[
P = P_1P_2 = \begin{bmatrix}1 & 0 & 0\\0 & -i/\sqrt{10} & 3/\sqrt{10}\\0 & 3/\sqrt{10} & -i/\sqrt{10}\end{bmatrix} \qquad
S = P^HAP = \begin{bmatrix}1 & 4/\sqrt{5} & 2i/\sqrt{5}\\0 & 3 & -2i\\0 & 0 & 2\end{bmatrix}
\]
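And a numeric check of the final factorization (a sketch):

```python
import numpy as np

# 16.4 d): with P as above, P^H A P is upper triangular with diagonal 1, 3, 2.
s2, s10 = np.sqrt(2), np.sqrt(10)
A = np.array([[1, 1j*s2, s2],
              [0, 1.5, -0.5j],
              [0, -1.5j, 3.5]])
P = np.array([[1, 0, 0],
              [0, -1j/s10, 3/s10],
              [0, 3/s10, -1j/s10]])
print(np.round(P.conj().T @ A @ P, 12))
# [[1, 4/sqrt(5), 2i/sqrt(5)], [0, 3, -2i], [0, 0, 2]]
```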

17.2 The matrices $A^HA$ and $AA^H$ and $A$ (and $A^H$) all have the same rank $r$. Since $A^HA$ is $n \times n$ and $AA^H$ is $m \times m$, the multiplicity of 0 as an eigenvalue of $A^HA$ is $n - r$ and the multiplicity of 0 as an eigenvalue of $AA^H$ is $m - r$. If $m < n$ then $p(t) = t^{n-m}q(t)$, and if $m > n$ then $q(t) = t^{m-n}p(t)$.


17.8 a) Start with $B = A^HA = \begin{bmatrix}0 & -2 & 6\\0 & -6 & -7\end{bmatrix}\begin{bmatrix}0 & 0\\-2 & -6\\6 & -7\end{bmatrix} = \begin{bmatrix}40 & -30\\-30 & 85\end{bmatrix}$. We find the eigenvalues.
\[
\det(A^HA - \lambda I) = \det\begin{bmatrix}40-\lambda & -30\\-30 & 85-\lambda\end{bmatrix} = \lambda^2 - 125\lambda + 2500 = (\lambda-100)(\lambda-25)
\]
So the singular values of $A$ are 10 and 5.
\[
\Sigma = \begin{bmatrix}10 & 0\\0 & 5\\0 & 0\end{bmatrix}
\]
We find the eigenvectors of $B$.
\[
B - 100I = \begin{bmatrix}-60 & -30\\-30 & -15\end{bmatrix} \xrightarrow{\text{rowops}} \begin{bmatrix}2 & 1\\0 & 0\end{bmatrix} \xrightarrow{\text{eigenvector}} \begin{bmatrix}-1\\2\end{bmatrix} \xrightarrow{\text{normalize}} \begin{bmatrix}-1/\sqrt{5}\\2/\sqrt{5}\end{bmatrix}
\]
\[
B - 25I = \begin{bmatrix}15 & -30\\-30 & 60\end{bmatrix} \xrightarrow{\text{rowops}} \begin{bmatrix}1 & -2\\0 & 0\end{bmatrix} \xrightarrow{\text{eigenvector}} \begin{bmatrix}2\\1\end{bmatrix} \xrightarrow{\text{normalize}} \begin{bmatrix}2/\sqrt{5}\\1/\sqrt{5}\end{bmatrix}
\]
The normalized vectors are the columns of $V$.
\[
V = \frac{1}{\sqrt{5}}\begin{bmatrix}-1 & 2\\2 & 1\end{bmatrix}
\]
Now we find some of the columns of $U$ from the columns of $V$; specifically, one column for each non-zero singular value.
\[
\mathbf{u}_1 = \frac{1}{\sigma_1}A\mathbf{v}_1 = \frac{1}{10}\begin{bmatrix}0 & 0\\-2 & -6\\6 & -7\end{bmatrix}\begin{bmatrix}-1/\sqrt{5}\\2/\sqrt{5}\end{bmatrix} = \frac{1}{\sqrt{5}}\begin{bmatrix}0\\-1\\-2\end{bmatrix}
\]
\[
\mathbf{u}_2 = \frac{1}{\sigma_2}A\mathbf{v}_2 = \frac{1}{5}\begin{bmatrix}0 & 0\\-2 & -6\\6 & -7\end{bmatrix}\begin{bmatrix}2/\sqrt{5}\\1/\sqrt{5}\end{bmatrix} = \frac{1}{\sqrt{5}}\begin{bmatrix}0\\-2\\1\end{bmatrix}
\]
The remaining columns of $U$ (namely, $\mathbf{u}_3$) form a basis for the null space of $A^H$.
\[
A^H = \begin{bmatrix}0 & -2 & 6\\0 & -6 & -7\end{bmatrix} \xrightarrow{\text{rowops}} \begin{bmatrix}0 & 1 & 0\\0 & 0 & 1\end{bmatrix} \xrightarrow{\text{null space}} \begin{bmatrix}1\\0\\0\end{bmatrix}
\]
Note that we didn't really need to row-reduce to find the null space. We could have noticed that this vector is in the null space, and we know the dimension of the null space is one, so this must be it.
\[
U = \begin{bmatrix}0 & 0 & 1\\-1/\sqrt{5} & -2/\sqrt{5} & 0\\-2/\sqrt{5} & 1/\sqrt{5} & 0\end{bmatrix}
\]

b) The best rank $k$ approximation is obtained by keeping only the top $k$ principal components. There are two ways to see this. Either we can zero out all but the top $k$ ($k = 1$ here) singular values (note the "$\Sigma$" matrix)
\[
A_1 = \begin{bmatrix}0 & 0 & 1\\-1/\sqrt{5} & -2/\sqrt{5} & 0\\-2/\sqrt{5} & 1/\sqrt{5} & 0\end{bmatrix}\begin{bmatrix}10 & 0\\0 & 0\\0 & 0\end{bmatrix}\begin{bmatrix}-1/\sqrt{5} & 2/\sqrt{5}\\2/\sqrt{5} & 1/\sqrt{5}\end{bmatrix}
\]


Or we can expand the above using our old friend Proposition 1.4.
\[
A_1 = \sigma_1\mathbf{u}_1\mathbf{v}_1^H = 10\begin{bmatrix}0\\-1/\sqrt{5}\\-2/\sqrt{5}\end{bmatrix}\begin{bmatrix}-1/\sqrt{5} & 2/\sqrt{5}\end{bmatrix}
\]
These are of course exactly the same thing.
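numpy reproduces the singular values and the rank-1 truncation (a sketch; numpy may negate the singular vectors, which doesn't change $\sigma_1\mathbf{u}_1\mathbf{v}_1^H$):

```python
import numpy as np

# 17.8: singular values of A are 10 and 5; best rank-1 approximation.
A = np.array([[0., 0.], [-2., -6.], [6., -7.]])
U, s, Vh = np.linalg.svd(A)
print(s)                                  # [10. 5.]
A1 = s[0] * np.outer(U[:, 0], Vh[0, :])   # sigma_1 u_1 v_1^H
print(A1)
print(np.linalg.matrix_rank(A1))          # 1
```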


index of exercises for which there are solutions

1.1, 1.3, 1.4, 1.5, 1.6, 1.13, 1.14, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.4, 3.9, 3.10, 3.11, 4.1, 4.2, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 4.11, 4.12, 5.1, 5.2, 5.3, 5.4, 5.6, 5.7, 5.10, 6.1, 6.2, 6.4, 6.6, 6.7, 6.9, 7.1, 7.2, 7.4, 7.10, 8.1, 8.2, 8.3, 8.5, 9.1, 9.2, 9.3, 10.1, 10.2, 10.10, 10.11, 11.1, 11.2, 11.3, 11.7, 12.1, 12.2, 12.3, 12.5, 12.7, 12.8, 12.9, 13.6, 13.7, 13.8, 14.2, 14.3, 14.4, 15.2, 15.6, 15.7, 15.9, 16.1, 16.4, 17.2, 17.8
