
MATH 2080 Further Linear Algebra

Jonathan R. Partington, University of Leeds, School of Mathematics

December 8, 2010

LECTURE 1

Books:

S. Lipschutz – Schaum’s outline of linear algebra
S.I. Grossman – Elementary linear algebra

1 Vector spaces and subspaces

Vector spaces have two built-in concepts.
1. Vectors – can be added or subtracted. Usually written u, v, w, etc.
2. Scalars – can be added, subtracted, multiplied or divided (not by 0). Usually written a, b, c, etc.

Key example

Rn, the space of n-tuples of real numbers, u = (u1, . . . , un).
If u = (u1, . . . , un) and v = (v1, . . . , vn), then u + v = (u1 + v1, . . . , un + vn).
Also if a ∈ R, then au = (au1, . . . , aun).

1.1 The obvious properties of the vector space Rn

(1) Vector addition satisfies:

• For all u, v, we have u + v = v + u, (commutative rule).

• For all u, v, w, we have (u + v) + w = u + (v + w), (associative rule).

• There is a zero vector 0 with u + 0 = 0 + u = u for all u.

• For all u there is an inverse vector −u with u + (−u) = 0 = (−u) + u.

(2) Scalar multiplication satisfies:

• a(u + v) = au + av and


• (a + b)u = au + bu, (these are the distributive laws).

• (ab)u = a(bu), (associativity of scalar multiplication).

• 1u = u, (identity property).

Now we look for other objects with the same properties.

Note our vectors were in Rn, our scalars in R. Instead of R we can use any set in which we have all the usual rules of arithmetic (a field).

Examples

Q – rational numbers (fractions) a/b, where a, b are integers and b ≠ 0.
C – complex numbers.
A new one: F2 – the field of two elements, denoted 0 and 1, with the usual rules of addition and multiplication except that 1 + 1 = 0 (i.e., addition mod 2). So (−1) is the same as 1. This is used in coding theory, geometry, algebra, computer science, etc.
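The arithmetic of F2 is small enough to check directly; the following Python snippet is a minimal sketch (an illustration added to these notes, not part of the original):

```python
# A minimal sketch of arithmetic in F2: addition and multiplication mod 2.
def f2_add(a, b):
    return (a + b) % 2

def f2_mul(a, b):
    return (a * b) % 2

# 1 + 1 = 0, so the additive inverse of 1 is 1 itself: (-1) "is" 1 in F2.
assert f2_add(1, 1) == 0
assert f2_add(0, 1) == 1
assert f2_mul(1, 1) == 1
```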

1.2 Definition of a vector space

A vector space V over a field F (which in this module can be Q, R, C or F2) is a set V on which operations of vector addition u + v ∈ V and scalar multiplication au ∈ V have been defined, satisfying the eight rules given in (1.1).

Examples

(a) V = F n, where F = Q, R, C or F2.
(b) V = Mm,n, all m × n matrices with entries in F.
(c) V = Pn, polynomials of degree at most n, i.e., p(t) = a0 + a1t + . . . + antn, with a0, a1, . . . , an ∈ F.
(d) V = F X. Let X be any set; then F X is the collection of functions from X into F. Define (f + g)(x) = f(x) + g(x) and (af)(x) = af(x), for f, g ∈ V and a ∈ F.

1.3 Other properties of a vector space

We can deduce the following from the axioms in (1.1):
a0 = 0, for a ∈ F and 0 ∈ V.
0v = 0, for 0 ∈ F and v ∈ V.
If av = 0 then either a = 0 or v = 0.
(−1)v = −v, and in general (−a)v = −(av), for a ∈ F and v ∈ V.


The proofs are mostly omitted, but are short. For example, a0 = a(0 + 0) = a0 + a0. Add −(a0) to both sides and we get 0 = a0 + a0 + (−a0) = a0 + 0 = a0.

LECTURE 2

Subspaces

1.4 Definition

Let V be a vector space over a field F and W a subset of V. Then W is a subspace if it satisfies:
(i) 0 ∈ W.
(ii) For all v,w ∈ W we have v + w ∈ W.
(iii) For all a ∈ F and w ∈ W we have aw ∈ W.

That is, W contains 0 and is closed under the vector space operations. It’s easy to see that then W is also a vector space, i.e., satisfies the properties of (1.1). For example, −w = (−1)w ∈ W if w ∈ W.

1.5 Examples

(i) Every vector space V has two trivial subspaces, namely {0} and V.
(ii) Take any v ∈ V, not the zero vector. Then span{v} = {av : a ∈ F} is a subspace. For example, in R2 we get a line through the origin [DIAGRAM]. These are the only subspaces of R2 apart from the trivial ones.
(iii) In R3 we have the possibilities in (i) and (ii) above, but we also have planes through the origin, e.g.,

W = {(x, y, z) ∈ R3 : x − 2y + 3z = 0}.

The general solution is obtained by fixing y and z, and then x is uniquely determined, e.g., z = a, y = b and x = −3a + 2b. So

W = {(−3a + 2b, b, a) : a, b ∈ R}

= {a(−3, 0, 1) + b(2, 1, 0) : a, b ∈ R}.

So we can see W either as all vectors orthogonal to (1,−2, 3), or as all “linear combinations” of (−3, 0, 1) and (2, 1, 0) (two parameters).
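As a quick numerical check (a Python sketch added for illustration, not part of the original notes), the two descriptions of W agree: every combination of (−3, 0, 1) and (2, 1, 0) is orthogonal to (1, −2, 3).

```python
# Check: combinations of the two spanning vectors satisfy x - 2y + 3z = 0.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

normal = (1, -2, 3)
s1, s2 = (-3, 0, 1), (2, 1, 0)

assert dot(normal, s1) == 0 and dot(normal, s2) == 0  # both lie in the plane

# Hence any linear combination a*s1 + b*s2 also lies in the plane.
for a, b in [(1, 2), (-3, 5), (0, 0)]:
    w = tuple(a * x + b * y for x, y in zip(s1, s2))
    assert dot(normal, w) == 0
```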

1.6 Definition

Given a set S of vectors in V, the smallest subspace of V containing S is written W = span(S) or lin(S), and called the linear span of S.


It consists of all linear combinations a1s1 + a2s2 + . . . + ansn, where a1, . . . , an ∈ F and s1, . . . , sn ∈ S. It includes 0, the “empty combination”.
Note that all these combinations must lie in any subspace containing S, and if we add linear combinations or multiply by scalars, we still get a combination. So this is the smallest subspace containing S.

Example

In R2 the smallest subspace containing (1, 1) and (2, 3) is R2 itself, as we can write any (x, y) as a(1, 1) + b(2, 3), solving a + 2b = x and a + 3b = y (uniquely). Whereas, span{(1, 1), (2, 2)} is just span{(1, 1)} again.
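Solving a + 2b = x and a + 3b = y explicitly gives b = y − x and a = x − 2b; a small Python sketch (added for illustration) confirms this works for any (x, y):

```python
# Write (x, y) as a(1,1) + b(2,3) by solving a + 2b = x, a + 3b = y.
def coords(x, y):
    b = y - x        # subtract the first equation from the second
    a = x - 2 * b    # back-substitute
    return a, b

for x, y in [(7, -4), (0, 1), (2, 3)]:
    a, b = coords(x, y)
    assert (a + 2 * b, a + 3 * b) == (x, y)
```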

1.7 Proposition

Let V be a vector space over F, and let U and W be subspaces of V. Then U ∩ W is also a subspace of V.
Proof: (i) 0 ∈ U ∩ W, since 0 ∈ U and 0 ∈ W.
(ii) If u,v ∈ U ∩ W, then u + v ∈ U and u + v ∈ W, since each of u and v are, so u + v ∈ U ∩ W.
(iii) Similarly if a ∈ F and u ∈ U ∩ W, then au ∈ U and au ∈ W, so au ∈ U ∩ W.

However, U ∪ W doesn’t need to be a subspace. For example, in R2, take U = {(x, 0) : x ∈ R} and W = {(0, y) : y ∈ R}. [DIAGRAM]
Then (1, 0) ∈ U ∪ W and (0, 1) ∈ U ∪ W, but their sum is (1, 1) ∉ U ∪ W.

LECTURE 3

Sums of subspaces

1.8 Definition

Let V be a vector space over a field F and U, W subspaces of V. Then
U + W = {u + w : u ∈ U, w ∈ W}.

1.9 Proposition

U +W is a subspace of V , and is the smallest subspace containing both U and W .

Proof: (i) 0 = 0 + 0 ∈ U + W as 0 ∈ U and 0 ∈ W.
(ii) If v1 = u1 + w1 and v2 = u2 + w2 are in U + W, then

v1 + v2 = (u1 + u2) + (w1 + w2) ∈ U + W,

since u1 + u2 ∈ U and w1 + w2 ∈ W.

(iii) If v = u + w ∈ U + W and a ∈ F, then

av = au + aw ∈ U + W,

since au ∈ U and aw ∈ W.

Every u ∈ U can be written as u = u + 0, with u ∈ U and 0 ∈ W, so u ∈ U + W and U + W contains U (and W similarly). But any subspace containing U and W contains all vectors u + w, so U + W is the smallest one.

Example

In R3 let U = {a(1, 0, 0) : a ∈ R}, W = {b(0, 1, 0) : b ∈ R}, and T = {(c, d,−c) : c, d ∈ R}.

Now U + W = {a(1, 0, 0) + b(0, 1, 0) : a, b ∈ R} = {(a, b, 0) : a, b ∈ R}.

Whereas U + T = R3, since, given (x, y, z) ∈ R3, we want to write (x, y, z) = u + t = (a, 0, 0) + (c, d,−c), i.e., to solve x = a + c, y = d and z = −c for a, c and d. We can if c = −z, d = y and a = x + z.

Also, W + T = T since W ⊂ T, so any vector w + t is already in T and we get nothing else.

1.10 Definition

In a vector space V with subspaces U and W, we say that U + W is a direct sum, written U ⊕ W, if U ∩ W = {0}.

In particular, U ⊕ W = V means U + W = V and U ∩ W = {0}.

Examples

As above, U ∩ W = {0}, since if (a, 0, 0) = (0, b, 0), then they are both 0.So U ⊕ W = {(a, b, 0) : a, b ∈ R}.

U ∩ T = {(0, 0, 0)}, as if (a, 0, 0) = (c, d,−c) then c = d = 0. So U ⊕ T = R3.

W ∩ T consists of all vectors (0, b, 0) = (c, d,−c), for some b, c, d, which is all vectors (0, b, 0), or W again.
So W + T is not a direct sum, and the notation W ⊕ T is incorrect here.

1.11 Proposition


V = U ⊕ W if and only if for each v ∈ V there are unique u ∈ U and w ∈ W with v = u + w.

Proof:

“⇒” The u and w are unique, since if u1 + w1 = u2 + w2 then

u1 − u2 = w2 − w1,

where the left-hand side is in U and the right-hand side in W; since U ∩ W = {0}, we have u1 = u2 and w1 = w2.

“⇐” If v ∈ U ∩ W, then

v = v + 0 = 0 + v

gives two decompositions (v ∈ U, 0 ∈ W, and 0 ∈ U, v ∈ W), and by uniqueness, v = 0. So it’s a direct sum. □

In our example, since U ⊕ T = R3, we can write any vector v in R3 uniquely as v = u + t, with u ∈ U and t ∈ T. For example, let’s take v = (5, 6, 7). Then

(5, 6, 7) = (a, 0, 0) + (c, d,−c) = (a + c, d,−c)

gives a = 12, d = 6 and c = −7, i.e.,

(5, 6, 7) = (12, 0, 0) + (−7, 6, 7).
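The decomposition can be checked mechanically; here is a small Python sketch (an illustration, not from the notes) of the formulas a = x + z, d = y, c = −z found above:

```python
# Decompose v in R3 uniquely as u + t, with u in U = {(a, 0, 0)} and
# t in T = {(c, d, -c)}, using a = x + z, d = y, c = -z.
def decompose(v):
    x, y, z = v
    return (x + z, 0, 0), (-z, y, z)

u, t = decompose((5, 6, 7))
assert u == (12, 0, 0) and t == (-7, 6, 7)
assert tuple(p + q for p, q in zip(u, t)) == (5, 6, 7)
```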

LECTURE 4

2 Linear dependence, spanning and bases

2.1 Definition

Let V be a vector space over a field F. Then a vector v ∈ V is a linear combination of vectors v1, . . . ,vn in V if we can write v = a1v1 + . . . + anvn for some a1, . . . , an ∈ F.

2.2 Definition

A set of vectors S = {v1, . . . ,vn} is linearly independent if the only solution to a1v1 + . . . + anvn = 0 is a1 = a2 = . . . = an = 0.

This is the same as saying that we can’t express any vector in S as a linear combination of the others.


2.3 Examples:

1) In R3, the vectors v1 = (1, 0, 0), v2 = (0, 1, 0) and v3 = (0, 0, 1) are independent, since a1v1 + a2v2 + a3v3 = (a1, a2, a3) = 0 only if a1 = a2 = a3 = 0.

2) In R3, the vectors v1 = (1, 0, 2), v2 = (1, 1, 0) and v3 = (−1,−2, 2) are linearly dependent, since v1 − 2v2 − v3 = 0. We can write any vector in terms of the others, e.g. v3 = v1 − 2v2.

2.4 Definition

A set {v1, . . . ,vn} spans V if every v ∈ V can be written as v = a1v1 + . . . + anvn for some a1, . . . , an ∈ F.

2.5 Examples

1) (1, 0, 0), (0, 1, 0) and (0, 0, 1) span R3.

2) See after (1.6). The set {(1, 1), (2, 3)} spans R2 (the set of linear combinations is all of R2), whereas {(1, 1), (2, 2)} doesn’t.

2.6 Definition

If a set {v1, . . . ,vn} spans V and is linearly independent, then it is called a basisof V .

2.7 Proposition

{v1, . . . ,vn} is a basis of V if and only if every v ∈ V can be written as a unique linear combination v = a1v1 + . . . + anvn.

Proof:

Suppose that it is a basis. If there were two such ways of writing v = a1v1 + . . . + anvn = b1v1 + . . . + bnvn, then 0 = (a1 − b1)v1 + . . . + (an − bn)vn, and by linear independence we get a1 − b1 = 0, . . . , an − bn = 0, which is uniqueness.

Conversely, if we always have uniqueness, we need to show the vectors are independent. But if a1v1 + . . . + anvn = 0, we know already that 0v1 + . . . + 0vn = 0, and so by uniqueness, a1 = . . . = an = 0, as required.

2.8 Examples


1) Clearly {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is a basis of R3.

2) Let’s check whether {(1, 1, 1), (1, 1, 0), (1, 0, 0)} is a basis of R3. We need to solve x(1, 1, 1) + y(1, 1, 0) + z(1, 0, 0) = (a, b, c) for any given a, b, c. That is

x + y + z = a
x + y = b
x = c,

with solution x = c, y = b − c, z = a − b (solve from the bottom upwards). This is unique, so it’s a basis.

3) Now try {(1, 1, 2), (1, 2, 0), (3, 4, 4)} in R3. We solve x(1, 1, 2) + y(1, 2, 0) + z(3, 4, 4) = (a, b, c), i.e.,

x + y + 3z = a
x + 2y + 4z = b
2x + 4z = c.

Row-reduce:

[ 1 1 3 | a ]
[ 1 2 4 | b ]
[ 2 0 4 | c ]

R2 − R1, R3 − 2R1:

[ 1 1 3 | a ]
[ 0 1 1 | b − a ]
[ 0 −2 −2 | c − 2a ]

R3 + 2R2:

[ 1 1 3 | a ]
[ 0 1 1 | b − a ]
[ 0 0 0 | c + 2b − 4a ],

which is equivalent to

x + y + 3z = a
y + z = b − a
0 = c + 2b − 4a,

so we don’t always get a solution – we only do if c + 2b − 4a = 0, and the solution is not unique when it exists.
Indeed, 2(1, 1, 2) + (1, 2, 0) − (3, 4, 4) = 0.

LECTURE 5

We are aiming to show that all bases of a vector space have the same number of elements.

2.9 Exchange Lemma


Let V be a vector space over a field F, and suppose that {u1, . . . ,un} spans V. Let v ∈ V with v ≠ 0. Then we can replace uj by v, for some j with 1 ≤ j ≤ n, so that the new set still spans V.

Proof: Since {u1, . . . ,un} spans, we can write v = a1u1 + . . . + anun, and since v ≠ 0 there is at least one non-zero aj. Choose one. We have

uj = (1/aj)(v − (a1u1 + . . . + anun) + ajuj),   (∗)

which is a linear combination of v and all the u1, . . . ,un except uj.
Now any w = b1u1 + . . . + bnun can be written using (∗) as a linear combination that uses v but not uj. So the new set spans.

2.10 Theorem

Let V be a vector space, let {v1, . . . ,vk} be an independent set in V, and let {u1, . . . ,un} be a spanning set. Then n ≥ k and we can delete k of the u1, . . . ,un, replacing them by v1, . . . ,vk, so that the new set spans.

Proof: [Non-examinable: only a sketch given in lectures.]
We’ll apply (2.9) repeatedly. Since {v1, . . . ,vk} is an independent set, none of the vectors are 0. So, after relabelling the spanning set if necessary, we can assume that {v1,u2, . . . ,un} spans.

So v2 = a1v1 + a2u2 + . . . + anun for some a1, . . . , an. We can’t have a2 = . . . = an = 0, as then v2 = a1v1, which contradicts independence. Without loss of generality, by relabelling, we can suppose a2 ≠ 0. Then exchange u2 for v2 to get {v1,v2,u3, . . . ,un} spanning.

Continue. Finally {v1, . . . ,vk,uk+1, . . . ,un} spans and k ≤ n. □

2.11 Example

Take V = R3. Then u1 = (1, 0, 0), u2 = (0, 1, 0), u3 = (0, 0, 1) span, and v1 = (1, 1, 0) and v2 = (1, 2, 0) are independent.

Now, v1 = (1, 1, 0) = 1u1 + 1u2 + 0u3, so we can replace either u1 or u2 by v1. Let’s replace u2. Then {v1,u1,u3} spans V.

So v2 = (1, 2, 0) = a(1, 1, 0) + b(1, 0, 0) + c(0, 0, 1). Solving we get a = 2, b = −1 and c = 0. That is, v2 = 2v1 − u1. This means we can replace u1 by v2, and then {v1,v2,u3} spans V.


2.12 Theorem

Let V be a vector space and let {v1, . . . ,vk} and {u1, . . . ,un} be bases of V. Then k = n.

Proof: {v1, . . . ,vk} are independent and {u1, . . . ,un} span, so k ≤ n, by (2.10).
{u1, . . . ,un} are independent and {v1, . . . ,vk} span, so n ≤ k, by (2.10).
So n = k.

2.13 Definition

A vector space V has dimension n if it has a basis with exactly n elements (here n ∈ {1, 2, 3, . . .}). It has dimension 0 only if V = {0}.
We call V finite-dimensional in this case, and write dim V = n, where n ∈ {0, 1, 2, . . .}.

2.14 Examples

(i) F n has dimension n over F, since it has basis {(1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1)}, the standard basis.

(ii) Pn (polynomials of degree ≤ n) has basis {1, t, t2, . . . , tn}, so has dimension n + 1.

(iii) Cn is a vector space over R with dimension 2n. A basis is {(1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1), (i, 0, . . . , 0), (0, i, 0, . . . , 0), . . . , (0, . . . , 0, i)}.
This is not a basis when we use C as our scalars, since it is then no longer independent.

2.15 Theorem

Let V be a vector space of dimension n. Then any independent set has ≤ n elements, and, if it has exactly n, then it is a basis.
Any spanning set has ≥ n elements, and if it has exactly n, then it is a basis.

LECTURE 6

Proof: Let {v1, . . . ,vn} be a basis of V (i.e., spanning and independent). Let {u1, . . . ,uk} be an independent set. By (2.10), we have k ≤ n. We can replace k of the v’s by u’s, so that it spans. So if k = n, then {u1, . . . ,un} spans and hence it is a basis.


Now let {w1, . . . ,wm} be a spanning set. Then (2.10) tells us that m ≥ n. Suppose now that m = n. If {w1, . . . ,wm} is not independent, then a1w1 + . . . + amwm = 0, where at least one ai ≠ 0. But then wi is a linear combination of the others, so we can delete it and the set of n − 1 remaining vectors still spans. This is a contradiction, since a spanning set must always be at least as big as any independent set, by (2.10).

2.16 Examples

In R3, the set {(1, 2, 3), (0, 1, 0)} is independent, but can’t be a basis, as it doesn’t have enough elements to span.

Also, {(1, 2, 3), (4, 5, 6), (0, 1, 0), (0, 0, 1)} spans, but can’t be a basis, as it has too many elements and so is not independent.

2.17 Theorem

Let V be an n-dimensional vector space and W a subspace of V. Then W has finite dimension, and any basis of W can be extended to a basis for V by adding in more elements. So if W ≠ V, then dim W < dim V.

Proof:

The cases W = {0} and W = V are easy, so suppose W is neither. Let {w1, . . . ,wk} be an independent set in W chosen to have as many elements as possible. (At most n, since dim V = n.) We claim it’s a basis for W.
For any w ∈ W, the set {w,w1, . . . ,wk} is not independent, so a1w1 + . . . + akwk + bw = 0, say, with not all the coefficients being zero. But b can’t be 0, since {w1, . . . ,wk} is independent. So we can write w as a combination of w1, . . . ,wk, and so they span W, and hence form a basis for it.

Now let {v1, . . . ,vn} be a basis for V. By (2.10) we can replace k of the v’s by w’s and it still spans V. This is the same as extending {w1, . . . ,wk} to a basis of n elements, since any n-element spanning set for V is a basis, by (2.15).

2.18 Example

Let W = {(x, y, z) ∈ R3 : x − 2y + 4z = 0}, with general solution z = a, y = b, x = 2b − 4a, i.e., (2b − 4a, b, a) = a(−4, 0, 1) + b(2, 1, 0).

We can extend it to a basis for R3 by adding in something chosen from the basis{(1, 0, 0), (0, 1, 0), (0, 0, 1)}. Indeed, (1, 0, 0) isn’t in the subspace, so that will do.


Thus {(−4, 0, 1), (2, 1, 0), (1, 0, 0)} is a basis of R3 containing the basis for W .

Recall from (1.6) that if S = {v1, . . . ,vk} is a set of vectors in V, then span{v1, . . . ,vk} consists of all linear combinations a1v1 + . . . + akvk, and is the smallest subspace containing S.

2.19 Theorem

Let V be a finite-dimensional vector space and U a subspace of V. Then there is a subspace W such that V = U ⊕ W.

We call W a complement for U.
Proof: If U = V, take W = {0}, and if U = {0}, take W = V.

Otherwise, U has a basis {u1, . . . ,uk}, which can be extended to a basis {u1, . . . ,uk,wk+1, . . . ,wn} of V. Let W = span{wk+1, . . . ,wn}. We claim that V = U ⊕ W.

If v ∈ V, we can write

v = (a1u1 + . . . + akuk) + (ak+1wk+1 + . . . + anwn),

where the first bracket is in U and the second in W, for some a1, . . . , an ∈ F. So V = U + W.

Also U ∩ W = {0} (why?) so we have uniqueness, i.e., a direct sum V = U ⊕ W. □

LECTURE 7

2.20 Example

Let V = R3 and U = {a(1, 2, 3) + b(1, 0, 6) : a, b ∈ R}, a plane through 0.
For W we can take any line through 0 not lying in the plane, e.g. the x-axis {c(1, 0, 0) : c ∈ R}. The complement is not unique.

3 Linear mappings

3.1 Definition

Let U, V be vector spaces over F. Then a mapping T : U → V is called a linear mapping, or linear transformation, if:


(i) T (u1 + u2) = T (u1) + T (u2) for all u1,u2 ∈ U ;

(ii) T (au) = aT (u) for all a ∈ F and u ∈ U .

3.2 Examples

(i) Let A be an m × n matrix of real numbers. Then we define T : Rn → Rm by y = T(x) = Ax, i.e.,

[ y1 ]   [ a11 . . . a1n ] [ x1 ]
[ ⋮  ] = [ ⋮         ⋮  ] [ ⋮  ]
[ ym ]   [ am1 . . . amn ] [ xn ]

(m × 1)   (m × n)   (n × 1)

This is linear, and for m = n = 3 it includes rotations and reflections.

(ii) The identity mapping on any vector space.

(iii) D : Pn → Pn−1, with Dp = dp/dt.

(iv) T : Pn → F, with Tp = ∫₀¹ p(t) dt.

We shall see that for finite-dimensional vector spaces, all linear mappings can be represented by matrices, once we have chosen bases for the spaces U and V involved.

3.3 Definition

Let T : U → V be a linear mapping between vector spaces.
The null-space, or kernel, of T is ker T = {u ∈ U : T(u) = 0}, and is a subset of U.
The image, or range, of T is im T = T(U) = {T(u) : u ∈ U}, and is a subset of V.

Example. Take T : R3 → R4, defined by T(x, y, z) = (x, y, x + y, x − y) (which is linear). Then

ker T = {(x, y, z) ∈ R3 : x = y = x + y = x − y = 0} = {(0, 0, z) : z ∈ R},

and

im T = {(x, y, x + y, x− y) : x, y ∈ R} = {x(1, 0, 1, 1) + y(0, 1, 1,−1) : x, y ∈ R}.

3.4 Proposition


Let T : U → V be a linear mapping between vector spaces. Then ker T is a subspace of U and im T is a subspace of V.

Proof: (i) Start with ker T. Note that T(0) = T(0 + 0) = T(0) + T(0), by linearity, and this shows that T(0) = 0. So 0 ∈ ker T.
If u1,u2 ∈ ker T, then T(u1 + u2) = T(u1) + T(u2) (linearity), which equals 0 + 0, or 0. So u1 + u2 ∈ ker T.
If u ∈ ker T and a ∈ F, then T(au) = aT(u) (linearity), which equals a0, or 0. So au ∈ ker T.
Hence ker T is a subspace of U.

(ii) Since T(0) = 0, we also have 0 ∈ im T.
If v1,v2 ∈ im T, then there exist u1,u2 ∈ U such that v1 = T(u1) and v2 = T(u2). Then v1 + v2 = T(u1) + T(u2) = T(u1 + u2), so it lies in im T.
Likewise, if v ∈ im T, then there exists u ∈ U such that v = T(u), and then av = aT(u) = T(au), so it lies in im T.
Hence im T is a subspace of V. □

3.5 Definition

For T : U → V linear, the nullity of T is dim(ker T), written n(T).
The rank of T is dim(im T), written r(T).

In the example of T : R3 → R4 we have n(T ) = 1 and r(T ) = 2.

LECTURE 8

3.6 Theorem

Let U, V be vector spaces over F and T : U → V linear. If U is finite-dimensional, then r(T) + n(T) = dim U.

Proof: If U = {0}, this is clear, so assume dim U ≥ 1.
Choose a basis {w1, . . . ,wk} for ker T and extend it to a basis S = {w1, . . . ,wk,uk+1, . . . ,un} for U.
(If k = n already, then T is the zero map, and the result is clear.)
We claim that {T(uk+1), . . . , T(un)} is a basis for im T.

Independence. If ak+1T(uk+1) + . . . + anT(un) = 0, then ak+1uk+1 + . . . + anun ∈ ker T.
So ak+1uk+1 + . . . + anun = a1w1 + . . . + akwk for some a1, . . . , ak ∈ F. This gives a linear relation between elements of S, and so, since S is independent, we conclude that ak+1 = . . . = an = 0.


Spanning. If v ∈ im T, then v = Tu for some u ∈ U, and we can find b1, . . . , bn ∈ F such that u = b1w1 + . . . + bkwk + bk+1uk+1 + . . . + bnun, using our basis for U.
Apply T, and we get v = bk+1T(uk+1) + . . . + bnT(un), since the T(wi) are all 0. So the set spans im T.

Now r(T) = n − k and n(T) = k, and indeed r(T) + n(T) = n = dim U. □

For example, if T : R4 → R2 is defined by T(x, y, z, w) = (x + y, 3x + 3y), then ker T is all solutions to x + y = 3x + 3y = 0, i.e., parametrised by (a,−a, b, c), and n(T) = 3.
Likewise, im T is parametrised as (d, 3d), and r(T) = 1.
Then r(T) + n(T) = 4 = dim R4.
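The same counting can be sketched in Python (an illustration, not part of the original notes):

```python
# Rank-nullity check for T(x, y, z, w) = (x + y, 3x + 3y) on R4.
def T(x, y, z, w):
    return (x + y, 3 * x + 3 * y)

# The three kernel parameters (a, -a, b, c) give n(T) = 3:
for v in [(1, -1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]:
    assert T(*v) == (0, 0)

# Every image vector has the form (d, 3d), so r(T) = 1:
out = T(2, 5, -1, 4)
assert out[1] == 3 * out[0]

# r(T) + n(T) = 1 + 3 = 4 = dim R4, as the theorem predicts.
assert 1 + 3 == 4
```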

4 Linear mappings and matrices

4.1 Definition

Let v1, . . . ,vn be a basis of a vector space V and let v = a1v1 + . . . + anvn. We call a1, . . . , an the coordinates of v with respect to v1, . . . ,vn.

4.2 Example

If e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, 0, . . . , 0, 1) is the standard basis of Rn, then x = (x1, . . . , xn) has coordinates x1, . . . , xn.

If v1 = (1, 2) and v2 = (−2, 1) are given as a basis of R2, then x = (x, y) = a(1, 2) + b(−2, 1) implies that a − 2b = x and 2a + b = y, so that

a = (x + 2y)/5 and b = (−2x + y)/5

are the coordinates of x with respect to v1,v2.

4.3 Definition (based on (3.2))

Let U and V be finite-dimensional vector spaces over a field F, with bases u1, . . . ,un and v1, . . . ,vm, and let T : U → V be a linear mapping.
T is represented by an m × n matrix A with respect to the given bases if, whenever x ∈ U has coordinates x1, . . . , xn, then T(x) ∈ V has coordinates y1, . . . , ym, where

[ y1 ]     [ x1 ]
[ ⋮  ] = A [ ⋮  ]
[ ym ]     [ xn ]

(m × 1)  (m × n)  (n × 1).


Note yi = ai1x1 + . . . + ainxn for i = 1, . . . , m, where A has entries (aij).

So clearly T : Rn → Rm given by left multiplication by A is represented by A if we use the standard bases.

4.4 Proposition

Let U, V, u1, . . . ,un, v1, . . . ,vm be as in (4.3). Every map T such that the coordinates y1, . . . , ym of T(x) with respect to the v’s are given by

[ y1 ]     [ x1 ]
[ ⋮  ] = A [ ⋮  ]
[ ym ]     [ xn ]

(where x has coordinates x1, . . . , xn with respect to the u’s) is linear.

Proof: Suppose that x has coordinates x1, . . . , xn and x′ has coordinates x′1, . . . , x′n. Then the coordinates of T(x + x′) are obtained by applying A to the column of the xi + x′i, which equals A applied to the column of the xi plus A applied to the column of the x′i: these are the coordinates of T(x) added to the coordinates of T(x′). So T(x + x′) = T(x) + T(x′).

Similarly, T(ax) = aT(x) by looking at coordinates. □

How do we find the matrix of T : U → V if we are given bases u1, . . . ,un and v1, . . . ,vm?

Note that T(u1) ∈ V, so it can be written as a combination of the v’s. Indeed, to find that combination, if there is a matrix A representing T, we must have

A (1, 0, . . . , 0)ᵀ = (b1, b2, . . . , bm)ᵀ

if T(u1) = b1v1 + . . . + bmvm. That is, (b1, b2, . . . , bm)ᵀ is the first column of A. Similarly,

A (0, 1, 0, . . . , 0)ᵀ = (c1, c2, . . . , cm)ᵀ

if T(u2) = c1v1 + . . . + cmvm; and so on.

LECTURE 9

Example. Suppose that T : U → V, where U has basis {u1,u2} and V has basis {v1,v2,v3}, and that T(u1) = 3v1 + 4v2 + 5v3, while T(u2) = v1 + v2 + 9v3. Then we fill in the columns to get

    [ 3 1 ]
A = [ 4 1 ]
    [ 5 9 ],

and note that A(1, 0)ᵀ = (3, 4, 5)ᵀ, while A(0, 1)ᵀ = (1, 1, 9)ᵀ.

Note that the identity mapping I : U → U with I(u) = u corresponds to the identity matrix (1’s on the diagonal, 0’s elsewhere) of size dim U, written I, or In if it is n × n.

This gives us:

4.5 Proposition

The matrix A representing T with respect to u1, . . . ,un and v1, . . . ,vm is the one whose ith column is (a1i, a2i, . . . , ami)ᵀ, where

T(ui) = a1iv1 + a2iv2 + . . . + amivm.

Proof: For a typical vector x = x1u1 + . . . + xnun, we have, by linearity,

T(x) = x1T(u1) + . . . + xnT(un),

and substituting T(ui) = a1iv1 + . . . + amivm and collecting the coefficient of each vj gives T(x) = y1v1 + . . . + ymvm, where yj = aj1x1 + . . . + ajnxn, i.e., (y1, . . . , ym)ᵀ = A(x1, . . . , xn)ᵀ.

4.6 Example

Find the matrix of the linear mapping T : R3 → R2, with T(x, y, z) = (x + y + z, 2x + 2y + 2z),
(i) with respect to the standard bases of R3 and R2 (call it A);
(ii) with respect to the bases {(1, 0, 0), (1,−1, 0), (0, 1,−1)} of R3 and {(1, 2), (1, 0)} of R2 (call it B).

Solution. (i) T(1, 0, 0) = (1, 2), T(0, 1, 0) = (1, 2) and T(0, 0, 1) = (1, 2), so fill in columns to get

A = [ 1 1 1 ]
    [ 2 2 2 ].

(ii)
T(1, 0, 0) = (1, 2) = 1(1, 2) + 0(1, 0)
T(1,−1, 0) = (0, 0) = 0(1, 2) + 0(1, 0)
T(0, 1,−1) = (0, 0) = 0(1, 2) + 0(1, 0)

Filling in columns we get

B = [ 1 0 0 ]
    [ 0 0 0 ].

4.7 Theorem

Let T : U → V be a linear mapping between vector spaces, and suppose that dim U = n, dim V = m. Then we can find bases u1, . . . ,un of U and v1, . . . ,vm of V so that the matrix of T has the canonical form

A = [ Ir O ]
    [ O  O ],

where r = rank(T), Ir is the identity matrix of size r × r, and the rest is 0.


Proof: Recall that ker T has dimension n − r, the nullity of T, by (3.6).
So take a basis ur+1, . . . ,un of ker(T) and extend it to a basis u1, . . . ,un of U.
Let v1 = T(u1), . . . , vr = T(ur).
As in (3.6), v1, . . . ,vr is a basis of im T, and we can extend it to a basis v1, . . . ,vm of V.

Now T(u1) = v1, so the first column of the matrix will be (1, 0, . . . , 0)ᵀ; and so on, until T(ur) = vr, so the rth column will be (0, . . . , 0, 1, 0, . . . , 0)ᵀ, with the 1 in the rth row. Finally, T(ur+1) = . . . = T(un) = 0, as these vectors are in the kernel; so the remaining columns are all 0.

4.8 Another example

Take T : R2 → R2 with T(x, y) = (x − 2y, 2x − 4y). Now T(1, 0) = (1, 2) and T(0, 1) = (−2,−4), so the matrix is

[ 1 −2 ]
[ 2 −4 ]

with respect to the standard bases.

LECTURE 10

Now ker T is all multiples of (2, 1), so take u2 = (2, 1), and we can take u1 = (1, 0) so that {u1,u2} is a basis for R2.

Then T (u1) = (1, 2), which gives a basis for im T , and so we let v1 = (1, 2).Extend with v2 = (0, 1) (say) to another basis for R2.

Now T has the matrix

[ 1 0 ]
[ 0 0 ]

with respect to the bases {u1,u2} and {v1,v2}.

Also r(T ) = 1 and n(T ) = 1.

4.9 Proposition

Let A be an m × n matrix with real entries. Let T : Rn → Rm be defined by T(x) = Ax. Then
n(T) is the dimension of the solution space of the equations Ax = 0, and
r(T) is the dimension of the subspace of Rm spanned by the columns of A.

Proof:

The result on n(T ) is just by definition of the kernel.

For r(T), since

A(x1, . . . , xn)ᵀ = x1(a11, . . . , am1)ᵀ + . . . + xn(a1n, . . . , amn)ᵀ,

we see that im T is the span of the columns of A. □

4.10 Corollary

The row rank of a matrix (the number of independent rows) equals the column rank (the number of independent columns).

Proof: Define T : Rn → Rm by T(x) = Ax. By (4.9), the column rank of A is r(T), which is n − n(T). This is n − [dimension of the solution space of Ax = 0], i.e., n − [number of free parameters in the solution], which is the number of non-zero rows in the reduced form of A, which is the row rank of A.

For example,

A = [ 1 2 3 ]
    [ 2 4 6 ]

has row rank 1 and column rank 1. The solutions to Ax = 0 form a two-dimensional space.
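Row rank can be computed by the elimination procedure used throughout these notes; the following Python sketch (illustrative only, using exact arithmetic via the standard-library fractions module) checks that row rank equals column rank on this example:

```python
# Row rank via Gaussian elimination, with exact arithmetic.
from fractions import Fraction

def row_rank(M):
    M = [[Fraction(x) for x in row] for row in M]
    rank = 0
    for col in range(len(M[0])):
        # Find a pivot in this column at or below the current rank row.
        pivot = next((r for r in range(rank, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(len(M)):
            if r != rank and M[r][col] != 0:
                f = M[r][col] / M[rank][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

A = [[1, 2, 3], [2, 4, 6]]
At = [list(col) for col in zip(*A)]       # transpose: columns become rows
assert row_rank(A) == 1 == row_rank(At)   # row rank equals column rank
```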

Composition of mappings.

4.11 Proposition

Let U, V, W be vector spaces over F and let T : U → V and S : V → W be linearmappings. Then ST is a linear mapping from U to W .

Proof: Clearly ST (i.e., T followed by S) maps U into W.
Also ST(u1 + u2) = S(T(u1) + T(u2)), by linearity of T, and this is S(T(u1)) + S(T(u2)), by linearity of S.
Similarly, we see that ST(au) = S(aTu) = aS(T(u)), and so ST is linear.

4.12 Example


Let U = R2, V = R3 and W = R4. We define T(x1, x2) = (x1 + x2, x1, x2) and S(y1, y2, y3) = (y1 + y2, y1, y2, y3).
Then ST(x1, x2) = (2x1 + x2, x1 + x2, x1, x2).

4.13 Proposition

Let U, V, W be finite-dimensional vector spaces over F with bases {u1, . . . ,un}, {v1, . . . ,vm} and {w1, . . . ,wℓ}, and let T : U → V and S : V → W be linear mappings. Let T be represented by B and S by A with respect to the given bases. Then ST is represented by the matrix AB:

U —T→ V —S→ W
   B      A

Proof: T(uj) = b1jv1 + . . . + bmjvm and S(vi) = a1iw1 + . . . + aℓiwℓ, as in Proposition 4.5. So

ST(uj) = b1jS(v1) + . . . + bmjS(vm),

and the coefficient of wk in this is ak1b1j + . . . + akmbmj = (AB)kj. That is,

ST(uj) = (AB)1jw1 + . . . + (AB)ℓjwℓ.

This “explains” the rule for multiplying matrices.

4.14 Example 4.12 revisited

T has matrix

B = [ 1 1 ]
    [ 1 0 ]
    [ 0 1 ]

and S has matrix

A = [ 1 1 0 ]
    [ 1 0 0 ]
    [ 0 1 0 ]
    [ 0 0 1 ].

Then

AB = [ 2 1 ]
     [ 1 1 ]
     [ 1 0 ]
     [ 0 1 ],

the matrix of ST.

LECTURE 11

Isomorphisms


4.15 Definition

An isomorphism is a linear mapping T : U → V of vector spaces over the same field, for which there is an inverse mapping T−1 : V → U satisfying T−1T = IU and TT−1 = IV, where IU and IV are the identity mappings on U and V respectively.

4.16 Theorem

Two finite-dimensional vector spaces U and V are isomorphic if and only if dim U = dim V.

Proof: If T : U → V is an isomorphism, then every v ∈ V is in im T, since v = T(T−1(v)); so im T = V and dim U = r(T) + n(T) ≥ r(T) = dim V.

Similarly (look at T−1 : V → U, which is also an isomorphism), we have dim V ≥ dim U. So dim U = dim V, and hence r(T) = dim U = dim V.

Conversely, if dim U = dim V and {u1, . . . ,un} and {v1, . . . ,vn} are bases of U and V, then we can define an isomorphism by

T(a1u1 + . . . + anun) = a1v1 + . . . + anvn,

for all a1, . . . , an ∈ F, and clearly T−1(a1v1 + . . . + anvn) = a1u1 + . . . + anun. □

Example.

Let U = R2, and let V be the space of all real polynomials p of degree ≤ 2 such that p(1) = 0. Clearly dim U = 2. For V, we note that V = {a0 + a1t + a2t2 : a0 + a1 + a2 = 0}. Setting a0 = c and a1 = d we have a2 = −c − d, so V = {c + dt + (−c − d)t2 : c, d ∈ R} = {c(1 − t2) + d(t − t2) : c, d ∈ R}, with basis {1 − t2, t − t2}.
Hence dim V = 2, and there is an isomorphism between U and V defined by T(a, b) = T(a(1, 0) + b(0, 1)) = a(1 − t2) + b(t − t2).

4.17 Remark

If A represents T with respect to some given bases of U and V, then A−1 represents T−1, since if B is the matrix of T−1, we have:
BA = matrix of T−1T = In, and
AB = matrix of TT−1 = In, by (4.13).


5 Matrices and change of bases

The idea of this section is to choose bases for U and V so that T : U → V has the simplest possible matrix, namely

[ Ir O ]
[ O  O ],

as in (4.7). How is this related to the original matrix?

5.1 Proposition

Let V be an n-dimensional vector space over F. Let {v1, . . . ,vn} be a basis of V and {w1, . . . ,wn} any set of n vectors, not necessarily distinct, in V. Then
(i) There is a unique linear mapping S : V → V such that Svj = wj for each j.
(ii) There is a unique square matrix P representing S in the basis {v1, . . . ,vn}, such that wj = ∑_{i=1}^n pij vi for j = 1, . . . , n.
(iii) {w1, . . . ,wn} is a basis of V if and only if P is non-singular, i.e., invertible.

Proof: (i) Define S(∑_{j=1}^n xj vj) = ∑_{j=1}^n xj wj. This is clearly linear, and it is the only possibility.

(ii) We write wj = ∑_{i=1}^n pij vi using the basis {v1, . . . ,vn}. This determines the matrix P, which is the matrix of S, as in (4.5).

(iii) If {w1, . . . ,wn} is a basis of V, then there's a linear mapping T : V → V such that T(wj) = vj for each j. Now ST = TS = I (identity), so that the matrix R of T satisfies PR = RP = In, i.e., R = P⁻¹.

Conversely, if P is non-singular, then P⁻¹ represents a linear mapping T such that T(wj) = vj for each j. But if ∑_{j=1}^n aj wj = 0, then, applying T, we get ∑_{j=1}^n aj vj = 0, so a1 = . . . = an = 0, as {v1, . . . ,vn} is a basis. Hence {w1, . . . ,wn} is independent, and since dim V = n this means it's a basis, by (2.15). □

LECTURE 12

5.2 Theorem

Let U and V be finite-dimensional vector spaces over F and T : U → V a linear mapping represented by a matrix A with respect to bases {u1, . . . ,un} of U and {v1, . . . ,vm} of V. Then the matrix B representing T with respect to new bases {u′1, . . . ,u′n} of U and {v′1, . . . ,v′m} of V is given by B = Q⁻¹AP, where

P is the matrix of the identity mapping on U with respect to the bases {u′1, . . . ,u′n} and {u1, . . . ,un}, i.e., u′j = ∑_{i=1}^n pij ui (so it writes the new basis in terms of the old one), and similarly

Q is the matrix of the identity mapping on V with respect to the bases {v′1, . . . ,v′m} and {v1, . . . ,vm}, i.e., v′k = ∑_{ℓ=1}^m qℓk vℓ.

Proof: It's a composition of mappings, and hence a product of matrices:

Space   Basis               Mapping   Matrix
U       {u′1, . . . ,u′n}
                            I ↓       P
U       {u1, . . . ,un}
                            T ↓       A
V       {v1, . . . ,vm}
                            I ↓       Q⁻¹
V       {v′1, . . . ,v′m}

Reading the matrices from bottom to top (the bottom mapping is applied last) gives B = Q⁻¹AP. □

5.3 Definition

Two m × n matrices A, B with entries in F are equivalent if there are non-singular square matrices P, Q with entries in F such that the product Q⁻¹AP is defined and equals B. (So P must be n × n and Q must be m × m.)
Writing R = Q⁻¹, we could also say B = RAP, with R, P non-singular.
If A and B are equivalent, we write A ≡ B.

5.4 Proposition

Equivalence of matrices is an equivalence relation; i.e., for m × n matrices over F, we have

A ≡ A, A ≡ B =⇒ B ≡ A, [A ≡ B, B ≡ C] =⇒ A ≡ C.

Proof: (i) A = ImAIn, so A ≡ A.

(ii) If A ≡ B, so B = Q⁻¹AP, then A = QBP⁻¹ = (Q⁻¹)⁻¹BP⁻¹, so B ≡ A.

(iii) If A ≡ B and B ≡ C, say B = Q⁻¹AP and C = S⁻¹BR, then C = S⁻¹Q⁻¹APR = (QS)⁻¹A(PR), so A ≡ C. □

5.5 Theorem

Let U, V be vector spaces over F, with dim U = n and dim V = m, and let A be an m × n matrix. Then


(i) Given bases {u1, . . . ,un} and {v1, . . . ,vm} of U and V, there is a linear mapping T that is represented by A with respect to these bases.

(ii) An m × n matrix B satisfies A ≡ B if and only if B represents T with respect to some bases of U and V.

(iii) There is a unique matrix C of the form

( Ir O )
( O  O ),

such that A ≡ C. Moreover r = rank(T).

Proof: (i) For u = x1u1 + . . . + xnun we write T(u) = y1v1 + . . . + ymvm, where the column vector (y1, . . . , ym) equals A times the column vector (x1, . . . , xn).

(ii) This follows from (5.2).

(iii) This follows from (4.7). □

5.6 Example

Take A =

(

1 1 01 1 2

)

. Find r and nonsingular P and Q such that

Q−1AP =

(

Ir OO O

)

. Note that P must be 3 × 3 and Q must be 2 × 2.

Solution. Row and column reduce to get it into canonical form. We'll do rows first.

( 1 1 0 )    ( 1 1 0 )    ( 1 1 0 )
( 1 1 2 ) → ( 0 0 2 ) → ( 0 0 1 ),

where we did r2 − r1 and then r2/2. Next, columns.

( 1 1 0 )    ( 1 0 0 )    ( 1 0 0 )
( 0 0 1 ) → ( 0 0 1 ) → ( 0 1 0 ),

doing c2 − c1 and c2 ↔ c3.

For Q⁻¹ start with the 2 × 2 identity matrix I2, and do the same row operations.

( 1 0 )    (  1 0 )    (   1    0  )
( 0 1 ) → ( −1 1 ) → ( −1/2  1/2 ).

For P start with I3 and do the same column operations.

( 1 0 0 )    ( 1 −1 0 )    ( 1 0 −1 )
( 0 1 0 ) → ( 0  1 0 ) → ( 0 0  1 )
( 0 0 1 )    ( 0  0 1 )    ( 0 1  0 ).

We can get Q by inverting Q⁻¹, and the answer is

Q =
( 1 0 )
( 1 2 ).

LECTURE 13

Alternatively. From first principles, we have T(x, y, z) = (x + y, x + y + 2z), so the kernel consists of all vectors with z = 0 and x = −y, i.e., has basis {(−1, 1, 0)}. Extend to a basis for R³, say by adding in (1, 0, 0) and (0, 0, 1).

Fill in the basis vectors as columns, with the kernel vector last. So

P =
( 1 0 −1 )
( 0 0  1 )
( 0 1  0 ),

as before.

Also T (1, 0, 0) = (1, 1) and T (0, 0, 1) = (0, 2), which already gives a basis for R2.

Fill in columns so that

Q =
( 1 0 )
( 1 2 ),

also as before.
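Either way, the answer can be verified numerically; the following Python sketch (the helper name matmul is mine) checks that Q⁻¹AP is the canonical form, i.e., I2 followed by a zero column.

```python
# Multiply two matrices given as lists of rows.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 1, 0], [1, 1, 2]]
P = [[1, 0, -1], [0, 0, 1], [0, 1, 0]]   # new basis of R^3 as columns
Qinv = [[1, 0], [-0.5, 0.5]]             # the accumulated row operations

# Q^{-1} A P should be ( I2 | 0 ), i.e. r = 2.
print(matmul(Qinv, matmul(A, P)) == [[1, 0, 0], [0, 1, 0]])  # True
```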

6 Linear mappings from a vector space to itself

Now for T : V → V, we shall see what can be done using only one basis, say {v1, . . . ,vn}. A matrix A represents T if and only if the coordinates y1, . . . , yn of T(v) with respect to the basis are given by the column vector (y1, . . . , yn) = A(x1, . . . , xn), where v has coordinates x1, . . . , xn; i.e.,

[v = x1v1 + . . . + xnvn] =⇒ [T(v) = y1v1 + . . . + ynvn].

Here A is n × n (square). Also T(vj) = a1jv1 + . . . + anjvn, and (a1j, . . . , anj) is the jth column of A.

Consider the linear mapping T : R² → R² given by x 7→ Ax with

A =
(  1 1 )
( −2 4 ).

If we use the basis consisting of v1 = (1, 1) and v2 = (1, 2), then T(v1) = (2, 2) = 2v1 and T(v2) = (3, 6) = 3v2.

Hence with respect to this basis T has the diagonal matrix

B =
( 2 0 )
( 0 3 ).

Can we always represent a linear mapping by such a simple matrix?

6.1 Theorem

Let V be an n-dimensional vector space over F, and T : V → V a linear mapping represented by the matrix A with respect to the basis {v1, . . . ,vn}. Then T is represented by the matrix B with respect to the basis {v′1, . . . ,v′n} if and only if B = P⁻¹AP, where P = (pij) is non-singular and v′j = ∑_{i=1}^n pij vi, i.e., P is the matrix of the identity map on V with respect to the bases {v′1, . . . ,v′n} and {v1, . . . ,vn}.

Proof: This is just Theorem 5.2 with Q = P. □

In our example,

P =
( 1 1 )
( 1 2 ),

filling in v1 and v2 as the columns.
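A short Python check of this (the helper names are mine): with the eigenvectors as the columns of P, the matrix P⁻¹AP is the diagonal matrix B above.

```python
def matmul2(X, Y):
    # Product of two 2x2 matrices.
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):
    # Inverse of a 2x2 matrix via the adjugate formula.
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 1], [-2, 4]]
P = [[1, 1], [1, 2]]   # eigenvectors (1,1) and (1,2) as columns

print(matmul2(inv2(P), matmul2(A, P)) == [[2, 0], [0, 3]])  # True
```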

6.2 Definition

Two n × n matrices A and B over F are similar or conjugate if there is a non-singular square matrix P with B = P⁻¹AP. We write A ∼ B. This happens if they represent the same linear mapping with respect to two bases.

Thus, in our example,

(  1 1 )   ( 2 0 )
( −2 4 ) ∼ ( 0 3 ).

6.3 Proposition

Similarity is an equivalence relation on the set of n × n matrices over F , i.e.,

A ∼ A, A ∼ B =⇒ B ∼ A, [A ∼ B, B ∼ C] =⇒ A ∼ C.

Proof: (i) A = I⁻¹AI, so A ∼ A.
(ii) If A ∼ B, i.e., B = P⁻¹AP, then A = PBP⁻¹ = M⁻¹BM, with M = P⁻¹. So B ∼ A.
(iii) If A ∼ B and B ∼ C, so that B = P⁻¹AP and C = Q⁻¹BQ, then C = Q⁻¹P⁻¹APQ = N⁻¹AN, with N = PQ. So A ∼ C. □

6.4 Example

Take T : R² → R² with T(v) = Av, where

A =
( 2 0 )
( 0 2 ) = 2I2.

Now P⁻¹AP = P⁻¹(2I2)P = P⁻¹(2P) = 2P⁻¹P = 2I2 for every P, so that whatever basis we use, T has to have the matrix A. Note T(v) = 2v for all v, so that 2 is an "eigenvalue" – this turns out to be the key.

6.5 Definition

Let V be a vector space over F and T : V → V a linear mapping. An eigenvector of T is an element v ≠ 0 such that T(v) = λv for some λ ∈ F. The scalar λ is then called an eigenvalue.


For a matrix A with entries in F, λ ∈ F is an eigenvalue if there is a non-zero x ∈ Fⁿ such that Ax = λx, and then x is an eigenvector of A. For the mapping T : Fⁿ → Fⁿ given by T(x) = Ax, this is of course the same definition.

In our example, λ = 2 and 3 were eigenvalues, with eigenvectors (1, 1) and (1, 2)respectively.

LECTURE 14

6.6 Proposition

A scalar λ ∈ F is an eigenvalue of an n × n matrix A if and only if λ satisfies the characteristic equation

χ(λ) = det(λIn − A) = 0,

where χ is a polynomial of degree n.

N.B. Some people use det(A − λIn) as the definition of χ(λ). We accept either definition, as they only differ by a factor of (−1)ⁿ, and so have the same roots.

Proof: A matrix M is invertible if and only if det M ≠ 0. For (det M)(det M⁻¹) = det(MM⁻¹) = det In = 1, so an invertible matrix has non-zero determinant. Conversely, if det M ≠ 0, then M can't be reduced to a matrix with a row of zeroes, so it has rank n, and so is invertible.

Now, thinking of A as giving a linear mapping on Fⁿ, as usual, we have:
λ is an eigenvalue of A
⇔ A − λI has a non-zero kernel
⇔ A − λI isn't invertible
⇔ det(A − λI) = 0. □

In our example,

A =
(  1 1 )
( −2 4 ),

and

det(A − λI) = det
( 1 − λ    1    )
(  −2    4 − λ )
= (1 − λ)(4 − λ) − (−2)(1) = λ² − 5λ + 4 + 2 = λ² − 5λ + 6 = (λ − 2)(λ − 3),

and the eigenvalues are 2 and 3.
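The same computation can be sketched in Python using the 2 × 2 identity det(λI − M) = λ² − tr(M)λ + det M (the helper name is mine):

```python
def char_poly_2x2(M):
    # det(lI - M) = l^2 - trace(M)*l + det(M) for a 2x2 matrix M.
    (a, b), (c, d) = M
    return (1, -(a + d), a * d - b * c)   # coefficients of l^2, l^1, l^0

coeffs = char_poly_2x2([[1, 1], [-2, 4]])
print(coeffs)  # (1, -5, 6)
print([l for l in range(10)
       if coeffs[0] * l * l + coeffs[1] * l + coeffs[2] == 0])  # [2, 3]
```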

6.7 Proposition


Similar matrices have the same characteristic equation, and hence the same eigenvalues.

Proof: Let A be an n × n matrix and A ∼ B, so B = P⁻¹AP, where P is non-singular.

Then det(λIn − B) = det(λP⁻¹InP − P⁻¹AP) = det(P⁻¹(λIn − A)P)
= (det P⁻¹) det(λIn − A)(det P) = det(λIn − A), since det P⁻¹ = (det P)⁻¹. □

6.8 Proposition

Let V be an n-dimensional vector space over a field F, and T : V → V a linear mapping. Let A represent T with respect to the basis {v1, . . . ,vn} of V. Then A and T have the same eigenvalues.

Proof: Well, λ ∈ F is an eigenvalue of A
⇔ A(x1, . . . , xn) = λ(x1, . . . , xn) (as column vectors) for some x1, . . . , xn not all 0 in F
⇔ T(x1v1 + . . . + xnvn) = λ(x1v1 + . . . + xnvn) for some x1, . . . , xn not all 0 in F
⇔ T(v) = λv for some v ≠ 0 in V
⇔ λ is an eigenvalue of T. □

If we can find a basis of eigenvectors, the matrix of T has a particularly nice form. We'll prove one little result, then see some examples.

6.9 Proposition

Let V be an n-dimensional vector space over a field F, and let T : V → V be a linear mapping. Suppose that {v1, . . . ,vn} is a basis of eigenvectors of T. Then, with respect to this basis, T is represented by a diagonal matrix whose diagonal entries are eigenvalues of T.

Proof: We have T(vj) = λjvj, where λj is the appropriate eigenvalue (for j = 1, 2, . . . , n). Recall that T(vj) = ∑_{i=1}^n aij vi if A is the matrix representing T (these numbers form the jth column of A). Now aij = λj if i = j, and aij = 0 if i ≠ j, so A is the diagonal matrix

( λ1       0 )
(    . . .   )
( 0       λn ). □


6.10 Example

Define T : R² → R² by T(x) = Ax with

A =
( 11 −2 )
(  3  4 ),

and take v1 = (1, 3) and v2 = (2, 1). Now

T(v1) = A(1, 3) = (11 − 6, 3 + 12) = (5, 15) = 5v1, and
T(v2) = A(2, 1) = (22 − 2, 6 + 4) = (20, 10) = 10v2.

So using the basis {v1, v2} the matrix of T is

( 5  0 )
( 0 10 ).

However, not all matrices have a basis of eigenvectors.

6.11 Example

Take T : R² → R², defined by T(x) = Ax, where

A =
( 0 −1 )
( 1  0 ).

This is a rotation of the plane through 90° anti-clockwise. Now

det(A − λI2) = det
( −λ −1 )
(  1 −λ )
= λ² + 1,

so there are no real eigenvalues (the roots of λ² + 1 = 0 are not real), and hence no eigenvectors in R² (which is also obvious geometrically).

FACT. Over C every polynomial can be factorized into linear factors, and so has a full set of complex roots. This is the Fundamental Theorem of Algebra (see MATH 2090). So we can always find an eigenvalue.

LECTURE 15

6.12 Example

Consider T : C² → C² (or indeed R² → R²) defined by T(x) = Ax, where

A =
( 2 1 )
( 0 2 ).

Then

det(A − λI2) = det
( 2 − λ    1    )
(   0    2 − λ )
= (λ − 2)²,

so 2 is the only eigenvalue. Solving T(x) = 2x gives

( 2 1 ) ( x1 )     ( x1 )
( 0 2 ) ( x2 ) = 2 ( x2 ),   i.e.,

( 0 1 ) ( x1 )   ( 0 )
( 0 0 ) ( x2 ) = ( 0 ),

so x2 = 0 with x1 arbitrary, and the eigenvectors are {(a, 0) : a ∈ F, a ≠ 0}, with F = R or C.

So there is no basis of eigenvectors, and

( 2 1 )
( 0 2 )

is not similar to a diagonal matrix.

6.13 Theorem

Let V be a vector space over F, let T : V → V be linear, and suppose that {v1, . . . ,vk} are eigenvectors of T corresponding to distinct eigenvalues λ1, . . . , λk. Then {v1, . . . ,vk} is a linearly independent set. If k = dim V, then it's a basis.

Proof: Suppose that ∑_{i=1}^k ai vi = 0, and apply (T − λ2I)(T − λ3I) . . . (T − λkI) to both sides. Each term is sent to zero except for the first, which becomes

(λ1 − λ2) . . . (λ1 − λk) a1 v1 = 0.

But v1 ≠ 0, and the other factors are non-zero, so a1 = 0. Hence a2v2 + . . . + akvk = 0. Applying (T − λ3I) . . . (T − λkI), we deduce similarly that a2 = 0. Continuing, we see that the aj are all 0 and the set is independent. Finally, if we have k independent vectors in a k-dimensional space, then it is automatically a basis, by (2.15). □

6.14 Corollary

(i) Let V be an n-dimensional vector space over a field F, and T : V → V a linear transformation. If T has n distinct eigenvalues in F, then V has a basis of eigenvectors of T, and T can be represented by a diagonal matrix whose diagonal entries are the eigenvalues; this is unique up to reordering the eigenvalues.

(ii) Let A be an n × n matrix with entries in F = R or C. If the characteristic polynomial of A has n distinct roots in F, then A is similar to a diagonal matrix over F.

Proof: (i) Take {v1, . . . ,vn} eigenvectors of T corresponding to different eigenvalues λ1, . . . , λn, say. By (6.13), they are independent, so, since there are n of them, they are a basis for the n-dimensional space V (see Theorem 2.15). By (6.9), T is represented by the diagonal matrix with diagonal entries λ1, . . . , λn with respect to this basis.

(ii) Immediate – let V = Fⁿ and T(x) = Ax. □


A matrix similar to a diagonal matrix is called diagonalisable. Not all matrices are: e.g., if

A =
( 0 1 )
( 0 0 ),

then A² = O, so if

P⁻¹AP =
( a1  0 )
(  0 a2 ),

then

P⁻¹A²P = (P⁻¹AP)(P⁻¹AP) =
( a1²   0  )
(  0   a2² ).

Since A² = O, we have a1 = a2 = 0. This means that A = P O P⁻¹ = O, which is a contradiction.

Of course In, which is already diagonal, is diagonalisable, even though it has the repeated eigenvalue 1, so (6.14) isn't the only way a matrix can be diagonalisable.

When T(x) = Ax and A is diagonalisable, we calculate P such that D = P⁻¹AP is diagonal by taking the eigenvectors of A as the columns of P.

6.15 Example

Let A =
( 1 3 )
( 4 2 ).
Find D such that A ∼ D, and P such that D = P⁻¹AP.

Eigenvalues.

det(λI − A) = det
( λ − 1   −3    )
(  −4    λ − 2 )
= (λ − 1)(λ − 2) − 12 = λ² − 3λ − 10 = (λ − 5)(λ + 2),

so the eigenvalues are 5 and −2, and we can take

D =
( 5  0 )
( 0 −2 ).

For λ = 5 we solve (A − 5I)x = 0, or

( −4  3 ) ( x1 )   ( 0 )
(  4 −3 ) ( x2 ) = ( 0 ),

i.e., (x1, x2) = ((3/4)a, a) for a ∈ R. So we take v1 = (3, 4), say.

For λ = −2, we solve (A + 2I)x = 0, or

( 3 3 ) ( x1 )   ( 0 )
( 4 4 ) ( x2 ) = ( 0 ),

i.e., (x1, x2) = (−b, b) for b ∈ R. So we take v2 = (−1, 1), say.

So

P =
( 3 −1 )
( 4  1 )

will do. We can check that P⁻¹AP = D, or, what is simpler and equivalent, that AP = PD.
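The check AP = PD is a one-liner in Python (the helper name is mine):

```python
def matmul2(X, Y):
    # Product of two 2x2 matrices.
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 3], [4, 2]]
P = [[3, -1], [4, 1]]    # eigenvectors (3,4) and (-1,1) as columns
D = [[5, 0], [0, -2]]

print(matmul2(A, P) == matmul2(P, D))  # True
```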

LECTURE 16


7 Polynomials

7.1 Definition

Let V be a vector space over F and T : V → V a linear mapping. Let p(t) = a0 + a1t + . . . + amtᵐ be a polynomial with coefficients in F. Then we define

p(T) = a0I + a1T + a2T² + . . . + amTᵐ,

i.e.,

p(T)v = a0v + a1T(v) + a2T(T(v)) + . . . + amTᵐ(v).

Note that if T is represented by a matrix A with respect to a basis {v1, . . . ,vn}, then Tᵏ is represented by Aᵏ (induction on k, using (4.13)), and p(T) is represented by p(A) = a0In + a1A + . . . + amAᵐ.

7.2 Definition

Let V be an n-dimensional vector space over F, and T : V → V linear. Suppose that T is represented by an n × n matrix A with respect to some basis. Then the characteristic polynomial of T is χ(λ) = det(λIn − A). This is independent of the choice of basis, since if B = P⁻¹AP represents T with respect to another basis, then det(λIn − B) = det(λIn − A) by (6.7).

N.B. Some people use det(A − λIn) as the definition of χ(λ). We accept eitherdefinition, as they only differ by a factor of (−1)n, and so have the same roots.

7.3 The Cayley–Hamilton theorem

Let A be a real or complex square matrix, and χ(λ) its characteristic polynomial. Then χ(A) = O, the zero matrix.

Proof: To be discussed later. Note that χ(λ) = det(λIn − A); we cannot prove the theorem just by "substituting λ = A" in this determinant, since χ(A) is a matrix, not a scalar.

Example

Take A =
( 1 3 )
( 4 2 ),
as in (6.15). Then χ(λ) = λ² − 3λ − 10 = (λ − 5)(λ + 2), and

(A − 5I)(A + 2I) =
( −4  3 ) ( 3 3 )   ( 0 0 )
(  4 −3 ) ( 4 4 ) = ( 0 0 ).
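Equivalently, in Python (the helper name is mine), χ(A) = A² − 3A − 10I comes out as the zero matrix:

```python
def matmul2(X, Y):
    # Product of two 2x2 matrices.
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 3], [4, 2]]
A2 = matmul2(A, A)

# chi(A) = A^2 - 3A - 10I, entry by entry.
chiA = [[A2[i][j] - 3 * A[i][j] - 10 * (1 if i == j else 0)
         for j in range(2)] for i in range(2)]
print(chiA)  # [[0, 0], [0, 0]]
```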

Similarly, if T : V → V is a linear mapping on an n-dimensional vector space, then χ(T) = 0 (the zero mapping), because χ(λ) is defined in terms of a matrix representing T.

So we know that there are polynomials which "kill" a matrix A. It is important to find the simplest one.


7.4 Definition

The minimum or minimal polynomial of a square matrix A is the monic polynomial µ of least degree such that µ(A) = O. ("Monic" means that the leading coefficient is 1.) We write it µ or µA.

Example. A =
( 4 2 0 )
( 0 4 0 )
( 0 0 4 )
has χA(λ) = det(λI − A) = (λ − 4)³ (check).

So (A − 4I)³ = O. But in fact

(A − 4I)² =
( 0 2 0 ) ( 0 2 0 )   ( 0 0 0 )
( 0 0 0 ) ( 0 0 0 ) = ( 0 0 0 ),
( 0 0 0 ) ( 0 0 0 )   ( 0 0 0 )

and the minimum polynomial is µA(λ) = (λ − 4)².

Now, given the characteristic polynomial, we can show that there are only a small number of possibilities to test for the minimum polynomial.

7.5 Theorem

Let A be a square matrix. Then:
(i) Every eigenvalue of A is a root of the minimum polynomial;
(ii) The minimum polynomial divides the characteristic polynomial exactly.

Proof: (i) Let Ax = λx with x ≠ 0. Then Aᵏx = λᵏx for k = 1, 2, 3, . . ., and by taking linear combinations we get that p(A)x = p(λ)x for each polynomial p.

Now put p = µ: then µ(A)x = µ(λ)x, while µ(A) = O, so µ(λ)x = 0. Since x ≠ 0, we have µ(λ) = 0.

(ii) Let µ be the minimum and χ the characteristic polynomial. By long division we can write χ = µq + r for polynomials q (quotient) and r (remainder), with deg r < deg µ.

Now

χ(A) = µ(A)q(A) + r(A),

in which χ(A) = O by the Cayley–Hamilton theorem (7.3) and µ(A) = O by the definition of µ; so r(A) = O. Since deg r < deg µ and µ is the minimum polynomial, we have r ≡ 0, i.e., µ divides χ exactly. □

LECTURE 17

Example.


Let A =
( 3 1 0 0 )
( 0 3 0 0 )
( 0 0 6 0 )
( 0 0 0 6 ).
Calculate µA and χA.

Then

det(λI − A) = det
( λ − 3   −1     0      0    )
(   0   λ − 3    0      0    )
(   0     0    λ − 6    0    )
(   0     0      0    λ − 6 )
= (λ − 3)²(λ − 6)².

So the eigenvalues are 3 and 6 only. We have χ(t) = (t − 3)²(t − 6)², and µ(t) has to divide it; also, both (t − 3) and (t − 6) must be factors of µ(t).

The only possibilities are therefore:

(t − 3)(t − 6)       degree 2,
(t − 3)²(t − 6)      degree 3,
(t − 3)(t − 6)²      degree 3,
(t − 3)²(t − 6)²     degree 4.

We try them in turn. So

(A − 3I)(A − 6I) =
( 0 1 0 0 ) ( −3  1 0 0 )   ( 0 −3 0 0 )
( 0 0 0 0 ) (  0 −3 0 0 ) = ( 0  0 0 0 )
( 0 0 3 0 ) (  0  0 0 0 )   ( 0  0 0 0 )
( 0 0 0 3 ) (  0  0 0 0 )   ( 0  0 0 0 )
≠ O.

Next try

(A − 3I)²(A − 6I) =
( 0 1 0 0 ) ( 0 −3 0 0 )
( 0 0 0 0 ) ( 0  0 0 0 )
( 0 0 3 0 ) ( 0  0 0 0 ) = O,
( 0 0 0 3 ) ( 0  0 0 0 )

so that µ(t) = (t − 3)²(t − 6). We can check that (A − 3I)(A − 6I)² ≠ O. Of course (A − 3I)²(A − 6I)² = O, by Cayley–Hamilton.
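The trial-and-error can be automated; here is a Python sketch (the helper names are mine) testing the two candidate products above:

```python
def matmul(X, Y):
    # Product of two square matrices of the same size.
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shifted(A, c):
    # A - c*I.
    n = len(A)
    return [[A[i][j] - c * (1 if i == j else 0) for j in range(n)]
            for i in range(n)]

A = [[3, 1, 0, 0], [0, 3, 0, 0], [0, 0, 6, 0], [0, 0, 0, 6]]
Z = [[0] * 4 for _ in range(4)]

print(matmul(shifted(A, 3), shifted(A, 6)) == Z)   # False: degree 2 fails
print(matmul(shifted(A, 3),
             matmul(shifted(A, 3), shifted(A, 6))) == Z)   # True
```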

7.6 Proposition

Similar matrices have the same minimum polynomial. Hence if A is diagonalisable, then µA(t) has no repeated roots.

Proof: If B = P⁻¹AP, then Bⁿ = (P⁻¹AP)(P⁻¹AP) . . . (P⁻¹AP) = P⁻¹AⁿP, and so for any polynomial p we have p(B) = P⁻¹p(A)P.


Hence p(B) = O ⇐⇒ p(A) = O. So µB(t) = µA(t).

Now, if B is a diagonal matrix whose distinct diagonal entries are λ1, . . . , λk (each possibly repeated along the diagonal), then (B − λ1I)(B − λ2I) . . . (B − λkI) = O, so µB(t) has no repeated roots, since it must divide the polynomial (t − λ1)(t − λ2) . . . (t − λk). But any diagonalisable A is similar to a matrix B of this form, so µA has no repeated roots. □

8 The Jordan canonical form

From (6.9), we know that a matrix is diagonalisable if and only if there is a basis consisting of its eigenvectors. What can we do if this is not the case?

8.1 Definition

A Jordan block matrix is a square matrix of the form

Jλ =
( λ 1 0 . . . 0 )
( 0 λ 1 . . . 0 )
( . . . . . . . )
( 0 . . . 0 λ 1 )
( 0 . . . 0 0 λ ),

with λ on the diagonal and 1 just above the diagonal (for some fixed scalar λ). There is also the trivial 1 × 1 case.

For example, the following are Jordan block matrices:

(4),

( 4 1 )
( 0 4 ),

( 4 1 0 )
( 0 4 1 )
( 0 0 4 ),

( 4 1 0 0 )
( 0 4 1 0 )
( 0 0 4 1 )
( 0 0 0 4 ).

8.2 Proposition

Suppose that V is an n-dimensional vector space, and T a linear transformation on V, represented by a Jordan block matrix A with respect to some basis {v1, . . . ,vn}.


Then χ(t) = det(tI − A) = (t − λ)ⁿ, and we also have

Tv1 = λv1,
Tv2 = v1 + λv2,
Tv3 = v2 + λv3,
. . .
Tvn = vn−1 + λvn.

So, if we define v0 = 0, we have (T − λI)vk = vk−1 for k = 1, 2, . . . , n. Hence (T − λI)ᵏvk = 0.

Proof: To get χ(t) we expand the determinant of

( t − λ   −1     0   . . .    0   )
(   0    t − λ  −1   . . .    0   )
( . . .                           )
(   0    . . .     t − λ     −1   )
(   0    . . .       0     t − λ )

about the first column, then continue. To work out what Tvk is, just look at the kth column of A. The rest is clear. □

LECTURE 18

8.3 Definition

Let V be a vector space over F and T : V → V a linear mapping. Let λ ∈ F be an eigenvalue of T. A non-zero vector v ∈ V is said to be a generalized eigenvector of T corresponding to λ if (T − λI)ᵏv = 0 for some k ≥ 1. Similarly, for an n × n matrix A, a non-zero column vector x is a generalized eigenvector if (A − λI)ᵏx = 0 for some k ≥ 1.

Clearly every eigenvector is a generalized eigenvector (take k = 1).

Example.

For A =
( 4 1 )
( 0 4 ),
we have v1 = (1, 0) an eigenvector with eigenvalue λ = 4, since

( 4 1 ) ( 1 )   ( 4 )
( 0 4 ) ( 0 ) = ( 0 ).

Now v2 = (0, 1) is not an eigenvector, since

( 4 1 ) ( 0 )   ( 1 )
( 0 4 ) ( 1 ) = ( 4 ) = v1 + 4v2.

So (A − 4I)v2 = v1, and then (A − 4I)²v2 = (A − 4I)v1 = 0; so v2 is a generalized eigenvector.
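In Python (the helper name is mine), with N = A − 4I:

```python
def matvec(M, v):
    # Apply the matrix M to the column vector v.
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

N = [[0, 1], [0, 0]]   # A - 4I for A = [[4, 1], [0, 4]]
v2 = [0, 1]

print(matvec(N, v2))             # [1, 0]: (A - 4I)v2 = v1
print(matvec(N, matvec(N, v2)))  # [0, 0]: (A - 4I)^2 v2 = 0
```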

8.4 Definition


A square matrix A is said to be in Jordan canonical form (or Jordan normal form) if it consists of Jordan block matrices strung out along the diagonal, with zeroes elsewhere. The diagonal entries are the eigenvalues of A.

8.5 Examples

Any Jordan block matrix is in JCF. So are:

A =
( −1 0 0 )
(  0 2 0 )
(  0 0 2 ),

B =
( −1 0 0 )
(  0 2 1 )
(  0 0 2 ),

and C =
( 2 1 0 0 0 )
( 0 2 1 0 0 )
( 0 0 2 0 0 )
( 0 0 0 2 1 )
( 0 0 0 0 2 ).

Here:
A has 3 blocks, all of size 1 × 1;
B has 1 block of size 1 × 1 and 1 of size 2 × 2;
C has 1 block of size 3 × 3 and 1 block of size 2 × 2.

8.6 Theorem

Let A be an n × n matrix with entries in C. Then A is similar to a matrix in Jordan canonical form, unique up to re-ordering the blocks. If A is an n × n matrix with real entries, then A is similar to a real matrix in JCF (i.e., B = P⁻¹AP with B real) if and only if all the roots of the characteristic equation are real.

Proof: Omitted, but the "only if" direction follows from the fact that if

B =
( B1        0 )
(    . . .    )
( 0        BN )

is in JCF, and each Bi is a Jordan block of size mi with diagonal elements λi, then

χB(λ) = χB1(λ) . . . χBN(λ) = (λ − λ1)^m1 . . . (λ − λN)^mN,

so if all the blocks are real then χ has only real roots.

8.7 Facts which help us find the JCF of a matrix A

1. For each λ, the power of (t − λ) in χ(t) is the total size of the Jordan blocks using λ. See (8.6).

2. The number of λ-blocks is the dimension of the eigenspace ker(A − λI). For each block gives one new eigenvector, by (8.2).

3. The biggest block size for a λ-block is the power of (t − λ) in the minimum polynomial. For we need to take (Jλ − λI)ⁿ to kill a Jordan block Jλ of size n, by (8.2).


8.8 Example

Take A =
( 5 0 −1 )
( 2 3 −1 )
( 4 0  1 ).
Find the Jordan canonical form of A.

N.B. We will do this at the very end of the course for everything up to 4 × 4 matrices.

We always start by finding

χ(λ) = det(λI − A) = det
( λ − 5    0      1    )
(  −2    λ − 3    1    )
(  −4      0    λ − 1 )
= (λ − 3) det
( λ − 5    1    )
(  −4    λ − 1 )
(expanding along the second column)
= (λ − 3)(λ² − 6λ + 5 + 4) = (λ − 3)³.

So 3 is the only eigenvalue.

We can solve (A − 3I)(x, y, z) = (0, 0, 0) (as column vectors), or

( 2 0 −1 ) ( x )   ( 0 )
( 2 0 −1 ) ( y ) = ( 0 )
( 4 0 −2 ) ( z )   ( 0 ).

This is just 2x = z, and so

ker(A − 3I) = {(a, b, 2a) : a, b ∈ R},

a 2-dimensional space of eigenvectors (together with 0).

This tells us that there are 2 blocks; since the total size is 3, they must be one of size 2 and one of size 1.

Alternatively, we can check that (A − 3I) ≠ O, but

(A − 3I)² =
( 2 0 −1 ) ( 2 0 −1 )
( 2 0 −1 ) ( 2 0 −1 ) = O.
( 4 0 −2 ) ( 4 0 −2 )

Thus the minimum polynomial of A is (t − 3)², and the largest block has size 2, so the block structure must be 2 + 1 again.

The Jordan form of A is

( 3 1 0 )
( 0 3 0 )
( 0 0 3 ),

with one Jordan block of size 2 × 2 and one of size 1 × 1.
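The two checks behind this example can be replayed in Python (the helper name is mine):

```python
def matmul(X, Y):
    # Product of two square matrices of the same size.
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

N = [[2, 0, -1], [2, 0, -1], [4, 0, -2]]   # A - 3I
Z = [[0] * 3 for _ in range(3)]

# N is non-zero but N^2 = O, so the minimum polynomial is (t - 3)^2.
print(N != Z and matmul(N, N) == Z)  # True
# Every row of N is a multiple of (2, 0, -1), so rank(A - 3I) = 1 and
# dim ker(A - 3I) = 3 - 1 = 2: there are two Jordan blocks.
```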

LECTURE 19


8.9 The Cayley–Hamilton theorem via Jordan canonical matrices

Let A be a real or complex square matrix and χ(t) its characteristic polynomial. Then χ(A) = O.

Proof: Suppose that M is a Jordan canonical matrix, say

M =
( B1        0 )
(    . . .    )
( 0        Bm ),

with B1, . . . , Bm Jordan blocks. Then

M² =
( B1²        0 )
(     . . .    )
( 0        Bm² ),

and similarly for higher powers of M. So

χ(M) =
( χ(B1)          0 )
(       . . .      )
( 0          χ(Bm) ),

and we need to show that χ(Bi) = O for each i.

Now suppose that B is a Jordan block of size k with diagonal entries λ, as in (8.1). Then (B − λI)ᵏ = O, as in (8.2). Thus χ(B) = O for each of the blocks B making up A in Jordan form (since in each case (t − λ)ᵏ divides χ(t)).

Hence χ(M), the block diagonal matrix with diagonal blocks χ(B1), . . . , χ(Bm), is O.

Finally, any matrix A is similar to a matrix M in Jordan canonical form with the same characteristic polynomial, by (6.7), i.e., M = P⁻¹AP; then χ(A) = Pχ(M)P⁻¹ = O. □

8.10 Theorem

If no eigenvalue λ appears with multiplicity greater than 3 in the characteristic equation for A, then we can write down the JCF knowing just the characteristic and minimum polynomials (χ(t) and µ(t)).

Proof: In general, the power of (t − λ) in the characteristic polynomial is the total number of diagonal entries λ we get in the JCF; the power of (t − λ) in the minimum polynomial is the size of the largest Jordan block associated with λ.

If the multiplicity of (t − λ) in χ(t) is 1, then there is just one block, of size 1. The power of (t − λ) in both χ(t) and µ(t) is 1.

If the multiplicity of (t − λ) in χ(t) is 2, then we have either one block of size 2, or two blocks of size 1; that is, looking just at the part corresponding to this λ, we have

block sizes      2             1 + 1

matrices      ( λ 1 )         ( λ 0 )
              ( 0 λ )         ( 0 λ )

χ             (t − λ)²        (t − λ)²
µ             (t − λ)²        (t − λ)

For multiplicity 3, there may be one block of size 3, a 2 + 1, or a 1 + 1 + 1.

block sizes      3              2 + 1           1 + 1 + 1

matrices     ( λ 1 0 )       ( λ 1 0 )       ( λ 0 0 )
             ( 0 λ 1 )       ( 0 λ 0 )       ( 0 λ 0 )
             ( 0 0 λ )       ( 0 0 λ )       ( 0 0 λ )

χ            (t − λ)³        (t − λ)³        (t − λ)³
µ            (t − λ)³        (t − λ)²        (t − λ) □

8.11 Example

A matrix A has characteristic polynomial χ(t) = (t − 1)³(t − 2)³(t − 3)² and minimal polynomial µ(t) = (t − 1)³(t − 2)(t − 3)². Find a matrix in Jordan form similar to A.

Solution: For λ = 1, there is one block of size 3; for λ = 2, there are three blocks of size 1; for λ = 3, there is one block of size 2.

Remark: If the eigenvalue has multiplicity 4, the possibilities are now 4, 3 + 1, 2 + 2, 2 + 1 + 1, and 1 + 1 + 1 + 1. For both 2 + 2 and 2 + 1 + 1 we have χ(t) = (t − λ)⁴ and µ(t) = (t − λ)². So χ and µ alone don't help us distinguish between

( λ 1 0 0 )       ( λ 1 0 0 )
( 0 λ 0 0 )       ( 0 λ 0 0 )
( 0 0 λ 1 )  and  ( 0 0 λ 0 )
( 0 0 0 λ )       ( 0 0 0 λ ),

since both have largest block of size 2.


However, it is still possible to determine quickly which case we are in: in the first case there are 2 blocks, and the eigenspace ker(A − λI) has dimension 2; in the other case, 3 blocks, and it has dimension 3.

LECTURE 20

8.12 Worked example (from the 2003 paper)

You are given that the matrix

A =
(  7 1  1  1 )
(  0 8  0  0 )
(  0 0  6 −2 )
( −1 1 −1  7 )

has characteristic polynomial χA(t) = (t − 6)²(t − 8)². Find its Jordan canonical form and its minimum polynomial.

Solution: We see that the eigenvalues are 6 and 8. Let's go for the minimum polynomial µ first. This has roots 6 and 8, and divides χ. Since it is monic, it is therefore one of:

(t − 6)(t − 8), (t − 6)²(t − 8), (t − 6)(t − 8)², or (t − 6)²(t − 8)².

We calculate

(A − 6I)(A − 8I) =
(  1 1  1  1 ) ( −1 1  1  1 )   ( −2  2 −2 −2 )
(  0 2  0  0 ) (  0 0  0  0 ) = (  0  0  0  0 )
(  0 0  0 −2 ) (  0 0 −2 −2 )   (  2 −2  2  2 )
( −1 1 −1  1 ) ( −1 1 −1 −1 )   (  0  0  0  0 )
≠ O,

which eliminates the first possibility. Next,

(A − 6I)²(A − 8I) = (A − 6I)·[(A − 6I)(A − 8I)] =
(  1 1  1  1 ) ( −2  2 −2 −2 )
(  0 2  0  0 ) (  0  0  0  0 )
(  0 0  0 −2 ) (  2 −2  2  2 ) = O,
( −1 1 −1  1 ) (  0  0  0  0 )

so the minimum polynomial is (t − 6)²(t − 8). (If it hadn't been zero, we would then have tried (t − 6)(t − 8)².)

Each eigenvalue has multiplicity 2, so we have to work out whether the blocks are 2 or 1 + 1. Since we needed (t − 6)², the 6-block has size 2; and since we only needed (t − 8)¹, there are two 8-blocks of size 1. The Jordan form is therefore:

B =
( 6 1 0 0 )
( 0 6 0 0 )
( 0 0 8 0 )
( 0 0 0 8 ).
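The minimum-polynomial computation above can be replayed in Python (the helper names are mine):

```python
def matmul(X, Y):
    # Product of two square matrices of the same size.
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shifted(A, c):
    # A - c*I.
    n = len(A)
    return [[A[i][j] - c * (1 if i == j else 0) for j in range(n)]
            for i in range(n)]

A = [[7, 1, 1, 1], [0, 8, 0, 0], [0, 0, 6, -2], [-1, 1, -1, 7]]
Z = [[0] * 4 for _ in range(4)]

M1 = matmul(shifted(A, 6), shifted(A, 8))
print(M1 == Z)                         # False: (t-6)(t-8) doesn't kill A
print(matmul(shifted(A, 6), M1) == Z)  # True: mu(t) = (t-6)^2 (t-8)
```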

THE END
