CHAPTER 1
Matrices
1.1. Matrix Algebra
Fields. A field is a set F equipped with two operations + and · such that (F, +) and (F^×, ·) are abelian groups, where F^× = F \ {0}, and a(b + c) = ab + ac for all a, b, c ∈ F.
Examples of fields. Q, R, C, Z_p (p prime). If F is a field and f(x) is an irreducible polynomial in F[x], the quotient ring F[x]/(f) is a field containing F as a subfield. E.g., C ≅ R[x]/(x^2 + 1); Z_3[x]/(x^3 − x + 1) is a field with 3^3 elements containing Z_3. If R is an integral domain (commutative ring without zero divisors), then all fractions p/q (p, q ∈ R, q ≠ 0) form the fraction field of R, which contains R.
Matrices. Let F be a field. M_{m×n}(F) = the set of all m × n matrices with entries in F; M_n(F) = M_{n×n}(F). For A = [a_{ij}], B = [b_{ij}] ∈ M_{m×n}(F), C = [c_{jk}] ∈ M_{n×p}(F), α ∈ F,

A + B := [a_{ij} + b_{ij}] ∈ M_{m×n}(F),
αA := [αa_{ij}] ∈ M_{m×n}(F),
AC := [d_{ik}] ∈ M_{m×p}(F), where d_{ik} = ∑_{j=1}^{n} a_{ij} c_{jk}.
For A ∈Mm×n(F ), B ∈Mn×p(F ), C ∈Mp×q(F ),
(AB)C = A(BC).
(M_n(F), +, ·) is a ring with identity I_n = diag(1, . . . , 1). GL(n, F) = the set of invertible matrices in M_n(F). (GL(n, F), ·) is the multiplicative group of M_n(F), called the general linear group of degree n over F.
Multiplication by blocks. Let

A = [A_{11} ⋯ A_{1n}; ⋮ ; A_{m1} ⋯ A_{mn}],  B = [B_{11} ⋯ B_{1p}; ⋮ ; B_{n1} ⋯ B_{np}],

where A_{ij} ∈ M_{m_i×n_j}(F), B_{jk} ∈ M_{n_j×p_k}(F). Then

AB = [C_{11} ⋯ C_{1p}; ⋮ ; C_{m1} ⋯ C_{mp}],  where C_{ik} = ∑_{j=1}^{n} A_{ij} B_{jk}.
Transpose. The transpose of

A = [a_{11} ⋯ a_{1n}; ⋮ ; a_{m1} ⋯ a_{mn}]

is

A^T = [a_{11} ⋯ a_{m1}; ⋮ ; a_{1n} ⋯ a_{mn}].

If A = [A_{11} ⋯ A_{1n}; ⋮ ; A_{m1} ⋯ A_{mn}] is a block matrix, then

A^T = [A_{11}^T ⋯ A_{m1}^T; ⋮ ; A_{1n}^T ⋯ A_{mn}^T].

Properties of transpose.
(i) (αA + βB)^T = αA^T + βB^T.
(ii) (AB)^T = B^T A^T.
(iii) (A^{-1})^T = (A^T)^{-1}.
Elementary operations and elementary matrices.
To perform an elementary row (column) operation on a matrix A is to multiply A by the corresponding elementary matrix from the left (right).

Note. The inverse of an elementary matrix is also an elementary matrix of the same type.
Proposition 1.1. Every A ∈ GL(n, F ) is a product of elementary matrices.
Proof. Use induction on n. A can be transformed into [1 0; 0 A_1] through suitable elementary row and column operations, i.e., ∃ elementary matrices P_1, . . . , P_k, Q_1, . . . , Q_l such that

P_1 ⋯ P_k A Q_1 ⋯ Q_l = [1 0; 0 A_1],

where A_1 ∈ GL(n − 1, F). By the induction hypothesis, A_1 is a product of elementary matrices. Thus [1 0; 0 A_1] is a product of elementary matrices and so is

A = P_k^{-1} ⋯ P_1^{-1} [1 0; 0 A_1] Q_l^{-1} ⋯ Q_1^{-1}.

Table 1.1. Elementary row operations and elementary matrices

type I: multiply the ith row by α ∈ F^×; elementary matrix: I_n with the (i, i) entry replaced by α.
type II: swap the ith and jth rows; elementary matrix: I_n with the ith and jth rows swapped.
type III: add β times the jth row to the ith row, where i ≠ j, β ∈ F; elementary matrix: I_n with the (i, j) entry replaced by β.
Equivalence. Let A, B ∈ M_{m×n}(F). We say that

• A is row equivalent to B, denoted A ≈_r B, if ∃P ∈ GL(m, F) such that A = PB;
• A is column equivalent to B, denoted A ≈_c B, if ∃Q ∈ GL(n, F) such that A = BQ;
• A is equivalent to B, denoted A ≈ B, if ∃P ∈ GL(m, F) and Q ∈ GL(n, F) such that A = PBQ.

≈_r, ≈_c and ≈ are equivalence relations on M_{m×n}(F).
Reduced row echelon forms. A matrix A ∈ M_{m×n}(F) is called a reduced row echelon form (rref) if

(i) in each nonzero row of A, the first nonzero entry is 1; such an entry is called a pivot of A;
(ii) if a column of A contains a pivot, then all other entries in the column are 0;
(iii) if a row contains a pivot, then every row above contains a pivot further to the left.
A reduced column echelon form (rcef) is defined similarly.
Proposition 1.2. Every A ∈ M_{m×n}(F) is row (column) equivalent to a unique rref (rcef).

Proof. Existence of rref. Induction on the size of A.
Uniqueness of rref. Use induction on m. Let A, B ∈ M_{m×n}(F) be rref's such that A = PB for some P ∈ GL(m, F). We want to show that A = B. May assume B ≠ 0. Assume that the first nonzero column of B is the jth column. Then the first nonzero column of A = PB is also the jth column. Write

A = [0 1 a; 0 0 A_1],  B = [0 1 b; 0 0 B_1],

where the 1's are in the jth column and A_1, B_1 ∈ M_{(m−1)×(n−j)}(F) are rref's. Then

[1 a; 0 A_1] = P [1 b; 0 B_1].

It follows that

P = [1 p; 0 P_1],  P_1 ∈ GL(m − 1, F),

and

[1 a; 0 A_1] = [1 b + pB_1; 0 P_1 B_1].

Since A_1 = P_1 B_1, by the induction hypothesis, A_1 = B_1. Let I be the set of indices of the pivot columns of B_1. Since A, B are rref's, all components of a and b with indices in I are 0. Since pB_1 = a − b, all components of pB_1 with indices in I are 0. Write B_1 = [b_1, . . . , b_{n−j}]. Then pb_i = 0 for all i ∈ I. Note that every column of B_1 is a linear combination of the pivot columns b_i, i ∈ I. So, pB_1 = 0. Therefore, a = b. So, A = B.
Proposition 1.3. Every A ∈ M_{m×n}(F) is equivalent to

[I_r 0; 0 0],

where 0 ≤ r ≤ min{m, n} is uniquely determined by A. Moreover, r = the number of pivots in the rref (rcef) of A. r is called the rank of A.

Proof. We only have to show the uniqueness of r; the other claims are obvious. Assume to the contrary that

[I_r 0; 0 0] ≈ [I_s 0; 0 0],  r < s.

Then ∃P ∈ GL(m, F) and Q ∈ GL(n, F) such that

P [I_r 0; 0 0] = [I_s 0; 0 0] Q.

Write P = [P_1 P_2], Q = [Q_1; Q_2], where P_1 ∈ M_{m×r}(F), Q_1 ∈ M_{s×n}(F). Then

[P_1 0] = [Q_1; 0].

Hence Q_1 = [Q_{11} 0], where Q_{11} ∈ M_{s×r}(F). Since s > r, ∃ 0 ≠ x ∈ M_{1×s}(F) such that xQ_{11} = 0. Then

[x 0] Q = x Q_1 = [xQ_{11} 0] = 0,

which is a contradiction since Q is invertible.
Easy fact. Let A ∈ M_n(F). Then the following are equivalent.
(i) A is invertible.
(ii) rref(A) = I_n.
(iii) rcef(A) = I_n.
(iv) rank A = n.
Finding A^{-1}. Let A ∈ M_n(F). Perform elementary row operations:

[A I_n] → ⋯ → [rref(A) B].

If rref(A) = I_n, then A^{-1} = B; if rref(A) ≠ I_n, then A is not invertible.
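This procedure is easy to carry out in exact arithmetic with a computer algebra system. Below is a minimal sketch using SymPy's rref() on the augmented matrix [A I_n]; the 2 × 2 matrix is an arbitrary illustration, not an example from the text.

```python
from sympy import Matrix, eye

A = Matrix([[2, 1], [5, 3]])           # arbitrary example over Q

# Row-reduce the augmented matrix [A | I_n]; rref() returns the reduced
# row echelon form together with the tuple of pivot column indices.
R, pivots = A.row_join(eye(2)).rref()

left, right = R[:, :2], R[:, 2:]
if left == eye(2):                      # rref(A) = I_n, so A is invertible
    A_inv = right
    assert A * A_inv == eye(2)
else:                                   # rref(A) != I_n, so A is singular
    A_inv = None
```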
For A ∈ M_{m×n}(F), let ker_r(A) = {x ∈ M_{1×m}(F) : xA = 0} and ker_c(A) = {y ∈ M_{n×1}(F) : Ay = 0}.
Facts. Let A, B ∈ M_n(F).

(i) A ∈ GL(n, F) ⇔ ker_r(A) = {0} ⇔ ker_c(A) = {0}.
(ii) If AB ∈ GL(n, F), then A, B ∈ GL(n, F). In particular, if AB = I_n, then B = A^{-1} and BA = I_n.

Proof. (i) To see that ker_c(A) = {0} ⇒ A ∈ GL(n, F), note that if rref(A) ≠ I_n, then ker_c(A) ≠ {0}.
(ii) ker_c(B) ⊂ ker_c(AB) = {0}. So, B ∈ GL(n, F).
Congruence and similarity. Let A, B ∈ M_n(F). We say that

• A is congruent to B, denoted A ≅ B, if ∃P ∈ GL(n, F) such that A = P^T BP;
• A is similar to B, denoted A ∼ B, if ∃P ∈ GL(n, F) such that A = P^{-1}BP.

Canonical forms of matrices under similarity will be discussed in Chapter 4; canonical forms of symmetric matrices under congruence will be discussed in a later chapter.

Given P ∈ GL(n, F), the map φ : M_n(F) → M_n(F) defined by φ(A) = P^{-1}AP is an algebra isomorphism, i.e., φ preserves the addition, multiplication and scalar multiplication.
Exercises
1.1. Let A ∈ M_{m×n}(F) with rank A = r and let p > 0. Prove that ∃B ∈ M_{n×p}(F) such that rank B = min{n − r, p} and AB = 0.
1.2. For 1 ≤ i ≤ n let e_i = [0 . . . 0 1 0 . . . 0]^T ∈ F^n, with the 1 in the ith position.
(i) Let σ be a permutation of {1, . . . , n} and let

P_σ = [e_{σ(1)} ⋯ e_{σ(n)}].

P_σ is called the permutation matrix of σ. Prove that P_σ^{-1} = P_σ^T.
(ii) Let

A = [a_1, ⋯, a_n] ∈ M_{m×n}(F),  B = [b_1; ⋮; b_n] ∈ M_{n×p}(F),

where the a_j are the columns of A and the b_j are the rows of B. Prove that

AP_σ = [a_{σ(1)}, ⋯, a_{σ(n)}],  P_σ B = [b_{σ^{-1}(1)}; ⋮; b_{σ^{-1}(n)}].

Hence, multiplication of a matrix X by a permutation matrix from the left (right) permutes the rows (columns) of X. In particular, P_{στ} = P_σ P_τ if τ is another permutation of {1, . . . , n}.
1.3. Let A = [a_{ij}] ∈ M_{m×n}(F) and B = [b_{kl}] ∈ M_{p×q}(F). Define

A ⊗ B = [a_{11}B ⋯ a_{1n}B; ⋮; a_{m1}B ⋯ a_{mn}B] ∈ M_{mp×nq}(F).

(i) Prove that (A ⊗ B)^T = A^T ⊗ B^T.
(ii) Let C ∈ M_{n×r}(F) and D ∈ M_{q×s}(F). Prove that (A ⊗ B)(C ⊗ D) = AC ⊗ BD.
(iii) Let C = [c_{uv}] ∈ M_{r×s}(F). Prove that A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C.
(iv) Let σ be the permutation of {1, . . . , mp} defined by

σ((i − 1)p + k) = (k − 1)m + i for 1 ≤ i ≤ m, 1 ≤ k ≤ p,

and let τ be the permutation of {1, . . . , nq} defined by

τ((j − 1)q + l) = (l − 1)n + j for 1 ≤ j ≤ n, 1 ≤ l ≤ q.

Show that the (u, v)-entry of A ⊗ B is the (σ(u), τ(v))-entry of B ⊗ A. Namely,

P_σ^T (A ⊗ B) P_τ = B ⊗ A.

(Note. If m = n and p = q, then σ = τ.)
(v) Prove that rank(A ⊗ B) = (rank A)(rank B).
CHAPTER 2
The Determinant
2.1. Definition, Properties and Formulas
Let S_n be the set (group) of all permutations of {1, . . . , n}. A permutation σ ∈ S_n is denoted by

σ = (1 2 ⋯ n; σ(1) σ(2) ⋯ σ(n)).

A transposition is a swap of i, j ∈ {1, . . . , n} (i ≠ j) and is denoted by (i, j). Every σ ∈ S_n is a product of s transpositions. The number s is not uniquely determined by σ, but s (mod 2) is. Define sign(σ) = (−1)^s; σ is called an even (odd) permutation if sign(σ) = 1 (−1).
Definition 2.1. Let A = [a_{ij}] ∈ M_n(F). The determinant of A, denoted by det A or |A|, is defined to be

det A = ∑_{σ∈S_n} sign(σ) a_{1σ(1)} ⋯ a_{nσ(n)}.
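A brute-force evaluation of this defining sum (O(n!·n) operations, so only practical for small n) makes the formula concrete. The sketch below is our illustration, not from the text; the sign is computed by counting inversions, which agrees with the transposition-count definition of sign(σ) above.

```python
from itertools import permutations
from math import prod

def sign(s):
    # sign(s) = (-1)^{number of inversions}, which equals (-1)^s for any
    # factorization of s into s transpositions.
    inversions = sum(1 for i in range(len(s))
                       for j in range(i + 1, len(s)) if s[i] > s[j])
    return -1 if inversions % 2 else 1

def det(A):
    # det A = sum over sigma in S_n of sign(sigma) a_{1,sigma(1)} ... a_{n,sigma(n)}
    n = len(A)
    return sum(sign(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

assert det([[1, 2], [3, 4]]) == -2
```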
Easy facts.

(i) det A^T = det A.
(ii) det A is an F-linear function of every row and column of A.
(iii) If A has two identical rows (columns), then det A = 0.
Proof. (i)

det A^T = ∑_{σ∈S_n} sign(σ) a_{σ(1),1} ⋯ a_{σ(n),n}
        = ∑_{σ∈S_n} sign(σ^{-1}) a_{1,σ^{-1}(1)} ⋯ a_{n,σ^{-1}(n)}
        = det A.

(iii) Assume that the first two rows of A are identical. Let C be a set of representatives of the left cosets of 〈(1, 2)〉 in S_n. Then

det A = ∑_{σ∈C} (sign(σ) a_{1σ(1)} a_{2σ(2)} ⋯ a_{nσ(n)} + sign(σ·(1, 2)) a_{1σ(2)} a_{2σ(1)} a_{3σ(3)} ⋯ a_{nσ(n)}) = 0,

since a_{1σ(2)} a_{2σ(1)} = a_{1σ(1)} a_{2σ(2)} (the first two rows being equal) and sign(σ·(1, 2)) = −sign(σ).
Effect of elementary row and column operations on the determinant.
det[. . . αvi . . . ] = α det[. . . vi . . . ],
det[. . . vi . . . vj . . . ] = −det[. . . vj . . . vi . . . ],
det[. . . vi . . . vj + αvi . . . ] = det[. . . vi . . . vj . . . ].
Theorem 2.2 (The Laplace expansion). Let A ∈ M_n(F). For I, J ⊂ {1, . . . , n}, let A(I, J) denote the submatrix of A with row indices in I and column indices in J. Fix I ⊂ {1, . . . , n} with |I| = k. We have

det A = ∑_{J⊂{1,...,n}, |J|=k} (−1)^{∑_{i∈I} i + ∑_{j∈J} j} det A(I, J) det A(I^c, J^c),

where I^c = {1, . . . , n} \ I.
Lemma 2.3. Let

σ = (1 ⋯ k k+1 ⋯ n; i_1 ⋯ i_k i′_1 ⋯ i′_{n−k}) ∈ S_n,

where i_1 < ⋯ < i_k and i′_1 < ⋯ < i′_{n−k}. Then

sign(σ) = (−1)^{i_1+⋯+i_k + k(k+1)/2}.

Proof. We count the number of transpositions needed to permute i_1, . . . , i_k, i′_1, . . . , i′_{n−k} into 1, . . . , n. There are i_k − k integers in i′_1, . . . , i′_{n−k} that are < i_k. Thus, i_k − k transpositions are needed to move i_k to the right place. In general, i_t − t transpositions are needed to move i_t to the right place. So,

sign(σ) = (−1)^{∑_{t=1}^{k} (i_t − t)} = (−1)^{i_1+⋯+i_k + k(k+1)/2}.
Corollary 2.4. Let

σ = (i_1 ⋯ i_k i′_1 ⋯ i′_{n−k}; j_1 ⋯ j_k j′_1 ⋯ j′_{n−k}) ∈ S_n,

where i_1 < ⋯ < i_k, i′_1 < ⋯ < i′_{n−k}, j_1 < ⋯ < j_k, j′_1 < ⋯ < j′_{n−k}. Then

sign(σ) = (−1)^{i_1+⋯+i_k + j_1+⋯+j_k}.
Proof of Theorem 2.2. We have

det A = ∑_{σ∈S_n} sign(σ) a_{1σ(1)} ⋯ a_{nσ(n)} = ∑_{J⊂{1,...,n}, |J|=k} ∑_{σ∈S_n, σ(I)=J} sign(σ) a_{1σ(1)} ⋯ a_{nσ(n)}.

To compute the inner sum in the above, let I = {i_1, . . . , i_k}, I^c = {i′_1, . . . , i′_{n−k}}, J = {j_1, . . . , j_k}, J^c = {j′_1, . . . , j′_{n−k}}, where i_1 < ⋯ < i_k, i′_1 < ⋯ < i′_{n−k}, j_1 < ⋯ < j_k, j′_1 < ⋯ < j′_{n−k}, and

σ = (i_1 ⋯ i_k i′_1 ⋯ i′_{n−k}; j_{α(1)} ⋯ j_{α(k)} j′_{β(1)} ⋯ j′_{β(n−k)}),

where α ∈ S_k and β ∈ S_{n−k}. Then by Corollary 2.4,

sign(σ) = sign(α) sign(β) (−1)^{i_1+⋯+i_k + j_1+⋯+j_k}.

Therefore,

∑_{σ∈S_n, σ(I)=J} sign(σ) a_{1σ(1)} ⋯ a_{nσ(n)}
 = (−1)^{i_1+⋯+i_k + j_1+⋯+j_k} (∑_{α∈S_k} sign(α) a_{i_1 j_{α(1)}} ⋯ a_{i_k j_{α(k)}}) (∑_{β∈S_{n−k}} sign(β) a_{i′_1 j′_{β(1)}} ⋯ a_{i′_{n−k} j′_{β(n−k)}})
 = (−1)^{i_1+⋯+i_k + j_1+⋯+j_k} det A(I, J) det A(I^c, J^c).

Hence the theorem.
Corollary 2.5. Let A = [a_{ij}] ∈ M_n(F). We have

det A = ∑_{j=1}^{n} (−1)^{i+j} a_{ij} det A_{ij},  1 ≤ i ≤ n,

and

det A = ∑_{i=1}^{n} (−1)^{i+j} a_{ij} det A_{ij},  1 ≤ j ≤ n,

where A_{ij} is the submatrix of A obtained after deleting the ith row and the jth column.
Proposition 2.6. Let e_j = [0 . . . 0 1 0 . . . 0]^T ∈ F^m, with the 1 in the jth position. Let f : M_{m×n}(F) → F be such that

(i) f(A) is F-linear in every column of A;
(ii) f(A) = 0 whenever A has two identical columns;
(iii) f([e_{j_1} . . . e_{j_n}]) = 0 for all 1 ≤ j_1 < ⋯ < j_n ≤ m (this condition becomes vacuous when m < n).

Then f(A) = 0 for all A ∈ M_{m×n}(F).

Proof. 1° f([v_1 . . . v_i . . . v_j . . . v_n]) = −f([v_1 . . . v_j . . . v_i . . . v_n]). In fact,

0 = f([. . . v_i + v_j . . . v_i + v_j . . . ])
  = f([. . . v_i . . . v_i . . . ]) + f([. . . v_i . . . v_j . . . ])
  + f([. . . v_j . . . v_i . . . ]) + f([. . . v_j . . . v_j . . . ])
  = f([. . . v_i . . . v_j . . . ]) + f([. . . v_j . . . v_i . . . ]).

2° Each column of A is a linear combination of e_1, . . . , e_m. By (i), f(A) is a linear combination of f([e_{j_1} . . . e_{j_n}]), where j_1, . . . , j_n ∈ {1, . . . , m}. Thus, it suffices to show f([e_{j_1} . . . e_{j_n}]) = 0. If j_1, . . . , j_n are not all distinct, by (ii), f([e_{j_1} . . . e_{j_n}]) = 0. If j_1, . . . , j_n are all distinct, by 1°, we may assume 1 ≤ j_1 < ⋯ < j_n ≤ m. By (iii), f([e_{j_1} . . . e_{j_n}]) = 0.
Corollary 2.7. det : M_n(F) → F is the unique function such that

(i) det A is F-linear in every column of A;
(ii) det A = 0 whenever A has two identical columns;
(iii) det I_n = 1.
Theorem 2.8 (Cauchy-Binet). Let A ∈ M_{n×m}(F) and B ∈ M_{m×n}(F). Let I = {1, . . . , n}. Then

(2.1) det(AB) = ∑_{J⊂{1,...,m}, |J|=n} det A(I, J) det B(J, I).

In particular,

det(AB) = 0 if n > m, and det(AB) = (det A)(det B) if n = m.

Proof. Fix A ∈ M_{n×m}(F) and let f(B) be the difference of the two sides of (2.1). Then f : M_{m×n}(F) → F satisfies (i) – (iii) in Proposition 2.6.
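Formula (2.1) can be spot-checked by machine. The following sketch is our illustration (randMatrix and its seed argument are SymPy conveniences): it sums det A(I, J) det B(J, I) over all n-element subsets J of {1, . . . , m} and compares with det(AB).

```python
from itertools import combinations
import sympy as sp

n, m = 2, 4
A = sp.randMatrix(n, m, min=-3, max=3, seed=1)
B = sp.randMatrix(m, n, min=-3, max=3, seed=2)

# Right-hand side of (2.1): J runs over the n-element column sets of A,
# which are also the n-element row sets of B.
rhs = sum(A[:, list(J)].det() * B[list(J), :].det()
          for J in combinations(range(m), n))
assert (A * B).det() == rhs
```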
Proposition 2.9 (The adjoint matrix). For A ∈ M_n(F), define

adj(A) = [(−1)^{i+j} det A_{ij}]^T ∈ M_n(F).

We have

A adj(A) = adj(A) A = (det A) I_n.

Moreover, A is invertible ⇔ det A ≠ 0. When det A ≠ 0, A^{-1} = (1/det A) adj(A).

Proof. Let A = [a_{ij}] = [v_1, . . . , v_n]. Then the (i, j) entry of adj(A)A is

∑_{k=1}^{n} (−1)^{i+k} (det A_{ki}) a_{kj} = det[v_1, . . . , v_{i−1}, v_j, v_{i+1}, . . . , v_n] = det A if i = j, and 0 if i ≠ j.

So, adj(A)A = (det A) I_n.
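SymPy implements the adjoint (adjugate) matrix directly, which gives a quick check of Proposition 2.9 on a concrete matrix of our choosing:

```python
import sympy as sp

A = sp.Matrix([[1, 2, 0], [0, 1, 3], [4, 0, 1]])   # det A = 25
adjA = A.adjugate()                                 # [(-1)^{i+j} det A_{ij}]^T

assert A * adjA == A.det() * sp.eye(3)
assert adjA * A == A.det() * sp.eye(3)
assert A.inv() == adjA / A.det()                    # valid since det A != 0
```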
2.2. Techniques for Computing Determinants
Example 2.10 (The Vandermonde determinant). For a_1, . . . , a_n ∈ F, let

V(a_1, . . . , a_n) = det [1 1 ⋯ 1; a_1 a_2 ⋯ a_n; ⋮; a_1^{n−1} a_2^{n−1} ⋯ a_n^{n−1}].

Then

V(a_1, . . . , a_n) = ∏_{1≤i<j≤n} (a_j − a_i).

Proof. Method 1. Subtract a_1 × (row n−1) from row n, . . . , a_1 × (row 1) from row 2 ⇒

V(a_1, . . . , a_n) = det [1 1 ⋯ 1; 0 a_2 − a_1 ⋯ a_n − a_1; 0 a_2(a_2 − a_1) ⋯ a_n(a_n − a_1); ⋮; 0 a_2^{n−2}(a_2 − a_1) ⋯ a_n^{n−2}(a_n − a_1)]
 = V(a_2, . . . , a_n) ∏_{j=2}^{n} (a_j − a_1)
 = ∏_{1≤i<j≤n} (a_j − a_i)  (by induction).

Method 2. Assume a_1, . . . , a_{n−1} are all distinct. V(a_1, . . . , a_{n−1}, x) is a polynomial of degree n − 1 with leading coefficient V(a_1, . . . , a_{n−1}) and has a_1, . . . , a_{n−1} as roots. So, V(a_1, . . . , a_{n−1}, x) = V(a_1, . . . , a_{n−1}) ∏_{j=1}^{n−1} (x − a_j). Use induction.
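The product formula can be verified symbolically for a fixed small n; the sketch below (ours, with n = 4) builds the Vandermonde matrix and compares its determinant with ∏_{i<j}(a_j − a_i).

```python
import sympy as sp

n = 4
a = sp.symbols(f'a1:{n + 1}')                 # a1, ..., a4
V = sp.Matrix(n, n, lambda i, j: a[j]**i)     # row i holds a_j^i, i = 0..n-1

expected = sp.prod(a[j] - a[i] for i in range(n) for j in range(i + 1, n))
assert sp.expand(V.det() - expected) == 0
```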
Example 2.11. Let a_1, . . . , a_n, b_1, . . . , b_n ∈ F such that a_i + b_j ≠ 0 for all i, j. Then

det [1/(a_i + b_j)] = ∏_{i<j} (a_i − a_j)(b_i − b_j) / ∏_{i,j} (a_i + b_j).

Proof. We may assume that a_1, . . . , a_n are all distinct and so are b_1, . . . , b_n. Denote the determinant by f(a_1, . . . , a_n; b_1, . . . , b_n). Let x be an indeterminate. Then f(x, a_2, . . . , a_n; b_1, . . . , b_n) ∏_{j=1}^{n} (x + b_j) is a polynomial of degree n − 1 with leading coefficient

det [1 ⋯ 1; 1/(a_2+b_1) ⋯ 1/(a_2+b_n); ⋮; 1/(a_n+b_1) ⋯ 1/(a_n+b_n)] =: g(a_2, . . . , a_n; b_1, . . . , b_n)

and has a_2, . . . , a_n as roots. So,

(2.2) f(x, a_2, . . . , a_n; b_1, . . . , b_n) ∏_{j=1}^{n} (x + b_j) = g(a_2, . . . , a_n; b_1, . . . , b_n) ∏_{i=2}^{n} (x − a_i).

Similarly, g(a_2, . . . , a_n; x, b_2, . . . , b_n) ∏_{i=2}^{n} (a_i + x) is a polynomial of degree n − 1 with leading coefficient f(a_2, . . . , a_n; b_2, . . . , b_n) and has b_2, . . . , b_n as roots. So,

(2.3) g(a_2, . . . , a_n; x, b_2, . . . , b_n) ∏_{i=2}^{n} (a_i + x) = f(a_2, . . . , a_n; b_2, . . . , b_n) ∏_{j=2}^{n} (x − b_j).

By (2.2) (with x = a_1) and (2.3) (with x = b_1), we have

f(a_1, . . . , a_n; b_1, . . . , b_n) ∏_{i=1 or j=1} (a_i + b_j) = f(a_2, . . . , a_n; b_2, . . . , b_n) ∏_{j=2}^{n} (a_1 − a_j)(b_1 − b_j).

The conclusion follows by induction.
Example 2.12 (Circulant matrix). Let a_0, . . . , a_{n−1} ∈ C and

C(a_0, . . . , a_{n−1}) = [a_0 a_1 ⋯ a_{n−1}; a_{n−1} a_0 a_1 ⋯; ⋮ ⋱ ⋮; a_1 ⋯ a_{n−1} a_0]

(each row is the previous row shifted one step to the right). Put

A = [0 1; 0 1; ⋱ ⋱; 0 1; 1 0]

(the n × n cyclic permutation matrix, with 1's on the superdiagonal and in the (n, 1) position). Then

C(a_0, . . . , a_{n−1}) = a_0 A^0 + a_1 A^1 + ⋯ + a_{n−1} A^{n−1}.

Let ε = e^{2πi/n} and let V = [ε^{(i−1)(j−1)}]_{1≤i,j≤n}. Then

AV = V diag(1, ε, . . . , ε^{n−1}).

Thus

A ∼ diag(1, ε, . . . , ε^{n−1})

and

C(a_0, . . . , a_{n−1}) ∼ diag(∑_{i=0}^{n−1} a_i ε^{0·i}, . . . , ∑_{i=0}^{n−1} a_i ε^{(n−1)i}).

So,

det C(a_0, . . . , a_{n−1}) = ∏_{j=0}^{n−1} (∑_{i=0}^{n−1} a_i ε^{ji}).
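Numerically, the inner sums ∑_i a_i ε^{ji} are exactly the discrete Fourier transform of (a_0, . . . , a_{n−1}), so the determinant formula can be checked with an FFT. A small sketch (our illustration, with an arbitrary real vector a):

```python
import numpy as np

a = np.array([2.0, 5.0, 0.0, 1.0])                  # a_0, ..., a_{n-1}
n = len(a)
C = np.array([np.roll(a, k) for k in range(n)])     # C(a_0, ..., a_{n-1})

# NumPy's FFT uses e^{-2*pi*i/n}, but the multiset of values, and hence
# their product, is the same as with eps = e^{2*pi*i/n}.
eigs = np.fft.fft(a)
assert np.allclose(np.linalg.det(C), np.prod(eigs).real)
```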
Exercises
2.1. Compute the (2n) × (2n) determinant

|a_1                b_{2n}|
|   ⋱            ⋰       |
|     a_n   b_{n+1}       |
|     b_n   a_{n+1}       |
|   ⋰            ⋱       |
|b_1                a_{2n}|

(a_1, . . . , a_{2n} on the main diagonal, b_{2n}, b_{2n−1}, . . . , b_1 on the antidiagonal from the upper right corner to the lower left corner, and 0 elsewhere).
2.2. (Tridiagonal determinant) Let a, b, c ∈ C and define the n × n determinant

D_n = det [a b; c a b; c a b; ⋱ ⋱ ⋱; c a b; c a],  n ≥ 1,

with a on the diagonal, b on the superdiagonal and c on the subdiagonal.

(i) Prove that D_n = aD_{n−1} − bcD_{n−2} for n ≥ 3.
(ii) Prove that

D_n = (α^{n+1} − β^{n+1})/(α − β) if a^2 − 4bc ≠ 0, and D_n = (n + 1)(a/2)^n if a^2 − 4bc = 0,

where α = (a + √(a^2 − 4bc))/2, β = (a − √(a^2 − 4bc))/2.
2.3. Use Example 2.11 to compute the determinant of the Hilbert matrix H_n = [1/(i + j)]_{1≤i,j≤n}.
2.4. Prove that

det [1 sin x_1 cos x_1 sin 2x_1 cos 2x_1 ⋯ sin nx_1 cos nx_1;
     1 sin x_2 cos x_2 sin 2x_2 cos 2x_2 ⋯ sin nx_2 cos nx_2;
     ⋮;
     1 sin x_{2n+1} cos x_{2n+1} sin 2x_{2n+1} cos 2x_{2n+1} ⋯ sin nx_{2n+1} cos nx_{2n+1}]
 = (−1)^n 2^{2n^2} ∏_{1≤j<k≤2n+1} sin((x_k − x_j)/2).
2.5. Prove that

det [sin x_1 cos x_1 sin 2x_1 cos 2x_1 ⋯ sin nx_1 cos nx_1;
     sin x_2 cos x_2 sin 2x_2 cos 2x_2 ⋯ sin nx_2 cos nx_2;
     ⋮;
     sin x_{2n} cos x_{2n} sin 2x_{2n} cos 2x_{2n} ⋯ sin nx_{2n} cos nx_{2n}]
 = (−1)^n 2^{2n^2} (1/(n + 1)) (∏_{1≤j<k≤2n} sin((x_k − x_j)/2)) ∑_{s=0}^{n} ∏_{j=1}^{2n} sin(x_j/2 − πs/(n + 1)).
2.6. Let A ∈ M_{m×n}(F) and B ∈ M_{p×q}(F), where mp = nq. Prove that

det(A ⊗ B) = (det A)^p (det B)^m if m = n and p = q, and det(A ⊗ B) = 0 otherwise.
2.7. (Maillet's determinant) Let p be an odd prime. For each i, j ∈ {1, . . . , (p − 1)/2}, let m(i, j) ∈ {1, . . . , p − 1} be such that j·m(i, j) ≡ i (mod p). (When viewed as an element of Z_p, m(i, j) = i/j.) Let

D_p = det[m(i, j)].

For example,

D_7 = det [1 4 5; 2 1 3; 3 5 1].

Compute D_p for p ≤ 19 using a computer. Make a conjecture about |D_p|. Then compute D_{23}.
CHAPTER 3
Vector Spaces and Linear Transformations
3.1. Basic Definitions
Definition 3.1. A vector space over a field F is an abelian group (V, +) equipped with a scalar multiplication F × V → V, (α, x) ↦ αx, such that for all x, y ∈ V and α, β ∈ F,

(i) α(x + y) = αx + αy;
(ii) (α + β)x = αx + βx;
(iii) α(βx) = (αβ)x;
(iv) 1x = x.
Examples of vector spaces.

• F^n, where F is a field. More generally, let V be a vector space over F and X any set. Then V^X = the set of all functions from X to V is a vector space over F.
• If F is a subfield of K, K is a vector space over F.
• M_{m×n}(F), F[x], etc.
• The solution set of a linear system, a linear difference equation, a linear differential equation, etc.
• For p > 0, ℓ^p = {{a_n}_{n=1}^∞ : a_n ∈ C, ∑_{n=1}^∞ |a_n|^p < ∞}. (Closure under addition: |a_n + b_n|^p ≤ (2 max{|a_n|, |b_n|})^p = 2^p max{|a_n|^p, |b_n|^p} ≤ 2^p (|a_n|^p + |b_n|^p).)
Subspaces. Let V be a vector space over F. A subset W ⊂ V is called a subspace of V if W is a vector space over F under the same addition and scalar multiplication of V. W is a subspace of V ⇔ W ≠ ∅ and W is closed under addition and scalar multiplication.
Linear transformations. Let V and W be vector spaces over F. A function f : V → W is called a linear transformation (or an F-map) if for all x, y ∈ V and α ∈ F, f(x + y) = f(x) + f(y) and f(αx) = αf(x). A bijective F-map is called an isomorphism. If ∃ an isomorphism f : V → W, we say that V is isomorphic to W and write V ≅ W; in this case, f^{-1} : W → V is also an isomorphism. An injective F-map f : V → W is called an embedding. Hom_F(V, W) = the set of all F-maps from V to W; it is a subspace of W^V. An F-map f : V → V is also called a linear operator of V. Hom_F(V, V) is denoted by End_F(V).
Easy fact. Let f : V → W be a linear transformation. Then f(V) is a subspace of W. If W_1 is a subspace of W, then f^{-1}(W_1) is a subspace of V. In particular, ker f := f^{-1}(0) is a subspace of V. f is 1-1 ⇔ ker f = {0}.
Easy fact. Let V be a vector space over F and {V_i : i ∈ I} a family of subspaces of V.

(i) ⋂_{i∈I} V_i is a subspace of V.
(ii) Define

∑_{i∈I} V_i = {∑_{i∈I} u_i : u_i ∈ V_i, u_i ≠ 0 for only finitely many i ∈ I}.

Then ∑_{i∈I} V_i is the smallest subspace of V containing ⋃_{i∈I} V_i.
Direct product and external direct sum. Let {V_i : i ∈ I} be a family of vector spaces over F. Let

∏_{i∈I} V_i = {(u_i)_{i∈I} : u_i ∈ V_i, i ∈ I}  (the cartesian product of {V_i : i ∈ I}).

Then ∏_{i∈I} V_i is a vector space over F with addition and scalar multiplication defined componentwise; ∏_{i∈I} V_i is called the direct product of {V_i : i ∈ I}.

⊕^{ext}_{i∈I} V_i := {(u_i) ∈ ∏_{i∈I} V_i : u_i = 0 for all but finitely many i}

is a subspace of ∏_{i∈I} V_i. ⊕^{ext}_{i∈I} V_i is called the external direct sum of {V_i : i ∈ I}. When |I| < ∞, ∏_{i∈I} V_i = ⊕^{ext}_{i∈I} V_i.
Internal direct sum. Let V be a vector space over F and {V_i : i ∈ I} a family of subspaces of V. If

V_i ∩ (∑_{j∈I, j≠i} V_j) = {0} for all i ∈ I,

then ∑_{i∈I} V_i is called an internal direct sum and is denoted by ⊕_{i∈I} V_i.
Easy facts.

(i) ∑_{i∈I} V_i is an internal direct sum ⇔ every u ∈ ∑_{i∈I} V_i has a unique representation u = ∑_{i∈I} u_i, where u_i ∈ V_i and u_i = 0 for all but finitely many i.
(ii) ⊕_{i∈I} V_i ≅ ⊕^{ext}_{i∈I} V_i. (For this reason, we usually do not distinguish internal and external direct sums. ⊕^{ext} is also denoted by ⊕.)
Spans, Spanning Sets and Linearly Independent Sets. Let V be a vector space over F and let S ⊂ V. The span of S, denoted by 〈S〉 or span S, is

〈S〉 = span S := {a_1u_1 + ⋯ + a_nu_n : n ≥ 0, u_i ∈ S, a_i ∈ F}.

〈S〉 is the smallest subspace of V containing S. If V = 〈S〉, S is called a spanning set of V.

A subset S ⊂ V is called a linearly independent set if for any distinct u_1, . . . , u_n ∈ S and any a_1, . . . , a_n ∈ F not all zero, a_1u_1 + ⋯ + a_nu_n ≠ 0.
Theorem 3.2. Let V be a vector space over F and S ⊂ V. Then the following statements are equivalent.

(i) S is a maximal linearly independent set of V.
(ii) S is a minimal spanning set of V.
(iii) S is a linearly independent spanning set of V.
(iv) Every element of V is a unique linear combination of elements in S.
Proof. (i) ⇔ (iii). (ii) ⇔ (iii). (iv) ⇔ (iii).

By Zorn's lemma, maximal linearly independent sets of V exist. A subset S ⊂ V satisfying one of (i) – (iv) in Theorem 3.2 is called a basis of V.
Proposition 3.3. Let V and W be vector spaces over F and let X be a basis of V. Then every function f : X → W can be extended to a unique F-map f̄ : V → W.

Proof. Define

f̄ : V → W,  ∑_{x∈X} a_x x ↦ ∑_{x∈X} a_x f(x).
Corollary 3.4. Let V and W be vector spaces over F. Let S be a subspace of V and f : S → W an F-map. Then f can be extended to an F-map g : V → W.

Proof. Let X be a basis of S. Extend X to a basis Y of V. Extend f|_X to a function f_1 : Y → W. By Proposition 3.3, f_1 can be extended to an F-map g : V → W.
Theorem 3.5. Any two bases of a vector space have the same cardinality.
Proof. Let V be a vector space over F and let X, Y be two bases of V.

1° Assume that |X| < ∞ and |Y| < ∞. Write X = {x_1, . . . , x_n} and Y = {y_1, . . . , y_m}. Assume to the contrary that n > m. Then

[x_1; ⋮; x_n] = A [y_1; ⋮; y_m],  [y_1; ⋮; y_m] = B [x_1; ⋮; x_n]

for some matrices A ∈ M_{n×m}(F) and B ∈ M_{m×n}(F). It follows that AB = I_n. There exists C ∈ GL(n, F) such that CA = [∗; 0 ⋯ 0] (last row zero). Thus (0, . . . , 0, 1)C = (0, . . . , 0, 1)CAB = 0, a contradiction.

2° Assume |X| = ∞. We claim that |Y| = ∞. (Otherwise, X is spanned by Y, which is spanned by a finite subset of X. So, X is spanned by a finite subset of X, a contradiction.) For each x ∈ X, ∃ a finite subset {y_1, . . . , y_n} ⊂ Y such that x = a_1y_1 + ⋯ + a_ny_n, a_i ∈ F. Define f(x) = {y_1, . . . , y_n}. We claim that ⋃_{x∈X} f(x) = Y. (Otherwise, X is spanned by Y_1 := ⋃_{x∈X} f(x) ⊊ Y; hence Y is spanned by Y_1, a contradiction.) Now,

|Y| = |⋃_{x∈X} f(x)| ≤ |X|·ℵ_0 = |X|.

By symmetry, |X| ≤ |Y|. So, |X| = |Y|.
Dimension. Let V be a vector space over F with a basis X. Define dim V (or dim_F V) = |X|. We have

V = ⊕_{x∈X} Fx ≅ ⊕^{ext}_{x∈X} F = F^{|X|}.
Caution. Let F be a field and X a set. F^{|X|} is the direct sum of |X| copies of F, i.e., F^{|X|} = ⊕_{x∈X} F. However, F^X is the F-vector space of all functions from X to F, i.e., F^X = ∏_{x∈X} F.
Examples. dim F^n = n. dim F[x] = ℵ_0. Let S_n(F) be the set of all n × n symmetric matrices over F and U_n(F) the set of all n × n upper triangular matrices over F. Then dim S_n(F) = dim U_n(F) = n(n + 1)/2.
Example 3.6. If V is a vector space over F such that |V| = ∞ and |V| > |F|, then dim V = |V|. (E.g., dim_Q R = ℵ.)

Proof. Let X be a basis of V. Clearly, |X| = ∞. (If |X| < ∞, since |F| < |V| = ∞, we would have |V| = |F|^{|X|} < |V|, a contradiction.) Let P_0(X) be the set of all finite subsets of X. Then

|V| = |⋃_{S∈P_0(X)} 〈S〉|
    ≤ |P_0(X)| · max{|F|, ℵ_0}  (since |〈S〉| = |F|^{|S|} ≤ max{|F|, ℵ_0})
    = |X| · max{|F|, ℵ_0} = max{|X|, |F|}.

Since |V| > |F|, we must have |V| ≤ |X|.
Example. Let F be a subfield of K and V a vector space over K. Then V is naturally a vector space over F. Moreover,

dim_F V = dim_K V · dim_F K.

Proof. Let X be a basis of V over K and Y a basis of K over F. Then as (y, x) runs over Y × X, the products yx are all distinct, and YX = {yx : y ∈ Y, x ∈ X} is a basis of V over F.
Easy facts.

(i) Two vector spaces V and W over F are isomorphic iff dim V = dim W.
(ii) dim ⊕_{i∈I} V_i = ∑_{i∈I} dim V_i.
Example. Let A ∈ M_{m×n}(F). The row (column) space of A, denoted by R(A) (C(A)), is the subspace of F^n (F^m) spanned by the rows (columns) of A. The nonzero rows (columns) of rref(A) (rcef(A)) form a basis of R(A) (C(A)); dim R(A) = dim C(A) = rank A.
3.2. Quotient Spaces and Isomorphism Theorems
The quotient space. Let S be a subspace of a vector space V over F. Recall that the quotient abelian group V/S = {u + S : u ∈ V}, where the addition in V/S is defined by (u + S) + (v + S) = (u + v) + S. Define a scalar multiplication in V/S similarly: for u + S ∈ V/S and α ∈ F, let α(u + S) = αu + S. The scalar multiplication is well defined and V/S becomes a vector space over F. V/S is called the quotient space of V by S. The map

π : V → V/S,  u ↦ u + S

is an onto F-map with ker π = S. π is called the canonical projection from V to V/S.
Proposition 3.7. Let S ⊂ V be vector spaces over F. Let {ε_i : i ∈ I} be a basis of S and {δ_j + S : j ∈ J} a basis of V/S. Then {ε_i : i ∈ I} ∪ {δ_j : j ∈ J} is a basis of V. So, V ≅ S ⊕ V/S and dim V = dim S + dim V/S. If dim V < ∞, then dim V/S = dim V − dim S.
Easy fact (The correspondence theorem). Let S ⊂ V be vector spaces over F. Let A be the set of all subspaces of V containing S and B the set of all subspaces of V/S. Then

A → B,  W ↦ W/S

is a bijection.
Theorem 3.8 (The universal mapping property of the quotient space). Let S ⊂ V be vector spaces over F. Let W be another vector space over F and f : V → W an F-map such that ker f ⊃ S. Then ∃! F-map f̄ : V/S → W such that f = f̄ ∘ π (i.e., the triangle formed by f, π and f̄ commutes). Moreover, f̄(V/S) = f(V) and ker f̄ = ker f / S.

Proof. Define f̄ : V/S → W, u + S ↦ f(u).
Theorem 3.9 (The first isomorphism theorem). Let f : V → W be an F-map. Then V/ker f ≅ f(V).

Proof. By Theorem 3.8, ∃ an F-map f̄ : V/ker f → W such that f = f̄ ∘ π, where π : V → V/ker f is the canonical projection, and f̄(V/ker f) = f(V), ker f̄ = ker f / ker f = {0 + ker f}.
Theorem 3.10 (The second isomorphism theorem). Let V be a vector space over F and S, T subspaces of V. Then (S + T)/T ≅ S/(S ∩ T).

Proof. Define an F-map

f : S → (S + T)/T,  s ↦ s + T.

f is onto with ker f = S ∩ T. Use the first isomorphism theorem.
Theorem 3.11 (The third isomorphism theorem). Let S ⊂ T ⊂ V be vector spaces over F. Then (V/S)/(T/S) ≅ V/T.

Proof. Define an F-map f : V/S → V/T, v + S ↦ v + T. Then f is onto and ker f = T/S.
Corollary 3.12.

(i) If f : V → W is an F-map, then

dim V = null f + rank f,

where null f := dim(ker f) and rank f := dim f(V).
(ii) Let S, T be subspaces of V. Then

dim S + dim T = dim(S + T) + dim(S ∩ T).

Proof. (ii) Define an F-map

f : S × T → S + T,  (s, t) ↦ s + t.

Then f is onto and ker f = {(s, −s) : s ∈ S ∩ T} ≅ S ∩ T. Hence

dim S + dim T = dim(S × T) = dim(S + T) + dim(S ∩ T).
3.3. Finite Dimensional Vector Spaces
Facts.
(i) If S ⊂ V are vector spaces over F such that dim S = dim V < ∞, then S = V.
(ii) Let f : V → W be an F-map, where dim V = dim W < ∞. Then f is 1-1 ⇔ f is onto.

Proof. (i) dim V/S = 0 ⇒ V = S.

Note. When dim V = ∞, both (i) and (ii) are false.
Let V be an n-dimensional vector space over F with an (ordered) basis E = (ε_1, . . . , ε_n) and W an m-dimensional vector space over F with an (ordered) basis (δ_1, . . . , δ_m). Let f : V → W be an F-map. Then

(f(ε_1), . . . , f(ε_n)) = (δ_1, . . . , δ_m) A

for some A ∈ M_{m×n}(F). The map f ↦ A is an isomorphism Hom_F(V, W) → M_{m×n}(F). We have rank f = rank A and null f = null A. (null A := dim{x ∈ F^n : Ax = 0}.) If f ∈ End_F(V) (= Hom_F(V, V)), we have

(f(ε_1), . . . , f(ε_n)) = (ε_1, . . . , ε_n) A

for some A ∈ M_n(F). A is called the E-matrix of f. If E′ = (ε′_1, . . . , ε′_n) is another (ordered) basis of V and B is the E′-matrix of f, then

B = P^{-1}AP,

where P ∈ GL(n, F) is defined by (ε′_1, . . . , ε′_n) = (ε_1, . . . , ε_n)P. (Proof. f(ε′_1, . . . , ε′_n) = f((ε_1, . . . , ε_n)P) = f(ε_1, . . . , ε_n)P = (ε_1, . . . , ε_n)AP = (ε′_1, . . . , ε′_n)P^{-1}AP.) The map End_F(V) → M_n(F), f ↦ A, is not only an F-isomorphism but also preserves the multiplication; the map is an algebra isomorphism.
Facts about ranks of matrices. Let A, B ∈ M_{m×n}(F) and C ∈ M_{n×p}(F).

(i) rank A = max{r : A has an r × r invertible submatrix}.
(ii) rank(A + B) ≤ rank A + rank B.
(iii) rank A + rank C − n ≤ rank AC ≤ min{rank A, rank C}.
(iv) If P ∈ GL(m, F) and Q ∈ GL(n, F), then rank PAQ = rank A.
Proof. (iii) Method 1. Define

f : F^n/C(C) → C(A)/C(AC),  x + C(C) ↦ Ax + C(AC).

Then f is a well defined onto F-map. So, dim(F^n/C(C)) ≥ dim(C(A)/C(AC)). Hence the result.

Method 2. May assume A = [I_r 0; 0 0], where r = rank A. Write C = [C_1; C_2], where C_1 is of size r × p. Then rank AC = rank C_1 and rank C_1 + n − r ≥ rank C. Hence the result.
Homogeneous linear ordinary differential equations (ODE). Let F = R or C. Let I ⊂ R be an open interval. Let A : I → M_n(F) be a continuous function. Let x(t) ∈ F^n denote an unknown function of a real variable t. For each t_0 ∈ I and x_0 ∈ F^n, the initial value problem

(3.1) x′(t) = A(t)x(t),  x(t_0) = x_0

has a unique solution x(t) defined on I. (This is a special case of existence and uniqueness theorems in ODE.)

Let D(I) be the F-vector space of all differentiable functions from I to F^n and let (F^n)^I be the F-vector space of all functions from I to F^n. Then

L : D(I) → (F^n)^I,  x(t) ↦ x′(t) − A(t)x(t)

is an F-map. The homogeneous linear ODE x′(t) = A(t)x(t) becomes L(x) = 0; its solution set is ker L. The existence and uniqueness of the solution of (3.1) is equivalent to the following statement: the F-map

(3.2) ker L → F^n,  x ↦ x(t_0)

is an isomorphism. Therefore dim_F ker L = n.

Let x_1, . . . , x_n ∈ ker L. φ(t) = det[x_1(t), . . . , x_n(t)] is called the Wronskian of x_1, . . . , x_n. By the isomorphism (3.2), {x_1, . . . , x_n} is a basis of ker L ⇔ φ(t_0) ≠ 0. Since t_0 ∈ I is arbitrary, φ(t_0) ≠ 0 ⇔ φ(t) ≠ 0 ∀t ∈ I.

The Wronskian φ(t) is explicitly given by its initial value φ(t_0):

(3.3) φ(t) = φ(t_0) exp(∫_{t_0}^{t} Tr(A(τ)) dτ).
Proof of (3.3). We have

φ′(t) = d/dt (det[x_1(t), . . . , x_n(t)])
      = ∑_{i=1}^{n} det[x_1(t), . . . , x_{i−1}(t), x′_i(t), x_{i+1}(t), . . . , x_n(t)]  (the product rule)
      = ∑_{i=1}^{n} det[x_1(t), . . . , x_{i−1}(t), A(t)x_i(t), x_{i+1}(t), . . . , x_n(t)]
      = Tr A(t) det[x_1(t), . . . , x_n(t)]  (by the next lemma)
      = Tr A(t) φ(t).

It follows that φ(t) = φ(t_0) exp(∫_{t_0}^{t} Tr(A(τ)) dτ).
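For a constant coefficient matrix the columns of e^{tA} solve x′ = Ax with initial values e_1, . . . , e_n, so their Wronskian is det(e^{tA}) and (3.3) specializes to det(e^{tA}) = e^{t·Tr A}. A quick numerical sanity check (our sketch, using SciPy's matrix exponential on an arbitrary 2 × 2 matrix):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])    # arbitrary constant coefficients

for t in (0.5, 1.0, 2.0):
    phi = np.linalg.det(expm(t * A))         # Wronskian of the columns of e^{tA}
    assert np.isclose(phi, np.exp(np.trace(A) * t))   # (3.3) with phi(0) = 1
```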
Let a_0(t), . . . , a_{n−1}(t) ∈ F be continuous functions of t ∈ I and x(t) ∈ F an unknown function. Then the nth order linear ODE

(3.4) x^{(n)}(t) + a_{n−1}(t)x^{(n−1)}(t) + ⋯ + a_0(t)x(t) = 0

is equivalent to y′(t) = A(t)y(t), where

y(t) = [x(t); x′(t); ⋮; x^{(n−1)}(t)],  A(t) = [0 1; 0 1; ⋱ ⋱; 0 1; −a_0(t) −a_1(t) ⋯ −a_{n−1}(t)].

Let S be the solution set of (3.4). Then for each t_0 ∈ I,

S → F^n,  x(t) ↦ (x(t_0), x′(t_0), . . . , x^{(n−1)}(t_0))^T

is an isomorphism. If x_1, . . . , x_n ∈ S, their Wronskian is

φ(t) = det [x_1(t) ⋯ x_n(t); x′_1(t) ⋯ x′_n(t); ⋮; x_1^{(n−1)}(t) ⋯ x_n^{(n−1)}(t)].

We have

φ(t) = φ(t_0) exp(−∫_{t_0}^{t} a_{n−1}(τ) dτ).

x_1, . . . , x_n form a basis of S ⇔ φ(t_0) ≠ 0 ⇔ φ(t) ≠ 0 ∀t ∈ I.
Lemma 3.13. Let A, B ∈ M_n(F) and write B = [b_1, . . . , b_n]. Then

(3.5) ∑_{i=1}^{n} det[b_1, . . . , b_{i−1}, Ab_i, b_{i+1}, . . . , b_n] = (Tr A)(det B).

Proof. Fix A = [a_{ij}] and let f(B) be the difference of the two sides of (3.5). We only have to show that f satisfies (i) – (iii) of Proposition 2.6. (i) is obvious.

(ii) Assume b_1 = b_2. Then det B = 0, and

f(B) = det[Ab_1, b_2, b_3, . . . , b_n] + det[b_1, Ab_2, b_3, . . . , b_n] = 0.

(iii)

∑_{i=1}^{n} det[e_1, . . . , e_{i−1}, Ae_i, e_{i+1}, . . . , e_n] = ∑_{i=1}^{n} det[e_1, . . . , e_{i−1}, [a_{1i}; ⋮; a_{ni}], e_{i+1}, . . . , e_n] = ∑_{i=1}^{n} a_{ii} = Tr A = (Tr A)(det I_n),

so f([e_1, . . . , e_n]) = 0.
3.4. The Dual Space
Let V be a vector space over F. Hom_F(V, F) is called the dual space of V and is denoted by V*.

Let B be a basis of V. For each v ∈ B, ∃! v′ ∈ V* such that

v′(u) = 1 if u = v, and v′(u) = 0 if u ∈ B \ {v}.

It is easy to see that {v′ : v ∈ B} is linearly independent in V*. Thus, B → V*, v ↦ v′, extends to an embedding V → V*. (Note. This embedding depends on the choice of the basis B.)

If dim V = n < ∞, then dim V* = n. (Recall that Hom_F(F^n, F) ≅ M_{1×n}(F).) So, the above embedding V → V* is an isomorphism. {v′ : v ∈ B} is a basis of V* and is called the dual basis of B.
Theorem 3.14. Let V be a vector space over F such that dim V = ∞. Then dim V* = |V*| = |F|^{dim V}.

Proof. Let B be a basis of V. Then |V*| = |F^B| = |F|^{dim V}.
Case 1. Assume |F|^{dim V} > |F|. By Example 3.6, dim V* = |V*| = |F|^{dim V}.
Case 2. Assume |F|^{dim V} = |F|. Let b_0, b_1, · · · ∈ B be distinct. For each a ∈ F, choose f_a ∈ V* such that f_a(b_j) = a^j, j ≥ 0. We claim that {f_a : a ∈ F} is linearly independent: for distinct a_1, . . . , a_n ∈ F, the n × ℵ_0 matrix

[f_{a_i}(b_j)] = [a_i^j]

has linearly independent rows (Vandermonde). Therefore, dim V* ≥ |{f_a : a ∈ F}| = |F|.
Examples. Let F = Q, V = Q^{ℵ_0}. Then dim V* = ℵ_0^{ℵ_0} = ℵ. Let F = R, V = R^{ℵ_0}. Then dim V* = ℵ^{ℵ_0} = ℵ.

The pairing between V and V*. Define a map 〈· , ·〉 : V* × V → F by

〈f, v〉 = f(v).

〈·, ·〉 is bilinear, i.e., 〈af + bg, v〉 = a〈f, v〉 + b〈g, v〉 and 〈f, au + bv〉 = a〈f, u〉 + b〈f, v〉. For any S ⊂ V and A ⊂ V*, S^⊥ := {f ∈ V* : 〈f, v〉 = 0 ∀v ∈ S} is a subspace of V* and A^⊥ := {v ∈ V : 〈f, v〉 = 0 ∀f ∈ A} is a subspace of V.
Proposition 3.15. Let S, T be subspaces of V and A, B subspaces of V*.

(i) S ⊂ T ⇒ S^⊥ ⊃ T^⊥; A ⊂ B ⇒ A^⊥ ⊃ B^⊥.
(ii) S = S^{⊥⊥}; A ⊂ A^{⊥⊥}.
(iii) φ : S^⊥ → (V/S)*, f ↦ 〈f, ·〉, is an isomorphism, where 〈f, ·〉 : V/S → F, v + S ↦ 〈f, v〉.
(iv) ψ : A^⊥ → (V*/A)*, v ↦ 〈·, v〉, is an embedding, where 〈·, v〉 : V*/A → F, f + A ↦ 〈f, v〉.
(v) If dim V = n < ∞, then dim S + dim S^⊥ = n, dim A + dim A^⊥ = n, A = A^{⊥⊥}, and the embedding ψ in (iv) is an isomorphism.

Proof. (ii) Clearly, S ⊂ S^{⊥⊥}. If u ∈ V \ S, then ∃f ∈ V* such that f(S) = 0 but f(u) ≠ 0. So, f ∈ S^⊥ but 〈f, u〉 ≠ 0. Hence u ∉ S^{⊥⊥}. So, S ⊃ S^{⊥⊥}.

(iii) Proof that φ is onto. Let π : V → V/S be the natural projection. ∀g ∈ (V/S)*, we have g ∘ π ∈ S^⊥ and g = φ(g ∘ π).

(v) Note that dim(V/S)* = dim(V/S). Thus by (iii), dim S^⊥ = dim(V/S) = n − dim S.

Let A = {0} in (iv). We see that V = {0}^⊥ → V**, v ↦ 〈·, v〉, is an embedding. Since dim V = dim V**, this embedding is also onto. Thus every α ∈ V** is of the form 〈·, v〉 for some v ∈ V. It follows that the map ψ in (iv) is onto. (Let ρ : V* → V*/A be the natural projection. ∀β ∈ (V*/A)*, we have β ∘ ρ ∈ V**; hence β ∘ ρ = 〈·, v〉 for some v ∈ V. Clearly v ∈ A^⊥ and ψ(v) = β.) Consequently,

dim A^⊥ = dim(V*/A)* = dim(V*/A) = n − dim A.

Since A ⊂ A^{⊥⊥} and dim A = n − dim A^⊥ = dim A^{⊥⊥}, we have A = A^{⊥⊥}.
Note.

(i) The embedding V → V**, v ↦ 〈·, v〉, is called the canonical embedding of V into V**; it does not depend on any bases of V and V**. (For comparison, note that the embedding V → V* at the beginning of this section depends on the choice of the basis B.) When dim V < ∞, the canonical embedding is an isomorphism.
(ii) Statements (iii) and (iv) of Proposition 3.15 can be made a little more general. See Exercise 3.4.
(iii) When dim V = ∞, the claims in (v) of Proposition 3.15 are false. See the following counterexamples.
• Let S = {0} ⊂ V. Then dim S^⊥ = dim V* > dim V; hence dim S + dim S^⊥ > dim V.
• Let A = V*. Then dim A + dim A^⊥ > dim V.
• Since dim V** > dim V, the canonical embedding V → V** is not onto.
• Assume V has a countable basis {ε_1, ε_2, . . .}. Let

A = {f ∈ V* : f(ε_n) = 0 when n is large enough}.

Then A^⊥ = {0}. (If 0 ≠ v ∈ V, then v = a_1ε_1 + ⋯ + a_Nε_N for some N > 0 and a_1, . . . , a_N ∈ F. Choose f ∈ V* such that f(v) = 1 and f(ε_n) = 0 for all n > N. Then f ∈ A but 〈f, v〉 ≠ 0, so v ∉ A^⊥.) Therefore, A^{⊥⊥} = {0}^⊥ = V* ⊋ A.
When dim V = n < ∞, the pairing between V and V* can be made more explicit. Let v_1, . . . , v_n be a basis of V and v′_1, . . . , v′_n the dual basis of V*. Define isomorphisms

α : F^n → V, (a_1, . . . , a_n) ↦ a_1v_1 + ⋯ + a_nv_n,
β : F^n → V*, (b_1, . . . , b_n) ↦ b_1v′_1 + ⋯ + b_nv′_n.

For v ∈ V and f ∈ V*, write v = a_1v_1 + ⋯ + a_nv_n and f = b_1v′_1 + ⋯ + b_nv′_n. Then

〈f, v〉 = 〈b_1v′_1 + ⋯ + b_nv′_n, a_1v_1 + ⋯ + a_nv_n〉 = b_1a_1 + ⋯ + b_na_n
       = (b_1, . . . , b_n)(a_1, . . . , a_n)^T = β^{-1}(f) · α^{-1}(v)^T.

Let S be a subspace of V and A a subspace of V*. Let ε_1, . . . , ε_k be a basis of α^{-1}(S) and δ_1, . . . , δ_l a basis of β^{-1}(A). Then

β^{-1}(S^⊥) = ker_r[ε_1^T, . . . , ε_k^T],
α^{-1}(A^⊥) = ker_r[δ_1^T, . . . , δ_l^T].
Proposition 3.16. Let f : V → W be an F-map.

(i) Define f* : W* → V*, α ↦ α ∘ f. Then f* ∈ Hom_F(W*, V*). Moreover, ( )* : Hom_F(V, W) → Hom_F(W*, V*) is an F-map.
(ii) If g : W → X is another F-map, then (g ∘ f)* = f* ∘ g*.
(iii) Let θ_V : V → V** and θ_W : W → W** be the canonical embeddings. Then the following diagram is commutative, i.e., θ_W ∘ f = f** ∘ θ_V.

Proof. Exercise.
Exercises
3.1. Let V be a vector space over F and let A, B, A′ be subspaces of V such that A′ ⊂ A. Prove that

A ∩ (B + A′) = (A ∩ B) + A′.
3.2. Let V be a vector space over F and let f be a linear transformation of V. A subspace W ⊂ V is called f-invariant if f(W) ⊂ W. Define

V_1 = {a ∈ V : f^k(a) = 0 for some integer k > 0},  V_2 = ⋂_{k=1}^{∞} f^k(V).

(i) Prove that V_1 and V_2 are both f-invariant subspaces of V.
(ii) If dim V < ∞, prove that

V = V_1 ⊕ V_2.

(iii) Give an example of a linear transformation f of an infinite dimensional vector space V such that V_1 = V_2 = {0}.
3.3. Let L = {f(x, y) ∈ R[x, y] : deg_x f ≤ n, deg_y f ≤ n}. Let Δ = ∂²/∂x² + ∂²/∂y². Prove that

D : L → L,  f(x, y) ↦ Δ((x² + y²)f(x, y)) − (x² + y²)Δf(x, y)

is a linear transformation. Find the matrix of D relative to the basis {x^i y^j : 0 ≤ i, j ≤ n} of L.
3.4. Let V be a vector space over F. Let S ⊂ T be subspaces of V and A ⊂ B subspaces of V*.

(i) Define

φ : S^⊥/T^⊥ → (T/S)*,  f + T^⊥ ↦ 〈f, ·〉,

where 〈f, ·〉 : T/S → F, u + S ↦ 〈f, u〉. Prove that φ is a well defined isomorphism.

(ii) Define

ψ : A^⊥/B^⊥ → (B/A)*,  u + B^⊥ ↦ 〈·, u〉,

where 〈·, u〉 : B/A → F, f + A ↦ 〈f, u〉. Prove that ψ is a well defined injective F-map. When dim V < ∞, ψ is an isomorphism.
3.5. Prove Proposition 3.16.
3.6. Let

A = [B C; D E],

where B ∈ M_{m×n}(F) with rank B = r and E ∈ M_{p×q}(F). What is the largest possible value of rank A?
3.7. Let A ∈Mm×n(F ), B ∈Mn×p(F ), C ∈Mp×q(F ). Prove that
rankAB + rankBC ≤ rankB + rankABC.
3.8. (i) Let V and W be vector spaces over Q and f : V → W a function such that f(x + y) = f(x) + f(y) for all x, y ∈ V. Prove that f is a Q-linear map.
(ii) Let f : R^n → R^m be a continuous function such that f(x + y) = f(x) + f(y) for all x, y ∈ R^n. Prove that f is an R-linear map. (Note. (ii) is false if f is not continuous.)
3.9. Let X be a subspace of M_n(F) with dim X > n(n − 1). Prove that X contains an invertible matrix.
3.10. Let F_q be a finite field with q elements.

(i) Prove that

|GL(n, F_q)| = (q^n − 1)(q^n − q) ⋯ (q^n − q^{n−1}) = q^{n(n−1)/2} ∏_{i=1}^{n} (q^i − 1).

(ii) Let 0 ≤ k ≤ n and let [n k]_q be the number of k-dimensional subspaces of F_q^n. Prove that

[n k]_q = (q^n − 1)(q^n − q) ⋯ (q^n − q^{k−1}) / ((q^k − 1)(q^k − q) ⋯ (q^k − q^{k−1})) = ∏_{i=1}^{k} (q^{n−k+i} − 1)/(q^i − 1).

([n k]_q is called the gaussian coefficient.)
3.11. Let n ≥ 0 and V = {f ∈ F[x] : deg f ≤ n}. For each 1 ≤ i ≤ n + 1, define L_i ∈ V* by

L_i(f) = ∫_0^{+∞} f(x) e^{−ix} dx,  f ∈ V.

Find a basis f_1, . . . , f_{n+1} of V such that L_1, . . . , L_{n+1} is its dual basis.
CHAPTER 4
Rational Canonical Forms and Jordan Canonical Forms
4.1. A Criterion for Matrix Similarity
The main purpose of this chapter is to determine when two matrices in M_n(F) are similar and to determine a canonical form in each similarity class. Let V be an n-dimensional vector space over F. Then two matrices in M_n(F) are similar iff they are the matrices of some T ∈ End(V) relative to two suitable bases. Therefore, to know canonical forms of the similarity classes of M_n(F) is to know canonical forms of linear transformations of V relative to suitable bases.
Matrices over F[x]. Let F[x] be the polynomial ring over F. M_{m×n}(F[x]) is the set of all m × n matrices with entries in F[x]; M_n(F[x]) := M_{n×n}(F[x]); GL(n, F[x]) is the set of all invertible matrices in M_n(F[x]).

Fact. A ∈ M_n(F[x]) is invertible ⇔ det A ∈ F^× (= F \ {0}).

Proof. (⇒) 1 = det(AA^{-1}) = (det A)(det A^{-1}). So, det A is invertible in F[x], i.e., det A ∈ F^×.
(⇐) A^{-1} = (1/det A) adj A.

Equivalence in M_{m×n}(F[x]). Two matrices A, B ∈ M_{m×n}(F[x]) are called equivalent, denoted A ≈ B, if ∃P ∈ GL(m, F[x]) and Q ∈ GL(n, F[x]) such that A = PBQ.

Elementary operations and elementary matrices in M_n(F[x]). Elementary operations and elementary matrices in M_n(F[x]) are almost the same as those in M_n(F), cf. Table 1.1. For type I, we still require that α ∈ F^×. (Requiring that 0 ≠ α ∈ F[x] is not enough.) For type III, β ∈ F[x]. Elementary matrices in M_n(F[x]) are invertible and every matrix in GL(n, F[x]) is a product of elementary matrices.
Theorem 4.1. Let A, B ∈ M_n(F). Then A and B are similar in M_n(F) ⇔ xI − A and xI − B are equivalent in M_n(F[x]).

Proof. (⇒) ∃P ∈ GL(n, F) such that A = PBP^{-1}. Note that P ∈ GL(n, F[x]) and P(xI − B)P^{-1} = xI − A.

(⇐) ∃P, Q ∈ GL(n, F[x]) such that

P(xI − A) = (xI − B)Q.

Write P = P_0 + xP_1 + ⋯ + x^sP_s, where P_i ∈ M_n(F). Divide P by xI − B from the left: P = (xI − B)S + T for some S ∈ M_n(F[x]) and T ∈ M_n(F). Divide Q by xI − A from the right: Q = S′(xI − A) + T′ for some S′ ∈ M_n(F[x]) and T′ ∈ M_n(F). Thus

[(xI − B)S + T](xI − A) = (xI − B)[S′(xI − A) + T′],

i.e.,

(4.1) (xI − B)(S − S′)(xI − A) = (xI − B)T′ − T(xI − A).

We claim that S − S′ = 0. (Otherwise, S − S′ = S_0 + xS_1 + ⋯ + x^kS_k, S_i ∈ M_n(F), S_k ≠ 0. Then (xI − B)(S − S′)(xI − A) = x^{k+2}S_k + terms of lower degree in x, while the highest power of x at the RHS of (4.1) is x, a contradiction.) Thus (xI − B)T′ − T(xI − A) = 0, which implies that T = T′ and BT = TA. It remains to show that T ∈ GL(n, F). (Then B = TAT^{-1}.) Write

P^{-1} = (xI − A)X + Y,

where X ∈ M_n(F[x]) and Y ∈ M_n(F). Then

(4.2) I = PP^{-1} = [(xI − B)S + T][(xI − A)X + Y]
        = (xI − B)S[(xI − A)X + Y] + T(xI − A)X + TY
        = (xI − B)S[(xI − A)X + Y] + (xI − B)TX + TY  (∵ TA = BT)
        = (xI − B)Z + TY

for some Z ∈ M_n(F[x]). Compare the degrees of x at both sides of (4.2). We must have TY = I, and the proof is complete.
Now, the question is to determine when xI −A is equivalent to xI −B.
4.2. The Smith Normal Form
For two matrices A, B of any size, define A ⊕ B = [A 0; 0 B].
Theorem 4.2. Let A ∈ M_{m×n}(F[x]). Then ∃P ∈ GL(m, F[x]) and Q ∈ GL(n, F[x]) such that

(4.3) PAQ = diag(d_1, d_2, . . . , d_r) ⊕ 0,

where d_1, . . . , d_r ∈ F[x] are monic (with leading coefficient 1) and d_1 | d_2 | ⋯ | d_r. The polynomials d_1, . . . , d_r ∈ F[x] are uniquely determined by A and are called the invariant factors of A. The integer r is called the rank of A. The matrix at the RHS of (4.3) is called the Smith normal form of A.
Proof. Existence of the Smith normal form.
For 0 ≠ A = [a_{ij}] ∈ M_{m×n}(F[x]), define δ(A) = min{deg a_{ij} : a_{ij} ≠ 0}.
Use induction on min(m, n). First assume min(m, n) = 1, say m = 1. Assume A ≠ 0. Among all matrices equivalent to A, choose B such that δ(B) is as small as possible. Write B = [b_{11}, . . . , b_{1n}] and, without loss of generality, assume deg b_{11} = δ(B). Then b_{11} | b_{1j} for all 2 ≤ j ≤ n. (If b_{11} ∤ b_{12}, then b_{12} = qb_{11} + r for some q, r ∈ F[x] with 0 ≤ deg r < deg b_{11}. Then B ≈ [b_{11}, b_{12} − qb_{11}, b_{13}, . . . , b_{1n}] = [b_{11}, r, b_{13}, . . . , b_{1n}], which contradicts the minimality of δ(B).) Thus, suitable elementary column operations of type III transform B into [b_{11}, 0, . . . , 0]. We can make b_{11} monic using a type I elementary operation.

Now assume min(m, n) > 1 and A ≠ 0. Among all matrices equivalent to A, choose B such that δ(B) is as small as possible. Let B = [b_{ij}] and assume deg b_{11} = δ(B). By the argument in the case m = 1, we have b_{11} | b_{1j} for all 2 ≤ j ≤ n and b_{11} | b_{i1} for all 2 ≤ i ≤ m. Then suitable type III elementary operations transform B into

C = [b_{11} 0 ⋯ 0; 0 c_{22} ⋯ c_{2n}; ⋮; 0 c_{m2} ⋯ c_{mn}].

We claim that b_{11} | c_{ij} for all 2 ≤ i ≤ m and 2 ≤ j ≤ n. (Since

C ≈ [b_{11} c_{i2} ⋯ c_{in}; 0 c_{22} ⋯ c_{2n}; ⋮; 0 c_{m2} ⋯ c_{mn}]

(add the ith row to the first row), from the above we have b_{11} | c_{ij} for all 2 ≤ j ≤ n.) Therefore, C = [b_{11}] ⊕ b_{11}C_1, where C_1 ∈ M_{(m−1)×(n−1)}(F[x]). Apply the induction hypothesis to C_1.
Uniqueness of the Smith normal form.
For A ∈ M_{m×n}(F[x]) and 1 ≤ k ≤ min(m, n), define

Δ_k(A) = gcd{det X : X is a k × k submatrix of A}.

(Δ_k(A) is called the kth determinantal divisor of A.) Also define Δ_0(A) = 1.
We claim that if A, B ∈ M_{m×n}(F[x]) are equivalent, then Δ_k(A) = Δ_k(B) for all 0 ≤ k ≤ min(m, n). Assume B = PAQ, where P ∈ GL(m, F[x]), Q ∈ GL(n, F[x]). By Cauchy-Binet, for I ⊂ {1, . . . , m} and J ⊂ {1, . . . , n} with |I| = |J| = k,

det B(I, J) = ∑_{K⊂{1,...,m}, L⊂{1,...,n}, |K|=|L|=k} det P(I, K) det A(K, L) det Q(L, J).

Since Δ_k(A) | det A(K, L) for all K, L, we get Δ_k(A) | det B(I, J) for all I, J. So, Δ_k(A) | Δ_k(B). By symmetry, Δ_k(B) | Δ_k(A). So, Δ_k(A) = Δ_k(B).

Now, if

A ≈ diag(d_1, d_2, . . . , d_r) ⊕ 0,

then

(4.4) Δ_k(A) = d_1 ⋯ d_k if 0 ≤ k ≤ r, and Δ_k(A) = 0 if k > r.

So, r is uniquely determined by A and

(4.5) d_k = Δ_k(A)/Δ_{k−1}(A),  1 ≤ k ≤ r,

are also uniquely determined by A.
Elementary divisors. Let A ∈ M_{m×n}(F[x]) and let d_1, . . . , d_r be the nonconstant invariant factors of A. Write d_i = p_{i1}^{e_{i1}} ⋯ p_{i,s_i}^{e_{i,s_i}}, where p_{i1}, . . . , p_{i,s_i} ∈ F[x] are distinct monic irreducible polynomials and e_{i1}, . . . , e_{i,s_i} ∈ Z^+. Then p_{i1}^{e_{i1}}, . . . , p_{i,s_i}^{e_{i,s_i}}, 1 ≤ i ≤ r, are called the elementary divisors of A.
Corollary 4.3. Let A, B ∈ M_{m×n}(F[x]). The following statements are equivalent.

(i) A, B are equivalent.
(ii) A, B have the same invariant factors.
(iii) A, B have the same rank and the same elementary divisors.
(iv) A, B have the same determinantal divisors.

Proof. By Theorem 4.2, (i) ⇔ (ii). By (4.4) and (4.5), (ii) ⇔ (iv). Obviously, (ii) ⇒ (iii).

(iii) ⇒ (ii). It suffices to show that the invariant factors of a matrix A ∈ M_{m×n}(F[x]) are determined by its rank and its elementary divisors. Let rank A = r. Let the elementary divisors of A be

p_1^{e_{11}}, . . . , p_1^{e_{1,s_1}}; . . . ; p_t^{e_{t1}}, . . . , p_t^{e_{t,s_t}},

where p_1, . . . , p_t ∈ F[x] are distinct monic irreducibles and 0 < e_{i1} ≤ ⋯ ≤ e_{i,s_i}, 1 ≤ i ≤ t. Then the last invariant factor of A is d_r = p_1^{e_{1,s_1}} ⋯ p_t^{e_{t,s_t}}. The other invariant factors of A are determined by the remaining elementary divisors

p_1^{e_{11}}, . . . , p_1^{e_{1,s_1−1}}; . . . ; p_t^{e_{t1}}, . . . , p_t^{e_{t,s_t−1}}

the same way. Therefore, the invariant factors of A are determined by its rank and its elementary divisors.
Proposition 4.4. Let A, B be two matrices over F[x]. Then the elementary divisor list of A ⊕ B is the union of the elementary divisor lists of A and B.

Proof. We may assume that A and B are Smith normal forms:

A = diag(f_1, . . . , f_s) ⊕ 0,  B = diag(g_1, . . . , g_t) ⊕ 0.

Let p ∈ F[x] be any monic irreducible. Write f_i = p^{a_i} f′_i, g_j = p^{b_j} g′_j, where p ∤ f′_i, p ∤ g′_j, and a_1 ≤ ⋯ ≤ a_s, b_1 ≤ ⋯ ≤ b_t. Let c_1 ≤ ⋯ ≤ c_{s+t} be a rearrangement of a_1, . . . , a_s, b_1, . . . , b_t. Then for 1 ≤ k ≤ s + t,

Δ_k(A ⊕ B) = p^{c_1+⋯+c_k} h_k,  h_k ∈ F[x], p ∤ h_k.

(Note that Δ_k(A ⊕ B) = 0 for k > s + t.) Hence, the kth invariant factor of A ⊕ B is

Δ_k(A ⊕ B)/Δ_{k−1}(A ⊕ B) = p^{c_k} h′_k,  h′_k ∈ F[x], p ∤ h′_k.

Therefore, the powers of p appearing in the elementary divisor list of A ⊕ B are p^{c_k}, c_k > 0. These are precisely the powers of p appearing in the union of the elementary divisor lists of A and B.
Example. Let A ∈ M_{5×4}(R[x]) be given below.

A =
[0     2x+2    6x+6                    5x^4+10x^3+15x^2+18x+8;
 2     2x+2    −2x^4−2x^3−4x^2+6x+6    −x^4−2x^3−3x^2−6x−6;
 x−1   x^2−1   −x^5+6x^2+2x−1          x^5+3x^4+5x^3+6x^2+5x+4;
 −1    0       x^4+x^3+2x^2            3x^4+6x^3+9x^2+12x+7;
 2     2x+2    −2x^4−2x^3−4x^2+6x+6    −2x^4−4x^3−6x^2−10x−8]

We use elementary operations to bring A to its Smith normal form. First r_1 ↔ r_4 and r_1 × (−1):

[1     0       −x^4−x^3−2x^2           −3x^4−6x^3−9x^2−12x−7;
 2     2x+2    −2x^4−2x^3−4x^2+6x+6    −x^4−2x^3−3x^2−6x−6;
 x−1   x^2−1   −x^5+6x^2+2x−1          x^5+3x^4+5x^3+6x^2+5x+4;
 0     2x+2    6x+6                    5x^4+10x^3+15x^2+18x+8;
 2     2x+2    −2x^4−2x^3−4x^2+6x+6    −2x^4−4x^3−6x^2−10x−8]

Then r_2 − 2r_1, r_3 − (x−1)r_1, r_5 − 2r_1:

[1   0       −x^4−x^3−2x^2     −3x^4−6x^3−9x^2−12x−7;
 0   2x+2    6x+6              5x^4+10x^3+15x^2+18x+8;
 0   x^2−1   x^3+4x^2+2x−1     4x^5+6x^4+8x^3+9x^2−3;
 0   2x+2    6x+6              5x^4+10x^3+15x^2+18x+8;
 0   2x+2    6x+6              4x^4+8x^3+12x^2+14x+6]

Clearing the first row by column operations gives

[1] ⊕ (x+1) [2     6           5x^3+5x^2+10x+8;
             x−1   x^2+3x−1    4x^4+2x^3+6x^2+3x−3;
             2     6           5x^3+5x^2+10x+8;
             2     6           4x^3+4x^2+8x+6],

where

[2     6           5x^3+5x^2+10x+8;
 x−1   x^2+3x−1    4x^4+2x^3+6x^2+3x−3;
 2     6           5x^3+5x^2+10x+8;
 2     6           4x^3+4x^2+8x+6] → ⋯ → diag(1, x^2+2, (x+1)(x^2+2)) with a zero row appended.

So,

A ≈ diag(1, x+1, (x+1)(x^2+2), (x+1)^2(x^2+2)) with a fifth zero row appended.

We have

Δ_1(A) = 1,
Δ_2(A) = x+1,
Δ_3(A) = (x+1)^2(x^2+2),
Δ_4(A) = (x+1)^4(x^2+2)^2.

The elementary divisors of A are x+1, x+1, (x+1)^2, x^2+2, x^2+2.
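SymPy can compute Smith normal forms over F[x] directly, which is handy for checking hand computations like the one above. A minimal sketch, assuming (as in recent SymPy versions) that smith_normal_form accepts the polynomial domain QQ[x]; the 3 × 3 matrix is a small fresh example, not the 5 × 4 matrix above:

```python
import sympy as sp
from sympy.matrices.normalforms import smith_normal_form

x = sp.symbols('x')
M = sp.Matrix([[x + 1, 1,     0],
               [0,     x + 1, 0],
               [0,     0,     x**2 - 1]])

# Diagonal entries are the invariant factors d_1 | d_2 | d_3 (up to units).
S = smith_normal_form(M, domain=sp.QQ[x])
print([sp.factor(S[i, i]) for i in range(3)])   # expect 1, x + 1, (x - 1)*(x + 1)**2
```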
4.3. Rational Canonical Forms
Let A ∈ M_n(F). Since det(xI − A) ≠ 0 (in F[x]), the Smith normal form of xI − A has no 0's on the diagonal. So, the invariant factors of xI − A are completely determined by the nonconstant invariant factors of xI − A. For this reason, when we speak of the invariant factors of xI − A, we usually mean the nonconstant ones. The invariant factors, elementary divisors and determinantal divisors of xI − A are also called those of A.

Theorem 4.5. Let A, B ∈ M_n(F). Then the following statements are equivalent.

(i) A ∼ B.
(ii) A, B have the same invariant factors.
(iii) A, B have the same elementary divisors.
(iv) A, B have the same determinantal divisors.
Proof. Immediate from Theorem 4.1 and Corollary 4.3.
Corollary 4.6. For every A ∈Mn(F ), A ∼ AT .
Proof. xI −A and xI −AT have the same determinantal divisors.
The companion matrix. Let f(x) = x^n + a_{n−1}x^{n−1} + ⋯ + a_0 ∈ F[x]. The companion matrix of f, denoted by M(f), is defined to be

M(f) = [0 1; 0 1; ⋱ ⋱; 0 1; −a_0 −a_1 ⋯ −a_{n−2} −a_{n−1}].

f(x) is the only invariant factor of M(f). In fact,

Δ_n(M(f)) = det [x −1; x −1; ⋱ ⋱; x −1; a_0 a_1 ⋯ a_{n−2} x+a_{n−1}] = f(x),

Δ_{n−1}(M(f)) = 1.
Theorem 4.7. Let A ∈ M_n(F) have invariant factors d_1, . . . , d_r and elementary divisors e_1, . . . , e_s. Then

A ∼ M(d_1) ⊕ ⋯ ⊕ M(d_r) ∼ M(e_1) ⊕ ⋯ ⊕ M(e_s).

M(d_1) ⊕ ⋯ ⊕ M(d_r) and M(e_1) ⊕ ⋯ ⊕ M(e_s) are called the rational canonical forms of A (in terms of invariant factors/elementary divisors).

Proof. The invariant factors of xI − (M(d_1) ⊕ ⋯ ⊕ M(d_r)) are d_1, . . . , d_r. The elementary divisors of M(e_1) ⊕ ⋯ ⊕ M(e_s) are e_1, . . . , e_s.
The characteristic polynomial. Let A ∈ M_n(F). c_A(x) := det(xI − A) is called the characteristic polynomial of A.

Theorem 4.8 (Cayley-Hamilton). Let A ∈ M_n(F) have characteristic polynomial c_A(x) = x^n + a_{n−1}x^{n−1} + ⋯ + a_0. Then c_A(A) = 0, i.e.,

A^n + a_{n−1}A^{n−1} + ⋯ + a_0I = 0.

Proof. We have

(4.6) c_A(x)I = x^nI + a_{n−1}x^{n−1}I + ⋯ + a_0I − c_A(A) + c_A(A) = (xI − A)p + c_A(A)

for some p ∈ M_n(F[x]). We also have

(4.7) c_A(x)I = det(xI − A) I = (xI − A) adj(xI − A) = (xI − A)q,

where q = adj(xI − A) ∈ M_n(F[x]). By (4.6) and (4.7),

(xI − A)(p − q) = c_A(A).

A comparison of degrees in x implies that p = q; hence c_A(A) = 0.
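The theorem is easy to check on any concrete matrix by summing the powers of A with the coefficients of c_A. A small sketch (our illustration, using SymPy's charpoly):

```python
import sympy as sp

A = sp.Matrix([[1, 2], [3, 4]])
coeffs = A.charpoly().all_coeffs()      # [1, -5, -2]: c_A(x) = x^2 - 5x - 2

# c_A(A) = A^2 - 5A - 2I; reversed() pairs coefficient c with the power A^i.
cA_at_A = sum((c * A**i for i, c in enumerate(reversed(coeffs))),
              sp.zeros(2, 2))
assert cA_at_A == sp.zeros(2, 2)
```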
The minimal polynomial. Let A ∈ M_n(F). Let I = {f ∈ F[x] : f(A) = 0}. Then I ≠ {0} since c_A ∈ I. Let m ∈ I be monic and of the smallest degree. Then every f ∈ I is a multiple of m. (Write f = qm + r, where r = 0 or deg r < deg m. Then 0 = f(A) = r(A). By the minimality of deg m, we have r = 0.) Hence m is unique in I; it is called the minimal polynomial of A, denoted by m_A. We have m_A | c_A.
Easy fact. If A ∼ B, then cA(x) = cB(x) and mA(x) = mB(x).
Proposition 4.9. Let f(x) = x^n + a_{n−1}x^{n−1} + ⋯ + a_0 ∈ F[x]. Then the minimal polynomial of M(f) is f(x).

Proof. Let A = M(f). We only have to show that A^0, A^1, . . . , A^{n−1} are linearly independent. (Thus there is no nonzero g ∈ F[x] with deg g ≤ n − 1 such that g(A) = 0.) Using induction, we have

A^i [0, . . . , 0, 1]^T = [0, . . . , 0, 1, ∗, . . . , ∗]^T,  0 ≤ i ≤ n − 1,

with the 1 in position n − i. Hence A^i[0, . . . , 0, 1]^T, 0 ≤ i ≤ n − 1, are linearly independent. So, A^i, 0 ≤ i ≤ n − 1, are linearly independent.
Proposition 4.10. Let A ∈ M_n(F) have invariant factors d_1, . . . , d_r (d_1 | d_2 | ⋯ | d_r). Then m_A(x) = d_r(x).

Proof. May assume A = M(d_1) ⊕ ⋯ ⊕ M(d_r). Then

d_r(A) = d_r(M(d_1)) ⊕ ⋯ ⊕ d_r(M(d_r)) = 0.

So, m_A | d_r. On the other hand, since m_A(A) = 0, m_A(M(d_r)) = 0. By Proposition 4.9, d_r | m_A.
Example. Let

A = [4 −3 8 −11; −6 0 −8 10; −14 7 −20 21; −6 4 −8 6] ∈ M_4(R).

Then

xI − A = [x−4 3 −8 11; 6 x 8 −10; 14 −7 x+20 −21; 6 −4 8 x−6].

Applying r_1 + r_2 and c_1 ↔ c_4:

[1 x+3 0 x+2; −10 x 8 6; −21 −7 x+20 14; x−6 −4 8 6]

→ [1 0 0 0; 0 11x+30 8 10x+26; 0 21x+56 x+20 21x+56; 0 −x^2+3x+14 8 −x^2+4x+18].

Applying c_2 ↔ c_3, c_4 − c_3, r_3 × 8:

[1] ⊕ [8 11x+30 −x−4; 8(x+20) 8(21x+56) 0; 8 −x^2+3x+14 x+4]

→ [1] ⊕ [8 0 0; 0 −11x^2−82x−152 (x+4)(x+20); 0 −x^2−8x−16 2x+8]

→ [1] ⊕ [1] ⊕ (x+4) [−11x−38 x+20; −x−4 2]

→ [1] ⊕ [1] ⊕ (x+4) [1 0; 0 x^2+2x+4].

So, the invariant factors of A are x+4, (x+4)(x^2+2x+4); the elementary divisors are x+4, x+4, x^2+2x+4. The rational canonical form of A is

[−4] ⊕ [−4] ⊕ [0 1; −4 −2].
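The invariant factors found by hand can be double-checked by computing the Smith normal form of xI − A by machine (again assuming SymPy's smith_normal_form supports the domain QQ[x]):

```python
import sympy as sp
from sympy.matrices.normalforms import smith_normal_form

x = sp.symbols('x')
A = sp.Matrix([[  4, -3,   8, -11],
               [ -6,  0,  -8,  10],
               [-14,  7, -20,  21],
               [ -6,  4,  -8,   6]])

S = smith_normal_form(x * sp.eye(4) - A, domain=sp.QQ[x])
print([sp.factor(S[i, i]) for i in range(4)])
# Expected up to units: 1, 1, x + 4, (x + 4)*(x**2 + 2*x + 4)
```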
Eigenvalues, eigenvectors and eigenspaces. Let A ∈ M_n(F). If ∃ 0 ≠ x ∈ F^n and λ ∈ F such that

Ax = λx,

λ is called an eigenvalue of A and x is called an eigenvector of A (with eigenvalue λ). Eigenvalues of A are the roots of the characteristic polynomial c_A(x). If λ is an eigenvalue of A,

E_A(λ) := {x ∈ F^n : Ax = λx} = ker_c(A − λI)

is called the eigenspace of A with eigenvalue λ. dim E_A(λ) = null(A − λI) is called the geometric multiplicity of λ. The multiplicity of λ as a root of c_A(x) is called the algebraic multiplicity of λ. Similar matrices have the same eigenvalues together with their algebraic and geometric multiplicities.
Fact. Let A = M(f_1) ⊕ ⋯ ⊕ M(f_k), where each f_i ∈ F[x] is monic, and let λ be an eigenvalue of A. Then the geometric multiplicity of λ is |{i : f_i(λ) = 0}|. In particular, geo.mult.(λ) ≤ alg.mult.(λ).

Proof. We have

null(A − λI) = ∑_i null(M(f_i) − λI),

where null(M(f_i) − λI) = 0 if f_i(λ) ≠ 0, and null(M(f_i) − λI) = 1 if f_i(λ) = 0.
Fact. Let λ_1, . . . , λ_k ∈ F be distinct eigenvalues of A ∈ M_n(F). Then

E_A(λ_1) + ⋯ + E_A(λ_k) = E_A(λ_1) ⊕ ⋯ ⊕ E_A(λ_k).
Proof. We want to show that

E_A(λ_i) ∩ (E_A(λ_1) + ⋯ + E_A(λ_{i−1}) + E_A(λ_{i+1}) + ⋯ + E_A(λ_k)) = {0},  1 ≤ i ≤ k.

Without loss of generality, assume i = 1. Let x ∈ E_A(λ_1) ∩ (E_A(λ_2) + ⋯ + E_A(λ_k)). Then

x = a_2x_2 + ⋯ + a_kx_k,  x_i ∈ E_A(λ_i), a_i ∈ F.

So,

[∏_{i=2}^{k} (λ_1 − λ_i)] x = [∏_{i=2}^{k} (A − λ_iI)] x = [∏_{i=2}^{k} (A − λ_iI)](a_2x_2 + ⋯ + a_kx_k) = 0.

Hence, x = 0.
Diagonalizable matrices. A ∈ M_n(F) is called diagonalizable (or diagonable) if A is similar to a diagonal matrix.

Proposition 4.11. Let A ∈ M_n(F) and let λ_1, . . . , λ_k be all the eigenvalues of A in F. The following statements are equivalent.

(i) A is diagonalizable.
(ii) All elementary divisors of A are of degree 1.
(iii) F^n = E_A(λ_1) ⊕ ⋯ ⊕ E_A(λ_k).
(iv) ∑_{i=1}^{k} geo.mult.(λ_i) = n.
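In SymPy, condition (i) can be tested directly: diagonalize() returns P and D with A = PDP^{-1} and raises an error when the criterion fails. A minimal sketch on an arbitrary matrix of our choosing:

```python
import sympy as sp

A = sp.Matrix([[5, -2], [-2, 5]])    # eigenvalues 3 and 7, each of multiplicity 1
P, D = A.diagonalize()               # raises an error if A is not diagonalizable
assert P * D * P.inv() == A
assert D.is_diagonal()
```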
Simultaneous diagonalization.

Proposition 4.12. Let A_1, . . . , A_k ∈ M_n(F) be such that each A_i is diagonalizable and A_iA_j = A_jA_i for all 1 ≤ i, j ≤ k. Then ∃P ∈ GL(n, F) such that PA_iP^{-1} is diagonal for all 1 ≤ i ≤ k.

Proof. Use induction on k. Since A_1 is diagonalizable, we may assume

A_1 = a_1I_{n_1} ⊕ ⋯ ⊕ a_sI_{n_s},

where a_1, . . . , a_s ∈ F are distinct and n_1 + ⋯ + n_s = n. For each 2 ≤ i ≤ k, since A_i commutes with A_1, we must have

A_i = A_{i1} ⊕ ⋯ ⊕ A_{is},  A_{ij} ∈ M_{n_j}(F).

Since A_i is diagonalizable, each A_{ij} is diagonalizable. (Think of the elementary divisors.) Since A_2, . . . , A_k are pairwise commutative, for each 1 ≤ j ≤ s, A_{2j}, . . . , A_{kj} are pairwise commutative. By the induction hypothesis, ∃P_j ∈ GL(n_j, F) such that P_jA_{ij}P_j^{-1} is diagonal for all 2 ≤ i ≤ k. Let P = P_1 ⊕ ⋯ ⊕ P_s. Then PA_iP^{-1} is diagonal for all 1 ≤ i ≤ k.
The equation AX = XB. Let A ∈Mm(F ) and B ∈Mn(F ). We compute
dimX ∈Mm×n(F ) : AX = XB.
Lemma 4.13. Let A ∈ Mn(F ) such that cA(x) = mA(x). Then for any g ∈F [x], rank g(A) = n− deg(g, cA).
Proof. Let h = (g, cA). Then rank g(A) ≤ rankh(A). Write h = ag + bcAfor some a, b ∈ F [x]. Then h(A) = a(A)g(A). So, rank g(A) ≥ rankh(A). Hencerank g(A) = rankh(A).
4.3. RATIONAL CANONICAL FORMS 39
We may assume that A is a rational canonical form
A =
0 1
. . .
1∗ ∗ · · · ∗
.Then
Ai =
0 · · · 0 1...
.... . .
0 · · · 0 1 n−i
∗ · · · ∗ ∗ · · · ∗...
......
...∗ · · · ∗ ∗ · · · ∗
, 0 ≤ i ≤ n.
Hence, the (n − deg h) × (n − deg h) submatrix at the upper right corner of h(A)is invertible. So, rankh(A) ≥ n − deg h. Replace h with cA/h. We also haverank (cA/h)(A) ≥ deg h. On the other hand, since h(A)(cA/h)(A) = 0, we haverankh(A) + rank (cA/h)(A) ≤ n. Therefore, rankh(A) = n − deg h and rank(cA/h)(A) = deg h.
Lemma 4.14. Let f = xn + an−1xn−1 + · · · + a0 ∈ F [x], A ∈ Mm(F ) and
X ∈Mm×n(F ). Then AX = XM(f)T if and only if
X = [x,Ax, . . . , An−1x]
for some x ∈ kerc f(A).
Proof. Write X = [x1, . . . , xn] where x1, . . . , xn ∈ Fn. Then the equationAX = XM(f)T becomes
[Ax1, . . . , Axn] = [x1, . . . , xn]
0 −a0
1 −a1
. . ....
1 −an−1
= [x2, . . . , xn, −a0x1 − · · · − an−1xn],
i.e.,
(4.8)
Ax1 = x2,
...Axn−1 = xn,
Axn = −a0x1 − · · · − an−1xn.
Clearly, (5.11) is equivalent to xi = Ai−1x1, 1 ≤ i ≤ n and f(A)x1 = 0.
Proposition 4.15. Let A ∈Mm(F ) and B ∈Mn(F ) such that
A ∼M(f1)⊕ · · · ⊕M(fs), B ∼M(g1)⊕ · · · ⊕M(gt),
40 4. RATIONAL CANONICAL FORMS AND JORDAN CANONICAL FORMS
where fi, gj ∈ F [x] are monic. Then
dimX ∈Mm×n(F ) : AX = XB =∑i,j
deg(fi, gj).
Proof. We may assume that A = M(f1) ⊕ · · · ⊕M(fs) and B = M(g1)T ⊕· · · ⊕M(gt)T . Let αi = deg fi and βj = deg gj . Write
X =
X11 · · · X1t
......
Xs1 · · · Xst
, Xij ∈Mαi×βj (F ).
Then AX = XB ⇔M(fi)Xij = XijM(gj)T for all i, j.
By Lemmas 4.14 and 4.13,
dimXij ∈Mαi×βj (F ) : M(fi)Xij = XijM(gj)T = dim
(kerc gj(M(fi))
)= deg(gj , fi).
Hence the proposition.
Corollary 4.16. Let A ∈ Mm(F ) and B ∈ Mn(F ). Let the elementarydivisors of A be
pa111 , . . . , p
a1,k11 ; . . . ; pas1
s , . . . , pas,kss and powers of q1, . . . , qt,
and let the elementary divisors of B be
pb111 , . . . , p
b1,l11 ; . . . ; pbs1
s , . . . , pbs,lss and powers of r1, . . . , ru,
where p1, . . . , ps, q1, . . . , qt, r1, . . . , ru are distinct monic irreducibles in F [x] andaij , bij ∈ Z+. Then
dimX ∈Mm×n(F ) : AX = XB =s∑
i=1
ks∑j=1
ls∑j′=1
min(aij , bij′) deg pi.
Proof. Immediate from Proposition 4.15.
4.4. The Jordan Canonical Form
Jordan block. Let λ ∈ F and n > 0. The n×n Jordan block with eigenvalueλ is
Jn(λ) :=
λ 1 0 · · · 00 λ 1 · · · 0...
.... . . . . .
...0 0 · · · λ 10 0 · · · 0 λ
∈Mn(F ).
(x− λ)n is the only elementary divisor of Jn(λ).
Let A ∈ Mn(F ) such that cA(x) factors into a product of linear polynomials.(This is the case when F = C or any algebraically closed field.) Then all elementarydivisors of A are of the form (x− λ)e, λ ∈ F , e > 0.
4.4. THE JORDAN CANONICAL FORM 41
Theorem 4.17. Let A ∈Mn(F ) and assume that the elementary divisors of Aare (x− λ1)n1 , . . . , (x− λk)nk , λi ∈ F , ni > 0, n1 + · · ·+ nk = n. Then
(4.9) A ∼ Jn1(λ1)⊕ · · · ⊕ Jnk(λk).
The RHS of (4.9) is called the Jordan canonical form of A.
Proof. The two sides of (4.9) have the same elementary divisors.
The Hasse derivative. For f(x) = a0 + a1x+ · · ·+ anxn ∈ F [x] and k ≥ 0,
define
∂kf =(k
k
)ak +
(k + 1k
)ak+1x+ · · ·+
(n
k
)anx
n−k.
∂kf is called the kth order Hasse derivative of f . (If F is of characteristic 0, then∂kf = 1
k!f(k).)
Properties of the Hasse derivative. Let f, g ∈ F [x] and a, b ∈ F .(i) ∂k(af + bg) = a∂kf + b∂kg.(ii) ∂k(fg) =
∑i+j=k(∂if)(∂jg).
Lemma 4.18. Let f ∈ F [x], n > 0 and λ ∈ F . Then
(4.10) f(Jn(λ)
)=
f(λ) ∂1f(λ) · · · ∂n−1f(λ)
f(λ). . .
.... . . ∂1f(λ)
f(λ)
.
Proof. Only have to prove (4.10) with f(x) = xk, since both sides of (4.10)are linear in f . Let
Nn =
0 1 0 · · · 00 1 · · · 0
. . . . . ....
0 10
n×n
.
Then
N in =
0 · · · 0 1 0 · · · 00 1 · · · 0
. . . . . ....
0 1 n−i
0...0
, 0 ≤ i ≤ n,
and N in = 0 for i ≥ n. Thus
Jn(λ)k = (λI +Nn)k =k∑
i=0
(k
i
)λk−iN i
n =k∑
i=0
∂if(λ)N in =
n−1∑i=0
∂if(λ)N in.
42 4. RATIONAL CANONICAL FORMS AND JORDAN CANONICAL FORMS
Proposition 4.19. M2 Let A ∈Mn(F ) and λ an eigenvalue of A. Let τi bethe number of Ji(λ) in the Jordan canonical form of A. Then
τi = rank(A− λI)i−1 − 2 rank(A− λI)i + rank(A− λI)i+1, i ≥ 1.
Proof. May assume A = Jn1(λ)⊕· · ·⊕Jnk(λ)⊕B, where λ is not an eigenvalue
of B. Note that A− λI = Nn1 ⊕ · · · ⊕Nnk⊕ (B − λI), where B − λI is invertible.
Thus,
rank(A− λI)i−1 − rank(A− λI)i
=k∑
j=1
[rankN i−1
nj− rankN i
nj
]=
k∑j=1
[max0, nj − (i− 1) −max0, nj − i
]= |j : nj ≥ i|.
Hence,
τi = |j : nj = i|= |j : nj ≥ i| − |j : nj ≥ i+ 1|= rank(A− λI)i−1 − rank(A− λI)i −
[rank(A− λI)i − rank(A− λI)i+1
]= rank(A− λI)i−1 − 2 rank(A− λI)i + rank(A− λI)i+1.
Proposition 4.20 (The Jordan canonical form of a companion matrix). Letf = xk +ak−1x
k−1 + · · ·+a0 = (x−λ1)e1 · · · (x−λt)et ∈ F [x], where λ1, . . . , λt ∈ Fare distinct and e1, . . . , et ∈ Z+. Then
M(f) = P
Je1(λ1)
. . .
Jet(λt)
P−1,
where(4.11)
P =
(00
)λ0
1 · · ·(
0e1−1
)λ1−e1
1 · · ·(00
)λ0
t · · ·(
0et−1
)λ1−et
t(10
)λ1
1 · · ·(
1e1−1
)λ2−e1
1 · · ·(10
)λ1
t · · ·(
1et−1
)λ2−et
t
......
......(
k−10
)λk−1
1 · · ·(
k−1e1−1
)λk−e1
1 · · ·(k−10
)λk−1
t · · ·(
k−1et−1
)λk−et
t
.
(Note.(
ij
)= 0 if i, j ∈ Z and 0 ≤ i < j.)
Proof. First, we show that P is invertible. Assume [b0, . . . , bk−1]P = 0. Letg = b0 + · · · + bk−1x
k−1. Then ∂jg(λi) = 0 for 1 ≤ i ≤ t, 0 ≤ j ≤ ei − 1.Therefore,
∏ti=1(x − λi)ei | g. Since e1 + · · · + et = k, we must have g = 0, i.e.,
[b0, . . . , bk−1] = 0.
4.4. THE JORDAN CANONICAL FORM 43
We only have to show that M(f)P = P(Je1(λ1)⊕ · · · ⊕ Jet(λt)
). It suffices to
show that for each 1 ≤ i ≤ t,(4.12)
M(f)
(00
)λ0
i · · ·(
0ei−1
)λ1−ei
i(10
)λ1
i · · ·(
1ei−1
)λ2−ei
i...
...(k−10
)λk−1
i · · ·(
k−1ei−1
)λk−ei
1
=
(00
)λ0
i · · ·(
0ei−1
)λ1−ei
i(10
)λ1
i · · ·(
1ei−1
)λ2−ei
i...
...(k−10
)λk−1
i · · ·(
k−1ei−1
)λk−ei
1
Jei(λi).
First,
the 1st column of the LHS of (4.12)
=M(f)
(00
)λ0
i
...(k−10
)λk−1
i
=
(10
)λ1
i
...(k−10
)λk−1
i
−∑k−1
l=0 al
(l0
)λl
i
=
(10
)λ1
i
...(k−10
)λk−1
i(k0
)λk
i
(∵ f(λi) = 0)
=λi
(00
)λ0
i
...(k−10
)λk−1
i
= the 1st column of the RHS of (4.12).
For 1 ≤ j ≤ ei − 1, we have
the (j + 1)st column of the LHS of (4.12)
=M(f)
(0j
)λ
1−(j+1)i
...(k−1
j
)λ
k−(j+1)i
=
(1j
)λ
2−(j+1)i
...(k−1
j
)λ
k−(j+1)i
−∑k−1
l=0 al
(lj
)λl−j
i
=
(1j
)λ1−j
i
...(k−1
j
)λk−1−j
i(kj
)λk−j
i
(∵ (∂jf)(λi) = 0)
=
[(
0j
)+(
0j−1
)]λ1−j
i
...[(k−1
j
)+(k−1j−1
)]λk−j
i
= λi
(0j
)λ
1−(j+1)i
...(k−1
j
)λ
k−(j+1)i
+
(
0j−1
)λ1−j
i
...(k−1j−1
)λk−j
i
=the (j + 1)st column of the RHS of (4.12).
Homogeneous linear recurrence equations with constant coeffi-cients. We try to solve the kth order homogeneous linear recurrence equation
(4.13) xn+k + ak−1xn+k−1 + · · ·+ a0xn = 0, n ≥ 0,
44 4. RATIONAL CANONICAL FORMS AND JORDAN CANONICAL FORMS
where a0, . . . , ak−1 ∈ F . Equation 4.13 is equivalent to
xn+1
...xn+k
=
0 10 1
. . .
1−a0 −a1 −a2 · · · −ak−1
xn
...xn+k−1
= M(f)
xn
...xn+k−1
, n ≥ 0,
where f = xk+ak−1xk−1+· · ·+a0 ∈ F [x]. (f is called the characteristic polynomial
of equation (4.13).) Thus
(4.14)
xn
...xn+k−1
= M(f)n
x0
...xk−1
.Let f(x) = (x−λ1)e1 · · · (x−λt)et , where λ1, . . . , λt ∈ F are distinct and e1, . . . , et ∈Z+. By Proposition 4.20,
(4.15) M(f) = P
Je1(λ1)
. . .
Jet(λt)
P−1,
where P is given by (4.11). By (4.14) and (4.15),
xn = [1, 0, . . . , 0]M(f)n
x0
...xk−1
= [1, 0, . . . , 0]P
Je1(λ1)n
. . .
Jet(λt)n
P−1
x0
...xk−1
.[1, 0, . . . , 0]P is the first row of P , which has 1 at the 1st, (e1 +1)st, . . . , (e1 + · · ·+et−1 + 1)st components and has 0 elsewhere. By Lemma 4.18, the sum of the 1st,(e1 + 1)st, . . . , (e1 + · · ·+ et−1 + 1)st rows of Je1(λ1)n ⊕ · · · ⊕ Jet
(λt)n is[(n0)λ
n1 , . . . , ( n
e1−1)λn−e1+11 ; . . . ; (n
0)λnt , . . . , ( n
et−1)λn−et+11
].
Thus,
xn =[(n0)λ
n1 , . . . , ( n
e1−1)λn−e1+11 ; . . . ; (n
0)λnt , . . . , ( n
et−1)λn−et+11
]P−1
x0
...xk−1
.Homogeneous linear ODE with constant coefficients. Let A ∈
Mn(C) and consider the initial value problem
(4.16)
x′(t) = Ax(t)x(0) = x0,
4.4. THE JORDAN CANONICAL FORM 45
where x0 ∈ Cn and x(t) ∈ Cn is an unknown function of a real variable t. Bythe existence and uniqueness theorem in ODE, (4.16) has a unique solution x(t)defined for all t ∈ R. This solution can be explicitly determined as follows.
There exists P ∈ GL(n,C) such that
PAP−1 = Jn1(λ1)⊕ · · · ⊕ Jns(λs),
where λi ∈ C, n1 + · · · + ns = n. Let y(t) = Px(t) and y0 = Px0. Then (4.16)becomes
(4.17)
y′(t) =
(Jn1(λ1)⊕ · · · ⊕ Jns(λs)
)y(t)
y(0) = y0.
Assume for the time being that y(t) is analytic, i.e.,
y(t) =∞∑
k=0
1k!
y(k)(0)tk.
By (4.17),
y(k)(0) =(Jn1(λ1)⊕ · · · ⊕ Jns(λs)
)ky0
=(· · · ⊕ Jni(λi)k ⊕ · · ·
)y0
=[· · · ⊕
(ni−1∑j=0
(k
j
)λk−j
i N jni
)⊕ · · ·
]y0.
Therefore,
y(t) =[· · · ⊕
(ni−1∑j=0
[ ∞∑k=0
1k!
(k
j
)λk−j
i tk]N j
ni
)⊕ · · ·
]y0
=[· · · ⊕
(ni−1∑j=0
tjeλitN jni
)⊕ · · ·
]y0
=(· · · ⊕
eλit teλit · · · tni−1eλit
eλit. . .
.... . . teλit
eλit
⊕ · · ·)y0.
It is easy to see that y(t) given above is indeed a solution of (4.17). The solutionof (4.16) is x(t) = P−1y(t).
Locations of complex eigenvalues.
Gergorin disks. For a ∈ C and r ≥ 0, define D(a, r) = z ∈ C : |z− a| ≤ r.Let A = [aij ] ∈Mn(C). Then
D(r)i (A) := D
(aii,
∑j 6=i
|aij |)
is called the Gergorin row disk for the ith row of A;
D(c)j (A) := D
(ajj ,
∑i 6=j
|aij |)
46 4. RATIONAL CANONICAL FORMS AND JORDAN CANONICAL FORMS
is called the Gergorin column disk for the jth column of A. The Gergorin regionof A is defined to be
G(A) =( n⋃
i=1
D(r)i (A)
)∩( n⋃
j=1
D(c)j (A)
).
Theorem 4.21 (Gergorin). Let A ∈Mn(C). Then all the eigenvalues of A liein the Gergorin region of A.
Proof. Let A = [aij ] and let λ be an eigenvalue of A with an associatedeigenvector x = [x1, . . . , xn]T . Assume |xi| = max1≤j≤n |xj |. Since Ax = λx, wehave ai1x1 + · · ·+ ainxn = λxi. So,
|λ− aii||xi| = |(λi − aii)xi| =∣∣∣∑j 6=i
aijxj
∣∣∣ ≤ |xi|∑j 6=i
|aij |.
Hence |λ−aii| ≤∑
j 6=i |aij |. Thus λ ∈ D(r)i (A). Therefore, we have proved that λ ∈⋃n
i=1D(r)i (A). In the same (or by looking at AT ), we have λ ∈
⋃nj=1D
(c)j (A).
Corollary 4.22. Let A = [aij ] ∈Mn(C) such that either
(4.18) |aii| >∑j 6=i
|aij | for all 1 ≤ i ≤ n,
or
(4.19) |ajj | >∑i 6=j
|aij | for all 1 ≤ j ≤ n.
(A matrix satisfying (4.18) or (4.19) is called diagonally dominant.) Then A isinvertible.
Proof. We have 0 /∈ G(A).
Proposition 4.23. Let A = [aij ] ∈ Mn(C). Let X be a connected componentof G(A). Then the number of eigenvalues of A (counted with algebraic multiplicity)contained in X is |i : aii ∈ X|.
Proof. Let C be a contour (or a unioun of contours when X is not simplyconneted) such that C encloses X and C ∩G(A) = ∅. For t ∈ [0, 1], let
At =
a11 ta12 · · · ta1n
ta21 a22 · · · ta2n
......
. . ....
tan1 tan2 · · · ann
.Note that G(At) ⊂ G(A); hence C ∩ G(At) = ∅. The number of zeros of cAt
(z)(counted with multiplicity) in X is given by
N(t) :=1
2πi
∫C
c′At(z)
cAt(z)dz.
EXERCISES 47
N(t) is a continuous function of t ∈ [0, 1] and takes only integer values. Thus, N(T )is a constant for t ∈ [0, 1]. So,
the number of zeros of cA in X
=N(1) = N(0)= the number of zeros of cA0 in X
= |i : aii ∈ X|.
Exercises
4.1. Use the rational canonical form to give another proof for Exercise 3.2 (ii).
4.2. Let A = Mm×n(F ) and B ∈Mn×m(F ). Prove that
xn det(xIm −AB) = xm det(xIn −BA).
(In particular, if m = n, then cAB(x) = cBA(x).)
4.3. (Trace) For A = [aij ] ∈Mn(F ), define Tr(A) = a11 + a22 + · · ·+ ann. Provethe following statements.(i) Tr(AB) = Tr(BA) for A,B ∈Mn(F ).(ii) If A ∼ B, then Tr(A) = Tr(B).(iii) Let A ∈ Mn(F ). Then Tr(A) = 0 ⇔ A = XY − Y X for some X,Y ∈
Mn(F ).
4.4. Let A ∈ Mn(F ) have invariant factors d1, d2, . . . , dr, (d1 | d2 | · · · | dr).Define the centralizer of A in Mn(F ) to be
centMn(F )(A) = X ∈Mn(F ) : XA = AX.Prove that
dim[centMn(F )(A)
]=
r−1∑i=0
(2i+ 1) deg dr−i.
4.5. M2 For A ∈ Mn(F ), let 〈A〉 = f(A) : f ∈ F [x]. Obviously, 〈A〉 ⊂centMn(F )(A). Prove that centMn(F )(A) = 〈A〉 ⇔ cA(x) = mA(x). (Amatrix A ∈Mn(F ) with cA(x) = mA(x) is called nonderogatory.)
4.6. Let A,B ∈Mn(C) such that AB = BA. Let λ be an eigenvalue of A. Provethat the eigenspace EA(λ) is B-invariant, i.e., BEA(λ) ⊂ EA(λ). Use this toshow that A,B has a common eigenvector.
4.7. Let xn ∈ C satisfyx0 = a, x1 = b, x2 = c, x3 = d,
xn = 6xn−1 − 11xn−2 + 12xn−3 − 18xn−4, n ≥ 4.
Find an explicit formula for xn.
4.8. Find the rational canonical form of
A =
−9 −2 −9 −824 8 27 −24−4 −2 −4 5−7 −2 −6 7
∈M4(Q).
48 4. RATIONAL CANONICAL FORMS AND JORDAN CANONICAL FORMS
4.9. Let
A =
1 1 1 1 10 1 0 −1 −10 0 1 1 00 0 0 1 10 0 0 0 1
∈M5(C).
Use Proosition 4.19 to determine the Jordan canonical form of A.
4.10. Find all rational canonical forms (in terms of elementary divisors) of M4(Z2).The irreducibles of degree ≤ 4 in Z2[x] are x, x+ 1, x2 + x+ 1, x3 + x+ 1,x3 + x2 + 1, x4 + x+ 1, x4 + x3 + 1, x4 + x3 + x2 + x+ 1.
4.11. Let A ∈Mm(F ) and B,C ∈Mn(F ).(i) If A⊕B ∼ A⊕ C, then B ∼ C.(ii) If B ⊕B ∼ C ⊕ C, then B ∼ C.
CHAPTER 5
Inner Product Spaces and Unitary Spaces
5.1. Inner Product Spaces
Definition 5.1. An inner product space is a vector space V over R equippedwith a map (called the inner product) 〈·, ·〉 : V × V → R satisfying the followingconditions.
(i) 〈u, v〉 = 〈v, u〉 ∀u, v ∈ V .(ii) 〈au+ bv, w〉 = a〈u,w〉+ b〈v, w〉 ∀u, v, w ∈ V, a, b ∈ R.(iii) 〈u, u〉 ≥ 0 for all u ∈ V and 〈u, u〉 = 0⇔ u = 0.
Examples.
• V = Rn. For x = (x1, . . . , xn), y = (y1, . . . , yn) ∈ Rn, define
〈x, y〉 = x1y1 + · · ·+ xnyn.
• V = R[x]. For f(x), g(x) ∈ R[x], define
〈f, g〉 =∫ 1
−1
f(x)g(x)dx.
• `2 =(an)∞n=0 : an ∈ R,
∑∞n=0 a
2n <∞
. For (an), (bn) ∈ `2, define
〈(an), (bn)〉 =∞∑
n=0
anbn.
• L2(X). Let (X,B, µ) be a measure space. Two functions f, g : X → R ∪±∞ are considered the same if f = g almost everywhere. L2(X) = theset of all measurable functions f : X → R∪ ±∞ such that
∫X|f |2dµ <
∞. For f, g ∈ L2(X), define
〈f, g〉 =∫
X
fgdµ.
(`2 is a special case of L2(X).)
Norm and distance. Let V be an inner product space and let u, v ∈ V .||u|| :=
√〈u, u〉 is called the norm of u. ||u − v|| is called the distance between u
and v.
Inequalities and equalities. Let V be an inner product space.(i) (Cauchy-Schwartz) For all u, v ∈ V ,
|〈u, v〉| ≤ ||u|| ||v||.
The equality holds iff one of u, v is a scalar multiple of the other.
49
50 5. INNER PRODUCT SPACES AND UNITARY SPACES
(ii) (The triangle inequality) For all u, v ∈ V ,
||u+ v|| ≤ ||u||+ ||v||.
The equality holds iff one of u, v is a nonnegative multiple of the other.(iii) (Inner product in terms of norm)
(5.1) 〈u, v〉 =14(||u+ v||2 − ||u− v||2), u, v ∈ V.
(iv) (The parallelogram law)
||u+ v||2 + ||u− v||2 = 2||u||2 + 2||v||2, u, v ∈ V.
Proof. (i) Without loss of generality, assume v 6= 0. Let r = 〈u,v〉〈v,v〉 . Then
0 ≤ ||u− rv||2 = 〈u− rv, u− rv〉 = 〈u, u〉 − 2r〈u, v〉+ r2〈v, v〉 = ||u||2 − 〈u, v〉2
||v||2.
Hence, 〈u, v〉2 ≤ ||u||2||v||2, i.e., |〈u, v〉| ≤ ||u|| ||v||. The equality holds ⇔ u− rv =0⇔ u = 〈u,v〉
〈v,v〉v ⇔ u is a multiple of v.(ii) We have
||u+ v||2 = ||u||2 + ||v||2 + 2〈u, v〉 ≤ ||u||2 + ||v||2 + 2||u|| ||v|| = (||u||+ ||v||)2.
Isometry. Let V and W be two inner product spaces. A vector space isomor-phism f : V →W is called an isometry if
〈f(u), f(v)〉 = 〈u, v〉 for all u, v ∈ V.
Fact. Let V and W be two inner product spaces and let f ∈ HomR(V,W ).Then f preserves the inner products (i.e., 〈f(u), f(v)〉 = 〈u, v〉 ∀u, v ∈ V ) ⇔ fpreserves the norms (i.e., ||f(u)|| = ||u|| ∀u ∈ V ).
Proof. (⇐) By (5.1), the inner product is expressible in terms of the norm.
Orthogonality. Let V be an inner product space. Two elements u, v ∈ Vare called orthogonal, denoted as x⊥y, if 〈x, y〉 = 0. For X ⊂ V , define X⊥ = y ∈V : 〈y, x〉 = 0 ∀x ∈ X. X⊥ is a subspace of V .
Pythagorean theorem. Let V be an inner product space and let u, v ∈ V .Then u⊥v ⇔ ||u+ v||2 = ||u||2 + ||v||2.
Proposition 5.2. Let V be an inner product space and let S, T be subspacesof V .
(i) S ⊂ T ⇒ S⊥ ⊃ T⊥.(ii) S ∩ S⊥ = 0, S + S⊥ = S ⊕ S⊥. If dimS <∞, V = S ⊕ S⊥.(iii) S ⊂ S⊥⊥. If dimS <∞, S = S⊥⊥.(iv) If S ⊂ T , then
φ : S⊥/T⊥ −→ (T/S)∗
a+ T⊥ 7−→ 〈·, a〉
is an embedding. If dimV <∞, φ is an isomorphism.
5.1. INNER PRODUCT SPACES 51
Proof. (ii) We show that if dimS <∞, then V = S ⊕ S⊥.Method 1. The map ψ : V/S⊥ → S∗, a+ S⊥ 7→ 〈·, a〉 is an embedding. Hence
dimV/S⊥ ≤ dimS∗ = dimS = dim(S ⊕ S⊥)/S⊥. So, V/S⊥ = (S ⊕ S⊥)/S⊥, i.e.,V = S ⊕ S⊥.
Method 2. By the G-S orthonormalization (p. 52), S has an orthonormal basisu1, . . . , uk. For each x ∈ V , let x′ =
∑ki=1
〈x,ui〉〈ui,ui〉ui. Then x = x′ + (x− x′) where
x′ ∈ S and x− x′ ∈ S⊥.(iii) We show that if dimS <∞, then S⊥⊥ ⊂ S. ∀x ∈ S⊥⊥, write x = x1 +x2,
where x1 ∈ S and x2 ∈ S⊥. Since 0 = 〈x, x2〉 = 〈x2, x2〉, x2 = 0. So, x = x1 ∈ S.(iv) When dimV < ∞, by (ii), dim(S⊥/T⊥) = dimT − dimS = dim(T/S) =
dim(T/S)∗. So, φ is an isomorphism.
Note. In general, we do not have V = S ⊕ S⊥ and S = S⊥⊥. Example:Let S = (an) ∈ `2 : an = 0 for n large enough ⊂ `2. Then S⊥ = 0 andS⊥⊥ = `2 6= S.
Orthogonal and orthonormal sets. Let V be an inner product space. Asubset X ⊂ V is called orthogonal if 〈x, y〉 = 0 for all x, y ∈ X with x 6= y. X iscalled orthonormal if for x, y ∈ X,
〈x, y〉 =
1 if x = y,
0 if x 6= y.
An orthogonal set of nonzero vectors is linearly independent.
Hilbert bases. A maximal orthonormal set of V is called a Hilbert basis ofV . By Zorn’s lemma, V has a Hilbert basis. A Hilbert basis is not necessarily a
basis. Example: Let ei = (0, . . . , 0,i1, 0 . . . ) ∈ `2. Then ei : i ≥ 1 is a Hilbert
basis of `2 but not a basis of `2. Another example: Let V = R⊕R⊕ · · · with innerproduct 〈(x1, x2, . . . ), (y1, y2, . . . )〉 =
∑∞i=1 xiyi. Then ei : i ≥ 1 is a Hilbert
basis of V which is also a basis of V . Let ui, i ≥ 1, be the orthonormalizationp. 52) of ei − ei+1, i ≥ 1. Then ui : i ≥ 1 is a Hilbert basis of V . (If x⊥ui forall i, then x = (a, a, . . . ); hence x = 0.) But ui : i ≥ 1 is not a basis of V sincespanui : i ≥ 1 =
(x1, x2, . . . ) ∈ V :
∑∞i=1 xi = 0
6= V . If dimV <∞, a Hilbert
basis is a basis.
Projections. Assume that S is a subspace of V such that V = S⊕S⊥. Eachx ∈ V can be uniquely written as x = x1 + x2, where x1 ∈ S and x2 ∈ S⊥. x1 iscalled the (orthogonal) projection of x onto S and is denoted by projS(x).
If dimS <∞ and u1, . . . , uk is an orthonormal basis of S, then
projS(x) =k∑
i=1
〈x, ui〉ui.
Since ||x||2 = ||projS(x)||2 + ||x− projS(x)||2 ≥ ||projS(x)||2, we have
(5.2) ||x||2 ≥ |〈x, u1〉|2 + · · ·+ |〈x, uk〉|2, x ∈ V.The equality in (5.2) holds iff x ∈ span(u1, . . . , uk). (5.2) is called Bessel’s inequal-ity.
Proposition 5.3. Any two Hilbert bases of an inner product space V have thesame cardinality. This cardinality is called the Hilbert dimension of V .
52 5. INNER PRODUCT SPACES AND UNITARY SPACES
Proof. Only have to consider the case where dimV = ∞. Let X and Y betwo Hilbert bases of V . Clearly, |X| = ∞ and |Y | = ∞. For each x ∈ X, letf(x) = y ∈ Y : 〈y, x〉 6= 0 ⊂ Y .
1 We claim that Y =⋃
x∈X f(x). If ∃y ∈ Y \⋃
x∈X f(x), then y⊥x for allx ∈ X. Then X ∪ y
||y|| is an orthonormal set properly containing X, →←.2 We claim that |f(x)| ≤ ℵ0 for all x ∈ X. In fact, f(x) =
⋃∞n=1y ∈ Y :
|〈y, x〉| ≥ 1n. By Bessel’s inequality,∣∣∣y ∈ Y : |〈y, x〉| ≥ 1
n
∣∣∣ · ( 1n
)2
≤ ||x||2;
hence, |y ∈ Y : |〈y, x〉| ≥ 1n| ≤ n
2||x||2.3 |Y | =
∣∣⋃x∈X f(x)
∣∣ ≤ |X|ℵ0 = |X|. By symmetry, |X| ≤ |Y |.
Gram-Schmidt orthonormalization. Let V be an inner product spaceand let v1, v2, · · · ∈ V (finitely or countably many) be linearly independent. Thenthere is a unique orthonormal sequence u1, u2, · · · ∈ V such that for all k ≥ 1,
(i) span(u1, . . . , uk) = span(v1, . . . , vk);(ii) 〈vk, uk〉 > 0.
The sequence uk, called the Gram-Schmidt orthonormalization of vk, is inductivelygiven by
(5.3) uk =1||u′k||
u′k, where u′k = vk −k−1∑i=1
〈vk, ui〉ui.
Proof of the uniqueness of uk. Let wk be another orthonormal sequencesatisfying (i) and (ii). Then wk = a1u1+· · ·+akuk. Since wk⊥span(w1, . . . , wk−1) =span(u1, . . . , uk−1), we have a1 = · · · = ak−1 = 0; hence wk = akuk. Since ||wk|| =||uk|| = 1, we have ak = ±1. Since 〈vk, uk〉 > 0 and 〈vk, wk〉 > 0, we haveak = 1.
Theorem 5.4 (Explicit formula for the G-S orthonormalization). In the abovenotation, define
Dn =
∣∣∣∣∣∣∣∣〈v1, v1〉 · · · 〈v1, vn〉
......
〈vn, v1〉 · · · 〈vn, vn〉
∣∣∣∣∣∣∣∣ , n ≥ 1,
and D0 = 1. Then Dn > 0 for all n ≥ 0 and
(5.4) un =1√
Dn−1Dn
∣∣∣∣∣∣∣∣∣∣〈v1, v1〉 · · · 〈v1, vn〉
......
〈vn−1, v1〉 · · · 〈vn−1, vn〉v1 · · · vn
∣∣∣∣∣∣∣∣∣∣, n ≥ 1.
Proof. It follows from Fact 5.5 that Dn > 0 for all n ≥ 0. Let un be given by(5.4). Then
(5.5) un =√Dn−1
Dnvn + an,n−1vn−1 + · · ·+ an1v1.
5.1. INNER PRODUCT SPACES 53
It remains to show that u1, u2, . . . is orthonormal. Let 1 ≤ i ≤ n. We have
〈vi, un〉 =1√
Dn−1Dn
∣∣∣∣∣∣∣∣∣∣〈v1, v1〉 · · · 〈v1, vn〉
......
〈vn−1, v1〉 · · · 〈vn−1, vn〉〈vi, v1〉 · · · 〈vi, vn〉
∣∣∣∣∣∣∣∣∣∣= 0.
So, un⊥span(v1, . . . , vn−1) = span(u1, . . . , un−1). By (5.5) and (5.4),
〈un, un〉 =⟨√Dn−1
Dnvn, un
⟩
=√Dn−1
Dn
1√Dn−1Dn
∣∣∣∣∣∣∣∣∣∣〈v1, v1〉 · · · 〈v1, vn〉
......
〈vn−1, v1〉 · · · 〈vn−1, vn〉〈vn, v1〉 · · · 〈vn, vn〉
∣∣∣∣∣∣∣∣∣∣= 1.
Fact. Every inner product space V with dimV ≤ ℵ0 has an orthonormal basis.Any two inner product spaces V and W with dimV = dimW ≤ ℵ0 are isometric.
Note.
(i) If V is an inner product space with dimV = ℵ0, then its completion isisometric to `2.
(ii) Let V be an inner product space with dimV ≥ ℵ and let V be a completionof V . Then dim V = dimV .
(iii) Let V be a non complete inner product space such that dimV ≥ ℵ. Thendim V = dimV , but V and V are not isometric.
Proof. (i) May assume V = Rℵ0 and 〈(xn), (yn)〉 =∑∞
i=0 xnyn. The comple-tion of V is `2.
(ii) Let X be a basis of V and Y a basis of V . For each y ∈ Y , ∃ a sequenceyn ∈ V such that limn→∞ yn = y. Each yn is a linear combination of finitelymany elements in X. Hence, ∃ a countable subset x0, x1, . . . ⊂ X such that yn ∈spanx0, x1, . . . for all n. So, y ∈ spanx0, x1, . . . , the closure of spanx0, x1, . . . in V . spanx0, x1, . . . is a completion of spanx0, x1, . . . . By 1, spanx0, x1, . . . is isometric to `2. Define f(y) = (x0, x1, . . . ). Then for each (xn) ∈ XN,∣∣f−1
((xn)
)∣∣ ≤ ∣∣spanx0, x1, . . . ∣∣ = |`2| ≤ |RN| = ℵ.
Therefore,
|Y | =∣∣∣ ⋃(xn)∈XN
f−1((xn)
)∣∣∣ ≤ |XN|ℵ = |X|ℵ = |X|.
Example (Legendre polynomials). For f, g ∈ R[x], define
〈f, g〉 =∫ 1
−1
f(x)g(x)dx.
54 5. INNER PRODUCT SPACES AND UNITARY SPACES
Let f0, f1, f2, . . . be the G-S orthonormalization of 1, x, x2, . . . . f0, f1, f2, . . . arecalled the Legendre polynomials. Computation of fn using (5.3) or Theorem 5.4 iscomplicated. The following method is more effective.
Let
gn(x) =dn
dxn(x2 − 1)n =
bn/2c∑k=0
(−1)k
(n
k
)(2n− 2k)nx
n−2k,
where (a)b = a(a− 1) · · · (a− b+ 1) for b ∈ N. Let pn(x) = (x2 − 1)n. Integratingby parts repeatedly, we have
〈gm, gn〉 =∫ 1
−1
p(m)m (x) p(n)
n (x)dx =
0 if m 6= n,
(−1)n(2n)!∫ 1
−1(x2 − 1)ndx if m = n.
Note that
∫ 1
−1
(x2 − 1)ndx =∫ 1
−1
(x− 1)n(x+ 1)ndx
=1
n+ 1
∫ 1
−1
(x− 1)nd(x+ 1)n+1
= − 1n+ 1
∫ 1
−1
(x+ 1)n+1d(x− 1)n
= − n
n+ 1
∫ 1
−1
(x− 1)n−1(x+ 1)n+1dx
= · · ·
= (−1)n n!(2n)n
∫ 1
−1
(x+ 1)2ndx
= (−1)n n!(2n)n
· 22n+1
2n+ 1.
Hence,
〈gn, gn〉 =(n!)222n+1
2n+ 1.
So,
fn(x) =1||gn||
g(x) =
√n+ 1
2
n!2n
dn
dxn(x2 − 1)n.
5.2. FINITE DIMENSIONAL INNER PRODUCT SPACES 55
A “space walk”.
..................................................................................................................................................................... ...........................................................................................................................................................................................
..................................................................................................................................................................... ...........................................................................................................................................................................................
...................................................................................................
...................................................................................................
...................................................................................................
...................................................................................................
..................................................................................................................................................................... ...........
..................................................................................................................................................................... ...........
...................................................................................................
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................
....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................. ..........................................................................................................................................................................
................
................
................
....................................................................................................................................................................................................................................................
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
................
................
................
....................................................................................................................................................................................................................................................
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................
completeness
completeness
Hilbert spaceinner product
space
Banach space normedvector space
metric space
topologicalvector space
topologicalspace
vector space
5.2. Finite Dimensional Inner Product Spaces
The Gram matrix. Let V be an n-dimensional inner product space and letε1, . . . , εn be a basis of V . The Gram matrix of ε1, . . . , εn, denoted G(ε1, . . . , εn), isthe n× n matrix [〈εi, εj〉]. If u = x1ε1 + · · ·+ xnεn and v = y1ε1 + · · ·+ ynεn, then
〈u, v〉 = (x1, . . . , xn)[〈εi, εj〉]
y1...yn
.The Gram matrix [〈εi, εj〉] is symmetric and has the property that xT [〈εi, εj〉]x > 0for all 0 6= x ∈ Rn. (Unless specified otherwise, vectors in Rn are columns.) Ann × n symmetric matrix A over R is called positive definite if xTAx > 0 for all0 6= x ∈ Rn. Let A be an n× n positive definite matrix and define
〈x, y〉A = xTAy, x, y ∈ Rn.
Then (Rn, 〈·, ·〉A) is an inner product space. The map V → Rn, x1ε1 + · · ·+xnεn 7→(x1, . . . , xn)T is an isometry from (V, 〈·, ·〉) to (Rn, 〈·, ·〉G(ε1,...,εn)).
Fact 5.5. A is an n × n positive definite matrix ⇔ A = PTP for some P ∈GL(n,R).
Proof. (⇒) (Rn, 〈·, ·〉A) is isometric to (Rn, 〈·, ·〉I). Let T : Rn → Rn, x 7→Px, be the isometry, and let e1, . . . , en be the standard basis of Rn. Then
A = [〈ei, ej〉A] = [〈Pei, P ej〉I ] =[eTi P
TPej
]= PTP.
56 5. INNER PRODUCT SPACES AND UNITARY SPACES
Orthogonal transformations and orthogonal matrices. An isometryof an n-dimensional inner product space V is also called an orthogonal transforma-tion of V . A matrix A ∈Mn(R) is called orthogonal if ATA = I. Let u1, . . . , un bean orthonormal basis of V and T ∈ End(V ) such that
T (u1, . . . , un) = (u1, . . . , un)A.
Then T is orthogonal ⇔ A is orthogonal.
Examples of orthogonal matrices. Permutation matrices;[−1
1
],[
cos θ − sin θsin θ cos θ
];
block sums of orthogonal matrices.
Easy facts about orthogonal matrices. Let O(n) be the set of all n×northogonal matrices. Let A,B ∈ O(n).
(i) AB, A−1, AT ∈ O(n).(ii) detA = ±1.(iii) All complex eigenvalues of A have norm 1.
QR factorization. Let A ∈Mm×n(R) such that rankA = n. Then A = QR,where Q ∈Mm×n(R) has orthonormal columns and R ∈Mn(R) is upper triangularwith positive diagonal entries. The matricesQ and R, with the described properties,are unique.
Proof. Let A = [a1, . . . , an]. Let u1, . . . , un be the G-S orthonormalization ofa1, . . . , an. Then A = [u1, . . . , un]R.
Proposition 5.6. Let A ∈ O(n).
(i) If detA = 1, A is a product of orthogonal matrices of the form
(5.6)
1. . .
1cos θ − sin θ i
1. . .
1sin θ cos θ j
1. . .
1
.
(The matrix in (5.6) is called a rotation matrix.)(ii) If detA = −1, A is a product of [−1] ⊕ In−1 and matrices of the form
(5.6)
Proof. (i) Denote the matrix in (5.6) by R(i, j, θ). Clearly, R(i, j, θ)−1 =R(i, j,−θ).
5.2. FINITE DIMENSIONAL INNER PRODUCT SPACES 57
Use induction on n. The case n = 1 is obvious. Assume n > 1 and let A = [aij ].Choose θ such that a11 sin θ + a21 cos θ = 0. Then
R(1, 2, θ)A =
a′11 ∗ · · · ∗0 ∗ · · · ∗∗... ∗∗
.
In this way, we see that ∃ rotation matrices R2, . . . , Rn such that
Rn · · ·R2A =
b11 b12 · · · b1n
0... ∗0
.Since Rn · · ·R2A is orthogonal, b11 = ±1. We may assume b11 = 1 (Otherwise,look at R(1, 2, π)Rn · · ·R2A.) Since (b11, b12, . . . , b1n) has norm 1, we have b12 =· · · = b1n = 0. So,
Rn · · ·R2A =
[1 00 A1
],
where A1 ∈ O(n − 1). By the induction hypothesis, A1 = S1 · · ·Sm, whereS1, . . . , Sm are rotation matrices in O(n− 1). Thus,
A = R−12 · · ·R−1
n
[1
S1
]· · ·
[1
Sm
],
where all factors are rotation matrices in O(n).(ii) Apply (i) to
([−1]⊕ In−1
)A.
The projection matrix. Let Rn be the inner product space with the stan-dard inner product 〈·, ·〉I . Let S be a subspace of Rn with a basis a1, . . . , am. LetA = [a1, . . . , am] ∈Mn×m(R). Then
projS(x) = Qx, x ∈ Rn,
whereQ = A(ATA)−1AT .
Q is called the projection matrix of S. If a1, . . . , am is an orthonormal basis of S,then Q = AAT .
Proof. 1 ∀x, y ∈ Rn, since Qx ∈ S and y −Qy ∈ S⊥, we have
0 = 〈Qx, y −Qy〉 = xTQT (I −Q)y = xT (QT −QTQ)y.
Thus, QT = QTQ. It follows that Q = QT and Q = Q2.2 We have
Q = projS(e1, . . . , en) = [a1, . . . , am]B = AB
for some B ∈Mm×n(R) with rankB = m. By 1,
BTATAB = QTQ = QT = BTAT .
58 5. INNER PRODUCT SPACES AND UNITARY SPACES
Thus, ATAB = AT . Since ATA is invertible (Exercise 5.1), B = (ATA)−1AT .Hence Q = AB = A(ATA)−1AT .
The adjoint map. Let V and W be finite dimensional inner product spacesand let f ∈ HomR(V,W ). For each w ∈ W , 〈f(·), w〉 ∈ V ∗. By Proposition 5.2(iv), ∃ a unique element of V , depending on f and w, denoted by f?(w), such that〈f(·), w〉 = 〈·, f?(w)〉. It is easy to check that f? ∈ HomR(W,V ). f? is called theadjoint of f . Moreover, ( )? : W → V is an R-map.
Let f∗ : W ∗ → V ∗ be the R-map defined in Proposition 3.16. Also let φV :V → V ∗ be defined by φV (v) = 〈·, v〉. Then the following diagram commutes.
Wf?
−→ V
∼=
yφW∼=
yφV
W ∗ f∗−→ V ∗
Let v1, . . . , vm be a basis of V and w1, . . . , wn a basis of W and write
f(v1, . . . , vm) = (w1, . . . , wn)A, A ∈Mn×m(R),
f?(w1, . . . , wn) = (v1, . . . , vm)B, B ∈Mm×n(R).
Namely, A (B) is the matrix of f (f?) relative to the bases v1, . . . , vm and w1, . . . , wn
(w1, . . . , wn and v1, . . . , vm). Then
AT
w1
...wn
[w1, . . . , wn] =
f(v1)
...f(vm)
[w1, . . . , wn] =
v1...vm
[f?(w1), . . . , f?(wn)]
=
v1...vm
[v1, . . . , vm]B,
i.e.,ATG(w1, . . . , wn) = G(v1, . . . , vm)B.
If v1, . . . , vm and w1, . . . , wn are orthonormal, then
AT = B.
Self-adjoint maps. An R-map f : V → V is called self-adjoint if f? = f . LetRn be the inner product space with the standard inner product and let f : Rn → Rn
be defined by f(x) = Ax, where A ∈Mn(R). Then f is self-adjoint ⇔ A = AT .
Orthogonal similarity. Two matrices A,B ∈ Mn(R) are called orthogo-nally similar if ∃P ∈ O(n) such that A = PBPT . Let V be an n-dimensional innerproduct space. Two matrices in Mn(R) are orthogonally similar iff they are thematrices of some T ∈ End(V ) relative to two suitable orthonormal bases of V .
Normal matrices. A ∈ Mn(R) is called normal if AAT = ATA. Examples:symmetric, skew symmetric and orthogonal matrices.
Theorem 5.7 (Canonical forms of normal matrices under orthogonal simi-larity). Let A ∈ Mn(R) be normal. Let the eigenvalues of A be a1, . . . , as, b1 ±
5.2. FINITE DIMENSIONAL INNER PRODUCT SPACES 59
c1i, . . . , bt ± cti, where ak, bl, cl ∈ R, cl 6= 0, and s+ 2t = n. Then ∃P ∈ O(n) suchthat
PTAP =
a1
. . .
as
b1 c1
−c1 b1. . .
bt ct
−ct bt
.
Proof. Use induction on n.Case 1. A has a real eigenvalue a. Let x1 ∈ Rn such that ||x1|| = 1 and
Ax1 = ax1. Extend x1 to an orthonormal basis x1, x2, . . . , xn of Rn. For k ≥ 2, byLemma 5.16, 〈Axk, x1〉 = 〈xk, A
Tx1〉 = 〈xk, ax1〉 = 0. So,
A[x1, . . . , xn] = [x1, . . . , xn]
[a 00 A1
],
where A1 ∈Mn−1(R) is normal. Use the induction hypothesis on A1.Case 2. A has an eigenvalue λ = b + ci, c 6= 0. Let 0 6= z ∈ Cn such that
Az = λz. By Lemma 5.16,
λzT z = zTAz = (zTAz)T = zTAT z = λzT z.
Hence zT z = 0. Write z = u + iv, u, v ∈ Rn. Then Az = λz implies thatA[u, v] = [u, v]
[b c−c b
]; zT z = 0 implies that ||u|| = ||v|| and 〈u, v〉 = 0. We may
assume ||u|| = ||v|| = 1. Extend u, v to an orthonormal basis u, v, x3, . . . , xn of Rn.Then for k ≥ 3, (Axk)T z = xT
kAT z = xT
k λz = 0. So, 〈Axk, u〉 = 〈Axk, v〉 = 0.Therefore,
A[u, v, x3, . . . , xu] = [u, v, x3, . . . , xu]
b c
−c b
A1
,where A1 ∈Mn−2(R) is normal. Use the induction hypothesis on A1.
Corollary 5.8. Let A ∈Mn(R).
(i) A = AT ⇔ A is orthogonally similar to a diagonal matrix. In particular,all eigenvalues of a symmetric matrix in Mn(R) are real.
(ii) A = −AT ⇔ A is orthogonally similar to[0 c1
−c1 0
]⊕ · · · ⊕
[0 ct
−ct 0
]⊕ 0
for some c1, . . . , ct ∈ R×. In particular, all eigenvalues of a skew symmet-ric matrix in Mn(R) are purely imaginary.
(iii) A is orthogonal ⇔ A is normal and all eigenvalues of A are of complexnorm 1.
60 5. INNER PRODUCT SPACES AND UNITARY SPACES
Positive definite and semi positive definite matrices. Let A ∈Mn(R)be symmetric. Recall that A is called positive definite if xAxT > 0 for all 0 6= x ∈Rn. A is called semi positive definite if xAxT ≥ 0 for all x ∈ Rn.
Proposition 5.9. Let A ∈ Mn(R) be symmetric. The following statementsare equivalent.
(i) A is positive definite.(ii) All eigenvalues of A are positive.(iii) A = BBT for some B ∈ GL(n,R).(iv) A = BBT for some Mn×m(R) with rankB = n.(v) detA(I, I) > 0 for every I ⊂ 1, . . . , n.(vi) detA(1, . . . , k, 1, . . . , k) > 0 for every 1 ≤ k ≤ n. (detA(1, . . . , k,
1, . . . , k) is called a leading principal minor of A.)
Proof. The equivalence of (i) – (iv) is easy.(i) ⇒ (v). We claim that A(I, I) is positive definite. To see this, we may
assume I = 1, . . . , k. For each (row vector) 0 6= x ∈ Rk, 0 6= (x, 0) ∈ Rn.So, xA(I, I)xT = (x, 0)A(x, 0)T > 0. Thus A(I, I) is positive definite. By (ii),detA(I, I) > 0.
(v) ⇒ (vi). Obvious.(vi) ⇒ (i). Use induction on n. The case n = 1 is obvious. Assume n > 1. Let
I = 1, . . . , n − 1. Since detA(1, . . . , k, 1, . . . , k) > 0 for all 1 ≤ k ≤ n − 1,by the induction hypothesis, A(I, I) is positive definite. In particular, A(I, I) isinvertible. Hence A is congruent to[
A(I, I) 00 λ
]for some λ ∈ R. Since λ = det A
det A(I,I) > 0, A(I, I) ⊕ [λ] is positive definite. Hencethe conclusion.
Proposition 5.10. Let A ∈ Mn(R) be symmetric. The following statementsare equivalent.
(i) A is semi positive definite.(ii) All eigenvalues of A are ≥ 0.(iii) A = BBT for some B ∈Mn×r(R) with rankB = r.(iv) A = BBT for some B ∈Mn×m(R).(v) detA(I, I) ≥ 0 for all I ⊂ 1, . . . , n.
Proof. (v) ⇒ (i). We have cA(x) = xn−an−1xn−1 + · · ·+(−1)na0, where ak
is the sum of all k × k principal minors of A. Since ak ≥ 0 for all 0 ≤ k ≤ n − 1,cA(x) has no negative roots.
Note. Regarding Proposition 5.10 (v), if all leading minors of A are ≥ 0, A isnot necessarily semi positive definite. Example: A =
[0−1
].
Generalized inverses. Let A ∈Mm×n(R). The map
(5.7)φ : C(AT ) −→ C(A)
ATx 7−→ AATx
is an isomorphism (∵ kerφ = 0 and dim C(AT ) = dim C(A)). Let P be theprojection matrix of C(A). Then ∃!A+ ∈Mn×m(R) such that C(A+) ⊂ C(AT ) and
5.2. FINITE DIMENSIONAL INNER PRODUCT SPACES 61
AA+ = P . A+ is called the (Moore-Penrose) generalized inverse of A. Clearly, ifA is invertible, A+ = A−1.
Properties of A+. Let A ∈Mm×n(R) and let P,Q be the projection matricesof C(A) and C(AT ), respectively.
(i) AA+ = P , A+A = Q.(ii) A+P = QA+ = A+.(iii) A+AA+ = A+, AA+A = A.(iv) rankA+ = rankA.
Proof. (i) Note that C(A+A) ⊂ C(AT ), C(Q) ⊂ C(AT ) and AA+A = PA = A,AQ = (QAT )=(AT )T = A. Since (5.7) is an isomorphism, we have A+A = Q.
(ii) Since C(A+) ⊂ C(AT ), we have QA+ = A+. Then A+P = A+AA+ =QA+ = A+.
(iii) A+AA+ = QA+ = A+.(iv) It follows from (iii) that rankA+ ≤ rankA and rankA ≤ rankA+.
Proposition 5.11 (Characterization of A+). Let A ∈ Mm×n(R) and B ∈Mn×m(R). Then B = A+ ⇔
(i) ABA = A, BAB = B and(ii) both AB and BA are symmetric.
Proof. (⇐) We have (AB)2 = AB, (AB)T = AB, and C(AB) = C(A) (by(i), rankAB ≥ rankA). So, P := AB is the projection matrix of A. Since B =(BA)B = ATBTB, C(B) ⊂ (AT ). Since AB = P , we have B = A+.
Singular value decomposition. Let A ∈Mm×n(R). Then ∃P ∈ O(m) andQ ∈ O(n) such that
A = P [diag(sa, . . . , sr)⊕ 0]Q,
where s1, . . . , sr ∈ R+ and s21, . . . , s2r are the nonzero eigenvalues of ATA. s1, . . . , sr
are called the singular values of A.
Proof. ATA is semi positive definite. Hence ∃Q1 ∈ O(n) such that
(5.8) QT1 A
TAQ1 = diag(s21, . . . , s2r)⊕ 0, si > 0.
Write AQ1 = [a1, . . . , an]. Then
aTi aj =
s2i if i = j ≤ r,0 otherwise.
By (5.8), rankA = rankATA = r; hence span (a1, . . . , an) = span (a1, . . . , ar). Letui = 1
siai, 1 ≤ i ≤ r. Then u1, . . . , ur is orthonormal. Extend it to an orthonormal
basis u1, . . . , um of Rm. Then
[u1, . . . , um]TAQ1 = diag (s1, . . . , sr)⊕ 0.
So, A = [u1, . . . , um](diag (s1, . . . , sr)⊕ 0
)QT
1 .
Proposition 5.12. If A ∈ Mm×n(R) has a singular value decomposition A =P(diag (s1, . . . , sr)⊕ 0
)Q, then A+ = QT
(diag ( 1
s1, . . . , 1
sr)⊕ 0
)PT .
Proof. It follows from Proposition 5.11.
62 5. INNER PRODUCT SPACES AND UNITARY SPACES
Least squares solutions. Let A ∈Mm×n(R) and b ∈ Rm. For each x ∈ Rn,
||Ax− b||2 = ||Ax− projC(A)(b)||2 + ||projC(A)(b)− b||2.
Hence ||Ax− b|| is minimum iff
(5.9) Ax = projC(A)(b).
A solution of (5.9) is called a least squares solution of
(5.10) Ax = b.
Note that (5.9) is always consistent even if (5.10) is not.
Proposition 5.13. Assume the above notation.(i) (5.9) ⇔ ATAx = AT b.(ii) A+b+ kerc(A) is the set of least squares solutions of (5.10).(iii) A+b is the unique least square solution of (5.10) of minimum norm.
Proof. (i) (5.9) ⇔ (Ax− b)⊥C(A)⇔ AT (Ax− b) = 0.(ii) Only have to show that A+b is a least squares solution. We have ATAA+b =
AT (AA+)T b = (AA+A)T b = AT b.(iii) Note that A+b ∈ C(AT ) ⊂ kerc(A)⊥.
Polar decomposition. Let A ∈Mn(R). Then ∃P ∈ O(n) and semi positivedefinite matrices B1 and B2 such that
(5.11) A = B1P = PB2.
If A ∈ GL(n,R), then B1 and B2 are positive definite and P,B1, B2 are unique.
Proof. By the singular value decomposition, ∃Q,R ∈ O(n) such that
A = Q
s1
. . .
sn
R,where 0 ≤ si ∈ R. Let B1 = Qdiag(s1, . . . , sn)QT , B2 = RT diag(s1, . . . , sn)R, andP = QR. We have (5.11).
Uniqueness of B1, B2, P when A ∈ GL(n,R). Assume A = B′1P
′1 = P ′2B
′2,
where P ′1, P′2 ∈ O(n) and B′
1, B′2 are positive definite. Then B2
1 = AAT = B′12.
Then B1 = B′1 (Exercise 5.5 (i)). So, P ′1 = P . In the same way, B2 = B′
2 andP ′2 = P .
5.3. Unitary Spaces
A unitary space is an inner product space over C.
Definition 5.14. A unitary space is a vector space V over C equipped with amap 〈·, ·, 〉 : V ×V → C, called the inner product, satisfying the following conditions.
(i) 〈u, v〉 = 〈v, u〉, ∀u, v ∈ V .(ii) 〈au+ bv, w〉 = a〈u,w〉+ b〈v, w〉, ∀u, v, w ∈ V , a, b ∈ C.(iii) 〈u, u〉 ≥ 0 for all u ∈ V and 〈u, u〉 = 0⇔ u = 0.
Examples.
5.3. UNITARY SPACES 63
• V = Cn. For x = (x1, . . . , xn), y = (y1, . . . , yn) ∈ Cn, define
〈x, y〉 =n∑
i=1
xiyi.
• `2C := (an)∞n=0 : an ∈ C,∑∞
n=0 |an|2 <∞. For (an), (bn) ∈ `2C, define
〈(an), (bn)〉 =∞∑
n=0
anbn.
• Let (X,B, µ) be a measure space and let L2C(X) = u+iv : u, v ∈ L2(X).
For f, g ∈ L2C(X), define
〈f, g〉 =∫
X
fgdµ.
Complexification. Let V be a vector space over R. Define
VC = u+ vi : u, v ∈ V .
For u1 + v1i, u2 + v2i ∈ VC and a + bi ∈ C, where u1, u2, v1, v2 ∈ V and a, b ∈ R,define
(u1 + v1i) + (u2 + v2i) = (u1 + u2) + (v1 + v2)i,
(a+ bi)(u1 + v1i) = (au1 − bv1) + (bu1 + av1)i.
Then VC is a vector space over C; VC is called the complexification of V . (In factVC = C⊗R V .)
If V is an inner product space over C, then VC is a unitary space with innerproduct
〈u1 + v1i, u2 + v2i〉 = 〈u1, u2〉+ 〈v1, v2〉+ (〈v1, u2〉+ 〈u1, v2〉)i.
Cn is the complexification of Rn; L2C(X) is the complexification of L2(X).
On the other hand, if V is a vector space over C, it is of course a vector spaceover R. We write VR for V viewed as a vector space over R. If (V, 〈·, ·〉) is a unitaryspace, then (VR,Re〈·, ·〉) is an inner product space.
Almost all definitions and results about inner product spaces can be carried tounitary spaces without additional work.
• Norm: ||x|| = 〈x, x〉 12 .• Distance: ||x− y||.• Orthogonality: x⊥y if 〈x, y〉 = 0.• Adjoint: Let V and W be finite dimensional unitary spaces and f ∈
HomC(V,W ), then ∃!f? ∈ HomC(W,V ), called the adjoint of f , such that〈f(x), y〉 = 〈x, f?(y)〉 ∀x ∈ V, y ∈W .
• For A ∈Mm×n(C), A∗ := AT .• Hermitian matrices: A ∈Mn(C) such that A∗ = A.• (Semi) positive definite matrices: Hermitian matrix A such that x∗Ax > 0
(≥ 0) for all 0 6= x ∈ Cn.• Unitary matrices: P ∈ Mn(C) such that PP ∗ = I. The set of all n × n
unitary matrices is denoted by U(n).• Unitary transformations: f ∈ HomC(V, V ) such that 〈f(x), f(y)〉 = 〈x, y〉∀x, y ∈ V .
64 5. INNER PRODUCT SPACES AND UNITARY SPACES
• The generalized inverse: Let A ∈ Mm×n(C) and let P be the projectionmatrix of C(A). A+ ∈Mn×m(C) is the unique matrix such that C(A+) ⊂C(A∗) and AA+ = P .• Normal matrices: A ∈Mn(C) such that AA∗ = A∗A.• Unitary similarity: A,B ∈Mn(C) are called unitarily similar if ∃P ∈ U(n)
such that A = PBP ∗.
Canonical forms of normal matrices under unitary similarity. Theresult is simpler than the case of real normal matrices under orthogonal similaritydue to the fact that C is algebraically closed. (Compare with Theorem 5.7.)
Proposition 5.15. A matrix A ∈ Mn(C) is normal ⇔ A is unitarily similarto a diagonal matrix.
Proof. (⇐) Obvious.(⇒) Method 1. Use Lemma 5.16 and the same argument of the proof of Theo-
rem 5.7, case 1.Method 2. By Lemma 5.17, we may assume that A is upper triangular, say,
A =
a11 a12 · · · a1n
a21 · · · a2n
. . ....ann
.Compare the (1, 1) entries of A∗A and A∗A. We have
|a11|2 = |a11|2 + |a12|2 + · · ·+ |a1n|2.
So, a12 = · · · = a1n = 0. Using induction, we have aij = 0 for all i < j.
Lemma 5.16. Let A ∈Mn(C) be normal. If Ax = λx, where λ ∈ C and x ∈ Cn,then A∗x = λx.
Proof. Since AA∗ = A∗A, we have
〈A∗x− λx, A∗x− λx〉 = 〈Ax− λx, Ax− λx〉 = 0.
Lemma 5.17. Let A ∈ Mn(C). Then ∃P ∈ U(n) such that P ∗AP is uppertriangular.
Proof. Let λ1 ∈ C be an eigenvalue of A and let x1 ∈ C be an associatedeigenvector with ||x1|| = 1. Extend x1 to an orthonormal basis x1, x2, . . . , xn ofCn. Then
A[x1, . . . , xn] = [x1, . . . , xn]
λ1 ∗ · · · ∗0... A1
0
,where A1 ∈Mn−1(C). Apply the induction hypothesis to A1
5.3. UNITARY SPACES 65
Theorem 5.18 (Specht). Let A,B ∈ Mn(C). Then A and B are unitarilysimilar ⇔
(5.12) Tr(Ai1A∗j1 · · ·AikA∗jk) = Tr(Bi1B∗j1 · · ·BikB∗jk)
for all k ≥ 0 and i1, 1, . . . , ik, jk ∈ N.
Proof. (⇒) ∃P ∈ U(n) such that A = PBP ∗. Then
Tr(Ai1A∗j1 · · ·AikA∗jk) = Tr(PBi1B∗j1 · · ·BikB∗jkP ∗) = Tr(Bi1B∗j1 · · ·BikB∗jk).
(⇐) The proof of this part needs representation theory.1 LetA be the C algebra generated by A and A∗ and B the C algebra generated
by B and B∗. Each element in A is a linear combination f(A,A∗) of productsAi1A∗j1 · · ·AikA∗jk with coefficients in C. Define
φ : A −→ Bf(A,A∗) 7−→ f(B,B∗).
Then φ is a well defined isomorphism. In fact, if f(B,B∗) = 0, then by (5.12),
Tr(f(A,A∗)∗f(A,A∗)
)= Tr
(f(B,B∗)∗f(B,B∗)
)= 0;
hence f(A,A∗) = 0.2 A is semisimple.Let I be a nilpotent ideal of A. Then I2m
= 0 for some m > 0. Let C ∈ I.Then (CC∗)2
m
= 0. It follows that (CC∗)2m−1
= 0. By induction, CC∗ = 0, whichimplies C = 0.
3 Let V1 be the natural A-module Cn. Let V − 2 be the A-module Cn withscalar multiplication C ∗ x = φ(C)x, C ∈ A, x ∈ Cn. We claim that AV1
∼= AV2.Let 1 = e1 + · · ·+ eu be a decomposition of 1 into primitive orthogonal idem-
potents of A. Then we can write
AV1 =u⊕
i=1
si⊕j=1
Lij ,
AV2 =u⊕
i=1
ti⊕j=1
Mij ,
where Lij∼= Aei, Mij
∼= Aei. (See [?, 25.8].) Then
si dimCAei = Tr(ei) = Tr(φ(ei)
)(by (5.12))
= ti dimCAei.
So, si = ti and AV1∼= AV2.
4 Let α : AV1 → AV2, x 7→ Px be the isomorphism in 3, where P ∈ GL(n,C).Then ∀C ∈ A,
PCx = α(Cx) = φ(C)α(x) = φ(C)Px ∀x ∈ Cn.
Hence φ(C) = PCP−1. In partticular, B = PAP−1 and B∗ = PA∗P−1. ByExercise 5.15, A and B are unitarily similar.
66 5. INNER PRODUCT SPACES AND UNITARY SPACES
Exercises
5.1. Let A ∈Mm×n(C). Prove that rank (A∗A) = rankA.
5.2. Let V and W be inner product spaces. Let f : V → W be a function suchthat(i) f(0) = 0;(ii) ||f(u)− f(v)|| = ||u− v||.Prove that f is a linear transformation.
5.3. Let V be a vector space over R and 〈·, ·〉 : V × V → R a function such that(1) 〈u, v〉 = 〈v, u〉 ∀u, v ∈ V ;(2) 〈au+ bv, w〉 = a〈u,w〉+ b〈v, w〉 ∀u, v, w ∈ V, a, b ∈ R;(3) 〈u, u〉 ≥ 0 ∀u ∈ V .
For each u ∈ V , define ||u|| = 〈u, u〉 12 . Prove the following statements.(i) V0 := u ∈ V : ||u|| = 0 is a subspaces of V .(ii) |〈u, v〉| ≤ ||u|| · ||v|| ∀u, v ∈ V .(iii) V0 = u ∈ V : 〈u, v〉 = 0 ∀v ∈ V .(iv) Define
〈·, ·, 〉 : V/V0 × V/V0 −→ R(u+ V0, v + V0) 7−→ 〈u, v〉.
Then (V/V0, 〈·, ·, 〉) is an inner product space.
5.4. (Hermite polynomials) For f, g ∈ R[x], define
〈f, g〉 =∫ +∞
−∞f(x)g(x)e−x2
dx.
Let h0(x), h1(x), h2(x), . . . be the G-S orthonormalization of 1, x, x2, . . . . De-termine hn(x) through the following steps.(i) Let
Hn(x) = (−1)nex2 dn
dxne−x2
.
Prove that
Hn(x) = n!bn/2c∑k=0
(−1)k(2x)n−2k
k!(n− 2k)!.
(ii) Use induction and integration by parts to show that
〈Hm,Hn〉 = 2nn!√πδm,n.
(iii) Use (i) and (ii) to show that
hn(x) = (2nn!√π)−
12Hn(x) = π−
14
( n!2n
) 12bn/2c∑k=0
(−1)k(2x)n−2k
k!(n− 2k)!.
5.5. Let A ∈Mn(C) be semi positive definite.(i) Prove that ∃! semi positive definite matrix A1 ∈Mn(C) such that A =
A21.
(ii) Let B ∈Mn(C). Then B commutes with A⇔ B commutes with A1.
5.6. Let A ∈Mn(C) be hermitian and let k be a positive odd integer.(i) Prove that ∃! hermitian matrix B ∈Mn(C) such that Bk = A.
EXERCISES 67
(ii) Prove that centMn(C)(A) = centMn(C)(B).
5.7. (Volume of a parallelepiped) Let v1, . . . , vk ∈ Rn be column vectors and let
Ω = a1v1 + · · ·+ akvk : 0 ≤ ai ≤ 1.Then
Vol(Ω) =[det([v1, . . . , vk]T [v1, . . . , vk]
)] 12 .
5.8. (Distance from a point to an affine subspace) Let A ∈ Mm×n(R), b ∈Mm×1(R) such that Ax = b is consistent. Let
M = x ∈Mn×1(R) : Ax = b.(i) Let u1, . . . , uk be an orthonormal basis of R(A). Show that ∃B ∈
Mk×m(R) with rankB = k such that
BA =
u1
...uk
=: U
and M = x ∈Mm×1(R) : Ux = c, where c = Bb.(ii) For each y ∈Mm×1(R), prove that
d(y,M) = ||Uy − c|| =[yTUTUy − 2cTUy + cT c
] 12 .
5.9. (The Hadamard inequality) Let A = [a1, . . . , an] ∈ GL(n,C). Then
|detA| ≤n∏
i=1
||ai||.
The equality holds iff a1, . . . , an form an orthogonal basis of Cn.
5.10. Let A = [aij ] ∈Mn(C) be positive definite. Prove that
detA ≤ a11a22 · · · ann
and that the equality holds iff A is diagonal.
5.11. (i) If A ∈Mm(C) and B ∈Mn(C) are (semi) positive definite, so is A⊗B.(ii) If AB ∈Mn(C) are (semi) positive definite and AB = BA, then AB is
also (semi) positive definite.(iii) For A = [aij ], B = [bij ] ∈ Mn(F ), the Hadamard product of A and
B, denoted by A ∗ B, is [aijbij ]. If A,B ∈ Mn(C) are (semi) positivedefinite, so is A ∗B.
5.12. (Properties of generalized inverses) Let A ∈ Mm×n(C), B ∈ Mn×p(C) andC ∈Ms×t(C).(i) (A+)+ = A, A+ = A+, (AT )+ = (A+)T .(ii) (A⊗ C)+ = A+ ⊗ C+.(iii) If rankA = n, A+ = (A∗A)−1A∗. If rankB = n, B+ = B∗(BB∗)−1.(iv) If rankA = rankB = n, then (AB)+ = B+A+.(v) Give an example where (AB)+ 6= B+A+.
5.13. (A practical formula for A+) Let A ∈Mm×n(C) with rankA = r.(i) Prove that ∃B ∈ Mm×r(C) and C ∈ Mr×n(C) such that rankB =
rankC = r and A = BC. (This is true with C replaced with an arbitraryfield F .)
68 5. INNER PRODUCT SPACES AND UNITARY SPACES
(ii) Prove that A+ = C∗(B∗BCC∗)−1B∗.
5.14. Let A ∈Mm×n(C). Prove that[
AA∗
]is unitarily similar to diag(s1, . . . , st,
−s1, . . . ,−st)⊕0, where s1, . . . , st are the singular values of A (counted withmultiplicity).
5.15. Let A,B ∈Mn(C). Prove that A is unitarily similar to B ⇔ ∃P ∈ GL(n,C)such that P−1AP = B and P−1A∗P = B∗.
5.16. (i) Let A ∈ Mm(C) and B,C ∈ Mn(C) such that A ⊕ B and A ⊕ C areunitarily similar. Then B and C are unitarily similar.
(ii) Let A,B ∈ Mn(C) and k > 0 such that A⊕ · · · ⊕A︸ ︷︷ ︸k
and B ⊕ · · · ⊕B︸ ︷︷ ︸k
are unitarily similar. Then A and B are unitarily similar.Use Specht’s theorem.
Hints for the Exercises
1.3. (ii) [aijB][cjkD] =[∑n
j=1 aijcjkBD].
(iii) bklcuv appears in the((k−1)r+u, (l−1)s+v
)entry of B⊗C; aijbklcuv
appears in the((i − 1)pr + (k − 1)r + u, (j − 1)qs + (l − 1)s + v
)entry of
A⊗ (B ⊗ C).(v) Let rankA = r. Then ∃P ∈ GL(m,F ), Q ∈ GL(n, F ) such that
PAQ =
[Ir 00 0
].
So, (P ⊗ Ip)(A⊗B)(Q⊗ Iq) = · · · .
2.1. Use a Laplace expansion along two rows.
2.7. The Mathematica code:
p = 23;n = (p - 1)/2;A = Table[Mod[i* PowerMod[j, -1, p], p], i, n, j, n];FactorInteger[Det[A]]
(The number |Dp|p−(p−3)/2 is the relative class number of the cyclotomicfield Q(ζp). See [1].)
3.2 (ii) Since dimV < ∞ and V ⊃ f(V ) ⊃ f2(V ) ⊃ · · · , ∃s such that fs(V ) =fs+1(V ) = · · · . So, V2 = fk(V ), k ≥ s. Since ker f ⊂ ker f2 ⊂ · · · ⊂ V , ∃tsuch that ker f t = ker f t+1 = · · · . So, V1 =
⋃∞k=1 ker fk = ker f t.
3.10. (i) Assume A = [a1, . . . , an] ∈ GL(n,Fq). Count the number of possibilitiesfor a1, a2, etc.
(ii) Let
X =(X, (a1, . . . , ak)
): X is a k-dimensional subspace of Fn
q
and (a1, . . . , ak) is a basis of X
in two ways.
4.1. Let A ∈ Mn(F ) be the matrix of f relative to a basis of V . May assumeA = A0 ⊕ A1, where all elementary divisors of A0 are powers of x and noneof the elementary divisors of A1 is a power of x. Then Ak
1 = 0 for some k ≥ 0and A2 is invertible.
69
70 HINTS FOR THE EXERCISES
4.2. We have[Im A
0 In
][xIm −AB 0
B xIn
][Im −A0 In
]=
[xIm 0B xIn −BA
].
4.11. Elementary divisors.
Solutions of the Exercises
1.2. (i) The (i, j) entry of PTσ Pσ is
eTσ(i)eσ(j) =
1 if i = j,
0 if i 6= j.
So, PTσ Pσ = I.
(ii) The jth column of APσ is [a1, . . . , an]eσ(j) = aσ(j). So, APσ =[aσ(1), . . . , aσ(n)]. We also have
PσB = (BTPTσ )T =
([bT1 , . . . , b
Tn ]Pσ−1
)T = [bTσ−1(1), . . . , bTσ−1(n)]
T =
bσ−1(1)
...bσ−1(n)
.1.3. (ii) [aijB][cjkD] =
[∑nj=1 aijcjkBD
]= AC ⊗BD.
(iii) bklcuv appears in the((k−1)r+u, (l−1)s+v
)entry of B⊗C; aijbklcuv
appears in the((i − 1)pr + (k − 1)r + u, (j − 1)qs + (l − 1)s + v
)entry of
A⊗ (B⊗C). aijbkl appears in the((i− 1)p+ k, (j− 1)q+ l
)entry of A⊗B;
aijbklcuv appears in the((
(i − 1)p + k − 1)r + u,
((j − 1)q + l − 1
)s + v
)entry of (A⊗B)⊗ C.(v) Let rankA = r. Then ∃P ∈ GL(m,F ), Q ∈ GL(n, F ) such that
PAQ =
[Ir 00 0
].
So,
(P ⊗ Ip)(A⊗B)(Q⊗ Iq) = PAQ⊗B =
[Ir ⊗B
0
].
Therefore, rank(A⊗B) = rank(Ir ⊗B) = r rankB.
2.4 Let A be the matrix in the determinant. Then
A
1i −i1 1
. . .
i −i1 1
=
1 eix1 e−ix1 · · · einx1 e−ix1
......
......
...1 eix2n+1 e−ix2n+1 · · · einx2n+1 e−inx2n+1
.
71
72 SOLUTIONS OF THE EXERCISES
So,
(2i)n detA = e−in(x1+···+x2n+1)
·
∣∣∣∣∣∣∣∣einx1 ei(n+1)x1 ei(n−1)x1 · · · ei2nx1 ei0x1
......
......
...einx2n+1 ei(n+1)x2n+1 ei(n−1)x2n+1 · · · ei2nx2n+1 ei0x2n+1
∣∣∣∣∣∣∣∣= e−in(x1+···+x2n+1)
∣∣∣∣∣∣∣∣1 eix1 · · · ei2nx1
......
...1 eix2n+1 · · · ei2nx2n+1
∣∣∣∣∣∣∣∣(2n+ (2n− 2) + · · ·+ 2 column transpositions)
= e−in(x1+···+x2n+1)∏
1≤j<k≤2n+1
(eixk − eixj )
= e−in(x1+···+x2n+1)∏
1≤j<k≤2n+1
2iei 12 (xk+xj) sin
xk − xj
2
= e−in(x1+···+x2n+1)+i 12
∑j<k(xk+xj)(2i)(
2n+12 ) ∏
1≤j<k≤2n+1
sinxk − xj
2
= (2i)(2n+1
2 ) ∏1≤j<k≤2n+1
sinxk − xj
2.
2.5. Let A be the matrix in the determinant. Then by Exercise 2.4,
(2i)n detA =
∣∣∣∣∣∣∣∣eix1 e−ix1 · · · einx1 e−ix1
......
......
eix2n e−ix2n · · · einx2n e−inx2n
∣∣∣∣∣∣∣∣=
1n+ 1
n∑s=0
∣∣∣∣∣∣∣∣∣∣1 ei 2π
n+1 s e−i 2πn+1 s · · · ein 2π
n+1 s e−in 2πn+1 s
1 eix1 e−ix1 · · · einx1 e−ix1
......
......
...1 eix2n e−ix2n · · · einx2n e−inx2n
∣∣∣∣∣∣∣∣∣∣=
1n+ 1
n∑s=0
(2i)(2n+1
2 )( ∏
1≤j<k≤2n
sinxk − xj
2
)( 2n∏j=1
sin12(xj −
2πn+ 1
s)).
2.6. If m 6= n, say m > n, then p < q and rank(A ⊗ B) = (rankA)(rankB) ≤np < mp. If m = n and p = q, det(A ⊗ B) = det[(A ⊗ Ip)(Im ⊗ B)] =det(A⊗ Ip) det(Im ⊗B) = (detA)p(detB)m.
??. We have
bi,0x0 + · · ·+ bi,n−1x
n−1 =
1 if x = xi,
0 if x = xk, k 6= i.
Hence,
bi,0x0 + · · ·+ bi,n−1x
n−1 =
∏k 6=i(x− xk)∏k 6=i(xi − xk)
.
SOLUTIONS OF THE EXERCISES 73
So,
bij =(−1)n−1−j∏k 6=i(xi − xk)
σn−1−j(x1, . . . , xi−1, xi+1, . . . , xn).
3.2. (ii) Since dimV < ∞ and V ⊃ f(V ) ⊃ f2(V ) ⊃ · · · , ∃s such that fs(V ) =fs+1(V ) = · · · . So, V2 = fk(V ), k ≥ s. Since ker f ⊂ ker f2 ⊂ · · · ⊂ V ,∃t ≥ s such that ker f t = ker f t+1 = · · · . So, V1 =
⋃∞k=1 ker fk = ker f t. We
claim that V1 ∩V2 = 0. (Let x ∈ V1 ∩V2. Since x ∈ V2 = f t(V ), x = f t(y)for some y ∈ V . Since x ∈ V1 = ker f t, f2t(y) = f t(x) = 0; hence y ∈ V1.Thus x = f t(y) = 0.) Since dimV1 + dimV2 = dim(ker f t) + dim(im f t) =dimV , we must have V = V1⊕V2. (Note. There is an easier proof using thecanonical form of f .)
(iii) Let V = F ⊕F ⊕ · · · and f : V → V , (x1, x2, . . . ) 7→ (0, x1, x2, . . . ).
3.3. Direct computation shows that D(xiyj) = 4(i + j + 1)xiyj . Clearly, Dpreserves the additions and scalar multiplications. So, D maps L to L andis an R-map. The matrix of D relative to the basis xiyj : 0 ≤ i, j ≤ n of Lis a diagonal matrix with rows and columns labeled by (i, j) : 0 ≤ i, j ≤ n;the
((i, j), (i, j)
)-entry of the matrix is 4(i+ j + 1).
3.7. Define
f : C(B)/C(BC) −→ C(AB)/C(ABC)Bx+ C(BC) 7−→ ABx+ C(ABC), x ∈Mp×1(F ).
Then f is a well defined onto F -map. So,
dim(C(B)/C(BC)
)≥ dim
(C(AB)/C(ABC)
).
Hence the result.
3.8. (ii) We have f(ax) = af(x) for all a ∈ Q and x ∈ Rn. If α ∈ R, choosean ∈ Q such that limn→+∞ an = α. Then f(αx) = limn→+∞ f(anx) =limn→+∞ anf(x) = αf(x).
3.9. Assume to the contrary that the claim is false and assume that n is thesmallest positive integer for which there is a counterexample X, i.e., X isa subspace of Mn(F ) with dimX > n(n − 1) such that X ∩ GL(n, F ) = ∅.Clearly n > 1.
Let Eij ∈ Mn(F ) be the matrix whose (i, j)-entry is 1 and whose otherentries are all 0.
1 We claim that if Eij ∈ X, then Eik, Ekj ∈ X for all 1 ≤ k ≤ n.May assume (i, j) = (1, 1). Assume the contrary of the claim. Consider
an F -map
f : X −→ Mn−1(F )[∗ ∗∗ A
]7−→ A.
Then dim ker f < 2n− 1; hence dim f(X) = dimX − dim ker f > n(n− 1)−2(n − 1) = (n − 1)(n − 2). Thus ∃A ∈ f(X) such that detA 6= 0. We have
74 SOLUTIONS OF THE EXERCISES[a bc A
]∈ X for some a, b, c. For each x ∈ F , we have xE11 +
[a bc A
]∈ X and
det(xE11 +
[a b
c A
])= det
[x+ a b
c A
]= (x+ a) detA+ constant.
So, ∃x ∈ F such that det(xE11 +
[a bc A
])6= 0, →←.
2 Since dimX > n(n− 1), the F -map
g : X −→ M(n−1)×n(F )[∗A
]7−→ A
has ker g 6= 0. Hence ∃0 6= u ∈ Fn such that [ u0 ] ∈ X. By elementary
column operations, we may assume E11 ∈ X. By 1, Ei1 ∈ X for all i. By1 again, Eij ∈ X for all i, j, ⇒ X = Mn(F ), →←.
3.11.
Li(x0) =∫ +∞
0
e−ixdx =1i.
For j > 0,
Li(xj) =∫ +∞
0
xje−ixdx = −1i
∫ +∞
0
xjde−ix = −1i
[xje−ix
∣∣∣+∞0−∫ +∞
0
e−ixdxj]
=j
i
∫ +∞
0
xj−1e−ixdx =j
iLi(xj−1).
HenceLi(xj) =
j!ij+1
, 1 ≤ i ≤ n+ 1, 0 ≤ j ≤ n.
Let (f1, . . . , fn+1) = (1, x, . . . , xn)A. Then
In+1 =
L1
...Ln+1
[f1, . . . , fn+1] =
L1
...Ln+1
[1, x, . . . , xn]A =[ j!ij+1
]A.
To find the inverse of [ j!ij+1 ], note that
[ j!ij+1
]=
1 · · · 111 · · · 1
n+1...
...( 11 )n · · · ( 1
n+1 )n
0!1!
. . .
n!
.
Bibliography
[1] L. Carlitz and F. R. Olson, Maillet’s determinant, Proc. Amer. Math. Soc. 6 (1955), 265 –
269.
75