CHAPTER 1
Matrices
1.1. Matrix Algebra
Fields. A field is a set F equipped with two operations + and · such that (F, +) and (F^×, ·) are abelian groups, where F^× = F \ {0}, and a(b + c) = ab + ac for all a, b, c ∈ F.
Examples of fields. Q, R, C, Z_p (p prime). If F is a field and f(x) is an irreducible polynomial in F[x], the quotient ring F[x]/(f) is a field containing F as a subfield. E.g., C ≅ R[x]/(x^2 + 1); Z_3[x]/(x^3 − x + 1) is a field with 3^3 elements containing Z_3. If R is an integral domain (commutative ring without zero divisors), then all fractions p/q (p, q ∈ R, q ≠ 0) form the fraction field of R, which contains R.
Matrices. Let F be a field. M_{m×n}(F) = the set of all m × n matrices with entries in F; M_n(F) = M_{n×n}(F). For A = [a_{ij}], B = [b_{ij}] ∈ M_{m×n}(F), C = [c_{jk}] ∈ M_{n×p}(F), α ∈ F,

A + B := [a_{ij} + b_{ij}] ∈ M_{m×n}(F),
αA := [αa_{ij}] ∈ M_{m×n}(F),
AC := [d_{ik}] ∈ M_{m×p}(F), where d_{ik} = ∑_{j=1}^{n} a_{ij} c_{jk}.
For A ∈Mm×n(F ), B ∈Mn×p(F ), C ∈Mp×q(F ),
(AB)C = A(BC).
(M_n(F), +, ·) is a ring with identity I_n = diag(1, . . . , 1). GL(n, F) = the set of invertible matrices in M_n(F). (GL(n, F), ·) is the multiplicative group of M_n(F), called the general linear group of degree n over F.
Multiplication by blocks. Let

A = [A_{11} ⋯ A_{1n}; ⋮ ; A_{m1} ⋯ A_{mn}],  B = [B_{11} ⋯ B_{1p}; ⋮ ; B_{n1} ⋯ B_{np}],

where A_{ij} ∈ M_{m_i×n_j}(F), B_{jk} ∈ M_{n_j×p_k}(F). Then

AB = [C_{11} ⋯ C_{1p}; ⋮ ; C_{m1} ⋯ C_{mp}],  where C_{ik} = ∑_{j=1}^{n} A_{ij} B_{jk}.
Transpose. The transpose of

A = [a_{11} ⋯ a_{1n}; ⋮ ; a_{m1} ⋯ a_{mn}]

is

A^T = [a_{11} ⋯ a_{m1}; ⋮ ; a_{1n} ⋯ a_{mn}].

If A = [A_{11} ⋯ A_{1n}; ⋮ ; A_{m1} ⋯ A_{mn}] is a block matrix, then

A^T = [A_{11}^T ⋯ A_{m1}^T; ⋮ ; A_{1n}^T ⋯ A_{mn}^T].

Properties of transpose.
(i) (αA + βB)^T = αA^T + βB^T.
(ii) (AB)^T = B^T A^T.
(iii) (A^{-1})^T = (A^T)^{-1}.
Elementary operations and elementary matrices.
To perform an elementary row (column) operation on a matrix A is to multiply A by the corresponding elementary matrix from the left (right).

Note. The inverse of an elementary matrix is also an elementary matrix of the same type.
Proposition 1.1. Every A ∈ GL(n, F ) is a product of elementary matrices.
Proof. Use induction on n. A can be transformed into [1 0; 0 A_1] through suitable elementary row and column operations, i.e., ∃ elementary matrices P_1, . . . , P_k, Q_1, . . . , Q_l such that

P_1 ⋯ P_k A Q_1 ⋯ Q_l = [1 0; 0 A_1],

where A_1 ∈ GL(n − 1, F). By the induction hypothesis, A_1 is a product of elementary matrices. Thus [1 0; 0 A_1] is a product of elementary matrices and so is

A = P_k^{-1} ⋯ P_1^{-1} [1 0; 0 A_1] Q_l^{-1} ⋯ Q_1^{-1}.

Table 1.1. Elementary row operations and elementary matrices

type I: multiply the ith row by α ∈ F^×; elementary matrix: I_n with the (i, i) entry replaced by α.
type II: swap the ith and jth rows; elementary matrix: I_n with the ith and jth rows swapped.
type III: add β times the jth row to the ith row, where i ≠ j, β ∈ F; elementary matrix: I_n with the (i, j) entry replaced by β.
Equivalence. Let A, B ∈ M_{m×n}(F). We say that

• A is row equivalent to B, denoted A ≈_r B, if ∃P ∈ GL(m, F) such that A = PB;
• A is column equivalent to B, denoted A ≈_c B, if ∃Q ∈ GL(n, F) such that A = BQ;
• A is equivalent to B, denoted A ≈ B, if ∃P ∈ GL(m, F) and Q ∈ GL(n, F) such that A = PBQ.

≈_r, ≈_c and ≈ are equivalence relations on M_{m×n}(F).
Reduced row echelon forms. A matrix A ∈ M_{m×n}(F) is called a reduced row echelon form (rref) if

(i) in each nonzero row of A, the first nonzero entry is 1; such an entry is called a pivot of A;
(ii) if a column of A contains a pivot, then all other entries in the column are 0;
(iii) if a row contains a pivot, then every row above contains a pivot further to the left.
A reduced column echelon form (rcef) is defined similarly.
Proposition 1.2. Every A ∈ M_{m×n}(F) is row (column) equivalent to a unique rref (rcef).

Proof. Existence of rref. Induction on the size of A.
Uniqueness of rref. Use induction on m. Let A, B ∈ M_{m×n}(F) be rref's such that A = PB for some P ∈ GL(m, F). We want to show that A = B. May assume B ≠ 0. Assume that the first nonzero column of B is the jth column. Then the first nonzero column of A = PB is also the jth column. Write

A = [0 1 a; 0 0 A_1],  B = [0 1 b; 0 0 B_1],

where the 1's are in the jth column and A_1, B_1 ∈ M_{(m−1)×(n−j)}(F) are rref's. Then

[1 a; 0 A_1] = P [1 b; 0 B_1].

It follows that

P = [1 p; 0 P_1],  P_1 ∈ GL(m − 1, F),

and

[1 a; 0 A_1] = [1 b + pB_1; 0 P_1 B_1].

Since A_1 = P_1 B_1, by the induction hypothesis, A_1 = B_1. Let I be the set of indices of the pivot columns of B_1. Since A, B are rref's, all components of a and b with indices in I are 0. Since pB_1 = a − b, all components of pB_1 with indices in I are 0. Write B_1 = [b_1, . . . , b_{n−j}]. Then pb_i = 0 for all i ∈ I. Note that every column of B_1 is a linear combination of the pivot columns b_i, i ∈ I. So, pB_1 = 0. Therefore, a = b. So, A = B.
Proposition 1.3. Every A ∈ M_{m×n}(F) is equivalent to

[I_r 0; 0 0],

where 0 ≤ r ≤ min{m, n} is uniquely determined by A. Moreover, r = the number of pivots in the rref (rcef) of A. r is called the rank of A.

Proof. We only have to show the uniqueness of r; the other claims are obvious. Assume to the contrary that

[I_r 0; 0 0] ≈ [I_s 0; 0 0],  r < s.

Then ∃P ∈ GL(m, F) and Q ∈ GL(n, F) such that

P [I_r 0; 0 0] = [I_s 0; 0 0] Q.

Write P = [P_1 P_2], Q = [Q_1; Q_2], where P_1 ∈ M_{m×r}(F), Q_1 ∈ M_{s×n}(F). Then

[P_1 0] = [Q_1; 0].

Hence Q_1 = [Q_{11} 0], where Q_{11} ∈ M_{s×r}(F). Since s > r, ∃ 0 ≠ x ∈ M_{1×s}(F) such that xQ_{11} = 0. Then

[x 0] Q = x Q_1 = [xQ_{11} 0] = 0,

which is a contradiction since Q is invertible.
Easy fact. Let A ∈ M_n(F). Then the following are equivalent.
(i) A is invertible.
(ii) rref(A) = I_n.
(iii) rcef(A) = I_n.
(iv) rank A = n.
Finding A^{-1}. Let A ∈ M_n(F). Perform elementary row operations:

[A I_n] → ⋯ → [rref(A) B].

If rref(A) = I_n, then A^{-1} = B; if rref(A) ≠ I_n, then A is not invertible.
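This procedure is easy to carry out in exact arithmetic with a computer algebra system. Below is a minimal sketch using SymPy's rref() on the augmented matrix [A I_n]; the 2 × 2 matrix is an arbitrary illustration, not an example from the text.

```python
from sympy import Matrix, eye

A = Matrix([[2, 1], [5, 3]])           # arbitrary example over Q

# Row-reduce the augmented matrix [A | I_n]; rref() returns the reduced
# row echelon form together with the tuple of pivot column indices.
R, pivots = A.row_join(eye(2)).rref()

left, right = R[:, :2], R[:, 2:]
if left == eye(2):                      # rref(A) = I_n, so A is invertible
    A_inv = right
    assert A * A_inv == eye(2)
else:                                   # rref(A) != I_n, so A is singular
    A_inv = None
```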
For A ∈ M_{m×n}(F), let ker_r(A) = {x ∈ M_{1×m}(F) : xA = 0} and ker_c(A) = {y ∈ M_{n×1}(F) : Ay = 0}.
Facts. Let A, B ∈ M_n(F).

(i) A ∈ GL(n, F) ⇔ ker_r(A) = {0} ⇔ ker_c(A) = {0}.
(ii) If AB ∈ GL(n, F), then A, B ∈ GL(n, F). In particular, if AB = I_n, then B = A^{-1} and BA = I_n.

Proof. (i) To see that ker_c(A) = {0} ⇒ A ∈ GL(n, F), note that if rref(A) ≠ I_n, then ker_c(A) ≠ {0}.
(ii) ker_c(B) ⊂ ker_c(AB) = {0}. So, B ∈ GL(n, F).
Congruence and similarity. Let A, B ∈ M_n(F). We say that

• A is congruent to B, denoted A ≅ B, if ∃P ∈ GL(n, F) such that A = P^T BP;
• A is similar to B, denoted A ∼ B, if ∃P ∈ GL(n, F) such that A = P^{-1}BP.

Canonical forms of matrices under similarity will be discussed in Chapter 4; canonical forms of symmetric matrices under congruence will be discussed in a later chapter.

Given P ∈ GL(n, F), the map φ : M_n(F) → M_n(F) defined by φ(A) = P^{-1}AP is an algebra isomorphism, i.e., φ preserves the addition, multiplication and scalar multiplication.
Exercises
1.1. Let A ∈ M_{m×n}(F) with rank A = r and let p > 0. Prove that ∃B ∈ M_{n×p}(F) such that rank B = min{n − r, p} and AB = 0.
1.2. For 1 ≤ i ≤ n let e_i = [0 . . . 0 1 0 . . . 0]^T ∈ F^n, with the 1 in the ith position.
(i) Let σ be a permutation of {1, . . . , n} and let

P_σ = [e_{σ(1)} ⋯ e_{σ(n)}].

P_σ is called the permutation matrix of σ. Prove that P_σ^{-1} = P_σ^T.
(ii) Let

A = [a_1, ⋯, a_n] ∈ M_{m×n}(F),  B = [b_1; ⋮; b_n] ∈ M_{n×p}(F),

where the a_j are the columns of A and the b_j are the rows of B. Prove that

AP_σ = [a_{σ(1)}, ⋯, a_{σ(n)}],  P_σ B = [b_{σ^{-1}(1)}; ⋮; b_{σ^{-1}(n)}].

Hence, multiplication of a matrix X by a permutation matrix from the left (right) permutes the rows (columns) of X. In particular, P_{στ} = P_σ P_τ if τ is another permutation of {1, . . . , n}.
1.3. Let A = [a_{ij}] ∈ M_{m×n}(F) and B = [b_{kl}] ∈ M_{p×q}(F). Define

A ⊗ B = [a_{11}B ⋯ a_{1n}B; ⋮; a_{m1}B ⋯ a_{mn}B] ∈ M_{mp×nq}(F).

(i) Prove that (A ⊗ B)^T = A^T ⊗ B^T.
(ii) Let C ∈ M_{n×r}(F) and D ∈ M_{q×s}(F). Prove that (A ⊗ B)(C ⊗ D) = AC ⊗ BD.
(iii) Let C = [c_{uv}] ∈ M_{r×s}(F). Prove that A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C.
(iv) Let σ be the permutation of {1, . . . , mp} defined by

σ((i − 1)p + k) = (k − 1)m + i for 1 ≤ i ≤ m, 1 ≤ k ≤ p,

and let τ be the permutation of {1, . . . , nq} defined by

τ((j − 1)q + l) = (l − 1)n + j for 1 ≤ j ≤ n, 1 ≤ l ≤ q.

Show that the (u, v)-entry of A ⊗ B is the (σ(u), τ(v))-entry of B ⊗ A. Namely,

P_σ^T (A ⊗ B) P_τ = B ⊗ A.

(Note. If m = n and p = q, then σ = τ.)
(v) Prove that rank(A ⊗ B) = (rank A)(rank B).
CHAPTER 2
The Determinant
2.1. Definition, Properties and Formulas
Let S_n be the set (group) of all permutations of {1, . . . , n}. A permutation σ ∈ S_n is denoted by

σ = (1 2 ⋯ n; σ(1) σ(2) ⋯ σ(n)).

A transposition is a swap of i, j ∈ {1, . . . , n} (i ≠ j) and is denoted by (i, j). Every σ ∈ S_n is a product of s transpositions. The number s is not uniquely determined by σ, but s (mod 2) is. Define sign(σ) = (−1)^s; σ is called an even (odd) permutation if sign(σ) = 1 (−1).
Definition 2.1. Let A = [a_{ij}] ∈ M_n(F). The determinant of A, denoted by det A or |A|, is defined to be

det A = ∑_{σ∈S_n} sign(σ) a_{1σ(1)} ⋯ a_{nσ(n)}.
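A brute-force evaluation of this defining sum (O(n!·n) operations, so only practical for small n) makes the formula concrete. The sketch below is our illustration, not from the text; the sign is computed by counting inversions, which agrees with the transposition-count definition of sign(σ) above.

```python
from itertools import permutations
from math import prod

def sign(s):
    # sign(s) = (-1)^{number of inversions}, which equals (-1)^s for any
    # factorization of s into s transpositions.
    inversions = sum(1 for i in range(len(s))
                       for j in range(i + 1, len(s)) if s[i] > s[j])
    return -1 if inversions % 2 else 1

def det(A):
    # det A = sum over sigma in S_n of sign(sigma) a_{1,sigma(1)} ... a_{n,sigma(n)}
    n = len(A)
    return sum(sign(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

assert det([[1, 2], [3, 4]]) == -2
```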
Easy facts.

(i) det A^T = det A.
(ii) det A is an F-linear function of every row and column of A.
(iii) If A has two identical rows (columns), then det A = 0.
Proof. (i)

det A^T = ∑_{σ∈S_n} sign(σ) a_{σ(1),1} ⋯ a_{σ(n),n}
        = ∑_{σ∈S_n} sign(σ^{-1}) a_{1,σ^{-1}(1)} ⋯ a_{n,σ^{-1}(n)}
        = det A.

(iii) Assume that the first two rows of A are identical. Let C be a set of representatives of the left cosets of 〈(1, 2)〉 in S_n. Then

det A = ∑_{σ∈C} (sign(σ) a_{1σ(1)} a_{2σ(2)} ⋯ a_{nσ(n)} + sign(σ·(1, 2)) a_{1σ(2)} a_{2σ(1)} a_{3σ(3)} ⋯ a_{nσ(n)}) = 0,

since a_{1σ(2)} a_{2σ(1)} = a_{1σ(1)} a_{2σ(2)} (the first two rows being equal) and sign(σ·(1, 2)) = −sign(σ).
Effect of elementary row and column operations on the determinant.
det[. . . αvi . . . ] = α det[. . . vi . . . ],
det[. . . vi . . . vj . . . ] = −det[. . . vj . . . vi . . . ],
det[. . . vi . . . vj + αvi . . . ] = det[. . . vi . . . vj . . . ].
Theorem 2.2 (The Laplace expansion). Let A ∈ M_n(F). For I, J ⊂ {1, . . . , n}, let A(I, J) denote the submatrix of A with row indices in I and column indices in J. Fix I ⊂ {1, . . . , n} with |I| = k. We have

det A = ∑_{J⊂{1,...,n}, |J|=k} (−1)^{∑_{i∈I} i + ∑_{j∈J} j} det A(I, J) det A(I^c, J^c),

where I^c = {1, . . . , n} \ I.
Lemma 2.3. Let

σ = (1 ⋯ k k+1 ⋯ n; i_1 ⋯ i_k i′_1 ⋯ i′_{n−k}) ∈ S_n,

where i_1 < ⋯ < i_k and i′_1 < ⋯ < i′_{n−k}. Then

sign(σ) = (−1)^{i_1+⋯+i_k + k(k+1)/2}.

Proof. We count the number of transpositions needed to permute i_1, . . . , i_k, i′_1, . . . , i′_{n−k} into 1, . . . , n. There are i_k − k integers in i′_1, . . . , i′_{n−k} that are < i_k. Thus, i_k − k transpositions are needed to move i_k to the right place. In general, i_t − t transpositions are needed to move i_t to the right place. So,

sign(σ) = (−1)^{∑_{t=1}^{k} (i_t − t)} = (−1)^{i_1+⋯+i_k + k(k+1)/2}.
Corollary 2.4. Let

σ = (i_1 ⋯ i_k i′_1 ⋯ i′_{n−k}; j_1 ⋯ j_k j′_1 ⋯ j′_{n−k}) ∈ S_n,

where i_1 < ⋯ < i_k, i′_1 < ⋯ < i′_{n−k}, j_1 < ⋯ < j_k, j′_1 < ⋯ < j′_{n−k}. Then

sign(σ) = (−1)^{i_1+⋯+i_k + j_1+⋯+j_k}.
Proof of Theorem 2.2. We have

det A = ∑_{σ∈S_n} sign(σ) a_{1σ(1)} ⋯ a_{nσ(n)} = ∑_{J⊂{1,...,n}, |J|=k} ∑_{σ∈S_n, σ(I)=J} sign(σ) a_{1σ(1)} ⋯ a_{nσ(n)}.

To compute the inner sum in the above, let I = {i_1, . . . , i_k}, I^c = {i′_1, . . . , i′_{n−k}}, J = {j_1, . . . , j_k}, J^c = {j′_1, . . . , j′_{n−k}}, where i_1 < ⋯ < i_k, i′_1 < ⋯ < i′_{n−k}, j_1 < ⋯ < j_k, j′_1 < ⋯ < j′_{n−k}, and

σ = (i_1 ⋯ i_k i′_1 ⋯ i′_{n−k}; j_{α(1)} ⋯ j_{α(k)} j′_{β(1)} ⋯ j′_{β(n−k)}),

where α ∈ S_k and β ∈ S_{n−k}. Then by Corollary 2.4,

sign(σ) = sign(α) sign(β) (−1)^{i_1+⋯+i_k + j_1+⋯+j_k}.

Therefore,

∑_{σ∈S_n, σ(I)=J} sign(σ) a_{1σ(1)} ⋯ a_{nσ(n)}
 = (−1)^{i_1+⋯+i_k + j_1+⋯+j_k} (∑_{α∈S_k} sign(α) a_{i_1 j_{α(1)}} ⋯ a_{i_k j_{α(k)}}) (∑_{β∈S_{n−k}} sign(β) a_{i′_1 j′_{β(1)}} ⋯ a_{i′_{n−k} j′_{β(n−k)}})
 = (−1)^{i_1+⋯+i_k + j_1+⋯+j_k} det A(I, J) det A(I^c, J^c).

Hence the theorem.
Corollary 2.5. Let A = [a_{ij}] ∈ M_n(F). We have

det A = ∑_{j=1}^{n} (−1)^{i+j} a_{ij} det A_{ij},  1 ≤ i ≤ n,

and

det A = ∑_{i=1}^{n} (−1)^{i+j} a_{ij} det A_{ij},  1 ≤ j ≤ n,

where A_{ij} is the submatrix of A obtained after deleting the ith row and the jth column.
Proposition 2.6. Let e_j = [0 . . . 0 1 0 . . . 0]^T ∈ F^m, with the 1 in the jth position. Let f : M_{m×n}(F) → F be such that

(i) f(A) is F-linear in every column of A;
(ii) f(A) = 0 whenever A has two identical columns;
(iii) f([e_{j_1} . . . e_{j_n}]) = 0 for all 1 ≤ j_1 < ⋯ < j_n ≤ m (this condition becomes vacuous when m < n).

Then f(A) = 0 for all A ∈ M_{m×n}(F).

Proof. 1° f([v_1 . . . v_i . . . v_j . . . v_n]) = −f([v_1 . . . v_j . . . v_i . . . v_n]). In fact,

0 = f([. . . v_i + v_j . . . v_i + v_j . . . ])
  = f([. . . v_i . . . v_i . . . ]) + f([. . . v_i . . . v_j . . . ])
  + f([. . . v_j . . . v_i . . . ]) + f([. . . v_j . . . v_j . . . ])
  = f([. . . v_i . . . v_j . . . ]) + f([. . . v_j . . . v_i . . . ]).

2° Each column of A is a linear combination of e_1, . . . , e_m. By (i), f(A) is a linear combination of f([e_{j_1} . . . e_{j_n}]), where j_1, . . . , j_n ∈ {1, . . . , m}. Thus, it suffices to show f([e_{j_1} . . . e_{j_n}]) = 0. If j_1, . . . , j_n are not all distinct, by (ii), f([e_{j_1} . . . e_{j_n}]) = 0. If j_1, . . . , j_n are all distinct, by 1°, we may assume 1 ≤ j_1 < ⋯ < j_n ≤ m. By (iii), f([e_{j_1} . . . e_{j_n}]) = 0.
Corollary 2.7. det : M_n(F) → F is the unique function such that

(i) det A is F-linear in every column of A;
(ii) det A = 0 whenever A has two identical columns;
(iii) det I_n = 1.
Theorem 2.8 (Cauchy-Binet). Let A ∈ M_{n×m}(F) and B ∈ M_{m×n}(F). Let I = {1, . . . , n}. Then

(2.1) det(AB) = ∑_{J⊂{1,...,m}, |J|=n} det A(I, J) det B(J, I).

In particular,

det(AB) = 0 if n > m, and det(AB) = (det A)(det B) if n = m.

Proof. Fix A ∈ M_{n×m}(F) and let f(B) be the difference of the two sides of (2.1). Then f : M_{m×n}(F) → F satisfies (i) – (iii) in Proposition 2.6.
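Formula (2.1) can be spot-checked by machine. The following sketch is our illustration (randMatrix and its seed argument are SymPy conveniences): it sums det A(I, J) det B(J, I) over all n-element subsets J of {1, . . . , m} and compares with det(AB).

```python
from itertools import combinations
import sympy as sp

n, m = 2, 4
A = sp.randMatrix(n, m, min=-3, max=3, seed=1)
B = sp.randMatrix(m, n, min=-3, max=3, seed=2)

# Right-hand side of (2.1): J runs over the n-element column sets of A,
# which are also the n-element row sets of B.
rhs = sum(A[:, list(J)].det() * B[list(J), :].det()
          for J in combinations(range(m), n))
assert (A * B).det() == rhs
```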
Proposition 2.9 (The adjoint matrix). For A ∈ M_n(F), define

adj(A) = [(−1)^{i+j} det A_{ij}]^T ∈ M_n(F).

We have

A adj(A) = adj(A) A = (det A) I_n.

Moreover, A is invertible ⇔ det A ≠ 0. When det A ≠ 0, A^{-1} = (1/det A) adj(A).

Proof. Let A = [a_{ij}] = [v_1, . . . , v_n]. Then the (i, j) entry of adj(A)A is

∑_{k=1}^{n} (−1)^{i+k} (det A_{ki}) a_{kj} = det[v_1, . . . , v_{i−1}, v_j, v_{i+1}, . . . , v_n] = det A if i = j, and 0 if i ≠ j.

So, adj(A)A = (det A) I_n.
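SymPy implements the adjoint (adjugate) matrix directly, which gives a quick check of Proposition 2.9 on a concrete matrix of our choosing:

```python
import sympy as sp

A = sp.Matrix([[1, 2, 0], [0, 1, 3], [4, 0, 1]])   # det A = 25
adjA = A.adjugate()                                 # [(-1)^{i+j} det A_{ij}]^T

assert A * adjA == A.det() * sp.eye(3)
assert adjA * A == A.det() * sp.eye(3)
assert A.inv() == adjA / A.det()                    # valid since det A != 0
```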
2.2. Techniques for Computing Determinants
Example 2.10 (The Vandermonde determinant). For a_1, . . . , a_n ∈ F, let

V(a_1, . . . , a_n) = det [1 1 ⋯ 1; a_1 a_2 ⋯ a_n; ⋮; a_1^{n−1} a_2^{n−1} ⋯ a_n^{n−1}].

Then

V(a_1, . . . , a_n) = ∏_{1≤i<j≤n} (a_j − a_i).

Proof. Method 1. Subtract a_1 × (row n−1) from row n, . . . , a_1 × (row 1) from row 2 ⇒

V(a_1, . . . , a_n) = det [1 1 ⋯ 1; 0 a_2 − a_1 ⋯ a_n − a_1; 0 a_2(a_2 − a_1) ⋯ a_n(a_n − a_1); ⋮; 0 a_2^{n−2}(a_2 − a_1) ⋯ a_n^{n−2}(a_n − a_1)]
 = V(a_2, . . . , a_n) ∏_{j=2}^{n} (a_j − a_1)
 = ∏_{1≤i<j≤n} (a_j − a_i)  (by induction).

Method 2. Assume a_1, . . . , a_{n−1} are all distinct. V(a_1, . . . , a_{n−1}, x) is a polynomial of degree n − 1 with leading coefficient V(a_1, . . . , a_{n−1}) and has a_1, . . . , a_{n−1} as roots. So, V(a_1, . . . , a_{n−1}, x) = V(a_1, . . . , a_{n−1}) ∏_{j=1}^{n−1} (x − a_j). Use induction.
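The product formula can be verified symbolically for a fixed small n; the sketch below (ours, with n = 4) builds the Vandermonde matrix and compares its determinant with ∏_{i<j}(a_j − a_i).

```python
import sympy as sp

n = 4
a = sp.symbols(f'a1:{n + 1}')                 # a1, ..., a4
V = sp.Matrix(n, n, lambda i, j: a[j]**i)     # row i holds a_j^i, i = 0..n-1

expected = sp.prod(a[j] - a[i] for i in range(n) for j in range(i + 1, n))
assert sp.expand(V.det() - expected) == 0
```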
Example 2.11. Let a_1, . . . , a_n, b_1, . . . , b_n ∈ F such that a_i + b_j ≠ 0 for all i, j. Then

det [1/(a_i + b_j)] = ∏_{i<j} (a_i − a_j)(b_i − b_j) / ∏_{i,j} (a_i + b_j).

Proof. We may assume that a_1, . . . , a_n are all distinct and so are b_1, . . . , b_n. Denote the determinant by f(a_1, . . . , a_n; b_1, . . . , b_n). Let x be an indeterminate. Then f(x, a_2, . . . , a_n; b_1, . . . , b_n) ∏_{j=1}^{n} (x + b_j) is a polynomial of degree n − 1 with leading coefficient

det [1 ⋯ 1; 1/(a_2+b_1) ⋯ 1/(a_2+b_n); ⋮; 1/(a_n+b_1) ⋯ 1/(a_n+b_n)] =: g(a_2, . . . , a_n; b_1, . . . , b_n)

and has a_2, . . . , a_n as roots. So,

(2.2) f(x, a_2, . . . , a_n; b_1, . . . , b_n) ∏_{j=1}^{n} (x + b_j) = g(a_2, . . . , a_n; b_1, . . . , b_n) ∏_{i=2}^{n} (x − a_i).

Similarly, g(a_2, . . . , a_n; x, b_2, . . . , b_n) ∏_{i=2}^{n} (a_i + x) is a polynomial of degree n − 1 with leading coefficient f(a_2, . . . , a_n; b_2, . . . , b_n) and has b_2, . . . , b_n as roots. So,

(2.3) g(a_2, . . . , a_n; x, b_2, . . . , b_n) ∏_{i=2}^{n} (a_i + x) = f(a_2, . . . , a_n; b_2, . . . , b_n) ∏_{j=2}^{n} (x − b_j).

By (2.2) (with x = a_1) and (2.3) (with x = b_1), we have

f(a_1, . . . , a_n; b_1, . . . , b_n) ∏_{i=1 or j=1} (a_i + b_j) = f(a_2, . . . , a_n; b_2, . . . , b_n) ∏_{j=2}^{n} (a_1 − a_j)(b_1 − b_j).

The conclusion follows by induction.
Example 2.12 (Circulant matrix). Let a_0, . . . , a_{n−1} ∈ C and

C(a_0, . . . , a_{n−1}) = [a_0 a_1 ⋯ a_{n−1}; a_{n−1} a_0 a_1 ⋯; ⋮ ⋱ ⋮; a_1 ⋯ a_{n−1} a_0]

(each row is the previous row shifted one step to the right). Put

A = [0 1; 0 1; ⋱ ⋱; 0 1; 1 0]

(the n × n cyclic permutation matrix, with 1's on the superdiagonal and in the (n, 1) position). Then

C(a_0, . . . , a_{n−1}) = a_0 A^0 + a_1 A^1 + ⋯ + a_{n−1} A^{n−1}.

Let ε = e^{2πi/n} and let V = [ε^{(i−1)(j−1)}]_{1≤i,j≤n}. Then

AV = V diag(1, ε, . . . , ε^{n−1}).

Thus

A ∼ diag(1, ε, . . . , ε^{n−1})

and

C(a_0, . . . , a_{n−1}) ∼ diag(∑_{i=0}^{n−1} a_i ε^{0·i}, . . . , ∑_{i=0}^{n−1} a_i ε^{(n−1)i}).

So,

det C(a_0, . . . , a_{n−1}) = ∏_{j=0}^{n−1} (∑_{i=0}^{n−1} a_i ε^{ji}).
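Numerically, the inner sums ∑_i a_i ε^{ji} are exactly the discrete Fourier transform of (a_0, . . . , a_{n−1}), so the determinant formula can be checked with an FFT. A small sketch (our illustration, with an arbitrary real vector a):

```python
import numpy as np

a = np.array([2.0, 5.0, 0.0, 1.0])                  # a_0, ..., a_{n-1}
n = len(a)
C = np.array([np.roll(a, k) for k in range(n)])     # C(a_0, ..., a_{n-1})

# NumPy's FFT uses e^{-2*pi*i/n}, but the multiset of values, and hence
# their product, is the same as with eps = e^{2*pi*i/n}.
eigs = np.fft.fft(a)
assert np.allclose(np.linalg.det(C), np.prod(eigs).real)
```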
Exercises
2.1. Compute the (2n) × (2n) determinant

|a_1                b_{2n}|
|   ⋱            ⋰       |
|     a_n   b_{n+1}       |
|     b_n   a_{n+1}       |
|   ⋰            ⋱       |
|b_1                a_{2n}|

(a_1, . . . , a_{2n} on the main diagonal, b_{2n}, b_{2n−1}, . . . , b_1 on the antidiagonal from the upper right corner to the lower left corner, and 0 elsewhere).
2.2. (Tridiagonal determinant) Let a, b, c ∈ C and define the n × n determinant

D_n = det [a b; c a b; c a b; ⋱ ⋱ ⋱; c a b; c a],  n ≥ 1,

with a on the diagonal, b on the superdiagonal and c on the subdiagonal.

(i) Prove that D_n = aD_{n−1} − bcD_{n−2} for n ≥ 3.
(ii) Prove that

D_n = (α^{n+1} − β^{n+1})/(α − β) if a^2 − 4bc ≠ 0, and D_n = (n + 1)(a/2)^n if a^2 − 4bc = 0,

where α = (a + √(a^2 − 4bc))/2, β = (a − √(a^2 − 4bc))/2.
2.3. Use Example 2.11 to compute the determinant of the Hilbert matrix H_n = [1/(i + j)]_{1≤i,j≤n}.
2.4. Prove that

det [1 sin x_1 cos x_1 sin 2x_1 cos 2x_1 ⋯ sin nx_1 cos nx_1;
     1 sin x_2 cos x_2 sin 2x_2 cos 2x_2 ⋯ sin nx_2 cos nx_2;
     ⋮;
     1 sin x_{2n+1} cos x_{2n+1} sin 2x_{2n+1} cos 2x_{2n+1} ⋯ sin nx_{2n+1} cos nx_{2n+1}]
 = (−1)^n 2^{2n^2} ∏_{1≤j<k≤2n+1} sin((x_k − x_j)/2).
2.5. Prove that

det [sin x_1 cos x_1 sin 2x_1 cos 2x_1 ⋯ sin nx_1 cos nx_1;
     sin x_2 cos x_2 sin 2x_2 cos 2x_2 ⋯ sin nx_2 cos nx_2;
     ⋮;
     sin x_{2n} cos x_{2n} sin 2x_{2n} cos 2x_{2n} ⋯ sin nx_{2n} cos nx_{2n}]
 = (−1)^n 2^{2n^2} (1/(n + 1)) (∏_{1≤j<k≤2n} sin((x_k − x_j)/2)) ∑_{s=0}^{n} ∏_{j=1}^{2n} sin(x_j/2 − πs/(n + 1)).
2.6. Let A ∈ M_{m×n}(F) and B ∈ M_{p×q}(F), where mp = nq. Prove that

det(A ⊗ B) = (det A)^p (det B)^m if m = n and p = q, and det(A ⊗ B) = 0 otherwise.
2.7. (Maillet's determinant) Let p be an odd prime. For each i, j ∈ {1, . . . , (p − 1)/2}, let m(i, j) ∈ {1, . . . , p − 1} be such that j·m(i, j) ≡ i (mod p). (When viewed as an element of Z_p, m(i, j) = i/j.) Let

D_p = det[m(i, j)].

For example,

D_7 = det [1 4 5; 2 1 3; 3 5 1].

Compute D_p for p ≤ 19 using a computer. Make a conjecture about |D_p|. Then compute D_{23}.
CHAPTER 3
Vector Spaces and Linear Transformations
3.1. Basic Definitions
Definition 3.1. A vector space over a field F is an abelian group (V, +) equipped with a scalar multiplication F × V → V, (α, x) ↦ αx, such that for all x, y ∈ V and α, β ∈ F,

(i) α(x + y) = αx + αy;
(ii) (α + β)x = αx + βx;
(iii) α(βx) = (αβ)x;
(iv) 1x = x.
Examples of vector spaces.

• F^n, where F is a field. More generally, let V be a vector space over F and X any set. Then V^X = the set of all functions from X to V is a vector space over F.
• If F is a subfield of K, K is a vector space over F.
• M_{m×n}(F), F[x], etc.
• The solution set of a linear system, a linear difference equation, a linear differential equation, etc.
• For p > 0, ℓ^p = {{a_n}_{n=1}^∞ : a_n ∈ C, ∑_{n=1}^∞ |a_n|^p < ∞}. (Closure under addition: |a_n + b_n|^p ≤ (2 max{|a_n|, |b_n|})^p = 2^p max{|a_n|^p, |b_n|^p} ≤ 2^p (|a_n|^p + |b_n|^p).)
Subspaces. Let V be a vector space over F. A subset W ⊂ V is called a subspace of V if W is a vector space over F under the same addition and scalar multiplication of V. W is a subspace of V ⇔ W ≠ ∅ and W is closed under addition and scalar multiplication.
Linear transformations. Let V and W be vector spaces over F. A function f : V → W is called a linear transformation (or an F-map) if for all x, y ∈ V and α ∈ F, f(x + y) = f(x) + f(y) and f(αx) = αf(x). A bijective F-map is called an isomorphism. If ∃ an isomorphism f : V → W, we say that V is isomorphic to W and write V ≅ W; in this case, f^{-1} : W → V is also an isomorphism. An injective F-map f : V → W is called an embedding. Hom_F(V, W) = the set of all F-maps from V to W; it is a subspace of W^V. An F-map f : V → V is also called a linear operator of V. Hom_F(V, V) is denoted by End_F(V).
Easy fact. Let f : V → W be a linear transformation. Then f(V) is a subspace of W. If W_1 is a subspace of W, then f^{-1}(W_1) is a subspace of V. In particular, ker f := f^{-1}(0) is a subspace of V. f is 1-1 ⇔ ker f = {0}.
Easy fact. Let V be a vector space over F and {V_i : i ∈ I} a family of subspaces of V.

(i) ⋂_{i∈I} V_i is a subspace of V.
(ii) Define

∑_{i∈I} V_i = {∑_{i∈I} u_i : u_i ∈ V_i, u_i ≠ 0 for only finitely many i ∈ I}.

Then ∑_{i∈I} V_i is the smallest subspace of V containing ⋃_{i∈I} V_i.
Direct product and external direct sum. Let {V_i : i ∈ I} be a family of vector spaces over F. Let

∏_{i∈I} V_i = {(u_i)_{i∈I} : u_i ∈ V_i, i ∈ I}  (the cartesian product of {V_i : i ∈ I}).

Then ∏_{i∈I} V_i is a vector space over F with addition and scalar multiplication defined componentwise; ∏_{i∈I} V_i is called the direct product of {V_i : i ∈ I}.

⊕^{ext}_{i∈I} V_i := {(u_i) ∈ ∏_{i∈I} V_i : u_i = 0 for all but finitely many i}

is a subspace of ∏_{i∈I} V_i. ⊕^{ext}_{i∈I} V_i is called the external direct sum of {V_i : i ∈ I}. When |I| < ∞, ∏_{i∈I} V_i = ⊕^{ext}_{i∈I} V_i.
Internal direct sum. Let V be a vector space over F and {V_i : i ∈ I} a family of subspaces of V. If

V_i ∩ (∑_{j∈I, j≠i} V_j) = {0} for all i ∈ I,

then ∑_{i∈I} V_i is called an internal direct sum and is denoted by ⊕_{i∈I} V_i.
Easy facts.

(i) ∑_{i∈I} V_i is an internal direct sum ⇔ every u ∈ ∑_{i∈I} V_i has a unique representation u = ∑_{i∈I} u_i, where u_i ∈ V_i and u_i = 0 for all but finitely many i.
(ii) ⊕_{i∈I} V_i ≅ ⊕^{ext}_{i∈I} V_i. (For this reason, we usually do not distinguish internal and external direct sums. ⊕^{ext} is also denoted by ⊕.)
Spans, Spanning Sets and Linearly Independent Sets. Let V be a vector space over F and let S ⊂ V. The span of S, denoted by 〈S〉 or span S, is

〈S〉 = span S := {a_1u_1 + ⋯ + a_nu_n : n ≥ 0, u_i ∈ S, a_i ∈ F}.

〈S〉 is the smallest subspace of V containing S. If V = 〈S〉, S is called a spanning set of V.

A subset S ⊂ V is called a linearly independent set if for any distinct u_1, . . . , u_n ∈ S and any a_1, . . . , a_n ∈ F not all zero, a_1u_1 + ⋯ + a_nu_n ≠ 0.
Theorem 3.2. Let V be a vector space over F and S ⊂ V. Then the following statements are equivalent.

(i) S is a maximal linearly independent set of V.
(ii) S is a minimal spanning set of V.
(iii) S is a linearly independent spanning set of V.
(iv) Every element of V is a unique linear combination of elements in S.
Proof. (i) ⇔ (iii). (ii) ⇔ (iii). (iv) ⇔ (iii).

By Zorn's lemma, maximal linearly independent sets of V exist. A subset S ⊂ V satisfying one of (i) – (iv) in Theorem 3.2 is called a basis of V.
Proposition 3.3. Let V and W be vector spaces over F and let X be a basis of V. Then every function f : X → W can be extended to a unique F-map f̄ : V → W.

Proof. Define

f̄ : V → W,  ∑_{x∈X} a_x x ↦ ∑_{x∈X} a_x f(x).
Corollary 3.4. Let V and W be vector spaces over F. Let S be a subspace of V and f : S → W an F-map. Then f can be extended to an F-map g : V → W.

Proof. Let X be a basis of S. Extend X to a basis Y of V. Extend f|_X to a function f_1 : Y → W. By Proposition 3.3, f_1 can be extended to an F-map g : V → W.
Theorem 3.5. Any two bases of a vector space have the same cardinality.
Proof. Let V be a vector space over F and let X, Y be two bases of V.

1° Assume that |X| < ∞ and |Y| < ∞. Write X = {x_1, . . . , x_n} and Y = {y_1, . . . , y_m}. Assume to the contrary that n > m. Then

[x_1; ⋮; x_n] = A [y_1; ⋮; y_m],  [y_1; ⋮; y_m] = B [x_1; ⋮; x_n]

for some matrices A ∈ M_{n×m}(F) and B ∈ M_{m×n}(F). It follows that AB = I_n. There exists C ∈ GL(n, F) such that CA = [∗; 0 ⋯ 0] (last row zero). Thus (0, . . . , 0, 1)C = (0, . . . , 0, 1)CAB = 0, a contradiction.

2° Assume |X| = ∞. We claim that |Y| = ∞. (Otherwise, X is spanned by Y, which is spanned by a finite subset of X. So, X is spanned by a finite subset of X, a contradiction.) For each x ∈ X, ∃ a finite subset {y_1, . . . , y_n} ⊂ Y such that x = a_1y_1 + ⋯ + a_ny_n, a_i ∈ F. Define f(x) = {y_1, . . . , y_n}. We claim that ⋃_{x∈X} f(x) = Y. (Otherwise, X is spanned by Y_1 := ⋃_{x∈X} f(x) ⊊ Y; hence Y is spanned by Y_1, a contradiction.) Now,

|Y| = |⋃_{x∈X} f(x)| ≤ |X|·ℵ_0 = |X|.

By symmetry, |X| ≤ |Y|. So, |X| = |Y|.
Dimension. Let V be a vector space over F with a basis X. Define dim V (or dim_F V) = |X|. We have

V = ⊕_{x∈X} Fx ≅ ⊕^{ext}_{x∈X} F = F^{|X|}.
Caution. Let F be a field and X a set. F^{|X|} is the direct sum of |X| copies of F, i.e., F^{|X|} = ⊕_{x∈X} F. However, F^X is the F-vector space of all functions from X to F, i.e., F^X = ∏_{x∈X} F.
Examples. dim F^n = n. dim F[x] = ℵ_0. Let S_n(F) be the set of all n × n symmetric matrices over F and U_n(F) the set of all n × n upper triangular matrices over F. Then dim S_n(F) = dim U_n(F) = n(n + 1)/2.
Example 3.6. If V is a vector space over F such that |V| = ∞ and |V| > |F|, then dim V = |V|. (E.g., dim_Q R = ℵ.)

Proof. Let X be a basis of V. Clearly, |X| = ∞. (If |X| < ∞, since |F| < |V| = ∞, we would have |V| = |F|^{|X|} < |V|, a contradiction.) Let P_0(X) be the set of all finite subsets of X. Then

|V| = |⋃_{S∈P_0(X)} 〈S〉|
    ≤ |P_0(X)| · max{|F|, ℵ_0}  (since |〈S〉| = |F|^{|S|} ≤ max{|F|, ℵ_0})
    = |X| · max{|F|, ℵ_0} = max{|X|, |F|}.

Since |V| > |F|, we must have |V| ≤ |X|.
Example. Let F be a subfield of K and V a vector space over K. Then V is naturally a vector space over F. Moreover,

dim_F V = dim_K V · dim_F K.

Proof. Let X be a basis of V over K and Y a basis of K over F. Then as (y, x) runs over Y × X, the products yx are all distinct, and YX = {yx : y ∈ Y, x ∈ X} is a basis of V over F.
Easy facts.

(i) Two vector spaces V and W over F are isomorphic iff dim V = dim W.
(ii) dim ⊕_{i∈I} V_i = ∑_{i∈I} dim V_i.
Example. Let A ∈ M_{m×n}(F). The row (column) space of A, denoted by R(A) (C(A)), is the subspace of F^n (F^m) spanned by the rows (columns) of A. The nonzero rows (columns) of rref(A) (rcef(A)) form a basis of R(A) (C(A)); dim R(A) = dim C(A) = rank A.
3.2. Quotient Spaces and Isomorphism Theorems
The quotient space. Let S be a subspace of a vector space V over F. Recall that the quotient abelian group V/S = {u + S : u ∈ V}, where the addition in V/S is defined by (u + S) + (v + S) = (u + v) + S. Define a scalar multiplication in V/S similarly: for u + S ∈ V/S and α ∈ F, let α(u + S) = αu + S. The scalar multiplication is well defined and V/S becomes a vector space over F. V/S is called the quotient space of V by S. The map

π : V → V/S,  u ↦ u + S

is an onto F-map with ker π = S. π is called the canonical projection from V to V/S.
Proposition 3.7. Let S ⊂ V be vector spaces over F. Let {ε_i : i ∈ I} be a basis of S and {δ_j + S : j ∈ J} a basis of V/S. Then {ε_i : i ∈ I} ∪ {δ_j : j ∈ J} is a basis of V. So, V ≅ S ⊕ V/S and dim V = dim S + dim V/S. If dim V < ∞, then dim V/S = dim V − dim S.
Easy fact (The correspondence theorem). Let S ⊂ V be vector spaces over F. Let A be the set of all subspaces of V containing S and B the set of all subspaces of V/S. Then

A → B,  W ↦ W/S

is a bijection.
Theorem 3.8 (The universal mapping property of the quotient space). Let S ⊂ V be vector spaces over F. Let W be another vector space over F and f : V → W an F-map such that ker f ⊃ S. Then ∃! F-map f̄ : V/S → W such that f = f̄ ∘ π (i.e., the triangle formed by f, π and f̄ commutes). Moreover, f̄(V/S) = f(V) and ker f̄ = ker f / S.

Proof. Define f̄ : V/S → W, u + S ↦ f(u).
Theorem 3.9 (The first isomorphism theorem). Let f : V → W be an F-map. Then V/ker f ≅ f(V).

Proof. By Theorem 3.8, ∃ an F-map f̄ : V/ker f → W such that f = f̄ ∘ π, where π : V → V/ker f is the canonical projection, and f̄(V/ker f) = f(V), ker f̄ = ker f / ker f = {0 + ker f}.
Theorem 3.10 (The second isomorphism theorem). Let V be a vector space over F and S, T subspaces of V. Then (S + T)/T ≅ S/(S ∩ T).

Proof. Define an F-map

f : S → (S + T)/T,  s ↦ s + T.

f is onto with ker f = S ∩ T. Use the first isomorphism theorem.
Theorem 3.11 (The third isomorphism theorem). Let S ⊂ T ⊂ V be vector spaces over F. Then (V/S)/(T/S) ≅ V/T.

Proof. Define an F-map f : V/S → V/T, v + S ↦ v + T. Then f is onto and ker f = T/S.
Corollary 3.12.

(i) If f : V → W is an F-map, then

dim V = null f + rank f,

where null f := dim(ker f) and rank f := dim f(V).
(ii) Let S, T be subspaces of V. Then

dim S + dim T = dim(S + T) + dim(S ∩ T).

Proof. (ii) Define an F-map

f : S × T → S + T,  (s, t) ↦ s + t.

Then f is onto and ker f = {(s, −s) : s ∈ S ∩ T} ≅ S ∩ T. Hence

dim S + dim T = dim(S × T) = dim(S + T) + dim(S ∩ T).
3.3. Finite Dimensional Vector Spaces
Facts.
(i) If S ⊂ V are vector spaces over F such that dim S = dim V < ∞, then S = V.
(ii) Let f : V → W be an F-map, where dim V = dim W < ∞. Then f is 1-1 ⇔ f is onto.

Proof. (i) dim V/S = 0 ⇒ V = S.

Note. When dim V = ∞, both (i) and (ii) are false.
Let V be an n-dimensional vector space over F with an (ordered) basis E = (ε_1, . . . , ε_n) and W an m-dimensional vector space over F with an (ordered) basis (δ_1, . . . , δ_m). Let f : V → W be an F-map. Then

(f(ε_1), . . . , f(ε_n)) = (δ_1, . . . , δ_m) A

for some A ∈ M_{m×n}(F). The map f ↦ A is an isomorphism Hom_F(V, W) → M_{m×n}(F). We have rank f = rank A and null f = null A. (null A := dim{x ∈ F^n : Ax = 0}.) If f ∈ End_F(V) (= Hom_F(V, V)), we have

(f(ε_1), . . . , f(ε_n)) = (ε_1, . . . , ε_n) A

for some A ∈ M_n(F). A is called the E-matrix of f. If E′ = (ε′_1, . . . , ε′_n) is another (ordered) basis of V and B is the E′-matrix of f, then

B = P^{-1}AP,

where P ∈ GL(n, F) is defined by (ε′_1, . . . , ε′_n) = (ε_1, . . . , ε_n)P. (Proof. f(ε′_1, . . . , ε′_n) = f((ε_1, . . . , ε_n)P) = f(ε_1, . . . , ε_n)P = (ε_1, . . . , ε_n)AP = (ε′_1, . . . , ε′_n)P^{-1}AP.) The map End_F(V) → M_n(F), f ↦ A, is not only an F-isomorphism but also preserves the multiplication; the map is an algebra isomorphism.
Facts about ranks of matrices. Let A, B ∈ M_{m×n}(F) and C ∈ M_{n×p}(F).

(i) rank A = max{r : A has an r × r invertible submatrix}.
(ii) rank(A + B) ≤ rank A + rank B.
(iii) rank A + rank C − n ≤ rank AC ≤ min{rank A, rank C}.
(iv) If P ∈ GL(m, F) and Q ∈ GL(n, F), then rank PAQ = rank A.
Proof. (iii) Method 1. Define

f : F^n/C(C) → C(A)/C(AC),  x + C(C) ↦ Ax + C(AC).

Then f is a well defined onto F-map. So, dim(F^n/C(C)) ≥ dim(C(A)/C(AC)). Hence the result.

Method 2. May assume A = [I_r 0; 0 0], where r = rank A. Write C = [C_1; C_2], where C_1 is of size r × p. Then rank AC = rank C_1 and rank C_1 + n − r ≥ rank C. Hence the result.
Homogeneous linear ordinary differential equations (ODE). Let F = R or C. Let I ⊂ R be an open interval. Let A : I → M_n(F) be a continuous function. Let x(t) ∈ F^n denote an unknown function of a real variable t. For each t_0 ∈ I and x_0 ∈ F^n, the initial value problem

(3.1) x′(t) = A(t)x(t),  x(t_0) = x_0

has a unique solution x(t) defined on I. (This is a special case of existence and uniqueness theorems in ODE.)

Let D(I) be the F-vector space of all differentiable functions from I to F^n and let (F^n)^I be the F-vector space of all functions from I to F^n. Then

L : D(I) → (F^n)^I,  x(t) ↦ x′(t) − A(t)x(t)

is an F-map. The homogeneous linear ODE x′(t) = A(t)x(t) becomes L(x) = 0; its solution set is ker L. The existence and uniqueness of the solution of (3.1) is equivalent to the following statement: the F-map

(3.2) ker L → F^n,  x ↦ x(t_0)

is an isomorphism. Therefore dim_F ker L = n.

Let x_1, . . . , x_n ∈ ker L. φ(t) = det[x_1(t), . . . , x_n(t)] is called the Wronskian of x_1, . . . , x_n. By the isomorphism (3.2), {x_1, . . . , x_n} is a basis of ker L ⇔ φ(t_0) ≠ 0. Since t_0 ∈ I is arbitrary, φ(t_0) ≠ 0 ⇔ φ(t) ≠ 0 ∀t ∈ I.

The Wronskian φ(t) is explicitly given by its initial value φ(t_0):

(3.3) φ(t) = φ(t_0) exp(∫_{t_0}^{t} Tr(A(τ)) dτ).
Proof of (3.3). We have

φ′(t) = d/dt (det[x_1(t), . . . , x_n(t)])
      = ∑_{i=1}^{n} det[x_1(t), . . . , x_{i−1}(t), x′_i(t), x_{i+1}(t), . . . , x_n(t)]  (the product rule)
      = ∑_{i=1}^{n} det[x_1(t), . . . , x_{i−1}(t), A(t)x_i(t), x_{i+1}(t), . . . , x_n(t)]
      = Tr A(t) det[x_1(t), . . . , x_n(t)]  (by the next lemma)
      = Tr A(t) φ(t).

It follows that φ(t) = φ(t_0) exp(∫_{t_0}^{t} Tr(A(τ)) dτ).
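For a constant coefficient matrix the columns of e^{tA} solve x′ = Ax with initial values e_1, . . . , e_n, so their Wronskian is det(e^{tA}) and (3.3) specializes to det(e^{tA}) = e^{t·Tr A}. A quick numerical sanity check (our sketch, using SciPy's matrix exponential on an arbitrary 2 × 2 matrix):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])    # arbitrary constant coefficients

for t in (0.5, 1.0, 2.0):
    phi = np.linalg.det(expm(t * A))         # Wronskian of the columns of e^{tA}
    assert np.isclose(phi, np.exp(np.trace(A) * t))   # (3.3) with phi(0) = 1
```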
Let a_0(t), . . . , a_{n−1}(t) ∈ F be continuous functions of t ∈ I and x(t) ∈ F an unknown function. Then the nth order linear ODE

(3.4) x^{(n)}(t) + a_{n−1}(t)x^{(n−1)}(t) + ⋯ + a_0(t)x(t) = 0

is equivalent to y′(t) = A(t)y(t), where

y(t) = [x(t); x′(t); ⋮; x^{(n−1)}(t)],  A(t) = [0 1; 0 1; ⋱ ⋱; 0 1; −a_0(t) −a_1(t) ⋯ −a_{n−1}(t)].

Let S be the solution set of (3.4). Then for each t_0 ∈ I,

S → F^n,  x(t) ↦ (x(t_0), x′(t_0), . . . , x^{(n−1)}(t_0))^T

is an isomorphism. If x_1, . . . , x_n ∈ S, their Wronskian is

φ(t) = det [x_1(t) ⋯ x_n(t); x′_1(t) ⋯ x′_n(t); ⋮; x_1^{(n−1)}(t) ⋯ x_n^{(n−1)}(t)].

We have

φ(t) = φ(t_0) exp(−∫_{t_0}^{t} a_{n−1}(τ) dτ).

x_1, . . . , x_n form a basis of S ⇔ φ(t_0) ≠ 0 ⇔ φ(t) ≠ 0 ∀t ∈ I.
Lemma 3.13. Let A, B ∈ M_n(F) and write B = [b_1, . . . , b_n]. Then

(3.5) ∑_{i=1}^{n} det[b_1, . . . , b_{i−1}, Ab_i, b_{i+1}, . . . , b_n] = (Tr A)(det B).

Proof. Fix A = [a_{ij}] and let f(B) be the difference of the two sides of (3.5). We only have to show that f satisfies (i) – (iii) of Proposition 2.6. (i) is obvious.

(ii) Assume b_1 = b_2. Then det B = 0, and

f(B) = det[Ab_1, b_2, b_3, . . . , b_n] + det[b_1, Ab_2, b_3, . . . , b_n] = 0.

(iii)

∑_{i=1}^{n} det[e_1, . . . , e_{i−1}, Ae_i, e_{i+1}, . . . , e_n] = ∑_{i=1}^{n} det[e_1, . . . , e_{i−1}, [a_{1i}; ⋮; a_{ni}], e_{i+1}, . . . , e_n] = ∑_{i=1}^{n} a_{ii} = Tr A = (Tr A)(det I_n),

so f([e_1, . . . , e_n]) = 0.
3.4. The Dual Space
Let V be a vector space over F. Hom_F(V, F) is called the dual space of V and is denoted by V*.

Let B be a basis of V. For each v ∈ B, ∃! v′ ∈ V* such that

v′(u) = 1 if u = v, and v′(u) = 0 if u ∈ B \ {v}.

It is easy to see that {v′ : v ∈ B} is linearly independent in V*. Thus, B → V*, v ↦ v′, extends to an embedding V → V*. (Note. This embedding depends on the choice of the basis B.)

If dim V = n < ∞, then dim V* = n. (Recall that Hom_F(F^n, F) ≅ M_{1×n}(F).) So, the above embedding V → V* is an isomorphism. {v′ : v ∈ B} is a basis of V* and is called the dual basis of B.
Theorem 3.14. Let V be a vector space over F such that dim V = ∞. Then dim V* = |V*| = |F|^{dim V}.

Proof. Let B be a basis of V. Then |V*| = |F^B| = |F|^{dim V}.
Case 1. Assume |F|^{dim V} > |F|. By Example 3.6, dim V* = |V*| = |F|^{dim V}.
Case 2. Assume |F|^{dim V} = |F|. Let b_0, b_1, · · · ∈ B be distinct. For each a ∈ F, choose f_a ∈ V* such that f_a(b_j) = a^j, j ≥ 0. We claim that {f_a : a ∈ F} is linearly independent: for distinct a_1, . . . , a_n ∈ F, the n × ℵ_0 matrix

[f_{a_i}(b_j)] = [a_i^j]

has linearly independent rows (Vandermonde). Therefore, dim V* ≥ |{f_a : a ∈ F}| = |F|.
Examples. Let F = Q, V = Q^{ℵ_0}. Then dim V* = ℵ_0^{ℵ_0} = ℵ. Let F = R, V = R^{ℵ_0}. Then dim V* = ℵ^{ℵ_0} = ℵ.

The pairing between V and V*. Define a map 〈· , ·〉 : V* × V → F by

〈f, v〉 = f(v).

〈·, ·〉 is bilinear, i.e., 〈af + bg, v〉 = a〈f, v〉 + b〈g, v〉 and 〈f, au + bv〉 = a〈f, u〉 + b〈f, v〉. For any S ⊂ V and A ⊂ V*, S^⊥ := {f ∈ V* : 〈f, v〉 = 0 ∀v ∈ S} is a subspace of V* and A^⊥ := {v ∈ V : 〈f, v〉 = 0 ∀f ∈ A} is a subspace of V.
Proposition 3.15. Let S, T be subspaces of V and A, B subspaces of V*.

(i) S ⊂ T ⇒ S^⊥ ⊃ T^⊥; A ⊂ B ⇒ A^⊥ ⊃ B^⊥.
(ii) S = S^{⊥⊥}; A ⊂ A^{⊥⊥}.
(iii) φ : S^⊥ → (V/S)*, f ↦ 〈f, ·〉, is an isomorphism, where 〈f, ·〉 : V/S → F, v + S ↦ 〈f, v〉.
(iv) ψ : A^⊥ → (V*/A)*, v ↦ 〈·, v〉, is an embedding, where 〈·, v〉 : V*/A → F, f + A ↦ 〈f, v〉.
(v) If dim V = n < ∞, then dim S + dim S^⊥ = n, dim A + dim A^⊥ = n, A = A^{⊥⊥}, and the embedding ψ in (iv) is an isomorphism.

Proof. (ii) Clearly, S ⊂ S^{⊥⊥}. If u ∈ V \ S, then ∃f ∈ V* such that f(S) = 0 but f(u) ≠ 0. So, f ∈ S^⊥ but 〈f, u〉 ≠ 0. Hence u ∉ S^{⊥⊥}. So, S ⊃ S^{⊥⊥}.

(iii) Proof that φ is onto. Let π : V → V/S be the natural projection. ∀g ∈ (V/S)*, we have g ∘ π ∈ S^⊥ and g = φ(g ∘ π).

(v) Note that dim(V/S)* = dim(V/S). Thus by (iii), dim S^⊥ = dim(V/S) = n − dim S.

Let A = {0} in (iv). We see that V = {0}^⊥ → V**, v ↦ 〈·, v〉, is an embedding. Since dim V = dim V**, this embedding is also onto. Thus every α ∈ V** is of the form 〈·, v〉 for some v ∈ V. It follows that the map ψ in (iv) is onto. (Let ρ : V* → V*/A be the natural projection. ∀β ∈ (V*/A)*, we have β ∘ ρ ∈ V**; hence β ∘ ρ = 〈·, v〉 for some v ∈ V. Clearly v ∈ A^⊥ and ψ(v) = β.) Consequently,

dim A^⊥ = dim(V*/A)* = dim(V*/A) = n − dim A.

Since A ⊂ A^{⊥⊥} and dim A = n − dim A^⊥ = dim A^{⊥⊥}, we have A = A^{⊥⊥}.
Note.

(i) The embedding V → V**, v ↦ 〈·, v〉, is called the canonical embedding of V into V**; it does not depend on any bases of V and V**. (For comparison, note that the embedding V → V* at the beginning of this section depends on the choice of the basis B.) When dim V < ∞, the canonical embedding is an isomorphism.
(ii) Statements (iii) and (iv) of Proposition 3.15 can be made a little more general. See Exercise 3.4.
(iii) When dim V = ∞, the claims in (v) of Proposition 3.15 are false. See the following counterexamples.
• Let S = {0} ⊂ V. Then dim S^⊥ = dim V* > dim V; hence dim S + dim S^⊥ > dim V.
• Let A = V*. Then dim A + dim A^⊥ > dim V.
• Since dim V** > dim V, the canonical embedding V → V** is not onto.
• Assume V has a countable basis {ε_1, ε_2, . . .}. Let

A = {f ∈ V* : f(ε_n) = 0 when n is large enough}.

Then A^⊥ = {0}. (If 0 ≠ v ∈ V, then v = a_1ε_1 + ⋯ + a_Nε_N for some N > 0 and a_1, . . . , a_N ∈ F. Choose f ∈ V* such that f(v) = 1 and f(ε_n) = 0 for all n > N. Then f ∈ A but 〈f, v〉 ≠ 0, so v ∉ A^⊥.) Therefore, A^{⊥⊥} = {0}^⊥ = V* ⊋ A.
When dim V = n < ∞, the pairing between V and V* can be made more explicit. Let v_1, . . . , v_n be a basis of V and v′_1, . . . , v′_n the dual basis of V*. Define isomorphisms

α : F^n → V, (a_1, . . . , a_n) ↦ a_1v_1 + ⋯ + a_nv_n,
β : F^n → V*, (b_1, . . . , b_n) ↦ b_1v′_1 + ⋯ + b_nv′_n.

For v ∈ V and f ∈ V*, write v = a_1v_1 + ⋯ + a_nv_n and f = b_1v′_1 + ⋯ + b_nv′_n. Then

〈f, v〉 = 〈b_1v′_1 + ⋯ + b_nv′_n, a_1v_1 + ⋯ + a_nv_n〉 = b_1a_1 + ⋯ + b_na_n
       = (b_1, . . . , b_n)(a_1, . . . , a_n)^T = β^{-1}(f) · α^{-1}(v)^T.

Let S be a subspace of V and A a subspace of V*. Let ε_1, . . . , ε_k be a basis of α^{-1}(S) and δ_1, . . . , δ_l a basis of β^{-1}(A). Then

β^{-1}(S^⊥) = ker_r[ε_1^T, . . . , ε_k^T],
α^{-1}(A^⊥) = ker_r[δ_1^T, . . . , δ_l^T].
Proposition 3.16. Let f : V → W be an F-map.

(i) Define f* : W* → V*, α ↦ α ∘ f. Then f* ∈ Hom_F(W*, V*). Moreover, ( )* : Hom_F(V, W) → Hom_F(W*, V*) is an F-map.
(ii) If g : W → X is another F-map, then (g ∘ f)* = f* ∘ g*.
(iii) Let θ_V : V → V** and θ_W : W → W** be the canonical embeddings. Then the following diagram is commutative, i.e., θ_W ∘ f = f** ∘ θ_V.

Proof. Exercise.
Exercises
3.1. Let V be a vector space over F and let A, B, A′ be subspaces of V such that A′ ⊂ A. Prove that

A ∩ (B + A′) = (A ∩ B) + A′.
3.2. Let V be a vector space over F and let f be a linear transformation of V. A subspace W ⊂ V is called f-invariant if f(W) ⊂ W. Define

V_1 = {a ∈ V : f^k(a) = 0 for some integer k > 0},  V_2 = ⋂_{k=1}^{∞} f^k(V).

(i) Prove that V_1 and V_2 are both f-invariant subspaces of V.
(ii) If dim V < ∞, prove that

V = V_1 ⊕ V_2.

(iii) Give an example of a linear transformation f of an infinite dimensional vector space V such that V_1 = V_2 = {0}.
3.3. Let L = {f(x, y) ∈ R[x, y] : deg_x f ≤ n, deg_y f ≤ n}. Let Δ = ∂²/∂x² + ∂²/∂y². Prove that

D : L → L,  f(x, y) ↦ Δ((x² + y²)f(x, y)) − (x² + y²)Δf(x, y)

is a linear transformation. Find the matrix of D relative to the basis {x^i y^j : 0 ≤ i, j ≤ n} of L.
3.4. Let V be a vector space over F. Let S ⊂ T be subspaces of V and A ⊂ B subspaces of V*.

(i) Define

φ : S^⊥/T^⊥ → (T/S)*,  f + T^⊥ ↦ 〈f, ·〉,

where 〈f, ·〉 : T/S → F, u + S ↦ 〈f, u〉. Prove that φ is a well defined isomorphism.

(ii) Define

ψ : A^⊥/B^⊥ → (B/A)*,  u + B^⊥ ↦ 〈·, u〉,

where 〈·, u〉 : B/A → F, f + A ↦ 〈f, u〉. Prove that ψ is a well defined injective F-map. When dim V < ∞, ψ is an isomorphism.
3.5. Prove Proposition 3.16.
3.6. Let

A = [B C; D E],

where B ∈ M_{m×n}(F) with rank B = r and E ∈ M_{p×q}(F). What is the largest possible value of rank A?
3.7. Let A ∈Mm×n(F ), B ∈Mn×p(F ), C ∈Mp×q(F ). Prove that
rankAB + rankBC ≤ rankB + rankABC.
3.8. (i) Let V and W be vector spaces over Q and f : V → W a function such that f(x + y) = f(x) + f(y) for all x, y ∈ V. Prove that f is a Q-linear map.
(ii) Let f : R^n → R^m be a continuous function such that f(x + y) = f(x) + f(y) for all x, y ∈ R^n. Prove that f is an R-linear map. (Note. (ii) is false if f is not continuous.)
3.9. Let X be a subspace of M_n(F) with dim X > n(n − 1). Prove that X contains an invertible matrix.
3.10. Let F_q be a finite field with q elements.

(i) Prove that

|GL(n, F_q)| = (q^n − 1)(q^n − q) ⋯ (q^n − q^{n−1}) = q^{n(n−1)/2} ∏_{i=1}^{n} (q^i − 1).

(ii) Let 0 ≤ k ≤ n and let [n k]_q be the number of k-dimensional subspaces of F_q^n. Prove that

[n k]_q = (q^n − 1)(q^n − q) ⋯ (q^n − q^{k−1}) / ((q^k − 1)(q^k − q) ⋯ (q^k − q^{k−1})) = ∏_{i=1}^{k} (q^{n−k+i} − 1)/(q^i − 1).

([n k]_q is called the gaussian coefficient.)
3.11. Let n ≥ 0 and V = {f ∈ F[x] : deg f ≤ n}. For each 1 ≤ i ≤ n + 1, define L_i ∈ V* by

L_i(f) = ∫_0^{+∞} f(x) e^{−ix} dx,  f ∈ V.

Find a basis f_1, . . . , f_{n+1} of V such that L_1, . . . , L_{n+1} is its dual basis.
CHAPTER 4
Rational Canonical Forms and Jordan Canonical Forms
4.1. A Criterion for Matrix Similarity
The main purpose of this chapter is to determine when two matrices in M_n(F) are similar and to determine a canonical form in each similarity class. Let V be an n-dimensional vector space over F. Then two matrices in M_n(F) are similar iff they are the matrices of some T ∈ End(V) relative to two suitable bases. Therefore, to know canonical forms of the similarity classes of M_n(F) is to know canonical forms of linear transformations of V relative to suitable bases.
Matrices over F[x]. Let F[x] be the polynomial ring over F. M_{m×n}(F[x]) is the set of all m × n matrices with entries in F[x]; M_n(F[x]) := M_{n×n}(F[x]); GL(n, F[x]) is the set of all invertible matrices in M_n(F[x]).

Fact. A ∈ M_n(F[x]) is invertible ⇔ det A ∈ F^× (= F \ {0}).

Proof. (⇒) 1 = det(AA^{-1}) = (det A)(det A^{-1}). So, det A is invertible in F[x], i.e., det A ∈ F^×.
(⇐) A^{-1} = (1/det A) adj A.

Equivalence in M_{m×n}(F[x]). Two matrices A, B ∈ M_{m×n}(F[x]) are called equivalent, denoted A ≈ B, if ∃P ∈ GL(m, F[x]) and Q ∈ GL(n, F[x]) such that A = PBQ.

Elementary operations and elementary matrices in M_n(F[x]). Elementary operations and elementary matrices in M_n(F[x]) are almost the same as those in M_n(F), cf. Table 1.1. For type I, we still require that α ∈ F^×. (Requiring that 0 ≠ α ∈ F[x] is not enough.) For type III, β ∈ F[x]. Elementary matrices in M_n(F[x]) are invertible and every matrix in GL(n, F[x]) is a product of elementary matrices.
Theorem 4.1. Let A, B ∈ M_n(F). Then A and B are similar in M_n(F) ⇔ xI − A and xI − B are equivalent in M_n(F[x]).

Proof. (⇒) ∃P ∈ GL(n, F) such that A = PBP^{-1}. Note that P ∈ GL(n, F[x]) and P(xI − B)P^{-1} = xI − A.

(⇐) ∃P, Q ∈ GL(n, F[x]) such that

P(xI − A) = (xI − B)Q.

Write P = P_0 + xP_1 + ⋯ + x^sP_s, where P_i ∈ M_n(F). Divide P by xI − B from the left: P = (xI − B)S + T for some S ∈ M_n(F[x]) and T ∈ M_n(F). Divide Q by xI − A from the right: Q = S′(xI − A) + T′ for some S′ ∈ M_n(F[x]) and T′ ∈ M_n(F). Thus

[(xI − B)S + T](xI − A) = (xI − B)[S′(xI − A) + T′],

i.e.,

(4.1) (xI − B)(S − S′)(xI − A) = (xI − B)T′ − T(xI − A).

We claim that S − S′ = 0. (Otherwise, S − S′ = S_0 + xS_1 + ⋯ + x^kS_k, S_i ∈ M_n(F), S_k ≠ 0. Then (xI − B)(S − S′)(xI − A) = x^{k+2}S_k + terms of lower degree in x, while the highest power of x at the RHS of (4.1) is x, a contradiction.) Thus (xI − B)T′ − T(xI − A) = 0, which implies that T = T′ and BT = TA. It remains to show that T ∈ GL(n, F). (Then B = TAT^{-1}.) Write

P^{-1} = (xI − A)X + Y,

where X ∈ M_n(F[x]) and Y ∈ M_n(F). Then

(4.2) I = PP^{-1} = [(xI − B)S + T][(xI − A)X + Y]
        = (xI − B)S[(xI − A)X + Y] + T(xI − A)X + TY
        = (xI − B)S[(xI − A)X + Y] + (xI − B)TX + TY  (∵ TA = BT)
        = (xI − B)Z + TY

for some Z ∈ M_n(F[x]). Compare the degrees of x at both sides of (4.2). We must have TY = I, and the proof is complete.
Now, the question is to determine when xI −A is equivalent to xI −B.
4.2. The Smith Normal Form
For two matrices A, B of any size, define A ⊕ B = [A 0; 0 B].
Theorem 4.2. Let A ∈ M_{m×n}(F[x]). Then ∃P ∈ GL(m, F[x]) and Q ∈ GL(n, F[x]) such that

(4.3) PAQ = diag(d_1, d_2, . . . , d_r) ⊕ 0,

where d_1, . . . , d_r ∈ F[x] are monic (with leading coefficient 1) and d_1 | d_2 | ⋯ | d_r. The polynomials d_1, . . . , d_r ∈ F[x] are uniquely determined by A and are called the invariant factors of A. The integer r is called the rank of A. The matrix at the RHS of (4.3) is called the Smith normal form of A.
Proof. Existence of the Smith normal form.
For 0 ≠ A = [a_{ij}] ∈ M_{m×n}(F[x]), define δ(A) = min{deg a_{ij} : a_{ij} ≠ 0}.
Use induction on min(m, n). First assume min(m, n) = 1, say m = 1. Assume A ≠ 0. Among all matrices equivalent to A, choose B such that δ(B) is as small as possible. Write B = [b_{11}, . . . , b_{1n}] and, without loss of generality, assume deg b_{11} = δ(B). Then b_{11} | b_{1j} for all 2 ≤ j ≤ n. (If b_{11} ∤ b_{12}, then b_{12} = qb_{11} + r for some q, r ∈ F[x] with 0 ≤ deg r < deg b_{11}. Then B ≈ [b_{11}, b_{12} − qb_{11}, b_{13}, . . . , b_{1n}] = [b_{11}, r, b_{13}, . . . , b_{1n}], which contradicts the minimality of δ(B).) Thus, suitable elementary column operations of type III transform B into [b_{11}, 0, . . . , 0]. We can make b_{11} monic using a type I elementary operation.

Now assume min(m, n) > 1 and A ≠ 0. Among all matrices equivalent to A, choose B such that δ(B) is as small as possible. Let B = [b_{ij}] and assume deg b_{11} = δ(B). By the argument in the case m = 1, we have b_{11} | b_{1j} for all 2 ≤ j ≤ n and b_{11} | b_{i1} for all 2 ≤ i ≤ m. Then suitable type III elementary operations transform B into

C = [b_{11} 0 ⋯ 0; 0 c_{22} ⋯ c_{2n}; ⋮; 0 c_{m2} ⋯ c_{mn}].

We claim that b_{11} | c_{ij} for all 2 ≤ i ≤ m and 2 ≤ j ≤ n. (Since

C ≈ [b_{11} c_{i2} ⋯ c_{in}; 0 c_{22} ⋯ c_{2n}; ⋮; 0 c_{m2} ⋯ c_{mn}]

(add the ith row to the first row), from the above we have b_{11} | c_{ij} for all 2 ≤ j ≤ n.) Therefore, C = [b_{11}] ⊕ b_{11}C_1, where C_1 ∈ M_{(m−1)×(n−1)}(F[x]). Apply the induction hypothesis to C_1.
Uniqueness of the Smith normal form.
For A ∈ M_{m×n}(F[x]) and 1 ≤ k ≤ min(m, n), define

Δ_k(A) = gcd{det X : X is a k × k submatrix of A}.

(Δ_k(A) is called the kth determinantal divisor of A.) Also define Δ_0(A) = 1.
We claim that if A, B ∈ M_{m×n}(F[x]) are equivalent, then Δ_k(A) = Δ_k(B) for all 0 ≤ k ≤ min(m, n). Assume B = PAQ, where P ∈ GL(m, F[x]), Q ∈ GL(n, F[x]). By Cauchy-Binet, for I ⊂ {1, . . . , m} and J ⊂ {1, . . . , n} with |I| = |J| = k,

det B(I, J) = ∑_{K⊂{1,...,m}, L⊂{1,...,n}, |K|=|L|=k} det P(I, K) det A(K, L) det Q(L, J).

Since Δ_k(A) | det A(K, L) for all K, L, we get Δ_k(A) | det B(I, J) for all I, J. So, Δ_k(A) | Δ_k(B). By symmetry, Δ_k(B) | Δ_k(A). So, Δ_k(A) = Δ_k(B).

Now, if

A ≈ diag(d_1, d_2, . . . , d_r) ⊕ 0,

then

(4.4) Δ_k(A) = d_1 ⋯ d_k if 0 ≤ k ≤ r, and Δ_k(A) = 0 if k > r.

So, r is uniquely determined by A and

(4.5) d_k = Δ_k(A)/Δ_{k−1}(A),  1 ≤ k ≤ r,

are also uniquely determined by A.
Elementary divisors. Let A ∈ M_{m×n}(F[x]) and let d_1, . . . , d_r be the nonconstant invariant factors of A. Write d_i = p_{i1}^{e_{i1}} ⋯ p_{i,s_i}^{e_{i,s_i}}, where p_{i1}, . . . , p_{i,s_i} ∈ F[x] are distinct monic irreducible polynomials and e_{i1}, . . . , e_{i,s_i} ∈ Z^+. Then p_{i1}^{e_{i1}}, . . . , p_{i,s_i}^{e_{i,s_i}}, 1 ≤ i ≤ r, are called the elementary divisors of A.
Corollary 4.3. Let A, B ∈ M_{m×n}(F[x]). The following statements are equivalent.

(i) A, B are equivalent.
(ii) A, B have the same invariant factors.
(iii) A, B have the same rank and the same elementary divisors.
(iv) A, B have the same determinantal divisors.

Proof. By Theorem 4.2, (i) ⇔ (ii). By (4.4) and (4.5), (ii) ⇔ (iv). Obviously, (ii) ⇒ (iii).

(iii) ⇒ (ii). It suffices to show that the invariant factors of a matrix A ∈ M_{m×n}(F[x]) are determined by its rank and its elementary divisors. Let rank A = r. Let the elementary divisors of A be

p_1^{e_{11}}, . . . , p_1^{e_{1,s_1}}; . . . ; p_t^{e_{t1}}, . . . , p_t^{e_{t,s_t}},

where p_1, . . . , p_t ∈ F[x] are distinct monic irreducibles and 0 < e_{i1} ≤ ⋯ ≤ e_{i,s_i}, 1 ≤ i ≤ t. Then the last invariant factor of A is d_r = p_1^{e_{1,s_1}} ⋯ p_t^{e_{t,s_t}}. The other invariant factors of A are determined by the remaining elementary divisors

p_1^{e_{11}}, . . . , p_1^{e_{1,s_1−1}}; . . . ; p_t^{e_{t1}}, . . . , p_t^{e_{t,s_t−1}}

the same way. Therefore, the invariant factors of A are determined by its rank and its elementary divisors.
Proposition 4.4. Let A, B be two matrices over F[x]. Then the elementary divisor list of A ⊕ B is the union of the elementary divisor lists of A and B.

Proof. We may assume that A and B are Smith normal forms:

A = diag(f_1, . . . , f_s) ⊕ 0,  B = diag(g_1, . . . , g_t) ⊕ 0.

Let p ∈ F[x] be any monic irreducible. Write f_i = p^{a_i} f′_i, g_j = p^{b_j} g′_j, where p ∤ f′_i, p ∤ g′_j, and a_1 ≤ ⋯ ≤ a_s, b_1 ≤ ⋯ ≤ b_t. Let c_1 ≤ ⋯ ≤ c_{s+t} be a rearrangement of a_1, . . . , a_s, b_1, . . . , b_t. Then for 1 ≤ k ≤ s + t,

Δ_k(A ⊕ B) = p^{c_1+⋯+c_k} h_k,  h_k ∈ F[x], p ∤ h_k.

(Note that Δ_k(A ⊕ B) = 0 for k > s + t.) Hence, the kth invariant factor of A ⊕ B is

Δ_k(A ⊕ B)/Δ_{k−1}(A ⊕ B) = p^{c_k} h′_k,  h′_k ∈ F[x], p ∤ h′_k.

Therefore, the powers of p appearing in the elementary divisor list of A ⊕ B are p^{c_k}, c_k > 0. These are precisely the powers of p appearing in the union of the elementary divisor lists of A and B.
Example. Let A ∈ M_{5×4}(R[x]) be given below.

A =
[0     2x+2    6x+6                    5x^4+10x^3+15x^2+18x+8;
 2     2x+2    −2x^4−2x^3−4x^2+6x+6    −x^4−2x^3−3x^2−6x−6;
 x−1   x^2−1   −x^5+6x^2+2x−1          x^5+3x^4+5x^3+6x^2+5x+4;
 −1    0       x^4+x^3+2x^2            3x^4+6x^3+9x^2+12x+7;
 2     2x+2    −2x^4−2x^3−4x^2+6x+6    −2x^4−4x^3−6x^2−10x−8]

We use elementary operations to bring A to its Smith normal form. First r_1 ↔ r_4 and r_1 × (−1):

[1     0       −x^4−x^3−2x^2           −3x^4−6x^3−9x^2−12x−7;
 2     2x+2    −2x^4−2x^3−4x^2+6x+6    −x^4−2x^3−3x^2−6x−6;
 x−1   x^2−1   −x^5+6x^2+2x−1          x^5+3x^4+5x^3+6x^2+5x+4;
 0     2x+2    6x+6                    5x^4+10x^3+15x^2+18x+8;
 2     2x+2    −2x^4−2x^3−4x^2+6x+6    −2x^4−4x^3−6x^2−10x−8]

Then r_2 − 2r_1, r_3 − (x−1)r_1, r_5 − 2r_1:

[1   0       −x^4−x^3−2x^2     −3x^4−6x^3−9x^2−12x−7;
 0   2x+2    6x+6              5x^4+10x^3+15x^2+18x+8;
 0   x^2−1   x^3+4x^2+2x−1     4x^5+6x^4+8x^3+9x^2−3;
 0   2x+2    6x+6              5x^4+10x^3+15x^2+18x+8;
 0   2x+2    6x+6              4x^4+8x^3+12x^2+14x+6]

Clearing the first row by column operations gives

[1] ⊕ (x+1) [2     6           5x^3+5x^2+10x+8;
             x−1   x^2+3x−1    4x^4+2x^3+6x^2+3x−3;
             2     6           5x^3+5x^2+10x+8;
             2     6           4x^3+4x^2+8x+6],

where

[2     6           5x^3+5x^2+10x+8;
 x−1   x^2+3x−1    4x^4+2x^3+6x^2+3x−3;
 2     6           5x^3+5x^2+10x+8;
 2     6           4x^3+4x^2+8x+6] → ⋯ → diag(1, x^2+2, (x+1)(x^2+2)) with a zero row appended.

So,

A ≈ diag(1, x+1, (x+1)(x^2+2), (x+1)^2(x^2+2)) with a fifth zero row appended.

We have

Δ_1(A) = 1,
Δ_2(A) = x+1,
Δ_3(A) = (x+1)^2(x^2+2),
Δ_4(A) = (x+1)^4(x^2+2)^2.

The elementary divisors of A are x+1, x+1, (x+1)^2, x^2+2, x^2+2.
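SymPy can compute Smith normal forms over F[x] directly, which is handy for checking hand computations like the one above. A minimal sketch, assuming (as in recent SymPy versions) that smith_normal_form accepts the polynomial domain QQ[x]; the 3 × 3 matrix is a small fresh example, not the 5 × 4 matrix above:

```python
import sympy as sp
from sympy.matrices.normalforms import smith_normal_form

x = sp.symbols('x')
M = sp.Matrix([[x + 1, 1,     0],
               [0,     x + 1, 0],
               [0,     0,     x**2 - 1]])

# Diagonal entries are the invariant factors d_1 | d_2 | d_3 (up to units).
S = smith_normal_form(M, domain=sp.QQ[x])
print([sp.factor(S[i, i]) for i in range(3)])   # expect 1, x + 1, (x - 1)*(x + 1)**2
```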
4.3. Rational Canonical Forms
Let A ∈ M_n(F). Since det(xI − A) ≠ 0 (in F[x]), the Smith normal form of xI − A has no 0's on the diagonal. So, the invariant factors of xI − A are completely determined by the nonconstant invariant factors of xI − A. For this reason, when we speak of the invariant factors of xI − A, we usually mean the nonconstant ones. The invariant factors, elementary divisors and determinantal divisors of xI − A are also called those of A.

Theorem 4.5. Let A, B ∈ M_n(F). Then the following statements are equivalent.

(i) A ∼ B.
(ii) A, B have the same invariant factors.
(iii) A, B have the same elementary divisors.
(iv) A, B have the same determinantal divisors.
Proof. Immediate from Theorem 4.1 and Corollary 4.3.
Corollary 4.6. For every A ∈Mn(F ), A ∼ AT .
Proof. xI −A and xI −AT have the same determinantal divisors.
The companion matrix. Let f(x) = x^n + a_{n−1}x^{n−1} + ⋯ + a_0 ∈ F[x]. The companion matrix of f, denoted by M(f), is defined to be

M(f) = [0 1; 0 1; ⋱ ⋱; 0 1; −a_0 −a_1 ⋯ −a_{n−2} −a_{n−1}].

f(x) is the only invariant factor of M(f). In fact,

Δ_n(M(f)) = det [x −1; x −1; ⋱ ⋱; x −1; a_0 a_1 ⋯ a_{n−2} x+a_{n−1}] = f(x),

Δ_{n−1}(M(f)) = 1.
Theorem 4.7. Let A ∈ M_n(F) have invariant factors d_1, . . . , d_r and elementary divisors e_1, . . . , e_s. Then

A ∼ M(d_1) ⊕ ⋯ ⊕ M(d_r) ∼ M(e_1) ⊕ ⋯ ⊕ M(e_s).

M(d_1) ⊕ ⋯ ⊕ M(d_r) and M(e_1) ⊕ ⋯ ⊕ M(e_s) are called the rational canonical forms of A (in terms of invariant factors/elementary divisors).

Proof. The invariant factors of xI − (M(d_1) ⊕ ⋯ ⊕ M(d_r)) are d_1, . . . , d_r. The elementary divisors of M(e_1) ⊕ ⋯ ⊕ M(e_s) are e_1, . . . , e_s.
The characteristic polynomial. Let A ∈ M_n(F). c_A(x) := det(xI − A) is called the characteristic polynomial of A.

Theorem 4.8 (Cayley-Hamilton). Let A ∈ M_n(F) have characteristic polynomial c_A(x) = x^n + a_{n−1}x^{n−1} + ⋯ + a_0. Then c_A(A) = 0, i.e.,

A^n + a_{n−1}A^{n−1} + ⋯ + a_0I = 0.

Proof. We have

(4.6) c_A(x)I = x^nI + a_{n−1}x^{n−1}I + ⋯ + a_0I − c_A(A) + c_A(A) = (xI − A)p + c_A(A)

for some p ∈ M_n(F[x]). We also have

(4.7) c_A(x)I = det(xI − A) I = (xI − A) adj(xI − A) = (xI − A)q,

where q = adj(xI − A) ∈ M_n(F[x]). By (4.6) and (4.7),

(xI − A)(p − q) = c_A(A).

A comparison of degrees in x implies that p = q; hence c_A(A) = 0.
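The theorem is easy to check on any concrete matrix by summing the powers of A with the coefficients of c_A. A small sketch (our illustration, using SymPy's charpoly):

```python
import sympy as sp

A = sp.Matrix([[1, 2], [3, 4]])
coeffs = A.charpoly().all_coeffs()      # [1, -5, -2]: c_A(x) = x^2 - 5x - 2

# c_A(A) = A^2 - 5A - 2I; reversed() pairs coefficient c with the power A^i.
cA_at_A = sum((c * A**i for i, c in enumerate(reversed(coeffs))),
              sp.zeros(2, 2))
assert cA_at_A == sp.zeros(2, 2)
```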
The minimal polynomial. Let A ∈ M_n(F). Let I = {f ∈ F[x] : f(A) = 0}. Then I ≠ {0} since c_A ∈ I. Let m ∈ I be monic and of the smallest degree. Then every f ∈ I is a multiple of m. (Write f = qm + r, where r = 0 or deg r < deg m. Then 0 = f(A) = r(A). By the minimality of deg m, we have r = 0.) Hence m is unique in I; it is called the minimal polynomial of A, denoted by m_A. We have m_A | c_A.
Easy fact. If A ∼ B, then cA(x) = cB(x) and mA(x) = mB(x).
Proposition 4.9. Let f(x) = x^n + a_{n−1}x^{n−1} + ⋯ + a_0 ∈ F[x]. Then the minimal polynomial of M(f) is f(x).

Proof. Let A = M(f). We only have to show that A^0, A^1, . . . , A^{n−1} are linearly independent. (Thus there is no nonzero g ∈ F[x] with deg g ≤ n − 1 such that g(A) = 0.) Using induction, we have

A^i [0, . . . , 0, 1]^T = [0, . . . , 0, 1, ∗, . . . , ∗]^T,  0 ≤ i ≤ n − 1,

with the 1 in position n − i. Hence A^i[0, . . . , 0, 1]^T, 0 ≤ i ≤ n − 1, are linearly independent. So, A^i, 0 ≤ i ≤ n − 1, are linearly independent.
Proposition 4.10. Let A ∈ M_n(F) have invariant factors d_1, . . . , d_r (d_1 | d_2 | ⋯ | d_r). Then m_A(x) = d_r(x).

Proof. May assume A = M(d_1) ⊕ ⋯ ⊕ M(d_r). Then

d_r(A) = d_r(M(d_1)) ⊕ ⋯ ⊕ d_r(M(d_r)) = 0.

So, m_A | d_r. On the other hand, since m_A(A) = 0, m_A(M(d_r)) = 0. By Proposition 4.9, d_r | m_A.
Example. Let

A = [4 −3 8 −11; −6 0 −8 10; −14 7 −20 21; −6 4 −8 6] ∈ M_4(R).

Then

xI − A = [x−4 3 −8 11; 6 x 8 −10; 14 −7 x+20 −21; 6 −4 8 x−6].

Applying r_1 + r_2 and c_1 ↔ c_4:

[1 x+3 0 x+2; −10 x 8 6; −21 −7 x+20 14; x−6 −4 8 6]

→ [1 0 0 0; 0 11x+30 8 10x+26; 0 21x+56 x+20 21x+56; 0 −x^2+3x+14 8 −x^2+4x+18].

Applying c_2 ↔ c_3, c_4 − c_3, r_3 × 8:

[1] ⊕ [8 11x+30 −x−4; 8(x+20) 8(21x+56) 0; 8 −x^2+3x+14 x+4]

→ [1] ⊕ [8 0 0; 0 −11x^2−82x−152 (x+4)(x+20); 0 −x^2−8x−16 2x+8]

→ [1] ⊕ [1] ⊕ (x+4) [−11x−38 x+20; −x−4 2]

→ [1] ⊕ [1] ⊕ (x+4) [1 0; 0 x^2+2x+4].

So, the invariant factors of A are x+4, (x+4)(x^2+2x+4); the elementary divisors are x+4, x+4, x^2+2x+4. The rational canonical form of A is

[−4] ⊕ [−4] ⊕ [0 1; −4 −2].
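The invariant factors found by hand can be double-checked by computing the Smith normal form of xI − A by machine (again assuming SymPy's smith_normal_form supports the domain QQ[x]):

```python
import sympy as sp
from sympy.matrices.normalforms import smith_normal_form

x = sp.symbols('x')
A = sp.Matrix([[  4, -3,   8, -11],
               [ -6,  0,  -8,  10],
               [-14,  7, -20,  21],
               [ -6,  4,  -8,   6]])

S = smith_normal_form(x * sp.eye(4) - A, domain=sp.QQ[x])
print([sp.factor(S[i, i]) for i in range(4)])
# Expected up to units: 1, 1, x + 4, (x + 4)*(x**2 + 2*x + 4)
```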
Eigenvalues, eigenvectors and eigenspaces. Let A ∈ M_n(F). If ∃ 0 ≠ x ∈ F^n and λ ∈ F such that

Ax = λx,

λ is called an eigenvalue of A and x is called an eigenvector of A (with eigenvalue λ). Eigenvalues of A are the roots of the characteristic polynomial c_A(x). If λ is an eigenvalue of A,

E_A(λ) := {x ∈ F^n : Ax = λx} = ker_c(A − λI)

is called the eigenspace of A with eigenvalue λ. dim E_A(λ) = null(A − λI) is called the geometric multiplicity of λ. The multiplicity of λ as a root of c_A(x) is called the algebraic multiplicity of λ. Similar matrices have the same eigenvalues together with their algebraic and geometric multiplicities.
Fact. Let A = M(f_1) ⊕ ⋯ ⊕ M(f_k), where each f_i ∈ F[x] is monic, and let λ be an eigenvalue of A. Then the geometric multiplicity of λ is |{i : f_i(λ) = 0}|. In particular, geo.mult.(λ) ≤ alg.mult.(λ).

Proof. We have

null(A − λI) = ∑_i null(M(f_i) − λI),

where null(M(f_i) − λI) = 0 if f_i(λ) ≠ 0, and null(M(f_i) − λI) = 1 if f_i(λ) = 0.
Fact. Let λ_1, . . . , λ_k ∈ F be distinct eigenvalues of A ∈ M_n(F). Then

E_A(λ_1) + ⋯ + E_A(λ_k) = E_A(λ_1) ⊕ ⋯ ⊕ E_A(λ_k).
Proof. We want to show that

E_A(λ_i) ∩ (E_A(λ_1) + ⋯ + E_A(λ_{i−1}) + E_A(λ_{i+1}) + ⋯ + E_A(λ_k)) = {0},  1 ≤ i ≤ k.

Without loss of generality, assume i = 1. Let x ∈ E_A(λ_1) ∩ (E_A(λ_2) + ⋯ + E_A(λ_k)). Then

x = a_2x_2 + ⋯ + a_kx_k,  x_i ∈ E_A(λ_i), a_i ∈ F.

So,

[∏_{i=2}^{k} (λ_1 − λ_i)] x = [∏_{i=2}^{k} (A − λ_iI)] x = [∏_{i=2}^{k} (A − λ_iI)](a_2x_2 + ⋯ + a_kx_k) = 0.

Hence, x = 0.
Diagonalizable matrices. A ∈ M_n(F) is called diagonalizable (or diagonable) if A is similar to a diagonal matrix.

Proposition 4.11. Let A ∈ M_n(F) and let λ_1, . . . , λ_k be all the eigenvalues of A in F. The following statements are equivalent.

(i) A is diagonalizable.
(ii) All elementary divisors of A are of degree 1.
(iii) F^n = E_A(λ_1) ⊕ ⋯ ⊕ E_A(λ_k).
(iv) ∑_{i=1}^{k} geo.mult.(λ_i) = n.
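In SymPy, condition (i) can be tested directly: diagonalize() returns P and D with A = PDP^{-1} and raises an error when the criterion fails. A minimal sketch on an arbitrary matrix of our choosing:

```python
import sympy as sp

A = sp.Matrix([[5, -2], [-2, 5]])    # eigenvalues 3 and 7, each of multiplicity 1
P, D = A.diagonalize()               # raises an error if A is not diagonalizable
assert P * D * P.inv() == A
assert D.is_diagonal()
```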
Simultaneous diagonalization.

Proposition 4.12. Let A_1, . . . , A_k ∈ M_n(F) be such that each A_i is diagonalizable and A_iA_j = A_jA_i for all 1 ≤ i, j ≤ k. Then ∃P ∈ GL(n, F) such that PA_iP^{-1} is diagonal for all 1 ≤ i ≤ k.

Proof. Use induction on k. Since A_1 is diagonalizable, we may assume

A_1 = a_1I_{n_1} ⊕ ⋯ ⊕ a_sI_{n_s},

where a_1, . . . , a_s ∈ F are distinct and n_1 + ⋯ + n_s = n. For each 2 ≤ i ≤ k, since A_i commutes with A_1, we must have

A_i = A_{i1} ⊕ ⋯ ⊕ A_{is},  A_{ij} ∈ M_{n_j}(F).

Since A_i is diagonalizable, each A_{ij} is diagonalizable. (Think of the elementary divisors.) Since A_2, . . . , A_k are pairwise commutative, for each 1 ≤ j ≤ s, A_{2j}, . . . , A_{kj} are pairwise commutative. By the induction hypothesis, ∃P_j ∈ GL(n_j, F) such that P_jA_{ij}P_j^{-1} is diagonal for all 2 ≤ i ≤ k. Let P = P_1 ⊕ ⋯ ⊕ P_s. Then PA_iP^{-1} is diagonal for all 1 ≤ i ≤ k.
The equation AX = XB. Let A ∈Mm(F ) and B ∈Mn(F ). We compute
dimX ∈Mm×n(F ) : AX = XB.
Lemma 4.13. Let A ∈ Mn(F ) such that cA(x) = mA(x). Then for any g ∈F [x], rank g(A) = n− deg(g, cA).
Proof. Let h = (g, cA). Then rank g(A) ≤ rankh(A). Write h = ag + bcAfor some a, b ∈ F [x]. Then h(A) = a(A)g(A). So, rank g(A) ≥ rankh(A). Hencerank g(A) = rankh(A).
4.3. RATIONAL CANONICAL FORMS 39
We may assume that A is a rational canonical form
A =
0 1
. . .
1∗ ∗ · · · ∗
.Then
Ai =
0 · · · 0 1...
.... . .
0 · · · 0 1 n−i
∗ · · · ∗ ∗ · · · ∗...
......
...∗ · · · ∗ ∗ · · · ∗
, 0 ≤ i ≤ n.
Hence, the (n − deg h) × (n − deg h) submatrix at the upper right corner of h(A)is invertible. So, rankh(A) ≥ n − deg h. Replace h with cA/h. We also haverank (cA/h)(A) ≥ deg h. On the other hand, since h(A)(cA/h)(A) = 0, we haverankh(A) + rank (cA/h)(A) ≤ n. Therefore, rankh(A) = n − deg h and rank(cA/h)(A) = deg h.
Lemma 4.14. Let f = xn + an−1xn−1 + · · · + a0 ∈ F [x], A ∈ Mm(F ) and
X ∈Mm×n(F ). Then AX = XM(f)T if and only if
X = [x,Ax, . . . , An−1x]
for some x ∈ kerc f(A).
Proof. Write X = [x1, . . . , xn] where x1, . . . , xn ∈ Fn. Then the equationAX = XM(f)T becomes
[Ax1, . . . , Axn] = [x1, . . . , xn]
0 −a0
1 −a1
. . ....
1 −an−1
= [x2, . . . , xn, −a0x1 − · · · − an−1xn],
i.e.,
(4.8)
Ax1 = x2,
...Axn−1 = xn,
Axn = −a0x1 − · · · − an−1xn.
Clearly, (5.11) is equivalent to xi = Ai−1x1, 1 ≤ i ≤ n and f(A)x1 = 0.
Proposition 4.15. Let A ∈Mm(F ) and B ∈Mn(F ) such that
A ∼M(f1)⊕ · · · ⊕M(fs), B ∼M(g1)⊕ · · · ⊕M(gt),
40 4. RATIONAL CANONICAL FORMS AND JORDAN CANONICAL FORMS
where fi, gj ∈ F [x] are monic. Then
dimX ∈Mm×n(F ) : AX = XB =∑i,j
deg(fi, gj).
Proof. We may assume that A = M(f1) ⊕ · · · ⊕M(fs) and B = M(g1)T ⊕· · · ⊕M(gt)T . Let αi = deg fi and βj = deg gj . Write
X =
X11 · · · X1t
......
Xs1 · · · Xst
, Xij ∈Mαi×βj (F ).
Then AX = XB ⇔M(fi)Xij = XijM(gj)T for all i, j.
By Lemmas 4.14 and 4.13,
dimXij ∈Mαi×βj (F ) : M(fi)Xij = XijM(gj)T = dim
(kerc gj(M(fi))
)= deg(gj , fi).
Hence the proposition.
Corollary 4.16. Let A ∈ Mm(F ) and B ∈ Mn(F ). Let the elementarydivisors of A be
pa111 , . . . , p
a1,k11 ; . . . ; pas1
s , . . . , pas,kss and powers of q1, . . . , qt,
and let the elementary divisors of B be
pb111 , . . . , p
b1,l11 ; . . . ; pbs1
s , . . . , pbs,lss and powers of r1, . . . , ru,
where p1, . . . , ps, q1, . . . , qt, r1, . . . , ru are distinct monic irreducibles in F [x] andaij , bij ∈ Z+. Then
dimX ∈Mm×n(F ) : AX = XB =s∑
i=1
ks∑j=1
ls∑j′=1
min(aij , bij′) deg pi.
Proof. Immediate from Proposition 4.15.
4.4. The Jordan Canonical Form
Jordan block. Let λ ∈ F and n > 0. The n×n Jordan block with eigenvalueλ is
Jn(λ) :=
λ 1 0 · · · 00 λ 1 · · · 0...
.... . . . . .
...0 0 · · · λ 10 0 · · · 0 λ
∈Mn(F ).
(x− λ)n is the only elementary divisor of Jn(λ).
Let A ∈ Mn(F ) such that cA(x) factors into a product of linear polynomials.(This is the case when F = C or any algebraically closed field.) Then all elementarydivisors of A are of the form (x− λ)e, λ ∈ F , e > 0.
4.4. THE JORDAN CANONICAL FORM 41
Theorem 4.17. Let A ∈Mn(F ) and assume that the elementary divisors of Aare (x− λ1)n1 , . . . , (x− λk)nk , λi ∈ F , ni > 0, n1 + · · ·+ nk = n. Then
(4.9) A ∼ Jn1(λ1)⊕ · · · ⊕ Jnk(λk).
The RHS of (4.9) is called the Jordan canonical form of A.
Proof. The two sides of (4.9) have the same elementary divisors.
The Hasse derivative. For f(x) = a0 + a1x+ · · ·+ anxn ∈ F [x] and k ≥ 0,
define
∂kf =(k
k
)ak +
(k + 1k
)ak+1x+ · · ·+
(n
k
)anx
n−k.
∂kf is called the kth order Hasse derivative of f . (If F is of characteristic 0, then∂kf = 1
k!f(k).)
Properties of the Hasse derivative. Let f, g ∈ F [x] and a, b ∈ F .(i) ∂k(af + bg) = a∂kf + b∂kg.(ii) ∂k(fg) =
∑i+j=k(∂if)(∂jg).
Lemma 4.18. Let f ∈ F [x], n > 0 and λ ∈ F . Then
(4.10) f(Jn(λ)
)=
f(λ) ∂1f(λ) · · · ∂n−1f(λ)
f(λ). . .
.... . . ∂1f(λ)
f(λ)
.
Proof. Only have to prove (4.10) with f(x) = xk, since both sides of (4.10)are linear in f . Let
Nn =
0 1 0 · · · 00 1 · · · 0
. . . . . ....
0 10
n×n
.
Then
N in =
0 · · · 0 1 0 · · · 00 1 · · · 0
. . . . . ....
0 1 n−i
0...0
, 0 ≤ i ≤ n,
and N in = 0 for i ≥ n. Thus
Jn(λ)k = (λI +Nn)k =k∑
i=0
(k
i
)λk−iN i
n =k∑
i=0
∂if(λ)N in =
n−1∑i=0
∂if(λ)N in.
42 4. RATIONAL CANONICAL FORMS AND JORDAN CANONICAL FORMS
Proposition 4.19. M2 Let A ∈Mn(F ) and λ an eigenvalue of A. Let τi bethe number of Ji(λ) in the Jordan canonical form of A. Then
τi = rank(A− λI)i−1 − 2 rank(A− λI)i + rank(A− λI)i+1, i ≥ 1.
Proof. May assume A = Jn1(λ)⊕· · ·⊕Jnk(λ)⊕B, where λ is not an eigenvalue
of B. Note that A− λI = Nn1 ⊕ · · · ⊕Nnk⊕ (B − λI), where B − λI is invertible.
Thus,
rank(A− λI)i−1 − rank(A− λI)i
=k∑
j=1
[rankN i−1
nj− rankN i
nj
]=
k∑j=1
[max0, nj − (i− 1) −max0, nj − i
]= |j : nj ≥ i|.
Hence,
τi = |j : nj = i|= |j : nj ≥ i| − |j : nj ≥ i+ 1|= rank(A− λI)i−1 − rank(A− λI)i −
[rank(A− λI)i − rank(A− λI)i+1
]= rank(A− λI)i−1 − 2 rank(A− λI)i + rank(A− λI)i+1.
Proposition 4.20 (The Jordan canonical form of a companion matrix). Letf = xk +ak−1x
k−1 + · · ·+a0 = (x−λ1)e1 · · · (x−λt)et ∈ F [x], where λ1, . . . , λt ∈ Fare distinct and e1, . . . , et ∈ Z+. Then
M(f) = P
Je1(λ1)
. . .
Jet(λt)
P−1,
where(4.11)
P =
(00
)λ0
1 · · ·(
0e1−1
)λ1−e1
1 · · ·(00
)λ0
t · · ·(
0et−1
)λ1−et
t(10
)λ1
1 · · ·(
1e1−1
)λ2−e1
1 · · ·(10
)λ1
t · · ·(
1et−1
)λ2−et
t
......
......(
k−10
)λk−1
1 · · ·(
k−1e1−1
)λk−e1
1 · · ·(k−10
)λk−1
t · · ·(
k−1et−1
)λk−et
t
.
(Note.(
ij
)= 0 if i, j ∈ Z and 0 ≤ i < j.)
Proof. First, we show that P is invertible. Assume [b0, . . . , bk−1]P = 0. Letg = b0 + · · · + bk−1x
k−1. Then ∂jg(λi) = 0 for 1 ≤ i ≤ t, 0 ≤ j ≤ ei − 1.Therefore,
∏ti=1(x − λi)ei | g. Since e1 + · · · + et = k, we must have g = 0, i.e.,
[b0, . . . , bk−1] = 0.
4.4. THE JORDAN CANONICAL FORM 43
We only have to show that M(f)P = P(Je1(λ1)⊕ · · · ⊕ Jet(λt)
). It suffices to
show that for each 1 ≤ i ≤ t,(4.12)
M(f)
(00
)λ0
i · · ·(
0ei−1
)λ1−ei
i(10
)λ1
i · · ·(
1ei−1
)λ2−ei
i...
...(k−10
)λk−1
i · · ·(
k−1ei−1
)λk−ei
1
=
(00
)λ0
i · · ·(
0ei−1
)λ1−ei
i(10
)λ1
i · · ·(
1ei−1
)λ2−ei
i...
...(k−10
)λk−1
i · · ·(
k−1ei−1
)λk−ei
1
Jei(λi).
First,
the 1st column of the LHS of (4.12)
=M(f)
(00
)λ0
i
...(k−10
)λk−1
i
=
(10
)λ1
i
...(k−10
)λk−1
i
−∑k−1
l=0 al
(l0
)λl
i
=
(10
)λ1
i
...(k−10
)λk−1
i(k0
)λk
i
(∵ f(λi) = 0)
=λi
(00
)λ0
i
...(k−10
)λk−1
i
= the 1st column of the RHS of (4.12).
For 1 ≤ j ≤ ei − 1, we have
the (j + 1)st column of the LHS of (4.12)
=M(f)
(0j
)λ
1−(j+1)i
...(k−1
j
)λ
k−(j+1)i
=
(1j
)λ
2−(j+1)i
...(k−1
j
)λ
k−(j+1)i
−∑k−1
l=0 al
(lj
)λl−j
i
=
(1j
)λ1−j
i
...(k−1
j
)λk−1−j
i(kj
)λk−j
i
(∵ (∂jf)(λi) = 0)
=
[(
0j
)+(
0j−1
)]λ1−j
i
...[(k−1
j
)+(k−1j−1
)]λk−j
i
= λi
(0j
)λ
1−(j+1)i
...(k−1
j
)λ
k−(j+1)i
+
(
0j−1
)λ1−j
i
...(k−1j−1
)λk−j
i
=the (j + 1)st column of the RHS of (4.12).
Homogeneous linear recurrence equations with constant coeffi-cients. We try to solve the kth order homogeneous linear recurrence equation
(4.13) xn+k + ak−1xn+k−1 + · · ·+ a0xn = 0, n ≥ 0,
44 4. RATIONAL CANONICAL FORMS AND JORDAN CANONICAL FORMS
where a0, . . . , ak−1 ∈ F . Equation 4.13 is equivalent to
xn+1
...xn+k
=
0 10 1
. . .
1−a0 −a1 −a2 · · · −ak−1
xn
...xn+k−1
= M(f)
xn
...xn+k−1
, n ≥ 0,
where f = xk+ak−1xk−1+· · ·+a0 ∈ F [x]. (f is called the characteristic polynomial
of equation (4.13).) Thus
(4.14)
xn
...xn+k−1
= M(f)n
x0
...xk−1
.Let f(x) = (x−λ1)e1 · · · (x−λt)et , where λ1, . . . , λt ∈ F are distinct and e1, . . . , et ∈Z+. By Proposition 4.20,
(4.15) M(f) = P
Je1(λ1)
. . .
Jet(λt)
P−1,
where P is given by (4.11). By (4.14) and (4.15),
xn = [1, 0, . . . , 0]M(f)n
x0
...xk−1
= [1, 0, . . . , 0]P
Je1(λ1)n
. . .
Jet(λt)n
P−1
x0
...xk−1
.[1, 0, . . . , 0]P is the first row of P , which has 1 at the 1st, (e1 +1)st, . . . , (e1 + · · ·+et−1 + 1)st components and has 0 elsewhere. By Lemma 4.18, the sum of the 1st,(e1 + 1)st, . . . , (e1 + · · ·+ et−1 + 1)st rows of Je1(λ1)n ⊕ · · · ⊕ Jet
(λt)n is[(n0)λ
n1 , . . . , ( n
e1−1)λn−e1+11 ; . . . ; (n
0)λnt , . . . , ( n
et−1)λn−et+11
].
Thus,
xn =[(n0)λ
n1 , . . . , ( n
e1−1)λn−e1+11 ; . . . ; (n
0)λnt , . . . , ( n
et−1)λn−et+11
]P−1
x0
...xk−1
.Homogeneous linear ODE with constant coefficients. Let A ∈
Mn(C) and consider the initial value problem
(4.16)
x′(t) = Ax(t)x(0) = x0,
4.4. THE JORDAN CANONICAL FORM 45
where x0 ∈ Cn and x(t) ∈ Cn is an unknown function of a real variable t. Bythe existence and uniqueness theorem in ODE, (4.16) has a unique solution x(t)defined for all t ∈ R. This solution can be explicitly determined as follows.
There exists P ∈ GL(n,C) such that
PAP−1 = Jn1(λ1)⊕ · · · ⊕ Jns(λs),
where λi ∈ C, n1 + · · · + ns = n. Let y(t) = Px(t) and y0 = Px0. Then (4.16)becomes
(4.17)
y′(t) =
(Jn1(λ1)⊕ · · · ⊕ Jns(λs)
)y(t)
y(0) = y0.
Assume for the time being that y(t) is analytic, i.e.,
y(t) =∞∑
k=0
1k!
y(k)(0)tk.
By (4.17),
y(k)(0) =(Jn1(λ1)⊕ · · · ⊕ Jns(λs)
)ky0
=(· · · ⊕ Jni(λi)k ⊕ · · ·
)y0
=[· · · ⊕
(ni−1∑j=0
(k
j
)λk−j
i N jni
)⊕ · · ·
]y0.
Therefore,
y(t) =[· · · ⊕
(ni−1∑j=0
[ ∞∑k=0
1k!
(k
j
)λk−j
i tk]N j
ni
)⊕ · · ·
]y0
=[· · · ⊕
(ni−1∑j=0
tjeλitN jni
)⊕ · · ·
]y0
=(· · · ⊕
eλit teλit · · · tni−1eλit
eλit. . .
.... . . teλit
eλit
⊕ · · ·)y0.
It is easy to see that y(t) given above is indeed a solution of (4.17). The solutionof (4.16) is x(t) = P−1y(t).
Locations of complex eigenvalues.
Gergorin disks. For a ∈ C and r ≥ 0, define D(a, r) = z ∈ C : |z− a| ≤ r.Let A = [aij ] ∈Mn(C). Then
D(r)i (A) := D
(aii,
∑j 6=i
|aij |)
is called the Gergorin row disk for the ith row of A;
D(c)j (A) := D
(ajj ,
∑i 6=j
|aij |)
46 4. RATIONAL CANONICAL FORMS AND JORDAN CANONICAL FORMS
is called the Gergorin column disk for the jth column of A. The Gergorin regionof A is defined to be
G(A) =( n⋃
i=1
D(r)i (A)
)∩( n⋃
j=1
D(c)j (A)
).
Theorem 4.21 (Gergorin). Let A ∈Mn(C). Then all the eigenvalues of A liein the Gergorin region of A.
Proof. Let A = [aij ] and let λ be an eigenvalue of A with an associatedeigenvector x = [x1, . . . , xn]T . Assume |xi| = max1≤j≤n |xj |. Since Ax = λx, wehave ai1x1 + · · ·+ ainxn = λxi. So,
|λ− aii||xi| = |(λi − aii)xi| =∣∣∣∑j 6=i
aijxj
∣∣∣ ≤ |xi|∑j 6=i
|aij |.
Hence |λ−aii| ≤∑
j 6=i |aij |. Thus λ ∈ D(r)i (A). Therefore, we have proved that λ ∈⋃n
i=1D(r)i (A). In the same (or by looking at AT ), we have λ ∈
⋃nj=1D
(c)j (A).
Corollary 4.22. Let A = [aij ] ∈Mn(C) such that either
(4.18) |aii| >∑j 6=i
|aij | for all 1 ≤ i ≤ n,
or
(4.19) |ajj | >∑i 6=j
|aij | for all 1 ≤ j ≤ n.
(A matrix satisfying (4.18) or (4.19) is called diagonally dominant.) Then A isinvertible.
Proof. We have 0 /∈ G(A).
Proposition 4.23. Let A = [aij ] ∈ Mn(C). Let X be a connected componentof G(A). Then the number of eigenvalues of A (counted with algebraic multiplicity)contained in X is |i : aii ∈ X|.
Proof. Let C be a contour (or a unioun of contours when X is not simplyconneted) such that C encloses X and C ∩G(A) = ∅. For t ∈ [0, 1], let
At =
a11 ta12 · · · ta1n
ta21 a22 · · · ta2n
......
. . ....
tan1 tan2 · · · ann
.Note that G(At) ⊂ G(A); hence C ∩ G(At) = ∅. The number of zeros of cAt
(z)(counted with multiplicity) in X is given by
N(t) :=1
2πi
∫C
c′At(z)
cAt(z)dz.
EXERCISES 47
N(t) is a continuous function of t ∈ [0, 1] and takes only integer values. Thus, N(T )is a constant for t ∈ [0, 1]. So,
the number of zeros of cA in X
=N(1) = N(0)= the number of zeros of cA0 in X
= |i : aii ∈ X|.
Exercises
4.1. Use the rational canonical form to give another proof for Exercise 3.2 (ii).
4.2. Let A = Mm×n(F ) and B ∈Mn×m(F ). Prove that
xn det(xIm −AB) = xm det(xIn −BA).
(In particular, if m = n, then cAB(x) = cBA(x).)
4.3. (Trace) For A = [aij ] ∈Mn(F ), define Tr(A) = a11 + a22 + · · ·+ ann. Provethe following statements.(i) Tr(AB) = Tr(BA) for A,B ∈Mn(F ).(ii) If A ∼ B, then Tr(A) = Tr(B).(iii) Let A ∈ Mn(F ). Then Tr(A) = 0 ⇔ A = XY − Y X for some X,Y ∈
Mn(F ).
4.4. Let A ∈ Mn(F ) have invariant factors d1, d2, . . . , dr, (d1 | d2 | · · · | dr).Define the centralizer of A in Mn(F ) to be
centMn(F )(A) = X ∈Mn(F ) : XA = AX.Prove that
dim[centMn(F )(A)
]=
r−1∑i=0
(2i+ 1) deg dr−i.
4.5. M2 For A ∈ Mn(F ), let 〈A〉 = f(A) : f ∈ F [x]. Obviously, 〈A〉 ⊂centMn(F )(A). Prove that centMn(F )(A) = 〈A〉 ⇔ cA(x) = mA(x). (Amatrix A ∈Mn(F ) with cA(x) = mA(x) is called nonderogatory.)
4.6. Let A,B ∈Mn(C) such that AB = BA. Let λ be an eigenvalue of A. Provethat the eigenspace EA(λ) is B-invariant, i.e., BEA(λ) ⊂ EA(λ). Use this toshow that A,B has a common eigenvector.
4.7. Let xn ∈ C satisfyx0 = a, x1 = b, x2 = c, x3 = d,
xn = 6xn−1 − 11xn−2 + 12xn−3 − 18xn−4, n ≥ 4.
Find an explicit formula for xn.
4.8. Find the rational canonical form of
A =
−9 −2 −9 −824 8 27 −24−4 −2 −4 5−7 −2 −6 7
∈M4(Q).
48 4. RATIONAL CANONICAL FORMS AND JORDAN CANONICAL FORMS
4.9. Let
A =
1 1 1 1 10 1 0 −1 −10 0 1 1 00 0 0 1 10 0 0 0 1
∈M5(C).
Use Proosition 4.19 to determine the Jordan canonical form of A.
4.10. Find all rational canonical forms (in terms of elementary divisors) of M4(Z2).The irreducibles of degree ≤ 4 in Z2[x] are x, x+ 1, x2 + x+ 1, x3 + x+ 1,x3 + x2 + 1, x4 + x+ 1, x4 + x3 + 1, x4 + x3 + x2 + x+ 1.
4.11. Let A ∈Mm(F ) and B,C ∈Mn(F ).(i) If A⊕B ∼ A⊕ C, then B ∼ C.(ii) If B ⊕B ∼ C ⊕ C, then B ∼ C.
CHAPTER 5
Inner Product Spaces and Unitary Spaces
5.1. Inner Product Spaces
Definition 5.1. An inner product space is a vector space V over R equippedwith a map (called the inner product) 〈·, ·〉 : V × V → R satisfying the followingconditions.
(i) 〈u, v〉 = 〈v, u〉 ∀u, v ∈ V .(ii) 〈au+ bv, w〉 = a〈u,w〉+ b〈v, w〉 ∀u, v, w ∈ V, a, b ∈ R.(iii) 〈u, u〉 ≥ 0 for all u ∈ V and 〈u, u〉 = 0⇔ u = 0.
Examples.
• V = Rn. For x = (x1, . . . , xn), y = (y1, . . . , yn) ∈ Rn, define
〈x, y〉 = x1y1 + · · ·+ xnyn.
• V = R[x]. For f(x), g(x) ∈ R[x], define
〈f, g〉 =∫ 1
−1
f(x)g(x)dx.
• `2 =(an)∞n=0 : an ∈ R,
∑∞n=0 a
2n <∞
. For (an), (bn) ∈ `2, define
〈(an), (bn)〉 =∞∑
n=0
anbn.
• L2(X). Let (X,B, µ) be a measure space. Two functions f, g : X → R ∪±∞ are considered the same if f = g almost everywhere. L2(X) = theset of all measurable functions f : X → R∪ ±∞ such that
∫X|f |2dµ <
∞. For f, g ∈ L2(X), define
〈f, g〉 =∫
X
fgdµ.
(`2 is a special case of L2(X).)
Norm and distance. Let V be an inner product space and let u, v ∈ V .||u|| :=
√〈u, u〉 is called the norm of u. ||u − v|| is called the distance between u
and v.
Inequalities and equalities. Let V be an inner product space.(i) (Cauchy-Schwartz) For all u, v ∈ V ,
|〈u, v〉| ≤ ||u|| ||v||.
The equality holds iff one of u, v is a scalar multiple of the other.
49
50 5. INNER PRODUCT SPACES AND UNITARY SPACES
(ii) (The triangle inequality) For all u, v ∈ V ,
||u+ v|| ≤ ||u||+ ||v||.
The equality holds iff one of u, v is a nonnegative multiple of the other.(iii) (Inner product in terms of norm)
(5.1) 〈u, v〉 =14(||u+ v||2 − ||u− v||2), u, v ∈ V.
(iv) (The parallelogram law)
||u+ v||2 + ||u− v||2 = 2||u||2 + 2||v||2, u, v ∈ V.
Proof. (i) Without loss of generality, assume v 6= 0. Let r = 〈u,v〉〈v,v〉 . Then
0 ≤ ||u− rv||2 = 〈u− rv, u− rv〉 = 〈u, u〉 − 2r〈u, v〉+ r2〈v, v〉 = ||u||2 − 〈u, v〉2
||v||2.
Hence, 〈u, v〉2 ≤ ||u||2||v||2, i.e., |〈u, v〉| ≤ ||u|| ||v||. The equality holds ⇔ u− rv =0⇔ u = 〈u,v〉
〈v,v〉v ⇔ u is a multiple of v.(ii) We have
||u+ v||2 = ||u||2 + ||v||2 + 2〈u, v〉 ≤ ||u||2 + ||v||2 + 2||u|| ||v|| = (||u||+ ||v||)2.
Isometry. Let V and W be two inner product spaces. A vector space isomor-phism f : V →W is called an isometry if
〈f(u), f(v)〉 = 〈u, v〉 for all u, v ∈ V.
Fact. Let V and W be two inner product spaces and let f ∈ HomR(V,W ).Then f preserves the inner products (i.e., 〈f(u), f(v)〉 = 〈u, v〉 ∀u, v ∈ V ) ⇔ fpreserves the norms (i.e., ||f(u)|| = ||u|| ∀u ∈ V ).
Proof. (⇐) By (5.1), the inner product is expressible in terms of the norm.
Orthogonality. Let V be an inner product space. Two elements u, v ∈ Vare called orthogonal, denoted as x⊥y, if 〈x, y〉 = 0. For X ⊂ V , define X⊥ = y ∈V : 〈y, x〉 = 0 ∀x ∈ X. X⊥ is a subspace of V .
Pythagorean theorem. Let V be an inner product space and let u, v ∈ V .Then u⊥v ⇔ ||u+ v||2 = ||u||2 + ||v||2.
Proposition 5.2. Let V be an inner product space and let S, T be subspacesof V .
(i) S ⊂ T ⇒ S⊥ ⊃ T⊥.(ii) S ∩ S⊥ = 0, S + S⊥ = S ⊕ S⊥. If dimS <∞, V = S ⊕ S⊥.(iii) S ⊂ S⊥⊥. If dimS <∞, S = S⊥⊥.(iv) If S ⊂ T , then
φ : S⊥/T⊥ −→ (T/S)∗
a+ T⊥ 7−→ 〈·, a〉
is an embedding. If dimV <∞, φ is an isomorphism.
5.1. INNER PRODUCT SPACES 51
Proof. (ii) We show that if dimS <∞, then V = S ⊕ S⊥.Method 1. The map ψ : V/S⊥ → S∗, a+ S⊥ 7→ 〈·, a〉 is an embedding. Hence
dimV/S⊥ ≤ dimS∗ = dimS = dim(S ⊕ S⊥)/S⊥. So, V/S⊥ = (S ⊕ S⊥)/S⊥, i.e.,V = S ⊕ S⊥.
Method 2. By the G-S orthonormalization (p. 52), S has an orthonormal basisu1, . . . , uk. For each x ∈ V , let x′ =
∑ki=1
〈x,ui〉〈ui,ui〉ui. Then x = x′ + (x− x′) where
x′ ∈ S and x− x′ ∈ S⊥.(iii) We show that if dimS <∞, then S⊥⊥ ⊂ S. ∀x ∈ S⊥⊥, write x = x1 +x2,
where x1 ∈ S and x2 ∈ S⊥. Since 0 = 〈x, x2〉 = 〈x2, x2〉, x2 = 0. So, x = x1 ∈ S.(iv) When dimV < ∞, by (ii), dim(S⊥/T⊥) = dimT − dimS = dim(T/S) =
dim(T/S)∗. So, φ is an isomorphism.
Note. In general, we do not have V = S ⊕ S⊥ and S = S⊥⊥. Example:Let S = (an) ∈ `2 : an = 0 for n large enough ⊂ `2. Then S⊥ = 0 andS⊥⊥ = `2 6= S.
Orthogonal and orthonormal sets. Let V be an inner product space. Asubset X ⊂ V is called orthogonal if 〈x, y〉 = 0 for all x, y ∈ X with x 6= y. X iscalled orthonormal if for x, y ∈ X,
〈x, y〉 =
1 if x = y,
0 if x 6= y.
An orthogonal set of nonzero vectors is linearly independent.
Hilbert bases. A maximal orthonormal set of V is called a Hilbert basis ofV . By Zorn’s lemma, V has a Hilbert basis. A Hilbert basis is not necessarily a
basis. Example: Let ei = (0, . . . , 0,i1, 0 . . . ) ∈ `2. Then ei : i ≥ 1 is a Hilbert
basis of `2 but not a basis of `2. Another example: Let V = R⊕R⊕ · · · with innerproduct 〈(x1, x2, . . . ), (y1, y2, . . . )〉 =
∑∞i=1 xiyi. Then ei : i ≥ 1 is a Hilbert
basis of V which is also a basis of V . Let ui, i ≥ 1, be the orthonormalizationp. 52) of ei − ei+1, i ≥ 1. Then ui : i ≥ 1 is a Hilbert basis of V . (If x⊥ui forall i, then x = (a, a, . . . ); hence x = 0.) But ui : i ≥ 1 is not a basis of V sincespanui : i ≥ 1 =
(x1, x2, . . . ) ∈ V :
∑∞i=1 xi = 0
6= V . If dimV <∞, a Hilbert
basis is a basis.
Projections. Assume that S is a subspace of V such that V = S⊕S⊥. Eachx ∈ V can be uniquely written as x = x1 + x2, where x1 ∈ S and x2 ∈ S⊥. x1 iscalled the (orthogonal) projection of x onto S and is denoted by projS(x).
If dimS <∞ and u1, . . . , uk is an orthonormal basis of S, then
projS(x) =k∑
i=1
〈x, ui〉ui.
Since ||x||2 = ||projS(x)||2 + ||x− projS(x)||2 ≥ ||projS(x)||2, we have
(5.2) ||x||2 ≥ |〈x, u1〉|2 + · · ·+ |〈x, uk〉|2, x ∈ V.The equality in (5.2) holds iff x ∈ span(u1, . . . , uk). (5.2) is called Bessel’s inequal-ity.
Proposition 5.3. Any two Hilbert bases of an inner product space V have thesame cardinality. This cardinality is called the Hilbert dimension of V .
52 5. INNER PRODUCT SPACES AND UNITARY SPACES
Proof. Only have to consider the case where dimV = ∞. Let X and Y betwo Hilbert bases of V . Clearly, |X| = ∞ and |Y | = ∞. For each x ∈ X, letf(x) = y ∈ Y : 〈y, x〉 6= 0 ⊂ Y .
1 We claim that Y =⋃
x∈X f(x). If ∃y ∈ Y \⋃
x∈X f(x), then y⊥x for allx ∈ X. Then X ∪ y
||y|| is an orthonormal set properly containing X, →←.2 We claim that |f(x)| ≤ ℵ0 for all x ∈ X. In fact, f(x) =
⋃∞n=1y ∈ Y :
|〈y, x〉| ≥ 1n. By Bessel’s inequality,∣∣∣y ∈ Y : |〈y, x〉| ≥ 1
n
∣∣∣ · ( 1n
)2
≤ ||x||2;
hence, |y ∈ Y : |〈y, x〉| ≥ 1n| ≤ n
2||x||2.3 |Y | =
∣∣⋃x∈X f(x)
∣∣ ≤ |X|ℵ0 = |X|. By symmetry, |X| ≤ |Y |.
Gram-Schmidt orthonormalization. Let V be an inner product spaceand let v1, v2, · · · ∈ V (finitely or countably many) be linearly independent. Thenthere is a unique orthonormal sequence u1, u2, · · · ∈ V such that for all k ≥ 1,
(i) span(u1, . . . , uk) = span(v1, . . . , vk);(ii) 〈vk, uk〉 > 0.
The sequence uk, called the Gram-Schmidt orthonormalization of vk, is inductivelygiven by
(5.3) uk =1||u′k||
u′k, where u′k = vk −k−1∑i=1
〈vk, ui〉ui.
Proof of the uniqueness of uk. Let wk be another orthonormal sequencesatisfying (i) and (ii). Then wk = a1u1+· · ·+akuk. Since wk⊥span(w1, . . . , wk−1) =span(u1, . . . , uk−1), we have a1 = · · · = ak−1 = 0; hence wk = akuk. Since ||wk|| =||uk|| = 1, we have ak = ±1. Since 〈vk, uk〉 > 0 and 〈vk, wk〉 > 0, we haveak = 1.
Theorem 5.4 (Explicit formula for the G-S orthonormalization). In the abovenotation, define
Dn =
∣∣∣∣∣∣∣∣〈v1, v1〉 · · · 〈v1, vn〉
......
〈vn, v1〉 · · · 〈vn, vn〉
∣∣∣∣∣∣∣∣ , n ≥ 1,
and D0 = 1. Then Dn > 0 for all n ≥ 0 and
(5.4) un =1√
Dn−1Dn
∣∣∣∣∣∣∣∣∣∣〈v1, v1〉 · · · 〈v1, vn〉
......
〈vn−1, v1〉 · · · 〈vn−1, vn〉v1 · · · vn
∣∣∣∣∣∣∣∣∣∣, n ≥ 1.
Proof. It follows from Fact 5.5 that Dn > 0 for all n ≥ 0. Let un be given by(5.4). Then
(5.5) un =√Dn−1
Dnvn + an,n−1vn−1 + · · ·+ an1v1.
5.1. INNER PRODUCT SPACES 53
It remains to show that u1, u2, . . . is orthonormal. Let 1 ≤ i ≤ n. We have
〈vi, un〉 =1√
Dn−1Dn
∣∣∣∣∣∣∣∣∣∣〈v1, v1〉 · · · 〈v1, vn〉
......
〈vn−1, v1〉 · · · 〈vn−1, vn〉〈vi, v1〉 · · · 〈vi, vn〉
∣∣∣∣∣∣∣∣∣∣= 0.
So, un⊥span(v1, . . . , vn−1) = span(u1, . . . , un−1). By (5.5) and (5.4),
〈un, un〉 =⟨√Dn−1
Dnvn, un
⟩
=√Dn−1
Dn
1√Dn−1Dn
∣∣∣∣∣∣∣∣∣∣〈v1, v1〉 · · · 〈v1, vn〉
......
〈vn−1, v1〉 · · · 〈vn−1, vn〉〈vn, v1〉 · · · 〈vn, vn〉
∣∣∣∣∣∣∣∣∣∣= 1.
Fact. Every inner product space V with dimV ≤ ℵ0 has an orthonormal basis.Any two inner product spaces V and W with dimV = dimW ≤ ℵ0 are isometric.
Note.
(i) If V is an inner product space with dimV = ℵ0, then its completion isisometric to `2.
(ii) Let V be an inner product space with dimV ≥ ℵ and let V be a completionof V . Then dim V = dimV .
(iii) Let V be a non complete inner product space such that dimV ≥ ℵ. Thendim V = dimV , but V and V are not isometric.
Proof. (i) May assume V = Rℵ0 and 〈(xn), (yn)〉 =∑∞
i=0 xnyn. The comple-tion of V is `2.
(ii) Let X be a basis of V and Y a basis of V . For each y ∈ Y , ∃ a sequenceyn ∈ V such that limn→∞ yn = y. Each yn is a linear combination of finitelymany elements in X. Hence, ∃ a countable subset x0, x1, . . . ⊂ X such that yn ∈spanx0, x1, . . . for all n. So, y ∈ spanx0, x1, . . . , the closure of spanx0, x1, . . . in V . spanx0, x1, . . . is a completion of spanx0, x1, . . . . By 1, spanx0, x1, . . . is isometric to `2. Define f(y) = (x0, x1, . . . ). Then for each (xn) ∈ XN,∣∣f−1
((xn)
)∣∣ ≤ ∣∣spanx0, x1, . . . ∣∣ = |`2| ≤ |RN| = ℵ.
Therefore,
|Y | =∣∣∣ ⋃(xn)∈XN
f−1((xn)
)∣∣∣ ≤ |XN|ℵ = |X|ℵ = |X|.
Example (Legendre polynomials). For f, g ∈ R[x], define
〈f, g〉 =∫ 1
−1
f(x)g(x)dx.
54 5. INNER PRODUCT SPACES AND UNITARY SPACES
Let f0, f1, f2, . . . be the G-S orthonormalization of 1, x, x2, . . . . f0, f1, f2, . . . arecalled the Legendre polynomials. Computation of fn using (5.3) or Theorem 5.4 iscomplicated. The following method is more effective.
Let
gn(x) =dn
dxn(x2 − 1)n =
bn/2c∑k=0
(−1)k
(n
k
)(2n− 2k)nx
n−2k,
where (a)b = a(a− 1) · · · (a− b+ 1) for b ∈ N. Let pn(x) = (x2 − 1)n. Integratingby parts repeatedly, we have
〈gm, gn〉 =∫ 1
−1
p(m)m (x) p(n)
n (x)dx =
0 if m 6= n,
(−1)n(2n)!∫ 1
−1(x2 − 1)ndx if m = n.
Note that
∫ 1
−1
(x2 − 1)ndx =∫ 1
−1
(x− 1)n(x+ 1)ndx
=1
n+ 1
∫ 1
−1
(x− 1)nd(x+ 1)n+1
= − 1n+ 1
∫ 1
−1
(x+ 1)n+1d(x− 1)n
= − n
n+ 1
∫ 1
−1
(x− 1)n−1(x+ 1)n+1dx
= · · ·
= (−1)n n!(2n)n
∫ 1
−1
(x+ 1)2ndx
= (−1)n n!(2n)n
· 22n+1
2n+ 1.
Hence,
〈gn, gn〉 =(n!)222n+1
2n+ 1.
So,
fn(x) =1||gn||
g(x) =
√n+ 1
2
n!2n
dn
dxn(x2 − 1)n.
5.2. FINITE DIMENSIONAL INNER PRODUCT SPACES 55
A “space walk”.
..................................................................................................................................................................... ...........................................................................................................................................................................................
..................................................................................................................................................................... ...........................................................................................................................................................................................
...................................................................................................
...................................................................................................
...................................................................................................
...................................................................................................
..................................................................................................................................................................... ...........
..................................................................................................................................................................... ...........
...................................................................................................
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................
....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................. ..........................................................................................................................................................................
................
................
................
....................................................................................................................................................................................................................................................
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
................
................
................
....................................................................................................................................................................................................................................................
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................
completeness
completeness
Hilbert spaceinner product
space
Banach space normedvector space
metric space
topologicalvector space
topologicalspace
vector space
5.2. Finite Dimensional Inner Product Spaces
The Gram matrix. Let V be an n-dimensional inner product space and letε1, . . . , εn be a basis of V . The Gram matrix of ε1, . . . , εn, denoted G(ε1, . . . , εn), isthe n× n matrix [〈εi, εj〉]. If u = x1ε1 + · · ·+ xnεn and v = y1ε1 + · · ·+ ynεn, then
〈u, v〉 = (x1, . . . , xn)[〈εi, εj〉]
y1...yn
.The Gram matrix [〈εi, εj〉] is symmetric and has the property that xT [〈εi, εj〉]x > 0for all 0 6= x ∈ Rn. (Unless specified otherwise, vectors in Rn are columns.) Ann × n symmetric matrix A over R is called positive definite if xTAx > 0 for all0 6= x ∈ Rn. Let A be an n× n positive definite matrix and define
〈x, y〉A = xTAy, x, y ∈ Rn.
Then (Rn, 〈·, ·〉A) is an inner product space. The map V → Rn, x1ε1 + · · ·+xnεn 7→(x1, . . . , xn)T is an isometry from (V, 〈·, ·〉) to (Rn, 〈·, ·〉G(ε1,...,εn)).
Fact 5.5. A is an n × n positive definite matrix ⇔ A = PTP for some P ∈GL(n,R).
Proof. (⇒) (Rn, 〈·, ·〉A) is isometric to (Rn, 〈·, ·〉I). Let T : Rn → Rn, x 7→Px, be the isometry, and let e1, . . . , en be the standard basis of Rn. Then
A = [〈ei, ej〉A] = [〈Pei, P ej〉I ] =[eTi P
TPej
]= PTP.
56 5. INNER PRODUCT SPACES AND UNITARY SPACES
Orthogonal transformations and orthogonal matrices. An isometryof an n-dimensional inner product space V is also called an orthogonal transforma-tion of V . A matrix A ∈Mn(R) is called orthogonal if ATA = I. Let u1, . . . , un bean orthonormal basis of V and T ∈ End(V ) such that
T (u1, . . . , un) = (u1, . . . , un)A.
Then T is orthogonal ⇔ A is orthogonal.
Examples of orthogonal matrices. Permutation matrices;[−1
1
],[
cos θ − sin θsin θ cos θ
];
block sums of orthogonal matrices.
Easy facts about orthogonal matrices. Let O(n) be the set of all n×northogonal matrices. Let A,B ∈ O(n).
(i) AB, A−1, AT ∈ O(n).(ii) detA = ±1.(iii) All complex eigenvalues of A have norm 1.
QR factorization. Let A ∈Mm×n(R) such that rankA = n. Then A = QR,where Q ∈Mm×n(R) has orthonormal columns and R ∈Mn(R) is upper triangularwith positive diagonal entries. The matricesQ and R, with the described properties,are unique.
Proof. Let A = [a1, . . . , an]. Let u1, . . . , un be the G-S orthonormalization ofa1, . . . , an. Then A = [u1, . . . , un]R.
Proposition 5.6. Let A ∈ O(n).
(i) If detA = 1, A is a product of orthogonal matrices of the form
(5.6)
1. . .
1cos θ − sin θ i
1. . .
1sin θ cos θ j
1. . .
1
.
(The matrix in (5.6) is called a rotation matrix.)(ii) If detA = −1, A is a product of [−1] ⊕ In−1 and matrices of the form
(5.6)
Proof. (i) Denote the matrix in (5.6) by R(i, j, θ). Clearly, R(i, j, θ)−1 =R(i, j,−θ).
5.2. FINITE DIMENSIONAL INNER PRODUCT SPACES 57
Use induction on n. The case n = 1 is obvious. Assume n > 1 and let A = [aij ].Choose θ such that a11 sin θ + a21 cos θ = 0. Then
R(1, 2, θ)A =
a′11 ∗ · · · ∗0 ∗ · · · ∗∗... ∗∗
.
In this way, we see that ∃ rotation matrices R2, . . . , Rn such that
Rn · · ·R2A =
b11 b12 · · · b1n
0... ∗0
.Since Rn · · ·R2A is orthogonal, b11 = ±1. We may assume b11 = 1 (Otherwise,look at R(1, 2, π)Rn · · ·R2A.) Since (b11, b12, . . . , b1n) has norm 1, we have b12 =· · · = b1n = 0. So,
Rn · · ·R2A =
[1 00 A1
],
where A1 ∈ O(n − 1). By the induction hypothesis, A1 = S1 · · ·Sm, whereS1, . . . , Sm are rotation matrices in O(n− 1). Thus,
A = R−12 · · ·R−1
n
[1
S1
]· · ·
[1
Sm
],
where all factors are rotation matrices in O(n).(ii) Apply (i) to
([−1]⊕ In−1
)A.
The projection matrix. Let Rn be the inner product space with the stan-dard inner product 〈·, ·〉I . Let S be a subspace of Rn with a basis a1, . . . , am. LetA = [a1, . . . , am] ∈Mn×m(R). Then
projS(x) = Qx, x ∈ Rn,
whereQ = A(ATA)−1AT .
Q is called the projection matrix of S. If a1, . . . , am is an orthonormal basis of S,then Q = AAT .
Proof. 1 ∀x, y ∈ Rn, since Qx ∈ S and y −Qy ∈ S⊥, we have
0 = 〈Qx, y −Qy〉 = xTQT (I −Q)y = xT (QT −QTQ)y.
Thus, QT = QTQ. It follows that Q = QT and Q = Q2.2 We have
Q = projS(e1, . . . , en) = [a1, . . . , am]B = AB
for some B ∈Mm×n(R) with rankB = m. By 1,
BTATAB = QTQ = QT = BTAT .
58 5. INNER PRODUCT SPACES AND UNITARY SPACES
Thus, ATAB = AT . Since ATA is invertible (Exercise 5.1), B = (ATA)−1AT .Hence Q = AB = A(ATA)−1AT .
The adjoint map. Let V and W be finite dimensional inner product spacesand let f ∈ HomR(V,W ). For each w ∈ W , 〈f(·), w〉 ∈ V ∗. By Proposition 5.2(iv), ∃ a unique element of V , depending on f and w, denoted by f?(w), such that〈f(·), w〉 = 〈·, f?(w)〉. It is easy to check that f? ∈ HomR(W,V ). f? is called theadjoint of f . Moreover, ( )? : W → V is an R-map.
Let f∗ : W ∗ → V ∗ be the R-map defined in Proposition 3.16. Also let φV :V → V ∗ be defined by φV (v) = 〈·, v〉. Then the following diagram commutes.
Wf?
−→ V
∼=
yφW∼=
yφV
W ∗ f∗−→ V ∗
Let v1, . . . , vm be a basis of V and w1, . . . , wn a basis of W and write
f(v1, . . . , vm) = (w1, . . . , wn)A, A ∈Mn×m(R),
f?(w1, . . . , wn) = (v1, . . . , vm)B, B ∈Mm×n(R).
Namely, A (B) is the matrix of f (f?) relative to the bases v1, . . . , vm and w1, . . . , wn
(w1, . . . , wn and v1, . . . , vm). Then
AT
w1
...wn
[w1, . . . , wn] =
f(v1)
...f(vm)
[w1, . . . , wn] =
v1...vm
[f?(w1), . . . , f?(wn)]
=
v1...vm
[v1, . . . , vm]B,
i.e.,ATG(w1, . . . , wn) = G(v1, . . . , vm)B.
If v1, . . . , vm and w1, . . . , wn are orthonormal, then
AT = B.
Self-adjoint maps. An R-map f : V → V is called self-adjoint if f? = f . LetRn be the inner product space with the standard inner product and let f : Rn → Rn
be defined by f(x) = Ax, where A ∈Mn(R). Then f is self-adjoint ⇔ A = AT .
Orthogonal similarity. Two matrices A,B ∈ Mn(R) are called orthogo-nally similar if ∃P ∈ O(n) such that A = PBPT . Let V be an n-dimensional innerproduct space. Two matrices in Mn(R) are orthogonally similar iff they are thematrices of some T ∈ End(V ) relative to two suitable orthonormal bases of V .
Normal matrices. A ∈ Mn(R) is called normal if AAT = ATA. Examples:symmetric, skew symmetric and orthogonal matrices.
Theorem 5.7 (Canonical forms of normal matrices under orthogonal simi-larity). Let A ∈ Mn(R) be normal. Let the eigenvalues of A be a1, . . . , as, b1 ±
5.2. FINITE DIMENSIONAL INNER PRODUCT SPACES 59
c1i, . . . , bt ± cti, where ak, bl, cl ∈ R, cl 6= 0, and s+ 2t = n. Then ∃P ∈ O(n) suchthat
PTAP =
a1
. . .
as
b1 c1
−c1 b1. . .
bt ct
−ct bt
.
Proof. Use induction on n.Case 1. A has a real eigenvalue a. Let x1 ∈ Rn such that ||x1|| = 1 and
Ax1 = ax1. Extend x1 to an orthonormal basis x1, x2, . . . , xn of Rn. For k ≥ 2, byLemma 5.16, 〈Axk, x1〉 = 〈xk, A
Tx1〉 = 〈xk, ax1〉 = 0. So,
A[x1, . . . , xn] = [x1, . . . , xn]
[a 00 A1
],
where A1 ∈Mn−1(R) is normal. Use the induction hypothesis on A1.Case 2. A has an eigenvalue λ = b + ci, c 6= 0. Let 0 6= z ∈ Cn such that
Az = λz. By Lemma 5.16,
λzT z = zTAz = (zTAz)T = zTAT z = λzT z.
Hence zT z = 0. Write z = u + iv, u, v ∈ Rn. Then Az = λz implies thatA[u, v] = [u, v]
[b c−c b
]; zT z = 0 implies that ||u|| = ||v|| and 〈u, v〉 = 0. We may
assume ||u|| = ||v|| = 1. Extend u, v to an orthonormal basis u, v, x3, . . . , xn of Rn.Then for k ≥ 3, (Axk)T z = xT
kAT z = xT
k λz = 0. So, 〈Axk, u〉 = 〈Axk, v〉 = 0.Therefore,
A[u, v, x3, . . . , xu] = [u, v, x3, . . . , xu]
b c
−c b
A1
,where A1 ∈Mn−2(R) is normal. Use the induction hypothesis on A1.
Corollary 5.8. Let A ∈Mn(R).
(i) A = AT ⇔ A is orthogonally similar to a diagonal matrix. In particular,all eigenvalues of a symmetric matrix in Mn(R) are real.
(ii) A = −AT ⇔ A is orthogonally similar to[0 c1
−c1 0
]⊕ · · · ⊕
[0 ct
−ct 0
]⊕ 0
for some c1, . . . , ct ∈ R×. In particular, all eigenvalues of a skew symmet-ric matrix in Mn(R) are purely imaginary.
(iii) A is orthogonal ⇔ A is normal and all eigenvalues of A are of complexnorm 1.
60 5. INNER PRODUCT SPACES AND UNITARY SPACES
Positive definite and semi positive definite matrices. Let A ∈Mn(R)be symmetric. Recall that A is called positive definite if xAxT > 0 for all 0 6= x ∈Rn. A is called semi positive definite if xAxT ≥ 0 for all x ∈ Rn.
Proposition 5.9. Let A ∈ Mn(R) be symmetric. The following statementsare equivalent.
(i) A is positive definite.(ii) All eigenvalues of A are positive.(iii) A = BBT for some B ∈ GL(n,R).(iv) A = BBT for some Mn×m(R) with rankB = n.(v) detA(I, I) > 0 for every I ⊂ 1, . . . , n.(vi) detA(1, . . . , k, 1, . . . , k) > 0 for every 1 ≤ k ≤ n. (detA(1, . . . , k,
1, . . . , k) is called a leading principal minor of A.)
Proof. The equivalence of (i) – (iv) is easy.(i) ⇒ (v). We claim that A(I, I) is positive definite. To see this, we may
assume I = 1, . . . , k. For each (row vector) 0 6= x ∈ Rk, 0 6= (x, 0) ∈ Rn.So, xA(I, I)xT = (x, 0)A(x, 0)T > 0. Thus A(I, I) is positive definite. By (ii),detA(I, I) > 0.
(v) ⇒ (vi). Obvious.(vi) ⇒ (i). Use induction on n. The case n = 1 is obvious. Assume n > 1. Let
I = 1, . . . , n − 1. Since detA(1, . . . , k, 1, . . . , k) > 0 for all 1 ≤ k ≤ n − 1,by the induction hypothesis, A(I, I) is positive definite. In particular, A(I, I) isinvertible. Hence A is congruent to[
A(I, I) 00 λ
]for some λ ∈ R. Since λ = det A
det A(I,I) > 0, A(I, I) ⊕ [λ] is positive definite. Hencethe conclusion.
Proposition 5.10. Let A ∈ Mn(R) be symmetric. The following statementsare equivalent.
(i) A is semi positive definite.(ii) All eigenvalues of A are ≥ 0.(iii) A = BBT for some B ∈Mn×r(R) with rankB = r.(iv) A = BBT for some B ∈Mn×m(R).(v) detA(I, I) ≥ 0 for all I ⊂ 1, . . . , n.
Proof. (v) ⇒ (i). We have cA(x) = xn−an−1xn−1 + · · ·+(−1)na0, where ak
is the sum of all k × k principal minors of A. Since ak ≥ 0 for all 0 ≤ k ≤ n − 1,cA(x) has no negative roots.
Note. Regarding Proposition 5.10 (v), if all leading minors of A are ≥ 0, A isnot necessarily semi positive definite. Example: A =
[0−1
].
Generalized inverses. Let A ∈Mm×n(R). The map
(5.7)φ : C(AT ) −→ C(A)
ATx 7−→ AATx
is an isomorphism (∵ kerφ = 0 and dim C(AT ) = dim C(A)). Let P be theprojection matrix of C(A). Then ∃!A+ ∈Mn×m(R) such that C(A+) ⊂ C(AT ) and
5.2. FINITE DIMENSIONAL INNER PRODUCT SPACES 61
AA+ = P . A+ is called the (Moore-Penrose) generalized inverse of A. Clearly, ifA is invertible, A+ = A−1.
Properties of A+. Let A ∈Mm×n(R) and let P,Q be the projection matricesof C(A) and C(AT ), respectively.
(i) AA+ = P , A+A = Q.(ii) A+P = QA+ = A+.(iii) A+AA+ = A+, AA+A = A.(iv) rankA+ = rankA.
Proof. (i) Note that C(A+A) ⊂ C(AT ), C(Q) ⊂ C(AT ) and AA+A = PA = A,AQ = (QAT )=(AT )T = A. Since (5.7) is an isomorphism, we have A+A = Q.
(ii) Since C(A+) ⊂ C(AT ), we have QA+ = A+. Then A+P = A+AA+ =QA+ = A+.
(iii) A+AA+ = QA+ = A+.(iv) It follows from (iii) that rankA+ ≤ rankA and rankA ≤ rankA+.
Proposition 5.11 (Characterization of A+). Let A ∈ Mm×n(R) and B ∈Mn×m(R). Then B = A+ ⇔
(i) ABA = A, BAB = B and(ii) both AB and BA are symmetric.
Proof. (⇐) We have (AB)2 = AB, (AB)T = AB, and C(AB) = C(A) (by(i), rankAB ≥ rankA). So, P := AB is the projection matrix of A. Since B =(BA)B = ATBTB, C(B) ⊂ (AT ). Since AB = P , we have B = A+.
Singular value decomposition. Let A ∈Mm×n(R). Then ∃P ∈ O(m) andQ ∈ O(n) such that
A = P [diag(sa, . . . , sr)⊕ 0]Q,
where s1, . . . , sr ∈ R+ and s21, . . . , s2r are the nonzero eigenvalues of ATA. s1, . . . , sr
are called the singular values of A.
Proof. ATA is semi positive definite. Hence ∃Q1 ∈ O(n) such that
(5.8) QT1 A
TAQ1 = diag(s21, . . . , s2r)⊕ 0, si > 0.
Write AQ1 = [a1, . . . , an]. Then
aTi aj =
s2i if i = j ≤ r,0 otherwise.
By (5.8), rankA = rankATA = r; hence span (a1, . . . , an) = span (a1, . . . , ar). Letui = 1
siai, 1 ≤ i ≤ r. Then u1, . . . , ur is orthonormal. Extend it to an orthonormal
basis u1, . . . , um of Rm. Then
[u1, . . . , um]TAQ1 = diag (s1, . . . , sr)⊕ 0.
So, A = [u1, . . . , um](diag (s1, . . . , sr)⊕ 0
)QT
1 .
Proposition 5.12. If A ∈ Mm×n(R) has a singular value decomposition A =P(diag (s1, . . . , sr)⊕ 0
)Q, then A+ = QT
(diag ( 1
s1, . . . , 1
sr)⊕ 0
)PT .
Proof. It follows from Proposition 5.11.
62 5. INNER PRODUCT SPACES AND UNITARY SPACES
Least squares solutions. Let A ∈Mm×n(R) and b ∈ Rm. For each x ∈ Rn,
||Ax− b||2 = ||Ax− projC(A)(b)||2 + ||projC(A)(b)− b||2.
Hence ||Ax− b|| is minimum iff
(5.9) Ax = projC(A)(b).
A solution of (5.9) is called a least squares solution of
(5.10) Ax = b.
Note that (5.9) is always consistent even if (5.10) is not.
Proposition 5.13. Assume the above notation.(i) (5.9) ⇔ ATAx = AT b.(ii) A+b+ kerc(A) is the set of least squares solutions of (5.10).(iii) A+b is the unique least square solution of (5.10) of minimum norm.
Proof. (i) (5.9) ⇔ (Ax− b)⊥C(A)⇔ AT (Ax− b) = 0.(ii) Only have to show that A+b is a least squares solution. We have ATAA+b =
AT (AA+)T b = (AA+A)T b = AT b.(iii) Note that A+b ∈ C(AT ) ⊂ kerc(A)⊥.
Polar decomposition. Let A ∈Mn(R). Then ∃P ∈ O(n) and semi positivedefinite matrices B1 and B2 such that
(5.11) A = B1P = PB2.
If A ∈ GL(n,R), then B1 and B2 are positive definite and P,B1, B2 are unique.
Proof. By the singular value decomposition, ∃Q,R ∈ O(n) such that
A = Q
s1
. . .
sn
R,where 0 ≤ si ∈ R. Let B1 = Qdiag(s1, . . . , sn)QT , B2 = RT diag(s1, . . . , sn)R, andP = QR. We have (5.11).
Uniqueness of B1, B2, P when A ∈ GL(n,R). Assume A = B′1P
′1 = P ′2B
′2,
where P ′1, P′2 ∈ O(n) and B′
1, B′2 are positive definite. Then B2
1 = AAT = B′12.
Then B1 = B′1 (Exercise 5.5 (i)). So, P ′1 = P . In the same way, B2 = B′
2 andP ′2 = P .
5.3. Unitary Spaces
A unitary space is an inner product space over C.
Definition 5.14. A unitary space is a vector space V over C equipped with amap 〈·, ·, 〉 : V ×V → C, called the inner product, satisfying the following conditions.
(i) 〈u, v〉 = 〈v, u〉, ∀u, v ∈ V .(ii) 〈au+ bv, w〉 = a〈u,w〉+ b〈v, w〉, ∀u, v, w ∈ V , a, b ∈ C.(iii) 〈u, u〉 ≥ 0 for all u ∈ V and 〈u, u〉 = 0⇔ u = 0.
Examples.
5.3. UNITARY SPACES 63
• V = Cn. For x = (x1, . . . , xn), y = (y1, . . . , yn) ∈ Cn, define
〈x, y〉 =n∑
i=1
xiyi.
• `2C := (an)∞n=0 : an ∈ C,∑∞
n=0 |an|2 <∞. For (an), (bn) ∈ `2C, define
〈(an), (bn)〉 =∞∑
n=0
anbn.
• Let (X,B, µ) be a measure space and let L2C(X) = u+iv : u, v ∈ L2(X).
For f, g ∈ L2C(X), define
〈f, g〉 =∫
X
fgdµ.
Complexification. Let V be a vector space over R. Define
VC = u+ vi : u, v ∈ V .
For u1 + v1i, u2 + v2i ∈ VC and a + bi ∈ C, where u1, u2, v1, v2 ∈ V and a, b ∈ R,define
(u1 + v1i) + (u2 + v2i) = (u1 + u2) + (v1 + v2)i,
(a+ bi)(u1 + v1i) = (au1 − bv1) + (bu1 + av1)i.
Then VC is a vector space over C; VC is called the complexification of V . (In factVC = C⊗R V .)
If V is an inner product space over C, then VC is a unitary space with innerproduct
〈u1 + v1i, u2 + v2i〉 = 〈u1, u2〉+ 〈v1, v2〉+ (〈v1, u2〉+ 〈u1, v2〉)i.
Cn is the complexification of Rn; L2C(X) is the complexification of L2(X).
On the other hand, if V is a vector space over C, it is of course a vector spaceover R. We write VR for V viewed as a vector space over R. If (V, 〈·, ·〉) is a unitaryspace, then (VR,Re〈·, ·〉) is an inner product space.
Almost all definitions and results about inner product spaces can be carried tounitary spaces without additional work.
• Norm: ||x|| = 〈x, x〉 12 .• Distance: ||x− y||.• Orthogonality: x⊥y if 〈x, y〉 = 0.• Adjoint: Let V and W be finite dimensional unitary spaces and f ∈
HomC(V,W ), then ∃!f? ∈ HomC(W,V ), called the adjoint of f , such that〈f(x), y〉 = 〈x, f?(y)〉 ∀x ∈ V, y ∈W .
• For A ∈Mm×n(C), A∗ := AT .• Hermitian matrices: A ∈Mn(C) such that A∗ = A.• (Semi) positive definite matrices: Hermitian matrix A such that x∗Ax > 0
(≥ 0) for all 0 6= x ∈ Cn.• Unitary matrices: P ∈ Mn(C) such that PP ∗ = I. The set of all n × n
unitary matrices is denoted by U(n).• Unitary transformations: f ∈ HomC(V, V ) such that 〈f(x), f(y)〉 = 〈x, y〉∀x, y ∈ V .
64 5. INNER PRODUCT SPACES AND UNITARY SPACES
• The generalized inverse: Let A ∈ Mm×n(C) and let P be the projectionmatrix of C(A). A+ ∈Mn×m(C) is the unique matrix such that C(A+) ⊂C(A∗) and AA+ = P .• Normal matrices: A ∈Mn(C) such that AA∗ = A∗A.• Unitary similarity: A,B ∈Mn(C) are called unitarily similar if ∃P ∈ U(n)
such that A = PBP ∗.
Canonical forms of normal matrices under unitary similarity. Theresult is simpler than the case of real normal matrices under orthogonal similaritydue to the fact that C is algebraically closed. (Compare with Theorem 5.7.)
Proposition 5.15. A matrix A ∈ Mn(C) is normal ⇔ A is unitarily similarto a diagonal matrix.
Proof. (⇐) Obvious.(⇒) Method 1. Use Lemma 5.16 and the same argument of the proof of Theo-
rem 5.7, case 1.Method 2. By Lemma 5.17, we may assume that A is upper triangular, say,
A =
a11 a12 · · · a1n
a21 · · · a2n
. . ....ann
.Compare the (1, 1) entries of A∗A and A∗A. We have
|a11|2 = |a11|2 + |a12|2 + · · ·+ |a1n|2.
So, a12 = · · · = a1n = 0. Using induction, we have aij = 0 for all i < j.
Lemma 5.16. Let A ∈Mn(C) be normal. If Ax = λx, where λ ∈ C and x ∈ Cn,then A∗x = λx.
Proof. Since AA∗ = A∗A, we have
〈A∗x− λx, A∗x− λx〉 = 〈Ax− λx, Ax− λx〉 = 0.
Lemma 5.17. Let A ∈ Mn(C). Then ∃P ∈ U(n) such that P ∗AP is uppertriangular.
Proof. Let λ1 ∈ C be an eigenvalue of A and let x1 ∈ C be an associatedeigenvector with ||x1|| = 1. Extend x1 to an orthonormal basis x1, x2, . . . , xn ofCn. Then
A[x1, . . . , xn] = [x1, . . . , xn]
λ1 ∗ · · · ∗0... A1
0
,where A1 ∈Mn−1(C). Apply the induction hypothesis to A1
5.3. UNITARY SPACES 65
Theorem 5.18 (Specht). Let A,B ∈ Mn(C). Then A and B are unitarilysimilar ⇔
(5.12) Tr(Ai1A∗j1 · · ·AikA∗jk) = Tr(Bi1B∗j1 · · ·BikB∗jk)
for all k ≥ 0 and i1, 1, . . . , ik, jk ∈ N.
Proof. (⇒) ∃P ∈ U(n) such that A = PBP ∗. Then
Tr(Ai1A∗j1 · · ·AikA∗jk) = Tr(PBi1B∗j1 · · ·BikB∗jkP ∗) = Tr(Bi1B∗j1 · · ·BikB∗jk).
(⇐) The proof of this part needs representation theory.1 LetA be the C algebra generated by A and A∗ and B the C algebra generated
by B and B∗. Each element in A is a linear combination f(A,A∗) of productsAi1A∗j1 · · ·AikA∗jk with coefficients in C. Define
φ : A −→ Bf(A,A∗) 7−→ f(B,B∗).
Then φ is a well defined isomorphism. In fact, if f(B,B∗) = 0, then by (5.12),
Tr(f(A,A∗)∗f(A,A∗)
)= Tr
(f(B,B∗)∗f(B,B∗)
)= 0;
hence f(A,A∗) = 0.2 A is semisimple.Let I be a nilpotent ideal of A. Then I2m
= 0 for some m > 0. Let C ∈ I.Then (CC∗)2
m
= 0. It follows that (CC∗)2m−1
= 0. By induction, CC∗ = 0, whichimplies C = 0.
3 Let V1 be the natural A-module Cn. Let V − 2 be the A-module Cn withscalar multiplication C ∗ x = φ(C)x, C ∈ A, x ∈ Cn. We claim that AV1
∼= AV2.Let 1 = e1 + · · ·+ eu be a decomposition of 1 into primitive orthogonal idem-
potents of A. Then we can write
AV1 =u⊕
i=1
si⊕j=1
Lij ,
AV2 =u⊕
i=1
ti⊕j=1
Mij ,
where Lij∼= Aei, Mij
∼= Aei. (See [?, 25.8].) Then
si dimCAei = Tr(ei) = Tr(φ(ei)
)(by (5.12))
= ti dimCAei.
So, si = ti and AV1∼= AV2.
4 Let α : AV1 → AV2, x 7→ Px be the isomorphism in 3, where P ∈ GL(n,C).Then ∀C ∈ A,
PCx = α(Cx) = φ(C)α(x) = φ(C)Px ∀x ∈ Cn.
Hence φ(C) = PCP−1. In partticular, B = PAP−1 and B∗ = PA∗P−1. ByExercise 5.15, A and B are unitarily similar.
66 5. INNER PRODUCT SPACES AND UNITARY SPACES
Exercises
5.1. Let A ∈Mm×n(C). Prove that rank (A∗A) = rankA.
5.2. Let V and W be inner product spaces. Let f : V → W be a function suchthat(i) f(0) = 0;(ii) ||f(u)− f(v)|| = ||u− v||.Prove that f is a linear transformation.
5.3. Let V be a vector space over R and 〈·, ·〉 : V × V → R a function such that(1) 〈u, v〉 = 〈v, u〉 ∀u, v ∈ V ;(2) 〈au+ bv, w〉 = a〈u,w〉+ b〈v, w〉 ∀u, v, w ∈ V, a, b ∈ R;(3) 〈u, u〉 ≥ 0 ∀u ∈ V .
For each u ∈ V , define ||u|| = 〈u, u〉 12 . Prove the following statements.(i) V0 := u ∈ V : ||u|| = 0 is a subspaces of V .(ii) |〈u, v〉| ≤ ||u|| · ||v|| ∀u, v ∈ V .(iii) V0 = u ∈ V : 〈u, v〉 = 0 ∀v ∈ V .(iv) Define
〈·, ·, 〉 : V/V0 × V/V0 −→ R(u+ V0, v + V0) 7−→ 〈u, v〉.
Then (V/V0, 〈·, ·, 〉) is an inner product space.
5.4. (Hermite polynomials) For f, g ∈ R[x], define
〈f, g〉 =∫ +∞
−∞f(x)g(x)e−x2
dx.
Let h0(x), h1(x), h2(x), . . . be the G-S orthonormalization of 1, x, x2, . . . . De-termine hn(x) through the following steps.(i) Let
Hn(x) = (−1)nex2 dn
dxne−x2
.
Prove that
Hn(x) = n!bn/2c∑k=0
(−1)k(2x)n−2k
k!(n− 2k)!.
(ii) Use induction and integration by parts to show that
〈Hm,Hn〉 = 2nn!√πδm,n.
(iii) Use (i) and (ii) to show that
hn(x) = (2nn!√π)−
12Hn(x) = π−
14
( n!2n
) 12bn/2c∑k=0
(−1)k(2x)n−2k
k!(n− 2k)!.
5.5. Let A ∈Mn(C) be semi positive definite.(i) Prove that ∃! semi positive definite matrix A1 ∈Mn(C) such that A =
A21.
(ii) Let B ∈Mn(C). Then B commutes with A⇔ B commutes with A1.
5.6. Let A ∈Mn(C) be hermitian and let k be a positive odd integer.(i) Prove that ∃! hermitian matrix B ∈Mn(C) such that Bk = A.
EXERCISES 67
(ii) Prove that centMn(C)(A) = centMn(C)(B).
5.7. (Volume of a parallelepiped) Let v1, . . . , vk ∈ Rn be column vectors and let
Ω = a1v1 + · · ·+ akvk : 0 ≤ ai ≤ 1.Then
Vol(Ω) =[det([v1, . . . , vk]T [v1, . . . , vk]
)] 12 .
5.8. (Distance from a point to an affine subspace) Let A ∈ Mm×n(R), b ∈Mm×1(R) such that Ax = b is consistent. Let
M = x ∈Mn×1(R) : Ax = b.(i) Let u1, . . . , uk be an orthonormal basis of R(A). Show that ∃B ∈
Mk×m(R) with rankB = k such that
BA =
u1
...uk
=: U
and M = x ∈Mm×1(R) : Ux = c, where c = Bb.(ii) For each y ∈Mm×1(R), prove that
d(y,M) = ||Uy − c|| =[yTUTUy − 2cTUy + cT c
] 12 .
5.9. (The Hadamard inequality) Let A = [a1, . . . , an] ∈ GL(n,C). Then
|detA| ≤n∏
i=1
||ai||.
The equality holds iff a1, . . . , an form an orthogonal basis of Cn.
5.10. Let A = [aij ] ∈Mn(C) be positive definite. Prove that
detA ≤ a11a22 · · · ann
and that the equality holds iff A is diagonal.
5.11. (i) If A ∈Mm(C) and B ∈Mn(C) are (semi) positive definite, so is A⊗B.(ii) If AB ∈Mn(C) are (semi) positive definite and AB = BA, then AB is
also (semi) positive definite.(iii) For A = [aij ], B = [bij ] ∈ Mn(F ), the Hadamard product of A and
B, denoted by A ∗ B, is [aijbij ]. If A,B ∈ Mn(C) are (semi) positivedefinite, so is A ∗B.
5.12. (Properties of generalized inverses) Let A ∈ Mm×n(C), B ∈ Mn×p(C) andC ∈Ms×t(C).(i) (A+)+ = A, A+ = A+, (AT )+ = (A+)T .(ii) (A⊗ C)+ = A+ ⊗ C+.(iii) If rankA = n, A+ = (A∗A)−1A∗. If rankB = n, B+ = B∗(BB∗)−1.(iv) If rankA = rankB = n, then (AB)+ = B+A+.(v) Give an example where (AB)+ 6= B+A+.
5.13. (A practical formula for A+) Let A ∈Mm×n(C) with rankA = r.(i) Prove that ∃B ∈ Mm×r(C) and C ∈ Mr×n(C) such that rankB =
rankC = r and A = BC. (This is true with C replaced with an arbitraryfield F .)
68 5. INNER PRODUCT SPACES AND UNITARY SPACES
(ii) Prove that A+ = C∗(B∗BCC∗)−1B∗.
5.14. Let A ∈Mm×n(C). Prove that[
AA∗
]is unitarily similar to diag(s1, . . . , st,
−s1, . . . ,−st)⊕0, where s1, . . . , st are the singular values of A (counted withmultiplicity).
5.15. Let A,B ∈Mn(C). Prove that A is unitarily similar to B ⇔ ∃P ∈ GL(n,C)such that P−1AP = B and P−1A∗P = B∗.
5.16. (i) Let A ∈ Mm(C) and B,C ∈ Mn(C) such that A ⊕ B and A ⊕ C areunitarily similar. Then B and C are unitarily similar.
(ii) Let A,B ∈ Mn(C) and k > 0 such that A⊕ · · · ⊕A︸ ︷︷ ︸k
and B ⊕ · · · ⊕B︸ ︷︷ ︸k
are unitarily similar. Then A and B are unitarily similar.Use Specht’s theorem.
Hints for the Exercises
1.3. (ii) [aijB][cjkD] =[∑n
j=1 aijcjkBD].
(iii) bklcuv appears in the((k−1)r+u, (l−1)s+v
)entry of B⊗C; aijbklcuv
appears in the((i − 1)pr + (k − 1)r + u, (j − 1)qs + (l − 1)s + v
)entry of
A⊗ (B ⊗ C).(v) Let rankA = r. Then ∃P ∈ GL(m,F ), Q ∈ GL(n, F ) such that
PAQ =
[Ir 00 0
].
So, (P ⊗ Ip)(A⊗B)(Q⊗ Iq) = · · · .
2.1. Use a Laplace expansion along two rows.
2.7. The Mathematica code:
p = 23;n = (p - 1)/2;A = Table[Mod[i* PowerMod[j, -1, p], p], i, n, j, n];FactorInteger[Det[A]]
(The number |Dp|p−(p−3)/2 is the relative class number of the cyclotomicfield Q(ζp). See [1].)
3.2 (ii) Since dimV < ∞ and V ⊃ f(V ) ⊃ f2(V ) ⊃ · · · , ∃s such that fs(V ) =fs+1(V ) = · · · . So, V2 = fk(V ), k ≥ s. Since ker f ⊂ ker f2 ⊂ · · · ⊂ V , ∃tsuch that ker f t = ker f t+1 = · · · . So, V1 =
⋃∞k=1 ker fk = ker f t.
3.10. (i) Assume A = [a1, . . . , an] ∈ GL(n,Fq). Count the number of possibilitiesfor a1, a2, etc.
(ii) Let
X =(X, (a1, . . . , ak)
): X is a k-dimensional subspace of Fn
q
and (a1, . . . , ak) is a basis of X
in two ways.
4.1. Let A ∈ Mn(F ) be the matrix of f relative to a basis of V . May assumeA = A0 ⊕ A1, where all elementary divisors of A0 are powers of x and noneof the elementary divisors of A1 is a power of x. Then Ak
1 = 0 for some k ≥ 0and A2 is invertible.
69
70 HINTS FOR THE EXERCISES
4.2. We have[Im A
0 In
][xIm −AB 0
B xIn
][Im −A0 In
]=
[xIm 0B xIn −BA
].
4.11. Elementary divisors.
Solutions of the Exercises
1.2. (i) The (i, j) entry of PTσ Pσ is
eTσ(i)eσ(j) =
1 if i = j,
0 if i 6= j.
So, PTσ Pσ = I.
(ii) The jth column of APσ is [a1, . . . , an]eσ(j) = aσ(j). So, APσ =[aσ(1), . . . , aσ(n)]. We also have
PσB = (BTPTσ )T =
([bT1 , . . . , b
Tn ]Pσ−1
)T = [bTσ−1(1), . . . , bTσ−1(n)]
T =
bσ−1(1)
...bσ−1(n)
.1.3. (ii) [aijB][cjkD] =
[∑nj=1 aijcjkBD
]= AC ⊗BD.
(iii) bklcuv appears in the((k−1)r+u, (l−1)s+v
)entry of B⊗C; aijbklcuv
appears in the((i − 1)pr + (k − 1)r + u, (j − 1)qs + (l − 1)s + v
)entry of
A⊗ (B⊗C). aijbkl appears in the((i− 1)p+ k, (j− 1)q+ l
)entry of A⊗B;
aijbklcuv appears in the((
(i − 1)p + k − 1)r + u,
((j − 1)q + l − 1
)s + v
)entry of (A⊗B)⊗ C.(v) Let rankA = r. Then ∃P ∈ GL(m,F ), Q ∈ GL(n, F ) such that
PAQ =
[Ir 00 0
].
So,
(P ⊗ Ip)(A⊗B)(Q⊗ Iq) = PAQ⊗B =
[Ir ⊗B
0
].
Therefore, rank(A⊗B) = rank(Ir ⊗B) = r rankB.
2.4 Let A be the matrix in the determinant. Then
A
1i −i1 1
. . .
i −i1 1
=
1 eix1 e−ix1 · · · einx1 e−ix1
......
......
...1 eix2n+1 e−ix2n+1 · · · einx2n+1 e−inx2n+1
.
71
72 SOLUTIONS OF THE EXERCISES
So,
(2i)n detA = e−in(x1+···+x2n+1)
·
∣∣∣∣∣∣∣∣einx1 ei(n+1)x1 ei(n−1)x1 · · · ei2nx1 ei0x1
......
......
...einx2n+1 ei(n+1)x2n+1 ei(n−1)x2n+1 · · · ei2nx2n+1 ei0x2n+1
∣∣∣∣∣∣∣∣= e−in(x1+···+x2n+1)
∣∣∣∣∣∣∣∣1 eix1 · · · ei2nx1
......
...1 eix2n+1 · · · ei2nx2n+1
∣∣∣∣∣∣∣∣(2n+ (2n− 2) + · · ·+ 2 column transpositions)
= e−in(x1+···+x2n+1)∏
1≤j<k≤2n+1
(eixk − eixj )
= e−in(x1+···+x2n+1)∏
1≤j<k≤2n+1
2iei 12 (xk+xj) sin
xk − xj
2
= e−in(x1+···+x2n+1)+i 12
∑j<k(xk+xj)(2i)(
2n+12 ) ∏
1≤j<k≤2n+1
sinxk − xj
2
= (2i)(2n+1
2 ) ∏1≤j<k≤2n+1
sinxk − xj
2.
2.5. Let A be the matrix in the determinant. Then by Exercise 2.4,
(2i)n detA =
∣∣∣∣∣∣∣∣eix1 e−ix1 · · · einx1 e−ix1
......
......
eix2n e−ix2n · · · einx2n e−inx2n
∣∣∣∣∣∣∣∣=
1n+ 1
n∑s=0
∣∣∣∣∣∣∣∣∣∣1 ei 2π
n+1 s e−i 2πn+1 s · · · ein 2π
n+1 s e−in 2πn+1 s
1 eix1 e−ix1 · · · einx1 e−ix1
......
......
...1 eix2n e−ix2n · · · einx2n e−inx2n
∣∣∣∣∣∣∣∣∣∣=
1n+ 1
n∑s=0
(2i)(2n+1
2 )( ∏
1≤j<k≤2n
sinxk − xj
2
)( 2n∏j=1
sin12(xj −
2πn+ 1
s)).
2.6. If m 6= n, say m > n, then p < q and rank(A ⊗ B) = (rankA)(rankB) ≤np < mp. If m = n and p = q, det(A ⊗ B) = det[(A ⊗ Ip)(Im ⊗ B)] =det(A⊗ Ip) det(Im ⊗B) = (detA)p(detB)m.
??. We have
bi,0x0 + · · ·+ bi,n−1x
n−1 =
1 if x = xi,
0 if x = xk, k 6= i.
Hence,
bi,0x0 + · · ·+ bi,n−1x
n−1 =
∏k 6=i(x− xk)∏k 6=i(xi − xk)
.
SOLUTIONS OF THE EXERCISES 73
So,
bij =(−1)n−1−j∏k 6=i(xi − xk)
σn−1−j(x1, . . . , xi−1, xi+1, . . . , xn).
3.2. (ii) Since dimV < ∞ and V ⊃ f(V ) ⊃ f2(V ) ⊃ · · · , ∃s such that fs(V ) =fs+1(V ) = · · · . So, V2 = fk(V ), k ≥ s. Since ker f ⊂ ker f2 ⊂ · · · ⊂ V ,∃t ≥ s such that ker f t = ker f t+1 = · · · . So, V1 =
⋃∞k=1 ker fk = ker f t. We
claim that V1 ∩V2 = 0. (Let x ∈ V1 ∩V2. Since x ∈ V2 = f t(V ), x = f t(y)for some y ∈ V . Since x ∈ V1 = ker f t, f2t(y) = f t(x) = 0; hence y ∈ V1.Thus x = f t(y) = 0.) Since dimV1 + dimV2 = dim(ker f t) + dim(im f t) =dimV , we must have V = V1⊕V2. (Note. There is an easier proof using thecanonical form of f .)
(iii) Let V = F ⊕F ⊕ · · · and f : V → V , (x1, x2, . . . ) 7→ (0, x1, x2, . . . ).
3.3. Direct computation shows that D(xiyj) = 4(i + j + 1)xiyj . Clearly, Dpreserves the additions and scalar multiplications. So, D maps L to L andis an R-map. The matrix of D relative to the basis xiyj : 0 ≤ i, j ≤ n of Lis a diagonal matrix with rows and columns labeled by (i, j) : 0 ≤ i, j ≤ n;the
((i, j), (i, j)
)-entry of the matrix is 4(i+ j + 1).
3.7. Define
f : C(B)/C(BC) −→ C(AB)/C(ABC)Bx+ C(BC) 7−→ ABx+ C(ABC), x ∈Mp×1(F ).
Then f is a well defined onto F -map. So,
dim(C(B)/C(BC)
)≥ dim
(C(AB)/C(ABC)
).
Hence the result.
3.8. (ii) We have f(ax) = af(x) for all a ∈ Q and x ∈ Rn. If α ∈ R, choosean ∈ Q such that limn→+∞ an = α. Then f(αx) = limn→+∞ f(anx) =limn→+∞ anf(x) = αf(x).
3.9. Assume to the contrary that the claim is false and assume that n is thesmallest positive integer for which there is a counterexample X, i.e., X isa subspace of Mn(F ) with dimX > n(n − 1) such that X ∩ GL(n, F ) = ∅.Clearly n > 1.
Let Eij ∈ Mn(F ) be the matrix whose (i, j)-entry is 1 and whose otherentries are all 0.
1 We claim that if Eij ∈ X, then Eik, Ekj ∈ X for all 1 ≤ k ≤ n.May assume (i, j) = (1, 1). Assume the contrary of the claim. Consider
an F -map
f : X −→ Mn−1(F )[∗ ∗∗ A
]7−→ A.
Then dim ker f < 2n− 1; hence dim f(X) = dimX − dim ker f > n(n− 1)−2(n − 1) = (n − 1)(n − 2). Thus ∃A ∈ f(X) such that detA 6= 0. We have
74 SOLUTIONS OF THE EXERCISES[a bc A
]∈ X for some a, b, c. For each x ∈ F , we have xE11 +
[a bc A
]∈ X and
det(xE11 +
[a b
c A
])= det
[x+ a b
c A
]= (x+ a) detA+ constant.
So, ∃x ∈ F such that det(xE11 +
[a bc A
])6= 0, →←.
2 Since dimX > n(n− 1), the F -map
g : X −→ M(n−1)×n(F )[∗A
]7−→ A
has ker g 6= 0. Hence ∃0 6= u ∈ Fn such that [ u0 ] ∈ X. By elementary
column operations, we may assume E11 ∈ X. By 1, Ei1 ∈ X for all i. By1 again, Eij ∈ X for all i, j, ⇒ X = Mn(F ), →←.
3.11.
Li(x0) =∫ +∞
0
e−ixdx =1i.
For j > 0,
Li(xj) =∫ +∞
0
xje−ixdx = −1i
∫ +∞
0
xjde−ix = −1i
[xje−ix
∣∣∣+∞0−∫ +∞
0
e−ixdxj]
=j
i
∫ +∞
0
xj−1e−ixdx =j
iLi(xj−1).
HenceLi(xj) =
j!ij+1
, 1 ≤ i ≤ n+ 1, 0 ≤ j ≤ n.
Let (f1, . . . , fn+1) = (1, x, . . . , xn)A. Then
In+1 =
L1
...Ln+1
[f1, . . . , fn+1] =
L1
...Ln+1
[1, x, . . . , xn]A =[ j!ij+1
]A.
To find the inverse of [ j!ij+1 ], note that
[ j!ij+1
]=
1 · · · 111 · · · 1
n+1...
...( 11 )n · · · ( 1
n+1 )n
0!1!
. . .
n!
.
Bibliography
[1] L. Carlitz and F. R. Olson, Maillet’s determinant, Proc. Amer. Math. Soc. 6 (1955), 265 –
269.
75