
Linear Algebra II

Peter Philip∗

Lecture Notes

Created for the Class of Spring Semester 2019 at LMU Munich

July 29, 2019

Contents

1 Affine Subspaces and Geometry

1.1 Affine Subspaces

1.2 Affine Hull and Affine Independence

1.3 Affine Bases

1.4 Barycentric Coordinates and Convex Sets

1.5 Affine Maps

1.6 Affine Geometry

2 Duality

2.1 Linear Forms and Dual Spaces

2.2 Annihilators

2.3 Hyperplanes and Linear Systems

2.4 Dual Maps

3 Symmetric Groups

4 Multilinear Maps and Determinants

4.1 Multilinear Maps

4.2 Alternating Multilinear Maps and Determinants

4.3 Determinants of Matrices and Linear Maps

5 Direct Sums and Projections

6 Eigenvalues

7 Commutative Rings, Polynomials

8 Characteristic Polynomial, Minimal Polynomial

9 Jordan Normal Form

10 Vector Spaces with Scalar Products

10.1 Definition, Orthogonality

10.2 The Adjoint Map

10.3 Hermitian, Unitary, and Normal Maps

A Multilinear Maps

References

∗E-Mail: [email protected]


1 Affine Subspaces and Geometry

1.1 Affine Subspaces

Definition 1.1. Let V be a vector space over the field F. Then M ⊆ V is called an affine subspace of V if, and only if, there exists a vector v ∈ V and a (vector) subspace U ⊆ V such that M = v + U. We define dim M := dim U to be the dimension of M (this notion of dimension is well-defined by the following Lem. 1.2(a)).

Thus, the affine subspaces of a vector space V are precisely the translations of vector subspaces U of V, i.e. the cosets of subspaces U, i.e. the elements of quotient spaces V/U.

Lemma 1.2. Let V be a vector space over the field F .

(a) If M is an affine subspace of V, then the vector subspace corresponding to M is unique, i.e. if M = v1 + U1 = v2 + U2 with v1, v2 ∈ V and vector subspaces U1, U2 ⊆ V, then

U1 = U2 = {u − v : u, v ∈ M}. (1.1)

(b) If M = v + U is an affine subspace of V, then the vector v in this representation is unique if, and only if, U = {0}.

Proof. (a): Let M = v1 + U1 with v1 ∈ V and vector subspace U1 ⊆ V. Moreover, let U := {u − v : u, v ∈ M}. It suffices to show U1 = U. Suppose u1 ∈ U1. Since v1 ∈ M and v1 + u1 ∈ M, we have u1 = v1 + u1 − v1 ∈ U, showing U1 ⊆ U. If a ∈ U, then there are u1, u2 ∈ U1 such that a = v1 + u1 − (v1 + u2) = u1 − u2 ∈ U1, showing U ⊆ U1, as desired.

(b): If U = {0}, then M = {v} and v is unique. If M = v + U with 0 ≠ u ∈ U, then M = v + U = v + u + U with v ≠ v + u.

Definition 1.3. In the situation of Def. 1.1, we call affine subspaces of dimension 0 points, of dimension 1 lines, and of dimension 2 planes – in R^2 and R^3, such objects are easily visualized and they then coincide with the points, lines, and planes with which one is already familiar.

Affine spaces and vector spaces share many structural properties. In consequence, one can develop a theory of affine spaces that is in many respects analogous to the theory of vector spaces, as will be illustrated by some of the notions and results presented in the following. We start by defining so-called affine combinations, which are, for affine spaces, what linear combinations are for vector spaces:


Definition 1.4. Let V be a vector space over the field F with v1, . . . , vn ∈ V and λ1, . . . , λn ∈ F, n ∈ N. Then ∑_{i=1}^n λ_i v_i is called an affine combination of v1, . . . , vn if, and only if, ∑_{i=1}^n λ_i = 1.

Theorem 1.5. Let V be a vector space over the field F, ∅ ≠ M ⊆ V. Then M is an affine subspace of V if, and only if, M is closed under affine combinations. More precisely, the following statements are equivalent:

(i) M is an affine subspace of V .

(ii) If n ∈ N, v1, . . . , vn ∈ M, and λ1, . . . , λn ∈ F with ∑_{i=1}^n λ_i = 1, then ∑_{i=1}^n λ_i v_i ∈ M.

If char F ≠ 2, then (i) and (ii) are also equivalent to¹

(iii) If v1, v2 ∈ M and λ1, λ2 ∈ F with λ1 + λ2 = 1, then λ1 v1 + λ2 v2 ∈ M.

Proof. Exercise.
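Th. 1.5 can be illustrated numerically (the following sketch is not part of the notes; the helper names point, affine_comb, and on_M are ad hoc): for the line M = v + Ru in R^2, every affine combination of points of M lies on M again.

```python
# Illustration of Th. 1.5 for the line M = v + R*u in R^2 (ad-hoc helpers).
v = (1.0, 2.0)
u = (3.0, 1.0)

def point(t):
    """Return v + t*u, a point of the affine subspace M."""
    return (v[0] + t * u[0], v[1] + t * u[1])

def affine_comb(points, coeffs):
    """Affine combination sum(l_i * p_i); the coefficients must sum to 1."""
    assert abs(sum(coeffs) - 1.0) < 1e-12
    return (sum(l * p[0] for l, p in zip(coeffs, points)),
            sum(l * p[1] for l, p in zip(coeffs, points)))

def on_M(p, tol=1e-9):
    """p lies on M iff p - v is a scalar multiple of u (2x2 determinant test)."""
    dx, dy = p[0] - v[0], p[1] - v[1]
    return abs(dx * u[1] - dy * u[0]) < tol

# Coefficients 0.5 + 2.0 - 1.5 = 1, so q is an affine combination of points of M.
q = affine_comb([point(0.0), point(1.0), point(-2.0)], [0.5, 2.0, -1.5])
print(on_M(q))  # True
```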

The following Th. 1.6 is the analogue of [Phi19, Th. 5.7] for affine spaces:

Theorem 1.6. Let V be a vector space over the field F .

(a) Let I ≠ ∅ be an index set and (Mi)_{i∈I} a family of affine subspaces of V. Then the intersection M := ⋂_{i∈I} Mi is either empty or it is an affine subspace of V.

(b) In contrast to intersections, unions of affine subspaces are almost never affine subspaces. More precisely, if M1 and M2 are affine subspaces of V and char F ≠ 2 (i.e. 1 ≠ −1 in F), then

M1 ∪ M2 is an affine subspace of V ⇔ (M1 ⊆ M2 ∨ M2 ⊆ M1) (1.2)

(where “⇐” also holds for char F = 2, but cf. Ex. 1.7 below).

Proof. (a): Let M ≠ ∅. We use the characterization of Th. 1.5(ii) to show M is an affine subspace: If n ∈ N, v1, . . . , vn ∈ M, and λ1, . . . , λn ∈ F with ∑_{k=1}^n λ_k = 1, then v := ∑_{k=1}^n λ_k v_k ∈ Mi for each i ∈ I, implying v ∈ M. Thus, M is an affine subspace of V.

(b): If M1 ⊆ M2, then M1 ∪ M2 = M2, which is an affine subspace of V. If M2 ⊆ M1, then M1 ∪ M2 = M1, which is an affine subspace of V. For the converse, we now assume char F ≠ 2, M1 ⊈ M2, and M1 ∪ M2 is an affine subspace of V. We have to show M2 ⊆ M1. Let m1 ∈ M1 \ M2 and m2 ∈ M2. Since M1 ∪ M2 is an affine subspace, m2 + m2 − m1 ∈ M1 ∪ M2 by Th. 1.5(ii). If m2 + m2 − m1 ∈ M2, then m1 = m2 + m2 − (m2 + m2 − m1) ∈ M2, in contradiction to m1 ∉ M2. Thus, m2 + m2 − m1 ∈ M1. Since char F ≠ 2, we have 2 := 1 + 1 ≠ 0 in F, implying m2 = (1/2)(m2 + m2 − m1) + (1/2)m1 ∈ M1, i.e. M2 ⊆ M1.

¹For char F = 2, (iii) does not imply (i) and (ii): Let F := Z2 = {0, 1}. Let V be a vector space over F with #V ≥ 4 (e.g. V = F^2). Let p, q, r ∈ V be distinct, M := {p, q, r} (i.e. #M = 3). If λ1, λ2 ∈ F with λ1 + λ2 = 1, then (λ1, λ2) ∈ {(0, 1), (1, 0)} and (iii) is trivially true. On the other hand, v := p + q + r is an affine combination of p, q, r, since 1 + 1 + 1 = 1 in F; but v ∉ M: v = p + q + r = p implies q = −r = r, and v = q or v = r likewise leads to a contradiction (this counterexample was pointed out by Robin Mader).

Example 1.7. Consider F := Z2 = {0, 1} and the vector space V := F^2 over F. Then M1 := U1 := {(0, 0), (1, 0)} = 〈(1, 0)〉 is a vector subspace and, in particular, an affine subspace of V. The set M2 := (0, 1) + U1 = {(0, 1), (1, 1)} is also an affine subspace. Then M1 ∪ M2 = V is an affine subspace, even though neither M1 ⊆ M2 nor M2 ⊆ M1.
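Ex. 1.7 is small enough to verify exhaustively (an illustrative sketch, not part of the notes): over F = Z2, the union M1 ∪ M2 is the whole space V even though neither set contains the other, so the implication “⇒” of (1.2) indeed needs char F ≠ 2.

```python
# Ex. 1.7 over F = Z_2: M1 = U1 = {(0,0),(1,0)}, M2 = (0,1) + U1.
M1 = {(0, 0), (1, 0)}
M2 = {(0, 1), (1, 1)}
V = {(a, b) for a in (0, 1) for b in (0, 1)}

print((M1 | M2) == V)        # True: the union is all of V, an affine subspace
print(M1 <= M2 or M2 <= M1)  # False: neither set is contained in the other
```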

1.2 Affine Hull and Affine Independence

Next, we will define the affine hull of a subset A of a vector space, which is the affine analogue of the linear notion of the span of A (which is sometimes also called the linear hull of A):

Definition 1.8. Let V be a vector space over the field F, ∅ ≠ A ⊆ V, and

M := {M ∈ P(V) : A ⊆ M ∧ M is an affine subspace of V},

where we recall that P(V) denotes the power set of V. Then the set

aff A := ⋂_{M∈M} M

is called the affine hull of A. We call A a generating set of aff A.

The following Prop. 1.9 is the analogue of [Phi19, Prop. 5.9] for affine spaces:

Proposition 1.9. Let V be a vector space over the field F and ∅ ≠ A ⊆ V.

(a) aff A is an affine subspace of V, namely the smallest affine subspace of V containing A.

(b) aff A is the set of all affine combinations of elements from A, i.e.

aff A = {∑_{i=1}^n λ_i a_i : n ∈ N ∧ λ1, . . . , λn ∈ F ∧ a1, . . . , an ∈ A ∧ ∑_{i=1}^n λ_i = 1}. (1.3)

(c) If A ⊆ B ⊆ V , then aff A ⊆ aff B.

(d) A = aff A if, and only if, A is an affine subspace of V .


(e) aff aff A = aff A.

Proof. (a): Since A ⊆ aff A implies aff A ≠ ∅, (a) is immediate from Th. 1.6(a).

(b): Let W denote the right-hand side of (1.3). If M is an affine subspace of V and A ⊆ M, then W ⊆ M, since M is closed under affine combinations, showing W ⊆ aff A. On the other hand, suppose N, n1, . . . , nN ∈ N, a^k_1, . . . , a^k_{n_k} ∈ A for each k ∈ {1, . . . , N}, λ^k_1, . . . , λ^k_{n_k} ∈ F for each k ∈ {1, . . . , N}, and α1, . . . , αN ∈ F such that

∀ k ∈ {1, . . . , N}: ∑_{i=1}^{n_k} λ^k_i = ∑_{i=1}^N α_i = 1.

Then

∑_{k=1}^N α_k ∑_{i=1}^{n_k} λ^k_i a^k_i ∈ W,

since

∑_{k=1}^N ∑_{i=1}^{n_k} α_k λ^k_i = ∑_{k=1}^N α_k = 1,

showing W to be an affine subspace of V by Th. 1.5(ii). Thus, aff A ⊆ W, completing the proof of aff A = W.

(c) is immediate from (b).

(d): If A = aff A, then A is an affine subspace by (a). For the converse, while it is clear that A ⊆ aff A always holds, if A is an affine subspace, then A ∈ M, where M is as in Def. 1.8, implying aff A ⊆ A.

(e) now follows by combining (d) with (a).

Proposition 1.10. Let V be a vector space over the field F, A ⊆ V, M = v + U with v ∈ A and U a vector subspace of V. Then the following statements are equivalent:

(i) aff A =M .

(ii) 〈−v + A〉 = U .

Proof. Exercise.
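Prop. 1.10 suggests a way to compute dim aff A numerically (an illustrative sketch using numpy, not part of the notes): pick v ∈ A; then aff A = v + 〈−v + A〉, so dim aff A is the rank of the matrix whose rows are the differences a − v.

```python
import numpy as np

# Four points of R^3 lying in a common plane; v is the chosen base point.
A = np.array([[1.0, 0.0, 0.0],
              [2.0, 1.0, 0.0],
              [3.0, 2.0, 0.0],
              [0.0, 0.0, 0.0]])
v = A[0]
D = A[1:] - v                            # spanning set of U = <-v + A>
dim_aff = int(np.linalg.matrix_rank(D))  # = dim aff A by Prop. 1.10 and Def. 1.1
print(dim_aff)  # 2: the affine hull of these four points is a plane
```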

We will now define the notions of affine dependence/independence, which are, for affine spaces, what linear dependence/independence are for vector spaces:

Definition 1.11. Let V be a vector space over the field F .

(a) A vector v ∈ V is called affinely dependent on a subset U of V (or on the vectors in U) if, and only if, there exist n ∈ N and u1, . . . , un ∈ U such that v is an affine combination of u1, . . . , un. Otherwise, v is called affinely independent of U.


(b) A subset U of V is called affinely independent if, and only if, whenever 0 ∈ V is written as a linear combination of distinct elements of U such that the coefficients have sum 0, then all coefficients must be 0 ∈ F, i.e. if, and only if,

(n ∈ N ∧ W ⊆ U ∧ #W = n ∧ ∑_{u∈W} λ_u u = 0 ∧ ∀ u ∈ W: λ_u ∈ F ∧ ∑_{u∈W} λ_u = 0) ⇒ ∀ u ∈ W: λ_u = 0. (1.4)

Sets that are not affinely independent are called affinely dependent.

As a caveat, it is underlined that, in Def. 1.11(b) above, one does not consider affine combinations of the vectors u ∈ U, but special linear combinations (this is related to the fact that 0 is an affine combination of vectors in U only if aff U is a vector subspace of V).

Remark 1.12. It is immediate from Def. 1.11 that if v ∈ V is linearly independent of U ⊆ V, then it is also affinely independent of U, and, if U ⊆ V is linearly independent, then U is also affinely independent. However, the converse is, in general, not true (cf. Ex. 1.13(b),(c) below).

Example 1.13. Let V be a vector space over the field F .

(a) ∅ is affinely independent: Indeed, if U = ∅, then the left side of the implication in (1.4) is always false (since W ⊆ U means #W = 0), i.e. the implication is true.

(b) Every singleton set {v}, v ∈ V, is affinely independent, since λ1 = ∑_{i=1}^1 λ_i = 0 means λ1 = 0 (if v = 0, then v is not linearly independent, cf. [Phi19, Ex. 5.13(b)]).

(c) Every set {v, w} with two distinct vectors v, w ∈ V is affinely independent (but not linearly independent for w = αv with some α ∈ F): 0 = λv − λw = λ(v − w) implies λ = 0 or v = w.

There is a close relationship between affine independence and linear independence:

Proposition 1.14. Let V be a vector space over the field F and U ⊆ V. Then the following statements are equivalent:

(i) U is affinely independent.

(ii) If u0 ∈ U , then U0 := u− u0 : u ∈ U \ u0 is linearly independent.


(iii) The set X := {(u, 1) ∈ V × F : u ∈ U} is a linearly independent subset of the vector space V × F.

Proof. Exercise.
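Over F = R, Prop. 1.14(ii) yields a practical test for affine independence (a sketch with numpy, not part of the notes; the function name is ad hoc): the differences u − u0 must be linearly independent, i.e. the difference matrix must have full row rank.

```python
import numpy as np

def affinely_independent(points):
    """Test of Prop. 1.14(ii): the points are affinely independent iff the
    differences u - u0 (u != u0) form a linearly independent family."""
    P = np.asarray(points, dtype=float)
    D = P[1:] - P[0]                   # rows are the differences u - u0
    return np.linalg.matrix_rank(D) == len(D)

print(affinely_independent([[0, 0], [1, 0], [0, 1]]))  # True: vertices of a triangle
print(affinely_independent([[0, 0], [1, 1], [2, 2]]))  # False: three collinear points
```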

The following Prop. 1.15 is the analogue of [Phi19, Prop. 5.14(a)-(c)] for affine spaces:

Proposition 1.15. Let V be a vector space over the field F and U ⊆ V .

(a) U is affinely dependent if, and only if, there exists u0 ∈ U such that u0 is affinely dependent on U \ {u0}.

(b) If U is affinely dependent and U ⊆ M ⊆ V, then M is affinely dependent as well.

(c) If U is affinely independent and M ⊆ U, then M is affinely independent as well.

Proof. (a): Suppose U is affinely dependent. Then there exists W ⊆ U, #W = n ∈ N, such that ∑_{u∈W} λ_u u = 0 with λ_u ∈ F, ∑_{u∈W} λ_u = 0, and there exists u0 ∈ W with λ_{u0} ≠ 0. Then

u0 = −λ_{u0}^{−1} ∑_{u∈W\{u0}} λ_u u = ∑_{u∈W\{u0}} (−λ_{u0}^{−1} λ_u) u,  ∑_{u∈W\{u0}} (−λ_{u0}^{−1} λ_u) = (−λ_{u0}^{−1}) · (−λ_{u0}) = 1,

showing u0 to be affinely dependent on U \ {u0}. Conversely, if u0 ∈ U is affinely dependent on U \ {u0}, then there exist n ∈ N, distinct u1, . . . , un ∈ U \ {u0}, and λ1, . . . , λn ∈ F with ∑_{i=1}^n λ_i = 1 such that

u0 = ∑_{i=1}^n λ_i u_i ⇒ −u0 + ∑_{i=1}^n λ_i u_i = 0,

showing U to be affinely dependent, since the coefficient of u0 is −1 ≠ 0 and −1 + ∑_{i=1}^n λ_i = 0.

(b) and (c) are now both immediate from (a).

1.3 Affine Bases

Definition 1.16. Let V be a vector space over the field F, let M ⊆ V be an affine subspace, and B ⊆ V. Then B is called an affine basis of M if, and only if, B is a generating set for M (i.e. M = aff B) that is also affinely independent.

There is a close relationship between affine bases and vector space bases:

Proposition 1.17. Let V be a vector space over the field F, let M ⊆ V be an affine subspace, and let B ⊆ M with v ∈ B. Then the following statements are equivalent:


(i) B is an affine basis of M .

(ii) B0 := {b − v : b ∈ B \ {v}} is a vector space basis of the vector space U := {v1 − v2 : v1, v2 ∈ M}.

Proof. As a consequence of Lem. 1.2(a), we know U to be a vector subspace of V and M = a + U for each a ∈ M. Moreover, v ∈ B ⊆ M implies B0 ⊆ U. According to Prop. 1.14, B is affinely independent if, and only if, B0 is linearly independent. According to Prop. 1.10, aff B = M holds if, and only if, 〈−v + B〉 = U, which, since B0 = (−v + B) \ {0}, holds if, and only if, 〈B0〉 = U.
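Prop. 1.17 can be checked on a concrete plane in R^3 (a numerical sketch with numpy, not part of the notes): B = {v, b1, b2} is an affine basis of M = aff B precisely if B0 = {b1 − v, b2 − v} is a basis of the direction space U.

```python
import numpy as np

v = np.array([1.0, 0.0, 0.0])
b1 = np.array([1.0, 1.0, 0.0])
b2 = np.array([1.0, 0.0, 1.0])
B0 = np.array([b1 - v, b2 - v])            # B0 = {b - v : b in B \ {v}}
is_basis = np.linalg.matrix_rank(B0) == 2  # B0 spans a 2-dimensional U
print(is_basis)  # True: B is an affine basis of the plane M = v + U
```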

The following Th. 1.18 is the analogue of [Phi19, Th. 5.17] for affine spaces:

Theorem 1.18. Let V be a vector space over the field F, let M ⊆ V be an affine subspace, and let ∅ ≠ B ⊆ V. Then the following statements (i) – (iii) are equivalent:

(i) B is an affine basis of M .

(ii) B is a maximal affinely independent subset of M, i.e. B is affinely independent and each set A ⊆ M with B ⊊ A is affinely dependent.

(iii) B is a minimal generating set for M, i.e. aff B = M and aff A ⊊ M for each A ⊊ B.

Proof. Let v ∈ B, and let B0 and U be as in Prop. 1.17 above. Then, due to Prop. 1.14, B is a maximal affinely independent subset of M if, and only if, B0 is a maximal linearly independent subset of U. Moreover, due to Prop. 1.10, B is a minimal (affine) generating set for M if, and only if, B0 is a minimal (linear) generating set for U. Thus, the equivalences of Th. 1.18 follow by combining Prop. 1.17 with [Phi19, Th. 5.17].

The following Th. 1.19 is the analogue of [Phi19, Th. 5.23] for affine spaces:

Theorem 1.19. Let V be a vector space over the field F and let M ⊆ V be an affinesubspace.

(a) If S ⊆ M is affinely independent, then there exists an affine basis of M that contains S.

(b) M has an affine basis B ⊆ M.

(c) Affine bases of M have a unique cardinality, i.e. if B ⊆ M and B̃ ⊆ M are both affine bases of M, then there exists a bijective map φ : B −→ B̃.

(d) If B is an affine basis of M and S ⊆ M is affinely independent, then there exists C ⊆ B such that B̃ := S ∪ C is an affine basis of M.


Proof. Let v ∈ V and let U be a vector subspace of V such that M = v + U. Then v ∈ M and U = {v1 − v2 : v1, v2 ∈ M} according to Lem. 1.2(a).

(a): It suffices to consider the case S ≠ ∅. Thus, let v ∈ S. According to Prop. 1.14(ii), S0 := {x − v : x ∈ S \ {v}} is a linearly independent subset of U. According to [Phi19, Th. 5.23(a)], U has a vector space basis B0 with S0 ⊆ B0 ⊆ U. Then, by Prop. 1.17, (v + B0) ∪ {v} ⊇ S is an affine basis of M.

(b) is immediate from (a).

(c): Let B ⊆ M and B̃ ⊆ M be affine bases of M. Moreover, let b ∈ B and b̃ ∈ B̃. Then, by Prop. 1.17, B0 := {x − b : x ∈ B \ {b}} and B̃0 := {x − b̃ : x ∈ B̃ \ {b̃}} are both vector space bases of U. Thus, by [Phi19, Th. 5.23(c)], there exists a bijective map ψ : B0 −→ B̃0. Then, clearly, the map

φ : B −→ B̃, φ(x) := b̃ for x = b, φ(x) := b̃ + ψ(x − b) for x ≠ b,

is well-defined and bijective, thereby proving (c).

(d): If B ⊆ aff S, then, according to Prop. 1.9(c),(e), M = aff B ⊆ aff aff S = aff S, i.e. aff S = M, as M is an affine subspace containing S. Thus, S is itself an affine basis of M and the statement holds with C := ∅. It remains to consider the case where there exists b ∈ B \ S such that S ∪ {b} is affinely independent. Then, by Prop. 1.17, B0 := {x − b : x ∈ B \ {b}} is a vector space basis of U and, by Prop. 1.14(ii), S0 := {x − b : x ∈ S} is a linearly independent subset of U. Thus, by [Phi19, Th. 5.23(d)], there exists C0 ⊆ B0 such that B̃0 := S0 ∪ C0 is a vector space basis of U, and, then, using Prop. 1.17 once again, (b + B̃0) ∪ {b} = S ∪ C with C := (b + C0) ∪ {b} ⊆ B is an affine basis of M.

1.4 Barycentric Coordinates and Convex Sets

The following Th. 1.20 is the analogue of [Phi19, Th. 5.19] for affine spaces:

Theorem 1.20. Let V be a vector space over the field F and assume M ⊆ V is an affine subspace with affine basis B of M. Then each vector v ∈ M has unique barycentric coordinates with respect to the affine basis B, i.e., for each v ∈ M, there exists a unique finite subset Bv of B and a unique map c : Bv −→ F \ {0} such that

v = ∑_{b∈Bv} c(b) b ∧ ∑_{b∈Bv} c(b) = 1. (1.5)

Proof. The existence of Bv and the map c follows from the fact that the affine basis B is an affine generating set, aff B = M. For the uniqueness proof, consider finite sets Bv, B̃v ⊆ B and maps c : Bv −→ F \ {0}, c̃ : B̃v −→ F \ {0} such that

v = ∑_{b∈Bv} c(b) b = ∑_{b∈B̃v} c̃(b) b ∧ ∑_{b∈Bv} c(b) = ∑_{b∈B̃v} c̃(b) = 1.


Extend both c and c̃ to A := Bv ∪ B̃v by letting c(b) := 0 for b ∈ B̃v \ Bv and c̃(b) := 0 for b ∈ Bv \ B̃v. Then

0 = ∑_{b∈A} (c(b) − c̃(b)) b,

such that the affine independence of A implies c(b) = c̃(b) for each b ∈ A, which, in turn, implies Bv = B̃v and c = c̃.

Example 1.21. With respect to the affine basis {0, 1} of R over R, the barycentric coordinates of 1/3 are 2/3 and 1/3, whereas the barycentric coordinates of 5 are −4 and 5.
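The computation in Ex. 1.21 amounts to solving x = c0·0 + c1·1 with c0 + c1 = 1, which forces c1 = x and c0 = 1 − x (a sketch, not part of the notes; the function name is ad hoc):

```python
def barycentric_01(x):
    """Barycentric coordinates of x in R w.r.t. the affine basis {0, 1}:
    x = c0*0 + c1*1 with c0 + c1 = 1 forces c1 = x and c0 = 1 - x."""
    return (1 - x, x)

print(barycentric_01(1 / 3))  # (2/3, 1/3), as in Ex. 1.21
print(barycentric_01(5))      # (-4, 5), as in Ex. 1.21
```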

Remark 1.22. Let V be a vector space over the field F and assume M ⊆ V is an affine subspace with affine basis B of M.

(a) Caveat: In the literature, one also finds the notion of affine coordinates; however, this notion of affine coordinates is usually (but not always, so one has to use care) defined differently from the notion of barycentric coordinates as defined in Th. 1.20 above: For the affine coordinates, one designates one point x0 ∈ B to be the origin of M. Let v ∈ M and let c : Bv −→ F \ {0} be the map yielding the barycentric coordinates according to Th. 1.20. We write {x0} ∪ Bv = {x0, x1, . . . , xn} with distinct elements x1, . . . , xn ∈ M (if any) and we set c(x0) := 0 in case x0 ∉ Bv. Then

v = ∑_{i=0}^n c(x_i) x_i ∧ ∑_{i=0}^n c(x_i) = 1,

which, since 1 − ∑_{i=1}^n c(x_i) = c(x0), is equivalent to

v = x0 + ∑_{i=1}^n c(x_i) (x_i − x0) ∧ ∑_{i=0}^n c(x_i) = 1.

One calls c(x1), . . . , c(xn), given by the map c_a := c|_{Bv \ {x0}}, the affine coordinates of v with respect to the affine coordinate system {x0} ∪ (−x0 + B) (for v = x0, c_a turns out to be the empty map).

(b) If x1, . . . , xn ∈ M are distinct points that are affinely independent and n̄ := n · 1 ≠ 0 in F, then one sometimes calls

v := (1/n̄) ∑_{i=1}^n x_i ∈ M

the barycenter of x1, . . . , xn.

Definition and Remark 1.23. Let V be a vector space over R (we restrict ourselves to vector spaces over R, since, for a scalar λ, we will need to know what it means for λ to be positive, i.e. λ > 0 needs to be well-defined). Let v1, . . . , vn ∈ V and λ1, . . . , λn ∈ R, n ∈ N. Then we call the affine combination ∑_{i=1}^n λ_i v_i of v1, . . . , vn a convex combination of v1, . . . , vn if, and only if, in addition to ∑_{i=1}^n λ_i = 1, one has λ_i ≥ 0 for each i ∈ {1, . . . , n}. Moreover, we call C ⊆ V convex if, and only if, C is closed under convex combinations, i.e. if, and only if, n ∈ N, v1, . . . , vn ∈ C, and λ1, . . . , λn ∈ R_0^+ with ∑_{i=1}^n λ_i = 1, implies ∑_{i=1}^n λ_i v_i ∈ C (analogous to Th. 1.5, C ⊆ V is then convex if, and only if, each convex combination of merely two elements of C is again in C). Note that, in contrast to affine subspaces, we allow convex sets to be empty. Clearly, the convex subsets of R are precisely the intervals (open, closed, half-open, bounded or unbounded). Convex subsets of R^2 include triangles and disks. Analogous to the proof of Th. 1.6(a), one can show that arbitrary intersections of convex sets are always convex, and, analogous to the definition of the affine hull in Def. 1.8, one defines the convex hull conv A of a set A ⊆ V by letting

C := {C ∈ P(V) : A ⊆ C ∧ C is a convex subset of V},  conv A := ⋂_{C∈C} C.

Then Prop. 1.9 and its proof still work completely analogously in the convex situation and one obtains conv A to be the smallest convex subset of V containing A, where conv A consists precisely of all convex combinations of elements from A; A = conv A holds if, and only if, A is convex; conv conv A = conv A; and conv A ⊆ conv B for each A ⊆ B ⊆ V. If n ∈ N0 and A = {x0, x1, . . . , xn} ⊆ V is an affinely independent set, consisting of the n + 1 distinct points x0, x1, . . . , xn, then conv A is called an n-dimensional simplex (or simply an n-simplex) with vertices x0, x1, . . . , xn – 0-simplices are called points, 1-simplices line segments, 2-simplices triangles, and 3-simplices tetrahedra. If e1, . . . , ed denotes the standard basis of R^d, d ∈ N, then conv{e1, . . . , e_{n+1}}, 0 ≤ n < d, is called the standard n-simplex in R^d.

1.5 Affine Maps

We first study a special type of affine map, namely so-called translations.

Definition 1.24. Let V be a vector space over the field F . If v ∈ V , then the map

Tv : V −→ V, Tv(x) := x+ v,

is called a translation, namely, the translation by v or the translation with translation vector v. Let T(V) := {Tv : v ∈ V} denote the set of translations on V.

Proposition 1.25. Let V be a vector space over the field F .

(a) If v ∈ V and A, B ⊆ V, then Tv(A + B) = v + A + B. In particular, translations map affine subspaces of V into affine subspaces of V.

(b) If v ∈ V , then Tv is bijective with (Tv)−1 = T−v. In particular, T (V ) ⊆ SV , where

SV denotes the symmetric group on V according to [Phi19, Ex. 4.9(b)].

(c) Nontrivial translations are not linear: More precisely, Tv with v ∈ V is linear if,and only if, v = 0 (i.e. Tv = Id).


(d) If v, w ∈ V , then Tv Tw = Tv+w = Tw Tv.

(e) (T (V ), ) is a commutative subgroup of (SV , ). Moreover, (T (V ), ) ∼= (V,+),where

I : (V,+) −→ (T (V ), ), I(v) := Tv,

constitutes a group isomorphism.

Proof. Exercise.
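Prop. 1.25(d),(e) can be illustrated in R^2 (a sketch, not part of the notes): composing translations adds the translation vectors, which is exactly why v ↦ Tv is a homomorphism (V, +) −→ (T(V), ∘).

```python
def T(v):
    """Translation T_v on R^2."""
    return lambda x: (x[0] + v[0], x[1] + v[1])

v, w, x = (1.0, 2.0), (3.0, -1.0), (0.5, 0.5)
lhs = T(v)(T(w)(x))                      # (T_v ∘ T_w)(x)
rhs = T((v[0] + w[0], v[1] + w[1]))(x)   # T_{v+w}(x)
print(lhs == rhs)  # True
```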

We will now define affine maps, which are, for affine spaces, what linear maps are for vector spaces:

Definition 1.26. Let V and W be vector spaces over the field F. A map A : V −→ W is called affine if, and only if, there exists a linear map L ∈ L(V, W) and w ∈ W such that

∀ x ∈ V: A(x) = (Tw ∘ L)(x) = w + L(x) (1.6)

(i.e. the affine maps are precisely the compositions of linear maps with translations). We denote the set of all affine maps from V into W by A(V, W).

Proposition 1.27. Let V,W,X be vector spaces over the field F .

(a) If L ∈ L(V,W ) and v ∈ V , then L Tv = TLv L ∈ A(V,W ).

(b) If A ∈ A(V,W ), L ∈ L(V,W ), and w ∈ W , then A = Tw L if, and only if,T−w A = L. In particular, A = Tw L is injective (resp. surjective, resp. bijective)if, and only if, L is injective (resp. surjective, resp. bijective).

(c) If A : V −→ W is an affine and bijective, then A−1 is also affine.

(d) If A : V −→ W and B : W −→ X are affine, then so is B A.

(e) Define GA(V ) := A ∈ A(V, V ) : A bijective. Then (GA(V ), ) forms a subgroupof the symmetric group (SV , ) (then, clearly, GL(V ) forms a subgroup of GA(V ),cf. [Phi19, Cor. 6.23]).

Proof. (a): If L ∈ L(V, W) and v, x ∈ V, then

(L ∘ Tv)(x) = L(v + x) = Lv + Lx = (T_{Lv} ∘ L)(x),

proving L ∘ Tv = T_{Lv} ∘ L.

(b) is due to the bijectivity of Tw: One has, since T_{−w} ∘ Tw = Id,

A = Tw ∘ L ⇔ T_{−w} ∘ A = T_{−w} ∘ Tw ∘ L = Id ∘ L = L.


Moreover, for each x, y ∈ V and z ∈ W, one has

Ax = w + Lx = w + Ly = Ay ⇔ Lx = Ly,
z + w = Ax = w + Lx ⇔ z = Lx,
z − w = Lx ⇔ z = w + Lx = Ax,

proving A = Tw ∘ L is injective (resp. surjective, resp. bijective) if, and only if, L is injective (resp. surjective, resp. bijective).

(c): If A = Tw ∘ L with L ∈ L(V, W) and w ∈ W is affine and bijective, then, by (b), L is bijective. Thus, A^{−1} = L^{−1} ∘ (Tw)^{−1} = L^{−1} ∘ T_{−w}, which is affine by (a).

(d): If A = Tw ∘ L, B = Tx ∘ K with L ∈ L(V, W), w ∈ W, K ∈ L(W, X), x ∈ X, then

∀ a ∈ V: (B ∘ A)(a) = B(w + La) = x + Kw + (K ∘ L)(a) = (T_{Kw+x} ∘ (K ∘ L))(a),

showing B ∘ A to be affine.

(e) is an immediate consequence of (c) and (d).
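The computation in the proof of Prop. 1.27(d) can be checked numerically (a sketch with numpy, not part of the notes): for A = Tw ∘ L and B = Tx ∘ K, the composition B ∘ A has linear part K ∘ L and translation vector x + Kw.

```python
import numpy as np

L = np.array([[2.0, 0.0], [0.0, 3.0]]); w = np.array([1.0, -1.0])  # A = T_w ∘ L
K = np.array([[0.0, 1.0], [1.0, 0.0]]); x = np.array([5.0, 0.0])   # B = T_x ∘ K

def A(p): return w + L @ p
def B(p): return x + K @ p

p = np.array([1.0, 2.0])
direct = B(A(p))                     # (B ∘ A)(p)
formula = (x + K @ w) + (K @ L) @ p  # (T_{x+Kw} ∘ (K ∘ L))(p)
print(np.allclose(direct, formula))  # True
```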

Proposition 1.28. Let V and W be vector spaces over the field F .

(a) Let v ∈ V , w ∈ W , L ∈ L(V,W ), and let U be a vector subspace of V . Then

(Tw L)(v + U) = w + Lv + L(U)

(in particular, each affine image of an affine subspace is an affine subspace). More-over, if A := Tw L and S ⊆ V such that M := v + U = aff S, then A(M) =w + Lv + L(U) = aff(A(S)).

(b) Let y ∈ W , L ∈ L(V,W ), and let U be a vector subspace of W . Then L−1(U) is avector subspace of V and

∀v∈L−1y

L−1(y + U) = v + L−1(U)

(in particular, each linear preimage of an affine subspace is either empty or anaffine subspace).

(c) If M ⊆ W is an affine subspace of W and A ∈ A(V, W), then A^{−1}(M) is either empty or an affine subspace of V.

Proof. Exercise.

The following Prop. 1.29 is the analogue of [Phi19, Prop. 6.5(a),(b)] for affine spaces (but cf. Caveat 1.30 below):

Proposition 1.29. Let V and W be vector spaces over the field F, and let A : V −→ W be affine.


(a) If A is injective, then, for each affinely independent subset S of V , A(S) is anaffinely independent subset of W .

(b) A is surjective if, and only if, for each subset S of V with V = aff S, one has W = aff(A(S)).

Proof. Let w ∈ W and L ∈ L(V, W) be such that A = Tw ∘ L.

(a): If A is injective, S ⊆ V is affinely independent, and λ1, . . . , λn ∈ F; s1, . . . , sn ∈ S distinct; n ∈ N; such that ∑_{i=1}^n λ_i = 0 and

0 = ∑_{i=1}^n λ_i A(s_i) = ∑_{i=1}^n λ_i (w + L s_i) = (∑_{i=1}^n λ_i) w + L(∑_{i=1}^n λ_i s_i) = L(∑_{i=1}^n λ_i s_i),

then ∑_{i=1}^n λ_i s_i = 0 by [Phi19, Prop. 6.3(d)], implying λ1 = · · · = λn = 0 and, thus, showing that A(S) is also affinely independent.

(b): If A is not surjective, then aff(A(V)) = A(V) ≠ W, since A(V) is an affine subspace of W by Prop. 1.28(a). Conversely, if A is surjective, S ⊆ V, and aff S = V, then

W = A(V) = A(aff S) = aff(A(S))

by Prop. 1.28(a), thereby establishing the case.

Caveat 1.30. Unlike in [Phi19, Prop. 6.5(a)], the converse of Prop. 1.29(a) is, in general, not true: If dim V ≥ 1 and A ≡ w ∈ W is constant, then A is affine, not injective, but it maps every nonempty affinely independent subset of V (in fact, every nonempty subset of V) onto the affinely independent set {w}.

Corollary 1.31. Let V and W be vector spaces over the field F, and let A : V −→ W be affine and injective. If M ⊆ V is an affine subspace and B is an affine basis of M, then A(B) is an affine basis of A(M) (Caveat 1.30 above shows that the converse is, in general, not true).

Proof. Since B is affinely independent, A(B) is affinely independent by Prop. 1.29(a). On the other hand, A(M) = aff(A(B)) by Prop. 1.28(a).

The following Prop. 1.32 shows that affine subspaces are precisely the images of vector subspaces under translations and also precisely the sets of solutions to linear systems with nonempty sets of solutions:

Proposition 1.32. Let V be a vector space over the field F and M ⊆ V. Then the following statements are equivalent:

(i) M is an affine subspace of V .

(ii) There exist v ∈ V and a vector subspace U ⊆ V such that M = T_v(U).


(iii) There exist a linear map L ∈ L(V, V) and a vector b ∈ V such that ∅ ≠ M = L^{-1}{b} = {x ∈ V : Lx = b} (if V is finite-dimensional, then L^{-1}{b} = L(L|b), where L(L|b) denotes the set of solutions to the linear system Lx = b according to [Phi19, Rem. 8.3]).

Proof. “(i)⇔(ii)”: By the definition of affine subspaces, (i) is equivalent to the existence of v ∈ V and a vector subspace U ⊆ V such that M = v + U = T_v(U), which is (ii).

“(iii)⇒(i)”: Let L ∈ L(V, V) and b ∈ V be such that ∅ ≠ M = L^{-1}{b}. Let x_0 ∈ M. Then, by [Phi19, Th. 4.20(d)], M = x_0 + ker L, showing M to be an affine subspace.

“(i)⇒(iii)”: Now suppose M = v + U with v ∈ V and U a vector subspace of V. According to [Phi19, Th. 5.27(c)], there exists a subspace W of V such that V = U ⊕ W. Then, clearly, L : V −→ V, L(u + w) := w (where u ∈ U, w ∈ W), defines a linear map. Let b := Lv. Then M = L^{-1}{b}: Indeed, if u ∈ U, then L(v + u) = Lv + 0 = Lv = b, showing M ⊆ L^{-1}{b}; if L(u + w) = w = b = Lv, then u + w = v + (u + w − v) ∈ v + U = M (since L(u + w − v) = Lw − Lv = w − Lv = 0 implies u + w − v ∈ U), showing L^{-1}{b} ⊆ M.
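The equivalence “(i)⇔(iii)” can be checked numerically on a small instance; the linear map L, the right-hand side b, and the sample points below are illustrative choices (not data from the notes), using exact rational arithmetic:

```python
from fractions import Fraction as Fr

# An illustrative linear map L : Q^2 -> Q^2 and right-hand side b
def L(v):
    x, y = v
    return (x + y, Fr(0))

b = (Fr(2), Fr(0))
x0 = (Fr(2), Fr(0))   # a particular solution: L(x0) = b
assert L(x0) == b

# By [Phi19, Th. 4.20(d)], the solution set of L x = b is x0 + ker L.
def in_solution_set(v):
    return L(v) == b

def in_translated_kernel(v):
    d = (v[0] - x0[0], v[1] - x0[1])
    return L(d) == (Fr(0), Fr(0))   # i.e. d lies in ker L

samples = [(Fr(2), Fr(0)), (Fr(5), Fr(-3)), (Fr(0), Fr(2)), (Fr(1), Fr(2))]
for v in samples:
    assert in_solution_set(v) == in_translated_kernel(v)
print("solution set of L x = b agrees with x0 + ker L on the samples")
```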

The following Th. 1.33 is the analogue of [Phi19, Th. 6.9] for affine spaces:

Theorem 1.33. Let V and W be vector spaces over the field F. Moreover, let M_V = v + U_V ⊆ V and M_W = w + U_W ⊆ W be affine subspaces of V and W, respectively, where v ∈ V, w ∈ W, U_V is a vector subspace of V, and U_W is a vector subspace of W. Let B_V be an affine basis of M_V and let B_W be an affine basis of M_W. Then the following statements are equivalent:

(i) There exists a linear isomorphism L : U_V −→ U_W such that

M_W = (T_w ∘ L ∘ T_{−v})(M_V).

(ii) U_V and U_W are linearly isomorphic.

(iii) dim M_V = dim M_W.

(iv) #B_V = #B_W (i.e. there exists a bijective map from B_V onto B_W).

Proof. “(i)⇒(ii)” is trivially true.

“(ii)⇒(i)” holds, since the restricted translations T_{−v} : M_V −→ U_V and T_w : U_W −→ M_W are, clearly, bijective.

“(ii)⇔(iii)”: By Def. 1.1, (iii) is equivalent to dim U_V = dim U_W, which, according to [Phi19, Th. 6.9], is equivalent to (ii).

“(iii)⇔(iv)”: Let x ∈ B_V and y ∈ B_W. Then, by Prop. 1.17, S_V := {b − x : b ∈ B_V \ {x}} is a vector space basis of U_V and S_W := {b − y : b ∈ B_W \ {y}} is a vector space basis of U_W, where the restricted translations T_{−x} : B_V \ {x} −→ S_V and T_{−y} : B_W \ {y} −→ S_W are, clearly, bijective. Thus, if dim M_V = dim M_W, then there exists a bijective map φ : S_V −→ S_W, implying (T_y ∘ φ ∘ T_{−x}) : B_V \ {x} −→ B_W \ {y} to be bijective as well. Conversely, if ψ : B_V \ {x} −→ B_W \ {y} is bijective, so is (T_{−y} ∘ ψ ∘ T_x) : S_V −→ S_W, implying dim M_V = dim M_W.

Analogous to [Phi19, Def. 6.17], we now consider, for vector spaces V, W over the field F, A(V,W) with pointwise addition and scalar multiplication, letting, for each A, B ∈ A(V,W) and each λ ∈ F,

(A + B) : V −→ W, (A + B)(x) := A(x) + B(x),

(λ · A) : V −→ W, (λ · A)(x) := λ · A(x).

The following Th. 1.34 corresponds to [Phi19, Th. 6.18] and [Phi19, Th. 6.21] for linear maps.

Theorem 1.34. Let V and W be vector spaces over the field F. Addition and scalar multiplication on A(V,W), given by the pointwise definitions above, are well-defined in the sense that, if A, B ∈ A(V,W) and λ ∈ F, then A + B ∈ A(V,W) and λA ∈ A(V,W). Moreover, with these pointwise defined operations, A(V,W) forms a vector space over F.

Proof. According to [Phi19, Ex. 5.2(c)], it only remains to show that A(V,W) is a vector subspace of F(V,W) = W^V. To this end, let A, B ∈ A(V,W) with A = T_{w_1} ∘ L_1, B = T_{w_2} ∘ L_2, where w_1, w_2 ∈ W, L_1, L_2 ∈ L(V,W), and let λ ∈ F. If v ∈ V, then

(A + B)(v) = w_1 + L_1 v + w_2 + L_2 v = w_1 + w_2 + (L_1 + L_2)v,

(λA)(v) = λw_1 + λL_1 v,

proving A + B = T_{w_1+w_2} ∘ (L_1 + L_2) ∈ A(V,W) and λA = T_{λw_1} ∘ (λL_1) ∈ A(V,W), as desired.

1.6 Affine Geometry

The subject of affine geometry is concerned with the relationships between affine subspaces, in particular, with the way they are contained in each other.

Definition 1.35. Let V be a vector space over the field F and let M, N ⊆ V be affine subspaces.

(a) We define the incidence M I N by

M I N :⇔ (M ⊆ N ∨ N ⊆ M).

If M I N holds, then we call M, N incident, or M incident with N, or N incident with M.

(b) If M = v + U_M and N = w + U_N with v, w ∈ V and U_M, U_N vector subspaces of V, then we call M, N parallel (denoted M ‖ N) if, and only if, U_M I U_N.


Proposition 1.36. Let V be a vector space over the field F and let M, N ⊆ V be affine subspaces.

(a) If M ‖ N, then M I N or M ∩ N = ∅.

(b) If n ∈ N_0 and A_n denotes the set of n-dimensional affine subspaces of V, then the parallelism relation of Def. 1.35(b) constitutes an equivalence relation on A_n.

(c) If A denotes the set of all affine subspaces of V, then, for dim V ≥ 2, the parallelism relation of Def. 1.35(b) is not transitive (in particular, not an equivalence relation) on A.

Proof. (a): Let M = v + U_M and N = w + U_N with v, w ∈ V and U_M, U_N vector subspaces of V. Without loss of generality, assume U_M ⊆ U_N. Suppose there exists x ∈ M ∩ N. Then, if y ∈ M, we have y − x ∈ U_M ⊆ U_N, implying y = x + (y − x) ∈ N and M ⊆ N.

(b): It is immediate from Def. 1.35 that ‖ is reflexive and symmetric. It remains to show ‖ is transitive on A_n. Thus, suppose M = v + U_M, N = w + U_N, P = z + U_P with v, w, z ∈ V and U_M, U_N, U_P vector subspaces of V of dimension n. If M ‖ N, then U_M I U_N and dim U_M = dim U_N = n implies U_M = U_N by [Phi19, Th. 5.27(d)]. In the same way, N ‖ P implies U_N = U_P. But then U_M = U_P and M ‖ P, proving transitivity of ‖.

(c): Let u, w ∈ V be linearly independent, U := 〈{u}〉, W := 〈{w}〉. Then U ‖ V and W ‖ V, but U ∦ W (e.g., due to (a)).

Caveat 1.37. The statement of Prop. 1.36(b) becomes false if n ∈ N_0 is replaced by an infinite cardinality: In an adaptation of the proof of Prop. 1.36(c), suppose V is a vector space over the field F in which the distinct vectors v_1, v_2, … are linearly independent, and define B := {v_i : i ∈ N}, U := 〈B \ {v_1}〉, W := 〈B \ {v_2}〉, X := 〈B〉. Then, clearly, U ‖ X and W ‖ X, but U ∦ W (e.g., due to Prop. 1.36(a)).

Proposition 1.38. Let V be a vector space over the field F .

(a) If x, y ∈ V with x ≠ y, then there exists a unique line l ⊆ V (i.e. a unique affine subspace l of V with dim l = 1) such that x, y ∈ l. Moreover, this affine subspace is given by

l = x + 〈{x − y}〉.  (1.7)

(b) If x, y, z ∈ V and there does not exist a line l ⊆ V such that x, y, z ∈ l, then there exists a unique plane p ⊆ V (i.e. a unique affine subspace p of V with dim p = 2) such that x, y, z ∈ p. Moreover, this affine subspace is given by

p = x + 〈{y − x, z − x}〉.  (1.8)

(c) If v_1, …, v_n ∈ V, n ∈ N, then aff{v_1, …, v_n} = v_1 + 〈{v_2 − v_1, …, v_n − v_1}〉.


Proof. Exercise.

Proposition 1.39. Let V, W be vector spaces over the field F and let M, N ⊆ V be affine subspaces.

(a) If A ∈ A(V,W), then M I N implies A(M) I A(N), and M ‖ N implies A(M) ‖ A(N).

(b) If v ∈ V , then Tv(M) ‖M .

Proof. (a): Let A ∈ A(V,W). Then M I N implies A(M) I A(N), since M ⊆ N implies A(M) ⊆ A(N), and A(M), A(N) are affine subspaces of W due to Prop. 1.28(a). Moreover, if M = v + U_M, N = w + U_N with v, w ∈ V and U_M, U_N vector subspaces of V, and A = T_x ∘ L with x ∈ W and L ∈ L(V,W), then A(M) = x + Lv + L(U_M) and A(N) = x + Lw + L(U_N), such that M ‖ N implies A(M) ‖ A(N), since U_M ⊆ U_N implies L(U_M) ⊆ L(U_N).

(b) is immediate from T_v(M) = v + w + U for M = w + U with w ∈ V and U a vector subspace of V.

2 Duality

2.1 Linear Forms and Dual Spaces

If V is a vector space over the field F, then maps from V into F are often of particular interest and importance. Such maps are sometimes called functionals or forms. Here, we will mostly be concerned with linear forms. Let us briefly review some examples of linear forms that we already encountered in [Phi19]:

Example 2.1. Let V be a vector space over the field F .

(a) Let B be a basis of V. If c_v : B_v −→ F \ {0}, B_v ⊆ B, are the corresponding coordinate maps (i.e. v = ∑_{b ∈ B_v} c_v(b) b for each v ∈ V), then, for each b ∈ B, the projection onto the coordinate with respect to b,

π_b : V −→ F, π_b(v) := c_v(b) for b ∈ B_v, π_b(v) := 0 for b ∉ B_v,

is a linear form (cf. [Phi19, Ex. 6.7(b)]).

(b) Let I be a nonempty set, V := F(I, F) = F^I (i.e. the vector space of functions from I into F). Then, for each i ∈ I, the projection onto the ith coordinate,

π_i : V −→ F, π_i(f) := f(i),

is a linear form (cf. [Phi19, Ex. 6.7(c)]).


(c) Let F := K, where, as in [Phi19], we write K if K may stand for R or C. Let V be the set of convergent sequences in K. Then

A : V −→ K, A((z_n)_{n∈N}) := lim_{n→∞} z_n,

is a linear form (cf. [Phi19, Ex. 6.7(e)(i)]).

(d) Let a, b ∈ R, a ≤ b, I := [a, b], and let V := R(I,K) be the set of all K-valued Riemann integrable functions on I. Then

J : V −→ K, J(f) := ∫_I f,

is a linear form (cf. [Phi19, Ex. 6.7(e)(iii)]).

Definition 2.2. Let V be a vector space over the field F .

(a) The functions from V into F (i.e. the elements of F(V, F) = F^V) are called functionals or forms on V. In particular, the elements of L(V, F) are called linear functionals or linear forms on V.

(b) The set

V′ := L(V, F)  (2.1)

is called the (linear²) dual space (or just the dual) of V (in the literature, one often also finds the notation V* instead of V′). We already know from [Phi19, Th. 6.18] that V′ constitutes a vector space over F.

Corollary 2.3. Let V be a vector space over the field F. Then each linear form α : V −→ F is uniquely determined by its values on a basis of V. More precisely, if B is a basis of V, (λ_b)_{b∈B} is a family in F, and, for each v ∈ V, c_v : B_v −→ F \ {0}, B_v ⊆ B, is the corresponding coordinate map (i.e. v = ∑_{b ∈ B_v} c_v(b) b for each v ∈ V), then

α : V −→ F, α(v) = α(∑_{b ∈ B_v} c_v(b) b) := ∑_{b ∈ B_v} c_v(b) λ_b,  (2.2)

is linear, and every α̃ ∈ V′ with α̃(b) = λ_b for each b ∈ B satisfies α̃ = α.

Proof. Corollary 2.3 constitutes a special case of [Phi19, Th. 6.6].

²In Functional Analysis, where the vector space V over K is endowed with the additional structure of a topology (e.g., V might be the normed space K^n), one defines the (topological) dual V′_top of V (there usually also just denoted as V′ or V*) to consist of all linear functionals on V that are also continuous with respect to the topology on V (cf. [Phi17c, Ex. 3.1]). Depending on the topology on V, V′_top can be much smaller than V′ – V′_top tends to be much more useful in an analysis context.


Corollary 2.4. Let V be a vector space over the field F and let B be a basis of V. Using Cor. 2.3, define linear forms α_b ∈ V′ by letting

α_b(a) := δ_{ba} for each (b, a) ∈ B × B (i.e. α_b(a) = 1 for a = b and α_b(a) = 0 for a ≠ b).  (2.3)

Define

B′ := {α_b : b ∈ B}.  (2.4)

(a) Then B′ is linearly independent.

(b) If V is finite-dimensional, dim V = n ∈ N, then B′ constitutes a basis of V′ (in particular, dim V = dim V′). In this case, B′ is called the dual basis of B (and B the dual basis of B′).

(c) If dim V = ∞, then 〈B′〉 ⊊ V′ and, in particular, B′ is not a basis of V′ (in fact, in this case, one has dim V′ > dim V, see [Jac75, pp. 244–248]).

Proof. Cor. 2.4(a),(b),(c) constitute special cases of the corresponding cases of [Phi19, Th. 6.19].

Definition 2.5. If V is a vector space over the field F with dim V = n ∈ N and B := (b_1, …, b_n) is an ordered basis of V, then we call B′ := (α_1, …, α_n), where

α_i ∈ V′ ∧ α_i(b_j) = δ_{ij} for each i ∈ {1, …, n},  (2.5)

the ordered dual basis of B (and B the ordered dual basis of B′) – according to Cor. 2.4(b), B′ is, indeed, an ordered basis of V′.

Example 2.6. Consider V := R². If b_1 := (1, 0), b_2 := (1, 1), then B := (b_1, b_2) is an ordered basis of V. The ordered dual basis B′ = (α_1, α_2) of V′ consists of the maps α_1, α_2 ∈ V′ with α_1(b_1) = α_2(b_2) = 1, α_1(b_2) = α_2(b_1) = 0, i.e. with, for each (v_1, v_2) ∈ V,

α_1(v_1, v_2) = α_1((v_1 − v_2)b_1 + v_2 b_2) = v_1 − v_2,

α_2(v_1, v_2) = α_2((v_1 − v_2)b_1 + v_2 b_2) = v_2.
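As a numerical cross-check of Example 2.6 (a sketch, not part of the notes; the helper `inv2` and the rational arithmetic via `fractions` are illustrative choices): the coordinate rows of the dual basis, with respect to the standard dual basis, are the rows of the inverse of the matrix whose columns are b_1, b_2.

```python
from fractions import Fraction as Fr

def inv2(m):
    # inverse of a 2x2 matrix over the rationals
    (a, b), (c, d) = m
    det = a * d - b * c
    assert det != 0
    return [[ d / det, -b / det],
            [-c / det,  a / det]]

# basis B = (b1, b2) of R^2 from Example 2.6, written as matrix columns
b1, b2 = (Fr(1), Fr(0)), (Fr(1), Fr(1))
B = [[b1[0], b2[0]],
     [b1[1], b2[1]]]

Binv = inv2(B)   # rows of B^{-1} represent alpha_1, alpha_2

def alpha(i, v):
    # evaluate the i-th dual basis form on v
    return Binv[i][0] * v[0] + Binv[i][1] * v[1]

# duality alpha_i(b_j) = delta_ij, and alpha_1(v) = v1 - v2, alpha_2(v) = v2
print(alpha(0, (Fr(5), Fr(3))), alpha(1, (Fr(5), Fr(3))))
```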

Notation 2.7. Let V be a vector space over the field F, dim V = n ∈ N, with ordered basis B = (b_1, …, b_n). Moreover, let B′ = (α_1, …, α_n) be the corresponding ordered dual basis of V′. If one then denotes the coordinates of v ∈ V with respect to B as the column vector

v = (v_1, …, v_n)^t,

then one typically denotes the coordinates of γ ∈ V′ with respect to B′ as the row vector

γ = (γ_1 … γ_n)

(this has the advantage that one can then express γ(v) as a matrix product, cf. Rem. 2.8(a) below).


Remark 2.8. We remain in the situation of Not. 2.7 above.

(a) We obtain

γ(v) = (∑_{k=1}^n γ_k α_k)(∑_{l=1}^n v_l b_l) = ∑_{l=1}^n ∑_{k=1}^n γ_k v_l α_k(b_l) = ∑_{l=1}^n ∑_{k=1}^n γ_k v_l δ_{kl} = ∑_{k=1}^n γ_k v_k = (γ_1 … γ_n)(v_1, …, v_n)^t.

(b) Let B̃_V := (ṽ_1, …, ṽ_n) be another ordered basis of V and (c_ji) ∈ GL_n(F) such that

ṽ_i = ∑_{j=1}^n c_ji v_j for each i ∈ {1, …, n}.

If B̃′_V := (α̃_1, …, α̃_n) denotes the ordered dual basis corresponding to B̃_V and (d_ji) := (c_ji)^{−1}, then

α̃_i = ∑_{j=1}^n d^t_{ji} α_j = ∑_{j=1}^n d_{ij} α_j for each i ∈ {1, …, n},

where (d^t_{ji}) denotes the transpose of (d_ji), i.e. (d^t_{ji}) ∈ GL_n(F) with d^t_{ji} := d_{ij} for each (j, i) ∈ {1, …, n} × {1, …, n}: Indeed, for each k, l ∈ {1, …, n}, we obtain

(∑_{j=1}^n d_{kj} α_j)(ṽ_l) = (∑_{j=1}^n d_{kj} α_j)(∑_{i=1}^n c_{il} v_i) = ∑_{j=1}^n ∑_{i=1}^n d_{kj} c_{il} δ_{ji} = ∑_{j=1}^n d_{kj} c_{jl} = δ_{kl}.
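This transformation rule can be verified numerically: with D := C^{-1}, evaluating the transformed dual forms on the new basis vectors amounts to computing the entries of the product DC, which must be the identity. The concrete 2×2 matrix C below is an illustrative choice, not data from the notes.

```python
from fractions import Fraction as Fr

# change-of-basis matrix (c_ji): its columns hold the coordinates of the
# new basis vectors in terms of the old basis (illustrative numbers)
C = [[Fr(1), Fr(2)],
     [Fr(1), Fr(3)]]

def inv2(m):
    # inverse of a 2x2 matrix over the rationals
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[ d / det, -b / det],
            [-c / det,  a / det]]

D = inv2(C)   # (d_ji) := (c_ji)^{-1}

# evaluating the transformed form on the new basis vector gives (D C)_{il}
for i in range(2):
    for l in range(2):
        val = sum(D[i][j] * C[j][l] for j in range(2))
        assert val == (1 if i == l else 0)
print("transformed dual forms evaluate to delta_il on the new basis")
```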

Proposition 2.9. Let V be a vector space over the field F. If U is a vector subspace of V and v ∈ V \ U, then there exists α ∈ V′ satisfying

α(v) = 1 ∧ α(u) = 0 for each u ∈ U.

Proof. Let B_U be a basis of U. Then B_v := {v} ∪ B_U is linearly independent and, according to [Phi19, Th. 5.23(a)], there exists a basis B of V such that B_v ⊆ B. According to Cor. 2.3,

α : V −→ F, α(b) := 1 for b = v, α(b) := 0 for b ∈ B \ {v},

defines an element of V′, which, clearly, satisfies the required conditions.


Definition 2.10. Let V be a vector space over the field F .

(a) The map

〈·, ·〉 : V × V′ −→ F, 〈v, α〉 := α(v),  (2.6)

is called the dual pairing corresponding to V.

(b) The dual of V ′ is called the bidual or the second dual of V . One writes V ′′ := (V ′)′.

(c) The map

Φ : V −→ V′′, v ↦ Φv, (Φv) : V′ −→ F, (Φv)(α) := α(v),  (2.7)

is called the canonical embedding of V into V ′′ (cf. Th. 2.11 below).

Theorem 2.11. Let V be a vector space over the field F .

(a) The canonical embedding Φ : V −→ V′′ of (2.7) is a linear monomorphism (i.e. a linear isomorphism Φ : V ≅ Im Φ ⊆ V′′).

(b) If dim V = n ∈ N, then Φ is a linear isomorphism Φ : V ≅ V′′ (in fact, the converse is also true, i.e., if Φ is an isomorphism, then dim V < ∞, cf. the remark in Cor. 2.4(c)).

Proof. (a): Exercise.

(b): According to Cor. 2.4(b), n = dim V = dim V′ = dim V′′. Thus, by [Phi19, Th. 6.10], the linear monomorphism Φ is also an epimorphism.

Corollary 2.12. Let V be a vector space over the field F, dim V = n ∈ N. If B′ = {α_1, …, α_n} is a basis of V′, then there exists a basis B of V such that B and B′ are dual.

Proof. According to Th. 2.11(b), the canonical embedding Φ : V −→ V′′ of (2.7) constitutes a linear isomorphism. Let B′′ = {f_1, …, f_n} be the basis of V′′ that is dual to B′ and, for each i ∈ {1, …, n}, let b_i := Φ^{−1}(f_i). Then, as Φ is a linear isomorphism, B := {b_1, …, b_n} is a basis of V. Moreover, B and B′ are dual:

α_i(b_j) = (Φb_j)(α_i) = f_j(α_i) = δ_{ij} for each i, j ∈ {1, …, n},

where we used that B′ and B′′ are dual.

2.2 Annihilators

Definition 2.13. Let V be a vector space over the field F, M ⊆ V, S ⊆ V′. Moreover, let Φ : V −→ V′′ denote the canonical embedding of (2.7). Then

M⊥ := {α ∈ V′ : α(v) = 0 for each v ∈ M} = V′ for M = ∅, and M⊥ = ⋂_{v ∈ M} ker(Φv) for M ≠ ∅,

is called the (forward) annihilator of M in V′, and

S⊤ := {v ∈ V : α(v) = 0 for each α ∈ S} = V for S = ∅, and S⊤ = ⋂_{α ∈ S} ker α for S ≠ ∅,

is called the (backward) annihilator of S in V. In view of Rem. 2.15 and Ex. 2.16(b) below, one also calls v ∈ V and α ∈ V′ such that

α(v) = 〈v, α〉 = 0  (cf. (2.6))

perpendicular or orthogonal and, in consequence, sets M⊥ and S⊤ are sometimes called M perp and S perp, respectively.

Lemma 2.14. Let V be a vector space over the field F, M ⊆ V, S ⊆ V′. Then M⊥ is a subspace of V′ and S⊤ is a subspace of V. Moreover,

M⊥ = 〈M〉⊥, S⊤ = 〈S〉⊤.  (2.8)

Proof. Since M⊥ and S⊤ are both intersections of kernels of linear maps, they are subspaces: kernels are subspaces by [Phi19, Prop. 6.3(c)] and intersections of subspaces are subspaces by [Phi19, Th. 5.7(a)]. Moreover, it is immediate from Def. 2.13 that M⊥ ⊇ 〈M〉⊥ and S⊤ ⊇ 〈S〉⊤. On the other hand, consider α ∈ M⊥ and v ∈ S⊤. Let λ_1, …, λ_n ∈ F, n ∈ N. If v_1, …, v_n ∈ M, then

α(∑_{i=1}^n λ_i v_i) = ∑_{i=1}^n λ_i α(v_i) = 0 (since α ∈ M⊥),

showing α ∈ 〈M〉⊥ and M⊥ ⊆ 〈M〉⊥. Analogously, if α_1, …, α_n ∈ S, then

(∑_{i=1}^n λ_i α_i)(v) = ∑_{i=1}^n λ_i α_i(v) = 0 (since v ∈ S⊤),

showing v ∈ 〈S〉⊤ and S⊤ ⊆ 〈S〉⊤.

Remark 2.15. On real vector spaces V, one can study so-called scalar products (also called inner products), 〈·, ·〉 : V × V −→ R, (v, w) ↦ 〈v, w〉 ∈ R, which, as part of their definition, are required to be bilinear forms, i.e., for each v ∈ V, 〈v, ·〉 : V −→ R is a linear form and, for each w ∈ V, 〈·, w〉 : V −→ R is a linear form (we will come back to vector spaces with scalar products in a later section). One then calls vectors v, w ∈ V perpendicular or orthogonal with respect to 〈·, ·〉 if, and only if, 〈v, w〉 = 0, so that the notions of Def. 2.13 can be seen as generalizing orthogonality with respect to scalar products (also cf. Ex. 2.16(b) below).

Example 2.16. (a) Let V be a vector space over the field F and let U be a subspace of V with B_U being a basis of U. Then, according to [Phi19, Th. 5.23(a)], there exists a basis B of V such that B_U ⊆ B. Then Cor. 2.3 implies, for each α ∈ V′,

α ∈ U⊥ ⇔ α(b) = 0 for each b ∈ B_U.


(b) Consider the real vector space R² and let

〈·, ·〉 : R² × R² −→ R, 〈(v_1, v_2), (w_1, w_2)〉 := v_1 w_1 + v_2 w_2,

denote the so-called Euclidean scalar product on R². Then, clearly, for each w = (w_1, w_2) ∈ R²,

α_w : R² −→ R, α_w(v) := 〈v, w〉 = v_1 w_1 + v_2 w_2,

defines a linear form on R². Let v := (1, 2). Then the span of v, i.e. l_v := {(λ, 2λ) : λ ∈ R}, represents the line through v. Moreover, for each w = (w_1, w_2) ∈ R²,

α_w ∈ {v}⊥ = l_v⊥ ⇔ α_w(v) = w_1 + 2w_2 = 0 ⇔ w_1 = −2w_2 ⇔ w ∈ l⊥ := {(−2λ, λ) : λ ∈ R}.

Thus, l⊥ is spanned by (−2, 1) and we see that l_v⊥ consists precisely of the linear forms α_w that are given by vectors w that are perpendicular to v in the Euclidean geometrical sense (i.e. in the sense usually taught in high school geometry).
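The computation in (b) is easy to confirm directly (a sketch with the concrete vectors of the example; nothing here is beyond the notes' own data):

```python
# Euclidean scalar product on R^2 and the line through v = (1, 2)
def dot(v, w):
    return v[0] * w[0] + v[1] * w[1]

v = (1, 2)

# alpha_w annihilates l_v = span{v} iff dot(v, w) = 0, i.e. iff
# w lies on the perpendicular line spanned by (-2, 1)
for lam in [-3, 0, 1, 7]:
    w = (-2 * lam, lam)
    assert dot(v, w) == 0

# a w outside span{(-2, 1)} gives a form that does not annihilate v
assert dot(v, (1, 1)) != 0
print("l_v^perp corresponds exactly to the w in span{(-2, 1)}")
```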

The following notions, defined for linear forms in connection with subspaces, can sometimes be useful when studying annihilators:

Definition 2.17. Let V be a vector space over the field F and let U be a subspace of V. Then

R : V′ −→ U′, Rf := f↾_U,

is called the restriction operator from V to U;

I : (V/U)′ −→ V′, (Ig)(v) := g(v + U),

is called the inflation operator from V/U to V.

Theorem 2.18. Let V be a vector space over the field F and let U be a subspace of V, with the restriction operator R and the inflation operator I defined as in Def. 2.17.

(a) R : V′ −→ U′ is a linear epimorphism with ker R = U⊥. Moreover,

dim U⊥ + dim U′ = dim V′  (2.9)

and

U′ ≅ V′/U⊥  (2.10)

(see [Phi19, Th. 6.8(a)] for the precise meaning of (2.9) in case at least one of the occurring cardinalities is infinite). If dim V = n ∈ N, then one also has

n = dim V = dim U⊥ + dim U.  (2.11)

(b) I is a linear isomorphism I : (V/U)′ ≅ U⊥.


Proof. (a): Let α, β ∈ V′ and λ, µ ∈ F. Then, for each u ∈ U,

R(λα + µβ)(u) = λα(u) + µβ(u) = λ(Rα)(u) + µ(Rβ)(u) = (λ(Rα) + µ(Rβ))(u),

showing R to be linear. Moreover, for each α ∈ V′, one has

α ∈ ker R ⇔ α(u) = 0 for each u ∈ U ⇔ α ∈ U⊥,

proving ker R = U⊥. Let B_U be a basis of U. Then, according to [Phi19, Th. 5.23(a)], there exists C ⊆ V such that B_U ∪ C is a basis of V. Consider α ∈ U′. Using Cor. 2.3, define β ∈ V′ by setting

β(b) := α(b) for b ∈ B_U, β(b) := 0 for b ∈ C.

Then, clearly, Rβ = α, showing R to be surjective. Thus, we have

dim V′ = dim ker R + dim Im R = dim U⊥ + dim U′  (by [Phi19, Th. 6.8(a)]),

thereby proving (2.9). Next, applying the isomorphism theorem of [Phi19, Th. 6.16(a)] yields

U′ = Im R ≅ V′/ker R = V′/U⊥,

which is (2.10). Finally, if dim V = n ∈ N, then

n = dim V = dim V′ = dim U⊥ + dim U′ = dim U⊥ + dim U,

using Cor. 2.4(b) for the second and fourth equalities and (2.9) for the third, proving (2.11).

(b): Exercise.

Theorem 2.19. Let V be a vector space over the field F .

(a) If U is a subspace of V , then (U⊥)⊤ = U .

(b) If S is a subspace of V′, then S ⊆ (S⊤)⊥. If dim V = n ∈ N, then one even has (S⊤)⊥ = S.

(c) If U_1, U_2 are subspaces of V, then

(U_1 + U_2)⊥ = U_1⊥ ∩ U_2⊥, (U_1 ∩ U_2)⊥ = U_1⊥ + U_2⊥.

(d) If S_1, S_2 are subspaces of V′, then

(S_1 + S_2)⊤ = S_1⊤ ∩ S_2⊤, (S_1 ∩ S_2)⊤ ⊇ S_1⊤ + S_2⊤.

If dim V = n ∈ N, then one also has

(S_1 ∩ S_2)⊤ = S_1⊤ + S_2⊤.


Proof. (a): Exercise.

(b): According to Def. 2.13, we have

(S⊤)⊥ = {α ∈ V′ : α(v) = 0 for each v ∈ S⊤},

showing S ⊆ (S⊤)⊥. Now assume dim V = n ∈ N and suppose there exists α ∈ (S⊤)⊥ \ S. Then, according to Prop. 2.9, there exists f ∈ V′′ satisfying

f(α) = 1 ∧ f(β) = 0 for each β ∈ S.

Since dim V = n ∈ N, we may employ Th. 2.11(b) to conclude that the canonical embedding Φ : V −→ V′′ is a linear isomorphism, in particular, surjective. Thus, there exists v ∈ V such that f = Φv, i.e. f(γ) = γ(v) for each γ ∈ V′. Since f ∈ S⊥, we have β(v) = f(β) = 0 for each β ∈ S, showing v ∈ S⊤. Thus, α ∈ (S⊤)⊥ implies the contradiction 0 = α(v) = f(α) = 1. In consequence, (S⊤)⊥ \ S = ∅, proving (b).

(c): Let α ∈ (U_1 + U_2)⊥. Then U_1 ⊆ U_1 + U_2 implies α ∈ U_1⊥, U_2 ⊆ U_1 + U_2 implies α ∈ U_2⊥, showing α ∈ U_1⊥ ∩ U_2⊥ and (U_1 + U_2)⊥ ⊆ U_1⊥ ∩ U_2⊥. Conversely, if α ∈ U_1⊥ ∩ U_2⊥ and u_1 ∈ U_1, u_2 ∈ U_2, then α(u_1 + u_2) = α(u_1) + α(u_2) = 0, showing α ∈ (U_1 + U_2)⊥ and U_1⊥ ∩ U_2⊥ ⊆ (U_1 + U_2)⊥. To prove the second equality in (c), first, let α ∈ U_1⊥ + U_2⊥, i.e. α = α_1 + α_2 with α_1 ∈ U_1⊥, α_2 ∈ U_2⊥. Then, if v ∈ U_1 ∩ U_2, one has α(v) = α_1(v) + α_2(v) = 0, showing α ∈ (U_1 ∩ U_2)⊥ and U_1⊥ + U_2⊥ ⊆ (U_1 ∩ U_2)⊥. Conversely, let α ∈ (U_1 ∩ U_2)⊥. We now proceed similarly to the proof of [Phi19, Th. 5.30(c)]: We choose bases B_∩, B_{U_1}, B_{U_2} of U_1 ∩ U_2, U_1, and U_2, respectively, such that B_∩ ⊆ B_{U_1} and B_∩ ⊆ B_{U_2}, defining B_1 := B_{U_1} \ B_∩, B_2 := B_{U_2} \ B_∩. Then it was shown in the proof of [Phi19, Th. 5.30(c)] that

B_+ := B_{U_1} ∪ B_{U_2} = B_1 ∪ B_2 ∪ B_∩

is a basis of U_1 + U_2, and we may choose C ⊆ V such that B := B_+ ∪ C is a basis of V. Using Cor. 2.3, we now define α_1, α_2 ∈ V′ by setting, for each b ∈ B,

α_1(b) := α(b) for b ∈ B \ B_1, α_1(b) := 0 for b ∈ B_1; α_2(b) := α(b) for b ∈ B_1, α_2(b) := 0 for b ∈ B \ B_1.

Since α↾_{B_∩} ≡ 0, we obtain α_1↾_{B_{U_1}} = α_1↾_{B_1 ∪ B_∩} ≡ 0 and α_2↾_{B_{U_2}} = α_2↾_{B_2 ∪ B_∩} ≡ 0, showing α_1 ∈ U_1⊥ and α_2 ∈ U_2⊥. On the other hand, we have, for each b ∈ B, α(b) = α_1(b) + α_2(b), showing α = α_1 + α_2, α ∈ U_1⊥ + U_2⊥, and (U_1 ∩ U_2)⊥ ⊆ U_1⊥ + U_2⊥, thereby completing the proof of (c).

(d): Exercise.

2.3 Hyperplanes and Linear Systems

In the present section, we combine duality with the theory of affine spaces of Sec. 1 and with the theory of linear systems of [Phi19, Sec. 8].


Definition 2.20. Let V be a vector space over the field F. If α ∈ V′ \ {0} and r ∈ F, then the set

H_{α,r} := α^{−1}{r} = {v ∈ V : α(v) = r} ⊆ V

is called a hyperplane in V.

Notation 2.21. Let V be a vector space over the field F, v ∈ V, and α ∈ V′. We then write

v⊥ := {v}⊥, α⊤ := {α}⊤.

Theorem 2.22. Let V be a vector space over the field F.

(a) Each hyperplane H in V is an affine subspace of V, where dim V = 1 + dim H, i.e. dim V = dim H if V is infinite-dimensional, and dim H = n − 1 if dim V = n ∈ N. More precisely, if 0 ≠ α ∈ V′ and r ∈ F, then, for each w ∈ V with α(w) ≠ 0,

H_{α,r} = r w/α(w) + α⊤ = r w/α(w) + ker α.  (2.12)

(b) If dim V = n ∈ N and M is an affine subspace of V with dim M = n − 1, then M is a hyperplane in V, i.e. there exist 0 ≠ α ∈ V′ and r ∈ F such that M = H_{α,r}.

(c) Let α, β ∈ V′ \ {0} and r, s ∈ F. Then

H_{α,r} = H_{β,s} ⇔ there exists 0 ≠ λ ∈ F with (β = λα ∧ s = λr).

Moreover, if α = β, then H_{α,r} and H_{β,s} are parallel.

Proof. (a): Let 0 ≠ α ∈ V′ and r ∈ F, and let w ∈ V be such that α(w) ≠ 0. Define v := r w/α(w). Then α(v) = r, i.e. v ∈ H_{α,r} = α^{−1}{r}. Thus, by [Phi19, Th. 4.20(d)], we have

H_{α,r} = v + ker α = v + {x ∈ V : α(x) = 0} = v + α⊤,

proving (2.12). In particular, we have dim H_{α,r} = dim ker α and, by [Phi19, Th. 6.8(a)],

dim V = dim ker α + dim Im α = dim H_{α,r} + dim F = dim H_{α,r} + 1,

thereby completing the proof of (a).

(b),(c): Exercise.
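Formula (2.12) lends itself to a quick numerical check; the form α, the level r, and the vectors w, k below are illustrative choices over Q²:

```python
from fractions import Fraction as Fr

# a nonzero linear form on Q^2 and a level r (illustrative choices)
def alpha(v):
    return v[0] + 2 * v[1]

r = Fr(4)
w = (Fr(0), Fr(1))   # any w with alpha(w) != 0 works
v0 = (r * w[0] / alpha(w), r * w[1] / alpha(w))   # v0 := r*w/alpha(w)
assert alpha(v0) == r   # v0 lies on the hyperplane H_{alpha,r}

# k spans ker(alpha), so every point of v0 + ker(alpha) lies on H_{alpha,r}
k = (Fr(2), Fr(-1))
assert alpha(k) == 0
for t in [Fr(-3), Fr(0), Fr(1, 2), Fr(7)]:
    p = (v0[0] + t * k[0], v0[1] + t * k[1])
    assert alpha(p) == r
print("H_{alpha,r} = r*w/alpha(w) + ker(alpha) verified on sample points")
```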

Proposition 2.23. Let V be a vector space over the field F, dim V = n ∈ N. If M ⊆ V is an affine subspace of V with dim M = m ∈ N_0, m < n, then M is the intersection of n − m hyperplanes in V.

Proof. If dim M = m, then there exist a linear subspace U of V with dim U = m and v ∈ V such that M = v + U. Then, according to (2.11), dim U⊥ = n − m. Let {α_1, …, α_{n−m}} be a basis of U⊥. Define

r_i := α_i(v) for each i ∈ {1, …, n − m}.


We claim M = N := ⋂_{i=1}^{n−m} H_{α_i,r_i}: Indeed, if x = v + u with u ∈ U, then, for each i ∈ {1, …, n − m},

α_i(x) = α_i(v) + α_i(u) = r_i + 0 = r_i (using α_i ∈ U⊥),

showing x ∈ N and M ⊆ N. Conversely, let x ∈ N. Then, for each i ∈ {1, …, n − m},

α_i(x − v) = r_i − r_i = 0, i.e. x − v ∈ α_i⊤,

implying

x − v ∈ 〈{α_1, …, α_{n−m}}〉⊤ = (U⊥)⊤ = U (by Th. 2.19(a)),

showing x ∈ v + U = M and N ⊆ M, as claimed.
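Prop. 2.23 can be illustrated with a line in Q³ written as the intersection of two hyperplanes; the line, the basis of U⊥, and the sample points below are illustrative choices:

```python
from fractions import Fraction as Fr

# the line M = v + <u> in Q^3 (illustrative data)
v = (Fr(1), Fr(0), Fr(0))
u = (Fr(1), Fr(1), Fr(1))

# a basis of U^perp: alpha_1(x) = x1 - x2, alpha_2(x) = x2 - x3
alphas = [lambda x: x[0] - x[1], lambda x: x[1] - x[2]]
rs = [a(v) for a in alphas]   # r_i := alpha_i(v)

def on_line(x):
    # x in M  iff  x - v is a scalar multiple of u
    d = tuple(x[i] - v[i] for i in range(3))
    return d[1] * u[0] == d[0] * u[1] and d[2] * u[0] == d[0] * u[2]

def in_hyperplanes(x):
    return all(a(x) == r for a, r in zip(alphas, rs))

samples = [(Fr(3), Fr(2), Fr(2)), (Fr(0), Fr(-1), Fr(-1)), (Fr(1), Fr(1), Fr(0))]
for x in samples:
    assert on_line(x) == in_hyperplanes(x)
print("M equals the intersection of the two hyperplanes on the samples")
```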

Example 2.24. Let F be a field. As in [Phi19, Sec. 8.1], consider the linear system

∑_{k=1}^n a_jk x_k = b_j for each j ∈ {1, …, m},  (2.13)

where m, n ∈ N; b_1, …, b_m ∈ F; and a_ji ∈ F for j ∈ {1, …, m}, i ∈ {1, …, n}. We know that we can also write (2.13) in matrix form as Ax = b with A = (a_ji) ∈ M(m, n, F) and b = (b_1, …, b_m)^t ∈ M(m, 1, F) ≅ F^m. The set of solutions to (2.13) is

L(A|b) = {x ∈ F^n : Ax = b}.

If we now define the linear forms

α_j : F^n −→ F, α_j((v_1, …, v_n)^t) := (a_j1 … a_jn)(v_1, …, v_n)^t = ∑_{k=1}^n a_jk v_k for each j ∈ {1, …, m},

then we can rewrite (2.13) as

x = (x_1, …, x_n)^t ∈ H_{α_j,b_j} for each j ∈ {1, …, m} ⇔ x ∈ ⋂_{j=1}^m H_{α_j,b_j}.  (2.14)

Thus, we have L(A|b) = ⋂_{j=1}^m H_{α_j,b_j}, and we can view (2.14) as a geometric interpretation of (2.13), namely that the solution vectors x are required to lie in the intersection of the m hyperplanes H_{α_1,b_1}, …, H_{α_m,b_m}. Even though we know from [Phi19, Th. 8.15] that the elementary row operations of [Phi19, Def. 8.13] do not change the set of solutions L(A|b), it might be instructive to reexamine this fact in terms of linear forms and hyperplanes: The elementary row operation of row switching merely corresponds to changing the order of the H_{α_j,b_j} in the intersection yielding L(A|b). The elementary row operation of row multiplication r_j ↦ λr_j (0 ≠ λ ∈ F) does not change L(A|b), due to H_{α_j,b_j} = H_{λα_j,λb_j} according to Th. 2.22(c). The elementary row operation of row addition r_j ↦ r_j + λr_i (λ ∈ F, i ≠ j) replaces H_{α_j,b_j} by H_{α_j+λα_i, b_j+λb_i}. We verify, once again, what we already know from [Phi19, Th. 8.15], namely

L(A|b) = ⋂_{k=1}^m H_{α_k,b_k} = M := (⋂_{k=1, k≠j}^m H_{α_k,b_k}) ∩ H_{α_j+λα_i, b_j+λb_i}:

If x ∈ L(A|b), then (α_j + λα_i)(x) = b_j + λb_i, showing x ∈ H_{α_j+λα_i, b_j+λb_i} and x ∈ M. Conversely, if x ∈ M, then α_j(x) = (α_j + λα_i)(x) − λα_i(x) = b_j + λb_i − λb_i = b_j, showing x ∈ H_{α_j,b_j} and x ∈ L(A|b).
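The invariance of the hyperplane intersection under row addition can be observed concretely; the 2×2 system and the sample points below are illustrative choices:

```python
from fractions import Fraction as Fr

# 2x2 system over Q: x1 + x2 = 3 and x1 - x2 = 1 (illustrative)
A = [[Fr(1), Fr(1)],
     [Fr(1), Fr(-1)]]
b = [Fr(3), Fr(1)]

def solset_member(A, b, x):
    # x lies in every hyperplane H_{alpha_j, b_j}
    return all(sum(A[j][k] * x[k] for k in range(2)) == b[j] for j in range(2))

x = (Fr(2), Fr(1))   # the unique solution of this system
assert solset_member(A, b, x)

# row addition r2 -> r2 + lam * r1 replaces H_{alpha_2,b_2} by
# H_{alpha_2 + lam*alpha_1, b_2 + lam*b_1}; the intersection is unchanged
lam = Fr(5)
A2 = [A[0], [A[1][k] + lam * A[0][k] for k in range(2)]]
b2 = [b[0], b[1] + lam * b[0]]
for cand in [(Fr(2), Fr(1)), (Fr(0), Fr(3)), (Fr(1), Fr(1))]:
    assert solset_member(A, b, cand) == solset_member(A2, b2, cand)
print("row addition leaves the intersection of hyperplanes unchanged")
```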

2.4 Dual Maps

Theorem 2.25. Let V, W be vector spaces over the field F. If A ∈ L(V,W), then there exists a unique map A′ : W′ −→ V′ such that (using the notation of (2.6))

(A′β)(v) = 〈v, A′β〉 = 〈Av, β〉 = β(Av) for each v ∈ V and each β ∈ W′.  (2.15)

Moreover, this map turns out to be linear, i.e. A′ ∈ L(W′, V′).

Proof. Clearly, given A ∈ L(V,W), (2.15) uniquely defines a map A′ : W′ −→ V′ (for each β ∈ W′, (2.15) defines the map A′β = β ∘ A ∈ L(V, F) = V′). It merely remains to check that A′ is linear. To this end, let β, β_1, β_2 ∈ W′, λ ∈ F, and v ∈ V. Then

(A′(β_1 + β_2))(v) = (β_1 + β_2)(Av) = β_1(Av) + β_2(Av) = (A′β_1)(v) + (A′β_2)(v) = (A′β_1 + A′β_2)(v),

(A′(λβ))(v) = (λβ)(Av) = λβ(Av) = λ(A′β)(v),

showing A′(β_1 + β_2) = A′β_1 + A′β_2, A′(λβ) = λA′(β), and the linearity of A′.

Definition 2.26. Let V, W be vector spaces over the field F, A ∈ L(V,W). Then the map A′ ∈ L(W′, V′) given by Th. 2.25 is called the dual map corresponding to A (or the transpose of A).

Theorem 2.27. Let F be a field, m, n ∈ N. Let V, W be finite-dimensional vector spaces over F, dim V = n, dim W = m, where B_V := (v_1, …, v_n) is an ordered basis of V and B_W := (w_1, …, w_m) is an ordered basis of W. Moreover, let B_{V′} = (α_1, …, α_n) and B_{W′} = (β_1, …, β_m) be the corresponding (ordered) dual bases of V′ and W′, respectively. If A ∈ L(V,W) and (a_ji) ∈ M(m, n, F) is the matrix corresponding to A with respect to B_V and B_W, then the transpose of (a_ji), i.e.

(a^t_ji) ∈ M(n, m, F), where a^t_ji := a_ij for each (j, i) ∈ {1, …, n} × {1, …, m},

is the matrix corresponding to A′ with respect to B_{W′} and B_{V′}.


Proof. If (a_{ji}) is the matrix corresponding to A with respect to B_V and B_W, then (cf. [Phi19, Th. 7.10(b)])

∀ i ∈ {1,...,n} :  A v_i = Σ_{j=1}^{m} a_{ji} w_j,  (2.16)

and we have to show

∀ j ∈ {1,...,m} :  A′β_j = Σ_{i=1}^{n} a^t_{ij} α_i = Σ_{i=1}^{n} a_{ji} α_i.  (2.17)

Indeed, one computes, for each j ∈ {1,...,m} and each k ∈ {1,...,n},

(A′β_j)(v_k) = β_j(A v_k) = β_j(Σ_{l=1}^{m} a_{lk} w_l) = Σ_{l=1}^{m} a_{lk} β_j(w_l) = Σ_{l=1}^{m} a_{lk} δ_{jl} = a_{jk}
= Σ_{i=1}^{n} a_{ji} δ_{ik} = Σ_{i=1}^{n} a_{ji} α_i(v_k) = (Σ_{i=1}^{n} a_{ji} α_i)(v_k),

thereby proving (2.17).

Remark 2.28. As in Th. 2.27, let m, n ∈ N, let V, W be finite-dimensional vector spaces over the field F, dim V = n, dim W = m, where B_V := (v1, ..., vn) is an ordered basis of V and B_W := (w1, ..., wm) is an ordered basis of W with corresponding (ordered) dual bases B_{V′} = (α1, ..., αn) of V′ and B_{W′} = (β1, ..., βm) of W′. Let A ∈ L(V,W) with dual map A′ ∈ L(W′, V′).

(a) If (a_{ji}) ∈ M(m,n,F) is the matrix corresponding to A with respect to B_V and B_W and if one represents elements of the duals as column vectors, then, according to Th. 2.27, one obtains, for γ = Σ_{i=1}^{m} γ_i β_i ∈ W′ and ε := A′(γ) = Σ_{i=1}^{n} ε_i α_i ∈ V′ with γ1, ..., γm, ε1, ..., εn ∈ F,

(ε1, ..., εn)^t = (a_{ji})^t (γ1, ..., γm)^t.

However, if one adopts the convention of Not. 2.7 to represent elements of the duals as row vectors, then one applies transposes in the above equation to obtain

(ε1 ... εn) = (γ1 ... γm)(a_{ji}),

showing that this notation allows A and A′ to be represented by the same matrix (a_{ji}).
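As a quick numerical sanity check of (a), the following sketch (using NumPy; the matrix and coordinate entries are arbitrary sample values, not taken from the text) confirms that the column-vector convention uses (a_{ji})^t while the row-vector convention reuses (a_{ji}) itself:

```python
import numpy as np

# Sample matrix of A : V -> W with dim V = n = 3, dim W = m = 2
# (entries chosen arbitrarily for illustration).
A = np.array([[1., 2., 0.],
              [0., 1., 3.]])        # (a_ji) in M(m, n, F)

gamma = np.array([4., -1.])         # coordinates of gamma in W'

# Column convention: coordinates of A'(gamma) are (a_ji)^t applied to gamma.
eps = A.T @ gamma

# Row convention: the SAME matrix (a_ji) represents A', acting from the right.
assert np.allclose(eps, gamma @ A)

# Consistency with the defining property (A'(gamma))(v) = gamma(Av):
v = np.array([1., 5., -2.])
assert np.isclose(eps @ v, gamma @ (A @ v))
```

Both conventions describe the same functional; only the bookkeeping differs.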

(b) As in [Phi19, Th. 7.14] and Rem. 2.8(b) above, we now consider basis transitions

∀ i ∈ {1,...,n} :  ṽ_i = Σ_{j=1}^{n} c_{ji} v_j,   ∀ i ∈ {1,...,m} :  w̃_i = Σ_{j=1}^{m} f_{ji} w_j,

B̃_V := (ṽ1, ..., ṽn), B̃_W := (w̃1, ..., w̃m). We then know from [Phi19, Th. 7.14] that the matrix representing A with respect to B̃_V and B̃_W is (f_{ji})^{−1}(a_{ji})(c_{ji}). Thus, according to Th. 2.27, the matrix representing A′ with respect to the dual bases B̃_{V′} = (α̃1, ..., α̃n) and B̃_{W′} = (β̃1, ..., β̃m) is

((f_{ji})^{−1}(a_{ji})(c_{ji}))^t = (c_{ji})^t (a_{ji})^t ((f_{ji})^{−1})^t.

Of course, we can, alternatively, observe that, by Rem. 2.8(b), the basis transition from B_{V′} to B̃_{V′} is given by ((c_{ji})^{−1})^t and the basis transition from B_{W′} to B̃_{W′} is given by ((f_{ji})^{−1})^t, and compute the matrix representing A′ with respect to B̃_{V′} and B̃_{W′} via Th. 2.27 and [Phi19, Th. 7.14] to obtain (c_{ji})^t (a_{ji})^t ((f_{ji})^{−1})^t, as before. If γ = Σ_{i=1}^{m} γ_i β̃_i ∈ W′ and ε := A′(γ) = Σ_{i=1}^{n} ε_i α̃_i ∈ V′ with γ1, ..., γm, ε1, ..., εn ∈ F, then this yields

(ε1 ... εn) = (γ1 ... γm)(f_{ji})^{−1}(a_{ji})(c_{ji}).

(c) Comparing with [Phi19, Rem. 7.24], we observe that the dual map A′ ∈ L(W′, V′) is precisely the transpose map A^t of the map A considered in [Phi19, Rem. 7.24]. Moreover, as a consequence of Th. 2.27, the rows of the matrix (a_{ji}), representing A, span Im A′ in the same way that the columns of (a_{ji}) span Im A.

Theorem 2.29. Let V, W be vector spaces over the field F.

(a) The duality map ′ : L(V,W) −→ L(W′, V′), A ↦ A′, is linear.

(b) If X is another vector space over F, A ∈ L(V,W) and B ∈ L(W,X), then

(BA)′ = A′B′.

Proof. (a): Let A, B ∈ L(V,W), λ ∈ F, β ∈ W′, and v ∈ V. Then we compute

((A + B)′(β))(v) = β((A + B)(v)) = β(Av + Bv) = β(Av) + β(Bv) = (A′β)(v) + (B′β)(v) = ((A′ + B′)(β))(v),

((λA)′(β))(v) = β((λA)(v)) = λβ(Av) = ((λA′)(β))(v),

showing (A + B)′ = A′ + B′, (λA)′ = λA′, and the linearity of ′.

(b): If γ ∈ X′, then

(B ∘ A)′(γ) = γ ∘ (B ∘ A) = (γ ∘ B) ∘ A = A′(γ ∘ B) = A′(B′γ) = (A′ ∘ B′)(γ),

showing (B ∘ A)′ = A′ ∘ B′.

Theorem 2.30. Let V, W be vector spaces over the field F and A ∈ L(V,W).

(a) ker A′ = (Im A)⊥.

(b) ker A = (Im A′)⊤.

(c) A is an epimorphism if, and only if, A′ is a monomorphism.

(d) If A′ is an epimorphism, then A is a monomorphism. If A is a monomorphism and dim V = n ∈ N, then A′ is an epimorphism.

(e) If A′ is an isomorphism, then A is an isomorphism. If A is an isomorphism and dim V = n ∈ N, then A′ is an isomorphism.

Proof. (a) is due to the equivalence

β ∈ ker A′  ⇔  (∀ v ∈ V :  β(Av) = (A′β)(v) = 0)  ⇔  β ∈ (Im A)⊥.

(b): Exercise.

(c): If A is an epimorphism, then Im A = W, implying, by (a),

ker A′ = (Im A)⊥ = W⊥ = {0},

showing A′ to be a monomorphism. Conversely, if A′ is a monomorphism, then ker A′ = {0}, implying, by Th. 2.19(a) and (a),

Im A = ((Im A)⊥)⊤ = (ker A′)⊤ = {0}⊤ = W,

showing A to be an epimorphism.

(d): Exercise.

(e) is now immediate from combining (c) and (d).

Theorem 2.31. Let V, W be vector spaces over the field F with canonical embeddings Φ_V : V −→ V′′ and Φ_W : W −→ W′′ according to Def. 2.10(c). Let A ∈ L(V,W) and A′′ := (A′)′ ∈ L(V′′, W′′). Then we have

Φ_W ∘ A = A′′ ∘ Φ_V.  (2.18)

Proof. If v ∈ V and β ∈ W′, then

((Φ_W ∘ A)(v))(β) = β(Av) = (A′β)(v) = (Φ_V v)(A′β) = ((Φ_V v) ∘ A′)(β) = (A′′(Φ_V v))(β) = ((A′′ ∘ Φ_V)(v))(β)

proves (2.18).

3 Symmetric Groups

In preparation for the introduction of the notion of determinant (which we will find to be a useful tool for the further study of linear endomorphisms of finite-dimensional vector spaces), we revisit the symmetric group S_n of [Phi19, Ex. 4.9(b)].


Definition 3.1. Let k, n ∈ N, k ≤ n. A permutation π ∈ S_n is called a k-cycle if, and only if, there exist k distinct numbers i1, ..., ik ∈ {1,...,n} such that

π(i) = { i_{j+1}  if i = i_j, j ∈ {1,...,k−1},
         i_1      if i = i_k,
         i        if i ∉ {i1,...,ik}.   (3.1)

A 2-cycle is also known as a transposition.

Notation 3.2. Let n ∈ N, π ∈ S_n.

(a) One writes

π = ( i1     i2     ···  in
      π(i1)  π(i2)  ···  π(in) ),

where {i1, ..., in} = {1,...,n}.

(b) If π is a k-cycle as in (3.1), k ∈ N, then one also writes

π = (i1 i2 ... ik).  (3.2)

Example 3.3. (a) Consider π ∈ S_5,

π = ( 5 4 3 2 1
      3 4 1 2 5 ) = (3 1 5).

Then

π(1) = 5, π(2) = 2, π(3) = 1, π(4) = 4, π(5) = 3.

(b) Letting π ∈ S_5 be as in (a), we have (recalling that the composition on S_n is merely the usual composition of maps)

π = ( 5 4 3 2 1        ( 1 2 3 4 5
      4 1 2 5 3 )  ∘     2 3 4 5 1 ) = (3 5)(3 1).
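The compositions of this example are easy to replicate mechanically; the following minimal sketch (helper names are ad hoc) models a permutation of {1,...,n} as a dict i ↦ π(i):

```python
def compose(p, q):
    """Composition p ∘ q of permutations given as dicts, i.e. i -> p(q(i))."""
    return {i: p[q[i]] for i in q}

def from_cycle(cycle, n):
    """The permutation of {1,...,n} given by the single cycle (i1 i2 ... ik)."""
    p = {i: i for i in range(1, n + 1)}
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        p[a] = b
    return p

# pi = (3 1 5) in S_5, as in part (a):
pi = from_cycle([3, 1, 5], 5)
assert [pi[i] for i in range(1, 6)] == [5, 2, 1, 4, 3]

# Part (b): pi = (3 5)(3 1), composed right-to-left as maps.
assert compose(from_cycle([3, 5], 5), from_cycle([3, 1], 5)) == pi
```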

Lemma 3.4. Let α, k, n ∈ N, α ≤ k ≤ n, and consider distinct numbers i1, ..., ik ∈ {1,...,n}. Then, in S_n, the following statements hold true:

(a) One has

(i1 i2 ... ik) = (i_α i_{α+1} ... i_k i_1 ... i_{α−1}).

(b) Let n ≥ 2. If 1 < α < k, then

(i1 i2 ... ik)(i1 i_α) = (i1 i_{α+1} ... i_k)(i2 ... i_α);

and, moreover,

(i1 i2 ... ik)(i1 i_k) = (i2 ... i_k)(i1).

(c) Let n ≥ 2. Given β, l ∈ N, β ≤ l ≤ n, and distinct numbers j1, ..., jl ∈ {1,...,n} \ {i1,...,ik}, one has

(i1 ... ik)(j1 ... jl)(i1 j1) = (j1 i2 i3 ... ik i1 j2 j3 ... jl).

Proof. Exercise.

Notation 3.5. Let M, N be sets. Define

S(M,N) := {(f : M −→ N) : f bijective}.

Proposition 3.6. Let M, N be sets with #M = #N = n ∈ N₀, S := S(M,N) (cf. Not. 3.5). Then #S = n!; in particular, #S_M = n!.

Proof. We conduct the proof via induction: If n = 0, then S contains precisely the empty map (i.e. the empty set) and #S = 1 = 0! is true. If n = 1 and M = {a}, N = {b}, then S contains precisely the map f : M −→ N, f(a) = b, and #S = 1 = 1! is true. For the induction step, fix n ∈ N and assume #M = #N = n + 1. Let a ∈ M and

A := ⋃_{b ∈ N} S(M \ {a}, N \ {b}).  (3.3)

Since the union in (3.3) is finite and disjoint, one has, by the induction hypothesis,

#A = Σ_{b ∈ N} #S(M \ {a}, N \ {b}) = Σ_{b ∈ N} n! = (n + 1) · n! = (n + 1)!.

Thus, it suffices to show that

φ : S −→ A,  φ(f) : M \ {a} −→ N \ {f(a)},  φ(f) := f ↾_{M\{a}},

is well-defined and bijective. If f : M −→ N is bijective, then f ↾_{M\{a}} : M \ {a} −→ N \ {f(a)} is bijective as well, i.e. φ is well-defined. Suppose f, g ∈ S with f ≠ g. If f(a) ≠ g(a), then φ(f) ≠ φ(g), as they have different ranges. If f(a) = g(a), then there exists x ∈ M \ {a} with f(x) ≠ g(x), implying φ(f)(x) = f(x) ≠ g(x) = φ(g)(x), i.e., once again, φ(f) ≠ φ(g). Thus, φ is injective. Now let h ∈ S(M \ {a}, N \ {b}) for some b ∈ N. Letting

f : M −→ N,  f(x) := { b     for x = a,
                       h(x)  for x ≠ a,

we have φ(f) = h, showing φ to be surjective as well.
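For small n, the count #S = n! can also be confirmed by brute-force enumeration (a throwaway check, not part of the proof):

```python
from itertools import permutations
from math import factorial

# Enumerate all bijections of {1,...,n} onto itself and compare with n!.
# (For n = 0, permutations() yields the single empty map, matching 0! = 1.)
for n in range(0, 6):
    count = sum(1 for _ in permutations(range(1, n + 1)))
    assert count == factorial(n)
```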

Theorem 3.7. Let n ∈ N.

(a) Each permutation can be decomposed into finitely many disjoint cycles: For each π ∈ S_n, there exists a decomposition of {1,...,n} into disjoint sets A1, ..., A_N, N ∈ N, i.e.

{1,...,n} = ⋃_{i=1}^{N} A_i  and  A_i ∩ A_j = ∅ for i ≠ j,  (3.4)

such that A_i consists of the distinct elements a_{i1}, ..., a_{i,N_i} and

π = (a_{N1} ... a_{N,N_N}) ··· (a_{11} ... a_{1,N_1}).  (3.5)

The decomposition (3.5) is unique up to the order of the cycles.

(b) If n ≥ 2, then every permutation π ∈ S_n is the composition of finitely many transpositions, where each transposition permutes two juxtaposed elements, i.e.

∀ π ∈ S_n  ∃ N ∈ N  ∃ τ1, ..., τ_N ∈ T :  π = τ_N ∘ ··· ∘ τ1,  (3.6)

where T := {(i i+1) : i ∈ {1,...,n−1}}.

Proof. (a): We prove the statement by induction on n. For n = 1, there is nothing to prove. Let n > 1 and choose i ∈ {1,...,n}. We claim that

∃ k ∈ N :  (π^k(i) = i  ∧  ∀ l ∈ {1,...,k−1} :  π^l(i) ≠ i).  (3.7)

Indeed, since {1,...,n} is finite, there must be a smallest k ∈ N such that π^k(i) ∈ A1 := {i, π(i), ..., π^{k−1}(i)}. Since π is bijective, it must be π^k(i) = i, and (i π(i) ... π^{k−1}(i)) is a k-cycle. We are already done in case k = n. If k < n, then consider B := {1,...,n} \ A1. Then, again using the bijectivity of π, π↾_B is a permutation on B with 1 ≤ #B < n. By induction, there are disjoint sets A2, ..., A_N such that B = ⋃_{j=2}^{N} A_j, A_j consists of the distinct elements a_{j1}, ..., a_{j,N_j} and

π↾_B = (a_{N1} ... a_{N,N_N}) ··· (a_{21} ... a_{2,N_2}).

Since π = (i π(i) ... π^{k−1}(i)) ∘ π↾_B, this finishes the proof of (3.5). If there were another, different, decomposition of π into cycles, say, given by disjoint sets B1, ..., B_M, {1,...,n} = ⋃_{i=1}^{M} B_i, M ∈ N, then there would exist A_i ≠ B_j and k ∈ A_i ∩ B_j. But then k would be in the cycle given by A_i and in the cycle given by B_j, implying A_i = {π^l(k) : l ∈ N} = B_j, in contradiction to A_i ≠ B_j.

(b): We first show that every π ∈ S_n is a composition of finitely many transpositions (not necessarily transpositions from the set T): According to (a), it suffices to show that every cycle is a composition of finitely many transpositions. Since each 1-cycle is the identity, it is (i) = Id = (1 2)(1 2) for each i ∈ {1,...,n}. If (i1 ... ik) is a k-cycle, k ∈ {2,...,n}, then

(i1 ... ik) = (i1 i2) ∘ (i2 i3) ∘ ··· ∘ (i_{k−1} i_k):  (3.8)

Indeed,

∀ i ∈ {1,...,n} :  ((i1 i2) ∘ (i2 i3) ∘ ··· ∘ (i_{k−1} i_k))(i) = { i_1      for i = i_k,
                                                                   i_{l+1}  for i = i_l, l ∈ {1,...,k−1},
                                                                   i        for i ∉ {i1,...,ik},

proving (3.8). To finish the proof of (b), we observe that every transposition is a composition of finitely many elements of T: If i, j ∈ {1,...,n}, i < j, then

(i j) = (i i+1) ··· (j−2 j−1)(j−1 j) ··· (i+1 i+2)(i i+1):  (3.9)

Indeed,

∀ k ∈ {1,...,n} :  ((i i+1) ··· (j−2 j−1)(j−1 j) ··· (i+1 i+2)(i i+1))(k) = { j  for k = i,
                                                                             i  for k = j,
                                                                             k  for i < k < j,
                                                                             k  for k ∉ {i, i+1, ..., j},

proving (3.9).
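The orbit construction in the proof of (a) translates directly into an algorithm; the following sketch (function name ad hoc) computes the disjoint-cycle decomposition of a permutation given as a dict:

```python
def cycle_decomposition(p):
    """Disjoint cycles of a permutation p (dict on {1,...,n}), obtained by
    following each element's orbit i, p(i), p(p(i)), ... until it closes."""
    seen, cycles = set(), []
    for i in sorted(p):
        if i not in seen:
            cycle, j = [], i
            while j not in seen:
                seen.add(j)
                cycle.append(j)
                j = p[j]
            cycles.append(cycle)
    return cycles

# pi = (3 1 5) from Example 3.3 decomposes into (1 5 3) and the 1-cycles (2), (4):
pi = {1: 5, 2: 2, 3: 1, 4: 4, 5: 3}
assert cycle_decomposition(pi) == [[1, 5, 3], [2], [4]]
```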

Remark 3.8. Let n ∈ N, n ≥ 2. According to Th. 3.7(a), each π ∈ S_n has a unique decomposition into N ∈ N cycles as in (3.5). If τ ∈ S_n is a transposition, then, as a consequence of Lem. 3.4(a),(b),(c), the corresponding cycle decomposition of πτ has precisely N + 1 cycles (if Lem. 3.4(b) applies) or precisely N − 1 cycles (if Lem. 3.4(c) applies).

Definition 3.9. Let k ∈ Z. We call the integer k even if, and only if, k ≡ 0 (mod 2) (cf. the notation introduced in [Phi19, Ex. 4.27(a)]); we call k odd if, and only if, k ≡ 1 (mod 2) (i.e. k is even if, and only if, 2 is a divisor of k; k is odd if, and only if, 2 is not a divisor of k, cf. [Phi19, Def. D.2(a)]). The property of being even or odd is called the parity of the integer k.

Example 3.10. Let Id ∈ S_3 be the identity on {1, 2, 3}. Then

Id = (1)(2)(3) = (1 2)(1 2) = (3 2)(3 2) = (1 2)(1 2)(3 2)(3 2),
(1 2 3) = (1 2)(2 3) = (2 3)(1 3) = (1 2)(1 3)(2 3)(1 2),

illustrating that, for n ≥ 2, one can write elements π of S_n as products of transpositions with varying numbers of factors. However, while the number k ∈ N₀ of factors in such products representing π is not unique, we will prove in the following Th. 3.11(a) that the parity of k is uniquely determined by π ∈ S_n.

Theorem 3.11. Let n ∈ N.

(a) Let n ≥ 2, π ∈ S_n, N ∈ N, k ∈ N₀. If

π = ∏_{i=1}^{k} h_i  (3.10)

with transpositions h1, ..., hk ∈ S_n and π is decomposed into N cycles according to Th. 3.7(a), then

k ≡ n − N (mod 2),  (3.11)

i.e. the parity of k is uniquely determined by π.

(b) The map

sgn : S_n −→ {−1, 1},  sgn(π) := { 1                              for n = 1,
                                   (−1)^k = (−1)^{n−N} (by (a))   for n ≥ 2,

where, for n ≥ 2, k = k(π) and N = N(π) are as in (a), constitutes a group epimorphism (here, {−1, 1} is considered as a multiplicative subgroup of R (or Q) – as we know all groups with two elements to be isomorphic, we have ({−1, 1}, ·) ≅ (Z₂, +)). One calls sgn(π) the sign, signature, or signum of the permutation π.

Proof. (a): We conduct the proof via induction on k: For k = 0, the product in (3.10) is empty, i.e.

π = Id = (1) ··· (n),

yielding N = n, showing (3.11) to hold. If k = 1, then π = h1 is a transposition and, thus, has N = n − 1 cycles, showing (3.11) to hold once again. Now assume (3.11) for k ≥ 1 by induction and consider π = ∏_{i=1}^{k+1} h_i = (∏_{i=1}^{k} h_i) h_{k+1}. Thus, π = π_k h_{k+1}, where π_k := ∏_{i=1}^{k} h_i. If π_k has N_k cycles, then, by induction,

k ≡ n − N_k (mod 2).  (3.12)

Moreover, from Rem. 3.8 we know N = N_k + 1 or N = N_k − 1. In both cases, (3.12) implies k + 1 ≡ n − N (mod 2), completing the induction.

(b): For n = 1, there is nothing to prove. For n ≥ 2, we first note sgn to be well-defined, as the number of cycles N(π) is uniquely determined by π ∈ S_n (and each π ∈ S_n can be written as a product of transpositions by Th. 3.7(b)). Next, we note sgn to be surjective, since, for the identity, we can choose k = 0, i.e. sgn(Id) = (−1)⁰ = 1, and, for each transposition τ ∈ S_n (as n ≥ 2, S_n contains at least the transposition τ = (1 2)), we can choose k = 1, i.e. sgn(τ) = (−1)¹ = −1. To verify sgn to be a homomorphism, let π, σ ∈ S_n. By Th. 3.7(b), there are transpositions τ1, ..., τ_{k_π}, h1, ..., h_{k_σ} ∈ S_n such that

π = ∏_{i=1}^{k_π} τ_i,  σ = ∏_{i=1}^{k_σ} h_i  ⇒  πσ = (∏_{i=1}^{k_π} τ_i)(∏_{i=1}^{k_σ} h_i),

implying

sgn(πσ) = (−1)^{k_π + k_σ} = (−1)^{k_π} (−1)^{k_σ} = sgn(π) sgn(σ),

thus completing the proof that sgn constitutes a homomorphism.
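The formula sgn(π) = (−1)^{n−N} from (b) is straightforward to implement; a small sketch (names ad hoc):

```python
def sgn(p):
    """Sign of a permutation p (dict on {1,...,n}) via sgn(p) = (-1)^(n - N),
    where N is the number of cycles in the decomposition of Th. 3.7(a)."""
    seen, n_cycles = set(), 0
    for i in p:
        if i not in seen:
            n_cycles += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = p[j]
    return (-1) ** (len(p) - n_cycles)

# A transposition has N = n - 1 cycles, hence sign -1:
assert sgn({1: 2, 2: 1, 3: 3}) == -1
# A 3-cycle is a product of two transpositions, hence even:
assert sgn({1: 2, 2: 3, 3: 1}) == 1
# The identity is even:
assert sgn({1: 1, 2: 2, 3: 3}) == 1
```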

Proposition 3.12. Let n ∈ N.

(a) One has

∀ π ∈ S_n :  sgn(π) = ∏_{1≤i<j≤n} (π(i) − π(j))/(i − j).  (3.13)

(b) Let n ≥ 2, π ∈ S_n. As in Th. 3.7(a), we write π as a product of N ∈ N cycles,

π = (a_{N1} ... a_{N,N_N}) ··· (a_{11} ... a_{1,N_1}) = ∏_{i=1}^{N} γ_i,  (3.14)

where, for each i ∈ {1,...,N}, γ_i := (a_{i1} ... a_{i,N_i}) is a cycle of length N_i ∈ N. Then

sgn(π) = (−1)^{Σ_{i=1}^{N}(N_i − 1)}.  (3.15)

Proof. (a): For each π ∈ S_n, let σ(π) denote the value given by the right-hand side of (3.13). If n = 1, then S_n = {Id} and σ(Id) = 1 = sgn(Id), since the product in (3.13) is empty (and, thus, equal to 1). For n ≥ 2, we first show

∀ π1, π2 ∈ S_n :  σ(π1 ∘ π2) = σ(π1) σ(π2):

For each π1, π2 ∈ S_n, one computes, using the bijectivity of π2,

σ(π1 ∘ π2) = ∏_{1≤i<j≤n} (π1(π2(i)) − π1(π2(j)))/(i − j)
= ∏_{1≤i<j≤n} ( (π1(π2(i)) − π1(π2(j)))/(π2(i) − π2(j)) · (π2(i) − π2(j))/(i − j) ) = σ(π1) σ(π2),

thereby establishing the case. Next, if τ ∈ S_n is a transposition, then there exist elements i, j ∈ {1,...,n} such that i < j and τ = (i j). Thus, as the factors of (3.13) not involving both i and j combine to 1,

σ(τ) = (τ(i) − τ(j))/(i − j) = (j − i)/(i − j) = −1

holds for each transposition τ. In consequence, if π ∈ S_n is the composition of k ∈ N transpositions, then, by Th. 3.11(b),

σ(π) = (−1)^k = sgn(π),

proving (a).

(b): Using (3.14) together with the homomorphism property of sgn given by Th. 3.11(b), it suffices to show that

sgn(γ) = (−1)^{k−1}  (3.16)

holds for each cycle γ := (i1 ... ik) ∈ S_n, where i1, ..., ik are distinct elements of {1,...,n}, k ∈ N. According to (3.8), we have

γ = (i1 ... ik) = (i1 i2) ∘ (i2 i3) ∘ ··· ∘ (i_{k−1} i_k),

showing γ to be the product of k − 1 transpositions, thereby proving (3.16) and the proposition.
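The two characterizations (3.13) and (3.15) can be cross-checked against each other by brute force over S_4; the sketch below uses exact rational arithmetic for the quotient formula and orbit lengths for the cycle formula (all helper names ad hoc):

```python
from fractions import Fraction
from itertools import permutations

def sgn_formula(p):
    """sgn via the quotient formula (3.13), with exact arithmetic."""
    ks = sorted(p)
    result = Fraction(1)
    for a in range(len(ks)):
        for b in range(a + 1, len(ks)):
            i, j = ks[a], ks[b]
            result *= Fraction(p[i] - p[j], i - j)
    return int(result)

def sgn_cycles(p):
    """sgn via (3.15): (-1) to the sum of (cycle length - 1)."""
    seen, exponent = set(), 0
    for i in p:
        if i not in seen:
            j = i
            while j not in seen:
                seen.add(j)
                exponent += 1
                j = p[j]
            exponent -= 1          # a cycle of length N_i contributes N_i - 1
    return (-1) ** exponent

# The two formulas agree on all of S_4:
for images in permutations(range(1, 5)):
    p = dict(zip(range(1, 5), images))
    assert sgn_formula(p) == sgn_cycles(p)
```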


Definition 3.13. Let n ∈ N, n ≥ 2, and let sgn : S_n −→ {1, −1} be the group homomorphism defined in Th. 3.11(b) above. We call π ∈ S_n even if, and only if, sgn(π) = 1; we call π odd if, and only if, sgn(π) = −1. The property of being even or odd is called the parity of the permutation π. Moreover, we call

A_n := ker sgn = {π ∈ S_n : sgn(π) = 1}

the alternating group on {1,...,n}.

Proposition 3.14. Let n ∈ N, n ≥ 2.

(a) A_n is a normal subgroup of S_n and one has

S_n/A_n ≅ Im sgn = {1, −1} ≅ Z₂  (3.17)

(where {1, −1} is considered with multiplication and Z₂ = {0, 1} is considered with addition modulo 2).

(b) For each transposition τ ∈ S_n, one has τ ∉ A_n, S_n = (A_n τ) ∪̇ A_n, where we recall that ∪̇ denotes a disjoint union and A_n τ denotes the coset {πτ : π ∈ A_n}. Moreover, #A_n = #(A_n τ) = (n!)/2.

Proof. (a): As the kernel of a homomorphism, A_n is a normal subgroup by [Phi19, Ex. 4.24(a)] and, thus, (3.17) is immediate from the isomorphism theorem [Phi19, Th. 4.26(b)].

(b): Let τ ∈ S_n be a transposition. Since sgn(τ) = (−1)¹ = −1, τ ∉ A_n. Moreover, if π ∈ A_n, then sgn(πτ) = sgn(π) sgn(τ) = 1 · (−1) = −1, showing (A_n τ) ∩ A_n = ∅. Let π ∈ S_n \ A_n. Then

sgn(π) = −1  ⇒  sgn(πτ) = 1  ⇒  πτ ∈ A_n  ⇒  π = πττ ∈ A_n τ,

showing S_n = (A_n τ) ∪̇ A_n. To prove #A_n = #(A_n τ), we note the maps φ : A_n −→ A_n τ, φ(π) := πτ, and φ^{−1} : A_n τ −→ A_n, φ^{−1}(π) := πτ, to be inverses of each other and, thus, bijective. Moreover, as we know #S_n = n! from Prop. 3.6, we have n! = #S_n = #A_n + #(A_n τ) = 2 · (#A_n), thereby completing the proof.
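The coset splitting of (b), and in particular #A_n = n!/2, can be confirmed by enumeration for small n; the inversion-count sign used below agrees with the quotient formula (3.13):

```python
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation given as a tuple (perm[k] = image of k+1),
    computed by counting inversions."""
    inv = sum(1 for a in range(len(perm))
                for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return (-1) ** inv

for n in range(2, 7):
    perms = list(permutations(range(1, n + 1)))
    even = [p for p in perms if sign(p) == 1]
    assert len(even) == factorial(n) // 2      # #A_n = n!/2
```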

4 Multilinear Maps and Determinants

Employing the preparations of the previous section, we are now in a position to formulate an ad hoc definition of the notion of determinant. However, it seems more instructive to first study some more general related notions that embed determinants into a context that is also of independent interest and importance. Determinants are actually rather versatile objects: We will see below that, given a finite-dimensional vector space V over a field F, they can be viewed as maps det : L(V,V) −→ F, assigning numbers to linear endomorphisms. However, they can also be viewed as functions det : M(n,F) −→ F, assigning numbers to square matrices, and also as polynomial functions det : F^{n²} −→ F of degree n in n² variables. But the point of view we will focus on first is determinants as (alternating) multilinear forms (i.e. F-valued maps) det : V^n −→ F.

We start with a general introduction to multilinear maps, which have many important applications, not only in Algebra, but also in Analysis, where they, e.g., occur in the form of higher-order total derivatives of maps from R^n to R^m (cf. [Phi16b, Sec. 4.6]) and when studying the integration of so-called differential forms (see, e.g., [For17, §19] and [Kon04, Sec. 13.1]).

4.1 Multilinear Maps

Definition 4.1. Let V and W be vector spaces over the field F, α ∈ N. We call a map

L : V^α −→ W  (4.1)

multilinear (more precisely, α times linear, bilinear for α = 2) if, and only if, it is linear in each component, i.e., for each x^1, ..., x^{i−1}, x^{i+1}, ..., x^α, v, w ∈ V, i ∈ {1,...,α}, and each λ, μ ∈ F:

L(x^1, ..., x^{i−1}, λv + μw, x^{i+1}, ..., x^α)
= λ L(x^1, ..., x^{i−1}, v, x^{i+1}, ..., x^α) + μ L(x^1, ..., x^{i−1}, w, x^{i+1}, ..., x^α),  (4.2)

where we note that, here and in the following, the superscripts merely denote upper indices and not exponentiation. We denote the set of all α times linear maps from V^α into W by L^α(V,W). We also set L^0(V,W) := W. In extension of Def. 2.2(a), we also call L ∈ L^α(V,F) a multilinear form, a bilinear form for α = 2.

Remark 4.2. In the situation of Def. 4.1, each L^α(V,W), α ∈ N₀, constitutes a vector space over F: It is a subspace of the vector space over F of all functions from V^α into W, since, clearly, if K, L : V^α −→ W are both α times linear and λ, μ ∈ F, then λK + μL is also α times linear.

The following Th. 4.3 generalizes [Phi19, Th. 6.6].

Theorem 4.3. Let V and W be vector spaces over the field F, α ∈ N. Moreover, let B be a basis of V. Then each α times linear map L ∈ L^α(V,W) is uniquely determined by its values on B^α: More precisely, if (w_b)_{b ∈ B^α} is a family in W, and, for each v ∈ V, B_v and c_v : B_v −→ F \ {0} are as in [Phi19, Th. 5.19] (providing the coordinates of v with respect to B in the usual way), then the map

L̃ : V^α −→ W,
L̃(v^1, ..., v^α) = L̃(Σ_{b^1 ∈ B_{v^1}} c_{v^1}(b^1) b^1, ..., Σ_{b^α ∈ B_{v^α}} c_{v^α}(b^α) b^α)
:= Σ_{(b^1,...,b^α) ∈ B_{v^1} × ··· × B_{v^α}} c_{v^1}(b^1) ··· c_{v^α}(b^α) w_{(b^1,...,b^α)},  (4.3)

is α times linear, and L ∈ L^α(V,W) with

∀ b ∈ B^α :  L(b) = w_b,  (4.4)

implies L = L̃.

Proof. Exercise: Apart from the more elaborate notation, everything works as in the proof of [Phi19, Th. 6.6].

The following Th. 4.4 generalizes [Phi19, Th. 6.19].

Theorem 4.4. Let V and W be vector spaces over the field F, let B_V and B_W be bases of V and W, respectively, let α ∈ N. Given b^1, ..., b^α ∈ B_V and b ∈ B_W, and using Th. 4.3, define maps L_{b^1,...,b^α,b} ∈ L^α(V,W) by letting

L_{b^1,...,b^α,b}(b̃^1, ..., b̃^α) := { b  for (b̃^1, ..., b̃^α) = (b^1, ..., b^α),
                                      0  otherwise.   (4.5)

Let B := {L_{b^1,...,b^α,b} : (b^1, ..., b^α) ∈ (B_V)^α, b ∈ B_W}.

(a) B is linearly independent.

(b) If V is finite-dimensional, dim V = n ∈ N, B_V = {b1, ..., bn}, then B constitutes a basis for L^α(V,W). If, in addition, dim W = m ∈ N, B_W = {w1, ..., wm}, then we can write

dim L^α(V,W) = (dim V)^α · dim W = n^α · m.  (4.6)

(c) If dim V = ∞ and dim W ≥ 1, then ⟨B⟩ ⊊ L^α(V,W) and, in particular, B is not a basis of L^α(V,W).

Proof. (a): We verify that the elements of B are linearly independent: Let M, N ∈ N. Let (b^1_1, ..., b^α_1), ..., (b^1_N, ..., b^α_N) ∈ (B_V)^α be distinct and let w1, ..., w_M ∈ B_W be distinct as well. Assume λ_{lk} ∈ F to be such that

L := Σ_{l=1}^{M} Σ_{k=1}^{N} λ_{lk} L_{b^1_k,...,b^α_k, w_l} = 0.

Let k ∈ {1,...,N}. Then

0 = L(b^1_k, ..., b^α_k) = Σ_{l=1}^{M} Σ_{k′=1}^{N} λ_{lk′} L_{b^1_{k′},...,b^α_{k′}, w_l}(b^1_k, ..., b^α_k) = Σ_{l=1}^{M} λ_{lk} w_l

implies λ_{1k} = ··· = λ_{Mk} = 0 due to the linear independence of the w_l ∈ B_W. As this holds for each k ∈ {1,...,N}, we have established the linear independence of B.

(b): According to (a), it remains to show ⟨B⟩ = L^α(V,W). Let L ∈ L^α(V,W) and (i1,...,iα) ∈ {1,...,n}^α. Then there exists a finite set B_{(i1,...,iα)} ⊆ B_W such that L(b_{i1}, ..., b_{iα}) = Σ_{w ∈ B_{(i1,...,iα)}} λ_w w with λ_w ∈ F. Now let w1, ..., w_M, M ∈ N, be an enumeration of the finite set ⋃_{I ∈ {1,...,n}^α} B_I. Then there exist λ_{j,(i1,...,iα)} ∈ F, (j,(i1,...,iα)) ∈ {1,...,M} × {1,...,n}^α, such that

∀ (i1,...,iα) ∈ {1,...,n}^α :  L(b_{i1}, ..., b_{iα}) = Σ_{j=1}^{M} λ_{j,(i1,...,iα)} w_j.

Letting L̃ := Σ_{j=1}^{M} Σ_{(i1,...,iα) ∈ {1,...,n}^α} λ_{j,(i1,...,iα)} L_{b_{i1},...,b_{iα},w_j}, we claim L = L̃. Indeed, for each (j1,...,jα) ∈ {1,...,n}^α,

L̃(b_{j1}, ..., b_{jα}) = Σ_{j=1}^{M} Σ_{(i1,...,iα) ∈ {1,...,n}^α} λ_{j,(i1,...,iα)} L_{b_{i1},...,b_{iα},w_j}(b_{j1}, ..., b_{jα})
= Σ_{j=1}^{M} Σ_{(i1,...,iα) ∈ {1,...,n}^α} λ_{j,(i1,...,iα)} δ_{(i1,...,iα),(j1,...,jα)} w_j
= Σ_{j=1}^{M} λ_{j,(j1,...,jα)} w_j = L(b_{j1}, ..., b_{jα}),

proving L = L̃ by Th. 4.3. Since L̃ ∈ ⟨B⟩, the proof of (b) is complete.

(c): As dim W ≥ 1, there exists w ∈ B_W. If L ∈ ⟨B⟩, then {b ∈ (B_V)^α : L(b) ≠ 0} is finite. Thus, if B_V is infinite, then the map L ∈ L^α(V,W) with L(b) := w for each b ∈ (B_V)^α is not in ⟨B⟩, proving (c).

4.2 Alternating Multilinear Maps and Determinants

Definition 4.5. Let V and W be vector spaces over the field F, α ∈ N. Then A ∈ L^α(V,W) is called alternating if, and only if,

∀ (v^1,...,v^α) ∈ V^α :  ((∃ i,j ∈ {1,...,α}, i ≠ j :  v^i = v^j)  ⇒  A(v^1,...,v^α) = 0).  (4.7)

Moreover, define the sets

Alt⁰(V,W) := W,  Alt^α(V,W) := {A ∈ L^α(V,W) : A alternating}

(note that this immediately yields Alt¹(V,W) = L(V,W)).

Remark 4.6. Let V and W be vector spaces over the field F, α ∈ N. Then Alt^α(V,W) is a vector subspace of L^α(V,W): Indeed, 0 ∈ Alt^α(V,W) and, if λ ∈ F and A, B ∈ L^α(V,W) satisfy (4.7), then λA and A + B satisfy (4.7) as well.

Notation 4.7. Let V, W be sets and α ∈ N. Then, for each f ∈ F(V^α, W) = W^{V^α} and each permutation π ∈ S_α, define

(πf) : V^α −→ W,  (πf)(v^1,...,v^α) := f(v^{π(1)},...,v^{π(α)}).


Lemma 4.8. Let V, W be sets and α ∈ N. Then

∀ π1, π2 ∈ S_α  ∀ f ∈ F(V^α, W) :  (π1π2)f = π1(π2f).  (4.8)

Proof. For each (v^1,...,v^α) ∈ V^α, let (w^1,...,w^α) := (v^{π1(1)},...,v^{π1(α)}) ∈ V^α. Then, for each i ∈ {1,...,α}, we have w^i = v^{π1(i)} and w^{π2(i)} = v^{π1(π2(i))}. Thus, we compute

((π1π2)f)(v^1,...,v^α) = f(v^{(π1π2)(1)},...,v^{(π1π2)(α)}) = f(v^{π1(π2(1))},...,v^{π1(π2(α))})
= f(w^{π2(1)},...,w^{π2(α)}) = (π2f)(w^1,...,w^α) = (π2f)(v^{π1(1)},...,v^{π1(α)}) = (π1(π2f))(v^1,...,v^α),

thereby establishing (4.8).

Proposition 4.9. Let V and W be vector spaces over the field F, α ∈ N. Then, given A ∈ L^α(V,W), the following statements are equivalent for char F ≠ 2, where "(i) ⇒ (ii)" also holds for char F = 2.

(i) A is alternating.

(ii) For each permutation π ∈ S_α, one has

∀ (v^1,...,v^α) ∈ V^α :  A(v^{π(1)},...,v^{π(α)}) = sgn(π) A(v^1,...,v^α).  (4.9)

Proof. "(i) ⇒ (ii)": We first prove (4.9) for transpositions π = (i i+1) with i ∈ {1,...,α−1}: Let v^k ∈ V, k ∈ {1,...,α} \ {i, i+1}, and define

B : V × V −→ W,  B(v,w) := A(v^1,...,v^{i−1}, v, w, v^{i+2},...,v^α).

Then, as A is α times linear and alternating, B is bilinear and alternating. Thus, for each v, w ∈ V,

0 = B(v + w, v + w) = B(v,v) + B(v,w) + B(w,v) + B(w,w) = B(v,w) + B(w,v),

implying B(v,w) = −B(w,v), proving (4.9) for this case. For general π ∈ S_α, (4.9) now follows from Th. 3.7(b), Th. 3.11(b), and Lem. 4.8: Let T := {(i i+1) ∈ S_α : i ∈ {1,...,α−1}}. Then, given π ∈ S_α, Th. 3.7(b) implies the existence of π1, ..., π_N ∈ T, N ∈ N, such that π = π1 ∘ ··· ∘ π_N. Thus, for each (v^1,...,v^α) ∈ V^α, using Lem. 4.8 and Th. 3.11(b),

A(v^{π(1)},...,v^{π(α)}) = (πA)(v^1,...,v^α) = (π1(···(π_N A)))(v^1,...,v^α) = (−1)^N A(v^1,...,v^α) = sgn(π) A(v^1,...,v^α),

proving (4.9).

"(ii) ⇒ (i)": Let char F ≠ 2. Let (v^1,...,v^α) ∈ V^α and suppose i, j ∈ {1,...,α} are such that i ≠ j as well as v^i = v^j. Then, by (ii),

A(v^1,...,v^α) = sgn((i j)) A(v^1,...,v^α) = −A(v^1,...,v^α).

Thus, 2A(v^1,...,v^α) = 0 and 2 ≠ 0 implies A(v^1,...,v^α) = 0.


The following Ex. 4.10 shows that "(i) ⇐ (ii)" cannot be expected to hold in Prop. 4.9 for char F = 2:

Example 4.10. Let F := Z₂ = {0, 1}, V := F. Consider the bilinear map A : V² −→ F, A(λ, μ) := λμ. Then A is not alternating, since A(1,1) = 1 · 1 = 1 ≠ 0. However, (4.9) does hold for A (due to 1 = −1 ∈ F): A(1,1) = 1 = −1 = −A(1,1) (and 0 = A(0,0) = −A(0,0) = A(0,1) = −A(1,0)).

Proposition 4.11. Let V and W be vector spaces over the field F, α ∈ N. Then, given A ∈ L^α(V,W), the following statements are equivalent:

(i) A is alternating.

(ii) The implication in (4.7) holds whenever j = i + 1 (i.e. whenever two juxtaposed arguments are identical).

(iii) If the family (v^1,...,v^α) in V is linearly dependent, then A(v^1,...,v^α) = 0.

Proof. Exercise.

Definition 4.12. Let F be a field and V := F^α, α ∈ N. Then det ∈ Alt^α(V,F) is called a determinant if, and only if,

det(e1, ..., eα) = 1,

where e1, ..., eα denote the standard basis vectors of V.

Definition 4.13. Let V be a vector space over the field F, α ∈ N. We define the map Λ^α : (V′)^α −→ Alt^α(V,F), called outer product or wedge product of linear forms, as follows:

Λ^α(ω1, ..., ωα)(v^1, ..., v^α) := Σ_{π ∈ S_α} sgn(π) ω_{π(1)}(v^1) ··· ω_{π(α)}(v^α)  (4.10)

(cf. Th. 4.15 below). Given linear forms ω1, ..., ωα ∈ V′, it is common to also use the notation

ω1 ∧ ··· ∧ ωα := Λ^α(ω1, ..., ωα).

Lemma 4.14. Let V be a vector space over the field F, α ∈ N. Moreover, let Λ^α denote the wedge product of Def. 4.13. Then one has

∀ ω1,...,ωα ∈ V′  ∀ v^1,...,v^α ∈ V :
Λ^α(ω1,...,ωα)(v^1,...,v^α) = Σ_{π ∈ S_α} sgn(π) ω1(v^{π(1)}) ··· ωα(v^{π(α)}).  (4.11)

Proof. Using the commutativity of multiplication in F, the bijectivity of permutations, as well as sgn(π) = sgn(π^{−1}), we obtain

Λ^α(ω1,...,ωα)(v^1,...,v^α) = Σ_{π ∈ S_α} sgn(π) ω_{π(1)}(v^1) ··· ω_{π(α)}(v^α)
= Σ_{π ∈ S_α} sgn(π) ω1(v^{π^{−1}(1)}) ··· ωα(v^{π^{−1}(α)})
= Σ_{π ∈ S_α} sgn(π) ω1(v^{π(1)}) ··· ωα(v^{π(α)}),

proving (4.11).
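For concreteness, the two expressions (4.10) and (4.11) for the wedge product can be evaluated numerically on V = F³, with linear forms represented by coordinate tuples so that ω(v) is a dot product (all names below are ad hoc for this sketch):

```python
from itertools import permutations

def sign(perm):
    """Sign of a 0-indexed permutation tuple via inversion count."""
    return (-1) ** sum(1 for a in range(len(perm))
                         for b in range(a + 1, len(perm)) if perm[a] > perm[b])

def wedge_410(omegas, vs):
    """Lambda^alpha(omega_1,...)(v^1,...) via (4.10): permute the forms."""
    alpha, total = len(omegas), 0
    for pi in permutations(range(alpha)):
        term = sign(pi)
        for k in range(alpha):
            term *= sum(x * y for x, y in zip(omegas[pi[k]], vs[k]))
        total += term
    return total

def wedge_411(omegas, vs):
    """The same value via (4.11): permute the vectors instead."""
    alpha, total = len(omegas), 0
    for pi in permutations(range(alpha)):
        term = sign(pi)
        for k in range(alpha):
            term *= sum(x * y for x, y in zip(omegas[k], vs[pi[k]]))
        total += term
    return total

omegas = [(1, 0, 2), (0, 1, 1), (3, 1, 0)]
vs     = [(1, 2, 0), (0, 1, 4), (2, 0, 1)]
assert wedge_410(omegas, vs) == wedge_411(omegas, vs)
# Swapping two vector arguments flips the sign (Lambda^alpha is alternating):
assert wedge_410(omegas, [vs[1], vs[0], vs[2]]) == -wedge_410(omegas, vs)
```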

Theorem 4.15. Let V be a vector space over the field F, α ∈ N. Moreover, let Λ^α denote the wedge product of Def. 4.13.

(a) Λ^α is well-defined, i.e. it does, indeed, map (V′)^α into Alt^α(V,F).

(b) Λ^α ∈ Alt^α(V′, Alt^α(V,F)).

Proof. (a): Let ω1, . . . , ωα ∈ V ′ and A := Λα(ω1, . . . , ωα). First, we show A to be αtimes linear: To this end, let λ, µ ∈ F and x, v1, . . . , vα ∈ V . Then

A(v1, . . . , vi−1, λvi + µx, vi+1, . . . , vα)

=∑

π∈Sα

sgn(π)ωπ(1)(v1) · · ·ωπ(i−1)(v

i−1)(

λωπ(i)(vi) + µωπ(i)(x)

)

ωπ(i+1)(vi+1) . . . ωπ(α)(v

α)

= λ∑

π∈Sα

sgn(π)ωπ(1)(v1) · · ·ωπ(i−1)(v

i−1)ωπ(i)(vi)ωπ(i+1)(v

i+1) . . . ωπ(α)(vα)

+ µ∑

π∈Sα

sgn(π)ωπ(1)(v1) · · ·ωπ(i−1)(v

i−1)ωπ(i)(x)ωπ(i+1)(vi+1) . . . ωπ(α)(v

α)

= λA(v1, . . . , vi−1, vi, vi+1, . . . , vα) + µA(v1, . . . , vi−1, x, vi+1, . . . , vα),

proving A ∈ Lα(V, F). It remains to show A is alternating. To this end, let i, j ∈ {1, . . . , α} with i < j. Then, according to Prop. 3.14(b), τ := (i j) ∉ Aα and Sα = (Aα τ) ∪ Aα. If k ∈ {1, . . . , α} \ {i, j}, then, for each π ∈ Sα, (πτ)(k) = π(k), while (πτ)(i) = π(j) and (πτ)(j) = π(i). Thus, if (v1, . . . , vα) ∈ V^α is such that vi = vj, then, by Def. 4.13,

A(v1, . . . , vα) (4.10)= ∑_{π∈Sα} sgn(π) ωπ(1)(v1) · · · ωπ(α)(vα)
  = ∑_{π∈Aα} ( sgn(π) ωπ(1)(v1) · · · ωπ(α)(vα) + sgn(πτ) ω(πτ)(1)(v1) · · · ω(πτ)(α)(vα) )
  = ∑_{π∈Aα} ( sgn(π) ωπ(1)(v1) · · · ωπ(α)(vα) − sgn(π) ωπ(1)(v1) · · · ωπ(α)(vα) )
  = 0,

where, in the last step, vi = vj was used to obtain ω(πτ)(i)(vi) ω(πτ)(j)(vj) = ωπ(j)(vi) ωπ(i)(vj) = ωπ(i)(vi) ωπ(j)(vj), showing A to be alternating.

(b) follows from (a): According to (a), Λα maps (V′′)α into Altα(V′, F). Thus, if Φ : V −→ V′′ is the canonical embedding, then, for each (v1, . . . , vα) ∈ V^α and each ω1, . . . , ωα ∈ V′,

Λα(Φv1, . . . , Φvα)(ω1, . . . , ωα) = ∑_{π∈Sα} sgn(π) (Φvπ(1))(ω1) · · · (Φvπ(α))(ωα)
  = ∑_{π∈Sα} sgn(π) ω1(vπ(1)) · · · ωα(vπ(α))
  (4.11)= Λα(ω1, . . . , ωα)(v1, . . . , vα),

thereby completing the proof.


Corollary 4.16. Let F be a field and V := F^α, α ∈ N. Given the standard basis e1, . . . , eα of V, we define a map det : V^α −→ F as follows: For each (v1, . . . , vα) ∈ V^α such that, for each j ∈ {1, . . . , α}, vj = ∑_{i=1}^α aji ei with (aji) ∈ M(α, F), let

det(v1, . . . , vα) := ∑_{π∈Sα} sgn(π) a1π(1) · · · aαπ(α).   (4.12)

Then det is a determinant according to Def. 4.12. Moreover,

det(v1, . . . , vα) = ∑_{π∈Sα} sgn(π) aπ(1)1 · · · aπ(α)α   (4.13)

also holds.

Proof. For each i ∈ {1, . . . , α}, let ωi := πei : V −→ F be the projection onto the coordinate with respect to ei according to Ex. 2.1(a). Then, if vj is as above, we have ωi(vj) = aji. Thus,

Λα(ω1, . . . , ωα)(v1, . . . , vα) (4.10)= ∑_{π∈Sα} sgn(π) ωπ(1)(v1) · · · ωπ(α)(vα) (4.12)= det(v1, . . . , vα),

showing det ∈ Altα(V, F) by Th. 4.15(a). Then we also have

det(v1, . . . , vα) = Λα(ω1, . . . , ωα)(v1, . . . , vα) (4.11)= ∑_{π∈Sα} sgn(π) ω1(vπ(1)) · · · ωα(vπ(α)) = ∑_{π∈Sα} sgn(π) aπ(1)1 · · · aπ(α)α,

proving (4.13). Moreover, for (v1, . . . , vα) = (e1, . . . , eα), (aji) = (δji) holds. Thus,

det(e1, . . . , eα) = ∑_{π∈Sα} sgn(π) δ1π(1) · · · δαπ(α) = sgn(Id) = 1,

which completes the proof.

According to Th. 4.17(b) below, the map defined by (4.12) is the only determinant in Altα(F^α, F).
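As a quick numerical illustration of Cor. 4.16 (the helper names below are ad hoc, not from these notes): the row form (4.12) and the column form (4.13) of the Leibniz sum evaluate to the same number:

```python
from itertools import permutations
from math import prod

def sgn(p):
    # sign of a permutation given as a tuple of 0-based indices
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def det_rows(a):
    # (4.12): sum over pi of sgn(pi) * a_{1,pi(1)} * ... * a_{n,pi(n)}
    n = len(a)
    return sum(sgn(p) * prod(a[j][p[j]] for j in range(n))
               for p in permutations(range(n)))

def det_cols(a):
    # (4.13): the same sum with the roles of the two indices exchanged
    n = len(a)
    return sum(sgn(p) * prod(a[p[j]][j] for j in range(n))
               for p in permutations(range(n)))

A = [[2, 0, 1], [1, 3, 5], [0, 4, 1]]
assert det_rows(A) == det_cols(A) == -30
```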

Theorem 4.17. Let V and W be vector spaces over the field F, let BV and BW be bases of V and W, respectively, let α ∈ N. Moreover, as in Cor. 2.4, let B′ := {ωb : b ∈ BV}, where

∀ (b, a) ∈ BV × BV : ωb ∈ V′, ωb(a) := δba.   (4.14)

In addition, assume < to be a strict total order on BV (for dimV = ∞, the order on BV exists due to the axiom of choice, cf. [Phi19, Th. A.52(iv)]) and define

(BV)^α_ord := { (b1, . . . , bα) ∈ (BV)^α : b1 < · · · < bα }.

Given (b1, . . . , bα) ∈ (BV)^α_ord and y ∈ BW, and using Th. 4.15(a), define maps

Ab1,...,bα,y ∈ Altα(V,W), Ab1,...,bα,y := Λα(ωb1, . . . , ωbα) y.   (4.15)

Let B := { Ab1,...,bα,y : y ∈ BW, (b1, . . . , bα) ∈ (BV)^α_ord }.


(a) B is linearly independent.

(b) If V is finite-dimensional, dimV = n ∈ N, BV = {b1 < · · · < bn}, then B constitutes a basis for Altα(V,W) (note B = ∅ and dimAltα(V,W) = 0 for α > n). If, in addition, dimW = m ∈ N, then we can write

dimAltα(V,W) = (n choose α) · m for α ≤ n, and dimAltα(V,W) = 0 for α > n.   (4.16)

In particular, dimAltα(F^α, F) = 1, showing that the map det ∈ Altα(F^α, F) is uniquely determined by Def. 4.12 and given by (4.12).

(c) If V is infinite-dimensional and dimW ≥ 1, then B is not a basis for Altα(V,W ).

Proof. (a): We verify that the elements of B are linearly independent: Note that

∀ (b1, . . . , bα) ∈ (BV)^α_ord  ∀ Id ≠ π ∈ Sα : (bπ(1), . . . , bπ(α)) ∉ (BV)^α_ord.   (4.17)

Thus, if (b1, . . . , bα), (c1, . . . , cα) ∈ (BV)^α_ord, then

Λα(ωb1, . . . , ωbα)(c1, . . . , cα) = ∑_{π∈Sα} sgn(π) ωbπ(1)(c1) · · · ωbπ(α)(cα)
  (4.17),(4.14)= 1 for (b1, . . . , bα) = (c1, . . . , cα), and 0 otherwise.   (4.18)

Now let M, N ∈ N, let (b1_1, . . . , bα_1), . . . , (b1_N, . . . , bα_N) ∈ (BV)^α_ord be distinct, and let y1, . . . , yM ∈ BW be distinct as well. Assume λlk ∈ F to be such that

A := ∑_{l=1}^M ∑_{k=1}^N λlk Ab1_k,...,bα_k,yl = 0.

Let k ∈ {1, . . . , N}. Then

0 = A(b1_k, . . . , bα_k) = ∑_{l=1}^M ∑_{k′=1}^N λlk′ Λα(ωb1_{k′}, . . . , ωbα_{k′})(b1_k, . . . , bα_k) yl (4.18)= ∑_{l=1}^M λlk yl

implies λ1k = · · · = λMk = 0 due to the linear independence of the yl ∈ BW. As this holds for each k ∈ {1, . . . , N}, we have established the linear independence of B.

(b): Note that

#(BV )αord

(4.17)= #

I ⊆ BV : #I = α [Phi16a, Prop. 5.18(b)]

=

(

n

α

)

.


According to (a), it remains to show 〈B〉 = Altα(V,W). Let A ∈ Altα(V,W) and (bi1, . . . , biα) ∈ (BV)^α_ord. Then there exists a finite set B(i1,...,iα) ⊆ BW such that A(bi1, . . . , biα) = ∑_{y∈B(i1,...,iα)} λy y with λy ∈ F. Now let y1, . . . , yM, M ∈ N, be an enumeration of the finite set

⋃ { B(i1,...,iα) : (i1, . . . , iα) ∈ {1, . . . , n}^α, (bi1, . . . , biα) ∈ (BV)^α_ord }.

Then, for each (j, (bi1, . . . , biα)) ∈ {1, . . . , M} × (BV)^α_ord, there exists λj,(i1,...,iα) ∈ F such that

∀ (bi1, . . . , biα) ∈ (BV)^α_ord : A(bi1, . . . , biα) = ∑_{j=1}^M λj,(i1,...,iα) yj.

Letting Ã := ∑_{j=1}^M ∑_{(bi1,...,biα)∈(BV)^α_ord} λj,(i1,...,iα) Abi1,...,biα,yj, we claim Ã = A. Indeed, for each (bj1, . . . , bjα) ∈ (BV)^α_ord,

Ã(bj1, . . . , bjα) = ∑_{j=1}^M ∑_{(bi1,...,biα)∈(BV)^α_ord} λj,(i1,...,iα) Λα(ωbi1, . . . , ωbiα)(bj1, . . . , bjα) yj
  (4.18)= ∑_{j=1}^M ∑_{(bi1,...,biα)∈(BV)^α_ord} λj,(i1,...,iα) δ(i1,...,iα),(j1,...,jα) yj
  = ∑_{j=1}^M λj,(j1,...,jα) yj = A(bj1, . . . , bjα),

proving Ã = A by (4.9) and Th. 4.3. Since A = Ã ∈ 〈B〉, the proof of (b) is complete.

(c): Exercise.

The following Th. 4.18 compiles some additional rules of importance for alternating multilinear maps:

Theorem 4.18. Let V and W be vector spaces over the field F, α ∈ N. The following rules hold for each A ∈ Altα(V,W):

(a) The value of A remains unchanged if one argument is replaced by the sum of that argument and a linear combination of the other arguments, i.e., if λ1, . . . , λα ∈ F, v1, . . . , vα ∈ V, and i ∈ {1, . . . , α}, then

A(v1, . . . , vα) = A( v1, . . . , vi−1, vi + ∑_{j=1, j≠i}^α λj vj, vi+1, . . . , vα ).

(b) If v1, . . . , vα ∈ V and aji ∈ F are such that

∀ j ∈ {1, . . . , α} : wj = ∑_{i=1}^α aji vi,

then

A(w1, . . . , wα) = ( ∑_{π∈Sα} sgn(π) a1π(1) · · · aαπ(α) ) A(v1, . . . , vα) = det(x1, . . . , xα) A(v1, . . . , vα),   (4.19)

where

∀ j ∈ {1, . . . , α} : xj := ∑_{i=1}^α aji ei ∈ F^α

and e1, . . . , eα is the standard basis of F^α.

(c) Suppose the family (v1, . . . , vα) in V is linearly independent and such that there exist w1, . . . , wα ∈ 〈{v1, . . . , vα}〉 with A(w1, . . . , wα) ≠ 0. Then A(v1, . . . , vα) ≠ 0 as well.

Proof. (a): One computes, for each i, j ∈ {1, . . . , α} with i ≠ j and λj ≠ 0:

A(v1, . . . , vi + λj vj, . . . , vα)
  = λj^{−1} A(v1, . . . , vi + λj vj, . . . , λj vj, . . . , vα)   (A ∈ Lα(V,W))
  = λj^{−1} A(v1, . . . , vi, . . . , λj vj, . . . , vα) + λj^{−1} A(v1, . . . , λj vj, . . . , λj vj, . . . , vα)   (A ∈ Lα(V,W))
  (4.7)= λj^{−1} A(v1, . . . , vi, . . . , λj vj, . . . , vα) + 0
  = A(v1, . . . , vα).   (A ∈ Lα(V,W))

The general case of (a) then follows via a simple induction.

(b): We calculate

A(w1, . . . , wα) = ∑_{i1=1}^α a1i1 · · · ∑_{iα=1}^α aαiα A(vi1, . . . , viα)   (A ∈ Lα(V,W))
  (4.7)= ∑_{π∈Sα} a1π(1) · · · aαπ(α) A(vπ(1), . . . , vπ(α))
  (4.9)= ( ∑_{π∈Sα} sgn(π) a1π(1) · · · aαπ(α) ) A(v1, . . . , vα),

thereby proving the first equality in (4.19). The second equality in (4.19) is now an immediate consequence of Cor. 4.16.

(c): Exercise.
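The transformation rule (4.19) of part (b) can be spot-checked with the simplest nontrivial alternating map, det on F², here over the integers (the names below are ad hoc illustrations, not notation from these notes):

```python
# det on F^2, viewed as an alternating 2-linear map of its two arguments
det2 = lambda u, v: u[0] * v[1] - u[1] * v[0]

a = [[2, 3], [1, 4]]                 # coefficients a_ji as in (4.19)
v1, v2 = (5, -1), (2, 7)
# w_j = sum_i a_ji v_i
w1 = tuple(a[0][0] * x + a[0][1] * y for x, y in zip(v1, v2))
w2 = tuple(a[1][0] * x + a[1][1] * y for x, y in zip(v1, v2))

det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]
assert det2(w1, w2) == det_a * det2(v1, v2)   # (4.19)
```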


4.3 Determinants of Matrices and Linear Maps

In Def. 4.12, we defined a determinant as an alternating multilinear form on F^α. In the following, we will see that determinants can be particularly useful when they are considered to be maps on square matrices or maps on linear endomorphisms of finite-dimensional vector spaces. We begin by defining determinants on square matrices:

Definition 4.19. Let F be a field and n ∈ N. Then the map

det : M(n, F) −→ F, det(aji) := ∑_{π∈Sn} sgn(π) a1π(1) · · · anπ(n),   (4.20)

is called the determinant on M(n, F).

Notation 4.20. If F is a field, n ∈ N, and det : M(n, F) −→ F is the determinant, then, for (aji) ∈ M(n, F), one commonly uses the notation

| a11 . . . a1n |
| ...       ... |
| an1 . . . ann |   := det(aji).   (4.21)

Notation 4.21. Let F be a field and n ∈ N. As in [Phi19, Rem. 7.4(b)], we denote the columns and rows of a matrix A := (aji) ∈ M(n, F) as follows:

∀ i ∈ {1, . . . , n} : ci := cA_i := (a1i, . . . , ani)^t,   ri := rA_i := (ai1 . . . ain).

Remark 4.22. Let F be a field and n ∈ N. If we consider the rows r1, . . . , rn of the matrix (aji) ∈ M(n, F) as elements of F^n, then, by Cor. 4.16,

det(aji) = det(r1, . . . , rn),

where the second det is the map det : (F^n)^n −→ F defined as in Cor. 4.16. As a further consequence of Cor. 4.16, in combination with Th. 4.17(b), we see that det of Def. 4.19 is the unique form on M(n, F) that is multilinear and alternating in the rows of the matrix and that assigns the value 1 to the identity matrix.

Example 4.23. Let F be a field. We evaluate (4.20) explicitly for n = 1, 2, 3:

(a) For (a11) ∈ M(1, F), we have

|a11| = det(a11) = ∑_{π∈S1} sgn(π) a1π(1) = sgn(Id) a11 = a11.


(b) For (aji) ∈ M(2, F), we have

| a11 a12 |
| a21 a22 |   = det(aji) = ∑_{π∈S2} sgn(π) a1π(1) a2π(2) = sgn(Id) a11 a22 + sgn(1 2) a12 a21 = a11 a22 − a12 a21,

i.e. the determinant is the product of the elements of the main diagonal (with a positive sign) minus the product of the elements of the other diagonal (with a negative sign).

(c) For (aji) ∈ M(3, F), we have

| a11 a12 a13 |
| a21 a22 a23 |
| a31 a32 a33 |   = det(aji) = ∑_{π∈S3} sgn(π) a1π(1) a2π(2) a3π(3)
  = sgn(Id) a11a22a33 + sgn(1 2) a12a21a33 + sgn(1 3) a13a22a31 + sgn(2 3) a11a23a32 + sgn(1 2 3) a12a23a31 + sgn(1 3 2) a13a21a32
  = a11a22a33 + a12a23a31 + a13a21a32 − (a12a21a33 + a13a22a31 + a11a23a32).

To remember this formula, one can use the following tableau (rule of Sarrus):

a11 a12 a13 a11 a12
a21 a22 a23 a21 a22
a31 a32 a33 a31 a32

One writes the first two columns of the matrix once again on the right and then takes the products along the three diagonals in the direction of the main diagonal with a positive sign and the products along the three diagonals in the other direction with a negative sign.
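The tableau translates into a one-line function (a sketch with a hypothetical name; the sample matrix with entries 1, . . . , 9 reappears as D2 in Ex. 4.30(b) below):

```python
def det3_sarrus(a):
    # three diagonals in the direction of the main diagonal, positive sign;
    # three diagonals in the other direction, negative sign
    return (a[0][0] * a[1][1] * a[2][2]
            + a[0][1] * a[1][2] * a[2][0]
            + a[0][2] * a[1][0] * a[2][1]
            - a[0][1] * a[1][0] * a[2][2]
            - a[0][2] * a[1][1] * a[2][0]
            - a[0][0] * a[1][2] * a[2][1])

assert det3_sarrus([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) == 0
```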

We can now translate some of the results of Sec. 4.2 into results on determinants of matrices:

Corollary 4.24. Let F be a field, n ∈ N, and A := (aji) ∈ M(n, F). Let r1, . . . , rn denote the rows of A and let c1, . . . , cn denote the columns of A. Then the following rules hold for the determinant:


(a) det(Idn) = 1, where Idn denotes the identity matrix in M(n, F ).

(b) det(At) = det(A).

(c) det is multilinear with regard to matrix rows as well as multilinear with regard to matrix columns, i.e., for each v ∈ M(1, n, F), w ∈ M(n, 1, F), i ∈ {1, . . . , n}, and λ, µ ∈ F (writing a matrix as the tuple of its rows or of its columns, respectively):

det(r1, . . . , ri−1, λ ri + µ v, ri+1, . . . , rn) = λ det(A) + µ det(r1, . . . , ri−1, v, ri+1, . . . , rn),
det(c1, . . . , ci−1, λ ci + µ w, ci+1, . . . , cn) = λ det(A) + µ det(c1, . . . , ci−1, w, ci+1, . . . , cn).

(d) If λ ∈ F, then det(λA) = λ^n det(A).

(e) For each permutation π ∈ Sn, one has

det(rπ(1), . . . , rπ(n)) = det(cπ(1), . . . , cπ(n)) = sgn(π) det(A).

In particular, switching rows i and j or columns i and j, where i, j ∈ {1, . . . , n}, i ≠ j, changes the sign of the determinant, i.e.

det(r1, . . . , ri, . . . , rj, . . . , rn) = − det(r1, . . . , rj, . . . , ri, . . . , rn),
det(c1, . . . , ci, . . . , cj, . . . , cn) = − det(c1, . . . , cj, . . . , ci, . . . , cn).

(f) The following statements are equivalent:

(i) detA = 0

(ii) The rows of A are linearly dependent.

(iii) The columns of A are linearly dependent.


(g) Multiplication Rule: If B := (bji) ∈ M(n, F ), then det(AB) = det(A) det(B).

(h) det(A) = 0 if, and only if, A is singular. If A is invertible, then

det(A−1) = (det(A))^{−1}.

(i) The value of the determinant remains the same if one row of a matrix is replaced by the sum of that row and a scalar multiple of another row. More generally, the determinant remains the same if one row of a matrix is replaced by the sum of that row and a linear combination of the other rows. The statement also remains true if the word “row” is replaced by “column”. Thus, if λ1, . . . , λn ∈ F and i ∈ {1, . . . , n}, then

det(A) = det(r1, . . . , rn) = det( r1, . . . , ri−1, ri + ∑_{j=1, j≠i}^n λj rj, ri+1, . . . , rn ),
det(A) = det(c1, . . . , cn) = det( c1, . . . , ci−1, ci + ∑_{j=1, j≠i}^n λj cj, ci+1, . . . , cn ).

Proof. As already observed in Rem. 4.22, det : M(n, F) −→ F can be viewed as the unique map det ∈ Altn(F^n, F) with det(e1, . . . , en) = 1, where one considers A as the tuple (r1, . . . , rn) ∈ (F^n)^n of its rows: Since, for each j ∈ {1, . . . , n},

rj = (aj1 . . . ajn) = ∑_{i=1}^n aji ei,   cj = (a1j, . . . , anj)^t = ∑_{i=1}^n aij ei,

we have

det(A) (4.20)= ∑_{π∈Sn} sgn(π) a1π(1) · · · anπ(n) (4.12)= det(r1, . . . , rn)
  (4.13)= ∑_{π∈Sn} sgn(π) aπ(1)1 · · · aπ(n)n (4.12)= det(c1, . . . , cn).   (4.22)


(a): det(Idn) = det(e1, . . . , en) = 1.

(b) is immediate from (4.22).

(c) is due to det ∈ Altn(F n, F ) and (4.22).

(d) is a consequence of (c).

(e) is due to det ∈ Altn(F n, F ), (4.22), and (4.9).

(f): Again, we use det ∈ Altn(F^n, F) and (4.22): If det(A) = 0, then det(Idn) = 1 and Th. 4.18(c) imply (ii) and (iii). Conversely, if (ii) or (iii) holds, then det(A) = 0 follows from Prop. 4.11(iii).

(g): Let C := (cji) := AB. Using Not. 4.21, we have

∀ j ∈ {1, . . . , n} : rC_j = rA_j B = ∑_{i=1}^n aji rB_i,   rA_j = ∑_{i=1}^n aji ei

and, thus,

det(AB) (4.22)= det(rC_1, . . . , rC_n) (4.19)= det(rA_1, . . . , rA_n) det(rB_1, . . . , rB_n) (4.22)= det(A) det(B).

(h): As a consequence of [Phi19, Th. 7.17(a)], A is singular if, and only if, the columns of A are linearly dependent. Thus, det(A) = 0 if, and only if, A is singular, due to (f). Moreover, if A is invertible, then

1 = det(Idn) = det(AA−1) (g)= det(A) det(A−1).

(i) is due to det ∈ Altn(F n, F ), (4.22), and Th. 4.18(a).

Theorem 4.25 (Block Matrices). Let F be a field. The determinant of so-called block matrices over F, where one block is a zero matrix (all entries 0), can be computed as the product of the determinants of the corresponding blocks. More precisely, if n, m ∈ N, then

| a11 . . . a1n             |
| ...       ...       ∗     |
| an1 . . . ann             |
| 0   . . . 0   b11 . . . b1m |
| ...       ...  ...      ... |
| 0   . . . 0   bm1 . . . bmm |   = det(aji) det(bji).   (4.23)

Proof. Suppose (aji) ∈ M(n + m, F), where

aji = bj−n,i−n for i, j > n,   aji = 0 for i ≤ n and j > n.   (4.24)

Via obvious embeddings, we can consider Sn ⊆ Sm+n and Tm := S{n+1,...,n+m} ⊆ Sm+n. If π ∈ Sm+n, then there are precisely two possibilities: Either there exist ω ∈ Sn and σ ∈ Tm such that π = ωσ, or there exists j ∈ {n + 1, . . . , m + n} such that i := π(j) ≤ n, in which case ajπ(j) = 0. Thus,

det(aji) = ∑_{π∈Sn+m} sgn(π) a1π(1) · · · an+m,π(n+m)
  = ∑_{(ω,σ)∈Sn×Tm} sgn(ω) sgn(σ) a1ω(1) · · · an,ω(n) an+1,σ(n+1) · · · an+m,σ(n+m)
  = ( ∑_{ω∈Sn} sgn(ω) a1ω(1) · · · anω(n) ) ( ∑_{σ∈Tm} sgn(σ) an+1,σ(n+1) · · · an+m,σ(n+m) )
  = ( ∑_{π∈Sn} sgn(π) a1π(1) · · · anπ(n) ) ( ∑_{π∈Sm} sgn(π) b1π(1) · · · bmπ(m) )
  = det((aji)_{(j,i)∈{1,...,n}×{1,...,n}}) det(bji),

proving (4.23).
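A small numerical check of (4.23), comparing the Leibniz formula on the full matrix with the product of the block determinants (helper names ad hoc):

```python
from itertools import permutations
from math import prod

def det(a):
    # Leibniz formula (4.20)
    n = len(a)
    def sgn(p):
        s = 1
        for i in range(n):
            for j in range(i + 1, n):
                if p[i] > p[j]:
                    s = -s
        return s
    return sum(sgn(p) * prod(a[r][p[r]] for r in range(n))
               for p in permutations(range(n)))

A = [[1, 2], [3, 4]]          # upper-left block, det = -2
B = [[5, 1], [2, 2]]          # lower-right block, det = 8
M = [[1, 2, 9, 7],            # arbitrary entries 9, 7, -1, 4 in the * block
     [3, 4, -1, 4],
     [0, 0, 5, 1],            # zero block bottom-left
     [0, 0, 2, 2]]
assert det(M) == det(A) * det(B) == -16
```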

Corollary 4.26. Let F be a field and n ∈ N. If (aji) ∈ M(n, F) is upper triangular or lower triangular, then

det(aji) = ∏_{k=1}^n akk.

Proof. For (aji) upper triangular, the statement follows from Th. 4.25 via an obvious induction on n ∈ N. If (aji) is lower triangular, then the transpose (aji)^t is upper triangular and the statement then follows from Cor. 4.24(b).

Definition 4.27. Let F be a field, n ∈ N, n ≥ 2, A = (aji) ∈ M(n, F). For each j, i ∈ {1, . . . , n}, let Mji ∈ M(n − 1, F) be the (n − 1) × (n − 1) submatrix of A obtained by deleting the jth row and the ith column of A – the Mji are called the minor matrices of A; define

Aji := (−1)^{i+j} det(Mji),   (4.25)

where the Aji are called cofactors of A and the det(Mji) are called the minors of A. Let Ã := (Aji)^t denote the transpose of the matrix of cofactors of A, called the adjugate matrix of A.

Lemma 4.28. Let F be a field, n ∈ N, n ≥ 2, A = (aji) ∈ M(n, F). For each j, i ∈ {1, . . . , n}, let R(j, i) ∈ M(n, F) be the matrix obtained from A by replacing the jth row with the standard (row) basis vector ei, and let C(j, i) ∈ M(n, F) be the matrix obtained from A by replacing the ith column with the standard (column) basis vector ej,


i.e.

R(j, i) :=
( a11    . . . a1,i−1   a1i    a1,i+1   . . . a1n    )
( ...          ...      ...    ...           ...    )
( aj−1,1 . . . aj−1,i−1 aj−1,i aj−1,i+1 . . . aj−1,n )
( 0      . . . 0        1      0        . . . 0      )
( aj+1,1 . . . aj+1,i−1 aj+1,i aj+1,i+1 . . . aj+1,n )
( ...          ...      ...    ...           ...    )
( an1    . . . an,i−1   ani    an,i+1   . . . ann    ),

C(j, i) :=
( a11    . . . a1,i−1   0 a1,i+1   . . . a1n    )
( ...          ...      ... ...         ...    )
( aj−1,1 . . . aj−1,i−1 0 aj−1,i+1 . . . aj−1,n )
( aj1    . . . aj,i−1   1 aj,i+1   . . . aj,n   )
( aj+1,1 . . . aj+1,i−1 0 aj+1,i+1 . . . aj+1,n )
( ...          ...      ... ...         ...    )
( an1    . . . an,i−1   0 an,i+1   . . . ann    ).

Then, we have

∀ j, i ∈ {1, . . . , n} : Aji = det(R(j, i)) = det(C(j, i)).   (4.26)

Proof. Let j, i ∈ {1, . . . , n}, and let Mji denote the corresponding minor matrix of A according to Def. 4.27. Moving the jth row of R(j, i) to the top (via j − 1 transpositions of adjacent rows) and its ith column to the front (via i − 1 transpositions of adjacent columns), and proceeding analogously for C(j, i), Cor. 4.24(e) yields

det(R(j, i)) = (−1)^{i+j} det( 1 0 ; ∗ Mji ),   det(C(j, i)) = (−1)^{j+i} det( 1 ∗ ; 0 Mji ),

in block notation, where the blocks ∗ collect the remaining entries of the affected column and row, respectively. In both cases, (4.23) (for the first determinant, applied to the transpose via Cor. 4.24(b)) implies

det(R(j, i)) = det(C(j, i)) = (−1)^{i+j} det(Mji) (4.25)= Aji,

thereby proving (4.26).

Theorem 4.29. Let F be a field, n ∈ N, n ≥ 2, A = (aji) ∈ M(n, F). Moreover, let Ã := (Aji)^t be the adjugate matrix of A according to Def. 4.27. Then the following holds:

(a) ÃA = AÃ = (detA) Idn.

(b) If detA ≠ 0, then det Ã = (detA)^{n−1}.

(c) If detA ≠ 0, then A^{−1} = (detA)^{−1} Ã.

(d) Laplace Expansion by Rows: detA = ∑_{i=1}^n aji Aji for each j ∈ {1, . . . , n} (expansion with respect to the jth row).

(e) Laplace Expansion by Columns: detA = ∑_{j=1}^n aji Aji for each i ∈ {1, . . . , n} (expansion with respect to the ith column).


Proof. (a): Let C := (cji) := AÃ, D := (dji) := ÃA. Also let R(j, i) and C(j, i) be as in Lem. 4.28. Then, we compute, for each j, i ∈ {1, . . . , n},

cji = ∑_{k=1}^n ajk (Ã)ki = ∑_{k=1}^n ajk Aik (4.26)= ∑_{k=1}^n ajk det(R(i, k)) Cor. 4.24(c)= det(rA_1, . . . , rA_{i−1}, rA_j, rA_{i+1}, . . . , rA_n) = det(A) for i = j, and 0 for i ≠ j,

dji = ∑_{k=1}^n (Ã)jk aki = ∑_{k=1}^n Akj aki (4.26)= ∑_{k=1}^n aki det(C(k, j)) Cor. 4.24(c)= det(cA_1, . . . , cA_{j−1}, cA_i, cA_{j+1}, . . . , cA_n) = det(A) for i = j, and 0 for i ≠ j,

proving (a).

(b): We obtain

det(Ã) det(A) = det(ÃA) (a)= det((detA) Idn) Cor. 4.24(d)= (detA)^n,

which, for det(A) ≠ 0, implies det Ã = (detA)^{n−1}.

(c) is immediate from (a).

(d): From (a), we obtain, for each j ∈ {1, . . . , n},

det(A) = (AÃ)jj = ∑_{i=1}^n aji (Ã)ij = ∑_{i=1}^n aji Aji,

proving (d).

(e): From (a), we obtain, for each i ∈ {1, . . . , n},

det(A) = (ÃA)ii = ∑_{j=1}^n (Ã)ij aji = ∑_{j=1}^n aji Aji,

proving (e).

Example 4.30. (a) We use Ex. 4.23(b) and Th. 4.29(d) to compute

D1 := | 1 2 ; 3 4 | :

From Ex. 4.23(b), we obtain

D1 = 1 · 4 − 2 · 3 = −2,

which we also obtain when expanding with respect to the first row according to Th. 4.29(d). Expanding with respect to the second row, we obtain

D1 = −3 · 2 + 4 · 1 = −2.

(b) We use Ex. 4.23(c) and Th. 4.29(e) to compute

D2 := | 1 2 3 ; 4 5 6 ; 7 8 9 | :

From Ex. 4.23(c), we obtain

D2 = 1·5·9 + 2·6·7 + 3·4·8 − 3·5·7 − 1·6·8 − 2·4·9 = 45 + 84 + 96 − 105 − 48 − 72 = 0.

Expanding with respect to the third column according to Th. 4.29(e), we obtain

D2 = 3 · | 4 5 ; 7 8 | − 6 · | 1 2 ; 7 8 | + 9 · | 1 2 ; 4 5 | = 3 · (−3) − 6 · (−6) + 9 · (−3) = −9 + 36 − 27 = 0.
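The expansion of Th. 4.29(d) lends itself to a short recursive implementation (a sketch; `det_laplace` is a hypothetical name, and for large n this recursion is as expensive as the Leibniz formula, cf. Rem. 4.34 below):

```python
def det_laplace(a):
    # expansion with respect to the first row, cf. Th. 4.29(d):
    # det A = sum_i a_1i * (-1)^(1+i) * det(M_1i)
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** i * a[0][i]
               * det_laplace([row[:i] + row[i + 1:] for row in a[1:]])
               for i in range(n))

assert det_laplace([[1, 2], [3, 4]]) == -2                  # D1 of part (a)
assert det_laplace([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) == 0  # D2 of part (b)
```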

Theorem 4.31 (Cramer’s Rule). Let F be a field and A := (aji) ∈ M(n, F), n ∈ N, n ≥ 2. If b1, . . . , bn ∈ F and det(A) ≠ 0, then the linear system

∀ j ∈ {1, . . . , n} : ∑_{k=1}^n ajk xk = bj

has a unique solution x ∈ F^n, which is given by

∀ j ∈ {1, . . . , n} : xj = (detA)^{−1} ∑_{k=1}^n Akj bk,   (4.27)

where the Akj denote the cofactors of A according to Def. 4.27.

Proof. In matrix form, the linear system reads Ax = b, which, for det(A) ≠ 0, is equivalent to x = A^{−1}b, where A^{−1} = (detA)^{−1} Ã by Th. 4.29(c). Since Ã := (Aji)^t, we have

∀ j ∈ {1, . . . , n} : xj = (detA)^{−1} ∑_{k=1}^n (Ã)jk bk = (detA)^{−1} ∑_{k=1}^n Akj bk,

proving (4.27).
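A sketch of Cramer's rule over Q, using exact `fractions` arithmetic (the helper names are ad hoc): by Laplace expansion along the jth column, the sum ∑k Akj bk in (4.27) equals the determinant of A with its jth column replaced by b, which is what the code computes.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def det(a):
    # Leibniz formula (4.20)
    n = len(a)
    def sgn(p):
        s = 1
        for i in range(n):
            for j in range(i + 1, n):
                if p[i] > p[j]:
                    s = -s
        return s
    return sum(sgn(p) * prod(a[r][p[r]] for r in range(n))
               for p in permutations(range(n)))

def cramer(a, b):
    # x_j = det(A with jth column replaced by b) / det(A), equivalent to (4.27)
    d = Fraction(det(a))
    return [Fraction(det([row[:j] + [bk] + row[j + 1:]
                          for row, bk in zip(a, b)])) / d
            for j in range(len(a))]

# 2x + y = 5, x + 3y = 10 has the solution x = 1, y = 3
assert cramer([[2, 1], [1, 3]], [5, 10]) == [1, 3]
```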

Definition 4.32. Let n ∈ N. An element p = (p1, . . . , pn) ∈ (N0)^n is called a multi-index; |p| := p1 + · · · + pn is called the degree of the multi-index. Let R be a ring with unity. If x = (x1, . . . , xn) ∈ R^n and p = (p1, . . . , pn) is a multi-index, then we define

x^p := x1^{p1} x2^{p2} · · · xn^{pn}.   (4.28)

Each function from R^n into R, x ↦ x^p, is called a monomial function (in n variables); the degree of p is called the degree of the monomial. A function P from R^n into R is


called a polynomial function (in n variables) if, and only if, it is a linear combination of monomials, i.e. if, and only if, P has the form

P : R^n −→ R, P(x) = ∑_{|p|≤k} ap x^p, k ∈ N0, ap ∈ R.   (4.29)

The degree³ of P, denoted deg(P), is the largest number d ≤ k such that there is p ∈ (N0)^n with |p| = d and ap ≠ 0. If all ap = 0, i.e. if P ≡ 0, then P is the zero polynomial function and its degree is defined to be −∞ (some authors use −1 instead). If F is a field, then we also define a rational function as a quotient of two polynomials: If P, Q : F^n −→ F are polynomials, then

(P/Q) : F^n \ Q^{−1}{0} −→ F, (P/Q)(x) := P(x)/Q(x),   (4.30)

is called a rational function (in n variables).

Remark 4.33. Let F be a field and n ∈ N. Comparing Def. 4.19 with Def. 4.32, we observe that

det : F^{n²} −→ F, det(aji) := ∑_{π∈Sn} sgn(π) a1π(1) · · · anπ(n),

is a polynomial function of degree n (in n² variables). According to Th. 4.29(c) and (4.25), we have

inv : GLn(F) −→ GLn(F), inv(aji) := (aji)^{−1} = ( (−1)^{i+j} det(Mij) ) / det(aji),

where the Mji ∈ M(n − 1, F) denote the minor matrices of (aji). Thus, for each (k, l) ∈ {1, . . . , n}², the component function

invkl : GLn(F) −→ F, invkl(aji) = (−1)^{k+l} det(Mlk) / det(aji),

is a rational function invkl = Pkl / det, where Pkl is a polynomial of degree n − 1.

Remark 4.34 (Computation of Determinants). For n ≤ 3, one may well use the formulas of Ex. 4.23 to compute determinants. However, the larger n becomes, the less advisable is the use of formula (4.20) to compute det(aji), as this requires n · n! multiplications (note that n! grows faster for n → ∞ than n^k for each k ∈ N). Sometimes Th. 4.29(d),(e) can help, if one can expand with respect to a row or column, where many entries are 0. However, for a generic large n × n matrix A, it is usually the best strategy⁴ to transform A to triangular form (e.g. by using Gaussian elimination according to [Phi19, Alg. 8.17]) and then use Cor. 4.26 (the number of multiplications then merely grows proportional to n³): We know from Cor. 4.24(e),(i) that the operations of the Gaussian elimination algorithm do not change the determinant, except for sign changes, when switching rows. For the same reasons, one should use Gaussian elimination (or even more efficient algorithms adapted to special situations) rather than the neat-looking explicit formula of Cramer’s rule of Th. 4.31 to solve large linear systems and rather than Th. 4.29(c) to compute inverses of large matrices.

³Caveat: In general, deg, as defined here, is not a well-defined function of P, as it actually depends on the representation (4.29) of P, rather than on P itself, cf. Rem. 7.14 below.

⁴If A has a special structure (e.g. many zeros) and/or the field F has a special structure (e.g. F ∈ {R, C}), then there might well be more efficient methods to compute det(A), studied in the fields of Numerical Analysis and Numerical Linear Algebra.
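The strategy described in the remark can be sketched as follows (exact arithmetic over Q via `fractions`; the function name is ad hoc, and no pivoting strategy beyond "first nonzero entry" is attempted):

```python
from fractions import Fraction

def det_gauss(a):
    # reduce to upper triangular form; row swaps flip the sign
    # (Cor. 4.24(e)), adding multiples of other rows changes nothing
    # (Cor. 4.24(i)); then det is the product of the diagonal (Cor. 4.26)
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    sign = 1
    for c in range(n):
        p = next((r for r in range(c, n) if m[r][c] != 0), None)
        if p is None:                 # no pivot => singular => det = 0
            return Fraction(0)
        if p != c:
            m[c], m[p] = m[p], m[c]
            sign = -sign
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    d = Fraction(sign)
    for i in range(n):
        d *= m[i][i]
    return d

assert det_gauss([[1, 2, 3], [4, 5, 6], [7, 8, 10]]) == -3
assert det_gauss([[1, 2], [2, 4]]) == 0
```

The number of multiplications grows roughly like n³, in contrast to the n · n! multiplications of (4.20).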

Theorem 4.35 (Vandermonde Determinant). Let F be a field and λ0, λ1, . . . , λn ∈ F, n ∈ N. Moreover, let

V :=
( 1 λ0 . . . λ0^n )
( 1 λ1 . . . λ1^n )
( ...             )
( 1 λn . . . λn^n )   ∈ M(n + 1, F),   (4.31)

which is known as the corresponding Vandermonde matrix. Then its determinant, the so-called Vandermonde determinant, is given by

det(V) = ∏_{k,l=0, k>l}^n (λk − λl).   (4.32)

Proof. The proof can be conducted by induction with respect to n: For n = 1, we have

det(V) = | 1 λ0 ; 1 λ1 | = λ1 − λ0 = ∏_{k,l=0, k>l}^1 (λk − λl),

showing (4.32) holds for n = 1. Now let n > 1. Using Cor. 4.24(i), adding the (−λ0)-fold of the nth column to the (n + 1)st column, we obtain in the (n + 1)st column

(0, λ1^n − λ1^{n−1} λ0, . . . , λn^n − λn^{n−1} λ0)^t.

Next, one adds the (−λ0)-fold of the (n − 1)st column to the nth column, and, successively, the (−λ0)-fold of the mth column to the (m + 1)st column. One finishes, in the nth step, by adding the (−λ0)-fold of the first column to the second column, obtaining

det(V) =
| 1 0       0           . . . 0                  |
| 1 λ1 − λ0 λ1² − λ1 λ0 . . . λ1^n − λ1^{n−1} λ0 |
| ...                                            |
| 1 λn − λ0 λn² − λn λ0 . . . λn^n − λn^{n−1} λ0 |.

Applying (4.23) (in combination with Cor. 4.24(b)) then yields

det(V) = 1 ·
| λ1 − λ0 λ1² − λ1 λ0 . . . λ1^n − λ1^{n−1} λ0 |
| ...                                          |
| λn − λ0 λn² − λn λ0 . . . λn^n − λn^{n−1} λ0 |.


We now use multilinearity to factor out, for each k ∈ {1, . . . , n}, (λk − λ0) from the kth row, arriving at

det(V) = ∏_{k=1}^n (λk − λ0) ·
| 1 λ1 . . . λ1^{n−1} |
| ...                 |
| 1 λn . . . λn^{n−1} |,

which is precisely the Vandermonde determinant of the n numbers λ1, . . . , λn. Using the induction hypothesis, we obtain

det(V) = ∏_{k=1}^n (λk − λ0) ∏_{k,l=1, k>l}^n (λk − λl) = ∏_{k,l=0, k>l}^n (λk − λl),

completing the induction proof of (4.32).
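A numerical spot-check of (4.32) for four sample values (ad hoc names; the brute-force Leibniz `det` is only feasible here because the matrix is small):

```python
from itertools import permutations
from math import prod

def det(a):
    # Leibniz formula (4.20)
    n = len(a)
    def sgn(p):
        s = 1
        for i in range(n):
            for j in range(i + 1, n):
                if p[i] > p[j]:
                    s = -s
        return s
    return sum(sgn(p) * prod(a[r][p[r]] for r in range(n))
               for p in permutations(range(n)))

lam = [2, -1, 3, 5]                                   # lambda_0, ..., lambda_3
V = [[l ** j for j in range(len(lam))] for l in lam]  # rows (1, l, ..., l^n)
rhs = prod(lam[k] - lam[l] for k in range(len(lam)) for l in range(k))
assert det(V) == rhs == -432                          # (4.32)
```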

Remark and Definition 4.36 (Determinant on Linear Endomorphisms). Let V be a finite-dimensional vector space over the field F, n ∈ N, dimV = n, and A ∈ L(V, V). Moreover, let B1 = (v1, . . . , vn) and B2 = (w1, . . . , wn) be ordered bases of V. If (a^{(1)}_{ji}) ∈ M(n, F) is the matrix corresponding to A with respect to B1 and (a^{(2)}_{ji}) ∈ M(n, F) is the matrix corresponding to A with respect to B2, then we know from [Phi19, Th. 7.14] that there exists (cji) ∈ GLn(F) such that

(a^{(2)}_{ji}) = (cji)^{−1} (a^{(1)}_{ji}) (cji)

(namely (cji) such that, for each i ∈ {1, . . . , n}, wi = ∑_{j=1}^n cji vj). Thus,

det(a^{(2)}_{ji}) = (det(cji))^{−1} det(a^{(1)}_{ji}) det(cji) = det(a^{(1)}_{ji})

and, in consequence,

det : L(V, V) −→ F, det(A) := det(a^{(1)}_{ji}),

is well-defined.

Corollary 4.37. Let V be a finite-dimensional vector space over the field F, n ∈ N, dimV = n.

(a) If A,B ∈ L(V, V ), then det(AB) = det(A) det(B).

(b) If A ∈ L(V, V), then det(A) = 0 if, and only if, A is not bijective. If A is bijective, then det(A−1) = (det(A))^{−1}.

(c) If A ∈ L(V, V) and λ ∈ F, then det(λA) = λ^n det(A).

Proof. Let BV be an ordered basis of V .

(a): Let (aji), (bji) ∈ M(n, F ) be the matrices corresponding to A,B with respect toBV . Then

det(AB)[Phi19, Th. 7.10(a)]

= det(

(aji) (bji)) Cor. 4.24(g)

= det(aji) det(bji) = det(A) det(B).


(b): Since (aji) (with (aji) as before) is singular if, and only if, A is not bijective, (b) is immediate from Cor. 4.24(h).

(c): If (aji) is as before, then

det(λA) = det(λ (aji)) Cor. 4.24(d)= λ^n det(aji) = λ^n det(A),

thereby completing the proof.

5 Direct Sums and Projections

In [Phi19, Def. 5.10], we defined sums of arbitrary (finite or infinite) families of subspaces. In [Phi19, Def. 5.28], we defined the direct sum for two subspaces. We will now extend the notion of direct sum to arbitrary families of subspaces:

Definition 5.1. Let V be a vector space over the field F, let I be an index set and let (Ui)i∈I be a family of subspaces of V. We say that V is the direct sum of the family of subspaces (Ui)i∈I if, and only if, the following two conditions hold:

(i) V = ∑_{i∈I} Ui.

(ii) For each finite J ⊆ I and each family (uj)j∈J in V such that uj ∈ Uj for each j ∈ J, one has

0 = ∑_{j∈J} uj  ⇒  uj = 0 for each j ∈ J.

If V is the direct sum of the Ui, then we write V = ⊕_{i∈I} Ui.

Proposition 5.2. Let V be a vector space over the field F, let I be an index set and let (Ui)i∈I be a family of subspaces of V. Then the following statements are equivalent:

(i) V = ⊕_{i∈I} Ui.

(ii) For each v ∈ V, there exists a unique finite subset Jv of I and a unique map σv : Jv −→ V \ {0}, j ↦ uj(v) := σv(j), such that

v = ∑_{j∈Jv} σv(j) = ∑_{j∈Jv} uj(v), where uj(v) ∈ Uj for each j ∈ Jv.   (5.1)

(iii) V = ∑_{i∈I} Ui and

∀ j ∈ I : Uj ∩ ∑_{i∈I\{j}} Ui = {0}.   (5.2)

(iv) V = ∑_{i∈I} Ui and, letting I′ := {i ∈ I : Ui ≠ {0}}, each family (ui)i∈I′ in V with ui ∈ Ui \ {0} for each i ∈ I′ is linearly independent.


Proof. “(i) ⇔ (ii)”: According to the definition of V = ∑_{i∈I} Ui, the existence of Jv and σv such that (5.1) holds is equivalent to V = ∑_{i∈I} Ui. If (5.1) holds and Iv ⊆ I is finite such that v = ∑_{j∈Iv} τv(j) with τv : Iv −→ V \ {0} and τv(j) ∈ Uj for each j ∈ Iv, then define σv(j) := 0 for each j ∈ Iv \ Jv and τv(j) := 0 for each j ∈ Jv \ Iv. Then

0 = v − v = ∑_{j∈Jv∪Iv} ( σv(j) − τv(j) )

and Def. 5.1(ii) implies σv(j) = τv(j) for each j ∈ Jv ∪ Iv as well as Jv = Iv. Conversely, assume there exists J ⊆ I finite and uj ∈ Uj (j ∈ J) such that

0 = ∑_{j∈J} uj and uj0 ≠ 0 for some j0 ∈ J.

Then

uj0 = ∑_{j∈J\{j0}} (−uj)

shows (ii) does not hold (as v := uj0 has two different representations).

“(i) ⇒ (iii)”: If (i) holds and v ∈ Uj ∩ ∑_{i∈I\{j}} Ui for some j ∈ I, then there exists a finite J ⊆ I \ {j} such that v = ∑_{i∈J} ui with ui ∈ Ui for each i ∈ J. Since −v ∈ Uj and

0 = −v + ∑_{i∈J} ui,

Def. 5.1(ii) implies −v = 0 = v, i.e. (iii).

“(iii) ⇒ (i)”: Let J ⊆ I be finite such that 0 = ∑_{i∈J} ui with ui ∈ Ui for each i ∈ J. Then

∀ j ∈ J : uj = −∑_{i∈J\{j}} ui ∈ Uj ∩ ∑_{i∈I\{j}} Ui,

i.e. (iii) implies uj = 0 and, thus, Def. 5.1(ii).

“(iv) ⇒ (i)”: If J ⊆ I′ is finite and uj ∈ Uj \ {0} for each j ∈ J with (uj)j∈J linearly independent, then ∑_{j∈J} uj ≠ 0, implying Def. 5.1(ii) via contraposition.

“(i) ⇒ (iv)”: Suppose (ui)i∈I′ is a family in V such that ui ∈ Ui \ {0} for each i ∈ I′. If J ⊆ I′ is finite and 0 = ∑_{j∈J} λj uj for some λj ∈ F, then Def. 5.1(ii) implies λj uj = 0 and, thus, λj = 0, for each j ∈ J, yielding the linear independence of (ui)i∈I′.

Proposition 5.3. Let $V$ be a vector space over the field $F$.

(a) Let $B$ be a basis of $V$ with a decomposition $B = \dot{\bigcup}_{i\in I} B_i$. If, for each $i \in I$, $U_i := \langle B_i \rangle$, then $V = \bigoplus_{i\in I} U_i$. In particular, $V = \bigoplus_{b\in B} \langle b \rangle$.

(b) If $(U_i)_{i\in I}$ is a family of subspaces of $V$ such that $B_i$ is a basis of $U_i$ and $V = \bigoplus_{i\in I} U_i$, then the $B_i$ are pairwise disjoint and $B := \dot{\bigcup}_{i\in I} B_i$ forms a basis of $V$.

Proof. Exercise.


Example 5.4. Consider the vector space $V := \mathbb{R}^2$ over $\mathbb{R}$ and let $U_1 := \langle (1,0) \rangle$, $U_2 := \langle (0,1) \rangle$, $U_3 := \langle (1,1) \rangle$. Then $V = U_i + U_j$ and $U_i \cap U_j = \{0\}$ for each $i, j \in \{1, 2, 3\}$ with $i \neq j$. In particular, the sum $V = U_1 + U_2 + U_3$ is not a direct sum, showing Prop. 5.2(iii) can, in general, not be replaced by the condition $U_i \cap U_j = \{0\}$ for each $i, j \in I$ with $i \neq j$.
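The failure of directness in Example 5.4 can be checked numerically. The following sketch (using numpy; not part of the original notes) verifies both the trivial pairwise intersections and the nontrivial representation of $0$:

```python
import numpy as np

u1, u2, u3 = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])

# Pairwise intersections are trivial: no two of the spanning vectors are
# parallel, so U_i ∩ U_j = {0} for i != j (checked via linear independence).
for a, b in [(u1, u2), (u1, u3), (u2, u3)]:
    assert np.linalg.matrix_rank(np.column_stack([a, b])) == 2

# Yet the sum U_1 + U_2 + U_3 is not direct: 0 has the nontrivial
# representation u1 + u2 - u3 = 0, violating Def. 5.1(ii).
assert np.allclose(u1 + u2 - u3, 0.0)
```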

Definition 5.5. Let $S$ be a set. Then $P : S \longrightarrow S$ is called a projection if, and only if, $P^2 := P \circ P = P$.

Remark 5.6. For each set $S$, $\operatorname{Id} : S \longrightarrow S$ is a projection. Moreover, for each $x \in S$, the constant map $f_x : S \longrightarrow S$, $f_x \equiv x$, is a projection. If $V := S$ is a vector space over a field $F$, then $f_x$ is linear if, and only if, $x = 0$. While this shows that not every projection on a vector space is linear, here, we are interested in linear projections. We will see in Th. 5.8 below that there is a close relationship between linear projections and direct sums.

Example 5.7. (a) Let $V$ be a vector space over the field $F$ and let $B$ be a basis of $V$. If $c_v : B_v \longrightarrow F \setminus \{0\}$, $B_v \subseteq V$ finite, is the coordinate map for $v \in V$, then, clearly, for each $b \in B$,
$$P_b : V \longrightarrow V, \quad P_b(v) := \begin{cases} c_v(b)\, b & \text{for } b \in B_v, \\ 0 & \text{otherwise}, \end{cases}$$
is a linear projection.

(b) Let $I$ be a nonempty set and consider the vector space $V := \mathcal{F}(I, F) = F^I$ over the field $F$. If
$$\forall_{i\in I}\quad e_i : I \longrightarrow F, \quad e_i(j) := \delta_{ij},$$
then, clearly, for each $i \in I$,
$$P_i : V \longrightarrow V, \quad P_i(f) := f(i)\, e_i,$$
is a linear projection.

(c) Let $n \in \mathbb{N}$ and let $V$ be the vector space over $\mathbb{R}$, consisting of all functions $f : \mathbb{R} \longrightarrow \mathbb{R}$ such that the $n$-th derivative $f^{(n)}$ exists. Then, clearly,
$$P : V \longrightarrow V, \quad P(f)(x) := \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!}\, x^k,$$
is a linear projection, where the image of $P$ is the subspace of all polynomial functions of degree at most $n$.
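Restricted to polynomial functions, the map $P$ of Example 5.7(c) acts on coefficient vectors simply by truncation after index $n$, which makes its idempotence visible. The following sketch (our own modelling, not part of the notes) checks $P \circ P = P$ on a sample polynomial:

```python
# Model Example 5.7(c) on polynomial functions: on the coefficient vector
# (a_0, a_1, ...), the Taylor projection of degree <= n is truncation after
# index n (higher coefficients are set to 0), which is visibly idempotent.
def taylor_truncate(coeffs, n):
    """Keep the coefficients of x^0, ..., x^n; replace the rest by 0."""
    return coeffs[: n + 1] + [0.0] * max(0, len(coeffs) - (n + 1))

f = [1.0, 2.0, 3.0, 4.0, 5.0]           # 1 + 2x + 3x^2 + 4x^3 + 5x^4
Pf = taylor_truncate(f, 2)              # project onto degree <= 2
PPf = taylor_truncate(Pf, 2)
assert Pf == [1.0, 2.0, 3.0, 0.0, 0.0]  # image has degree at most 2
assert PPf == Pf                        # P o P = P: a projection
```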

Theorem 5.8. Let V be a vector space over the field F .


(a) Let $(U_i)_{i\in I}$ be a family of subspaces of $V$ such that $V = \bigoplus_{i\in I} U_i$. According to Prop. 5.2(ii), for each $v \in V$, there exists a unique finite subset $J_v$ of $I$ and a unique map $\sigma_v : J_v \longrightarrow V \setminus \{0\}$, $j \mapsto u_j(v) := \sigma_v(j)$, such that
$$v = \sum_{j\in J_v} \sigma_v(j) = \sum_{j\in J_v} u_j(v) \quad\wedge\quad \forall_{j\in J_v}\; u_j(v) \in U_j.$$
Thus, we can define
$$\forall_{j\in I}\quad P_j : V \longrightarrow V, \quad P_j(v) := \begin{cases} \sigma_v(j) & \text{for } j \in J_v, \\ 0 & \text{otherwise}. \end{cases} \tag{5.3}$$
Then each $P_j$ is a linear projection with $\operatorname{Im} P_j = U_j$ and $\ker P_j = \bigoplus_{i\in I\setminus\{j\}} U_i$. Moreover, if $i \neq j$, then $P_i P_j \equiv 0$. Defining
$$\Bigl(\sum_{i\in I} P_i\Bigr) : V \longrightarrow V, \quad \Bigl(\sum_{i\in I} P_i\Bigr)(v) := \sum_{i\in J_v} P_i(v), \tag{5.4}$$
we have $\operatorname{Id} = \sum_{i\in I} P_i$.

(b) Let $(P_i)_{i\in I}$ be a family of projections in $\mathcal{L}(V, V)$ such that, for each $v \in V$, the set $J_v := \{i \in I : P_i(v) \neq 0\}$ is finite. If $\operatorname{Id} = \sum_{i\in I} P_i$ (where $\sum_{i\in I} P_i$ is defined as in (5.4)) and $P_i P_j \equiv 0$ for each $i, j \in I$ with $i \neq j$, then
$$V = \bigoplus_{i\in I} \operatorname{Im} P_i.$$

(c) If $P \in \mathcal{L}(V, V)$ is a projection, then $V = \ker P \oplus \operatorname{Im} P$, $\operatorname{Im} P = \ker(\operatorname{Id} - P)$, $\ker P = \operatorname{Im}(\operatorname{Id} - P)$.

Proof. (a): If $v, w \in V$ and $\lambda \in F$, then, extending $\sigma_v$ and $\sigma_w$ by $0$ to $J_v \cup J_w$, we have
$$\forall_{j\in J_v \cup J_w}\quad \bigl(\sigma_{v+w}(j) = \sigma_v(j) + \sigma_w(j) \in U_j \;\wedge\; \sigma_{\lambda v}(j) = \lambda \sigma_v(j) \in U_j\bigr),$$
showing, for each $i \in I$, the linearity of $P_i$ as well as $\operatorname{Im} P_i = U_i$ and $P_i^2 = P_i$. Clearly, if $j \in I$, then $\bigoplus_{i\in I\setminus\{j\}} U_i \subseteq \ker P_j$. On the other hand, if $v \in \ker P_j$, then $j \notin J_v$, showing $v \in \bigoplus_{i\in I\setminus\{j\}} U_i$. If $i \neq j$ and $v \in V$, then $P_j v \in U_j \subseteq \ker P_i$, showing $P_i P_j \equiv 0$. Finally, for each $v \in V$, we compute
$$\Bigl(\sum_{i\in I} P_i\Bigr)(v) = \sum_{i\in J_v} P_i(v) = \sum_{i\in J_v} \sigma_v(i) = v,$$
thereby completing the proof of (a).

(b): For each $v \in V$, we have
$$v = \operatorname{Id} v = \sum_{i\in J_v} P_i(v) \in \sum_{i\in J_v} \operatorname{Im} P_i,$$


proving $V = \sum_{i\in I} \operatorname{Im} P_i$. Now let $J \subseteq I$ be finite such that $0 = \sum_{j\in J} u_j$ with $u_j \in \operatorname{Im} P_j$ for each $j \in J$. Then there exist $v_j \in V$ such that $u_j = P_j v_j$. Thus, we obtain
$$\forall_{i\in J}\quad 0 = P_i(0) = P_i\Bigl(\sum_{j\in J} u_j\Bigr) = \sum_{j\in J} P_i P_j v_j = P_i P_i v_i = P_i v_i = u_i,$$
showing Def. 5.1(ii) to hold and proving (b).

(c): We have
$$(\operatorname{Id} - P)^2 = (\operatorname{Id} - P)(\operatorname{Id} - P) = \operatorname{Id} - P - P + P^2 = \operatorname{Id} - P,$$
showing $\operatorname{Id} - P$ to be a projection. On the other hand, $P(\operatorname{Id} - P) = (\operatorname{Id} - P)P = P - P^2 = P - P = 0$, i.e.
$$V = \operatorname{Im} P \oplus \operatorname{Im}(\operatorname{Id} - P) \tag{5.5}$$
according to (b). We show $\ker P = \operatorname{Im}(\operatorname{Id} - P)$ next: Let $v \in V$ and $x := (\operatorname{Id} - P)v \in \operatorname{Im}(\operatorname{Id} - P)$. Then $Px = Pv - P^2 v = Pv - Pv = 0$, showing $x \in \ker P$ and $\operatorname{Im}(\operatorname{Id} - P) \subseteq \ker P$. Conversely, let $x \in \ker P$. By (5.5), we write $x = v_1 + v_2$ with $v_1 \in \operatorname{Im} P$ and $v_2 \in \operatorname{Im}(\operatorname{Id} - P) \subseteq \ker P$. Then $v_1 = x - v_2 \in \ker P$ as well, i.e. $v_1 \in \operatorname{Im} P \cap \ker P$. Then there exists $w \in V$ such that $v_1 = Pw$ and we compute
$$v_1 = Pw = P^2 w = P v_1 = 0,$$
as $v_1 \in \ker P$ as well. Thus, $x = v_2 \in \operatorname{Im}(\operatorname{Id} - P)$, showing $\ker P \subseteq \operatorname{Im}(\operatorname{Id} - P)$ as desired. From (5.5), we then also have $V = \operatorname{Im} P \oplus \ker P$. Since we have seen $\operatorname{Id} - P$ to be a projection as well, we also obtain $\ker(\operatorname{Id} - P) = \operatorname{Im}(\operatorname{Id} - (\operatorname{Id} - P)) = \operatorname{Im} P$.
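The decomposition $V = \ker P \oplus \operatorname{Im} P$ of Th. 5.8(c) can be illustrated numerically. The following sketch (using numpy; the concrete matrix is our own choice, not from the notes) checks it for a non-orthogonal projection on $\mathbb{R}^2$:

```python
import numpy as np

# A linear projection on R^2 given by the matrix [[1, 1], [0, 0]], i.e.
# P(x, y) = (x + y, 0). Th. 5.8(c) predicts V = ker P (+) Im P, realized
# by the splitting v = (Id - P)v + Pv.
P = np.array([[1.0, 1.0], [0.0, 0.0]])
I = np.eye(2)

assert np.allclose(P @ P, P)             # P is a projection

v = np.array([3.0, 4.0])
v_im = P @ v                             # component in Im P = ker(Id - P)
v_ker = (I - P) @ v                      # component in ker P = Im(Id - P)
assert np.allclose(v_im + v_ker, v)      # the splitting recovers v
assert np.allclose(P @ v_ker, 0.0)       # v_ker lies in ker P
assert np.allclose((I - P) @ v_im, 0.0)  # v_im lies in ker(Id - P)
```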

6 Eigenvalues

Definition 6.1. Let $V$ be a vector space over the field $F$ and $A \in \mathcal{L}(V, V)$.

(a) We call $\lambda \in F$ an eigenvalue of $A$ if, and only if, there exists $0 \neq v \in V$ such that
$$Av = \lambda v. \tag{6.1}$$
Then each $0 \neq v \in V$ such that (6.1) holds is called an eigenvector of $A$ for the eigenvalue $\lambda$; the set
$$E_A(\lambda) := \ker\bigl(\lambda \operatorname{Id} - A\bigr) = \{v \in V : Av = \lambda v\} \tag{6.2}$$
is then called the eigenspace of $A$ with respect to the eigenvalue $\lambda$. The set
$$\sigma(A) := \{\lambda \in F : \lambda \text{ eigenvalue of } A\} \tag{6.3}$$
is called the spectrum of $A$.


(b) We call $A$ diagonalizable if, and only if, there exists a basis $B$ of $V$ such that each $v \in B$ is an eigenvector of $A$.

Remark 6.2. Let $V$ be a finite-dimensional vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and assume $A \in \mathcal{L}(V, V)$ to be diagonalizable. Then there exists a basis $B = \{v_1, \ldots, v_n\}$ of $V$, consisting of eigenvectors of $A$, i.e. $Av_i = \lambda_i v_i$, $\lambda_i \in \sigma(A)$, for each $i \in \{1, \ldots, n\}$. Thus, with respect to $B$, $A$ is represented by the diagonal matrix
$$\operatorname{diag}(\lambda_1, \ldots, \lambda_n) = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix},$$
which explains the term diagonalizable.

It will be a main goal of the present and the following sections to investigate under which conditions, given $A \in \mathcal{L}(V, V)$ with $\dim V < \infty$, $V$ has a basis $B$ such that, with respect to $B$, $A$ is represented by a diagonal matrix. While such a basis does not always exist, we will see that there always exist bases such that the representing matrix has a particularly simple structure, a so-called normal form.

Theorem 6.3. Let $V$ be a vector space over the field $F$ and $A \in \mathcal{L}(V, V)$.

(a) $\lambda \in F$ is an eigenvalue of $A$ if, and only if, $\ker(\lambda \operatorname{Id} - A) \neq \{0\}$, i.e. if, and only if, $\lambda \operatorname{Id} - A$ is not injective.

(b) For each eigenvalue $\lambda$ of $A$, the eigenspace $E_A(\lambda)$ constitutes a subspace of $V$.

(c) Let $(v_\lambda)_{\lambda\in\sigma(A)}$ be a family in $V$ such that, for each $\lambda \in \sigma(A)$, $v_\lambda$ is an eigenvector for $\lambda$. Then $(v_\lambda)_{\lambda\in\sigma(A)}$ is linearly independent (in particular, $\#\sigma(A) \leq \dim V$).

(d) $A$ is diagonalizable if, and only if, $V$ is the direct sum of the eigenspaces of $A$, i.e. if, and only if,
$$V = \bigoplus_{\lambda\in\sigma(A)} E_A(\lambda). \tag{6.4}$$

(e) Let $A$ be diagonalizable and, for each $\lambda \in \sigma(A)$, let $P_\lambda : V \longrightarrow V$ be the projection with
$$\operatorname{Im} P_\lambda = E_A(\lambda) \quad\text{and}\quad \ker P_\lambda = \bigoplus_{\mu\in\sigma(A)\setminus\{\lambda\}} E_A(\mu)$$
given by (d) in combination with Th. 5.8(a). Then
$$A = \sum_{\lambda\in\sigma(A)} \lambda P_\lambda \tag{6.5a}$$
and
$$\forall_{\lambda\in\sigma(A)}\quad A P_\lambda = P_\lambda A, \tag{6.5b}$$
where (6.5a) is known as the spectral decomposition of $A$.


Proof. (a) holds, as, for each $\lambda \in F$ and each $v \in V$,
$$Av = \lambda v \;\Leftrightarrow\; \lambda v - Av = 0 \;\Leftrightarrow\; (\lambda \operatorname{Id} - A)v = 0 \;\Leftrightarrow\; v \in \ker(\lambda \operatorname{Id} - A).$$

(b) holds, as $E_A(\lambda)$ is the kernel of a linear map.

(c): Seeking a contradiction, assume $(v_\lambda)_{\lambda\in\sigma(A)}$ to be linearly dependent. Then there exists a minimal family of vectors $(v_{\lambda_1}, \ldots, v_{\lambda_k})$, $k \in \mathbb{N}$, such that $\lambda_1, \ldots, \lambda_k \in \sigma(A)$ and there exist $c_1, \ldots, c_k \in F \setminus \{0\}$ with
$$0 = \sum_{i=1}^{k} c_i\, v_{\lambda_i}.$$
We compute
$$0 = 0 - 0 = A\Bigl(\sum_{i=1}^{k} c_i\, v_{\lambda_i}\Bigr) - \lambda_k \sum_{i=1}^{k} c_i\, v_{\lambda_i} = \sum_{i=1}^{k} c_i\, \lambda_i\, v_{\lambda_i} - \lambda_k \sum_{i=1}^{k} c_i\, v_{\lambda_i} = \sum_{i=1}^{k-1} c_i\, (\lambda_i - \lambda_k)\, v_{\lambda_i}.$$
As we had chosen the family $(v_{\lambda_1}, \ldots, v_{\lambda_k})$ to be minimal, we obtain $c_i (\lambda_i - \lambda_k) = 0$ for each $i \in \{1, \ldots, k-1\}$, which is a contradiction, since $c_i \neq 0$ as well as $\lambda_i \neq \lambda_k$.

(d): If $A$ is diagonalizable, $V$ has a basis $B$, consisting of eigenvectors of $A$. Letting, for each $\lambda \in \sigma(A)$, $B_\lambda := \{b \in B : Ab = \lambda b\}$, we have that $B_\lambda$ is a basis of $E_A(\lambda)$. Since we have $B = \dot{\bigcup}_{\lambda\in\sigma(A)} B_\lambda$, (6.4) now follows from Prop. 5.3(a). Conversely, if (6.4) holds, then $V$ has a basis of eigenvectors of $A$ by means of Prop. 5.3(b).

(e): Exercise.

Corollary 6.4. Let $V$ be a vector space over the field $F$ and $A \in \mathcal{L}(V, V)$. If $\dim V = n \in \mathbb{N}$ and $A$ has $n$ distinct eigenvalues $\lambda_1, \ldots, \lambda_n \in F$, then $A$ is diagonalizable.

Proof. Due to Th. 6.3(b),(c), we must have
$$V = \bigoplus_{i=1}^{n} E_A(\lambda_i),$$
showing $A$ to be diagonalizable by Th. 6.3(d).

The following examples illustrate the dependence of diagonalizability, and even of the mere existence of eigenvalues, on the structure of the field $F$.

Example 6.5. (a) Let $K \in \{\mathbb{R}, \mathbb{C}\}$ and let $V$ be a vector space over $K$ with $\dim V = 2$ and ordered basis $B := (v_1, v_2)$. Consider $A \in \mathcal{L}(V, V)$ such that
$$Av_1 = v_2, \qquad Av_2 = -v_1.$$


With respect to $B$, $A$ is then given by the matrix $M := \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. In consequence,
$$M^2 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix},$$
showing $A^2 = -\operatorname{Id}$ as well. Suppose $\lambda \in \sigma(A)$ and $0 \neq v \in E_A(\lambda)$. Then
$$-v = A^2 v = \lambda^2 v \;\Rightarrow\; \lambda^2 = -1.$$
Thus, for $K = \mathbb{R}$, $A$ has no eigenvalues, $\sigma(A) = \emptyset$. For $K = \mathbb{C}$, we obtain
$$A(v_1 + iv_2) = v_2 - iv_1 = -i(v_1 + iv_2), \qquad A(v_1 - iv_2) = v_2 + iv_1 = i(v_1 - iv_2),$$
showing $A$ to be diagonalizable with $\sigma(A) = \{i, -i\}$ and $\{v_1 + iv_2,\, v_1 - iv_2\}$ being a basis of $V$ of eigenvectors of $A$.
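Numerically, the same dichotomy is visible: numpy, which works over $\mathbb{C}$, returns the complex eigenvalue pair $\pm i$ for the matrix of Example 6.5(a). A sketch (not part of the original notes):

```python
import numpy as np

# The matrix of A from Example 6.5(a). Over R it has no eigenvalues;
# numerically, numpy produces the complex pair +-i instead.
M = np.array([[0.0, -1.0], [1.0, 0.0]])
assert np.allclose(np.sort_complex(np.linalg.eigvals(M)), [-1j, 1j])

# The eigenvector v1 + i v2 from the text (here v1, v2 = standard basis):
v = np.array([1.0, 1j])
assert np.allclose(M @ v, -1j * v)  # A(v1 + i v2) = -i (v1 + i v2)
```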

(b) Over $\mathbb{R}$, consider the vector spaces
$$V_1 := \{(f : \mathbb{R} \longrightarrow \mathbb{R}) : f \text{ polynomial function}\}, \qquad V_2 := \bigl\langle \{(\exp_a : \mathbb{R} \longrightarrow \mathbb{R}) : a \in \mathbb{R}\} \bigr\rangle,$$
where
$$\forall_{a\in\mathbb{R}}\quad \exp_a : \mathbb{R} \longrightarrow \mathbb{R}, \quad \exp_a(x) := e^{ax}.$$
For $i \in \{1, 2\}$, consider the linear map
$$D : V_i \longrightarrow V_i, \quad D(f) := f'.$$
If $P \in V_1 \setminus \{0\}$, then $\deg(DP) < \deg(P)$. If $P \in V_1$ is constant, then $DP = 0 \cdot P = 0$. In consequence, $0 \in \mathbb{R}$ is the only eigenvalue of $D : V_1 \longrightarrow V_1$. On the other hand, for each $a \in \mathbb{R}$, $D(\exp_a) = a \exp_a$, showing $\sigma(D) = \mathbb{R}$ for $D : V_2 \longrightarrow V_2$. In this case, $D$ is even diagonalizable, since $B := \{\exp_a : a \in \mathbb{R}\}$ is a basis of $V_2$ of eigenvectors of $D$.

(c) Let $V$ be a vector space over the field $F$ and assume $\operatorname{char} F \neq 2$. Moreover, let $A \in \mathcal{L}(V, V)$ such that $A^2 = \operatorname{Id}$ (for $\dim V = 2$, a nontrivial example is given by each $A$ represented by the matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ with respect to some ordered basis of $V$). We claim $A$ to be diagonalizable with $\sigma(A) = \{-1, 1\}$ and
$$V = E_A(-1) \oplus E_A(1):$$
Indeed, if $x \in \operatorname{Im}(A + \operatorname{Id})$, then there exists $v \in V$ such that $x = (A + \operatorname{Id})v$, implying
$$Ax = A(A + \operatorname{Id})v = (A^2 + A)v = (\operatorname{Id} + A)v = x,$$
showing $\operatorname{Im}(A + \operatorname{Id}) \subseteq E_A(1)$. If $x \in \operatorname{Im}(\operatorname{Id} - A)$, then there exists $v \in V$ such that $x = (\operatorname{Id} - A)v$, implying
$$Ax = A(\operatorname{Id} - A)v = (A - A^2)v = (A - \operatorname{Id})v = -x,$$
showing $\operatorname{Im}(\operatorname{Id} - A) \subseteq E_A(-1)$. Now let $v \in V$ be arbitrary. We use $2 \neq 0$ in $F$ to obtain
$$v = \frac{1}{2}(v + Av) + \frac{1}{2}(v - Av) \in \operatorname{Im}(\operatorname{Id} + A) + \operatorname{Im}(\operatorname{Id} - A) \subseteq E_A(1) + E_A(-1),$$
showing $V = E_A(1) + E_A(-1)$ (the sum is direct, since $v \in E_A(1) \cap E_A(-1)$ implies $v = Av = -v$, i.e. $2v = 0$ and, thus, $v = 0$).

(d) Let $V$ be a vector space over the field $F$ with $\dim V = 2$ and ordered basis $B := (v_1, v_2)$. Consider $A \in \mathcal{L}(V, V)$ such that
$$Av_1 = v_1, \qquad Av_2 = v_1 + v_2.$$
With respect to $B$, $A$ is then given by the matrix $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. Due to $Av_1 = v_1$, we have $1 \in \sigma(A)$. Let $v \in V$. Then there exist $c_1, c_2 \in F$ such that $v = c_1 v_1 + c_2 v_2$. If $\lambda \in \sigma(A)$ and $0 \neq v \in E_A(\lambda)$, then
$$\lambda(c_1 v_1 + c_2 v_2) = \lambda v = Av = c_1 v_1 + c_2 v_1 + c_2 v_2.$$
As the coordinates with respect to the basis $B$ are unique, we conclude $\lambda c_1 = c_1 + c_2$ and $\lambda c_2 = c_2$. If $c_2 \neq 0$, then the second equation yields $\lambda = 1$. If $c_2 = 0$, then $c_1 \neq 0$ and the first equation yields $\lambda = 1$. Altogether, we obtain $\sigma(A) = \{1\}$ and, since $A \neq \operatorname{Id}$, $A$ is not diagonalizable.

Definition 6.6. Let $S$ be a set and $A : S \longrightarrow S$. Then $U \subseteq S$ is called $A$-invariant if, and only if, $A(U) \subseteq U$.

Proposition 6.7. Let $V$ be a vector space over the field $F$ and let $U \subseteq V$ be a subspace. If $A \in \mathcal{L}(V, V)$ is diagonalizable and $U$ is $A$-invariant, then $A{\restriction}_U$ is diagonalizable as well.

Proof. Let $A$ be diagonalizable, let $U$ be $A$-invariant, and set $A_U := A{\restriction}_U$. Clearly,
$$\forall_{\lambda\in\sigma(A)}\quad U \cap E_A(\lambda) = \begin{cases} E_{A_U}(\lambda) & \text{for } \lambda \in \sigma(A_U), \\ \{0\} & \text{for } \lambda \notin \sigma(A_U). \end{cases}$$
As $A$ is diagonalizable, from Th. 6.3(d), we know
$$V = \bigoplus_{\lambda\in\sigma(A)} E_A(\lambda). \tag{6.6}$$
It suffices to show
$$U = W := \sum_{\lambda\in\sigma(A)} \bigl(U \cap E_A(\lambda)\bigr),$$


since, then,
$$U = \bigoplus_{\lambda\in\sigma(A_U)} E_{A_U}(\lambda)$$
due to Th. 6.3(d) and Prop. 5.2(iv). Thus, seeking a contradiction, let $u \in U \setminus W$ (note $u \neq 0$). Then there exist distinct $\lambda_1, \ldots, \lambda_n \in \sigma(A)$, $n \in \mathbb{N}$, such that $u = \sum_{i=1}^{n} v_i$ with $v_i \in E_A(\lambda_i) \setminus \{0\}$ for each $i \in \{1, \ldots, n\}$, where we may choose $u \in U \setminus W$ such that $n \in \mathbb{N}$ is minimal. Since $U$ is $A$-invariant, we know
$$Au = \sum_{i=1}^{n} A v_i = \sum_{i=1}^{n} \lambda_i\, v_i \in U.$$
As $\lambda_n u \in U$ as well, we conclude
$$Au - \lambda_n u = \sum_{i=1}^{n-1} (\lambda_i - \lambda_n)\, v_i \in U$$
as well. Since $u \in U \setminus W$ was chosen such that $n$ is minimal, we must have $Au - \lambda_n u \in U \cap W$. Thus, there exists a finite set $\sigma_u \subseteq \sigma(A)$ such that
$$Au - \lambda_n u = \sum_{i=1}^{n-1} (\lambda_i - \lambda_n)\, v_i = \sum_{\lambda\in\sigma_u} w_\lambda \quad\wedge\quad \forall_{\lambda\in\sigma_u}\; 0 \neq w_\lambda \in U \cap E_A(\lambda). \tag{6.7}$$
Since the sum in (6.6) is direct, (6.7) and Prop. 5.2(ii) imply $\sigma_u = \{\lambda_1, \ldots, \lambda_{n-1}\}$ and
$$\forall_{i\in\{1,\ldots,n-1\}}\quad \Bigl(w_{\lambda_i} = (\lambda_i - \lambda_n)\, v_i \;\overset{\lambda_i \neq \lambda_n}{\Rightarrow}\; v_i \in U \cap E_A(\lambda_i) \subseteq W\Bigr).$$
On the other hand, this then implies
$$v_n = u - \sum_{i=1}^{n-1} v_i \in U \;\overset{v_n \in E_A(\lambda_n)}{\Rightarrow}\; v_n \in W,$$
yielding the contradiction $u = v_n + \sum_{i=1}^{n-1} v_i \in W$. Thus, the assumption that there exists $u \in U \setminus W$ was false, proving $U = W$ as desired.

We will now use Prop. 6.7 to prove a result regarding the simultaneous diagonalizability of linear endomorphisms:

Theorem 6.8. Let $V$ be a vector space over the field $F$ and let $A_1, \ldots, A_n \in \mathcal{L}(V, V)$, $n \in \mathbb{N}$, be diagonalizable linear endomorphisms. Then $A_1, \ldots, A_n$ are simultaneously diagonalizable (i.e. there exists a basis $B$ of $V$, consisting of eigenvectors of $A_i$ for each $i \in \{1, \ldots, n\}$) if, and only if,
$$\forall_{i,j\in\{1,\ldots,n\}}\quad A_i A_j = A_j A_i. \tag{6.8}$$


Proof. Suppose $B$ is a basis of $V$ such that
$$\forall_{i\in\{1,\ldots,n\}}\; \forall_{b\in B}\; \exists_{\lambda_{i,b}\in\sigma(A_i)}\quad A_i b = \lambda_{i,b}\, b. \tag{6.9}$$
Then
$$\forall_{i,j\in\{1,\ldots,n\}}\quad A_i A_j b = \lambda_{i,b}\, \lambda_{j,b}\, b = A_j A_i b,$$
proving (6.8). Conversely, assume (6.8) to hold. We prove (6.9) via induction on $n \in \mathbb{N}$. For technical reasons, we actually prove (6.9) via induction on $n \in \mathbb{N}$ in the following, clearly, equivalent form: There exists a family $(V_k)_{k\in K}$ of subspaces of $V$ such that

(i) $V = \bigoplus_{k\in K} V_k$.

(ii) For each $k \in K$, $V_k$ has a basis $B_k$, consisting of eigenvectors of $A_i$ for each $i \in \{1, \ldots, n\}$.

(iii) For each $k \in K$ and each $i \in \{1, \ldots, n\}$, $V_k$ is contained in some eigenspace of $A_i$, i.e.
$$\forall_{k\in K}\; \forall_{i\in\{1,\ldots,n\}}\; \exists_{\lambda_{ik}\in\sigma(A_i)}\quad \Bigl(V_k \subseteq E_{A_i}(\lambda_{ik}), \quad \forall_{v\in V_k}\; A_i v = \lambda_{ik} v\Bigr).$$

(iv) For each $k, l \in K$ with $k \neq l$, there exists $i \in \{1, \ldots, n\}$ such that $V_k$ and $V_l$ are not contained in the same eigenspace of $A_i$, i.e.
$$\forall_{k,l\in K}\quad \bigl(k \neq l \;\Rightarrow\; \exists_{i\in\{1,\ldots,n\}}\; \lambda_{ik} \neq \lambda_{il}\bigr).$$

For $n = 1$, we can simply use $K := \sigma(A_1)$ and, for each $\lambda \in K$, $V_\lambda := E_{A_1}(\lambda)$. Thus, consider $n > 1$. By induction, assume (i) – (iv) to hold with $n$ replaced by $n - 1$. It suffices to show that the spaces $V_k$, $k \in K$, are all $A_n$-invariant, i.e. $A_n(V_k) \subseteq V_k$: Then, according to Prop. 6.7, $A_{nk} := A_n{\restriction}_{V_k}$ is diagonalizable, i.e.
$$V_k = \bigoplus_{\lambda\in\sigma(A_{nk})} E_{A_{nk}}(\lambda).$$
Now each $V_{k\lambda} := E_{A_{nk}}(\lambda)$ has a basis $B_{k\lambda}$, consisting of eigenvectors of $A_n$. Since, for each $i \in \{1, \ldots, n-1\}$, $B_{k\lambda} \subseteq V_k \subseteq E_{A_i}(\lambda_{ik})$, $B_{k\lambda}$ consists of eigenvectors of $A_i$ for each $i \in \{1, \ldots, n\}$. Letting $K_n := \{(k, \lambda) : k \in K,\, \lambda \in \sigma(A_{nk})\}$, (i) – (iv) then hold with $K$ replaced by $K_n$. Thus, it remains to show $A_n(V_k) \subseteq V_k$ for each $k \in K$: Fix $v \in V_k$, $k \in K$. We have
$$\forall_{j\in\{1,\ldots,n-1\}}\quad A_j(A_n v) \overset{(6.8)}{=} A_n(A_j v) = A_n(\lambda_{jk}\, v).$$
Moreover, there exists a finite set $K_v \subseteq K$ such that
$$A_n v = \sum_{l\in K_v} v_l \quad\wedge\quad \forall_{l\in K_v}\; v_l \in V_l \setminus \{0\}.$$
Then
$$\forall_{j\in\{1,\ldots,n-1\}}\quad \lambda_{jk} \sum_{l\in K_v} v_l = \lambda_{jk}(A_n v) = A_n(\lambda_{jk} v) = A_j(A_n v) = \sum_{l\in K_v} A_j v_l = \sum_{l\in K_v} \lambda_{jl}\, v_l.$$
As the sum in (i) is direct, Prop. 5.2(ii) implies $\lambda_{jk} v_l = \lambda_{jl} v_l$ for each $l \in K_v$. For each $l \in K_v$, we have $v_l \neq 0$, implying $\lambda_{jk} = \lambda_{jl}$ for each $j \in \{1, \ldots, n-1\}$. Thus, by (iv), $k = l$, i.e. $K_v = \{k\}$ and $A_n v = v_k \in V_k$ as desired.

In general, computing eigenvalues is a difficult task, and we will say more about this issue below. The following results can sometimes help, where Th. 6.9(a) is most useful for $\dim V$ small:

Theorem 6.9. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$. Let $A \in \mathcal{L}(V, V)$.

(a) $\lambda \in F$ is an eigenvalue of $A$ if, and only if,
$$\det(\lambda \operatorname{Id} - A) = 0.$$

(b) If there exists a basis $B$ of $V$ such that the matrix $(a_{ji}) \in \mathcal{M}(n, F)$ of $A$ with respect to $B$ is upper or lower triangular, then the diagonal elements $a_{ii}$ are precisely the eigenvalues of $A$, i.e. $\sigma(A) = \{a_{ii} : i \in \{1, \ldots, n\}\}$.

Proof. (a): According to Th. 6.3(a), $\lambda \in \sigma(A)$ is equivalent to $\lambda \operatorname{Id} - A$ not being injective, which (as $V$ is finite-dimensional) is equivalent to $\det(\lambda \operatorname{Id} - A) = 0$ by Cor. 4.37(b).

(b): For each $\lambda \in F$, we have
$$\det(\lambda \operatorname{Id} - A) = \det\bigl(\lambda \operatorname{Id}_n - (a_{ji})\bigr) = \prod_{i=1}^{n} (\lambda - a_{ii}).$$
Thus, by (a), $\sigma(A) = \{a_{ii} : i \in \{1, \ldots, n\}\}$.

Example 6.10. Consider the vector space $V := \mathbb{R}^2$ over $\mathbb{R}$ and, with respect to the standard basis, let $A \in \mathcal{L}(V, V)$ be given by the matrix $M := \begin{pmatrix} 3 & -2 \\ 1 & 0 \end{pmatrix}$. Then, for each $\lambda \in \mathbb{R}$,
$$\det(\lambda \operatorname{Id} - A) = \begin{vmatrix} \lambda - 3 & 2 \\ -1 & \lambda \end{vmatrix} = (\lambda - 3)\,\lambda + 2 = \lambda^2 - 3\lambda + 2 = (\lambda - 1)(\lambda - 2),$$
i.e. $\sigma(A) = \{1, 2\}$ by Th. 6.9(a). Since
$$M \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \;\Rightarrow\; v_1 = v_2, \qquad M \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 2v_1 \\ 2v_2 \end{pmatrix} \;\Rightarrow\; v_1 = 2v_2,$$
$B := \Bigl(\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \end{pmatrix}\Bigr)$ is a basis of eigenvectors of $A$.
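The computation of Example 6.10 is easy to confirm numerically. A sketch (using numpy; not part of the original notes):

```python
import numpy as np

# Check of Example 6.10: sigma(A) = {1, 2} with eigenvectors (1,1) and (2,1).
M = np.array([[3.0, -2.0], [1.0, 0.0]])
assert np.allclose(sorted(np.linalg.eigvals(M)), [1.0, 2.0])

b1, b2 = np.array([1.0, 1.0]), np.array([2.0, 1.0])
assert np.allclose(M @ b1, 1.0 * b1)  # eigenvector for eigenvalue 1
assert np.allclose(M @ b2, 2.0 * b2)  # eigenvector for eigenvalue 2
```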


Theorem 6.9(a) gives rise to the following definition:

Definition 6.11. If $V$ is a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V, V)$, then the function
$$p_A : F \longrightarrow F, \quad p_A(t) := \det(t \operatorname{Id} - A),$$
is called the characteristic polynomial function of $A$ (some authors call $p_A$ the characteristic polynomial of $A$ – here, we distinguish $p_A$ from a related, but different object, which we will call the characteristic polynomial of $A$ and which we will introduce in a later section). As it turns out, $p_A$ is, indeed, a polynomial function, cf. Rem. 6.12(a) below.

Remark 6.12. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V, V)$.

(a) Let $p_A$ be the characteristic polynomial function of $A$ as defined in Def. 6.11. If $B$ is a basis of $V$, the matrix $(a_{ji}) \in \mathcal{M}(n, F)$ represents $A$ with respect to $B$, and, for each $t \in F$, $(c_{ji}(t)) := (t \operatorname{Id}_n - (a_{ji}))$, then
$$\forall_{t\in F}\quad p_A(t) = \det\bigl(t \operatorname{Id}_n - (a_{ji})\bigr) = \prod_{i=1}^{n} (t - a_{ii}) + \sum_{\pi\in S_n\setminus\{\operatorname{Id}\}} \operatorname{sgn}(\pi) \prod_{i=1}^{n} c_{i\pi(i)}(t).$$
Clearly, the degree of the first summand is $n$ and, for each $\pi \in S_n \setminus \{\operatorname{Id}\}$, the degree of the corresponding summand is at most $n - 2$, showing $p_A$ to be a polynomial function of degree $n$ (as a caveat, consult the footnote to the degree definition in Def. 4.32 and Rem. 7.14 below). Moreover, we see $p_A$ to be a monic polynomial, i.e. the coefficient of $t^n$ is $1$.

(b) According to Th. 6.9(a), $\sigma(A)$ is precisely the set of zeros of $p_A$.

(c) Some authors prefer to define the characteristic polynomial function of $A$ as the polynomial function
$$q_A : F \longrightarrow F, \quad q_A(t) := \det(A - t \operatorname{Id}) = (-1)^n p_A(t).$$
While $q_A$ still has the property that $\sigma(A)$ is precisely the set of zeros of $q_A$, $q_A$ is monic only for $n$ even. On the other hand, $q_A$ has the advantage that $q_A(0) = \det(A)$.

(d) According to (b), the task of finding the eigenvalues of $A$ is the same as the task of finding the zeros of the polynomial function $p_A$. So one might hope that only particularly simple polynomial functions can occur as characteristic polynomial functions. However, this is not the case: Indeed, every monic polynomial function of degree $n$ occurs as a characteristic polynomial function: Let $a_1, \ldots, a_n \in F$, and
$$P : F \longrightarrow F, \quad P(t) := t^n + \sum_{i=1}^{n} a_i\, t^{n-i}.$$


We define the companion matrix of $P$ to be
$$M(P) := \begin{pmatrix} -a_1 & -a_2 & -a_3 & \ldots & -a_{n-1} & -a_n \\ 1 & 0 & & & & \\ & 1 & 0 & & & \\ & & 1 & \ddots & & \\ & & & \ddots & 0 & \\ & & & & 1 & 0 \end{pmatrix}$$
and claim $p_A = P$, if $A \in \mathcal{L}(F^n, F^n)$ is the linear map represented by $M(P)$ with respect to the standard basis of $F^n$: Indeed, using Laplace expansion with respect to the first row, we obtain, for each $t \in F$,
$$p_A(t) = \begin{vmatrix} t + a_1 & a_2 & a_3 & \ldots & a_{n-1} & a_n \\ -1 & t & & & & \\ & -1 & t & & & \\ & & -1 & \ddots & & \\ & & & \ddots & t & \\ & & & & -1 & t \end{vmatrix}$$
$$= (-1)^{n+1} a_n (-1)^{n-1} + (-1)^n a_{n-1} (-1)^{n-2}\, t + \cdots + (-1)^3 a_2 (-1)^1\, t^{n-2} + (-1)^2 (t + a_1)\, t^{n-1}$$
$$= \sum_{i=0}^{n-2} (-1)^{2(n-i)} a_{n-i}\, t^i + (-1)^2 (t + a_1)\, t^{n-1} = t^n + \sum_{i=0}^{n-1} a_{n-i}\, t^i = t^n + \sum_{i=1}^{n} a_i\, t^{n-i} = P(t).$$
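The companion-matrix construction can be verified numerically: numpy's `np.poly` returns the (monic) characteristic polynomial coefficients of a matrix. A sketch (the concrete coefficients are our own choice, not from the notes):

```python
import numpy as np

# Companion matrix M(P) for P(t) = t^3 + a1 t^2 + a2 t + a3 with
# (a1, a2, a3) = (2, 3, 4), following the layout in the text.
a = [2.0, 3.0, 4.0]
C = np.array([[-a[0], -a[1], -a[2]],
              [  1.0,   0.0,   0.0],
              [  0.0,   1.0,   0.0]])

# np.poly gives the coefficients of det(t Id - C), highest power first;
# it should reproduce (1, a1, a2, a3), confirming p_A = P.
assert np.allclose(np.poly(C), [1.0] + a)
```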

(e) In Ex. 6.5(a), we saw that the considered linear endomorphism $A$ had eigenvalues for $F = \mathbb{C}$, but no eigenvalues for $F = \mathbb{R}$, which we can now relate to the fact that $p_A(t) = t^2 + 1$ has no zeros over $\mathbb{R}$, but $p_A(t) = (t - i)(t + i)$ with zeros $\pm i$ over $\mathbb{C}$. One calls a field $F$ algebraically closed if, and only if, every polynomial $P$ with coefficients in $F$ and degree $n \in \mathbb{N}$ can be written in the form $P(t) = c \prod_{i=1}^{n} (t - \lambda_i)$, where $c \in F$ and $\lambda_1, \ldots, \lambda_n \in F$ are precisely the zeros of $P$ ($\mathbb{R}$ is not algebraically closed, but $\mathbb{C}$ is algebraically closed due to the fundamental theorem of algebra, cf. [Phi16a, Th. 8.32]). It is an important result of Algebra that every field $F$ is contained in an algebraically closed field (see, e.g., [Bos13, Cor. 3.4.7], [Lan05, Cor. V.2.6])⁵.

⁵One calls a field $\overline{F}$ an algebraic closure of $F$ if, and only if, $\overline{F}$ is algebraically closed and $\overline{F}$ is minimal in the sense that each $\lambda \in \overline{F}$ is a zero of some polynomial with coefficients in $F$ (one then says that $\overline{F}$ is an algebraic extension of $F$). The mentioned references [Bos13, Cor. 3.4.7], [Lan05, Cor. V.2.6] actually show that every field is contained in an algebraic closure. If $F$ is not algebraically closed, then the algebraic closure of $F$ is not unique, but at least all algebraic closures of $F$ are isomorphic.


(f) Given that eigenvalues are precisely the zeros of the characteristic polynomial function, and given that, according to (d), every monic polynomial function of degree $n$ can occur as the characteristic polynomial function of a matrix, it is not surprising that computing eigenvalues is, in general, a difficult task, even if $F$ is algebraically closed. It is another result of Algebra that, for a generic polynomial of degree at least 5, it is not possible to obtain its zeros using so-called radicals (which are, roughly, zeros of polynomials of the form $t^k - \lambda$, $k \in \mathbb{N}$, $\lambda \in F$, see, e.g., [Bos13, Def. 6.1.1] for a precise definition) in finitely many steps (cf., e.g., [Bos13, Cor. 6.1.7]). In practice, one often has to make use of approximative numerical methods (see, e.g., [Phi17b, Sec. I]). Having said that, let us note that the problem of computing eigenvalues is, indeed, typically easier than the general problem of computing zeros of polynomials. This is due to the fact that the difficulty of computing the zeros of a polynomial depends tremendously on the form in which the polynomial is given: It is typically hard if the polynomial is expanded into the form $p(t) = \sum_{i=0}^{n} a_i\, t^i$, but it is easy (trivial, in fact) if the polynomial is given in the factored form $p(t) = c \prod_{i=1}^{n} (t - \lambda_i)$. If the characteristic polynomial function is given implicitly by a matrix, one is, in general, somewhere between the two extremes. In particular, for a large matrix, it usually makes no sense to compute the characteristic polynomial function in its expanded form (this is an expensive task in itself and, in the process, one even loses the additional structure given by the matrix). It makes much more sense to use methods tailored to the computation of eigenvalues, and, if available, one should make use of additional structure a matrix might have.

Remark 6.13. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V, V)$. Moreover, let $\lambda \in \sigma(A)$. Clearly, one has
$$\{0\} \subseteq \ker(A - \lambda \operatorname{Id}) \subseteq \ker(A - \lambda \operatorname{Id})^2 \subseteq \ldots$$
and the inclusion can be strict at most $n$ times. Let
$$r(\lambda) := \min\bigl\{k \in \mathbb{N} : \ker(A - \lambda \operatorname{Id})^k = \ker(A - \lambda \operatorname{Id})^{k+1}\bigr\}. \tag{6.10}$$
Then
$$\forall_{k\in\mathbb{N}}\quad \ker(A - \lambda \operatorname{Id})^{r(\lambda)} = \ker(A - \lambda \operatorname{Id})^{r(\lambda)+k}: \tag{6.11}$$
Indeed, otherwise, let $k_0 := \min\{k \in \mathbb{N} : \ker(A - \lambda \operatorname{Id})^{r(\lambda)} \subsetneq \ker(A - \lambda \operatorname{Id})^{r(\lambda)+k}\}$. Then there exists $v \in V$ such that $(A - \lambda \operatorname{Id})^{r(\lambda)+k_0} v = 0$, but $(A - \lambda \operatorname{Id})^{r(\lambda)+k_0-1} v \neq 0$. However, that means $w := (A - \lambda \operatorname{Id})^{k_0-1} v \in \ker(A - \lambda \operatorname{Id})^{r(\lambda)+1}$, but $w \notin \ker(A - \lambda \operatorname{Id})^{r(\lambda)}$, in contradiction to the definition of $r(\lambda)$.

Definition 6.14. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V, V)$. Moreover, let $\lambda \in \sigma(A)$. The number
$$m_a(\lambda) := \dim \ker(A - \lambda \operatorname{Id})^{r(\lambda)} \in \{1, \ldots, n\}, \tag{6.12}$$
where $r(\lambda)$ is given by (6.10), is called the algebraic multiplicity of the eigenvalue $\lambda$, whereas
$$m_g(\lambda) := \dim \ker(A - \lambda \operatorname{Id}) \in \{1, \ldots, n\} \tag{6.13}$$
is called its geometric multiplicity. We call $\lambda$ simple if, and only if, $m_g(\lambda) = m_a(\lambda) = 1$; we call $\lambda$ semisimple if, and only if, $m_g(\lambda) = m_a(\lambda)$. For each $k \in \{1, \ldots, r(\lambda)\}$, the space
$$E_A^k(\lambda) := \ker(A - \lambda \operatorname{Id})^k$$
is called the generalized eigenspace of rank $k$ of $A$, corresponding to the eigenvalue $\lambda$; each $v \in E_A^k(\lambda) \setminus E_A^{k-1}(\lambda)$, $k \geq 2$, is called a generalized eigenvector of rank $k$, corresponding to the eigenvalue $\lambda$ (an eigenvector $v \in E_A(\lambda)$ is sometimes called a generalized eigenvector of rank 1).

Proposition 6.15. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, and $A \in \mathcal{L}(V, V)$.

(a) If $\lambda \in \sigma(A)$ and $r(\lambda)$ is given by (6.10), then
$$1 \leq r(\lambda) \leq n. \tag{6.14}$$
If $k \in \mathbb{N}$ is such that $1 \leq k < k + 1 \leq r(\lambda)$, then
$$1 \leq m_g(\lambda) = \dim E_A^1(\lambda) \leq \dim E_A^k(\lambda) < \dim E_A^{k+1}(\lambda) \leq \dim E_A^{r(\lambda)}(\lambda) = m_a(\lambda) \leq n, \tag{6.15}$$
which implies, in particular,
$$r(\lambda) \leq m_a(\lambda), \tag{6.16a}$$
$$1 \leq m_g(\lambda) \leq m_a(\lambda) \leq n, \tag{6.16b}$$
$$0 \leq m_a(\lambda) - m_g(\lambda) \leq n - 1. \tag{6.16c}$$

(b) If $\lambda \in \sigma(A)$, then, for each $k \in \{1, \ldots, r(\lambda)\}$, the generalized eigenspace $E_A^k(\lambda)$ is $A$-invariant, i.e.
$$A\bigl(E_A^k(\lambda)\bigr) \subseteq E_A^k(\lambda).$$

(c) If $A$ is diagonalizable, then $m_g(\lambda) = m_a(\lambda)$ holds for each $\lambda \in \sigma(A)$ (but cf. Ex. 6.16 below).

Proof. (a): Both (6.14) and (6.15) are immediate from Rem. 6.13 together with the definitions of $r(\lambda)$, $m_g(\lambda)$ and $m_a(\lambda)$. Then (6.16a) follows from (6.15), since $\dim E_A^{k+1}(\lambda) - \dim E_A^k(\lambda) \geq 1$; (6.16b) is immediate from (6.15); (6.16c) is immediate from (6.16b).

(b): Due to $A(A - \lambda \operatorname{Id}) = (A - \lambda \operatorname{Id})A$, one has
$$\forall_{k\in\mathbb{N}_0}\quad A\bigl(\ker(A - \lambda \operatorname{Id})^k\bigr) \subseteq \ker(A - \lambda \operatorname{Id})^k:$$
Indeed, if $v \in \ker(A - \lambda \operatorname{Id})^k$, then
$$(A - \lambda \operatorname{Id})^k (Av) = A(A - \lambda \operatorname{Id})^k v = 0.$$

(c): Exercise.


Example 6.16. Let $V$ be a vector space over the field $F$, $\dim V = n \in \mathbb{N}$, $n \geq 2$. Let $\lambda \in F$. We show that there always exists a map $A \in \mathcal{L}(V, V)$ such that $\lambda \in \sigma(A)$ and such that the difference between $m_a(\lambda)$ and $m_g(\lambda)$ is maximal, namely
$$m_a(\lambda) - m_g(\lambda) = n - 1:$$
Let $B = (v_1, \ldots, v_n)$ be an ordered basis of $V$, and let $A \in \mathcal{L}(V, V)$ be such that
$$Av_1 = \lambda v_1 \quad\wedge\quad \forall_{i\in\{2,\ldots,n\}}\; Av_i = \lambda v_i + v_{i-1}.$$
Then, with respect to $B$, $A$ is represented by the matrix
$$M := \begin{pmatrix} \lambda & 1 & & & \\ & \lambda & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda & 1 \\ & & & & \lambda \end{pmatrix}.$$
We use an induction over $k \in \{1, \ldots, n\}$ to show
$$\forall_{k\in\{1,\ldots,n\}}\quad \Bigl(\forall_{1\leq i\leq k}\; (A - \lambda \operatorname{Id})^k v_i = 0 \;\wedge\; \forall_{k<i\leq n}\; (A - \lambda \operatorname{Id})^k v_i = v_{i-k}\Bigr): \tag{6.17}$$
For $k = 1$, we have
$$(A - \lambda \operatorname{Id})v_1 = \lambda v_1 - \lambda v_1 = 0, \qquad \forall_{1<i\leq n}\; (A - \lambda \operatorname{Id})v_i = \lambda v_i + v_{i-1} - \lambda v_i = v_{i-1},$$
as needed. For $k > 1$, we have
$$\forall_{1\leq i<k}\; (A - \lambda \operatorname{Id})^k v_i \overset{\text{ind.hyp.}}{=} 0, \qquad (A - \lambda \operatorname{Id})^k v_k \overset{\text{ind.hyp.}}{=} (A - \lambda \operatorname{Id})v_1 = 0,$$
$$\forall_{k<i\leq n}\; (A - \lambda \operatorname{Id})^k v_i \overset{\text{ind.hyp.}}{=} (A - \lambda \operatorname{Id})v_{i-k+1} = \lambda v_{i-k+1} + v_{i-k} - \lambda v_{i-k+1} = v_{i-k},$$
completing the induction. In particular, since $\dim V = n$, (6.17) yields, for each $1 < k \leq n$, $\dim E_A^k(\lambda) - \dim E_A^{k-1}(\lambda) = 1$ and $\dim E_A^k(\lambda) = k$, implying $m_g(\lambda) = 1$ and $m_a(\lambda) = n$, as claimed.

Definition 6.17. Let $n \in \mathbb{N}$. Consider the vector space $V := F^n$ over the field $F$. All of the notions we introduced in this section for linear endomorphisms $A \in \mathcal{L}(V, V)$ (e.g. eigenvalue, eigenvector, eigenspace, multiplicity of an eigenvalue, diagonalizability, etc.), one also defines for square matrices $M \in \mathcal{M}(n, F)$: The notions are then meant with respect to the linear map $A_M$ that $M$ represents with respect to the standard basis of $F^n$.

Example 6.18. Let $F$ be a field and $n \in \mathbb{N}$. If $M \in \mathcal{M}(n, F)$ is diagonalizable, then, according to Def. 6.17, there exists a regular matrix $T \in \operatorname{GL}_n(F)$ and a diagonal matrix $D \in \mathcal{M}(n, F)$ such that $D = T^{-1} M T$. A simple induction then shows
$$\forall_{k\in\mathbb{N}_0}\quad M^k = T D^k T^{-1}.$$


Clearly, if one knows $T$ and $T^{-1}$, this can tremendously simplify the computation of $M^k$, especially if $k$ is large and $M$ is fully populated. However, computing $T$ and $T^{-1}$ can also be difficult, and it depends on the situation whether it is a good option to pursue this route.
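The identity $M^k = T D^k T^{-1}$ is easy to check on a concrete diagonalizable matrix. A sketch reusing the matrix of Example 6.10 (using numpy; not part of the original notes):

```python
import numpy as np

# M = T D T^{-1} with D = diag(1, 2) and T holding the eigenvectors
# (1,1) and (2,1) of Example 6.10 as columns; then M^k = T D^k T^{-1},
# where D^k is computed entrywise on the diagonal.
M = np.array([[3.0, -2.0], [1.0, 0.0]])
T = np.array([[1.0, 2.0], [1.0, 1.0]])
D = np.diag([1.0, 2.0])
assert np.allclose(T @ D @ np.linalg.inv(T), M)

k = 10
Mk_diag = T @ np.diag([1.0**k, 2.0**k]) @ np.linalg.inv(T)
assert np.allclose(Mk_diag, np.linalg.matrix_power(M, k))
```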

7 Commutative Rings, Polynomials

We have already seen in the previous section that the eigenvalues of a square matrix are precisely the zeros of its characteristic polynomial function. In order to further study the relation between certain polynomials and the structure of a matrix (and the structure of corresponding linear maps), we will need to investigate some of the general theory of polynomials. We will take this opportunity to also learn more about the general theory of commutative rings, which is of algebraic interest beyond our current interest in matrix-related polynomials.

Definition 7.1. Let R be a commutative ring with unity. We call

R[X] := R^{N0}_{fin} := { f : N0 → R : #f^{−1}(R \ {0}) < ∞ }  (7.1)

the set of polynomials over R (i.e. a polynomial over R is a sequence (a_i)_{i∈N0} in R such that all but finitely many of the entries a_i are 0, cf. [Phi19, Ex. 5.16(c)]). We then have the pointwise-defined addition and scalar multiplication on R[X], which it inherits from R^{N0}:

∀ f, g ∈ R[X]:  (f + g) : N0 → R,  (f + g)(i) := f(i) + g(i),
∀ f ∈ R[X] ∀ λ ∈ R:  (λ · f) : N0 → R,  (λ · f)(i) := λ f(i),  (7.2)

where we know from [Phi19, Ex. 5.16(c)] that, with these compositions, R[X] forms a vector space over R, provided R is a field, and, then, B = {e_i : i ∈ N0}, where

∀ i ∈ N0:  e_i : N0 → R,  e_i(j) := δ_{ij},

provides the standard basis of the vector space R[X]. In the current context, we will now write X^i := e_i and we will call these polynomials monomials. Furthermore, we define a multiplication on R[X] by letting

( (a_i)_{i∈N0}, (b_i)_{i∈N0} ) ↦ (c_i)_{i∈N0} := (a_i)_{i∈N0} · (b_i)_{i∈N0},

c_i := ∑_{k+l=i} a_k b_l := ∑_{(k,l)∈(N0)²: k+l=i} a_k b_l = ∑_{k=0}^{i} a_k b_{i−k}.  (7.3)

If f := (a_i)_{i∈N0} ∈ R[X], then we call the a_i ∈ R the coefficients of f, and we define the degree of f by

deg f := −∞ for f ≡ 0,  deg f := max{i ∈ N0 : a_i ≠ 0} for f ≢ 0  (7.4)


(defining deg(0) = −∞ instead of deg(0) = −1 has the advantage that formulas (7.5a), (7.5b) below then also hold for the zero polynomial). If deg f = n ∈ N0 and a_n = 1, then the polynomial f is called monic.
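The definitions (7.2)–(7.4) translate directly into code if one represents a polynomial by its finite list of coefficients; the following pure-Python sketch (function names are ours, not from the text) models deg 0 = −∞ as None:

```python
from itertools import zip_longest

def poly_add(f, g):
    # Coefficientwise sum, cf. (7.2); index i holds the coefficient of X^i.
    return [a + b for a, b in zip_longest(f, g, fillvalue=0)]

def poly_mul(f, g):
    # Convolution product c_i = sum_{k+l=i} a_k * b_l, cf. (7.3).
    if not f or not g:
        return []
    c = [0] * (len(f) + len(g) - 1)
    for k, a in enumerate(f):
        for l, b in enumerate(g):
            c[k + l] += a * b
    return c

def degree(f):
    # deg f as in (7.4); deg 0 = -infinity is modeled as None.
    for i in range(len(f) - 1, -1, -1):
        if f[i] != 0:
            return i
    return None

# (1 + X)(1 - X) = 1 - X^2 over Z:
print(poly_mul([1, 1], [1, -1]))  # [1, 0, -1]
```

Over an integral domain such as Z, the example also illustrates the degree formula (7.5c) below: deg(fg) = 1 + 1 = 2.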

Remark 7.2. In the situation of Def. 7.1, using the notation X^i = e_i, we can write addition, scalar multiplication, and multiplication in the following, perhaps more familiar-looking, forms: If λ ∈ R, f = ∑_{i=0}^{n} f_i X^i, g = ∑_{i=0}^{n} g_i X^i, n ∈ N0, f_0, . . . , f_n, g_0, . . . , g_n ∈ R, then

f + g = ∑_{i=0}^{n} (f_i + g_i) X^i,

λf = ∑_{i=0}^{n} (λ f_i) X^i,

fg = ∑_{i=0}^{2n} ( ∑_{k+l=i} f_k g_l ) X^i.

Recall from [Phi19, Def. and Rem. 4.40] that an element x in a ring with unity R is called invertible if, and only if, there exists x̄ ∈ R such that x̄x = xx̄ = 1, and that (R∗, ·) denotes the group of invertible elements of R.

Lemma 7.3. Let R be a ring with unity and x ∈ R∗.

(a) x is not a zero divisor.

(b) If S is also a ring with unity and φ : R → S is a unital ring homomorphism, then φ(x) ∈ S∗.

Proof. (a): Let x ∈ R∗ and x̄ ∈ R such that x̄x = xx̄ = 1. If y ∈ R such that xy = 0, then y = 1 · y = x̄xy = 0; if y ∈ R such that yx = 0, then y = y · 1 = yxx̄ = 0. In consequence, x is not a zero divisor.

(b): Let x̄ ∈ R such that x̄x = xx̄ = 1. Then

1 = φ(1) = φ(x̄x) = φ(xx̄) = φ(x̄)φ(x) = φ(x)φ(x̄),

proving φ(x) ∈ S∗.

Theorem 7.4. Let R be a commutative ring with unity.

(a) If f, g ∈ R[X] with f = (f_i)_{i∈N0}, g = (g_i)_{i∈N0}, then

deg(f + g) = −∞ ≤ max{deg f, deg g}  if f = −g,
deg(f + g) = max{i ∈ N0 : f_i ≠ −g_i} ≤ max{deg f, deg g}  otherwise,  (7.5a)

deg(fg) ≤ deg f + deg g.  (7.5b)


If the highest nonzero coefficient of f or of g is not a zero divisor (e.g. if this coefficient is an invertible element, cf. Lem. 7.3), then one even has

deg(fg) = deg f + deg g. (7.5c)

(b) (R[X], +, ·) forms a commutative ring with unity, where 1 = X^0 is the neutral element of multiplication.

Proof. (a): If f ≡ 0, then f + g = g and fg ≡ 0, i.e. the degree formulas hold if f ≡ 0 or g ≡ 0. It is also immediate from (7.2) that (7.5a) holds in the remaining case. If deg f = n ∈ N0, deg g = m ∈ N0, then, for each i ∈ N0 with i > m + n and each k, l ∈ N0 with k + l = i, we have k > n or l > m, showing (fg)_i = ∑_{k+l=i} f_k g_l = 0 and proving (7.5b). If f_n is not a zero divisor (the case of g_m being analogous), then (fg)_{m+n} = f_n g_m ≠ 0, proving (7.5c).

(b): We already know from [Phi19, Ex. 4.9(e)] that (R[X], +) forms a commutative group. To verify associativity of multiplication, let a, b, c, d, f, g, h ∈ R[X],

a := (a_i)_{i∈N0},  b := (b_i)_{i∈N0},  c := (c_i)_{i∈N0},  d := (d_i)_{i∈N0},
f := (f_i)_{i∈N0},  g := (g_i)_{i∈N0},  h := (h_i)_{i∈N0},

such that d := ab, f := bc, g := (ab)c, h := a(bc). Then, for each i ∈ N0,

g_i = ∑_{k+l=i} d_k c_l = ∑_{k+l=i} ∑_{m+n=k} a_m b_n c_l = ∑_{m+n+l=i} a_m b_n c_l
    = ∑_{m+k=i} ∑_{n+l=k} a_m b_n c_l = ∑_{m+k=i} a_m f_k = h_i,

proving g = h, as desired. To verify distributivity, let a, b, c, d, f, g ∈ R[X] be as before, but this time such that d := ab, f := ac, and g := a(b + c). Then, for each i ∈ N0,

g_i = ∑_{k+l=i} a_k (b_l + c_l) = ∑_{k+l=i} a_k b_l + ∑_{k+l=i} a_k c_l = d_i + f_i,

proving g = d + f, as desired. To verify commutativity of multiplication, let a, b, c, d ∈ R[X] be as before, but this time such that c := ab, d := ba. Then, for each i ∈ N0,

c_i = ∑_{k+l=i} a_k b_l = ∑_{k+l=i} b_l a_k = d_i,

proving c = d, as desired. Finally, if b := X^0, then b_0 = 1 and b_i = 0 for i > 0, yielding, for c := ab and each i ∈ N0,

c_i = ∑_{k+l=i} a_k b_l = ∑_{k+0=i} a_k b_0 = a_i,

showing X^0 to be neutral and completing the proof.


Definition 7.5. Let R, R′ be rings (with unity). We call R′ a ring extension of R if, and only if, there exists a (unital) ring monomorphism ι : R → R′ (if R′ is a ring extension of R, then one might even identify the elements of R and ι(R) and consider R to be a subset of R′). If R, R′ are fields, then one also calls R′ a field extension of R.

Example 7.6. (a) If R is a commutative ring with unity, then R[X] is a ring extension of R via the unital ring monomorphism

ι : R → R[X],  ι(r) := r X^0:

Indeed, ι is unital, since ι(1) = X^0; ι is a ring homomorphism, since, for each r, s ∈ R, ι(r + s) = (r + s)X^0 = rX^0 + sX^0 = ι(r) + ι(s) and ι(rs) = rs X^0 = rX^0 · sX^0 = ι(r) ι(s); ι is injective, since, for r ≠ 0, ι(r) = rX^0 ≢ 0.

(b) If R is a ring (with unity) and n ∈ N, then the matrix ring M(n, R) (cf. [Phi19, Ex. 7.7(c)]) is a ring extension of R via the (unital) ring monomorphism

ι : R → M(n, R),  ι(r) := diag(r, . . . , r):

Indeed, ι is a ring homomorphism, since, for each r, s ∈ R,

ι(r + s) = diag(r + s, . . . , r + s) = diag(r, . . . , r) + diag(s, . . . , s) = ι(r) + ι(s),
ι(rs) = diag(rs, . . . , rs) = diag(r, . . . , r) diag(s, . . . , s) = ι(r) ι(s)  (cf. [Phi19, (7.28)]);

ι is injective, since, for r ≠ 0, ι(r) = diag(r, . . . , r) ≠ 0. If R is a ring with unity, then ι is unital, since ι(1) = Id_n.

Proposition 7.7. Let R be a commutative ring with unity without nonzero zero divisors. Then R[X] has no nonzero zero divisors and (R[X])∗ = R∗.

Proof. Exercise.

Example 7.8. Let R := Z4 = Z/(4Z). Then (due to 4 = 0 in Z4)

(2X^1 + 1X^0)(2X^1 + 1X^0) = 4X^2 + 4X^1 + 1X^0 = 0X^2 + 0X^1 + 1X^0 = X^0,

showing 2X^1 + 1X^0 ∈ (R[X])∗ \ R∗, i.e. (R[X])∗ ≠ R∗ can occur if R has nonzero zero divisors. This also provides an example where the degree formula (7.5c) does not hold.
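The computation of Ex. 7.8 can be replayed mechanically by multiplying coefficient lists with coefficients reduced mod 4 (a minimal sketch; the helper name is ours):

```python
def poly_mul_mod(f, g, m):
    # Convolution product of coefficient lists, coefficients taken in Z_m.
    c = [0] * (len(f) + len(g) - 1)
    for k, a in enumerate(f):
        for l, b in enumerate(g):
            c[k + l] = (c[k + l] + a * b) % m
    return c

# (1 + 2X)^2 = 1 + 4X + 4X^2 = 1 in Z_4[X]: a unit of positive degree.
print(poly_mul_mod([1, 2], [1, 2], 4))  # [1, 0, 0]
```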

Definition and Remark 7.9. Let R be a commutative ring with unity and let R′ be a ring extension of R, where ι : R → R′ is a unital ring monomorphism. For each x ∈ R′, the map

ǫ_x : R[X] → R′,  f ↦ ǫ_x(f) = ǫ_x( ∑_{i=0}^{deg f} f_i X^i ) := ∑_{i=0}^{deg f} f_i x^i,  (7.6)

is called the substitution homomorphism or evaluation homomorphism corresponding to x (a typical example, where one wants to use a proper ring extension rather than


R, is the substitution of matrices from M(n, R), n ∈ N, for X): Indeed, if x ∈ R′, then ǫ_x is unital, since ǫ_x(X^0) = x^0 = 1; ǫ_x is a ring homomorphism, since, for each f = ∑_{i=0}^{deg f} f_i X^i ∈ R[X] and g = ∑_{i=0}^{deg g} g_i X^i ∈ R[X], one has

ǫ_x(f + g) = ∑_{i=0}^{deg(f+g)} (f_i + g_i) x^i = ∑_{i=0}^{deg f} f_i x^i + ∑_{i=0}^{deg g} g_i x^i = ǫ_x(f) + ǫ_x(g),

ǫ_x(fg) = ∑_{i=0}^{deg(fg)} ( ∑_{k+l=i} f_k g_l ) x^i = ∑_{k=0}^{deg f} ∑_{l=0}^{deg g} f_k g_l x^{k+l}
        = ( ∑_{k=0}^{deg f} f_k x^k )( ∑_{l=0}^{deg g} g_l x^l ) = ǫ_x(f) ǫ_x(g).

Moreover, ǫ_x is linear, since, for each λ ∈ R,

ǫ_x(λf) = ∑_{i=0}^{deg f} (λ f_i) x^i = λ ∑_{i=0}^{deg f} f_i x^i = λ ǫ_x(f).

We call x ∈ R′ a zero or a root of f ∈ R[X] if, and only if, ǫx(f) = 0.
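For R′ = R, the evaluation ǫ_x(f) = ∑ f_i x^i can be computed efficiently with Horner's scheme; a minimal sketch (the function name is ours):

```python
def eval_poly(f, x):
    # Evaluate the polynomial with coefficient list f (index i -> coeff
    # of X^i) at x via Horner's scheme: only deg f multiplications.
    acc = 0
    for c in reversed(f):
        acc = acc * x + c
    return acc

# f = X^2 + X over Z_2: every element of Z_2 is a root (cf. Ex. 7.13).
print([eval_poly([0, 1, 1], x) % 2 for x in (0, 1)])  # [0, 0]
```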

Theorem 7.10 (Polynomial Interpolation). Let F be a field and n ∈ N0. Given n + 1 pairs (x_0, y_0), (x_1, y_1), . . . , (x_n, y_n) ∈ F², such that x_i ≠ x_j for i ≠ j, there exists a unique interpolating polynomial f ∈ F[X] with deg f ≤ n, satisfying

∀ i ∈ {0, 1, . . . , n}:  ǫ_{x_i}(f) = y_i.  (7.7)

Moreover, one can identify f as the Lagrange interpolating polynomial, which is given by the explicit formula

f := ∑_{j=0}^{n} y_j L_j,  where  ∀ j ∈ {0, 1, . . . , n}:  L_j := ∏_{i=0, i≠j}^{n} (X^1 − x_i X^0) / (x_j − x_i)  (7.8)

(the L_j are called Lagrange basis polynomials).

Proof. Since f = f_0 X^0 + f_1 X^1 + · · · + f_n X^n, (7.7) is equivalent to the linear system

f_0 + f_1 x_0 + · · · + f_n x_0^n = y_0
f_0 + f_1 x_1 + · · · + f_n x_1^n = y_1
  ⋮
f_0 + f_1 x_n + · · · + f_n x_n^n = y_n,  (7.9)

for the n + 1 unknowns f_0, . . . , f_n ∈ F. This linear system has a unique solution if, and only if, the determinant

D := det
  ( 1  x_0  x_0^2  . . .  x_0^n )
  ( 1  x_1  x_1^2  . . .  x_1^n )
  ( ⋮   ⋮    ⋮            ⋮    )
  ( 1  x_n  x_n^2  . . .  x_n^n )  (7.10)


does not vanish. Comparing with (4.31), we see D to be a Vandermonde determinant and (4.32) yields

D = ∏_{0 ≤ j < i ≤ n} (x_i − x_j).

In particular, D ≠ 0, as we assumed x_i ≠ x_j for i ≠ j, proving both existence and uniqueness of f (bearing in mind that the X^i form a basis of F[X]). It remains to identify f with the Lagrange interpolating polynomial. To this end, set g := ∑_{j=0}^{n} y_j L_j. Since, clearly, deg g ≤ n, it merely remains to show that g satisfies (7.7). Since the x_i are distinct, we obtain

∀ k ∈ {0, . . . , n}:  ǫ_{x_k}(g) = ∑_{j=0}^{n} y_j ǫ_{x_k}(L_j) = ∑_{j=0}^{n} y_j δ_{jk} = y_k,

thereby establishing the case.
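Formula (7.8) is directly executable; the following pure-Python sketch (names are ours) builds the Lagrange basis polynomials over Q and returns the coefficient list of the interpolating polynomial:

```python
from fractions import Fraction

def poly_mul(f, g):
    # Convolution product of coefficient lists (index i -> coeff of X^i).
    c = [Fraction(0)] * (len(f) + len(g) - 1)
    for k, a in enumerate(f):
        for l, b in enumerate(g):
            c[k + l] += a * b
    return c

def lagrange_interpolate(points):
    # Coefficient list of the Lagrange interpolating polynomial (7.8)
    # through the given (x_j, y_j) pairs, computed exactly over Q.
    n = len(points)
    coeffs = [Fraction(0)] * n
    for j, (xj, yj) in enumerate(points):
        Lj = [Fraction(1)]                 # the basis polynomial L_j
        for i, (xi, _) in enumerate(points):
            if i != j:
                d = Fraction(xj - xi)      # multiply by (X - x_i)/(x_j - x_i)
                Lj = poly_mul(Lj, [Fraction(-xi) / d, 1 / d])
        for i, c in enumerate(Lj):
            coeffs[i] += yj * c
    return coeffs

# Through (0,1), (1,2), (2,5) runs f = X^2 + 1:
print(lagrange_interpolate([(0, 1), (1, 2), (2, 5)]) == [1, 0, 1])  # True
```

Exact rational arithmetic via Fraction avoids the rounding issues a floating-point version would have for larger n.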

Remark 7.11. Let R be a commutative ring with unity. In Def. 4.32, we defined polynomial functions (of several variables). The following Th. 7.12 illuminates the relation between the polynomials of Def. 7.1 and the polynomial functions (of one variable) of Def. 4.32. Let Pol(R) denote the set of polynomial functions from R into R. Clearly, Pol(R) is a subring with unity of R^R (cf. [Phi19, Ex. 4.41(a)]). If R is a field, then Pol(R) also is a vector subspace of R^R.

Theorem 7.12. Let R be a commutative ring with unity and consider the map

φ : R[X] → Pol(R),  f ↦ φ(f),  φ(f)(x) := ǫ_x(f).  (7.11)

(a) φ is a unital ring epimorphism. If R is a field, then φ is also a linear epimorphism.

(b) If R is finite, then φ is not a monomorphism.

(c) If F := R is an infinite field, then φ is an isomorphism.

Proof. (a): If f = X^0, then φ(f) ≡ 1. We also know from Def. and Rem. 7.9 that, for each x ∈ R, ǫ_x is a linear ring homomorphism. Thus, if f, g ∈ R[X] and λ ∈ R, then, for each x ∈ R,

φ(f + g)(x) = ǫ_x(f + g) = ǫ_x(f) + ǫ_x(g) = (φ(f) + φ(g))(x),
φ(fg)(x) = ǫ_x(fg) = ǫ_x(f) ǫ_x(g) = (φ(f) φ(g))(x),
φ(λf)(x) = ǫ_x(λf) = λ ǫ_x(f) = (λ φ(f))(x).

Moreover, φ is an epimorphism since, if P ∈ Pol(R) with P(x) = ∑_{i=0}^{n} f_i x^i, where f_0, . . . , f_n ∈ R, n ∈ N0, then P = φ(f) with f = ∑_{i=0}^{n} f_i X^i.

(b): If R is finite, then R^R and Pol(R) ⊆ R^R are finite, whereas R[X] = R^{N0}_{fin} is infinite (also cf. Ex. 7.13 below).


(c): If F is infinite, then, for each n ∈ N0, there exist distinct points x_0, . . . , x_n ∈ F. Then Th. 7.10 implies that f ≡ 0 is the unique element of F[X] with deg f ≤ n such that ǫ_{x_i}(f) = 0 for each i ∈ {0, . . . , n}, implying f ≡ 0 to be the unique element of F[X] such that φ(f) ≡ 0. Thus, ker φ = {0} and φ is a monomorphism.

Example 7.13. If R is a finite commutative ring with unity, then f := ∏_{λ∈R} (X^1 − λ) ∈ R[X] \ {0}, but, using φ of (7.11), φ(f) ≡ 0. For a concrete example, consider the field with two elements, R := Z2 = {0, 1}. Then 0 ≠ f := X^2 + X^1 = X^1(X^1 + 1) ∈ R[X], but, for each x ∈ R, φ(f)(x) = x(x + 1) = 0.

Remark 7.14. (a) If R is a finite commutative ring with unity, then Th. 7.12(b) shows that the representation P(x) = ∑_{i=0}^{n} a_i x^i of P ∈ Pol(R) with a_i ∈ R, n ∈ N0, is not unique (e.g., over R := Z2, P(x) = 1 and x^2 + x + 1 are two different representations of P ≡ 1). Thus, while deg f is always well-defined for f ∈ R[X] by Def. 7.1, if R is finite, then deg, as given by Def. 4.32, is not a well-defined function on Pol(R), as it depends on the representation of P ∈ Pol(R), rather than on P itself.

(b) If F is a field, then the proof of Th. 7.10 shows that, if P ∈ Pol(F) can be written as P(x) = ∑_{i=0}^{n} a_i x^i with a_i ∈ F, n ∈ N0, then the representation is unique if, and only if, F has at least n + 1 elements. In particular, the monomial functions x ↦ x^i, i ∈ {0, . . . , n}, are linearly independent if, and only if, F has at least n + 1 elements.

Next, we will prove a remainder theorem for polynomials, which can be seen as an analogue of the remainder theorem for integers (cf. [Phi19, Th. D.1]):

Theorem 7.15 (Remainder Theorem). Let R be a commutative ring with unity. Let g = ∑_{i=0}^{d} g_i X^i ∈ R[X], deg g = d, where g_d ∈ R∗. Then, for each f ∈ R[X], there exist unique polynomials q, r ∈ R[X] such that

f = qg + r  ∧  deg r < d.  (7.12)

Proof. Uniqueness: Suppose f = qg + r = q′g + r′ with q, q′, r, r′ ∈ R[X] and deg r, deg r′ < d. Then

0 = (q − q′)g + (r − r′)  ⇒ (by (7.5a), (7.5c))  deg(q − q′) + deg g = deg(r − r′).

However, since deg(r − r′) < d and deg g = d, this can only hold for q = q′, which, in turn, implies r = r′ as well.

Existence: We prove the existence of q, r ∈ R[X], satisfying (7.12), via induction on n := deg f ∈ N0 ∪ {−∞}: If deg f < d, then set q := 0 and r := f (this, in particular, takes care of the base case f ≡ 0). Now suppose f = ∑_{i=0}^{n} f_i X^i with f_n ≠ 0, n ≥ d. Define

h := f − f_n g_d^{−1} X^{n−d} g.


Then the degree of the subtracted polynomial is n, where the coefficient of X^n is f_n g_d^{−1} g_d = f_n, implying deg h < n. Thus, by the induction hypothesis, there exist q_h, r_h ∈ R[X] such that h = q_h g + r_h and deg r_h < d. Then

f = h + f_n g_d^{−1} X^{n−d} g = q_h g + r_h + f_n g_d^{−1} X^{n−d} g = (q_h + f_n g_d^{−1} X^{n−d}) g + r_h.

Hence, letting q := q_h + f_n g_d^{−1} X^{n−d} and r := r_h completes the proof.
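The existence proof of Th. 7.15 is itself an algorithm: for n = deg f down to deg g, subtract f_n g_d^{−1} X^{n−d} g. A pure-Python sketch over Q (names are ours):

```python
from fractions import Fraction

def poly_divmod(f, g):
    # Division with remainder in F[X], following the existence proof of
    # Th. 7.15. f, g are coefficient lists over Q (index i -> coeff of X^i);
    # the leading coefficient g[-1] must be nonzero (invertible in a field).
    f = [Fraction(c) for c in f]
    d = len(g) - 1
    q = [Fraction(0)] * max(len(f) - d, 1)
    for n in range(len(f) - 1, d - 1, -1):
        coef = f[n] / g[-1]                # f_n * g_d^{-1}
        if coef:
            q[n - d] = coef
            for i, gi in enumerate(g):     # f := f - coef * X^{n-d} * g
                f[n - d + i] -= coef * gi
    return q, f[:d] if d > 0 else []       # f = q*g + r with deg r < deg g

q, r = poly_divmod([1, 0, 0, 1], [1, 1])   # X^3 + 1 divided by X + 1
print(q == [1, -1, 1], r == [0])  # True True
```

The example confirms X^3 + 1 = (X + 1)(X^2 − X + 1) with remainder 0.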

Definition 7.16. Let R be a commutative ring with unity and 1 ≠ 0.

(a) We call R integral or an integral domain if, and only if, R does not have any nonzero zero divisors.

(b) We call R Euclidean if, and only if, R is an integral domain and there exists a map deg : R \ {0} → N0 such that, for each f, g ∈ R with g ≠ 0, there exist q, r ∈ R, satisfying

f = qg + r  ∧  ( deg r < deg g ∨ r = 0 ).  (7.13)

The map deg is then called a degree map or a Euclidean map of R.

Example 7.17. (a) Every field F is a Euclidean ring, where we can choose deg : F∗ = F \ {0} → N0, deg ≡ 0, as the degree map: If g ∈ F∗, then, given f ∈ F, we choose q := f g^{−1} and r := 0. Then, clearly, (7.13) holds.

(b) Z is a Euclidean ring, where we can choose deg : Z \ {0} → N0, deg(k) := |k|, as the degree map: According to [Phi19, Th. D.1], for each f, g ∈ N, there exist q, r ∈ N0 such that f = qg + r and 0 ≤ r < g, also implying −f = −qg − r with |−r| < g, f = −q(−g) + r, and −f = q(−g) − r, showing that (7.13) can always be satisfied (for f = 0, merely set q := r := 0).

(c) If F is a field, then F[X] is a Euclidean ring, where we can choose the degree map as in Def. 7.1: for each 0 ≠ g ∈ F[X], we can apply Th. 7.15 to obtain (7.13).

Definition 7.18. Let R be a commutative ring with unity.

(a) a ⊆ R is called an ideal in R if, and only if, the following two conditions hold:

(i) (a,+) is a subgroup of (R,+).

(ii) For each x ∈ R and each a ∈ a, one has ax ∈ a (which, as 1 ∈ R, is equivalent to aR = a).

(b) An ideal a ⊆ R is called principal if, and only if, there exists a ∈ R such that a = (a) := aR.

(c) R is called principal if, and only if, every ideal in R is principal.

(d) R is called a principal ideal domain if, and only if, R is both principal and integral.


Remark 7.19. Let R be a commutative ring with unity and let a ⊆ R be an ideal. Since (a, +) is a subgroup of (R, +) and a, b ∈ a implies ab ∈ a, a is always a subring of R. However, as 1 ∈ a implies a = R, an ideal (0) ≠ a is a subring with unity if, and only if, a = R.

Proposition 7.20. Let R be a commutative ring with unity.

(a) {0} = (0) and (1) = R are principal ideals of R.

(b) If S is a ring and φ : R → S is a ring homomorphism, then ker φ = φ^{−1}({0}) is an ideal in R.

(c) If a and b are ideals in R, then a+ b is an ideal in R.

(d) If (a_i)_{i∈I} is a family of ideals in R, I ≠ ∅, then a := ⋂_{i∈I} a_i is an ideal in R as well.

(e) If I ≠ ∅ is an index set, totally ordered by ≤, and (a_i)_{i∈I} is an increasing family of ideals in R (i.e., for each i, j ∈ I with i ≤ j, one has a_i ⊆ a_j), then a := ⋃_{i∈I} a_i is an ideal in R as well.

Proof. Exercise.

Theorem 7.21. If R is a Euclidean ring, then R is a principal ideal domain.

Proof. Let R be a Euclidean ring with degree map deg : R \ {0} → N0. Moreover, let a ⊆ R be an ideal, a ≠ (0). Let a ∈ a \ {0} be such that

deg(a) = min{ deg(x) : 0 ≠ x ∈ a }.

Then a = (a): Indeed, let f ∈ a. According to (7.13), f = qa + r with q, r ∈ R and deg(r) < deg(a) or r = 0. Then r = f − qa ∈ a and the choice of a implies r = 0 and f = qa ∈ (a), showing a ⊆ (a). As (a) ⊆ a also holds (since a is an ideal), we have a = (a), as desired.

Example 7.22. (a) If F is a field, then (0) and F = (1) are the only ideals in F (in particular, each field is a principal ideal domain): Indeed, if a is an ideal in F, 0 ≠ a ∈ a, and x ∈ F, then x = x a^{−1} a ∈ a.

(b) Z and F[X] (where F is a field) are principal ideal domains according to Th. 7.21, since we know from Ex. 7.17(b),(c) that both rings are Euclidean rings.

(c) According to Rem. 7.19, a proper subring with unity S of the commutative ring with unity R can never be an ideal in R (and then the unital ring monomorphism ι : S → R, ι(x) := x, shows that the subring S = Im ι does not need to be an ideal). For example, Z is a subring of Q, but not an ideal in Q; Q is a subring of R, but not an ideal in R.


(d) The ring Z4 = {0, 1, 2, 3} is principal: (2) = {0, 2} and, if a is an ideal in Z4 with 3 ∈ a, then 3 + 3 = 2 ∈ a and 2 + 3 = 1 ∈ a, showing a = Z4. However, since 2 · 2 = 0, Z4 is not a principal ideal domain.

(e) The set A := 2Z ∪ 3Z ⊆ Z satisfies Def. 7.18(a)(ii): If k, l ∈ Z, then kl · 2 ∈ 2Z ⊆ A and kl · 3 ∈ 3Z ⊆ A, but A is not an ideal in Z, since 2 + 3 ∉ A. This example also shows that unions of ideals need not be ideals.

(f) The ring Z[X] is not principal: Let

a := { (f_i)_{i∈N0} ∈ Z[X] : f_0 is even }.

Then, clearly, a is an ideal in Z[X]. Moreover, 2X^0 ∈ a and X^1 ∈ a. However, if f ∈ a is such that 2 = 2X^0 ∈ (f), then f ∈ {−2, 2}, showing X^1 ∉ (f). Thus, the ideal a is not principal.

We now want to show that the analogue of the fundamental theorem of arithmetic [Phi19, Th. D.6] holds in every Euclidean ring (in particular, in F[X], if F is a field) and even in every principal ideal domain. We begin with some preparations:

Definition 7.23. Let R be an integral domain.

(a) We call x, y ∈ R associated if, and only if, there exists a ∈ R∗ such that x = ay.

(b) Let x, y ∈ R. We define x to be a divisor of y (and also say that x divides y, denoted x | y) if, and only if, there exists c ∈ R such that y = cx. If x is not a divisor of y, then we write x ∤ y.

(c) Let ∅ ≠ M ⊆ R. We call d ∈ R a greatest common divisor of the elements of M if, and only if,

∀ x ∈ M:  d | x  ∧  ( ∀ r ∈ R:  ( ∀ x ∈ M:  r | x )  ⇒  r | d ).  (7.14)

(d) 0 ≠ p ∈ R \ R∗ is called irreducible if, and only if,

∀ x, y ∈ R:  ( p = xy  ⇒  (x ∈ R∗ ∨ y ∈ R∗) ).  (7.15)

Otherwise, p is called reducible.

(e) 0 ≠ p ∈ R \ R∗ is called prime if, and only if,

∀ x, y ∈ R:  ( p | xy  ⇒  (p | x ∨ p | y) ).  (7.16)

Before looking at some examples, we prove two propositions:


Proposition 7.24. Let R be an integral domain.

(a) Let ∅ ≠ M ⊆ R. If r, d ∈ R are both greatest common divisors of the elements of M, then r, d are associated.

(b) If 0 ≠ p ∈ R \ R∗ is prime, then p is irreducible.

(c) If a_1, . . . , a_n ∈ R \ {0}, n ∈ N, and d ∈ R is such that we have the equality of ideals

(a_1) + · · · + (a_n) = (d),  (7.17)

then d is a greatest common divisor of a_1, . . . , a_n.

Proof. (a): If r, d ∈ R are both greatest common divisors of the elements of M, then r | d and d | r, i.e. there exist a, c ∈ R such that d = cr and r = ad, implying d = cad and d(1 − ca) = 0. If d = 0, then also r = ad = 0, and r, d are trivially associated. If d ≠ 0, then, since R does not have any nonzero zero divisors, 1 = ca and c, a ∈ R∗, i.e. r, d are associated.

(b): Let 0 ≠ p ∈ R \ R∗ be prime and assume p = xy. Then p | xy and, as p is prime, p | x or p | y. By possibly renaming x, y, we may assume p | x, i.e. there exists c ∈ R with x = cp, implying p = xy = cpy and (1 − cy)p = 0. Thus, since R does not have any nonzero zero divisors and p ≠ 0, 1 = cy and c, y ∈ R∗, i.e. p is irreducible.

(c): As a_1, . . . , a_n ∈ (d), there exist c_1, . . . , c_n ∈ R such that a_i = c_i d for each i ∈ {1, . . . , n}, showing d | a_i for each i ∈ {1, . . . , n}. Now suppose r ∈ R is such that r | a_i for each i ∈ {1, . . . , n}, i.e. there exist x_1, . . . , x_n ∈ R with a_i = x_i r for each i ∈ {1, . . . , n}. On the other hand, d ∈ (a_1) + · · · + (a_n) implies the existence of s_1, . . . , s_n ∈ R with d = ∑_{i=1}^{n} s_i a_i. Then

d = ∑_{i=1}^{n} s_i a_i = ∑_{i=1}^{n} s_i x_i r = ( ∑_{i=1}^{n} s_i x_i ) r,

showing r | d and proving (c).

Proposition 7.25. Let R be a principal ideal domain.

(a) Bezout’s Lemma, cf. [Phi19, Th. D.4]: If a_1, . . . , a_n ∈ R \ {0}, n ∈ N, and d ∈ R is a greatest common divisor of a_1, . . . , a_n, then (7.17) holds. In particular,

∃ x_1, . . . , x_n ∈ R:  x_1 a_1 + · · · + x_n a_n = d,  (7.18)

which is known as Bezout’s identity (usually for n = 2). An important special case is that, if 1 is a greatest common divisor of a_1, . . . , a_n, then

∃ x_1, . . . , x_n ∈ R:  x_1 a_1 + · · · + x_n a_n = 1.  (7.19)

(b) Let 0 ≠ p ∈ R \ R∗. Then p is prime if, and only if, p is irreducible.


Proof. (a): Let d be a greatest common divisor of a_1, . . . , a_n. Since R is a principal ideal domain and using Prop. 7.24(c), (7.17) must hold with some greatest common divisor d_1 of a_1, . . . , a_n. Then, by Prop. 7.24(a), there exists r ∈ R∗ such that d = r d_1, implying (d) = (d_1), proving (a).

(b): Due to Prop. 7.24(b), it only remains to prove that p is prime if it is irreducible. Thus, assume p to be irreducible and let x, y ∈ R such that p | xy, i.e. xy = ap with a ∈ R. If p ∤ x, then 1 is a greatest common divisor of p and x and, according to (7.19), there exist r, s ∈ R such that rp + sx = 1. Then

y = y · 1 = y(rp+ sx) = yrp+ sxy = yrp+ sap = (yr + sa)p,

showing p| y and p prime.
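For R = Z, Bezout coefficients as in (7.18) can be computed constructively with the extended Euclidean algorithm (the classical method for Euclidean rings, not spelled out in the text above); a minimal sketch:

```python
def extended_gcd(a, b):
    # Extended Euclidean algorithm in Z: returns (d, x, y) with
    # d = gcd(a, b) and x*a + y*b = d, a Bezout identity as in (7.18).
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

d, x, y = extended_gcd(240, 46)
print(d, x * 240 + y * 46)  # 2 2
```

The same iteration works verbatim in F[X] once divmod is replaced by polynomial division with remainder (Th. 7.15).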

Example 7.26. Let R be an integral domain.

(a) For each x ∈ R, due to x = 1 · x, 1| x and x| x.

(b) If F := R is a field, then F \ F∗ = {0}, i.e. F has neither irreducible elements nor prime elements.

(c) If R = Z, then R∗ = {−1, 1}, i.e. p ∈ Z is irreducible if, and only if, |p| is a prime number in N (and, since Z is a principal ideal domain by Ex. 7.22(b), p ∈ Z is irreducible if, and only if, it is prime).

(d) For each λ ∈ R, X − λ = X^1 − λX^0 ∈ R[X] is irreducible due to (7.5c). For R = R, X^2 + 1 is irreducible: Otherwise, there exist λ, µ ∈ R with X^2 + 1 = (X + λ)(X + µ), yielding the contradiction 0 = ǫ_{−λ}((X + λ)(X + µ)) = ǫ_{−λ}(X^2 + 1) = λ^2 + 1.

(e) Suppose

R := Q + X^1 R[X] = { (f_i)_{i∈N0} ∈ R[X] : f_0 ∈ Q }.

Clearly, R is a subring of R[X]. Then X = X^1 ∈ R is irreducible, but X is not prime, since X | 2X^2 = (√2 X)(√2 X), but X ∤ √2 X, since √2 ∉ Q. Then, as a consequence of Prop. 7.25(b), R cannot be a principal ideal domain. Indeed, the ideal a := (X) + (√2 X) is not principal in R: Clearly, X and √2 X are not common multiples of any noninvertible f ∈ R.

Lemma 7.27. Let R be a principal ideal domain. Let I ≠ ∅ be an index set, totally ordered by ≤, and let (a_i)_{i∈I} be an increasing family of ideals in R. According to Prop. 7.20(e), we can form the ideal a := ⋃_{i∈I} a_i. Then there exists i_0 ∈ I such that a = a_{i_0}.

Proof. As R is principal, there exists a ∈ R such that (a) = a. Since a ∈ a, there exists i_0 ∈ I such that a ∈ a_{i_0}, implying (a) ⊆ a_{i_0} ⊆ a = (a) and establishing the case.

Theorem 7.28 (Existence of Prime Factorization). Let R be a principal ideal domain. If 0 ≠ a ∈ R \ R∗, then there exist prime elements p_1, . . . , p_n ∈ R, n ∈ N, such that

a = p_1 · · · p_n.  (7.20)


Proof. Let S be the set of all ideals (a) in R that are generated by elements 0 ≠ a ∈ R \ R∗ that do not have a prime factorization as in (7.20). We need to prove S = ∅. Seeking a contradiction, assume S ≠ ∅ and note that set inclusion ⊆ provides a partial order on S. If C ≠ ∅ is a totally ordered subset of S, then, by Prop. 7.20(e), a := ⋃_{c∈C} c is an ideal in R and, by Lem. 7.27, there exists c ∈ C such that a = c, showing a ∈ S to provide an upper bound for C. Thus, Zorn’s lemma [Phi19, Th. 5.22] applies, yielding a maximal element m ∈ S (i.e. maximal in S with respect to ⊆). Then there exists a ∈ R \ R∗ such that m = (a) and a does not have a prime factorization. In particular, a is not prime, i.e. a must be reducible by Prop. 7.25(b). Thus, there exist a_1, a_2 ∈ R \ (R∗ ∪ {0}) such that a = a_1 a_2. Then (a) ⊊ (a_1) and (a) ⊊ (a_2): Indeed, if a_1 = ra = r a_1 a_2 with r ∈ R, then 0 = a_1(1 − r a_2) would imply a_2 ∈ R∗, and analogously for (a) = (a_2). Due to the maximality of m = (a) in S, we conclude (a_1), (a_2) ∉ S. Thus, a_1, a_2 both must have prime factorizations, yielding the desired contradiction that a = a_1 a_2 must have a prime factorization as well.

Remark 7.29. In particular, we obtain from Th. 7.28 that each k ∈ Z \ {−1, 0, 1} and each f ∈ F[X] with F a field and deg f ≥ 1 has a prime factorization. However, for R = Z and R = F[X], we can prove the existence of a prime factorization for each 0 ≠ a ∈ R \ R∗ in a simpler way and without making use of Zorn’s lemma: Let deg : R \ {0} → N0 be the degree map as in Ex. 7.17(b),(c). We conduct the proof via induction on deg(a) ∈ N: If a itself is prime, then there is nothing to prove, and this, in particular, takes care of the base case of the induction. If a is not prime, then it is reducible, i.e. a = a_1 a_2 with a_1, a_2 ∈ R \ (R∗ ∪ {0}). In particular, 1 ≤ deg a_1, deg a_2 < deg a. Thus, by induction, a_1 and a_2 both have prime factorizations, implying a to have a prime factorization as well.
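For R = Z, the induction of Rem. 7.29 amounts to splitting off divisors until only primes remain; a minimal trial-division sketch for integers n ≥ 2 (negative a ∈ Z are covered by factoring |a| and replacing one prime p by −p):

```python
def prime_factorization(n):
    # Prime factorization of an integer n >= 2 by trial division; in the
    # language of Th. 7.28: the list [p_1, ..., p_k] with n = p_1 * ... * p_k.
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)   # the remaining factor is prime
    return factors

print(prime_factorization(360))  # [2, 2, 2, 3, 3, 5]
```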

Theorem 7.30 (Uniqueness of Prime Factorization). Let R be an integral domain and x ∈ R. Suppose

x = p_1 · · · p_n = a r_1 · · · r_m,  (7.21)

where a ∈ R∗, p_1, . . . , p_n ∈ R, n ∈ N, are prime, and r_1, . . . , r_m ∈ R, m ∈ N, are irreducible. Then m = n and there exists a permutation π ∈ S_n such that, for each i ∈ {1, . . . , n}, r_i and p_{π(i)} are associated (cf. Def. 7.23(a)).

Proof. The proof is conducted via induction on n ∈ N. Since p_1 is prime and p_1 | r_1 · · · r_m, there exists i ∈ {1, . . . , m} such that p_1 | r_i. Since r_i is irreducible, we must have r_i = a_1 p_1 with a_1 ∈ R∗. For n = 1 (the induction base case), this yields m = 1 and p_1, r_1 associated, as desired. For n > 1, we have

p_2 · · · p_n = a a_1 r_1 · · · r_{i−1} r_{i+1} · · · r_m

and we employ the induction hypothesis to obtain n − 1 = m − 1 (i.e. n = m) and a bijective map σ : {1, . . . , n} \ {i} → {2, . . . , n} such that, for each j ∈ {1, . . . , n} \ {i}, r_j and p_{σ(j)} are associated. Then

π : {1, . . . , n} → {1, . . . , n},  π(k) := 1 for k = i,  π(k) := σ(k) for k ≠ i,

defines a permutation π ∈ S_n such that, for each j ∈ {1, . . . , n}, r_j and p_{π(j)} are associated.

Corollary 7.31. If R is a principal ideal domain (e.g. R = Z or R = F[X], where F is a field), then each 0 ≠ a ∈ R \ R∗ admits a factorization into prime elements, which is unique up to the order of the primes and up to association (rings R with this property are called factorial; factorial integral domains are called unique factorization domains).

Proof. One merely combines Th. 7.28 with Th. 7.30.

Definition 7.32. Let F be a field. We call F algebraically closed if, and only if, for each f ∈ F[X] with deg f ≥ 1, there exists λ ∈ F such that ǫ_λ(f) = 0, i.e. such that λ is a zero of f, as defined in Def. and Rem. 7.9 (cf. Rem. 6.12(e) and Th. 7.35).

Notation 7.33. Let R be a commutative ring with unity. One commonly uses the simplified notation X := X^1 ∈ R[X] and s := sX^0 ∈ R[X] for each s ∈ R.

Proposition 7.34. Let R be a commutative ring with unity. For each f ∈ R[X] with deg f = n ∈ N and each s ∈ R, there exists q ∈ R[X] with deg q = n − 1 such that

f = ǫ_s(f) + (X − s) q = ǫ_s(f) X^0 + (X^1 − sX^0) q,  (7.22a)

where ǫ_s is the substitution homomorphism according to Def. and Rem. 7.9. In particular, if s is a zero of f, then

f = (X − s) q.  (7.22b)

Proof. According to Th. 7.15, there exist q, r ∈ R[X] such that f = q(X − s) + r with deg r < deg(X − s) = 1. Thus, r ∈ R and deg q = n − 1 by (7.5c) (which holds, as X − s is monic). Applying ǫ_s to f = q(X − s) + r then yields ǫ_s(f) = ǫ_s(q)(s − s) + r = r, proving (7.22).

Theorem 7.35. Let F be a field. Then the following statements are equivalent:

(i) F is algebraically closed.

(ii) For each f ∈ F[X] with deg f = n ∈ N, there exist c ∈ F and λ_1, . . . , λ_n ∈ F (not necessarily distinct), such that

f = c ∏_{i=1}^{n} (X^1 − λ_i).  (7.23)

(iii) f ∈ F [X] is irreducible if, and only if, deg f = 1.

Proof. “(i) ⇔ (ii)”: If F is algebraically closed, then (ii) follows from Prop. 7.34 by combining (7.22b) with a straightforward induction. That (ii) implies (i) is immediate.

“(i) ⇔ (iii)”: We already noted in Ex. 7.26(d) that each X − λ with λ ∈ F is irreducible, i.e. each sX − λ with s ∈ F \ {0} is irreducible as well. If F is algebraically closed and


deg f > 1, then (7.22b) shows f to be reducible. Conversely, if (iii) holds, then an induction over n = deg f ∈ N shows each f ∈ F[X] with deg f ∈ N to have a zero: Indeed, f = aX + b with a, b ∈ F, a ≠ 0, has −b a^{−1} as a zero, and, if deg f > 1, then f is reducible, i.e. there exist g, h ∈ F[X] with 1 ≤ deg g, deg h < deg f such that f = gh. By induction, g and h must have a zero, i.e. f must have a zero as well.

Corollary 7.36. Let F be a field. If f ∈ F[X] with deg f = n ∈ N_0, then f has at most n zeros. Moreover, there exist k ∈ {0, …, n} and q ∈ F[X] with deg q = n − k and such that

f = q ∏_{j=1}^k (X − λ_j),   (7.24a)

where q does not have any zeros in F and N := {λ_1, …, λ_k} = {λ ∈ F : ǫ_λ(f) = 0} is the set of zeros of f (N = ∅ and f = q is possible, and it can also occur that all λ_j in (7.24a) are identical). We can rewrite (7.24a) as

f = q ∏_{j=1}^l (X − µ_j)^{m_j},   (7.24b)

where µ_1, …, µ_l ∈ F, l ∈ {0, …, k}, are the distinct zeros of f, and m_j ∈ N with ∑_{j=1}^l m_j = k. Then m_j is called the multiplicity of the zero µ_j of f.

If F is algebraically closed, then k = n and q ∈ F.

Proof. If f ∈ F[X] with deg f ≤ n ∈ N_0 has n + 1 distinct zeros, then the uniqueness part of Th. 7.10 yields f ≡ 0 and deg f = −∞ < n. The representation (7.24a) follows from (7.22b) combined with a straightforward induction. The algebraically closed case is immediate from Th. 7.35(ii).

Example 7.37. (a) Since C is algebraically closed by [Phi16a, Th. 8.32], for each f ∈ C[X] with n := deg f ∈ N, there exist numbers c, λ_1, …, λ_n ∈ C such that

f = c ∏_{j=1}^n (X − λ_j)   (7.25)

(the λ_1, …, λ_n are precisely all the zeros of f, some or all of which might be identical).

(b) For each f ∈ R[X] with n := deg f ∈ N, there exist numbers n_1, n_2 ∈ N_0 and c, ξ_1, …, ξ_{n_1}, α_1, …, α_{n_2}, β_1, …, β_{n_2} ∈ R such that

n = n_1 + 2 n_2,   (7.26a)

and

f = c ∏_{j=1}^{n_1} (X − ξ_j) ∏_{j=1}^{n_2} (X² + α_j X + β_j):   (7.26b)


Indeed, if f has only real coefficients, then we can take complex conjugates to obtain, for each λ ∈ C,

ǫ_λ(f) = 0  ⇒  ǫ_{λ̄}(f) = \overline{ǫ_λ(f)} = 0,

showing that the nonreal zeros of f (if any) must occur in conjugate pairs. Moreover,

(X − λ)(X − λ̄) = X² − (λ + λ̄)X + λλ̄ = X² − 2X Re λ + |λ|²,

showing that (7.25) implies (7.26). This also shows that, if f ∈ R[X] is irreducible, then deg f ∈ {1, 2}.

In [Phi19, Ex. 4.38], we saw how to obtain the field of rational numbers Q from the ring of integers Z. The same construction actually still works if Z is replaced by an arbitrary integral domain R, resulting in the so-called field of fractions of R (in the following section, we will use the field of fractions of F[X] in the definition of the characteristic polynomial of A ∈ L(V, V), where V is a vector space over F). This gives rise to the following Th. 7.38.

Theorem 7.38. Let R be an integral domain. One defines the field of fractions⁶ F of R as the quotient set F := (R × (R \ {0}))/∼ with respect to the following equivalence relation ∼ on R × (R \ {0}):

(a, b) ∼ (c, d)  :⇔  ad = bc,   (7.27)

where, as usual, we write

a/b := [(a, b)]   (7.28)

for the equivalence class of (a, b) with respect to ∼. Addition on F is defined by

+ : F × F −→ F,  (a/b, c/d) ↦ a/b + c/d := (ad + bc)/(bd).   (7.29)

Multiplication on F is defined by

· : F × F −→ F,  (a/b, c/d) ↦ (a/b) · (c/d) := (ac)/(bd).   (7.30)

Then (F, +, ·) does, indeed, form a field, where 0/1 and 1/1 are the neutral elements with respect to addition and multiplication, respectively, (−a)/b is the additive inverse to a/b, whereas b/a is the multiplicative inverse to a/b with a ≠ 0. The map

ι : R −→ F,  ι(k) := k/1,   (7.31)

is a unital ring monomorphism and it is customary to identify R with ι(R), just writing k instead of k/1.

⁶Caveat: The field of fractions of R should not be confused with the quotient field or factor field of R with respect to a maximal ideal m in R – this is a different construction, leading to different objects (e.g., for R = Z to the finite fields Z_p, where p ∈ N is prime).


Proof. Exercise.

Example 7.39. (a) Q is the field of fractions of Z.

(b) If R is an integral domain, then we know from Prop. 7.7 that R[X] is an integral domain as well. The field of fractions of R[X] is denoted by R(X) and is called the field of rational fractions over R.
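For R = Z, the construction of Th. 7.38 is implemented in Python's standard library as fractions.Fraction, which stores a reduced representative of the equivalence class [(a, b)]. A quick check of the addition rule (7.29) and of the equivalence (7.27):

```python
from fractions import Fraction

a, b, c, d = 1, 6, 3, 4
# (7.29): a/b + c/d = (ad + bc)/(bd); Fraction reduces the representative
lhs = Fraction(a, b) + Fraction(c, d)
rhs = Fraction(a * d + b * c, b * d)
print(lhs, rhs)  # 11/12 11/12

# (7.27): (2, 4) ~ (1, 2), since 2*2 == 4*1 -- same equivalence class
print(Fraction(2, 4) == Fraction(1, 2))  # True
```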

8 Characteristic Polynomial, Minimal Polynomial

We will now apply the theory of polynomials to further study linear endomorphisms on finite-dimensional vector spaces. The starting point is Th. 6.9(a), which states that the eigenvalues of A ∈ L(V, V) are precisely the zeros of its characteristic polynomial function p_A. In order to make the results of the previous section available, instead of associating a polynomial function with A, we now need to associate an actual polynomial. The idea is to replace t ↦ det(t Id − A) with det(X Id_n − M_A), where M_A is the matrix of A with respect to an ordered basis of V. If V is a vector space over the field F, then the entries of the matrix X Id_n − M_A are elements of the ring F[X]. However, we defined determinants only for matrices with entries in fields. Thus, to make the following definition consistent with our definition of determinants, we consider the entries of X Id_n − M_A to be elements of F(X), the field of rational fractions over F (cf. Ex. 7.39(b)):

Definition 8.1. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V). Moreover, let B be an ordered basis of V and let M_A ∈ M(n, F) be the matrix of A with respect to B. Since F(X) is a field extension of F, we may consider M_A as an element of M(n, F(X)). We define

χ_A := det(X Id_n − M_A) ∈ F[X]

to be the characteristic polynomial of A.

Proposition 8.2. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V).

(a) The characteristic polynomial χ_A is well-defined by Def. 8.1, i.e. if B_1 and B_2 are ordered bases of V and M_1, M_2 are the matrices of A with respect to B_1, B_2, respectively, then

χ_1 := det(X Id_n − M_1) = χ_2 := det(X Id_n − M_2).

(b) If, for each t ∈ F, ǫ_t : F[X] −→ F is the substitution homomorphism according to Def. and Rem. 7.9, then

p_A : F −→ F,  p_A(t) = ǫ_t(χ_A),

is the relation between the characteristic polynomial χ_A and the characteristic polynomial function p_A of A.


(c) The spectrum σ(A) is precisely the set of zeros of χ_A.

Proof. (a): Let T ∈ GL_n(F) be such that M_2 = T^{−1} M_1 T. Then

χ_2 = det(X Id_n − T^{−1} M_1 T) = det(T^{−1}(X Id_n − M_1) T) = (det T^{−1}) χ_1 (det T) = χ_1,

proving (a).

(b) is immediate from comparing Def. 8.1 and Def. 6.11.

(c) follows by combining (b) with Th. 6.9(a).

Remark 8.3. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V). The statements of Rem. 6.12 in regard to the characteristic polynomial function p_A also apply correspondingly to χ_A. In particular, if B is an ordered basis of V, the matrix (a_{ji}) ∈ M(n, F) represents A with respect to B, and we let (c_{ji}) := X Id_n − (a_{ji}), then

χ_A = det(X Id_n − (a_{ji})) = ∏_{i=1}^n (X − a_{ii}) + ∑_{π ∈ S_n \ {Id}} sgn(π) ∏_{i=1}^n c_{iπ(i)}   (8.1)

shows χ_A to be monic with deg χ_A = n. The argument of Rem. 6.12(d) shows that every monic polynomial in F[X] of degree n occurs as a characteristic polynomial.

Theorem 8.4. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V). Then there exists an ordered basis B of V such that the matrix M of A with respect to B is triangular if, and only if, there exist λ_1, …, λ_l ∈ F, l ∈ N, and n_1, …, n_l ∈ N with

∑_{i=1}^l n_i = n  ∧  χ_A = ∏_{i=1}^l (X − λ_i)^{n_i}.   (8.2)

In this case, σ(A) = {λ_1, …, λ_l},

n_i = m_a(λ_i) for each i ∈ {1, …, l}   (8.3)

(i.e. the algebraic multiplicity of λ_i is precisely the multiplicity of λ_i as a zero of χ_A), and one can choose B such that M has the upper triangular form

M =
⎛ λ_1            ∗  ⎞
⎜    ⋱              ⎟
⎜      λ_1          ⎟
⎜         ⋱         ⎟
⎜           λ_l     ⎟
⎜              ⋱    ⎟
⎝ 0              λ_l ⎠ ,   (8.4)

where each λ_i occurs precisely n_i times on the diagonal.


Proof. If there exists a basis B of V such that the matrix M = (m_{ji}) of A with respect to B is triangular, then, by Def. 8.1 and Cor. 4.26,

χ_A = det(X Id_n − M) = ∏_{i=1}^n (X − m_{ii}).

Combining factors, where the m_{ii} are equal, yields (8.2). For the converse, we assume (8.2) and prove the existence of the basis B such that M has the form of (8.4) via induction on n. For n = 1, there is nothing to prove. Thus, let n > 1. Then λ_1 must be an eigenvalue of A with some eigenvector 0 ≠ v_1 ∈ V. Then, if B_1 := (v_1, …, v_n) is an ordered basis of V, the matrix M_1 of A with respect to B_1 has the block form

M_1 = ⎛ λ_1  ∗ ⎞
      ⎝  0   N ⎠ ,   N ∈ M(n − 1, F).

According to Th. 4.25, we obtain

∏_{i=1}^l (X − λ_i)^{n_i} = χ_A = (X − λ_1) χ_N  ⇒  χ_N = (X − λ_1)^{n_1 − 1} ∏_{i=2}^l (X − λ_i)^{n_i}.

Let U := 〈{v_1}〉 and W := V/〈{v_1}〉. Then dim W = n − 1 and, by [Phi19, Cor. 6.13(a)], B_W := (v_2 + U, …, v_n + U) is an ordered basis of W. Let A_1 ∈ L(W, W) be such that, with respect to B_W, A_1 has the matrix N. Then, by induction hypothesis, there exists an ordered basis C_W = (w_2 + U, …, w_n + U) of W (w_2, …, w_n ∈ V), such that, with respect to C_W, the matrix N_1 ∈ M(n − 1, F) of A_1 has the form (8.4), except that λ_1 occurs precisely n_1 − 1 times on the diagonal. That N_1 is the matrix of A_1 means, for N_1 = (ν_{ji})_{(j,i) ∈ {2,…,n}²}, that

A_1(w_i + U) = ∑_{j=2}^n ν_{ji}(w_j + U) for each i ∈ {2, …, n}.

Then, by [Phi19, Cor. 6.13(b)], B := (v_1, w_2, …, w_n) is an ordered basis of V and, with respect to B, the matrix M of A has the form (8.4): According to [Phi19, Th. 7.14], there exists an (n − 1) × (n − 1) transition matrix T_1 = (t_{ji})_{(j,i) ∈ {2,…,n}²} such that N_1 = T_1^{−1} N T_1 and

w_i = ∑_{j=2}^n t_{ji} v_j,   w_i + U = ∑_{j=2}^n t_{ji}(v_j + U) for each i ∈ {2, …, n},

implying

M = ⎛ 1    0    ⎞ ⎛ λ_1  ∗ ⎞ ⎛ 1   0   ⎞     ⎛ λ_1  ∗   ⎞
    ⎝ 0  T_1⁻¹ ⎠ ⎝  0   N ⎠ ⎝ 0  T_1 ⎠  =  ⎝  0   N_1 ⎠ .

It remains to verify (8.3). Letting w_1 := v_1, we have B = (w_1, …, w_n). For each


k ∈ {1, …, n_1} and the standard column basis vector e_k, we obtain

(M − λ_1 Id_n) e_k = ((m_{ji}) − λ_1(δ_{ji})) e_k = (m_{1k}, …, m_{k−1,k}, 0, …, 0)ᵗ  ⇒  (A − λ_1 Id) w_k = ∑_{α=1}^{k−1} m_{αk} w_α,

showing (A − λ_1 Id) w_k ∈ 〈{w_1, …, w_{k−1}}〉 and w_1, …, w_{n_1} ∈ ker(A − λ_1 Id)^{n_1}. On the other hand, for each k ∈ {n_1 + 1, …, n}, we obtain

(M − λ_1 Id_n) e_k = (m_{1k}, …, m_{k−1,k}, λ − λ_1, 0, …, 0)ᵗ  ⇒  (A − λ_1 Id) w_k = (λ − λ_1) w_k + ∑_{α=1}^{k−1} m_{αk} w_α,

where λ ∈ {λ_2, …, λ_l}, showing w_k ∉ ker(A − λ_1 Id)^N for each N ∈ N. Thus,

n_1 = dim ker(A − λ_1 Id)^{n_1} = dim ker(A − λ_1 Id)^{r(λ_1)} = m_a(λ_1),

where r(λ_1) is as defined in Rem. 6.13. Now note that λ_1 was chosen arbitrarily in the above argument. The same argument shows the existence of a basis B′ such that λ_i, i ∈ {1, …, l}, appears in the upper left block of M. In particular, we obtain n_i = m_a(λ_i) for each i ∈ {1, …, l}.

Corollary 8.5. Let V be a vector space over the algebraically closed field F, dim V = n ∈ N, and A ∈ L(V, V). Then there exists an ordered basis B of V such that the matrix M of A with respect to B is triangular.

Proof. This is immediate from combining Th. 8.4 with Th. 7.35(ii).

Theorem 8.6. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V). There exists a unique monic polynomial 0 ≠ µ_A ∈ F[X] (called the minimal polynomial of A), satisfying the following two conditions:

(i) ǫ_A(µ_A) = 0, where ǫ_A : F[X] −→ L(V, V) is the substitution homomorphism according to Def. and Rem. 7.9 (noting L(V, V) to be a ring extension of F via the unital ring monomorphism ι : F −→ L(V, V), ι(a) := a Id).

(ii) For each f ∈ F[X] such that ǫ_A(f) = 0, µ_A is a divisor of f, i.e. µ_A | f.


Proof. Let a := {f ∈ F[X] : ǫ_A(f) = 0}. Clearly, a is an ideal in F[X], and, thus, as F[X] is a principal ideal domain, there exists g ∈ F[X] such that a = (g). Clearly, g satisfies both (i) and (ii). We need to show g ≠ 0, i.e. a ≠ (0) = {0}. To this end, note that, since dim L(V, V) = n², the n² + 1 maps Id, A, A², …, A^{n²} ∈ L(V, V) must be linearly dependent, i.e. there exist c_0, …, c_{n²} ∈ F, not all 0, such that

0 = ∑_{i=0}^{n²} c_i A^i,

showing 0 ≠ f := ∑_{i=0}^{n²} c_i X^i ∈ a. If h ∈ F[X] also satisfies (i) and (ii), then h | g and g | h, implying g, h to be associated. In consequence, µ_A is the unique monic such element of F[X].

Remark 8.7. Let F be a field.

(a) We extend Def. 6.17 to the characteristic polynomial and to the minimal polynomial: Let n ∈ N. Consider the vector space V := F^n over the field F. If M ∈ M(n, F), then χ_M and µ_M denote the characteristic polynomial and the minimal polynomial of the linear map A_M that M represents with respect to the standard basis of F^n.

(b) Let V be a vector space over the field F, dim V = n ∈ N. As M(n, F) is a ring extension of F, we can plug M ∈ M(n, F) into elements of F[X]. Moreover, if f ∈ F[X], A ∈ L(V, V) and M ∈ M(n, F) represents A with respect to a basis B of V, then, due to [Phi19, Th. 7.10(a)], ǫ_M(f) represents ǫ_A(f) with respect to B.

Example 8.8. Let F be a field. Consider

M := ⎛ 0 0 1 ⎞
     ⎜ 0 0 0 ⎟ ∈ M(3, F):
     ⎝ 0 0 0 ⎠

We claim that the minimal polynomial is µ_M = X²: Indeed, M² = 0 implies ǫ_M(X²) = 0, and, if f = ∑_{i=0}^n f_i X^i ∈ F[X], n ∈ N, then ǫ_M(f) = f_0 Id_3 + f_1 M. Thus, if ǫ_M(f) = 0, then f_0 = f_1 = 0, implying X² | f and showing µ_M = X².

Theorem 8.9 (Cayley-Hamilton). Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V). If χ_A and µ_A denote the characteristic and the minimal polynomial of A, respectively, then the following statements hold true:

(a) χ_A(A) := ǫ_A(χ_A) = 0.

(b) µ_A | χ_A and, in particular, deg µ_A ≤ deg χ_A = n.

(c) λ ∈ σ(A) if, and only if, µ_A(λ) := ǫ_λ(µ_A) = 0, i.e. the eigenvalues of A are precisely the zeros of the minimal polynomial µ_A.

(d) If #σ(A) = n (i.e. if A has n distinct eigenvalues), then χ_A = µ_A.


Proof. (a): Let B be an ordered basis of V and let (m_{ji}) := M_A be the matrix of A with respect to B. Moreover, let N be the adjugate matrix of X Id_n − M_A, i.e., up to factors of ±1, N contains the determinants of the (n − 1) × (n − 1) submatrices of X Id_n − M_A. According to Th. 4.29(a), we then have

χ_A Id_n = det(X Id_n − M_A) Id_n = N (X Id_n − M_A).   (8.5)

Since X Id_n − M_A contains only entries of degree at most 1 (deg(X − m_{ii}) = 1, all other entries having degree 0 or degree −∞), each entry n_{ji} of N has degree at most n − 1, i.e., for each (j, i) ∈ {1, …, n}², there exist b_{0,j,i}, …, b_{n−1,j,i} ∈ F such that

n_{ji} = ∑_{k=0}^{n−1} b_{k,j,i} X^k.

If, for each k ∈ {0, …, n − 1}, we let B_k := (b_{k,j,i}) ∈ M(n, F), then N = ∑_{k=0}^{n−1} B_k X^k. Plugging this into (8.5) yields

χ_A Id_n = (B_0 + B_1 X + ⋯ + B_{n−1} X^{n−1}) (X Id_n − M_A)
         = −B_0 M_A + (B_0 − B_1 M_A) X + (B_1 − B_2 M_A) X² + ⋯ + (B_{n−2} − B_{n−1} M_A) X^{n−1} + B_{n−1} X^n.   (8.6)

Writing χ_A = X^n + ∑_{i=0}^{n−1} a_i X^i with a_0, …, a_{n−1} ∈ F, the coefficients in front of each X^i in (8.6) must agree: Indeed, in each entry of the respective matrix, we have an element of F[X] and, in each entry, the coefficients of X^i must agree (due to the linear independence of the X^i) – hence, the matrix coefficients of X^i in (8.6) must agree as well. This yields

a_0 Id_n = −B_0 M_A,
a_1 Id_n = B_0 − B_1 M_A,
⋮
a_{n−1} Id_n = B_{n−2} − B_{n−1} M_A,
Id_n = B_{n−1}.

Thus, ǫ_{M_A}(χ_A) turns out to be the telescoping sum

ǫ_{M_A}(χ_A) = (M_A)^n + ∑_{i=0}^{n−1} a_i (M_A)^i = ∑_{i=0}^{n−1} a_i Id_n (M_A)^i + Id_n (M_A)^n
             = −B_0 M_A + ∑_{i=1}^{n−1} (B_{i−1} − B_i M_A)(M_A)^i + B_{n−1} (M_A)^n = 0.

As φ : L(V, V) −→ M(n, F), φ(A) := M_A, is a ring isomorphism by [Phi19, Th. 7.10(a)], we also obtain ǫ_A(χ_A) = φ^{−1}(ǫ_{M_A}(χ_A)) = 0, thereby proving (a).

(b) is an immediate consequence of (a) in combination with Th. 8.6(ii).


(c): Suppose λ ∈ σ(A) and let 0 ≠ v ∈ V be a corresponding eigenvector. Also let m_0, …, m_l ∈ F be such that µ_A = ∑_{i=0}^l m_i X^i, l ∈ N. Then we compute

0 = (ǫ_A(µ_A)) v = (∑_{i=0}^l m_i A^i) v = ∑_{i=0}^l m_i (A^i v) = ∑_{i=0}^l m_i (λ^i v) = (∑_{i=0}^l m_i λ^i) v = (ǫ_λ(µ_A)) v,

showing ǫ_λ(µ_A) = 0. Conversely, if λ ∈ F is such that ǫ_λ(µ_A) = 0, then (b) implies ǫ_λ(χ_A) = 0, i.e. λ ∈ σ(A) by Prop. 8.2(c).

(d): If #σ(A) = n, then µ_A has n distinct zeros by (c), implying deg µ_A = n. Since µ_A is monic and µ_A | χ_A by (b), we have µ_A = χ_A as claimed.

Example 8.10. Let F be a field. If

M := ⎛ 0 1 ⎞
     ⎝ 0 0 ⎠ ∈ M(2, F),

then χ_M = X². Since µ_M | χ_M and ǫ_M(X) = M ≠ 0, we must have µ_M = χ_M. On the other hand, if

N := ⎛ 0 0 ⎞
     ⎝ 0 0 ⎠ ∈ M(2, F),

then χ_N = X² and µ_N = X. Since N is diagonalizable, while M is not diagonalizable (cf. Ex. 6.5(d) and Ex. 6.16), but χ_M = χ_N, we can, in general, not decide diagonalizability merely by looking at the characteristic polynomial. However, we will see in Th. 8.14 below that the minimal polynomial does allow one to decide diagonalizability.

Caveat 8.11. One has to use care when substituting matrices and endomorphisms into polynomials: For example, one must be aware that, in the expression X Id_n − M, the polynomial X acts as a scalar. Thus, when substituting a matrix B ∈ M(n, F) for X, one must not use matrix multiplication between B and Id_n: For example, for n = 2,

B := ⎛ 1 0 ⎞        M := ⎛ 0 0 ⎞
     ⎝ 0 0 ⎠ ,           ⎝ 0 0 ⎠ ,

we obtain B² = B and

ǫ_B(det(X Id_n − M)) = ǫ_B(X²) = B² ≠ 0 = det B = det(B − M).

The following result further clarifies the relation between χA and µA:

Proposition 8.12. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V).

(a) One has χ_A | (µ_A)^n and, in particular, each irreducible factor of χ_A must be an irreducible factor of µ_A.

(b) There exists an ordered basis B of V such that the matrix M of A with respect to B is triangular if, and only if, there exist λ_1, …, λ_l ∈ F, l ∈ N, and n_1, …, n_l ∈ N with

µ_A = ∏_{i=1}^l (X − λ_i)^{n_i}.


Proof. (a): Let M ∈ M(n, F ) be a matrix representing A with respect to some orderedbasis of V . Let G be an algebraically closed field with F ⊆ G (cf. Rem. 6.12(e)). Wecan consider M as an element of M ∈ M(n,G) and, then, σ(M) is precisely the set ofzeros of both χM = χA and µA in G. As G is algebraically closed, for each λ ∈ σ(M),there exists mλ, nλ ∈ N such that mλ ≤ nλ ≤ n and

µA =∏

λ∈σ(M)

(X − λ)mλ|χA =∏

λ∈σ(M)

(X − λ)nλ .

Letting q :=∏

λ∈σ(M)(X − λ)nmλ−nλ , we have q ∈ G[X] as well as q = (µA)n (χA)

−1,

i.e. q χA = (µA)n, proving χA| (µA)

n (in both G[X] and F [X], since q = (µA)n (χA)

−1 ∈F (X) ∩G[X] = F [X]).

(b) follows by combining (a) with Th. 8.4.

Theorem 8.13. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V). Suppose the minimal polynomial µ_A can be written in the form µ_A = g_1 ⋯ g_l, l ∈ N, where g_1, …, g_l ∈ F[X] are such that, whenever i ≠ j, then 1 is a greatest common divisor of g_i and g_j. Then

V = ⊕_{i=1}^l ker g_i(A) = ⊕_{i=1}^l ker ǫ_A(g_i).

Proof. Define

h_i := ∏_{k=1, k≠i}^l g_k for each i ∈ {1, …, l}.

Then 1 is a greatest common divisor of h_1, …, h_l: Indeed, as 1 is a greatest common divisor of g_i and g_j for i ≠ j, the sets of prime factors of the g_i must all be disjoint. Thus, if f ∈ F[X] is a divisor of h_i, i ∈ {1, …, l}, then f does not share a prime factor with g_i. If f | h_i holds for each i ∈ {1, …, l}, then f does not share a prime factor with any g_i, implying f ∈ F \ {0}, i.e. 1 is a greatest common divisor of h_1, …, h_l. In consequence, (7.19) implies that there exist f_1, …, f_l ∈ F[X] with

1 = ∑_{i=1}^l f_i h_i

and

v = Id v = ǫ_A(1) v = ∑_{i=1}^l ǫ_A(f_i) ǫ_A(h_i) v for each v ∈ V.   (8.7)

We verify that, for each i ∈ {1, …, l}, ǫ_A(f_i) ǫ_A(h_i) v ∈ ker ǫ_A(g_i): Indeed, since g_i h_i = µ_A, one has

ǫ_A(g_i) ǫ_A(f_i) ǫ_A(h_i) v = ǫ_A(f_i) ǫ_A(µ_A) v = 0.


Thus, (8.7) proves V = ∑_{i=1}^l ker ǫ_A(g_i). According to Prop. 5.2(iii), it remains to show

U := ker ǫ_A(g_i) ∩ ∑_{j ∈ {1,…,l} \ {i}} ker ǫ_A(g_j) = {0} for each i ∈ {1, …, l}.

To this end, fix i ∈ {1, …, l} and note ǫ_A(g_i)(U) = {0} = ǫ_A(h_i)(U). On the other hand, 1 is a greatest common divisor of g_i, h_i, i.e. (7.19) provides r_i, s_i ∈ F[X] such that 1 = r_i g_i + s_i h_i, yielding

{0} = (ǫ_A(r_i) ǫ_A(g_i) + ǫ_A(s_i) ǫ_A(h_i))(U) = ǫ_A(1)(U) = Id(U) = U,

thereby completing the proof.

Theorem 8.14. Let V be a vector space over the field F, dim V = n ∈ N, and A ∈ L(V, V). Then A is diagonalizable if, and only if, there exist distinct λ_1, …, λ_l ∈ F, l ∈ N, such that µ_A = ∏_{i=1}^l (X − λ_i).

Proof. Suppose A is diagonalizable and let B be a basis of V, consisting of eigenvectors of A. Define g := ∏_{λ ∈ σ(A)} (X − λ). For each b ∈ B, there exists λ ∈ σ(A) such that Ab = λb. Thus,

ǫ_A(g)(b) = ∏_{λ ∈ σ(A)} (A − λ Id) b = 0,

i.e. ǫ_A(g) = 0 and, according to Th. 8.6(ii), we have µ_A | g. Since, by Th. 8.9(c), each λ ∈ σ(A) is a zero of µ_A, we have deg µ_A = deg g. As both µ_A and g are monic, this means µ_A = g. Conversely, suppose µ_A = ∏_{i=1}^l (X − λ_i) with distinct λ_1, …, λ_l ∈ F, l ∈ N. Then, by Th. 8.13,

V = ⊕_{i=1}^l ker ǫ_A(X − λ_i) = ⊕_{i=1}^l ker(A − λ_i Id) = ⊕_{i=1}^l E_A(λ_i),

proving A to be diagonalizable by Th. 6.3(d).

Example 8.15. (a) Let V be a vector space over C, dim V = n ∈ N. If A ∈ L(V, V) is such that there exists m ∈ N with A^m = Id, then A^m − Id = 0 and µ_A | (X^m − 1) = ∏_{k=1}^m (X − ζ_k), ζ_k := e^{2πik/m}. As the roots of unity ζ_k are all distinct (cf. [Phi16a, Cor. 8.31]), A is diagonalizable by Th. 8.14.

(b) Let V be a vector space over the field F, dim V = n ∈ N, and let P ∈ L(V, V) be a projection, i.e. P² = P. Then P² − P = 0 and µ_P | (X² − X) = X(X − 1). Thus, we obtain the three cases

µ_P = X for P = 0,   µ_P = X − 1 for P = Id,   µ_P = X(X − 1) otherwise.


(c) Let V be a vector space over the field F, dim V = n ∈ N, and let A ∈ L(V, V) be a so-called involution, i.e. A² = Id. Then A² − Id = 0 and µ_A | (X² − 1) = (X + 1)(X − 1). Thus, we obtain the three cases

µ_A = X − 1 for A = Id,   µ_A = X + 1 for A = −Id,   µ_A = (X + 1)(X − 1) otherwise.

If A ≠ ±Id, then, according to Th. 8.14, A is diagonalizable if, and only if, 1 ≠ −1, i.e. if, and only if, char F ≠ 2. Even though A ≠ ±Id is not diagonalizable for char F = 2, there still exists an ordered basis B of V such that the matrix M of A with respect to B is triangular (all diagonal elements being 1), due to Prop. 8.12(b).

Proposition 8.16. Let V be a vector space over the real numbers R, dim V = n ∈ N, and A ∈ L(V, V).

(a) There exists a vector subspace U of V such that dim U ∈ {1, 2} and U is A-invariant (i.e. A(U) ⊆ U).

(b) There exist an ordered basis B of V and matrices M_1, …, M_l, l ∈ N, such that each M_i is either 1 × 1 or 2 × 2 over R, and such that the matrix M of A with respect to B has the block triangular form

M = ⎛ M_1  ∗   ∗  ⎞
    ⎜      ⋱   ∗  ⎟
    ⎝          M_l ⎠ .

Proof. Exercise.

9 Jordan Normal Form

If V is a vector space over the algebraically closed field F, dim V = n ∈ N, A ∈ L(V, V), then one can always find an ordered basis B of V such that the corresponding matrix M of A is in so-called Jordan normal form, which is an especially simple (upper) triangular form, where the eigenvalues are found on the diagonal, the value 1 can, possibly, occur directly above the diagonal, and all other entries are 0. However, we will also see below that, if F is not algebraically closed, then one can still obtain a normal form for M, albeit one that is, in general, more complicated than the Jordan normal form.

Definition 9.1. Let V be a vector space over the field F , dimV = n ∈ N, A ∈ L(V, V ).

(a) A vector subspace U of V is called A-cyclic if, and only if, U is A-invariant (i.e. A(U) ⊆ U) and

U = 〈{A^i v : i ∈ N_0}〉 for some v ∈ V.


(b) V is called A-irreducible if, and only if, V = U1 ⊕ U2 with U1, U2 both A-invariantvector subspaces of V , implies U1 = V or U2 = V .

Proposition 9.2. Let V be a finite-dimensional vector space over the field F, r ∈ N, A ∈ L(V, V). Suppose V is A-cyclic,

V = 〈{A^i v : i ∈ N_0}〉,  v ∈ V,

and

µ_A = ∑_{i=0}^r a_i X^i with a_0, …, a_r ∈ F,  deg µ_A = r.

Then χ_A = µ_A, dim V = r, B := (v, Av, …, A^{r−1}v) is an ordered basis of V, and, with respect to B, A has the matrix

M = ⎛ 0  0  ⋯  0  −a_0     ⎞
    ⎜ 1  0  ⋯  0  −a_1     ⎟
    ⎜ 0  1      ⋮   ⋮      ⎟
    ⎜ ⋮      ⋱  0  −a_{r−2} ⎟
    ⎝ 0  ⋯  0  1  −a_{r−1} ⎠ ∈ M(r, F).   (9.1)

Proof. To show that v, Av, …, A^{r−1}v are linearly independent, let λ_0, …, λ_{r−1} ∈ F be such that

0 = ∑_{i=0}^{r−1} λ_i A^i v.   (9.2)

Define g := ∑_{i=0}^{r−1} λ_i X^i ∈ F[X]. We need to show g = 0. Indeed, by (9.2), we have

ǫ_A(g) A^i v = A^i ǫ_A(g) v = 0 for each i ∈ N_0,

which, as the A^i v generate V, implies ǫ_A(g) = 0. Since deg g < deg µ_A, this yields g = 0 and the linear independence of v, Av, …, A^{r−1}v. We show 〈B〉 = V next: Seeking a contradiction, assume A^m v ∉ 〈B〉, where we choose m ∈ N to be minimal. Then m ≥ r and there exist λ_0, …, λ_{r−1} ∈ F such that A^{m−1}v = ∑_{i=0}^{r−1} λ_i A^i v. Then m > r yields the contradiction

A^m v = ∑_{i=1}^r λ_{i−1} A^i v ∈ 〈B〉.

However, m = r also yields a contradiction due to

0 = ǫ_A(µ_A) v = A^r v + ∑_{i=0}^{r−1} a_i A^i v  ⇒  A^r v ∈ 〈B〉.

Thus, B is an ordered basis of V. Then deg χ_A = r, implying χ_A = µ_A. Finally, if e_k is the kth standard column basis vector of F^r, k ∈ {1, …, r}, then M e_k = e_{k+1} for k ∈ {1, …, r − 1} and M e_r = −∑_{i=0}^{r−1} a_i e_{i+1}, showing M to be the matrix of A with respect to B, since A(A^{r−1}v) = A^r v = −∑_{i=0}^{r−1} a_i A^i v.


Proposition 9.3. Let V be a vector space over the field F, dim V = n ∈ N, A ∈ L(V, V). Moreover, let U be an A-invariant subspace of V, 1 ≤ l := dim U < n, and define the linear maps

A_U : U −→ U,  A_U := A↾_U,
A_{V/U} : V/U −→ V/U,  A_{V/U}(v + U) := Av + U.

Then the following holds:

(a) Let B := (v_1, …, v_n) be an ordered basis of V such that B_U := (v_1, …, v_l) is an ordered basis of U. Then the matrix M of A with respect to B has the block form

M = (m_{ji}) = ⎛ M_U  ∗      ⎞
               ⎝ 0    M_{V/U} ⎠ ,   M_U ∈ M(l, F), M_{V/U} ∈ M(n − l, F),

where M_U is the matrix of A_U with respect to B_U and M_{V/U} is the matrix of A_{V/U} with respect to the ordered basis B_{V/U} := (v_{l+1} + U, …, v_n + U) of V/U.

(b) χ_A = χ_{A_U} χ_{A_{V/U}}.

(c) µ_{A_U} | µ_A and µ_{A_{V/U}} | µ_A.

Proof. A_U is well-defined, since U is A-invariant, and linear as A is linear. Moreover, A_{V/U} is well-defined, since, for each v, w ∈ V with v − w ∈ U, Av + U = Av + A(w − v) + U = Aw + U; A_{V/U} is linear, since, for each v, w ∈ V and λ ∈ F,

A_{V/U}(v + U + w + U) = A(v + w) + U = Av + U + Aw + U = A_{V/U}(v + U) + A_{V/U}(w + U),
A_{V/U}(λv + U) = A(λv) + U = λ(Av + U) = λ A_{V/U}(v + U).

(a): Since U is A-invariant, we have, for each i ∈ {1, …, l}, coefficients m_{1i}, …, m_{li} ∈ F with

A v_i = ∑_{j=1}^l m_{ji} v_j,

showing M to have the claimed form with M_U being the matrix of A_U with respect to B_U. Moreover, by [Phi19, Cor. 6.13(a)], B_{V/U} is, indeed, an ordered basis of V/U and

A_{V/U}(v_i + U) = A v_i + U = ∑_{j=1}^n m_{ji} v_j + U = ∑_{j=l+1}^n m_{ji} v_j + U for each i ∈ {l + 1, …, n},

proving M_{V/U} to be the matrix of A_{V/U} with respect to B_{V/U}.

(b): Using (a) and Th. 4.25, we compute

χ_A = det(X Id_n − M) = det(X Id_l − M_U) det(X Id_{n−l} − M_{V/U}) = χ_{A_U} χ_{A_{V/U}}.

(c): Since ǫ_A(µ_A) v = 0 for each v ∈ V, ǫ_A(µ_A) v = 0 for each v ∈ U, proving ǫ_{A_U}(µ_A) = 0 and µ_{A_U} | µ_A. Similarly, ǫ_{A_{V/U}}(µ_A)(v + U) = ǫ_A(µ_A) v + U = 0 for each v ∈ V, proving ǫ_{A_{V/U}}(µ_A) = 0 and µ_{A_{V/U}} | µ_A.


Comparing Prop. 9.3(b),(c) above, one might wonder if the analogue of Prop. 9.3(b) also holds for the minimal polynomials. The following Ex. 9.4 shows that, in general, it does not:

Example 9.4. Let F be a field and V := F². Then, for A := Id ∈ L(V, V), µ_A = X − 1. If U is an arbitrary 1-dimensional subspace of V, then, using the notation of Prop. 9.3, µ_{A_U} = X − 1 = µ_{A_{V/U}}, i.e. µ_{A_U} µ_{A_{V/U}} = (X − 1)² ≠ µ_A.

Lemma 9.5. Let V be a vector space over the field F, dim V = n ∈ N, A ∈ L(V, V). Suppose V is A-cyclic.

(a) If µ_A = gh with g, h ∈ F[X], then we have:

(i) dim ker ǫ_A(h) = deg h.

(ii) ker ǫ_A(h) = Im ǫ_A(g).

(b) If λ ∈ σ(A), then dim E_A(λ) = 1, i.e. every eigenspace of A has dimension 1.

Proof. (a): To prove (i), let U := Im ǫ_A(h) = ǫ_A(h)(V) and define A_U, A_{V/U} as in Prop. 9.3. As V is A-cyclic (say, generated by v ∈ V), U is A_U-cyclic (generated by ǫ_A(h)v) and V/U is A_{V/U}-cyclic (generated by v + U). Thus, Prop. 9.2 yields

χ_A = µ_A,  χ_{A_U} = µ_{A_U},  χ_{A_{V/U}} = µ_{A_{V/U}},

implying, by Prop. 9.3(b),

gh = µ_A = χ_A = χ_{A_U} χ_{A_{V/U}} = µ_{A_U} µ_{A_{V/U}}.   (9.3)

If v ∈ V, then ǫ_A(g) ǫ_A(h) v = ǫ_A(µ_A) v = 0, showing ǫ_{A_U}(g) = 0 and µ_{A_U} | g, deg µ_{A_U} ≤ deg g. Similarly, ǫ_{A_{V/U}}(h)(v + U) = ǫ_A(h) v + U = 0 (since ǫ_A(h) v ∈ U), proving ǫ_{A_{V/U}}(h) = 0 and µ_{A_{V/U}} | h, deg µ_{A_{V/U}} ≤ deg h. Since we also have deg g + deg h = deg µ_{A_U} + deg µ_{A_{V/U}} by (9.3), we must have deg g = deg µ_{A_U} and deg h = deg µ_{A_{V/U}}. Thus,

dim U = deg χ_{A_U} = deg µ_{A_U} = deg g = deg µ_A − deg h.

According to the isomorphism theorem [Phi19, Th. 6.16(a)], we know

V / ker ǫ_A(h) ≅ Im ǫ_A(h) = U,

implying

deg h = deg µ_A − dim U = dim V − dim U = dim V − dim(V / ker ǫ_A(h)) = dim V − (dim V − dim ker ǫ_A(h)) = dim ker ǫ_A(h),

thereby proving (i). We proceed to prove (ii): Since ǫ_A(h)(Im ǫ_A(g)) = {0}, we have Im ǫ_A(g) ⊆ ker ǫ_A(h). To prove equality, we note that (i) must also hold with g instead of h and compute

dim Im ǫ_A(g) = dim V − dim ker ǫ_A(g) = deg χ_A − deg g = deg µ_A − deg g = deg h = dim ker ǫ_A(h),


completing the proof of (ii).

(b): Let λ ∈ σ(A). Then λ is a zero of µ_A and, by Prop. 7.34, there exists q ∈ F[X] such that µ_A = (X − λ) q. Hence, by (a)(i),

dim E_A(λ) = dim ker(A − λ Id) = deg(X − λ) = 1,

proving (b).

Lemma 9.6. Let V be a vector space over the field F, dim V = n ∈ N, A ∈ L(V, V). Suppose µ_A = g^r, where g ∈ F[X] is irreducible and r ∈ N. If U is an A-cyclic subspace of V such that U has maximal dimension, then µ_{A_U} = µ_A and there exists an A-invariant subspace W such that V = U ⊕ W.

Proof. Letting A_U := A|_U, we show µ_{A_U} = µ_A first: According to Prop. 9.3(c), we have µ_{A_U} | µ_A, i.e. there exists 1 ≤ r_1 ≤ r such that µ_{A_U} = g^{r_1}. According to Prop. 9.2, χ_{A_U} = µ_{A_U}, implying dim U = deg µ_{A_U} = r_1 deg g. Let 0 ≠ v ∈ V and define U_1 := 〈A^i v : i ∈ N_0〉. Then U_1 is an A-cyclic subspace of V and the maximality of U implies µ_{A_{U_1}} | g^{r_1}, implying ε_A(g^{r_1})(U_1) = 0. As 0 ≠ v ∈ V was arbitrary, this shows ε_A(g^{r_1}) = 0, implying µ_A | g^{r_1} = µ_{A_U}, showing µ_{A_U} = µ_A (and r_1 = r) as claimed.

The proof of the existence of W is now conducted via induction on n ∈ N. For n = 1, we have U = V and there is nothing to prove (W = {0}). Thus, let n > 1. If U = V, then, as for n = 1, we can merely set W := {0}. Thus, assume U ≠ V. First, consider the case that V/U is A_{V/U}-reducible, where A_{V/U} is as in Prop. 9.3. Then there exist A-invariant subspaces {0} ⊊ V_1, V_2 ⊊ V such that V/U = (V_1/U) ⊕ (V_2/U), i.e. V_1 + V_2 + U = V and V_1 ∩ V_2 ⊆ U (since v ∈ V_1 ∩ V_2 implies v + U ∈ (V_1/U) ∩ (V_2/U) = {U}, i.e. v ∈ U). Replacing V_1, V_2 by V_1 + U, V_2 + U, respectively, we may also assume V_1 ∩ V_2 = U. As U ⊆ V_1, V_2 and dim V_1, dim V_2 < n, we can use the induction hypothesis to obtain A-invariant subspaces W_1, W_2 of V_1, V_2, respectively, such that V_1 = U ⊕ W_1, V_2 = U ⊕ W_2. Let W := W_1 ⊕ W_2 (the sum is direct, as w ∈ W_1 ∩ W_2 implies w ∈ V_1 ∩ V_2 = U, i.e. w = 0). If v ∈ V, then V = V_1 + V_2 implies the existence of u_1, u_2 ∈ U and w_1 ∈ W_1, w_2 ∈ W_2 such that v = u_1 + w_1 + u_2 + w_2, showing V = U + W. Using V/U = (V_1/U) ⊕ (V_2/U) as well as W_1 ≅ V_1/U, W_2 ≅ V_2/U by [Phi19, Th. 6.16(b)], we compute

    dim V − dim U + dim(U ∩ W) = dim W = dim W_1 + dim W_2 = dim(V_1/U) + dim(V_2/U)
                               = dim(V/U) = dim V − dim U,

showing U ∩ W = {0} and V = U ⊕ W. It remains to consider the case that V/U is A_{V/U}-irreducible. As dim(V/U) < n, the induction hypothesis applies to V/U, implying V/U to be A_{V/U}-cyclic, i.e. there exists v ∈ V with V/U = 〈(A_{V/U})^i (v + U) : i ∈ N_0〉. Since µ := µ_{A_{V/U}} | µ_A by Prop. 9.3(c), we have µ = g^s with 1 ≤ s ≤ r. Then

    ε_A(g^{r−s}) ε_A(g^s) v = ε_A(µ_A) v = 0,

showing

    ε_A(g^s) v = ε_A(µ) v ∈ U ∩ ker ε_A(g^{r−s}).    (9.4)


Since µ | µ_A = µ_{A_U}, we can apply Lem. 9.5(a)(ii) to obtain

    U ∩ ker ε_A(g^{r−s}) = ker ε_{A_U}(g^{r−s}) = Im ε_{A_U}(g^s),

which, when combined with (9.4), yields some u_v ∈ U with

    ε_A(g^s) u_v = ε_A(g^s) v.    (9.5)

This, finally, allows us to define

    w_0 := v − u_v,    W := 〈A^i w_0 : i ∈ N_0〉.

Clearly, W is A-invariant. If x ∈ V, then x + U ∈ V/U and, as V/U = 〈(A_{V/U})^i (v + U) : i ∈ N_0〉, there exist u ∈ U and λ_0, ..., λ_n ∈ F such that

    x = u + Σ_{i=0}^n λ_i A^i v = u + Σ_{i=0}^n λ_i A^i (w_0 + u_v) = u + Σ_{i=0}^n λ_i A^i u_v + Σ_{i=0}^n λ_i A^i w_0,

proving V = U + W. Since

    ε_A(g^s) w_0 = ε_A(g^s) v − ε_A(g^s) u_v = 0

by (9.5), we have dim W = deg µ_{A_W} ≤ deg(g^s) = deg µ = dim(V/U), implying (as V = U + W) dim W = dim(V/U) and V = U ⊕ W.

Theorem 9.7. Let V be a vector space over the field F, dim V = n ∈ N, A ∈ L(V, V).

(a) If V is A-irreducible, then V is A-cyclic.

(b) Suppose µ_A = g^r, where g ∈ F[X] is irreducible and r ∈ N. Then V is A-irreducible if, and only if, V is A-cyclic.

(c) There exist subspaces U_1, ..., U_l of V, l ∈ N, such that

    V = ⊕_{i=1}^l U_i

and each U_i is both A-irreducible and A-cyclic.

Proof. (a): We must have µ_A = g^r with r ∈ N and g ∈ F[X] irreducible, since, otherwise, V is A-reducible by Th. 8.13. In consequence, V is A-cyclic by Lem. 9.6.

(b): According to (a), it only remains to show that V being A-cyclic implies V to be A-irreducible. Thus, let V be A-cyclic and V = V_1 ⊕ V_2 with A-invariant subspaces V_1, V_2 ⊆ V. Then, by Prop. 9.3(c), there exist 1 ≤ r_1, r_2 ≤ r such that µ_{A_{V_1}} = g^{r_1} and µ_{A_{V_2}} = g^{r_2}, where we choose V_1 such that r_2 ≤ r_1. Then ε_A(µ_{A_{V_1}})(V_1) = ε_A(µ_{A_{V_1}})(V_2) = 0, showing µ_A = µ_{A_{V_1}}. As χ_A = µ_A by Prop. 9.2, we have dim V_1 ≥ deg µ_{A_{V_1}} = deg χ_A = dim V, showing V_1 = V as desired.


(c): The proof is conducted via induction on n ∈ N. If V is A-irreducible, then, by (a), V is also A-cyclic and the statement holds (in particular, this yields the base case n = 1). If V is A-reducible, then there exist A-invariant subspaces V_1, V_2 of V such that V = V_1 ⊕ V_2 with dim V_1, dim V_2 < n. Thus, by induction hypothesis, both V_1 and V_2 can be written as a direct sum of subspaces that are both A-irreducible and A-cyclic, proving the same for V.

We now have all preparations in place to prove the existence of normal forms having matrices with block diagonal form, where the blocks all look like the matrix of (9.1). However, before we state and prove the corresponding theorem, we still provide a proposition that will help to address the uniqueness of such normal forms:

Proposition 9.8. Let V be a vector space over the field F, dim V = n ∈ N, A ∈ L(V, V). Moreover, suppose we have a decomposition

    V = ⊕_{i=1}^l U_i,    l ∈ N,    (9.6)

where the U_1, ..., U_l all are A-invariant and A-irreducible (and, thus, A-cyclic by Th. 9.7(a)) subspaces of V.

(a) If µ_A = g_1^{r_1} ⋯ g_m^{r_m} is the prime factorization of µ_A (i.e. each g_i ∈ F[X] is irreducible, r_1, ..., r_m ∈ N, m ∈ N), then

    V = ⊕_{i=1}^m ker ε_A(g_i^{r_i})    (9.7)

and the decomposition of (9.6) is a refinement of (9.7) in the sense that each U_i is contained in some ker ε_A(g_j^{r_j}).

(b) If µ_A = g^r with g ∈ F[X] irreducible, r ∈ N, then, for each i ∈ {1, ..., l}, we have µ_{A_{U_i}} = g^{r_i}, dim U_i = r_i deg g, with 1 ≤ r_i ≤ r. If

    ∀ k ∈ N_0:    l_k := #{ i ∈ {1, ..., l} : µ_{A_{U_i}} = g^k }

(i.e. l_k is the number of summands U_i with µ_{A_{U_i}} = g^k), then

    l = Σ_{k=1}^r l_k,    dim V = (deg g) Σ_{k=1}^r k l_k,    (9.8)

and

    ∀ s ∈ {0, ..., r}:    dim Im ε_A(g^s) = (deg g) Σ_{k=s}^r l_k (k − s).    (9.9)

In consequence, the numbers l_k are uniquely determined by A.


Proof. (a): That (9.7) holds is an immediate consequence of Th. 8.13. Moreover, if U is an A-invariant and A-irreducible subspace of V, then µ_{A_U} | µ_A and Th. 8.13 imply µ_{A_U} = g_j^s for some j ∈ {1, ..., m} and 1 ≤ s ≤ r_j. In consequence, U ⊆ ker ε_A(g_j^{r_j}) as claimed.

(b): The proof of (a) already showed that, for each i ∈ {1, ..., l}, µ_{A_{U_i}} = g^{r_i}, with 1 ≤ r_i ≤ r, and then dim U_i = r_i deg g by Prop. 9.2. From the definitions of r and l_k, it is immediate that l = Σ_{k=1}^r l_k. As the sum in (9.6) is direct, we obtain

    dim V = Σ_{i=1}^l dim U_i = Σ_{k=1}^r l_k k deg g = (deg g) Σ_{k=1}^r k l_k.

To prove (9.9), fix s ∈ {0, ..., r} and set h := g^s. For each v ∈ V, there exist u_1, ..., u_l ∈ V such that u_i ∈ U_i and v = Σ_{i=1}^l u_i. Then

    ε_A(h) v = Σ_{i=1}^l ε_A(h) u_i,

implying

    ker ε_A(h) = ⊕_{i=1}^l ker ε_{A_{U_i}}(h),    (9.10)

due to the fact that ε_A(h) v = 0 if, and only if, ε_A(h) u_i = 0 for each i ∈ {1, ..., l}. In the case where µ_{A_{U_i}} = g^{r_i} with r_i ≤ s, we have ker ε_{A_{U_i}}(h) = U_i, i.e.

    dim ker ε_{A_{U_i}}(h) = dim U_i = r_i deg g,    (9.11a)

while, in the case where µ_{A_{U_i}} = g^{r_i} with s < r_i, we can apply Lem. 9.5(a)(i) to obtain

    dim ker ε_{A_{U_i}}(h) = deg(g^s) = s deg g.    (9.11b)

Putting everything together yields

    dim Im ε_A(g^s) = dim Im ε_A(h) = dim V − dim ker ε_A(h)
                    = (deg g) Σ_{k=1}^r k l_k − Σ_{i=1}^l dim ker ε_{A_{U_i}}(h)    (by (9.8), (9.10))
                    = (deg g) Σ_{k=s}^r l_k (k − s)    (by (9.11)),

proving (9.9). To see that the l_k are uniquely determined by A, observe l_k = 0 for k > r and k = 0, and, for 1 ≤ k ≤ r, (9.9) implies the recursion

    l_r = (deg g)^{−1} dim Im ε_A(g^{r−1}),    (9.12a)

    ∀ s ∈ {1, ..., r−1}:    l_s = (deg g)^{−1} dim Im ε_A(g^{s−1}) − Σ_{k=s+1}^r l_k (k − (s−1)),    (9.12b)

thereby completing the proof.
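The recursion (9.12) is straightforward to mechanize. The following Python sketch (the helper name `block_counts` is ours, not from the notes) recovers l_1, ..., l_r from the values dim Im ε_A(g^s), s = 0, ..., r, and deg g:

```python
def block_counts(dim_im, deg_g):
    """Compute l_1, ..., l_r via the recursion (9.12).

    dim_im[s] = dim Im eps_A(g^s) for s = 0, ..., r (so dim_im[0] = dim V
    and dim_im[r] = 0); deg_g = deg g.  Returns [l_1, ..., l_r], where l_k
    is the number of cyclic summands U_i with minimal polynomial g^k.
    """
    r = len(dim_im) - 1
    l = [0] * (r + 1)                      # l[k] = l_k; l[0] stays 0
    l[r] = dim_im[r - 1] // deg_g          # (9.12a)
    for s in range(r - 1, 0, -1):          # (9.12b)
        l[s] = (dim_im[s - 1] // deg_g
                - sum(l[k] * (k - (s - 1)) for k in range(s + 1, r + 1)))
    return l[1:]
```

For instance, for the matrix N_1 of Ex. 9.14(b) below, one has deg g = 1 and dim Im(N_1 − 2 Id_8)^s = 8, 4, 2, 0 for s = 0, ..., 3, and the recursion yields (l_1, l_2, l_3) = (2, 0, 2).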


Theorem 9.9 (General Normal Form). Let V be a vector space over the field F, dim V = n ∈ N, A ∈ L(V, V).

(a) There exist subspaces U_1, ..., U_l of V, l ∈ N, such that each U_i is A-cyclic and A-irreducible, satisfying

    V = ⊕_{i=1}^l U_i.

Moreover, for each i ∈ {1, ..., l}, there exists v_i ∈ U_i such that

    B_i := {v_i, A v_i, ..., A^{r_i−1} v_i},    r_i := dim U_i,

is a basis of U_i. Then µ_{A_{U_i}} = Σ_{k=0}^{r_i} a_k^{(i)} X^k with a_0^{(i)}, ..., a_{r_i}^{(i)} ∈ F, A_{U_i} := A|_{U_i}, and, with respect to the ordered basis

    B := (v_1, ..., A^{r_1−1} v_1, ..., v_l, ..., A^{r_l−1} v_l),

A has the block diagonal matrix

    M := [ M_1  0  ...  0
            0  M_2 ...  0
            ...
            0   0  ... M_l ],

each block having the form of (9.1), namely, for each i ∈ {1, ..., l},

    M_i = [ 0 0 ⋯ 0 −a_0^{(i)}
            1 0 ⋯ 0 −a_1^{(i)}
            0 1 ⋯ 0 −a_2^{(i)}
            ⋮   ⋱   ⋮    ⋮
            0 0 ⋯ 1 −a_{r_i−1}^{(i)} ].

(b) If

    V = ⊕_{i=1}^m W_i

is another decomposition of V into A-invariant and A-irreducible subspaces W_1, ..., W_m of V, m ∈ N, then m = l and there exist a permutation π ∈ S_l and T ∈ GL(V) such that

    TA = AT    and    T(U_i) = W_{π(i)} for each i ∈ {1, ..., l}.    (9.13)


Proof. (a): The existence of the claimed decomposition was already shown in Th. 9.7(c) and the remaining statements are then provided by Prop. 9.2, where the A-invariance of the U_i yields the block diagonal structure of M.

(b): We divide the proof into three steps:

Step 1: Assume U and W to be A-cyclic subspaces of V such that 1 ≤ s := dim U = dim W ≤ n and let u ∈ U, w ∈ W be such that B_U := {u, ..., A^{s−1} u}, B_W := {w, ..., A^{s−1} w} are bases of U, W, respectively. Define S ∈ L(U, W) by letting

    S(A^i u) := A^i w    for each i ∈ {0, ..., s−1}

(then S is invertible, as it maps the basis B_U onto the basis B_W). We verify SA = AS: Indeed,

    SA(A^i u) = S(A^{i+1} u) = A^{i+1} w = A(A^i w) = AS(A^i u)    for each i ∈ {0, ..., s−2}

and, letting a_0, ..., a_s ∈ F be such that µ_{A_U} = µ_{A_W} = Σ_{i=0}^s a_i X^i (cf. Prop. 9.2),

    SA(A^{s−1} u) = S(A^s u) = S(−Σ_{i=0}^{s−1} a_i A^i u) = −Σ_{i=0}^{s−1} a_i A^i w
                  = A^s w = A(A^{s−1} w) = AS(A^{s−1} u).

Step 2: Assume µ_A = g^s, where g ∈ F[X] is irreducible and s ∈ N. Letting

    ∀ k ∈ N_0:    I(U, k) := { i ∈ {1, ..., l} : µ_{A_{U_i}} = g^k },
                  I(W, k) := { i ∈ {1, ..., m} : µ_{A_{W_i}} = g^k },

the uniqueness of the numbers l_k in Prop. 9.8(b) shows

    ∀ k ∈ N_0:    l_k = #I(U, k) = #I(W, k)

and, in particular, m = l, as well as the existence of π ∈ S_l such that, for each k ∈ N and each i ∈ I(U, k), one has π(i) ∈ I(W, k). Thus, by Step 1, for each i ∈ {1, ..., l}, there exists an invertible S_i ∈ L(U_i, W_{π(i)}) such that S_i A = A S_i. Define T ∈ GL(V) by letting, for each v ∈ V such that v = Σ_{i=1}^l u_i with u_i ∈ U_i, T v := Σ_{i=1}^l S_i u_i. Then, clearly, T satisfies (9.13).

Step 3: We now consider the general situation of (b). Let µ_A = g_1^{r_1} ⋯ g_s^{r_s} be the prime factorization of µ_A (i.e. each g_i ∈ F[X] is irreducible, r_1, ..., r_s ∈ N, s ∈ N). According to Prop. 9.8(a), there exist sets I_1, ..., I_s ⊆ {1, ..., l} and J_1, ..., J_s ⊆ {1, ..., m} such that

    ker ε_A(g_k^{r_k}) = ⊕_{i ∈ I_k} U_i = ⊕_{i ∈ J_k} W_i    for each k ∈ {1, ..., s}.

Then, by Step 2, we have #I_k = #J_k for each k ∈ {1, ..., s}, implying

    l = Σ_{k=1}^s #I_k = Σ_{k=1}^s #J_k = m


and the existence of a permutation π ∈ S_l such that π(I_k) = J_k for each k ∈ {1, ..., s}. Again using Step 2, we can now, in addition, choose π ∈ S_l such that, for each i ∈ {1, ..., l}, there exists an invertible T_i ∈ L(U_i, W_{π(i)}) with T_i A = A T_i. Define T ∈ GL(V) by letting, for each v ∈ V such that v = Σ_{i=1}^l u_i with u_i ∈ U_i, T v := Σ_{i=1}^l T_i u_i. Then, clearly, T satisfies (9.13).

The following Prop. 9.10 can sometimes be helpful in actually finding the U_i of Th. 9.9(a):

Proposition 9.10. Let V be a vector space over the field F, dim V = n ∈ N, A ∈ L(V, V), and let µ_A = g_1^{r_1} ⋯ g_m^{r_m} be the prime factorization of µ_A (i.e. each g_i ∈ F[X] is irreducible, r_1, ..., r_m ∈ N, m ∈ N). For each i ∈ {1, ..., m}, let V_i := ker ε_A(g_i^{r_i}).

(a) For each i ∈ {1, ..., m}, we have µ_{A_{V_i}} = g_i^{r_i} (recall V = ⊕_{i=1}^m V_i).

(b) For each i ∈ {1, ..., m}, in the decomposition V = ⊕_{k=1}^l U_k of Th. 9.9(a), there exists at least one U_k with U_k ⊆ V_i and dim U_k = deg(g_i^{r_i}).

(c) As in (b), let V = ⊕_{k=1}^l U_k be the decomposition of Th. 9.9(a). Then, for each i ∈ {1, ..., m} and each k ∈ {1, ..., l} such that U_k ⊆ V_i, one has dim U_k ≤ deg(g_i^{r_i}).

Proof. (a): Let i ∈ {1, ..., m}. According to Prop. 9.3(c), we have µ_{A_{V_i}} = g_i^s with 1 ≤ s ≤ r_i. For each v ∈ V, there are v_1, ..., v_m ∈ V such that v = Σ_{i=1}^m v_i and v_i ∈ V_i, implying

    ε_A(g_1^{r_1} ⋯ g_{i−1}^{r_{i−1}} g_i^s g_{i+1}^{r_{i+1}} ⋯ g_m^{r_m}) v = 0

and r_i ≤ s, i.e. r_i = s.

(b): Let i ∈ {1, ..., m}. According to (a), we have µ_{A_{V_i}} = g_i^{r_i}. Using the uniqueness of the decomposition V = ⊕_{k=1}^l U_k in the sense of Th. 9.9(b) together with Lem. 9.6, there exists U_k with U_k ⊆ V_i such that U_k is an A-cyclic subspace of V_i of maximal dimension. Then Lem. 9.6 also yields µ_{A_{U_k}} = µ_{A_{V_i}} = g_i^{r_i}, which, as U_k is A-cyclic, implies dim U_k = deg(g_i^{r_i}).

(c): Let i ∈ {1, ..., m}. Again, we know µ_{A_{V_i}} = g_i^{r_i} by (a). If k ∈ {1, ..., l} is such that U_k ⊆ V_i, then Prop. 9.3(c) yields µ_{A_{U_k}} | µ_{A_{V_i}} = g_i^{r_i}, showing dim U_k ≤ deg(g_i^{r_i}).

Remark 9.11. In general, in the situation of Prop. 9.10, the knowledge of µ_A and χ_A does not suffice to uniquely determine the normal form of Th. 9.9(a). In general, for each g_i and each s ∈ {1, ..., r_i}, one needs to determine dim Im ε_A(g_i^s), which then determines the numbers l_k of (9.9), i.e. the number of subspaces U_j with µ_{A_{U_j}} = g_i^k. This then determines the matrix M of Th. 9.9(a) (up to the order of the diagonal blocks), since one obtains precisely l_k many blocks of size k deg g_i and the entries of these blocks are given by the coefficients of g_i^k.


Example 9.12. (a) Let F be a field and V := F^6. Assume A ∈ L(V, V) has

    χ_A = (X − 2)^2 (X − 3)^4,    µ_A = (X − 2)^2 (X − 3)^3.

We want to determine the decomposition V = ⊕_{i=1}^l U_i of Th. 9.9(a) and the matrix M with respect to the corresponding basis of V given in Th. 9.9(a): We know from Prop. 9.10(b) that we can choose U_1 ⊆ ker(A − 2 Id)^2 with dim U_1 = 2 and U_2 ⊆ ker(A − 3 Id)^3 with dim U_2 = 3. As dim V = 6, this then yields V = U_1 ⊕ U_2 ⊕ U_3 with dim U_3 = 1. We also know σ(A) = {2, 3} and, according to Th. 8.4, the algebraic multiplicities are m_a(2) = 2, m_a(3) = 4. Moreover,

    4 = m_a(3) = dim ker(A − 3 Id)^4 = dim ker ε_A((X − 3)^4),

implying U_3 ⊆ ker(A − 3 Id)^4. As (X − 2)^2 = X^2 − 4X + 4 and (X − 3)^3 = X^3 − 9X^2 + 27X − 27, M has the form

    M = [ 0 −4  0  0   0  0
          1  4  0  0   0  0
          0  0  0  0  27  0
          0  0  1  0 −27  0
          0  0  0  1   9  0
          0  0  0  0   0  3 ].
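Since each block of M is the companion matrix of the corresponding power g_i^k, it can be written down mechanically from the polynomial's coefficients. A small Python sketch (the helper name `companion` is ours, not from the notes):

```python
def companion(coeffs):
    """Companion matrix, in the convention of (9.1)/Th. 9.9(a), of the monic
    polynomial a_0 + a_1 X + ... + a_{r-1} X^{r-1} + X^r, given as
    coeffs = [a_0, ..., a_{r-1}, 1]: ones on the subdiagonal, last column
    -a_0, ..., -a_{r-1}."""
    r = len(coeffs) - 1
    M = [[0] * r for _ in range(r)]
    for i in range(1, r):
        M[i][i - 1] = 1
    for i in range(r):
        M[i][r - 1] = -coeffs[i]
    return M

print(companion([4, -4, 1]))        # block for (X-2)^2: [[0, -4], [1, 4]]
print(companion([-27, 27, -9, 1]))  # block for (X-3)^3
```

Assembling the returned blocks along the diagonal (plus the 1x1 block (3)) reproduces the matrix M above.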

(b) Let F := Z_2 = {0, 1} and V := F^3. Assume, with respect to some basis of V, A ∈ L(V, V) has the matrix

    N := [ 1 1 1
           1 0 1
           1 0 0 ].

We compute (using 1 = −1 and 0 = 2 in F)

    χ_A = det(X Id_3 − N) = det [ X−1  −1  −1
                                   −1   X  −1
                                   −1   0   X ]
        = X^2 (X − 1) − 1 − X − X = X^3 − X^2 − 2X − 1 = X^3 + X^2 + 1.

Since χ_A(0) = 1 and χ_A(1) = 1, χ_A has no zero in F and, being of degree 3, χ_A is irreducible. Thus χ_A = µ_A, V is A-irreducible and A-cyclic, and the matrix of A with respect to the basis of Th. 9.9(a) is (again making use of −1 = 1 in F)

    M = [ 0 0 1
          1 0 0
          0 1 1 ].
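As a sanity check on this computation, one can verify numerically that N satisfies its characteristic polynomial over Z_2, i.e. that N^3 + N^2 + Id_3 = 0 (mod 2); a minimal sketch (the helper name `matmul` is ours):

```python
def matmul(A, B, p=2):
    """Matrix product over Z_p (entries reduced mod p)."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]

N = [[1, 1, 1], [1, 0, 1], [1, 0, 0]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
N2 = matmul(N, N)
N3 = matmul(N2, N)
CH = [[(N3[i][j] + N2[i][j] + I[i][j]) % 2 for j in range(3)] for i in range(3)]
print(CH)  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]], confirming chi_A(N) = 0 in Z_2
```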


For fields that are algebraically closed, we can improve the normal form of Th. 9.9 to the so-called Jordan normal form:

Theorem 9.13 (Jordan Normal Form). Let V be a vector space over the field F, dim V = n ∈ N, A ∈ L(V, V). Assume there exist distinct λ_1, ..., λ_m ∈ F such that

    µ_A = Π_{i=1}^m (X − λ_i)^{r_i},    σ(A) = {λ_1, ..., λ_m},    m, r_1, ..., r_m ∈ N    (9.14)

(if F is algebraically closed, then (9.14) always holds).

(a) There exist subspaces U_1, ..., U_l of V, l ∈ N, such that each U_k is A-cyclic and A-irreducible, satisfying

    V = ⊕_{k=1}^l U_k.

Moreover, for each k ∈ {1, ..., l}, there exist v_k ∈ U_k and i = i(k) ∈ {1, ..., m} such that

    U_k ⊆ ker(A − λ_i Id)^{r_i}

and

    J_k := { v_k, (A − λ_i Id) v_k, ..., (A − λ_i Id)^{s_k−1} v_k },
    s_k := dim U_k ≤ r_i ≤ dim ker(A − λ_i Id)^{r_i},

is a basis of U_k (note that, in general, the same i will correspond to many distinct subspaces U_k). Then, with respect to the ordered basis

    J := ( (A − λ_{i(1)} Id)^{s_1−1} v_1, ..., v_1, ..., (A − λ_{i(l)} Id)^{s_l−1} v_l, ..., v_l ),

A has a matrix in Jordan normal form, i.e. the block diagonal matrix

    N := [ N_1  0  ...  0
            0  N_2 ...  0
            ...
            0   0  ... N_l ],

each block (called a Jordan block) having the form

    N_k = (λ_{i(k)}) ∈ M(1, F)    for s_k = 1,

    N_k = [ λ_{i(k)}  1
                     λ_{i(k)}  1
                              ⋱  ⋱
                                 λ_{i(k)}  1
                                          λ_{i(k)} ]  ∈ M(s_k, F)    for s_k > 1

(all omitted entries being 0).


(b) In the situation of (a) and recalling from Def. 6.14 that, for each λ ∈ σ(A), r(λ) is such that m_a(λ) = dim ker(A − λ Id)^{r(λ)} and that, for each s ∈ {1, ..., r(λ)},

    E_A^s(λ) = ker(A − λ Id)^s

is called the corresponding generalized eigenspace of rank s of A, while each v ∈ E_A^s(λ) \ E_A^{s−1}(λ), s ≥ 2, is called a generalized eigenvector of rank s, we obtain

    r_i = r(λ_i)    for each i ∈ {1, ..., m},
    U_k ⊆ E_A^{s_k}(λ_{i(k)}) ⊆ E_A^{r_{i(k)}}(λ_{i(k)})    for each k ∈ {1, ..., l}.

Moreover, for each k ∈ {1, ..., l}, v_k is a generalized eigenvector of rank s_k and the basis J_k consists of generalized eigenvectors, containing precisely one generalized eigenvector of rank s for each s ∈ {1, ..., s_k}. Define

    ∀ i ∈ {1, ..., m} ∀ s ∈ N:
    l(i, s) := #{ k ∈ {1, ..., l} : U_k ⊆ ker(A − λ_i Id)^{r_i} ∧ dim U_k = s }

(thus, l(i, s) is the number of Jordan blocks of size s corresponding to the eigenvalue λ_i; apart from the slightly different notation used here, the l(i, s) are precisely the numbers called l_k in Prop. 9.8(b)). Then, for each i ∈ {1, ..., m},

    l(i, r_i) = dim ker(A − λ_i Id)^{r_i} − dim ker(A − λ_i Id)^{r_i−1},    (9.15a)

    ∀ s ∈ {1, ..., r_i − 1}:
    l(i, s) = dim ker(A − λ_i Id)^{r_i} − dim ker(A − λ_i Id)^{s−1}
              − Σ_{j=s+1}^{r_i} l(i, j) (j − (s−1)).    (9.15b)

Thus, in general, one needs to determine, for each i ∈ {1, ..., m} and each s ∈ {1, ..., r_i}, dim ker(A − λ_i Id)^s to know the precise structure of N.

(c) For the sake of completeness and convenience, we restate Th. 9.9(b): If

    V = ⊕_{i=1}^L W_i

is another decomposition of V into A-invariant and A-irreducible subspaces W_1, ..., W_L of V, L ∈ N, then L = l and there exist a permutation π ∈ S_l and T ∈ GL(V) such that

    TA = AT    and    T(U_i) = W_{π(i)} for each i ∈ {1, ..., l}.

Proof. (a),(b): The A-cyclic and A-irreducible subspaces U_1, ..., U_l are given by Th. 9.9(a). As in Th. 9.9(a), for each k ∈ {1, ..., l}, let v_k ∈ U_k be such that

    B_k := {v_k, A v_k, ..., A^{s_k−1} v_k},    s_k := dim U_k,


is a basis of U_k. By Prop. 9.8(a), there exists i := i(k) ∈ {1, ..., m} such that U_k ⊆ ker(A − λ_i Id)^{r_i}. For s_k = 1, we have J_k = B_k = {v_k}. For s_k > 1, letting, for each j ∈ {0, ..., s_k − 1}, w_j := (A − λ_i Id)^j v_k, we obtain

    A w_j = A(A − λ_i Id)^j v_k = (A − λ_i Id + λ_i Id)(A − λ_i Id)^j v_k
          = w_{j+1} + λ_i w_j    for each j ∈ {0, ..., s_k − 2}    (9.16a)

and, using (A − λ_i Id)^{s_k} v_k = 0 due to ε_{A_{U_k}}(µ_{A_{U_k}}) = 0,

    A w_{s_k−1} = A(A − λ_i Id)^{s_k−1} v_k = (A − λ_i Id)^{s_k} v_k + λ_i w_{s_k−1} = λ_i w_{s_k−1}.    (9.16b)

Thus, with respect to the ordered basis (w_{s_k−1}, ..., w_0), A_{U_k} has the matrix N_k. As each U_k is A-invariant, this also proves N to be the matrix of A with respect to J. Proposition 9.10(a),(c) yields s_k ≤ r_i ≤ dim ker(A − λ_i Id)^{r_i}. Moreover, (9.16) shows that J_k contains precisely one generalized eigenvector of rank s for each s ∈ {1, ..., s_k}. Finally, (9.15), i.e. the formulas for the l(i, s), are given by the recursion (9.12) (which was inferred from (9.9)), using that, in the current situation, g = X − λ_i, deg g = 1, and (with V_i := ker(A − λ_i Id)^{r_i})

    dim Im ε_{A_{V_i}}(g^s) = dim ker(A − λ_i Id)^{r_i} − dim ker(A − λ_i Id)^s    for each s ∈ N_0.

(c) was already proved, as it is merely a restatement of Th. 9.9(b).

Example 9.14. (a) In Ex. 9.12(a), we considered V := F^6 (F some field) and A ∈ L(V, V) with

    χ_A = (X − 2)^2 (X − 3)^4,    µ_A = (X − 2)^2 (X − 3)^3.

We obtained V = ⊕_{i=1}^3 U_i with U_1 ⊆ ker(A − 2 Id)^2, dim U_1 = 2, and U_2, U_3 ⊆ ker(A − 3 Id)^3, dim U_2 = 3, dim U_3 = 1. Thus, the corresponding matrix in Jordan normal form is

    N = [ 2 1 0 0 0 0
          0 2 0 0 0 0
          0 0 3 1 0 0
          0 0 0 3 1 0
          0 0 0 0 3 0
          0 0 0 0 0 3 ].

(b) Consider the matrices

    N_1 := [ 2 1 0 0 0 0 0 0
             0 2 1 0 0 0 0 0
             0 0 2 0 0 0 0 0
             0 0 0 2 1 0 0 0
             0 0 0 0 2 1 0 0
             0 0 0 0 0 2 0 0
             0 0 0 0 0 0 2 0
             0 0 0 0 0 0 0 2 ],

    N_2 := [ 2 1 0 0 0 0 0 0
             0 2 1 0 0 0 0 0
             0 0 2 0 0 0 0 0
             0 0 0 2 1 0 0 0
             0 0 0 0 2 0 0 0
             0 0 0 0 0 2 1 0
             0 0 0 0 0 0 2 0
             0 0 0 0 0 0 0 2 ]


over some field F. Both matrices are in Jordan normal form with

    χ_{N_1} = χ_{N_2} = (X − 2)^8,    µ_{N_1} = µ_{N_2} = (X − 2)^3.

Both have the same total number of Jordan blocks, namely 4, which corresponds to

    dim ker(N_1 − 2 Id_8) = dim ker(N_2 − 2 Id_8) = 4.

The differences appear in the generalized eigenspaces of higher rank: N_1 has two linearly independent generalized eigenvectors of rank 2, whereas N_2 has three linearly independent generalized eigenvectors of rank 2, yielding

    dim ker(N_1 − 2 Id_8)^2 − dim ker(N_1 − 2 Id_8) = 2,    i.e. dim ker(N_1 − 2 Id_8)^2 = 6,
    dim ker(N_2 − 2 Id_8)^2 − dim ker(N_2 − 2 Id_8) = 3,    i.e. dim ker(N_2 − 2 Id_8)^2 = 7.

Next, N_1 has two linearly independent generalized eigenvectors of rank 3, whereas N_2 has one linearly independent generalized eigenvector of rank 3, yielding

    dim ker(N_1 − 2 Id_8)^3 − dim ker(N_1 − 2 Id_8)^2 = 2,    i.e. dim ker(N_1 − 2 Id_8)^3 = 8,
    dim ker(N_2 − 2 Id_8)^3 − dim ker(N_2 − 2 Id_8)^2 = 1,    i.e. dim ker(N_2 − 2 Id_8)^3 = 8.

From (9.15a), we obtain (with i = 1)

    l_{N_1}(1, 3) = 2,    l_{N_2}(1, 3) = 1,

corresponding to N_1 having two blocks of size 3 and N_2 having one block of size 3. From (9.15b), we obtain (with i = 1)

    l_{N_1}(1, 2) = 8 − 4 − 2(3 − 1) = 0,    l_{N_2}(1, 2) = 8 − 4 − 1(3 − 1) = 2,

corresponding to N_1 having no blocks of size 2 and N_2 having two blocks of size 2. To check consistency, we use (9.15b) again to obtain

    l_{N_1}(1, 1) = 8 − 0 − 0(2 − 0) − 2(3 − 0) = 2,    l_{N_2}(1, 1) = 8 − 0 − 2(2 − 0) − 1(3 − 0) = 1,

corresponding to N_1 having two blocks of size 1 and N_2 having one block of size 1.
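The kernel dimensions used above are easy to verify by machine: the sketch below (all helper names ours, not from the notes) assembles N_1 and N_2 from their Jordan block sizes and computes dim ker(N_j − 2 Id_8)^s, s = 1, 2, 3, by exact Gaussian elimination over the rationals:

```python
from fractions import Fraction

def jordan_blocks(lam, sizes):
    """Block diagonal matrix with Jordan blocks J_s(lam), s in sizes."""
    n = sum(sizes)
    M = [[0] * n for _ in range(n)]
    pos = 0
    for s in sizes:
        for j in range(s):
            M[pos + j][pos + j] = lam
            if j + 1 < s:
                M[pos + j][pos + j + 1] = 1
        pos += s
    return M

def rank(M):
    """Rank via Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols, r = len(A), len(A[0]), 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kernel_dims(M, lam, r):
    """[dim ker (M - lam Id)^s for s = 1, ..., r]."""
    n = len(M)
    B = [[M[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    P, dims = B, []
    for _ in range(r):
        dims.append(n - rank(P))
        P = matmul(P, B)
    return dims

N1 = jordan_blocks(2, [3, 3, 1, 1])
N2 = jordan_blocks(2, [3, 2, 2, 1])
print(kernel_dims(N1, 2, 3))  # [4, 6, 8]
print(kernel_dims(N2, 2, 3))  # [4, 7, 8]
```

Feeding these dimension sequences into the recursion (9.15) reproduces the block counts computed above.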

Remark 9.15. In the situation of Th. 9.13, we saw that, in order to find a matrix N in Jordan normal form for A, according to Th. 9.13(b), in general, for each i ∈ {1, ..., m} and each s ∈ {1, ..., r_i}, one has to know dim ker(A − λ_i Id)^s. On the other hand, given a matrix M of A, one might also want to find the transition matrix T ∈ GL_n(F) such that M = T N T^{−1}. As it turns out, if one has already determined generalized eigenvectors forming bases of the spaces ker(A − λ_i Id)^s, one may use these same vectors for the columns of T: Indeed, if M = T N T^{−1} and t_1, ..., t_n denote the columns of T, then, if j ∈ {1, ..., n} corresponds to a column of N with λ_i being the only nonzero entry, then

    M t_j = T N T^{−1} t_j = T N e_j = T λ_i e_j = λ_i t_j,

showing t_j to be a corresponding eigenvector. If j ∈ {1, ..., n} corresponds to a column of N having nonzero entries λ_i and 1 (above the λ_i), then

    (M − λ_i Id_n) t_j = (T N T^{−1} − λ_i Id_n) t_j = T N e_j − λ_i t_j = T(e_{j−1} + λ_i e_j) − λ_i t_j
                       = t_{j−1} + λ_i t_j − λ_i t_j = t_{j−1},

showing t_j to be a generalized eigenvector of rank ≥ 2 corresponding to the Jordan block containing the index j.


10 Vector Spaces with Scalar Products

In this section, the field F will always be R or C. As before, we write K if K may stand for R or C.

10.1 Definition, Orthogonality

Definition 10.1. Let X be a vector space over K. A function 〈·,·〉 : X × X → K is called an inner product or a scalar product on X if, and only if, the following three conditions are satisfied:

(i) 〈x, x〉 ∈ R^+ for each 0 ≠ x ∈ X (i.e. an inner product is positive definite).

(ii) 〈λx + µy, z〉 = λ〈x, z〉 + µ〈y, z〉 for each x, y, z ∈ X and each λ, µ ∈ K (i.e. an inner product is K-linear in its first argument).

(iii) 〈x, y〉 = \overline{〈y, x〉} for each x, y ∈ X (i.e. an inner product is conjugate-symmetric, even symmetric for K = R).

Lemma 10.2. For each inner product 〈·,·〉 on a vector space X over K, the following formulas are valid:

(a) 〈x, λy + µz〉 = \overline{λ}〈x, y〉 + \overline{µ}〈x, z〉 for each x, y, z ∈ X and each λ, µ ∈ K, i.e. 〈·,·〉 is conjugate-linear in its second argument, even linear for K = R. Together with Def. 10.1(ii), this means that 〈·,·〉 is a sesquilinear form, even a bilinear form for K = R.

(b) 〈0, x〉 = 〈x, 0〉 = 0 for each x ∈ X.

Proof. (a): One computes, for each x, y, z ∈ X and each λ, µ ∈ K,

    〈x, λy + µz〉 = \overline{〈λy + µz, x〉} = \overline{λ〈y, x〉 + µ〈z, x〉}
                 = \overline{λ} \overline{〈y, x〉} + \overline{µ} \overline{〈z, x〉} = \overline{λ}〈x, y〉 + \overline{µ}〈x, z〉,

where the first and the last equality use Def. 10.1(iii) and the second uses Def. 10.1(ii).

(b): One computes, for each x ∈ X,

    〈x, 0〉 = \overline{〈0, x〉} = \overline{〈0x, x〉} = \overline{0〈x, x〉} = 0

(using Def. 10.1(iii) for the first equality and Def. 10.1(ii) for the third), thereby completing the proof of the lemma.

Remark 10.3. If X is a vector space over K with an inner product 〈·,·〉, then the map

    ‖·‖ : X → R_0^+,    ‖x‖ := √〈x, x〉,

defines a norm on X (cf. [Phi16b, Prop. 1.65]). One calls this the norm induced by the inner product.


Definition 10.4. Let X be a vector space over K. If 〈·,·〉 is an inner product on X, then (X, 〈·,·〉) is called an inner product space or a pre-Hilbert space. An inner product space is called a Hilbert space if, and only if, (X, ‖·‖) is a Banach space, where ‖·‖ is the induced norm, i.e. ‖x‖ := √〈x, x〉. Frequently, the inner product on X is understood and X itself is referred to as an inner product space or Hilbert space.

Example 10.5. (a) On the space K^n, n ∈ N, we define an inner product by letting, for each z = (z_1, ..., z_n) ∈ K^n, w = (w_1, ..., w_n) ∈ K^n:

    z · w := Σ_{j=1}^n z_j \overline{w_j}    (10.1)

(called the standard inner product on K^n, also the Euclidean inner product for K = R). Let us verify that (10.1), indeed, defines an inner product in the sense of Def. 10.1: If z ≠ 0, then there is j_0 ∈ {1, ..., n} such that z_{j_0} ≠ 0. Thus, z · z = Σ_{j=1}^n |z_j|^2 ≥ |z_{j_0}|^2 > 0, i.e. Def. 10.1(i) is satisfied. Next, let z, w, u ∈ K^n and λ, µ ∈ K. One computes

    (λz + µw) · u = Σ_{j=1}^n (λ z_j + µ w_j) \overline{u_j} = Σ_{j=1}^n λ z_j \overline{u_j} + Σ_{j=1}^n µ w_j \overline{u_j} = λ(z · u) + µ(w · u),

i.e. Def. 10.1(ii) is satisfied. For Def. 10.1(iii), merely note that

    z · w = Σ_{j=1}^n z_j \overline{w_j} = \overline{Σ_{j=1}^n w_j \overline{z_j}} = \overline{w · z}.

Hence, we have shown that (10.1) defines an inner product according to Def. 10.1. Due to [Phi16b, Prop. 1.59(b)], the induced norm is complete, i.e. K^n with the inner product of (10.1) is a Hilbert space.

(b) Let a, b ∈ R, a < b. We define an inner product on the space X := C[a, b] of continuous K-valued functions on [a, b] by letting

    〈·,·〉 : X × X → K,    〈f, g〉 := ∫_a^b f \overline{g}.    (10.2)

We verify that (10.2), indeed, defines an inner product: If f ∈ X, f ≢ 0, then there exists t ∈ [a, b] such that |f(t)| = α ∈ R^+. As f is continuous, there exists δ > 0 such that

    |f(s)| > α/2    for each s ∈ [a, b] ∩ [t − δ, t + δ],

implying

    〈f, f〉 = ∫_a^b |f|^2 ≥ ∫_{[a,b] ∩ [t−δ, t+δ]} |f|^2 > 0.


Next, let f, g, h ∈ X and λ, µ ∈ K. One computes

    〈λf + µg, h〉 = ∫_a^b (λf + µg) \overline{h} = λ ∫_a^b f \overline{h} + µ ∫_a^b g \overline{h} = λ〈f, h〉 + µ〈g, h〉.

Moreover, for each f, g ∈ X,

    〈f, g〉 = ∫_a^b f \overline{g} = \overline{∫_a^b \overline{f} g} = \overline{〈g, f〉},

showing 〈·,·〉 to be an inner product on X. As it turns out, (X, 〈·,·〉) is an example of an inner product space that is not a Hilbert space: With respect to the norm induced by 〈·,·〉 (usually called the 2-norm, denoted ‖·‖_2), X is dense in L^2[a, b] (the space of square-integrable functions with respect to Lebesgue (or Borel) measure on [a, b], cf., e.g., [Phi17a, Th. 2.49(a)]), which is the completion of X with respect to ‖·‖_2.

Definition 10.6. Let (X, 〈·,·〉) be an inner product space over K.

(a) x, y ∈ X are called orthogonal or perpendicular (denoted x ⊥ y) if, and only if, 〈x, y〉 = 0.

(b) Let E ⊆ X. Define the perpendicular space E^⊥ to E (called E perp) by

    E^⊥ := { y ∈ X : 〈x, y〉 = 0 for each x ∈ E }.    (10.3)

Caveat: As is common, we use the same symbol to denote the perpendicular space that we used to denote the forward annihilator in Def. 2.13, even though these objects are not the same: The perpendicular space is a subset of X, whereas the forward annihilator is a subset of X′. In the following, when dealing with inner product spaces, E^⊥ will always mean the perpendicular space.

(c) If X = V_1 ⊕ V_2 with subspaces V_1, V_2 of X, then we call X the orthogonal sum of V_1, V_2 if, and only if, v_1 ⊥ v_2 for each v_1 ∈ V_1, v_2 ∈ V_2. In this case, we also write X = V_1 ⊥ V_2.

(d) Let S ⊆ X. Then S is an orthogonal system if, and only if, x ⊥ y for each x, y ∈ S with x ≠ y. A unit vector is x ∈ X such that ‖x‖ = 1 (with respect to the induced norm on X). Then S is called an orthonormal system if, and only if, S is an orthogonal system consisting entirely of unit vectors. Finally, S is called an orthonormal basis if, and only if, it is a maximal orthonormal system in the sense that, if S ⊆ T ⊆ X and T is an orthonormal system, then S = T (caveat: if X is an infinite-dimensional Hilbert space, then an orthonormal basis of X is not(!) a vector space basis of X).

Lemma 10.7. Let (X, 〈·,·〉) be an inner product space over K, E ⊆ X.

(a) E ∩ E^⊥ ⊆ {0}.


(b) E^⊥ is a subspace of X.

(c) X^⊥ = {0} and {0}^⊥ = X.

Proof. (a): If x ∈ E ∩ E^⊥, then 〈x, x〉 = 0, implying x = 0.

(b): We have 0 ∈ E^⊥ and

    〈x, λ y_1 + µ y_2〉 = \overline{λ}〈x, y_1〉 + \overline{µ}〈x, y_2〉 = 0    for each λ, µ ∈ K, y_1, y_2 ∈ E^⊥, x ∈ E,

showing λ y_1 + µ y_2 ∈ E^⊥, i.e. E^⊥ is a subspace of X.

(c): If x ∈ X^⊥, then 0 = 〈x, x〉, implying x = 0. On the other hand, 〈x, 0〉 = 0 for each x ∈ X by Lem. 10.2(b).

Proposition 10.8. Let (X, 〈·,·〉) be an inner product space over K and let S ⊆ X be an orthogonal system.

(a) S \ {0} is linearly independent.

(b) Pythagoras' Theorem: If s_1, ..., s_n ∈ S are distinct, n ∈ N, then

    ‖ Σ_{i=1}^n s_i ‖^2 = Σ_{i=1}^n ‖s_i‖^2,

where ‖·‖ is the induced norm on X.

Proof. (a): Suppose n ∈ N and λ_1, ..., λ_n ∈ K together with s_1, ..., s_n ∈ S \ {0} distinct are such that Σ_{i=1}^n λ_i s_i = 0. Then, as 〈s_k, s_j〉 = 0 for each k ≠ j, we obtain

    0 = 〈0, s_j〉 = 〈 Σ_{i=1}^n λ_i s_i, s_j 〉 = Σ_{i=1}^n λ_i 〈s_i, s_j〉 = λ_j 〈s_j, s_j〉    for each j ∈ {1, ..., n},

which yields λ_j = 0 by Def. 10.1(i). Thus, we have shown that λ_j = 0 for each j ∈ {1, ..., n}, which establishes the linear independence of S \ {0}.

(b): We compute

    ‖ Σ_{i=1}^n s_i ‖^2 = 〈 Σ_{i=1}^n s_i, Σ_{j=1}^n s_j 〉 = Σ_{i=1}^n Σ_{j=1}^n 〈s_i, s_j〉 = Σ_{i=1}^n 〈s_i, s_i〉 = Σ_{i=1}^n ‖s_i‖^2,

thereby establishing the case.

To obtain orthogonal systems and orthonormal systems in inner product spaces, onecan apply the algorithm provided by the following Th. 10.9:


Theorem 10.9 (Gram-Schmidt Orthogonalization). Let (X, 〈·,·〉) be an inner product space over K with induced norm ‖·‖. Let x_0, x_1, ... be a finite or infinite sequence of vectors in X. Define v_0, v_1, ... recursively as follows:

    v_0 := x_0,    v_n := x_n − Σ_{k=0, v_k≠0}^{n−1} (〈x_n, v_k〉 / ‖v_k‖^2) v_k    (10.4)

for each n ∈ N, additionally assuming that n is less than or equal to the maximal index of the sequence x_0, x_1, ... if the sequence is finite. Then the set {v_0, v_1, ...} constitutes an orthogonal system. Of course, by omitting the v_k = 0 and by dividing each v_k ≠ 0 by its norm, one can also obtain an orthonormal system (nonempty if at least one v_k ≠ 0). Moreover, v_n = 0 if, and only if, x_n ∈ span{x_0, ..., x_{n−1}}. In particular, if the x_0, x_1, ... are all linearly independent, then so are the v_0, v_1, ....

Proof. We show by induction on n ∈ N0 that, for each 0 ≤ m < n, vn ⊥ vm. For n = 0, there is nothing to show. Thus, let n > 0 and 0 ≤ m < n. By induction, 〈vk, vm〉 = 0 for each 0 ≤ k, m < n such that k ≠ m. For vm = 0, 〈vn, vm〉 = 0 is clear. Otherwise,

    〈vn, vm〉 = 〈 xn − ∑_{k=0, vk≠0}^{n−1} (〈xn, vk〉/‖vk‖²) vk, vm 〉 = 〈xn, vm〉 − (〈xn, vm〉/‖vm‖²) 〈vm, vm〉 = 0,

thereby establishing the case. So we know that v0, v1, . . . constitutes an orthogonal system. Next, by induction, for each n, we obtain vn ∈ span{x0, . . . , xn} directly from (10.4). Thus, vn = 0 implies xn = ∑_{k=0, vk≠0}^{n−1} (〈xn, vk〉/‖vk‖²) vk ∈ span{x0, . . . , xn−1}. Conversely, if xn ∈ span{x0, . . . , xn−1}, then

    dim span{v0, . . . , vn−1, vn} = dim span{x0, . . . , xn−1, xn} = dim span{x0, . . . , xn−1} = dim span{v0, . . . , vn−1},

which, due to Prop. 10.8(a), implies vn = 0. Finally, if all x0, x1, . . . are linearly independent, then all vk ≠ 0, k = 0, 1, . . . , such that the v0, v1, . . . are linearly independent by Prop. 10.8(a).
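The recursion (10.4) can be transcribed directly for vectors in K^n with the standard inner product, skipping the terms with vk = 0 exactly as in the theorem; a sketch using numpy (the function name is ours). Note that numpy's np.vdot conjugates its first argument, so 〈x, w〉 in the notes' convention is np.vdot(w, x):

```python
import numpy as np

def gram_schmidt(xs, tol=1e-12):
    """Apply (10.4): v_n = x_n - sum_{k, v_k != 0} (<x_n, v_k>/||v_k||^2) v_k."""
    vs = []
    for x in xs:
        v = x.astype(complex)
        for w in vs:
            nw2 = np.vdot(w, w).real          # ||v_k||^2
            if nw2 > tol:                     # skip the terms with v_k = 0
                v = v - (np.vdot(w, v) / nw2) * w  # np.vdot(w, v) = <x_n, v_k>
        vs.append(v)
    return vs

vs = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                   np.array([1.0, 0.0, 1.0]),
                   np.array([2.0, 1.0, 1.0])])  # x2 = x0 + x1, so v2 = 0
assert abs(np.vdot(vs[0], vs[1])) < 1e-12       # orthogonality
assert np.linalg.norm(vs[2]) < 1e-12            # dependent input gives v_n = 0
```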

Example 10.10. In the space C[−1, 1] with the inner product according to Ex. 10.5(b), consider

    ∀ i ∈ N0:   xi : [−1, 1] −→ K,   xi(x) := x^i.

We check that the first four orthogonal polynomials resulting from applying (10.4) to x0, x1, . . . are given by

    v0(x) = 1,   v1(x) = x,   v2(x) = x² − 1/3,   v3(x) = x³ − (3/5)x.

One has v0 = x0 ≡ 1 and, then, obtains successively from (10.4):

    v1(x) = x1(x) − (〈x1, v0〉/‖v0‖²) v0(x) = x − (∫_{−1}^{1} x dx) / (∫_{−1}^{1} dx) = x,

    v2(x) = x2(x) − (〈x2, v0〉/‖v0‖²) v0(x) − (〈x2, v1〉/‖v1‖²) v1(x)
          = x² − (∫_{−1}^{1} x² dx)/2 − ((∫_{−1}^{1} x³ dx)/(∫_{−1}^{1} x² dx)) x
          = x² − 1/3,

    v3(x) = x3(x) − (〈x3, v0〉/‖v0‖²) v0(x) − (〈x3, v1〉/‖v1‖²) v1(x) − (〈x3, v2〉/‖v2‖²) v2(x)
          = x³ − (∫_{−1}^{1} x³ dx)/2 − ((∫_{−1}^{1} x⁴ dx)/(∫_{−1}^{1} x² dx)) x − ((∫_{−1}^{1} x³(x² − 1/3) dx)/(∫_{−1}^{1} (x² − 1/3)² dx)) (x² − 1/3)
          = x³ − ((2/5)/(2/3)) x = x³ − (3/5)x.
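This computation can be repeated exactly over Q by representing polynomials as coefficient lists and using ∫_{−1}^{1} x^k dx = 2/(k+1) for even k and 0 for odd k; a sketch with the standard library only (all helper names are ours):

```python
from fractions import Fraction as F

def integrate(p):
    """Exact integral over [-1, 1] of a polynomial with coefficients p[k] of x^k."""
    return sum(F(2, k + 1) * c for k, c in enumerate(p) if k % 2 == 0)

def mul(p, q):
    r = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def sub(p, q):
    n = max(len(p), len(q))
    p, q = p + [F(0)] * (n - len(p)), q + [F(0)] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

# Gram-Schmidt (10.4) applied to 1, x, x^2, x^3 with the C[-1,1] inner product.
xs = [[F(0)] * i + [F(1)] for i in range(4)]
vs = []
for x in xs:
    v = x
    for w in vs:
        coeff = integrate(mul(x, w)) / integrate(mul(w, w))  # <x_n, v_k>/||v_k||^2
        v = sub(v, [coeff * a for a in w])
    vs.append(v)

assert vs[2] == [F(-1, 3), F(0), F(1)]        # v2 = x^2 - 1/3
assert vs[3] == [F(0), F(-3, 5), F(0), F(1)]  # v3 = x^3 - (3/5) x
```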

Definition 10.11. Let (X, 〈·, ·〉) and (Y, 〈·, ·〉) be inner product spaces over K. If A ∈ L(X, Y) is a linear isomorphism, then we call A isometric (and X and Y isometrically isomorphic) if, and only if,

    ∀ x1, x2 ∈ X:   〈Ax1, Ax2〉 = 〈x1, x2〉

(due to the fact that A then preserves the metric induced by the induced norm).

Theorem 10.12. Let (X, 〈·, ·〉) be a finite-dimensional inner product space over K.

(a) An orthonormal system S ⊆ X is an orthonormal basis of X if, and only if, S is a vector space basis of X.

(b) X has an orthonormal basis.

(c) If (Y, 〈·, ·〉) is an inner product space over K, then X and Y are isometrically isomorphic if, and only if, dim X = dim Y.

(d) If U is a subspace of X, then the following holds:

    (i) X = U ⊥ U⊥.
    (ii) dim U⊥ = dim X − dim U.
    (iii) (U⊥)⊥ = U.

Proof. (a): Let S be an orthonormal system. If S is an orthonormal basis, then, by Prop. 10.8(a) and Th. 10.9, S is a maximal linearly independent subset of X, showing S to be a vector space basis of X. Conversely, if S is a vector space basis of X, then, again using Prop. 10.8(a), S must be a maximal orthonormal system, i.e. an orthonormal basis of X.

(b): According to Th. 10.9, applying Gram-Schmidt orthogonalization to a vector space basis of X yields an orthonormal basis of X.

(c): We already know that the existence of a linear isomorphism between X and Y implies dim X = dim Y. Conversely, assume n := dim X = dim Y ∈ N, and let BX = {x1, . . . , xn}, BY = {y1, . . . , yn} be orthonormal bases of X, Y, respectively. Define A ∈ L(X, Y) by letting Axi = yi for each i ∈ {1, . . . , n}. As A(BX) = BY, A is a linear isomorphism. Moreover, if λ1, . . . , λn, µ1, . . . , µn ∈ K, then

    〈 A(∑_{i=1}^n λi xi), A(∑_{j=1}^n µj xj) 〉 = ∑_{i=1}^n ∑_{j=1}^n λi µ̄j 〈yi, yj〉 = ∑_{i=1}^n ∑_{j=1}^n λi µ̄j δij
        = ∑_{i=1}^n ∑_{j=1}^n λi µ̄j 〈xi, xj〉 = 〈 ∑_{i=1}^n λi xi, ∑_{j=1}^n µj xj 〉,

showing A to be isometric.

(d): Let {x1, . . . , xn} be a basis of X such that {x1, . . . , xm}, 1 ≤ m ≤ n, is a basis of U, dim X = n, dim U = m. In this case, if v1, . . . , vn are given by Gram-Schmidt orthogonalization according to Th. 10.9, then Th. 10.9 yields {v1, . . . , vn} to be a basis of X and {v1, . . . , vm} to be a basis of U. Since {vm+1, . . . , vn} ⊆ U⊥, we have X = U + U⊥. As we also know U ∩ U⊥ = {0} by Lem. 10.7(a), we have shown X = U ⊥ U⊥, then yielding dim U⊥ = dim X − dim U as well. As a consequence of (ii), we have dim (U⊥)⊥ = dim U, which, together with U ⊆ (U⊥)⊥, yields (U⊥)⊥ = U.

Using Zorn's lemma, one can extend Th. 10.12(b) to infinite-dimensional spaces (cf. [Phi17c, Th. 4.31(a)]). However, as remarked before, an orthonormal basis of a Hilbert space X is a vector space basis of X if, and only if, dim X < ∞ (cf. [Phi17c, Rem. 4.33]). Moreover, Th. 10.12(c) also extends to infinite-dimensional Hilbert spaces, which are isometrically isomorphic if, and only if, they have orthonormal bases of the same cardinality (cf. [Phi17c, Th. 4.31(c)]). If one adds the assumption that the subspace U be closed (with respect to the induced norm), then Th. 10.12(d) extends to infinite-dimensional Hilbert spaces as well (cf. [Phi17c, Th. 4.20(e),(f)]).

If X is a finite-dimensional inner product space, then all linear forms on X (i.e. all elements of the dual X′) are given by means of the inner product on X:

Theorem 10.13. Let (X, 〈·, ·〉) be a finite-dimensional inner product space over K. Then the map

    ψ : X −→ X′,   ψ(y) := αy,     (10.5)

where

    αy : X −→ K,   αy(a) := 〈a, y〉,     (10.6)

is bijective and conjugate-linear (in particular, each α ∈ X′ can be represented by some y ∈ X, and, if K = R, then ψ is a linear isomorphism).

Proof. According to Def. 10.1(ii), for each y ∈ X, αy is linear, i.e. ψ is well-defined. Moreover,

    ∀ λ, µ ∈ K  ∀ y1, y2 ∈ X  ∀ a ∈ X:   ψ(λy1 + µy2)(a) = 〈a, λy1 + µy2〉 = λ̄〈a, y1〉 + µ̄〈a, y2〉 = (λ̄ψ(y1) + µ̄ψ(y2))(a),


showing ψ to be conjugate-linear. It remains to show ψ is bijective. Let {x1, . . . , xn} be a basis of X, dim X = n. It suffices to show that B := {αx1, . . . , αxn} is a basis of X′. As dim X = dim X′, it even suffices to show that B is linearly independent. To this end, let λ1, . . . , λn ∈ K be such that 0 = ∑_{i=1}^n λi αxi. Then, for each v ∈ X,

    0 = (∑_{i=1}^n λi αxi)(v) = ∑_{i=1}^n λi αxi(v) = ∑_{i=1}^n λi 〈v, xi〉 = 〈 v, ∑_{i=1}^n λ̄i xi 〉,

showing ∑_{i=1}^n λ̄i xi ∈ X⊥ = {0}. As the xi are linearly independent, this yields λ1 = · · · = λn = 0 and the desired linear independence of B.

Remark 10.14. The above Th. 10.13 is a finite-dimensional version of the Riesz representation theorem for Hilbert spaces, which states that Th. 10.13 remains true if X is an infinite-dimensional Hilbert space and X′ is replaced by the topological dual X′_top of X, consisting of all linear forms on X that are also continuous with respect to the induced norm on X, cf. [Phi17c, Ex. 3.1] (recall that, if X is finite-dimensional, then all linear forms on X are automatically continuous, cf. [Phi16b, Ex. 2.16]).

10.2 The Adjoint Map

Definition 10.15. Let (X1, 〈·, ·〉), (X2, 〈·, ·〉) be finite-dimensional inner product spaces over K. Moreover, let A ∈ L(X1, X2), let A′ ∈ L(X′2, X′1) be the dual map according to Def. 2.26, and let ψ1 : X1 −→ X′1, ψ2 : X2 −→ X′2 be the maps given by Th. 10.13. Then the map

    A* : X2 −→ X1,   A* := ψ1^{−1} ∘ A′ ∘ ψ2,     (10.7)

is called the adjoint map of A.

Theorem 10.16. Let (X1, 〈·, ·〉), (X2, 〈·, ·〉) be finite-dimensional inner product spaces over K. Let A ∈ L(X1, X2).

(a) One has A* ∈ L(X2, X1), and A* is the unique map X2 −→ X1 such that

    ∀ x ∈ X1  ∀ y ∈ X2:   〈Ax, y〉 = 〈x, A*y〉.     (10.8)

(b) One has A** = A.

(c) One has that A ↦ A* is a conjugate-linear bijection of L(X1, X2) onto L(X2, X1).

(d) (Id_X1)* = Id_X1.

(e) If (X3, 〈·, ·〉) is another finite-dimensional inner product space over K and B ∈ L(X2, X3), then (B ∘ A)* = A* ∘ B*.

(f) ker(A*) = (Im A)⊥ and Im(A*) = (ker A)⊥.

(g) A is a monomorphism if, and only if, A* is an epimorphism.

(h) A is an epimorphism if, and only if, A* is a monomorphism.

(i) A^{−1} ∈ L(X2, X1) exists if, and only if, (A*)^{−1} ∈ L(X1, X2) exists, and, in that case, (A*)^{−1} = (A^{−1})*.

Proof. (a): Let A ∈ L(X1, X2). Then A* ∈ L(X2, X1), since A′ is linear, and ψ1^{−1} and ψ2 are both conjugate-linear. Moreover, we know A′ is the unique map on X′2 such that

    ∀ β ∈ X′2  ∀ x ∈ X1:   A′(β)(x) = β(A(x)).

Thus, for each x ∈ X1 and each y ∈ X2,

    〈x, A*y〉 = 〈x, (ψ1^{−1} ∘ A′ ∘ ψ2)(y)〉 = ψ1((ψ1^{−1} ∘ A′ ∘ ψ2)(y))(x) = A′(ψ2(y))(x) = ψ2(y)(Ax) = 〈Ax, y〉,

proving (10.8). For each y ∈ X2 and each x ∈ X1, we have 〈Ax, y〉 = (ψ2(y))(Ax) = ((ψ2(y)) ∘ A)(x). Then Th. 10.13 and (10.8) imply A*(y) = ψ1^{−1}((ψ2(y)) ∘ A), showing A* to be uniquely determined by (10.8).

(b): According to (a), A** is the unique map X1 −→ X2 such that

    ∀ x ∈ X1  ∀ y ∈ X2:   〈A*y, x〉 = 〈y, A**x〉.

Comparing with (10.8) yields A = A**.

(c): If A, B ∈ L(X1, X2) and λ ∈ K, then, for each y ∈ X2,

    (A + B)*(y) = (ψ1^{−1} ∘ (A + B)′ ∘ ψ2)(y) = (ψ1^{−1} ∘ (A′ + B′) ∘ ψ2)(y)
                = (ψ1^{−1} ∘ A′ ∘ ψ2)(y) + (ψ1^{−1} ∘ B′ ∘ ψ2)(y) = (A* + B*)(y)

and

    (λA)*(y) = (ψ1^{−1} ∘ (λA)′ ∘ ψ2)(y) = λ̄ (ψ1^{−1} ∘ A′ ∘ ψ2)(y) = (λ̄A*)(y),

showing A ↦ A* to be conjugate-linear. Moreover, A ↦ A* is bijective due to (b).

(d): One has (Id_X1)* = ψ1^{−1} ∘ (Id_X1)′ ∘ ψ1 = ψ1^{−1} ∘ Id_X′1 ∘ ψ1 = Id_X1.

(e): Let ψ3 : X3 −→ X′3 be given by Th. 10.13. Then

    A* ∘ B* = ψ1^{−1} ∘ A′ ∘ ψ2 ∘ ψ2^{−1} ∘ B′ ∘ ψ3 = ψ1^{−1} ∘ (B ∘ A)′ ∘ ψ3 = (B ∘ A)*.

(f): We have

    y ∈ ker(A*)  ⇔  ∀ x ∈ X1: 〈x, A*y〉 = 0  ⇔  ∀ x ∈ X1: 〈Ax, y〉 = 0  ⇔  y ∈ (Im A)⊥.

Applying the first part with A replaced by A* yields

    ker A = ker A** = (Im(A*))⊥

and, thus, using Th. 10.12(d)(iii),

    Im(A*) = ((Im(A*))⊥)⊥ = (ker A)⊥.

(g): According to (f), we have

    ker A = {0}  ⇔  Im(A*) = (ker A)⊥ = {0}⊥ = X1.

(h): According to (f), we have

    ker(A*) = {0}  ⇔  Im A = (ker(A*))⊥ = {0}⊥ = X2.

(i): One has

    A^{−1} ∈ L(X2, X1) exists  ⇔  (A^{−1})′ = (A′)^{−1} ∈ L(X′1, X′2) exists
                               ⇔  (A*)^{−1} = (ψ1^{−1} ∘ A′ ∘ ψ2)^{−1} ∈ L(X1, X2) exists.

Moreover, if A^{−1} ∈ L(X2, X1) exists, then

    (A*)^{−1} = ψ2^{−1} ∘ (A^{−1})′ ∘ ψ1 = (A^{−1})*,

completing the proof.
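In matrix terms (anticipating Th. 10.18 below: w.r.t. orthonormal bases, the adjoint is the conjugate transpose), (10.8) and properties such as (B ∘ A)* = A* ∘ B* can be spot-checked with numpy; a sketch, keeping in mind that np.vdot conjugates its first argument, so the notes' 〈u, v〉 is np.vdot(v, u):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
x = rng.normal(size=3) + 1j * rng.normal(size=3)
y = rng.normal(size=3) + 1j * rng.normal(size=3)

Astar = A.conj().T  # adjoint = conjugate transpose w.r.t. the standard ONB

# (10.8): <Ax, y> = <x, A*y>, written with numpy's convention <u, v> = np.vdot(v, u)
assert np.isclose(np.vdot(y, A @ x), np.vdot(Astar @ y, x))
# (B A)* = A* B*  and  A** = A
assert np.allclose((B @ A).conj().T, Astar @ B.conj().T)
assert np.allclose(Astar.conj().T, A)
```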

For extensions of Def. 10.15 and Th. 10.16 to infinite-dimensional Hilbert spaces, see [Phi17c, Def. 4.34], [Phi17c, Cor. 4.35].

Definition 10.17. Let m, n ∈ N and let M := (mkl) ∈ M(m, n, K) be an m × n matrix over K. We call

    M̄ := (m̄kl) ∈ M(m, n, K)

the complex conjugate matrix of M and

    M* := (M̄)ᵗ = conj(Mᵗ) ∈ M(n, m, K)

the adjoint matrix of M (thus, for K = R, the adjoint matrix is the same as the transpose matrix).

Theorem 10.18. Let (X, 〈·, ·〉), (Y, 〈·, ·〉) be finite-dimensional inner product spaces over K. Let A ∈ L(X, Y).

(a) Let BX := (x1, . . . , xn) and BY := (y1, . . . , ym) be ordered orthonormal bases of X and Y, respectively. If M = (mkl) ∈ M(m, n, K) is the matrix of A with respect to BX and BY, then the adjoint matrix M* represents the adjoint map A* ∈ L(Y, X) with respect to BY and BX.

(b) Let X = Y, A ∈ L(X, X). Then det(A*) = conj(det A). If χA = ∑_{k=0}^n ak X^k is the characteristic polynomial of A, then χA* = ∑_{k=0}^n āk X^k and

    σ(A*) = { λ̄ : λ ∈ σ(A) }.


Proof. (a): Suppose

    ∀ l ∈ {1, . . . , n}:  Axl = ∑_{k=1}^m mkl yk   and   ∀ k ∈ {1, . . . , m}:  A*yk = ∑_{l=1}^n nkl xl

with nkl ∈ K. Then, for each (k, l) ∈ {1, . . . , m} × {1, . . . , n},

    mkl = 〈Axl, yk〉 = 〈xl, A*yk〉 = n̄kl,

showing M* to be the matrix of A* with respect to BY and BX.

(b): For M as in (a), it is

    det(A*) = det(M*) = det((M̄)ᵗ) = det(M̄)  (∗)=  conj(det M) = conj(det A),

where (∗) holds, as the forming of complex conjugates commutes with the forming of sums and products of complex numbers. Next, using this fact again together with the linearity of forming the transpose of a matrix, we compute

    χA* = det(X Idn − M*) = det(X Idn − (M̄)ᵗ) = det((X Idn − M̄)ᵗ) = det(X Idn − M̄) = ∑_{k=0}^n āk X^k.

Thus,

    λ ∈ σ(A*)  ⇔  ελ(χA*) = ∑_{k=0}^n āk λ^k = 0  ⇔  ελ̄(χA) = ∑_{k=0}^n ak (λ̄)^k = 0  ⇔  λ̄ ∈ σ(A),

thereby proving (b).
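The spectral relation σ(A*) = { λ̄ : λ ∈ σ(A) } can be observed numerically for a random complex matrix; a sketch (sorting is only used to compare the two spectra as multisets):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

ev_M = np.sort_complex(np.linalg.eigvals(M))
ev_Mstar = np.sort_complex(np.linalg.eigvals(M.conj().T))  # spectrum of M*

# sigma(M*) = { conj(lambda) : lambda in sigma(M) }
assert np.allclose(ev_Mstar, np.sort_complex(np.conj(ev_M)))
```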

Definition 10.19. Let (X, 〈·, ·〉) be an inner product space over K and let U be a subspace of X such that X = U ⊥ U⊥. Then the linear projection PU : X −→ U, PU(u + v) := u for u + v ∈ X with u ∈ U and v ∈ U⊥, is called the orthogonal projection from X onto U.

Theorem 10.20. Let (X, 〈·, ·〉) be an inner product space over K, let U be a subspace of X such that X = U ⊥ U⊥, and let PU : X −→ U be the orthogonal projection onto U. Moreover, let ‖ · ‖ denote the induced norm on X.

(a) One has

    ∀ x ∈ X  ∀ u ∈ U:   ( u ≠ PU(x)  ⇒  ‖PU(x) − x‖ < ‖u − x‖ ),

i.e. PU(x) is the strict minimum of the function u ↦ ‖u − x‖ from U into R+0, and PU(x) can be viewed as the best approximation to x in U.


(b) If BU := {u1, . . . , un} is an orthonormal basis of U, dim U = n ∈ N, then

    ∀ x ∈ X:   PU(x) = ∑_{i=1}^n 〈x, ui〉 ui.

Proof. (a): Let x ∈ X and u ∈ U with u ≠ PU(x). Then PU(x) − u ∈ U and x − PU(x) ∈ ker PU = U⊥. Thus, by Prop. 10.8(b),

    ‖u − x‖² = ‖PU(x) − x‖² + ‖u − PU(x)‖² > ‖PU(x) − x‖²,

thereby proving (a).

(b): Let x ∈ X. As BU is a basis of U, there exist λ1, . . . , λn ∈ K such that

    PU(x) = ∑_{i=1}^n λi ui.

Recalling x − PU(x) ∈ ker PU = U⊥, we obtain

    ∀ i ∈ {1, . . . , n}:   〈x, ui〉 − 〈PU(x), ui〉 = 〈x − PU(x), ui〉 = 0

and, thus,

    ∀ i ∈ {1, . . . , n}:   〈x, ui〉 = 〈PU(x), ui〉 = λi,

thereby proving (b).
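The formula of Th. 10.20(b) and the best-approximation property of Th. 10.20(a) are easy to test numerically for the standard inner product on R³; a sketch (the sample vectors are ours):

```python
import numpy as np

u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])        # orthonormal basis of U = R^2 x {0}
x = np.array([3.0, -4.0, 5.0])

# Th. 10.20(b): P_U(x) = sum_i <x, u_i> u_i
Px = np.dot(x, u1) * u1 + np.dot(x, u2) * u2
assert np.allclose(Px, [3.0, -4.0, 0.0])

# Th. 10.20(a): P_U(x) is strictly closer to x than any other element of U
for u in [u1, -2 * u2, np.array([1.0, 1.0, 0.0])]:
    assert np.linalg.norm(Px - x) < np.linalg.norm(u - x)
```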

Proposition 10.21. Let (X, 〈·, ·〉) be an inner product space over K with induced norm ‖ · ‖. Let P ∈ L(X, X) be a projection. Then P is an orthogonal projection onto U := Im P (i.e. X = Im P ⊥ ker P) if, and only if, P = P*.

Proof. First, assume X = Im P ⊥ ker P. Then, for each x = ux + vx and y = uy + vy with ux, uy ∈ Im P and vx, vy ∈ ker P, we have

    〈Px, y〉 = 〈ux, uy〉 + 〈ux, vy〉 = 〈ux, uy〉 + 0 = 〈ux, uy〉 + 〈vx, uy〉 = 〈x, Py〉,

proving P = P*. Conversely, assume P = P*. Then, for each x ∈ X,

    ‖Px‖² = 〈Px, Px〉 = 〈P²x, x〉 = 〈Px, x〉 (∗)≤ ‖Px‖ ‖x‖,

where (∗) holds due to the Cauchy-Schwarz inequality [Phi16b, (1.41)]. In consequence,

    ∀ x ∈ X:   ‖Px‖ ≤ ‖x‖.

Now let u ∈ Im P and 0 ≠ v ∈ ker P. Define

    y := u − (〈u, v〉/‖v‖²) v.

Then 〈y, v〉 = 0, implying, by Prop. 10.8(b),

    ‖y‖² ≥ ‖Py‖² = ‖u‖² = ‖ u − (〈u, v〉/‖v‖²) v ‖² + ‖ (〈u, v〉/‖v‖²) v ‖² = ‖y‖² + |〈u, v〉|²/‖v‖².

As this yields 〈u, v〉 = 0, we have X = Im P ⊥ ker P, as desired.


Example 10.22. Let n ∈ N0 and define

    Tn := { (f : [−π, π] −→ C) :  f(t) = ∑_{k=−n}^{n} γk e^{ikt}  ∧  γ−n, . . . , γn ∈ C }

(due to Euler's formula, relating the exponential function to sine and cosine, the elements of Tn are known as trigonometric polynomials). Clearly, U := Tn is a subspace of the space X := C[−π, π] of Ex. 10.5(b). As it turns out, we even have X = U ⊥ U⊥ (we will not prove this here, but it follows from [Phi17c, Th. 4.20(e)], since the finite-dimensional subspace U is automatically a closed subspace, cf. [Phi17c, Th. 1.16(b)]). Thus, one has an orthogonal projection PU from X onto U, yielding the best approximation of a continuous function by a trigonometric polynomial. Moreover, if one has an orthonormal basis of U, then one can use Th. 10.20(b) to compute PU(x) for each function x ∈ X. We verify that an orthonormal basis is given by the functions

    uk : [−π, π] −→ C,   uk(t) := e^{ikt}/√(2π)   (k ∈ {−n, . . . , n}):

One computes, for each k, l ∈ {−n, . . . , n},

    ∫_{−π}^{π} uk ūl = (1/(2π)) ∫_{−π}^{π} e^{i(k−l)t} dt = 1 for k = l,  and  = (1/(2π)) [e^{i(k−l)t}/(i(k−l))]_{−π}^{π} = 0 for k ≠ l.

Thus, for each x ∈ X, the orthogonal projection is

    PU(x) = (1/√(2π)) ∑_{k=−n}^{n} uk ∫_{−π}^{π} x(t) e^{−ikt} dt .

For example, let x(t) = t and n = 1. Then, since

    ∫_{−π}^{π} t e^{it} dt = [t e^{it}/i]_{−π}^{π} − ∫_{−π}^{π} (e^{it}/i) dt = 2πi − 0 = 2πi,
    ∫_{−π}^{π} t dt = 0,
    ∫_{−π}^{π} t e^{−it} dt = −2πi,

we obtain

    PU(x)(t) = (1/(2π)) ( 2πi e^{−it} + 0 − 2πi e^{it} ) = i (e^{−it} − e^{it}) = 2 sin t.
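This projection can be cross-checked numerically: keeping the basis functions uk in the sum and approximating the Fourier integrals by a Riemann sum reproduces P_U(x)(t) = 2 sin t for x(t) = t, n = 1 (the quadrature setup below is our addition):

```python
import numpy as np

n = 1
N = 200000
t = np.linspace(-np.pi, np.pi, N, endpoint=False)
dt = 2 * np.pi / N
x = t                                   # the function x(t) = t

def proj(s):
    """Evaluate P_U(x) at the point s: (1/2pi) sum_k (int x(t) e^{-ikt} dt) e^{iks}."""
    total = 0.0 + 0.0j
    for k in range(-n, n + 1):
        ck = np.sum(x * np.exp(-1j * k * t)) * dt   # Riemann sum for the integral
        total += ck * np.exp(1j * k * s) / (2 * np.pi)
    return total

for s in [0.0, 0.5, 1.0, -2.0]:
    assert abs(proj(s) - 2 * np.sin(s)) < 1e-3      # projection equals 2 sin(s)
```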

10.3 Hermitian, Unitary, and Normal Maps

Definition 10.23. Let (X, 〈·, ·〉) be a finite-dimensional inner product space over K and A ∈ L(X, X). Moreover, let M ∈ M(n, K), n ∈ N.

(a) We call A normal if, and only if, AA* = A*A; likewise, we call M normal if, and only if, MM* = M*M. We define

    Nor(X) := { A ∈ L(X, X) : A is normal },
    Norn(K) := { M ∈ M(n, K) : M is normal }.

(b) We call A Hermitian or self-adjoint if, and only if, A = A*; likewise, we call M Hermitian or self-adjoint if, and only if, M = M* (thus, for K = R, Hermitian is the same as symmetric). We define

    Herm(X) := { A ∈ L(X, X) : A is Hermitian },
    Hermn(K) := { M ∈ M(n, K) : M is Hermitian },

and, for K = R,

    Sym(X) := { A ∈ L(X, X) : A is symmetric },
    Symn(R) := { M ∈ M(n, R) : M is symmetric }.

(c) We call A ∈ GL(X) unitary if, and only if, A^{−1} = A*; likewise, we call M ∈ GLn(K) unitary if, and only if, M^{−1} = M*. If K = R, then we also call unitary maps and matrices orthogonal. We define

    U(X) := { A ∈ L(X, X) : A is unitary },
    Un(K) := { M ∈ M(n, K) : M is unitary },

and, for K = R,

    O(X) := { A ∈ L(X, X) : A is orthogonal },
    On(R) := { M ∈ M(n, R) : M is orthogonal }.

Proposition 10.24. Let (X, 〈·, ·〉) be a finite-dimensional inner product space over K, n ∈ N. We have Herm(X) ⊆ Nor(X), Hermn(K) ⊆ Norn(K), U(X) ⊆ Nor(X), Un(K) ⊆ Norn(K).

Proof. That Hermitian implies normal is immediate. If A ∈ U(X), then AA* = A*A = Id, i.e. A ∈ Nor(X).

Proposition 10.25. Let (X, 〈·, ·〉) be a finite-dimensional inner product space over K, n ∈ N.

(a) U(X) is a subgroup of GL(X); Un(K) is a subgroup of GLn(K).

(b) Let A, B ∈ Herm(X). Then A + B ∈ Herm(X). If A ∈ GL(X), then A^{−1} ∈ Herm(X). If AB = BA, then AB ∈ Herm(X). The analogous results also hold for Hermitian matrices.

Proof. (a): If A, B ∈ U(X), then (AB)^{−1} = B^{−1}A^{−1} = B*A* = (AB)*, showing AB ∈ U(X). Also (A^{−1})* = (A*)^{−1} = (A^{−1})^{−1}, showing A^{−1} ∈ U(X) and establishing the case.

(b): If A, B ∈ Herm(X), then (A + B)* = A* + B* = A + B, showing A + B ∈ Herm(X). If A ∈ Herm(X) ∩ GL(X), then (A^{−1})* = (A*)^{−1} = A^{−1}, proving A^{−1} ∈ Herm(X) ∩ GL(X). If AB = BA, then (AB)* = B*A* = BA = AB, showing AB ∈ Herm(X).

In general, for normal maps and normal matrices, neither the sum nor the product is normal (however, one can show that, if A, B are normal with AB = BA, then A + B and AB are normal – this makes use of Fuglede's theorem and is not quite as easy as one might think).

Proposition 10.26. Let (X, 〈·, ·〉) be an inner product space over K, dim X = n ∈ N, with ordered orthonormal basis B := (x1, . . . , xn). Let ‖ · ‖ denote the induced norm. Moreover, let U ∈ L(X, X), and let M = (mkl) ∈ M(n, K), M* = (m*kl) ∈ M(n, K) be the respective matrices of U and U* with respect to B. Then the following statements are equivalent:

(i) U and M are unitary.

(ii) U* and M* are unitary.

(iii) The columns of M form an orthonormal basis of K^n with respect to the standard inner product on K^n.

(iv) The rows of M form an orthonormal basis of K^n with respect to the standard inner product on K^n.

(v) Mᵗ is unitary.

(vi) M̄ is unitary.

(vii) 〈Ux, Uy〉 = 〈x, y〉 holds for each x, y ∈ X.

(viii) ‖Ux‖ = ‖x‖ for each x ∈ X.

Proof. "(i)⇔(ii)": U^{−1} = U* is equivalent to Id = UU*, which is equivalent to (U*)^{−1} = U = (U*)*.

"(i)⇔(iii)": M^{−1} = M* implies

    (m1k, . . . , mnk) · (m1j, . . . , mnj) = ∑_{l=1}^n mlk m̄lj = ∑_{l=1}^n m*jl mlk = 0 for k ≠ j, = 1 for k = j,     (10.9)

showing that the columns of M form an orthonormal basis of K^n with respect to the standard inner product on K^n. Conversely, if the columns of M form an orthonormal basis of K^n with respect to the standard inner product, then they satisfy (10.9), which implies M*M = Id.

"(i)⇔(iv)": M^{−1} = M* implies

    (mk1, . . . , mkn) · (mj1, . . . , mjn) = ∑_{l=1}^n mkl m̄jl = ∑_{l=1}^n mkl m*lj = 0 for k ≠ j, = 1 for k = j,     (10.10)

showing that the rows of M form an orthonormal basis of K^n with respect to the standard inner product on K^n. Conversely, if the rows of M form an orthonormal basis of K^n with respect to the standard inner product, then they satisfy (10.10), which implies MM* = Id.

"(i)⇔(v)": Since the rows of M are the columns of Mᵗ, the equivalence of (i) and (v) is immediate from (iii) and (iv).

"(i)⇔(vi)": Since M̄ = (M*)ᵗ, the equivalence of (i) and (vi) is immediate from (ii) and (v).

"(i)⇒(vii)": For each x, y ∈ X: 〈Ux, Uy〉 = 〈U*Ux, y〉 = 〈Id x, y〉 = 〈x, y〉.

"(vii)⇒(i)": Since

    ∀ x, y ∈ X:   〈Id x, y〉 = 〈x, y〉 = 〈Ux, Uy〉 = 〈x, U*Uy〉

implies U*U = Id, we obtain U to be unitary.

"(vii)⇒(viii)": For each x ∈ X: ‖Ux‖² = 〈Ux, Ux〉 = 〈x, x〉 = ‖x‖².

"(viii)⇒(vii)": Let x, y ∈ X. Then (viii) implies

    〈Ux, Ux〉 + 〈Ux, Uy〉 + 〈Uy, Ux〉 + 〈Uy, Uy〉 = 〈U(x + y), U(x + y)〉 = 〈x + y, x + y〉 = 〈x, x〉 + 〈x, y〉 + 〈y, x〉 + 〈y, y〉

and, thus,

    〈Ux, Uy〉 + 〈Uy, Ux〉 = 〈x, y〉 + 〈y, x〉.

Similarly,

    〈Ux, Ux〉 − i〈Ux, Uy〉 + i〈Uy, Ux〉 + 〈Uy, Uy〉 = 〈U(x + iy), U(x + iy)〉 = 〈x + iy, x + iy〉 = 〈x, x〉 − i〈x, y〉 + i〈y, x〉 + 〈y, y〉

and, thus,

    〈Ux, Uy〉 − 〈Uy, Ux〉 = 〈x, y〉 − 〈y, x〉.

Adding both results and dividing by 2 then yields (vii).
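For a concrete matrix, several of these equivalences can be confirmed at once; a sketch (the rotation matrix chosen here lies in O2(R) ⊆ U2(C)):

```python
import numpy as np

theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # a rotation: orthogonal/unitary

I = np.eye(2)
assert np.allclose(U.conj().T @ U, I)   # (i): U* U = Id, i.e. columns orthonormal
assert np.allclose(U @ U.conj().T, I)   # (iv): rows orthonormal as well

x, y = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
assert np.isclose(np.dot(U @ x, U @ y), np.dot(x, y))        # (vii)
assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))  # (viii)
```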

In preparation for results on the diagonalizability of normal, Hermitian, and unitary maps, we prove the following proposition:


Proposition 10.27. Let (X, 〈·, ·〉) be an inner product space over K, dim X = n ∈ N, and A ∈ Nor(X).

(a) If 0 ≠ x ∈ X is an eigenvector to the eigenvalue λ ∈ K of A, then x is also an eigenvector to the eigenvalue λ̄ of A*.

(b) If U is an A-invariant subspace of X, then A(U⊥) ⊆ U⊥ and A*(U) ⊆ U.

Proof. (a): It suffices to show 〈A*x − λ̄x, A*x − λ̄x〉 = 0. To this end, we use Ax = λx to compute

    〈A*x − λ̄x, A*x − λ̄x〉 = 〈A*x, A*x〉 − λ〈A*x, x〉 − λ̄〈x, A*x〉 + λλ̄〈x, x〉
        = 〈AA*x, x〉 − λ〈x, Ax〉 − λ̄〈Ax, x〉 + λλ̄〈x, x〉
        = 〈A*Ax, x〉 − λλ̄〈x, x〉 − λ̄λ〈x, x〉 + λλ̄〈x, x〉
        = 〈Ax, Ax〉 − λλ̄〈x, x〉 = 0,

thereby establishing the case.

(b): Let dim U = m ∈ N and let BU := {u1, . . . , um} be an orthonormal basis of U. As U is A-invariant, there exist akl ∈ K such that

    ∀ l ∈ {1, . . . , m}:   Aul = ∑_{k=1}^m akl uk.

Define

    ∀ l ∈ {1, . . . , m}:   xl := A*ul − ∑_{k=1}^m ālk uk.

To show the A*-invariance of U, it suffices to show that, for each l ∈ {1, . . . , m}, xl = 0, i.e. 〈xl, xl〉 = 0. To this end, for each l ∈ {1, . . . , m}, we compute

    〈xl, xl〉 = 〈A*ul, A*ul〉 − ∑_{k=1}^m alk 〈A*ul, uk〉 − ∑_{k=1}^m ālk 〈uk, A*ul〉 + ∑_{k=1}^m |alk|²
             = 〈A*Aul, ul〉 − ∑_{k=1}^m alk 〈ul, Auk〉 − ∑_{k=1}^m ālk 〈Auk, ul〉 + ∑_{k=1}^m |alk|²
             = ∑_{k=1}^m |akl|² − ∑_{k=1}^m |alk|² − ∑_{k=1}^m |alk|² + ∑_{k=1}^m |alk|²
             = ∑_{k=1}^m |akl|² − ∑_{k=1}^m |alk|²,

implying

    ∑_{l=1}^m 〈xl, xl〉 = 0.

As 〈xl, xl〉 ≥ 0 for each l ∈ {1, . . . , m}, this implies the desired 〈xl, xl〉 = 0 for each l ∈ {1, . . . , m}, thereby proving A*(U) ⊆ U. We will now make use of this result to also show A(U⊥) ⊆ U⊥: Let u ∈ U and x ∈ U⊥. Then

    〈u, Ax〉 = 〈A*u, x〉 = 0   (since A*u ∈ U),

proving Ax ∈ U⊥.

Theorem 10.28. Let (X, 〈·, ·〉) be an inner product space over C, dim X = n ∈ N, and A ∈ Nor(X). Then there exists an orthonormal basis B of X, consisting of eigenvectors of A. In particular, A is diagonalizable. Moreover, if M ∈ Norn(C), then there exists a unitary matrix U ∈ Un(C) such that D = U^{−1}MU is a diagonal matrix.

Proof. We prove the existence of the orthonormal basis of eigenvectors via induction on n ∈ N. For n = 1, there is nothing to prove. Thus, let n > 1. As C is algebraically closed, there exists λ ∈ σ(A). Let 0 ≠ v ∈ X be a corresponding eigenvector and U := span{v}. According to Prop. 10.27(b), both U and U⊥ are A-invariant. Moreover, the restriction of A to U⊥ is normal, since, if A and A* commute on X, they also commute on the subspace U⊥. Thus, by induction hypothesis, U⊥ has an orthonormal basis B′, consisting of eigenvectors of A. Thus, X also has an orthonormal basis, consisting of eigenvectors of A. Now let M ∈ Norn(C). We consider M as a normal map on C^n with the standard inner product 〈·, ·〉, where the standard basis of C^n constitutes an orthonormal basis. Then we know there exists an ordered orthonormal basis B := (x1, . . . , xn) of C^n such that, with respect to B, M has the diagonal matrix D. Thus, there exists U = (ukl) ∈ GLn(C) such that D = U^{−1}MU and

    ∀ l ∈ {1, . . . , n}:   xl = ∑_{k=1}^n ukl ek.

Then

    ∀ l, j ∈ {1, . . . , n}:   ∑_{k=1}^n ukl ūkj = ∑_{k=1}^n ∑_{m=1}^n ukl ūmj 〈ek, em〉 = 〈xl, xj〉 = δlj,

showing the columns of U to be orthonormal and U to be unitary.

Example 10.29. Consider R² with the standard inner product. We already know from Ex. 6.5(a) that

    M := ( 0  −1
           1   0 )

has no real eigenvalues (the characteristic polynomial is χM = X² + 1). On the other hand, M is unitary (in particular, normal), showing one cannot expect Th. 10.28 to hold with C replaced by R.

Theorem 10.30. Let (X, 〈·, ·〉) be an inner product space over K, dim X = n ∈ N, and A ∈ Herm(X). Then σ(A) ⊆ R and there exists an orthonormal basis B of X, consisting of eigenvectors of A. In particular, A is diagonalizable. Moreover, if M ∈ Hermn(C), then there exists a unitary matrix U ∈ Un(C) such that D = U^{−1}MU is a diagonal matrix. Also, in particular, for K = R, each A ∈ Sym(X) is diagonalizable, and, if M ∈ Symn(R), then there exists an orthogonal matrix U ∈ On(R) such that D = U^{−1}MU is a diagonal matrix.


Proof. Let λ ∈ σ(A) and let 0 ≠ x ∈ X be a corresponding eigenvector. Then, according to Prop. 10.27(a), λx = Ax = A*x = λ̄x, showing λ = λ̄ and λ ∈ R. As A ∈ Herm(X) implies A to be normal, the case K = C is now immediate from Th. 10.28. It remains to consider K = R. First, consider n = 2. If B0 := (x1, x2) is an ordered orthonormal basis of X, then the matrix M ∈ M(2, R) of A with respect to B0 must have the form

    M = ( a  b
          b  c )

with a, b, c ∈ R. Thus, the characteristic polynomial is

    χA = (X − a)(X − c) − b² = X² − (a + c)X + ac − b²

with the zeros

    λ = (a + c)/2 ± √( (a + c)²/4 − ac + b² ) = (a + c)/2 ± √( ((a − c)² + 4b²)/4 ) ∈ R,

showing A to be diagonalizable in this case. The rest of the proof is now conducted analogously to the proof of Th. 10.28: We prove the existence of the orthonormal basis of eigenvectors via induction on n ∈ N. For n = 1, there is nothing to prove. Thus, let n > 1. According to Prop. 8.16(a), there exists an A-invariant subspace W of X such that dim W ∈ {1, 2}. If dim W = 1, then A has an eigenvalue λ and a corresponding eigenvector 0 ≠ v ∈ X. If dim W = 2, then the restriction of A to W is, clearly, also Hermitian, and the above-considered case n = 2 yields that this restriction (and, thus, A) has an eigenvalue λ and a corresponding eigenvector 0 ≠ v ∈ X. Let U := span{v}. According to Prop. 10.27(b), both U and U⊥ are A-invariant. Moreover, the restriction of A to U⊥ is also Hermitian and, by induction hypothesis, U⊥ has an orthonormal basis B′, consisting of eigenvectors of A. Thus, X also has an orthonormal basis, consisting of eigenvectors of A. Now let M ∈ Symn(R). We consider M as a symmetric map on R^n with the standard inner product 〈·, ·〉, where the standard basis of R^n constitutes an orthonormal basis. Then we know there exists an ordered orthonormal basis B := (x1, . . . , xn) of R^n such that, with respect to B, M has the diagonal matrix D. Thus, there exists U = (ukl) ∈ GLn(R) such that D = U^{−1}MU and

    ∀ l ∈ {1, . . . , n}:   xl = ∑_{k=1}^n ukl ek.

Then

    ∀ l, j ∈ {1, . . . , n}:   ∑_{k=1}^n ukl ukj = ∑_{k=1}^n ∑_{m=1}^n ukl umj 〈ek, em〉 = 〈xl, xj〉 = δlj,

showing the columns of U to be orthonormal and U to be orthogonal.
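Numerically, Th. 10.30 for real symmetric matrices is exposed by numpy.linalg.eigh, which returns real eigenvalues and an orthogonal matrix whose columns are eigenvectors; a sketch (the sample matrix is ours):

```python
import numpy as np

M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])        # symmetric, hence diagonalizable by Th. 10.30

w, U = np.linalg.eigh(M)               # real eigenvalues w, eigenvectors as columns of U
assert np.allclose(U.T @ U, np.eye(3))         # U is orthogonal, so U^{-1} = U^t
assert np.allclose(U.T @ M @ U, np.diag(w))    # U^{-1} M U = D is diagonal
```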


A Multilinear Maps

Theorem A.1. Let V and W be vector spaces over the field F, α ∈ N. Then, as vector spaces over F, L(V, L^{α−1}(V, W)) and L^α(V, W) are isomorphic via the isomorphism

    Φ : L(V, L^{α−1}(V, W)) −→ L^α(V, W),   Φ(L)(x1, . . . , xα) := L(x1)(x2, . . . , xα).     (A.1)

Proof. Since L is linear and L(x1) is (α − 1) times linear, Φ(L) is, indeed, an element of L^α(V, W), showing that Φ is well-defined by (A.1). Next, we verify Φ to be linear: If λ ∈ F and K, L ∈ L(V, L^{α−1}(V, W)), then

    Φ(λL)(x1, . . . , xα) = (λL)(x1)(x2, . . . , xα) = λ (L(x1)(x2, . . . , xα)) = λ Φ(L)(x1, . . . , xα)

and

    Φ(K + L)(x1, . . . , xα) = (K + L)(x1)(x2, . . . , xα) = (K(x1) + L(x1))(x2, . . . , xα)
        = K(x1)(x2, . . . , xα) + L(x1)(x2, . . . , xα)
        = Φ(K)(x1, . . . , xα) + Φ(L)(x1, . . . , xα)
        = (Φ(K) + Φ(L))(x1, . . . , xα),

proving Φ to be linear. Now we show Φ to be injective. To this end, we show that, if L ≠ 0, then Φ(L) ≠ 0. If L ≠ 0, then there exist x1, . . . , xα ∈ V such that L(x1)(x2, . . . , xα) ≠ 0, showing that Φ(L) ≠ 0, as needed. To verify Φ is also surjective, let K ∈ L^α(V, W). Define L : V −→ L^{α−1}(V, W) by letting

    L(x1)(x2, . . . , xα) := K(x1, . . . , xα).     (A.2)

Then, clearly, for each x1 ∈ V, L(x1) ∈ L^{α−1}(V, W). Moreover, L is linear, i.e. L ∈ L(V, L^{α−1}(V, W)). Comparing (A.2) with (A.1) shows Φ(L) = K, i.e. Φ is surjective, completing the proof.

References

[Bos13] Siegfried Bosch. Algebra, 8th ed. Springer-Verlag, Berlin, 2013 (German).

[For17] Otto Forster. Analysis 3, 8th ed. Springer Spektrum, Wiesbaden, Germany, 2017 (German).

[Jac75] Nathan Jacobson. Lectures in Abstract Algebra II. Linear Algebra. Graduate Texts in Mathematics, Springer, New York, 1975.

[Kon04] Konrad Königsberger. Analysis 2, 5th ed. Springer-Verlag, Berlin, 2004 (German).

[Lan05] Serge Lang. Algebra, revised 3rd ed. Graduate Texts in Mathematics, Vol. 211, Springer, New York, 2005.

[Phi16a] P. Philip. Analysis I: Calculus of One Real Variable. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2015/2016, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_Analysis1.pdf.

[Phi16b] P. Philip. Analysis II: Topology and Differential Calculus of Several Variables. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2016, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_Analysis2.pdf.

[Phi17a] P. Philip. Analysis III: Measure and Integration Theory of Several Variables. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2016/2017, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_Analysis3.pdf.

[Phi17b] P. Philip. Numerical Analysis I. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2016/2017, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_NumericalAnalysis1.pdf.

[Phi17c] P. Philip. Functional Analysis. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2017, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_FunctionalAnalysis.pdf.

[Phi19] P. Philip. Linear Algebra I. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2018/2019, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_LinearAlgebra1.pdf.