Introductory Linear Algebra
Lecture Notes
Sudipta Mallik
Updated on March 25, 2020
Contents
1 Introduction
  1.1 Matrix Operations

2 Solving a Linear System
  2.1 Systems of Linear Equations
  2.2 Row Operations
  2.3 Echelon Forms
  2.4 Geometry of Solution Sets

3 Fundamental Linear Algebraic Concepts on Rn
  3.1 Linear Span and Subspaces
  3.2 Linear Independence
  3.3 Basis and Dimensions
  3.4 Linear Transformations

4 Inverse and Determinant of a Matrix
  4.1 Inverse of a Matrix
  4.2 Invertible Matrix Theorem
  4.3 Determinant of a Matrix
  4.4 Properties of Determinants

5 Eigenvalues and Eigenvectors
  5.1 Basics of Eigenvalues and Eigenvectors
  5.2 Similar and Diagonalizable Matrices
  5.3 Similarity of Matrix Transformations
  5.4 Application to Differential Equations

6 Inner-product and Orthogonality
  6.1 Orthogonal Vectors in Rn
  6.2 Orthogonal Bases and Matrices
  6.3 Orthogonal Projections
  6.4 Gram-Schmidt Process

7 Vector Spaces and Inner Product Spaces
  7.1 Basics of Vector Spaces
  7.2 Linear Span and Subspaces
  7.3 Linear Independence
  7.4 Basis and Dimensions
  7.5 Linear Transformations
  7.6 Inner Product Spaces
Linear Algebra Sudipta Mallik
1 Introduction
1.1 Matrix Operations
Matrix: An m × n matrix A is an m-by-n array of scalars from a field (for example, real numbers) of the form

A = [ a11 a12 · · · a1n
      a21 a22 · · · a2n
      . . .
      am1 am2 · · · amn ].

The order (or size) of A is m × n (read as m by n) if A has m rows and n columns. The (i, j)-entry of A = [aij] is aij.
For example, A = [1 2 0; −3 0 −1] is a 2 × 3 real matrix. The (2, 3)-entry of A is −1.
Equality: Two matrices A and B are equal, i.e., A = B, if A and B have the same order and the entries of A and B are the same.
Useful Matrices:
• A zero matrix, denoted by O or Om,n, is an m × n matrix all of whose entries are zero.
• A square matrix is a matrix whose numbers of rows and columns are the same.
• A diagonal matrix is a square n× n matrix whose nondiagonal entries are zero.
• The identity matrix of order n, denoted by In, is the n × n diagonal matrix whose diagonal entries are 1. For example, I3 = [1 0 0; 0 1 0; 0 0 1] is the 3 × 3 identity matrix.
• An n × 1 matrix is called a column matrix or an n-dimensional (column) vector, denoted by a lowercase letter such as x (often boldface) or −→x . For example, −→x = [0, 1, 2]T is a 3-dimensional vector which represents the position vector of the point (0, 1, 2) in the 3-space R3 (i.e., a directed line segment from the origin (0, 0, 0) to the point (0, 1, 2)).
Matrix Operations:
• Transpose: The transpose of an m × n matrix A, denoted by AT, is an n × m matrix whose columns are the corresponding rows of A, i.e., (AT)ij = Aji.
[Figure: the position vector [2, 1]T of the point (2, 1) in the 2-space R2, drawn in the x1x2-plane.]
Example. If A = [1 2 0; −3 0 −1], then AT = [1 −3; 2 0; 0 −1].
Properties: Let A and B be two matrices with appropriate orders. Then
1. (AT )T = A
2. (A+B)T = AT +BT
3. (cA)T = cAT for any scalar c
4. (AB)T = BTAT
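As a quick numerical check (not part of the notes), the first three properties can be verified in pure Python on the running example; the helper name `transpose` is our own:

```python
# Sketch: checking transpose properties 1-3 on A = [[1, 2, 0], [-3, 0, -1]].

def transpose(A):
    """Return A^T: row i of A becomes column i of A^T."""
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

A = [[1, 2, 0], [-3, 0, -1]]
B = [[0, -2, 0], [3, 0, 2]]

# Property 1: (A^T)^T = A
assert transpose(transpose(A)) == A
# Property 2: (A + B)^T = A^T + B^T
S = [[A[i][j] + B[i][j] for j in range(3)] for i in range(2)]
TS = [[transpose(A)[i][j] + transpose(B)[i][j] for j in range(2)] for i in range(3)]
assert transpose(S) == TS
# Property 3: (cA)^T = c A^T with c = -2
cA = [[-2 * x for x in row] for row in A]
cAT = [[-2 * x for x in row] for row in transpose(A)]
assert transpose(cA) == cAT
print(transpose(A))  # [[1, -3], [2, 0], [0, -1]]
```

Property 4 needs matrix multiplication, which is defined later in this section, but it can be checked the same way.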
• Scalar Multiplication: Let A be a matrix and c be a scalar. The scalar multiple,denoted by cA, is the matrix whose entries are c times the corresponding entries of A.
Example. If A = [1 2 0; −3 0 −1], then −2A = [−2 −4 0; 6 0 2].
Properties: Let A and B be two matrices of the same order and c and d be scalars. Then
1. c(A+B) = cA+ cB
2. (c+ d)A = cA+ dA
3. c(dA) = (cd)A
• Sum: If A and B are m× n matrices, then the sum A+B is the m× n matrix whoseentries are the sum of the corresponding entries of A and B, i.e., (A+B)ij = Aij +Bij.
Example. If A = [1 2 0; −3 0 −1] and B = [0 −2 0; 3 0 2], then A + B = [1 0 0; 0 0 1].
Exercise. Find 2A−B.
Properties: Let A,B, and C be three matrices of the same order. Then
1. A+B = B + A (commutative)
2. (A+B) + C = A+ (B + C) (associative)
3. A+O = A (additive identity O)
• Multiplication:

Matrix-vector multiplication: If A is an m × n matrix and −→x is an n-dimensional vector, then their product A−→x is an m-dimensional vector whose (i, 1)-entry is ai1x1 + ai2x2 + · · · + ainxn, the dot product of row i of A and −→x . Note that

A−→x = [ a11x1 + a12x2 + · · · + a1nxn
        a21x1 + a22x2 + · · · + a2nxn
        . . .
        am1x1 + am2x2 + · · · + amnxn ]
     = x1[a11; a21; . . . ; am1] + x2[a12; a22; . . . ; am2] + · · · + xn[a1n; a2n; . . . ; amn].
Example. If A = [1 2 0; −3 0 −1] and −→x = [1; −1; 0], then A−→x = [−1; −3], which is a linear combination of the first and second columns of A with weights 1 and −1 respectively.
Matrix-matrix multiplication: If A is an m × n matrix and B is an n × p matrix, then their product AB is an m × p matrix whose (i, j)-entry is the dot product of row i of A and column j of B:

(AB)ij = ai1b1j + ai2b2j + · · · + ainbnj.
Example. For A = [1 2 2; 0 0 2] and B = [2 −2; 0 0; 1 1], we have AB = [4 0; 2 2].
Properties: Let A,B, and C be three matrices of appropriate orders. Then
1. A(BC) = (AB)C (associative)
2. A(B + C) = AB + AC (left-distributive)
3. (B + C)A = BA+ CA (right-distributive)
4. k(AB) = (kA)B = A(kB) for any scalar k
5. ImA = A = AIn for any m× n matrix A (multiplicative identity I)
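The (i, j)-entry formula translates directly into a short pure-Python sketch (the helper name `matmul` is ours, not from the notes); it recomputes the example product AB above:

```python
# Sketch: matrix-matrix multiplication from the (i,j)-entry formula
# (AB)_ij = a_i1 b_1j + ... + a_in b_nj.

def matmul(A, B):
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2, 2], [0, 0, 2]]
B = [[2, -2], [0, 0], [1, 1]]
print(matmul(A, B))  # [[4, 0], [2, 2]], as in the example above

# AB != BA in general: here BA is even a different size (3 x 3).
print(matmul(B, A))
```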
Remark.
(1) The column i of AB is A(column i of B).
Example. For A = [1 2 2; 0 0 2] and B = [2 −2; 0 0; 1 1], we have

AB = [4 0; 2 2] = [ A[2; 0; 1]  A[−2; 0; 1] ].

(2) AB ≠ BA in general.
Example. [1 2; 3 4][0 1; 0 0] = [0 1; 0 3] ≠ [3 4; 0 0] = [0 1; 0 0][1 2; 3 4].
(3) AB = AC does not imply B = C in general.
Example. [−2 1; −2 1][1 1; 0 0] = [−2 −2; −2 −2] = [−2 1; −2 1][0 0; −2 −2].
(4) AB = O does not imply A = O or B = O in general.
Example. [−2 1; −2 1][1 1; 2 2] = [0 0; 0 0].
• Powers of a matrix: If A is an n × n matrix and k is a positive integer, then the k-th power of A, denoted by Ak, is the product of k copies of A. We use the convention A0 = In.
Example. A = [0 1; 0 0] =⇒ A2 = AA = [0 0; 0 0], A100 = [0 0; 0 0].
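A minimal sketch (helper names ours) computing matrix powers by repeated multiplication confirms the example:

```python
# Sketch: k-th power of a square matrix by repeated multiplication,
# with A^0 = I by convention.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(A, k):
    n = len(A)
    P = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # I_n
    for _ in range(k):
        P = matmul(P, A)
    return P

A = [[0, 1], [0, 0]]
print(mat_pow(A, 2))    # [[0, 0], [0, 0]]
print(mat_pow(A, 100))  # [[0, 0], [0, 0]] -- A^2 = O forces A^k = O for k >= 2
```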
Symmetric and Skew-symmetric Matrices: A square matrix A is symmetric if AT = A and skew-symmetric if AT = −A. A square matrix A can be written uniquely as a sum of a symmetric and a skew-symmetric matrix:

A = (1/2)(A + AT) + (1/2)(A − AT).

Example. [1 4; 2 5] = (1/2)([1 4; 2 5] + [1 2; 4 5]) + (1/2)([1 4; 2 5] − [1 2; 4 5]) = [1 3; 3 5] + [0 1; −1 0].
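The decomposition can be verified numerically; the following sketch (helper names ours) uses fractions.Fraction so that the halves (1/2)(A ± AT) stay exact:

```python
# Sketch: splitting a square matrix into symmetric and skew-symmetric parts.
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def split(A):
    n = len(A)
    sym = [[Fraction(A[i][j] + A[j][i], 2) for j in range(n)] for i in range(n)]
    skew = [[Fraction(A[i][j] - A[j][i], 2) for j in range(n)] for i in range(n)]
    return sym, skew

A = [[1, 4], [2, 5]]
sym, skew = split(A)
print(sym)   # [[1, 3], [3, 5]] as Fractions
print(skew)  # [[0, 1], [-1, 0]] as Fractions
assert transpose(sym) == sym                                   # symmetric part
assert transpose(skew) == [[-x for x in r] for r in skew]      # skew part
assert [[sym[i][j] + skew[i][j] for j in range(2)] for i in range(2)] == A
```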
2 Solving a Linear System
2.1 Systems of Linear Equations
A system of linear equations with n variables x1, . . . , xn and m equations can be written as follows:

 a11x1 + a12x2 + · · · + a1nxn = b1
 a21x1 + a22x2 + · · · + a2nxn = b2
 . . .
 am1x1 + am2x2 + · · · + amnxn = bm.    (1)
A solution is an n-tuple (s1, s2, . . . , sn) that satisfies each equation when we substitute x1 = s1, x2 = s2, . . . , xn = sn. The solution set is the set of all solutions.
Example.

 x1 + x3 = 3
 x2 − 2x3 = −1

The solution set (over R) is {(−s + 3, 2s − 1, s) | s ∈ R}. There are infinitely many solutions because of the free variable x3.
Possibilities of solutions of a linear system:
• System has no solution (Inconsistent)
• System has a solution (Consistent)
(a) Unique solution
(b) Infinitely many solutions
[Figure: three pictures in the x1x2-plane. The parallel lines 2x1 − x2 = 0 and 2x1 − x2 = 4 give no solution; the intersecting lines 2x1 − x2 = 0 and x1 − x2 = −1 give a unique solution; the coincident lines 2x1 − x2 = 0 and 4x1 − 2x2 = 0 give infinitely many solutions.]
Definition. The system (1) is called an underdetermined system if m < n, i.e., fewer equations than variables. The system (1) is called an overdetermined system if m > n, i.e., more equations than variables.
The system (1) of linear equations can be written as a matrix equation and as a vector equation:

The matrix equation: A−→x = −→b , where

A = [ a11 a12 · · · a1n; a21 a22 · · · a2n; . . . ; am1 am2 · · · amn ], −→x = [x1; x2; . . . ; xn], and −→b = [b1; b2; . . . ; bm].

A is the coefficient matrix. The augmented matrix is

[A −→b ] = [ a11 a12 · · · a1n b1; a21 a22 · · · a2n b2; . . . ; am1 am2 · · · amn bm ].

The vector equation: x1−→a1 + x2−→a2 + · · · + xn−→an = −→b , where A = [−→a1 −→a2 · · · −→an ].
Example.

  2x2 − 8x3 = 8
  x1 − 2x2 + x3 = 0
 −4x1 + 5x2 + 9x3 = −9

The matrix equation is A−→x = −→b where

A = [0 2 −8; 1 −2 1; −4 5 9], −→x = [x1; x2; x3], and −→b = [8; 0; −9].

The augmented matrix is

[A −→b ] = [0 2 −8 8; 1 −2 1 0; −4 5 9 −9].

The vector equation is

x1[0; 1; −4] + x2[2; −2; 5] + x3[−8; 1; 9] = [8; 0; −9].

You may verify that one solution is (x1, x2, x3) = (29, 16, 3). Is it the only solution?
2.2 Row Operations
There are three elementary row operations we perform on a matrix:
1. Interchanging two rows (Ri ↔ Rj)
2. Multiplying a row by a nonzero scalar (cRi, c ≠ 0)
3. Adding a scalar multiple of row i to row j (cRi +Rj)
Steps of solving a linear system A−→x = −→b are equivalent to elementary row operations on the augmented matrix [A −→b ], as illustrated by the following example.
Example.

  2x2 − 8x3 = 8          (2.1)
  x1 − 2x2 + x3 = 0      (2.2)
 −4x1 + 5x2 + 9x3 = −9   (2.3)

We do the following steps to solve the above system:

1. Interchange (2.1) and (2.2):

  x1 − 2x2 + x3 = 0      (3.1)
  2x2 − 8x3 = 8          (3.2)
 −4x1 + 5x2 + 9x3 = −9   (3.3)

The corresponding row operation is

[A −→b ] = [0 2 −8 8; 1 −2 1 0; −4 5 9 −9] −(R1↔R2)→ [1 −2 1 0; 0 2 −8 8; −4 5 9 −9].

2. Replace (3.3) by 4(3.1) + (3.3):

  x1 − 2x2 + x3 = 0      (4.1)
  2x2 − 8x3 = 8          (4.2)
  −3x2 + 13x3 = −9       (4.3)

The corresponding row operation is

[1 −2 1 0; 0 2 −8 8; −4 5 9 −9] −(4R1+R3)→ [1 −2 1 0; 0 2 −8 8; 0 −3 13 −9].

3. Scale (4.2) by 1/2:

  x1 − 2x2 + x3 = 0      (5.1)
  x2 − 4x3 = 4           (5.2)
  −3x2 + 13x3 = −9       (5.3)

The corresponding row operation is

[1 −2 1 0; 0 2 −8 8; 0 −3 13 −9] −((1/2)R2)→ [1 −2 1 0; 0 1 −4 4; 0 −3 13 −9].

4. Replace (5.3) by 3(5.2) + (5.3):

  x1 − 2x2 + x3 = 0      (6.1)
  x2 − 4x3 = 4           (6.2)
  x3 = 3                 (6.3)

The corresponding row operation is

[1 −2 1 0; 0 1 −4 4; 0 −3 13 −9] −(3R2+R3)→ [1 −2 1 0; 0 1 −4 4; 0 0 1 3].

5. Back substitution:

 (6.3) =⇒ x3 = 3
 (6.2) =⇒ x2 = 4 + 4x3 = 4 + 4 · 3 = 16
 (6.1) =⇒ x1 = 0 + 2x2 − x3 = 2 · 16 − 3 = 29

So the solution set is {(29, 16, 3)}.
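The same steps, forward elimination followed by back substitution, can be sketched in pure Python with exact fractions.Fraction arithmetic (the function name `solve` is ours); applied to this augmented matrix it reproduces the solution (29, 16, 3):

```python
# Sketch: forward elimination + back substitution, mirroring the steps above.
from fractions import Fraction

def solve(aug):
    """Solve a square system from its augmented matrix [A | b]."""
    M = [[Fraction(x) for x in row] for row in aug]
    n = len(M)
    for i in range(n):
        # find a row with a nonzero entry in column i and swap it up
        p = next(r for r in range(i, n) if M[r][i] != 0)
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):          # create zeros below the pivot
            f = M[r][i] / M[i][i]
            M[r] = [a - f * b for a, b in zip(M[r], M[i])]
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):         # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

aug = [[0, 2, -8, 8], [1, -2, 1, 0], [-4, 5, 9, -9]]
print(solve(aug))  # [Fraction(29, 1), Fraction(16, 1), Fraction(3, 1)]
```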
Remark.
1. Two matrices are row equivalent if we can transform one matrix to another by elementaryrow operations. If two linear systems have row equivalent augmented matrices, thenthey have the same solution set.
2. To solve A−→x = −→b , using row operations we transform the augmented matrix [A −→b ] into an "upper-triangular" form called an echelon form and then use back substitutions.
2.3 Echelon Forms
The leading entry of a row of a matrix is the left-most nonzero entry of the row.
Definition. An m× n matrix A is in echelon form (or REF=row echelon form) if
1. all zero rows are at the bottom,
2. all entries in a column of a leading entry below the leading entry are zeros, and
3. the leading entry of each row is to the right of all leading entries in the rows above it.
A is in reduced echelon form (or RREF=reduced row echelon form) if it satisfies two additionalconditions:
4. the leading entry of each row is 1 and
5. each leading 1 is the only nonzero entry in its column.
Example.

1. The following matrices are in REF:

[1 −2 1 0; 0 0 4 3; 0 0 0 0],  [1 −2 1 0; 0 0 4 3; 0 0 0 5]

2. The following matrices are in RREF:

[1 −2 0 0; 0 0 1 3; 0 0 0 0],  [1 −2 0 0; 0 0 1 0; 0 0 0 1]
Definition. A pivot position in a matrix A is a position of a leading 1 in the RREF of A, and the corresponding column is a pivot column. A pivot is a nonzero number in a pivot position of A that is used to create zeros below it in Gaussian elimination.
Example. Pivot positions of the last matrix are (1, 1), (2, 3), and (3, 4).
The Gaussian elimination or row reduction algorithm to get the REF of a matrix is explained by the following example:
Example. A = [0 3 −6 5 −5; 3 −7 8 −7 9; 3 −9 12 −9 15]
1. Start with the left-most nonzero column (the first pivot column) and make its top entry nonzero by interchanging rows if needed. This top nonzero entry is the pivot of the pivot column.

[0 3 −6 5 −5; 3 −7 8 −7 9; 3 −9 12 −9 15] −(R1↔R3)→ [3 −9 12 −9 15; 3 −7 8 −7 9; 0 3 −6 5 −5]

2. Create zeros below the pivot by row replacements.

[3 −9 12 −9 15; 3 −7 8 −7 9; 0 3 −6 5 −5] −(−R1+R2)→ [3 −9 12 −9 15; 0 2 −4 2 −6; 0 3 −6 5 −5]

3. Ignore the column and row of the current pivot and repeat the preceding steps on the remaining submatrix.

[3 −9 12 −9 15; 0 2 −4 2 −6; 0 3 −6 5 −5] −(−(3/2)R2+R3)→ [3 −9 12 −9 15; 0 2 −4 2 −6; 0 0 0 2 4]  (REF)

To get the RREF, start with the right-most pivot, make it 1 by scaling, and then create zeros above it by row replacements. Repeat for the rest of the pivots.

[3 −9 12 −9 15; 0 2 −4 2 −6; 0 0 0 2 4] −((1/2)R3)→ [3 −9 12 −9 15; 0 2 −4 2 −6; 0 0 0 1 2]
−(9R3+R1, −2R3+R2)→ [3 −9 12 0 33; 0 2 −4 0 −10; 0 0 0 1 2]
−((1/2)R2)→ [3 −9 12 0 33; 0 1 −2 0 −5; 0 0 0 1 2]
−(9R2+R1)→ [3 0 −6 0 −12; 0 1 −2 0 −5; 0 0 0 1 2]
−((1/3)R1)→ [1 0 −2 0 −4; 0 1 −2 0 −5; 0 0 0 1 2]  (RREF)
Remark. The above algorithm to get the RREF is called Gauss-Jordan elimination. The RREF of A is unique, as it does not depend on the elementary row operations applied to A.
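Gauss-Jordan elimination can be sketched as a short pure-Python function (the name `rref` is ours); with exact Fraction arithmetic it reproduces the RREF computed above:

```python
# Sketch of Gauss-Jordan elimination producing the RREF.
from fractions import Fraction

def rref(A):
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            continue                         # no pivot in this column
        M[r], M[piv] = M[piv], M[r]
        M[r] = [x / M[r][c] for x in M[r]]   # scale pivot to 1
        for i in range(rows):                # clear the rest of the column
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return M

A = [[0, 3, -6, 5, -5], [3, -7, 8, -7, 9], [3, -9, 12, -9, 15]]
for row in rref(A):
    print([str(x) for x in row])
# RREF rows: [1 0 -2 0 -4], [0 1 -2 0 -5], [0 0 0 1 2]
```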
Steps to solve a linear system A−→x =−→b (Gaussian elimination):
1. Find the RREF of the augmented matrix [A−→b ].
2. Write the system of linear equations corresponding to the RREF.
3. If the new system is inconsistent, there is no solution of the original system. Otherwise write the basic variables (variables corresponding to pivot columns) in terms of constants and free variables (non-basic variables, which correspond to non-pivot columns).
Example.

  x1 − 3x2 + 2x4 = 1
  2x1 − 6x2 + x3 + 10x4 = 0
 −x1 + 3x2 + x3 + 4x4 = −3

We find the RREF of the augmented matrix:

[1 −3 0 2 1; 2 −6 1 10 0; −1 3 1 4 −3] −(−2R1+R2, R1+R3)→ [1 −3 0 2 1; 0 0 1 6 −2; 0 0 1 6 −2] −(−R2+R3)→ [1 −3 0 2 1; 0 0 1 6 −2; 0 0 0 0 0]  (RREF)

The corresponding system is

  x1 − 3x2 + 2x4 = 1
  x3 + 6x4 = −2
  0 = 0

where x1 and x3 are basic variables (for pivot columns) and x2 and x4 are free variables (for non-pivot columns).

  x1 = 1 + 3x2 − 2x4
  x2 = free
  x3 = −2 − 6x4
  x4 = free

The solution set is {(1 + 3s − 2t, s, −2 − 6t, t) | s, t ∈ R}. If we solve the corresponding matrix equation A−→x = −→b , the solution set is

{ [1 + 3s − 2t; s; −2 − 6t; t] | s, t ∈ R } = { [1; 0; −2; 0] + s[3; 1; 0; 0] + t[−2; 0; −6; 1] | s, t ∈ R }.
Possibilities of solutions of A−→x = −→b from the RREF:

• The system has no solution (inconsistent) iff the RREF of [A −→b ] has a row of the form [0, . . . , 0, c], c ≠ 0.

• The system has a solution (consistent) iff the RREF of [A −→b ] has no row of the form [0, . . . , 0, c], c ≠ 0.

(a) Infinitely many solutions if the RREF of [A −→b ] has a non-pivot column that is not the last column (there is a free variable).

(b) Unique solution if all but the last column of the RREF of [A −→b ] are pivot columns (there is no free variable).
2.4 Geometry of Solution Sets
Homogeneous linear system: A system of linear equations is homogeneous if its matrix equation is A−→x = −→0 . Note that −→0 is always a solution, called the trivial solution. Any nonzero solution is called a nontrivial solution.
Example.

1.
  x1 + x2 − x3 = 0
  3x2 − 2x3 = 0

The corresponding matrix equation A−→x = −→0 has the solution set

{ s[1; 2; 3] | s ∈ R },

which is also denoted by Span{ [1; 2; 3] }. This solution set corresponds to the points on the line in the 3-space R3 passing through the point (1, 2, 3) and the origin (0, 0, 0). Recall that the vector [1; 2; 3] is the position vector of the point (1, 2, 3), which is a directed line segment from the origin (0, 0, 0) to the point (1, 2, 3).
2.
  x1 − x2 − 2x3 = 0

The corresponding matrix equation A−→x = −→0 has the solution set

{ s[1; 1; 0] + t[2; 0; 1] | s, t ∈ R } = Span{ [1; 1; 0], [2; 0; 1] }.

This solution set corresponds to the points on the plane in the 3-space R3 passing through the points (1, 1, 0), (2, 0, 1), and the origin (0, 0, 0).
Remark. If A−→x = −→0 has k free variables, then its solution set is the span of k vectors: the solution set of A−→x = −→0 is Span{−→v1 , . . . , −→vk } for some vectors −→v1 , . . . , −→vk .

The solution set of A−→x = −→b is {−→p + −→v | A−→v = −→0 }, where A−→p = −→b . So a nonhomogeneous solution is a sum of a particular solution and a homogeneous solution. To justify it, let −→y be a solution of A−→x = −→b , i.e., A−→y = −→b . Then

A(−→y − −→p ) = −→b − −→b = −→0 .
[Figure: the solution set of A−→x = −→b is a translation of the solution set of A−→x = −→0 along the vector −→p .]

Then −→y − −→p = −→v where A−→v = −→0 . Thus −→y = −→p + −→v .

Geometrically we get the solution set of A−→x = −→b by shifting the solution set of A−→x = −→0 to the point whose position vector is −→p , along the vector −→p .
Example. The nonhomogeneous system x1 − x2 − 2x3 = −2 has a particular solution −→p = [1; 1; 1]. The corresponding homogeneous system x1 − x2 − 2x3 = 0 has the solution set

{ s[1; 1; 0] + t[2; 0; 1] | s, t ∈ R }.

Thus the solution set of the nonhomogeneous system x1 − x2 − 2x3 = −2 is

{ [1; 1; 1] + s[1; 1; 0] + t[2; 0; 1] | s, t ∈ R }.
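A quick numerical spot-check (assumed helper code, not from the notes) confirms that every sampled point of the translated set satisfies the nonhomogeneous equation:

```python
# Sketch: spot-checking that p + s*v1 + t*v2 satisfies x1 - x2 - 2*x3 = -2
# for several choices of the free parameters s and t.

p = (1, 1, 1)          # particular solution of the nonhomogeneous system
v1 = (1, 1, 0)         # spanning vectors of the homogeneous solution set
v2 = (2, 0, 1)

for s in (-2, 0, 3):
    for t in (-1, 0, 5):
        x = tuple(p[i] + s * v1[i] + t * v2[i] for i in range(3))
        assert x[0] - x[1] - 2 * x[2] == -2, (s, t, x)
print("all sampled points satisfy x1 - x2 - 2*x3 = -2")
```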
3 Fundamental Linear Algebraic Concepts on Rn
3.1 Linear Span and Subspaces
Definition. A linear combination of vectors −→v1 , −→v2 , . . . , −→vk of Rn is a sum of their scalar multiples, i.e.,

c1−→v1 + c2−→v2 + · · · + ck−→vk

for some scalars c1, c2, . . . , ck. The set of all linear combinations of a nonempty set S of vectors of Rn is called the linear span or span of S, denoted by Span(S) or SpanS, i.e.,

Span{−→v1 , −→v2 , . . . , −→vk } = { c1−→v1 + c2−→v2 + · · · + ck−→vk | c1, c2, . . . , ck ∈ R }.

We define Span ∅ = {−→0 }. When Span{−→v1 , . . . , −→vk } = Rn, we say {−→v1 , . . . , −→vk } spans Rn.
Example. For S = { [1; 1; 0], [1; 2; 0] },

Span(S) = { c1[1; 1; 0] + c2[1; 2; 0] | c1, c2 ∈ R }.

Note that [0, 0, 1]T is not in Span(S) because there are no c1, c2 for which

[0; 0; 1] = c1[1; 1; 0] + c2[1; 2; 0].

Thus S does not span R3. But any vector of the form [a, b, 0]T is in Span(S) because

x1[1; 1; 0] + x2[1; 2; 0] = [a; b; 0] =⇒ x1 = 2a − b, x2 = −a + b,

i.e.,

[a; b; 0] = (2a − b)[1; 1; 0] + (−a + b)[1; 2; 0] ∈ Span(S).

Thus S spans the set

Span(S) = { [a; b; 0] | a, b ∈ R },

which is the xy-plane in R3.
Definition. A subspace of Rn is a nonempty subset S of Rn that satisfies three properties:
(a)−→0 is in S.
(b) −→u +−→v is in S for all −→u , −→v in S.
(c) c−→u is in S for all −→u in S and all scalars c.
In short, a subspace of Rn is a nonempty subset S of Rn that is closed under linear combinations of vectors, i.e., c−→u + d−→v is in S for all −→u , −→v in S and all scalars c, d. When S is a subspace of Rn, we sometimes denote it by S ≤ Rn.
Example.
1. {−→0 },Rn ≤ Rn, i.e., {−→0 } and Rn are subspaces of Rn.
2. Show that S = { [x; y] | x, y ∈ R, 2x − y = 0 } is a subspace of R2.

Solution.

(a) [0; 0] ∈ S because 2 · 0 − 0 = 0.

(b) Let −→u , −→v ∈ S and c ∈ R. Then −→u = [x1; y1] and −→v = [x2; y2] for some x1, x2, y1, y2 ∈ R such that 2x1 − y1 = 0 and 2x2 − y2 = 0. Then

−→u + −→v = [x1; y1] + [x2; y2] = [x1 + x2; y1 + y2] ∈ S

because 2(x1 + x2) − (y1 + y2) = (2x1 − y1) + (2x2 − y2) = 0.

(c)

c−→u = c[x1; y1] = [cx1; cy1] ∈ S

because 2(cx1) − (cy1) = c(2x1 − y1) = 0.

Thus S (which is the line y = 2x) is a subspace of R2.
3. Let S = { [1; 1; 0], [1; 2; 0] }. Then Span(S) is a subspace of R3.

First note that [0; 0; 0] = 0[1; 1; 0] + 0[1; 2; 0] ∈ Span(S). Thus Span(S) ≠ ∅.

Let −→u , −→v ∈ Span(S) and c, d ∈ R. Then

−→u = c1[1; 1; 0] + c2[1; 2; 0] and −→v = d1[1; 1; 0] + d2[1; 2; 0]

for some c1, c2, d1, d2 ∈ R. Then

c−→u + d−→v = c(c1[1; 1; 0] + c2[1; 2; 0]) + d(d1[1; 1; 0] + d2[1; 2; 0]) = (cc1 + dd1)[1; 1; 0] + (cc2 + dd2)[1; 2; 0] ∈ Span(S).

Thus Span(S) (which is the xy-plane) is a subspace of R3.
Theorem 3.1. Let −→v1 , −→v2 , . . . , −→vk ∈ Rn. Then Span{−→v1 , −→v2 , . . . , −→vk } is a subspace of Rn.

Proof. Since −→v1 ∈ Span{−→v1 , . . . , −→vk }, Span{−→v1 , . . . , −→vk } ≠ ∅. Let −→u , −→v ∈ Span{−→v1 , . . . , −→vk } and c, d ∈ R. Then −→u = c1−→v1 + c2−→v2 + · · · + ck−→vk and −→v = d1−→v1 + d2−→v2 + · · · + dk−→vk for some c1, . . . , ck, d1, . . . , dk ∈ R. Then

c−→u + d−→v = c(c1−→v1 + · · · + ck−→vk ) + d(d1−→v1 + · · · + dk−→vk ) = (cc1 + dd1)−→v1 + (cc2 + dd2)−→v2 + · · · + (cck + ddk)−→vk ∈ Span{−→v1 , . . . , −→vk }.
For a given matrix we have two important subspaces: the column space and the null space.
Definition. The column space of an m× n matrix A = [−→a1 −→a2 · · · −→an], denoted by CS (A) orColA, is the span of its column vectors:
CS (A) = Span{−→a1 ,−→a2 , . . . ,−→an}.
Remark. Since each column is an m-dimensional vector, CS (A) is a subspace of Rm.
Example. For A = [1 2 3; 0 4 5], CS (A) = Span{ [1; 0], [2; 4], [3; 5] } ≤ R2.
Example. Let A = [1 −3 −4; −4 6 −2; −3 7 6] and −→b = [3; 3; −4]. Determine if −→b is in CS (A).

Note that −→b ∈ CS (A) if and only if −→b is a linear combination of the columns of A if and only if A−→x = −→b has a solution.

[1 −3 −4 3; −4 6 −2 3; −3 7 6 −4] −(4R1+R2, 3R1+R3)→ [1 −3 −4 3; 0 −6 −18 15; 0 −2 −6 5] −(−(1/3)R2+R3)→ [1 −3 −4 3; 0 −6 −18 15; 0 0 0 0]  (REF)

Since the REF of [A −→b ] has no row of the form [0, 0, 0, c], c ≠ 0, A−→x = −→b is consistent and consequently −→b is in CS (A).
Theorem 3.2. An m × n matrix A has a pivot position in every row if and only if A−→x = −→b is consistent for any −→b ∈ Rm if and only if CS (A) = Rm.

Example. Since A = [1 2 3; 0 4 5] has a pivot position in each row, CS (A) = R2.
Definition. The null space of an m × n matrix A, denoted by NS (A) or NulA, is the solution set of A−→x = −→0 :

NS (A) = { −→x ∈ Rn | A−→x = −→0 }.
Theorem 3.3. Let A be an m × n matrix. Then NS (A) is a subspace of Rn.

Proof. Since A−→0 = −→0 , −→0 ∈ NS (A). Thus NS (A) ≠ ∅. Let −→u , −→v ∈ NS (A) and c, d ∈ R. Then A−→u = −→0 and A−→v = −→0 . Then

A(c−→u + d−→v ) = c(A−→u ) + d(A−→v ) = c−→0 + d−→0 = −→0 .

Thus c−→u + d−→v ∈ NS (A).
Example. Let A = [1 1 −1; 0 3 −2]. Find NS (A).

We find the solution set of A−→x = −→0 .

[A −→0 ] = [1 1 −1 0; 0 3 −2 0] −((1/3)R2)→ [1 1 −1 0; 0 1 −2/3 0] −(−R2+R1)→ [1 0 −1/3 0; 0 1 −2/3 0]  (RREF)

The corresponding system is

  x1 − x3/3 = 0
  x2 − 2x3/3 = 0

where x1 and x2 are basic variables (for pivot columns) and x3 is a free variable (for the non-pivot column).

  x1 = x3/3
  x2 = 2x3/3
  x3 = free

NS (A) = { [x3/3; 2x3/3; x3] | x3 ∈ R } = { (x3/3)[1; 2; 3] | x3 ∈ R } = Span{ [1; 2; 3] }.
Remark. If an m × n matrix A has k non-pivot columns (i.e., k free variables for A−→x = −→0 ), then NS (A) is a span of k vectors in Rn. For a proof see Theorem 3.9.
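A quick check (the helper name `matvec` is ours) that the computed basis vector, and any scalar multiple of it, indeed lies in NS (A):

```python
# Sketch: verifying the computed null-space basis. A * v should be the zero
# vector for v = (1, 2, 3), and so should A * (c*v) for any scalar c.

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 1, -1], [0, 3, -2]]
v = (1, 2, 3)
print(matvec(A, v))  # [0, 0]
for c in (-4, 0, 7):
    assert matvec(A, [c * x for x in v]) == [0, 0]
```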
3.2 Linear Independence
Definition. A set S = {−→v1 , −→v2 , . . . , −→vk } of vectors of Rn is linearly independent if the only linear combination of vectors in S that produces −→0 is the trivial linear combination, i.e.,

c1−→v1 + c2−→v2 + · · · + ck−→vk = −→0 =⇒ c1 = c2 = · · · = ck = 0.

S = {−→v1 , −→v2 , . . . , −→vk } is linearly dependent if S is not linearly independent, i.e., there are scalars c1, c2, . . . , ck, not all zero, such that

c1−→v1 + c2−→v2 + · · · + ck−→vk = −→0 .
Remark.

1. {−→0 } is linearly dependent as 2−→0 = −→0 .

2. {−→v } is linearly independent if and only if −→v ≠ −→0 .

3. Let S = {−→v1 , −→v2 , . . . , −→vk } and A = [−→v1 −→v2 · · · −→vk ]. Then S is linearly independent if and only if −→0 is the only solution of A−→x = −→0 if and only if NS (A) = {−→0 }.
Example.

1. Determine if the following vectors are linearly independent:

−→v1 = [1; 2], −→v2 = [2; 3]

We investigate if c1−→v1 + c2−→v2 = −→0 =⇒ c1 = c2 = 0.

[A −→0 ] = [1 2 0; 2 3 0] −(−2R1+R2)→ [1 2 0; 0 −1 0]  (REF)

Each column of A is a pivot column, giving no free variables. So there is a unique solution of A−→x = −→0 , which is −→0 . Thus −→v1 and −→v2 are linearly independent. Note that neither of −→v1 and −→v2 is a multiple of the other.
2. Determine if the columns of A are linearly independent for A = [1 2 3 4; 1 3 5 8; 1 2 4 7].

A = [1 2 3 4; 1 3 5 8; 1 2 4 7] −(−R1+R2, −R1+R3)→ [1 2 3 4; 0 1 2 4; 0 0 1 3]  (REF)

A has a non-pivot column, giving a free variable. So there are infinitely many solutions of A−→x = −→0 . Thus the columns of A are linearly dependent. Verify that one solution is (x1, x2, x3, x4) = (1, 2, −3, 1). So we get the following linear dependence relation among the columns of A:

1[1; 1; 1] + 2[2; 3; 2] − 3[3; 5; 4] + 1[4; 8; 7] = [0; 0; 0].
Remark. The columns of an m × n matrix are linearly dependent when m < n, because then A has a non-pivot column, giving a free variable for solutions of the system A−→x = −→0 .
Theorem 3.4. A set S = {−→v1 , −→v2 , . . . , −→vk } of k ≥ 2 vectors in Rn is linearly dependent if and only if there exists a vector in S that is a linear combination of the other vectors in S.

Proof. Let S = {−→v1 , −→v2 , . . . , −→vk } be a set of k ≥ 2 vectors in Rn. First suppose S is linearly dependent. Then there are scalars c1, c2, . . . , ck, not all zero, such that

c1−→v1 + c2−→v2 + · · · + ck−→vk = −→0 .

Choose i ∈ {1, 2, . . . , k} such that ci ≠ 0. Then

c1−→v1 + · · · + ck−→vk = −→0 =⇒ −ci−→vi = c1−→v1 + · · · + ci−1−−→vi−1 + ci+1−−→vi+1 + · · · + ck−→vk
=⇒ −→vi = −(c1/ci)−→v1 − · · · − (ci−1/ci)−−→vi−1 − (ci+1/ci)−−→vi+1 − · · · − (ck/ci)−→vk .

Conversely suppose there is i ∈ {1, 2, . . . , k} such that

−→vi = d1−→v1 + · · · + di−1−−→vi−1 + di+1−−→vi+1 + · · · + dk−→vk

for some scalars d1, . . . , di−1, di+1, . . . , dk. Then we have a nontrivial linear combination producing −→0 :

d1−→v1 + · · · + di−1−−→vi−1 − −→vi + di+1−−→vi+1 + · · · + dk−→vk = −→0 .

Thus S = {−→v1 , −→v2 , . . . , −→vk } is linearly dependent in Rn.
Example. For A = [−→a1 −→a2 −→a3 −→a4 ] = [1 2 3 4; 1 3 5 8; 1 2 4 7], we have shown that the columns are linearly dependent and −→a1 + 2−→a2 − 3−→a3 + −→a4 = −→0 . We can write the first column in terms of the other columns: −→a1 = −2−→a2 + 3−→a3 − −→a4 . In fact we can write any column in terms of the others (which may not be the case for an arbitrary linearly dependent set of vectors).
3.3 Basis and Dimensions
Definition. A basis of a nontrivial subspace S of Rn is a subset B of S such that
(a) Span(B) = S and
(b) B is a linearly independent set.
We define the basis of the trivial subspace {−→0 } to be B = ∅. The number of vectors in a basis B is the dimension of S, denoted by dim (S) or dimS.
Example.

1. For the subspace S = { [x; y] | x, y ∈ R, 2x − y = 0 } of R2,

S = Span{ [1; 2] }.

Also { [1; 2] } is linearly independent. Thus B = { [1; 2] } is a basis of S and dim (S) = |B| = 1. Note that there are infinitely many bases of S.

2. Among infinitely many bases of Rn,

B = {−→e1 , −→e2 , . . . , −→en } = { [1; 0; . . . ; 0], [0; 1; . . . ; 0], . . . , [0; 0; . . . ; 1] }

is called the standard basis of Rn. For any −→x = [x1, x2, . . . , xn]T ∈ Rn,

−→x = x1−→e1 + x2−→e2 + · · · + xn−→en ∈ Span(B).

Thus Span(B) = Rn. To show linear independence, let x1−→e1 + x2−→e2 + · · · + xn−→en = −→0 , i.e.,

[x1; x2; . . . ; xn] = [0; 0; . . . ; 0] =⇒ x1 = x2 = · · · = xn = 0.

So B is linearly independent. Thus B is a basis of Rn and dim (Rn) = |B| = n.
Now we present some important theorems regarding bases of a subspace of Rn.

Theorem 3.5 (Unique Representation Theorem). Let S be a subspace of Rn. Then B = {−→b1 , −→b2 , . . . , −→bk } is a basis of S if and only if each vector −→v of S is a unique linear combination of −→b1 , −→b2 , . . . , −→bk , i.e., −→v = c1−→b1 + c2−→b2 + · · · + ck−→bk for unique scalars c1, c2, . . . , ck.

Proof. Let B = {−→b1 , −→b2 , . . . , −→bk } be a basis of S. Consider a vector −→v of S. Since S = SpanB, −→v = c1−→b1 + c2−→b2 + · · · + ck−→bk for some scalars c1, c2, . . . , ck. To show these scalars are unique, let −→v = d1−→b1 + d2−→b2 + · · · + dk−→bk for some scalars d1, d2, . . . , dk. Then

−→v − −→v = (c1−→b1 + c2−→b2 + · · · + ck−→bk ) − (d1−→b1 + d2−→b2 + · · · + dk−→bk )
−→0 = (c1 − d1)−→b1 + (c2 − d2)−→b2 + · · · + (ck − dk)−→bk .

Since B = {−→b1 , −→b2 , . . . , −→bk } is linearly independent, (c1 − d1) = (c2 − d2) = · · · = (ck − dk) = 0, which implies d1 = c1, d2 = c2, . . . , dk = ck. The converse follows similarly (exercise).
Theorem 3.6 (Reduction Theorem). Let S be a subspace of Rn. If a set B = {−→b1 , −→b2 , . . . , −→bk } of vectors of S spans S, then either B is a basis of S or a subset of B is a basis of S.

Proof. Suppose B = {−→b1 , −→b2 , . . . , −→bk } spans S. If B is linearly independent, then B is a basis of S. Otherwise there is a vector, say −→b1 , which is a linear combination of the other vectors in B. Let B1 = B \ {−→b1 } = {−→b2 , . . . , −→bk }. We can verify that SpanB1 = SpanB = S. If B1 is linearly independent, then B1 is a basis of S. Otherwise there is a vector, say −→b2 , which is a linear combination of the other vectors in B1. Let B2 = B1 \ {−→b2 } = {−→b3 , . . . , −→bk }. We can verify that SpanB2 = SpanB1 = S. Proceeding this way we end up with a subset Bm of B for some m ≤ k such that Bm is linearly independent and SpanBm = S, which means Bm is a basis of S.
Similarly we can prove the following:
Theorem 3.7 (Extension Theorem). Let S be a subspace of Rn. If a set B = {−→b1 , −→b2 , . . . , −→bk } of vectors of S is linearly independent, then either B is a basis of S or a superset of B is a basis of S.
Example. Use the Reduction Theorem to find a basis of CS (A) for A = [1 2 3 4; 1 3 5 8; 1 2 4 7].

Write A = [−→a1 −→a2 −→a3 −→a4 ] and B = {−→a1 , −→a2 , −→a3 , −→a4 }. Then CS (A) = SpanB. Verify that −→a4 = −−→a1 − 2−→a2 + 3−→a3 (exercise). Then B is not linearly independent and

CS (A) = SpanB = Span{−→a1 , −→a2 , −→a3 , −→a4 } = Span{−→a1 , −→a2 , −→a3 }.

Verify that {−→a1 , −→a2 , −→a3 } is linearly independent. Thus {−→a1 , −→a2 , −→a3 } is a basis of CS (A).
Definition. The rank of a matrix A, denoted by rank (A), is the dimension of its columnspace, i.e., rank (A) = dim (CS (A)).
Theorem 3.8. The pivot columns of a matrix A form a basis for CS (A), and rank (A) is the number of pivot columns of A.

Proof. (Sketch) Suppose R is the RREF of A. Then A−→x = −→0 if and only if R−→x = −→0 , i.e., a linear dependence relation among the columns of A is the same as that among the columns of R. Since the pivot columns of R are linearly independent, so are the pivot columns of A. By the Reduction Theorem we can show that the pivot columns of R span CS (R). Then the pivot columns of A span CS (A). Thus the pivot columns of A form a basis for CS (A) and rank (A) = dim (CS (A)) is the number of pivot columns of A.
Remark. If R is the RREF of A, then CS (A) ≠ CS (R) in general. Consider A = [1 2; 1 2]. Then R = RREF(A) = [1 2; 0 0] and CS (A) = Span{ [1; 1] } ≠ Span{ [1; 0] } = CS (R).
Example. Find rank (A) and a basis of CS (A) for A = [1 2 3 4; 1 3 5 8; 1 2 4 7].

A = [1 2 3 4; 1 3 5 8; 1 2 4 7] −(−R1+R2, −R1+R3)→ [1 2 3 4; 0 1 2 4; 0 0 1 3]  (REF)

Since A has 3 pivot columns −→a1 , −→a2 , and −→a3 , rank (A) = 3 and a basis of CS (A) is {−→a1 , −→a2 , −→a3 }, i.e.,

{ [1; 1; 1], [2; 3; 2], [3; 5; 4] }.
Definition. The nullity of a matrix A, denoted by nullity (A), is the dimension of its null space, i.e., nullity (A) = dim (NS (A)).
Theorem 3.9. nullity (A) is the number of non-pivot columns of A.
Proof. (Sketch) Suppose $B = [\vec{b}_1\ \vec{b}_2 \cdots \vec{b}_n]$ is the RREF of an m × n matrix A. Then $A\vec{x} = \vec{0}$ if and only if $B\vec{x} = \vec{0}$, i.e., NS(A) = NS(B). Suppose $\vec{b}_1, \vec{b}_2, \ldots, \vec{b}_k$ are the pivot columns of B and the rest are non-pivot columns. Then for $i = k+1, \ldots, n$,

$\vec{b}_i = c_{i1}\vec{b}_1 + c_{i2}\vec{b}_2 + \cdots + c_{ik}\vec{b}_k = \sum_{j=1}^{k} c_{ij}\vec{b}_j \quad \text{for some } c_{ij} \in \mathbb{R}.$

$B\vec{x} = \vec{0} \implies x_1\vec{b}_1 + x_2\vec{b}_2 + \cdots + x_n\vec{b}_n = \vec{0}$

$\implies x_1\vec{b}_1 + \cdots + x_k\vec{b}_k + x_{k+1}\left(\sum_{j=1}^{k} c_{k+1,j}\vec{b}_j\right) + \cdots + x_n\left(\sum_{j=1}^{k} c_{n,j}\vec{b}_j\right) = \vec{0}$

$\implies \left(x_1 + \sum_{j=k+1}^{n} x_j c_{j,1}\right)\vec{b}_1 + \cdots + \left(x_k + \sum_{j=k+1}^{n} x_j c_{j,k}\right)\vec{b}_k = \vec{0}$
Since $\{\vec{b}_1, \vec{b}_2, \ldots, \vec{b}_k\}$ is linearly independent, $x_i = -\sum_{j=k+1}^{n} x_j c_{j,i}$ for $i = 1, \ldots, k$. Then we can write $\vec{x}$ as a linear combination of n − k linearly independent vectors that span NS(B) (exercise). Thus dim(NS(A)) = dim(NS(B)) = n − k.
Remark. The non-pivot columns of A do not form a basis for NS (A).
Example. Find nullity(A) and a basis of NS(A) for $A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 5 & 8 \\ 1 & 2 & 4 & 7 \end{bmatrix}$.

$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 5 & 8 \\ 1 & 2 & 4 & 7 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & 3 \end{bmatrix} \quad (\text{RREF})$
Since A has one non-pivot column, nullity(A) = 1. To find a basis of NS(A), we solve $A\vec{x} = \vec{0}$, which becomes

$x_1 - x_4 = 0, \quad x_2 - 2x_4 = 0, \quad x_3 + 3x_4 = 0,$

where x1, x2, and x3 are basic variables and x4 is a free variable. Thus

$x_1 = x_4, \quad x_2 = 2x_4, \quad x_3 = -3x_4, \quad x_4 \text{ free}.$
$\mathrm{NS}(A) = \left\{\begin{bmatrix} x_4 \\ 2x_4 \\ -3x_4 \\ x_4 \end{bmatrix} \;\middle|\; x_4 \in \mathbb{R}\right\} = \left\{x_4\begin{bmatrix} 1 \\ 2 \\ -3 \\ 1 \end{bmatrix} \;\middle|\; x_4 \in \mathbb{R}\right\} = \mathrm{Span}\left\{\begin{bmatrix} 1 \\ 2 \\ -3 \\ 1 \end{bmatrix}\right\}.$

Thus a basis of NS(A) is $\left\{\begin{bmatrix} 1 \\ 2 \\ -3 \\ 1 \end{bmatrix}\right\}$.
Theorem 3.10 (Rank-Nullity Theorem). For an m× n matrix A,
rank (A) + nullity (A) = n.
Proof. rank(A) + nullity(A) is the number of pivot columns of A plus the number of non-pivot columns of A, which is n, the total number of columns.
Example. If A is a 4× 5 matrix with rank 3, then by the Rank-Nullity Theorem
nullity (A) = n− rank (A) = 5− 3 = 2.
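The theorem is easy to confirm numerically. A short NumPy check (an illustration, not part of the notes) for the 3 × 4 matrix used throughout this section:

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [1, 3, 5, 8],
              [1, 2, 4, 7]])
n = A.shape[1]                       # number of columns
rank = np.linalg.matrix_rank(A)
nullity = n - rank                   # Rank-Nullity Theorem

# The basis vector of NS(A) found above indeed satisfies A x = 0.
x = np.array([1, 2, -3, 1])
residual = A @ x
```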
Now we investigate the relation of rank (A) with the dimension of the row space of A.
Definition. Each row of an m × n matrix A is called a row vector, which can be identified with a (column) vector in Rn. The row space of an m × n matrix $A = \begin{bmatrix} \vec{r}_1 \\ \vec{r}_2 \\ \vdots \\ \vec{r}_m \end{bmatrix}$, denoted by RS(A) or Row A, is the span of its row vectors:

$\mathrm{RS}(A) = \mathrm{Span}\{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_m\}.$
Remark.
1. Since each row is an n-dimensional vector, RS(A) is a subspace of Rn.

2. Row i of A is column i of AT. Then RS(A) = CS(AT).

3. Elementary row operations may change the linear dependence relations among rows (unlike columns), but they do not change the row space. For example,

RS(A) = RS(RREF of A).
Example. Consider $A = \begin{bmatrix} 2 & 0 & 1 & 0 \\ 0 & 1 & -1 & 1 \\ 2 & 1 & 0 & 1 \end{bmatrix}$. Write $A = \begin{bmatrix} \vec{r}_1 \\ \vec{r}_2 \\ \vec{r}_3 \end{bmatrix}$, where $\vec{r}_1 = [2, 0, 1, 0]$, $\vec{r}_2 = [0, 1, -1, 1]$, $\vec{r}_3 = [2, 1, 0, 1]$. Then RS(A) = CS(AT) = Span$\{\vec{r}_1, \vec{r}_2, \vec{r}_3\}$ is a subspace of R4.

$A = \begin{bmatrix} 2 & 0 & 1 & 0 \\ 0 & 1 & -1 & 1 \\ 2 & 1 & 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 2 & 0 & 1 & 0 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} = R \quad (\text{REF})$

Note that $\vec{r}_3 = \vec{r}_1 + \vec{r}_2$ in A, but not in R. Since row 3 of R is $-\vec{r}_1 - \vec{r}_2 + \vec{r}_3$ in terms of the rows of A, the span of the rows of R is the same as that of A, i.e., RS(R) = RS(A). Note that the nonzero rows of R are linearly independent and span RS(R) = RS(A), i.e., they form a basis of RS(R) = RS(A).
Definition. The row rank of a matrix A is the dimension of its row space.
Theorem 3.11. Let A be an m × n matrix with REF R. Then the nonzero rows of R form a basis for RS(R) = RS(A), and the row rank of A = the (column) rank of A = the number of pivot positions of A.

Proof. Each nonzero row of R is not a linear combination of the other nonzero rows. Thus the nonzero rows of R are linearly independent and span RS(R) = RS(A), i.e., they form a basis of RS(R) = RS(A). Recall that the rank of A is the number of pivot columns (hence pivot positions) of R. The number of pivot positions of R equals the number of nonzero rows of R, which is the row rank of R and consequently the row rank of A.
Remark. For an m × n matrix A, 0 ≤ rank(A) ≤ min{m, n}.

Example.
1. For the 3 × 4 matrix A in the preceding example, rank(A) ≤ min{3, 4} = 3. Since it has two nonzero rows in its REF, the row rank of A = rank(A) = 2.
2. What are the smallest and largest possible nullities of a 5 × 7 matrix A?
First note 0 ≤ rank(A) ≤ min{5, 7} = 5. Now by the Rank-Nullity Theorem, nullity(A) = 7 − rank(A) ≥ 7 − 5 = 2. So the smallest possible nullity of A is 2. In that case the row rank of A = rank(A) = 5. Similarly nullity(A) = 7 − rank(A) ≤ 7. So the largest possible nullity of A is 7. In that case the row rank of A = rank(A) = 0.
3.4 Linear Transformations
Definition. A function T : V → W from a subspace V of Rn to a subspace W of Rm is called a linear transformation if
(a) T (−→u +−→v ) = T (−→u ) + T (−→v ) for all −→u ,−→v ∈ V and
(b) T (c−→v ) = cT (−→v ) for all −→v ∈ V and all scalars c ∈ R.
In short, a function T : V → W is a linear transformation if it preserves linearity among vectors: T (c−→u + d−→v ) = cT (−→u ) + dT (−→v ) for all −→u ,−→v ∈ V and all scalars c, d ∈ R.
Example.
1. The projection T : R3 → R3 of R3 onto the xy-plane in R3 is defined by

$T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 \\ x_2 \\ 0 \end{bmatrix} \quad \text{for all } \vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3.$
Sometimes it is simply denoted by T (x1, x2, x3) = (x1, x2, 0) in terms of row vectors. To show it is a linear transformation, let −→x = (x1, x2, x3) and −→y = (y1, y2, y3) be in R3
and c, d ∈ R. Then
T (c−→x + d−→y ) = T (cx1 + dy1, cx2 + dy2, cx3 + dy3)
= (cx1 + dy1, cx2 + dy2, 0)
= (cx1, cx2, 0) + (dy1, dy2, 0)
= cT (−→x ) + dT (−→y ).
2. For the matrix $A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$, define the shear transformation T : R2 → R2 by T (−→x ) =
A−→x . Let −→x ,−→y ∈ R2 and c, d ∈ R. Then
T (c−→x + d−→y ) = A(c−→x + d−→y ) = cA−→x + dA−→y = cT (−→x ) + dT (−→y ).
Thus T is a linear transformation which transforms the square with vertices (0, 0), (1, 0), (1, 1), (0, 1) to the parallelogram with vertices (0, 0), (1, 0), (3, 1), (2, 1).
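A small NumPy sketch (illustrative only, not part of the notes) applies the shear to the four corners of the unit square, stored one corner per column:

```python
import numpy as np

A = np.array([[1, 2],
              [0, 1]])   # shear matrix from the example

# Corners of the unit square, one per column: (0,0), (1,0), (1,1), (0,1).
square = np.array([[0, 1, 1, 0],
                   [0, 0, 1, 1]])

image = A @ square       # corners of the image parallelogram
```

The image columns are (0, 0), (1, 0), (3, 1), (2, 1), matching the parallelogram above.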
Definition. A matrix transformation is the linear transformation T : Rn → Rm defined by T (−→x ) = A−→x for some m × n matrix A. It is denoted by −→x 7→ A−→x .
From the definition of a linear transformation we have the following properties.
Proposition. For a linear transformation T : V → W where V ≤ Rn and W ≤ Rm,

(a) $T(\vec{0}_n) = \vec{0}_m$ and

(b) for all $\vec{v}_1, \ldots, \vec{v}_k \in V$ and all $c_1, \ldots, c_k \in \mathbb{R}$,

$T(c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k) = c_1T(\vec{v}_1) + c_2T(\vec{v}_2) + \cdots + c_kT(\vec{v}_k).$
Example. Consider the function T : R3 → R3 defined by T (x1, x2, x3) = (x1, x2, 5). Since T (0, 0, 0) = (0, 0, 5) ≠ (0, 0, 0), T is not a linear transformation.
Theorem 3.12. For a linear transformation T : Rn → Rm, there exists a unique m × n matrix A, called the standard matrix of T, for which

T (−→x ) = A−→x for all −→x ∈ Rn.

Moreover, A = [T (−→e1) T (−→e2) · · · T (−→en)], where −→ei is the ith column of In.
Proof. Let $\vec{x} = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^n$. We can write $\vec{x} = x_1\vec{e}_1 + x_2\vec{e}_2 + \cdots + x_n\vec{e}_n$. Then

$T(\vec{x}) = T(x_1\vec{e}_1 + \cdots + x_n\vec{e}_n) = x_1T(\vec{e}_1) + x_2T(\vec{e}_2) + \cdots + x_nT(\vec{e}_n) = [T(\vec{e}_1)\ T(\vec{e}_2) \cdots T(\vec{e}_n)]\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = A\vec{x}.$
Example.
1. Use the standard matrix to find the rotation transformation T : R2 → R2 that rotateseach point of R2 about the origin through an angle θ counterclockwise.
By trigonometry we have

$T(\vec{e}_1) = T\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \quad \text{and} \quad T(\vec{e}_2) = T\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}.$

Then the standard matrix is $A = [T(\vec{e}_1)\ T(\vec{e}_2)] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$. Thus

$T(\vec{x}) = A\vec{x}, \quad \text{i.e.,} \quad T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1\cos\theta - x_2\sin\theta \\ x_1\sin\theta + x_2\cos\theta \end{bmatrix} \quad \text{for all } \vec{x} \in \mathbb{R}^2.$
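The rotation matrix can be built exactly as in the derivation above. A NumPy sketch (the helper name `rotation_matrix` is ours, not from the notes):

```python
import numpy as np

def rotation_matrix(theta):
    """Standard matrix of counterclockwise rotation by theta,
    built from the images T(e1), T(e2) as columns."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

A = rotation_matrix(np.pi / 2)    # 90-degree counterclockwise rotation
image = A @ np.array([1.0, 0.0])  # e1 should rotate to e2
```

Rotation matrices are orthogonal, so $AA^T = I_2$ as well.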
2. Consider the linear transformation T : R2 → R3 defined by

T (x1, x2) = (x1 − x2, 2x1 + 3x2, 4x2).

Note that T (−→e1) = T (1, 0) = (1, 2, 0) and T (−→e2) = T (0, 1) = (−1, 3, 4). The standard matrix of T is

$A = [T(\vec{e}_1)\ T(\vec{e}_2)] = \begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 0 & 4 \end{bmatrix}.$

For any given linear transformation T : Rn → Rm, the domain space is Rn and the codomain space is Rm. We study a subspace of the domain space called the kernel or null space, and a subspace of the codomain space called the image space or range.
Definition. The kernel or null space of a linear transformation T : Rn → Rm, denoted by ker(T) or ker T, is the following subspace of Rn:

$\ker T = \{\vec{x} \in \mathbb{R}^n \mid T(\vec{x}) = \vec{0}_m\}.$
The nullity of T , denoted by nullity (T ), is the dimension of kerT , i.e.,
nullity (T ) = dim (kerT ) .
Remark. If A is the standard matrix of a linear transformation T : Rn → Rm, then ker T = NS(A) and nullity(T) = nullity(A).
Example. The linear transformation T : R3 → R2 defined by T (x1, x2, x3) = (x1, x2) has the standard matrix $A = [T(\vec{e}_1)\ T(\vec{e}_2)\ T(\vec{e}_3)] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$. Note that

$\ker T = \mathrm{NS}(A) = \mathrm{Span}\left\{\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right\},$
and nullity (T ) = nullity (A) = 1.
Definition. The image space or range of a linear transformation T : Rn → Rm, denoted by im(T) or im T or T(Rn), is the following subspace of Rm:
imT = {T (−→x ) | −→x ∈ Rn}.
The rank of T , denoted by rank (T ), is the dimension of imT , i.e.,
rank (T ) = dim (imT ) .
Remark. If A is the standard matrix of a linear transformation T : Rn → Rm, then im T = CS(A) and rank(T) = rank(A).
Example. The linear transformation T : R2 → R3 defined by T (x1, x2) = (x1, x2, 0) has the standard matrix $A = [T(\vec{e}_1)\ T(\vec{e}_2)] = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$. Note that

$\operatorname{im} T = \mathrm{CS}(A) = \mathrm{Span}\left\{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\right\},$
and rank (T ) = rank (A) = 2.
Theorem 3.13 (Rank-Nullity Theorem). For a linear transformation T : Rn → Rm,
rank (T ) + nullity (T ) = n.
Proof. Let A be the m×n standard matrix of T . Then by the Rank-Nullity Theorem on A,
rank (T ) + nullity (T ) = rank (A) + nullity (A) = n.
Example. The linear transformation T : R3 → R2 defined by T (x1, x2, x3) = (x1, x2) has nullity(T) = 1 (see the earlier example). Then by the Rank-Nullity Theorem,
rank (T ) = 3− nullity (T ) = 2.
Now we discuss two important types of linear transformations T : Rn → Rm.
Definition. Let T : Rn → Rm be a linear transformation. T is onto if each $\vec{b} \in \mathbb{R}^m$ has a pre-image $\vec{x}$ in Rn under T, i.e., $T(\vec{x}) = \vec{b}$. T is one-to-one if each $\vec{b} \in \mathbb{R}^m$ has at most one pre-image in Rn under T.
Example.
1. The linear transformation T : R3 → R2 defined by T (x1, x2, x3) = (x1, x2) is onto because each (x1, x2) ∈ R2 has a pre-image (x1, x2, 0) ∈ R3 under T. But T is not one-to-one because T (0, 0, 0) = T (0, 0, 1) = (0, 0), i.e., (0, 0) has two distinct pre-images (0, 0, 0) and (0, 0, 1) under T.
2. The linear transformation T : R2 → R3 defined by T (x1, x2) = (x1, x2, 0) is one-to-one because T (x1, x2) = T (y1, y2) =⇒ (x1, x2, 0) = (y1, y2, 0) =⇒ (x1, x2) = (y1, y2). But T is not onto because (0, 0, 1) ∈ R3 has no pre-image (x1, x2) ∈ R2 under T.
3. The linear transformation T : R2 → R2 defined by T (x1, x2) = (x1 + x2, x1 − x2) isone-to-one and onto (exercise).
Theorem 3.14. Let T : Rn → Rm be a linear transformation with the standard matrix A. Then the following are equivalent.
(a) T (i.e., −→x 7→ A−→x ) is one-to-one.
(b) kerT = NS (A) = {−→0n}.
(c) nullity (T ) = nullity (A) = 0.
(d) The columns of A are linearly independent.
Proof. (b), (c), and (d) are equivalent by the definitions.

(a) =⇒ (b) Suppose T (i.e., −→x 7→ A−→x ) is one-to-one. Let $\vec{x} \in \ker T = \mathrm{NS}(A)$. Then $A\vec{x} = \vec{0}_m$. Also $\vec{0}_n \mapsto A\vec{0}_n = \vec{0}_m$. Since −→x 7→ A−→x is one-to-one, $\vec{x} = \vec{0}_n$. Thus $\mathrm{NS}(A) = \{\vec{0}_n\}$.

(b) =⇒ (a) Suppose $\ker T = \mathrm{NS}(A) = \{\vec{0}_n\}$. Let $\vec{x}, \vec{y} \in \mathbb{R}^n$ such that $A\vec{x} = A\vec{y}$. Then $A(\vec{x} - \vec{y}) = \vec{0}_m$, so $\vec{x} - \vec{y} \in \mathrm{NS}(A) = \{\vec{0}_n\}$, which implies $\vec{x} - \vec{y} = \vec{0}_n$, i.e., $\vec{x} = \vec{y}$. Thus −→x 7→ A−→x is one-to-one.
Example. The linear transformation T : R2 → R3 defined by T (x1, x2) = (x1, x2, 0) has the standard matrix $A = [T(\vec{e}_1)\ T(\vec{e}_2)] = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$. Note that the columns of A are linearly independent, $\ker T = \mathrm{NS}(A) = \{\vec{0}_2\}$, and nullity(T) = nullity(A) = 0. Thus T (i.e., −→x 7→ A−→x ) is one-to-one.
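Condition (d) gives a mechanical test: T is one-to-one exactly when rank(A) equals the number of columns. A minimal NumPy check (illustrative only):

```python
import numpy as np

A = np.array([[1, 0],
              [0, 1],
              [0, 0]])   # standard matrix of T(x1, x2) = (x1, x2, 0)

n = A.shape[1]
# Columns independent <=> rank(A) = n <=> nullity(A) = 0 <=> T is one-to-one.
one_to_one = (np.linalg.matrix_rank(A) == n)
```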
Theorem 3.15. Let T : Rn → Rm be a linear transformation with the standard matrix A. Then the following are equivalent.
(a) T (i.e., −→x 7→ A−→x ) is onto.
(b) imT = CS (A) = Rm.
(c) rank (T ) = rank (A) = m.
(d) Each row of A has a pivot position.
Proof. (b), (c), and (d) are equivalent by the definitions.

(a) =⇒ (b) Suppose T (i.e., −→x 7→ A−→x ) is onto. Let $\vec{b} \in \mathbb{R}^m$. Since −→x 7→ A−→x is onto, $\vec{b} = A\vec{x}$ for some $\vec{x} \in \mathbb{R}^n$. Then $\vec{b} = A\vec{x} \in \mathrm{CS}(A)$. Thus $\operatorname{im} T = \mathrm{CS}(A) = \mathbb{R}^m$.

(b) =⇒ (a) Suppose $\operatorname{im} T = \mathrm{CS}(A) = \mathbb{R}^m$. Let $\vec{b} \in \mathbb{R}^m$. Since $\vec{b} \in \mathrm{CS}(A) = \mathbb{R}^m$, $\vec{b} = A\vec{x}$ for some $\vec{x} \in \mathbb{R}^n$. Thus −→x 7→ A−→x is onto.
Example. The linear transformation T : R3 → R2 defined by T (x1, x2, x3) = (x1, x2) has the standard matrix $A = [T(\vec{e}_1)\ T(\vec{e}_2)\ T(\vec{e}_3)] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$. Note that each row of A has a pivot position, $\operatorname{im} T = \mathrm{CS}(A) = \mathbb{R}^2$, and rank(T) = rank(A) = 2. Thus T (i.e., −→x 7→ A−→x ) is onto.
Definition. A linear transformation T : Rn → Rn is an isomorphism if it is one-to-one andonto.
Example. The linear transformation T : R2 → R2 defined by T (x1, x2) = (x1 + x2, x1 − x2) is one-to-one and onto, and consequently an isomorphism. By the following theorem, showing T is one-to-one is enough to show T is an isomorphism.
Theorem 3.16. Let T : Rn → Rn be a linear transformation with the n × n standard matrix A. Then the following are equivalent.
(a) T (i.e., −→x 7→ A−→x ) is an isomorphism.
(b) T (i.e., −→x 7→ A−→x ) is one-to-one.
(c) kerT = NS (A) = {−→0n}.
(d) nullity (T ) = nullity (A) = 0.
(e) The columns of A are linearly independent.
(f) T (i.e., −→x 7→ A−→x ) is onto.
(g) imT = CS (A) = Rn.
(h) rank (T ) = rank (A) = n.
(i) Each row and column of A has a pivot position.
Proof. (b), (c), (d), and (e) are equivalent by Theorem 3.14. (f), (g), (h), and (i) are equivalent by Theorem 3.15. Now for the n × n standard matrix A, rank(A) + nullity(A) = n. Thus nullity(A) = 0 if and only if rank(A) = n, i.e., (d) and (h) are equivalent. Since (b) and (f) are equivalent, they are equivalent to (a).
Example. What can we say about CS(A), NS(A), rank(A), nullity(A), and the pivot positions of a 3 × 3 matrix with three linearly independent columns? What about −→x 7→ A−→x ?
By the preceding theorem, $\mathrm{CS}(A) = \mathbb{R}^3$, $\mathrm{NS}(A) = \{\vec{0}_3\}$, rank(A) = 3, nullity(A) = 0, A has 3 pivot positions, and −→x 7→ A−→x is a one-to-one linear transformation from R3 onto R3.
4 Inverse and Determinant of a Matrix
4.1 Inverse of a Matrix
Definition. An n × n matrix A is invertible if there is an n × n matrix B such that
AB = BA = In.
This B is called the inverse of A, denoted by A−1, for which AA−1 = A−1A = In. An invertible matrix is also called a nonsingular matrix. A square matrix that is not invertible is called a singular matrix.
Example. For $A = \begin{bmatrix} 1 & 2 \\ 4 & 6 \end{bmatrix}$ and $B = \begin{bmatrix} -3 & 1 \\ 2 & -0.5 \end{bmatrix}$, $AB = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = BA$. So $B = A^{-1}$.
Theorem 4.1. Let A and B be two n× n invertible matrices. Then the following hold.
(a) A−1 is invertible and (A−1)−1 = A.
(b) AT is invertible and (AT )−1 = (A−1)T .
(c) For c ≠ 0, cA is invertible and $(cA)^{-1} = \frac{1}{c}A^{-1}$.
(d) AB is invertible and (AB)−1 = B−1A−1.
Proof. (a) and (c) are exercises. For (b) note that
$A^T(A^{-1})^T = (A^{-1}A)^T = I_n^T = I_n \quad \text{and} \quad (A^{-1})^TA^T = (AA^{-1})^T = I_n^T = I_n.$

For (d) note that

$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AI_nA^{-1} = AA^{-1} = I_n \quad \text{and} \quad (B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}I_nB = B^{-1}B = I_n.$
Example. For $A = \begin{bmatrix} 1 & 1 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}$, $A^{-1} = \begin{bmatrix} 4 & -1 \\ -3 & 1 \end{bmatrix}$ and $B^{-1} = \begin{bmatrix} 5 & -2 \\ -2 & 1 \end{bmatrix}$.

Verify that

$(A^T)^{-1} = \begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix}^{-1} = \begin{bmatrix} 4 & -3 \\ -1 & 1 \end{bmatrix} = (A^{-1})^T, \quad (5A)^{-1} = \frac{1}{5}\begin{bmatrix} 4 & -1 \\ -3 & 1 \end{bmatrix} = \frac{1}{5}A^{-1},$

and

$(AB)^{-1} = \begin{bmatrix} 3 & 7 \\ 11 & 26 \end{bmatrix}^{-1} = \begin{bmatrix} 26 & -7 \\ -11 & 3 \end{bmatrix} = B^{-1}A^{-1}.$
How do we know a given square matrix A is invertible? How do we find A−1?
Theorem 4.2. Let A be an n× n matrix. Then the following are equivalent.
(a) A is invertible.
(b) $A\vec{x} = \vec{b}$ has a unique solution for each $\vec{b} \in \mathbb{R}^n$.
(c) The RREF of A is In.
Proof. (b) ⇐⇒ (c) $A\vec{x} = \vec{b}$ has a unique solution for each $\vec{b} \in \mathbb{R}^n$ if and only if each column of the RREF of A has a leading 1, if and only if the RREF of A is In.

(a) =⇒ (b) Suppose A is invertible. Let $\vec{b} \in \mathbb{R}^n$. Then $A\vec{x} = \vec{b} \implies \vec{x} = A^{-1}\vec{b}$.

(b) =⇒ (a) Suppose $A\vec{x} = \vec{b}$ has a unique solution for each $\vec{b} \in \mathbb{R}^n$. Let $A\vec{v}_i = \vec{e}_i$ for i = 1, 2, . . . , n. Then

$A[\vec{v}_1\ \vec{v}_2 \cdots \vec{v}_n] = [A\vec{v}_1\ A\vec{v}_2 \cdots A\vec{v}_n] = [\vec{e}_1\ \vec{e}_2 \cdots \vec{e}_n] = I_n.$

To show $A^{-1} = [\vec{v}_1\ \vec{v}_2 \cdots \vec{v}_n]$, it suffices to show $[\vec{v}_1\ \vec{v}_2 \cdots \vec{v}_n]A = I_n$. Since $A[\vec{v}_1\ \vec{v}_2 \cdots \vec{v}_n] = I_n$,

$A[\vec{v}_1\ \vec{v}_2 \cdots \vec{v}_n]A = I_nA = A.$

Let $\vec{b}_i$ be the ith column of $[\vec{v}_1\ \vec{v}_2 \cdots \vec{v}_n]A$ for i = 1, 2, . . . , n. Then $A\vec{b}_i = \vec{a}_i$. But $A\vec{e}_i = \vec{a}_i$. By the uniqueness of the solution of $A\vec{x} = \vec{a}_i$, $\vec{b}_i = \vec{e}_i$ for i = 1, 2, . . . , n. Thus

$[\vec{v}_1\ \vec{v}_2 \cdots \vec{v}_n]A = [\vec{e}_1\ \vec{e}_2 \cdots \vec{e}_n] = I_n.$
To find A−1 for an invertible matrix A, we investigate how row operations on A are obtained by premultiplying A by elementary matrices.
Definition. An n×n elementary matrix is obtained by applying an elementary row operationon In.
Example.
1. $E_{ij}$ is obtained by $R_i \leftrightarrow R_j$ on In. Note that $E_{ij}A$ is obtained by $R_i \leftrightarrow R_j$ on A.

$A = \begin{bmatrix} 0 & 2 & 4 \\ 1 & -3 & 0 \\ -1 & 3 & 1 \end{bmatrix} \xrightarrow{R_1 \leftrightarrow R_2} \begin{bmatrix} 1 & -3 & 0 \\ 0 & 2 & 4 \\ -1 & 3 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 2 & 4 \\ 1 & -3 & 0 \\ -1 & 3 & 1 \end{bmatrix} = E_{12}A.$
2. For c ≠ 0, $E_i(c)$ is obtained by $cR_i$ on In. Note that $E_i(c)A$ is obtained by $cR_i$ on A.

$E_{12}A = \begin{bmatrix} 1 & -3 & 0 \\ 0 & 2 & 4 \\ -1 & 3 & 1 \end{bmatrix} \xrightarrow{\frac{1}{2}R_2} \begin{bmatrix} 1 & -3 & 0 \\ 0 & 1 & 2 \\ -1 & 3 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & -3 & 0 \\ 0 & 2 & 4 \\ -1 & 3 & 1 \end{bmatrix} = E_2\!\left(\tfrac{1}{2}\right)E_{12}A.$
3. $E_{ij}(c)$ is obtained by $cR_i + R_j$ on In. Note that $E_{ij}(c)A$ is obtained by $cR_i + R_j$ on A.

$E_2\!\left(\tfrac{1}{2}\right)E_{12}A = \begin{bmatrix} 1 & -3 & 0 \\ 0 & 1 & 2 \\ -1 & 3 & 1 \end{bmatrix} \xrightarrow{R_1+R_3} \begin{bmatrix} 1 & -3 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & -3 & 0 \\ 0 & 1 & 2 \\ -1 & 3 & 1 \end{bmatrix} = E_{13}(1)E_2\!\left(\tfrac{1}{2}\right)E_{12}A.$
Remark. Elementary matrices are invertible. Moreover, $E_{ij}^{-1} = E_{ij}$, $E_i(c)^{-1} = E_i\!\left(\tfrac{1}{c}\right)$ for c ≠ 0, and $E_{ij}(c)^{-1} = E_{ij}(-c)$.
Theorem 4.3. Let A be an n × n invertible matrix. A sequence of elementary row operations that reduces A to In also reduces In to A−1.

Proof. Since A is invertible, the RREF of A is In. Suppose In is obtained from A by successively premultiplying by elementary matrices E1, E2, . . . , Ek, i.e.,

$E_kE_{k-1}\cdots E_1A = I_n.$

Postmultiplying by A−1, we get

$E_kE_{k-1}\cdots E_1AA^{-1} = I_nA^{-1} \implies E_kE_{k-1}\cdots E_1I_n = A^{-1}.$
Gauss-Jordan elimination: Find the RREF of [A | In]. If the RREF of A is In, then A is invertible and the RREF of [A | In] is [In | A−1]. Otherwise A is not invertible.
Example.
$[A \mid I_3] = \begin{bmatrix} 0 & 2 & 4 & 1 & 0 & 0 \\ 1 & -3 & 0 & 0 & 1 & 0 \\ -1 & 3 & 1 & 0 & 0 & 1 \end{bmatrix} \xrightarrow{R_1 \leftrightarrow R_2} \begin{bmatrix} 1 & -3 & 0 & 0 & 1 & 0 \\ 0 & 2 & 4 & 1 & 0 & 0 \\ -1 & 3 & 1 & 0 & 0 & 1 \end{bmatrix} \xrightarrow{R_1+R_3} \begin{bmatrix} 1 & -3 & 0 & 0 & 1 & 0 \\ 0 & 2 & 4 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 \end{bmatrix}$

$\xrightarrow{-4R_3+R_2} \begin{bmatrix} 1 & -3 & 0 & 0 & 1 & 0 \\ 0 & 2 & 0 & 1 & -4 & -4 \\ 0 & 0 & 1 & 0 & 1 & 1 \end{bmatrix} \xrightarrow{\frac{1}{2}R_2} \begin{bmatrix} 1 & -3 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & \frac{1}{2} & -2 & -2 \\ 0 & 0 & 1 & 0 & 1 & 1 \end{bmatrix} \xrightarrow{3R_2+R_1} \begin{bmatrix} 1 & 0 & 0 & \frac{3}{2} & -5 & -6 \\ 0 & 1 & 0 & \frac{1}{2} & -2 & -2 \\ 0 & 0 & 1 & 0 & 1 & 1 \end{bmatrix} = [I_3 \mid A^{-1}]$

Thus $A^{-1} = \begin{bmatrix} \frac{3}{2} & -5 & -6 \\ \frac{1}{2} & -2 & -2 \\ 0 & 1 & 1 \end{bmatrix}$. Notice how the elementary matrices $E_{12}$, $E_{13}(1)$, $E_{32}(-4)$, $E_2\!\left(\tfrac{1}{2}\right)$, $E_{21}(3)$ are successively applied on A to get I3:

$E_{21}(3)E_2\!\left(\tfrac{1}{2}\right)E_{32}(-4)E_{13}(1)E_{12}A = I_3.$

Verify that the product of those elementary matrices is A−1:

$A^{-1} = E_{21}(3)E_2\!\left(\tfrac{1}{2}\right)E_{32}(-4)E_{13}(1)E_{12}.$
Remark. For an m × n matrix A there is a generalized inverse called the Moore-Penrose inverse, denoted by A+, which can be found using the singular-value decomposition of A.
4.2 Invertible Matrix Theorem
Theorem 4.4 (Invertible Matrix Theorem). Let A be an n× n matrix. Then the followingare equivalent.
(a) A is invertible.
(b) $A\vec{x} = \vec{b}$ has a unique solution for each $\vec{b} \in \mathbb{R}^n$.
(c) The RREF of A is In.
(d) T (i.e., −→x 7→ A−→x ) is an isomorphism.
(e) T (i.e., −→x 7→ A−→x ) is one-to-one.
(f) kerT = NS (A) = {−→0n}.
(g) nullity (T ) = nullity (A) = 0.
(h) The columns of A are linearly independent.
(i) T (i.e., −→x 7→ A−→x ) is onto.
(j) imT = CS (A) = Rn.
(k) rank (T ) = rank (A) = n.
(l) Each row and column of A has a pivot position.
Proof. (a), (b), and (c) are equivalent by Theorem 4.2. Also (d)-(l) are equivalent by Theorem 3.16. Since A is a square matrix, (c) and (l) are equivalent.
Example. What can we say about CS(A), NS(A), rank(A), nullity(A), and the pivot positions of a 3 × 3 invertible matrix? What about −→x 7→ A−→x ?
By the IMT, $\mathrm{CS}(A) = \mathbb{R}^3$, $\mathrm{NS}(A) = \{\vec{0}_3\}$, rank(A) = 3, nullity(A) = 0, A has 3 pivot positions, and −→x 7→ A−→x is an isomorphism, i.e., a one-to-one linear transformation from R3 onto R3. Also $A\vec{x} = \vec{b}$ has the unique solution $A^{-1}\vec{b}$ for each $\vec{b} \in \mathbb{R}^3$.
Remark. In general the conditions in the IMT are not equivalent for a non-square matrix.
Example.
1. The linear transformation T : R3 → R2 defined by T (x1, x2, x3) = (x1, x2) has the 2 × 3 standard matrix $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$. Note that T is onto but not one-to-one. Equivalently, the columns of A span R2 but they are not linearly independent.
2. The linear transformation T : R2 → R3 defined by T (x1, x2) = (x1, x2, 0) has the 3 × 2 standard matrix $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$. Note that T is one-to-one but not onto. Equivalently, the columns of A are linearly independent but they do not span R3.
Definition. A linear transformation T : Rn → Rn is invertible if there is another linear transformation S : Rn → Rn such that
T (S(−→x )) = S(T (−→x )) = −→x for all −→x ∈ Rn.
This S is called the inverse of T, denoted by T−1, for which T ◦ T−1 = T−1 ◦ T = I, the identity function on Rn.
Remark. It is well-known that a function is invertible if and only if it is one-to-one and onto. So a linear transformation T : Rn → Rn is an isomorphism if and only if it is invertible.
Example. The linear transformation T : R2 → R2 defined by T (x1, x2) = (x1 + 2x2, 3x1 + 5x2) is one-to-one and onto, and consequently invertible. How do we find T−1 : R2 → R2?
Theorem 4.5. Let T : Rn → Rn be a linear transformation with the standard matrix A. Then T is invertible if and only if A is invertible. Also T−1 : Rn → Rn is given by
T−1(−→x ) = A−1−→x .
Proof. T is invertible (i.e., an isomorphism) if and only if A is invertible by the IMT. Let S : Rn → Rn be the linear transformation defined by S(−→x ) = A−1−→x . Then for all −→x ∈ Rn,
T (S(−→x )) = T (A−1−→x ) = A(A−1−→x ) = In−→x = −→x and
S(T (−→x )) = S(A−→x ) = A−1(A−→x ) = In−→x = −→x .
Thus S = T−1.
Example. The isomorphism T : R2 → R2 defined by T (x1, x2) = (x1 + 2x2, 3x1 + 5x2) has the standard matrix $A = [T(\vec{e}_1)\ T(\vec{e}_2)] = \begin{bmatrix} 1 & 2 \\ 3 & 5 \end{bmatrix}$. Since $A^{-1} = \begin{bmatrix} -5 & 2 \\ 3 & -1 \end{bmatrix}$, T−1 : R2 → R2 is given by T−1(−→x ) = A−1−→x , i.e., T−1(x1, x2) = (−5x1 + 2x2, 3x1 − x2). Verify that for all $[x_1, x_2]^T \in \mathbb{R}^2$,

T (T−1(x1, x2)) = T (−5x1 + 2x2, 3x1 − x2) = (x1, x2) and

T−1(T (x1, x2)) = T−1(x1 + 2x2, 3x1 + 5x2) = (x1, x2).
4.3 Determinant of a Matrix
In this section we study the determinant of an n × n matrix A = [aij], denoted by det(A) or det A or |A| or

$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}.$

To define det(A) recursively, we denote by A(i, j) the matrix obtained from A by deleting row i and column j of A.
Definition. If A = [a11], then det(A) = a11. If $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$, then det(A) = a11a22 − a12a21. For an n × n matrix A = [aij] where n ≥ 3,

$\det(A) = \sum_{i=1}^{n} (-1)^{1+i}a_{1i}\det A(1,i) = a_{11}\det A(1,1) - a_{12}\det A(1,2) + \cdots + (-1)^{n+1}a_{1n}\det A(1,n).$
Example. We find det(A) for $A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 3 & 5 \\ 1 & 4 & 2 \end{bmatrix}$.

$\det(A) = a_{11}\det A(1,1) - a_{12}\det A(1,2) + a_{13}\det A(1,3) = 1\begin{vmatrix} 3 & 5 \\ 4 & 2 \end{vmatrix} - 2\begin{vmatrix} 1 & 5 \\ 1 & 2 \end{vmatrix} + 3\begin{vmatrix} 1 & 3 \\ 1 & 4 \end{vmatrix}$

$= 1(3 \cdot 2 - 5 \cdot 4) - 2(1 \cdot 2 - 5 \cdot 1) + 3(1 \cdot 4 - 3 \cdot 1) = -5.$
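The recursive definition translates into a short function. A plain-Python sketch (correct but inefficient, roughly O(n!) work, so it is only for small matrices; practical code uses LU-based routines like `numpy.linalg.det`):

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # A(1, j+1): delete row 1 and column j+1 of A.
        minor = [row[:j] + row[j+1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

value = det([[1, 2, 3],
             [1, 3, 5],
             [1, 4, 2]])  # the matrix from the example above
```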
Definition. For an n × n matrix A = [aij] where n ≥ 2, the (i, j) minor, denoted by mij, is mij = det A(i, j), and the (i, j) cofactor, denoted by cij, is

$c_{ij} = (-1)^{i+j}m_{ij} = (-1)^{i+j}\det A(i,j).$
Remark. We defined det(A) as the cofactor expansion along the first row of A:

$\det(A) = \sum_{i=1}^{n} (-1)^{1+i}a_{1i}\det A(1,i) = \sum_{i=1}^{n} a_{1i}c_{1i}.$
But it can be proved that det(A) is the cofactor expansion along any row or column of A.
Theorem 4.6. Let A be an n × n matrix. Then for each i, j = 1, 2, . . . , n,

$\det(A) = \sum_{j=1}^{n} a_{ij}c_{ij} = \sum_{i=1}^{n} a_{ij}c_{ij}.$
The preceding theorem can be proved using the following equivalent definition of the determinant:

$\det(A) = \sum_{\sigma \in S_n} \left(\operatorname{sgn}(\sigma)\prod_{i=1}^{n} a_{i\sigma(i)}\right),$

where σ runs over all n! permutations of {1, 2, . . . , n}. (This requires a study of permutations.)
Corollary 4.7. Let A = [aij] be an n× n matrix.
(a) det(AT ) = det(A).
(b) If A is a triangular matrix, then det(A) = a11a22 · · · ann.
Proof. (Sketch) (a) Note that the (i, j) cofactor of A is the (j, i) cofactor of AT. The cofactor expansion along the first row to get det(A) is the same as the cofactor expansion along the first column to get det(AT).
(b) If A is an upper-triangular matrix, then by repeated cofactor expansions along the first columns we get det(A) = a11a22 · · · ann. Similarly, if A is a lower-triangular matrix, then by repeated cofactor expansions along the first rows we get det(A) = a11a22 · · · ann.
Example. Let $A = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 3 & 0 & 1 & 3 & 2 \\ 0 & 0 & 4 & 3 & 0 \\ 0 & 0 & 0 & 2 & 1 \\ 2 & 0 & 0 & 0 & 3 \end{bmatrix}$. We compute det(A) using rows or columns with the maximum number of zeros at each step. So first we choose column 2 and do the cofactor expansion along it:

$\det(A) = -2\begin{vmatrix} 3 & 1 & 3 & 2 \\ 0 & 4 & 3 & 0 \\ 0 & 0 & 2 & 1 \\ 2 & 0 & 0 & 3 \end{vmatrix}$

Now we have 5 choices: rows 2, 3, 4 and columns 1, 2. We do the cofactor expansion along row 4:

$\det(A) = -2\left(-2\begin{vmatrix} 1 & 3 & 2 \\ 4 & 3 & 0 \\ 0 & 2 & 1 \end{vmatrix} + 3\begin{vmatrix} 3 & 1 & 3 \\ 0 & 4 & 3 \\ 0 & 0 & 2 \end{vmatrix}\right)$

Since the second matrix is upper-triangular, its determinant is 3 · 4 · 2 = 24. We do the cofactor expansion along column 3 for the first determinant:

$\det(A) = -2\left(-2\left(2\begin{vmatrix} 4 & 3 \\ 0 & 2 \end{vmatrix} + 1\begin{vmatrix} 1 & 3 \\ 4 & 3 \end{vmatrix}\right) + 3 \cdot 24\right) = -2\left(-2\left(2(4 \cdot 2 - 0) + 1(1 \cdot 3 - 3 \cdot 4)\right) + 72\right) = -116$
Some applications of determinants:
1. Determinant as volume: Suppose a hypersolid S in Rn is given by n concurrent edges that are represented by the column vectors of an n × n matrix A. Then the volume of S is |det(A)|.
Let $\vec{r}_1 = [a_1, b_1, c_1]^T$, $\vec{r}_2 = [a_2, b_2, c_2]^T$, $\vec{r}_3 = [a_3, b_3, c_3]^T$, and

$A = [\vec{r}_1\ \vec{r}_2\ \vec{r}_3] = \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}.$

Then the volume of the parallelepiped with concurrent edges given by $\vec{r}_1, \vec{r}_2, \vec{r}_3$ is

$|\det(A)| = |a_1(b_2c_3 - b_3c_2) - a_2(b_1c_3 - b_3c_1) + a_3(b_1c_2 - b_2c_1)|.$
2. Equation of a plane: Consider the plane passing through three distinct points P1(x1, y1, z1), P2(x2, y2, z2), and P3(x3, y3, z3). Let P(x, y, z) be a point on the plane. Then the volume of the parallelepiped with concurrent edges $\overrightarrow{P_1P}$, $\overrightarrow{P_2P}$, and $\overrightarrow{P_3P}$ is zero:

$\begin{vmatrix} x - x_1 & x - x_2 & x - x_3 \\ y - y_1 & y - y_2 & y - y_3 \\ z - z_1 & z - z_2 & z - z_3 \end{vmatrix} = 0.$
3. Volume after transformation: Let T : Rn → Rn be a linear transformation with the standard matrix A. Let S be a bounded hypersolid in Rn. Then the volume of T(S) is |det(A)| times the volume of S.
(Figure: the unit circle $x^2 + y^2 = 1$ and its image, the ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$.)
Example. Let $A = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$ and $D = \{(x, y) \mid x^2 + y^2 \le 1\}$. Consider T : R2 → R2 defined by $T([x, y]^T) = A[x, y]^T$. Note $T(D) = \{(x, y) \mid \frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1\}$. So the area of the ellipse = the area of T(D) = |det(A)| · (area of D) = ab · π · 1² = πab.
4. Change of variables: Suppose variables x1, . . . , xn are changed to v1, . . . , vn by n differentiable functions f1, . . . , fn so that

$v_1 = f_1(x_1, \ldots, x_n), \quad v_2 = f_2(x_1, \ldots, x_n), \quad \ldots, \quad v_n = f_n(x_1, \ldots, x_n).$

So we have a function F : Rn → Rn defined by

$F(x_1, \ldots, x_n) = (f_1(x_1, \ldots, x_n), \ldots, f_n(x_1, \ldots, x_n)).$
The Jacobian matrix of F : Rn → Rn is

$\frac{\partial(f_1, \ldots, f_n)}{\partial(x_1, \ldots, x_n)} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}.$
The change of variables formula for integrals is

$\int_{F(U)} G(\vec{v})\,d\vec{v} = \int_{U} G(F(\vec{x})) \left|\frac{\partial(f_1, \ldots, f_n)}{\partial(x_1, \ldots, x_n)}\right| d\vec{x}.$

Example. Let $(x, y) = F(r, \theta) = (ar\cos\theta, br\sin\theta)$. Then $F([0, 1] \times [0, 2\pi])$ is the region
enclosed by the ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$. The Jacobian matrix is
$\frac{\partial(x, y)}{\partial(r, \theta)} = \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{bmatrix} = \begin{bmatrix} a\cos\theta & -ar\sin\theta \\ b\sin\theta & br\cos\theta \end{bmatrix} \quad \text{and} \quad \left|\frac{\partial(x, y)}{\partial(r, \theta)}\right| = abr.$
By the change of variables formula,

$\int_{F([0,1]\times[0,2\pi])} 1\, d\vec{v} = \int_{\theta=0}^{2\pi}\int_{r=0}^{1} 1\cdot\left|\frac{\partial(x, y)}{\partial(r, \theta)}\right| dr\, d\theta = \int_{0}^{2\pi}\int_{0}^{1} abr\, dr\, d\theta = ab\pi.$
5. Wronskian: The Wronskian of n real-valued differentiable functions f1, . . . , fn is

$W(f_1, \ldots, f_n)(x) = \begin{vmatrix} f_1(x) & \cdots & f_n(x) \\ f_1'(x) & \cdots & f_n'(x) \\ \vdots & \ddots & \vdots \\ f_1^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{vmatrix}.$

If $W(f_1, \ldots, f_n)$ is not identically zero, then f1, . . . , fn are linearly independent functions.
4.4 Properties of Determinants
Theorem 4.8. For an n×n matrix A and n×n elementary matrices Eij, Ei(c), Eij(c), wehave detEij = −1, detEi(c) = c, detEij(c) = 1, and
det(EijA) = − detA = (detEij)(detA),
det(Ei(c)A) = c detA = (detEi(c))(detA),
det(Eij(c)A) = detA = (detEij(c))(detA).
Proof. Use cofactor expansion and induction on n.
Theorem 4.9. Let A be an n × n matrix. Then A is invertible if and only if det(A) ≠ 0.
Proof. Suppose A is invertible. Then A−1 is invertible and there are elementary matrices E1, E2, . . . , Ek such that $E_kE_{k-1}\cdots E_1A^{-1} = I_n$. Postmultiplying by A, we get

$E_kE_{k-1}\cdots E_1 = A \implies \det(E_kE_{k-1}\cdots E_1) = \det(A).$
By successively applying Theorem 4.8, we get

$\det(A) = \det(E_kE_{k-1}\cdots E_1) = \det(E_k)\det(E_{k-1})\cdots\det(E_1) \neq 0.$

For the converse, suppose that A is not invertible. Then the RREF R of A is not In. So R is an upper-triangular matrix with the last row a zero row, and consequently det(R) = 0. Suppose $E_1', E_2', \ldots, E_t'$ are elementary matrices for which $E_t'E_{t-1}'\cdots E_1'A = R$. Then

$\det(E_t'E_{t-1}'\cdots E_1'A) = \det(R) = 0 \implies \det(E_t')\det(E_{t-1}')\cdots\det(E_1')\det(A) = 0$

by Theorem 4.8. Since $\det(E_i') \neq 0$ for i = 1, 2, . . . , t, det(A) = 0.
Remark. We extend the IMT by adding one more equivalent condition:

(a) A is invertible.

(m) det(A) ≠ 0.
Theorem 4.10. Let A and B be two n× n matrices. Then det(AB) = det(A) det(B).
Proof. Case 1: A is not invertible.
By the IMT, rank(A) < n. Since CS(AB) ⊆ CS(A), rank(AB) ≤ rank(A) < n, and consequently AB is also not invertible. By the IMT, det(A) = 0 and det(AB) = 0. Thus

det(AB) = 0 = det(A) det(B).
Case 2: A is invertible.
There are elementary matrices E1, E2, . . . , Ek such that $E_kE_{k-1}\cdots E_1 = A$. Postmultiplying by B, we get $AB = E_kE_{k-1}\cdots E_1B$. By successively applying Theorem 4.8, we get

$\det(AB) = \det(E_kE_{k-1}\cdots E_1B) = \det(E_k)\det(E_{k-1})\cdots\det(E_1)\det(B) = \det(E_kE_{k-1}\cdots E_1)\det(B) = \det(A)\det(B).$
Corollary 4.11. Let A be an n× n matrix.
(a) For all scalars c, $\det(cA) = \det(cI_nA) = \det(cI_n)\det(A) = c^n\det(A)$.
(b) If A is invertible, then det(A) det(A−1) = det(AA−1) = det(In) = 1.
Example. Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 5 & 1 \\ 0 & 0 & 2 \end{bmatrix}$. Is A invertible? Compute det(AT), det(4A⁵), and det(A−1).
Since det(A) = −2 ≠ 0, A is invertible and we have det(AT) = det(A) = −2, $\det(4A^5) = 4^3(\det A)^5 = -2048$, and $\det(A^{-1}) = (\det A)^{-1} = (-2)^{-1} = -1/2$.
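A NumPy spot-check of these determinant rules (an illustration, not part of the notes):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [3.0, 5.0, 1.0],
              [0.0, 0.0, 2.0]])
dA = np.linalg.det(A)                              # should be -2

lhs = np.linalg.det(4 * np.linalg.matrix_power(A, 5))
rhs = 4**3 * dA**5   # det(cA) = c^n det(A) with n = 3, and det(A^5) = det(A)^5
```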
Theorem 4.12 (Cramer's Rule). Let A be an n × n invertible matrix and $\vec{b} \in \mathbb{R}^n$. The unique solution $\vec{x} = [x_1, x_2, \ldots, x_n]^T$ of $A\vec{x} = \vec{b}$ is given by

$x_i = \frac{\det(A_i(\vec{b}))}{\det(A)}, \quad i = 1, 2, \ldots, n,$

where $A_i(\vec{b})$ is the matrix obtained from A by replacing its ith column by $\vec{b}$.
Proof. Let i ∈ {1, 2, . . . , n}. Note that

$A[\vec{e}_1 \cdots \vec{e}_{i-1}\ \vec{x}\ \vec{e}_{i+1} \cdots \vec{e}_n] = [A\vec{e}_1 \cdots A\vec{e}_{i-1}\ A\vec{x}\ A\vec{e}_{i+1} \cdots A\vec{e}_n] = [\vec{A}_1 \cdots \vec{A}_{i-1}\ \vec{b}\ \vec{A}_{i+1} \cdots \vec{A}_n] = A_i(\vec{b}).$

Since $\det([\vec{e}_1 \cdots \vec{e}_{i-1}\ \vec{x}\ \vec{e}_{i+1} \cdots \vec{e}_n]) = x_i$,

$\det(A_i(\vec{b})) = \det(A[\vec{e}_1 \cdots \vec{e}_{i-1}\ \vec{x}\ \vec{e}_{i+1} \cdots \vec{e}_n]) = \det(A)\det([\vec{e}_1 \cdots \vec{e}_{i-1}\ \vec{x}\ \vec{e}_{i+1} \cdots \vec{e}_n]) = \det(A)\,x_i.$

Thus $x_i = \dfrac{\det(A_i(\vec{b}))}{\det(A)}$.
Example. We solve $A\vec{x} = \vec{b}$ by Cramer's Rule, where

$A = \begin{bmatrix} 1 & 0 & 2 \\ 3 & 2 & 5 \\ 1 & 1 & 4 \end{bmatrix}, \quad \vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad \text{and} \quad \vec{b} = \begin{bmatrix} 1 \\ 8 \\ 1 \end{bmatrix}.$

Since det(A) = 5 ≠ 0, there is a unique solution $[x_1, x_2, x_3]^T$, and by Cramer's Rule,

$x_1 = \frac{\det(A_1(\vec{b}))}{\det(A)} = \frac{1}{5}\begin{vmatrix} 1 & 0 & 2 \\ 8 & 2 & 5 \\ 1 & 1 & 4 \end{vmatrix} = \frac{15}{5} = 3,$

$x_2 = \frac{\det(A_2(\vec{b}))}{\det(A)} = \frac{1}{5}\begin{vmatrix} 1 & 1 & 2 \\ 3 & 8 & 5 \\ 1 & 1 & 4 \end{vmatrix} = \frac{10}{5} = 2,$

$x_3 = \frac{\det(A_3(\vec{b}))}{\det(A)} = \frac{1}{5}\begin{vmatrix} 1 & 0 & 1 \\ 3 & 2 & 8 \\ 1 & 1 & 1 \end{vmatrix} = \frac{-5}{5} = -1.$

Thus the unique solution is $[x_1, x_2, x_3]^T = [3, 2, -1]^T$.
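Cramer's Rule is straightforward to implement. A NumPy sketch (the helper name `cramer_solve` is ours; in practice `np.linalg.solve` is preferred, since Cramer's Rule computes n + 1 determinants):

```python
import numpy as np

def cramer_solve(A, b):
    """Solve A x = b by Cramer's Rule (A square and invertible)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b          # A_i(b): replace column i of A by b
        x[i] = np.linalg.det(Ai) / d
    return x

x = cramer_solve([[1, 0, 2],
                  [3, 2, 5],
                  [1, 1, 4]],
                 [1, 8, 1])   # the system from the example above
```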
Definition. Let A be an n × n matrix. The cofactor matrix of A, denoted by C = [cij], is the n × n matrix whose (i, j) entry is the (i, j) cofactor of A. The adjoint or adjugate of A, denoted by adj A or adj(A), is the transpose of the cofactor matrix of A, i.e., adj A = CT.
Theorem 4.13. Let A be an n × n invertible matrix. Then

$A^{-1} = \frac{1}{\det(A)}\operatorname{adj} A.$

Proof. Since AA−1 = In, A(the column j of A−1) = $\vec{e}_j$. By Cramer's Rule, the (i, j)-entry of A−1, i.e., the ith entry of column j of A−1, is

$\frac{\det(A_i(\vec{e}_j))}{\det(A)} = \frac{(-1)^{i+j}\det(A(j,i))}{\det(A)} = \frac{c_{ji}}{\det(A)} = \frac{(C^T)_{ij}}{\det(A)} = \frac{(\operatorname{adj} A)_{ij}}{\det(A)}.$
Example.

1. For invertible $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$,

$A^{-1} = \frac{1}{\det(A)}\operatorname{adj} A = \frac{1}{ad - bc}\begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}^T = \frac{1}{ad - bc}\begin{bmatrix} d & -c \\ -b & a \end{bmatrix}^T = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$

2. For invertible $A = \begin{bmatrix} 1 & 0 & 2 \\ 3 & 2 & 5 \\ 1 & 1 & 4 \end{bmatrix}$,

$A^{-1} = \frac{1}{\det(A)}\operatorname{adj} A = \frac{1}{5}\begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix}^T = \frac{1}{5}\begin{bmatrix} 3 & -7 & 1 \\ 2 & 2 & -1 \\ -4 & 1 & 2 \end{bmatrix}^T = \frac{1}{5}\begin{bmatrix} 3 & 2 & -4 \\ -7 & 2 & 1 \\ 1 & -1 & 2 \end{bmatrix}.$
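The adjugate formula can be sketched in NumPy (the helper name `adjugate_inverse` is ours; practical code would call `np.linalg.inv` directly, since the cofactor approach is far slower):

```python
import numpy as np

def adjugate_inverse(A):
    """A^{-1} = adj(A)/det(A), with adj(A) the transpose of the cofactor matrix."""
    A = np.asarray(A, dtype=float)
    n = len(A)
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # A(i+1, j+1): delete row i and column j, then take the signed minor.
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T / np.linalg.det(A)   # adj(A) = C^T

Ainv = adjugate_inverse([[1, 0, 2],
                         [3, 2, 5],
                         [1, 1, 4]])   # matrix from example 2 above
```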
We end with the following useful multilinearity property of the determinant:
Theorem 4.14. Let $A = [\vec{a}_1\ \vec{a}_2 \cdots \vec{a}_n]$ be an n × n matrix. Then for all $\vec{x}, \vec{y} \in \mathbb{R}^n$ and all scalars c, d,

$\det[\vec{a}_1 \cdots \vec{a}_{i-1}\ (c\vec{x} + d\vec{y})\ \vec{a}_{i+1} \cdots \vec{a}_n] = c\det[\vec{a}_1 \cdots \vec{a}_{i-1}\ \vec{x}\ \vec{a}_{i+1} \cdots \vec{a}_n] + d\det[\vec{a}_1 \cdots \vec{a}_{i-1}\ \vec{y}\ \vec{a}_{i+1} \cdots \vec{a}_n].$
Proof. (Sketch) Find determinants by the cofactor expansion along the ith column.
Example.
\begin{align*}
\begin{vmatrix} 3a + 4s & 3b + 4t \\ c & d \end{vmatrix}
&= \begin{vmatrix} 3a + 4s & c \\ 3b + 4t & d \end{vmatrix} \quad \text{(by transposing)} \\
&= \begin{vmatrix} 3a & c \\ 3b & d \end{vmatrix} + \begin{vmatrix} 4s & c \\ 4t & d \end{vmatrix} \quad \text{(by multilinearity of determinant)} \\
&= 3\begin{vmatrix} a & c \\ b & d \end{vmatrix} + 4\begin{vmatrix} s & c \\ t & d \end{vmatrix} \quad \text{(by multilinearity of determinant)} \\
&= 3(ad - cb) + 4(sd - ct).
\end{align*}
5 Eigenvalues and Eigenvectors
5.1 Basics of Eigenvalues and Eigenvectors
Definition. Let $A$ be an $n \times n$ matrix. If $A\vec{x} = \lambda\vec{x}$ for some nonzero vector $\vec{x}$ and some scalar $\lambda$, then $\lambda$ is an eigenvalue of $A$ and $\vec{x}$ is an eigenvector of $A$ corresponding to $\lambda$.
Example. Consider $A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}$, $\lambda = 3$, $\vec{v} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $\vec{u} = \begin{bmatrix} -2 \\ 1 \end{bmatrix}$.

Since $A\vec{v} = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \lambda\vec{v}$, $3$ is an eigenvalue of $A$ and $\vec{v}$ is an eigenvector of $A$ corresponding to the eigenvalue $3$.

Since $A\vec{u} = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \end{bmatrix} \neq \lambda\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \lambda\vec{u}$ for all scalars $\lambda$, $\vec{u}$ is not an eigenvector of $A$.
Remark. For a real matrix, an eigenvalue can be a complex number and an eigenvector can be a complex vector.
Example. Consider $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. Since
\[
\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ i \end{bmatrix} = \begin{bmatrix} i \\ -1 \end{bmatrix} = i\begin{bmatrix} 1 \\ i \end{bmatrix},
\]
$i$ is an eigenvalue of $A$ and $\begin{bmatrix} 1 \\ i \end{bmatrix}$ is an eigenvector of $A$ corresponding to the eigenvalue $i$.
Remark. An eigenvector must be a nonzero vector by definition. So the following are equivalent:
1. $\lambda$ is an eigenvalue of $A$.

2. $A\vec{x} = \lambda\vec{x}$ for some nonzero vector $\vec{x}$.

3. $(A - \lambda I)\vec{x} = \vec{0}$ for some nonzero vector $\vec{x}$.

4. $(A - \lambda I)\vec{x} = \vec{0}$ has a nontrivial solution $\vec{x}$.

5. $A - \lambda I$ is not invertible (by the IMT).

6. $\det(A - \lambda I) = 0$.
Definition. $\det(\lambda I - A)$ is a polynomial in $\lambda$, called the characteristic polynomial of $A$; $\det(\lambda I - A) = 0$ is the characteristic equation of $A$.

Remark. Since the roots of the characteristic polynomial are the eigenvalues of the $n \times n$ matrix $A$, $A$ has $n$ eigenvalues, not necessarily distinct.

Definition. The multiplicity of $\lambda$ as a root of $\det(\lambda I - A)$ is the algebraic multiplicity of the eigenvalue $\lambda$ of $A$.
Remark. If $\lambda$ is an eigenvalue of $A$, then $\mathrm{NS}(A - \lambda I)$ is the union of $\{\vec{0}\}$ and the set of all eigenvectors of $A$ corresponding to the eigenvalue $\lambda$.

Definition. Suppose $\lambda$ is an eigenvalue of the matrix $A$. Then
\[ \mathrm{NS}(A - \lambda I) = \{\vec{x} \mid (A - \lambda I)\vec{x} = \vec{0}\} \]
is the eigenspace of $A$ corresponding to the eigenvalue $\lambda$, and $\dim(\mathrm{NS}(A - \lambda I))$ is the geometric multiplicity of the eigenvalue $\lambda$.
Example. Let $A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 4 & 1 \\ 0 & -2 & 1 \end{bmatrix}$.

(a) Find the characteristic polynomial of $A$.

(b) Find the eigenvalues of $A$ with their algebraic multiplicities.

(c) Find the eigenspaces of $A$ and the geometric multiplicities of the eigenvalues of $A$.

Solution. (a) The characteristic polynomial of $A$ is
\[
\det(\lambda I - A) = \begin{vmatrix} \lambda - 3 & 0 & 0 \\ 0 & \lambda - 4 & -1 \\ 0 & 2 & \lambda - 1 \end{vmatrix}
= (\lambda - 3)\begin{vmatrix} \lambda - 4 & -1 \\ 2 & \lambda - 1 \end{vmatrix} - 0 + 0
= (\lambda - 3)(\lambda^2 - 5\lambda + 6)
= (\lambda - 3)(\lambda - 3)(\lambda - 2).
\]

(b) $\det(\lambda I - A) = (\lambda - 2)(\lambda - 3)^2 = 0 \implies \lambda = 2, 3, 3$.
So $2$ and $3$ are eigenvalues of $A$ with algebraic multiplicities $1$ and $2$ respectively.

(c) The eigenspace of $A$ corresponding to the eigenvalue $3$ is $\mathrm{NS}(A - 3I) = \{\vec{x} \mid (A - 3I)\vec{x} = \vec{0}\}$.
\[
[A - 3I \mid \vec{0}] =
\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -2 & -2 & 0 \end{bmatrix}
\xrightarrow{2R_2 + R_3}
\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\xrightarrow{R_1 \leftrightarrow R_2}
\begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]
So we get $x_2 + x_3 = 0$, where $x_1$ and $x_3$ are free variables. Thus
\[
\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} x_1 \\ -x_3 \\ x_3 \end{bmatrix}
= x_1\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + x_3\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}
\in \mathrm{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \right\}.
\]
Thus the eigenspace of $A$ corresponding to the eigenvalue $3$ is
\[
\mathrm{NS}(A - 3I) = \mathrm{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \right\},
\]
and the geometric multiplicity of the eigenvalue $3$ is $\dim(\mathrm{NS}(A - 3I)) = 2$.

The eigenspace of $A$ corresponding to the eigenvalue $2$ is $\mathrm{NS}(A - 2I) = \{\vec{x} \mid (A - 2I)\vec{x} = \vec{0}\}$.
\[
[A - 2I \mid \vec{0}] =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & -2 & -1 & 0 \end{bmatrix}
\xrightarrow{R_2 + R_3}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\xrightarrow{\frac{R_2}{2}}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & \frac{1}{2} & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]
So we get $x_1 = 0$ and $x_2 + \frac{x_3}{2} = 0$, where $x_3$ is a free variable. Thus
\[
\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ -\frac{x_3}{2} \\ x_3 \end{bmatrix}
= \frac{x_3}{2}\begin{bmatrix} 0 \\ -1 \\ 2 \end{bmatrix}
\in \mathrm{Span}\left\{ \begin{bmatrix} 0 \\ -1 \\ 2 \end{bmatrix} \right\}.
\]
Thus the eigenspace of $A$ corresponding to the eigenvalue $2$ is
\[
\mathrm{NS}(A - 2I) = \mathrm{Span}\left\{ \begin{bmatrix} 0 \\ -1 \\ 2 \end{bmatrix} \right\},
\]
and the geometric multiplicity of the eigenvalue $2$ is $\dim(\mathrm{NS}(A - 2I)) = 1$.
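These hand computations can be mirrored numerically. A minimal NumPy sketch (our illustration; the geometric multiplicity is computed as $n - \mathrm{rank}(A - \lambda I)$):

```python
import numpy as np

A = np.array([[3.0, 0, 0], [0, 4, 1], [0, -2, 1]])

# Eigenvalues: the roots of det(lambda*I - A) = 0.
eigvals = np.linalg.eigvals(A)
print(np.sort(eigvals.real))  # [2. 3. 3.]

# Geometric multiplicity of lambda = dim NS(A - lambda*I)
#                                  = n - rank(A - lambda*I).
for lam in (2, 3):
    geo = 3 - np.linalg.matrix_rank(A - lam * np.eye(3))
    print(lam, geo)  # 2 has geometric multiplicity 1; 3 has 2
```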
Remark. Recall that $\vec{x} \mapsto A\vec{x}$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$. This linear transformation is invariant on the eigenspaces of $A$:

If $\lambda$ is an eigenvalue of $A$ and $\vec{x} \in \mathrm{NS}(A - \lambda I)$, then $A\vec{x} \in \mathrm{NS}(A - \lambda I)$.

Example. In the preceding example,
\[
\vec{x} = \begin{bmatrix} 4 \\ -5 \\ 5 \end{bmatrix} \in \mathrm{NS}(A - 3I) = \mathrm{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \right\},
\]
and also
\[
A\vec{x} = A\begin{bmatrix} 4 \\ -5 \\ 5 \end{bmatrix} = \begin{bmatrix} 12 \\ -15 \\ 15 \end{bmatrix} = 3\begin{bmatrix} 4 \\ -5 \\ 5 \end{bmatrix} \in \mathrm{NS}(A - 3I).
\]
Theorem 5.1 (IMT contd.). Let $A$ be an $n \times n$ matrix. Then the following are equivalent:

(a) $A$ is invertible.

(o) $0$ is not an eigenvalue of $A$.

Proof. (contrapositive) $0$ is an eigenvalue of $A$ iff $A\vec{x} = \vec{0}$ has a nontrivial solution. By the IMT, $A\vec{x} = \vec{0}$ has a nontrivial solution iff $A$ is not invertible.
Some useful results:

Theorem 5.2. Let $A$ be an $n \times n$ matrix with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Then $\det A = \lambda_1\lambda_2\cdots\lambda_n$.

Proof. Note that $\det(\lambda I - A) = (\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n)$. Plugging in $\lambda = 0$, we get $(-1)^n\det A = (-1)^n\lambda_1\lambda_2\cdots\lambda_n \implies \det A = \lambda_1\lambda_2\cdots\lambda_n$.
Theorem 5.3. The eigenvalues of a triangular matrix (e.g., a diagonal matrix) are the entries on its main diagonal.

Proof. Consider an upper-triangular matrix
\[
A = \begin{bmatrix} d_1 & \ast & \cdots & \ast \\ 0 & d_2 & \cdots & \ast \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}.
\]
Its characteristic polynomial is $\det(\lambda I - A) = (\lambda - d_1)(\lambda - d_2)\cdots(\lambda - d_n)$. So $\det(\lambda I - A) = 0 \implies \lambda = d_1, \ldots, d_n$.
Theorem 5.4. Let $A$ be a square matrix. If $\lambda$ is an eigenvalue of $A$, then $\lambda^k$ is an eigenvalue of $A^k$.

Proof. Suppose $A\vec{v} = \lambda\vec{v}$, $\vec{v} \neq \vec{0}$. Then $A(A\vec{v}) = A(\lambda\vec{v})$. So
\[ A^2\vec{v} = \lambda(A\vec{v}) = \lambda(\lambda\vec{v}) = \lambda^2\vec{v}. \]
Continuing this process, we get $A^k\vec{v} = \lambda^k\vec{v}$.
Theorem 5.5. Let $A$ be an invertible matrix. Then $\lambda$ is an eigenvalue of $A$ if and only if $\frac{1}{\lambda}$ is an eigenvalue of $A^{-1}$.

Proof. Suppose $A\vec{v} = \lambda\vec{v}$, $\vec{v} \neq \vec{0}$. Since $A$ is invertible, $\lambda \neq 0$.
\begin{align*}
A^{-1}(A\vec{v}) &= A^{-1}(\lambda\vec{v}) \\
I\vec{v} = \vec{v} &= \lambda(A^{-1}\vec{v}) \\
\frac{1}{\lambda}\vec{v} &= A^{-1}\vec{v}
\end{align*}
So $\frac{1}{\lambda}$ is an eigenvalue of $A^{-1}$. The converse follows by a similar argument.
Theorem 5.6. Let $A$ be a square matrix. If $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k$ are eigenvectors of $A$ corresponding to distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$ of $A$ respectively, then $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is linearly independent.

Proof. Let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be distinct and $A\vec{v}_i = \lambda_i\vec{v}_i$, $\vec{v}_i \neq \vec{0}$, for $i = 1, \ldots, k$. Suppose $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is linearly dependent. WLOG let $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_p\}$ be a maximal linearly independent subset of $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ for some $p < k$. Then $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_p, \vec{v}_{p+1}\}$ is linearly dependent and consequently
\[ \vec{v}_{p+1} = c_1\vec{v}_1 + \cdots + c_p\vec{v}_p, \tag{2} \]
for some scalars $c_1, \ldots, c_p$, not all zero (since $\vec{v}_{p+1} \neq \vec{0}$).
\begin{align*}
A\vec{v}_{p+1} &= A(c_1\vec{v}_1 + \cdots + c_p\vec{v}_p) \\
\lambda_{p+1}\vec{v}_{p+1} &= c_1A\vec{v}_1 + \cdots + c_pA\vec{v}_p \\
\lambda_{p+1}\vec{v}_{p+1} &= c_1\lambda_1\vec{v}_1 + \cdots + c_p\lambda_p\vec{v}_p \tag{3}
\end{align*}
Now $\lambda_{p+1}\cdot(2) - (3)$ gives
\[ \vec{0} = c_1(\lambda_{p+1} - \lambda_1)\vec{v}_1 + \cdots + c_p(\lambda_{p+1} - \lambda_p)\vec{v}_p. \tag{4} \]
Since $\lambda_{p+1} - \lambda_i \neq 0$ for $i = 1, \ldots, p$ and $c_1, \ldots, c_p$ are not all zero, $c_1(\lambda_{p+1} - \lambda_1), \ldots, c_p(\lambda_{p+1} - \lambda_p)$ are not all zero. So (4) implies $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_p\}$ is linearly dependent, a contradiction.
Remark. The converse of the preceding theorem is not true. Consider $A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 4 & 1 \\ 0 & -2 & 1 \end{bmatrix}$ from the last example. $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}$ are linearly independent eigenvectors of $A$, yet they correspond to the same eigenvalue $3$ of $A$.
5.2 Similar and Diagonalizable Matrices
Definition. Let $A$ and $B$ be $n \times n$ matrices. $A$ is similar to $B$ if $A = PBP^{-1}$ for some invertible matrix $P$.

Remark. If $A$ is similar to $B$, then $B$ is similar to $A$ because $B = P^{-1}A(P^{-1})^{-1}$. So we simply say $A$ and $B$ are similar.
Example. Consider $A = \begin{bmatrix} 6 & -1 \\ 2 & 3 \end{bmatrix}$, $B = \begin{bmatrix} 5 & 0 \\ 0 & 4 \end{bmatrix}$, $C = \begin{bmatrix} 5 & 4 \\ 0 & 0 \end{bmatrix}$.

$A$ and $B$ are similar because $A = PBP^{-1}$ where $P = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$ and $P^{-1} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}$. It can be verified that there is no invertible matrix $P$ such that $A = PCP^{-1}$. So $A$ and $C$ are not similar.
Theorem 5.7. If $n \times n$ matrices $A$ and $B$ are similar, then they have the same characteristic polynomial and consequently the same eigenvalues, counting multiplicities.

Proof. Let $A = PBP^{-1}$ for some invertible matrix $P$. Then
\begin{align*}
\det(\lambda I - A) &= \det(\lambda I - PBP^{-1}) \\
&= \det(\lambda PP^{-1} - PBP^{-1}) \\
&= \det(P(\lambda I - B)P^{-1}) \\
&= \det P \det(\lambda I - B)\det(P^{-1}) \\
&= \det(\lambda I - B)\det P \det(P^{-1}) \\
&= \det(\lambda I - B)\det(PP^{-1}) \\
&= \det(\lambda I - B) \cdot 1.
\end{align*}
Remark.

1. In the preceding example $A$ and $B$ are similar, so they have the same eigenvalues. Since the eigenvalues of $C$ and $B$ are different, $C$ is not similar to $B$, and hence not similar to $A$.

2. If $A$ and $B$ are similar, they have the same eigenvalues. But the converse is not true. For example, $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ have the same eigenvalues $1, 1$, but they are not similar: any matrix similar to $I_2$ equals $PI_2P^{-1} = I_2$ itself.
Theorem 5.8. Let $A$ be an $n \times n$ matrix with eigenvalue $\lambda$. Then the geometric multiplicity of $\lambda$ is less than or equal to the algebraic multiplicity of $\lambda$.

Proof. Let $k$ be the geometric multiplicity of $\lambda$. Suppose $\vec{x}_1, \ldots, \vec{x}_k$ are $k$ linearly independent eigenvectors of $A$ corresponding to $\lambda$. Let $P = [\vec{x}_1 \cdots \vec{x}_k\ \ast \cdots \ast]$ be an $n \times n$ invertible matrix. Then
\[ P^{-1}AP = \begin{bmatrix} \lambda I_k & \ast \\ 0 & \ast \end{bmatrix} \]
and $\lambda$ is an eigenvalue of $P^{-1}AP$ with algebraic multiplicity at least $k$. Since $A$ and $P^{-1}AP$, being similar, have the same eigenvalues, the algebraic multiplicity of $\lambda$ is at least $k$.
Definition. A square matrix $A$ is diagonalizable if $A$ is similar to a diagonal matrix, i.e., $A = PDP^{-1}$ for some invertible matrix $P$ and some diagonal matrix $D$.

Example. In the first example of this section, $A$ is diagonalizable as $A = PBP^{-1}$ where $B$ is a diagonal matrix.
Theorem 5.9. Let $A$ be an $n \times n$ matrix. Then the following are equivalent:

(a) $A$ is diagonalizable,

(b) there are $n$ linearly independent eigenvectors of $A$,

(c) the sum of the geometric multiplicities of the distinct eigenvalues of $A$ is $n$, and

(d) the geometric multiplicity and the algebraic multiplicity are the same for all eigenvalues of $A$.

Proof. First note that (b), (c), and (d) are equivalent. So we prove (a)$\iff$(b).

(a)$\implies$(b). There is an invertible matrix $P = [\vec{p}_1, \ldots, \vec{p}_n]$ such that $A = P\,\mathrm{diag}(\lambda_1, \ldots, \lambda_n)\,P^{-1}$, i.e., $AP = P\,\mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. So $A\vec{p}_i = \lambda_i\vec{p}_i$ for $i = 1, \ldots, n$, and hence $\vec{p}_i$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda_i$ for $i = 1, \ldots, n$. Since $P$ is invertible, its columns $\vec{p}_1, \ldots, \vec{p}_n$ are linearly independent by the IMT.

(b)$\implies$(a). Suppose $\vec{x}_1, \ldots, \vec{x}_n$ are $n$ linearly independent eigenvectors of $A$ corresponding to the eigenvalues $\lambda_1, \ldots, \lambda_n$ respectively. Then $P = [\vec{x}_1, \ldots, \vec{x}_n]$ is invertible by the IMT. Since $A\vec{x}_i = \lambda_i\vec{x}_i$ for $i = 1, \ldots, n$, we get
\begin{align*}
[A\vec{x}_1, \ldots, A\vec{x}_n] &= [\lambda_1\vec{x}_1, \ldots, \lambda_n\vec{x}_n] \\
A[\vec{x}_1, \ldots, \vec{x}_n] &= [\vec{x}_1, \ldots, \vec{x}_n]\,\mathrm{diag}(\lambda_1, \ldots, \lambda_n) \\
AP &= PD,
\end{align*}
where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Thus $A = PDP^{-1}$.
Corollary 5.10. Let $A$ be an $n \times n$ matrix.

1. If $A$ has $n$ distinct eigenvalues, then $A$ is diagonalizable.

2. Suppose that $A$ has $k$ distinct eigenvalues $\lambda_1, \ldots, \lambda_k$ with eigenbases $B_1, \ldots, B_k$ respectively. Then $A$ is diagonalizable if and only if $B_1 \cup \cdots \cup B_k$ is a basis for $\mathbb{R}^n$.

A formula for $A^k$: Suppose $A$ is diagonalizable and $A = PDP^{-1}$ for some diagonal matrix $D$. Then
\[ A^k = PD^kP^{-1}. \]
It is easy to see that
\[ AA \cdots A = (PDP^{-1})(PDP^{-1}) \cdots (PDP^{-1}) = PDD \cdots DP^{-1}. \]
Note that $D^k$ is obtained from $D$ by raising each diagonal entry of $D$ to the $k$th power.
Example. Let $A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 1 \\ -1 & 0 & 1 \end{bmatrix}$.

(a) Diagonalize $A$, if possible.

(b) Find $A^k$, if $A$ is diagonalizable.

Solution. $\det(\lambda I - A) = \begin{vmatrix} \lambda - 2 & 0 & 0 \\ -1 & \lambda - 2 & -1 \\ 1 & 0 & \lambda - 1 \end{vmatrix} = (\lambda - 1)(\lambda - 2)^2 = 0 \implies \lambda = 1, 2, 2$.
Verify the following:
\[
\mathrm{NS}(A - 1I) = \mathrm{Span}\left\{ \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \right\}, \qquad
\mathrm{NS}(A - 2I) = \mathrm{Span}\left\{ \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \right\}.
\]

(a) Since the $3 \times 3$ matrix $A$ has $3$ linearly independent eigenvectors, $A$ is diagonalizable and $A = PDP^{-1}$ where
\[
D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \quad \text{and} \quad
P = \begin{bmatrix} 0 & 0 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}.
\]
You may verify this by showing $AP = PD = \begin{bmatrix} 0 & 0 & -2 \\ -1 & 2 & 0 \\ 1 & 0 & 2 \end{bmatrix}$.
(b) Since $A = PDP^{-1}$,
\begin{align*}
A^k = PD^kP^{-1}
&= \begin{bmatrix} 0 & 0 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}^k
\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ -1 & 0 & 0 \end{bmatrix} \\
&= \begin{bmatrix} 0 & 0 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2^k & 0 \\ 0 & 0 & 2^k \end{bmatrix}
\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ -1 & 0 & 0 \end{bmatrix}
= \begin{bmatrix} 2^k & 0 & 0 \\ -1 + 2^k & 2^k & -1 + 2^k \\ 1 - 2^k & 0 & 1 \end{bmatrix}.
\end{align*}

An interesting fact: If $A$ and $B$ are diagonalizable and they have the same eigenvectors, then $AB = BA$.
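The closed form for $A^k$ can be checked against repeated multiplication. A minimal NumPy sketch (our illustration, with $k = 5$ as a sample exponent):

```python
import numpy as np

A = np.array([[2.0, 0, 0], [1, 2, 1], [-1, 0, 1]])

# Diagonalization A = P D P^{-1} with the eigenvector matrix from the example.
P = np.array([[0.0, 0, -1], [-1, 1, 0], [1, 0, 1]])
D = np.diag([1.0, 2, 2])
assert np.allclose(A @ P, P @ D)  # AP = PD, so A = P D P^{-1}

k = 5
Ak_diag = P @ np.diag([1.0, 2**k, 2**k]) @ np.linalg.inv(P)
Ak_direct = np.linalg.matrix_power(A, k)
print(np.allclose(Ak_diag, Ak_direct))  # True

# Closed form from the example, entrywise in 2^k.
closed = np.array([[2**k, 0, 0],
                   [-1 + 2**k, 2**k, -1 + 2**k],
                   [1 - 2**k, 0, 1]], dtype=float)
print(np.allclose(Ak_direct, closed))  # True
```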
5.3 Similarity of Matrix Transformations
Suppose $B = (\vec{b}_1, \ldots, \vec{b}_n)$ is an ordered basis of $\mathbb{R}^n$. Then any vector $\vec{x} \in \mathbb{R}^n$ can be written as $\vec{x} = c_1\vec{b}_1 + c_2\vec{b}_2 + \cdots + c_n\vec{b}_n$ for some unique scalars $c_1, c_2, \ldots, c_n$. The coordinate vector of $\vec{x}$ relative to $B$, or the $B$-coordinate of $\vec{x}$, denoted by $[\vec{x}]_B$, is $[\vec{x}]_B = [c_1, c_2, \ldots, c_n]^T$.

Example.

1. For $E_2 = (\vec{e}_1, \vec{e}_2)$ and $\vec{x} = [3, 2]^T = 3\vec{e}_1 + 2\vec{e}_2$, we have $[\vec{x}]_{E_2} = [3, 2]^T$.

2. For $B = (\vec{e}_1, \vec{e}_1 + 2\vec{e}_2)$ and $\vec{x} = [3, 2]^T = 2\vec{e}_1 + 1(\vec{e}_1 + 2\vec{e}_2)$, we have $[\vec{x}]_B = [2, 1]^T$.
Remark. $[\ ]_B$ is an isomorphism on $\mathbb{R}^n$.

For two ordered bases $B = (\vec{b}_1, \ldots, \vec{b}_n)$ and $C = (\vec{c}_1, \ldots, \vec{c}_n)$ of $\mathbb{R}^n$, what is the relationship between $[\vec{x}]_B$ and $[\vec{x}]_C$? The change of basis matrix from $B$ to $C$, denoted by $M_{C \leftarrow B}$, is the $n \times n$ invertible matrix for which $[\vec{x}]_C = M_{C \leftarrow B}[\vec{x}]_B$ for all $\vec{x} \in \mathbb{R}^n$. How do we find $M_{C \leftarrow B}$?

Let $A$ be an $n \times n$ matrix. Consider the linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ defined by $T(\vec{x}) = A\vec{x}$. So $T$ is the matrix transformation $\vec{x} \mapsto A\vec{x}$. Consider two ordered bases $B = (\vec{b}_1, \ldots, \vec{b}_n)$ and $C = (\vec{c}_1, \ldots, \vec{c}_n)$ of $\mathbb{R}^n$. What is the relationship between $[\vec{x}]_B$ and $[T(\vec{x})]_C$? The matrix of $T$ from $B$ to $C$, denoted by $[T]_{C \leftarrow B}$ or $_C[T]_B$, is the $n \times n$ matrix for which $[T(\vec{x})]_C = {}_C[T]_B[\vec{x}]_B$ for all $\vec{x} \in \mathbb{R}^n$. How do we find $_C[T]_B$?
[Diagram: a commutative square on $\mathbb{R}^n$. Across the top, $T$ sends $\vec{x}$ to $T(\vec{x})$; down the sides, the coordinate maps $[\ ]_B$ and $[\ ]_C$ send these to $[\vec{x}]_B$ and $[T(\vec{x})]_C$; across the bottom, $_C[T]_B$ sends $[\vec{x}]_B$ to $[T(\vec{x})]_C$.]
For a vector $\vec{x} \in \mathbb{R}^n$, suppose $[\vec{x}]_B = [r_1, r_2, \ldots, r_n]^T$, i.e., $\vec{x} = r_1\vec{b}_1 + \cdots + r_n\vec{b}_n$. Then
\[
T(\vec{x}) = A\vec{x} = A(r_1\vec{b}_1 + \cdots + r_n\vec{b}_n) = r_1A\vec{b}_1 + \cdots + r_nA\vec{b}_n.
\]
\begin{align*}
[A\vec{x}]_C &= [r_1A\vec{b}_1 + \cdots + r_nA\vec{b}_n]_C \\
&= r_1[A\vec{b}_1]_C + \cdots + r_n[A\vec{b}_n]_C \\
&= \left[[A\vec{b}_1]_C \cdots [A\vec{b}_n]_C\right]\begin{bmatrix} r_1 \\ \vdots \\ r_n \end{bmatrix} \\
&= \left[[A\vec{b}_1]_C \cdots [A\vec{b}_n]_C\right][\vec{x}]_B.
\end{align*}
Thus
\[ {}_C[T]_B = \left[[A\vec{b}_1]_C \cdots [A\vec{b}_n]_C\right]. \]
Remark.
1. If $C = B$, then we simply denote $_C[T]_B$ by $[T]_B$, called the $B$-matrix of $T : \vec{x} \mapsto A\vec{x}$.

2. If $B = (\vec{e}_1, \ldots, \vec{e}_n)$, then $[T]_B = A$, the standard matrix of $T : \vec{x} \mapsto A\vec{x}$.

3. If $A = I_n$, then $T = I$ and $_C[I]_B = M_{C \leftarrow B}$, the change of basis matrix from $B$ to $C$.
Example. Let $B = \left( \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right)$, $C = \left( \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \end{bmatrix} \right)$, and $A = \begin{bmatrix} -2 & 7 \\ 1 & 4 \end{bmatrix}$.

(a) Find $_C[T]_B$, the matrix of $T : \vec{x} \mapsto A\vec{x}$ from $B$ to $C$, and use it to find $[T(\vec{x})]_C$ where $[\vec{x}]_B = [13, -1]^T$.

(b) Find $[T]_B$, the $B$-matrix of $T : \vec{x} \mapsto A\vec{x}$.

(c) Find $M_{C \leftarrow B}$, the change of basis matrix from $B$ to $C$, and use it to find $[\vec{x}]_C$ where $[\vec{x}]_B = [13, -1]^T$.
Solution. (a)
\[
A\vec{b}_1 = \begin{bmatrix} -2 \\ 1 \end{bmatrix} = 1\begin{bmatrix} 1 \\ 2 \end{bmatrix} - 1\begin{bmatrix} 3 \\ 1 \end{bmatrix} = 1\vec{c}_1 - 1\vec{c}_2 \implies [A\vec{b}_1]_C = \begin{bmatrix} 1 \\ -1 \end{bmatrix}
\]
\[
A\vec{b}_2 = \begin{bmatrix} 5 \\ 5 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 1\begin{bmatrix} 3 \\ 1 \end{bmatrix} = 2\vec{c}_1 + 1\vec{c}_2 \implies [A\vec{b}_2]_C = \begin{bmatrix} 2 \\ 1 \end{bmatrix}
\]
So the matrix of $T : \vec{x} \mapsto A\vec{x}$ from $B$ to $C$ is
\[
{}_C[T]_B = \left[[A\vec{b}_1]_C\ [A\vec{b}_2]_C\right] = \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}.
\]
\[
[T(\vec{x})]_C = {}_C[T]_B[\vec{x}]_B = \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 13 \\ -1 \end{bmatrix} = \begin{bmatrix} 11 \\ -14 \end{bmatrix}.
\]

(b)
\[
A\vec{b}_1 = \begin{bmatrix} -2 \\ 1 \end{bmatrix} = -3\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 1 \end{bmatrix} = -3\vec{b}_1 + 1\vec{b}_2 \implies [A\vec{b}_1]_B = \begin{bmatrix} -3 \\ 1 \end{bmatrix}
\]
\[
A\vec{b}_2 = \begin{bmatrix} 5 \\ 5 \end{bmatrix} = 0\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 5\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 0\vec{b}_1 + 5\vec{b}_2 \implies [A\vec{b}_2]_B = \begin{bmatrix} 0 \\ 5 \end{bmatrix}
\]
So the $B$-matrix of $T : \vec{x} \mapsto A\vec{x}$ is
\[
[T]_B = \left[[A\vec{b}_1]_B\ [A\vec{b}_2]_B\right] = \begin{bmatrix} -3 & 0 \\ 1 & 5 \end{bmatrix}.
\]

(c)
\[
\vec{b}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} = -\frac{1}{5}\begin{bmatrix} 1 \\ 2 \end{bmatrix} + \frac{2}{5}\begin{bmatrix} 3 \\ 1 \end{bmatrix} = -\frac{1}{5}\vec{c}_1 + \frac{2}{5}\vec{c}_2 \implies [\vec{b}_1]_C = \begin{bmatrix} -\frac{1}{5} \\ \frac{2}{5} \end{bmatrix}
\]
\[
\vec{b}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{2}{5}\begin{bmatrix} 1 \\ 2 \end{bmatrix} + \frac{1}{5}\begin{bmatrix} 3 \\ 1 \end{bmatrix} = \frac{2}{5}\vec{c}_1 + \frac{1}{5}\vec{c}_2 \implies [\vec{b}_2]_C = \begin{bmatrix} \frac{2}{5} \\ \frac{1}{5} \end{bmatrix}
\]
So the change of basis matrix from $B$ to $C$ is
\[
M_{C \leftarrow B} = {}_C[I]_B = \left[[\vec{b}_1]_C\ [\vec{b}_2]_C\right] = \begin{bmatrix} -\frac{1}{5} & \frac{2}{5} \\ \frac{2}{5} & \frac{1}{5} \end{bmatrix} = \frac{1}{5}\begin{bmatrix} -1 & 2 \\ 2 & 1 \end{bmatrix}.
\]
\[
[\vec{x}]_C = M_{C \leftarrow B}[\vec{x}]_B = \frac{1}{5}\begin{bmatrix} -1 & 2 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 13 \\ -1 \end{bmatrix} = \begin{bmatrix} -3 \\ 5 \end{bmatrix}.
\]
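Coordinate vectors can also be found by solving linear systems: $[\vec{y}]_C$ solves $C_{\text{mat}}[\vec{y}]_C = \vec{y}$, where $C_{\text{mat}}$ has the basis vectors as columns. A minimal NumPy sketch of parts (a) and (c) (variable names are ours):

```python
import numpy as np

A = np.array([[-2.0, 7], [1, 4]])
Bmat = np.array([[1.0, 1], [0, 1]])  # columns b1, b2
Cmat = np.array([[1.0, 3], [2, 1]])  # columns c1, c2

# The matrix of T from B to C is C^{-1} A B, computed column by column
# by solving Cmat @ X = A @ Bmat.
T_CB = np.linalg.solve(Cmat, A @ Bmat)
print(T_CB)  # [[ 1.  2.]  [-1.  1.]]

x_B = np.array([13.0, -1])
print(T_CB @ x_B)  # [ 11. -14.] = [T(x)]_C

# Change of basis matrix from B to C: C^{-1} B.
M_CB = np.linalg.solve(Cmat, Bmat)
print(M_CB @ x_B)  # [-3.  5.] = [x]_C
```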
Theorem 5.11. Let $A$ and $D$ be two $n \times n$ matrices such that $A = PDP^{-1}$. If $B$ is the basis of $\mathbb{R}^n$ formed from the columns of $P$, then the $B$-matrix of $\vec{x} \mapsto A\vec{x}$ is $D = P^{-1}AP$.

Proof. Let $P = [\vec{b}_1 \cdots \vec{b}_n]$ and $B = (\vec{b}_1, \ldots, \vec{b}_n)$. Since $[\vec{b}_1 \cdots \vec{b}_n][\vec{x}]_B = \vec{x}$, we have $P[\vec{x}]_B = \vec{x}$. So $[\vec{x}]_B = P^{-1}\vec{x}$ for all $\vec{x} \in \mathbb{R}^n$.
\begin{align*}
[T]_B &= \left[[A\vec{b}_1]_B \cdots [A\vec{b}_n]_B\right] \\
&= \left[P^{-1}A\vec{b}_1 \cdots P^{-1}A\vec{b}_n\right] \\
&= P^{-1}A[\vec{b}_1 \cdots \vec{b}_n] \\
&= P^{-1}AP \\
&= D \quad \text{since } A = PDP^{-1}.
\end{align*}
Remark.

1. Suppose $A$ is diagonalizable and $A = PDP^{-1}$ where $D$ is a diagonal matrix. If $B$ is the basis of $\mathbb{R}^n$ formed from the columns of $P$ (linearly independent eigenvectors of $A$), then the $B$-matrix of $\vec{x} \mapsto A\vec{x}$ is the diagonal matrix $D$ whose main diagonal entries are the corresponding eigenvalues of $A$.

2. The set of all matrix representations (i.e., $B$-matrices) of $\vec{x} \mapsto A\vec{x}$ is the set of all matrices similar to $A$.
Example. Let $A = \begin{bmatrix} -1 & 3 \\ -3 & 5 \end{bmatrix}$. The eigenvalues of $A$ are $\lambda = 2, 2$ with only one linearly independent eigenvector $\vec{v} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Then $A$ is not diagonalizable and no $B$-matrix of $\vec{x} \mapsto A\vec{x}$ is diagonal. So we find a vector $\vec{w}$ such that $(A - \lambda I)^2\vec{w} = \vec{0}$ and $(A - \lambda I)\vec{w} \neq \vec{0}$. Such a $\vec{w}$ is called a generalized eigenvector of $A$ corresponding to the eigenvalue $\lambda = 2$. One such $\vec{w}$ is $\vec{w} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$.

Now consider the basis $B = \left( \begin{bmatrix} 3 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right)$ of $\mathbb{R}^2$, whose first vector is the eigenvector $(A - 2I)\vec{w} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$ and whose second vector is the generalized eigenvector $\vec{w}$. Then the $B$-matrix of $\vec{x} \mapsto A\vec{x}$ is the upper-triangular matrix
\[
\begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix} = P^{-1}AP \quad \text{where } P = \begin{bmatrix} 3 & 1 \\ 3 & 2 \end{bmatrix}.
\]
(Taking the first basis vector to be exactly $(A - 2I)\vec{w}$, rather than $\vec{v}$, is what produces the entry $1$ above the diagonal; with $P = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$ one gets $P^{-1}AP = \begin{bmatrix} 2 & 3 \\ 0 & 2 \end{bmatrix}$ instead.)

This upper-triangular matrix $J = P^{-1}AP = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$ is called the Jordan form of $A$.
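A quick numerical check of the generalized eigenvector and the resulting Jordan form (our illustration; the basis columns are $(A - 2I)\vec{w}$ followed by $\vec{w}$):

```python
import numpy as np

A = np.array([[-1.0, 3], [-3, 5]])
N = A - 2 * np.eye(2)  # A - lambda*I for the repeated eigenvalue 2

w = np.array([1.0, 2])
print(N @ w)        # [3. 3.]  nonzero, so w is not an eigenvector
print(N @ (N @ w))  # [0. 0.]  so w is a generalized eigenvector

# Basis: eigenvector (A - 2I)w first, then w.
P = np.column_stack([N @ w, w])
J = np.linalg.inv(P) @ A @ P
print(np.round(J, 10))  # [[2. 1.]  [0. 2.]], the Jordan form of A
```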
Theorem 5.12. Any $n \times n$ matrix $A$ is similar to an $n \times n$ matrix in Jordan form, i.e., $A = PJP^{-1}$ where
\[
J = \begin{bmatrix} J_1 & & \\ & \ddots & \\ & & J_k \end{bmatrix}, \qquad
J_i = \begin{bmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{bmatrix},
\]
and $\lambda_1, \ldots, \lambda_k$ are eigenvalues of $A$ (not necessarily distinct).
5.4 Application to Differential Equations
Suppose $x_1, x_2, \ldots, x_n$ are $n$ functions of $t$. Consider the following system of $n$ linear ODEs:
\begin{align*}
x_1' &= a_{11}(t)x_1 + a_{12}(t)x_2 + \cdots + a_{1n}(t)x_n + g_1(t) \\
x_2' &= a_{21}(t)x_1 + a_{22}(t)x_2 + \cdots + a_{2n}(t)x_n + g_2(t) \\
&\ \ \vdots \\
x_n' &= a_{n1}(t)x_1 + a_{n2}(t)x_2 + \cdots + a_{nn}(t)x_n + g_n(t).
\end{align*}
It can be written compactly in the following matrix form:
\[ \vec{x}\,' = A\vec{x} + \vec{g}, \tag{5} \]
where
\[
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}, \quad
\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad \text{and} \quad
\vec{g} = \begin{bmatrix} g_1 \\ g_2 \\ \vdots \\ g_n \end{bmatrix}.
\]
$A$ is called the coefficient matrix of (5). When $\vec{g} = \vec{0}$, (5) is a homogeneous system; when $\vec{g} \neq \vec{0}$, (5) is a nonhomogeneous system.
Theorem 5.13. If $\vec{v}_1, \ldots, \vec{v}_n$ are $n$ linearly independent eigenvectors of $A$ corresponding to eigenvalues $\lambda_1, \ldots, \lambda_n$ respectively, then $e^{\lambda_1 t}\vec{v}_1, \ldots, e^{\lambda_n t}\vec{v}_n$ are $n$ linearly independent solutions of
\[ \vec{x}\,' = A\vec{x} \]
and the general solution is
\[ \vec{x} = c_1e^{\lambda_1 t}\vec{v}_1 + \cdots + c_ne^{\lambda_n t}\vec{v}_n, \]
for arbitrary scalars $c_1, \ldots, c_n$.

Verify:
\begin{align*}
\vec{x}\,' &= c_1e^{\lambda_1 t}\lambda_1\vec{v}_1 + \cdots + c_ne^{\lambda_n t}\lambda_n\vec{v}_n \\
A\vec{x} &= c_1e^{\lambda_1 t}A\vec{v}_1 + \cdots + c_ne^{\lambda_n t}A\vec{v}_n
= c_1e^{\lambda_1 t}\lambda_1\vec{v}_1 + \cdots + c_ne^{\lambda_n t}\lambda_n\vec{v}_n.
\end{align*}
Thus $\vec{x}\,' = A\vec{x}$.
Example. Suppose a particle is moving in a planar force field and its position vector $\vec{x}$ satisfies the IVP
\[ \vec{x}\,' = A\vec{x}, \quad \vec{x}(0) = [5, 6]^T, \]
where $A = \begin{bmatrix} 2 & 0 \\ 4 & -3 \end{bmatrix}$. Solve the IVP and sketch the trajectory of the particle in $\mathbb{R}^2$.
Solution. The eigenvalues of $A$ are $2$ and $-3$ with corresponding eigenvectors $\begin{bmatrix} 5 \\ 4 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ respectively (show all the steps). So the general solution is
\[
\vec{x}(t) = c_1e^{2t}\begin{bmatrix} 5 \\ 4 \end{bmatrix} + c_2e^{-3t}\begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]
\[
\vec{x}(0) = \begin{bmatrix} 5 \\ 6 \end{bmatrix}
\implies c_1\begin{bmatrix} 5 \\ 4 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 6 \end{bmatrix}
\implies 5c_1 = 5,\ 4c_1 + c_2 = 6
\implies c_1 = 1,\ c_2 = 2.
\]
So the solution is
\[
\vec{x}(t) = e^{2t}\begin{bmatrix} 5 \\ 4 \end{bmatrix} + 2e^{-3t}\begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]
[Figure: the trajectory in the $x_1x_2$-plane passing through the point $(5, 6)$.]
Geometric view:
\[
\vec{x}(t) = e^{2t}\begin{bmatrix} 5 \\ 4 \end{bmatrix} + 2e^{-3t}\begin{bmatrix} 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 5e^{2t} \\ 4e^{2t} + 2e^{-3t} \end{bmatrix}
\implies x_1 = 5e^{2t},\ x_2 = 4e^{2t} + 2e^{-3t}.
\]
Eliminating $t$ by using $e^t = \sqrt{x_1/5}$, we get $x_1^3(4x_1 - 5x_2)^2 = 12500$, the trajectory of the particle whose planar motion is described by the given IVP.
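The solution can be spot-checked numerically: compare a finite-difference derivative of the eigenvector formula against $A\vec{x}$, and verify that $x_1^3(4x_1 - 5x_2)^2$ stays constant along the solution (substituting $x_1 = 5e^{2t}$, $x_2 = 4e^{2t} + 2e^{-3t}$ gives the constant $12500$). A minimal NumPy sketch:

```python
import numpy as np

A = np.array([[2.0, 0], [4, -3]])

def x(t):
    # Solution of the IVP: x(t) = e^{2t}[5, 4] + 2 e^{-3t}[0, 1]
    return np.exp(2 * t) * np.array([5.0, 4.0]) + 2 * np.exp(-3 * t) * np.array([0.0, 1.0])

print(x(0))  # [5. 6.]  matches the initial condition

# Check x'(t) = A x(t) at a sample time via a central difference.
t, h = 0.3, 1e-6
deriv = (x(t + h) - x(t - h)) / (2 * h)
print(np.allclose(deriv, A @ x(t)))  # True (up to finite-difference error)

# Trajectory invariant: x1^3 (4 x1 - 5 x2)^2 is constant along the solution.
for t in (0.0, 0.5, 1.0):
    x1, x2 = x(t)
    print(round(x1**3 * (4 * x1 - 5 * x2) ** 2, 6))  # 12500.0 each time
```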
6 Inner-product and Orthogonality
6.1 Orthogonal Vectors in Rn
Definition. The inner product or the dot product of two vectors $\vec{u}$ and $\vec{v}$ in $\mathbb{R}^n$, denoted by $\vec{u}\cdot\vec{v}$, is defined by $\vec{u}\cdot\vec{v} = \vec{u}^T\vec{v}$.

Example. For $\vec{u} = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$, $\vec{u}\cdot\vec{v} = \vec{u}^T\vec{v} = 1\cdot 2 - 2\cdot 1 + 3\cdot(-1) = -3$.
Theorem 6.1. The following are true for all $\vec{u}, \vec{v}, \vec{w}$ in $\mathbb{R}^n$ and for all scalars $c, d$ in $\mathbb{R}$.

(a) $\vec{u}\cdot\vec{v} = \vec{v}\cdot\vec{u}$. (symmetry)

(b) $(c\vec{u} + d\vec{v})\cdot\vec{w} = c(\vec{u}\cdot\vec{w}) + d(\vec{v}\cdot\vec{w})$. (linearity)

(c) $\vec{u}\cdot\vec{u} \geq 0$, where $\vec{u}\cdot\vec{u} = 0$ if and only if $\vec{u} = \vec{0}$. (nonnegativity)

Definition. The length or norm of $\vec{v} = [v_1, v_2, \ldots, v_n]^T$ in $\mathbb{R}^n$, denoted by $\|\vec{v}\|$, is defined by $\|\vec{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$. A vector $\vec{v} \in \mathbb{R}^n$ is a unit vector if $\|\vec{v}\| = 1$.
Remark. The following are true for all $\vec{v}$ in $\mathbb{R}^n$ and for all scalars $c$ in $\mathbb{R}$.

(a) $\|\vec{v}\|^2 = \vec{v}\cdot\vec{v}$.

(b) $\|c\vec{v}\| = |c|\,\|\vec{v}\|$.

(c) The unit vector in the direction of $\vec{v} \neq \vec{0}$ is $\frac{1}{\|\vec{v}\|}\vec{v}$.

Example. The unit vector in the opposite direction of $\vec{v} = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}$ is $-\frac{1}{\|\vec{v}\|}\vec{v} = \frac{1}{\sqrt{14}}\begin{bmatrix} -1 \\ 2 \\ -3 \end{bmatrix}$.
Definition. The distance between $\vec{u}, \vec{v}$ in $\mathbb{R}^n$, denoted by $d(\vec{u}, \vec{v})$, is defined by
\[ d(\vec{u}, \vec{v}) = \|\vec{u} - \vec{v}\|. \]
Note that $d(\vec{u}, \vec{v})^2 = \|\vec{u} - \vec{v}\|^2 = \|\vec{u}\|^2 + \|\vec{v}\|^2 - 2\vec{u}\cdot\vec{v}$ and $d(\vec{u}, -\vec{v})^2 = \|\vec{u} + \vec{v}\|^2 = \|\vec{u}\|^2 + \|\vec{v}\|^2 + 2\vec{u}\cdot\vec{v}$. So $\vec{u}$ and $\vec{v}$ are perpendicular if and only if $d(\vec{u}, \vec{v}) = d(\vec{u}, -\vec{v})$ if and only if $\vec{u}\cdot\vec{v} = 0$.
Definition. Two vectors $\vec{u}$ and $\vec{v}$ in $\mathbb{R}^n$ are orthogonal if $\vec{u}\cdot\vec{v} = 0$.

Example. Let $\vec{u} = [3, 2, -5, 0]^T$ and $\vec{v} = [-4, 1, -2, 1]^T$.

(a) Determine if $\vec{u}$ and $\vec{v}$ are orthogonal.

(b) Find $d(\vec{u}, \vec{v})$.

Solution. (a) Since $\vec{u}\cdot\vec{v} = 3\cdot(-4) + 2\cdot 1 - 5\cdot(-2) + 0\cdot 1 = 0$, $\vec{u}$ and $\vec{v}$ are orthogonal.

(b)
\begin{align*}
d(\vec{u}, \vec{v}) = \|\vec{u} - \vec{v}\| &= \sqrt{\|\vec{u}\|^2 + \|\vec{v}\|^2 - 2\vec{u}\cdot\vec{v}} \\
&= \sqrt{\|\vec{u}\|^2 + \|\vec{v}\|^2} \quad (\text{since } \vec{u}\cdot\vec{v} = 0) \\
&= \sqrt{38 + 22} = \sqrt{60}.
\end{align*}
Theorem 6.2 (Pythagorean Theorem). Two vectors $\vec{u}$ and $\vec{v}$ in $\mathbb{R}^n$ are orthogonal if and only if $\|\vec{u} + \vec{v}\|^2 = \|\vec{u}\|^2 + \|\vec{v}\|^2$.

Definition. The angle $\theta$ between two nonzero vectors $\vec{u}$ and $\vec{v}$ in $\mathbb{R}^n$ is the angle in $[0, \pi]$ satisfying
\[ \vec{u}\cdot\vec{v} = \|\vec{u}\|\,\|\vec{v}\|\cos\theta. \]
Definition. Let $W$ be a subspace of $\mathbb{R}^n$. A vector $\vec{v} \in \mathbb{R}^n$ is orthogonal to $W$ if $\vec{v}\cdot\vec{w} = 0$ for all $\vec{w} \in W$. The orthogonal complement of $W$, denoted by $W^\perp$, is the set of all vectors in $\mathbb{R}^n$ that are orthogonal to $W$, i.e.,
\[ W^\perp = \{\vec{v} \in \mathbb{R}^n \mid \vec{v}\cdot\vec{w} = 0 \text{ for all } \vec{w} \in W\}. \]

Example.

1. If $L$ is a line in $\mathbb{R}^2$ through the origin, then $L^\perp$ is the line through the origin that is perpendicular to $L$.

2. If $L$ is a line in $\mathbb{R}^3$ through the origin, then $L^\perp$ is the plane through the origin that is perpendicular to $L$. Note that $(L^\perp)^\perp = L$.
Theorem 6.3. Let $W$ be a subspace of $\mathbb{R}^n$ and $W = \mathrm{Span}\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$. Then

(a) $\vec{v} \in W^\perp$ if and only if $\vec{v}\cdot\vec{w}_i = 0$ for $i = 1, 2, \ldots, k$.

(b) $W^\perp$ is a subspace of $\mathbb{R}^n$.

(c) $(W^\perp)^\perp = W$.

(d) $W \cap W^\perp = \{\vec{0}\}$.

Proof.

(a) Let $\vec{v} \in W^\perp$. Then $\vec{v}\cdot\vec{w} = 0$ for all $\vec{w} \in W$. Since $\vec{w}_i \in W$ for $i = 1, 2, \ldots, k$, $\vec{v}\cdot\vec{w}_i = 0$ for $i = 1, 2, \ldots, k$.
Conversely suppose that $\vec{v}\cdot\vec{w}_i = 0$ for $i = 1, 2, \ldots, k$. Let $\vec{w} \in W = \mathrm{Span}\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$. Then $\vec{w} = c_1\vec{w}_1 + c_2\vec{w}_2 + \cdots + c_k\vec{w}_k$ for some scalars $c_1, c_2, \ldots, c_k$. Then
\[
\vec{v}\cdot\vec{w} = \vec{v}\cdot(c_1\vec{w}_1 + c_2\vec{w}_2 + \cdots + c_k\vec{w}_k) = c_1(\vec{v}\cdot\vec{w}_1) + c_2(\vec{v}\cdot\vec{w}_2) + \cdots + c_k(\vec{v}\cdot\vec{w}_k) = 0.
\]
Thus $\vec{v}\cdot\vec{w} = 0$ for all $\vec{w} \in W$ and consequently $\vec{v} \in W^\perp$.
(b) Since $\vec{0}\cdot\vec{w} = 0$ for all $\vec{w} \in W$, $\vec{0} \in W^\perp$ and $W^\perp \neq \emptyset$. Let $\vec{u}, \vec{v} \in W^\perp$ and $c, d \in \mathbb{R}$. Then for all $\vec{w} \in W$,
\[
(c\vec{u} + d\vec{v})\cdot\vec{w} = c(\vec{u}\cdot\vec{w}) + d(\vec{v}\cdot\vec{w}) = c\cdot 0 + d\cdot 0 = 0.
\]
Thus $c\vec{u} + d\vec{v} \in W^\perp$. Therefore $W^\perp$ is a subspace of $\mathbb{R}^n$.

(c) Exercise.

(d) First note that $\{\vec{0}\} \subseteq W \cap W^\perp$. Let $\vec{v} \in W \cap W^\perp$. Then $\vec{v} \in W$ and $\vec{v} \in W^\perp$. Thus $\|\vec{v}\|^2 = \vec{v}\cdot\vec{v} = 0$, which implies $\vec{v} = \vec{0}$. Therefore $W \cap W^\perp = \{\vec{0}\}$.
Theorem 6.4. Let $A$ be an $m \times n$ real matrix. Then $\mathrm{RS}(A)^\perp = \mathrm{NS}(A)$ and $\mathrm{CS}(A)^\perp = \mathrm{NS}(A^T)$.

Proof. To show $\mathrm{NS}(A) \subseteq \mathrm{RS}(A)^\perp$, let $\vec{x} \in \mathrm{NS}(A) = \{\vec{x} \in \mathbb{R}^n \mid A\vec{x} = \vec{0}\}$. Then each row of $A$ is orthogonal to $\vec{x}$. Since $\mathrm{RS}(A)$ is the span of the rows of $A$, $\vec{x}$ is orthogonal to each vector of $\mathrm{RS}(A)$. Then $\vec{x} \in \mathrm{RS}(A)^\perp$. Thus $\mathrm{NS}(A) \subseteq \mathrm{RS}(A)^\perp$. To show $\mathrm{RS}(A)^\perp = \mathrm{NS}(A)$, it suffices to show $\mathrm{RS}(A)^\perp \subseteq \mathrm{NS}(A)$. Let $\vec{x} \in \mathrm{RS}(A)^\perp$. Since the rows of $A$ are in $\mathrm{RS}(A)$, $\vec{x}$ is orthogonal to each row of $A$. Then $A\vec{x} = \vec{0}$ and $\vec{x} \in \mathrm{NS}(A)$. Thus $\mathrm{RS}(A)^\perp \subseteq \mathrm{NS}(A)$.

Finally $\mathrm{NS}(A^T) = \mathrm{RS}(A^T)^\perp = \mathrm{CS}(A)^\perp$ because $\mathrm{RS}(A^T) = \mathrm{CS}(A)$.
6.2 Orthogonal Bases and Matrices
Definition. A set $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ of vectors in $\mathbb{R}^n$ is called an orthogonal set if $\vec{v}_i\cdot\vec{v}_j = 0$ for all distinct $i, j = 1, 2, \ldots, k$. Also $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is called an orthonormal set if it is an orthogonal set of unit vectors.

Example. Let $\vec{v}_1 = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix}$, and $\vec{v}_3 = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$. Verify that $\vec{v}_1\cdot\vec{v}_2 = 0$, $\vec{v}_1\cdot\vec{v}_3 = 0$, $\vec{v}_2\cdot\vec{v}_3 = 0$. Then $\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is an orthogonal set in $\mathbb{R}^3$, but not orthonormal. The following is an orthonormal set:
\[
\left\{ \frac{\vec{v}_1}{\|\vec{v}_1\|}, \frac{\vec{v}_2}{\|\vec{v}_2\|}, \frac{\vec{v}_3}{\|\vec{v}_3\|} \right\}
= \left\{ \frac{1}{\sqrt{5}}\begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}, \frac{1}{2}\begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix}, \frac{1}{\sqrt{5}}\begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} \right\}.
\]
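A quick NumPy check that normalizing the orthogonal set above gives an orthonormal one (our illustration; note $V^TV$ collects all pairwise dot products):

```python
import numpy as np

V = np.array([[2.0, 0, -1], [0, 2, 0], [1, 0, 2]]).T  # columns v1, v2, v3

# Pairwise dot products: V^T V is diagonal exactly when the set is orthogonal.
print(V.T @ V)  # off-diagonal entries are 0

U = V / np.linalg.norm(V, axis=0)  # normalize each column
print(np.allclose(U.T @ U, np.eye(3)))  # True: orthonormal columns
```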
Theorem 6.5. If $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is an orthogonal set of nonzero vectors in $\mathbb{R}^n$, then $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is linearly independent and consequently it forms a basis of $\mathrm{Span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$.

Proof. Let $c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k = \vec{0}$ for some scalars $c_1, c_2, \ldots, c_k$. Then
\begin{align*}
\vec{0}\cdot\vec{v}_1 &= (c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k)\cdot\vec{v}_1 \\
\implies 0 &= c_1(\vec{v}_1\cdot\vec{v}_1) + c_2(\vec{v}_2\cdot\vec{v}_1) + \cdots + c_k(\vec{v}_k\cdot\vec{v}_1) \\
\implies 0 &= c_1\|\vec{v}_1\|^2 + 0 + \cdots + 0 \\
\implies c_1 &= 0 \quad \left(\text{since } \|\vec{v}_1\| \neq 0 \text{ as } \vec{v}_1 \neq \vec{0}\right).
\end{align*}
Similarly we can prove $c_2 = c_3 = \cdots = c_k = 0$. Thus $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is linearly independent and consequently it forms a basis of $\mathrm{Span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$.

Definition. Let $W$ be a subspace of $\mathbb{R}^n$. An orthogonal basis of $W$ is a basis of $W$ that is an orthogonal set. Similarly, an orthonormal basis of $W$ is a basis of $W$ that is an orthonormal set.
Example. Let $\vec{v}_1 = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix}$, and $\vec{v}_3 = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$. Then $\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is an orthogonal basis of $\mathbb{R}^3$.
Theorem 6.6. Let $W$ be a subspace of $\mathbb{R}^n$ and let $\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$ be an orthogonal basis of $W$. If $\vec{v} \in W$, then
\[
\vec{v} = \frac{\vec{v}\cdot\vec{w}_1}{\vec{w}_1\cdot\vec{w}_1}\vec{w}_1 + \frac{\vec{v}\cdot\vec{w}_2}{\vec{w}_2\cdot\vec{w}_2}\vec{w}_2 + \cdots + \frac{\vec{v}\cdot\vec{w}_k}{\vec{w}_k\cdot\vec{w}_k}\vec{w}_k.
\]

Proof. Let $\vec{v} \in W = \mathrm{Span}\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$. Then $\vec{v} = c_1\vec{w}_1 + c_2\vec{w}_2 + \cdots + c_k\vec{w}_k$ for some scalars $c_1, c_2, \ldots, c_k$. Then
\begin{align*}
\vec{v}\cdot\vec{w}_1 &= (c_1\vec{w}_1 + c_2\vec{w}_2 + \cdots + c_k\vec{w}_k)\cdot\vec{w}_1 \\
\implies \vec{v}\cdot\vec{w}_1 &= c_1(\vec{w}_1\cdot\vec{w}_1) + c_2(\vec{w}_2\cdot\vec{w}_1) + \cdots + c_k(\vec{w}_k\cdot\vec{w}_1) \\
\implies \vec{v}\cdot\vec{w}_1 &= c_1(\vec{w}_1\cdot\vec{w}_1) + 0 + \cdots + 0 \\
\implies c_1 &= \frac{\vec{v}\cdot\vec{w}_1}{\vec{w}_1\cdot\vec{w}_1} \quad \left(\text{since } \vec{w}_1\cdot\vec{w}_1 = \|\vec{w}_1\|^2 \neq 0 \text{ as } \vec{w}_1 \neq \vec{0}\right).
\end{align*}
Similarly we can prove that $c_i = \dfrac{\vec{v}\cdot\vec{w}_i}{\vec{w}_i\cdot\vec{w}_i}$ for $i = 2, 3, \ldots, k$.
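Theorem 6.6 says coordinates in an orthogonal basis are just scaled dot products. A small NumPy check using the orthogonal basis from the earlier example and a sample vector of our choosing:

```python
import numpy as np

basis = [np.array([2.0, 0, -1]), np.array([0.0, 2, 0]), np.array([1.0, 0, 2])]
v = np.array([3.0, 2, 1])  # sample vector (our choice)

# c_i = (v . w_i) / (w_i . w_i), with no linear system to solve.
coeffs = [float((v @ w) / (w @ w)) for w in basis]
print(coeffs)  # [1.0, 1.0, 1.0]

recon = sum(c * w for c, w in zip(coeffs, basis))
print(np.allclose(recon, v))  # True
```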
Example. Let $\vec{v}_1 = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix}$, and $\vec{v}_3 = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$. Write $\vec{v} = \begin{bmatrix} -1 \\ 4 \\ 3 \end{bmatrix}$ as a unique linear combination of $\vec{v}_1, \vec{v}_2, \vec{v}_3$, which form an orthogonal basis of $\mathbb{R}^3$.
\begin{align*}
\vec{v} = \begin{bmatrix} -1 \\ 4 \\ 3 \end{bmatrix}
&= \frac{\vec{v}\cdot\vec{v}_1}{\vec{v}_1\cdot\vec{v}_1}\vec{v}_1 + \frac{\vec{v}\cdot\vec{v}_2}{\vec{v}_2\cdot\vec{v}_2}\vec{v}_2 + \frac{\vec{v}\cdot\vec{v}_3}{\vec{v}_3\cdot\vec{v}_3}\vec{v}_3 \\
&= \frac{-5}{5}\vec{v}_1 + \frac{8}{4}\vec{v}_2 + \frac{5}{5}\vec{v}_3
= -\vec{v}_1 + 2\vec{v}_2 + \vec{v}_3.
\end{align*}

Theorem 6.7. An $m \times n$ real matrix $U$ has orthonormal columns if and only if $U^TU = I_n$.
Proof. Let $U = [\vec{u}_1\ \vec{u}_2 \cdots \vec{u}_n]$ be an $m \times n$ real matrix. Then
\[
U^TU = \begin{bmatrix} \vec{u}_1^T \\ \vec{u}_2^T \\ \vdots \\ \vec{u}_n^T \end{bmatrix}[\vec{u}_1\ \vec{u}_2 \cdots \vec{u}_n]
= \begin{bmatrix}
\vec{u}_1\cdot\vec{u}_1 & \vec{u}_1\cdot\vec{u}_2 & \cdots & \vec{u}_1\cdot\vec{u}_n \\
\vec{u}_2\cdot\vec{u}_1 & \vec{u}_2\cdot\vec{u}_2 & \cdots & \vec{u}_2\cdot\vec{u}_n \\
\vdots & \vdots & \ddots & \vdots \\
\vec{u}_n\cdot\vec{u}_1 & \vec{u}_n\cdot\vec{u}_2 & \cdots & \vec{u}_n\cdot\vec{u}_n
\end{bmatrix}.
\]
Thus $U$ has orthonormal columns if and only if $U^TU = I_n$.
Definition. A square real matrix $U$ is called an orthogonal matrix if $U$ has orthonormal columns, equivalently if $U^TU = I$.
Theorem 6.8. The following are equivalent for an n× n real matrix U .
(a) U is an orthogonal matrix.
(b) U has orthonormal columns.
(c) UTU = In.
(d) UUT = In.
(e) U has orthonormal rows.
(f) U−1 = UT .
Example. $U = \begin{bmatrix} \frac{2}{\sqrt{5}} & 0 & \frac{1}{\sqrt{5}} \\ 0 & 1 & 0 \\ -\frac{1}{\sqrt{5}} & 0 & \frac{2}{\sqrt{5}} \end{bmatrix}$ is an orthogonal matrix and $U^{-1} = U^T = \begin{bmatrix} \frac{2}{\sqrt{5}} & 0 & -\frac{1}{\sqrt{5}} \\ 0 & 1 & 0 \\ \frac{1}{\sqrt{5}} & 0 & \frac{2}{\sqrt{5}} \end{bmatrix}$.
Theorem 6.9. Let $U$ be an $m \times n$ real matrix with orthonormal columns. Then

(a) $(U\vec{x})\cdot(U\vec{y}) = \vec{x}\cdot\vec{y}$ for all $\vec{x}, \vec{y} \in \mathbb{R}^n$.

(b) $(U\vec{x})\cdot(U\vec{y}) = 0$ if and only if $\vec{x}\cdot\vec{y} = 0$, for all $\vec{x}, \vec{y} \in \mathbb{R}^n$ (i.e., the map $\vec{x} \mapsto U\vec{x}$ preserves orthogonality between vectors).

(c) $\|U\vec{x}\| = \|\vec{x}\|$ for all $\vec{x} \in \mathbb{R}^n$ (i.e., the map $\vec{x} \mapsto U\vec{x}$ preserves the length of vectors).

Proof. Since the $m \times n$ real matrix $U$ has orthonormal columns, $U^TU = I_n$.

(a) $(U\vec{x})\cdot(U\vec{y}) = (U\vec{x})^T(U\vec{y}) = \vec{x}^TU^TU\vec{y} = \vec{x}^TI_n\vec{y} = \vec{x}\cdot\vec{y}$ for all $\vec{x}, \vec{y} \in \mathbb{R}^n$.

(b) Follows from (a).

(c) By (a), $\|U\vec{x}\|^2 = (U\vec{x})\cdot(U\vec{x}) = \vec{x}\cdot\vec{x} = \|\vec{x}\|^2 \implies \|U\vec{x}\| = \|\vec{x}\|$.
Corollary 6.10. An $n \times n$ real matrix $U$ is orthogonal if and only if $\|U\vec{x}\| = \|\vec{x}\|$ for all $\vec{x} \in \mathbb{R}^n$.

Proof. Let $U$ be an $n \times n$ real matrix.
($\implies$) It follows from (c) of Theorem 6.9.
($\impliedby$) Suppose $\|U\vec{x}\| = \|\vec{x}\|$ for all $\vec{x} \in \mathbb{R}^n$. Let $U^TU = [a_{ij}]$. Since $U^TU$ is symmetric, $a_{ij} = a_{ji}$. For $i = 1, 2, \ldots, n$, $a_{ii} = (U\vec{e}_i)^T(U\vec{e}_i) = \|U\vec{e}_i\|^2 = \|\vec{e}_i\|^2 = 1$. For $i \neq j$,
\begin{align*}
a_{ii} - a_{ji} - a_{ij} + a_{jj} &= (U(\vec{e}_i - \vec{e}_j))^T(U(\vec{e}_i - \vec{e}_j)) \\
\implies 2 - 2a_{ij} &= \|U(\vec{e}_i - \vec{e}_j)\|^2 = \|\vec{e}_i - \vec{e}_j\|^2 = 2 \\
\implies a_{ij} &= 0.
\end{align*}
Thus $U^TU = I_n$ and $U$ is orthogonal.
6.3 Orthogonal Projections
Theorem 6.11 (Orthogonal Decomposition Theorem). Let $W$ be a subspace of $\mathbb{R}^n$ and $\vec{y} \in \mathbb{R}^n$. Then
\[ \vec{y} = \vec{w} + \vec{z} \]
for unique vectors $\vec{w} \in W$ and $\vec{z} \in W^\perp$. Moreover, if $\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$ is an orthogonal basis of $W$, then
\[
\vec{w} = \frac{\vec{y}\cdot\vec{w}_1}{\vec{w}_1\cdot\vec{w}_1}\vec{w}_1 + \frac{\vec{y}\cdot\vec{w}_2}{\vec{w}_2\cdot\vec{w}_2}\vec{w}_2 + \cdots + \frac{\vec{y}\cdot\vec{w}_k}{\vec{w}_k\cdot\vec{w}_k}\vec{w}_k
\quad \text{and} \quad \vec{z} = \vec{y} - \vec{w}.
\]

Proof. Suppose $\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$ is an orthogonal basis of $W$. Then
\[
\vec{w} = \frac{\vec{y}\cdot\vec{w}_1}{\vec{w}_1\cdot\vec{w}_1}\vec{w}_1 + \cdots + \frac{\vec{y}\cdot\vec{w}_k}{\vec{w}_k\cdot\vec{w}_k}\vec{w}_k \in \mathrm{Span}\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\} = W.
\]
Let $\vec{z} = \vec{y} - \vec{w}$. We show that $\vec{z} = \vec{y} - \vec{w} \in W^\perp$. For $i = 1, 2, \ldots, k$,
\begin{align*}
\vec{z}\cdot\vec{w}_i &= (\vec{y} - \vec{w})\cdot\vec{w}_i = \vec{y}\cdot\vec{w}_i - \vec{w}\cdot\vec{w}_i \\
&= \vec{y}\cdot\vec{w}_i - \left(\frac{\vec{y}\cdot\vec{w}_1}{\vec{w}_1\cdot\vec{w}_1}\vec{w}_1 + \cdots + \frac{\vec{y}\cdot\vec{w}_k}{\vec{w}_k\cdot\vec{w}_k}\vec{w}_k\right)\cdot\vec{w}_i \\
&= \vec{y}\cdot\vec{w}_i - \left(0 + \cdots + 0 + \frac{\vec{y}\cdot\vec{w}_i}{\vec{w}_i\cdot\vec{w}_i}\,\vec{w}_i\cdot\vec{w}_i + 0 + \cdots + 0\right) = 0.
\end{align*}
Since $\vec{z}\cdot\vec{w}_i = 0$ for $i = 1, 2, \ldots, k$, $\vec{z}\cdot\vec{w} = 0$ for all $\vec{w} \in W = \mathrm{Span}\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$, and consequently $\vec{z} \in W^\perp$.

To show the uniqueness of the decomposition $\vec{y} = \vec{w} + \vec{z}$, let $\vec{y} = \vec{w}\,' + \vec{z}\,'$ for some $\vec{w}\,' \in W$ and $\vec{z}\,' \in W^\perp$. Then
\[
\vec{0} = \vec{y} - \vec{y} = (\vec{w} + \vec{z}) - (\vec{w}\,' + \vec{z}\,')
\implies \vec{w}\,' - \vec{w} = \vec{z} - \vec{z}\,' \in W \cap W^\perp = \{\vec{0}\}
\implies \vec{w}\,' = \vec{w},\ \vec{z}\,' = \vec{z}.
\]
Definition. Let $W$ be a subspace of $\mathbb{R}^n$. Each vector $\vec{y} \in \mathbb{R}^n$ can be uniquely written as $\vec{y} = \vec{w} + \vec{z}$ where $\vec{w} \in W$ and $\vec{z} \in W^\perp$. The unique vector $\vec{w} \in W$ is called the orthogonal projection of $\vec{y}$ onto $W$, denoted by $\mathrm{proj}_W\vec{y}$.

[Figure: $\vec{y}$ decomposed as $\mathrm{proj}_W\vec{y}$ lying in the plane $W$ plus the perpendicular component $\vec{y} - \mathrm{proj}_W\vec{y}$.]
Example.

1. Let $\vec{w} = [2, 1]^T$ and $W = \mathrm{Span}\{\vec{w}\}$. For $\vec{y} = [2, 3]^T$, find $\mathrm{proj}_W\vec{y}$ and the orthogonal decomposition of $\vec{y}$ with respect to $W$.

$\mathrm{proj}_W\vec{y} = \frac{\vec{y}\cdot\vec{w}}{\vec{w}\cdot\vec{w}}\vec{w} = \frac{7}{5}[2, 1]^T \in W$ and $\vec{y} - \mathrm{proj}_W\vec{y} = \frac{1}{5}[-4, 8]^T \in W^\perp$. The orthogonal decomposition of $\vec{y}$ with respect to $W$ is
\[
\vec{y} = [2, 3]^T = \frac{7}{5}[2, 1]^T + \frac{1}{5}[-4, 8]^T.
\]

2. Let $\vec{w}_1 = \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix}$, $\vec{w}_2 = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}$, and $W = \mathrm{Span}\{\vec{w}_1, \vec{w}_2\}$. For $\vec{y} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, find $\mathrm{proj}_W\vec{y}$ and the orthogonal decomposition of $\vec{y}$ with respect to $W$.
\[
\mathrm{proj}_W\vec{y} = \frac{\vec{y}\cdot\vec{w}_1}{\vec{w}_1\cdot\vec{w}_1}\vec{w}_1 + \frac{\vec{y}\cdot\vec{w}_2}{\vec{w}_2\cdot\vec{w}_2}\vec{w}_2
= \frac{2}{13}\begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix} + \frac{2}{4}\begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}
= \frac{1}{13}\begin{bmatrix} 4 \\ 6 \\ 13 \end{bmatrix} \in W,
\qquad
\vec{y} - \mathrm{proj}_W\vec{y} = \frac{1}{13}\begin{bmatrix} 9 \\ -6 \\ 0 \end{bmatrix} \in W^\perp.
\]
The orthogonal decomposition of $\vec{y}$ with respect to $W$ is
\[
\vec{y} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \frac{1}{13}\begin{bmatrix} 4 \\ 6 \\ 13 \end{bmatrix} + \frac{1}{13}\begin{bmatrix} 9 \\ -6 \\ 0 \end{bmatrix}.
\]
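The projection formula in part 2 translates into a few lines of NumPy (the helper `proj` is ours, and it assumes the basis vectors are pairwise orthogonal):

```python
import numpy as np

def proj(y, basis):
    """Orthogonal projection of y onto Span(basis), assuming the
    basis vectors are pairwise orthogonal."""
    return sum((y @ w) / (w @ w) * w for w in basis)

w1 = np.array([2.0, 3, 0])
w2 = np.array([0.0, 0, 2])
y = np.array([1.0, 0, 1])

p = proj(y, [w1, w2])
print(p * 13)  # [ 4.  6. 13.], i.e. proj_W y = (1/13)[4, 6, 13]

z = y - p  # component in W-perp: orthogonal to both w1 and w2
print(np.isclose(z @ w1, 0), np.isclose(z @ w2, 0))  # True True
```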
Corollary 6.12. Let $W$ be a subspace of $\mathbb{R}^n$ with an orthonormal basis $\{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_k\}$. Let $U = [\vec{w}_1\ \vec{w}_2 \cdots \vec{w}_k]$. Then for each $\vec{y} \in \mathbb{R}^n$,
\[
\mathrm{proj}_W\vec{y} = UU^T\vec{y} = (\vec{y}\cdot\vec{w}_1)\vec{w}_1 + (\vec{y}\cdot\vec{w}_2)\vec{w}_2 + \cdots + (\vec{y}\cdot\vec{w}_k)\vec{w}_k.
\]

Proof.
\[
U^T\vec{y} = \begin{bmatrix} \vec{w}_1^T \\ \vec{w}_2^T \\ \vdots \\ \vec{w}_k^T \end{bmatrix}\vec{y}
= \begin{bmatrix} \vec{w}_1^T\vec{y} \\ \vec{w}_2^T\vec{y} \\ \vdots \\ \vec{w}_k^T\vec{y} \end{bmatrix}
= \begin{bmatrix} \vec{w}_1\cdot\vec{y} \\ \vec{w}_2\cdot\vec{y} \\ \vdots \\ \vec{w}_k\cdot\vec{y} \end{bmatrix}.
\]
\[
UU^T\vec{y} = [\vec{w}_1\ \vec{w}_2 \cdots \vec{w}_k]\begin{bmatrix} \vec{w}_1\cdot\vec{y} \\ \vec{w}_2\cdot\vec{y} \\ \vdots \\ \vec{w}_k\cdot\vec{y} \end{bmatrix}
= (\vec{y}\cdot\vec{w}_1)\vec{w}_1 + (\vec{y}\cdot\vec{w}_2)\vec{w}_2 + \cdots + (\vec{y}\cdot\vec{w}_k)\vec{w}_k = \mathrm{proj}_W\vec{y}.
\]
Remark. Recall that for an $m \times n$ real matrix $A$, $A\vec{x} = \vec{b}$ has a solution if and only if $\vec{b} \in \mathrm{CS}(A)$. So $A\vec{x} = \vec{b}$ has no solution if and only if $\vec{b} \notin \mathrm{CS}(A)$. In that case we find the $\vec{w} \in \mathrm{CS}(A)$ that is closest to $\vec{b}$, i.e., the best approximation to $\vec{b}$ by a vector $\vec{w} \in \mathrm{CS}(A)$.
Theorem 6.13 (Best Approximation Theorem). Let W be a subspace of Rn and−→b ∈ Rn.
Thenmin−→w∈W
∥∥∥−→b −−→w∥∥∥ =∥∥∥−→b − projW
−→b∥∥∥ .
Proof. It suffices to show that ‖−→b − projW −→b ‖ < ‖−→b − −→w ‖ for all −→w ∈ W with −→w ≠ projW −→b . Let −→w ∈ W and −→w ≠ projW −→b . Then −→0 ≠ projW −→b − −→w ∈ W . Since projW −→b ∈ W , −→b − projW −→b ∈ W⊥ by the orthogonal decomposition. Then

(−→b − projW −→b ) · (projW −→b − −→w ) = 0.

By the Pythagorean theorem,

‖(−→b − projW −→b ) + (projW −→b − −→w )‖^2 = ‖−→b − projW −→b ‖^2 + ‖projW −→b − −→w ‖^2

=⇒ ‖−→b − −→w ‖^2 = ‖−→b − projW −→b ‖^2 + ‖projW −→b − −→w ‖^2 > ‖−→b − projW −→b ‖^2

because projW −→b − −→w ≠ −→0 . Thus ‖−→b − −→w ‖ > ‖−→b − projW −→b ‖.
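The minimizing property can be probed numerically: sample many vectors of W and confirm none beats the projection. A sketch with our own helpers, using the W and −→b of the examples in this section:

```python
import math
import random

def dot(u, v): return sum(a * b for a, b in zip(u, v))
def sub(u, v): return [a - b for a, b in zip(u, v)]
def norm(v): return math.sqrt(dot(v, v))

w1, w2 = [2, 3, 0], [0, 0, 2]   # orthogonal basis of W
b = [1, 0, 1]

# proj_W b via the orthogonal-basis formula
proj = [dot(b, w1) / dot(w1, w1) * x + dot(b, w2) / dot(w2, w2) * y
        for x, y in zip(w1, w2)]
best = norm(sub(b, proj))

random.seed(0)
ok = all(norm(sub(b, [c1 * x + c2 * y for x, y in zip(w1, w2)])) >= best - 1e-12
         for c1, c2 in ((random.uniform(-5, 5), random.uniform(-5, 5))
                        for _ in range(1000)))
print(ok)  # True: no sampled w in W is closer to b than proj_W b
```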
Example. Let −→u = [2, 3, 0]^T , −→v = [0, 0, 2]^T , −→y = [1, 0, 1]^T , and W = Span{−→u , −→v }. Find the point on W closest to −→y (the best approximation to −→y by a vector of W ) and find the distance between −→y and W .

The point on W closest to −→y is projW −→y = (1/13)[4, 6, 13]^T ∈ W (show steps). The distance between −→y and W is ‖−→y − projW −→y ‖ = ‖(1/13)[9, −6, 0]^T ‖ = √117/13.
To find projW −→y in an alternative way, note that {−→u /‖−→u ‖, −→v /‖−→v ‖} is an orthonormal basis of W . Let U = [−→u /‖−→u ‖ −→v /‖−→v ‖], the 3 × 2 matrix with columns (1/√13)[2, 3, 0]^T and [0, 0, 1]^T . Then

projW −→y = UU^T −→y = U [2/√13, 1]^T = (1/13)[4, 6, 13]^T .
6.4 Gram-Schmidt Process
Theorem 6.14 (Gram-Schmidt Process). Let W be a subspace of Rn with a basis {−→w1, −→w2, . . . , −→wk}. There is an orthogonal basis {−→v1 , −→v2 , . . . , −→vk} of W where

−→v1 = −→w1 and −→vi = −→wi − Σ_{j=1}^{i−1} ((−→wi · −→vj )/(−→vj · −→vj )) −→vj , i = 2, 3, . . . , k.

Moreover, Span{−→v1 , −→v2 , . . . , −→vi } = Span{−→w1, −→w2, . . . , −→wi} for i = 1, 2, . . . , k.
Proof. Let Wi = Span{−→w1, −→w2, . . . , −→wi} for i = 1, 2, . . . , k. By finite induction, we prove that Wi = Span{−→v1 , −→v2 , . . . , −→vi } and {−→v1 , −→v2 , . . . , −→vi } is an orthogonal set for each i = 1, 2, . . . , k.

Since −→v1 = −→w1, W1 = Span{−→w1} = Span{−→v1} and {−→v1} = {−→w1} is an orthogonal set. So the statement is true for i = 1. Suppose the statement is true for some j < k, i.e., Wj = Span{−→v1 , −→v2 , . . . , −→vj } and {−→v1 , −→v2 , . . . , −→vj } is an orthogonal set. We prove the statement is true for i = j + 1. Note that

−→vj+1 = −→wj+1 − Σ_{t=1}^{j} ((−→wj+1 · −→vt )/(−→vt · −→vt )) −→vt = −→wj+1 − projWj −→wj+1.

Since −→wj+1 /∈ Wj = Span{−→v1 , −→v2 , . . . , −→vj }, by the orthogonal decomposition we have

−→0 ≠ −→vj+1 = −→wj+1 − projWj −→wj+1 ∈ Wj⊥.

Then −→vj+1 is orthogonal to each of −→v1 , −→v2 , . . . , −→vj , which are in Wj . Thus {−→v1 , −→v2 , . . . , −→vj+1} is an orthogonal set of j + 1 nonzero vectors in the (j + 1)-dimensional subspace Wj+1 and consequently it spans Wj+1 (in fact it forms an orthogonal basis of Wj+1).
Remark. To find an orthonormal basis, normalize each vector of an orthogonal basis, i.e., divide each vector by its norm to make it a unit vector.
Example. Find an orthogonal basis of CS (A) for the matrix A with rows [3, 1, 0], [4, 0, −1], and [0, 2, 0].
Let −→w1 = [3, 4, 0]^T , −→w2 = [1, 0, 2]^T , and −→w3 = [0, −1, 0]^T . Since the columns −→w1, −→w2, −→w3 of A are linearly independent, they form a basis of CS (A).

Let −→v1 = −→w1 and W1 = Span{−→v1}. Let

−→v2 = −→w2 − projW1 −→w2 = −→w2 − ((−→w2 · −→v1 )/(−→v1 · −→v1 ))−→v1 = [1, 0, 2]^T − (3/25)[3, 4, 0]^T = (1/25)[16, −12, 50]^T

and W2 = Span{−→v1 , −→v2}.
Let

−→v3 = −→w3 − projW2 −→w3 = −→w3 − ((−→w3 · −→v1 )/(−→v1 · −→v1 ))−→v1 − ((−→w3 · −→v2 )/(−→v2 · −→v2 ))−→v2
= [0, −1, 0]^T − (−4/25)[3, 4, 0]^T − (3/29)·(1/25)[16, −12, 50]^T = (1/29)[12, −9, −6]^T ,

where (−→w3 · −→v2 )/(−→v2 · −→v2 ) = (12/25)/(2900/625) = 3/29.
Thus an orthogonal basis of CS (A) is

{−→v1 , −→v2 , −→v3} = { [3, 4, 0]^T , (1/25)[16, −12, 50]^T , (1/29)[12, −9, −6]^T },

or simply { [3, 4, 0]^T , [16, −12, 50]^T , [12, −9, −6]^T }.
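The Gram-Schmidt computation in this example can be reproduced exactly; a sketch in which `gram_schmidt` is our own small helper, using exact rationals:

```python
from fractions import Fraction as F

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(ws):
    """Return an orthogonal list of vectors spanning the same space."""
    vs = []
    for w in ws:
        v = list(w)
        for u in vs:
            c = dot(w, u) / dot(u, u)   # Fourier coefficient against u
            v = [vi - c * ui for vi, ui in zip(v, u)]
        vs.append(v)
    return vs

w1 = [F(3), F(4), F(0)]
w2 = [F(1), F(0), F(2)]
w3 = [F(0), F(-1), F(0)]
v1, v2, v3 = gram_schmidt([w1, w2, w3])
print(v2)  # [16/25, -12/25, 2], i.e. (1/25)[16, -12, 50]
print(v3)  # [12/29, -9/29, -6/29], i.e. (1/29)[12, -9, -6]
```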
An orthonormal basis of CS (A) is

{ −→v1 /‖−→v1 ‖, −→v2 /‖−→v2 ‖, −→v3 /‖−→v3 ‖ } = { (1/5)[3, 4, 0]^T , (1/(10√29))[16, −12, 50]^T , (1/(3√29))[12, −9, −6]^T }.
Theorem 6.15 (QR-factorization). If an m × n real matrix A has linearly independent columns, then A can be factored as A = QR where Q is an m × n real matrix whose columns form an orthonormal basis of CS (A) and R is an n × n upper-triangular real matrix.

Proof. (Sketch) By the Gram-Schmidt process find an orthonormal basis {−→v1 , −→v2 , . . . , −→vn} of CS (A) and let Q = [−→v1 −→v2 · · · −→vn]. By the Span property of the Gram-Schmidt process, the jth column of A is a linear combination of −→v1 , . . . , −→vj , so A = QR for some upper-triangular R. Since the columns of Q are orthonormal, Q^T Q = In and consequently Q^T A = Q^T QR = In R = R, which gives a formula for R.
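A numerical sketch of the factorization for the 3 × 3 matrix from the previous example (float arithmetic, with our own helpers rather than the notes' notation):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[3, 1, 0],
     [4, 0, -1],
     [0, 2, 0]]
cols = [[row[j] for row in A] for j in range(3)]

# Gram-Schmidt on the columns, then normalize to get the columns of Q
vs = []
for w in cols:
    v = [float(x) for x in w]
    for u in vs:
        c = dot(w, u) / dot(u, u)
        v = [vi - c * ui for vi, ui in zip(v, u)]
    vs.append(v)
qs = [[x / math.sqrt(dot(v, v)) for x in v] for v in vs]

# R = Q^T A; then A should equal QR with R upper triangular
R = [[dot(q, col) for col in cols] for q in qs]
QR = [[sum(qs[k][i] * R[k][j] for k in range(3)) for j in range(3)]
      for i in range(3)]
print(R[1][0], R[2][0], R[2][1])  # entries below the diagonal, all ~0
```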
7 Vector Spaces and Inner Product Spaces
7.1 Basics of Vector Spaces
Definition. A real vector space is a nonempty set V of objects, called vectors, with two operations, namely addition and scalar multiplication, that satisfy the following properties for all vectors −→u ,−→v ,−→w in V and all scalars (real numbers) c and d.
1. −→u +−→v is in V .
2. −→u +−→v = −→v +−→u
3. (−→u +−→v ) +−→w = −→u + (−→v +−→w )
4. There is a zero vector−→0 such that −→u +
−→0 = −→u .
5. There is a vector −−→u such that −→u + (−−→u ) =−→0 .
6. c−→u is in V .
7. c(−→u +−→v ) = c−→u + c−→v
8. (c+ d)−→u = c−→u + d−→u .
9. c(d−→u ) = (cd)−→u .
10. 1−→u = −→u .
Remark.
1. Scalars are elements of a field such as the set of real numbers or the set of complex numbers. If scalars are complex numbers, then V is called a complex vector space.
2. From the definition we have the following:
(a) 0−→u =−→0
(b) c−→0 =
−→0
(c) −−→u = (−1)−→u
Example. The following are real vector spaces.
1. Vn, the set of all vectors (directed line segments) in Rn.
• Addition: Usual vector addition by the triangle/parallelogram law.
• Scalar multiplication: Usual scalar multiplication of vectors.
2. Rn and Cn.
• Addition: Entrywise addition.
• Scalar multiplication: Entrywise scalar multiplication.
3. R∞, the set of all real sequences (an) = (a1, a2, a3, . . .).
• Addition: Entrywise addition.
• Scalar multiplication: Entrywise scalar multiplication.
4. Pn, the set of all real polynomials of degree at most n.

• Addition: If −→p (t) = a0 + a1t + · · · + antn and −→q (t) = b0 + b1t + · · · + bntn, then (−→p + −→q )(t) = (a0 + b0) + (a1 + b1)t + · · · + (an + bn)tn.

• Scalar multiplication: If −→p (t) = a0 + a1t + · · · + antn and c ∈ R, then (c−→p )(t) = ca0 + ca1t + · · · + cantn.
5. F , the set of all real-valued functions on a set D.
• Addition: (−→p +−→q )(x) = −→p (x) +−→q (x) for all −→p ,−→q ∈ F .
• Scalar multiplication: (c−→p )(x) = c−→p (x) for all −→p ∈ F and c ∈ R.
6. L(V,W ), the set of all linear transformations T : V → W where V and W are real vector spaces.

• Addition: (T + S)(−→v ) = T (−→v ) + S(−→v ) for all −→v ∈ V .

• Scalar multiplication: (cT )(−→v ) = cT (−→v ) for all −→v ∈ V and c ∈ R.
7. Mm,n(R), the set of all m × n real matrices.
• Addition: Entrywise addition.
• Scalar multiplication: Entrywise scalar multiplication.
7.2 Linear Span and Subspaces
Definition. A linear combination of vectors −→v1 ,−→v2 , . . . ,−→vk of a vector space V is a sum of their scalar multiples, i.e.,

c1−→v1 + c2−→v2 + · · · + ck−→vk

for some scalars c1, c2, . . . , ck. The set of all linear combinations of a nonempty set S of vectors of V is called the linear span or span of S, denoted by Span(S) or SpanS, i.e.,

Span{−→v1 ,−→v2 , . . . ,−→vk} = {c1−→v1 + c2−→v2 + · · · + ck−→vk | c1, c2, . . . , ck are scalars}.

We define Span ∅ = {−→0 }. When Span{−→v1 , . . . ,−→vk} = V , we say {−→v1 , . . . ,−→vk} spans V .
Example.
1. Span{−→e1 ,−→e2 , . . . ,−→en} = Rn.
2. Span{−→1 ,−→t ,−→t2 , . . . ,−→tn} = Pn.

3. Span{−→e1 ,−→e2 , . . . ,−→en, . . .} = R∞ where −→ei is the infinite sequence with 1 in the ith place and 0 elsewhere.

4. Span(B) = Mm,n(R) for B = {−→Ei,j | 1 ≤ i ≤ m, 1 ≤ j ≤ n} where −→Ei,j is the m × n matrix with the (i, j)-entry 1 and 0 elsewhere.
Definition. A subspace of a vector space V is a nonempty subset S of V that satisfies three properties:
(a)−→0 is in S.
(b) −→u +−→v is in S for all −→u , −→v in S.
(c) c−→u is in S for all −→u in S and all scalars c.
In short, a subspace of V is a nonempty subset S of V that is closed under linear combinations of vectors, i.e., c−→u + d−→v is in S for all −→u , −→v in S and all scalars c, d. When S is a subspace of V , we sometimes denote it by S ≤ V .
Example.
1. {−→0V } ≤ V and V ≤ V , i.e., {−→0V } and V are subspaces of the vector space V .
2. If F is the vector space of all real-valued functions, then Pn is a subspace of the vector space F .
3. Let H be the set of all polynomials −→p in Pn such that −→p (0) = 0. Note that

H = {−→p ∈ Pn | −→p (0) = 0} = {a1t + a2t2 + · · · + antn | a1, . . . , an ∈ R}.

Then H is a subspace of the vector space Pn and consequently a subspace of the vector space F .
4. Let H = { [x1, x2, 0]^T | x1, x2 ∈ R }. H is not a subspace of the vector space R2, but H is a subspace of the vector space R3.
5. If −→v1 , . . . ,−→vk are vectors of a real vector space V , then

Span{−→v1 , . . . ,−→vk} = {c1−→v1 + · · · + ck−→vk | c1, . . . , ck ∈ R}

is a subspace of V .
7.3 Linear Independence
Definition. A set S = {−→v1 ,−→v2 , . . . ,−→vk} of vectors of a vector space V is linearly independent if the only linear combination of vectors in S that produces −→0 is the trivial linear combination, i.e.,

c1−→v1 + c2−→v2 + · · · + ck−→vk = −→0 =⇒ c1 = c2 = · · · = ck = 0.

S = {−→v1 ,−→v2 , . . . ,−→vk} is linearly dependent if S is not linearly independent, i.e., there are scalars c1, c2, . . . , ck, not all zero, such that

c1−→v1 + c2−→v2 + · · · + ck−→vk = −→0 .
Example.
1. {−→v } is linearly independent in V if and only if −→v 6= −→0V .
2. {−→e1 ,−→e2 , . . . ,−→en} is a linearly independent set of vectors in Rn.
3. {−→1 ,−→t ,−→t2 , . . . ,−→tn} is a linearly independent set of vectors in Pn.
4. {−→e1 ,−→e2 , . . . ,−→en, . . .} is a linearly independent set of vectors in R∞ where −→ei is the infinite sequence with 1 in the ith place and 0 elsewhere.
5. B = {−→Ei,j : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a linearly independent set of vectors in Mm,n(R) where −→Ei,j is the m × n matrix with (i, j)-entry 1 and 0 elsewhere.
6. Consider the following three polynomials in P2: −→p1(t) = t + 2t2, −→p2(t) = 2 + 2t2, and −→p3(t) = 1 − t − t2. Show that {−→p1 , −→p2 , −→p3} is a linearly dependent set in P2.

Suppose c1−→p1 + c2−→p2 + c3−→p3 = −→0 for some scalars c1, c2, c3. Then for all t,

(c1−→p1 + c2−→p2 + c3−→p3)(t) = 0
c1−→p1(t) + c2−→p2(t) + c3−→p3(t) = 0
c1(t + 2t2) + c2(2 + 2t2) + c3(1 − t − t2) = 0
(2c2 + c3) + (c1 − c3)t + (2c1 + 2c2 − c3)t2 = 0.

Thus 2c2 + c3 = 0, c1 − c3 = 0, 2c1 + 2c2 − c3 = 0. One solution is (c1, c2, c3) = (2, −1, 2). So 2−→p1 − −→p2 + 2−→p3 = −→0 and {−→p1 , −→p2 , −→p3} is a linearly dependent set in P2.
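The dependence relation found above can be verified on coefficient vectors (a sketch; representing each polynomial by its coefficients in the basis (1, t, t²) is our choice, not the notes'):

```python
# Coefficient vectors in the basis (1, t, t^2)
p1 = [0, 1, 2]    # t + 2t^2
p2 = [2, 0, 2]    # 2 + 2t^2
p3 = [1, -1, -1]  # 1 - t - t^2

combo = [2 * a - b + 2 * c for a, b, c in zip(p1, p2, p3)]
print(combo)  # [0, 0, 0]: 2p1 - p2 + 2p3 is the zero polynomial
```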
Theorem 7.1. A set S = {−→v1 ,−→v2 , . . . ,−→vk} of k ≥ 2 vectors in a vector space V is linearly dependent if and only if there exists a vector in S that is a linear combination of the other vectors in S.
7.4 Basis and Dimensions
Definition. A basis of a nontrivial subspace S of a vector space V is a subset B of S such that

(a) Span(B) = S and

(b) B is a linearly independent set.

We define the basis of the trivial subspace {−→0V } to be B = ∅. The number of vectors in a basis B is the dimension of S, denoted by dim (S) or dimS.
Remark. If a basis of V consists of n vectors, then each basis of V has exactly n vectors and dim (V ) = n. If dim (V ) is a positive integer, V is called a finite-dimensional vector space. Otherwise V is called an infinite-dimensional vector space. If H is a subspace of a finite-dimensional vector space V , then dim (H) ≤ dim (V ) (see the Extension Theorem below).
Example.
1. {−→e1 ,−→e2 , . . . ,−→en} is a basis of Rn. So dim (Rn) = n.
2. {−→1 ,−→t ,−→t2 , . . . ,−→tn} is a basis of Pn. So dim (Pn) = n + 1.
3. B = {−→Ei,j : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis of Mm,n(R) where −→Ei,j is the m × n matrix with (i, j)-entry 1 and 0 elsewhere. So dim (Mm,n(R)) = mn.
4. {−→e1 ,−→e2 , . . . ,−→en, . . .} is a basis of R∞ where −→ei is the infinite sequence with 1 in the ith place and 0 elsewhere. So R∞ is an infinite-dimensional vector space.
Now we present some important theorems regarding bases of a subspace of a vector space.
Theorem 7.2 (Unique Representation Theorem). Let S be a subspace of a vector space V . Then B = {−→b1 ,−→b2 , . . . ,−→bk} is a basis of S if and only if each vector −→v of S is a unique linear combination of −→b1 ,−→b2 , . . . ,−→bk , i.e., −→v = c1−→b1 + c2−→b2 + · · · + ck−→bk for unique scalars c1, c2, . . . , ck.
Theorem 7.3 (Reduction Theorem). Let S be a subspace of a vector space V . If a set B = {−→b1 ,−→b2 , . . . ,−→bk} of vectors of S spans S, then either B is a basis of S or a subset of B is a basis of S.
Theorem 7.4 (Extension Theorem). Let S be a subspace of a vector space V . If a set B = {−→b1 ,−→b2 , . . . ,−→bk} of vectors of S is linearly independent, then either B is a basis of S or a superset of B is a basis of S.
Example. For −→p1(t) = t + 2t2, −→p2(t) = 2 + 2t2, and −→p3(t) = 1 − t − t2 in P2, −→p2 = 2−→p1 + 2−→p3 . Then Span{−→p1 , −→p2 , −→p3} = Span{−→p1 , −→p3} and {−→p1 , −→p3} is a basis of the subspace Span{−→p1 , −→p2 , −→p3} of P2.
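Again working with coefficient vectors in the basis (1, t, t²) (our representation, not the notes'), the relation −→p2 = 2−→p1 + 2−→p3 checks out:

```python
p1 = [0, 1, 2]    # t + 2t^2
p2 = [2, 0, 2]    # 2 + 2t^2
p3 = [1, -1, -1]  # 1 - t - t^2

combo = [2 * a + 2 * c for a, c in zip(p1, p3)]
print(combo)  # [2, 0, 2], equal to p2
```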
7.5 Linear Transformations
Definition. A function T : V → W from a vector space V to a vector space W (over the same field) is called a linear transformation if

(a) T (−→u + −→v ) = T (−→u ) + T (−→v ) for all −→u ,−→v ∈ V and

(b) T (c−→v ) = cT (−→v ) for all −→v ∈ V and all scalars c.

In short, a function T : V → W is a linear transformation if it preserves the linearity among vectors: T (c−→u + d−→v ) = cT (−→u ) + dT (−→v ) for all −→u ,−→v ∈ V and all scalars c, d.
Definition. The set of all linear transformations from a vector space V to a vector space W (over the same field) is denoted by L(V,W ).
Example.
1. For an m × n matrix A, T : Rn → Rm defined by T (−→x ) = A−→x is a linear transformation.
2. T : Pn → Pn−1 defined by T (a0 + a1t + a2t2 + · · · + antn) = a1 + 2a2t + · · · + nantn−1 is a linear transformation.
3. The trace function T : Mn(R)→ R defined by T (A) = tr (A) is a linear transformation.
4. The right shift operator T : R∞ → R∞ defined by T (a1, a2, a3, . . .) = (0, a1, a2, a3, . . .) is a linear transformation.
From the definition of a linear transformation we have the following properties.
Proposition. For a linear transformation T : V → W ,

(a) T (−→0V ) = −→0W and

(b) for all −→v1 , . . . ,−→vk ∈ V and all scalars c1, . . . , ck,

T (c1−→v1 + c2−→v2 + · · · + ck−→vk) = c1T (−→v1) + c2T (−→v2) + · · · + ckT (−→vk).
Example. Consider the function T : R3 → R3 defined by T (x1, x2, x3) = (x1, x2, 5). Since T (0, 0, 0) = (0, 0, 5) ≠ (0, 0, 0), T is not a linear transformation.
For any given linear transformation T : V → W , the domain space is V and the codomain space is W . We study a subspace of the domain space called the kernel or null space and a subspace of the codomain space called the image space or range.
Definition. The kernel or null space of a linear transformation T : V → W , denoted by ker(T ) or kerT , is the following subspace of V :

kerT = {−→x ∈ V | T (−→x ) = −→0W }.
The nullity of T , denoted by nullity (T ), is the dimension of kerT , i.e.,
nullity (T ) = dim (kerT ) .
Example. For the linear transformation T : Mn(R)→Mn(R) defined by T (A) = A− AT ,
kerT = {A ∈Mn(R) | T (A) = A− AT = O} = {A ∈Mn(R) | AT = A},
the set of all n× n real symmetric matrices. Then nullity (T ) = dim (kerT ) = n(n+ 1)/2.
Definition. The image space or range of a linear transformation T : V → W , denoted by im(T ) or imT or T (V ), is the following subspace of W :
imT = {T (−→x ) | −→x ∈ V }.
The rank of T , denoted by rank (T ), is the dimension of imT , i.e.,
rank (T ) = dim (imT ) .
Example. For the linear transformation T : Mn(R)→Mn(R) defined by T (A) = A− AT ,
imT = {A− AT | A ∈Mn(R)},
the set of all n× n real skew-symmetric matrices. Then
rank (T ) = dim (imT ) = n(n− 1)/2.
Theorem 7.5. Let T : V → W be a linear transformation. If V has finite dimension, then
rank (T ) + nullity (T ) = dim(V ).
Proof. (Sketch) Let dim (V ) = n. Start with a basis {−→v1 , . . . ,−→vk} of kerT and by the Extension Theorem, extend it to a basis {−→v1 , . . . ,−→vk ,−→u1, . . . ,−→un−k} of V . Now show that {T (−→u1), . . . , T (−→un−k)} is a basis of imT .
Example. For the linear transformation T : Mn(R)→Mn(R) defined by T (A) = A− AT ,
rank (T ) + nullity (T ) = n(n − 1)/2 + n(n + 1)/2 = n2 = dim(Mn(R)).
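This rank-nullity count can be confirmed computationally for n = 3 by building the 9 × 9 matrix of T in the standard basis and row-reducing (a self-contained sketch; the Gaussian-elimination helper is ours):

```python
n = 3
dim = n * n

def T(mat):
    """The map T(A) = A - A^T, entrywise."""
    return [[mat[i][j] - mat[j][i] for j in range(n)] for i in range(n)]

# Images of the standard basis E_{i,j}, flattened to length-9 vectors
cols = []
for i in range(n):
    for j in range(n):
        E = [[1.0 if (r, c) == (i, j) else 0.0 for c in range(n)] for r in range(n)]
        img = T(E)
        cols.append([img[r][c] for r in range(n) for c in range(n)])

# Rank by Gaussian elimination on the matrix whose columns are `cols`
M = [[cols[j][i] for j in range(dim)] for i in range(dim)]
rank = 0
for col in range(dim):
    piv = next((r for r in range(rank, dim) if abs(M[r][col]) > 1e-9), None)
    if piv is None:
        continue
    M[rank], M[piv] = M[piv], M[rank]
    for r in range(dim):
        if r != rank and abs(M[r][col]) > 1e-9:
            f = M[r][col] / M[rank][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[rank])]
    rank += 1

nullity = dim - rank
print(rank, nullity)  # 3 and 6: n(n-1)/2 and n(n+1)/2 for n = 3
```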
Remark. When dim (V ) = ∞, the Rank-Nullity Theorem still holds assuming the following algebra: n + ∞ = ∞, ∞ + n = ∞, ∞ + ∞ = ∞.
Now we discuss two important types of linear transformations T : V → W .
Definition. Let T : V → W be a linear transformation. T is onto if each −→b ∈ W has a pre-image −→x in V under T , i.e., T (−→x ) = −→b . T is one-to-one if each −→b ∈ W has at most one pre-image in V under T .
Example.
1. The linear transformation T : R3 → R2 defined by T (x1, x2, x3) = (x1, x2) is onto because each (x1, x2) ∈ R2 has a pre-image (x1, x2, 0) ∈ R3 under T . But T is not one-to-one because T (0, 0, 0) = T (0, 0, 1) = (0, 0), i.e., (0, 0) has two distinct pre-images (0, 0, 0) and (0, 0, 1) under T .
2. The linear transformation T : R2 → R3 defined by T (x1, x2) = (x1, x2, 0) is one-to-one because T (x1, x2) = T (y1, y2) =⇒ (x1, x2, 0) = (y1, y2, 0) =⇒ (x1, x2) = (y1, y2). But T is not onto because (0, 0, 1) ∈ R3 has no pre-image (x1, x2) ∈ R2 under T .
3. The linear transformation T : R2 → R2 defined by T (x1, x2) = (x1 + x2, x1 − x2) is one-to-one and onto (exercise).
Theorem 7.6. Let T : V → W be a linear transformation. Then the following are equivalent.
(a) T is one-to-one.
(b) kerT = {−→0V }.
(c) nullity (T ) = 0.
Example. The linear transformation T : R2 → R3 defined by T (x1, x2) = (x1, x2, 0) has the standard matrix A = [T (−→e1 ) T (−→e2 )] with columns [1, 0, 0]^T and [0, 1, 0]^T . Note that the columns of A are linearly independent, kerT = NS (A) = {−→02}, and nullity (T ) = nullity (A) = 0. Thus T (i.e., −→x 7→ A−→x ) is one-to-one.
Theorem 7.7. Let T : V → W be a linear transformation. Then the following are equivalent.
(a) T is onto.
(b) imT = W .
(c) rank (T ) = dim(W ).
Example. The linear transformation T : R3 → R2 defined by T (x1, x2, x3) = (x1, x2) has the standard matrix A = [T (−→e1 ) T (−→e2 ) T (−→e3 )] with rows [1, 0, 0] and [0, 1, 0]. Note that each row of A has a pivot position, imT = CS (A) = R2, and rank (T ) = rank (A) = 2. Thus T (i.e., −→x 7→ A−→x ) is onto.
Definition. A linear transformation T : V → W is an isomorphism if it is one-to-one and onto. When T : V → W is an isomorphism, V and W are called isomorphic, denoted by V ∼= W .
Example.
1. Define T : Pn → Rn+1 by T (a0 + a1t + · · · + antn) = [a0, a1, . . . , an]^T . Verify that kerT = {−→0 }, which implies T : Pn → Rn+1 is one-to-one. Also T : Pn → Rn+1 is onto since each [a0 a1 · · · an]^T ∈ Rn+1 has a pre-image a0 + a1t + · · · + antn ∈ Pn under T . Thus T : Pn → Rn+1 is an isomorphism and consequently Pn and Rn+1 are isomorphic.
2. The left shift operator T : R∞ → R∞ defined by T (a1, a2, a3, . . .) = (a2, a3, a4, . . .) is a linear transformation. Since each (a2, a3, a4, . . .) ∈ R∞ has a pre-image (0, a2, a3, . . .) ∈ R∞ under T , T is onto, equivalently imT = R∞. Verify that

kerT = {(a1, 0, 0, . . .) | a1 ∈ R} = Span{(1, 0, 0, . . .)}.

Thus T is not one-to-one and hence not an isomorphism. Note that T being onto does not imply that T is one-to-one; this can happen when the domain and codomain spaces are infinite-dimensional.
Theorem 7.8. Let T : V → W be a linear transformation. If V and W are finite-dimensional with dim(V ) = dim(W ), then the following are equivalent.
(a) T is an isomorphism.
(b) T is one-to-one.
(c) kerT = {−→0V }.
(d) nullity (T ) = 0.
(e) T is onto.
(f) imT = W .
(g) rank (T ) = dim(W ).
Theorem 7.9. If V and W are isomorphic via an isomorphism T : V → W , then V and W have similar linear algebraic properties such as the following.
1. H is a subspace of V if and only if T (H) is a subspace of W .
2. {−→v1 , . . . ,−→vn} is linearly independent in V if and only if {T (−→v1), . . . , T (−→vn)} is linearly independent in W .
3. {−→v1 , . . . ,−→vn} spans V if and only if {T (−→v1), . . . , T (−→vn)} spans W .
4. {−→v1 , . . . ,−→vn} is a basis of V if and only if {T (−→v1), . . . , T (−→vn)} is a basis of W .
5. dim(V ) = dim(W ).
Problem. Consider the following three polynomials in P2:
−→p1(t) = 1 + t2,−→p2(t) = −1 + 2t− t2, and −→p3(t) = −1 + 4t.
Show that {−→p1 , −→p2 , −→p3} is a basis of P2.
Solution. First recall that T : P2 → R3 defined by T (a0 + a1t + a2t2) = [a0, a1, a2]^T is an isomorphism. Then

T (−→p1) = T (1 + t2) = [1, 0, 1]^T ,
T (−→p2) = T (−1 + 2t − t2) = [−1, 2, −1]^T ,
T (−→p3) = T (−1 + 4t) = [−1, 4, 0]^T .

Now A = [T (−→p1) T (−→p2) T (−→p3)] has rows [1, −1, −1], [0, 2, 4], [1, −1, 0], and the row operation −R1 + R3 produces the echelon form with rows [1, −1, −1], [0, 2, 4], [0, 0, 1]. Since the 3 × 3 matrix A has 3 pivot positions, by the IMT the columns of A are linearly independent and span R3. Thus {T (−→p1), T (−→p2), T (−→p3)} is a basis of R3. Since T : P2 → R3 is an isomorphism, {−→p1 , −→p2 , −→p3} is a basis of P2.
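The pivot count can be cross-checked with a determinant: the coordinate matrix is invertible exactly when its determinant is nonzero (a sketch; `det3` is our own cofactor-expansion helper):

```python
# Columns are the coordinate vectors T(p1), T(p2), T(p3)
A = [[1, -1, -1],
     [0,  2,  4],
     [1, -1,  0]]

def det3(m):
    """3x3 determinant by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

print(det3(A))  # 2, nonzero, so the columns form a basis of R^3
```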
Definition. Suppose B = (−→b1 , . . . ,−→bn) is an ordered basis of a real vector space V . Then any vector −→x ∈ V can be written as −→x = c1−→b1 + c2−→b2 + · · · + cn−→bn for some unique scalars c1, c2, . . . , cn. The coordinate vector of −→x relative to B, or the B-coordinate vector of −→x , denoted by [−→x ]B, is [−→x ]B = [c1, c2, . . . , cn]^T .
Remark. [ ]B : V → Rn is an isomorphism.
Theorem 7.10. If V is a real vector space of dimension n, then V is isomorphic to Rn.
Proof. Let B be an ordered basis of V . Define the coordinate map T : V → Rn by T (−→x ) = [−→x ]B. It can be verified that T is an isomorphism. Thus V ∼= Rn.
Definition. Let V and W be real vector spaces with ordered bases B = (−→b1 , . . . ,−→bn) and C = (−→c1 , . . . ,−→cm) respectively. Let T : V → W be a linear transformation. The matrix of T from B to C, denoted by [T ]C←B or C[T ]B, is the following m × n matrix:

C[T ]B = [ [T (−→b1 )]C · · · [T (−→bn)]C ].

Note that for all −→x ∈ V , [T (−→x )]C = C[T ]B [−→x ]B.
Example. Pn and Pn−1 are real vector spaces with ordered bases B = (1, x, . . . , xn) and C = (1, x, . . . , xn−1) respectively. For the linear transformation T : Pn → Pn−1 defined by T (a0 + a1x + a2x2 + · · · + anxn) = a1 + 2a2x + · · · + nanxn−1, we have the n × (n + 1) matrix

C[T ]B with rows [0, 1, 0, . . . , 0], [0, 0, 2, . . . , 0], . . . , [0, 0, 0, . . . , n],

i.e., the jth column of C[T ]B is the C-coordinate vector of T (xj−1).
The relationship [T (−→x )]C = C[T ]B [−→x ]B can be summarized by a commutative diagram: T maps −→x ∈ V to T (−→x ) ∈ W , the coordinate maps [ ]B : V → Rn and [ ]C : W → Rm go down the sides, and multiplication by C[T ]B maps [−→x ]B to [T (−→x )]C along the bottom.
Theorem 7.11. Let V be a real vector space with ordered bases B and B′. Let W be a real vector space with ordered bases C and C ′. For a linear transformation T : V → W ,

C′[T ]B′ = C′[I]C C[T ]B B[I]B′ .
7.6 Inner Product Spaces
Definition. Let V be a real vector space. An inner product on V , denoted by 〈·, ·〉, is a function from V × V to R for which the following hold for all −→u ,−→v ,−→w ∈ V and c, d ∈ R:
(a) 〈−→u ,−→v 〉 = 〈−→v ,−→u 〉. (symmetry)
(b) 〈c−→u + d−→v ,−→w 〉 = c〈−→u ,−→w 〉+ d〈−→v ,−→w 〉. (linearity)
(c) 〈−→u ,−→u 〉 ≥ 0 where 〈−→u ,−→u 〉 = 0 if and only if −→u =−→0 . (nonnegativity)
A real vector space with an inner product defined on it is called a real inner product space.
Example.
1. The real vector space Rn is a real inner product space with the standard inner product or the dot product:

〈−→u ,−→v 〉 = −→u · −→v = −→u ^T −→v .

We call Rn the n-dimensional Euclidean space.
2. Consider the set ℓ2(R) of square-summable real sequences:

ℓ2(R) = {−→a = (a1, a2, a3, . . .) ∈ R∞ | Σ_{n=1}^{∞} an^2 < ∞}.

ℓ2(R) is a real inner product space with the following inner product:

〈−→a ,−→b 〉 = Σ_{n=1}^{∞} anbn.
3. The set C[0, 1] of all continuous real-valued functions on [0, 1] is a real inner product space with the following inner product:

〈f, g〉 = ∫_0^1 f(x)g(x) dx.
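This integral inner product can be approximated numerically; a midpoint-rule sketch for f(x) = x and g(x) = x² (our choice of test functions), whose exact inner product is ∫_0^1 x^3 dx = 1/4:

```python
# Midpoint-rule approximation of <f, g> = integral of f(x)g(x) over [0, 1]
f = lambda x: x
g = lambda x: x * x

N = 100000
h = 1.0 / N
inner = sum(f((k + 0.5) * h) * g((k + 0.5) * h) for k in range(N)) * h
print(inner)  # close to 0.25
```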
Definition. Let −→u and −→v be in a real inner product space V . The length or norm of −→v , denoted by ‖−→v ‖, is defined by

‖−→v ‖ = √〈−→v ,−→v 〉.

A vector −→v ∈ V is a unit vector if ‖−→v ‖ = 1. The distance between −→u and −→v , denoted by d(−→u ,−→v ), is defined by

d(−→u ,−→v ) = ‖−→u − −→v ‖ .
Theorem 7.12. The following are true for all vectors −→u and −→v of a real inner product space V and for all scalars c in R.
(a) ‖−→v ‖2 = 〈−→v ,−→v 〉.
(b) ‖c−→v ‖ = |c| ‖−→v ‖.
(c) Triangle inequality: ‖−→u +−→v ‖ ≤ ‖−→u ‖+ ‖−→v ‖.
(d) Parallelogram law: ‖−→u + −→v ‖^2 + ‖−→u − −→v ‖^2 = 2 ‖−→u ‖^2 + 2 ‖−→v ‖^2.
(e) Cauchy-Schwarz inequality: |〈−→u ,−→v 〉| ≤ ‖−→u ‖ ‖−→v ‖ where the equality holds if and only if {−→u ,−→v } is linearly dependent.
Definition. Two vectors −→u and −→v of a real inner product space V are orthogonal if
〈−→u ,−→v 〉 = 0.
Theorem 7.13 (Pythagorean Theorem). Two vectors −→u and −→v of a real inner product space V are orthogonal if and only if ‖−→u + −→v ‖^2 = ‖−→u ‖^2 + ‖−→v ‖^2.
Definition. The angle θ between two nonzero vectors −→u and −→v of a real inner product space V is the angle in [0, π] satisfying

〈−→u ,−→v 〉 = ‖−→u ‖ ‖−→v ‖ cos θ.
Definition. Let W be a subspace of a real inner product space V . A vector −→v ∈ V is orthogonal to W if 〈−→v ,−→w 〉 = 0 for all −→w ∈ W . The orthogonal complement of W , denoted by W⊥, is the set of all vectors in V that are orthogonal to W , i.e.,

W⊥ = {−→v ∈ V | 〈−→v ,−→w 〉 = 0 for all −→w ∈ W}.
Theorem 7.14. Let W be a subspace of a real inner product space V . Then
(a) −→v ∈ W⊥ if and only if −→v is orthogonal to each vector −→w of a basis of W .
(b) W⊥ is a subspace of V .
(c) W ⊆ (W⊥)⊥ where the equality holds for finite-dimensional W .
(d) W ∩W⊥ = {−→0V }.
Definition. A subset {−→v1 ,−→v2 , . . . ,−→vk} of a real inner product space V is called an orthogonal set if 〈−→vi ,−→vj 〉 = 0 for all distinct i, j = 1, 2, . . . , k. Also {−→v1 ,−→v2 , . . . ,−→vk} is called an orthonormal set if it is an orthogonal set of unit vectors.
Theorem 7.15. If {−→v1 ,−→v2 , . . . ,−→vk} is an orthogonal set of nonzero vectors in a real inner product space V , then {−→v1 ,−→v2 , . . . ,−→vk} is linearly independent and consequently forms a basis of Span{−→v1 ,−→v2 , . . . ,−→vk}.
Definition. Let W be a subspace of a real inner product space V . An orthogonal basis of W is a basis of W that is an orthogonal set. Similarly an orthonormal basis of W is a basis of W that is an orthonormal set.
Theorem 7.16. Let W be a subspace of a real inner product space V and let {−→w1,−→w2, . . . ,−→wk} be an orthogonal basis of W . If −→v ∈ W , then

−→v = (〈−→v ,−→w1〉/〈−→w1,−→w1〉)−→w1 + (〈−→v ,−→w2〉/〈−→w2,−→w2〉)−→w2 + · · · + (〈−→v ,−→wk〉/〈−→wk,−→wk〉)−→wk.
Theorem 7.17 (Orthogonal Decomposition Theorem). Let W be a subspace of a real inner product space V and −→y ∈ V . Then

−→y = −→w + −→z

for unique vectors −→w ∈ W and −→z ∈ W⊥. Moreover, if {−→w1,−→w2, . . . ,−→wk} is an orthogonal basis of W , then

−→w = (〈−→y ,−→w1〉/〈−→w1,−→w1〉)−→w1 + (〈−→y ,−→w2〉/〈−→w2,−→w2〉)−→w2 + · · · + (〈−→y ,−→wk〉/〈−→wk,−→wk〉)−→wk and −→z = −→y − −→w .
Definition. Let W be a subspace of a real inner product space V . Each vector −→y ∈ V can be uniquely written as −→y = −→w + −→z where −→w ∈ W and −→z ∈ W⊥. The unique vector −→w ∈ W is called the orthogonal projection of −→y onto W and it is denoted by projW −→y .
Corollary 7.18. Let W be a subspace of a real inner product space V with an orthonormal basis {−→w1,−→w2, . . . ,−→wk}. Then for each −→y ∈ V ,

projW −→y = 〈−→y ,−→w1〉−→w1 + 〈−→y ,−→w2〉−→w2 + · · · + 〈−→y ,−→wk〉−→wk.
Theorem 7.19 (Best Approximation Theorem). Let W be a subspace of a real inner product space V and −→b ∈ V . Then

min_{−→w ∈W} ‖−→b − −→w ‖ = ‖−→b − projW −→b ‖ .
Theorem 7.20 (Gram-Schmidt Process). Let W be a subspace of a real inner product space V with a basis {−→w1,−→w2, . . . ,−→wk}. There is an orthogonal basis {−→v1 ,−→v2 , . . . ,−→vk} of W where

−→v1 = −→w1 and −→vi = −→wi − Σ_{j=1}^{i−1} (〈−→wi,−→vj 〉/〈−→vj ,−→vj 〉)−→vj , i = 2, 3, . . . , k.

Moreover, Span{−→v1 ,−→v2 , . . . ,−→vi } = Span{−→w1,−→w2, . . . ,−→wi} for i = 1, 2, . . . , k.