620-156 (MAST10007) Linear Algebra
Topic 1: Linear equations 2
Topic 2: Matrices and determinants 40
Topic 3: Euclidean vector spaces 105
Topic 4: General vector spaces 193
Topic 5: Linear transformations 225
Topic 6: Inner product spaces 263
Topic 7: Eigenvalues and eigenvectors 296
1
Topic 1: Linear equations [AR 1.1 and 1.2]
One of the major topics studied in linear algebra is systems of linear equations and their solutions. The following topics will be addressed:
1.1. Systems of equations. Coefficient arrays. Row operations.
1.2. Reduction of systems to row-echelon form and reduced row-echelon form.
1.3. Consistent and inconsistent systems. Infinite solution sets.
2
We will begin with a couple of examples to illustrate how linear equations can arise.
Example (This is Example 2 of [AR 11.3])
An investor has $10,000 to invest. She considers two bonds, one at 10% and one at 7%. She decides to invest at most $6,000 in the first bond and at least $2,000 in the second. She also decides to invest at least as much in the first as the second. How should she invest?
Suppose she invests x1 in the first and x2 in the second.
Then the yield is z = 0.1x1 + 0.07x2.
The constraints are
x1 ≤ 6000        x2 ≥ 2000
x1 + x2 ≤ 10000  x1 ≥ x2
x1 ≥ 0           x2 ≥ 0
3
We could sketch these constraints:
[Figure: the feasible region in the (x1, x2)-plane, bounded by the lines x1 = 6000, x2 = 2000, x1 + x2 = 10000 and x1 = x2, together with a 'line of constant profit'.]
We can find the maximum profit by moving the 'line of constant profit' parallel to itself until it meets the shaded region.
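Aside (not part of the original notes): since the objective z is linear and the shaded region is a convex polygon, the maximum is attained at a corner. The corner coordinates below were worked out by hand by intersecting the constraint lines pairwise, so treat them as an illustration rather than output of a general method.

```python
# Corners of the feasible region defined by
# x1 <= 6000, x2 >= 2000, x1 + x2 <= 10000, x1 >= x2, x1, x2 >= 0,
# found by intersecting the boundary lines pairwise.
vertices = [
    (2000, 2000),   # x1 = x2 meets x2 = 2000
    (6000, 2000),   # x1 = 6000 meets x2 = 2000
    (6000, 4000),   # x1 = 6000 meets x1 + x2 = 10000
    (5000, 5000),   # x1 = x2 meets x1 + x2 = 10000
]

def yield_z(x1, x2):
    """The investor's yield z = 0.1*x1 + 0.07*x2."""
    return 0.1 * x1 + 0.07 * x2

# A linear objective over a convex polygon is maximised at a vertex.
best = max(vertices, key=lambda v: yield_z(*v))
print(best, yield_z(*best))   # (6000, 4000) 880.0
```

So she should invest $6,000 at 10% and $4,000 at 7%, for a yield of $880.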
4
Example (This is Example 1 of [AR 11.2])
[Figure: a two-loop circuit with a 30 V source, a 50 V source, and resistors of 7 Ω, 3 Ω and 11 Ω; the currents I1, I2, I3 flow as indicated, meeting at point A.]
The numbers I1, I2, I3 represent the current as indicated.
Because the current going into point A is the same as the current going out, we have
I1 = I2 + I3
Measuring voltage around Loop 1 and Loop 2 we get, respectively,
7I1 + 3I3 = 30
11I2 − 3I3 = 50
Solving these gives values for the current.
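Aside (not in the notes): the three equations can be checked numerically with NumPy's linear solver, assuming NumPy is available.

```python
import numpy as np

# Kirchhoff's equations from the example, as a matrix system:
#   I1 - I2 - I3  = 0
#   7*I1 + 3*I3   = 30
#   11*I2 - 3*I3  = 50
A = np.array([[1.0, -1.0, -1.0],
              [7.0,  0.0,  3.0],
              [0.0, 11.0, -3.0]])
b = np.array([0.0, 30.0, 50.0])

I = np.linalg.solve(A, b)   # currents [I1, I2, I3]
print(I)
```

The exact values are I1 = 570/131, I2 = 590/131, I3 = −20/131 (approximately 4.35 A, 4.50 A, −0.15 A); the minus sign simply means I3 flows opposite to the drawn arrow.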
5
1.1 Systems of equations, coefficient arrays, row operations
Linear equations
In our example, the unknowns were I1, I2, I3, and a typical equation obtained was
7I1 + 3I3 = 30
We call this a linear equation in the variables I1 and I3 because the coefficients are constants, and the variables are raised to the first power only.
Now that we know what makes the equation linear, it is easy to see that the number of variables is unimportant, so we have in the general case the following definition.
6
Definition (Linear equation and linear system)
A linear equation in n variables, x1, x2, . . . , xn, is an equation of the form
a1x1 + a2x2 + · · · + anxn = b
where a1, a2, . . . , an are real constants (not all zero) and b is also a real constant.
Furthermore, a finite set of linear equations in the variables x1, x2, . . . , xn is called a system of linear equations or a linear system.
7
Examples
x + 2y = 7
(3/8)x − 21y = 0
x1 + 5x2 + 6x3 = 100
x2 − x3 = −1
−x1 + x3 = 11
8
Definition (Solution of a system of linear equations)
A solution to a system of linear equations in the variables x1, . . . , xn is a set of values of these variables which satisfy every equation in the system.
How would you solve the following linear system?
2x − y = 3
x + y = 0
Graphically
Need accurate sketch!
Not practical for three or more variables.
Elimination
Will always give a solution, but is too ad hoc, particularly in higher dimensions (meaning three or more variables).
9
However, the good news is that with the introduction of some clever notation, we can formalise the procedure involved in solving simultaneous equations.
Before we do this, let's remind ourselves how to solve a simple linear system using elimination.
Example
Find all solutions for the following system of linear equations
2x − y = 3
x + y = 0
Adding the two equations gives 3x = 3, from which we get x = 1 and then y = −x = −1.
10
Example
Find all solutions for the linear system
3x + 2y = −3
2x + y = −1
Subtracting twice the second from the first gives −x = −1, so x = 1 and then y = −1 − 2x = −3.
11
Coefficient arrays
There is a systematic way to write a linear system. For example, consider the previously encountered linear system
I1 − I2 − I3 = 0
7I1 + 0I2 + 3I3 = 30
0I1 + 11I2 − 3I3 = 50
The coefficients of the unknowns I1, I2, I3 form a 3 × 3 array of numbers:
[ 1  −1  −1 ]
[ 7   0   3 ]
[ 0  11  −3 ]
12
Definition (Matrix)
A matrix is a rectangular array of numbers. The numbers in the array are called the entries of the matrix.
(Matrices are discussed in further detail in a few lectures' time.)
Definition (Augmented matrix of a linear system)
The augmented matrix for a linear system is the matrix formed from the coefficients in the equations and the constant terms.
Example
The augmented matrix for the previous set of equations is:
13
Example
Write the following system of linear equations as an augmented matrix
2x − y = 3
x + y = 0
Note
The number of rows is equal to the number of equations.
Each column, except the last, corresponds to a variable.
The last column contains the constant term from each equation.
14
Row Operations
Our aim is to use matrices to assist us in finding a solution to a system of equations.
First we need to decide what sort of operations we can perform on the augmented matrix. An essential condition is that whichever operations we perform, we must be able to recover the solution to the original system from the new matrix we obtain.
Let's start by considering operations on a matrix that mimic those operations used in the elimination method.
Definition (Elementary row operations)
The elementary row operations are:
1. Interchanging two rows.
2. Multiplying a row by a non-zero constant.
3. Adding a multiple of one row to another.
15
Example
Back to our simple system:
2x − y = 3
x + y = 0
Let's apply some elementary row operations to the corresponding augmented matrix:
[ 2  −1 | 3 ]
[ 1   1 | 0 ]
∼
Note
The matrices are not equal, but are equivalent in that the solution set is the same for each system represented by each augmented matrix.
16
1.2 Reduction of systems to reduced row-echelon form
Gaussian elimination
Using a sequence of elementary row operations, we can always get to a matrix that allows us to determine the solution set of a linear system.
However, the degree of complication increases as the number of variables and equations increases, so it is a good idea to formalise the process.
The leftmost non-zero element in each row is called the leading entry.
17
Definition (Row echelon form)
A matrix is in row-echelon form if:
1. For any row with a leading entry, all elements below that entry, and in the same column as it, are zero.
2. For any two rows, the leading entry of the lower row is further to the right than the leading entry in the higher row.
3. Any row that consists solely of zeros is lower than any row with non-zero entries.
Examples
[ 1  −2  3  4  5 ]
and
[ 1  0  0  3 ]
[ 0  1  1  2 ]
[ 0  0  0  3 ]
are in r.e. form;
[ 0   0  0   2  4 ]
[ 0   0  3   1  6 ]
[ 0   0  0   0  0 ]
[ 2  −3  6  −4  9 ]
is not in r.e. form.
18
Gaussian elimination is a systematic (or algorithmic) approach to the reduction of a matrix to row-echelon form.
Gaussian elimination
1. Make the top left element (row 1, column 1) a leading entry; that is, reorder rows so that the entry in the top left position is non-zero.
2. Add multiples of the first row to the other rows to make all other entries (from row 2 down) in the first column zero.
3. Reorder rows 2, . . . , n so that the next leading entry is in row 2.
4. Add multiples of the second row to rows 3, . . . , n, making all other entries (from row 3 down) in the column containing the second leading entry zero.
5. Repeat until you run out of rows.
The matrix is now in row-echelon form.
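Aside (not part of the course material): the steps above can be sketched in NumPy. The function name `row_echelon` and the use of floating-point pivots are our own choices, and the sketch makes no attempt at the partial pivoting a production routine would use.

```python
import numpy as np

def row_echelon(M):
    """Reduce a matrix to row-echelon form, following the steps above:
    find a non-zero pivot, swap it into place, then clear the entries
    below it.  Returns a new array; the input is not modified."""
    A = M.astype(float).copy()
    rows, cols = A.shape
    r = 0                              # row where the next pivot goes
    for c in range(cols):
        # find a row at or below r with a non-zero entry in column c
        pivot = next((i for i in range(r, rows) if A[i, c] != 0), None)
        if pivot is None:
            continue                   # no leading entry in this column
        A[[r, pivot]] = A[[pivot, r]]  # steps 1/3: reorder rows
        for i in range(r + 1, rows):   # steps 2/4: clear entries below
            A[i] -= (A[i, c] / A[r, c]) * A[r]
        r += 1
        if r == rows:
            break
    return A

# Augmented matrix of the four-equation example on the next slide
aug = np.array([[3, 2, -1, -15],
                [1, 1, -4, -30],
                [3, 1,  3,  11],
                [3, 3, -5, -41]])
print(row_echelon(aug))
```

The printed matrix has zeros below each leading entry and a final row of zeros, as the definition of row-echelon form requires.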
19
Example
Use Gaussian elimination to reduce the augmented matrix which represents the linear system
3x + 2y − z = −15
x + y − 4z = −30
3x + y + 3z = 11
3x + 3y − 5z = −41
to row-echelon form.
20
By reducing a matrix to row-echelon form, Gaussian elimination allows us to easily solve a system of linear equations. To do this we need to read off the final equations from the row-echelon matrix and then manipulate them to find the solution. The final manipulation is sometimes called back substitution. We illustrate with an example.
Example
From the row-echelon matrix of the previous example we can calculate the solutions to the original system.
Note
This procedure relies on the fact that the new row-echelon matrix gives a linear system with exactly the same set of solutions as the original linear system.
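Aside (not in the notes): back substitution itself is a short loop, working from the bottom row up. The sketch below assumes a square coefficient block with a unique solution; the example matrix is our own.

```python
import numpy as np

def back_substitute(U):
    """Solve a system whose augmented matrix U = [A | b] is already in
    row-echelon form with non-zero diagonal: work from the bottom row
    up, solving for one variable at a time."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # subtract the already-known variables, then divide by the pivot
        x[i] = (U[i, -1] - U[i, i+1:n] @ x[i+1:]) / U[i, i]
    return x

# A row-echelon augmented matrix for a 3-variable system (illustrative)
U = np.array([[1.0, 2.0, 1.0, -3.0],
              [0.0, 1.0, 2.0,  8.0],
              [0.0, 0.0, 1.0, 13.0]])
print(back_substitute(U))   # [ 20. -18.  13.]
```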
21
Is there any way to solve a system without having to perform the finalmanipulation?
Definition (Reduced row-echelon form)
A matrix is in reduced row-echelon form if the following three conditions are satisfied:
1. It is in row-echelon form.
2. Each leading entry is equal to 1 (called a leading 1).
3. In each column containing a leading 1, all other entries are zero.
22
Examples
[ 1  −2  3  −4  5 ]
is in r.r.e. form;
[ 1  0  0  3 ]
[ 0  1  1  2 ]
[ 0  0  0  3 ]
is not in r.r.e. form;
[ 1  0  0   2  4 ]
[ 0  1  3   1  6 ]
[ 0  0  0   0  0 ]
[ 0  0  1  −4  9 ]
is not in r.r.e. form.
23
1.2.2 Gauss-Jordan elimination
Gauss-Jordan elimination is a systematic way to reduce a matrix to reduced row-echelon form using row operations.
Gauss-Jordan elimination
1. Use Gaussian elimination to reduce the matrix to row-echelon form.
2. Use row operations to create zeros above the leading entries.
3. Multiply rows by appropriate numbers to create the leading 1s.
The order of operations is not unique; however, the reduced row-echelon form of a matrix is!
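Aside (not part of the course material): Gauss-Jordan elimination can be sketched by extending the earlier elimination idea so that each pivot is scaled to 1 and every other entry in its column is cleared. The tolerance `1e-12` is an arbitrary choice for floating-point zero tests.

```python
import numpy as np

def rref(M):
    """Reduce M to reduced row-echelon form by Gauss-Jordan elimination:
    for each pivot, scale its row so the leading entry is 1, then clear
    every other entry in that column."""
    A = M.astype(float).copy()
    rows, cols = A.shape
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if abs(A[i, c]) > 1e-12), None)
        if pivot is None:
            continue
        A[[r, pivot]] = A[[pivot, r]]
        A[r] /= A[r, c]                # make the leading entry a 1
        for i in range(rows):          # clear the rest of the column
            if i != r:
                A[i] -= A[i, c] * A[r]
        r += 1
        if r == rows:
            break
    return A

aug = np.array([[3, 2, -1, -15],
                [1, 1, -4, -30],
                [3, 1,  3,  11],
                [3, 3, -5, -41]])
print(rref(aug))
```

For this augmented matrix the reduced form has the identity in the first three columns, so the last column can be read off directly as the solution x = −4, y = 2, z = 7.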
24
ExampleUse Gauss-Jordan elimination to find a solution to the linear system
3x + 2y − z = −15x + y − 4z = −30
3x + y + 3z = 113x + 3y − 5z = −41
25
1.3 Consistent and inconsistent systems
Recall the definition of a solution of a system of linear equations: a set of values of the variables which satisfy every equation in the system.
Example
Find the solution of the system
2x + 4y − z = −3
x − 3y + 2z = 11
4x − 2y + 5z = 21
This system has a unique solution.
26
Another example
Find the solution of the system
x − y + z = 3
x − 7y + 3z = −11
2x + y + z = 16
What’s going on here?
27
Yet another example
Find the solution of the system
x + y + z = 4
2x + y + 2z = 9
3x + 2y + 3z = 13
Is this the same behaviour as the previous example or not?
28
Types of solution sets
It is clear from the above examples that there are different types of solutions for systems of equations.
Definition (Consistency)
A system of linear equations is said to be consistent if the system has at least one solution.
A system of linear equations is said to be inconsistent if the system has no solution.
For any system of linear equations, one of three types of solution set is possible:
no solution (inconsistent)
one solution (consistent and unique)
infinitely many solutions (consistent but not unique)
29
Inconsistent systems
We can determine the type of solution set a system has by reducing its augmented matrix to row-echelon form.
The system is inconsistent if there is at least one row in the row-echelon matrix having all values equal to zero on the left and a non-zero entry on the right. For example,
[ 0  0  · · ·  0 | 5 ]
Why is this inconsistent?
If we try to recover the equation represented by this row, it says:
0 × x1 + 0 × x2 + · · · + 0 × xn = 5
and of course this is not satisfied for any values of x1, . . . , xn.
30
Example
[ 1  2  0 | 4 ]
[ 2  1  1 | 2 ]
[ 4  2  2 | 3 ]
If a system of equations is inconsistent then the row-echelon form of its augmented matrix will have a row of the form
[ 0  0  · · ·  0 | a ]
with a ≠ 0.
31
Example (inconsistent)
Geometrically, an inconsistent system is one for which there is no common point of intersection for the system.
32
Consistent systems
Recall that a consistent system has either a unique solution or infinitely many solutions.
Unique solution:
For a system of equations with n variables, a unique solution exists precisely when the row-reduced augmented matrix has n non-zero rows. In this case we can read off the solution straight from the reduced matrix.
Example
[ 1  1  3 | 5 ]
[ 1  2  1 | 4 ]
[ 2  1  1 | 4 ]
∼
and the solution is
x1 =     x2 =     x3 =
33
Infinitely many solutions:
Example
What is the solution to the equations with rre form
[ 1  2  0  0  5 | 1 ]
[ 0  0  1  0  6 | 2 ]
[ 0  0  0  1  7 | 3 ]
[ 0  0  0  0  0 | 0 ]
The corresponding (non-zero) equations are
x1 + 2x2 + 5x5 = 1,   x3 + 6x5 = 2,   x4 + 7x5 = 3.
We can choose x2 and x5 as we wish. Say x2 = s, x5 = t. Then the other variables must be given by
x1 = 1 − 2s − 5t,   x3 = 2 − 6t,   x4 = 3 − 7t.
In this way, we can describe every possible solution.
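Aside (not in the notes): a quick sanity check of the parametric solution is to substitute a few values of s and t back into Ax = b; every choice of parameters should satisfy the system.

```python
import numpy as np

# Coefficient matrix and right-hand side of the rre system above
A = np.array([[1, 2, 0, 0, 5],
              [0, 0, 1, 0, 6],
              [0, 0, 0, 1, 7],
              [0, 0, 0, 0, 0]], dtype=float)
b = np.array([1, 2, 3, 0], dtype=float)

def solution(s, t):
    """The parametric solution (x1, ..., x5) with x2 = s and x5 = t."""
    return np.array([1 - 2*s - 5*t, s, 2 - 6*t, 3 - 7*t, t])

# every choice of parameters should satisfy Ax = b
rng = np.random.default_rng(0)
for s, t in rng.standard_normal((5, 2)):
    assert np.allclose(A @ solution(s, t), b)
print("all parameter choices satisfy Ax = b")
```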
34
In general
If the row-reduced augmented matrix of a consistent system has fewer than n non-zero rows, then the system has infinitely many solutions.
If r is the number of non-zero rows, then n − r parameters are needed to specify the solution set.
More precisely, in the rre form of the matrix, there will be n − r columns (among those corresponding to variables) which contain no leading entry.
We can choose the variable corresponding to such a column arbitrarily.
The values for the remaining variables will then follow.
35
Example
Suppose that the rre matrix for a system of equations is
[ 1  2  0  1 | 1 ]
[ 0  0  1  2 | 2 ]
[ 0  0  0  0 | 0 ]
[ 0  0  0  0 | 0 ]
We can represent the values of x2 and x4 by parameters s and t, respectively.
Then the values of the other variables are
x1 =     x3 =
36
Example
[ 1  1  3 | 5 ]
[ 1  2  8 | 11 ]
[ 2  1  1 | 4 ]
∼
So the solution of the equations can be given by:
x1 =     x2 =     x3 =
37
Example
Solve the linear system:
v − 2w + z = 1
2u − v − z = 0
4u + v − 6w = 3
38
Example
Find the values of k for which the system
u + 3v + 4w = 6
4u + 9v − w = 4
6u + 9v + kw = 8
has
(i) no solution
(ii) a unique solution
(iii) an infinite number of solutions
39
Topic 2: Matrices and Determinants [AR 1.3 – 1.6]
In studying linear systems we introduced the idea of a matrix. Next we will discover that matrices are not only useful tools for solving systems of equations, but also that they have their own algebraic structure and many interesting properties.
2.1 Properties of matrices
2.2 Matrix algebra
2.3 Matrix inverses
2.4 Rank of a matrix
2.5 Solutions of non-homogeneous linear equations
2.6 Determinants
40
2.1 Properties of matrices
Notation
Sometimes it is convenient to refer to the entries of a matrix rather than the entire matrix.
If A is a matrix, we denote its entries as Aij, where i specifies the row of the entry and j specifies the column. Using this notation, we write
      [ A11  A12  . . .  A1n ]
A  =  [ A21  A22  . . .  A2n ]
      [  ⋮    ⋮    ⋱     ⋮  ]
      [ Am1  Am2  . . .  Amn ]
or A = [Aij]
41
We say that a matrix has size m× n when it has m rows and n columns.
Example
The matrix
A = [ 1  2  3    ]
    [ π  e  27.1 ]
has size 2 × 3.
We have A12 = 2 and A21 = π.
Note
A12 and A21 are not equal (in this example).
42
Some special matrices
Some matrices with special features are given names suggestive of those features.
Definition (Special matrices)
A matrix having the same number of rows as columns is called a square matrix.
A matrix with only one row is called a row matrix. (They have size 1 × n.)
A matrix with only one column is called a column matrix. (They have size n × 1.)
A matrix with all elements equal to zero is referred to as a zero matrix.
A square matrix with Aij = 0 for i ≠ j is called a diagonal matrix.
A square matrix A satisfying Aij = 1 if i = j and Aij = 0 if i ≠ j is called an identity matrix.
43
Examples
The matrices
[ 1  2 ]
[ 3  4 ]
and
[ 1  2  2 ]
[ 3  4  5 ]
[ 6  7  8 ]
are both square.
[ 4  3  5  −2 ] is a row matrix.
[ 1 ]
[ 2 ]
[ 3 ]
is a column matrix.
The matrices
[ 0  0  0 ]
[ 0  0  0 ]
and
[ 0  0 ]
[ 0  0 ]
[ 0  0 ]
are both zero matrices.
[ 1  0 ]
[ 0  1 ]
and
[ 1  0  0 ]
[ 0  1  0 ]
[ 0  0  1 ]
are both identity matrices.
44
2.2 Matrix algebra
We all know the algebra of real numbers: addition, subtraction, multiplication and division. But what do these operations mean when we try to apply them to things other than real numbers? Matrices have their own algebra, for which some of these things make sense and some don't.
Definition (Equality of matrices)
We say that two matrices are equal if
they have the same size; and
every corresponding element is equal.
That is,
A = B ⇐⇒ A and B have the same size and Aij = Bij for all i and j
45
Example
Given
A = [  1  y  0 ]
    [ −7  2  3 ]
and
B = [ 1  3  0 ]
    [ x  2  3 ]
determine the values of x and y for which A = B.
Scalar Multiplication
Definition (Scalar multiple)
Let A = [Aij] be a matrix and c ∈ R. The matrix cA, having the same size as A and entries given by
(cA)ij = c × Aij
is called a scalar multiple of A.
In this context the real number c is called a scalar.
46
Example
Let
A = [ 0   1 ]
    [ 1  −1 ]
    [ 2  −7 ]
What are −2A and αA for α ∈ R?
Properties of Scalar Multiplication
If α and β are any scalars and C and D are any matrices of the same size, then
1. (α + β)C = αC + βC
2. α(D + C) = αD + αC
3. α(βC) = (αβ)C
47
Addition of Matrices
Definition (Addition of matrices)
Let A = [Aij] and B = [Bij] be matrices of the same size. We define a matrix C, called the sum of A and B, by
C has the same size as A and B
Cij = Aij + Bij
We write C = A + B.
Be careful: Matrix addition is only defined for matrices of the same size.
Notation: We write A − B in place of A + (−1)B.
48
Example
Let
A = [ 2   0  −3 ]
    [ 1  −1   3 ]
B = [ −1  −1  1 ]
    [  0   1  2 ]
and
C = [ −1  1 ]
    [  2  0 ]
Calculate (where possible) A + B and A + B + C.
49
Properties of Matrix Addition
For matrices A, B and C, all of the same size, the following statements hold:
1. A + B = B + A (commutativity)
2. A + (B + C) = (A + B) + C (associativity)
3. A − A = 0
4. A + 0 = A
Here 0 denotes the zero matrix of the same size as A, B and C.
All these properties follow from the corresponding properties of the scalars.
50
Matrix multiplication
Sometimes we can multiply two matrices together.
Definition (Matrix multiplication)
Let A be an m × n matrix and B be an n × q matrix. The matrix product of A and B is a matrix of size m × q, denoted AB.
The entry in position ij of the matrix product is obtained by taking row i of A and column j of B, then multiplying together the entries in order and forming the sum.
Using summation notation this rule can be expressed as
(AB)ij = Σ (k = 1 to n) Aik Bkj
Note: The matrix product AB is only defined if the number of columns of A is equal to the number of rows of B.
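Aside (not in the notes): the summation rule translates directly into a pair of loops. The sketch below uses the matrices from the next example and checks the result against NumPy's built-in product.

```python
import numpy as np

def matmul(A, B):
    """Multiply matrices using the summation rule
    (AB)_ij = sum over k of A_ik * B_kj, entry by entry."""
    m, n = A.shape
    n2, q = B.shape
    assert n == n2, "columns of A must equal rows of B"
    C = np.zeros((m, q))
    for i in range(m):
        for j in range(q):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1.0, -1.0],
              [3.0,  0.0],
              [0.0,  1.0]])       # 3 x 2
B = np.array([[ 4.0, 0.0],
              [-7.0, 1.0]])       # 2 x 2

print(matmul(A, B))                       # a 3 x 2 matrix
print(np.allclose(matmul(A, B), A @ B))   # True
```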
51
Example
Let
A = [ 1  −1 ]
    [ 3   0 ]
    [ 0   1 ]
and
B = [  4  0 ]
    [ −7  1 ]
Calculate AB and BA (if they exist).
52
Example
Let
A = [ 1  0 ]
    [ 2  3 ]
and
B = [ 4  3 ]
    [ 2  1 ]
Calculate AB and BA.
Notice: In this example AB ≠ BA, even though both are defined.
Matrix multiplication is not commutative (in general).
53
Definition (Commuting matrices)
The matrices A and B are said to commute if AB = BA.
For both AB and BA to be defined and equal, we must have that A and B are square and of the same size.
Note
If I is an n × n identity matrix and A is any n × n square matrix, then
AI = IA = A
54
Properties of matrix multiplication
The following properties hold whenever the matrix products and sums are defined:
1. A(B + C) = AB + AC (left distributivity)
2. (A + B)C = AC + BC (right distributivity)
3. A(BC) = (AB)C (associativity)
4. A(αB) = α(AB)
5. AI = IA = A
6. A0 = 0 and 0A = 0
Here α is a scalar and, as always, I and 0 denote the identity matrix and zero matrix of the appropriate size.
55
Matrix multiplication and linear systems
Using the rule for matrix multiplication, a linear system can be written as a matrix equation.
Example
The linear system
2x + 0.1y − 0.02z = 0.9
−0.8x + 7y = 0
is equivalent to the matrix equation
[  2    0.1  −0.02 ] [ x ]     [ 0.9 ]
[ −0.8  7     0    ] [ y ]  =  [ 0   ]
                     [ z ]
56
Another non-property of matrix multiplication
If we take the product of two real numbers a and b and ab = 0, we can conclude that at least one of a and b is equal to 0.
This is not always true for matrices.
Example
Let
A = [  1   1 ]
    [ −1  −1 ]
and
B = [  1  −1 ]
    [ −1   1 ]
What is going on here? The problem is that the matrices A and B in the above example do not have inverses.
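Aside (not in the notes): it is worth computing the product for yourself; the result really is the zero matrix even though neither factor is zero.

```python
import numpy as np

A = np.array([[ 1,  1],
              [-1, -1]])
B = np.array([[ 1, -1],
              [-1,  1]])

print(A @ B)   # the 2 x 2 zero matrix, even though A != 0 and B != 0
```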
57
2.3 Matrix inverses
Definition (Matrix inverse)
A matrix A is called invertible if there exists a matrix B such that
AB = BA = I
Such a matrix B is called the inverse of A and is denoted A−1.
If A is not invertible, we say that A is singular.
58
Note
It follows from the definition that:
For A to be invertible it must be square
(If it exists) A−1 has the same size as A
(If it exists) A−1 is unique
If A is invertible, then A−1 is invertible and (A−1)−1 = A
I−1 = I , 0 has no inverse.
59
Inverse of a 2 × 2 matrix
In general, for a 2 × 2 matrix
A = [ a  b ]
    [ c  d ]
1. A is invertible iff ad − bc ≠ 0
2. If ad − bc ≠ 0, then
A−1 = 1/(ad − bc) [  d  −b ]
                  [ −c   a ]
Example
Find the inverse of
A = [ 2  −1 ]
    [ 1   1 ]
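Aside (not in the notes): one way to check an answer to exercises like this is to code the 2 × 2 formula directly and confirm that AA−1 = I. The function name below is our own.

```python
import numpy as np

def inverse_2x2(A):
    """Invert a 2 x 2 matrix [[a, b], [c, d]] using the formula above."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return (1 / det) * np.array([[d, -b], [-c, a]])

A = np.array([[2.0, -1.0],
              [1.0,  1.0]])
Ainv = inverse_2x2(A)      # here ad - bc = 3, so Ainv = (1/3) [[1, 1], [-1, 2]]
print(Ainv)
print(np.allclose(A @ Ainv, np.eye(2)))   # True
```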
60
Example
If A is a square matrix satisfying A3 = 0, show that (I − A)−1 = I + A + A2.
61
Finding the inverse of a square matrix
Calculating the inverse of a matrix
Given an n × n matrix A, we can find A−1 as follows:
1. Construct the ("grand augmented") matrix [A | I], where I is the n × n identity matrix.
2. Apply row operations to [A | I] to get the block corresponding to A into reduced row-echelon form. This gives
[A | I] ∼ [R | B]
where R is in reduced row-echelon form.
3. If R = I, then A is invertible and A−1 = B.
If R ≠ I, then A is singular (i.e., there is no A−1).
We will see shortly why this method works.
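Aside (not part of the course material): the grand-augmented procedure can be sketched in NumPy. This is a floating-point illustration, not a robust library routine; the tolerance and the singularity check are our own choices.

```python
import numpy as np

def inverse_via_rowops(A):
    """Find the inverse by row-reducing the grand augmented matrix
    [A | I], as in the procedure above.  When the left block reaches
    the identity, the right block is the inverse."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])   # build [A | I]
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, n) if abs(M[i, c]) > 1e-12), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[[r, pivot]] = M[[pivot, r]]
        M[r] /= M[r, c]                # leading entry becomes 1
        for i in range(n):             # clear the rest of the column
            if i != r:
                M[i] -= M[i, c] * M[r]
        r += 1
    return M[:, n:]                    # the block B in [I | B]

A = np.array([[ 1,  2, 1],
              [-1, -1, 1],
              [ 0,  1, 3]])
Ainv = inverse_via_rowops(A)
print(Ainv)
print(np.allclose(A @ Ainv, np.eye(3)))   # True
```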
62
Example
Find the inverse of
A = [  1   2  1 ]
    [ −1  −1  1 ]
    [  0   1  3 ]
63
Another example
Find the inverse of
[  1   2  1 ]
[ −1  −1  1 ]
[ −1   0  3 ]
64
Properties of the matrix inverse
If A and B are invertible matrices of the same size, and α is a non-zero scalar, then
1. (αA)−1 = (1/α) A−1
2. (AB)−1 = B−1A−1
3. (An)−1 = (A−1)n (for all n ∈ N)
Exercise
Prove these!
65
Row operations by matrix multiplication
The effect of a row operation can be achieved by multiplication on the left by a suitable matrix.
Definition (Elementary matrix)
An n × n matrix is an elementary matrix if it can be obtained from In by performing a single elementary row operation.
Example
Consider
I3 = [ 1  0  0 ]
     [ 0  1  0 ]
     [ 0  0  1 ]
Let F be obtained from I3 by the row operation: R2 ↔ R3
Let G be obtained from I3 by the row operation: R1 → 2R1
Let H be obtained from I3 by the row operation: R3 → R3 + 3R1
66
F = [ 1  0  0 ]
    [ 0  0  1 ]
    [ 0  1  0 ]
G = [ 2  0  0 ]
    [ 0  1  0 ]
    [ 0  0  1 ]
H = [ 1  0  0 ]
    [ 0  1  0 ]
    [ 3  0  1 ]
Now consider the matrix
A = [ a  b  c ]
    [ d  e  f ]
    [ g  h  i ]
Calculation gives
FA =
GA =
HA =
67
This works in general!
Let E be obtained by applying a row operation p to I.
If A is a matrix such that the product EA is defined, then EA is the result of performing p on A.
We can represent a sequence of elementary row operations by a corresponding sequence of elementary matrices. (Be careful about the order!)
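Aside (not in the notes): this fact is easy to check numerically. Below, the elementary matrices F, G, H from the previous slide are built from the identity, and left-multiplication is compared with performing the row operation directly on a sample matrix.

```python
import numpy as np

A = np.arange(1.0, 10.0).reshape(3, 3)   # a sample 3 x 3 matrix

F = np.eye(3); F[[1, 2]] = F[[2, 1]]     # R2 <-> R3
G = np.eye(3); G[0, 0] = 2               # R1 -> 2 R1
H = np.eye(3); H[2, 0] = 3               # R3 -> R3 + 3 R1

# Multiplying on the left performs the corresponding row operation on A
B = A.copy(); B[[1, 2]] = B[[2, 1]]
print(np.allclose(F @ A, B))             # True

C = A.copy(); C[2] += 3 * A[0]
print(np.allclose(H @ A, C))             # True
```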
68
Example
[ 1  2 ]  ∼  [ 1   2 ]  ∼  [ 1  2 ]
[ 3  4 ]     [ 0  −2 ]     [ 0  1 ]

[ 1   0   ] [  1  0 ] [ 1  2 ]     [ 1  2 ]
[ 0  −1/2 ] [ −3  1 ] [ 3  4 ]  =  [ 0  1 ]
69
If A ∼ I, then there is a sequence of elementary matrices E1, E2, . . . , En such that
EnEn−1 · · · E2E1A = I
This can be used to prove the following
Theorem (AR 1.5.3)
1. Let A be an n × n matrix. Then
A is invertible ⇐⇒ A ∼ In
2. If A and B are n × n matrices such that AB = In, then B is invertible and B−1 = A.
3. Every invertible matrix can be written as a product of elementary matrices.
This theorem justifies why our procedure for finding inverses using row operations actually works... Can you see why?
70
Matrix transpose
Definition (Transpose of a matrix)
Let A be an m × n matrix. The transpose of A, denoted by AT, is defined to be the n × m matrix whose entries are given by
(AT)ij = Aji
That is, AT is obtained by interchanging the rows and columns of A.
Example
Let
A = [ 1  2  3 ]
    [ 4  5  6 ]
Then AT is
71
Properties of the transpose
1. (AT)T = A
2. (A + B)T = AT + BT (whenever A + B is defined)
3. (αA)T = αAT (where α is a scalar)
4. (AB)T = BTAT (whenever AB is defined)
5. (AT)−1 = (A−1)T (whenever A−1 is defined)
Parts 1, 2 and 3 follow easily from the definition.
To prove part 4, we also use the definition of matrix multiplication given previously.
72
Example
Using part 4 above, prove part 5: (AT)−1 = (A−1)T.
73
Linear systems revisited
Consider the linear system
x + 2y + z = −3
−x − y + z = 11
y + 3z = 21
We could represent the system by an augmented matrix:
[  1   2  1 | −3 ]
[ −1  −1  1 | 11 ]
[  0   1  3 | 21 ]
or, we could write it as a matrix equation:
[  1   2  1 ] [ x ]     [ −3 ]
[ −1  −1  1 ] [ y ]  =  [ 11 ]
[  0   1  3 ] [ z ]     [ 21 ]
74
For convenience, label
(i) the matrix of coefficients as A,
(ii) the column matrix of variables as x,
(iii) the right hand side column matrix as b.
Using this notation, the linear system can be written as
Ax = b
Solving linear systems using matrix inverses
Theorem
Suppose A is an invertible matrix. Then the linear system Ax = b has exactly one solution, and it is given by x = A−1b.
Proof: Ax = b =⇒ A−1Ax = A−1b =⇒ x = A−1b
75
Example
Use a matrix inverse to solve the linear system
x + 2y + z = −3
−x − y + z = 11
y + 3z = 21
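Aside (not in the notes): the same computation in NumPy, assuming it is available. (In numerical practice `np.linalg.solve` is preferred over forming the inverse explicitly, but the inverse mirrors the theorem above.)

```python
import numpy as np

A = np.array([[ 1,  2, 1],
              [-1, -1, 1],
              [ 0,  1, 3]], dtype=float)
b = np.array([-3, 11, 21], dtype=float)

x = np.linalg.inv(A) @ b     # x = A^(-1) b
print(x)                     # [ 20. -18.  13.]
```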
76
Another example
Solve the following system of linear equations:
x + 2y + z = a
−x − y + z = b
y + 3z = c
77
2.4 Rank of a matrix
Definition (Rank)
The rank of a matrix A is the number of non-zero rows in the reduced row-echelon form of A.
Note
This is the same as the number of non-zero rows in a row-echelon form of A.
If A has size m × n, then rank(A) ≤ m and rank(A) ≤ n.
Example
Find the rank of each of the following matrices:
[  1   2  1 ]      [ 1  −1  2   1 ]
[ −1  −1  1 ]      [ 0   1  1  −2 ]
[  0   1  3 ]      [ 1  −3  0   5 ]
78
Theorem
The linear system Ax = b, where A is an m × n matrix, has:
1. No solution if rank(A) < rank([A | b])
2. A unique solution if rank(A) = rank([A | b]) and rank(A) = n
3. Infinitely many solutions if rank(A) = rank([A | b]) and rank(A) < n
Proof: This is just a restatement of the results in section 1.3.
Note
It is always the case that rank(A) ≤ rank([A | b]) and rank(A) ≤ n.
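Aside (not part of the course material): the theorem can be turned into a small classifier using `np.linalg.matrix_rank`, which is a numerical rank estimate rather than the exact row-reduction count used in the notes.

```python
import numpy as np

def classify(A, b):
    """Classify the system Ax = b using the rank criteria above."""
    n = A.shape[1]
    rank_A = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_A < rank_aug:
        return "no solution"
    return "unique solution" if rank_A == n else "infinitely many solutions"

# The 'yet another example' system from section 1.3
A = np.array([[1, 1, 1], [2, 1, 2], [3, 2, 3]], dtype=float)
b = np.array([4, 9, 13], dtype=float)
print(classify(A, b))   # "infinitely many solutions"
```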
79
Theorem
If A is an n × n matrix, the following conditions are equivalent:
1. A is invertible
2. Ax = b has a unique solution for any b
3. The rank of A is n
4. The reduced row-echelon form of A is In
Proof:
1 ⇒ 2: We've seen this before (slide 75).
2 ⇒ 3: Follows from the previous theorem (or what we already knew about linear systems).
3 ⇒ 4: Immediate from the definition of rank, and the fact that A is square.
4 ⇒ 1: Let R be the RREF of A. Then R = EA, where E = EkEk−1 . . . E1 is a product of elementary matrices. So we have I = EA. We have already noted that this implies that A is invertible (and A−1 = E).
80
2.5 Solutions of non-homogeneous linear equations
Suppose that A is an m × n matrix (of coefficients), that x = [x1, . . . , xn]T is an n × 1 matrix (of unknowns), and that b is an m × 1 matrix (of constants).
The general solution to the system of equations
Ax = b
is given by
x = xh + x0
where x0 is any one solution to Ax = b and xh varies through all solutions to the homogeneous equations Ax = 0.
81
Example
The equations
[ 1  0  −1   1 ] [ x1 ]     [ 1 ]
[ 0  1  −1  −1 ] [ x2 ]  =  [ 2 ]     (Ax = b)
[ 0  0   0   0 ] [ x3 ]     [ 0 ]
                 [ x4 ]
have a solution x1 = 1, x2 = 2, x3 = 0, x4 = 0.
The general solution to the homogeneous equations Ax = 0 is given by
So the general solution to the original equations is
82
2.6 Determinants [AR 2.1–2.3]
When we calculate the inverse of the 2 × 2 matrix
A = [ a  b ]
    [ c  d ]
we find that we need to invert the number ad − bc.
If ad − bc ≠ 0 then we can find the inverse of A; if not, then A is not invertible.
So this number plays an important role when we study A. We call it the determinant of A.
The determinant is a function that associates a real number to any square matrix.
83
Defining the determinant
The determinant of a 2 × 2 matrix
A = [ a  b ]
    [ c  d ]
is given by
det(A) = ad − bc
Furthermore, the matrix A is invertible iff det(A) ≠ 0.
Definition (Determinant)
Let A be an n × n matrix. The determinant of A, denoted det(A) or |A|, can be defined as the signed sum of all the ways to multiply together n entries of the matrix, with all chosen from different rows and columns.
84
To determine the sign of the products, imagine that all but the elements in the product in question are set to zero in the matrix. Now swap columns until a diagonal matrix results. If the number of swaps required is even, then the product has a + sign, while if it is odd, it is to be given a − sign.
Determinants of 2 × 2 matrices
For
A = [ a  b ]
    [ c  d ]
the possible products are ad and bc.
When we set all entries other than a and d to zero, the matrix is diagonal; so ad has a + sign. When we set all entries other than b and c to zero, we need to interchange the two columns to obtain a diagonal matrix; so bc is given a minus sign.
85
Determinant of a 3 × 3 matrix
Suppose
A = [ a11  a12  a13 ]
    [ a21  a22  a23 ]
    [ a31  a32  a33 ]
We can form a table of all the products of 3 entries taken from different rows and columns, together with the signs:
product      sign
a11a22a33    +
a11a23a32    −
a12a21a33    −
a12a23a31    +
a13a21a32    +
a13a22a31    −
Hence
det(A) = a11a22a33 − a11a23a32 − a12a21a33 + a12a23a31 + a13a21a32 − a13a22a31
86
The formula for 3 × 3 matrices is complicated, and it quickly becomes worse as the size of the matrix increases. But there are better ways to calculate determinants.
Submatrices and cofactors
Here is one way for practical calculation of (small) determinants.
Definition (Submatrix)
Let A be an m × n matrix. The (i, j)-submatrix of A, denoted by A(i, j), is the (m − 1) × (n − 1) matrix obtained by deleting the ith row and jth column from A.
Example
If
A = [  1   2  1 ]
    [ −1  −1  1 ]
    [  0   1  3 ]
then
A(2, 3) = [ 1  2 ]
          [ 0  1 ]
87
Definition (Cofactor)
Let A be a square matrix. The (i, j)-cofactor of A, denoted by Cij, is defined by
Cij = (−1)i+j det(A(i, j))
Example (continued)
A(2, 3) = [ 1  2 ]
          [ 0  1 ]
so det(A(2, 3)) = 1 and C23 =
88
Cofactor Expansion
We can write the determinant of a 3 × 3 matrix A in terms of cofactors:
det(A) = a11C11 + a12C12 + a13C13
This is called the cofactor expansion along the first row of A.
Example
Calculate
det [  1   2  1 ]
    [ −1  −1  1 ]
    [  0   1  3 ]
89
Theorem
The determinant of an n × n matrix A can be computed by multiplying the entries in any row (or column) by their cofactors and adding the resulting products.
That is, for each 1 ≤ i ≤ n and 1 ≤ j ≤ n,
det(A) = ai1Ci1 + ai2Ci2 + · · · + ainCin
(this is called cofactor expansion along the ith row)
and
det(A) = a1jC1j + a2jC2j + · · · + anjCnj
(this is called cofactor expansion along the jth column)
Proof: We can prove this in the n = 3 case using the formula on slide 86. The proof for the general case is essentially the same, but gets more technical...
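Aside (not in the notes): cofactor expansion along the first row translates into a short recursive function. It is a direct transcription of the theorem, not an efficient algorithm (the cost grows like n!).

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row:
    det(A) = a11*C11 + a12*C12 + ... + a1n*C1n, applied recursively."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # (1, j)-submatrix: delete row 1 and column j
        sub = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        # sign (-1)^(1+j) becomes (-1)^j with 0-based indexing
        total += (-1) ** j * A[0, j] * det_cofactor(sub)
    return total

A = np.array([[ 1,  2, 1],
              [-1, -1, 1],
              [ 0,  1, 3]], dtype=float)
print(det_cofactor(A))                                # 1.0
print(np.isclose(det_cofactor(A), np.linalg.det(A)))  # True
```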
90
Example
Calculate the determinant of
A = [  1   2  1 ]
    [ −1  −1  1 ]
    [  0   1  3 ]
(i) using the cofactor expansion along the 2nd column,
(ii) using the cofactor expansion along the 3rd row.
Which one was easier?
91
How do you remember the sign of the cofactor?
The (1, 1)-cofactor always has sign +. Starting from there, imagine walking to the square you want using either horizontal or vertical steps. The appropriate sign will change at each step.
We can visualise this arrangement with the following matrix:
[ +  −  +  −  . . . ]
[ −  +  −  +  . . . ]
[ +  −  +  −  . . . ]
[ −  +  −  +  . . . ]
[ ⋮            ⋱   ]
So, for example, C13 is assigned + but C32 is assigned −.
92
Example
Calculate
| 1  −2  0  1 |
| 3   1  2  0 |
| 1   0  1  0 |
| 2  −2  1  2 |
93
Some properties of determinants:
Suppose A and B are square matrices. Then
1. det(AT) = det(A)
2. If A has a row or column of zeros, then det(A) = 0
3. det(AB) = det(A) det(B)
4. If A is invertible, then det(A) ≠ 0 and det(A−1) = 1/det(A)
5. If A is singular, then det(A) = 0
6. If A = [ C  ∗ ]
          [ 0  D ]
with C and D both square, then det(A) = det(C) det(D)
Idea of proof: 2: Cofactor expansion. 4: Follows from 3. 5: Follows from 3, 4 and the theorem on slide 80. 1, 3, 6: Need to use the definition...
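Aside (not in the notes): properties 1, 3 and 4 are easy to spot-check numerically on random matrices (a random Gaussian matrix is invertible with probability 1, so the inverse in the last check exists).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Property 1: det(A^T) = det(A)
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))

# Property 3: det(AB) = det(A) det(B)
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))

# Property 4: det(A^(-1)) = 1 / det(A)
assert np.isclose(np.linalg.det(np.linalg.inv(A)),
                  1 / np.linalg.det(A))
print("determinant properties verified for random matrices")
```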
94
Row operations and determinants
Calculating determinants via cofactors becomes a very large calculation as the size of the matrix increases.
We need a better way to calculate larger determinants.
For some types of matrix it is easy to write down their determinant.
Definition (Triangular Matrix)
A matrix is said to be upper triangular (respectively lower triangular) if all the elements below (respectively above) the main diagonal are zero.
Examples
[ 2  −1  9 ]   [ 2   0  0 ]   [ 2  0  0 ]
[ 0   3  2 ]   [ 1   3  0 ]   [ 0  3  0 ]
[ 0   0  2 ]   [ 2  −3  2 ]   [ 0  0  2 ]
95
Theorem
If A is an n × n triangular matrix, then det(A) is the product of the entries on the main diagonal of A.
Idea of proof: (Repeated) cofactor expansion along the first column.
Example
Let
A = [ 2  −10  92  −117 ]
    [ 0    3  28   −31 ]
    [ 0    0  −1    27 ]
    [ 0    0   0     2 ]
What is det(A)?
96
We can use row operations to manipulate a matrix into triangular form in order to make the determinant calculation easier.
Recall that an n × n matrix is an elementary matrix if it can be obtained from In by performing a single elementary row operation. Multiplying on the left by an elementary matrix is equivalent to applying the corresponding elementary row operation.
Using elementary matrices we can check the effects of row operations on the determinant of a matrix.
97
Example
Let
A = [ a  b  c ]
    [ d  e  f ]
    [ g  h  i ]
Row swap:
E = [ 1  0  0 ]        EA = [ a  b  c ]
    [ 0  0  1 ]             [ g  h  i ]
    [ 0  1  0 ]             [ d  e  f ]
det(EA) = det(E) det(A) = − det(A)
98
Multiply a row by a scalar:
F = [ 2  0  0 ]        FA = [ 2a  2b  2c ]
    [ 0  1  0 ]             [  d   e   f ]
    [ 0  0  1 ]             [  g   h   i ]
det(FA) = det(F) det(A) = 2 det(A)
Add a constant multiple of one row to another row:
G = [ 1  0  0 ]        GA = [    a       b       c    ]
    [ 0  1  0 ]             [    d       e       f    ]
    [ 3  0  1 ]             [ g + 3a  h + 3b  i + 3c ]
det(GA) = det(G) det(A) = det(A)
99
The effect of each of the three types of elementary row operations is given by the following.
Theorem
Let A be a square matrix.
1. If B is obtained from A by swapping two rows (or two columns) ofA, then det(B) = − det(A)
2. If B is obtained from A by multiplying a row (or column) of A bythe scalar α, then det(B) = α det(A)
3. If B is obtained from A by replacing a row (or column) of A by itself plus a multiple of another row (column), then det(B) = det(A)

Proof: The corresponding elementary matrices have determinants −1, α and 1 (respectively). Then use det(B) = det(E) det(A).
100
Example
Calculate det

[  1  2 1 ]
[ −1 −1 1 ]
[  0  1 3 ]
101
Example
Calculate
∣ 1 −2  0  1 ∣
∣ 3  1  2  0 ∣
∣ 1  0  1  0 ∣
∣ 2 −2  1  2 ∣
102
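The row-operation rules above can be sketched in code (again, Python is outside the course; the function name `det_by_row_reduction` is ours). The idea is exactly the method of these slides: reduce to upper triangular form, track sign changes from row swaps, and multiply the diagonal. Exact `Fraction` arithmetic avoids floating-point error.

```python
from fractions import Fraction

def det_by_row_reduction(M):
    """Reduce M to upper triangular form while tracking how each
    row operation changes the determinant: a row swap flips the sign,
    adding a multiple of one row to another changes nothing."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    sign = 1
    for col in range(n):
        # find a pivot at or below the diagonal, swapping rows if needed
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # no pivot in this column: det = 0
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign                # row swap negates the determinant
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    product = Fraction(1)
    for i in range(n):
        product *= A[i][i]              # triangular det = diagonal product
    return sign * product

d = det_by_row_reduction([[1, -2, 0, 1],
                          [3,  1, 2, 0],
                          [1,  0, 1, 0],
                          [2, -2, 1, 2]])
```

This is the determinant of the 4 × 4 example above, so you can check your hand calculation against `d`.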
We collect here some properties of the determinant function, most of which we’ve already noted.
Theorem
Let A be an n × n matrix. Then,
1. det(Aᵀ) = det(A)
2. det(AB) = det(A) det(B)
3. det(αA) = αn det(A)
4. If A is a triangular matrix, then its determinant is the product ofthe elements on the main diagonal
5. If A has a row (or column) of zeros, then det(A) = 0
6. If A has a row (or column) which is a scalar multiple of anotherrow (or column) then det(A) = 0
7. A is singular iff det(A) = 0 (and A is invertible iff det(A) ≠ 0)
103
Example (showing how to prove property 3 above)
Let A be a 3 × 3 matrix. Show that det(αA) = α3 det(A).
104
Topic 3: Euclidean Vector Spaces
There are quite a few things to cover:
3.1 Vectors in Rn
3.2 Dot product
3.3 Cross product of vectors in R3
3.4 Geometric applications
3.5 Linear combinations
3.6 Subspaces of Rn
3.7 Bases and dimension
3.8 Rank-nullity theorem
3.9 Coordinates relative to a basis
105
3.1 Vectors in Rn [AR 3.1]
Geometrically, a pair of real numbers (a, b) can be thought of as representing a directed line segment from the origin in the plane.
These can be added together, and multiplied by a real number.
Algebraically, these operations are given by:
vector addition: (a, b) + (c , d) = (a + c , b + d)
scalar multiplication: α(a, b) = (αa, αb)
Notation:
R2 = {(a, b) | a, b ∈ R}
   = the set of all ordered pairs of real numbers
106
The algebraic approach to vector addition and scalar multiplication extends to 3 dimensions or more.
Rn = {(x1, x2, . . . , xn) | xi ∈ R for i = 1, 2, . . . , n}
   = the set of all n-tuples of real numbers
We will refer to elements of Rn as vectors.
We can add two vectors together and multiply a vector by a scalar.
vector addition : (x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn)
scalar multiplication : α(x1, . . . , xn) = (αx1, . . . , αxn)
107
Notation
We often denote by i, j, and k the vectors in R3 given by:
i = (1, 0, 0) j = (0, 1, 0) k = (0, 0, 1)
Any vector in R3 can be written in terms of i, j and k:
u = (u1, u2, u3) = u1 i + u2 j + u3 k
108
Definition (magnitude)
The length (or magnitude or norm) of a vector u = (u1, u2, . . . , un) ∈ Rn is given by

‖u‖ = √(u1² + u2² + · · · + un²)

It follows from Pythagoras’ theorem that this corresponds to our geometric idea of length for vectors in R2 and R3.
Example
Let u = 2i − j + 2k. ‖u‖ =
A vector having length equal to 1 is called a unit vector.
Example
Find a unit vector parallel to u.
109
Definition
The distance between two vectors u, v ∈ Rn is given by
d(u, v) = ‖v − u‖
Geometrically, if we consider u and v to be the position vectors of points P and Q, then d(u, v) is the distance between P and Q.
Example
Find the distance between the points P(1, 3,−1) and Q(2, 1,−1).
110
3.2 Dot product [AR 3.3]
Let u = (u1, u2, . . . , un) ∈ Rn and v = (v1, v2, . . . , vn) ∈ Rn be two vectors in Rn.
Definition (Dot product)
We define the dot product (or scalar product or Euclidean inner product) by
u · v = u1v1 + u2v2 + · · · + unvn
Examples
(3,−1) · (1, 2) =
(i + j + k) · (−j + k) =
111
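The component formula translates directly into code. A minimal sketch (Python is not used in this subject; the helper name `dot` is ours) evaluating the two examples above:

```python
def dot(u, v):
    """Euclidean dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

first = dot((3, -1), (1, 2))       # (3)(1) + (-1)(2)

# i + j + k = (1, 1, 1) and -j + k = (0, -1, 1)
second = dot((1, 1, 1), (0, -1, 1))
```

Running it fills in the two blanks above; note the second pair of vectors is perpendicular.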
The angle between two vectors can be defined in terms of the dot product.
Definition (Angle)
The angle θ between two vectors u, v ∈ Rn is given by the expression
u · v = ‖u‖‖v‖ cos θ
The angle defined in this way is exactly the usual angle between two vectors in R2 or R3.
That our definition of angle makes sense relies on the following
Theorem (Cauchy-Schwarz Inequality for Rn)
Let u, v be vectors. Then
|u · v| ≤ ‖u‖ ‖v‖

with equality holding precisely when u is a multiple of v.
Proof: Consider the quadratic polynomial given by (u + tv) · (u + tv)...
112
Example
Find the angle between the vectors u = 2i + j − k and v = i + 2j + k.
113
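Rearranging the defining formula gives θ = arccos(u · v / (‖u‖ ‖v‖)), which is easy to evaluate. A rough sketch (the function name `angle` is ours, not from the text), applied to the example above:

```python
import math

def angle(u, v):
    """Angle between u and v, from u . v = |u| |v| cos(theta)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.acos(dot / (norm_u * norm_v))

# u = 2i + j - k and v = i + 2j + k from the example above
theta = angle((2, 1, -1), (1, 2, 1))
```

By Cauchy-Schwarz the argument of `acos` always lies in [−1, 1], so the call is well defined (up to floating-point rounding at the endpoints).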
Properties of the dot product
1. u · v is a scalar
2. u · v = v · u
3. u · (v + w) = u · v + u · w
4. u · u = ‖u‖²
5. (αu) · v = α(u · v)
Note
Suppose u = (u1, . . . , un) and v = (v1, . . . , vn) are two vectors in Rn. If we write each as a row matrix U = [u1 · · · un], V = [v1 · · · vn], then

u · v = UVᵀ
114
3.3 Cross product of vectors in R3 [AR 3.4]
Let u = (u1, u2, u3) and v = (v1, v2, v3) be two vectors in R3.
Definition (Cross product)
The cross product (or vector product) of u and v is the vector given by
u × v = (u2v3 − u3v2)i + (u3v1 − u1v3) j + (u1v2 − u2v1)k
A convenient way to remember this is as a “determinant”
        ∣ i  j  k  ∣
u × v = ∣ u1 u2 u3 ∣
        ∣ v1 v2 v3 ∣

        ∣ u2 u3 ∣     ∣ u1 u3 ∣     ∣ u1 u2 ∣
      = ∣ v2 v3 ∣ i − ∣ v1 v3 ∣ j + ∣ v1 v2 ∣ k
using cofactor expansion along the first row.
115
Geometry of the cross product
u × v = ‖u‖ ‖v‖ sin(θ) n

[diagram: u, v, the unit normal n and the angle θ]

n is a unit vector (i.e., ‖n‖ = 1)
n is perpendicular to both u and v (i.e., n · u = 0 and n · v = 0)
n points in the direction given by the right-hand rule
θ ∈ [0, π] is the angle between u and v
If θ = 0 or θ = π, then u × v = 0
Example
Find a vector perpendicular to both (2, 3, 1) and (1, 1, 1).
116
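The component formula for the cross product can be sketched directly (Python is outside the course; the helper names are ours). Applied to the example above, the result should be perpendicular to both inputs, which the code checks via the dot product:

```python
def cross(u, v):
    """Cross product of two vectors in R^3, by the component formula."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2,
            u3 * v1 - u1 * v3,
            u1 * v2 - u2 * v1)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

n = cross((2, 3, 1), (1, 1, 1))
# n must be perpendicular to both input vectors
assert dot(n, (2, 3, 1)) == 0 and dot(n, (1, 1, 1)) == 0
```

Any non-zero scalar multiple of `n` is an equally valid answer to the example.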
Properties of the cross product
1. v × u = −(u × v)
2. u × (v + w) = (u × v) + (u × w)
3. (αu) × v = α(u × v)
4. u × 0 = 0
5. u × u = 0
6. u · (u × v) = 0
Note
The cross product is defined only for R3. Unlike the dot product and many of the other properties we are considering, it does not extend to Rn in general.
117
3.4 Geometric applications [AR 3.5]
Basic applications
Suppose u, v ∈ R3 are two vectors.
1. u and v are perpendicular precisely when u · v = 0
2. The area of the parallelogram defined by u and v is equal to ‖u × v‖
Note
If u and v are elements of R2 with u = (u1, u2) and v = (v1, v2), then

                                          ∣ u1 u2 ∣
area of parallelogram = absolute value of ∣ v1 v2 ∣
118
Example
Find the area of the triangle with vertices (2,−5, 4), (3,−4, 5) and (3,−6, 2).
119
3. Assuming u ≠ 0, the projection of v onto u is given by

   proju v = ((u · v)/‖u‖²) u

[diagram: v, proju v along u, and v − proju v]
Notice that:
If we set û = u/‖u‖, then proju v = (û · v) û
u · (v − proju v) =
120
Example
Let w = (2,−1,−2) and v = (2, 1, 3).
Find vectors v1 and v2 such that
v = v1 + v2,
v1 is parallel to w, and
v2 is perpendicular to w.
121
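This decomposition can be sketched numerically from the projection formula (Python is not part of the subject; `Fraction` keeps the arithmetic exact so the perpendicularity check is an exact equality, not a floating-point one):

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj(u, v):
    """Projection of v onto u: ((u . v) / |u|^2) u, assuming u != 0."""
    c = Fraction(dot(u, v), dot(u, u))
    return tuple(c * a for a in u)

w = (2, -1, -2)
v = (2, 1, 3)
v1 = proj(w, v)                                    # parallel to w
v2 = tuple(a - b for a, b in zip(v, v1))           # v2 = v - v1
assert dot(v2, w) == 0                             # v2 is perpendicular to w
assert tuple(a + b for a, b in zip(v1, v2)) == v   # v = v1 + v2
```

The same pattern works for any v and non-zero w in Rn.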
Scalar triple product
Notice that
              ∣ u1 u2 u3 ∣
u · (u × v) = ∣ u1 u2 u3 ∣ = 0
              ∣ v1 v2 v3 ∣
Similarly
u · (v × u) = v · (u × v) = v · (v × u) = 0
In general, u · (v × w) is called the scalar triple product and is given by

              ∣ u1 u2 u3 ∣
u · (v × w) = ∣ v1 v2 v3 ∣
              ∣ w1 w2 w3 ∣
122
Suppose u, v,w ∈ R3 are three vectors.
The parallelepiped defined by u, v and w

[diagram: parallelepiped with adjacent edges u, v and w]

has volume equal to the absolute value of the scalar triple product of u, v and w:

                                                             ∣ u1 u2 u3 ∣
volume of parallelepiped = |u · (v × w)| = absolute value of ∣ v1 v2 v3 ∣
                                                             ∣ w1 w2 w3 ∣
123
Example
Find the volume of the parallelepiped with adjacent edges PQ, PR, PS, where the points are: P(2,−1, 1), Q(4, 6, 7), R(5, 9, 7) and S(8, 8, 8).
124
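This volume computation can be sketched directly from the scalar triple product (helper names are ours; the edge vectors are formed by subtracting the coordinates of P from those of Q, R and S):

```python
def cross(u, v):
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def triple(u, v, w):
    """Scalar triple product u . (v x w)."""
    return sum(a * b for a, b in zip(u, cross(v, w)))

P, Q, R, S = (2, -1, 1), (4, 6, 7), (5, 9, 7), (8, 8, 8)
PQ = tuple(q - p for p, q in zip(P, Q))
PR = tuple(r - p for p, r in zip(P, R))
PS = tuple(s - p for p, s in zip(P, S))

# the volume is the absolute value of the (possibly negative) triple product
volume = abs(triple(PQ, PR, PS))
```

The sign of the raw triple product only records the orientation of the three edges, which is why the absolute value is taken.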
Lines
Vector equation of a line
The vector equation of a line through a point P0 in the direction determined by a vector v is

r = r0 + tv,   t ∈ R

where r0 = OP0

[diagram: line through P0 with direction v; position vectors r0 and r = r0 + tv]

Each point on the line is given by a unique value of t.
125
Letting r = (x, y, z), r0 = (x0, y0, z0) and v = (a, b, c) the equation becomes

(x, y, z) = (x0, y0, z0) + t(a, b, c)

Parametric equations for a line
Equating components gives the parametric equations for the line:

x = x0 + ta
y = y0 + tb     t ∈ R
z = z0 + tc
Example
What is the parametric form of the line passing through the points P(−1, 2, 3) and Q(4,−2, 5)?
126
Cartesian equations for a line
If a ≠ 0, b ≠ 0 and c ≠ 0, we can solve the parametric equations for t and equate. This gives the cartesian form of the straight line:

(x − x0)/a = (y − y0)/b = (z − z0)/c
Example
What is the cartesian form of the line passing through the points P(−1, 2, 3) and Q(4,−2, 5)?
127
Example
Find the vector equation of the line whose cartesian form is

(x + 1)/5 = (y − 3)/(−1) = (z − 4)/2
128
Definition
Two lines are said to:
intersect if there is a point lying in both
be parallel if their direction vectors are parallel
The angle between two lines is the angle between their direction vectors
Example
Find the vector equation of the line through the point P(0, 0, 1) that is parallel to the line given by

(x − 1)/1 = (y + 2)/2 = (z − 6)/2
129
Planes
A plane is determined by a point P0 on it and any two non-parallel vectors u and v lying in the plane.
Vector equation of a plane
Letting r0 = OP0, the vector equation of the plane is

r = r0 + su + tv,   s, t ∈ R

[diagram: plane through P0 spanned by u and v; position vector r = r0 + su + tv]

Particular values of s and t determine a point on the plane, and conversely every point on the plane has position vector given by some s and t.
130
The vector

n = (u × v)/‖u × v‖

is a unit vector that is perpendicular to the plane.
Such a vector is called a unit normal vector to the plane.
The angle between two planes is given by the angle between their (unit) normal vectors.
If P is a point with position vector r, then:
P lies on the plane ⇐⇒ r − r0 is parallel to plane
⇐⇒ r − r0 is perpendicular to n
131
It follows that the equation of the plane can also be written as

(r − r0) · n = 0

[diagram: plane with normal n; position vectors r0 and r, with r − r0 lying in the plane]
Writing r = (x, y, z) and n = (a, b, c), we obtain the Cartesian equation of the plane
ax + by + cz = d
132
Examples
1. The plane perpendicular to the direction (1, 2, 3) and through the point (4, 5, 6) is given by x + 2y + 3z = d where d = 1 × 4 + 2 × 5 + 3 × 6. That is
x + 2y + 3z = 32
2. What is the equation of the plane perpendicular to (1, 0,−2) and containing the point (1,−1,−3)?
133
3. The plane through (1, 1, 1) containing vectors parallel to (1, 0, 1) and (0, 1, 2) is the set of all vectors of the form

   (1, 1, 1) + s(1, 0, 1) + t(0, 1, 2)   s, t ∈ R
4. Find the Cartesian equation of the plane with vector form(x , y , z) = (1, 1, 1) + s(1, 0, 1) + t(0, 1, 2).
5. Find the vector and cartesian equations of the plane containing the three points P(2, 1,−1), Q(3, 0, 1), and R(−1, 1,−1).
134
Intersection of a line and a plane
Example
Where does the line

(x − 1)/1 = (y − 2)/2 = (z − 3)/3
meet the plane 3x + 2y + z = 20?
A typical point of the line is
Putting this into the equation for the plane, we get
This gives the point of intersection as (2, 4, 6)
135
Intersection of two planes
Example
What is the cartesian equation of the line of intersection of the two planes x + 3y + 2z = 6 and 3x + 2y + z = 11?
The direction of the line is perpendicular to both normals and so is given by
A point on the line is given by solving the two equations. For example:
Thus the equation of the line is
Or you could solve the two equations (for the planes) simultaneously.
136
Distance from a point to a line
Example
Find the distance from the point P(2, 1, 1) to the line with cartesian equation

(x − 2)/1 = (y − 1)/1 = z/2
137
3.5 Linear Combinations [AR 5.2]
We have seen that a plane through the origin in R3 can be built up by
starting with two vectors and taking all scalar multiples and sums.
In this way, we can build up lines, planes and their higher dimensionalversions.
Definition (Linear combination)
A linear combination of vectors v1, v2, . . . , vk ∈ Rn is a vector of the form

w = α1v1 + α2v2 + · · · + αkvk
where α1, α2, . . . , αk are scalars (i.e., real numbers).
138
Examples
1. w = (2, 3) is a linear combination of e1 = (1, 0) and e2 = (0, 1)
2. w = (2, 3) is a linear combination of v1 = (1, 2) and v2 = (3, 1)
3. w = (1, 2, 3) is not a linear combination of v1 = (1, 1, 1) and v2 = (0, 0, 1)
139
Linear Dependence [AR 5.3]
By taking all linear combinations of a given set of vectors, we can build up lines, planes, etc.

We can make this more efficient by removing any redundant vectors from the set.
Definition (Linear dependence)
A collection of vectors S is linearly dependent if there are vectors v1, . . . , vk ∈ S and scalars α1, . . . , αk such that
α1v1 + · · · + αkvk = 0
and at least one of the αi is non-zero.
RememberThe zero vector 0 = (0, . . . , 0) is not the same as the number zero.
140
So, the set of vectors is linearly dependent if and only if some non-trivial linear combination gives the zero vector.
Theorem (AR Thm 5.3.1)
Vectors v1, . . . , vk (where k ≥ 2) are linearly dependent iff one vector is a linear combination of the others.
Proof:
Two vectors are linearly dependent iff one is a multiple of the other.
Three vectors in R3 are linearly dependent iff they lie in a plane.
141
Definition (Linear independence)
A set of vectors is called linearly independent if it is not linearlydependent.
Examples
1. The vectors (2,−1) and (−6, 3) are linearly dependent.
2. (2,−1) and (4, 1) are linearly independent.
142
Examples
1. The vectors (1, 0, 0), (1, 1, 0) and (1, 1, 1) are linearly independent.
2. (1, 1, 2), (1,−1, 2), (3, 1, 6) are linearly dependent.
3. (1, 2, 3), (1, 0, 0), (0, 1, 0), (0, 1, 1) are linearly dependent.
143
To decide if vectors v1, . . . , vk ∈ Rn are linearly independent
1. Form the matrix A having the vectors as columns(we’ll denote it by A = [v1 · · · vk ])
2. Reduce to row-echelon form R
3. Count the number of non-zero rows in R (this is rank(A)):
v1, . . . , vk are linearly independent ⇐⇒ rank(A) = k
Why does this method work?
It follows from what we know about the number of solutions of a linear system (slide 79)
144
Examples
1. (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0) ∈ R4 are linearly independent
2. (1, 1), (2, 3), (1, 0) ∈ R2 are linearly dependent
145
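The three-step method above can be sketched as code (Python is outside the course; the helper names `rank` and `independent` are ours). The function forms the matrix with the given vectors as columns, row-reduces with exact arithmetic, and counts the non-zero rows:

```python
from fractions import Fraction

def rank(vectors):
    """Rank of the matrix whose columns are the given vectors,
    computed by row reduction to echelon form."""
    # build the matrix with the vectors as columns
    rows = [[Fraction(v[i]) for v in vectors] for i in range(len(vectors[0]))]
    r = 0  # index of the next pivot row
    for col in range(len(vectors)):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                    # no leading entry in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def independent(vectors):
    # linearly independent <=> rank equals the number of vectors
    return rank(vectors) == len(vectors)

assert independent([(1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0)])   # example 1
assert not independent([(1, 1), (2, 3), (1, 0)])                 # example 2
```

The second example also illustrates Theorem 5.3.3 below: three vectors in R2 can never be independent.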
An important observation
If A and B are row-equivalent matrices, then the columns of A satisfy the same linear relations as the columns of B. This is the same as saying that the two systems of linear equations have the same set of solutions.

This implies that relations between the (original) vectors v1, . . . , vk can be read from the reduced row-echelon form of the matrix [v1 · · · vk ].
Example
If

[v1 · · · v5] ∼ [ 1 3 0 2 0 ]
                [ 0 0 1 1 0 ]
                [ 0 0 0 0 1 ]
then v2 = 3v1 and v4 = 2v1 + v3.
146
Let’s illustrate this with an example.

Example
The vectors v1 = (1, 2, 1, 3), v2 = (2, 4, 2, 6), v3 = (0,−1, 3,−1), v4 = (1, 3,−2, 4) satisfy v2 = 2v1 and v4 = v1 − v3
147
Useful Facts:
From the method above for deciding whether vectors are linearly dependent we can derive the following.
Theorem (AR Thm 5.3.3)
If k > n, then any k vectors in Rn are linearly dependent.
Idea of proof: Because then the reduced row-echelon form of the matrix [v1 · · · vk ] must contain columns which do not have a leading entry. So these columns will be dependent on other columns which do have a leading entry.
148
Theorem
Vectors v1, . . . , vn ∈ Rn are linearly independent iff the matrix A = [v1 · · · vn] has det(A) ≠ 0.
Idea of proof:
linearly indep ⇐⇒ rank(A) = n ⇐⇒ det(A) ≠ 0
149
Example
Decide whether the following vectors in R3 are linearly dependent or independent:
(1, 2,−1), (0, 3, 4), (2, 1,−6), (0, 0, 2)
If they are dependent, write one as a linear combination of the others.
150
3.6 Subspaces of Rn [AR 5.2]
Certain subsets S of Rn have the nice property that any linear
combination of vectors from S still lies in S .
Definition (Subspace)
A subspace of Rn is a subset S of Rn that satisfies:
0. S is non-empty
1. u,w ∈ S =⇒ u + w ∈ S (closed under vector addition)
2. u ∈ S , α ∈ R =⇒ αu ∈ S (closed under scalar multiplication)
151
Examples
1. The xy-plane S = {(x, y, z) ∈ R3 | z = 0} is a subspace of R3
2. The plane in R3 through the origin and parallel to the vectors
u = (1, 1, 1) and v = (1, 2, 3) is a subspace of R3
3. The set H = {(x, y) ∈ R2 | x ≥ 0, y ≥ 0} is not a subspace of R2
4. Any line or plane in R3 that contains the origin is a subspace.
152
Another example
Show that the points on the line y = 2x form a subspace of R2
Note
Every subspace of Rn contains the zero vector.
So the line y = 2x + 1 is not a subspace of R2.
In fact, a line or plane in R3 is a subspace iff it contains the origin.
153
An important example of a subspace is given by the following.
Example
The set of solutions to the homogeneous linear system
x + y + z = 0
x − y − z = 0
is a subspace of R3.
Remember that a linear system is called homogeneous if all the constant terms are equal to zero.
154
In general, we have the following:
Proposition (AR Thm 5.2.2)
The set of solutions of a system of m homogeneous linear equations in n variables is a subspace of Rn.
Proof: We check the 3 conditions in the definition of subspace
155
Generating a subspace
Let v1, . . . , vk be vectors in Rn.
Definition (Span)
The subspace spanned (or subspace generated) by these vectors is the set of all linear combinations of the given vectors:

Span{v1, . . . , vk} = {α1v1 + · · · + αkvk | α1, . . . , αk ∈ R}

It is sometimes also denoted by 〈v1, . . . , vk〉.
Examples
1. In R2, Span{(3, 2)} is the line through the origin in the direction given by (3, 2), i.e., the line y = (2/3)x

2. In R3, Span{(1, 1, 1), (3, 2, 1)} is the plane x − 2y + z = 0

3. In R3, Span{(1, 1, 1), (3, 3, 3)} is the line x = y = z
156
Lemma (AR Thm 5.2.3)
1. Span{v1, . . . , vk} is a subspace of Rn.

2. Any subspace of Rn that contains the vectors v1, . . . , vk contains Span{v1, . . . , vk}.
Proof:
Remark
The subspace spanned by a set of vectors is the ‘smallest’ subspace that contains those vectors.
157
More examples
In R3:

1. Span{(1, 1, 0)} is the line through the origin containing the point (1, 1, 0).

2. Span{(1, 0, 0), (1, 1, 0)} is the xy-plane.

3. Span{(1, 0, 0), (−3, 7, 0), (1, 1, 0)} is the xy-plane.

4. Span{(1, 0, 0), (2, 3,−4), (1, 1, 0)} is the whole of R3.
158
Spanning sets
Definition (Spanning set)
Let V be a subspace of Rn.
Vectors v1, . . . , vk ∈ Rn span V if Span{v1, . . . , vk} = V .
Such a set of vectors is called a spanning set for V .
Equivalently, V contains all the vectors v1, . . . , vk and all vectors in V are linear combinations of the vectors v1, . . . , vk .
Examples
1. (1, 0) and (1, 1) span R2
2. (1, 1, 2) and (1, 0, 1) do not span R3
3. (1, 1, 2), (1, 0, 1) and (2, 1, 3) do not span R3
159
Example
Show that the following vectors span R4:
(1, 0,−1, 0), (1, 1, 1, 1), (3, 0, 0, 0), (4, 1,−3,−1)
We need to show that for any vector (a, b, c, d) ∈ R4, the equation

x(1, 0,−1, 0) + y(1, 1, 1, 1) + z(3, 0, 0, 0) + w(4, 1,−3,−1) = (a, b, c, d)
has a solution.
160
Writing this linear system in matrix form gives:

[ 1 1 3  4 ] [ x ]   [ a ]
[ 0 1 0  1 ] [ y ] = [ b ]
[−1 1 0 −3 ] [ z ]   [ c ]
[ 0 1 0 −1 ] [ w ]   [ d ]

where A denotes the 4 × 4 coefficient matrix. This has augmented matrix:

[ 1 1 3  4 | a ]
[ 0 1 0  1 | b ]
[−1 1 0 −3 | c ]
[ 0 1 0 −1 | d ]

We know that this linear system is consistent for all possible values of a, b, c, d if and only if rank(A) = 4 (which is the case in this example).
161
To decide if v1, . . . , vk ∈ Rn span Rn
1. Form the matrix A = [v1 · · · vk ] having columns given by the vectors v1, . . . , vk

2. Calculate rank(A) as before:

   a. Reduce to row-echelon form
   b. Count the number of non-zero rows

Then, v1, . . . , vk span Rn ⇐⇒ rank(A) = n
Why does this method work?
162
Since A = [v1 · · · vk ] has k columns, we know that rank(A) ≤ k.
It follows that:
Proposition
If k < n, then v1, . . . , vk ∈ Rn can’t span Rn.
163
3.7 Bases and dimension [AR 5.4]
In general, spanning sets can be (too) big.
For example, (1, 0), (0, 1), (2, 3), (−7, 4) span R2, but the last two are
not needed since they can be expressed as linear combinations of the first two. The vectors are not linearly independent.
Bases
Definition (Basis)
A basis for a subspace V ⊆ Rn is a set of vectors from V which
1. spans V
2. is linearly independent
164
Examples
1. {(1, 0), (0, 1)} is a basis for R2

2. {(2, 0), (−1, 1)} is a basis for R2

3. {(2,−1,−1), (1, 2,−3)} is a basis for the plane x + y + z = 0 in R3

4. {(2, 3, 7)} is a basis for the line in R3 given by x/2 = y/3 = z/7
Note
A subspace of Rn can have many bases. For example, any two vectors in R2 which are not collinear will form a basis of R2.
165
Notation/example:
The vectors
e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, 0, . . . , 0, 1)
form a basis of Rn, called the standard basis.
166
The following is an important and very useful theorem about bases:
Theorem (AR Thms 5.4.2, 5.4.3)
Let V be a subspace of Rn and let v1, . . . , vk be a basis for V .
1. If a subset of V has more than k vectors, then it is linearly dependent.

2. If a subset of V has fewer than k vectors, then it does not span V .
3. Any two bases of V have the same number of elements.
Proof: The proof is based on what we already know about (homogeneous) linear systems:
167
So, in particular, every basis of Rn has exactly n elements.
Dimension
The above theorem tells us that although there can be many different bases for the same space, they will all have the same number of vectors.
Definition (Dimension)
The dimension of a subspace V is the number of vectors in a basis for V . This is denoted dim(V ).
168
Examples
1. The dimension of R2 is
2. Rn has dimension
3. The line {(α, α, α) | α ∈ R} ⊆ R3 has dimension

4. The plane {(x, y, z) | x + y + z = 0} ⊆ R3 has dimension

In the special case where V = {0} it is convenient to say dim(V ) = 0.
169
Calculating bases I
How can we find a basis for a given subspace?
The method will depend on how the subspace is described. Often we know (or can easily calculate) a spanning set.
Example
Find a basis for the subspace

S = {(a + 2b, b,−a + b, a + 3b) | a, b ∈ R} ⊆ R4
What is the dimension of S?
170
Another example
Find a basis for the subspace of R4 spanned by the vectors (1,−1, 2, 1), (−2, 2,−4,−2), (1, 0, 3, 0).
171
To find a basis for the span of a set of vectors
To find a basis for the subspace V spanned by vectors v1, . . . , vk ∈ Rn:
1. Form the matrix A = [v1 · · · vk ]
2. Reduce to row-echelon form B
3. The columns of A corresponding to the steps (leading entries) in B give a basis for V

Why does this method work?
Recall that if A ∼ B, then the columns of A and B satisfy the same linear relationships. The question then reduces to one about matrices in reduced row-echelon form.
172
Remarks
Don’t forget that it is the columns of A (not its row-echelon form) that we use for the basis.

This method gives a basis that is a subset of the original set of vectors. Later we will give a second method which gives a basis of different, but usually simpler, vectors.

Example
Let S = {(1,−1, 2, 1), (0, 1, 1,−2), (1,−3, 0, 5)}.
Find a subset of S that is a basis for 〈S〉.
173
Column space of a matrix
Definition (Column space)
Let A be an m × n matrix. The subspace of Rm spanned by the columns of A is called the column space of A.
Suppose A ∼ B . In general, the column space of B is not equal to thecolumn space of A, although they do have the same dimension.
Example
A = [  1 0  1 ]       [ 1 0  1 ]
    [ −1 1 −3 ]   ∼   [ 0 1 −2 ] = B
    [  2 1  0 ]       [ 0 0  0 ]
The column space of A is
The column space of B is
Are they equal?
What is the dimension of each subspace?

174
Suppose that S = {v1, . . . , vk} ⊆ Rn is a set of vectors.

Remember that we saw a method for obtaining a basis for Span(S) that started with the matrix [v1 · · · vk ] having the vectors of S as columns.

This method gave us a basis of Span(S) that is a subset of S .
So, if V is a subspace of Rn:
Every spanning set for V contains a basis for V .
It also follows from the above method that:
Every linearly independent subset of V extends to a basis for V .
175
Let m = dim(V ). From the above observation, since any basis must have m elements, we conclude the following:
Theorem
Let V be a subspace of Rn, and suppose that dim(V ) = m.
1. If a spanning set of V has exactly m elements, then it is a basis.
2. If a linearly independent subset of V has exactly m elements, thenit is a basis.
Example
Given that

{(1,−π,√2), (23, 1, 100), (3/2, 7, 1)}

is a spanning set for R3, it is a basis for R3.
176
Calculating bases II
Another method for finding a basis: The ‘row method’
We first illustrate by repeating a previous example, but this time using a matrix that has the vectors as rows (not columns).
Example
Let S = {(1,−1, 2, 1), (0, 1, 1,−2), (1,−3, 0, 5)}.
Find a subset of S that is a basis for 〈S〉.

[ 1 −1 2  1 ]             [ 1 −1 2  1 ]
[ 0  1 1 −2 ]  ∼ · · · ∼  [ 0  1 1 −2 ]
[ 1 −3 0  5 ]             [ 0  0 0  0 ]

It follows that a basis for 〈S〉 is

Can you see why?

177
To find a basis for the span of a set of vectors: the ‘row method’
To find a basis for the subspace V spanned by vectors v1, . . . , vk ∈ Rn.
1. Form the matrix A whose rows are the given vectors.
2. Reduce to row-echelon form B
3. The non-zero rows of B are a basis for V .
To explain why this works, let’s introduce some more terminology.
178
Row space of a matrix
Definition
Let A be an m × n matrix. The subspace of Rn spanned by the rows of A is called the row space of A.
Suppose A ∼ B. The above method relies on the fact that (unlike the situation with the column space):
The row space of B is always equal to the row space of A.
Why? Can you prove it?
179
Example
A = [  1 0  1 ]       [ 1 0  1 ]
    [ −1 1 −3 ]   ∼   [ 0 1 −2 ] = B
    [  2 1  0 ]       [ 0 0  0 ]
The row space of A is Span
The row space of B is Span
Note
For any matrix A:

rank(A) = dim(row space of A) = dim(column space of A)

This is because the number of non-zero rows in the row-echelon form is equal to the number of leading entries.
180
Example
Let

A = [ 1 −1  2 −2 ]
    [ 2  0  1  0 ]
    [ 5 −3  7 −6 ]
    [ 1  1 −1  3 ]
Find a basis and the dimension for:
1. the column space of A;
2. the row space of A.
A = [ 1 −1  2 −2 ]             [ 1 −1  2 −2 ]
    [ 2  0  1  0 ]  ∼ · · · ∼  [ 0  2 −3  4 ]
    [ 5 −3  7 −6 ]             [ 0  0  0  1 ]
    [ 1  1 −1  3 ]             [ 0  0  0  0 ]
So...
181
Bases for solution spaces
We have seen how to find a basis for a subspace given a spanning set for the subspace.
Another way in which subspaces arise is as the set of solutions to a homogeneous linear system.
We saw previously that such a solution set is always a subspace of Rn, where n is the number of variables.
How can we find a basis for it?
We first illustrate with an example.
182
Example
Find a basis for the subspace of R4 defined by the equations
x1 + x3 + x4 = 0
3x1 + 2x2 + 5x3 + x4 = 0
x2 + x3 − x4 = 0
183
Finding a basis for the solution space:
To find a basis for the solution space of a system of homogeneous linear equations:
1. Write the system in matrix form Ax = 0
2. Reduce A to row-echelon form B
3. Solve the system Bx = 0 as usual, and write the solution in the form

   x = t1v1 + · · · + tkvk

   where t1, . . . , tk ∈ R are parameters and v1, . . . , vk ∈ Rn

Then {v1, . . . , vk} is a basis for the solution space.
184
Why does this work?
The set S = {v1, . . . , vk} is a spanning set for the solution space, since every solution can be written as a linear combination of the vectors in S.

The fact that the vectors are linearly independent results from the way in which they were defined.
Suppose we chose the parameters to be the variables whose corresponding column in B had no leading entry. Say ti = xmi.

Then the mj -th coordinate of vi is 1 if i = j and 0 if i ≠ j.
It follows that the vi are linearly independent.
185
Another example
Find a basis for the subspace of R4 given by

V = {(x1, x2, x3, x4) | x1 + 2x2 + x3 + x4 = 0, 3x1 + 6x2 + 4x3 + x4 = 0}
The matrix A of coefficients is
A = [ 1 2 1 1 ]  ∼ · · · ∼  [ 1 2 0  3 ]
    [ 3 6 4 1 ]             [ 0 0 1 −2 ]
The solutions are given by
(x1, x2, x3, x4) = t1( , , , ) + t2( , , , ) t1, t2 ∈ R
So a basis for V is:
186
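Any candidate basis for this solution space can be checked by substituting back into the defining equations. A minimal sketch: the two vectors below are one valid choice read off from the reduced form above (taking x2 = t1 and x4 = t2), and the code verifies that they, and every linear combination of them, satisfy both equations.

```python
def satisfies_system(x):
    """Check x = (x1, x2, x3, x4) against both defining equations of V."""
    x1, x2, x3, x4 = x
    return (x1 + 2 * x2 + x3 + x4 == 0 and
            3 * x1 + 6 * x2 + 4 * x3 + x4 == 0)

# one valid basis, read off from the reduced row-echelon form:
v1 = (-2, 1, 0, 0)
v2 = (-3, 0, 2, 1)
assert satisfies_system(v1) and satisfies_system(v2)

# every linear combination t1*v1 + t2*v2 is then also a solution
for t1, t2 in [(1, 1), (2, -5), (-3, 7)]:
    x = tuple(t1 * a + t2 * b for a, b in zip(v1, v2))
    assert satisfies_system(x)
```

This check confirms spanning-set membership only; linear independence of v1 and v2 is clear from their second and fourth coordinates, as the slides explain.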
3.8 Rank-nullity theorem
Given an m × n matrix A there are three associated subspaces:
1. The row space of A is the subspace of Rn spanned by the rows.
Its dimension is equal to rank(A)
2. The column space of A is the subspace of Rm spanned by the columns. Its dimension is equal to rank(A)
3. The solution space of A is the subspace of Rn given by

   { (x1, . . . , xn) ∈ Rn | A [x1 . . . xn]ᵀ = 0 }

The solution space is also called the nullspace of A.
Its dimension is called the nullity of A, nullity(A).
187
We have seen techniques to find bases for the row space, column space and solution space of a matrix.
As a result of these techniques we observe the following:
Theorem (cf. AR Thm 8.2.3)
Suppose A is an m × n matrix. Then
rank(A) + nullity(A) = n
Given what we know about finding the rank and the solution space of a matrix, this is simply the statement that every column in the row-echelon form either contains a leading entry or doesn’t contain a leading entry.
188
Note
If you are asked to find the solution space, the column space and the row space of A, you only need to find the reduced row-echelon form of A once.
Remember
If A and B are row equivalent matrices, then:
the row space of A is equal to the row space of B
the solution space of A is equal to the solution space of B
the column space of A is not necessarily equal to the column space of B

the columns of A satisfy the same linear relations as the columns of B

the dimension of the column space of A is equal to the dimension of the column space of B
189
3.9 Coordinates relative to a basis
Definition (Coordinates)
Suppose B = {v1, . . . , vn} is a basis for Rn. For v ∈ Rn write
v = α1v1 + · · · + αnvn for some scalars α1, . . . , αn
The scalars α1, . . . , αn are called the coordinates of v relative to B.
The vector (α1, . . . , αn) ∈ Rn is called the coordinate vector of v relative to B.
The column matrix
        [ α1 ]
[v]B =  [ ⋮  ]
        [ αn ]
is called the coordinate matrix of v with respect to B.
190
Examples
1. If we consider R2 with the standard basis B = {i, j}, the vector v = (1, 5) has coordinates

   [v]B = [ 1 ]
          [ 5 ]

2. Now consider R2 with basis B′ = {a, b}, where a = (2, 1) and b = (−1, 1). Then v = (1, 5) has coordinates

   [v]B′ = [ 2 ]
           [ 3 ]

[diagram: v = i + 5j = 2a + 3b]
191
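The second example can be verified directly: the coordinates (2, 3) must reconstruct v as 2a + 3b. A quick sketch (the helper name `combine` is ours):

```python
def combine(coords, basis):
    """Rebuild a vector from its coordinates relative to a basis:
    sum of coordinate * basis vector, componentwise."""
    return tuple(sum(c * v[i] for c, v in zip(coords, basis))
                 for i in range(len(basis[0])))

a, b = (2, 1), (-1, 1)
v = combine((2, 3), (a, b))   # 2a + 3b
assert v == (1, 5)
```

Finding the coordinates in the first place means solving the linear system 2α − β = 1, α + β = 5, which is the kind of problem handled in Topic 1.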
Having fixed a basis, there is a one-to-one correspondence between vectors and their coordinate matrices.
If u, v are vectors in Rn, and α is a scalar, then
[u + v]B = [u]B + [v]B
[αv]B = α[v]B
A consequence of this is that vectors are linearly independent if and only if their coordinate matrices are (using any basis).
(Exercise: Prove this!)
192
Topic 4: General Vector Spaces [AR chapt 5]
4.1 The vector space axioms
4.2 Examples of vector spaces
4.3 Complex vector spaces
4.4 Subspaces of general vector spaces
4.5 Spanning sets, linear independence and bases
4.6 Coordinate matrices
193
Vectors in Rn have some basic properties shared by many other mathematical systems. For example,
u + v = v + u for all u, v
is true for other systems.
Key Idea:
Write down these basic properties and look for other systems which share these properties. Any system that does share these properties will be called a vector space.
194
4.1 The vector space axioms [AR 5.1]
Let’s start trying to write down the basic properties that we want ‘vectors’ to satisfy.
A vector space is a set V with two operations defined:
1. addition
2. scalar multiplication
We want these two operations to satisfy the kind of algebraic properties that we are used to from vectors in Rn.
For example, we want our vector operations to satisfy
u + v = v + u and α(u + v) = αu + αv
This leads to a list of ten properties (or axioms) that we will then take as our definition.
195
The scalars are members of a number system F called a field in which we have addition, subtraction, multiplication and division.
Usually the field will be F = R or C.
In this subject we will mainly concentrate on the case F = R.
196
Definition (Vector Space)
A vector space is a non-empty set V with two operations: addition and scalar multiplication.
These operations are required to satisfy the following rules.
For any u, v,w ∈ V :
Addition behaves well:
A1 u + v ∈ V (closure of vector addition)
A2 (u + v) + w = u + (v + w) (associativity)
A3 u + v = v + u (commutativity)
There must be a zero and inverses:
A4 There exists a vector 0 ∈ V such that
v + 0 = v for all v ∈ V (existence of zero vector)
A5 For all v ∈ V , there exists a vector −v
such that v + (−v) = 0 (additive inverses)
197
Definition (Vector Space ctd)
For all u, v ∈ V and α, β ∈ F:
Scalar multiplication behaves well:
M1 αv ∈ V (closure of scalar multiplication)
M2 α(βv) = (αβ)v (associativity of scalar multiplication)
M3 1v = v (multiplication by unit scalar)
Addition and scalar multiplication combine well:
D1 α(u + v) = αu + αv (distributivity 1)
D2 (α + β)v = αv + βv (distributivity 2)
198
Remark
It follows from the axioms that for all v ∈ V :
1. 0v = 0
2. (−1)v = −v
Can you prove these?
We are going to list some systems that obey these rules.
We are not going to show that the axioms hold for these systems. If, however, you would like to get a feel for how this is done, read AR 5.1, Example 2.
199
4.2 Examples of vector spaces
1. Rn is a vector space with scalars R

After all, this was what we based our definition on! Vector spaces with R as the scalars are called real vector spaces.
200
2. Vector space of matrices
Denote by Mmn (or Mmn(R), Mm,n(R) or Mm×n(R)) the set of all m × n matrices with real entries. Mmn is a real vector space with the following familiar operations:
[a11 a12; a21 a22] + [b11 b12; b21 b22] = [a11 + b11 a12 + b12; a21 + b21 a22 + b22]

α [a11 a12; a21 a22] = [αa11 αa12; αa21 αa22]

(Here [a b; c d] denotes the matrix with rows (a, b) and (c, d).)
What is the zero vector in this vector space?
Note: Matrix multiplication is not a part of this vector space structure.
201
3. Vector space of polynomials
For a fixed n, denote by Pn (or Pn(R)) the set of all polynomials with degree at most n:

Pn = {a0 + a1x + a2x^2 + · · · + anx^n | a0, a1, . . . , an ∈ R}

If we define vector addition and scalar multiplication by:

(a0 + · · · + anx^n) + (b0 + · · · + bnx^n) = (a0 + b0) + · · · + (an + bn)x^n

α(a0 + a1x + · · · + anx^n) = (αa0) + (αa1)x + · · · + (αan)x^n
Then Pn is a vector space.
202
4. Vector space of functions
Let S be a set.
Denote by F(S , R) the set of all functions from S to R.
Given f , g ∈ F(S , R) and α ∈ R, let f + g and αf be the functions defined by the following:
(f + g)(x) = f (x) + g(x)
(αf )(x) = α × f (x)
Equipped with these operations F(S , R) is a vector space.
Remark
The ‘+’ on the left of the first equation is not the same as the ‘+’ on the right!
Why not?
203
Example
Let f , g ∈ F(R, R) be defined by

f : R → R, f (x) = sin x and g : R → R, g(x) = x^2
What do f + g and 3f mean?
204
What is the zero vector in F(R, R)?
0 : R → R is the function defined by
0(x) = 0
That is, 0 is the function that maps all numbers to zero.
205
4.3 Complex vector spaces
A vector space that has C as the scalars is called a complex vector space.
Example
C2 = {(a1, a2) | a1, a2 ∈ C}

with the operations:

(a1, a2) + (b1, b2) = (a1 + b1, a2 + b2)
α(a1, a2) = (αa1, αa2)    (where a1, a2, b1, b2, α ∈ C)

is a complex vector space.

Remark
All of the above examples of real vector spaces: Rn, Pn(R), F(S , R) have complex analogues: Cn, Pn(C), F(S , C)

206
Important observation
All the concepts we looked at for Rn (such as subspaces, linear independence, spanning sets, bases) carry over directly to general vector spaces.
Why consider general vector spaces?
207
4.4 Subspaces of general vector spaces
Definition (Subspace)
A subspace of a vector space V is a subset S ⊆ V that is itself a vector space (using the operations from V ).
This looks slightly different to the definition we had for subspaces of Rn.
The following theorem shows that, in fact, we get the same thing.
Theorem (Subspace Theorem, AR Thm 5.2.1)
Let V be a vector space.
A subset W ⊆ V is a subspace of V if and only if
0. W is non-empty
1. W is closed under vector addition
2. W is closed under scalar multiplication
208
Note
It follows that a subspace W of V must necessarily contain the zero vector 0 ∈ V

Example
Let V = M2,2, the vector space of real 2 × 2 matrices, and let H ⊆ V be the set of matrices with trace equal to 0, where the ‘trace’ is the sum of the diagonal entries. In other words

H = {[a b; c d] | a + d = 0}

Show that H is a subspace of V .
209
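The Subspace Theorem conditions can be sanity-checked numerically. The sketch below (a spot check on sample matrices, not a proof) verifies that H is closed under addition and scalar multiplication and contains the zero matrix:

```python
# Numerical sanity check (not a proof) of the Subspace Theorem conditions
# for H = trace-zero 2x2 matrices.

def trace(m):
    # trace = sum of the diagonal entries
    return m[0][0] + m[1][1]

def add(m1, m2):
    return [[m1[i][j] + m2[i][j] for j in range(2)] for i in range(2)]

def scale(a, m):
    return [[a * m[i][j] for j in range(2)] for i in range(2)]

u = [[3, 5], [1, -3]]   # trace 0, so u is in H
v = [[-2, 7], [4, 2]]   # trace 0, so v is in H

assert trace(u) == 0 and trace(v) == 0
assert trace(add(u, v)) == 0          # closed under addition
assert trace(scale(10, u)) == 0       # closed under scalar multiplication
assert trace([[0, 0], [0, 0]]) == 0   # contains the zero vector
```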
Another example
Let

V = P2 = {a0 + a1x + a2x^2 | a0, a1, a2 ∈ R}

and

W = {a0 + a1x + a2x^2 | a1a2 ≥ 0} ⊆ V
Is W a subspace of V ?
210
More examples
1. {0} is always a subspace of V

2. V is always a subspace of V

3. The set of diagonal matrices

{[a 0 0; 0 b 0; 0 0 c] | a, b, c ∈ R}

is a subspace of M3,3

4. The subset of continuous functions {f : R → R | f is continuous} is a subspace of F(R, R)

5. S = {2 × 2 matrices with determinant equal to 0} = {[a b; c d] | ad − bc = 0} is not a subspace of M2,2

6. {f : [0, 1] → R | f (0) = 2} is not a subspace of F([0, 1], R)
211
4.5 Spanning sets, linear independence and bases
These concepts, which we have seen for subspaces of Rn, apply equally well in a general vector space.
Let V be a vector space with scalars F and S ⊆ V a subset.
Definition
A linear combination of vectors v1, v2, . . . , vk ∈ S is a sum
α1v1 + · · · + αkvk
where each αi is a scalar.
Definition
The set S is linearly dependent if there are vectors v1, . . . , vk ∈ S and scalars α1, . . . , αk, at least one of which is non-zero, such that

α1v1 + · · · + αkvk = 0

A set which is not linearly dependent is called linearly independent.

212
Definition
The span of the set S is the set of all linear combinations of vectors from S

Span(S) = {α1v1 + · · · + αkvk | v1, . . . , vk ∈ S and α1, . . . , αk ∈ F}

S is called a spanning set for V if Span(S) = V

This is the same as saying that S ⊆ V and every vector in V can be written as a linear combination of vectors from S .
Definition
A basis for V is a set which is both linearly independent and a spanning set for V .
213
Example
Are the following elements of M2,2 linearly independent?

[1 3; 0 1], [−2 1; 0 −1], [1 3; 0 4]
214
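One way to check such questions numerically is to flatten each matrix into a vector in R4 and compute the rank of the resulting set of vectors; rank 3 means the three matrices are linearly independent. A sketch using numpy (a numerical check, not a substitute for the hand calculation):

```python
import numpy as np

# Flatten each 2x2 matrix (row by row) into a vector in R^4.
# The matrices are linearly independent iff the rank of the
# stacked vectors equals the number of matrices.
mats = [
    np.array([[1, 3], [0, 1]]),
    np.array([[-2, 1], [0, -1]]),
    np.array([[1, 3], [0, 4]]),
]
stacked = np.array([m.flatten() for m in mats])
rank = np.linalg.matrix_rank(stacked)
assert rank == 3   # rank 3: the three matrices are linearly independent
```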
Another example
In P2 the element p(x) = 2 + 2x + 5x^2 is a linear combination of p1(x) = 1 + x + x^2 and p2(x) = x^2, but q(x) = 1 + 2x + 3x^2 is not.

So {p1, p2} is not a spanning set for P2, though it is linearly independent.
215
As with Rn we have the following important results:
Theorem
Let V be a vector space.
1. Every spanning set for V contains a basis for V
2. Every linearly independent set in V can be extended to a basis of V
3. Any two bases of V have the same cardinality (i.e., ‘same number of elements’)

The basic idea behind the proof is exactly as we saw with Rn. But there are some very interesting technical differences! These mostly concern the possibility that a basis might have infinitely many elements.
216
Definition
The dimension of V , denoted dim(V ), is the number of elements in a basis of V . We call V finite dimensional if it admits a finite basis, and infinite dimensional otherwise.
Examples
1. {1, x , x^2, . . . , x^n} is a basis for Pn. So dim(Pn(R)) = n + 1

2. {[1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1]} is a basis for M2×2,

so dim(M2×2) = 4.
3. {[1 0; 0 −1], [0 1; 0 0], [0 0; 1 0]} is a basis for the vector space of 2 × 2 matrices with trace equal to zero.

This is therefore a 3 dimensional subspace of M2×2.
217
An infinite dimensional example
Let P be the set of all polynomials:

P = {a0 + a1x + · · · + an−1x^(n−1) + anx^n | n ∈ N, a0, a1, . . . , an ∈ R}

Then P is a vector space and the set B = {1, x , x^2, . . .} is a basis for P.

So P is an infinite dimensional vector space.

Can you see why B is a basis?
218
In the case of a finite dimensional vector space, we have the following:
Theorem
Suppose V has dimension n, and S is a subset of V .
1. If |S | < n, then S does not span V
2. If |S | > n, then S is linearly dependent
The proof is exactly the same as in the case of Rn.
219
Examples
1. The polynomials
2 + x + x^2, 1 + x , −1 − 7x^2, x − x^2
are linearly dependent, since dim(P2) = 3.
2. The matrices

[2 1; 3 4], [−1 1; 0 1], [6 7; 4 5]

do not span M2×2, since dim(M2×2) = 4
220
Standard Bases
It is useful to fix names for certain bases.
The standard basis for Rn is

{(1, 0, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, 0, . . . , 0, 1)}
The dimension of Rn is n.
The standard basis for Mm,n consists of the m × n matrices with a single entry 1 and all other entries 0:

[1 0 · · · 0; 0 0 · · · 0; · · · ; 0 0 · · · 0], [0 1 · · · 0; 0 0 · · · 0; · · · ; 0 0 · · · 0], . . . , [0 0 · · · 0; 0 0 · · · 0; · · · ; 0 0 · · · 1]

The dimension of Mm,n is m × n.
221
The standard basis for Pn is
{1, x , x^2, . . . , x^n}
The dimension of Pn is n + 1.
222
4.6 Coordinate matrices
The notion of coordinates relative to a basis carries over from the case of vectors in Rn.
Definition
Suppose that B = {v1, . . . , vn} is a basis for a vector space V . For any v ∈ V we have
v = α1v1 + · · · + αnvn for some scalars α1, . . . , αn
The scalars α1, . . . , αn are called the coordinates of v relative to B.
The coordinate vector of v relative to B and the coordinate matrix of v with respect to B are also defined as in the case that V = Rn.
223
Examples
1. In P2 with basis B = 1, x , x2 the polynomial
p = 2 + 7x − 9x2 has coordinates [p]B =
27−9
2. M2×2 with basis B = [1 00 0
]
,
[0 10 0
]
,
[0 01 0
]
,
[0 00 1
]
The matrix A =
[1 23 4
]
has coordinates [A]B =
1234
224
Topic 5: Linear Transformations [AR 4.2, 8.1]
5.1 Linear transformations from R2 to R2

5.2 Linear transformations from Rn to Rm
5.3 Matrix representations in general
5.4 Image, kernel, rank and nullity
5.5 Change of basis
225
We now turn to thinking about maps from one vector space to another.
A key feature of a vector space V is that given a basis B = {v1, . . . , vn}, each element v ∈ V can be uniquely written as a linear combination of the vectors in the basis:

v = α1v1 + α2v2 + · · · + αnvn
In keeping with this, we will undertake a study of mappings T : V → W between vector spaces which have the property that

T (v) = α1T (v1) + α2T (v2) + · · · + αnT (vn)
226
Definition (Linear transformation)
Let V and W be vector spaces (over the same field of scalars).
A linear transformation from V to W is a map T : V → W such that for each u, v ∈ V and for each scalar α:
1. T (u + v) = T (u) + T (v) (T preserves addition)
2. T (αu) = αT (u) (T preserves scalar multiplication)
Loosely speaking, linear transformations are those maps between vector spaces that ‘preserve the vector space structure.’
227
5.1 Linear transformations from R2 to R2
We will start by looking at some geometric transformations of R2.
A vector in R2 is an ordered pair (x , y), with x , y ∈ R.
To describe the effect of a transformation we will use coordinate matrices. With respect to the standard basis B = {e1, e2}, the vector (x , y) has coordinate matrix [x; y].
Example
Reflection across the y-axis:

T ([x; y]) = [−x; y]

(The point (x , y) is sent to (−x , y).)
228
A common feature of all linear transformations is that they can be represented by a matrix. In the example above

T ([x; y]) = [−1 0; 0 1] [x; y] = [−x; y]

The matrix

AT = [−1 0; 0 1]

is called the standard matrix representation of the transformation T
229
Examples of (geometric) linear transformations from R2 to R2

1. Reflection across the x-axis has matrix [ ]

2. Reflection in the line y = 5x has matrix [−12/13 5/13; 5/13 12/13]

3. Rotation around the origin by an angle of π/2 has matrix [0 −1; 1 0]

4. Rotation around the origin by an angle of θ has matrix [ ]

We need to work out the coordinates of the point Q obtained by rotating P.

[Diagram: P rotated by angle θ to Q]
230
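The general rotation matrix has entries built from cos θ and sin θ; the π/2 matrix in example 3 is a special case of it. A sketch checking this consistency (the general formula is the standard rotation matrix, stated here rather than derived):

```python
import numpy as np

def rotation(theta):
    # Standard matrix for rotation about the origin by angle theta
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# At theta = pi/2 this reproduces the matrix [0 -1; 1 0] from example 3.
R = rotation(np.pi / 2)
assert np.allclose(R, [[0, -1], [1, 0]])

# Rotating e1 = (1, 0) by pi/2 gives e2 = (0, 1).
assert np.allclose(R @ [1, 0], [0, 1])
```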
Examples continued
5. Compression/expansion along the x-axis has matrix [ ]

6. Shear along the x-axis has matrix [1 c; 0 1]

These are best thought of as mappings on a rectangle.

For example, a shear along the x-axis corresponds to the mapping
231
Successive Transformations

Example
Find the image of (x , y) after a shear along the x-axis with c = 1, followed by a compression along the y-axis with c = 1/2.

Solution:
Let R : R2 → R2 be the compression and denote its standard matrix representation by AR . Similarly let S : R2 → R2 be the shear and denote its standard matrix representation by AS . Then the coordinate matrix of R(S(x , y)) is given by

AR AS [x; y]

It remains to recall AR and AS , and to compute the matrix products.
232
Note
1. The matrix for the linear transformation S followed by the linear transformation R is the matrix product AR AS . (In other words ARS = AR AS)

2. Notice that (reading right to left) the two matrices are in the opposite order to the order in which the transformations are applied.

3. The composition of two linear transformations T (v) = R(S(v)) is also written T (v) = (R ◦ S)(v)
233
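The order-of-composition point can be checked numerically. In the sketch below the shear matrix comes from example 6 with c = 1; the compression along the y-axis is taken to be [1 0; 0 1/2] (an assumption, since the slides leave that matrix to be recalled):

```python
import numpy as np

A_S = np.array([[1, 1], [0, 1]])      # shear along the x-axis with c = 1
A_R = np.array([[1, 0], [0, 0.5]])    # assumed: compression along y-axis, c = 1/2

# S is applied first, then R, so the combined matrix is A_R @ A_S.
A_RS = A_R @ A_S
assert np.allclose(A_RS, [[1, 1], [0, 0.5]])

# The order matters: A_S @ A_R is a different transformation.
assert not np.allclose(A_R @ A_S, A_S @ A_R)
```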
5.2 Linear transformations from Rn to Rm
An example of a linear transformation from R3 to R2 is

T (x1, x2, x3) = (x2 − 2x3, 3x1 + x3)

To prove that this is a linear transformation, we must show that for any u, v ∈ R3 and α ∈ R we have that

T (u + v) = T (u) + T (v) and T (αu) = αT (u)
234
Proof Let u = (u1, u2, u3) and v = (v1, v2, v3). First we note that
u + v = (u1 + v1, u2 + v2, u3 + v3)
Applying T to this gives
T (u + v) = ((u2 + v2) − 2(u3 + v3), 3(u1 + v1) + (u3 + v3))
Re-arranging the right-hand side gives
((u2 + v2) − 2(u3 + v3), 3(u1 + v1) + (u3 + v3))
= (u2 − 2u3 , 3u1 + u3) + ( v2 − 2v3 , 3v1 + v3)
= T (u) + T (v)
For the second part,
T (αu) = T ((αu1, αu2, αu3)) = (αu2 − 2αu3, 3αu1 + αu3)
= α(u2 − 2u3, 3u1 + u3) = αT (u)
235
In fact this example is typical of all linear transformations from Rn to Rm — the mapping rule gives a vector whose components are linear combinations of the components of the input vector.

The simplest way to see this is to prove the following theorem.

Theorem

All linear transformations T : Rn → Rm have a standard matrix representation AT specified by

AT = [ [T (e1)] [T (e2)] · · · [T (en)] ]

where each [T (ei )] denotes the coordinate matrix of T (ei )

Note

The matrix AT has size m × n.

Alternative notations for AT include: [T ] or [T ]S or [T ]S,S
236
Proof
Let v ∈ Rn and write

v = α1e1 + α2e2 + · · · + αnen

The coordinate matrix of v with respect to the standard basis is then

[v] = [α1; α2; . . . ; αn]

We seek an m × n matrix AT such that

[T (v)] = AT [v] (independently of v) (†)

In words, this equation says that AT times the column matrix [v] is equal to the coordinate matrix of T (v). Since T is linear, we have that

T (v) = α1T (e1) + α2T (e2) + · · · + αnT (en)
237
We read off from this that

[T (v)] = α1[T (e1)] + α2[T (e2)] + · · · + αn[T (en)]   (property of coordinate matrices)
= [ [T (e1)] [T (e2)] · · · [T (en)] ] [α1; . . . ; αn]   (just matrix multiplication)

Comparing with equation (†) above, we see that we can take

AT = [ [T (e1)] [T (e2)] · · · [T (en)] ]

Summarizing: Given a linear transformation T from Rn to Rm there is a corresponding m × n matrix AT satisfying

[T (v)] = AT [v]

That is, the image of any vector v ∈ Rn can be calculated using the matrix AT .
238
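The column-by-column recipe for AT can be illustrated with the earlier example T (x1, x2, x3) = (x2 − 2x3, 3x1 + x3). A sketch building AT from the images of the standard basis vectors:

```python
import numpy as np

def T(x):
    # T(x1, x2, x3) = (x2 - 2*x3, 3*x1 + x3)
    return np.array([x[1] - 2 * x[2], 3 * x[0] + x[2]])

# The columns of A_T are the images of the standard basis vectors.
A_T = np.column_stack([T(e) for e in np.eye(3)])
assert np.allclose(A_T, [[0, 1, -2], [3, 0, 1]])

# [T(v)] = A_T [v] for any v.
v = np.array([1.0, 2.0, 3.0])
assert np.allclose(A_T @ v, T(v))
```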
Note
1. All linear transformations map the zero vector to the zero vector.

2. Linear transformations map lines through the origin to other lines through the origin (or to just the origin).

3. The above theory generalizes to linear transformations between any n-dimensional vector space V and m-dimensional vector space W .
239
Examples

1. Define T : R3 → R4 by T (x1, x2, x3) = (x1, x3, x2, x1 + x3). Calculate AT .

2. Give a reason why the mapping T : R2 → R2 specified by T (x1, x2) = (x1 − x2, x1 + 1) is not a linear transformation.

3. Find AT for the linear transformation T : P2 → P1 given by T (a0 + a1x + a2x^2) = (a0 + a2) + a0x
240
5.3 Matrix representations in general [AR 8.4]
We’ve talked about the standard matrix representation. This is when the vectors are expressed as coordinates with respect to the standard basis for the spaces involved.

We can also give a matrix representation when the vectors are expressed using some other basis.

Let U and V be finite dimensional vector spaces. Suppose that

T : U → V is a linear transformation,

B = {b1, . . . , bn} is a basis for U and

C = {c1, . . . , cm} is a basis for V .
241
We want a matrix that can be used to calculate the effect of T . Specifically, if we denote the matrix by AC,B , we want that

[T (u)]C = AC,B [u]B for all u ∈ U. (∗)

Note
For this matrix equation to make sense the size of AC,B must be m × n.

Theorem

There exists a unique matrix satisfying the above condition (∗). It is given by

AC,B = [ [T (b1)]C [T (b2)]C · · · [T (bn)]C ]

The proof is the same as for the case of the formula for the standard matrix AT .
242
A note on notation
The matrix AC,B is also denoted by [T ]C,B . In the special case in which U = V and B = C, we often write [T ]B in place of [T ]B,B .
Example

A linear transformation T : R3 → R2 has matrix [5 1 0; 1 5 −2] with respect to the standard bases of R3 and R2. What is its matrix with respect to the basis B = {(1, 1, 0), (1,−1, 0), (1,−1,−2)} of R3 and the basis C = {(1, 1), (1,−1)} of R2?

Solution
We apply T to the elements of B to get:

Then we write the result in terms of C:

We obtain the matrix AC,B =
243
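The computation in this example can be sketched numerically: apply the standard matrix to each vector in B, then solve for coordinates with respect to C. The matrix in the final assertion is the result of that calculation (computed as a cross-check, not taken from the slides):

```python
import numpy as np

M = np.array([[5, 1, 0], [1, 5, -2]])           # standard matrix of T
B = [np.array(b) for b in [(1, 1, 0), (1, -1, 0), (1, -1, -2)]]
C = np.column_stack([(1, 1), (1, -1)])          # columns are the C basis vectors

# Column j of A_{C,B} solves C x = T(b_j), i.e. the C-coordinates of T(b_j).
A_CB = np.column_stack([np.linalg.solve(C, M @ b) for b in B])

# Cross-check against the hand calculation: T(b1) = (6,6) = 6c1,
# T(b2) = (4,-4) = 4c2, T(b3) = (4,0) = 2c1 + 2c2.
assert np.allclose(A_CB, [[6, 0, 2], [0, 4, 2]])
```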
Example
Consider the linear transformation T : V → V where V is the vector space of real 2 × 2 matrices and T is defined by

T (Q) = Q^T = the transpose of Q.

Find the matrix representation of T with respect to the bases:

B = {[1 1; 0 0], [1 0; 0 1], [1 0; 1 0], [0 1; 1 0]}

and

C = {[1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1]}
244
Solution
The task is to work out the coordinates with respect to C of the image of each element in B.
245
5.4 Image, kernel, rank and nullity [AR 8.2]
Let T : U → V be a linear transformation.
Definition (Kernel and Image)

The kernel of T is defined to be

ker(T ) = {u ∈ U | T (u) = 0}

The image of T is defined to be

Im(T ) = {v ∈ V | v = T (u) for some u ∈ U}

The kernel is also called the nullspace. It is a subspace of U. Its dimension is denoted nullity(T ).

The image is also called the range. It is a subspace of V . Its dimension is denoted rank(T ).
246
Example
Consider the linear transformation T : R3 → R2 specified by

T (x , y , z) = (x − y + z , 2x − 2y + 2z)
Find bases for ker(T ) and Im(T ).
247
Definition

A linear transformation T : U → V is called injective if ker(T ) = {0}. It is called surjective if Im(T ) = V .

Example
Is the linear transformation of the preceding example injective?

Is it surjective?
248
Note
When we calculate ker(T ) we find that we are solving equations of the form AT X = 0. So ker(T ) is the same as the solution space of AT .

To calculate Im(T ) we can use the fact that, because B spans U, T (B) must span T (U), the image of T . But the elements of T (B) are given by the columns of AT . Thus the image of T can be identified with the column space of AT . It follows that rank(T ) = rank(AT ).

We can now use the result of slide 188 to see that for a linear transformation T : U → V with dim(U) = n

nullity(T ) + rank(T ) = n
249
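For the example T (x , y , z) = (x − y + z , 2x − 2y + 2z) the rank-nullity identity can be checked numerically (a sketch; the kernel basis itself is still best found by hand via row reduction):

```python
import numpy as np

A_T = np.array([[1, -1, 1], [2, -2, 2]])   # standard matrix of T

rank = np.linalg.matrix_rank(A_T)          # dim Im(T): second row is twice the first
nullity = A_T.shape[1] - rank              # rank-nullity: nullity = n - rank

assert rank == 1
assert nullity == 2
assert rank + nullity == 3                 # n = dim(U) = 3
```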
5.5 Change of basis
Transition Matrices
We have seen that a matrix representation of a linear transformation T : U → V depends on both of the choices of bases for U and V .

In fact different matrix representations are related by a matrix which depends on the bases but not on T . To understand this, we undertake a study of converting coordinates with respect to one basis to coordinates using another basis.

Let B = {b1, . . . , bn} and C = {c1, . . . , cn} be bases for the same vector space V and let v ∈ V .
How are [v]B and [v]C related?
By multiplication by a matrix!
250
Theorem
There exists a unique matrix P such that for any vector v ∈ V ,
[v]C = P [v]B
The matrix P is given by
P = [ [b1]C · · · [bn]C ]

and is called the transition matrix from B to C.

In words, the columns of P are the coordinate matrices, with respect to C, of the elements of B.
We will sometimes denote this transition matrix by PC,B
It is also sometimes denoted by PB→C
251
Proof:
We want to find a matrix P such that for all vectors v in V ,

[v]C = P [v]B

Recall that if T : V → V is any linear transformation, then

[T (v)]C = [T ]C,B [v]B (∗)

where

[T ]C,B = [ [T (b1)]C [T (b2)]C · · · [T (bn)]C ]

is the matrix representation of T .
252
Applying this to the special case where T (v) = v for all v (i.e., T is the identity linear transformation) gives

[T ]C,B = [ [b1]C [b2]C · · · [bn]C ]

and (∗) becomes

[v]C = [T ]C,B [v]B

So we can take P = [T ]C,B

Exercise
Finish the proof by showing that P is unique. That is, if Q is a matrix satisfying [v]C = Q[v]B for all v, then Q = P .
253
A simple case
The transition matrix is easy to calculate when one of B or C is the standard basis.

Example
In R2, write down the transition matrix from B to S, where B = {(1, 1), (1,−1)} and S = {(1, 0), (0, 1)}.

Use this to compute [v]S , given that [v]B = [1; 1].

Solution

PS,B = [ [b1]S [b2]S ] =

[v]S = PS,B [v]B =
254
Going in the other direction
A useful fact is that transition matrices are always invertible. (Why is this true?)

Starting with the equation

[v]C = PC,B [v]B

and rearranging gives

[v]B = (PC,B)^−1 [v]C

But we know that

[v]B = PB,C [v]C

and so, by the uniqueness part of the above theorem, it must be the case that

PB,C = (PC,B)^−1
255
Example
For B and S as in the previous example, compute PB,S , the transition matrix from S to B.

Use it to compute [v]B , given [v]S = [2; 0].

Solution
We saw that in this case

PS,B = [1 1; 1 −1]

It follows that

PB,S =

[v]B =
256
Calculating a general transition matrix
Keeping notation as before, we have

[v]S = PS,B [v]B and [v]C = PC,S [v]S

Combining these,

[v]C = PC,S [v]S = PC,S PS,B [v]B

Using the uniqueness of the transition matrix, we get

PC,B = PC,S PS,B = (PS,C)^−1 PS,B

Since it is usually easy to calculate a transition matrix whose first basis is the standard basis, this makes it straightforward to calculate any transition matrix.
257
Example
With U = V = R2 and B = {(1, 2), (1, 1)} and C = {(−3, 4), (1,−1)}, find PC,B .

PS,B = PS,C =

So

PC,S = and PC,B =
258
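The recipe PC,B = (PS,C)^−1 PS,B is easy to carry out numerically. A sketch for the bases in this example, with a sanity check that the first column really does give b1 as a combination of the C vectors (the resulting matrix is a computed value to compare against the hand calculation):

```python
import numpy as np

P_SB = np.column_stack([(1, 2), (1, 1)])     # B vectors as columns (coords w.r.t. S)
P_SC = np.column_stack([(-3, 4), (1, -1)])   # C vectors as columns

# P_{C,B} = (P_{S,C})^{-1} P_{S,B}; solve avoids forming the inverse explicitly.
P_CB = np.linalg.solve(P_SC, P_SB)

# Sanity check: b1 = (1, 2) should equal the C-linear combination
# given by the first column of P_CB.
assert np.allclose(P_SC @ P_CB[:, 0], (1, 2))
```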
Relationship Between Different Matrix Representations
Example

Calculate the standard matrix representation of T : R2 → R2 where

T (x , y) = (3x − y ,−x + 3y)
Solution
259
Example continued
Now find the matrix of T with respect to the basis B = {(1, 1), (1,−1)}
Solution
Notice that [T ]B is diagonal. This makes it very convenient to use the basis B in order to understand the effect of T .
260
How are [T ]S and [T ]B related?
Theorem
The matrix representations of T : V → V with respect to two bases C and B are related by the following equation:

[T ]B = PB,C [T ]C PC,B

Proof:
We need to show that for all v ∈ V

[T (v)]B = PB,C [T ]C PC,B [v]B

Starting with the right-hand side we obtain:

PB,C [T ]C PC,B [v]B = PB,C [T ]C [v]C (property of PC,B)
= PB,C [T (v)]C (property of [T ]C)
= [T (v)]B (property of PB,C)
261
Example
For the above linear transformation T : R2 → R2 given by T (x , y) = (3x − y ,−x + 3y) we saw that

[T ]C = [T ]B =

where C = {(1, 0), (0, 1)} is the standard basis and B = {(1, 1), (1,−1)}

Since C is the standard basis, it is easy to write down PC,B =

From which we calculate PB,C =
Calculation verifies that in this case we do indeed have:
[T ]B = PB,C[T ]CPC,B
262
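For this particular T the whole change-of-basis computation can be sketched numerically. The standard matrix [3 −1; −1 3] follows directly from T (x , y) = (3x − y ,−x + 3y); the diagonal result below is computed from the stated formula:

```python
import numpy as np

T_C = np.array([[3, -1], [-1, 3]])           # [T]_C w.r.t. the standard basis
P_CB = np.column_stack([(1, 1), (1, -1)])    # B vectors as columns

# [T]_B = P_{B,C} [T]_C P_{C,B}, with P_{B,C} = (P_{C,B})^{-1}
T_B = np.linalg.inv(P_CB) @ T_C @ P_CB

# T(1,1) = (2,2) = 2*(1,1) and T(1,-1) = (4,-4) = 4*(1,-1),
# so [T]_B is diagonal with entries 2 and 4.
assert np.allclose(T_B, [[2, 0], [0, 4]])
```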
Topic 6: Inner Product Spaces [AR Ch 6]
6.1 Definition of inner products
6.2 Geometry from inner products
6.3 Cauchy-Schwarz inequality
6.4 Orthogonality and projections
6.5 Gram-Schmidt orthogonalization procedure
6.6 Application: curve fitting
263
6.1 Definition of inner products
The Euclidean length of a vector v ∈ Rn is defined as

‖v‖ = √(v1^2 + v2^2 + · · · + vn^2) = √(v · v)

Similarly, the distance between vectors u and v is given by ‖u − v‖.

The angle θ between two vectors u and v was defined by

cos θ = (v · u)/(‖v‖‖u‖) with 0 ≤ θ ≤ π

The projection of v onto u was given in terms of the dot product by

proj_u v = ((u · v)/‖u‖^2) u
264
We want to address two issues:

1. How to generalise the notion of a dot product for vector spaces other than Rn.

2. To relate ideas associated with the dot product and its generalisations to bases and linear equations.

The first can be done by looking carefully at some of the key properties and generalising them.
265
Definition (Inner Product)
Let V be a vector space over the real numbers. An inner product on V is a function that associates with every pair of vectors u, v ∈ V a real number, denoted 〈u, v〉, satisfying the following properties.

For all u, v, w ∈ V and α ∈ R:

1. 〈u, v〉 = 〈v, u〉
2. α〈u, v〉 = 〈αu, v〉
3. 〈u, v + w〉 = 〈u, v〉 + 〈u, w〉
4. a. 〈u, u〉 ≥ 0
   b. 〈u, u〉 = 0 ⇒ u = 0

A vector space V together with an inner product is called an inner product space.

Note
A vector space can admit many different inner products. We’ll see some examples shortly.
266
In words these axioms say that we require the inner product to:
1. be symmetric;
2. be linear with respect to scalar multiplication;
3. be linear with respect to addition;
4. a. have positive squared lengths;
   b. be such that only the zero vector has length 0.

With V = Rn, we of course have that 〈u, v〉 = u · v defines an inner product (the Euclidean inner product).

But this is not the only inner product on the vector space Rn.
267
Example
Show that, in R2, if u = (u1, u2) and v = (v1, v2) then

〈u, v〉 = u1v1 + 2u2v2
defines an inner product.
268
More examples
1. Show that in R3, 〈u, v〉 = u1v1 − u2v2 + u3v3 does not define an inner product by showing that axiom 4a does not hold.

2. Show that in R2,

〈u, v〉 = [u]^T [2 −1; −1 1] [v] = [u1 u2] [2 −1; −1 1] [v1; v2]

defines an inner product.
269
Another example
The last example raises the question of what we need to know about a 2 × 2 matrix A = [a b; c d] to ensure that 〈u, v〉 = [u]^T A [v] defines an inner product.

The check that this satisfies axioms 2 and 3 follows exactly the same lines as the previous example.

In order to satisfy axiom 1 we will need to make b = c .

The hard work occurs in satisfying axiom(s) 4. A quick check with the standard basis vectors shows that we must have a > 0 and d > 0.

A harder calculation, which involves completing a square, shows that axiom(s) 4 will be satisfied exactly when det(A) > 0 (and a > 0).

So we can tell exactly when a 2 × 2 matrix A defines an inner product in this way.
270
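The criterion "a > 0 and det(A) > 0" for a symmetric 2 × 2 matrix can be probed numerically. The sketch below checks it for the matrix [2 −1; −1 1] from the previous example and spot-checks positivity of 〈u, u〉 on random vectors (random sampling is evidence, of course, not a proof):

```python
import numpy as np

def defines_inner_product(A):
    # A symmetric 2x2 matrix A gives an inner product <u, v> = u^T A v
    # exactly when b = c, a > 0 and det(A) > 0 (the slide criterion).
    a, b, c = A[0][0], A[0][1], A[1][0]
    return b == c and a > 0 and np.linalg.det(A) > 0

A = np.array([[2.0, -1.0], [-1.0, 1.0]])
assert defines_inner_product(A)

# Spot check: <u, u> = u^T A u > 0 for random nonzero vectors.
rng = np.random.default_rng(0)
for _ in range(100):
    u = rng.normal(size=2)
    assert u @ A @ u > 0
```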
Another example
In the case of the vector space Pn of polynomials of degree less than or equal to n, a possible choice of inner product is

〈p, q〉 = ∫_0^1 p(x)q(x) dx .

To verify this, we would need to check, for any polynomials p(x), q(x), r(x) and any scalar α that

1. ∫_0^1 p(x)q(x) dx = ∫_0^1 q(x)p(x) dx

2. α ∫_0^1 p(x)q(x) dx = ∫_0^1 (αp(x))q(x) dx

3. ∫_0^1 p(x)(q(x) + r(x)) dx = ∫_0^1 p(x)q(x) dx + ∫_0^1 p(x)r(x) dx

4. ∫_0^1 p(x)^2 dx ≥ 0

5. ∫_0^1 p(x)^2 dx = 0 only when p(x) is the zero polynomial
271
6.2 Geometry from inner products
In a general vector space, how can we measure angles and find lengths? If we fix an inner product on the vector space, we can then define length and angle using the same equations that we saw above for Rn. We simply replace the dot product by the chosen inner product.

Definition (Length, Distance, Angle)

For a real vector space with an inner product 〈· , ·〉 we define:

the length (or norm) of a vector v by ‖v‖ = √〈v, v〉

the distance between two vectors v and u by d(v, u) = ‖v − u‖

and the angle θ between v and u using

cos θ = 〈v, u〉/(‖v‖‖u‖) with 0 ≤ θ ≤ π
272
Note
In order for this definition of angle to make sense we need

−1 ≤ 〈v, u〉/(‖v‖‖u‖) ≤ 1

We will see shortly that this is always the case (the Cauchy-Schwarz inequality)
Definition (Orthogonal vectors)
Two vectors v and u are said to be orthogonal if 〈v,u〉 = 0
273
Example
〈(u1, u2), (v1, v2)〉 = u1v1 + 2u2v2 defines an inner product on R2
If u = (3, 1) and v = (−2, 3), then

d(u, v) = ‖u − v‖ = ‖(5,−2)‖ = √〈(5,−2), (5,−2)〉 = √(25 + 8) = √33

and

〈u, v〉 = 〈(3, 1), (−2, 3)〉 = 3 × (−2) + 2 × 1 × 3 = 0

so u and v are orthogonal (using this inner product)
274
Example (an inner product for functions)
The set of all continuous functions

C [a, b] = {f : [a, b] → R | f is continuous}

is a vector space.

It’s a subspace of F([a, b], R), the vector space of all functions.

For f , g ∈ C [a, b] define

〈f , g〉 = ∫_a^b f (x)g(x) dx

This defines an inner product.
275
Example

Consider C [0, 2π] with the inner product 〈f , g〉 = ∫_0^{2π} f (x)g(x) dx

The norms of the functions s(x) = sin(x) and c(x) = cos(x) are:

‖s‖^2 = 〈s, s〉 = ∫_0^{2π} sin^2(x) dx = ∫_0^{2π} (1/2)(1 − cos(2x)) dx
= [x/2 − (1/4) sin(2x)] evaluated from 0 to 2π = π

So ‖s‖ = √π and (similarly) ‖c‖ = √π

〈s, c〉 = ∫_0^{2π} sin(x) cos(x) dx = ∫_0^{2π} (1/2) sin(2x) dx = [−(1/4) cos(2x)] evaluated from 0 to 2π = 0

So sin(x) and cos(x) are orthogonal

This is used in the study of periodic functions (using ‘Fourier series’) in (for example) signal analysis, speech recognition, music recording etc.
276
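These two integrals can also be approximated numerically as a sanity check. The sketch below uses a simple midpoint Riemann sum (for smooth periodic integrands over a full period this converges very quickly):

```python
import math

def inner(f, g, a=0.0, b=2 * math.pi, n=10_000):
    # Midpoint-rule approximation of the integral of f(x) g(x) over [a, b].
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n)) * h

norm_s_sq = inner(math.sin, math.sin)   # should be close to pi
s_dot_c = inner(math.sin, math.cos)     # should be close to 0

assert abs(norm_s_sq - math.pi) < 1e-6
assert abs(s_dot_c) < 1e-6
```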
6.3 Cauchy-Schwarz inequality
Theorem
Let V be a real inner product space. Then for all u, v ∈ V

|〈u, v〉| ≤ ‖u‖‖v‖

Proof:
The same as the proof we saw for the case V = Rn on slide 112.

Application
The definition of angle is okay! We defined the angle between two vectors using

cos θ = 〈u, v〉/(‖u‖‖v‖)

From the Cauchy-Schwarz inequality we know that

−1 ≤ 〈u, v〉/(‖u‖‖v‖) ≤ 1
277
Example
For two continuous functions f , g : [a, b] → R, it follows directly from the Cauchy-Schwarz inequality that

(∫_a^b f (x)g(x) dx)^2 ≤ (∫_a^b f^2(x) dx)(∫_a^b g^2(x) dx)

And we don’t need to do any (more) calculus to prove it!

Example

Set f (x) = √x and g(x) = 1/√x ; also a = 1, b = t > 1.

We obtain

(∫_1^t 1 dx)^2 ≤ (∫_1^t x dx)(∫_1^t (1/x) dx)

which becomes

(t − 1)^2 ≤ ((t^2 − 1)/2) log t or log t ≥ 2(t − 1)/(t + 1)
278
6.4 Orthogonality and projections [AR 6.2]
Orthogonal sets
Recall that u and v are orthogonal if 〈u, v〉 = 0

Definition (Orthogonal set of vectors)

A set of vectors {v1, . . . , vk} is called orthogonal if 〈vi , vj〉 = 0 whenever i ≠ j .

Examples

1. {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is orthogonal in R3 with the dot product.

2. So is {(1, 1, 1), (1,−1, 0), (1, 1,−2)}.

3. {sin(x), cos(x)} is orthogonal in C [0, 2π] equipped with the inner product defined on slide 276
279
Proposition
Every orthogonal set of nonzero vectors is linearly independent
Proof:
280
Orthonormal sets
Definition

A set of vectors {v1, . . . , vk} is called orthonormal if it is orthogonal and each vector has length one. That is

{v1, . . . , vk} is orthonormal ⇐⇒ 〈vi , vj〉 = 0 if i ≠ j , and 1 if i = j

Note
Any orthogonal set of non-zero vectors can be made orthonormal by dividing each vector by its length.

Examples

1. In R3 with the dot product:

{(1, 0, 0), (0, 1, 0), (0, 0, 1)} is orthonormal

{(1, 1, 1), (1,−1, 0), (1, 1,−2)} is not (though it is orthogonal)

{(1/√3)(1, 1, 1), (1/√2)(1,−1, 0), (1/√6)(1, 1,−2)} is orthonormal
281
2. In C [0, 2π] with the inner product

〈f , g〉 = ∫_0^{2π} f (x)g(x) dx

The set {sin(x), cos(x)} is orthogonal but not orthonormal.

The set {(1/√π) sin(x), (1/√π) cos(x)} is orthonormal

The (infinite) set

{1/√(2π), (1/√π) sin(x), (1/√π) cos(x), (1/√π) sin(2x), (1/√π) cos(2x), . . .}

is orthonormal.
282
Orthonormal bases
Bases that are orthonormal are particularly convenient to work with. For example, we have the following

Lemma (AR Thm 6.3.1)

If {v1, . . . , vn} is an orthonormal basis for V and x ∈ V , then

x = 〈x, v1〉v1 + · · · + 〈x, vn〉vn
Proof: Exercise!
283
Orthogonal projection [AR 6.3]
Let V be a real vector space, with inner product 〈·, ·〉.
Let u ∈ V be a unit vector (i.e., ‖u‖ = 1)

Definition

The orthogonal projection of v onto u is

p = 〈v, u〉u

Note

p = (‖v‖ cos θ) u

v − p is orthogonal to u

Example
The orthogonal projection of (2, 3) onto (1/√2)(1, 1) is:
284
More generally, we can project onto a subspace W of V as follows.
Let {u1, . . . , uk} be an orthonormal basis for W .

Definition

The orthogonal projection of v ∈ V onto W is:

projW (v) = 〈v, u1〉u1 + · · · + 〈v, uk〉uk

Properties of p = projW (v)

p ∈ W

if w ∈ W , then projW (w) = w (by the above lemma)

v − p is orthogonal to W (i.e., orthogonal to every vector in W )

p is the vector in W that is closest to v
That is, for all w ∈ W , ‖v − w‖ ≥ ‖v − p‖

p does not depend on the choice of orthonormal basis for W
285
Example

Let W = {(x , y , z) | x + y + z = 0} in V = R3 with the dot product.

The set

{u1 = (1/√2)(1,−1, 0), u2 = (1/√6)(1, 1,−2)}

is an orthonormal basis for W . For v = (1, 2, 3) we have

p = 〈v, u1〉u1 + 〈v, u2〉u2 = (−1/2)(1,−1, 0) + (−1/2)(1, 1,−2) = (−1, 0, 1)

Note that v − p = (2, 2, 2) is orthogonal to W .
286
6.5 Gram-Schmidt orthogonalization procedure
The following can be used to make any basis orthonormal:

Gram-Schmidt procedure [AR Thm 6.3.6]

Suppose {v1, . . . , vk} is a basis for V .

1. Let u1 = (1/‖v1‖) v1
2a. w2 = v2 − 〈v2, u1〉u1
2b. u2 = (1/‖w2‖) w2
3a. w3 = v3 − 〈v3, u1〉u1 − 〈v3, u2〉u2
3b. u3 = (1/‖w3‖) w3
...
ka. wk = vk − 〈vk , u1〉u1 − · · · − 〈vk , uk−1〉uk−1
kb. uk = (1/‖wk‖) wk

Then {u1, . . . , uk} is an orthonormal basis for V .

287
Example
Find an orthonormal basis for the subspace W of R4 (with the dot product) spanned by

(1, 1, 1, 1), (2, 4, 2, 4), (1, 5,−1, 3)

Answer

(1/2)(1, 1, 1, 1), (1/2)(−1, 1,−1, 1), (1/2)(1, 1,−1,−1)

Example
Find the point in W closest to v = (2, 2, 1, 3)

Answer
(1/2)(3, 5, 3, 5)
288
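The procedure translates directly into code. The sketch below implements it with numpy and reproduces both answers above; the closest point is the projection of v onto W using the orthonormal basis produced:

```python
import numpy as np

def gram_schmidt(vectors):
    # Gram-Schmidt: orthonormalize a list of linearly independent vectors.
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, u) * u for u in basis)  # subtract projections
        basis.append(w / np.linalg.norm(w))           # normalize
    return basis

vs = [np.array(v, dtype=float)
      for v in [(1, 1, 1, 1), (2, 4, 2, 4), (1, 5, -1, 3)]]
us = gram_schmidt(vs)
assert np.allclose(us[0], np.array([1, 1, 1, 1]) / 2)
assert np.allclose(us[1], np.array([-1, 1, -1, 1]) / 2)
assert np.allclose(us[2], np.array([1, 1, -1, -1]) / 2)

# Closest point in W to v: the orthogonal projection onto span(us).
v = np.array([2, 2, 1, 3], dtype=float)
p = sum(np.dot(v, u) * u for u in us)
assert np.allclose(p, np.array([3, 5, 3, 5]) / 2)
```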
6.6 Application: curve fitting [AR 6.4]
Given a set of data points (x1, y1), (x2, y2), . . . , (xn, yn) we want to find the straight line y = a + bx that best approximates the data.

A common approach is to minimise the least squares error, E:

E^2 = sum of the squares of the vertical errors = Σ_{i=1}^{n} δi^2 = Σ_{i=1}^{n} (yi − (a + bxi ))^2

[Diagram: data points with vertical errors δi to the line y = a + bx]
289
So given (x1, y1), . . . , (xn, yn) we want to find a, b ∈ R which minimise

Σ_{i=1}^{n} (yi − (a + bxi ))^2

which can be written as

‖y − Au‖^2

where

y = [y1; . . . ; yn]   A = [1 x1; 1 x2; . . . ; 1 xn]   u = [a; b]

To minimise ‖y − Au‖ we want Au to be as close as possible to y

We can use projection to find the closest point.
290
We seek the vector in
W = {Av | v ∈ R²} (= the column space of A)
that is closest to y.
The closest vector is precisely projW y.
So to find u we could project y to W to get Au = projW y, and from this calculate u.
291
However, we can calculate u directly (without finding an orthonormal basis for W) by noting that

〈w, y − projW y〉 = 0 for all w ∈ W
=⇒ 〈Av, y − Au〉 = 0 for all v ∈ R²
=⇒ (Av)^T (y − Au) = 0 for all v ∈ R²
=⇒ v^T A^T (y − Au) = 0 for all v ∈ R²
=⇒ A^T (y − Au) = 0
=⇒ A^T y − A^T A u = 0
=⇒ A^T A u = A^T y

From this we can calculate u, given that we know A and y.
292
Summary
Given data points (x1, y1), (x2, y2), . . . , (xn, yn), to find the straight line y = a + bx of best fit, we find u = [a; b] such that

A^T A u = A^T y   (∗)

where y = [y1; . . . ; yn] and A = [1 x1; 1 x2; . . . ; 1 xn].

If A^T A is invertible (and it usually is), the solution to (∗) is given by

u = (A^T A)⁻¹ A^T y
293
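For the straight-line case the normal equations (∗) are just a 2 × 2 linear system, so they can be solved in a few lines of Python. A sketch under the assumption that A^T A is invertible (`fit_line` is our own name):

```python
def fit_line(points):
    """Solve the 2x2 normal equations A^T A u = A^T y for u = (a, b)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sxx = sum(x * x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    # A^T A = [[n, sx], [sx, sxx]],  A^T y = (sy, sxy)
    det = n * sxx - sx * sx          # assumed non-zero (A^T A invertible)
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

a, b = fit_line([(-1, 1), (1, 1), (2, 3)])
print(a, b)  # 9/7 and 4/7, matching the example on the next slide
```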
Example
Find the straight line which best fits the data points (−1, 1), (1, 1), (2, 3)
Answer
Line of best fit is: y = 9/7 + (4/7)x
294
294
Extension
The same method works for finding quadratic fitting curves. To find the quadratic y = a + bx + cx² which best fits data (x1, y1), (x2, y2), . . . , (xn, yn) we take

A = [1 x1 x1²; 1 x2 x2²; . . . ; 1 xn xn²],  y = [y1; . . . ; yn]

and solve
A^T A u = A^T y
for
u = [a; b; c]
295
Topic 7: Eigenvalues and Eigenvectors [AR Chapt 7]
7.1 Definition of eigenvalues and eigenvectors
7.2 Finding eigenvalues
7.3 The Cayley-Hamilton theorem
7.4 Finding eigenvectors
7.5 Diagonalization
296
7.1 Definition of eigenvalues and eigenvectors
The topic of eigenvalues and eigenvectors is fundamental to many applications of linear algebra. These include quantum mechanics in physics, image compression and reconstruction in computing and engineering, and the analysis of high dimensional data in statistics.

The key idea is to identify those subspaces which map to themselves under a linear transformation T.

For any vector 0 ≠ v ∈ V, Span{v} is a subspace of V of dimension 1.

Suppose that
T(v) = λv
for some scalar λ. It follows that T maps the subspace Span{v} to itself.
Note that λ is a scale factor which stretches the subspace and possibly changes its sense (if λ is negative).
297
Definition
Let T : V → V be a linear transformation.
A scalar λ is an eigenvalue of T if there is a non-zero vector v ∈ V such that
T(v) = λv   (∗)
The vector v is called an eigenvector of T (with eigenvalue λ).
If λ is an eigenvalue of T, then the set of all v satisfying (∗) is a subspace of V (exercise!) and is called the eigenspace of λ.

We will restrict attention to the case that V is finite dimensional. Then T can be represented by a matrix [T], which we know depends on the choice of bases.
298
In fact the idea of eigenvalues and eigenvectors can be applied directly to square matrices.

Definition
Let A be an n × n matrix and let λ be a scalar. Then a non-zero n × 1 column matrix v with the property that
Av = λv
is called an eigenvector, while λ is called the eigenvalue.

To develop some geometric intuition, it is handy to think of A as being the standard matrix of a linear transformation.

Example
Consider the matrix [1 4; 1 1] as the standard matrix of a linear transformation. What is the effect of the transformation on the vectors (2, 1) and (2, −1)?
299
7.2 Finding eigenvalues
With I denoting the n × n identity matrix, the defining equation for eigenvalues and eigenvectors can be rewritten

(A − λI)v = 0

The values of λ for which this equation has non-zero solutions are precisely the eigenvalues.

Theorem
The homogeneous linear system (A − λI)v = 0 has a non-zero solution if and only if det(A − λI) = 0. Consequently, the eigenvalues of A are the values of λ for which

det(A − λI) = 0
300
Notation
The equation det(A − λI) = 0 is referred to as the characteristic equation. From our study of determinants, we know that det(A − λI) is a polynomial of degree n in λ. It is called the characteristic polynomial.

Example
Find the eigenvalues of [1 4; 1 1].
301
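For a 2 × 2 matrix the characteristic equation is the quadratic λ² − (trace)λ + det = 0, so the eigenvalues can be computed directly from the quadratic formula. A small Python sketch (our own helper, assuming the roots are real):

```python
def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of [[a, b], [c, d]] from det(A - lambda I) = 0,
    i.e. lambda^2 - (a + d) lambda + (ad - bc) = 0."""
    tr, det = a + d, a * d - b * c
    disc = (tr * tr - 4 * det) ** 0.5   # assumes a non-negative discriminant
    return (tr + disc) / 2, (tr - disc) / 2

print(eigenvalues_2x2(1, 4, 1, 1))  # (3.0, -1.0)
```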
Example
Find the eigenvalues of [0 1; −1 0].

If we use the standard basis, how can the corresponding linear transformation be described geometrically? How does this tell you that the matrix does not have any invariant subspaces?
302
Example
Find the eigenvalues of the matrix A = [1 0 0 0; 0 1 0 0; 0 0 1 −1; 0 0 1 1].
303
7.3 The Cayley-Hamilton theorem
We note (without proof) the following
Theorem (Cayley-Hamilton Theorem)
A square matrix satisfies its characteristic equation.
This means (among other things) that
Every (positive) power of A can be expressed as a linear combination of I, A, . . . , Aⁿ⁻¹.
If A is invertible, then A⁻¹ can be expressed as a linear combination of I, A, . . . , Aⁿ⁻¹.
304
Examples
Let A = [3 2; 1 4]
1. Calculate the characteristic equation of A.
2. Verify that A satisfies its characteristic equation.
3. Express A⁻¹ as a linear combination of A and I and hence calculate it.
4. Express A³ as a linear combination of A and I and hence calculate it.
305
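As a numerical check of the Cayley-Hamilton theorem for this A (anticipating the first answer: the characteristic equation is λ² − 7λ + 10 = 0, from trace 7 and determinant 10), a Python sketch with our own `matmul` helper:

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[3, 2], [1, 4]]
I = [[1, 0], [0, 1]]
A2 = matmul(A, A)

# Cayley-Hamilton: A^2 - 7A + 10I should be the zero matrix
Z = [[A2[i][j] - 7 * A[i][j] + 10 * I[i][j] for j in range(2)]
     for i in range(2)]
print(Z)  # [[0, 0], [0, 0]]

# Rearranging the characteristic equation gives A^{-1} = (7I - A)/10
Ainv = [[(7 * I[i][j] - A[i][j]) / 10 for j in range(2)] for i in range(2)]
print(matmul(A, Ainv))  # close to the identity matrix
```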
7.4 Finding eigenvectors
To find the eigenvectors of a matrix
For each eigenvalue λ, solve the homogeneous linear system (A − λI)v = 0.
Use row reduction as usual.
Note that rank(A − λI) < n, so you always obtain at least one row of zeros.

Example
For each of the eigenvalues 3, −1 of the matrix [1 4; 1 1], find a corresponding eigenvector.
306
Example
For each of the eigenvalues λ = −1, 8 of the matrix [3 2 4; 2 0 2; 4 2 3] (the eigenvalue −1 is repeated twice), find a basis for the corresponding eigenspace.
307
7.5 Diagonalization [AR 7.2]
We now take up the question of when the eigenvectors of an n × n matrix A form a basis for Rⁿ. Such bases are extremely important in the applications of eigenvalues and eigenvectors mentioned earlier.

Definition
A square matrix A is said to be diagonalizable if there is an invertible matrix P such that P⁻¹AP is a diagonal matrix. The matrix P is said to diagonalize A.
308
To test if A is diagonalizable, the following theorem can be used.
Theorem
An n × n matrix A is diagonalizable if and only if there is a basis for Rⁿ all of whose elements are eigenvectors of A.

Idea of proof:
If we can form a basis in which all the basis vectors are eigenvectors of T, then the new matrix for T will be diagonal.
309
A simple case in which A is diagonalizable is when A has n distinct eigenvalues. This follows from the theorem above and the following

Lemma
Eigenvectors corresponding to distinct eigenvalues are linearly independent.
Idea of proof:
310
Example
Give reasons why the matrix A = [1 2; 0 1] cannot be diagonalized.
311
How to diagonalize a matrix
Suppose A is diagonalizable. How can we find an invertible matrix P, and a diagonal matrix D with D = P⁻¹AP?

Theorem
Let A be a diagonalizable n × n matrix. Thus there exists a basis v1, . . . , vn for Rⁿ whose elements are eigenvectors of A.
Let P = [[v1] · · · [vn]] and let λi be the eigenvalue of the eigenvector vi. Then

P⁻¹AP = diag[λ1, λ2, . . . , λn]

So, if we have found a basis consisting of eigenvectors, then we can write down matrices P and D without further calculation.
312
Example
Check that (2, 1, 2), (−1, 2, 0), (−1, 0, 1) are eigenvectors of the matrix

A = [3 2 4; 2 0 2; 4 2 3]

and read off the corresponding eigenvalues.
Write down an invertible matrix P such that D = P⁻¹AP is diagonal, and write down the diagonal matrix D.
Check your answer by evaluating both sides of the equation PD = AP.
313
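The final check suggested above, PD = AP, is a pair of matrix multiplications. A Python sketch (`matmul` is our own helper; the columns of P are the given eigenvectors):

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[3, 2, 4], [2, 0, 2], [4, 2, 3]]
# Columns of P are the eigenvectors (2,1,2), (-1,2,0), (-1,0,1)
P = [[2, -1, -1],
     [1,  2,  0],
     [2,  0,  1]]
# Eigenvalues in the same order
D = [[8, 0, 0], [0, -1, 0], [0, 0, -1]]

print(matmul(A, P))
print(matmul(P, D))  # the two products agree, confirming D = P^{-1} A P
```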
Orthogonal matrices
Because a matrix often represents a physical system it can be important that the change-of-basis transformation does not affect shape. This will happen when the change-of-basis matrix is orthogonal:

Definition
An n × n matrix P is orthogonal if the columns of P form an orthonormal basis of Rⁿ.
Examples
[−1/√2 1/√6 1/√3; 0 −2/√6 1/√3; 1/√2 1/√6 1/√3] and [cos θ −sin θ; sin θ cos θ] are orthogonal,
but [1 1; 0 1] and [1 −1; 1 1] are not.
314
Orthogonal matrices have some good properties.
If P is orthogonal n × n and u, v ∈ Rⁿ then
P⁻¹ = P^T
‖Pu‖ = ‖u‖
〈Pu, Pv〉 = 〈u, v〉

We can summarise by saying that orthogonal matrices ‘preserve’ length and angle.
315
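These properties are easy to confirm numerically for a rotation matrix, which is orthogonal. A Python sketch:

```python
import math

theta = 0.7  # any angle; rotation matrices are orthogonal
P = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

# P^T P = I, so P^{-1} = P^T
PtP = [[sum(P[k][i] * P[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]
print(PtP)  # close to [[1, 0], [0, 1]]

# Orthogonal matrices preserve length: ||Pu|| = ||u||
u = (3.0, 4.0)
Pu = (P[0][0] * u[0] + P[0][1] * u[1],
      P[1][0] * u[0] + P[1][1] * u[1])
print(math.hypot(*Pu))  # close to 5.0, the same as ||u||
```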
Real symmetric matrices
Definition
A matrix A is symmetric if A^T = A

Examples
[1 2; 2 3] and [3 −1 4; −1 1 5; 4 5 9] are symmetric,
but [2 1; 3 2] and [3 −1 4; −1 1 5; 4 6 9] are not.

Symmetric matrices arise (for example) in:
quadratic functions
inner products
maxima and minima of functions of more than one variable
316
Theorem
Let A be an n × n real symmetric matrix. Then:
1. all roots of the characteristic polynomial of A are real;
2. eigenvectors from distinct eigenvalues are orthogonal;
3. A is diagonalizable; (in fact, there is an orthonormal basis of eigenvectors)
4. we can write A = QDQ⁻¹ where D is diagonal and Q is an orthogonal matrix.

Note that, in this case, the diagonalization formula can be written A = QDQ^T.
Idea of proof of 2:
317
Example
Find matrices D and Q as above for A = [2 −1; −1 2]
318
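For this small symmetric matrix the answer can be verified by reconstructing A from Q and D. A Python sketch; the eigenvalues 3 and 1 and the unit eigenvectors below are our own computed answers to the example, so treat this as a worked check:

```python
import math

A = [[2, -1], [-1, 2]]
# Eigenvalues from lambda^2 - 4 lambda + 3 = 0
l1, l2 = 3.0, 1.0
# Corresponding unit eigenvectors as columns of Q; note they are orthogonal
r2 = math.sqrt(2)
Q = [[1 / r2,  1 / r2],
     [-1 / r2, 1 / r2]]
D = [[l1, 0], [0, l2]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Qt = [[Q[j][i] for j in range(2)] for i in range(2)]
# Since Q is orthogonal, Q D Q^T should reconstruct A
print(matmul(matmul(Q, D), Qt))  # close to [[2, -1], [-1, 2]]
```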
Powers of a matrix
In applications, one often comes across the need to apply a transformation many times. In the case that the transformation can be represented by a diagonalizable matrix A, it is easy to compute Aᵏ and thus the action of the k-th application of the transformation.

The first point to appreciate is that computing powers of a diagonal matrix D is easy.

Example
With D = diag[1, −3, 2], write down D² and D³.
In fact, we have
Lemma
Suppose A is diagonalizable, so that D = P⁻¹AP is diagonal. Then

Aᵏ = PDᵏP⁻¹
319
Example
For the matrix of slide 301 we can write

A = PDP⁻¹ with D = [3 0; 0 −1], P = [2 −2; 1 1]

Thus

Aᵏ = PDᵏP⁻¹ with P⁻¹ = (1/4)[1 2; −1 2]

Explicitly,

Aⁿ = (1/4) [2 −2; 1 1] [3ⁿ 0; 0 (−1)ⁿ] [1 2; −1 2]
   = (1/4) [2(3ⁿ + (−1)ⁿ) 4(3ⁿ − (−1)ⁿ); 3ⁿ − (−1)ⁿ 2(3ⁿ + (−1)ⁿ)]
320
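The closed form above can be checked against repeated multiplication. A Python sketch (`matmul` and `power_closed_form` are our own names):

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 4], [1, 1]]

def power_closed_form(n):
    """A^n = (1/4) [[2(3^n + (-1)^n), 4(3^n - (-1)^n)],
                    [3^n - (-1)^n,    2(3^n + (-1)^n)]]"""
    s, d = 3 ** n + (-1) ** n, 3 ** n - (-1) ** n
    return [[2 * s / 4, 4 * d / 4], [d / 4, 2 * s / 4]]

# Compare against A, A^2, ..., A^5 computed by repeated multiplication
M = A
for n in range(1, 6):
    assert power_closed_form(n) == [[float(x) for x in row] for row in M]
    M = matmul(M, A)
print("closed form agrees with A^1 ... A^5")
```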
Conic Sections [AR 9.6]
Consider equations in x and y of the form
ax² + bxy + cy² + dx + ey + f = 0
where a, b, c, d, e, f ∈ R are constants.
The graphs of such equations are called conic sections or conics.
Example
9x² − 4xy + 6y² − 10x − 20y − 5 = 0

[Figure: the graph of this curve, an ellipse]
We will see how the shape of this graph can be calculated using diagonalization.
321
We shall assume that d and e are zero. See [AR 9.6] for a discussion of how to reduce to this case.

If the equation is simple enough, we can identify the conic by inspection.

Standard Conics: (see figure 9.6.1 in AR)
ellipse: x²/α² + y²/β² = 1
hyperbola: x²/α² − y²/β² = 1
parabola: y = αx²
322
The equation in matrix form
Consider the curve defined by the equation

ax² + bxy + cy² = 1

The equation can be written in matrix form as

[x y] [a b/2; b/2 c] [x; y] = 1

That is,
x^T A x = 1
where x = [x; y], and A is a real symmetric matrix.

We can diagonalize in order to simplify the equation so that we can identify the curve. Let's demonstrate with an example.
323
Identify and sketch the conic defined by x² + 4xy + y² = 1

This can be written as x^T A x = 1 where A = [1 2; 2 1]

Diagonalizing gives A = QDQ^T with D = [3 0; 0 −1], Q = (1/√2)[1 −1; 1 1]

Let x′ = [x′; y′] be the co-ordinates of (x, y) relative to the orthonormal basis of eigenvectors: B = {(1/√2, 1/√2), (−1/√2, 1/√2)}

Then x = Qx′ (Q is precisely the transition matrix P_{S,B}) and the equation of the conic can be rewritten:

x^T A x = 1 ⇐⇒ (Qx′)^T QDQ^T (Qx′) = 1
⇐⇒ (x′)^T Q^T QDQ^T Q x′ = 1
⇐⇒ (x′)^T D x′ = 1
⇐⇒ 3(x′)² − (y′)² = 1

So the curve is a hyperbola.
324
The x′-axis and the y′-axis are called the principal axes of the conic.
The directions of the principal axes are given by the eigenvectors.
In this example the directions of the principal axes are:
(1/√2, 1/√2) and (−1/√2, 1/√2).

[Figures: the hyperbola 3(x′)² − (y′)² = 1 in the (x′, y′)-plane, and the curve x² + 4xy + y² = 1 in the (x, y)-plane with the x′- and y′-axes marked]
325
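The whole classification step can be automated for curves ax² + bxy + cy² = 1: form A = [a b/2; b/2 c], compute its eigenvalues, and read off the type from their signs. A hedged Python sketch (our own function; degenerate cases are not handled):

```python
def classify_conic(a, b, c):
    """Classify a x^2 + b xy + c y^2 = 1 from the eigenvalue signs of
    A = [[a, b/2], [b/2, c]]. A sketch: degenerate cases are ignored."""
    tr, det = a + c, a * c - b * b / 4
    disc = (tr * tr - 4 * det) ** 0.5   # real, since A is symmetric
    l1, l2 = (tr + disc) / 2, (tr - disc) / 2
    if l1 > 0 and l2 > 0:
        kind = "ellipse"
    elif l1 * l2 < 0:
        kind = "hyperbola"
    else:
        kind = "no conic of this form"
    return (l1, l2), kind

print(classify_conic(1, 4, 1))  # ((3.0, -1.0), 'hyperbola')
```

For the example x² + 4xy + y² = 1 this gives eigenvalues 3 and −1, hence a hyperbola; for the quadratic part of the slide-321 curve, `classify_conic(9, -4, 6)` reports an ellipse, consistent with its graph.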
Summary
We can represent the equation of a conic (centered at the origin) by a matrix equation x^T A x = 1 with A symmetric.
The eigenvectors of A will be parallel to the principal axes of the conic.
So if Q represents the change of basis matrix then Q will be orthogonal and Q^T A Q = D will be diagonal.
If x = Qx′ then x′ represents the coordinates with respect to the new basis, and the equation of the conic with respect to this basis is (x′)^T D x′ = 1.
The conic can now be identified.
326
Example (a quadric surface)
The equation
−x² + 2y² + 2z² + 4xy + 4xz − 2yz = 1
represents a ‘quadric surface’ in R³.
In matrix form it can be represented as x^T A x = 1 with

x = [x; y; z] and A = [−1 2 2; 2 2 −1; 2 −1 2]

The eigenvalues of A are 3, 3, −3. So the equation of the surface with respect to an orthonormal basis of eigenvectors is

3X² + 3Y² − 3Z² = 1
327
The surface is a ‘hyperboloid of one sheet’; see the sketch below.
(You are not expected to identify quadric surfaces in three dimensions.)
328