620-156 (MAST10007) Linear Algebra
Topic 1: Linear equations 2
Topic 2: Matrices and determinants 40
Topic 3: Euclidean vector spaces 105
Topic 4: General vector spaces 193
Topic 5: Linear transformations 225
Topic 6: Inner product spaces 263
Topic 7: Eigenvalues and eigenvectors 296
1
Topic 1: Linear equations [AR 1.1 and 1.2]
One of the major topics studied in linear algebra is systems of linear equations and their solutions. The following topics will be addressed:
1.1. Systems of equations. Coefficient arrays. Row operations.
1.2. Reduction of systems to row-echelon form and reduced row-echelon form.
1.3. Consistent and inconsistent systems. Infinite solution sets.
2
We will begin with a couple of examples to illustrate how linear equations can arise.
Example (This is Example 2 of [AR 11.3])
An investor has $10,000 to invest. She considers two bonds, one at 10% and one at 7%. She decides to invest at most $6,000 in the first bond and at least $2,000 in the second. She also decides to invest at least as much in the first as the second. How should she invest?
Suppose she invests x1 in the first and x2 in the second.
Then the yield is z = 0.1x1 + 0.07x2.
The constraints are
x1 ≤ 6000        x2 ≥ 2000
x1 + x2 ≤ 10000  x1 ≥ x2
x1 ≥ 0           x2 ≥ 0
3
We could sketch these constraints:
[Figure: the feasible region in the (x1, x2)-plane, bounded by the lines x1 = 6000, x2 = 2000, x1 + x2 = 10000 and x1 = x2, together with a 'line of constant profit'.]
We can find the maximum profit by moving the 'line of constant profit' parallel to itself until it meets the shaded region.
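Aside (not part of the original notes): since the objective z is linear and the shaded region is a convex polygon, the maximum is attained at a corner. The corner coordinates below were worked out by hand by intersecting the constraint lines pairwise, so treat them as an illustration rather than output of a general method.

```python
# Corners of the feasible region defined by
# x1 <= 6000, x2 >= 2000, x1 + x2 <= 10000, x1 >= x2, x1, x2 >= 0,
# found by intersecting the boundary lines pairwise.
vertices = [
    (2000, 2000),   # x1 = x2 meets x2 = 2000
    (6000, 2000),   # x1 = 6000 meets x2 = 2000
    (6000, 4000),   # x1 = 6000 meets x1 + x2 = 10000
    (5000, 5000),   # x1 = x2 meets x1 + x2 = 10000
]

def yield_z(x1, x2):
    """The investor's yield z = 0.1*x1 + 0.07*x2."""
    return 0.1 * x1 + 0.07 * x2

# A linear objective over a convex polygon is maximised at a vertex.
best = max(vertices, key=lambda v: yield_z(*v))
print(best, yield_z(*best))   # (6000, 4000) 880.0
```

So she should invest $6,000 at 10% and $4,000 at 7%, for a yield of $880.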
4
Example (This is Example 1 of [AR 11.2])
[Figure: a two-loop circuit with a 30 V source, a 50 V source, and resistors of 7 Ω, 3 Ω and 11 Ω; the currents I1, I2, I3 flow as indicated, meeting at point A.]
The numbers I1, I2, I3 represent the current as indicated.
Because the current going into point A is the same as the current going out, we have
I1 = I2 + I3
Measuring voltage around Loop 1 and Loop 2 we get, respectively,
7I1 + 3I3 = 30
11I2 − 3I3 = 50
Solving these gives values for the current.
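Aside (not in the notes): the three equations can be checked numerically with NumPy's linear solver, assuming NumPy is available.

```python
import numpy as np

# Kirchhoff's equations from the example, as a matrix system:
#   I1 - I2 - I3  = 0
#   7*I1 + 3*I3   = 30
#   11*I2 - 3*I3  = 50
A = np.array([[1.0, -1.0, -1.0],
              [7.0,  0.0,  3.0],
              [0.0, 11.0, -3.0]])
b = np.array([0.0, 30.0, 50.0])

I = np.linalg.solve(A, b)   # currents [I1, I2, I3]
print(I)
```

The exact values are I1 = 570/131, I2 = 590/131, I3 = −20/131 (approximately 4.35 A, 4.50 A, −0.15 A); the minus sign simply means I3 flows opposite to the drawn arrow.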
5
1.1 Systems of equations, coefficient arrays, row operations
Linear equations
In our example, the unknowns were I1, I2, I3, and a typical equation obtained was
7I1 + 3I3 = 30
We call this a linear equation in the variables I1 and I3 because the coefficients are constants, and the variables are raised to the first power only.
Now that we know what makes the equation linear, it is easy to see that the number of variables is unimportant, so we have in the general case the following definition.
6
Definition (Linear equation and linear system)
A linear equation in n variables, x1, x2, . . . , xn, is an equation of the form
a1x1 + a2x2 + · · · + anxn = b
where a1, a2, . . . , an are real constants (not all zero) and b is also a real constant.
Furthermore, a finite set of linear equations in the variables x1, x2, . . . , xn is called a system of linear equations or a linear system.
7
Examples
x + 2y = 7
(3/8)x − 21y = 0
x1 + 5x2 + 6x3 = 100
x2 − x3 = −1
−x1 + x3 = 11
8
Definition (Solution of a system of linear equations)
A solution to a system of linear equations in the variables x1, . . . , xn is a set of values of these variables which satisfy every equation in the system.
How would you solve the following linear system?
2x − y = 3
x + y = 0
Graphically
Need accurate sketch!
Not practical for three or more variables.
Elimination
Will always give a solution, but is too ad hoc, particularly in higher dimensions (meaning three or more variables).
9
However, the good news is that with the introduction of some clever notation, we can formalise the procedure involved in solving simultaneous equations.
Before we do this, let's remind ourselves how to solve a simple linear system using elimination.
Example
Find all solutions for the following system of linear equations
2x − y = 3
x + y = 0
Adding the two equations gives 3x = 3, from which we get x = 1 and then y = −x = −1.
10
Example
Find all solutions for the linear system
3x + 2y = −3
2x + y = −1
Subtracting twice the second from the first gives −x = −1, so x = 1 and then y = −1 − 2x = −3.
11
Coefficient arrays
There is a systematic way to write a linear system. For example, consider the previously encountered linear system
I1 − I2 − I3 = 0
7I1 + 0I2 + 3I3 = 30
0I1 + 11I2 − 3I3 = 50
The coefficients of the unknowns I1, I2, I3 form a 3 × 3 array of numbers:
[ 1  −1  −1 ]
[ 7   0   3 ]
[ 0  11  −3 ]
12
Definition (Matrix)
A matrix is a rectangular array of numbers. The numbers in the array are called the entries of the matrix.
(Matrices are discussed in further detail in a few lectures' time.)
Definition (Augmented matrix of a linear system)
The augmented matrix for a linear system is the matrix formed from the coefficients in the equations and the constant terms.
Example
The augmented matrix for the previous set of equations is:
13
Example
Write the following system of linear equations as an augmented matrix
2x − y = 3
x + y = 0
Note
The number of rows is equal to the number of equations.
Each column, except the last, corresponds to a variable.
The last column contains the constant term from each equation.
14
Row Operations
Our aim is to use matrices to assist us in finding a solution to a system of equations.
First we need to decide what sort of operations we can perform on the augmented matrix. An essential condition is that whichever operations we perform, we must be able to recover the solution to the original system from the new matrix we obtain.
Let's start by considering operations on a matrix that mimic those operations used in the elimination method.
Definition (Elementary row operations)
The elementary row operations are:
1. Interchanging two rows.
2. Multiplying a row by a non-zero constant.
3. Adding a multiple of one row to another.
15
Example
Back to our simple system:
2x − y = 3
x + y = 0
Let's apply some elementary row operations to the corresponding augmented matrix:
[ 2  −1 | 3 ]
[ 1   1 | 0 ]
∼
Note
The matrices are not equal, but are equivalent in that the solution set is the same for each system represented by each augmented matrix.
16
1.2 Reduction of systems to reduced row-echelon form
Gaussian elimination
Using a sequence of elementary row operations, we can always get to a matrix that allows us to determine the solution set of a linear system.
However, the degree of complication increases as the number of variables and equations increases, so it is a good idea to formalise the process.
The leftmost non-zero element in each row is called the leading entry.
17
Definition (Row echelon form)
A matrix is in row-echelon form if:
1. For any row with a leading entry, all elements below that entry, and in the same column as it, are zero.
2. For any two rows, the leading entry of the lower row is further to the right than the leading entry in the higher row.
3. Any row that consists solely of zeros is lower than any row with non-zero entries.
Examples
[ 1  −2  3  4  5 ]
and
[ 1  0  0  3 ]
[ 0  1  1  2 ]
[ 0  0  0  3 ]
are in r.e. form;
[ 0   0  0   2  4 ]
[ 0   0  3   1  6 ]
[ 0   0  0   0  0 ]
[ 2  −3  6  −4  9 ]
is not in r.e. form.
18
Gaussian elimination is a systematic (or algorithmic) approach to the reduction of a matrix to row-echelon form.
Gaussian elimination
1. Make the top left element (row 1, column 1) a leading entry; that is, reorder rows so that the entry in the top left position is non-zero.
2. Add multiples of the first row to the other rows to make all other entries (from row 2 down) in the first column zero.
3. Reorder rows 2, . . . , n so that the next leading entry is in row 2.
4. Add multiples of the second row to rows 3, . . . , n, making all other entries (from row 3 down) in the column containing the second leading entry zero.
5. Repeat until you run out of rows.
The matrix is now in row-echelon form.
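Aside (not part of the course material): the steps above can be sketched in NumPy. The function name `row_echelon` and the use of floating-point pivots are our own choices, and the sketch makes no attempt at the partial pivoting a production routine would use.

```python
import numpy as np

def row_echelon(M):
    """Reduce a matrix to row-echelon form, following the steps above:
    find a non-zero pivot, swap it into place, then clear the entries
    below it.  Returns a new array; the input is not modified."""
    A = M.astype(float).copy()
    rows, cols = A.shape
    r = 0                              # row where the next pivot goes
    for c in range(cols):
        # find a row at or below r with a non-zero entry in column c
        pivot = next((i for i in range(r, rows) if A[i, c] != 0), None)
        if pivot is None:
            continue                   # no leading entry in this column
        A[[r, pivot]] = A[[pivot, r]]  # steps 1/3: reorder rows
        for i in range(r + 1, rows):   # steps 2/4: clear entries below
            A[i] -= (A[i, c] / A[r, c]) * A[r]
        r += 1
        if r == rows:
            break
    return A

# Augmented matrix of the four-equation example on the next slide
aug = np.array([[3, 2, -1, -15],
                [1, 1, -4, -30],
                [3, 1,  3,  11],
                [3, 3, -5, -41]])
print(row_echelon(aug))
```

The printed matrix has zeros below each leading entry and a final row of zeros, as the definition of row-echelon form requires.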
19
Example
Use Gaussian elimination to reduce the augmented matrix which represents the linear system
3x + 2y − z = −15
x + y − 4z = −30
3x + y + 3z = 11
3x + 3y − 5z = −41
to row-echelon form.
20
By reducing a matrix to row-echelon form, Gaussian elimination allows us to easily solve a system of linear equations. To do this we need to read off the final equations from the row-echelon matrix and then manipulate them to find the solution. The final manipulation is sometimes called back substitution. We illustrate with an example.
Example
From the row-echelon matrix of the previous example we can calculate the solutions to the original system.
Note
This procedure relies on the fact that the new row-echelon matrix gives a linear system with exactly the same set of solutions as the original linear system.
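Aside (not in the notes): back substitution itself is a short loop, working from the bottom row up. The sketch below assumes a square coefficient block with a unique solution; the example matrix is our own.

```python
import numpy as np

def back_substitute(U):
    """Solve a system whose augmented matrix U = [A | b] is already in
    row-echelon form with non-zero diagonal: work from the bottom row
    up, solving for one variable at a time."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # subtract the already-known variables, then divide by the pivot
        x[i] = (U[i, -1] - U[i, i+1:n] @ x[i+1:]) / U[i, i]
    return x

# A row-echelon augmented matrix for a 3-variable system (illustrative)
U = np.array([[1.0, 2.0, 1.0, -3.0],
              [0.0, 1.0, 2.0,  8.0],
              [0.0, 0.0, 1.0, 13.0]])
print(back_substitute(U))   # [ 20. -18.  13.]
```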
21
Is there any way to solve a system without having to perform the finalmanipulation?
Definition (Reduced row-echelon form)
A matrix is in reduced row-echelon form if the following three conditions are satisfied:
1. It is in row-echelon form.
2. Each leading entry is equal to 1 (called a leading 1).
3. In each column containing a leading 1, all other entries are zero.
22
Examples
[ 1  −2  3  −4  5 ]
is in r.r.e. form;
[ 1  0  0  3 ]
[ 0  1  1  2 ]
[ 0  0  0  3 ]
is not in r.r.e. form;
[ 1  0  0   2  4 ]
[ 0  1  3   1  6 ]
[ 0  0  0   0  0 ]
[ 0  0  1  −4  9 ]
is not in r.r.e. form.
23
1.2.2 Gauss-Jordan elimination
Gauss-Jordan elimination is a systematic way to reduce a matrix to reduced row-echelon form using row operations.
Gauss-Jordan elimination
1. Use Gaussian elimination to reduce the matrix to row-echelon form.
2. Use row operations to create zeros above the leading entries.
3. Multiply rows by appropriate numbers to create the leading 1s.
The order of operations is not unique; however, the reduced row-echelon form of a matrix is!
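Aside (not part of the course material): Gauss-Jordan elimination can be sketched by extending the earlier elimination idea so that each pivot is scaled to 1 and every other entry in its column is cleared. The tolerance `1e-12` is an arbitrary choice for floating-point zero tests.

```python
import numpy as np

def rref(M):
    """Reduce M to reduced row-echelon form by Gauss-Jordan elimination:
    for each pivot, scale its row so the leading entry is 1, then clear
    every other entry in that column."""
    A = M.astype(float).copy()
    rows, cols = A.shape
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if abs(A[i, c]) > 1e-12), None)
        if pivot is None:
            continue
        A[[r, pivot]] = A[[pivot, r]]
        A[r] /= A[r, c]                # make the leading entry a 1
        for i in range(rows):          # clear the rest of the column
            if i != r:
                A[i] -= A[i, c] * A[r]
        r += 1
        if r == rows:
            break
    return A

aug = np.array([[3, 2, -1, -15],
                [1, 1, -4, -30],
                [3, 1,  3,  11],
                [3, 3, -5, -41]])
print(rref(aug))
```

For this augmented matrix the reduced form has the identity in the first three columns, so the last column can be read off directly as the solution x = −4, y = 2, z = 7.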
24
ExampleUse Gauss-Jordan elimination to find a solution to the linear system
3x + 2y − z = −15x + y − 4z = −30
3x + y + 3z = 113x + 3y − 5z = −41
25
1.3 Consistent and inconsistent systems
Recall the definition of a solution of a system of linear equations: a set of values of the variables which satisfy every equation in the system.
Example
Find the solution of the system
2x + 4y − z = −3
x − 3y + 2z = 11
4x − 2y + 5z = 21
This system has a unique solution.
26
Another example
Find the solution of the system
x − y + z = 3
x − 7y + 3z = −11
2x + y + z = 16
What’s going on here?
27
Yet another example
Find the solution of the system
x + y + z = 4
2x + y + 2z = 9
3x + 2y + 3z = 13
Is this the same behaviour as the previous example or not?
28
Types of solution sets
It is clear from the above examples that there are different types of solutions for systems of equations.
Definition (Consistency)
A system of linear equations is said to be consistent if the system has at least one solution.
A system of linear equations is said to be inconsistent if the system has no solution.
For any system of linear equations, one of three types of solution set is possible:
no solution (inconsistent)
one solution (consistent and unique)
infinitely many solutions (consistent but not unique)
29
Inconsistent systems
We can determine the type of solution set a system has by reducing its augmented matrix to row-echelon form.
The system is inconsistent if there is at least one row in the row-echelon matrix having all values equal to zero on the left and a non-zero entry on the right. For example,
[ 0  0  · · ·  0 | 5 ]
Why is this inconsistent?
If we try to recover the equation represented by this row, it says:
0 × x1 + 0 × x2 + · · · + 0 × xn = 5
and of course this is not satisfied for any values of x1, . . . , xn.
30
Example
[ 1  2  0 | 4 ]
[ 2  1  1 | 2 ]
[ 4  2  2 | 3 ]
If a system of equations is inconsistent then the row-echelon form of its augmented matrix will have a row of the form
[ 0  0  · · ·  0 | a ]
with a ≠ 0.
31
Example (inconsistent)
Geometrically, an inconsistent system is one for which there is no common point of intersection for the system.
32
Consistent systems
Recall that a consistent system has either a unique solution or infinitely many solutions.
Unique solution:
For a system of equations with n variables, a unique solution exists precisely when the row-reduced augmented matrix has n non-zero rows. In this case we can read off the solution straight from the reduced matrix.
Example
[ 1  1  3 | 5 ]
[ 1  2  1 | 4 ]
[ 2  1  1 | 4 ]
∼
and the solution is
x1 =     x2 =     x3 =
33
Infinitely many solutions:
Example
What is the solution to the equations with rre form
[ 1  2  0  0  5 | 1 ]
[ 0  0  1  0  6 | 2 ]
[ 0  0  0  1  7 | 3 ]
[ 0  0  0  0  0 | 0 ]
The corresponding (non-zero) equations are
x1 + 2x2 + 5x5 = 1,   x3 + 6x5 = 2,   x4 + 7x5 = 3.
We can choose x2 and x5 as we wish. Say x2 = s, x5 = t. Then the other variables must be given by
x1 = 1 − 2s − 5t,   x3 = 2 − 6t,   x4 = 3 − 7t.
In this way, we can describe every possible solution.
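Aside (not in the notes): a quick sanity check of the parametric solution is to substitute a few values of s and t back into Ax = b; every choice of parameters should satisfy the system.

```python
import numpy as np

# Coefficient matrix and right-hand side of the rre system above
A = np.array([[1, 2, 0, 0, 5],
              [0, 0, 1, 0, 6],
              [0, 0, 0, 1, 7],
              [0, 0, 0, 0, 0]], dtype=float)
b = np.array([1, 2, 3, 0], dtype=float)

def solution(s, t):
    """The parametric solution (x1, ..., x5) with x2 = s and x5 = t."""
    return np.array([1 - 2*s - 5*t, s, 2 - 6*t, 3 - 7*t, t])

# every choice of parameters should satisfy Ax = b
rng = np.random.default_rng(0)
for s, t in rng.standard_normal((5, 2)):
    assert np.allclose(A @ solution(s, t), b)
print("all parameter choices satisfy Ax = b")
```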
34
In general
If the row-reduced augmented matrix of a consistent system has fewer than n non-zero rows, then the system has infinitely many solutions.
If r is the number of non-zero rows, then n − r parameters are needed to specify the solution set.
More precisely, in the rre form of the matrix, there will be n − r columns (among those corresponding to variables) which contain no leading entry.
We can choose the variable corresponding to such a column arbitrarily.
The values for the remaining variables will then follow.
35
Example
Suppose that the rre matrix for a system of equations is
[ 1  2  0  1 | 1 ]
[ 0  0  1  2 | 2 ]
[ 0  0  0  0 | 0 ]
[ 0  0  0  0 | 0 ]
We can represent the values of x2 and x4 by parameters s and t, respectively.
Then the values of the other variables are
x1 =     x3 =
36
Example
[ 1  1  3 | 5 ]
[ 1  2  8 | 11 ]
[ 2  1  1 | 4 ]
∼
So the solution of the equations can be given by:
x1 =     x2 =     x3 =
37
Example
Solve the linear system:
v − 2w + z = 1
2u − v − z = 0
4u + v − 6w = 3
38
Example
Find the values of k for which the system
u + 3v + 4w = 6
4u + 9v − w = 4
6u + 9v + kw = 8
has
(i) no solution
(ii) a unique solution
(iii) an infinite number of solutions
39
Topic 2: Matrices and Determinants [AR 1.3 – 1.6]
In studying linear systems we introduced the idea of a matrix. Next we will discover that matrices are not only useful tools for solving systems of equations, but also that they have their own algebraic structure and many interesting properties.
2.1 Properties of matrices
2.2 Matrix algebra
2.3 Matrix inverses
2.4 Rank of a matrix
2.5 Solutions of non-homogeneous linear equations
2.6 Determinants
40
2.1 Properties of matrices
Notation
Sometimes it is convenient to refer to the entries of a matrix rather than the entire matrix.
If A is a matrix, we denote its entries as Aij, where i specifies the row of the entry and j specifies the column. Using this notation, we write
      [ A11  A12  . . .  A1n ]
A  =  [ A21  A22  . . .  A2n ]
      [  ⋮    ⋮    ⋱     ⋮  ]
      [ Am1  Am2  . . .  Amn ]
or A = [Aij]
41
We say that a matrix has size m× n when it has m rows and n columns.
Example
The matrix
A = [ 1  2  3    ]
    [ π  e  27.1 ]
has size 2 × 3.
We have A12 = 2 and A21 = π.
Note
A12 and A21 are not equal (in this example).
42
Some special matrices
Some matrices with special features are given names suggestive of those features.
Definition (Special matrices)
A matrix having the same number of rows as columns is called a square matrix.
A matrix with only one row is called a row matrix. (They have size 1 × n.)
A matrix with only one column is called a column matrix. (They have size n × 1.)
A matrix with all elements equal to zero is referred to as a zero matrix.
A square matrix with Aij = 0 for i ≠ j is called a diagonal matrix.
A square matrix A satisfying Aij = 1 if i = j and Aij = 0 if i ≠ j is called an identity matrix.
43
Examples
The matrices
[ 1  2 ]
[ 3  4 ]
and
[ 1  2  2 ]
[ 3  4  5 ]
[ 6  7  8 ]
are both square.
[ 4  3  5  −2 ] is a row matrix.
[ 1 ]
[ 2 ]
[ 3 ]
is a column matrix.
The matrices
[ 0  0  0 ]
[ 0  0  0 ]
and
[ 0  0 ]
[ 0  0 ]
[ 0  0 ]
are both zero matrices.
[ 1  0 ]
[ 0  1 ]
and
[ 1  0  0 ]
[ 0  1  0 ]
[ 0  0  1 ]
are both identity matrices.
44
2.2 Matrix algebra
We all know the algebra of real numbers: addition, subtraction, multiplication and division. But what do these operations mean when we try to apply them to things other than real numbers? Matrices have their own algebra, for which some of these things make sense and some don't.
Definition (Equality of matrices)
We say that two matrices are equal if
they have the same size; and
every corresponding element is equal.
That is,
A = B ⇐⇒ A and B have the same size and Aij = Bij for all i and j
45
Example
Given
A = [  1  y  0 ]
    [ −7  2  3 ]
and
B = [ 1  3  0 ]
    [ x  2  3 ]
determine the values of x and y for which A = B.
Scalar Multiplication
Definition (Scalar multiple)
Let A = [Aij] be a matrix and c ∈ R. The matrix cA, having the same size as A and entries given by
(cA)ij = c × Aij
is called a scalar multiple of A.
In this context the real number c is called a scalar.
46
Example
Let
A = [ 0   1 ]
    [ 1  −1 ]
    [ 2  −7 ]
What are −2A and αA for α ∈ R?
Properties of Scalar Multiplication
If α and β are any scalars and C and D are any matrices of the same size, then
1. (α + β)C = αC + βC
2. α(D + C) = αD + αC
3. α(βC) = (αβ)C
47
Addition of Matrices
Definition (Addition of matrices)
Let A = [Aij] and B = [Bij] be matrices of the same size. We define a matrix C, called the sum of A and B, by
C has the same size as A and B
Cij = Aij + Bij
We write C = A + B.
Be careful: Matrix addition is only defined for matrices of the same size.
Notation: We write A − B in place of A + (−1)B.
48
Example
Let
A = [ 2   0  −3 ]
    [ 1  −1   3 ]
B = [ −1  −1  1 ]
    [  0   1  2 ]
and
C = [ −1  1 ]
    [  2  0 ]
Calculate (where possible) A + B and A + B + C.
49
Properties of Matrix Addition
For matrices A, B and C, all of the same size, the following statements hold:
1. A + B = B + A (commutativity)
2. A + (B + C) = (A + B) + C (associativity)
3. A − A = 0
4. A + 0 = A
Here 0 denotes the zero matrix of the same size as A, B and C.
All these properties follow from the corresponding properties of the scalars.
50
Matrix multiplication
Sometimes we can multiply two matrices together.
Definition (Matrix multiplication)
Let A be an m × n matrix and B be an n × q matrix. The matrix product of A and B is a matrix of size m × q, denoted AB.
The entry in position ij of the matrix product is obtained by taking row i of A and column j of B, then multiplying together the entries in order and forming the sum.
Using summation notation this rule can be expressed as
(AB)ij = Σ (k = 1 to n) Aik Bkj
Note: The matrix product AB is only defined if the number of columns of A is equal to the number of rows of B.
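Aside (not in the notes): the summation rule translates directly into a pair of loops. The sketch below uses the matrices from the next example and checks the result against NumPy's built-in product.

```python
import numpy as np

def matmul(A, B):
    """Multiply matrices using the summation rule
    (AB)_ij = sum over k of A_ik * B_kj, entry by entry."""
    m, n = A.shape
    n2, q = B.shape
    assert n == n2, "columns of A must equal rows of B"
    C = np.zeros((m, q))
    for i in range(m):
        for j in range(q):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1.0, -1.0],
              [3.0,  0.0],
              [0.0,  1.0]])       # 3 x 2
B = np.array([[ 4.0, 0.0],
              [-7.0, 1.0]])       # 2 x 2

print(matmul(A, B))                       # a 3 x 2 matrix
print(np.allclose(matmul(A, B), A @ B))   # True
```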
51
Example
Let
A = [ 1  −1 ]
    [ 3   0 ]
    [ 0   1 ]
and
B = [  4  0 ]
    [ −7  1 ]
Calculate AB and BA (if they exist).
52
Example
Let
A = [ 1  0 ]
    [ 2  3 ]
and
B = [ 4  3 ]
    [ 2  1 ]
Calculate AB and BA.
Notice: In this example AB ≠ BA, even though both are defined.
Matrix multiplication is not commutative (in general).
53
Definition (Commuting matrices)
The matrices A and B are said to commute if AB = BA.
For both AB and BA to be defined and equal, we must have that A and B are square and of the same size.
Note
If I is an n × n identity matrix and A is any n × n square matrix, then
AI = IA = A
54
Properties of matrix multiplication
The following properties hold whenever the matrix products and sums are defined:
1. A(B + C) = AB + AC (left distributivity)
2. (A + B)C = AC + BC (right distributivity)
3. A(BC) = (AB)C (associativity)
4. A(αB) = α(AB)
5. AI = IA = A
6. A0 = 0 and 0A = 0
Here α is a scalar and, as always, I and 0 denote the identity matrix and zero matrix of the appropriate size.
55
Matrix multiplication and linear systems
Using the rule for matrix multiplication, a linear system can be written as a matrix equation.
Example
The linear system
2x + 0.1y − 0.02z = 0.9
−0.8x + 7y = 0
is equivalent to the matrix equation
[  2    0.1  −0.02 ] [ x ]     [ 0.9 ]
[ −0.8  7     0    ] [ y ]  =  [ 0   ]
                     [ z ]
56
Another non-property of matrix multiplication
If we take the product of two real numbers a and b and ab = 0, we can conclude that at least one of a and b is equal to 0.
This is not always true for matrices.
Example
Let
A = [  1   1 ]
    [ −1  −1 ]
and
B = [  1  −1 ]
    [ −1   1 ]
What is going on here? The problem is that the matrices A and B in the above example do not have inverses.
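Aside (not in the notes): it is worth computing the product for yourself; the result really is the zero matrix even though neither factor is zero.

```python
import numpy as np

A = np.array([[ 1,  1],
              [-1, -1]])
B = np.array([[ 1, -1],
              [-1,  1]])

print(A @ B)   # the 2 x 2 zero matrix, even though A != 0 and B != 0
```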
57
2.3 Matrix inverses
Definition (Matrix inverse)
A matrix A is called invertible if there exists a matrix B such that
AB = BA = I
Such a matrix B is called the inverse of A and is denoted A−1.
If A is not invertible, we say that A is singular.
58
Note
It follows from the definition that:
For A to be invertible it must be square
(If it exists) A−1 has the same size as A
(If it exists) A−1 is unique
If A is invertible, then A−1 is invertible and (A−1)−1 = A
I−1 = I , 0 has no inverse.
59
Inverse of a 2 × 2 matrix
In general, for a 2 × 2 matrix
A = [ a  b ]
    [ c  d ]
1. A is invertible iff ad − bc ≠ 0
2. If ad − bc ≠ 0, then
A−1 = 1/(ad − bc) [  d  −b ]
                  [ −c   a ]
Example
Find the inverse of
A = [ 2  −1 ]
    [ 1   1 ]
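Aside (not in the notes): one way to check an answer to exercises like this is to code the 2 × 2 formula directly and confirm that AA−1 = I. The function name below is our own.

```python
import numpy as np

def inverse_2x2(A):
    """Invert a 2 x 2 matrix [[a, b], [c, d]] using the formula above."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return (1 / det) * np.array([[d, -b], [-c, a]])

A = np.array([[2.0, -1.0],
              [1.0,  1.0]])
Ainv = inverse_2x2(A)      # here ad - bc = 3, so Ainv = (1/3) [[1, 1], [-1, 2]]
print(Ainv)
print(np.allclose(A @ Ainv, np.eye(2)))   # True
```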
60
Example
If A is a square matrix satisfying A3 = 0, show that (I − A)−1 = I + A + A2.
61
Finding the inverse of a square matrix
Calculating the inverse of a matrix
Given an n × n matrix A, we can find A−1 as follows:
1. Construct the ("grand augmented") matrix [A | I], where I is the n × n identity matrix.
2. Apply row operations to [A | I] to get the block corresponding to A into reduced row-echelon form. This gives
[A | I] ∼ [R | B]
where R is in reduced row-echelon form.
3. If R = I, then A is invertible and A−1 = B.
If R ≠ I, then A is singular (i.e., there is no A−1).
We will see shortly why this method works.
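Aside (not part of the course material): the grand-augmented procedure can be sketched in NumPy. This is a floating-point illustration, not a robust library routine; the tolerance and the singularity check are our own choices.

```python
import numpy as np

def inverse_via_rowops(A):
    """Find the inverse by row-reducing the grand augmented matrix
    [A | I], as in the procedure above.  When the left block reaches
    the identity, the right block is the inverse."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])   # build [A | I]
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, n) if abs(M[i, c]) > 1e-12), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[[r, pivot]] = M[[pivot, r]]
        M[r] /= M[r, c]                # leading entry becomes 1
        for i in range(n):             # clear the rest of the column
            if i != r:
                M[i] -= M[i, c] * M[r]
        r += 1
    return M[:, n:]                    # the block B in [I | B]

A = np.array([[ 1,  2, 1],
              [-1, -1, 1],
              [ 0,  1, 3]])
Ainv = inverse_via_rowops(A)
print(Ainv)
print(np.allclose(A @ Ainv, np.eye(3)))   # True
```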
62
Example
Find the inverse of
A = [  1   2  1 ]
    [ −1  −1  1 ]
    [  0   1  3 ]
63
Another example
Find the inverse of
[  1   2  1 ]
[ −1  −1  1 ]
[ −1   0  3 ]
64
Properties of the matrix inverse
If A and B are invertible matrices of the same size, and α is a non-zero scalar, then
1. (αA)−1 = (1/α) A−1
2. (AB)−1 = B−1A−1
3. (An)−1 = (A−1)n (for all n ∈ N)
Exercise
Prove these!
65
Row operations by matrix multiplication
The effect of a row operation can be achieved by multiplication on the left by a suitable matrix.
Definition (Elementary matrix)
An n × n matrix is an elementary matrix if it can be obtained from In by performing a single elementary row operation.
Example
Consider
I3 = [ 1  0  0 ]
     [ 0  1  0 ]
     [ 0  0  1 ]
Let F be obtained from I3 by the row operation: R2 ↔ R3
Let G be obtained from I3 by the row operation: R1 → 2R1
Let H be obtained from I3 by the row operation: R3 → R3 + 3R1
66
F = [ 1  0  0 ]
    [ 0  0  1 ]
    [ 0  1  0 ]
G = [ 2  0  0 ]
    [ 0  1  0 ]
    [ 0  0  1 ]
H = [ 1  0  0 ]
    [ 0  1  0 ]
    [ 3  0  1 ]
Now consider the matrix
A = [ a  b  c ]
    [ d  e  f ]
    [ g  h  i ]
Calculation gives
FA =
GA =
HA =
67
This works in general!
Let E be obtained by applying a row operation p to I.
If A is a matrix such that the product EA is defined, then EA is the result of performing p on A.
We can represent a sequence of elementary row operations by a corresponding sequence of elementary matrices. (Be careful about the order!)
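Aside (not in the notes): this fact is easy to check numerically. Below, the elementary matrices F, G, H from the previous slide are built from the identity, and left-multiplication is compared with performing the row operation directly on a sample matrix.

```python
import numpy as np

A = np.arange(1.0, 10.0).reshape(3, 3)   # a sample 3 x 3 matrix

F = np.eye(3); F[[1, 2]] = F[[2, 1]]     # R2 <-> R3
G = np.eye(3); G[0, 0] = 2               # R1 -> 2 R1
H = np.eye(3); H[2, 0] = 3               # R3 -> R3 + 3 R1

# Multiplying on the left performs the corresponding row operation on A
B = A.copy(); B[[1, 2]] = B[[2, 1]]
print(np.allclose(F @ A, B))             # True

C = A.copy(); C[2] += 3 * A[0]
print(np.allclose(H @ A, C))             # True
```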
68
Example
[ 1  2 ]  ∼  [ 1   2 ]  ∼  [ 1  2 ]
[ 3  4 ]     [ 0  −2 ]     [ 0  1 ]

[ 1   0   ] [  1  0 ] [ 1  2 ]     [ 1  2 ]
[ 0  −1/2 ] [ −3  1 ] [ 3  4 ]  =  [ 0  1 ]
69
If A ∼ I, then there is a sequence of elementary matrices E1, E2, . . . , En such that
EnEn−1 · · · E2E1A = I
This can be used to prove the following
Theorem (AR 1.5.3)
1. Let A be an n × n matrix. Then
A is invertible ⇐⇒ A ∼ In
2. If A and B are n × n matrices such that AB = In, then B is invertible and B−1 = A.
3. Every invertible matrix can be written as a product of elementary matrices.
This theorem justifies why our procedure for finding inverses using row operations actually works... Can you see why?
70
Matrix transpose
Definition (Transpose of a matrix)
Let A be an m × n matrix. The transpose of A, denoted by AT, is defined to be the n × m matrix whose entries are given by
(AT)ij = Aji
That is, AT is obtained by interchanging the rows and columns of A.
Example
Let
A = [ 1  2  3 ]
    [ 4  5  6 ]
Then AT is
71
Properties of the transpose
1. (AT)T = A
2. (A + B)T = AT + BT (whenever A + B is defined)
3. (αA)T = αAT (where α is a scalar)
4. (AB)T = BTAT (whenever AB is defined)
5. (AT)−1 = (A−1)T (whenever A−1 is defined)
Parts 1, 2 and 3 follow easily from the definition.
To prove part 4, we also use the definition of matrix multiplication given previously.
72
Example
Using part 4 above, prove part 5: (AT)−1 = (A−1)T.
73
Linear systems revisited
Consider the linear system
x + 2y + z = −3
−x − y + z = 11
y + 3z = 21
We could represent the system by an augmented matrix:
[  1   2  1 | −3 ]
[ −1  −1  1 | 11 ]
[  0   1  3 | 21 ]
or, we could write it as a matrix equation:
[  1   2  1 ] [ x ]     [ −3 ]
[ −1  −1  1 ] [ y ]  =  [ 11 ]
[  0   1  3 ] [ z ]     [ 21 ]
74
For convenience, label
(i) the matrix of coefficients as A,
(ii) the column matrix of variables as x,
(iii) the right hand side column matrix as b.
Using this notation, the linear system can be written as
Ax = b
Solving linear systems using matrix inverses
Theorem
Suppose A is an invertible matrix. Then the linear system Ax = b has exactly one solution, and it is given by x = A−1b.
Proof: Ax = b =⇒ A−1Ax = A−1b =⇒ x = A−1b
75
Example
Use a matrix inverse to solve the linear system
x + 2y + z = −3
−x − y + z = 11
y + 3z = 21
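Aside (not in the notes): the same computation in NumPy, assuming it is available. (In numerical practice `np.linalg.solve` is preferred over forming the inverse explicitly, but the inverse mirrors the theorem above.)

```python
import numpy as np

A = np.array([[ 1,  2, 1],
              [-1, -1, 1],
              [ 0,  1, 3]], dtype=float)
b = np.array([-3, 11, 21], dtype=float)

x = np.linalg.inv(A) @ b     # x = A^(-1) b
print(x)                     # [ 20. -18.  13.]
```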
76
Another example
Solve the following system of linear equations:
x + 2y + z = a
−x − y + z = b
y + 3z = c
77
2.4 Rank of a matrix
Definition (Rank)
The rank of a matrix A is the number of non-zero rows in the reduced row-echelon form of A.
Note
This is the same as the number of non-zero rows in a row-echelon form of A.
If A has size m × n, then rank(A) ≤ m and rank(A) ≤ n.
Example
Find the rank of each of the following matrices:
[  1   2  1 ]      [ 1  −1  2   1 ]
[ −1  −1  1 ]      [ 0   1  1  −2 ]
[  0   1  3 ]      [ 1  −3  0   5 ]
78
Theorem
The linear system Ax = b, where A is an m × n matrix, has:
1. No solution if rank(A) < rank([A | b])
2. A unique solution if rank(A) = rank([A | b]) and rank(A) = n
3. Infinitely many solutions if rank(A) = rank([A | b]) and rank(A) < n
Proof: This is just a restatement of the results in section 1.3.
Note
It is always the case that rank(A) ≤ rank([A | b]) and rank(A) ≤ n.
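Aside (not part of the course material): the theorem can be turned into a small classifier using `np.linalg.matrix_rank`, which is a numerical rank estimate rather than the exact row-reduction count used in the notes.

```python
import numpy as np

def classify(A, b):
    """Classify the system Ax = b using the rank criteria above."""
    n = A.shape[1]
    rank_A = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_A < rank_aug:
        return "no solution"
    return "unique solution" if rank_A == n else "infinitely many solutions"

# The 'yet another example' system from section 1.3
A = np.array([[1, 1, 1], [2, 1, 2], [3, 2, 3]], dtype=float)
b = np.array([4, 9, 13], dtype=float)
print(classify(A, b))   # "infinitely many solutions"
```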
79
Theorem
If A is an n × n matrix, the following conditions are equivalent:
1. A is invertible
2. Ax = b has a unique solution for any b
3. The rank of A is n
4. The reduced row-echelon form of A is In
Proof:
1 ⇒ 2: We've seen this before (slide 75).
2 ⇒ 3: Follows from the previous theorem (or what we already knew about linear systems).
3 ⇒ 4: Immediate from the definition of rank, and the fact that A is square.
4 ⇒ 1: Let R be the RREF of A. Then R = EA, where E = EkEk−1 . . . E1 is a product of elementary matrices. So we have I = EA. We have already noted that this implies that A is invertible (and A−1 = E).
80
2.5 Solutions of non-homogeneous linear equations
Suppose that A is an m × n matrix (of coefficients), that x = [x1, . . . , xn]T is an n × 1 matrix (of unknowns), and that b is an m × 1 matrix (of constants).
The general solution to the system of equations
Ax = b
is given by
x = xh + x0
where x0 is any one solution to Ax = b and xh varies through all solutions to the homogeneous equations Ax = 0.
81
Example
The equations
[ 1  0  −1   1 ] [ x1 ]     [ 1 ]
[ 0  1  −1  −1 ] [ x2 ]  =  [ 2 ]     (Ax = b)
[ 0  0   0   0 ] [ x3 ]     [ 0 ]
                 [ x4 ]
have a solution x1 = 1, x2 = 2, x3 = 0, x4 = 0.
The general solution to the homogeneous equations Ax = 0 is given by
So the general solution to the original equations is
82
2.6 Determinants [AR 2.1–2.3]
When we calculate the inverse of the 2 × 2 matrix
A = [ a  b ]
    [ c  d ]
we find that we need to invert the number ad − bc.
If ad − bc ≠ 0 then we can find the inverse of A; if not, then A is not invertible.
So this number plays an important role when we study A. We call it the determinant of A.
The determinant is a function that associates a real number to any square matrix.
83
Defining the determinant
The determinant of a 2 × 2 matrix
A = [ a  b ]
    [ c  d ]
is given by
det(A) = ad − bc
Furthermore, the matrix A is invertible iff det(A) ≠ 0.
Definition (Determinant)
Let A be an n × n matrix. The determinant of A, denoted det(A) or |A|, can be defined as the signed sum of all the ways to multiply together n entries of the matrix, with all chosen from different rows and columns.
84
To determine the sign of the products, imagine that all but the elements in the product in question are set to zero in the matrix. Now swap columns until a diagonal matrix results. If the number of swaps required is even, then the product has a + sign, while if it is odd, it is to be given a − sign.
Determinants of 2 × 2 matrices
For
A = [ a  b ]
    [ c  d ]
the possible products are ad and bc.
When we set all entries other than a and d to zero, the matrix is diagonal; so ad has a + sign. When we set all entries other than b and c to zero, we need to interchange the two columns to obtain a diagonal matrix; so bc is given a minus sign.
85
Determinant of a 3 × 3 matrix
Suppose
A = [ a11  a12  a13 ]
    [ a21  a22  a23 ]
    [ a31  a32  a33 ]
We can form a table of all the products of 3 entries taken from different rows and columns, together with the signs:
product      sign
a11a22a33    +
a11a23a32    −
a12a21a33    −
a12a23a31    +
a13a21a32    +
a13a22a31    −
Hence
det(A) = a11a22a33 − a11a23a32 − a12a21a33 + a12a23a31 + a13a21a32 − a13a22a31
86
The formula for 3 × 3 matrices is complicated, and it quickly becomes worse as the size of the matrix increases. But there are better ways to calculate determinants.
Submatrices and cofactors
Here is one way for practical calculation of (small) determinants.
Definition (Submatrix)
Let A be an m × n matrix. The (i, j)-submatrix of A, denoted by A(i, j), is the (m − 1) × (n − 1) matrix obtained by deleting the ith row and jth column from A.
Example
If
A = [  1   2  1 ]
    [ −1  −1  1 ]
    [  0   1  3 ]
then
A(2, 3) = [ 1  2 ]
          [ 0  1 ]
87
Definition (Cofactor)
Let A be a square matrix. The (i, j)-cofactor of A, denoted by Cij, is defined by
Cij = (−1)i+j det(A(i, j))
Example (continued)
A(2, 3) = [ 1  2 ]
          [ 0  1 ]
so det(A(2, 3)) = 1 and C23 =
88
Cofactor Expansion
We can write the determinant of a 3 × 3 matrix A in terms of cofactors:
det(A) = a11C11 + a12C12 + a13C13
This is called the cofactor expansion along the first row of A.
Example
Calculate
det [  1   2  1 ]
    [ −1  −1  1 ]
    [  0   1  3 ]
89
Theorem
The determinant of an n × n matrix A can be computed by multiplying the entries in any row (or column) by their cofactors and adding the resulting products.
That is, for each 1 ≤ i ≤ n and 1 ≤ j ≤ n,
det(A) = ai1Ci1 + ai2Ci2 + · · · + ainCin
(this is called cofactor expansion along the ith row)
and
det(A) = a1jC1j + a2jC2j + · · · + anjCnj
(this is called cofactor expansion along the jth column)
Proof: We can prove this in the n = 3 case using the formula on slide 86. The proof for the general case is essentially the same, but gets more technical...
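Aside (not in the notes): cofactor expansion along the first row translates into a short recursive function. It is a direct transcription of the theorem, not an efficient algorithm (the cost grows like n!).

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row:
    det(A) = a11*C11 + a12*C12 + ... + a1n*C1n, applied recursively."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # (1, j)-submatrix: delete row 1 and column j
        sub = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        # sign (-1)^(1+j) becomes (-1)^j with 0-based indexing
        total += (-1) ** j * A[0, j] * det_cofactor(sub)
    return total

A = np.array([[ 1,  2, 1],
              [-1, -1, 1],
              [ 0,  1, 3]], dtype=float)
print(det_cofactor(A))                                # 1.0
print(np.isclose(det_cofactor(A), np.linalg.det(A)))  # True
```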
90
Example
Calculate the determinant of
A = [  1   2  1 ]
    [ −1  −1  1 ]
    [  0   1  3 ]
(i) using the cofactor expansion along the 2nd column,
(ii) using the cofactor expansion along the 3rd row.
Which one was easier?
91
How do you remember the sign of the cofactor?
The (1, 1)-cofactor always has sign +. Starting from there, imagine walking to the square you want using either horizontal or vertical steps. The appropriate sign will change at each step.
We can visualise this arrangement with the following matrix:
[ +  −  +  −  . . . ]
[ −  +  −  +  . . . ]
[ +  −  +  −  . . . ]
[ −  +  −  +  . . . ]
[ ⋮            ⋱   ]
So, for example, C13 is assigned + but C32 is assigned −.
92
Example
Calculate
| 1  −2  0  1 |
| 3   1  2  0 |
| 1   0  1  0 |
| 2  −2  1  2 |
93
Some properties of determinants:
Suppose A and B are square matrices. Then
1. det(AT) = det(A)
2. If A has a row or column of zeros, then det(A) = 0
3. det(AB) = det(A) det(B)
4. If A is invertible, then det(A) ≠ 0 and det(A−1) = 1/det(A)
5. If A is singular, then det(A) = 0
6. If A = [ C  ∗ ]
          [ 0  D ]
with C and D both square, then det(A) = det(C) det(D)
Idea of proof: 2: Cofactor expansion. 4: Follows from 3. 5: Follows from 3, 4 and the theorem on slide 80. 1, 3, 6: Need to use the definition...
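Aside (not in the notes): properties 1, 3 and 4 are easy to spot-check numerically on random matrices (a random Gaussian matrix is invertible with probability 1, so the inverse in the last check exists).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Property 1: det(A^T) = det(A)
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))

# Property 3: det(AB) = det(A) det(B)
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))

# Property 4: det(A^(-1)) = 1 / det(A)
assert np.isclose(np.linalg.det(np.linalg.inv(A)),
                  1 / np.linalg.det(A))
print("determinant properties verified for random matrices")
```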
94
Row operations and determinants
Calculating determinants via cofactors becomes a very large calculation as the size of the matrix increases.
We need a better way to calculate larger determinants.
For some types of matrix it is easy to write down their determinant.
Definition (Triangular Matrix)
A matrix is said to be upper triangular (respectively lower triangular) if all the elements below (respectively above) the main diagonal are zero.
Examples
[ 2  −1  9 ]   [ 2   0  0 ]   [ 2  0  0 ]
[ 0   3  2 ]   [ 1   3  0 ]   [ 0  3  0 ]
[ 0   0  2 ]   [ 2  −3  2 ]   [ 0  0  2 ]
95
Theorem
If A is an n × n triangular matrix, then det(A) is the product of the entries on the main diagonal of A.
Idea of proof: (Repeated) cofactor expansion along the first column.
Example
Let
A = [ 2  −10  92  −117 ]
    [ 0    3  28   −31 ]
    [ 0    0  −1    27 ]
    [ 0    0   0     2 ]
What is det(A)?
96
We can use row operations to manipulate a matrix into triangular form in order to make the determinant calculation easier.
Recall that an n × n matrix is an elementary matrix if it can be obtained from In by performing a single elementary row operation. Multiplying on the left by an elementary matrix is equivalent to applying the corresponding elementary row operation.
Using elementary matrices we can check the effects of row operations on the determinant of a matrix.
97
Example
Let
A = [ a  b  c ]
    [ d  e  f ]
    [ g  h  i ]
Row swap:
E = [ 1  0  0 ]        EA = [ a  b  c ]
    [ 0  0  1 ]             [ g  h  i ]
    [ 0  1  0 ]             [ d  e  f ]
det(EA) = det(E) det(A) = − det(A)
98
Multiply a row by a scalar:
F = [ 2  0  0 ]        FA = [ 2a  2b  2c ]
    [ 0  1  0 ]             [  d   e   f ]
    [ 0  0  1 ]             [  g   h   i ]
det(FA) = det(F) det(A) = 2 det(A)
Add a constant multiple of one row to another row:
G = [ 1  0  0 ]        GA = [    a       b       c    ]
    [ 0  1  0 ]             [    d       e       f    ]
    [ 3  0  1 ]             [ g + 3a  h + 3b  i + 3c ]
det(GA) = det(G) det(A) = det(A)
99
The effect of each of the three types of elementary row operations is given by the following.
Theorem
Let A be a square matrix.
1. If B is obtained from A by swapping two rows (or two columns) ofA, then det(B) = − det(A)
2. If B is obtained from A by multiplying a row (or column) of A bythe scalar α, then det(B) = α det(A)
3. If B is obtained from A by replacing a row (or column) of A by itself plus a multiple of another row (column), then det(B) = det(A)

Proof: The corresponding elementary matrices have determinants −1, α and 1 (respectively). Then use det(B) = det(E) det(A).
100
Example
Calculate det

[  1  2 1 ]
[ −1 −1 1 ]
[  0  1 3 ]
101
Example
Calculate
∣ 1 −2  0  1 ∣
∣ 3  1  2  0 ∣
∣ 1  0  1  0 ∣
∣ 2 −2  1  2 ∣
102
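The row-operation rules above can be sketched in code (again, Python is outside the course; the function name `det_by_row_reduction` is ours). The idea is exactly the method of these slides: reduce to upper triangular form, track sign changes from row swaps, and multiply the diagonal. Exact `Fraction` arithmetic avoids floating-point error.

```python
from fractions import Fraction

def det_by_row_reduction(M):
    """Reduce M to upper triangular form while tracking how each
    row operation changes the determinant: a row swap flips the sign,
    adding a multiple of one row to another changes nothing."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    sign = 1
    for col in range(n):
        # find a pivot at or below the diagonal, swapping rows if needed
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # no pivot in this column: det = 0
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign                # row swap negates the determinant
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    product = Fraction(1)
    for i in range(n):
        product *= A[i][i]              # triangular det = diagonal product
    return sign * product

d = det_by_row_reduction([[1, -2, 0, 1],
                          [3,  1, 2, 0],
                          [1,  0, 1, 0],
                          [2, -2, 1, 2]])
```

This is the determinant of the 4 × 4 example above, so you can check your hand calculation against `d`.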
We collect here some properties of the determinant function, most of which we’ve already noted.
Theorem
Let A be an n × n matrix. Then,
1. det(Aᵀ) = det(A)
2. det(AB) = det(A) det(B)
3. det(αA) = αn det(A)
4. If A is a triangular matrix, then its determinant is the product ofthe elements on the main diagonal
5. If A has a row (or column) of zeros, then det(A) = 0
6. If A has a row (or column) which is a scalar multiple of anotherrow (or column) then det(A) = 0
7. A is singular iff det(A) = 0 (and A is invertible iff det(A) ≠ 0)
103
Example (showing how to prove property 3 above)
Let A be a 3 × 3 matrix. Show that det(αA) = α3 det(A).
104
Topic 3: Euclidean Vector Spaces
There are quite a few things to cover:
3.1 Vectors in Rn
3.2 Dot product
3.3 Cross product of vectors in R3
3.4 Geometric applications
3.5 Linear combinations
3.6 Subspaces of Rn
3.7 Bases and dimension
3.8 Rank-nullity theorem
3.9 Coordinates relative to a basis
105
3.1 Vectors in Rn [AR 3.1]
Geometrically, a pair of real numbers (a, b) can be thought of as representing a directed line segment from the origin in the plane.
These can be added together, and multiplied by a real number.
Algebraically, these operations are given by:
vector addition: (a, b) + (c , d) = (a + c , b + d)
scalar multiplication: α(a, b) = (αa, αb)
Notation:
R2 = {(a, b) | a, b ∈ R}
   = the set of all ordered pairs of real numbers
106
The algebraic approach to vector addition and scalar multiplication extends to 3 dimensions or more.
Rn = {(x1, x2, . . . , xn) | xi ∈ R for i = 1, 2, . . . , n}
   = the set of all n-tuples of real numbers
We will refer to elements of Rn as vectors.
We can add two vectors together and multiply a vector by a scalar.
vector addition : (x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn)
scalar multiplication : α(x1, . . . , xn) = (αx1, . . . , αxn)
107
Notation
We often denote by i, j, and k the vectors in R3 given by:
i = (1, 0, 0) j = (0, 1, 0) k = (0, 0, 1)
Any vector in R3 can be written in terms of i, j and k:
u = (u1, u2, u3) = u1 i + u2 j + u3 k
108
Definition (magnitude)
The length (or magnitude or norm) of a vector u = (u1, u2, . . . , un) ∈ Rn is given by

‖u‖ = √(u1² + u2² + · · · + un²)

It follows from Pythagoras’ theorem that this corresponds to our geometric idea of length for vectors in R2 and R3.
Example
Let u = 2i − j + 2k. ‖u‖ =
A vector having length equal to 1 is called a unit vector.
Example
Find a unit vector parallel to u.
109
Definition
The distance between two vectors u, v ∈ Rn is given by
d(u, v) = ‖v − u‖
Geometrically, if we consider u and v to be the position vectors of points P and Q, then d(u, v) is the distance between P and Q.
Example
Find the distance between the points P(1, 3,−1) and Q(2, 1,−1).
110
3.2 Dot product [AR 3.3]
Let u = (u1, u2, . . . , un) ∈ Rn and v = (v1, v2, . . . , vn) ∈ Rn be two vectors in Rn.
Definition (Dot product)
We define the dot product (or scalar product or Euclidean inner product) by
u · v = u1v1 + u2v2 + · · · + unvn
Examples
(3,−1) · (1, 2) =
(i + j + k) · (−j + k) =
111
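The component formula translates directly into code. A minimal sketch (Python is not used in this subject; the helper name `dot` is ours) evaluating the two examples above:

```python
def dot(u, v):
    """Euclidean dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

first = dot((3, -1), (1, 2))       # (3)(1) + (-1)(2)

# i + j + k = (1, 1, 1) and -j + k = (0, -1, 1)
second = dot((1, 1, 1), (0, -1, 1))
```

Running it fills in the two blanks above; note the second pair of vectors is perpendicular.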
The angle between two vectors can be defined in terms of the dot product.
Definition (Angle)
The angle θ between two vectors u, v ∈ Rn is given by the expression
u · v = ‖u‖‖v‖ cos θ
The angle defined in this way is exactly the usual angle between two vectors in R2 or R3.
That our definition of angle makes sense relies on the following
Theorem (Cauchy-Schwarz Inequality for Rn)
Let u, v be vectors. Then
|u · v| ≤ ‖u‖ ‖v‖

with equality holding precisely when u is a multiple of v.
Proof: Consider the quadratic polynomial given by (u + tv) · (u + tv)...
112
Example
Find the angle between the vectors u = 2i + j − k and v = i + 2j + k.
113
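Rearranging the defining formula gives θ = arccos(u · v / (‖u‖ ‖v‖)), which is easy to evaluate. A rough sketch (the function name `angle` is ours, not from the text), applied to the example above:

```python
import math

def angle(u, v):
    """Angle between u and v, from u . v = |u| |v| cos(theta)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.acos(dot / (norm_u * norm_v))

# u = 2i + j - k and v = i + 2j + k from the example above
theta = angle((2, 1, -1), (1, 2, 1))
```

By Cauchy-Schwarz the argument of `acos` always lies in [−1, 1], so the call is well defined (up to floating-point rounding at the endpoints).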
Properties of the dot product
1. u · v is a scalar
2. u · v = v · u
3. u · (v + w) = u · v + u · w
4. u · u = ‖u‖²
5. (αu) · v = α(u · v)
Note
Suppose u = (u1, . . . , un) and v = (v1, . . . , vn) are two vectors in Rn. If we write each as a row matrix U = [u1 · · · un], V = [v1 · · · vn], then

u · v = UVᵀ
114
3.3 Cross product of vectors in R3 [AR 3.4]
Let u = (u1, u2, u3) and v = (v1, v2, v3) be two vectors in R3.
Definition (Cross product)
The cross product (or vector product) of u and v is the vector given by
u × v = (u2v3 − u3v2)i + (u3v1 − u1v3) j + (u1v2 − u2v1)k
A convenient way to remember this is as a “determinant”
        ∣ i  j  k  ∣
u × v = ∣ u1 u2 u3 ∣
        ∣ v1 v2 v3 ∣

        ∣ u2 u3 ∣     ∣ u1 u3 ∣     ∣ u1 u2 ∣
      = ∣ v2 v3 ∣ i − ∣ v1 v3 ∣ j + ∣ v1 v2 ∣ k
using cofactor expansion along the first row.
115
Geometry of the cross product
u × v = ‖u‖ ‖v‖ sin(θ) n

[diagram: u, v, the unit normal n and the angle θ]

n is a unit vector (i.e., ‖n‖ = 1)
n is perpendicular to both u and v (i.e., n · u = 0 and n · v = 0)
n points in the direction given by the right-hand rule
θ ∈ [0, π] is the angle between u and v
If θ = 0 or θ = π, then u × v = 0
Example
Find a vector perpendicular to both (2, 3, 1) and (1, 1, 1).
116
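The component formula for the cross product can be sketched directly (Python is outside the course; the helper names are ours). Applied to the example above, the result should be perpendicular to both inputs, which the code checks via the dot product:

```python
def cross(u, v):
    """Cross product of two vectors in R^3, by the component formula."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2,
            u3 * v1 - u1 * v3,
            u1 * v2 - u2 * v1)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

n = cross((2, 3, 1), (1, 1, 1))
# n must be perpendicular to both input vectors
assert dot(n, (2, 3, 1)) == 0 and dot(n, (1, 1, 1)) == 0
```

Any non-zero scalar multiple of `n` is an equally valid answer to the example.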
Properties of the cross product
1. v × u = −(u × v)
2. u × (v + w) = (u × v) + (u × w)
3. (αu) × v = α(u × v)
4. u × 0 = 0
5. u × u = 0
6. u · (u × v) = 0
Note
The cross product is defined only for R3. Unlike the dot product and many of the other properties we are considering, it does not extend to Rn in general.
117
3.4 Geometric applications [AR 3.5]
Basic applications
Suppose u, v ∈ R3 are two vectors.
1. u and v are perpendicular precisely when u · v = 0
2. The area of the parallelogram defined by u and v is equal to ‖u × v‖
Note
If u and v are elements of R2 with u = (u1, u2) and v = (v1, v2), then

                                          ∣ u1 u2 ∣
area of parallelogram = absolute value of ∣ v1 v2 ∣
118
Example
Find the area of the triangle with vertices (2,−5, 4), (3,−4, 5) and (3,−6, 2).
119
3. Assuming u ≠ 0, the projection of v onto u is given by

   proju v = ((u · v)/‖u‖²) u

[diagram: v, proju v along u, and v − proju v]
Notice that:
If we set û = u/‖u‖, then proju v = (û · v) û
u · (v − proju v) =
120
Example
Let w = (2,−1,−2) and v = (2, 1, 3).
Find vectors v1 and v2 such that
v = v1 + v2,
v1 is parallel to w, and
v2 is perpendicular to w.
121
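This decomposition can be sketched numerically from the projection formula (Python is not part of the subject; `Fraction` keeps the arithmetic exact so the perpendicularity check is an exact equality, not a floating-point one):

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj(u, v):
    """Projection of v onto u: ((u . v) / |u|^2) u, assuming u != 0."""
    c = Fraction(dot(u, v), dot(u, u))
    return tuple(c * a for a in u)

w = (2, -1, -2)
v = (2, 1, 3)
v1 = proj(w, v)                                    # parallel to w
v2 = tuple(a - b for a, b in zip(v, v1))           # v2 = v - v1
assert dot(v2, w) == 0                             # v2 is perpendicular to w
assert tuple(a + b for a, b in zip(v1, v2)) == v   # v = v1 + v2
```

The same pattern works for any v and non-zero w in Rn.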
Scalar triple product
Notice that
              ∣ u1 u2 u3 ∣
u · (u × v) = ∣ u1 u2 u3 ∣ = 0
              ∣ v1 v2 v3 ∣
Similarly
u · (v × u) = v · (u × v) = v · (v × u) = 0
In general, u · (v × w) is called the scalar triple product and is given by

              ∣ u1 u2 u3 ∣
u · (v × w) = ∣ v1 v2 v3 ∣
              ∣ w1 w2 w3 ∣
122
Suppose u, v,w ∈ R3 are three vectors.
The parallelepiped defined by u, v and w

[diagram: parallelepiped with adjacent edges u, v and w]

has volume equal to the absolute value of the scalar triple product of u, v and w:

                                                             ∣ u1 u2 u3 ∣
volume of parallelepiped = |u · (v × w)| = absolute value of ∣ v1 v2 v3 ∣
                                                             ∣ w1 w2 w3 ∣
123
Example
Find the volume of the parallelepiped with adjacent edges PQ, PR, PS, where the points are: P(2,−1, 1), Q(4, 6, 7), R(5, 9, 7) and S(8, 8, 8).
124
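This volume computation can be sketched directly from the scalar triple product (helper names are ours; the edge vectors are formed by subtracting the coordinates of P from those of Q, R and S):

```python
def cross(u, v):
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def triple(u, v, w):
    """Scalar triple product u . (v x w)."""
    return sum(a * b for a, b in zip(u, cross(v, w)))

P, Q, R, S = (2, -1, 1), (4, 6, 7), (5, 9, 7), (8, 8, 8)
PQ = tuple(q - p for p, q in zip(P, Q))
PR = tuple(r - p for p, r in zip(P, R))
PS = tuple(s - p for p, s in zip(P, S))

# the volume is the absolute value of the (possibly negative) triple product
volume = abs(triple(PQ, PR, PS))
```

The sign of the raw triple product only records the orientation of the three edges, which is why the absolute value is taken.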
Lines
Vector equation of a line
The vector equation of a line through a point P0 in the direction determined by a vector v is

r = r0 + tv,   t ∈ R

where r0 = OP0

[diagram: line through P0 with direction v; position vectors r0 and r = r0 + tv]

Each point on the line is given by a unique value of t.
125
Letting r = (x, y, z), r0 = (x0, y0, z0) and v = (a, b, c) the equation becomes

(x, y, z) = (x0, y0, z0) + t(a, b, c)

Parametric equations for a line
Equating components gives the parametric equations for the line:

x = x0 + ta
y = y0 + tb     t ∈ R
z = z0 + tc
Example
What is the parametric form of the line passing through the points P(−1, 2, 3) and Q(4,−2, 5)?
126
Cartesian equations for a line
If a ≠ 0, b ≠ 0 and c ≠ 0, we can solve the parametric equations for t and equate. This gives the cartesian form of the straight line:

(x − x0)/a = (y − y0)/b = (z − z0)/c
Example
What is the cartesian form of the line passing through the points P(−1, 2, 3) and Q(4,−2, 5)?
127
Example
Find the vector equation of the line whose cartesian form is

(x + 1)/5 = (y − 3)/(−1) = (z − 4)/2
128
Definition
Two lines are said to:
intersect if there is a point lying in both
be parallel if their direction vectors are parallel
The angle between two lines is the angle between their direction vectors
Example
Find the vector equation of the line through the point P(0, 0, 1) that is parallel to the line given by

(x − 1)/1 = (y + 2)/2 = (z − 6)/2
129
Planes
A plane is determined by a point P0 on it and any two non-parallel vectors u and v lying in the plane.
Vector equation of a plane
Letting r0 = OP0, the vector equation of the plane is

r = r0 + su + tv,   s, t ∈ R

[diagram: plane through P0 spanned by u and v; position vector r = r0 + su + tv]

Particular values of s and t determine a point on the plane, and conversely every point on the plane has position vector given by some s and t.
130
The vector

n = (u × v)/‖u × v‖

is a unit vector that is perpendicular to the plane.
Such a vector is called a unit normal vector to the plane.
The angle between two planes is given by the angle between their (unit) normal vectors.
If P is a point with position vector r, then:
P lies on the plane ⇐⇒ r − r0 is parallel to plane
⇐⇒ r − r0 is perpendicular to n
131
It follows that the equation of the plane can also be written as

(r − r0) · n = 0

[diagram: plane with normal n; position vectors r0 and r, with r − r0 lying in the plane]
Writing r = (x, y, z) and n = (a, b, c), we obtain the Cartesian equation of the plane
ax + by + cz = d
132
Examples
1. The plane perpendicular to the direction (1, 2, 3) and through the point (4, 5, 6) is given by x + 2y + 3z = d where d = 1 × 4 + 2 × 5 + 3 × 6. That is
x + 2y + 3z = 32
2. What is the equation of the plane perpendicular to (1, 0,−2) and containing the point (1,−1,−3)?
133
3. The plane through (1, 1, 1) containing vectors parallel to (1, 0, 1) and (0, 1, 2) is the set of all vectors of the form

   (1, 1, 1) + s(1, 0, 1) + t(0, 1, 2)   s, t ∈ R
4. Find the Cartesian equation of the plane with vector form(x , y , z) = (1, 1, 1) + s(1, 0, 1) + t(0, 1, 2).
5. Find the vector and cartesian equations of the plane containing the three points P(2, 1,−1), Q(3, 0, 1), and R(−1, 1,−1).
134
Intersection of a line and a plane
Example
Where does the line

(x − 1)/1 = (y − 2)/2 = (z − 3)/3
meet the plane 3x + 2y + z = 20?
A typical point of the line is
Putting this into the equation for the plane, we get
This gives the point of intersection as (2, 4, 6)
135
Intersection of two planes
Example
What is the cartesian equation of the line of intersection of the two planes x + 3y + 2z = 6 and 3x + 2y + z = 11?
The direction of the line is perpendicular to both normals and so is given by
A point on the line is given by solving the two equations. For example:
Thus the equation of the line is
Or you could solve the two equations (for the planes) simultaneously.
136
Distance from a point to a line
Example
Find the distance from the point P(2, 1, 1) to the line with cartesian equation

(x − 2)/1 = (y − 1)/1 = z/2
137
3.5 Linear Combinations [AR 5.2]
We have seen that a plane through the origin in R3 can be built up by
starting with two vectors and taking all scalar multiples and sums.
In this way, we can build up lines, planes and their higher dimensionalversions.
Definition (Linear combination)
A linear combination of vectors v1, v2, . . . , vk ∈ Rn is a vector of the form

w = α1v1 + α2v2 + · · · + αkvk
where α1, α2, . . . , αk are scalars (i.e., real numbers).
138
Examples
1. w = (2, 3) is a linear combination of e1 = (1, 0) and e2 = (0, 1)
2. w = (2, 3) is a linear combination of v1 = (1, 2) and v2 = (3, 1)
3. w = (1, 2, 3) is not a linear combination of v1 = (1, 1, 1) and v2 = (0, 0, 1)
139
Linear Dependence [AR 5.3]
By taking all linear combinations of a given set of vectors, we can build up lines, planes, etc.

We can make this more efficient by removing any redundant vectors from the set.
Definition (Linear dependence)
A collection of vectors S is linearly dependent if there are vectors v1, . . . , vk ∈ S and scalars α1, . . . , αk such that
α1v1 + · · · + αkvk = 0
and at least one of the αi is non-zero.
RememberThe zero vector 0 = (0, . . . , 0) is not the same as the number zero.
140
So, the set of vectors is linearly dependent if and only if some non-trivial linear combination gives the zero vector.
Theorem (AR Thm 5.3.1)
Vectors v1, . . . , vk (where k ≥ 2) are linearly dependent iff one vector is a linear combination of the others.
Proof:
Two vectors are linearly dependent iff one is a multiple of the other.
Three vectors in R3 are linearly dependent iff they lie in a plane.
141
Definition (Linear independence)
A set of vectors is called linearly independent if it is not linearlydependent.
Examples
1. The vectors (2,−1) and (−6, 3) are linearly dependent.
2. (2,−1) and (4, 1) are linearly independent.
142
Examples
1. The vectors (1, 0, 0), (1, 1, 0) and (1, 1, 1) are linearly independent.
2. (1, 1, 2), (1,−1, 2), (3, 1, 6) are linearly dependent.
3. (1, 2, 3), (1, 0, 0), (0, 1, 0), (0, 1, 1) are linearly dependent.
143
To decide if vectors v1, . . . , vk ∈ Rn are linearly independent
1. Form the matrix A having the vectors as columns(we’ll denote it by A = [v1 · · · vk ])
2. Reduce to row-echelon form R
3. Count the number of non-zero rows in R (this is rank(A)):
v1, . . . , vk are linearly independent ⇐⇒ rank(A) = k
Why does this method work?
It follows from what we know about the number of solutions of a linear system (slide 79)
144
Examples
1. (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0) ∈ R4 are linearly independent
2. (1, 1), (2, 3), (1, 0) ∈ R2 are linearly dependent
145
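The three-step method above can be sketched as code (Python is outside the course; the helper names `rank` and `independent` are ours). The function forms the matrix with the given vectors as columns, row-reduces with exact arithmetic, and counts the non-zero rows:

```python
from fractions import Fraction

def rank(vectors):
    """Rank of the matrix whose columns are the given vectors,
    computed by row reduction to echelon form."""
    # build the matrix with the vectors as columns
    rows = [[Fraction(v[i]) for v in vectors] for i in range(len(vectors[0]))]
    r = 0  # index of the next pivot row
    for col in range(len(vectors)):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                    # no leading entry in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def independent(vectors):
    # linearly independent <=> rank equals the number of vectors
    return rank(vectors) == len(vectors)

assert independent([(1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0)])   # example 1
assert not independent([(1, 1), (2, 3), (1, 0)])                 # example 2
```

The second example also illustrates Theorem 5.3.3 below: three vectors in R2 can never be independent.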
An important observation
If A and B are row-equivalent matrices, then the columns of A satisfy the same linear relations as the columns of B. This is the same as saying that the two systems of linear equations have the same set of solutions.

This implies that relations between the (original) vectors v1, . . . , vk can be read from the reduced row-echelon form of the matrix [v1 · · · vk ].
Example
If

[v1 · · · v5] ∼ [ 1 3 0 2 0 ]
                [ 0 0 1 1 0 ]
                [ 0 0 0 0 1 ]
then v2 = 3v1 and v4 = 2v1 + v3.
146
Let’s illustrate this with an example.

Example
The vectors v1 = (1, 2, 1, 3), v2 = (2, 4, 2, 6), v3 = (0,−1, 3,−1), v4 = (1, 3,−2, 4) satisfy v2 = 2v1 and v4 = v1 − v3
147
Useful Facts:
From the method above for deciding whether vectors are linearly dependent we can derive the following.
Theorem (AR Thm 5.3.3)
If k > n, then any k vectors in Rn are linearly dependent.
Idea of proof: Because then the reduced row-echelon form of the matrix [v1 · · · vk ] must contain columns which do not have a leading entry. So these columns will be dependent on other columns which do have a leading entry.
148
Theorem
Vectors v1, . . . , vn ∈ Rn are linearly independent iff the matrix A = [v1 · · · vn] has det(A) ≠ 0.
Idea of proof:
linearly indep ⇐⇒ rank(A) = n ⇐⇒ det(A) ≠ 0
149
Example
Decide whether the following vectors in R3 are linearly dependent or independent:
(1, 2,−1), (0, 3, 4), (2, 1,−6), (0, 0, 2)
If they are dependent, write one as a linear combination of the others.
150
3.6 Subspaces of Rn [AR 5.2]
Certain subsets S of Rn have the nice property that any linear
combination of vectors from S still lies in S .
Definition (Subspace)
A subspace of Rn is a subset S of Rn that satisfies:
0. S is non-empty
1. u,w ∈ S =⇒ u + w ∈ S (closed under vector addition)
2. u ∈ S , α ∈ R =⇒ αu ∈ S (closed under scalar multiplication)
151
Examples
1. The xy-plane S = {(x, y, z) ∈ R3 | z = 0} is a subspace of R3
2. The plane in R3 through the origin and parallel to the vectors
u = (1, 1, 1) and v = (1, 2, 3) is a subspace of R3
3. The set H = {(x, y) ∈ R2 | x ≥ 0, y ≥ 0} is not a subspace of R2
4. Any line or plane in R3 that contains the origin is a subspace.
152
Another example
Show that the points on the line y = 2x form a subspace of R2
Note
Every subspace of Rn contains the zero vector.
So the line y = 2x + 1 is not a subspace of R2.
In fact, a line or plane in R3 is a subspace iff it contains the origin.
153
An important example of a subspace is given by the following.
Example
The set of solutions to the homogeneous linear system
x + y + z = 0
x − y − z = 0
is a subspace of R3.
Remember that a linear system is called homogeneous if all the constant terms are equal to zero.
154
In general, we have the following:
Proposition (AR Thm 5.2.2)
The set of solutions of a system of m homogeneous linear equations in n variables is a subspace of Rn.
Proof: We check the 3 conditions in the definition of subspace
155
Generating a subspace
Let v1, . . . , vk be vectors in Rn.
Definition (Span)
The subspace spanned (or subspace generated) by these vectors is the set of all linear combinations of the given vectors:

Span{v1, . . . , vk} = {α1v1 + · · · + αkvk | α1, . . . , αk ∈ R}

It is sometimes also denoted by 〈v1, . . . , vk〉.
Examples
1. In R2, Span{(3, 2)} is the line through the origin in the direction given by (3, 2), i.e., the line y = (2/3)x

2. In R3, Span{(1, 1, 1), (3, 2, 1)} is the plane x − 2y + z = 0

3. In R3, Span{(1, 1, 1), (3, 3, 3)} is the line x = y = z
156
Lemma (AR Thm 5.2.3)
1. Span{v1, . . . , vk} is a subspace of Rn.

2. Any subspace of Rn that contains the vectors v1, . . . , vk contains Span{v1, . . . , vk}.
Proof:
Remark
The subspace spanned by a set of vectors is the ‘smallest’ subspace that contains those vectors.
157
More examples
In R3:

1. Span{(1, 1, 0)} is the line through the origin containing the point (1, 1, 0).

2. Span{(1, 0, 0), (1, 1, 0)} is the xy-plane.

3. Span{(1, 0, 0), (−3, 7, 0), (1, 1, 0)} is the xy-plane.

4. Span{(1, 0, 0), (2, 3,−4), (1, 1, 0)} is the whole of R3.
158
Spanning sets
Definition (Spanning set)
Let V be a subspace of Rn.
Vectors v1, . . . , vk ∈ Rn span V if Span{v1, . . . , vk} = V .
Such a set of vectors is called a spanning set for V .
Equivalently, V contains all the vectors v1, . . . , vk and all vectors in V are linear combinations of the vectors v1, . . . , vk .
Examples
1. (1, 0) and (1, 1) span R2
2. (1, 1, 2) and (1, 0, 1) do not span R3
3. (1, 1, 2), (1, 0, 1) and (2, 1, 3) do not span R3
159
Example
Show that the following vectors span R4:
(1, 0,−1, 0), (1, 1, 1, 1), (3, 0, 0, 0), (4, 1,−3,−1)
We need to show that for any vector (a, b, c, d) ∈ R4, the equation

x(1, 0,−1, 0) + y(1, 1, 1, 1) + z(3, 0, 0, 0) + w(4, 1,−3,−1) = (a, b, c, d)
has a solution.
160
Writing this linear system in matrix form gives:

[ 1 1 3  4 ] [ x ]   [ a ]
[ 0 1 0  1 ] [ y ] = [ b ]
[−1 1 0 −3 ] [ z ]   [ c ]
[ 0 1 0 −1 ] [ w ]   [ d ]

where A denotes the 4 × 4 coefficient matrix. This has augmented matrix:

[ 1 1 3  4 | a ]
[ 0 1 0  1 | b ]
[−1 1 0 −3 | c ]
[ 0 1 0 −1 | d ]

We know that this linear system is consistent for all possible values of a, b, c, d if and only if rank(A) = 4 (which is the case in this example).
161
To decide if v1, . . . , vk ∈ Rn span Rn
1. Form the matrix A = [v1 · · · vk ] having columns given by the vectors v1, . . . , vk

2. Calculate rank(A) as before:

   a. Reduce to row-echelon form
   b. Count the number of non-zero rows

Then, v1, . . . , vk span Rn ⇐⇒ rank(A) = n
Why does this method work?
162
Since A = [v1 · · · vk ] has k columns, we know that rank(A) ≤ k.
It follows that:
Proposition
If k < n, then v1, . . . , vk ∈ Rn can’t span Rn.
163
3.7 Bases and dimension [AR 5.4]
In general, spanning sets can be (too) big.
For example, (1, 0), (0, 1), (2, 3), (−7, 4) span R2, but the last two are
not needed since they can be expressed as linear combinations of the first two. The vectors are not linearly independent.
Bases
Definition (Basis)
A basis for a subspace V ⊆ Rn is a set of vectors from V which
1. spans V
2. is linearly independent
164
Examples
1. {(1, 0), (0, 1)} is a basis for R2

2. {(2, 0), (−1, 1)} is a basis for R2

3. {(2,−1,−1), (1, 2,−3)} is a basis for the plane x + y + z = 0 in R3

4. {(2, 3, 7)} is a basis for the line in R3 given by x/2 = y/3 = z/7
Note
A subspace of Rn can have many bases. For example, any two vectors in R2 which are not collinear will form a basis of R2.
165
Notation/example:
The vectors
e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, 0, . . . , 0, 1)
form a basis of Rn, called the standard basis.
166
The following is an important and very useful theorem about bases:
Theorem (AR Thms 5.4.2, 5.4.3)
Let V be a subspace of Rn and let v1, . . . , vk be a basis for V .
1. If a subset of V has more than k vectors, then it is linearly dependent.

2. If a subset of V has fewer than k vectors, then it does not span V .
3. Any two bases of V have the same number of elements.
Proof: The proof is based on what we already know about (homogeneous) linear systems:
167
So, in particular, every basis of Rn has exactly n elements.
Dimension
The above theorem tells us that although there can be many different bases for the same space, they will all have the same number of vectors.
Definition (Dimension)
The dimension of a subspace V is the number of vectors in a basis for V . This is denoted dim(V ).
168
Examples
1. The dimension of R2 is
2. Rn has dimension
3. The line {(α, α, α) | α ∈ R} ⊆ R3 has dimension

4. The plane {(x, y, z) | x + y + z = 0} ⊆ R3 has dimension

In the special case where V = {0} it is convenient to say dim(V ) = 0.
169
Calculating bases I
How can we find a basis for a given subspace?
The method will depend on how the subspace is described. Often we know (or can easily calculate) a spanning set.
Example
Find a basis for the subspace

S = {(a + 2b, b,−a + b, a + 3b) | a, b ∈ R} ⊆ R4
What is the dimension of S?
170
Another example
Find a basis for the subspace of R4 spanned by the vectors (1,−1, 2, 1), (−2, 2,−4,−2), (1, 0, 3, 0).
171
To find a basis for the span of a set of vectors
To find a basis for the subspace V spanned by vectors v1, . . . , vk ∈ Rn:
1. Form the matrix A = [v1 · · · vk ]
2. Reduce to row-echelon form B
3. The columns of A corresponding to the steps (leading entries) in B give a basis for V

Why does this method work?
Recall that if A ∼ B, then the columns of A and B satisfy the same linear relationships. The question then reduces to one about matrices in reduced row-echelon form.
172
Remarks
Don’t forget that it is the columns of A (not its row-echelon form) that we use for the basis.

This method gives a basis that is a subset of the original set of vectors. Later we will give a second method which gives a basis of different, but usually simpler, vectors.

Example
Let S = {(1,−1, 2, 1), (0, 1, 1,−2), (1,−3, 0, 5)}.
Find a subset of S that is a basis for 〈S〉.
173
Column space of a matrix
Definition (Column space)
Let A be an m × n matrix. The subspace of Rm spanned by the columns of A is called the column space of A.
Suppose A ∼ B . In general, the column space of B is not equal to thecolumn space of A, although they do have the same dimension.
Example
A = [  1 0  1 ]       [ 1 0  1 ]
    [ −1 1 −3 ]   ∼   [ 0 1 −2 ] = B
    [  2 1  0 ]       [ 0 0  0 ]
The column space of A is
The column space of B is
Are they equal?
What is the dimension of each subspace?

174
Suppose that S = {v1, . . . , vk} ⊆ Rn is a set of vectors.

Remember that we saw a method for obtaining a basis for Span(S) that started with the matrix [v1 · · · vk ] having the vectors of S as columns.

This method gave us a basis of Span(S) that is a subset of S .
So, if V is a subspace of Rn:
Every spanning set for V contains a basis for V .
It also follows from the above method that:
Every linearly independent subset of V extends to a basis for V .
175
Let m = dim(V ). From the above observation, since any basis must have m elements, we conclude the following:
Theorem
Let V be a subspace of Rn, and suppose that dim(V ) = m.
1. If a spanning set of V has exactly m elements, then it is a basis.
2. If a linearly independent subset of V has exactly m elements, thenit is a basis.
Example
Given that

{(1,−π,√2), (23, 1, 100), (3/2, 7, 1)}

is a spanning set for R3, it is a basis for R3.
176
Calculating bases II
Another method for finding a basis: The ‘row method’
We first illustrate by repeating a previous example, but this time using a matrix that has the vectors as rows (not columns).
Example
Let S = {(1,−1, 2, 1), (0, 1, 1,−2), (1,−3, 0, 5)}.
Find a subset of S that is a basis for 〈S〉.

[ 1 −1 2  1 ]             [ 1 −1 2  1 ]
[ 0  1 1 −2 ]  ∼ · · · ∼  [ 0  1 1 −2 ]
[ 1 −3 0  5 ]             [ 0  0 0  0 ]

It follows that a basis for 〈S〉 is

Can you see why?

177
To find a basis for the span of a set of vectors: the ‘row method’
To find a basis for the subspace V spanned by vectors v1, . . . , vk ∈ Rn.
1. Form the matrix A whose rows are the given vectors.
2. Reduce to row-echelon form B
3. The non-zero rows of B are a basis for V .
To explain why this works, let’s introduce some more terminology.
178
Row space of a matrix
Definition
Let A be an m × n matrix. The subspace of Rn spanned by the rows of A is called the row space of A.
Suppose A ∼ B. The above method relies on the fact that (unlike the situation with the column space):
The row space of B is always equal to the row space of A.
Why? Can you prove it?
179
Example
A = [  1 0  1 ]       [ 1 0  1 ]
    [ −1 1 −3 ]   ∼   [ 0 1 −2 ] = B
    [  2 1  0 ]       [ 0 0  0 ]
The row space of A is Span
The row space of B is Span
Note
For any matrix A:

rank(A) = dim(row space of A) = dim(column space of A)

This is because the number of non-zero rows in the row-echelon form is equal to the number of leading entries.
180
Example
Let

A = [ 1 −1  2 −2 ]
    [ 2  0  1  0 ]
    [ 5 −3  7 −6 ]
    [ 1  1 −1  3 ]
Find a basis and the dimension for:
1. the column space of A;
2. the row space of A.
A = [ 1 −1  2 −2 ]             [ 1 −1  2 −2 ]
    [ 2  0  1  0 ]  ∼ · · · ∼  [ 0  2 −3  4 ]
    [ 5 −3  7 −6 ]             [ 0  0  0  1 ]
    [ 1  1 −1  3 ]             [ 0  0  0  0 ]
So...
181
Bases for solution spaces
We have seen how to find a basis for a subspace given a spanning set for the subspace.
Another way in which subspaces arise is as the set of solutions to a homogeneous linear system.
We saw previously that such a solution set is always a subspace of Rn, where n is the number of variables.
How can we find a basis for it?
We first illustrate with an example.
182
Example
Find a basis for the subspace of R4 defined by the equations
x1 + x3 + x4 = 0
3x1 + 2x2 + 5x3 + x4 = 0
x2 + x3 − x4 = 0
183
Finding a basis for the solution space:
To find a basis for the solution space of a system of homogeneous linear equations:
1. Write the system in matrix form Ax = 0
2. Reduce A to row-echelon form B
3. Solve the system Bx = 0 as usual, and write the solution in the form

   x = t1v1 + · · · + tkvk

   where t1, . . . , tk ∈ R are parameters and v1, . . . , vk ∈ Rn

Then {v1, . . . , vk} is a basis for the solution space.
184
Why does this work?
The set S = {v1, . . . , vk} is a spanning set for the solution space, since every solution can be written as a linear combination of the vectors in S.

The fact that the vectors are linearly independent results from the way in which they were defined.
Suppose we chose the parameters to be the variables whose corresponding column in B had no leading entry. Say ti = xmi.

Then the mj -th coordinate of vi is 1 if i = j and 0 if i ≠ j.
It follows that the vi are linearly independent.
185
Another example
Find a basis for the subspace of R4 given by

V = {(x1, x2, x3, x4) | x1 + 2x2 + x3 + x4 = 0, 3x1 + 6x2 + 4x3 + x4 = 0}
The matrix A of coefficients is
A = [ 1 2 1 1 ]  ∼ · · · ∼  [ 1 2 0  3 ]
    [ 3 6 4 1 ]             [ 0 0 1 −2 ]
The solutions are given by
(x1, x2, x3, x4) = t1( , , , ) + t2( , , , ) t1, t2 ∈ R
So a basis for V is:
186
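Any candidate basis for this solution space can be checked by substituting back into the defining equations. A minimal sketch: the two vectors below are one valid choice read off from the reduced form above (taking x2 = t1 and x4 = t2), and the code verifies that they, and every linear combination of them, satisfy both equations.

```python
def satisfies_system(x):
    """Check x = (x1, x2, x3, x4) against both defining equations of V."""
    x1, x2, x3, x4 = x
    return (x1 + 2 * x2 + x3 + x4 == 0 and
            3 * x1 + 6 * x2 + 4 * x3 + x4 == 0)

# one valid basis, read off from the reduced row-echelon form:
v1 = (-2, 1, 0, 0)
v2 = (-3, 0, 2, 1)
assert satisfies_system(v1) and satisfies_system(v2)

# every linear combination t1*v1 + t2*v2 is then also a solution
for t1, t2 in [(1, 1), (2, -5), (-3, 7)]:
    x = tuple(t1 * a + t2 * b for a, b in zip(v1, v2))
    assert satisfies_system(x)
```

This check confirms spanning-set membership only; linear independence of v1 and v2 is clear from their second and fourth coordinates, as the slides explain.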
3.8 Rank-nullity theorem
Given an m × n matrix A there are three associated subspaces:
1. The row space of A is the subspace of Rn spanned by the rows.
Its dimension is equal to rank(A)
2. The column space of A is the subspace of Rm spanned by the columns. Its dimension is equal to rank(A)
3. The solution space of A is the subspace of Rn given by

   { (x1, . . . , xn) ∈ Rn | A [x1 . . . xn]ᵀ = 0 }

The solution space is also called the nullspace of A.
Its dimension is called the nullity of A, nullity(A).
187
We have seen techniques to find bases for the row space, column space and solution space of a matrix.
As a result of these techniques we observe the following:
Theorem (cf. AR Thm 8.2.3)
Suppose A is an m × n matrix. Then
rank(A) + nullity(A) = n
Given what we know about finding the rank and the solution space of a matrix, this is simply the statement that every column in the row-echelon form either contains a leading entry or doesn’t contain a leading entry.
188
Note
If you are asked to find the solution space, the column space and the row space of A, you only need to find the reduced row-echelon form of A once.
Remember
If A and B are row equivalent matrices, then:
the row space of A is equal to the row space of B
the solution space of A is equal to the solution space of B
the column space of A is not necessarily equal to the column space of B

the columns of A satisfy the same linear relations as the columns of B

the dimension of the column space of A is equal to the dimension of the column space of B
189
3.9 Coordinates relative to a basis
Definition (Coordinates)
Suppose B = {v1, . . . , vn} is a basis for Rn. For v ∈ Rn write
v = α1v1 + · · · + αnvn for some scalars α1, . . . , αn
The scalars α1, . . . , αn are called the coordinates of v relative to B.
The vector (α1, . . . , αn) ∈ Rn is called the coordinate vector of v relative to B.
The column matrix
        [ α1 ]
[v]B =  [ ⋮  ]
        [ αn ]
is called the coordinate matrix of v with respect to B.
190
Examples
1. If we consider R2 with the standard basis B = {i, j}, the vector v = (1, 5) has coordinates

   [v]B = [ 1 ]
          [ 5 ]

2. Now consider R2 with basis B′ = {a, b}, where a = (2, 1) and b = (−1, 1). Then v = (1, 5) has coordinates

   [v]B′ = [ 2 ]
           [ 3 ]

[diagram: v = i + 5j = 2a + 3b]
191
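The second example can be verified directly: the coordinates (2, 3) must reconstruct v as 2a + 3b. A quick sketch (the helper name `combine` is ours):

```python
def combine(coords, basis):
    """Rebuild a vector from its coordinates relative to a basis:
    sum of coordinate * basis vector, componentwise."""
    return tuple(sum(c * v[i] for c, v in zip(coords, basis))
                 for i in range(len(basis[0])))

a, b = (2, 1), (-1, 1)
v = combine((2, 3), (a, b))   # 2a + 3b
assert v == (1, 5)
```

Finding the coordinates in the first place means solving the linear system 2α − β = 1, α + β = 5, which is the kind of problem handled in Topic 1.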
Having fixed a basis, there is a one-to-one correspondence between vectors and their coordinate matrices.
If u, v are vectors in Rn, and α is a scalar, then
[u + v]B = [u]B + [v]B
[αv]B = α[v]B
A consequence of this is that vectors are linearly independent if and only if their coordinate matrices are (using any basis).
(Exercise: Prove this!)
192
Topic 4: General Vector Spaces [AR chapt 5]
4.1 The vector space axioms
4.2 Examples of vector spaces
4.3 Complex vector spaces
4.4 Subspaces of general vector spaces
4.5 Spanning sets, linear independence and bases
4.6 Coordinate matrices
193
Vectors in Rn have some basic properties shared by many other mathematical systems. For example,
u + v = v + u for all u, v
is true for other systems.
Key Idea:
Write down these basic properties and look for other systems which share these properties. Any system that does share these properties will be called a vector space.
194
4.1 The vector space axioms [AR 5.1]
Let’s start trying to write down the basic properties that we want ‘vectors’ to satisfy.
A vector space is a set V with two operations defined:
1. addition
2. scalar multiplication
We want these two operations to satisfy the kind of algebraic properties that we are used to from vectors in Rn.
For example, we want our vector operations to satisfy
u + v = v + u and α(u + v) = αu + αv
This leads to a list of ten properties (or axioms) that we will then take as our definition.
195
The scalars are members of a number system F called a field in which we have addition, subtraction, multiplication and division.
Usually the field will be F = R or C.
In this subject we will mainly concentrate on the case F = R.
196
Definition (Vector Space)
A vector space is a non-empty set V with two operations: addition and scalar multiplication.
These operations are required to satisfy the following rules.
For any u, v,w ∈ V :
Addition behaves well:
A1 u + v ∈ V (closure of vector addition)
A2 (u + v) + w = u + (v + w) (associativity)
A3 u + v = v + u (commutativity)
There must be a zero and inverses:
A4 There exists a vector 0 ∈ V such that
v + 0 = v for all v ∈ V (existence of zero vector)
A5 For all v ∈ V , there exists a vector −v
such that v + (−v) = 0 (additive inverses)
197
Definition (Vector Space ctd)
For all u, v ∈ V and α, β ∈ F:
Scalar multiplication behaves well:
M1 αv ∈ V (closure of scalar multiplication)
M2 α(βv) = (αβ)v (associativity of scalar multiplication)
M3 1v = v (multiplication by unit scalar)
Addition and scalar multiplication combine well:
D1 α(u + v) = αu + αv (distributivity 1)
D2 (α + β)v = αv + βv (distributivity 2)
198
Remark
It follows from the axioms that for all v ∈ V :
1. 0v = 0
2. (−1)v = −v
Can you prove these?
We are going to list some systems that obey these rules.
We are not going to show that the axioms hold for these systems. If, however, you would like to get a feel for how this is done, read AR 5.1, Example 2.
199
4.2 Examples of vector spaces
1. Rn is a vector space with scalars R

After all, this was what we based our definition on! Vector spaces with R as the scalars are called real vector spaces.
200
2. Vector space of matrices
Denote by Mmn (or Mmn(R), Mm,n(R) or Mm×n(R)) the set of all m × n matrices with real entries. Mmn is a real vector space with the following familiar operations:
[a11 a12; a21 a22] + [b11 b12; b21 b22] = [a11 + b11 a12 + b12; a21 + b21 a22 + b22]

α [a11 a12; a21 a22] = [αa11 αa12; αa21 αa22]

(Here [a b; c d] denotes the matrix with rows (a, b) and (c, d).)
What is the zero vector in this vector space?
Note: Matrix multiplication is not a part of this vector space structure.
201
3. Vector space of polynomials
For a fixed n, denote by Pn (or Pn(R)) the set of all polynomials with degree at most n:

Pn = {a0 + a1x + a2x^2 + · · · + anx^n | a0, a1, . . . , an ∈ R}

If we define vector addition and scalar multiplication by:

(a0 + · · · + anx^n) + (b0 + · · · + bnx^n) = (a0 + b0) + · · · + (an + bn)x^n

α(a0 + a1x + · · · + anx^n) = (αa0) + (αa1)x + · · · + (αan)x^n
Then Pn is a vector space.
202
4. Vector space of functions
Let S be a set.
Denote by F(S , R) the set of all functions from S to R.
Given f , g ∈ F(S , R) and α ∈ R, let f + g and αf be the functions defined by the following:
(f + g)(x) = f (x) + g(x)
(αf )(x) = α × f (x)
Equipped with these operations F(S , R) is a vector space.
Remark
The ‘+’ on the left of the first equation is not the same as the ‘+’ on the right!
Why not?
203
Example
Let f , g ∈ F(R, R) be defined by

f : R → R, f (x) = sin x and g : R → R, g(x) = x^2
What do f + g and 3f mean?
204
What is the zero vector in F(R, R)?
0 : R → R is the function defined by
0(x) = 0
That is, 0 is the function that maps all numbers to zero.
205
4.3 Complex vector spaces
A vector space that has C as the scalars is called a complex vector space.
Example
C2 = {(a1, a2) | a1, a2 ∈ C}

with the operations:

(a1, a2) + (b1, b2) = (a1 + b1, a2 + b2)
α(a1, a2) = (αa1, αa2)    (where a1, a2, b1, b2, α ∈ C)

is a complex vector space.

Remark
All of the above examples of real vector spaces: Rn, Pn(R), F(S , R) have complex analogues: Cn, Pn(C), F(S , C)

206
Important observation
All the concepts we looked at for Rn (such as subspaces, linear independence, spanning sets, bases) carry over directly to general vector spaces.
Why consider general vector spaces?
207
4.4 Subspaces of general vector spaces
Definition (Subspace)
A subspace of a vector space V is a subset S ⊆ V that is itself a vector space (using the operations from V ).
This looks slightly different to the definition we had for subspaces of Rn.
The following theorem shows that, in fact, we get the same thing.
Theorem (Subspace Theorem, AR Thm 5.2.1)
Let V be a vector space.
A subset W ⊆ V is a subspace of V if and only if
0. W is non-empty
1. W is closed under vector addition
2. W is closed under scalar multiplication
208
Note
It follows that a subspace W of V must necessarily contain the zero vector 0 ∈ V

Example
Let V = M2,2, the vector space of real 2 × 2 matrices, and let H ⊆ V be the set of matrices with trace equal to 0, where the ‘trace’ is the sum of the diagonal entries. In other words

H = {[a b; c d] | a + d = 0}

Show that H is a subspace of V .
209
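The Subspace Theorem conditions can be sanity-checked numerically. The sketch below (a spot check on sample matrices, not a proof) verifies that H is closed under addition and scalar multiplication and contains the zero matrix:

```python
# Numerical sanity check (not a proof) of the Subspace Theorem conditions
# for H = trace-zero 2x2 matrices.

def trace(m):
    # trace = sum of the diagonal entries
    return m[0][0] + m[1][1]

def add(m1, m2):
    return [[m1[i][j] + m2[i][j] for j in range(2)] for i in range(2)]

def scale(a, m):
    return [[a * m[i][j] for j in range(2)] for i in range(2)]

u = [[3, 5], [1, -3]]   # trace 0, so u is in H
v = [[-2, 7], [4, 2]]   # trace 0, so v is in H

assert trace(u) == 0 and trace(v) == 0
assert trace(add(u, v)) == 0          # closed under addition
assert trace(scale(10, u)) == 0       # closed under scalar multiplication
assert trace([[0, 0], [0, 0]]) == 0   # contains the zero vector
```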
Another example
Let

V = P2 = {a0 + a1x + a2x^2 | a0, a1, a2 ∈ R}

and

W = {a0 + a1x + a2x^2 | a1a2 ≥ 0} ⊆ V
Is W a subspace of V ?
210
More examples
1. {0} is always a subspace of V

2. V is always a subspace of V

3. The set of diagonal matrices

{[a 0 0; 0 b 0; 0 0 c] | a, b, c ∈ R}

is a subspace of M3,3

4. The subset of continuous functions {f : R → R | f is continuous} is a subspace of F(R, R)

5. S = {2 × 2 matrices with determinant equal to 0} = {[a b; c d] | ad − bc = 0} is not a subspace of M2,2

6. {f : [0, 1] → R | f (0) = 2} is not a subspace of F([0, 1], R)
211
4.5 Spanning sets, linear independence and bases
These concepts, which we have seen for subspaces of Rn, apply equally well in a general vector space.
Let V be a vector space with scalars F and S ⊆ V a subset.
Definition
A linear combination of vectors v1, v2, . . . , vk ∈ S is a sum
α1v1 + · · · + αkvk
where each αi is a scalar.
Definition
The set S is linearly dependent if there are vectors v1, . . . , vk ∈ S and scalars α1, . . . , αk, at least one of which is non-zero, such that

α1v1 + · · · + αkvk = 0

A set which is not linearly dependent is called linearly independent.

212
Definition
The span of the set S is the set of all linear combinations of vectors from S

Span(S) = {α1v1 + · · · + αkvk | v1, . . . , vk ∈ S and α1, . . . , αk ∈ F}

S is called a spanning set for V if Span(S) = V

This is the same as saying that S ⊆ V and every vector in V can be written as a linear combination of vectors from S .
Definition
A basis for V is a set which is both linearly independent and a spanning set for V .
213
Example
Are the following elements of M2,2 linearly independent?

[1 3; 0 1], [−2 1; 0 −1], [1 3; 0 4]
214
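One way to check such questions numerically is to flatten each matrix into a vector in R4 and compute the rank of the resulting set of vectors; rank 3 means the three matrices are linearly independent. A sketch using numpy (a numerical check, not a substitute for the hand calculation):

```python
import numpy as np

# Flatten each 2x2 matrix (row by row) into a vector in R^4.
# The matrices are linearly independent iff the rank of the
# stacked vectors equals the number of matrices.
mats = [
    np.array([[1, 3], [0, 1]]),
    np.array([[-2, 1], [0, -1]]),
    np.array([[1, 3], [0, 4]]),
]
stacked = np.array([m.flatten() for m in mats])
rank = np.linalg.matrix_rank(stacked)
assert rank == 3   # rank 3: the three matrices are linearly independent
```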
Another example
In P2 the element p(x) = 2 + 2x + 5x^2 is a linear combination of p1(x) = 1 + x + x^2 and p2(x) = x^2, but q(x) = 1 + 2x + 3x^2 is not.

So {p1, p2} is not a spanning set for P2, though it is linearly independent.
215
As with Rn we have the following important results:
Theorem
Let V be a vector space.
1. Every spanning set for V contains a basis for V
2. Every linearly independent set in V can be extended to a basis of V
3. Any two bases of V have the same cardinality (i.e., ‘same number of elements’)

The basic idea behind the proof is exactly as we saw with Rn. But there are some very interesting technical differences! These mostly concern the possibility that a basis might have infinitely many elements.
216
Definition
The dimension of V , denoted dim(V ), is the number of elements in a basis of V . We call V finite dimensional if it admits a finite basis, and infinite dimensional otherwise.
Examples
1. {1, x , x^2, . . . , x^n} is a basis for Pn. So dim(Pn(R)) = n + 1

2. {[1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1]} is a basis for M2×2,

so dim(M2×2) = 4.
3. {[1 0; 0 −1], [0 1; 0 0], [0 0; 1 0]} is a basis for the vector space of 2 × 2 matrices with trace equal to zero.

This is therefore a 3 dimensional subspace of M2×2.
217
An infinite dimensional example
Let P be the set of all polynomials:

P = {a0 + a1x + · · · + an−1x^(n−1) + anx^n | n ∈ N, a0, a1, . . . , an ∈ R}

Then P is a vector space and the set B = {1, x , x^2, . . .} is a basis for P.

So P is an infinite dimensional vector space.

Can you see why B is a basis?
218
In the case of a finite dimensional vector space, we have the following:
Theorem
Suppose V has dimension n, and S is a subset of V .
1. If |S | < n, then S does not span V
2. If |S | > n, then S is linearly dependent
The proof is exactly the same as in the case of Rn.
219
Examples
1. The polynomials
2 + x + x^2, 1 + x , −1 − 7x^2, x − x^2
are linearly dependent, since dim(P2) = 3.
2. The matrices

[2 1; 3 4], [−1 1; 0 1], [6 7; 4 5]

do not span M2×2, since dim(M2×2) = 4
220
Standard Bases
It is useful to fix names for certain bases.
The standard basis for Rn is

{(1, 0, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, 0, . . . , 0, 1)}
The dimension of Rn is n.
The standard basis for Mm,n consists of the m × n matrices with a single entry 1 and all other entries 0:

[1 0 · · · 0; 0 0 · · · 0; · · · ; 0 0 · · · 0], [0 1 · · · 0; 0 0 · · · 0; · · · ; 0 0 · · · 0], . . . , [0 0 · · · 0; 0 0 · · · 0; · · · ; 0 0 · · · 1]

The dimension of Mm,n is m × n.
221
The standard basis for Pn is
{1, x , x^2, . . . , x^n}
The dimension of Pn is n + 1.
222
4.6 Coordinate matrices
The notion of coordinates relative to a basis carries over from the case of vectors in Rn.
Definition
Suppose that B = {v1, . . . , vn} is a basis for a vector space V . For any v ∈ V we have
v = α1v1 + · · · + αnvn for some scalars α1, . . . , αn
The scalars α1, . . . , αn are called the coordinates of v relative to B.
The coordinate vector of v relative to B and the coordinate matrix of v with respect to B are also defined as in the case that V = Rn.
223
Examples
1. In P2 with basis B = 1, x , x2 the polynomial
p = 2 + 7x − 9x2 has coordinates [p]B =
27−9
2. M2×2 with basis B = [1 00 0
]
,
[0 10 0
]
,
[0 01 0
]
,
[0 00 1
]
The matrix A =
[1 23 4
]
has coordinates [A]B =
1234
224
Topic 5: Linear Transformations [AR 4.2, 8.1]
5.1 Linear transformations from R2 to R2

5.2 Linear transformations from Rn to Rm
5.3 Matrix representations in general
5.4 Image, kernel, rank and nullity
5.5 Change of basis
225
We now turn to thinking about maps from one vector space to another.
A key feature of a vector space V is that given a basis B = {v1, . . . , vn}, each element v ∈ V can be uniquely written as a linear combination of the vectors in the basis:

v = α1v1 + α2v2 + · · · + αnvn
In keeping with this, we will undertake a study of mappings T : V → W between vector spaces which have the property that

T (v) = α1T (v1) + α2T (v2) + · · · + αnT (vn)
226
Definition (Linear transformation)
Let V and W be vector spaces (over the same field of scalars).
A linear transformation from V to W is a map T : V → W such that for each u, v ∈ V and for each scalar α:
1. T (u + v) = T (u) + T (v) (T preserves addition)
2. T (αu) = αT (u) (T preserves scalar multiplication)
Loosely speaking, linear transformations are those maps between vector spaces that ‘preserve the vector space structure.’
227
5.1 Linear transformations from R2 to R2
We will start by looking at some geometric transformations of R2.
A vector in R2 is an ordered pair (x , y), with x , y ∈ R.
To describe the effect of a transformation we will use coordinate matrices. With respect to the standard basis B = {e1, e2}, the vector (x , y) has coordinate matrix [x; y].
Example
Reflection across the y-axis:

T ([x; y]) = [−x; y]

(The point (x , y) is sent to (−x , y).)
228
A common feature of all linear transformations is that they can be represented by a matrix. In the example above

T ([x; y]) = [−1 0; 0 1] [x; y] = [−x; y]

The matrix

AT = [−1 0; 0 1]

is called the standard matrix representation of the transformation T
229
Examples of (geometric) linear transformations from R2 to R2

1. Reflection across the x-axis has matrix [ ]

2. Reflection in the line y = 5x has matrix [−12/13 5/13; 5/13 12/13]

3. Rotation around the origin by an angle of π/2 has matrix [0 −1; 1 0]

4. Rotation around the origin by an angle of θ has matrix [ ]

We need to work out the coordinates of the point Q obtained by rotating P.

[Diagram: P rotated by angle θ to Q]
230
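The general rotation matrix has entries built from cos θ and sin θ; the π/2 matrix in example 3 is a special case of it. A sketch checking this consistency (the general formula is the standard rotation matrix, stated here rather than derived):

```python
import numpy as np

def rotation(theta):
    # Standard matrix for rotation about the origin by angle theta
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# At theta = pi/2 this reproduces the matrix [0 -1; 1 0] from example 3.
R = rotation(np.pi / 2)
assert np.allclose(R, [[0, -1], [1, 0]])

# Rotating e1 = (1, 0) by pi/2 gives e2 = (0, 1).
assert np.allclose(R @ [1, 0], [0, 1])
```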
Examples continued
5. Compression/expansion along the x-axis has matrix [ ]

6. Shear along the x-axis has matrix [1 c; 0 1]

These are best thought of as mappings on a rectangle.

For example, a shear along the x-axis corresponds to the mapping
231
Successive Transformations

Example
Find the image of (x , y) after a shear along the x-axis with c = 1, followed by a compression along the y-axis with c = 1/2.

Solution:
Let R : R2 → R2 be the compression and denote its standard matrix representation by AR . Similarly let S : R2 → R2 be the shear and denote its standard matrix representation by AS . Then the coordinate matrix of R(S(x , y)) is given by

AR AS [x; y]

It remains to recall AR and AS , and to compute the matrix products.
232
Note
1. The matrix for the linear transformation S followed by the linear transformation R is the matrix product AR AS . (In other words ARS = AR AS)

2. Notice that (reading right to left) the two matrices are in the opposite order to the order in which the transformations are applied.

3. The composition of two linear transformations T (v) = R(S(v)) is also written T (v) = (R ◦ S)(v)
233
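The order-of-composition point can be checked numerically. In the sketch below the shear matrix comes from example 6 with c = 1; the compression along the y-axis is taken to be [1 0; 0 1/2] (an assumption, since the slides leave that matrix to be recalled):

```python
import numpy as np

A_S = np.array([[1, 1], [0, 1]])      # shear along the x-axis with c = 1
A_R = np.array([[1, 0], [0, 0.5]])    # assumed: compression along y-axis, c = 1/2

# S is applied first, then R, so the combined matrix is A_R @ A_S.
A_RS = A_R @ A_S
assert np.allclose(A_RS, [[1, 1], [0, 0.5]])

# The order matters: A_S @ A_R is a different transformation.
assert not np.allclose(A_R @ A_S, A_S @ A_R)
```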
5.2 Linear transformations from Rn to Rm
An example of a linear transformation from R3 to R2 is

T (x1, x2, x3) = (x2 − 2x3, 3x1 + x3)

To prove that this is a linear transformation, we must show that for any u, v ∈ R3 and α ∈ R we have that

T (u + v) = T (u) + T (v) and T (αu) = αT (u)
234
Proof Let u = (u1, u2, u3) and v = (v1, v2, v3). First we note that
u + v = (u1 + v1, u2 + v2, u3 + v3)
Applying T to this gives
T (u + v) = ((u2 + v2) − 2(u3 + v3), 3(u1 + v1) + (u3 + v3))
Re-arranging the right-hand side gives
((u2 + v2) − 2(u3 + v3), 3(u1 + v1) + (u3 + v3))
= (u2 − 2u3 , 3u1 + u3) + ( v2 − 2v3 , 3v1 + v3)
= T (u) + T (v)
For the second part,
T (αu) = T ((αu1, αu2, αu3)) = (αu2 − 2αu3, 3αu1 + αu3)
= α(u2 − 2u3, 3u1 + u3) = αT (u)
235
In fact this example is typical of all linear transformations from Rn to Rm — the mapping rule gives a vector whose components are linear combinations of the components of the input vector.

The simplest way to see this is to prove the following theorem.

Theorem

All linear transformations T : Rn → Rm have a standard matrix representation AT specified by

AT = [ [T (e1)] [T (e2)] · · · [T (en)] ]

where each [T (ei )] denotes the coordinate matrix of T (ei )

Note

The matrix AT has size m × n.

Alternative notations for AT include: [T ] or [T ]S or [T ]S,S
236
Proof
Let v ∈ Rn and write

v = α1e1 + α2e2 + · · · + αnen

The coordinate matrix of v with respect to the standard basis is then

[v] = [α1; α2; . . . ; αn]

We seek an m × n matrix AT such that

[T (v)] = AT [v] (independently of v) (†)

In words, this equation says that AT times the column matrix [v] is equal to the coordinate matrix of T (v). Since T is linear, we have that

T (v) = α1T (e1) + α2T (e2) + · · · + αnT (en)
237
We read off from this that

[T (v)] = α1[T (e1)] + α2[T (e2)] + · · · + αn[T (en)]   (property of coordinate matrices)
= [ [T (e1)] [T (e2)] · · · [T (en)] ] [α1; . . . ; αn]   (just matrix multiplication)

Comparing with equation (†) above, we see that we can take

AT = [ [T (e1)] [T (e2)] · · · [T (en)] ]

Summarizing: Given a linear transformation T from Rn to Rm there is a corresponding m × n matrix AT satisfying

[T (v)] = AT [v]

That is, the image of any vector v ∈ Rn can be calculated using the matrix AT .
238
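The column-by-column recipe for AT can be illustrated with the earlier example T (x1, x2, x3) = (x2 − 2x3, 3x1 + x3). A sketch building AT from the images of the standard basis vectors:

```python
import numpy as np

def T(x):
    # T(x1, x2, x3) = (x2 - 2*x3, 3*x1 + x3)
    return np.array([x[1] - 2 * x[2], 3 * x[0] + x[2]])

# The columns of A_T are the images of the standard basis vectors.
A_T = np.column_stack([T(e) for e in np.eye(3)])
assert np.allclose(A_T, [[0, 1, -2], [3, 0, 1]])

# [T(v)] = A_T [v] for any v.
v = np.array([1.0, 2.0, 3.0])
assert np.allclose(A_T @ v, T(v))
```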
Note
1. All linear transformations map the zero vector to the zero vector.

2. Linear transformations map lines through the origin to other lines through the origin (or to just the origin).

3. The above theory generalizes to linear transformations between any n-dimensional vector space V and m-dimensional vector space W .
239
Examples

1. Define T : R3 → R4 by T (x1, x2, x3) = (x1, x3, x2, x1 + x3). Calculate AT .

2. Give a reason why the mapping T : R2 → R2 specified by T (x1, x2) = (x1 − x2, x1 + 1) is not a linear transformation.

3. Find AT for the linear transformation T : P2 → P1 given by T (a0 + a1x + a2x^2) = (a0 + a2) + a0x
240
5.3 Matrix representations in general [AR 8.4]
We’ve talked about the standard matrix representation. This is when the vectors are expressed as coordinates with respect to the standard basis for the spaces involved.

We can also give a matrix representation when the vectors are expressed using some other basis.

Let U and V be finite dimensional vector spaces. Suppose that

T : U → V is a linear transformation,

B = {b1, . . . , bn} is a basis for U and

C = {c1, . . . , cm} is a basis for V .
241
We want a matrix that can be used to calculate the effect of T . Specifically, if we denote the matrix by AC,B , we want that

[T (u)]C = AC,B [u]B for all u ∈ U. (∗)

Note
For this matrix equation to make sense the size of AC,B must be m × n.

Theorem

There exists a unique matrix satisfying the above condition (∗). It is given by

AC,B = [ [T (b1)]C [T (b2)]C · · · [T (bn)]C ]

The proof is the same as for the case of the formula for the standard matrix AT .
242
A note on notation
The matrix AC,B is also denoted by [T ]C,B . In the special case in which U = V and B = C, we often write [T ]B in place of [T ]B,B .
Example

A linear transformation T : R3 → R2 has matrix [5 1 0; 1 5 −2] with respect to the standard bases of R3 and R2. What is its matrix with respect to the basis B = {(1, 1, 0), (1,−1, 0), (1,−1,−2)} of R3 and the basis C = {(1, 1), (1,−1)} of R2?

Solution
We apply T to the elements of B to get:

Then we write the result in terms of C:

We obtain the matrix AC,B =
243
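The computation in this example can be sketched numerically: apply the standard matrix to each vector in B, then solve for coordinates with respect to C. The matrix in the final assertion is the result of that calculation (computed as a cross-check, not taken from the slides):

```python
import numpy as np

M = np.array([[5, 1, 0], [1, 5, -2]])           # standard matrix of T
B = [np.array(b) for b in [(1, 1, 0), (1, -1, 0), (1, -1, -2)]]
C = np.column_stack([(1, 1), (1, -1)])          # columns are the C basis vectors

# Column j of A_{C,B} solves C x = T(b_j), i.e. the C-coordinates of T(b_j).
A_CB = np.column_stack([np.linalg.solve(C, M @ b) for b in B])

# Cross-check against the hand calculation: T(b1) = (6,6) = 6c1,
# T(b2) = (4,-4) = 4c2, T(b3) = (4,0) = 2c1 + 2c2.
assert np.allclose(A_CB, [[6, 0, 2], [0, 4, 2]])
```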
Example
Consider the linear transformation T : V → V where V is the vector space of real 2 × 2 matrices and T is defined by

T (Q) = Q^T = the transpose of Q.

Find the matrix representation of T with respect to the bases:

B = {[1 1; 0 0], [1 0; 0 1], [1 0; 1 0], [0 1; 1 0]}

and

C = {[1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1]}
244
Solution
The task is to work out the coordinates with respect to C of the image of each element in B.
245
5.4 Image, kernel, rank and nullity [AR 8.2]
Let T : U → V be a linear transformation.
Definition (Kernel and Image)

The kernel of T is defined to be

ker(T ) = {u ∈ U | T (u) = 0}

The image of T is defined to be

Im(T ) = {v ∈ V | v = T (u) for some u ∈ U}

The kernel is also called the nullspace. It is a subspace of U. Its dimension is denoted nullity(T ).

The image is also called the range. It is a subspace of V . Its dimension is denoted rank(T ).
246
Example
Consider the linear transformation T : R3 → R2 specified by

T (x , y , z) = (x − y + z , 2x − 2y + 2z)
Find bases for ker(T ) and Im(T ).
247
Definition

A linear transformation T : U → V is called injective if ker(T ) = {0}. It is called surjective if Im(T ) = V .

Example
Is the linear transformation of the preceding example injective?

Is it surjective?
248
Note
When we calculate ker(T ) we find that we are solving equations of the form AT X = 0. So ker(T ) is the same as the solution space of AT .

To calculate Im(T ) we can use the fact that, because B spans U, T (B) must span T (U), the image of T . But the elements of T (B) are given by the columns of AT . Thus the image of T can be identified with the column space of AT . It follows that rank(T ) = rank(AT ).

We can now use the result of slide 188 to see that for a linear transformation T : U → V with dim(U) = n

nullity(T ) + rank(T ) = n
249
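For the example T (x , y , z) = (x − y + z , 2x − 2y + 2z) the rank-nullity identity can be checked numerically (a sketch; the kernel basis itself is still best found by hand via row reduction):

```python
import numpy as np

A_T = np.array([[1, -1, 1], [2, -2, 2]])   # standard matrix of T

rank = np.linalg.matrix_rank(A_T)          # dim Im(T): second row is twice the first
nullity = A_T.shape[1] - rank              # rank-nullity: nullity = n - rank

assert rank == 1
assert nullity == 2
assert rank + nullity == 3                 # n = dim(U) = 3
```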
5.5 Change of basis
Transition Matrices
We have seen that a matrix representation of a linear transformation T : U → V depends on both of the choices of bases for U and V .

In fact different matrix representations are related by a matrix which depends on the bases but not on T . To understand this, we undertake a study of converting coordinates with respect to one basis to coordinates using another basis.

Let B = {b1, . . . , bn} and C = {c1, . . . , cn} be bases for the same vector space V and let v ∈ V .
How are [v]B and [v]C related?
By multiplication by a matrix!
250
Theorem
There exists a unique matrix P such that for any vector v ∈ V ,
[v]C = P [v]B
The matrix P is given by
P = [ [b1]C · · · [bn]C ]

and is called the transition matrix from B to C.

In words, the columns of P are the coordinate matrices, with respect to C, of the elements of B.
We will sometimes denote this transition matrix by PC,B
It is also sometimes denoted by PB→C
251
Proof:
We want to find a matrix P such that for all vectors v in V ,

[v]C = P [v]B

Recall that if T : V → V is any linear transformation, then

[T (v)]C = [T ]C,B [v]B (∗)

where

[T ]C,B = [ [T (b1)]C [T (b2)]C · · · [T (bn)]C ]

is the matrix representation of T .
252
Applying this to the special case where T (v) = v for all v (i.e., T is the identity linear transformation) gives

[T ]C,B = [ [b1]C [b2]C · · · [bn]C ]

and (∗) becomes

[v]C = [T ]C,B [v]B

So we can take P = [T ]C,B

Exercise
Finish the proof by showing that P is unique. That is, if Q is a matrix satisfying [v]C = Q[v]B for all v, then Q = P .
253
A simple case
The transition matrix is easy to calculate when one of B or C is the standard basis.

Example
In R2, write down the transition matrix from B to S, where B = {(1, 1), (1,−1)} and S = {(1, 0), (0, 1)}.

Use this to compute [v]S , given that [v]B = [1; 1].

Solution

PS,B = [ [b1]S [b2]S ] =

[v]S = PS,B [v]B =
254
Going in the other direction
A useful fact is that transition matrices are always invertible. (Why is this true?)

Starting with the equation

[v]C = PC,B [v]B

and rearranging gives

[v]B = (PC,B)^−1 [v]C

But we know that

[v]B = PB,C [v]C

and so, by the uniqueness part of the above theorem, it must be the case that

PB,C = (PC,B)^−1
255
Example
For B and S as in the previous example, compute PB,S , the transition matrix from S to B.

Use it to compute [v]B , given [v]S = [2; 0].

Solution
We saw that in this case

PS,B = [1 1; 1 −1]

It follows that

PB,S =

[v]B =
256
Calculating a general transition matrix
Keeping notation as before, we have

[v]S = PS,B [v]B and [v]C = PC,S [v]S

Combining these,

[v]C = PC,S [v]S = PC,S PS,B [v]B

Using the uniqueness of the transition matrix, we get

PC,B = PC,S PS,B = (PS,C)^−1 PS,B

Since it is usually easy to calculate a transition matrix whose first basis is the standard basis, this makes it straightforward to calculate any transition matrix.
257
Example
With U = V = R2 and B = {(1, 2), (1, 1)} and C = {(−3, 4), (1,−1)}, find PC,B .

PS,B = PS,C =

So

PC,S = and PC,B =
258
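The recipe PC,B = (PS,C)^−1 PS,B is easy to carry out numerically. A sketch for the bases in this example, with a sanity check that the first column really does give b1 as a combination of the C vectors (the resulting matrix is a computed value to compare against the hand calculation):

```python
import numpy as np

P_SB = np.column_stack([(1, 2), (1, 1)])     # B vectors as columns (coords w.r.t. S)
P_SC = np.column_stack([(-3, 4), (1, -1)])   # C vectors as columns

# P_{C,B} = (P_{S,C})^{-1} P_{S,B}; solve avoids forming the inverse explicitly.
P_CB = np.linalg.solve(P_SC, P_SB)

# Sanity check: b1 = (1, 2) should equal the C-linear combination
# given by the first column of P_CB.
assert np.allclose(P_SC @ P_CB[:, 0], (1, 2))
```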
Relationship Between Different Matrix Representations
Example

Calculate the standard matrix representation of T : R2 → R2 where

T (x , y) = (3x − y ,−x + 3y)
Solution
259
Example continued
Now find the matrix of T with respect to the basis B = {(1, 1), (1,−1)}
Solution
Notice that [T ]B is diagonal. This makes it very convenient to use the basis B in order to understand the effect of T .
260
How are [T ]S and [T ]B related?
Theorem
The matrix representations of T : V → V with respect to two bases C and B are related by the following equation:

[T ]B = PB,C [T ]C PC,B

Proof:
We need to show that for all v ∈ V

[T (v)]B = PB,C [T ]C PC,B [v]B

Starting with the right-hand side we obtain:

PB,C [T ]C PC,B [v]B = PB,C [T ]C [v]C (property of PC,B)
= PB,C [T (v)]C (property of [T ]C)
= [T (v)]B (property of PB,C)
261
Example
For the above linear transformation T : R2 → R2 given by T (x , y) = (3x − y ,−x + 3y) we saw that

[T ]C = [T ]B =

where C = {(1, 0), (0, 1)} is the standard basis and B = {(1, 1), (1,−1)}

Since C is the standard basis, it is easy to write down PC,B =

From which we calculate PB,C =
Calculation verifies that in this case we do indeed have:
[T ]B = PB,C[T ]CPC,B
262
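For this particular T the whole change-of-basis computation can be sketched numerically. The standard matrix [3 −1; −1 3] follows directly from T (x , y) = (3x − y ,−x + 3y); the diagonal result below is computed from the stated formula:

```python
import numpy as np

T_C = np.array([[3, -1], [-1, 3]])           # [T]_C w.r.t. the standard basis
P_CB = np.column_stack([(1, 1), (1, -1)])    # B vectors as columns

# [T]_B = P_{B,C} [T]_C P_{C,B}, with P_{B,C} = (P_{C,B})^{-1}
T_B = np.linalg.inv(P_CB) @ T_C @ P_CB

# T(1,1) = (2,2) = 2*(1,1) and T(1,-1) = (4,-4) = 4*(1,-1),
# so [T]_B is diagonal with entries 2 and 4.
assert np.allclose(T_B, [[2, 0], [0, 4]])
```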
Topic 6: Inner Product Spaces [AR Ch 6]
6.1 Definition of inner products
6.2 Geometry from inner products
6.3 Cauchy-Schwarz inequality
6.4 Orthogonality and projections
6.5 Gram-Schmidt orthogonalization procedure
6.6 Application: curve fitting
263
6.1 Definition of inner products
The Euclidean length of a vector v ∈ Rn is defined as

‖v‖ = √(v1^2 + v2^2 + · · · + vn^2) = √(v · v)

Similarly, the distance between vectors u and v is given by ‖u − v‖.

The angle θ between two vectors u and v was defined by

cos θ = (v · u)/(‖v‖‖u‖) with 0 ≤ θ ≤ π

The projection of v onto u was given in terms of the dot product by

proj_u v = ((u · v)/‖u‖^2) u
264
We want to address two issues:

1. How to generalise the notion of a dot product for vector spaces other than Rn.

2. To relate ideas associated with the dot product and its generalisations to bases and linear equations.

The first can be done by looking carefully at some of the key properties and generalising them.
265
Definition (Inner Product)
Let V be a vector space over the real numbers. An inner product on V is a function that associates with every pair of vectors u, v ∈ V a real number, denoted 〈u, v〉, satisfying the following properties.

For all u, v, w ∈ V and α ∈ R:

1. 〈u, v〉 = 〈v, u〉
2. α〈u, v〉 = 〈αu, v〉
3. 〈u, v + w〉 = 〈u, v〉 + 〈u, w〉
4. a. 〈u, u〉 ≥ 0
   b. 〈u, u〉 = 0 ⇒ u = 0

A vector space V together with an inner product is called an inner product space.

Note
A vector space can admit many different inner products. We’ll see some examples shortly.
266
In words these axioms say that we require the inner product to:
1. be symmetric;
2. be linear with respect to scalar multiplication;
3. be linear with respect to addition;
4. a. have positive squared lengths;
   b. be such that only the zero vector has length 0.

With V = Rn, we of course have that 〈u, v〉 = u · v defines an inner product (the Euclidean inner product).

But this is not the only inner product on the vector space Rn.
267
Example
Show that, in R2, if u = (u1, u2) and v = (v1, v2) then

〈u, v〉 = u1v1 + 2u2v2
defines an inner product.
268
More examples
1. Show that in R3, 〈u, v〉 = u1v1 − u2v2 + u3v3 does not define an inner product by showing that axiom 4a does not hold.

2. Show that in R2,

〈u, v〉 = [u]^T [2 −1; −1 1] [v] = [u1 u2] [2 −1; −1 1] [v1; v2]

defines an inner product.
269
Another example
The last example raises the question of what we need to know about a 2 × 2 matrix A = [a b; c d] to ensure that 〈u, v〉 = [u]^T A [v] defines an inner product.

The check that this satisfies axioms 2 and 3 follows exactly the same lines as the previous example.

In order to satisfy axiom 1 we will need to make b = c .

The hard work occurs in satisfying axiom(s) 4. A quick check with the standard basis vectors shows that we must have a > 0 and d > 0.

A harder calculation, which involves completing a square, shows that axiom(s) 4 will be satisfied exactly when det(A) > 0 (and a > 0).

So we can tell exactly when a 2 × 2 matrix A defines an inner product in this way.
270
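The criterion "a > 0 and det(A) > 0" for a symmetric 2 × 2 matrix can be probed numerically. The sketch below checks it for the matrix [2 −1; −1 1] from the previous example and spot-checks positivity of 〈u, u〉 on random vectors (random sampling is evidence, of course, not a proof):

```python
import numpy as np

def defines_inner_product(A):
    # A symmetric 2x2 matrix A gives an inner product <u, v> = u^T A v
    # exactly when b = c, a > 0 and det(A) > 0 (the slide criterion).
    a, b, c = A[0][0], A[0][1], A[1][0]
    return b == c and a > 0 and np.linalg.det(A) > 0

A = np.array([[2.0, -1.0], [-1.0, 1.0]])
assert defines_inner_product(A)

# Spot check: <u, u> = u^T A u > 0 for random nonzero vectors.
rng = np.random.default_rng(0)
for _ in range(100):
    u = rng.normal(size=2)
    assert u @ A @ u > 0
```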
Another example
In the case of the vector space Pn of polynomials of degree less than or equal to n, a possible choice of inner product is

〈p, q〉 = ∫_0^1 p(x)q(x) dx .

To verify this, we would need to check, for any polynomials p(x), q(x), r(x) and any scalar α that

1. ∫_0^1 p(x)q(x) dx = ∫_0^1 q(x)p(x) dx

2. α ∫_0^1 p(x)q(x) dx = ∫_0^1 (αp(x))q(x) dx

3. ∫_0^1 p(x)(q(x) + r(x)) dx = ∫_0^1 p(x)q(x) dx + ∫_0^1 p(x)r(x) dx

4. ∫_0^1 p(x)^2 dx ≥ 0

5. ∫_0^1 p(x)^2 dx = 0 only when p(x) is the zero polynomial
271
6.2 Geometry from inner products
In a general vector space, how can we measure angles and find lengths? If we fix an inner product on the vector space, we can then define length and angle using the same equations that we saw above for Rn. We simply replace the dot product by the chosen inner product.

Definition (Length, Distance, Angle)

For a real vector space with an inner product 〈· , ·〉 we define:

the length (or norm) of a vector v by ‖v‖ = √〈v, v〉

the distance between two vectors v and u by d(v, u) = ‖v − u‖

and the angle θ between v and u using

cos θ = 〈v, u〉/(‖v‖‖u‖) with 0 ≤ θ ≤ π
272
Note
In order for this definition of angle to make sense we need

−1 ≤ 〈v, u〉/(‖v‖‖u‖) ≤ 1

We will see shortly that this is always the case (the Cauchy-Schwarz inequality)
Definition (Orthogonal vectors)
Two vectors v and u are said to be orthogonal if 〈v,u〉 = 0
273
Example
〈(u1, u2), (v1, v2)〉 = u1v1 + 2u2v2 defines an inner product on R2
If u = (3, 1) and v = (−2, 3), then

d(u, v) = ‖u − v‖ = ‖(5,−2)‖ = √〈(5,−2), (5,−2)〉 = √(25 + 8) = √33

and

〈u, v〉 = 〈(3, 1), (−2, 3)〉 = 3 × (−2) + 2 × 1 × 3 = 0

so u and v are orthogonal (using this inner product)
274
Example (an inner product for functions)
The set of all continuous functions

C [a, b] = {f : [a, b] → R | f is continuous}

is a vector space.

It’s a subspace of F([a, b], R), the vector space of all functions.

For f , g ∈ C [a, b] define

〈f , g〉 = ∫_a^b f (x)g(x) dx

This defines an inner product.
275
Example

Consider C [0, 2π] with the inner product 〈f , g〉 = ∫_0^{2π} f (x)g(x) dx

The norms of the functions s(x) = sin(x) and c(x) = cos(x) are:

‖s‖^2 = 〈s, s〉 = ∫_0^{2π} sin^2(x) dx = ∫_0^{2π} (1/2)(1 − cos(2x)) dx
= [x/2 − (1/4) sin(2x)] evaluated from 0 to 2π = π

So ‖s‖ = √π and (similarly) ‖c‖ = √π

〈s, c〉 = ∫_0^{2π} sin(x) cos(x) dx = ∫_0^{2π} (1/2) sin(2x) dx = [−(1/4) cos(2x)] evaluated from 0 to 2π = 0

So sin(x) and cos(x) are orthogonal

This is used in the study of periodic functions (using ‘Fourier series’) in (for example) signal analysis, speech recognition, music recording etc.
276
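These two integrals can also be approximated numerically as a sanity check. The sketch below uses a simple midpoint Riemann sum (for smooth periodic integrands over a full period this converges very quickly):

```python
import math

def inner(f, g, a=0.0, b=2 * math.pi, n=10_000):
    # Midpoint-rule approximation of the integral of f(x) g(x) over [a, b].
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n)) * h

norm_s_sq = inner(math.sin, math.sin)   # should be close to pi
s_dot_c = inner(math.sin, math.cos)     # should be close to 0

assert abs(norm_s_sq - math.pi) < 1e-6
assert abs(s_dot_c) < 1e-6
```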
6.3 Cauchy-Schwarz inequality
Theorem
Let V be a real inner product space. Then for all u, v ∈ V

|〈u, v〉| ≤ ‖u‖‖v‖

Proof:
The same as the proof we saw for the case V = Rn on slide 112.

Application
The definition of angle is okay! We defined the angle between two vectors using

cos θ = 〈u, v〉/(‖u‖‖v‖)

From the Cauchy-Schwarz inequality we know that

−1 ≤ 〈u, v〉/(‖u‖‖v‖) ≤ 1
277
Example
For two continuous functions f , g : [a, b] → R, it follows directly from the Cauchy-Schwarz inequality that

(∫_a^b f (x)g(x) dx)^2 ≤ (∫_a^b f^2(x) dx)(∫_a^b g^2(x) dx)

And we don’t need to do any (more) calculus to prove it!

Example

Set f (x) = √x and g(x) = 1/√x ; also a = 1, b = t > 1.

We obtain

(∫_1^t 1 dx)^2 ≤ (∫_1^t x dx)(∫_1^t (1/x) dx)

which becomes

(t − 1)^2 ≤ ((t^2 − 1)/2) log t or log t ≥ 2(t − 1)/(t + 1)
278
6.4 Orthogonality and projections [AR 6.2]
Orthogonal sets
Recall that u and v are orthogonal if 〈u, v〉 = 0

Definition (Orthogonal set of vectors)

A set of vectors {v1, . . . , vk} is called orthogonal if 〈vi , vj〉 = 0 whenever i ≠ j .

Examples

1. {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is orthogonal in R3 with the dot product.

2. So is {(1, 1, 1), (1,−1, 0), (1, 1,−2)}.

3. {sin(x), cos(x)} is orthogonal in C [0, 2π] equipped with the inner product defined on slide 276
279
Proposition
Every orthogonal set of nonzero vectors is linearly independent
Proof:
280
Orthonormal sets
Definition

A set of vectors {v1, . . . , vk} is called orthonormal if it is orthogonal and each vector has length one. That is

{v1, . . . , vk} is orthonormal ⇐⇒ 〈vi , vj〉 = 0 if i ≠ j , and 1 if i = j

Note
Any orthogonal set of non-zero vectors can be made orthonormal by dividing each vector by its length.

Examples

1. In R3 with the dot product:

{(1, 0, 0), (0, 1, 0), (0, 0, 1)} is orthonormal

{(1, 1, 1), (1,−1, 0), (1, 1,−2)} is not (though it is orthogonal)

{(1/√3)(1, 1, 1), (1/√2)(1,−1, 0), (1/√6)(1, 1,−2)} is orthonormal
281
2. In C [0, 2π] with the inner product

〈f , g〉 = ∫_0^{2π} f (x)g(x) dx

The set {sin(x), cos(x)} is orthogonal but not orthonormal.

The set {(1/√π) sin(x), (1/√π) cos(x)} is orthonormal

The (infinite) set

{1/√(2π), (1/√π) sin(x), (1/√π) cos(x), (1/√π) sin(2x), (1/√π) cos(2x), . . .}

is orthonormal.
282
Orthonormal bases
Bases that are orthonormal are particularly convenient to work with. For example, we have the following

Lemma (AR Thm 6.3.1)

If {v1, . . . , vn} is an orthonormal basis for V and x ∈ V , then

x = 〈x, v1〉v1 + · · · + 〈x, vn〉vn
Proof: Exercise!
283
Orthogonal projection [AR 6.3]
Let V be a real vector space, with inner product 〈·, ·〉.
Let u ∈ V be a unit vector (i.e., ‖u‖ = 1)

Definition

The orthogonal projection of v onto u is

p = 〈v, u〉u

Note

p = (‖v‖ cos θ) u

v − p is orthogonal to u

Example
The orthogonal projection of (2, 3) onto (1/√2)(1, 1) is:
284
More generally, we can project onto a subspace W of V as follows.
Let {u1, . . . , uk} be an orthonormal basis for W .

Definition

The orthogonal projection of v ∈ V onto W is:

projW (v) = 〈v, u1〉u1 + · · · + 〈v, uk〉uk

Properties of p = projW (v)

p ∈ W

if w ∈ W , then projW (w) = w (by the above lemma)

v − p is orthogonal to W (i.e., orthogonal to every vector in W )

p is the vector in W that is closest to v
That is, for all w ∈ W , ‖v − w‖ ≥ ‖v − p‖

p does not depend on the choice of orthonormal basis for W
285
Example

Let W = {(x , y , z) | x + y + z = 0} in V = R3 with the dot product.

The set

{u1 = (1/√2)(1,−1, 0), u2 = (1/√6)(1, 1,−2)}

is an orthonormal basis for W . For v = (1, 2, 3) we have

p = 〈v, u1〉u1 + 〈v, u2〉u2 = (−1/2)(1,−1, 0) + (−1/2)(1, 1,−2) = (−1, 0, 1)

Note that v − p = (2, 2, 2) is orthogonal to W .
286
6.5 Gram-Schmidt orthogonalization procedure
The following can be used to make any basis orthonormal:

Gram-Schmidt procedure [AR Thm 6.3.6]

Suppose {v1, . . . , vk} is a basis for V .

1. Let u1 = (1/‖v1‖) v1
2a. w2 = v2 − 〈v2, u1〉u1
2b. u2 = (1/‖w2‖) w2
3a. w3 = v3 − 〈v3, u1〉u1 − 〈v3, u2〉u2
3b. u3 = (1/‖w3‖) w3
...
ka. wk = vk − 〈vk , u1〉u1 − · · · − 〈vk , uk−1〉uk−1
kb. uk = (1/‖wk‖) wk

Then {u1, . . . , uk} is an orthonormal basis for V .

287
Example
Find an orthonormal basis for the subspace W of R4 (with the dot product) spanned by

(1, 1, 1, 1), (2, 4, 2, 4), (1, 5,−1, 3)

Answer

(1/2)(1, 1, 1, 1), (1/2)(−1, 1,−1, 1), (1/2)(1, 1,−1,−1)

Example
Find the point in W closest to v = (2, 2, 1, 3)

Answer
(1/2)(3, 5, 3, 5)
288
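The procedure translates directly into code. The sketch below implements it with numpy and reproduces both answers above; the closest point is the projection of v onto W using the orthonormal basis produced:

```python
import numpy as np

def gram_schmidt(vectors):
    # Gram-Schmidt: orthonormalize a list of linearly independent vectors.
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, u) * u for u in basis)  # subtract projections
        basis.append(w / np.linalg.norm(w))           # normalize
    return basis

vs = [np.array(v, dtype=float)
      for v in [(1, 1, 1, 1), (2, 4, 2, 4), (1, 5, -1, 3)]]
us = gram_schmidt(vs)
assert np.allclose(us[0], np.array([1, 1, 1, 1]) / 2)
assert np.allclose(us[1], np.array([-1, 1, -1, 1]) / 2)
assert np.allclose(us[2], np.array([1, 1, -1, -1]) / 2)

# Closest point in W to v: the orthogonal projection onto span(us).
v = np.array([2, 2, 1, 3], dtype=float)
p = sum(np.dot(v, u) * u for u in us)
assert np.allclose(p, np.array([3, 5, 3, 5]) / 2)
```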
6.6 Application: curve fitting [AR 6.4]
Given a set of data points (x1, y1), (x2, y2), . . . , (xn, yn) we want to find the straight line y = a + bx that best approximates the data.

A common approach is to minimise the least squares error, E:

E^2 = sum of the squares of the vertical errors = Σ_{i=1}^{n} δi^2 = Σ_{i=1}^{n} (yi − (a + bxi ))^2

[Diagram: data points with vertical errors δi to the line y = a + bx]
289
So given (x1, y1), . . . , (xn, yn) we want to find a, b ∈ R which minimise

Σ_{i=1}^{n} (yi − (a + bxi ))^2

which can be written as

‖y − Au‖^2

where

y = [y1; . . . ; yn]   A = [1 x1; 1 x2; . . . ; 1 xn]   u = [a; b]

To minimise ‖y − Au‖ we want Au to be as close as possible to y

We can use projection to find the closest point.
290
We seek the vector in
W = {Av | v ∈ R²} (= the column space of A)
that is closest to y.
The closest vector is precisely projW y.
So to find u we could project y to W to get Au = projW y, and from this calculate u.
291
However, we can calculate u directly (without finding an orthonormal basis for W) by noting that

〈w, y − projW y〉 = 0 for all w ∈ W
=⇒ 〈Av, y − Au〉 = 0 for all v ∈ R²
=⇒ (Av)^T (y − Au) = 0 for all v ∈ R²
=⇒ v^T A^T (y − Au) = 0 for all v ∈ R²
=⇒ A^T (y − Au) = 0
=⇒ A^T y − A^T A u = 0
=⇒ A^T A u = A^T y

From this we can calculate u, given that we know A and y.
292
Summary
Given data points (x1, y1), (x2, y2), . . . , (xn, yn), to find the straight line y = a + bx of best fit, we find u = [a; b] such that

A^T A u = A^T y   (∗)

where y = [y1; . . . ; yn] and A = [1 x1; 1 x2; . . . ; 1 xn].

If A^T A is invertible (and it usually is), the solution to (∗) is given by

u = (A^T A)⁻¹ A^T y
293
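For the straight-line case the normal equations (∗) are just a 2 × 2 linear system, so they can be solved in a few lines of Python. A sketch under the assumption that A^T A is invertible (`fit_line` is our own name):

```python
def fit_line(points):
    """Solve the 2x2 normal equations A^T A u = A^T y for u = (a, b)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sxx = sum(x * x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    # A^T A = [[n, sx], [sx, sxx]],  A^T y = (sy, sxy)
    det = n * sxx - sx * sx          # assumed non-zero (A^T A invertible)
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

a, b = fit_line([(-1, 1), (1, 1), (2, 3)])
print(a, b)  # 9/7 and 4/7, matching the example on the next slide
```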
Example
Find the straight line which best fits the data points (−1, 1), (1, 1), (2, 3)
Answer
Line of best fit is: y = 9/7 + (4/7)x
294
294
Extension
The same method works for finding quadratic fitting curves. To find the quadratic y = a + bx + cx² which best fits data (x1, y1), (x2, y2), . . . , (xn, yn) we take

A = [1 x1 x1²; 1 x2 x2²; . . . ; 1 xn xn²],  y = [y1; . . . ; yn]

and solve
A^T A u = A^T y
for
u = [a; b; c]
295
Topic 7: Eigenvalues and Eigenvectors [AR Chapt 7]
7.1 Definition of eigenvalues and eigenvectors
7.2 Finding eigenvalues
7.3 The Cayley-Hamilton theorem
7.4 Finding eigenvectors
7.5 Diagonalization
296
7.1 Definition of eigenvalues and eigenvectors
The topic of eigenvalues and eigenvectors is fundamental to many applications of linear algebra. These include quantum mechanics in physics, image compression and reconstruction in computing and engineering, and the analysis of high dimensional data in statistics.

The key idea is to identify those subspaces which map to themselves under a linear transformation T.

For any vector 0 ≠ v ∈ V, Span{v} is a subspace of V of dimension 1.

Suppose that
T(v) = λv
for some scalar λ. It follows that T maps the subspace Span{v} to itself.
Note that λ is a scale factor which stretches the subspace and possibly changes its sense (if λ is negative).
297
Definition
Let T : V → V be a linear transformation.
A scalar λ is an eigenvalue of T if there is a non-zero vector v ∈ V such that
T(v) = λv   (∗)
The vector v is called an eigenvector of T (with eigenvalue λ).
If λ is an eigenvalue of T, then the set of all v satisfying (∗) is a subspace of V (exercise!) and is called the eigenspace of λ.

We will restrict attention to the case that V is finite dimensional. Then T can be represented by a matrix [T], which we know depends on the choice of bases.
298
In fact the idea of eigenvalues and eigenvectors can be applied directly to square matrices.

Definition
Let A be an n × n matrix and let λ be a scalar. Then a non-zero n × 1 column matrix v with the property that
Av = λv
is called an eigenvector, while λ is called the eigenvalue.

To develop some geometric intuition, it is handy to think of A as being the standard matrix of a linear transformation.

Example
Consider the matrix [1 4; 1 1] as the standard matrix of a linear transformation. What is the effect of the transformation on the vectors (2, 1) and (2, −1)?
299
7.2 Finding eigenvalues
With I denoting the n × n identity matrix, the defining equation for eigenvalues and eigenvectors can be rewritten

(A − λI)v = 0

The values of λ for which this equation has non-zero solutions are precisely the eigenvalues.

Theorem
The homogeneous linear system (A − λI)v = 0 has a non-zero solution if and only if det(A − λI) = 0. Consequently, the eigenvalues of A are the values of λ for which

det(A − λI) = 0
300
Notation
The equation det(A − λI) = 0 is referred to as the characteristic equation. From our study of determinants, we know that det(A − λI) is a polynomial of degree n in λ. It is called the characteristic polynomial.

Example
Find the eigenvalues of [1 4; 1 1].
301
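For a 2 × 2 matrix the characteristic equation is the quadratic λ² − (trace)λ + det = 0, so the eigenvalues can be computed directly from the quadratic formula. A small Python sketch (our own helper, assuming the roots are real):

```python
def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of [[a, b], [c, d]] from det(A - lambda I) = 0,
    i.e. lambda^2 - (a + d) lambda + (ad - bc) = 0."""
    tr, det = a + d, a * d - b * c
    disc = (tr * tr - 4 * det) ** 0.5   # assumes a non-negative discriminant
    return (tr + disc) / 2, (tr - disc) / 2

print(eigenvalues_2x2(1, 4, 1, 1))  # (3.0, -1.0)
```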
Example
Find the eigenvalues of [0 1; −1 0].

If we use the standard basis, how can the corresponding linear transformation be described geometrically? How does this tell you that the matrix does not have any invariant subspaces?
302
Example
Find the eigenvalues of the matrix A = [1 0 0 0; 0 1 0 0; 0 0 1 −1; 0 0 1 1].
303
7.3 The Cayley-Hamilton theorem
We note (without proof) the following
Theorem (Cayley-Hamilton Theorem)
A square matrix satisfies its characteristic equation.
This means (among other things) that
Every (positive) power of A can be expressed as a linear combination of I, A, . . . , Aⁿ⁻¹.
If A is invertible, then A⁻¹ can be expressed as a linear combination of I, A, . . . , Aⁿ⁻¹.
304
Examples
Let A = [3 2; 1 4]
1. Calculate the characteristic equation of A.
2. Verify that A satisfies its characteristic equation.
3. Express A⁻¹ as a linear combination of A and I and hence calculate it.
4. Express A³ as a linear combination of A and I and hence calculate it.
305
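As a numerical check of the Cayley-Hamilton theorem for this A (anticipating the first answer: the characteristic equation is λ² − 7λ + 10 = 0, from trace 7 and determinant 10), a Python sketch with our own `matmul` helper:

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[3, 2], [1, 4]]
I = [[1, 0], [0, 1]]
A2 = matmul(A, A)

# Cayley-Hamilton: A^2 - 7A + 10I should be the zero matrix
Z = [[A2[i][j] - 7 * A[i][j] + 10 * I[i][j] for j in range(2)]
     for i in range(2)]
print(Z)  # [[0, 0], [0, 0]]

# Rearranging the characteristic equation gives A^{-1} = (7I - A)/10
Ainv = [[(7 * I[i][j] - A[i][j]) / 10 for j in range(2)] for i in range(2)]
print(matmul(A, Ainv))  # close to the identity matrix
```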
7.4 Finding eigenvectors
To find the eigenvectors of a matrix
For each eigenvalue λ, solve the homogeneous linear system (A − λI)v = 0.
Use row reduction as usual.
Note that rank(A − λI) < n, so you always obtain at least one row of zeros.

Example
For each of the eigenvalues 3, −1 of the matrix [1 4; 1 1], find a corresponding eigenvector.
306
Example
For each of the eigenvalues λ = −1, 8 of the matrix [3 2 4; 2 0 2; 4 2 3] (the eigenvalue −1 is repeated twice), find a basis for the corresponding eigenspace.
307
7.5 Diagonalization [AR 7.2]
We now take up the question of when the eigenvectors of an n × n matrix A form a basis for Rⁿ. Such bases are extremely important in the applications of eigenvalues and eigenvectors mentioned earlier.

Definition
A square matrix A is said to be diagonalizable if there is an invertible matrix P such that P⁻¹AP is a diagonal matrix. The matrix P is said to diagonalize A.
308
To test if A is diagonalizable, the following theorem can be used.
Theorem
An n × n matrix A is diagonalizable if and only if there is a basis for Rⁿ all of whose elements are eigenvectors of A.

Idea of proof:
If we can form a basis in which all the basis vectors are eigenvectors of T, then the new matrix for T will be diagonal.
309
A simple case in which A is diagonalizable is when A has n distinct eigenvalues. This follows from the theorem above and the following

Lemma
Eigenvectors corresponding to distinct eigenvalues are linearly independent.
Idea of proof:
310
Example
Give reasons why the matrix A = [1 2; 0 1] cannot be diagonalized.
311
How to diagonalize a matrix
Suppose A is diagonalizable. How can we find an invertible matrix P, and a diagonal matrix D with D = P⁻¹AP?

Theorem
Let A be a diagonalizable n × n matrix. Thus there exists a basis v1, . . . , vn for Rⁿ whose elements are eigenvectors of A.
Let P = [[v1] · · · [vn]] and let λi be the eigenvalue of the eigenvector vi. Then

P⁻¹AP = diag[λ1, λ2, . . . , λn]

So, if we have found a basis consisting of eigenvectors, then we can write down matrices P and D without further calculation.
312
Example
Check that (2, 1, 2), (−1, 2, 0), (−1, 0, 1) are eigenvectors of the matrix

A = [3 2 4; 2 0 2; 4 2 3]

and read off the corresponding eigenvalues.
Write down an invertible matrix P such that D = P⁻¹AP is diagonal, and write down the diagonal matrix D.
Check your answer by evaluating both sides of the equation PD = AP.
313
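The final check suggested above, PD = AP, is a pair of matrix multiplications. A Python sketch (`matmul` is our own helper; the columns of P are the given eigenvectors):

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[3, 2, 4], [2, 0, 2], [4, 2, 3]]
# Columns of P are the eigenvectors (2,1,2), (-1,2,0), (-1,0,1)
P = [[2, -1, -1],
     [1,  2,  0],
     [2,  0,  1]]
# Eigenvalues in the same order
D = [[8, 0, 0], [0, -1, 0], [0, 0, -1]]

print(matmul(A, P))
print(matmul(P, D))  # the two products agree, confirming D = P^{-1} A P
```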
Orthogonal matrices
Because a matrix often represents a physical system it can be important that the change-of-basis transformation does not affect shape. This will happen when the change-of-basis matrix is orthogonal:

Definition
An n × n matrix P is orthogonal if the columns of P form an orthonormal basis of Rⁿ.
Examples
[−1/√2 1/√6 1/√3; 0 −2/√6 1/√3; 1/√2 1/√6 1/√3] and [cos θ −sin θ; sin θ cos θ] are orthogonal,
but [1 1; 0 1] and [1 −1; 1 1] are not.
314
Orthogonal matrices have some good properties.
If P is orthogonal n × n and u, v ∈ Rⁿ then
P⁻¹ = P^T
‖Pu‖ = ‖u‖
〈Pu, Pv〉 = 〈u, v〉

We can summarise by saying that orthogonal matrices ‘preserve’ length and angle.
315
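These properties are easy to confirm numerically for a rotation matrix, which is orthogonal. A Python sketch:

```python
import math

theta = 0.7  # any angle; rotation matrices are orthogonal
P = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

# P^T P = I, so P^{-1} = P^T
PtP = [[sum(P[k][i] * P[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]
print(PtP)  # close to [[1, 0], [0, 1]]

# Orthogonal matrices preserve length: ||Pu|| = ||u||
u = (3.0, 4.0)
Pu = (P[0][0] * u[0] + P[0][1] * u[1],
      P[1][0] * u[0] + P[1][1] * u[1])
print(math.hypot(*Pu))  # close to 5.0, the same as ||u||
```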
Real symmetric matrices
Definition
A matrix A is symmetric if A^T = A

Examples
[1 2; 2 3] and [3 −1 4; −1 1 5; 4 5 9] are symmetric,
but [2 1; 3 2] and [3 −1 4; −1 1 5; 4 6 9] are not.

Symmetric matrices arise (for example) in:
quadratic functions
inner products
maxima and minima of functions of more than one variable
316
Theorem
Let A be an n × n real symmetric matrix. Then:
1. all roots of the characteristic polynomial of A are real;
2. eigenvectors from distinct eigenvalues are orthogonal;
3. A is diagonalizable; (in fact, there is an orthonormal basis of eigenvectors)
4. we can write A = QDQ⁻¹ where D is diagonal and Q is an orthogonal matrix.

Note that, in this case, the diagonalization formula can be written A = QDQ^T.
Idea of proof of 2:
317
Example
Find matrices D and Q as above for A = [2 −1; −1 2]
318
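For this small symmetric matrix the answer can be verified by reconstructing A from Q and D. A Python sketch; the eigenvalues 3 and 1 and the unit eigenvectors below are our own computed answers to the example, so treat this as a worked check:

```python
import math

A = [[2, -1], [-1, 2]]
# Eigenvalues from lambda^2 - 4 lambda + 3 = 0
l1, l2 = 3.0, 1.0
# Corresponding unit eigenvectors as columns of Q; note they are orthogonal
r2 = math.sqrt(2)
Q = [[1 / r2,  1 / r2],
     [-1 / r2, 1 / r2]]
D = [[l1, 0], [0, l2]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Qt = [[Q[j][i] for j in range(2)] for i in range(2)]
# Since Q is orthogonal, Q D Q^T should reconstruct A
print(matmul(matmul(Q, D), Qt))  # close to [[2, -1], [-1, 2]]
```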
Powers of a matrix
In applications, one often comes across the need to apply a transformation many times. In the case that the transformation can be represented by a diagonalizable matrix A, it is easy to compute Aᵏ and thus the action of the k-th application of the transformation.

The first point to appreciate is that computing powers of a diagonal matrix D is easy.

Example
With D = diag[1, −3, 2], write down D² and D³.
In fact, we have
Lemma
Suppose A is diagonalizable, so that D = P⁻¹AP is diagonal. Then

Aᵏ = PDᵏP⁻¹
319
Example
For the matrix of slide 301 we can write

A = PDP⁻¹ with D = [3 0; 0 −1], P = [2 −2; 1 1]

Thus

Aᵏ = PDᵏP⁻¹ with P⁻¹ = (1/4)[1 2; −1 2]

Explicitly,

Aⁿ = (1/4) [2 −2; 1 1] [3ⁿ 0; 0 (−1)ⁿ] [1 2; −1 2]
   = (1/4) [2(3ⁿ + (−1)ⁿ) 4(3ⁿ − (−1)ⁿ); 3ⁿ − (−1)ⁿ 2(3ⁿ + (−1)ⁿ)]
320
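The closed form above can be checked against repeated multiplication. A Python sketch (`matmul` and `power_closed_form` are our own names):

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 4], [1, 1]]

def power_closed_form(n):
    """A^n = (1/4) [[2(3^n + (-1)^n), 4(3^n - (-1)^n)],
                    [3^n - (-1)^n,    2(3^n + (-1)^n)]]"""
    s, d = 3 ** n + (-1) ** n, 3 ** n - (-1) ** n
    return [[2 * s / 4, 4 * d / 4], [d / 4, 2 * s / 4]]

# Compare against A, A^2, ..., A^5 computed by repeated multiplication
M = A
for n in range(1, 6):
    assert power_closed_form(n) == [[float(x) for x in row] for row in M]
    M = matmul(M, A)
print("closed form agrees with A^1 ... A^5")
```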
Conic Sections [AR 9.6]
Consider equations in x and y of the form
ax² + bxy + cy² + dx + ey + f = 0
where a, b, c, d, e, f ∈ R are constants.
The graphs of such equations are called conic sections or conics.
Example
9x² − 4xy + 6y² − 10x − 20y − 5 = 0

[Figure: the graph of this curve, an ellipse]
We will see how the shape of this graph can be calculated using diagonalization.
321
We shall assume that d and e are zero. See [AR 9.6] for a discussion of how to reduce to this case.

If the equation is simple enough, we can identify the conic by inspection.

Standard Conics: (see figure 9.6.1 in AR)
ellipse: x²/α² + y²/β² = 1
hyperbola: x²/α² − y²/β² = 1
parabola: y = αx²
322
The equation in matrix form
Consider the curve defined by the equation

ax² + bxy + cy² = 1

The equation can be written in matrix form as

[x y] [a b/2; b/2 c] [x; y] = 1

That is,
x^T A x = 1
where x = [x; y], and A is a real symmetric matrix.

We can diagonalize in order to simplify the equation so that we can identify the curve. Let's demonstrate with an example.
323
Identify and sketch the conic defined by x² + 4xy + y² = 1

This can be written as x^T A x = 1 where A = [1 2; 2 1]

Diagonalizing gives A = QDQ^T with D = [3 0; 0 −1], Q = (1/√2)[1 −1; 1 1]

Let x′ = [x′; y′] be the co-ordinates of (x, y) relative to the orthonormal basis of eigenvectors: B = {(1/√2, 1/√2), (−1/√2, 1/√2)}

Then x = Qx′ (Q is precisely the transition matrix P_{S,B}) and the equation of the conic can be rewritten:

x^T A x = 1 ⇐⇒ (Qx′)^T QDQ^T (Qx′) = 1
⇐⇒ (x′)^T Q^T QDQ^T Q x′ = 1
⇐⇒ (x′)^T D x′ = 1
⇐⇒ 3(x′)² − (y′)² = 1

So the curve is a hyperbola.
324
The x′-axis and the y′-axis are called the principal axes of the conic.
The directions of the principal axes are given by the eigenvectors.
In this example the directions of the principal axes are:
(1/√2, 1/√2) and (−1/√2, 1/√2).

[Figures: the hyperbola 3(x′)² − (y′)² = 1 in the (x′, y′)-plane, and the curve x² + 4xy + y² = 1 in the (x, y)-plane with the x′- and y′-axes marked]
325
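The whole classification step can be automated for curves ax² + bxy + cy² = 1: form A = [a b/2; b/2 c], compute its eigenvalues, and read off the type from their signs. A hedged Python sketch (our own function; degenerate cases are not handled):

```python
def classify_conic(a, b, c):
    """Classify a x^2 + b xy + c y^2 = 1 from the eigenvalue signs of
    A = [[a, b/2], [b/2, c]]. A sketch: degenerate cases are ignored."""
    tr, det = a + c, a * c - b * b / 4
    disc = (tr * tr - 4 * det) ** 0.5   # real, since A is symmetric
    l1, l2 = (tr + disc) / 2, (tr - disc) / 2
    if l1 > 0 and l2 > 0:
        kind = "ellipse"
    elif l1 * l2 < 0:
        kind = "hyperbola"
    else:
        kind = "no conic of this form"
    return (l1, l2), kind

print(classify_conic(1, 4, 1))  # ((3.0, -1.0), 'hyperbola')
```

For the example x² + 4xy + y² = 1 this gives eigenvalues 3 and −1, hence a hyperbola; for the quadratic part of the slide-321 curve, `classify_conic(9, -4, 6)` reports an ellipse, consistent with its graph.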
Summary
We can represent the equation of a conic (centered at the origin) by a matrix equation x^T A x = 1 with A symmetric.
The eigenvectors of A will be parallel to the principal axes of the conic.
So if Q represents the change of basis matrix then Q will be orthogonal and Q^T A Q = D will be diagonal.
If x = Qx′ then x′ represents the coordinates with respect to the new basis, and the equation of the conic with respect to this basis is (x′)^T D x′ = 1.
The conic can now be identified.
326
Example (a quadric surface)
The equation
−x² + 2y² + 2z² + 4xy + 4xz − 2yz = 1
represents a ‘quadric surface’ in R³.
In matrix form it can be represented as x^T A x = 1 with

x = [x; y; z] and A = [−1 2 2; 2 2 −1; 2 −1 2]

The eigenvalues of A are 3, 3, −3. So the equation of the surface with respect to an orthonormal basis of eigenvectors is

3X² + 3Y² − 3Z² = 1
327
The surface is a ‘hyperboloid of one sheet’; see the sketch below.
(You are not expected to identify quadric surfaces in three dimensions.)
328