
Engineering Mathematics 1 – Summer 2012
Linear Algebra

The commercial message. Linearity is a key concept in mathematics and its applications. Linear objects are usually nice, smooth, clean, easy to handle. Nonlinear ones are slimy, temperamental, and full of dark corners in which monsters can hide. The world, unfortunately, has a strong tendency toward non-linearity, but there is no way that we can understand nonlinear phenomena without first having a good understanding of linear ones. In fact, one of the most common first approaches to a non-linear problem may be to approximate it by a linear one.

1 Welcome to the world of linear algebra: Vector Spaces

Vector spaces, also known as linear spaces, come in two flavors, real and complex. The main difference between them is what is meant by a scalar. When working with real vector spaces, a scalar is a real number. When working with complex vector spaces, a scalar is a complex number. The important thing is not to mix the two flavors. You either work exclusively with real vector spaces, or exclusively with complex ones. Well . . . nothing is that definite. There are times when one wants to go from one flavor to the other, but that should be done with care. One thing to remember, however, is that real numbers are also part of the complex universe; a real number is just a complex number with zero imaginary part. So when working with complex vector spaces, real numbers are also scalars because they are also complex numbers.

So what is a vector space? We could give a vague definition saying it is an environment that is linear algebra friendly. But we need to be more precise. Linear operations are very basic ones, so as a better definition we can say that a vector space is any set of objects in which we have defined two operations:

1. An addition, so that if we have any two objects in that set, we know what their sum is. And that sum should also be an element of the set.

2. A scalar multiplication; it should be possible to multiply objects of the set by scalars to get (usually) new objects in the set. If by scalars we mean real numbers, then we have a real vector space; if we mean complex numbers, we have a complex vector space.

There is some freedom in how these operations are defined, but only a little bit. The operations should have the usual properties we associate with sums and products. Before giving examples, we need to be a bit more precise.

In general, I will use lower case boldface letters to stand for vectors, a vector being simply an object in a set that has been identified as a vector space, and regularly written lower case letters for scalars (real or complex numbers). If V is a vector space, if v, w are elements of V (in other words, vectors), then their sum is denoted by (written as) v + w, which also has to be in V.

If a is a scalar, v an element of V, then av denotes the scalar product of a times v. It should be an element of V.

Summary of the information so far. By scalar we mean a complex or a real number. We should be aware that if our scalars are real numbers, they should always be real; if complex, always complex. A vector (or linear) space is any set of objects in which we have defined a sum and a scalar product satisfying some (soon to be specified) properties. In this context, the elements of this set are called vectors. So we can say that a vector space is a set of vectors closed under addition and scalar multiplication, and a vector is an element of a vector space.

Finally! Here are the basic properties these operations have to satisfy. For a set V of objects to deserve to be called a vector space it is, in the first place, necessary (as mentioned already several times) that if v, w ∈ V, then we have defined v + w ∈ V, and if a is a scalar, v ∈ V, we have defined av ∈ V. These operations must satisfy:

1. (Associativity of the sum) If v, w, z ∈ V, then (v + w) + z = v + (w + z). This allows us to write the sum of any number of objects (vectors) without using parentheses: If v, w, y, z ∈ V, it makes sense to write v + w + y + z. One person may compute this sum by adding first v + w, then y + z, finally adding these two together. Another person may first compute w + y, then add v in front to get v + (w + y), finally add z to get (v + (w + y)) + z. The result is the same.

2. (Commutativity of the sum) If v,w ∈ V , then v + w = w + v.

3. (Existence of 0) There exists a unique element of V , usually denoted by 0, such that v + 0 = v, for all v ∈ V .

4. (Existence of an additive inverse) If v ∈ V, there exists a unique element −v ∈ V such that v + (−v) = 0. This allows us to define subtraction; one writes v − w for v + (−w).

5. (Distributivity 1) If v, w ∈ V and a is a scalar, then a(v + w) = av + aw.

6. (Distributivity 2) If a, b are scalars and v ∈ V, then (a + b)v = av + bv.

7. (Associativity 2) If a, b are scalars and v ∈ V, then a(bv) = (ab)v = b(av).

8. (One is one) 1v = v for all v ∈ V . (Here 1 is the scalar number 1.)

These properties have consequences. The main consequence is that one operates with vectors as with numbers, as long as things make sense. So, for example, in general, one can’t multiply a vector by a vector (there will be particular exceptions). But otherwise, what seems right is usually right. Here are some examples of what is true in any vector space V. Any scalar times the 0 vector is the zero vector. In symbols: a0 = 0. It is not one of the eight properties listed above, but it follows from them. It is also true that the scalar 0 times any vector is the zero vector: 0v = 0. One good thing is that all the linear spaces we will consider will be quite concrete, and all the properties of the operations quite evident. At least, so I hope. What this abstract part does is to provide a common framework for all these concrete sets of objects.
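Here, for instance, is how 0v = 0 follows from the eight properties, in two lines: by property 6, 0v = (0 + 0)v = 0v + 0v; adding −(0v) to both sides and using properties 1, 4, and 3 gives 0 = 0v. The argument for a0 = 0 is entirely similar, using property 5 in place of property 6.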

Here are a number of examples. It might be a good idea to check that in every case the 8 properties hold or, at the very least, convince yourself that they hold.

• Example 1. The stupid vector space. (But, since we are scientists and engineers, we cannot use the word stupid. We must be dignified. It is usually called the trivial vector space.) It is a silly example, but it needs to be seen. It is the absolute simplest case of a linear space. The space has a single vector, the 0 vector. Addition is quite easy; all you need to know is that 0 + 0 = 0. So is scalar multiplication: If a is a scalar, then a0 = 0. And −0 = 0.

The trivial vector space can be either real or complex. The next set of examples consists of real vector spaces.

• Example 2. The next vector space, just one degree above the previous one in complexity, is the set R of real numbers. Here the real numbers are forced to play a double role, have something like a double personality: On the one hand they are the numbers they always are, the scalars. But they also are the vectors. If a, b ∈ R (I write them in boldface to emphasize that now they are acting as vectors), one defines a + b as usual. And if a ∈ R and c is a scalar, then ca is just the usual product ca of c times a, but now interpreted as a vector.

• Example 3. Our main examples of real vector spaces are the spaces known as Rn, where n is a positive integer. We already saw R1; it is just R. Now we will meet the next one: the space R2 consists of all pairs of real numbers; in symbols:

R2 = {(a, b) : a, b are real numbers}.

As we all know, we can identify the elements of R2 with points in the plane; once we fix a system of cartesian coordinates, we identify the pair (a, b) with the point of cartesian coordinates (a, b). Operations are defined in a more or less expected way:

(a, b) + (c, d) = (a+ c, b+ d), (1)

a(c, d) = (ac, ad). (2)

Verifying associativity, commutativity, and distributivity reduces to the usual associative, commutative, and distributive properties of the operations for real numbers. The 0 element is 0 = (0, 0); obviously

(a, b) + 0 = (a, b) + (0, 0) = (a+ 0, b+ 0) = (a, b).


The additive inverse is also easy to identify, −(a, b) = (−a,−b):

(a, b) + (−a,−b) = (a+ (−a), b+ (−b)) = (0, 0) = 0.

These operations have geometric interpretations. I said that R2 can be identified with points of the plane. But another interpretation is to think of the elements of R2 as being vectors: we represent a pair (a, b) as an arrow beginning at (0, 0) and ending at the point of coordinates (a, b). Then we can add the vectors by the parallelogram rule: Complete the parallelogram having two sides determined by the vectors you want to add. The diagonal of the parallelogram from the origin is the sum of the vectors. The picture shows how to add graphically a = (5, 2) and b = (3, 7) to get a + b = (8, 9).

In many applications one needs free vectors, vectors that do not have their origin at the origin. We can then think of the pair (a, b) as an arrow that can start at any point we wish, but once we fix the starting point, the end-point, the tip of the arrow, is a units in the x-direction, b units in the y-direction, from the starting point. This gives us an alternative way of adding vectors: To add a and b, place the origin of b at the end of a. Then a + b is the vector starting where a starts, ending where b ends.

Here is a picture of the sum of the same two vectors as before done by placing the beginning of one at the end of the other. In black it is a + b, in red b + a. The picture shows that the end result is the same.


It is also easy to see what b − a should be graphically. It should be a vector such that when it follows a we get b. In the parallelogram construction, it is the other diagonal of the parallelogram.

What happens if a and b are parallel? Drawing a parallelogram can be a bit of a problem, but following one vector by the other is no problem. For example if a = (a1, a2) and b = (−a1, −a2), then a, b have the same length and point in exactly opposite directions. If you place b starting where a ends, you cancel out the effect of a. The sum is a vector starting and ending at the same point, the 0 vector. Of course, analytically, a + b = (a1 − a1, a2 − a2) = (0, 0) = 0.

The next picture is a graphic illustration of the associative property of the sum of vectors.


There is also a graphic interpretation of scalar multiplication. Thinking of these vectors as arrows, one frequently reads that a vector is an object that has a magnitude and a sense of direction. So does a weather vane and a lot of animals; I mention this to point out how vague this definition is. But it is a useful way of thinking about vectors. A vector is a magnitude or length, pointing in some direction. Multiplying by a positive scalar (number) keeps the direction the same, but multiplies the length of the vector by the scalar. If the scalar is negative, the length gets multiplied by the absolute value of the scalar, and the vector gets turned around so it points in the exact opposite direction from where it was pointing before. If the scalar is 0, you get the zero vector. Incidentally, the magnitude or length of the vector of components (a, b) is $|(a, b)| = \sqrt{a^2 + b^2}$ (as Pythagoras has decreed!).

It is good to keep the graphic interpretation in mind for the applications, but when it comes to doing computations, seeing vectors in the plane as simply pairs of numbers, as R2, makes for a better, more precise, more efficient way of proceeding.

• Example 3, continued. The space R3. This is the space of all triples of real numbers; in symbols:

R3 = {(a, b, c) : a, b, c are real numbers}.

It is similar to R2, just one additional component. The operations are

(a, b, c) + (d, e, f) = (a+ d, b+ e, c+ f), (3)

r(a, b, c) = (ra, rb, rc). (4)

The 0 element is 0 = (0, 0, 0); obviously

(a, b, c) + 0 = (a, b, c) + (0, 0, 0) = (a+ 0, b+ 0, c+ 0) = (a, b, c).

The additive inverse is also easy to identify, −(a, b, c) = (−a,−b,−c):

(a, b, c) + (−a,−b,−c) = (a+ (−a), b+ (−b), c+ (−c)) = (0, 0, 0) = 0.

We can think of these vectors as points in 3-space (after a system of cartesian coordinates has been set up) or as free “vectors” in 3-space. In this second interpretation, v = (a, b, c) is an “arrow” that we can start from any point we wish, as long as the end-point is a units in the x-direction, b in the y-direction, c in the z-direction, away from its beginning. To add two vectors graphically we can still follow one vector by the other one. Or we can start them both from the same point. If they are not parallel they will determine a plane, and on this plane we can use the rule of the parallelogram to add them. If they are parallel, work on any of the infinite number of planes that contain the two vectors. The length or magnitude of v = (a, b, c) is again given by Pythagoras: $|\mathbf{v}| = \sqrt{a^2 + b^2 + c^2}$.


• Example 4. Why stop at 3? If n is a positive integer, we denote by Rn the set of all n-tuples of real numbers. A typical element of Rn will be denoted by a = (a1, . . . , an). With a bit of imagination we might be able to imagine spaces of any number of dimensions; then the elements of Rn can be thought of as points in an n-dimensional space. Isn’t this exciting? The operations are

(a1, . . . , an) + (b1, . . . , bn) = (a1 + b1, a2 + b2, . . . , an + bn), (5)

c(a1, . . . , an) = (ca1, ca2, . . . , can). (6)

The 0 element is 0 = (0, . . . , 0) (n positions); obviously, if a = (a1, . . . , an), then

a + 0 = (a1, . . . , an) + (0, . . . , 0) = (a1, . . . , an) = a.

The additive inverse is also easy to identify, −(a1, . . . , an) = (−a1, . . . ,−an).
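If you want to experiment with the operations (5) and (6) on a computer, here is a minimal sketch in Python; the tuples and the function names vadd and smul are my own inventions, not standard notation:

```python
# Componentwise operations in R^n, modeled with plain Python tuples.
def vadd(a, b):
    """The sum (5): (a1 + b1, ..., an + bn)."""
    assert len(a) == len(b), "both vectors must live in the same R^n"
    return tuple(x + y for x, y in zip(a, b))

def smul(c, a):
    """The scalar product (6): (c*a1, ..., c*an)."""
    return tuple(c * x for x in a)

a = (1.0, -2.0, 0.5, 3.0)
b = (0.0, 4.0, 1.5, -1.0)
zero = (0.0,) * 4

assert vadd(a, b) == vadd(b, a)        # property 2 (commutativity)
assert vadd(a, zero) == a              # property 3 (existence of 0)
assert vadd(a, smul(-1.0, a)) == zero  # property 4 (additive inverse)
```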

• Example 5. What if we try something a bit more exotic? Suppose we again take R2, pairs of real numbers, but define (out of perversity)

(a, b) + (c, d) = (ac, bd), r(a, b) = (ra, rb).

Well, it won’t work. Commutativity and associativity still hold. There even is something acting like a zero element, namely (1, 1). In this crazy definition, (a, b) + (1, 1) = (a · 1, b · 1) = (a, b). Most elements even have “additive” inverses; in this definition (a, b) + (1/a, 1/b) = (a(1/a), b(1/b)) = (1, 1), which is the zero element. But “most” is not enough! It has to be ALL. And any pair of which at least one component is 0 has no “additive” inverse. For example, (3, 0) + (c, d) in this strange bad addition works out to (3c, 0) and can never be (1, 1), no matter what (c, d) is. This is not a vector space.

• Example 6. Matrices. A matrix is a rectangular array of numbers. In this arrangement, the horizontal levels are called rows, each vertical grouping is a column. If the matrix has m rows and n columns, we say it is an m × n matrix. Here is a picture copied (stolen?) from Wikipedia, illustrating some basics.

The entries of a matrix are usually denoted by double subindices; the first subindex denotes the row, the second the column in which the entry has been placed. The Wikipedia matrix seems to have no end to the right or downwards. If I were to present an example of an abstract m × n matrix (as I am about to do), I’d write it up as follows:

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix}$$

One surrounds the array of numbers with parentheses (or square brackets), so as to make sure they stay in place. I also dispense with commas between the subindices in the abstract notation; I write a12 rather than a1,2 for example. Wherever this could cause confusion, I would add commas. For example, in the unlikely event that we have to deal with a matrix in which m or n (or both) are greater than 10, talking of an element a123 is confusing. It could be a1,23 or a12,3. Commas are indicated.


We make a vector space out of m × n matrices, which I’ll denote by Mm,n, by defining addition and scalar multiplication in what could be said to be the obvious way, entrywise:

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix} + \begin{pmatrix} b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ b_{m1} & b_{m2} & b_{m3} & \cdots & b_{mn} \end{pmatrix} = \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1n}+b_{1n} \\ a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2n}+b_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mn}+b_{mn} \end{pmatrix}$$

for the sum, and

$$c\begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix} = \begin{pmatrix} ca_{11} & ca_{12} & ca_{13} & \cdots & ca_{1n} \\ ca_{21} & ca_{22} & ca_{23} & \cdots & ca_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ ca_{m1} & ca_{m2} & ca_{m3} & \cdots & ca_{mn} \end{pmatrix}$$

for the product by the scalar c. The same definitions, written in a more compact way, are: If A = (aij) and B = (bij) are two m × n matrices, then

A + B = (aij + bij), cA = (caij).

It is quite easy to check that with these operations Mm,n is a vector space. The 0 vector in this space is the zero matrix, the matrix all of whose entries are 0. The additive inverse of (aij) is (−aij).

Just to make sure we are on the same page, here are a few examples. These could be typical beginning linear algebra exercises.

1. Let

$$A = \begin{pmatrix} 1 & -3 & 5 \\ 2 & 0 & 1 \\ -7 & 7 & 4 \\ 0 & 5 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} -5 & 2 & 4 \\ 6 & 7 & 8 \\ -1 & -1 & 0 \\ 4 & 4 & 4 \end{pmatrix}, \quad C = \begin{pmatrix} 0 & 1 & 2 \\ -6 & 8 & 11 \\ 2 & 4 & 3 \\ 0 & 0 & 1 \end{pmatrix}.$$

Evaluate 2A− 3B + 5C.

Solution.

$$2A - 3B + 5C = \begin{pmatrix} 17 & -7 & 8 \\ -44 & 19 & 33 \\ -1 & 37 & 23 \\ -12 & -2 & -7 \end{pmatrix}$$

2. Let

$$A = \begin{pmatrix} 1 & 2 & 0 & 2 \\ 3 & 0 & -4 & 2 \end{pmatrix}, \quad B = \begin{pmatrix} -3 & 1 & 4 & 0 \\ 1 & 1 & 2 & 2 \end{pmatrix},$$

and solve the matrix equation

A + 5X = B.

Solution.

$$X = \frac{1}{5}(B - A) = \begin{pmatrix} -4/5 & -1/5 & 4/5 & -2/5 \\ -2/5 & 1/5 & 6/5 & 0 \end{pmatrix}.$$

3. Evaluate A + B if

$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Solution. IT CANNOT BE DONE! IMPOSSIBLE! These matrices belong to different worlds, and different worlds cannot collide. Later on, we’ll see that these two matrices can be multiplied (once we define matrix multiplication); more precisely, AB makes sense, while BA doesn’t. But for now let’s keep in mind that matrices of different types canNOT be added (or subtracted).
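These little computations are also a good first exercise on a computer. Here is a sketch using NumPy (not part of the course, just an illustration) that redoes exercise 1 and shows how exercise 3 fails:

```python
import numpy as np

# Entrywise operations in M_{m,n}: A + B = (a_ij + b_ij), cA = (c a_ij).
A = np.array([[1, -3, 5], [2, 0, 1], [-7, 7, 4], [0, 5, 0]], dtype=float)
B = np.array([[-5, 2, 4], [6, 7, 8], [-1, -1, 0], [4, 4, 4]], dtype=float)
C = np.array([[0, 1, 2], [-6, 8, 11], [2, 4, 3], [0, 0, 1]], dtype=float)

print(2 * A - 3 * B + 5 * C)   # exercise 1 above

# Exercise 3: matrices of different types cannot be added.
D = np.array([[0.0, 1.0], [1.0, 0.0]])   # 2x2
E = np.zeros((2, 3))                      # 2x3
try:
    D + E
except ValueError as err:   # NumPy refuses mismatched shapes
    print("cannot add a 2x2 matrix to a 2x3 matrix:", err)
```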


• Example 7. Functions. I’ll just consider one case. Let I be an interval (open, closed, bounded, or not) in the real line and let V be the set of all (real-valued) functions of domain I. If f, g are functions we define the function f + g as the function whose value at x ∈ I is the sum of the values of f at x and of g at x; in symbols

(f + g)(x) = f(x) + g(x).

We define the scalar product cf (if c is a real number, f a function on I) as the function whose value at x ∈ I is c times the value of f at x; in symbols,

(cf)(x) = cf(x).

It is easy to see that we have a vector space in which 0 is the constant function that is identically 0, and if f ∈ V, then −f is the function whose value at every x ∈ I is −f(x).
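For the computationally minded, the operations on functions translate directly into code. A minimal sketch in Python, where fadd and fscale are names I am making up:

```python
import math

# Pointwise operations on real-valued functions of domain I.
def fadd(f, g):
    """(f + g)(x) = f(x) + g(x)"""
    return lambda x: f(x) + g(x)

def fscale(c, f):
    """(cf)(x) = c * f(x)"""
    return lambda x: c * f(x)

zero = lambda x: 0.0   # the 0 element: the function identically 0 on I

h = fadd(math.sin, fscale(2.0, math.cos))   # the function sin x + 2 cos x
x = 1.3
assert abs(h(x) - (math.sin(x) + 2.0 * math.cos(x))) < 1e-12
assert fadd(math.sin, zero)(x) == math.sin(x)   # f + 0 = f, pointwise
```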

Every one of the examples has its complex counterpart; all we need to do is allow complex numbers as our scalars. Here are examples 2, 3, 4, 6, and 7, redone as complex vector spaces. In each case the real vector space is a subset of the complex one.

• Examples 2’, 3’, 4’. The vector space Cn consists of all n-tuples of complex numbers. The operations are defined by

(a1, . . . , an) + (b1, . . . , bn) = (a1 + b1, a2 + b2, . . . , an + bn), (7)

c(a1, . . . , an) = (ca1, ca2, . . . , can). (8)

The 0 element is 0 = (0, . . . , 0) (n positions); obviously, if a = (a1, . . . , an), then

a + 0 = (a1, . . . , an) + (0, . . . , 0) = (a1, . . . , an) = a.

The additive inverse is, as in the real case, −(a1, . . . , an) = (−a1, . . . ,−an).

• Example 6’. A complex matrix is a rectangular array of complex numbers. We will denote the set of all m × n complex matrices by Mm,n(C). Addition and scalar multiplication are defined as in the real case, except that we allow complex scalars. As an example (one of a zillion trillion possible ones), let A, B ∈ M3,2(C) be defined by

$$A = \begin{pmatrix} 1+i & \sqrt{3} \\ 2-5i & 0 \\ 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 5 & -8 \\ -2 & 1 \\ 0 & 1 \end{pmatrix};$$

then (see if you get the same result!)

$$\sqrt{3}\,A - iB = \begin{pmatrix} \sqrt{3} + (\sqrt{3}-5)i & 3+8i \\ 2\sqrt{3} + (2-5\sqrt{3})i & -i \\ \sqrt{3} & -i \end{pmatrix}$$

• Example 7’. Let I be an interval in the real line and consider all complex valued functions on I. In other words, we consider all functions of the form f(t) = u(t) + iv(t), where u(t), v(t) are real valued functions of domain I, and i = √−1. If f(t) = u1(t) + iv1(t) and g(t) = u2(t) + iv2(t), where u1, u2, v1, v2 are real valued functions of domain I, we define f + g in the natural way: (f + g)(t) = u1(t) + u2(t) + i(v1(t) + v2(t)). If f(t) = u(t) + iv(t) and c = a + ib is a complex number (a scalar), where a, b are real numbers and u, v are real valued functions of domain I, we define cf to be the function

(cf)(t) = c(f(t)) = (a + ib)(u(t) + iv(t)) = au(t) − bv(t) + i(av(t) + bu(t)).

We have again a vector space, a complex one this time.


1.1 Exercises

1. Which of the following sets V, with the operations as defined, is a (real) vector space? In each case either verify all properties (and this includes identifying the zero element), or determine at least one property that fails to hold.

(a) V is the set of all upper triangular n × n matrices, with addition and scalar product defined in the usual way. A square matrix is upper triangular if and only if all entries beneath the main diagonal are 0.

(b) V = {(x, y, z) ∈ R3 : 2x− 3y + 4z = 1}, with the same operations as elements of R3.

(c) V = {(x, y, z) ∈ R3 : 2x− 3y + 4z = 0}, with the same operations as elements of R3.

(d) V = {(x, y) : x, y ∈ R, x > 0} with operations defined as follows:

(x, y) + (x′, y′) = (xx′, y + y′), r(x, y) = (x^r, ry),

if (x, y), (x′, y′) ∈ V, and r is a real scalar.

(e) I = (a, b) is an interval and V is the set of all (real valued) differentiable functions of domain I. That is, f ∈ V if and only if f(x) is defined for all x ∈ I and the derivative f′(x) exists at each x ∈ I. The operations are defined as usual for functions; that is, if f, g are functions on I, then the function f + g on I is the function whose value at any point x is f(x) + g(x); that is, (f + g)(x) = f(x) + g(x). And if f is a function, c a scalar, then cf is defined by (cf)(x) = cf(x).

(f) V is the set of all polynomials (polynomial functions); a polynomial being an expression of the form

$$p(x) = a_nx^n + \cdots + a_1x + a_0,$$

where a0, . . . , an are real numbers. Operations (addition and scalar multiplication) as usual.

(g) V is the set of all polynomials of degree higher than 2, plus the 0 polynomial.

2. Let y(n) + an−1(t)y(n−1) + · · · + a1(t)y′ + a0(t)y = 0 be a linear homogeneous differential equation of order n with coefficients a0(t), . . . , an−1(t) continuous functions in some open interval I of R. Show that the set of solutions of this equation in I is a real vector space. Operations on solutions are defined as usual (see Exercise 1e).

3. Show that the set of complex numbers is a real vector space. That is, if we forget about multiplying two complex non-real numbers and only consider products of the form cz, where c is real and z is complex, then C is a real vector space.

2 What came first, the chicken or the egg?

In his book “God or Golem. A Comment on Certain Points Where Cybernetics Impinges on Religion,” Norbert Wiener (a famous mathematician of the first half of the twentieth century, who among many other things coined the word cybernetics from the Greek word for steersman) reexamines the age-old precedence question. He concludes that the question is meaningless, since both carry the same information. While we usually see an egg as the device by which a chicken produces another chicken, Wiener argues that we may as well consider a chicken as the means by which an egg produces another egg. If two objects have the same information, they are the same. The point of this, when applied to vector spaces, is that SO FAR, the information carried by an m × n matrix is the same as that carried by a row of mn numbers, or by mn numbers written out in any way we please.

Here is something one can do with m × n matrices. I can take an m × n matrix and write out the elements in a row (or column) of mn elements in a variety of ways. For example, I can write out first the elements of the first row, followed by those of the second row, and so forth. To illustrate I’ll use the case of m = 2, n = 3, but a similar thing holds for all m, n. Suppose

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & -2 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 6 & 7 \\ 2 & 0 & -1 \end{pmatrix}$$


and we want to compute 2A + 3B. Instead of doing it the usual way, we could proceed as follows. First we write both A, B as rows of numbers by the procedure I mentioned above. We call these new objects A′, B′:

A′ = (1, 2, 3, 0,−2, 4), B′ = (0, 6, 7, 2, 0,−1).

A′, B′ are vectors in R6. Operating on them by the rules of R6 we get

2A′ + 3B′ = (2, 22, 27, 6,−4, 5).

Now we can reverse the process that we used to get A′, B′ and get

$$2A + 3B = \begin{pmatrix} 2 & 22 & 27 \\ 6 & -4 & 5 \end{pmatrix}$$

The difference between A and A′ (and B and B′) is just one of convenience. As vector spaces, Mm,n and Rmn are identical. The same is true of Mm,n(C) and Cmn. Later on we’ll see that there is a good reason for writing numbers out as arrays, rather than as rows or columns. And that frequently we want to write the elements of Rn or Cn in the form of columns, rather than rows.

3 Subspaces

Given a vector space, inside of it there are usually sets that are themselves vector spaces with the vector operations. For example, in R3 consider the set

V = {(x, y, 0) : x, y real numbers}.

Is it a vector space? The only thing that could possibly go wrong is that the vector operations can take us out of V. Or that some element of V might not have its additive inverse (the thing you get putting a minus sign in front) in V. Or maybe there is no 0 in V. Or maybe the scalar product takes us out of V. But none of these things happens. 0 = (0, 0, 0) is in V; it is the case x = y = 0. The description of V in words is: V is the set of all triples of real numbers in which the last component is 0. Well, adding two such triples produces another such triple, since 0 + 0 = 0. Given such a triple, its additive inverse is of the same type, since −0 = 0. Multiplying such a triple by a scalar produces another one with third component 0, since c · 0 = 0 for all scalars. So yes, V is a vector space. We say it is a subspace of R3. Here is a precise definition:

Assume V is a vector space. A subset W of V is a subspace of V if (and only if)

1. 0 ∈W .

2. If v,w ∈W , then v + w ∈W . (W is closed under addition.)

3. If v ∈W and c is a scalar, then cv ∈W . (W is closed under scalar multiplication.)

With these properties we can work in W without ever having to leave it (except if we want to leave it). We did not ask for W to contain additive inverses of elements because we get that for free. In fact, if W is a subspace, if v ∈ W, then −v = (−1)v; since −1 is a scalar (both in the real and complex case), (−1)v must be in W, meaning −v ∈ W.

Here are more examples.

E1. Let

W = {(x, y, z, w) : x, y, z, w are real numbers and x + 2y − 3z + w = 0}.

This is a subset of R4. Is it a subspace of R4? To answer this question, we have to check the three properties mentioned above. Is 0 ∈ W? For R4, 0 = (0, 0, 0, 0), so x = 0, y = 0, z = 0, w = 0 and, of course, 0 + 2 · 0 − 3 · 0 + 0 = 0. Yes, 0 ∈ W. Next, we have to check that if v, w ∈ W, then v + w ∈ W. For this we assume v = (x, y, z, w), w = (x′, y′, z′, w′) where x, y, z, w, x′, y′, z′, w′ satisfy x + 2y − 3z + w = 0 and x′ + 2y′ − 3z′ + w′ = 0. Then

v + w = (x+ x′, y + y′, z + z′, w + w′)


and the question is whether (x + x′) + 2(y + y′) − 3(z + z′) + (w + w′) = 0. Using a bit of high school mathematics we see that

(x + x′) + 2(y + y′) − 3(z + z′) + (w + w′) = x + 2y − 3z + w + x′ + 2y′ − 3z′ + w′ = 0 + 0 = 0.

We conclude that v + w ∈ W. Finally, suppose v ∈ W, say v = (x, y, z, w) where x + 2y − 3z + w = 0. Suppose c is a scalar (a real number since we are working with a real vector space). Then cv = (cx, cy, cz, cw) and

cx+ 2cy − 3cz + cw = c(x+ 2y − 3z + w) = c · 0 = 0.

It follows that cv ∈W . The conclusion is that W is indeed a subspace of R4.
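One can also spot-check the three subspace properties numerically; the sketch below (all names mine) draws random elements of W and tests closure. Of course, no amount of random testing replaces the little proof above.

```python
import numpy as np

# W = {(x, y, z, w) : x + 2y - 3z + w = 0} as a membership test.
def in_W(v, tol=1e-12):
    x, y, z, w = v
    return abs(x + 2 * y - 3 * z + w) < tol

def random_element_of_W(rng):
    # pick x, y, z freely and solve for w
    x, y, z = rng.standard_normal(3)
    return np.array([x, y, z, -(x + 2 * y - 3 * z)])

rng = np.random.default_rng(0)
assert in_W(np.zeros(4))                  # 0 is in W
for _ in range(100):
    v, u = random_element_of_W(rng), random_element_of_W(rng)
    c = rng.standard_normal()
    assert in_W(v + u) and in_W(c * v)    # closure, numerically
```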

E2. Since real numbers are just complex numbers with the imaginary part equal to 0, it is (or should be) clear that for any integer n > 0, Rn is a subset of Cn. Is it a subspace? At first it seems to be. The zero element, the n-tuple with all components 0, is in Rn. Adding two vectors of Rn results in a vector of Rn. But our scalars are now complex numbers; multiplying a vector of Rn by a scalar will almost always take us out of Rn. Here is an example with n = 3. Take, for example, (1, 2, 3) ∈ R3, and take for scalar the imaginary unit i. Then i(1, 2, 3) = (i, 2i, 3i) ∉ R3.

Rn is a subset but not a subspace of Cn.

E3. (The largest and smallest subspaces.) Suppose V is a vector space. Then V satisfies all the properties of being a subspace of V. Every vector space is a subspace of itself. V is, of course, the largest possible subspace of V. For the smallest one, consider the set W = {0}, the set containing only the zero element of V. Since it contains the zero element, it satisfies the first property of being a subspace. If v, w ∈ W, well that can only be if both are the zero element and then so is their sum; the second property holds. And, since any scalar times the zero element is the zero element, the third property holds also. This is necessarily the smallest possible subspace, known as the trivial subspace.

E4. Think of R3 as points of three space. That is, by setting up a system of cartesian coordinates, we can identify each vector of R3 with a point of space. One can then show that precisely the following subsets of R3 are subspaces:

• The origin, as the set of a single point; the trivial subspace.

• All straight lines through the origin.

• All planes that go through the origin.

• R3.

E5. If A is an m × n matrix, then the transpose of A, denoted by AT, is the n × m matrix whose rows are the columns of A (and, therefore, its columns are the rows of A). The formal description is: If A = (aij)1≤i≤m,1≤j≤n, then AT = (bij)1≤i≤n,1≤j≤m with bij = aji for all i, j. Briefly: If A = (aij), then AT = (aji). For example, if

$$A = \begin{pmatrix} 1 & 0 & 2-3i & -5 \\ 1 & 2 & 3 & 4 \\ -\sqrt{5} & -2+7i & 8 & 9 \end{pmatrix},$$

then

$$A^T = \begin{pmatrix} 1 & 1 & -\sqrt{5} \\ 0 & 2 & -2+7i \\ 2-3i & 3 & 8 \\ -5 & 4 & 9 \end{pmatrix}$$

It is an easy exercise to show that if A, B are m × n matrices, then (A + B)T = AT + BT; if A is an m × n matrix, then (cA)T = cAT for all scalars c.

If m = n, that is, when dealing with square matrices, it is possible to have AT = A. Matrices with this property are called symmetric. Let n be a positive integer and let V = Mn,n or Mn,n(C) (either the real or the complex space of all square n × n matrices). Let W = {A ∈ V : A = AT}. Then W is a subspace of V. I leave the simple checking of the properties as an exercise.
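Here is a quick numerical check of both transpose identities and of membership in the symmetric-matrix subspace, again in NumPy (a sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))

assert ((A + B).T == A.T + B.T).all()    # (A + B)^T = A^T + B^T
assert ((2.5 * A).T == 2.5 * A.T).all()  # (cA)^T = c A^T

S = rng.standard_normal((4, 4))
S = S + S.T              # S + S^T is symmetric for any square S
assert (S == S.T).all()  # so S belongs to the subspace W of E5
```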


E6. Let b be a real number. Let W = {(x, y) : x, y are real numbers and x + y = b}. There is exactly one value of b for which W so defined is a subspace of R2. Find that value and explain why it is the only value that works.

E7. Let I be an interval in the real line and let V be the set of all real valued functions of domain I. As seen earlier, this is a vector space with the usual operations. Let W be the set of all continuous functions of domain I. Because the function that is identically 0 is continuous (about as continuous as can be), it is in W. Since it is also the zero element of V, W contains the 0 element of V. In Calculus 1 we learn (and one hopes remember) that the sum of continuous functions is continuous, and that when we multiply a continuous function by a scalar (real number) the result is again continuous. We conclude W is a subspace of V.

A few concluding remarks (for now) about subspaces. I hope you realize that we don’t always have to call them W. We could even call the vector space W and the subspace V. Or use any other convenient name; one should try never to get hung up on notation.

Example E4 shows a typical fact: there is a certain hierarchy of levels among subspaces. For example, we have lines and planes, and nothing in between. This is due to the fact that vector spaces have a dimension, as we will see in a while.

3.1 Exercises

1. Which of the following are subspaces of R3? Justify your answer.

(a) All vectors of the form (a, 0, 0).

(b) All vectors of the form (a, 1, 1).

(c) All vectors of the form (a, b, c), where a = b+ c.

(d) All vectors of the form (a, b, c), where a = b+ c+ 1.

2. Which of the following are subspaces of M2,2? Justify.

(a) All 2 × 2 matrices with integer entries.

(b) All matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ where a + b + c + d = 0.

(c)

(d) All matrices $\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$.

3. Let I = (a, b) be an interval in R and let V be the vector space of all real valued functions on I; as seen in the notes, it is a vector space with addition and scalar multiplication defined as usual. Let c, d be points in I: a < c < d < b. Determine which of the following are subspaces of V.

(a) The set of all continuous bounded functions on I; that is, the set of all continuous f on I such that there exists some number M (depending on f) such that |f(x)| ≤ M for all x ∈ I.

(b) The set of all continuous functions on I such that $\int_c^d f(x)\,dx = 0$.

(c) The set of all continuous functions on I such that $\int_c^d f(x)\,dx = 1$.

(d) The set of all continuous functions on I such that $\int_c^x f(t)\,dt = f(x)$ for all x ∈ I.

(e) All solutions of the differential equation

$$a_n(t)\frac{d^n y}{dt^n} + a_{n-1}(t)\frac{d^{n-1} y}{dt^{n-1}} + \cdots + a_1(t)\frac{dy}{dt} + a_0(t)y = 0,$$

where a0, a1, . . . , an are continuous functions on I.


4 More on matrices

As we saw, the set of m × n matrices is a vector space; real if we restrict ourselves to real entries, complex if we allow complex numbers. But there is more to matrices than just being a vector space. Matrices can be multiplied. Well, not always; sometimes. We can form the product AB of the matrix A times the matrix B if and only if the number of columns of A equals the number of rows of B. Here is the basic definition. It is best done using the summation symbol Σ.

Suppose A is an m× n matrix and B is an n× p matrix, say

A = (aij)1≤i≤m,1≤j≤n, B = (bjk)1≤j≤n,1≤k≤p.

Then AB = (cik)1≤i≤m,1≤k≤p, where

$$c_{ik} = \sum_{j=1}^{n} a_{ij}b_{jk} = a_{i1}b_{1k} + a_{i2}b_{2k} + \cdots + a_{in}b_{nk}.$$

In words: The element of AB in position (i, k) is obtained as follows. Only the i-th row of A and the k-th column of B are involved. We multiply the first component of the i-th row of A by the first component of the k-th column of B, add to this the product of the second component or entry of the i-th row of A by the second one of the k-th column of B, add to this the product of the third entry of the i-th row of A by the third entry of the k-th column of B, and so forth. Since each row of A has n components, and each column of B has n components, it all works out, and we end by adding the product of the last entries of the i-th row of A and k-th column of B.
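The definition of cik is exactly a triple loop; here is a direct transcription into Python (a sketch of the definition, not an efficient implementation; NumPy’s A @ B computes the same thing):

```python
# AB where A is m x n and B is n x p, straight from the definition
# c_ik = a_i1*b_1k + a_i2*b_2k + ... + a_in*b_nk.
def matmul(A, B):
    m, n = len(A), len(A[0])
    assert len(B) == n, "columns of A must equal rows of B"
    p = len(B[0])
    C = [[0] * p for _ in range(m)]
    for i in range(m):           # i-th row of A
        for k in range(p):       # k-th column of B
            for j in range(n):   # running index of the sum
                C[i][k] += A[i][j] * B[j][k]
    return C

# The first example below, checked by the loop:
A = [[1, 2, 3], [4, 5, 6]]
B = [[0, 1, 2, 4], [1, -1, 1, 1], [2, -1, 0, 7]]
print(matmul(A, B))   # [[8, -4, 4, 27], [17, -7, 13, 63]]
```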

Here are a number of examples and exercises. Please, verify that all examples are correct!

• Example.

$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}\begin{pmatrix} 0 & 1 & 2 & 4 \\ 1 & -1 & 1 & 1 \\ 2 & -1 & 0 & 7 \end{pmatrix} = \begin{pmatrix} 8 & -4 & 4 & 27 \\ 17 & -7 & 13 & 63 \end{pmatrix}$$

• Example. Let A = (1 −3 4 0), $B = \begin{pmatrix} -3 \\ 4 \\ 5 \\ 7 \end{pmatrix}$.

That is, A is 1 × 4, B is 4 × 1. Then AB will be 1 × 1. We identify a 1 × 1 matrix with its single entry; that is, we don’t enclose it in parentheses.

AB = 1(−3) + (−3)(4) + (4)(5) + (0)(7) = 5,

$$BA = \begin{pmatrix} -3 & 9 & -12 & 0 \\ 4 & -12 & 16 & 0 \\ 5 & -15 & 20 & 0 \\ 7 & -21 & 28 & 0 \end{pmatrix}.$$

• Exercise. Let A be a 7× 8 matrix and B an 8× 4 matrix. Suppose all entries in the third row of A are zero.Explain why all entries in the third row of AB will be 0.

Matrix product behaves a lot like an ordinary product of numbers; that is, WITHIN REASON! and with one important exception (commutativity, as we will see): given an equation involving products and sums, if it is true when the matrices are replaced by numbers, then it will also be true for matrices. Specifically, the following properties hold.

• (Associativity of the product) Briefly (AB)C = A(BC). But the products have to make sense! In a more detailed way: If A is an m × n matrix, B is n × p and C is p × q, then (AB)C = A(BC). In this case AB is an m × p matrix and it can be multiplied by C, which is p × q, to produce an m × q matrix (AB)C. On the other hand, BC is an n × q matrix; we can multiply on the left by A to get an m × q matrix A(BC). These two m × q matrices are one and the same. This property is not hard to verify, but it can get messy.

Page 14: 1 Welcome to the world of linear algebra: Vector Spacesmath.fau.edu/schonbek/MAPcourses/em1fa12linearalgebra.pdfVector spaces, also known as a linear spaces, come in two avors, real

4 MORE ON MATRICES 14

• (Distributivity) Briefly written as

A(B + C) = AB +AC

(A+B)C = AC +BC.

Once again, all operations must make sense. For the first equality, B, C must be of the same type, say n × p. If A is m × n, then A(B + C), AB, AC, AB + AC are all defined; the property now makes sense. The second equality assumes implicitly that A, B are of type m × n, and C of type n × p. This property is easy to verify.

• If c is a scalar, A of type m× n, B of type n× p, then

A(cB) = cAB = (cA)B.

Very easy to verify.

Is it true that AB = BA? This question doesn’t even make sense in most cases. For example if A is 3 × 5 and B is 5 × 2, then AB is defined (it is 3 × 2), but BA is not. Even if both AB and BA are defined, the answer is obviously no. For example if A is 3 × 4 and B is 4 × 3, then both AB and BA are defined, but AB is 3 × 3 and BA is 4 × 4; they are most definitely not equal.

The question becomes more interesting for square matrices; if A, B are square matrices of the same type, say both are n × n, then AB, BA are both defined, both n × n, and potentially equal. But usually they are not. Here are examples:

1. $$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}, \qquad \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 23 & 34 \\ 31 & 46 \end{pmatrix}$$

2. $$\begin{pmatrix} 1 & 0 & -1 \\ 0 & 3 & 4 \\ -2 & 2 & 0 \end{pmatrix}\begin{pmatrix} 0 & 3 & 4 \\ 1 & 0 & -1 \\ -2 & 2 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 1 & 4 \\ -5 & 8 & -3 \\ 2 & -6 & -10 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 3 & 4 \\ 1 & 0 & -1 \\ -2 & 2 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 & -1 \\ 0 & 3 & 4 \\ -2 & 2 & 0 \end{pmatrix} = \begin{pmatrix} -8 & 17 & 12 \\ 3 & -2 & -1 \\ -2 & 6 & 10 \end{pmatrix}$$

3. $$\begin{pmatrix} 1 & -1 & 2 \\ -2 & 1 & -1 \\ 1 & -2 & 1 \end{pmatrix}\begin{pmatrix} 2 & -3 & 3 \\ -3 & 2 & -3 \\ 3 & -3 & 2 \end{pmatrix} = \begin{pmatrix} 11 & -11 & 10 \\ -10 & 11 & -11 \\ 11 & -10 & 11 \end{pmatrix} = \begin{pmatrix} 2 & -3 & 3 \\ -3 & 2 & -3 \\ 3 & -3 & 2 \end{pmatrix}\begin{pmatrix} 1 & -1 & 2 \\ -2 & 1 & -1 \\ 1 & -2 & 1 \end{pmatrix}$$

In general AB ≠ BA but, as the third example shows, there are exceptions. For example, every square matrix naturally commutes with itself: If A = B, then AB = AA = BA. If A is a square matrix, we write A2 for AA, A3 for AAA = AA2 = A2A, etc. There are also square matrices that commute with every square matrix. The two most notable examples are the zero matrix and the identity matrix (about to be defined). Suppose 0 is the n × n zero matrix. Then, for every square n × n matrix A we have A0 = 0 = 0A; the square zero matrix commutes with all square matrices.

The identity matrix is usually denoted by I, or by In if its type is to be emphasized. It is the square matrix having ones in the main diagonal, all other entries equal to 0. If n = 2, then

$$I = I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix};$$

if n = 3, then

$$I = I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix};$$

etc. The general definition is I = (δij)1≤i,j≤n, where

$$\delta_{ij} = \begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j. \end{cases}$$

It is called the identity matrix because, as is immediately verified, if A is any square matrix of the same type as I, then IA = A = AI. Please verify this on your own; convince yourself it holds and try to understand why it holds. In particular, I commutes with all square matrices of its same type. That is, In commutes with all n × n matrices. But there is a bit more. Let A be an m × n matrix. Then

ImA = A = AIn.

(Verify this property.)
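A two-line NumPy check of this last property, for a rectangular A (again just a numerical illustration, not the requested verification):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])   # m = 2, n = 3
I2, I3 = np.eye(2), np.eye(3)

assert (I2 @ A == A).all()   # I_m A = A  (identity on the left)
assert (A @ I3 == A).all()   # A I_n = A  (identity on the right)
```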

Exercise. Let M be a square n × n matrix. Show that MA = AM for all n × n matrices A if and only if M = cI, where c is a scalar and I = In is the n × n identity matrix. (If c = 0, then M = 0; if c = 1, then M = I. In general M would have to be a matrix having all off-diagonal entries equal to 0, all diagonal entries the same.) As a hint, showing that if M = cI, then MA = AM for all n × n matrices should be very easy. For the converse, experiment with different matrices A. For example, what does the equation MA = AM tell you assuming that A has all entries but one equal to zero? For example, if A = (aij) with aij = 0 unless i = 1 and j = 2, and a12 = 1, and if M = (mij), then one can see that the first row of AM is equal to the second row of M, while all other rows have only zero entries. On the other hand, in MA one sees that the second column of MA is the first column of M, all other columns are zero columns:

$$\begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}\begin{pmatrix} m_{11} & m_{12} & \cdots & m_{1n} \\ m_{21} & m_{22} & \cdots & m_{2n} \\ \vdots & \vdots & & \vdots \\ m_{n1} & m_{n2} & \cdots & m_{nn} \end{pmatrix} = \begin{pmatrix} m_{21} & m_{22} & \cdots & m_{2n} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix},$$

$$\begin{pmatrix} m_{11} & m_{12} & \cdots & m_{1n} \\ m_{21} & m_{22} & \cdots & m_{2n} \\ \vdots & \vdots & & \vdots \\ m_{n1} & m_{n2} & \cdots & m_{nn} \end{pmatrix}\begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix} = \begin{pmatrix} 0 & m_{11} & 0 & \cdots & 0 \\ 0 & m_{21} & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & m_{n1} & 0 & \cdots & 0 \end{pmatrix}$$
Because M is supposed to commute with all matrices, it will commute with this selected one. Equating the two product results, we see that m21, m23, . . . , m2n, m31, m41, . . . , mn1 must all be 0, while m11 = m22. We are well on our way.

From now on, instead of writing Mn,n I will write simply Mn. Thus Mn is the set of all n × n matrices with real entries and Mn(C) is the set of all n × n matrices with complex entries. The space Mn (as well as Mn(C)) has some very nice properties. Under addition and scalar multiplication it is a vector space; but that is true of all sets of matrices of the same type. But it is also closed under matrix multiplication: If A, B ∈ Mn then AB, BA are defined and in Mn. The same holds for Mn(C). So in Mn (and in Mn(C)) we can add and multiply any two matrices and never leave the set. We can, of course, also subtract; A − B is the same as A + (−1)B. Can we also divide?

This is a very natural question; if A, B ∈ Mn(C), what, if anything, should A/B be? We could say that it should be a square n × n matrix such that when multiplied by B we get A back. Here we run into a first problem; multiplication not being commutative. Should we ask for (A/B)B = A, for B(A/B) = A, or for both? As it turns out this approach has more problems than one might think at first. Here is a very simple example. Suppose

$$A = 0 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$

One could say that obviously A/B = 0/B = 0. In fact if you multiply 0 by B (either on the right or on the left), you get A = 0. But here is a quaint fact. Notice that B2 = 0, the 2 × 2 zero matrix. Multiplying B by B itself from the left or from the right will also result in A. Should 0/B = B?

The problem, in general, is that given square matrices A, B there could be more than one matrix that can act like A/B; more than one matrix C such that CB = BC = A. Or there could be none. For example, if in the previous example we replace A by the identity matrix I, leaving B as it is, then I/B is undefined; there is no matrix C such that CB = BC = I. Can you prove this? Due to this one does things in a different way. We have an identity I; we could start trying to define I/B and then define A/B as either A(I/B) or (I/B)A. Well, this also has its problems, but they are solvable.

Problem 1. There are many matrices B ∈ Mn(C) for which I/B won’t make sense. That is, matrices B for which there is no C such that CB = BC = I.

Solution. Define I/B only for the matrices for which it makes sense.

Problem 2. Even if I/B makes sense, there is no guarantee that for any given matrix A we will have (I/B)A = A(I/B). So what should A/B be?

Solution. Forget about A/B, just think of dividing by B on the left and dividing by B on the right.

Let’s begin to be precise. Let A ∈ Mn(C). We say A is invertible if (and only if) there exists B ∈ Mn(C) such that AB = I = BA. In this case one can show that B is unique and one denotes it by A−1. That “B is unique” means: “for a given A there may or there may not exist such a B, but there cannot exist more than one such matrix.” This uniqueness is actually very easy to prove, and here is a proof. Those of you who hate proofs and prefer to accept the instructor’s word for everything, please ignore it.

Proof that there can only be one matrix deserving to be called A−1. Suppose there is more than one such matrix, so there are at least 2 matrices, call them B, C, such that

AB = BA = I and AC = CA = I.

Then B = BI = B(AC) = (BA)C = IC = C.

End of proof.

What is however amazing, and harder to prove, is that if A, C ∈ Mn(C) and AC = I, then this suffices to make A invertible and C = A−1. That is, in this world of matrices where commutativity is so rare, it suffices to have AC = I to conclude that we also have CA = I and C = A−1. So in verifying that a matrix C is the inverse of A, we don’t have to check that both CA and AC are the identity. If AC = I, then we are done; CA will also be I and C = A−1. Of course, the same is true if CA = I; then AC = I and C = A−1.

Deciding whether a general n × n matrix is invertible and, if it is, finding its inverse is actually a matter of looking at n2 equations, and solving them, or showing they can’t be solved. If we were to try to do this at our current level of knowledge, it would be a boring, difficult, messy exercise. We will develop methods to do this efficiently. However, to give you an idea of what can be involved if one attacks the problem without any more tools at hand, let us try to decide if the matrix

$$A = \begin{pmatrix} 1 & 2 & 2 \\ 2 & 3 & 1 \\ 1 & 0 & 1 \end{pmatrix}$$

is invertible, and find its inverse. We are asking, in other words, whether there is a 3 × 3 matrix B such that

$$\begin{pmatrix} 1 & 2 & 2 \\ 2 & 3 & 1 \\ 1 & 0 & 1 \end{pmatrix} B = I.$$

It is enough if there is a matrix B that works on the right; as mentioned, it will also work from the left. Well, B will look like

$$B = \begin{pmatrix} r & s & t \\ u & v & w \\ x & y & z \end{pmatrix}$$

and the question becomes: can we find r, s, t, u, v, w, x, y, z solving

$$\begin{pmatrix} 1 & 2 & 2 \\ 2 & 3 & 1 \\ 1 & 0 & 1 \end{pmatrix}\begin{pmatrix} r & s & t \\ u & v & w \\ x & y & z \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}?$$

If we can find such r, s, etc., then we have our inverse. If there is some reason why this is impossible, then there won’t exist an inverse. In the last matrix equation we can perform the product on the left and the question now becomes: Can we find r, s, t, u, v, w, x, y, z solving

$$\begin{pmatrix} r+2u+2x & s+2v+2y & t+2w+2z \\ 2r+3u+x & 2s+3v+y & 2t+3w+z \\ r+x & s+y & t+z \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}?$$

This is equivalent to solving 9 = 3² equations for r, s, t, u, v, w, x, y, z:

r + 2u+ 2x = 1

s+ 2v + 2y = 0

t+ 2w + 2z = 0

2r + 3u+ x = 0

2s+ 3v + y = 1

2t+ 3w + z = 0

r + x = 0

s+ y = 0

t+ z = 1

These equations are not as hard as one might think; still, solving the system, or verifying that there is no solution, is work. But let’s do it! Maybe you want to do it on your own; it might make you appreciate more the methods we’ll develop later on. So solve the system, then come back to compare your solution with mine.

From the equation r + x = 0, we get x = −r. Using this in the equation 2r + 3u + x = 0 and solving for u, we get u = −r/3. Using these values for x, u in the first equation we get r = −3/5. We now have our possible first column:

r = −3/5, u = 1/5, x = 3/5.

Similarly we find the other columns. From s + y = 0, we get y = −s. From 2s + 3v + y = 1 we get v = (1 − s)/3; from s + 2v + 2y = 0 we get s = 2/5, so that

s = 2/5, v = 1/5, y = −2/5.

Finally, from the equations t + z = 1, 2t + 3w + z = 0, t + 2w + 2z = 0 we get

t = 4/5, w = −3/5, z = 1/5.

If our calculations are correct, the matrix A is invertible and

$$A^{-1} = \begin{pmatrix} -3/5 & 2/5 & 4/5 \\ 1/5 & 1/5 & -3/5 \\ 3/5 & -2/5 & 1/5 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} -3 & 2 & 4 \\ 1 & 1 & -3 \\ 3 & -2 & 1 \end{pmatrix}$$

But to be absolutely sure, one should multiply A by its supposed inverse (on either side) and see that one gets the identity matrix. One does.

Here is an easier exercise. Show that the matrix

$$A = \begin{pmatrix} -1 & 2 & 0 \\ -3 & 0 & 2 \\ 0 & 1 & 2 \end{pmatrix}$$

is invertible and that

$$A^{-1} = \frac{1}{14}\begin{pmatrix} -2 & -4 & 4 \\ 6 & -2 & 2 \\ -3 & 1 & 6 \end{pmatrix}.$$

Do we have to go through the same process as before, finding the inverse of A and then seeing it works out to the given value? Only if we are gluttons for pain! The obvious thing to do is to multiply A by its assumed inverse and see that we get the identity. We see that

$$\begin{pmatrix} -1 & 2 & 0 \\ -3 & 0 & 2 \\ 0 & 1 & 2 \end{pmatrix} \cdot \frac{1}{14}\begin{pmatrix} -2 & -4 & 4 \\ 6 & -2 & 2 \\ -3 & 1 & 6 \end{pmatrix} = \frac{1}{14}\begin{pmatrix} -1 & 2 & 0 \\ -3 & 0 & 2 \\ 0 & 1 & 2 \end{pmatrix}\begin{pmatrix} -2 & -4 & 4 \\ 6 & -2 & 2 \\ -3 & 1 & 6 \end{pmatrix} = \frac{1}{14}\begin{pmatrix} 14 & 0 & 0 \\ 0 & 14 & 0 \\ 0 & 0 & 14 \end{pmatrix} = I.$$

And we are done. The matrix given as A−1 is indeed the inverse of A, and since A has an inverse, it is invertible.
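This verification is also a one-liner on a computer. A NumPy sketch (np.allclose is used because the entries are floating-point numbers):

```python
import numpy as np

A = np.array([[-1, 2, 0], [-3, 0, 2], [0, 1, 2]], dtype=float)
A_inv = np.array([[-2, -4, 4], [6, -2, 2], [-3, 1, 6]], dtype=float) / 14

assert np.allclose(A @ A_inv, np.eye(3))      # A times the claimed inverse
assert np.allclose(np.linalg.inv(A), A_inv)   # NumPy's inverse agrees
```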

4.1 Exercises

1. Show that if A is invertible, then A−1 is invertible and (A−1)−1 = A.

2. Show that if A ∈Mn(C) is invertible, so is AT and (AT )−1 = (A−1)T .

3. Show that if A, B ∈ Mn(C) are invertible, so is AB, and (AB)−1 = B−1A−1.

4. Let A ∈Mn(C) and assume A2 = 0. Prove that I +A is invertible and that (I +A)−1 = I −A.

Note: It is possible for A2 to be the zero matrix without A being the zero matrix. For example, for n = 3, the following two matrices have square equal to 0. Neither is the zero matrix. There are, of course, many others in the same category.

$$\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 & -2 \\ 1 & 1 & -2 \\ 1 & 1 & -2 \end{pmatrix}$$

5 Systems of linear equations.

We now come to one of the many reasons linear algebra was invented: to solve systems of linear equations. In this section I will try to summarize most of what you need to know about systems of equations of the form:

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned} \qquad (9)$$

This is a system of m equations in n unknowns. The unknowns are usually denoted by x1, . . . , xn, but the notation can change depending on the circumstances. For example, if n = 2, that is, if there are only two unknowns, then one frequently writes x for x1 and y for x2. If there are only three unknowns, one frequently denotes them by x, y, z rather than x1, x2, x3. Less frequently, if one has four unknowns, one denotes them by x, y, z, w. For five or more unknowns one usually uses subindices, since one can too easily run out of letters.


The coefficients aij are given numbers; they can be real or complex. The same holds true for the right hand side entries b1, . . . , bm. Most of my examples will be in the real case, but all that I say is valid also in the complex case (except if it obviously is not valid!). Once a single complex non-real number enters into the picture, one is in complex mode.

Solving a system like (9) consists in finding an n-tuple of numbers x1, x2, . . . , xn such that when plugging it into the equations, all equations are satisfied. For example, consider the system of equations

3x1 + 2x2 − x3 = 1
−x1 + x2 + x3 = 5          (10)

Here m = 2, n = 3. Then x1 = 0, x2 = 2, x3 = 3 is a solution. In fact,

3 · 0 + 2 · 2 − 3 = 1
−0 + 2 + 3 = 5

But that isn’t all the story. There are more solutions, many more. For example, as one can verify,

x1 = 9, x2 = −4, x3 = 18

also is a solution. And so isx1 = −6/5, x2 = 14/5, x3 = 1.

And many more. We will rarely be content with finding a single solution; in most cases we will want to find ALL solutions. I anticipate here that the following, and only the following, can happen for a system of linear equations:

• The system has exactly one solution.

• The system has NO solutions.

• The system has an infinity of solutions.

So if someone tries to sell you a system that has exactly two solutions, don't buy it! Once it has more than one solution, it has an infinity of solutions.

To attack systems in a rational, efficient way, we want to develop some notation and terminology. Let us return to the system (9). The matrix whose entries are the coefficients of the system; that is, the matrix

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}

is called the system matrix (or matrix of the system). For example, for the system in (10), the system matrix is

A = \begin{pmatrix} 3 & 2 & -1 \\ -1 & 1 & 1 \end{pmatrix}

It is an m × n matrix. It will also be convenient to write n-tuples (and m-tuples, and other tuples) vertically, as column vectors. So we think of C2 as consisting of all elements of the form \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, where a1, a2 are complex numbers; C3 as consisting of all elements of the form \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, where a1, a2, a3 are complex numbers; and so forth. In general

C^n = \left\{ \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} : a_1, a_2, \ldots, a_n \in C \right\}.


In other words, we are identifying Cn with the vector space of n × 1 matrices; Cn = Mn,1(C). If working exclusively with real numbers, one can replace C by R in all of this. We call matrices that consist of a single column, column matrices or, more frequently, column vectors. Returning to our system, the m-tuple b1, . . . , bm of numbers on the right hand side of the equations will give rise to a vector

b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} \in C^m = M_{m,1}(C).

We also introduce the unknown/solution vector

x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.

Then the left hand side of the system is precisely Ax, and the system (9) can be written in a nice and compact way as

Ax = b.

A solution of the system is now a vector x ∈ Cn such that Ax = b. Our objective is to develop an efficient method for finding all solutions to such a system.
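As an optional aside (not in the original notes), here is how compactly Ax = b captures system (10) in Python with numpy; the solution checked is the one found above:

    import numpy as np

    # System (10) written as Ax = b.
    A = np.array([[ 3, 2, -1],
                  [-1, 1,  1]])
    b = np.array([1, 5])

    x = np.array([0, 2, 3])   # the solution found in the text
    print(A @ x)              # [1 5], so Ax = b holds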

Two m × n systems of linear equations are said to be equivalent if they have exactly the same solutions. Solving a system is usually done (whether one realizes it or not) by replacing the system by a sequence of systems, each equivalent to the preceding one, until one gets a system so simple that it actually solves itself. The solutions of the final, very simple system are the same as those of the original system.

The way one gets an equivalent system is by performing any of the following “operations.”

1. Interchange two equations. For example, if in the system (10) we interchange equations 1 and 2 we get

−x1 + x2 + x3 = 5
3x1 + 2x2 − x3 = 1

Nothing essential has changed; the order in which the equations are presented does not affect the solutions.

2. Multiply one equation by a non-zero constant. Nothing changes; we can go back to the original system by dividing out the constant.

3. Add to an equation another equation multiplied by a constant. Again, nothing changes; if we now subtract from the new equation the same equation we previously added, multiplied by the same constant, we are back where we were before.

These are the three basic “operations” by which any system can be reduced to an immediately solvable one, one that can be solved by inspection. For example, here is how these operations affect system (10):

1. Interchange the first and second equation. The system becomes

−x1 + x2 + x3 = 5
3x1 + 2x2 − x3 = 1

2. Multiply the first equation by −1:

x1 − x2 − x3 = −5
3x1 + 2x2 − x3 = 1

3. Add, to the second equation, −3 times the first equation:

x1 − x2 − x3 = −5
5x2 + 2x3 = 16

4. Multiply the second equation by 1/5:

x1 − x2 − x3 = −5
x2 + (2/5)x3 = 16/5

The system is now in Gauss reduced form, easy to solve, but I’ll go one step further (Gauss-Jordan reduction).


5. To the first equation, add the second equation:

x1 − (3/5)x3 = −9/5
x2 + (2/5)x3 = 16/5          (11)

The last system of equations is equivalent to the first one; everything that we did can be undone. But it is also solved. We see from it that every choice of x1, x2, x3 such that

x1 = −9/5 + (3/5)x3,   x2 = 16/5 − (2/5)x3

is a solution; indicating that we can select x3 arbitrarily, and then use the formulas for x1, x2. If we write the solutions in (column) vector form, we found that

x = \begin{pmatrix} -9/5 + (3/5)x_3 \\ 16/5 - (2/5)x_3 \\ x_3 \end{pmatrix}

Using our knowledge of vector operations, we can break the solution up as follows

x = \begin{pmatrix} -9/5 \\ 16/5 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} 3/5 \\ -2/5 \\ 1 \end{pmatrix}

But why have an x3 when we don't have an x1 or x2 written out explicitly anymore? Another way of writing the solution of (10) is as follows:

x = \begin{pmatrix} -9/5 \\ 16/5 \\ 0 \end{pmatrix} + c \begin{pmatrix} 3/5 \\ -2/5 \\ 1 \end{pmatrix}, where c is arbitrary.          (12)

Taking c = 3, we get the solution \begin{pmatrix} 0 \\ 2 \\ 3 \end{pmatrix}; taking c = 18 we get \begin{pmatrix} 9 \\ -4 \\ 18 \end{pmatrix}. These are the first two solutions

mentioned earlier. Since there is an infinite number of choices of c, we have an infinity of solutions.

If we consider what we did here, we may realize that the only job done by the unknowns, by x1, x2, x3 in our case, was to act as placeholders. The same is true about the equal sign. If we carefully keep the coefficients in place, never mix a coefficient of x1 with one of x2, for example, we don't really need the variables. That is, we can remove them from the original form of the system, and then bring them back at the very end. To be more precise we need to introduce the augmented matrix of the system. This is the system matrix augmented (increased) by adding the b vector as last column. We draw a line or dotted line to indicate that we have an augmented matrix. The augmented matrix of (9) is the matrix

(A|b) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{pmatrix}

Suppose we carry out the three operations on equations we mentioned above. The effect on the augmented matrix of each one of these operations is, respectively:


1. Interchange two rows. We will call the operation of interchanging rows i and j of a matrix operation I(i,j).

2. Multiply a row by a non-zero constant. We will call the operation of multiplying the i-th row by the constant c ≠ 0 operation II_c(i).

3. Adding to a row another row multiplied by a constant. Adding to row i the j-th row multiplied by c (i ≠ j) will be denoted by III(i)+c(j).

These operations are the row operations. The idea is to use them to simplify the augmented matrix. Simplifying the matrix by applying row operations is known as row reduction.

Here is how we would solve (10) by working on the matrix. It is essentially what we did before, but it is less messy, and easy to program. I usually start with the augmented matrix, and then perform row operations in sequence. In some cases the order of performance doesn't matter, so I may perform more than one simultaneously. I write arrows joining one matrix to the next. Occasionally (and for now) I will write the operations performed on top of the arrow. So here is the same old system (10) solved by row reduction; the first matrix is the augmented matrix of the system.

\begin{pmatrix} 3 & 2 & -1 & 1 \\ -1 & 1 & 1 & 5 \end{pmatrix} \xrightarrow{I(1,2),\ II_{-1}(1)} \begin{pmatrix} 1 & -1 & -1 & -5 \\ 3 & 2 & -1 & 1 \end{pmatrix} \xrightarrow{III(2)-3(1)} \begin{pmatrix} 1 & -1 & -1 & -5 \\ 0 & 5 & 2 & 16 \end{pmatrix} \xrightarrow{II_{1/5}(2)} \begin{pmatrix} 1 & -1 & -1 & -5 \\ 0 & 1 & 2/5 & 16/5 \end{pmatrix} \xrightarrow{III(1)+(2)} \begin{pmatrix} 1 & 0 & -3/5 & -9/5 \\ 0 & 1 & 2/5 & 16/5 \end{pmatrix}

The last matrix is in reduced canonical form (explained below), which means we are done. We can now write out the system that has this last matrix as augmented matrix. We get

x1 − (3/5)x3 = −9/5
x2 + (2/5)x3 = 16/5

which is exactly (11). From here we continue as before to get the solution (12).
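If you want to check such reductions by machine (the notes use Excel later; Python's sympy library is one alternative, an assumption on my part), its rref() routine reproduces the matrix we just obtained:

    from sympy import Matrix

    # Augmented matrix of system (10); rref() returns the RRE form and the pivot columns.
    M = Matrix([[ 3, 2, -1, 1],
                [-1, 1,  1, 5]])
    R, pivots = M.rref()
    print(R)       # Matrix([[1, 0, -3/5, -9/5], [0, 1, 2/5, 16/5]])
    print(pivots)  # (0, 1): leading 1's in the first two columns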

We need to know when to stop row reducing, when the simplest possible level has been achieved. This is called row reduced echelon form or some such name; I'll abbreviate it to RRE form. A matrix is in RRE form if

1. In each non-zero row, the first non-zero entry is a 1 (the leading 1 of that row). A row with no leading 1 consists exclusively of 0's.

2. All zero rows come after the non-zero rows.

3. All entries above and below the leading 1 of any given row are equal to 0.

4. If i < j, and neither row i nor row j is a zero row, then the leading 1 of row i must be in a column preceding the column containing the leading 1 of row j.

Two facts are important here:

• Every matrix can be brought to RRE form by row operations.

• While there is more than one way to achieve RRE form by row operations, the end result is always the same for any given matrix. It is called the RRE form of the matrix.

Maybe a few examples can clear things up.

Examples


1. Find all solutions of

x1 − x2 + x3 − x4 + x5 = 1

3x1 + 2x2 − x4 + 9x5 = 0

7x1 + 10x2 + 3x3 + 6x4 − 9x5 = −7

The augmented matrix of the system is

\begin{pmatrix} 1 & -1 & 1 & -1 & 1 & 1 \\ 3 & 2 & 0 & -1 & 9 & 0 \\ 7 & 10 & 3 & 6 & -9 & -7 \end{pmatrix}

The first objective on our way to RRE form is, by row operations, to get a 1 into the (1, 1) position. In our case, the matrix already has a 1 in the correct place, so we need to do nothing. The only reason one will not be able to get a 1 into the (1, 1) position is if the first column contains only zeros (unlikely!, but possible). Then one moves over to the second column, tries to get a 1 into position (1, 2); if this fails, into position (1, 3); etc. Total failure means one has the zero matrix, which is, of course, in RRE form. If there is any non-zero entry in the first column (or in any other column), one can get it to be in row 1 (if it isn't there already) by interchanging row 1 with the row containing the non-zero entry; if we then multiply row 1 by the reciprocal of that non-zero entry, we have a 1 in row 1. So, at most two row operations place a 1 in row 1, if the column has at least one non-zero entry. But, as mentioned, we already have a 1 where it should be to start.

Once we have the 1 in place, we use operations of the third type to get every entry below this 1 to be 0. For example, if the first entry in row i, i > 1, is c, we perform the row operation III(i)−c(1). Applied to our matrix, it works as follows:

\begin{pmatrix} 1 & -1 & 1 & -1 & 1 & 1 \\ 3 & 2 & 0 & -1 & 9 & 0 \\ 7 & 10 & 3 & 6 & -9 & -7 \end{pmatrix} \xrightarrow{III(2)-3(1),\ III(3)-7(1)} \begin{pmatrix} 1 & -1 & 1 & -1 & 1 & 1 \\ 0 & 5 & -3 & 2 & 6 & -3 \\ 0 & 17 & -4 & 13 & -16 & -14 \end{pmatrix}

The first column has been processed. Once a column has been processed, here is how one processes the next column.

(a) Suppose i is the row in which the last leading 1 appeared (in one of the columns preceding the one we are about to process). Presumably that 1 is in the column just preceding the one we are about to process, but it could be before that. Is there a non-zero entry in the column to be processed, in some row strictly below i? If no, the column has been processed; move on to the next column. If none remain, you are done. If yes, get it into row i + 1 (if not already there) by interchanging the row containing it with row i + 1. Then multiply row i + 1 by the reciprocal of that non-zero entry so as to get a 1 in position i + 1.

(b) Using operations of type III, get every entry above and below this 1 to be 0. Then move on to the nextcolumn. If none remain, you are done.

Applying all this to our current matrix, here is how we finish the process.

\begin{pmatrix} 1 & -1 & 1 & -1 & 1 & 1 \\ 0 & 5 & -3 & 2 & 6 & -3 \\ 0 & 17 & -4 & 13 & -16 & -14 \end{pmatrix} \xrightarrow{II_{1/5}(2)} \begin{pmatrix} 1 & -1 & 1 & -1 & 1 & 1 \\ 0 & 1 & -3/5 & 2/5 & 6/5 & -3/5 \\ 0 & 17 & -4 & 13 & -16 & -14 \end{pmatrix} \xrightarrow{III(1)+(2),\ III(3)-17(2)} \begin{pmatrix} 1 & 0 & 2/5 & -3/5 & 11/5 & 2/5 \\ 0 & 1 & -3/5 & 2/5 & 6/5 & -3/5 \\ 0 & 0 & 31/5 & 31/5 & -182/5 & -19/5 \end{pmatrix} \xrightarrow{II_{5/31}(3)} \begin{pmatrix} 1 & 0 & 2/5 & -3/5 & 11/5 & 2/5 \\ 0 & 1 & -3/5 & 2/5 & 6/5 & -3/5 \\ 0 & 0 & 1 & 1 & -182/31 & -19/31 \end{pmatrix} \xrightarrow{III(1)-\frac{2}{5}(3),\ III(2)+\frac{3}{5}(3)} \begin{pmatrix} 1 & 0 & 0 & -1 & 141/31 & 20/31 \\ 0 & 1 & 0 & 1 & -72/31 & -30/31 \\ 0 & 0 & 1 & 1 & -182/31 & -19/31 \end{pmatrix}


The matrix is now in RRE form, and in the best possible way for the case m ≤ n (in our case m = 3, n = 5): the first m columns constitute the m × m identity matrix. The system is equivalent to

x1 − x4 + (141/31)x5 = 20/31
x2 + x4 − (72/31)x5 = −30/31
x3 + x4 − (182/31)x5 = −19/31

Variables that do not correspond to the columns containing the leading 1's (columns 4, 5 in our case, thus x4, x5) can be chosen freely. All solutions of the system are thus given by

x1 = x4 − (141/31)x5 + 20/31
x2 = −x4 + (72/31)x5 − 30/31
x3 = −x4 + (182/31)x5 − 19/31

for arbitrary values of x4, x5. That is, for every choice of values of x4, x5 we get a solution; we have again an infinity of solutions. We can write the whole thing in vector notation as follows: the solutions of the system are given by

x = \begin{pmatrix} x_4 - \frac{141}{31}x_5 + \frac{20}{31} \\ -x_4 + \frac{72}{31}x_5 - \frac{30}{31} \\ -x_4 + \frac{182}{31}x_5 - \frac{19}{31} \\ x_4 \\ x_5 \end{pmatrix} = x_4 \begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \\ 0 \end{pmatrix} + x_5 \begin{pmatrix} -141/31 \\ 72/31 \\ 182/31 \\ 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 20/31 \\ -30/31 \\ -19/31 \\ 0 \\ 0 \end{pmatrix}

A number of other cosmetic changes can be made. For example, why have x4, x5 where we don't have x1, x2, x3? We could relabel them c1, c2. Moreover, we can try to get rid of some denominators; if x5 = c2 is arbitrary, so is x5/31. We can also write the solution in the slightly nicer way

x = c_1 \begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \\ 0 \end{pmatrix} + c_2 \begin{pmatrix} -141 \\ 72 \\ 182 \\ 0 \\ 31 \end{pmatrix} + \frac{1}{31} \begin{pmatrix} 20 \\ -30 \\ -19 \\ 0 \\ 0 \end{pmatrix}

for arbitrary values of c1, c2.

2. For our next example, consider

x+ 3y + 2w = 5

5x+ 15y + z + 14w = −1

x+ 3y − z − 2w = 2

The augmented matrix is

\begin{pmatrix} 1 & 3 & 0 & 2 & 5 \\ 5 & 15 & 1 & 14 & -1 \\ 1 & 3 & -1 & -2 & 2 \end{pmatrix}

We proceed to row reduce.

\begin{pmatrix} 1 & 3 & 0 & 2 & 5 \\ 5 & 15 & 1 & 14 & -1 \\ 1 & 3 & -1 & -2 & 2 \end{pmatrix} \xrightarrow{III(2)-5(1),\ III(3)-(1)} \begin{pmatrix} 1 & 3 & 0 & 2 & 5 \\ 0 & 0 & 1 & 4 & -26 \\ 0 & 0 & -1 & -4 & -3 \end{pmatrix} \xrightarrow{III(3)+(2)} \begin{pmatrix} 1 & 3 & 0 & 2 & 5 \\ 0 & 0 & 1 & 4 & -26 \\ 0 & 0 & 0 & 0 & -29 \end{pmatrix}


and we are done. The system has NO solutions. It has no solutions because it is equivalent to a system in which the third equation is

0x + 0y + 0z + 0w = −29

and since the left hand side is 0 regardless of what x, y, z, w might be, it can never equal −29.

3. We saw in the previous example a system without solutions. Can it have solutions if we change the right hand sides? That is, let us try to determine all values, if any, of b1, b2, b3 for which the system

x+ 3y + 2w = b1

5x+ 15y + z + 14w = b2

x+ 3y − z − 2w = b3

has solutions. We write up the augmented matrix and row reduce.

\begin{pmatrix} 1 & 3 & 0 & 2 & b_1 \\ 5 & 15 & 1 & 14 & b_2 \\ 1 & 3 & -1 & -2 & b_3 \end{pmatrix} \xrightarrow{III(2)-5(1),\ III(3)-(1)} \begin{pmatrix} 1 & 3 & 0 & 2 & b_1 \\ 0 & 0 & 1 & 4 & b_2 - 5b_1 \\ 0 & 0 & -1 & -4 & b_3 - b_1 \end{pmatrix} \xrightarrow{III(3)+(2)} \begin{pmatrix} 1 & 3 & 0 & 2 & b_1 \\ 0 & 0 & 1 & 4 & b_2 - 5b_1 \\ 0 & 0 & 0 & 0 & -6b_1 + b_2 + b_3 \end{pmatrix}

The last equation can be satisfied if and only if −6b1 + b2 + b3 = 0, or b3 = 6b1 − b2. In this case the RRE form is

\begin{pmatrix} 1 & 3 & 0 & 2 & b_1 \\ 0 & 0 & 1 & 4 & b_2 - 5b_1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}

Notice also that the columns (among the first four) not containing leading 1's are columns 2 and 4; thus y, w can be selected freely. The solution could be given as

x = −3y − 2w + b1

z = −4w + b2 − 5b1

or in vector form

x = y \begin{pmatrix} -3 \\ 1 \\ 0 \\ 0 \end{pmatrix} + w \begin{pmatrix} -2 \\ 0 \\ -4 \\ 1 \end{pmatrix} + \begin{pmatrix} b_1 \\ 0 \\ b_2 - 5b_1 \\ 0 \end{pmatrix},

with y, w arbitrary.
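As a hedged aside (not in the original notes): numerically, the consistency condition −6b1 + b2 + b3 = 0 can be tested by comparing ranks, since Ax = b is solvable exactly when appending b to A adds no new pivot. A sketch with numpy:

    import numpy as np

    A = np.array([[1,  3,  0,  2],
                  [5, 15,  1, 14],
                  [1,  3, -1, -2]], dtype=float)

    def consistent(b):
        # Ax = b is solvable iff appending b does not raise the rank.
        return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.column_stack([A, b]))

    print(consistent(np.array([1.0, 2.0, 4.0])))   # True:  b3 = 6*1 - 2
    print(consistent(np.array([5.0, -1.0, 2.0])))  # False: b3 != 6*5 - (-1)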

4. We now find all solutions of

x1 − 2x2 + 2x3 = 1

x1 + x2 + 5x3 = 0

2x1 + 3x2 + x3 = −1

NOTE: If the number of equations is less than the number of unknowns (m < n) then there usually (but not always!) is more than one solution. More precisely, there either is no solution or an infinity of solutions. If the number of equations is more than the number of unknowns (m > n), there is a good chance of not having any solutions. Actually, anything can happen, but the most likely outcome is no solutions because there are too many conditions (equations). If the number of equations equals the number of unknowns (m = n), as it does in our current example, one has a good chance of having a unique solution. Still, anything could happen.

Solving the system. We row reduce the augmented matrix.

\begin{pmatrix} 1 & -2 & 2 & 1 \\ 1 & 1 & 5 & 0 \\ 2 & 3 & 1 & -1 \end{pmatrix} \xrightarrow{III(2)-(1),\ III(3)-2(1)} \begin{pmatrix} 1 & -2 & 2 & 1 \\ 0 & 3 & 3 & -1 \\ 0 & 7 & -3 & -3 \end{pmatrix} \xrightarrow{II_{1/3}(2)} \begin{pmatrix} 1 & -2 & 2 & 1 \\ 0 & 1 & 1 & -1/3 \\ 0 & 7 & -3 & -3 \end{pmatrix}


\xrightarrow{III(1)+2(2),\ III(3)-7(2)} \begin{pmatrix} 1 & 0 & 4 & 1/3 \\ 0 & 1 & 1 & -1/3 \\ 0 & 0 & -10 & -2/3 \end{pmatrix} \xrightarrow{II_{-1/10}(3)} \begin{pmatrix} 1 & 0 & 4 & 1/3 \\ 0 & 1 & 1 & -1/3 \\ 0 & 0 & 1 & 1/15 \end{pmatrix} \xrightarrow{III(1)-4(3),\ III(2)-(3)} \begin{pmatrix} 1 & 0 & 0 & 1/15 \\ 0 & 1 & 0 & -2/5 \\ 0 & 0 & 1 & 1/15 \end{pmatrix}

There is a unique solution:

x = \begin{pmatrix} 1/15 \\ -2/5 \\ 1/15 \end{pmatrix}

The following should be clear from all these examples: there exists a unique solution if and only if m ≥ n and the first n rows of the RRE form of the augmented matrix constitute the identity matrix. Since the augmented matrix has n + 1 columns, and as we row reduce it we are row reducing A, we see that this condition involves only A, not the b part. Let us state this as a theorem so we realize it is an important result:

Theorem 1 The m × n system of linear equations (9) has a unique solution for a given value of b ∈ Cm if and only if

(a) m ≥ n.

(b) The row reduced echelon form of the system matrix A is either the n × n identity matrix (case n = m) or the n × n identity matrix completed with m − n rows of zeros to an m × n matrix.

Moreover, since all this depends only on A, it has a unique solution for some b ∈ Cm if and only if it has a unique solution for all b ∈ Cm.

The alternative to having a unique solution for a given b ∈ Cm is having no solutions for that b or having an infinity of solutions.
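As an optional numerical illustration of Theorem 1 (again with numpy, which the notes themselves do not use): the system of Example 4 has rank n = 3, so there is a unique solution for every b.

    import numpy as np

    # Example 4's system matrix: its RRE form is the identity, i.e. rank A = n = 3.
    A = np.array([[1, -2, 2],
                  [1,  1, 5],
                  [2,  3, 1]], dtype=float)
    b = np.array([1.0, 0.0, -1.0])

    print(np.linalg.matrix_rank(A))  # 3, so a unique solution for every b
    print(np.linalg.solve(A, b))     # [ 0.0667 -0.4  0.0667 ] = (1/15, -2/5, 1/15)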

5. Suppose we want to solve two or more systems of linear equations having the same system matrix. For example, say we want to solve

x1 − x2 + x3 = 1

−x1 + x2 + x3 = 2

x1 + x2 − x3 = −5

and

x1 − x2 + x3 = −1

−x1 + x2 + x3 = 4

x1 + x2 − x3 = 0

There is no need to duplicate efforts. You may have noticed that the system matrix carries the row reductions; once the system matrix is in RRE form, so is the augmented matrix. So we just doubly augment and row reduce. Here we go:

\begin{pmatrix} 1 & -1 & 1 & 1 & -1 \\ -1 & 1 & 1 & 2 & 4 \\ 1 & 1 & -1 & -5 & 0 \end{pmatrix} \xrightarrow{III(2)+(1),\ III(3)-(1)} \begin{pmatrix} 1 & -1 & 1 & 1 & -1 \\ 0 & 0 & 2 & 3 & 3 \\ 0 & 2 & -2 & -6 & 1 \end{pmatrix} \xrightarrow{I(2,3),\ II_{1/2}(2)} \begin{pmatrix} 1 & -1 & 1 & 1 & -1 \\ 0 & 1 & -1 & -3 & 1/2 \\ 0 & 0 & 2 & 3 & 3 \end{pmatrix}


\xrightarrow{III(1)+(2)} \begin{pmatrix} 1 & 0 & 0 & -2 & -1/2 \\ 0 & 1 & -1 & -3 & 1/2 \\ 0 & 0 & 2 & 3 & 3 \end{pmatrix} \xrightarrow{II_{1/2}(3)} \begin{pmatrix} 1 & 0 & 0 & -2 & -1/2 \\ 0 & 1 & -1 & -3 & 1/2 \\ 0 & 0 & 1 & 3/2 & 3/2 \end{pmatrix} \xrightarrow{III(2)+(3)} \begin{pmatrix} 1 & 0 & 0 & -2 & -1/2 \\ 0 & 1 & 0 & -3/2 & 2 \\ 0 & 0 & 1 & 3/2 & 3/2 \end{pmatrix}

The solution to the first system is

x = \begin{pmatrix} -2 \\ -3/2 \\ 3/2 \end{pmatrix},

the solution to the second system is

x = \begin{pmatrix} -1/2 \\ 2 \\ 3/2 \end{pmatrix}.

5.1 Exercises

In Exercises 1-6 solve the systems by reducing the augmented matrix to reduced row echelon form.

1. x + y + 2z = 9
   2x + 4y − 3z = 1
   3x + 6y − 5z = 0

2. x1 + 3x2 − 2x3 + 2x5 = 0
   2x1 + 6x2 − 5x3 − 2x4 + 4x5 − 3x6 = −1
   5x3 + 10x4 + 15x6 = 5
   2x1 + 6x2 + 8x4 + 4x5 + 18x6 = 6

3. x1 − 2x2 + x3 − 4x4 = 1
   x1 + 3x2 + 7x3 + 2x4 = 2
   x1 − 12x2 − 11x3 − 16x4 = 5

4. x1 + x2 + 2x3 = 8
   −x1 − 2x2 + 3x3 = 1
   3x1 − 7x2 + 4x3 = 10

5. 2x1 + 2x2 + 2x3 = 0
   −2x1 + 5x2 + 2x3 = 1
   8x1 + x2 + 4x3 = −1

6. −2x2 + 3x3 = 1
   3x1 + 6x2 − 3x3 = −2
   6x1 + 6x2 + 3x3 = 5

7. For which values of a will the following system have no solutions? Exactly one solution? Infinitely many solutions?

x + 2y − 3z = 4
3x − y + 5z = 2
4x + y + (a² − 14)z = a + 2


6 Inverses Revisited

Suppose A is a square n × n matrix. It is invertible if and only if there exists a square n × n matrix X such that AX = I. In this case we'll also have XA = I, but this is of no concern right now. We can rephrase this in terms of existence of solutions to systems of linear equations. Suppose we denote the columns of this (possibly existing, possibly not) matrix X by x(1), . . . , x(n). That is, if X = (xij)1≤i,j≤n, then

x^{(1)} = \begin{pmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{n1} \end{pmatrix}, \quad x^{(2)} = \begin{pmatrix} x_{12} \\ x_{22} \\ \vdots \\ x_{n2} \end{pmatrix}, \quad \ldots, \quad x^{(n)} = \begin{pmatrix} x_{1n} \\ x_{2n} \\ \vdots \\ x_{nn} \end{pmatrix}.

It is then easy to see (I hope you agree) that the condition AX = I is equivalent to n systems of linear equations, all with system matrix A:

Ax^{(1)} = δ^{(1)}
Ax^{(2)} = δ^{(2)}
· · ·
Ax^{(n)} = δ^{(n)}          (13)

where for 1 ≤ i ≤ n, δ^{(i)} is the column vector having all entries equal to 0, except the i-th one which is 1; that is, δ^{(1)}, δ^{(2)}, . . . , δ^{(n)} are the columns of the identity matrix. We can solve all these systems simultaneously if we augment A by all these columns; in other words if we augment A by the identity matrix: (A|I).

What will happen as we row reduce? For a square matrix, the row reduced form either is the identity matrix, or there is at least one row of zeros in the RRE form. Think about it! You can figure out why this is so! Suppose we get a row of zeros in the row reduced form of A (not of the augmented matrix, but of A). The only way we can have solutions for all the n systems is if we also have 0 for the full row in the augmented matrix. That means that a certain number of row reductions produced a 0 row in the identity matrix. This is impossible, and it is not hard to see why this is impossible. So if the RRE form of A contains a row of zeros, then A cannot be invertible; some of the equations for the columns of the inverse are unsolvable. The alternative is that the row reduced echelon form of A is the identity matrix. In this case the augmented columns contain the solutions to the equations (13); in other words the augmented part has become the inverse matrix. Let us state part of this as a theorem.

Theorem 2 A square n× n matrix is invertible if and only if its RRE form is the identity matrix.

We go to examples.

Example 1. Let us compute the inverse of the matrix we already inverted in a previous section, namely

A = \begin{pmatrix} 1 & 2 & 2 \\ 2 & 3 & 1 \\ 1 & 0 & 1 \end{pmatrix}

We augment and row reduce:

\begin{pmatrix} 1 & 2 & 2 & 1 & 0 & 0 \\ 2 & 3 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 1 \end{pmatrix} \xrightarrow{III(2)-2(1),\ III(3)-(1)} \begin{pmatrix} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -1 & -3 & -2 & 1 & 0 \\ 0 & -2 & -1 & -1 & 0 & 1 \end{pmatrix} \xrightarrow{III(1)+2(2),\ III(3)-2(2)} \begin{pmatrix} 1 & 0 & -4 & -3 & 2 & 0 \\ 0 & -1 & -3 & -2 & 1 & 0 \\ 0 & 0 & 5 & 3 & -2 & 1 \end{pmatrix} \xrightarrow{II_{-1}(2),\ II_{1/5}(3)} \begin{pmatrix} 1 & 0 & -4 & -3 & 2 & 0 \\ 0 & 1 & 3 & 2 & -1 & 0 \\ 0 & 0 & 1 & 3/5 & -2/5 & 1/5 \end{pmatrix} \xrightarrow{III(1)+4(3),\ III(2)-3(3)} \begin{pmatrix} 1 & 0 & 0 & -3/5 & 2/5 & 4/5 \\ 0 & 1 & 0 & 1/5 & 1/5 & -3/5 \\ 0 & 0 & 1 & 3/5 & -2/5 & 1/5 \end{pmatrix}


As before, the inverse is

A^{-1} = \begin{pmatrix} -3/5 & 2/5 & 4/5 \\ 1/5 & 1/5 & -3/5 \\ 3/5 & -2/5 & 1/5 \end{pmatrix}
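An optional machine check of the same computation: sympy (an assumption on my part, not the author's tool) can row reduce (A|I) exactly as we did by hand:

    from sympy import Matrix, eye

    A = Matrix([[1, 2, 2],
                [2, 3, 1],
                [1, 0, 1]])

    # Augment with the identity and row reduce; the right half becomes A^{-1}.
    R, _ = A.row_join(eye(3)).rref()
    A_inv = R[:, 3:]
    print(A_inv)      # Matrix([[-3/5, 2/5, 4/5], [1/5, 1/5, -3/5], [3/5, -2/5, 1/5]])
    print(A * A_inv)  # the 3 x 3 identity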

Example 2. Find the inverse of

\begin{pmatrix} 0 & 3 & -3 & 1 \\ 2 & 4 & -4 & 0 \\ 2 & 1 & -1 & 1 \\ 3 & 0 & 1 & -3 \end{pmatrix}

Solution. By row reduction. As usual, row operations are performed in the order they are written.

\begin{pmatrix} 0 & 3 & -3 & 1 & 1 & 0 & 0 & 0 \\ 2 & 4 & -4 & 0 & 0 & 1 & 0 & 0 \\ 2 & 1 & -1 & 1 & 0 & 0 & 1 & 0 \\ 3 & 0 & 1 & -3 & 0 & 0 & 0 & 1 \end{pmatrix} \xrightarrow{I(1,2),\ II_{1/2}(1)} \begin{pmatrix} 1 & 2 & -2 & 0 & 0 & 1/2 & 0 & 0 \\ 0 & 3 & -3 & 1 & 1 & 0 & 0 & 0 \\ 2 & 1 & -1 & 1 & 0 & 0 & 1 & 0 \\ 3 & 0 & 1 & -3 & 0 & 0 & 0 & 1 \end{pmatrix} \xrightarrow{III(3)-2(1),\ III(4)-3(1)} \begin{pmatrix} 1 & 2 & -2 & 0 & 0 & 1/2 & 0 & 0 \\ 0 & 3 & -3 & 1 & 1 & 0 & 0 & 0 \\ 0 & -3 & 3 & 1 & 0 & -1 & 1 & 0 \\ 0 & -6 & 7 & -3 & 0 & -3/2 & 0 & 1 \end{pmatrix} \xrightarrow{III(3)+(2),\ III(4)+2(2)} \begin{pmatrix} 1 & 2 & -2 & 0 & 0 & 1/2 & 0 & 0 \\ 0 & 3 & -3 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 & -1 & 1 & 0 \\ 0 & 0 & 1 & -1 & 2 & -3/2 & 0 & 1 \end{pmatrix} \xrightarrow{II_{1/3}(2)} \begin{pmatrix} 1 & 2 & -2 & 0 & 0 & 1/2 & 0 & 0 \\ 0 & 1 & -1 & 1/3 & 1/3 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 & -1 & 1 & 0 \\ 0 & 0 & 1 & -1 & 2 & -3/2 & 0 & 1 \end{pmatrix} \xrightarrow{III(1)-2(2)} \begin{pmatrix} 1 & 0 & 0 & -2/3 & -2/3 & 1/2 & 0 & 0 \\ 0 & 1 & -1 & 1/3 & 1/3 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 & -1 & 1 & 0 \\ 0 & 0 & 1 & -1 & 2 & -3/2 & 0 & 1 \end{pmatrix} \xrightarrow{I(3,4)} \begin{pmatrix} 1 & 0 & 0 & -2/3 & -2/3 & 1/2 & 0 & 0 \\ 0 & 1 & -1 & 1/3 & 1/3 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 & 2 & -3/2 & 0 & 1 \\ 0 & 0 & 0 & 2 & 1 & -1 & 1 & 0 \end{pmatrix} \xrightarrow{III(2)+(3),\ II_{1/2}(4)} \begin{pmatrix} 1 & 0 & 0 & -2/3 & -2/3 & 1/2 & 0 & 0 \\ 0 & 1 & 0 & -2/3 & 7/3 & -3/2 & 0 & 1 \\ 0 & 0 & 1 & -1 & 2 & -3/2 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1/2 & -1/2 & 1/2 & 0 \end{pmatrix} \xrightarrow{III(1)+\frac{2}{3}(4),\ III(2)+\frac{2}{3}(4),\ III(3)+(4)} \begin{pmatrix} 1 & 0 & 0 & 0 & -1/3 & 1/6 & 1/3 & 0 \\ 0 & 1 & 0 & 0 & 8/3 & -11/6 & 1/3 & 1 \\ 0 & 0 & 1 & 0 & 5/2 & -2 & 1/2 & 1 \\ 0 & 0 & 0 & 1 & 1/2 & -1/2 & 1/2 & 0 \end{pmatrix}

The answer is

A^{-1} = \begin{pmatrix} -1/3 & 1/6 & 1/3 & 0 \\ 8/3 & -11/6 & 1/3 & 1 \\ 5/2 & -2 & 1/2 & 1 \\ 1/2 & -1/2 & 1/2 & 0 \end{pmatrix}

Example 3. Invert

A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}.

Solution. By row reduction,

\begin{pmatrix} 1 & 2 & 3 & 1 & 0 & 0 \\ 4 & 5 & 6 & 0 & 1 & 0 \\ 7 & 8 & 9 & 0 & 0 & 1 \end{pmatrix} \xrightarrow{III(2)-4(1),\ III(3)-7(1)} \begin{pmatrix} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & -3 & -6 & -4 & 1 & 0 \\ 0 & -6 & -12 & -7 & 0 & 1 \end{pmatrix}


\xrightarrow{III(3)-2(2)} \begin{pmatrix} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & -3 & -6 & -4 & 1 & 0 \\ 0 & 0 & 0 & 1 & -2 & 1 \end{pmatrix}

A row of zeros has developed in the row reduction of A; the systems of equations whose solutions are the columns of the inverse of A are not solvable. The matrix A is not invertible.

Example 4. Show that a 2 × 2 matrix

\begin{pmatrix} a & b \\ c & d \end{pmatrix}

is invertible if and only if ad − bc ≠ 0.

Solution. We may have to consider two cases, a ≠ 0 and a = 0. Suppose first a ≠ 0. We can then divide by a and row reduce as follows:

\begin{pmatrix} a & b & 1 & 0 \\ c & d & 0 & 1 \end{pmatrix} \xrightarrow{II_{1/a}(1)} \begin{pmatrix} 1 & b/a & 1/a & 0 \\ c & d & 0 & 1 \end{pmatrix} \xrightarrow{III(2)-c(1)} \begin{pmatrix} 1 & b/a & 1/a & 0 \\ 0 & d - cb/a & -c/a & 1 \end{pmatrix}

If d − (cb/a) = 0, we are done; there is no inverse. Now d − (cb/a) = (ad − bc)/a ≠ 0 if and only if ad − bc ≠ 0. We see that if ad − bc = 0, there is no inverse. On the other hand, if ad − bc ≠ 0, we can divide by (ad − bc)/a:

\begin{pmatrix} 1 & b/a & 1/a & 0 \\ 0 & d - cb/a & -c/a & 1 \end{pmatrix} \xrightarrow{II_{a/(ad-bc)}(2)} \begin{pmatrix} 1 & b/a & 1/a & 0 \\ 0 & 1 & -c/(ad-bc) & a/(ad-bc) \end{pmatrix} \xrightarrow{III(1)-\frac{b}{a}(2)} \begin{pmatrix} 1 & 0 & \frac{1}{a} + \frac{bc}{a(ad-bc)} & -\frac{b}{ad-bc} \\ 0 & 1 & -\frac{c}{ad-bc} & \frac{a}{ad-bc} \end{pmatrix} = \begin{pmatrix} 1 & 0 & \frac{d}{ad-bc} & -\frac{b}{ad-bc} \\ 0 & 1 & -\frac{c}{ad-bc} & \frac{a}{ad-bc} \end{pmatrix}

It follows that if a ≠ 0, then

A^{-1} = \frac{1}{ad-bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}

Suppose now a = 0, so A = \begin{pmatrix} 0 & b \\ c & d \end{pmatrix} and ad − bc = −bc. If ad − bc = 0; that is (in this case), if b or c equals 0, then A has a zero row or a zero column; it clearly can't have an inverse. On the other hand, if bc ≠ 0, the inverse we found above makes sense; it is

A^{-1} = -\frac{1}{bc}\begin{pmatrix} d & -b \\ -c & 0 \end{pmatrix}

Now

-\frac{1}{bc}\begin{pmatrix} d & -b \\ -c & 0 \end{pmatrix}\begin{pmatrix} 0 & b \\ c & d \end{pmatrix} = -\frac{1}{bc}\begin{pmatrix} -bc & 0 \\ 0 & -bc \end{pmatrix} = I,

proving A is invertible and A^{-1} is given by the same formula as for a ≠ 0.
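A small code sketch of the 2 × 2 formula just derived (plain Python; the helper name is mine, not the notes'):

    def inverse_2x2(a, b, c, d):
        # Inverse of ((a, b), (c, d)) via the ad - bc criterion; None if singular.
        det = a * d - b * c
        if det == 0:
            return None
        return ((d / det, -b / det),
                (-c / det, a / det))

    print(inverse_2x2(1, 2, 3, 4))  # ((-2.0, 1.0), (1.5, -0.5))
    print(inverse_2x2(1, 2, 2, 4))  # None, since ad - bc = 0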

6.1 Exercises

To come

7 Linear dependence, independence and bases

We return to our study of vector spaces. In this section we develop some of the fundamental concepts related to a vector space. Remember that a scalar is either a complex number or a real number, and that either we only allow real numbers as scalars, so that our vector spaces are real vector spaces, or we open the door to non-real numbers, and we are dealing with complex vector spaces.

For all the definitions that follow, assume V is a vector space. It could be any of the examples we gave earlier, or anything else that qualifies as being a vector space.

A linear combination of vectors v1, . . . ,vk ∈ V is any vector of the form

c1v1 + · · ·+ ckvk

Page 31: 1 Welcome to the world of linear algebra: Vector Spacesmath.fau.edu/schonbek/MAPcourses/em1fa12linearalgebra.pdfVector spaces, also known as a linear spaces, come in two avors, real

7 LINEAR DEPENDENCE, INDEPENDENCE AND BASES 31

where c1, . . . , ck are scalars. We do allow the case k = 1; a linear combination is then just cv1, c a scalar.

Example. In R3 consider the vectors

v_1 = \begin{pmatrix} 1 \\ -3 \\ 2 \end{pmatrix} and v_2 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}

Show that the vectors

0 = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad v = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}

are linear combinations of v1, v2, while w = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} is not.

Solution. Given any number of vectors, the zero vector is always a linear combination of these vectors; we just have to take the scalars involved equal to 0. In our case 0 = 0v1 + 0v2. To show v is a linear combination of v1, v2 reduces to showing that there exist scalars c1, c2 such that v = c1v1 + c2v2; writing this out componentwise, it works out to

\begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} = c_1 \begin{pmatrix} 1 \\ -3 \\ 2 \end{pmatrix} + c_2 \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} c_1 - c_2 \\ -3c_1 \\ 2c_1 + c_2 \end{pmatrix}.

In other words, we have to show the system of equations

c1 − c2 = 1

−3c1 = −2

2c1 + c2 = 1

has a solution. Analyzing similarly the situation for w, we have to prove that the system

c1 − c2 = 1

−3c1 = 1

2c1 + c2 = 1

does not have a solution. We can do it all at once, trying to solve both systems simultaneously since they have the same system matrix. We row reduce

\begin{pmatrix} 1 & -1 & 1 & 1 \\ -3 & 0 & -2 & 1 \\ 2 & 1 & 1 & 1 \end{pmatrix}

Performing the following row operations III(2)+3(1), III(3)−2(1), and then III(3)+(2), one gets

\begin{pmatrix} 1 & -1 & 1 & 1 \\ 0 & -3 & 1 & 4 \\ 0 & 0 & 0 & 3 \end{pmatrix}

This already shows that the system for which the vector b is the last column of the matrix cannot have a solution. In other words, this shows that w is not a linear combination of v1, v2. Dropping the last column, we can solve for c1, c2 for the case of v. We keep row reducing, performing II_{-1/3}(2) followed by III(1)+(2), to get

\begin{pmatrix} 1 & 0 & 2/3 \\ 0 & 1 & -1/3 \\ 0 & 0 & 0 \end{pmatrix}

We obtained a unique solution c1 = 2/3, c2 = −1/3. One can verify that, in fact,

\frac{2}{3}\begin{pmatrix} 1 \\ -3 \\ 2 \end{pmatrix} - \frac{1}{3}\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 2/3 + 1/3 \\ -3 \cdot (2/3) \\ 2 \cdot (2/3) - 1/3 \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} = v.
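One possible numerical membership test for sp(v1, v2), using least squares in numpy (an optional aside, not the notes' method; a zero residual means the target lies in the span):

    import numpy as np

    v1 = np.array([1.0, -3.0, 2.0])
    v2 = np.array([-1.0, 0.0, 1.0])
    M = np.column_stack([v1, v2])  # columns are v1, v2

    for target in (np.array([1.0, -2.0, 1.0]), np.array([1.0, 1.0, 1.0])):
        c, residual, _, _ = np.linalg.lstsq(M, target, rcond=None)
        if residual.size and residual[0] > 1e-12:
            print(target, "is NOT in sp(v1, v2)")
        else:
            print(target, "= {:.4f} v1 + {:.4f} v2".format(c[0], c[1]))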

Page 32: 1 Welcome to the world of linear algebra: Vector Spacesmath.fau.edu/schonbek/MAPcourses/em1fa12linearalgebra.pdfVector spaces, also known as a linear spaces, come in two avors, real

7 LINEAR DEPENDENCE, INDEPENDENCE AND BASES 32

Another example. Express v = \begin{pmatrix} 1 \\ -2 \\ 3 \\ 4 \end{pmatrix} as a linear combination of the following five vectors v1, v2, v3, v4, v5, or show it isn't possible.

v_1 = \begin{pmatrix} 1 \\ 2 \\ 0 \\ 0 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 0 \\ 2 \\ 0 \\ 3 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix}, \quad v_4 = \begin{pmatrix} 1 \\ -1 \\ 1 \\ 1 \end{pmatrix}, \quad v_5 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}

Solution. We have to find scalars c1, c2, c3, c4, c5 such that v = c1v1 + c2v2 + c3v3 + c4v4 + c5v5. Written out in terms of components, this means finding c1, c2, c3, c4, c5 such that

\begin{pmatrix} 1 \\ -2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} c_1 + c_3 + c_4 + c_5 \\ 2c_1 + 2c_2 - c_4 \\ c_4 + c_5 \\ 3c_2 - c_3 + c_4 \end{pmatrix};

in other words, solving the system of equations

c1 + c3 + c4 + c5 = 1

2c1 + 2c2 − c4 = −2

c4 + c5 = 3

3c2 − c3 + c4 = 4

As usual, we'll do this setting up the augmented matrix and row reducing. The augmented matrix is:

\begin{pmatrix} 1 & 0 & 1 & 1 & 1 & 1 \\ 2 & 2 & 0 & -1 & 0 & -2 \\ 0 & 0 & 0 & 1 & 1 & 3 \\ 0 & 3 & -1 & 1 & 0 & 4 \end{pmatrix}

You will notice that the columns of the augmented matrix are the vectors v1, v2, v3, v4, v5, and v. Noticing this saves some time next time one has to do this. Let's proceed with the row reduction.

\begin{pmatrix} 1 & 0 & 1 & 1 & 1 & 1 \\ 2 & 2 & 0 & -1 & 0 & -2 \\ 0 & 0 & 0 & 1 & 1 & 3 \\ 0 & 3 & -1 & 1 & 0 & 4 \end{pmatrix} \xrightarrow{III(2)-2(1)} \begin{pmatrix} 1 & 0 & 1 & 1 & 1 & 1 \\ 0 & 2 & -2 & -3 & -2 & -4 \\ 0 & 0 & 0 & 1 & 1 & 3 \\ 0 & 3 & -1 & 1 & 0 & 4 \end{pmatrix} \xrightarrow{II_{1/2}(2)} \begin{pmatrix} 1 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & -1 & -3/2 & -1 & -2 \\ 0 & 0 & 0 & 1 & 1 & 3 \\ 0 & 3 & -1 & 1 & 0 & 4 \end{pmatrix} \xrightarrow{III(4)-3(2)} \begin{pmatrix} 1 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & -1 & -3/2 & -1 & -2 \\ 0 & 0 & 0 & 1 & 1 & 3 \\ 0 & 0 & 2 & 11/2 & 3 & 10 \end{pmatrix} \xrightarrow{II_{1/2}(4)} \begin{pmatrix} 1 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & -1 & -3/2 & -1 & -2 \\ 0 & 0 & 0 & 1 & 1 & 3 \\ 0 & 0 & 1 & 11/4 & 3/2 & 5 \end{pmatrix} \xrightarrow{III(1)-(4),\ III(2)+(4)} \begin{pmatrix} 1 & 0 & 0 & -7/4 & -1/2 & -4 \\ 0 & 1 & 0 & 5/4 & 1/2 & 3 \\ 0 & 0 & 0 & 1 & 1 & 3 \\ 0 & 0 & 1 & 11/4 & 3/2 & 5 \end{pmatrix} \xrightarrow{I(3,4)} \begin{pmatrix} 1 & 0 & 0 & -7/4 & -1/2 & -4 \\ 0 & 1 & 0 & 5/4 & 1/2 & 3 \\ 0 & 0 & 1 & 11/4 & 3/2 & 5 \\ 0 & 0 & 0 & 1 & 1 & 3 \end{pmatrix} \xrightarrow{III(1)+\frac{7}{4}(4),\ III(2)-\frac{5}{4}(4),\ III(3)-\frac{11}{4}(4)} \begin{pmatrix} 1 & 0 & 0 & 0 & 5/4 & 5/4 \\ 0 & 1 & 0 & 0 & -3/4 & -3/4 \\ 0 & 0 & 1 & 0 & -5/4 & -13/4 \\ 0 & 0 & 0 & 1 & 1 & 3 \end{pmatrix}

The solution is given by

c_1 = -\frac{5}{4}c_5 + \frac{5}{4}, \quad c_2 = \frac{3}{4}c_5 - \frac{3}{4}, \quad c_3 = \frac{5}{4}c_5 - \frac{13}{4}, \quad c_4 = -c_5 + 3,

Page 33: 1 Welcome to the world of linear algebra: Vector Spacesmath.fau.edu/schonbek/MAPcourses/em1fa12linearalgebra.pdfVector spaces, also known as a linear spaces, come in two avors, real

7 LINEAR DEPENDENCE, INDEPENDENCE AND BASES 33

with c5 being arbitrary. This gives us an infinite number of ways of expressing v as a linear combination of v1, v2, v3, v4, v5. For example, selecting c5 = 0, we get the representation

v = \frac{5}{4}v_1 - \frac{3}{4}v_2 - \frac{13}{4}v_3 + 3v_4.

You might want to take a minute or so and verify that this representation is correct. Or, we can take c5 = 1 to get

v = −2v3 + 2v4 + v5.

Again, take a minute or so to see this works. Or, one can take c5 = −1 and get

v = \frac{5}{2}v_1 - \frac{3}{2}v_2 - \frac{9}{2}v_3 + 4v_4 - v_5.

And an infinity more.

Notice the difference between the two examples. In the first one there was a unique choice for the scalar coefficients (or no choice at all). In the second example, there is an infinity of choices. This is because the set of vectors in the first example was linearly independent, while those of the second example were not. Let us define what this means.

Assume again that we have a vector space V and vectors v1, . . . , vm in V . We say the vectors v1, . . . , vm are linearly dependent if there exist scalars c1, c2, . . . , cm, not all 0, such that

c1v1 + · · ·+ cmvm = 0.

The vectors are linearly independent if they are not linearly dependent.

In more words than symbols: given a set of vectors v1, . . . , vm, the zero vector 0 can always be obtained as a linear combination of these vectors by taking c1 = 0, c2 = 0, . . . , cm = 0; all coefficients equal to 0. If this is the ONLY way one can get the zero vector, the vectors are said to be linearly independent. If there is another way, with at least one of the coefficients not zero, they are linearly dependent.

A few obvious things to notice: If a set of vectors contains the zero vector, they are automatically linearly dependent. In fact, say v1 = 0; then whatever v2, . . . , vn may be, we will have c1v1 + · · · + cnvn = 0 if we take c2 = 0, . . . , cn = 0 and c1 = 1 (or any other non-zero number). Given two vectors v1, v2, they are linearly independent if and only if one is not equal to the other one times a scalar. In fact, if (say) v1 = cv2, then v1 − cv2 = 0 and since the coefficient of v1 is 1 ≠ 0, we clearly have linear dependence. If, on the other hand, we have c1v1 + c2v2 = 0 and not both c1, c2 are 0, then we can divide out by the non-zero coefficient and solve. For example, if c1 ≠ 0, we can divide by c1 and solve to get v1 = −(c2/c1)v2, so v1 is a scalar times v2. A bit less obvious, perhaps, is that vectors v1, . . . , vm are linearly dependent if and only if one of the vectors is a linear combination of the others. In fact, if we have c1v1 + · · · + cmvm = 0 and one of the cj's is not 0, we can divide it out and solve to get vj as a linear combination of the remaining vectors. For example, if cm ≠ 0 we get

v_m = \left(-\frac{c_1}{c_m}\right)v_1 + \cdots + \left(-\frac{c_{m-1}}{c_m}\right)v_{m-1}.

Conversely, if one vector is a linear combination of the others, we can get at once a linear combination of all, equal to 0, in which the coefficient of the vector that was a combination of the others is 1 (or −1). For example, if

v2 = c1v1 + c3v3 + · · · + cmvm, then c1v1 + (−1)v2 + c3v3 + · · · + cmvm = 0

and, of course, −1 ≠ 0. Maybe we should have this as a theorem for easy reference:

Theorem 3 Vectors v1, . . . , vm of a vector space V are linearly dependent if and only if one of the vectors is a linear combination of the others. Equivalently, the vectors v1, . . . , vm are linearly independent if and only if none of the vectors is a linear combination of the others.
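Numerically, linear independence of column vectors in Rn is a rank question, which gives a quick (optional, numpy-based, not part of the notes) test in the spirit of Theorem 3:

    import numpy as np

    def independent(*vectors):
        # Column vectors are linearly independent iff the matrix they form
        # has rank equal to the number of vectors.
        M = np.column_stack(vectors)
        return np.linalg.matrix_rank(M) == len(vectors)

    v1 = np.array([1, -3, 2])
    v2 = np.array([-1, 0, 1])
    print(independent(v1, v2))           # True
    print(independent(v1, v2, v1 + v2))  # False: the third is a combination of the others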

Page 34: 1 Welcome to the world of linear algebra: Vector Spacesmath.fau.edu/schonbek/MAPcourses/em1fa12linearalgebra.pdfVector spaces, also known as a linear spaces, come in two avors, real

7 LINEAR DEPENDENCE, INDEPENDENCE AND BASES 34

Here is another example.

Example. Let the vector space V be now the space of all (real valued) functions defined on the real line. Show that the following sets of vectors are linearly independent.

1. f1, f2 where f1(t) = sin t, f2(t) = cos t.

2. g1, g2, g3, g4 where g1(t) = et, g2(t) = e2t, g3(t) = e3t, g4(t) = e4t.

Solution. The first one is easy. Since we only have two vectors, we can ask whether one is a constant times the other. Is there a scalar c such that either f1 = cf2, which means sin t = c cos t FOR ALL t, or f2 = cf1, which means cos t = c sin t FOR ALL t? Of course not! If sin t = c cos t, we simply have to set t = π/2 to get the rather strange conclusion 1 = 0, so f1 = cf2 is out. So is f2 = cf1; we can't have cos t = c sin t for all t because for t = 0 we get 1 = 0.

The second one is a bit harder; we have to show that the only way we can get c1e^t + c2e^{2t} + c3e^{3t} + c4e^{4t} = 0 to hold for ALL t is by taking c1 = c2 = c3 = c4 = 0. We can set up a system of impossible equations by judiciously giving values to t, but I'll postpone the somewhat simpler solution for later, once we have developed a few more techniques.

Here comes another definition. Let v1, . . . , vm be vectors in a vector space V . The span of v1, . . . , vm is the set of all linear combinations of these vectors. I will denote the span of v1, . . . , vm by sp(v1, . . . , vm). Vectors that will be for sure in the span of vectors v1, . . . , vm include: the zero vector; as mentioned, it is a linear combination of any bunch of vectors, just use 0 coefficients. The vectors v1, . . . , vm themselves; to get vj as a linear combination use cj = 1, all other coefficients equal to 0. And, in general, many more: v1 + 2v2, −v1, v1 + · · · + vm; etc., etc., ad infinitum.

Here are a few examples.

Examples

1. Let us start with the simplest example; we have only one vector and that vector is the zero vector. What can 0 span? Well, not much; multiplying by any scalar we always get 0. The span of the zero vector is the set consisting of the zero vector alone; a set of a single element. In symbols, sp(0) = {0}.

2. On the other hand, if V is a vector space, v ∈ V , v ≠ 0, then the span of v consists of all multiples of v; sp(v) = {cv : c a scalar}.

3. Remember the example we did above: in R3, with

v_1 = \begin{pmatrix} 1 \\ -3 \\ 2 \end{pmatrix} and v_2 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix},

we showed that 0 = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} and v = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} are linear combinations of v1, v2, while w = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} is not. We could now rephrase it in terms of spans; in solving it we showed that with the vectors defined as in the example,

0, v ∈ sp(v1, v2), while w ∉ sp(v1, v2).

Here is a simple theorem.

Theorem 4 Assume v1, . . . ,vm are vectors in a vector space V . Then sp(v1, . . . ,vm) is a subspace of V .

Page 35: 1 Welcome to the world of linear algebra: Vector Spacesmath.fau.edu/schonbek/MAPcourses/em1fa12linearalgebra.pdfVector spaces, also known as a linear spaces, come in two avors, real

7 LINEAR DEPENDENCE, INDEPENDENCE AND BASES 35

The reason why this theorem is true is, I hope, clear. First of all, the zero vector is in the span of v1, . . . , vm. It is also clear (I hope once more) that adding two linear combinations of these vectors, or multiplying such a linear combination by a scalar, resolves into another linear combination of the same vectors. That's it.

We will refer to the span sp(v1, . . . , vm) of vectors v1, . . . , vm also as the subspace of V spanned by v1, . . . , vm, and call {v1, . . . , vm} a spanning set for this subspace.

A subspace can be spanned by different sets of vectors. In fact, except for the pathetically small subspace consisting of the zero vector by its lonesome, every other subspace of a vector space will have an infinity of different spanning sets. Consider, for example, one of the simplest cases: suppose v is a non-zero vector in a vector space V . Let W = sp(v). Then

W = {cv : c a scalar}.

But it is, or should be, clear that anything that is a multiple of v is also a multiple of any non-zero multiple of v. That is, suppose d is any non-zero scalar and we set w = dv. Any multiple of v is a multiple of w, and vice-versa: if x = cv, then x = (c/d)w; if x = cw, then x = (cd)v. Any non-zero multiple of v also spans W .

Generally speaking, if there is one spanning set, there is an infinity of them. But some spanning sets are better than others. They have less fat, fewer superfluous elements. Say we are in a vector space V and W is a subspace spanned by the vectors v1, . . . , vm. If one of these vectors happens to be a linear combination of the remaining ones, who needs it? For example, suppose

vm = a1v1 + · · · + am−1vm−1 for some scalars a1, . . . , am−1.

Then any linear combination involving all vectors can be rewritten as one without vm:

c1v1 + · · · + cmvm = c1v1 + · · · + cm−1vm−1 + cm(a1v1 + · · · + am−1vm−1) = (c1 + cma1)v1 + · · · + (cm−1 + cmam−1)vm−1.

That is, if vm is a linear combination of v1, . . . , vm−1, then sp(v1, . . . , vm) = sp(v1, . . . , vm−1). Recalling Theorem 3, that one vector is a linear combination of the others is equivalent to linear dependence. If the spanning set is linearly dependent, we can find a vector that is a linear combination of the others (there usually is more than one choice), and throw it out. The remaining vectors still span the same subspace. We keep doing this. Can we run out of vectors? No, we can't, since nothing cannot span a subspace (well, sort of) and we are always spanning the same subspace. But we only have a finite number of vectors to start with, so there must be a stopping point. The stopping point is a linearly independent set of vectors spanning the same space as before. Such a set is called a basis of the subspace. To put it in the form of a theorem:

Theorem 5 Let V be a vector space and let v1, . . . , vm be vectors in V . Let W = sp(v1, . . . , vm). There is a subset of {v1, . . . , vm} that still spans W and is linearly independent; in other words: every spanning set contains a basis of the spanned subspace.

We could now ask if this basis, obtained by discarding vectors from a spanning set, is always the same. Well, if the spanning set was already linearly independent, and there is nothing to discard, then yes. Even so, there are many other spanning sets that will span the same subspace. And since when discarding there is almost always more than one choice of what to discard at each stage, the general answer is no. Vector subspaces tend to have an infinity of different bases. What is, however, remarkable (I'd even dare say extremely remarkable) is that any two bases of a given subspace of a vector space will have the same number of elements. So if you have a subspace and found a basis of exactly seven vectors and someone tries to sell you a better, improved basis of only six vectors, don't buy! There could be a better basis than the one you found, but it still will consist of seven vectors. If a subspace has a basis of m vectors, we say that its dimension is m. That is, an m-dimensional subspace of a vector space is one that has a basis of m elements (and hence ALL of its bases will have m elements).

Before we go any further, it may be good to make some additional definitions. A vector space V is a subspace of itself, so we can ask if it can be spanned by a finite number of vectors. If so, we say it is finite dimensional; if not, it is infinite dimensional. By what we saw, if it is finite dimensional, it has a basis, hence a dimension. Infinite dimensional vector spaces also have bases, but one has to redefine the concept a bit, and we won't go into it.

Example. Show that the vectors

e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1),

Page 36: 1 Welcome to the world of linear algebra: Vector Spacesmath.fau.edu/schonbek/MAPcourses/em1fa12linearalgebra.pdfVector spaces, also known as a linear spaces, come in two avors, real

7 LINEAR DEPENDENCE, INDEPENDENCE AND BASES 36

form a basis of Rn; hence Rn is a vector space of dimension n. In case there are too many dots, the vector ej is the n-tuple in which the j-th component is 1, all other components are 0.

Solution. A direct way to verify that a set of vectors is a basis of a given space (or subspace) is by performing two tasks (in any order):

1. Verify linear independence. In other words show that the equation

c1e1 + · · ·+ cnen = 0

can only be solved by c1 = 0, c2 = 0, . . . , cn = 0.

2. Verify that for every vector w in the space (or subspace), the equation

c1e1 + · · ·+ cnen = w

has at least one solution c1, c2, . . . , cn. (Incidentally, if it happens to have more than one solution, then the first condition fails; if a vector is a linear combination of linearly independent vectors, there is only one choice for the coefficients.)

If either condition fails, we don't have a basis.

Verifying the first condition, linear independence: suppose c1e1 + · · · + cnen = 0 for some scalars c1, . . . , cn. If we write the vectors of Rn in the form of column vectors, then this equation becomes

\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = c_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + c_2 \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} + \cdots + c_n \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}

The only way the first and last vectors can be equal is if c1 = c2 = · · · = cn = 0. Linear independence has been established.

Verifying the second condition, spanning: let w be a vector of Rn, so w = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}. We have to show that no matter what w1, . . . , wn are, we can always solve

\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = c_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + c_2 \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} + \cdots + c_n \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}.

Writing the right hand side as a single vector, the equation becomes

\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}.

There is a solution, namely c1 = w1, c2 = w2, . . . , cn = wn. Spanning has been established.

The basis {e1, . . . , en} of Rn is sometimes referred to as the canonical basis of Rn. You might notice that if we allow complex scalars, it is also a basis of Cn, so that Cn is a (complex) vector space of dimension n.

Here is a theorem about bases; some of the properties mentioned in it are sort of obvious, others less so. I hope all are believable (given that they are true).

Theorem 6 Let V be a vector space of dimension n. Then

Page 37: 1 Welcome to the world of linear algebra: Vector Spacesmath.fau.edu/schonbek/MAPcourses/em1fa12linearalgebra.pdfVector spaces, also known as a linear spaces, come in two avors, real

7 LINEAR DEPENDENCE, INDEPENDENCE AND BASES 37

1. No set of more than n vectors can be linearly independent.

2. No set of less than n vectors can span V .

3. Any set of n vectors that spans V is also linearly independent and a basis; i.e., if V = sp(v1, . . . , vn), then v1, . . . , vn is a basis of V .

4. Any set of n vectors that is linearly independent will also span V and be a basis of V ; i.e., if v1, . . . , vn are linearly independent, then V = sp(v1, . . . , vn), and v1, . . . , vn is a basis of V .

5. Let v1, . . . , vm be linearly independent (so, by the first property, we must have m ≤ n). If m = n it is a basis; if m < n it can be extended to a basis, meaning we can find vectors vm+1, . . . , vn so that v1, . . . , vn is a basis of V .

6. Every subspace of V has a basis. If W is a subspace of V there exists a set of vectors {v1, . . . , vm} that is linearly independent and such that W = sp(v1, . . . , vm). Necessarily m ≤ n (m = n if and only if V = W ).

So once we have a finite dimensional space V , there is a hierarchy of subspaces: precisely one subspace of dimension n, namely V itself, a lot (an infinity) of subspaces of dimensions n − 1, all the way down to dimension 1. To keep the poor subspace consisting only of the zero vector happy by giving it a dimension, one says that the trivial subspace has dimension 0. From a geometric point of view, if we think of Rn as a sort of n-dimensional replica of our familiar 3-space (or as our 3-space if n = 3, the plane if n = 2, a line if n = 1), one dimensional subspaces are lines through the origin, two dimensional subspaces are planes containing the origin, three dimensional subspaces, well, they are replicas of R3 containing the origin. Maybe it is a good idea to work for a while in the familiar environment of 3-space. Setting up a system of orthogonal coordinates, we can think of points of 3-space as being vectors. The triple (written as a column)

x = \begin{pmatrix} x \\ y \\ z \end{pmatrix}

can be interpreted as the point of coordinates x, y, z, or as the arrow from the origin to the point of coordinates x, y, z. Choose the one you like best, it makes no difference. (In applied situations it can make a difference, but here we are not in an applied situation.) Suppose we have a non-zero vector; a non-zero vector constitutes a very small linearly independent set of a single element. Say

b = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} ≠ 0 (so at least one of b1, b2, b3 is not 0); then the subspace sp(b) is the set

sp(b) = \left\{ c\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} : c \text{ a real number} \right\} = \left\{ \begin{pmatrix} cb_1 \\ cb_2 \\ cb_3 \end{pmatrix} : c \text{ a real number} \right\}.

In other words, it consists of all points of coordinates x, y, z satisfying

x = cb1, y = cb2, z = cb3, −∞ < c < ∞.

These are the parametric equations of a line through the origin in the direction of the vector b. One usually uses t or s for the parameter instead of c, but that does not really matter.

What about the subspace spanned by two vectors a and b? If these vectors are linearly dependent, we are back to a line (assume that neither is 0 to avoid wasting time). Both vectors are then on the same line through the origin, and either one of them spans that line, and is a basis for the line. On the other hand, if the vectors are not collinear (i.e., they are linearly independent), then sp(a, b) is the plane determined by the two vectors (thinking of them as arrows from the origin). We might retake this example later on.

What about three vectors? Well, if they are linearly dependent and not all 0, they span a line or a plane through the origin. If linearly independent, they are a basis of R3 and span R3.

Since our main space may well be Rn (or Cn), it may be a good idea to have some algorithm to decide when vectors in Rn are linearly independent, and to know what they span. In the section on determinants we'll see how to do this with determinants, but for now we'll use our old and a bit neglected friend, row reduction. We were writing

Page 38: 1 Welcome to the world of linear algebra: Vector Spacesmath.fau.edu/schonbek/MAPcourses/em1fa12linearalgebra.pdfVector spaces, also known as a linear spaces, come in two avors, real

7 LINEAR DEPENDENCE, INDEPENDENCE AND BASES 38

vectors of Rn as column vectors, but for the algorithm I have in mind it will be more convenient to write them as rows. After this I'll give you a second algorithm that works with the columns directly. Well, let's try to be more or less consistent and keep writing the vectors as columns; in the first algorithm we'll just transpose to get rows. So assume given m vectors in Rn, say

v_1 = \begin{pmatrix} v_{11} \\ v_{21} \\ \vdots \\ v_{n1} \end{pmatrix}, \quad v_2 = \begin{pmatrix} v_{12} \\ v_{22} \\ \vdots \\ v_{n2} \end{pmatrix}, \quad \ldots, \quad v_m = \begin{pmatrix} v_{1m} \\ v_{2m} \\ \vdots \\ v_{nm} \end{pmatrix}.

The problem to solve is: determine the dimension of sp(v1, . . . , vm) and find a basis for the subspace W = sp(v1, . . . , vm).

For both algorithms we introduce the matrix I’ll call M whose columns are the vectors; that is,

M = \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1m} \\ v_{21} & v_{22} & \cdots & v_{2m} \\ \vdots & \vdots & & \vdots \\ v_{n1} & v_{n2} & \cdots & v_{nm} \end{pmatrix}

Algorithm 1. Let N = M^T . The vectors v1, . . . , vm are the rows of N . Row reduce N to RRE form. The non-zero rows are then a basis for W ; write them again as column vectors.

Why does this work? It works because row operations do not change the span of the row vectors. That is fairly easy to see. So once you are in RRE form, the rows of the RRE form still span the same subspace W . But because the non-zero ones all start with a leading 1, and everything above and below that 1 is 0, it is easy to see that the non-zero rows are linearly independent, thus a basis. This basis, of course, might contain no vector from the original spanning set.

Algorithm 2. This one is a bit harder to explain, but one row reduces directly the matrix M , bringing it to RRE form. Now go to the original spanning set v1, . . . , vm and discard every vector that was in a column of M which now does NOT have a leading 1. That is, keep only the original vectors that were in a column that now has a leading 1. These remaining vectors form a basis for W . The advantage of this algorithm is that the basis is made up out of vectors from the original set.
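Here is a minimal sketch of Algorithm 2 in Python with sympy (an assumption on my part; the notes use Excel): row reduce M and keep the original vectors sitting in pivot columns.

    from sympy import Matrix

    def basis_from_spanning_set(vectors):
        # Row reduce the matrix whose columns are the given vectors and keep
        # the original vectors that sit in pivot (leading 1) columns.
        M = Matrix.hstack(*[Matrix(v) for v in vectors])
        _, pivots = M.rref()
        return [vectors[j] for j in pivots]

    print(basis_from_spanning_set([[1, 0, 1], [2, 0, 2], [0, 1, 1]]))
    # [[1, 0, 1], [0, 1, 1]]: the second vector is twice the first and is discarded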

Let's illustrate this with a somewhat messy example. Messy examples can sometimes be the best. Sometimes they are the worst. NOTICE: This is only an example! No instructor would be sadistic enough to have you do a computation like this one by hand. I just thought it might be good to occasionally deal with larger systems, and understand in the process why computers are a great invention.

Consider the following 8 vectors in R7:

v_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \\ -1 \\ -2 \\ 4 \\ -2 \end{pmatrix}, v_2 = \begin{pmatrix} 3 \\ 6 \\ 9 \\ -3 \\ -6 \\ 12 \\ -6 \end{pmatrix}, v_3 = \begin{pmatrix} 1 \\ -3 \\ -2 \\ 4 \\ 0 \\ 1 \\ 5 \end{pmatrix}, v_4 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \\ 5 \\ -2 \\ -1 \end{pmatrix}, v_5 = \begin{pmatrix} 1 \\ -8 \\ -7 \\ 9 \\ 2 \\ -2 \\ 12 \end{pmatrix}, v_6 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, v_7 = \begin{pmatrix} 0 \\ 2 \\ 1 \\ 0 \\ 1 \\ 1 \\ -1 \end{pmatrix}, v_8 = \begin{pmatrix} 3 \\ -3 \\ -1 \\ 8 \\ -5 \\ 10 \\ 8 \end{pmatrix}.

We want to find the dimension of the subspace W = sp(v1, . . . , v8) they span, and a basis of this spanned subspace. If perchance the dimension is 7 (it won't be), then they span R7 and the basis we would get would be a


basis of R7. The matrix M is

M = \begin{pmatrix} 1 & 3 & 1 & 1 & 1 & 1 & 0 & 3 \\ 2 & 6 & -3 & 1 & -8 & 0 & 2 & -3 \\ 3 & 9 & -2 & 1 & -7 & 1 & 1 & -1 \\ -1 & -3 & 4 & 0 & 9 & 1 & 0 & 8 \\ -2 & -6 & 0 & 5 & 2 & 1 & 1 & -5 \\ 4 & 12 & 1 & -2 & -2 & 1 & 1 & 10 \\ -2 & -6 & 5 & -1 & 12 & 1 & -1 & 8 \end{pmatrix}

The transpose matrix is

N = M^T = \begin{pmatrix} 1 & 2 & 3 & -1 & -2 & 4 & -2 \\ 3 & 6 & 9 & -3 & -6 & 12 & -6 \\ 1 & -3 & -2 & 4 & 0 & 1 & 5 \\ 1 & 1 & 1 & 0 & 5 & -2 & -1 \\ 1 & -8 & -7 & 9 & 2 & -2 & 12 \\ 1 & 0 & 1 & 1 & 1 & 1 & 1 \\ 0 & 2 & 1 & 0 & 1 & 1 & -1 \\ 3 & -3 & -1 & 8 & -5 & 10 & 8 \end{pmatrix}

To row reduce this matrix, I used Excel. Moreover, because at every row reduction the space spanned by the row vectors is always the same, it isn't really necessary to reach RRE form; it suffices to stop once one can see that the non-zero rows are linearly independent. With the help of Excel, I carried out the following operations on N , in the indicated order:

III(2)−3(1), III(3)−(1), III(4)−(1), III(5)−(1), III(6)−(1), III(8)−3(1), I(2,8), I(2,3), II_{−1}(2), III(1)−2(2), III(3)+5(2), III(4)+10(2), III(5)+2(2), III(6)−2(2), III(7)+9(2), III(4)−2(3), I(4,7), II_5(4), III(4)−2(3), II_5(5), III(5)+3(3), II_5(6), III(6)−8(3), I(4,5), III(6)−(4), III(6)+2(5).

At this point I got the following matrix:

\begin{pmatrix} 1 & 0 & -1 & 1 & 12 & -8 & 0 \\ 0 & 1 & 2 & -1 & -7 & 6 & -1 \\ 0 & 0 & 5 & 0 & -33 & 27 & 2 \\ 0 & 0 & 0 & 10 & -24 & 26 & 11 \\ 0 & 0 & 0 & 0 & 11 & -9 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}

I doubt I could have done this without Excel! Too many possibilities of mistakes. There is no real need to continue. Vectors in Rn of the form

\begin{pmatrix} a_1 \\ * \\ * \\ \vdots \\ * \end{pmatrix}, \quad \begin{pmatrix} 0 \\ a_2 \\ * \\ \vdots \\ * \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 0 \\ a_3 \\ \vdots \\ * \end{pmatrix}, \ldots

where a1, a2, . . . are non-zero scalars and the entries marked with a * could be anything (zero or non-zero), have to be linearly independent. Any linear combination of them with coefficients c1, c2, c3, . . . would result in a vector whose first component is c1a1. If it is the zero vector, then c1a1 = 0, hence c1 = 0. The second component of this linear combination is c1(∗) + c2a2 (c1 times the second component of the first vector plus c2a2). Since c1 = 0, we get c2a2 = 0, hence c2 = 0. And so forth.


Returning to our example, we proved that the set of vectors

w_1 = \begin{pmatrix} 1 \\ 0 \\ -1 \\ 1 \\ 12 \\ -8 \\ 0 \end{pmatrix}, w_2 = \begin{pmatrix} 0 \\ 1 \\ 2 \\ -1 \\ -7 \\ 6 \\ -1 \end{pmatrix}, w_3 = \begin{pmatrix} 0 \\ 0 \\ 5 \\ 0 \\ -33 \\ 27 \\ 2 \end{pmatrix}, w_4 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 10 \\ -24 \\ 26 \\ 11 \end{pmatrix}, w_5 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 11 \\ -9 \\ 1 \end{pmatrix}

spans the same space as the original set v1, . . . , v8; since they are linearly independent they are a basis of this subspace and its dimension is 5.

Solution by the second algorithm. In many ways, this is a better solution. We need to row reduce the matrix M ; this time I will take it to RRE form. Using Excel, of course.

The following row operations put this matrix into RRE form. I'm not sure one can do it in fewer, but you may certainly try.

III(2)−(1), III(3)−3(1), III(4)+(1), III(5)+2(1), III(6)−4(1), III(7)+2(1), II_{−1/5}(2), III(1)−(2), III(3)+5(2), III(4)−5(2), III(5)−2(2), III(6)+3(2), III(7)−7(2), II_{−1}(3), III(1)−(4/5)(3), III(2)−(1/5)(3), III(5)−(33/5)(3), III(6)+(27/5)(3), III(7)+(2/5)(3), I(5,7), II_5(5), III(1)−(3/5)(5), III(2)−(2/5)(5), III(6)+(9/5)(5), III(7)−(11/5)(5), II_{1/25}(5), III(1)+7(6), III(2)+5(6), III(3)−(6), III(4)−2(6), III(5)+11(6), III(7)+29(6), I(4,6)

The RRE form is:

\begin{pmatrix} 1 & 3 & 0 & 0 & -1 & 0 & 0 & 2 \\ 0 & 0 & 1 & 0 & 2 & 0 & 0 & 3 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & -2 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}

Looking at this matrix we notice that the columns that contain the leading 1's of the non-zero rows are columns 1, 3, 4, 6, and 7. That means that of the original vectors, v1, v3, v4, v6, and v7 constitute a basis of the subspace spanned by the original 8 vectors. Since this basis has 5 vectors, we again get that the dimension is 5, but now we have a basis consisting of a subset of the original vectors.

7.1 Exercises

1. Show that the vector space of all m × n matrices, that is Mm,n (or, in the complex case, Mm,n(C)), has dimension m · n. Describe a basis.

2. In one of the examples in this section it was shown that the vectors

w_1 = \begin{pmatrix} 1 \\ 0 \\ -1 \\ 1 \\ 12 \\ -8 \\ 0 \end{pmatrix}, w_2 = \begin{pmatrix} 0 \\ 1 \\ 2 \\ -1 \\ -7 \\ 6 \\ -1 \end{pmatrix}, w_3 = \begin{pmatrix} 0 \\ 0 \\ 5 \\ 0 \\ -33 \\ 27 \\ 2 \end{pmatrix}, w_4 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 10 \\ -24 \\ 26 \\ 11 \end{pmatrix}, w_5 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 11 \\ -9 \\ 1 \end{pmatrix}


constitute a basis for the subspace of R^7 spanned by the eight vectors

$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \\ -1 \\ -2 \\ 4 \\ -2 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 3 \\ 6 \\ 9 \\ -3 \\ -6 \\ 12 \\ -6 \end{pmatrix}, \quad \mathbf{v}_3 = \begin{pmatrix} 1 \\ -3 \\ -2 \\ 4 \\ 0 \\ 1 \\ 5 \end{pmatrix}, \quad \mathbf{v}_4 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \\ 5 \\ -2 \\ -1 \end{pmatrix},$$

$$\mathbf{v}_5 = \begin{pmatrix} 1 \\ -8 \\ -7 \\ 9 \\ 2 \\ -2 \\ 12 \end{pmatrix}, \quad \mathbf{v}_6 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \quad \mathbf{v}_7 = \begin{pmatrix} 0 \\ 2 \\ 1 \\ 0 \\ 1 \\ 1 \\ -1 \end{pmatrix}, \quad \mathbf{v}_8 = \begin{pmatrix} 3 \\ -3 \\ -1 \\ 8 \\ -5 \\ 10 \\ 8 \end{pmatrix}.$$

(a) Express v_1 as a linear combination of the vectors w_1, w_2, w_3, w_4, w_5. Show there is only one way of doing this.

(b) Express w_1 as a linear combination of the vectors v_1, v_2, v_3, v_4, v_5, v_6, v_7, v_8 in three different ways.

8 Determinants

Every square matrix (and only square matrices) gets assigned a number called its determinant. To repeat, if M is a square matrix then the determinant of M, denoted usually by det(M), is a scalar (real if the matrix is real; otherwise complex). There are many (equivalent) ways of defining the determinant; I'll select what could be a favorite way of doing it, because it also tells you how to compute it. At first glance the definition may look a bit complex, but I hope that the examples will clear it up. A bit of practice will clear it up even more. By the way, in this section I prove almost nothing; you'll have to take my word that all I tell you is true.

The definition I have in mind is a recursive definition, which becomes a recursive algorithm for computing determinants. Recursive procedures should be quite familiar to anybody who has done a bit of programming. Let us start with the simplest case, a 1 × 1 matrix. That's just a scalar enclosed in parentheses; for example (5) or (1 + 3i). Mostly, one doesn't even write the parentheses and identifies the set of 1 × 1 matrices with the set of scalars. We define, for a 1 × 1 matrix,

det(a) = a.

So the determinant of a scalar is the scalar. This could be called our base case. The next part of the definition explains how to reduce computing the determinant of an n × n matrix to computing determinants of (n − 1) × (n − 1) matrices. I'll give it first in words; after the examples I'll write out the formulas in a more precise way. Here is the full algorithm, which is usually known as Laplace's expansion. Assume we are given an n × n matrix.

Step 1. If n = 1, we explained above what to do. Suppose from now on that n ≥ 2.

Step 2. Assign a sign ("+" or "−") to each position of the matrix, beginning by placing a + in position (1, 1) and then alternating signs. This is totally independent of the entries in the matrix; the entry in a + position may well be negative, the one in a − position can be positive. For example, here is how this sign assignment looks for 2 × 2, 3 × 3 and 4 × 4 matrices:

$$\begin{pmatrix} + & - \\ - & + \end{pmatrix}, \quad \begin{pmatrix} + & - & + \\ - & + & - \\ + & - & + \end{pmatrix}, \quad \begin{pmatrix} + & - & + & - \\ - & + & - & + \\ + & - & + & - \\ - & + & - & + \end{pmatrix}.$$


Step 3. Select a row of the matrix. One usually selects the row with the largest number of zero entries; all things being equal, one selects the first row. For each entry in that row, compute the determinant of the (n − 1) × (n − 1) matrix obtained by crossing out the selected row and the column containing the entry. Multiply that determinant by the entry. If the entry is in a negative position, change the sign of the result.

Step 4. Add up the results of all the computations done in step 3. That’s the determinant.
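Steps 1 to 4 translate almost word for word into a short recursive program. Here is a minimal sketch in Python (the language and the function name det are my choices, not part of the notes); it always expands along the first row, which is legitimate because, as noted next, the result does not depend on the row chosen:

    def det(m):
        # Base case: the determinant of a 1 x 1 matrix (a) is a.
        if len(m) == 1:
            return m[0][0]
        # Expand along the first row: signs alternate, starting with +.
        total = 0
        for j in range(len(m)):
            # Minor: cross out the first row and the column of entry j.
            minor = [row[:j] + row[j+1:] for row in m[1:]]
            total += (-1) ** j * m[0][j] * det(minor)
        return total

    print(det([[1, 2, 3, 2], [2, -3, -4, 0],
               [1, -1, 2, -2], [1, 1, 2, 2]]))  # -20, the 4 x 4 example below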

An important fact here is that the result of the computation does not depend on the choice of the row. That's not so easy to justify with the tools at our disposal. Here is how we would compute the determinant of a 2 × 2 matrix

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

Let us select the first row. The first entry is a; if we cross out the first row and the column containing a, we are left with d. We multiply by a, getting ad. Since a is in a positive position, we leave it as it is. The next (and last) entry of the first row is b. Crossing out the first row and the column containing b we are left with c. We multiply b times c and, since b is in a negative position, change the sign to −bc.

Adding it all up gives ad − bc. Thus

$$\det(A) = \det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc.$$

What would have happened if we had chosen the second row? We would get the same determinant in the form −cb + da.

A word on notation. Given a square n × n matrix, it is customary to write its determinant by replacing the parentheses that enclose the array of entries of the matrix by vertical lines. Thus

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$

So if the matrix A is given by

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix},$$

then

$$\det(A) = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}.$$

One slight problem with this notation is that the vertical lines look like absolute values, but determinants can be negative (or even non-real).

Let us now compute a 3 × 3 determinant. While you should memorize the formula for a 2 × 2 determinant, there is no need to memorize any further formulas (assuming you know how Laplace's expansion works). Expansion by the first row:

$$\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = a\begin{vmatrix} e & f \\ h & i \end{vmatrix} - b\begin{vmatrix} d & f \\ g & i \end{vmatrix} + c\begin{vmatrix} d & e \\ g & h \end{vmatrix}$$

To complete the job we need to compute the three 2 × 2 determinants. Here is how one can compute a concrete 4 × 4 determinant. I will expand first by the second row, because it has a 0 entry and that cuts by one the number of 3 × 3 determinants to compute; all 3 × 3 determinants are then expanded by the first row.


$$\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix} = (-2)\begin{vmatrix} 2 & 3 & 2 \\ -1 & 2 & -2 \\ 1 & 2 & 2 \end{vmatrix} + (-3)\begin{vmatrix} 1 & 3 & 2 \\ 1 & 2 & -2 \\ 1 & 2 & 2 \end{vmatrix} + 4\begin{vmatrix} 1 & 2 & 2 \\ 1 & -1 & -2 \\ 1 & 1 & 2 \end{vmatrix}$$

$$= (-2)\left(2\begin{vmatrix} 2 & -2 \\ 2 & 2 \end{vmatrix} - 3\begin{vmatrix} -1 & -2 \\ 1 & 2 \end{vmatrix} + 2\begin{vmatrix} -1 & 2 \\ 1 & 2 \end{vmatrix}\right) - 3\left(\begin{vmatrix} 2 & -2 \\ 2 & 2 \end{vmatrix} - 3\begin{vmatrix} 1 & -2 \\ 1 & 2 \end{vmatrix} + 2\begin{vmatrix} 1 & 2 \\ 1 & 2 \end{vmatrix}\right) + 4\left(\begin{vmatrix} -1 & -2 \\ 1 & 2 \end{vmatrix} - 2\begin{vmatrix} 1 & -2 \\ 1 & 2 \end{vmatrix} + 2\begin{vmatrix} 1 & -1 \\ 1 & 1 \end{vmatrix}\right)$$

$$= (-2)\bigl(2(4 - (-4)) - 3(-2 - (-2)) + 2(-2 - 2)\bigr) - 3\bigl((4 - (-4)) - 3(2 - (-2)) + 2(2 - 2)\bigr) + 4\bigl(((-2) - (-2)) - 2(2 - (-2)) + 2(1 - (-1))\bigr) = -20.$$

Laplace's method is useful for calculating the determinants of small matrices (up to 3 × 3, maybe 4 × 4), and for matrices that have a lot of zeros. But it is not a very efficient method. I will now list the basic properties of determinants. Some of these properties will allow us to calculate determinants more efficiently. Maybe. Some of these properties are easy to verify, others are not so easy.

D1. If M is an n × n matrix and its rows are linearly dependent (equivalently, one row is a linear combination of the other rows), then det(M) = 0. In particular this holds if M has a zero row, or two equal rows.

D2. If M is an n × n matrix, then det(M^T) = det(M). The determinant of a matrix equals the determinant of its transpose. Because of this, it turns out that one can compute the determinant of a matrix by expanding along a column rather than a row.

D3. If M is an n × n matrix and its columns are linearly dependent (equivalently, one column is a linear combination of the other columns), then det(M) = 0. In particular this holds if M has a zero column, or two equal columns. This property is, of course, an immediate consequence of properties D1 and D2.

D4. The effect of row operations. Let M be an n × n matrix. If the matrix N is obtained from M by

1. interchanging two rows, that is, applying I(i,j), i ≠ j, then det(N) = −det(M);

2. multiplying a row by a scalar c (operation IIc(i)), then det(N) = c det(M);

3. adding to a row i a row j times a scalar (operation III(i)+c(j), i ≠ j), then det(N) = det(M).

D5. If A, B are n × n matrices, then det(AB) = det(BA) = det(A) × det(B).
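Properties like D5 are easy to test numerically. A quick sanity check in Python with numpy (my choice of tooling, not part of the notes) on a pair of random 4 × 4 integer matrices:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 6, (4, 4))
    B = rng.integers(-5, 6, (4, 4))
    # D5: the determinant of a product is the product of the determinants.
    print(np.isclose(np.linalg.det(A @ B),
                     np.linalg.det(A) * np.linalg.det(B)))  # True

Floating-point determinants are only approximate, hence the np.isclose comparison rather than ==.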

With these properties we can compute determinants using row reduction. While computing 5 × 5 determinants is still a difficult thing to do without the aid of some calculating device (nice computer software, for example), it is a far better method than using the Laplace expansion. If programmed in some computer language, it is an algorithm that may be hard to beat for finding the determinant of medium sized matrices. To use it at maximum efficiency we need the following additional property, which is actually an easy consequence of the Laplace expansion. First a definition. A square matrix is said to be upper triangular if all entries below the main diagonal are 0. It is said to be lower triangular if all entries above the main diagonal are 0. The following matrix is upper triangular.

$$U = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1(n-1)} & a_{1n} \\ 0 & a_{22} & a_{23} & \cdots & a_{2(n-1)} & a_{2n} \\ 0 & 0 & a_{33} & \cdots & a_{3(n-1)} & a_{3n} \\ \vdots & \vdots & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & a_{nn} \end{pmatrix}$$


If you transpose it, it becomes lower triangular. The property in question is:

D6. The determinant of an upper or a lower triangular matrix equals the product of its diagonal entries.

Examples: Calculate the determinants of the following matrices:

$$\text{a)}\ A = \begin{pmatrix} 1 & -5 & 6 \\ 0 & 2 & 3 \\ 0 & 0 & 2 \end{pmatrix}, \quad \text{b)}\ B = \begin{pmatrix} 2 & 0 & 0 & 0 \\ -3 & 5 & 0 & 0 \\ 4 & 4 & -3 & 0 \\ 1 & 2 & 5 & 6 \end{pmatrix}, \quad \text{c)}\ C = \begin{pmatrix} 1 & -5 & 6 & 7 & 8 \\ 0 & 2 & 3 & 4 & 5 \\ 0 & 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 7 \end{pmatrix}.$$

Solution.

a) det(A) = 1 · 2 · 2 = 4, b) det(B) = 2 · 5 · (−3) · 6 = −180, c) det(C) = 1 · 2 · 0 · 2 · 7 = 0.

The idea now is to row reduce the matrix to upper (or lower, though upper is better) triangular form, keeping track of how the row reductions affect the determinant. If we interchange rows, we multiply the determinant by −1; if we multiply a row by a constant, we divide the determinant by that constant; if we do an operation of type III, the determinant stays the same. I'll illustrate this by computing again the determinant of the 4 × 4 matrix computed before. That is, I'll compute

$$\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix}.$$

First we work on the first column. We perform the operations III(2)−2(1), III(3)−(1), III(4)−(1) on the matrix; the determinant is unchanged. We get

$$\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix} = \begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & -7 & -10 & -4 \\ 0 & -3 & -1 & -4 \\ 0 & -1 & -1 & 0 \end{vmatrix}$$

We can now exchange rows 2 and 4; this changes the sign of the determinant, so to compensate we multiply the new determinant by −1 and get

$$\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix} = -\begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & -1 & -1 & 0 \\ 0 & -3 & -1 & -4 \\ 0 & -7 & -10 & -4 \end{vmatrix}$$

Next I multiply the second row by −1. To compensate I need to divide by −1, which of course is the same as changing the sign:

$$\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix} = \begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & -3 & -1 & -4 \\ 0 & -7 & -10 & -4 \end{vmatrix}$$

Next I perform operations III(3)+3(2), III(4)+7(2). This does not change the determinant.

$$\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix} = \begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 2 & -4 \\ 0 & 0 & -3 & -4 \end{vmatrix}$$

Trying to avoid working with fractions, or just for the fun of it, I'll multiply the 3rd row by 3 and the 4th row by 2. I need to compensate by dividing the determinant by 2 × 3 = 6:

$$\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix} = \frac{1}{6}\begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 6 & -12 \\ 0 & 0 & -6 & -8 \end{vmatrix}$$


Finally, I perform operation III(4)+(3), which does not change the determinant (and puts the matrix in upper triangular form) to get

$$\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix} = \frac{1}{6}\begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 6 & -12 \\ 0 & 0 & 0 & -20 \end{vmatrix} = \frac{1}{6}\,(1 \cdot 1 \cdot 6 \cdot (-20)) = -20.$$
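This bookkeeping is mechanical enough to automate. Below is a minimal Python sketch of the same idea (exact arithmetic via fractions, so no rounding; the function name is mine, not from the notes). It only uses operations of types I and III, flipping the sign on each interchange, and then multiplies the diagonal as in property D6:

    from fractions import Fraction

    def det_by_row_reduction(m):
        a = [[Fraction(x) for x in row] for row in m]
        n, sign = len(a), 1
        for col in range(n):
            # Find a row with a non-zero entry in this column (type I swap if needed).
            pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
            if pivot is None:
                return Fraction(0)          # no pivot: the determinant is 0
            if pivot != col:
                a[col], a[pivot] = a[pivot], a[col]
                sign = -sign                # a type I operation flips the sign
            for r in range(col + 1, n):     # type III operations leave det unchanged
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
        result = Fraction(sign)
        for i in range(n):                  # property D6: product of the diagonal
            result *= a[i][i]
        return result

    print(det_by_row_reduction([[1, 2, 3, 2], [2, -3, -4, 0],
                                [1, -1, 2, -2], [1, 1, 2, 2]]))  # -20 again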

What are determinants good for? Well, here is a first application.

Theorem 7 Let

$$\mathbf{v}_1 = \begin{pmatrix} v_{11} \\ v_{21} \\ \vdots \\ v_{n1} \end{pmatrix}, \quad \ldots, \quad \mathbf{v}_m = \begin{pmatrix} v_{1m} \\ v_{2m} \\ \vdots \\ v_{nm} \end{pmatrix}$$

be m vectors in R^n. They are linearly independent if and only if the matrix whose columns consist of the vectors, that is, the matrix

$$M = \begin{pmatrix} v_{11} & v_{12} & \cdots & v_{1m} \\ v_{21} & v_{22} & \cdots & v_{2m} \\ \vdots & \vdots & & \vdots \\ v_{n1} & v_{n2} & \cdots & v_{nm} \end{pmatrix}$$

has at least one m × m submatrix with a non-zero determinant. More generally, the dimension of the span of (v_1, . . . , v_m) is k if and only if M contains a k × k submatrix with non-zero determinant, and every (k + 1) × (k + 1) submatrix has 0 determinant.

I'll try to explain why this result must hold. First of all, by a submatrix of a matrix we mean either the matrix itself, or any matrix obtained by crossing out some rows and/or columns of the original matrix. For example, the 3 × 4 matrix

$$\begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & \ell \end{pmatrix}$$

has a total of 18 2 × 2 submatrices, of which a few are

$$\begin{pmatrix} a & b \\ e & f \end{pmatrix}, \quad \begin{pmatrix} a & c \\ e & g \end{pmatrix}, \quad \begin{pmatrix} b & d \\ j & \ell \end{pmatrix}, \quad \begin{pmatrix} g & h \\ k & \ell \end{pmatrix}.$$

Before I try to explain why this theorem is true (an explanation you may skip if you trust everything I tell you), I want to consider some cases. Suppose m > n. Then there is no way we can find an m × m submatrix with a non-zero determinant, for the simple reason that you can't find an m × m submatrix when you have fewer than m rows. This simply reflects the fact that the dimension of R^n is n, and if m > n, a set of m vectors cannot be independent. So we may restrict considerations to the case m ≤ n. Suppose m = n. Then the theorem tells us that the n vectors are linearly independent, hence a basis of R^n, if and only if the determinant of M (in this case the one and only n × n submatrix) is different from 0.

Here is a reason for the theorem. To simplify, I replace the matrix M by its transpose N; N = M^T, so N is m × n and the vectors are the rows of N. Because the determinant of a matrix and its transpose are the same, nothing much changes. Now suppose I row reduce N. A bit of reflection shows that in doing this, every row operation on N is at the same time a row operation on some of its submatrices; the submatrices are also being row operated on, perhaps reduced. The determinant can change under a row operation, but only by multiplication by a non-zero scalar. So, if we reduce N to RRE form, any submatrix of the original matrix N with a non-zero determinant will end up as a submatrix of the RRE form of N with non-zero determinant. And conversely, zero determinants stay zero determinants. As we learned before, the dimension of the subspace spanned by the vectors v_1, . . . , v_m is the number of non-zero rows of the RRE form. A bit of reflection shows that if this number is k, then the largest non-zero determinant you can get in the RRE form is the k × k determinant obtained by crossing out all zero rows, and all columns not containing a leading 1.
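Theorem 7 is a statement about determinants of submatrices, so it can be checked by brute force. The sketch below (Python; both function names are mine, and the search is exponential in the matrix size, so it is for illustration only) looks for the largest k with a k × k submatrix of non-zero determinant:

    from itertools import combinations

    def det(m):
        # Laplace expansion along the first row; fine for small minors.
        if len(m) == 1:
            return m[0][0]
        return sum((-1) ** j * m[0][j] * det([r[:j] + r[j+1:] for r in m[1:]])
                   for j in range(len(m)))

    def dimension_by_minors(m):
        # Largest k such that some k x k submatrix has a non-zero determinant.
        nrows, ncols = len(m), len(m[0])
        for k in range(min(nrows, ncols), 0, -1):
            for rows in combinations(range(nrows), k):
                for cols in combinations(range(ncols), k):
                    if det([[m[i][j] for j in cols] for i in rows]) != 0:
                        return k
        return 0

    # Two independent columns plus their sum: the dimension should be 2.
    print(dimension_by_minors([[1, 0, 1], [0, 1, 1], [0, 0, 0]]))  # 2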

A very important application is the following.

Theorem 8 Let A be a square n × n matrix. Then A is invertible if and only if det(A) ≠ 0.

I will try to at least give a reason why this theorem holds. Warning: Some of my arguments could be circular! But circles are, after all, consistent. Let us suppose first that A is a square invertible matrix. If we want to solve any system of linear equations having A as its matrix, a system of the form

$$A\mathbf{x} = \mathbf{b},$$

there is a theoretically simple way of doing it. I say "theoretically simple" because it isn't the best to actually use in practice. The method is to multiply on the left by the inverse. Well, let's be precise. Assume first there is a solution x. Then, because Ax = b, we have A^{-1}(Ax) = A^{-1}b, (A^{-1}A)x = A^{-1}b, Ix = A^{-1}b, x = A^{-1}b. In other words, the only possible solution is x = A^{-1}b. Conversely, if we take x = A^{-1}b, then we verify at once that Ax = b. That is:

If A is an invertible n × n matrix, then the system of linear equations Ax = b has a unique solution for every b ∈ R^n.

Fine. Now suppose A is a square n × n matrix such that the equation Ax = b has a unique solution for every b ∈ R^n. With e_1, . . . , e_n being the canonical basis of R^n, we solve the equations

$$A\mathbf{x}^{(1)} = \mathbf{e}_1, \quad A\mathbf{x}^{(2)} = \mathbf{e}_2, \quad \ldots, \quad A\mathbf{x}^{(n)} = \mathbf{e}_n.$$

If these solutions are

$$\mathbf{x}^{(1)} = \begin{pmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{n1} \end{pmatrix}, \quad \ldots, \quad \mathbf{x}^{(n)} = \begin{pmatrix} x_{1n} \\ x_{2n} \\ \vdots \\ x_{nn} \end{pmatrix}$$

and we use them as columns of a matrix X; that is,

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nn} \end{pmatrix},$$

Then we see at once that AX = I, showing A is invertible. So we also have

If A is a square n × n matrix such that for every b ∈ R^n the system of equations Ax = b has a solution, then A is invertible. And the solution is unique.
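This argument is also an algorithm: solving the n systems Ax^{(i)} = e_i produces the columns of the inverse. A minimal Python/numpy sketch (my tooling choice, not part of the notes), using the 3 × 3 matrix that is inverted at the end of this section:

    import numpy as np

    A = np.array([[1., 2., 2.],
                  [2., 3., 1.],
                  [1., 0., 1.]])
    n = A.shape[0]
    # Solve A x(i) = e_i for each canonical basis vector and use the
    # solutions as the columns of X; then AX = I, so X is the inverse.
    X = np.column_stack([np.linalg.solve(A, e) for e in np.eye(n)])
    print(np.allclose(A @ X, np.eye(n)))  # True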

Putting it all together we see that a square n × n matrix is invertible if and only if the systems Ax = b have a unique solution for every choice of b ∈ R^n. Well, if we recall Theorem 1, we see that this is equivalent to the RRE form of the matrix A being the identity matrix. Let's state this as a theorem.

Theorem 9 A square n × n matrix A is invertible if and only if its RRE form is the n × n identity matrix.

And now we can explain why Theorem 8 must be true. When we row reduce a matrix, the determinant might change sign (if we interchange rows) or get multiplied by a constant (if we multiply a row by one). But if it starts out different from 0, it will end different from 0. And, of course, vice versa. If A is invertible, its RRE form is I, and det(I) = 1 ≠ 0, thus det(A) ≠ 0. On the other hand, for a square matrix, the only way one can avoid getting I as the RRE form is to run into a zero row or a zero column along the way. In either case, the determinant is 0. Thus the only square matrix in RRE form with non-zero determinant is I. So if det(A) ≠ 0, then because the determinant of its RRE form also must be ≠ 0, the RRE form must be I and A is invertible.

We conclude this section with a new method for inverting matrices. It is probably the best method to use for 2 × 2 matrices, acceptable for 3 × 3 matrices, not so good for higher orders. To explain it, and also to write out in a more precise way the computation of determinants by Laplace expansion, it is convenient to develop some additional jargon.


Let A = (a_{ij})_{1≤i,j≤n} be a square n × n matrix (n ≥ 2). If (i, j) is a position in the matrix, I will denote by μ_{ij}(A) the (n − 1) × (n − 1) matrix obtained from A by eliminating the i-th row and the j-th column. An n × n matrix has n² such submatrices. For example, if

$$A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix},$$

then these nine matrices are

$$\mu_{11}(A) = \begin{pmatrix} e & f \\ h & i \end{pmatrix}, \quad \mu_{12}(A) = \begin{pmatrix} d & f \\ g & i \end{pmatrix}, \quad \mu_{13}(A) = \begin{pmatrix} d & e \\ g & h \end{pmatrix},$$

$$\mu_{21}(A) = \begin{pmatrix} b & c \\ h & i \end{pmatrix}, \quad \mu_{22}(A) = \begin{pmatrix} a & c \\ g & i \end{pmatrix}, \quad \mu_{23}(A) = \begin{pmatrix} a & b \\ g & h \end{pmatrix},$$

$$\mu_{31}(A) = \begin{pmatrix} b & c \\ e & f \end{pmatrix}, \quad \mu_{32}(A) = \begin{pmatrix} a & c \\ d & f \end{pmatrix}, \quad \mu_{33}(A) = \begin{pmatrix} a & b \\ d & e \end{pmatrix}.$$

The (i, j)-th minor of the matrix A is defined to be the determinant of μ_{ij}(A). The (i, j)-th cofactor is the same as the minor if (i, j) is a positive position, minus the minor otherwise. I will denote the (i, j)-th cofactor of A by c_{ij}(A); a convenient way of writing it is

$$c_{ij}(A) = (-1)^{i+j}\det(\mu_{ij}(A)).$$

The factor (−1)^{i+j} is 1 precisely if (i, j) is a positive position, −1 otherwise.

The adjunct matrix A† is defined as the transpose of the matrix of cofactors:

$$A^\dagger = (c_{ij}(A))^T.$$

This matrix is interesting because of the following result:

Theorem 10 Let A be a square matrix. Then

$$AA^\dagger = A^\dagger A = (\det(A))\,I.$$

In particular, if det A = 0 then AA† is the zero matrix; not a very interesting observation. What is more interesting is that if det A ≠ 0, we can divide by the determinant of A and get

$$A\left(\frac{1}{\det A}A^\dagger\right) = \left(\frac{1}{\det A}A^\dagger\right)A = I.$$

This provides an alternative reason for why Theorem 8 is true and shows:

Theorem 11 If det(A) ≠ 0, then

$$A^{-1} = \frac{1}{\det A}A^\dagger.$$

Let us use this method to try to find again the inverse of the matrix

$$A = \begin{pmatrix} 1 & 2 & 2 \\ 2 & 3 & 1 \\ 1 & 0 & 1 \end{pmatrix}.$$

First, we compute the determinant. No point in doing anything else if the determinant is 0. Expanding by the last row:

$$\det(A) = \begin{vmatrix} 1 & 2 & 2 \\ 2 & 3 & 1 \\ 1 & 0 & 1 \end{vmatrix} = \begin{vmatrix} 2 & 2 \\ 3 & 1 \end{vmatrix} + \begin{vmatrix} 1 & 2 \\ 2 & 3 \end{vmatrix} = (2 - 6) + (3 - 4) = -5 \neq 0.$$

The matrix is invertible. Next we compute the 9 cofactors.

$$c_{11}(A) = +\begin{vmatrix} 3 & 1 \\ 0 & 1 \end{vmatrix} = 3, \quad c_{12}(A) = -\begin{vmatrix} 2 & 1 \\ 1 & 1 \end{vmatrix} = -1, \quad c_{13}(A) = +\begin{vmatrix} 2 & 3 \\ 1 & 0 \end{vmatrix} = -3,$$

$$c_{21}(A) = -\begin{vmatrix} 2 & 2 \\ 0 & 1 \end{vmatrix} = -2, \quad c_{22}(A) = +\begin{vmatrix} 1 & 2 \\ 1 & 1 \end{vmatrix} = -1, \quad c_{23}(A) = -\begin{vmatrix} 1 & 2 \\ 1 & 0 \end{vmatrix} = 2,$$

$$c_{31}(A) = +\begin{vmatrix} 2 & 2 \\ 3 & 1 \end{vmatrix} = -4, \quad c_{32}(A) = -\begin{vmatrix} 1 & 2 \\ 2 & 1 \end{vmatrix} = 3, \quad c_{33}(A) = +\begin{vmatrix} 1 & 2 \\ 2 & 3 \end{vmatrix} = -1.$$

The cofactor matrix is

$$\begin{pmatrix} 3 & -1 & -3 \\ -2 & -1 & 2 \\ -4 & 3 & -1 \end{pmatrix}.$$

Transposing we get the adjunct:

$$A^\dagger = \begin{pmatrix} 3 & -2 & -4 \\ -1 & -1 & 3 \\ -3 & 2 & -1 \end{pmatrix}.$$

The inverse is

$$A^{-1} = -\frac{1}{5}\begin{pmatrix} 3 & -2 & -4 \\ -1 & -1 & 3 \\ -3 & 2 & -1 \end{pmatrix} = \begin{pmatrix} -\frac{3}{5} & \frac{2}{5} & \frac{4}{5} \\ \frac{1}{5} & \frac{1}{5} & -\frac{3}{5} \\ \frac{3}{5} & -\frac{2}{5} & \frac{1}{5} \end{pmatrix}.$$
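The whole procedure (minors, cofactors, transpose, divide by the determinant) is easy to script. A minimal Python sketch with exact fractions (the helper names det, mu, and inverse_by_adjunct are mine, not from the notes):

    from fractions import Fraction

    def det(m):
        # Laplace expansion along the first row.
        if len(m) == 1:
            return m[0][0]
        return sum((-1) ** j * m[0][j] * det([r[:j] + r[j+1:] for r in m[1:]])
                   for j in range(len(m)))

    def mu(m, i, j):
        # The submatrix mu_ij: m with row i and column j crossed out.
        return [row[:j] + row[j+1:] for k, row in enumerate(m) if k != i]

    def inverse_by_adjunct(m):
        n, d = len(m), det(m)
        if d == 0:
            raise ValueError("matrix is not invertible")
        cof = [[(-1) ** (i + j) * det(mu(m, i, j)) for j in range(n)]
               for i in range(n)]
        # Transpose the cofactor matrix and divide by the determinant.
        return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]

    for row in inverse_by_adjunct([[1, 2, 2], [2, 3, 1], [1, 0, 1]]):
        print(row)   # reproduces the inverse computed above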

Notes. Definitions of determinants vary. Of course, the end product is always the same! At an elementary level, a common approach is to define the determinant of a square matrix recursively as the scalar you obtain when doing the Laplace expansion by the first row. Using our notation for submatrices this definition would look somewhat like this: Let A = (a_{ij})_{1≤i,j≤n} be a square n × n matrix, say n ≥ 2 (one can start, as we did, from n = 1, but most texts will begin with n = 2).

1.) If n = 2, then det(A) = a_{11}a_{22} − a_{12}a_{21}. (Recursion base.)

2.) If n ≥ 3, define

$$\det(A) = \sum_{j=1}^{n} (-1)^{1+j} a_{1j}\det(\mu_{1j}(A)),$$

where (as above) μ_{1j}(A) is the matrix obtained from A by crossing out the first row and the j-th column. (Reduction of the n case to the n − 1 case.)

The problem with this as a definition, rather than a theorem, is that it is quite hard to use it to verify any properties of the determinant. In particular, it isn't easy to show, beginning with this definition, that you can actually expand using any row, or even any column, not just the first row. I won't go into this any further, except to write out now the Laplace expansion formulas for the determinant, since so far I only gave them in words.

It holds, for every i, 1 ≤ i ≤ n, that

$$\det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij}\det(\mu_{ij}(A)).$$

This is the expansion by rows. A similar result holds for columns. For each j such that 1 ≤ j ≤ n,

$$\det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{ij}\det(\mu_{ij}(A)).$$

Laplace vs. Row Reduction. To compute a 2 × 2 determinant one has to perform 3 operations: 2 products and a difference. Using Laplace, computing a 3 × 3 determinant reduces to computing three 2 × 2 determinants; a 4 × 4 determinant "reduces" to computing four 3 × 3 determinants, thus 4 × 3 = 12 two-by-two determinants. And so forth. For an n × n matrix, there will be approximately n! = n(n − 1) · · · 3 · 2 operations involved in computing a determinant by Laplace's method. This is computationally very, very bad. It's OK for small matrices, but as the matrices get larger, it becomes hopeless. Of course, if the matrix is what is called a sparse matrix, that is, one with a lot of zero entries, then it could be a reasonable method.

Consider now row reducing the matrix to triangular form. To get the first column to have all entries under the first one equal to 0, you need to (perhaps) pivot, i.e., get a non-zero entry into position (1, 1); let's call this one operation, though it hardly takes time on a computer. Then divide each entry of the first row by the entry in position (1, 1) (n operations), and then, for each row below the first, multiply the first row by the first entry of the row in question and subtract it from that row (n operations per row). That is, we have a maximum of 1 + n + (n − 1)n = n² + 1 operations. We have to repeat similar operations for the other columns; to simplify, let's say we go all the way to RRE form, so that we have n² + 1 operations per column. That is a total of n(n² + 1) < (n + 1)³ operations. Now, for small values of n, there is little difference. For example, if n = 4, then 4! = 24 and (4 + 1)³ = 125. Row reduction seems worse (it actually isn't). But suppose n = 10. Then

$$10! = 3{,}628{,}800, \qquad 11^3 = 1331.$$

Row reduction is a polynomial-time algorithm, while Laplace is super-exponential.
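The gap explodes quickly; here is a three-line look at the two growth rates (Python, my tooling choice):

    from math import factorial

    for n in (4, 10, 15, 20):
        print(n, factorial(n), (n + 1) ** 3)  # n! dwarfs (n+1)^3 almost immediately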

Cramer's rule. The expression of the inverse of a matrix in terms of the adjunct matrix gives rise to a very popular method, called Cramer's rule, for solving linear systems of equations in which the system matrix is square (as many equations as unknowns); it is so named for Gabriel Cramer, an 18th-century Swiss mathematician who supposedly first stated it. In words, it states that the solution

$$\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$$

of the n × n system Ax = b can be found as follows. The component x_j equals the determinant of the matrix obtained from A by replacing the j-th column by b, divided by the determinant of A. The advantage of this method, which only works if det A ≠ 0, is that you can compute one component of the solution without having to compute the others. The disadvantage is that a computer might require more time to find one component by this method than to find all of them by row reduction.
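A minimal sketch of one component via Cramer's rule (Python with exact fractions; the right-hand side b and all names here are my own illustration, not from the notes):

    from fractions import Fraction

    def det(m):
        # Laplace expansion along the first row.
        if len(m) == 1:
            return m[0][0]
        return sum((-1) ** j * m[0][j] * det([r[:j] + r[j+1:] for r in m[1:]])
                   for j in range(len(m)))

    def cramer_component(a, b, j):
        # x_j = det(A with column j replaced by b) / det(A); needs det(A) != 0.
        replaced = [row[:j] + [b[i]] + row[j+1:] for i, row in enumerate(a)]
        return Fraction(det(replaced), det(a))

    A = [[1, 2, 2], [2, 3, 1], [1, 0, 1]]   # the matrix inverted above
    b = [1, 0, 0]                           # an arbitrary right-hand side
    print(cramer_component(A, b, 0))        # x_1 = -3/5, no other component needed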

8.1 Exercises

To come.