
Linear Algebra (part 2): Vector Spaces and Linear Transformations (by Evan Dummit, 2012, v. 1.00)

    Contents

1 Vector Spaces and Linear Transformations

    1.1 Review of Vectors in Rn

    1.2 Formal Definition of Vector Spaces

    1.3 Subspaces

    1.4 Span, Independence, Bases, Dimension

        1.4.1 Linear Combinations and Span

        1.4.2 Linear Independence

        1.4.3 Bases and Dimension

    1.5 Linear Transformations

        1.5.1 Kernel and Image

        1.5.2 The Derivative as a Linear Transformation

    1 Vector Spaces and Linear Transformations

    1.1 Review of Vectors in Rn

A vector, as we typically think of it, is a quantity which has both a magnitude and a direction. This is in contrast to a scalar, which carries only a magnitude.

Real-valued vectors are extremely useful in just about every aspect of the physical sciences, since just about everything in Newtonian physics is a vector: position, velocity, acceleration, forces, etc. There is also vector calculus (namely, calculus in the context of vector fields), which is typically part of a multivariable calculus course; it has many applications to physics as well.

    We often think of vectors geometrically, as a directed line segment (having a starting point and an endpoint).

We denote the n-dimensional vector from the origin to the point (a1, a2, ..., an) as ~v = ⟨a1, a2, ..., an⟩, where the ai are scalars.

Some vectors: ⟨1, 2⟩, ⟨3, 5, -1⟩, ⟨-π, e^2, 27, 3, 4/3, 0, 0, -1⟩.

Notation: I prefer to use angle brackets ⟨·⟩ rather than parentheses (·), so as to draw a visual distinction between a vector and the coordinates of a point in space. I also draw arrows above vectors or typeset them in boldface (thus ~v or v), in order to set them apart from scalars. This is not standard notation everywhere; many other authors use regular parentheses for vectors.

Note/Warning: Vectors are a little bit different from directed line segments, because we don't care where a vector starts: we only care about the difference between the starting and ending positions. Thus: the directed segment whose start is (0, 0) and end is (1, 1) and the segment starting at (1, 1) and ending at (2, 2) represent the same vector, ⟨1, 1⟩.

We can add vectors (provided they are of the same length!) in the obvious way, one component at a time: if ~v = ⟨a1, ..., an⟩ and ~w = ⟨b1, ..., bn⟩ then ~v + ~w = ⟨a1 + b1, ..., an + bn⟩.

We can justify this using our geometric idea of what a vector does: ~v moves us from the origin to the point (a1, ..., an). Then ~w tells us to add ⟨b1, ..., bn⟩ to the coordinates of our current position, and so ~w moves us from (a1, ..., an) to (a1 + b1, ..., an + bn). So the net result is that the sum vector ~v + ~w moves us from the origin to (a1 + b1, ..., an + bn), meaning that it is just the vector ⟨a1 + b1, ..., an + bn⟩.

Another way (though it's really the same way) to think of vector addition is via the parallelogram diagram, whose pairs of parallel sides are ~v and ~w, and whose long diagonal is ~v + ~w.

We can also 'scale' a vector by a scalar, one component at a time: if r is a scalar, then we have r · ~v = ⟨ra1, ..., ran⟩.

Again, we can justify this by our geometric idea of what a vector does: if ~v moves us some amount in a direction, then (1/2)~v should move us half as far in that direction. Analogously, 2~v should move us twice as far in that direction, and -~v should move us exactly as far, but in the opposite direction.

Example: If ~v = ⟨-1, 2, 2⟩ and ~w = ⟨3, 0, -4⟩ then 2~w = ⟨6, 0, -8⟩, and ~v + ~w = ⟨2, 2, -2⟩. Furthermore, ~v - 2~w = ⟨-7, 2, 10⟩.
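As a quick numerical illustration (a sketch using NumPy arrays to stand in for vectors in R3; this code is not part of the original handout), the arithmetic of this example can be reproduced directly:

```python
# Minimal sketch: componentwise vector arithmetic with NumPy arrays.
import numpy as np

v = np.array([-1, 2, 2])
w = np.array([3, 0, -4])

print(2 * w)      # [ 6  0 -8]
print(v + w)      # [ 2  2 -2]
print(v - 2 * w)  # [-7  2 10]
```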

The arithmetic of vectors in Rn satisfies several algebraic properties (which follow more or less directly from the definition):

Addition of vectors is commutative and associative. There is a zero vector (namely, the vector with all entries zero), and every vector has an additive inverse.

    Scalar multiplication distributes over addition of both vectors and scalars.

1.2 Formal Definition of Vector Spaces

The two operations of addition and scalar multiplication (and the various algebraic properties they satisfy) are the key properties of vectors in Rn. We would like to investigate other collections of things which possess the same properties as vectors in Rn.

Definition: A (real) vector space is a collection V of vectors together with two binary operations, addition of vectors (+) and scalar multiplication of a vector by a real number (·), satisfying the following axioms:

Let ~v, ~v1, ~v2, ~v3 be any vectors and α, α1, α2 be any (real number) scalars.

Note: The statement that + and · are binary operations means that ~v1 + ~v2 and α · ~v are always defined.

[A1] Addition is commutative: ~v1 + ~v2 = ~v2 + ~v1.
[A2] Addition is associative: (~v1 + ~v2) + ~v3 = ~v1 + (~v2 + ~v3).
[A3] There exists a zero vector ~0, with ~v + ~0 = ~v.
[A4] Every vector ~v has an additive inverse -~v, with ~v + (-~v) = ~0.
[M1] Scalar multiplication is consistent with regular multiplication: α1 · (α2 · ~v) = (α1α2) · ~v.
[M2] Addition of scalars distributes: (α1 + α2) · ~v = α1 · ~v + α2 · ~v.
[M3] Addition of vectors distributes: α · (~v1 + ~v2) = α · ~v1 + α · ~v2.
[M4] The scalar 1 acts like the identity on vectors: 1 · ~v1 = ~v1.

Important Remark: One may also consider vector spaces where the collection of scalars is something other than the real numbers: for example, there exists an equally important notion of a complex vector space, whose scalars are the complex numbers. (The axioms are the same.)

We will principally consider real vector spaces, in which the scalars are the real numbers. The most general notion of a vector space involves scalars from a field, which is a collection of numbers which possess addition and multiplication operations which are commutative, associative, and distributive, with an additive identity 0 and multiplicative identity 1, such that every element has an additive inverse and every nonzero element has a multiplicative inverse.

Aside from the real and complex numbers, another example of a field is the rational numbers (i.e., fractions). One can formulate an equally interesting theory of vector spaces over the rational numbers.

    Examples: Here are some examples of vector spaces:

The vectors in Rn are a vector space, for any n > 0. (This had better be true!)

In particular, if we take n = 1, then we see that the real numbers themselves are a vector space.

Note: For simplicity I will demonstrate all of the axioms for vectors in R2; there, the vectors are of the form ⟨x, y⟩ and scalar multiplication is defined as α · ⟨x, y⟩ = ⟨αx, αy⟩.

[A1]: We have ⟨x1, y1⟩ + ⟨x2, y2⟩ = ⟨x1 + x2, y1 + y2⟩ = ⟨x2, y2⟩ + ⟨x1, y1⟩.
[A2]: We have (⟨x1, y1⟩ + ⟨x2, y2⟩) + ⟨x3, y3⟩ = ⟨x1 + x2 + x3, y1 + y2 + y3⟩ = ⟨x1, y1⟩ + (⟨x2, y2⟩ + ⟨x3, y3⟩).
[A3]: The zero vector is ⟨0, 0⟩, and clearly ⟨x, y⟩ + ⟨0, 0⟩ = ⟨x, y⟩.
[A4]: The additive inverse of ⟨x, y⟩ is ⟨-x, -y⟩, since ⟨x, y⟩ + ⟨-x, -y⟩ = ⟨0, 0⟩.
[M1]: We have α1 · (α2 · ⟨x, y⟩) = ⟨α1α2x, α1α2y⟩ = (α1α2) · ⟨x, y⟩.
[M2]: We have (α1 + α2) · ⟨x, y⟩ = ⟨(α1 + α2)x, (α1 + α2)y⟩ = α1 · ⟨x, y⟩ + α2 · ⟨x, y⟩.
[M3]: We have α · (⟨x1, y1⟩ + ⟨x2, y2⟩) = ⟨α(x1 + x2), α(y1 + y2)⟩ = α · ⟨x1, y1⟩ + α · ⟨x2, y2⟩.
[M4]: Finally, we have 1 · ⟨x, y⟩ = ⟨x, y⟩.

The zero space with a single element ~0, with ~0 + ~0 = ~0 and α · ~0 = ~0 for every α, is a vector space. All of the axioms in this case eventually boil down to ~0 = ~0. This space is rather boring: since it only contains one element, there's really not much to say about it.

The set of m × n matrices, for any m and any n, forms a vector space. The various algebraic properties we know about matrix addition give [A1] and [A2] along with [M1], [M2], [M3], and [M4].

The zero vector in this vector space is the zero matrix (with all entries zero), and [A3] and [A4] follow easily.

Note of course that in some cases we can also multiply matrices by other matrices. However, the requirements for being a vector space don't care that we can multiply matrices by other matrices! (All we need to be able to do is add them and multiply them by scalars.)

The complex numbers a + bi, where i^2 = -1, are a vector space. The axioms all follow from the standard properties of complex numbers. As might be expected, the zero vector is just the complex number 0 = 0 + 0i.

Again, note that the complex numbers have more structure to them, because we can also multiply two complex numbers, and the multiplication is also commutative, associative, and distributive over addition. However, the requirements for being a vector space don't care that the complex numbers have these additional properties.

The collection of all real-valued functions on any part of the real line is a vector space, where we define the sum of two functions as (f + g)(x) = f(x) + g(x) for every x, and scalar multiplication as (α · f)(x) = α f(x). To illustrate: if f(x) = x and g(x) = x^2, then f + g is the function with (f + g)(x) = x + x^2, and 2f is the function with (2f)(x) = 2x.

The axioms follow from the properties of functions and real numbers. The zero vector in this space is the zero function; namely, the function z which has z(x) = 0 for every x.

For example (just to demonstrate a few of the axioms), for any value x in [a, b] and any functions f and g, we have

[A1]: (f + g)(x) = f(x) + g(x) = g(x) + f(x) = (g + f)(x).
[M3]: (α · (f + g))(x) = α f(x) + α g(x) = (α · f)(x) + (α · g)(x).
[M4]: (1 · f)(x) = f(x).
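To make these function-space operations concrete, here is a small Python sketch (not part of the handout; it simply models functions as ordinary callables) implementing pointwise addition and scalar multiplication:

```python
# Sketch: the "vectors" are Python callables; addition and scalar
# multiplication are defined pointwise, exactly as in the definition above.
def add(f, g):
    return lambda x: f(x) + g(x)

def scale(c, f):
    return lambda x: c * f(x)

f = lambda x: x        # f(x) = x
g = lambda x: x ** 2   # g(x) = x^2

h = add(f, g)          # (f + g)(x) = x + x^2
k = scale(2, f)        # (2f)(x) = 2x
print(h(3), k(3))      # 12 6
```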

There are many simple algebraic properties that can be derived from the axioms (and thus, are true in every vector space), using some amount of cleverness. For example:

1. Addition has a cancellation law: for any vector ~v, if ~v + ~a = ~v + ~b then ~a = ~b.

Idea: Add -~v to both sides and then use [A1]-[A4] to rearrange (~v + ~a) + (-~v) = (~v + ~b) + (-~v) to ~a = ~b.

    2. The zero vector is unique: for any vector ~v, if ~v + ~a = ~v, then ~a = ~0.

Idea: Use property (1) applied when ~b = ~0.

3. The additive inverse is unique: for any vector ~v, if ~v + ~a = ~0 then ~a = -~v.

Idea: Use property (1) applied when ~b = -~v.

4. The scalar 0 times any vector gives the zero vector: 0 · ~v = ~0 for any vector ~v.

Idea: Expand ~v = (1 + 0) · ~v = ~v + 0 · ~v via [M2] and then apply property (2).

5. Any scalar times the zero vector is the zero vector: α · ~0 = ~0 for any scalar α.

Idea: Expand α · ~0 = α · (~0 + ~0) = α · ~0 + α · ~0 via [M3] and then apply property (1).

6. The scalar -1 times any vector gives the additive inverse: (-1) · ~v = -~v for any vector ~v.

Idea: Use property (4) and [M2]-[M4] to write ~0 = 0 · ~v = (1 + (-1)) · ~v = ~v + (-1) · ~v, and then use property (1) with ~a = (-1) · ~v and ~b = -~v.

7. The additive inverse of the additive inverse is the original vector: -(-~v) = ~v for any vector ~v.

Idea: Use property (6) and [M1], [M4] to write -(-~v) = (-1)^2 · ~v = 1 · ~v = ~v.

    1.3 Subspaces

Definition: A subspace W of a vector space V is a subset of the vector space V which, under the same addition and scalar multiplication operations as V, is itself a vector space.

Very often, if we want to check that something is a vector space, it is much easier to verify that it is a subspace of something else we already know is a vector space.

We will make use of this idea when we talk about the solutions to a homogeneous linear differential equation (see the examples below), and prove that the solutions form a vector space merely by checking that they are a subspace of the set of all functions, rather than going through all of the axioms.

    We are aided by the following criterion, which tells us exactly what properties a subspace must satisfy:

Theorem (Subspace Criterion): To check that W is a subspace of V, it is enough to check the following three properties:

[S1] W contains the zero vector of V.
[S2] W is closed under addition: For any ~w1 and ~w2 in W, the vector ~w1 + ~w2 is also in W.
[S3] W is closed under scalar multiplication: For any scalar α and any ~w in W, the vector α · ~w is also in W.

The reason we don't need to check everything to verify that a collection of vectors forms a subspace is that most of the axioms will automatically be satisfied in W because they're true in V.

As long as all of the operations are defined, axioms [A1]-[A2] and [M1]-[M4] will hold in W because they hold in V. But we need to make sure we can always add and scalar-multiply, which is why we need [S2] and [S3].

In order to get axiom [A3] for W, we need to know that the zero vector is in W, which is why we need [S1].

In order to get axiom [A4] for W, we can use the result that (-1) · ~w = -~w to see that closure under scalar multiplication automatically gives additive inverses.

Remark: Any vector space automatically has two easy subspaces: the entire space V, and the trivial subspace consisting only of the zero vector.

Examples: Here is a rather long list of examples of less trivial subspaces (of vector spaces which are of interest to us):

The vectors of the form ⟨t, t, t⟩ are a subspace of R3. [This is the line x = y = z.]

[S1]: The zero vector is of this form: take t = 0.

[S2]: We have ⟨t1, t1, t1⟩ + ⟨t2, t2, t2⟩ = ⟨t1 + t2, t1 + t2, t1 + t2⟩, which is again of the same form if we take t = t1 + t2.

[S3]: We have α · ⟨t1, t1, t1⟩ = ⟨αt1, αt1, αt1⟩, which is again of the same form if we take t = αt1.

The vectors of the form ⟨s, t, 0⟩ are a subspace of R3. [This is the xy-plane, aka the plane z = 0.]

[S1]: The zero vector is of this form: take s = t = 0.

[S2]: We have ⟨s1, t1, 0⟩ + ⟨s2, t2, 0⟩ = ⟨s1 + s2, t1 + t2, 0⟩, which is again of the same form, if we take s = s1 + s2 and t = t1 + t2.

[S3]: We have α · ⟨s1, t1, 0⟩ = ⟨αs1, αt1, 0⟩, which is again of the same form, if we take s = αs1 and t = αt1.

The vectors ⟨x, y, z⟩ with 2x - y + z = 0 are a subspace of R3.

[S1]: The zero vector is of this form, since 2(0) - 0 + 0 = 0.

[S2]: If ⟨x1, y1, z1⟩ and ⟨x2, y2, z2⟩ have 2x1 - y1 + z1 = 0 and 2x2 - y2 + z2 = 0, then adding the equations shows that the sum ⟨x1 + x2, y1 + y2, z1 + z2⟩ also lies in the space.

[S3]: If ⟨x1, y1, z1⟩ has 2x1 - y1 + z1 = 0, then scaling the equation by α shows that ⟨αx1, αy1, αz1⟩ also lies in the space.

More generally, the collection of solution vectors ⟨x1, ..., xn⟩ to any homogeneous equation, or system of m homogeneous equations, forms a subspace of Rn.

It is possible to check this directly by working with the equations. But it is much easier to use matrices: write the system in matrix form, as A~x = ~0, where ~x = ⟨x1, ..., xn⟩ is a solution vector.

[S1]: We have A~0 = ~0, by the properties of the zero vector.

[S2]: If ~x and ~y are two solutions, the properties of matrix arithmetic imply A(~x + ~y) = A~x + A~y = ~0 + ~0 = ~0, so that ~x + ~y is also a solution.

[S3]: If α is a scalar and ~x is a solution, then A(α · ~x) = α(A~x) = α · ~0 = ~0, so that α · ~x is also a solution.
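As an illustration of this argument (a sketch, not part of the handout; it reuses the single equation 2x - y + z = 0 from the earlier example), SymPy can produce solution vectors of A~x = ~0 and confirm closure under addition and scaling:

```python
# Sketch: solutions of a homogeneous system are closed under + and scaling.
import sympy as sp

A = sp.Matrix([[2, -1, 1]])            # the single equation 2x - y + z = 0
basis = A.nullspace()                  # basis vectors of the solution space
x, y = basis[0], basis[1]

assert A * (x + y) == sp.zeros(1, 1)   # sum of two solutions is a solution
assert A * (5 * x) == sp.zeros(1, 1)   # scalar multiple is a solution
```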

The collection of 2 × 2 matrices of the form
    [ a  b ]
    [ 0  a ]
is a subspace of the space of all 2 × 2 matrices.

[S1]: The zero matrix is of this form, with a = b = 0.

[S2]: We have
    [ a1  b1 ]   [ a2  b2 ]   [ a1 + a2  b1 + b2 ]
    [ 0   a1 ] + [ 0   a2 ] = [ 0        a1 + a2 ],
which is also of this form.

[S3]: We have
        [ a1  b1 ]   [ αa1  αb1 ]
    α · [ 0   a1 ] = [ 0    αa1 ],
which is also of this form.

The collection of complex numbers of the form a + 2ai is a subspace of the complex numbers. The three requirements should be second nature by now!

The collection of continuous functions on [a, b] is a subspace of the space of all functions on [a, b]. [S1]: The zero function is continuous. [S2]: The sum of two continuous functions is continuous, from basic calculus. [S3]: The product of continuous functions is continuous, so in particular a constant times a continuous function is continuous.

The collection of n-times differentiable functions on [a, b] is a subspace of the space of continuous functions on [a, b], for any positive integer n.

The zero function is differentiable, as are the sum and product of any two functions which are differentiable n times.

    The collection of all polynomials is a vector space.

Observe that polynomials are functions on the entire real line. Therefore, it is sufficient to verify the subspace criteria.

The zero function is a polynomial, as is the sum of two polynomials, and any scalar multiple of a polynomial.

The collection of solutions to the (homogeneous, linear) differential equation y'' + 6y' + 5y = 0 forms a vector space.

We show this by verifying that the solutions form a subspace of the space of all functions.

[S1]: The zero function is a solution.

[S2]: If y1 and y2 are solutions, then y1'' + 6y1' + 5y1 = 0 and y2'' + 6y2' + 5y2 = 0, so adding and using properties of derivatives shows that (y1 + y2)'' + 6(y1 + y2)' + 5(y1 + y2) = 0, so y1 + y2 is also a solution.

[S3]: If α is a scalar and y1 is a solution, then scaling y1'' + 6y1' + 5y1 = 0 by α and using properties of derivatives shows that (αy1)'' + 6(αy1)' + 5(αy1) = 0, so αy1 is also a solution.

Note: Observe that we can say something about what the set of solutions to this equation looks like, namely that it is a vector space, without actually solving it!

For completeness, the solutions are y = Ae^(-x) + Be^(-5x) for any constants A and B. From here, if we wanted to, we could directly verify that such functions form a vector space.
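For instance, the following SymPy sketch (illustrative only, not part of the handout) confirms symbolically that every function of the form y = Ae^(-x) + Be^(-5x) satisfies y'' + 6y' + 5y = 0, so in particular any linear combination of the two basic solutions remains a solution:

```python
# Sketch: symbolic verification that A*exp(-x) + B*exp(-5x) solves the ODE.
import sympy as sp

x, A, B = sp.symbols('x A B')
y = A * sp.exp(-x) + B * sp.exp(-5 * x)

expr = sp.diff(y, x, 2) + 6 * sp.diff(y, x) + 5 * y
print(sp.simplify(expr))   # 0
```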

The collection of solutions to any nth-order homogeneous linear differential equation y^(n) + Pn(x) · y^(n-1) + ... + P2(x) · y' + P1(x) · y = 0, for continuous functions P1(x), ..., Pn(x), forms a vector space. Note that y^(n) means the nth derivative of y.

As in the previous example, we show this by verifying that the solutions form a subspace of the space of all functions.

[S1]: The zero function is a solution.

[S2]: If y1 and y2 are solutions, then by adding the equations y1^(n) + Pn(x) · y1^(n-1) + ... + P1(x) · y1 = 0 and y2^(n) + Pn(x) · y2^(n-1) + ... + P1(x) · y2 = 0 and using properties of derivatives, we see that (y1 + y2)^(n) + Pn(x) · (y1 + y2)^(n-1) + ... + P1(x) · (y1 + y2) = 0, so y1 + y2 is also a solution.

[S3]: If α is a scalar and y1 is a solution, then scaling y1^(n) + Pn(x) · y1^(n-1) + ... + P2(x) · y1' + P1(x) · y1 = 0 by α and using properties of derivatives shows that (αy1)^(n) + Pn(x) · (αy1)^(n-1) + ... + P1(x) · (αy1) = 0, so αy1 is also a solution.

Note: This example is a fairly significant amount of the reason we are interested in linear algebra (as it relates to differential equations): because the solutions to homogeneous linear differential equations form a vector space. In general, for arbitrary functions P1(x), ..., Pn(x), it is not possible to solve the differential equation explicitly for y; nonetheless, we can still say something about what the solutions look like.

    1.4 Span, Independence, Bases, Dimension

One thing we would like to know, now that we have the definition of a vector space and a subspace, is what else we can say about elements of a vector space; i.e., we would like to know what kind of structure the elements of a vector space have.

In some of the earlier examples we saw that, in Rn and a few other vector spaces, subspaces could all be written down in terms of one or more parameters. In order to discuss this idea more precisely, we first need some terminology.

    1.4.1 Linear Combinations and Span

Definition: Given a set ~v1, ..., ~vn of vectors, we say a vector ~w is a linear combination of ~v1, ..., ~vn if there exist scalars a1, ..., an such that ~w = a1 · ~v1 + ... + an · ~vn.

Example: In R2, the vector ⟨1, 1⟩ is a linear combination of ⟨1, 0⟩ and ⟨0, 1⟩, because ⟨1, 1⟩ = 1 · ⟨1, 0⟩ + 1 · ⟨0, 1⟩.

Example: In R4, the vector ⟨4, 0, 5, 9⟩ is a linear combination of ⟨1, -1, 2, 3⟩, ⟨0, 1, 0, 0⟩, and ⟨1, 1, 1, 2⟩, because ⟨4, 0, 5, 9⟩ = 1 · ⟨1, -1, 2, 3⟩ - 2 · ⟨0, 1, 0, 0⟩ + 3 · ⟨1, 1, 1, 2⟩.

Non-Example: In R3, the vector ⟨0, 0, 1⟩ is not a linear combination of ⟨1, 1, 0⟩ and ⟨0, 1, 1⟩, because there exist no scalars a1 and a2 for which a1 · ⟨1, 1, 0⟩ + a2 · ⟨0, 1, 1⟩ = ⟨0, 0, 1⟩: this would require a common solution to the three equations a1 = 0, a1 + a2 = 0, and a2 = 1, and this system has no solution.
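One way to carry out this kind of check mechanically is to solve the corresponding linear system; the SymPy sketch below (illustrative, not part of the notes) reports an empty solution set for the non-example above:

```python
# Sketch: test whether w lies in the span of v1, v2 by solving M*[a1,a2]^T = w,
# where M has v1 and v2 as its columns.
import sympy as sp

v1 = sp.Matrix([1, 1, 0])
v2 = sp.Matrix([0, 1, 1])
w  = sp.Matrix([0, 0, 1])

M = v1.row_join(v2)                       # 3x2 matrix with columns v1, v2
a1, a2 = sp.symbols('a1 a2')
solutions = sp.linsolve((M, w), [a1, a2])
print(solutions)                          # EmptySet: w is not in the span
```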

Definition: We define the span of vectors ~v1, ..., ~vn, denoted span(~v1, ..., ~vn), to be the set W of all vectors which are linear combinations of ~v1, ..., ~vn. Explicitly, the span is the set of vectors of the form a1 · ~v1 + ... + an · ~vn, for some scalars a1, ..., an.

Remark 1: The span is always a subspace, since the zero vector can be written as 0 · ~v1 + ... + 0 · ~vn, and the span is closed under addition and scalar multiplication.

Remark 2: The span is, in fact, the smallest subspace W containing the vectors ~v1, ..., ~vn: because for any scalars a1, ..., an, closure under scalar multiplication requires each of a1 · ~v1, a2 · ~v2, ..., an · ~vn to be in W. Then closure under vector addition forces the sum a1 · ~v1 + ... + an · ~vn to be in W.

Remark 3: For technical reasons, we define the span of the empty set to be the zero vector.

Example: The span of the vectors ⟨1, 0, 0⟩ and ⟨0, 1, 0⟩ in R3 is the set of vectors of the form a · ⟨1, 0, 0⟩ + b · ⟨0, 1, 0⟩ = ⟨a, b, 0⟩.

Equivalently, the span of these vectors is the set of vectors whose z-coordinate is zero; i.e., the plane z = 0.

Definition: Given a vector space V, if the span of vectors ~v1, ..., ~vn is all of V, we say that ~v1, ..., ~vn are a generating set for V, or that they generate V.

Example: The three vectors ⟨1, 0, 0⟩, ⟨0, 1, 0⟩, and ⟨0, 0, 1⟩ generate R3, since for any vector ⟨a, b, c⟩ we can write ⟨a, b, c⟩ = a · ⟨1, 0, 0⟩ + b · ⟨0, 1, 0⟩ + c · ⟨0, 0, 1⟩.

    1.4.2 Linear Independence

Definition: We say a finite set of vectors ~v1, ..., ~vn is linearly independent if a1 · ~v1 + ... + an · ~vn = ~0 implies a1 = ... = an = 0. (Otherwise, we say the collection is linearly dependent.)

Note: For an infinite set of vectors, we say it is linearly independent if every finite subset is linearly independent (per the definition above); otherwise (if some finite subset displays a dependence) we say it is dependent.

In other words, ~v1, ..., ~vn are linearly independent precisely when the only way to form the zero vector as a linear combination of ~v1, ..., ~vn is to have all the scalars equal to zero.

An equivalent way of thinking of linear (in)dependence is that a set is dependent if one of the vectors is a linear combination of the others; i.e., it 'depends' on the others. Explicitly, if a1 · ~v1 + a2 · ~v2 + ... + an · ~vn = ~0 and a1 ≠ 0, then we can rearrange to see that ~v1 = -(1/a1) · (a2 · ~v2 + ... + an · ~vn).

Example: The vectors ⟨1, 1, 0⟩ and ⟨0, 2, 1⟩ in R3 are linearly independent, because if we have scalars a and b with a · ⟨1, 1, 0⟩ + b · ⟨0, 2, 1⟩ = ⟨0, 0, 0⟩, then comparing the two sides requires a = 0, a + 2b = 0, and b = 0, which has only the solution a = b = 0.

Example: The vectors ⟨1, 1, 0⟩ and ⟨2, 2, 0⟩ in R3 are linearly dependent, because we can write 2 · ⟨1, 1, 0⟩ + (-1) · ⟨2, 2, 0⟩ = ⟨0, 0, 0⟩. Or, in the equivalent formulation, we have ⟨2, 2, 0⟩ = 2 · ⟨1, 1, 0⟩.

Example: The vectors ⟨1, 0, 2, 2⟩, ⟨2, -2, 0, 3⟩, ⟨0, 3, 3, 1⟩, and ⟨0, 4, 2, 1⟩ in R4 are linearly dependent, because we can write 2 · ⟨1, 0, 2, 2⟩ + (-1) · ⟨2, -2, 0, 3⟩ + (-2) · ⟨0, 3, 3, 1⟩ + 1 · ⟨0, 4, 2, 1⟩ = ⟨0, 0, 0, 0⟩.
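A convenient computational test (a sketch, not from the handout): the vectors are independent exactly when the matrix having them as columns has rank equal to the number of vectors. For the R4 example above:

```python
# Sketch: linear (in)dependence via matrix rank.
import numpy as np

cols = np.array([[1, 0, 2, 2],
                 [2, -2, 0, 3],
                 [0, 3, 3, 1],
                 [0, 4, 2, 1]]).T        # columns are the four vectors in R^4

print(np.linalg.matrix_rank(cols))       # 3 < 4, so the vectors are dependent
```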

Theorem: The vectors ~v1, ..., ~vn are linearly independent if and only if every vector ~w in the span of ~v1, ..., ~vn may be uniquely written as a sum ~w = a1 · ~v1 + a2 · ~v2 + ... + an · ~vn.

For one direction, if the decomposition is always unique, then a1 · ~v1 + a2 · ~v2 + ... + an · ~vn = ~0 implies a1 = ... = an = 0, because 0 · ~v1 + ... + 0 · ~vn = ~0 is by assumption the only decomposition of ~0.

For the other direction, suppose we had two different ways of decomposing a vector ~w, say as ~w = a1 · ~v1 + a2 · ~v2 + ... + an · ~vn and ~w = b1 · ~v1 + b2 · ~v2 + ... + bn · ~vn.

Then subtracting and rearranging the difference between these two equations yields ~w - ~w = (a1 - b1) · ~v1 + ... + (an - bn) · ~vn.

Now ~w - ~w is the zero vector, so we have (a1 - b1) · ~v1 + ... + (an - bn) · ~vn = ~0. But now because ~v1, ..., ~vn are linearly independent, we see that all of the scalar coefficients a1 - b1, ..., an - bn are zero. But this says a1 = b1, a2 = b2, ..., an = bn, which is to say, the two decompositions are actually the same.

    1.4.3 Bases and Dimension

Definition: A linearly independent set of vectors which generates V is called a basis for V.

Terminology Note: The plural form of the (singular) word basis is bases.

Example: The three vectors ⟨1, 0, 0⟩, ⟨0, 1, 0⟩, and ⟨0, 0, 1⟩ generate R3, as we saw above. They are also linearly independent, since a · ⟨1, 0, 0⟩ + b · ⟨0, 1, 0⟩ + c · ⟨0, 0, 1⟩ is the zero vector only when a = b = c = 0. Thus, these three vectors are a basis for R3.

Example: More generally, in Rn, the standard unit vectors e1, e2, ..., en (where ej has a 1 in the jth coordinate and 0s elsewhere) are a basis.

Non-Example: The vectors ⟨1, 1, 0⟩ and ⟨0, 2, 1⟩ in R3 are not a basis, as they fail to generate V: it is not possible to obtain the vector ⟨1, 0, 0⟩ as a linear combination of ⟨1, 1, 0⟩ and ⟨0, 2, 1⟩.

Non-Example: The vectors ⟨1, 0, 0⟩, ⟨0, 1, 0⟩, ⟨0, 0, 1⟩, and ⟨1, 1, 1⟩ in R3 are not a basis, as they are linearly dependent: we have 1 · ⟨1, 0, 0⟩ + 1 · ⟨0, 1, 0⟩ + 1 · ⟨0, 0, 1⟩ + (-1) · ⟨1, 1, 1⟩ = ⟨0, 0, 0⟩.

Example: The polynomials 1, x, x^2, x^3, ... are a basis for the vector space of all polynomials.

First observe that 1, x, x^2, x^3, ... certainly generate the set of all polynomials (by definition of a polynomial).

Now we want to see that these polynomials are linearly independent. So suppose we had scalars a0, a1, ..., an such that a0 · 1 + a1 · x + ... + an · x^n = 0, for all values of x.

Then if we take the nth derivative of both sides (which is allowable because a0 + a1x + ... + anx^n = 0 is assumed to be true for all x), we obtain n! · an = 0, from which we see that an = 0.

Then repeat by taking the (n-1)st derivative to see a_{n-1} = 0, and so on, until finally we are left with just a0 = 0. Hence the only way to form the zero function as a linear combination of 1, x, x^2, ..., x^n is with all coefficients zero, which says that 1, x, x^2, x^3, ... is a linearly independent set.

Theorem: A collection of n vectors ~v1, ..., ~vn in Rn is a basis if and only if the n × n matrix B, whose columns are the vectors ~v1, ..., ~vn, is an invertible matrix.

The idea behind the theorem is to multiply out and compare coordinates, and then analyze the resulting system of equations.

So suppose we are looking for scalars a1, ..., an such that a1 · ~v1 + ... + an · ~vn = ~w, for some vector ~w in Rn.

This vector equation is the same as the matrix equation B · ~a = ~w, where B is the matrix whose columns are the vectors ~v1, ..., ~vn, ~a is the column vector whose entries are the scalars a1, ..., an, and ~w is thought of as a column vector.

Now from what we know about matrix equations, we know that B is an invertible matrix precisely when B · ~a = ~w has a unique solution ~a for every ~w.

But having a unique way to write any vector as a linear combination of the vectors in a set is precisely the statement that the set is a basis. So we are done.
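A small NumPy sketch of this theorem (illustrative; the particular vectors below are made up, not taken from the handout): check invertibility via the determinant, then recover the coordinates of a vector ~w in the basis by solving B~a = ~w.

```python
# Sketch: columns of B form a basis of R^3 exactly when B is invertible;
# in that case the coordinates of any w are the unique solution of B a = w.
import numpy as np

B = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 1, 2]], dtype=float)   # a 3x3 matrix of candidate basis columns

print(np.linalg.det(B))                  # nonzero determinant: the columns are a basis

w = np.array([2.0, 3.0, 5.0])
a = np.linalg.solve(B, w)                # coordinates of w in this basis
print(a, B @ a)                          # B @ a reproduces w
```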

Theorem: Every vector space V has a basis. Any two bases of V contain the same number of elements. Any generating set for V contains a basis. Any linearly independent set of vectors can be extended to a basis.

Remark: If you only remember one thing about vector spaces, remember that every vector space has a basis!

Remark: That a basis always exists is really, really, really useful. It is without a doubt the most useful fact about vector spaces: vector spaces in the abstract are very hard to think about, but a vector space with a basis is something very concrete (since then we know exactly what the elements of the vector space look like).

To show the first and last parts of the theorem, we show that we can build any set of linearly independent vectors into a basis:

Start with S being some set of linearly independent vectors. (In any vector space, the empty set is always linearly independent.)

1. If S spans V, then we are done, because then S is a linearly independent generating set; i.e., a basis.

2. If S does not span V, there is an element v of V which is not in the span of S. Then if we put v in S, the new S is still linearly independent. Then start over.

It can be proven that we will eventually land in case (1); to justify this statement in general, some fairly technical and advanced machinery may be needed.

If V has dimension n (see below), then we will always be able to construct a basis in at most n steps; it is in the case when V has infinite dimension that things get tricky and confusing, and the argument requires what is called the axiom of choice.

To show the third part of the theorem, the idea is to imagine going through the list of elements in a generating set and removing elements until it becomes linearly independent.

This idea is not so easy to formulate with an infinite list, but if we have a finite generating set, then we can go through the elements of the generating set one at a time, throwing out an element if it is linearly dependent with the elements that came before it. Then, once we have gotten to the end of the generating set, the collection of elements which we have not thrown away will still be a generating set (since removing a dependent element will not change the span), but the collection will also now be linearly independent (since we threw away elements which were dependent).
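The pruning procedure just described can be sketched in a few lines of Python (illustrative code, not from the handout), keeping a vector only when it is independent of the vectors kept so far:

```python
# Sketch: extract a basis from a finite generating set of R^n by greedy pruning.
import numpy as np

def extract_basis(vectors):
    kept = []
    for v in vectors:
        candidate = kept + [v]
        # v is kept only if it is independent of the previously kept vectors,
        # i.e. the rank of the kept columns grows.
        if np.linalg.matrix_rank(np.column_stack(candidate)) == len(candidate):
            kept.append(v)
    return kept

gens = [np.array([1., 0., 0.]), np.array([0., 1., 0.]),
        np.array([1., 1., 0.]), np.array([0., 0., 1.])]
print(extract_basis(gens))   # drops [1, 1, 0]; the three remaining vectors are a basis
```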

To show the second part of the theorem, we will show that if A is a set of vectors with m elements and B is a basis with n elements, with m > n, then A is linearly dependent.

To see this, since B is a basis, we can write every element ai in A as a linear combination of the elements of B, say as ai = Σ_{j=1}^{n} ci,j · bj for 1 ≤ i ≤ m.

Now consider a linear combination of the ai equal to the zero vector: we would like to see that there is some choice of scalars dk, not all zero, such that Σ_{k=1}^{m} dk · ak = ~0.

If we substitute in for the vectors in B, then we obtain a linear combination of the elements of B equalling the zero vector. Since B is a basis, this means each coefficient of bj in the resulting expression must be zero.

If we tabulate the resulting system, we can check that it is equivalent to the matrix equation C · ~d = ~0, where C is the n × m matrix whose (j, i) entry is the coefficient ci,j, and ~d is the m × 1 column vector with entries the scalars dk.

Now since C is a matrix which has more columns than rows, by the assumption that m > n, we see that the homogeneous system C · ~d = ~0 has a solution vector ~d which is not the zero vector.

But then we have Σ_{k=1}^{m} dk · ak = ~0 for scalars dk not all zero, so the set A is linearly dependent.

Definition: We define the number of elements in any basis of V to be the dimension of V.

The theorem above assures us that this quantity is always well-defined.

Example: The dimension of Rn is n, since the n standard unit vectors form a basis.

This says that the term dimension is reasonable, since it is the same as our usual notion of dimension.

Example: The dimension of the vector space of m × n matrices is mn, because there is a basis consisting of the mn matrices Ei,j, where Ei,j is the matrix with a 1 in the (i, j)-entry and 0s elsewhere.

Example: The dimension of the vector space of all polynomials is ∞, because the (infinite list of) polynomials 1, x, x^2, x^3, ... are a basis for the space.

    1.5 Linear Transformations

Now that we have a reasonably good idea of what the structure of a vector space is, the next natural question is: what do maps from one vector space to another look like?

It turns out that we don't want to ask about arbitrary functions, but about functions from one vector space to another which preserve the structure (namely, addition and scalar multiplication) of the vector space.

The analogy to the real numbers is: once we know what the real numbers look like, what can we say about arbitrary real-valued functions?

The answer is: not much, unless we specify that the functions preserve the structure of the real numbers, which is abstract math-speak for saying that we want to talk about continuous functions, which turn out to behave much more nicely.

This is the idea behind the definition of a linear transformation: it is a map that preserves the structure of a vector space.

Definition: If V and W are vector spaces, we say a map T from V to W (denoted T : V → W) is a linear transformation if, for any vectors ~v, ~v1, ~v2 and scalar α, we have the two properties:

[T1] The map respects addition of vectors: T(~v1 + ~v2) = T(~v1) + T(~v2).
[T2] The map respects scalar multiplication: T(α · ~v) = α · T(~v).

Remark: Like with the definition of a vector space, one can show a few simple algebraic properties of linear transformations; for example, that any linear transformation sends the zero vector (of V) to the zero vector (of W).

Example: If V = W = R2, then the map T which sends ⟨x, y⟩ to ⟨x, x + y⟩ is a linear transformation.

Let ~v = ⟨x, y⟩, ~v1 = ⟨x1, y1⟩, and ~v2 = ⟨x2, y2⟩, so that ~v1 + ~v2 = ⟨x1 + x2, y1 + y2⟩.

[T1]: We have T(~v1 + ~v2) = ⟨x1 + x2, x1 + x2 + y1 + y2⟩ = ⟨x1, x1 + y1⟩ + ⟨x2, x2 + y2⟩ = T(~v1) + T(~v2).

[T2]: We have T(α · ~v) = ⟨αx, αx + αy⟩ = α · ⟨x, x + y⟩ = α · T(~v).

More General Example: If V = W = R2, then the map T which sends ⟨x, y⟩ to ⟨ax + by, cx + dy⟩ is a linear transformation.

    Just like in the previous example, we can work out the calculations explicitly.

But another way we can think of this map is as a matrix map: T sends the column vector
    [ x ]
    [ y ]
to the column vector
    [ ax + by ]   [ a  b ] [ x ]
    [ cx + dy ] = [ c  d ] [ y ].

So, in fact, this map T is really just (left) multiplication by the matrix
    [ a  b ]
    [ c  d ].

    When we think of the map in this way, it is easier to see what is happening:

[T1]: We have T(~v1 + ~v2) = [ a  b ; c  d ] · (~v1 + ~v2) = [ a  b ; c  d ] · ~v1 + [ a  b ; c  d ] · ~v2 = T(~v1) + T(~v2).

[T2]: Also, T(α · ~v) = [ a  b ; c  d ] · (α · ~v) = α · ([ a  b ; c  d ] · ~v) = α · T(~v).

Really General Example: If V = Rm (thought of as m × 1 matrices) and W = Rn (thought of as n × 1 matrices) and A is any n × m matrix, then the map T sending ~v to A · ~v is a linear transformation.

The verification is exactly the same as in the previous example.

[T1]: We have T(~v1 + ~v2) = A · (~v1 + ~v2) = A · ~v1 + A · ~v2 = T(~v1) + T(~v2).

[T2]: Also, T(α · ~v) = A · (α · ~v) = α · (A · ~v) = α · T(~v).

This last example is very general: in fact, it is so general that every linear transformation from Rm to Rn is of this form! Namely, if T is a linear transformation from Rm to Rn, then there is some n × m matrix A such that T acts by sending ~v to A · ~v.

The reason is actually very simple, and it is easy to write down what the matrix A is: it is just the n × m matrix whose columns are the vectors T(e1), T(e2), ..., T(em), where e1, ..., em are the standard basis elements of Rm (ej is the vector with a 1 in the jth position and 0s elsewhere).

To see that this choice of A works, note that every vector ~v in Rm can be written as a unique linear combination ~v = Σ_{j=1}^{m} aj · ej of the basis elements. Then, after applying the two properties of a linear transformation, we obtain T(~v) = Σ_{j=1}^{m} aj · T(ej). But this is precisely the matrix product of the matrix A with the coefficients of ~v.
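The following NumPy sketch illustrates this construction for a made-up map T(x, y) = (x, x + y, 2y) from R2 to R3 (the map and the names below are assumptions for illustration, not from the handout): build A column by column from T(e1), T(e2) and check that A~v agrees with T(~v).

```python
# Sketch: the matrix of a linear map has columns T(e_1), ..., T(e_m).
import numpy as np

def T(v):
    x, y = v
    return np.array([x, x + y, 2 * y])

m = 2
A = np.column_stack([T(e) for e in np.eye(m)])   # columns are T(e_j)
v = np.array([3.0, -1.0])
print(A @ v, T(v))                               # the two results agree
```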

Tangential Remark: If we write down the map T explicitly, we see that the term in each coordinate in W is a linear function of the coordinates in V; e.g., if A = [ a  b ; c  d ] then the linear functions are ax + by and cx + dy. This is the reason that linear transformations are named so: because they are really just linear functions, in the traditional sense.

In fact, we can state something even more general: the argument above shows that, if we take any m-dimensional vector space V and any n-dimensional vector space W and choose bases for each space, then a linear transformation T from V to W behaves just like multiplication by (some) n × m matrix A.

Remark 1: This result underlines one of the reasons that matrices and vector spaces (which initially seem like they have almost nothing to do with one another) are actually closely related: because matrices describe the maps from one vector space to another.

Remark 2: One can also use this relationship between maps on vector spaces and matrices to provide almost trivial proofs of some of the algebraic properties of matrix multiplication which are hard to prove by direct computation.

For example: the composition of linear transformations is associative (because linear transformations are functions, and function composition is associative). Multiplication of matrices is the same as composition of the corresponding linear transformations. Hence multiplication of matrices is associative.

    1.5.1 Kernel and Image

Definition: If T : V → W is a linear transformation, then the kernel of T, denoted ker(T), is the set of elements v in V with T(v) = ~0. The image of T, denoted im(T), is the set of elements w in W such that there exists a v in V with T(v) = w.

Intuitively, the kernel is the elements which are sent to zero by T, and the image is the elements in W which are hit by T (the range of T).

Essentially (see below), the kernel measures how far from one-to-one the map T is, and the image measures how far from onto the map T is.

One of the reasons we care about these subspaces is that (for example) the set of solutions to a set of homogeneous linear equations A · ~x = ~0 is the kernel of the linear transformation T of multiplication by A.

Another reason is that they will say something about the structure of the space V (see below).

    The kernel is a subspace of V .

    [S1] We have T (~0) = ~0, by simple properties of linear transformations.

[S2] If v1 and v2 are in the kernel, then T(v1) = ~0 and T(v2) = ~0. Therefore, T(v1 + v2) = T(v1) + T(v2) = ~0 + ~0 = ~0.

[S3] If v is in the kernel, then T(v) = ~0. Hence T(α · v) = α · T(v) = α · ~0 = ~0.

    The image is a subspace of W .

[S1] We have T(~0) = ~0, by simple properties of linear transformations.

[S2] If w1 and w2 are in the image, then there exist v1 and v2 such that T(v1) = w1 and T(v2) = w2. Then T(v1 + v2) = T(v1) + T(v2) = w1 + w2, so that w1 + w2 is also in the image.

[S3] If w is in the image, then there exists v with T(v) = w. Then T(α · v) = α · T(v) = α · w, so α · w is also in the image.

Theorem: The kernel ker(T) consists of only the zero vector if and only if the map T is one-to-one. The image im(T) consists of all of W if and only if the map T is onto.

The statement about the image is just the definition of onto.

If T is one-to-one, then (at most) one element of V maps to ~0. But since the zero vector is taken to the zero vector, we see that T cannot send anything else to ~0. Thus ker(T) = {~0}.

If ker(T) is only the zero vector, then since T is a linear transformation, the statement T(v1) = T(v2) is equivalent to the statement that T(v1) - T(v2) = T(v1 - v2) is the zero vector. But, by the definition of the kernel, T(v1 - v2) = ~0 precisely when v1 - v2 is in the kernel. However, this means v1 - v2 = ~0, or v1 = v2. Hence T(v1) = T(v2) implies v1 = v2, which means T is one-to-one.

Theorem (Rank-Nullity): For any linear transformation T : V → W, dim(ker(T)) + dim(im(T)) = dim(V).

The idea behind this theorem is that if we have a basis for im(T), say w1, ..., wk, then there exist v1, ..., vk with T(v1) = w1, ..., T(vk) = wk. Then if a1, ..., al is a basis for ker(T), the goal is to show that the set of vectors {v1, ..., vk, a1, ..., al} is a basis for V.

To do this, given any v, write T(v) = Σ_{j=1}^{k} βj · wj = Σ_{j=1}^{k} βj · T(vj) = T(Σ_{j=1}^{k} βj · vj), where the βj are unique.

Then subtraction shows that T(v - Σ_{j=1}^{k} βj · vj) = ~0, so that v - Σ_{j=1}^{k} βj · vj is in ker(T), hence can be written as a sum Σ_{i=1}^{l} γi · ai, where the γi are unique.

Putting all this together shows v = Σ_{j=1}^{k} βj · vj + Σ_{i=1}^{l} γi · ai for unique scalars βj and γi, which says that {v1, ..., vk, a1, ..., al} is a basis for V.
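A quick SymPy check of rank-nullity for a specific matrix map (a sketch; the matrix below is made up for illustration and is not from the handout): the dimension of the kernel plus the dimension of the image equals the dimension of the domain.

```python
# Sketch: verify dim ker(T) + dim im(T) = dim V for T(v) = A v, A a 3x4 matrix.
import sympy as sp

A = sp.Matrix([[1, 2, 0, 1],
               [0, 1, 1, 1],
               [1, 3, 1, 2]])

dim_ker = len(A.nullspace())               # dimension of the kernel
dim_im  = A.rank()                         # dimension of the image (column space)
print(dim_ker, dim_im, dim_ker + dim_im)   # 2 2 4
```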

    1.5.2 The Derivative as a Linear Transformation

Example: If V and W are any vector spaces, and T1 and T2 are any linear transformations from V to W, then T1 + T2 and α · T1 are also linear transformations, for any scalar α.

These follow from the criteria. (They are somewhat confusing to follow when written down, so I won't bother.)

Example: If V is the vector space of real-valued functions and W = R, then the 'evaluation at 0' map taking f to the value f(0) is a linear transformation.

[T1]: We have T(f1 + f2) = (f1 + f2)(0) = f1(0) + f2(0) = T(f1) + T(f2).

[T2]: Also, T(α · f) = (αf)(0) = α f(0) = α · T(f).

Note of course that being a linear transformation has nothing to do with the fact that we are evaluating at 0. We could just as well evaluate at 1, or π, and the map would still be a linear transformation.

Example: If V and W are both the vector space of real-valued functions and P(x) is any real-valued function, then the map taking f(x) to the function P(x)f(x) is a linear transformation.

[T1]: We have T(f1 + f2) = P(x)(f1 + f2)(x) = P(x)f1(x) + P(x)f2(x) = T(f1) + T(f2).

[T2]: Also, T(α · f) = P(x)(αf)(x) = α P(x)f(x) = α · T(f).

Example: If V is the vector space of all n-times differentiable functions and W is the vector space of all functions, then the nth derivative map, taking f(x) to its nth derivative f^(n)(x), is a linear transformation.

[T1]: The nth derivative of the sum is the sum of the nth derivatives, so we have T(f1 + f2) = (f1 + f2)^(n)(x) = f1^(n)(x) + f2^(n)(x) = T(f1) + T(f2).

[T2]: Also, T(α · f) = (αf)^(n)(x) = α f^(n)(x) = α · T(f).

If we combine the results from the previous four examples, we can show that if V is the vector space of all n-times differentiable functions, then the map T which sends a function y to the function y^(n) + Pn(x) · y^(n-1) + ... + P2(x) · y' + P1(x) · y is a linear transformation, for any functions Pn(x), ..., P1(x).

In particular, the kernel of this linear transformation is the collection of all functions y such that y^(n) + Pn(x) · y^(n-1) + ... + P2(x) · y' + P1(x) · y = 0; i.e., the set of solutions to this differential equation.

Note that since we know the kernel is a vector space (as it is a subspace of V), we see that the set of solutions to y^(n) + Pn(x) · y^(n-1) + ... + P2(x) · y' + P1(x) · y = 0 forms a vector space. (Of course, we could just show this statement directly, by checking the subspace criteria.)

However, it is very useful to be able to think of this linear differential operator, sending y to y^(n) + Pn(x) · y^(n-1) + ... + P2(x) · y' + P1(x) · y, as a linear transformation.
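As a final illustration (a sketch, not part of the handout), here is a SymPy version of the second-order operator T(y) = y'' + 6y' + 5y from the earlier example, together with a symbolic check that it respects addition:

```python
# Sketch: the differential operator T(y) = y'' + 6y' + 5y acts linearly on functions.
import sympy as sp

x = sp.symbols('x')

def T(y):
    return sp.diff(y, x, 2) + 6 * sp.diff(y, x) + 5 * y

f = sp.sin(x)
g = sp.exp(2 * x)
print(sp.simplify(T(f + g) - (T(f) + T(g))))   # 0, so T(f + g) = T(f) + T(g)
```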

Well, you're at the end of my handout. Hope it was helpful.

Copyright notice: This material is copyright Evan Dummit, 2012. You may not reproduce or distribute this material without my express permission.