Chapter 1 - Introduction to Vector Spaces∗

Justin Leduc†

These lecture notes are meant to be used by students entering the University of Mannheim

Master program in Economics. They constitute the base for a pre-course in mathematics;

that is, they summarize elementary concepts with which all of our econ grad students must

be familiar. More advanced concepts will be introduced later on in the regular coursework.

A thorough knowledge of these basic notions will be assumed in later

coursework.

Although the wording is my own, the definitions of concepts and the ways to approach them are strongly inspired by various sources, which are mentioned explicitly in the text or at the end of the chapter.

Justin Leduc

I have slightly restructured and amended these lecture notes provided by my predecessor

Justin Leduc in order for them to suit the 2016 course schedule. Any mistakes you might

find are most likely my own.

Simona Helmsmueller

∗ This version: 2017

† Center for Doctoral Studies in Economic and Social Sciences.


Preview

What you should take away from this chapter:

1. Introduction

• You should know that the term vector can refer to many different objects, including but not limited to an n-tuple of real numbers.

2. The Algebraic Structure of Vector Spaces

• You should be more familiar with mathematical notation and understand the high

degree of precision therein.

• You should understand the difference between a definition and a property.

• You should know that there is a difference between scalar multiplication and the

dot product.

3. Subspaces, Linear Combinations, and Linear Dependence

• You should have an (albeit vague) idea of what the span of a set is.

• You should know the term linear independence very well, understand its definition and meaning, and be able to test whether a set of vectors is linearly independent.

• You should understand the terms basis and space dimension and have some kind

of intuition of why they matter.

4. Normed Vector Spaces and Continuity

• You should have an intuition for the terms open set, closed set, interior point,

closure point, boundary point.

• At least for boundary points, be able to write down a definition.

• You should have an intuition for the concept of continuous functions and know a

simple characterization of continuity.

• You should be able to understand the formal definitions of continuous functions

and convergence, i.e., you should recognize it when you see it, although you might

not be able to write it down by yourself yet.

5. Convex Sets and the Separating Hyperplane Theorem

• You should be able to graphically distinguish convex from non-convex sets.

• You should have a graphical idea of what the convex hull is.

• You should know the separating hyperplane theorem, or at least have some graphical intuition for it.


1 Introduction

Much of what is usually done in undergraduate economics is restricted to the plane, R^2. This

has the advantage of being easily displayed graphically. For example, probably all of you

know how to look for the utility-maximizing consumption bundle of two goods when given

the budget restriction and the indifference curve. However, many of the phenomena studied

in graduate economics are more complex. For instance, instead of the two-good bundle,

a consumer might choose between different lotteries, where the actual allocation depends

on some sort of random outcome which he cannot influence. Also, instead of a consumer

choosing a bundle, you might look at a producer choosing a production plan, which can be

represented in a matrix.

The main objective of the theory of vector spaces is sometimes[1] described as follows:

Geometrical insights at hand with 2- or 3-dimensional real vectors are really helpful. Can we,

in some way, generalize these insights to other mathematical objects, for which a geometric

picture is not available?

In R^2, geometrical representations give us simple and fundamental principles that help us solve various problems. Think, for instance, of the projection theorem in two dimensions,

which states that the shortest path from a point x to a line l is that which lies on the

perpendicular to l. While geometrically very intuitive, this result would probably have

been very hard to find out, had we not had a geometrical representation for vectors. And

yet, it is a fundamental result for optimization problems such as the least squares problems you (may) have studied in your undergrad. Its generalization to n-dimensional real vectors allows, among other things, the application of the least squares method in situations where no picture can guide the thoughts. This chapter will try to convince you that the generalization applies to many more mathematical objects than n-tuples of real numbers[2].

IN THIS CHAPTER, BY VECTOR WE NEED NOT MEAN AN N-TUPLE OF REAL NUMBERS, BUT MAY REFER TO MANY MORE OBJECTS (FUNCTIONS, SEQUENCES, MATRICES, ...)!

Now, generalizing something need not be easy. Indeed, one has to pin down the exact

features of the simpler concepts which guarantee the validity of the insights they bring

to us. These features are the algebraic structure with which we equip a set of objects of

study (vectors). Therefore, in the sequel, it is important to distinguish simple sets from

“algebraically structured sets”, i.e. sets equipped with an algebraic structure. In order to

mark the change in emphasis when we manipulate such “structured sets”, we give them a

different name: we call them spaces. A vector space, then, is a set of mathematical objects[3]

equipped with the same algebraic structure as that of vectors of real numbers.

[1] See e.g. Luenberger (1969) or Gross (2011).

[2] For instance, functions and matrices!

[3] For instance, vectors in the classical sense of the term (i.e. n-tuples of real numbers), but not necessarily!


2 The Algebraic Structure of Vector Spaces

2.1 Definitions

So what exactly do we mean when we talk about the algebraic structure? Maybe in school you

learned that a vector is an entity with direction and magnitude. In its most straightforward

interpretation, a vector could tell you to go e.g. 4 km in the direction East. Once you

reach that point, another vector could tell you to go 3 km in the direction North. Once

you also reach that point you look back at the origin and wonder whether you could not

have reached your goal more easily. What you are in fact thinking about is vector addition

(and, hopefully, Pythagoras' theorem, but we will get back to that later). As you see, low

dimensional real vectors can be easily identified as geometrical entities and vector operations

can be illustrated as follows:

Figure 1: 2D Geometrical Interpretation of Vector Operations

To be able to manipulate general vectors (think of matrices, functions etc.), we endow them

with similar operations, i.e. the vector addition and multiplication with a scalar. From the

graphical illustrations it is clear that in R2 these follow certain rules, e.g. that the addition

is commutative. In order to define general vector spaces, we take all the properties of these

algebraic operations and use them as defining axioms. Note the change of emphasis here. A

property is the consequence of some set of axioms, i.e. the conclusion following from initial

assumptions. Using the algebraic properties of classical vectors as defining axioms means

that we now understand as vectors not only n-tuples of real numbers, but all mathematical objects which are members of a space endowed with these algebraic laws.

Functions, matrices, sequences, ... all fit in this new, more general concept of a vector! Beware: this does not mean that these mathematical objects can be identified with n-tuples of real numbers. Rather, it means that our concept of a vector now includes the concept of vectors you studied in high school, while not being itself included in that concept.

Definition 2.1. (Real Vector Space)

Let X := (X,+, ·) be a set of elements, which we will from now on call vectors, together

with two operations. Namely, the vector addition, which associates to any x and y in X the

vector x + y, and the scalar multiplication, which associates to any scalar λ and any x in X

the vector λ · x. X is called a vector space if the following properties hold:

(i) Vector addition and scalar multiplication are closed operations: ∀x,y ∈ X, ∀λ ∈ R: x + y ∈ X and λ · x ∈ X

(ii) Vector addition is commutative: ∀x,y ∈ X: x + y = y + x

(iii) Vector addition is associative: ∀x,y, z ∈ X x + (y + z) = (x + y) + z

(iv) There exists a null element 0 in X such that: ∀x ∈ X x + 0 = x

(v) Scalar multiplication is associative: ∀λ, µ ∈ R ∀x ∈ X λ(µx) = (λµ)x

(vi) Scalar multiplication is distributive over vector and scalar additions:

∀λ ∈ R ∀x,y ∈ X λ(x + y) = λx + λy

∀ λ, µ ∈ R ∀ x ∈ X (λ+ µ)x = λx + µx

(vii) If 1 denotes the scalar multiplicative identity and 0 the scalar zero, then:

∀x ∈ X 1x = x and 0x = 0 (where the 0 on the right-hand side is the null element of X)

Remark 1. Note that there are two types of both operations used in the above definition. For

example, in axiom (vi), last line, the plus on the left hand side refers to our usual addition

of real numbers, whereas the plus on the right-hand side refers to the new vector addition.

This might seem like an ambiguous use of notation, however it is justified by the fact that

the objects that you sum up are uniquely defined as either scalars or vectors. In contrast,

a line such as λ + x is not valid because the addition of real numbers and vectors is not

defined.

Remark 2. The attribute real in our definition is simply here to express the fact that our

scalars are elements of the real line. The concept of vector spaces can naturally be extended

to that of vector space over a different set of numbers, such as for instance C, the set of

complex numbers. Yet, we will focus on real vector spaces in what follows as they are most

commonly encountered in economics.

Remark 3. By the way, remember that we talked about direction and magnitude of vectors?

The word scalar implies that it can be used to scale the objects of interest, i.e. to change their magnitude without changing the line along which they point (a negative scalar merely reverses the orientation). This is precisely what is done with the scalar

multiplication. It is not to be confused with the scalar product, which we will introduce later

on.


Remark 4. I have talked about magnitude and direction to motivate a graphic intuition

behind the algebraic operations. However, if we were to talk precisely about these properties,

we would soon find that we need some sort of measure of distances. We have not yet equipped

our vector space with such a measure, but will soon do so in a following section. For the

time being, it is more exact to think of a vector space as a set, in which all objects have

a clearly defined position, plus some very basic algebraic operations which ensure that you

can take linear combinations of elements of the vector space.

Remark 5. You may wonder why we included exactly these properties in the definition and

not, say, the fact that x + (−1)x = 0. In definitions, one tries to be as concise as possible.

The additional statement is not needed in the definition, because it follows from the axioms

as property of vector spaces:

x + (−1)x = (1 + (−1))x = 0x = 0,

where the first equality follows from (vii), which gives x = 1x, together with (vi), the second equality from our usual addition in R, and the last equality from (vii).

Example 2.1. It is a good exercise to verify that the following sets, endowed with proper

operations, can now also be considered as vector spaces:

• V = {f : dom(f) = [a, b]},

• S = {x | x = {ζ_k}_{k=1}^∞}, the set of infinite sequences of real numbers,

• The set Cn(X) of all bounded and continuous real-valued functions with domain in R^n,

• M_{m×n}, the space of m × n matrices,

• ... And many others!
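
To see what "endowed with proper operations" can mean for the first example, here is a minimal Python sketch in which real-valued functions play the role of vectors. It assumes the usual pointwise operations; the helper names `add`, `scale` and `zero`, and the particular functions f and g, are hypothetical and chosen for illustration only.

```python
def add(f, g):
    """Pointwise vector addition: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def scale(lam, f):
    """Scalar multiplication: (lam * f)(x) = lam * f(x)."""
    return lambda x: lam * f(x)

# The null element of the function space: the function that is 0 everywhere.
zero = lambda x: 0

f = lambda x: x * x
g = lambda x: 3 * x + 1

# The "vector" 2f + g, i.e. the function x -> 2x^2 + 3x + 1.
h = add(scale(2, f), g)

print(h(2))                            # 15
print(add(f, g)(3) == add(g, f)(3))    # commutativity at one sample point
print(add(f, zero)(5) == f(5))         # f + 0 = f at one sample point
```

Evaluating at a few points is of course only a spot check; verifying the axioms for all functions and all points is the pen-and-paper exercise suggested above.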

Example 2.2. Does the following define a vector space?

for [a, b] ⊂ R, define V := {f : [a, b] → [a, b]},

∀f ∈ V, λ ∈ R : λf := f̃ : [a, b] → [a, b] with f̃(x) := λf(x)

and

∀f, g ∈ V : f + g := h : [a, b] → [a, b] with h(x) := f(g(x))

Proof. No, because vector addition is not commutative. Counterexample (for a < b): let f : x ↦ b and g : x ↦ a. Clearly, f, g ∈ V, but f + g : x ↦ b is not the same as g + f : x ↦ a.
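
The counterexample can be made concrete with a small Python sketch. It assumes the specific interval [a, b] = [0, 1]; the helper name `add` is hypothetical and stands for the proposed "addition" by composition.

```python
# Assumption for illustration: the interval [a, b] is [0, 1].
a, b = 0.0, 1.0

def f(x):
    """Constant function x -> b."""
    return b

def g(x):
    """Constant function x -> a."""
    return a

def add(u, v):
    """Proposed 'addition' from Example 2.2: (u + v)(x) := u(v(x))."""
    return lambda x: u(v(x))

f_plus_g = add(f, g)   # x -> f(g(x)) = b
g_plus_f = add(g, f)   # x -> g(f(x)) = a

# The two "sums" disagree at every point of the interval.
for x in (0.0, 0.25, 0.5, 1.0):
    print(x, f_plus_g(x), g_plus_f(x))
```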

Elementary but important properties of vector spaces are the cancellation laws.


Theorem 2.1. (Cancellation laws)

Let X := (X,+, ·) be a real vector space, x, y and z belong to X, and λ and γ belong to R.

Then the following holds:

(i) If x + y = x + z, then y = z

(ii) If λx = λy and λ ≠ 0, then x = y

(iii) If λx = γx and x ≠ 0, then λ = γ

Next, let us generalize another point with which you have become familiar when working

with classical vector spaces: the relation between R, R2, R3, and so on... This relation is

made formal through the concept of Cartesian product:

Definition 2.2. (Cartesian product)

Let X := (X,+, ·) and Y := (Y,+, ·) be two real vector spaces. We define the Cartesian product of X and Y, denoted X × Y, as the collection of ordered pairs (x, y) with x an element of X and y an element of Y, together with two operations, addition and scalar multiplication, defined respectively as (x_1, y_1) + (x_2, y_2) = (x_1 + x_2, y_1 + y_2) and λ(x, y) = (λx, λy).

Remark 6. You may want to verify that the Cartesian product of two real vector spaces is

itself a vector space.

Notation: In what follows, I will generally denote a real vector space (X,+, ·) by X.

To conclude, let us define a very special vector operation, the scalar product or dot product. It is defined on the Cartesian product of R^n with itself and maps into the real numbers, i.e. • : R^n × R^n → R. We use the algebraic definition here, but be aware that there are two equivalent definitions (the other is geometric).

Definition 2.3. (Dot product)

Let x = (x_1, ..., x_n), y = (y_1, ..., y_n) ∈ R^n. Then the dot product of these two n-dimensional vectors is a real number:

x • y = x_1 · y_1 + ... + x_n · y_n.

The dot product is commutative, distributive over vector addition and associative w.r.t.

scalar multiplication, but not generally associative (why not?).
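
The difference between the dot product and scalar multiplication can be illustrated with a short Python sketch. The function names `dot` and `scale` and the three sample vectors are assumptions made for illustration. The last two lines also hint at the question above: x • y is a real number, so an expression like (x • y) z can only be read as a scalar multiplication, and it generally differs from x (y • z).

```python
def dot(x, y):
    """Algebraic dot product of two equally long tuples of real numbers."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(xi * yi for xi, yi in zip(x, y))

def scale(lam, x):
    """Scalar multiplication: a real number times a vector gives a vector."""
    return tuple(lam * xi for xi in x)

x, y, z = (1, 2), (3, 4), (5, 6)

print(dot(x, y))              # 11, a real number
print(dot(x, y) == dot(y, x)) # commutativity
print(scale(dot(x, y), z))    # (55, 66), a multiple of z
print(scale(dot(y, z), x))    # (39, 78), a multiple of x
```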

2.2 Subspaces, Linear Combinations, and Linear Dependence

Considering subsets of a “universal” or “ambient” set often proves useful in mathematics.

For instance, we may like to consider only the integers and not the whole real line. It is possible to generalize the concept of a subset to the context of spaces (i.e. algebraically structured sets). If we are to do so, however, we do not want to lose the very structure we looked for when moving from the notion of a set to that of a space[4]. I first introduce a concept that will help us along this line, and then proceed with the notion of vector subspace.

[4] Remember, the structure will guarantee the valid extension of our geometrical insights!

Definition 2.4. (Closure Under an Operation)

Let X := (X,+, ·) be a real vector space. We say that Y ⊆ X is closed under the addition

if and only if, for any two elements y1 and y2 in Y, we have that y1 + y2 belongs to Y.

Similarly, we can define closure under scalar multiplication.

Definition 2.5. (Vector Subspace)

Let X := (X,+, ·) be a real vector space and Y a non empty subset of X. We say that Y is

a subspace of X if and only if Y is closed under vector addition and scalar multiplication.

Definition 2.6. (Linear Combination)

Let u, v be two nonzero vectors in a vector subspace. A linear combination of the vectors

has the form

λu+ µv with λ, µ ∈ R.

Remark 1. If you understood properly the idea of closure, you should be able to conclude

that a simple way to check whether Y is a subspace of X or not is to verify or falsify the

following statement:

∀y1,y2 ∈ Y ∀λ, µ ∈ R λy1 + µy2 ∈ Y

This means that a subspace of a real vector space is a nonempty subset that contains every linear combination of any two of its elements! Keep that in mind!
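
The test in this remark can be tried out numerically. The following Python sketch compares two hypothetical subsets of R^2 chosen for illustration: a line through the origin, {(t, 2t)}, and the same line shifted upwards, {(t, 2t + 1)}. Note that checking finitely many samples can only falsify the statement, never prove it.

```python
from itertools import product

def on_line_through_origin(p):
    """Membership test for Y1 = {(t, 2t) : t real}."""
    return p[1] == 2 * p[0]

def on_shifted_line(p):
    """Membership test for Y2 = {(t, 2t + 1) : t real}."""
    return p[1] == 2 * p[0] + 1

def lin_comb(lam, y1, mu, y2):
    """Return lam*y1 + mu*y2 for vectors in R^2."""
    return (lam * y1[0] + mu * y2[0], lam * y1[1] + mu * y2[1])

def closed_on_samples(member, points, scalars):
    """Spot check: do all sampled linear combinations stay in the set?"""
    return all(
        member(lin_comb(lam, p, mu, q))
        for p, q in product(points, repeat=2)
        for lam, mu in product(scalars, repeat=2)
    )

scalars = [-2, 0, 1, 3]                       # integers keep the arithmetic exact
pts1 = [(t, 2 * t) for t in (-1, 0, 2)]       # sample points of Y1
pts2 = [(t, 2 * t + 1) for t in (-1, 0, 2)]   # sample points of Y2

print(closed_on_samples(on_line_through_origin, pts1, scalars))  # True
print(closed_on_samples(on_shifted_line, pts2, scalars))         # False
```

The shifted line already fails for λ = µ = 0, since the resulting null vector (0, 0) does not lie on it.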

Remark 2. Any subspace of a vector space is itself a vector space. Further, note that the

entire space X is a subspace of X as X is by definition a subset of itself and is closed under

scalar multiplication and addition. A subspace not equal to the entire space is called a proper

subspace.

For instance, we may think of the space of convergent infinite real sequences as a proper

subspace of that of infinite real sequences mentioned above:

Example 2.3. Prove that the space of convergent infinite real sequences, L, constitutes a

proper subspace of the space of infinite real sequences.

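Proof. Let (a_n), (b_n) ∈ L and λ, µ ∈ R. Then there exist a, b ∈ R with a = lim_{n→∞} a_n and b = lim_{n→∞} b_n. From the laws on limits of sequences, it follows that λa_n → λa ∈ R and µb_n → µb ∈ R for n → ∞, and lastly λa_n + µb_n → λa + µb ∈ R. Hence, a linear combination of convergent infinite real sequences is also a convergent infinite real sequence and hence an element of L, so L is a subspace. It is a proper subspace because not every real sequence converges: for instance, ((−1)^n) belongs to the space of infinite real sequences but not to L.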

Theorem 2.2. (Intersection and Addition of Subspaces)

Let M and N be subspaces of a real vector space X. Then:

(i) their intersection, M ∩ N, is a subspace of X.

(ii) their sum, M + N, is a subspace of X.


Remark 3. Note that nothing is said about the union!! (Counterexample?)

Summing up, we have defined vector spaces, vector subspaces, and argued that any linear combination of vectors in a vector (sub)space also lies in that (sub)space. The next result

establishes a converse proposition: linear combinations can be used to construct a subspace

from an arbitrary set of vectors in a vector space.

Theorem 2.3. (Generated Subspace (a.k.a Span))

Let Y be a subset of a real vector space X. Then, the set Span(Y), which consists of all

vectors in X that can be expressed as linear combinations of vectors in Y, is a subspace of X.

It is called the subspace generated by Y or span of Y and it is the smallest subspace which

contains Y.

Proof. I will not give the complete proof here as notation is rather cumbersome, which makes

it a good opportunity for you to practice precise writing. Note, however, that there are two

parts to prove: First, we need to show that Span(Y) is a subspace of X (i.e. it is a subset

and it is closed under vector addition and scalar multiplication). Secondly, we need to show that it is the smallest subspace which contains Y. For this last part, assume that there is a smaller subspace Z which contains Y, i.e. Y ⊂ Z and there is a y ∈ Span(Y) such that y ∉ Z, and lead this to a contradiction.

Example 2.4. Let Y = {(1, 0), (0, 1)}. Then any point in the plane can be represented as a linear combination of these two elements of Y, and vice versa: Span({(1, 0), (0, 1)}) = R^2. However, Y is not the only set to span this plane. Note that we also have Span({(2, 0), (0, 2)}) = Span({(1, 0), (2, 0), (0, 0.5)}) = R^2. In the following, we aim to find the smallest set which spans a vector space.
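
As a worked check of these three spans, an arbitrary point of the plane can be written out explicitly in each generating set. The symbols a and b below denote the coordinates of that arbitrary point and are introduced here for illustration only.

```latex
\begin{align*}
(a, b) &= a\,(1, 0) + b\,(0, 1), \\
(a, b) &= \tfrac{a}{2}\,(2, 0) + \tfrac{b}{2}\,(0, 2), \\
(a, b) &= a\,(1, 0) + 0\,(2, 0) + 2b\,(0, 0.5)
        = 0\,(1, 0) + \tfrac{a}{2}\,(2, 0) + 2b\,(0, 0.5).
\end{align*}
```

The last line shows that, with the three-element set, the coefficients are not unique, which suggests that (1, 0) and (2, 0) are not both needed as generators.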

Example 2.5. Let u, v ∈ R^n be nonzero. Then, Span(u, v) = {λu + µv : λ, µ ∈ R}. If u is a multiple of v, then Span(u, v) is simply the line spanned by u, and Span(u, v) = Span(u) = Span(v). However, if u is not a multiple of v, then Span(u, v) is a two-dimensional plane, which contains the lines Span(u) and Span(v).

Definition 2.7. (Linear Dependence, Linear Independence)

Let x be an element of a real vector space X. x is said to be linearly dependent upon a set S

of vectors of X if it can be expressed as a linear combination of vectors from S. Equivalently,

x is linearly dependent upon S if and only if x ∈ Span(S). If that is not the case, the vector

x is said to be linearly independent of the set S. Finally, a set of vectors is said to be a

linearly independent set if each vector of the set is linearly independent of the remainder of

the set.

Remark 4. Thus, two vectors in Rn are linearly independent if they do not lie on a common

line through the origin, three vectors in Rn are linearly independent if they do not lie in a

plane through the origin, etc...


Remark 5. Clearly, 0 is dependent on any given vector x (Why?). By convention, the set

consisting of 0 only is understood to be a dependent set and a set consisting of a single

nonzero vector an independent set.

Theorem 2.4. (Testing Linear Independence)

A necessary and sufficient condition for the set of vectors x_1, x_2, ..., x_n to be linearly independent is that:

If ∑_{k=1}^{n} λ_k x_k = 0, then λ_k = 0 for all k = 1, 2, ..., n.

Proof. (Necessary part; by contradiction)

Let the set of vectors x_1, x_2, ..., x_n be linearly independent and assume there is a λ_k different from zero in the above sum. For simplicity, name the vectors so that this λ_k is the one corresponding to the nth vector. Then,

∑_{k=1}^{n} λ_k x_k = 0 ⇔ ∑_{k=1}^{n−1} λ_k x_k = −λ_n x_n

Therefore, since λ_n ≠ 0, the following holds:

x_n = −∑_{k=1}^{n−1} (λ_k / λ_n) x_k

That is, x_n is a linear combination of the remaining vectors, which contradicts our initial assumption.

Example 2.6. In R^3, the following vectors are linearly dependent: v_1 = (1, 2, 3), v_2 = (2, 1, 5) and v_3 = (8, 10, 22). This is so because

2v_1 + v_2 − (1/2)v_3 = 0.
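
For finitely many vectors in R^n, the test of Theorem 2.4 can be mechanized. The sketch below is one possible way to do it, not the method used in these notes: it swaps in Gaussian elimination (in exact rational arithmetic) and uses the standard fact that k vectors are linearly independent exactly when the matrix having them as rows has rank k. The function names `rank` and `is_linearly_independent` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination in exact arithmetic."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    r = 0  # number of pivots found so far
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """A finite set is independent iff its rank equals its number of vectors."""
    return rank(vectors) == len(vectors)

v1, v2, v3 = (1, 2, 3), (2, 1, 5), (8, 10, 22)

# The combination from Example 2.6: 2*v1 + v2 - (1/2)*v3.
combo = tuple(2 * a + b - Fraction(1, 2) * c for a, b, c in zip(v1, v2, v3))

print(combo == (0, 0, 0))                     # True
print(is_linearly_independent([v1, v2, v3]))  # False
print(is_linearly_independent([v1, v2]))      # True
```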

Finally, the following definition will come in handy if you are to use tools of advanced calculus.

Definition 2.8. (Basis and Space Dimension)

A finite set S of linearly independent vectors is said to be a basis for the space X if S

generates X. A vector space having a finite basis is said to be finite dimensional. All other

vector spaces are said to be infinite dimensional.

Example 2.7. Consider R^3 with coordinate axes x, y, and z. Then any 3-tuple of independent 3-dimensional vectors spans the whole space R^3. A particular example, known as the canonical basis, is the following set of three vectors: {(1, 0, 0); (0, 1, 0); (0, 0, 1)}

Theorem 2.5. (Uniqueness of the Dimension)

Any two bases for a finite dimensional vector space contain the same number of elements.


Proof. (The idea only)

The result is quite intuitive. Assume it is not the case, i.e., one basis, say, basis 1, has n elements and another basis, say, basis 2, has m elements, with m ≠ n. Without loss of generality, assume m < n. Because basis 2 is a basis, you may express every element of basis 1 as a linear combination of the elements of basis 2. And because the elements of basis 1 are independent, you can one by one substitute the m elements of basis 2 by m elements of basis 1 so expressed, and the resulting set still generates the whole space. Thus, only m elements of basis 1 suffice to generate the whole space, so the remaining n − m elements of basis 1 are linear combinations of them. This contradicts the independence of basis 1, and it must be that n = m.

Example 2.8. Any pair of independent vectors in R^n generates a plane, which is thus a finite dimensional space of dimension two. No finite collection of vectors will suffice to generate Cn(X), the set of all bounded and continuous real-valued functions with domain X ⊆ R^n. It is thus an infinite dimensional space.

3 Normed Vector Spaces and Continuity

At the beginning of this lecture, I reminded you of school mathematics, where a vector is

often introduced as an entity with direction and magnitude. For example, a vector could tell

you to go e.g. 4 km in the direction East. Once you reach that point, another vector could

tell you to go 3 km in the direction North. Once you also reach that point you look back

at the origin and wonder whether you could not have reached your goal more easily. By “more easily”, I mean that you could have taken a shorter route: going straight from the origin to the point (4, 3) in the Cartesian coordinate system covers a distance of only 5 units, whereas the roundabout route has a length of 7 units (remember Pythagoras?).

In the school-math definition, vectors have direction and length (i.e. magnitude). However, in our definition of vector spaces, we have only introduced addition and scalar multiplication. We need a new definition to formally introduce the concept of distance.

3.1 Distance and Norm in a Vector Space

Many basic mathematical concepts are very intuitive. For sure, the concept of a distance

function is one of the most intuitive that could be. Consider two objects that stand near you, and ask yourself what properties you would like a distance function to have if it were to give you the distance between these two objects. You would probably wish to set a conventional minimal distance for cases where the two objects considered lie at the same place (i.e., are the same object). Thus, a first wish could be that the function always yields a non-negative output. Second, it seems natural to consider only functions whose output is robust to changes in the direction of measurement: whether one starts from object 1 and measures all the way to object 2, or whether one proceeds the other way around, the output should be the same. Finally, a third natural requirement is the following: when asked (i) to measure the distance between objects 1 and 2 and (ii) to measure the distance between objects 1 and 2 while being required to pass by object 3, one should expect the outcome from (i) to be, in some sense, “smaller” (that is, no larger) than the outcome from (ii). As the following formal definition will show,

these three properties are exactly what defines, in the eyes of mathematicians, a distance

function.

Definition 3.1. (Metric Space)

Let X be a vector space. If we can define a real-valued function d(., .) which maps any two

elements x and y in X into a real number d(x, y), and if that function is such that:

(i) ∀x, y ∈ X, d(x, y) ≥ 0, d(x, y) = 0 if and only if x = y, (non-negativity)

(ii) ∀x, y ∈ X, d(x, y) = d(y, x), (symmetry)

(iii) ∀x, y, z ∈ X d(x, y) ≤ d(x, z) + d(z, y), (triangle inequality)

then d(., .) is called a distance function[5] for X and (X, d(., .)) a metric space.

Remark 1. Note that the possibility to define such a function is not guaranteed under all circumstances, which implies a loss of generality. But do not worry too much about that: most economic applications can make use of metric spaces, and if not, then generalizations exist!

Example 3.1. An important example of a distance function in R is the absolute value of

the difference: d(x, y) = |x− y|.
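
The three defining properties can be spot-checked for this distance function with a few lines of Python. This is only a sanity check on a handful of sample points (the list `samples` is an arbitrary choice made for illustration), not a proof.

```python
from itertools import product

def d(x, y):
    """Distance on the real line from Example 3.1."""
    return abs(x - y)

samples = [-3, -1, 0, 2, 5]  # integers, so all comparisons are exact

# (i) non-negativity, with d(x, y) = 0 exactly when x = y
non_negative = all(d(x, y) >= 0 and ((d(x, y) == 0) == (x == y))
                   for x, y in product(samples, repeat=2))
# (ii) symmetry
symmetric = all(d(x, y) == d(y, x) for x, y in product(samples, repeat=2))
# (iii) triangle inequality
triangle = all(d(x, y) <= d(x, z) + d(z, y)
               for x, y, z in product(samples, repeat=3))

print(non_negative, symmetric, triangle)  # True True True
```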

The attentive reader may claim to have been fooled. Indeed, I consciously passed over another important property which we would like distance functions to possess. Assume one picks up the two objects you considered earlier and translates them by 1 meter, i.e. moves both of them by 1 meter in a parallel fashion and in the same direction. Would you expect the distance between the two objects to have changed? Certainly not. Well, this requirement is not imposed in the above definition, and there is thus no reason that it be fulfilled. (We will see an example at the end of this section!) One strategy to impose this further requirement is to define the distance via a norm. Let us detail that strategy.

Definition 3.2. (Normed Space)

Let X be a vector space. If we can define a real-valued function ‖.‖ which maps each element

x in X into a real number ‖x‖, and if that function is such that:

(i) ∀x ∈ X, ‖x‖≥ 0, ‖x‖= 0 if and only if x = 0, (non-negativity)

(ii) ∀x, y ∈ X, ‖x+ y‖≤ ‖x‖+‖y‖, (triangle inequality)

(iii) ∀x ∈ X ∀λ ∈ R, ‖λx‖ = |λ|‖x‖, (absolute homogeneity[6])

then ‖.‖ is called a norm for X and (X, ‖.‖) a normed space.

[5] a.k.a. metric.

Remark 2. From point (ii) it is easy to derive the following fact (do it!):

‖x− y‖≥ ‖x‖−‖y‖

Example 3.2. An important example for us is the Euclidean norm in Rn:

∀x = (x_1, ..., x_n) ∈ R^n: ‖x‖ := (∑_{i=1}^{n} x_i^2)^{1/2} = (x′x)^{1/2}

But this is certainly not the only one. We can define, for instance, the norm of a matrix as

follows:

∀A ∈ M_{m×n}: ‖A‖ := max_{x ∈ R^n, ‖x‖ = 1} ‖Ax‖

The Euclidean norm of a classical vector is what I above called its length or magnitude, but

norms are a more general concept than the geometric picture and intuitive meaning that

probably came to your mind when reading the word length.
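
As a small numerical illustration of the Euclidean norm, the Python sketch below revisits the walk from the beginning of this section (4 km East, then 3 km North). The function name `euclidean_norm` and the variable names are chosen here for illustration.

```python
import math

def euclidean_norm(x):
    """Euclidean norm of an n-tuple of real numbers."""
    return math.sqrt(sum(xi * xi for xi in x))

east = (4, 0)    # 4 km East
north = (0, 3)   # 3 km North
total = tuple(e + n for e, n in zip(east, north))  # vector addition: (4, 3)

direct = euclidean_norm(total)                             # straight route
roundabout = euclidean_norm(east) + euclidean_norm(north)  # via the corner

print(direct, roundabout)  # 5.0 7.0
```

The comparison direct ≤ roundabout is one instance of the triangle inequality (ii) in Definition 3.2.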

Remark 3. For a more thorough introduction into the important properties of the Euclidean

vector space (Rn), see Simon & Blume (1994): Mathematics for Economists, Chapter 10.

There are also a number of exercises, especially after chapter 10.3 and 10.4, worth exploring.

The solutions can be found online.

Norms relate to distance functions as follows: on a given vector space every norm defines

a distance function! In other words, the existence of a norm is slightly more demanding than

that of the distance function and, as a consequence, any normed vector space necessarily is a

metric space (while the converse need not be true!). To get a bit of intuition, let us consider

a distance function between any x in X and the zero element of X (i.e. the origin). By the

above definition of a distance, we have:

(i) ∀x ∈ X, d(x, 0) ≥ 0, d(x, 0) = 0 if and only if x = 0,

(ii) ∀x ∈ X, d(x, 0) = d(0, x),

(iii) ∀x, y ∈ X d(x, y) ≤ d(x, 0) + d(0, y).

Now the two concepts look even more similar! It is thus worth asking the following question: does d(., 0) define a norm in X? Clearly, the non-negativity property is fulfilled. Yet, nothing here guarantees the absolute homogeneity property, nor the triangle inequality of a norm, which bounds d(x + y, 0) rather than d(x, y). Thus, unless we add some more requirements, d(., 0) need not define a norm in X. However, the following result can actually be shown:

[6] You’ll understand in chapter 3 why it is called this way. Keep the name in mind! ;)

Theorem 3.1. (Norm vs. Distance)

Let (X, ‖.‖) be a normed vector space. Then, the function

d(x, y) = ‖x− y‖ ∀x, y ∈ X

is a distance function in X. Further, d(., .) exhibits the following extra properties:

(i) ∀x, y ∈ X ∀λ ∈ R d(λx, λy) = |λ|d(x, y), (absolute homogeneity)

(ii) ∀x, y, z ∈ X d(x+ z, y + z) = d(x, y). (translation invariance)

Conversely, a metric d(., .) that exhibits the extra properties of absolute homogeneity and

translation invariance defines a norm by defining the norm of an element x as its distance

from the origin.
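
The two extra properties of Theorem 3.1 can be observed numerically. The sketch below assumes the Euclidean norm on R^2 as the underlying norm; the helper names and the sample vectors are hypothetical, and `math.isclose` is used because floating-point arithmetic is not exact in general.

```python
import math

def norm(x):
    """Euclidean norm on R^2 (any other norm could be used instead)."""
    return math.sqrt(sum(xi * xi for xi in x))

def d(x, y):
    """Distance induced by the norm: d(x, y) = ||x - y||."""
    return norm(tuple(xi - yi for xi, yi in zip(x, y)))

def add(x, y):
    """Vector addition in R^2."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def scale(lam, x):
    """Scalar multiplication in R^2."""
    return tuple(lam * xi for xi in x)

x, y, z, lam = (1.0, 2.0), (4.0, 6.0), (10.0, -7.0), -3.0

print(d(x, y))  # 5.0

# (ii) translation invariance: d(x + z, y + z) = d(x, y)
print(math.isclose(d(add(x, z), add(y, z)), d(x, y)))

# (i) absolute homogeneity: d(lam x, lam y) = |lam| d(x, y)
print(math.isclose(d(scale(lam, x), scale(lam, y)), abs(lam) * d(x, y)))
```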

Example 3.3. (The French Railway Metric is not translation invariant)

Consider the following example in R2:

d(x, y) = ‖x‖ + ‖y‖ if x and y are independent,

d(x, y) = ‖x − y‖ if x and y are collinear,

with the bars denoting the Euclidean norm. You should verify that it satisfies the three properties defining a distance function. Yet, it is not translation invariant: if we take, for instance, a strictly positive vector z in R^2 and two independent strictly positive vectors x, y such that x + z and y + z are also independent, then

d(x + z, y + z) = ‖x + z‖ + ‖y + z‖ > ‖x‖ + ‖y‖ = d(x, y),

where the strict inequality holds because adding a strictly positive vector to a strictly positive vector increases its Euclidean norm.

For those who want a bit of intuition here, that distance function is simply forced to pass through the origin when measuring the distance between two points that are not contained in a single ray from the origin. It is called the French Railway Metric because it used to be almost true that, in France, if you were to travel between two cities that are not contained in a single ray from Paris, then you had to travel through Paris. See, for instance, the figure below, where to go from T (Toulouse) to B (Barcelona) you can proceed without going through Paris, while to go from T to B’ (Bordeaux), you need to go through Paris. Then, if one translates the origin to, say, Mannheim, the distance between Toulouse and Barcelona, or between Toulouse and Bordeaux, as measured by the French Railway metric, will change!
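
The failure of translation invariance can be reproduced in a few lines of Python. The sketch follows the definition in Example 3.3; the function names (`d_rail`, `collinear`, `translate`) and the particular vectors x, y, z are assumptions made for illustration. Integer coordinates are used so that the collinearity test is exact.

```python
import math

def norm(x):
    """Euclidean norm on R^2."""
    return math.sqrt(x[0] ** 2 + x[1] ** 2)

def collinear(x, y):
    """True if x and y lie on a common line through the origin."""
    return x[0] * y[1] - x[1] * y[0] == 0

def d_rail(x, y):
    """French Railway Metric from Example 3.3."""
    if collinear(x, y):
        return norm((x[0] - y[0], x[1] - y[1]))
    return norm(x) + norm(y)  # the route is forced through the origin

def translate(x, z):
    """Shift the point x by the vector z."""
    return (x[0] + z[0], x[1] + z[1])

x, y, z = (3, 4), (4, 3), (1, 1)   # strictly positive, x and y independent

before = d_rail(x, y)                              # 5 + 5
after = d_rail(translate(x, z), translate(y, z))   # 2 * sqrt(41)

print(before, after, after > before)
```

Here both points were shifted by the same vector z, yet the measured distance grew.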

3.2 Open sets, Closed sets, Compact sets

The existence of a metric on a vector space allows us to introduce a number of very important

concepts. Their importance might not seem clear at this stage, but I promise you that you

will use them very often in this course and the following. So it is worthwhile developing a

firm understanding and good intuition for their meaning.


Figure 2: The French Railway Metric is not Translation Invariant

Definition 3.3. (ε-Open Ball)

Let (X, d(., .)) be a metric space, x_0 be an element of X, and ε be a strictly positive real number. The ε-open ball B_ε(x_0) centered at x_0 is the set of points whose distance from x_0 is strictly smaller than ε, that is:

B_ε(x_0) = {x | x ∈ X, d(x, x_0) < ε}.

Definition 3.4. (ε-Closed Ball)

Let (X, d(., .)) be a metric space, x0 be an element of X, and ε be a strictly positive real

number. The ε-closed ball Bε[x0] centered at x0 is the set of points whose distance from x0 is smaller than or equal to ε, that is:

Bε[x0] = {x|x ∈ X, d(x, x0) ≤ ε}.

Example 3.4. You can see any open interval (a, b) with a, b ∈ R and a < b as an open ball of radius (b − a)/2 centered at (a + b)/2 in R. Similarly, any closed interval [a, b] with a, b ∈ R and a < b can be seen as a closed ball in R, with the same center and radius. Note that neither can be seen as a closed or open ball if the universal space has a dimension higher than that of R.
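To make the two ball definitions concrete, here is a small sketch in R with the distance |u − v|. The function names and the interval endpoints are assumptions chosen for illustration.

```python
def in_open_ball(x, center, eps, dist):
    """x lies in the eps-open ball around center: d(x, center) < eps."""
    return dist(x, center) < eps

def in_closed_ball(x, center, eps, dist):
    """x lies in the eps-closed ball around center: d(x, center) <= eps."""
    return dist(x, center) <= eps

def abs_dist(u, v):
    """Usual distance on the real line."""
    return abs(u - v)

# The interval (a, b) seen as a ball of radius (b - a)/2 around (a + b)/2.
a, b = 1.0, 4.0
center, eps = (a + b) / 2, (b - a) / 2

print(in_open_ball(2.0, center, eps, abs_dist))   # interior point of (a, b)
print(in_open_ball(a, center, eps, abs_dist))     # endpoint a is excluded
print(in_closed_ball(a, center, eps, abs_dist))   # endpoint a is included
```

The only difference between the two membership tests is the strict versus weak inequality, which is exactly what decides whether the endpoints a and b belong to the ball.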

Coming back to geometrical intuition, wherein a set corresponds to a “geographical area” in the universal space, we can informally grasp the concepts of open and closed sets. Namely, an area may or may not have a boundary. Assume it has one. If that boundary is considered to belong to the area, then we call the area a closed area. If, on the contrary, the boundary is not considered to belong to the area, we call the area an open area. If part of the boundary belongs to the area while some other part does not, then the area is neither closed nor open! Finally, by convention, if no boundary exists, as for the empty set or the universal set, the area is said to be both closed and open. The next definitions use the concepts of open and closed balls to formalize these ideas.


Definition 3.5. (Interior Point, Interior)

Let A be a subset of a metric space X. The point a in A is said to be an interior point of A

if and only if there exists ε > 0 such that the ε-open ball centered at a lies entirely inside A.

The collection of all interior points of A is called the interior of A, denoted Int(A) or A°.

Definition 3.6. (Open Set)

Let A be a subset of a metric space X. A is said to be an open set if and only if A = Int(A).

Remark 1. Hence, any set A contains its interior, but the converse inclusion holds if and only if A is open.

Definition 3.7. (Closure Point, Closure)

Let A be a subset of a metric space X. The point x in X is said to be a closure point of A if

and only if, for every ε > 0, the ε-open ball centered at x contains at least one point a that

belongs to A. The collection of all closure points of A is called the closure of A, denoted Ā.

Definition 3.8. (Closed Set)

Let A be a subset of a metric space X. A is said to be a closed set if and only if A = Ā.

Hence, any set A is included in its closure, but the converse inclusion holds if and only if A is closed. We can now characterize the boundary as the set of elements such that, if they all belong to A, then A is closed, and, if none of them belongs to A, then A is open.

Definition 3.9. (Boundary Point, Boundary)

Let A be a subset of a metric space X. The point x in X is said to be a boundary point of

A if and only if, for every ε > 0, the ε-open ball centered on x contains at least one point a

that belongs to A and at least one point a^c that belongs to the complement of A, A^C. The

collection of all boundary points of A is called the boundary of A and denoted ∂A.

We may now rephrase our concepts of open and closed sets as follows:

• Let A be a subset of a metric space X. Then A is open if and only if none of the

boundary points of A lie in A: A ∩ ∂A = ∅.

• Let A be a subset of a metric space X. Then A is closed if and only if all the boundary

points of A lie in A: A ∩ ∂A = ∂A.

The following results will help you determine whether a given set is open or closed.

Theorem 3.2. (Properties of Open Sets)

Let (X, d(., .)) be a metric space. Then

(i) ∅ and X are open in X.

(ii) A set A is open if and only if its complement is closed.

(iii) The union of an arbitrary (possibly infinite) collection of open sets is open.

(iv) The intersection of a finite collection of open sets is open.


Figure 3: Closed sets, open sets, boundary

Theorem 3.3. (Properties of Closed Sets)

Let (X, d(., .)) be a metric space. Then

(i) ∅ and X are closed in X.

(ii) A set A is closed if and only if its complement is open.

(iii) The union of a finite collection of closed sets is closed.

(iv) The intersection of an arbitrary (possibly infinite) collection of closed sets is closed.

3.3 Continuity

If you remember the preliminary chapter, we associated continuity with the requirement that the images of two nearby points should not stand too far apart from one another. The purpose of this section is to make this statement formal within the context of vector spaces. We now have the tools at hand to define and generalize the terms nearby and too far.

Definition 3.10. (Continuous Function)

A function f mapping from a metric space (X, dX(., .)) to a metric space (Y, dY(., .)) is continuous at x0 ∈ X if and only if, for every ε > 0, there is a δ > 0 such that, for all x ∈ X, dX(x, x0) < δ implies dY(f(x), f(x0)) < ε. A function that is continuous at every point of its domain is said to be continuous.

Remark 1. According to this definition, a function is continuous at some point x0 if, for any element contained in a δ-open ball around x0, we can be sure that the images of both this element and x0 are (i) well defined, and (ii) contained in an ε-open ball around f(x0), where ε can be chosen as small as we wish, provided we then select a small enough δ.

In fact, one can also generalize the useful characterization of continuity that we have

seen in the preliminary chapter, i.e., the characterization in terms of converging sequences.

This is done by first generalizing the notion of convergence to the context of metric spaces,

which is achieved rather easily by noting that, independently of our space’s dimension, the

distance function maps into the real line!


Definition 3.11. (Convergence)

Let (X, d(., .)) be a metric space. The infinite sequence of vectors {xn}n∈N in X is said to converge to a vector x ∈ X if the sequence {d(xn, x)}n∈N of real numbers converges to 0. That is,

∀ε > 0 ∃N ∈ N ∀n > N : d(xn, x) < ε.

In this case, we write xn → x as n → ∞.
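The definition reduces convergence of vectors to convergence of the real numbers d(xn, x). As a minimal sketch, assume the illustrative sequence xn = (1/n, 1 + 1/n) in R2 with the Euclidean distance and candidate limit (0, 1); the sequence and the helper names are hypothetical examples, not taken from the notes.

```python
import math

def dist(u, v):
    """Euclidean distance between two points of R^2."""
    return math.hypot(u[0] - v[0], u[1] - v[1])

def x_n(n):
    """Illustrative sequence x_n = (1/n, 1 + 1/n)."""
    return (1.0 / n, 1.0 + 1.0 / n)

limit = (0.0, 1.0)

def first_index_within(eps, max_n=10**6):
    """Smallest n with d(x_n, limit) < eps, or None if none is found.

    For this sequence d(x_n, limit) = sqrt(2)/n decreases in n, so every
    later term is within eps as well.
    """
    for n in range(1, max_n + 1):
        if dist(x_n(n), limit) < eps:
            return n
    return None

for eps in (0.1, 0.01):
    print(eps, first_index_within(eps))
```

For each ε the search returns an index beyond which all terms stay within ε of the limit, which is the role played by N in the definition (any N at least as large as the returned index minus one works).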

The fact that the limit of a converging sequence is unique in a metric space is easily shown using the nonnegativity, symmetry, and triangle inequality properties of a metric. Suppose xn → x and xn → y. Then, for all n ∈ N, 0 ≤ d(x, y) ≤ d(x, xn) + d(xn, y). The right-hand side converges to 0, so d(x, y) = 0 by the squeeze theorem (a.k.a. sandwich theorem7), and hence x = y.

Theorem 3.4. (Characterization of Continuity)

A function f mapping from a metric space (X, dX(., .)) to a metric space (Y, dY(., .)) is continuous at x0 ∈ X if and only if, for every sequence {xn}n∈N in X, xn → x0 implies f(xn) → f(x0).
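The sequential characterization gives a practical way to detect a discontinuity: it suffices to find one sequence xn → x0 whose images do not converge to f(x0). The sketch below works in R with the distance |u − v|; the functions `f` and `step` are illustrative assumptions.

```python
def f(x):
    """A continuous function on R."""
    return x * x

def step(x):
    """A step function that jumps at 0 (illustrative example)."""
    return 0.0 if x < 0 else 1.0

# A sequence converging to x0 = 0 from the left: x_n = -1/n.
x0 = 0.0
seq = [-1.0 / n for n in range(1, 10001)]

# Distance between the image of the last computed term and the image of x0.
gap_f = abs(f(seq[-1]) - f(x0))
gap_step = abs(step(seq[-1]) - step(x0))

print(gap_f)     # tiny: f(x_n) approaches f(0)
print(gap_step)  # stays at 1: step(x_n) does not approach step(0)
```

A finite computation like this can only suggest the limiting behaviour. For `step`, however, every term of the sequence has image 0 while the image of x0 is 1, so the gap is exactly 1 for all n and the function is not continuous at 0.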

7 If you have a sandwich inequality a ≤ x ≤ b and a and b converge to the same limit, then x converges to that same limit too.


4 Convex Sets and the Separating Hyperplane Theorem

Let’s take a step back: we defined vector spaces and subspaces and covered a range of

properties. Then we moved on to the concepts of distance and norms. The Euclidean norm

is closely linked to the realm of geometry, and it is here that I would now like to open a parenthesis and cover topics which you will encounter frequently in your economics classes and for which it is possible to develop a geometric intuition.

The concept of a subspace introduced above is fundamental. Yet, in many applications, it remains, “geometrically” speaking, too restrictive a concept, and we need to make a trade-off. While, on the one hand, we would be happy to keep as much as possible of the algebraic structure introduced so far, on the other hand, we must frequently work with subsets of vector spaces which are not themselves subspaces and which we are not willing to replace with their span. For instance, in optimization, we most often face constraints on our vectors, constraints which define the subset of the ambient space over which we are maximizing. Of course, because these constraints arise from our real environment, there is a priori no reason that the region they define should constitute a subspace, i.e., that it should be closed under arbitrary linear combinations. And yet substituting the subspace spanned by the constraint set for the constraint set itself would precisely nullify our attempts to model the constraints imposed by the environment.

I hope you perceived that, in the previous section, we established a strong relation between linear combinations and subspaces. I will now present the concept of convex combinations, which relates to convex sets. It imposes an algebraic restriction on the already discussed concept of linear combinations, which of course comes as a loss on the algebraic dimension. But keep in mind that it is also the only way to gain in the “geometrical” dimension, that is, to enlarge our collection of “workable” sets. Indeed, instead of requiring from our sets that they contain all linear combinations of their elements, we will ask them to contain only a specific subset of the linear combinations of their elements. Such an algebraic requirement being less demanding, our class of “satisfactory” sets will be larger.

4.1 Convex Sets

In economics, many optimization problems come with inequality constraints (e.g., money spent on consumption must be smaller than or equal to the monetary income). The concept of convex sets helps us deal with that issue, at least in many relevant cases. It is associated with the concept of convex combination and arises naturally when the constraints of our optimization problem are convex functions8.

8 We’ll define what that means in the next chapter ;)

Definition 4.1. (Convex Combination)

A convex combination of the vectors x1, x2, ..., xn is a linear combination of the vectors, i.e., a sum

$$\sum_{i=1}^{n} \lambda_i x_i, \quad \lambda_i \in \mathbb{R},$$

such that the following additional requirements hold:

$$\sum_{i=1}^{n} \lambda_i = 1 \quad \text{and} \quad \forall i \;\; \lambda_i \in [0, 1].$$

Example 4.1. If n = 2, a convex combination of the vectors x, y is any linear combination λ1x + (1 − λ1)y with λ1 ∈ [0, 1]. If furthermore x, y ∈ R with x < y, then x ≤ λ1x + (1 − λ1)y ≤ y, and the set of all these convex combinations describes the line segment between these two points.
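The two requirements on the weights are easy to check mechanically. The following sketch, assuming vectors are stored as tuples of floats, computes a convex combination and rejects weights that violate the definition; the function name and the tolerance `tol` are illustrative choices.

```python
def convex_combination(points, weights, tol=1e-9):
    """Return sum_i weights[i] * points[i] after checking the weights.

    Raises ValueError unless every weight lies in [0, 1] and the weights
    sum to 1 (up to the numerical tolerance tol).
    """
    if len(points) != len(weights):
        raise ValueError("need exactly one weight per point")
    if any(w < -tol or w > 1 + tol for w in weights):
        raise ValueError("every weight must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    dim = len(points[0])
    return tuple(
        sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim)
    )

# A point three quarters of the way from (0, 0) to (4, 2).
print(convex_combination([(0.0, 0.0), (4.0, 2.0)], [0.25, 0.75]))
```

Weights such as (1.5, −0.5) still sum to 1 and so define a linear combination on the line through the two points, but the function rejects them because the result lies outside the segment.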

Definition 4.2. (Characterization of Convex Sets)

Let Y be a subset of a vector space X. Y is convex if and only if every convex combination of any two of its elements is contained in Y.

Remark 1. Hence, to verify whether a set is convex or not, one has to make sure that the line segment joining any two different points of the set is fully contained in it!

Figure 4: Convex and non-convex sets
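The line-segment test in the remark can be tried out numerically. The sketch below samples points on a segment and checks membership; the two sets (a disk and an annulus in R2) are assumed examples, and a sampled check can only refute convexity, never prove it.

```python
def in_disk(p):
    """Closed unit disk around the origin (a convex set)."""
    return p[0] ** 2 + p[1] ** 2 <= 1.0

def in_annulus(p):
    """Unit disk with a hole of radius 0.5 removed (not convex)."""
    return 0.25 <= p[0] ** 2 + p[1] ** 2 <= 1.0

def segment_stays_inside(x, y, member, steps=100):
    """Check (1 - t) x + t y for t = 0, 1/steps, ..., 1 against `member`."""
    for k in range(steps + 1):
        t = k / steps
        point = ((1 - t) * x[0] + t * y[0], (1 - t) * x[1] + t * y[1])
        if not member(point):
            return False
    return True

x, y = (-0.9, 0.0), (0.9, 0.0)   # both points belong to both sets
print(segment_stays_inside(x, y, in_disk))     # segment stays in the disk
print(segment_stays_inside(x, y, in_annulus))  # segment crosses the hole
```

One failing segment is enough to conclude that the annulus is not convex, whereas for the disk the sampled check is merely consistent with convexity.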

If a set is not convex, we may “convexify” it by adding the smallest possible collection of elements such that the resulting set is convex. This idea is very close to that of the span. When a subset is not a subspace, the points I have to add for it to become a subspace are exactly those which guarantee the closure of the set under linear combinations. To make a non-convex subset convex, I simply add those elements which guarantee the closure of the set under convex combinations.

Definition 4.3. (Convex Hull)

Let Y be a subset of a vector space X. The convex hull of Y, denoted Co(Y), is the smallest convex set containing Y.

The convex hull of Y may also be expressed as the set of all possible convex combinations

of the elements of Y:

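$$
\mathrm{Co}(Y) = \left\{ x \in X : \exists\, y_1, y_2, \dots, y_n \in Y \text{ and } \lambda \in [0, 1]^n \text{ s.t. } \sum_{i=1}^{n} \lambda_i = 1 \text{ and } x = \sum_{i=1}^{n} \lambda_i y_i \right\}
$$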

Figure 5: Convex Hulls

To conclude the subsection, here are some important properties that will, in some cases,

help you decide on the convexity of a given set:

Theorem 4.1. (Operations which preserve convexity)

Let C be the collection of all convex sets in the ambient space X. Let Ca be an arbitrary

collection of convex sets. Then:

(i) ∀ K ∈ C ∀α ∈ R: αK := {αk, k ∈ K} ∈ C

(ii) ∀ K, G ∈ C: K + G := {k + g, k ∈ K, g ∈ G} ∈ C

(iii) ∩K∈Ca K ∈ C

Proof. As an example, I show (ii): Let h1, h2 ∈ K + G. Then there exist k1, k2 ∈ K and g1, g2 ∈ G with hi = ki + gi, i = 1, 2. For any λ1, λ2 ∈ [0, 1] with λ1 + λ2 = 1, we have

λ1h1 + λ2h2 = λ1(k1 + g1) + λ2(k2 + g2) = (λ1k1 + λ2k2) + (λ1g1 + λ2g2).

By the convexity of K and G, the right-hand side is the sum of an element of K and an element of G. By the definition of K + G, it therefore belongs to K + G, which is hence a convex set.

Please keep in mind that every subspace is a convex set, but that the converse of this

statement is not true!!

4.2 Planes, halfspaces and the Separating Hyperplane Theorem

Remember the definition of the dot product in the first chapter? It might have seemed to you to “fall from the sky”, as if its only reason to exist were that it is the only type of multiplication actually defined on vectors and that it provides a handy notation. Well, there is more to it, and now that we know the Euclidean norm, we are able to state the following theorem.


Theorem 4.2. Let u, v be vectors in Rn. In the plane spanned by the two vectors, let θ be

the angle between them (see picture below). Then

u • v = ||u||·||v||cosθ.

Proof. See, e.g. Simon & Blume (1994), page 216.

Remark 1. From this theorem follows the important observation that if u and v are orthogonal, that is, if there is a right angle between them, then their dot product vanishes. For nonzero u and v: u • v = 0 ⇔ u ⊥ v. (Note: if u or v equals zero, then θ is not defined.)
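Solving the formula of Theorem 4.2 for θ gives a direct way to compute the angle between two nonzero vectors. The following is a minimal sketch with vectors as tuples; the helper names are illustrative, and the clamping step is only a guard against floating-point rounding.

```python
import math

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean norm, expressed through the dot product."""
    return math.sqrt(dot(u, u))

def angle(u, v):
    """Angle theta in [0, pi] with u . v = ||u|| ||v|| cos(theta)."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        raise ValueError("the angle is not defined for the zero vector")
    cos_theta = dot(u, v) / (nu * nv)
    # Clamp to [-1, 1] so that rounding errors cannot break acos.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

print(angle((1.0, 0.0), (0.0, 1.0)))   # right angle, pi/2
print(dot((1.0, 2.0), (-2.0, 1.0)))    # orthogonal vectors: dot product is 0
```

The function refuses the zero vector, in line with the note that θ is not defined in that case.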

This property allows us to get a graphical intuition for some more definitions from the

realm of geometry, which will turn out to be fundamentally important for the remainder of

the course: Lines and planes. The following definition introduces the more general concept

of hyperplanes9

Definition 4.4. (Hyperplane)

Let X be a subspace of Rn. Then, a hyperplane of X is a set of the form:

H_a^b := {x ∈ X | a • x = b}

where a is an element of Rn that is different from 0 and b is an element of R.

Remark 2. More often, one finds the equivalent notation a′x = b, where a′ denotes the transpose of the vector a and hence a′x is matrix multiplication: a′x = a1x1 + ... + anxn = a • x (see footnote 10).

9 If we were to introduce hyperplanes very formally, we would have to introduce affine sets. It is not a difficult concept and if you are interested, you may look up the definition in any textbook (or on Wikipedia) and I am sure you will have no problem understanding it. However, while the focus of the previous chapter was on introducing you to formal mathematical notation and logic, here I find it most important to develop an intuition and geometric visualization of the main idea behind it. In addition, economists generally use a specific characterization of hyperplanes to define them, which is valid for classical vector spaces (i.e., Euclidean spaces). This characterization is very important in its own right and I use it too as the definition.

10 This will become more clear in the next chapter.

Remark 3. Geometrically, the hyperplane H_a^b := {x ∈ X | a′x = b} can be interpreted as the set of points whose inner product with a given vector a is constant or, put otherwise, as a plane with normal vector a: a′x = b ⇔ a′x − b = 0 ⇔ a′(x − y) = 0, where y is any point of the hyperplane, i.e., any point such that a′y = b. The constant b determines the offset of the hyperplane from the origin. If b = 0, then the hyperplane passes through the origin.

Example 4.2. First, let’s look at R2. From high school, you know that a line has the form x2 = mx1 + b. That is, (−m, 1) • (x1, x2) = b, which fits the definition of a hyperplane. At this stage, it might be wise to recall that there is also another way of describing the line, called the parametric representation. Here, one makes use of the fact that a line is completely determined by a point x0 on the line and a direction vector v in which to move from x0; that is, a line can be represented by the equation:

x(t) = x0 + tv.

To see the equivalence, you can recover the first equation from the parameterization of a line through the point (0, b) in the direction (1, m): x(t) = (0, b) + t(1, m) = (t, b + tm). Reading off the coordinates gives x1 = t and x2 = b + mt = mx1 + b.

The parametric representation also works in higher dimension, and we will make use of this

when talking about derivatives. For example, the line in R3 through the point x0 = (2, 1, 3)

and in the direction v = (4,−2, 5) has the parameterization

x(t) = (2, 1, 3) + t(4,−2, 5) = (2 + 4t, 1− 2t, 3 + 5t).
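The parametric representation translates directly into code. This short sketch evaluates x(t) = x0 + tv for the line in R3 given above and also checks the link with the slope-intercept form in R2; the function name `line_point` and the values of m and b are illustrative assumptions.

```python
def line_point(x0, v, t):
    """Point x(t) = x0 + t * v on the line through x0 with direction v."""
    return tuple(a + t * b for a, b in zip(x0, v))

# The line in R^3 through (2, 1, 3) in the direction (4, -2, 5).
for t in (0, 1, 2):
    print(t, line_point((2, 1, 3), (4, -2, 5), t))

# In R^2: points of x(t) = (0, b) + t (1, m) satisfy x2 = m * x1 + b.
m, b = 3, 2
x1, x2 = line_point((0, b), (1, m), 5)
print(x2 == m * x1 + b)
```

Setting t = 0 returns the base point x0, and each unit increase in t moves the point by one copy of the direction vector v.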

To make this discussion complete, note that a line is also defined by two points which are

on the line. Suppose that x and y are on the line, which can then also be viewed as a line

through x in the direction of y − x. Hence:

x(t) = x + t(y − x) = (1 − t)x + ty (see footnote 11).

The picture below is taken from Simon & Blume (1994).

11 If 0 ≤ t ≤ 1, this defines the line segment between x and y. We have used this property in the characterization of convexity.

Example 4.3. Let’s add one dimension and look at planes. This time, we start with the parametric representation. A plane that passes through the point p ≠ 0 and is spanned by two linearly independent direction vectors v, w which start in p can be represented as

x = p + sv + tw, with s, t ∈ R.

Just as two points on a line define it, three points p, q, r that do not lie on a single line define a plane that passes through p with direction vectors q − p and r − p. Plugging these into the above equation yields x = p + s(q − p) + t(r − p) = (1 − s − t)p + sq + tr, that is,

x = t1p + t2q + t3r with t1 + t2 + t3 = 1.

There is also a non-parametric equation of a plane, which comes from the fact that a plane is completely described by a point p = (x0, y0, z0) on the plane and its inclination. The latter is fixed by specifying a normal vector n = (a, b, c), which is perpendicular to the plane. If x is an arbitrary point on the plane, with coordinates (x, y, z), then x − p is a vector in the plane and therefore orthogonal to n. Hence,

0 = n • (x − p) = (a, b, c) • (x − x0, y − y0, z − z0) = a(x − x0) + b(y − y0) + c(z − z0),

or ax + by + cz = d with d = ax0 + by0 + cz0, which fits the form of a hyperplane. The pictures below are taken from Simon & Blume (1994).
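The two descriptions of a plane can be connected in a few lines of code. The sketch below builds the equation ax + by + cz = d from three points and checks that a point of the form t1p + t2q + t3r satisfies it. Note the assumption: the normal vector is obtained with the cross product of the two direction vectors, a standard tool in R3 that these notes do not introduce; the points are illustrative.

```python
def sub(u, v):
    """Componentwise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    """Dot product."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product in R^3: a vector orthogonal to both u and v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

# Three points that do not lie on a single line.
p, q, r = (1, 0, 0), (0, 1, 0), (0, 0, 1)

n = cross(sub(q, p), sub(r, p))   # normal vector (a, b, c)
d = dot(n, p)                     # offset, since n . (x - p) = 0
print(n, d)                       # here: the plane x + y + z = 1

# A point t1 p + t2 q + t3 r with t1 + t2 + t3 = 1 lies on the plane.
t1, t2, t3 = 0.5, 0.25, 0.25
x = tuple(t1 * a + t2 * b + t3 * c for a, b, c in zip(p, q, r))
print(dot(n, x) == d)
```

Any nonzero multiple of n would serve equally well as normal vector; it would simply rescale d by the same factor.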


Remark 4. A hyperplane of a space of dimension n necessarily is a set of dimension (n−1), as

its characterizing equation restricts only one degree of freedom: that spanned by the vector

a.

Further, one may wish to define an “above” and a “below” along the dimension spanned

by a. This is the job of halfspaces.

Definition 4.5. (Halfspace)

Let X be a subspace of Rn. Then, a halfspace of X is a set of the form:

H_a^{b−} := {x ∈ X | a′x ≤ b}

or

H_a^{b+} := {x ∈ X | a′x ≥ b}

where a is an element of Rn that is different from 0 and b is an element of R.

Put differently, a hyperplane incidentally defines two halfspaces. The halfspace deter-

mined by a′x ≥ b is the halfspace extending in the direction of a and the halfspace determined

by a′x ≤ b is the halfspace extending in the direction of −a.

Example 4.4. In undergraduate economics, you might already have heard of commodity spaces. Assuming that we deal with n commodities which are all perfectly divisible, we interpret the vector x = (x1, x2, ..., xn) with xi ≥ 0 ∀i as assigning quantities to each of the n commodities and call it a bundle. Then, the set of all bundles is the nonnegative orthant of Rn, called the commodity space. A consumer wishing to buy a bundle of goods chooses a vector of quantities which she can afford. That is, she is restricted by her budget and the prices of goods. Assume that the vector p = (p1, p2, ..., pn) gives the prices for the commodities, i.e., pi is the price of commodity i, and the consumer has an income I to spend on commodities. Then we can conveniently write her budget restriction as p′x ≤ I. The set of all bundles x which fulfill this inequality is called the consumer’s budget set, B. This set is convex (you should prove this) and it is bounded above by the hyperplane defined through the equation p′x = I.
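Before proving that the budget set is convex, it can help to test the claim on numbers. The sketch below encodes membership in B and checks sampled convex combinations of two affordable bundles; prices, income, and bundles are made-up illustrative values, and the sampling is a sanity check rather than a proof.

```python
def in_budget_set(x, p, income):
    """True if x is a bundle (all x_i >= 0) that costs at most `income`."""
    cost = sum(pi * xi for pi, xi in zip(p, x))
    return all(xi >= 0 for xi in x) and cost <= income

# Hypothetical prices, income, and two affordable bundles.
p, income = (2.0, 4.0), 20.0
x, y = (8.0, 0.0), (0.0, 4.0)

def mix(x, y, t):
    """Convex combination (1 - t) x + t y of two bundles."""
    return tuple((1 - t) * a + t * b for a, b in zip(x, y))

all_affordable = all(
    in_budget_set(mix(x, y, k / 10), p, income) for k in range(11)
)
print(all_affordable)                           # sampled mixtures stay in B
print(in_budget_set((6.0, 3.0), p, income))     # costs 24 > 20: outside B
```

The proof itself follows the same pattern: p′((1 − t)x + ty) = (1 − t)p′x + tp′y ≤ (1 − t)I + tI = I, and nonnegativity is preserved because the weights are nonnegative.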


Figure 6: Halfspaces in R2

An extremely important result for convex optimization is associated with the concepts of hyperplane, halfspace, and convex set. It is geometrically very intuitive, but the proof is way beyond the scope of this lecture. Therefore, I only illustrate the idea graphically.

Theorem 4.3. (Separating Hyperplane Theorem)

Let C and D be two nonempty convex sets in X = Rn. Further, assume C ∩ D = ∅. Then there exist a ≠ 0 in Rn and b in R such that a′x ≤ b for all x in C and a′x ≥ b for all x in D. The hyperplane {x ∈ X | a′x = b} is called a separating hyperplane for the sets C and D.

Figure 7: Separating Hyperplane Theorem

Remark 5. Please note that the converse of this theorem is not true unless some further

requirements are added! That is, the existence of a separating hyperplane between two

convex sets C and D does not imply that C and D do not intersect. (Consider for instance

the degenerate case C = D = {0}.)
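Both the theorem and the remark can be illustrated on a simple case. The sketch below samples points from two disjoint disks in R2 and checks the two inequalities for a hyperplane chosen by inspection (the vertical axis, a = (1, 0), b = 0); the sets, the sampling scheme, and the helper names are illustrative assumptions, and checking samples is not a proof.

```python
import math

def disk_points(center, radius, n_angles=36, n_radii=5):
    """Sample points of the closed disk with the given center and radius."""
    pts = []
    for i in range(n_radii + 1):
        r = radius * i / n_radii
        for j in range(n_angles):
            theta = 2 * math.pi * j / n_angles
            pts.append((center[0] + r * math.cos(theta),
                        center[1] + r * math.sin(theta)))
    return pts

def dot(a, x):
    """Dot product a'x."""
    return sum(ai * xi for ai, xi in zip(a, x))

def separates(a, b, C, D):
    """True if a'x <= b on all of C and a'x >= b on all of D."""
    return all(dot(a, x) <= b for x in C) and all(dot(a, x) >= b for x in D)

a, b = (1.0, 0.0), 0.0
C = disk_points((-2.0, 0.0), 1.0)   # disk to the left of the axis
D = disk_points((2.0, 0.0), 1.0)    # disk to the right of the axis
print(separates(a, b, C, D))

# Degenerate case from the remark: C = D = {0} is "separated" as well,
# although the two sets clearly intersect.
origin = [(0.0, 0.0)]
print(separates(a, b, origin, origin))
```

The second check shows why the converse fails: the weak inequalities allow both sets to touch, or even share, points of the hyperplane itself.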
