


A motivating example

[Move this up into the IBVP section?]

At the end of the previous chapter we saw that addressing the initial value problem using eigensolutions depends on answering the following question:

Given a function $f$ defined on $[0, L]$, can we choose constants $\alpha_k$ so that the functions $f_n$, defined by
\[
f_n(x) = \sum_{k=1}^{n} \alpha_k \phi_k(x), \tag{2.29}
\]
converge to $f$ as $n \to \infty$?

The mathematics needed to address these questions is based on the work pioneered by Jean-Baptiste Joseph Fourier. In 1822 Fourier published his book Théorie analytique de la chaleur (Analytic theory of heat), in which he claimed that any “reasonable” function can be constructed as the infinite sum of sine functions.

There is a fair amount of mathematical theory that we need to develop before we are able to understand Fourier's claim in detail. Before we do that, however, it might be helpful to look at an example from Fourier himself. In §228 of his book$^2$, Fourier presents the example of a function $\varphi(x)$ defined for $0 \le x \le \pi$ by
\[
\varphi(x) =
\begin{cases}
x & \text{for } 0 \le x \le \alpha,\\
\alpha & \text{for } \alpha \le x \le \pi - \alpha,\\
\pi - x & \text{for } \pi - \alpha \le x \le \pi.
\end{cases} \tag{2.30}
\]

$^2$A digital copy of Fourier's book is freely available at Google Books. I encourage you to take a look!



Fourier then states that
\[
\varphi(x) = \frac{4}{\pi}\left( \sin(\alpha)\sin(x) + \frac{\sin(3\alpha)}{3^2}\sin(3x) + \frac{\sin(5\alpha)}{5^2}\sin(5x) + \frac{\sin(7\alpha)}{7^2}\sin(7x) + \dots \right). \tag{2.31}
\]

The following Sage code produces a plot of the function above with $\alpha = \pi/3$ and the sum truncated at the $\sin(7x)$ term.

    var('x,k,n')
    n = 3
    a = pi/3
    f(x) = (4/pi)*sum(sin((2*k+1)*a)*sin((2*k+1)*x)/((2*k+1)^2), k, 0, n)
    plot(f, (x, 0, pi), figsize=[4,2])

If we take $n$ larger and larger, then we see that the function
\[
\varphi_n(x) = \frac{4}{\pi}\sum_{k=0}^{n} \frac{\sin((2k+1)\alpha)}{(2k+1)^2}\sin((2k+1)x)
\]
does seem to approach the function $\varphi(x)$. [revise?]
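As a quick sanity check, we can also evaluate the truncated sum in plain Python rather than Sage (this check is an addition, not part of the original notes), comparing the sum through the $\sin(7x)$ term against $\varphi$ at $x = \pi/2$ with $\alpha = \pi/3$:

```python
import math

alpha = math.pi / 3

def phi(x):
    # Fourier's piecewise function (2.30) on [0, pi]
    if x <= alpha:
        return x
    if x <= math.pi - alpha:
        return alpha
    return math.pi - x

def phi_n(x, n):
    # Truncated series (2.31): odd frequencies 1, 3, ..., 2n+1
    total = sum(math.sin((2*k + 1) * alpha) * math.sin((2*k + 1) * x) / (2*k + 1)**2
                for k in range(n + 1))
    return (4 / math.pi) * total

approx = phi_n(math.pi / 2, 3)  # truncate at the sin(7x) term
exact = phi(math.pi / 2)        # equals alpha here
```

Already with four terms the truncated sum agrees with $\varphi(\pi/2) = \pi/3 \approx 1.047$ to roughly one percent.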

Exercise 2.4.3. Take $\alpha$ very close to zero. You should see some weird wiggles appearing in the function $\varphi_n$. This is called the “Gibbs phenomenon.” Look it up on Wikipedia.

[needed]

Exercise 2.4.4. Take $\alpha = \pi/2$. This is the initial shape of a plucked string. What is the corresponding solution to the wave equation?


Chapter 3

Vector space stuff

3.1 Vector spaces

The key to understanding, and using, Fourier's claim is to take the perspective that functions “behave like vectors.” In order to make this precise, let's first review the properties of vectors that you learned in the vector calculus course.

The key properties of vectors are that they can be added and scaled:

• If $v, w$ are vectors, then we can add them to obtain a new vector $v + w$.

• If $v$ is a vector and $\alpha$ is a number, then we can rescale $v$ by $\alpha$ to obtain a new vector $\alpha v$.

The two operations, adding and scaling, satisfy a long list of properties (see Exercise 3.1.1 below) that basically mean that adding and scaling are compatible in the way you think they should be. For example, $\alpha(v + w) = \alpha v + \alpha w$, etc. A collection of objects that can be added and scaled in a compatible way is called a vector space. The numbers that are used to rescale the objects in the vector space are called scalars. In this class we consider vector spaces with both real scalars and, in §3.4 below, complex scalars.

Example 3.1.1

The collection of all continuous functions with domain $[0, L]$ forms a vector space. To see this, note that if $f$ and $g$ are continuous functions, then the function $f + g$, defined by $(f+g)(x) = f(x) + g(x)$, is a continuous function. Similarly, if $f$ is a continuous function and $\alpha$ is a real number, then $\alpha f$ is also a continuous function.


It is common to give vector spaces a mathematical symbol as a name. The vector space of vectors in $n$-dimensional space is given the symbol $\mathbb{R}^n$. The vector space of continuous functions with domain $[0, L]$ is given the symbol $C([0, L])$.

In the vector calculus course, you used the scaling property of vectors to parametrize lines in $\mathbb{R}^n$. For example, the line in $\mathbb{R}^2$ passing through $p = (1, 3)$ and parallel to $v = \langle 4, -2\rangle$ can be parametrized by the path
\[
x(t) = p + tv = (1 + 4t,\, 3 - 2t). \tag{3.1}
\]
If instead we wanted a path that oscillated about the point $p$ in the direction of $v$ with a frequency of one oscillation per unit time, then the desired path is
\[
x(t) = p + \sin(2\pi t)\,v = (1 + 4\sin(2\pi t),\, 3 - 2\sin(2\pi t)).
\]

We can animate this oscillating path with the following Sage code.

    c = animate([point((1 + 4*sin(2*pi*t), 3 - 2*sin(2*pi*t))) for t in srange(0, 1, 0.05)],
                xmin=-4, ymin=-2, xmax=6, ymax=6, figsize=[4,3])
    c.show(delay=25)

An analogous construction can be done in the vector space $C([0, L])$. Remember, though, that “points” and “vectors” in this setting are actually functions. Thus a “path” is really a function $u(t, x)$ that for each fixed $t$ is a vector (i.e., a function) in $C([0, L])$. Thus we can parametrize the line passing through the “point” $f(x) = e^x$ that is parallel to the “vector” $\psi(x) = \cos\left(\frac{3\pi}{L}x\right)$ with the path
\[
u(t, x) = f(x) + t\,\psi(x) = e^x + t\cos\left(\frac{3\pi}{L}x\right).
\]
If instead we wanted a path that oscillated about the function $f$ in the direction of $\psi$ with a frequency of one oscillation per unit time, then the desired path is
\[
u(t, x) = f(x) + \sin(2\pi t)\,\psi(x) = e^x + \sin(2\pi t)\cos\left(\frac{3\pi}{L}x\right).
\]

We can animate this oscillating path with the following Sage code, where we have set $L = \pi$.

    var('x')
    u = animate([exp(x) + sin(2*pi*t)*cos(3*x) for t in srange(0, 1, 0.05)],
                xmin=0, ymin=-2, xmax=pi, ymax=6, figsize=[4,3])
    u.show(delay=25)


From this last example we see that we can interpret the standing wave solutions to the wave equation as oscillating paths in the vector space of functions. In particular, a standing wave of the form $u(t, x) = A(t)\psi(x)$ is interpreted as an oscillating path whose “direction” is determined by $\psi(x)$.

We now introduce two important concepts in the study of vector spaces, “independence” and “basis.” We motivate these concepts with the following example.

Example 3.1.2

Consider the vectors $u, v, w \in \mathbb{R}^3$ given by
\[
u = \begin{pmatrix}1\\0\\1\end{pmatrix}, \qquad
v = \begin{pmatrix}1\\0\\-1\end{pmatrix}, \qquad
w = \begin{pmatrix}1\\2\\0\end{pmatrix}.
\]

We make two claims:

1. Any vector $x \in \mathbb{R}^3$ can be constructed by adding together multiples of $u, v, w$, and

2. there is only one combination of $u, v, w$ that will add up to form $x$.

To verify the first claim means to find scalars $\alpha, \beta, \gamma$ such that
\[
x = \alpha u + \beta v + \gamma w.
\]

This means that we want to solve the equation
\[
\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}
= \alpha \begin{pmatrix}1\\0\\1\end{pmatrix}
+ \beta \begin{pmatrix}1\\0\\-1\end{pmatrix}
+ \gamma \begin{pmatrix}1\\2\\0\end{pmatrix}.
\]

We can write this equation as a system of three equations in three unknowns $(\alpha, \beta, \gamma)$:
\[
\begin{aligned}
x_1 &= \alpha + \beta + \gamma\\
x_2 &= 2\gamma\\
x_3 &= \alpha - \beta.
\end{aligned}
\]


We can systematically solve this by adding/subtracting multiples of these equations to eliminate the variables on the right. The result is that we must have
\[
\begin{aligned}
\alpha &= \tfrac12 x_1 - \tfrac14 x_2 + \tfrac12 x_3\\
\beta  &= \tfrac12 x_1 - \tfrac14 x_2 - \tfrac12 x_3\\
\gamma &= \tfrac12 x_2.
\end{aligned}
\]
(Here is some Sage code that does the computation using matrix algebra.) This computation shows that for any possible vector $x$ we can choose scalars $\alpha, \beta, \gamma$ in such a way that we can build $x$ from $u, v, w$. This addresses the first claim.
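The formulas for $\alpha, \beta, \gamma$ can also be checked directly in a few lines of plain Python (a sketch added here, independent of the Sage matrix computation mentioned above): for any choice of $x$, the combination $\alpha u + \beta v + \gamma w$ rebuilds $x$.

```python
u = (1, 0, 1)
v = (1, 0, -1)
w = (1, 2, 0)

def coefficients(x):
    # alpha, beta, gamma from the solved system
    x1, x2, x3 = x
    alpha = 0.5*x1 - 0.25*x2 + 0.5*x3
    beta  = 0.5*x1 - 0.25*x2 - 0.5*x3
    gamma = 0.5*x2
    return alpha, beta, gamma

def combine(a, b, c):
    # alpha*u + beta*v + gamma*w, componentwise
    return tuple(a*ui + b*vi + c*wi for ui, vi, wi in zip(u, v, w))

x = (7.0, -4.0, 2.5)  # an arbitrary test vector
rebuilt = combine(*coefficients(x))
```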

The second claim is also verified by the computation above. Note that we don't have any choice in what $\alpha, \beta, \gamma$ are. Thus for any given $x$, there is a unique way to construct it from $u, v, w$.

It is useful, however, to consider the second claim without the first claim. This is because there are circumstances where we have a list of vectors that can't necessarily be used to build any vector, but for which any construction that does exist must be unique. So, suppose that there is a vector $x$ for which there are two possible combinations, one given by numbers $\alpha, \beta, \gamma$ and another given by $\hat\alpha, \hat\beta, \hat\gamma$:
\[
x = \hat\alpha u + \hat\beta v + \hat\gamma w, \qquad
x = \alpha u + \beta v + \gamma w.
\]
Subtracting these two equations gives
\[
0 = \underbrace{(\hat\alpha - \alpha)}_{\bar\alpha}\, u
  + \underbrace{(\hat\beta - \beta)}_{\bar\beta}\, v
  + \underbrace{(\hat\gamma - \gamma)}_{\bar\gamma}\, w.
\]
In other words, we have a combination of the vectors $u, v, w$, given by the numbers $\bar\alpha, \bar\beta, \bar\gamma$, that adds up to zero. This means that $\bar\alpha, \bar\beta, \bar\gamma$ must satisfy the system of equations
\[
\begin{aligned}
0 &= \bar\alpha + \bar\beta + \bar\gamma\\
0 &= 2\bar\gamma\\
0 &= \bar\alpha - \bar\beta.
\end{aligned}
\]
It is easy to deduce that the only solution to this system is $\bar\alpha = 0, \bar\beta = 0, \bar\gamma = 0$, which means that the two combinations of $u, v, w$ are not, in fact, different.

Notice that in this alternate approach to the second claim, it suffices to consider which combinations of $u, v, w$ yield the zero vector.

Example 3.1.2 motivates a number of definitions. We define a linear combination of the vectors $v_1, \dots, v_k$ to be any sum of the form
\[
\alpha_1 v_1 + \dots + \alpha_k v_k,
\]
where $\alpha_1, \dots, \alpha_k$ are scalars. Thus in Example 3.1.2, we found that any vector $x \in \mathbb{R}^3$ can be expressed as a linear combination of $u, v, w$.

A collection of vectors $v_1, \dots, v_k$ is called (linearly) independent if the only linear combination of those vectors that yields the zero vector is the combination where each of the scalars is zero. In Example 3.1.2, we found that the vectors $u, v, w$ are linearly independent.

Example 3.1.3

Consider the functions $p_0(x) = 1$, $p_1(x) = x$, $p_2(x) = x^2$ as elements of the vector space $C([0, 1])$. Linear combinations of $p_0, p_1, p_2$ are polynomials of degree at most 2.

In this vector space, the zero vector is the function that is identically zero. It is easy to see that the only linear combination of $p_0, p_1, p_2$ that yields this zero function is
\[
(0)p_0 + (0)p_1 + (0)p_2.
\]
Thus $p_0, p_1, p_2$ are linearly independent.

We end this section by addressing a technical, but important, point. Notice that the definition of linear combination only deals with finite sums of vectors. This is important because with infinite sums of vectors we need to worry about convergence of the sum. For example, we might want to extend the list of polynomials in Example 3.1.3 to the infinite list
\[
p_0(x) = 1,\quad p_1(x) = x,\quad \dots,\quad p_k(x) = x^k,\quad \dots
\]
of functions in $C([0, 1])$. We also know that the exponential function $\exp(x) = e^x$ is in $C([0, 1])$ and that it has the Taylor series
\[
\exp(x) = 1 + x + \frac{1}{2}x^2 + \frac{1}{3!}x^3 + \dots
        = p_0(x) + p_1(x) + \frac{1}{2}p_2(x) + \frac{1}{3!}p_3(x) + \dots.
\]


It is tempting to view this series expansion of the exponential function as a sort of “infinite linear combination” of the functions $p_0, p_1, p_2, \dots$. However, it is easy to construct infinite sums of these functions that do not converge to a continuous function. For example, the geometric series
\[
\frac{1}{1 - x} = 1 + x + x^2 + x^3 + \dots = p_0(x) + p_1(x) + p_2(x) + p_3(x) + \dots
\]
does not converge to a function in $C([0, 1])$. Consequently, we do not use the phrase “linear combination” when talking about an infinite sum. Instead, we use the word “series” when considering an infinite sum.
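The contrast can be made concrete with a small plain-Python experiment (an illustration added here, not part of the original notes): the Taylor partial sums of $e^x$ stay uniformly close to $e^x$ on $[0, 1]$, while the geometric partial sums grow without bound as $x \to 1^-$.

```python
import math

def taylor_exp(x, n):
    # partial sum 1 + x + x^2/2! + ... + x^n/n!
    return sum(x**k / math.factorial(k) for k in range(n + 1))

def geometric(x, n):
    # partial sum 1 + x + x^2 + ... + x^n
    return sum(x**k for k in range(n + 1))

# Worst-case error of the degree-10 exponential partial sum on a grid in [0, 1]
grid = [i / 100 for i in range(101)]
exp_error = max(abs(taylor_exp(x, 10) - math.exp(x)) for x in grid)

# The geometric partial sums become arbitrarily large near x = 1
geo_value = geometric(0.999, 5000)
```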

In the exercises you show that various collections of functions are vector spaces. Before stating the exercises we introduce a bit of notation and terminology.

Functions that can be differentiated as many times as we want are called smooth. The collection of all smooth functions with domain $\Omega$ is given the symbol $C^\infty(\Omega)$. For example, the exponential function is in the collection $C^\infty(\mathbb{R})$, but the absolute value function is not in $C^\infty(\mathbb{R})$.

The absolute value function, however, only has “trouble” at one spot, namely at $x = 0$. Functions that are smooth except at a finite number of points are called piecewise smooth. The collection of piecewise smooth functions with domain $\Omega$ is given the symbol $C^\infty_{\mathrm{pw}}(\Omega)$. You might wonder what the derivative of the absolute value function is. At most places, of course, the derivative is simply the slope. But at $x = 0$ the derivative is not defined. For convenience, when there is a “jump” in a piecewise smooth function, we define the value at the jump to be zero.

Finally, we say that a piecewise smooth function $f$ with domain $\Omega$ is square integrable if
\[
\int_\Omega |f|^2\, dV
\]
is finite; here $dV$ represents the length/area/volume element that is appropriate for the region $\Omega$. For example, the absolute value function is square integrable on the domain $\Omega = [-1, 1]$, in which case $dV = dx$. The collection of piecewise smooth functions with domain $\Omega$ that are square integrable is given the symbol $L^2(\Omega)$$^1$.

$^1$The spaces $L^2(\Omega)$ can be extended to the Lebesgue space $L^2(\Omega)$. However, this requires the development of the Lebesgue theory of integration, which is beyond the scope of this course.


Exercise 3.1.1. Look up the definition of vector space online. What are all the properties that adding and scaling must satisfy?

Exercise 3.1.2.

1. For each real number $p$ define the function $f_p(x) = x^p$ on the domain $(0, 1)$. For which $p$ is $f_p$ in $L^2((0, 1))$?

2. Use the fact that $(f - g)^2 \ge 0$ to show that $2fg \le f^2 + g^2$ for all $f, g$. Subsequently show that $L^2(\Omega)$ is a vector space.

Exercise 3.1.3.

1. Let $L^2_0(\mathbb{R})$ be the collection of all square integrable functions with domain $\mathbb{R}$ satisfying the infinite-string boundary condition (2.15). Show that $L^2_0(\mathbb{R})$ is a vector space.

2. Let $L^2_P([-L, L])$ be the collection of all square integrable functions with domain $[-L, L]$ that satisfy the periodic boundary condition (2.16). Explain why $L^2_P([-L, L])$ is a vector space.

3. Let $L^2_D([0, L])$ be the collection of all square integrable functions with domain $[0, L]$ that satisfy the Dirichlet boundary condition (2.17). Explain why $L^2_D([0, L])$ is a vector space.

4. Let $L^2_N([0, L])$ be the collection of all square integrable functions with domain $[0, L]$ that satisfy the Neumann boundary condition (2.18). Explain why $L^2_N([0, L])$ is a vector space.

Exercise 3.1.4. Show that the collection of solutions $u(t)$ to the differential equation
\[
\frac{d^2u}{dt^2} + 14u = 0
\]
forms a vector space. How many linearly independent solutions can you find?

Exercise 3.1.5. We say that a vector space is finite dimensional if there is a maximum number of vectors in any collection of linearly independent vectors. If there is no maximum, then we say that the vector space is infinite dimensional.

1. Explain why $\mathbb{R}^2$ is finite dimensional.

2. Explain why $C([0, 1])$ is infinite dimensional.

3. Is the collection of solutions to the ODE in Exercise 3.1.4 finite dimensional or infinite dimensional?


3.2 Inner product spaces

In the vector calculus course, lengths and angles defined by vectors in $\mathbb{R}^n$ were determined using the dot product. In this section we generalize the notion of a dot product to other vector spaces. Before we do, let's recall the basic properties of the dot product for vectors in $\mathbb{R}^n$:

• Symmetry property. We have $v \cdot w = w \cdot v$ for all vectors $v, w$.

• Positivity property. We have $v \cdot v \ge 0$ for all vectors $v$.

• Definite property. We have $v \cdot v = 0$ if and only if $v = 0$.

These last two properties are often combined into a single “positive definite property.”

It is convenient to think of the dot product as a function that takes in two vectors and returns a scalar. Using this perspective, we define an inner product on a real vector space $V$ to be a function that takes in two vectors $u, v$ and returns a number, which we give the symbol $\langle u, v\rangle$, having the following properties:

• Linearity property. We have $\langle \alpha_1 v_1 + \alpha_2 v_2, w\rangle = \alpha_1\langle v_1, w\rangle + \alpha_2\langle v_2, w\rangle$ for all scalars $\alpha_1, \alpha_2$ and for all vectors $v_1, v_2, w$ in $V$.

• Symmetry property. We have $\langle v, w\rangle = \langle w, v\rangle$ for all $v, w$ in $V$.

• Positivity property. We have $\langle v, v\rangle \ge 0$ for all $v$ in $V$.

• Definite property. We have $\langle v, v\rangle = 0$ if and only if $v = 0$.

A (real) vector space together with an inner product is called an inner product space. The following example is very important.

Example 3.2.1

For the vector space $L^2(\Omega)$ we define the $L^2$ inner product by
\[
\langle u, v\rangle = \int_\Omega u\, v\, dV. \tag{3.2}
\]
It is easy to see that the properties of integration imply the linearity and symmetry properties. The positivity property follows from the fact that $u^2 \ge 0$ for any function $u$. The definite property follows from considering each smooth piece of the function $u$ independently, and using our convention that functions in $L^2(\Omega)$ are zero along “jumps.”


The inner product (3.2) is referred to in the rest of these notes as the standard inner product.

For vectors in $\mathbb{R}^n$ we define the norm (or magnitude) of the vector using the dot product. We can similarly define the norm of a vector $u$, which we denote by $\|u\|$, by
\[
\|u\| = \langle u, u\rangle^{1/2}.
\]

Example 3.2.2

Consider the functions $u(x) = 1 - x^2$ and $v(x) = \cos(\pi x)$ in $L^2([0, 1])$. We have
\[
\|u\| = \langle u, u\rangle^{1/2}
= \left( \int_0^1 (1 - x^2)(1 - x^2)\, dx \right)^{1/2}
= \sqrt{\frac{8}{15}}
\]
and
\[
\|v\| = \left( \int_0^1 \cos^2(\pi x)\, dx \right)^{1/2} = \frac{1}{\sqrt{2}}.
\]
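Both norms can be verified numerically; the sketch below (an added check, not part of the original notes) uses a simple midpoint rule in plain Python rather than Sage to approximate the integrals:

```python
import math

def l2_norm(f, a, b, n=20000):
    # midpoint-rule approximation of (integral of f^2 over [a, b])^(1/2)
    h = (b - a) / n
    total = sum(f(a + (i + 0.5) * h)**2 for i in range(n))
    return math.sqrt(total * h)

norm_u = l2_norm(lambda x: 1 - x**2, 0.0, 1.0)             # expect sqrt(8/15)
norm_v = l2_norm(lambda x: math.cos(math.pi * x), 0.0, 1.0)  # expect 1/sqrt(2)
```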

We can also use the inner product to define the angle between two vectors. If both $u$ and $v$ are nonzero vectors in an inner product space, then the angle between them is defined to be the number $\theta$ such that
\[
\cos\theta = \frac{\langle u, v\rangle}{\|u\|\,\|v\|}.
\]

Example 3.2.3

The angle between the functions $u, v$ in Example 3.2.2 must satisfy
\[
\cos\theta = \frac{\langle u, v\rangle}{\|u\|\,\|v\|}
= \frac{\sqrt{15}}{2} \int_0^1 (1 - x^2)\cos(\pi x)\, dx
= \frac{\sqrt{15}}{\pi^2}.
\]
Thus $\theta \approx 1.17$ radians.
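The integral here is $\int_0^1 (1 - x^2)\cos(\pi x)\,dx = 2/\pi^2$ (integrate by parts twice), and the value of $\theta$ can be confirmed with a plain-Python midpoint rule (a numerical check added here, not in the original notes):

```python
import math

def midpoint(f, a, b, n=20000):
    # midpoint-rule approximation of the integral of f over [a, b]
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

inner_uv = midpoint(lambda x: (1 - x**2) * math.cos(math.pi * x), 0.0, 1.0)
norm_u = math.sqrt(8 / 15)
norm_v = 1 / math.sqrt(2)
cos_theta = inner_uv / (norm_u * norm_v)
theta = math.acos(cos_theta)
```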

Two nonzero vectors are said to be orthogonal if the angle between them is $\pi/2$. In other words, two (nonzero) vectors $u, v$ are orthogonal if $\langle u, v\rangle = 0$.


Example 3.2.4

Consider the functions $u, v$ from Example 3.2.2 as well as the function $w(x) = x - 3/8$. Direct computation shows that $u$ and $w$ are orthogonal, but $u$ and $v$ are not.
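The orthogonality of $u$ and $w$ can be confirmed exactly with rational arithmetic, since $\langle u, w\rangle = \int_0^1 (1 - x^2)(x - 3/8)\,dx = \tfrac12 - \tfrac38 + \tfrac18 - \tfrac14 = 0$. A sketch (added here as an illustration) using Python's `fractions` module:

```python
from fractions import Fraction

# (1 - x^2)(x - 3/8) = x - 3/8 + (3/8)x^2 - x^3, stored as {power: coefficient}
poly = {1: Fraction(1), 0: Fraction(-3, 8), 2: Fraction(3, 8), 3: Fraction(-1)}

# the integral over [0, 1] of c*x^p is c/(p + 1)
inner_uw = sum(c / (p + 1) for p, c in poly.items())
```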

The standard inner product (3.2) is not the only possible inner product one can define for vector spaces of functions. Consider functions defined on the domain $\Omega$. If $w$ is a function on $\Omega$ such that $w(x) \ge 0$, with the value zero (if present) only permitted along the boundary of $\Omega$, then
\[
\langle u, v\rangle_w = \int_\Omega u\, v\, w\, dV
\]
is an inner product for functions with domain $\Omega$. We refer to $\langle\cdot,\cdot\rangle_w$ as a weighted inner product and refer to $w$ as the weight function. The corresponding norm is denoted $\|\cdot\|_w$. The collection of piecewise smooth functions $u$ with domain $\Omega$ such that $\|u\|_w$ is finite is given the symbol $L^2_w(\Omega)$, and is referred to as a weighted $L^2$ space.

Example 3.2.5: Hermite inner product

Consider the domain $\Omega = \mathbb{R}$ and the weight function $w_H(x) = e^{-x^2}$. We call the resulting inner product the Hermite inner product$^a$, and give it the symbol $\langle\cdot,\cdot\rangle_H$. In particular, we have
\[
\langle u, v\rangle_H = \int_{-\infty}^{\infty} u(x)\,v(x)\,e^{-x^2}\, dx.
\]
Note that the function $u(x) = 1$ has finite norm with respect to the Hermite inner product. In fact,
\[
\|u\|_H = \left( \int_{-\infty}^{\infty} e^{-x^2}\, dx \right)^{1/2} = \pi^{1/4}.
\]
The function $v(x) = x$ also has finite norm. Furthermore, $\langle u, v\rangle_H = 0$, meaning that these two functions are orthogonal. (The easiest way to see this is that $u$ is even, $v$ is odd, and the weight function is even. Since we are integrating an odd function over a symmetric domain, the integral is zero.)

$^a$The inner product is named after the mathematician Charles Hermite.
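Both claims, $\|1\|_H = \pi^{1/4}$ and $\langle 1, x\rangle_H = 0$, can be checked numerically. Since $e^{-x^2}$ decays rapidly, truncating the integral to $[-10, 10]$ costs essentially nothing; the sketch below (an added plain-Python check, not part of the notes) uses a midpoint rule:

```python
import math

def hermite_inner(u, v, n=40000, cutoff=10.0):
    # midpoint-rule approximation of the integral of u*v*exp(-x^2) over R,
    # truncated to [-cutoff, cutoff] where the weight is negligible
    h = 2 * cutoff / n
    return h * sum(u(x) * v(x) * math.exp(-x * x)
                   for i in range(n)
                   for x in [-cutoff + (i + 0.5) * h])

one = lambda x: 1.0
ident = lambda x: x

norm_one = math.sqrt(hermite_inner(one, one))  # expect pi**0.25
inner_1x = hermite_inner(one, ident)           # expect 0 (odd integrand)
```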

[It would be better to replace these random functions with something more meaningful.]

Exercise 3.2.1. Let $\Omega$ be the unit disk in $\mathbb{R}^2$. The functions $u(x, y) = x$, $v(x, y) = x^2 + y^2$, $w(x, y) = e^{x^2 + y^2}$ are elements of $L^2(\Omega)$.

1. Compute the norms of $u$, $v$, and $w$.

2. Compute $\langle u, v\rangle$.

3. Compute $\langle v, w\rangle$.

Exercise 3.2.2.

1. Let $A, B$ be two numbers. Use algebra to show that $(A - B)^2 \ge 0$ implies that $AB \le \tfrac12(A^2 + B^2)$.

2. Suppose $v, w$ are elements of an inner product space. Show that
\[
|\langle v, w\rangle| \le \tfrac12\left( \|v\|^2 + \|w\|^2 \right).
\]

3. Suppose $v, w$ are elements of an inner product space. Show that
\[
\left| \left\langle \frac{v}{\|v\|}, \frac{w}{\|w\|} \right\rangle \right| \le 1.
\]
Conclude that $|\langle v, w\rangle| \le \|v\|\,\|w\|$. This important fact is known as the Cauchy-Bunyakovsky-Schwarz inequality.

[needed]

Exercise 3.2.3. We define $l^2(\mathbb{N})$ to be the set of all sequences $A = \{a_n\}_{n\in\mathbb{N}} = \{a_1, a_2, a_3, \dots\}$ such that
\[
\|A\|^2 = \sum_{n=1}^{\infty} |a_n|^2
\]
converges.

1. Show that $l^2(\mathbb{N})$ is a vector space.

2. What inner product makes $l^2(\mathbb{N})$ an inner product space?

3. How should one define $l^2(\mathbb{Z})$?

Exercise 3.2.4.

1. Show that the eigenfunctions $\phi_k$ defined by (2.23) are orthogonal to one another with respect to the standard inner product in $L^2([0, L])$.

2. Compute $\|\phi_k\|$. Does the answer depend on $k$?

Exercise 3.2.5. In this exercise we work in $L^2_H(\mathbb{R})$, defined using the Hermite inner product from Example 3.2.5. We define the following functions:
\[
H_0(x) = 1, \quad H_1(x) = 2x, \quad H_2(x) = 4x^2 - 2, \quad H_3(x) = 8x^3 - 12x.
\]

1. Show that the functions $H_k$ are orthogonal to one another.

2. Compute the norm of each of the functions $H_k$.

3.3 The best approximation problem

A list of vectors in an inner product space is called an orthogonal collection if each vector is orthogonal to all of the other vectors. For example:

• The vectors
\[
i = \begin{pmatrix}1\\0\\0\end{pmatrix}, \qquad
j = \begin{pmatrix}0\\1\\0\end{pmatrix}, \qquad
k = \begin{pmatrix}0\\0\\1\end{pmatrix}
\]
form an orthogonal collection in $\mathbb{R}^3$.

• In Exercise 3.2.4 you showed that the Dirichlet eigenfunctions (2.23) are orthogonal in $L^2([0, L])$.

We now introduce the best approximation problem:

Suppose $v_1, \dots, v_n$ is an orthogonal collection of vectors in an inner product space, and let $u$ be some other vector. What linear combination of the vectors $v_1, \dots, v_n$ best approximates $u$?

More precisely, how can we choose scalars $\alpha_1, \dots, \alpha_n$ so that the vector
\[
\tilde{u} = \alpha_1 v_1 + \dots + \alpha_n v_n
\]
best approximates $u$ in the sense that $\|u - \tilde{u}\|$ is as small as possible?

To understand this problem a bit better, let's look at two simple examples in $\mathbb{R}^3$.


Example 3.3.1

Consider the orthogonal collection that contains only one vector
\[
v = \begin{pmatrix}1\\1\\1\end{pmatrix}.
\]
Let $u$ be the vector
\[
u = \begin{pmatrix}1\\2\\3\end{pmatrix}.
\]
In this setting, the best approximation problem asks us to find which vector $\alpha v$ is closest to $u$.

We can easily visualize what is happening here. Vectors of the form $\alpha v$ all lie in the line passing through $(0, 0, 0)$ with direction given by $v$. As we move along this line we can compute the distance to the point $(1, 2, 3)$. The best approximation problem asks us to find the point with the smallest distance.

We can solve this using methods from first-semester calculus. Define the function $f$ by
\[
f(\alpha) = \|\alpha v - u\|^2.
\]

If we have a minimizer for f then we have a value of ↵ where the squareof the norm is smallest, and hence where the norm is smallest. Wecompute

f(↵) =

0

@↵� 1↵� 2↵� 3

1

A ·

0

@↵� 1↵� 2↵� 3

1

A = (↵� 1)2 + (↵� 2)2 + (↵� 3)2.

It is easy to compute f0(↵) and then solve 0 = f

0(↵) for ↵. The resultis ↵ = 2. Thus the vector of the form ↵v that best approximates u isthe vector

u = 2v =

0

@222

1

A .

Example 3.3.2


Consider now the orthogonal collection
\[
v_1 = \begin{pmatrix}1\\1\\1\end{pmatrix}, \qquad
v_2 = \begin{pmatrix}1\\-1\\0\end{pmatrix}.
\]
(You should quickly check that this is an orthogonal collection!) Again, let $u$ be the vector
\[
u = \begin{pmatrix}1\\2\\3\end{pmatrix}.
\]
We would like to find scalars $\alpha_1, \alpha_2$ such that
\[
\tilde{u} = \alpha_1 v_1 + \alpha_2 v_2
\]
is such that $\|u - \tilde{u}\|$ is as small as possible.

Motivated by the previous example, we define a function $f$ by
\[
f(\alpha_1, \alpha_2) = \|\alpha_1 v_1 + \alpha_2 v_2 - u\|^2.
\]
We can now use the optimization methods from vector calculus to find the values of $\alpha_1, \alpha_2$ that minimize $f$. To do this we compute
\[
f(\alpha_1, \alpha_2) = (\alpha_1 + \alpha_2 - 1)^2 + (\alpha_1 - \alpha_2 - 2)^2 + (\alpha_1 - 3)^2.
\]
Thus
\[
\operatorname{grad} f = \begin{pmatrix} 6\alpha_1 - 12 \\ 4\alpha_2 + 2 \end{pmatrix}.
\]
Setting $\operatorname{grad} f = 0$, we find that the optimal values are
\[
\alpha_1 = 2, \qquad \alpha_2 = -\frac{1}{2}.
\]
Consequently the linear combination of $v_1, v_2$ that best approximates $u$ is
\[
\tilde{u} = 2v_1 - \frac{1}{2}v_2 = \begin{pmatrix}3/2\\5/2\\2\end{pmatrix}.
\]

The two examples above show us how to address the best approximation problem in general. Suppose we have orthogonal vectors $v_1, \dots, v_n$ and also a vector $u$ that we want to approximate. We define a function $f$ by
\[
f(\alpha_1, \dots, \alpha_k, \dots, \alpha_n) = \|\alpha_1 v_1 + \dots + \alpha_k v_k + \dots + \alpha_n v_n - u\|^2.
\]

Since the vectors $v_k$ are orthogonal to one another, we have
\[
f(\alpha_1, \dots, \alpha_k, \dots, \alpha_n)
= \|u\|^2 + \alpha_1^2\|v_1\|^2 + \dots + \alpha_k^2\|v_k\|^2 + \dots + \alpha_n^2\|v_n\|^2
- 2\alpha_1\langle v_1, u\rangle - \dots - 2\alpha_k\langle v_k, u\rangle - \dots - 2\alpha_n\langle v_n, u\rangle.
\]
Thus the gradient of $f$ is
\[
\operatorname{grad} f =
\begin{pmatrix}
2\alpha_1\|v_1\|^2 - 2\langle v_1, u\rangle\\
\vdots\\
2\alpha_k\|v_k\|^2 - 2\langle v_k, u\rangle\\
\vdots\\
2\alpha_n\|v_n\|^2 - 2\langle v_n, u\rangle
\end{pmatrix}. \tag{3.3}
\]
Consequently it is easy to see that in order to minimize $f$, and hence in order to minimize $\|u - \tilde{u}\|$, we should choose
\[
\alpha_1 = \frac{\langle v_1, u\rangle}{\|v_1\|^2}, \quad \dots, \quad
\alpha_k = \frac{\langle v_k, u\rangle}{\|v_k\|^2}, \quad \dots, \quad
\alpha_n = \frac{\langle v_n, u\rangle}{\|v_n\|^2}.
\]

In other words, the linear combination of $v_1, \dots, v_n$ that best approximates the vector $u$ is
\[
\tilde{u} = \frac{\langle v_1, u\rangle}{\|v_1\|^2}v_1 + \dots
+ \frac{\langle v_k, u\rangle}{\|v_k\|^2}v_k + \dots
+ \frac{\langle v_n, u\rangle}{\|v_n\|^2}v_n.
\]
One should interpret this formula in the following way. The vectors $v_1, \dots, v_n$ describe $n$ different orthogonal directions inside the vector space. The quantity
\[
\frac{\langle v_k, u\rangle}{\|v_k\|^2}v_k
\]
represents the part of the vector $u$ that lies in direction $v_k$. (The technical mathematical term is “the projection of $u$ onto the line spanned by $v_k$.”) Thus the best approximation is accomplished by finding the portion of $u$ lying in the directions $v_1, \dots, v_n$ and then summing.
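The recipe — project $u$ onto each orthogonal direction and sum — is easy to express in code. The following plain-Python sketch (an illustration added here) applies it to the orthogonal pair $v_1 = (1,1,1)$, $v_2 = (1,-1,0)$ and the vector $u = (1,2,3)$ from Example 3.3.2, and also checks that the residual $u - \tilde{u}$ is orthogonal to every $v_k$:

```python
def dot(a, b):
    # Euclidean dot product
    return sum(x * y for x, y in zip(a, b))

def best_approximation(u, vs):
    # sum of projections <v_k, u>/||v_k||^2 * v_k over an orthogonal family vs
    coeffs = [dot(v, u) / dot(v, v) for v in vs]
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vs)) for i in range(len(u)))

u = (1, 2, 3)
vs = [(1, 1, 1), (1, -1, 0)]
u_tilde = best_approximation(u, vs)          # expect (3/2, 5/2, 2)
residual = tuple(a - b for a, b in zip(u, u_tilde))
residual_dots = [dot(residual, v) for v in vs]
```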

Of course, there may be portions of $u$ that are orthogonal to all of the vectors $v_1, \dots, v_n$. Therefore the vector $\tilde{u}$ might not be equal to $u$. The difference $u - \tilde{u}$ is precisely the part of $u$ that is orthogonal to all of the vectors $v_1, \dots, v_n$.

The resolution of the best approximation problem has a very important consequence: it tells us how to choose the constants $\alpha_k$ in (2.29). In Exercise 3.2.4 you showed that the Dirichlet eigenfunctions $\phi_k$ are orthogonal, and have norm $\|\phi_k\|^2 = \frac{\pi}{2}$. If we want the finite sum
\[
f_n(x) = \sum_{k=1}^{n} \alpha_k\phi_k(x) \tag{3.4}
\]
to best approximate the function $f$, then we should choose
\[
\alpha_k = \frac{\langle \phi_k, f\rangle}{\|\phi_k\|^2}. \tag{3.5}
\]
We call these numbers the Fourier coefficients of the function $f$.

It is important to note that we have not shown that choosing $\alpha_k$ according to (3.5) necessarily implies that $f_n \to f$ as $n \to \infty$, as the Fourier hypothesis conjectures, but if there is to be any hope of convergence, then this is how the constants need to be chosen. The convergence of $f_n$ to $f$ is discussed below.
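On $[0, \pi]$, where the Dirichlet eigenfunctions are $\phi_k(x) = \sin(kx)$ with $\|\phi_k\|^2 = \pi/2$, formula (3.5) reads $\alpha_k = \frac{2}{\pi}\int_0^\pi f(x)\sin(kx)\,dx$. The plain-Python sketch below (a numerical check added here, not part of the notes) computes these coefficients for $f(x) = 1$ and recovers the pattern worked out in Example 3.3.4 below: $4/(k\pi)$ for odd $k$ and $0$ for even $k$.

```python
import math

def fourier_coefficient(f, k, n=20000):
    # alpha_k = <phi_k, f> / ||phi_k||^2 with phi_k(x) = sin(k x) on [0, pi],
    # the inner product computed by a midpoint rule; ||phi_k||^2 = pi/2
    h = math.pi / n
    inner = h * sum(f(x) * math.sin(k * x)
                    for i in range(n)
                    for x in [(i + 0.5) * h])
    return inner / (math.pi / 2)

f = lambda x: 1.0
alpha = [fourier_coefficient(f, k) for k in range(1, 6)]  # k = 1, ..., 5
```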

With this information in hand, let's revisit Fourier's example from the previous chapter.

Example 3.3.3

The function that we want to approximate is the function $\varphi$ given in (2.30). Here we take $L = \pi$. A lengthy, but straightforward, computation shows that
\[
\langle \varphi, \phi_k\rangle = \frac{1 - (-1)^k}{k^2}\sin(k\alpha).
\]
Recall from Exercise 3.2.4 that $\|\phi_k\|^2 = \frac{\pi}{2}$. Thus we choose
\[
\alpha_k =
\begin{cases}
\dfrac{4}{k^2\pi}\sin(k\alpha) & \text{if } k \text{ is odd},\\[6pt]
0 & \text{if } k \text{ is even},
\end{cases}
\]
which are precisely the values of $\alpha_k$ that yield (2.31).

Here is another simple example.


Example 3.3.4

example:Dirichlet-constant-function Consider the function f(x) = 1 on the domain [0,⇡]. We compute

h k, fi =1� (�1)k

k

and thus the Fourier coe�cients are

↵k =

(4k⇡

if k is odd,

0 if k is even.

Since we only need to consider odd values of k, set k = 2l + 1, wherel = 0, 1, 2, 3, . . . . Thus the function f is best approximated by

fn(x) =nX

l=0

4

⇡(2l + 1)sin ((2l + 1)x) .

The Sage code

var(’x,l,n’)n=10f(x) =sum ((4/(pi*(2*l+1)))*sin ((2*l+1)*x),l,0,n)plot(f,(x,0,pi),figsize =[4 ,2])

yields the following plot of this approximation when n = 10:

This does a reasonable job of approximating the constant functionf(x) = 1. Play around with the code and see what happens whenn is large. Does the approximation improve?
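For readers without Sage at hand, the partial sums can also be checked in plain Python (a cross-check, not the book's code): away from the endpoints they approach $1$, while at $x = 0$ every term vanishes.

```python
# Partial sums f_n from Example 3.3.4, evaluated in plain Python/NumPy.
import numpy as np

def f_n(x, n):
    """Partial sum sum_{l=0}^{n} 4/(pi(2l+1)) sin((2l+1)x)."""
    l = np.arange(n + 1)
    return float(np.sum(4.0 / (np.pi * (2 * l + 1)) * np.sin((2 * l + 1) * x)))

# Near the middle of the interval the sum approaches f(x) = 1 ...
assert abs(f_n(np.pi / 2, 200) - 1.0) < 1e-2
# ... but at the endpoint every term vanishes, so the sum is 0, not 1.
assert f_n(0.0, 200) == 0.0
```

This makes concrete the endpoint behavior discussed later in the chapter: the approximation cannot converge to $1$ at $x = 0$ or $x = \pi$, since every $\sin(kx)$ vanishes there.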

Our resolution to the best approximation problem gave us a very simple way to approximate a function $f$ by the Dirichlet eigenfunctions: simply compute the Fourier coefficients $\alpha_k$ and construct (3.4). The limit as $n \to \infty$ of (3.4) is called the Fourier sine series (because the Dirichlet eigenfunctions $\psi_k$ are sine functions). To indicate that the Fourier sine series does the best job possible at approximating the function $f$, we write

\[
f \sim \sum_{k=1}^{\infty} \alpha_k \psi_k.
\]

The examples above, and the exercises below, provide evidence that the Fourier sine series does converge to the function $f$, although Example 3.3.4 leads us to suspect that something funny is happening at the endpoints. Details about the convergence are addressed in §?? below. • section reference needed

The best approximation method used here works any time we have an inner product space and a collection of orthogonal vectors in that space. The Dirichlet eigenfunctions $\psi_k$ are only one possibility. This raises some questions. Were we lucky that the Dirichlet eigenfunctions were orthogonal? Or is there some "reason" that we ended up with an orthogonal collection? Is there a systematic way to construct orthogonal collections of functions? One approach to generating orthogonal collections is addressed in Exercise 3.3.2. A more general theory appears in Chapter ?? below. • chapter reference needed

Exercise 3.3.1.

1. Find the Fourier coefficients for the "square wave" function
\[
f(x) =
\begin{cases}
1 & \text{if } 0 \le x < \frac{\pi}{2},\\
-1 & \text{if } \frac{\pi}{2} < x \le \pi.
\end{cases}
\]
(According to our convention, what should the value of $f$ be in order to have $f$ in $L^2([0,\pi])$? Does this value affect the Fourier coefficients?) Use Sage to generate a plot of the resulting approximation.

2. Find the Fourier coefficients for the function
\[
f(x) =
\begin{cases}
x & \text{if } 0 \le x < \frac{\pi}{2},\\
0 & \text{if } x = \frac{\pi}{2},\\
x - \pi & \text{if } \frac{\pi}{2} < x \le \pi.
\end{cases}
\]
Use Sage to generate a plot of the resulting approximation.


Exercise 3.3.2. In this exercise you explore a method to generate an orthogonal collection of polynomials in $L^2([-1, 1])$. The polynomials $P_0, P_1, P_2, P_3, \ldots$ are recursively constructed according to the following conditions:

• $P_k$ is a polynomial of degree $k$,

• $P_k$ is orthogonal to $P_0, \ldots, P_{k-1}$,

• $P_k(1) = 1$.

1. Explain why we must have $P_0(x) = 1$. Compute $\|P_0\|^2$.

2. Now suppose that $P_1(x) = a_0 + a_1 x$. Show that the condition $\langle P_1, P_0\rangle = 0$ implies that $a_0 = 0$. Conclude that $P_1(x) = x$. Compute $\|P_1\|^2$.

3. Suppose that $P_2(x) = a_0 + a_1 x + a_2 x^2$. Show that the conditions
\[
\langle P_2, P_0\rangle = 0, \qquad \langle P_2, P_1\rangle = 0, \qquad P_2(1) = 1
\]
imply that $P_2(x) = \frac{3}{2}x^2 - \frac{1}{2}$. Compute $\|P_2\|^2$.

4. Find $P_3(x)$. Show that $\|P_3\|^2 = \frac{2}{7}$.
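A quick numerical sanity check (not a solution to the exercise) confirms the stated properties of $P_2$:

```python
# Check numerically that P_2(x) = (3/2)x^2 - 1/2 satisfies the three
# defining conditions from the exercise, using the L^2([-1,1]) inner product.
import numpy as np

xs = np.linspace(-1.0, 1.0, 200001)

def integrate(values):
    """Composite trapezoid rule on the grid xs."""
    return float(np.sum((values[:-1] + values[1:]) / 2 * np.diff(xs)))

P0 = np.ones_like(xs)
P1 = xs
P2 = 1.5 * xs**2 - 0.5

assert abs(integrate(P2 * P0)) < 1e-8   # <P2, P0> = 0
assert abs(integrate(P2 * P1)) < 1e-8   # <P2, P1> = 0
assert P2[-1] == 1.0                    # P2(1) = 1
```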

Exercise 3.3.3 (Requires Exercise 3.3.2). Let $f(x) = e^x$ and let $P_k$ be the polynomials from Exercise 3.3.2. Find the constants $\alpha_0, \ldots, \alpha_3$ such that the polynomial
\[
p(x) = \alpha_0 P_0(x) + \alpha_1 P_1(x) + \alpha_2 P_2(x) + \alpha_3 P_3(x)
\]
best approximates $f$ in $L^2([-1, 1])$.

Exercise 3.3.4 (Requires Exercise 3.2.5). In this exercise we work in $L^2_H(\mathbb{R})$, defined using the Hermite inner product (see Exercise 3.2.5). Let $f(x) = e^x$.

1. Show that $f$ is an element of $L^2_H(\mathbb{R})$.

2. Find the constants $\alpha_0, \ldots, \alpha_3$ such that the polynomial
\[
p(x) = \alpha_0 H_0(x) + \alpha_1 H_1(x) + \alpha_2 H_2(x) + \alpha_3 H_3(x)
\]
best approximates $f$ in $L^2_H(\mathbb{R})$.

You might find the following Sage code helpful:

var('x')
a = integral((2*x)*exp(x)*exp(-x^2), (x, -Infinity, Infinity))
show(a)


3.4 Complex inner product spaces

We now introduce inner product spaces involving complex numbers. Recall that complex numbers take the form $a + ib$, where both $a$ and $b$ are real numbers and where $i^2 = -1$. We begin with some vocabulary regarding complex numbers. The real part of the complex number $a + ib$ is the real number $a$; the imaginary part is the real number $b$.

The complex conjugate of a complex number is the complex number formed by everywhere replacing $i$ by $-i$. The complex conjugate is denoted with an overbar; thus the conjugate of $z = a + ib$ is $\bar{z} = a - ib$.

The modulus of a complex number $z$ is the non-negative real number $(z\bar{z})^{1/2}$ and is given the same symbol as absolute value. Thus the modulus of $z = a + ib$ is $|z| = \sqrt{a^2 + b^2}$. The modulus measures the extent to which a complex number differs from zero: we have $|z| = 0$ exactly when $z = 0$.

An important identity for complex numbers is Euler's formula:
\[
e^{i\theta} = \cos\theta + i\sin\theta.
\]
Thus the real part of $e^{i\theta}$ is $\cos\theta$ and the imaginary part is $\sin\theta$. Note that any complex number $z$ can be written
\[
z = |z|e^{i\theta}
\]
for some number $\theta \in [0, 2\pi)$.

Example 3.4.1
Consider the complex number $z = -\sqrt{3} + 3i$. The real part of $z$ is $-\sqrt{3}$ and the imaginary part is $3$. The modulus is $|z| = \sqrt{12}$, and we can write
\[
z = \sqrt{12}\, e^{i\frac{2\pi}{3}}.
\]
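Example 3.4.1 can be double-checked with Python's built-in complex arithmetic:

```python
# Verify the modulus and polar form of z = -sqrt(3) + 3i from Example 3.4.1.
import cmath
import math

z = complex(-math.sqrt(3), 3)
assert math.isclose(abs(z), math.sqrt(12))            # modulus is sqrt(12)
assert math.isclose(cmath.phase(z), 2 * math.pi / 3)  # argument is 2*pi/3
# Euler's formula: z = |z| e^{i theta}
assert cmath.isclose(z, abs(z) * cmath.exp(2j * math.pi / 3))
```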

A complex vector space is a vector space where the scalars are com-plex numbers. Since all of the usual algebraic operations are possible withcomplex numbers, complex vector spaces behave exactly the same way thatreal vector spaces do.

Example 3.4.2
The vector space $\mathbb{C}^2$, which consists of column vectors having two complex entries, is a complex vector space. For instance, the vector
\[
v = \begin{pmatrix} 1 + i \\ -3i \end{pmatrix}
\]
is in $\mathbb{C}^2$. If we scale $v$ by the scalar $2 - i$ we obtain
\[
(2 - i)v = \begin{pmatrix} 3 + i \\ -3 - 6i \end{pmatrix}.
\]

While the definition of a complex vector space is essentially the same as for a real vector space, the definition of a complex inner product space is slightly different. This is because we need to make use of complex conjugates in order to compute moduli of complex numbers. Thus the definition of a complex inner product involves complex conjugates. In particular, a function $\langle\cdot,\cdot\rangle$ that takes in two elements of a complex vector space and gives out a complex number is a complex inner product if it

• is conjugate linear in the first entry and linear in the second entry², meaning
\[
\langle \alpha v, w\rangle = \bar{\alpha}\langle v, w\rangle, \qquad \langle v, \alpha w\rangle = \alpha\langle v, w\rangle,
\]
\[
\langle v_1 + v_2, w_1 + w_2\rangle = \langle v_1, w_1\rangle + \langle v_1, w_2\rangle + \langle v_2, w_1\rangle + \langle v_2, w_2\rangle;
\]

• is conjugate symmetric, meaning that
\[
\langle v, w\rangle = \overline{\langle w, v\rangle};
\]

• is positive definite, meaning that
\[
\langle v, v\rangle \ge 0,
\]
with equality only when $v = 0$. (Technical note: the conjugate-symmetry property implies that $\langle v, v\rangle$ is real, so it makes sense to discuss this inequality.)

Example 3.4.3

²Note that there is not consensus about whether the conjugate-linear slot should be first or second.


We can define a complex inner product on $\mathbb{C}^2$ by
\[
\left\langle \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}, \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} \right\rangle = \bar{v}_1 w_1 + \bar{v}_2 w_2.
\]
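This inner product is easy to implement and test. The sketch below checks the three defining properties on $\mathbb{C}^2$, using the vector $v$ from Example 3.4.2 together with a second vector and scalar that are made up for the test:

```python
# A minimal sketch of the complex inner product on C^2 defined above:
# <v, w> = conj(v1) w1 + conj(v2) w2.
import cmath

def inner(v, w):
    return sum(vk.conjugate() * wk for vk, wk in zip(v, w))

v = (1 + 1j, -3j)      # the vector from Example 3.4.2
w = (2j, 1 - 1j)       # an arbitrary second vector (not from the text)
a = 2 - 1j

# conjugate linear in the first entry, linear in the second
assert cmath.isclose(inner(tuple(a * vk for vk in v), w),
                     a.conjugate() * inner(v, w))
assert cmath.isclose(inner(v, tuple(a * wk for wk in w)),
                     a * inner(v, w))
# conjugate symmetric
assert cmath.isclose(inner(v, w), inner(w, v).conjugate())
# positive definite: <v, v> is real and nonnegative
assert inner(v, v).imag == 0 and inner(v, v).real > 0
```

Note that without the conjugate on the first slot, $\langle v, v\rangle$ would in general be complex, and the norm $\|v\| = \langle v, v\rangle^{1/2}$ would not make sense.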

We note that all of the definitions that rely on inner products are exactly the same in the complex case as in the real case. For example, we say that two vectors $v, w$ are orthogonal if $\langle w, v\rangle = 0$.

We now introduce the example of a complex inner product that is most important for this class. Fix some $L > 0$ and consider all piecewise smooth functions that have domain $[-L, L]$ and give complex numbers as outputs. Functions with complex outputs are called complex-valued and take the form
\[
u(x) = u_{\mathrm{real}}(x) + i\, u_{\mathrm{im}}(x),
\]
where $u_{\mathrm{real}}$ and $u_{\mathrm{im}}$ are functions with real outputs. A complex-valued function $u$ is called piecewise smooth if the functions $u_{\mathrm{real}}$ and $u_{\mathrm{im}}$ are each piecewise smooth real functions. The derivative of such a function is computed by
\[
u'(x) = u_{\mathrm{real}}'(x) + i\, u_{\mathrm{im}}'(x).
\]

We say that a complex-valued piecewise-smooth function $u$ is square integrable, and thus in $L^2(\Omega)$, if
\[
\int_\Omega |u|^2 \, dV < \infty,
\]
where $|u|^2 = \bar{u}u$ is the squared modulus of $u$ and where $dV$ is the appropriate length/area/volume element. For functions $v, w$ in $L^2(\Omega)$ we define the inner product
\[
\langle v, w\rangle = \int_\Omega \bar{v}\, w \, dV.
\]
Thus, just as in the case of real-valued functions, $L^2(\Omega)$ is the collection of all functions whose norm (defined by the integral inner product) is finite.

Example 3.4.4
Consider the domain $\Omega = [0, \pi]$. The function $v(x) = e^{ix}$ is complex-valued with real part $v_{\mathrm{real}}(x) = \cos(x)$ and imaginary part $v_{\mathrm{im}}(x) = \sin(x)$. We have $\bar{v}(x) = e^{-ix}$ and
\[
\|v\|^2 = \langle v, v\rangle = \int_0^\pi e^{-ix} e^{ix} \, dx = \int_0^\pi dx = \pi.
\]
Thus $\|v\| = \sqrt{\pi}$ and $v$ is in $L^2([0,\pi])$. Note that
\[
v'(x) = i e^{ix} = i\cos(x) - \sin(x).
\]
Similarly, we see that the function $w(x) = ix - 5$ is in $L^2([0,\pi])$. Are the functions $v$ and $w$ orthogonal? We compute the inner product of $v$ and $w$ to be
\[
\langle v, w\rangle = \int_0^\pi e^{-ix}(ix - 5)\, dx
= \Bigl[-(x - i)e^{-ix} - 5i\, e^{-ix}\Bigr]_0^\pi
= \pi + 8i.
\]
Thus the two functions are not orthogonal.
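The value $\pi + 8i$ can be cross-checked numerically (this check is not in the text) by applying the trapezoid rule to the complex-valued integrand:

```python
# Numerical cross-check of Example 3.4.4:
# integral_0^pi e^{-ix} (ix - 5) dx = pi + 8i.
import numpy as np

x = np.linspace(0.0, np.pi, 200001)
integrand = np.exp(-1j * x) * (1j * x - 5)
# composite trapezoid rule, applied separately to real and imaginary parts
val = np.sum((integrand[:-1] + integrand[1:]) / 2 * np.diff(x))
assert abs(val - (np.pi + 8j)) < 1e-6
```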

We conclude this section with the following note about notation.

Remark 3.4.5.

1. In physics one sometimes sees a star used to indicate the complex conjugate, writing $z^*$ instead of $\bar{z}$. Also, physicists use the so-called "bra-ket" notation $\langle\cdot|\cdot\rangle$ for inner products. Finally, physicists sometimes write "$d^3x$" for the volume element $dV$, which they place before the integrand rather than after it. The result of all this is that the following expressions describe the same (three-dimensional) integral:
\[
\text{Math:} \quad \langle u, v\rangle = \int_\Omega \bar{u}\, v \, dV
\qquad\qquad
\text{Physics:} \quad \langle u | v\rangle = \int_\Omega d^3x\, u^* v.
\]

2. In mathematics, one often sees complex inner products with the complex conjugate on the second, rather than the first, entry. Under this convention, complex inner products are conjugate linear in the second entry and linear in the first entry.

Exercise 3.4.1. In this exercise you work in $L^2([-L, L])$ for some $L > 0$. Consider the functions $\phi_k(x) = e^{i\frac{k\pi}{L}x}$, where $k = \ldots, -2, -1, 0, 1, 2, \ldots$.

1. Show that the functions $\phi_k$ are in $L^2([-L, L])$. What is $\|\phi_k\|$?


2. Show that the functions $\phi_k$ satisfy the periodic boundary conditions (2.16).

3. Show that the functions $\phi_k$ are orthogonal to one another.

3.5 Periodic Fourier series

We now consider the best approximation problem in complex vector spaces. Suppose we have an orthogonal collection of complex vectors $v_1, \ldots, v_n$ and want to choose complex numbers $\alpha_1, \ldots, \alpha_n$ so that the sum
\[
u_n = \alpha_1 v_1 + \cdots + \alpha_n v_n
\]
best approximates the vector $u$. Proceeding as in §3.3, we consider the function
\[
f(\alpha_1, \ldots, \alpha_n) = \| u - (\alpha_1 v_1 + \cdots + \alpha_n v_n)\|^2,
\]

which we want to minimize. When we expand this expression we must take care, because the inner product used to define the norm is conjugate linear in the first entry. We find
\[
f(\alpha_1, \ldots, \alpha_n) = \|u\|^2
- \alpha_1 \overline{\langle v_1, u\rangle} - \cdots - \alpha_n \overline{\langle v_n, u\rangle}
- \bar{\alpha}_1 \langle v_1, u\rangle - \cdots - \bar{\alpha}_n \langle v_n, u\rangle
+ |\alpha_1|^2 \|v_1\|^2 + \cdots + |\alpha_n|^2 \|v_n\|^2.
\]

Now we need to be careful because $\alpha_1, \ldots, \alpha_n$ are complex numbers. Thus in order to use the methods of multivariable calculus for optimization we need to convert the problem to one of real variables. We can do this by writing $\alpha_1 = a_1 + i b_1$, etc., and then setting
\[
\frac{\partial f}{\partial a_k} = 0 \quad\text{and}\quad \frac{\partial f}{\partial b_k} = 0 \tag{3.6}
\]
for each $k = 1, \ldots, n$.

For computational simplicity we compute, using the chain rule, in terms of the variables $\alpha_k = a_k + i b_k$ and $\bar{\alpha}_k = a_k - i b_k$. Using the chain rule we have
\[
\frac{\partial f}{\partial a_k} = \frac{\partial f}{\partial \alpha_k} + \frac{\partial f}{\partial \bar{\alpha}_k},
\qquad
\frac{\partial f}{\partial b_k} = i\left(\frac{\partial f}{\partial \alpha_k} - \frac{\partial f}{\partial \bar{\alpha}_k}\right).
\]


Thus (3.6) is equivalent to the condition that
\[
\frac{\partial f}{\partial \alpha_k} = 0 \quad\text{and}\quad \frac{\partial f}{\partial \bar{\alpha}_k} = 0
\]
for each $k = 1, \ldots, n$.

In order to compute the derivatives of $f$, note that for each $k$ we have $|\alpha_k|^2 = \alpha_k \bar{\alpha}_k$. Thus
\[
\frac{\partial f}{\partial \alpha_k} = -\overline{\langle v_k, u\rangle} + \bar{\alpha}_k \|v_k\|^2,
\qquad
\frac{\partial f}{\partial \bar{\alpha}_k} = -\langle v_k, u\rangle + \alpha_k \|v_k\|^2.
\]
Setting each of these equal to zero, we conclude that the optimal choice of $\alpha_k$ is
\[
\alpha_k = \frac{\langle v_k, u\rangle}{\|v_k\|^2}.
\]

Thus the best approximation of $u$ is given by
\[
u_n = \frac{\langle v_1, u\rangle}{\|v_1\|^2}\, v_1 + \cdots + \frac{\langle v_k, u\rangle}{\|v_k\|^2}\, v_k + \cdots + \frac{\langle v_n, u\rangle}{\|v_n\|^2}\, v_n.
\]

It is important to note the order in the inner product: in the complexsetting the order matters because the inner product is conjugate symmetric,not symmetric.
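The computation above can be sketched in a few lines of Python (the vectors here are made up for illustration); note the use of `np.vdot`, which conjugates its first argument, matching our convention for complex inner products:

```python
# Best approximation by orthogonal complex vectors:
# alpha_k = <v_k, u> / ||v_k||^2, with the conjugate in the first slot.
import numpy as np

def inner(v, w):
    return np.vdot(v, w)        # np.vdot conjugates its first argument

v1 = np.array([1.0, 1j, 0.0])
v2 = np.array([1j, 1.0, 0.0])   # orthogonal to v1: <v1, v2> = 1j + (-1j) = 0
u = np.array([2.0 + 1j, -1.0, 3j])

alphas = [inner(vk, u) / inner(vk, vk) for vk in (v1, v2)]
un = alphas[0] * v1 + alphas[1] * v2

# u_n is the orthogonal projection: the residual is orthogonal to each v_k
assert abs(inner(v1, u - un)) < 1e-12
assert abs(inner(v2, u - un)) < 1e-12
# and it beats any perturbed choice of coefficients in norm
err = np.linalg.norm(u - un)
worse = np.linalg.norm(u - ((alphas[0] + 0.1) * v1 + alphas[1] * v2))
assert err < worse
```

Writing `inner(u, vk)` instead of `inner(vk, u)` would produce the complex conjugates of the correct coefficients, which is exactly the order-of-arguments point made above.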

For the purposes of this course, the most important example of an orthogonal collection of functions in a complex inner product space is the collection
\[
\phi_k(x) = e^{i\frac{k\pi}{L}x}, \qquad k = \ldots, -2, -1, 0, 1, 2, \ldots \tag{3.7}
\]
introduced in Exercise 3.4.1. The approximation of a function determined by the functions (3.7) is called the (periodic) Fourier series³. Given a function $u$ in $L^2([-L, L])$, the Fourier series is constructed by first computing the Fourier coefficients
\[
c_k = \frac{\langle \phi_k, u\rangle}{\|\phi_k\|^2} = \frac{1}{2L}\int_{-L}^{L} e^{-i\frac{k\pi}{L}y}\, u(y)\, dy. \tag{3.8}
\]
These coefficients are used to form the associated Fourier series
\[
u = \sum_{k=-\infty}^{\infty} c_k \phi_k.
\]

³Here the word periodic is in parentheses because this is the "default" Fourier series. If someone refers to Fourier series without specifying boundary conditions, they mean the complex periodic series.


Example 3.5.1
Consider the function $u(x) = x$. We compute the Fourier coefficients
\[
c_k = \frac{1}{2L}\int_{-L}^{L} e^{-i\frac{k\pi}{L}y}\, y \, dy =
\begin{cases}
(-1)^k \dfrac{L}{k\pi}\, i & \text{if } k \ne 0,\\[1ex]
0 & \text{if } k = 0.
\end{cases}
\]
Thus the Fourier series for $u$ is
\[
u(x) = \sum_{k=-\infty}^{-1} (-1)^k \frac{L}{k\pi}\, i\, e^{i\frac{k\pi}{L}x}
+ \sum_{k=1}^{\infty} (-1)^k \frac{L}{k\pi}\, i\, e^{i\frac{k\pi}{L}x}.
\]
Notice that we can re-index the first sum to obtain an expression that only involves real-valued functions:
\[
u(x) = \sum_{k=1}^{\infty} (-1)^k \frac{L}{k\pi}\, i \left( e^{i\frac{k\pi}{L}x} - e^{-i\frac{k\pi}{L}x} \right)
= \sum_{k=1}^{\infty} (-1)^{k+1} \frac{2L}{k\pi} \sin\left(\frac{k\pi}{L}x\right).
\]
The following Sage code generates a plot of the function $u$ (dashed black line) and the $n = 10$ partial sum of the Fourier series; here we have set $L = \pi$ for the purposes of plotting.

var('x,k,n')
n = 10
f(x) = x
fplot = plot(f, (x,-pi,pi), figsize=[4,2], color='black',
             linestyle="dashed")
fs(x) = sum((2*((-1)^(k+1))/k)*sin(k*x), k, 1, n)
fsplot = plot(fs, (x,-pi,pi), figsize=[4,4])
show(fplot+fsplot)

var(’x,k,n’)n=10f(x) = xfplot = plot(f,(x,-pi ,pi),figsize =[4,2], color=’black’,

linestyle="dashed")

fs(x) = sum ((2*(( -1)^(k+1))/k)*sin(k*x),k,1,n)fsplot = plot(fs ,(x,-pi,pi),figsize =[4 ,4])

show(fplot+fsplot)

The resulting graphic is
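As an independent check of the coefficient formula (this cross-check is not in the text), we can evaluate the integral in (3.8) numerically for $u(y) = y$ and compare with $(-1)^k (L/(k\pi)) i$:

```python
# Numerical cross-check of the coefficients in Example 3.5.1:
# c_k = (1/2L) integral_{-L}^{L} e^{-ik pi y/L} y dy = (-1)^k (L/(k pi)) i.
import numpy as np

L = np.pi
y = np.linspace(-L, L, 200001)

def c(k):
    integrand = np.exp(-1j * k * np.pi * y / L) * y
    # composite trapezoid rule, then the 1/(2L) normalization
    return np.sum((integrand[:-1] + integrand[1:]) / 2 * np.diff(y)) / (2 * L)

for k in (1, 2, 3, -1):
    expected = ((-1) ** k) * (L / (k * np.pi)) * 1j
    assert abs(c(k) - expected) < 1e-6
```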


Example 3.5.2

example:FS-parabola Consider the function v(x) = x2. Using the Sage code

var(’y,k,L’)ck = integrate( exp(-I*k*pi*y/L)*y^2, (y,-L,L))show(ck)

we compute the Fourier coe�cients

ck =1

2L

ZL

�L

e�i

k⇡L y

dy =

((�1)k2L2

⇡2k2if k 6= 0

L2

3 if k = 0.

Thus the Fourier series for u is

v(x) =L2

3+X

k 6=0

(�1)k2L2

⇡2k2eik⇡L x

.

Re-indexing the sum, we can write this as

v(x) =L2

3+

1X

k=1

(�1)k4L2

k2⇡2cos

✓k⇡

Lx

Page 30: A motivating example · consider vector spaces with both real scalars and, in §3.4 below, complex scalars. Example 3.1.1 The collection of all continuous functions with domain [0,L]

64 CHAPTER 3. VECTOR SPACE STUFF

The following Sage code generates a plot of the function u (dashed blackline) and the n = 10 partial sum of the Fourier series; here we have setL = ⇡.

var(’x,k,n’)n=10f(x) = x^2fplot = plot(f,(x,-pi ,pi),figsize =[4,2], color=’black’,

linestyle="dashed")

fs(x) = pi^2/3 + sum (((4*( -1)^k)/(k^2))*cos(k*x),k,1,n)fsplot = plot(fs ,(x,-pi,pi),figsize =[4 ,3])

show(fplot+fsplot)

The resulting graphic is
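The coefficients of Example 3.5.2 can be cross-checked numerically in the same way as those of Example 3.5.1 (again, this check is not in the text):

```python
# Numerical cross-check of Example 3.5.2:
# c_0 = L^2/3 and c_k = (-1)^k 2 L^2/(pi^2 k^2) for k != 0.
import numpy as np

L = np.pi
y = np.linspace(-L, L, 200001)

def c(k):
    integrand = np.exp(-1j * k * np.pi * y / L) * y ** 2
    # composite trapezoid rule, then the 1/(2L) normalization
    return np.sum((integrand[:-1] + integrand[1:]) / 2 * np.diff(y)) / (2 * L)

assert abs(c(0) - L ** 2 / 3) < 1e-6
for k in (1, 2, 5):
    expected = ((-1) ** k) * 2 * L ** 2 / (np.pi ** 2 * k ** 2)
    assert abs(c(k) - expected) < 1e-6
```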

There are two important things to note about Examples 3.5.1 and 3.5.2. The first is that in both cases we were able to express the Fourier series as a sum of real functions: in the first case the sum is in terms of sine functions; in the second case it is in terms of cosine functions. In general, if the function $u$ we start with is real-valued, then the Fourier series can be expressed in a manner which makes apparent that it, too, is real-valued. See Exercise 3.5.2 for a further exploration of this.

The second important thing to note about Examples 3.5.1 and 3.5.2 is the behavior at the endpoints of the domain. In Example 3.5.2, the Fourier series approximation is quite good, even when we only plot the first 10 terms. In Example 3.5.1, the behavior at the endpoints is a bit strange. To better understand what's going on here, notice that the functions (3.7) are all $2L$-periodic, meaning that $\phi_k(x + 2L) = \phi_k(x)$ for any real number $x$. Thus the Fourier series approximation of any function is also $2L$-periodic. To see this, consider the following graphic, which contains the plots of the Fourier series approximations from the two examples, now shown on the domain $[-3\pi, 3\pi]$:

The functions that these plots are approximating are the periodic extensions of the functions $u(x) = x$ and $v(x) = x^2$, which are obtained by extending the domain from $[-L, L]$ to all real numbers under the condition that the functions are $2L$-periodic. The plots of these periodically extended functions are

The Sage code used to generate these plots is:

var('x')
f1(x) = piecewise([((-3*pi,-pi), x+2*pi), ((-pi,pi), x), ((pi,3*pi), x-2*pi)])
f1plot = plot(f1, (x,-3*pi,3*pi), exclude=[-pi, pi])
f2(x) = piecewise([((-3*pi,-pi), (x+2*pi)^2), ((-pi,pi), x^2), ((pi,3*pi), (x-2*pi)^2)])
f2plot = plot(f2, (x,-3*pi,3*pi))
show(f1plot, figsize=[4,3])
#show(f2plot, figsize=[4,3])


These latter plots shed light on the behavior at the endpoints seen in Example 3.5.1. The Fourier series approximation is actually approximating the periodic extension of the function $u(x) = x$, which is not continuous at $x = \pm L$. Rather, the periodic extension jumps at $x = \pm L$, and thus the Fourier series approximation does as well.
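This jump behavior can be seen numerically (a supplementary check, not from the text): with $L = \pi$, the partial sums of the Fourier series of $u(x) = x$ vanish at $x = \pi$, the midpoint of the jump, rather than converging to $u(\pi) = \pi$.

```python
# Partial sums of the Fourier series of u(x) = x on [-pi, pi]
# (the sine series computed in Example 3.5.1, with L = pi).
import numpy as np

def partial_sum(x, n):
    k = np.arange(1, n + 1)
    return float(np.sum(((-1) ** (k + 1)) * (2.0 / k) * np.sin(k * x)))

# At the jump x = pi, every term sin(k*pi) vanishes: the series converges
# to 0, the midpoint of the jump, not to u(pi) = pi.
assert abs(partial_sum(np.pi, 500)) < 1e-9
# At an interior point the partial sums do approach u(x) = x.
assert abs(partial_sum(1.0, 20000) - 1.0) < 1e-2
```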

Exercise 3.5.1. Find the (periodic) Fourier series for the following functions. Have Sage make a plot of the approximations.

1. The "square"
\[
u(x) =
\begin{cases}
-1 & \text{if } -L < x < 0,\\
1 & \text{if } 0 < x < L.
\end{cases}
\]

2. The "triangle"
\[
u(x) =
\begin{cases}
L + x & \text{if } -L < x < 0,\\
L - x & \text{if } 0 < x < L.
\end{cases}
\]

3. The "sawtooth"
\[
u(x) =
\begin{cases}
x + L & \text{if } -L < x < 0,\\
x & \text{if } 0 < x < L.
\end{cases}
\]

4. The "pulse of size $a$"
\[
u(x) =
\begin{cases}
0 & \text{if } |x| > a,\\[1ex]
\dfrac{1}{2a} & \text{if } |x| < a,
\end{cases}
\]
where $a$ is some positive number less than $L$.

Exercise 3.5.2. Suppose that $u$ is a real-valued function in $L^2([-L, L])$ and that
\[
u(x) = \sum_{k=-\infty}^{\infty} c_k e^{i\frac{k\pi}{L}x}
\]
is the corresponding Fourier series.

1. Show that
\[
u(x) = a_0 + \sum_{k=1}^{\infty} a_k \cos\left(\frac{k\pi}{L}x\right) + \sum_{k=1}^{\infty} b_k \sin\left(\frac{k\pi}{L}x\right),
\]


where
\[
a_0 = \frac{1}{2L}\int_{-L}^{L} u(y)\, dy
\]
and, for $k = 1, 2, 3, \ldots$,
\[
a_k = \frac{1}{L}\int_{-L}^{L} u(y)\cos\left(\frac{k\pi}{L}y\right) dy,
\qquad
b_k = \frac{1}{L}\int_{-L}^{L} u(y)\sin\left(\frac{k\pi}{L}y\right) dy.
\]

2. Find the relationship between the coefficients $a_k, b_k$ and the complex Fourier coefficients $c_k$.