
Lecture 7

Inner product spaces (cont’d)

The “moral of the story” regarding discontinuities: They affect the rate of convergence of Fourier series

As suggested by the previous example, discontinuities of a function f(x) create problems for its Fourier

series expansion by slowing down its rate of convergence. At a jump discontinuity, the convergence

may be quite slow, with the partial sums demonstrating Gibbs’ “ringing.”

Another way to look at this situation is as follows: Generally, a higher number of terms in the

Fourier series expansion – or “higher frequencies” – are needed in order to approximate a function

f(x) near points of discontinuity.

But, in fact, it doesn’t stop there – the existence of points of discontinuity actually affects the rate of convergence in other regions of the interval of expansion. To see this, let’s return to the two examples

studied above, i.e., the functions

f1(x) = |x| = { −x,  −π < x ≤ 0,
                 x,   0 < x ≤ π,        (1)

and

f2(x) = { −1,  −π < x ≤ 0,
           1,   0 < x ≤ π.        (2)

Note that we have subscripted them for convenience. Recall that the function f1(x) is continuous on

[−π, π] and its 2π-extension is continuous for all x ∈ R. On the other hand f2(x) has a discontinuity

at x = 0 and its 2π-extension has discontinuities at all points x = kπ, k ∈ Z.

We noticed how well a rather low number of terms (i.e., 5) in the Fourier expansion of f1(x)

approximated it over the interval [−π, π]. On the other hand, we saw how the discontinuities of f2(x)

affected the performance of the Fourier expansion, even for a much larger number of terms (i.e., 50).

This is not so surprising when we examine the decay rates of the Fourier series coefficients for each

function:

1. For f1(x), the coefficients ak decay as O(1/k²) as k → ∞.

2. For f2(x), the coefficients bk decay as O(1/k) as k → ∞.


The coefficients for f1 are seen to decay more rapidly than those of f2. As such, you don’t have to

go to such high k values (which multiply sine and cosine functions, of maximum absolute value 1) for

the coefficients ak to become negligible to some prescribed accuracy ǫ. (Of course, there is the infinite

“tail” of the series to worry about, but the above reasoning is still valid.)
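These decay rates are easy to check numerically. The following is a small Python/NumPy sketch (not part of the original notes – the quadrature grid and the particular odd values of k are arbitrary illustrative choices) that estimates the standard Fourier coefficients of f1 and f2 and prints k²ak and k·bk, which should remain roughly constant if the quoted decay rates hold.

    import numpy as np

    # Midpoint grid on [-pi, pi] for a crude numerical quadrature.
    Npts = 200000
    x = -np.pi + (np.arange(Npts) + 0.5) * (2 * np.pi / Npts)
    dx = 2 * np.pi / Npts
    f1 = np.abs(x)       # continuous on [-pi, pi]
    f2 = np.sign(x)      # jump discontinuity at x = 0

    for k in (1, 3, 5, 11, 21, 41):   # odd k only; the even-k coefficients vanish here
        a_k = np.sum(f1 * np.cos(k * x)) * dx / np.pi   # cosine coefficient of f1 (even)
        b_k = np.sum(f2 * np.sin(k * x)) * dx / np.pi   # sine coefficient of f2 (odd)
        print(k, round(k * k * a_k, 4), round(k * b_k, 4))
    # k^2*a_k and k*b_k stay roughly constant (about -4/pi and 4/pi respectively),
    # consistent with the O(1/k^2) and O(1/k) decay rates quoted above.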

The other important point is that the rate of decay of the coefficients affects the convergence over

the entire interval, not just around points of discontinuity. This has been viewed as a disadvantage

of Fourier series expansions: that a “bad point,” p, i.e. a point of discontinuity, even near or at the

end of an interval will affect the convergence of a Fourier series over the entire interval, even if the

function f(x) is “very nice” on the other side of the interval. We illustrate this situation in the sketch

on the left in the figure below.

Researchers in the signal/image processing community recognized this problem years ago and

came up with a clever solution: If the convergence of the Fourier series over the entire interval [a, b]

is being affected by such a bad point p, why not split the interval into two subintervals, say A = [a, c] and B = [c, b], and perform separate Fourier series expansions over each subinterval? Perhaps in this way, the number of coefficients saved by the “niceness” of f(x) over [a, c] might exceed the number of coefficients needed to accommodate the “bad” point p. The idea is illustrated in the sketch on the right

in the figure below.

The above discussion is, of course, rather simplified, but it does describe the basic idea behind

block coding, i.e., partitioning a signal or image into subblocks and Fourier coding each subblock, as

opposed to coding the entire signal/image.

Block coding is the basis of the JPEG compression method for images as well as for the MPEG

method for video sequences. More on this later.

[Figure: two sketches of y = f(x) on [a, b]. Left – a single Fourier series on [a, b], with a “bad” point of discontinuity p near one end and a “nice” region of smoothness of f(x) elsewhere. Right – the interval split at c, with separate Fourier series on [a, c] and [c, b].]


Greater degree of smoothness implies faster decay of Fourier series coefficients

The effect of discontinuities on the rate of convergence of Fourier series expansions does not end with

the discussion above. Recall that the Fourier series for the continuous function f1(x) given above

demonstrated quite rapid convergence. But a series can converge even more rapidly, if its Fourier series coefficients ak and bk decay even more rapidly than 1/k². Recall that the function f1(x) is continuous, but that its derivative f1′(x) is only piecewise

continuous, having discontinuities at x = 0 and x = ±π. Functions with greater degrees of smoothness,

i.e., higher-order continuous derivatives will have Fourier series with more rapid convergence. We

simply state the following result without proof:

Theorem: Suppose that f(x) is 2π-periodic and C^n[−π, π] for some n > 0 – that is, its nth derivative (and all lower-order derivatives) is continuous. Then the Fourier series coefficients ak and bk decay as

ak, bk = O(1/k^{n+1}),  as k → ∞.

An idea of the proof is as follows. To avoid complications, suppose that f is piecewise continuous,

corresponding to n = 0 above. Then the coefficients must decay at least as quickly as 1/k, since they

comprise a square-summable sequence in l2. Now consider the function

g(x) = ∫_0^x f(s) ds,        (3)

which is a continuous function of x (Exercise). The Fourier series coefficients of g(x) may be obtained by termwise integration of the Fourier series of f(x) (AMATH 231). This implies that the series coefficients of g(x) will decay at least as quickly as 1/k². Integrate again, etc.

In other words, the more “regular” or “smooth” a function f(x) is, the faster the decay of its

Fourier series coefficients, implying that you can generally approximate f(x) to a desired accuracy

over the interval with fewer terms in the Fourier series expansion. Conversely, the more

“irregular” a function f(x) is, the slower the decay of its FS coefficients, so that you’ll need more terms

in the FS expansion to approximate it to a desired accuracy. This feature of regularity/approximability

is very well-known and appreciated in the signal and image processing field. In fact, it is a very

important, and still ongoing, field of research in analysis.

The above discussion may seem somewhat “handwavy” and imprecise. Let’s look at the problem


in a little more detail. We’ll consider the more general case in which a function f(x) is expressed in terms of a set of functions, {φk(x)}, k = 1, 2, · · ·, which form a complete orthonormal basis on an interval [a, b], i.e.,

f(x) = Σ_{k=1}^∞ ck φk(x),    ck = 〈f, φk〉.        (4)

Here, the equation is understood in the L2 sense, i.e., the sequence of partial sums, Sn(x), defined as

follows,

Sn(x) = Σ_{k=1}^n ck φk(x),        (5)

converges to f in L2 norm/metric, i.e.,

‖f − Sn‖2 → 0 as n → ∞. (6)

The expression in the above equation is the magnitude of the error associated with the approximation

f(x) ≅ Sn(x), which we shall simply refer to as the error in the approximation. This error may be

expressed in terms of the Fourier coefficients ck. First note that

f(x) − Sn(x) = Σ_{k=n+1}^∞ ck φk.        (7)

Therefore the L2-squared error is given by

‖f − Sn‖2^2 = 〈f − Sn, f − Sn〉 = 〈 Σ_{k=n+1}^∞ ck φk, Σ_{l=n+1}^∞ cl φl 〉 = Σ_{k=n+1}^∞ |ck|².        (8)

Thus,

‖f − Sn‖2 = [ Σ_{k=n+1}^∞ |ck|² ]^{1/2}.        (9)

Recall that for the above sum of an infinite series to be finite, the coefficients ck must tend to

zero sufficiently rapidly. The above summation of coefficients starting at k = n + 1 may be viewed as

involving the “tail” of the infinite sequence of coefficients ck, as sketched schematically below.

For a fixed n > 0, the greater the rate of decay of the coefficients ck, the smaller the area under

the curve that connects the tops of these lines representing the coefficient magnitudes, i.e., the smaller

the magnitude of the term on the right of Eq. (9), hence the smaller the error in the approximation.

From a signal processing point of view, more of the signal is concentrated in the first n coefficients ck.
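In practice, Eq. (9) gives a convenient way to see how many coefficients are needed: the L2 error of a partial sum is just the l2 norm of the coefficient tail. A minimal Python/NumPy sketch (the coefficient sequences below are hypothetical, chosen only to contrast two decay rates):

    import numpy as np

    def tail_error(c, n):
        # Eq. (9): the L2 error of the partial sum that keeps the first n coefficients
        # is the l2 norm of the remaining "tail" of the sequence.
        return np.sqrt(np.sum(np.asarray(c, dtype=float)[n:] ** 2))

    k = np.arange(1, 5001)
    c_smooth = 1.0 / k**2     # coefficients decaying like 1/k^2
    c_rough  = 1.0 / k        # coefficients decaying like 1/k (truncated at 5000 terms)

    for n in (5, 20, 100):
        print(n, round(tail_error(c_smooth, n), 4), round(tail_error(c_rough, n), 4))
    # For the same number n of retained coefficients, the faster-decaying sequence
    # leaves a much smaller tail, i.e. a much smaller approximation error.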


[Figure: plot of |ck|² vs. k, indicating the “tail” of the infinite sequence of coefficients, k ≥ n + 1.]

From the examples presented earlier, we see that singularities in the function/signal, e.g., discontinuities of the function, will generally reduce the rate of decay of the Fourier coefficients. As such,

for a given n, the error of approximation by the partial sum Sn will be larger. This implies that in

order to achieve a certain accuracy in our approximation, we shall have to employ more coefficients

in our expansion. In the case of the Fourier series, this implies the use of functions sin kx and cos kx

with higher k, i.e., higher frequencies.

Unfortunately, such singularities cannot be avoided, especially in the case of images. Images are

defined by edges, i.e., sharp changes in greyscale values, which are precisely the points of discontinuity

in an image.

However, singularities are not the only reason that the rate of decay of Fourier coefficients may

be reduced, as we’ll see below.

Higher variation means higher frequencies are needed

In the previous discussion, we saw how the irregularity or lack of smoothness of a function f(x) –

for example, points of discontinuity in f(x) or its derivatives – affects the convergence of its Fourier

series expansion. This phenomenon is very important in signal and image processing, particularly in

the field of signal/image compression, where we wish to store approximations to the signal f(x) to a

prescribed accuracy with as few coefficients as possible.

In addition to smoothness, however, the rate of change of f , as measured by the magnitude of its

derivative, |f ′(x)|, or gradient ‖∇f‖, also affects the convergence. Contrast the two functions sketched


below. The function on the left, g(x), has little variation over the interval [a, b] whereas the one on

the right, h(x), has significant variation.

[Figure: left – a function g(x) with little variation over [a, b]; right – a function h(x) with significant variation over [a, b].]

In order to accommodate more rapid change in a function such as h(x), i.e., in order to approximate such a function

better, sine and cosine functions of higher frequencies, i.e., higher oscillation, are required. In other

words, we expect that the Fourier series coefficients of g(x) will decay more rapidly than those of h(x).

Example 1: We can illustrate this point with the help of the following analytical example. Consider

the normalized Gaussian function,

gσ(x) = (1/√(2πσ²)) e^{−x²/(2σ²)},        (10)

which you have probably encountered in a course on probability or statistics. The variance of this

function is σ² and its standard deviation is σ. As σ decreases toward zero, the graph of gσ(x) becomes

more peaked – higher and narrower – as shown in the figure below. In what follows, we’ll consider the

function gσ(x) as defined only over the interval [−π, π] so that we may examine its Fourier series.

[Figure: Gaussian functions gσ(x) for σ = 1, 0.5, 0.25, plotted on [−3, 3]; the graph becomes higher and narrower as σ decreases.]


Clearly, as σ decreases, the magnitude of the derivative of gσ(x) increases near x = 0. Let us now observe

the effect of this increase on the Fourier coefficients of gσ(x). Since it is an even function, its Fourier

series will be composed only of cosine functions, i.e.,

gσ(x) = c0 + Σ_{k=1}^∞ ck φk(x),        (11)

where we are using the orthonormal cosine basis set (see earlier notes),

φ0(x) = 1/√(2π),    φk(x) = (1/√π) cos kx,  k ≥ 1.        (12)

Technically, the computation of the integrals of the Gaussian function is rather complicated since we

are integrating only over the finite interval [−π, π]. For sufficiently small σ, the “tail” of gσ(x) lying

outside this interval is very small – in fact, it is exponentially small, therefore negligible. To a good

approximation, therefore,

a0 = ∫_{−π}^{π} gσ(x) φ0(x) dx ≅ (1/√(2π)) ∫_{−∞}^{∞} gσ(x) dx = 1/√(2π),        (13)

and

ak = (1/√π) ∫_{−π}^{π} gσ(x) cos kx dx
   ≅ (1/√π) ∫_{−∞}^{∞} gσ(x) cos kx dx
   = (1/√π) ∫_{−∞}^{∞} (1/√(2πσ²)) e^{−x²/(2σ²)} cos kx dx
   = (1/√π) e^{−σ²k²/2}.        (14)

These results can be derived from the following formula that can be found in integral tables,

∫_0^∞ e^{−a²x²} cos bx dx = (√π/(2a)) e^{−b²/(4a²)}.        (15)

You let a² = 1/(2σ²) and then do some algebra.

Note that the distribution of ak values with respect to k > 0 – we don’t even have to square them since they are all positive – is a Gaussian profile in k with standard deviation 1/σ. As we let σ → 0+,


[Figure: profile of the ak coefficients vs. k – a Gaussian-shaped curve of width 1/σ centered at k = 0.]

the distribution spreads out, in complete opposition to the function gσ(x) getting more concentrated

at x = 0. (We’ll return to this theme – the complementarity of space and frequency – later in this

course.)
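As a quick numerical sanity check on Eq. (14) – again a Python/NumPy sketch rather than part of the original notes, with an arbitrary quadrature grid – one can compare the coefficients ak computed by direct integration over [−π, π] with the closed form (1/√π) e^{−σ²k²/2}:

    import numpy as np

    def g(x, sigma):
        # Normalized Gaussian of Eq. (10)
        return np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

    Npts = 200000
    x = -np.pi + (np.arange(Npts) + 0.5) * (2 * np.pi / Npts)
    dx = 2 * np.pi / Npts

    for sigma in (1.0, 0.5, 0.25):
        for k in (1, 2, 4, 8):
            a_num = np.sum(g(x, sigma) * np.cos(k * x)) * dx / np.sqrt(np.pi)  # Eq. (14), first line
            a_formula = np.exp(-sigma**2 * k**2 / 2) / np.sqrt(np.pi)          # Eq. (14), closed form
            print(sigma, k, round(a_num, 6), round(a_formula, 6))
    # The two columns agree closely (the tiny discrepancy for sigma = 1 comes from the
    # Gaussian tail outside [-pi, pi]); as sigma decreases, the coefficients decay more
    # slowly in k -- the profile of a_k spreads out, as described above.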

Example 2: This is a numerical version of the previous example. For 0 < a < π, let ga(x) denote

the function,

ga(x) = { √(3/(2a)) (1 − x/a),   0 ≤ x ≤ a,
          √(3/(2a)) (1 + x/a),  −a ≤ x ≤ 0,
          0,                     a < |x| ≤ π.        (16)

A sample graph of this function is sketched in the figure below.

[Figure: graph of y = ga(x) on [−π, π] – a triangular peak of height √(3/(2a)), supported on [−a, a] and centered at x = 0.]

The multiplicative factor √(3/(2a)) was chosen so that

‖ga‖2 = 1,        (17)

for all a > 0, a kind of normalization condition. Note that as a approaches zero, the peak becomes more pronounced, since the magnitudes of the slopes of the peak are given by |ga′(x)| = √(3/2) a^{−3/2}.


Since the function ga(x) is even, it will admit a Fourier cosine series (i.e., the coefficients bk of

all sine terms are zero). Here we consider the expansion of ga(x) in terms of the orthonormal cosine

basis,

e0(x) = 1/√(2π),    ek(x) = (1/√π) cos kx,  k ≥ 1.        (18)

Then

ga(x) = c0 + Σ_{k=1}^∞ ck ek(x),        (19)

where

ck = 〈ga, ek〉. (20)

For example,

c0 = (1/√(2π)) · 2 · ∫_0^a ga(x) dx = (1/2) √(3a/π).        (21)

Since ga ∈ L2[−π, π], the sequence of Fourier coefficients c = (c0, c1, c2, · · ·) is square summable, i.e.,

c ∈ l2 sequence space. Moreover, from a previous lecture,

‖ga‖L2 = ‖c‖l2 = 1, (22)

implying that

Σ_{k=0}^∞ [ck]² = 1.        (23)

In the figure below are plotted the coefficients cn, 0 ≤ n ≤ 20, for the values a = 1.0, 0.5, 0.25, 0.1, 0.05.

(The coefficients were computed using MAPLE.) The plots clearly show that the rate of decay of the

coefficients decreases as a is decreased. For a = 1.0, the coefficients cn appear to be negligible for

n > 5, at least to the resolution of the plot. This would suggest that the partial sum function S5(x),

composed of cosine terms with coefficients c0 to c5 would provide an excellent approximation to ga(x)

over the interval. On the other hand, for a = 0.5, it appears that we would have to use the partial

sum S10(x), and so on.

In order to understand this more quantitatively, the partial sums S20(x) were computed for the a-

values shown in the above figure. From these partial sums, the L2 distances ‖ga−S20‖2 were computed

(using MAPLE). These distances represent the L2 error in approximating ga with S20. The results

are presented in the table below. Clearly, as a is decreased, the error in approximation by the partial

sums S20 increases. There appears to be a dramatic increase between a = 0.25 and a = 0.1.
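The MAPLE computation described above is straightforward to reproduce. The following Python/NumPy sketch (our own reconstruction, with an arbitrary quadrature grid) computes the coefficients ck of Eq. (19) numerically and then obtains the error from ‖ga − S20‖ = (1 − Σ_{k=0}^{20} ck²)^{1/2}, which follows from Eqs. (9) and (23); it reproduces the trend of the table below.

    import numpy as np

    def g(x, a):
        # Triangular peak of Eq. (16), normalized so that ||g_a|| = 1.
        return np.where(np.abs(x) <= a, np.sqrt(3.0 / (2.0 * a)) * (1.0 - np.abs(x) / a), 0.0)

    Npts = 400000
    x = -np.pi + (np.arange(Npts) + 0.5) * (2 * np.pi / Npts)
    dx = 2 * np.pi / Npts

    for a in (1.0, 0.5, 0.25, 0.1, 0.05):
        ga = g(x, a)
        # c_0 = <g_a, 1/sqrt(2 pi)>,  c_k = <g_a, cos(kx)/sqrt(pi)> for k >= 1
        c = [np.sum(ga) * dx / np.sqrt(2 * np.pi)]
        c += [np.sum(ga * np.cos(k * x)) * dx / np.sqrt(np.pi) for k in range(1, 21)]
        err = np.sqrt(max(0.0, 1.0 - np.sum(np.array(c) ** 2)))   # ||g_a - S_20||
        print(a, round(err, 3))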

[Figure: coefficients cn, 0 ≤ n ≤ 20, of the Fourier cosine series expansion of the triangular peak function ga(x) defined in Eq. (16), for a = 1.0, 0.5, 0.25, 0.1, 0.05. As a decreases, the rate of decay of the Fourier coefficients cn is seen to decrease.]

    a        ‖ga − S20‖2
    1        0.012
    0.5      0.026
    0.25     0.056
    0.1      0.460
    0.05     0.733

Error in approximation to ga(x) afforded by partial sum functions S20(x) comprised of Fourier coefficients c0 to c20.

Improvement by “block coding”. In light of the earlier discussion on “block coding,” let us see if we can improve the approximation to the above triangular peak function by dividing up the interval

and coding the function separately over the subintervals. In the following experiment, the interval

I = [−π, π] was partitioned into the three subintervals,

I1 = [−π,−π/3], I2 = [−π/3, π/3], I3 = [π/3, π]. (24)

For a ≤ 1, the approximation of ga(x) over intervals I1 and I3 is trivial since ga(x) = 0. As such we

don’t even have to supply any Fourier coefficients but we should record the use of the first coefficient

c0 = 0. After all, the function ga(x) is constant on these intervals, and we should specify the value of

the constant. Since 21 coefficients were used in the previous experiment (S20(x) uses ck, 0 ≤ k ≤ 20),

we shall use 19 coefficients to code the function ga(x) over interval I2.

It remains to construct the Fourier series approximation to ga(x) over interval I2 = [−π/3, π/3].


From Lecture 7, we must employ the basis set

{ek} = { 1/√(2a), (1/√a) cos(πx/a), (1/√a) sin(πx/a), (1/√a) cos(2πx/a), · · · }        (25)

where the interval half-width in Eq. (25) is π/3 (not to be confused with the peak parameter a of ga). Once again, the sine functions are discarded since ga(x) is an even function. This was

easily done in MAPLE: For each a value, the necessary integrals were computed (actually only the

integrals over [0, a] were computed), followed by the L2 distance between ga and the S18(x) partial

sum functions. The results are presented in the table below.

    a        ‖ga − S18‖2
    1        0.002
    0.5      0.007
    0.25     0.021
    0.1      0.054
    0.05     0.294

Error in approximation to ga(x) afforded by partial sums S18 of the Fourier cosine series over the interval [−π/3, π/3], employing Fourier coefficients c0 to c18, along with the trivial Fourier expansions c0 = 0 on [−π, −π/3) and (π/3, π].

We can see an improvement for all a

values – a roughly five-fold decrease in the error for a = 1 and about a three-fold decrease for a = 0.05.

This very simple implementation of “block coding” has achieved the goal of decreasing the error with

a given number of coefficients.

Question: The fact that the Fourier series over [−π/3, π/3] works better to approximate the function

ga(x) might appear rather magical. Can you come up with a rather simple explanation for the

improvement in accuracy?

That being said, the improvement is rather impressive in this case because we know the function

essentially to infinite accuracy, i.e., we have its formula. If we had only a finite set of discrete data

points representing sampled values of the function, the improvement would not be so dramatic. We’ll

return to this matter after looking at discrete Fourier transforms.


Fourier series on the interval [−a, a], even and odd extensions

In a previous lecture, it was mentioned that the following functions comprise an orthonormal set on

the interval [−a, a], where a > 0:

e0 = 1/√(2a),    e1 = (1/√a) cos(πx/a),    e2 = (1/√a) sin(πx/a),    e3 = (1/√a) cos(2πx/a),    · · · .        (26)

Moreover, this set serves as a complete orthonormal basis for the space L2[−a, a] of square-integrable

functions on [−a, a]. Thus, for an f ∈ L2[−a, a],

f = Σ_{k=0}^∞ 〈f, ek〉 ek.        (27)

This may be translated to the following standard (unnormalized) Fourier series expansion having the

form

f(x) = a0 + Σ_{k=1}^∞ [ ak cos(kπx/a) + bk sin(kπx/a) ],        (28)

where

a0 = (1/(2a)) ∫_{−a}^{a} f(x) dx,

ak = (1/a) ∫_{−a}^{a} f(x) cos(kπx/a) dx,

bk = (1/a) ∫_{−a}^{a} f(x) sin(kπx/a) dx.        (29)

(We use the term “unnormalized” since the coefficients ak, bk multiply the unnormalized functions cos(kπx/a) and sin(kπx/a). The normalization factors, which involve √a factors that become a upon squaring, are swept into the ak and bk coefficients, which accounts for the factors appearing in

front of the above integrals.) Once again, in the special case a = π, the above formulas become the

standard formulas for Fourier series on [−π, π], cf. Eq. (1), Lecture 1 of these notes.

Fourier cosine series on [−a, a] and periodic extensions

In the case that f(x) is even, i.e., f(x) = f(−x), then all coefficients bk = 0, so that the expansion in

(28) becomes a Fourier cosine series expansion. Moreover, since f(x) is even, it need only be defined

on the interval [0, a], and the expressions for the coefficients ak become

a0 = (1/a) ∫_0^a f(x) dx,    ak = (2/a) ∫_0^a f(x) cos(kπx/a) dx,  k ≥ 1.        (30)


Now suppose that we are given a function f(x) defined on the interval [0, a] as input data. From

this data, we may construct the ak coefficients – these coefficients define a Fourier cosine series that

converges to the even 2a-extension of f(x), constructed from f(x) by means of two steps, illustrated

schematically in the figure below,

1. A “flipping” of the graph of f(x) with respect to the y-axis to produce an even function on

[−a, a].

2. Copying this graph on the intervals [a, 3a], [3a, 5a], etc. and [−3a,−a], [−5a,−3a], etc..

[Figure: the even 2a-extension of f(x), 0 ≤ x ≤ a – the original data y = f(x) on [0, a], its even reflection onto [−a, 0], and 2a-periodic copies of this graph on [a, 3a], [−3a, −a], etc.]

Note that the resulting 2a-extension is continuous at all “patch points,” i.e., x = (2k−1)a, k ∈ Z.

For this reason, Fourier cosine series are usually employed in the coding of signals and images. The

JPEG/MPEG standards are based on versions of the discrete cosine transform.

Fourier sine series on [−a, a] and periodic extensions

In the case that f(x) is odd, i.e., f(x) = −f(−x), then all coefficients ak = 0, so that the expansion

in (28) becomes a Fourier sine series expansion. Moreover, since f(x) is odd, it need only be defined

on the interval [0, a] as well. The expression for the coefficients bk becomes

bk = (2/a) ∫_0^a f(x) sin(kπx/a) dx,  k ≥ 1.        (31)

Once again, suppose that we are given a function f(x) defined on the interval [0, a] as input data.

From this data, we may construct the bk coefficients – these coefficients define a Fourier sine series that

converges to the odd 2a-extension of f(x), constructed from f(x) by means of two steps, illustrated

schematically in the figure below,


1. An inversion of the graph of f(x) with respect to the origin to produce an odd function on [−a, a]. (If f(0) ≠ 0, then one of the points (0, ±f(0)) will have to be deleted for f to be single-valued at x = 0.)

2. Copying this graph on the intervals [a, 3a], [3a, 5a], etc. and [−3a,−a], [−5a,−3a], etc.. (Once

again, some endpoints of the pieces of the graph will have to be deleted to make f single-valued.)

[Figure: the odd 2a-extension of f(x), 0 ≤ x ≤ a – the original data y = f(x) on [0, a], its odd reflection through the origin onto [−a, 0], and 2a-periodic copies of this graph on [a, 3a], [−3a, −a], etc.]

Note that the resulting 2a-extension need not be continuous at the “patch points,” i.e., x =

(2k − 1)a, k ∈ Z. Indeed, if f(0) ≠ 0, then the odd extension of f(x) will not even be continuous at

0, ±2a, ±4a, etc..

The two-dimensional case: image functions

Note: The discussion in the first two paragraphs is slightly more general than that presented in class.

We now examine briefly the Fourier analysis of two-dimensional functions, which will be used

primarily to represent images. We shall consider an image function f(x, y) to be defined over a

suitable rectangular region D ⊂ R2. For the moment, let D be defined as the rectangular region

−a ≤ x ≤ a, −b ≤ y ≤ b, centered at the origin. A suitable function space for the representation of

images will be the space of square-integrable functions on D, i.e., L2(D):

L2(D) = { f : D → R | ∫_D |f(x, y)|² dA < ∞ }.        (32)

Now let


1. {ek(x)}, k = 1, 2, · · ·, denote the orthonormal set of sine and cosine functions on the space L2[−a, a],

2. {ok(y)}, k = 1, 2, · · ·, denote the orthonormal set of sine and cosine functions on the space L2[−b, b].

Theorem: The set of all product functions {φkl(x, y) = ek(x) ol(y)}, k = 1, 2, · · ·, l = 1, 2, · · ·, forms an orthonormal basis of L2(D).

For simplicity, we now assume that our images are defined on square regions, i.e., a = b, and

further assume that a = b = 1. In this case the basis functions ek and ok have the same functional

form:

{ek} = { 1/√2, cos(πx), sin(πx), cos(2πx), sin(2πx), · · · }.        (33)

The set of all products ek(x)el(y) will lead to a complicated mixture of sine and cosine functions.

It is convenient to assume that the image function f(x, y) is an even function with respect to both x

and y, implying that we use only the cosine functions in our basis. In essence, this amounts to the

assumption that the actual image being analyzed lies in the region [0, 1] × [0, 1]. Analogous to the

one-dimensional case, the use of only cosine functions will perform an even periodic extension (of period 2) of

this image, both in the x and y directions. Let us examine this further.

1. Even w.r.t. x: f(x, y) = f(−x, y).

2. Even w.r.t. y: f(x, y) = f(x,−y).

3. From 1 and 2: f(−x, y) = f(x,−y), implying that f(x, y) = f(−x,−y), i.e., symmetry w.r.t.

inversion about (0, 0).

This means that the graph of f(x, y) in the first quadrant, i.e., [0, a] × [0, a], i.e., the input image,

is “flipped” w.r.t. the y-axis, then “flipped” w.r.t. the x-axis, and finally “flipped” w.r.t. the point

(0, 0). The result is an even periodic extension of the function f(x, y). The process is illustrated below.

The advantage of an even extension in both directions is that no discontinuities are introduced.

The function f(x, y) is continuous at all points on the x and y-axes. As such, no complications

regarding convergence of the Fourier series are introduced artificially.

The net result is that the input image function f(x, y) defined on the region [0, 1]× [0, 1] will admit a

Fourier cosine series expansion of the form,

f(x, y) = a00 + Σ_{k=1}^∞ ak0 cos(kπx) + Σ_{l=1}^∞ a0l cos(lπy) + Σ_{k=1}^∞ Σ_{l=1}^∞ akl cos(kπx) cos(lπy).        (34)


[Figure: input image f(x, y), 0 ≤ x, y ≤ 1, in the first quadrant, together with its even periodic extension in the x and y directions produced by the Fourier cosine series.]

The series coefficients akl could be obtained from the expansion for f in terms of the orthonormal

basis functions or by simply multiplying both sides of (34) with the function cos(mπx) cos(nπy) and

integrating x and y over [0, 1], and exploiting the orthogonality of the cosine functions. The net result

is

a00 = ∫_0^1 ∫_0^1 f(x, y) dx dy,

a0l = 2 ∫_0^1 ∫_0^1 f(x, y) cos(lπy) dx dy,  l ≥ 1,

ak0 = 2 ∫_0^1 ∫_0^1 f(x, y) cos(kπx) dx dy,  k ≥ 1,

akl = 4 ∫_0^1 ∫_0^1 f(x, y) cos(kπx) cos(lπy) dx dy,  k, l ≥ 1.        (35)
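As an illustration of how the formulas (35) might be used directly (a sketch only: the “image” function below is hypothetical, and the Riemann-sum quadrature is a crude stand-in for exact integration), the coefficients akl can be computed as follows in Python/NumPy:

    import numpy as np

    def f(x, y):
        # Hypothetical image function on [0,1] x [0,1]; any bounded function would do.
        return np.exp(-((x - 0.3) ** 2 + (y - 0.6) ** 2) / 0.05)

    M = 1000                                  # quadrature points per axis
    t = (np.arange(M) + 0.5) / M              # midpoints of [0, 1]
    X, Y = np.meshgrid(t, t, indexing="ij")
    F = f(X, Y)
    dA = 1.0 / (M * M)

    def a(k, l):
        # Eq. (35): the weight is 1, 2 or 4 according to whether k and/or l are zero.
        w = (1 if k == 0 else 2) * (1 if l == 0 else 2)
        return w * np.sum(F * np.cos(k * np.pi * X) * np.cos(l * np.pi * Y)) * dA

    print(a(0, 0), a(1, 0), a(0, 1), a(2, 3))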


Lecture 8

The Discrete Fourier Transform

We now turn to the analysis of discrete data, e.g., sets of measurements, yk, k = 0, 1, 2, · · ·, as opposed

to signals in continuous time, e.g., f(t). We also assume that the measurements are evenly spaced in

time/space, i.e., there is a fixed time interval T > 0 between each measurement. This is necessary for

the basic theory to be presented below. That being said, it is very often the procedure employed in

scientific experiments, e.g., measuring the temperature at a particular location at hourly intervals.

At this time, we shall simply assume that the measurements correspond to the values of a function

f(t) at discrete times, tn = nT . In the signal processing literature, the usual notation for such a

sampling is as follows,

f [n] := f(nT ), n ∈ {0, 1, 2, · · ·} or n ∈ {· · · ,−1, 0, 1, · · ·}. (36)

The square brackets are rather cumbersome – some authors employ the notation “fn”, but we shall

reserve this notation for other purposes. The idea is sketched below.

[Figure: samples f[n] = f(nT), n = 0, 1, 2, · · ·, of a function y = f(t), taken at the evenly spaced times t = 0, T, 2T, 3T, · · ·.]

We now assume that we are working with a set of N such consecutive data points which will

comprise an N -vector, indexed as follows,

f = (f [0], f [1], · · · , f [N − 1]). (37)

These measurements could be complex-valued, so that f ∈ CN . Furthermore, we assume that this set

of measurements is then periodized, i.e., extended into the future and backwards into the past, so that

f [k + N ] = f [k], k ∈ Z. (38)


This represents a periodic extension of the data, a discrete analogy to the periodization of functions

produced by Fourier series representations.

A “derivation” of the DFT from Fourier expansion in terms of complex exponentials

Let us first assume that we are working with a function f(t) that is a-periodic, i.e.,

f(t + a) = f(t), t ∈ R. (39)

We now use the fact that the following doubly-infinite set of functions,

ek(t) = (1/√a) e^{i2πkt/a},  k ∈ {· · · , −2, −1, 0, 1, 2, · · ·},        (40)

forms an orthonormal basis for the space of functions L2[0, a]. (It is a good exercise to verify the

orthonormality of these functions over the interval [0, a]. In Lecture 6, we introduced a set of complex

exponential functions that were orthonormal over the interval [−a, a].) We now expand f(t) in terms

of this basis over [0, a],

f(t) = Σ_{k=−∞}^∞ ck ek(t),        (41)

where the Fourier coefficients ck are given by the complex scalar product

ck = 〈f, ek〉 = ∫_0^a f(t) ēk(t) dt = (1/√a) ∫_0^a f(t) e^{−i2πkt/a} dt.        (42)

Ignoring the constant, we now construct Riemann sum approximations to the above integral

following the usual procedure from first-year Calculus. Let N ≥ 1 be a fixed integer. Construct an

equipartition of the interval [0, a] in the usual way, i.e., let

Δt = a/N,        (43)

and define the partition points,

tn = nΔt = na/N,  n = 0, 1, 2, · · · , N.        (44)

We use the Riemann sum that is produced by evaluating f(t) at the left-endpoints of each of the N

subintervals In = [tn, tn+1], n = 0, 1, 2, · · · , N − 1, i.e.,

∫_0^a f(t) exp(−i2πkt/a) dt ≈ Σ_{n=0}^{N−1} f(tn) exp(−i2πktn/a) Δt
= Σ_{n=0}^{N−1} f(tn) exp(−i2πkn/N) (a/N)
= (a/√N) · (1/√N) Σ_{n=0}^{N−1} f(tn) exp(−i2πkn/N).        (45)

We now ignore the constant factor a/√N and focus on the remaining summation. The f(tn) are

viewed as discrete samples of the function f(t) so we define, as before,

f [n] = f(tn) = f(nT ), (46)

where the sampling time is given by T = ∆t = a/N . The summation in Eq. (45) may then be written

as follows,

c[k] := (1/√N) Σ_{n=0}^{N−1} f[n] exp(−i2πkn/N).        (47)

This has the form of a complex scalar product between the N -vector of sampled data points,

f = (f [0], f [1], · · · , f [N − 1]), (48)

defined earlier and the complex N -vector ek,

ek = (ek[0], ek[1], · · · , ek[N − 1]),        (49)

with components

ek[n] = (1/√N) exp(i2πkn/N),  n = 0, 1, · · · , N − 1.        (50)

(We’ll show below that the vectors ek are orthonormal.) The index n plays the role of the time or

spatial variable and k is the index of the frequency. We haven’t said anything about the frequency k

so far. In the continuous formulation, we required all integer values of k. From Eq. (47), it is easily

shown (we’ll do it later) that

c[k + N ] = c[k], (51)

i.e., the complex N -vector, c, defined as

c = (c[0], c[1], · · · , c[N − 1]), (52)

is N -periodic, as is the N -vector of sampled data, f . As such, the frequency index k may be constrained

to the values 0, 1, · · · , N − 1.

The complex N -vector c defined in Eq. (52) is known as a discrete Fourier transform (DFT) of

the (discrete) N -vector f in Eq. (48). Let us now investigate this DFT in terms of complex periodic

N -vectors.


An orthonormal periodic basis in CN

The goal is to provide a representation of a set of data in terms of periodic basis vectors in CN . First

of all, the following inner product will be used in CN :

〈f, g〉 = Σ_{n=0}^{N−1} f[n] ḡ[n],        (53)

where the bar once again denotes complex conjugation.

Of course, any orthogonal set of complex N -vectors will serve as a basis for CN , but we wish to

use a set of periodic vectors. The family uk ∈ CN discovered in the previous section will do the trick:

For k = 0, 1, · · · , N − 1, define the vector

uk = (uk[0], uk[1], · · · , uk[N − 1]),        (54)

with components

uk[n] = exp(i2πkn/N),  n = 0, 1, · · · , N − 1.        (55)

Once again, the index n plays the role of the time or spatial variable and k is the index of the frequency.

Note that in the special case k = 0, all elements uk[n] = 1. In other words, for all N ≥ 2, the

N -vector u0 ∈ CN is a row of 1’s:

u0 = (1, 1, 1, · · · , 1). (56)

This will have important implications for the DFT.

Let us now show that the vectors uk are N-periodic. First consider a given k ∈ {0, 1, · · · , N − 1}. Then consider a given component uk[n], n ∈ {0, 1, · · · , N − 1}, in the vector uk. From Eq. (55),

uk[n + N] = exp(i2πk(n + N)/N)
= exp(i2πkn/N) exp(i2πkN/N)
= exp(i2πkn/N) exp(i2πk)
= exp(i2πkn/N)
= uk[n].        (57)


We claim that the set of N -vectors {uk} forms an orthogonal set in CN . To prove this, consider

the inner product between two elements, uk and ul:

〈uk, ul〉 = Σ_{n=0}^{N−1} exp(i2πkn/N) exp(−i2πln/N) = Σ_{n=0}^{N−1} exp(i2π(k − l)n/N).        (58)

Case 1: k = l. In this case, the above inner product reduces to

〈uk, ul〉 = Σ_{n=0}^{N−1} 1 = N.        (59)

Case 2: k ≠ l. First let p = k − l, a nonzero integer. Then the inner product in (58) becomes

〈uk, ul〉 = Σ_{n=0}^{N−1} exp(i2πpn/N) = Σ_{n=0}^{N−1} [ exp(i2πp/N) ]^n = 1 + r + · · · + r^{N−1},        (60)

where r = exp(i2πp/N) ≠ 1. The sum of this finite geometric series is

S = (1 − r^N)/(1 − r) = (1 − e^{i2πp})/(1 − r) = (1 − 1)/(1 − r) = 0.        (61)

Therefore,

〈uk, ul〉 = Nδkl, (62)

i.e., the set {uk} is an orthogonal set. Therefore it is a basis in CN . In particular, it is the desired

basis because of its internal periodicity. Once again, we may view the n index as a spatial index – in

fact, n/N plays the role of t or x.
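This orthogonality is easy to confirm numerically; here is a short Python/NumPy check (our own sketch, with an arbitrary choice of N):

    import numpy as np

    N = 8
    n = np.arange(N)
    U = np.exp(2j * np.pi * np.outer(np.arange(N), n) / N)   # row k holds the components u_k[n]

    # Gram matrix of inner products <u_k, u_l> = sum_n u_k[n] * conj(u_l[n])
    G = U @ U.conj().T
    print(np.allclose(G, N * np.eye(N)))    # True: <u_k, u_l> = N * delta_kl, as in Eq. (62)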

From this orthogonal basis set {uk}, we construct the orthonormal basis vectors,

ek = (1/√N) uk,  k = 0, 1, · · · , N − 1,        (63)

with components

ek[n] = (1/√N) exp(i2πkn/N),  n = 0, 1, · · · , N − 1.        (64)


Once again, the case k = 0 is special. For N ≥ 2,

e0 = (1/√N) (1, 1, · · · , 1).        (65)

Examples:

1. N = 2:

In this very simple case, one can probably guess the vectors that are generated. First of all,

from Eq. (65), for k = 0,

e0 = (1/√2)(1, 1).        (66)

For k = 1,

e1[1] = (1/√2) exp(i2π · 1 · 1/2) = (1/√2) exp(iπ) = −1/√2.        (67)

Therefore,

e1 = (1/√2)(1, −1).        (68)

2. N = 3:

Once again, the case k = 0 is simple. From Eq. (65),

e0 = (1/√3)(1, 1, 1).        (69)

For k = 1, using Eq. (64),

(a) n = 0:

e1[0] = (1/√3) exp(0) = 1/√3.        (70)

(b) n = 1:

e1[1] = (1/√3) exp(i2π/3) = (1/√3) [ −1/2 + (√3/2) i ].        (71)

(c) n = 2:

e1[2] = (1/√3) exp(i4π/3) = (1/√3) [ −1/2 − (√3/2) i ].        (72)

In summary,

e1 = (1/√3) ( 1, −1/2 + (√3/2) i, −1/2 − (√3/2) i ).        (73)

For k = 2, using Eq. (64),


(a) n = 0:

e2[0] = (1/√3) exp(0) = 1/√3.        (74)

(b) n = 1:

e2[1] = (1/√3) exp(i4π/3) = (1/√3) [ −1/2 − (√3/2) i ].        (75)

(c) n = 2:

e2[2] = (1/√3) exp(i8π/3) = (1/√3) [ −1/2 + (√3/2) i ].        (76)

In summary,

e2 = (1/√3) ( 1, −1/2 − (√3/2) i, −1/2 + (√3/2) i ).        (77)

Discrete Fourier Transform, Version 1

We now employ the orthonormal basis developed above to construct our first version of the DFT. Any

element f ∈ CN will have an expansion of the form

f = Σ_{k=0}^{N−1} 〈f, ek〉 ek.        (78)

In component form,

f[n] = Σ_{k=0}^{N−1} 〈f, ek〉 ek[n] = Σ_{k=0}^{N−1} c[k] ek[n],        (79)

where the c[k] = 〈f, ek〉 denote the Fourier coefficients of f in the ek basis. Let us now examine these

coefficients:

c[k] = 〈f, ek〉 = Σ_{n=0}^{N−1} f[n] ēk[n],        (80)

or

c[k] = (1/√N) Σ_{n=0}^{N−1} f[n] exp(−i2πkn/N),  k = 0, 1, · · · , N − 1.        (81)

This relation defines a discrete Fourier transform (DFT) of f. The components of the vector c = (c[0], c[1], · · · , c[N − 1]) comprise the DFT of the vector f = (f[0], f[1], · · · , f[N − 1]). Mathematically, we can write

c = Ff, (82)


where F : CN → CN denotes the discrete Fourier transform operator on complex N -vectors.

Important comment: Note the choice of “a” instead of “the” before “discrete Fourier transform.”

Unfortunately, there are several closely-related definitions, and it is important to recognize this fact.

For this reason, we refer to the above DFT as DFT, Version 1.

Let us return to Eq. (81) to show that, indeed, the DFT vector c with components c[k] is N -

periodic:

c[k + N] = (1/√N) Σ_{n=0}^{N−1} f[n] exp(−i2π(k + N)n/N)
= (1/√N) Σ_{n=0}^{N−1} f[n] exp(−i2πkn/N) exp(−i2πNn/N)
= (1/√N) Σ_{n=0}^{N−1} f[n] exp(−i2πkn/N) exp(−i2πn)
= (1/√N) Σ_{n=0}^{N−1} f[n] exp(−i2πkn/N)
= c[k].        (83)

Eq. (81) is the definition of the discrete Fourier transform implemented in the MAPLE programming

language. In MAPLE, the relation would be written as

c = FourierTransform(f)

f = InverseFourierTransform(c),

where we still have to define the inverse DFT.

Mathematically, the above formula is elegant because of the following result,

‖f‖2 = ‖c‖2, (84)

where ‖ · ‖2 denotes the L2 norm defined by the complex inner product in CN . To see this:

‖f‖2^2 = 〈f, f〉
= Σ_{n=0}^{N−1} f[n] f̄[n]
= Σ_{n=0}^{N−1} [ Σ_{k=0}^{N−1} c[k] ek[n] ] [ Σ_{l=0}^{N−1} c̄[l] ēl[n] ]
= Σ_{k=0}^{N−1} Σ_{l=0}^{N−1} c[k] c̄[l] [ Σ_{n=0}^{N−1} ek[n] ēl[n] ]
= Σ_{k=0}^{N−1} Σ_{l=0}^{N−1} c[k] c̄[l] 〈ek, el〉
= Σ_{k=0}^{N−1} c[k] c̄[k]
= ‖c‖2^2,        (85)

from which (84) follows. This means that the DFT operator F is norm-preserving, i.e., the norm

of c is the norm of f .

Inverse DFT, Version 1

Let us now see if we can find a result for the inverse discrete Fourier transform, i.e., given the DFT

c, how can we find f , written mathematically as

f = F^{−1} c.        (86)

In order to invert relation (81), we shall utilize the orthonormality of the ek vectors in (64). For a particular value of m ∈ {0, 1, · · · , N − 1}, multiply both sides of Eq. (81) by (1/√N) exp(i2πkm/N) and then sum over k:

(1/√N) Σ_{k=0}^{N−1} c[k] exp(i2πkm/N) = (1/N) Σ_{n=0}^{N−1} f[n] Σ_{k=0}^{N−1} exp(i2πk(m − n)/N).        (87)

We have already seen earlier that the final summation is Nδmn. Thus, for each m, only the term

n = m from the sum over n contributes. As a result, we have

f[m] = (1/√N) Σ_{k=0}^{N−1} c[k] exp(i2πkm/N).        (88)

This relation is true for each m = 0, 1, · · · , N − 1. It is customary to let n denote the spatial or time

variable, so we rewrite the above result as

f[n] = (1/√N) Σ_{k=0}^{N−1} c[k] exp(i2πkn/N),  n = 0, 1, · · · , N − 1.        (89)


These relations comprise the inverse discrete Fourier transform (IDFT) associated with the DFT in

Eq. (81).

A closer look at Eq. (89) shows that, in fact, the inverse DFT is nothing more than the expansion

of the discrete vector f in terms of the orthonormal basis {ek}. The DFT coefficients c[k] are used to

construct the “signal” elements f [n].

We now summarize the results obtained above:

DFT and IDFT, Version 1

c[k] = (1/√N) Σ_{n=0}^{N−1} f[n] exp(−i2πkn/N),  k = 0, 1, · · · , N − 1,

f[n] = (1/√N) Σ_{k=0}^{N−1} c[k] exp(i2πkn/N),  n = 0, 1, · · · , N − 1.        (90)
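A direct transcription of these two formulas into Python/NumPy (a sketch for illustration, not code used in the course) confirms that they are mutually inverse and that the transform is norm-preserving:

    import numpy as np

    def dft1(f):
        # DFT, Version 1: c[k] = (1/sqrt(N)) sum_n f[n] exp(-i 2 pi k n / N)
        f = np.asarray(f, dtype=complex)
        N = len(f)
        n = np.arange(N)
        W = np.exp(-2j * np.pi * np.outer(n, n) / N)
        return (W @ f) / np.sqrt(N)

    def idft1(c):
        # IDFT, Version 1: f[n] = (1/sqrt(N)) sum_k c[k] exp(+i 2 pi k n / N)
        c = np.asarray(c, dtype=complex)
        N = len(c)
        n = np.arange(N)
        W = np.exp(2j * np.pi * np.outer(n, n) / N)
        return (W @ c) / np.sqrt(N)

    f = np.array([1.0, 2.0, 3.0, 4.0])
    c = dft1(f)
    print(np.allclose(idft1(c), f))                            # True: the IDFT undoes the DFT
    print(np.isclose(np.linalg.norm(c), np.linalg.norm(f)))    # True: ||c|| = ||f||, Eq. (84)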

DFT and IDFT, Version 2

A second version of the DFT and its inverse is employed in many mathematics books (e.g., the book

by Kammler). Unlike the first version, it is not symmetric. But there is a legitimate reason for its

definition, since it arises naturally from a discretization of the integrals used to compute Fourier series

coefficients. We shall postpone the discussion of this result to another lecture. For the moment, we

simply state the second version of the DFT.

The DFT, Version 2 is defined as follows,

F[k] = (1/N) Σ_{n=0}^{N−1} f[n] exp(−i2πkn/N),  k = 0, 1, · · · , N − 1.        (91)

Note the use of F to denote the DFT: It is customary to let capital letters denote the FT/DFTs of

functions. The only difference between this version and Version 1 in Eq. (81) is that the factor in front

is 1/N instead of 1/√N. In the same manner as was done for Version 1, the inverse DFT associated with the above DFT is given by

f[n] = Σ_{k=0}^{N−1} F[k] exp(i2πkn/N),  n = 0, 1, · · · , N − 1.        (92)


DFT and IDFT, Version 3

This is the version that appears in most of the signal processing literature (e.g. Mallat) as well as

mathematics books that deal with signal processing applications (e.g., Boggess and Narcowich). It

appears to be the version that is most widely used by research workers in signal and image processing,

as witnessed by the fact that it is the version implemented in MATLAB. As such, unless specified

otherwise, this will be the version used in this course.

The DFT, Version 3 is defined as follows,

F[k] = Σ_{n=0}^{N−1} f[n] exp(−i2πkn/N),  k = 0, 1, · · · , N − 1.        (93)

There is no factor in front of the summation. The inverse DFT associated with this DFT is given

by

f[n] = (1/N) Σ_{k=0}^{N−1} F[k] exp(i2πkn/N),  n = 0, 1, · · · , N − 1.        (94)

In MATLAB, the DFT and IDFT are denoted as follows,

F = fft(f),

f = ifft(F).

Using the orthogonality property of the complex exponential functions established earlier, i.e., 〈uk, ul〉 =

Nδkl, it can be shown (a simple modification of the derivation for DFT, Version 1, Eq. (85)) that this

particular version of the DFT satisfies the relation,

‖f‖2^2 = (1/N) ‖F‖2^2,        (95)

where ‖ · ‖2 denotes the L2/Euclidean norm on CN . From this point onward, we shall omit the

subscript 2 from the norm and write ‖ · ‖, with the understanding that it represents the L2 norm.
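NumPy’s FFT routines use the same Version 3 convention (numpy.fft.fft is the unnormalized sum (93), and numpy.fft.ifft supplies the 1/N factor of (94)), so the examples in the next lecture can be checked directly in Python. A quick sketch verifying Eq. (93) and the Parseval relation (95) on a random vector:

    import numpy as np

    rng = np.random.default_rng(0)
    f = rng.standard_normal(8) + 1j * rng.standard_normal(8)
    N = len(f)
    n = np.arange(N)

    # Direct evaluation of Eq. (93)
    F_direct = np.array([np.sum(f * np.exp(-2j * np.pi * k * n / N)) for k in range(N)])

    F = np.fft.fft(f)
    print(np.allclose(F, F_direct))                                      # True
    print(np.allclose(np.fft.ifft(F), f))                                # True: ifft includes 1/N
    print(np.isclose(np.sum(np.abs(f)**2), np.sum(np.abs(F)**2) / N))    # Eq. (95)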

Matrix form of DFT

You’ll note that all the coefficients multiplying the f [n] elements in the DFT of Eq. (93) involve

powers of the complex number

ω = exp(−i2π/N) = cos(2π/N) − i sin(2π/N).        (96)


A closer examination shows that if f and F are written as column N -vectors, f and F, respectively,

then the DFT relation in (93) may be written in matrix form as

F = Ff , (97)

where F is an N × N complex matrix having the form

[ 1      1         1           · · ·    1
  1      ω         ω²          · · ·    ω^{N−1}
  1      ω²        ω⁴          · · ·    ω^{2N−2}
  ⋮      ⋮         ⋮                    ⋮
  1      ω^{N−1}   ω^{2N−2}    · · ·    ω^{(N−1)(N−1)} ]        (98)

The kth entry of the vector F is given by

F[k] = f[0] + ω^k f[1] + ω^{2k} f[2] + · · · + ω^{(N−1)k} f[N − 1]
     = f[0] + f[1] ω^k + f[2] (ω^k)² + · · · + f[N − 1] (ω^k)^{N−1},        (99)

where the second line indicates that F[k] is a polynomial in ω^k. This suggests that it may be evaluated

recursively, as opposed to computing the terms separately and adding them up. The following is a

pseudocode version of “Horner’s algorithm” for computing the entire vector F :

z := 1
ω := e^{−i2π/N}
for k = 0, 1, ..., N−1 do
    S := f[N−1]
    for l = 2, 3, ..., N do
        S := z*S + f[N−l]
    od
    F[k] := S
    z := z*ω
od

In this form, the computation of the DFT F requires N² complex operations, which translates to 4N² real operations. For special values of N, the procedure can be optimized, utilizing the fact that

ω is a root of unity. This is the basis of the fast Fourier transform (FFT) which we may discuss a

little later in the course.
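For concreteness, here is a direct Python/NumPy transcription of the pseudocode above (our own sketch); it agrees with the matrix definition (97)-(98), though at O(N²) cost it is only practical for small N.

    import numpy as np

    def dft_horner(f):
        # Evaluate F[k] = f[0] + f[1] w^k + ... + f[N-1] (w^k)^(N-1) by Horner's rule.
        f = np.asarray(f, dtype=complex)
        N = len(f)
        w = np.exp(-2j * np.pi / N)
        F = np.empty(N, dtype=complex)
        z = 1.0 + 0.0j                      # z = w^k, updated once per outer iteration
        for k in range(N):
            S = f[N - 1]
            for l in range(2, N + 1):       # S := z*S + f[N-l]
                S = z * S + f[N - l]
            F[k] = S
            z *= w
        return F

    f = np.array([1.0, 2.0, 3.0, 4.0])
    print(dft_horner(f))      # [10, -2+2j, -2, -2-2j], matching np.fft.fft(f)
    print(np.fft.fft(f))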


Lecture 9

Discrete Fourier Transform (cont’d)

We now examine the DFT a little further, with the help of some examples. As mentioned earlier, we

shall be using the DFT, Version 3 – the “MATLAB” formula – summarized again below:

DFT and IDFT, Version 3

F[k] = Σ_{n=0}^{N−1} f[n] exp(−i2πkn/N),  k = 0, 1, · · · , N − 1.        (100)

f[n] = (1/N) Σ_{k=0}^{N−1} F[k] exp(i2πkn/N),  n = 0, 1, · · · , N − 1.        (101)

An important note: As we know from before, the DFT and IDFT may be viewed as inner

products between appropriate N -vectors. In the special case k = 0, all of the complex exponentials

in Eq. (100) are equal to 1. This is because the particular element F [0] is the inner product between

the N -vector f and the unnormalized N -vector u0 = (1, 1, · · · , 1). As such,

F[0] = 〈f, u0〉 = Σ_{n=0}^{N−1} f[n].        (102)

Some examples:

For N = 4:

1. f = (1, 1, 1, 1), F = (4, 0, 0, 0) (This illustrates Eq. (102) above.)

2. g = (0, 1, 0, 1), G = (2, 0,−2, 0)

3. h = (1, 2, 1, 2), H = (6, 0,−2, 0).

4. a = (1, 2, 3, 4), A = (10,−2 + 2i,−2,−2 − 2i).

Comments:

1. In 1, the signal f is a constant signal, i.e., no variation. This means that the only frequency

component is zero frequency, i.e., k = 0. This is why the first element, k = 0, corresponding to

the constant vector u0 is the only nonzero component of F . The signal f is orthogonal to all

other vectors uk, k ≠ 0.


2. In 2, the signal g has period 2, i.e., it oscillates with twice the periodicity of signal f (period 4).

This accounts for the nonzero entry G[2] = −2.

3. Note that the third result is in accordance with the linearity of the DFT: h = f + g implies that

H = F(h) = F(f + g) = F(f) + F(g) = F + G.        (103)

4. The result in 4 shows that a real-valued signal can have a complex-valued DFT.

5. Each of the four results above demonstrates the modified Parseval equality for the DFT, Version

3 mentioned earlier, i.e.,

‖f‖² = (1/N) ‖F‖².        (104)

Some more complicated examples:

5. We consider the function f(x) = cos(2x) defined on the interval 0 ≤ x ≤ 2π. From this function

we construct N = 256 equally-spaced samples,

f[n] = f(xn) = cos(2xn),  xn = 2πn/N,  n = 0, 1, · · · , N − 1.        (105)

The samples are plotted on the left in the figure below.

[Figure: sampled signal f[n] = cos(2xn), n = 0, 1, · · · , 255 (left) and magnitudes |F[k]| of its DFT (right).]

Numerically, we find that all DFT coefficients F [k] are zero, except for two elements:

F [2] = 128, F [254] = 128. (106)


A plot of the magnitudes |F[k]| of the DFT coefficients is presented on the right in the figure above. The nonzero entry for F[2] picks out the k = 2 frequency of the signal. We’ll see later

that the F [254] component does likewise.

Numerically, we also find that

‖f‖² = Σ_{n=0}^{255} |f[n]|² = 128 (to two decimals),        (107)

and

(1/N) ‖F‖² = (1/256)(F[2]² + F[254]²) = (1/256)(128² + 128²) = 128.        (108)

Thus, Eq. (95) is satisfied.
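This calculation is easy to reproduce; a quick Python/NumPy check (NumPy’s fft follows the Version 3 convention used here):

    import numpy as np

    N = 256
    x = 2 * np.pi * np.arange(N) / N
    f = np.cos(2 * x)
    F = np.fft.fft(f)

    big = np.flatnonzero(np.abs(F) > 1e-8)           # indices of the non-negligible coefficients
    print(big, np.round(F[big].real, 6))             # [2 254]  [128. 128.]
    print(np.sum(f**2), np.sum(np.abs(F)**2) / N)    # both equal 128, as in Eqs. (107)-(108)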

6. Now consider the function f(x) = sin(2x) defined on the interval 0 ≤ x ≤ 2π. From this function

we construct N = 256 equally-spaced samples,

f[n] = f(xn) = sin(2xn),  xn = 2πn/N,  n = 0, 1, · · · , N − 1.        (109)

The samples are plotted on the left in the figure below.

[Figure: sampled signal f[n] = sin(2xn), n = 0, 1, · · · , 255 (left) and magnitudes |F[k]| of its DFT (right).]

Numerically, we find that all DFT coefficients F [k] are zero, except for two elements:

F [2] = −128i, F [254] = 128i. (110)

Of course, there is a similarity between this spectrum and that of Example 5 in that the peaks

coincide at k = 2 (and 254), corresponding to the common frequency k = 2. As such, a plot

of the magnitudes |F[k]| of the DFT coefficients, presented on the right in the figure above, is

identical to the corresponding plot of Example 5.


On the other hand, the DFT coefficients for the sin(2x) function are complex. In fact, they are purely imaginary. The coefficient F[2] for the sin(2x) function is obtained from the coefficient of the cos(2x) function by multiplication by −i = e^{−iπ/2}. This might have something to do with the fact that the sin(2x) function is a shifted version of the cos(2x) function. More on this later.

The moral of the story is that the magnitudes |F [k]| do not contain all of the information about

a signal. If we write

F[k] = |F[k]| e^{iφk},        (111)

then the phases φk also contain information about the signal and cannot be ignored.

Numerically, we also find that

‖f‖² = Σ_{n=0}^{255} |f[n]|² = 128 (to two decimals),        (112)

and

(1/N) ‖F‖² = (1/256)(128² + 128²) = 128.        (113)

Thus, Eq. (95) is satisfied.

7. We now consider the function f(x) = sin(2x) + 5 sin(5x) defined on the interval 0 ≤ x ≤ 2π. We have added a higher-frequency term to the function of Example 6.

From this function we again construct N = 256 equally-spaced samples,

f[n] = f(xn) = sin(2xn) + 5 sin(5xn),  xn = 2πn/N,  n = 0, 1, · · · , N − 1.        (114)

The samples are plotted on the left in the figure below.

[Figure: sampled signal f[n], n = 0, 1, · · · , 255 (left) and magnitudes |F[k]| of its DFT (right), for Example 7.]

Numerically, we find that all DFT


coefficients F [k] are zero, except for four elements:

F [2] = −128i, F [254] = 128i, (115)

as expected, corresponding to the sin(2x) component, and

F [5] = −640i, F [251] = 640i, (116)

corresponding to the sin(5x) component. From the linearity of the DFT, the DFT of the sum

of these two functions is the sum of their DFTs. Also note that the ratio of amplitudes of these

two sets follows the 1 : 5 ratio of the sin(2x) and sin(5x) components. A plot of the magnitudes

|F[k]| of the DFT coefficients is presented on the right in the figure above.

Numerically, we also find that

‖f‖² = Σ_{n=0}^{255} |f[n]|² = 3328,        (117)

and

(1/N) ‖F‖² = (1/256)(2 · 128² + 2 · 640²) = 3328,        (118)

once again in accordance with Eq. (95).

8. Let us now generalize the results from the previous two examples. Suppose that we have the

complex-valued function f(x) = exp(ik0x), defined on the interval [0, 2π], with k0 an integer.

For the moment, we assume that k0 ∈ {0, 1, 2, · · · , N − 1}. From this function, we extract N

equally-spaced samples at the sample points xn = 2πn/N, n = 0, 1, · · · , N − 1, i.e.,

f[n] = f(xn) = exp(ik0 xn) = exp(i2πk0 n/N),  n = 0, 1, · · · , N − 1.        (119)

The function f(x) is 2π-periodic. But what is its DFT?

By definition, its DFT is given by

F [k] = ∑_{n=0}^{N−1} f [n] exp(−i2πkn/N) = ∑_{n=0}^{N−1} exp(i2πk0 n/N) exp(−i2πkn/N). (120)

Now recall from the previous lecture that the discrete exponential functions u_k, defined by

u_k[n] = exp(−i2πkn/N),   n = 0, 1, · · · , N − 1, (121)

form an orthogonal set on C^N, i.e.,

⟨u_k, u_l⟩ = N δ_kl. (122)

This means that F [k] = 0 in Eq. (120) unless k = k0. In other words, the N-point DFT of the exponential function exp(ik0 x) sampled on [0, 2π] is given by

F [k] = N δ_{k,k0}. (123)

The DFT consists of a single peak of magnitude N at k = k0.
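A short numerical sketch of this single-peak property (assuming NumPy; the value k0 = 7 is an arbitrary illustrative choice):

    import numpy as np

    N, k0 = 256, 7
    n = np.arange(N)
    f = np.exp(1j * k0 * 2 * np.pi * n / N)   # samples of exp(i*k0*x) at x_n = 2*pi*n/N
    F = np.fft.fft(f)

    print(np.round(F[k0]))                    # approximately 256: a single peak of height N at k = k0
    print(np.max(np.abs(np.delete(F, k0))))   # ~1e-13: every other coefficient is (numerically) zero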

But wait just one minute!

The above result applies to the case k0 ∈ {0, 1, 2, · · · , N − 1}. What happens if it is not in the set

of frequencies {0, 1, 2, · · · , N − 1} covered by the DFT? For example, if N = 256, what happens if

k0 = 260? Will the function be oscillating too quickly to be detected?

The answer is “No, it will be detected.” Somehow, one gets the feeling that everything here happens

“modulo N ,” because of the periodicity of the vectors. And that is what happens in frequency space

as well. Let us replace k0 in Eq. (120) with k0 + N . Then the RHS becomes

∑_{n=0}^{N−1} exp(i2π(k0 + N)n/N) exp(−i2πkn/N) = ∑_{n=0}^{N−1} exp(i2πk0 n/N) exp(i2πNn/N) exp(−i2πkn/N)

= ∑_{n=0}^{N−1} exp(i2πk0 n/N) exp(i2πn) exp(−i2πkn/N)

= ∑_{n=0}^{N−1} exp(i2πk0 n/N) exp(−i2πkn/N). (124)

In other words, the result in Eq. (120) is unchanged. Therefore the same result holds for k0 +pN ,

where p is an integer. This implies that a peak will show up at k = k0 mod N . So the final result is:

The N -point DFT of the sampled function exp(ik0x), 0 ≤ x ≤ 2π, i.e., f [n] given in Eq.

(119), is given by a single peak:

F [k] = { N,  k = k0 mod N,
        { 0,  otherwise.      (125)
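To see this "modulo N" behaviour numerically, one can repeat the previous sketch with k0 = 260 and N = 256; the peak then appears at k = 260 mod 256 = 4 (again a sketch assuming NumPy):

    import numpy as np

    N, k0 = 256, 260
    n = np.arange(N)
    F = np.fft.fft(np.exp(1j * k0 * 2 * np.pi * n / N))

    print(np.argmax(np.abs(F)))   # 4, i.e. k0 mod N
    print(np.round(F[4]))         # approximately 256: a peak of height N at k = 4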


Note: We actually proved this result earlier, when we showed that the discrete Fourier coefficient

vectors c = (c[1], c[2], · · · , c[N − 1]) associated with the orthonormal basis ek are N -periodic. But it

doesn’t hurt to revisit this result.

From this property, we may now go back and verify that the calculations of Examples 5-7 of the

previous lecture, involving sine and cosine functions, are correct.

1. Since

cos(k0 x) = (1/2) e^{ik0 x} + (1/2) e^{−ik0 x}, (126)

it follows, from the linearity of the DFT, and the “modulo N” property derived above, that the

N -point DFT of cos(k0x) consists of two peaks of height N/2, i.e.,

F [k] = { N/2,  k = k0,
        { N/2,  k = N − k0,
        { 0,    otherwise.      (127)

The peak at k = N − k0 comes from the “modulo N” property. The second exponential in Eq.

(126) would produce a peak at k = −k0 which, in turn, because of the N -periodicity of the DFT,

produces a peak at k = −k0 + N = N − k0.

Note that this result is in agreement with the computation in Example 5 above.

2. Since

sin(k0 x) = (1/(2i)) e^{ik0 x} − (1/(2i)) e^{−ik0 x}, (128)

it follows, once again from the linearity of the DFT, and the “modulo N” property derived

above, that the N -point DFT of sin(k0x) consists of the following two peaks,

F [k] = { N/(2i) = −(N/2) i,   k = k0,
        { −N/(2i) = (N/2) i,   k = N − k0,
        { 0,                   otherwise.      (129)

Note that these peaks have the same magnitudes as for the cosine case, but that they are now

complex. Moreover, the two peaks of the DFT of the sine function are complex conjugates of

each other. More on this later.

For the moment, we note that this result is in agreement with the computation in Example 6

above.


3. Finally, notice that if we add up the DFTs of the sine and cosine function appropriately, we

retrieve the DFT of the exponential, i.e.,

F(cos(k0x) + i sin(k0x)) = F(exp(ik0x)). (130)

The peaks of the cosine and sine at k = k0 “constructively interfere” whereas their peaks at

k = N − k0 “destructively interfere.” The result is a single peak of height N at k = k0.
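This interference argument is easy to check numerically; in the sketch below (assuming NumPy, with the illustrative choices N = 256 and k0 = 2), the cosine and sine spectra combine into the single peak of the exponential:

    import numpy as np

    N, k0 = 256, 2
    x = 2 * np.pi * np.arange(N) / N

    F_cos = np.fft.fft(np.cos(k0 * x))
    F_sin = np.fft.fft(np.sin(k0 * x))
    F_exp = np.fft.fft(np.exp(1j * k0 * x))

    # F(cos) + i F(sin) = F(exp): the peaks at k0 add (N/2 + N/2 = N),
    # while the peaks at N - k0 cancel (N/2 - N/2 = 0)
    print(np.allclose(F_cos + 1j * F_sin, F_exp, atol=1e-9))   # True
    print(np.round(F_exp[k0]), np.round(F_exp[N - k0]))        # approximately 256 and 0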

We now consider a slightly perturbed version of Example 6 of the previous lecture, namely the function

f(x) = sin(2.1x) defined on the interval 0 ≤ x ≤ 2π. From this function we construct N = 256 equally-

spaced samples,

f [n] = f(x_n) = sin(2.1 x_n),   x_n = 2πn/N,   n = 0, 1, · · · , N − 1. (131)

The samples are plotted on the left in the figure below. Note that this signal is not 2π-periodic, but

the sampling and resulting DFT produces a 2π-periodic extension. As such, there is a significant jump

between f [255] and f [256] = f [0].

This time, we find that the DFT spectrum of coefficients F [k] is not as simple as in the first two

examples. First of all, with the exception of F [0] ≈ 3.41068, all DFT coefficients are complex, i.e.,

have nonzero imaginary part. A plot of the magnitudes |F [k]| of the DFT coefficients is presented on

the right in the figure below. There is still a dominant peak at k = 2, but it is not a singular peak –

it is somewhat diffuse.


Sampled signal f [n] = sin(2.1xn), n = 0, 1, · · · , 255 and magnitudes |F [k]| of the DFT, for Example 9.

In order to show the diffuseness of the DFT spectrum, the coefficients are plotted on a different

scale so that the enormous peaks at k = 2 and 254 do not mask their behaviour as in the previous

plot.


Plot of magnitudes |F [k]| of the DFT of the sin(2.1x_n) signal of Example 9, magnified to show the diffuse structure around the dominant peaks at k = 2 and 254.

Ideally, the DFT would like to place a peak at the frequency k = 2.1, but no such integer frequency exists. As such, the dominant peaks are found at k = 2 and 254. But all other frequencies are needed to accommodate this "nonexistent" or irregular frequency – note that their contribution decreases as we move away from the peaks.

If this appears to be a rather “bizarre” phenomenon, just go back and think about the Fourier

(sine) series of this function, i.e.,

sin(2.1x) = ∑_{k=1}^{∞} b_k sin(kx). (132)

In fact, the coefficients bk can be computed rather easily, and one observes that they produce a

somewhat “diffuse” Fourier spectrum that peaks at k = 2.

The reader may wish to examine the effect of further perturbing the frequency of the sampled signal, i.e., the function f(x) = sin((2 + ε)x), as ε is increased. For example, will the k = 3 (and 253) components of the DFT increase in magnitude? And for ε > 0.5, does k = 3 "take over" in magnitude?
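A minimal sketch of such an experiment (assuming NumPy; the grid of ε values is only an illustrative choice):

    import numpy as np

    N = 256
    x = 2 * np.pi * np.arange(N) / N

    for eps in (0.1, 0.3, 0.5, 0.7, 0.9):
        F = np.fft.fft(np.sin((2 + eps) * x))
        # compare the magnitudes of the k = 2 and k = 3 components as eps grows
        print(f"eps = {eps}: |F[2]| = {abs(F[2]):8.2f}, |F[3]| = {abs(F[3]):8.2f}")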
