56
345 CHAPTER 6 Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process; QR -Decomposition 364 6.4 Best Approximation; Least Squares 378 6.5 Mathematical Modeling Using Least Squares 387 6.6 Function Approximation; Fourier Series 394 INTRODUCTION In Chapter 3 we defined the dot product of vectors in R n , and we used that concept to define notions of length, angle, distance, and orthogonality. In this chapter we will generalize those ideas so they are applicable in any vector space, not just R n . We will also discuss various applications of these ideas. 6.1 Inner Products In this section we will use the most important properties of the dot product on R n as axioms, which, if satisfied by the vectors in a vector space V, will enable us to extend the notions of length, distance, angle, and perpendicularity to general vector spaces. General Inner Products In Definition 4 of Section 3.2 we defined the dot product of two vectors in R n , and in Theorem 3.2.2 we listed four fundamental properties of such products. Our first goal in this section is to extend the notion of a dot product to general real vector spaces by using those four properties as axioms. We make the following definition. Note that Definition 1 applies only to real vector spaces. A definition of inner products on complex vector spaces is given in the exercises. Since we will have little need for complex vector spaces from this point on, you can assume that all vector spaces under discussion are real, even though some of the theorems are also valid in complex vector spaces. DEFINITION 1 An inner product on a real vector space V is a function that associates a real number u, v with each pair of vectors in V in such a way that the following axioms are satisfied for all vectors u, v, and w in V and all scalars k. 1. u, v=v, u [ Symmetry axiom ] 2. u + v, w=u, w+v, w [ Additivity axiom ] 3. ku, v= ku, v [ Homogeneity axiom ] 4. v, v0 and v, v= 0 if and only if v = 0 [ Positivity axiom ] A real vector space with an inner product is called a real inner product space. Because the axioms for a real inner product space are based on properties of the dot product, these inner product space axioms will be satisfied automatically if we define the inner product of two vectors u and v in R n to be u, v= u · v = u 1 v 1 + u 2 v 2 +···+ u n v n (1)

Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

  • Upload
    vucong

  • View
    234

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

345

C H A P T E R 6

Inner Product SpacesCHAPTER CONTENTS 6.1 Inner Products 345

6.2 Angle and Orthogonality in Inner Product Spaces 355

6.3 Gram–Schmidt Process; QR-Decomposition 364

6.4 Best Approximation; Least Squares 378

6.5 Mathematical Modeling Using Least Squares 387

6.6 Function Approximation; Fourier Series 394

INTRODUCTION In Chapter 3 we defined the dot product of vectors in Rn, and we used that concept todefine notions of length, angle, distance, and orthogonality. In this chapter we willgeneralize those ideas so they are applicable in any vector space, not just Rn. We willalso discuss various applications of these ideas.

6.1 Inner ProductsIn this section we will use the most important properties of the dot product on Rn asaxioms, which, if satisfied by the vectors in a vector space V, will enable us to extend thenotions of length, distance, angle, and perpendicularity to general vector spaces.

General Inner Products In Definition 4 of Section 3.2 we defined the dot product of two vectors in Rn, and inTheorem 3.2.2 we listed four fundamental properties of such products. Our first goalin this section is to extend the notion of a dot product to general real vector spaces byusing those four properties as axioms. We make the following definition.

Note that Definition 1 appliesonly to real vector spaces. Adefinition of inner products oncomplex vector spaces is givenin the exercises. Since we willhave little need for complexvector spaces from this pointon, you can assume that allvector spaces under discussionare real, even though some ofthe theorems are also valid incomplex vector spaces.

DEFINITION 1 An inner product on a real vector space V is a function that associatesa real number 〈u, v〉 with each pair of vectors in V in such a way that the followingaxioms are satisfied for all vectors u, v, and w in V and all scalars k.

1. 〈u, v〉 = 〈v, u〉 [ Symmetry axiom ]

2. 〈u + v, w〉 = 〈u, w〉 + 〈v, w〉 [ Additivity axiom ]

3. 〈ku, v〉 = k〈u, v〉 [ Homogeneity axiom ]

4. 〈v, v〉 ≥ 0 and 〈v, v〉 = 0 if and only if v = 0 [ Positivity axiom ]

A real vector space with an inner product is called a real inner product space.

Because the axioms for a real inner product space are based on properties of the dotproduct, these inner product space axioms will be satisfied automatically if we define theinner product of two vectors u and v in Rn to be

〈u, v〉 = u · v = u1v1 + u2v2 + · · · + unvn (1)

Page 2: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

346 Chapter 6 Inner Product Spaces

This inner product is commonly called the Euclidean inner product (or the standard innerproduct) on Rn to distinguish it from other possible inner products that might be definedon Rn. We call Rn with the Euclidean inner product Euclidean n-space.

Inner products can be used to define notions of norm and distance in a general innerproduct space just as we did with dot products in Rn. Recall from Formulas (11) and (19)of Section 3.2 that if u and v are vectors in Euclidean n-space, then norm and distancecan be expressed in terms of the dot product as

‖v‖ = √v · v and d(u, v) = ‖u − v‖ = √

(u − v) · (u − v)

Motivated by these formulas, we make the following definition.

DEFINITION 2 If V is a real inner product space, then the norm (or length) of a vectorv in V is denoted by ‖v‖ and is defined by

‖v‖ = √〈v, v〉and the distance between two vectors is denoted by d(u, v) and is defined by

d(u, v) = ‖u − v‖ = √〈u − v, u − v〉A vector of norm 1 is called a unit vector.

The following theorem, whose proof is left for the exercises, shows that norms anddistances in real inner product spaces have many of the properties that you might expect.

THEOREM 6.1.1 If u and v are vectors in a real inner product space V, and if k is ascalar, then:

(a) ‖v‖ ≥ 0 with equality if and only if v = 0.

(b) ‖kv‖ = |k|‖v‖.(c) d(u, v) = d(v, u).

(d ) d(u, v) ≥ 0 with equality if and only if u = v.

Although the Euclidean inner product is the most important inner product on Rn,there are various applications in which it is desirable to modify it by weighting each termdifferently. More precisely, if

w1, w2, . . . , wn

are positive real numbers, which we will call weights, and if u = (u1, u2, . . . , un) andv = (v1, v2, . . . , vn) are vectors in Rn, then it can be shown that the formula

〈u, v〉 = w1u1v1 + w2u2v2 + · · · + wnunvn (2)

defines an inner product on Rn that we call the weighted Euclidean inner product with

Note that the standard Eu-clidean inner product in For-mula (1) is the special caseof the weighted Euclidean in-ner product in which all theweights are 1.

weights w1, w2, . . . , wn.

EXAMPLE 1 Weighted Euclidean Inner Product

Let u = (u1, u2) and v = (v1, v2) be vectors in R2. Verify that the weighted Euclideaninner product

〈u, v〉 = 3u1v1 + 2u2v2 (3)

satisfies the four inner product axioms.

Page 3: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

6.1 Inner Products 347

Solution

Axiom 1: Interchanging u and v in Formula (3) does not change the sum on the rightside, so 〈u, v〉 = 〈v, u〉.Axiom 2: If w = (w1, w2), then

In Example 1, we are usingsubscripted w’s to denote thecomponents of the vector w,not the weights. The weightsare the numbers 3 and 2 in For-mula (3).

〈u + v, w〉 = 3(u1 + v1)w1 + 2(u2 + v2)w2

= 3(u1w1 + v1w1) + 2(u2w2 + v2w2)

= (3u1w1 + 2u2w2) + (3v1w1 + 2v2w2)

= 〈u, w〉 + 〈v, w〉Axiom 3: 〈ku, v〉 = 3(ku1)v1 + 2(ku2)v2

= k(3u1v1 + 2u2v2)

= k〈u, v〉Axiom 4: 〈v, v〉 = 3(v1v1) + 2(v2v2) = 3v2

1 + 2v22 ≥ 0 with equality if and only if

v1 = v2 = 0, that is, if and only if v = 0.

An Application ofWeightedEuclidean Inner Products

To illustrate one way in which a weighted Euclidean inner product can arise, supposethat some physical experiment has n possible numerical outcomes

x1, x2, . . . , xn

and that a series of m repetitions of the experiment yields these values with variousfrequencies. Specifically, suppose that x1 occurs f1 times, x2 occurs f2 times, and soforth. Since there is a total of m repetitions of the experiment, it follows that

f1 + f2 + · · · + fn = m

Thus, the arithmetic average of the observed numerical values (denoted by x̄) is

x̄ = f1x1 + f2x2 + · · · + fnxn

f1 + f2 + · · · + fn

= 1

m(f1x1 + f2x2 + · · · + fnxn) (4)

If we letf = (f1, f2, . . . , fn)

x = (x1, x2, . . . , xn)

w1 = w2 = · · · = wn = 1/m

then (4) can be expressed as the weighted Euclidean inner product

x̄ = 〈f, x〉 = w1f1x1 + w2f2x2 + · · · + wnfnxn

EXAMPLE 2 Calculating with aWeighted Euclidean Inner Product

It is important to keep in mind that norm and distance depend on the inner product beingused. If the inner product is changed, then the norms and distances between vectors alsochange. For example, for the vectors u = (1, 0) and v = (0, 1) in R2 with the Euclideaninner product we have

‖u‖ =√

12 + 0 2 = 1

andd(u, v) = ‖u − v‖ = ‖(1,−1)‖ =

√12 + (−1)2 = √

2

but if we change to the weighted Euclidean inner product

〈u, v〉 = 3u1v1 + 2u2v2

we have‖u‖ = 〈u, u〉1/2 = [3(1)(1) + 2(0)(0)]1/2 = √

3

Page 4: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

348 Chapter 6 Inner Product Spaces

andd(u, v) = ‖u − v‖ = 〈(1,−1), (1,−1)〉1/2

= [3(1)(1) + 2(−1)(−1)]1/2 = √5

Unit Circles and Spheres inInner Product Spaces

DEFINITION 3 If V is an inner product space, then the set of points in V that satisfy

‖u‖ = 1

is called the unit sphere or sometimes the unit circle in V .

EXAMPLE 3 Unusual Unit Circles in R2

(a) Sketch the unit circle in an xy-coordinate system in R2 using the Euclidean innerproduct 〈u, v〉 = u1v1 + u2v2.

(b) Sketch the unit circle in an xy-coordinate system in R2 using the weighted Euclideaninner product 〈u, v〉 = 1

9u1v1 + 14u2v2.

Solution (a) If u = (x, y), then ‖u‖ = 〈u, u〉1/2 =√

x2 + y2, so the equation of the unitcircle is

√x2 + y2 = 1, or on squaring both sides,

x2 + y2 = 1

As expected, the graph of this equation is a circle of radius 1 centered at the origin(Figure 6.1.1a).

y

x

1

||u|| = 1

(a) The unit circle using the standard Euclidean inner product.

(b) The unit circle using a weighted Euclidean inner product.

y

x

3

2||u|| = 1

Figure 6.1.1

Solution (b) If u = (x, y), then ‖u‖ = 〈u, u〉1/2 =√

19x

2 + 14y

2, so the equation of the

unit circle is√

19x

2 + 14y

2 = 1, or on squaring both sides,

x2

9+ y2

4= 1

The graph of this equation is the ellipse shown in Figure 6.1.1b. Though this may seemodd when viewed geometrically, it makes sense algebraically since all points on the ellipseare 1 unit away from the origin relative to the given weighted Euclidean inner product. Inshort, weighting has the effect of distorting the space that we are used to seeing through“unweighted Euclidean eyes.”

Inner Products Generatedby Matrices

The Euclidean inner product and the weighted Euclidean inner products are special casesof a general class of inner products on Rn called matrix inner products. To define thisclass of inner products, let u and v be vectors in Rn that are expressed in column form,and let A be an invertible n × n matrix. It can be shown (Exercise 47) that if u · v is theEuclidean inner product on Rn, then the formula

〈u, v〉 = Au · Av (5)

also defines an inner product; it is called the inner product on Rn generated by A.Recall from Table 1 of Section 3.2 that if u and v are in column form, then u · v can

be written as vTu from which it follows that (5) can be expressed as

〈u, v〉 = (Av)TAu

Page 5: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

6.1 Inner Products 349

or equivalently as

〈u, v〉 = vTATAu (6)

EXAMPLE 4 Matrices GeneratingWeighted Euclidean Inner Products

The standard Euclidean and weighted Euclidean inner products are special cases ofmatrix inner products. The standard Euclidean inner product on Rn is generated by then × n identity matrix, since setting A = I in Formula (5) yields

〈u, v〉 = Iu · Iv = u · v

and the weighted Euclidean inner product

〈u, v〉 = w1u1v1 + w2u2v2 + · · · + wnunvn (7)

is generated by the matrix

A =

⎡⎢⎢⎢⎢⎣√

w1 0 0 · · · 0

0√

w2 0 · · · 0...

......

...

0 0 0 · · · √wn

⎤⎥⎥⎥⎥⎦

This can be seen by observing that ATA is the n × n diagonal matrix whose diagonalentries are the weights w1, w2, . . . , wn.

EXAMPLE 5 Example 1 Revisited

The weighted Euclidean inner product 〈u, v〉 = 3u1v1 + 2u2v2 discussed in Example 1Every diagonal matrix withpositive diagonal entries gen-erates a weighted inner prod-uct. Why?

is the inner product on R2 generated by

A =[√

3 0

0√

2

]

Other Examples of InnerProducts

So far, we have only considered examples of inner products on Rn. We will now considerexamples of inner products on some of the other kinds of vector spaces that we discussedearlier.

EXAMPLE 6 The Standard Inner Product onMnn

If u = U and v = V are matrices in the vector space Mnn, then the formula

〈u, v〉 = tr(UTV ) (8)

defines an inner product on Mnn called the standard inner product on that space (seeDefinition 8 of Section 1.3 for a definition of trace). This can be proved by confirmingthat the four inner product space axioms are satisfied, but we can see why this is so bycomputing (8) for the 2 × 2 matrices

U =[u1 u2

u3 u4

]and V =

[v1 v2

v3 v4

]This yields

〈u, v〉 = tr(UTV ) = u1v1 + u2v2 + u3v3 + u4v4

Page 6: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

350 Chapter 6 Inner Product Spaces

which is just the dot product of the corresponding entries in the two matrices. And itfollows from this that

‖u‖ = √〈u, u〉 =√

tr〈UTU〉 =√

u21 + u2

2 + u23 + u2

4

For example, if

u = U =[

1 2

3 4

]and v = V =

[−1 0

3 2

]then

〈u, v〉 = tr(UTV ) = 1(−1) + 2(0) + 3(3) + 4(2) = 16

and‖u‖ = √〈u, u〉 = √

tr(UTU) = √12 + 22 + 32 + 42 = √

30

‖v‖ = √〈v, v〉 = √tr(V TV ) = √

(−1)2 + 02 + 32 + 22 = √14

EXAMPLE 7 The Standard Inner Product on PnIf

p = a0 + a1x + · · · + anxn and q = b0 + b1x + · · · + bnx

n

are polynomials in Pn, then the following formula defines an inner product on Pn (verify)that we will call the standard inner product on this space:

〈p, q〉 = a0b0 + a1b1 + · · · + anbn (9)

The norm of a polynomial p relative to this inner product is

‖p‖ = √〈p, p〉 =√

a20 + a2

1 + · · · + a2n

EXAMPLE 8 The Evaluation Inner Product on PnIf

p = p(x) = a0 + a1x + · · · + anxn and q = q(x) = b0 + b1x + · · · + bnx

n

are polynomials in Pn, and if x0, x1, . . . , xn are distinct real numbers (called samplepoints), then the formula

〈p, q〉 = p(x0)q(x0) + p(x1)q(x1) + · · · + p(xn)q(xn) (10)

defines an inner product on Pn called the evaluation inner product at x0, x1, . . . , xn.Algebraically, this can be viewed as the dot product in Rn of the n-tuples(

p(x0), p(x1), . . . , p(xn))

and(q(x0), q(x1), . . . , q(xn)

)and hence the first three inner product axioms follow from properties of the dot product.The fourth inner product axiom follows from the fact that

〈p, p〉 = [p(x0)]2 + [p(x1)]2 + · · · + [p(xn)]2 ≥ 0

with equality holding if and only if

p(x0) = p(x1) = · · · = p(xn) = 0

But a nonzero polynomial of degree n or less can have at most n distinct roots, so it mustbe that p = 0, which proves that the fourth inner product axiom holds.

Page 7: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

6.1 Inner Products 351

The norm of a polynomial p relative to the evaluation inner product is

‖p‖ = √〈p, p〉 =√[p(x0)]2 + [p(x1)]2 + · · · + [p(xn)]2 (11)

EXAMPLE 9 Working with the Evaluation Inner Product

Let P2 have the evaluation inner product at the points

x0 = −2, x1 = 0, and x2 = 2

Compute 〈p, q〉 and ‖p‖ for the polynomials p = p(x) = x2 and q = q(x) = 1 + x.

Solution It follows from (10) and (11) that

〈p, q〉 = p(−2)q(−2) + p(0)q(0) + p(2)q(2) = (4)(−1) + (0)(1) + (4)(3) = 8

‖p‖ = √[p(x0)]2 + [p(x1)]2 + [p(x2)]2 = √[p(−2)]2 + [p(0)]2 + [p(2)]2

= √42 + 02 + 42 = √

32 = 4√

2

EXAMPLE 10 An Integral Inner Product on C [a, b]

Let f = f(x) and g = g(x) be two functions in C[a, b] and define

CA L C U L U S R E Q U I R E D

〈f, g〉 =∫ b

a

f(x)g(x) dx (12)

We will show that this formula defines an inner product on C[a, b] by verifying the fourinner product axioms for functions f = f(x), g = g(x), and h = h(x) in C[a, b]:Axiom 1: 〈f, g〉 =

∫ b

a

f(x)g(x) dx =∫ b

a

g(x)f(x) dx = 〈g, f〉

Axiom 2: 〈f + g, h〉 =∫ b

a

(f(x) + g(x))h(x) dx

=∫ b

a

f(x)h(x) dx +∫ b

a

g(x)h(x) dx

= 〈f, h〉 + 〈g, h〉

Axiom 3: 〈kf, g〉 =∫ b

a

kf(x)g(x) dx = k

∫ b

a

f(x)g(x) dx = k〈f, g〉

Axiom 4: If f = f(x) is any function in C[a, b], then

〈f, f〉 =∫ b

a

f 2(x) dx ≥ 0 (13)

since f 2(x) ≥ 0 for all x in the interval [a, b]. Moreover, because f is continuous on[a, b], the equality in Formula (13) holds if and only if the function f is identically zeroon [a, b], that is, if and only if f = 0; and this proves that Axiom 4 holds.

EXAMPLE 11 Norm of aVector in C [a, b]

If C[a, b] has the inner product that was defined in Example 10, then the norm of a

CA L C U L U S R E Q U I R E D

function f = f(x) relative to this inner product is

‖f‖ = 〈f, f〉1/2 =√∫ b

a

f 2(x) dx (14)

Page 8: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

352 Chapter 6 Inner Product Spaces

and the unit sphere in this space consists of all functions f in C[a, b] that satisfy theequation ∫ b

a

f 2(x) dx = 1

Remark Note that the vector space Pn is a subspace of C[a, b] because polynomials are contin-uous functions. Thus, Formula (12) defines an inner product on Pn that is different from both thestandard inner product and the evaluation inner product.

WARNING Recall from calculus that the arc length of a curve y = f(x) over an interval [a, b]is given by the formula

L =∫ b

a

√1 + [f ′(x)]2 dx (15)

Do not confuse this concept of arc length with ‖f‖, which is the length (norm) of f when f isviewed as a vector in C[a, b]. Formulas (14) and (15) have different meanings.

Algebraic Properties ofInner Products

The following theorem lists some of the algebraic properties of inner products that followfrom the inner product axioms. This result is a generalization of Theorem 3.2.3, whichapplied only to the dot product on Rn.

THEOREM 6.1.2 If u, v, and w are vectors in a real inner product space V, and if k is ascalar, then:

(a) 〈0, v〉 = 〈v, 0〉 = 0

(b) 〈u, v + w〉 = 〈u, v〉 + 〈u, w〉(c) 〈u, v − w〉 = 〈u, v〉 − 〈u, w〉(d ) 〈u − v, w〉 = 〈u, w〉 − 〈v, w〉(e) k〈u, v〉 = 〈u, kv〉

Proof We will prove part (b) and leave the proofs of the remaining parts as exercises.

〈u, v + w〉 = 〈v + w, u〉 [ By symmetry ]

= 〈v, u〉 + 〈w, u〉 [ By additivity ]

= 〈u, v〉 + 〈u, w〉 [ By symmetry ]

The following example illustrates how Theorem 6.1.2 and the defining properties ofinner products can be used to perform algebraic computations with inner products. Asyou read through the example, you will find it instructive to justify the steps.

EXAMPLE 12 Calculating with Inner Products

〈u − 2v, 3u + 4v〉 = 〈u, 3u + 4v〉 − 〈2v, 3u + 4v〉= 〈u, 3u〉 + 〈u, 4v〉 − 〈2v, 3u〉 − 〈2v, 4v〉= 3〈u, u〉 + 4〈u, v〉 − 6〈v, u〉 − 8〈v, v〉= 3‖u‖2 + 4〈u, v〉 − 6〈u, v〉 − 8‖v‖2

= 3‖u‖2 − 2〈u, v〉 − 8‖v‖2

Page 9: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

6.1 Inner Products 353

Exercise Set 6.11. Let R2 have the weighted Euclidean inner product

〈u, v〉 = 2u1v1 + 3u2v2

and let u = (1, 1), v = (3, 2), w = (0,−1), and k = 3. Com-pute the stated quantities.

(a) 〈u, v〉 (b) 〈kv, w〉 (c) 〈u + v, w〉(d) ‖v‖ (e) d(u, v) (f ) ‖u − kv‖

2. Follow the directions of Exercise 1 using the weighted Eu-clidean inner product

〈u, v〉 = 12 u1v1 + 5u2v2

In Exercises 3–4, compute the quantities in parts (a)–(f) ofExercise 1 using the inner product on R2 generated by A.

3. A =[

2 1

1 1

]4. A =

[1 0

2 −1

]

In Exercises 5–6, find a matrix that generates the statedweighted inner product on R2.

5. 〈u, v〉 = 2u1v1 + 3u2v2 6. 〈u, v〉 = 12 u1v1 + 5u2v2

In Exercises 7–8, use the inner product on R2 generated by thematrix A to find 〈u, v〉 for the vectors u = (0,−3) and v = (6, 2).

7. A =[

4 1

2 −3

]8. A =

[2 1

−1 3

]

In Exercises 9–10, compute the standard inner product on M22

of the given matrices.

9. U =[

3 −2

4 8

], V =

[−1 3

1 1

]

10. U =[

1 2

−3 5

], V =

[4 6

0 8

]

In Exercises 11–12, find the standard inner product on P2 ofthe given polynomials.

11. p = −2 + x + 3x2, q = 4 − 7x2

12. p = −5 + 2x + x2, q = 3 + 2x − 4x2

In Exercises 13–14, a weighted Euclidean inner product onR2 is given for the vectors u = (u1, u2) and v = (v1, v2). Find amatrix that generates it.

13. 〈u, v〉 = 3u1v1 + 5u2v2 14. 〈u, v〉 = 4u1v1 + 6u2v2

In Exercises 15–16, a sequence of sample points is given. Usethe evaluation inner product on P3 at those sample points to find〈p, q〉 for the polynomials

p = x + x3 and q = 1 + x2

15. x0 = −2, x1 = −1, x2 = 0, x3 = 1

16. x0 = −1, x1 = 0, x2 = 1, x3 = 2

In Exercises 17–18, find ‖u‖ and d(u, v) relative to the weightedEuclidean inner product 〈u, v〉 = 2u1v1 + 3u2v2 on R2.

17. u = (−3, 2) and v = (1, 7)

18. u = (−1, 2) and v = (2, 5)

In Exercises 19–20, find‖p‖ and d(p, q) relative to the standardinner product on P2.

19. p = −2 + x + 3x2, q = 4 − 7x2

20. p = −5 + 2x + x2, q = 3 + 2x − 4x2

In Exercises 21–22, find ‖U‖ and d(U, V ) relative to the stan-dard inner product on M22.

21. U =[

3 −2

4 8

], V =

[−1 3

1 1

]

22. U =[

1 2

−3 5

], V =

[4 6

0 8

]

In Exercises 23–24, let

p = x + x3 and q = 1 + x2

Find ‖p‖ and d(p, q) relative to the evaluation inner product onP3 at the stated sample points.

23. x0 = −2, x1 = −1, x2 = 0, x3 = 1

24. x0 = −1, x1 = 0, x2 = 1, x3 = 2

In Exercises 25–26, find ‖u‖ and d(u, v) for the vectorsu = (−1, 2) and v = (2, 5) relative to the inner product on R2

generated by the matrix A.

25. A =[

4 0

3 5

]26. A =

[1 2

−1 3

]

In Exercises 27–28, suppose that u, v, and w are vectors in aninner product space such that

〈u, v〉 = 2, 〈v, w〉 = −6, 〈u, w〉 = −3

‖u‖ = 1, ‖v‖ = 2, ‖w‖ = 7

Evaluate the given expression.

27. (a) 〈2v − w, 3u + 2w〉 (b) ‖u + v‖

28. (a) 〈u − v − 2w, 4u + v〉 (b) ‖2w − v‖In Exercises 29–30, sketch the unit circle in R2 using the given

inner product.

29. 〈u, v〉 = 14 u1v1 + 1

16 u2v2 30. 〈u, v〉 = 2u1v1 + u2v2

Page 10: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

354 Chapter 6 Inner Product Spaces

In Exercises 31–32, find a weighted Euclidean inner producton R2 for which the “unit circle” is the ellipse shown in the accom-panying figure.

31.

x

y

1

3

Figure Ex-31

32.

x

y

1

34

Figure Ex-31

In Exercises 33–34, let u = (u1, u2, u3) and v = (v1, v2, v3).Show that the expression does not define an inner product on R3,and list all inner product axioms that fail to hold.

33. 〈u, v〉 = u21v

21 + u2

2v22 + u2

3v23

34. 〈u, v〉 = u1v1 − u2v2 + u3v3

In Exercises 35–36, suppose that u and v are vectors in an in-ner product space. Rewrite the given expression in terms of 〈u, v〉,‖u‖2, and ‖v‖2.

35. 〈2v − 4u, u − 3v〉 36. 〈5u + 6v, 4v − 3u〉37. (Calculus required ) Let the vector space P2 have the inner

product

〈p, q〉 =∫ 1

−1p(x)q(x) dx

Find the following for p = 1 and q = x2.

(a) 〈p, q〉 (b) d(p, q)

(c) ‖p‖ (d) ‖q‖38. (Calculus required ) Let the vector space P3 have the inner

product

〈p, q〉 =∫ 1

−1p(x)q(x) dx

Find the following for p = 2x3 and q = 1 − x3.

(a) 〈p, q〉 (b) d(p, q)

(c) ‖p‖ (d) ‖q‖(Calculus required ) In Exericses 39–40, use the inner product

〈f, g〉 =∫ 1

0f (x)g(x)dx

on C[0, 1] to compute 〈f, g〉.

39. f = cos 2πx, g = sin 2πx 40. f = x, g = ex

Working with Proofs

41. Prove parts (a) and (b) of Theorem 6.1.1.

42. Prove parts (c) and (d) of Theorem 6.1.1.

43. (a) Let u = (u1, u2) and v = (v1, v2). Prove that〈u, v〉 = 3u1v1 + 5u2v2 defines an inner product on R2 byshowing that the inner product axioms hold.

(b) What conditions must k1 and k2 satisfy for〈u, v〉 = k1u1v1 + k2u2v2 to define an inner product onR2? Justify your answer.

44. Prove that the following identity holds for vectors in any innerproduct space.

〈u, v〉 = 14‖u + v‖2 − 1

4‖u − v‖2

45. Prove that the following identity holds for vectors in any innerproduct space.

‖u + v‖2 + ‖u − v‖2 = 2‖u‖2 + 2‖v‖2

46. The definition of a complex vector space was given in the firstmargin note in Section 4.1. The definition of a complex innerproduct on a complex vector space V is identical to that inDefinition 1 except that scalars are allowed to be complexnumbers, and Axiom 1 is replaced by 〈u, v〉 = 〈v, u〉. Theremaining axioms are unchanged. A complex vector spacewith a complex inner product is called a complex inner productspace. Prove that if V is a complex inner product space, then〈u, kv〉 = k〈u, v〉.

47. Prove that Formula (5) defines an inner product on Rn.

48. (a) Prove that if v is a fixed vector in a real inner product spaceV , then the mapping T : V →R defined by T (x) = 〈x, v〉is a linear transformation.

(b) Let V = R3 have the Euclidean inner product, and letv = (1, 0, 2). Compute T (1, 1, 1).

(c) Let V = P2 have the standard inner product, and letv = 1 + x. Compute T (x + x2).

(d) Let V = P2 have the evaluation inner product at the pointsx0 = 1, x1 = 0, x2 = −1, and let v = 1 + x. ComputeT (x + x2).

True-False Exercises

TF. In parts (a)–(g) determine whether the statement is true orfalse, and justify your answer.

(a) The dot product on R2 is an example of a weighted innerproduct.

(b) The inner product of two vectors cannot be a negative realnumber.

(c) 〈u, v + w〉 = 〈v, u〉 + 〈w, u〉.(d) 〈ku, kv〉 = k2〈u, v〉.(e) If 〈u, v〉 = 0, then u = 0 or v = 0.

(f ) If ‖v‖2 = 0, then v = 0.

(g) If A is an n × n matrix, then 〈u, v〉 = Au · Av defines an innerproduct on Rn.

Page 11: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

6.2 Angle and Orthogonality in Inner Product Spaces 355

Working withTechnology

T1. (a) Confirm that the following matrix generates an innerproduct.

A =

⎡⎢⎢⎢⎣

5 8 6 −13

3 −1 0 −9

0 1 −1 0

2 4 3 −5

⎤⎥⎥⎥⎦

(b) For the following vectors, use the inner product in part (a) tocompute 〈u, v〉, first by Formula (5) and then by Formula (6).

u =

⎡⎢⎢⎢⎣

1

−2

0

3

⎤⎥⎥⎥⎦ and v =

⎡⎢⎢⎢⎣

0

1

−1

2

⎤⎥⎥⎥⎦

T2. Let the vector space P4 have the evaluation inner product atthe points

−2, −1, 0, 1, 2

and let

p = p(x) = x + x3 and q = q(x) = 1 + x2 + x4

(a) Compute 〈p, q〉, ‖p‖, and ‖q‖.

(b) Verify that the identities in Exercises 44 and 45 hold for thevectors p and q.

T3. Let the vector space M33 have the standard inner product andlet

u = U =⎡⎢⎣

1 −2 3

−2 4 1

3 1 0

⎤⎥⎦ and v = V =

⎡⎢⎣

2 −1 0

1 4 3

1 0 2

⎤⎥⎦

(a) Use Formula (8) to compute 〈u, v〉, ‖u‖, and ‖v‖.

(b) Verify that the identities in Exercises 44 and 45 hold for thevectors u and v.

6.2 Angle and Orthogonality in Inner Product SpacesIn Section 3.2 we defined the notion of “angle” between vectors in Rn. In this section wewill extend this idea to general vector spaces. This will enable us to extend the notion oforthogonality as well, thereby setting the groundwork for a variety of new applications.

Cauchy–Schwarz Inequality Recall from Formula (20) of Section 3.2 that the angle θ between two vectors u and v inRn is

θ = cos−1

(u · v

‖u‖‖v‖)

(1)

We were assured that this formula was valid because it followed from the Cauchy–Schwarz inequality (Theorem 3.2.4) that

−1 ≤ u · v‖u‖‖v‖ ≤ 1 (2)

as required for the inverse cosine to be defined. The following generalization of theCauchy–Schwarz inequality will enable us to define the angle between two vectors in anyreal inner product space.

THEOREM 6.2.1 Cauchy–Schwarz Inequality

If u and v are vectors in a real inner product space V, then

|〈u, v〉| ≤ ‖u‖‖v‖ (3)

Proof We warn you in advance that the proof presented here depends on a clever trickthat is not easy to motivate.

In the case where u = 0 the two sides of (3) are equal since 〈u, v〉 and ‖u‖ are bothzero. Thus, we need only consider the case where u �= 0. Making this assumption, let

a = 〈u, u〉, b = 2〈u, v〉, c = 〈v, v〉

Page 12: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

356 Chapter 6 Inner Product Spaces

and let t be any real number. Since the positivity axiom states that the inner product ofany vector with itself is nonnegative, it follows that

0 ≤ 〈tu + v, tu + v〉 = 〈u, u〉t2 + 2〈u, v〉t + 〈v, v〉= at2 + bt + c

This inequality implies that the quadratic polynomial at2 + bt + c has either no realroots or a repeated real root. Therefore, its discriminant must satisfy the inequalityb2 − 4ac ≤ 0. Expressing the coefficients a, b, and c in terms of the vectors u and vgives 4〈u, v〉2 − 4〈u, u〉〈v, v〉 ≤ 0 or, equivalently,

〈u, v〉2 ≤ 〈u, u〉〈v, v〉Taking square roots of both sides and using the fact that 〈u, u〉 and 〈v, v〉 are nonnegativeyields

|〈u, v〉| ≤ 〈u, u〉1/2〈v, v〉1/2 or equivalently |〈u, v〉| ≤ ‖u‖‖v‖which completes the proof.

The following two alternative forms of the Cauchy–Schwarz inequality are useful toknow:

〈u, v〉2 ≤ 〈u, u〉〈v, v〉 (4)

〈u, v〉2 ≤ ‖u‖2‖v‖2 (5)

The first of these formulas was obtained in the proof of Theorem 6.2.1, and the secondis a variation of the first.

Angle BetweenVectors Our next goal is to define what is meant by the “angle” between vectors in a real innerproduct space. As a first step, we leave it as an exercise for you to use the Cauchy–Schwarzinequality to show that

−1 ≤ 〈u, v〉‖u‖‖v‖ ≤ 1 (6)

This being the case, there is a unique angle θ in radian measure for which

cos θ = 〈u, v〉‖u‖‖v‖ and 0 ≤ θ ≤ π (7)

(Figure 6.2.1). This enables us to define the angle θ between u and v to be

θ = cos−1

( 〈u, v〉‖u‖‖v‖

)(8)

Figure 6.2.1

π

–1

1y

–ππ2

π2

– π2

5π2 π3

π2

3

θ

Page 13: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

6.2 Angle and Orthogonality in Inner Product Spaces 357

EXAMPLE 1 Cosine of the Angle BetweenVectors inM22

Let M22 have the standard inner product. Find the cosine of the angle between thevectors

u = U =[

1 2

3 4

]and v = V =

[−1 0

3 2

]

Solution We showed in Example 6 of the previous section that

〈u, v〉 = 16, ‖u‖ = √30, ‖v‖ = √

14

from which it follows that

cos θ = 〈u, v〉‖u‖‖v‖ = 16√

30√

14≈ 0.78

Properties of Length andDistance in General Inner

Product Spaces

In Section 3.2 we used the dot product to extend the notions of length and distance to Rn,

and we showed that various basic geometry theorems remained valid (see Theorems 3.2.5,3.2.6, and 3.2.7). By making only minor adjustments to the proofs of those theorems,one can show that they remain valid in any real inner product space. For example, hereis the generalization of Theorem 3.2.5 (the triangle inequalities).

THEOREM 6.2.2 If u, v, and w are vectors in a real inner product space V, and if k isany scalar, then:

(a) ‖u + v‖ ≤ ‖u‖ + ‖v‖ [ Triangle inequality for vectors ]

(b) d(u, v) ≤ d(u, w) + d(w, v) [ Triangle inequality for distances ]

Proof (a)

‖u + v‖2 = 〈u + v, u + v〉= 〈u, u〉 + 2〈u, v〉 + 〈v, v〉≤ 〈u, u〉 + 2|〈u, v〉| + 〈v, v〉 [ Property of absolute value ]

≤ 〈u, u〉 + 2‖u‖‖v‖ + 〈v, v〉 [ By (3) ]

= ‖u‖2 + 2‖u‖‖v‖ + ‖v‖2

= (‖u‖ + ‖v‖)2

Taking square roots gives ‖u + v‖ ≤ ‖u‖ + ‖v‖.

Proof (b) Identical to the proof of part (b) of Theorem 3.2.5.

Orthogonality Although Example 1 is a useful mathematical exercise, there is only an occasional needto compute angles in vector spaces other than R2 and R3. A problem of more interestin general vector spaces is ascertaining whether the angle between vectors is π/2. Youshould be able to see from Formula (8) that if u and v are nonzero vectors, then the anglebetween them is θ = π/2 if and only if 〈u, v〉 = 0. Accordingly, we make the followingdefinition, which is a generalization of Definition 1 in Section 3.3 and is applicable evenif one or both of the vectors is zero.

DEFINITION 1 Two vectors u and v in an inner product space V called orthogonal if〈u, v〉 = 0.

Page 14: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

358 Chapter 6 Inner Product Spaces

As the following example shows, orthogonality depends on the inner product in thesense that for different inner products two vectors can be orthogonal with respect to onebut not the other.

EXAMPLE 2 Orthogonality Depends on the Inner Product

The vectors u = (1, 1) and v = (1,−1) are orthogonal with respect to the Euclideaninner product on R2 since

u · v = (1)(1) + (1)(−1) = 0

However, they are not orthogonal with respect to the weighted Euclidean inner product〈u, v〉 = 3u1v1 + 2u2v2 since

〈u, v〉 = 3(1)(1) + 2(1)(−1) = 1 �= 0

EXAMPLE 3 OrthogonalVectors inM22

If M22 has the inner product of Example 6 in the preceding section, then the matrices

U =[

1 0

1 1

]and V =

[0 2

0 0

]are orthogonal since

〈U, V 〉 = 1(0) + 0(2) + 1(0) + 1(0) = 0

CA L C U L U S R E Q U I R E D EXAMPLE 4 OrthogonalVectors in P2

Let P2 have the inner product

〈p, q〉 =∫ 1

−1p(x)q(x) dx

and let p = x and q = x2. Then

‖p‖ = 〈p, p〉1/2 =[∫ 1

−1xx dx

]1/2

=[∫ 1

−1x2 dx

]1/2

=√

2

3

‖q‖ = 〈q, q〉1/2 =[∫ 1

−1x2x2 dx

]1/2

=[∫ 1

−1x4 dx

]1/2

=√

2

5

〈p, q〉 =∫ 1

−1xx2 dx =

∫ 1

−1x3 dx = 0

Because 〈p, q〉 = 0, the vectors p = x and q = x2 are orthogonal relative to the giveninner product.

In Theorem 3.3.3 we proved the Theorem of Pythagoras for vectors in Euclideann-space. The following theorem extends this result to vectors in any real inner productspace.

THEOREM 6.2.3 GeneralizedTheorem of Pythagoras

If u and v are orthogonal vectors in a real inner product space, then

‖u + v‖2 = ‖u‖2 + ‖v‖2

Page 15: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

6.2 Angle and Orthogonality in Inner Product Spaces 359

Proof The orthogonality of u and v implies that 〈u, v〉 = 0, so

‖u + v‖2 = 〈u + v, u + v〉 = ‖u‖2 + 2〈u, v〉 + ‖v‖2

= ‖u‖2 + ‖v‖2

CA L C U L U S R E Q U I R E D EXAMPLE 5 Theorem of Pythagoras in P2

In Example 4 we showed that p = x and q = x2 are orthogonal with respect to the innerproduct

〈p, q〉 =∫ 1

−1p(x)q(x) dx

on P2. It follows from Theorem 6.2.3 that

‖p + q‖2 = ‖p‖2 + ‖q‖2

Thus, from the computations in Example 4, we have

‖p + q‖2 =(√

2

3

)2

+(√

2

5

)2

= 2

3+ 2

5= 16

15

We can check this result by direct integration:

‖p + q‖2 = 〈p + q, p + q〉 =∫ 1

−1(x + x2)(x + x2) dx

=∫ 1

−1x2 dx + 2

∫ 1

−1x3 dx +

∫ 1

−1x4 dx = 2

3+ 0 + 2

5= 16

15

Orthogonal Complements In Section 4.8 we defined the notion of an orthogonal complement for subspaces of Rn,and we used that definition to establish a geometric link between the fundamental spacesof a matrix. The following definition extends that idea to general inner product spaces.

DEFINITION 2 If W is a subspace of a real inner product space V, then the set ofall vectors in V that are orthogonal to every vector in W is called the orthogonalcomplement of W and is denoted by the symbol W⊥.

In Theorem 4.8.6 we stated three properties of orthogonal complements in Rn. Thefollowing theorem generalizes parts (a) and (b) of that theorem to general real innerproduct spaces.

THEOREM 6.2.4 If W is a subspace of a real inner product space V, then:

(a) W⊥ is a subspace of V .

(b) W ∩ W⊥ = {0}.

Proof (a) The set W⊥ contains at least the zero vector, since 〈0, w〉 = 0 for every vectorw in W . Thus, it remains to show that W⊥ is closed under addition and scalar multipli-cation. To do this, suppose that u and v are vectors in W⊥, so that for every vector w inW we have 〈u, w〉 = 0 and 〈v, w〉 = 0. It follows from the additivity and homogeneityaxioms of inner products that

〈u + v, w〉 = 〈u, w〉 + 〈v, w〉 = 0 + 0 = 0

〈ku, w〉 = k〈u, w〉 = k(0) = 0

which proves that u + v and ku are in W⊥.

Page 16: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

360 Chapter 6 Inner Product Spaces

Proof (b) If v is any vector in both W and W⊥, then v is orthogonal to itself; that is,〈v, v〉 = 0. It follows from the positivity axiom for inner products that v = 0.

The next theorem, which we state without proof, generalizes part (c) of Theo-rem 4.8.6. Note, however, that this theorem applies only to finite-dimensional innerproduct spaces, whereas Theorem 4.8.6 does not have this restriction.

THEOREM 6.2.5 IfW is a subspace of a real finite-dimensional inner product space V,

Theorem 6.2.5 implies thatin a finite-dimensional in-ner product space orthogonalcomplements occur in pairs,each being orthogonal to theother (Figure 6.2.2).

then the orthogonal complement of W⊥ is W ; that is,

(W⊥)⊥ = W

In our study of the fundamental spaces of a matrix in Section 4.8 we showed that theW⊥

W

Figure 6.2.2 Each vector inW is orthogonal to each vectorin W⊥ and conversely.

row space and null space of a matrix are orthogonal complements with respect to theEuclidean inner product on Rn (Theorem 4.8.7). The following example takes advantageof that fact.

EXAMPLE 6 Basis for an Orthogonal Complement

Let W be the subspace of R6 spanned by the vectors

w1 = (1, 3,−2, 0, 2, 0), w2 = (2, 6,−5,−2, 4,−3),

w3 = (0, 0, 5, 10, 0, 15), w4 = (2, 6, 0, 8, 4, 18)

Find a basis for the orthogonal complement of W .

Solution The subspace W is the same as the row space of the matrix

A =

⎡⎢⎢⎢⎣

1 3 −2 0 2 0

2 6 −5 −2 4 −30 0 5 10 0 152 6 0 8 4 18

⎤⎥⎥⎥⎦

Since the row space and null space of A are orthogonal complements, our problemreduces to finding a basis for the null space of this matrix. In Example 4 of Section 4.7we showed that

v1 =

⎡⎢⎢⎢⎢⎢⎢⎢⎣

−3

1

0

0

0

0

⎤⎥⎥⎥⎥⎥⎥⎥⎦

, v2 =

⎡⎢⎢⎢⎢⎢⎢⎢⎣

−4

0

−2

1

0

0

⎤⎥⎥⎥⎥⎥⎥⎥⎦

, v3 =

⎡⎢⎢⎢⎢⎢⎢⎢⎣

−2

0

0

0

1

0

⎤⎥⎥⎥⎥⎥⎥⎥⎦

form a basis for this null space. Expressing these vectors in comma-delimited form (tomatch that of w1, w2, w3, and w4), we obtain the basis vectors

v1 = (−3, 1, 0, 0, 0, 0), v2 = (−4, 0,−2, 1, 0, 0), v3 = (−2, 0, 0, 0, 1, 0)

You may want to check that these vectors are orthogonal to w1, w2, w3, and w4 bycomputing the necessary dot products.

Page 17: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

6.2 Angle and Orthogonality in Inner Product Spaces 361

Exercise Set 6.2In Exercises 1–2, find the cosine of the angle between the vec-

tors with respect to the Euclidean inner product.1. (a) u = (1,−3), v = (2, 4)

(b) u = (−1, 5, 2), v = (2, 4,−9)

(c) u = (1, 0, 1, 0), v = (−3,−3,−3,−3)

2. (a) u = (−1, 0), v = (3, 8)

(b) u = (4, 1, 8), v = (1, 0,−3)

(c) u = (2, 1, 7,−1), v = (4, 0, 0, 0)

In Exercises 3–4, find the cosine of the angle between the vec-tors with respect to the standard inner product on P2.

3. p = −1 + 5x + 2x2, q = 2 + 4x − 9x2

4. p = x − x2, q = 7 + 3x + 3x2

In Exercises 5–6, find the cosine of the angle between A and B

with respect to the standard inner product on M22.

5. A =[

2 6

1 −3

], B =

[3 2

1 0

]

6. A =[

2 4

−1 3

], B =

[−3 1

4 2

]

In Exercises 7–8, determine whether the vectors are orthogonalwith respect to the Euclidean inner product.

7. (a) u = (−1, 3, 2), v = (4, 2,−1)

(b) u = (−2,−2,−2), v = (1, 1, 1)

(c) u = (a, b), v = (−b, a)

8. (a) u = (u1, u2, u3), v = (0, 0, 0)

(b) u = (−4, 6,−10, 1), v = (2, 1,−2, 9)

(c) u = (a, b, c), v = (−c, 0, a)

In Exercises 9–10, show that the vectors are orthogonal withrespect to the standard inner product on P2.

9. p = −1 − x + 2x2, q = 2x + x2

10. p = 2 − 3x + x2, q = 4 + 2x − 2x2

In Exercises 11–12, show that the matrices are orthogonal withrespect to the standard inner product on M22.

11. U =[

2 1

−1 3

], V =

[−3 0

0 2

]

12. U =[

5 −1

2 −2

], V =

[1 3

−1 0

]

In Exercises 13–14, show that the vectors are not orthogonalwith respect to the Euclidean inner product on R2, and then finda value of k for which the vectors are orthogonal with respect tothe weighted Euclidean inner product 〈u, v〉 = 2u1v1 + ku2v2.

13. u = (1, 3), v = (2,−1) 14. u = (2,−4), v = (0, 3)

15. If the vectors u = (1, 2) and v = (2,−4) are orthogonalwith respect to the weighted Euclidean inner product〈u, v〉 = w1u1v1 + w2u2v2, what must be true of the weightsw1 and w2?

16. Let R4 have the Euclidean inner product. Find two unit vec-tors that are orthogonal to all three of the vectorsu = (2, 1,−4, 0), v = (−1,−1, 2, 2), and w = (3, 2, 5, 4).

17. Do there exist scalars k and l such that the vectors

p1 = 2 + kx + 6x2, p2 = l + 5x + 3x2, p3 = 1 + 2x + 3x2

are mutually orthogonal with respect to the standard innerproduct on P2?

18. Show that the vectors

u =[

3

3

]and v =

[5

−8

]

are orthogonal with respect to the inner product on R2 that isgenerated by the matrix

A =[

2 1

1 1

]

[See Formulas (5) and (6) of Section 6.1.]

19. Let P2 have the evaluation inner product at the points

x0 = −2, x1 = 0, x2 = 2

Show that the vectors p = x and q = x2 are orthogonal withrespect to this inner product.

20. Let M22 have the standard inner product. Determine whetherthe matrix A is in the subspace spanned by the matrices U

and V .

A =[−1 1

0 2

], U =

[1 −1

3 0

], V =

[4 0

9 2

]

In Exercises 21–24, confirm that the Cauchy–Schwarz inequal-ity holds for the given vectors using the stated inner product.

21. u = (1, 0, 3), v = (2, 1,−1) using the weighted Euclidean in-ner product 〈u, v〉 = 2u1v1 + 3u2v2 + u3v3 in R3.

22. U =[−1 2

6 1

]and V =

[1 0

3 3

]using the standard inner product on M22.

23. p = −1 + 2x + x2 and q = 2 − 4x2 using the standard innerproduct on P2.

24. The vectors

u =[

1

1

]and v =

[1

−1

]with respect to the inner product in Exercise 18.

Page 18: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

362 Chapter 6 Inner Product Spaces

25. Let R4 have the Euclidean inner product, and letu = (−1, 1, 0, 2). Determine whether the vector u is orthogo-nal to the subspace spanned by the vectors w1 = (1,−1, 3, 0)and w2 = (4, 0, 9, 2).

26. Let P3 have the standard inner product, and let

p = −1 − x + 2x2 + 4x3

Determine whether p is orthogonal to the subspace spanned bythe polynomials w1 = 2 − x2 + x3 and w2 = 4x − 2x2 + 2x3.

In Exercises 27–28, find a basis for the orthogonal complementof the subspace of Rn spanned by the vectors.

27. v1 = (1, 4, 5, 2), v2 = (2, 1, 3, 0), v3 = (−1, 3, 2, 2)

28. v1 = (1, 4, 5, 6, 9), v2 = (3,−2, 1, 4,−1),v3 = (−1, 0,−1,−2,−1), v4 = (2, 3, 5, 7, 8)

In Exercises 29–30, assume that Rn has the Euclidean innerproduct.

29. (a) Let W be the line in R2 with equation y = 2x. Find anequation for W⊥.

(b) Let W be the plane in R3 with equation x − 2y − 3z = 0.Find parametric equations for W⊥.

30. (a) Let W be the y-axis in an xyz-coordinate system in R3.Describe the subspace W⊥.

(b) Let W be the yz-plane of an xyz-coordinate system in R3.Describe the subspace W⊥.

31. (Calculus required ) Let C[0, 1] have the integral inner product

〈p, q〉 =∫ 1

0p(x)q(x) dx

and let p = p(x) = x and q = q(x) = x2.

(a) Find 〈p, q〉.(b) Find ‖p‖ and ‖q‖.

32. (a) Find the cosine of the angle between the vectors p and qin Exercise 31.

(b) Find the distance between the vectors p and q in Exer-cise 31.

33. (Calculus required ) Let C[−1, 1] have the integral innerproduct

〈p, q〉 =∫ 1

−1p(x)q(x) dx

and let p = p(x) = x2 − x and q = q(x) = x + 1.

(a) Find 〈p, q〉.(b) Find ‖p‖ and ‖q‖.

34. (a) Find the cosine of the angle between the vectors p and qin Exercise 33.

(b) Find the distance between the vectors p and q in Exer-cise 33.

35. (Calculus required ) Let C[0, 1] have the inner product in Ex-ercise 31.

(a) Show that the vectors

p = p(x) = 1 and q = q(x) = 12 − x

are orthogonal.

(b) Show that the vectors in part (a) satisfy the Theorem ofPythagoras.

36. (Calculus required ) Let C[−1, 1] have the inner product inExercise 33.

(a) Show that the vectors

p = p(x) = x and q = q(x) = x2 − 1

are orthogonal.

(b) Show that the vectors in part (a) satisfy the Theorem ofPythagoras.

37. Let V be an inner product space. Show that if u and v areorthogonal unit vectors in V, then ‖u − v‖ = √

2.

38. Let V be an inner product space. Show that if w is orthogonalto both u1 and u2, then it is orthogonal to k1u1 + k2u2 for allscalars k1 and k2. Interpret this result geometrically in the casewhere V is R3 with the Euclidean inner product.

39. (Calculus required ) Let C[0, π ] have the inner product

〈f, g〉 =∫ π

0f(x)g(x) dx

and let fn = cos nx (n = 0, 1, 2, . . .). Show that if k �= l, thenfk and fl are orthogonal vectors.

40. As illustrated in the accompanying figure, the vectorsu = (1,

√3 ) and v = (−1,

√3 ) have norm 2 and an angle

of 60◦ between them relative to the Euclidean inner product.Find a weighted Euclidean inner product with respect to whichu and v are orthogonal unit vectors.

y

xuv

2

60°

(–1, √3) (1, √3)

Figure Ex-40

Working with Proofs

41. Let V be an inner product space. Prove that if w is orthogonalto each of the vectors u1, u2, . . . , ur , then it is orthogonal toevery vector in span{u1, u2, . . . , ur}.

42. Let {v1, v2, . . . , vr} be a basis for an inner product space V .Prove that the zero vector is the only vector in V that is or-thogonal to all of the basis vectors.

Page 19: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

6.2 Angle and Orthogonality in Inner Product Spaces 363

43. Let {w1, w2, . . . , wk} be a basis for a subspace W of V . Provethat W⊥ consists of all vectors in V that are orthogonal toevery basis vector.

44. Prove the following generalization of Theorem 6.2.3: Ifv1, v2, . . . , vr are pairwise orthogonal vectors in an innerproduct space V, then

‖v1 + v2 + · · · + vr‖2 = ‖v1‖2 + ‖v2‖2 + · · · + ‖vr‖2

45. Prove: If u and v are n × 1 matrices and A is an n × n matrix,then

(vTATAu)2 ≤ (uTATAu)(vTATAv)

46. Use the Cauchy–Schwarz inequality to prove that for all realvalues of a, b, and θ ,

(a cos θ + b sin θ)2 ≤ a2 + b2

47. Prove: If w1, w2, . . . , wn are positive real numbers, andif u = (u1, u2, . . . , un) and v = (v1, v2, . . . , vn) are any twovectors in Rn, then

|w1u1v1 + w2u2v2 + · · · + wnunvn|≤ (w1u

21 + w2u

22 + · · · + wnu

2n)

1/2(w1v21 + w2v

22 + · · · + wnv

2n)

1/2

48. Prove that equality holds in the Cauchy–Schwarz inequality ifand only if u and v are linearly dependent.

49. (Calculus required ) Let f(x) and g(x) be continuous functionson [0, 1]. Prove:

(a)

[∫ 1

0f(x)g(x) dx

]2

≤[∫ 1

0f 2(x) dx

][∫ 1

0g2(x) dx

]

(b)

[∫ 1

0[f(x) + g(x)]2 dx

]1/2

≤[∫ 1

0f 2(x) dx

]1/2

+[∫ 1

0g2(x) dx

]1/2

[Hint: Use the Cauchy–Schwarz inequality.]

50. Prove that Formula (4) holds for all nonzero vectors u and vin a real inner product space V .

51. Let TA: R2 →R2 be multiplication by

A =[

1 1

−1 1

]and let x = (1, 1).

(a) Assuming that R2 has the Euclidean inner product, findall vectors v in R2 such that 〈x, v〉 = 〈TA(x), TA(v)〉.

(b) Assuming that R2 has the weighted Euclidean inner prod-uct 〈u, v〉 = 2u1v1 + 3u2v2, find all vectors v in R2 suchthat 〈x, v〉 = 〈TA(x), TA(v)〉.

52. Let T : P2 →P2 be the linear transformation defined by

T (a + bx + cx2) = 3a − cx2

and let p = 1 + x.

(a) Assuming that P2 has the standard inner product, find allvectors q in P2 such that 〈p, q〉 = 〈T (p), T (q)〉.

(b) Assuming that P2 has the evaluation inner product at thepoints x0 = −1, x1 = 0, x2 = 1, find all vectors q in P2

such that 〈p, q〉 = 〈T (p), T (q)〉.

True-False Exercises

TF. In parts (a)–(f ) determine whether the statement is true orfalse, and justify your answer.

(a) If u is orthogonal to every vector of a subspace W , then u = 0.

(b) If u is a vector in both W and W⊥, then u = 0.

(c) If u and v are vectors in W⊥, then u + v is in W⊥.

(d) If u is a vector in W⊥ and k is a real number, then ku is in W⊥.

(e) If u and v are orthogonal, then |〈u, v〉| = ‖u‖‖v‖.

(f ) If u and v are orthogonal, then ‖u + v‖ = ‖u‖ + ‖v‖.

Working withTechnology

T1. (a) We know that the row space and null space of a matrixare orthogonal complements relative to the Euclidean innerproduct. Confirm this fact for the matrix

A =

⎡⎢⎢⎢⎢⎢⎢⎣

2 −1 3 5

4 −3 1 3

3 −2 3 4

4 −1 15 17

7 −6 −7 0

⎤⎥⎥⎥⎥⎥⎥⎦

(b) Find a basis for the orthogonal complement of the columnspace of A.

T2. In each part, confirm that the vectors u and v satisfy theCauchy–Schwarz inequality relative to the stated inner product.

(a) M44 with the standard inner product.

u =

⎡⎢⎢⎢⎣

1 0 2 0

0 −1 0 1

3 0 0 2

0 4 −3 0

⎤⎥⎥⎥⎦ and v =

⎡⎢⎢⎢⎣

2 2 1 3

3 −1 0 1

1 0 0 −2

−3 1 2 0

⎤⎥⎥⎥⎦

(b) R4 with the weighted Euclidean inner product with weightsw1 = 1

2 , w2 = 14 , w3 = 1

8 , w4 = 18 .

u = (1,−2, 2, 1) and v = (0,−3, 3,−2)

Page 20: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

364 Chapter 6 Inner Product Spaces

6.3 Gram–Schmidt Process; QR-DecompositionIn many problems involving vector spaces, the problem solver is free to choose any basis forthe vector space that seems appropriate. In inner product spaces, the solution of a problemcan often be simplified by choosing a basis in which the vectors are orthogonal to oneanother. In this section we will show how such bases can be obtained.

Orthogonal andOrthonormal Sets

Recall from Section 6.2 that two vectors in an inner product space are said to beorthogonalif their inner product is zero. The following definition extends the notion of orthogonalityto sets of vectors in an inner product space.

DEFINITION 1 A set of two or more vectors in a real inner product space is said to beorthogonal if all pairs of distinct vectors in the set are orthogonal. An orthogonal setin which each vector has norm 1 is said to be orthonormal.

EXAMPLE 1 An Orthogonal Set in R3

Letv1 = (0, 1, 0), v2 = (1, 0, 1), v3 = (1, 0,−1)

and assume that R3 has the Euclidean inner product. It follows that the set of vectorsS = {v1, v2, v3} is orthogonal since 〈v1, v2〉 = 〈v1, v3〉 = 〈v2, v3〉 = 0.

It frequently happens that one has found a set of orthogonal vectors in an innerproduct space but what is actually needed is a set of orthonormal vectors. A simple wayto convert an orthogonal set of nonzero vectors into an orthonormal set is to multiplyeach vector v in the orthogonal set by the reciprocal of its length to create a vector ofnorm 1 (called a unit vector). To see why this works, suppose that v is a nonzero vectorin an inner product space, and let

u = 1

‖v‖v (1)

Then it follows from Theorem 6.1.1(b) with k = ‖v‖ that

Note that Formula (1) is iden-tical to Formula (4) of Sec-tion 3.2, but whereas For-mula (4) was valid only for vec-tors in Rn with the Euclideaninner product, Formula (1) isvalid in general inner productspaces.

‖u‖ =∥∥∥∥ 1

‖v‖v

∥∥∥∥ =∣∣∣∣ 1

‖v‖∣∣∣∣ ‖v‖ = 1

‖v‖‖v‖ = 1

This process of multiplying a vector v by the reciprocal of its length is called normalizing v.We leave it as an exercise to show that normalizing the vectors in an orthogonal set ofnonzero vectors preserves the orthogonality of the vectors and produces an orthonormalset.

EXAMPLE 2 Constructing an Orthonormal Set

The Euclidean norms of the vectors in Example 1 are

‖v1‖ = 1, ‖v2‖ = √2, ‖v3‖ = √

2

Consequently, normalizing u1, u2, and u3 yields

u1 = v1

‖v1‖ = (0, 1, 0), u2 = v2

‖v2‖ =(

1√2, 0,

1√2

),

u3 = v3

‖v3‖ =(

1√2, 0,− 1√

2

)

Page 21: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

6.3 Gram–Schmidt Process; QR -Decomposition 365

We leave it for you to verify that the set S = {u1, u2, u3} is orthonormal by showing that

〈u1, u2〉 = 〈u1, u3〉 = 〈u2, u3〉 = 0 and ‖u1‖ = ‖u2‖ = ‖u3‖ = 1

InR2 any two nonzero perpendicular vectors are linearly independent because neitheris a scalar multiple of the other; and in R3 any three nonzero mutually perpendicularvectors are linearly independent because no one lies in the plane of the other two (andhence is not expressible as a linear combination of the other two). The following theoremgeneralizes these observations.

THEOREM 6.3.1 If S = {v1, v2, . . . , vn} is an orthogonal set of nonzero vectors in aninner product space, then S is linearly independent.

Proof Assume thatk1v1 + k2v2 + · · · + knvn = 0 (2)

To demonstrate that S = {v1, v2, . . . , vn} is linearly independent, we must prove thatk1 = k2 = · · · = kn = 0.

For each vi in S, it follows from (2) that

〈k1v1 + k2v2 + · · · + knvn, vi〉 = 〈0, vi〉 = 0

or, equivalently,k1〈v1, vi〉 + k2〈v2, vi〉 + · · · + kn〈vn, vi〉 = 0

From the orthogonality of S it follows that 〈vj , vi〉 = 0 when j �= i, so this equationreduces to

ki〈vi , vi〉 = 0

Since the vectors in S are assumed to be nonzero, it follows from the positivity axiom

Since an orthonormal set is or-thogonal, and since its vectorsare nonzero (norm 1), it fol-lows from Theorem 6.3.1 thatevery orthonormal set is lin-early independent.

for inner products that 〈vi , vi〉 �= 0. Thus, the preceding equation implies that each ki inEquation (2) is zero, which is what we wanted to prove.

In an inner product space, a basis consisting of orthonormal vectors is called anorthonormal basis, and a basis consisting of orthogonal vectors is called an orthogonalbasis. A familiar example of an orthonormal basis is the standard basis for Rn with theEuclidean inner product:

e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, 0, 0, . . . , 1)

EXAMPLE 3 An Orthonormal Basis for PnRecall from Example 7 of Section 6.1 that the standard inner product of the polynomials

p = a0 + a1x + · · · + anxn and q = b0 + b1x + · · · + bnx

n

is〈p, q〉 = a0b0 + a1b1 + · · · + anbn

and the norm of p relative to this inner product is

‖p‖ = √〈p, p〉 =√

a20 + a2

1 + · · · + a2n

You should be able to see from these formulas that the standard basis

S = {1, x, x2, . . . , xn

}is orthonormal with respect to this inner product.

Page 22: Inner Product Spaces - UCONN€¦ · Inner Product Spaces CHAPTER CONTENTS 6.1 Inner Products 345 6.2 Angle and Orthogonality in Inner Product Spaces 355 6.3 Gram–Schmidt Process;

366 Chapter 6 Inner Product Spaces

EXAMPLE 4 An Orthonormal Basis

In Example 2 we showed that the vectors

u1 = (0, 1, 0), u2 =(

1√2, 0,

1√2

), and u3 =

(1√2, 0,− 1√

2

)form an orthonormal set with respect to the Euclidean inner product on R3. By Theorem6.3.1, these vectors form a linearly independent set, and since R3 is three-dimensional,it follows from Theorem 4.5.4 that S = {u1, u2, u3} is an orthonormal basis for R3.

Coordinates Relative to Orthonormal Bases

One way to express a vector u as a linear combination of basis vectors

S = {v1, v2, . . . , vn} is to convert the vector equation

u = c1v1 + c2v2 + · · · + cnvn

to a linear system and solve for the coefficients c1, c2, . . . , cn. However, if the basis happens to be orthogonal or orthonormal, then the following theorem shows that the coefficients can be obtained more simply by computing appropriate inner products.

THEOREM 6.3.2

(a) If S = {v1, v2, . . . , vn} is an orthogonal basis for an inner product space V, and if u is any vector in V, then

u = (〈u, v1〉/‖v1‖²) v1 + (〈u, v2〉/‖v2‖²) v2 + · · · + (〈u, vn〉/‖vn‖²) vn (3)

(b) If S = {v1, v2, . . . , vn} is an orthonormal basis for an inner product space V, and if u is any vector in V, then

u = 〈u, v1〉v1 + 〈u, v2〉v2 + · · · + 〈u, vn〉vn (4)

Proof (a) Since S = {v1, v2, . . . , vn} is a basis for V, every vector u in V can be expressed in the form

u = c1v1 + c2v2 + · · · + cnvn

We will complete the proof by showing that

ci = 〈u, vi〉/‖vi‖² (5)

for i = 1, 2, . . . , n. To do this, observe first that

〈u, vi〉 = 〈c1v1 + c2v2 + · · · + cnvn, vi〉 = c1〈v1, vi〉 + c2〈v2, vi〉 + · · · + cn〈vn, vi〉

Since S is an orthogonal set, all of the inner products in the last equality are zero except the ith, so we have

〈u, vi〉 = ci〈vi , vi〉 = ci‖vi‖2

Solving this equation for ci yields (5), which completes the proof.

Proof (b) In this case, ‖v1‖ = ‖v2‖ = · · · = ‖vn‖ = 1, so Formula (3) simplifies to Formula (4).


Using the terminology and notation from Definition 2 of Section 4.4, it follows from Theorem 6.3.2 that the coordinate vector of a vector u in V relative to an orthogonal basis S = {v1, v2, . . . , vn} is

(u)S = (〈u, v1〉/‖v1‖², 〈u, v2〉/‖v2‖², . . . , 〈u, vn〉/‖vn‖²) (6)

and relative to an orthonormal basis S = {v1, v2, . . . , vn} is

(u)S = (〈u, v1〉, 〈u, v2〉, . . . , 〈u, vn〉) (7)

EXAMPLE 5 A Coordinate Vector Relative to an Orthonormal Basis

Let
v1 = (0, 1, 0), v2 = (−4/5, 0, 3/5), v3 = (3/5, 0, 4/5)

It is easy to check that S = {v1, v2, v3} is an orthonormal basis for R3 with the Euclidean inner product. Express the vector u = (1, 1, 1) as a linear combination of the vectors in S, and find the coordinate vector (u)S.

Solution We leave it for you to verify that

〈u, v1〉 = 1, 〈u, v2〉 = −1/5, and 〈u, v3〉 = 7/5

Therefore, by Theorem 6.3.2 we have

u = v1 − (1/5)v2 + (7/5)v3

that is,

(1, 1, 1) = (0, 1, 0) − (1/5)(−4/5, 0, 3/5) + (7/5)(3/5, 0, 4/5)

Thus, the coordinate vector of u relative to S is

(u)S = (〈u, v1〉, 〈u, v2〉, 〈u, v3〉) = (1, −1/5, 7/5)
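If a computing utility is at hand, the coordinates in Example 5 can be checked directly; by Theorem 6.3.2(b) they are just inner products. A minimal sketch, assuming NumPy is available:

    import numpy as np

    v1 = np.array([0., 1., 0.])
    v2 = np.array([-4/5, 0., 3/5])
    v3 = np.array([3/5, 0., 4/5])
    u  = np.array([1., 1., 1.])

    # Theorem 6.3.2(b): coordinates relative to an orthonormal basis are <u, vi>
    coords = [np.dot(u, v) for v in (v1, v2, v3)]
    print(coords)                                        # [1.0, -0.2, 1.4] = (1, -1/5, 7/5)
    print(coords[0]*v1 + coords[1]*v2 + coords[2]*v3)    # [1. 1. 1.]: the expansion recovers u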

EXAMPLE 6 An Orthonormal Basis from an Orthogonal Basis

(a) Show that the vectors

w1 = (0, 2, 0), w2 = (3, 0, 3), w3 = (−4, 0, 4)

form an orthogonal basis for R3 with the Euclidean inner product, and use that basis to find an orthonormal basis by normalizing each vector.

(b) Express the vector u = (1, 2, 4) as a linear combination of the orthonormal basis vectors obtained in part (a).

Solution (a) The given vectors form an orthogonal set since

〈w1, w2〉 = 0, 〈w1, w3〉 = 0, 〈w2, w3〉 = 0

It follows from Theorem 6.3.1 that these vectors are linearly independent and hence form a basis for R3 by Theorem 4.5.4. We leave it for you to calculate the norms of w1, w2, and w3 and then obtain the orthonormal basis

v1 = w1/‖w1‖ = (0, 1, 0), v2 = w2/‖w2‖ = (1/√2, 0, 1/√2), v3 = w3/‖w3‖ = (−1/√2, 0, 1/√2)


Solution (b) It follows from Formula (4) that

u = 〈u, v1〉v1 + 〈u, v2〉v2 + 〈u, v3〉v3

We leave it for you to confirm that

〈u, v1〉 = (1, 2, 4) · (0, 1, 0) = 2

〈u, v2〉 = (1, 2, 4) · (1/√2, 0, 1/√2) = 5/√2

〈u, v3〉 = (1, 2, 4) · (−1/√2, 0, 1/√2) = 3/√2

and hence that

(1, 2, 4) = 2(0, 1, 0) + (5/√2)(1/√2, 0, 1/√2) + (3/√2)(−1/√2, 0, 1/√2)

Orthogonal Projections

Many applied problems are best solved by working with orthogonal or orthonormal basis vectors. Such bases are typically found by starting with some simple basis (say a standard basis) and then converting that basis into an orthogonal or orthonormal basis. To explain exactly how that is done will require some preliminary ideas about orthogonal projections.

In Section 3.3 we proved a result called the Projection Theorem (see Theorem 3.3.2) that dealt with the problem of decomposing a vector u in Rn into a sum of two terms, w1 and w2, in which w1 is the orthogonal projection of u on some nonzero vector a and w2 is orthogonal to w1 (Figure 3.3.2). That result is a special case of the following more general theorem, which we will state without proof.

THEOREM 6.3.3 Projection Theorem

If W is a finite-dimensional subspace of an inner product space V, then every vector u in V can be expressed in exactly one way as

u = w1 + w2 (8)

where w1 is in W and w2 is in W⊥.

The vectors w1 and w2 in Formula (8) are commonly denoted by

w1 = projW u and w2 = projW⊥ u (9)

These are called the orthogonal projection of u on W and the orthogonal projection of u on W⊥, respectively. The vector w2 is also called the component of u orthogonal to W. Using the notation in (9), Formula (8) can be expressed as

u = projW u + projW⊥ u (10)

(Figure 6.3.1). Moreover, since projW⊥ u = u − projW u, we can also express Formula (10) as

u = projW u + (u − projW u) (11)

[Figure 6.3.1: u decomposed into projW u in W and projW⊥ u in W⊥]


The following theorem provides formulas for calculating orthogonal projections.

Although Formulas (12) and (13) are expressed in terms of orthogonal and orthonormal basis vectors, the resulting vector projW u does not depend on the basis vectors that are used.

THEOREM 6.3.4 Let W be a finite-dimensional subspace of an inner product space V .

(a) If {v1, v2, . . . , vr} is an orthogonal basis for W, and u is any vector in V, then

projW u = (〈u, v1〉/‖v1‖²) v1 + (〈u, v2〉/‖v2‖²) v2 + · · · + (〈u, vr〉/‖vr‖²) vr (12)

(b) If {v1, v2, . . . , vr} is an orthonormal basis for W, and u is any vector in V, then

projW u = 〈u, v1〉v1 + 〈u, v2〉v2 + · · · + 〈u, vr〉vr (13)

Proof (a) It follows from Theorem 6.3.3 that the vector u can be expressed in the form u = w1 + w2, where w1 = projW u is in W and w2 is in W⊥; and it follows from Theorem 6.3.2 that the component projW u = w1 can be expressed in terms of the basis vectors for W as

projW u = w1 = (〈w1, v1〉/‖v1‖²) v1 + (〈w1, v2〉/‖v2‖²) v2 + · · · + (〈w1, vr〉/‖vr‖²) vr (14)

Since w2 is orthogonal to W , it follows that

〈w2, v1〉 = 〈w2, v2〉 = · · · = 〈w2, vr〉 = 0

so we can rewrite (14) as

projW u = w1 = (〈w1 + w2, v1〉/‖v1‖²) v1 + (〈w1 + w2, v2〉/‖v2‖²) v2 + · · · + (〈w1 + w2, vr〉/‖vr‖²) vr

or, equivalently, as

projW u = w1 = (〈u, v1〉/‖v1‖²) v1 + (〈u, v2〉/‖v2‖²) v2 + · · · + (〈u, vr〉/‖vr‖²) vr

Proof (b) In this case, ‖v1‖ = ‖v2‖ = · · · = ‖vr‖ = 1, so Formula (14) simplifies to Formula (13).

EXAMPLE 7 Calculating Projections

Let R3 have the Euclidean inner product, and let W be the subspace spanned by the orthonormal vectors v1 = (0, 1, 0) and v2 = (−4/5, 0, 3/5). From Formula (13) the orthogonal projection of u = (1, 1, 1) on W is

projW u = 〈u, v1〉v1 + 〈u, v2〉v2
        = (1)(0, 1, 0) + (−1/5)(−4/5, 0, 3/5)
        = (4/25, 1, −3/25)

The component of u orthogonal to W is

projW⊥ u = u − projW u = (1, 1, 1) − (4/25, 1, −3/25) = (21/25, 0, 28/25)

Observe that projW⊥ u is orthogonal to both v1 and v2, so this vector is orthogonal to each vector in the space W spanned by v1 and v2, as it should be.
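The arithmetic in Example 7 is also easy to verify numerically. A minimal sketch, not part of the text and assuming NumPy is available, that applies Formula (13) and checks the orthogonality of the leftover component:

    import numpy as np

    v1 = np.array([0., 1., 0.])          # orthonormal spanning vectors of W
    v2 = np.array([-4/5, 0., 3/5])
    u  = np.array([1., 1., 1.])

    proj = np.dot(u, v1)*v1 + np.dot(u, v2)*v2   # Formula (13)
    perp = u - proj                              # component of u orthogonal to W

    print(proj)                                  # [ 0.16  1.   -0.12] = (4/25, 1, -3/25)
    print(perp)                                  # [ 0.84  0.    1.12] = (21/25, 0, 28/25)
    print(np.dot(perp, v1), np.dot(perp, v2))    # both 0 (up to roundoff)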

A Geometric Interpretation of Orthogonal Projections

If W is a one-dimensional subspace of an inner product space V, say span{a}, then Formula (12) has only the one term

projW u = (〈u, a〉/‖a‖²) a

In the special case where V is R3 with the Euclidean inner product, this is exactly Formula (10) of Section 3.3 for the orthogonal projection of u along a. This suggests that


we can think of (12) as the sum of orthogonal projections on “axes” determined by the basis vectors for the subspace W (Figure 6.3.2).

[Figure 6.3.2: projW u expressed as the sum of the projections projv1 u and projv2 u along the basis vectors v1 and v2 of W]

The Gram–Schmidt Process

We have seen that orthonormal bases exhibit a variety of useful properties. Our next theorem, which is the main result in this section, shows that every nonzero finite-dimensional inner product space has an orthonormal basis. The proof of this result is extremely important since it provides an algorithm, or method, for converting an arbitrary basis into an orthonormal basis.

THEOREM 6.3.5 Every nonzero finite-dimensional inner product space has an orthonormal basis.

Proof Let W be any nonzero finite-dimensional subspace of an inner product space, and suppose that {u1, u2, . . . , ur} is any basis for W. It suffices to show that W has an orthogonal basis since the vectors in that basis can be normalized to obtain an orthonormal basis. The following sequence of steps will produce an orthogonal basis {v1, v2, . . . , vr} for W:

Step 1. Let v1 = u1.

Step 2. As illustrated in Figure 6.3.3, we can obtain a vector v2 that is orthogonal to v1 by computing the component of u2 that is orthogonal to the space W1 spanned by v1. Using Formula (12) to perform this computation, we obtain

v2 = u2 − projW1 u2 = u2 − (〈u2, v1〉/‖v1‖²) v1

[Figure 6.3.3: v2 = u2 − projW1 u2]

Of course, if v2 = 0, then v2 is not a basis vector. But this cannot happen, since it would then follow from the preceding formula for v2 that

u2 = (〈u2, v1〉/‖v1‖²) v1 = (〈u2, v1〉/‖u1‖²) u1

which implies that u2 is a multiple of u1, contradicting the linear independence of the basis {u1, u2, . . . , ur}.

Step 3. To construct a vector v3 that is orthogonal to both v1 and v2, we compute the component of u3 orthogonal to the space W2 spanned by v1 and v2 (Figure 6.3.4).

[Figure 6.3.4: v3 = u3 − projW2 u3]

Using Formula (12) to perform this computation, we obtain

v3 = u3 − projW2 u3 = u3 − (〈u3, v1〉/‖v1‖²) v1 − (〈u3, v2〉/‖v2‖²) v2

As in Step 2, the linear independence of {u1, u2, . . . , ur} ensures that v3 ≠ 0. We leave the details for you.


Step 4. To determine a vector v4 that is orthogonal to v1, v2, and v3, we compute the component of u4 orthogonal to the space W3 spanned by v1, v2, and v3. From (12),

v4 = u4 − projW3 u4 = u4 − (〈u4, v1〉/‖v1‖²) v1 − (〈u4, v2〉/‖v2‖²) v2 − (〈u4, v3〉/‖v3‖²) v3

Continuing in this way we will produce after r steps an orthogonal set of nonzero vectors {v1, v2, . . . , vr}. Since such sets are linearly independent, we will have produced an orthogonal basis for the r-dimensional space W. By normalizing these basis vectors we can obtain an orthonormal basis.

The step-by-step construction of an orthogonal (or orthonormal) basis given in the foregoing proof is called the Gram–Schmidt process. For reference, we provide the following summary of the steps.

The Gram–Schmidt Process

To convert a basis {u1, u2, . . . , ur} into an orthogonal basis {v1, v2, . . . , vr}, perform the following computations:

Step 1. v1 = u1

Step 2. v2 = u2 − (〈u2, v1〉/‖v1‖²) v1

Step 3. v3 = u3 − (〈u3, v1〉/‖v1‖²) v1 − (〈u3, v2〉/‖v2‖²) v2

Step 4. v4 = u4 − (〈u4, v1〉/‖v1‖²) v1 − (〈u4, v2〉/‖v2‖²) v2 − (〈u4, v3〉/‖v3‖²) v3

...

(continue for r steps)

Optional Step. To convert the orthogonal basis into an orthonormal basis {q1, q2, . . . , qr}, normalize the orthogonal basis vectors.
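For readers who want to experiment, here is a minimal computational sketch of the process for the Euclidean inner product on Rn, assuming NumPy is available; the function name gram_schmidt is ours, and for a different inner product the dot products would be replaced by the appropriate 〈 , 〉.

    import numpy as np

    def gram_schmidt(basis, normalize=True):
        """Apply the Gram-Schmidt process to a list of linearly independent
        vectors (Euclidean inner product). Returns an orthogonal list, or an
        orthonormal one if normalize=True (the Optional Step above)."""
        ortho = []
        for u in basis:
            u = np.asarray(u, dtype=float)
            # v_k = u_k - sum_j (<u_k, v_j> / ||v_j||^2) v_j
            v = u - sum((np.dot(u, w) / np.dot(w, w)) * w for w in ortho)
            ortho.append(v)
        if normalize:
            ortho = [v / np.linalg.norm(v) for v in ortho]
        return ortho

    # The basis of Example 8 below: u1 = (1,1,1), u2 = (0,1,1), u3 = (0,0,1)
    for q in gram_schmidt([(1, 1, 1), (0, 1, 1), (0, 0, 1)]):
        print(q)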

Jorgen Pederson Gram (1850–1916)

Historical Note Erhardt Schmidt (1875–1959) was a German mathematician who studied for his doctoral degree at Göttingen University under David Hilbert, one of the giants of modern mathematics. For most of his life he taught at Berlin University where, in addition to making important contributions to many branches of mathematics, he fashioned some of Hilbert's ideas into a general concept, called a Hilbert space—a fundamental structure in the study of infinite-dimensional vector spaces. He first described the process that bears his name in a paper on integral equations that he published in 1907.

Historical Note Gram was a Danish actuary whose early education was at village schools supplemented by private tutoring. He obtained a doctorate degree in mathematics while working for the Hafnia Life Insurance Company, where he specialized in the mathematics of accident insurance. It was in his dissertation that his contributions to the Gram–Schmidt process were formulated. He eventually became interested in abstract mathematics and received a gold medal from the Royal Danish Society of Sciences and Letters in recognition of his work. His lifelong interest in applied mathematics never wavered, however, and he produced a variety of treatises on Danish forest management.



EXAMPLE 8 Using the Gram–Schmidt Process

Assume that the vector space R3 has the Euclidean inner product. Apply the Gram–Schmidt process to transform the basis vectors

u1 = (1, 1, 1), u2 = (0, 1, 1), u3 = (0, 0, 1)

into an orthogonal basis {v1, v2, v3}, and then normalize the orthogonal basis vectors to obtain an orthonormal basis {q1, q2, q3}.

Solution

Step 1. v1 = u1 = (1, 1, 1)

Step 2. v2 = u2 − projW1 u2 = u2 − (〈u2, v1〉/‖v1‖²) v1
       = (0, 1, 1) − (2/3)(1, 1, 1) = (−2/3, 1/3, 1/3)

Step 3. v3 = u3 − projW2 u3 = u3 − (〈u3, v1〉/‖v1‖²) v1 − (〈u3, v2〉/‖v2‖²) v2
       = (0, 0, 1) − (1/3)(1, 1, 1) − ((1/3)/(2/3))(−2/3, 1/3, 1/3)
       = (0, −1/2, 1/2)

Thus,

v1 = (1, 1, 1), v2 = (−2/3, 1/3, 1/3), v3 = (0, −1/2, 1/2)

form an orthogonal basis for R3. The norms of these vectors are

‖v1‖ = √3, ‖v2‖ = √6/3, ‖v3‖ = 1/√2

so an orthonormal basis for R3 is

q1 = v1/‖v1‖ = (1/√3, 1/√3, 1/√3), q2 = v2/‖v2‖ = (−2/√6, 1/√6, 1/√6), q3 = v3/‖v3‖ = (0, −1/√2, 1/√2)

Remark In the last example we normalized at the end to convert the orthogonal basis into an orthonormal basis. Alternatively, we could have normalized each orthogonal basis vector as soon as it was obtained, thereby producing an orthonormal basis step by step. However, that procedure generally has the disadvantage in hand calculation of producing more square roots to manipulate. A more useful variation is to “scale” the orthogonal basis vectors at each step to eliminate some of the fractions. For example, after Step 2 above, we could have multiplied by 3 to produce (−2, 1, 1) as the second orthogonal basis vector, thereby simplifying the calculations in Step 3.

EXAMPLE 9 Legendre Polynomials

Let the vector space P2 have the inner product

CALCULUS REQUIRED

〈p, q〉 = ∫_{−1}^{1} p(x)q(x) dx

Apply the Gram–Schmidt process to transform the standard basis {1, x, x2} for P2 into an orthogonal basis {φ1(x), φ2(x), φ3(x)}.


Solution Take u1 = 1, u2 = x, and u3 = x2.

Step 1. v1 = u1 = 1

Step 2. We have

〈u2, v1〉 = ∫_{−1}^{1} x dx = 0

so

v2 = u2 − (〈u2, v1〉/‖v1‖²) v1 = u2 = x

Step 3. We have

〈u3, v1〉 = ∫_{−1}^{1} x² dx = x³/3 ]_{−1}^{1} = 2/3

〈u3, v2〉 = ∫_{−1}^{1} x³ dx = x⁴/4 ]_{−1}^{1} = 0

‖v1‖² = 〈v1, v1〉 = ∫_{−1}^{1} 1 dx = x ]_{−1}^{1} = 2

so

v3 = u3 − (〈u3, v1〉/‖v1‖²) v1 − (〈u3, v2〉/‖v2‖²) v2 = x² − 1/3

Thus, we have obtained the orthogonal basis {φ1(x), φ2(x), φ3(x)} in which

φ1(x) = 1, φ2(x) = x, φ3(x) = x² − 1/3

Remark The orthogonal basis vectors in the last example are often scaled so all three functions have a value of 1 at x = 1. The resulting polynomials

1, x, (1/2)(3x² − 1)

which are known as the first three Legendre polynomials, play an important role in a variety of applications. The scaling does not affect the orthogonality.
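Readers with a computer algebra system can reproduce Example 9 symbolically. The sketch below is ours, not the text's; it assumes SymPy is available and uses the integral inner product on [−1, 1], with the helper name inner chosen for illustration.

    import sympy as sp

    x = sp.symbols('x')

    def inner(p, q):
        # <p, q> = integral of p(x) q(x) dx over [-1, 1]
        return sp.integrate(p * q, (x, -1, 1))

    ortho = []
    for u in (sp.Integer(1), x, x**2):
        v = u - sum(inner(u, w) / inner(w, w) * w for w in ortho)
        ortho.append(sp.expand(v))

    print(ortho)     # [1, x, x**2 - 1/3], i.e. phi1, phi2, phi3 of Example 9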

Extending Orthonormal Sets to Orthonormal Bases

Recall from part (b) of Theorem 4.5.5 that a linearly independent set in a finite-dimensional vector space can be enlarged to a basis by adding appropriate vectors. The following theorem is an analog of that result for orthogonal and orthonormal sets in finite-dimensional inner product spaces.

THEOREM 6.3.6 If W is a finite-dimensional inner product space, then:

(a) Every orthogonal set of nonzero vectors in W can be enlarged to an orthogonal basis for W.

(b) Every orthonormal set in W can be enlarged to an orthonormal basis for W .

We will prove part (b) and leave part (a) as an exercise.

Proof (b) Suppose that S = {v1, v2, . . . , vs} is an orthonormal set of vectors in W. Part (b) of Theorem 4.5.5 tells us that we can enlarge S to some basis

S′ = {v1, v2, . . . , vs, vs+1, . . . , vk}

for W. If we now apply the Gram–Schmidt process to the set S′, then the vectors v1, v2, . . . , vs will not be affected since they are already orthonormal, and the resulting set

S′′ = {v1, v2, . . . , vs, v′s+1, . . . , v′k}

will be an orthonormal basis for W.


QR-Decomposition (Optional)

In recent years a numerical algorithm based on the Gram–Schmidt process, and known as QR-decomposition, has assumed growing importance as the mathematical foundation for a wide variety of numerical algorithms, including those for computing eigenvalues of large matrices. The technical aspects of such algorithms are discussed in textbooks that specialize in the numerical aspects of linear algebra. However, we will discuss some of the underlying ideas here. We begin by posing the following problem.

Problem If A is an m × n matrix with linearly independent column vectors, and if Q is the matrix that results by applying the Gram–Schmidt process to the column vectors of A, what relationship, if any, exists between A and Q?

To solve this problem, suppose that the column vectors of A are u1, u2, . . . , un and that Q has orthonormal column vectors q1, q2, . . . , qn. Thus, A and Q can be written in partitioned form as

A = [u1 | u2 | · · · | un] and Q = [q1 | q2 | · · · | qn]

It follows from Theorem 6.3.2(b) that u1, u2, . . . , un are expressible in terms of the vectors q1, q2, . . . , qn as

u1 = 〈u1, q1〉q1 + 〈u1, q2〉q2 + · · ·+ 〈u1, qn〉qn

u2 = 〈u2, q1〉q1 + 〈u2, q2〉q2 + · · ·+ 〈u2, qn〉qn

...

un = 〈un, q1〉q1 + 〈un, q2〉q2 + · · ·+ 〈un, qn〉qn

Recalling from Section 1.3 (Example 9) that the jth column vector of a matrix product is a linear combination of the column vectors of the first factor with coefficients coming from the jth column of the second factor, it follows that these relationships can be expressed in matrix form as

[u1 | u2 | · · · | un] = [q1 | q2 | · · · | qn] ⎡〈u1, q1〉  〈u2, q1〉  · · ·  〈un, q1〉⎤
                                               ⎢〈u1, q2〉  〈u2, q2〉  · · ·  〈un, q2〉⎥
                                               ⎢   ...        ...             ...  ⎥
                                               ⎣〈u1, qn〉  〈u2, qn〉  · · ·  〈un, qn〉⎦

or more briefly as

A = QR (15)

where R is the second factor in the product. However, it is a property of the Gram–Schmidt process that for j ≥ 2, the vector qj is orthogonal to u1, u2, . . . , uj−1. Thus, all entries below the main diagonal of R are zero, and R has the form

R = ⎡〈u1, q1〉  〈u2, q1〉  · · ·  〈un, q1〉⎤
    ⎢    0     〈u2, q2〉  · · ·  〈un, q2〉⎥
    ⎢   ...       ...              ...  ⎥
    ⎣    0         0     · · ·  〈un, qn〉⎦    (16)

We leave it for you to show that R is invertible by showing that its diagonal entries are nonzero. Thus, Equation (15) is a factorization of A into the product of a matrix Q


with orthonormal column vectors and an invertible upper triangular matrix R. We call Equation (15) a QR-decomposition of A. In summary, we have the following theorem.

THEOREM 6.3.7 QR-Decomposition

If A is an m × n matrix with linearly independent column vectors, then A can be factored as

A = QR

where Q is an m × n matrix with orthonormal column vectors, and R is an n × n

invertible upper triangular matrix.

It is common in numerical linear algebra to say that a matrix with linearly independent columns has full column rank.

Recall from Theorem 5.1.5 (the Equivalence Theorem) that a square matrix has linearly independent column vectors if and only if it is invertible. Thus, it follows from Theorem 6.3.7 that every invertible matrix has a QR-decomposition.

EXAMPLE 10 QR-Decomposition of a 3 × 3 Matrix

Find a QR-decomposition of

A = ⎡1  0  0⎤
    ⎢1  1  0⎥
    ⎣1  1  1⎦

Solution The column vectors of A are

u1 = ⎡1⎤     u2 = ⎡0⎤     u3 = ⎡0⎤
     ⎢1⎥          ⎢1⎥          ⎢0⎥
     ⎣1⎦          ⎣1⎦          ⎣1⎦

Applying the Gram–Schmidt process with normalization to these column vectors yields the orthonormal vectors (see Example 8)

q1 = ⎡1/√3⎤     q2 = ⎡−2/√6⎤     q3 = ⎡   0  ⎤
     ⎢1/√3⎥          ⎢ 1/√6⎥          ⎢−1/√2⎥
     ⎣1/√3⎦          ⎣ 1/√6⎦          ⎣ 1/√2⎦

Thus, it follows from Formula (16) that R is

R = ⎡〈u1, q1〉  〈u2, q1〉  〈u3, q1〉⎤   ⎡3/√3  2/√3  1/√3⎤
    ⎢    0     〈u2, q2〉  〈u3, q2〉⎥ = ⎢  0   2/√6  1/√6⎥
    ⎣    0         0     〈u3, q3〉⎦   ⎣  0     0   1/√2⎦

from which it follows that a QR-decomposition of A is

⎡1  0  0⎤   ⎡1/√3  −2/√6     0  ⎤ ⎡3/√3  2/√3  1/√3⎤
⎢1  1  0⎥ = ⎢1/√3   1/√6  −1/√2⎥ ⎢  0   2/√6  1/√6⎥
⎣1  1  1⎦   ⎣1/√3   1/√6   1/√2⎦ ⎣  0     0   1/√2⎦
    A     =          Q                     R
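Numerical software packages compute QR-decompositions directly. The following sketch assumes NumPy is available; note that np.linalg.qr may return Q and R with the signs of some columns and rows reversed relative to the hand computation above, which is still a valid QR-decomposition in the sense of Theorem 6.3.7.

    import numpy as np

    A = np.array([[1., 0., 0.],
                  [1., 1., 0.],
                  [1., 1., 1.]])

    Q, R = np.linalg.qr(A)                    # library QR-decomposition of A
    print(Q)
    print(R)
    print(np.allclose(Q @ R, A))              # True: the product QR reproduces A
    print(np.allclose(Q.T @ Q, np.eye(3)))    # True: the columns of Q are orthonormal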


Exercise Set 6.3

1. In each part, determine whether the set of vectors is orthogonal and whether it is orthonormal with respect to the Euclidean inner product on R2.

(a) (0, 1), (2, 0)
(b) (−1/√2, 1/√2), (1/√2, 1/√2)
(c) (−1/√2, −1/√2), (1/√2, 1/√2)
(d) (0, 0), (0, 1)

2. In each part, determine whether the set of vectors is orthogonal and whether it is orthonormal with respect to the Euclidean inner product on R3.

(a) (1/√2, 0, 1/√2), (1/√3, 1/√3, −1/√3), (−1/√2, 0, 1/√2)
(b) (2/3, −2/3, 1/3), (2/3, 1/3, −2/3), (1/3, 2/3, 2/3)
(c) (1, 0, 0), (0, 1/√2, 1/√2), (0, 0, 1)
(d) (1/√6, 1/√6, −2/√6), (1/√2, −1/√2, 0)

3. In each part, determine whether the set of vectors is orthogonal with respect to the standard inner product on P2 (see Example 7 of Section 6.1).

(a) p1(x) = 2/3 − (2/3)x + (1/3)x², p2(x) = 2/3 + (1/3)x − (2/3)x², p3(x) = 1/3 + (2/3)x + (2/3)x²
(b) p1(x) = 1, p2(x) = (1/√2)x + (1/√2)x², p3(x) = x²

4. In each part, determine whether the set of vectors is orthogonal with respect to the standard inner product on M22 (see Example 6 of Section 6.1).

(a) ⎡1  0⎤   ⎡ 0    2/3⎤   ⎡  0    2/3⎤   ⎡ 0   1/3⎤
    ⎣0  0⎦,  ⎣1/3  −2/3⎦,  ⎣−2/3   1/3⎦,  ⎣2/3  2/3⎦

(b) ⎡1  0⎤   ⎡0  1⎤   ⎡0  0⎤   ⎡0   0⎤
    ⎣0  0⎦,  ⎣0  0⎦,  ⎣1  1⎦,  ⎣1  −1⎦

In Exercises 5–6, show that the column vectors of A form an orthogonal basis for the column space of A with respect to the Euclidean inner product, and then find an orthonormal basis for that column space.

5. A = ⎡ 1  2  0⎤        6. A = ⎡1/5  −1/2   1/3⎤
       ⎢ 0  0  5⎥               ⎢1/5   1/2   1/3⎥
       ⎣−1  2  0⎦               ⎣1/5    0   −2/3⎦

7. Verify that the vectors

v1 = (−3/5, 4/5, 0), v2 = (4/5, 3/5, 0), v3 = (0, 0, 1)

form an orthonormal basis for R3 with respect to the Euclidean inner product, and then use Theorem 6.3.2(b) to express the vector u = (1, −2, 2) as a linear combination of v1, v2, and v3.

8. Use Theorem 6.3.2(b) to express the vector u = (3, −7, 4) as a linear combination of the vectors v1, v2, and v3 in Exercise 7.

9. Verify that the vectors

v1 = (2,−2, 1), v2 = (2, 1,−2), v3 = (1, 2, 2)

form an orthogonal basis for R3 with respect to the Euclidean inner product, and then use Theorem 6.3.2(a) to express the vector u = (−1, 0, 2) as a linear combination of v1, v2, and v3.

10. Verify that the vectors

v1 = (1,−1, 2,−1), v2 = (−2, 2, 3, 2),

v3 = (1, 2, 0,−1), v4 = (1, 0, 0, 1)

form an orthogonal basis for R4 with respect to the Euclidean inner product, and then use Theorem 6.3.2(a) to express the vector u = (1, 1, 1, 1) as a linear combination of v1, v2, v3,

and v4.

In Exercises 11–14, find the coordinate vector (u)S for the vector u and the basis S that were given in the stated exercise.

11. Exercise 7 12. Exercise 8

13. Exercise 9 14. Exercise 10

In Exercises 15–18, let R2 have the Euclidean inner product.

(a) Find the orthogonal projection of u onto the line spanned by the vector v.

(b) Find the component of u orthogonal to the line spanned by the vector v, and confirm that this component is orthogonal to the line.

15. u = (−1, 6); v = (3/5, 4/5)

16. u = (2, 3); v = (5/13, 12/13)

17. u = (2, 3); v = (1, 1)    18. u = (3, −1); v = (3, 4)

In Exercises 19–22, let R3 have the Euclidean inner product.

(a) Find the orthogonal projection of u onto the plane spanned by the vectors v1 and v2.

(b) Find the component of u orthogonal to the plane spanned by the vectors v1 and v2, and confirm that this component is orthogonal to the plane.

19. u = (4, 2, 1); v1 = (1/3, 2/3, −2/3), v2 = (2/3, 1/3, 2/3)

20. u = (3, −1, 2); v1 = (1/√6, 1/√6, −2/√6), v2 = (1/√3, 1/√3, 1/√3)

21. u = (1, 0, 3); v1 = (1, −2, 1), v2 = (2, 1, 0)

22. u = (1, 0, 2); v1 = (3, 1, 2), v2 = (−1, 1, 1)

In Exercises 23–24, the vectors v1 and v2 are orthogonal with respect to the Euclidean inner product on R4. Find the orthogonal projection of b = (1, 2, 0, −2) on the subspace W spanned by these vectors.

23. v1 = (1, 1, 1, 1), v2 = (1, 1,−1,−1)

24. v1 = (0, 1,−4,−1), v2 = (3, 5, 1, 1)


In Exercises 25–26, the vectors v1, v2, and v3 are orthonormal with respect to the Euclidean inner product on R4. Find the orthogonal projection of b = (1, 2, 0, −1) onto the subspace W spanned by these vectors.

25. v1 = (0, 1/√18, −4/√18, −1/√18), v2 = (1/2, 5/6, 1/6, 1/6),
    v3 = (1/√18, 0, 1/√18, −4/√18)

26. v1 = (1/2, 1/2, 1/2, 1/2), v2 = (1/2, 1/2, −1/2, −1/2),
    v3 = (1/2, −1/2, 1/2, −1/2)

In Exercises 27–28, let R2 have the Euclidean inner product and use the Gram–Schmidt process to transform the basis {u1, u2} into an orthonormal basis. Draw both sets of basis vectors in the xy-plane.

27. u1 = (1,−3), u2 = (2, 2) 28. u1 = (1, 0), u2 = (3,−5)

In Exercises 29–30, let R3 have the Euclidean inner product and use the Gram–Schmidt process to transform the basis {u1, u2, u3} into an orthonormal basis.

29. u1 = (1, 1, 1), u2 = (−1, 1, 0), u3 = (1, 2, 1)

30. u1 = (1, 0, 0), u2 = (3, 7,−2), u3 = (0, 4, 1)

31. Let R4 have the Euclidean inner product. Use the Gram–Schmidt process to transform the basis {u1, u2, u3, u4} into an orthonormal basis.

u1 = (0, 2, 1, 0), u2 = (1,−1, 0, 0),

u3 = (1, 2, 0,−1), u4 = (1, 0, 0, 1)

32. Let R3 have the Euclidean inner product. Find an orthonormal basis for the subspace spanned by (0, 1, 2), (−1, 0, 1), (−1, 1, 3).

33. Let b and W be as in Exercise 23. Find vectors w1 in W and w2 in W⊥ such that b = w1 + w2.

34. Let b and W be as in Exercise 25. Find vectors w1 in W and w2 in W⊥ such that b = w1 + w2.

35. Let R3 have the Euclidean inner product. The subspace of R3 spanned by the vectors u1 = (1, 1, 1) and u2 = (2, 0, −1) is a plane passing through the origin. Express w = (1, 2, 3) in the form w = w1 + w2, where w1 lies in the plane and w2 is perpendicular to the plane.

36. Let R4 have the Euclidean inner product. Express the vector w = (−1, 2, 6, 0) in the form w = w1 + w2, where w1 is in the space W spanned by u1 = (−1, 0, 1, 2) and u2 = (0, 1, 0, 1), and w2 is orthogonal to W.

37. Let R3 have the inner product

〈u, v〉 = u1v1 + 2u2v2 + 3u3v3

Use the Gram–Schmidt process to transform u1 = (1, 1, 1), u2 = (1, 1, 0), u3 = (1, 0, 0) into an orthonormal basis.

38. Verify that the set of vectors {(1, 0), (0, 1)} is orthogonal with respect to the inner product 〈u, v〉 = 4u1v1 + u2v2 on R2; then convert it to an orthonormal set by normalizing the vectors.

39. Find vectors x and y in R2 that are orthonormal with respect to the inner product 〈u, v〉 = 3u1v1 + 2u2v2 but are not orthonormal with respect to the Euclidean inner product.

40. In Example 3 of Section 4.9 we found the orthogonal projection of the vector x = (1, 5) onto the line through the origin making an angle of π/6 radians with the positive x-axis. Solve that same problem using Theorem 6.3.4.

41. This exercise illustrates that the orthogonal projection resulting from Formula (12) in Theorem 6.3.4 does not depend on which orthogonal basis vectors are used.

(a) Let R3 have the Euclidean inner product, and let W be the subspace of R3 spanned by the orthogonal vectors

v1 = (1, 0, 1) and v2 = (0, 1, 0)

Show that the orthogonal vectors

v′1 = (1, 1, 1) and v′2 = (1,−2, 1)

span the same subspace W .

(b) Let u = (−3, 1, 7) and show that the same vector projW u results regardless of which of the bases in part (a) is used for its computation.

42. (Calculus required) Use Theorem 6.3.2(a) to express the following polynomials as linear combinations of the first three Legendre polynomials (see the Remark following Example 9).

(a) 1 + x + 4x2 (b) 2 − 7x2 (c) 4 + 3x

43. (Calculus required ) Let P2 have the inner product

〈p, q〉 = ∫_{0}^{1} p(x)q(x) dx

Apply the Gram–Schmidt process to transform the standard basis S = {1, x, x2} into an orthonormal basis.

44. Find an orthogonal basis for the column space of the matrix

A = ⎡ 6   1  −5⎤
    ⎢ 2   1   1⎥
    ⎢−2  −2   5⎥
    ⎣ 6   8  −7⎦

In Exercises 45–48, we obtained the column vectors of Q by applying the Gram–Schmidt process to the column vectors of A. Find a QR-decomposition of the matrix A.

45. A = ⎡1  −1⎤,  Q = ⎡1/√5  −2/√5⎤
        ⎣2   3⎦       ⎣2/√5   1/√5⎦

46. A = ⎡1  2⎤       ⎡1/√2  −1/√3⎤
        ⎢0  1⎥,  Q = ⎢  0    1/√3⎥
        ⎣1  4⎦       ⎣1/√2   1/√3⎦


47. A = ⎡1  0  2⎤       ⎡1/√2  −1/√3   1/√6⎤
        ⎢0  1  1⎥,  Q = ⎢  0    1/√3   2/√6⎥
        ⎣1  2  0⎦       ⎣1/√2   1/√3  −1/√6⎦

48. A = ⎡1  2  1⎤       ⎡1/√2   √2/(2√19)  −3/√19⎤
        ⎢1  1  1⎥,  Q = ⎢1/√2  −√2/(2√19)   3/√19⎥
        ⎣0  3  1⎦       ⎣  0    3√2/√19     1/√19⎦

49. Find a QR-decomposition of the matrix

A = ⎡ 1  0  1⎤
    ⎢−1  1  1⎥
    ⎢ 1  0  1⎥
    ⎣−1  1  1⎦

50. In the Remark following Example 8 we discussed two alternative ways to perform the calculations in the Gram–Schmidt process: normalizing each orthogonal basis vector as soon as it is calculated and scaling the orthogonal basis vectors at each step to eliminate fractions. Try these methods in Example 8.

Working with Proofs

51. Prove part (a) of Theorem 6.3.6.

52. In Step 3 of the proof of Theorem 6.3.5, it was stated that “the linear independence of {u1, u2, . . . , ur} ensures that v3 ≠ 0.” Prove this statement.

53. Prove that the diagonal entries of R in Formula (16) are nonzero.

54. Show that matrix Q in Example 10 has the property QQT = I3, and prove that every m × n matrix Q with orthonormal column vectors has the property QTQ = In.

55. (a) Prove that if W is a subspace of a finite-dimensional inner product space V, then the mapping T : V → W defined by T(v) = projW v is a linear transformation.

(b) What are the range and kernel of the transformation in part (a)?

True-False Exercises

TF. In parts (a)–(f) determine whether the statement is true or false, and justify your answer.

(a) Every linearly independent set of vectors in an inner product space is orthogonal.

(b) Every orthogonal set of vectors in an inner product space is linearly independent.

(c) Every nontrivial subspace of R3 has an orthonormal basis with respect to the Euclidean inner product.

(d) Every nonzero finite-dimensional inner product space has an orthonormal basis.

(e) projW x is orthogonal to every vector of W .

(f ) If A is an n × n matrix with a nonzero determinant, then A

has a QR-decomposition.

Working with Technology

T1. (a) Use the Gram–Schmidt process to find an orthonormal basis relative to the Euclidean inner product for the column space of

A = ⎡1   1  1  1⎤
    ⎢1   0  0  1⎥
    ⎢0   1  0  2⎥
    ⎣2  −1  1  1⎦

(b) Use the method of Example 10 to find a QR-decomposition of A.

T2. Let P4 have the evaluation inner product at the points −2, −1, 0, 1, 2. Find an orthogonal basis for P4 relative to this inner product by applying the Gram–Schmidt process to the vectors

p0 = 1, p1 = x, p2 = x2, p3 = x3, p4 = x4

6.4 Best Approximation; Least Squares

There are many applications in which some linear system Ax = b of m equations in n unknowns should be consistent on physical grounds but fails to be so because of measurement errors in the entries of A or b. In such cases one looks for vectors that come as close as possible to being solutions in the sense that they minimize ‖b − Ax‖ with respect to the Euclidean inner product on Rm. In this section we will discuss methods for finding such minimizing vectors.

Least Squares Solutions of Linear Systems

Suppose that Ax = b is an inconsistent linear system of m equations in n unknowns in which we suspect the inconsistency to be caused by errors in the entries of A or b. Since no exact solution is possible, we will look for a vector x that comes as “close as possible” to being a solution in the sense that it minimizes ‖b − Ax‖ with respect to the Euclidean


inner product on Rm. You can think of Ax as an approximation to b and ‖b − Ax‖ as the error in that approximation—the smaller the error, the better the approximation. This leads to the following problem.

Least Squares Problem Given a linear system Ax = b of m equations in n unknowns, find a vector x in Rn that minimizes ‖b − Ax‖ with respect to the Euclidean inner product on Rm. We call such a vector, if it exists, a least squares solution of Ax = b, we call b − Ax the least squares error vector, and we call ‖b − Ax‖ the least squares error.

To explain the terminology in this problem, suppose that the column form of b − Ax is

b − Ax = ⎡e1⎤
         ⎢e2⎥
         ⎢...⎥
         ⎣em⎦

If a linear system is consistent, then its exact solutions are the same as its least squares solutions, in which case the least squares error is zero.

The term “least squares solution” results from the fact that minimizing ‖b − Ax‖ also has the effect of minimizing ‖b − Ax‖² = e1² + e2² + · · · + em².

What is important to keep in mind about the least squares problem is that for every vector x in Rn, the product Ax is in the column space of A because it is a linear combination of the column vectors of A. That being the case, to find a least squares solution of Ax = b is equivalent to finding a vector Ax̂ in the column space of A that is closest to b in the sense that it minimizes the length of the vector b − Ax. This is illustrated in Figure 6.4.1a, which also suggests that Ax̂ is the orthogonal projection of b on the column space of A, that is, Ax̂ = projcol(A)b (Figure 6.4.1b). The next theorem will confirm this conjecture.

[Figure 6.4.1: (a) the error vector b − Ax for a general Ax in col(A); (b) the closest point Ax̂ = projcol(A) b]

THEOREM 6.4.1 Best Approximation Theorem

If W is a finite-dimensional subspace of an inner product space V, and if b is a vector in V, then projW b is the best approximation to b from W in the sense that

‖b − projW b‖ < ‖b − w‖

for every vector w in W that is different from projW b.

Proof For every vector w in W , we can write

b − w = (b − projW b) + (projW b − w) (1)


But projW b − w, being a difference of vectors in W, is itself in W; and since b − projW b is orthogonal to W, the two terms on the right side of (1) are orthogonal. Thus, it follows from the Theorem of Pythagoras (Theorem 6.2.3) that

‖b − w‖2 = ‖b − projW b‖2 + ‖projW b − w‖2

If w ≠ projW b, it follows that the second term in this sum is positive, and hence that

‖b − projW b‖2 < ‖b − w‖2

Since norms are nonnegative, it follows (from a property of inequalities) that

‖b − projW b‖ < ‖b − w‖

It follows from Theorem 6.4.1 that if V = Rn and W = col(A), then the best approximation to b from col(A) is projcol(A)b. But every vector in the column space of A is expressible in the form Ax for some vector x, so there is at least one vector x̂ in Rn for which Ax̂ = projcol(A)b. Each such vector is a least squares solution of Ax = b. Note, however, that although there may be more than one least squares solution of Ax = b, each such solution x̂ has the same error vector b − Ax̂.

Finding Least Squares Solutions

One way to find a least squares solution of Ax = b is to calculate the orthogonal projection projW b on the column space W of A and then solve the equation

Ax = projW b (2)

However, we can avoid calculating the projection by rewriting (2) as

b − Ax = b − projW b

and then multiplying both sides of this equation by AT to obtain

AT (b − Ax) = AT (b − projW b) (3)

Since b − projW b is the component of b that is orthogonal to the column space of A, it follows from Theorem 4.8.7(b) that this vector lies in the null space of AT, and hence that

AT (b − projW b) = 0

Thus, (3) simplifies to

AT (b − Ax) = 0

which we can rewrite as

ATAx = AT b (4)

This is called the normal equation or the normal system associated with Ax = b. When viewed as a linear system, the individual equations are called the normal equations associated with Ax = b.

In summary, we have established the following result.


THEOREM 6.4.2 For every linear system Ax = b, the associated normal system

ATAx = AT b (5)

is consistent, and all solutions of (5) are least squares solutions of Ax = b. Moreover, if W is the column space of A, and x is any least squares solution of Ax = b, then the orthogonal projection of b on W is

projW b = Ax (6)

EXAMPLE 1 Unique Least Squares Solution

Find the least squares solution, the least squares error vector, and the least squares error of the linear system

x1 − x2 = 4

3x1 + 2x2 = 1

−2x1 + 4x2 = 3

Solution It will be convenient to express the system in the matrix form Ax = b, where

A = ⎡ 1  −1⎤           ⎡4⎤
    ⎢ 3   2⎥  and  b = ⎢1⎥     (7)
    ⎣−2   4⎦           ⎣3⎦

It follows that

ATA = ⎡ 1  3  −2⎤ ⎡ 1  −1⎤   ⎡14  −3⎤
      ⎣−1  2   4⎦ ⎢ 3   2⎥ = ⎣−3  21⎦     (8)
                  ⎣−2   4⎦

AT b = ⎡ 1  3  −2⎤ ⎡4⎤   ⎡ 1⎤
       ⎣−1  2   4⎦ ⎢1⎥ = ⎣10⎦
                   ⎣3⎦

so the normal system ATAx = AT b is

⎡14  −3⎤ ⎡x1⎤   ⎡ 1⎤
⎣−3  21⎦ ⎣x2⎦ = ⎣10⎦

Solving this system yields a unique least squares solution, namely,

x1 = 17/95, x2 = 143/285

The least squares error vector is

b − Ax = ⎡4⎤   ⎡ 1  −1⎤             ⎡4⎤   ⎡−92/285⎤   ⎡1232/285⎤
         ⎢1⎥ − ⎢ 3   2⎥ ⎡ 17/95 ⎤ = ⎢1⎥ − ⎢439/285⎥ = ⎢−154/285⎥
         ⎣3⎦   ⎣−2   4⎦ ⎣143/285⎦   ⎣3⎦   ⎣ 94/57 ⎦   ⎣  77/57 ⎦

and the least squares error is

‖b − Ax‖ ≈ 4.561
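Example 1 is easy to confirm with a computing utility. A minimal sketch, assuming NumPy is available, that forms and solves the normal system (4):

    import numpy as np

    A = np.array([[ 1., -1.],
                  [ 3.,  2.],
                  [-2.,  4.]])
    b = np.array([4., 1., 3.])

    x = np.linalg.solve(A.T @ A, A.T @ b)   # normal system A^T A x = A^T b
    print(x)                                # [0.17894737 0.50175439] = (17/95, 143/285)

    e = b - A @ x                           # least squares error vector
    print(e)
    print(np.linalg.norm(e))                # least squares error, about 4.561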

The computations in the next example are a little tedious for hand computation, so in the absence of a calculating utility you may want to just read through it for its ideas and logical flow.


EXAMPLE 2 Infinitely Many Least Squares Solutions

Find the least squares solutions, the least squares error vector, and the least squares error of the linear system

3x1 +  2x2 −  x3 =  2
 x1 −  4x2 + 3x3 = −2
 x1 + 10x2 − 7x3 =  1

Solution The matrix form of the system is Ax = b, where

A = ⎡3   2  −1⎤           ⎡ 2⎤
    ⎢1  −4   3⎥  and  b = ⎢−2⎥
    ⎣1  10  −7⎦           ⎣ 1⎦

It follows that

ATA = ⎡ 11   12   −7⎤              ⎡  5⎤
      ⎢ 12  120  −84⎥  and  AT b = ⎢ 22⎥
      ⎣ −7  −84   59⎦              ⎣−15⎦

so the augmented matrix for the normal system ATAx = AT b is

⎡ 11   12   −7    5⎤
⎢ 12  120  −84   22⎥
⎣ −7  −84   59  −15⎦

The reduced row echelon form of this matrix is

⎡1  0   1/7   2/7 ⎤
⎢0  1  −5/7  13/84⎥
⎣0  0    0     0  ⎦

from which it follows that there are infinitely many least squares solutions, and that they are given by the parametric equations

x1 = 2/7 − (1/7)t
x2 = 13/84 + (5/7)t
x3 = t

As a check, let us verify that all least squares solutions produce the same least squares error vector and the same least squares error. To see that this is so, we first compute

b − Ax = ⎡ 2⎤   ⎡3   2  −1⎤ ⎡ 2/7 − (1/7)t ⎤   ⎡ 2⎤   ⎡ 7/6⎤   ⎡ 5/6⎤
         ⎢−2⎥ − ⎢1  −4   3⎥ ⎢13/84 + (5/7)t⎥ = ⎢−2⎥ − ⎢−1/3⎥ = ⎢−5/3⎥
         ⎣ 1⎦   ⎣1  10  −7⎦ ⎣       t      ⎦   ⎣ 1⎦   ⎣11/6⎦   ⎣−5/6⎦

Since b − Ax does not depend on t, all least squares solutions produce the same error vector, and hence the same least squares error, namely

‖b − Ax‖ = √((5/6)² + (−5/3)² + (−5/6)²) = (5/6)√6
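A numerical check is again straightforward. The sketch below assumes NumPy; because the columns of A are linearly dependent here, np.linalg.lstsq returns just one of the infinitely many least squares solutions (the one of minimum norm), but the least squares error it produces agrees with the value above.

    import numpy as np

    A = np.array([[3.,  2., -1.],
                  [1., -4.,  3.],
                  [1., 10., -7.]])
    b = np.array([2., -2., 1.])

    x, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)
    print(rank)                          # 2, so A^T A is not invertible
    print(x)                             # one particular least squares solution
    print(np.linalg.norm(b - A @ x))     # (5/6)*sqrt(6) ≈ 2.0412, the same for all of them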

Conditions for Uniqueness of Least Squares Solutions

We know from Theorem 6.4.2 that the system ATAx = AT b of normal equations that is associated with the system Ax = b is consistent. Thus, it follows from Theorem 1.6.1 that every linear system Ax = b has either one least squares solution (as in Example 1) or infinitely many least squares solutions (as in Example 2). Since ATA is a square matrix, the former occurs if ATA is invertible and the latter if it is not. The next two theorems are concerned with this idea.


THEOREM 6.4.3 If A is an m × n matrix, then the following are equivalent.

(a) The column vectors of A are linearly independent.

(b) ATA is invertible.

Proof We will prove that (a) ⇒ (b) and leave the proof that (b) ⇒ (a) as an exercise.

(a) ⇒ (b) Assume that the column vectors of A are linearly independent. The matrix ATA has size n × n, so we can prove that this matrix is invertible by showing that the linear system ATAx = 0 has only the trivial solution. But if x is any solution of this system, then Ax is in the null space of AT and also in the column space of A. By Theorem 4.8.7(b) these spaces are orthogonal complements, so part (b) of Theorem 6.2.4 implies that Ax = 0. But A is assumed to have linearly independent column vectors, so x = 0 by Theorem 1.3.1.

The next theorem, which follows directly from Theorems 6.4.2 and 6.4.3, gives an explicit formula for the least squares solution of a linear system in which the coefficient matrix has linearly independent column vectors.

THEOREM 6.4.4 If A is an m × n matrix with linearly independent column vectors, then for every m × 1 matrix b, the linear system Ax = b has a unique least squares solution. This solution is given by

x = (ATA)−1AT b (9)

Moreover, if W is the column space of A, then the orthogonal projection of b on W is

projW b = Ax = A(ATA)−1AT b (10)

EXAMPLE 3 A Formula Solution to Example 1

Use Formula (9) and the matrices in Formulas (7) and (8) to find the least squares solution of the linear system in Example 1.

Solution We leave it for you to verify that

x = (ATA)−1AT b = ⎡14  −3⎤−1 ⎡ 1  3  −2⎤ ⎡4⎤
                  ⎣−3  21⎦   ⎣−1  2   4⎦ ⎢1⎥
                                         ⎣3⎦

  = (1/285) ⎡21   3⎤ ⎡ 1  3  −2⎤ ⎡4⎤   ⎡ 17/95 ⎤
            ⎣ 3  14⎦ ⎣−1  2   4⎦ ⎢1⎥ = ⎣143/285⎦
                                 ⎣3⎦

which agrees with the result obtained in Example 1.

It follows from Formula (10) that the standard matrix for the orthogonal projection on the column space of a matrix A is

P = A(ATA)−1AT (11)

We will use this result in the next example.

EXAMPLE 4 Orthogonal Projection on a Column Space

We showed in Formula (4) of Section 4.9 that the standard matrix for the orthogonal projection onto the line W through the origin of R2 that makes an angle θ with the positive x-axis is


Pθ = ⎡ cos²θ        sin θ cos θ ⎤
     ⎣ sin θ cos θ  sin²θ       ⎦

Derive this result using Formula (11).

Solution To apply Formula (11) we must find a matrix A for which the line W is the column space. Since the line is one-dimensional and consists of all scalar multiples of the vector w = (cos θ, sin θ) (see Figure 6.4.2), we can take A to be

A = ⎡ cos θ ⎤
    ⎣ sin θ ⎦

Since ATA is the 1 × 1 identity matrix (verify), it follows that

[Figure 6.4.2: the unit vector w = (cos θ, sin θ) spanning the line W]

A(ATA)−1AT = AAT = ⎡ cos θ ⎤ [cos θ  sin θ]
                   ⎣ sin θ ⎦

                 = ⎡ cos²θ        sin θ cos θ ⎤ = Pθ
                   ⎣ sin θ cos θ  sin²θ       ⎦
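Formula (11) is also convenient computationally. A minimal sketch, assuming NumPy is available (the function name projection_matrix is ours), that builds P = A(ATA)−1AT and checks the expected properties:

    import numpy as np

    def projection_matrix(A):
        """P = A (A^T A)^(-1) A^T, the standard matrix for orthogonal projection
        onto the column space of A (Formula (11)); A needs independent columns."""
        A = np.atleast_2d(A)
        return A @ np.linalg.inv(A.T @ A) @ A.T

    theta = np.pi / 6                                      # the line of Example 4 with theta = pi/6
    P = projection_matrix(np.array([[np.cos(theta)], [np.sin(theta)]]))
    print(P)                                               # matches P_theta for theta = pi/6
    print(np.allclose(P @ P, P), np.allclose(P, P.T))      # True True: idempotent and symmetric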

More on the Equivalence Theorem

As our final result in the main part of this section we will add one additional part to Theorem 5.1.5.

THEOREM 6.4.5 Equivalent Statements

If A is an n × n matrix, then the following statements are equivalent.

(a) A is invertible.

(b) Ax = 0 has only the trivial solution.

(c) The reduced row echelon form of A is In.

(d ) A is expressible as a product of elementary matrices.

(e) Ax = b is consistent for every n × 1 matrix b.

( f ) Ax = b has exactly one solution for every n × 1 matrix b.

(g) det(A) ≠ 0.

(h) The column vectors of A are linearly independent.

(i ) The row vectors of A are linearly independent.

( j) The column vectors of A span Rn.

(k) The row vectors of A span Rn.

(l ) The column vectors of A form a basis for Rn.

(m) The row vectors of A form a basis for Rn.

(n) A has rank n.

(o) A has nullity 0.

( p) The orthogonal complement of the null space of A is Rn.

(q) The orthogonal complement of the row space of A is {0}.

(r) The kernel of TA is {0}.

(s) The range of TA is Rn.

(t) TA is one-to-one.

(u) λ = 0 is not an eigenvalue of A.

(v) ATA is invertible.


The proof of part (v) follows from part (h) of this theorem and Theorem 6.4.3 applied to square matrices.

Another View of Least Squares (Optional)

Recall from Theorem 4.8.7 that the null space and row space of an m × n matrix A are orthogonal complements, as are the null space of AT and the column space of A. Thus, given a linear system Ax = b in which A is an m × n matrix, the Projection Theorem (6.3.3) tells us that the vectors x and b can each be decomposed into sums of orthogonal terms as

x = xrow(A) + xnull(A) and b = bnull(AT) + bcol(A)

where xrow(A) and xnull(A) are the orthogonal projections of x on the row space of A and the null space of A, and the vectors bnull(AT) and bcol(A) are the orthogonal projections of b on the null space of AT and the column space of A.

In Figure 6.4.3 we have represented the fundamental spaces of A by perpendicular lines in Rn and Rm on which we indicated the orthogonal projections of x and b. (This, of course, is only pictorial since the fundamental spaces need not be one-dimensional.) The figure shows Ax as a point in the column space of A and conveys that bcol(A) is the point in col(A) that is closest to b. This illustrates that the least squares solutions of Ax = b are the exact solutions of the equation Ax = bcol(A).

[Figure 6.4.3: the fundamental spaces of A in Rn and Rm, with the projections xrow(A), xnull(A), bcol(A), and bnull(AT)]

The Role of QR-Decomposition in Least Squares Problems (Optional)

Formulas (9) and (10) have theoretical use but are not well suited for numerical computation. In practice, least squares solutions of Ax = b are typically found by using some variation of Gaussian elimination to solve the normal equations or by using QR-decomposition and the following theorem.

THEOREM 6.4.6 If A is an m × n matrix with linearly independent column vectors, and if A = QR is a QR-decomposition of A (see Theorem 6.3.7), then for each b in Rm the system Ax = b has a unique least squares solution given by

x = R−1QT b (12)

A proof of this theorem and a discussion of its use can be found in many books on numerical methods of linear algebra. However, you can obtain Formula (12) by making the substitution A = QR in (9) and using the fact that QTQ = I to obtain

x = ((QR)T(QR))−1(QR)T b

= (RTQTQR)−1(QR)T b

= R−1(RT )−1RTQT b

= R−1QT b
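In code, Formula (12) amounts to one QR-decomposition followed by a triangular solve. A minimal sketch, assuming NumPy is available and applied to the system of Example 1:

    import numpy as np

    A = np.array([[ 1., -1.],
                  [ 3.,  2.],
                  [-2.,  4.]])
    b = np.array([4., 1., 3.])

    Q, R = np.linalg.qr(A)
    x = np.linalg.solve(R, Q.T @ b)   # solve R x = Q^T b rather than inverting R
    print(x)                          # [0.1789... 0.5017...] = (17/95, 143/285), as in Example 3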


Exercise Set 6.4

In Exercises 1–2, find the associated normal equation.

1. ⎡1  −1⎤ ⎡x1⎤   ⎡ 2⎤
   ⎢2   3⎥ ⎣x2⎦ = ⎢−1⎥
   ⎣4   5⎦        ⎣ 5⎦

2. ⎡ 2  −1  0⎤ ⎡x1⎤   ⎡−1⎤
   ⎢ 3   1  2⎥ ⎢x2⎥ = ⎢ 0⎥
   ⎢−1   4  5⎥ ⎣x3⎦   ⎢ 1⎥
   ⎣ 1   2  4⎦        ⎣ 2⎦

In Exercises 3–6, find the least squares solution of the equation Ax = b.

3. A = ⎡1  −1⎤        ⎡ 2⎤
       ⎢2   3⎥;  b = ⎢−1⎥
       ⎣4   5⎦        ⎣ 5⎦

4. A = ⎡2  −2⎤        ⎡ 2⎤
       ⎢1   1⎥;  b = ⎢−1⎥
       ⎣3   1⎦        ⎣ 1⎦

5. A = ⎡1  0  −1⎤        ⎡6⎤
       ⎢2  1  −2⎥;  b = ⎢0⎥
       ⎢1  1   0⎥        ⎢9⎥
       ⎣1  1  −1⎦        ⎣3⎦

6. A = ⎡2   0  −1⎤        ⎡0⎤
       ⎢1  −2   2⎥;  b = ⎢6⎥
       ⎢2  −1   0⎥        ⎢0⎥
       ⎣0   1  −1⎦        ⎣6⎦

In Exercises 7–10, find the least squares error vector and least squares error of the stated equation. Verify that the least squares error vector is orthogonal to the column space of A.

7. The equation in Exercise 3.

8. The equation in Exercise 4.

9. The equation in Exercise 5.

10. The equation in Exercise 6.

In Exercises 11–14, find parametric equations for all least squares solutions of Ax = b, and confirm that all of the solutions have the same error vector.

11. A = ⎡ 2   1⎤        ⎡3⎤
        ⎢ 4   2⎥;  b = ⎢2⎥
        ⎣−2  −1⎦        ⎣1⎦

12. A = ⎡ 1   3⎤        ⎡1⎤
        ⎢−2  −6⎥;  b = ⎢0⎥
        ⎣ 3   9⎦        ⎣1⎦

13. A = ⎡−1  3  2⎤        ⎡ 7⎤
        ⎢ 2  1  3⎥;  b = ⎢ 0⎥
        ⎣ 0  1  1⎦        ⎣−7⎦

14. A = ⎡3   2  −1⎤        ⎡ 2⎤
        ⎢1  −4   3⎥;  b = ⎢−2⎥
        ⎣1  10  −7⎦        ⎣ 1⎦

In Exercises 15–16, use Theorem 6.4.2 to find the orthogonal projection of b on the column space of A, and check your result using Theorem 6.4.4.

15. A = ⎡ 1  −1⎤        ⎡4⎤
        ⎢ 3   2⎥;  b = ⎢1⎥
        ⎣−2   4⎦        ⎣3⎦

16. A = ⎡5   1⎤        ⎡−4⎤
        ⎢1   3⎥;  b = ⎢ 2⎥
        ⎣4  −2⎦        ⎣ 3⎦

17. Find the orthogonal projection of u on the subspace of R3

spanned by the vectors v1 and v2.

u = (1,−6, 1); v1 = (−1, 2, 1), v2 = (2, 2, 4)

18. Find the orthogonal projection of u on the subspace of R4

spanned by the vectors v1, v2, and v3.

u = (6, 3, 9, 6); v1 = (2, 1, 1, 1), v2 = (1, 0, 1, 1),v3 = (−2,−1, 0,−1)

In Exercises 19–20, use the method of Example 3 to find the standard matrix for the orthogonal projection on the stated subspace of R2. Compare your result to that in Table 3 of Section 4.9.

19. the x-axis 20. the y-axis

In Exercises 21–22, use the method of Example 3 to find the standard matrix for the orthogonal projection on the stated subspace of R3. Compare your result to that in Table 4 of Section 4.9.

21. the xz-plane 22. the yz-plane

In Exercises 23–24, a QR-factorization of A is given. Use it to find the least squares solution of Ax = b.

23. A = ⎡ 3  1⎤ = ⎡ 3/5  4/5⎤ ⎡5  −1/5⎤;   b = ⎡3⎤
        ⎣−4  1⎦   ⎣−4/5  3/5⎦ ⎣0   7/5⎦        ⎣2⎦

24. A = ⎡3  −6⎤   ⎡3/5  0⎤                   ⎡−1⎤
        ⎢4  −8⎥ = ⎢4/5  0⎥ ⎡5  −10⎤;   b =  ⎢ 7⎥
        ⎣0   1⎦   ⎣ 0   1⎦ ⎣0    1⎦         ⎣ 2⎦

25. Let W be the plane with equation 5x − 3y + z = 0.

(a) Find a basis for W .

(b) Find the standard matrix for the orthogonal projection onto W.


26. Let W be the line with parametric equations

x = 2t, y = −t, z = 4t

(a) Find a basis for W .

(b) Find the standard matrix for the orthogonal projection on W.

27. Find the orthogonal projection of u = (5, 6, 7, 2) on the solution space of the homogeneous linear system

x1 + x2 + x3 = 0

2x2 + x3 + x4 = 0

28. Show that if w = (a, b, c) is a nonzero vector, then the standard matrix for the orthogonal projection of R3 onto the line span{w} is

P = (1/(a² + b² + c²)) ⎡a²  ab  ac⎤
                       ⎢ab  b²  bc⎥
                       ⎣ac  bc  c²⎦

29. Let A be an m × n matrix with linearly independent row vectors. Find a standard matrix for the orthogonal projection of Rn onto the row space of A.

Working with Proofs

30. Prove: If A has linearly independent column vectors, and if Ax = b is consistent, then the least squares solution of Ax = b and the exact solution of Ax = b are the same.

31. Prove: If A has linearly independent column vectors, and if b is orthogonal to the column space of A, then the least squares solution of Ax = b is x = 0.

32. Prove the implication (b) ⇒ (a) of Theorem 6.4.3.

True-False Exercises

TF. In parts (a)–(h) determine whether the statement is true or false, and justify your answer.

(a) If A is an m × n matrix, then ATA is a square matrix.

(b) If ATA is invertible, then A is invertible.

(c) If A is invertible, then ATA is invertible.

(d) If Ax = b is a consistent linear system, then ATAx = AT b is also consistent.

(e) If Ax = b is an inconsistent linear system, then ATAx = AT b is also inconsistent.

(f ) Every linear system has a least squares solution.

(g) Every linear system has a unique least squares solution.

(h) If A is an m × n matrix with linearly independent columns and b is in Rm, then Ax = b has a unique least squares solution.

Working with Technology

T1. (a) Use Theorem 6.4.4 to show that the following linear system has a unique least squares solution, and use the method of Example 1 to find it.

x1 + x2 + x3 = 1

4x1 + 2x2 + x3 = 10

9x1 + 3x2 + x3 = 9

16x1 + 4x2 + x3 = 16

(b) Check your result in part (a) using Formula (9).

T2. Use your technology utility to perform the computations and confirm the results obtained in Example 2.

6.5 Mathematical Modeling Using Least Squares

In this section we will use results about orthogonal projections in inner product spaces to obtain a technique for fitting a line or other polynomial curve to a set of experimentally determined points in the plane.

Fitting a Curve to Data

A common problem in experimental work is to obtain a mathematical relationship y = f(x) between two variables x and y by “fitting” a curve to points in the plane corresponding to various experimentally determined values of x and y, say

(x1, y1), (x2, y2), . . . , (xn, yn)

On the basis of theoretical considerations or simply by observing the pattern of the points, the experimenter decides on the general form of the curve y = f(x) to be fitted. This curve is called a mathematical model of the data. Some examples are (Figure 6.5.1):


(a) A straight line: y = a + bx

(b) A quadratic polynomial: y = a + bx + cx2

(c) A cubic polynomial: y = a + bx + cx2 + dx3

[Figure 6.5.1: graphs of (a) y = a + bx, (b) y = a + bx + cx2, (c) y = a + bx + cx2 + dx3]

Least Squares Fit of a Straight Line

When data points are obtained experimentally, there is generally some measurement “error,” making it impossible to find a curve of the desired form that passes through all the points. Thus, the idea is to choose the curve (by determining its coefficients) that “best fits” the data. We begin with the simplest case: fitting a straight line to data points.

Suppose we want to fit a straight line y = a + bx to the experimentally determined points

(x1, y1), (x2, y2), . . . , (xn, yn)

If the data points were collinear, the line would pass through all n points, and the unknown coefficients a and b would satisfy the equations

y1 = a + bx1
y2 = a + bx2
    ...
yn = a + bxn          (1)

We can write this system in matrix form as

⎡1  x1⎤         ⎡y1⎤
⎢1  x2⎥ ⎡a⎤  =  ⎢y2⎥
⎢ ...  ⎥ ⎣b⎦     ⎢...⎥
⎣1  xn⎦         ⎣yn⎦

or more compactly as

Mv = y (2)

where
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix},\quad M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix},\quad v = \begin{bmatrix} a \\ b \end{bmatrix} \tag{3}$$

If there are measurement errors in the data, then the data points will typically not lie on a line, and (1) will be inconsistent. In this case we look for a least squares approximation to the values of a and b by solving the normal system

$$M^TMv = M^Ty$$

For simplicity, let us assume that the x-coordinates of the data points are not all the same, so M has linearly independent column vectors (Exercise 14) and the normal system has the unique solution



$$v^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix} = (M^TM)^{-1}M^Ty$$

[see Formula (9) of Theorem 6.4.4]. The line y = a^* + b^*x that results from this solution is called the least squares line of best fit or the regression line. It follows from (2) and (3) that this line minimizes

$$\|y - Mv\|^2 = [y_1 - (a + bx_1)]^2 + [y_2 - (a + bx_2)]^2 + \cdots + [y_n - (a + bx_n)]^2$$

The quantities
$$d_1 = |y_1 - (a + bx_1)|,\quad d_2 = |y_2 - (a + bx_2)|,\quad \ldots,\quad d_n = |y_n - (a + bx_n)|$$
are called residuals. Since the residual d_i is the vertical distance between the data point (x_i, y_i) and the regression line (Figure 6.5.2), we can interpret its value as the "error" in y_i at the point x_i. If we assume that the value of each x_i is exact, then all the errors are in the y_i, so the regression line can be described as the line that minimizes the sum of the squares of the data errors, hence the name "least squares line of best fit." In summary, we have the following theorem.

Figure 6.5.2 The residual d_i measures the vertical error |y_i − (a + bx_i)| at x_i.

THEOREM 6.5.1 Uniqueness of the Least Squares Solution

Let (x_1, y_1), (x_2, y_2), . . . , (x_n, y_n) be a set of two or more data points, not all lying on a vertical line, and let

$$M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \quad\text{and}\quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \tag{4}$$

Then there is a unique least squares straight line fit

y = a∗ + b∗x (5)

to the data points. Moreover,

$$v^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix} \tag{6}$$

is given by the formula
$$v^* = (M^TM)^{-1}M^Ty \tag{7}$$

which expresses the fact that v = v∗ is the unique solution of the normal equation

$$M^TMv = M^Ty \tag{8}$$



EXAMPLE 1 Least Squares Straight Line Fit

Find the least squares straight line fit to the four points (0, 1), (1, 3), (2, 4), and (3, 4). (See Figure 6.5.3.)

Figure 6.5.3 The four data points and the least squares line.

Solution We have
$$M = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix},\quad M^TM = \begin{bmatrix} 4 & 6 \\ 6 & 14 \end{bmatrix},\quad\text{and}\quad (M^TM)^{-1} = \frac{1}{10}\begin{bmatrix} 7 & -3 \\ -3 & 2 \end{bmatrix}$$
$$v^* = (M^TM)^{-1}M^Ty = \frac{1}{10}\begin{bmatrix} 7 & -3 \\ -3 & 2 \end{bmatrix}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 3 \\ 4 \\ 4 \end{bmatrix} = \begin{bmatrix} 1.5 \\ 1 \end{bmatrix}$$
so the desired line is y = 1.5 + x.
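The arithmetic in Example 1 can be checked numerically. The following NumPy sketch (not part of the original text) builds the matrix M of Formula (3) and solves the normal equations (8); the variable names are illustrative only.

```python
import numpy as np

# Data points of Example 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 4.0, 4.0])

# Matrix M of Formula (3): a column of ones followed by the x-values
M = np.column_stack([np.ones_like(x), x])

# Solve the normal equations  M^T M v = M^T y  for v = [a, b]
v = np.linalg.solve(M.T @ M, M.T @ y)
print(v)                                  # [1.5 1. ]  ->  y = 1.5 + x

# A dedicated least squares solver gives the same result and is preferred numerically
v_lstsq, *_ = np.linalg.lstsq(M, y, rcond=None)
print(np.allclose(v, v_lstsq))            # True
```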

EXAMPLE 2 Spring Constant

Hooke’s law in physics states that the length x of a uniform spring is a linear function of the force y applied to it. If we express this relationship as y = a + bx, then the coefficient b is called the spring constant. Suppose a particular unstretched spring has a measured length of 6.1 inches (i.e., x = 6.1 when y = 0). Suppose further that, as illustrated in Figure 6.5.4, various weights are attached to the end of the spring and the following table of resulting spring lengths is recorded. Find the least squares straight line fit to the data and use it to approximate the spring constant.

Weight y (lb) 0 2 4 6

Length x (in) 6.1 7.6 8.7 10.4

Solution The mathematical problem is to fit a line y = a + bx to the four data points

Figure 6.5.4 A spring of unstretched length 6.1 inches with a weight attached.

(6.1, 0), (7.6, 2), (8.7, 4), (10.4, 6)

For these data the matrices M and y in (4) are

$$M = \begin{bmatrix} 1 & 6.1 \\ 1 & 7.6 \\ 1 & 8.7 \\ 1 & 10.4 \end{bmatrix},\quad y = \begin{bmatrix} 0 \\ 2 \\ 4 \\ 6 \end{bmatrix}$$

so

$$v^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix} = (M^TM)^{-1}M^Ty \approx \begin{bmatrix} -8.6 \\ 1.4 \end{bmatrix}$$
where the numerical values have been rounded to one decimal place. Thus, the estimated value of the spring constant is b^* ≈ 1.4 pounds/inch.
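For Example 2, the same computation can be delegated to a polynomial fitting routine. The sketch below (not from the text) uses NumPy's polyfit, which returns the least squares coefficients of y = a + bx with the highest power first.

```python
import numpy as np

# Spring data: length x (in) versus applied force y (lb)
x = np.array([6.1, 7.6, 8.7, 10.4])
y = np.array([0.0, 2.0, 4.0, 6.0])

# Degree-1 least squares fit; polyfit returns [b, a] for y = a + b*x
b, a = np.polyfit(x, y, deg=1)
print(round(a, 1), round(b, 1))   # approximately -8.6 and 1.4 (spring constant in lb/in)
```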

Least Squares Fit of a Polynomial

The technique described for fitting a straight line to data points can be generalized to fitting a polynomial of specified degree to data points. Let us attempt to fit a polynomial of fixed degree m

$$y = a_0 + a_1x + \cdots + a_mx^m \tag{9}$$

to n points (x_1, y_1), (x_2, y_2), . . . , (x_n, y_n)



Substituting these n values of x and y into (9) yields the n equations

$$\begin{aligned} y_1 &= a_0 + a_1x_1 + \cdots + a_mx_1^m \\ y_2 &= a_0 + a_1x_2 + \cdots + a_mx_2^m \\ &\;\;\vdots \\ y_n &= a_0 + a_1x_n + \cdots + a_mx_n^m \end{aligned}$$

or in matrix form,
$$y = Mv \tag{10}$$

where
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix},\quad M = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix},\quad v = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix} \tag{11}$$

As before, the solutions of the normal equations

$$M^TMv = M^Ty$$

determine the coefficients of the polynomial, and the vector v minimizes

$$\|y - Mv\|$$
Conditions that guarantee the invertibility of M^TM are discussed in the exercises (Exercise 16). If M^TM is invertible, then the normal equations have a unique solution v = v^*, which is given by

$$v^* = (M^TM)^{-1}M^Ty \tag{12}$$

EXAMPLE 3 Fitting a Quadratic Curve to Data

According to Newton’s second law of motion, a body near the Earth’s surface falls vertically downward in accordance with the equation

$$s = s_0 + v_0t + \tfrac{1}{2}gt^2 \tag{13}$$

where

s = vertical displacement downward relative to some reference point

s0 = displacement from the reference point at time t = 0

v0 = velocity at time t = 0

g = acceleration of gravity at the Earth’s surface

Suppose that a laboratory experiment is performed to approximate g by measuring the displacement s relative to a fixed reference point of a falling weight at various times. Use the experimental results shown in the following table to approximate g.

Time t (sec) .1 .2 .3 .4 .5

Displacement s (ft) −0.18 0.31 1.03 2.48 3.73



Solution For notational simplicity, let a_0 = s_0, a_1 = v_0, and a_2 = g/2 in (13), so our mathematical problem is to fit a quadratic curve

$$s = a_0 + a_1t + a_2t^2 \tag{14}$$

to the five data points:

(.1,−0.18), (.2, 0.31), (.3, 1.03), (.4, 2.48), (.5, 3.73)

With the appropriate adjustments in notation, the matrices M and y in (11) are

$$M = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ 1 & t_3 & t_3^2 \\ 1 & t_4 & t_4^2 \\ 1 & t_5 & t_5^2 \end{bmatrix} = \begin{bmatrix} 1 & .1 & .01 \\ 1 & .2 & .04 \\ 1 & .3 & .09 \\ 1 & .4 & .16 \\ 1 & .5 & .25 \end{bmatrix},\quad y = \begin{bmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \\ s_5 \end{bmatrix} = \begin{bmatrix} -0.18 \\ 0.31 \\ 1.03 \\ 2.48 \\ 3.73 \end{bmatrix}$$

Thus, from (12),

$$v^* = \begin{bmatrix} a_0^* \\ a_1^* \\ a_2^* \end{bmatrix} = (M^TM)^{-1}M^Ty \approx \begin{bmatrix} -0.40 \\ 0.35 \\ 16.1 \end{bmatrix}$$

so the least squares quadratic fit is

$$s = -0.40 + 0.35t + 16.1t^2$$

From this equation we estimate that g/2 = 16.1 and hence that g = 32.2 ft/sec^2. Note that this equation also provides the following estimates of the initial displacement and velocity of the weight:

s_0 = a_0^* = −0.40 ft
v_0 = a_1^* = 0.35 ft/sec

In Figure 6.5.5 we have plotted the data points and the approximating polynomial.

Figure 6.5.5 The data points and the approximating quadratic polynomial; distance s (in feet) versus time t (in seconds).
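The quadratic fit of Example 3 can be reproduced with the matrix M of Formula (11) for m = 2. The following NumPy sketch is not part of the original text and is included only as a numerical check.

```python
import numpy as np

# Time (sec) and displacement (ft) data from Example 3
t = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
s = np.array([-0.18, 0.31, 1.03, 2.48, 3.73])

# Matrix M of Formula (11) with m = 2: columns 1, t, t^2
M = np.vander(t, N=3, increasing=True)

# Solve the normal equations  M^T M v = M^T s  for v = [a0, a1, a2]
a0, a1, a2 = np.linalg.solve(M.T @ M, M.T @ s)
print(a0, a1, a2)          # roughly -0.40, 0.35, 16.1
print("g is about", 2 * a2)  # roughly 32.2 ft/sec^2, since a2 = g/2
```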

Figure: Temperature of Venusian Atmosphere, temperature T (K) versus altitude h (km). Source: NASA. Magellan orbit 3213, Date: 5 October 1991, Latitude: 67 N, LTST: 22:05.

Historical Note On October 5, 1991 the Magellan spacecraft entered the atmosphere of Venus and transmitted the temperature T in kelvins (K) versus the altitude h in kilometers (km) until its signal was lost at an altitude of about 34 km. Discounting the initial erratic signal, the data strongly suggested a linear relationship, so a least squares straight line fit was used on the linear part of the data to obtain the equation

T = 737.5 − 8.125h

By setting h = 0 in this equation, the surface temperature of Venus was estimated at T ≈ 737.5 K. The accuracy of this result has been confirmed by more recent flybys of Venus.



Exercise Set 6.5

In Exercises 1–2, find the least squares straight line fit y = ax + b to the data points, and show that the result is reasonable by graphing the fitted line and plotting the data in the same coordinate system.

1. (0, 0), (1, 2), (2, 7) 2. (0, 1), (2, 0), (3, 1), (3, 2)

In Exercises 3–4, find the least squares quadratic fit y = a_0 + a_1x + a_2x^2 to the data points, and show that the result is reasonable by graphing the fitted curve and plotting the data in the same coordinate system.

3. (2, 0), (3,−10), (5,−48), (6,−76)

4. (1,−2), (0,−1), (1, 0), (2, 4)

5. Find a curve of the form y = a + (b/x) that best fits the data points (1, 7), (3, 3), (6, 1) by making the substitution X = 1/x.

6. Find a curve of the form y = a + b√x that best fits the data points (3, 1.5), (7, 2.5), (10, 3) by making the substitution X = √x. Show that the result is reasonable by graphing the fitted curve and plotting the data in the same coordinate system.

Working with Proofs

7. Prove that the matrix M in Equation (3) has linearly independent columns if and only if at least two of the numbers x_1, x_2, . . . , x_n are distinct.

8. Prove that the columns of the n × (m + 1) matrix M in Equation (11) are linearly independent if n > m and at least m + 1 of the numbers x_1, x_2, . . . , x_n are distinct. [Hint: A nonzero polynomial of degree m has at most m distinct roots.]

9. Let M be the matrix in Equation (11). Using Exercise 8, show that a sufficient condition for the matrix M^TM to be invertible is that n > m and that at least m + 1 of the numbers x_1, x_2, . . . , x_n are distinct.

True-False Exercises

TF. In parts (a)–(d) determine whether the statement is true or false, and justify your answer.

(a) Every set of data points has a unique least squares straight line fit.

(b) If the data points (x_1, y_1), (x_2, y_2), . . . , (x_n, y_n) are not collinear, then (2) is an inconsistent system.

(c) If the data points (x_1, y_1), (x_2, y_2), . . . , (x_n, y_n) do not lie on a vertical line, then the expression
$$|y_1 - (a + bx_1)|^2 + |y_2 - (a + bx_2)|^2 + \cdots + |y_n - (a + bx_n)|^2$$
is minimized by taking a and b to be the coefficients in the least squares line y = a + bx of best fit to the data.

(d) If the data points (x_1, y_1), (x_2, y_2), . . . , (x_n, y_n) do not lie on a vertical line, then the expression
$$|y_1 - (a + bx_1)| + |y_2 - (a + bx_2)| + \cdots + |y_n - (a + bx_n)|$$
is minimized by taking a and b to be the coefficients in the least squares line y = a + bx of best fit to the data.

Working with Technology

In Exercises T1–T2, find the normal system for the least squares cubic fit y = a_0 + a_1x + a_2x^2 + a_3x^3 to the data points. Solve the system and show that the result is reasonable by graphing the fitted curve and plotting the data in the same coordinate system.

T1. (−1,−14), (0,−5), (1,−4), (2, 1), (3, 22)

T2. (0,−10), (1,−1), (2, 0), (3, 5), (4, 26)

T3. The owner of a rapidly expanding business finds that for the first five months of the year the sales (in thousands) are $4.0, $4.4, $5.2, $6.4, and $8.0. The owner plots these figures on a graph and conjectures that for the rest of the year, the sales curve can be approximated by a quadratic polynomial. Find the least squares quadratic polynomial fit to the sales curve, and use it to project the sales for the twelfth month of the year.

T4. Pathfinder is an experimental, lightweight, remotely piloted, solar-powered aircraft that was used in a series of experiments by NASA to determine the feasibility of applying solar power for long-duration, high-altitude flights. In August 1997 Pathfinder recorded the data in the accompanying table relating altitude H and temperature T. Show that a linear model is reasonable by plotting the data, and then find the least squares line H = H_0 + kT of best fit.

Table Ex-T4
Altitude H (thousands of feet): 15, 20, 25, 30, 35, 40, 45
Temperature T (°C): 4.5, −5.9, −16.1, −27.6, −39.8, −50.2, −62.9

Three important models in applications are

exponential models (y = ae^(bx))

power function models (y = ax^b)

logarithmic models (y = a + b ln x)

where a and b are to be determined to fit experimental data as closely as possible. Exercises T5–T7 are concerned with a procedure, called linearization, by which the data are transformed to a form in which a least squares straight line fit can be used to approximate the constants. Calculus is required for these exercises. (A short code sketch of this linearization procedure appears after Exercise T7.)

T5. (a) Show that making the substitution Y = ln y in the equation y = ae^(bx) produces the equation Y = bx + ln a whose graph in the xY-plane is a line of slope b and Y-intercept ln a.



(b) Part (a) suggests that a curve of the form y = ae^(bx) can be fitted to n data points (x_i, y_i) by letting Y_i = ln y_i, then fitting a straight line to the transformed data points (x_i, Y_i) by least squares to find b and ln a, and then computing a from ln a. Use this method to fit an exponential model to the following data, and graph the curve and data in the same coordinate system.

x 0 1 2 3 4 5 6 7

y 3.9 5.3 7.2 9.6 12 17 23 31

T6. (a) Show that making the substitutions

X = ln x and Y = ln y

in the equation y = ax^b produces the equation Y = bX + ln a whose graph in the XY-plane is a line of slope b and Y-intercept ln a.

(b) Part (a) suggests that a curve of the form y = ax^b can be fitted to n data points (x_i, y_i) by letting X_i = ln x_i and Y_i = ln y_i, then fitting a straight line to the transformed data points (X_i, Y_i) by least squares to find b and ln a, and then computing a from ln a. Use this method to fit a power function model to the following data, and graph the curve and data in the same coordinate system.

x 2 3 4 5 6 7 8 9

y 1.75 1.91 2.03 2.13 2.22 2.30 2.37 2.43

T7. (a) Show that making the substitution X = ln x in the equation y = a + b ln x produces the equation y = a + bX whose graph in the Xy-plane is a line of slope b and y-intercept a.

(b) Part (a) suggests that a curve of the form y = a + b ln x can be fitted to n data points (x_i, y_i) by letting X_i = ln x_i and then fitting a straight line to the transformed data points (X_i, y_i) by least squares to find b and a. Use this method to fit a logarithmic model to the following data, and graph the curve and data in the same coordinate system.

x 2 3 4 5 6 7 8 9

y 4.07 5.30 6.21 6.79 7.32 7.91 8.23 8.51
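The following sketch (not part of the original text) illustrates the linearization procedure described before Exercise T5, applied to the exponential model y = ae^(bx) with the data of Exercise T5(b); the printed coefficients are simply whatever the fit produces, not values quoted from the book.

```python
import numpy as np

# Data from Exercise T5(b)
x = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([3.9, 5.3, 7.2, 9.6, 12, 17, 23, 31])

# Linearization: Y = ln(y) turns y = a*e^(b*x) into the line Y = b*x + ln(a)
Y = np.log(y)
b, ln_a = np.polyfit(x, Y, deg=1)     # least squares line in the xY-plane
a = np.exp(ln_a)

print(f"y is approximately {a:.2f} * exp({b:.3f} x)")
```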

6.6 Function Approximation; Fourier Series
In this section we will show how orthogonal projections can be used to approximate certain types of functions by simpler functions. The ideas explained here have important applications in engineering and science. Calculus is required.

Best Approximations All of the problems that we will study in this section will be special cases of the following general problem.

Approximation Problem Given a function f that is continuous on an interval [a, b], find the "best possible approximation" to f using only functions from a specified subspace W of C[a, b].

Here are some examples of such problems:

(a) Find the best possible approximation to e^x over [0, 1] by a polynomial of the form a_0 + a_1x + a_2x^2.

(b) Find the best possible approximation to sin πx over [−1, 1] by a function of the form a_0 + a_1e^x + a_2e^(2x) + a_3e^(3x).

(c) Find the best possible approximation to x over [0, 2π] by a function of the form a_0 + a_1 sin x + a_2 sin 2x + b_1 cos x + b_2 cos 2x.

In the first example W is the subspace of C[0, 1] spanned by 1, x, and x^2; in the second example W is the subspace of C[−1, 1] spanned by 1, e^x, e^(2x), and e^(3x); and in the third example W is the subspace of C[0, 2π] spanned by 1, sin x, sin 2x, cos x, and cos 2x.



Measurements of Error To solve approximation problems of the preceding types, we first need to make the phrase "best approximation over [a, b]" mathematically precise. To do this we will need some way of quantifying the error that results when one continuous function is approximated by another over an interval [a, b]. If we were to approximate f(x) by g(x), and if we were concerned only with the error in that approximation at a single point x_0, then it would be natural to define the error to be
$$\text{error} = |f(x_0) - g(x_0)|$$
sometimes called the deviation between f and g at x_0 (Figure 6.6.1).

Figure 6.6.1 The deviation between f and g at x_0.

However, we are not concerned simply with measuring the error at a single point but rather with measuring it over the entire interval [a, b]. The difficulty is that an approximation may have small deviations in one part of the interval and large deviations in another. One possible way of accounting for this is to integrate the deviation |f(x) − g(x)| over the interval [a, b] and define the error over the interval to be

$$\text{error} = \int_a^b |f(x) - g(x)|\, dx \tag{1}$$

Geometrically, (1) is the area between the graphs of f(x) and g(x) over the interval [a, b] (Figure 6.6.2); the greater the area, the greater the overall error.

Figure 6.6.2 The area between the graphs of f and g over [a, b] measures the error in approximating f by g over [a, b].

Although (1) is natural and appealing geometrically, most mathematicians and scientists generally favor the following alternative measure of error, called the mean square error:
$$\text{mean square error} = \int_a^b [f(x) - g(x)]^2\, dx$$

Mean square error emphasizes the effect of larger errors because of the squaring and has the added advantage that it allows us to bring to bear the theory of inner product spaces. To see how, suppose that f is a continuous function on [a, b] that we want to approximate by a function g from a subspace W of C[a, b], and suppose that C[a, b] is given the inner product
$$\langle f, g \rangle = \int_a^b f(x)g(x)\, dx$$

It follows that

$$\|f - g\|^2 = \langle f - g, f - g \rangle = \int_a^b [f(x) - g(x)]^2\, dx = \text{mean square error}$$

so minimizing the mean square error is the same as minimizing ‖f − g‖^2. Thus, the approximation problem posed informally at the beginning of this section can be restated more precisely as follows.

Least Squares Approximation

Least Squares Approximation Problem Let f be a function that is continuous on an interval [a, b], let C[a, b] have the inner product
$$\langle f, g \rangle = \int_a^b f(x)g(x)\, dx$$
and let W be a finite-dimensional subspace of C[a, b]. Find a function g in W that minimizes
$$\|f - g\|^2 = \int_a^b [f(x) - g(x)]^2\, dx$$



Since ‖f − g‖^2 and ‖f − g‖ are minimized by the same function g, this problem is equivalent to looking for a function g in W that is closest to f. But we know from Theorem 6.4.1 that g = proj_W f is such a function (Figure 6.6.3). Thus, we have the following result.

Figure 6.6.3 f = function in C[a, b] to be approximated; g = proj_W f = least squares approximation to f from W, the subspace of approximating functions.

THEOREM 6.6.1 If f is a continuous function on [a, b], and W is a finite-dimensional subspace of C[a, b], then the function g in W that minimizes the mean square error
$$\int_a^b [f(x) - g(x)]^2\, dx$$
is g = proj_W f, where the orthogonal projection is relative to the inner product
$$\langle f, g \rangle = \int_a^b f(x)g(x)\, dx$$

The function g = proj_W f is called the least squares approximation to f from W.
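When an orthonormal basis for W is not available, proj_W f can also be computed by solving the normal equations built from inner products of a basis for W. The sketch below (not part of the original text) does this numerically for problem (a) listed earlier, approximating e^x on [0, 1] by a polynomial a_0 + a_1x + a_2x^2; the helper names are illustrative only.

```python
import numpy as np
from scipy.integrate import quad

a, b = 0.0, 1.0                       # the interval [a, b]
f = np.exp                            # function to approximate
basis = [lambda x: 1.0 + 0 * x,       # basis {1, x, x^2} for W
         lambda x: x,
         lambda x: x ** 2]

def inner(u, v):
    # Inner product <u, v> = integral of u(x)v(x) over [a, b]
    return quad(lambda x: u(x) * v(x), a, b)[0]

# Normal equations G c = r, with G[i][j] = <basis_i, basis_j> and r[i] = <f, basis_i>
G = np.array([[inner(u, v) for v in basis] for u in basis])
r = np.array([inner(f, u) for u in basis])
c = np.linalg.solve(G, r)             # coefficients of proj_W f
print(c)

# Mean square error  ||f - g||^2  of the approximation g
g = lambda x: c[0] + c[1] * x + c[2] * x ** 2
print(quad(lambda x: (f(x) - g(x)) ** 2, a, b)[0])
```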

Fourier Series A function of the form
$$T(x) = c_0 + c_1\cos x + c_2\cos 2x + \cdots + c_n\cos nx + d_1\sin x + d_2\sin 2x + \cdots + d_n\sin nx \tag{2}$$

is called a trigonometric polynomial; if c_n and d_n are not both zero, then T(x) is said to have order n. For example,

T (x) = 2 + cos x − 3 cos 2x + 7 sin 4x

is a trigonometric polynomial of order 4 with

c0 = 2, c1 = 1, c2 = −3, c3 = 0, c4 = 0, d1 = 0, d2 = 0, d3 = 0, d4 = 7

It is evident from (2) that the trigonometric polynomials of order n or less are the various possible linear combinations of

1, cos x, cos 2x, . . . , cos nx, sin x, sin 2x, . . . , sin nx (3)

It can be shown that these 2n + 1 functions are linearly independent and thus form a basis for a (2n + 1)-dimensional subspace of C[a, b].

Let us now consider the problem of finding the least squares approximation of a continuous function f(x) over the interval [0, 2π] by a trigonometric polynomial of order n or less. As noted above, the least squares approximation to f from W is the orthogonal projection of f on W. To find this orthogonal projection, we must find an orthonormal basis g_0, g_1, . . . , g_2n for W, after which we can compute the orthogonal projection on W from the formula

projW f = 〈f, g0〉g0 + 〈f, g1〉g1 + · · · + 〈f, g2n〉g2n (4)



[see Theorem 6.3.4(b)]. An orthonormal basis for W can be obtained by applying the Gram–Schmidt process to the basis vectors in (3) using the inner product
$$\langle f, g \rangle = \int_0^{2\pi} f(x)g(x)\, dx$$

This yields the orthonormal basis
$$g_0 = \frac{1}{\sqrt{2\pi}},\quad g_1 = \frac{1}{\sqrt{\pi}}\cos x,\; \ldots,\; g_n = \frac{1}{\sqrt{\pi}}\cos nx,\quad g_{n+1} = \frac{1}{\sqrt{\pi}}\sin x,\; \ldots,\; g_{2n} = \frac{1}{\sqrt{\pi}}\sin nx \tag{5}$$

(see Exercise 6). If we introduce the notation

$$a_0 = \frac{2}{\sqrt{2\pi}}\langle f, g_0 \rangle,\quad a_1 = \frac{1}{\sqrt{\pi}}\langle f, g_1 \rangle,\; \ldots,\; a_n = \frac{1}{\sqrt{\pi}}\langle f, g_n \rangle$$
$$b_1 = \frac{1}{\sqrt{\pi}}\langle f, g_{n+1} \rangle,\; \ldots,\; b_n = \frac{1}{\sqrt{\pi}}\langle f, g_{2n} \rangle \tag{6}$$

then on substituting (5) in (4), we obtain

$$\mathrm{proj}_W f = \frac{a_0}{2} + [a_1\cos x + \cdots + a_n\cos nx] + [b_1\sin x + \cdots + b_n\sin nx] \tag{7}$$

where

$$a_0 = \frac{2}{\sqrt{2\pi}}\langle f, g_0 \rangle = \frac{2}{\sqrt{2\pi}}\int_0^{2\pi} f(x)\,\frac{1}{\sqrt{2\pi}}\, dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\, dx$$
$$a_1 = \frac{1}{\sqrt{\pi}}\langle f, g_1 \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\,\frac{1}{\sqrt{\pi}}\cos x\, dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos x\, dx$$
$$\vdots$$
$$a_n = \frac{1}{\sqrt{\pi}}\langle f, g_n \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\,\frac{1}{\sqrt{\pi}}\cos nx\, dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos nx\, dx$$
$$b_1 = \frac{1}{\sqrt{\pi}}\langle f, g_{n+1} \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\,\frac{1}{\sqrt{\pi}}\sin x\, dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin x\, dx$$
$$\vdots$$
$$b_n = \frac{1}{\sqrt{\pi}}\langle f, g_{2n} \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\,\frac{1}{\sqrt{\pi}}\sin nx\, dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin nx\, dx$$

In short,

$$a_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos kx\, dx,\qquad b_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin kx\, dx \tag{8}$$

The numbers a0, a1, . . . , an, b1, . . . , bn are called the Fourier coefficients of f.

EXAMPLE 1 Least Squares Approximations

Find the least squares approximation of f(x) = x on [0, 2π ] by

(a) a trigonometric polynomial of order 2 or less;

(b) a trigonometric polynomial of order n or less.



Solution (a)

$$a_0 = \frac{1}{\pi}\int_0^{2\pi} f(x)\, dx = \frac{1}{\pi}\int_0^{2\pi} x\, dx = 2\pi \tag{9a}$$

For k = 1, 2, . . . , integration by parts yields (verify)

$$a_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos kx\, dx = \frac{1}{\pi}\int_0^{2\pi} x\cos kx\, dx = 0 \tag{9b}$$
$$b_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin kx\, dx = \frac{1}{\pi}\int_0^{2\pi} x\sin kx\, dx = -\frac{2}{k} \tag{9c}$$

Thus, the least squares approximation to x on [0, 2π] by a trigonometric polynomial of order 2 or less is

$$x \approx \frac{a_0}{2} + a_1\cos x + a_2\cos 2x + b_1\sin x + b_2\sin 2x$$

or, from (9a), (9b), and (9c),

x ≈ π − 2 sin x − sin 2x

Solution (b) The least squares approximation to x on [0, 2π] by a trigonometric polynomial of order n or less is

$$x \approx \frac{a_0}{2} + [a_1\cos x + \cdots + a_n\cos nx] + [b_1\sin x + \cdots + b_n\sin nx]$$

or, from (9a), (9b), and (9c),

$$x \approx \pi - 2\left(\sin x + \frac{\sin 2x}{2} + \frac{\sin 3x}{3} + \cdots + \frac{\sin nx}{n}\right)$$
The graphs of y = x and some of these approximations are shown in Figure 6.6.4.

Figure 6.6.4 Graphs of y = x and the approximations y = π, y = π − 2 sin x, y = π − 2(sin x + sin 2x/2), y = π − 2(sin x + sin 2x/2 + sin 3x/3), and y = π − 2(sin x + sin 2x/2 + sin 3x/3 + sin 4x/4) over [0, 2π].

Jean Baptiste Fourier (1768–1830)

Historical Note Fourier was a French mathematician and physicist who discovered the Fourier series and related ideas while working on problems of heat diffusion. This discovery was one of the most influential in the history of mathematics; it is the cornerstone of many fields of mathematical research and a basic tool in many branches of engineering. Fourier, a political activist during the French revolution, spent time in jail for his defense of many victims during the Terror. He later became a favorite of Napoleon, who made him a baron.

[Image: Hulton Archive/Getty Images]

It is natural to expect that the mean square error will diminish as the number of terms in the least squares approximation

$$f(x) \approx \frac{a_0}{2} + \sum_{k=1}^{n}(a_k\cos kx + b_k\sin kx)$$

increases. It can be proved that for functions f in C[0, 2π], the mean square error approaches zero as n → +∞; this is denoted by writing

$$f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty}(a_k\cos kx + b_k\sin kx)$$

The right side of this equation is called the Fourier series for f over the interval [0, 2π]. Such series are of major importance in engineering, science, and mathematics.



Exercise Set 6.6

1. Find the least squares approximation of f(x) = 1 + x over the interval [0, 2π] by

(a) a trigonometric polynomial of order 2 or less.

(b) a trigonometric polynomial of order n or less.

2. Find the least squares approximation of f(x) = x^2 over the interval [0, 2π] by

(a) a trigonometric polynomial of order 3 or less.

(b) a trigonometric polynomial of order n or less.

3. (a) Find the least squares approximation of x over the interval [0, 1] by a function of the form a + be^x.

(b) Find the mean square error of the approximation.

4. (a) Find the least squares approximation of e^x over the interval [0, 1] by a polynomial of the form a_0 + a_1x.

(b) Find the mean square error of the approximation.

5. (a) Find the least squares approximation of sin πx over the interval [−1, 1] by a polynomial of the form a_0 + a_1x + a_2x^2.

(b) Find the mean square error of the approximation.

6. Use the Gram–Schmidt process to obtain the orthonormal basis (5) from the basis (3).

7. Carry out the integrations indicated in Formulas (9a), (9b), and (9c).

8. Find the Fourier series of f(x) = π − x over the interval [0, 2π].

9. Find the Fourier series of f(x) = 1, 0 < x < π and f(x) = 0, π ≤ x ≤ 2π over the interval [0, 2π].

10. What is the Fourier series of sin(3x)?

True-False Exercises

TF. In parts (a)–(e) determine whether the statement is true or false, and justify your answer.

(a) If a function f in C[a, b] is approximated by the function g, then the mean square error is the same as the area between the graphs of f(x) and g(x) over the interval [a, b].

(b) Given a finite-dimensional subspace W of C[a, b], the function g = proj_W f minimizes the mean square error.

(c) {1, cos x, sin x, cos 2x, sin 2x} is an orthogonal subset of the vector space C[0, 2π] with respect to the inner product $\langle f, g \rangle = \int_0^{2\pi} f(x)g(x)\, dx$.

(d) {1, cos x, sin x, cos 2x, sin 2x} is an orthonormal subset of the vector space C[0, 2π] with respect to the inner product $\langle f, g \rangle = \int_0^{2\pi} f(x)g(x)\, dx$.

(e) {1, cos x, sin x, cos 2x, sin 2x} is a linearly independent subset of C[0, 2π].

Chapter 6 Supplementary Exercises

1. Let R^4 have the Euclidean inner product.

(a) Find a vector in R^4 that is orthogonal to u_1 = (1, 0, 0, 0) and u_4 = (0, 0, 0, 1) and makes equal angles with u_2 = (0, 1, 0, 0) and u_3 = (0, 0, 1, 0).

(b) Find a vector x = (x_1, x_2, x_3, x_4) of length 1 that is orthogonal to u_1 and u_4 above and such that the cosine of the angle between x and u_2 is twice the cosine of the angle between x and u_3.

2. Prove: If 〈u, v〉 is the Euclidean inner product on R^n, and if A is an n × n matrix, then
$$\langle u, Av \rangle = \langle A^Tu, v \rangle$$
[Hint: Use the fact that 〈u, v〉 = u · v = v^Tu.]

3. Let M_{22} have the inner product 〈U, V〉 = tr(U^TV) = tr(V^TU) that was defined in Example 6 of Section 6.1. Describe the orthogonal complement of

(a) the subspace of all diagonal matrices.

(b) the subspace of symmetric matrices.

4. Let Ax = 0 be a system of m equations in n unknowns. Show that

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
is a solution of this system if and only if the vector x = (x_1, x_2, . . . , x_n) is orthogonal to every row vector of A with respect to the Euclidean inner product on R^n.

5. Use the Cauchy–Schwarz inequality to show that if a_1, a_2, . . . , a_n are positive real numbers, then
$$(a_1 + a_2 + \cdots + a_n)\left(\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}\right) \ge n^2$$

6. Show that if x and y are vectors in an inner product space and c is any scalar, then
$$\|cx + y\|^2 = c^2\|x\|^2 + 2c\langle x, y \rangle + \|y\|^2$$



7. Let R^3 have the Euclidean inner product. Find two vectors of length 1 that are orthogonal to all three of the vectors u_1 = (1, 1, −1), u_2 = (−2, −1, 2), and u_3 = (−1, 0, 1).

8. Find a weighted Euclidean inner product on R^n such that the vectors
$$v_1 = (1, 0, 0, \ldots, 0),\quad v_2 = (0, \sqrt{2}, 0, \ldots, 0),\quad v_3 = (0, 0, \sqrt{3}, \ldots, 0),\quad \ldots,\quad v_n = (0, 0, 0, \ldots, \sqrt{n})$$
form an orthonormal set.

9. Is there a weighted Euclidean inner product on R^2 for which the vectors (1, 2) and (3, −1) form an orthonormal set? Justify your answer.

10. If u and v are vectors in an inner product space V, then u, v, and u − v can be regarded as sides of a "triangle" in V (see the accompanying figure). Prove that the law of cosines holds for any such triangle; that is,
$$\|u - v\|^2 = \|u\|^2 + \|v\|^2 - 2\|u\|\|v\|\cos\theta$$
where θ is the angle between u and v.

Figure Ex-10 A "triangle" in V with sides u, v, and u − v, where θ is the angle between u and v.

11. (a) As shown in Figure 3.2.6, the vectors (k, 0, 0), (0, k, 0), and (0, 0, k) form the edges of a cube in R^3 with diagonal (k, k, k). Similarly, the vectors
(k, 0, 0, . . . , 0), (0, k, 0, . . . , 0), . . . , (0, 0, 0, . . . , k)
can be regarded as edges of a "cube" in R^n with diagonal (k, k, k, . . . , k). Show that each of the above edges makes an angle of θ with the diagonal, where cos θ = 1/√n.

(b) (Calculus required) What happens to the angle θ in part (a) as the dimension of R^n approaches ∞?

12. Let u and v be vectors in an inner product space.

(a) Prove that ‖u‖ = ‖v‖ if and only if u + v and u − v are orthogonal.

(b) Give a geometric interpretation of this result in R^2 with the Euclidean inner product.

13. Let u be a vector in an inner product space V, and let {v_1, v_2, . . . , v_n} be an orthonormal basis for V. Show that if α_i is the angle between u and v_i, then
$$\cos^2\alpha_1 + \cos^2\alpha_2 + \cdots + \cos^2\alpha_n = 1$$

14. Prove: If 〈u, v〉_1 and 〈u, v〉_2 are two inner products on a vector space V, then the quantity 〈u, v〉 = 〈u, v〉_1 + 〈u, v〉_2 is also an inner product.

15. Prove Theorem 6.2.5.

16. Prove: If A has linearly independent column vectors, and if b is orthogonal to the column space of A, then the least squares solution of Ax = b is x = 0.

17. Is there any value of s for which x_1 = 1 and x_2 = 2 is the least squares solution of the following linear system?

x1 − x2 = 1

2x1 + 3x2 = 1

4x1 + 5x2 = s

Explain your reasoning.

18. Show that if p and q are distinct positive integers, then the functions f(x) = sin px and g(x) = sin qx are orthogonal with respect to the inner product
$$\langle f, g \rangle = \int_0^{2\pi} f(x)g(x)\, dx$$

19. Show that if p and q are positive integers, then the functions f(x) = cos px and g(x) = sin qx are orthogonal with respect to the inner product
$$\langle f, g \rangle = \int_0^{2\pi} f(x)g(x)\, dx$$

20. Let W be the intersection of the planes

x + y + z = 0 and x − y + z = 0

in R^3. Find an equation for W⊥.

21. Prove that if ad − bc ≠ 0, then the matrix
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
has a unique QR-decomposition A = QR, where
$$Q = \frac{1}{\sqrt{a^2 + c^2}}\begin{bmatrix} a & -c \\ c & a \end{bmatrix},\qquad R = \frac{1}{\sqrt{a^2 + c^2}}\begin{bmatrix} a^2 + c^2 & ab + cd \\ 0 & ad - bc \end{bmatrix}$$