
EE103 (Fall 2011-12)

8. Linear least-squares

• definition

• examples and applications

• solution of a least-squares problem, normal equations

8-1


Definition

overdetermined linear equations

Ax = b (A is m× n with m > n)

if b ∉ range(A), cannot solve for x

least-squares formulation

minimize ‖Ax − b‖ = ( ∑_{i=1}^m ( ∑_{j=1}^n aij xj − bi )² )^{1/2}

• r = Ax− b is called the residual or error

• x with smallest residual norm ‖r‖ is called the least-squares solution

• equivalent to minimizing ‖Ax − b‖²

Linear least-squares 8-2


Example

A =
[  2  0 ]
[ −1  1 ]
[  0  2 ]

b =
[  1 ]
[  0 ]
[ −1 ]

least-squares solution

minimize (2x1 − 1)² + (−x1 + x2)² + (2x2 + 1)²

to find optimal x1, x2, set derivatives w.r.t. x1 and x2 equal to zero:

10x1 − 2x2 − 4 = 0, −2x1 + 10x2 + 4 = 0

solution x1 = 1/3, x2 = −1/3

(much more on practical algorithms for LS problems later)
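numerical check (a minimal NumPy sketch; NumPy and its least-squares routine are an addition here, not part of the original slides):

```python
import numpy as np

# the 3x2 example from above
A = np.array([[2.0, 0.0],
              [-1.0, 1.0],
              [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])

# least-squares solution: minimize ||Ax - b||
x, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x)            # approximately [ 0.3333, -0.3333 ]
print(A @ x - b)    # the residual vector r = Ax - b
```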

Linear least-squares 8-3


(four surface plots over (x1, x2): r1² = (2x1 − 1)², r2² = (−x1 + x2)², r3² = (2x2 + 1)², and the sum r1² + r2² + r3²)

Linear least-squares 8-4


Outline

• definition

• examples and applications

• solution of a least-squares problem, normal equations


Data fitting

fit a function

g(t) = x1g1(t) + x2g2(t) + · · ·+ xngn(t)

to data (t1, y1), . . . , (tm, ym), i.e., choose coefficients x1, . . . , xn so that

g(t1) ≈ y1, g(t2) ≈ y2, . . . , g(tm) ≈ ym

• gi(t) : R → R are given functions (basis functions)

• problem variables: the coefficients x1, x2, . . . , xn

• usually m ≫ n, hence no exact solution with g(ti) = yi for all i

• applications: developing simple, approximate model of observed data

Linear least-squares 8-5


Least-squares data fitting

compute x by minimizing

∑_{i=1}^m ( g(ti) − yi )² = ∑_{i=1}^m ( x1 g1(ti) + x2 g2(ti) + · · · + xn gn(ti) − yi )²

in matrix notation: minimize ‖Ax − b‖² where

A =
[ g1(t1)  g2(t1)  g3(t1)  · · ·  gn(t1) ]
[ g1(t2)  g2(t2)  g3(t2)  · · ·  gn(t2) ]
[   ...     ...     ...           ...   ]
[ g1(tm)  g2(tm)  g3(tm)  · · ·  gn(tm) ]

b =
[ y1 ]
[ y2 ]
[ ... ]
[ ym ]
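a sketch of this construction in NumPy (the basis functions and data below are invented for illustration, not taken from the slides):

```python
import numpy as np

def lstsq_fit(basis, t, y):
    """Fit g(t) = x1*g1(t) + ... + xn*gn(t) to data (t_i, y_i) by least squares."""
    # A[i, j] = g_{j+1}(t_i); b[i] = y_i
    A = np.column_stack([g(t) for g in basis])
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    return x

# hypothetical example: fit x1 + x2*t + x3*sin(t) to noisy samples
basis = [lambda t: np.ones_like(t), lambda t: t, np.sin]
t = np.linspace(0.0, 10.0, 50)
y = 1.0 + 0.5 * t + 2.0 * np.sin(t) + 0.1 * np.random.randn(t.size)
print(lstsq_fit(basis, t, y))   # roughly [1.0, 0.5, 2.0]
```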

Linear least-squares 8-6


Example: data fitting with polynomials

g(t) = x1 + x2 t + x3 t² + · · · + xn t^{n−1}

basis functions are gk(t) = t^{k−1}, k = 1, . . . , n

A =
[ 1  t1  t1²  · · ·  t1^{n−1} ]
[ 1  t2  t2²  · · ·  t2^{n−1} ]
[ ...                   ...   ]
[ 1  tm  tm²  · · ·  tm^{n−1} ]

b =
[ y1 ]
[ y2 ]
[ ... ]
[ ym ]

interpolation (m = n): can satisfy g(ti) = yi exactly by solving Ax = b

approximation (m > n): make error small by minimizing ‖Ax− b‖

Linear least-squares 8-7


example: fit a polynomial to f(t) = 1/(1 + 25t²) on [−1, 1]

• pick m = n points ti in [−1, 1], and calculate yi = 1/(1 + 25ti²)

• interpolate by solving Ax = b

(two plots over [−1, 1]: the interpolating polynomial for n = 5 and for n = 15)

(dashed line: f ; solid line: polynomial g; circles: the points (ti, yi))

increasing n does not improve the overall quality of the fit

Linear least-squares 8-8


same example by approximation

• pick m = 50 points ti in [−1, 1]

• fit polynomial by minimizing ‖Ax− b‖

(two plots over [−1, 1]: the least-squares polynomial fit for n = 5 and for n = 15)

(dashed line: f ; solid line: polynomial g; circles: the points (ti, yi))

much better fit overall
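a sketch reproducing both experiments (NumPy assumed; the slides do not specify how the points ti are placed, so equally spaced points are used here):

```python
import numpy as np

f = lambda t: 1.0 / (1.0 + 25.0 * t**2)

def poly_fit(m, n):
    """Least-squares fit of a degree-(n-1) polynomial to m samples of f on [-1, 1]."""
    t = np.linspace(-1.0, 1.0, m)           # assumption: equally spaced points
    A = np.vander(t, n, increasing=True)    # columns 1, t, t^2, ..., t^(n-1)
    x, *_ = np.linalg.lstsq(A, f(t), rcond=None)
    return x

tt = np.linspace(-1.0, 1.0, 500)
evaluate = lambda x: np.vander(tt, len(x), increasing=True) @ x   # g on a fine grid
for n in (5, 15):
    x_interp = poly_fit(n, n)      # interpolation: m = n points
    x_approx = poly_fit(50, n)     # approximation: m = 50 points
    print(n,
          np.max(np.abs(evaluate(x_interp) - f(tt))),   # large for n = 15
          np.max(np.abs(evaluate(x_approx) - f(tt))))   # much smaller
```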

Linear least-squares 8-9


Least-squares estimation

y = Ax+ w

• x is what we want to estimate or reconstruct

• y is our measurement(s)

• w is an unknown noise or measurement error (assumed small)

• ith row of A characterizes ith sensor or ith measurement

least-squares estimation

choose as estimate the vector x̂ that minimizes

‖Ax̂ − y‖

i.e., minimize the deviation between what we actually observed (y), and what we would observe if x = x̂ and there were no noise (w = 0)
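a small simulation of this setup (the sensor matrix, the true x, and the noise level below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 30, 4                            # 30 noisy measurements of 4 unknowns (hypothetical sizes)
A = rng.standard_normal((m, n))         # row i characterizes the ith measurement
x_true = np.array([1.0, -2.0, 0.5, 3.0])
w = 0.01 * rng.standard_normal(m)       # small measurement noise
y = A @ x_true + w

# least-squares estimate: minimize ||A xhat - y||
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.linalg.norm(x_hat - x_true))   # typically small because w is small
```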

Linear least-squares 8-10


Navigation by range measurements

find position (u, v) in a plane from distances to beacons at positions (pi, qi)

(figure: unknown position (u, v), beacons at (p1, q1), . . . , (p4, q4), and measured ranges ρ1, . . . , ρ4)

four nonlinear equations in two variables u, v:

√( (u − pi)² + (v − qi)² ) = ρi for i = 1, 2, 3, 4

ρi is the measured distance from unknown position (u, v) to beacon i

Linear least-squares 8-11


linearized distance function: assume u = u0 +∆u, v = v0 +∆v where

• u0, v0 are known (e.g., position a short time ago)

• ∆u, ∆v are small (compared to ρi’s)

√( (u0 + ∆u − pi)² + (v0 + ∆v − qi)² )
  ≈ √( (u0 − pi)² + (v0 − qi)² ) + ( (u0 − pi)∆u + (v0 − qi)∆v ) / √( (u0 − pi)² + (v0 − qi)² )

gives four linear equations in the variables ∆u, ∆v:

( (u0 − pi)∆u + (v0 − qi)∆v ) / √( (u0 − pi)² + (v0 − qi)² ) ≈ ρi − √( (u0 − pi)² + (v0 − qi)² ), for i = 1, 2, 3, 4

Linear least-squares 8-12


linearized equations

Ax ≈ b

where x = (∆u,∆v) and A is 4× 2 with

bi = ρi − √( (u0 − pi)² + (v0 − qi)² )

ai1 = (u0 − pi) / √( (u0 − pi)² + (v0 − qi)² ),   ai2 = (v0 − qi) / √( (u0 − pi)² + (v0 − qi)² )

• due to linearization and measurement error, we do not expect an exact solution (Ax = b)

• we can try to find ∆u and ∆v that ‘almost’ satisfy the equations

Linear least-squares 8-13


numerical example

• beacons at positions (10, 0), (−10, 2), (3, 9), (10, 10)

• measured distances ρ = (8.22, 11.9, 7.08, 11.33)

• (unknown) actual position is (2, 2)

linearized range equations (linearized around (u0, v0) = (0, 0))

[ −1.00   0.00 ]             [ −1.77 ]
[  0.98  −0.20 ]  [ ∆u ]  ≈  [  1.72 ]
[ −0.32  −0.95 ]  [ ∆v ]     [ −2.41 ]
[ −0.71  −0.71 ]             [ −2.81 ]

least-squares solution: (∆u,∆v) = (1.97, 1.90) (norm of error is 0.10)
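the numbers on this slide can be checked with a short NumPy script (a sketch, not part of the original slides):

```python
import numpy as np

beacons = np.array([[10.0, 0.0], [-10.0, 2.0], [3.0, 9.0], [10.0, 10.0]])
rho = np.array([8.22, 11.9, 7.08, 11.33])      # measured ranges
u0, v0 = 0.0, 0.0                               # linearization point

d = np.array([u0, v0]) - beacons                # rows (u0 - pi, v0 - qi)
dist0 = np.linalg.norm(d, axis=1)               # distances from (u0, v0) to the beacons
A = d / dist0[:, None]                          # rows (ai1, ai2)
b = rho - dist0                                 # bi = rho_i - distance to beacon i

delta, *_ = np.linalg.lstsq(A, b, rcond=None)
print(delta)                                    # approximately (1.97, 1.90)
print(np.linalg.norm(A @ delta - b))            # approximately 0.10
```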

Linear least-squares 8-14


Least-squares system identification

measure input u(t) and output y(t) for t = 0, . . . , N of an unknown system

(block diagram: input u(t) → unknown system → output y(t))

example (N = 70):

(two plots: the measured input u(t) and output y(t) for t = 0, . . . , 70)

system identification problem: find reasonable model for system based on measured I/O data u, y

Linear least-squares 8-15


moving average model

ymodel(t) = h0u(t) + h1u(t− 1) + h2u(t− 2) + · · ·+ hnu(t− n)

where ymodel(t) is the model output

• a simple and widely used model

• predicted output is a linear combination of current and n previous inputs

• h0, . . . , hn are parameters of the model

• called a moving average (MA) model with n delays

least-squares identification: choose the model that minimizes the error

E = ( ∑_{t=n}^N ( ymodel(t) − y(t) )² )^{1/2}

Linear least-squares 8-16


formulation as a linear least-squares problem:

E = ( ∑_{t=n}^N ( h0 u(t) + h1 u(t−1) + · · · + hn u(t−n) − y(t) )² )^{1/2} = ‖Ax − b‖

A =
[ u(n)    u(n−1)  u(n−2)  · · ·  u(0)   ]
[ u(n+1)  u(n)    u(n−1)  · · ·  u(1)   ]
[ u(n+2)  u(n+1)  u(n)    · · ·  u(2)   ]
[  ...     ...     ...            ...   ]
[ u(N)    u(N−1)  u(N−2)  · · ·  u(N−n) ]

x =
[ h0 ]
[ h1 ]
[ h2 ]
[ ... ]
[ hn ]

b =
[ y(n)   ]
[ y(n+1) ]
[ y(n+2) ]
[  ...   ]
[ y(N)   ]
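a sketch of this construction (the I/O record of page 8-15 is not available here, so a synthetic record from a hypothetical MA system is simulated instead):

```python
import numpy as np

def ma_fit(u, y, n):
    """Least-squares fit of a moving-average model with n delays."""
    N = len(u) - 1
    # row for time t is (u(t), u(t-1), ..., u(t-n)); column k holds u(t-k) for t = n, ..., N
    A = np.column_stack([u[n - k : N + 1 - k] for k in range(n + 1)])
    b = y[n : N + 1]
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return h, np.linalg.norm(A @ h - b)

# synthetic I/O data (assumption: the "true" system is itself a short MA system)
rng = np.random.default_rng(1)
u = rng.standard_normal(71)                                   # N = 70
h_true = np.array([0.1, 0.4, 0.3, 0.2])
y = np.convolve(u, h_true)[: len(u)] + 0.05 * rng.standard_normal(len(u))

h, err = ma_fit(u, y, n=7)
print(h)    # roughly recovers h_true; the remaining coefficients are small
```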

Linear least-squares 8-17


example (I/O data of page 8-15) with n = 7: least-squares solution is

h0 = 0.0240, h1 = 0.2819, h2 = 0.4176, h3 = 0.3536, h4 = 0.2425, h5 = 0.4873, h6 = 0.2084, h7 = 0.4412

(plot for t = 0, . . . , 70; solid: y(t), actual output; dashed: ymodel(t))

Linear least-squares 8-18


model order selection: how large should n be?

(plot: relative error E/‖y‖ versus model order n)

• suggests using largest possible n for smallest error

• much more important question: how good is the model at predicting new data (i.e., data not used to calculate the model)?

Linear least-squares 8-19


model validation: test model on a new data set (from the same system)

(plots: validation input u(t) and output y(t), and relative prediction error versus n for the validation data and for the modeling data)

• for n too large, the predictive ability of the model becomes worse!

• validation data suggest n = 10
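a sketch of the model-order study (again with invented data, since the original records are not included; the qualitative behavior, not the exact numbers, is the point):

```python
import numpy as np

def build_Ab(u, y, n):
    """Set up the least-squares data for an MA model with n delays."""
    N = len(u) - 1
    A = np.column_stack([u[n - k : N + 1 - k] for k in range(n + 1)])
    return A, y[n : N + 1]

def relative_error(u, y, h):
    """Relative prediction error of the model h on the record (u, y)."""
    A, b = build_Ab(u, y, len(h) - 1)
    return np.linalg.norm(A @ h - b) / np.linalg.norm(b)

# two synthetic I/O records from the same hypothetical system
rng = np.random.default_rng(2)
def record():
    u = rng.standard_normal(71)
    y = np.convolve(u, [0.1, 0.4, 0.3, 0.2])[: len(u)] + 0.2 * rng.standard_normal(len(u))
    return u, y

u_fit, y_fit = record()     # used to compute the model
u_val, y_val = record()     # used only for validation

for n in (1, 3, 7, 15, 30, 50):
    A, b = build_Ab(u_fit, y_fit, n)
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(n, relative_error(u_fit, y_fit, h), relative_error(u_val, y_val, h))
# the fitting error keeps decreasing with n, while the validation error
# typically starts to grow once n is too large (overfitting)
```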

Linear least-squares 8-20


for n = 50 the actual and predicted outputs on system identification and model validation data are:

(two plots of y(t) (solid) and ymodel(t) (dashed): left, the I/O set used to compute the model; right, the model validation I/O set)

loss of predictive ability when n is too large is called overfitting or overmodeling

Linear least-squares 8-21


Outline

• definition

• examples and applications

• solution of a least-squares problem, normal equations


Geometric interpretation of a LS problem

minimize ‖Ax − b‖²

A is m× n with columns a1, . . . , an

• ‖Ax− b‖ is the distance of b to the vector

Ax = x1a1 + x2a2 + · · ·+ xnan

• solution xls gives the linear combination of the columns of A closest to b

• Axls is the projection of b on the range of A

Linear least-squares 8-22


example

A =
[ 1  −1 ]
[ 1   2 ]
[ 0   0 ]

b =
[ 1 ]
[ 4 ]
[ 2 ]

(figure: b, the columns a1 and a2, and the projection Axls = 2a1 + a2 onto range(A))

least-squares solution xls

Axls =
[ 1 ]
[ 4 ]
[ 0 ]

xls =
[ 2 ]
[ 1 ]
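checking this example numerically (a NumPy sketch, not part of the original slides):

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [1.0,  2.0],
              [0.0,  0.0]])
b = np.array([1.0, 4.0, 2.0])

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_ls)      # [2., 1.]
print(A @ x_ls)  # [1., 4., 0.]: the projection of b on range(A)
```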

Linear least-squares 8-23


The solution of a least-squares problem

if A is left-invertible, then

xls = (AᵀA)⁻¹Aᵀ b

is the unique solution of the least-squares problem

minimize ‖Ax − b‖²

• in other words, if x ≠ xls, then ‖Ax − b‖² > ‖Axls − b‖²

• recall from page 4-25 that AᵀA is positive definite and that

(AᵀA)⁻¹Aᵀ

is a left-inverse of A

Linear least-squares 8-24


proof

we show that ‖Ax − b‖² > ‖Axls − b‖² for x ≠ xls:

‖Ax − b‖² = ‖A(x − xls) + (Axls − b)‖²
          = ‖A(x − xls)‖² + ‖Axls − b‖²
          > ‖Axls − b‖²

• 2nd step follows from A(x− xls) ⊥ (Axls − b):

(A(x − xls))ᵀ(Axls − b) = (x − xls)ᵀ(AᵀAxls − Aᵀb) = 0

• 3rd step follows from zero nullspace property of A:

x ≠ xls ⇒ A(x − xls) ≠ 0

Linear least-squares 8-25


The normal equations

(AᵀA)x = Aᵀb

if A is left-invertible:

• least-squares solution can be found by solving the normal equations

• n equations in n variables with a positive definite coefficient matrix

• can be solved using Cholesky factorization
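a sketch of this procedure using SciPy's Cholesky routines (scipy.linalg is an assumption; any Cholesky implementation would do):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lstsq_normal_equations(A, b):
    """Solve minimize ||Ax - b||^2 via the normal equations (A^T A) x = A^T b,
    assuming A is left-invertible (so A^T A is positive definite)."""
    G = A.T @ A                 # n x n, positive definite
    c = A.T @ b
    factor = cho_factor(G)      # Cholesky factorization of A^T A
    return cho_solve(factor, c)

# the small example from page 8-3
A = np.array([[2.0, 0.0], [-1.0, 1.0], [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])
print(lstsq_normal_equations(A, b))   # approximately [ 1/3, -1/3 ]
```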

Linear least-squares 8-26