Numerical Analysis: Solving Nonlinear Equations
Computer Science, Ben-Gurion University
(slides based mostly on Prof. Ben-Shahar’s notes)
2019/2020, Fall Semester
BGU CS Solving Nonlinear Equations (ver. 1.03) AY ’19/’20, Fall Semester 1 / 131
1 Introduction
2 The Bisection Method
3 The Regula Falsi (False Position) Method
4 The Secant Method
5 Convergence
6 Fixed Point
7 The Newton (-Raphson) Method
The Method
Using Newton's Method to Find a Square Root
A Root with Multiplicity
Introduction
Our next topic for the next several lectures is solving nonlinear equations:
f : R → R, f(x) = 0, x = ?
If x satisfies f(x) = 0, then it's called a root of f.
The seemingly more general case of solving
f : R → R, f(x) = b, x = ?
can be rewritten in the original form above by defining a new function:
f_new : R → R, f_new : x ↦ f(x) − b, f_new(x) = 0, x = ?
Note:
f_new(x) = 0 ⟺ f(x) = b
Likewise,
f : R → R, g : R → R, f(x) = g(x), x = ?
can be handled via
f_new : R → R, f_new : x ↦ f(x) − g(x), f_new(x) = 0, x = ?
Note:
f_new(x) = 0 ⟺ f(x) = g(x)
Thus, WLOG, we will focus attention on solving f(x) = 0.
The linear case
If f has the form f(x) = ax + b (that is, f is affine), where a and b are known real numbers, then we say the equation is linear, and the solution to
f(x) = 0
is given (assuming a ≠ 0), of course, by
x = −b/a.
Some Nonlinear Equations are Easy
Sometimes we know how to solve nonlinear equations. Examples:
x³ − 8 = 0 ⇒ x = 2 (one real solution)
ax² + bx + c = 0 ⇒ x = (−b ± √(b² − 4ac)) / (2a) (at most two real solutions)
sin(x) = 0 ⇒ x = kπ, k ∈ Z (infinitely many solutions)
Nonlinear Equations
In general, however, if the equation is nonlinear we usually don't know how to solve it analytically.
Example: The Ladder in the Mine (based on Example 0.1 from Gerald and Wheatley)
a, w1, and w2 are fixed (determined by the mine's geometry).
Find the longest ladder (of 0 thickness) that can make the turn.
The maximal ladder's length, at a given angle c, is L = L1 + L2, where
L1 = w1 / sin(b),  L2 = w2 / sin(c),  b = π − a − c
L = L1 + L2 = w1 / sin(π − a − c) + w2 / sin(c)
The angle c determines b (and vice versa). Thus, L = func(c).
Example: The Ladder in the Mine (continued)
L(c) = w1 / sin(π − a − c) + w2 / sin(c)
The longest ladder that can pass is
min_c L(c).
Yes, minimum, not maximum: it's the bottleneck that matters.
Example: The Ladder in the Mine (continued)
min_c L(c) = min_c [ w1 / sin(π − a − c) + w2 / sin(c) ]
From calculus, we know that c* ≜ argmin_c L(c) satisfies
dL/dc |_{c = c*} = 0.
Thus, we need to solve, for the unknown c, the following equation:
w1 cos(π − a − c) / sin²(π − a − c) − w2 cos(c) / sin²(c) = 0
Equivalently:
−w1 cos(a + c) / sin²(a + c) − w2 cos(c) / sin²(c) = 0
How can we solve this?
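Anticipating the methods developed in the following sections, this equation can be solved numerically by a sign-change (bracketing) search. The concrete values below (a = π/2, w1 = w2 = 1) are illustrative assumptions, not from the example, chosen so the answer is easy to check by hand: by symmetry c = π/4 and L = 2√2.

```python
import math

# Illustrative sketch: a right-angle turn (a = pi/2) between two
# unit-width corridors (w1 = w2 = 1); these values are assumptions.
w1, w2, a = 1.0, 1.0, math.pi / 2

def dL(c):
    """dL/dc for L(c) = w1/sin(pi - a - c) + w2/sin(c), to be driven to 0."""
    return (-w1 * math.cos(a + c) / math.sin(a + c) ** 2
            - w2 * math.cos(c) / math.sin(c) ** 2)

# dL changes sign on (0.1, 1.5); repeatedly halve the bracket.
lo, hi = 0.1, 1.5
for _ in range(60):
    mid = (lo + hi) / 2
    if dL(lo) * dL(mid) < 0:
        hi = mid
    else:
        lo = mid
c = (lo + hi) / 2

L = w1 / math.sin(math.pi - a - c) + w2 / math.sin(c)
print(c, L)  # by symmetry, c = pi/4 and L = 2*sqrt(2) here
```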
Nonlinear Equations
Sometimes, in fact, even if a solution exists, an analytical form for it doesn't exist.
For example, the Abel-Ruffini theorem (also known as Abel's impossibility theorem) states that this is the case for general polynomials of degree higher than 4.
The theorem is named after Paolo Ruffini, who made an incomplete proof in 1799, and Niels Henrik Abel, who provided a proof in 1824.
Finding the Square Root of a Number
Another example we mentioned earlier: how can we find the square root of a number using only the 4 arithmetic operations (addition, subtraction, multiplication, and division) and the operation of comparison?
Example
x ∈ R_{>0},  x² − 5 = 0,  x = ?
Our discussion so far motivates the need for non-analytical methods, based on simple operations, as a means of solving
f : R → R, f(x) = 0, x = ?
Trial and Error
One possible approach is trial and error.
Its disadvantage is that very little can be said, in general, about its expected performance in terms of how fast it will find an approximate solution.
Requirements from Numerical Methods
So we want methods that not only approximate a root but also lend themselves to some useful analysis.
More generally (than the context of solving nonlinear equations), here are our requirements from numerical methods:
Speed
Reliability
Ease of use
Ease of analysis
The Bisection Method
Sometimes, if a certain property holds for f in a certain domain (e.g., some interval), it guarantees the existence of a root in that domain.
The bisection method is an example of a method that exploits such a relation, together with iterations, to find a root of a function.
The bisection method (AKA "interval halving" or "binary search") is a particular case of the so-called bracketing methods.
Bracketing methods determine successively smaller intervals (brackets) that contain a root.
The bisection method utilizes the Intermediate Value Theorem.
Theorem (intermediate value)
Let f ∈ C([a, b]). If L ∈ R is a number between f(a) and f(b), that is,
min(f(a), f(b)) < L < max(f(a), f(b)),
then there exists c ∈ (a, b) such that f(c) = L.
C([a, b]) is the set of all continuous functions from [a, b] to R.
A Useful Variant in Our Context
In particular (taking L = 0):
f ∈ C([a, b]) and f(a) · f(b) < 0 ⇒ ∃c ∈ (a, b) such that f(c) = 0.
In other words, if f(a) and f(b) have opposite signs, then f must vanish at some c ∈ (a, b).
Definition (limit of a function at a point)
Let a function f be defined in an open interval containing x0, except possibly at x0 itself. Then f has limit L at x = x0, denoted
lim_{x→x0} f(x) = L,
if ∀ε > 0 ∃δ > 0 such that
0 < |x − x0| < δ ⇒ |f(x) − L| < ε.
Definition (continuity of a function at a point)
Let f be defined over the open interval (a, b) and let x0 ∈ (a, b). The function f is said to be continuous at x = x0 if
lim_{x→x0} f(x) = f(x0).
Definition (continuity of a function over an open interval)
A function f is called continuous over the open interval (a, b) if it is continuous at every x ∈ (a, b).
The definitions can be extended to the cases of [a, b), (a, b], or [a, b] using one-sided limits at the relevant endpoints.
Notation
Cⁿ([a, b]) ≜ {f | f : [a, b] → R, f and its first n derivatives are continuous}
Thus, C⁰([a, b]) is simply C([a, b]) (defined on a previous slide).
Example
f : x ↦ x^{4/3} is in C¹([−1, 1]) but not in C²([−1, 1]), since
f′(x) = (4/3) x^{1/3},  f′′(x) = (4/9) x^{−2/3} = 4 / (9 x^{2/3}),
so f′′ is discontinuous at x = 0.
The variant of the intermediate value theorem suggests the following algorithm.
Input: [a, b] (a bracket) such that f(a) · f(b) < 0; δ > 0 (a tolerance value)
Output: z̃ such that |z̃ − z| < δ, where z is a root of f (i.e., f(z) = 0)
1 repeat
2   z̃ ← (a + b) / 2
3   if f(a) · f(z̃) < 0 then
4     b ← z̃
5   else
6     a ← z̃
7 until |b − a| < 2δ
8 z̃ ← (a + b) / 2
Algorithm 1: The Bisection Method
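A direct transcription of the algorithm into Python (a sketch; the function f and the bracket below are illustrative, not from the slides):

```python
def bisection(f, a, b, delta):
    """Return z with |z - root| < delta, given f(a) * f(b) < 0."""
    assert f(a) * f(b) < 0, "need a sign change on [a, b]"
    while abs(b - a) >= 2 * delta:
        z = (a + b) / 2
        if f(a) * f(z) < 0:   # root lies in [a, z]
            b = z
        else:                 # root lies in [z, b]
            a = z
    return (a + b) / 2

# Example: the square root of 5 as the root of x^2 - 5 on [2, 3].
root = bisection(lambda x: x * x - 5, 2.0, 3.0, 1e-10)
print(root)  # ~2.2360679...
```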
Example
Figure: Wikipedia
Example (Multiple Roots)
Properties of the Bisection Method
Always works (assuming f is continuous): i.e., convergence to a true root is guaranteed:
lim_{n→∞} z_n = z
Why? Because the approximation error after n iterations satisfies
E(n) < |b − a| / 2ⁿ
(so E(n) → 0).
The number of iterations required for a given approximation accuracy, δ, is known beforehand:
(b − a) / 2ⁿ ≤ δ ⇒ n ≥ log₂((b − a) / δ)
Properties of the Bisection Method (Continued)
Convergence is relatively slow (i.e., a relatively large number of iterations is needed); we will discuss this topic later on.
E(n) might not be a monotonically decreasing function of n.
A big advantage: it is fairly easy to find a good initial guess. As we will see, in other methods convergence is guaranteed only when the initial guess is close to the root; this requirement is unnecessary in the bisection method.
Multiple roots at which the function doesn't change sign will be missed.
The Regula Falsi (False Position) Method
One disadvantage of the bisection method is that, apart from the continuity of f on [a, b] and the opposite signs of f(a) and f(b), it does not exploit any additional knowledge we may have about f.
In particular, bisection does not exploit the fact that, in the vicinity of the root, we can approximate f using functions that may be easier to handle, in order to get a more educated guess about the root at each iteration.
Linear Approximation
For example, it is easy to approximate the function f via a "linear" function¹ for which it is easy to find a root.
¹This is the usual misnomer: this function is usually affine, not linear.
c is easily found via similar triangles. Let l : [a, b] → R denote the "linear" function. By construction, l(a) = f(a), l(b) = f(b), and l(c) = 0. Thus:
(c − a) / (0 − f(a)) = (b − c) / (f(b) − 0)   [by similar triangles]
⇒ (c − a) / (−f(a)) = (b − c) / f(b)
⇒ (c − a) f(b) = (c − b) f(a)
⇒ c f(b) − a f(b) = c f(a) − b f(a)
⇒ c [f(b) − f(a)] = a f(b) − b f(a)
⇒ c = (a f(b) − b f(a)) / (f(b) − f(a))
The following manipulation is useful:
c = (a f(b) − b f(a)) / (f(b) − f(a))
  = (a f(b) − b f(a) + b f(b) − b f(b)) / (f(b) − f(a))   [add and subtract b f(b)]
  = (f(b)(a − b) + b(f(b) − f(a))) / (f(b) − f(a))
  = b − f(b) (b − a) / (f(b) − f(a))
⇒ c = b − f(b) (b − a) / (f(b) − f(a))   (5 arithmetic operations)
Let m_f(a, b) ≜ (f(b) − f(a)) / (b − a) denote the line's slope. Then
c = b − f(b) / m_f(a, b)
We will later run into expressions of this type again.
The Process Can Be Repeated
Figure: Wikipedia
Regula Falsi (False Position)
We can now formulate a new bracketing algorithm.
Input: x0, x1: initial guesses s.t. f(x0) · f(x1) < 0; δ > 0 (a tolerance value)
Output: z such that |f(z)| < δ // note the difference from the bisection
1 i ← 1
2 repeat
3   i ← i + 1
4   x_i ← x_{i−1} − f(x_{i−1}) (x_{i−1} − x_{i−2}) / (f(x_{i−1}) − f(x_{i−2}))
5   if f(x_i) · f(x_{i−1}) < 0 then
6     x_{i−2} ← x_i
7   else
8     x_{i−1} ← x_i
9 until |f(x_i)| < δ // Should we check |x_i − x_{i−1}| < δ? No. See item 4 on the next slide
10 z ← x_i
Algorithm 2: The Regula Falsi Method
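A Python sketch of the algorithm (note the stopping test is on |f|, unlike bisection); the function f and the starting bracket are illustrative:

```python
def regula_falsi(f, x0, x1, delta, max_iter=1000):
    """Return z with |f(z)| < delta, given f(x0) * f(x1) < 0."""
    assert f(x0) * f(x1) < 0, "need a sign change"
    for _ in range(max_iter):
        # Intersection of the secant through (x0, f(x0)), (x1, f(x1)) with y = 0.
        x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
        if abs(f(x2)) < delta:
            return x2
        if f(x2) * f(x1) < 0:
            x0 = x2          # root is between x2 and x1
        else:
            x1 = x2          # root is between x0 and x2
    return x2

root = regula_falsi(lambda x: x * x - 5, 2.0, 3.0, 1e-12)
print(root)  # ~2.2360679...
```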
Properties
1 Convergence to the root is guaranteed:
lim_{i→∞} x_i = z (where f(z) = 0)
2 May converge faster or slower than the bisection method, depending on f. Performance improves the better the function can be approximated by a line on [a, b].
3 The number of iterations can't be predicted in advance.
4 The bracket might not shrink to zero (despite property 1).
5 The calculations per iteration are more expensive than in bisection:
3 additions/subtractions
1 multiplication
1 division
while a bisection iteration requires only one addition and one division.
The Secant Method
Like the regula falsi method, it is based on a linear approximation.
Unlike regula falsi and bisection, this is not a bracketing method. (In particular, this method allows starting from any pair of points, even if the root is not between them.)
Might fail to converge.
Again, by triangle similarity:
(x1 − x2) / f(x1) = (x0 − x2) / f(x0)
⇒ x_i = x_{i−1} − f(x_{i−1}) (x_{i−1} − x_{i−2}) / (f(x_{i−1}) − f(x_{i−2}))
Note that (x_{i−1} − x_{i−2}) / (f(x_{i−1}) − f(x_{i−2})) is the reciprocal of the line's slope.
The Secant Method
Input: x0, x1: initial guesses near the root (which makes choosing a good initial guess harder); δ > 0 (a tolerance value)
Output: z such that |f(z)| < δ // as in regula falsi
/* The repeat-until loop below: similar to regula falsi, but without the if/else */
1 i ← 1
2 repeat
3   i ← i + 1
4   x_i ← x_{i−1} − f(x_{i−1}) (x_{i−1} − x_{i−2}) / (f(x_{i−1}) − f(x_{i−2}))
5 until |f(x_i)| < δ
6 z ← x_i
Algorithm 3: The Secant Method
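A Python sketch of the algorithm. Since there is no bracket to maintain, the loop body is a single update; the function f, the starting points, and the failure handling are illustrative assumptions:

```python
def secant(f, x0, x1, delta, max_iter=100):
    """Return z with |f(z)| < delta, or raise if the iteration stalls."""
    for _ in range(max_iter):
        denom = f(x1) - f(x0)
        if denom == 0:                     # flat secant: the method fails
            raise ZeroDivisionError("secant slope vanished")
        # Tuple assignment uses the old x0, x1 on the right-hand side.
        x0, x1 = x1, x1 - f(x1) * (x1 - x0) / denom
        if abs(f(x1)) < delta:
            return x1
    raise RuntimeError("no convergence (the secant method may diverge)")

root = secant(lambda x: x * x - 5, 2.0, 3.0, 1e-12)
print(root)  # ~2.2360679...
```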
Properties
A natural extension of regula falsi, without bracketing.
If it converges, it tends to do so faster than the bisection and regula falsi methods.
The price: it might fail to converge.
A Failure Case
Figure: Wikipedia
Convergence
Convergence of Numerical Methods
So far we have seen 3 methods for computing (approximate) roots of equations of the form f(x) = 0.
One of the aspects in which these methods differ from each other is the speed at which they "lead" us to the answer, i.e., their convergence rate.
Example
In this example, the secant is clearly the fastest while the bisection is the slowest.
Figure: Gerald and Wheatley
One of the goals of the field of Numerical Analysis is to characterize such relative performance in a more principled way.
For this, we need some definitions. . .
Definition (Big O notation for sequences)
Let f_n and g_n be two sequences. We write
f_n = O(g_n)
if there exist constants N ∈ Z and C ∈ R_{>0} such that
|f_n| ≤ C |g_n|  ∀n > N
Definition (Big O notation for functions)
Let f and g be two functions from R to R. We write
f(x) = O(g(x))  (as x → ∞)
⟺ there exist constants x0 ∈ R and C ∈ R_{>0} such that
|f(x)| ≤ C |g(x)|  ∀x > x0
(often the "x → ∞" is omitted and implicitly assumed).
This definition lets us describe the rate of decay (or growth) of functions/sequences in terms of well-known functions/sequences such as n^p, n^{1/p}, aⁿ, log_a n, etc.
What is the connection between sequences and the convergence of numerical calculations?
It turns out that in many numerical calculations we can recognize several types of sequences.
Of particular importance is the sequence that represents the approximation error as a function of the iteration number.
Definition (approximation-error sequence)
Let {x_n} be a sequence approximating the true value x. The approximation-error sequence is
e_n ≜ x_n − x.
Definition (order of convergence)
Let {x_n} be a sequence such that
lim_{n→∞} x_n = x.
The order of convergence of {x_n} is R > 0 if there exists A,
0 < A < ∞,
such that
lim_{n→∞} |x_{n+1} − x| / |x_n − x|^R = lim_{n→∞} |e_{n+1}| / |e_n|^R = A.
In that case, the number A is called the asymptotic error constant.
The order of convergence is defined w.r.t. the true value, x, which is usually unknown in numerical calculations.
Still, we will see that this definition enables practical calculations.
From the definition of O(·), we can say that the order of convergence is R if
e_{n+1} = O(e_n^R).
Remark
In principle, one could think of more general types of convergence, such as
e_{n+1} = O(g(e_n)),
but their use in practice is limited.
Particular Cases
R = 1 ⇒ lim_{n→∞} |e_{n+1}| / |e_n| = A ∈ (0, 1) ⇒ linear convergence.
R = 2 ⇒ lim_{n→∞} |e_{n+1}| / |e_n|² = A ∈ (0, ∞) ⇒ quadratic convergence.
Comments
When R = 1 we must have 0 < A < 1 for convergence, since otherwise the error does not decrease (if A = 1) or might even increase (if A > 1).
When R = 2, A doesn't have to be smaller than 1.
The order of convergence, R, does not have to be an integer.
If
lim_{n→∞} |e_{n+1}| / |e_n| = 0,
the convergence (with R = 1) is called superlinear.
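The order R can be estimated empirically from three consecutive errors: if |e_{n+1}| ≈ A |e_n|^R, then R ≈ log(|e_{n+2}| / |e_{n+1}|) / log(|e_{n+1}| / |e_n|). A minimal sketch on two synthetic error sequences (the numbers are assumptions for illustration, not from the slides):

```python
import math

def estimate_order(e0, e1, e2):
    """Estimate R from three consecutive absolute errors, using
    |e_{n+1}| ~ A |e_n|^R  =>  R ~ log(e2/e1) / log(e1/e0)."""
    return math.log(e2 / e1) / math.log(e1 / e0)

# Errors halving each step (like the bisection bound): linear, R = 1.
r_linear = estimate_order(0.5, 0.25, 0.125)

# Errors squaring each step (e_{n+1} = e_n^2): quadratic, R = 2.
r_quadratic = estimate_order(0.5, 0.25, 0.0625)

print(r_linear, r_quadratic)  # -> 1.0 2.0
```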
Fixed Point
Definition (fixed point)
x0 ∈ R is called a fixed point of g : R → R if
g(x0) = x0.
Idea
In order to solve f(x) = 0, reorganize the equation into an equivalent form,
x = g(x),
for some suitable function g, and try to find a fixed point of g; i.e., a point x0 such that g(x0) = x0.
There is always at least one way to define such a g:
f(x0) = 0 ⇒ x0 + f(x0) = x0,
so define
g(x) ≜ x + f(x).
In that case x = x0, a root of f, is a fixed point of g.
Other Ways to Build g
Usually, however, there is more than one way to do it.
Example
f(x) = x² − 2x − 3, so f(x) = 0 ⇒ x = −1, 3. Now:
x = f(x) + x = x² − x − 3 ⇒ g1(x) = x² − x − 3
2x = f(x) + 2x = x² − 3  [÷2] ⇒ g2(x) = (x² − 3) / 2
x² = f(x) + x² = 2x² − 2x − 3  [÷x] ⇒ g3(x) = 2x − 2 − 3/x
−x² + 2x = f(x) − x² + 2x = −3  [÷(2 − x)] ⇒ g4(x) = 3 / (x − 2)
...
In any case, the function g is called the fixed-point iteration function for solving f(x) = 0.
If g is the fixed-point iteration function for solving f(x) = 0 and p is a root of f, then p is (by construction) a fixed point of the iteration
x_{n+1} = g(x_n).
This suggests the following algorithm.
The Fixed-point Method
Input: x1: initial guess close to a root; δ > 0 (a tolerance value)
Output: z, an approximation to the root
1 Rearrange the equation f(x) = 0 into the form x = g(x).
2 i ← 1
3 repeat
4   i ← i + 1
5   x_i ← g(x_{i−1})
6 until |x_i − x_{i−1}| < δ // Note: x values, not f(x) values
7 z ← x_i
Algorithm 4: The fixed-point method
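A Python sketch of the algorithm, tried on the example f(x) = x² − 2x − 3 with the rearrangement g(x) = 3/(x − 2); the starting point x = 0 is an illustrative assumption:

```python
def fixed_point(g, x, delta, max_iter=1000):
    """Iterate x <- g(x) until successive iterates differ by less than delta."""
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < delta:
            return x_new
        x = x_new
    raise RuntimeError("no convergence (|g'| may exceed 1 near the root)")

# g(x) = 3/(x - 2) has |g'(-1)| = 1/3 < 1, so iterates near -1 converge.
root = fixed_point(lambda x: 3 / (x - 2), 0.0, 1e-12)
print(root)  # converges to the root -1
```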
The Fixed-point Method
What does this procedure do?
How can we interpret it graphically?
What are its properties?
To answer these questions, we need some math first.
Theorem
Let g : R → R be a continuous function, and let {x_n}_{n=0}^∞ be a sequence generated via the iteration
x_{n+1} = g(x_n).
If
lim_{n→∞} x_n = p,
then p is a fixed point of g; i.e.,
p = g(p).
In other words: if (that's a big "if". . . ) the iteration converges, then it converges to a fixed point of g, hence a root of f.
Proof.
lim_{n→∞} x_n = p ⇒ lim_{n→∞} x_{n+1} = p
⇒ g(p) = g(lim_{n→∞} x_n) = lim_{n→∞} g(x_n) = lim_{n→∞} x_{n+1} = p
(the second equality holds by continuity, the third by the definition of x_{n+1}).
This is nice, but when can we expect convergence?
Theorem
If g ∈ C([a, b]) and the range of y = g(x) satisfies
∀x ∈ [a, b], y ∈ [a, b]
(i.e., g is continuous and maps [a, b] into itself), then g has a fixed point in [a, b]:
∃x0 ∈ [a, b], g(x0) = x0.
Example (existence of a fixed point)
Remark
Note well! The theorem, which holds when the domain of g is a finite interval [a, b], says nothing about what happens if the domain is unbounded (e.g., R, or [a, ∞), etc.).
Proof.
If g(a) = a or g(b) = b, we are done. Otherwise,
a < g(a) ≤ b and a ≤ g(b) < b.
Thus, f(x) ≜ x − g(x) satisfies
f(a) < 0 and f(b) > 0. (1)
By the intermediate value theorem,
∃c ∈ (a, b) s.t. f(c) = 0.
It follows that c is a fixed point of g:
g(c) = c − f(c) = c, since f(c) = 0 (by the definition of f).
Reminder: Lagrange's Mean Value Theorem
Theorem (mean value)
Let f ∈ C([a, b]) and suppose f′(x) exists for every x ∈ (a, b). Then there exists c ∈ (a, b) such that
f′(c) = (f(b) − f(a)) / (b − a)   (the secant's slope)
Example (mean value theorem)
Figure: Wikipedia
Proof.
Define the function
g(x) ≜ f(x) − [f(a) + (x − a) (f(b) − f(a)) / (b − a)]
where the bracketed term is the secant between (a, f(a)) and (b, f(b)). Thus, g(a) = g(b) = 0. By Rolle's theorem¹, ∃c ∈ (a, b) such that g′(c) = 0. Now:
g′(x) = f′(x) − (f(b) − f(a)) / (b − a)
0 = g′(c) = f′(c) − (f(b) − f(a)) / (b − a)
⇒ f′(c) = (f(b) − f(a)) / (b − a).
¹Any real-valued differentiable function that attains equal values at two distinct points must have at least one point somewhere between them where the first derivative is zero.
Theorem (existence and uniqueness of a fixed point)
If g ∈ C([a, b]), g(x) ∈ [a, b] for all x ∈ [a, b], g′(x) exists on (a, b), and
∀x ∈ (a, b), |g′(x)| ≤ k < 1,
then g has a unique fixed point p ∈ [a, b].
Proof.
Existence follows from the previous theorem; we prove uniqueness by contradiction. Assume there exist two fixed points:
p1, p2 ∈ [a, b], p1 < p2, g(p1) = p1, g(p2) = p2.
By the mean value theorem, there exists d ∈ (p1, p2) such that
g′(d) = (g(p2) − g(p1)) / (p2 − p1) = (p2 − p1) / (p2 − p1) = 1,
violating the assumption that |g′(x)| ≤ k < 1.
Definition (contraction map)
A contraction map is a function f that satisfies
|f(y) − f(x)| ≤ k |y − x|, k < 1,
for every x and y in its domain.
A contraction mapping "shrinks" distances between points.
A contraction mapping need not be differentiable (but it is always Lipschitz continuous, hence uniformly continuous, hence continuous).
A differentiable function need not be a contraction mapping.
A differentiable function f is a contraction mapping ⟺ it satisfies |f′(x)| ≤ k < 1 for some k and every x in its domain.
In this class, we will often just say "contraction mapping" but mean "differentiable contraction mapping".
Thus, we can rephrase the last theorem as follows: if g ∈ C([a, b]), g(x) ∈ [a, b], g′(x) exists on (a, b), and g is a contraction map on (a, b), then g has a unique fixed point p ∈ [a, b].
As we are starting to see, convergence of fixed-point iterations will be guaranteed for contraction mappings.
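An illustrative example (an assumption for demonstration, not from the slides): g(x) = cos(x) is a contraction on [0, 1], since it maps [0, 1] into [cos 1, 1] ⊂ [0, 1] and |g′(x)| = |sin(x)| ≤ sin(1) ≈ 0.84 < 1 there, so pressing the cosine button repeatedly converges to its unique fixed point:

```python
import math

# g(x) = cos(x) is a contraction on [0, 1]: it maps [0, 1] into
# [cos(1), 1] and |g'(x)| = |sin(x)| <= sin(1) < 1 there.
x = 0.5                  # any starting point in [0, 1] works
for _ in range(200):
    x = math.cos(x)
print(x)  # ~0.7390851, the unique solution of cos(x) = x
```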
Theorem
Suppose:
g ∈ C([a, b]);
g has a fixed point p ∈ (a, b);
g′ is defined in (a, b);
∀x ∈ [a, b], g(x) ∈ [a, b];
x0 ∈ (a, b).
If g is a contraction mapping, then the iteration x_n = g(x_{n−1}) will converge to p; in that case, p is called an attractive fixed point.
If g satisfies
|g′(x)| > 1
for every x ∈ [a, b], then the iteration x_n = g(x_{n−1}) will (locally) diverge; in that case p is called an expelling fixed point.
Proof.
We will prove the convergent case. Note that {x_n}_{n=0}^∞ ⊂ (a, b). Now:
|p − x_n| = |g(p) − x_n|   [since p is a fixed point]
= |g(p) − g(x_{n−1})|   [by the definition of x_n]
= |g′(c_{n−1})(p − x_{n−1})| = |g′(c_{n−1})| · |p − x_{n−1}|   [mean value theorem]
≤ k |p − x_{n−1}|   [contraction]
for some k ∈ (0, 1); in particular, |p − x_n| < |p − x_{n−1}|. We want to show lim_{n→∞} |p − x_n| = 0. By induction,
|p − x_n| ≤ kⁿ |p − x0|.
Thus,
0 ≤ lim_{n→∞} |p − x_n| ≤ lim_{n→∞} kⁿ |p − x0| = |p − x0| · lim_{n→∞} kⁿ = |p − x0| · 0 = 0.
The divergent case is handled similarly, by showing that
|g′(x)| > 1 ⇒ |p − x_n| > |p − x_{n−1}|,
in which case we get further and further from p.
The divergence is not to ±∞, as the sequence is contained in (a, b); hence the term "locally".
The theorem addresses the cases where |g′(x)| < 1 for every x ∈ (a, b) and where |g′(x)| > 1 for every x ∈ (a, b). It does not handle other cases.
Example: Convergence
Example: Divergence
More Examples
Figure: Justin Solomon's book
Example: 0 < K ≪ 1 ⇒ Fast Convergence
Example: 0 < K < 1 & K ≈ 1 ⇒ Slow Convergence
Order of Convergence
Recall that, in proving the convergent case of the theorem, we saw
|p − x_{n+1}| ≤ k |p − x_n|,
i.e.,
|e_{n+1}| ≤ k |e_n|.
Thus,
lim_{n→∞} |e_{n+1}| / |e_n| ≤ k < 1.
So there is at least linear convergence. (With additional knowledge, we may be able to show better rates.)
This result can be generalized and extended. But first, we need another reminder, of a result that generalizes the Mean Value Theorem.
Theorem (Taylor)
If f ∈ C^{n+1}([a, b]), then
f(x + h) = Σ_{k=0}^{n} (h^k / k!) f^{(k)}(x) + E_n(h),  ∀x, x + h ∈ [a, b],
where
E_n(h) = (h^{n+1} / (n+1)!) f^{(n+1)}(ξ)
for some ξ ∈ [x, x + h]. Moreover, E_n(h) = O(h^{n+1}).
Taylor
f(x + h) = f(x) + h f′(x) + (h²/2) f′′(x) + (h³/6) f′′′(x) + . . . + (hⁿ/n!) f^{(n)}(x) + E_n(h) = P_n(h) + E_n(h)
where P_n(h) is a polynomial of degree n in h and E_n(h) is the error term.
Remark
Taking n = 0, we get that there exists some ξ ∈ [x, x + h] such that
f(x + h) = f(x) + h · f′(ξ).
Equivalently, there exists some ξ ∈ [x, x + h] such that
(f(x + h) − f(x)) / h = f′(ξ),
and this is exactly the mean value theorem.
Back to Convergence of Fixed-Point Iteration
e_n = x_n − p ⇒ x_n = p + e_n
|e_{n+1}| = |p − x_{n+1}| = |g(p) − g(x_n)| = |g(x_n) − g(p)|
= | g(p) + Σ_{k=1}^{q−1} (e_n^k / k!) g^{(k)}(p) + (e_n^q / q!) g^{(q)}(c_n) − g(p) |   [Taylor expansion of g(p + e_n) about p]
= | Σ_{k=1}^{q−1} (e_n^k / k!) g^{(k)}(p) + (e_n^q / q!) g^{(q)}(c_n) |
for some c_n ∈ (p, x_n).
We have just obtained that
|e_{n+1}| = | Σ_{k=1}^{q−1} (e_n^k / k!) g^{(k)}(p) + (e_n^q / q!) g^{(q)}(c_n) |
for some c_n ∈ (p, x_n).
⇒ the order of convergence of a fixed-point iteration is the smallest q such that g^{(q)}(p) ≠ 0, since then
lim_{n→∞} |e_{n+1}| / |e_n|^q = lim_{n→∞} (1/q!) |g^{(q)}(c_n)|
or, since x_n → p,
lim_{n→∞} |e_{n+1}| / |e_n|^q = (1/q!) |g^{(q)}(p)|.
Recall that, in order to solve f(x) = 0, we can construct multiple fixed-point iterations:
x_{i+1} = g1(x_i)
x_{i+1} = g2(x_i)
x_{i+1} = g3(x_i)
...
How should we choose among these?
Answer: the considerations are
Order of convergence
Number of calculations per iteration
Ease of finding an initial guess (the size of the range around the root such that |g′(x)| < 1 and g maps into it)
Usually we will prefer the g with |g′(p)| as small as possible.
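These considerations can be seen on the earlier example f(x) = x² − 2x − 3, near the root p = −1: g1(x) = x² − x − 3 has |g1′(−1)| = 3 > 1 (expelling), while g4(x) = 3/(x − 2) has |g4′(−1)| = 1/3 < 1 (attractive). A sketch (the starting point −0.9 and iteration counts are illustrative assumptions):

```python
def iterate(g, x, n):
    """Apply x <- g(x) a fixed number of times."""
    for _ in range(n):
        x = g(x)
    return x

# g4: |g4'(-1)| = 1/3, so the error shrinks by ~1/3 per step.
x_g4 = iterate(lambda x: 3 / (x - 2), -0.9, 50)
print(x_g4)                 # converges to -1

# g1: |g1'(-1)| = 3, so even a good start is pushed away from -1.
x_g1 = iterate(lambda x: x * x - x - 3, -0.9, 5)
print(abs(x_g1 + 1) > 0.1)  # True
```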
The Newton (-Raphson) Method
Both the secant method and the regula falsi method exploit the advantage of approximating f via a "linear" function near the current guess in order to get us closer to the root.
Reminder: here is the iteration in these methods:
x_{i+1} = x_i − f(x_i) (x_i − x_{i−1}) / (f(x_i) − f(x_{i−1}))
or, equivalently,
x_{i+1} = x_i − f(x_i) / m_{i−1,i}
where m_{i−1,i} is the slope of the secant between x_{i−1} and x_i.
The Newton (-Raphson) Method
Newton's method obviates the need for two points; instead, it approximates f at the current guess via its tangent:
x_{i+1} = x_i − f(x_i) / f′(x_i).
(We will derive this expression soon.)
Note that the same result would have been obtained by taking the limit x_{i−1} → x_i in the secant method's expression.
Example (tangent line)
Deriving the Iteration
The tangent line y(x) = m x + b at x_i satisfies:
y(x_{i+1}) = m · x_{i+1} + b = 0
y(x_i) = m · x_i + b = f(x_i)
Thus, by subtraction,
m x_i − m x_{i+1} = f(x_i).
The slope, by the definition of the tangent line, is
m = f′(x_i).
Thus,
f′(x_i)(x_i − x_{i+1}) = f(x_i)
⇒ x_{i+1} = x_i − f(x_i) / f′(x_i).
Newton's Method

Input: x0, an initial guess near the root; δ > 0, a tolerance value
Output: z such that |f(z)| < δ

1  i ← 0
2  repeat
3      i ← i + 1
4      x_i ← x_{i−1} − f(x_{i−1}) / f'(x_{i−1})
5  until |f(x_i)| < δ    // or until |x_i − x_{i−1}| < δ
6  z ← x_i

Algorithm 5: Newton's Method
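Algorithm 5 can be sketched in Python as follows; the max_iter safeguard is an addition of this sketch, since the pseudocode's loop need not terminate if x0 is far from a root:

```python
def newton(f, fprime, x0, delta=1e-12, max_iter=50):
    """Newton's method (Algorithm 5): x <- x - f(x)/f'(x) until |f(x)| < delta."""
    x = x0
    for _ in range(max_iter):
        if abs(f(x)) < delta:
            return x
        x = x - f(x) / fprime(x)
    raise RuntimeError("no convergence; try a different initial guess")

# Example: f(x) = x^3 - 8, whose root is x = 2.
root = newton(lambda x: x**3 - 8, lambda x: 3 * x**2, x0=3.0)
print(root)  # ≈ 2.0
```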
Example (Newton's method iterations)

Figure: http://tutorial.math.lamar.edu/
Note that it is important to start near the root, for otherwise the iteration might diverge.

But how close to the root do we need to be? Under what conditions will we achieve convergence?
Example (Newton's method failure: an endless loop)

Figure: Numerical Recipes in C++
Example (Newton's method failure: an endless loop)

Figure: Gerald and Wheatley
Example (Newton's method failure: divergence)
Example (Newton's method failure: zero derivative)

Figure: Numerical Recipes in C++
Theorem (convergence of Newton's method)

Let f ∈ C²([a, b]) and let p ∈ [a, b] be such that f(p) = 0. If f'(p) ≠ 0, then ∃δ > 0 such that the sequence defined via

x_{i+1} = x_i − f(x_i)/f'(x_i)

converges to p for every x0 ∈ (p − δ, p + δ).

The proximity to the root and the continuity of f'' may not seem like intuitive requirements; the reason behind them is revealed through the proof.
Proof.

Newton's method is a fixed-point iteration of the form x_{i+1} = g(x_i), with g(x) := x − f(x)/f'(x). By the fixed-point convergence theorem, convergence is guaranteed in an open interval containing the root if g is a contraction mapping on it. This requires

|g'(x)| = |1 − ([f'(x)]² − f(x)·f''(x)) / [f'(x)]²| = |f(x)f''(x) / [f'(x)]²| < 1

in some interval about p. Since f(p) = 0 and f'(p) ≠ 0, it follows that g'(p) = 0. So, to ensure the existence of an interval (p − δ, p + δ) on which |g'(x)| < 1, it suffices to require continuity of g' on it.

As |g'(x)| = |f(x)f''(x) / [f'(x)]²|, this continuity holds when f ∈ C²([p − δ, p + δ]) (and f' ≠ 0 there, which holds near p by continuity).

In other words: if p ∈ [a, b] and f ∈ C²([a, b]), then ∃δ > 0 such that (p − δ, p + δ) ⊆ [a, b] and g(x) := x − f(x)/f'(x) is a contraction mapping on (p − δ, p + δ), implying the iteration's convergence.
Here is the theorem again:

Theorem (convergence of Newton's method)

Let f ∈ C²([a, b]) and let p ∈ [a, b] be such that f(p) = 0. If f'(p) ≠ 0, then ∃δ > 0 such that the sequence defined via

x_{i+1} = x_i − f(x_i)/f'(x_i)

converges to p for every x0 ∈ (p − δ, p + δ).

From the theorem's assumptions and what we saw in its proof:

f(p) = 0,  f'(p) ≠ 0
g(x) = x − f(x)/f'(x),  |g'(x)| = |f(x)f''(x)/(f'(x))²|
g(p) = p,  g'(p) = 0
Newton's Method: Order of Convergence

Identifying the order of convergence here is an immediate consequence of what we learned for fixed-point iterations.

Reminder: the order of convergence of a fixed-point iteration is the smallest q such that g^(q)(p) ≠ 0.

For Newton's method, we just saw that (if it converges)

g'(p) = 0,

hence the convergence order of Newton's method is 2.
Deriving it Again, Specifically for Newton's Case

Now:

|e_{n+1}| = |x_{n+1} − p| = |g(x_n) − g(p)| = |g(p + e_n) − g(p)|

Expanding g(p + e_n) in a 2nd-order Taylor expansion (for some ξ_n between p and p + e_n):

|e_{n+1}| = |g(p) + e_n·g'(p) + (e_n²/2)·g''(ξ_n) − g(p)|
         = |e_n·g'(p) + (e_n²/2)·g''(ξ_n)|
         = |e_n · f(p)f''(p)/(f'(p))² + (e_n²/2)·g''(ξ_n)|
         = |(e_n²/2)·g''(ξ_n)|    (using f(p) = 0, f'(p) ≠ 0)

⇒ |e_{n+1}| = |(e_n²/2)·g''(ξ_n)|

Thus,

lim_{n→∞} |e_{n+1}|/|e_n|² = lim_{n→∞} |(1/2)·g''(ξ_n)| = |(1/2)·g''(p)|

(since ξ_n → p and g'' is continuous).
Just saw:

lim_{n→∞} |e_{n+1}|/|e_n|² = |(1/2)·g''(p)|

Is the RHS finite?

g''(x) = d/dx [ f(x)f''(x) / [f'(x)]² ] = [ (f·f''' + f'·f'')(f')² − (f·f'')(2f''f') ] / (f')⁴

Since f(p) = 0:

g''(p) = (f'(p)f''(p))(f'(p))² / (f'(p))⁴ = f''(p)/f'(p)

⇒ lim_{n→∞} |e_{n+1}|/|e_n|² = A := (1/2)|f''(p)/f'(p)|

and A < ∞ if f'(p) ≠ 0, which is exactly one of the requirements in the convergence theorem for Newton's method.
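As a sanity check (an illustration added here, not in the original slides), we can observe this limit numerically for f(x) = x² − 2, whose root is p = √2; there f'(p) = 2√2 and f''(p) = 2, so A = 1/(2√2) ≈ 0.3536:

```python
import math

f = lambda x: x * x - 2.0
fp = lambda x: 2.0 * x
p = math.sqrt(2.0)

x = 3.0          # initial guess
ratios = []
for _ in range(4):
    e = abs(x - p)                     # current error e_n
    x = x - f(x) / fp(x)               # one Newton step
    ratios.append(abs(x - p) / e**2)   # |e_{n+1}| / |e_n|^2

print(ratios)  # the ratios approach 1/(2*sqrt(2)) ≈ 0.3536
```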
The Newton (-Raphson) Method Using Newton’s Method to Find a Square Root
Finding a Square Root
Goal: Given M > 0, compute √M to any given accuracy, using only arithmetic operations.
Method: Newton’s iterations.
√M is a root of

f(x) = x² − M

In other words, we want to solve the equation

x² − M = 0.

According to Newton's method, the iteration is

x_{i+1} = x_i − f(x_i)/f'(x_i)
⇒ x_{i+1} = x_i − (x_i² − M)/(2x_i)
⇒ x_{i+1} = (1/2)(x_i + M/x_i)

This computation requires:
- 1 division
- 1 addition
- 1 division by 2 (a right shift of the bits)
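In Python, the iteration looks like this (a sketch added here; the iteration count is fixed for simplicity):

```python
def newton_sqrt(M, x0, iters=6):
    """Newton's iteration for sqrt(M): x <- (x + M/x)/2.
    Each step uses one division, one addition, and one halving."""
    x = x0
    for _ in range(iters):
        x = 0.5 * (x + M / x)
    return x

print(newton_sqrt(2.0, 1.5))  # ≈ 1.4142135623730951
```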
How to Pick an Initial Guess?

Given M, we want an initial guess that will satisfy the conditions that guarantee convergence.

g(x) = (1/2)(x + M/x)  ⇒  |g'(x)| = (1/2)|1 − M/x²|

Want:

(1/2)|1 − M/x²| < 1
Equivalently, we want:

−2 < 1 − M/x² < 2

Inspecting the right inequality:

1 − M/x² < 2  ⇐⇒  −M/x² < 1  ⇐⇒  M/x² > −1  ⇐⇒  −M < x²

so this inequality always holds.
As for the left inequality:

−2 < 1 − M/x²  ⇐⇒  −3 < −M/x²  ⇐⇒  3 > M/x²  ⇐⇒  x > √(M/3)

(taking x > 0).
x > √(M/3)

So, it seems we have solved our problem: we "just" need to start at some x0 > √(M/3).

The problem, of course, is that this requires us to compute a square root, which is exactly the problem we are trying to solve.
Idea: transform the problem to a range where it is easy to pick a safe initial guess, apply Newton's method to the new problem, and then transform the result back to the original problem.
For every M > 0, there exist a real number m with 1/4 ≤ m ≤ 1 and an integer e such that

M = m · 4^e

Thus,

√M = √m · 2^e
Since 1/4 ≤ m ≤ 1, we know that

1/2 ≤ √m ≤ 1

We can get a good initial guess for finding √m using a linear approximation, without having to use √· itself.

The line through the two points (m = 1/4, √m = 1/2) and (m = 1, √m = 1) is

(1/3)(2m + 1)
Taking x0 = (1/3)(2m + 1) ensures x0 > √(m/3).

Indeed:

(1/3)(2m + 1) > √(m/3)
(1/9)(2m + 1)² > m/3
(2m + 1)² > 3m
4m² + 4m + 1 > 3m
4m² + m + 1 > 0

and the last inequality always holds since m > 0.

Thus, convergence of Newton's iterations is guaranteed.
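Putting the pieces together, here is a sketch of the complete scheme in Python: range reduction to m ∈ [1/4, 1], the safe initial guess x0 = (2m + 1)/3, a few Newton steps, and rescaling by 2^e. (The name my_sqrt is chosen here for illustration.)

```python
import math

def my_sqrt(M, iters=4):
    """Sketch: sqrt via range reduction M = m * 4^e, the safe guess
    x0 = (2m + 1)/3, a few Newton steps, and rescaling by 2^e."""
    assert M > 0
    m, e = M, 0
    while m > 1.0:        # reduce until 1/4 <= m <= 1
        m /= 4.0
        e += 1
    while m < 0.25:
        m *= 4.0
        e -= 1
    x = (2.0 * m + 1.0) / 3.0   # linear-approximation initial guess
    for _ in range(iters):      # Newton steps for sqrt(m)
        x = 0.5 * (x + m / x)
    return x * 2.0**e           # sqrt(M) = sqrt(m) * 2^e

print(abs(my_sqrt(10.0) - math.sqrt(10.0)))  # essentially zero
```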
How Many Iterations Are Required?

e_0 = √m − x_0 = m^(1/2) − (1/3)(2m + 1)

e_0' = (1/2)m^(−1/2) − 2/3

Setting e_0' to zero to find the m that gives the maximal error:

m_max^(−1/2) = 4/3  ⇒  m_max = 9/16  ⇒  √m_max = 3/4

In which case, e_0^max = 3/4 − (1/3)(2 · 9/16 + 1) ≈ 0.042
We know the convergence rate is quadratic:

|e_{n+1}| = |x_{n+1} − √m| = |(1/2)(x_n + m/x_n) − √m| = (1/(2x_n))(x_n − √m)² = (1/(2x_n)) e_n²

Since 1/2 ≤ x_n ≤ 1, we get that |e_{n+1}| ≤ e_n².

Thus,

e_3 ≤ e_2² ≤ e_1⁴ ≤ e_0⁸ ≤ 0.042⁸ ≈ 10^(−11)

Point: even just three iterations suffice for an accuracy of at least 10 digits!
The Newton (-Raphson) Method A Root with Multiplicity
The Condition f'(p) ≠ 0

The condition

f'(p) ≠ 0

comes from the convergence theorem for Newton's method. Without it, we could not have made key claims. For example:

- g'(p) = 0 (was needed to prove convergence)
- lim_{n→∞} |e_{n+1}|/|e_n|² = A := (1/2)|f''(p)/f'(p)| < ∞ (was needed to determine the order of convergence)
A Pathological Case: A Root with Multiplicity

This raises the question: when does this condition fail? In effect, when do we have

f'(p) = 0 ?

And what can be done in such a case?
Example

Consider the family of functions of the form f(x) = x² − 1 + d.

f(x) = 0 has two roots for d < 1 (since then x² = 1 − d > 0) and a single root for d = 1 (since then x² = 0). As d approaches 1 from below, the inter-root distance decreases, becoming zero when d = 1; i.e., the roots coincide. Such a root is called a "multiple root" or "a root with multiplicity". This is in contrast to a "simple root" (here, for d < 1, we have two simple roots).
More Formally

Definition

If f ∈ C^M([a, b]) and ∃p ∈ (a, b) such that

f^(m)(p) = 0 for all m = 0, 1, ..., M − 1
f^(M)(p) ≠ 0

then p is called a root of multiplicity M. A root of multiplicity 1 is called simple.

Clearly, some of our conclusions so far about the convergence of Newton's method are invalid for a root of multiplicity M > 1.
Theorem (convergence of Newton's method for a multiple root)

Let f ∈ C^M([a, b]) and let p ∈ [a, b] be a root of multiplicity M > 1. Then:

1. ∃δ > 0 s.t. the sequence

   x_{n+1} = x_n − f(x_n)/f'(x_n),  n = 0, 1, 2, ...,

   converges to p for every x0 ∈ (p − δ, p + δ) with a linear order of convergence, s.t.

   |e_{n+1}| ≈ ((M − 1)/M) |e_n|

2. ∃δ > 0 s.t. the sequence

   x_{n+1} = x_n − M·f(x_n)/f'(x_n),  n = 0, 1, 2, ...,

   converges to p for every x0 ∈ (p − δ, p + δ) with a quadratic order of convergence.
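To see both parts of the theorem in action, here is an added Python illustration with f(x) = x²(x + 1) = x³ + x², which has a root of multiplicity M = 2 at p = 0 (so h(x) = x + 1 and h(0) ≠ 0):

```python
f = lambda x: x**3 + x**2        # = x^2 (x + 1): root p = 0 with M = 2
fp = lambda x: 3 * x**2 + 2 * x

def run(step, x0, iters):
    """Apply one-step map `step` iters times and return the final error |x - p|."""
    x = x0
    for _ in range(iters):
        x = step(x)
    return abs(x)                # p = 0, so the error is just |x|

std = run(lambda x: x - f(x) / fp(x), 0.5, 20)       # standard Newton
mod = run(lambda x: x - 2 * f(x) / fp(x), 0.5, 5)    # modified, with M = 2

print(std)  # still around 1e-6 after 20 (linear) steps
print(mod)  # essentially zero after 5 (quadratic) steps
```

The standard iteration shrinks the error by roughly (M − 1)/M = 1/2 per step, while the modified iteration squares it.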
Remark

- The constant (M − 1)/M in the first case might seem disappointing: for M ≥ 2 it is at least 1/2, whereas even the bisection method had a factor of 1/2. However, for an even M, bisection is not even applicable here, since f does not change sign at the root.

- We don't know M in advance. It is usually not worth doing a trial-and-error search for the M that would give quadratic convergence; it is better to stick with the linear convergence here.
Before the Proof

We already saw that the order of convergence for Newton's method followed from the convergence theorem for fixed-point iterations: q is the smallest number for which g^(q)(p) ≠ 0, where p is a root of f. We also saw that g(x) = x − f(x)/f'(x) and that

g'(x) = f(x)f''(x) / (f'(x))².

Thus, at a simple root, f(p) = 0 and f'(p) ≠ 0, and it was easy to see that g'(p) = 0 and to claim the existence of an interval (p − δ, p + δ) on which |g'(x)| < 1.

But here, with M > 1, we cannot say anything similar about g'(p), since it takes the form 0/0.
Theorem

If f(x) = 0 has a root of multiplicity M at p, then there exists a continuous function h(x) such that f can be written as

f(x) = (x − p)^M h(x),  h(p) ≠ 0

Example

f(x) = x² can be written as

f(x) = x² = (x − 0)² · 1

with p = 0, M = 2, and h(x) ≡ 1.
Proof of Part 1 in the Convergence Theorem

Proof.

By the last theorem, we can write f as

f(x) = (x − p)^M h(x),  h(p) ≠ 0

Recall that g(x) = x − f(x)/f'(x), so g'(x) = f(x)f''(x)/(f'(x))². Now:

f'(x) = M(x − p)^(M−1) h(x) + (x − p)^M h'(x)
      = (x − p)^(M−1) [M h(x) + (x − p)h'(x)]

f''(x) = (M − 1)(x − p)^(M−2) [M h(x) + (x − p)h'(x)] + (x − p)^(M−1) [M h'(x) + h'(x) + (x − p)h''(x)]

With some algebra, it follows that g'(p) = (M − 1)/M. Thus, unless M = 1, we have 0 < |g'(p)| < 1, so the iteration converges and the convergence is linear.
Proof of Part 2 in the Convergence Theorem

Proof.

The improved iteration is

g(x) = x − M f(x)/f'(x).

With some calculations, using expressions from earlier, we get

g'(x) = (p − x) · [(1 − M)(p − x)(h'(x))² + M h(x)((p − x)h''(x) − 2h'(x))] / [M h(x) + (x − p)h'(x)]²

Thus, g'(p) = 0, so the convergence rate is at least quadratic. With some more work, we can show that g''(p) = (2/M) · h'(p)/h(p). Thus, depending on h'(x), the convergence order may be even higher (i.e., better), e.g., if h'(p) = 0.
Version Log
17/11/2019, ver 1.03. Corrected g to g′ in one of the steps in the proofof Newton’s method’s convergence. S127: Typo
14/11/2019, ver 1.02. The sections “Convergence” and “Fixed Point”appeared in the wrong order.
10/11/2019, ver 1.01. Typo fixes.
4/11/2019, ver 1.00.