The Calculus of Variations
Harvey Mudd College, Physics 111
Wednesday, 4 September 2013

How to find the path between two points that produces an extreme value of the path integral of a given function between the points.

What curve produces the shortest path between two points? Such a curve is called a geodesic. You probably know that on the surface of a sphere, it is the arc of a great circle, which is why planes heading to Europe don’t fly around a line of constant latitude but use the polar route. In “regular” two-dimensional space, we suspect it is a straight line. Can we prove it?

Developing the mathematics of geodesics will help us investigate minimum principles in mechanics.

Take one point at the origin, and the other at (x1, y1). Then along a path given by y(x), the element of path length satisfies ds² = dx² + dy², so that the total path length is

$$ s = \int_0^{x_1} \frac{ds}{dx}\,dx = \int_0^{x_1} \frac{\sqrt{dx^2 + dy^2}}{dx}\,dx = \int_0^{x_1} \sqrt{1 + (y')^2}\,dx \tag{1} $$

where y′ = dy/dx. The question is: what is the best path y(x) to minimize s? This is not a normal minimax problem, in which we minimize with respect to a single value. Here we must solve for an infinite number of values (namely, all the values of y(x) between x = 0 and x = x1). The value we wish to minimize, s, is not a function of y, but a functional of the path y(x); that is, it is a function of the function y(x). So, how do we find the path that yields a minimum value of s?

Note on notation: we will use primes to indicate derivatives with respect to the argument of a function, assumed to be a spatial coordinate. We will use dots to represent derivatives with respect to time.
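To get a feel for s as a functional, the sketch below evaluates Eq. (1) numerically for a straight line from the origin to (1, 1) and for paths bent away from it. The endpoint (1, 1), the perturbation α sin(πx), and the helper names `path_length` and `bumped_slope` are illustrative assumptions, not part of the notes.

```python
import math

def path_length(dy_dx, x0, x1, n=2000):
    """Midpoint-rule estimate of Eq. (1): the integral of sqrt(1 + y'^2) dx."""
    h = (x1 - x0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * h
        total += math.sqrt(1.0 + dy_dx(x) ** 2) * h
    return total

def bumped_slope(alpha):
    """Slope of the assumed test path y = x + alpha*sin(pi x), which keeps both endpoints fixed."""
    return lambda x: 1.0 + alpha * math.pi * math.cos(math.pi * x)

straight = path_length(bumped_slope(0.0), 0.0, 1.0)
bent = [path_length(bumped_slope(a), 0.0, 1.0) for a in (0.1, 0.3)]

print("straight line:", straight)
print("bent paths   :", bent)
```

Each choice of path gives one number s, and for these test paths the straight line gives the smallest one. This only samples a few paths, of course; the rest of the section shows how to find the extremal path without guessing.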

1. Euler’s equation

Before looking for a solution, I want to generalize slightly. We typically would like to find an extremum of the integral

$$ s = \int_{x_0}^{x_1} f(y, y', x)\,dx \tag{2} $$

That is, the function f could depend on y, y′, and x, explicitly, and we seek a path such that the functional (that is, the value of s) does not vary to first order for any small perturbation of that path.

If dependence on both y and y′ seems redundant to you, imagine that y is a vertical coordinate in a gravitational field and we integrate with respect to time, not position. The total energy of a particle of mass m at position y would then depend both on its height (y) and its speed (ẏ), which are independent quantities.

Suppose that η(x) is any differentiable function defined between x0 and x1 that vanishes at both x0 and x1. We now compute the integral along the path

$$ y(x,\alpha) = y(x,0) + \alpha\,\eta(x) \tag{3} $$

which is shifted from the optimal path y(x) = y(x,0) by a scaled copy of η(x):

$$ s(\alpha) = \int_{x_0}^{x_1} f\bigl(y(x,\alpha),\, y'(x,\alpha),\, x\bigr)\,dx \tag{4} $$

Physics 111, Peter N. Saeta

Figure 1: We integrate some functional of y, y′, and x along a path from A at (x0, y0) to B at (x1, y1) and seek an extreme value of the resulting integral. The curves shown are the path y(x) and the varied path y(x) + αη(x).

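When α = 0, s(α) reverts to the functional we care about, s of Eq. (2). For small values of α we require that s(α) remain unchanged, to first order in α. That is, we require ∂s/∂α = 0 at α = 0. By assumption, the endpoints x0 and x1 remain fixed, so the change in s arises solely from variation in the integrand. By the chain rule,

$$ \frac{\partial s}{\partial\alpha} = \int_{x_0}^{x_1} \left[ \frac{\partial f}{\partial x}\frac{\partial x}{\partial\alpha} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial\alpha} + \frac{\partial f}{\partial y'}\frac{\partial y'}{\partial\alpha} \right] dx \tag{5} $$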

From the definition provided in Eq. (3), we deduce that

$$ \frac{\partial y}{\partial\alpha} = \eta(x) \quad\text{and}\quad \frac{\partial (dy/dx)}{\partial\alpha} = \eta'(x) $$

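Since x has no dependence on α, the first term on the right in Eq. (5) vanishes, leaving us with

$$ \frac{\partial s}{\partial\alpha} = \int_{x_0}^{x_1} \left[ \frac{\partial f}{\partial y}\frac{\partial y}{\partial\alpha} + \frac{\partial f}{\partial y'}\frac{\partial y'}{\partial\alpha} \right] dx = \int_{x_0}^{x_1} \left[ \frac{\partial f}{\partial y}\,\eta(x) + \frac{\partial f}{\partial y'}\,\eta'(x) \right] dx $$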

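Now we integrate the second term by parts. Recall that d(uv) = u dv + v du, so $\int_a^b u\,dv = uv\big|_a^b - \int_a^b v\,du$. With u = ∂f/∂y′ and dv = η′ dx, we get

$$ \frac{\partial s}{\partial\alpha} = \left. \frac{\partial f}{\partial y'}\,\eta \right|_{x_0}^{x_1} + \int_{x_0}^{x_1} \left[ \frac{\partial f}{\partial y}\,\eta - \left( \frac{d}{dx}\frac{\partial f}{\partial y'} \right) \eta \right] dx \tag{6} $$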

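The integrated term vanishes because, by assumption, η(x) vanishes at the limits of integration. Therefore,

$$ \frac{\partial s}{\partial\alpha} = \int_{x_0}^{x_1} \left[ \frac{\partial f}{\partial y} - \frac{d}{dx}\left( \frac{\partial f}{\partial y'} \right) \right] \eta(x)\,dx \tag{7} $$

which we require to vanish at α = 0, where y(x,α) = y(x).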

Thus far, we have required only that η(x) be differentiable and that it vanish at x0 and x1. It is otherwise entirely arbitrary. The only way to guarantee that the integral will vanish for arbitrary η(x) is for the term in brackets to vanish. That is,

$$ \frac{\partial f}{\partial y} - \frac{d}{dx}\left( \frac{\partial f}{\partial y'} \right) = 0 \tag{E I} $$

which is called Euler’s equation. Euler’s equation is satisfied along extremal paths y(x).

Note that the derivatives with respect to y and y′ are partial, whereas the x derivative is total. The condition that the functional be stationary with respect to small changes in the path of integration leads to a differential equation (E I) whose solution yields the required path y(x). Leonhard Euler first derived this result in 1744.
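The stationarity condition can be checked numerically. The sketch below is a minimal illustration, assuming the path-length integrand f = √(1 + y′²) of Eq. (1), endpoints (0, 0) and (1, 1), and the test perturbation η(x) = sin(πx); the function names are hypothetical. It estimates ∂s/∂α at α = 0 by a central difference, once for the straight line y = x and once for the parabola y = x², which joins the same endpoints but does not satisfy Euler’s equation.

```python
import math

def s_of_alpha(y0_prime, eta_prime, alpha, x0=0.0, x1=1.0, n=2000):
    """Midpoint-rule value of s(alpha), Eq. (4), for f = sqrt(1 + y'^2)."""
    h = (x1 - x0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * h
        yp = y0_prime(x) + alpha * eta_prime(x)  # slope of y(x, alpha), from Eq. (3)
        total += math.sqrt(1.0 + yp * yp) * h
    return total

def ds_dalpha_at_zero(y0_prime, eta_prime, step=1e-4):
    """Central-difference estimate of ds/dalpha at alpha = 0."""
    plus = s_of_alpha(y0_prime, eta_prime, step)
    minus = s_of_alpha(y0_prime, eta_prime, -step)
    return (plus - minus) / (2.0 * step)

# Assumed perturbation eta(x) = sin(pi x), which vanishes at x = 0 and x = 1.
eta_prime = lambda x: math.pi * math.cos(math.pi * x)

line = ds_dalpha_at_zero(lambda x: 1.0, eta_prime)          # y = x
parabola = ds_dalpha_at_zero(lambda x: 2.0 * x, eta_prime)  # y = x^2

print("straight line:", line)
print("parabola     :", parabola)
```

For the straight line the first-order change vanishes to within rounding error. For the parabola it is clearly nonzero (and negative), so a small step along this η shortens that path.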

1.1 A Second form of Euler’s equation

Euler’s equation is especially nice to work with when f does not depend explicitly on y; i.e., ∂f/∂y = 0. Then the total derivative that remains must vanish, meaning that there’s really no need to take the derivative at all. We can just set

$$ \frac{\partial f}{\partial y'} = \text{constant} $$

and integrate only once. In this case, we say that the quantity ∂f/∂y′ is a first integral.

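If f does have explicit y dependence, there is an important second way to express the quantity on the left-hand side of Eq. (E I) that is sometimes easier to work with. Consider first the total derivative of f with respect to x:

$$ \frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx} + \frac{\partial f}{\partial y'}\frac{dy'}{dx} = \frac{\partial f}{\partial x} + y'\,\frac{\partial f}{\partial y} + y''\,\frac{\partial f}{\partial y'} \tag{8} $$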

Nothing fancy here, just the fact that changing x can also change y and y′, which therefore also changes f. Now we pull a combination completely out of the blue:

$$ \frac{d}{dx}\left( y'\,\frac{\partial f}{\partial y'} \right) = y'\,\frac{d}{dx}\left( \frac{\partial f}{\partial y'} \right) + y''\,\frac{\partial f}{\partial y'} \tag{9} $$

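Actually, we are motivated by the first term on the right-hand side, which looks just like a term in Euler’s equation. Since the final term of Eq. (8) and Eq. (9) is the same, let’s subtract the two equations to get

$$ \frac{df}{dx} - \frac{d}{dx}\left( y'\,\frac{\partial f}{\partial y'} \right) = \frac{\partial f}{\partial x} + y'\,\frac{\partial f}{\partial y} - y'\,\frac{d}{dx}\left( \frac{\partial f}{\partial y'} \right) \tag{10} $$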

The left-hand side is a total derivative with respect to x, while the right-hand side has two terms multiplied by y′. Combining gives

$$ \frac{d}{dx}\left( f - y'\,\frac{\partial f}{\partial y'} \right) = \frac{\partial f}{\partial x} + y' \left[ \frac{\partial f}{\partial y} - \frac{d}{dx}\left( \frac{\partial f}{\partial y'} \right) \right] $$

So far, we have just manipulated two perfectly general equations for a function f that depends on x, y, and y′. Now we apply the extra condition implied by Euler’s equation: for a functional that is stationary against variation to first order, the term in square brackets vanishes. We thus obtain a second form of Euler’s equation,

$$ \frac{d}{dx}\left( f - y'\,\frac{\partial f}{\partial y'} \right) = \frac{\partial f}{\partial x} \tag{E II} $$

When would Eq. (E II) be especially convenient?
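One case to try: if f has no explicit x dependence, the right-hand side of (E II) vanishes, so f − y′ ∂f/∂y′ should stay constant along an extremal path. The sketch below checks this for an integrand that is not taken from these notes, f = y√(1 + y′²), chosen only as an illustration. One can verify by substitution into (E I) that y = cosh x is an extremal for this f. The parabola y = 1 + x² is included as a non-extremal comparison, and all function names are hypothetical.

```python
import math

def f(y, yp):
    """Assumed illustrative integrand f(y, y') = y * sqrt(1 + y'^2), with no explicit x."""
    return y * math.sqrt(1.0 + yp * yp)

def df_dyp(y, yp):
    """Partial derivative of f with respect to y'."""
    return y * yp / math.sqrt(1.0 + yp * yp)

def first_integral(y, yp):
    """The quantity f - y' * df/dy' appearing on the left of (E II)."""
    return f(y, yp) - yp * df_dyp(y, yp)

xs = [i / 4.0 for i in range(-4, 5)]  # sample points on [-1, 1]

# Along the extremal y = cosh(x), with y' = sinh(x).
extremal = [first_integral(math.cosh(x), math.sinh(x)) for x in xs]

# Along a non-extremal comparison path y = 1 + x^2, with y' = 2x.
parabola = [first_integral(1.0 + x * x, 2.0 * x) for x in xs]

print("extremal:", extremal)
print("parabola:", parabola)
```

Along the extremal the quantity has the same value at every sample point, so a single first-order equation, f − y′ ∂f/∂y′ = constant, replaces the second-order equation (E I). Along the parabola it drifts.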



Example 1: Shortest distance in a plane

Returning to Eq. (1), we have f(y, y′, x) = √(1 + y′²). This functional has no explicit dependence on either x or y, so (E I) will be easier to use. From (E I), d/dx (∂f/∂y′) = 0, so

$$ \frac{\partial f}{\partial y'} = C = \frac{y'}{\sqrt{1 + y'^2}} \tag{11} $$

where C is a constant. The fraction on the right is a strictly increasing function of y′, so it can be constant only if y′ = c1 is constant. Therefore

$$ y(x) = c_1 x + c_2 \tag{12} $$

which is indeed the equation of a line. That’s a relief: the shortest distance between two points in a plane is a straight line.
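The step from Eq. (11) to Eq. (12) can also be made explicit by solving Eq. (11) for y′. The worked lines below are a sketch; the bound |C| < 1 is an observation added here, and it holds because the fraction in Eq. (11) always has magnitude less than one.

```latex
\frac{y'}{\sqrt{1+y'^2}} = C
\;\Longrightarrow\;
y'^2 = C^2\,(1+y'^2)
\;\Longrightarrow\;
y' = \frac{C}{\sqrt{1-C^2}} \equiv c_1 ,
\qquad |C| < 1 .
```

The sign of y′ is fixed by the sign of C, so only one root survives, and integrating once gives Eq. (12).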

2. Generalizations

Suppose that we have more than one dependent variable, so that f is

$$ f(y_1, y_1', y_2, y_2', \ldots, x) $$

which we will write in shorthand as f(yi, y′i, x), where i = 1, 2, . . . , N for N dependent variables. By varying each yi with a function αηi(x), you can show that

$$ \frac{\partial f}{\partial y_i} - \frac{d}{dx}\left( \frac{\partial f}{\partial y_i'} \right) = 0 \quad\text{for } i = 1, 2, \ldots, N \tag{13} $$

Example 2: Geodesics of a cone

Find the geodesics on the cone given by z = λρ in cylindrical coordinates.

The distance element (metric) is ds² = dz² + dρ² + ρ²dφ², and we will take ρ as the independent variable. Then

$$ ds = \left( 1 + z'^2 + \rho^2\phi'^2 \right)^{1/2} d\rho = \sqrt{1 + \lambda^2 + \rho^2\phi'^2}\;d\rho \tag{14} $$

The geodesic is then the path that minimizes

$$ s = \int \sqrt{1 + \lambda^2 + \rho^2\phi'^2}\;d\rho $$

so we identify the functional f(φ, φ′, ρ) = √(1 + λ² + ρ²φ′²). This functional has no explicit dependence on φ, so it is particularly advantageous to use (E I):


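$$ \frac{\partial f}{\partial\phi} - \frac{d}{d\rho}\left( \frac{\partial f}{\partial\phi'} \right) = 0 \;\Longrightarrow\; \frac{\partial f}{\partial\phi'} = \frac{\rho^2\phi'}{\sqrt{1 + \lambda^2 + \rho^2\phi'^2}} = K $$

where K is a constant. Squaring and isolating φ′,

$$ \rho^4\phi'^2 = K^2\left( 1 + \lambda^2 + \rho^2\phi'^2 \right) $$

$$ \phi'^2\rho^2\left( \rho^2 - K^2 \right) = K^2\left( 1 + \lambda^2 \right) \equiv \Lambda^2 $$

$$ \frac{d\phi}{d\rho} = \frac{\Lambda}{\rho\sqrt{\rho^2 - K^2}} $$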

We can now separate and integrate:

$$ \phi - \phi_0 = \int \frac{\Lambda\,d\rho}{\rho\sqrt{\rho^2 - K^2}} $$

We need to simplify the square root in the denominator, so we’d better remember a sneaky form of the Pythagorean theorem, sec²θ − 1 = tan²θ. This suggests we define ρ = K sec ψ, which means dρ = K sec ψ tan ψ dψ. Substituting into the integral and integrating, we have

$$ \phi - \phi_0 = \Lambda \int \frac{K\sec\psi\tan\psi}{K\sec\psi\;K\tan\psi}\,d\psi = \frac{\Lambda}{K}\,\psi = \sqrt{1+\lambda^2}\;\psi $$

Finally, we undo the ψ substitution to get

$$ \phi - \phi_0 = \sqrt{1+\lambda^2}\;\sec^{-1}(\rho/K) \;\Longrightarrow\; \cos\left( \frac{\phi - \phi_0}{\sqrt{1+\lambda^2}} \right) = \frac{K}{\rho} $$

or

$$ \rho\,\cos\left( \frac{\phi - \phi_0}{\sqrt{1+\lambda^2}} \right) = K \tag{15} $$

This is particularly easy to interpret when λ = 0, which corresponds to the xy plane. If φ0 = 0, then we get ρ cos φ = x = K, which is the equation of a straight line parallel to the y axis. We can clearly rotate this line with the value of φ0, so at least for the degenerate case of λ = 0, our solution makes sense. As shown in Fig. 2, it works for finite λ, as well.
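Eq. (15) can also be sanity-checked numerically for finite λ. The sketch below uses the parameter values quoted in the caption of Fig. 2 (λ = 2, K = 1/10, φ0 = 1/2) and measures the length of the curve of Eq. (15) between two arbitrarily chosen angles by summing short chords in three dimensions. It then compares that length with two references: the straight-line distance between the same endpoints after flattening the cone into a plane sector, and a comparison curve whose ρ varies linearly with φ. The flattening argument, the chosen angles, and all function names are assumptions made for this illustration, not part of the notes.

```python
import math

LAM, K, PHI0 = 2.0, 0.1, 0.5       # values quoted in the Fig. 2 caption
ROOT = math.sqrt(1.0 + LAM ** 2)   # sqrt(1 + lambda^2)

def rho_geodesic(phi):
    """rho(phi) solved from Eq. (15)."""
    return K / math.cos((phi - PHI0) / ROOT)

def cone_point(rho, phi):
    """Cartesian coordinates of a point on the cone z = lambda * rho."""
    return (rho * math.cos(phi), rho * math.sin(phi), LAM * rho)

def curve_length(rho_of_phi, phi_a, phi_b, n=20000):
    """Length of a curve on the cone, approximated by a sum of short chords."""
    total = 0.0
    prev = cone_point(rho_of_phi(phi_a), phi_a)
    for i in range(1, n + 1):
        phi = phi_a + (phi_b - phi_a) * i / n
        cur = cone_point(rho_of_phi(phi), phi)
        total += math.dist(prev, cur)
        prev = cur
    return total

def unrolled_distance(rho_a, phi_a, rho_b, phi_b):
    """Straight-line distance after flattening the cone (assumed construction):
    slant radius r = ROOT * rho, sector angle theta = phi / ROOT."""
    ra, rb = ROOT * rho_a, ROOT * rho_b
    dtheta = (phi_b - phi_a) / ROOT
    return math.sqrt(ra * ra + rb * rb - 2.0 * ra * rb * math.cos(dtheta))

phi_a, phi_b = 0.5, 2.0            # arbitrary endpoints on the curve
rho_a, rho_b = rho_geodesic(phi_a), rho_geodesic(phi_b)

def rho_linear(phi):
    """Comparison curve: rho interpolated linearly in phi between the same endpoints."""
    return rho_a + (rho_b - rho_a) * (phi - phi_a) / (phi_b - phi_a)

geodesic_len = curve_length(rho_geodesic, phi_a, phi_b)
flat_len = unrolled_distance(rho_a, phi_a, rho_b, phi_b)
detour_len = curve_length(rho_linear, phi_a, phi_b)

print("along Eq. (15)  :", geodesic_len)
print("flattened chord :", flat_len)
print("linear-rho path :", detour_len)
```

The length along Eq. (15) agrees with the straight line on the flattened cone, and the comparison curve is longer, which is what one expects of a geodesic.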

2.1 Delta Notation

There is a common shorthand for the variation in the functional that I will introduce at this point. We have used the parameter α to compute the rate of change of the functional with respect to a variation in the path parameterized by α and required that the variation with α vanish when α is zero. In a sense, we invented α and then made it disappear. It is as though we have begun a Taylor series expansion of s with respect to α and lost interest after the linear term:

$$ s(\alpha) = s(0) + \frac{\partial s}{\partial\alpha}\,\alpha + \cdots $$

The change in s is thus

$$ \delta s = s(\alpha) - s(0) = \frac{\partial s}{\partial\alpha}\,\delta\alpha + \cdots $$

to first order in the variation δα of α from its initial value (zero).
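The Taylor-series picture suggests a simple numerical experiment: halve δα and see how δs responds. If the linear term is present, δs should halve; if the path is extremal, the linear term is absent and δs should drop by about a factor of four. The sketch below tries this for the path-length integrand of Eq. (1) between (0, 0) and (1, 1), with the assumed perturbation η(x) = sin(πx). The two test paths and the function names are illustrative choices.

```python
import math

def s_of_alpha(y0_prime, alpha, n=2000):
    """Midpoint-rule path length along y0(x) + alpha*sin(pi x) for 0 <= x <= 1."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        yp = y0_prime(x) + alpha * math.pi * math.cos(math.pi * x)
        total += math.sqrt(1.0 + yp * yp) * h
    return total

def delta_s(y0_prime, d_alpha):
    """delta s = s(d_alpha) - s(0)."""
    return s_of_alpha(y0_prime, d_alpha) - s_of_alpha(y0_prime, 0.0)

line_slope = lambda x: 1.0          # y = x, the extremal path
parabola_slope = lambda x: 2.0 * x  # y = x^2, same endpoints, not extremal

ratio_line = delta_s(line_slope, 0.01) / delta_s(line_slope, 0.005)
ratio_parabola = delta_s(parabola_slope, 0.01) / delta_s(parabola_slope, 0.005)

print("extremal path: delta s ratio =", ratio_line)       # close to 4
print("parabola     : delta s ratio =", ratio_parabola)   # close to 2
```

A ratio near 2 signals a surviving first-order term, while a ratio near 4 means δs starts at second order, which is the sense in which the functional is stationary on the extremal path.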

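Why use δ instead of d? The δ notation is reserved for variations in a quantity caused by the change in path [the dependent variable(s) y(x)] without varying the independent variable (x). So, the derivation of Euler’s equation using the delta notation would go as follows:

$$ \delta s = \delta \int_{x_0}^{x_1} f(y, y', x)\,dx = \int_{x_0}^{x_1} \left[ \frac{\partial f}{\partial y}\,\delta y + \frac{\partial f}{\partial y'}\,\delta y' \right] dx = \int_{x_0}^{x_1} \left[ \frac{\partial f}{\partial y}\,\delta y + \frac{\partial f}{\partial y'}\,\frac{d}{dx}(\delta y) \right] dx $$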

Now integrating the second term by parts yields

$$ \delta s = \int_{x_0}^{x_1} \left[ \frac{\partial f}{\partial y} - \frac{d}{dx}\left( \frac{\partial f}{\partial y'} \right) \right] \delta y\,dx $$

Since the path variation δy is arbitrary, for the variation in s to vanish requires that the quantity in brackets vanish everywhere, which is indeed Euler’s equation.

2.2 Handling Constraints

Figure 2: An illustration of Eq. (15) for λ = 2, K = 1/10, φ0 = 1/2. See the Mathematica file ConeGeodesic.nb.

Sometimes the path is constrained to lie on a particular surface. For example, we may seek the shortest path between two points on the surface of a doughnut or a sphere. In the latter case, we have a constraint of the form g(x, y, z) = y² + z² + x² − ρ² = 0, where we are thinking of y = y1(x) and z = y2(x). So, when we consider varying paths by replacing yi(x) with yi(x) + αηi(x), we have to ensure that we don’t wander off the surface. That is, we have to make vanish

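$$ \delta s = \int_{x_0}^{x_1} \left[ \left( \frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} \right) \eta_1(x) + \left( \frac{\partial f}{\partial z} - \frac{d}{dx}\frac{\partial f}{\partial z'} \right) \eta_2(x) \right] \delta\alpha\,dx \tag{16} $$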

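but we aren’t allowed to vary y and z independently. To stay on the surface, we insist that δg vanish at each point along the extremal curve. That is,

$$ \delta g = \left( \frac{\partial g}{\partial y}\frac{\partial y}{\partial\alpha} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial\alpha} \right) \delta\alpha = \left( \frac{\partial g}{\partial y}\,\eta_1(x) + \frac{\partial g}{\partial z}\,\eta_2(x) \right) \delta\alpha = 0 \tag{17} $$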

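Since δg = 0 at every x, we may multiply it by an arbitrary function λ(x) and add it to the integrand in Eq. (16) without affecting the value of δs. So, we must require that

$$ \delta s = \int_{x_0}^{x_1} \left[ \left( \frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} + \lambda\,\frac{\partial g}{\partial y} \right) \eta_1(x) + \left( \frac{\partial f}{\partial z} - \frac{d}{dx}\frac{\partial f}{\partial z'} + \lambda\,\frac{\partial g}{\partial z} \right) \eta_2(x) \right] \delta\alpha\,dx \tag{18} $$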

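vanishes, where λ(x) is called Lagrange’s undetermined multiplier. Now, we are guaranteed to explore only variations that are consistent with the constraint g. Since η1(x) and η2(x) are arbitrary functions, each term in large parentheses must vanish, giving us the two equations

$$ \frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'} + \lambda(x)\,\frac{\partial g}{\partial y} = 0 \tag{19} $$

$$ \frac{\partial f}{\partial z} - \frac{d}{dx}\frac{\partial f}{\partial z'} + \lambda(x)\,\frac{\partial g}{\partial z} = 0 \tag{20} $$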

These two equations, along with the equation of constraint g(x, y, z) = 0, provide the three equations required to determine the unknown path y(x), z(x) and the unknown multiplier λ(x). In the general case of several dependent variables and several equations of constraint, we have

$$ \frac{\partial f}{\partial y_i} - \frac{d}{dx}\frac{\partial f}{\partial y_i'} + \sum_j \lambda_j\,\frac{\partial g_j}{\partial y_i} = 0, \qquad g_j\{y_i, x\} = 0 \tag{21} $$

3. Summary

Minimax problems in ordinary calculus are easily solved by setting the first derivative to zero, whether the function is of a single or multiple independent variables. Finding an “optimal” path between fixed endpoints (in the sense that nearby paths yield the same value of a given path integral) is only slightly more complicated. We imagine an arbitrary perturbation to the (unknown) true path, subject only to the constraint that this perturbation must vanish at the end points. By insisting that adding “a small amount” of this perturbation to our true function does not change the path integral, and using an integration by parts, we obtain a differential equation whose solution is the desired true path. The Euler equations are these differential equations:

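$$ \frac{\partial f}{\partial y} - \frac{d}{dx}\left( \frac{\partial f}{\partial y'} \right) = 0 \tag{E I} $$

$$ \frac{d}{dx}\left( f - y'\,\frac{\partial f}{\partial y'} \right) - \frac{\partial f}{\partial x} = 0 \tag{E II} $$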

The second form is particularly useful when f has no explicit x dependence.

At the risk of belaboring the obvious, there is nothing sacrosanct about the choice of x as the independent variable. If we were to use t instead of x, to use x instead of y as the dependent variable, and to denote derivatives with respect to t by dots instead of primes, these equations would read

$$ \frac{\partial f}{\partial x} - \frac{d}{dt}\left( \frac{\partial f}{\partial \dot{x}} \right) = 0 \tag{E I} $$

$$ \frac{d}{dt}\left( f - \dot{x}\,\frac{\partial f}{\partial \dot{x}} \right) - \frac{\partial f}{\partial t} = 0 \tag{E II} $$

We shall see that this form of the equations is particularly useful in mechanics.

Exercises and Problems

Exercise 1 – Euler-Lagrange in N dimensions. Derive Eq. (13) by following the steps leading up to Eq. (7) for a functional of several dependent variables.

Exercise 2 – Shortest distance in 3-space. Show that the shortest distance between two points in three-dimensional Cartesian space is a straight line.

Problem 3 – Geodesics of a cylinder. Describe the geodesics (the paths of minimum distance along the surface) of a right circular cylinder. Hint: roll up a piece of paper.

Problem 4 – Geodesics of a sphere. Show that the geodesics of a sphere are indeed great circles. Use spherical polar coordinates, with θ the polar angle and φ the azimuthal angle; i.e., θ is the colatitude (measured from the pole) and φ is the longitude.

