MT5802 - Calculus of variations

Introduction.

Suppose y(x) is defined on the interval [a,b] and so defines a curve on the (x,y) plane.

Now suppose

$$I = \int_a^b F(y, y', x)\,dx \tag{1}$$

with y' the derivative of y(x). The value of this integral will depend on the choice of the function y, and the basic problem of the calculus of variations is to find the form of the function which makes the value of the integral a minimum or maximum (most commonly a minimum).

The sort of question which gives rise to this kind of problem is exemplified by the "Brachistochrone" problem, solved by Newton and the Bernoullis (the name comes from the Greek for "shortest time"). This considers a particle sliding down a smooth curve under the action of gravity and poses the question as to what curve minimises the time for the particle to slide between fixed points A and B.

[Figure: a curve joining the points A and B]

Clearly the time will need to be found by calculating the speed at each point then integrating along the curve.

Other examples arise in various areas of physics in which the basic laws can be stated in terms of variational principles. For example, in optics Fermat's principle says that the path of a light ray between two points is such as to minimise the time of travel between the two points.

The Euler-Lagrange equation.

First recall the condition under which an ordinary function y(x) has an extremum. If we expand in a Taylor series

$$y(x + \delta x) = y(x) + \delta x\,y'(x) + \tfrac{1}{2}\,\delta x^2\,y''(x) + \dots$$

then the condition is that the term proportional to δx must vanish, so that if the second derivative is non-zero the difference between y(x + δx) and y(x) will always have the same sign for small δx. The same principle applies to our present problem.

What we do is consider a small change in the function y(x), replacing it with y(x) + η(x). (Note that all the functions we introduce are assumed to have appropriate properties of differentiability etc., without particular comment being made.) We then produce a change in the integral, which can be expanded in powers of η. We demand that the term proportional to η vanishes.

Substituting into (1) we get

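$$I(y + \eta) = \int_a^b F(y + \eta, y' + \eta', x)\,dx = \int_a^b F(y, y', x)\,dx + \int_a^b \left( \frac{\partial F}{\partial y}\,\eta + \frac{\partial F}{\partial y'}\,\eta' \right) dx + O(\eta^2)$$

so that what we want is

$$\int_a^b \left( \frac{\partial F}{\partial y}\,\eta + \frac{\partial F}{\partial y'}\,\eta' \right) dx = 0. \tag{2}$$

Integrating the second term by parts gives

$$\int_a^b \left[ \frac{\partial F}{\partial y} - \frac{d}{dx}\left( \frac{\partial F}{\partial y'} \right) \right] \eta(x)\,dx = 0 \tag{3}$$

In obtaining this we have assumed that η(a) = η(b) = 0, i.e. the perturbation vanishes at the end points, leaving the end points A and B of the curve unchanged, as shown below.

[Figure: the unperturbed curve (full line) and the perturbed curve (dotted line) in the (x, y) plane, both joining A to B]

Since this must hold for all η(x) we obtain

$$\frac{\partial F}{\partial y} - \frac{d}{dx}\left( \frac{\partial F}{\partial y'} \right) = 0. \tag{4}$$

This is the Euler-Lagrange equation, the basic equation of this theory. It is a differential equation which determines y as a function of x.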

Examples.

(a) Find the curve which gives the shortest distance between two points on a plane.

If the curve is y = y(x) then the element of length is

$$dl = \sqrt{dx^2 + dy^2} = \sqrt{1 + y'^2}\,dx$$

so we want to minimise

$$\int_a^b \sqrt{1 + y'^2}\,dx$$

(where a and b are the x-coordinates of the points of interest). The integrand is independent of y so we just get

$$\frac{d}{dx}\left( \frac{\partial}{\partial y'} \sqrt{1 + y'^2} \right) = 0$$

giving

$$\frac{y'}{\sqrt{1 + y'^2}} = \text{const}, \quad \text{or} \quad y' = \text{const}.$$

As expected this just gives a straight line y = mx + c with the constants fixed by the positions of the end points.
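The result of example (a) can be checked numerically. The sketch below compares the length of a straight line with the length of the same line plus a small bump that vanishes at both ends. The end points, the slope, the bump shape and the function names are illustrative assumptions, not part of the notes.

```python
import math

def arc_length(dy, a, b, n=2000):
    """Midpoint-rule value of the integral of sqrt(1 + y'(x)^2) over [a, b]."""
    h = (b - a) / n
    return h * sum(math.sqrt(1.0 + dy(a + (i + 0.5) * h) ** 2) for i in range(n))

def perturbed_slope(m, eps, a, b):
    """Slope of y = m*x + c + eps*sin(pi*(x - a)/(b - a)); the bump is zero at x = a and x = b."""
    w = math.pi / (b - a)
    return lambda x: m + eps * w * math.cos(w * (x - a))

a, b, m = 0.0, 2.0, 0.5          # hypothetical end points and slope
straight = arc_length(perturbed_slope(m, 0.0, a, b), a, b)
bumped = arc_length(perturbed_slope(m, 0.1, a, b), a, b)
print(straight, bumped)          # the perturbed curve should be the longer one
```

Changing the sign or size of the bump should never give a length below that of the straight line.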

(b) Find the curve which minimises

$$\int_a^b (y^2 + y'^2)\,dx$$

The Euler-Lagrange equation for this is

$$\frac{d}{dx}(2y') - 2y = 2y'' - 2y = 0$$

and if we multiply by y' we get a first integral y'² − y² = const. Assuming this constant to be positive and equal to a² we get the solution y = a sinh(x + b). If the constant is negative we can take it to be −a² and get the solution y = a cosh(x + b). In both cases a and b are constants of integration (not the limits of the integral) and need to be found using given end points.

This is a fairly simple, artificial example, but it illustrates a more general point. Note that we could easily find a first integral and reduce the problem to a first order DE. The existence of a first integral like this turns out to be a general property of the Euler-Lagrange equation whenever the integrand has no explicit dependence on x.

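Under these circumstances, if we multiply the E-L equation by y' we get

$$y'\,\frac{d}{dx}\left( \frac{\partial F}{\partial y'} \right) - y'\,\frac{\partial F}{\partial y} = 0$$

or

$$\frac{d}{dx}\left( y'\,\frac{\partial F}{\partial y'} \right) - y''\,\frac{\partial F}{\partial y'} - y'\,\frac{\partial F}{\partial y} = 0.$$

Since F does not contain x explicitly, the last two terms combine to give −dF/dx, where dF/dx is the total derivative. So, we get the first integral

$$y'\,\frac{\partial F}{\partial y'} - F = \text{const} \tag{5}$$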
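As a quick check of Eq. (5), the sketch below evaluates y' ∂F/∂y' − F for the integrand F = y² + y'² of example (b) along the solution y = a sinh(x + b). The constants `A` and `B` stand for a and b and are arbitrary illustrative values.

```python
import math

def first_integral(y, dy):
    """Eq. (5) for F = y^2 + y'^2: y' * dF/dy' - F = 2 y'^2 - (y^2 + y'^2)."""
    return dy * (2.0 * dy) - (y * y + dy * dy)

A, B = 1.5, 0.3   # hypothetical constants a and b in y = a sinh(x + b)

values = []
for i in range(11):
    x = -1.0 + 0.2 * i
    y = A * math.sinh(x + B)
    dy = A * math.cosh(x + B)
    values.append(first_integral(y, dy))

print(min(values), max(values))   # both should be close to A**2
```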

As a more interesting example we return to the brachistochrone problem mentioned earlier.

Suppose two points A and B are connected by a smooth ramp along which a particle can slide, starting at rest at A. Taking A at the origin and the y direction vertically downwards, then at a point (x,y) on the curve, the particle speed is given by

$$v^2 = 2gy$$

(with g the acceleration due to gravity). The time to move an increment (dx,dy) along the curve is

$$dt = \frac{\sqrt{dx^2 + dy^2}}{v} = \frac{\sqrt{1 + y'^2}}{\sqrt{2gy}}\,dx.$$

So, dropping the constant factor $1/\sqrt{2g}$, the integral which we need to minimise is

$$\int_a^b \sqrt{\frac{1 + y'^2}{y}}\,dx$$

and the first integral of the Euler-Lagrange equation as derived above (Eq. (5)) is

$$y'\,\frac{y'}{\sqrt{y(1 + y'^2)}} - \sqrt{\frac{1 + y'^2}{y}} = c.$$

This simplifies to (with k = −1/c)

$$\sqrt{y(1 + y'^2)} = k, \quad \text{i.e.} \quad y(1 + y'^2) = k^2,$$

or

$$y' = \sqrt{\frac{k^2 - y}{y}}.$$

This can be integrated by making the substitution y = k² sin²θ, giving

$$\frac{dy}{dx} = 2k^2 \sin\theta \cos\theta\,\frac{d\theta}{dx} = \frac{\cos\theta}{\sin\theta}$$

which has the solution

$$x = \frac{k^2}{2}\left( 2\theta - \sin 2\theta \right) + K.$$

Putting b = k²/2 and φ = 2θ we get parametric equations for the curve in the form

$$x = b(\phi - \sin\phi) + K, \qquad y = b(1 - \cos\phi).$$

As illustrated by the diagram below, these represent a cycloid, the curve traced out by a point on the circumference of a wheel of radius b rolling along the x axis.

[Figure: the cycloid joining A to B, with axes x and y and the rolling angle φ]

Since the curve passes through the origin, K = 0. The value of b is determined by the condition that the curve passes through B.
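To see how much the cycloid gains, the following sketch compares the descent time along the cycloid with the time down a straight ramp to the same end point. Along the cycloid the element of length is b√(2(1 − cos φ)) dφ and the speed is √(2gy), so dt = √(b/g) dφ. The value of g, the wheel radius and the end parameter are assumptions chosen for illustration.

```python
import math

G = 9.81  # m/s^2, assumed value of g

def cycloid_time(b, phi_end):
    """Descent time from rest along x = b(phi - sin phi), y = b(1 - cos phi), from phi = 0.

    On the cycloid dt = sqrt(b/g) dphi, so the time is phi_end * sqrt(b/g).
    """
    return phi_end * math.sqrt(b / G)

def straight_line_time(x_end, y_end):
    """Descent time from rest down a straight ramp from (0, 0) to (x_end, y_end), y downwards."""
    length = math.hypot(x_end, y_end)
    accel = G * y_end / length          # component of gravity along the ramp
    return math.sqrt(2.0 * length / accel)

b, phi_end = 1.0, math.pi               # hypothetical wheel radius and end parameter
x_end = b * (phi_end - math.sin(phi_end))
y_end = b * (1.0 - math.cos(phi_end))
print(cycloid_time(b, phi_end), straight_line_time(x_end, y_end))
```

For any end point reached in the first arch of the cycloid the first number should be the smaller one.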

More than one dependent variable.

Suppose $F = F(y_1, y_1', y_2, y_2', y_3, y_3', \dots)$ with each $y_i = y_i(x)$ and again we are looking for an extremum of $\int_a^b F\,dx$. The analysis proceeds as before, replacing each $y_i$ with $y_i + \eta_i$. Since each $\eta_i$ can be chosen independently, we must let the coefficient of each in the integrand vanish. We end up with a system of Euler-Lagrange equations

$$\frac{\partial F}{\partial y_i} = \frac{d}{dx}\left( \frac{\partial F}{\partial y_i'} \right). \tag{6}$$

It has, of course, been assumed that the end points are fixed, as before.

Example: Find the curve which minimises

$$\int_0^1 (y'^2 + z'^2 + y^2)\,dx$$

and which joins the points (0,0,0) and (1,1,1).

The E-L equations are

$$\frac{d}{dx}(2y') - 2y = 0, \qquad \frac{d}{dx}(2z') = 0$$

with general solutions

$$y = a\cosh x + b\sinh x, \qquad z = cx + d.$$

Imposing the end point conditions (y(0) = 0 forces a = 0, and y(1) = 1 then fixes b) gives the curve

$$y = \frac{\sinh x}{\sinh 1}, \qquad z = x.$$
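As a sanity check on this example, the sketch below evaluates the integral numerically for y = sinh x / sinh 1, z = x (the form the general solution takes once y(0) = 0 and y(1) = 1 are imposed) and for the straight line y = z = x joining the same end points. The helper name and the number of sample points are illustrative choices.

```python
import math

def functional(dy, dz, y, n=4000):
    """Midpoint-rule value of the integral of y'^2 + z'^2 + y^2 over [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += dy(x) ** 2 + dz(x) ** 2 + y(x) ** 2
    return h * total

s1 = math.sinh(1.0)
# Curve obtained from the E-L equations with the end point conditions imposed.
extremal = functional(lambda x: math.cosh(x) / s1, lambda x: 1.0, lambda x: math.sinh(x) / s1)
# Straight line joining the same end points, for comparison.
straight = functional(lambda x: 1.0, lambda x: 1.0, lambda x: x)
print(extremal, straight)   # the first value should be the smaller one
```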

Hamilton’s Principle

Suppose a conservative dynamical system is described by coordinates $(q_1, q_2, \dots, q_n)$ and the rates of change of these are $\dot q_i$ $(i = 1, \dots, n)$. Then the kinetic energy of the system is, in general, $T(q_1, \dots, q_n; \dot q_1, \dots, \dot q_n)$ and the potential energy is $V(q_1, \dots, q_n)$. The Lagrangian is then defined by L = T − V and Hamilton's principle states that along the particle orbit the integral

$$\int_{t_1}^{t_2} L\,dt$$

has an extremum. This gives rise to the set of equations of motion

$$\frac{d}{dt}\left( \frac{\partial L}{\partial \dot q_i} \right) - \frac{\partial L}{\partial q_i} = 0, \qquad i = 1, \dots, n,$$

usually known in this context as Lagrange's equations. These can be derived from Newton's laws of motion and then Hamilton's principle becomes a deduction from them. For complicated systems Lagrange's equations are usually easier to handle than any attempt to work out the equations of motion directly from Newton's equations.

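Example: Find the equations of motion for the double pendulum system shown below.

[Figure: double pendulum. The upper bob, of mass m, hangs on a rod of length a at angle θ to the vertical; the lower bob, of mass M, hangs from the upper bob on a rod of length b at angle φ to the vertical.]

The height of the top bob above its equilibrium position is a(1 − cos θ) and the height of the lower bob above its equilibrium is a(1 − cos θ) + b(1 − cos φ). So

$$V = mga(1 - \cos\theta) + Mg\left[ a(1 - \cos\theta) + b(1 - \cos\phi) \right].$$

The horizontal component of velocity of the top bob is $a\dot\theta\cos\theta$ and the vertical component $a\dot\theta\sin\theta$. For the lower bob the corresponding components are $a\dot\theta\cos\theta + b\dot\phi\cos\phi$ and $a\dot\theta\sin\theta + b\dot\phi\sin\phi$ and so

$$T = \tfrac{1}{2}ma^2\dot\theta^2 + \tfrac{1}{2}M\left[ (a\dot\theta\cos\theta + b\dot\phi\cos\phi)^2 + (a\dot\theta\sin\theta + b\dot\phi\sin\phi)^2 \right]$$

$$\;\; = \tfrac{1}{2}ma^2\dot\theta^2 + \tfrac{1}{2}M\left[ a^2\dot\theta^2 + b^2\dot\phi^2 + 2ab\,\dot\theta\dot\phi\cos(\theta - \phi) \right].$$

From Lagrange's equations, remembering that T depends on θ and φ as well as on their rates of change, we then get

$$\frac{d}{dt}\left[ (m + M)a^2\dot\theta + Mab\,\dot\phi\cos(\theta - \phi) \right] + Mab\,\dot\theta\dot\phi\sin(\theta - \phi) + (m + M)ga\sin\theta = 0$$

$$\frac{d}{dt}\left[ Mb^2\dot\phi + Mab\,\dot\theta\cos(\theta - \phi) \right] - Mab\,\dot\theta\dot\phi\sin(\theta - \phi) + Mgb\sin\phi = 0.$$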
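The cross term in the kinetic energy is easy to get wrong, so it is worth checking numerically. The sketch below compares the squared speed of the lower bob, computed directly from the velocity components given above, with the expanded form in which cos θ cos φ + sin θ sin φ has been combined into cos(θ − φ). The sample values are arbitrary and the function names are illustrative.

```python
import math

def lower_bob_speed_sq(a, b, th, ph, th_dot, ph_dot):
    """Squared speed of the lower bob from its horizontal and vertical velocity components."""
    vx = a * th_dot * math.cos(th) + b * ph_dot * math.cos(ph)
    vy = a * th_dot * math.sin(th) + b * ph_dot * math.sin(ph)
    return vx * vx + vy * vy

def expanded_form(a, b, th, ph, th_dot, ph_dot):
    """The same quantity after expanding the squares; the cross term carries cos(th - ph)."""
    return ((a * th_dot) ** 2 + (b * ph_dot) ** 2
            + 2.0 * a * b * th_dot * ph_dot * math.cos(th - ph))

sample = (1.0, 0.7, 0.4, -1.1, 0.9, -0.3)   # hypothetical a, b, theta, phi and the two rates
print(lower_bob_speed_sq(*sample), expanded_form(*sample))
```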

Problems with constraints.

Recall that to find the extremum of a function of several variables with constraints imposed we use Lagrange's method of undetermined multipliers. An exact analogy holds in the case of calculus of variations. Suppose we want to find the extremum of

$$I = \int_a^b F(y, y', x)\,dx$$

subject to the condition

$$H = \int_a^b G(y, y', x)\,dx = \text{const.}$$

Then we apply the Euler-Lagrange equations to F − λG (or F + λG if you prefer) with λ an undetermined multiplier which is determined by the constraint and the end points.

Example: A heavy chain with constant mass per unit length is suspended between two points. What curve does it take up in equilibrium?

The equilibrium condition is such as to minimise gravitational potential energy.

[Figure: the chain hanging between its two end points, with axes x and y]

With the geometry shown this means that we minimise $\int_a^b y\,dl$ where the element of length is given by $dl = \sqrt{1 + y'^2}\,dx$. There is also the constraint that the total length is fixed, so we must minimise

$$\int_a^b y\sqrt{1 + y'^2}\,dx$$

subject to

$$\int_a^b \sqrt{1 + y'^2}\,dx = \text{const.}$$

Thus we apply the Euler-Lagrange equation to $(y - \lambda)\sqrt{1 + y'^2}$. Since this has no explicit dependence on x we can use the result we obtained already (Eq. (5)) to get a first integral, namely,

$$y'\,\frac{\partial}{\partial y'}\left[ (y - \lambda)\sqrt{1 + y'^2} \right] - (y - \lambda)\sqrt{1 + y'^2} = k$$

from which we get

$$y'^2 = \left( \frac{y - \lambda}{k} \right)^2 - 1.$$

If we make the substitution y − λ = k cosh z this gives z' = 1/k so that z = (x + c)/k and we obtain

$$y = \lambda + k\cosh\left( \frac{x + c}{k} \right).$$

The three constants k, c and λ are obtained from the coordinates of the end points and the length. This curve is called a catenary.
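Finding the constants usually has to be done numerically. The sketch below treats the simplest case, assuming the two supports are at x = −d and x = d at the same height, and writes the catenary as y = λ + k cosh((x + c)/k), the form that satisfies the first integral for general k. Symmetry then gives c = 0 and the length condition reads 2k sinh(d/k) = length, which is solved for k by bisection. The half-span, the chain length and the bracketing interval are illustrative assumptions.

```python
import math

def chain_length(k, d):
    """Length of y = lam + k*cosh(x/k) between x = -d and x = d."""
    return 2.0 * k * math.sinh(d / k)

def solve_k(d, length, lo=0.05, hi=1.0e3, tol=1e-12):
    """Bisection for k > 0 with chain_length(k, d) = length.

    The default bracket suits spans d of order one; chain_length decreases
    monotonically towards 2*d as k grows, so the root is unique.
    """
    if length <= 2.0 * d:
        raise ValueError("the chain must be longer than the span")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chain_length(mid, d) > length:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

d, length = 1.0, 3.0                 # hypothetical half-span and chain length
k = solve_k(d, length)
lam = -k * math.cosh(d / k)          # puts both supports at height y = 0
print(k, lam)
```

Supports at different heights would need c to be solved for as well, which turns this into a two-unknown problem.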

If there is more than one constraint then we introduce more than one multiplier.

Example: In statistical mechanics, the distribution of energy of a system of particles is described by a probability distribution function f(E). In equilibrium, theory says that this distribution should be such as to maximise the function

$$-\int_0^\infty f \log f\,dE$$

subject to the conditions

$$\int_0^\infty f(E)\,dE = 1, \qquad \int_0^\infty E f(E)\,dE = E_0.$$

The first of these is just the standard condition on a probability function. The second, with $E_0$ a given constant, says that the average energy per particle, or equivalently the total energy of the system, is fixed.

The Euler-Lagrange equation with these constraints is

$$\frac{\partial}{\partial f}\left( -f\log f - \lambda f - \mu E f \right) = 0$$

with λ and µ the two multipliers corresponding to the two constraints. This yields

$$f = Ce^{-\mu E}$$

where C is a constant into which λ has been incorporated. Using the two constraints gives $\mu = 1/E_0$, $C = 1/E_0$. This is the Boltzmann distribution and $E_0$ is proportional to the temperature of the system.
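The two constraints can be checked directly. With f = C e^(−µE) the normalisation integral is C/µ and the mean energy integral is C/µ², so both conditions hold when µ = C = 1/E₀. The sketch below confirms this by numerical integration; the value of `E0` and the finite upper limit standing in for infinity are illustrative assumptions.

```python
import math

def integrate(func, upper, n=200000):
    """Midpoint rule on [0, upper]; upper stands in for infinity."""
    h = upper / n
    return h * sum(func((i + 0.5) * h) for i in range(n))

E0 = 2.0                                   # hypothetical mean energy
mu, C = 1.0 / E0, 1.0 / E0                 # values fixed by the two constraints

def f(E):
    """Boltzmann distribution f(E) = C exp(-mu E)."""
    return C * math.exp(-mu * E)

upper = 60.0 * E0                          # exp(-60) is negligible
norm = integrate(f, upper)
mean = integrate(lambda E: E * f(E), upper)
print(norm, mean)                          # should be close to 1 and to E0
```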

The isoperimetric problem - find the shape which has maximum area for a given perimeter.

Suppose the parametric equations of the required curve are

$$x = x(t), \qquad y = y(t), \qquad t_0 \le t \le t_1$$

with $x(t_0) = x(t_1)$, $y(t_0) = y(t_1)$, so that we have a closed curve. The length is

$$L = \int_{t_0}^{t_1} (\dot x^2 + \dot y^2)^{1/2}\,dt$$

and is fixed. The area enclosed is

$$A = \iint_A dx\,dy$$

which, by means of Green's theorem, can be expressed as the line integral

$$A = \frac{1}{2}\int_{t_0}^{t_1} (x\dot y - y\dot x)\,dt.$$

So, we construct the Euler-Lagrange equations for the two variables x and y from the function

$$\phi(x, y, \dot x, \dot y) = \tfrac{1}{2}(x\dot y - y\dot x) - \lambda(\dot x^2 + \dot y^2)^{1/2}.$$

These equations are

$$\frac{d}{dt}\left( -\frac{y}{2} - \frac{\lambda\dot x}{(\dot x^2 + \dot y^2)^{1/2}} \right) - \frac{\dot y}{2} = 0$$

$$\frac{d}{dt}\left( \frac{x}{2} - \frac{\lambda\dot y}{(\dot x^2 + \dot y^2)^{1/2}} \right) + \frac{\dot x}{2} = 0.$$

These can be integrated immediately to give

$$y + \frac{\lambda\dot x}{(\dot x^2 + \dot y^2)^{1/2}} = A, \qquad x - \frac{\lambda\dot y}{(\dot x^2 + \dot y^2)^{1/2}} = B$$

where A and B are now constants of integration. Multiplying the first of these by $\dot y$ and the second by $\dot x$ and adding gives

$$\dot x(x - B) + \dot y(y - A) = 0.$$

This has the integral

$$(x - B)^2 + (y - A)^2 = \text{const.}$$

so that the required curve is a circle.
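A simple numerical illustration of this result is to compare regular polygons of fixed perimeter with the circle of the same perimeter. A regular n-sided polygon with perimeter P has area P²/(4n tan(π/n)), which increases with n towards the circle's value P²/(4π). The perimeter and the list of side counts below are arbitrary choices.

```python
import math

def regular_polygon_area(n_sides, perimeter):
    """Area of a regular polygon with n_sides sides and the given perimeter."""
    return perimeter ** 2 / (4.0 * n_sides * math.tan(math.pi / n_sides))

def circle_area(perimeter):
    """Area of the circle whose circumference equals the given perimeter."""
    return perimeter ** 2 / (4.0 * math.pi)

P = 1.0   # hypothetical fixed perimeter
areas = [regular_polygon_area(n, P) for n in (3, 4, 6, 12, 100)]
print(areas, circle_area(P))   # the areas grow with n but stay below the circle
```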

Geodesics

If G(x,y,z) = 0 defines a surface in three dimensional space, then the geodesics on this surface are the curves which produce the shortest distance between points on the surface. So, as we have seen, the geodesics on a plane are just straight lines. We can cast the problem of finding geodesics on a surface into a variational problem with a constraint as follows. If x = x(t), y = y(t), z = z(t) are parametric equations for a curve on the surface, then along any curve on the surface

$$\int_{t_0}^{t_1} G\big( x(t), y(t), z(t) \big)\,dt = 0 \tag{8}$$

Since the element of length along a curve is $\sqrt{\dot x^2 + \dot y^2 + \dot z^2}\,dt$, the problem is to minimise

$$\int_{t_0}^{t_1} \sqrt{\dot x^2 + \dot y^2 + \dot z^2}\,dt$$

subject to the constraint (8). This gives

$$\frac{d}{dt}\left( \frac{\dot x}{F} \right) - \lambda\frac{\partial G}{\partial x} = 0$$

plus similar equations for y and z, with $F = \sqrt{\dot x^2 + \dot y^2 + \dot z^2}$.

As a particular example consider geodesics on the sphere, for which

$$G = x^2 + y^2 + z^2 - R^2$$

and so the equations for the geodesic are

$$\frac{1}{2x}\frac{d}{dt}\left( \frac{\dot x}{F} \right) = \frac{1}{2y}\frac{d}{dt}\left( \frac{\dot y}{F} \right) = \frac{1}{2z}\frac{d}{dt}\left( \frac{\dot z}{F} \right) = \lambda.$$

Expanding the derivatives in the first equation gives

$$\frac{\ddot x F - \dot x \dot F}{2xF^2} = \frac{\ddot y F - \dot y \dot F}{2yF^2}$$

which can be rearranged into

$$\frac{y\ddot x - x\ddot y}{y\dot x - x\dot y} = \frac{\dot F}{F}.$$

In a similar way,

$$\frac{z\ddot y - y\ddot z}{z\dot y - y\dot z} = \frac{\dot F}{F}.$$

We now equate these two expressions for $\dot F / F$ and write the result in the form

$$\frac{\frac{d}{dt}(y\dot x - x\dot y)}{y\dot x - x\dot y} = \frac{\frac{d}{dt}(z\dot y - y\dot z)}{z\dot y - y\dot z}$$

which integrates to give

$$y\dot x - x\dot y = C_1(z\dot y - y\dot z).$$

Writing this in the form

$$\frac{\dot x + C_1\dot z}{x + C_1 z} = \frac{\dot y}{y}$$

we can integrate again to get

$$x + C_1 z = C_2 y.$$

This is the equation of a plane passing through the origin. So, the geodesics on a sphere are the curves formed by the intersection of the sphere and planes through its centre. These are the great circles on the sphere.
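The great circle result can be illustrated by comparing two routes between points at the same latitude: the great-circle arc and the path that stays on the circle of constant latitude. The sketch below uses latitude and longitude in radians. The radius, the latitude and the longitudes are illustrative values, and the function names are not from the notes.

```python
import math

def great_circle_distance(R, lat1, lon1, lat2, lon2):
    """Length of the great-circle arc between two points on a sphere of radius R."""
    # Cosine of the angle between the two position vectors, from their dot product.
    cos_angle = (math.sin(lat1) * math.sin(lat2)
                 + math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1))
    return R * math.acos(max(-1.0, min(1.0, cos_angle)))

def parallel_distance(R, lat, lon1, lon2):
    """Length of the path that stays on the circle of constant latitude."""
    return R * math.cos(lat) * abs(lon2 - lon1)

R = 1.0
lat = math.radians(60.0)                       # hypothetical common latitude
lon1, lon2 = 0.0, math.radians(90.0)
print(great_circle_distance(R, lat, lon1, lat, lon2), parallel_distance(R, lat, lon1, lon2))
```

On the equator the two routes coincide, since the equator is itself a great circle.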

Estimate of an eigenvalue using a variational method.

Suppose we have a problem

$$\frac{d}{dx}\big( p(x)y' \big) + q(x)y = -\lambda r(x)y, \qquad y(0) = y(1) = 0. \tag{7}$$

This will obviously have a trivial solution y = 0, but for certain values of λ (the eigenvalues) there will be non-trivial solutions. If we consider the calculus of variations problem of minimising

$$I = \int_0^1 \left\{ p(x)y'^2 - q(x)y^2 \right\} dx$$

subject to the condition that

$$J = \int_0^1 r(x)y^2\,dx = \text{const.}$$

and the given boundary conditions on y, then we obtain the above equation from the Euler-Lagrange equations and the method of multipliers. The lowest possible eigenvalue is then the minimum possible value of I/J. Since J is constrained to be constant this is just equivalent to the problem of minimising I subject to J being constant. The standard approach to this leads back to the DE and we may appear to be going round in circles. The usefulness of this approach is that if we use any function y(x) satisfying the boundary conditions then the resulting value of I/J is greater than or equal to the minimum possible, so we obtain an upper bound on the lowest eigenvalue. With a choice of y which is a reasonable approximation to the solution we can get a good estimate.

Example: Use this technique to find an estimate of the lowest eigenvalue of the problem

$$y'' + \lambda y = 0, \qquad y(0) = y(1) = 0.$$

This is, of course, a problem to which we know the solution, namely that the eigenvalues are given by λ = n²π², so the lowest is π² = 9.8696 (corresponding to the solution y = sin(πx)). We want a trial function with the required end values, and preferably one which is easily integrated (though numerical integration is readily done with a package like MAPLE). Let us take y = x(1 − x). Then p(x) = 1, q(x) = 0, r(x) = 1 and so

$$I = \int_0^1 (1 - 2x)^2\,dx = \frac{1}{3}, \qquad J = \int_0^1 x^2(1 - x)^2\,dx = \frac{1}{30}$$

giving an upper bound of 10, which is actually a fairly good approximation to the lowest eigenvalue.
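The same estimate can be obtained numerically, which is useful when the trial function is not easily integrated by hand. The sketch below evaluates I/J with a composite Simpson rule for the trial function y = x(1 − x) and, for comparison, for the exact eigenfunction sin(πx). The function names and the number of subintervals are illustrative choices.

```python
import math

def simpson(func, a, b, n=1000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def rayleigh_quotient(y, dy, p=lambda x: 1.0, q=lambda x: 0.0, r=lambda x: 1.0):
    """I/J with I = integral of p y'^2 - q y^2 and J = integral of r y^2 over [0, 1]."""
    I = simpson(lambda x: p(x) * dy(x) ** 2 - q(x) * y(x) ** 2, 0.0, 1.0)
    J = simpson(lambda x: r(x) * y(x) ** 2, 0.0, 1.0)
    return I / J

bound = rayleigh_quotient(lambda x: x * (1 - x), lambda x: 1 - 2 * x)
exact = rayleigh_quotient(lambda x: math.sin(math.pi * x),
                          lambda x: math.pi * math.cos(math.pi * x))
print(bound, exact, math.pi ** 2)
```

The trial function gives a value just above π², while the exact eigenfunction reproduces π² itself.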

Variants on the boundary conditions are possible, for example in the following.

Example: Find the lowest eigenvalue for the problem

$$\frac{d}{dx}(xy') + \lambda xy = 0$$

with y'(0) = y(1) = 0.

A simple function satisfying the boundary conditions is y = 1 − x², for which

$$I = \int_0^1 x(-2x)^2\,dx = 1, \qquad J = \int_0^1 x(1 - x^2)^2\,dx = \frac{1}{6}$$

giving an upper bound of 6. One way of making this procedure more accurate is to introduce one or more unknown parameters into the assumed function. For example, here we could take y = (1 − x²)(1 + cx), which still vanishes at x = 1. (The condition at x = 0 need not be imposed on the trial function, because p(x) = x vanishes there and so the boundary term at x = 0 drops out.) Then

$$\frac{I}{J} = \frac{1 + \frac{16}{15}c + \frac{1}{2}c^2}{\frac{1}{6} + \frac{16}{105}c + \frac{1}{24}c^2}.$$

The point of the exercise is that this gives an upper bound for any value of c. So, if we minimise this expression with respect to c we will get the best possible estimate for this form of y. Differentiating with respect to c we get the condition for a minimum that 40c² + 105c + 32 = 0, and the root which gives a minimum is c = −0.352. The corresponding value of I/J is 5.808. The equation has a solution which is a Bessel function and from this the eigenvalue can be calculated to be 5.783.

The technique of introducing unknown parameters (the Rayleigh-Ritz method) and minimising with respect to them is a very useful technique when the minimum value of an integral is needed. Note that even in this simple example the algebra is tedious, so that use of a computer algebra system is a big help.

In many problems (e.g. finding the ground state energy in a quantum system) the lowest eigenvalue is all that is needed. It is possible to extend this technique to get higher eigenvalues, but we shall not pursue this here.
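The tedious algebra can be delegated to a few lines of exact polynomial arithmetic. The sketch below assumes the trial family y = (1 − x²)(1 + cx), which is the family that reproduces the quadratic 40c² + 105c + 32 = 0 quoted above, and evaluates I/J with p(x) = r(x) = x and q = 0. Polynomials are stored as coefficient lists with the constant term first; the helper names are illustrative.

```python
from fractions import Fraction
import math

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (constant term first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_deriv(p):
    """Derivative of a polynomial given as a coefficient list."""
    return [k * p[k] for k in range(1, len(p))]

def integrate_01(p):
    """Integral over [0, 1] of the polynomial p."""
    return sum(a / (k + 1) for k, a in enumerate(p))

def quotient(c):
    """I/J for y = (1 - x^2)(1 + c x) with p(x) = r(x) = x and q = 0."""
    y = poly_mul([1, 0, -1], [1, c])
    dy = poly_deriv(y)
    x = [0, 1]
    I = integrate_01(poly_mul(x, poly_mul(dy, dy)))
    J = integrate_01(poly_mul(x, poly_mul(y, y)))
    return I / J

# Roots of 40 c^2 + 105 c + 32 = 0; the one nearer zero gives the minimum.
c_min = (-105 + math.sqrt(105 ** 2 - 4 * 40 * 32)) / 80
print(quotient(0), c_min, quotient(c_min))
```

With an integer or `Fraction` argument the arithmetic is exact, so `quotient(0)` returns exactly 6; with a float argument the result is an ordinary float.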

Variable end points

Suppose we relax the condition that the values of y be fixed at the end points but instead assume that they are allowed to vary freely. Then, following the procedure which led to Eq. (3) we obtain, as well as the term in (3), an extra contribution

$$\left[ \frac{\partial F}{\partial y'}\,\eta \right]_{x=b} - \left[ \frac{\partial F}{\partial y'}\,\eta \right]_{x=a}.$$

Since the extremum, if it exists, must be the extreme value for whatever end points turn out to be suitable, the integral term must vanish as before. Otherwise a slightly greater or smaller value for the integral could be obtained by taking a different curve with the same end points. Also, this extra term must vanish which, since η is arbitrary, means that

$$\frac{\partial F}{\partial y'} = 0$$

at both end points.

As a simple example we can consider the problem of minimising the distance between x = 0 and x = 1 without fixing y at the end points. Here

$$F = \sqrt{1 + y'^2}$$

and the solution of the E-L equation is a straight line as before. The extra conditions yield

$$\frac{y'}{\sqrt{1 + y'^2}} = 0$$

at both ends. The derivative must then be zero everywhere, so we arrive at the expected result that the shortest line between x = 0 and x = 1 is a straight line parallel to the x axis.

Another variant of the problem is to consider a case where the end points are constrained to lie on a given curve. For simplicity, let us assume that the lower end point is fixed while the upper end point has to lie on the curve y = g(x). Suppose that the extremum has its upper limit at x = b, while the lower limit x = a is fixed. Then, if y is replaced with y + η as before, there is a change in the upper limit of the integral to b + Δx, say.

The corresponding change in y is

$$\Delta y = y(b + \Delta x) + \eta(b + \Delta x) - y(b) \approx \Delta x\,y'(b) + \eta(b).$$

However, there is also the constraint that the end point lies on the given curve, which gives Δy ≈ g'(b)Δx. Putting these relations together we get

$$\Delta x = \frac{\eta(b)}{g'(b) - y'(b)}.$$

Now, look at the change in the integral:

$$\Delta I = \int_a^{b+\Delta x} F(y + \eta, y' + \eta', x)\,dx - \int_a^b F(y, y', x)\,dx \approx \int_a^{b+\Delta x} \left( \eta\,\frac{\partial F}{\partial y} + \eta'\,\frac{\partial F}{\partial y'} \right) dx + \int_b^{b+\Delta x} F(y, y', x)\,dx.$$

In the first integral here, which contains the small perturbation η, we can neglect the change in the upper limit. Then if we integrate by parts we get the usual integral containing the E-L expression, plus a contribution

$$\left[ \frac{\partial F}{\partial y'}\,\eta \right]_{x=b}.$$

The second integral is approximately

$$F(y, y', b)\,\Delta x = F(y, y', b)\,\frac{\eta(b)}{g'(b) - y'(b)}.$$

[Figure: the curve y(x) and the perturbed curve y + η, both starting from the fixed point at x = a and ending on the curve y = g(x), at x = b and x = b + Δx respectively]

For ΔI to vanish for arbitrary η we require that the E-L equation be satisfied and also that

$$\frac{\partial F}{\partial y'} + \frac{F}{g' - y'} = 0 \tag{9}$$

at the upper limit. If the lower limit had been constrained also to lie on a curve rather than being fixed, a condition analogous to (9) would apply there also.

Example: Find the curve connecting the origin to a curve y = g(x) and which has the shortest length.

Here, as we have seen before, $F = \sqrt{1 + y'^2}$ and the solution of the E-L equation gives a straight line. For this case, the condition (9), which must be satisfied where the straight line meets the given curve, reduces to g'y' = −1, implying that the straight line giving the shortest distance to the curve must be orthogonal to the curve at the point of intersection.
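The orthogonality condition can be checked without using any calculus: minimise the distance from the origin to the curve by direct search, then test whether the product of the two slopes is −1. The sketch below does this with a golden-section search for the assumed target curve g(x) = (x − 2)². The curve, the search interval and the function names are illustrative assumptions.

```python
import math

def g(x):
    """Hypothetical target curve y = g(x)."""
    return (x - 2.0) ** 2

def dg(x):
    """Slope g'(x) of the target curve."""
    return 2.0 * (x - 2.0)

def closest_point(lo=0.1, hi=2.0, iters=200):
    """Golden-section search for the x minimising the squared distance from the origin to (x, g(x))."""
    dist2 = lambda x: x * x + g(x) ** 2
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        x1 = hi - ratio * (hi - lo)
        x2 = lo + ratio * (hi - lo)
        if dist2(x1) < dist2(x2):
            hi = x2
        else:
            lo = x1
    return 0.5 * (lo + hi)

x_star = closest_point()
line_slope = g(x_star) / x_star          # y' of the straight line from the origin
print(x_star, line_slope * dg(x_star))   # the product should be close to -1
```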

Some further comments

In the case of the simple problem of finding a local maximum or minimum of a differentiable function we know that vanishing of the derivative is a necessary condition, but that it is not sufficient. The derivative can vanish but the point can be a point of inflection rather than a turning point. For calculus of variations problems the situation is similar, in that the Euler-Lagrange equation is a necessary, but not sufficient, condition for an extremum. In the case of a function, the nature of the critical point is easily determined by identifying the lowest non-zero derivative. The analysis of the calculus of variations problem is, however, rather complicated and will not be pursued here.

The basic ideas discussed here can be extended in various ways, for example to integrands which involve higher derivatives of y or to problems which involve minimising a multiple integral over some given domain.

Further Reading

Calculus of Variations, R Weinstock
Variational Calculus in Science and Engineering, M J Forray
An Introduction to the Calculus of Variations, L A Pars
