
MA209 Variational Principles

June 3, 2013

The course covers the basics of the calculus of variations, and derives the Euler-Lagrange equations for minimising functionals of the type $I(y) = \int f(x, y, y')\,dx$. It then gives examples of this in physics, namely optics and mechanics. It furthermore considers constrained motion and the method of Lagrange multipliers. Required is a basic understanding of differentiation in many dimensions, together with a knowledge of how to solve ODEs.

Contents

1 Review of Calculus
    1.1 Functions of One Variable
    1.2 Functions of Several Variables
2 Variational Problems
3 Derivation of the Euler Lagrange Equations
    3.1 The one variable - one derivative case
    3.2 Solutions of some examples
    3.3 Extension of the Theory
        3.3.1 More Derivatives
        3.3.2 Several dependent functions
4 Relationship with Optics and Fermat’s Principle
    4.1 Fermat’s Principle
    4.2 Optical Analogy
5 Hamilton’s Principle
6 Constraints and Lagrange Multipliers
    6.1 Finite Dimensions
        6.1.1 Two dimensions
        6.1.2 n dimensions
        6.1.3 Examples
        6.1.4 A functional constrained by a functional
        6.1.5 One functional constrained by a function
7 Constrained Motion

These notes are based on the 2011 MA209 Variational Principles course, taught by J.H. Rawnsley, typeset by Matthew Egginton. No guarantee is given that they are accurate or applicable, but hopefully they will assist your study. Please report any errors, factual or typographical, to [email protected]


MA209 Variational Principles Lecture Notes 2011

1 Review of Calculus

1.1 Functions of One Variable

Figure 1: Graph showing a maximum at x = a

Suppose that $x = a$ is a maximum of $f$. Then the graph of $f$ bears some resemblance to that in figure 1. Suppose that $f$ is differentiable and that $f'(a) \neq 0$. Then we either have that $f'(a) > 0$ or $f'(a) < 0$. Consider the former. We have

$$f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}$$

and if $h > 0$ we have that $f(a+h) - f(a) > 0$ for $h$ small and so $f(a+h) > f(a)$, but as $f(a)$ is a maximum, $f'(a) > 0$ must be impossible. A similar argument shows that $f'(a) < 0$ is impossible. Hence our original assumption is false, and so $f'(a) = 0$.

However, there are functions $f$ with $f'(a) = 0$ at values of $a$ which aren’t extrema, for example $f(x) = x^3$ at $a = 0$. We call a point $a$ where $f'(a) = 0$ a critical point of the function $f$, and we have shown that the set of extrema is a subset of the set of critical points. This is also true for the set of local extrema.

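Example 1.1 Let $f(x) = ax^2 + bx + c$ with $a \neq 0$. Then $f'(x) = 2ax + b$ with $-\frac{b}{2a}$ the only critical point. We have

$$f(x) - f\left(-\frac{b}{2a}\right) = ax^2 + bx + c - a\left(-\frac{b}{2a}\right)^2 - b\left(-\frac{b}{2a}\right) - c = ax^2 + bx - \frac{b^2}{4a} + \frac{b^2}{2a} = a\left(x + \frac{b}{2a}\right)^2$$

and so for $a > 0$ the critical point is a minimum and for $a < 0$ a maximum.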

In general, this won’t be so pretty, but for “nice” functions with Taylor series we have $f(a+h) - f(a) = h f'(a) + \frac{h^2}{2!} f''(a) + \dots$. At a critical point the first term vanishes, and so if $f''(a) \neq 0$ its sign decides the matter: if $f''(a) > 0$ we have a local minimum and if $f''(a) < 0$ we have a local maximum.

1.2 Functions of Several Variables

We will look at the two variable case. Consider a differentiable function $f(x_1, x_2)$ with an extremum at $(a_1, a_2)$. Pick functions $x_1(t)$ and $x_2(t)$ such that $x_1(0) = a_1$ and $x_2(0) = a_2$. Set $g(t) = f(x_1(t), x_2(t))$. Then $g$ takes some of the values of $f$, and since $(a_1, a_2)$ is an extremum of $f$, $t = 0$ is an extremum of $g$. Hence $g'(0) = 0$ and so if $(a_1, a_2)$ is an extremum then we have that

$$\frac{d}{dt} f(x_1(t), x_2(t)) \Big|_{t=0} = 0 \qquad (1)$$

for any pair of functions $(x_1(t), x_2(t))$ passing through $(a_1, a_2)$ at $t = 0$. Thus by the chain rule we have that

$$\frac{\partial f}{\partial x_1}(a_1, a_2)\,\frac{dx_1}{dt}(0) + \frac{\partial f}{\partial x_2}(a_1, a_2)\,\frac{dx_2}{dt}(0) = 0$$

As this is true for arbitrary functions, we must have that $\frac{\partial f}{\partial x_1}(a_1, a_2) = 0 = \frac{\partial f}{\partial x_2}(a_1, a_2)$. Note that it would have been enough to pick two pairs of functions whose derivative vectors at $t = 0$ are linearly independent.



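For $n$ variables, with $f(x)$ real valued and with an extremum at $x = a$, we pick a function $g_v(t) = f(a + tv)$, where $v$ is an arbitrary vector. Then this will have an extremum at $t = 0$, so $g_v'(0) = 0$ for all $v$, and so $\nabla f(a) \cdot v = 0$ for all $v$, so $\nabla f(a) = 0$. If $a$ is a local maximum of $f$ then $t = 0$ is a local maximum of $g_v$ for every $v$. We have

$$g_v''(0) = \sum_{i,j} v_i v_j \frac{\partial^2 f}{\partial x_i \partial x_j}(a) = v^T\,\mathrm{Hess}_f(a)\,v$$

If this is negative for every $v \neq 0$ then all eigenvalues of $\mathrm{Hess}_f(a)$ are negative and $a$ is a local maximum. If some eigenvalues are zero then the second derivatives alone let us deduce nothing, and if the eigenvalues have mixed signs then $a$ is not an extremum.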

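Example 1.2 Suppose that $f(x, y) = ax^2 + bxy + cy^2$. Then $\nabla f = (2ax + by,\; bx + 2cy) = (0, 0)$ at an extremum. Thus if $4ac - b^2 \neq 0$ then $x = 0 = y$ is the only critical point. There we have

$$\sum_{i,j} v_i v_j \frac{\partial^2 f}{\partial x_i \partial x_j}(0, 0) = 2a v_1^2 + 2b v_1 v_2 + 2c v_2^2$$

and so if $a \neq 0$ we get this equal to

$$2\left( a\left(v_1 + \frac{b}{2a} v_2\right)^2 + \left(c - \frac{b^2}{4a}\right) v_2^2 \right)$$

and so we have a maximum or minimum when $a\left(c - \frac{b^2}{4a}\right) > 0$, that is, when $4ac > b^2$.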
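The criterion of Example 1.2 can be turned into a few lines of code. The following is only a sketch: the function name `classify_quadratic` is made up for illustration, and the "saddle" branch uses the standard fact (not derived in these notes) that a Hessian with eigenvalues of both signs gives no extremum.

```python
def classify_quadratic(a: float, b: float, c: float) -> str:
    """Classify the critical point (0, 0) of f(x, y) = a x^2 + b x y + c y^2."""
    # 4ac - b^2 is the determinant of the Hessian [[2a, b], [b, 2c]].
    disc = 4 * a * c - b * b
    if disc > 0:
        # disc > 0 forces a and c to be non-zero with the same sign,
        # so the sign of a decides between minimum and maximum.
        return "minimum" if a > 0 else "maximum"
    if disc < 0:
        return "saddle"
    # disc == 0: the second-order test decides nothing.
    return "degenerate"
```

For instance, `classify_quadratic(1, 0, 1)` corresponds to $f = x^2 + y^2$, which has a minimum at the origin.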

2 Variational Problems

In order to motivate the study of Variational Principles we give some examples of famous problems in the subject.

1. Suppose that $y$ is a function such that $y(x_1) = y_1$ and $y(x_2) = y_2$. We want to find the $y$ whose graph has the shortest length. The length $L(y)$ is given by

$$L(y) = \int_{x_1}^{x_2} \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx$$

We say that $L$ is a “functional” of the function $y$.

2. Brachistochrone. Suppose that we have a bead of mass $m$ sliding down a frictionless wire under gravity along a curve from $(x_1, y_1)$ to $(x_2, y_2)$. Let $T(y)$ be the time taken to go from $(x_1, y_1)$ to $(x_2, y_2)$ along the curve $y$. We want to find a minimum of this. If the time is $t_1$ at $(x_1, y_1)$ and $t_2$ at $(x_2, y_2)$, and we denote by $s$ the arclength along the curve, then

$$T(y) = t_2 - t_1 = \int_{t_1}^{t_2} dt = \int_{s_1}^{s_2} \frac{ds}{ds/dt} = \int_{s_1}^{s_2} \frac{ds}{v} = \int_{x_1}^{x_2} \frac{\sqrt{1 + \left(\frac{dy}{dx}\right)^2}}{v}\,dx$$

We can find the velocity $v$ from conservation of energy. We know that $E = \frac{1}{2}mv^2 + mg(y(x) - y_1) = \frac{1}{2}mv_1^2 + 0$ if the initial speed is $v_1$. If we set $v_1 = 0$ then $v = \sqrt{2g(y_1 - y(x))}$ and so

$$T(y) = \int_{x_1}^{x_2} \frac{\sqrt{1 + \left(\frac{dy}{dx}\right)^2}}{\sqrt{2g(y_1 - y(x))}}\,dx$$

3. Least area of revolution. Take a curve $y$ with $y(x_1) = y_1$ and $y(x_2) = y_2$ and rotate it about the $x$-axis. One then gets a surface of revolution around the $x$-axis. We want to find the curve for which the surface area is as small as possible. The surface area is equal to

$$A(y) = 2\pi \int_{x_1}^{x_2} y \sqrt{1 + (y')^2}\,dx$$



3 Derivation of the Euler Lagrange Equations

3.1 The one variable - one derivative case

The problems in section 2 involve minimising functionals built from a function of one variable by integration of the function and its derivatives, with values of the function specified at the ends of the range of integration. These are typically called fixed endpoint problems. In general, the class of problems of this kind have a functional of the form

$$I(y) = \int_{x_1}^{x_2} f(x, y(x), y'(x))\,dx \qquad (2)$$

for $y(x)$ with $y(x_1) = y_1$ and $y(x_2) = y_2$. In future I will write $y$ for $y(x)$ and $y'$ for $y'(x)$ to simplify the notation.

How do we find extrema of $I(y)$? We proceed in a similar manner to finding conditions for functions at extrema. We consider a one parameter family of functions $y_t$ with $y_0$ the extremising function. They must all have the same fixed endpoints. Then if $g(t) = I(y_t)$ we have $g'(0) = 0$, or $\frac{d}{dt} I(y_t)\big|_{t=0} = 0$. If $y_t = y_0 + tv$ then $v(x_1) = 0 = v(x_2)$. Hence

$$\frac{d}{dt} I(y_0 + tv)\Big|_{t=0} = 0 \qquad (3)$$

for $v$ as defined above. The solutions to this equation are called critical points of $I(y)$.

Example 3.1 Consider

$$I(y) = \int_0^1 \left[x y^2 + (y')^2\right] dx$$

We then have

$$I(y_0 + tv) = \int_0^1 \left[x(y_0 + tv)^2 + (y_0' + tv')^2\right] dx$$

and so

$$\frac{d}{dt} I(y_0 + tv)\Big|_{t=0} = \frac{d}{dt} \int_0^1 \left[x(y_0 + tv)^2 + (y_0' + tv')^2\right] dx\,\Big|_{t=0} = \int_0^1 \left(2x y_0 v + 2 y_0' v'\right) dx$$

$y_0$ is a critical point if the integral is 0 for all $v$ with the conditions as above.

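In the general case $I(y_0 + tv) = \int_{x_1}^{x_2} f(x, y_0 + tv, y_0' + tv')\,dx$ and so if we proceed formally we get

$$\frac{d}{dt} I(y_0 + tv)\Big|_{t=0} = \frac{d}{dt} \int_{x_1}^{x_2} f(x, y_0 + tv, y_0' + tv')\,dx\,\Big|_{t=0} = \int_{x_1}^{x_2} \left[\frac{\partial f}{\partial y}(x, y_0, y_0')\,v + \frac{\partial f}{\partial y'}(x, y_0, y_0')\,v'\right] dx$$

If $y_0$ is a critical point and $v(x)$ is any suitable function with $v(x_1) = 0 = v(x_2)$ then we have from equation (3), integrating the second term by parts,

$$0 = \int_{x_1}^{x_2} \left[\frac{\partial f}{\partial y}(x, y_0, y_0')\,v + \frac{\partial f}{\partial y'}(x, y_0, y_0')\,v'\right] dx = \int_{x_1}^{x_2} \left[\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right)\right] v\,dx + \frac{\partial f}{\partial y'}\,v\,\Big|_{x_1}^{x_2} = \int_{x_1}^{x_2} \left[\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right)\right] v\,dx$$

where the boundary term vanishes because $v(x_1) = 0 = v(x_2)$,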



and hence we want to solve

$$\int_{x_1}^{x_2} \left[\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right)\right] v\,dx = 0$$

for suitable $v$.

We now make rigorous sense of this, and so we need $f$ and its partial derivatives up to order two and $y_0''$ to be continuous. Then $\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right)$ is continuous. We also need $y_0 + tv$ to be a family of functions in a suitable space and so $v$ must have two continuous derivatives.

Theorem 3.1 (The Fundamental Theorem of the Calculus of Variations) If $u(x)$ is continuous on $[x_1, x_2]$ and

$$\int_{x_1}^{x_2} u(x) v(x)\,dx = 0$$

for all $v(x)$ with two continuous derivatives and $v(x_1) = 0 = v(x_2)$ then $u(x) = 0$ for all $x \in [x_1, x_2]$.

Proof We use a contradiction argument. Suppose there is some point $x_0 \in (x_1, x_2)$ with $u(x_0) \neq 0$. Without loss of generality we can assume that $u(x_0) > 0$; if not, consider the function $-u$. Then, as $u$ is continuous, $u(x)$ is positive on some interval around $x_0$. Call this interval $(x_1', x_2')$. Suppose we have $v(x)$ with two continuous derivatives and $v(x) = 0$ for $x \notin [x_1', x_2']$. Then

$$0 = \int_{x_1}^{x_2} u(x) v(x)\,dx = \int_{x_1'}^{x_2'} u(x) v(x)\,dx$$

If furthermore $v(x) > 0$ for all $x \in (x_1', x_2')$ then we have that

$$\int_{x_1'}^{x_2'} u(x) v(x)\,dx > 0$$

This is a contradiction; hence there is no point $x_0 \in (x_1, x_2)$ where $u(x_0) \neq 0$ and so, by continuity, $u(x) = 0$ for all $x \in [x_1, x_2]$. Thus the proof is reduced to a construction of such a function $v(x)$. A suitable function would be

$$v(x) = \begin{cases} 0 & x \notin (x_1', x_2') \\ (x - x_1')^3 (x_2' - x)^3 & x \in (x_1', x_2') \end{cases}$$

which is positive on $(x_1', x_2')$ and, thanks to the cubic factors, has two continuous derivatives at $x_1'$ and $x_2'$.

Q.E.D.

Remark If functionals have more derivatives then this argument could be modified for those. We simply take the power in $v$ to be one higher than the number of continuous derivatives required.

Aside If we need infinitely many derivatives, we can use $e^{-1/x^2}$ (extended by zero at $x = 0$), as it has infinitely many derivatives at $x = 0$ and they are all equal to zero.

Theorem 3.2 If $f$ is a function of three variables with all partial derivatives up to order two continuous then any critical point $y$ of $I(y) = \int_{x_1}^{x_2} f(x, y(x), y'(x))\,dx$ on the set of functions with two continuous derivatives and satisfying endpoint conditions $y(x_1) = y_1$ and $y(x_2) = y_2$ has

$$\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) = 0 \quad \forall x \in [x_1, x_2] \qquad (4)$$



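Proof We showed above that

$$\int_{x_1}^{x_2} \left[\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right)\right] v\,dx = 0$$

for all $v$ with two continuous derivatives and $v(x_1) = 0 = v(x_2)$. The expression in the square brackets is continuous and so by the fundamental theorem (theorem 3.1) must be zero $\forall x \in [x_1, x_2]$. Q.E.D.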

Definition 3.1 If a functional $I(y) = \int_{x_1}^{x_2} f(x, y(x), y'(x))\,dx$ then $f$ is called the Lagrangian of $I$ and $\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) = 0$ is called the Euler-Lagrange equation of $I$.

Remark The E-L equation is a second order ODE for y(x) with endpoint conditions.

3.2 Solutions of some examples

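Example 3.2 Find the E-L equation for $I(y) = \frac{1}{2} \int_0^{\pi} \left(y^2 - (y')^2\right) dx$. We have that $\frac{\partial f}{\partial y} = y$ and $\frac{\partial f}{\partial y'} = -y'$ and so the E-L equation is $y - \frac{d}{dx}(-y') = 0$, giving $y'' + y = 0$.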
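As a numerical sanity check of Example 3.2, the sketch below evaluates the first variation $\frac{d}{dt} I(y_0 + tv)\big|_{t=0} = \int_0^{\pi} (y_0 v - y_0' v')\,dx$ with a composite Simpson rule. The choices $y_0 = \sin x$ (a solution of $y'' + y = 0$ vanishing at both ends), the comparison function $y_0 = x(\pi - x)$, the test variation $v = x(\pi - x)$ and all function names are assumptions made only for illustration.

```python
import math

def simpson(func, lo, hi, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3

def first_variation(y0, dy0, v, dv):
    """d/dt I(y0 + t v) at t = 0 for I(y) = 1/2 * int_0^pi (y^2 - y'^2) dx."""
    return simpson(lambda x: y0(x) * v(x) - dy0(x) * dv(x), 0.0, math.pi)

# Test variation vanishing at both endpoints, and its derivative.
v = lambda x: x * (math.pi - x)
dv = lambda x: math.pi - 2 * x

# y0 = sin x satisfies the E-L equation, so the first variation should vanish.
on_solution = first_variation(math.sin, math.cos, v, dv)
# y0 = x (pi - x) does not satisfy it, so the first variation is non-zero.
off_solution = first_variation(v, dv, v, dv)
```

Here `on_solution` vanishes up to quadrature error, while `off_solution` does not, in line with equation (3).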

We now solve the examples in section 2.

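1. We have from before that

$$L(y) = \int_{x_1}^{x_2} \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx$$

and so $\frac{\partial f}{\partial y} = 0$ and $\frac{\partial f}{\partial y'} = \frac{y'}{\sqrt{1 + (y')^2}}$. The E-L equation then gives

$$-\frac{d}{dx}\left(\frac{y'}{\sqrt{1 + (y')^2}}\right) = 0$$

and so $\frac{y'}{\sqrt{1 + (y')^2}}$ is constant, hence $y' = m$ is constant, giving the line $y = mx + a$.

Remark Any case where $\frac{\partial f}{\partial y} = 0$ will have an immediate integral of the E-L equation $-\frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) = 0$, namely $\frac{\partial f}{\partial y'} = $ constant. We call this a first integral of the E-L equation.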

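Before looking at the other two examples, we note that $x$ does not appear explicitly so we ask if there is a first integral. Observe that

$$\frac{d}{dx}\left(y' \frac{\partial f}{\partial y'} - f\right) = y'' \frac{\partial f}{\partial y'} + y' \frac{d}{dx}\frac{\partial f}{\partial y'} - \left(\frac{\partial f}{\partial x} + y' \frac{\partial f}{\partial y} + y'' \frac{\partial f}{\partial y'}\right) = y' \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) - \frac{\partial f}{\partial x} - y' \frac{\partial f}{\partial y}$$

and if $y$ is a solution of the E-L equations we have that

$$\frac{d}{dx}\left(y' \frac{\partial f}{\partial y'} - f\right) = -\frac{\partial f}{\partial x}$$

and so if $f$ is independent of $x$ then $y' \frac{\partial f}{\partial y'} - f$ is a constant. This is called the first integral for the case of a Lagrangian independent of $x$.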



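2. Brachistochrone We have $f(x, y, y') = \frac{\sqrt{1 + (y')^2}}{\sqrt{y_1 - y}}$ if we ignore the constants. There is no $x$ dependence here and so $y' \frac{\partial f}{\partial y'} - f$ is a constant:

$$y' \frac{\partial f}{\partial y'} - f = \frac{(y')^2}{\sqrt{1 + (y')^2}\,\sqrt{y_1 - y}} - \frac{\sqrt{1 + (y')^2}}{\sqrt{y_1 - y}} = A$$

giving $\frac{-1}{\sqrt{1 + (y')^2}\,\sqrt{y_1 - y}} = A$ and hence $(1 + (y')^2)(y_1 - y) = \frac{1}{A^2}$, and we thus get

$$y' = \pm \sqrt{\frac{1}{A^2 (y_1 - y)} - 1}$$

If we now make the substitution $A^2 (y_1 - y) = \sin^2\frac{\theta}{2}$ then we get that $-A^2 y' = \sin\frac{\theta}{2} \cos\frac{\theta}{2}\,\theta'$ and so

$$-\frac{1}{A^2} \sin\frac{\theta}{2} \cos\frac{\theta}{2}\,\theta' = \pm \sqrt{\frac{1 - \sin^2\frac{\theta}{2}}{\sin^2\frac{\theta}{2}}} = \pm \frac{\cos\frac{\theta}{2}}{\sin\frac{\theta}{2}}$$

and so $-\frac{1}{A^2} \sin^2\frac{\theta}{2}\,\theta' = \pm 1$, giving $-\frac{1}{2A^2}(1 - \cos\theta)\,\theta' = \pm 1$, and then integrating gives

$$-\frac{1}{2A^2}(\theta - \sin\theta) = B \pm x$$

which implicitly determines $\theta(x)$ and so $y(x)$.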

This curve is called a cycloid. Figure 2 shows such a curve.

Figure 2: A cycloid

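3. $f(x, y, y') = 2\pi y \sqrt{1 + (y')^2}$ and observe that we have no $x$ dependence again. Thus we look at the first integral, absorbing the factor $2\pi$ into the constant:

$$y' \frac{\partial f}{\partial y'} - f = \frac{(y')^2 y}{\sqrt{1 + (y')^2}} - y \sqrt{1 + (y')^2} = A$$

and so we get that

$$\frac{-y}{\sqrt{1 + (y')^2}} = A$$

and so $y' = \pm \sqrt{\frac{y^2}{A^2} - 1}$. Only $A^2$ appears here, so from now on we may take $A$ to have the same sign as $y$. If we then make the substitution $\frac{y}{A} = \cosh z$ we get that $\frac{y'}{A} = \sinh z\,z'$ and hence the equation to solve becomes

$$A \sinh z\,z' = \pm \sqrt{\cosh^2 z - 1} = \pm \sinh z$$

and thus we get that $z' = \pm \frac{1}{A}$ and so $z = B \pm \frac{x}{A}$ and so

$$y = A \cosh\left(B \pm \frac{x}{A}\right)$$

and so it looks like figure 3.

Figure 3: The shape of surface which minimises the surface of revolution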

We now try to fit this shape of solution to the endpoint conditions. Without loss of generality we will assume that $y = A \cosh\left(B' + \frac{x}{A}\right)$, and we want a solution with $y(x_1) = y_1$ and $y(x_2) = y_2$. Using the first of these we get that $B' = \cosh^{-1}\left(\frac{y_1}{A}\right) - \frac{x_1}{A}$ and then

$$y = y_1 \cosh\left(\frac{x - x_1}{A}\right) + \sqrt{y_1^2 - A^2}\,\sinh\left(\frac{x - x_1}{A}\right)$$

and using the second condition gives a pretty nasty equation for $A$ (I leave it to the reader to work it out). To see if solutions exist we plot the graph of $y(x)$ for various values of $A$. From this graph you can see that if $(x_2, y_2)$ is to the right of the dotted line then there is no solution. Also note that if $(x_2, y_2)$ is above the dotted line then there are two solutions. Also these solutions may not be extrema, as a broken line may well minimise the problem.

Remark y0 + tv is called a variation of y0, hence the name Calculus of Variations
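The two first integrals found above for the brachistochrone and for the surface of revolution can be checked numerically. The sketch below is illustrative only: it assumes the parametrisation $x = r(\theta - \sin\theta)$, $y = y_1 - r(1 - \cos\theta)$ of the cycloid with $r = \frac{1}{2A^2}$, a positive constant $A$ in the catenary, and hypothetical function names.

```python
import math

def catenary_invariant(A, B, x):
    """y / sqrt(1 + y'^2) for y = A cosh(B + x/A); should equal A."""
    z = B + x / A
    y = A * math.cosh(z)
    dy = math.sinh(z)  # y'(x) = sinh(B + x/A)
    return y / math.sqrt(1 + dy * dy)

def cycloid_invariant(r, theta):
    """(1 + y'^2)(y1 - y) along the cycloid; should equal 2r = 1/A^2."""
    dx = r * (1 - math.cos(theta))    # dx/dtheta
    dy = -r * math.sin(theta)         # dy/dtheta
    slope = dy / dx                   # y'(x) by the chain rule
    drop = r * (1 - math.cos(theta))  # y1 - y
    return (1 + slope ** 2) * drop

catenary_values = [catenary_invariant(2.0, 0.3, x) for x in (-1.0, 0.0, 0.5, 2.0)]
cycloid_values = [cycloid_invariant(1.5, t) for t in (0.5, 1.0, 2.0, 3.0)]
```

Each list should be constant along the curve, which is exactly the statement that $y' \frac{\partial f}{\partial y'} - f$ is a first integral when $f$ does not depend on $x$.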

3.3 Extension of the Theory

3.3.1 More Derivatives

Suppose that

$$I(y) = \int_{x_1}^{x_2} f(x, y, y', \dots, y^{(n)})\,dx$$

We try the same method as before, considering $I(y + tv)$ for $y$ an extremum. Set $g(t) = I(y + tv)$ and then this has an extremum at $t = 0$ so $g'(0) = 0$ and thus

$$\frac{d}{dt} I(y + tv)\Big|_{t=0} = \int_{x_1}^{x_2} \left[\frac{\partial f}{\partial y} v + \frac{\partial f}{\partial y'} v' + \dots + \frac{\partial f}{\partial y^{(n)}} v^{(n)}\right] dx = 0$$

If we assume that $v(x_1) = 0 = v(x_2)$ and that all derivatives of $v$ up to $v^{(n-1)}$ are zero at $x_1$ and $x_2$, then integrating by parts repeatedly we get that

$$\frac{d}{dt} I(y + tv)\Big|_{t=0} = \int_{x_1}^{x_2} \left[\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) + \dots + (-1)^n \frac{d^n}{dx^n}\left(\frac{\partial f}{\partial y^{(n)}}\right)\right] v\,dx = 0$$

For the argument to be complete we need $f$ to have $(n + 1)$ continuous derivatives and $y$ to have $2n$ continuous derivatives. Then the term in square brackets is continuous and we need the version of the fundamental theorem for $v$ with $2n$ continuous derivatives. Then for $y$ an extremum it satisfies

$$\frac{\partial f}{\partial y} - \frac{d}{dx}\left(\frac{\partial f}{\partial y'}\right) + \dots + (-1)^n \frac{d^n}{dx^n}\left(\frac{\partial f}{\partial y^{(n)}}\right) = 0 \qquad (5)$$

This is again called the Euler Lagrange equation for the functional. As before, there is no existence or uniqueness theorem in this case.

Example 3.3 Suppose $I(y) = \int_0^{\pi/2} \left((y'')^2 - y^2\right) dx$ with $y(0) = 0 = y'(0)$, $y\left(\frac{\pi}{2}\right) = 1$ and $y'\left(\frac{\pi}{2}\right) = 0$. The E-L equation gives $-2y + \frac{d^2}{dx^2}(2y'') = 0$ and so $y^{(4)} - y = 0$. This has the general solution $y = A\cos x + B\sin x + Ce^{x} + De^{-x}$, and the endpoint conditions give the four equations

$$0 = A + C + D, \qquad 0 = B + C - D, \qquad 1 = B + Ce^{\pi/2} + De^{-\pi/2}, \qquad 0 = -A + Ce^{\pi/2} - De^{-\pi/2}$$

and these can be solved.

3.3.2 Several dependent functions

Problems involving curves may not be expressible as $y = y(x)$ and so instead we could write the curve in parametric form, i.e. for the length problem we could write $L(x, y) = \int_{t_1}^{t_2} \sqrt{(x')^2 + (y')^2}\,dt$. In general these have the form

$$I(x, y) = \int_{t_1}^{t_2} f(t, x(t), y(t), x'(t), y'(t))\,dt$$

and we use a one parameter variation $(x + hu, y + hv)$. Then $(x, y)$ being an extremum of $I$ means that

$$\frac{d}{dh} I(x + hu, y + hv)\Big|_{h=0} = 0$$

Note that $u$ and $v$ must vanish at the endpoints to preserve the endpoint conditions.

If we first take $v(t) = 0$ $\forall t \in [t_1, t_2]$ then $\frac{d}{dh} I(x + hu, y)\big|_{h=0} = 0$ and so

$$\frac{\partial f}{\partial x} - \frac{d}{dt}\left(\frac{\partial f}{\partial x'}\right) = 0$$

Similarly if $u(t) = 0$ $\forall t \in [t_1, t_2]$ then $\frac{d}{dh} I(x, y + hv)\big|_{h=0} = 0$ and so

$$\frac{\partial f}{\partial y} - \frac{d}{dt}\left(\frac{\partial f}{\partial y'}\right) = 0$$

In other words both $x$ and $y$ satisfy the Euler Lagrange equation for one variable.

One can also derive these two equations as we did before: using the chain rule on the necessary condition, then integrating by parts. Then taking $v = 0$ and then $u = 0$ we can apply the fundamental theorem in both cases, giving the result above.

It should be clear that this works for any number of dependent functions, so long as they can be varied independently. If

$$I(x_1, \dots, x_n) = \int_{t_1}^{t_2} f(t, x_1(t), \dots, x_n(t), x_1'(t), \dots, x_n'(t))\,dt$$

then $I$ has $n$ simultaneous E-L equations

$$\frac{\partial f}{\partial x_i} - \frac{d}{dt}\left(\frac{\partial f}{\partial x_i'}\right) = 0 \quad \forall i = 1, \dots, n \qquad (6)$$



Example 3.4 Suppose that $L(x, y) = \int_0^1 \sqrt{(x')^2 + (y')^2}\,dt$ with $x(0) = x_1$ and $x(1) = x_2$ as well as $y(0) = y_1$ and $y(1) = y_2$. This has two E-L equations:

$$-\frac{d}{dt}\left(\frac{x'}{\sqrt{(x')^2 + (y')^2}}\right) = 0, \qquad -\frac{d}{dt}\left(\frac{y'}{\sqrt{(x')^2 + (y')^2}}\right) = 0$$

and so both $\frac{x'}{\sqrt{(x')^2 + (y')^2}}$ and $\frac{y'}{\sqrt{(x')^2 + (y')^2}}$ are constants. Thus $\frac{1}{\sqrt{(x')^2 + (y')^2}}(x', y') = (A, B)$ is a constant unit vector. Hence $(x(t), y(t))$ is a curve with a constant direction. If $c(t) = \sqrt{(x')^2 + (y')^2}$ then $(x, y) = d(t)(A, B) + (C, D)$ where $d' = c$.

Remark Observe that although it is written in terms of two variables, the problem is degenerate. It has infinitely many solutions given by different possible functions $d(t)$.

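If there is no explicit $t$ dependence, i.e. $\frac{\partial f}{\partial t} = 0$, then consider

$$F(t) = x_1' \frac{\partial f}{\partial x_1'} + \dots + x_n' \frac{\partial f}{\partial x_n'} - f$$

Then

$$\begin{aligned}
\frac{dF}{dt} = {} & x_1'' \frac{\partial f}{\partial x_1'} + x_1' \frac{d}{dt}\left(\frac{\partial f}{\partial x_1'}\right) + \dots + x_n'' \frac{\partial f}{\partial x_n'} + x_n' \frac{d}{dt}\left(\frac{\partial f}{\partial x_n'}\right) \\
& - \frac{\partial f}{\partial t} - x_1' \frac{\partial f}{\partial x_1} - \dots - x_n' \frac{\partial f}{\partial x_n} - x_1'' \frac{\partial f}{\partial x_1'} - \dots - x_n'' \frac{\partial f}{\partial x_n'} = 0
\end{aligned}$$

if there is no explicit time dependence and $x_1, \dots, x_n$ satisfy the E-L equations. Hence $F$ is constant and this is another First Integral.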

4 Relationship with Optics and Fermat’s Principle

We look here at rays of light in the plane moving with speed c(x, y)

4.1 Fermat’s Principle

Theorem 4.1 (Fermat’s Principle) Light travels along a path between two points $(x_1, y_1)$ and $(x_2, y_2)$ so as to take the least time to get from $(x_1, y_1)$ to $(x_2, y_2)$.

$c(x, y)$ is the speed at $(x, y)$, and if we travel along a path the speed will be the rate of change of arclength along the path. Thus if we measure arclength $s$ from an initial position, then $\frac{ds}{dt} = c(x, y)$. If the path is the graph of a function $y(x)$ then for a path from $(x_1, y_1)$ to $(x_2, y_2)$, where we are at $(x_1, y_1)$ at time $t_1$ and arclength $s_1$ and at $(x_2, y_2)$ at time $t_2$ and arclength $s_2$, we get that

$$T(y) = t_2 - t_1 = \int_{t_1}^{t_2} dt = \int_{s_1}^{s_2} \frac{ds}{ds/dt} = \int_{s_1}^{s_2} \frac{ds}{c} = \int_{x_1}^{x_2} \frac{\sqrt{1 + (y')^2}}{c(x, y)}\,dx$$

The actual path followed by a light ray will be a minimum of $T(y)$.



Example 4.1 Light in a homogeneous medium Here we assume that $c$ is a constant. We have that

$$T(y) = \frac{1}{c} \int_{x_1}^{x_2} \sqrt{1 + (y')^2}\,dx = \frac{1}{c} L(y)$$

and hence in a homogeneous medium light travels in straight lines since these are critical points of the length functional.

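Example 4.2 The Law of Refraction Suppose that we have two homogeneous media with speeds $c_1$ and $c_2$, separated by a straight line interface, and a ray of light passing from the first to the second. We know that the path will be a broken line, but what is the change in direction at the interface? We take the interface to be the $x$-axis and look at broken straight line paths from $(x_1, y_1)$ to $(x_2, y_2)$ passing through the point $(x_0, 0)$ on the $x$-axis. The time taken is $\tau(x_0)$ and this is equal to

$$\tau(x_0) = \frac{\sqrt{(x_0 - x_1)^2 + y_1^2}}{c_1} + \frac{\sqrt{(x_2 - x_0)^2 + y_2^2}}{c_2}$$

The actual path will be a minimum with respect to $x_0$ and so at the point where the path crosses the $x$-axis we have $\frac{d\tau}{dx_0} = 0$. Now

$$\frac{d\tau}{dx_0} = \frac{x_0 - x_1}{c_1 \sqrt{(x_0 - x_1)^2 + y_1^2}} - \frac{x_2 - x_0}{c_2 \sqrt{(x_2 - x_0)^2 + y_2^2}} = 0$$

whence, writing $\theta_1$ and $\theta_2$ for the angles the two segments make with the normal to the interface, $\frac{\sin\theta_1}{c_1} - \frac{\sin\theta_2}{c_2} = 0$ or

$$\frac{\sin\theta_1}{\sin\theta_2} = \frac{c_1}{c_2} \qquad (7)$$

This is known as Snell’s Law.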
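Snell's Law can be checked numerically by minimising $\tau(x_0)$ directly. The sketch below finds the zero of $\frac{d\tau}{dx_0}$ by bisection; the sample points, the speeds and the function names are arbitrary illustrative choices, and the code assumes $x_1 < x_2$ with $y_1, y_2 \neq 0$.

```python
import math

def dtau(x0, p1, p2, c1, c2):
    """Derivative of the travel time with respect to the crossing point x0."""
    (x1, y1), (x2, y2) = p1, p2
    return ((x0 - x1) / (c1 * math.hypot(x0 - x1, y1))
            - (x2 - x0) / (c2 * math.hypot(x2 - x0, y2)))

def crossing_point(p1, p2, c1, c2, iterations=100):
    """Bisection on dtau, which is negative at x1 and positive at x2."""
    lo, hi = p1[0], p2[0]
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if dtau(mid, p1, p2, c1, c2) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p1, p2 = (0.0, 1.0), (2.0, -1.0)   # one point on each side of the x-axis
c1, c2 = 1.0, 0.5                  # speeds in the two media
x0 = crossing_point(p1, p2, c1, c2)

# Sines of the angles measured from the normal (the y direction).
sin1 = (x0 - p1[0]) / math.hypot(x0 - p1[0], p1[1])
sin2 = (p2[0] - x0) / math.hypot(p2[0] - x0, p2[1])
ratio = sin1 / sin2
```

The computed `ratio` agrees with $c_1 / c_2$ to within rounding, as equation (7) predicts.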

Suppose that $c$ is only a function of $y$, i.e. that $c(x, y) = c(y)$. We divide the plane into strips parallel to the $x$-axis. In each strip, the path is approximated by a straight line segment. Then the slope in the strip will be approximately $\frac{dy}{dx}$. We then have that $\cot\theta = \frac{dy}{dx} = y'$ and then $\sin\theta = \frac{1}{\sqrt{1 + (y')^2}}$. It is $\cot\theta$ here because $\theta$ is the angle the ray makes with the $y$ direction. According to Snell’s Law $\frac{\sin\theta}{c}$ is a constant and so $\frac{1}{c(y)} \frac{1}{\sqrt{1 + (y')^2}} = K$ is a constant. This equation gives $\sqrt{1 + (y')^2} = \frac{1}{K c(y)}$ and so $y' = \pm \sqrt{\frac{1}{K^2 c^2(y)} - 1}$. Then dividing by the square root term and integrating with respect to $x$ gives

$$\int \frac{dy}{\sqrt{\frac{1}{K^2 c^2(y)} - 1}} = A \pm x$$

This gives an equation for $x$ as a function of $y$ and by solving, or using a substitution, we get an explicit solution.

We now rework the above using the Calculus of Variations. In this case we have a functional independent of $x$, namely

$$T(y) = \int_{x_1}^{x_2} \frac{\sqrt{1 + (y')^2}}{c(y)}\,dx$$

and then this has a first integral of

$$\frac{(y')^2}{c(y) \sqrt{1 + (y')^2}} - \frac{\sqrt{1 + (y')^2}}{c(y)} = -K$$

where we have written the constant as $-K$. This gives $\frac{-1}{c(y) \sqrt{1 + (y')^2}} = -K$, which we deduced from Snell’s law before. Hence the first integral of Fermat’s Principle is Snell’s Law.



4.2 Optical Analogy

If a problem in the Calculus of Variations leads to a functional of the same form as that coming from Fermat’s Principle and the optical problem is already solved then the same solution applies to the variational problem. It then has the solution, when the functional is independent of $x$, given by

$$A \pm x = \int \frac{dy}{\sqrt{\frac{1}{K^2 c^2(y)} - 1}}$$

This was how Bernoulli first solved the Brachistochrone problem, where we have

$$\int_{x_1}^{x_2} \frac{\sqrt{1 + (y')^2}}{\sqrt{2g(y_1 - y)}}\,dx$$

as our functional. If we take $c(y) = \sqrt{2g(y_1 - y)}$ then we can write down the integral formula for the solution.

When $x$ appears explicitly we have to go to the full E-L equations.

5 Hamilton’s Principle

Suppose that $\mathbf{x}(t) = (x(t), y(t), z(t))$ describes the motion of a point particle in three dimensions where $t$ is the time variable. We define $\dot{\mathbf{x}} := \frac{d\mathbf{x}}{dt}$ and call it the velocity $\mathbf{v}$. Furthermore we define $\ddot{\mathbf{x}} := \frac{d^2\mathbf{x}}{dt^2}$ and call it the acceleration. $v := |\mathbf{v}| := \sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2} = \sqrt{\mathbf{v} \cdot \mathbf{v}}$ is called the speed. The motion is governed by the mass $m > 0$. The kinetic energy is $\frac{1}{2}mv^2 = \frac{1}{2}m(\dot{x}^2 + \dot{y}^2 + \dot{z}^2)$. If we have many particles then the total kinetic energy is $T = \sum_i \frac{1}{2} m_i v_i^2$. If $q_1, \dots, q_n$ is a different set of coordinates of which $x_1, y_1, z_1, x_2, y_2, z_2, \dots$ are functions then we get $T$ as a function of $q_1, \dots, q_n, \dot{q}_1, \dots, \dot{q}_n$ by substitution.

Definition 5.1 A conservative system is one where the forces acting $\mathbf{F}$ can be given in terms of a function $V$ such that $\mathbf{F} = -\nabla V$. $V$ is called the potential energy and is a function of the independent coordinates $q_1, \dots, q_n$.

Definition 5.2 The Lagrangian of the system is

$$L(q_1, \dots, q_n, \dot{q}_1, \dots, \dot{q}_n) := T - V$$

Example 5.1 Suppose that a particle of mass $m$ is moving in a circle of radius $R$ in the $x$-$y$ plane with gravity acting in the negative $y$ direction. Then the potential is given by $V := mgy = mgR\sin\theta$ and the kinetic energy is $T = \frac{1}{2} m R^2 \dot{\theta}^2$ and so the Lagrangian is $L(\theta, \dot{\theta}) = \frac{1}{2} m R^2 \dot{\theta}^2 - mgR\sin\theta$.

Theorem 5.1 (Hamilton’s Principle) The path followed by a system described by a Lagrangian $L = T - V$ in getting from an initial position $P_1$ at time $t_1$ to a final position $P_2$ at time $t_2$ is a critical point of the functional

$$I = \int_{t_1}^{t_2} L\,dt$$

amongst all possible paths from $P_1$ to $P_2$ at the relevant times.

Hence the actual path satisfies the E-L equations for $L$, namely

$$\frac{\partial L}{\partial q_i} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right) = 0 \quad \text{for } i = 1, \dots, n \qquad (8)$$



Example 5.2 Suppose we have a particle on a circle of radius $R$ which is acted upon by gravity (see example 5.1). Then we have $L(\theta, \dot{\theta}) = \frac{1}{2} m R^2 \dot{\theta}^2 - mgR\sin\theta$ and so the E-L equation for this gives

$$-mgR\cos\theta - \frac{d}{dt}\left(m R^2 \dot{\theta}\right) = 0 \implies \ddot{\theta} + \frac{g}{R}\cos\theta = 0$$

and this is called the pendulum equation.

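Example 5.3 Suppose that we have a particle of mass $m$ moving in $\mathbb{R}^3$ with a force $\mathbf{F} = -\nabla V$. Then $L = \frac{1}{2} m (\dot{x}^2 + \dot{y}^2 + \dot{z}^2) - V(x, y, z)$ and the E-L equations give

$$\begin{aligned}
\frac{\partial L}{\partial x} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\right) = 0 &\implies -\frac{\partial V}{\partial x} - m\ddot{x} = 0 \\
\frac{\partial L}{\partial y} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{y}}\right) = 0 &\implies -\frac{\partial V}{\partial y} - m\ddot{y} = 0 \\
\frac{\partial L}{\partial z} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{z}}\right) = 0 &\implies -\frac{\partial V}{\partial z} - m\ddot{z} = 0
\end{aligned}$$

so that $\mathbf{F} - m\ddot{\mathbf{x}} = 0$, i.e. Newton’s Second Law. Thus Hamilton’s principle is in accord with Newton’s second Law.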

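Observe that $L$ is independent of the time variable, and so we always have a first integral of the form

$$\dot{q}_1 \frac{\partial L}{\partial \dot{q}_1} + \dots + \dot{q}_n \frac{\partial L}{\partial \dot{q}_n} - L = \text{constant}$$

Observe that the kinetic energy is quadratic in the derivatives, and will be so for any system. Thus

$$T(q_1, \dots, q_n, \dot{q}_1, \dots, \dot{q}_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} \dot{q}_i \dot{q}_j T_{ij}(q_1, \dots, q_n)$$

And hence we get the identity

$$T(q_1, \dots, q_n, a\dot{q}_1, \dots, a\dot{q}_n) = a^2\,T(q_1, \dots, q_n, \dot{q}_1, \dots, \dot{q}_n) \qquad (9)$$

which is called Euler’s Formula. It should be clear that $\frac{\partial L}{\partial \dot{q}_i} = \frac{\partial T}{\partial \dot{q}_i}$, as $V$ is independent of $\dot{q}_1, \dots, \dot{q}_n$, so the first integral becomes

$$\dot{q}_1 \frac{\partial T}{\partial \dot{q}_1} + \dots + \dot{q}_n \frac{\partial T}{\partial \dot{q}_n} - L = \text{constant}$$

Differentiating (9) with respect to $a$ and setting $a = 1$ gives $\dot{q}_1 \frac{\partial T}{\partial \dot{q}_1} + \dots + \dot{q}_n \frac{\partial T}{\partial \dot{q}_n} = 2T$, and hence $2T - (T - V) = T + V = $ constant. This is called conservation of energy.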
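Conservation of energy can be observed numerically for the pendulum equation of Example 5.2. The following sketch integrates $\ddot{\theta} = -\frac{g}{R}\cos\theta$ with a classical fourth-order Runge-Kutta step and monitors $T + V = \frac{1}{2} m R^2 \dot{\theta}^2 + mgR\sin\theta$. The integrator, the step size, the initial data and the parameter values are illustrative assumptions, not part of the notes.

```python
import math

def rk4_step(theta, omega, dt, g_over_r):
    """One RK4 step for theta' = omega, omega' = -(g/R) cos(theta)."""
    def acc(th):
        return -g_over_r * math.cos(th)
    k1t, k1w = omega, acc(theta)
    k2t, k2w = omega + 0.5 * dt * k1w, acc(theta + 0.5 * dt * k1t)
    k3t, k3w = omega + 0.5 * dt * k2w, acc(theta + 0.5 * dt * k2t)
    k4t, k4w = omega + dt * k3w, acc(theta + dt * k3t)
    theta += dt * (k1t + 2 * k2t + 2 * k3t + k4t) / 6
    omega += dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6
    return theta, omega

def energy(theta, omega, m, R, g):
    """Total energy T + V for the particle on a circle."""
    return 0.5 * m * R * R * omega * omega + m * g * R * math.sin(theta)

m, R, g = 1.0, 2.0, 9.81
theta, omega = 0.0, 0.5
e0 = energy(theta, omega, m, R, g)
for _ in range(2000):
    theta, omega = rk4_step(theta, omega, 1e-3, g / R)
drift = abs(energy(theta, omega, m, R, g) - e0)
```

The value of `drift` stays at the level of the integration error, consistent with $T + V$ being a first integral of the motion.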

6 Constraints and Lagrange Multipliers

6.1 Finite Dimensions

6.1.1 Two dimensions

A typical example is to find extrema of $f(x, y)$ on the set $\{(x, y) \in \mathbb{R}^2 \mid g(x, y) = 0\}$. The implicit function theorem tells us which variable in an equation can be solved for in terms of the others.

If $\frac{\partial g}{\partial y}(x_0, y_0) \neq 0$ at some point $(x_0, y_0)$ with $g(x_0, y_0) = 0$ then there is a function $\eta(x)$ defined for $x$ near $x_0$ with $\eta(x_0) = y_0$, $\eta$ differentiable, such that all solutions $(x, y)$ of $g(x, y) = 0$ near $(x_0, y_0)$ have the form $(x, \eta(x))$.

We say the constraint is regular if at every solution at least one of the partial derivatives of $g$ is non-zero.



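If $(x_0, y_0)$ is an extremum of $f$ on $\{(x, y) \mid g(x, y) = 0\}$ then let $y = \eta(x)$ be a solution near $(x_0, y_0)$ of the constraint, and then substitute this in $f$ to give $f(x, \eta(x))$, and this has $x_0$ as an extremum. Therefore

$$\frac{d}{dx} f(x, \eta(x))\Big|_{x = x_0} = 0 \qquad (10)$$

We also have the fact that $g(x, \eta(x)) = 0$ for all $x$ for which $\eta$ is defined. Equation (10), by the chain rule, yields

$$\frac{\partial f}{\partial x}(x_0, y_0) + \frac{\partial f}{\partial y}(x_0, y_0)\,\frac{d\eta}{dx}(x_0) = 0$$

and we also have that

$$\frac{d}{dx} g(x, \eta(x)) = 0 \implies \frac{\partial g}{\partial x}(x, \eta(x)) + \frac{\partial g}{\partial y}(x, \eta(x))\,\frac{d\eta}{dx}(x) = 0$$

for all $x$ near $x_0$, and if one evaluates this at $x = x_0$ we get that

$$\frac{d\eta}{dx}(x_0) = -\frac{\frac{\partial g}{\partial x}(x_0, y_0)}{\frac{\partial g}{\partial y}(x_0, y_0)}$$

as the denominator is non zero by assumption. From these we get that:

$$\frac{\partial f}{\partial x}(x_0, y_0) - \frac{\partial f}{\partial y}(x_0, y_0)\,\frac{\frac{\partial g}{\partial x}(x_0, y_0)}{\frac{\partial g}{\partial y}(x_0, y_0)} = 0$$

and if we define $\lambda = \frac{\frac{\partial f}{\partial y}(x_0, y_0)}{\frac{\partial g}{\partial y}(x_0, y_0)}$ then this becomes

$$\frac{\partial f}{\partial x}(x_0, y_0) - \lambda \frac{\partial g}{\partial x}(x_0, y_0) = 0 \implies \frac{\partial}{\partial x}(f - \lambda g)\Big|_{(x_0, y_0)} = 0$$

Also, by definition of $\lambda$ we get that $\frac{\partial}{\partial y}(f - \lambda g)\big|_{(x_0, y_0)} = 0$ and therefore $f - \lambda g$ has a critical point at $(x_0, y_0)$.

Similarly if $\frac{\partial g}{\partial x}(x_0, y_0) \neq 0$ and $(x_0, y_0)$ is an extremum of $f$ on $\{(x, y) \mid g(x, y) = 0\}$ then there is a constant $\lambda' = \frac{\frac{\partial f}{\partial x}(x_0, y_0)}{\frac{\partial g}{\partial x}(x_0, y_0)}$ such that $f - \lambda' g$ has a critical point at $(x_0, y_0)$.

We have thus proved: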
We have thus proved:

Theorem 6.1 (Lagrange Multiplier) If $g$ is a regular constraint with $\nabla g \neq 0$ for all $(x, y)$ with $g(x, y) = 0$ then any extremum $(x_0, y_0)$ of $f(x, y)$ on the set $\{(x, y) \mid g(x, y) = 0\}$ has an associated real number $\lambda$ such that $f - \lambda g$ has a critical point at $(x_0, y_0)$.

We call $\lambda$ the Lagrange multiplier for $(x_0, y_0)$. The unknowns are now $(x_0, y_0)$ and $\lambda$. The condition of $f - \lambda g$ having a critical point at $(x_0, y_0)$ is $\nabla(f - \lambda g)(x_0, y_0) = 0$ and we also have the condition $g(x_0, y_0) = 0$.

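Example 6.1 Find the extrema of $f(x, y) = ax + by$ on $x^2 + y^2 = 1$. An extremum is a critical point of $f - \lambda g = ax + by - \lambda(x^2 + y^2 - 1)$ and so $0 = a - 2\lambda x$ and $0 = b - 2\lambda y$. Together with $x^2 + y^2 = 1$ this gives $a^2 + b^2 = 4\lambda^2$ and so $\lambda = \pm \frac{\sqrt{a^2 + b^2}}{2}$ and so $(x, y) = \pm\left(\frac{a}{\sqrt{a^2 + b^2}}, \frac{b}{\sqrt{a^2 + b^2}}\right)$, with the same sign in both components. Then $f = \pm \frac{a^2 + b^2}{\sqrt{a^2 + b^2}} = \pm \sqrt{a^2 + b^2}$ and hence there is a maximum value of $+\sqrt{a^2 + b^2}$ and a minimum value of $-\sqrt{a^2 + b^2}$.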

In general there may be more solutions to $\nabla(f - \lambda g)(x_0, y_0) = 0$ and $g(x_0, y_0) = 0$ than there are extrema $(x_0, y_0)$ of $f$ on $\{(x, y) \mid g(x, y) = 0\}$.

Definition 6.1 We call the solutions to the above constrained critical points of $f$.

The constrained critical points of $f$ on $g(x, y) = 0$ are unconstrained critical points of $f - \lambda g$ for some $\lambda$.
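The Lagrange multiplier solution of Example 6.1 can be compared with a brute-force search over the circle. This is a minimal sketch: the function names and the sample values $a = 3$, $b = 4$ are arbitrary, and $(a, b) \neq (0, 0)$ is assumed.

```python
import math

def constrained_extrema(a, b):
    """Constrained critical points of f = a x + b y on x^2 + y^2 = 1.

    Returns a list of (lambda, (x, y), f(x, y)) for both signs of lambda.
    """
    norm = math.hypot(a, b)  # sqrt(a^2 + b^2), assumed non-zero
    points = []
    for sign in (+1, -1):
        lam = sign * norm / 2
        x, y = a / (2 * lam), b / (2 * lam)  # from 0 = a - 2 lam x, 0 = b - 2 lam y
        points.append((lam, (x, y), a * x + b * y))
    return points

def brute_force_max(a, b, samples=100000):
    """Largest value of f over equally spaced sample points on the unit circle."""
    return max(
        a * math.cos(2 * math.pi * k / samples) + b * math.sin(2 * math.pi * k / samples)
        for k in range(samples)
    )

points = constrained_extrema(3.0, 4.0)
sampled_max = brute_force_max(3.0, 4.0)
```

Both approaches give the maximum value $\sqrt{a^2 + b^2}$, here $5$, attained at $\left(\frac{3}{5}, \frac{4}{5}\right)$.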



6.1.2 n dimensions

Let $f$ be a function of $n$ variables and look for extrema of $f(x_1, \dots, x_n)$ on the set of points where a function $g(x_1, \dots, x_n) = 0$. Suppose that $x = (x_1, \dots, x_n)$ is an extreme point, pick two vectors $u$ and $v$, and consider the function of two variables $F_{u,v}(h, k) := f(x + hu + kv)$ subject to the constraint $G_{u,v}(h, k) := g(x + hu + kv) = 0$. Then $(h, k) = (0, 0)$ is an extremum of $F_{u,v}$ subject to $G_{u,v}(h, k) = 0$. Hence there is a Lagrange multiplier $\lambda_{u,v}$ such that $F_{u,v} - \lambda_{u,v} G_{u,v}$ has a critical point at $(h, k) = (0, 0)$. Therefore we have that

$$\frac{\partial}{\partial h}(F_{u,v} - \lambda_{u,v} G_{u,v}) = 0$$

and that

$$\frac{\partial}{\partial k}(F_{u,v} - \lambda_{u,v} G_{u,v}) = 0$$

at $(h, k) = (0, 0)$. This then gives us that

$$u \cdot \nabla(f - \lambda_{u,v} g)(x) = 0$$

and that

$$v \cdot \nabla(f - \lambda_{u,v} g)(x) = 0$$

from the fact that $\frac{\partial F_{u,v}}{\partial h} = \frac{\partial}{\partial h}(f(x + hu + kv)) = u \cdot \nabla f(x + hu + kv)$ and similarly for the other partial derivatives.

Then for every pair of vectors $u$ and $v$ we have that $\nabla(f - \lambda_{u,v} g)(x)$ is perpendicular to both $u$ and $v$.

If $e_1, \dots, e_n$ is the standard basis and $\lambda_{ij} = \lambda_{e_i, e_j}$ then $\nabla f(x) - \lambda_{ij} \nabla g(x)$ is perpendicular to both $e_i$ and $e_j$ for each $i$ and $j$. In terms of the partial derivatives this becomes

$$\frac{\partial f}{\partial x_i}(x) - \lambda_{ij} \frac{\partial g}{\partial x_i}(x) = 0 = \frac{\partial f}{\partial x_j}(x) - \lambda_{ij} \frac{\partial g}{\partial x_j}(x) \quad \forall i, j$$

We aim to find a condition that is independent of $j$ and so have a single Lagrange multiplier for each of the equations. For a regular constraint we need $\nabla g \neq 0$ everywhere on $g(x) = 0$. Thus at least one partial derivative of $g$ is non zero, say the $i_0$th. Then we can write

$$\lambda_{i_0 j} = \frac{\frac{\partial f}{\partial x_{i_0}}(x)}{\frac{\partial g}{\partial x_{i_0}}(x)}$$

and this is independent of $j$. Then if we put $\lambda := \lambda_{i_0 j}$ for any $j$ and then input this into the second equation, we get that

$$\frac{\partial f}{\partial x_j}(x) - \lambda \frac{\partial g}{\partial x_j}(x) = 0 \quad \forall j$$

Hence we have a $\lambda$ such that $\nabla(f - \lambda g)(x) = 0$.

Example 6.2 Find the point on the plane $x \cdot n = p$ closest to a given point $a$ not on the plane. Here we take $n$ to be a unit vector.

We aim to minimise the distance from a point $x$ to the point $a$ such that $x \cdot n = p$. The Euclidean distance is given by $d(x, a) = |x - a|$ but we will take the square of this to simplify the working. It should be clear that if the square of the distance has a minimum, then so must the distance itself. Thus we want to minimise $f(x) = |x - a|^2 = \sum_{i=1}^{n} (x_i - a_i)^2$ subject to $g(x) = x \cdot n - p = 0$. Now $\nabla f = 2(x - a)$ and $\nabla g = n$. At the critical point there is a number $\lambda$ such that $\nabla f - \lambda \nabla g = 0$, and in this case this is $2(x - a) - \lambda n = 0$, and so we get that $x = a + \frac{\lambda}{2} n$ and $\left(a + \frac{\lambda}{2} n\right) \cdot n = p$, and so, since $n \cdot n = 1$, $\frac{\lambda}{2} = p - a \cdot n$, thus $x = a + (p - a \cdot n) n$.

This is a minimum, since the distance from $a$ to any other point of the plane is the hypotenuse of a right angled triangle, one of whose other sides is the segment from $a$ to the critical point.
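The closed form $x = a + (p - a \cdot n) n$ of Example 6.2 is easy to turn into code. The sketch below is illustrative: the function name is hypothetical, vectors are plain Python lists, and the normal is normalised first so that the formula for a unit normal applies even when the given $n$ is not of unit length.

```python
import math

def closest_point_on_plane(a, n, p):
    """Point of the plane x . n = p closest to a, via the Lagrange multiplier formula."""
    norm = math.sqrt(sum(c * c for c in n))
    unit = [c / norm for c in n]  # unit normal
    # The plane x . n = p is the same as x . unit = p / norm.
    offset = p / norm - sum(ai * ui for ai, ui in zip(a, unit))
    return [ai + offset * ui for ai, ui in zip(a, unit)]

# Plane 2 z = 4 (that is, z = 2) and the point a = (1, 2, 3).
point = closest_point_on_plane([1.0, 2.0, 3.0], [0.0, 0.0, 2.0], 4.0)
```

In this example the closest point is obtained by moving from $a$ along the normal until the plane is reached, as the right angled triangle argument suggests.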



6.1.3 Examples

The following examples are ones that we aim to solve, and will develop techniques to do soin the next section.

1. Hanging rope or chain Suppose we have a rope hanging in equilibrium between two points $(x_1, y_1)$ and $(x_2, y_2)$. What is the shape of the rope? This is called a catenary.

Suppose the shape is the graph of a function $y = y(x)$. In equilibrium its potential energy will be minimised. Let ρ be the density per unit length of the rope and assume that it is constant. Then the total mass is

$$J(y) := \rho \int_{x_1}^{x_2} \sqrt{1 + (y')^2}\, dx =: M.$$

The potential energy is then given by

$$I(y) := \rho g \int_{x_1}^{x_2} y \sqrt{1 + (y')^2}\, dx.$$

Hence we want to minimise I(y) subject to J(y) being a constant value M.

2. Isoperimetric problem Consider a closed curve in the plane. For a given length, we want to find the curve which encloses the greatest area.

Let the curve C be given by $(x(t), y(t))$ with $x(t_1) = x_0 = x(t_2)$ and $y(t_1) = y_0 = y(t_2)$. Then the length of C is given by

$$L(C) = \int_{t_1}^{t_2} \sqrt{\dot{x}^2 + \dot{y}^2}\, dt$$

and the area is given by

$$A(C) = \frac{1}{2} \int_{t_1}^{t_2} (x\dot{y} - y\dot{x})\, dt.$$

We want to maximise A(C) for fixed L(C).

3. Geodesics on Surfaces Curves which minimise the distance in a surface are called geodesics. Here we minimise a length functional L(x) for curves x(t) which satisfy g(x(t)) = 0 for all t.
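As a rough numerical illustration of the isoperimetric problem, the sketch below approximates closed curves by polygons and evaluates discrete versions of the length and area functionals. The helper `length_and_area`, the value of the fixed perimeter and the polygon resolution are all assumptions made for illustration. It compares a square and a circle of the same perimeter.

```python
import math

def length_and_area(points):
    """Perimeter and signed area of the closed polygon through `points`.

    The area sum is the discrete form of (1/2) * integral of (x dy - y dx).
    """
    length = 0.0
    area = 0.0
    m = len(points)
    for i in range(m):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % m]      # wrap around to close the curve
        length += math.hypot(x1 - x0, y1 - y0)
        area += 0.5 * (x0 * y1 - y0 * x1)
    return length, area

perimeter = 4.0   # illustrative value of the fixed length L(C)

# Square of side perimeter / 4, traversed anticlockwise.
side = perimeter / 4.0
square = [(0.0, 0.0), (side, 0.0), (side, side), (0.0, side)]

# Circle of circumference `perimeter`, approximated by a fine polygon.
r = perimeter / (2.0 * math.pi)
k = 2000
circle = [(r * math.cos(2.0 * math.pi * j / k),
           r * math.sin(2.0 * math.pi * j / k)) for j in range(k)]

square_length, square_area = length_and_area(square)
circle_length, circle_area = length_and_area(circle)
print(square_length, square_area)
print(circle_length, circle_area)
```

With the same perimeter, the polygonal circle encloses an area close to perimeter²/(4π), which exceeds the area of the square.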

6.1.4 A functional constrained by a functional

We first look at problems with two functionals (like one and two above), using two-parameter variations, and then look at two variation problems.

If I(y) is extremised on the set of functions y with $J(y) = J_0$, where $J_0$ is a fixed constant, we look at two-parameter variations $y + hu + kv$ where u and v are chosen such that $u(x_1) = u(x_2) = v(x_1) = v(x_2) = 0$. Then $(h, k) = (0, 0)$ is an extremum for $I(y + hu + kv)$ subject to $J(y + hu + kv) = J_0$. Define $F_{uv}(h, k) = I(y + hu + kv)$ and $G_{uv}(h, k) = J(y + hu + kv) - J_0$. Then $F_{uv}$ has an extremum at $(h, k) = (0, 0)$ amongst the $(h, k)$ such that $G_{uv}(h, k) = 0$. Hence we have a Lagrange Multiplier $\lambda_{uv}$ such that $F_{uv} - \lambda_{uv} G_{uv}$ has a critical point at (0, 0). Thus

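$$\frac{\partial}{\partial h}\big(F_{uv}(h, k) - \lambda_{uv} G_{uv}(h, k)\big)\Big|_{h,k=0} = 0$$

and also

$$\frac{\partial}{\partial k}\big(F_{uv}(h, k) - \lambda_{uv} G_{uv}(h, k)\big)\Big|_{h,k=0} = 0$$

If $I(y) = \int_{x_1}^{x_2} f(x, y, y')\, dx$ and $J(y) = \int_{x_1}^{x_2} g(x, y, y')\, dx$ then the h partial equation gives

$$0 = \int_{x_1}^{x_2} \left( \frac{\partial}{\partial y}(f - \lambda_{uv} g) - \frac{d}{dx}\left( \frac{\partial}{\partial y'}(f - \lambda_{uv} g) \right) \right) u\, dx$$

and the k partial equation gives

$$0 = \int_{x_1}^{x_2} \left( \frac{\partial}{\partial y}(f - \lambda_{uv} g) - \frac{d}{dx}\left( \frac{\partial}{\partial y'}(f - \lambda_{uv} g) \right) \right) v\, dx$$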

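Then we have that

$$0 = \int_{x_1}^{x_2} \left( \frac{\partial f}{\partial y} - \frac{d}{dx}\left( \frac{\partial f}{\partial y'} \right) \right) u\, dx - \lambda_{uv} \int_{x_1}^{x_2} \left( \frac{\partial g}{\partial y} - \frac{d}{dx}\left( \frac{\partial g}{\partial y'} \right) \right) u\, dx$$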



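The regularity condition gives that the Euler-Lagrange expression of g, which is the latter integrand in the above equation, is not identically zero on the set where $J(y) = J_0$. Hence we can choose $u = u_0$ so that the latter integral is non-zero, and so we can set

$$\lambda_{u_0 v} = \frac{\displaystyle\int_{x_1}^{x_2} \left( \frac{\partial f}{\partial y} - \frac{d}{dx}\left( \frac{\partial f}{\partial y'} \right) \right) u_0\, dx}{\displaystyle\int_{x_1}^{x_2} \left( \frac{\partial g}{\partial y} - \frac{d}{dx}\left( \frac{\partial g}{\partial y'} \right) \right) u_0\, dx}$$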

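and note that the right hand side here is independent of v, and so we can write $\lambda_{u_0 v} =: \lambda$. Then for any v vanishing at $x_1$ and $x_2$, and for λ defined before, we get that

$$\int_{x_1}^{x_2} \left( \frac{\partial}{\partial y}(f - \lambda g) - \frac{d}{dx}\left( \frac{\partial}{\partial y'}(f - \lambda g) \right) \right) v\, dx = 0$$

and by the fundamental lemma we get that

$$\frac{\partial}{\partial y}(f - \lambda g) - \frac{d}{dx}\left( \frac{\partial}{\partial y'}(f - \lambda g) \right) = 0$$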

This is called the Euler Lagrange equation for this case. We have thus proved:

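Theorem 6.2 An extremum of I(y) subject to $J(y) = J_0$ satisfies the Euler Lagrange equation

$$\frac{\partial}{\partial y}(f - \lambda g) - \frac{d}{dx}\left( \frac{\partial}{\partial y'}(f - \lambda g) \right) = 0 \qquad (11)$$

for $I - \lambda J$ for some λ called the Lagrange Multiplier.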

Remark This proof can be adapted to more derivatives or more independent variables.

We now solve the first two examples given in Section 6.1.3.

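1. Catenary We have

$$I(y) := \rho g \int_{x_1}^{x_2} y \sqrt{1 + (y')^2}\, dx, \qquad J(y) := \rho \int_{x_1}^{x_2} \sqrt{1 + (y')^2}\, dx =: M.$$

The extremal y satisfies the E-L equation for $I - \lambda J$ for some λ. This functional is

$$\rho \int_{x_1}^{x_2} (gy - \lambda) \sqrt{1 + (y')^2}\, dx$$

and we use the optical analogy to solve it. This corresponds to light moving with speed $c = \frac{1}{gy - \lambda}$ and has solution

$$y = \frac{\lambda}{g} + \frac{c_1}{\rho g} \cosh\left( \frac{\rho g x}{c_1} + c_2 \right).$$

We have three conditions (the two endpoints and the fixed mass M) and three unknowns, and so we can solve to find $c_1$, $c_2$ and λ.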

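2. Isoperimetric Problem We want to maximise A(x, y) while keeping L(x, y) = l fixed. Here (x(t), y(t)) is a parameterisation of a closed curve. The extremising curve will satisfy the E-L equations for $A - \lambda L$ and so we get

$$(A - \lambda L)(x, y) = \int_{t_1}^{t_2} \left( \frac{1}{2}(x\dot{y} - y\dot{x}) - \lambda \sqrt{\dot{x}^2 + \dot{y}^2} \right) dt$$

and we have two E-L equations, one for x and one for y, and so we have

$$\frac{1}{2}\dot{y} - \frac{d}{dt}\left( -\frac{1}{2} y - \lambda \frac{\dot{x}}{\sqrt{\dot{x}^2 + \dot{y}^2}} \right) = 0$$

$$-\frac{1}{2}\dot{x} - \frac{d}{dt}\left( \frac{1}{2} x - \lambda \frac{\dot{y}}{\sqrt{\dot{x}^2 + \dot{y}^2}} \right) = 0$$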



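Note that both equations are total time derivatives and so we get

$$\frac{d}{dt}\left( y + \lambda \frac{\dot{x}}{\sqrt{\dot{x}^2 + \dot{y}^2}} \right) = 0$$

$$\frac{d}{dt}\left( -x + \lambda \frac{\dot{y}}{\sqrt{\dot{x}^2 + \dot{y}^2}} \right) = 0$$

and integrating once gives

$$y + \lambda \frac{\dot{x}}{\sqrt{\dot{x}^2 + \dot{y}^2}} = B$$

$$-x + \lambda \frac{\dot{y}}{\sqrt{\dot{x}^2 + \dot{y}^2}} = -C$$

and hence we get that $(x - C)^2 + (y - B)^2 = \lambda^2$, and this is a circle with centre (C, B) and radius λ. Therefore $2\pi\lambda = l$ and so $\lambda = \frac{l}{2\pi}$.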
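The circle can also be checked directly against the two E-L equations for A − λL. The following Python sketch assumes an anticlockwise parameterisation x = C + R cos t, y = B + R sin t with λ = R; the numerical values of C, B and R and the helper names are illustrative, not from the notes. It evaluates both E-L residuals, using a central finite difference for the time derivative.

```python
import math

# Illustrative circle (hypothetical values): centre (C, B), radius R,
# traversed anticlockwise, with the multiplier taken to be lam = R.
C, B, R = 0.5, -1.0, 2.0
lam = R

def x(t): return C + R * math.cos(t)
def y(t): return B + R * math.sin(t)
def xdot(t): return -R * math.sin(t)
def ydot(t): return R * math.cos(t)

def p_x(t):
    """Partial derivative of the integrand of A - lam * L w.r.t. xdot."""
    speed = math.hypot(xdot(t), ydot(t))
    return -0.5 * y(t) - lam * xdot(t) / speed

def p_y(t):
    """Partial derivative of the integrand of A - lam * L w.r.t. ydot."""
    speed = math.hypot(xdot(t), ydot(t))
    return 0.5 * x(t) - lam * ydot(t) / speed

def ddt(func, t, h=1e-5):
    """Central finite-difference approximation of d(func)/dt."""
    return (func(t + h) - func(t - h)) / (2.0 * h)

def el_residuals(t):
    res_x = 0.5 * ydot(t) - ddt(p_x, t)     # x equation
    res_y = -0.5 * xdot(t) - ddt(p_y, t)    # y equation
    return res_x, res_y

samples = [0.1 + 0.7 * j for j in range(9)]
worst = max(max(abs(r) for r in el_residuals(t)) for t in samples)
length = 2.0 * math.pi * R                  # the fixed length l
print(worst, lam, length / (2.0 * math.pi))
```

The residuals vanish up to finite-difference error, and λ agrees with l/(2π) for this choice of orientation.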

To handle example three we need a new method.

6.1.5 One functional constrained by a function

As far as I can gather, we cannot in general constrain a functional with respect to a given function, but we can if the functional is a function of curves.

Suppose we have curves $x(t) = (x_1(t), ..., x_n(t))$ joining two points $x^{(1)}$ and $x^{(2)}$ at times $t_1$ and $t_2$, and the curves satisfy $g(t, x(t), \dot{x}(t)) = 0$. We have some functional

$$I(x) = \int_{t_1}^{t_2} f(t, x(t), \dot{x}(t))\, dt$$

and we aim to extremise it amongst these curves. An example of this is finding geodesics in a surface.

Let $x_h$ be a variation of an extremum x, with $x_h = x + hu + O(h^2)$, which satisfies the constraint for all h. Then h = 0 is a critical point for $I(x_h)$ as a function of h and so

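$$\frac{d}{dh} I(x_h)\Big|_{h=0} = 0$$

and thus we get that

$$\int_{t_1}^{t_2} \left[ \sum_{i=1}^{n} \left( \frac{\partial f}{\partial x_i} u_i + \frac{\partial f}{\partial \dot{x}_i} \dot{u}_i \right) \right] dt = 0 \qquad (12)$$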

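Differentiating the constraint $g(t, x_h(t), \dot{x}_h(t)) = 0$ with respect to h at h = 0 gives

$$\sum_{i=1}^{n} \left( \frac{\partial g}{\partial x_i} u_i + \frac{\partial g}{\partial \dot{x}_i} \dot{u}_i \right) = 0 \quad \text{for all } t \qquad (13)$$

Pick a function λ(t), multiply (13) by λ and subtract from the integrand of (12). Therefore we get

$$\begin{aligned}
0 &= \int_{t_1}^{t_2} \sum_{i=1}^{n} \left( \frac{\partial f}{\partial x_i} u_i + \frac{\partial f}{\partial \dot{x}_i} \dot{u}_i - \lambda(t) \left( \frac{\partial g}{\partial x_i} u_i + \frac{\partial g}{\partial \dot{x}_i} \dot{u}_i \right) \right) dt \\
&= \int_{t_1}^{t_2} \left[ \sum_{i=1}^{n} \left( \frac{\partial}{\partial x_i}(f - \lambda g)\, u_i + \frac{\partial}{\partial \dot{x}_i}(f - \lambda g)\, \dot{u}_i \right) \right] dt
\end{aligned}$$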

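Observe that $u(t_1) = 0 = u(t_2)$, and then integrating the $\dot{u}_i$ terms by parts gives

$$0 = \int_{t_1}^{t_2} \left[ \sum_{i=1}^{n} \left( \frac{\partial}{\partial x_i}(f - \lambda g) - \frac{d}{dt}\left( \frac{\partial}{\partial \dot{x}_i}(f - \lambda g) \right) \right) u_i \right] dt$$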



We can pick λ(t) so that one of the coefficients, say that of $u_1$, is zero. We can do this since setting

$$\frac{\partial}{\partial x_1}(f - \lambda g) - \frac{d}{dt}\left( \frac{\partial}{\partial \dot{x}_1}(f - \lambda g) \right) = 0$$

gives a first order linear inhomogeneous ODE for λ, which can be solved by the integrating factor method. The constraint (13) amongst the $u_i$ then determines $u_1$ in terms of $u_2, ..., u_n$, and then the latter can be varied freely. Hence the condition for an extremum becomes

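$$0 = \sum_{i=2}^{n} \left( \int_{t_1}^{t_2} \left[ \left( \frac{\partial}{\partial x_i}(f - \lambda g) - \frac{d}{dt}\left( \frac{\partial}{\partial \dot{x}_i}(f - \lambda g) \right) \right) u_i \right] dt \right)$$

and since now $u_2, ..., u_n$ are arbitrary functions vanishing at $t_1$ and $t_2$, we can apply the fundamental lemma to each $u_i$ in turn, taking the rest of $u_2, ..., u_n$ to be zero, giving

$$\frac{\partial}{\partial x_i}(f - \lambda g) - \frac{d}{dt}\left( \frac{\partial}{\partial \dot{x}_i}(f - \lambda g) \right) = 0 \quad \text{for } i = 2, ..., n$$

For i = 1 this equation was the way we chose λ(t), and hence the equation becomes

$$\frac{\partial}{\partial x_i}(f - \lambda g) - \frac{d}{dt}\left( \frac{\partial}{\partial \dot{x}_i}(f - \lambda g) \right) = 0 \quad \text{for } i = 1, 2, ..., n \qquad (14)$$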

We have thus proved

Theorem 6.3 If a functional I given by a Lagrangian f is extremised amongst curves x(t) with fixed endpoints subject to a constraint $g(t, x(t), \dot{x}(t)) = 0$, then there is a function λ(t) such that the Euler-Lagrange equations for the Lagrangian $f - \lambda g$ (namely (14)) are satisfied.

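Example 6.3 Geodesics on a surface in $\mathbb{R}^3$ given by an equation g(x) = 0. Geodesics are paths of shortest length and so minimise $\int_{t_1}^{t_2} \sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}\, dt$. Then there is a function λ(t) such that a geodesic x(t) satisfies the E-L equations of the Lagrangian $\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2} - \lambda(t) g(x)$. We thus have, for the x equation,

$$-\lambda(t) \frac{\partial g}{\partial x} - \frac{d}{dt}\left( \frac{\dot{x}}{\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}} \right) = 0$$

We aim to simplify this equation, and so we introduce the arclength parameter s. This is given by $\frac{ds}{dt} = \sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}$ and we change independent variables from t to s. Then

$$\frac{dx}{ds} = \frac{\dot{x}}{\frac{ds}{dt}} = \frac{\dot{x}}{\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}}.$$

Then dividing the equation by $\frac{ds}{dt}$ and putting $\mu = \lambda \big/ \frac{ds}{dt}$ we get that

$$-\mu \frac{\partial g}{\partial x} - \frac{d^2 x}{ds^2} = 0$$

and the y equation becomes

$$-\mu \frac{\partial g}{\partial y} - \frac{d^2 y}{ds^2} = 0$$

and similarly the z equation becomes

$$-\mu \frac{\partial g}{\partial z} - \frac{d^2 z}{ds^2} = 0$$

and so

$$-\mu = \frac{\dfrac{d^2 x}{ds^2}}{\dfrac{\partial g}{\partial x}} = \frac{\dfrac{d^2 y}{ds^2}}{\dfrac{\partial g}{\partial y}} = \frac{\dfrac{d^2 z}{ds^2}}{\dfrac{\partial g}{\partial z}}$$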



There is no general method to solve these equations. We now consider the special case of a sphere in $\mathbb{R}^3$, and so we have that $g(x, y, z) = x^2 + y^2 + z^2 - R^2$. This then gives us that

$$\frac{1}{2x} \frac{d^2 x}{ds^2} = \frac{1}{2y} \frac{d^2 y}{ds^2} = \frac{1}{2z} \frac{d^2 z}{ds^2}$$

and notice that

$$\frac{d}{ds}\left( z \frac{dy}{ds} - y \frac{dz}{ds} \right) = \frac{dz}{ds}\frac{dy}{ds} + z \frac{d^2 y}{ds^2} - \frac{dy}{ds}\frac{dz}{ds} - y \frac{d^2 z}{ds^2} = yz \left( \frac{1}{y} \frac{d^2 y}{ds^2} - \frac{1}{z} \frac{d^2 z}{ds^2} \right) = 0$$

and so $z \frac{dy}{ds} - y \frac{dz}{ds} = A$ is constant. Similarly $x \frac{dz}{ds} - z \frac{dx}{ds} = B$ is constant and $y \frac{dx}{ds} - x \frac{dy}{ds} = C$ is constant. Multiplying these three equations by x, y and z respectively and adding gives $0 = Ax + By + Cz$. This is a plane through the origin perpendicular to (A, B, C). Hence the path must lie in the intersection of the sphere and a plane through the origin. These intersections are called Great Circles. We have two solutions to the E-L equations satisfying the endpoint conditions (the two arcs of the great circle through the endpoints) so long as the endpoints are not antipodal. If the two endpoints are antipodal, like poles, then we get a continuum of great circles, all of which are solutions to the problem.
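The constants A, B and C can be checked numerically on a sample great circle. The sketch below assumes a great circle parameterised by arclength, x(s) = R(cos(s/R) e1 + sin(s/R) e2), with e1 and e2 orthonormal; the radius, the vectors e1 and e2, and the function names are illustrative values, not taken from the notes.

```python
import math

R = 2.0                    # illustrative sphere radius
e1 = (1.0, 0.0, 0.0)       # illustrative orthonormal pair spanning the plane
e2 = (0.0, 0.6, 0.8)

def position(s):
    """Point of the great circle at arclength s."""
    c, sn = math.cos(s / R), math.sin(s / R)
    return tuple(R * (c * a + sn * b) for a, b in zip(e1, e2))

def tangent(s):
    """Derivative of the position with respect to arclength s."""
    c, sn = math.cos(s / R), math.sin(s / R)
    return tuple(-sn * a + c * b for a, b in zip(e1, e2))

def conserved(s):
    """The three quantities A, B, C evaluated at arclength s."""
    x, y, z = position(s)
    dx, dy, dz = tangent(s)
    A = z * dy - y * dz
    B = x * dz - z * dx
    C = y * dx - x * dy
    return A, B, C

samples = [0.0, 0.5, 1.0, 2.5, 4.0]
values = [conserved(s) for s in samples]

# Residual of the plane equation A x + B y + C z = 0 at each sample point.
plane_residuals = []
for s in samples:
    A, B, C = conserved(s)
    x, y, z = position(s)
    plane_residuals.append(A * x + B * y + C * z)

print(values)
print(plane_residuals)
```

The three quantities take the same values at every sample point, and each point satisfies the plane equation up to rounding error.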

7 Constrained Motion

For particles moving with coordinates related by a constraint, say g = 0, Hamilton's principle extremises $\int_{t_1}^{t_2} L\, dt$ where $L = T - V$, and now we are subjected to a constraint. We use the Lagrange Multiplier method, and so we have a function λ(t) such that the motion satisfies the E-L equation for $T - V - \lambda g$.

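Example 7.1 Consider free motion on a surface in $\mathbb{R}^3$. We then have by definition V = 0 and also $T = \frac{1}{2} m (\dot{x}^2 + \dot{y}^2 + \dot{z}^2)$ and g(x, y, z) = 0. Thus we want the E-L equations for $\frac{1}{2} m (\dot{x}^2 + \dot{y}^2 + \dot{z}^2) - \lambda(t) g(x, y, z)$ and so we have that

$$\left. \begin{aligned}
-\lambda \frac{\partial g}{\partial x} - \frac{d}{dt}(m\dot{x}) &= 0 \\
-\lambda \frac{\partial g}{\partial y} - \frac{d}{dt}(m\dot{y}) &= 0 \\
-\lambda \frac{\partial g}{\partial z} - \frac{d}{dt}(m\dot{z}) &= 0
\end{aligned} \right\} \implies m\ddot{\mathbf{x}} = -\lambda \nabla g, \quad \mathbf{x} = (x, y, z).$$

∇g is a vector perpendicular to the surface at each point. If we eliminate $\frac{\lambda}{m}$ then we get that

$$\frac{\ddot{x}}{\dfrac{\partial g}{\partial x}} = \frac{\ddot{y}}{\dfrac{\partial g}{\partial y}} = \frac{\ddot{z}}{\dfrac{\partial g}{\partial z}}$$

Observe that

$$\frac{d}{dt}(\dot{x}^2 + \dot{y}^2 + \dot{z}^2) = 2\dot{x}\ddot{x} + 2\dot{y}\ddot{y} + 2\dot{z}\ddot{z} = -\frac{2\lambda}{m} \left( \dot{x} \frac{\partial g}{\partial x} + \dot{y} \frac{\partial g}{\partial y} + \dot{z} \frac{\partial g}{\partial z} \right) = 0,$$

where the last equality holds because the bracket is $\frac{d}{dt} g(x(t), y(t), z(t))$, which vanishes along the motion. So $\dot{x}^2 + \dot{y}^2 + \dot{z}^2$ is constant, and so $\frac{ds}{dt}$ is constant, so changing from t to s gives the geodesic equation. Hence the motion of a free particle is along a geodesic at constant speed.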
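A short simulation illustrates this conclusion on a sphere. The sketch below assumes the constraint g = x² + y² + z² − R², for which ∇g = 2x and differentiating g(x(t)) = 0 twice fixes 2λ/m = |ẋ|²/|x|². The integrator (classical fourth-order Runge-Kutta), the step size and the initial data are illustrative choices, not part of the notes.

```python
import math

def acceleration(pos, vel):
    """Acceleration -(2*lam/m) * x for the sphere constraint g = |x|^2 - R^2.

    Differentiating g(x(t)) = 0 twice gives x . xddot = -|xdot|^2, so
    2*lam/m = |xdot|^2 / |x|^2 (the mass m cancels out).
    """
    speed_sq = sum(v * v for v in vel)
    radius_sq = sum(p * p for p in pos)
    factor = speed_sq / radius_sq
    return [-factor * p for p in pos]

def shifted(u, du, h):
    """Return u + h * du componentwise."""
    return [ui + h * dui for ui, dui in zip(u, du)]

def rk4_step(pos, vel, dt):
    """One classical Runge-Kutta step for the first-order system (pos, vel)."""
    k1p, k1v = vel, acceleration(pos, vel)
    p2, v2 = shifted(pos, k1p, dt / 2), shifted(vel, k1v, dt / 2)
    k2p, k2v = v2, acceleration(p2, v2)
    p3, v3 = shifted(pos, k2p, dt / 2), shifted(vel, k2v, dt / 2)
    k3p, k3v = v3, acceleration(p3, v3)
    p4, v4 = shifted(pos, k3p, dt), shifted(vel, k3v, dt)
    k4p, k4v = v4, acceleration(p4, v4)
    new_pos = [p + dt / 6 * (a + 2 * b + 2 * c + d)
               for p, a, b, c, d in zip(pos, k1p, k2p, k3p, k4p)]
    new_vel = [v + dt / 6 * (a + 2 * b + 2 * c + d)
               for v, a, b, c, d in zip(vel, k1v, k2v, k3v, k4v)]
    return new_pos, new_vel

# Illustrative initial data: start on the unit sphere with a tangent velocity.
pos = [1.0, 0.0, 0.0]
vel = [0.0, 0.6, 0.8]
dt, steps = 0.001, 2000

for _ in range(steps):
    pos, vel = rk4_step(pos, vel, dt)

speed = math.sqrt(sum(v * v for v in vel))
radius = math.sqrt(sum(p * p for p in pos))
print(speed, radius, pos)
```

For these initial data the exact motion is the great circle (cos t, 0.6 sin t, 0.8 sin t), so the computed speed and radius should stay close to 1 and the final position should be close to that curve at t = 2.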
