Advanced Classical Physics, Autumn 2013 - Imperial · PDF fileAdvanced Classical Physics, Autumn 2013 Lecture Notes ... (3rd Edition), Goldstein, Poole & Safko ... Relativity and Electromagnetism

Advanced Classical Physics, Autumn 2013

Arttu Rajantie

December 2, 2013

Advanced Classical Physics, Autumn 2013 Lecture NotesAdvanced Classical Physics, Autumn 2013 Lecture NotesAdvanced Classical Physics, Autumn 2013 Lecture Notes

Preface

These are the lecture notes for the third-year Advanced Classical Physics course in the 2013-14 aca-demic year at Imperial College London. They are based on the notes which I inherited from theprevious lecturer Professor Angus MacKinnon.

The notes are designed to be self-contained, but there are also some excellent textbooks, which Iwant to recommend as supplementary reading. The core textbooks are

• Classical Mechanics (5th Edition), Kibble & Berkshire (Imperial College Press 2004),

• Introduction to Electrodynamics (3rd Edition), Griffiths (Pearson 2008),

and these books may also be useful:

• Classical Mechanics (2nd Edition), McCall (Wiley 2011),

• Classical Mechanics, Gregory (Cambridge University Press 2006),

• Classical Mechanics (3rd Edition), Goldstein, Poole & Safko (Addison-Wesley 2002),

• Mechanics (3rd Edition), Landau & Lifshitz (Elsevier 1976),

• Classical Electrodynamics (3rd Edition), Jackson (Wiley 1999),

• The Classical Theory of Fields, Landau & Lifshitz (Elsevier 1975).

The course assumes Mechanics, Relativity and Electromagnetism as background knowledge. Be-ing a theoretical course, it also makes heavy use of most aspects of the compulsory mathematicscourses. Mathematical Methods is also useful, but it is not a formal prerequisite, and all necessaryconcepts are introduced as part of this course, although in a less general way.

Arttu Rajantie

111

Contents

1 Rotating Frames 41.1 Angular Velocity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.2 Transformation of Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.3 Equation of Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61.4 Coriolis Force . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71.5 Centrifugal Force . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.6.1 Weather . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81.6.2 Foucault’s Pendulum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.6.3 Particle in Magnetic Field . . . . . . . . . . . . . . . . . . . . . . . . . . . 101.6.4 Larmor Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2 Rigid Bodies 132.1 Many-Body Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.2 Rotation about a Fixed Axis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.2.1 Compound Pendulum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162.2.2 Centre of Percussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.3 Inertia Tensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172.4 Principal Axes of Inertia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202.5 Calculation of Moments of Inertia . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.5.1 Shift of Origin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212.5.2 Continuous Solid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212.5.3 Routh’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.6 Effect of Small Force . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222.7 Rotation about a Principal Axis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242.8 Euler’s Angles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3 Lagrangian Mechanics 283.1 Action Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283.2 Generalised Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313.3 Precession of a Symmetric Top . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323.4 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333.5 Normal Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.5.1 Orthogonal Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353.5.2 Small Oscillations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363.5.3 Eigenvalue Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

2

Advanced Classical Physics, Autumn 2013 Lecture NotesAdvanced Classical Physics, Autumn 2013 Lecture NotesAdvanced Classical Physics, Autumn 2013 Lecture Notes

3.6 Continuous Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4 Hamiltonian Mechanics 414.1 Hamilton’s Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414.2 Poisson Brackets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444.3 Symmetries and Conservation Laws . . . . . . . . . . . . . . . . . . . . . . . . . . 454.4 Canonical Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

5 Electromagnetic Potentials 495.1 Vector Potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495.2 Gauge Invariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515.3 Particle in an electromagnetic Field . . . . . . . . . . . . . . . . . . . . . . . . . . 525.4 Retarded Potentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535.5 Fields of a Moving Charge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

6 Electrodynamics and Relativity 616.1 Four-Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616.2 Relativistic Electrodynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 666.3 Maxwell’s Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 676.4 Four-vector Potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 696.5 Lagrangian for Electrodynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

333

Chapter 1

Rotating Frames

1.1 Angular Velocity1

In order to describe rotation, you need to know its speed and the direction of the rotation axis. Thespeed is a scalar quantity and it is given by the angular frequency ω = 2π/T , where T is the periodof rotation, and the axis is a direction in space, so it can be represented by a unit vector n.

It is natural and useful to combine these to an angular velocity vector ω = ωn. The sign of thevector ω is determined by the right-hand-rule: If you imagine gripping the axis of rotation with thefingers of your right hand, your thumb will point to the direction of ω.

The angular velocity vector is called an axial vector (or sometimes a pseudo-vector), which meansthat is has different symmetry properties from a normal (polar) vector. Consider the effect of reflectionin a plane containing the vector, e.g. a vector in the z direction reflected in the (y − z) plane. A polarvector is unchanged under such an operation, whereas an axial vector changes sign, as the directionof rotation is reversed.

For example, for the rotation of the earth (against the background of the stars), the angular velocityω takes the value

ω =2π

86164s= 7.292× 10−5 s−1 . (1.1.1)

The angular momentum vector ω points up at the North Pole.Consider now a point r on the the rotating body, e.g. Blackett Lab. at latitude 51.5N. This point

is moving tangentially eastwards with a speed v = ωr sin θ, where r is the distance from the origin(assumed to be on the axis of rotation (e.g. the centre of the Earth) and θ is the angle between thevectors r and ω (i.e., for Blackett, θ = 90 − 51.5 = 38.5). Hence the velocity of the point r maybe written

dr

dt= ω × r . (1.1.2)

We note here that geographers tend to measure latitude from the equator whereas the angle θ inspherical polar co-ordinates is defined from the pole. Thus the geographical designation 51.5Ncorresponds to θ = 38.5.

4

Advanced Classical Physics, Autumn 2013 Rotating FramesAdvanced Classical Physics, Autumn 2013 Rotating FramesAdvanced Classical Physics, Autumn 2013 Rotating Frames

y

φ

z

ρ

φ

ρ

x

z

z

φ

x

rθ θ

φ

r

z

y

Figure 1.1: Cylindrical (left) and spherical (right) coordinate systems

1.2 Transformation of Vectors

Actually the result (1.1.2) is valid for any vector fixed in the rotating body, not just for position vectors.So, in general we may write

dA

dt= ω ×A . (1.2.1)

In particular we can consider the case of a set of orthogonal unit vectors, ı, , k fixed in the body,chosen in accordance with the right-hand rule (index finger points in the direction of ı, middle finger inthe direction of and thumb in the direction of k), which we can use as a basis of a rotating coordinatesystem. It is often convenient to choose ı to point radially away from the rotation axis and k to bein the direction ω, so that points in the direction of motion. This forms the basis of cylindricalcoordinates (ρ, φ, z) and are often denoted by ı = ρ, = φ, k = z.

On the other hand, for motion on the surface for a rotating sphere such as the Earth, a convenientchoice is to take ı pointing south, pointing east and k pointing up (i.e. away from the centre). Theseform the basis of spherical coordinates (r, θ, φ), with ı = θ, = φ, k = r.

We have to be careful to distinguish between the point of view of an observer on the rotatingobject and one in a fixed (inertial) frame of reference observing the situation from outside. The basisvectors ı, , k are fixed from the point of view of the rotating observer to the inertial observer. Weshall adopt the convention of using subscripts I and R to donate quantities in the inertial and rotatingframes respectively (N.B. Kibble & Berkshire use a different convention).

Given a set of basis vectors ı, , k, we can write any vector A as

A = Axı+Ay +Azk . (1.2.2)

We want to write down an expression which relates the rates of change of this vector in the two frames.We first note that a scalar quantity cannot depend on the choice of frame and that Ax, Ay and Az may

1Kibble & Berkshire, chapter 5

555


each be considered as such scalars. Hence

dAxdt

∣∣∣∣I

=dAxdt

∣∣∣∣R

etc. (1.2.3)

so that the differences between the vector A in the 2 frames must be solely related to the difference inthe basis vectors. Hence

dA

dt

∣∣∣∣I

=

(dAxdtı+

dAydt+

dAzdt

k

)+

(Ax

dı

dt+Ay

d

dt+Az

dk

dt

)

=

(dAzdtı+

dAydt+

dAzdt

k

)+(Ax(ω × ı) +Ay(ω × ) +Az(ω × k)

). (1.2.4)

The expression in the first brackets on the right-hand-side is precisely the time derivative measuredby the rotating observer, so this is the relation we wanted: It relates the time derivatives measured byinertial and rotating observers. We can write it compactly as in the compact form

dA

dt

∣∣∣∣I

=dA

dt

∣∣∣∣R

+ ω ×A . (1.2.5)

1.3 Equation of Motion

Applying Eq. (1.2.5) to the position of the particle r, its velocity may be written as

vI =dr

dt

∣∣∣∣I

=dr

dt

∣∣∣∣R

+ ω × r = vR + ω × r. (1.3.1)

Now consider an object subject to a force F. Newton’s second law applies in the inertial frame,so we have

md2r

dt2

∣∣∣∣I

= F . (1.3.2)

To write this equation in the rotating frame, we differentiate the velocity vI using Eq. (1.2.5) again,

d2r

dt2

∣∣∣∣I

=dvI

dt

∣∣∣∣I

=dvR

dt

∣∣∣∣I

+ ω × dr

dt

∣∣∣∣I

=dvR

dt

∣∣∣∣R

+ ω × vR + ω × (vR + ω × r)

=d2r

dt2

∣∣∣∣R

+ 2ω × dr

dt

∣∣∣∣R

+ ω × (ω × r) . (1.3.3)

Alternatively we can write this very concisely as

d2r

dt2

∣∣∣∣I

=

(d

dt

∣∣∣∣R

+ ω×)2

r (1.3.4)

=d2r

dt2

∣∣∣∣R

+ 2ω × dr

dt

∣∣∣∣R

+ ω × (ω × r) .

666


We now rearrange (1.3.3) and combine it with (1.3.2) to obtain an equation of motion for theparticle in the rotating frame

md2r

dt2

∣∣∣∣R

= F− 2mω × vR −mω × (ω × r) . (1.3.5)

The second term on the right-hand-side of (1.3.5) is the Coriolis force while the final term, pointingaway from the axis, is the centrifugal force. They are called fictitious forces, because they do notrepresent real physical interactions but appear only because of the choice of the coordinate system.Fictitious forces are always proportional to the mass of the particle, so that the corresponding accel-eration is independent of mass. Of course, this is also true for gravity, and in fact general relativitydescribes gravity as a fictitious force.

1.4 Coriolis Force

Equation (1.3.5) shows that, due to the Coriolis force, an object moving at velocity v in the rotatingframe, experiences apparent acceleration

aCor = −2ω × v. (1.4.1)

For example, imagine a car travelling north along Queen’s Gate at 50 km/h. To calculate the Coriolisacceleration it experiences, let us choose a set of orthogonal basis vectors that rotate with the Earth, forexample ı pointing east, north and k up. With this choice the angular velocity vector has components

ω = (0, ω sin θ, ω cos θ) , (1.4.2)

and the car has velocity v = v. The Coriolis acceleration is, therefore,

aCor = −2ω × v = −2×

∣∣∣∣∣∣ı k0 ω sin θ ω cos θ0 v 0

∣∣∣∣∣∣ = 2ωv cos θ ı

≈ 2× (7.292× 10−5 s−1)×(

50× 103 m

3600 s

)× cos(38.5) ı

≈ 1.5 mm/s2 eastwards , (1.4.3)

which is equivalent to a velocity change of ≈ 9 cm/s after 1 min. In this context it’s not a big effectand can safely be ignored. There are other contexts in which it is anything but negligible, however.

1.5 Centrifugal Force

According to Eq. (1.3.5), the acceleration due to the centrifugal force is

acf = −ω × (ω × r) . (1.5.1)

. To calculate this, it is useful to note the general identity for the triple cross product,

a× (b× c) = (a · c) b− (a · b) c. (1.5.2)

777


Figure 1.2: Gaspard-Gustave de Coriolis (1792–1843) (left), Leon Foucault (1819–68) (centre) andSir Joseph Larmor (1857–1942) (right)

Using the same coordinates as in Section 1.4, with r = rk, we find

acf = (ω · ω)r− (ω · r)ω = ω2r(0,− sin θ cos θ, sin2 θ

)= ω2r sin θ(0,− cos θ, sin θ). (1.5.3)

The acceleration points away from the rotation axis, and for example in London, it has the strength

acf = ω2r sin θ ≈ (7.3× 10−5 s−1)2 × 6.4× 106 m× sin(38.5) ≈ 0.02 m/s2. (1.5.4)

Because this force is independent of velocity, we cannot distinguish it locally from the gravitationalforce, and therefore it acts essentially as a small correction to it, changing not only the apparentstrength but also the direction of the gravitational force. In other situations the centrifugal force canbe very important, especially when rotation speeds are high.

It is useful to consider the special case of a particle at rest in the rotating frame, so that the left-hand-side of Eq. (1.3.5) vanishes and vR = 0 as well. In that case, Eq. (1.3.5) implies that there hasto be a real physical force

F = mω × (ω × r) . (1.5.5)

This is known as the centripetal force, and it is the total net force acting on the particle to keep it fixedin the rotating frame. Because it is a real force, it has to due to some type of physical interaction,such as gravitational or electrostatic force, a support force provided by, e.g., a rope, or usually acombination of several different physical forces.

1.6 Examples

1.6.1 Weather

Probably the most important effect attributed to the Coriolis effect is in meteorology: Winds don’tflow from areas of high pressure to those of low pressure but instead tend to flow round the minimaand maxima of pressure, giving rise to cyclones and anticyclones respectively. A simple way tounderstand this is to consider a simple uniform pressure gradient in the presence of a Coriolis force,giving an equation of motion such as

d2r

dt2= −∇p− 2ω × v . (1.6.1)

888


Figure 1.3: Left: schematic representation of flow around a low-pressure area in the Northern hemi-sphere. The pressure-gradient force is represented by blue arrows, the Coriolis acceleration (alwaysperpendicular to the velocity) by red arrows. Right: this low pressure system over Iceland spinscounter-clockwise due to balance between the Coriolis force and the pressure gradient force.

The general solution of such a problem is complicated. However, if we confine ourselves to 2–dimensions and ignore the component of ω parallel to the surface, just as we did in our discussionof the Foucault pendulum, we can always find a solution of (1.6.1) with a constant velocity suchthat the 2 terms on the right cancel. In such a solution ∇p must be perpendicular to v, as a crossproduct is always perpendicular to both vectors. Hence (1.6.1) has a solution in which the velocityis perpendicular to the pressure gradient. In such a system the wind always follows the isobars (linesof constant pressure), a pressure minimum is not easily filled and a cyclone (or anticyclone) is stable.This is illustrated in Fig. 1.3a.

1.6.2 Foucault’s Pendulum

Consider a pendulum which is free to move in any direction and is sufficiently long and heavy that itwill swing freely for several hours. Ignoring the vertical component both of the pendulum’s motionand of the Coriolis force, the equations of motion for the bob (in the coordinate system describedabove) are

x = −g`x+ 2ω cos θy , (1.6.2)

y = −g`y − 2ω cos θx , (1.6.3)

or, using the complex number trick from Section 1.6.3 with r = x+ iy,

d2r

dt2+ 2iΩ

dr

dt+ ω2

0 r = 0 , (1.6.4)

where Ω = ω cos θ and ω20 = g/`. Using standard methods for second order differential equations we

obtain the general solutionr = Ae−i(Ω−ω1)t +Be−i(Ω+ω1)t , (1.6.5)

999


where ω21 = ω2

0 + Ω2. In particular, if the pendulum is released from the origin with velocity (v0, 0),we have A = −B = v0/2iω1, so that2

r =v0

ω1e−iΩt sinω1t, (1.6.6)

which means in terms of the original variables x and y,

x =v0

ω1cos Ωt sinω1t ,

y = − v0

ω1sin Ωt sinω1t . (1.6.7)

We can also write this in terms of polar co-ordinates, (ρ, φ), as

ρ =v0

ω1sinω1t φ = −Ωt .

As Ω ω0, the period of oscillation is much less than a day, the result is easy to interpret: thependulum swings with a basic angular frequency ω1 (≈ ω0) but the plane of oscillation rotates withangular frequency Ω. At the pole, θ = 0 the plane of the pendulum apparently rotates once a day. Inother words, the plane doesn’t rotate at all but the Earth rotates under it once a day. On the other hand,at the equator, θ = π

2 and Ω = 0 the plane of the pendulum is stable. In South Kensington

Ω = ω cos θ = (7.292× 10−5 s−1)× cos(38.5)⇒ T = 30.58 hr . (1.6.8)

Note that this is the time for a complete rotation of the plane of the pendulum through 360. However,after it has rotated through 180 it would be hard to tell the difference between that and the startingposition.

A working Foucault pendulum may be seen in the Science Museum.

1.6.3 Particle in Magnetic Field

The equation of motion for a charged particle in a magnetic field takes the form

md2r

dt2= qv ×B, (1.6.9)

which we can write asdv

dt= −

( qm

B)× v. (1.6.10)

If we identify ω with qB/m, we note that this has same form as Eq. (1.2.5) for a velocity which isconstant in the inertial frame. Hence we should expect the motion of a particle in a magnetic fieldto be similar to motion of a free particle in a rotating frame with omega = qB/m, namely circularmotion with angular frequency ω. In this case this is called the cyclotron motion and ω = qB/m isthe cyclotron frequency.

We can check this by solving the equation of motion. We first choose a coordinate system suchthat ω (or equivalently B) is in the k direction. Then, since ω × k = 0, the z component of v isconstant, and the x and y components satisfy

dvxdt

= +ωvy,dvydt

= −ωvx. (1.6.11)

2N.B. There is an error in the example given in K & B p 117.

101010


Define a complex variable v = vx + ivy, these can be written as a single equation

dv

dt= −iωv, (1.6.12)

which is easy to solve,v = v0 exp(−iωt) . (1.6.13)

Finally we obtain vx and vy by taking the real and imaginary parts of v.We can take this analogy further by generalising Eq. (1.6.10) to the full Lorentz force, including

an electric field E to obtainm

dv

dt= qv ×B + qE . (1.6.14)

where we immediately see that qmE is analogous to the rate of change of the velocity in the inertial

frame. If we consider now the simple case in which E is in the x–direction and B in the z–direction,we can write the x and y components of Eq. (1.6.14) in the form

mdvxdt

= qvyBz + qEx

mdvydt

= −qvxBz . (1.6.15)

Using the simple transformation v′y = vy − (Ex/Bz), Eq. (1.6.15) becomes

mdvxdt

= qv′yBz

mdv′ydt

= −qvxBz , (1.6.16)

which is the same as Eq. (1.6.10). Hence the complete solution is a circular motion with an additionaldrift with speed Ex/Bz in the y–direction, perpendicular to both E and B.

More generally, whenever we find that when the time derivative of any vector is of the form (1.2.5),we can immediately tell that it rotates with the corresponding angular velocity ω.

1.6.4 Larmor Effect

It is sometimes useful to consider a rotating frame, not because the system is itself rotating, butbecause it helps to simplify the mathematics. In this sense it is similar to choosing an appropriatecoordinate system.

Consider a particle of charge q moving around a fixed point charge −q′ in a uniform magneticfield B. The equation of motion is

md2r

dt2= − k

r2r + q

dr

dt×B , (1.6.17)

where k = qq′/4πε0.Rewriting (1.6.17) in a rotating frame, we obtain

d2r

dt2

∣∣∣∣R

+ 2ω × dr

dt

∣∣∣∣R

+ ω × (ω × r) = − k

mr2r +

q

m

(dr

dt

∣∣∣∣R

+ ω × r

)×B . (1.6.18)

111111


If we choose ω = − (q/2m) B, the terms in the velocity fall out and we are left with

d2r

dt2

∣∣∣∣R

= − k

mr2r +

( q

2m

)2B× (B× r) . (1.6.19)

In a weak magnetic field we may ignore terms in B2, such as the last term in (1.6.19), so that we areleft with an expression which is identical to that of the system without the magnetic field. What doesthis mean?

The solution of (1.6.19) is an ellipse (as shown by Newton, Kepler, etc.). Hence, the solution inthe inertial frame must be the same ellipse but rotating with the angular frequency ω = − (q/2m) B.If this is smaller than the period of the ellipse then the effect is that the major axis of the ellipse slowlyrotates. Such a behaviour is known as precession. We shall see other examples of this later.

Note that there are some confusing diagrams on the internet and in textbooks which purport toillustrate this precession. There are two special cases to consider: firstly when the magnetic field isperpendicular to the plane of the ellipse. In this case the major axis of the ellipse rotates about thefocus, while remaining in the same plane. In the second special case the magnetic field is in the planeof the ellipse. Here the ellipse rotates about an axis through the focus and perpendicular to the majoraxis. For a circular orbit, the first case the orbit would remain circular, but with slightly differentperiods clockwise and anti-clockwise, whereas the second would behave like a plate rotating whilestanding on its edge.

121212

Chapter 2

Rigid Bodies

2.1 Many-Body Systems1

Let us consider a system consisting N particles, which we label by an integer a = 1, . . . , N (or otherletters b, c, . . . at the start of the alphabet). We denote their positions by ra and masses by ma.

The net force acting on each particle, is the sum of the forces due to each other particle in thesystem as well as any external forces. Therefore Newton’s second law for particle a has the form

mara =∑b 6=a

Fab + Fexta , (2.1.1)

where Fab is the force on particle a due to particle b, and Fexta is the external force acting on particle

a. Note that the external force is generally dependent on the particle’s position, velocity etc. and istherefore different for each particle, which is why it has the index a.

Instead of trying to solve the motion of each particle, let us first look at the motion of the systemas a whole. For that purpose, it is useful to define the total mass

M =∑a

ma, (2.1.2)

and the centre or massR =

1

M

∑a

mara. (2.1.3)

We also define the total momentum of the system as the sum of the momenta of the individual particles,

P =∑a

pa =∑a

mar = MR. (2.1.4)

Differentiating this with respect to time gives

P =∑a

mar =∑ab

Fab +∑a

Fexta =

∑a

Fexta , (2.1.5)

where the sum of the inter-particle forces Fab vanishes because of Newton’s third law Fba = −Fab.Therefore the rate of change of the total momentum is given by the total external force. In particular,the total momentum P is conserved in isolated systems, i.e., when there are no external forces.


13

Advanced Classical Physics, Autumn 2013 Rigid BodiesAdvanced Classical Physics, Autumn 2013 Rigid BodiesAdvanced Classical Physics, Autumn 2013 Rigid Bodies

Similarly, we define the total angular momentum L as the sum of angular momenta la = mara×raof the individual particles,

L =∑a

la =∑a

mara × ra . (2.1.6)

(N.B. Kibble & Berkshire use J for the angular momentum. We shall stick to the more conventionalL here.) Its rate of change is given by

L =∑a

mara × ra +∑a

mara × ra =∑a

mara × ra

=∑a

ra ×

(∑b

Fab + Fexta

). (2.1.7)

Using Newton’s third law, we can write this as

L =1

2

∑ab

(ra − rb)× Fab +∑a

ra × Fexta . (2.1.8)

In general, the first term on the right-hand-side is non-zero, but it vanishes if we assume that the inter-particle forces are central, which means that the force Fab between particles i and j is in the directionof their separation vector (ra − rb). In that case we have

L =∑a

ra × Fexta ≡ τ , (2.1.9)

where the right-hand-side is known as the torque. The torque is zero and the angular momentum isconserved if the system is isolated or if the external forces all point to the origin (i.e., Fext

a ||ra).Note that there are some forces that are not central, such as the electromagnetic force between

moving charges, and in that case L is not conserved. (In fact, the electromagnetic field can carryangular momentum, and when it is included the total angular momentum is still conserved.)

It is often useful to separate the coordinates ra into centre of mass and relative contributions

ra = R + r∗a (2.1.10)

such that, by definition, ∑a

mar∗a = 0 . (2.1.11)

Substituting this into Eq. (2.1.6) gives

L =∑a

ma (R + r∗a)×(R + r∗a

)=

(∑a

ma

)R× R +

(∑a

mar∗a

)× R + R×

(∑a

mar∗a

)+∑a

mar∗a × r∗a

= MR× R + L∗ where L∗ =∑a

mar∗a × r∗a , (2.1.12)

as the other terms are zero due to Eq. (2.1.11). The angular momentum can be separated into a centreof mass part, MR× R, and the angular momentum about the centre of mass, L∗.

141414


The rate of change of the relative angular momentum L∗ may be written as the sum of the momentsof the particles about the centre of mass due to external forces alone

L∗

= L− d

dt

(MR× R

)=∑a

ra × Fexta −MR× R

=∑a

(ra −R)× Fexta =

∑a

r∗a × Fexta . (2.1.13)

This means that we can often study the centre-of-mass motion and the relative motion separatelyfrom each other. In particular, if the external forces are position-independent, the relative angularmomentum evolves independently of the centre-of-mass motion.

Likewise, the total kinetic energy separates into the kinetic energy of the centre of mass and thekinetic energy relative to the centre of mass,

T =1

2

∑a

mar2a =

1

2

∑a

ma

(R + r∗a

)·(R + r∗a

)=

1

2MR

2+

1

2

∑a

mar∗2a . (2.1.14)

The above results apply generally to many-body systems, but for the rest of this Section we willfocus on a special class of them known as rigid bodies. These are many-body systems in which alldistances |ra− rb| between particles are fixed. The whole system can still move and rotate. In reality,a rigid body a mathematical idealisation because it requires infinitely strong forces between particles,but in many cases it is a very good approximation.

2.2 Rotation about a Fixed Axis

We first consider the case in which the body is free to rotate about a fixed axis. Cylindrical polarcoordinates (ρ, φ, z) are ideally suited for this situation (see Fig. 1.1). If we choose the z axis asthe rotation axis, then for every particle a the coordinate za and ρa are fixed and only the angularcoordinate φa changes as φa = ω. Then we can write the z component of the angular momentum(2.1.6) as

Lz =∑a

maρa

(ρaφ)

=∑a

maρ2aω = Iω , (2.2.1)

where ρφ is the tangential velocity and I =∑

amaρ2a is the moment of inertia about the axis. As I is

obviously constant we can write its rate of change as

Lz = Iω =∑a

ρaFaφ, (2.2.2)

where Faφ is the component of the external force Fexta in the φ direction.

Similarly we can write the kinetic energy in terms of I and ω as

T =∑a

12ma

(ρaφa

)2= 1

2Iω2 . (2.2.3)

Note the similarity of these expressions to the corresponding linear ones where m 7→ I and v 7→ ω

p = mv ↔ L = Iω (2.2.4)

T = 12mv

2 ↔ T = 12Iω

2 . (2.2.5)

151515


Of course, there is no reason why the axis should be through the centre of mass, and for examplein a pendulum it is not. If we define the origin to be on the axis then we can define R as the distanceof the centre of mass from that axis. In general the axis would be free to move, so in order for it toremain fixed, there must be a support force Q that prevents it from moving. From brevity, we refer toit as the “force at axis”. Denoting the sum of all other external forces by F, we can write Newton’ssecond law as

P = MR = Q + F . (2.2.6)

Using R = ω ×R we can write

R = ω ×R + ω × R = ω ×R + ω × (ω ×R) . (2.2.7)

The first of these is the tangential acceleration and the second one is the centripetal force which keepsthe centre of mass on a circular trajectory. From these equations we can determine the support forceQ required to keep the axis fixed.

2.2.1 Compound Pendulum

As an example, let us consider a compound pendulum, which is a rigid body attached to a pivot andsubject to a gravitational force. We take the z–axis to be the axis of rotation, which is now horizontal,and x to be pointing downwards. Then the pendulum is subject to an external gravitational forceF = (Mg, 0, 0) acting through its centre of mass. In terms of the unit vectors ρ and φ, we can writethis as

F = Mg cosφρ−Mg sinφφ. (2.2.8)

Thus the equation of motion (2.2.2) is

Iφ = −MgR sinφ, (2.2.9)

and the energy conservation equation is

E = T + V = 12Iφ

2 −MgR cosφ = constant . (2.2.10)

For small amplitudes, φ 1, Eq. (2.2.9) reduces to the equation for a simple harmonic oscillator,

φ = −MgR

Iφ, (2.2.11)

with period T = 2π√I/MgR.

Rewriting Eq. (2.2.7) in polar coordinates and noting that only φ actually changes we can calculatethe net force on the system and hence the support force Q at the axis,

P = MR = MRφ φ−MRφ2 ρ (2.2.12)

⇒ Q = P− F =(−Mg cosφ−MRφ2

)ρ+

(Mg sinφ+MRφ

)φ (2.2.13)

= −[Mg cosφ

(1 +

2MR2

I

)+

2MR

IE

]ρ+Mg sinφ

(1− MR2

I

)φ , (2.2.14)

where, in the final step, we have substituted from Eqs. (2.2.10) and (2.2.9) to eliminate φ and φ.Note that, in contrast with a simple pendulum (for which I = MR2), the force Q is not in the radialdirection ρ.

161616


2.2.2 Centre of Percussion

As a further example, consider a compound pendulum which is initially at rest. An external force Fin the angular direction φ is then applied at distance d from the pivot point for a short period of time.We want to calculate the force Q needed to keep the axis fixed.

To make this more concrete, you can think of the pendulum as a tennis racket which you areholding in your hand, so that your hand acts as the pivot point. A ball hits the racket at distance dfrom your hand and exerts a force F on the racket. We want to calculate the support force Q whichyour hand has to provide to remain stationary, or equivalently the impact you will feel with your hand.

Because initially the racket is not rotating, φ = 0, and therefore Eqs. (2.2.6) and (2.2.7) become

Q + F = P = MR = MRφφ,

Iφ = dFφ, (2.2.15)

from which we find

Qφ = MRφ− Fφ =MRd

IFφ − Fφ =

(MRd

I− 1

)Fφ. (2.2.16)

If the distance at which the ball hits the racket is d = I/MR, the linear and rotational motion balanceeach other and the pivot point does not feel any impact. This point is known as the centre of percussion.In sport it is also called the “sweet spot”, because you hit the ball but feel no impact with your hand.

2.3 Inertia Tensor

In general, the angular momentum vector

L =∑a

mara × ra =∑a

mara × (ω × ra) (2.3.1)

is not parallel to the angular velocity ω.Let us use the Cartesian coordinates and write the position vector as

r =

xyz

. (2.3.2)

For simplicity, we first assume that ω is in the z direction, so that

ω = ωk =

00ω

. (2.3.3)

Then we have

r× (ω × r) = (r · r)ω − (r · ω)r = (x2 + y2 + z2)ω − zωr =

−xzω−yzω

(x2 + y2)ω

. (2.3.4)

171717


Using this in Eq. (2.3.1), we find the components of the angular momentum vector L,

Lx = −∑a

maxazaω,

Ly = −∑a

mayazaω,

Lz =∑a

ma

(x2a + y2

a

)ω . (2.3.5)

We can summarise these by writing

Lx = Ixzω, Ly = Iyzω, Lz = Izzω, (2.3.6)

where

Ixz = −∑a

maxaza, Iyz = −∑a

mayaza, Izz =∑a

ma

(x2a + y2

a

). (2.3.7)

Izz is the moment of inertia as previously defined. Ixz and Iyz are sometimes known as products ofinertia.

Figure 2.1:

As an example of a simple system for which the angular momentum is not parallel the angularvelocity, consider a rigid rod with equal masses on either end (a dumbbell) inclined at an angle θ tothe axis of rotation. If the masses are at ±r then the total angular momentum is

L = mr× r +m(−r)× (−r) = 2mr× (ω × r) , (2.3.8)

which is clearly in a direction perpendicular to r. Because it is also perpendicular to the vector (ω×r),it has to lie on the plane spanned by ω and r, and therefore it is rotating around the axis ω togetherwith the rod.

181818


Using Eq. (2.3.5), we can write the components of the angular momentum as

L =

−2mxzω−2myzω

2m(x2 + y2)ω

= 2mρω

−z cosφ−z sinφ

ρ

. (2.3.9)

For completeness, let us write down the angular momentum vector L for a general angular velocityω. We have

L =∑a

[(ra · ra)ω − (ra · ω)ra]

=∑a

ma

(x2a + y2

a + z2a)

ωxωyωz

− (xaωx + yaωy + zaωz)

xayaza

=

∑a

ma

(y2a + z2

a)ωx − xayaωy − xazaωz−xayaωx + (x2

a + z2a)ωy − yazaωz

−xazaωx − yazaωy + (x2a + y2

a)ωz

. (2.3.10)

Using linear algebra, we can write this as a product of a matrix and a vector

L =∑a

ma

y2a + z2

a −xaya −xaza−xaya x2

a + z2a −yaza

−xaza −yaza x2a + y2

a

ωxωyωz

, (2.3.11)

or more conciselyL = I · ω, (2.3.12)

where the three-by-three matrix

I =

Ixx Ixy IxzIyx Iyy IyzIzx Izy Izz

=∑a

ma

y2a + z2

a −xaya −xaza−xaya x2

a + z2a −yaza

−xaza −yaza x2a + y2

a

(2.3.13)

is known as the inertia tensor. In general, a tensor is a geometric object that describes a linear relationbetween two or more vectors vectors. In this case, the inertia tensor describes the linear relationbetween ω and L, and can be represented by a three-by-three matrix. Just like the components of avector, the elements of the matrix I change under rotations. For more details, see Appendix A.9 inKibble&Berkshire.

Finally, it is often convenient to work in the component notation. Labelling the coordinates x, yand z by i, j ∈ 1, 2, 3, we can write the components of the inertia tensor in a compact form as

Iij =∑a

ma

(r2aδij − rairaj

), (2.3.14)

where δij is the Kronecker delta (that is, δij = 1 if i = j and δij = 0 if i 6= j), and rai is the ithcomponent of the position vector ra. In the component notation, Eq. (2.3.12) becomes

Li =∑j

Iijωj . (2.3.15)

191919


2.4 Principal Axes of Inertia

We can make use of our knowledge of the properties of matrices to understand the meaning of theinertia tensor I. We note that I is symmetric, Ixy = Iyx, so that the eigenvalues of I are real. We denotethese eigenvalues by I1, I2 and I3 and call them the principal moments of inertia. The correspondingeigenvectors, which we denote by e1, e2 and e3 are called the principal axes of inertia. They areorthogonal to each other, and we choose them to be unit vectors. By definition, the eigenvectorssatisfy

I · ei = Iiei, (2.4.1)

where again i ∈ 1, 2, 3. This also means that is the angular velocity ω is parallel to a principal axis,then the angular momentum L is parallel to it.

It is convenient to work in a coordinate system based on the principal axes, and write

ω =∑i

ωiei. (2.4.2)

The angular momentum is thenL =

∑i

Iiωiei. (2.4.3)

It is important to note that the principal axes rotate with the body. They therefore represent a rotatingframe of reference (see Chapter 1).

Using the identity (a× b) · c = a · (b× c), the kinetic energy can be expressed as

T =∑a

12mara · ra =

∑a

12ma (ω × ra) · (ω × ra) =

∑a

12maω · [ra × (ω × ra)]

=1

2ω · L =

1

2ω · I · ω =

1

2

∑i

Iiω2i . (2.4.4)

The principal axes can always be found by diagonalising the inertia tensor I, but calculations be-come easier if one already knows their directions because then one can choose them as the coordinateaxes. It is therefore useful to know that any symmetry axis is always a principal axis, and than thedirection normal to any symmetry plane is also a principal axis.

If two of the principal moments of inertia are equal, say I1 = I2, we say that the body is asymmetric body. In this case, any linear combination of the e1 and e2, so any two orthogonal directionson the plane spanned by e1 and e2 can be chosen as the principal axis. Note that although a systemwith an axis of cylindrical symmetry, e.g. a cylinder or a cone, would certainly be a symmetric body inthis sense, it is not necessary. In fact any system with a more than 2-fold rotational symmetry wouldsuffice, e.g. a triangular prism, or the two principal moments could be equal just by chance in spite ofthe body have no geometrical symmetry. In the case of a symmetric body, Eq. (2.4.3) becomes

L = I1 (ω1e1 + ω2e2) + I3ω3e3 . (2.4.5)

If all 3 moments of inertia are equal, we say the body is totally symmetric. Again, this can happeneither by symmetry, as in a sphere, cube, regular tetrahedron or any of the five regular solids, or bycoincidence. In the case of a totally symmetric body, we have L = Iω and L is always in the samedirection as ω. In that case the choice of the 3 principal axes is completely arbitrary, as long as theyare mutually perpendicular.

202020


2.5 Calculation of Moments of Inertia

2.5.1 Shift of Origin

It is often useful to be able to relate the moments of inertia about different pivots, e.g. when a bodyis pivoted around a point other than its centre of mass. We write the position vector ra has the sumof the centre-of-mass position R and the position relative to the centre-of-mass r∗a, i.e., ra = R + r∗a.Then, by definition, ∑

a

mar∗a = 0. (2.5.1)

Therefore we can write the components of the inertia tensor (2.3.14) as

Iij =∑a

ma

[(R2 + r∗a

)2δij − (Ri + r∗ai)(Rj + r∗aj)

]=

∑a

ma

[R2δij + (r∗a)

2δij −RiRj − r∗air∗aj]

= M(R2δij −RiRj

)+ I∗ij , (2.5.2)

whereI∗ij =

∑a

ma

[(r∗a)

2δij − r∗air∗aj]. (2.5.3)

If we know the inertia tensor with respect to the centre of mass I∗, we can use these relations to easily

calculate with respect to any origin we want. This is known as the parallel axes theorem. Note thatthe principal axes about a general point are not necessarily parallel to those about the centre of mass,unless the point itself lies on one of the principal axes.

2.5.2 Continuous Solid

Generally we have a continuous solid rather than a group of point particles. In this case the sumsbecome integrals and the masses, ma become densities, ρ(r), so that we have

Iij =

∫ρ(r)

(r2δij − rirj

)d3r. (2.5.4)

2.5.3 Routh’s Rule

Let us now imagine that the coordinate axes have been chosen to agree with the principal axes. Wecan then see from Eq. (2.5.4) that we can split the principal moments of inertia such that

I∗1 = Ky +Kz, I∗2 = Kx +Kz, I∗3 = Kx +Ky, (2.5.5)

whereKi =

∫Vρr2i d3r. (2.5.6)

It is now useful to ask how the principal moments change if we rescale (i.e. stretch or squeeze) thebody in the directions the principal axes. To do this in practice, let us first consider the original bodyV assuming that we know the constants Ki defined by by Eq. (2.5.6). The rescaled body V is obtainedby rescaling the coordinates as ri = airi for each i ∈ 1, 2, 3. The constantsKi in Eq.(2.5.6) changeto

Ki =

∫Vρri

2 dx dy dz = a1a2a3a2i

∫Vρr2i dx dy dz = a1a2a3a

2i Ki. (2.5.7)

212121


We can also note that the total mass of the body V is

M =

∫Vρdx dy dz = a1a2a3

∫Vρdx dy dz = a1a2a3M, (2.5.8)

where M is the mass of the original body. Hence Ki ∝ a2iM , and we can write

Ki = λia2iM, (2.5.9)

where λz is a dimensionless number, which is the same for all bodies of the same general type. Hencewe have Routh’s rule which states that

I∗1 = M(λya

2y + λza

2z

),

I∗2 = M(λxa

2x + λza

2z

), (2.5.10)

I∗3 = M(λxa

2x + λya

2y

).

By checking the standard bodies we obtain the following values for the coefficients: λ = 13 for

‘rectangular’ axes, λ = 14 for ‘elliptical’ axes and λ = 1

5 for ‘ellipsoidal’ ones. This covers mostspecial cases. For example, a sphere is an ellipsoid with ax = ay = az = a and each principal momentof inertia is 2

5Ma2, whereas a cube is a parallelepiped with ax = ay = az = a and I = 23Ma2.

For a cylinder we have λx and λy elliptical and λz rectangular. This nomenclature can be confus-ing as it refers to the symmetry of the corresponding integrals and not to symmetry about the axes.For a cylinder with ax = ay 6= az we have

I∗1 = I∗2 = M(

14a

2x + 1

3a2z

)I∗3 = M

(14a

2x + 1

4a2x

)= 1

2Ma2x , (2.5.11)

and therefore a flat circular plate, i.e. a cylinder with az = 0, has

I∗1 = I∗2 = 14Ma2

x I∗3 = 12Ma2

x . (2.5.12)

Conversely, a thin rod is a cylinder with ax = ay = 0 and

I∗1 = I∗2 = 13Ma2

z I∗3 = 0 . (2.5.13)

2.6 Effect of Small Force

Suppose a body is rotating about a principal axis such that ω = ωe3 and L = I3ωe3. Then

L = I3ω = 0 , (2.6.1)

the axis will remain fixed in space and the angular velocity will be constant. Note that this would notbe true if ω were not a principal axis.

Suppose now that the axis is fixed at the origin and a small force F is applied to the axis at pointr. Then the equation of motion becomes

L = r× F . (2.6.2)

The body will acquire a small component of angular velocity perpendicular to its axis. However, ifthe force is small, this will be small compared with the angular velocity of rotation about the axis. Wemay then neglect the angular momentum components normal to the axis and again write

L = I3ω = r× F . (2.6.3)

222222


Figure 2.2:

Since r×F is perpendicular to ω (r is parallel to ω) the magnitude of ω does not change (dω2/dt =2ω · ω = 0). Its direction does change, however, in the direction of r×F and hence perpendicular tothe applied force F.

As an example, consider a child’s spinning top. In general, the rotation axis is not exactly verti-cal. We consider the point as which the top touches the ground as the pivot point, and use it as ourorigin. There is a gravitational force F = −Mgk, acting at the centre of mass at position R = Re3.Eq. (2.6.3) gives

I3ωde3

dt= −MgR e3 × k

⇒ de3

dt=

(MgR

I3ω

)k× e3. (2.6.4)

This has the same form as Eq. (1.2.1), i.e.,

de3

dt= Ω× e3, (2.6.5)

Which means that the principal axis e3 rotates around the vertical direction k with angular velocity

Ω =MgR

I3ωk . (2.6.6)

The analysis is only valid when Ω ω or when MgR I3ω2; the potential energy associated

with the tilt is much smaller than the kinetic energy of the rotation. The system is very similar toLarmor precession (see section 1.6.4). The expression for Ω tells us a great deal about this system.Note that Ω is inversely proportional to both the moment of inertia I3 and the angular frequency ω.This implies that to minimise the precession and hence to improve the stability of the system we haveto choose both to be large: we require a fat rapidly spinning body.

This is the basis of the gyroscope: the high stability of such a rapidly rotating body makes it idealfor use in navigation, especially near the poles where a compass is almost useless. It can also be used,e.g. , to provide an “artificial horizon” when flying blind, either in cloud or at night.

232323


2.7 Rotation about a Principal Axis

As the principal axes are fixed in the body we are really dealing with a rotating frame. We here returnto the notation used in Section 1 to distinguish between the inertial and rotating frames. The rate ofchange of the angular momentum in the inertial frame is

dL

dt

∣∣∣∣I

=∑a

ra × Fa = τ . (2.7.1)

Eq. (Rot:eq:7) relates this to the rate of change measured in the rotating frame,

dL

dt

∣∣∣∣I

=dL

dt

∣∣∣∣R

+ ω × L . (2.7.2)

On the other hand, because in the rotating frame the principal axes and principal moments are fixed,we have

dL

dt

∣∣∣∣R

= I1ω1e1 + I2ω2e2 + I3ω3e3 , (2.7.3)

and, therefore,dL

dt

∣∣∣∣R

+ ω × L = τ . (2.7.4)

Calculating the cross product

ω × L =

∣∣∣∣∣∣e1 e2 e3

ω1 ω2 ω3

I1ω1 I2ω2 I3ω3

∣∣∣∣∣∣= (I3 − I2)ω2ω3e1 + (I1 − I3)ω1ω3e2 + (I2 − I1)ω1ω2e3, (2.7.5)

we find the Euler equations

I1ω1 + (I3 − I2)ω2ω3 = τ1 ,

I2ω2 + (I1 − I3)ω3ω1 = τ2 , (2.7.6)

I3ω3 + (I2 − I1)ω1ω2 = τ3 .

In principle these equations could be solved to give ω(t). In practice, however, we often don’t havethe force expressed in a useful form to do this and, in any case, it is easier to solve this system usingLagrangian methods (see chapter 3).

For the moment we concentrate on studying the stability of the motion in the absence of externalforces (τ = 0). Suppose that the object is rotating about the principal axis e3 and that ω1 = ω2 = 0then it is obvious from Eq. (2.7.6) that the object will continue indefinitely to rotate about e3. On theother hand let us suppose that the motion deviates slightly from this such that ω1 and ω2 are muchsmaller than ω3. We may therefore ignore any terms which are quadratic in ω1 and ω2 so that, fromthe third line in Eq. (2.7.6), we have ω3 = 0 and ω3 is constant.

We look for solutions of the form2

ω1 = a1eγt ω2 = a2eγt (2.7.7)2Those doing computational physics will note the similarity between this analysis and the stability analysis considered

there.

242424


where a1, a2 and γ are constants. Substituting this into Eq. (2.7.6) gives

I1γa1 + (I3 − I2)ω3a2 = 0 (2.7.8)

I2γa2 + (I1 − I3)ω3a1 = 0 , (2.7.9)

which is a 2× 2 eigenvalue problem with a solution

γ2 =(I3 − I2) (I1 − I3)

I1I2ω2

3 . (2.7.10)

We note that ω23/I1I2 is always positive. Hence, if I3 is the smallest or the largest of the 3 moments

of inertia γ2 is negative. In that case γ is imaginary and the motion is oscillatory. Hence its amplitudedoes not change, and we say that the rotation is stable.

However, if I3 is the middle of the three moments then γ2 is positive and γ is real. There are twoindependent solutions with opposite signs of γ, and in general the solution is a linear combinationof them. However, at late times (t 1/γ) the solution with a positive exponent dominates. Henceω1 and ω2 tend to grow exponentially and the motion about e3 is unstable: any small deviation fromrotation about e3 will tend to grow.

You can test this by trying to spin an appropriately dimensioned object, such as a book or a tennisracket. It is much easier to spin it around the axis with the smallest or the largest moment of inertia,but not the middle one.

2.8 Euler’s Angles

Figure 2.3:

In order to describe the orientation of a solid body we require 3 angles. The conventional way todo this is to define angles (φ, θ, ψ) these is known as Euler’s Angles,, which are illustrated in Fig. 2.3.Note however that there are several different conventions for Euler’s Angles. We shall stick to the oneused by Kibble & Berkshire, known as the y–convention. The meaning of the angles is, essentially,

252525


that φ and θ are the usual spherical coordinates expressing the direction of the principal axis e3, andψ expresses the orientation of the object about this axis.

Let us construct the angles in detail. We can obviously express the orientation of the body by giv-ing the orientations of the three principal axes, i.e., by a triplet of orthogonal unit vectors (e1, e2, e3).To show that we can parameterise these with the three Euler angles, let us start with the orienta-tion (ı, , k), which means that the principal axes are aligned with the axes of our original Cartesiancoordinate system. As illustrated in Fig. 2.3, we then carry out three steps:

• We first rotate by φ about the k axis. The changes the directions of the first two principal axes,and we denote the new directions by e′′1 and e′2. Thus, the orientation of the principal axeschanges as (ı, , k)→ (e′′1, e

′2, k).

• Secondly we rotate by θ about the second principal axis e′2. This changes the directions of thefirst and third principal axes to e′1 and e3, so the orientation of the body changes as (e′′1, e

′2, k)→

(e′1, e′2, e3).

• Finally we rotate by ψ about the third principal axis e3, to bring the first principal axis todirection e1 and the second principal axis to e2, i.e., (e′1, e

′2, e3)→ (e1, e2, e3).

Using these three rotations we can reach any orientation (e1, e2, e3) we want, and therefore the ori-entation of the body is fully parameterised by the three Euler angles.

Because the three angles (φ, θ, ψ) correspond to rotations about the axes k, e′2 and e3, respectively.Note that these axes are not mutually perpendicular. We can, nevertheless, use them to express theangular velocity ω in terms of Euler angles as

ω = φk + θe′2 + ψe3 . (2.8.1)

For a symmetric system such as a gyroscope we can choose e3 as the symmetry axis and, asI1 = I2, any two mutually perpendicular axes as the other two. In this case the most convenientare e′1 and e′2 as two of the axes are already used in Eq. (2.8.1). We can therefore use that k =− sin θ e′1 + cos θ e3 to obtain

ω = −φ sin θ e′1 + θe′2 +(ψ + φ cos θ

)e3 , (2.8.2)

where the unit vectors are mutually perpendicular and, for a symmetric body, principal axes.Using Eq. (2.8.2), we can express the angular momentum and kinetic energy as

L = −I1φ sin θ e′1 + I1θe′2 + I3

(ψ + φ cos θ

)e3 (2.8.3)

T = 12I1φ

2 sin2 θ + 12I1θ

2 + 12I3

(ψ + φ cos θ

)2. (2.8.4)

To find equations of motion we could either translate this into Cartesian coordinates, ı, , k, or tryto write the equations in terms of the Euler angles. Either way is difficult. It is much easier to useLagrangian methods (see Chapter 3).

In the meantime we can consider the free motion, with no forces. In this case L is a constant. Wetherefore choose the vector k to be in the direction of L such that

L = Lk = −L sin θ e′1 + L cos θe3 . (2.8.5)

262626


This must be equal to Eq. (2.8.3) so that by equating components we can write

I1φ sin θ = L sin θ (2.8.6)

I1θ = 0 (2.8.7)

I3

(ψ + φ cos θ

)= L cos θ (2.8.8)

From Eq. (2.8.7) we deduce that θ is constant. As long as sin θ 6= 0, Eq. (2.8.6) implies that that φ isconstant, too,

φ =L

I1, (2.8.9)

and hence, from Eq. (2.8.8), we find that ψ is also a constant,

ψ = L cos θ

(1

I3− 1

I1

). (2.8.10)

We conclude therefore that the axis e3 rotates around L at a constant rate φ and at an angle θ to it. Inaddition the body spins about the axis e3 at a constant rate ψ. The angular velocity vector ω deducedfrom Eq. (2.8.2) is

ω = −φ sin θ e′1 +(ψ + φ cos θ

)e3 (2.8.11)

which describes a cone around the direction of L.Note that this appears very similar to precession (see Section 2.6): ψ is the angle of rotation of the

gyroscope around its axis, θ is the angle between the gyroscope axis and the angular momentum andφ is the angle which describes the precession around this direction. However, here we are describingfree rotation with no external forces involved.

272727

Chapter 3

Lagrangian Mechanics

3.1 Action Principle

In this chapter we will see how the familiar laws of mechanics can be expressed and understood froma very different point of view, which is known as the Lagrangian formulation of mechanics. This is inmany ways more elegant than the Newtonian formulation, and it is particularly useful when movingto quantum mechanics. For example, quantum field theories are usually studied in a Lagrangianframework.

The idea is similar to Fermat’s principle in optics, according to which light follows the shortestoptical path, i.e., the path of shortest time to reach its destination. As a reminder, let us see how Snell’slaw

sin θ2

sin θ1=n1

n2, (3.1.1)

which tells how a light ray bends at the interface of two materials with refractive indices n1 and n2.Consider a light ray from point (xa, ya) to (xb, yb). There is a horizontal interface at y, and

between ya and y, the refractive index is n1 and between y and yb it is n2. We now assume that thelight follows a straight line from (xa, ya) to a point (x, y) on the interface, and then from (x, y) to(xb, yb). The only unknown is therefore x. Because of the speed of light in medium is c/n, the opticalpath length is

S(x) =

∫ b

adt =

∫ b

a

n

cdl =

n1

c

√(x− xa)2 + (y − ya)2 +

n2

c

√(xb − x)2 + (yb − y)2. (3.1.2)

According to Fermat’s principle, we need to find the minimum of this quantity. At the minimum, thederivative with respect to x vanishes, so

0 =∂S

∂x=n1

c

x− xa√(x− xa)2 + (y − ya)2

− n2

c

xb − x√(xb − x)2 + (yb − y)2

=n1

csin θ1 −

n2

csin θ2,

(3.1.3)from which Snell’s law (3.1.1) follows immediately.

Fermat had proposed his principle in 1657, and it motivated Maupertuis to suggest in 1746 thatalso matter particles would obey an analogous variational principle. He postulated that there exists aquantity called action, which the trajectory of the matter particle would minimise. This idea was laterrefined by Lagrange and Hamilton, who developed it into its current form, in which the action S is

28

Advanced Classical Physics, Autumn 2013 Lagrangian MechanicsAdvanced Classical Physics, Autumn 2013 Lagrangian MechanicsAdvanced Classical Physics, Autumn 2013 Lagrangian Mechanics

defined as an integral over a function L(x, x, t) known as the Lagrangian, as

S[x] =

∫ tb

ta

L(x(t), x(t), t)dt. (3.1.4)

The Lagrangian is a function of the position x and velocity x of the particle, and it may also have someexplicit time dependence. We will see later that for conservative systems, the Lagrangian is simplythe difference of the kinetic energy T and the potential energy V , i.e., L = T − V .

Because the action S is given by an integral over time, it depends on the position and velocityat all times, i.e., on the whole trajectory of the particle. It is therefore a function from the space offunctions x(t) to real numbers, and we indicate that by having the argument (i.e. function x) in squarebrackets. Such function of functions are called functionals.

Given a Lagrangian L, the dynamics is determined by Hamilton’s principle (or action principle),which states that to move from position xa at time ta to position xb at time tb, the particle follows thetrajectory that minimises the action S[x]. In other words, the actual physical trajectory is the functionx that minimises the action subject to the boundary conditions x(ta) = xa and x(tb) = xb.

To find this minimising function x(t), we want to calculate the derivative of the action S[x] withrespect to the function x(t) and set it to zero. Functional derivatives such as this are studied inthe branch of mathematics known as functional analysis. However, he we adopt a slightly simplerapproach and consider small variations of the trajectory. This is known as variational calculus.

Let us assume that x(t) is the function that minimises the action, and consider a slightly perturbedtrajectory

x(t) = x(t) + δx(t), (3.1.5)

where we assume that the perturbation is infinitesimally small and vanishes at the endpoints,

δx(ta) = δx(tb) = 0. (3.1.6)

This perturbation changes the action by

δS = S[x+ δx]− S[x]

=

∫ tb

ta

[L(x(t) + δx(t), x(t) + δx(t), t)− L(x(t), x(t), t)] dt

=

∫ tb

ta

[∂L

∂xδx(t) +

∂L

∂xδx(t)

]dt

=

∫ tb

ta

[∂L

∂xδx(t) +

∂L

∂x

dδx(t)

dt

]dt

=

∫ tb

ta

∂L

∂xδx(t)dt+

[∂L

∂xδx(t)

]tbta

−∫ tb

ta

[d

dt

∂L

∂x

]δx(t)dt, (3.1.7)

where we Taylor expanded to linear order in δx(t) and integrated the second term by parts. Becauseof the boundary conditions (3.1.6), the substitution term vanishes, and we have

δS =

∫ tb

ta

[∂L

∂x− d

dt

∂L

∂x

]δx(t)dt. (3.1.8)

For x(t) to be minimum, the variation of the action (3.1.8) has to vanish for any function δx(t).You can see this by noting that if δS < 0 for any perturbation δx(t), then S[x + δx] < S[x].

292929


Correspondingly, if δS > 0, then S[x− δx] < S[x]. In either case, we have found a function that haslower action than x(t). Therefore x(t) can only be the minimum if δS = 0.

The only way we can have δS = 0 for every perturbation δx(t) is that the expression inside thebrackets in Eq. (3.1.8) vanishes, i.e.,

d

dt

∂L

∂x− ∂L

∂x= 0. (3.1.9)

This is known as the Euler-Lagrange equation, and it is the equation of motion in the Lagrangianformulation of mechanics. When using Eq. (3.1.9), it is very important to understand the differencebetween the partial (∂) and total (d) derivatives.

To check that Eq. (3.1.9) really describes the same physics as Newtonian mechanics, let us con-sider a simple example of a particle in a one-dimensional potential V (x). Because the system isconservative, the Lagrangian is

L = T − V =1

2mx2 − V (x), (3.1.10)

and the Euler-Lagrange equation is

d

dt

∂L

∂x− ∂L

∂x=

d

dt(mx) +

dV

dx= mx+

dV

dx= 0. (3.1.11)

This is nothing but Newton’s second law

mx = −dVdx

. (3.1.12)

It is interesting to note that although Newtonian and Lagrangian formulations of mechanics aremathematically equivalent and describe the same physics, their starting point is very different. New-ton’s laws describe the evolution of the system as an initial value problem: We know the position andvelocity of the particle at the initial time, x(ta) and x(ta), and we then use Newton’s laws to determinethe evolution x(t) at later times t > ta.

In contrast, the Lagrangian formulation describes the same physics as a boundary value problem.We know the initial and final positions of the particle x(ta) and x(tb), and we use the action principleto determine x(t) for ta < t < tb, i.e., how the particle travels from one to the other. In particular,we cannot choose the initial velocity because it is determined by the final destination of the particlethrough the action principle. This may appear very non-local in time because the behaviour of theparticle in the far future determines its motion at the current time. However, because because the twoformulations are equivalent, this apparent non-locality in time does not actually affect the physics. Forexample, it is not possible to use it to send information back in time. In practice, it is usually easier tosolve initial value problems, and therefore one usually uses the Lagrangian formulation to set up theproblem and derive the equations of motion but then solves them as an initial value problem.

In many ways the Lagrangian formulation is closer to quantum mechanics, which does not allowone to determine the initial position and velocity of the particle either. Furthermore, the principlethat the particle chooses one out of all possible trajectories resembles the double slit experiment inquantum mechanics, with the key difference that in the quantum case one has to sum over all possibletrajectories rather than just selecting one. This correspondence turns out to be fully accurate andbecomes obvious in the path integral formulation in quantum mechanics.

303030


3.2 Generalised Coordinates

One attractive aspect of the Lagrangian formulation is that it is independent of the variables that areused to describe the state of the system. This is because the minimum of the function does not dependon the coordinate system, and the same applies to a functional such as the action S. Therefore, incontrast with Newtonian mechanics, we do not have to use the Cartesian position coordinates, and theEuler-Lagrange equation still has the same form (3.1.9). Instead, we are free to choose whichever setof variables we want to parameterise the state of the system, and which are then called generalisedcoordinates and usually denoted by q. They can be position coordinates, but also angles etc.

Usually we need more than one generalised coordinate, which we label by index i, so that wehave some number N generalised coordinates qi, with i = 1, . . . , N . Each coordinate satisfies thecorresponding Euler-Lagrange equation

d

dt

∂L

∂qi− ∂L

∂qi= 0. (3.2.1)

As the first example of generalised coordinates, let us consider a simple pendulum that has a massm at the end of a light rod of fixed length l. The angle of the pendulum from the vertical position is θ,which we choose as the generalised coordinate q = θ. The Lagrangian is

L = T − V =1

2ml2θ2 +mgl cos θ. (3.2.2)

The Euler-Lagrange equation is

d

dt

∂L

∂θ− ∂L

∂θ=

d

dt

(ml2θ

)+mgl sin θ = ml2θ +mgl sin θ = 0, (3.2.3)

from which we findθ = −g

lsin θ. (3.2.4)

As a slightly more complex example, let us consider the motion of a particle in a central potentialV (r) in three dimensions. We use the spherical coordinates (r, θ, φ) as the generalised coordinates.The Lagrangian is

L = T − V =1

2m(r2 + r2θ2 + r2φ2 sin2 θ

)− V (r). (3.2.5)

The Euler-Lagrange equation for r is

d

dt

∂L

∂r− ∂L

∂r=

d

dt(mr)−mr

(θ2 + φ2 sin2 θ

)+dV

dr= 0, (3.2.6)

which gives

mr = mr(θ2 + φ2 sin2 θ

)− dV

dr. (3.2.7)

The second term on the right hand side is the force due to the potential, and the first term is thecentrifugal force.

The Euler-Lagrange equation for θ is

d

dt

∂L

∂θ− ∂L

∂θ=

d

dt

(mr2θ

)−mr2φ2 sin θ cos θ = 0. (3.2.8)

313131


Finally, because φ does appear in the Lagrangian (except as a time derivative φ), the Euler-Lagrangeequation for φ is

d

dt

∂L

∂φ− ∂L

∂φ=

d

dt

∂L

∂φ= 0. (3.2.9)

This means that the quantity∂L

∂φ= mr2 sin2 θφ (3.2.10)

is conserved. We note that this is simply Lz , the z component of the angular momentum vectorL = mr× r.

We can easily see that this is, in fact, a very general result. For any generalised coordinate qi, wedefine the generalised momentum pi by

pi =∂L

∂qi. (3.2.11)

The Euler-Lagrange equation implies that whenever the Lagrangian does not depend on qi, then thecorresponding generalised momentum pi is conserved. This is a simple example of a more generalresult known as Noether’s theorem, which we will come back to later.

As a very simple example, let us consider the Cartesian position coordinate x as our generalisedcoordinate. With the Lagrangian L = 1

2mx2 − V (x), the generalised momentum is simply the

conventional momentum p = ∂L/∂x = mx.

3.3 Precession of a Symmetric Top

Let us now consider an example of the use of Lagrangian mechanics to solve a real problem: asymmetric top. The kinetic energy is given in terms of Euler angles by Eq. (2.8.4) whereas thegravitational potential energy is V = MgR cos θ. This leaves us with a Lagrangian

L = 12I1φ

2 sin2 θ + 12I1θ

2 + 12I3

(ψ + φ cos θ

)2−MgR cos θ . (3.3.1)

The Euler-Lagrange equation (3.2.1) for θ is

d

dt

(I1θ)

= I1φ2 sin θ cos θ − I3

(ψ + φ cos θ

)φ sin θ +MgR sin θ . (3.3.2)

The Lagrangian function (3.3.1) does not contain the other two Euler angles φ and ψ so the generalisedmomenta pφ = ∂L/∂φ and pψ = ∂L/∂ψ are constant

d

dt

[I1φ sin2 θ + I3

(ψ + φ cos θ

)cos θ

]= 0 (3.3.3)

d

dt

[I3

(ψ + φ cos θ

)]= 0 . (3.3.4)

Note that comparison of Eqs. (3.3.4) and (2.8.2) tells us that

ω3 = ψ + φ cos θ = constant. (3.3.5)

We are interested in the situation of steady precession at a constant angle θ. In this case weconclude from Eqs. (3.3.3) and (3.3.4) that φ and ψ are constant. Hence the axis of the top precesses

323232


around the vertical with a constant angular velocity, which we denote by Ω, i.e., φ = Ω. Because weare looking for a solution with fixed θ, the left side of Eq. (3.3.2) must vanish, and we obtain

I1Ω2 cos θ − I3ω3Ω +MgR = 0 , (3.3.6)

which we can solve for Ω. The general solution is

Ω =I3ω3 ±

√I2

3ω23 − 4I1MgR cos θ

2I1 cos θ. (3.3.7)

This only has real roots if

ω23 ≥ ω2

c ≡4I1MgR cos θ

I23

. (3.3.8)

If the top is spinning more slowly than this, there is no solution with constant θ. Instead, the top startsto wobble.

For a rapidly spinning top, ω3 ωc, we can expand the square root in Eq. (3.3.7) to obtain

Ω ≈I3ω3 ± I3ω3

(1− 2 I1MgR cos θ

I23ω23

)2I1 cos θ

→MgR/I3ω3 −signI3ω3/I1 cos θ +sign

(3.3.9)

The first of these is the precession frequency calculated in (2.6.6), while the second is the precessionof a free system discussed in section 2.8. Note the absence of any contribution from gravity in thesecond expression.

Note that if θ > 12π, the top is hanging below its point of support, and there is no limit on ω3. In

particular, for ω3 = 0, we find the possible angular velocities of a compound pendulum swinging in acircle

Ω = ±

√MgR

I1 |cos θ|(3.3.10)

3.4 Constraints

Consider a system of N particles in three dimensions. To specify the position of each particle, youneed 3N generalised coordinates. However, in many cases the coordinates are not all independent butsubject to some constraints, such as the rigidity conditions (see Section 2.2), which reduce the numberof generalised coordinates required. For a rigid body, the original 3N coordinates may be reduced tosix generalised coordinates: three translational, such as the coordinates X,Y, Z of the centre of mass,and three rotational, such as the Euler angles, φ, θ, ψ.

We will assume that the constraints can be written in the form f(x1, . . . , x3N , t) = 0. Constraintslike that are called holonomic. For a rigid body, the constraints are of this form: The distance betweeneach pair of particles i and j is fixed to a constant dij , and therefore one has

(ri − rj)2 − d2

ij = 0. (3.4.1)

Another example is motion on the surface of a sphere of radius R, for which the constraint is

f(x, y, z) = z2 + y2 + z2 −R2 = 0. (3.4.2)

Sometimes one has to deals with non-holonomic constraints, for example if the constraint depends onvelocities, but these are more complicated to handle, and we will not discuss them in this course.

333333


In principle, solving each constraint equation eliminates one coordinate. If one has initially Ncoordinates xi (with i = 1, . . . , N ) and C constraints, solving them will allow one to express theoriginal coordinates in terms of M = N − C generalised coordinates qj , with j = 1, . . . ,M ,

xi = xi(q1, . . . , qM , t), (3.4.3)

with possibly explicit time-dependence if the constraints are time-dependent. In that case the systemis called forced, otherwise it is natural.

If one can solve the constraints and find the explicit relations (3.4.3), one can then write theLagrangian in terms of the generalised coordinates qj and solve the Euler-Lagrange equation. Substi-tuting this solution to Eq. (3.4.3) then gives the solution in terms of the original coordinates.

An alternative approach, which is sometimes useful, is to implement the constraints us-ing Lagrange multipliers. Starting with a Lagrangian L(x1, . . . , xN ) and a constraint functionf(x1, . . . , xN ), we define a new Lagrangian L′

L′(x1, . . . , xN , λ) = L(x1, . . . , xN ) + λf(x1, . . . , xN ), (3.4.4)

which is a function of the original coordinates and a Lagrange multiplier λ. If we now treat λ as the(N + 1)th coordinate, its Euler-Lagrange equation is

d

dt

∂L′

∂λ− ∂L′

∂λ= −f(x1, . . . , xN ) = 0, (3.4.5)

and therefore it satisfies the constraint automatically. The Euler-Lagrange equations for the originalcoordinates are

d

dt

∂L

∂xi− ∂L

∂xi− λ(t)

∂f

∂xi= 0, (3.4.6)

where the extra term can be interpreted as the (generalised) force that has to be applied to the systemto enforce the constraint.

As an example, consider a mass m hanging from a rope that is wrapped around a pulley of radiusR and moment of inertia I . Using the vertical position z of the mass, and the angle θ of the pulley asthe coordinates, the Lagrangian is

L = T − V =1

2mz2 +

1

2Iθ2 −mzg. (3.4.7)

If the rope does not slip, the pulley has to rotate as the mass moves, and this imposes a constraintRθ = −z. Choosing the origin appropriately, we can write this as a holonomic constraint

f(θ, z) = Rθ + z = 0. (3.4.8)

Introducing the Lagrange multiplier λ, the new Lagrangian is

L′ = L+ λ(Rθ + z) =1

2mz2 +

1

2Iθ2 −mzg + λ(Rθ + z). (3.4.9)

The Euler-Lagrange equations are

d

dtmz +mg − λ = 0 for z,

d

dtIθ − λR = 0 for θ,

−(Rθ + z) = 0 for λ. (3.4.10)

343434


The third equation implements the constraint θ = −z/R, and substituting this to the first two gives

mz +mg − λ = 0,

− IRz − λR = 0. (3.4.11)

Solving this pair of equations for z and λ, we obtain(m+

I

R2

)z +mg = 0,

λ =mg

1 +mR2/I. (3.4.12)

The first line shows that the moment of inertia I of the pulley gives the mass extra inertia. The secondline gives the force that rope has to apply in order to enforce the constraint. This is just the tension ofthe rope.

3.5 Normal Modes

3.5.1 Orthogonal Coordinates

Instead of rigid constraints, let us now consider a situation where the constraints are flexible so thatthe particles can move around their equilibrium positions. We assume that the system is described byN generalised coordinates qi. We also assume that it is natural, which means that the kinetic energyis a quadratic homogeneous function of the generalised velocities. We can then write it as

T =1

2

∑ij

aij(q1, . . . , qN )qiqj , (3.5.1)

where the coefficients aij can depend on the coordinates qi but not on velocities qi. They can chosento be symmetric (aji = aij) without any loss of generality.

The coordinates are said to be orthogonal if there are no cross terms, i.e., aij = 0 if i 6= j. Thenthe kinetic energy is simply

T =1

2

∑i

aii(q1, . . . , qN )q2i . (3.5.2)

We can always make our coordinates orthogonal by using the Gram-Schmidt procedure. For example,if N = 2, the general form of Eq. (3.5.1) is

T = 12a11q

21 + a12q1q2 + 1

2a22q22 . (3.5.3)

Defining a new coordinateq′1 = q1 +

a12

a11q2 , (3.5.4)

the kinetic energy becomes

T = 12a11q

′21 + 1

2a′22q

22 with a′22 = a22 −

a212

a11. (3.5.5)

Furthermore, if we rescale the coordinates in Eq. (3.5.2) by

q′i =√aiiqi, (3.5.6)

353535


θ

φ

M

m

r

R

Figure 3.1: Double pendulum

the kinetic energy becomes

T =1

2

∑i

q′2i . (3.5.7)

Therefore we can always assume that the kinetic energy has this form.As an example, consider a double pendulum, with a second pendulum hanging from the first. The

kinetic energy is

T =1

2MR

2+

1

2m(R + r

)2= 1

2MR2θ2 + 12m[R2θ2 + r2φ2 + 2Rrθφ cos(θ − φ)

](3.5.8)

where M , R and θ refer to the upper pendulum and m, r and φ to the lower. Note that the kineticenergy of the lower pendulum depends not only on φ but also on the motion of the upper pendulum towhich it is attached. For small values of θ and φ we can set the cosine term to one, so that the kineticenergy is

T = 12(M +m)R2θ2 + 1

2mr2φ2 +mRrθφ. (3.5.9)

However, because of the last term, the coordinates are not orthogonal. In this case it is obvious thatwe could get an orthogonal set by simply considering the displacement of the 2 bobs. For small anglesthis gives

x = Rθ y = Rθ + rφ , (3.5.10)

and the kinetic energy, T , becomes

T = 12Mx2 + 1

2my2 . (3.5.11)

Finally, we can reduce this to the standard form (3.5.7) using

q1 =√Mx q2 =

√my . (3.5.12)

3.5.2 Small Oscillations

We now consider the potential energy, V . If T is given by (3.5.7), the Lagrangian is

L = T − V =1

2

∑i

q′2i − V (q1, . . . , qN ). (3.5.13)

363636


The Euler-Lagrange equations are then simply

qi = −∂V∂qi

, (3.5.14)

for all i.If we now assume that the amplitude of the oscillations is small, we can Taylor expand the potential

around the origin, which was chosen to correspond to the equilibrium state. To quadratic order wehave

V (q1, . . . , qN ) = V0 +∑i

biqi + 12

∑ij

kijqiqj +O(q3), (3.5.15)

where V0, bi and kij are constants.Because we can subtract a constant from the potential without changing the equations of motion

(3.5.14), we are free to choose V0 = 0. Because we are assuming that the equilibrium state is qi = 0for all i, we find that bi = 0. Therefore, we only need to consider the quadratic term

V = 12

∑ij

kijqiqj , (3.5.16)

Note that we can also choose kij to be symmetric, i.e., kji = kij without any loss of generality.With the potential (3.5.16), the Euler-Lagrange equation (3.5.14) is simply

qi = −∑j

kijqj , (3.5.17)

which can be written in matrix form as

d2

dt2

q1

q2...qN

= −

k11 k12 · · · k1N

k21 k22 · · · k2N...

.... . .

...kN1 kN2 · · · kNN

q1

q2...qN

. (3.5.18)

By defining an N -dimensional coordinate vector q = (q1, . . . , qN ) and an N × N matrix k withelements kij , we can also write it more compactly as

q = −k · q. (3.5.19)

In the same notation, the Lagrangian is

L =1

2q · q− 1

2q · k · q. (3.5.20)

In our double pendulum example, the potential is

V (θ, φ) = MgR(1− cos θ) +mg [R(1− cos θ) + r(1− cosφ)]

≈ 1

2(M +m)gRθ2 +

1

2mgrφ2

=M +m

2Rgx2 +

mg

2r(y − x)2

=1

2

(1 +

m

M

R+ r

r

)g

Rq2

1 +g

2rq2

2 −√m

M

g

rq1q2. (3.5.21)

The coefficient matrix is therefore

k =

((1 + m

MR+rr

) gR −

√mM

gr

−√

mM

gr

gr

). (3.5.22)

373737


3.5.3 Eigenvalue Problem

Eq. (3.5.19) is a set of N coupled linear second-order equation. Therefore we should find 2N linearlyindependent solutions.

We look for solutions of the formq(t) = Aeiωt, (3.5.23)

where A and ω are constants. We will check later that we have found all 2N solutions.Substituting the Ansatz (3.5.23) into Eq. (3.5.19) gives

−ω2Aeiωt = −k ·Aeiωt, (3.5.24)

which is equivalent tok ·A = ω2A. (3.5.25)

This has the form of the eigenvalue equation: It shows that A is an eigenvector of the matrix k witheigenvalue ω2.

We know from linear algebra that the eigenvalues of a matrix are given by solutions of the char-acteristic equation

det(k− ω21) ≡

∣∣∣∣∣∣∣∣∣k11 − ω2 k12 · · · k1N

k21 k22 − ω2 · · · k2N...

.... . .

...kN1 kN2 · · · kNN − ω2

∣∣∣∣∣∣∣∣∣ = 0. (3.5.26)

Once we have found the eigenvalue ω2, we can substitute it to Eq. (3.5.25) to find the eigenvector.Because k is a symmetric N × N matrix, it has N real eigenvalues ω2

α, where α = 1, . . . , N ,corresponding to N eigenvectors Aα, which we can choose to be orthonormal,

Aα ·Aβ = δαβ. (3.5.27)

For each eigenvalue ω2α, there are two independent solutions,

q+α (t) = Aαe

iωαt and q−α (t) = Aαe−iωαt. (3.5.28)

For N eigenvalues, this takes the total number of linearly independent solutions to 2N , proving thatwe have the complete solution.

The frequencies ωα are real because the eigenvalues ω2α have to be non-negative: If we had ω2

α <0, then for the corresponding eigenvector Aα and small ε we would have

V (εAα) =1

2ε2Aα · k ·Aα =

1

2ε2ω2

α < 0, (3.5.29)

meaning that q = 0 could not be the minimum of the potential as we assumed.The individual solutions q±α are complex, so physical solutions have to be real linear combinations

of them,

qα(t) = a+q+α (t) + a−q−α (t) =

(a+e

iωαt + a−e−iωαt)Aα

= [(a+ + a−) cosωαt+ i(a+ − a−) sinωαt] Aα. (3.5.30)

383838


Because we want a real solution, the coefficients a1 ≡ (a+ + a−) and a2 = i(a+ − a−) have to bereal, and we can equally well write

qα(t) = (a1 cosωαt+ a2 sinωαt) Aα, (3.5.31)

which we can also write a pure cosine term with a phase shift,

qα(t) = c cos(ωαt+ φ)Aα, (3.5.32)

where c and φ are real constants that have to be determined from the initial conditions.Finally, the general solution is a linear combination of solutions of the form (3.5.32),

q(t) =

N∑α=1

cα cos(ωαt+ φα)Aα. (3.5.33)

The modes of vibration of the system, i.e., the individual solutions (3.5.32) are known as normalmodes. For a forced system there are resonances at the frequencies of the normal modes.

It is often useful to use the normal modes to define a set of generalised coordinates known as thenormal coordinates, which we denote by qα. They are defined by expressing the original coordinatesqi in terms of the eigenvectors Aα as

q =∑α

qαAα. (3.5.34)

Substituting this to Eq. (3.5.20) we find that the Lagrangian becomes simply

L =∑α

(1

2˙q2α −

1

2ω2αq

2α

). (3.5.35)

The Euler-Lagrange equations are¨qα + ω2

αqα = 0, (3.5.36)

which means that each normal coordinate qα oscillates independently of all others with its own normalfrequency ωα.

3.6 Continuous Systems

In addition to mechanical systems consisting of a finite number of degrees of freedom, one is oftenalso interested in continuous systems, for example waves propagation in continuous media or fieldtheories describing particle physics or electromagnetism.

To see how continuous systems are described in the Lagrangian formulation, consider a stretchedstring. We assume that in equilibrium the string is stretched to length `0 and has tension k. Thedisplacement of the string from its equilibrium position is given by the continuous function y(x, t),where x ∈ 0, `0. To describe the time and space derivatives, we use the notation

y =∂y

∂t, y′ =

∂y

∂x. (3.6.1)

The Lagrangian is still given by the difference of the kinetic and potential energies, L = T − V .The kinetic energy can be calculated by considering an infinitesimal segment of length dx. The mass

393939


of such a segment is dm = µdx where the constant µ is the mass per unit length. The velocity of thesegment is simply y, and therefore the kinetic energy of the infinitesimal segment is

dT =1

2µdx y2. (3.6.2)

Integrating over the who distance `0, we find the total kinetic energy

T =

∫ `0

0dx

1

2µ y2. (3.6.3)

The potential energy is V of the string is due to its tension k,

V = k(`− `0), (3.6.4)

where ` is the length of the displaced string. Again, this can be calculated by considering an infinites-imal segment of length dx. According to Pythagoras theorem, the length of the segment is

d` =√dx2 + dy2 =

√1 + y′2dx ≈

(1 +

1

2y′2)dx, (3.6.5)

where we have assumed that the displacement is small and smooth so that y′ 1, and Taylor ex-panded to quadratic order. The length of the string is then obtained by summing over all the infinites-imal segments

` =

∫ `0

0dx

(1 +

1

2y′2)

= `0 +

∫ `0

0dx

1

2y′2, (3.6.6)

and therefore the potential energy is

V =

∫ `0

0dx

1

2ky′2. (3.6.7)

We can now write the whole Lagrangian,

L = T − V =

∫ `0

0dx

(1

2µy2 − 1

2ky′2

). (3.6.8)

The integrand is called the Lagrangian density and denoted by L, i.e., L =∫dxL, where

L =1

2µy2 − 1

2ky′2. (3.6.9)

The action is then given by an integral over both time and space,

S =

∫dt

∫dxL(y, y′, y). (3.6.10)

Variation of the action is

δS =

∫dt

∫dx

[∂L∂y

δy +∂L∂y′

δy′ +∂L∂y

δy

]=

∫dt

∫dx

[− d

dt

∂L∂y− d

dx

∂L∂y′

+∂L∂y

]δy.

(3.6.11)The action principle δS = 0 therefore leads to the Euler-Lagrange equation

d

dt

∂L∂y

+d

dx

∂L∂y′− ∂L∂y

= 0. (3.6.12)

Substituting the Lagrangian density (3.6.9) for the string, we find the equation of motiond

dt(µy) +

d

dx

(−ky′

)= µy − ky′′ = 0, (3.6.13)

which is the wave equation, as one would expect.

404040

Chapter 4

Hamiltonian Mechanics

4.1 Hamilton’s Equations1

In the Lagrangian formulation of mechanics, the state of the system is descibed by a set of N gener-alised coordinates qi and and their time derivatives qi. For brevity we shall write it in the form L(q, q)and let the q and q stand for the whole set.

In Eq. (3.2.11) we defined the generalised momentum pi(q, q) = ∂L/∂qi. The basic idea ofthe Hamiltonian formulation of mechanics is to use coordinates and momenta (q, p) rather than co-ordinates and velocities (q, q) to parameterise the system. The space of momenta and coordinatesis known as the phase space, so the Hamiltonian describes the evolution in phase space. Althoughthe two formulations are mathematically and physically equivalent, the Hamiltonian formulation pro-vides new insight especially when moving to quantum mechanics which is usually described in theHamiltonian formulation.

To move from velocities to momenta, one has to invert Eq. (3.2.11) to find the velocities in termsof coordinates and momenta,

qi = qi(q, p). (4.1.1)

Having done this, we define the Hamiltonian function H by Legendre transformation

H(q, p) =N∑i=1

piqi(q, p)− L (q, q(q, p)) . (4.1.2)

Note that H is a function of coordinates qi and momenta pi, whereas the Lagrangian L is a functionof coordinates qi and velocities qi. This is analogous to transformations between different thermody-namic potentials in statistical physics. We will see that we can obtain the equations of motion of thesystem from the Hamiltonian, and therefore it determines the dynamics of the system.

As a simple example, consider a particle in a one-dimensional potential V (x). The Lagrangian is

L− 12mx

2 − V (x). (4.1.3)

In this case, the generalised momentum p is just the usual momentum

p =∂L

∂x= mx, (4.1.4)


41

Advanced Classical Physics, Autumn 2013 Hamiltonian MechanicsAdvanced Classical Physics, Autumn 2013 Hamiltonian MechanicsAdvanced Classical Physics, Autumn 2013 Hamiltonian Mechanics

which we can easily invert to find x = p/m. Therefore the Hamiltonian is

H(x, p) = px− L(x, x) =p2

m− p2

2m+ V (x) =

p2

2m+ V (x). (4.1.5)

This Hamiltonian is, of course, familiar from quantum mechanics.In order to see how the equations of motion arise from the Hamiltonian, let us calculate its deriva-

tives with respect to momenta and coordinates. Using Eqs. (4.1.2) and (3.2.11), we find

∂H

∂pj= qj +

∑i

pi∂qi∂pj−∑i

∂L

∂qi

∂qi∂pj

= qj , (4.1.6)

and similarly for the derivative with respect to the coordinate qj ,

∂H

∂qj=∑i

pi∂qi∂qj− ∂L

∂qj−∑i

∂L

∂qi

∂qi∂qj

= − ∂L∂qj

= − d

dt

∂L

∂qj= −pj , (4.1.7)

where we used the Euler-Lagrange equation (3.2.1) in the last step. These results are known as Hamil-ton’s equations,

qi =∂H

∂pi,

pi = −∂H∂qi

. (4.1.8)

Together, they determine the evolution of coordinates and momenta, and they are therefore the equa-tions of motion in the Hamiltonian formulation.

We can also calculate the time derivative of the Hamiltonian,

dH

dt=

∂H

∂t+∑i

∂H

∂qiqi +

∑i

∂H

∂pipi

=∂H

∂t+∑i

(∂H

∂qi

∂H

∂pi− ∂H

∂pi

∂H

∂qi

)=∂H

∂t. (4.1.9)

In particular this means that if the Hamiltonian has no explicit time dependence (i.e. ∂H/∂t = 0),then the Hamiltonian is conserved.

To understand better the physical meaning of the Hamiltonian, let us assume that the system isnatural (see Section 3.5.1), so that the kinetic energy can be written as

T =1

2

∑ij

aij qiqj . (4.1.10)

The Lagrangian is L = T − V , and therefore the momenta are

pi =∂L

∂qi=∑j

aij qj . (4.1.11)

Using this, we can write the Hamiltonian as

H =∑i

piqi−L =∑ij

aij qiqj −1

2

∑ij

aij qiqj +V (q) =1

2

∑ij

aij qiqj +V (q) = T +V. (4.1.12)

424242


-3 -2 -1 1 2 3x

-3

-2

-1

1

2

3

p

Figure 4.1: Phase space trajectories of the harmonic oscillator.

-3 -2 -1 1 2 3θ

-3

-2

-1

1

2

3

p

Figure 4.2: Phase space trajectories of a non-linear pendulum.

This demonstrates that the Hamiltonian is nothing but the total energy of the system, and which alsoexplains why it is conserved.

As a simple example, let us consider a particle in a harmonic potential. The Hamiltonian is

H =p2

2m+

1

2kx2. (4.1.13)

Hamilton’s equations are

x =∂H

∂p=

p

m,

p = −∂H∂x

= −kx. (4.1.14)

The evolution of the system can be desribed by a trajectory in phase space (x, p). In the harmoniccase, the solutions are ellipses (see Fig. 4.1). Note that because Hamilton’s equations are first-order,the phase space trajectories cannot cross, but they can have fixed points at which the derivatives vanish.One can often deduce the qualitative properties of the solutions by simply finding the fixed points andconsidering the behaviour near them. In the Harmonic case, there is one fixed point at x = p = 0.

434343


A less trivial example is a non-linear simple pendulum. The Lagrangian is

L =1

2mR2θ2 +mgR cos θ. (4.1.15)

The generalised momentum is

pθ =∂L

∂θ= mR2θ, (4.1.16)

from which we findθ =

pθmR2

. (4.1.17)

Therefore the Hamiltonian is

H = pθθ − L =p2θ

mR2− 1

2

p2θ

mR2−mgR cos θ =

1

2

p2θ

mR2−mgR cos θ. (4.1.18)

Hamilton’s equations are

θ =∂H

∂pθ=

pθmR2

, pθ = −∂H∂θ

= −mgR sin θ. (4.1.19)

These equations have fixed points at pθ = 0 and θ = nπ where n is an arbitrary integer. Even integerscorrespond to the minimum-energy state in which the pendulum is pointing down, and the odd integersare unstable states in which the pendulum points exactly up. By considering the qualitative motionwe can draw a phase space diagram of the trajectories (Fig. 4.2) without having to actually solve theequations.

4.2 Poisson Brackets

Suppose now that we want to study the behaviour of some other quantity which depends on p and qsuch that its value changes as p and q change under control of the Hamiltonian. Let F (q, p, t) be sucha quantity. The total time derivative of F is

dF

dt=

∂F

∂t+

N∑i=1

(∂F

∂qiqi +

∂F

∂pipi

)

=∂F

∂t+

N∑i=1

(∂F

∂qi

∂H

∂pi− ∂F

∂pi

∂H

∂qi

)(4.2.1)

=∂F

∂t+ F,H

where the quantity

F,H =

N∑i=1

(∂F

∂qi

∂H

∂pi− ∂F

∂pi

∂H

∂qi

)(4.2.2)

is known as a Poisson bracket. Compare this with the equation for an operator in quantum mechanics(in the Heisenberg picture)

dF

dt=∂F

∂t+

1

i~[F , H] , (4.2.3)

where [F , H] = F H − HF is the commutator of F and H .

444444


Let us note some basic properties of Poisson brackets. They are antisymmetric, F,H =−H,F. If F has no explicit time–dependence, ∂F∂t = 0, and the Poisson bracket F,H is zero,then F is independent of time; F represents a conserved quantity. The Poisson brackets for coordi-nates and momenta are simply

qi, pj = δij qi, qj = 0 pi, pj = 0 , (4.2.4)

which resemble the canonical commutation relations in quantum mechanics. We can also write Hamil-ton’s equations in terms of Poisson brackets as

pi = pi, H qi = qi, H . (4.2.5)

4.3 Symmetries and Conservation Laws

We have already noted in Sec. 3.2 a relationship between symmetry and conservation laws. If theLagrangian does not depend on coordinate qi, then the corresponding momentum pi is conserved. Interms of the Hamiltonian, we have the corresponding result that if H does not depend on qi, then

pi = −∂H∂qi

= 0, (4.3.1)

so pi is conserved.So far, this result is only useful if we have chosen the coordinate system to reflect the symmetry.

For example, if we have a two-dimensional problem with rotation symmetry, and we express it inpolar coordinates (r, θ), the Hamiltonian does not depend on the polar angle θ, and we find that thecorresponding momentum is conserved.

However, if we had chosen to use Cartesian coordinates (x, y), we could not use the above result.In this case, the Hamiltonian would be

H =p2x + p2

y

2m+ V

(√x2 + y2

). (4.3.2)

In order to treat a situation like this, we can consider an infinitesimal rotation. When we rotate thesystem by an infinitesimal angle δθ, the coordinates change as(

xy

)→(x+ δxy + δy

), where

(δxδy

)=

(−yδθxδθ

), (4.3.3)

and correspondingly the momenta change as(pxpy

)→(px + δpxpy + δpy

), where

(δpxδpy

)=

(−pyδθpxδθ

). (4.3.4)

If we find that the Hamiltonian does not change under such a rotation, we say that the Hamiltonian isinvariant.

More generally, consider a transformation generated by some function G(q, p) under which thecoordinates and momenta change as

δqi =∂G

∂piδλ, δpi = −∂G

∂qiδλ, (4.3.5)

454545


where δλ is an infinitesimal parameter. For the above rotation, the generator is

G = xpy − ypx = Lz, (4.3.6)

i.e., the z component of the angular momentum.Under this transformation, the Hamiltonian changes as

dH

dλ=

∑i

(∂H

∂qi

∂qi∂λ

+∂H

∂pi

∂pi∂λ

)=

∑i

(∂H

∂qi

∂G

∂pi− ∂H

∂pi

∂G

∂qi

)= H,G. (4.3.7)

This means that the Hamiltonian is invariant under the transformation if H,G = 0. But if this isthe case, then G is conserved because then

dG

dt= G,H = 0. (4.3.8)

Therefore, we have found a more general form of Noether’s theorem: If the Hamiltonian is invariantunder a continuous transformation, then the generator G of the transformation is a conserved charge.

Familiar examples of this are rotations, for which the conserved charge is the angular momentumL as we saw above and translations, for which the conserved charge is momentum p. To see the latter,consider G = px. Then we have

δx =∂G

∂pxδλ = δλ, δpx = −∂G

∂xδλ = 0, (4.3.9)

so this is, indeed, the generator of translations.Conservation of energy (or, equivalently, the Hamiltonian H) can also be understood in this way.

Choosing G = H as the generator, the coordinates and momenta transform as

δqi =∂H

∂piδλ = qiδλ,

δpi = −∂H∂qi

δλ = piδλ. (4.3.10)

This means just time translation t → t + δλ, so energy conservation is a direct consequence of timetranslation invariance.

These are deep results because they mean that the conservation of angular momentum, momentumand energy, which we derived earlier using the equations of motion, and actually very general resultsand are valid whenever the Hamiltonian has the corresponding symmetries, irrespective of it preciseform.

4.4 Canonical Transformations

In addition to the coordinate transformations considered in the previous section, the Hamiltonianformulation allows even more freedom in how to parameterise the system. To see this, note that thetime evolution of a function F (q, p; t) depends only on the Poisson bracket (4.2.2)

dF

dt=∂F

∂t+ F,H. (4.4.1)

464646


Therefore, if one finds another set of coordinates Q and momenta P that gives the same Poissonbrackets, i.e.,

f, gQ,P = f, gq,p (4.4.2)

for every pair of functions f and g, then they will be equally valid variables to parameterise the system.A transformation

q → Q(q, p),

p → P (p, q), (4.4.3)

that takes the original coordinates q and momenta p to the new ones satisfying Eq. (4.4.2) is knowna canonical transformation. In fact, it turns out that it is enough to check that the Poisson brackets(4.2.4) for coordinates and momenta are unchanged under the transformation, i.e.,

Qi, Qjq,p = Pi, Pjq,p = 0, Qi, Pjq,p = δij . (4.4.4)

Any set of variables that satisfy these conditions are called canonical conjugates.As an example, consider the Hamiltonian

H =p2

2m+

1

2kx2 + cpx, (4.4.5)

where c is a constant. The last term, which is linear in p, makes this Hamiltonian unusual, but we canuse a canonical transformation to write it in a more standard form. To see how to do that, let us firstrearrange the terms to write the Hamiltonian as

H =(p+ cmx)2

2m+

1

2(k − c2m)x2. (4.4.6)

This suggests a transformationP = p+ cmx, Q = x. (4.4.7)

It is easy to check that they satisfy the conditions (4.4.4),

Q,Q = P, P = 0, Q,P =∂Q

∂x

∂P

∂p− ∂Q

∂p

∂P

∂x= 1× 1− 0 = 1. (4.4.8)

Therefore Q and P and canonical conjugates, and (4.4.7) is a canonical transformation. In terms of Qand P , the Hamiltonian is simply

H =P 2

2m+

1

2(k − c2m)Q2, (4.4.9)

which is just a standard harmonic oscillator. This illustrates how one can sometimes use canonicaltransformations to turn the Hamiltonian into a familiar form, for which the solution is already known.For example, one can always remove a linear term in p by the transformation

Q = q, P = p+ f(q). (4.4.10)

It is important to understand that canonical transformations generally mix coordinates and mo-menta, and therefore the distinction between them essentially disappears in the Hamiltonian formula-tion. Because of this, it is interesting to see what a canonical transformation means in the Lagrangianformulation, which uses only coordinates to describe the system. In general, the transformation

474747


changes the Lagrangian, so we denote the new Lagrangian by L′. The Hamiltonian is unchanged,so we have

H = pq − L = PQ− L′, (4.4.11)

from which we findL′ = PQ− pq + L. (4.4.12)

Considering, for example, the transformation (4.4.10), this is

L′ = L+ f(q)q. (4.4.13)

The extra term is just the total time derivative of the integral F (q) =∫ qf(q′)dq′,

L′ = L+dF

dt. (4.4.14)

Such a total derivative changes the action by a constant, and therefore does not affect the dynamics.

484848

Chapter 5

Electromagnetic Potentials

5.1 Vector Potential

The dynamics of the electromagnetic field is described by Maxwell’s equations,

∇ ·E =ρ

ε0, (Gauss’s law)

∇×E = −∂B

∂t, (Faraday’s law)

∇ ·B = 0, (magnetic Gauss’s law)

∇×B = µ0J + µ0ε0∂E

∂t. (Ampere’s law) (5.1.1)

You have learned in Electricity&Magnetism that in electrostatics, the electric field can be describedby the electric potential φ as

E = −∇φ . (5.1.2)

Written in terms of the potential, Gauss’s law ∇ ·E = ρ/ε0 becomes the Poisson’s equation

∇2φ = − ρε0. (5.1.3)

In a medium with εr 6= 1 these may be modified slightly but we shall stick to the simpler forms forthis discussion.

On the other hand, Eq. (5.1.2) is clearly not sufficient in time-dependent problems, which can beseen by considering Faraday’s law,

∇×E = −∂B

∂t. (5.1.4)

Using Eq. (5.1.2), we can write the curl of the electric field as

∇×E = −∇×∇φ , (5.1.5)

but this is vanishes because the curl of a gradient is identically zero. Therefore Eqs. (5.1.2) and (5.1.5)are incompatible.

To describe time-dependent situations, we introduce a vector potential A, which is related to theelectric and magnetic fields by

B = ∇×A,

E = −∂A

∂t−∇φ. (5.1.6)

49

Advanced Classical Physics, Autumn 2013 Electromagnetic PotentialsAdvanced Classical Physics, Autumn 2013 Electromagnetic PotentialsAdvanced Classical Physics, Autumn 2013 Electromagnetic Potentials

Let us see how Maxwell’s equations (5.1.1) appear in terms of φ and A. First, magnetic Gauss’slaw is trivally satisfied

∇ ·B = ∇ · (∇×A) = 0, (5.1.7)

because the divergence of a curl is identically zero.1

Faraday’s law is also automatically satisfied,

∇×E = ∇×(−∂A

∂t−∇φ

)= −∂∇×A

∂t−∇×∇φ = −∂B

∂t. (5.1.8)

Gauss’s law becomes

∇ ·E = −∂(∇ ·A)

∂t−∇2φ =

ρ

ε0. (5.1.9)

This is a non-trivial equation that the potentials A and φ have to satisfy. It is essentially Poisson’sequation (5.1.3) with an additional term.

Finally, Ampere’s law becomes

∇×B = ∇× (∇×A) = µ0J + µ0ε0∂E

∂t= µ0J− µ0ε0∇

∂φ

∂t− µ0ε0

∂2A

∂t2. (5.1.10)

Using µ0ε0 = 1/c2, and rearranging the terms, we obtain

1

c2

∂2A

∂t2= −∇× (∇×A)− 1

c2∇∂φ

∂t+ µ0J = ∇2A−∇

(∇ ·A +

1

c2

∂φ

∂t

)+ µ0J. (5.1.11)

This is basically the equation of motion for the vector potential A.Eq. (5.1.11) has the form of a wave equation with some additional terms, so we can try to look

for plane wave solutions. As an Ansatz, let us consider a plane wave polarised in the x direction andtravelling in the z direction at the speed of light

A = A0eik(z−ct)x,

φ = 0. (5.1.12)

Substituting this to Eq. (5.1.11), we find

1

c2

∂2A

∂t2−∇2A−∇

(∇ ·A− 1

c2

∂φ

∂t

)− µ0J =

1

c2

∂2A

∂t2− ∂2A

∂z2= −k2A + k2A = 0, (5.1.13)

so this satisfies Ampere’s law. Using Eq. (5.1.6), we find the magnetic and electric fields

B = ∇×A =

∣∣∣∣∣∣x y z∂∂x

∂∂y

∂∂z

Ax 0 0

∣∣∣∣∣∣ =∂Ax∂z

y = ikA0eik(z−ct)y,

E = −∂A

∂t−∇φ = ikcA0e

ik(z−ct)x. (5.1.14)

This is simply an electromagnetic wave travelling in the z direction.1In fact, it is still possible to write down a vector potential that describes magnetic charge (see Problem Sheet 7 or

Contemporary Physics 53 (2012) 195).

505050

http://www.tandfonline.com/doi/abs/10.1080/00107514.2012.685693


5.2 Gauge Invariance

The scalar and vector potential are not physically observable fields, only the electric and magneticfields are. You know already that adding a constant to the scalar potential φ doesn’t change theresulting electric field. More generally, we can ask whether we have more freedom to modify φ andA without changing the electric and magnetic fields.

To do this, consider adding functions of space and time to A and φ,

A → A +α(x, t),

φ → φ+ f(x, t). (5.2.1)

This would change the electric and magnetic fields by

B → ∇× (A +α) = ∇×A + ∇×α = B + ∇×α,

E → −∂(A +α)

∂t−∇(φ+ f) = E− ∂α

∂t−∇f. (5.2.2)

Therefore, B and E remain unchanged if

∇×α = 0, and∂α

∂t+ ∇f = 0. (5.2.3)

The Helmholtz theorem states that any vector field whose curl vanishes can be written as a gradientof a scalar, so we can write

α = ∇λ. (5.2.4)

The second condition in Eq. (5.2.3) then becomes

∇f = −∂α∂t

= −∇∂λ

∂t. (5.2.5)

This means that the physical fields B and E are invariant under gauge transformations

A → A + ∇λ,

φ → φ− ∂λ

∂t, (5.2.6)

where λ(x, t) is an arbitrary scalar function. This symmetry, which is known as gauge invarianceplays a very important role in particle physics, where an analogous gauge invariance determines theproperties of elementary particle interactions almost completely.

Note that while E and B are invariant under gauge transformations, Eqs. (5.1.9) and (5.1.11) arenot. This means that we can use a gauge transformation to make those equations simpler and easierto solve. This is known as fixing the gauge. For example, the divergence ∇ ·A is not gauge invariantbut transforms as

∇ ·A→∇ · (A + ∇λ) = ∇ ·A +∇2λ. (5.2.7)

Because we can always find a solution to ∇2λ = g for an arbitrary function g(x, t), we can use agauge transformation to fix ∇ ·A to any value we like.

One popular way to fix the gauge is the Coulomb gauge, in which ∇ ·A = 0. In this gauge thenon-trivial Maxwell equations (5.1.9) and (5.1.11) become

∇2φ = − ρε0,

1

c2

∂2A

∂t2= ∇2A− 1

c2∇∂φ∂t

+ µ0J. (5.2.8)

515151


The main benefit of this gauge is that the equations are simpler: The first equation is simply thefamiliar Poisson equation, and the second equation is a wave equation with a source term. However,the drawback is that the Poisson equation appears to violate causality because a change is the chargedistribution affects the scalar potential immediately at all distances. This is not a serious problembecause φ is not observable, and the observable fields E and B still behave causally.

From the point of view of relativity, a better choice is the Lorenz gauge2 defined by

∇ ·A +1

c2

∂φ

∂t= 0. (5.2.9)

In this gauge, the equations of motion are

1

c2

∂2φ

∂t2−∇2φ =

ρ

ε0,

1

c2

∂2A

∂t2−∇2A = µ0J. (5.2.10)

Now both φ and A satisfy wave equations, and therefore changes in the charge distribution propagateat the speed of light, satisfying causality.

A third gauge choice which is often useful is the Weyl gauge, which is also known as the temporalgauge. It is defined as φ = 0, which means that the only degree of freedom is A. In this gauge, theMaxwell equations become∂

∂t(∇ ·A) = − ρ

ε0,

1

c2

∂2A

∂t2= ∇2A−∇ (∇ ·A) + µ0J. (5.2.11)

5.3 Particle in an electromagnetic Field

Let us see how to incorporate the electromagnetic field into the Lagrangian and Hamiltonian methods.For the static electric field, this is straightforward, because one can desribe it by a standard potentialterm

V (r) = qφ(r). (5.3.1)

The Lagrangian would therefore be

L = T − V =1

2mr2 − qφ(r). (5.3.2)

However, this is clearly not invariant under gauge transformations (5.2.6). A gauge transformationλ(t, r) would change the Lagrangian into

L→ L′ =1

2mr2 − qφ(r) + q

∂λ

∂t= L+ q

∂λ

∂t. (5.3.3)

Adding such an extra term to the Lagrangian will change the physics, unless the extra term is a totaltime derivative, in which case it would only change the action by a constant. Our term q ∂λ∂t is not atotal time derivative, but could we modify the Lagrangian in such a way that it turns into

dλ

dt=∂λ

∂t+dx

dt

∂λ

∂x+dy

dt

∂λ

∂y+dz

dt

∂λ

∂z=∂λ

∂t+ r ·∇λ? (5.3.4)

2Even though the Lorenz gauge is invariant under Lorentz transformations, they are spelled differently. The Lorenzgauge (with no “t”) is named after Ludvig Lorenz, whereas Lorentz transformations (with “t”) are named after HendrikLorentz.

525252


Comparing with the form of the gauge transformation (5.2.6), we note that the vector potential changesby precisely ∇λ, which suggests the Lagrangian

L =1

2mr2 + qr ·A− qφ. (5.3.5)

Indeed, this Lagrangian changes under the gauge transformation (5.2.6) as

L→ L′ =1

2mr2 + qr ·A + qr ·∇λ− qφ(r) + q

∂λ

∂t= L+ q

dλ

dt. (5.3.6)

Now the Lagrangian (5.3.5) changes only by a total derivative, which does not affect the physics.We can now derive the Euler-Lagrangian equations from our Lagrangian (5.3.5). We do this for

the x coordinate, but it is trivial to generalise the result to the y and z coordinates. The Euler-Lagrangeequation is

d

dt

∂L

∂x− ∂L

∂x=

d

dt(mx+ qAx)− qr · ∂A

∂x+ q

∂φ

∂x

= mx+ q∂Ax∂t

+ q

(x∂Ax∂x

+ y∂Ax∂y

+ z∂Ax∂z

)−q(x∂Ax∂x

+ y∂Ay∂x

+ z∂Az∂x

)+ q

∂φ

∂x

= mx+ q

[y

(∂Ax∂y− ∂Ay

∂x

)+ z

(∂Ax∂z− ∂Az

∂x

)+ q

∂φ

∂x+∂Ax∂t

]= mx− q (yBz − zBy)− qEx = 0. (5.3.7)

Combining this with the y and z components and writing in the vector notation we have

mr = q (E + v ×B) , (5.3.8)

which is just the usual Lorentz force equation. Note that we were able to derive this by assuming onlythe static Coulomb force and gauge invariance. This gives a flavour of how gauge invariance can playa key role in determining the properties of elementary particle interactions.

From the Lagrangian (5.3.5) we can derive the canonical momentum

px =∂L

∂x= mx+ qAx, (5.3.9)

or in vector form p = mr + qA, and the Hamiltonian

H = p · r− L =(p− qA)2

2m+ qφ. (5.3.10)

Eqs. (5.3.5) and (5.3.10) show that one has to formulate electrodynamics in terms of the scalar andvector potentials in order to use Lagrangian or Hamiltonian mechanics. In contrast, the Newtonianformulation only deals with the physical electric and magnetic fields.

5.4 Retarded Potentials

Let us now adopt the Lorenz gauge (5.2.9), and ask what the potentials φ and A in the presence of atime-dependent charge distribution. We focus on the scalar potential, but because the equation of thevector potential has the same form, the results can be translated directly to it.

535353


Instead of actually solving Eqs. (5.2.10), I will present the solution and then show that it satisfiesEqs. (5.2.10). Let us start from the simplest case of a static point charge q at r′, for which the scalarpotential is

φ(r) =q

4πε0|r− r′|. (5.4.1)

It is easy to generalise this to a continuous but static charge distribution ρ(r),

φ(r) =1

4πε0

∫d3r′

ρ(r′)

|r− r′|. (5.4.2)

Because electromagnetic waves travel at the speed of light c, one can guess that if the chargedensity changes with time, the potential at point r and time t will depend on the charge at point r′ atthe earlier time known as the retarded time

t′ = t− 1

c|r− r′|. (5.4.3)

This suggests that the potential should be given by

φ(r, t) =1

4πε0

∫d3r′

ρ(r′, t− |r− r′|/c)|r− r′|

. (5.4.4)

This is known as the retarded potential.To check that Eq. (5.4.4) is valid, we have to show that it satisfies Eq. (5.2.10). To simplify the

notation we write R = r − r′ and R = |r − r′|, so that the retarded time is t′ = t − R/c. Now, wehave the Laplacian with respect to the coordinate r,

∇2φ =1

4πε0

∫d3r′∇2 ρ

R=

1

4πε0

∫d3r′

[1

R∇2ρ+ 2 (∇ρ) ·

(∇ 1

R

)+ ρ∇2 1

R

]. (5.4.5)

The charge density ρ(t′, r′) depends on r only through the retarded time t′ given by Eq. (5.4.3), so

∇ρ = ρ∇t′ = −1

cρ∇R. (5.4.6)

Taking the divergence of this, we obtain

∇2ρ = −1

c

[∇ρ ·∇R+ ρ∇2R

]=

1

c2ρ∇R ·∇R− 1

cρ∇2R. (5.4.7)

Using the identities from vector calculus,

∇R =R

R, ∇ 1

R= − R

R3, (5.4.8)

and∇2R =

2

R, ∇2 1

R= −4πδ(R), (5.4.9)

we can write these as∇ρ = −1

cρR

R, (5.4.10)

and∇2ρ =

1

c2ρ− 2

c

ρ

R, (5.4.11)

545454


and Eq. (5.4.5) as

∇2φ =1

4πε0

∫d3r′

[1

c2

ρ

R− 2

c

ρ

R2− 2

cρR

R·(− R

R3

)− 4πρδ(R)

]=

1

4πε0

∫d3r′

1

c2

ρ

R− ρ

ε0=

1

c2

∂2φ

∂t2− ρ

ε0. (5.4.12)

This shows that the retarded potential satisfies Eq. (5.2.10).Because the equation for A has the same form, we can also write down the corresponding solution

for the vector potential A

A(r, t) =µ0

4π

∫d3r′

J(r′, t− |r− r′|/c)|r− r′|

. (5.4.13)

The solutions (5.4.4 and 5.4.13) describe electromagnetic waves emitted by a changing charge andcurrent distribution travelling outwards at the speed of light. Note that they are not the most generalsolutions of Eq. (5.2.10). There is another solution, known as the advanced potential, which describeselectromagnetic waves travelling from infinity towards the source. This is, however, not useful formost practical applications.

As the first example of retarded potentials, consider an infinite straight wire on the z axis. Weswitch on a current at time t = 0, so that the current is given by

I(t) =

0 for t ≤ 0,I0 for t > 0.

Because the current is confined on the z axis, the current density is

J = I(t)δ(x)δ(y)z. (5.4.14)

There is no electric charge, so ρ = 0, and therefore φ = 0. The vector potential is, however,non-trivial. To simplify the calculation, we calculate it at point r = (x, y, 0). Because the setup istranslation invariant in the z direction, the same result applies at all z. Substituing Eq. (5.4.14) intoEq. (5.4.13), we find

A(t, r) =µ0z

4π

∫d3r′

I(t− |r− r′|/c)δ(x′)δ(y′)|r− r′|

,

=µ0z

4π

∫dz′

I(t−√r2 + z′2/c)√r2 + z′2

. (5.4.15)

We now note that the current (5.4.14) vanishes for z′2 > c2t2 − r2 and is I0 elsewhere, so we have

A(t, r) =µ0I0z

4π

∫ √c2t2−r2−√c2t2−r2

dz′1√

r2 + z′2=µ0I0z

2π

∫ √c2t2−r20

dz′1√

r2 + z′2

=µ0I0

2πlnct+

√c2t2 − r2

rz. (5.4.16)

From this we can calculate the electric and magnetic fields,

E = −∂A

∂t= −µ0I0

2π

c√c2t2 − r2

z,

B = ∇×A = −∂Az∂r

φ =µ0I0

2πr

ct√c2t2 − r2

φ. (5.4.17)

555555


At late times these approach

E → 0

B → µ0I0

2πrφ, (5.4.18)

which are the familiar results for a time-independent current.The second example we consider is the Hertzian dipole, which is an infinitesimally small oscillat-

ing electric dipole. The dipole consists of charge +q at a position δr = δzz and an opposite charge−qat −δr. It can be parameterised by the dipole moment p = 2qδzz. The dipole is oscillating such thatq(t) = q0 cos(ωt) so that the dipole moment may be written p(t) = p0 cos(ωt)z, where p0 = 2q0δz.

The scalar potential φ at point r in spherical coordinates can be obtained from Eq. (5.4.4) as

φ =1

4πε0

q0 cosω(t− |r− δr|/c)

|r− δr|− q0 cosω(t− |r + δr|/c)

|r + δr|

. (5.4.19)

Because we are assuming that δz = |δr| is small, we can simplify this by Taylor expanding it to linearorder in δz. First, because δz r, we expand

|r± δr| =

√(r± δr)2 =

√r2 ± 2r · δr + δr2 =

√r2 ± 2rδz cos θ + δz2

= = r ± δz cos θ +O(δz2), (5.4.20)

and1

|r± δr|=

1

r∓ δz cos θ

r2+O(δz2). (5.4.21)

These give us

φ =q0

4πε0

cosω

(t− r−δz cos θ

c

)− cosω

(t− r+δz cos θ

c

)r

+δz cos θcosω

(t− r−δz cos θ

c

)+ cosω

(t− r+δz cos θ

c

)r2

. (5.4.22)

The assumption δz c/ω allows us to Taylor expand the cosines. In the first term, the constantterm in the Taylor expansion cancels, and therefore we only obtain a linear term in δz, whereas thesecond term already contains a factor δz and therefore we only keep the constant term in the Taylorexpansion,

φ =q0

4πε0

−2ω δz cos θ

c sinω(t− r

c

)r

+ δz cos θ2 cosω

(t− r

c

)r2

=p0 cos θ

4πε0

1

r2cosω(t− r/c)− ω

rcsinω(t− r/c)

. (5.4.23)

We are only really interested in what happens far away from the dipole, in the region where r λor r c/ω. In that case we may ignore the term in r−2 in Eq. (5.4.23) and write

φ = − p0ω

4πε0c

(cos θ

r

)sinω(t− r/c) . (5.4.24)

When the dipole oscillates, there is a current

I(t) =dq

dt= −q0ω sinωt (5.4.25)

565656


along the line element between the two charges. The current density is

J(t, r) =

I(t)δ(x)δ(y)z if − δz < z < δz,0 otherwise.

(5.4.26)

Using Eq. (5.4.13) we find the vector potential

A(t, r) =µ0z

4π

∫ δz

−δz

I(t− |r− z′z|/c)|r− z′z|

≈ −2µ0δzz

4πω

sinω(t− r/c)r

= −µ0p0

4πω

sinω(t− r/c)r

z . (5.4.27)

From Eqs. (5.4.24) and (5.4.27), we can calculate the magnetic and electric fields. In sphericalcoordinates, we find

B = ∇×A =

(1

r

∂

∂r(rAθ)−

1

r

∂Ar∂θ

)φ

= −µ0p0ω2

4πc

(sin θ

r

)cosω(t− r/c)φ (5.4.28)

E = −∂A

∂t−∇φ

= −µ0p0ω2

4π

(sin θ

r

)cosω(t− r/c)θ , (5.4.29)

where we have ignored all terms that fall faster than 1/r and used the fact that z = cos θr − sin θθ,which implies Ar = Az cos θ and Aθ = −Az sin θ.

Eq. (5.4.29) shows electromagnetic waves travelling radially away from the dipole. To find thepower radiated by the dipole, we calculate the Poynting vector S which given the energy flow throughunit area per unit time. We find

S =1

µ0E×B

=µ0

c

(p0ω

2

4π

sin θ

rcosω(t− r/c)

)2

r. (5.4.30)

This is oscillating, but we can average over one cycle of oscillation by noting that 〈cos2 ω(t−r/c)〉 =1/2,

〈S〉 =µ0p

20ω

4

32π2c

sin2 θ

r2r. (5.4.31)

The total power radiated is given by integrating Eq. (5.4.31) over the surface of a sphere

〈P 〉 =

∫da · 〈S〉 =

µ0p20ω

4

32π2c

∫sin2 θ

r2r2 sin θ dθ dφ =

µ0p20ω

4

12πc(5.4.32)

which is independent of r as expected from conservation of energy.

575757


5.5 Fields of a Moving Charge

Let us now consider a single moving charge, which follows some trajectory r0(t). The charge andcurrent densities are

ρ(t, r) = qδ(3)(r− r0(t)),

J(t, r) = qv0(t)δ(3)(r− r0(t)), (5.5.1)

where v0 = r0.To calculate the scalar potential from Eq. (5.4.4), we use a simple trick: We introduce a delta

function that enforces the condition (5.4.3),

φ(t, r) =1

4πε0

∫dt′ d3r′ δ

(t′ − [t− |r− r′|/c]

) ρ(t′, r′)

|r− r′|

=q

4πε0

∫dt′ d3r′ δ

(t′ − [t− |r− r′|/c]

) δ(r′ − r0(t′))

|r− r′|

=q

4πε0

∫dt′ δ

(t′ − [t− |r− r0(t′)|/c]

) 1

|r− r0(t′)|. (5.5.2)

We now have a delta function whose argument is a non-linear function of the integration variable t′.To deal with it, we use the general result that if function f(x) has a zero at x = x0, i.e., f(x0) = 0,then

δ(f(x)) =δ(x− x0)

|f ′(x0)|. (5.5.3)

In Eq. (5.5.2), the function is

f(t′) = t′ − t+1

c|r− r0(t′)|. (5.5.4)

Denoting the retarded time by tret, so that f(tret) = 0 or equivalently

|r− r0(tret)| = c(t− tret), (5.5.5)

we therefore have

φ(t, r) =q

4πε0

∫dt′

δ (t′ − tret)

|f ′(tret||r− r0(t′)|=

q

4πε0

1

|f ′(tret)||r− r0(tret)|. (5.5.6)

We need to calculate

f ′(tret) =∂

∂t′

(t′ − t+

1

c|r− r0(t′)|

)∣∣∣∣t′=tret

= 1− (r− r0(tret)) · v0(tret)

c|r− r0(tret)|. (5.5.7)

DenotingR = r− r0(tret), and vret = v0(tret), (5.5.8)

this is simply

f ′(tret) = 1− R · vret

cR. (5.5.9)

Substituting this to Eq. (5.5.6), we obtain the final result

φ (r, t) =q

4πε0

1

R−R · vret/c. (5.5.10)

585858


Note that Eqs. (5.5.5) and (5.5.8) imply that R = c(t − tret). Analogously, one finds the vectorpotential

A (r, t) =µ0q

4π

v0(tret)

R−R · vret/c. (5.5.11)

These forms for the potentials due to a moving charge are known as Lienard–Wiechert potentials.From the potentials (5.5.10) and (5.5.11), we can calculate the electric and magnetic fields. (For

details, see 10.3.2 in Griffiths). The result is

E(r, t) = −∂A

∂t−∇φ =

q

4πε0

1

(R−R · vret/c)3

R′

γ2ret

+1

c2R×

(R′ × vret

)(5.5.12)

B(r, t) = ∇×A =R×E

cR, (5.5.13)

where R′ = R− vretR/c is the position where the charge would be at time t if it had continued withthe velocity vret, and γret = (1− v2

ret/c2)−1/2 is the retarded γ factor.

This is the general solution, valid right up to v ∼ c, for an accelerating particle. The first termin Eq. (5.5.12) varies as R−2 and is independent of acceleration, whereas the second term varies asR−1 and depends on acceleration. The first term may be interpreted as a generalised Coulomb fieldwhereas the second term describes electromagentic waves radiated by the moving charge.

As our first example, let us consider a charge moving at constant velocity v, so that r0(t) = vt.In that case, Eq. (5.5.12) simplifies to

E(r, t) =q

4πε0

1

(R−R · vret/c)3

R′

γ2. (5.5.14)

For constant velocity, R′ = R−Rv/c = r− vt. To find the denominator, we calculate(R− 1

cR · v

)2

= R2 − 2

cRR · v +

1

c2(R · v)2 =

(1− v2

c2

)R

′2 +1

c2(R′ · v)2

= R′2

(1− v2

c2+v2

c2cos2 θ′

)= R

′2

(1− v2

c2sin2 θ′

), (5.5.15)

where θ′ is the angle between R′ and v. This shows that

R− 1

cR · v = R′

√1− v2

c2sin2 θ′. (5.5.16)

Substituting this to Eq. (5.5.14), we find the electric field of a charge moving at constant velocity,

E =q

4πε0

(1− v2

c2

)R′(

1− v2

c2sin2 θ′

)3/2R′3

. (5.5.17)

Note that the electric field points to the present position R′ of the charge, not to the retarded positionR as one might have expected. This makes sense from the point of view of special relativity, becausethe field of a moving charge should be just a Lorentz boost of the field of a stationary charge, whichnaturally points to the charge itself. We can also note that the field strength is reduced in forward andbackward directions, where sin θ′ ≈ 0, and enhanced in perpendicular directions where sin θ′ ≈ 1.

595959


We can also find the magnetic field using Eq. (5.5.13). Noting that R = R′ + v(t − tret) andE ‖ R′, we find

B =R×E

cR=

(t− tret)v ×E

cR=

v ×E

c2. (5.5.18)

Considering the moving charge as a Lorentz boost of a stationary charge, it is interesting that it has anon-zero magnetic field. This shows that electric and magnetic fields have to transform to each otherunder Lorentz boosts.

As another example, consider a charge moving non-relativistically so that v c. We can thensimplify the expressions in (5.5.12) and (5.5.13) such that R′ → R and R − R · v/c → R. Theelectric field becomes

E (r, t) =q

4πε0

1

R3

(R +

1

c2R× (R× v)

). (5.5.19)

The first term describes simply the Coulomb field moving with the charge, so if we are interested inthe power radiated by the charge, we can ignore it and only consider the second term, which describesradiation. The Poynting vector due to the radiation term is

S =1

µ0Erad ×B =

1

µ0cRErad × (R×Erad) =

1

µ0cRE2

radR =q2v2 sin2 θ

16π2ε0c3R2R, (5.5.20)

where θ is the angle between R and the acceleration v. The total power radiated into a sphere ofradius R is

P =

∫ π

0dθ sin θ

∫ 2π

0dφR2SR =

q2v2

6πε0c3, (5.5.21)

where the v here is evaluated at the retarded time.

606060

Chapter 6

Electrodynamics and Relativity

6.1 Four-Vectors

In the context of special and general relativity particularly, but also in other situations, it is often usefulto consider space and time as two aspects of the same quantity rather than as separate. To this end wecan write the coordinates of an event occurring at position r = (x, y, z) and time t as at the four-vectorposition (ct, x, y, z) in space-time, where the time component is written as a length ct, the distancelight travels in time t, so that it has the same units as the other components. Often this is rewrittenin the form (x0, x1, x2, x3), with superscript indices. The reason for this becomes clear soon. Thesesuperscript indices should not be confused with raising x to a power. They are usually denoted byGreek letters, e.g., xµ, where µ ∈ 0, 1, 2, 3. Abusing this notation slightly, we often also denotethe whole four-vector by xµ to make it clear that we are referring to a four-vector quantity. If that isobvious, we can also refer to the four-vector by simply x.

Consider now a Lorentz boost in x direction by velocity v. Writing γ = 1/√

1− v2/c2, the timeand space coordinates transform as

t′ = γ(t− vx/c2),

x′ = γ(x− vt),y′ = y,

z′ = z. (6.1.1)

Using the four-vector notation, this can be written as a matrix multiplicationx0

x1

x2

x3

′

=

γ −βγ 0 0−βγ γ 0 0

0 0 1 00 0 0 1

x0

x1

x2

x3

, (6.1.2)

where β = v/c. Denoting the transformation matrix by Λ, we can write the transformation in termsof the four-vector components as

x′µ

=∑ν

Λµνxν , (6.1.3)

where Λµν denotes the elements of the matrix Λ. As usual, the first index in Λµν refers to the rowand the second index to the column. The reason why we write the row index as superscript and thecolumn index as subscript will become clear soon.

61

Advanced Classical Physics, Autumn 2013 Electrodynamics and RelativityAdvanced Classical Physics, Autumn 2013 Electrodynamics and RelativityAdvanced Classical Physics, Autumn 2013 Electrodynamics and Relativity

A Lorentz transformation can be defined as one that leaves all space-time intervals

`2 = c2t2 − r2 = (x0)2 − (x1)2 − (x2)2 − (x3)2 (6.1.4)

unchanged. To express this in the four-vector notation, we define a 4× 4 matrix known as the metrictensor,

g =

1 0 0 00 −1 0 00 0 −1 00 0 0 −1

, (6.1.5)

whose components we denote by gµν . Note that the metric is symmetric, gνµ = gµν . More specifi-cally, this is known as the Minkowski metric to distinguish it from more general metrics that are usedin general relativity, and to emphasize that it is often also denoted by ηµν .

Using the metric tensor, the space-time interval can be written as

`2 =∑µ,ν

xµgµνxν . (6.1.6)

Note that in Eqs. (6.1.3) and (6.1.6) each Lorentz index appears once as a superscript and once asa subscript. From now on we will follow the Einstein convention, in which are Lorentz index thatappears once as a superscript and once as a subscript is summed over. We will see later that this isalmost always what we want, and in the exceptional cases when we do not want to sum over the index,we state that explicitly.

Using the Einstein convention, the transformation law (6.1.3) becomes

x′µ

= Λµνxν , (6.1.7)

and the expression for the space-time interval is

`2 = xµgµνxν . (6.1.8)

Tensors (i.e. linear relations between a number of four-vectors) can be represented conveniently inthis notation. For example, if four-vector xµ is related to four-vectors yµ, zµ and wµ through a linearrelation (i.e., a rank 4 tensor), this can be expressed as

xµ = Mµνρσy

νzρwσ. (6.1.9)

We can see that in this notation, a rank N tensor appears simply as an object with N Lorentz indices.In contrast, the matrix notation we used to represent the inertia tensor is only suitable for rank 2 tensor.

The component notation is also more flexible than the matrix notation. A product of two matrices(or rank 2 tensors) A and B, with components Aµν and Bµ

ν , can be written as

(A ·B)µν = AµρBρν . (6.1.10)

In matrix multiplication, the order of the factors matters, A ·B 6= B ·A, but in the component notationthe symbols represent matrix elements, which are real or complex numbers. Therefore we can changethe order of the factors freely, i.e., AµρBρ

ν = BρνA

µρ. The labelling of the indices keeps track of

how the tensors are multiplied. To actually compute numerical values, it it often convenient to switchto the matrix notation, and it is then important to write the matrices in the right order. Note also that

626262


you can choose freely which Greek letter you use for each summation index, but the same letter canonly be used ones in one expression (i.e. once as a superscript and once as a subscript).

Under this transformation, the space-time interval transforms as

`2 = xµgµνxν → x′

µgµνx

′ν = ΛµρxρgµνΛνσx

σ = xρΛµρgµνΛνσxσ = xµΛρµgρσΛσνx

ν , (6.1.11)

where, in the last step, we used the freedom to change the labelling of the summation indices andswapped µ ↔ ρ and ν ↔ σ. In order for the transformation to leave the space-time interval `2

invariant, Eq. (6.1.11) has to be equal to xµgµνxν , and this requires

ΛρµgρσΛσν = gµν . (6.1.12)

We can therefore use Eq. (6.1.12) as the definition of a Lorentz transformation.Using the metric tensor, we can also define a scalar product of two four-vectors x and y as

x · y = xµgµνyν = x0y0 − x1y1 − x2y2 − x3y3. (6.1.13)

This is also invariant under Lorentz transformations,

`2 = xµgµνyν → x′

µgµνy

′ν = ΛµρxρgµνΛνσy

σ = xµΛρµgρσΛσνyν = xµgµνy

ν . (6.1.14)

To simplify the notation further, we define a covariant vector xµ by

xµ = gµνxν = (x0,−x1,−x2,−x3), (6.1.15)

and indicate it by using a subscript index. We say that we use the metric to lower the index. Theoriginal position four-vector xµ with a superscript index is called a contravariant vector. For example,the scalar product (6.1.13) is then simply

x · y = xµyµ. (6.1.16)

To raise the index, i.e., turn a covariant vector back to a contravariant one, we need the inverse g−1

of the metric tensor, so thatxµ = (g−1)µνxν . (6.1.17)

(Note that we use superscript indices to be consistent with the Einstein convention.) Eq. (6.1.17) isequivalent to saying that it is the inverse matrix of gµν , defined in the usual way by

(g−1)µνgνρ = δµρ , (6.1.18)

where

δµρ =

1 0 0 00 1 0 00 0 1 00 0 0 1

(6.1.19)

is the 4× 4 unit matrix.For the Minkowski metric (6.1.5) it is easy to find the inverse, and it turns out to be the same

matrix as the metric itself,

(g−1)µν =

1 0 0 00 −1 0 00 0 −1 00 0 0 −1

= gµν , (6.1.20)

636363


but this is not the case in general relativity. In any case, using the definition of the inverse metric(6.1.18), we can see that it satisfies

gµρgνσ(g−1)ρσ = gµν . (6.1.21)

This means that when we lower the indices of the inverse metric (g−1)µν in the same way as inEq. (6.1.15) we obtain the original metric gµν . We can therefore think of the inverse metric (g−1)µν

as simply the contravariant counterpart of the covariant metric gµν . In particular, this means that thereis no need to indicate the inverse metric by “−1” and we can simply write

(g−1)µν = gµν (6.1.22)

without any risk of confusion. Then Eq. (6.1.18) becomes

gµνgνρ = δµρ , (6.1.23)

and the expression for raising the index (6.1.17) simplifies to

xµ = gµνxν . (6.1.24)

We can treat all Lorentz indices in this way, using gµν to lower a contravariant superscript indexto a covariant subscript, and gµν to raise a covariant subscript index to a contravariant superscript. Inparticular, if we multiply both sides of Eq. (6.1.12) by gλµ, we find

gλµΛρµgρσΛσν = gλµgµν = δλν . (6.1.25)

Comparing this with the definition of the inverse Lorentz transformation Λ−1, which takes the systemback from the boosted to the original frame,

(Λ−1)λσΛσν = δλν , (6.1.26)

we find that(Λ−1)λσ = gλµΛρµgρσ ≡ Λσ

λ. (6.1.27)

We can also derive the transformation law for covariant vectors,

x′µ = gµνx′ν = gµνΛνρx

ρ = gµνΛνρgρλxλ = Λµ

λxλ. (6.1.28)

Comparing with Eq. (6.1.27) we see that the transformation matrix for the covariant vectors is theinverse of the contravariant transformation matrix.

Besides the position four-vector xµ, there are other quantities that transform in the same wayunder Lorentz transformations and can therefore be naturally written as four-vectors. These include

• The four-velocity uµ, defined as

uµ =dxµ

dτ, (6.1.29)

where τ is the proper time is the time measured by the observer moving along the trajectordefined by xµ. It is defined as c2dτ2 = c2dt2 − dx2 − dy2 − dz2, or dt = γdτ . It follows that

uµ = (γc, γvx, γvy, γvz). (6.1.30)

646464


• Four-momentumpµ = muµ = (E/c, px, py, pz). (6.1.31)

• Four-current density

jµ = nquµ = (γnqc, γnqvx, γnqvy, γnqvz) = (ρc, jx, jy, jz). (6.1.32)

These all transform as contravariant vectors, i.e., u′µ = Λµνuν etc., although of course we can always

lower the index with the metric to turn them into the covariant form, uµ = gµνuν , when it is more

convenient.For an example of a four-vector that is more natural to think of as a covariant vector, consider a

scalar function f(x) of spacetime, and its derivative with respect to the contravariant position vectorxµ. Using the chain rule of derivatives, and the inverse Lorentz transformation xν = (Λ−1)νµx

′µ =Λµ

νx′µ, we find∂f(x)

∂x′µ=∑ν

∂xν

∂x′µ∂f(x)

∂xν= Λµ

ν ∂f(x)

∂xν. (6.1.33)

Comparing with Eq. (6.1.28), we see that a derivative with respect to a contravariant vector transformsas a covariant vector. Therefore we use the notation

∂µ ≡∂

∂xµ, (6.1.34)

to make this explicit. In this notation, Eq. (6.1.33) becomes

∂′µf = Λµν∂νf. (6.1.35)

Similarly, a derivative with respect to a covariant vector transforms as a contravariant vector, andtherefore we write

∂µ ≡ ∂

∂xµ. (6.1.36)

Remember that a tensor is a linear relationship between two or more vectors. In special relativitywe expect that the same linear relationship remains valid in all reference frames, in which case itis called a Lorentz tensor. Consider, for example, the rank 4 tensor Mµ

νρσ in Eq. (6.1.9). Lorentzboosting the right-hand-side, we find

x′µ

= Λµλxλ = ΛµλM

λνρσy

νzρwσ = ΛµλMλνρσΛα

νy′αΛβ

ρz′βΛγ

σw′γ, (6.1.37)

where in the last step we used the inverse Lorentz transformation. We want to be able to write this as

x′µ

= M ′µαβγy

′αz′βw′γ, (6.1.38)

which means that the boosted tensor has to be

M ′µαβγ = ΛµλΛα

νΛβρΛγ

σMλνρσ (6.1.39)

We can see that each superscript index transforms with the contravariant transformation matrix, andeach subscript index with the covariant transformation matrix, just like in four-vectors.

Whenever an index is summed over (contracted) according to the Einstein convention, the sumis Lorentz invariant, so summed indices can be ignored when doing Lorentz transformations. Forexample,

M ′µµβγ = ΛµλΛµ

νΛβρΛγ

σMλνρσ = δνλΛβ

ρΛγσMλ

νρσ = ΛβρΛγ

σMµµρσ (6.1.40)

where we used the property (6.1.25). This shows why the Einstein convention is so useful in specialrelativity: Because the laws of nature are supposed to be the same in all inertial frames, pairs of indicesshould only appear in this Lorentz invariant form.

656565


6.2 Relativistic Electrodynamics

Historically, electrodynamics played a key role in the development of the theory of relativity, andelectrodynamics appears much more elegent in a fully relativistic formulation. However, it is notentirely trivial to write electric and magnetic fields in a four-vector form. For example, we saw inSection 5.5 that electric and magnetic fields have to somehow transform to each other under Lorentztransformations, so they cannot be two separate four-vectors.

To derive the relativistic formulation, we start from the expression for the Lorentz force,

dp

dt= F = q(E + v ×B). (6.2.1)

It follows directly that the derivative with respect to the proper time is

dp

dτ= γ

dp

dt= q

(u0

cE + u×B

), (6.2.2)

where u = γv is the spatial part of the four-velocity uµ.The time derivative of the energy of the particle is

dE

dt= F · v = qE · v, (6.2.3)

from which we obtain the derivative with respect to the proper time as

dE

dτ= γ

dE

dt= qE · u. (6.2.4)

We can now combine Eqs. (6.2.2) and (6.2.4) into the proper time derivative of the four-momentumpµ = (E/c,p),

dpµ

dτ=

d

dτ

E/cpxpypz

= q

0 Ex/c Ey/c Ez/c

Ex/c 0 Bz −ByEy/c −Bz 0 BxEz/c By −Bx 0

u0

u1

u2

u3

. (6.2.5)

The matrix appearing in this expression is called the Faraday tensor or the field-strength tensor anddenoted by Fµν , so we can write more compactly the relativistic Lorentz force equation as

dpµ

dτ= qFµνu

ν . (6.2.6)

The Faraday tensor is often written with two contravariant indices as

Fµν = Fµρgρν =

0 −Ex/c −Ey/c −Ez/c

Ex/c 0 −Bz ByEy/c Bz 0 −BxEz/c −By Bx 0

. (6.2.7)

The Lorentz force equation (6.2.6) then becomes

dpµ

dτ= qFµνuν . (6.2.8)

666666


Note that this tensor is antisymmetric, F νµ = −Fµν .In order for the right-hand-side of Eq. (6.2.6) to be Lorentz contravariant, Fµν has to transform as

a contravariant rank 2 Lorentz tensor,

F ′µν

= ΛµρΛνσF

ρσ. (6.2.9)

This tells us how the electric and magnetic fields must transform. For example, considering a boost inz direction,

Λµρ =

γ 0 0 −βγ0 1 0 00 0 1 0−βγ 0 0 γ

, (6.2.10)

we find that the electric and magnetic fields transform as

E′x = γ (Ex − vBy) ,E′y = γ (Ey + vBx) ,

E′z = Ez ,(6.2.11)

B′x = γ(Bx + vEy/c

2),

B′y = γ(By − vEx/c2

),

B′z = Bz .(6.2.12)

More generally, we can write

E′‖ = E‖,

B′‖ = B‖,

E′⊥ = γ (E⊥ + v ×B) ,B′⊥ = γ

(B⊥ − v ×E/c2

)= γ (B⊥ − µ0ε0v ×E) ,

(6.2.13)

where ‖ refers to the component parallel to the boost velocity, and⊥ to the perpendicular components.If B = 0, then

B′ = −v ×E′

c2, (6.2.14)

in agreement with Eq. (5.5.18) (note the opposite sign of v).

6.3 Maxwell’s Equations

Now that we have combined the electric and magnetic fields into one Lorentz tensor Fµν , we want towrite Maxwell’s equations (5.1.1) in terms of it. We start by noting that, according to Eq. (6.2.7), theelectric and magnetic fields are given by

E = (cF 10, cF 20, cF 30),

B = (F 32, F 13, F 21). (6.3.1)

We now write Gauss’s law as

∇ ·E− ρ

ε0=

∂Ex∂x

+∂Ey∂y

+∂Ez∂z− ρ

ε0= c

(∂F 10

∂x+∂F 20

∂y+∂F 30

∂z− ρ

cε0

)= c

(∂1F

10 + ∂2F20 + ∂3F

30 − ρ

cε0

)= c

(∂µF

µ0 − ρ

cε0

). (6.3.2)

676767


Therefore we have∂µF

µ0 =ρ

cε0= µ0cρ. (6.3.3)

To deal with Ampere’s law,

∇×B− µ0J− µ0ε0∂E

∂t= 0 (6.3.4)

we write the x component

∂Bz∂y− ∂By

∂z− µ0jx − µ0ε0

∂Ex∂t

=∂F 21

∂y− ∂F 13

∂z− µ0jx −

1

c

∂F 10

∂t

= ∂0F01 + ∂2F

21 + ∂3F31 − µ0jx

= ∂µFµ1 − µ0jx = 0, (6.3.5)

so we have∂µF

µ1 = µ0jx, (6.3.6)

and similarly for the other components. We can now write Eqs. (6.3.3) and (6.3.6) as one equation interms of the four-current jµ = (cρ, j),

∂µFµν = µ0j

ν . (6.3.7)

The magnetic Gauss’s law reads

∇ ·B =∂Bx∂x

+∂By∂y

+∂Bz∂z

=∂F 32

∂x+∂F 13

∂y+∂F 21

∂z= ∂1F

32 + ∂2F13 + ∂3F

21

= ∂1F 23 + ∂2F 31 + ∂3F 12 = 0. (6.3.8)

For Faraday’s law

∇×E +∂B

∂t= 0, (6.3.9)

we again take the x component,

∂Bx∂t

+∂Ez∂y− ∂Ey

∂z=

∂F 32

∂t+ c

∂F 30

∂y− c∂F

20

∂z= c

(∂0F

32 − ∂2F03 − ∂3F

20)

= c(∂0F 32 + ∂2F 03 + ∂3F 20

)= 0 (6.3.10)

By comparing Eqs. (6.3.8) and (6.3.10), we note that we can combine them into one equation

∂µF νρ + ∂νF ρµ + ∂ρFµν = 0. (6.3.11)

Thus, we have found that in the four-vector notation, the four Maxwell’s equations (5.1.1) can beexpressed by just two equations (6.3.7) and (6.3.11).

686868


6.4 Four-vector Potential

In Section 5.1, we saw that we can express the electric and magnetic fields using the scalar and vectorpotentials, and that reduced tha number of non-trivial Maxwell’s equations from four to two. We willnow see that we can do the same to the Faraday tensor Fµν , and that way reduce Maxwell’s equationsto just one.

Using Eq. (5.1.6), we can write

F 10 =Exc

= −1

c

(∂Ax∂t

+∂φ

∂x

)= −∂0Ax − ∂1

(φ

c

)= ∂1

(φ

c

)− ∂0Ax, (6.4.1)

andF 21 = Bz =

∂Ay∂x− ∂Ax

∂y= ∂1Ay − ∂2Ax = ∂2Ax − ∂1Ay. (6.4.2)

By defining the four-vector potential Aµ = (φ/c,A), we can write these two equations as

F 10 = ∂1A0 − ∂0A1

F 21 = ∂2A1 − ∂1A2. (6.4.3)

Similar relations apply to other components of Fµν , so we can combine them into one equation

Fµν = ∂µAν − ∂νAµ. (6.4.4)

Using this equation, we can write the Faraday tensor in terms of the four-vector potential. Finally, wewant to express Maxwell’s equations (6.3.7) and (6.3.11) in terms of Aµ. Eq. (6.3.7) becomes

∂µFµν = ∂µ∂

µAν − ∂ν∂µAµ = µ0jν . (6.4.5)

For Eq. (6.3.11), we find that

∂µF νρ + ∂νF ρµ + ∂ρFµν = ∂µ∂νAρ − ∂µ∂ρAν + ∂ν∂ρAµ − ∂ν∂µAρ + ∂ρ∂µAν − ∂ρ∂νAµ = 0(6.4.6)

identically, because each term appears twice with opposite signs. Therefore, Eq. (6.3.11) is automat-ically satisfied when the Faraday tensor is expressed in terms of the four-vector potential. The onlynon-trivial equation is therefore Eq. (6.4.5).

6.5 Lagrangian for Electrodynamics

Finally, let us see how we can describe electrodynamics in the Lagrangian formulation. Because theelectromagnetic fields are continuous fields, we need to find a Lagrangian density L as defined inSection 3.6, and we will express it in terms of the four-vector potential Aµ.

We know that electrodynamics is invariant under both gauge and Lorentz transformations. There-fore the Lagrangian density L has to be a Lorentz scalar, and to be gauge invariant it can only dependon the four-vector potential through the Faraday tensor Fµν . It also makes sense to demand that theLagrangian density should contain only first time derivatives and they should appear only in quadraticform. This corresponds to a “natural” system as defined in Section 3.5.1, and it ensures that theEuler-Lagrange equations have the familiar form. The expression that satisfies these requirements isFµνFµν , and therefore we are led to consider a Lagrangian density of the form

L = aFµνFµν , (6.5.1)

696969


where a is some constant. Actually, the numerical value of a does not matter because it will drop outthe Euler-Lagrange equation, but we will see later that the sign should be negative. It is conventionalto choose a = −1/4, so that we have

L = −1

4FµνFµν . (6.5.2)

In terms of the four-vector potential, this becomes

L = −1

4(∂µAν − ∂νAµ)(∂µAν − ∂νAµ) = −1

2∂µAν(∂µAν − ∂νAµ). (6.5.3)

In order to derive the Euler-Lagrange equation, we first write Eq. (3.6.12) in a four-vector form,

∂µ∂L

∂(∂µy)− ∂L∂y

= 0, (6.5.4)

and generalise it to the current case by replacing y with Aν . Because the Lagrangian L depends onlyon its derivatives, we find the equation

∂µ∂L

∂(∂µAν)= 0. (6.5.5)

To avoid any confusion about the derivative in this expression, it is best to lower all the indices inEq. (6.5.3) and write it in the form

L = −1

2gκρgλσ∂κAλ(∂ρAσ − ∂σAρ). (6.5.6)

Then the derivative is easy to take by noting that it is non-zero only if the Lorentz indices match, thatis,

∂ (∂κAλ)

∂ (∂µAν)= δµκδ

νλ. (6.5.7)

We find∂L

∂(∂µAν)= −1

2gκρgλσ

[δµκδ

νλ(∂ρAσ − ∂σAρ) + ∂κAλ

(δµρ δ

νσ − δµσδνρ

)]= −1

2

[gµρgνσ(∂ρAσ − ∂σAρ) + ∂κAλ

(gµκgνλ − gµλgνκ

)]= −1

2[∂µAν − ∂νAµ + ∂µAν − ∂νAµ] = −Fµν , (6.5.8)

and therefore the Euler-Lagrange equation (6.5.5) is

∂µ∂L

∂(∂µAν)= −∂µFµν = 0. (6.5.9)

Which is exactly the Maxwell equation (6.3.7) in vacuum, i.e., with jµ = 0. Because the otherMaxwell equation (6.3.11) is satisfied identically when using the four-vector potential Aµ, we haveshown that the laws of electrodynamics in vacuum are correctly described by the Lagrangian (6.5.3)which we obtained by assuming essentially only gauge and Lorentz invariance. This demonstrateshow powerful symmetry considerations can be in physics, and in fact the properties of the otherfundamental interactions (strong and weak nuclear force, and gravity) are also determined by theircorresponding gauge invariances.

707070

Documents

Advanced Classical Physics, Autumn 2013 - Imperial · PDF fileAdvanced Classical Physics, Autumn 2013 Lecture Notes ... (3rd Edition), Goldstein, Poole & Safko ... Relativity and Electromagnetism