Physical Mathematics:What Physicists and Engineers Need to Know
Kevin Cahill
Department of Physics and Astronomy
University of New Mexico, Albuquerque, NM 87131-1156
Copyright © 2004–2010 Kevin Cahill
For Marie, Mike, Sean, Peter, Mia, and James, and
in honor of Muntader al-Zaidi and Julian Assange.
Contents
Preface
1. Linear Algebra
2. Fourier Series
3. Fourier and Laplace Transforms
4. Infinite Series
5. Complex-Variable Theory
6. Differential Equations
7. Integral Equations
8. Legendre Polynomials
9. Bessel Functions
10. Group Theory
11. Tensors and Local Symmetries
12. Forms
13. Probability and Statistics
14. Monte Carlos
15. Chaos and Fractals
16. Functional Derivatives
17. Path Integrals
18. The Renormalization Group
19. Finance
20. Strings
Preface
A word to students: You will find lots of physical examples crammed in
amongst the mathematics of this book. Don't let them bother you. As you
master the mathematics, you will learn some of the physics by osmosis, just
as people learn a foreign language by living in a foreign country.
This book has two goals. One is to teach mathematics in the context of
physics. Students of physics and engineering can learn both physics and
mathematics when they study mathematics with the help of physical exam-
ples and problems. The other goal is to explain succinctly those concepts of
mathematics that are simple and that help one understand physics. Linear
dependence and analyticity are simple and helpful. Elaborate convergence
tests for infinite series and exhaustive lists of the properties of special func-
tions are not. This mathematical triage does not always work: Whitney's
embedding theorem is helpful but not simple.
The book is intended to support a one- or two-semester course for graduate
students and advanced undergraduates. One could teach the first seven,
eight, or nine chapters in the first semester, and the other chapters in the
second semester.
Several friends and colleagues, especially Bernard Becker, Steven Boyd,
Robert Burckel, Colston Chandler, Vageli Coutsias, David Dunlap, Daniel
Finley, Franco Giuliani, Igor Gorelov, Dinesh Loomba, Michael Malik, Sud-
hakar Prasad, Randy Reeder, and Dmitri Sergatskov have given me valuable
advice.
The students in the courses in which I have developed this book have
improved it by asking questions, contributing ideas, suggesting topics, and
correcting mistakes. I am particularly grateful to Mss. Marie Cahill and
Toby Tolley and to Messrs. Chris Cesare, Robert Cordwell, Amo-Kwao
Godwin, Aram Gragossian, Aaron Hankin, Tyler Keating, Joshua Koch,
Ravi Raghunathan, Akash Rakholia, and Daniel Young for
ideas and questions, and to Mss. Tiffany Hayes and Sheng Liu and Messrs.
Thomas Beechem, Charles Cherqui, Aaron Hankin, Ben Oliker, Boleszek
Osinski, Ravi Raghunathan, Christopher Vergien, Zhou Yang, and Daniel
Zirzow for pointing out several typos.
1 Linear Algebra
1.1 Numbers
The natural numbers are the positive integers, with or without zero. Rational
numbers are ratios of integers. An irrational number x is one whose decimal
digits d_n in

x = Σ_{n=m_x}^∞ d_n 10^{−n}   (1.1)

do not repeat. Thus, the repeating decimals 1/2 = 0.50000... and
1/3 = 0.33333... are rational, while π = 3.141592654... is not. Incidentally,
decimal arithmetic was invented in India over 1500 years ago but was not
widely adopted in Europe until the seventeenth century.
The real numbers R include the rational numbers and the irrational numbers;
they correspond to all the points on an infinite line called the real line.

The complex numbers C are the real numbers with one new number i
whose square is −1. A complex number z is a linear combination of a real
number x and a real multiple iy of i

z = x + iy.   (1.2)

Here x = Re z is said to be the real part of z, and y = Im z the imaginary
part. One adds complex numbers by adding their real and imaginary parts

z_1 + z_2 = x_1 + iy_1 + x_2 + iy_2 = x_1 + x_2 + i(y_1 + y_2).   (1.3)

Since i² = −1, the product of two complex numbers is

z_1 z_2 = (x_1 + iy_1)(x_2 + iy_2) = x_1 x_2 − y_1 y_2 + i(x_1 y_2 + y_1 x_2).   (1.4)
The polar representation z = r exp(iθ) of a complex number z = x + iy is

z = x + iy = r e^{iθ} = r(cos θ + i sin θ)   (1.5)

in which r is the modulus of z

r = |z| = √(x² + y²)   (1.6)

and θ is its argument

θ = arctan(y/x).   (1.7)

Since exp(2πi) = 1, there is an inevitable ambiguity in the definition of the
argument of any complex number: the argument θ + 2πn gives the same z
as θ.

There are two common notations, z* and z̄, for the complex conjugate of
a complex number z = x + iy

z* = z̄ = x − iy.   (1.8)

The square of the modulus of a complex number z = x + iy is

|z|² = x² + y² = (x + iy)(x − iy) = z̄ z = z* z.   (1.9)

The inverse of a complex number z = x + iy is

z^{−1} = (x + iy)^{−1} = (x − iy) / [(x − iy)(x + iy)] = (x − iy)/(x² + y²) = z*/(z* z) = z*/|z|².   (1.10)
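These identities are easy to check numerically. A short Python sketch (an illustrative addition, not part of the original text; the sample value z = 3 + 4i is arbitrary):

```python
import cmath

z = 3.0 + 4.0j                      # an arbitrary complex number
r, theta = abs(z), cmath.phase(z)   # modulus (1.6) and argument (1.7)

# polar representation (1.5): z = r e^{i theta}
assert abs(r * cmath.exp(1j * theta) - z) < 1e-12

# squared modulus (1.9): |z|^2 = z* z
assert abs(z.conjugate() * z - r**2) < 1e-12

# inverse (1.10): 1/z = z* / |z|^2
assert abs(1 / z - z.conjugate() / r**2) < 1e-12
```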
Grassmann numbers θ_i are anti-commuting numbers, i.e., the anti-
commutator of any two Grassmann numbers vanishes

{θ_i, θ_j} ≡ [θ_i, θ_j]_+ ≡ θ_i θ_j + θ_j θ_i = 0.   (1.11)

In particular, the square of any Grassmann number is zero

θ_i² = 0.   (1.12)

One may show that any power series in N Grassmann numbers θ_i is a
polynomial whose highest term is proportional to the product θ_1 θ_2 ... θ_N.
For instance, the most complicated power series in two Grassmann numbers

f(θ_1, θ_2) = Σ_{n=0}^∞ Σ_{m=0}^∞ f_{nm} θ_1^n θ_2^m   (1.13)

is just

f(θ_1, θ_2) = f_0 + f_1 θ_1 + f_2 θ_2 + f_{12} θ_1 θ_2.   (1.14)
1.2 Arrays
An array is an ordered set of numbers. Arrays play big roles in computer
science, physics, and mathematics. They can be of any (integral) dimension.
A one-dimensional array (a1, a2, . . . , an) is variously called an n-tuple,
a row vector when written horizontally, a column vector when written
vertically, or an n-vector. The numbers ak are its entries or components.
A two-dimensional array a_{ik} with i running from 1 to n and k from 1 to
m is an n × m matrix. The numbers a_{ik} are called its entries, elements,
or matrix elements. One can think of a matrix as a stack of row vectors
or as a queue of column vectors. The entry a_{ik} is in the ith row and kth
column.
One can add together arrays of the same dimension and shape by adding
their entries. Two n-tuples add as

(a_1, ..., a_n) + (b_1, ..., b_n) = (a_1 + b_1, ..., a_n + b_n)   (1.15)

and two n × m matrices a and b add as

(a + b)_{ik} = a_{ik} + b_{ik}.   (1.16)

One can multiply arrays by numbers: thus z times the three-dimensional
array a_{ijk} is the array with entries z a_{ijk}.

One can multiply two arrays together no matter what their shapes and
dimensions. The outer product of an n-tuple a and an m-tuple b is an
n × m matrix with elements

(a b)_{ik} = a_i b_k   (1.17)

or an m × n matrix with entries (b a)_{ki} = b_k a_i. If a and b are complex, then
one also can form the outer products

(a* b)_{ik} = a_i* b_k   (1.18)

and (b* a)_{ki} = b_k* a_i. The (outer) product of a matrix a_{ik} and a three-
dimensional array b_{jℓm} is a five-dimensional array

(a b)_{ikjℓm} = a_{ik} b_{jℓm}.   (1.19)

An inner product is possible when two arrays are of the same size in
one of their dimensions. Thus the inner product (a, b) ≡ ⟨a|b⟩ or dot
product a · b of two real n-tuples a and b is

(a, b) = ⟨a|b⟩ = a · b = (a_1, ..., a_n) · (b_1, ..., b_n) = a_1 b_1 + ··· + a_n b_n.   (1.20)
The inner product of two complex n-tuples is defined as

(a, b) = ⟨a|b⟩ = a* · b = (a_1, ..., a_n)* · (b_1, ..., b_n) = a_1* b_1 + ··· + a_n* b_n   (1.21)

or as its complex conjugate

(a, b)* = ⟨a|b⟩* = (a* · b)* = (b, a) = ⟨b|a⟩ = b* · a   (1.22)

so that (a, a) ≥ 0.

The product of an m × n matrix a_{ik} times an n-tuple b_k is the m-tuple b′
whose ith component is

b′_i = a_{i1} b_1 + a_{i2} b_2 + ··· + a_{in} b_n = Σ_{k=1}^{n} a_{ik} b_k   (1.23)

or simply b′ = a b in matrix notation.

If the size of the second dimension of a matrix a matches that of the first
dimension of a matrix b, then their product a b is the matrix with entries

(a b)_{iℓ} = a_{i1} b_{1ℓ} + ··· + a_{in} b_{nℓ}.   (1.24)
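Equations (1.17–1.24) correspond directly to numpy operations. A brief sketch (an illustrative addition, with arbitrary sample arrays):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])            # a 3-tuple
b = np.array([4.0, 5.0])                 # a 2-tuple

outer = np.outer(a, b)                   # (a b)_{ik} = a_i b_k, Eq. (1.17)
assert outer.shape == (3, 2)
assert np.isclose(outer[1, 0], a[1] * b[0])

m = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])          # a 2 x 3 matrix
# matrix times n-tuple, Eq. (1.23): each component is a row dotted into a
assert np.allclose(m @ a, [m[0] @ a, m[1] @ a])

# complex inner product, Eq. (1.21): (u, v) = u* . v
u = np.array([1.0 + 1.0j, 2.0])
v = np.array([3.0, 1.0 - 1.0j])
assert np.isclose(np.vdot(u, v), np.conj(u) @ v)   # vdot conjugates its first argument
```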
1.3 Matrices
Apart from n-tuples, the most important arrays in linear algebra are the
two-dimensional arrays called matrices.
The trace of an n × n matrix a is the sum of its diagonal elements

Tr a = tr a = a_{11} + a_{22} + ··· + a_{nn} = Σ_{i=1}^{n} a_{ii}.   (1.25)

The trace of the product of two matrices is independent of their order

Tr(a b) = Σ_{i=1}^{n} Σ_{k=1}^{n} a_{ik} b_{ki} = Σ_{k=1}^{n} Σ_{i=1}^{n} b_{ki} a_{ik} = Tr(b a).   (1.26)

It follows that the trace is cyclic

Tr(a b ... z) = Tr(b ... z a).   (1.27)

(Here we take for granted that the elements of these matrices are ordinary
numbers that commute with each other.)
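The cyclic property (1.27) is easy to verify numerically; a quick sketch with random matrices (an illustrative addition, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = (rng.standard_normal((4, 4)) for _ in range(3))

# Tr(ab) = Tr(ba), Eq. (1.26)
assert np.isclose(np.trace(a @ b), np.trace(b @ a))

# cyclic, not arbitrary, permutations: Tr(abc) = Tr(bca) = Tr(cab), Eq. (1.27)
t = np.trace(a @ b @ c)
assert np.isclose(t, np.trace(b @ c @ a))
assert np.isclose(t, np.trace(c @ a @ b))
# note: Tr(acb) in general differs from Tr(abc)
```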
The transpose of an n × ℓ matrix a is the ℓ × n matrix aᵀ with entries

(aᵀ)_{ij} = a_{ji}.   (1.28)

Some mathematicians use a prime to mean transpose, as in a′ = aᵀ, but
physicists tend to use a prime to mean "different." One may show that

(a b)ᵀ = bᵀ aᵀ.   (1.29)

A matrix that is equal to its transpose

a = aᵀ   (1.30)

is symmetric.

The (hermitian) adjoint of a matrix is the complex conjugate of its trans-
pose (Charles Hermite, 1822–1901). That is, the (hermitian) adjoint a† of
an N × L complex matrix a is the L × N matrix with entries

(a†)_{ij} = (a_{ji})* = a*_{ji}.   (1.31)

One may show that

(a b)† = b† a†.   (1.32)

A matrix that is equal to its adjoint

(a†)_{ij} = (a_{ji})* = a*_{ji} = a_{ij}   (1.33)

(and which therefore must be a square matrix) is said to be hermitian or
self adjoint

a = a†.   (1.34)
Example: The three Pauli matrices

      ( 0  1 )         ( 0  −i )              ( 1   0 )
σ_1 = ( 1  0 ),  σ_2 = ( i   0 ),  and  σ_3 = ( 0  −1 )   (1.35)

are all hermitian (Wolfgang Pauli, 1900–1958). A real hermitian matrix is
symmetric. If a matrix a is hermitian, then the quadratic form

⟨v|a|v⟩ = Σ_{i=1}^{N} Σ_{j=1}^{N} v_i* a_{ij} v_j ∈ R   (1.36)

is real for all complex N-tuples v.
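One can check the hermiticity of the Pauli matrices and the reality of the quadratic form (1.36) numerically (an illustrative sketch; the sample vector v is arbitrary):

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

for s in (s1, s2, s3):
    assert np.allclose(s, s.conj().T)    # hermitian: sigma = sigma^dagger, Eq. (1.34)

v = np.array([1.0 + 2.0j, 0.5 - 1.0j])   # an arbitrary complex 2-tuple
for s in (s1, s2, s3):
    q = np.vdot(v, s @ v)                # <v|sigma|v>, Eq. (1.36)
    assert abs(q.imag) < 1e-12           # the quadratic form is real
```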
The Kronecker delta δ_{ik} is defined to be unity if i = k and zero if i ≠ k
(Leopold Kronecker, 1823–1891). In terms of it, the n × n identity matrix
I is the matrix with entries I_{ik} = δ_{ik}.

The inverse a⁻¹ of an n × n matrix a is a matrix that satisfies

a⁻¹ a = a a⁻¹ = I   (1.37)

in which I is the n × n identity matrix.

So far we have been writing n-tuples and matrices and their elements with
lower-case letters. It is equally common to use capital letters, and we will
do so for the rest of this section.
A matrix U whose adjoint U† is its inverse

U† U = U U† = I   (1.38)

is unitary. Unitary matrices are square.

A real unitary matrix O is orthogonal and obeys the rule

Oᵀ O = O Oᵀ = I.   (1.39)

Orthogonal matrices are square.

An N × N hermitian matrix A is said to be non-negative

A ≥ 0   (1.40)

if for all complex vectors V the quadratic form

⟨V|A|V⟩ = Σ_{i=1}^{N} Σ_{j=1}^{N} V_i* A_{ij} V_j ≥ 0   (1.41)

is non-negative. A similar rule

⟨V|A|V⟩ > 0   (1.42)

for all non-zero |V⟩ defines a positive or positive-definite matrix (A > 0),
although people often use these terms to describe non-negative matrices.
Examples: The non-symmetric, non-hermitian 2 × 2 matrix

(  1  1 )
( −1  1 )   (1.43)

is positive on the space of all real 2-vectors but not on the space of all
complex 2-vectors.

The 2 × 2 matrix

( 0  −1 )
( 1   0 )   (1.44)

provides a representation of i since

( 0  −1 ) ( 0  −1 )   ( −1   0 )
( 1   0 ) ( 1   0 ) = (  0  −1 ) = −I.   (1.45)

The 2 × 2 matrix

( 0  1 )
( 0  0 )   (1.46)

provides a representation of a Grassmann number since

( 0  1 ) ( 0  1 )   ( 0  0 )
( 0  0 ) ( 0  0 ) = ( 0  0 ) = 0.   (1.47)

To represent two Grassmann numbers one needs 4 × 4 matrices, such as

      ( 0  1  0  0 )             ( 0   0  0  0 )
θ_1 = ( 0  0  0  0 )  and  θ_2 = ( 0   0  0  0 )   (1.48)
      ( 0  0  0  1 )             ( 1   0  0  0 )
      ( 0  0  0  0 )             ( 0  −1  0  0 )
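These 4 × 4 matrices, with the sign conventions of Eq. (1.48), can be checked numerically (an illustrative sketch, not part of the original text):

```python
import numpy as np

t1 = np.array([[0, 1, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 0, 0]], dtype=float)
t2 = np.array([[0, 0, 0, 0],
               [0, 0, 0, 0],
               [1, 0, 0, 0],
               [0, -1, 0, 0]], dtype=float)

zero = np.zeros((4, 4))
assert np.allclose(t1 @ t1, zero)            # theta_1^2 = 0, Eq. (1.12)
assert np.allclose(t2 @ t2, zero)            # theta_2^2 = 0
assert np.allclose(t1 @ t2 + t2 @ t1, zero)  # {theta_1, theta_2} = 0, Eq. (1.11)
```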
1.4 Vectors
Vectors are things that can be multiplied by numbers and added together
to form other vectors in the same vector space. So if U and V are vectors
in a vector space S over a set F of numbers and x and y are numbers in F ,
then
W = xU + yV (1.49)
is also a vector in the vector space S.
A basis for a vector space S is a set of vectors B_k for k = 1 ... N in terms
of which every vector U in S can be expressed as a linear combination

U = u_1 B_1 + u_2 B_2 + ··· + u_N B_N   (1.50)

with numbers u_k in F. These numbers u_k are the components of the vector
U in the basis B_k.
Example: Suppose the vector W represents a certain kind of washer and
the vector N represents a certain kind of nail. Then if n and m are natural
numbers, the vector
H = nW +mN (1.51)
would represent a possible inventory of a very simple hardware store. The
vector space of all such vectors H would include all possible inventories of
the store. That space is a two-dimensional vector space over the natural
numbers, and the two vectors W and N form a basis for it.
Example: The complex numbers are a vector space. Two of its vectors
are the number 1 and the number i; the vector space of complex numbers is
then the set of all linear combinations

z = x·1 + y·i = x + iy.   (1.52)

So the complex numbers form a two-dimensional vector space over the real
numbers, and the vectors 1 and i form a basis for it.
The complex numbers also form a one-dimensional vector space over the
complex numbers. Here any non-zero real or complex number, for instance
the number 1, can be a basis consisting of the single vector 1. This one-
dimensional vector space is the set of all z = z·1 for arbitrary complex z.
Example: Ordinary flat two-dimensional space is the set of all linear
combinations

r = x x̂ + y ŷ   (1.53)

in which x and y are real numbers and x̂ and ŷ are perpendicular vectors of
unit length (unit vectors). This vector space, called R², is a 2-d space over
the reals.

Note that the same vector r can be described either by the basis vectors
x̂ and ŷ or by any other set of basis vectors, such as −ŷ and x̂

r = x x̂ + y ŷ = −y(−ŷ) + x x̂.   (1.54)

So the components of the vector r are (x, y) in the {x̂, ŷ} basis and (−y, x) in
the {−ŷ, x̂} basis. Each vector is unique, but its components depend
upon the basis.
Example: Ordinary flat three-dimensional space is the set of all linear
combinations

r = x x̂ + y ŷ + z ẑ   (1.55)

in which x, y, and z are real numbers. It is a 3-d space over the reals.

Example: Arrays of a given dimension and size can be added and multi-
plied by numbers, and so they form a vector space. For instance, all complex
three-dimensional arrays a_{ijk} in which 1 ≤ i ≤ 3, 1 ≤ j ≤ 4, and 1 ≤ k ≤ 5
form a vector space over the complex numbers.
Example: Derivatives are vectors; so are partial derivatives. For in-
stance, the linear combinations of x and y partial derivatives taken at
x = y = 0

a ∂/∂x + b ∂/∂y   (1.56)

form a vector space.

Example: The space of all linear combinations of a set of functions f_i(x)
defined on an interval [a, b]

f(x) = Σ_i z_i f_i(x)   (1.57)

is a vector space over the space of the numbers {z_i}.

Example: In quantum mechanics, a state is represented by a vector,
often written as ψ or in Dirac's notation as |ψ⟩. If c_1 and c_2 are complex
numbers, and |ψ_1⟩ and |ψ_2⟩ are any two states, then the linear combination

|ψ⟩ = c_1|ψ_1⟩ + c_2|ψ_2⟩   (1.58)

also is a possible state of the system.
1.5 Linear Operators
A linear operator A is a map that takes any vector U in its domain into
another vector U′ = A(U) ≡ AU in a way that is linear. So if U and V are
two vectors in the domain of the linear operator A and b and c are two real
or complex numbers, then

A(bU + cV) = bA(U) + cA(V) = bAU + cAV.   (1.59)

In the most important case, the operator A maps vectors in a vector space
S into vectors in the same space S. In this case, A maps each basis vector
B_i for the space S into a linear combination of these basis vectors B_k

A B_i = a_{1i} B_1 + a_{2i} B_2 + ··· + a_{Ni} B_N = Σ_{k=1}^{N} a_{ki} B_k.   (1.60)

The square matrix a_{ki} represents the linear operator A in the B_k basis.
The effect of A on any vector U = u_1 B_1 + u_2 B_2 + ··· + u_N B_N in S then is

A U = A( Σ_{i=1}^{N} u_i B_i ) = Σ_{i=1}^{N} u_i A B_i = Σ_{i=1}^{N} u_i Σ_{k=1}^{N} a_{ki} B_k
    = Σ_{k=1}^{N} ( Σ_{i=1}^{N} a_{ki} u_i ) B_k.   (1.61)

Thus the kth component u′_k of the vector U′ = AU is

u′_k = a_{k1} u_1 + a_{k2} u_2 + ··· + a_{kN} u_N = Σ_{i=1}^{N} a_{ki} u_i.   (1.62)

Thus the column vector u′ of the components u′_k of the vector U′ = AU
is the product u′ = a u of the matrix with elements a_{ki} that represents the
linear operator A in the B_k basis with the column vector with components
u_i that represents the vector U in that basis. So in a given basis, vectors
and linear operators can be identified with column vectors and matrices.
Each linear operator is unique, but its matrix depends upon the
basis. Suppose we change from the B_k basis to another basis B′_k

B_k = Σ_{ℓ=1}^{N} u_{ℓk} B′_ℓ   (1.63)

in which the N × N matrix u_{ℓk} has an inverse matrix u⁻¹_{ki} so that

Σ_{k=1}^{N} u⁻¹_{ki} B_k = Σ_{k=1}^{N} u⁻¹_{ki} Σ_{ℓ=1}^{N} u_{ℓk} B′_ℓ = Σ_{ℓ=1}^{N} ( Σ_{k=1}^{N} u_{ℓk} u⁻¹_{ki} ) B′_ℓ = Σ_{ℓ=1}^{N} δ_{ℓi} B′_ℓ = B′_i.
   (1.64)

Then the other basis vectors are given by

B′_i = Σ_{k=1}^{N} u⁻¹_{ki} B_k   (1.65)

and one may show (problem 3) that the action of the linear operator A on
this basis vector is

A B′_i = Σ_{j,k,ℓ=1}^{N} u_{ℓj} a_{jk} u⁻¹_{ki} B′_ℓ   (1.66)

which shows that the matrix a′ that represents A in the B′ basis is related
to the matrix a that represents it in the B basis by

a′_{ℓi} = Σ_{j,k=1}^{N} u_{ℓj} a_{jk} u⁻¹_{ki}   (1.67)

which in matrix notation is simply a′ = u a u⁻¹.

Example: Suppose the action of the linear operator A on the basis
{B_1, B_2} is A B_1 = B_2 and A B_2 = 0. If the column vectors

b_1 = ( 1 )  and  b_2 = ( 0 )   (1.68)
      ( 0 )             ( 1 )

represent the two basis vectors B_1 and B_2, then the matrix

a = ( 0  0 )
    ( 1  0 )   (1.69)

would represent the linear operator A. But if we let the column vectors

b′_1 = ( 1 )  and  b′_2 = ( 0 )   (1.70)
       ( 0 )              ( 1 )

represent the basis vectors

B′_1 = (1/√2)(B_1 + B_2)
B′_2 = (1/√2)(B_1 − B_2)   (1.71)

then the vectors

b_1 = (1/√2) ( 1 )  and  b_2 = (1/√2) (  1 )   (1.72)
             ( 1 )                    ( −1 )

would represent B_1 and B_2, and so the matrix

a′ = (1/2) (  1   1 )
           ( −1  −1 )   (1.73)

would represent the linear operator A.
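The change-of-basis rule a′ = u a u⁻¹ of Eq. (1.67) reproduces the matrix (1.73) of this example. A numerical sketch (an illustrative addition; here u is built from B_k = Σ_ℓ u_{ℓk} B′_ℓ, i.e., the old basis expanded in the new one):

```python
import numpy as np

a = np.array([[0.0, 0.0],
              [1.0, 0.0]])                  # A in the {B1, B2} basis, Eq. (1.69)

# u_{lk} expresses the old basis in the new one: B_k = sum_l u_{lk} B'_l
u = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2.0)

a_prime = u @ a @ np.linalg.inv(u)          # a' = u a u^{-1}, Eq. (1.67)
expected = 0.5 * np.array([[1.0, 1.0],
                           [-1.0, -1.0]])   # Eq. (1.73)
assert np.allclose(a_prime, expected)
```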
A linear operator A also may map a vector space S with basis B_k into a
different vector space T with a different basis C_k. In this case, A maps the
basis vector B_i into a linear combination of the basis vectors C_k

A B_i = Σ_{k=1}^{M} a_{ki} C_k   (1.74)

and an arbitrary vector U = u_1 B_1 + ··· + u_N B_N into

A U = Σ_{k=1}^{M} ( Σ_{i=1}^{N} a_{ki} u_i ) C_k.   (1.75)

We'll return to this point in Sections 1.14 and 1.15.
1.6 Inner Products
Most, but not all, of the vector spaces used by physicists have an inner
product. An inner product is a function that associates a number (f, g)
with every ordered pair of vectors f & g in the vector space in such a way
as to satisfy these rules:
(f, g) = (g, f) (1.76)(f, z1g1 + z2g2) = z1(f, g1) + z2(f, g2) (1.77)
(z1f1 + z2f2, g) = z1(f1, g) + z
2(f2, g) (1.78)
(f, f) 0 (1.79)in which the f s and gs are vectors and the zs are numbers. The first two
rules require that the inner product be linear in the second vector of the
12 Linear Algebra
pair and anti-linear in the first vector of the pair. (The third rule follows
from the first two.) If, in addition, the only vector f that has a vanishing
inner product with itself is the zero vector
(f, f) = 0 if and only if f = 0 (1.80)
then the inner product is hermitian or non degenerate; otherwise it is
semi-definite or degenerate.
The inner product of a vector f with itself is the square of the norm
|f | = f of the vector|f |2 = f 2= (f, f) (1.81)
and so by (1.79), the norm is well-defined as
f =
(f, f). (1.82)
The distance between two vectors f and g is the norm of their difference
‖f − g‖.   (1.83)

Example: The space of real vectors V with N components V_i forms an
N-dimensional vector space over the real numbers with inner product

(U, V) = Σ_{i=1}^{N} U_i V_i.   (1.84)

If the inner product (U, V) is zero, then the two vectors are orthogonal. If
(U, U) = 0, then

(U, U) = Σ_{i=1}^{N} U_i² = 0   (1.85)

which implies that all U_i = 0, so the vector U = 0. So this inner product is
hermitian or non-degenerate.

Example: The space of complex vectors V with N components V_i forms an
N-dimensional vector space over the complex numbers with inner product

(U, V) = Σ_{i=1}^{N} U_i* V_i.   (1.86)

If the inner product (U, V) is zero, then the two vectors are orthogonal. If
(U, U) = 0, then

(U, U) = Σ_{i=1}^{N} U_i* U_i = Σ_{i=1}^{N} |U_i|² = 0   (1.87)

which implies that all U_i = 0, and so the vector U is zero. So this inner
product is hermitian or non-degenerate.

Example: For the vector space of N × L complex matrices A, B, ..., the
trace of the product of the adjoint (1.31) of A times B is a natural inner
product

(A, B) = Tr A†B = Σ_{i=1}^{N} Σ_{j=1}^{L} (A†)_{ji} B_{ij} = Σ_{i=1}^{N} Σ_{j=1}^{L} A_{ij}* B_{ij}.   (1.88)

Note that (A, A) is positive

(A, A) = Tr A†A = Σ_{i=1}^{N} Σ_{j=1}^{L} A_{ij}* A_{ij} = Σ_{i=1}^{N} Σ_{j=1}^{L} |A_{ij}|² ≥ 0   (1.89)

and zero only when A = 0.
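The trace inner product (1.88–1.89) translates directly into numpy (an illustrative sketch with arbitrary random matrices, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
B = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))

ip = np.trace(A.conj().T @ B)                 # (A, B) = Tr A†B, Eq. (1.88)
assert np.isclose(ip, np.sum(A.conj() * B))   # = sum_ij A*_ij B_ij

norm2 = np.trace(A.conj().T @ A)              # (A, A), Eq. (1.89)
assert norm2.real >= 0 and abs(norm2.imag) < 1e-12
```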
Two examples of degenerate or semi-definite inner products are given in
the section (1.41) on correlation functions.

Mathematicians call a vector space with an inner product (1.76–1.79) an
inner-product space, a metric space, and a pre-Hilbert space.

A sequence of vectors f_n is a Cauchy sequence if for every ε > 0 there
is an integer N(ε) such that ‖f_n − f_m‖ < ε whenever both n and m exceed
N(ε). A sequence of vectors f_n converges to a vector f if for every ε > 0
there is an integer N(ε) such that ‖f − f_n‖ < ε whenever n exceeds N(ε). An
inner-product space with a norm defined as in (1.82) is complete if each of
its Cauchy sequences converges to a vector in that space. A Hilbert space
is a complete inner-product space. Every finite-dimensional inner-product
space is complete and so is a Hilbert space. But the term Hilbert space
more often is used to describe infinite-dimensional complete inner-product
spaces, such as the space of all square-integrable functions (David Hilbert,
1862–1943).

Example 1.1 (The Hilbert Space of Square-Integrable Functions) For the
vector space of functions (1.57), a natural inner product is

(f, g) = ∫_a^b dx f*(x) g(x).   (1.90)

The squared norm ‖f‖ of a function f(x) is

‖f‖² = ∫_a^b dx |f(x)|².   (1.91)

A function is said to be square integrable if its norm is finite. The space of
all square-integrable functions is an inner-product space; it is also complete
and so is a Hilbert space.
1.7 Schwarz Inequality
Since by (1.79) the inner product of a vector with itself cannot be negative,
it follows that for any vectors f and g and any complex number z = x+ iy
the inner product
P (x, y) = (f +zg, f+zg) = (f, f)+zz(g, g)+z(g, f)+z(f, g) 0 (1.92)is positive or zero. It even is non-negative at its minimum, which we may
find by differentiation
0 =P (x, y)
x=P (x, y)
y(1.93)
to be at
x = Re(f, g)/(g, g) & y = Im(f, g)/(g, g) (1.94)as long as (g, g) > 0. If we substitute these values into Eq.(1.92), then we
arrive at the relation
(f, f)(g, g) |(f, g)|2 (1.95)which is called variously the Cauchy-Schwarz inequality and the Schwarz
inequality. Equivalently
f g |(f, g)|. (1.96)If the inner product is degenerate and (g, g) = 0, then the non-negativity of
(f + zg, f + zg) implies that (f, g) = 0, in which case the Schwarz inequality
is trivially satisfied.
Example: For the dot-product of two real 3-vectors r & R, the Cauchy-
Schwarz inequality is
(r r) (R R) (r R)2 = (r r) (R R) cos2 (1.97)where is the angle between r and R.
Example: For two real n-vectors x and y, the Schwarz inequality is
(x x) (y y) (x y)2 = (x x) (y y) cos2 (1.98)and it implies (problem 5) that
x+ y x+ y. (1.99)
1.8 Linear Independence and Completeness 15
Example: For two complex n-vectors u and v, the Schwarz inequality is
(u u) (v v) |u v|2 = (u u) (v v) cos2 (1.100)and it implies (problem 6) that
u+ v u+ v. (1.101)Example: For the inner product (1.90) of two complex functions f and
g, the Schwarz inequality is badx |f(x)|2
badx |g(x)|2
badx f(x)g(x)
2 . (1.102)
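Both the Schwarz inequality (1.95) and the triangle inequalities (1.99) and (1.101) it implies are easy to test numerically (an illustrative sketch with arbitrary random complex n-vectors):

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.standard_normal(5) + 1j * rng.standard_normal(5)
v = rng.standard_normal(5) + 1j * rng.standard_normal(5)

inner = np.vdot(u, v)                        # (u, v) = u* . v

# Schwarz inequality (1.95): (u, u)(v, v) >= |(u, v)|^2
assert np.vdot(u, u).real * np.vdot(v, v).real >= abs(inner) ** 2

# triangle inequality (1.101): ||u + v|| <= ||u|| + ||v||
assert np.linalg.norm(u + v) <= np.linalg.norm(u) + np.linalg.norm(v)
```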
1.8 Linear Independence and Completeness
A set of N vectors Vi is linearly dependent if there exist numbers ci, not
all zero, such that the linear combination ciVi vanishes
Ni=1
ciVi = 0. (1.103)
A set of vectors Vi is linearly independent if it is not linearly dependent.
A set {Vi} of linearly independent vectors is maximal in a vector spaceS if the addition of any other vector V S to the set {Vi} makes the set{V, Vi} linearly dependent.
A set {Vi} of N linearly independent vectors that is maximal in a vectorspace S spans that space. For if V is any vector in S (and not one of the
vectors Vi), then the set {V, Vi} linearly dependent. Thus there are numbersc, ci, not all zero, that make the sum
cV +Ni=1
ciVi = 0 (1.104)
vanish. Now if c were 0, then the set {Vi} would be linearly dependent.Thus c 6= 0, and so we may divide by it and express the arbitrary vector Vas a linear combination of the vectors Vi
V = 1c
Ni=1
ciVi. (1.105)
So the set of vectors {Vi} spans the space S; it is a complete set of vectorsin the space S.
16 Linear Algebra
A set of vectors {Vi} that is complete in a vector space S is said to providea basis for that space because the set affords a way to expand an arbitrary
vector in S as a linear combination of the basis vectors {Vi}. If the vectorsof basis are linearly dependent, then at least one of them is superfluous; thus
it is convenient to have the vectors of a basis be linearly independent.
1.9 Dimension of a Vector Space
Suppose {Vi|i = 1 . . . N} and {Ui|i = 1 . . .M} are two sets of N and Mmaximally linearly independent vectors in a space S. Then N = M .
Suppose M < N . Since the U s are complete, as explained in Sec. 1.8, we
may express each of the N vectors Vi in terms of the M vectors Uj
Vi =Mj=1
AijUj . (1.106)
Let Aj be the vector with components Aij ; there are M < N such vectors,
and each has N > M components. So it is always possible to find a non-
zero N -dimensional vector C with components ci that is orthogonal to all
M vectors Aj :
Ni=1
ciAij = 0. (1.107)
But then the linear combination
Ni=1
ciVi =Ni=1
Mj=1
ciAij Uj = 0 (1.108)
vanishes, which would imply that the N vectors Vi were linearly dependent.
Since these vectors are by assumption linearly independent, it follows that
N M .Similarly, one may show that M N . Thus M = N .The number N of vectors in a maximal linearly independent set of a vector
space S is the dimension of the vector space. Any N linearly independent
vectors in an N -dimensional space forms a basis for it.
1.10 Orthonormal Vectors
Suppose the vectors {Vi|i = 1 . . . N} are linearly independent. Then we maymake out of them a set of N vectors Ui that are orthonormal
(Ui, Uj) = ij . (1.109)
1.10 Orthonormal Vectors 17
Procedure (Gramm-Schmidt): We set
U1 =V1
(V1, V1)(1.110)
So the first vector U1 is normalized.
Next we set
u2 = V2 + c12U1 (1.111)
and require that u2 be orthogonal to U1
0 = (U1, u2) = (U1, c12U1 + V2) = c12 + (U1, V2) (1.112)
whence c12 = (U1, V2), and sou2 = V2 (U1, V2)U1. (1.113)
The normalized vector U2 then is
U2 =u2
(u2, u2). (1.114)
Similarly, we set
u3 = V3 + c13U1 + c23U2 (1.115)
and ask that u3 be orthogonal both to U1
0 = (U1, u3) = (U1, c13U1 + c23U2 + V3) = c13 + (U1, V3) (1.116)
and to U2
0 = (U2, u3) = (U2, c13U1 + c23U2 + V3) = c23 + (U2, V3) (1.117)
whence ci3 = (Ui, V3) for i = 1 & 2, and sou3 = V3 (U1, V3)U1 (U2, V3)U2. (1.118)
The normalized vector U3 then is
U3 =u3
(u3, u3). (1.119)
We may continue in this way until we reach the last of the N linearly
independent vectors. We require the kth unnormalized vector uk
uk = Vk +k1i=1
cikUi. (1.120)
18 Linear Algebra
to be orthogonal to the k 1 vectors Ui and find that cik = (Ui, Vk) sothat
uk = Vk k1i=1
(Ui, Vk)Ui. (1.121)
The normalized vector then is
Uk =uk
(uk, uk). (1.122)
In general, a basis is more useful if it is composed of orthonormal vectors.
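The recursion (1.120–1.122) translates directly into code. A minimal sketch (an illustrative implementation for complex row vectors, not part of the original text):

```python
import numpy as np

def gram_schmidt(vs):
    """Orthonormalize linearly independent vectors via Eqs. (1.121-1.122)."""
    us = []
    for v in vs:
        # u_k = V_k - sum_i (U_i, V_k) U_i, Eq. (1.121)
        u = v - sum(np.vdot(w, v) * w for w in us)
        us.append(u / np.linalg.norm(u))   # U_k = u_k / sqrt((u_k, u_k)), Eq. (1.122)
    return np.array(us)

V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])            # three linearly independent vectors
U = gram_schmidt(V)
assert np.allclose(U @ U.conj().T, np.eye(3))   # (U_i, U_j) = delta_ij, Eq. (1.109)
```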
1.11 Outer Products
From any two vectors f and g, we may make an operator A that takes any
vector h into the vector f with coefficient (g, h)
Ah = f(g, h). (1.123)
It is easy to show that A is linear, that is that
A(zh+ we) = zAh+ wAe (1.124)
for any vectors e, h and numbers z, w.
Example: If f and g are vectors with components fi and gi, and h has
components hi, then the linear transformation is
(Ah)i =Nj=1
Aijhj = fi
Nj=1
gjhj (1.125)
so A is a matrix with entries
Aij = figj . (1.126)
The matrix A is the outer product of the vectors f and g.
1.12 Dirac Notation
Such outer products are important in quantum mechanics, and so Dirac
invented a notation for linear algebra that makes them easy to write. In his
notation, the outer product A of Eqs.(1.1231.126) is
A = |fg| (1.127)
1.12 Dirac Notation 19
and the inner product (g, h) is
(g, h) = g|h. (1.128)He called g| a bra and |h a ket, so that g|h is a bracket. In hisnotation Eq.(1.123) reads
A|h = |fg|h. (1.129)The new thing in Diracs notation is the bra f |. If the ket |f is repre-
sented by the vector
|f =
z1z2z3z4
(1.130)then the bra f | is represented by the adjoint of that vector
f | = (z1 , z2 , z3 , z4) . (1.131)In the standard notation, bras are implicit in the definition of the inner
product, but they do not appear explicitly.
In Diracs notation, the rules that a hermitian inner product (1.761.80)
satisfies are:
f |g = g|f (1.132)f |z1g1 + z2g2 = z1f |g1+ z2f |g2 (1.133)z1f1 + z2f2|g = z1f1|g+ z2f2|g (1.134)
f |f 0 (1.135)f |f = 0 if and only if f = 0. (1.136)
Usually, however, states in Dirac notation are labeled | or by their quan-tum numbers |n, l,m, and so one rarely sees plus signs or complex numbersor operators inside bras or kets. But one should.
Diracs notation allows us to write outer products clearly and simply.
Example: If the vectors f = |f and g = |g are
|f = ab
c
and |g = ( zw
)(1.137)
then their outer products are
|fg| =az awbz bwcz cw
and |gf | = (za zb zcwa wb wc
)(1.138)
20 Linear Algebra
as well as
|ff | =aa ab acba bb bcca cb cc
and |gg| = (zz zwwz ww
). (1.139)
Example: In Dirac notation, formula (1.121) is
|uk = |Vk k1i=1
|UiUi|Vk (1.140)
or
|uk =(I
k1i=1
|UiUi|)|Vk (1.141)
and (1.122) is
|Uk = |ukuk|uk . (1.142)
1.13 Identity Operators
Dirac notation provides a neat way of representing the identity operator I in
terms of a complete set of orthonormal vectors. First, in standard notation,
the expansion of an arbitrary vector f in a space S in terms of a complete
set of N orthonormal vectors ei
(ej , ei) = ij (1.143)
is
f =
Ni=1
ci ei (1.144)
from which we conclude that
(ej , f) = (ej ,
Ni=1
ciei) =
Ni=1
ci(ej , ei) =
Ni=1
ciij = cj (1.145)
whence
f =
Ni=1
(ei, f) ei =
Ni=1
ei (ei, f). (1.146)
The derivation stops here because there is no explicit expression for a bra.
But in Dirac notation, these equations read

⟨e_j|e_i⟩ = δ_{ij}   (1.147)

|f⟩ = Σ_{i=1}^{N} c_i |e_i⟩   (1.148)

⟨e_j|f⟩ = ⟨e_j| Σ_{i=1}^{N} c_i e_i⟩ = Σ_{i=1}^{N} c_i ⟨e_j|e_i⟩ = Σ_{i=1}^{N} c_i δ_{ij} = c_j   (1.149)

|f⟩ = Σ_{i=1}^{N} ⟨e_i|f⟩ |e_i⟩ = Σ_{i=1}^{N} |e_i⟩ ⟨e_i|f⟩.   (1.150)

We now rewrite the last equation as

|f⟩ = ( Σ_{i=1}^{N} |e_i⟩⟨e_i| ) |f⟩.   (1.152)

Since this equation holds for every vector |f⟩, the quantity inside the paren-
theses must be the identity operator

I = Σ_{i=1}^{N} |e_i⟩⟨e_i|.   (1.153)

Because one always may insert an identity operator anywhere, and because
the formula is true for every complete set of orthonormal vectors, the reso-
lution (1.153) of the identity operator is extremely useful.
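For any orthonormal basis, the resolution (1.153) can be checked as a sum of outer products. An illustrative numpy sketch (not part of the original text; it uses the eigenvectors of a random hermitian matrix as the orthonormal set):

```python
import numpy as np

rng = np.random.default_rng(3)
m = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
h = m + m.conj().T                    # a hermitian matrix
_, e = np.linalg.eigh(h)              # columns of e: orthonormal vectors |e_i>

# I = sum_i |e_i><e_i|, Eq. (1.153); each term is an outer product (1.127)
ident = sum(np.outer(e[:, i], e[:, i].conj()) for i in range(4))
assert np.allclose(ident, np.eye(4))
```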
By twice inserting the identity operator (1.153), one may convert a general
inner product (g, Af) = ⟨g|A|f⟩ into an expression involving a matrix A_{ij}
that represents the linear operator A

⟨g|A|f⟩ = ⟨g|I A I|f⟩ = Σ_{i,j=1}^{N} ⟨g|e_i⟩⟨e_i|A|e_j⟩⟨e_j|f⟩.   (1.154)

In the basis {|e_k⟩}, the matrix A_{ij} that represents the linear oper-
ator A is

A_{ij} = ⟨e_i|A|e_j⟩   (1.155)

and the components of the vectors |f⟩ and |g⟩ are

f_i = ⟨e_i|f⟩  and  g_i = ⟨e_i|g⟩.   (1.156)

In this basis, the inner product (g, Af) = ⟨g|A|f⟩ takes the form

⟨g|A|f⟩ = Σ_{i,j=1}^{N} g_i* A_{ij} f_j.   (1.157)
1.14 Vectors and Their Components
Usually, the components vk of a vector |v are the inner productsvk = k|v (1.158)
of the vector |v with a set of orthonormal basis vectors |k. Thus thecomponents vk of a vector |v depend on both the vector and the basis. Avector is independent of the basis used to compute its components,
but its components depend upon the chosen basis.
If the basis is orthonormal and so provides for the identity operator I the
expansion
I =
Nk=1
|kk| (1.159)
then the components vk of the vector |v are the coefficients in its expansionin terms of the basis vectors |k
|v = I|v =Nk=1
|kk|v =Nk=1
vk|k. (1.160)
1.15 Linear Operators and Their Matrices
A linear operator A maps vectors into vectors linearly as in Eq. (1.59)
A(bU + cV ) = bA(U) + cA(V ) = bAU + cAV. (1.161)
In the simplest and most important case, the linear operator A maps the
vectors of a vector space S into vectors in the same space S. If the space S
is N -dimensional, then it maps the vectors |i of any basis {|i} for S intovectors A|i |Ai that can be expanded in terms of the same basis {|k}
A|i = |Ai =Nk=1
Aki|k. (1.162)
The N N matrix with entries Aki represents the linear operator A
1.15 Linear Operators and Their Matrices 23
in the basis {|i}. Because A is linear, its action on an arbitrary vector|C = Ni=1Ci |i in S is
A|C = A(
Ni=1
Ci |i)
=
Ni=1
CiA |i =Nk=1
Ni=1
AkiCi |k. (1.163)
Thus the coefficients (AC)_k of the vector A|C⟩ ≡ |AC⟩ in the expansion
\[
A|C\rangle = |AC\rangle = \sum_{k=1}^{N} (AC)_k\,|k\rangle \tag{1.164}
\]
are given by the matrix multiplication of the vector C with elements C_i by the matrix A with entries A_{ki}
\[
(AC)_k = \sum_{i=1}^{N} A_{ki}\,C_i . \tag{1.165}
\]
Both the elements C_i of the vector C and the entries A_{ki} of the matrix A depend upon the basis {|i⟩} one chooses to use. If the vectors {|i⟩} are orthonormal, then the elements C_ℓ and A_{ℓi} are
\[
\langle \ell|C\rangle = \sum_{i=1}^{N} C_i\,\langle \ell|i\rangle = \sum_{i=1}^{N} C_i\,\delta_{\ell i} = C_\ell
\qquad
\langle \ell|A|i\rangle = \sum_{k=1}^{N} A_{ki}\,\langle \ell|k\rangle = \sum_{k=1}^{N} A_{ki}\,\delta_{\ell k} = A_{\ell i}. \tag{1.166}
\]
In the more general case, the linear operator A maps vectors in a vector space S into vectors in a different vector space S′. Now A maps an orthonormal basis {|i⟩} for S into vectors A|i⟩ that may be expanded in terms of an orthonormal basis {|k⟩} for S′
\[
A|i\rangle = \sum_{k=1}^{N'} A_{ki}\,|k\rangle . \tag{1.167}
\]
If the N vectors A|i⟩ are linearly independent, then N′ ≥ N, but if they are linearly dependent or if some of them are zero, then it may be that N′ < N. The elements A_{ℓi} of the matrix that represents the linear operator A now are
\[
\langle \ell|A|i\rangle = \sum_{k=1}^{N'} A_{ki}\,\langle \ell|k\rangle = \sum_{k=1}^{N'} A_{ki}\,\delta_{\ell k} = A_{\ell i}. \tag{1.168}
\]
They depend on both bases {|i⟩} and {|k⟩}. So although the linear operator is basis independent, the matrices that represent it vary with the chosen bases.
So far we have mostly been talking about linear operators that act on finite-dimensional vector spaces and that can be represented by matrices. But infinite-dimensional vector spaces and the linear operators that act on them play central roles in electrodynamics and quantum mechanics. For instance, the Hilbert space H of all wave functions ψ(x, t) that are square integrable over three-dimensional space at all times t is of (very) infinite dimension. An example in one space dimension of a linear operator that maps (a subspace of) H to H is the hamiltonian H for a non-relativistic particle of mass m in a potential V
\[
H = -\frac{\hbar^2}{2m}\,\frac{d^2}{dx^2} + V(x). \tag{1.169}
\]
It maps the state vector |ψ⟩ with components ⟨x|ψ⟩ = ψ(x) into the vector H|ψ⟩ with components
\[
\langle x|H|\psi\rangle = H\psi(x) = -\frac{\hbar^2}{2m}\,\frac{d^2\psi(x)}{dx^2} + V(x)\,\psi(x) \tag{1.170}
\]
where ℏ = 1.05 × 10⁻³⁴ J s. Translations in space and time
\[
U_T(a, b)\,\psi(x, t) = \psi(x + a,\, t + b) \tag{1.171}
\]
and rotations in space
\[
U_R(\theta)\,\psi(\mathbf{x}, t) = \psi(R(\theta)\,\mathbf{x},\, t) \tag{1.172}
\]
are also represented by linear operators acting on vector spaces of infinite dimension. As we'll see in what follows, these linear operators are unitary.

We may think of linear operators that act on vector spaces of infinite dimension as infinite-dimensional matrices or as matrices of continuously infinite dimension, the latter really being integral operators like
\[
H = \int dp \int dp'\; |p\rangle\langle p|H|p'\rangle\langle p'| . \tag{1.173}
\]
Thus we may carry over to spaces of infinite dimension most of our intuition about matrices, as long as we use common sense and keep in mind that infinite sums and integrals do not always converge to finite numbers.
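One way to carry this intuition in the other direction is to truncate an infinite-dimensional operator to a finite matrix. The sketch below (Python/numpy, not from the text) samples the hamiltonian (1.169) on a grid, replacing d²/dx² by the finite difference (ψ_{i+1} − 2ψ_i + ψ_{i−1})/Δx²; the harmonic potential and the units ℏ = m = 1 are assumptions made for illustration:

```python
import numpy as np

# Discretize H = -1/2 d^2/dx^2 + V(x) (units hbar = m = 1, an assumption)
# on n grid points; H then becomes an ordinary real symmetric matrix.
n, dx = 200, 0.1
x = dx * (np.arange(n) - n // 2)
V = 0.5 * x**2                                  # harmonic potential (example)
H = (-0.5 / dx**2) * (np.diag(np.ones(n - 1), 1)
                      + np.diag(np.ones(n - 1), -1)
                      - 2.0 * np.eye(n)) + np.diag(V)
E = np.linalg.eigvalsh(H)                       # H is real and symmetric
# The lowest eigenvalues approximate the oscillator energies 1/2, 3/2, ...
```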
1.16 Determinants

The determinant of a 2 × 2 matrix A is
\[
\det A = |A| = A_{11}A_{22} - A_{21}A_{12}. \tag{1.174}
\]
In terms of the antisymmetric matrix e_{ij} = −e_{ji} (which implies that e_{11} = e_{22} = 0) with e_{12} = 1, this determinant is
\[
\det A = \sum_{i=1}^{2}\sum_{j=1}^{2} e_{ij}\,A_{i1}A_{j2}. \tag{1.175}
\]
It's also true that
\[
e_{k\ell}\,\det A = \sum_{i=1}^{2}\sum_{j=1}^{2} e_{ij}\,A_{ik}A_{j\ell}. \tag{1.176}
\]
These definitions and results extend to any square matrix. If A is a 3 × 3 matrix, then its determinant is
\[
\det A = \sum_{ijk=1}^{3} e_{ijk}\,A_{i1}A_{j2}A_{k3} \tag{1.177}
\]
in which e_{ijk} is totally antisymmetric with e_{123} = 1 and the sums over i, j, and k run from 1 to 3. More explicitly, this determinant is
\[
\det A = \sum_{ijk=1}^{3} e_{ijk}\,A_{i1}A_{j2}A_{k3}
= \sum_{i=1}^{3} A_{i1} \sum_{jk=1}^{3} e_{ijk}\,A_{j2}A_{k3}
\]
\[
= A_{11}\,(A_{22}A_{33} - A_{32}A_{23}) + A_{21}\,(A_{32}A_{13} - A_{12}A_{33}) + A_{31}\,(A_{12}A_{23} - A_{22}A_{13}). \tag{1.178}
\]
This sum involves the 2 × 2 determinants of the matrices that result when we strike out column 1 and row i, which are called minors, multiplied by (−1)^{1+i}
\[
\det A = A_{11}(-1)^2\,(A_{22}A_{33} - A_{32}A_{23}) + A_{21}(-1)^3\,(A_{12}A_{33} - A_{32}A_{13}) + A_{31}(-1)^4\,(A_{12}A_{23} - A_{22}A_{13}) \tag{1.179}
\]
\[
= \sum_{i=1}^{3} A_{i1}\,C_{i1}. \tag{1.180}
\]
These minors multiplied by (−1)^{1+i} are called cofactors:
\[
C_{11} = A_{22}A_{33} - A_{23}A_{32}, \qquad
C_{21} = -(A_{12}A_{33} - A_{32}A_{13}), \qquad
C_{31} = A_{12}A_{23} - A_{22}A_{13}. \tag{1.181}
\]
This way of computing determinants is due to Laplace.
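Laplace's expansion along the first column translates directly into a short recursion. Here is a pure-Python sketch (the test matrices are made-up examples; this is for illustration only, since the method costs of order N! operations):

```python
# Laplace's cofactor expansion along the first column, as in (1.180):
# det A = sum_i A[i][0] * (-1)**i * M_i0, with M_i0 the minor obtained
# by striking out row i and column 0.
def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        minor = [row[1:] for k, row in enumerate(A) if k != i]
        total += (-1) ** i * A[i][0] * det(minor)
    return total

B = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
assert det(B) == -3              # agrees with the 3x3 formula (1.178)
assert det([[1, 2], [3, 4]]) == -2   # and with the 2x2 formula (1.174)
```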
Example: The determinant of a 3 × 3 matrix is the dot product of the vector of its first row with the cross-product of the vectors of its second and third rows:
\[
\begin{vmatrix} U_1 & U_2 & U_3 \\ V_1 & V_2 & V_3 \\ W_1 & W_2 & W_3 \end{vmatrix}
= \sum_{ijk=1}^{3} e_{ijk}\,U_i V_j W_k
= \sum_{i=1}^{3} U_i\,(V \times W)_i
= U \cdot (V \times W). \tag{1.182}
\]
Totally antisymmetric quantities e_{i_1 i_2 … i_N} with N indices and with e_{123…N} = 1 provide a definition of the determinant of an N × N matrix A as
\[
\det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} e_{i_1 i_2 \ldots i_N}\,A_{i_1 1} A_{i_2 2} \cdots A_{i_N N} \tag{1.183}
\]
in which the sums over i_1 … i_N run from 1 to N. The general form of Laplace's expansion of this determinant is
\[
\det A = \sum_{i=1}^{N} A_{ik}\,C_{ik} = \sum_{k=1}^{N} A_{ik}\,C_{ik} \tag{1.184}
\]
in which the first sum is over the row index i but not the (arbitrary) column index k, and the second sum is over the column index k but not the (arbitrary) row index i. The cofactor C_{ik} is (−1)^{i+k} M_{ik} in which the minor M_{ik} is the determinant of the (N−1) × (N−1) matrix A without its ith row and kth column.

Incidentally, it's also true that
\[
e_{k_1 k_2 \ldots k_N}\,\det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} e_{i_1 i_2 \ldots i_N}\,A_{i_1 k_1} A_{i_2 k_2} \cdots A_{i_N k_N}. \tag{1.185}
\]
The key feature of a determinant is that it is an antisymmetric combination of products of the elements A_{ik} of a matrix A. One implication of this antisymmetry is that the interchange of any two rows or any two columns changes the sign of the determinant. Another is that if one adds a multiple of one column to another column, for example a multiple x A_{i2} of column 2 to column 1, then the determinant
\[
\det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} e_{i_1 i_2 \ldots i_N}\,(A_{i_1 1} + x\,A_{i_1 2})\,A_{i_2 2} \cdots A_{i_N N} \tag{1.186}
\]
is unchanged. The reason is that the extra term Δ det A vanishes
\[
\Delta \det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} x\,e_{i_1 i_2 \ldots i_N}\,A_{i_1 2} A_{i_2 2} \cdots A_{i_N N} = 0 \tag{1.187}
\]
because it is proportional to a sum of products of a factor e_{i_1 i_2 … i_N} that is antisymmetric in i_1 and i_2 and a factor A_{i_1 2} A_{i_2 2} that is symmetric in these indices. For instance, when i_1 and i_2 are 5 & 7 and 7 & 5, the two terms cancel
\[
e_{57\ldots i_N}\,A_{52}A_{72}\cdots A_{i_N N} + e_{75\ldots i_N}\,A_{72}A_{52}\cdots A_{i_N N} = 0 \tag{1.188}
\]
because e_{57…i_N} = −e_{75…i_N}.

By repeated additions of x_2 A_{i2}, x_3 A_{i3}, etc. to A_{i1}, we can change the first column of the matrix A to a nearly arbitrary linear combination of all the columns
\[
A_{i1} \longrightarrow A_{i1} + \sum_{k=2}^{N} x_k\,A_{ik} \tag{1.189}
\]
without changing det A. This linear combination is not completely arbitrary because the coefficient of A_{i1} remains unity. The analogous operation
\[
A_{i\ell} \longrightarrow A_{i\ell} + \sum_{k=1,\,k\neq\ell}^{N} y_k\,A_{ik} \tag{1.190}
\]
replaces the ℓth column by a nearly arbitrary linear combination of all the columns without changing det A.
The key concepts of linear dependence and independence were explained in Sec. 1.8. Suppose that the columns of an N × N matrix A are linearly dependent, so that for some coefficients y_k not all zero the linear combination
\[
\sum_{k=1}^{N} y_k\,A_{ik} = 0 \qquad \forall\, i \tag{1.191}
\]
vanishes for all i (the upside-down A means "for all"). Suppose y_1 ≠ 0. Then by adding suitable linear combinations of columns 2 through N to column 1, we could make all the elements A_{i1} of column 1 vanish without changing det A. But then det A as given by (1.183) would vanish. It follows that the determinant of any matrix whose columns are linearly dependent must vanish.
The converse also is true: if the columns of a matrix are linearly independent, then the determinant of that matrix cannot vanish. To see why, let us recall, as explained in Sec. 1.8, that any linearly independent set of vectors is complete. Thus if the columns of a matrix A are linearly independent and therefore complete, some linear combination of all columns 2 through N when added to column 1 will convert column 1 into a (non-zero) multiple of the N-dimensional column vector (1, 0, 0, … 0), say (c_1, 0, 0, … 0). Similar operations will convert column 2 into a (non-zero) multiple of the column vector (0, 1, 0, … 0), say (0, c_2, 0, … 0). Continuing in this way, we may convert the matrix A to a matrix with non-zero entries along the main diagonal and zeros everywhere else. The determinant det A is then the product of the non-zero diagonal entries c_1 c_2 … c_N ≠ 0, and so det A cannot vanish.
We may extend these arguments to the rows of a matrix. The addition to row k of a linear combination of the other rows
\[
A_{ki} \longrightarrow A_{ki} + \sum_{\ell=1,\,\ell\neq k}^{N} z_\ell\,A_{\ell i} \tag{1.192}
\]
does not change the value of the determinant. In this way, one may show that the determinant of a matrix vanishes if and only if its rows are linearly dependent. The reason why these results apply to the rows as well as to the columns is that the determinant of a matrix A may be defined either in terms of the columns as in definitions (1.183 & 1.185) or in terms of the rows:
\[
\det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} e_{i_1 i_2 \ldots i_N}\,A_{1 i_1} A_{2 i_2} \cdots A_{N i_N} \tag{1.193}
\]
\[
e_{k_1 k_2 \ldots k_N}\,\det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} e_{i_1 i_2 \ldots i_N}\,A_{k_1 i_1} A_{k_2 i_2} \cdots A_{k_N i_N}. \tag{1.194}
\]
These and many other properties of determinants follow from a study of permutations, which are discussed in Section 10.13. Detailed proofs can be found in the book by Aitken (Aitken, 1959).

By comparing the column definitions (1.183 & 1.185) with the row definitions (1.193 & 1.194) of determinants, we see that the determinant of the transpose of a matrix is the same as the determinant of the matrix itself:
\[
\det A^{\mathsf{T}} = \det A. \tag{1.195}
\]
Let us return for a moment to Laplace's expansion (1.184) for the determinant det A of an N × N matrix A as a sum of A_{ik} C_{ik} over the row index i with the column index k held fixed
\[
\det A = \sum_{i=1}^{N} A_{ik}\,C_{ik} \tag{1.196}
\]
in order to prove that
\[
\delta_{k\ell}\,\det A = \sum_{i=1}^{N} A_{ik}\,C_{i\ell}. \tag{1.197}
\]
For k = ℓ, this formula just repeats Laplace's expansion (1.196). But for k ≠ ℓ, it is Laplace's expansion for the determinant of a matrix A′ that is the same as A but with its ℓth column replaced by its kth one. Since the matrix A′ has two identical columns, its determinant vanishes, which explains (1.197) for k ≠ ℓ.

The rule (1.197) therefore provides a formula for the inverse of a matrix A whose determinant does not vanish. Such matrices are called nonsingular. The inverse A⁻¹ of an N × N nonsingular matrix A is the transpose of the matrix of cofactors divided by det A
\[
\left(A^{-1}\right)_{\ell i} = \frac{C_{i\ell}}{\det A}
\qquad \text{or} \qquad
A^{-1} = \frac{C^{\mathsf{T}}}{\det A}. \tag{1.198}
\]
To verify this formula, we use it for A⁻¹ in the product A⁻¹A and note that by (1.197) the ℓkth entry of the product A⁻¹A is just δ_{ℓk}
\[
\left(A^{-1}A\right)_{\ell k} = \sum_{i=1}^{N} \left(A^{-1}\right)_{\ell i} A_{ik}
= \sum_{i=1}^{N} \frac{C_{i\ell}}{\det A}\,A_{ik} = \delta_{\ell k} \tag{1.199}
\]
as required.
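The cofactor formula (1.198) is easy to code directly. A numpy sketch (the 3 × 3 test matrix is a made-up example, and in practice one would call a library inverse instead):

```python
import numpy as np

# The inverse via cofactors, Eq. (1.198): A^{-1} = C^T / det A, with
# C[i, k] = (-1)**(i + k) times the minor M_ik of Eq. (1.184).
def cofactor_inverse(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for k in range(n):
            minor = np.delete(np.delete(A, i, axis=0), k, axis=1)
            C[i, k] = (-1) ** (i + k) * np.linalg.det(minor)
    return C.T / np.linalg.det(A)

A = np.array([[1.0, 2.0, 1.0], [-2.0, -6.0, 3.0], [4.0, 2.0, -5.0]])
inv = cofactor_inverse(A)
assert np.allclose(inv @ A, np.eye(3))   # the check (1.199): A^{-1} A = I
```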
Example: Let's apply our formula (1.198) to find the inverse of the general 2 × 2 matrix
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. \tag{1.200}
\]
We find then
\[
A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \tag{1.201}
\]
which is the correct inverse.
The simple example of matrix multiplication
\[
\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}
\begin{pmatrix} 1 & x & y \\ 0 & 1 & z \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} a & xa + b & ya + zb + c \\ d & xd + e & yd + ze + f \\ g & xg + h & yg + zh + i \end{pmatrix} \tag{1.202}
\]
shows that the operations (1.190) on columns that don't change the value of the determinant can be written as matrix multiplication from the right by a matrix that has unity on its main diagonal and zeros below.
Now imagine that A and B are N × N matrices and consider the 2N × 2N matrix product
\[
\begin{pmatrix} A & 0 \\ -I & B \end{pmatrix}
\begin{pmatrix} I & B \\ 0 & I \end{pmatrix}
=
\begin{pmatrix} A & AB \\ -I & 0 \end{pmatrix} \tag{1.203}
\]
in which I is the N × N identity matrix, and 0 is the N × N matrix of all zeros. The second matrix on the left-hand side has unity on its main diagonal and zeros below, and so it does not change the value of the determinant of the matrix to its left, which thus is equal to that of the matrix on the right-hand side:
\[
\det \begin{pmatrix} A & 0 \\ -I & B \end{pmatrix}
= \det \begin{pmatrix} A & AB \\ -I & 0 \end{pmatrix}. \tag{1.204}
\]
By using Laplace's expansion (1.184) along the first column to evaluate the determinant on the left-hand side (LHS) and Laplace's expansion (1.184) along the last row to compute the determinant on the right-hand side (RHS), one may derive the general and important rule that the determinant of the product of two matrices is the product of the determinants
\[
\det A\,\det B = \det AB. \tag{1.205}
\]
Example: The case in which the matrices A and B are both 2 × 2 is easy to understand. The LHS of Eq. (1.204) gives
\[
\det \begin{pmatrix} A & 0 \\ -I & B \end{pmatrix}
= \det \begin{pmatrix}
a_{11} & a_{12} & 0 & 0 \\
a_{21} & a_{22} & 0 & 0 \\
-1 & 0 & b_{11} & b_{12} \\
0 & -1 & b_{21} & b_{22}
\end{pmatrix}
= a_{11}a_{22}\det B - a_{21}a_{12}\det B = \det A\,\det B \tag{1.206}
\]
while its RHS comes to
\[
\det \begin{pmatrix} A & AB \\ -I & 0 \end{pmatrix}
= \det \begin{pmatrix}
a_{11} & a_{12} & (AB)_{11} & (AB)_{12} \\
a_{21} & a_{22} & (AB)_{21} & (AB)_{22} \\
-1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0
\end{pmatrix}
= (-1)\,C_{42} = (-1)(-1)\det AB = \det AB. \tag{1.207}
\]
Often one uses an absolute-value notation to denote a determinant, |A| = det A. In this more compact notation, the obvious generalization of the product rule is
\[
|ABC \cdots Z| = |A|\,|B| \cdots |Z|. \tag{1.208}
\]
The product rule (1.208) implies that the determinant of A⁻¹ is the inverse of |A| since
\[
1 = |I| = |AA^{-1}| = |A|\,|A^{-1}|. \tag{1.209}
\]
Incidentally, Gauss, Jordan, and others have developed much faster ways of computing determinants and matrix inverses than those (1.184 & 1.198) due to Laplace. Octave, Matlab, Maple, and Mathematica use these more modern techniques, which also are freely available as programs in C and Fortran from www.netlib.org/lapack.
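Those library routines make the rules above easy to spot-check numerically. A numpy sketch (the random 4 × 4 matrices are arbitrary examples):

```python
import numpy as np

# Numerical spot-checks of the product rule (1.205, 1.208), of
# |A^{-1}| = 1/|A| from (1.209), and of det A^T = det A from (1.195).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))      # almost surely nonsingular
B = rng.standard_normal((4, 4))
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(np.linalg.inv(A)),
                  1.0 / np.linalg.det(A))
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))
```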
Numerical Example: Adding multiples of rows to other rows does not change the value of a determinant, and interchanging two rows only changes a determinant by a minus sign. So we can use these operations, which leave determinants invariant, to make a matrix upper triangular, a form in which its determinant is just the product of the factors on its diagonal. For instance, to make the matrix
\[
A = \begin{pmatrix} 1 & 2 & 1 \\ -2 & -6 & 3 \\ 4 & 2 & -5 \end{pmatrix} \tag{1.210}
\]
upper triangular, we add twice the first row to the second row
\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & -2 & 5 \\ 4 & 2 & -5 \end{pmatrix} \tag{1.211}
\]
and then subtract four times the first row from the third
\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & -2 & 5 \\ 0 & -6 & -9 \end{pmatrix}. \tag{1.212}
\]
Next, we subtract three times the second row from the third
\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & -2 & 5 \\ 0 & 0 & -24 \end{pmatrix}. \tag{1.213}
\]
We now find as the determinant of A the product of its diagonal elements:
\[
|A| = 1 \cdot (-2) \cdot (-24) = 48. \tag{1.214}
\]
The Matlab command is d = det(A).
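The same row reduction can be carried out in a few lines of Python/numpy (given here alongside the text's Matlab one-liner; the loop below assumes the pivots happen to be non-zero, as they are for this matrix, so no row interchanges are needed):

```python
import numpy as np

# Row-reduce the matrix of (1.210) to upper triangular form by adding
# multiples of earlier rows to later rows, as in (1.211)-(1.213), then
# multiply the diagonal entries to get the determinant (1.214).
A = np.array([[1.0, 2.0, 1.0], [-2.0, -6.0, 3.0], [4.0, 2.0, -5.0]])
U = A.copy()
for j in range(3):                       # eliminate below the pivot U[j, j]
    for i in range(j + 1, 3):
        U[i] -= (U[i, j] / U[j, j]) * U[j]
det = U.diagonal().prod()
assert np.isclose(det, 48.0)             # agrees with (1.214)
```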
1.17 Systems of Linear Equations

Suppose we wish to solve the system of linear equations
\[
\sum_{k=1}^{N} A_{ik}\,x_k = y_i \tag{1.215}
\]
for the N unknowns x_k. In matrix notation, with A an N × N matrix and x and y N-vectors, this system of equations is
\[
A\,x = y. \tag{1.216}
\]
If the matrix A is non-singular, that is, if det(A) ≠ 0, then it has an inverse A⁻¹ given by (1.198), and we may multiply both sides of (1.216) by A⁻¹ and so find x as
\[
x = I\,x = A^{-1}A\,x = A^{-1}y. \tag{1.217}
\]
When A is non-singular, this is the unique solution to (1.215).

When A is singular, det(A) = 0, and so its columns are linearly dependent as explained in Sec. 1.16. In this case, the linear dependence of the columns of A implies that Az = 0 for some non-zero vector z, and so if x is a solution, then x + cz for all c is also a solution since A(x + cz) = Ax + cAz = Ax = y. So if det(A) = 0, then there may be solutions, but there can be no unique solution. Whether equation (1.215) has any solutions when det(A) = 0 depends on whether the vector y can be expressed as a linear combination of the columns of A. Since these columns are linearly dependent, they span a subspace of fewer than N dimensions, and so (1.215) has solutions only when the N-vector y lies in that subspace.

A system of M equations
\[
\sum_{k=1}^{N} A_{ik}\,x_k = y_i \qquad \text{for } i = 1, 2, \ldots, M \tag{1.218}
\]
in N, more than M, unknowns is under-determined. As long as at least M of the N columns A_{ik} of the matrix A are linearly independent, such a system always has solutions, but they are not unique.
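For a non-singular square system, a numpy sketch of (1.216)-(1.217) (the 2 × 2 system is a made-up example; in practice one calls a solver rather than forming A⁻¹ explicitly, though both give the same x here):

```python
import numpy as np

# Solve Ax = y for a non-singular A, as in (1.217).
A = np.array([[2.0, 1.0], [1.0, 3.0]])   # det A = 5, so A is non-singular
y = np.array([3.0, 5.0])
x = np.linalg.solve(A, y)
assert np.allclose(A @ x, y)                       # x solves (1.216)
assert np.allclose(x, np.linalg.inv(A) @ y)        # same as A^{-1} y
```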
1.18 Linear Least Squares

Suppose we are confronted with a system of M equations
\[
\sum_{k=1}^{N} A_{ik}\,x_k = y_i \qquad \text{for } i = 1, 2, \ldots, M \tag{1.219}
\]
in fewer unknowns N < M. This problem is over-determined. In general, it has no solution, but it does have an approximate solution due to Carl Gauss (1777–1855).

If the matrix A and the vector y are real, then Gauss's solution is the N values x_k that minimize the sum of the squares of the errors
\[
E = \sum_{i=1}^{M} \Big( y_i - \sum_{k=1}^{N} A_{ik}\,x_k \Big)^2. \tag{1.220}
\]
The minimizing values x_ℓ make the N derivatives of E vanish
\[
\frac{\partial E}{\partial x_\ell} = 0 = \sum_{i=1}^{M} 2 \Big( y_i - \sum_{k=1}^{N} A_{ik}\,x_k \Big)(-A_{i\ell}) \tag{1.221}
\]
so in matrix notation
\[
A^{\mathsf{T}} y = A^{\mathsf{T}} A\,x. \tag{1.222}
\]
Since A is real, the matrix A^{\mathsf{T}}A is non-negative (1.41); if it also is positive (1.42), then it has an inverse, and our least-squares solution is
\[
x = \left(A^{\mathsf{T}} A\right)^{-1} A^{\mathsf{T}} y. \tag{1.223}
\]
If the matrix A and the vector y are complex, and if the matrix A^\dagger A is positive, then one may show (problem 16) that minimization of the sum of the squares of the absolute values of the errors gives
\[
x = \left(A^\dagger A\right)^{-1} A^\dagger y. \tag{1.224}
\]
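A numpy sketch of Gauss's solution (1.223) for an over-determined system (the 5-equation, 2-unknown data below are made-up; `np.linalg.lstsq` minimizes the same sum of squares (1.220) by a more stable method):

```python
import numpy as np

# Least squares for M = 5 equations in N = 2 unknowns, via the
# normal equations (1.222)-(1.223): x = (A^T A)^{-1} A^T y.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
x = np.linalg.inv(A.T @ A) @ (A.T @ y)

# the library routine minimizes the same error (1.220)
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x, x_lstsq)
```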
Example from biophysics: If the wavelength of visible light were a nanometer, microscopes would yield much sharper images. Each photon from a (single-molecule) fluorophore entering the lens of a microscope would follow ray optics and be focused within a tiny circle of about a nanometer on a detector. Instead, a photon that should arrive at x = (x_1, x_2) arrives at y_i = (y_{1i}, y_{2i}) according to an approximately gaussian probability distribution
\[
P(y_i) = c\,e^{-(y_i - x)^2/(2\sigma^2)} \tag{1.225}
\]
in which c is a normalization constant and σ is about 150 nm. What to do? Keith Lidke and his merry band of biophysicists collect about N = 500 points y_i and determine the point x that maximizes the joint probability of the ensemble of image points
\[
P = \prod_{i=1}^{N} P(y_i) = c^N \prod_{i=1}^{N} e^{-(y_i - x)^2/(2\sigma^2)}
= c^N \exp\Big[ -\sum_{i=1}^{N} (y_i - x)^2/(2\sigma^2) \Big] \tag{1.226}
\]
by solving for k = 1 and 2 the equations
\[
\frac{\partial P}{\partial x_k} = 0
= P\,\frac{\partial}{\partial x_k}\Big[ -\sum_{i=1}^{N} (y_i - x)^2/(2\sigma^2) \Big]
= \frac{P}{\sigma^2} \sum_{i=1}^{N} (y_{ik} - x_k). \tag{1.227}
\]
Thus this maximum likelihood estimate of the image point x is the average of the observed points y_i
\[
x = \frac{1}{N} \sum_{i=1}^{N} y_i. \tag{1.228}
\]
Figure 1.1 Conventional (left, fuzzy) and STORM (right, sharp) images of microtubules. The tubulin is labeled with a fluorescent anti-tubulin antibody. The white rectangles are 1 micron in length. Images courtesy of Keith Lidke.
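A toy numerical version of this estimate (not Lidke's actual analysis; the true position, the seed, and the point count are made-up) scatters N = 500 gaussian points of width σ = 150 nm about a position x and recovers x as their average, as in (1.228):

```python
import numpy as np

# Simulate N photon arrival points spread by sigma about x_true, then
# form the maximum-likelihood estimate (1.228): the sample average.
rng = np.random.default_rng(1)
x_true = np.array([1000.0, 2000.0])            # nm, hypothetical position
sigma = 150.0                                  # nm, as in the text
y = x_true + sigma * rng.standard_normal((500, 2))
x_hat = y.mean(axis=0)                         # the estimate (1.228)
# the error of the mean is about sigma / sqrt(N), roughly 7 nm here
```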
Their stochastic optical reconstruction microscopy (STORM) is more complicated because they also account for the finite accuracy of their detector.

Microtubules are long hollow tubes made of the protein tubulin. They are 25 nm in diameter and typically have one end attached to a centrosome. Together with actin and intermediate filaments, they form the cytoskeleton of a eukaryotic cell. Fig. 1.1 shows conventional (left, fuzzy) and STORM (right, sharp) images of microtubules. The fluorophore attaches at a random point on an anti-tubulin antibody of finite size, which binds to the tubulin of a microtubule. This spatial uncertainty and the motion of the molecules of living cells limit the improvement in resolution to a factor of 10 to 20.
1.19 The Adjoint of an Operator

The adjoint A† of a linear operator A is defined by
\[
(g, Af) = (A^\dagger g, f) = (f, A^\dagger g)^*. \tag{1.229}
\]
Equivalent expressions in Dirac notation are
\[
\langle g|Af\rangle = \langle g|A|f\rangle = \langle A^\dagger g|f\rangle = \langle f|A^\dagger g\rangle^* = \langle f|A^\dagger|g\rangle^*. \tag{1.230}
\]
So if the vectors {e_i} are orthonormal and complete in a space S, then with f = e_j and g = e_i, the definition (1.229) or (1.230) of the adjoint A† of a linear operator A implies
\[
\langle e_i|A^\dagger|e_j\rangle = \langle e_j|A|e_i\rangle^* \tag{1.231}
\]
or
\[
\left(A^\dagger\right)_{ij} = (A_{ji})^* = A_{ji}^* \tag{1.232}
\]
in agreement with our definition (1.31) of the adjoint of a matrix as the transpose of its complex conjugate
\[
A^\dagger = \left(A^*\right)^{\mathsf{T}}. \tag{1.233}
\]
Since both (A*)* = A and (A^{\mathsf{T}})^{\mathsf{T}} = A, it follows that
\[
\left(A^\dagger\right)^\dagger = A \tag{1.234}
\]
so the adjoint of an adjoint is the original operator.

By applying this rule (1.234) to the definition (1.229) of the adjoint, we find the related rule
\[
(g, A^\dagger f) = \left(\left(A^\dagger\right)^\dagger g,\, f\right) = (Ag, f). \tag{1.235}
\]
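A numerical check of the defining property (1.229), with the adjoint built as the conjugate transpose of (1.233) (the random complex 3 × 3 matrix and vectors are arbitrary examples):

```python
import numpy as np

# Verify (g, Af) = (A^dagger g, f) and (A^dagger)^dagger = A,
# using A^dagger = conjugate transpose as in (1.232)-(1.233).
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
f = rng.standard_normal(3) + 1j * rng.standard_normal(3)
g = rng.standard_normal(3) + 1j * rng.standard_normal(3)
A_dag = A.conj().T

assert np.isclose(g.conj() @ (A @ f), (A_dag @ g).conj() @ f)  # (1.229)
assert np.allclose(A_dag.conj().T, A)                          # (1.234)
```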
1.20 Self-Adjoint or Hermitian Linear Operators

An operator A that is equal to its adjoint
\[
A^\dagger = A \tag{1.236}
\]
is self adjoint or hermitian. In view of definition (1.229), a self-adjoint linear operator A satisfies
\[
(g, Af) = (Ag, f) = (f, Ag)^* \tag{1.237}
\]
or equivalently
\[
\langle g|A|f\rangle = \langle Ag|f\rangle = \langle f|Ag\rangle^* = \langle f|A|g\rangle^*. \tag{1.238}
\]
By Eq. (1.232), a hermitian operator A that acts on a finite-dimensional vector space is represented in an orthonormal basis by a matrix that is equal to the transpose of its complex conjugate
\[
A_{ij} = \left(A^\dagger\right)_{ij} = (A_{ji})^* = A_{ji}^*. \tag{1.239}
\]
Such matrices are said to be hermitian. Conversely, a linear operator that is represented by a hermitian matrix in an orthonormal basis is self adjoint (problem 17).

A matrix A_{ij} that is real and symmetric or imaginary and anti-symmetric is hermitian. But a self-adjoint linear operator A that is represented by a matrix A_{ij} that is real and symmetric (or imaginary and anti-symmetric) in one orthonormal basis will not in general be represented by a matrix that is real and symmetric (or imaginary and anti-symmetric) in a different orthonormal basis, but it will be represented by a hermitian matrix in every orthonormal basis.

As we'll see in section (1.30), hermitian matrices have real eigenvalues and complete sets of orthonormal eigenvectors. Hermitian operators and matrices represent physical variables in quantum mechanics.
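The facts quoted here from section 1.30 are easy to see numerically. A numpy sketch with a made-up 2 × 2 hermitian matrix:

```python
import numpy as np

# A hermitian matrix: equal to the transpose of its complex conjugate.
H = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])
assert np.allclose(H, H.conj().T)                # H is hermitian (1.239)

vals, vecs = np.linalg.eigh(H)                   # solver for hermitian H
assert np.all(np.isreal(vals))                   # real eigenvalues
assert np.allclose(vecs.conj().T @ vecs, np.eye(2))  # orthonormal eigenvectors
```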
1.21 Real, Symmetric Linear Operators

In quantum mechanics, we usually consider complex vector spaces, that is, spaces in which the vectors |f⟩ are complex linear combinations
\[
|f\rangle = \sum_{i=1}^{N} z_i\,|i\rangle \tag{1.240}
\]
of complex orthonormal basis vectors |i⟩.

But real vector spaces also are of interest. A real vector space is a vector space in which the vectors |f⟩ are real linear combinations
\[
|f\rangle = \sum_{n=1}^{N} x_n\,|n\rangle \tag{1.241}
\]
of real orthonormal basis vectors, x_n* = x_n and |n⟩* = |n⟩.

A real linear operator A on a real vector space
\[
A = \sum_{n,m=1}^{N} |n\rangle\langle n|A|m\rangle\langle m| = \sum_{n,m=1}^{N} |n\rangle A_{nm}\langle m| \tag{1.242}
\]
is represented by a real matrix A_{nm}* = A_{nm}. A real linear operator A that is self adjoint on a real vector space satisfies the condition (1.237) of hermiticity but with the understanding that complex conjugation has no effect
\[
(g, Af) = (Ag, f) = (f, Ag)^* = (f, Ag). \tag{1.243}
\]
Thus, its matrix elements are symmetric: ⟨g|A|f⟩ = ⟨f|A|g⟩. Since A is hermitian as well as real, the matrix A_{nm} that represents it (in a real basis) is real and hermitian, and so is symmetric
\[
A_{nm} = A_{mn}^* = A_{mn}. \tag{1.244}
\]
1.22 Unitary Operators

A unitary operator U is one whose adjoint is its inverse
\[
U\,U^\dagger = U^\dagger\,U = I. \tag{1.245}
\]
In general, the unitary operators we'll consider also are linear, that is
\[
U\,(z|\psi\rangle + w|\phi\rangle) = z\,U|\psi\rangle + w\,U|\phi\rangle \tag{1.246}
\]
for all states or vectors |ψ⟩ and |φ⟩ and all complex numbers z and w.

In standard notation, U†U = I implies that for any vectors f and g
\[
(g, f) = (g, U^\dagger U f) = (Ug, Uf) \tag{1.247}
\]
as well as
\[
(g, f) = (g, U U^\dagger f) = (U^\dagger g, U^\dagger f). \tag{1.248}
\]
In Dirac notation, these equations are
\[
\langle g|f\rangle = \langle g|U^\dagger U|f\rangle = \langle Ug|U|f\rangle = \langle Ug|Uf\rangle \tag{1.249}
\]
and
\[
\langle g|f\rangle = \langle g|U U^\dagger|f\rangle = \langle U^\dagger g|U^\dagger|f\rangle = \langle U^\dagger g|U^\dagger f\rangle. \tag{1.250}
\]
Suppose the states {|n⟩} form an orthonormal basis for a given vector space. Then if U is any unitary operator, the relations (1.247–1.250) show that the states {U|n⟩} also form an orthonormal basis. The orthonormality of the image states {U|n⟩} follows from that of the basis states {|n⟩}
\[
\delta_{nm} = \langle n|m\rangle = \langle Un|Um\rangle = \langle n|U^\dagger U|m\rangle. \tag{1.251}
\]
The completeness relation for the basis states {|n⟩} is that the sum of their dyadics is the identity operator
\[
\sum_n |n\rangle\langle n| = I \tag{1.252}
\]
and it implies that the image states {U|n⟩} also are complete
\[
\sum_n U|n\rangle\langle n|U^\dagger = U I U^\dagger = U U^\dagger = I. \tag{1.253}
\]
So a unitary matrix U maps an orthonormal basis into another orthonormal basis. In fact, any linear map from one orthonormal basis {|n⟩} to another {|n′⟩} must be unitary. Such an operator will be of the form
\[
U = \sum_{n=1}^{N} |n'\rangle\langle n| \tag{1.254}
\]
with
\[
\langle n|m\rangle = \delta_{nm} \qquad \text{and} \qquad \langle n'|m'\rangle = \delta_{nm}. \tag{1.255}
\]
The unitarity of such a sum is evident:
\[
U^\dagger U = \sum_{n=1}^{N} |n\rangle\langle n'| \sum_{m=1}^{N} |m'\rangle\langle m|
= \sum_{n=1}^{N}\sum_{m=1}^{N} |n\rangle\,\delta_{nm}\,\langle m|
= \sum_{n=1}^{N} |n\rangle\langle n| = I. \tag{1.256}
\]
The product U U† similarly collapses to unity.

Unitary matrices have unimodular determinants. To show this, we use the definition (1.245), that is, U†U = I, and the product rule for determinants (1.208) to write
\[
1 = |I| = |U^\dagger U| = |U^\dagger|\,|U| = |U^*|\,|U| = |U|^*\,|U|. \tag{1.257}
\]
A unitary matrix that is real is said to be orthogonal. An orthogonal matrix O satisfies
\[
O\,O^{\mathsf{T}} = O^{\mathsf{T}}\,O = I. \tag{1.258}
\]
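As a concrete check, a rotation matrix is real and unitary, hence orthogonal, and its determinant is unimodular. A numpy sketch with a made-up angle:

```python
import numpy as np

# A 2x2 rotation matrix is orthogonal (1.258) and has |det U| = 1,
# as the unimodularity argument (1.257) requires.
theta = 0.3
U = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

assert np.allclose(U @ U.T, np.eye(2))          # O O^T = I, Eq. (1.258)
assert np.allclose(U.T @ U, np.eye(2))          # O^T O = I
assert np.isclose(abs(np.linalg.det(U)), 1.0)   # unimodular determinant
```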
1.23 Antiunitary, Antilinear Operators

Certain maps on states ψ → ψ′, such as those involving time reversal, are implemented by operators K that are antilinear
\[
K\,(z\psi + w\phi) = K\,(z|\psi\rangle + w|\phi\rangle) = z^* K|\psi\rangle + w^* K|\phi\rangle = z^* K\psi + w^* K\phi \tag{1.259}
\]
and antiunitary
\[
(K\phi, K\psi) = \langle K\phi|K\psi\rangle = (\phi, \psi)^* = \langle\phi|\psi\rangle^* = \langle\psi|\phi\rangle = (\psi, \phi). \tag{1.260}
\]
Don't feel bad if you find such operators spooky. I do too.
1.24 Symmetry in Quantum Mechanics

In quantum mechanics, a symmetry is a map of states ψ → ψ′, φ → φ′ that preserves their inner products
\[
|\langle \phi'|\psi'\rangle|^2 = |\langle \phi|\psi\rangle|^2 \tag{1.261}
\]
and so their predicted probabilities. The inner products of the primed and unprimed vectors are the same.

Eugene Wigner (1902–1995) has shown that every symmetry in quantum mechanics can be represented either by an operator U that is linear and unitary or by an operator K that is anti-linear and anti-unitary. The anti-linear, anti-unitary case seems to occur only when the symmetry involves time-reversal; most symmetries are represented by operators U that are linear and unitary. So unitary operators are of great importance in quantum mechanics. They are used to represent rotations, translations, Lorentz transformations, and internal-symmetry transformations, that is, just about all symmetries not involving time-reversal.
1.25 Lagrange Multipliers

The maxima and minima of a function f(x) of several variables x_1, x_2, …, x_n are among the points at which its gradient vanishes
\[
\nabla f(x) = 0. \tag{1.262}
\]
These are the stationary points of f.

Example 1.2 (Minimum) For instance, if f(x) = x_1² + 2x_2² + 3x_3², then its minimum is at
\[
\nabla f(x) = (2x_1,\, 4x_2,\, 6x_3) = 0 \tag{1.263}
\]
that is, at x_1 = x_2 = x_3 = 0.

But how do we find the extrema of f(x) if x must satisfy k constraints c_1(x) = 0, c_2(x) = 0, …, c_k(x) = 0? We use Lagrange multipliers (Joseph-Louis Lagrange, 1736–1813).

In the case of one constraint c(x) = 0, we no longer expect the gradient ∇f(x) to vanish, but its projection ∇f(x)·dx must vanish in those directions dx that preserve the constraint. So ∇f(x)·dx = 0 for all dx that make ∇c(x)·dx = 0. This means that ∇f(x) and ∇c(x) must be parallel. Thus, the extrema of f(x) subject to the constraint c(x) = 0 satisfy the two equations
\[
\nabla f(x) = \lambda\,\nabla c(x) \qquad \text{and} \qquad c(x) = 0. \tag{1.264}
\]
These equations define the extrema of the unconstrained function
\[
L(x, \lambda) = f(x) - \lambda\,c(x) \tag{1.265}
\]
of the n + 1 variables x_1, …, x_n, λ
\[
\nabla L(x, \lambda) = \nabla f(x) - \lambda\,\nabla c(x) = 0
\qquad \text{and} \qquad
\frac{\partial L(x, \lambda)}{\partial \lambda} = -c(x) = 0. \tag{1.266}
\]
The extra variable λ is a Lagrange multiplier.

In the case of k constraints c_1(x) = 0, …, c_k(x) = 0, the projection ∇f·dx must vanish in those directions dx that preserve all the constraints. So ∇f(x)·dx = 0 for all dx that make all ∇c_j(x)·dx = 0 for j = 1, …, k. The gradient ∇f will satisfy this requirement if it's a linear combination
\[
\nabla f = \lambda_1\,\nabla c_1 + \cdots + \lambda_k\,\nabla c_k \tag{1.267}
\]
of the k gradients because then ∇f·dx will vanish if ∇c_j·dx = 0 for j = 1, …, k. The extrema also must satisfy the constraints
\[
c_1(x) = 0, \quad \ldots, \quad c_k(x) = 0. \tag{1.268}
\]
Equations (1.267 & 1.268) define the extrema of the unconstrained function
\[
L(x, \lambda) = f(x) - \lambda_1\,c_1(x) - \cdots - \lambda_k\,c_k(x) \tag{1.269}
\]
of the n + k variables x and λ
\[
\nabla L(x, \lambda) = \nabla f(x) - \lambda_1\,\nabla c_1(x) - \cdots - \lambda_k\,\nabla c_k(x) = 0 \tag{1.270}
\]
and
\[
\frac{\partial L(x, \lambda)}{\partial \lambda_j} = -c_j(x) = 0 \qquad j = 1, \ldots, k. \tag{1.271}
\]
Example 1.3 (Constrained Extrema and Eigenvectors) Suppose we want to find the extrema of a real, symmetric quadratic form f(x) = x^{\mathsf{T}}Ax subject to the constraint c(x) = x·x − 1, which says that the vector x is of unit length. We form the function
\[
L(x, \lambda) = x^{\mathsf{T}} A x - \lambda\,(x \cdot x - 1) \tag{1.272}
\]
and since the matrix A is real and symmetric, we find its unconstrained extrema as
\[
\nabla L(x, \lambda) = 2Ax - 2\lambda x = 0 \qquad \text{and} \qquad x \cdot x = 1. \tag{1.273}
\]
The extrema of f(x) = x^{\mathsf{T}}Ax subject to the constraint c(x) = x·x − 1 are the normalized eigenvectors
\[
Ax = \lambda x \qquad \text{and} \qquad x \cdot x = 1 \tag{1.274}
\]
of the real, symmetric matrix A.
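Example 1.3 can be checked numerically: the extrema of f on the unit sphere are eigenvectors, and the extremal values of f are the eigenvalues. A numpy sketch with a made-up symmetric matrix:

```python
import numpy as np

# The extrema of f(x) = x^T A x on the unit sphere x.x = 1 are the
# normalized eigenvectors of the real symmetric A, as in (1.274),
# and the value of f at each extremum is the eigenvalue.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
vals, vecs = np.linalg.eigh(A)

for lam, v in zip(vals, vecs.T):
    assert np.isclose(v @ v, 1.0)          # the constraint x.x = 1
    assert np.allclose(A @ v, lam * v)     # the eigenvalue equation (1.274)
    assert np.isclose(v @ A @ v, lam)      # f at the extremum is lambda
```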
1.26 Eigenvectors and Invariant Subspaces

Let A be a linear operator that maps vectors |v⟩ in a vector space S into vectors in the same space. If T ⊂ S is a subspace of S, and if the vector A|u⟩ is in T whenever |u⟩ is in T, then T is an invariant subspace of S. The whole space S is a trivial invariant subspace of S, as is the null set ∅.

If T ⊂ S is a one-dimensional invariant subspace of S, then A maps each vector |u⟩ ∈ T into another vector λ|u⟩ ∈ T, that is
\[
A|u\rangle = \lambda\,|u\rangle. \tag{1.275}
\]
In this case, we say that |u⟩ is an eigenvector of A with eigenvalue λ. (The German adjective eigen means own, proper, singular.)

Example: The matrix equation
\[
\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} 1 \\ \pm i \end{pmatrix}
= e^{\pm i\theta} \begin{pmatrix} 1 \\ \pm i \end{pmatrix} \tag{1.276}
\]
tells us that the eigenvectors of this 2 × 2 orthogonal matrix are the 2-tuples (1, ±i) with eigenvalues e^{±iθ}.

Problem 18 is to show that the eigenvalues λ of a unitary (and hence of an orthogonal) matrix are unimodular, |λ| = 1.
Example: Let us consider the eigenvector equation
\[
\sum_{k=1}^{N} A_{ik}\,V_k = \lambda\,V_i \tag{1.277}
\]
for a matrix A that is anti-symmetric, A_{ik} = −A_{ki}. The anti-symmetry of A implies that
\[
\sum_{i,k=1}^{N} V_i\,A_{ik}\,V_k = 0. \tag{1.278}
\]
Thus the last two relations imply that
\[
0 = \sum_{i,k=1}^{N} V_i\,A_{ik}\,V_k = \lambda \sum_{i=1}^{N} V_i^2. \tag{1.279}
\]
Thus either the eigenvalue λ or the dot-product of the eigenvector with itself vanishes.

Problem 19 is to show that the sum of the eigenvalues of an anti-symmetric matrix vanishes.
1.27 Eigenvalues of a Square Matrix

Let A be an N × N matrix with complex entries A_{ik}. A non-zero N-dimensional vector V with entries V_k is an eigenvector of the matrix A with eigenvalue λ if
\[
A|V\rangle = \lambda|V\rangle
\quad\Longleftrightarrow\quad
AV = \lambda V
\quad\Longleftrightarrow\quad
\sum_{k=1}^{N} A_{ik}\,V_k = \lambda\,V_i. \tag{1.280}
\]
Every N × N matrix A has N eigenvectors V^{(ℓ)} and eigenvalues λ_ℓ
\[
A\,V^{(\ell)} = \lambda_\ell\,V^{(\ell)} \tag{1.281}
\]
for ℓ = 1 … N. To see why, we write the top equation (1.280) as
\[
\sum_{k=1}^{N} \left(A_{ik} - \lambda\,\delta_{ik}\right) V_k = 0 \tag{1.282}
\]
or in matrix notation as
\[
(A - \lambda I)\,V = 0 \tag{1.283}
\]
in which I is the N × N matrix with entries I_{ik} = δ_{ik}. These equivalent equations (1.282 & 1.283) say that the columns of the matrix A − λI, considered as vectors, are linearly dependent, as defined in section 1.8. We saw in section 1.16 that the columns of a matrix A − λI are linearly dependent if and only if the determinant |A − λI| vanishes. Thus a non-zero solution of the eigenvalue equation (1.280) exists if and only if the determinant
\[
\det (A - \lambda I) = |A - \lambda I| = 0 \tag{1.284}
\]
vanishes. This requirement that the determinant of A − λI vanish is called the characteristic equation. For an N × N matrix A, it is a polynomial equation of the Nth degree in the unknown eigenvalue λ
\[
|A - \lambda I| \equiv P(\lambda, A)
= |A| + \cdots + (-1)^{N-1}\lambda^{N-1}\,\mathrm{Tr}A + (-1)^N \lambda^N
= \sum_{k=0}^{N} p_k\,\lambda^k = 0 \tag{1.285}
\]
in which p_0 = |A|, p_{N−1} = (−1)^{N−1} TrA, and p_N = (−1)^N. (All the p_k's are basis independent.) By the fundamental theorem of algebra, proved in Sec. 5.9, the characteristic equation always has N roots or solutions λ_ℓ lying somewhere in the complex plane. Thus, the characteristic polynomial has the factored form
\[
P(\lambda, A) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_N - \lambda). \tag{1.286}
\]
For every root λ_ℓ, there is a non-zero eigenvector V^{(ℓ)} whose components V_k^{(ℓ)} are the coefficients that make the N vectors A_{ik} − λ_ℓ δ_{ik} that are the columns of the matrix A − λ_ℓ I sum to zero in (1.282). Thus, every N × N matrix has N eigenvalues λ_ℓ and N eigenvectors V^{(ℓ)}.

Setting λ = 0 in the factored form (1.286) of P(λ, A) and in the characteristic equation (1.285), we see that the determinant of every N × N matrix is the product of its N eigenvalues
\[
P(0, A) = |A| = p_0 = \lambda_1 \lambda_2 \cdots \lambda_N. \tag{1.287}
\]
These N roots usually are all different, and when they are, the eigenvectors
V^{(ℓ)} are linearly independent. This result is trivially true for N = 1. Let's assume its validity for N − 1 and deduce it for the case of N eigenvectors. If it were false for N eigenvectors, then there would be N numbers c_ℓ, not all zero, such that
\[
\sum_{\ell=1}^{N} c_\ell\,V^{(\ell)} = 0. \tag{1.288}
\]
We now multiply this equation from the left by the linear operator A and use the eigenvalue equation (1.281)
\[
A \sum_{\ell=1}^{N} c_\ell\,V^{(\ell)}
= \sum_{\ell=1}^{N} c_\ell\,A V^{(\ell)}
= \sum_{\ell=1}^{N} c_\ell\,\lambda_\ell\,V^{(\ell)} = 0. \tag{1.289}
\]
On the other hand, the product of equation (1.288) multiplied by λ_N is
\[
\sum_{\ell=1}^{N} c_\ell\,\lambda_N\,V^{(\ell)} = 0. \tag{1.290}
\]
When we subtract (1.290) from (1.289), the terms with ℓ = N cancel leaving
\[
\sum_{\ell=1}^{N-1} c_\ell\,(\lambda_\ell - \lambda_N)\,V^{(\ell)} = 0 \tag{1.291}
\]
in which all the factors (λ_ℓ − λ_N) are different from zero since by assumption all the eigenvalues are different. But this last equation says that N − 1 eigenvectors with different eigenvalues are linearly dependent, which contradicts our assumption that the result holds for N − 1 eigenvectors. This contradiction tells us that if the N eigenvectors of an N × N square matrix have different eigenvalues, then they are linearly independent.
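Both facts, the product rule (1.287) and the independence of eigenvectors with distinct eigenvalues, can be checked numerically. A numpy sketch with a made-up 2 × 2 matrix:

```python
import numpy as np

# For a matrix with distinct eigenvalues: the determinant equals the
# product of the eigenvalues (1.287), and the matrix whose columns are
# the eigenvectors is non-singular, i.e. the eigenvectors are independent.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
vals, vecs = np.linalg.eig(A)

assert np.isclose(vals.prod(), np.linalg.det(A))      # lambda_1 lambda_2 = |A|
assert not np.isclose(np.linalg.det(vecs), 0.0)       # independent eigenvectors
```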
An eigenvalue λ_ℓ that is a single root of the characteristic equation (1.285) is associated with a single eigenvector; it is called a simple eigenvalue. An eigenvalue λ_ℓ that is an nth root of the characteristic equation is associated with n eigenvectors; it is said to be an n-fold degenerate eigenvalue or to have algebraic multiplicity n. Its geometric multiplicity is the number n′ ≤ n of linearly independent eigenvectors with eigenvalue λ_ℓ. A matrix whose eigenvectors are linearly dependent is said to be defective.

Example: The 2 × 2 matrix
\[
\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \tag{1.292}
\]
has only one linearly independent eigenvector, (1, 0)^{\mathsf{T}}, and so is defective.
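A numerical eigensolver makes the defect visible: the eigenvalue 0 appears twice, but the computed eigenvector matrix is singular, signaling that only one independent eigenvector exists. A numpy sketch of the matrix (1.292):

```python
import numpy as np

# The defective matrix of (1.292): algebraic multiplicity 2 for the
# eigenvalue 0, but geometric multiplicity 1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
vals, vecs = np.linalg.eig(A)

assert np.allclose(vals, 0.0)                  # doubly degenerate eigenvalue
assert np.isclose(np.linalg.det(vecs), 0.0)    # eigenvectors linearly dependent
```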
Suppose A is an N × N matrix that is not defective. We may use its
N linearly independent eigenvectors V^(ℓ) = |ℓ⟩ to define the columns of an
N × N matrix S as

S_{kℓ} = ⟨k, 0|ℓ⟩   (1.293)

in which the vectors |k, 0⟩ are the basis in which A_{ik} = ⟨i, 0|A|k, 0⟩. The
inner product of the eigenvalue equation A V^(ℓ) = λ_ℓ V^(ℓ) with the bra ⟨i, 0|
is

⟨i, 0|A|ℓ⟩ = Σ_{k=1}^N ⟨i, 0|A|k, 0⟩⟨k, 0|ℓ⟩ = Σ_{k=1}^N A_{ik} S_{kℓ} = λ_ℓ S_{iℓ}.   (1.294)
Since the columns of S are linearly independent, the determinant of S does
not vanish (the matrix S is nonsingular), and so its inverse S⁻¹ is well
defined by (1.198). It follows that

Σ_{i,k=1}^N (S⁻¹)_{ni} A_{ik} S_{kℓ} = Σ_{i=1}^N λ_ℓ (S⁻¹)_{ni} S_{iℓ} = λ_ℓ δ_{nℓ}   (1.295)
or in matrix notation

S⁻¹ A S = A^(d)   (1.296)

in which A^(d) is the diagonal form of the matrix A, with its eigenvalues λ_ℓ
arranged along its main diagonal and zeros elsewhere. Equation (1.296) is
a similarity transformation. Any nondefective square matrix can
be diagonalized by a similarity transformation

A = S A^(d) S⁻¹.   (1.297)
By using the product rule (1.208), we see that the determinant of any non-
defective square matrix is the product of its eigenvalues

|A| = |S A^(d) S⁻¹| = |S| |A^(d)| |S⁻¹| = |S S⁻¹| |A^(d)| = |A^(d)| = Π_{ℓ=1}^N λ_ℓ   (1.298)
which is a special case of (1.287).
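This diagonalization is easy to check numerically. The sketch below (an illustration with numpy and an arbitrary 2 × 2 matrix, not part of the text) verifies that S⁻¹AS is the diagonal form A^(d) of equation (1.296) and that |A| is the product of the eigenvalues as in (1.298):

```python
import numpy as np

# An arbitrary nondefective matrix: its two eigenvalues are distinct.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

vals, S = np.linalg.eig(A)      # columns of S are the eigenvectors V^(l)
Ad = np.linalg.inv(S) @ A @ S   # S^-1 A S of equation (1.296)

assert np.allclose(Ad, np.diag(vals))               # diagonal form A^(d)
assert np.isclose(np.linalg.det(A), np.prod(vals))  # |A| = product of eigenvalues
```

Note that `np.linalg.eig` places the eigenvectors in the columns of `S`, exactly the convention of equation (1.293).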
1.28 A Matrix Obeys Its Characteristic Equation
Every square matrix obeys its characteristic equation (1.285). That is, the
characteristic equation

P(λ, A) = |A - λI| = Σ_{k=0}^N p_k λᵏ = 0   (1.299)

remains true when the matrix A replaces the unknown variable λ

P(A, A) = Σ_{k=0}^N p_k Aᵏ = 0.   (1.300)
To see why, we recall the formula (1.198) for the inverse of the matrix A - λI

(A - λI)⁻¹ = C(λ, A)ᵀ / |A - λI|   (1.301)

in which C(λ, A)ᵀ is the transpose of the matrix of cofactors of the matrix
A - λI. Since the determinant |A - λI| is the characteristic polynomial
P(λ, A), we have on rearranging

(A - λI) C(λ, A)ᵀ = P(λ, A) I.   (1.302)

The transpose of the matrix of cofactors of the matrix A - λI is a polynomial
in λ with matrix coefficients

C(λ, A)ᵀ = C₀ + C₁λ + ⋯ + C_{N-1}λ^{N-1}.   (1.303)

The left-hand side of equation (1.302) is then

(A - λI) C(λ, A)ᵀ = A C₀ + (A C₁ - C₀)λ + (A C₂ - C₁)λ² + ⋯
                  + (A C_{N-1} - C_{N-2})λ^{N-1} - C_{N-1}λᴺ.   (1.304)
Equating equal powers of λ on both sides of (1.302) and using (1.299)
and (1.304), we have

A C₀ = p₀ I
A C₁ - C₀ = p₁ I
A C₂ - C₁ = p₂ I
⋯
A C_{N-1} - C_{N-2} = p_{N-1} I
-C_{N-1} = p_N I.   (1.305)
We now multiply on the left the first of these equations by I, the second
by A, the third by A², ..., and the last by Aᴺ and then add the resulting
equations. All the terms on the left-hand sides cancel, while the sum of
those on the right gives P(A, A). Thus the matrix A obeys its characteristic
equation

0 = Σ_{k=0}^N p_k Aᵏ = |A| I + p₁A + ⋯ + (-1)^{N-1}(Tr A)A^{N-1} + (-1)ᴺAᴺ   (1.306)

a result known as the Cayley-Hamilton theorem (Arthur Cayley, 1821-1895,
and William Hamilton, 1805-1865). This derivation is due to Israel
Gelfand (1913-2009) (Gelfand, 1961, pp. 89-90).
Because every N × N matrix A obeys its characteristic equation, its Nth
power Aᴺ can be expressed as a linear combination of its lesser powers

Aᴺ = (-1)^{N-1} [ |A| I + p₁A + p₂A² + ⋯ + (-1)^{N-1}(Tr A)A^{N-1} ].   (1.307)

Thus the square A² of every 2 × 2 matrix is given by

A² = -|A| I + (Tr A) A.   (1.308)
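Both statements can be checked numerically. The sketch below (illustrative matrices, numpy assumed) verifies the 2 × 2 identity (1.308) directly and then evaluates the characteristic polynomial of a random 4 × 4 matrix at the matrix itself, using numpy's `np.poly` to obtain the coefficients of det(λI - B):

```python
import numpy as np

# The 2x2 case: A^2 = -|A| I + (Tr A) A, equation (1.308).
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
assert np.allclose(A @ A, -np.linalg.det(A) * np.eye(2) + np.trace(A) * A)

# The general case: a matrix obeys its own characteristic equation.
B = np.random.default_rng(0).normal(size=(4, 4))
p = np.poly(B)  # coefficients of det(lambda I - B), highest power first
P_of_B = sum(c * np.linalg.matrix_power(B, 4 - k) for k, c in enumerate(p))
assert np.allclose(P_of_B, np.zeros((4, 4)), atol=1e-8)
```

The sum P(B, B) vanishes to machine precision even though the individual powers Bᵏ do not, which is the content of the Cayley-Hamilton theorem.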
Example 1.4 (Spin-one-half rotation matrix): If θ is a real 3-vector and
σ is the 3-vector of Pauli matrices (1.35), then the square of the traceless
2 × 2 matrix A = θ·σ is

(θ·σ)² = ( θ₃         θ₁ - iθ₂ )²
         ( θ₁ + iθ₂   -θ₃      )   = θ² I   (1.309)

in which θ² = θ·θ. One may use this identity to show (problem 20) that

exp(-i θ·σ/2) = cos(θ/2) I - i θ̂·σ sin(θ/2)   (1.310)

in which θ̂ is a unit 3-vector. This matrix represents a right-handed rotation
of θ radians about the axis θ̂ for a spin-one-half object.
1.29 Functions of Matrices
What sense can we make of a function f of an N × N matrix A? And
how would we compute it? One way is to use the characteristic equation
(1.307) to express every power of A in terms of I, A, ..., A^{N-1} and the
coefficients p₀ = |A|, p₁, p₂, ..., p_{N-2}, and p_{N-1} = (-1)^{N-1} Tr A. Then if
f(x) is a polynomial or a function with a convergent power series

f(x) = Σ_{k=0}^∞ c_k xᵏ   (1.311)

in principle we may express f(A) in terms of N functions f_k(p) of the
coefficients p ≡ (p₀, ..., p_{N-1}) as

f(A) = Σ_{k=0}^{N-1} f_k(p) Aᵏ.   (1.312)

The identity (1.310) for exp(-i θ·σ/2) is an example of this technique for
N = 2, a technique that can become challenging for N > 3.
Example: In problem 21, one finds the characteristic equation (1.306)
for the 3 × 3 matrix -i θ·J, in which the generators are (J_k)_{ij} = i ε_{ikj} and
ε_{ijk} is totally antisymmetric with ε₁₂₃ = 1. These generators satisfy the
commutation relations [J_i, J_j] = i ε_{ijk} J_k, in which sums over repeated
indices from 1 to 3 are understood. In problem 22, one uses this characteristic
equation to show that the 3 × 3 real orthogonal matrix exp(-i θ·J), which
represents a right-handed rotation by θ radians about the axis θ̂, is

exp(-i θ·J) = cos θ I - i θ̂·J sin θ + (1 - cos θ) θ̂ θ̂ᵀ   (1.313)

or in terms of indices

exp(-i θ·J)_{ij} = δ_{ij} cos θ - sin θ ε_{ijk} θ̂_k + (1 - cos θ) θ̂_i θ̂_j.   (1.314)

Direct use of the characteristic equation can become unwieldy for larger
values of N. Fortunately, another trick is available if A is a nondefective
square matrix and if the power series (1.311) for f(x) converges. For then
A is related to its diagonal form A^(d) by a similarity transformation (1.297),
and we may define f(A) as
f(A) = S f(A^(d)) S⁻¹   (1.315)

in which f(A^(d)) is the diagonal matrix with entries f(a_ℓ)

f(A^(d)) = ( f(a₁)    0      ⋯     0     )
           (  0      f(a₂)   ⋯     0     )
           (  ⋮        ⋮             ⋮     )
           (  0       0      ⋯    f(a_N) )   (1.316)
the a_ℓ's being the eigenvalues of the matrix A.
This definition makes sense because we'd expect f(A) to be

f(A) = Σ_{n=0}^∞ c_n Aⁿ = Σ_{n=0}^∞ c_n (S A^(d) S⁻¹)ⁿ.   (1.317)

But since S⁻¹S = I, we have (S A^(d) S⁻¹)ⁿ = S (A^(d))ⁿ S⁻¹ and so

f(A) = S [ Σ_{n=0}^∞ c_n (A^(d))ⁿ ] S⁻¹ = S f(A^(d)) S⁻¹   (1.318)

which is (1.315).
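This definition of f(A) can be checked against the rotation example above. The sketch below (illustrative angle and axis; numpy assumed) computes exp(-i θ·J) as S exp(A^(d)) S⁻¹ and compares it with the closed form (1.313); note that -i θ·J is a real antisymmetric matrix, so its eigenvectors are complex but the exponential comes out real and orthogonal.

```python
import numpy as np

# Levi-Civita symbol with eps_123 = 1 and generators (J_k)_ij = i eps_ikj
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0
J = np.array([1j * eps[:, k, :] for k in range(3)])

theta = 0.9
n = np.array([1.0, 2.0, 2.0]) / 3.0               # unit axis
A = -1j * np.einsum('k,kij->ij', theta * n, J)    # -i theta.J, real antisymmetric

# f(A) = S f(A^(d)) S^-1 with f = exp, equation (1.315)
a, S = np.linalg.eig(A)
expA = S @ np.diag(np.exp(a)) @ np.linalg.inv(S)

# Closed form (1.313): cos(theta) I - i theta_hat.J sin(theta) + (1-cos theta) n n^T
R = (np.cos(theta) * np.eye(3)
     - 1j * np.sin(theta) * np.einsum('k,kij->ij', n, J)
     + (1 - np.cos(theta)) * np.outer(n, n))
assert np.allclose(expA, R)
assert np.allclose(R.real.T @ R.real, np.eye(3))  # a real orthogonal matrix
```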
Example: In quantum mechanics, the time-evolution operator is taken
to be the exponential exp(-iHt/ħ), in which H = H† is a hermitian linear
operator, the hamiltonian, named after William Rowan Hamilton (1805-1865),
and ħ = h/(2π) ≈ 10⁻³⁴ J·s, where h is the constant named after Max
Planck (1858-1947). As we'll see in the next section, hermitian operators
are never defective, so H can be diagonalized by a similarity transformation

H = S H^(d) S⁻¹.   (1.319)

The diagonal elements of the diagonal matrix H^(d) are the energies E_ℓ of
the states of the system described by the hamiltonian H. The time-evolution
operator U(t) then is

U(t) = S exp(-i H^(d) t/ħ) S⁻¹.   (1.320)

If the system has three states, then U(t) is

U(t) = S ( e^{-iω₁t}     0          0       )
         (  0         e^{-iω₂t}     0       ) S⁻¹   (1.321)
         (  0            0       e^{-iω₃t}  )

in which the angular frequencies are ω_ℓ = E_ℓ/ħ.
Example: For a system described by the density operator ρ, the entropy
S is defined as the trace

S = -k Tr(ρ ln ρ)   (1.322)

in which k = 1.38 × 10⁻²³ J/K is the constant named after Ludwig Boltzmann
(1844-1906). The density operator ρ is hermitian, non-negative, and
of unit trace. Since ρ is hermitian, the matrix that represents it is never
defective, and so that matrix can be diagonalized by a similarity transformation

ρ = S ρ^(d) S⁻¹.   (1.323)

Thus since the trace is cyclic (1.27), we may compute the entropy as

S = -k Tr(S ρ^(d) S⁻¹ S ln(ρ^(d)) S⁻¹) = -k Tr(ρ^(d) ln(ρ^(d))).   (1.324)

A vanishing eigenvalue ρ^(d)_k = 0 contributes nothing to this trace since
lim_{x→0} x ln x = 0. If the system has three states, populated with probabilities
ρ_i, the diagonal elements of ρ^(d), then the entropy is

S = -k (ρ₁ ln ρ₁ + ρ₂ ln ρ₂ + ρ₃ ln ρ₃)
  = k [ρ₁ ln(1/ρ₁) + ρ₂ ln(1/ρ₂) + ρ₃ ln(1/ρ₃)].   (1.325)
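A numerical sketch of this entropy computation (illustrative probabilities; entropy measured in units of k to keep the numbers of order one): a density matrix is built in a rotated basis and its entropy is recovered from its eigenvalues, as in (1.324).

```python
import numpy as np

# Three probabilities on the diagonal of rho^(d), then rotated to a generic basis.
p = np.array([0.5, 0.3, 0.2])
rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
Q, _ = np.linalg.qr(M)               # a random unitary matrix
rho = Q @ np.diag(p) @ Q.conj().T    # hermitian, non-negative, unit trace

# Entropy in units of k: S/k = -Tr(rho ln rho) = -sum rho_i ln rho_i, (1.324-1.325)
evals = np.linalg.eigvalsh(rho)
S_over_k = -np.sum(evals * np.log(evals))

assert np.isclose(np.trace(rho).real, 1.0)
assert np.isclose(S_over_k, -np.sum(p * np.log(p)))
```

The eigenvalues of ρ are exactly the original probabilities, so the basis-independent trace formula and the sum over the ρ_i agree.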
1.30 Hermitian Matrices
Hermitian matrices have especially nice properties. By definition (1.33),
a hermitian matrix A is square and unchanged by hermitian conjugation,
A† = A. Since it is square, the results of section 1.27 ensure that an N × N
hermitian matrix A has N eigenvectors |n⟩ with eigenvalues a_n

A|n⟩ = a_n|n⟩.   (1.326)
In fact, these eigenvalues are all real. To see why, we form the adjoint of
equation (1.326)

⟨n|A† = a_n*⟨n|   (1.327)

and use the property A† = A to find

⟨n|A = ⟨n|A† = a_n*⟨n|.   (1.328)

We now form the inner product of both sides of this equation with the ket
|n⟩ and use the eigenvalue equation (1.326) to get

⟨n|A|n⟩ = a_n⟨n|n⟩ = a_n*⟨n|n⟩   (1.329)

which tells us that the eigenvalues are real

a_n* = a_n.   (1.330)
Since A† = A, the matrix elements of A between two of its eigenvectors
satisfy

a_m*⟨m|n⟩ = (a_m⟨n|m⟩)* = ⟨n|A|m⟩* = ⟨m|A†|n⟩ = ⟨m|A|n⟩ = a_n⟨m|n⟩   (1.331)

which implies that

(a_m* - a_n)⟨m|n⟩ = 0.   (1.332)

But since all the eigenvalues of the hermitian matrix A are real, we have

(a_m - a_n)⟨m|n⟩ = 0.   (1.333)

This equation tells us that when the eigenvalues are different, the eigenvectors
are orthogonal. In the absence of a symmetry, all N eigenvalues usually
are different, and so the eigenvectors usually are mutually orthogonal.
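These two properties, real eigenvalues and orthogonal eigenvectors, are easy to see numerically. A sketch with numpy (the matrix is a random illustrative example; `eigh` is numpy's routine for hermitian matrices):

```python
import numpy as np

# Build a random hermitian matrix A = M + M^dagger.
rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = M + M.conj().T

a, V = np.linalg.eigh(A)   # eigh is designed for hermitian matrices

assert np.allclose(a.imag, 0)                   # eigenvalues are real (1.330)
assert np.allclose(V.conj().T @ V, np.eye(4))   # eigenvectors orthonormal (1.333)
assert np.allclose(A @ V, V @ np.diag(a))       # A|n> = a_n|n>, equation (1.326)
```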
When two or more eigenvectors |n⟩ of a hermitian matrix have the same
eigenvalue a_n, their eigenvalues are said to be degenerate. In this case, any
linear combination of the degenerate eigenvectors will also be an eigenvector
with the same eigenvalue a_n

A ( Σ_{n∈D} c_n|n⟩ ) = a_n ( Σ_{n∈D} c_n|n⟩ )   (1.334)

where D is the set of labels of the eigenvectors with the same eigenvalue.
If the degenerate eigenvectors |n⟩ are linearly independent, then we may
use the Gram-Schmidt procedure (1.110-1.122) to choose the coefficients
c_n so as to construct degenerate eigenvectors that are orthogonal to each
other and to the non-degenerate eigenvectors. We then may normalize these
mutually orthogonal eigenvectors.
But two related questions arise: Are the degenerate eigenvectors |n⟩
linearly independent? And if so, what orthonormal linear combinations of
them should we choose for a given physical problem? Let's consider the
second question first.
We saw in Sec. 1.22 that unitary transformations preserve the orthonormality
of a basis. Any unitary transformation U that commutes with the
matrix A

[A, U] = 0   (1.335)

maps each set of orthonormal degenerate eigenvectors of A into another
set of orthonormal degenerate eigenvectors of A with the same eigenvalue
because

A U|n⟩ = U A|n⟩ = a_n U|n⟩.   (1.336)

So there's a huge spectrum of choices for the orthonormal degenerate eigenvectors
of A with the same eigenvalue. What is the right set for a given
physical problem?
A sensible way to proceed is to add to the matrix A a second hermitian
matrix B multiplied by a tiny, real scale factor ε

A(ε) = A + εB.   (1.337)

The matrix B must completely break whatever symmetry led to the degeneracy
in the eigenvalues of A. Ideally, the matrix B should be one that
represents a modification of A that is physically plausible and relevant to
the problem at hand. The hermitian matrix A(ε) then will have N different
eigenvalues a_n(ε) and N orthonormal non-degenerate eigenvectors

A(ε)|n, ε⟩ = a_n(ε)|n, ε⟩.   (1.338)
These eigenvectors |n, ε⟩ of A(ε) are orthogonal to each other

⟨n, ε|n′, ε⟩ = δ_{nn′}   (1.339)

and to the eigenvectors of A(ε) with other eigenvalues, and they remain so
as we take the limit

|n⟩ = lim_{ε→0} |n, ε⟩.   (1.340)

We may choose them as the orthogonal degenerate eigenvectors of A. Since
one always may find a crooked hermitian matrix B that breaks any particular
symmetry, it follows that every N × N hermitian matrix A possesses N
orthonormal eigenvectors, which are complete in the vector space in which
A acts. (Any N linearly independent vectors span their N-dimensional
vector space, as explained in section 1.9.)
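The perturbative argument can be sketched numerically (an illustration, not from the text): the identity matrix is maximally degenerate, yet the eigenvectors of A + εB returned for a tiny ε form an orthonormal set that still diagonalizes A.

```python
import numpy as np

# A is maximally degenerate: every vector is an eigenvector with eigenvalue 1.
A = np.eye(3)

# Add a tiny multiple of a symmetry-breaking hermitian B, equation (1.337).
rng = np.random.default_rng(3)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
B = M + M.conj().T
eps = 1e-8
_, V = np.linalg.eigh(A + eps * B)

# The perturbed eigenvectors are orthonormal and remain eigenvectors of A.
assert np.allclose(V.conj().T @ V, np.eye(3))
assert np.allclose(A @ V, V)   # each column has eigenvalue 1
```

Which orthonormal set is selected depends on B, exactly as the text says: a physically motivated B picks out the basis appropriate to the problem.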
Now let's return to the first question and show that an N × N hermitian
matrix has N orthogonal eigenvectors. To do this, we'll first show that the
space of vectors orthogonal to an eigenvector |n⟩ of a hermitian operator A

A|n⟩ = λ|n⟩   (1.341)

is invariant under the action of A. We must show that if |y⟩ is any vector
orthogonal to the eigenvector |n⟩

⟨n|y⟩ = 0   (1.342)

then A|y⟩ also is orthogonal to |n⟩, that is, ⟨n|A|y⟩ = 0. We use successively
the definition of A†, the hermiticity of A, the eigenvector equation (1.341),
the definition of the inner product, and the reality of the eigenvalues of a
hermitian matrix:

⟨n|A|y⟩ = ⟨A†n|y⟩ = ⟨An|y⟩ = ⟨λn|y⟩ = λ*⟨n|y⟩ = λ⟨n|y⟩ = 0.   (1.343)

Thus the space of vectors orthogonal to an eigenvector of a hermitian operator
is invariant under it.

Now a hermitian operator A acting on an N-dimensional vector space S
is represented by an N × N hermitian matrix, and so it has at least one
eigenvector |1⟩. The subspace of S consisting of all vectors orthogonal to
|1⟩ is an (N-1)-dimensional vector space S_{N-1} that is invariant under the
action of A. On this space S_{N-1}, the operator A is represented by an
(N-1) × (N-1) hermitian matrix A_{N-1}. This matrix has at least one
eigenvector |2⟩. The subspace of S_{N-1} consisting of all vectors orthogonal
to |2⟩ is an (N-2)-dimensional vector space S_{N-2} that is invariant under
the action of A. On S_{N-2}, the operator A is represented by an
(N-2) × (N-2) hermitian matrix A_{N-2}, which has at least one eigenvector
|3⟩. By construction, the
vectors |1⟩, |2⟩, and |3⟩ are mutually orthogonal. Continuing in this way, we
see that A has N orthogonal eigenvectors |k⟩ for k = 1, 2, ..., N.

The N orthogonal eigenvectors |k⟩ of an N × N matrix A can be normalized
and used to write the N × N identity operator I as

I = Σ_{k=1}^N |k⟩⟨k|.   (1.344)
On multiplying from the left by the matrix A, we find

A = A I = A Σ_{k=1}^N |k⟩⟨k| = Σ_{k=1}^N a_k|k⟩⟨k|   (1.345)

which is the diagonal form of the hermitian matrix A. This expansion of A as
a sum over outer products of its eigenstates multiplied by their eigenvalues
is important in quantum mechanics. The expansion represents the possible
selective, non-destructive measurements of the physical quantity represented
by the matrix A.
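The expansion (1.345) can be sketched in numpy (a random hermitian matrix for illustration): summing eigenvalues times the outer products |k⟩⟨k| reconstructs A exactly.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
A = M + M.conj().T              # a hermitian matrix

a, V = np.linalg.eigh(A)

# A as a sum of eigenvalues times outer products |k><k|, equation (1.345)
A_rebuilt = sum(a[k] * np.outer(V[:, k], V[:, k].conj()) for k in range(3))
assert np.allclose(A_rebuilt, A)
```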
The hermitian matrix A is diagonal in the basis provided by its eigenstates
|k⟩

A_{kj} = ⟨k|A|j⟩ = a_k δ_{kj}.   (1.346)

But in any other basis |ℓ, o⟩, the matrix A appears as

A_{iℓ} = ⟨i, o|A|ℓ, o⟩ = Σ_{k=1}^N ⟨i, o|k⟩ a_k ⟨k|ℓ, o⟩.   (1.347)

The linear operator

U = Σ_{k=1}^N |k⟩⟨k, o|   (1.348)

is unitary because it maps the arbitrary orthonormal basis |k, o⟩ into the
orthonormal basis of eigenstates |k⟩. In the |k, o⟩ basis, U is the matrix
whose nth column is the N-tuple ⟨i, o|n⟩ that represents |n⟩ in the basis
|i, o⟩

U_{in} = ⟨i, o|U|n, o⟩ = ⟨i, o|n⟩.   (1.349)

So equation (1.347) tells us that an arbitrary N × N hermitian matrix A
can be diagonalized by a unitary transformation

A = U A^(d) U†.   (1.350)

Here A^(d) is the diagonal matrix A^(d)_{nm} = a_m δ_{nm}.
A matrix that is real and symmetric is hermitian; so is one that is
imaginary and antisymmetric.
A real, symmetric matrix R can be diagonalized by an orthogonal trans-
formation
R = OR(d)OT (1.351)
in which the matrix O is a real unitary matrix, that is, an orthogonal matrix
(1.258).
Example: Suppose we wish to find the eigenvalues of the real, symmetric
mass matrix

M = ( 0  m )
    ( m  M )   (1.352)

in which m is an ordinary mass and M is a huge mass. The eigenvalues μ
of this hermitian mass matrix satisfy the equation

det(M - μI) = μ(μ - M) - m² = 0   (1.353)

with solutions

μ± = (1/2) ( M ± √(M² + 4m²) ).   (1.354)

The larger mass μ₊ is approximately the huge mass M

μ₊ ≈ M + m²/M   (1.355)

and the smaller mass μ₋ is very tiny

μ₋ ≈ -m²/M.   (1.356)
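A quick numerical check of this example (the values of m and M below are illustrative, chosen so that m ≪ M): the eigenvalues of the mass matrix match both the exact roots (1.354) and the large-M approximations.

```python
import numpy as np

m, M_big = 1.0, 1.0e4                # illustrative masses with m << M
mass = np.array([[0.0, m],
                 [m, M_big]])

mu_minus, mu_plus = np.linalg.eigvalsh(mass)   # eigvalsh returns ascending order

# Exact roots (1.354)
assert np.isclose(mu_plus, 0.5 * (M_big + np.sqrt(M_big**2 + 4 * m**2)))
assert np.isclose(mu_minus, 0.5 * (M_big - np.sqrt(M_big**2 + 4 * m**2)))

# Large-M approximations: mu_+ ~ M + m^2/M and mu_- ~ -m^2/M
assert np.isclose(mu_plus, M_big + m**2 / M_big)
assert np.isclose(mu_minus, -m**2 / M_big)
```

The tiny negative eigenvalue, of order -m²/M, is the seesaw pattern: making M larger drives one eigenvalue up and the other toward zero.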