Physical Mathematics:What Physicists and Engineers Need to Know
Kevin Cahill
Department of Physics and Astronomy
University of New Mexico, Albuquerque, NM 87131-1156
Copyright © 2004–2010 Kevin Cahill
For Marie, Mike, Sean, Peter, Mia, and James, and
in honor of Muntader al-Zaidi and Julian Assange.
Contents
Preface
1. Linear Algebra
2. Fourier Series
3. Fourier and Laplace Transforms
4. Infinite Series
5. Complex-Variable Theory
6. Differential Equations
7. Integral Equations
8. Legendre Polynomials
9. Bessel Functions
10. Group Theory
11. Tensors and Local Symmetries
12. Forms
13. Probability and Statistics
14. Monte Carlos
15. Chaos and Fractals
16. Functional Derivatives
17. Path Integrals
18. The Renormalization Group
19. Finance
20. Strings
Preface
A word to students: You will find lots of physical examples crammed in
amongst the mathematics of this book. Don't let them bother you. As you
master the mathematics, you will learn some of the physics by osmosis, just
as people learn a foreign language by living in a foreign country.
This book has two goals. One is to teach mathematics in the context of
physics. Students of physics and engineering can learn both physics and
mathematics when they study mathematics with the help of physical exam-
ples and problems. The other goal is to explain succinctly those concepts of
mathematics that are simple and that help one understand physics. Linear
dependence and analyticity are simple and helpful. Elaborate convergence
tests for infinite series and exhaustive lists of the properties of special func-
tions are not. This mathematical triage does not always work: Whitney's
embedding theorem is helpful but not simple.
The book is intended to support a one- or two-semester course for graduate
students and advanced undergraduates. One could teach the first seven,
eight, or nine chapters in the first semester, and the other chapters in the
second semester.
Several friends and colleagues, especially Bernard Becker, Steven Boyd,
Robert Burckel, Colston Chandler, Vageli Coutsias, David Dunlap, Daniel
Finley, Franco Giuliani, Igor Gorelov, Dinesh Loomba, Michael Malik, Sud-
hakar Prasad, Randy Reeder, and Dmitri Sergatskov have given me valuable
advice.
The students in the courses in which I have developed this book have
improved it by asking questions, contributing ideas, suggesting topics, and
correcting mistakes. I am particularly grateful to Mss. Marie Cahill and
Toby Tolley and to Messrs. Chris Cesare, Robert Cordwell, Amo-Kwao
Godwin, Aram Gragossian, Aaron Hankin, Tyler Keating, Joshua Koch,
Ravi Raghunathan, Akash Rakholia, and Daniel Young for
ideas and questions, and to Mss. Tiffany Hayes and Sheng Liu and Messrs.
Thomas Beechem, Charles Cherqui, Aaron Hankin, Ben Oliker, Boleszek
Osinski, Ravi Raghunathan, Christopher Vergien, Zhou Yang, and Daniel
Zirzow for pointing out several typos.
1 Linear Algebra
1.1 Numbers
The natural numbers are the positive integers, with or without zero. Rational
numbers are ratios of integers. An irrational number x is one whose decimal
digits d_n in

x = Σ_{n=m_x}^∞ d_n 10^{−n}   (1.1)

do not repeat. Thus, the repeating decimals 1/2 = 0.50000... and
1/3 = 0.33333... are rational, while π = 3.141592654... is not. Incidentally,
decimal arithmetic was invented in India over 1500 years ago but was not
widely adopted in Europe until the seventeenth century.
The real numbers R include the rational numbers and the irrational numbers;
they correspond to all the points on an infinite line called the real line.

The complex numbers C are the real numbers with one new number i
whose square is −1. A complex number z is a linear combination of a real
number x and a real multiple iy of i

z = x + iy.   (1.2)

Here x = Re z is said to be the real part of z, and y = Im z the imaginary
part. One adds complex numbers by adding their real and imaginary parts

z_1 + z_2 = x_1 + iy_1 + x_2 + iy_2 = x_1 + x_2 + i(y_1 + y_2).   (1.3)

Since i² = −1, the product of two complex numbers is

z_1 z_2 = (x_1 + iy_1)(x_2 + iy_2) = x_1 x_2 − y_1 y_2 + i(x_1 y_2 + y_1 x_2).   (1.4)
The polar representation z = r exp(iθ) of a complex number z = x + iy is

z = x + iy = r e^{iθ} = r(cos θ + i sin θ)   (1.5)

in which r is the modulus of z

r = |z| = √(x² + y²)   (1.6)

and θ is its argument

θ = arctan(y/x).   (1.7)

Since exp(2πi) = 1, there is an inevitable ambiguity in the definition of the
argument of any complex number: the argument θ + 2πn gives the same z
as θ.

There are two common notations, z* and z̄, for the complex conjugate of
a complex number z = x + iy

z* = z̄ = x − iy.   (1.8)

The square of the modulus of a complex number z = x + iy is

|z|² = x² + y² = (x + iy)(x − iy) = z̄ z = z* z.   (1.9)

The inverse of a complex number z = x + iy is

z^{−1} = (x + iy)^{−1} = (x − iy) / [(x − iy)(x + iy)] = (x − iy)/(x² + y²) = z*/(z* z) = z*/|z|².   (1.10)
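These identities are easy to check numerically. A short Python sketch (an illustrative addition, not part of the original text; the sample value z = 3 + 4i is arbitrary):

```python
import cmath

z = 3.0 + 4.0j                      # an arbitrary complex number
r, theta = abs(z), cmath.phase(z)   # modulus (1.6) and argument (1.7)

# polar representation (1.5): z = r e^{i theta}
assert abs(r * cmath.exp(1j * theta) - z) < 1e-12

# squared modulus (1.9): |z|^2 = z* z
assert abs(z.conjugate() * z - r**2) < 1e-12

# inverse (1.10): 1/z = z* / |z|^2
assert abs(1 / z - z.conjugate() / r**2) < 1e-12
```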
Grassmann numbers θ_i are anti-commuting numbers, i.e., the anti-
commutator of any two Grassmann numbers vanishes

{θ_i, θ_j} ≡ [θ_i, θ_j]_+ ≡ θ_i θ_j + θ_j θ_i = 0.   (1.11)

In particular, the square of any Grassmann number is zero

θ_i² = 0.   (1.12)

One may show that any power series in N Grassmann numbers θ_i is a
polynomial whose highest term is proportional to the product θ_1 θ_2 ... θ_N.
For instance, the most complicated power series in two Grassmann numbers

f(θ_1, θ_2) = Σ_{n=0}^∞ Σ_{m=0}^∞ f_{nm} θ_1^n θ_2^m   (1.13)

is just

f(θ_1, θ_2) = f_0 + f_1 θ_1 + f_2 θ_2 + f_{12} θ_1 θ_2.   (1.14)
1.2 Arrays
An array is an ordered set of numbers. Arrays play big roles in computer
science, physics, and mathematics. They can be of any (integral) dimension.
A one-dimensional array (a1, a2, . . . , an) is variously called an n-tuple,
a row vector when written horizontally, a column vector when written
vertically, or an n-vector. The numbers ak are its entries or components.
A two-dimensional array a_{ik} with i running from 1 to n and k from 1 to
m is an n × m matrix. The numbers a_{ik} are called its entries, elements,
or matrix elements. One can think of a matrix as a stack of row vectors
or as a queue of column vectors. The entry a_{ik} is in the ith row and kth
column.
One can add together arrays of the same dimension and shape by adding
their entries. Two n-tuples add as

(a_1, ..., a_n) + (b_1, ..., b_n) = (a_1 + b_1, ..., a_n + b_n)   (1.15)

and two n × m matrices a and b add as

(a + b)_{ik} = a_{ik} + b_{ik}.   (1.16)

One can multiply arrays by numbers: thus z times the three-dimensional
array a_{ijk} is the array with entries z a_{ijk}.

One can multiply two arrays together no matter what their shapes and
dimensions. The outer product of an n-tuple a and an m-tuple b is an
n × m matrix with elements

(a b)_{ik} = a_i b_k   (1.17)

or an m × n matrix with entries (b a)_{ki} = b_k a_i. If a and b are complex, then
one also can form the outer products

(a* b)_{ik} = a_i* b_k   (1.18)

and (b* a)_{ki} = b_k* a_i. The (outer) product of a matrix a_{ik} and a three-
dimensional array b_{jℓm} is a five-dimensional array

(a b)_{ikjℓm} = a_{ik} b_{jℓm}.   (1.19)

An inner product is possible when two arrays are of the same size in
one of their dimensions. Thus the inner product (a, b) ≡ ⟨a|b⟩ or dot
product a · b of two real n-tuples a and b is

(a, b) = ⟨a|b⟩ = a · b = (a_1, ..., a_n) · (b_1, ..., b_n) = a_1 b_1 + ··· + a_n b_n.   (1.20)
The inner product of two complex n-tuples is defined as

(a, b) = ⟨a|b⟩ = a* · b = (a_1, ..., a_n)* · (b_1, ..., b_n) = a_1* b_1 + ··· + a_n* b_n   (1.21)

or as its complex conjugate

(a, b)* = ⟨a|b⟩* = (a* · b)* = (b, a) = ⟨b|a⟩ = b* · a   (1.22)

so that (a, a) ≥ 0.

The product of an m × n matrix a_{ik} times an n-tuple b_k is the m-tuple b′
whose ith component is

b′_i = a_{i1} b_1 + a_{i2} b_2 + ··· + a_{in} b_n = Σ_{k=1}^{n} a_{ik} b_k   (1.23)

or simply b′ = a b in matrix notation.

If the size of the second dimension of a matrix a matches that of the first
dimension of a matrix b, then their product a b is the matrix with entries

(a b)_{iℓ} = a_{i1} b_{1ℓ} + ··· + a_{in} b_{nℓ}.   (1.24)
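Equations (1.17–1.24) correspond directly to numpy operations. A brief sketch (an illustrative addition, with arbitrary sample arrays):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])            # a 3-tuple
b = np.array([4.0, 5.0])                 # a 2-tuple

outer = np.outer(a, b)                   # (a b)_{ik} = a_i b_k, Eq. (1.17)
assert outer.shape == (3, 2)
assert np.isclose(outer[1, 0], a[1] * b[0])

m = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])          # a 2 x 3 matrix
# matrix times n-tuple, Eq. (1.23): each component is a row dotted into a
assert np.allclose(m @ a, [m[0] @ a, m[1] @ a])

# complex inner product, Eq. (1.21): (u, v) = u* . v
u = np.array([1.0 + 1.0j, 2.0])
v = np.array([3.0, 1.0 - 1.0j])
assert np.isclose(np.vdot(u, v), np.conj(u) @ v)   # vdot conjugates its first argument
```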
1.3 Matrices
Apart from n-tuples, the most important arrays in linear algebra are the
two-dimensional arrays called matrices.
The trace of an n × n matrix a is the sum of its diagonal elements

Tr a = tr a = a_{11} + a_{22} + ··· + a_{nn} = Σ_{i=1}^{n} a_{ii}.   (1.25)

The trace of the product of two matrices is independent of their order

Tr(a b) = Σ_{i=1}^{n} Σ_{k=1}^{n} a_{ik} b_{ki} = Σ_{k=1}^{n} Σ_{i=1}^{n} b_{ki} a_{ik} = Tr(b a).   (1.26)

It follows that the trace is cyclic

Tr(a b ... z) = Tr(b ... z a).   (1.27)

(Here we take for granted that the elements of these matrices are ordinary
numbers that commute with each other.)
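The cyclic property (1.27) is easy to verify numerically; a quick sketch with random matrices (an illustrative addition, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = (rng.standard_normal((4, 4)) for _ in range(3))

# Tr(ab) = Tr(ba), Eq. (1.26)
assert np.isclose(np.trace(a @ b), np.trace(b @ a))

# cyclic, not arbitrary, permutations: Tr(abc) = Tr(bca) = Tr(cab), Eq. (1.27)
t = np.trace(a @ b @ c)
assert np.isclose(t, np.trace(b @ c @ a))
assert np.isclose(t, np.trace(c @ a @ b))
# note: Tr(acb) in general differs from Tr(abc)
```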
The transpose of an n × ℓ matrix a is the ℓ × n matrix aᵀ with entries

(aᵀ)_{ij} = a_{ji}.   (1.28)

Some mathematicians use a prime to mean transpose, as in a′ = aᵀ, but
physicists tend to use a prime to mean "different." One may show that

(a b)ᵀ = bᵀ aᵀ.   (1.29)

A matrix that is equal to its transpose

a = aᵀ   (1.30)

is symmetric.

The (hermitian) adjoint of a matrix is the complex conjugate of its trans-
pose (Charles Hermite, 1822–1901). That is, the (hermitian) adjoint a† of
an N × L complex matrix a is the L × N matrix with entries

(a†)_{ij} = (a_{ji})* = a*_{ji}.   (1.31)

One may show that

(a b)† = b† a†.   (1.32)

A matrix that is equal to its adjoint

(a†)_{ij} = (a_{ji})* = a*_{ji} = a_{ij}   (1.33)

(and which therefore must be a square matrix) is said to be hermitian or
self adjoint

a = a†.   (1.34)
Example: The three Pauli matrices

      ( 0  1 )         ( 0  −i )              ( 1   0 )
σ_1 = ( 1  0 ),  σ_2 = ( i   0 ),  and  σ_3 = ( 0  −1 )   (1.35)

are all hermitian (Wolfgang Pauli, 1900–1958). A real hermitian matrix is
symmetric. If a matrix a is hermitian, then the quadratic form

⟨v|a|v⟩ = Σ_{i=1}^{N} Σ_{j=1}^{N} v_i* a_{ij} v_j ∈ R   (1.36)

is real for all complex N-tuples v.
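One can check the hermiticity of the Pauli matrices and the reality of the quadratic form (1.36) numerically (an illustrative sketch; the sample vector v is arbitrary):

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

for s in (s1, s2, s3):
    assert np.allclose(s, s.conj().T)    # hermitian: sigma = sigma^dagger, Eq. (1.34)

v = np.array([1.0 + 2.0j, 0.5 - 1.0j])   # an arbitrary complex 2-tuple
for s in (s1, s2, s3):
    q = np.vdot(v, s @ v)                # <v|sigma|v>, Eq. (1.36)
    assert abs(q.imag) < 1e-12           # the quadratic form is real
```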
The Kronecker delta δ_{ik} is defined to be unity if i = k and zero if i ≠ k
(Leopold Kronecker, 1823–1891). In terms of it, the n × n identity matrix
I is the matrix with entries I_{ik} = δ_{ik}.

The inverse a⁻¹ of an n × n matrix a is a matrix that satisfies

a⁻¹ a = a a⁻¹ = I   (1.37)

in which I is the n × n identity matrix.

So far we have been writing n-tuples and matrices and their elements with
lower-case letters. It is equally common to use capital letters, and we will
do so for the rest of this section.
A matrix U whose adjoint U† is its inverse

U† U = U U† = I   (1.38)

is unitary. Unitary matrices are square.

A real unitary matrix O is orthogonal and obeys the rule

Oᵀ O = O Oᵀ = I.   (1.39)

Orthogonal matrices are square.

An N × N hermitian matrix A is said to be non-negative

A ≥ 0   (1.40)

if for all complex vectors V the quadratic form

⟨V|A|V⟩ = Σ_{i=1}^{N} Σ_{j=1}^{N} V_i* A_{ij} V_j ≥ 0   (1.41)

is non-negative. A similar rule

⟨V|A|V⟩ > 0   (1.42)

for all non-zero |V⟩ defines a positive or positive-definite matrix (A > 0),
although people often use these terms to describe non-negative matrices.
Examples: The non-symmetric, non-hermitian 2 × 2 matrix

(  1  1 )
( −1  1 )   (1.43)

is positive on the space of all real 2-vectors but not on the space of all
complex 2-vectors.

The 2 × 2 matrix

( 0  −1 )
( 1   0 )   (1.44)

provides a representation of i since

( 0  −1 ) ( 0  −1 )   ( −1   0 )
( 1   0 ) ( 1   0 ) = (  0  −1 ) = −I.   (1.45)

The 2 × 2 matrix

( 0  1 )
( 0  0 )   (1.46)

provides a representation of a Grassmann number since

( 0  1 ) ( 0  1 )   ( 0  0 )
( 0  0 ) ( 0  0 ) = ( 0  0 ) = 0.   (1.47)

To represent two Grassmann numbers one needs 4 × 4 matrices, such as

      ( 0  1  0  0 )             ( 0   0  0  0 )
θ_1 = ( 0  0  0  0 )  and  θ_2 = ( 0   0  0  0 )   (1.48)
      ( 0  0  0  1 )             ( 1   0  0  0 )
      ( 0  0  0  0 )             ( 0  −1  0  0 )
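These 4 × 4 matrices, with the sign conventions of Eq. (1.48), can be checked numerically (an illustrative sketch, not part of the original text):

```python
import numpy as np

t1 = np.array([[0, 1, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 0, 0]], dtype=float)
t2 = np.array([[0, 0, 0, 0],
               [0, 0, 0, 0],
               [1, 0, 0, 0],
               [0, -1, 0, 0]], dtype=float)

zero = np.zeros((4, 4))
assert np.allclose(t1 @ t1, zero)            # theta_1^2 = 0, Eq. (1.12)
assert np.allclose(t2 @ t2, zero)            # theta_2^2 = 0
assert np.allclose(t1 @ t2 + t2 @ t1, zero)  # {theta_1, theta_2} = 0, Eq. (1.11)
```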
1.4 Vectors
Vectors are things that can be multiplied by numbers and added together
to form other vectors in the same vector space. So if U and V are vectors
in a vector space S over a set F of numbers and x and y are numbers in F ,
then
W = xU + yV (1.49)
is also a vector in the vector space S.
A basis for a vector space S is a set of vectors B_k for k = 1 ... N in terms
of which every vector U in S can be expressed as a linear combination

U = u_1 B_1 + u_2 B_2 + ··· + u_N B_N   (1.50)

with numbers u_k in F. These numbers u_k are the components of the vector
U in the basis B_k.
Example: Suppose the vector W represents a certain kind of washer and
the vector N represents a certain kind of nail. Then if n and m are natural
numbers, the vector
H = nW +mN (1.51)
would represent a possible inventory of a very simple hardware store. The
vector space of all such vectors H would include all possible inventories of
the store. That space is a two-dimensional vector space over the natural
numbers, and the two vectors W and N form a basis for it.
Example: The complex numbers are a vector space. Two of its vectors
are the number 1 and the number i; the vector space of complex numbers is
then the set of all linear combinations

z = x·1 + y·i = x + iy.   (1.52)

So the complex numbers form a two-dimensional vector space over the real
numbers, and the vectors 1 and i form a basis for it.
The complex numbers also form a one-dimensional vector space over the
complex numbers. Here any non-zero real or complex number, for instance
the number 1, can be a basis consisting of the single vector 1. This one-
dimensional vector space is the set of all z = z·1 for arbitrary complex z.
Example: Ordinary flat two-dimensional space is the set of all linear
combinations

r = x x̂ + y ŷ   (1.53)

in which x and y are real numbers and x̂ and ŷ are perpendicular vectors of
unit length (unit vectors). This vector space, called R², is a 2-d space over
the reals.

Note that the same vector r can be described either by the basis vectors
x̂ and ŷ or by any other set of basis vectors, such as −ŷ and x̂

r = x x̂ + y ŷ = −y(−ŷ) + x x̂.   (1.54)

So the components of the vector r are (x, y) in the {x̂, ŷ} basis and (−y, x) in
the {−ŷ, x̂} basis. Each vector is unique, but its components depend
upon the basis.
Example: Ordinary flat three-dimensional space is the set of all linear
combinations

r = x x̂ + y ŷ + z ẑ   (1.55)

in which x, y, and z are real numbers. It is a 3-d space over the reals.

Example: Arrays of a given dimension and size can be added and multi-
plied by numbers, and so they form a vector space. For instance, all complex
three-dimensional arrays a_{ijk} in which 1 ≤ i ≤ 3, 1 ≤ j ≤ 4, and 1 ≤ k ≤ 5
form a vector space over the complex numbers.
Example: Derivatives are vectors; so are partial derivatives. For in-
stance, the linear combinations of x and y partial derivatives taken at
x = y = 0

a ∂/∂x + b ∂/∂y   (1.56)

form a vector space.

Example: The space of all linear combinations of a set of functions f_i(x)
defined on an interval [a, b]

f(x) = Σ_i z_i f_i(x)   (1.57)

is a vector space over the space of the numbers {z_i}.

Example: In quantum mechanics, a state is represented by a vector,
often written as ψ or in Dirac's notation as |ψ⟩. If c_1 and c_2 are complex
numbers, and |ψ_1⟩ and |ψ_2⟩ are any two states, then the linear combination

|ψ⟩ = c_1|ψ_1⟩ + c_2|ψ_2⟩   (1.58)

also is a possible state of the system.
1.5 Linear Operators
A linear operator A is a map that takes any vector U in its domain into
another vector U′ = A(U) ≡ AU in a way that is linear. So if U and V are
two vectors in the domain of the linear operator A and b and c are two real
or complex numbers, then

A(bU + cV) = bA(U) + cA(V) = bAU + cAV.   (1.59)

In the most important case, the operator A maps vectors in a vector space
S into vectors in the same space S. In this case, A maps each basis vector
B_i for the space S into a linear combination of these basis vectors B_k

A B_i = a_{1i} B_1 + a_{2i} B_2 + ··· + a_{Ni} B_N = Σ_{k=1}^{N} a_{ki} B_k.   (1.60)

The square matrix a_{ki} represents the linear operator A in the B_k basis.
The effect of A on any vector U = u_1 B_1 + u_2 B_2 + ··· + u_N B_N in S then is

A U = A( Σ_{i=1}^{N} u_i B_i ) = Σ_{i=1}^{N} u_i A B_i = Σ_{i=1}^{N} u_i Σ_{k=1}^{N} a_{ki} B_k
    = Σ_{k=1}^{N} ( Σ_{i=1}^{N} a_{ki} u_i ) B_k.   (1.61)

Thus the kth component u′_k of the vector U′ = AU is

u′_k = a_{k1} u_1 + a_{k2} u_2 + ··· + a_{kN} u_N = Σ_{i=1}^{N} a_{ki} u_i.   (1.62)

Thus the column vector u′ of the components u′_k of the vector U′ = AU
is the product u′ = a u of the matrix with elements a_{ki} that represents the
linear operator A in the B_k basis with the column vector with components
u_i that represents the vector U in that basis. So in a given basis, vectors
and linear operators can be identified with column vectors and matrices.
Each linear operator is unique, but its matrix depends upon the
basis. Suppose we change from the B_k basis to another basis B′_k

B_k = Σ_{ℓ=1}^{N} u_{ℓk} B′_ℓ   (1.63)

in which the N × N matrix u_{ℓk} has an inverse matrix u⁻¹_{ki} so that

Σ_{k=1}^{N} u⁻¹_{ki} B_k = Σ_{k=1}^{N} u⁻¹_{ki} Σ_{ℓ=1}^{N} u_{ℓk} B′_ℓ = Σ_{ℓ=1}^{N} ( Σ_{k=1}^{N} u_{ℓk} u⁻¹_{ki} ) B′_ℓ = Σ_{ℓ=1}^{N} δ_{ℓi} B′_ℓ = B′_i.
   (1.64)

Then the other basis vectors are given by

B′_i = Σ_{k=1}^{N} u⁻¹_{ki} B_k   (1.65)

and one may show (problem 3) that the action of the linear operator A on
this basis vector is

A B′_i = Σ_{j,k,ℓ=1}^{N} u_{ℓj} a_{jk} u⁻¹_{ki} B′_ℓ   (1.66)

which shows that the matrix a′ that represents A in the B′ basis is related
to the matrix a that represents it in the B basis by

a′_{ℓi} = Σ_{j,k=1}^{N} u_{ℓj} a_{jk} u⁻¹_{ki}   (1.67)

which in matrix notation is simply a′ = u a u⁻¹.

Example: Suppose the action of the linear operator A on the basis
{B_1, B_2} is A B_1 = B_2 and A B_2 = 0. If the column vectors

b_1 = ( 1 )  and  b_2 = ( 0 )   (1.68)
      ( 0 )             ( 1 )

represent the two basis vectors B_1 and B_2, then the matrix

a = ( 0  0 )
    ( 1  0 )   (1.69)

would represent the linear operator A. But if we let the column vectors

b′_1 = ( 1 )  and  b′_2 = ( 0 )   (1.70)
       ( 0 )              ( 1 )

represent the basis vectors

B′_1 = (1/√2)(B_1 + B_2)
B′_2 = (1/√2)(B_1 − B_2)   (1.71)

then the vectors

b_1 = (1/√2) ( 1 )  and  b_2 = (1/√2) (  1 )   (1.72)
             ( 1 )                    ( −1 )

would represent B_1 and B_2, and so the matrix

a′ = (1/2) (  1   1 )
           ( −1  −1 )   (1.73)

would represent the linear operator A.
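The change-of-basis rule a′ = u a u⁻¹ of Eq. (1.67) reproduces the matrix (1.73) of this example. A numerical sketch (an illustrative addition; here u is built from B_k = Σ_ℓ u_{ℓk} B′_ℓ, i.e., the old basis expanded in the new one):

```python
import numpy as np

a = np.array([[0.0, 0.0],
              [1.0, 0.0]])                  # A in the {B1, B2} basis, Eq. (1.69)

# u_{lk} expresses the old basis in the new one: B_k = sum_l u_{lk} B'_l
u = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2.0)

a_prime = u @ a @ np.linalg.inv(u)          # a' = u a u^{-1}, Eq. (1.67)
expected = 0.5 * np.array([[1.0, 1.0],
                           [-1.0, -1.0]])   # Eq. (1.73)
assert np.allclose(a_prime, expected)
```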
A linear operator A also may map a vector space S with basis B_k into a
different vector space T with a different basis C_k. In this case, A maps the
basis vector B_i into a linear combination of the basis vectors C_k

A B_i = Σ_{k=1}^{M} a_{ki} C_k   (1.74)

and an arbitrary vector U = u_1 B_1 + ··· + u_N B_N into

A U = Σ_{k=1}^{M} ( Σ_{i=1}^{N} a_{ki} u_i ) C_k.   (1.75)

We'll return to this point in Sections 1.14 and 1.15.
1.6 Inner Products
Most, but not all, of the vector spaces used by physicists have an inner
product. An inner product is a function that associates a number (f, g)
with every ordered pair of vectors f & g in the vector space in such a way
as to satisfy these rules:
(f, g) = (g, f) (1.76)(f, z1g1 + z2g2) = z1(f, g1) + z2(f, g2) (1.77)
(z1f1 + z2f2, g) = z1(f1, g) + z
2(f2, g) (1.78)
(f, f) 0 (1.79)in which the f s and gs are vectors and the zs are numbers. The first two
rules require that the inner product be linear in the second vector of the
12 Linear Algebra
pair and anti-linear in the first vector of the pair. (The third rule follows
from the first two.) If, in addition, the only vector f that has a vanishing
inner product with itself is the zero vector
(f, f) = 0 if and only if f = 0 (1.80)
then the inner product is hermitian or non degenerate; otherwise it is
semi-definite or degenerate.
The inner product of a vector f with itself is the square of the norm
|f | = f of the vector|f |2 = f 2= (f, f) (1.81)
and so by (1.79), the norm is well-defined as
f =
(f, f). (1.82)
The distance between two vectors f and g is the norm of their difference
‖f − g‖.   (1.83)

Example: The space of real vectors V with N components V_i forms an
N-dimensional vector space over the real numbers with inner product

(U, V) = Σ_{i=1}^{N} U_i V_i.   (1.84)

If the inner product (U, V) is zero, then the two vectors are orthogonal. If
(U, U) = 0, then

(U, U) = Σ_{i=1}^{N} U_i² = 0   (1.85)

which implies that all U_i = 0, so the vector U = 0. So this inner product is
hermitian or non-degenerate.

Example: The space of complex vectors V with N components V_i forms an
N-dimensional vector space over the complex numbers with inner product

(U, V) = Σ_{i=1}^{N} U_i* V_i.   (1.86)

If the inner product (U, V) is zero, then the two vectors are orthogonal. If
(U, U) = 0, then

(U, U) = Σ_{i=1}^{N} U_i* U_i = Σ_{i=1}^{N} |U_i|² = 0   (1.87)

which implies that all U_i = 0, and so the vector U is zero. So this inner
product is hermitian or non-degenerate.

Example: For the vector space of N × L complex matrices A, B, ..., the
trace of the product of the adjoint (1.31) of A times B is a natural inner
product

(A, B) = Tr A†B = Σ_{i=1}^{N} Σ_{j=1}^{L} (A†)_{ji} B_{ij} = Σ_{i=1}^{N} Σ_{j=1}^{L} A_{ij}* B_{ij}.   (1.88)

Note that (A, A) is positive

(A, A) = Tr A†A = Σ_{i=1}^{N} Σ_{j=1}^{L} A_{ij}* A_{ij} = Σ_{i=1}^{N} Σ_{j=1}^{L} |A_{ij}|² ≥ 0   (1.89)

and zero only when A = 0.
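The trace inner product (1.88–1.89) translates directly into numpy (an illustrative sketch with arbitrary random matrices, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
B = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))

ip = np.trace(A.conj().T @ B)                 # (A, B) = Tr A†B, Eq. (1.88)
assert np.isclose(ip, np.sum(A.conj() * B))   # = sum_ij A*_ij B_ij

norm2 = np.trace(A.conj().T @ A)              # (A, A), Eq. (1.89)
assert norm2.real >= 0 and abs(norm2.imag) < 1e-12
```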
Two examples of degenerate or semi-definite inner products are given in
the section (1.41) on correlation functions.

Mathematicians call a vector space with an inner product (1.76–1.79) an
inner-product space, a metric space, and a pre-Hilbert space.

A sequence of vectors f_n is a Cauchy sequence if for every ε > 0 there
is an integer N(ε) such that ‖f_n − f_m‖ < ε whenever both n and m exceed
N(ε). A sequence of vectors f_n converges to a vector f if for every ε > 0
there is an integer N(ε) such that ‖f − f_n‖ < ε whenever n exceeds N(ε). An
inner-product space with a norm defined as in (1.82) is complete if each of
its Cauchy sequences converges to a vector in that space. A Hilbert space
is a complete inner-product space. Every finite-dimensional inner-product
space is complete and so is a Hilbert space. But the term Hilbert space
more often is used to describe infinite-dimensional complete inner-product
spaces, such as the space of all square-integrable functions (David Hilbert,
1862–1943).

Example 1.1 (The Hilbert Space of Square-Integrable Functions) For the
vector space of functions (1.57), a natural inner product is

(f, g) = ∫_a^b dx f*(x) g(x).   (1.90)

The squared norm ‖f‖ of a function f(x) is

‖f‖² = ∫_a^b dx |f(x)|².   (1.91)

A function is said to be square integrable if its norm is finite. The space of
all square-integrable functions is an inner-product space; it is also complete
and so is a Hilbert space.
1.7 Schwarz Inequality
Since by (1.79) the inner product of a vector with itself cannot be negative,
it follows that for any vectors f and g and any complex number z = x+ iy
the inner product
P (x, y) = (f +zg, f+zg) = (f, f)+zz(g, g)+z(g, f)+z(f, g) 0 (1.92)is positive or zero. It even is non-negative at its minimum, which we may
find by differentiation
0 =P (x, y)
x=P (x, y)
y(1.93)
to be at
x = Re(f, g)/(g, g) & y = Im(f, g)/(g, g) (1.94)as long as (g, g) > 0. If we substitute these values into Eq.(1.92), then we
arrive at the relation
(f, f)(g, g) |(f, g)|2 (1.95)which is called variously the Cauchy-Schwarz inequality and the Schwarz
inequality. Equivalently
f g |(f, g)|. (1.96)If the inner product is degenerate and (g, g) = 0, then the non-negativity of
(f + zg, f + zg) implies that (f, g) = 0, in which case the Schwarz inequality
is trivially satisfied.
Example: For the dot-product of two real 3-vectors r & R, the Cauchy-
Schwarz inequality is
(r r) (R R) (r R)2 = (r r) (R R) cos2 (1.97)where is the angle between r and R.
Example: For two real n-vectors x and y, the Schwarz inequality is
(x x) (y y) (x y)2 = (x x) (y y) cos2 (1.98)and it implies (problem 5) that
x+ y x+ y. (1.99)
1.8 Linear Independence and Completeness 15
Example: For two complex n-vectors u and v, the Schwarz inequality is
(u u) (v v) |u v|2 = (u u) (v v) cos2 (1.100)and it implies (problem 6) that
u+ v u+ v. (1.101)Example: For the inner product (1.90) of two complex functions f and
g, the Schwarz inequality is badx |f(x)|2
badx |g(x)|2
badx f(x)g(x)
2 . (1.102)
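Both the Schwarz inequality (1.95) and the triangle inequalities (1.99) and (1.101) it implies are easy to test numerically (an illustrative sketch with arbitrary random complex n-vectors):

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.standard_normal(5) + 1j * rng.standard_normal(5)
v = rng.standard_normal(5) + 1j * rng.standard_normal(5)

inner = np.vdot(u, v)                        # (u, v) = u* . v

# Schwarz inequality (1.95): (u, u)(v, v) >= |(u, v)|^2
assert np.vdot(u, u).real * np.vdot(v, v).real >= abs(inner) ** 2

# triangle inequality (1.101): ||u + v|| <= ||u|| + ||v||
assert np.linalg.norm(u + v) <= np.linalg.norm(u) + np.linalg.norm(v)
```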
1.8 Linear Independence and Completeness
A set of N vectors Vi is linearly dependent if there exist numbers ci, not
all zero, such that the linear combination ciVi vanishes
Ni=1
ciVi = 0. (1.103)
A set of vectors Vi is linearly independent if it is not linearly dependent.
A set {Vi} of linearly independent vectors is maximal in a vector spaceS if the addition of any other vector V S to the set {Vi} makes the set{V, Vi} linearly dependent.
A set {Vi} of N linearly independent vectors that is maximal in a vectorspace S spans that space. For if V is any vector in S (and not one of the
vectors Vi), then the set {V, Vi} linearly dependent. Thus there are numbersc, ci, not all zero, that make the sum
cV +Ni=1
ciVi = 0 (1.104)
vanish. Now if c were 0, then the set {Vi} would be linearly dependent.Thus c 6= 0, and so we may divide by it and express the arbitrary vector Vas a linear combination of the vectors Vi
V = 1c
Ni=1
ciVi. (1.105)
So the set of vectors {Vi} spans the space S; it is a complete set of vectorsin the space S.
16 Linear Algebra
A set of vectors {Vi} that is complete in a vector space S is said to providea basis for that space because the set affords a way to expand an arbitrary
vector in S as a linear combination of the basis vectors {Vi}. If the vectorsof basis are linearly dependent, then at least one of them is superfluous; thus
it is convenient to have the vectors of a basis be linearly independent.
1.9 Dimension of a Vector Space
Suppose {Vi|i = 1 . . . N} and {Ui|i = 1 . . .M} are two sets of N and Mmaximally linearly independent vectors in a space S. Then N = M .
Suppose M < N . Since the U s are complete, as explained in Sec. 1.8, we
may express each of the N vectors Vi in terms of the M vectors Uj
Vi =Mj=1
AijUj . (1.106)
Let Aj be the vector with components Aij ; there are M < N such vectors,
and each has N > M components. So it is always possible to find a non-
zero N -dimensional vector C with components ci that is orthogonal to all
M vectors Aj :
Ni=1
ciAij = 0. (1.107)
But then the linear combination
Ni=1
ciVi =Ni=1
Mj=1
ciAij Uj = 0 (1.108)
vanishes, which would imply that the N vectors Vi were linearly dependent.
Since these vectors are by assumption linearly independent, it follows that
N M .Similarly, one may show that M N . Thus M = N .The number N of vectors in a maximal linearly independent set of a vector
space S is the dimension of the vector space. Any N linearly independent
vectors in an N -dimensional space forms a basis for it.
1.10 Orthonormal Vectors
Suppose the vectors {Vi|i = 1 . . . N} are linearly independent. Then we maymake out of them a set of N vectors Ui that are orthonormal
(Ui, Uj) = ij . (1.109)
1.10 Orthonormal Vectors 17
Procedure (Gramm-Schmidt): We set
U1 =V1
(V1, V1)(1.110)
So the first vector U1 is normalized.
Next we set
u2 = V2 + c12U1 (1.111)
and require that u2 be orthogonal to U1
0 = (U1, u2) = (U1, c12U1 + V2) = c12 + (U1, V2) (1.112)
whence c12 = (U1, V2), and sou2 = V2 (U1, V2)U1. (1.113)
The normalized vector U2 then is
U2 =u2
(u2, u2). (1.114)
Similarly, we set
u3 = V3 + c13U1 + c23U2 (1.115)
and ask that u3 be orthogonal both to U1
0 = (U1, u3) = (U1, c13U1 + c23U2 + V3) = c13 + (U1, V3) (1.116)
and to U2
0 = (U2, u3) = (U2, c13U1 + c23U2 + V3) = c23 + (U2, V3) (1.117)
whence ci3 = (Ui, V3) for i = 1 & 2, and sou3 = V3 (U1, V3)U1 (U2, V3)U2. (1.118)
The normalized vector U3 then is
U3 =u3
(u3, u3). (1.119)
We may continue in this way until we reach the last of the N linearly
independent vectors. We require the kth unnormalized vector uk
uk = Vk +k1i=1
cikUi. (1.120)
18 Linear Algebra
to be orthogonal to the k 1 vectors Ui and find that cik = (Ui, Vk) sothat
uk = Vk k1i=1
(Ui, Vk)Ui. (1.121)
The normalized vector then is
Uk =uk
(uk, uk). (1.122)
In general, a basis is more useful if it is composed of orthonormal vectors.
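The recursion (1.120–1.122) translates directly into code. A minimal sketch (an illustrative implementation for complex row vectors, not part of the original text):

```python
import numpy as np

def gram_schmidt(vs):
    """Orthonormalize linearly independent vectors via Eqs. (1.121-1.122)."""
    us = []
    for v in vs:
        # u_k = V_k - sum_i (U_i, V_k) U_i, Eq. (1.121)
        u = v - sum(np.vdot(w, v) * w for w in us)
        us.append(u / np.linalg.norm(u))   # U_k = u_k / sqrt((u_k, u_k)), Eq. (1.122)
    return np.array(us)

V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])            # three linearly independent vectors
U = gram_schmidt(V)
assert np.allclose(U @ U.conj().T, np.eye(3))   # (U_i, U_j) = delta_ij, Eq. (1.109)
```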
1.11 Outer Products
From any two vectors f and g, we may make an operator A that takes any
vector h into the vector f with coefficient (g, h)
Ah = f(g, h). (1.123)
It is easy to show that A is linear, that is that
A(zh+ we) = zAh+ wAe (1.124)
for any vectors e, h and numbers z, w.
Example: If f and g are vectors with components fi and gi, and h has
components hi, then the linear transformation is
(Ah)i =Nj=1
Aijhj = fi
Nj=1
gjhj (1.125)
so A is a matrix with entries
Aij = figj . (1.126)
The matrix A is the outer product of the vectors f and g.
1.12 Dirac Notation
Such outer products are important in quantum mechanics, and so Dirac
invented a notation for linear algebra that makes them easy to write. In his
notation, the outer product A of Eqs.(1.1231.126) is
A = |fg| (1.127)
1.12 Dirac Notation 19
and the inner product (g, h) is
(g, h) = g|h. (1.128)He called g| a bra and |h a ket, so that g|h is a bracket. In hisnotation Eq.(1.123) reads
A|h = |fg|h. (1.129)The new thing in Diracs notation is the bra f |. If the ket |f is repre-
sented by the vector
|f =
z1z2z3z4
(1.130)then the bra f | is represented by the adjoint of that vector
f | = (z1 , z2 , z3 , z4) . (1.131)In the standard notation, bras are implicit in the definition of the inner
product, but they do not appear explicitly.
In Diracs notation, the rules that a hermitian inner product (1.761.80)
satisfies are:
f |g = g|f (1.132)f |z1g1 + z2g2 = z1f |g1+ z2f |g2 (1.133)z1f1 + z2f2|g = z1f1|g+ z2f2|g (1.134)
f |f 0 (1.135)f |f = 0 if and only if f = 0. (1.136)
Usually, however, states in Dirac notation are labeled | or by their quan-tum numbers |n, l,m, and so one rarely sees plus signs or complex numbersor operators inside bras or kets. But one should.
Diracs notation allows us to write outer products clearly and simply.
Example: If the vectors f = |f and g = |g are
|f = ab
c
and |g = ( zw
)(1.137)
then their outer products are
|fg| =az awbz bwcz cw
and |gf | = (za zb zcwa wb wc
)(1.138)
20 Linear Algebra
as well as
|ff | =aa ab acba bb bcca cb cc
and |gg| = (zz zwwz ww
). (1.139)
Example: In Dirac notation, formula (1.121) is
|uk = |Vk k1i=1
|UiUi|Vk (1.140)
or
|uk =(I
k1i=1
|UiUi|)|Vk (1.141)
and (1.122) is
|Uk = |ukuk|uk . (1.142)
1.13 Identity Operators
Dirac notation provides a neat way of representing the identity operator I in
terms of a complete set of orthonormal vectors. First, in standard notation,
the expansion of an arbitrary vector f in a space S in terms of a complete
set of N orthonormal vectors ei
(ej , ei) = ij (1.143)
is
f =
Ni=1
ci ei (1.144)
from which we conclude that
(ej , f) = (ej ,
Ni=1
ciei) =
Ni=1
ci(ej , ei) =
Ni=1
ciij = cj (1.145)
whence
f =
Ni=1
(ei, f) ei =
Ni=1
ei (ei, f). (1.146)
The derivation stops here because there is no explicit expression for a bra.
But in Dirac notation, these equations read

⟨e_j|e_i⟩ = δ_{ij}   (1.147)

|f⟩ = Σ_{i=1}^{N} c_i |e_i⟩   (1.148)

⟨e_j|f⟩ = ⟨e_j| Σ_{i=1}^{N} c_i e_i⟩ = Σ_{i=1}^{N} c_i ⟨e_j|e_i⟩ = Σ_{i=1}^{N} c_i δ_{ij} = c_j   (1.149)

|f⟩ = Σ_{i=1}^{N} ⟨e_i|f⟩ |e_i⟩ = Σ_{i=1}^{N} |e_i⟩ ⟨e_i|f⟩.   (1.150)

We now rewrite the last equation as

|f⟩ = ( Σ_{i=1}^{N} |e_i⟩⟨e_i| ) |f⟩.   (1.152)

Since this equation holds for every vector |f⟩, the quantity inside the paren-
theses must be the identity operator

I = Σ_{i=1}^{N} |e_i⟩⟨e_i|.   (1.153)

Because one always may insert an identity operator anywhere, and because
the formula is true for every complete set of orthonormal vectors, the reso-
lution (1.153) of the identity operator is extremely useful.
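For any orthonormal basis, the resolution (1.153) can be checked as a sum of outer products. An illustrative numpy sketch (not part of the original text; it uses the eigenvectors of a random hermitian matrix as the orthonormal set):

```python
import numpy as np

rng = np.random.default_rng(3)
m = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
h = m + m.conj().T                    # a hermitian matrix
_, e = np.linalg.eigh(h)              # columns of e: orthonormal vectors |e_i>

# I = sum_i |e_i><e_i|, Eq. (1.153); each term is an outer product (1.127)
ident = sum(np.outer(e[:, i], e[:, i].conj()) for i in range(4))
assert np.allclose(ident, np.eye(4))
```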
By twice inserting the identity operator (1.153), one may convert a general
inner product (g, Af) = ⟨g|A|f⟩ into an expression involving a matrix A_{ij}
that represents the linear operator A

⟨g|A|f⟩ = ⟨g|I A I|f⟩ = Σ_{i,j=1}^{N} ⟨g|e_i⟩⟨e_i|A|e_j⟩⟨e_j|f⟩.   (1.154)

In the basis {|e_k⟩}, the matrix A_{ij} that represents the linear oper-
ator A is

A_{ij} = ⟨e_i|A|e_j⟩   (1.155)

and the components of the vectors |f⟩ and |g⟩ are

f_i = ⟨e_i|f⟩  and  g_i = ⟨e_i|g⟩.   (1.156)

In this basis, the inner product (g, Af) = ⟨g|A|f⟩ takes the form

⟨g|A|f⟩ = Σ_{i,j=1}^{N} g_i* A_{ij} f_j.   (1.157)
1.14 Vectors and Their Components
Usually, the components vk of a vector |v are the inner productsvk = k|v (1.158)
of the vector |v with a set of orthonormal basis vectors |k. Thus thecomponents vk of a vector |v depend on both the vector and the basis. Avector is independent of the basis used to compute its components,
but its components depend upon the chosen basis.
If the basis is orthonormal and so provides for the identity operator I the
expansion
I =
Nk=1
|kk| (1.159)
then the components vk of the vector |v are the coefficients in its expansionin terms of the basis vectors |k
|v = I|v =Nk=1
|kk|v =Nk=1
vk|k. (1.160)
1.15 Linear Operators and Their Matrices
A linear operator A maps vectors into vectors linearly as in Eq. (1.59)
A(bU + cV ) = bA(U) + cA(V ) = bAU + cAV. (1.161)
In the simplest and most important case, the linear operator A maps the
vectors of a vector space S into vectors in the same space S. If the space S
is N -dimensional, then it maps the vectors |i of any basis {|i} for S intovectors A|i |Ai that can be expanded in terms of the same basis {|k}
A|i = |Ai =Nk=1
Aki|k. (1.162)
The N N matrix with entries Aki represents the linear operator A
1.15 Linear Operators and Their Matrices 23
in the basis {|i}. Because A is linear, its action on an arbitrary vector|C = Ni=1Ci |i in S is
A|C = A(
Ni=1
Ci |i)
=
Ni=1
CiA |i =Nk=1
Ni=1
AkiCi |k. (1.163)
Thus the coefficients (AC)_k of the vector A|C⟩ ≡ |AC⟩ in the expansion
\[
A|C\rangle = |AC\rangle = \sum_{k=1}^{N} (AC)_k\,|k\rangle \tag{1.164}
\]
are given by the matrix multiplication of the vector C with elements C_i by the matrix A with entries A_{ki}
\[
(AC)_k = \sum_{i=1}^{N} A_{ki}\,C_i . \tag{1.165}
\]
Both the elements C_i of the vector C and the entries A_{ki} of the matrix A depend upon the basis {|i⟩} one chooses to use. If the vectors {|i⟩} are orthonormal, then the elements C_ℓ and A_{ℓi} are
\[
\langle \ell|C\rangle = \sum_{i=1}^{N} C_i\,\langle \ell|i\rangle = \sum_{i=1}^{N} C_i\,\delta_{\ell i} = C_\ell
\qquad
\langle \ell|A|i\rangle = \sum_{k=1}^{N} A_{ki}\,\langle \ell|k\rangle = \sum_{k=1}^{N} A_{ki}\,\delta_{\ell k} = A_{\ell i}. \tag{1.166}
\]
In the more general case, the linear operator A maps vectors in a vector space S into vectors in a different vector space S′. Now A maps an orthonormal basis {|i⟩} for S into vectors A|i⟩ that may be expanded in terms of an orthonormal basis {|k⟩} for S′
\[
A|i\rangle = \sum_{k=1}^{N'} A_{ki}\,|k\rangle . \tag{1.167}
\]
If the N vectors A|i⟩ are linearly independent, then N′ ≥ N, but if they are linearly dependent or if some of them are zero, then it may be that N′ < N. The elements A_{ℓi} of the matrix that represents the linear operator A now are
\[
\langle \ell|A|i\rangle = \sum_{k=1}^{N'} A_{ki}\,\langle \ell|k\rangle = \sum_{k=1}^{N'} A_{ki}\,\delta_{\ell k} = A_{\ell i}. \tag{1.168}
\]
They depend on both bases {|i⟩} and {|k⟩}. So although the linear operator is basis independent, the matrices that represent it vary with the chosen bases.
So far we have mostly been talking about linear operators that act on finite-dimensional vector spaces and that can be represented by matrices. But infinite-dimensional vector spaces and the linear operators that act on them play central roles in electrodynamics and quantum mechanics. For instance, the Hilbert space H of all wave functions ψ(x, t) that are square integrable over three-dimensional space at all times t is of (very) infinite dimension. An example in one space dimension of a linear operator that maps (a subspace of) H to H is the hamiltonian H for a non-relativistic particle of mass m in a potential V
\[
H = -\frac{\hbar^2}{2m}\,\frac{d^2}{dx^2} + V(x). \tag{1.169}
\]
It maps the state vector |ψ⟩ with components ⟨x|ψ⟩ = ψ(x) into the vector H|ψ⟩ with components
\[
\langle x|H|\psi\rangle = H\psi(x) = -\frac{\hbar^2}{2m}\,\frac{d^2\psi(x)}{dx^2} + V(x)\,\psi(x) \tag{1.170}
\]
where ℏ = 1.05 × 10⁻³⁴ J s. Translations in space and time
\[
U_T(a, b)\,\psi(x, t) = \psi(x + a,\, t + b) \tag{1.171}
\]
and rotations in space
\[
U_R(\theta)\,\psi(\mathbf{x}, t) = \psi(R(\theta)\,\mathbf{x},\, t) \tag{1.172}
\]
are also represented by linear operators acting on vector spaces of infinite dimension. As we'll see in what follows, these linear operators are unitary.

We may think of linear operators that act on vector spaces of infinite dimension as infinite-dimensional matrices or as matrices of continuously infinite dimension, the latter really being integral operators like
\[
H = \int dp \int dp'\; |p\rangle\langle p|H|p'\rangle\langle p'| . \tag{1.173}
\]
Thus we may carry over to spaces of infinite dimension most of our intuition about matrices, as long as we use common sense and keep in mind that infinite sums and integrals do not always converge to finite numbers.
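One way to carry this intuition in the other direction is to truncate an infinite-dimensional operator to a finite matrix. The sketch below (Python/numpy, not from the text) samples the hamiltonian (1.169) on a grid, replacing d²/dx² by the finite difference (ψ_{i+1} − 2ψ_i + ψ_{i−1})/Δx²; the harmonic potential and the units ℏ = m = 1 are assumptions made for illustration:

```python
import numpy as np

# Discretize H = -1/2 d^2/dx^2 + V(x) (units hbar = m = 1, an assumption)
# on n grid points; H then becomes an ordinary real symmetric matrix.
n, dx = 200, 0.1
x = dx * (np.arange(n) - n // 2)
V = 0.5 * x**2                                  # harmonic potential (example)
H = (-0.5 / dx**2) * (np.diag(np.ones(n - 1), 1)
                      + np.diag(np.ones(n - 1), -1)
                      - 2.0 * np.eye(n)) + np.diag(V)
E = np.linalg.eigvalsh(H)                       # H is real and symmetric
# The lowest eigenvalues approximate the oscillator energies 1/2, 3/2, ...
```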
1.16 Determinants

The determinant of a 2 × 2 matrix A is
\[
\det A = |A| = A_{11}A_{22} - A_{21}A_{12}. \tag{1.174}
\]
In terms of the antisymmetric matrix e_{ij} = −e_{ji} (which implies that e_{11} = e_{22} = 0) with e_{12} = 1, this determinant is
\[
\det A = \sum_{i=1}^{2}\sum_{j=1}^{2} e_{ij}\,A_{i1}A_{j2}. \tag{1.175}
\]
It's also true that
\[
e_{k\ell}\,\det A = \sum_{i=1}^{2}\sum_{j=1}^{2} e_{ij}\,A_{ik}A_{j\ell}. \tag{1.176}
\]
These definitions and results extend to any square matrix. If A is a 3 × 3 matrix, then its determinant is
\[
\det A = \sum_{ijk=1}^{3} e_{ijk}\,A_{i1}A_{j2}A_{k3} \tag{1.177}
\]
in which e_{ijk} is totally antisymmetric with e_{123} = 1 and the sums over i, j, and k run from 1 to 3. More explicitly, this determinant is
\[
\det A = \sum_{ijk=1}^{3} e_{ijk}\,A_{i1}A_{j2}A_{k3}
= \sum_{i=1}^{3} A_{i1} \sum_{jk=1}^{3} e_{ijk}\,A_{j2}A_{k3}
\]
\[
= A_{11}\,(A_{22}A_{33} - A_{32}A_{23}) + A_{21}\,(A_{32}A_{13} - A_{12}A_{33}) + A_{31}\,(A_{12}A_{23} - A_{22}A_{13}). \tag{1.178}
\]
This sum involves the 2 × 2 determinants of the matrices that result when we strike out column 1 and row i, which are called minors, multiplied by (−1)^{1+i}
\[
\det A = A_{11}(-1)^2\,(A_{22}A_{33} - A_{32}A_{23}) + A_{21}(-1)^3\,(A_{12}A_{33} - A_{32}A_{13}) + A_{31}(-1)^4\,(A_{12}A_{23} - A_{22}A_{13}) \tag{1.179}
\]
\[
= \sum_{i=1}^{3} A_{i1}\,C_{i1}. \tag{1.180}
\]
These minors multiplied by (−1)^{1+i} are called cofactors:
\[
C_{11} = A_{22}A_{33} - A_{23}A_{32}, \qquad
C_{21} = -(A_{12}A_{33} - A_{32}A_{13}), \qquad
C_{31} = A_{12}A_{23} - A_{22}A_{13}. \tag{1.181}
\]
This way of computing determinants is due to Laplace.
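Laplace's expansion along the first column translates directly into a short recursion. Here is a pure-Python sketch (the test matrices are made-up examples; this is for illustration only, since the method costs of order N! operations):

```python
# Laplace's cofactor expansion along the first column, as in (1.180):
# det A = sum_i A[i][0] * (-1)**i * M_i0, with M_i0 the minor obtained
# by striking out row i and column 0.
def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        minor = [row[1:] for k, row in enumerate(A) if k != i]
        total += (-1) ** i * A[i][0] * det(minor)
    return total

B = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
assert det(B) == -3              # agrees with the 3x3 formula (1.178)
assert det([[1, 2], [3, 4]]) == -2   # and with the 2x2 formula (1.174)
```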
Example: The determinant of a 3 × 3 matrix is the dot product of the vector of its first row with the cross-product of the vectors of its second and third rows:
\[
\begin{vmatrix} U_1 & U_2 & U_3 \\ V_1 & V_2 & V_3 \\ W_1 & W_2 & W_3 \end{vmatrix}
= \sum_{ijk=1}^{3} e_{ijk}\,U_i V_j W_k
= \sum_{i=1}^{3} U_i\,(V \times W)_i
= U \cdot (V \times W). \tag{1.182}
\]
Totally antisymmetric quantities e_{i_1 i_2 … i_N} with N indices and with e_{123…N} = 1 provide a definition of the determinant of an N × N matrix A as
\[
\det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} e_{i_1 i_2 \ldots i_N}\,A_{i_1 1} A_{i_2 2} \cdots A_{i_N N} \tag{1.183}
\]
in which the sums over i_1 … i_N run from 1 to N. The general form of Laplace's expansion of this determinant is
\[
\det A = \sum_{i=1}^{N} A_{ik}\,C_{ik} = \sum_{k=1}^{N} A_{ik}\,C_{ik} \tag{1.184}
\]
in which the first sum is over the row index i but not the (arbitrary) column index k, and the second sum is over the column index k but not the (arbitrary) row index i. The cofactor C_{ik} is (−1)^{i+k} M_{ik} in which the minor M_{ik} is the determinant of the (N−1) × (N−1) matrix A without its ith row and kth column.

Incidentally, it's also true that
\[
e_{k_1 k_2 \ldots k_N}\,\det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} e_{i_1 i_2 \ldots i_N}\,A_{i_1 k_1} A_{i_2 k_2} \cdots A_{i_N k_N}. \tag{1.185}
\]
The key feature of a determinant is that it is an antisymmetric combination of products of the elements A_{ik} of a matrix A. One implication of this antisymmetry is that the interchange of any two rows or any two columns changes the sign of the determinant. Another is that if one adds a multiple of one column to another column, for example a multiple x A_{i2} of column 2 to column 1, then the determinant
\[
\det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} e_{i_1 i_2 \ldots i_N}\,(A_{i_1 1} + x\,A_{i_1 2})\,A_{i_2 2} \cdots A_{i_N N} \tag{1.186}
\]
is unchanged. The reason is that the extra term Δ det A vanishes
\[
\Delta \det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} x\,e_{i_1 i_2 \ldots i_N}\,A_{i_1 2} A_{i_2 2} \cdots A_{i_N N} = 0 \tag{1.187}
\]
because it is proportional to a sum of products of a factor e_{i_1 i_2 … i_N} that is antisymmetric in i_1 and i_2 and a factor A_{i_1 2} A_{i_2 2} that is symmetric in these indices. For instance, when i_1 and i_2 are 5 & 7 and 7 & 5, the two terms cancel
\[
e_{57\ldots i_N}\,A_{52}A_{72}\cdots A_{i_N N} + e_{75\ldots i_N}\,A_{72}A_{52}\cdots A_{i_N N} = 0 \tag{1.188}
\]
because e_{57…i_N} = −e_{75…i_N}.

By repeated additions of x_2 A_{i2}, x_3 A_{i3}, etc. to A_{i1}, we can change the first column of the matrix A to a nearly arbitrary linear combination of all the columns
\[
A_{i1} \longrightarrow A_{i1} + \sum_{k=2}^{N} x_k\,A_{ik} \tag{1.189}
\]
without changing det A. This linear combination is not completely arbitrary because the coefficient of A_{i1} remains unity. The analogous operation
\[
A_{i\ell} \longrightarrow A_{i\ell} + \sum_{k=1,\,k\neq\ell}^{N} y_k\,A_{ik} \tag{1.190}
\]
replaces the ℓth column by a nearly arbitrary linear combination of all the columns without changing det A.
The key concepts of linear dependence and independence were explained in Sec. 1.8. Suppose that the columns of an N × N matrix A are linearly dependent, so that for some coefficients y_k not all zero the linear combination
\[
\sum_{k=1}^{N} y_k\,A_{ik} = 0 \qquad \forall\, i \tag{1.191}
\]
vanishes for all i (the upside-down A means "for all"). Suppose y_1 ≠ 0. Then by adding suitable linear combinations of columns 2 through N to column 1, we could make all the elements A_{i1} of column 1 vanish without changing det A. But then det A as given by (1.183) would vanish. It follows that the determinant of any matrix whose columns are linearly dependent must vanish.
The converse also is true: if the columns of a matrix are linearly independent, then the determinant of that matrix cannot vanish. To see why, let us recall, as explained in Sec. 1.8, that any linearly independent set of vectors is complete. Thus if the columns of a matrix A are linearly independent and therefore complete, some linear combination of all columns 2 through N when added to column 1 will convert column 1 into a (non-zero) multiple of the N-dimensional column vector (1, 0, 0, … 0), say (c_1, 0, 0, … 0). Similar operations will convert column 2 into a (non-zero) multiple of the column vector (0, 1, 0, … 0), say (0, c_2, 0, … 0). Continuing in this way, we may convert the matrix A to a matrix with non-zero entries along the main diagonal and zeros everywhere else. The determinant det A is then the product of the non-zero diagonal entries c_1 c_2 … c_N ≠ 0, and so det A cannot vanish.
We may extend these arguments to the rows of a matrix. The addition to row k of a linear combination of the other rows
\[
A_{ki} \longrightarrow A_{ki} + \sum_{\ell=1,\,\ell\neq k}^{N} z_\ell\,A_{\ell i} \tag{1.192}
\]
does not change the value of the determinant. In this way, one may show that the determinant of a matrix vanishes if and only if its rows are linearly dependent. The reason why these results apply to the rows as well as to the columns is that the determinant of a matrix A may be defined either in terms of the columns as in definitions (1.183 & 1.185) or in terms of the rows:
\[
\det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} e_{i_1 i_2 \ldots i_N}\,A_{1 i_1} A_{2 i_2} \cdots A_{N i_N} \tag{1.193}
\]
\[
e_{k_1 k_2 \ldots k_N}\,\det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} e_{i_1 i_2 \ldots i_N}\,A_{k_1 i_1} A_{k_2 i_2} \cdots A_{k_N i_N}. \tag{1.194}
\]
These and many other properties of determinants follow from a study of permutations, which are discussed in Section 10.13. Detailed proofs can be found in the book by Aitken (Aitken, 1959).

By comparing the column definitions (1.183 & 1.185) with the row definitions (1.193 & 1.194) of determinants, we see that the determinant of the transpose of a matrix is the same as the determinant of the matrix itself:
\[
\det A^{\mathsf{T}} = \det A. \tag{1.195}
\]
Let us return for a moment to Laplace's expansion (1.184) for the determinant det A of an N × N matrix A as a sum of A_{ik} C_{ik} over the row index i with the column index k held fixed
\[
\det A = \sum_{i=1}^{N} A_{ik}\,C_{ik} \tag{1.196}
\]
in order to prove that
\[
\delta_{k\ell}\,\det A = \sum_{i=1}^{N} A_{ik}\,C_{i\ell}. \tag{1.197}
\]
For k = ℓ, this formula just repeats Laplace's expansion (1.196). But for k ≠ ℓ, it is Laplace's expansion for the determinant of a matrix A′ that is the same as A but with its ℓth column replaced by its kth one. Since the matrix A′ has two identical columns, its determinant vanishes, which explains (1.197) for k ≠ ℓ.

The rule (1.197) therefore provides a formula for the inverse of a matrix A whose determinant does not vanish. Such matrices are called nonsingular. The inverse A⁻¹ of an N × N nonsingular matrix A is the transpose of the matrix of cofactors divided by det A
\[
\left(A^{-1}\right)_{\ell i} = \frac{C_{i\ell}}{\det A}
\qquad \text{or} \qquad
A^{-1} = \frac{C^{\mathsf{T}}}{\det A}. \tag{1.198}
\]
To verify this formula, we use it for A⁻¹ in the product A⁻¹A and note that by (1.197) the ℓkth entry of the product A⁻¹A is just δ_{ℓk}
\[
\left(A^{-1}A\right)_{\ell k} = \sum_{i=1}^{N} \left(A^{-1}\right)_{\ell i} A_{ik}
= \sum_{i=1}^{N} \frac{C_{i\ell}}{\det A}\,A_{ik} = \delta_{\ell k} \tag{1.199}
\]
as required.
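The cofactor formula (1.198) is easy to code directly. A numpy sketch (the 3 × 3 test matrix is a made-up example, and in practice one would call a library inverse instead):

```python
import numpy as np

# The inverse via cofactors, Eq. (1.198): A^{-1} = C^T / det A, with
# C[i, k] = (-1)**(i + k) times the minor M_ik of Eq. (1.184).
def cofactor_inverse(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for k in range(n):
            minor = np.delete(np.delete(A, i, axis=0), k, axis=1)
            C[i, k] = (-1) ** (i + k) * np.linalg.det(minor)
    return C.T / np.linalg.det(A)

A = np.array([[1.0, 2.0, 1.0], [-2.0, -6.0, 3.0], [4.0, 2.0, -5.0]])
inv = cofactor_inverse(A)
assert np.allclose(inv @ A, np.eye(3))   # the check (1.199): A^{-1} A = I
```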
Example: Let's apply our formula (1.198) to find the inverse of the general 2 × 2 matrix
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. \tag{1.200}
\]
We find then
\[
A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \tag{1.201}
\]
which is the correct inverse.
The simple example of matrix multiplication
\[
\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}
\begin{pmatrix} 1 & x & y \\ 0 & 1 & z \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} a & xa + b & ya + zb + c \\ d & xd + e & yd + ze + f \\ g & xg + h & yg + zh + i \end{pmatrix} \tag{1.202}
\]
shows that the operations (1.190) on columns that don't change the value of the determinant can be written as matrix multiplication from the right by a matrix that has unity on its main diagonal and zeros below.
Now imagine that A and B are N × N matrices and consider the 2N × 2N matrix product
\[
\begin{pmatrix} A & 0 \\ -I & B \end{pmatrix}
\begin{pmatrix} I & B \\ 0 & I \end{pmatrix}
=
\begin{pmatrix} A & AB \\ -I & 0 \end{pmatrix} \tag{1.203}
\]
in which I is the N × N identity matrix, and 0 is the N × N matrix of all zeros. The second matrix on the left-hand side has unity on its main diagonal and zeros below, and so it does not change the value of the determinant of the matrix to its left, which thus is equal to that of the matrix on the right-hand side:
\[
\det \begin{pmatrix} A & 0 \\ -I & B \end{pmatrix}
= \det \begin{pmatrix} A & AB \\ -I & 0 \end{pmatrix}. \tag{1.204}
\]
By using Laplace's expansion (1.184) along the first column to evaluate the determinant on the left-hand side (LHS) and Laplace's expansion (1.184) along the last row to compute the determinant on the right-hand side (RHS), one may derive the general and important rule that the determinant of the product of two matrices is the product of the determinants
\[
\det A\,\det B = \det AB. \tag{1.205}
\]
Example: The case in which the matrices A and B are both 2 × 2 is easy to understand. The LHS of Eq. (1.204) gives
\[
\det \begin{pmatrix} A & 0 \\ -I & B \end{pmatrix}
= \det \begin{pmatrix}
a_{11} & a_{12} & 0 & 0 \\
a_{21} & a_{22} & 0 & 0 \\
-1 & 0 & b_{11} & b_{12} \\
0 & -1 & b_{21} & b_{22}
\end{pmatrix}
= a_{11}a_{22}\det B - a_{21}a_{12}\det B = \det A\,\det B \tag{1.206}
\]
while its RHS comes to
\[
\det \begin{pmatrix} A & AB \\ -I & 0 \end{pmatrix}
= \det \begin{pmatrix}
a_{11} & a_{12} & (AB)_{11} & (AB)_{12} \\
a_{21} & a_{22} & (AB)_{21} & (AB)_{22} \\
-1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0
\end{pmatrix}
= (-1)\,C_{42} = (-1)(-1)\det AB = \det AB. \tag{1.207}
\]
Often one uses an absolute-value notation to denote a determinant, |A| = det A. In this more compact notation, the obvious generalization of the product rule is
\[
|ABC \cdots Z| = |A|\,|B| \cdots |Z|. \tag{1.208}
\]
The product rule (1.208) implies that the determinant of A⁻¹ is the inverse of |A| since
\[
1 = |I| = |AA^{-1}| = |A|\,|A^{-1}|. \tag{1.209}
\]
Incidentally, Gauss, Jordan, and others have developed much faster ways of computing determinants and matrix inverses than those (1.184 & 1.198) due to Laplace. Octave, Matlab, Maple, and Mathematica use these more modern techniques, which also are freely available as programs in C and Fortran from www.netlib.org/lapack.
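Those library routines make the rules above easy to spot-check numerically. A numpy sketch (the random 4 × 4 matrices are arbitrary examples):

```python
import numpy as np

# Numerical spot-checks of the product rule (1.205, 1.208), of
# |A^{-1}| = 1/|A| from (1.209), and of det A^T = det A from (1.195).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))      # almost surely nonsingular
B = rng.standard_normal((4, 4))
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(np.linalg.inv(A)),
                  1.0 / np.linalg.det(A))
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))
```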
Numerical Example: Adding multiples of rows to other rows does not change the value of a determinant, and interchanging two rows only changes a determinant by a minus sign. So we can use these operations, which leave determinants invariant, to make a matrix upper triangular, a form in which its determinant is just the product of the factors on its diagonal. For instance, to make the matrix
\[
A = \begin{pmatrix} 1 & 2 & 1 \\ -2 & -6 & 3 \\ 4 & 2 & -5 \end{pmatrix} \tag{1.210}
\]
upper triangular, we add twice the first row to the second row
\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & -2 & 5 \\ 4 & 2 & -5 \end{pmatrix} \tag{1.211}
\]
and then subtract four times the first row from the third
\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & -2 & 5 \\ 0 & -6 & -9 \end{pmatrix}. \tag{1.212}
\]
Next, we subtract three times the second row from the third
\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & -2 & 5 \\ 0 & 0 & -24 \end{pmatrix}. \tag{1.213}
\]
We now find as the determinant of A the product of its diagonal elements:
\[
|A| = 1 \cdot (-2) \cdot (-24) = 48. \tag{1.214}
\]
The Matlab command is d = det(A).
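The same row reduction can be carried out in a few lines of Python/numpy (given here alongside the text's Matlab one-liner; the loop below assumes the pivots happen to be non-zero, as they are for this matrix, so no row interchanges are needed):

```python
import numpy as np

# Row-reduce the matrix of (1.210) to upper triangular form by adding
# multiples of earlier rows to later rows, as in (1.211)-(1.213), then
# multiply the diagonal entries to get the determinant (1.214).
A = np.array([[1.0, 2.0, 1.0], [-2.0, -6.0, 3.0], [4.0, 2.0, -5.0]])
U = A.copy()
for j in range(3):                       # eliminate below the pivot U[j, j]
    for i in range(j + 1, 3):
        U[i] -= (U[i, j] / U[j, j]) * U[j]
det = U.diagonal().prod()
assert np.isclose(det, 48.0)             # agrees with (1.214)
```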
1.17 Systems of Linear Equations

Suppose we wish to solve the system of linear equations
\[
\sum_{k=1}^{N} A_{ik}\,x_k = y_i \tag{1.215}
\]
for the N unknowns x_k. In matrix notation, with A an N × N matrix and x and y N-vectors, this system of equations is
\[
A\,x = y. \tag{1.216}
\]
If the matrix A is non-singular, that is, if det(A) ≠ 0, then it has an inverse A⁻¹ given by (1.198), and we may multiply both sides of (1.216) by A⁻¹ and so find x as
\[
x = I\,x = A^{-1}A\,x = A^{-1}y. \tag{1.217}
\]
When A is non-singular, this is the unique solution to (1.215).

When A is singular, det(A) = 0, and so its columns are linearly dependent as explained in Sec. 1.16. In this case, the linear dependence of the columns of A implies that Az = 0 for some non-zero vector z, and so if x is a solution, then x + cz for all c is also a solution since A(x + cz) = Ax + cAz = Ax = y. So if det(A) = 0, then there may be solutions, but there can be no unique solution. Whether equation (1.215) has any solutions when det(A) = 0 depends on whether the vector y can be expressed as a linear combination of the columns of A. Since these columns are linearly dependent, they span a subspace of fewer than N dimensions, and so (1.215) has solutions only when the N-vector y lies in that subspace.

A system of M equations
\[
\sum_{k=1}^{N} A_{ik}\,x_k = y_i \qquad \text{for } i = 1, 2, \ldots, M \tag{1.218}
\]
in N, more than M, unknowns is under-determined. As long as at least M of the N columns A_{ik} of the matrix A are linearly independent, such a system always has solutions, but they are not unique.
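For a non-singular square system, a numpy sketch of (1.216)-(1.217) (the 2 × 2 system is a made-up example; in practice one calls a solver rather than forming A⁻¹ explicitly, though both give the same x here):

```python
import numpy as np

# Solve Ax = y for a non-singular A, as in (1.217).
A = np.array([[2.0, 1.0], [1.0, 3.0]])   # det A = 5, so A is non-singular
y = np.array([3.0, 5.0])
x = np.linalg.solve(A, y)
assert np.allclose(A @ x, y)                       # x solves (1.216)
assert np.allclose(x, np.linalg.inv(A) @ y)        # same as A^{-1} y
```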
1.18 Linear Least Squares

Suppose we are confronted with a system of M equations
\[
\sum_{k=1}^{N} A_{ik}\,x_k = y_i \qquad \text{for } i = 1, 2, \ldots, M \tag{1.219}
\]
in fewer unknowns N < M. This problem is over-determined. In general, it has no solution, but it does have an approximate solution due to Carl Gauss (1777–1855).

If the matrix A and the vector y are real, then Gauss's solution is the N values x_k that minimize the sum of the squares of the errors
\[
E = \sum_{i=1}^{M} \Big( y_i - \sum_{k=1}^{N} A_{ik}\,x_k \Big)^2. \tag{1.220}
\]
The minimizing values x_ℓ make the N derivatives of E vanish
\[
\frac{\partial E}{\partial x_\ell} = 0 = \sum_{i=1}^{M} 2 \Big( y_i - \sum_{k=1}^{N} A_{ik}\,x_k \Big)(-A_{i\ell}) \tag{1.221}
\]
so in matrix notation
\[
A^{\mathsf{T}} y = A^{\mathsf{T}} A\,x. \tag{1.222}
\]
Since A is real, the matrix A^{\mathsf{T}}A is non-negative (1.41); if it also is positive (1.42), then it has an inverse, and our least-squares solution is
\[
x = \left(A^{\mathsf{T}} A\right)^{-1} A^{\mathsf{T}} y. \tag{1.223}
\]
If the matrix A and the vector y are complex, and if the matrix A^\dagger A is positive, then one may show (problem 16) that minimization of the sum of the squares of the absolute values of the errors gives
\[
x = \left(A^\dagger A\right)^{-1} A^\dagger y. \tag{1.224}
\]
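A numpy sketch of Gauss's solution (1.223) for an over-determined system (the 5-equation, 2-unknown data below are made-up; `np.linalg.lstsq` minimizes the same sum of squares (1.220) by a more stable method):

```python
import numpy as np

# Least squares for M = 5 equations in N = 2 unknowns, via the
# normal equations (1.222)-(1.223): x = (A^T A)^{-1} A^T y.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
x = np.linalg.inv(A.T @ A) @ (A.T @ y)

# the library routine minimizes the same error (1.220)
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x, x_lstsq)
```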
Example from biophysics: If the wavelength of visible light were a nanometer, microscopes would yield much sharper images. Each photon from a (single-molecule) fluorophore entering the lens of a microscope would follow ray optics and be focused within a tiny circle of about a nanometer on a detector. Instead, a photon that should arrive at x = (x_1, x_2) arrives at y_i = (y_{1i}, y_{2i}) according to an approximately gaussian probability distribution
\[
P(y_i) = c\,e^{-(y_i - x)^2/(2\sigma^2)} \tag{1.225}
\]
in which c is a normalization constant and σ is about 150 nm. What to do? Keith Lidke and his merry band of biophysicists collect about N = 500 points y_i and determine the point x that maximizes the joint probability of the ensemble of image points
\[
P = \prod_{i=1}^{N} P(y_i) = c^N \prod_{i=1}^{N} e^{-(y_i - x)^2/(2\sigma^2)}
= c^N \exp\Big[ -\sum_{i=1}^{N} (y_i - x)^2/(2\sigma^2) \Big] \tag{1.226}
\]
by solving for k = 1 and 2 the equations
\[
\frac{\partial P}{\partial x_k} = 0
= P\,\frac{\partial}{\partial x_k}\Big[ -\sum_{i=1}^{N} (y_i - x)^2/(2\sigma^2) \Big]
= \frac{P}{\sigma^2} \sum_{i=1}^{N} (y_{ik} - x_k). \tag{1.227}
\]
Thus this maximum likelihood estimate of the image point x is the average of the observed points y_i
\[
x = \frac{1}{N} \sum_{i=1}^{N} y_i. \tag{1.228}
\]
Figure 1.1 Conventional (left, fuzzy) and STORM (right, sharp) images of microtubules. The tubulin is labeled with a fluorescent anti-tubulin antibody. The white rectangles are 1 micron in length. Images courtesy of Keith Lidke.
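A toy numerical version of this estimate (not Lidke's actual analysis; the true position, the seed, and the point count are made-up) scatters N = 500 gaussian points of width σ = 150 nm about a position x and recovers x as their average, as in (1.228):

```python
import numpy as np

# Simulate N photon arrival points spread by sigma about x_true, then
# form the maximum-likelihood estimate (1.228): the sample average.
rng = np.random.default_rng(1)
x_true = np.array([1000.0, 2000.0])            # nm, hypothetical position
sigma = 150.0                                  # nm, as in the text
y = x_true + sigma * rng.standard_normal((500, 2))
x_hat = y.mean(axis=0)                         # the estimate (1.228)
# the error of the mean is about sigma / sqrt(N), roughly 7 nm here
```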
Their stochastic optical reconstruction microscopy (STORM) is more complicated because they also account for the finite accuracy of their detector.

Microtubules are long hollow tubes made of the protein tubulin. They are 25 nm in diameter and typically have one end attached to a centrosome. Together with actin and intermediate filaments, they form the cytoskeleton of a eukaryotic cell. Fig. 1.1 shows conventional (left, fuzzy) and STORM (right, sharp) images of microtubules. The fluorophore attaches at a random point on an anti-tubulin antibody of finite size, which binds to the tubulin of a microtubule. This spatial uncertainty and the motion of the molecules of living cells limit the improvement in resolution to a factor of 10 to 20.
1.19 The Adjoint of an Operator

The adjoint A† of a linear operator A is defined by
\[
(g, Af) = (A^\dagger g, f) = (f, A^\dagger g)^*. \tag{1.229}
\]
Equivalent expressions in Dirac notation are
\[
\langle g|Af\rangle = \langle g|A|f\rangle = \langle A^\dagger g|f\rangle = \langle f|A^\dagger g\rangle^* = \langle f|A^\dagger|g\rangle^*. \tag{1.230}
\]
So if the vectors {e_i} are orthonormal and complete in a space S, then with f = e_j and g = e_i, the definition (1.229) or (1.230) of the adjoint A† of a linear operator A implies
\[
\langle e_i|A^\dagger|e_j\rangle = \langle e_j|A|e_i\rangle^* \tag{1.231}
\]
or
\[
\left(A^\dagger\right)_{ij} = (A_{ji})^* = A_{ji}^* \tag{1.232}
\]
in agreement with our definition (1.31) of the adjoint of a matrix as the transpose of its complex conjugate
\[
A^\dagger = \left(A^*\right)^{\mathsf{T}}. \tag{1.233}
\]
Since both (A*)* = A and (A^{\mathsf{T}})^{\mathsf{T}} = A, it follows that
\[
\left(A^\dagger\right)^\dagger = A \tag{1.234}
\]
so the adjoint of an adjoint is the original operator.

By applying this rule (1.234) to the definition (1.229) of the adjoint, we find the related rule
\[
(g, A^\dagger f) = \left(\left(A^\dagger\right)^\dagger g,\, f\right) = (Ag, f). \tag{1.235}
\]
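A numerical check of the defining property (1.229), with the adjoint built as the conjugate transpose of (1.233) (the random complex 3 × 3 matrix and vectors are arbitrary examples):

```python
import numpy as np

# Verify (g, Af) = (A^dagger g, f) and (A^dagger)^dagger = A,
# using A^dagger = conjugate transpose as in (1.232)-(1.233).
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
f = rng.standard_normal(3) + 1j * rng.standard_normal(3)
g = rng.standard_normal(3) + 1j * rng.standard_normal(3)
A_dag = A.conj().T

assert np.isclose(g.conj() @ (A @ f), (A_dag @ g).conj() @ f)  # (1.229)
assert np.allclose(A_dag.conj().T, A)                          # (1.234)
```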
1.20 Self-Adjoint or Hermitian Linear Operators

An operator A that is equal to its adjoint
\[
A^\dagger = A \tag{1.236}
\]
is self adjoint or hermitian. In view of definition (1.229), a self-adjoint linear operator A satisfies
\[
(g, Af) = (Ag, f) = (f, Ag)^* \tag{1.237}
\]
or equivalently
\[
\langle g|A|f\rangle = \langle Ag|f\rangle = \langle f|Ag\rangle^* = \langle f|A|g\rangle^*. \tag{1.238}
\]
By Eq. (1.232), a hermitian operator A that acts on a finite-dimensional vector space is represented in an orthonormal basis by a matrix that is equal to the transpose of its complex conjugate
\[
A_{ij} = \left(A^\dagger\right)_{ij} = (A_{ji})^* = A_{ji}^*. \tag{1.239}
\]
Such matrices are said to be hermitian. Conversely, a linear operator that is represented by a hermitian matrix in an orthonormal basis is self adjoint (problem 17).

A matrix A_{ij} that is real and symmetric or imaginary and anti-symmetric is hermitian. But a self-adjoint linear operator A that is represented by a matrix A_{ij} that is real and symmetric (or imaginary and anti-symmetric) in one orthonormal basis will not in general be represented by a matrix that is real and symmetric (or imaginary and anti-symmetric) in a different orthonormal basis, but it will be represented by a hermitian matrix in every orthonormal basis.

As we'll see in section (1.30), hermitian matrices have real eigenvalues and complete sets of orthonormal eigenvectors. Hermitian operators and matrices represent physical variables in quantum mechanics.
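The facts quoted here from section 1.30 are easy to see numerically. A numpy sketch with a made-up 2 × 2 hermitian matrix:

```python
import numpy as np

# A hermitian matrix: equal to the transpose of its complex conjugate.
H = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])
assert np.allclose(H, H.conj().T)                # H is hermitian (1.239)

vals, vecs = np.linalg.eigh(H)                   # solver for hermitian H
assert np.all(np.isreal(vals))                   # real eigenvalues
assert np.allclose(vecs.conj().T @ vecs, np.eye(2))  # orthonormal eigenvectors
```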
1.21 Real, Symmetric Linear Operators

In quantum mechanics, we usually consider complex vector spaces, that is, spaces in which the vectors |f⟩ are complex linear combinations
\[
|f\rangle = \sum_{i=1}^{N} z_i\,|i\rangle \tag{1.240}
\]
of complex orthonormal basis vectors |i⟩.

But real vector spaces also are of interest. A real vector space is a vector space in which the vectors |f⟩ are real linear combinations
\[
|f\rangle = \sum_{n=1}^{N} x_n\,|n\rangle \tag{1.241}
\]
of real orthonormal basis vectors, x_n* = x_n and |n⟩* = |n⟩.

A real linear operator A on a real vector space
\[
A = \sum_{n,m=1}^{N} |n\rangle\langle n|A|m\rangle\langle m| = \sum_{n,m=1}^{N} |n\rangle A_{nm}\langle m| \tag{1.242}
\]
is represented by a real matrix A_{nm}* = A_{nm}. A real linear operator A that is self adjoint on a real vector space satisfies the condition (1.237) of hermiticity but with the understanding that complex conjugation has no effect
\[
(g, Af) = (Ag, f) = (f, Ag)^* = (f, Ag). \tag{1.243}
\]
Thus, its matrix elements are symmetric: ⟨g|A|f⟩ = ⟨f|A|g⟩. Since A is hermitian as well as real, the matrix A_{nm} that represents it (in a real basis) is real and hermitian, and so is symmetric
\[
A_{nm} = A_{mn}^* = A_{mn}. \tag{1.244}
\]
1.22 Unitary Operators

A unitary operator U is one whose adjoint is its inverse
\[
U\,U^\dagger = U^\dagger\,U = I. \tag{1.245}
\]
In general, the unitary operators we'll consider also are linear, that is
\[
U\,(z|\psi\rangle + w|\phi\rangle) = z\,U|\psi\rangle + w\,U|\phi\rangle \tag{1.246}
\]
for all states or vectors |ψ⟩ and |φ⟩ and all complex numbers z and w.

In standard notation, U†U = I implies that for any vectors f and g
\[
(g, f) = (g, U^\dagger U f) = (Ug, Uf) \tag{1.247}
\]
as well as
\[
(g, f) = (g, U U^\dagger f) = (U^\dagger g, U^\dagger f). \tag{1.248}
\]
In Dirac notation, these equations are
\[
\langle g|f\rangle = \langle g|U^\dagger U|f\rangle = \langle Ug|U|f\rangle = \langle Ug|Uf\rangle \tag{1.249}
\]
and
\[
\langle g|f\rangle = \langle g|U U^\dagger|f\rangle = \langle U^\dagger g|U^\dagger|f\rangle = \langle U^\dagger g|U^\dagger f\rangle. \tag{1.250}
\]
Suppose the states {|n⟩} form an orthonormal basis for a given vector space. Then if U is any unitary operator, the relations (1.247–1.250) show that the states {U|n⟩} also form an orthonormal basis. The orthonormality of the image states {U|n⟩} follows from that of the basis states {|n⟩}
\[
\delta_{nm} = \langle n|m\rangle = \langle Un|Um\rangle = \langle n|U^\dagger U|m\rangle. \tag{1.251}
\]
The completeness relation for the basis states {|n⟩} is that the sum of their dyadics is the identity operator
\[
\sum_n |n\rangle\langle n| = I \tag{1.252}
\]
and it implies that the image states {U|n⟩} also are complete
\[
\sum_n U|n\rangle\langle n|U^\dagger = U I U^\dagger = U U^\dagger = I. \tag{1.253}
\]
So a unitary matrix U maps an orthonormal basis into another orthonormal basis. In fact, any linear map from one orthonormal basis {|n⟩} to another {|n′⟩} must be unitary. Such an operator will be of the form
\[
U = \sum_{n=1}^{N} |n'\rangle\langle n| \tag{1.254}
\]
with
\[
\langle n|m\rangle = \delta_{nm} \qquad \text{and} \qquad \langle n'|m'\rangle = \delta_{nm}. \tag{1.255}
\]
The unitarity of such a sum is evident:
\[
U^\dagger U = \sum_{n=1}^{N} |n\rangle\langle n'| \sum_{m=1}^{N} |m'\rangle\langle m|
= \sum_{n=1}^{N}\sum_{m=1}^{N} |n\rangle\,\delta_{nm}\,\langle m|
= \sum_{n=1}^{N} |n\rangle\langle n| = I. \tag{1.256}
\]
The product U U† similarly collapses to unity.

Unitary matrices have unimodular determinants. To show this, we use the definition (1.245), that is, U†U = I, and the product rule for determinants (1.208) to write
\[
1 = |I| = |U^\dagger U| = |U^\dagger|\,|U| = |U^*|\,|U| = |U|^*\,|U|. \tag{1.257}
\]
A unitary matrix that is real is said to be orthogonal. An orthogonal matrix O satisfies
\[
O\,O^{\mathsf{T}} = O^{\mathsf{T}}\,O = I. \tag{1.258}
\]
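As a concrete check, a rotation matrix is real and unitary, hence orthogonal, and its determinant is unimodular. A numpy sketch with a made-up angle:

```python
import numpy as np

# A 2x2 rotation matrix is orthogonal (1.258) and has |det U| = 1,
# as the unimodularity argument (1.257) requires.
theta = 0.3
U = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

assert np.allclose(U @ U.T, np.eye(2))          # O O^T = I, Eq. (1.258)
assert np.allclose(U.T @ U, np.eye(2))          # O^T O = I
assert np.isclose(abs(np.linalg.det(U)), 1.0)   # unimodular determinant
```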
1.23 Antiunitary, Antilinear Operators

Certain maps on states ψ → ψ′, such as those involving time reversal, are implemented by operators K that are antilinear
\[
K\,(z\psi + w\phi) = K\,(z|\psi\rangle + w|\phi\rangle) = z^* K|\psi\rangle + w^* K|\phi\rangle = z^* K\psi + w^* K\phi \tag{1.259}
\]
and antiunitary
\[
(K\phi, K\psi) = \langle K\phi|K\psi\rangle = (\phi, \psi)^* = \langle\phi|\psi\rangle^* = \langle\psi|\phi\rangle = (\psi, \phi). \tag{1.260}
\]
Don't feel bad if you find such operators spooky. I do too.
1.24 Symmetry in Quantum Mechanics

In quantum mechanics, a symmetry is a map of states ψ → ψ′, φ → φ′ that preserves their inner products
\[
|\langle \phi'|\psi'\rangle|^2 = |\langle \phi|\psi\rangle|^2 \tag{1.261}
\]
and so their predicted probabilities. The inner products of the primed and unprimed vectors are the same.

Eugene Wigner (1902–1995) has shown that every symmetry in quantum mechanics can be represented either by an operator U that is linear and unitary or by an operator K that is anti-linear and anti-unitary. The anti-linear, anti-unitary case seems to occur only when the symmetry involves time-reversal; most symmetries are represented by operators U that are linear and unitary. So unitary operators are of great importance in quantum mechanics. They are used to represent rotations, translations, Lorentz transformations, and internal-symmetry transformations, that is, just about all symmetries not involving time-reversal.
1.25 Lagrange Multipliers

The maxima and minima of a function f(x) of several variables x_1, x_2, …, x_n are among the points at which its gradient vanishes
\[
\nabla f(x) = 0. \tag{1.262}
\]
These are the stationary points of f.

Example 1.2 (Minimum) For instance, if f(x) = x_1² + 2x_2² + 3x_3², then its minimum is at
\[
\nabla f(x) = (2x_1,\, 4x_2,\, 6x_3) = 0 \tag{1.263}
\]
that is, at x_1 = x_2 = x_3 = 0.

But how do we find the extrema of f(x) if x must satisfy k constraints c_1(x) = 0, c_2(x) = 0, …, c_k(x) = 0? We use Lagrange multipliers (Joseph-Louis Lagrange, 1736–1813).

In the case of one constraint c(x) = 0, we no longer expect the gradient ∇f(x) to vanish, but its projection ∇f(x)·dx must vanish in those directions dx that preserve the constraint. So ∇f(x)·dx = 0 for all dx that make ∇c(x)·dx = 0. This means that ∇f(x) and ∇c(x) must be parallel. Thus, the extrema of f(x) subject to the constraint c(x) = 0 satisfy the two equations
\[
\nabla f(x) = \lambda\,\nabla c(x) \qquad \text{and} \qquad c(x) = 0. \tag{1.264}
\]
These equations define the extrema of the unconstrained function
\[
L(x, \lambda) = f(x) - \lambda\,c(x) \tag{1.265}
\]
of the n + 1 variables x_1, …, x_n, λ
\[
\nabla L(x, \lambda) = \nabla f(x) - \lambda\,\nabla c(x) = 0
\qquad \text{and} \qquad
\frac{\partial L(x, \lambda)}{\partial \lambda} = -c(x) = 0. \tag{1.266}
\]
The extra variable λ is a Lagrange multiplier.

In the case of k constraints c_1(x) = 0, …, c_k(x) = 0, the projection ∇f·dx must vanish in those directions dx that preserve all the constraints. So ∇f(x)·dx = 0 for all dx that make all ∇c_j(x)·dx = 0 for j = 1, …, k. The gradient ∇f will satisfy this requirement if it's a linear combination
\[
\nabla f = \lambda_1\,\nabla c_1 + \cdots + \lambda_k\,\nabla c_k \tag{1.267}
\]
of the k gradients because then ∇f·dx will vanish if ∇c_j·dx = 0 for j = 1, …, k. The extrema also must satisfy the constraints
\[
c_1(x) = 0, \quad \ldots, \quad c_k(x) = 0. \tag{1.268}
\]
Equations (1.267 & 1.268) define the extrema of the unconstrained function
\[
L(x, \lambda) = f(x) - \lambda_1\,c_1(x) - \cdots - \lambda_k\,c_k(x) \tag{1.269}
\]
of the n + k variables x and λ
\[
\nabla L(x, \lambda) = \nabla f(x) - \lambda_1\,\nabla c_1(x) - \cdots - \lambda_k\,\nabla c_k(x) = 0 \tag{1.270}
\]
and
\[
\frac{\partial L(x, \lambda)}{\partial \lambda_j} = -c_j(x) = 0 \qquad j = 1, \ldots, k. \tag{1.271}
\]
Example 1.3 (Constrained Extrema and Eigenvectors) Suppose we want to find the extrema of a real, symmetric quadratic form f(x) = x^{\mathsf{T}}Ax subject to the constraint c(x) = x·x − 1, which says that the vector x is of unit length. We form the function
\[
L(x, \lambda) = x^{\mathsf{T}} A x - \lambda\,(x \cdot x - 1) \tag{1.272}
\]
and since the matrix A is real and symmetric, we find its unconstrained extrema as
\[
\nabla L(x, \lambda) = 2Ax - 2\lambda x = 0 \qquad \text{and} \qquad x \cdot x = 1. \tag{1.273}
\]
The extrema of f(x) = x^{\mathsf{T}}Ax subject to the constraint c(x) = x·x − 1 are the normalized eigenvectors
\[
Ax = \lambda x \qquad \text{and} \qquad x \cdot x = 1 \tag{1.274}
\]
of the real, symmetric matrix A.
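Example 1.3 can be checked numerically: the extrema of f on the unit sphere are eigenvectors, and the extremal values of f are the eigenvalues. A numpy sketch with a made-up symmetric matrix:

```python
import numpy as np

# The extrema of f(x) = x^T A x on the unit sphere x.x = 1 are the
# normalized eigenvectors of the real symmetric A, as in (1.274),
# and the value of f at each extremum is the eigenvalue.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
vals, vecs = np.linalg.eigh(A)

for lam, v in zip(vals, vecs.T):
    assert np.isclose(v @ v, 1.0)          # the constraint x.x = 1
    assert np.allclose(A @ v, lam * v)     # the eigenvalue equation (1.274)
    assert np.isclose(v @ A @ v, lam)      # f at the extremum is lambda
```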
1.26 Eigenvectors and Invariant Subspaces

Let A be a linear operator that maps vectors |v⟩ in a vector space S into vectors in the same space. If T ⊂ S is a subspace of S, and if the vector A|u⟩ is in T whenever |u⟩ is in T, then T is an invariant subspace of S. The whole space S is a trivial invariant subspace of S, as is the null set ∅.

If T ⊂ S is a one-dimensional invariant subspace of S, then A maps each vector |u⟩ ∈ T into another vector λ|u⟩ ∈ T, that is
\[
A|u\rangle = \lambda\,|u\rangle. \tag{1.275}
\]
In this case, we say that |u⟩ is an eigenvector of A with eigenvalue λ. (The German adjective eigen means own, proper, singular.)

Example: The matrix equation
\[
\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} 1 \\ \pm i \end{pmatrix}
= e^{\pm i\theta} \begin{pmatrix} 1 \\ \pm i \end{pmatrix} \tag{1.276}
\]
tells us that the eigenvectors of this 2 × 2 orthogonal matrix are the 2-tuples (1, ±i) with eigenvalues e^{±iθ}.

Problem 18 is to show that the eigenvalues λ of a unitary (and hence of an orthogonal) matrix are unimodular, |λ| = 1.
Example: Let us consider the eigenvector equation
\[
\sum_{k=1}^{N} A_{ik}\,V_k = \lambda\,V_i \tag{1.277}
\]
for a matrix A that is anti-symmetric, A_{ik} = −A_{ki}. The anti-symmetry of A implies that
\[
\sum_{i,k=1}^{N} V_i\,A_{ik}\,V_k = 0. \tag{1.278}
\]
Thus the last two relations imply that
\[
0 = \sum_{i,k=1}^{N} V_i\,A_{ik}\,V_k = \lambda \sum_{i=1}^{N} V_i^2. \tag{1.279}
\]
Thus either the eigenvalue λ or the dot-product of the eigenvector with itself vanishes.

Problem 19 is to show that the sum of the eigenvalues of an anti-symmetric matrix vanishes.
1.27 Eigenvalues of a Square Matrix

Let A be an N × N matrix with complex entries A_{ik}. A non-zero N-dimensional vector V with entries V_k is an eigenvector of the matrix A with eigenvalue λ if
\[
A|V\rangle = \lambda|V\rangle
\quad\Longleftrightarrow\quad
AV = \lambda V
\quad\Longleftrightarrow\quad
\sum_{k=1}^{N} A_{ik}\,V_k = \lambda\,V_i. \tag{1.280}
\]
Every N × N matrix A has N eigenvectors V^{(ℓ)} and eigenvalues λ_ℓ
\[
A\,V^{(\ell)} = \lambda_\ell\,V^{(\ell)} \tag{1.281}
\]
for ℓ = 1 … N. To see why, we write the top equation (1.280) as
\[
\sum_{k=1}^{N} \left(A_{ik} - \lambda\,\delta_{ik}\right) V_k = 0 \tag{1.282}
\]
or in matrix notation as
\[
(A - \lambda I)\,V = 0 \tag{1.283}
\]
in which I is the N × N matrix with entries I_{ik} = δ_{ik}. These equivalent equations (1.282 & 1.283) say that the columns of the matrix A − λI, considered as vectors, are linearly dependent, as defined in section 1.8. We saw in section 1.16 that the columns of a matrix A − λI are linearly dependent if and only if the determinant |A − λI| vanishes. Thus a non-zero solution of the eigenvalue equation (1.280) exists if and only if the determinant
\[
\det (A - \lambda I) = |A - \lambda I| = 0 \tag{1.284}
\]
vanishes. This requirement that the determinant of A − λI vanish is called the characteristic equation. For an N × N matrix A, it is a polynomial equation of the Nth degree in the unknown eigenvalue λ
\[
|A - \lambda I| \equiv P(\lambda, A)
= |A| + \cdots + (-1)^{N-1}\lambda^{N-1}\,\mathrm{Tr}A + (-1)^N \lambda^N
= \sum_{k=0}^{N} p_k\,\lambda^k = 0 \tag{1.285}
\]
in which p_0 = |A|, p_{N−1} = (−1)^{N−1} TrA, and p_N = (−1)^N. (All the p_k's are basis independent.) By the fundamental theorem of algebra, proved in Sec. 5.9, the characteristic equation always has N roots or solutions λ_ℓ lying somewhere in the complex plane. Thus, the characteristic polynomial has the factored form
\[
P(\lambda, A) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_N - \lambda). \tag{1.286}
\]
For every root λ_ℓ, there is a non-zero eigenvector V^{(ℓ)} whose components V_k^{(ℓ)} are the coefficients that make the N vectors A_{ik} − λ_ℓ δ_{ik} that are the columns of the matrix A − λ_ℓ I sum to zero in (1.282). Thus, every N × N matrix has N eigenvalues λ_ℓ and N eigenvectors V^{(ℓ)}.

Setting λ = 0 in the factored form (1.286) of P(λ, A) and in the characteristic equation (1.285), we see that the determinant of every N × N matrix is the product of its N eigenvalues
\[
P(0, A) = |A| = p_0 = \lambda_1 \lambda_2 \cdots \lambda_N. \tag{1.287}
\]
These N roots usually are all different, and when they are, the eigenvectors
V^{(ℓ)} are linearly independent. This result is trivially true for N = 1. Let's assume its validity for N − 1 and deduce it for the case of N eigenvectors. If it were false for N eigenvectors, then there would be N numbers c_ℓ, not all zero, such that
\[
\sum_{\ell=1}^{N} c_\ell\,V^{(\ell)} = 0. \tag{1.288}
\]
We now multiply this equation from the left by the linear operator A and use the eigenvalue equation (1.281)
\[
A \sum_{\ell=1}^{N} c_\ell\,V^{(\ell)}
= \sum_{\ell=1}^{N} c_\ell\,A V^{(\ell)}
= \sum_{\ell=1}^{N} c_\ell\,\lambda_\ell\,V^{(\ell)} = 0. \tag{1.289}
\]
On the other hand, the product of equation (1.288) multiplied by λ_N is
\[
\sum_{\ell=1}^{N} c_\ell\,\lambda_N\,V^{(\ell)} = 0. \tag{1.290}
\]
When we subtract (1.290) from (1.289), the terms with ℓ = N cancel leaving
\[
\sum_{\ell=1}^{N-1} c_\ell\,(\lambda_\ell - \lambda_N)\,V^{(\ell)} = 0 \tag{1.291}
\]
in which all the factors (λ_ℓ − λ_N) are different from zero since by assumption all the eigenvalues are different. But this last equation says that N − 1 eigenvectors with different eigenvalues are linearly dependent, which contradicts our assumption that the result holds for N − 1 eigenvectors. This contradiction tells us that if the N eigenvectors of an N × N square matrix have different eigenvalues, then they are linearly independent.
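Both facts, the product rule (1.287) and the independence of eigenvectors with distinct eigenvalues, can be checked numerically. A numpy sketch with a made-up 2 × 2 matrix:

```python
import numpy as np

# For a matrix with distinct eigenvalues: the determinant equals the
# product of the eigenvalues (1.287), and the matrix whose columns are
# the eigenvectors is non-singular, i.e. the eigenvectors are independent.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
vals, vecs = np.linalg.eig(A)

assert np.isclose(vals.prod(), np.linalg.det(A))      # lambda_1 lambda_2 = |A|
assert not np.isclose(np.linalg.det(vecs), 0.0)       # independent eigenvectors
```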
An eigenvalue λ_ℓ that is a single root of the characteristic equation (1.285) is associated with a single eigenvector; it is called a simple eigenvalue. An eigenvalue λ_ℓ that is an nth root of the characteristic equation is associated with n eigenvectors; it is said to be an n-fold degenerate eigenvalue or to have algebraic multiplicity n. Its geometric multiplicity is the number n′ ≤ n of linearly independent eigenvectors with eigenvalue λ_ℓ. A matrix whose eigenvectors are linearly dependent is said to be defective.

Example: The 2 × 2 matrix
\[
\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \tag{1.292}
\]
has only one linearly independent eigenvector, (1, 0)^{\mathsf{T}}, and so is defective.
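A numerical eigensolver makes the defect visible: the eigenvalue 0 appears twice, but the computed eigenvector matrix is singular, signaling that only one independent eigenvector exists. A numpy sketch of the matrix (1.292):

```python
import numpy as np

# The defective matrix of (1.292): algebraic multiplicity 2 for the
# eigenvalue 0, but geometric multiplicity 1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
vals, vecs = np.linalg.eig(A)

assert np.allclose(vals, 0.0)                  # doubly degenerate eigenvalue
assert np.isclose(np.linalg.det(vecs), 0.0)    # eigenvectors linearly dependent
```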
Suppose A is an N × N matrix that is not defective. We may use its
N linearly independent eigenvectors V^(ℓ) = |ℓ⟩ to define the columns of an
N × N matrix S as

S_{kℓ} = ⟨k, 0|ℓ⟩   (1.293)

in which the vectors |k, 0⟩ are the basis in which A_{ik} = ⟨i, 0|A|k, 0⟩. The
inner product of the eigenvalue equation A V^(ℓ) = λ_ℓ V^(ℓ) with the bra ⟨i, 0|
is

⟨i, 0|A|ℓ⟩ = Σ_{k=1}^N ⟨i, 0|A|k, 0⟩⟨k, 0|ℓ⟩ = Σ_{k=1}^N A_{ik} S_{kℓ} = λ_ℓ S_{iℓ}.   (1.294)
Since the columns of S are linearly independent, the determinant of S does
not vanish (the matrix S is nonsingular), and so its inverse S⁻¹ is well
defined by (1.198). It follows that

Σ_{i,k=1}^N (S⁻¹)_{ni} A_{ik} S_{kℓ} = Σ_{i=1}^N λ_ℓ (S⁻¹)_{ni} S_{iℓ} = λ_ℓ δ_{nℓ}   (1.295)
or in matrix notation

S⁻¹ A S = A^(d)   (1.296)

in which A^(d) is the diagonal form of the matrix A, with its eigenvalues λ_ℓ
arranged along its main diagonal and zeros elsewhere. Equation (1.296) is
a similarity transformation. Any nondefective square matrix can
be diagonalized by a similarity transformation

A = S A^(d) S⁻¹.   (1.297)
By using the product rule (1.208), we see that the determinant of any non-
defective square matrix is the product of its eigenvalues

|A| = |S A^(d) S⁻¹| = |S| |A^(d)| |S⁻¹| = |S S⁻¹| |A^(d)| = |A^(d)| = Π_{ℓ=1}^N λ_ℓ   (1.298)
which is a special case of (1.287).
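This diagonalization is easy to check numerically. The sketch below (an illustration with numpy and an arbitrary 2 × 2 matrix, not part of the text) verifies that S⁻¹AS is the diagonal form A^(d) of equation (1.296) and that |A| is the product of the eigenvalues as in (1.298):

```python
import numpy as np

# An arbitrary nondefective matrix: its two eigenvalues are distinct.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

vals, S = np.linalg.eig(A)      # columns of S are the eigenvectors V^(l)
Ad = np.linalg.inv(S) @ A @ S   # S^-1 A S of equation (1.296)

assert np.allclose(Ad, np.diag(vals))               # diagonal form A^(d)
assert np.isclose(np.linalg.det(A), np.prod(vals))  # |A| = product of eigenvalues
```

Note that `np.linalg.eig` places the eigenvectors in the columns of `S`, exactly the convention of equation (1.293).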
1.28 A Matrix Obeys Its Characteristic Equation
Every square matrix obeys its characteristic equation (1.285). That is, the
characteristic equation

P(λ, A) = |A - λI| = Σ_{k=0}^N p_k λᵏ = 0   (1.299)

remains true when the matrix A replaces the unknown variable λ

P(A, A) = Σ_{k=0}^N p_k Aᵏ = 0.   (1.300)
To see why, we recall the formula (1.198) for the inverse of the matrix A - λI

(A - λI)⁻¹ = C(λ, A)ᵀ / |A - λI|   (1.301)

in which C(λ, A)ᵀ is the transpose of the matrix of cofactors of the matrix
A - λI. Since the determinant |A - λI| is the characteristic polynomial
P(λ, A), we have on rearranging

(A - λI) C(λ, A)ᵀ = P(λ, A) I.   (1.302)

The transpose of the matrix of cofactors of the matrix A - λI is a polynomial
in λ with matrix coefficients

C(λ, A)ᵀ = C₀ + C₁λ + ⋯ + C_{N-1}λ^{N-1}.   (1.303)

The left-hand side of equation (1.302) is then

(A - λI) C(λ, A)ᵀ = A C₀ + (A C₁ - C₀)λ + (A C₂ - C₁)λ² + ⋯
                  + (A C_{N-1} - C_{N-2})λ^{N-1} - C_{N-1}λᴺ.   (1.304)
Equating equal powers of λ on both sides of (1.302) and using (1.299)
and (1.304), we have

A C₀ = p₀ I
A C₁ - C₀ = p₁ I
A C₂ - C₁ = p₂ I
⋯
A C_{N-1} - C_{N-2} = p_{N-1} I
-C_{N-1} = p_N I.   (1.305)
We now multiply on the left the first of these equations by I, the second
by A, the third by A², ..., and the last by Aᴺ and then add the resulting
equations. All the terms on the left-hand sides cancel, while the sum of
those on the right gives P(A, A). Thus the matrix A obeys its characteristic
equation

0 = Σ_{k=0}^N p_k Aᵏ = |A| I + p₁A + ⋯ + (-1)^{N-1}(Tr A)A^{N-1} + (-1)ᴺAᴺ   (1.306)

a result known as the Cayley-Hamilton theorem (Arthur Cayley, 1821-1895,
and William Hamilton, 1805-1865). This derivation is due to Israel
Gelfand (1913-2009) (Gelfand, 1961, pp. 89-90).
Because every N × N matrix A obeys its characteristic equation, its Nth
power Aᴺ can be expressed as a linear combination of its lesser powers

Aᴺ = (-1)^{N-1} [ |A| I + p₁A + p₂A² + ⋯ + (-1)^{N-1}(Tr A)A^{N-1} ].   (1.307)

Thus the square A² of every 2 × 2 matrix is given by

A² = -|A| I + (Tr A) A.   (1.308)
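Both statements can be checked numerically. The sketch below (illustrative matrices, numpy assumed) verifies the 2 × 2 identity (1.308) directly and then evaluates the characteristic polynomial of a random 4 × 4 matrix at the matrix itself, using numpy's `np.poly` to obtain the coefficients of det(λI - B):

```python
import numpy as np

# The 2x2 case: A^2 = -|A| I + (Tr A) A, equation (1.308).
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
assert np.allclose(A @ A, -np.linalg.det(A) * np.eye(2) + np.trace(A) * A)

# The general case: a matrix obeys its own characteristic equation.
B = np.random.default_rng(0).normal(size=(4, 4))
p = np.poly(B)  # coefficients of det(lambda I - B), highest power first
P_of_B = sum(c * np.linalg.matrix_power(B, 4 - k) for k, c in enumerate(p))
assert np.allclose(P_of_B, np.zeros((4, 4)), atol=1e-8)
```

The sum P(B, B) vanishes to machine precision even though the individual powers Bᵏ do not, which is the content of the Cayley-Hamilton theorem.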
Example 1.4 (Spin-one-half rotation matrix): If θ is a real 3-vector and
σ is the 3-vector of Pauli matrices (1.35), then the square of the traceless
2 × 2 matrix A = θ·σ is

(θ·σ)² = ( θ₃         θ₁ - iθ₂ )²
         ( θ₁ + iθ₂   -θ₃      )   = θ² I   (1.309)

in which θ² = θ·θ. One may use this identity to show (problem 20) that

exp(-i θ·σ/2) = cos(θ/2) I - i θ̂·σ sin(θ/2)   (1.310)

in which θ̂ is a unit 3-vector. This matrix represents a right-handed rotation
of θ radians about the axis θ̂ for a spin-one-half object.
1.29 Functions of Matrices
What sense can we make of a function f of an N × N matrix A? And
how would we compute it? One way is to use the characteristic equation
(1.307) to express every power of A in terms of I, A, ..., A^{N-1} and the
coefficients p₀ = |A|, p₁, p₂, ..., p_{N-2}, and p_{N-1} = (-1)^{N-1} Tr A. Then if
f(x) is a polynomial or a function with a convergent power series

f(x) = Σ_{k=0}^∞ c_k xᵏ   (1.311)

in principle we may express f(A) in terms of N functions f_k(p) of the
coefficients p ≡ (p₀, ..., p_{N-1}) as

f(A) = Σ_{k=0}^{N-1} f_k(p) Aᵏ.   (1.312)

The identity (1.310) for exp(-i θ·σ/2) is an example of this technique for
N = 2, a technique that can become challenging for N > 3.
Example: In problem 21, one finds the characteristic equation (1.306)
for the 3 × 3 matrix -i θ·J, in which the generators are (J_k)_{ij} = i ε_{ikj} and
ε_{ijk} is totally antisymmetric with ε₁₂₃ = 1. These generators satisfy the
commutation relations [J_i, J_j] = i ε_{ijk} J_k, in which sums over repeated
indices from 1 to 3 are understood. In problem 22, one uses this characteristic
equation to show that the 3 × 3 real orthogonal matrix exp(-i θ·J), which
represents a right-handed rotation by θ radians about the axis θ̂, is

exp(-i θ·J) = cos θ I - i θ̂·J sin θ + (1 - cos θ) θ̂ θ̂ᵀ   (1.313)

or in terms of indices

exp(-i θ·J)_{ij} = δ_{ij} cos θ - sin θ ε_{ijk} θ̂_k + (1 - cos θ) θ̂_i θ̂_j.   (1.314)

Direct use of the characteristic equation can become unwieldy for larger
values of N. Fortunately, another trick is available if A is a nondefective
square matrix and if the power series (1.311) for f(x) converges. For then
A is related to its diagonal form A^(d) by a similarity transformation (1.297),
and we may define f(A) as
f(A) = S f(A^(d)) S⁻¹   (1.315)

in which f(A^(d)) is the diagonal matrix with entries f(a_ℓ)

f(A^(d)) = ( f(a₁)    0      ⋯     0     )
           (  0      f(a₂)   ⋯     0     )
           (  ⋮        ⋮             ⋮     )
           (  0       0      ⋯    f(a_N) )   (1.316)
the a_ℓ's being the eigenvalues of the matrix A.
This definition makes sense because we'd expect f(A) to be

f(A) = Σ_{n=0}^∞ c_n Aⁿ = Σ_{n=0}^∞ c_n (S A^(d) S⁻¹)ⁿ.   (1.317)

But since S⁻¹S = I, we have (S A^(d) S⁻¹)ⁿ = S (A^(d))ⁿ S⁻¹ and so

f(A) = S [ Σ_{n=0}^∞ c_n (A^(d))ⁿ ] S⁻¹ = S f(A^(d)) S⁻¹   (1.318)

which is (1.315).
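This definition of f(A) can be checked against the rotation example above. The sketch below (illustrative angle and axis; numpy assumed) computes exp(-i θ·J) as S exp(A^(d)) S⁻¹ and compares it with the closed form (1.313); note that -i θ·J is a real antisymmetric matrix, so its eigenvectors are complex but the exponential comes out real and orthogonal.

```python
import numpy as np

# Levi-Civita symbol with eps_123 = 1 and generators (J_k)_ij = i eps_ikj
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0
J = np.array([1j * eps[:, k, :] for k in range(3)])

theta = 0.9
n = np.array([1.0, 2.0, 2.0]) / 3.0               # unit axis
A = -1j * np.einsum('k,kij->ij', theta * n, J)    # -i theta.J, real antisymmetric

# f(A) = S f(A^(d)) S^-1 with f = exp, equation (1.315)
a, S = np.linalg.eig(A)
expA = S @ np.diag(np.exp(a)) @ np.linalg.inv(S)

# Closed form (1.313): cos(theta) I - i theta_hat.J sin(theta) + (1-cos theta) n n^T
R = (np.cos(theta) * np.eye(3)
     - 1j * np.sin(theta) * np.einsum('k,kij->ij', n, J)
     + (1 - np.cos(theta)) * np.outer(n, n))
assert np.allclose(expA, R)
assert np.allclose(R.real.T @ R.real, np.eye(3))  # a real orthogonal matrix
```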
Example: In quantum mechanics, the time-evolution operator is taken
to be the exponential exp(-iHt/ħ), in which H = H† is a hermitian linear
operator, the hamiltonian, named after William Rowan Hamilton (1805-1865),
and ħ = h/(2π) ≈ 10⁻³⁴ J·s, where h is the constant named after Max
Planck (1858-1947). As we'll see in the next section, hermitian operators
are never defective, so H can be diagonalized by a similarity transformation

H = S H^(d) S⁻¹.   (1.319)

The diagonal elements of the diagonal matrix H^(d) are the energies E_ℓ of
the states of the system described by the hamiltonian H. The time-evolution
operator U(t) then is

U(t) = S exp(-i H^(d) t/ħ) S⁻¹.   (1.320)

If the system has three states, then U(t) is

U(t) = S ( e^{-iω₁t}     0          0       )
         (  0         e^{-iω₂t}     0       ) S⁻¹   (1.321)
         (  0            0       e^{-iω₃t}  )

in which the angular frequencies are ω_ℓ = E_ℓ/ħ.
Example: For a system described by the density operator ρ, the entropy
S is defined as the trace

S = -k Tr(ρ ln ρ)   (1.322)

in which k = 1.38 × 10⁻²³ J/K is the constant named after Ludwig Boltzmann
(1844-1906). The density operator ρ is hermitian, non-negative, and
of unit trace. Since ρ is hermitian, the matrix that represents it is never
defective, and so that matrix can be diagonalized by a similarity transformation

ρ = S ρ^(d) S⁻¹.   (1.323)

Thus since the trace is cyclic (1.27), we may compute the entropy as

S = -k Tr(S ρ^(d) S⁻¹ S ln(ρ^(d)) S⁻¹) = -k Tr(ρ^(d) ln(ρ^(d))).   (1.324)

A vanishing eigenvalue ρ^(d)_k = 0 contributes nothing to this trace since
lim_{x→0} x ln x = 0. If the system has three states, populated with probabilities
ρ_i, the diagonal elements of ρ^(d), then the entropy is

S = -k (ρ₁ ln ρ₁ + ρ₂ ln ρ₂ + ρ₃ ln ρ₃)
  = k [ρ₁ ln(1/ρ₁) + ρ₂ ln(1/ρ₂) + ρ₃ ln(1/ρ₃)].   (1.325)
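A numerical sketch of this entropy computation (illustrative probabilities; entropy measured in units of k to keep the numbers of order one): a density matrix is built in a rotated basis and its entropy is recovered from its eigenvalues, as in (1.324).

```python
import numpy as np

# Three probabilities on the diagonal of rho^(d), then rotated to a generic basis.
p = np.array([0.5, 0.3, 0.2])
rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
Q, _ = np.linalg.qr(M)               # a random unitary matrix
rho = Q @ np.diag(p) @ Q.conj().T    # hermitian, non-negative, unit trace

# Entropy in units of k: S/k = -Tr(rho ln rho) = -sum rho_i ln rho_i, (1.324-1.325)
evals = np.linalg.eigvalsh(rho)
S_over_k = -np.sum(evals * np.log(evals))

assert np.isclose(np.trace(rho).real, 1.0)
assert np.isclose(S_over_k, -np.sum(p * np.log(p)))
```

The eigenvalues of ρ are exactly the original probabilities, so the basis-independent trace formula and the sum over the ρ_i agree.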
1.30 Hermitian Matrices
Hermitian matrices have especially nice properties. By definition (1.33),
a hermitian matrix A is square and unchanged by hermitian conjugation,
A† = A. Since it is square, the results of section 1.27 ensure that an N × N
hermitian matrix A has N eigenvectors |n⟩ with eigenvalues a_n

A|n⟩ = a_n|n⟩.   (1.326)
In fact, these eigenvalues are all real. To see why, we form the adjoint of
equation (1.326)

⟨n|A† = a_n*⟨n|   (1.327)

and use the property A† = A to find

⟨n|A = ⟨n|A† = a_n*⟨n|.   (1.328)

We now form the inner product of both sides of this equation with the ket
|n⟩ and use the eigenvalue equation (1.326) to get

⟨n|A|n⟩ = a_n⟨n|n⟩ = a_n*⟨n|n⟩   (1.329)

which tells us that the eigenvalues are real

a_n* = a_n.   (1.330)
Since A† = A, the matrix elements of A between two of its eigenvectors
satisfy

a_m*⟨m|n⟩ = (a_m⟨n|m⟩)* = ⟨n|A|m⟩* = ⟨m|A†|n⟩ = ⟨m|A|n⟩ = a_n⟨m|n⟩   (1.331)

which implies that

(a_m* - a_n)⟨m|n⟩ = 0.   (1.332)

But since all the eigenvalues of the hermitian matrix A are real, we have

(a_m - a_n)⟨m|n⟩ = 0.   (1.333)

This equation tells us that when the eigenvalues are different, the eigenvectors
are orthogonal. In the absence of a symmetry, all N eigenvalues usually
are different, and so the eigenvectors usually are mutually orthogonal.
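These two properties, real eigenvalues and orthogonal eigenvectors, are easy to see numerically. A sketch with numpy (the matrix is a random illustrative example; `eigh` is numpy's routine for hermitian matrices):

```python
import numpy as np

# Build a random hermitian matrix A = M + M^dagger.
rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = M + M.conj().T

a, V = np.linalg.eigh(A)   # eigh is designed for hermitian matrices

assert np.allclose(a.imag, 0)                   # eigenvalues are real (1.330)
assert np.allclose(V.conj().T @ V, np.eye(4))   # eigenvectors orthonormal (1.333)
assert np.allclose(A @ V, V @ np.diag(a))       # A|n> = a_n|n>, equation (1.326)
```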
When two or more eigenvectors |n⟩ of a hermitian matrix have the same
eigenvalue a_n, their eigenvalues are said to be degenerate. In this case, any
linear combination of the degenerate eigenvectors will also be an eigenvector
with the same eigenvalue a_n

A ( Σ_{n∈D} c_n|n⟩ ) = a_n ( Σ_{n∈D} c_n|n⟩ )   (1.334)

where D is the set of labels of the eigenvectors with the same eigenvalue.
If the degenerate eigenvectors |n⟩ are linearly independent, then we may
use the Gram-Schmidt procedure (1.110-1.122) to choose the coefficients
c_n so as to construct degenerate eigenvectors that are orthogonal to each
other and to the non-degenerate eigenvectors. We then may normalize these
mutually orthogonal eigenvectors.
But two related questions arise: Are the degenerate eigenvectors |n⟩
linearly independent? And if so, what orthonormal linear combinations of
them should we choose for a given physical problem? Let's consider the
second question first.
We saw in Sec. 1.22 that unitary transformations preserve the orthonormality
of a basis. Any unitary transformation U that commutes with the
matrix A

[A, U] = 0   (1.335)

maps each set of orthonormal degenerate eigenvectors of A into another
set of orthonormal degenerate eigenvectors of A with the same eigenvalue
because

A U|n⟩ = U A|n⟩ = a_n U|n⟩.   (1.336)

So there's a huge spectrum of choices for the orthonormal degenerate eigenvectors
of A with the same eigenvalue. What is the right set for a given
physical problem?
A sensible way to proceed is to add to the matrix A a second hermitian
matrix B multiplied by a tiny, real scale factor ε

A(ε) = A + εB.   (1.337)

The matrix B must completely break whatever symmetry led to the degeneracy
in the eigenvalues of A. Ideally, the matrix B should be one that
represents a modification of A that is physically plausible and relevant to
the problem at hand. The hermitian matrix A(ε) then will have N different
eigenvalues a_n(ε) and N orthonormal non-degenerate eigenvectors

A(ε)|n, ε⟩ = a_n(ε)|n, ε⟩.   (1.338)
These eigenvectors |n, ε⟩ of A(ε) are orthogonal to each other

⟨n, ε|n′, ε⟩ = δ_{nn′}   (1.339)

and to the eigenvectors of A(ε) with other eigenvalues, and they remain so
as we take the limit

|n⟩ = lim_{ε→0} |n, ε⟩.   (1.340)

We may choose them as the orthogonal degenerate eigenvectors of A. Since
one always may find a crooked hermitian matrix B that breaks any particular
symmetry, it follows that every N × N hermitian matrix A possesses N
orthonormal eigenvectors, which are complete in the vector space in which
A acts. (Any N linearly independent vectors span their N-dimensional
vector space, as explained in section 1.9.)
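The perturbative argument can be sketched numerically (an illustration, not from the text): the identity matrix is maximally degenerate, yet the eigenvectors of A + εB returned for a tiny ε form an orthonormal set that still diagonalizes A.

```python
import numpy as np

# A is maximally degenerate: every vector is an eigenvector with eigenvalue 1.
A = np.eye(3)

# Add a tiny multiple of a symmetry-breaking hermitian B, equation (1.337).
rng = np.random.default_rng(3)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
B = M + M.conj().T
eps = 1e-8
_, V = np.linalg.eigh(A + eps * B)

# The perturbed eigenvectors are orthonormal and remain eigenvectors of A.
assert np.allclose(V.conj().T @ V, np.eye(3))
assert np.allclose(A @ V, V)   # each column has eigenvalue 1
```

Which orthonormal set is selected depends on B, exactly as the text says: a physically motivated B picks out the basis appropriate to the problem.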
Now let's return to the first question and show that an N × N hermitian
matrix has N orthogonal eigenvectors. To do this, we'll first show that the
space of vectors orthogonal to an eigenvector |n⟩ of a hermitian operator A

A|n⟩ = λ|n⟩   (1.341)

is invariant under the action of A. We must show that if |y⟩ is any vector
orthogonal to the eigenvector |n⟩

⟨n|y⟩ = 0   (1.342)

then A|y⟩ also is orthogonal to |n⟩, that is, ⟨n|A|y⟩ = 0. We use successively
the definition of A†, the hermiticity of A, the eigenvector equation (1.341),
the definition of the inner product, and the reality of the eigenvalues of a
hermitian matrix:

⟨n|A|y⟩ = ⟨A†n|y⟩ = ⟨An|y⟩ = ⟨λn|y⟩ = λ*⟨n|y⟩ = λ⟨n|y⟩ = 0.   (1.343)

Thus the space of vectors orthogonal to an eigenvector of a hermitian operator
is invariant under it.

Now a hermitian operator A acting on an N-dimensional vector space S
is represented by an N × N hermitian matrix, and so it has at least one
eigenvector |1⟩. The subspace of S consisting of all vectors orthogonal to
|1⟩ is an (N-1)-dimensional vector space S_{N-1} that is invariant under the
action of A. On this space S_{N-1}, the operator A is represented by an
(N-1) × (N-1) hermitian matrix A_{N-1}. This matrix has at least one
eigenvector |2⟩. The subspace of S_{N-1} consisting of all vectors orthogonal
to |2⟩ is an (N-2)-dimensional vector space S_{N-2} that is invariant under
the action of A. On S_{N-2}, the operator A is represented by an
(N-2) × (N-2) hermitian matrix A_{N-2}, which has at least one eigenvector
|3⟩. By construction, the
vectors |1⟩, |2⟩, and |3⟩ are mutually orthogonal. Continuing in this way, we
see that A has N orthogonal eigenvectors |k⟩ for k = 1, 2, ..., N.

The N orthogonal eigenvectors |k⟩ of an N × N matrix A can be normalized
and used to write the N × N identity operator I as

I = Σ_{k=1}^N |k⟩⟨k|.   (1.344)
On multiplying from the left by the matrix A, we find

A = A I = A Σ_{k=1}^N |k⟩⟨k| = Σ_{k=1}^N a_k|k⟩⟨k|   (1.345)

which is the diagonal form of the hermitian matrix A. This expansion of A as
a sum over outer products of its eigenstates multiplied by their eigenvalues
is important in quantum mechanics. The expansion represents the possible
selective, non-destructive measurements of the physical quantity represented
by the matrix A.
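The expansion (1.345) can be sketched in numpy (a random hermitian matrix for illustration): summing eigenvalues times the outer products |k⟩⟨k| reconstructs A exactly.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
A = M + M.conj().T              # a hermitian matrix

a, V = np.linalg.eigh(A)

# A as a sum of eigenvalues times outer products |k><k|, equation (1.345)
A_rebuilt = sum(a[k] * np.outer(V[:, k], V[:, k].conj()) for k in range(3))
assert np.allclose(A_rebuilt, A)
```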
The hermitian matrix A is diagonal in the basis provided by its eigenstates
|k⟩

A_{kj} = ⟨k|A|j⟩ = a_k δ_{kj}.   (1.346)

But in any other basis |ℓ, o⟩, the matrix A appears as

A_{iℓ} = ⟨i, o|A|ℓ, o⟩ = Σ_{k=1}^N ⟨i, o|k⟩ a_k ⟨k|ℓ, o⟩.   (1.347)

The linear operator

U = Σ_{k=1}^N |k⟩⟨k, o|   (1.348)

is unitary because it maps the arbitrary orthonormal basis |k, o⟩ into the
orthonormal basis of eigenstates |k⟩. In the |k, o⟩ basis, U is the matrix
whose nth column is the N-tuple ⟨i, o|n⟩ that represents |n⟩ in the basis
|i, o⟩

U_{in} = ⟨i, o|U|n, o⟩ = ⟨i, o|n⟩.   (1.349)

So equation (1.347) tells us that an arbitrary N × N hermitian matrix A
can be diagonalized by a unitary transformation

A = U A^(d) U†.   (1.350)

Here A^(d) is the diagonal matrix A^(d)_{nm} = a_m δ_{nm}.
A matrix that is real and symmetric is hermitian; so is one that is
imaginary and antisymmetric.
A real, symmetric matrix R can be diagonalized by an orthogonal trans-
formation
R = OR(d)OT (1.351)
in which the matrix O is a real unitary matrix, that is, an orthogonal matrix
(1.258).
Example: Suppose we wish to find the eigenvalues of the real, symmetric
mass matrix

M = ( 0  m )
    ( m  M )   (1.352)

in which m is an ordinary mass and M is a huge mass. The eigenvalues μ
of this hermitian mass matrix satisfy the equation

det(M - μI) = μ(μ - M) - m² = 0   (1.353)

with solutions

μ± = (1/2) ( M ± √(M² + 4m²) ).   (1.354)

The larger mass μ₊ is approximately the huge mass M

μ₊ ≈ M + m²/M   (1.355)

and the smaller mass μ₋ is very tiny

μ₋ ≈ -m²/M.   (1.356)
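A quick numerical check of this example (the values of m and M below are illustrative, chosen so that m ≪ M): the eigenvalues of the mass matrix match both the exact roots (1.354) and the large-M approximations.

```python
import numpy as np

m, M_big = 1.0, 1.0e4                # illustrative masses with m << M
mass = np.array([[0.0, m],
                 [m, M_big]])

mu_minus, mu_plus = np.linalg.eigvalsh(mass)   # eigvalsh returns ascending order

# Exact roots (1.354)
assert np.isclose(mu_plus, 0.5 * (M_big + np.sqrt(M_big**2 + 4 * m**2)))
assert np.isclose(mu_minus, 0.5 * (M_big - np.sqrt(M_big**2 + 4 * m**2)))

# Large-M approximations: mu_+ ~ M + m^2/M and mu_- ~ -m^2/M
assert np.isclose(mu_plus, M_big + m**2 / M_big)
assert np.isclose(mu_minus, -m**2 / M_big)
```

The tiny negative eigenvalue, of order -m²/M, is the seesaw pattern: making M larger drives one eigenvalue up and the other toward zero.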